设为首页 加入收藏

TOP

NoSQL的数据建模技术(八)
2014-11-24 07:39:20 来源: 作者: 【 】 浏览:19
Tags:NoSQL 数据 建模 技术
efficient for immutable data because it has small memory footprint and allows to fetch all leafs for a given node without traversals. Nevertheless, inserts and updates are quite costly because addition of one leaf cause extensive update of indexes.

Applicability: Key-Value Stores, Document Databases

(15) Nested Documents Flattening: Numbered Field Names
Search Engines typically work with flat documents, i.e. each document is a flat list of fields and values. The goal of data modeling is to map business entities to plain documents and this can be challenging if entities have complex internal structure. One typical challenge is a mapping of documents with hierarchical structure, i.e. documents with nested documents inside. Let’s consider the following example:


Nested Documents Problem

Each business entity is some kind of resume. It contains person’s name, a list of his or her skills with a skill level. An obvious way to model such entity is to create a plain document with Skill and Level fields. This model allows to search for person by skill or by level, but queries that combine both fields are liable to false matches, as depicted in the figure above.

One way to overcome this issue was suggested in [4.6] The main idea of this technique is to index each skill and corresponding level as a dedicated pair of fields Skill_i and Level_i and search all these pairs simultaneously (number of OR-ed terms in a query is as high as a maximum number of skills for one person):


Nested Document Modeling using Numbered Field Names

This approach is not really scalable because query complexity grows rapidly as a function of number of nested structures.

Applicability: Search Engines

(16) Nested Documents Flattening: Proximity Queries
The problem with nested documents can be solved using another technique that was also described in [4.6]. The idea is to use proximity queries that limit acceptable distance between words in the document. In the figure below, all skills and levels are indexed in one field, namely, SkillAndLevel, and query tells that words “Excellent” and “Poetry” should follow one another:


Nested Document Modeling using Proximity Queries

Work [4.3] describes a success story for this technique used on top of Solr.

Applicability: Search Engines

(17) Batch Graph Processing
Graph databases like neo4j are exceptionally good for exploring a neighborhood of a given node or exploring relationships between two or few nodes. Nevertheless, global processing of large graphs is not very efficient because general purpose graphs databases do not scale well. Distributed graph processing can be done using MapReduce and Message Passing pattern that was described, for example, in one of my previous articles. This approach makes Key-Value stores, Document databases, and BigTable-style databases suitable for processing of large graphs.

Applicability: Key-Value Stores, Document Databases, BigTable-style Databases

References
Finally, I provide a list of useful links related to NoSQL data modeling:

Key-Value Stores:
http://www.devshed.com/c/a/MySQL/Database-Design-Using-KeyValue-Tables/
http://antirez.com/post/Sorting-in-key-value-data-model. html
http://stackoverflow.com/questions/3554169/difference-between-document-based-and-key-value-based-databases
http://dbmsmusings.blogspot.com/2010/03/distinguishing-two-major-types-of_29.html
BigTable-style Databases:
http://www.slideshare.net/
首页 上一页 5 6 7 8 下一页 尾页 8/8/8
】【打印繁体】【投稿】【收藏】 【推荐】【举报】【评论】 【关闭】 【返回顶部
分享到: 
上一篇NoSQL的特点 下一篇Vbox创建COM对象失败

评论

帐  号: 密码: (新用户注册)
验 证 码:
表  情:
内  容:

·HyperText Transfer (2025-12-26 07:20:48)
·半小时搞懂 HTTP、HT (2025-12-26 07:20:42)
·CPython是什么?PyPy (2025-12-26 06:50:09)
·Python|如何安装seab (2025-12-26 06:50:06)
·python要学习数据分 (2025-12-26 06:50:03)