efficient for immutable data because it has small memory footprint and allows to fetch all leafs for a given node without traversals. Nevertheless, inserts and updates are quite costly because addition of one leaf cause extensive update of indexes.
Applicability: Key-Value Stores, Document Databases
(15) Nested Documents Flattening: Numbered Field Names
Search Engines typically work with flat documents, i.e. each document is a flat list of fields and values. The goal of data modeling is to map business entities to plain documents and this can be challenging if entities have complex internal structure. One typical challenge is a mapping of documents with hierarchical structure, i.e. documents with nested documents inside. Let’s consider the following example:
Nested Documents Problem
Each business entity is some kind of resume. It contains person’s name, a list of his or her skills with a skill level. An obvious way to model such entity is to create a plain document with Skill and Level fields. This model allows to search for person by skill or by level, but queries that combine both fields are liable to false matches, as depicted in the figure above.
One way to overcome this issue was suggested in [4.6] The main idea of this technique is to index each skill and corresponding level as a dedicated pair of fields Skill_i and Level_i and search all these pairs simultaneously (number of OR-ed terms in a query is as high as a maximum number of skills for one person):
Nested Document Modeling using Numbered Field Names
This approach is not really scalable because query complexity grows rapidly as a function of number of nested structures.
Applicability: Search Engines
(16) Nested Documents Flattening: Proximity Queries
The problem with nested documents can be solved using another technique that was also described in [4.6]. The idea is to use proximity queries that limit acceptable distance between words in the document. In the figure below, all skills and levels are indexed in one field, namely, SkillAndLevel, and query tells that words “Excellent” and “Poetry” should follow one another:
Nested Document Modeling using Proximity Queries
Work [4.3] describes a success story for this technique used on top of Solr.
Applicability: Search Engines
(17) Batch Graph Processing
Graph databases like neo4j are exceptionally good for exploring a neighborhood of a given node or exploring relationships between two or few nodes. Nevertheless, global processing of large graphs is not very efficient because general purpose graphs databases do not scale well. Distributed graph processing can be done using MapReduce and Message Passing pattern that was described, for example, in one of my previous articles. This approach makes Key-Value stores, Document databases, and BigTable-style databases suitable for processing of large graphs.
Applicability: Key-Value Stores, Document Databases, BigTable-style Databases
References
Finally, I provide a list of useful links related to NoSQL data modeling:
Key-Value Stores:
http://www.devshed.com/c/a/MySQL/Database-Design-Using-KeyValue-Tables/
http://antirez.com/post/Sorting-in-key-value-data-model.
html
http://stackoverflow.com/questions/3554169/difference-between-document-based-and-key-value-based-databases
http://dbmsmusings.blogspot.com/2010/03/distinguishing-two-major-types-of_29.html
BigTable-style Databases:
http://www.slideshare.net/ |