data models can be considered as a side branch of evolution that origins from the Ordered Key-Value models. Graph databases allow to model business entities very transparently (this depends on that), but hierarchical modeling techniques make other data models very competitive in this area too. Graph databases are allied to Document databases because many implementations allow to model value as a map or document.
General Notes on NoSQL Data Modeling
The rest of this article describes concrete data modeling techniques and patterns. As a preface, I would like to provide a few general notes on NoSQL data modeling:
NoSQL data modeling often starts from the application-specific queries as opposed to relational modeling:
Relational modeling is typically driven by structure of available data, the main design theme is ”What answers do I have ”
NoSQL data modeling is typically driven by application-specific access patterns, i.e. types of queries to be supported. The main design theme is ”What questions do I have ”
NoSQL data modeling often requires deeper understanding of data structures and algorithms than relational database modeling does. In this article I describe several well-known data structures that are not specific for NoSQL, but are very useful in practical NoSQL modeling.
Data duplication and denormalization are the first-class citizens.
Relational databases are not very convenient for hierarchical or graph-like data modeling and processing. Graph databases are obviously a perfect solution for this area, but actually most of NoSQL solutions are surprisingly strong for such problems. That is why the current article devotes a separate section to hierarchical data modeling.
Although data modeling techniques are basically implementation agnostic, this is a list of the particular systems that I had in mind working on this article:
Key-Value Stores: Oracle Coherence, Redis, Kyoto Cabinet
BigTable-style Databases: Apache HBase, Apache Cassandra
Document Databases: MongoDB, CouchDB
Full Text Search Engines: Apache Lucene, Apache Solr
Graph Databases: neo4j, FlockDB
Conceptual Techniques
This section is devoted to the basic principles of NoSQL data modeling.
(1) Denormalization
Denormalization can be defined as copying of the same data into multiple documents or tables in order to simplify/optimize query processing or to fit user’s data into particular data model. Most techniques described in this article leverage denormalization in one or another form.
In general, denormalization is helpful for the following trade-offs:
Query data volume or IO per query VS total data volume. Using denormalization one can group all data that are needed to process a query in one place. This often means that for different query flows the same data will be accessed in different combinations. Hence we need to copy the same data and increase total data volume.
Processing complexity VS total data volume. Modeling-time normalization and consequent query-time joins obviously increase complexity of the query processor. Especially in distributed systems. Denormalization allows to store data in a query-friendly form to simplify query processing.
Applicability: Key-Value Stores, Document Databases, BigTable-style Databases
(2) Aggregates
All major genres of NoSQL provide soft schema capabilities in one way or another:
Key-Value Stores and Graph Databases typically do not pose constraints on values, so value can be of arbitrary format. It is also possib |