dience. But description of audience which is similar to aggregation query like
1 SELECT count(distinct(user_id)) ... GROUP BY category
can not be handled efficiently using inverted index if number of categories is big. To cope with this one can build a direct index in form {UserID -> [Categories]} and iterate it in order to build a final report. This schema is depicted below:
Counting Unique Users using Inverse and Direct Indexes
And as a final note, we should take into account that random retrieva l of records for each user ID in the audience can be inefficient. One can struggle with this problem leveraging batch query processing. This means that some number of users sets can be precomputed (for different criteria) and then all reports for this batch of audiences can be computed in one full scan of direct or inverse index.
Applicability: Key-Value Stores, BigTable-style Databases, Document Databases
Hierarchy Modeling Techniques
(11) Tree Aggregation
Trees or even arbitrary graphs (with the aid of denormalization) can be modeled as a single record or document.
This techniques is efficient when the tree is accessed at once (for example, an entire tree of blog comments is fetched to show a page with a post).
Search and arbitrary access to the entires may be problematic.
Updates are inefficient in the most of NoSQL implementations (in comparison with independent nodes).
Tree Aggregation
Applicability: Key-Value Stores, Document Databases
(12) Adjacency Lists
Adjacency Lits is a straightforward way of graph modeling – each node is modeled as an independent record that contains arrays of direct ancestors or descendants. It allows to search for nodes by identifiers of their parents or children and, of course, traverse a graph doing one hop per query. This approach is usually inefficient for getting an entire subtree for some node, for deep or wide traversals.
Applicability: Key-Value Stores, Document Databases
(13) Materialized Paths
Materialized Paths is a technique that helps to avoid recursive traversals of tree-like structures. This technique can be considered as a kind of denormalization. The idea is to attribute each node by identifiers of all its parents or children, so is possible to determine all descendants or predecessors of the node without traversal:
Materialized Paths for eShop Category Hierarchy
This technique is especially helpful for Full Text Search Engines because it allows to convert hierarchical structure to flat documents. One can see in the figure above that all products or subcategories within the Men’s Shoes category can be retrieved using a short query which is simply a category name.
Materialized Paths can be stored as a set of IDs or as a single string of concatenated IDs. The later option allows to search for nodes that meet a certain partial path criteria using regular expressions. This option is illustrated in the figure below (path includes node itself):
Query Materialized Paths using RegExp
Applicability: Key-Value Stores, Document Databases, Search Engines
(14) Nested Sets
Nested sets is a standard technique for tree-like structures modeling. It is widely used in relational databases, but it is perfectly applicable to Key-Value Stores and Document Databases. The idea is to store leafs of the tree in array and map each non-leaf node to a range of leafs using start and end indexes, as it shown in the figure below:
Modeling of eCommerce Catalog using Nested Sets
This structure is pretty |