ry that retrieves all users by a specified city can be supported by means of an additional table where city is a key:
Index Table Example
Index table can be updated for each update of the master table or in batch mode, but, anyway, it is an additional performance penalty and a consistency issue.
Index Table can be considered as an analog of materialized views in relational databases.
Applicability: BigTable-style Databases
(8) Composite Key Index
Composite key is a very generic technique, but it is extremely beneficial when a store with ordered keys is used. Composite keys in conjunction with secondary sorting allows to build kind of multidimensional index which is fundamentally similar to the previously described Dimensionality Reduction technique. For example, let’s take a set of records where each record is a user statistics. If we are going to aggregate these statistics by a region the user came from, we can use keys in form (State:City:UserID) that allow to iterate records for particular state or city given that store supports selection of key ranges by partial key match (as BigTable-style systems do):
1 SELECT Values WHERE state="CA:*"
2 SELECT Values WHERE city="CA:San Francisco*"
Composite Key Index
Applicability: BigTable-style Databases
(9) Aggregation with Composite Keys
Composite keys may be used not only for indexing, but for different types of grouping. Let’s consider an example. There is a huge array of log records with information about internet users and their visits of different sites (click stream). The goal is to count a number of unique users for each site. This is similar to the following SQL query:
1 SELECT count(distinct(user_id)) FROM clicks GROUP BY site
We can model this situation using composite keys with UserID prefix:
Counting Unique Users using Composite Keys
The idea is to keep all records for one user collocated, so it is possible fetch such a frame into memory (one user can not produce too many events) and eliminate site duplicates using hash table or whatever. An alternative technique is to have one entry for one user and appends sites to this entry as events arrive. Nevertheless, entry modification is generally less efficient than entry insertion in the majority of implementations.
Applicability: Ordered Key-Value Stores, BigTable-style Databases
(10) Inverted Search – Direct Aggregation
This technique is more data processing pattern, rather than data modeling. Nevertheless, data models are also impacted by usage of this pattern. The main idea of this technique is to use an index to find data that meets a criteria, but aggregate data using original representation or full scans. Let’s consider an example. There is a number of log records with information about internet users and their visits of different sites (click stream). Let assume that each record contains user ID, categories this user belongs to (Men, Women, Bloggers, etc), city this user came from, and visited site. The goal is to describe the audience that meet some criteria (site, city, etc) in terms of unique users for each category that occurs in this audience (i.e. in the set of users that meet the criteria).
It is quite clear that search of users that meet the criteria can be efficiently done using inverted indexes like {Category -> [user IDs]} or {Site -> [user IDs]}. Using such indexes, one can intersect or unify corresponding user IDs (this can be done very efficient if user IDs are stored as sorted lists or bit sets) and obtain an au |