3.4.1 Dimensional databases
Although the relational database schema model allows for great flexibility in defining ways to look at and process the data in the database, in a business situation, we often find that the way data is processed is different, especially by decision makers. Decision makers are not interested in the details of every single transaction recorded in the database, instead, they are interested in looking at the \"big\" picture. For instance, instead of looking at individual customer records in the operational database, which may include data about the customer\'s phone, address, etc., they are interested in knowing the number of customers per quarter or at the end of each month. As an example, let\'s take a look at the transaction table in an operational banking database shown in Fig 3-3 below.
Notice how the Transaction table is linked to many other tables through relationships. In the diagram, we show only the links to the Customer and Cost tables. However, all but two fields in the Transaction table are foreign keys, linking to other tables. The three non-foreign key fields are trans_amount, trans_type and time-stamp.
Fig 3-3. Relational model of a banking transactional system
As the OLTP system is used over time, the table grows in size and becomes full of transaction data, sometimes for the same customers, conducting similar transactions with different amounts and at different costs. Because of the large amount of data it holds and the versatility of the types of activities stored, the table becomes a good candidate for business analysis.
The business analyst may be interested in finding out the effect the transaction cost of ATM withdrawals had on the revenue. To get such results, the business analyst may ask a SQL developer to write a, possibly complex, query that would extract the data from the OLTP system and put it in the format they want. An example query may look like the code presented below:
SELECT cost_amount AS Cost,
COUNT (transaction_id) AS [Number of Transactions],
cost_amount * COUNT (transaction_id) AS Revenue
FROM Transaction, Cost
WHERE Cost.trans_cost_id = Transaction.trans_cost_id
AND Transaction.trans_type = \'ATM\'
GROUP BY cost_amount
It is needless to say how time and resource consuming it is to follow this path whenever the business analyst wants to get some summaries and aggregations of data. This is a simplified version of what could happen in the real world. In production OLTP systems, such a query may be much larger, involving many table joins that would affect the speed at which the results will return and affect the performance of people using the operational database for everyday important business transactions.
To solve this problem, a separate database would be built to represent the business facts more accurately. This database will be fed from the operational database, but after the data has been transformed to fit the new structure. This is because the schema of this database will not be relational; instead, it will be dimensional. Such database can have what is called a star or a snowflake schema, or a combination of both.
In these schemas, a fact table is usually joined to a number of \"dimension\" tables. For our example, we may build a Transaction table to record daily transactions of products offered by a given store. This table is referred to as the fact table. The Transaction table includes the facts of the business activities identified by dimensions. The facts are measures of activities. In our example, the transaction amount, trans_amount, is the only measure of facts in the Transaction table, just as it usually is in a typical relational database. Dimensions help put the facts in context and represent such things as time, service, customer, and location. In our example, these dimensions include time, branch, customer, and service. Fig 3-4 shows a representation of the star database schema.
The sales fact table, called Transaction in the diagram, captures transactions on a daily level for each branch, for all customers, and for all services. This table, therefore, will grow very large. To improve the efficiency with which data is retrieved from the database, aggregates are pre-computed at different levels and stored in the database or in an optimized format.
Fig 3-4. Dimensional model of the bank system in Fig 3-3
The tables linked to the fact table are called dimensions. They are used to generate the aggregations from the fact table. For instance, you can find the total monthly transaction fees of all services to all customers by all branches when you query the Transaction table grouping by month of the year. Or, you can find the total transaction fees by state at all times, for all customers, and for all services when you query the sales table grouping on state. You can have aggregations on a combination of the dimensions in the sales fact table.
3.4.2 OLAP cubes
The most important part of the Analysis server architecture is the OLAP cubes. These cubes usually represent the data. The dimensions of the cube represent the dimensions of the fact table. Each cell in the cube represents a fact corresponding to a level of detail for the different dimensions of the cube. Although the graphical representation of the cube can only show three dimensions, a data cube can have up to 128 dimensions when using SQL Server\'s Analysis Services.
Fig 3-5. OLAP cube
Figure 3-5 shows a representation of a data cube for the transaction table with the branch, service, and time dimensions shown. If you want to use this cube to find out the total mortgage transaction fees collected for branches in Michigan (MI) during 1998 for the mortgage service category, you need to look at the shaded cell, which is the resulting cell from the intersection of the three mentioned dimensions.
The Analysis server works with the Microsoft Decision Support Services (DSS) to allow you to build your cube from any source of data that has an OLE DB provider. This source can be a relational database in any database management system that has an ODBC driver, such as Oracle or Sybase SQL Server, or has a native OLE DB provider, such as SQL Server, or Microsoft Access. The data source for the cube can also be a dimensional database, or text files.
3.4.3 Cube partitions
Partitions represent logical divisions of cube data broken down by the values of a particular dimension or the data source. For example, you can have a partition on the transaction cube based on the state values creating partitions for regions like Midwest or Northeast. A cube will always contain at least one partition.
A partition can be saved separately on a different disk drive from the original cube. This allows you to place data that is not used or queried frequently on slower storage media, or alternatively place frequently queried data on higher speed storage media. Partitions can also be distributed and stored on different Analysis servers, providing a clustered approach to cube storage and distributing workload across Analysis servers. The partitions stored on other Analysis servers than the one that stores the meta data for them are called remote partitions.
3.4.4 Linked cubes
The analysis server allows linking two or more OLAP cubes that share common dimensions. This process creates linked cubes. An example of a shared dimension is time. Linked cubes take advantage of the shared dimensions to create links among data sets that seem unrelated. Linked cubes require no additional storage, and can link cubes based on different storage types. For instance, you can create a linked cube that links the employee and transaction cubes, even if the employee cube is stored in the relational database and the transaction cube in the multidimensional database. The validity of these relationships is closely associated with the quality of the raw data that supports them.