Use of Fact Tables for Data Warehousing

Filed under: General — admin at 2:43 pm on Thursday, May 22, 2008

The central factual transaction table is called the fact table. The surrounding reference tables are called the dimension tables. Put them together and you can imagine them representing a star: hence the name, star schema. The idea is that most queries first of all analyze the data surrounding the fact table to minimize the number of rows that need to be scanned. This is done in order to address the greatest performance hit on queries: specifically, the volume of data that has to be read off disk. By reducing the total volume of data to be scanned, we reduce the time it takes to execute a query.

Because, in most cases, the highest volume of data will be the fact data, it makes sense to reduce it by minimizing the actual set ofrows that need to be extracted from the fact table. This is done by ensuring that the much smaller dimension tables are used first, in order to filter out the transactions in the fact table that are not required to satisfy the query. In relational technology, this is achieved by joining the dimension tables prior to joining to the fact table. If the Cartesian product of the dimensions is not produced before scanning the fact table, the performance of the query will be disappointing.