The Data Warehouse Toolkit (Second Edition)
The Complete Guide to Dimensional Modeling
Ralph Kimball
Margy Ross
--Different Information Worlds
One of the most important assets of any organization is its information. This
asset is almost always kept by an organization in two forms: the operational
systems of record and the data warehouse. Crudely speaking, the operational
systems are where the data is put in, and the data warehouse is where we get
the data out.
In other words, the operational systems focus on getting data in, while the data
warehouse focuses on getting data out.
--Goals of a Data Warehouse
1 The data warehouse must make an organization’s information easily accessible.
2 The data warehouse must present the organization’s information consistently.
3 The data warehouse must be adaptive and resilient to change.
4 The data warehouse must be a secure bastion that protects our information assets.
5 The data warehouse must serve as the foundation for improved decision
making.
6 The business community must accept the data warehouse if it is to be
deemed successful.
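--Goals of a data warehouse, in short:
1 Make the organization's information easy to access;
2 Present that information consistently;
3 Adapt to change and remain resilient;
4 Act as a secure bastion that protects information assets;
5 Serve as the foundation for improved decision making;
6 Win acceptance from the business community, without which it cannot be deemed successful.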
--Components of a Data Warehouse
1 Operational Source Systems
The source systems maintain little historical data, and if you have a
good data warehouse, the source systems can be relieved of much of the
responsibility for representing the past.
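In short, the operational source systems keep little history; a good data warehouse
takes over most of the responsibility for representing the past.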
2 Data Staging Area
The key architectural requirement for the data staging area is that it is off-limits to
business users and does not provide query and presentation services.
Unfortunately, some data warehouse
project teams have failed miserably because they focused all their
energy and resources on constructing the normalized structures rather than
allocating time to development of a presentation area that supports improved
business decision making.
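In other words, these teams spent all their energy and resources on the normalized
structures instead of leaving enough time to build the presentation layer that
supports better decision making.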
It is acceptable to create a normalized database to support the staging processes;
however, this is not the end goal. The normalized structures must be off-limits to
user queries because they defeat understandability and performance. As soon as a
database supports query and presentation services, it must be considered part of the
data warehouse presentation area. By default, normalized databases are excluded
from the presentation area, which should be strictly dimensionally structured.
3 Data Presentation
We typically refer to the presentation area as a series of integrated data marts.
A data mart is a wedge of the overall presentation area pie.
We have several strong opinions about the presentation area. First of all, we
insist that the data be presented, stored, and accessed in dimensional schemas.
Our second stake in the ground about presentation area data marts is that they
must contain detailed, atomic data.
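First, data must be presented, stored, and accessed in dimensional models;
second, the data marts in the presentation area must contain detailed, atomic data.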
Data in the queryable presentation area of the data warehouse must be dimensional,
must be atomic, and must adhere to the data warehouse bus architecture.
If the presentation area is based on a relational database, then these dimensionally
modeled tables are referred to as star schemas. If the presentation area
is based on multidimensional database or online analytic processing (OLAP)
technology, then the data is stored in cubes.
4 Data Access Tools
We use the term tool loosely to refer to the variety of capabilities
that can be provided to business users to leverage the presentation area for
analytic decision making.
A data access tool can be as simple as an ad hoc query tool or as complex as a
sophisticated data mining or modeling application.
In short, data access tools range from simple ad hoc query tools to complex data
mining or modeling applications.
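The star schema described under Data Presentation, and the kind of ad hoc query a data access tool might issue against it, can be sketched roughly as follows. This is a minimal illustration using Python's built-in sqlite3 module, not an example from the book; the retail tables, columns, and sample rows (`sales_fact`, `date_dim`, `product_dim`) are all assumptions made up for the sketch.

```python
import sqlite3

# Hypothetical retail star schema; every table and column name here is
# illustrative and does not come from the book.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim (
    date_key   INTEGER PRIMARY KEY,
    full_date  TEXT,
    month_name TEXT
);
CREATE TABLE product_dim (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT
);
-- The fact table sits in the middle of the star and points at each dimension.
CREATE TABLE sales_fact (
    date_key     INTEGER REFERENCES date_dim(date_key),
    product_key  INTEGER REFERENCES product_dim(product_key),
    quantity     INTEGER,
    sales_amount REAL
);
INSERT INTO date_dim VALUES
    (20020101, '2002-01-01', 'January'),
    (20020201, '2002-02-01', 'February');
INSERT INTO product_dim VALUES
    (1, 'Whole Milk 1L', 'Dairy'),
    (2, 'Rye Bread', 'Bakery');
INSERT INTO sales_fact VALUES
    (20020101, 1, 3, 4.50),
    (20020101, 2, 1, 2.25),
    (20020201, 1, 2, 3.00);
""")

# An ad hoc query: constrain and group by dimension attributes,
# then sum an additive fact.
AD_HOC_QUERY = """
SELECT p.category, d.month_name, SUM(f.sales_amount) AS total_sales
FROM sales_fact AS f
JOIN date_dim AS d ON d.date_key = f.date_key
JOIN product_dim AS p ON p.product_key = f.product_key
GROUP BY p.category, d.month_name
ORDER BY p.category, d.month_name
"""

rows = conn.execute(AD_HOC_QUERY).fetchall()
for category, month_name, total_sales in rows:
    print(category, month_name, total_sales)
```

The query touches the fact table only through its keys; all filtering and labeling comes from the dimension tables, which is what keeps the schema understandable to business users.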
--Operational Data Store
Most commonly, an ODS is implemented to deliver operational reporting,
especially when neither the legacy nor more modern on-line transaction processing
(OLTP) systems provide adequate operational reports.
The ODS as a reporting instance may be a steppingstone
to feed operational data into the warehouse.
In other cases, ODSs are built to support real-time interactions, especially in customer
relationship management (CRM) applications such as accessing your
travel itinerary on a Web site or your service history when you call into customer
support.
--Fact Table
1 A row in a fact table corresponds to a measurement. A measurement is a row in a
fact table. All the measurements in a fact table must be at the same grain.
2 The most useful facts in a fact table are numeric and additive.
3 We often describe facts as continuously valued mainly as a guide for the
designer to help sort out what is a fact versus a dimension attribute.
4 It is theoretically possible for a measured fact to be textual; however, the condition
arises rarely. The designer should make every effort to put textual measures into
dimensions because they can be correlated more effectively with the other textual
dimension attributes and will consume much less space.
5 It is very important that we do not try to fill the fact table with zeros representing
nothing happening because these zeros would overwhelm most of our fact tables.
6 Fact tables express the many-to-many relationships between dimensions in dimensional
models.
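The fact table points above (a single grain, additive numeric facts, no zero-filled rows, and many-to-many relationships between dimensions) can be illustrated with a small sketch. The grain chosen here, one row per product per store per day, and all names and values are assumptions for illustration only.

```python
from collections import defaultdict
from typing import NamedTuple


class SalesFact(NamedTuple):
    # Assumed grain for this sketch: one row per product per store per day.
    date_key: int
    product_key: int
    store_key: int
    quantity: int        # numeric, additive fact
    sales_amount: float  # numeric, additive fact


# Only measurements that actually happened are stored. Product 2 did not
# sell on 20020102, so there is simply no row for it (no zero-filled row).
facts = [
    SalesFact(20020101, 1, 10, 3, 4.50),
    SalesFact(20020101, 2, 10, 1, 2.25),
    SalesFact(20020102, 1, 10, 2, 3.00),
    SalesFact(20020102, 1, 11, 4, 6.00),
]


def total_by(fact_rows, key_name):
    """Sum the additive sales_amount fact along one dimension key."""
    totals = defaultdict(float)
    for row in fact_rows:
        totals[getattr(row, key_name)] += row.sales_amount
    return dict(totals)


by_product = total_by(facts, "product_key")
by_store = total_by(facts, "store_key")
by_date = total_by(facts, "date_key")
```

Because the facts are additive and every row is at the same grain, the totals agree no matter which dimension they are summed along. The rows also show the many-to-many relationship: product 1 sells in stores 10 and 11, and store 10 sells products 1 and 2.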
--Dimension Tables
1 Dimension tables are the entry points into the fact table. Robust dimension attributes
deliver robust analytic slicing and dicing capabilities. The dimensions implement
the user interface to the data warehouse.
2 The best attributes are textual and discrete. Attributes should consist of real
words rather than cryptic abbreviations.
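As a rough sketch of dimensions acting as entry points, the snippet below filters and groups fact rows purely through textual, discrete dimension attributes. The product dimension, its attribute names, and the sample values are hypothetical.

```python
# Hypothetical product dimension keyed by surrogate key. Attributes are
# real words rather than cryptic codes, so they can serve as report labels.
product_dim = {
    1: {"product_name": "Whole Milk 1L", "category": "Dairy", "brand": "Farm Fresh"},
    2: {"product_name": "Rye Bread", "category": "Bakery", "brand": "Oven Gold"},
    3: {"product_name": "Butter 250g", "category": "Dairy", "brand": "Oven Gold"},
}

# Simplified fact rows for the sketch: (product_key, sales_amount).
sales_facts = [(1, 4.50), (2, 2.25), (3, 3.00), (1, 1.50)]


def slice_by(attribute, where=None):
    """Group sales by one dimension attribute, optionally filtering on others."""
    where = where or {}
    totals = {}
    for product_key, amount in sales_facts:
        product = product_dim[product_key]
        # Constraints are applied to dimension attributes, never to the fact row.
        if all(product[name] == value for name, value in where.items()):
            label = product[attribute]
            totals[label] = totals.get(label, 0.0) + amount
    return totals


by_category = slice_by("category")
oven_gold_by_category = slice_by("category", where={"brand": "Oven Gold"})
```

Every attribute added to the dimension becomes a new way to constrain or group the facts without touching the fact rows, which is one reading of "robust dimension attributes deliver robust slicing and dicing".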
--Dimensional Modeling Myths
Myth 1. Dimensional models and data marts are for summary data only.
Myth 2. Dimensional models and data marts are departmental, not enterprise, solutions.
Myth 3. Dimensional models and data marts are not scalable.
Myth 4. Dimensional models and data marts are only appropriate when there is a
predictable usage pattern.
Myth 5. Dimensional models and data marts can’t be integrated and therefore lead
to stovepipe solutions.
--Common Pitfalls to Avoid
Pitfall 10. Become overly enamored with technology and data rather than
focusing on the business’s requirements and goals.
Pitfall 9. Fail to embrace or recruit an influential, accessible, and reasonable
management visionary as the business sponsor of the data warehouse.
Pitfall 8. Tackle a galactic multiyear project rather than pursuing more manageable,
while still compelling, iterative development efforts.
Pitfall 7. Allocate energy to construct a normalized data structure, yet run
out of budget before building a viable presentation area based on dimensional
models.
Pitfall 6. Pay more attention to backroom operational performance and ease
of development than to front-room query performance and ease of use.
Pitfall 5. Make the supposedly queryable data in the presentation area overly
complex. Database designers who prefer a more complex presentation
should spend a year supporting business users; they’d develop a much
better appreciation for the need to seek simpler solutions.
Pitfall 4. Populate dimensional models on a standalone basis without regard
to a data architecture that ties them together using shared, conformed
dimensions.
Pitfall 3. Load only summarized data into the presentation area’s dimensional
structures.
Pitfall 2. Presume that the business, its requirements and analytics, and the
underlying data and the supporting technology are static.
Pitfall 1. Neglect to acknowledge that data warehouse success is tied directly
to user acceptance. If the users haven’t accepted the data warehouse as a
foundation for improved decision making, then your efforts have been
exercises in futility.
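Pitfall 4 refers to shared, conformed dimensions. One way to picture them is two separate fact tables that both use the same product dimension, so their results can be lined up on a common attribute. The sketch below is an illustration under that assumption; the table contents and the function names `rollup` and `drill_across` are hypothetical.

```python
# One shared, conformed product dimension (product_key -> category),
# used by both business processes below. All values are made up.
product_dim = {1: "Dairy", 2: "Bakery"}

# Two separate fact tables, one per business process.
sales_facts = [(1, 10), (2, 4), (1, 6)]   # (product_key, units_sold)
inventory_facts = [(1, 50), (2, 20)]      # (product_key, units_on_hand)


def rollup(fact_rows):
    """Summarize one fact table by the conformed category attribute."""
    totals = {}
    for product_key, value in fact_rows:
        category = product_dim[product_key]
        totals[category] = totals.get(category, 0) + value
    return totals


def drill_across(sales_rows, inventory_rows):
    """Query each fact table separately, then merge on the shared attribute."""
    sold = rollup(sales_rows)
    on_hand = rollup(inventory_rows)
    categories = sorted(set(sold) | set(on_hand))
    return {c: (sold.get(c, 0), on_hand.get(c, 0)) for c in categories}


report = drill_across(sales_facts, inventory_facts)
for category, (units_sold, units_on_hand) in report.items():
    print(category, units_sold, units_on_hand)
```

If each fact table carried its own private product dimension with different keys or category labels, the merge step would have nothing reliable to match on, which is the stovepipe outcome the pitfall warns about.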