|
In a typical data warehousing environment, the data flow
sequentially through four stages, as illustrated in the diagram
of components below.
The development team must therefore design, build and maintain
an integrated information supply chain to support these
four key processes:
Stage 1 – Data capture
The data in the warehouse will usually come from a number
of source
systems. Most of the data required for reporting
and analysis will originally have been entered through operational
systems and stored in associated transactional databases.
External data are often entered into spreadsheets or personal
databases. In some cases, source data may be captured automatically
e.g. web logs.
If new data are required, a suitable system may need to
be built to capture them. Otherwise, only minor changes
should be needed to existing systems, e.g. to tighten up
validation procedures.
In all cases, the owners of source data are responsible
for maintaining quality - and this may require substantial
effort.
Stage 2 – Transformation & cleansing
Raw data from each source have to be extracted, merged
and loaded into the data
warehouse. As part of this process, data are usually
re-structured to optimise subsequent use for querying, reporting
and analysis. This is often done in stages, in a data
staging area.
These data feeds need to be run on a regular basis to keep
the data warehouse up-to-date. To minimise disruption to
other systems, and warehouse users, this often has to be
completed within a tight overnight time window.
Developing ETL
procedures is by far the most difficult step in
building a data warehouse, but using ETL tools can save
time, in particular on long-term maintenance.
Stage 3 – Aggregation & analysis
Selected data are taken from the central warehouse using
query tools and processed to produce useful results. Often,
the most frequently accessed data are first summarised and
stored in functional
data marts (often in the form of OLAP cubes)
to improve response times. Additional performance measures
are typically derived at the same time. Analytic
applications may also be developed to help users
turn raw data into useful information.
Stage 4 - Presentation
Results from the previous stage are formatted for presentation
and delivered, usually in the form of reports. Several different
report
types are normally needed to suit different types
of user. The results might appear as text, tables or
charts and could be viewed on-line, printed, published on
a web server or distributed by email.
Querying, aggregation, analysis and presentation are usually
handled using one or more reporting
& analysis tools.
For further information on building a data warehouse, see
the following:
|