About OR
OR Topics - Data Warehousing & Business Intelligence
BUILDING A DATA WAREHOUSE

In a typical data warehousing environment, the data flow sequentially through four stages, as illustrated in the diagram of components below. The development team must therefore design, build and maintain an integrated information supply chain to support these four key processes:

Stage 1 – Data capture

The data in the warehouse will usually come from a number of source systems. Most of the data required for reporting and analysis will originally have been entered through operational systems and stored in associated transactional databases. External data are often entered into spreadsheets or personal databases. In some cases, source data may be captured automatically e.g. web logs.

If new data are required, a suitable system may need to be built to capture them. Otherwise, only minor changes should be needed to existing systems, e.g. to tighten up validation procedures.

In all cases, the owners of source data are responsible for maintaining quality - and this may require substantial effort.

Stage 2 – Transformation & cleansing

Raw data from each source have to be extracted, merged and loaded into the data warehouse. As part of this process, data are usually re-structured to optimise subsequent use for querying, reporting and analysis. This is often done in stages, in a data staging area.

These data feeds need to be run on a regular basis to keep the data warehouse up-to-date. To minimise disruption to other systems, and warehouse users, this often has to be completed within a tight overnight time window.

Developing ETL procedures is by far the most difficult step in building a data warehouse, but using ETL tools can save time, in particular on long-term maintenance.

Stage 3 – Aggregation & analysis

Selected data are taken from the central warehouse using query tools and processed to produce useful results. Often, the most frequently accessed data are first summarised and stored in functional data marts (often in the form of OLAP cubes) to improve response times. Additional performance measures are typically derived at the same time. Analytic applications may also be developed to help users turn raw data into useful information.

Stage 4 - Presentation

Results from the previous stage are formatted for presentation and delivered, usually in the form of reports. Several different report types are normally needed to suit different types of user. The results might appear as text, tables or charts and could be viewed on-line, printed, published on a web server or distributed by email.

Querying, aggregation, analysis and presentation are usually handled using one or more reporting & analysis tools.

For further information on building a data warehouse, see the following:

BUILDING A DATA WAREHOUSE
Click Here
Components
Click Here
Design issues
Click Here
Techniques
 
Click Here Section Map Click Here RELATIONSHIP WITH OR Click Here MANAGING
Click Here OVERVIEW Click Here MAIN Click Here RESOURCES

© 2002 The OR Society

Top of Page