About OR
OR Topics - Data Warehousing & Business Intelligence
MANAGING A DATA WAREHOUSE
Planning a data warehouse initiative

The key steps in developing a data warehouse can be summarised as follows:

  1. Project initiation
  2. Requirements analysis
  3. Design (architecture, databases and applications)
  4. Construction (selecting and installing tools, developing data feeds and building reports)
  5. Deployment (release & training)
  6. Maintenance

It is advisable to conduct a pilot exercise before embarking on a full-scale development effort. This will include most of the above steps, and provides an opportunity to:

  • understand new concepts and processes, and identify potential problems;
  • make more realistic plans and manage expectations;
  • evaluate alternative tools;
  • demonstrate benefits and gain management commitment.

Testing should be an integral part of construction, not a separate step in the development process.

Project initiation

No data warehousing project should commence without:

  • a clear statement of business objectives and scope;
  • a sound business case, including measurable benefits;
  • an outline project plan, including estimated costs, timescales and resource requirements;
  • high level executive backing, including a commitment to provide the necessary resources;

A small team is usually set up to prepare and present a suitable project initiation document. This is normally a joint effort between business and IT managers. If the organisation has limited data warehousing experience, it is useful to obtain external advice at this stage. If the project goes ahead, the project plan and business case should be reviewed at each stage.

It is widely regarded as good practise to develop a data warehouse in small, manageable phases (see pitfalls). Thus the analysis, design, construction and deployment steps will be repeated in cycles.

It is generally a good tactic to provide something that is not already available during the first phase, as this will help to stimulate real interest. This could be new data or enhanced functionality. It is also better to start with something relatively easy, which the warehousing team can deliver whilst still learning the ropes.

See project management techniques for more information on relevant methodologies and useful references.

Requirements analysis

Establishing a broad view of the business’ requirements should always be the first step. The understanding gained will guide everything that follows, and the details can be filled in for each phase in turn.

Collecting requirements typically involves 4 principal activities:

  • Interviewing a number of potential users to find out what they do, the information they need and how they analyse it in order to make decisions. It is often helpful to analyse some of the reports they currently use.
  • Interviewing information systems specialists to find out what data are available in potential source systems, and how they are organised.
  • Analysing the requirements to establish those that are feasible given available data.
  • Running facilitated workshops that bring representative users and IT staff together to build consensus about what is needed, what is feasible and where to start.

Design

The goal of the design process is to define the warehouse components that will need to be built. The architecture, data and application designs are all inter-related, and are normally produced in parallel.

Architecture design

The warehouse architecture describes all the hardware and software components that form the data warehousing environment and explains:

  • how the components will work together;
  • where they are located (geographically and on what platform);
  • who uses them;
  • who will build and maintain them.

The architecture needs to be considered at the outset, as this provides a framework for the selection of tools and the detailed design of individual components during the first and subsequent phases of development.

Data design

This step determines the structure of the primary data stores used in the warehouse environment, based on the outcome of the requirements analysis. It is best to produce a broad outline quickly, and then break the detailed design into phases, each of which usually progresses from logical to physical:

The logical design determines what data are stored in the main data warehouse and any associated functional data marts. There are a number of data modelling techniques that can be used to help.

Once the logical design is established, the next step is to define the physical characteristics of individual data stores (including aggregates) and any associated indexes required to optimise performance (see database optimisation).

The data design is critical to further progress, in that it defines the target for the data feeds and provides the source data for all reporting and analysis applications.

Application design

The application design describes the reports and analyses required by a particular group of users, and usually specifies:

  • a number of template report layouts;
  • how and when these reports will be delivered to users;
  • the functional requirements for the user interface.

There may be one or more applications associated with each data mart or phase of development.

Construction

Warehouse components are usually developed iteratively and in parallel. That said, the most efficient sequence to begin construction is probably as follows:

1) Tool selection & installation

Selecting tools is best carried out as part of a pilot exercise, using a sample of real data. This allows the development team to assess how well competing tools handle problems specific to their organisation, and to test system performance before committing to purchase.

The most important choices are the:

Clearly these need to be compatible, and it is worth checking reference sites to make sure they work well together.

It pays to define standards and configure the development, testing and production environments as soon as tools are installed, rather than waiting until development is well underway. Most vendors are willing to provide assistance with these steps, and this is normally well worth the investment.

2) Data staging system

This comprises the physical warehouse database, data feeds and any associated data marts and aggregates. The following steps are typical:

  • Create target tables in the central warehouse database;
  • Request initial and regular extracts from source systems;
  • Write procedures to transform extract data ready for loading (optionally creating interim tables in a data staging area);
  • Write procedures to load initial data into the warehouse (using a bulk loader);
  • Create and populate any data marts;
  • Write procedure to load regular updates into the warehouse;
  • Develop special procedures for a once-off bulk load of historic data;
  • Write validation/exception handling procedures;
  • Write archiving & backup procedures;
  • Create a provisional set of aggregates;
  • Automate all regular procedures;
  • Document the whole process.

However thorough the design process, problems with the real data are bound to surface at this stage. Substantial time should be allowed to resolve any issues that arise, establish appropriate data cleansing procedures (preferably within the source systems environment) and to validate all data before they are released for live use.

3) Application development

This step can begin once a sample or initial extract has been loaded, but it is usually best to leave the bulk of application development until the underlying data mart (or part of the central warehouse) and associated meta-data (especially object names) are stable.

It is a good idea to involve users in the development of reports and analytic applications, preferably through prototyping, but at least by asking them to carry out acceptance testing. Most modern business intelligence tools do not require programming, so it is possible for non-IT staff to build some of their own reports as well.

Deployment

It is too often assumed that the first version of a data warehouse can be rolled out in a matter of weeks, simply by showing all the users how to use the new reporting tools.

In practice, training needs to cover not just the basic use of the tools, but also the data that have been made available, and, more significantly perhaps, the new business processes or different ways of working that are intended. This training usually works best if delivered on a one-to-one basis.

As well as training, planning for deployment needs to cover:

  • Installing and configuring desktop PCs - any hardware upgrades or amendments to the ‘standard build’ need to be organised well in advance;
  • Implementing appropriate security measures - to control access to applications and data;
  • Setting up a support organisation to deal with questions about the tools, the applications and the data. However thoroughly the data were checked and documented prior to publication, users are likely to spot anomalies requiring investigation and to need assistance interpreting the results they obtain from the warehouse and reconciling these with existing reports;
  • Providing more advanced tool training later, when users are ready, and assisting potential power users to develop their first few reports.

If the first users find errors and inconsistencies in the data, don’t feel comfortable with the tool or can’t be bothered to learn how to use it properly, or won’t accept new procedures and responsibilities, all the time spent building the warehouse may ultimately be wasted. The following guidelines will help to reduce these risks:

  • Do not start deployment until the data are ready (available and validated) and the tools and update procedures have been tested;
  • Use a small, representative group to try out the finished system before rolling out, including users with a range of abilities and attitudes;
  • Do not grant system access to users until they have been trained.

Maintenance

A data warehouse is not like an OLTP system: development is never finished, but follows an iterative cycle (analyse – build – deploy). Also, once live, a warehousing environment requires substantial effort to keep running. Thus the development team should not anticipate handing over and moving on to other projects, but to spend half of their time on support and maintenance.

The most important activities are:

  • Monitoring the realisation of expected benefits;
  • Providing ongoing support to users (see deployment);
  • Training new staff;
  • Assisting with the identification and cleansing of dirty data;
  • Maintaining both feeds & meta-data as source systems change over time;
  • Tuning the warehouse for maximum performance (this includes managing indexes and aggregates according to actual usage);
  • Purging dormant data;
  • Recording successes and using these to continuously market the warehouse.

In addition, mechanisms need to be established to manage growth, in particular the prioritisation of requested enhancements, which often require the addition of further data sources.

Click Here Section Map Click Here RELATIONSHIP WITH OR Click Here MAIN
Click Here OVERVIEW Click Here BUILDING Click Here RESOURCES

© 2002 The OR Society

Top of Page