About OR
OR Topics - Data Warehousing & Business Intelligence
RESOURCES
Glossary of data warehousing terms

Ad hoc query

A once-off data extraction exercise to answer an unplanned question. Also see report types.

Aggregate

Data stored at a summary level to save space and/or improve query performance.

Analytic applications

Applications which enable the user to take warehouse data and further manipulate it in order to make better-informed decisions. Examples could include demand forecasting, pricing, competitor analysis and customer segmentation.

CHAID (Chi-squared Automated Interaction Detector)

A market segmentation technique designed to split a population into two or more categories based on available attributes e.g. gender, employment status, race, age band etc. The results are often presented in the form of a tree.

Clickstream analysis

Analysis of data derived from web activity logs to reveal the sequence of page "hits". Usually needs to be linked to other customer data to derive useful insights.

Closed loop applications

Applications which enable authorised users to save data they have derived back into the warehouse for future reference. Common examples include budgets, forecasts and performance targets, which can then be compared to actual data when they become available. Credit scores, customer segments and allocated costs might also be prepared by analysts and then incorporated in the warehouse for general use.

Conformed dimensions

Standardised dimensions that can be used across the whole organisation. A conformed dimension table used with different fact tables helps to ensure consistency between data marts.

Cube

A multi-dimensional representation of data that can be viewed from different perspectives.

Data cleansing

The process of correcting errors and removing inconsistencies before importing data into the warehouse.

Data dictionary

The index to all data in the warehouse, and home of meta-data.

Data mart

Similar to a data warehouse, but holding only the data needed for a specific business function or department e.g. sales analysis. Also see concepts.

Data staging

The process of extracting, transforming, loading and checking data on its way from source system to data warehouse. Copies may be stored at intermediate steps in a data staging area. Also see ETL and concepts.

Data warehouse

A central repository containing stable, accurate, consistent, clearly understood data that are needed for management information and decision making across the whole organisation. Also see concepts.

Data warehousing

The process of creating a vision, planning, designing, building, using, maintaining, managing and enhancing a data warehouse and/or data mart(s).

Decision processing

A data warehouse supports "decision processing" when it becomes part of a closed loop, analytic application integrated into routine operational procedures.

Dimension

A business perspective useful for analysing data. A dimension usually contains one or more hierarchies that can be used to drill up or down to different levels of detail. Typical dimensions include product, customer, time, department, location and channel.

Dimensional modelling

The process of identifying the dimensions required for analysis, defining the hierarchies and levels they contain, and making sure they conform.

Dimension table

A table containing the data for one dimension within a star schema. The primary key is used to link to the fact table, and each level in the dimension has a corresponding field in the dimension table.

Dormant data

Warehouse data that is never accessed. Can be archived to save disk space.

Drill down

A tool feature that allows the user to obtain more detailed data at the click of a mouse. In OLAP, users would drill down to the next level in a dimension hierarchy.

Drill through

Another tool feature that allows the user to obtain more detailed data. Having drilled down to the lowest level of detail available in a given cube or report, users may need to drill through to another cube or report, e.g. to view the basic transaction level data related to the selected cell.

Entity

In a relational database, each table contains data describing the corresponding business object or "entity", e.g. a customer, order or order line.

Entity relationship modelling

The process of identifying relevant business entities, describing the relationships between them and listing their associated attributes. The resulting model can be represented as an entity relationship diagram, and defines the logical structure for a relational database.

ETL (Extract, Transformation & Load)

The process of extracting data from source systems, transforming this into the required structure and loading into the data warehouse. ETL tools are available to assist with this process.

Fact table

The central table in a star schema, containing the basic facts or measures of interest.

Dimension fields are also included (as foreign keys) to link to each dimension table.

Fat client (see also thin client)

A systems architecture in which fully functional software is installed on each user’s PC. OLAP performance depends largely on the processing power and memory of the PC.

Filter

A selection criterion applied to limit the data to be included in a report or analysis.

In SQL, filters are expressed as WHERE clauses, e.g. ‘WHERE COUNTRY = UK’. With OLAP tools, filters are changed automatically by slicing and dicing, and by drilling up and down.

Granularity (or grain)

The level of detail of the facts (or measures) stored in the data warehouse.

Hierarchy

An organisation of data into a logical tree structure defining parent-child relationships between the levels in a dimension. Controls data consolidation and drill down paths.

A typical time dimension would have a hierarchy based on date, week, month, quarter and year.

Measure

A numeric value stored in a fact table or cube. Typical examples include sales value, sales volume, price, stock and headcount.

Meta data

Data describing the data held in the warehouse. This may include a description of the tables and fields in the warehouse, what they mean, similar descriptions of the source data and a mapping between the two, details of the transformations made, the reliability of each item, when it was last updated etc. Also see concepts.

MOLAP (Multi-dimensional OLAP)

On line analytic processing based on data stored in a multi-dimensional database such as Hyperion’s Essbase or TM/1.

Multi-dimensional database

A database in which cubes are physically stored in a format optimised for OLAP. Most use a proprietary structure that usually can only be accessed by the vendor’s own OLAP tools.

Normalisation

The process of assigning each attribute to exactly one table (based on the entity it describes) so as to eliminate all data duplication. A key step in the design of most on-line transaction processing (OLTP) systems.

NUMA (Non-uniform memory architecture)

A relatively recent innovation in parallel processing, combining elements of SMP and MPP.

OLAP (On line analytic processing)

An intuitive approach to the analysis of patterns and trends in data, characterised by the representation of that data in multi-dimensional cubes, which normally include summary data at one or more levels. The cubes may be stored in a multi-dimensional database (see MOLAP) or as star schemas in a relational database (see ROLAP).

The term was coined to distinguish the way a data warehouse is often used to track business performance from the way transactional (OLTP) systems are used in the day to day running of the business. It has since become synonymous with the class of software tools that facilitate this approach.

Also see concepts

OLTP (On line transaction processing)

Operational or OLTP systems are designed to optimise the creation and updating of individual records. Typical transactions might represent the addition of new customers, taking orders or recording a change of address. Reporting and analysis systems, by contrast, typically need to summarise large numbers of individual transaction records, and perform better if summary data is pre-calculated and stored (see aggregates).

ODS (Operational data store)

A subject-oriented, integrated, frequently updated store of detailed data needed to support transactional systems with integrated data, e.g. the current master customer list shared by several systems. If available, the ODS is often a good source for dimension data.

When the term was first coined, the ODS was expected to support queries requiring data at the most detailed level available, which typically had to be excluded from the warehouse to keep size within reasonable bounds. Thus the ODS would contain maximum detail and be refreshed in real time, whereas the warehouse would be lightly summarised and refreshed at regular intervals.

Nowadays, hardware costs are less of a limiting factor, and it is more usual to restructure even the most detailed data for query and analysis and include them in the warehouse, possibly in near real time. Thus an ODS is less likely to be queried directly by users, and acts more like a source system.

Parallel processing (SMP, MPP)

A computing architecture that can handle many queries simultaneously. There are two main variations:

Symmetric multi-processing (SMP) involves linking identical processors in a cluster with shared disk and memory, and then allocating queries so as to balance the load on each processor. This approach improves throughput, but not the time taken to execute a given query once it reaches the front of the queue. SMP works well for ad hoc queries running against small to medium data warehouses, but bottlenecks on shared resources tend to limit scalability.

Massively parallel processing (MPP) involves linking and co-ordinating many similar processors and then splitting individual queries into multiple work packages that can be processed simultaneously and then glued back together. This approach improves individual query response times, and is truly scaleable, but requires all software to be specially designed to subdivide queries into appropriate packages. MPP is better for standard queries running against large data warehouses

Portal

A gateway providing a single point of access and a common interface to all information and applications. Usually accessed through a simple web browser.

Power user

A user with good analysis skills who makes frequent use of the data warehouse, and is capable of writing their own ad hoc queries. Also see types of user.

Query

A short computer program that extracts data from a database, usually returning a list of records matching the specified criteria. Usually written in SQL, or as a script which is translated into SQL when run.

Query governor

Software that manages the execution of queries. Allows an administrator to prevent the database server being overloaded by limiting the amount of data retrieved and/or the time taken. More experienced users may be granted higher limits.

Referential integrity

A set of rules imposed in a relational database to ensure consistency between data held in different tables. For example, every employee must have one and only one department. To prevent the creation of ‘orphans’, a department record cannot be deleted until all corresponding employee records have been deleted or amended.

Relational database

A database in which data are stored in tables corresponding to business entities such as customers or orders. The relationships between these entities are then used to create access paths to each piece of data as it is needed (by joining the corresponding tables).

Contrast this to a hierarchical database, in which the access path is determined by the physical record structure.

ROLAP (Relational OLAP)

On line analytic processing based on data stored in a relational database such as DB2, Oracle or Microsoft SQL Server, usually designed with one or more star schemas.

Semantic layer

Part of the software architecture that uses meta-data to translate physical database field names into readily understood business terminology for use in reporting and analysis. (See also data dictionary)

Slice & dice

A feature of OLAP tools that enables users to filter and rotate (or pivot) a cube to look at any cross-section as if it were a 2 dimensional spreadsheet.

Slowly changing dimensions

Dimensions such as customer and employee, in which attributes like address or job title may change irregularly and infrequently. Also see temporal inconsistency.

SQL (Stuctured Query Language)

An industry standard programming language for extracting data from any relational database. Syntax is English like, but is only easy to understand for relatively simple queries.

Star schema

A simple form of relational database in which a large, central fact table is joined to several smaller dimension tables which act as lookups. For example, an orders table might be joined to master product, customer, calendar and department tables.

Table

A two-dimensional array in which each column (or field) represents an attribute of an entity, and each row (or record) represents an occurrence of that entity.

Thin client (see also fat client)

A system architecture in which most software components run on a shared server, with minimal use of the client (or user PC). Performance depends more on network capacity and server power. Software usually requires less maintenance.

Time-boxing

An approach to project management that sets a series of strict, short-term deadlines for the completion of deliverables. The scope or resources input may be flexed, but the essential concept is to get as much as possible done in the time available.

Touchpoint

Any point of contact with a customer, including telephone calls, letters, faxes, email, web site visits and face to face meetings.

The following have also published glossaries on the world wide web:

Click Here Section Map Click Here RELATIONSHIP WITH OR Click Here MANAGING
Click Here OVERVIEW Click Here BUILDING Click Here MAIN

© 2002 The OR Society

Top of Page