### Collecting and analysing the data

Data collection started in 1994. Works Consultancy staff collected data from thirteen types of buildings, including office buildings, schools, theatres, swimming pools and shopping plazas. Twenty-seven locations were surveyed, and multiple samples were collected from facilities that were expected to be particularly variable.

Most of the data was collected electronically, with arrival times recorded whenever a pair of infrared beams was cut. Occupancy times were measured either by magnetic switches on cubicle doors or by infrared beams. Loggers normally used for rain gauges recorded the data, and were left in each location for at least three weeks. The result was a data set that far exceeded, in scope and scale, anything else we had seen.

Gender ratios and total building occupancies also had to be measured so that the results could be expressed in terms of the number of occupants. Where possible, buildings with known numbers of occupants were used; otherwise numbers were estimated by surveys. To ensure that at least the required performance criterion was met, average peak arrival rates (measured over 15-minute intervals) were used as inputs to the models. Employing peak rates also allowed for buildings where the arrivals were particularly ‘bursty’, such as schools and theatres.
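The text does not spell out exactly how the average peak arrival rate was computed. As a minimal sketch of one plausible reading, the Python below bins logged arrival timestamps into 15-minute intervals, takes the busiest interval on each day, and averages those daily peaks. The function name, the input format (seconds since the start of logging) and the choice to average daily peaks are all assumptions made for illustration.

```python
from collections import Counter

def average_peak_rate(arrival_times, bin_seconds=900, day_seconds=86400):
    """Average, over days, of the busiest 15-minute arrival count.

    arrival_times: iterable of timestamps in seconds since logging began
    (a hypothetical format). Returns a rate in arrivals per second.
    """
    counts = Counter()
    for t in arrival_times:
        day = int(t // day_seconds)
        interval = int((t % day_seconds) // bin_seconds)
        counts[(day, interval)] += 1

    # Busiest interval for each day that recorded at least one arrival.
    peak_by_day = {}
    for (day, _), n in counts.items():
        peak_by_day[day] = max(peak_by_day.get(day, 0), n)

    if not peak_by_day:
        raise ValueError("no arrivals recorded")
    mean_peak_count = sum(peak_by_day.values()) / len(peak_by_day)
    return mean_peak_count / bin_seconds
```

Days with no arrivals at all are ignored in this sketch; whether that is appropriate would depend on how the loggers were deployed.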

### The models

The main performance criterion we were to use was that the 90th percentile of the waiting time distribution was to be no more than 60 seconds. It is not clear where this criterion came from - possibly a chance remark of mine during an early demonstration of queueing calculations. However, by the time I returned to the project in 1996 it had become fixed in the contract. This was a little unfortunate, as although tail points of waiting time distributions are popular performance criteria, in contrast to mean waiting times for example, they are difficult to calculate theoretically except for queues where the arrivals occur as a Poisson process and the service or occupancy times are negative exponentially distributed.

In queueing theory, models with these characteristics and C servers or facilities are known by the shorthand notation ‘M/M/C’. While arrival processes did appear reasonably Poisson, at least over intervals where the arrival rate was constant, occupancy times were usually much less variable than negative exponential, with coefficients of variation between those of Erlang-2 and Erlang-3. So, strictly, a model of the M/G/C class, where the service time distribution can have any general form, was required. The one exception to this was urinal occupancy times, which were uncannily negative exponential!
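To make the comparison concrete: the coefficient of variation (standard deviation divided by mean) is 1 for a negative exponential distribution and 1/√k for an Erlang-k distribution, so roughly 0.71 for Erlang-2 and 0.58 for Erlang-3. The short Python sketch below shows how a set of measured occupancy times could be placed on that scale. The function names are illustrative only, and no data from the study is reproduced here.

```python
import math
import statistics

def coefficient_of_variation(samples):
    """Sample standard deviation divided by the sample mean."""
    return statistics.stdev(samples) / statistics.mean(samples)

def erlang_cv(k):
    """Coefficient of variation of an Erlang-k distribution: 1/sqrt(k)."""
    return 1.0 / math.sqrt(k)

def implied_erlang_shape(samples):
    """Shape k = 1/CV^2 (not necessarily an integer) matching the data."""
    cv = coefficient_of_variation(samples)
    return 1.0 / cv ** 2
```

An implied shape between 2 and 3 would correspond to the range described above, while a value near 1 would indicate nearly exponential occupancy times.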

There are very few easy theoretical results for M/G/C models, so my first approach was a simulation model (in GPSS/H). This was quite short - about 50 lines of program - and allowed occupancy times to be modelled more accurately than by simply assuming that they were exponential. It also had the advantage that aspects like the use of WCs in substitution for urinals (by men) could be modelled. The difficulties, as usual with simulation, were that the results were highly variable, and that it was difficult and very slow to use the simulation program in an inverse way - to find a level of input parameter (the arrival rate) that would produce a specified output (90th percentile of waiting time equal to 60 seconds). It was clear that hundreds of hours of simulation would be needed.
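The GPSS/H program itself is not reproduced in this article. As a rough illustration of the same idea, here is a minimal Python sketch of a first-come-first-served queue with Poisson arrivals, Erlang-k occupancy times and C identical facilities, which estimates a percentile of the waiting time. It is an assumption-laden stand-in, not the original model: all names and default values are hypothetical, and it does not attempt refinements such as men using WCs in place of urinals.

```python
import heapq
import random

def simulate_wait_percentile(c, lam, mean_service, k=2,
                             n=200_000, p=0.9, seed=1):
    """Estimate the p-th percentile of waiting time in an M/E_k/c queue.

    c: number of facilities; lam: arrival rate (per second);
    mean_service: mean occupancy time (seconds); k: Erlang shape.
    """
    rng = random.Random(seed)
    free_at = [0.0] * c          # time at which each facility next frees up
    heapq.heapify(free_at)
    t = 0.0
    waits = []
    for _ in range(n):
        t += rng.expovariate(lam)            # Poisson arrivals
        earliest = heapq.heappop(free_at)    # first facility to become free
        start = max(t, earliest)
        waits.append(start - t)
        # Erlang-k occupancy time: gamma with shape k and mean mean_service.
        service = rng.gammavariate(k, mean_service / k)
        heapq.heappush(free_at, start + service)
    waits.sort()
    return waits[int(p * (len(waits) - 1))]
```

Running this with different seeds gives noticeably different percentile estimates unless `n` is large, and finding the arrival rate that yields exactly 60 seconds means repeating the whole run many times inside a search, which illustrates both difficulties described above.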

I had also given Works Consultancy a spreadsheet M/M/C model (Figure 1), which could calculate the 90th percentile of the waiting time for these models directly. The formulas for this can be found in any queueing book (Gross and Harris, Section 2.3, for example). In particular, the tail of the waiting time distribution is exponential in form, and so the 90th percentile can be found by taking logs. This can be seen as the currently active formula in cell C13. With a bit of tweaking, the ‘Goal Seek’ tool in Excel would even solve for the mean interarrival time (cell C2) that gives a 90th percentile of the waiting-time distribution of exactly 60 seconds (C13). It should be noted that this is an early version of the spreadsheet. It was decided as a matter of policy that all users would wash their hands!
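The spreadsheet's cell formulas are not listed here, but the standard M/M/C results they rest on are short enough to sketch in Python. With arrival rate λ, service rate μ and C servers, the probability of having to wait is given by the Erlang C formula, and the probability of waiting longer than t is that probability multiplied by exp(-(Cμ - λ)t); setting this equal to 0.1 and taking logs gives the 90th percentile. The bisection search below plays the role of Goal Seek. Function names and tolerances are illustrative assumptions, not a transcription of the spreadsheet.

```python
import math

def erlang_c(c, lam, mu):
    """Probability that an arrival has to wait in an M/M/c queue."""
    a = lam / mu                 # offered load
    rho = a / c                  # utilisation, must be below 1
    if rho >= 1.0:
        raise ValueError("queue is unstable: lam must be less than c * mu")
    term = 1.0                   # a^n / n!, starting at n = 0
    total = 1.0
    for n in range(1, c):
        term *= a / n
        total += term
    last = term * a / c / (1.0 - rho)   # a^c / (c! * (1 - rho))
    return last / (total + last)

def wait_percentile(c, lam, mu, p=0.9):
    """p-th percentile of waiting time, from P(W > t) = C * exp(-(c*mu - lam)*t)."""
    pw = erlang_c(c, lam, mu)
    tail = 1.0 - p
    if pw <= tail:
        return 0.0               # at least a fraction p of arrivals do not wait
    return math.log(pw / tail) / (c * mu - lam)

def max_arrival_rate(c, mu, target=60.0, p=0.9, tol=1e-9):
    """Largest arrival rate whose p-th percentile wait is at most target."""
    lo, hi = 0.0, c * mu
    while hi - lo > tol * c * mu:
        mid = (lo + hi) / 2.0
        if wait_percentile(c, mid, mu, p) <= target:
            lo = mid
        else:
            hi = mid
    return lo
```

For example, `1 / max_arrival_rate(3, 1 / 90)` would give the mean interarrival time at which three facilities with a hypothetical 90-second mean occupancy just meet the 60-second criterion. The search relies on the percentile increasing with the arrival rate, which holds for this model.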