The term could be broken down into the following processes and steps, in this order:
- How to frame the question that needs to be answered?
- How to identify the data, and the sources of that data, needed to analyse and answer the question?
- How to locate and capture the data required for analysis?
- How to cleanse the data, mash up the data or conduct 'data munging', once the data has been located and collected?
- How to identify the right technology to capture and cleanse the data for analysis?
- How to choose the right selection of scientific methods to answer the given question, and the order in which those methods should be applied [machine learning algorithms included]?
- How to identify the appropriate technology to support the scientific analysis to be conducted?
- How to correctly match the relevant data attributes from the source data to the relevant parameters of each scientific method being used?
- How to interpret the analytical results produced by the various scientific methods and relate them back to the question being asked?
- How to visually represent the analytical results to the audience, in order to deliver the answer to the original question, and which technology is best to use for this?
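
To make the sequence concrete, here is a minimal Python sketch that walks a toy example through the steps above. Everything in it is an assumption made for illustration: the question, the inline CSV data, the column names and the function names are all made up, and the standard library stands in for the technology choices. A least-squares slope is used simply as one possible scientific method, not as a recommended one.

```python
import csv
import io
from statistics import mean

# Frame the question (hypothetical example question).
QUESTION = "Do larger orders take longer to deliver?"

# Identify, locate and capture the data.
# This made-up inline CSV stands in for a real data source.
RAW_CSV = """order_id,items,delivery_days
1,2,3
2,5,4
3,,6
4,8,7
5,3,n/a
6,10,9
"""


def capture(raw_text):
    """Capture the raw records as a list of dicts, one per CSV row."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def cleanse(rows):
    """Cleanse the data: drop rows with missing or non-numeric values."""
    clean = []
    for row in rows:
        try:
            clean.append((float(row["items"]), float(row["delivery_days"])))
        except (ValueError, TypeError):
            continue  # skip rows that cannot be parsed as numbers
    return clean


def least_squares_slope(pairs):
    """Apply the chosen method, mapping 'items' to x and 'delivery_days' to y."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


def interpret(slope):
    """Relate the result back to the question being asked."""
    answer = "yes" if slope > 0 else "no"
    return f"{QUESTION} On this toy data: {answer} ({slope:.2f} days per extra item)."


def text_chart(pairs):
    """Represent the results visually, here as a crude text bar chart."""
    return [f"{int(x):>2} items | " + "#" * int(y) for x, y in sorted(pairs)]


if __name__ == "__main__":
    pairs = cleanse(capture(RAW_CSV))
    print(interpret(least_squares_slope(pairs)))
    print("\n".join(text_chart(pairs)))
```

Each function maps to one or more of the steps, so a governance standard could, in principle, be attached to each stage separately (for example, a rule on what `cleanse` is allowed to discard).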
If these ten steps were to define the Laws of Data Science, what governance and standards should be put in place to support them?
Look forward to your comments and suggestions.
Chair of the Analytics Network