Data Analysis

Shofa Jannah | Jul 14, 2023

Data analysis has its foundations in statistics, a field with a long and rich history.

According to archaeologists, the origins of statistics can be traced back to ancient Egypt and the construction of the pyramids. The ancient Egyptians were highly skilled at organizing data, meticulously recording their calculations and theories on papyri, much like early versions of spreadsheets and checklists.

Modern data analysts owe a great deal to these ingenious scribes, whose work paved the way for a more advanced and streamlined analytical process.

Data Analysis Life Cycle

The data analysis life cycle encompasses the journey from raw data to informed decision-making. Throughout this cycle, data undergoes various stages, including creation, consumption, testing, processing, and reuse. By adopting a life cycle model, all crucial team members can contribute to success by strategically organizing their tasks from the beginning to the conclusion of the data analysis process. Although the specific phases of the data analysis life cycle may vary among experts, there are common foundational elements that exist within every data analysis process.

EMC’s data analysis life cycle

EMC Corporation’s data analytics life cycle is cyclical with six steps:

  1. Discovery
  2. Pre-processing data
  3. Model planning
  4. Model building
  5. Communicate results
  6. Operationalize

For more information, refer to the e-book Data Science & Big Data Analytics.

SAS’s iterative life cycle

SAS, a leading data analytics solutions provider, created an iterative life cycle that can be used to produce repeatable, reliable, and predictive results:

  1. Ask
  2. Prepare
  3. Explore
  4. Model
  5. Implement
  6. Act
  7. Evaluate

SAS emphasizes the cyclical nature of its model by visualizing it as an infinity symbol. The life cycle has seven steps, several of which, such as Ask, Prepare, Model, and Act, also appear in other life cycle models. It differs in one respect: it adds a step after the Act phase, designed to help analysts evaluate their solutions and potentially return to the Ask phase.
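To make the loop back from Evaluate to Ask concrete, here is a minimal Python sketch. It is only an illustration, not anything published by SAS: the `run_cycle` function, the `is_satisfied` callback, and the `max_iterations` limit are all hypothetical names introduced for this example.

```python
# The seven phases, in the order listed above.
PHASES = ["Ask", "Prepare", "Explore", "Model", "Implement", "Act", "Evaluate"]


def run_cycle(is_satisfied, max_iterations=3):
    """Walk through the phases repeatedly until the evaluation is accepted.

    `is_satisfied` is a hypothetical callback that receives the iteration
    number and returns True when no further pass through the cycle is needed.
    """
    visited = []
    for iteration in range(1, max_iterations + 1):
        for phase in PHASES:
            visited.append((iteration, phase))
        # Evaluate is the last phase; its outcome decides whether we return to Ask.
        if is_satisfied(iteration):
            break
    return visited


# Assume, for illustration, that the second pass produces an acceptable result.
history = run_cycle(lambda iteration: iteration >= 2)
for iteration, phase in history:
    print(iteration, phase)
```

With this callback the cycle runs twice, so every phase is visited once per pass and the second pass starts again at Ask.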

For more information, refer to Managing the Analytics Life Cycle for Decisions at Scale.

Project-based data analytics life cycle

A project-based data analytics life cycle has five simple steps:

  1. Identifying the problem
  2. Designing data requirements
  3. Pre-processing data
  4. Performing data analysis
  5. Visualizing data

This data analytics project life cycle was developed by Vignesh Prajapati. It doesn’t include a sixth phase, or what we have been referring to as the Act phase. However, it still covers many of the same steps as the life cycles already described. It begins with identifying the problem, continues with preparing and processing the data before analysis, and ends with data visualization.
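The five steps can be sketched as a small Python script. This is a rough illustration under stated assumptions, not Prajapati's own implementation: the business question, the field names, and the records are all made up, and the "visualization" is a plain-text bar chart so that the example needs only the standard library.

```python
from collections import defaultdict
from statistics import mean

# 1. Identify the problem (hypothetical): which region has the highest average sale?
# 2. Design data requirements: every record needs a region and an amount.
REQUIRED_FIELDS = ("region", "amount")

# Made-up records, for illustration only.
raw_records = [
    {"region": "north", "amount": "10"},
    {"region": "south", "amount": "30"},
    {"region": "north", "amount": "20"},
    {"region": "south", "amount": None},  # missing amount
    {"region": "", "amount": "15"},       # missing region
]


def preprocess(records):
    """3. Pre-process: drop incomplete records and convert amounts to float."""
    clean = []
    for record in records:
        if all(record.get(field) for field in REQUIRED_FIELDS):
            clean.append({"region": record["region"], "amount": float(record["amount"])})
    return clean


def analyze(records):
    """4. Analyze: compute the average amount per region."""
    by_region = defaultdict(list)
    for record in records:
        by_region[record["region"]].append(record["amount"])
    return {region: mean(amounts) for region, amounts in by_region.items()}


def visualize(averages):
    """5. Visualize: build a text bar chart, one '#' per 5 units."""
    return [
        f"{region:<6} {'#' * int(avg // 5)} {avg:.1f}"
        for region, avg in sorted(averages.items())
    ]


averages = analyze(preprocess(raw_records))
for line in visualize(averages):
    print(line)
```

Note that the script stops at the chart, which mirrors the life cycle itself: nothing in it acts on the result.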

For more information, refer to Understanding the data analytics project life cycle.

Big data analytics life cycle

Authors Thomas Erl, Wajid Khattak, and Paul Buhler proposed a big data analytics life cycle in their book, Big Data Fundamentals: Concepts, Drivers & Techniques. Their life cycle is divided into nine steps:

  1. Business case evaluation
  2. Data identification
  3. Data acquisition and filtering
  4. Data extraction
  5. Data validation and cleaning
  6. Data aggregation and representation
  7. Data analysis
  8. Data visualization
  9. Utilization of analysis results

This life cycle appears to have three or four more steps than the previous life cycle models. In reality, the authors have simply broken down what we have been referring to as Prepare and Process into smaller steps. The model emphasizes the individual tasks required for gathering, preparing, and cleaning data before the analysis phase.
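As a rough illustration of how that preparation work splits into separate tasks, the sketch below models steps 3 through 6 as one small Python function each. It is not taken from the book: the sensor records, the field names, and the plausible-range bounds are all assumptions made for the example.

```python
import json

# Made-up input lines standing in for an acquired data feed.
raw_lines = [
    '{"sensor": "a", "reading": {"value": "21.5"}}',
    'corrupt line',
    '{"sensor": "b", "reading": {"value": "-999"}}',
    '{"sensor": "a", "reading": {"value": "22.5"}}',
]


def acquire_and_filter(lines):
    """Step 3: keep only the lines that parse as JSON objects."""
    records = []
    for line in lines:
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            records.append(parsed)
    return records


def extract(records):
    """Step 4: pull the fields of interest out of the nested structure."""
    return [(r["sensor"], float(r["reading"]["value"])) for r in records]


def validate_and_clean(rows, low=-50.0, high=60.0):
    """Step 5: drop readings outside a plausible range (bounds are assumptions)."""
    return [(sensor, value) for sensor, value in rows if low <= value <= high]


def aggregate(rows):
    """Step 6: group the readings by sensor into a single view."""
    grouped = {}
    for sensor, value in rows:
        grouped.setdefault(sensor, []).append(value)
    return grouped


prepared = aggregate(validate_and_clean(extract(acquire_and_filter(raw_lines))))
print(prepared)
```

Keeping each task in its own function makes it easy to see which step rejected a given record: the corrupt line never gets past step 3, while the out-of-range reading survives until step 5.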

For more information, refer to Big Data Adoption and Planning Considerations.