Keep Calm and Get Planning: The PPDAC Cycle
Consider the image below.
What do you think when you see data in a spreadsheet? How does it make you feel? How would you pull meaning out of rows and columns of data? Now consider a situation where you must make sense of something that is new to you or presented in an unfamiliar format. Perhaps you have endless rows and columns of data with no explanation. In this situation, you have probably asked yourself: where do I start?
Where to Start?
You can start with a methodology: the PPDAC cycle (Data Education in Schools, 2022). Now consider the image below. What is it? It shows the PPDAC cycle, a methodology for answering real-world problems using data. The cycle consists of five stages: Problem, Plan, Data, Analysis, and Conclusion.
The first stage is Problem. Before you delve into the data, ask what problems the data can solve. This stage is crucial because the questions asked here define the analysis plan. For example, the dataset shown above contains television viewing figures (STEM Learning, 2022). We can ask the following: What practical problems does this data solve? What questions result from examining the data? Who would be interested in the data and why? How can our analysis reflect the problems we want to solve? Perhaps we simply want to know whether viewership is higher for some shows on portable devices such as mobiles and tablets. If that is the case, then our analysis is simple and focused. Finally, consider what story the data tells us. We want any results from the analysis to be engaging as well as useful.
Many questions can arise in this phase, but they should be specific and relate to the available data.
In the Plan stage, data collection, preparation and analysis are planned. Again, the actual data analysis has not started. Planning an analysis also covers any ethical considerations associated with the data, such as access, permissions, and data protection.
In the Data stage, the data is examined to determine its accuracy and quality, and the steps needed to clean and prepare it for analysis are planned. Any questions about the data should also be addressed, such as the meaning of codes and categories, the data format needed for analysis, and any fields or records that should be eliminated. For example, we may want to reclassify TV programmes into categories such as ‘Entertainment’, ‘Drama’, or ‘Soap’. We would need to ensure that any classification we use matches domain-specific classifications. The Data stage is perhaps the most important because trust in the quality and accuracy of the data leads to trust in the analysis results.
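The reclassification step could be sketched in Python roughly as follows. This is only an illustration: the lookup table, the genre assigned to each programme, and the function name `classify` are assumptions made for this example and are not taken from the dataset. The useful part is the check for programmes that have no agreed category, because those are exactly the records that need a question answered before analysis begins.

```python
# Hypothetical programme-to-genre lookup. The genre assignments are
# illustrative assumptions, not values taken from the dataset.
GENRE_LOOKUP = {
    "EastEnders": "Soap",
    "Call the Midwife": "Drama",
    "Death in Paradise": "Drama",
    "Ant and Dec's Saturday Night Takeaway": "Entertainment",
}


def classify(programmes, lookup=GENRE_LOOKUP):
    """Return (classified, unmapped).

    classified maps each recognised programme name to its genre;
    unmapped lists the names that have no genre in the lookup.
    """
    classified = {}
    unmapped = []
    for name in programmes:
        key = name.strip()  # tidy stray whitespace before matching
        if key in lookup:
            classified[key] = lookup[key]
        else:
            unmapped.append(key)
    return classified, unmapped


classified, unmapped = classify(
    ["EastEnders ", "Call the Midwife", "Unknown Show"]
)
print(classified)
print(unmapped)
```

Anything that ends up in `unmapped` would be reviewed by hand rather than guessed at, which keeps the classification in line with the domain's own categories.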
In the Analysis stage, the data is cleaned and prepared, and then analysed. Analysis can include visualization, descriptive statistics, modelling, and evaluation of results. For example, we can use the TV viewing data to understand which programmes are watched on which device. The dataset is already ranked: the field ‘Rank’ orders programmes by total viewership. In this dataset, the top 3 programmes are ‘Ant and Dec’s Saturday Night Takeaway’, ‘Call the Midwife’ and ‘Death in Paradise’. However, when we look at programme viewership by device, we can see that ‘EastEnders’ has the highest viewership on mobile devices. The line graph shows that there is a steep drop in viewing numbers for all other programmes. Similar patterns emerge when looking at laptop and tablet viewership.
By visualising data in this manner, differences in viewing by device can be examined. This examination raises more questions about why certain programmes are viewed on portable devices such as mobiles, tablets and laptops.
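The comparison between overall rank and rank by device can be sketched in a few lines of Python. The programme names come from the example above, but every viewing figure below is a made-up placeholder chosen only to reproduce the pattern described, and the field names (`tv`, `mobile`, `tablet`, `laptop`) are assumptions about how the columns might be labelled.

```python
# Placeholder rows: the numbers are invented for illustration and are
# NOT the real viewing figures from the dataset.
ROWS = [
    {"programme": "Ant and Dec's Saturday Night Takeaway",
     "tv": 900, "mobile": 20, "tablet": 30, "laptop": 25},
    {"programme": "Call the Midwife",
     "tv": 850, "mobile": 15, "tablet": 25, "laptop": 20},
    {"programme": "Death in Paradise",
     "tv": 800, "mobile": 10, "tablet": 20, "laptop": 15},
    {"programme": "EastEnders",
     "tv": 500, "mobile": 60, "tablet": 45, "laptop": 40},
]

DEVICES = ("tv", "mobile", "tablet", "laptop")


def total_viewers(row):
    """Sum viewers across every device for one programme."""
    return sum(row[device] for device in DEVICES)


def rank_by(rows, key):
    """Return programme names ordered from highest to lowest by key."""
    ordered = sorted(rows, key=key, reverse=True)
    return [row["programme"] for row in ordered]


overall = rank_by(ROWS, total_viewers)
mobile = rank_by(ROWS, lambda row: row["mobile"])

print("Top overall:  ", overall[0])
print("Top on mobile:", mobile[0])
```

With these placeholder figures the programme that leads on mobile sits last in the overall ranking, which is the kind of difference that prompts the follow-up questions about portable viewing.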
In the Conclusion stage, the findings are summarized. The initial problem, the data, data preparation, and modelling results are reviewed. Additionally, deployment and actioning of results can be planned, and any new questions raised by the analysis can be examined.
Putting it all together
Now look at the image that we started with. Using PPDAC means that we know what to do with the data and how we can turn data into insight and solutions.
The PPDAC cycle allows us to make sense of the data, plan our analysis and gain insight that leads to real-world solutions. So, the next time you are planning to analyse data, keep calm, have a plan, and use the PPDAC cycle.
Version 1 & SPSS Statistics Software
How accurately students can illustrate data, and how clearly they can communicate it to any audience, depends on the tool they use. This is where data analysis software such as SPSS can deliver real benefit.
Version 1’s experienced consultants are on hand to help you find the best software and license type for your analytical and usage requirements.
Data Education in Schools (2022) PPDAC: The Data Problem Solving Cycle. Available at: dataschools.education.
STEM Learning (2022) The Teach Computing Curriculum. Year 9 Resources: Data Science. Available at: Data Science | STEM.