Data literacy is increasingly important as our access to data grows. Many of us walk around with powerful computers in our pockets (or, more honestly, our hands) at all times. The first and last thing many of us do each day? Scroll through digital media.
Yet, while the volume of data we’re interacting with increases dramatically, algorithms continually limit the perspectives we encounter. When we use data out of context or blindly accept the data we see, we compound the problem.
To help you think critically about data, we’ve put together this checklist of data literacy best practices.
Why we use data
As discussed in our first article in this series, people use statistics and quantitative data for many reasons. These include:
- Reducing uncertainty and guesswork — e.g. a retail manager might use sales data to better understand when to schedule staff and what inventory to stock or put on sale.
- Solving problems — e.g. analyzing historic data for an organization’s interactions with different vendors in the supply chain might help a logistics manager identify bottlenecks or weak links.
- Tracking progress — e.g. establishing key performance metrics and then measuring results against them can help assess what’s working or needs refinement.
- Making plans — e.g. analyzing past patterns can reveal larger trends that help anticipate future actions or events. With data revealing new opportunities or shortcomings, a utility might optimize its energy production for sustainability gains.
Whatever your motivation, data literacy is essential.
Questions for those presenting and publishing data
Effective data collection and analysis are true skills. No wonder the U.S. Bureau of Labor Statistics (BLS) reported a median annual wage of $108,020 for data scientists in May 2023. The BLS also projects employment in this field will grow 36 percent from 2023 to 2033, much faster than the average for all occupations.
Yet even if you’re not trained in data science, you may want to present and publish data to establish credibility and make a point. Before using data in that pitch to a client, presentation to your boss, or campaign to customers, consider the following prompts.
1. Where is the data from?
Identifying the source of the data will help you evaluate its quality. For example, you might see the statistic that 87% of teenagers want their parents to limit their phone time. That could really help someone marketing an app that tracks phone activity. Yet a carefully designed survey of 20,000 teens at high schools around the country could be cited more reliably than a finding based on asking 10 teens related to the investigator. And if the app’s marketer is the source of the data, which serves their business goals, that could make you think twice about the research’s objectivity.
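For readers who like to see the math, here is a rough Python sketch of why the 20,000-teen survey deserves more trust than the 10-teen one. It uses the textbook normal-approximation margin of error for a proportion, which assumes a random sample. The 87% figure and both sample sizes come from the hypothetical example above, and `margin_of_error` is a helper we made up for illustration.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion from a simple random sample.

    Uses the normal approximation z * sqrt(p * (1 - p) / n), which is
    unreliable for very small samples like n = 10.
    """
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical: both surveys report that 87% of teens want limits on phone time.
for n in (10, 20_000):
    moe = margin_of_error(0.87, n)
    print(f"n = {n:>6,}: 87% +/- {moe * 100:.1f} percentage points")
```

With 10 respondents, the result is roughly 87% plus or minus 21 percentage points (an upper bound past 100%, a sign the approximation breaks down at such small sizes), while 20,000 respondents narrow it to under half a point. Keep in mind that a big sample only shrinks random error; it can’t fix a sample that was skewed to begin with.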
2. Who is represented in this data and who is missing?
Continuing the discussion above, this question addresses sampling bias and representation. Incomplete or skewed datasets can lead to conclusions that disadvantage or ignore certain groups. For example, if the data a healthcare system is using to determine its new patient offerings only comes from its urban locations, rural health patterns might be underrepresented.
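A quick worked example shows how much a missing group can shift a conclusion. The sketch below uses made-up numbers: a condition affects 20% of urban patients and 35% of rural patients, and rural patients make up 30% of the health system. The helper `blended_rate` is hypothetical.

```python
def blended_rate(rates: dict[str, float], shares: dict[str, float]) -> float:
    """Population-wide rate as a share-weighted average of each group's rate."""
    return sum(rates[group] * shares[group] for group in rates)


# Hypothetical numbers for illustration only.
rates = {"urban": 0.20, "rural": 0.35}   # share of patients with the condition
shares = {"urban": 0.70, "rural": 0.30}  # share of the health system's patients

urban_only_estimate = rates["urban"]     # what urban-only data would show
true_rate = blended_rate(rates, shares)  # what the full patient base looks like
print(f"Urban-only estimate: {urban_only_estimate:.1%}")
print(f"Whole patient base:  {true_rate:.1%}")
```

The urban-only data suggests 20%, but across all patients the rate is about 24.5%, and the rural pattern itself never appears in the data at all.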
3. What was the original purpose of collecting this data?
It can take an extra step or two, but investigating the original context of the data collection can help you determine its usefulness to you. For instance, a credit union that gathered data on the main reasons members call its after-hours call center shouldn’t then use those findings to set member priorities for in-branch service during business hours. And a grocery store manager couldn’t simply assume that the credit union’s customer priorities would apply to a grocery store, either.
4. How might my own biases affect my interpretation of this data?
Acknowledging personal biases is essential for maintaining objectivity. Say you’re analyzing graduation rates at a community college. Your analysis of the data might be shaped by your own perceptions. Several types of bias could have an impact, including:
- Experiential bias. If you graduated from college easily, following in the footsteps of your siblings and several generations of your family, you might dismiss evidence that first-generation college students suffer from a lack of access to support that helps them understand the system.
- Confirmation bias. If you believe that certain student populations aren’t as academically prepared for college, you might attribute differences in graduation rates to test scores and grades rather than to factors such as willingness to attend or awareness of support services.
- Institutional bias. Maybe you’re doing this analysis because you work at a community college. As a result, you have preexisting ideas about how students behave in the system. Because of what you’ve seen at your own institution, you may fail to consider other explanations of the data.
By asking yourself how your own biases could be affecting your interpretation and presentation of the data, you open yourself up to identifying assumptions and exploring other possible explanations.
Questions to ask when consuming data
Every day, people analyze data and statistics to understand the world and drive action. Before relying on data to make a personal choice or professional decision, ask the following questions to avoid being misled by deceptive or incomplete data.
1. What is the source of this data?
Yes, we’re starting in the same place again. But this question is critical to evaluating credibility. Data from a middle schooler’s science project, for example, is not going to go through the same rigorous vetting as data from the Census Bureau.
Try to determine not only where the data came from but also how it was verified. By looking into the methodology, you can better evaluate the validity of the source and whether bias could be a concern. After all, a pharmaceutical company publishing data about its own drug’s effectiveness has different incentives than an independent research lab studying the same drug.
2. How current is this data?
A study on the efficiency of computers would have quite different results today than 30 years ago. Generally, you want to look for more current data as it is more likely to be relevant to your contemporary context.
Of course, if you are using data to determine trends and forecast future events, you’ll need to go to historical data (but look for reputable sources with an appropriate context and purpose).
3. What documentation is available?
“AI provides excellent answers 100% of the time.” Wow, that’s amazing. But, wait, how did that get determined? Reputable sources will document their methodology, define their terms, even outline their potential biases and any study limitations.
That statement about AI sounds a lot different when you learn that the researchers only asked one AI tool two questions, and they were basic arithmetic questions.
Looking at the details of the data, you can also determine whether it is actually meaningful. A beauty company might tell you that its skin cream is 25% more effective. Yet if that refers to a one-point difference on a 100-point scale, the improvement may be too small to matter in practice, and without knowing the sample size and variability, you can’t even tell whether it’s statistically significant.
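Here’s a rough Python sketch of the skin-cream example. All the numbers are assumptions for illustration: scores of 4 and 5 points on the 100-point scale (a 1-point gap that works out to “25% more”), a standard deviation of 10 points in each group, and two possible group sizes. The significance check is a simple large-sample z-test; for small groups, a t-test would be the more appropriate choice.

```python
import math


def relative_change(baseline: float, new: float) -> float:
    """Change as a fraction of the baseline (what '25% more' usually means)."""
    return (new - baseline) / baseline


def z_statistic(mean_a: float, mean_b: float, sd_a: float, sd_b: float,
                n_a: int, n_b: int) -> float:
    """Two-sample z statistic for a difference in means (large-sample approximation)."""
    standard_error = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    return (mean_b - mean_a) / standard_error


# Hypothetical average scores on a 100-point skin-improvement scale.
competitor, cream = 4.0, 5.0
print(f"Absolute difference: {cream - competitor:.0f} point out of 100")
print(f"Relative difference: {relative_change(competitor, cream):.0%}")

# The same 1-point gap, assuming a standard deviation of 10 points per group.
for n in (20, 5_000):
    z = z_statistic(competitor, cream, 10, 10, n, n)
    verdict = "significant" if abs(z) > 1.96 else "not significant"
    print(f"n = {n} per group: z = {z:.2f} ({verdict} at the 5% level)")
```

The same 1-point gap is nowhere near significant with 20 people per group (z of about 0.32) but clearly significant with 5,000 per group (z of 5.00). Even then, significance only suggests the gap probably isn’t noise; whether 1 point out of 100 matters for your skin is a separate question.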
4. Does the person presenting this information have an agenda?
Whether you’re seeing a news headline, advertisement, or social media post, understanding who’s presenting the data and their motivations is crucial. A micro-brewery that posts about the health benefits of drinking lagers daily might be presenting skewed data. If the World Health Organization presented the same data, you might want to visit your local micro-brewery, because the WHO’s mission is to promote world health and keep people safe.
5. Is this correlation rather than causation?
OK, we’re going to get a bit more technical here, but this one’s important. You’ve probably seen a headline or social post telling you something like “people who do X live longer.” You think, “Ooh, I should do that, too.”
But the question is whether doing X actually caused people to live longer. Or could the difference be explained by something else? Maybe people who do X also have healthier eating habits and exercise more often. The fact that they do X may be related (a correlation), but it may not be the direct cause (causation) of the longer lifespan.
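To make this concrete, here’s a small Python simulation of a made-up population. By construction, doing X has no effect on lifespan; healthy habits raise lifespan and also make people more likely to do X. All the probabilities, lifespans, and function names are assumptions for illustration.

```python
import random
from statistics import fmean


def simulate(n: int = 20_000, seed: int = 42) -> list[tuple[bool, bool, float]]:
    """Build a hypothetical population where habits, not X, drive lifespan."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        # Confounder: healthy eating and exercise, present in half the population.
        healthy = rng.random() < 0.5
        # Healthy people are more likely to do X (70% vs. 30%).
        does_x = rng.random() < (0.7 if healthy else 0.3)
        # Lifespan depends on habits plus random noise; X has zero effect by design.
        lifespan = 75 + (5 if healthy else 0) + rng.gauss(0, 3)
        people.append((healthy, does_x, lifespan))
    return people


def lifespan_gap(group: list[tuple[bool, bool, float]]) -> float:
    """Average lifespan of people who do X minus that of people who don't."""
    doers = [life for _, x, life in group if x]
    others = [life for _, x, life in group if not x]
    return fmean(doers) - fmean(others)


people = simulate()
overall = lifespan_gap(people)
within_healthy = lifespan_gap([p for p in people if p[0]])
within_unhealthy = lifespan_gap([p for p in people if not p[0]])

print(f"Gap across everyone:       {overall:+.2f} years")
print(f"Gap among healthy people:  {within_healthy:+.2f} years")
print(f"Gap among everyone else:   {within_unhealthy:+.2f} years")
```

Across everyone, people who do X live about two years longer on average. Compare people with the same habits, though, and the gap all but disappears, because the habits (a confounding factor) were driving both.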
6. Is this consistent with other data?
If a statistic surprises you, that could signal a larger problem. When you encounter a finding that doesn’t align with other reputable sources (or with common sense), you will likely want to do a lot more digging before believing it.
Do more (better!) with data literacy
We can claim that you’re 99% more likely to use or consume data effectively by asking yourself these questions, but you’ll notice that we don’t have any documentation to back that up.
Still, we can say with confidence that it’s important to make the effort to use and consume data intentionally. Whichever side of the screen, presentation, paper or other communication you’re on, ask yourself the questions in this checklist. Added vigilance can help you analyze data effectively, use it responsibly, and make data-driven decisions more confidently.
At Sogolytics, we take data and data literacy seriously. Work with our experts to ensure you’re developing surveys that are not only usable but also elicit reliable and relevant responses. Contact our team today!
FAQs
Q: What are the best practices for improving data literacy?
A: Always verify sources, check for biases, understand the methodology, and cross-reference findings with other credible sources.
Q: Why is data literacy important in decision-making?
A: Data literacy helps reduce uncertainty, track progress, and make data-driven decisions that improve outcomes.
Q: How can biases affect data interpretation?
A: Experiential, confirmation, and institutional biases can skew how data is understood, leading to misleading conclusions.
Q: What is the difference between correlation and causation in data?
A: Correlation means two things are related, while causation means one directly affects the other. Misinterpreting this can lead to false assumptions.
Q: How can businesses apply data literacy best practices?
A: Organizations should ensure proper data collection, use credible sources, and encourage a culture of critical thinking and verification.