SAT Stats: Sampling I

Submitted by Tiffani on Tue, 02/20/2018 - 17:21

When I was in grad school, I took a research methods class in the public policy school. I had learned about different types of sampling at some point (Undergrad? High school? Who knows?), but this was the first time I had ever analyzed how samples worked and how taking different samples would affect what you could say about your research results. It was a great class -- and I still use information from that class when I think about studies that I read about in the news -- but the information felt like what one learns while getting a doctorate in sociology (or maybe a master's in public policy), not something that the average person knows.

So, imagine my surprise when a lot of that material showed up on the SAT.

The material on the SAT, of course, only scratches the surface of research design. But knowing about research design and sampling makes answering these questions a breeze. The problem is, most people never learn much about sampling or research design.

California middle schoolers learn that random sampling is when you pull people for research (or a survey) randomly, that convenience sampling is when you choose people who are easy to get, and that snowball sampling is when you take those convenient people and ask them to have their friends come take the survey too. Students who pay attention to these lessons leave middle school with a notion of the mechanics of sampling. But why, they still ask, does it matter?

It matters because when you do a survey or an experiment, you want to be able to talk about your results. And your ability to talk about your results -- and have your results mean anything at all -- is dramatically constrained by your sample.

But, let's take a step back and define sample.

And before we can do that, we need to define population. A population is the entire group of people you'd like to talk about when you share the results of your research. A population can be all of the people in a country, all of the sophomores in a high school, everyone who takes the SAT, or all women who take daily vitamins. You can circumscribe a population however you want. But, you must define the group of people you want to talk about: your population.

After you have defined your population, you sample from it. Most populations are too big to survey everyone in them, so we take a sample and rely on probability to let us generalize to the entire population. Probability tells us that if we choose a large enough random sample from a population, that sample should closely mirror the population, within a predictable margin of error.

How does that work? In the universe of coin flips, a fair coin comes up heads 50% of the time and tails 50% of the time. So in a sample of 100 coin flips, theoretical probability says you should expect 50 heads and 50 tails. Now, the real-life sample could be off. You could get 49 heads, or 48. And that's sampling error. Samples have error. But the numbers are close. And when samples are random and big enough, the actual statistics should be close enough to the theoretical statistics that we can draw conclusions (and, typically, people who conduct research also report how big the error is). (Note: if you only flip a coin 4 times, you have a pretty high likelihood of getting something that is not even close to 50/50 -- that's because the sample is too small. Small samples have very high error. Flip the coin 10 times and you will usually get closer to 50/50. Flip it 100 times and you should be closer still.)
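To put numbers on that note, here is a minimal Python sketch (Python is just a convenient choice; the post itself has no code). It uses the binomial distribution to compute the exact chance that a fair coin's share of heads lands within 10 percentage points of 50% for 4, 10, and 100 flips. The 10-point window and the function name `chance_near_half` are arbitrary choices made for illustration.

```python
from fractions import Fraction
from math import comb


def chance_near_half(flips, window=Fraction(1, 10)):
    """Exact probability that the share of heads in `flips` tosses of a fair
    coin lands within `window` of 1/2 (binomial distribution with p = 1/2)."""
    total = 2 ** flips  # number of equally likely heads/tails sequences
    hits = sum(
        comb(flips, heads)  # sequences with exactly `heads` heads
        for heads in range(flips + 1)
        if abs(Fraction(heads, flips) - Fraction(1, 2)) <= window
    )
    return Fraction(hits, total)


for n in (4, 10, 100):
    print(f"{n:>3} flips: {float(chance_near_half(n)):.3f}")
```

With 4 flips, only the 6 sequences with exactly 2 heads (out of 16) land that close, a 37.5% chance. With 10 flips it is 672 of 1,024 sequences (about 66%), and with 100 flips it rises to roughly 96%.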

So, if there are 500 sophomores in a high school and we want to know what they think of their math teacher, we can randomly survey 100 of them and generalize to the entire class of 500 sophomores. In social science, and on the SAT, we assume that as long as the sample isn't tiny, if 37% of the people surveyed think the math teacher is incompetent, we can bet that roughly 37% of all the sophomores agree, give or take a margin of error (with 100 of the 500 surveyed, that margin is about 8 or 9 percentage points in either direction at the usual 95% confidence level).
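How far off could a 37% result from 100 of 500 students be? Here is a minimal Python sketch using the common normal-approximation formula for the margin of error of a sample proportion, assuming 95% confidence (z = 1.96) and applying the finite population correction because the sample is a fifth of the class. The function name `margin_of_error` is hypothetical, not from any particular library.

```python
import math


def margin_of_error(p_hat, n, population=None, z=1.96):
    """Approximate margin of error for a sample proportion.

    Uses the normal-approximation formula z * sqrt(p(1 - p) / n). If a
    population size is given, it also applies the finite population
    correction sqrt((N - n) / (N - 1)), which shrinks the margin when the
    sample is a sizable share of the population.
    """
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the proportion
    if population is not None:
        se *= math.sqrt((population - n) / (population - 1))
    return z * se


moe = margin_of_error(0.37, 100, population=500)
print(f"37% +/- {moe:.1%}")  # prints a margin of about 8.5 points
```

That puts the plausible range for the whole class at roughly 28.5% to 45.5%. Ignoring the population correction, quadrupling the sample size roughly halves the margin, which is one concrete reason bigger samples are better.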

So, random samples are the holy grail. But, what about non-random samples? One thing that is important to know is that once a sample is non-random, those statistical assumptions go out the window. People generalize from non-random samples all the time, but they have no basis for doing it. Non-random samples really don't let you generalize beyond the people you talked to.

Why? Because the rules of probability rely on randomness. No randomness means no rules of probability.

But even once you rule out a fully random sample, some partially random and non-random samples are worse than others. There's an additional evil in sampling: selection bias. Non-random samples are statistically useless for generalizing, but they are not necessarily misleading. And some selection bias can even be benign. The worst types of selection bias are the ones that are related to the question at hand.

So, if we want to know what those sophomores think about their math teacher and we randomly survey students who are in the English building during 4th period, we don't have a random sample of sophomores, but we do have a random sample of sophomores who take English 4th period. Do we have any reason to believe that students who have English 4th period are more or less likely to dislike their math teacher than other students? If we don't, then this sampling strategy might yield believable results.

On the other hand, if we were to survey just the students who have poor math grades (or, for that matter, just the students who have good math grades), we would be biasing our answer. Students who struggle in math are probably more likely to dislike a math teacher than those who do well in math, and vice versa. Our question and our sample are related: that's the worst kind of sampling bias. And sometimes it's more subtle than that. What if we randomly survey students who are on campus 20 minutes before school starts? That might seem like an unbiased sample. But the kids who come to school early are probably the most conscientious ones, and they might be the hardest on a teacher who is perceived to be incompetent. When you sample, and when you choose a sampling site, always think about how that sample or site might be biased and whether that bias might affect how the respondents answer your questions.
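A small simulation can make the difference between an unrelated sampling site and a related one concrete. The Python sketch below builds a hypothetical class of 500 sophomores with invented numbers: struggling students dislike the teacher 60% of the time, students doing well 20% of the time, and exactly one in five students in each group has English 4th period. It then compares the true rate with three ways of sampling. All names and figures are illustrative assumptions.

```python
import random

# Hypothetical class of 500 sophomores (numbers invented for illustration).
# Each entry: (struggles_in_math, dislikes_teacher, number_of_students)
GROUPS = [
    (True, True, 150),   # struggling: 150 of 250 dislike the teacher
    (True, False, 100),
    (False, True, 50),   # doing well: 50 of 250 dislike the teacher
    (False, False, 200),
]


def build_population():
    """Return one (struggles, dislikes, has_english_4th) tuple per student."""
    population = []
    for struggles, dislikes, size in GROUPS:
        for i in range(size):
            # Exactly 20% of every group has English 4th period, so the
            # schedule is unrelated to opinions about the math teacher.
            has_english_4th = i < size // 5
            population.append((struggles, dislikes, has_english_4th))
    return population


def share_disliking(students):
    """Fraction of the given students who dislike the math teacher."""
    return sum(1 for _, dislikes, _ in students if dislikes) / len(students)


population = build_population()
true_rate = share_disliking(population)

# Sampling site unrelated to the question: everyone with English 4th period.
english_rate = share_disliking([s for s in population if s[2]])

# Sampling site related to the question: only students who struggle in math.
struggling_rate = share_disliking([s for s in population if s[0]])

# Simple random samples of 100, repeated 1,000 times to see where they center.
rng = random.Random(2018)
random_rates = [share_disliking(rng.sample(population, 100)) for _ in range(1000)]
average_random_rate = sum(random_rates) / len(random_rates)

print(f"whole class:          {true_rate:.0%}")
print(f"English 4th period:   {english_rate:.0%}")
print(f"struggling students:  {struggling_rate:.0%}")
print(f"random samples (avg): {average_random_rate:.1%}")
```

In this made-up class, the true rate is 40%. The English 4th period site also gives exactly 40% because the schedule has nothing to do with the question, while the struggling-students site reports 60%. Repeated random samples center near 40%, but no amount of repetition pulls the biased site back toward the truth.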

The basic sampling rules that the SAT tests are these:

Bigger samples are better than smaller samples.
Random samples are better than non-random samples.
Non-random or partially random samples with a bias that is related to the topic being investigated are the worst.

Here's an example of an SAT sampling question from a released test:

A researcher conducted a survey to determine whether people in a certain large town prefer watching sports on television to attending the sporting event. The researcher asked 117 people who visited a local restaurant on a Saturday, and 7 people refused to respond. Which of the following factors makes it least likely that a reliable conclusion can be drawn about the sports-watching preferences of all people in the town?

A. Sample size
B. Population size
C. The number of people who refused to respond
D. Where the survey was taken

This one has an obvious wrong answer (B: you can't change the population size -- it is what it is), one pretty good answer (A: a sample can always be bigger!), one trick answer (C: is it bad for people to refuse to respond?), and one correct answer (D: surveying only at a restaurant introduces bias, because people who go out to eat may be more likely to go out to sporting events, while those who like to eat at home may also prefer to watch sports at home).

So, what about those refusals? Refusals can be a problem (especially if there are a lot of them!), but in this case there are not many, and we don't know anything about the people who refused. Refusals become a big problem when the people who refuse are systematically different from those who agree. For instance, if the researchers conducting a survey about sports preferences were wearing jerseys that favored one team, refusals might introduce a lot of bias (people who don't like that team might be more likely to refuse). In this case, we don't know whether there was any pattern in the refusals (e.g., all of the older people refused, or all of the men refused) -- and even those patterns would not obviously bias the answers to the research question. We do know that there could be a relationship between the answer to the question and the survey site, so that's the answer that most obviously undermines this study (and, from a test prep perspective, the safer answer).
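One way to see why a handful of refusals matters less than a pile of them is to compute worst-case bounds: assume every refuser would have answered one way, then assume they all would have answered the other way. The Python sketch below uses an invented result (66 of the 110 respondents preferring TV), since the test question gives no actual answers, and the function name `refusal_bounds` is just for illustration.

```python
from fractions import Fraction


def refusal_bounds(said_yes, responded, refused):
    """Worst-case range for the share of everyone asked who would say "yes",
    when nothing is known about how the refusers would have answered."""
    asked = responded + refused
    low = Fraction(said_yes, asked)             # every refuser would say "no"
    high = Fraction(said_yes + refused, asked)  # every refuser would say "yes"
    return low, high


# Hypothetical result: 66 of the 110 respondents prefer watching on TV (60%).
low, high = refusal_bounds(66, 110, 7)
print(f" 7 refusals: {float(low):.1%} to {float(high):.1%}")

# Same 60% answer rate among respondents, but imagine 57 of 117 had refused.
low2, high2 = refusal_bounds(36, 60, 57)
print(f"57 refusals: {float(low2):.1%} to {float(high2):.1%}")
```

With 7 refusals, the answer among the 117 people asked can only range from about 56% to 62%; with 57 refusals, it could be anywhere from about 31% to 79%. These bounds only cover the people asked at the restaurant, though. They say nothing about the rest of the town, which is why the survey site remains the bigger problem.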

Of course, there's a lot more to sampling than avoiding obvious bias, but this is Research Design 101, and it's fun to see the SAT include it.

More on sampling and the conclusions you get to draw from your samples in the next post! But, for the time being, think about sampling when you read about studies and surveys. It can make you look at the results in a whole new way.
