WALTER PECK: This episode is going to show you, or, I hope, convince you, that statistics are important and that statistical analysis has a role. Yes, I know the old saw that you can prove anything with a statistical test, and the idea that statistics, because it's mathematical, is hard and should therefore be avoided. But really, statistics can't be avoided in science.
For example, imagine a drug for a disease where, without the drug, there's a 25% survival rate, and in a test group given the drug, there's a 27% survival rate. The question is, is that difference significant? Is it a big enough difference that the uncertainty that comes with a small sample doesn't make it meaningless? In other words, does it matter? Have we had an effect?
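To make the sample-size point concrete, here is a minimal Python sketch of a pooled two-proportion z-test, one standard way to compare two survival rates. The group sizes are hypothetical, since the example doesn't give them, and the test relies on a normal approximation that is reasonable when each group has a fair number of both survivors and deaths.

```python
import math

def two_proportion_z_test(p1, p2, n1, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    # Overall survival rate if the drug made no difference.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Two-sided tail area of the standard normal distribution beyond |z|.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# 25% survival without the drug vs. 27% with it; group sizes are hypothetical.
for n in (100, 1_000, 10_000):
    z, p = two_proportion_z_test(0.25, 0.27, n, n)
    print(f"n = {n:>6} per group: z = {z:.2f}, p = {p:.3f}")
```

Run as written, the same two-point difference is nowhere near significant with 100 patients per group, but is significant well beyond the usual 0.05 level with 10,000 per group.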
That's what statistical analysis does for us. So it's a very powerful set of tools which we really have to understand if we want to understand science, particularly biological science.
I need to start, in order to do this, with a little examination of the nature of science and the scientific method. I know some of you are probably thinking, oh, I've heard this a gazillion times. But I need to do a little bit just to remind you.
The first thing I want to mention is that science, when done well, is empirical, objective, and tentative. Empirical means it's based upon observation: anything that can be detected with some kind of detector, whether it's one you and I were born with, like our eyes, or an instrument, like a radiation detector. The observations are sometimes numerical and sometimes categorical, but either way they're observable.
The second thing is that science is objective. If I have a ruler, and I'm measuring how long this pen is, and I get, I don't know, 8.7 centimeters, you should get 8.7 centimeters also within a little bit of uncertainty. In other words, it's independent of you and me. And we'll get into what the alternative is on that in just a second.
The third thing about science that I think is really important to understand is that it's tentative. Tentative means we're not getting capital T permanent Truth. We're getting lowercase t, temporary truth. It's our best guess, our best explanation of the empirical world as we know it now. Maybe tomorrow we'll get some new results. Maybe tomorrow we'll have a better explanation. That's why Einstein's theories took over from Newton's.
So science is tentative. It doesn't mean it's useless. It's very useful. But it is not looking for capital T Truth.
Just because something doesn't meet these three criteria doesn't mean it's not a worthwhile human endeavor. Take objectivity. Art, for example: I love art, and I love Van Gogh's work. If he hadn't lived, the paintings of his that we enjoy wouldn't exist. Why? Art is subjective. The individual matters.
Next, empirical. Empirical means observable. Well, there are some things in human life that are not observable and are very important, things like rational thought, whole branches of philosophy, and mathematical reasoning. Just because something isn't based upon observation doesn't mean it's not important.
And finally, tentative. Tentative, like I said, means that tomorrow we might change our conclusions. There are whole areas of human endeavor, like religion and morals, that are arguably based upon capital T Truth. That doesn't mean they're not valuable. It just means they're a different way of approaching the human condition.
I'm going to go through a little bit about the scientific method. I want to make sure you realize it's not hard and fast. It's not as if a scientist wakes up in the morning and says, today I'm doing step 3, and tomorrow, step 4a. Nor do you have to have all your t's crossed and your i's dotted ahead of time. You can apply it flexibly, as long as you understand that this is the ideal you're seeking, and realize that if you're departing from the ideal, you've got some explaining to do.
The first thing in the scientific method is to look around the universe and find a question. Most importantly, find a question that's interesting to you. If you don't find something that's interesting to you, you're not going to have a passion for it.
Also, make sure it's a question about something that's measurable. It could be measurable with numerical data, like how much does that person weigh, or how bright is that star? But it could also be categorical, like is it red or is it blue? Or it could be something that seemingly can't be measured with numbers at all, like attitudes. In survey research in the social sciences, researchers often set up scales that go from one to seven, where one means something like dislike greatly and seven means like greatly. Now you can put numbers on things that, at first blush, don't seem numerical.
In any case, you want to find a question that's interesting and involves something measurable. And for the purposes of this set of videos, for the Tetrahymena studies, make sure it's a question that's doable. Keep it limited. Keep it simple. Given the limited time you probably have, a simple study is more likely to have a successful outcome, whether that means finding something interesting or finding that an effect does not exist, than a study that tries to do too much.
After you've found a question, do some background research. If it's an interesting question, there are probably other people who have worked on it before. Science is a communal undertaking; we have a whole community of people doing really great research. Ask your teacher for help with accessing sources.
On the basis of the question and the research you've done, you can come up with a good hypothesis. You've probably heard a hypothesis called an educated guess. I think it's a little bit more than a guess, because you're basing it upon research that's already been done.
Hypotheses are often in if/then form: if this is true, then blah, blah, blah. A hypothesis puts your tentative understanding of the question to the test, so you want a hypothesis that's testable and says something interesting.
At that point, the hard work starts. You design the experiment. When you design the experiment, you're setting up what are called the protocols. Again, I want to emphasize there's no way that anybody designing an experiment, at least in my experience, is so smart that they can predict every eventuality, every problem that arises. So it has to be flexible. But as far as possible, make it as precise and predictive as you can.
As you design the experiment, you'll be thinking about questions like these. What will you do? Will it be a controlled experiment, or an observational one? How many times will you do it? What equipment will you need? What will you measure? What will you do with the data, and how, statistically, will it be analyzed, if appropriate? In other words, you want to think ahead about what you're going to be doing and how that fits into the way good science is done, and try, as far as possible, to anticipate the limitations. Then you run the experiment and collect the data.
Through the years, I've noticed that one of the biggest problems with my students is brilliant thoughts, great hypotheses, and crappy data collection techniques. What does that mean? You've got to write down the numbers, with units and labels, in an organized fashion. Have a data table ready ahead of time. Often a teacher will want to approve it before you actually do your study.
Now we're coming to the meat of today's talk, and that is analyze the data. This is where statistics come in. And again, like I told you a few minutes ago, science is tentative. We're not getting final truth. What we're doing is we're taking numbers in a very limited period of time and a very limited sample, and trying to look at the results and see what they tell us about the fundamental reality that we're really interested in.
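What statistics gives us is how significant the result is. Roughly speaking, it tells us how likely it is that we'd see a difference this large just by chance, from the luck of which individuals ended up in our sample, if there were no real effect. That's what statistics is all about.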
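A permutation test is one way to see what significance measures. This sketch, using made-up numbers, pools two groups, tries every way of splitting the pooled values into two groups of the original sizes, and counts how often a split gives a difference in means at least as large as the one actually observed. That fraction is the p-value: how often chance alone would produce a result this extreme if the group labels made no difference.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(a, b):
    """Exact two-sided permutation test for a difference in means.

    Enumerates every way to split the pooled data into groups the size of
    a and b, so it is only practical for small samples.
    """
    pooled = list(a) + list(b)
    observed = abs(mean(a) - mean(b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group1 = [pooled[i] for i in idx]
        group2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so floating-point rounding doesn't drop ties.
        if abs(mean(group1) - mean(group2)) >= observed - 1e-9:
            extreme += 1
    return extreme / total

# Hypothetical measurements from a control group and a treated group.
control = [5.1, 4.9, 5.3, 5.0, 5.2]
treated = [6.0, 6.2, 5.9, 6.1, 6.3]
print(permutation_p_value(control, treated))
```

Here there are 252 ways to split the ten values into two groups of five, and only two of them (the real split and its mirror image) are as extreme as what was observed, so p = 2/252, about 0.008. For larger samples, where full enumeration is impractical, the usual approach is to shuffle the labels randomly many thousands of times instead.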
For example, remember the social science scale I told you about, with one being strongly disagree, seven being strongly agree, and the numbers in between? Imagine that on some question the males averaged 4.2 and the females averaged 4.0, and let's say we surveyed 100 males and 100 females. Is that a significant difference? Does it indicate that males and females differ in their opinion on this issue?
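The two averages alone can't answer that question; how spread out the responses are matters too. This sketch computes Welch's t statistic from summary statistics. The means and sample sizes come from the example, but the standard deviations (1.5 and 0.5) are hypothetical, chosen to show how the same 0.2-point difference can be insignificant or significant depending on the spread.

```python
import math

def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and approximate degrees of freedom from summary statistics."""
    v1 = sd1 ** 2 / n1  # squared standard error of the first mean
    v2 = sd2 ** 2 / n2  # squared standard error of the second mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Males average 4.2, females 4.0, 100 of each (from the example).
# The standard deviations are NOT given in the example; these are hypothetical.
for sd in (1.5, 0.5):
    t, df = welch_t_from_summary(4.2, sd, 100, 4.0, sd, 100)
    # With df near 200 the t distribution is close to normal, so a normal
    # approximation to the two-sided p-value is adequate here.
    p = math.erfc(abs(t) / math.sqrt(2))
    print(f"sd = {sd}: t = {t:.2f}, df = {df:.0f}, p ~ {p:.3f}")
```

With a standard deviation of 1.5, t is under 1 and p is around 0.35, so there's no evidence of a difference; with a standard deviation of 0.5, t is about 2.8 and p falls well below 0.05.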
Or imagine we have categorical data. Say there are two kinds of bread, bread a and bread b. You do a study, and 38% of the people liked bread a while 36% liked bread b. Each response falls clearly into one category or the other. Is that enough of a difference to indicate that you should be marketing bread a rather than bread b? You get the idea.
So based on the kind of data you have, you have to decide what kind of test to run. I'm going to cover two basic kinds of tests and then a potpourri of others. Just so you know, statistics is an amazingly complex undertaking.
Like I said, I think you can understand the basic ideas. But there are lots and lots of ins and outs. There are many, many different kinds of statistical tests, and all sorts of details that I can't even begin to cover in the little time I have with you today or in the other videos. But I am going to give you a couple of options.
The first is something called a t-test. You use a t-test when you have two samples and the data are numerical. Let's say you have males and females, and you measure their weight. Here's the average, x-bar, for the sample of males, and here's x-bar for the sample of females. The question is, is the difference between them significant? That's what a t-test tells you.
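Here is a minimal sketch of a two-sample t-test in plain Python. It uses Welch's version, which doesn't assume the two groups have equal variances, and gets the p-value by numerically integrating the t distribution (Simpson's rule) so that no extra packages are needed; that integration approach is a choice made for this sketch. The weights are invented. In practice a graphing calculator or a library routine such as SciPy's scipy.stats.ttest_ind, with equal_var=False for Welch's test, does the same job.

```python
import math
from statistics import mean, variance

def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 minus the area under the t curve between -|t| and |t|.

    The area from 0 to |t| is found with Simpson's rule (steps must be even)
    and doubled, since the t distribution is symmetric about 0.
    """
    b = abs(t)
    h = b / steps
    area = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return 1 - 2 * area

def welch_t_test(x, y):
    """Welch's two-sample t-test; returns (t, df, two-sided p)."""
    v1 = variance(x) / len(x)  # variance() is the sample variance (divides by n - 1)
    v2 = variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(x) - 1) + v2 ** 2 / (len(y) - 1))
    return t, df, two_sided_p(t, df)

# Hypothetical body weights in kilograms.
males = [72.1, 80.4, 68.9, 77.3, 85.0, 74.6, 79.2, 70.8]
females = [61.5, 66.2, 58.9, 70.1, 63.4, 59.8, 65.7, 62.3]
t, df, p = welch_t_test(males, females)
print(f"t = {t:.2f}, df = {df:.1f}, p = {p:.2g}")
```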
If, on the other hand, you have categorical data, like the bread example a minute ago, where each answer is yes or no, or each individual is either male or female, and you want to use those categorical data to determine significance, you run a chi-square test. Chi-square tests are relatively easy to run, and fun to do. Sometimes you can even sort numerical data into categories so that a chi-square test applies. Use your judgment.
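Below is a sketch of a chi-square test of independence for a 2x2 table, written out by hand so the expected-count calculation is visible. The counts are hypothetical: they assume 100 different people tasted each bread, which matches the 38% and 36% figures from the bread example. No continuity correction is applied, and the usual rule of thumb is that every expected count should be at least about 5.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            # Count we'd expect in this cell if rows and columns were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # A 2x2 table has 1 degree of freedom; the chi-square(1) upper tail
    # equals the two-sided standard normal tail beyond sqrt(chi2).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: rows are bread a / bread b, columns are liked / did not like.
bread = [[38, 62], [36, 64]]
chi2, p = chi_square_2x2(bread)
print(f"chi-square = {chi2:.3f}, p = {p:.2f}")
```

With these counts the chi-square statistic is only about 0.09 and p is far above 0.05, so a 38% versus 36% split in samples of 100 gives no real evidence that people prefer bread a.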
But sometimes you'll want a different test because those first two don't cover what you're doing. One is called an ANOVA, a one-way analysis of variance. It's on the calculator; you can look it up there and let the calculator do the work. It's like a t-test, except it handles more than two samples of numerical data.
Again, as an example, let's say you apply fertilizer at 1x concentration, 2x, 3x, and 4x, and you want to find the effect on the productivity of your bean field. You run those four treatments, and now you have four different sample groups whose results, quantitative and numerical, you want to compare. Rather than doing a t-test a bunch of times, which would also raise the chance of finding a "significant" difference purely by luck, you do an ANOVA, which looks for whether there are any significant differences among the groups.
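Here is a sketch of the one-way ANOVA calculation for the fertilizer example, computing the F statistic from between-group and within-group variation. The bean yields are invented for illustration. Turning F into a p-value requires the F distribution with the two degrees-of-freedom values shown; a calculator's ANOVA function or SciPy's scipy.stats.f_oneway handles that step.

```python
from statistics import mean

def one_way_anova(*groups):
    """One-way ANOVA; returns (F, df_between, df_within)."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of individual values around their own group's mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical bean yields (kg per plot) at 1x, 2x, 3x and 4x fertilizer.
yields_1x = [20.1, 22.3, 19.8, 21.0]
yields_2x = [23.5, 24.1, 22.8, 25.0]
yields_3x = [26.2, 25.4, 27.1, 26.8]
yields_4x = [26.9, 27.5, 28.3, 26.4]
f, df_b, df_w = one_way_anova(yields_1x, yields_2x, yields_3x, yields_4x)
print(f"F = {f:.1f} with ({df_b}, {df_w}) degrees of freedom")
```

A large F means the group means differ by much more than the scatter within each group would lead you to expect; ANOVA by itself doesn't say which groups differ, only that at least one does.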
Sometimes your interest is: if I change x, what happens to y? There are two basic ways of analyzing that, one simpler and one a little more complex. Correlation just asks: when x goes up, does y go up? Or when x goes up, does y go down? It doesn't say anything about causality, or about how much y goes up or down as x changes. It simply asks whether they move together.
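A short sketch of Pearson's correlation coefficient r, a common measure of whether two numerical variables move together. An r of +1 means the points fall exactly on an upward-sloping line, -1 means exactly on a downward-sloping line, and values near 0 mean no straight-line trend. Pearson's r only captures linear association, and the light and height data here are hypothetical. Python 3.10 and later also provide statistics.correlation for the same calculation.

```python
import math
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # how x and y vary together
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours of light per day vs. plant height in cm.
light = [4, 6, 8, 10, 12, 14]
height = [10.2, 12.9, 14.1, 17.5, 18.0, 21.3]
print(round(pearson_r(light, height), 3))
```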
If, on the other hand, you're more interested in how y changes as x changes, you're looking for regression. Regression fits a best-fit line or curve to the data and tells you about the trend relating the two variables, such as how much y changes, on average, for each unit change in x.
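Here is a sketch of simple linear regression by least squares, which finds the slope and intercept of the straight line that minimizes the sum of squared vertical distances from the data points. The slope answers the question correlation doesn't: roughly how much y changes per unit change in x. The data are hypothetical, and fitting a curve rather than a line is beyond this sketch. Python 3.10 and later also provide statistics.linear_regression.

```python
from statistics import mean

def least_squares_line(x, y):
    """Slope and intercept of the least-squares line y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx  # the best-fit line passes through (mean x, mean y)
    return slope, intercept

# Hypothetical data: hours of light per day vs. plant height in cm.
light = [4, 6, 8, 10, 12, 14]
height = [10.2, 12.9, 14.1, 17.5, 18.0, 21.3]
slope, intercept = least_squares_line(light, height)
print(f"height ~ {slope:.2f} * light + {intercept:.2f}")
```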
Finally, the last step in the scientific method, or maybe not the last, but a very important late step, is to disseminate the results. As I said before, science is a social activity. There's a community of people, and that's how we really make progress: we stand on each other's shoulders.
As part of the NIH-funded ASSET Program, students and teachers in middle and high school science classes are encouraged to participate in student-designed independent research projects. Veteran high school teacher Walter Peck, whose students regularly engage in independent research projects, presents this series of five videos to help teachers and students develop a better understanding of basic statistical procedures they may want to use when analyzing their data.