WALTER PECK: Well, at this point, I hope you've watched the video on the ideas behind the T-test. I'm now going to reinforce how the T-test is done. And the whole idea here is to show you how you could do it manually if you want. But graphing calculators these days have really good statistical packages that include doing T-tests and giving you the significance levels that you want.
Again, let me remind you, the whole idea of a T-test is to give you some idea of whether you should accept your H0 and, therefore, reject your Ha, or accept your Ha and reject your H0. So again, I want to remind you we're using data for a hypothesized species of ancient human beings. And we're trying to decide whether the males and the females have the same average size or, on average, have a different size. Our H0 was that the sizes are the same. And our Ha was that the sizes are different.
Here's just part of a sample. And I'm going to assume that there's only 10 male individuals in these samples and 10 female individuals in these samples. That sounds like a small sample, but for fossil humans, that's absurdly large. But I want to show you the ideas. So imagine we have 10 individuals in the male sample.
And then these are the masses in kilograms. All the numbers on this table have mass as their variable, and the unit is kilograms. The Xi's are the individual observations. So the first individual is 29.2 kilograms, the second is 34.1 kilograms, and on we go.
The second column is the deviation from the mean for each of the individual Xi's. That's Xi minus X bar. Well, that tells us we've got to figure out what X bar is. Well, X bar is pretty easy. Think about it.
If you have two 100s on quizzes and a 70 on your third quiz, and you want to know what your average, your mean is, you'd say, let's see, 100 plus 100 plus 70 is 270 divided by 3, the sample size, the n. 270 divided by 3 is 90.
So I can do the same thing here to get the X bar for that sample. X bar is equal to the sum of the individual values divided by the sample size. And I just created these numbers-- 30.0 kilograms.
So now we have X bar. So now I just simply subtract Xi minus X bar all the way down. The negatives don't really matter to me, honestly. I threw them in there just so you wouldn't get confused. For our calculation of standard deviation, we don't just need the deviations here, we need the deviation squared. And you can see that's where the negatives disappear. So negative 0.8 squared is 0.64, and on we go all the way down.
The whole reason why I did that was to figure out what the standard deviation is for this particular sample. Remember, standard deviation is basically how spread out your normal distribution is. So on the top of here-- this is all inside the square root-- we have the sum of the deviation squared divided by n, the sample size, minus 1. In other words, if I came back to my data table, and I had all 10 Xi's here, and I summed, added up, this column here, I get the sum of the deviation squared. That is the numerator here. And it's 62.27 in my make-believe sample divided by sample size minus 1. Take the square root, and there it is.
For those of you familiar with the calculation of standard deviation of a population sigma, the numbers are different, the calculations are different. Don't worry about it. This is standard statistical mathematics. It's what your calculator does for you when you hit the button.
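The mean and sample standard deviation steps above can be sketched in a few lines of Python. The ten masses here are made-up illustrative values — only the first two, 29.2 and 34.1, appear in the video — so the results won't exactly match the video's 30.0 and 2.63.

```python
import math

# Hypothetical 10-individual sample in kilograms; only the first two values
# (29.2 and 34.1) come from the video -- the rest are made up for illustration.
masses = [29.2, 34.1, 28.6, 31.0, 27.5, 33.2, 26.9, 30.4, 29.8, 32.3]

n = len(masses)                               # sample size
x_bar = sum(masses) / n                       # X bar: sum of the Xi divided by n
sq_devs = [(x - x_bar) ** 2 for x in masses]  # (Xi - X bar)^2 for each observation
s = math.sqrt(sum(sq_devs) / (n - 1))         # sample standard deviation: divide by n - 1

print(x_bar, s)
```

Note the `n - 1` in the denominator: that's what makes this the sample standard deviation rather than the population sigma the video mentions, which divides by `n` instead.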
The whole point of all that was to derive a table for the males and the females, the two subpopulations that I'm comparing, in terms of their X bars, their average masses, the standard deviations of the samples, the s's, and the sample sizes. There are the numbers I got from the male sample-- 30.0, 2.63, and 10. I'm going to pretend that I have also done the exact same thing for the female sample-- 27.2, 2.40, and 10. Those are the numbers you need to calculate something called a T statistic.
A T statistic is a really important number when doing a T-test. It lets you decide whether or not you can accept your H0 or accept your Ha. Let me show you what the T statistic calculation looks like. It's really not as difficult as it looks at first blush. The T is equal to the difference in the two sample means-- in our case, the X bar for the females and the X bar for the males, doesn't matter which is which-- divided by the square root of s1 squared over n1, whole term added to s2 squared over n2.
So we need to do this calculation with all the values here. So again, doesn't matter which is which. So this'll be X bar 1, s1, n1, X bar 2, s2, n2. At that point, it becomes very routine, and we just put numbers in.
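As a sketch, here is that plug-in step in Python, using the summary numbers from the table. Note that with these rounded values of s, the result comes out at about 2.49; the 2.40 quoted later in the video presumably reflects the unrounded data.

```python
import math

# Summary values from the video's table: X bar, s, n for each subpopulation
x_bar1, s1, n1 = 30.0, 2.63, 10   # males
x_bar2, s2, n2 = 27.2, 2.40, 10   # females

# T statistic: difference in sample means over the pooled standard error
t_stat = (x_bar1 - x_bar2) / math.sqrt(s1**2 / n1 + s2**2 / n2)
print(round(t_stat, 2))
```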
So what's the point of the T statistic? Well, the T statistic lets us know whether we would accept the H0-- that the two subpopulation means are the same-- or Ha, that they're not the same. If we take a quick look at the T statistic, you can see at the top is X bar 1 and X bar 2. It's the difference in the sample means. Think about it.
If they are really far apart, the likelihood is that the samples come from populations whose means are far apart. So the bigger the difference between X bar 1 and X bar 2, the bigger our numerator is, the bigger our T statistic. So that gives us a basic idea of what the T statistic means. If the T statistic is big, it basically means that the two population means are probably not close together. That difference is big as well. And that's going to mean that we're going to be accepting the alternative hypothesis.
On the other hand, if the two population means are close together, hopefully the sample means are close together. This number is small, we get a small T statistic. So basically, T statistic being big tends toward the acceptance of an alternative hypothesis. Then the null hypothesis is rejected.
Let's look at it a little more thoroughly now. Let's look at the denominator. On the denominator, we've got-- the top of the denominator here-- we've got the standard deviation. And what does that mean? Well, standard deviation is how spread out the samples are. Imagine we have small standard deviations for each of our two samples-- in other words, our male and our female subpopulations.
Well, if they've got a small standard deviation, they're very narrow. We're really confident we got the right number. Maybe the population has a narrow variation, or maybe we just simply did really good measurements, and we got them really close together, and that's a good thing.
Or maybe the standard deviation is large, and they're spread out. Well, you can see that if I am looking for whether or not the difference is real, it's more robust and likely that the difference is real and that our numbers are good if the standard deviations are small. In other words, if we got them spread out like this in our sample, there's less confidence that we got the right number, and there's less confidence that the difference is real.
So let's look back at our T statistic calculation. So now let's take a look at what standard deviation has-- what kind of effect it has upon our calculation of our T statistic. Well, if your standard deviation is large, you're going to have a larger number on the bottom.
When you divide by a larger number, well, think about it. When you go from 1/2 to 1/4, you're dividing by a larger number, but 1/4 is actually a smaller proportion. So in the same sense, if our standard deviation is large, we've got a bigger denominator. We divide by a bigger number. Our T stat's going to be smaller.
What does that mean? Well, that means that when you have a spread out sample like this, you have less confidence that any difference is real. That makes a lot of sense.
Let's look at one last thing in the T statistic equation. And that is sample size. If I throw a fair coin 10 times, sometimes I'm going to get five heads, sometimes I'm going to get six, sometimes I'm going to get four. Very rarely I'll get 10, or 9, or 0, or 1. But what I get is a normal distribution like this centered on five heads. So the frequency of heads is 0.5.
It's like a family. If you've got 10 kids in a family, yeah, the most likely outcome is five boys and five girls. But occasionally-- it's not unheard of-- you get a run of 10 girls in a row, and it's still somewhat random.
But if you have a sample size of 1,000, which means you're taking that same fair coin and flipping it 1,000 times, the vagaries, the uncertainties, of a small sample size tend to disappear. Not completely, but the thing's narrow. In other words, you throw it 1,000 times, it's really unlikely you're gonna get 1,000 heads-- much less likely than getting 10 heads out of 10.
So you get a much narrower distribution. So you can see the bigger our sample size, assuming it's a nice random sample, the better the estimate we have for the mean.
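A quick simulation, not from the video, makes the same point: the spread of the fraction of heads shrinks as the number of flips per trial goes up.

```python
import random

random.seed(0)

def head_fractions(n_flips, trials=2000):
    """Fraction of heads in each of `trials` runs of n_flips fair-coin flips."""
    return [sum(random.random() < 0.5 for _ in range(n_flips)) / n_flips
            for _ in range(trials)]

def spread(fracs):
    """Sample standard deviation of a list of fractions."""
    mean = sum(fracs) / len(fracs)
    return (sum((f - mean) ** 2 for f in fracs) / (len(fracs) - 1)) ** 0.5

s_10 = spread(head_fractions(10))      # wide distribution for 10 flips
s_1000 = spread(head_fractions(1000))  # much narrower for 1,000 flips
print(s_10, s_1000)
```

The spread drops roughly as one over the square root of the number of flips, which is the same reason a bigger n shrinks the denominator of the T statistic.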
What does that mean? Well, that means that if we have a bigger sample size, any differences we detect we should be more confident in. Well, think about it. If sample size goes up, this entire number down here goes down. When you divide by a smaller number, you get a bigger number. In other words, you can have more confidence that your difference is real and, therefore, that your T statistic indicates you should accept the alternative hypothesis.
I hope you followed all that. My students find that very helpful. You can probably tell I teach physics as well, which is like all mathematics and proportions like that. For those of you who have taken physics, you know what I'm talking about. All right.
So now I'm going to find the T calculation, the T statistic for that sample that I set up a few minutes ago. X bar 1 minus X bar 2 divided by the square root of s1 squared over n1 plus s2 squared over n2. That's from that table of s, n, and X bar we did for the two samples. I ended up getting a T statistic of 2.40. Now, at that point, that doesn't tell us anything, because what does 2.40 mean? So you got to have some way to figure that out.
What does it mean? You go to a table with what are called critical values. Critical values tell us what number you need to be larger than in order to determine that you would be accepting your Ha. Let me make sure I said that clearly. If your number for your T statistic is bigger than your critical value, then the difference, at whatever level of significance you want to accept, is large enough to accept the alternative hypothesis.
So there's this table. Notice there's something called df. That's degrees of freedom. I'll get to that in just a second. There's something called Confidence Level. We'll talk about that in a second as well. And then there's something called One Tail and Two Tail. For now, we're just going to be doing Two Tail. I'll explain what that means vis-a-vis One Tail in just a minute. So let's go all the way back here.
Critical values are from the table. Notice I said degrees of freedom. We have to know what that is. That's equal to the sample size minus 1. 10 minus 1 is 9. You might say, wait a sec, what if I have two samples? Do I add them? Nope. You take the smaller of the two sample sizes. So if you have the sample of 10 and then sample of 6, it's 6 minus 1 equals 5. So our degrees of freedom is 10 minus 1, or 9.
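That conservative rule for degrees of freedom is simple enough to write down as a one-liner; a minimal sketch:

```python
def degrees_of_freedom(n1, n2):
    """Conservative rule from the video: smaller of the two sample sizes, minus 1."""
    return min(n1, n2) - 1

print(degrees_of_freedom(10, 10))  # 9
print(degrees_of_freedom(10, 6))   # 5
```

Statistical packages typically use a more involved formula (Welch-Satterthwaite) for unequal variances; the smaller-sample rule here is the conservative shortcut for hand calculation that the video describes.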
Now, the numbers that I'm going to show you on the previous slide, which I'll bring up in a second, are from this table. So from that table, if you use nine degrees of freedom and your alpha-- the level of significance you're looking for-- is 0.05, then for a Two Tail test-- and we're only going to look at the Two Tail test-- the critical value right there is 2.262.
The critical value for 0.01 is 3.250. What do these mean? It means if your T statistic is bigger than this, you can be 95% confident-- putting up with a 0.05 chance of a type one error-- you can be 95% confident that the difference is real and not due to chance.
If you want to be 99% confident-- you're willing to put up with an alpha, a level of significance, of 0.01-- you have to have a bigger number. That makes sense. If you want to be really sure that the difference isn't due to chance, you've got to have a much higher T statistic. In other words, if you want to be 99% sure that it's not due to chance, you've got to have a really big difference as measured by the T statistic.
Well, looking at our numbers, our T statistic was 2.40. So our T statistic for a Two Tail test-- I'll explain again what that means in just a second-- our 2.4 is bigger than this. So at a 0.05, which is the typical scientific level of significance that you'll see, at a 0.05 level of significance, we could say that our results are significant.
The 2.4 is bigger than the 2.262. But at a 99% confidence, our number is less than the critical value, so we can't be 99% sure that the difference is real. We can be 95% sure. So I'm going to get rid of this asterisk here, so we realize that we're not significant on that.
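The decision rule just described can be sketched by hard-coding the df = 9 row of the table (values from a standard Student's t table):

```python
# Two Tail critical values for df = 9, from a standard Student's t table
TWO_TAIL_CRITICAL_DF9 = {0.05: 2.262, 0.01: 3.250}

def accept_ha(t_stat, alpha=0.05):
    """Accept Ha (the difference is real) if |t| exceeds the critical value."""
    return abs(t_stat) > TWO_TAIL_CRITICAL_DF9[alpha]

print(accept_ha(2.40, alpha=0.05))  # True: significant at the 0.05 level
print(accept_ha(2.40, alpha=0.01))  # False: not significant at the 0.01 level
```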
Let's do a qualitative analysis of the Critical Values table just so you get an idea. Let's only pay attention to the Two Tail line. Ignore the One Tail line-- I'll get to that in just a second. So 95% confident means that you're 95% sure you got it right. In other words, if there's a difference, it's a real difference. There's only a 5% or less chance that the difference happened by chance. We can't eliminate it completely, but we can minimize that effect. We can quantify the chance of a type one error.
So let's break it down here. So 95% confidence, a critical level of significance of 0.05. We come down here to nine degrees of freedom. There's the number. Our number was bigger than that, so the difference, we decided, was, at 95% confidence, real. Please notice as the degrees of freedom go up-- in other words, as the sample size goes up-- we just need a smaller T statistic to conclude that the difference is real. Well, remember, I told you a few minutes ago, sample size is important. The bigger it is, the more power we have in our statistical analysis.
The other thing I want to show you-- as we go to the right from 95% to 99%, well, going to 99% means that we want to be really, really, really sure that we're not accidentally rejecting a true H0. In other words, we might get down to 1% chance of an error.
Well, for that, the difference has to be bigger and bigger for us to be sure that it's real. And so sure enough, we stay in the line for nine degrees of freedom. Notice it does indeed get bigger. Our critical value gets bigger. It's a harder and harder bar that we have to exceed to be 99% confident versus 95% confident.
I want to add one last feature of T-tests, which might make your study more powerful and, at the same time, more sensitive to what's going on. The test we did at this point was called a Two Tail test. In other words, our H0 was that the male and female means of the populations are the same, and our Ha is that the males and the females are not the same. In other words, the males could be larger than the females, or the females could be larger than the males. It is fair, however, if you are reasonably confident before you do your study, that this is the nature of whatever it is you're analyzing, to set up ahead what's called a One Tail test.
In a One Tail test, you, for good reason, can eliminate one of the two possibilities of Ha. Remember, I said Ha means they're not equal. So the males could be bigger than the females, or the females bigger than the males. Well, for good reason-- because in all primates, if there is sexual dimorphism, the males are bigger than the females-- we can eliminate, just out of our calculations altogether, the tail, the side where the males are smaller than the females.
So now our H0 is still that the two populations are equal in size. Our alternative now, however, is that the females are smaller than the males. So because we've eliminated one of the two tails, our critical value is now a smaller number, which means that our test could be more sensitive at a pretty good level of significance to detect the difference.
So let's take a look here. For a Two Tail test at a level of significance of 0.05 with nine degrees of freedom, the number is 2.262. But for a One Tail test at 0.05, the critical value is 1.833. So if you had a T statistic between those two numbers and you were doing a One Tail test, you could accept your Ha. But you can't do that with a Two Tail test. It's a lower threshold.
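A sketch of that comparison, with the df = 9, alpha = 0.05 critical values from the table; the T statistic of 2.00 is a hypothetical value chosen to fall between the two thresholds:

```python
# df = 9, alpha = 0.05 critical values from a standard Student's t table
CRITICAL_DF9 = {"two_tail": 2.262, "one_tail": 1.833}

t_stat = 2.00  # hypothetical T statistic between the two thresholds

one_tail_significant = t_stat > CRITICAL_DF9["one_tail"]  # passes the lower bar
two_tail_significant = t_stat > CRITICAL_DF9["two_tail"]  # fails the higher bar
print(one_tail_significant, two_tail_significant)
```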
In fact, let me look back now at our own values. For a One Tail test at a level of significance of 0.05, well, our T statistic of 2.4 is bigger than 1.833. So One Tail, we accept it. Two Tail at a 99% level of confidence, 0.01 level of significance, our 2.4 was smaller than 3.250. Ah, and our 2.4 is still smaller than 2.821 for the One Tail test at 0.01. So we still can't claim 99% confidence with a One Tail test, but our 2.4 is closer.
As part of the NIH-funded ASSET Program, students and teachers in middle and high school science classes are encouraged to participate in student-designed independent research projects. Veteran high school teacher Walter Peck, whose students regularly engage in independent research projects, presents this series of five videos to help teachers and students develop a better understanding of basic statistical procedures they may want to use when analyzing their data.