SPEAKER 1: I'm really pleased to introduce Prantik Kundu. He joins us from Mount Sinai, where he's an assistant professor of psychiatry and radiology. Prantik received a bachelor of science from Manhattan College before going on to do a PhD in a joint program between the University of Cambridge and the National Institutes of Health. So he was jointly supervised by Ed Bullmore and [INAUDIBLE]
In that work, Prantik innovated in really substantial ways with MRI acquisition and pre-processing, which he's going to be talking about today, and with multi-echo fMRI. So I'm really delighted, first of all, by how full the room is for a methods talk in human development. This is fantastic. I'm not recognizing a lot of these faces. I'm thinking maybe engineering and statistics heard about this. Wonderful.
All right, so if you want more gory details, tomorrow morning at 10:00 AM, there's going to be a chalk talk that's going to be two hours. And that's in Kennedy, room 105. Anyone here is welcome to attend. That's where we're really going into the gory details.
PRANTIK KUNDU: All right, so I just want to thank Nathan and [INAUDIBLE] and Valerie for inviting me. Thanks, everybody, for showing up. And yes, it is surprising people are interested in technology, but that's good.
OK, so I guess, superficially, I'll be talking about an fMRI technique that we developed between NIH and Cambridge called multi-echo fMRI that takes a different approach to how fMRI is done, with some objectives we'll talk about. But more fundamentally, the point is the question of what you can do with fMRI and how you interpret fMRI.
So that being said, in someone's mind, this is basically what fMRI, you would hope, would do off the bat, where you acquire a movie of the brain doing things. This is a quote, unquote "resting state," so this is spontaneous activity of the brain propagating over the cortex. Hopefully, you can see that on the right, you have an axial view. On the left, you have a sagittal view.
And up here, you have a time course representing a single voxel. So you see, at the end of the day, what fMRI, in some sense, is supposed to do is give you a time course read out of activity of the brain over somewhere between 10,000 and 100,000 voxels all over the brain. So it's a compelling methodological argument.
So what does that involve? Anybody who's just starting in fMRI thinks it's something like this, where, OK, you're going to get some images, and maybe fix for motion. You do a little bit of smoothing, a little bit of filtering, and ta-da, you have something like an activation map or a resting state correlation map. So this is from Biswal's '95 paper about connectivity.
But for anybody who has spent more than a week doing this, you realize that's actually not what the analysis is. This is more like the analysis, where there's some alignment. There may be some smoothing. You have to correct for cardiac pulsation. You have to handle movement parameters. You've got to deal with CSF signal.
You have to interpret white matter signal somehow. You have to deal with respiration. Then you maybe have to go into global signal, depending on what your philosophical opinion is of the mean, maybe despiking, maybe bandpassing. And then at the end of the day, nothing seems to work. Forget it. Let's just filter and censor time points.
So that's been the debate. That's basically a summary of fMRI methods over the past 15 years. Question: why? Why is it a mess? It's a mess because we don't actually have a ground truth. We don't have a general response model for what the BOLD fMRI signal is supposed to be.
So in task, you have a notion that the activation is supposed to be correlated with or corresponding to your experimental stimulus. And that's legitimate up to a point, as much as linear models are legitimate. They're legitimate up to a reasonable extent.
However, in resting state, it's an order of magnitude worse. Because now, you're just looking at the correlations. And correlations are just correlations. That's all they are. You only believe that, for example, resting states are real because they're pretty reproducible.
It took Biswal about three or four years before anybody published that paper, and it took the field even longer to come to the realization that this is a physiological phenomenon. And then Logothetis had the paper where he showed that LFP was related to BOLD, and people believed it since then. But even then, there's lots of debate to this day.
All right, so what's multi-echo? Multi-echo is basically an attempt to find a different ground truth for fMRI that's based on a model from relaxometry, basically T2 star relaxometry. I'm not going to get into the detail of the different contrast mechanisms here. But when you acquire a BOLD signal, you're looking at one type of NMR relaxation process, which is the T2 star decay.
All right, so in a regular fMRI experiment, this is basically what happens. The red quadrilateral represents an excitation pulse. To get a single slice of an fMRI data set, you excite the slice, and then you wait a little while, and then you manipulate your MRI gradients to induce echoes. And then you read that out by reversing your gradients if you're doing EPI. And then you do a reconstruction, and you get an image. If it kind of looks like a brain, it makes you happy. That's great.
So what multi-echo is, it's very simple. You just add more readouts. And instead of waiting for some time to pass, you actually just start acquiring images as soon as you can. You just take note of what your TEs are. What that buys you is something interesting, which is that you can actually make sense of the change in images with echo time.
So over here, remember, we're still acquiring a single slice. So all of this, these three images representing different points in the decay are happening over, roughly, 60 to 70 milliseconds, depending on how you accelerate. But what happens is that you can see that the earliest TE image is quite different from the middle and quite different from the latter one. And the reason is what you're seeing is the different signals from different voxels decaying at different rates in a way that's consistent with the tissue parameters and the state of perfusion. And there's some distortion in there, too, but the vast majority of the process is weighted by differences in tissue parameters.
All right, so here's the toy model of how this works. So a T2 star decay is pretty reasonably modeled as a monoexponential decay. If you wanted to do really quantitative relaxometry, it's more biexponential, but for argument's sake, it's monoexponential here.
All right, so now what happens? This point that they intercept here represents an excitation. And then given some time, the signal decays. Great. So then what happens when you have, let's say, an influx of oxygenated blood? What happens if you get an influx of oxyhemoglobin or, rather, a change in the concentration of oxyhemoglobin to deoxyhemoglobin? The homogeneity of the magnetic field increases, and your decay slows down. So the decay rate of this exponential is slower.
So then what happens when you take the difference between these two curves, these two exponentials that only vary by the decay rate is you get this curve. This is the BOLD contrast curve. This is why you acquire at 3T a TE of 30 milliseconds. Because you're acquiring just at this point where the difference of the exponentials is maximum.
When you're too early, you got no signal change. When you're too late, you have no signal change. And the middle is the sweet spot. And that's why you acquire it at just the single TE. But in a multi-echo experiment, the objective is to sample some approximation to a longer decay process.
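As a quick numerical check of that sweet-spot argument, here is a hedged Python sketch with made-up, illustrative values for S0 and T2* (not measurements): the difference between two monoexponential decays that differ only in decay rate peaks at an intermediate TE, near the tissue T2*.

```python
import numpy as np

# Toy monoexponential decay model: S(TE) = S0 * exp(-TE / T2*)
S0 = 1000.0           # arbitrary initial signal intensity
T2s_rest = 30.0       # illustrative baseline T2* in ms
T2s_active = 31.0     # slightly slower decay after an influx of oxyhemoglobin

TE = np.linspace(0.0, 120.0, 1201)            # echo times in ms
S_rest = S0 * np.exp(-TE / T2s_rest)
S_active = S0 * np.exp(-TE / T2s_active)

# BOLD contrast curve: difference of the two decays
contrast = S_active - S_rest

# The difference is maximal near TE ~= T2*, which is why single-echo fMRI
# at 3T is usually acquired around TE = 30 ms.
print("TE of maximum contrast: %.1f ms" % TE[np.argmax(contrast)])
```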
All right, so what can multi-echo look like? So multi-echo is very versatile. You can acquire it at multiple field strengths. This is an example of a 7T data set, with three echo times going up to-- the in-plane resolution is 2.5 millimeters here. The last TE is 42 milliseconds, and you get good resolution. As you increase in echo time, you see a change in image contrast. You're getting a change in image contrast because different tissues decay differently, some faster, some slower. So you actually learn about your tissues from an fMRI experiment, which is not kind of expected of a conventional fMRI experiment.
Great. So now that you have this decay information encoded in the signals, you can actually make parameter maps. So for example, just given images of three different TEs, you can compute based on just a log linear fit, nothing special. You can get the intercept, which actually turns out to be a coil sensitivity map. And you can get a T2 star map. And you get that for free, just by going voxel by voxel, fitting the mean of the time series to a log linear exponential.
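A minimal sketch of that voxelwise log-linear fit, assuming the echo-wise mean images are stacked into an echoes-by-voxels array; the function and variable names here are mine, not from meica.py.

```python
import numpy as np

def fit_s0_t2star(mean_echo_imgs, tes):
    """Voxelwise log-linear fit of the multi-echo means.

    mean_echo_imgs : array (n_echoes, n_voxels), time-averaged image at each TE
    tes            : array (n_echoes,), echo times in ms

    Fitting log(S) = log(S0) - TE / T2* by ordinary least squares gives an
    S0 map (which reflects coil sensitivity) from the intercept and a T2*
    map from the slope, exactly as described above.
    """
    log_s = np.log(np.clip(mean_echo_imgs, 1e-6, None))        # guard against log(0)
    X = np.column_stack([np.ones(len(tes)), -np.asarray(tes, float)])
    beta, *_ = np.linalg.lstsq(X, log_s, rcond=None)            # (2, n_voxels)
    s0 = np.exp(beta[0])
    t2star = 1.0 / np.clip(beta[1], 1e-6, None)                 # beta[1] is R2*
    return s0, t2star
```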
But then once you have the T2 star map, you can do something really cool. You can combine the images from the different echoes to produce a susceptibility-artifact-compensated map. What does that mean? So basically, in this TE equals 15 image, you have very little or no dropout-- and dropout, for those of you who actually do fMRI right now, is a problem.
As you go to later TEs, you actually see a progressive worsening of dropout. The reason why you have dropout here is because the signal has already decayed away. You're already at the tail of the decay. So you can deal with that if you capture the signal earlier in the decay, which is what's happening at TE equals 15.
But remember, there's a sweet spot. Remember the BOLD contrast curve. There's a sweet spot in the middle. So how do you deal with that? So it turns out that if you acquire multiple echoes, you can actually combine them with an adaptive combine. It's called an optimal combination. And by estimating the per voxel T2 star, you can create a synthetic image where every voxel has optimum contrast to noise.
So that means that you compensate for the orbital frontal susceptibility artifact, but you're not penalizing yourself in other parts of the brain. So it's basically like how your iPhone does HDR. It takes images at a few contrasts and combines them together to give you a contrast-optimized image-- same idea.
In the process, not only do you decrease the susceptibility artifact-- remember, this is a weighted average-- you also wind up attenuating thermal noise. Random stuff cancels. We'll get to this in a bit, but for free, you have a doubling of SNR, almost free.
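For concreteness, here is a hedged sketch of that per-voxel combination. It uses the commonly cited TE weighting w = TE * exp(-TE / T2*); the exact weighting and array layout in the released code may differ.

```python
import numpy as np

def optimally_combine(data, tes, t2star):
    """Combine multi-echo time series into one contrast-to-noise-optimized series.

    data   : array (n_echoes, n_voxels, n_timepoints)
    tes    : array (n_echoes,), echo times in ms
    t2star : array (n_voxels,), per-voxel T2* from the log-linear fit

    Each voxel gets the TE weighting that favors its own BOLD contrast,
    and the weighted average also attenuates thermal noise.
    """
    tes = np.asarray(tes, float)[:, None]                       # (n_echoes, 1)
    w = tes * np.exp(-tes / t2star[None, :])                    # (n_echoes, n_voxels)
    w = w / w.sum(axis=0, keepdims=True)                        # normalize per voxel
    return np.einsum('ev,evt->vt', w, data)                     # weighted average
```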
All right, so how else can this help us? Sorry, I think I can take a question if anybody has anything. No? Interrupt me whenever you want.
OK, so now moving forward, we looked at images so far, but how does this affect time series? So it turns out that in multi-echo data, time series also have a relationship between each other. It turns out that the percent signal change of activation scales linearly with TE. So that means if I have a task, as you can see here, and if I have acquired at 10 milliseconds, and I have the percent signal change of this activation being 1%, if I acquire at 20 milliseconds, it's 2%. If I acquire at 40 milliseconds, it's 4%.
Fine. The cool thing is that this happens both for resting state and task. So in a task, I can see a block structure. Great, wonderful, it scales. If I look at a resting state fluctuation, though, which has no intrinsic structure and if I didn't have the scaling information, I wouldn't know what it means. Once I have the scaling information, I know that this is something that is mediated by a T2 star process.
So it's inferences like this that start really driving home the interpretive benefit of a multi-echo acquisition.
AUDIENCE: [INAUDIBLE]
PRANTIK KUNDU: OK, in this image representing a multi-echo acquisition-- three echo times. And in the time series, the color of the times series corresponds to the echo time. So in a multi-echo acquisition, per slice, I've acquired multiple images, and I can recompose them to give me three different time series data sets representing the same period of time.
And so then if I take a single voxel, I can see its time series corresponding to the different echo times. And if I scale those time series to percent signal change, which is how fMRI time series are generally scaled, it turns out that as you increase in echo time, the magnitude of the change that you see scales linearly. Is that relatively clear? And so that effect happens in the same way for resting state and task.
So why does that happen? Very simply, percent signal change is what? Delta S over S, right? Here's my delta S curve. That's the contrast curve. If I divide the contrast curve by the mean of the two exponential decays, I get a line. You can do this just as a toy model.
But the cool thing is, all right, so in that case, I modulated R2 star. In this case, I modify the intercept. One is the rate, the other is the intercept. If I modify the intercept, and I take the difference of two exponential decays that differ by an intercept, I get another exponential decay. If I divide that exponential decay by the mean of these two exponential decays, I get a flat line. Meaning percent signal change of something that's not R2 star does not scale with echo time. And pretty much anything that's not R2 star is an artifact.
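That toy model is easy to check numerically. In this hedged sketch, with made-up values for S0, T2*, and the perturbations, a small R2* change gives a percent signal change that grows roughly linearly with TE, while an intercept (S0) change gives a flat percent signal change across TEs.

```python
import numpy as np

S0, T2s = 1000.0, 30.0                     # illustrative values
TE = np.array([15.0, 30.0, 45.0])          # echo times in ms

def psc(dR2s=0.0, dS0=0.0):
    """Percent signal change at each TE for a small R2* or S0 perturbation."""
    base = S0 * np.exp(-TE / T2s)
    pert = (S0 + dS0) * np.exp(-TE * (1.0 / T2s + dR2s))
    return 100.0 * (pert - base) / base

print(psc(dR2s=-0.001))   # BOLD-like R2* change: PSC grows ~linearly with TE
print(psc(dS0=10.0))      # S0 (intercept, artifact-like) change: PSC is flat
```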
So there are some arguments that some functional effects may have a little bit of inflow bias. But basically, the stuff we're interested in is changes in oxygenation. Any change in oxygenation is going to produce an R2 star change. Anything that's wildly different from that is totally an artifact.
So therein, you actually have a mechanism by which you no longer need to look at maps and look at time series and kind of read the tea leaves and try to figure out what's what. This kind of paves the way for a very simple linear model based on the decay to tell you what's what. You can also derive this as a Taylor expansion for a monoexponential decay with an additive term. So you can model the signal as (S0 plus delta S0) times e to the minus (R2 star plus delta R2 star) times TE, and expand that. It's the first term of the Taylor expansion. Yeah?
AUDIENCE: Can you say explicitly what S and R are?
PRANTIK KUNDU: Sorry? S is basically spin density. S0 is spin density, alternatively called initial signal intensity. So if I excite a tissue, immediately it begins radiating RF, and the initial intensity-- if I did some sort of photon count, some sort of count-- would be S0. As the spins in a tissue, the spins in a voxel, start decohering and giving us contrast, the rate of that decoherence is R2 star. That's what the BOLD signal is. It's relaxation of the coherence of spins within a voxel.
The way BOLD works is, the way T2 star signal works is if in a voxel I have a whole bunch of signals that are wildly incoherent, they all destructively interfere, I get no signal. If they're all coherent, then I get a really strong signal. And as those spins slowly decohere based on an interaction between-- it's complicated, but it's an interaction between the gradient of the magnetic field in the voxel and the spins. That's what's mediating the rate of T2 star decay.
That's different from T2 decay, which is the relationship between the spins themselves. So there's T1, T2 star. T1 is a decay process that's mediated by the spin lattice. T2 is a decay signal process that's mediated by the interaction of chemical spins. T2 star is a process where the spin's interacting with the gradient of the magnetic field. And when you have oxyhemoglobin coming in and deoxyhemoglobin being washed out, the gradient of your local magnetic field decreases, so the spins can stay coherent longer. Makes sense?
AUDIENCE: So R is a rate?
PRANTIK KUNDU: It's a rate. It's just the rate of an exponential. It's like, in e to the negative kT, it's the k.
AUDIENCE: And S is the intensity?
PRANTIK KUNDU: It's the intercept. So in A e to the negative kT, it's the A, and R2 star is the k.
AUDIENCE: [INAUDIBLE]
PRANTIK KUNDU: It's exactly the same process. It's exactly the same process. The modulation of a block, it's an arbitrary curve. So a curve is just--
AUDIENCE: [INAUDIBLE]
PRANTIK KUNDU: So does resting state. It's not noise. It's not thermal noise. It's structured because each point has some predictive value of the next point. It's a smooth curve. It's an auto-correlated curve.
I'm just going to jump a little bit, but that takes us to ICA. So ICA, I believe, is a really important and pretty beautiful algorithm, at least, theoretically. Its application to fMRI has been helpful, because it's allowed us to decompose the signal in a useful way to find patterns.
How it works is you have a brain with a bunch of time series. You take each voxel time series. You load it into a matrix, just a regular old time series matrix. And ICA is a matrix factorization, which is basically saying that my input matrix of time series can be explained by a matrix representing maps and a matrix representing time series. If I multiply them back together, I get the original data. So X is equal to AW. So it's really simple.
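As a hedged sketch of that factorization, here is spatial ICA on a time-by-voxel matrix, using scikit-learn's FastICA as a stand-in for the actual decompositions (MELODIC or the one in meica.py) and glossing over the whitening and dimensionality-estimation details.

```python
import numpy as np
from sklearn.decomposition import FastICA

def spatial_ica(X, n_components=30, seed=0):
    """Factor fMRI data X (n_timepoints, n_voxels, mean removed) as
    X ~= timecourses @ maps, with spatially independent component maps."""
    ica = FastICA(n_components=n_components, random_state=seed, max_iter=1000)
    # Samples are voxels for spatial ICA, so the data are transposed going in.
    maps = ica.fit_transform(X.T).T            # (n_components, n_voxels)
    timecourses = ica.mixing_                   # (n_timepoints, n_components)
    return timecourses, maps
```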
So when you apply it to fMRI data, you get patterns. Some patterns, like this one-- I guess that's the motor cortex. And this one, this is some frontal parietal thing, maybe it's attention. They look reasonable.
You get other things that are clearly nonsense. This is some sort of artifact. And there are other things that are more problematic, like this thing. If you looked at the axial view, it might look like something real. If you look at the sagittal view, you realize it's all susceptibility artifact. But at the end of the day, it's like reading tea leaves.
So the multi-echo stuff really started when I was actually trying to do a different experiment. You're a first-year PhD student, so you just do whatever. It doesn't matter. So we had some multi-echo data lying around, and we had some conversations. So it turned out that if you try to solve the component selection problem with multi-echo data, the problem gets easier because we have two relaxometry models that we talked about.
We had the percent signal change scaling with echo time, which is the TE-dependence. And we have this percent signal change not scaling with echo time, which is the TE-independent, let's say. The two models here, we got a BOLD model, which is representing the percent signal change differences with TE, given a change in R2 star. And you have the noise model, which is what the expected percent signal change modulation will be if the intercept was changed.
If I then take a component that I think is a network-- so this frontal parietal thing-- I do my ICA in some way-- it's pretty trivial-- let's say I just do it on the middle echo data set. And then I fit that time series to the data of the different echoes separately. I get something like this.
So basically, these maps are scaled to percent signal change. Here's the little fluctuation that I got from maybe all three, maybe one of them, whatever. And when I fit this component time course to the brain, and I express the maps in percent signal change, I can see that the activation pattern is getting more intense with echo time, which is what I expected.
If, per voxel, I take these percent signal changes and just fit them to a line-- pretty stupid-- but just fit three points to a line, you realize that this is the parameter map that you get, which is actually delta R2 star. This is a physical measurement. Its units are milliseconds. It's telling you what the activation is in relaxometric terms.
And then if you do an F test for how good the fit is, you actually get a thresholdable map. So this is a statistical parametric map for TE-dependence. But note that this thing on the right is just reflecting the fit of three points to a line, whereas this is reflecting the fit of a time series to the other time series.
But the point is that the fit is so robust, the effect is so robust and so strong that even with fitting three points to a line, you can get a really sparse, clean map. And then the point is, when you cross reference this with this, you see the same place where this time series is fitting the data here. That's the same place where the signal is also scaling with echo time.
AUDIENCE: [INAUDIBLE]
PRANTIK KUNDU: The signals at the different echoes-- the weights of the fit at the different echoes. So given some arbitrary fluctuation, I fit it to the data of TE-1, I get a value; TE-2, I get a value; TE-3, I get a value; and I fit those to the model. If I do the same thing against a null model where nothing changes, I get an empty map. Meaning, as you can see, these fluctuations are not fitting the noise model.
On the other hand, if I have an artifact component-- so this is clearly an artifact, it has a pulsatility, it has this ring-around-the-brain thing-- if I fit it to the TE-dependence model, I get an empty map. This is the F-map. This is the R2 star map. This is the SPM empty map. This modulation loads onto the noise model. Looking at the images, it's not surprising. But it's good that it works out that way.
So we can turn this into-- yes?
AUDIENCE: Did you say [INAUDIBLE]
PRANTIK KUNDU: Noise does affect rate, but BOLD strictly affects R2 star. BOLD, pretty specifically, affects rate. But noise, depending on the type of noise, some types of noise are explicitly S0, and other types of noise are both S0 and R2 star. But only functionally-related BOLD is purely R2 star.
AUDIENCE: OK, so the models that you used [INAUDIBLE]
PRANTIK KUNDU: Right. So they can be modeled as independent, within percent signal changes of about 60%. So there is a joint model. When you expand the formula, you actually see that there are dependent terms as well. But the dependent terms don't really affect you that badly within 60% changes of R2 star. So since most brain changes are 1% to 2% to 4%, you're safe, and you can make this inference legitimately.
So then we know that we can have F-statistics or any kind of statistics on this kind of inference. So you can compute pseudo F-statistics for a component-level statistical measure of TE-dependence or TE-independence, which we comically called kappa and rho. I'll let you figure out why.
So the BOLD weighting is represented as kappa, a pseudo F-statistic for component-level TE-dependent scaling. So instead of thinking about TE-dependence in terms of a voxel, we can think about it on the basis of an entire statistical component. And this all works because this TE-dependence stuff is linear. It works fast and works easily because the models are linear. If they were nonlinear, this would not be nearly as clean.
So that means that at the end of the day, BOLD processes are high kappa. Kappa is proportional to F. Non-BOLD processes are low kappa and/or high rho.
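Here is a hedged sketch of what such component-level pseudo-F statistics could look like: map-weighted averages of the voxelwise F values from the TE-dependence and TE-independence fits. Using the squared component map as the weight is my assumption; the released ME-ICA code may do this differently.

```python
import numpy as np

def kappa_rho(f_r2, f_s0, comp_map):
    """Component-level pseudo-F statistics for TE-dependence (kappa)
    and TE-independence (rho).

    f_r2, f_s0 : arrays (n_voxels,) of voxelwise F statistics for the fit of
                 a component's per-echo percent signal changes to the
                 delta-R2* (TE-dependent) and delta-S0 (TE-independent) models
    comp_map   : array (n_voxels,) of the component's spatial weights
    """
    w = comp_map ** 2                          # assumed weighting; may differ
    kappa = np.sum(w * f_r2) / np.sum(w)       # high for BOLD-like components
    rho = np.sum(w * f_s0) / np.sum(w)         # high for S0-like artifacts
    return kappa, rho
```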
So then if I have a decomposition of an individual data set, and I do an ICA, and I plot the kappa values for all the components, and I sort those values, I get a scree plot. And if I then plot the rho values that correspond to those kappa values on the bottom-- so basically, for component number 20, I know that component number 20's kappa is 150, and component number 20's rho is about 10. So that's how you read this.
And when you look at the whole thing, you can clearly see two regimes of components. One's a BOLD regime. One's not that. And so without even looking at the maps and the time series-- that's actually crucial-- without looking at the maps and time series, I know there's a rich BOLD space.
So that's one takeaway from this: in fMRI, if you are heavily dependent on normative patterns or on looking at your data, you're inherently invoking circular logic. In resting state, for example, which doesn't have a time series model, let's say you're dealing with pathology, so you don't even have a structural model. How do you know what you're interested in is what you're interested in? So you have to have an independent measure, and TE-dependence is independent of time series stuff.
So logically, even if this was not more robust, even if this was noisier, which it thankfully isn't, this is logically sound. And so that allows you to go farther downstream, as you'll see in the rest of this talk, and actually make discoveries. Because you're open to things that you weren't expecting, because you have a pretty much foolproof model.
All right, so if I took the batch of components that I recognize as BOLD, and I plotted them from an individual 10-minute data set with fat voxels and not an especially special acquisition, I get a really, I think, an impressive representation of functional neuroanatomy in a single person. Thalamus, motor cortex, putamen here, frontal parietal areas, default mode, primary visual cortex-- this data set was acquired three years ago, four years ago. So this is without multiband or anything.
And another thing you may notice is you don't need to smooth the data, so that's another interesting point. This was published in 2013. This was in the PNAS paper, just to give an idea of what the layout is. But as I said before, we don't need templates. In fact, we don't even need a human brain. So we can do it in rat.
AUDIENCE: Well, you need them for some things.
[LAUGHTER]
PRANTIK KUNDU: So this is the exact same process done at 11.7 Tesla on a medetomidine-anesthetized rat. And we were kind of floored, but you get an amazing cortical parcellation in a single animal, based off a single scan under anesthesia. So you can actually isolate caudate putamen, M1, M2. This is something you're generally used to seeing from human data. So this is leveraging both the optimal combination function, which basically doubles your SNR off the bat-- which, frankly, is already high because it's 11.7T-- but it is also taking advantage of integrating relaxometry throughout the analysis process, which we'll talk about tomorrow.
But just to make a point, if you took the same data set and passed it through optimal combination-- so you're already taking advantage of multi-echo, but not fully, not including all the relaxometry that we do throughout the analysis process-- and you do MELODIC ICA, these are the components that you get. They're much sloppier.
For anybody who came to the lunch, you'll know that this is related to the dimensionality estimation problem, but we can talk about that later. But as you can see, this is robust to any kind of anatomy that you might throw at it, including lesions.
All right, so back to the boring stuff-- denoising. So how does this manifest in denoising? Remember, we talked about how we can take a data set, and we can factor it into a set of time series times a set of maps. So that means that we can null some of the time series and recombine the data to produce a denoised data set. So instead of coming up with a cardiac trace or coming up with some sort of motion trace, which we know is flawed just by looking at PPGs, we just denoise by rejecting components. So that, of course, depends on how good your components are, but that's a little bit detailed.
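In matrix terms, that rejection step is just a subtraction. A minimal sketch, assuming the ICA factorization from before and a list of component indices already judged non-BOLD:

```python
import numpy as np

def denoise_by_rejection(data, timecourses, maps, reject):
    """Remove non-BOLD components by nulling their time series and recombining.

    data        : (n_timepoints, n_voxels) optimally combined, mean-removed data
    timecourses : (n_timepoints, n_components) ICA component time courses
    maps        : (n_components, n_voxels) ICA spatial maps
    reject      : indices of components judged non-BOLD (low kappa / high rho)
    """
    noise = timecourses[:, reject] @ maps[reject, :]   # the rejected subspace
    return data - noise                                # linear subtraction
```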
In an fMRI experiment-- we were talking about this at lunch too-- in an fMRI experiment, a lot of people make their judgment based on the image that they see that looks like anatomy. A lot of people make that decision based on the thing that looks like a brain. But in fMRI, you're never looking at the brain. You're looking at changes from the mean.
All your decisions, all your analysis is done after you subtract the mean. But nobody really looks at it like that. Everybody looks at just the image and says, oh, it looks like a brain. That's great. But that's not actually what you're interested in. You're interested in changes from the mean.
So here, we have subtracted out the mean, and we're going to play the video of what a multi-echo, even a combined multi-echo data set looks like if you subtract away the mean. And unsurprisingly, if you've done any fMRI analysis, you know it looks kind of crummy. You have all sorts of noise everywhere. You have stuff on the edge of the brain. The time series is eh, schmeh. There's some low frequency stuff, but there's a lot of high frequency stuff. Great.
So this is what we're dealing with. All the filtering that we apply, all the stuff that we do, it's to try to make this look regular. All right, so then what happens if you isolate the non-BOLD space? This is what the non-BOLD space looks like. I'm going to flip through the two. And look at the time series from here, now, look at the time series here.
This is from a representative voxel. You actually see what the underlying noise is. It's a high frequency fluctuation plus a big spike. If you play that-- so now, you can even see the cardiac pulsation here. You can see some of the pulsation here. And then we'll get to a big motion, and you'll actually see what happens during motion in just a second.
OK, getting there. Watch. OK, so when you're censoring and scrubbing, you're actually trying to get rid of that effect-- massive nonlinear disruption to the signal. And if you've done any kind of study on filtering, you realize how hard this is to filter. Because why? It's non-stationary. It doesn't have a fixed spectral fingerprint over time.
All right, so then what happens if you subtract this-- like, literally a subtraction of this-- from the time series before? You get this, which is the video that we started out with. So basically, if you add this to the movie that you just saw, you get the first movie. So by linear subtraction of non-linearly elucidated components, you can actually denoise your data to some absurd amount.
And what this means is, as we'll get to in a bit, what you're seeing is a movie of the brain over time, and you can see dynamics. Some of you know that dynamics is kind of big now, and people are trying to get at non-stationary states. Every frame in this video is a different state, but all it is, is just the activity.
So how does that relate to things like motion artifacts? So for a representative data set, this is a motion trace representing a slow head movement over time. This is the framewise displacement. So this is shot to shot-- it's the sum of the derivatives of the motion parameters, which means if you have a big spike, that means you had a big motion. It's a big difference in displacement from the prior point.
So that's the FD trace. You've got this DVARS thing, which Jon Power came up with, which is the sum of the first derivatives of the signal-- absolute value. So basically, if you go shot to shot, if you have a big spike, what happens is you've had a big discharge of signal. You've just had a big change in signal. And the only reason you have a big change in signal is because you've had all tissues moving.
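For reference, a hedged sketch of both metrics as they are commonly computed; the rotation-to-millimeter radius and the root-mean-square form of DVARS are the usual conventions, and the exact variants behind the slides may differ.

```python
import numpy as np

def framewise_displacement(motion_params, radius=50.0):
    """Power-style FD: sum of absolute frame-to-frame changes in the six
    realignment parameters, with rotations (assumed in radians) converted
    to millimeters on a sphere of the given radius."""
    d = np.diff(motion_params, axis=0)          # (n_timepoints - 1, 6)
    d[:, 3:] *= radius                          # rotations -> mm
    return np.concatenate([[0.0], np.abs(d).sum(axis=1)])

def dvars(data):
    """DVARS: root-mean-square across voxels of the frame-to-frame signal
    change (the talk paraphrases it as a summed absolute first derivative;
    the RMS form is the one usually computed)."""
    d = np.diff(data, axis=0)                   # (n_timepoints - 1, n_voxels)
    return np.concatenate([[0.0], np.sqrt((d ** 2).mean(axis=1))])
```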
So then the bottom trace is the DVARS metric computed off of three different data sets. The black trace, if you can see it under the red trace, is the DVARS trace of the raw data. The red trace is the DVARS trace of the non-BOLD data. The blue trace is the DVARS trace of the BOLD data.
So as you can see, we've explained the full shape of the raw DVARS trace in the non-BOLD trace. That goes to show you that we've taken care of a lot of the motion artifact. And that's especially relevant given that the DVARS trace was initially a point of argument for why a given data set had not fully explained its motion artifact.
I'll just point to it. The gold trace over here is the DVARS trace after motion regression. So you can see the gold trace or the yellow trace looks not so different from the black trace, and in some places, it's even worse. But when you're using MEICA you've actually very specifically removed motion as indicated by the DVARS trace of the non-BOLD signals. Meaning non-BOLD signals capture both linear and non-linear components of motion.
Why? Because it's not based on a linear model of motion. It's based on relaxometry, which is totally different, plus a matrix factorization step. What does that do to SNR? So given a single-echo unaccelerated data set-- so no SENSE, no GRAPPA-- an average voxel has an SNR, with fat voxels, so 3 and 3/4 millimeters, 240 FOV, 64 grid, of 130. If you do a multi-echo acquisition with GRAPPA factor 2, which normally reduces your SNR, after combination you wind up doubling your SNR. So it compensates for any acceleration that you do, because the thermal noise cancels.
And then when you remove your non-BOLD signals, you quadruple SNR. And if you factor relaxometry throughout the fMRI analysis process, you also explain 97% of the variance. That's really important, because it's actually trivial to say you've increased SNR but you've only explained 30% of the variance, right? You reject all your components, you'll have infinite SNR.
But if you explain all the variance, what that means is only 2% of the signal, roughly, is thermal noise, which is critical. That means 98% of the signal is structured signal, which means that your acquisitions are way more powerful than you think they are. It's not surprising we have these incredible coils. We have these incredibly sensitive receivers. Our data should not be crap. And they're not, but they're interesting in ways that we're not really used to kind of exploring the data.
I think I'm taking a little too much time in these explanations. So the whole process, we call multi-echo ICA. So we've implemented meica.py, which I hear you guys are using, and I'm thankful for that, and I hope we have more communication after this. The objective of this code is to kind of be a Swiss army knife, so as to integrate basic processes in terms of anatomical corrections, and skull stripping, and warping.
We've also implemented a unique T2 star map based co-registration, which is very fast. We do the warps in the correct way, which is all at the same time. We do this echo combination which you saw earlier, which takes care of the susceptibility artifact.
It uses a homegrown dimensionality estimation, which incorporates relaxometry. It does component selection, and it does all the inline resampling, and it turns it into a NIfTI data set that should be compatible with SPM. We kind of code it up in Bitbucket, but occasionally, we send releases to AFNI.
AUDIENCE: [INAUDIBLE]
PRANTIK KUNDU: That's right. So adaptive refers to dealing with the noise floor. In the orbital frontal cortex, the signal is decaying quickly, so the last echoes are in the weeds. If you try to fit that to a monoexponential decay, you wind up getting not-a-number because you're doing a divide by zero. So in a multi-echo acquisition, the adaptive combine, actually, is smart enough to use only one or two or three echoes. Most of the brain is actually done with three or four or five echoes. But in those regions, it knows to use just two.
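A hedged sketch of that adaptive logic, reduced to a simple thresholding rule: count, per voxel, how many leading echoes are still above an assumed noise-floor estimate, and only use those echoes in the fit and the combination. The actual adaptive mask in the released code is more involved.

```python
import numpy as np

def adaptive_echo_count(mean_echo_imgs, noise_floor):
    """Per-voxel count of leading echoes whose mean signal is above the
    noise floor.

    mean_echo_imgs : (n_echoes, n_voxels) time-averaged images, ordered by TE
    noise_floor    : scalar or (n_voxels,) estimate of the thermal noise level

    In dropout regions (e.g. orbitofrontal cortex) the later echoes have
    decayed into the noise, so only the first one or two echoes are counted;
    elsewhere all echoes are used.
    """
    above = mean_echo_imgs > noise_floor
    # cumprod stops counting at the first echo that falls below the floor
    return np.cumprod(above, axis=0).sum(axis=0)
```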
All right, so how does this compare to FIX, which is an HCP-derived tool, technology, whatever you want to call it? It basically tries to use a machine learning approach to denoising data by component rejection. I'm not going to go too far into the details, but as you know, FIX requires templates. You have to tell it what you're looking for, and it finds it for you. MEICA doesn't. MEICA just looks at the TE-dependence and makes decisions therein.
This is based on a study of healthy controls versus ADHD. The reason why this is a relevant study is because ADHD kids move a lot. And so how do we deal with SNR and also degrees of freedom lost? So we have a few things that we're comparing. We have a single echo, meaning we take the middle echo and treat it like it's just a regular data set, which you can do, which is one of the nice things about multi-echo. You can treat it like any old data set.
So you have the single echo uncleaned. We have the motion correction, white matter regression, and CSF normalization. We have FIX in its soft mode. We have FIX in its aggressive mode. I don't know what that means.
And then we have the multi-echo data optimally combined, but not cleaned, and then we have MEICA. So it turns out, in terms of tSNR, both the ME uncleaned and the MEICA give you SNR-- this is, by the way, a 1.5T scanner, which is relevant, and I'll explain why in just a bit-- between 200 and 250, 350.
Whereas FIX soft and FIX aggressive, while both better than single echo and motion plus white matter regression-- basically, the multi-echo methods are head and shoulders above, and you know why. Because you're attenuating the thermal noise, which is a major contributor to reducing SNR, plus you can handle the motion.
AUDIENCE: [INAUDIBLE]
PRANTIK KUNDU: The error bars are the standard error of the SNR over regions of the brain and across subjects.
AUDIENCE: [INAUDIBLE]
PRANTIK KUNDU: Well, I guess, whenever you have an increase in mean, generally, you're going to see an increase in standard deviation.
AUDIENCE: [INAUDIBLE]
PRANTIK KUNDU: Sorry, I mean, standard errors, scaled standard deviation. It's the same number of subjects. I think, even at the low end, you're still more competitive than the lower error bars of the other methods. The lower bound here is still higher than the lower bound here. And these are very significant differences. Sorry, I don't have the p values here.
And in the ADHD case, you have a similar pattern. And so you're pushing SNR to above 200. But an important thing is to look at what the degrees of freedom lost are. So increasing SNR is not surprising if you just remove all your components. But in this case, FIX is removing way more components for a lesser gain, whereas MEICA is consistently removing far fewer components for a greater gain.
And the kind of take home is if you did a group ICA on this 1.5T data set, in MEICA, you can actually find the hippocampal nodes of the DMN, which should surprise you if you've worked at 1.5T. Meaning that at 1.5T, you can actually map functional networks [INAUDIBLE].
All right, so I'm going to move on to this technique that we call independent coefficient regression or independent component regression. So most people who do functional connectivity do it on the basis of time series, so computing correlation between time series. We made this inference that ICA is an interesting transformation. Most people think of it as just a way to get maps, but it's not just a way to get maps. It's the rotation of the data to maximize sparse effects.
In ICA, it's the rotation of the spatial covariance to maximize the difference in kurtosis between components. Basically, what that means is you're turning the data set from time series that are auto-correlated with unknown degrees of freedom into a data set of independently distributed coefficients, basically numbers, with known degrees of freedom, which means this is much closer to something you can do statistical inference on. Because a best linear unbiased estimator requires that, if you're correlating two things, they should be random variables. And you should know how many degrees of freedom you have.
So when you're correlating time series, you don't know that. Because you're dealing with auto-correlated time series. You don't know what the number of underlying latent processes is. So that basically means that at subject level, you can't do connectivity with inference, which you should know. That's why we threshold at R of 0.5, 0.3, 0.8-- nobody knows. You don't know because you don't have degrees of freedom.
All right, so ICR is this idea that, OK, since we're doing ICA anyway, let's look at what the coefficient space actually looks like. So let's say I want to compute connectivity. I have-- let's take a vector. This is my seed area, and I have a coefficient set.
If I look at a target area, this is another coefficient set. These are all the BOLD components. These are the weights of the BOLD components at two voxels. You can see it in the target. I'm going to then compute the Pearson correlation between these two, which is that. Then I'm just going to do a Fisher z-transform, which you guys might be familiar with. It's just the arctanh of r.
But importantly, I'm not going to drop the degrees of freedom, too. Normally, when people do functional connectivity analysis, they just do arctanh of r, and they drop the standard error, because they don't know what to do with it. So they pretend it doesn't exist. But it was there in the 1920s paper, and so it exists. It's a thing.
So if you model the number of BOLD components as the degrees of freedom for correlation, you get well-conditioned connectivity analysis with inference. So at a subject level, you get a z value, which is basically quantitative connectivity analysis, but quantitative on the basis of the statistics. And it's normalized against the degrees of freedom, and you get a p value.
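A hedged sketch of that seed-to-target computation, as I read it (not code from the paper): correlate the two voxels' BOLD-component coefficient vectors, Fisher-transform, and let the number of BOLD components supply the degrees of freedom, so the standard error and a p value come along rather than being dropped.

```python
import numpy as np
from scipy import stats

def icr_connectivity(seed_coeffs, target_coeffs):
    """Independent-coefficient-regression style connectivity between two voxels.

    seed_coeffs, target_coeffs : arrays (n_bold_components,) of the ICA
    coefficients (BOLD component weights) at the seed and target voxels.
    """
    n = len(seed_coeffs)                              # degrees of freedom
    r = np.corrcoef(seed_coeffs, target_coeffs)[0, 1]
    z = np.arctanh(r) * np.sqrt(n - 3)                # standardized Fisher z
    p = 2.0 * stats.norm.sf(abs(z))                   # two-sided p value
    return r, z, p
```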
So what does that look like in a single subject? So this is AFNI InstaCorr, which you might be familiar with. This is a single data set, 10 minutes. I will now rove around, and I set my connectivity threshold based on a p value. So I have some concept of inference.
Now I'm at p of .01. It's clean. So if you go to visual cortex, at some point I'll get to the pulvinar over here. Wait for it. There we go. If you see it back there-- so basically, I'm jumping around and setting seeds. And I can get pretty robust connectivity analysis. Great. So you can do that at the level of a single subject.
So in the PNAS paper, we kind of demonstrated this for just a couple of representative data sets-- one with high motion, one with low motion. Moral of the story is, you have much more consistent connectivity maps, even between high motion and low motion because you've controlled for the number of degrees of freedom.
In one of the prior slides, I drew attention to it: as you have more motion, you have decreasing degrees of freedom because you're losing sensitivity. Because as your brain is moving through a magnetic field, your sensitivity to small changes in relaxation goes out the window. So you have massive information loss. Unless you control for that, all your analyses are going to be biased.
So what the degrees of freedom represent is the information content in the data. And you condition your correlation analysis by the amount of information in the data using a really simple framework, which is the Fisher transformation. And at the end of the day, because you're doing it on sparse coefficients, these are statistically well-conditioned. They don't have voodoo auto-correlations to deal with.
All right, at a group level, what does that translate to? So in cortical areas, like the DLPFC, you get clean sparse maps compared to time series correlation, which gives you more diffuse maps. In the contralateral cerebellum, you've got the motor cortex. You can check the PNAS paper for a more detailed description.
But the point is, with an ICC test, if you do this over a group, and you want to check for consistency, you have to both check for consistency and specificity. So how similar maps of the same seed look to each other, but also, how similar maps of different seeds look to each other. Maps of different seeds should look different from each other. And so the ideal ICC in that case is 0.
Whereas maps of the same seed should look similar to each other. The ideal ICC is 1. And in the MEICA analysis and the ICR analysis, that's much closer to being achieved. With Valerie Voon at Cambridge, we've acquired, I think, well over 200 data sets. These are group-level, one-sample t-test maps, again, simple statistics. This is not, like, Bayesian or anything. Not that there's anything wrong with that, but simplicity is nice.
In a seed-based connectivity of n equals 140, just using AFNI GroupInCorr, I seed at the PCC, and I get really nice hippocampus, accumbens, medial prefrontal. I have that superior frontal, which people tend to ignore, but it's actually there. It's nice-- caudate.
But we can do cooler stuff, too, in the same exact data set, at the same threshold. I see that at the inferior colliculus, I get [INAUDIBLE] and the thalamic nuclei corresponding to audition.
So how does this map on to development? So as you know, as you increase in age, vascular coupling changes. Motion, of course, changes. You have a situation where both for physiological reasons and logistic reasons, it becomes difficult to find consistent connectivity. So this is basically comparing conventional time course connectivity.
Across age, you have wildly different maps. These are just five representative data sets from a cohort that we did. And then even after you've denoised the time series, if you haven't computed connectivity on a statistically well-conditioned basis, and you haven't controlled for degrees of freedom, you see some differences from the uncorrected data, but it's still pretty sloppy.
But if you look at the same five subjects, the same five data sets, and you compute connectivity using ICR, you see they're incredibly consistent, which is reflective of the result that I showed you previously, where at the group level, you have both specificity and consistency.
AUDIENCE: [INAUDIBLE]
PRANTIK KUNDU: Across age. There may be amplitude differences, but basically, the pattern is the same. So again, it begs the question, what's actually happening with development? We're also not wildly changing with age.
Consistency is good. Some aspect of consistency is good. Based on these images, you could maybe argue that long distance connectivity is increasing with age. We're working on a paper that shows that right now based on this cohort.
All right, so tasks. Going back to basics, what if you have a block design task? Simple stuff-- mental versus physical. So the task here is mentalizing. This is done by Mark Lombardo, who is big in autism. You might know the name.
The mentalizing task is basically a theory of mind test. You're doing some introspection into another person versus judging their physical qualities. The other task is just listening to stories with socially relevant content.
These time series give you an idea of how the denoising itself is working. So if I take an MPFC region, and from that region, I look at the time series from the 30 millisecond echo-- this is a multi-echo data set, but I could just take the middle data set and pretend the multi-echo is not there-- I see this time course. As you can see, it's very noisy, which shouldn't surprise you because it's in the dropout area.
But then if I look at the time series after just optimal combination, I see the block effect at a single voxel. If I then do the denoising, and like we did previously, if I split the BOLD from the non-BOLD signal, this is what the BOLD signal looks like. You see, in fact, a better fit to the block design, which is overlaid. And the non-BOLD data set actually is simple. You actually see what it is. It's a drift, plus a high frequency oscillation with a couple of spikes.
And if I look at the PCC, which is nominally pretty robust, in the TE 30 millisecond data, you can see that it's a block design. But there's kind of this mess at the beginning. I don't know what to do with that, right? Maybe it's real, maybe it's not.
So if I look at the optimally combined data, it's not so different from the TE equals 30 millisecond data, because this is an area where the signal's already pretty good. But then if I look at the BOLD signal, I actually can recover the block process, even at the beginning. And I see, very simply, that the non-BOLD process is just a drift. But it's a nonlinear drift that has kind of a weird shape.
So in a sense, it's even simplifying the noise. You actually can see the noise clearly for being a combination of simple processes. But the noise is non-stationary. It changes depending on where you're looking. Unless you know that structure, it's very difficult to denoise it because you run out of degrees of freedom. You can throw 30 noise regressors at it. At the end of the day, what do you have left? Nothing. You can't do any inference anymore.
All right, so just in terms of what do the activation maps look like, here's the self versus other task. Here's the stories task. From the MEICA data, here's just optimal combination plus regression. Notably, the optimal combination is already taking advantage of the multi-echo stuff.
The activation patterns are pretty similar. The MEICA is a little bit more robust, but where do we get a benefit? We get a benefit by looking at the means and standard deviations. So here's an ROI analysis of the mean data, the mean percent signal change across a whole bunch of different regions.
So you have cerebellum here. You have inferior temporal. You have DLPFC, and you have three different denoising techniques. One is MEICA. One is GLMdenoise, which you may have heard of. And the other is just the optimal combination plus the motion regression.
I see that across all these regions, the means are actually pretty similar. Great. Look at the standard deviations. In MEICA, the standard deviation is half. That means the entire subject variability has been cut in half. This happens for both the self-other task and the stories task.
What does that mean? If you look at power for your studies, to get 80% power, you need half the sample size. So in a region like VMPFC, which is kind of noisy, to get 80% power, you would need 30 subjects. With MEICA, you need 15. To get something in the cerebellum, even with multi-echo, which is already giving a lot of advantage, you'd need 70 subjects. With MEICA, you can do 30.
I'm not saying that a low n is a good thing. But if you have a low n, you can do more with it if you have MEICA. This also happens for the stories task.
In the NeuroImage paper, we're kind of struggling with the reviewers, because they're like, this is very similar to what you've done before. And we're like, no, it's completely different. That's not the point.
But we try to express it in dollar savings, and the dollar-savings point is really important. Because as we're moving to an environment where everything is a U grant, and you need, like, five sites and 10,000 subjects-- because they're thinking, OK, the only way we can increase power is by having armies of subjects-- the reality is, small labs don't operate like that. Junior PIs a lot of times don't have access to that. You must have a method that is sensitive at reasonable sample sizes.
So what does this also mean? That means if you do have a big data set, that means you can do more with it. So that means, because you have more power, basically, you can have much more complicated design.
AUDIENCE: [INAUDIBLE]
PRANTIK KUNDU: OK, so that's VMPFC. So you got VMPFC, cerebellum-- I think the VMPFC is--
AUDIENCE: [INAUDIBLE]
PRANTIK KUNDU: Oh, yeah. Let me see if I have-- I didn't make the whole slide. I can show you offline as well. We also have implemented this, in combination with Valur Olafsson and Tom Liu at UCSD, as well as, now, CMRR, Essa Yacoub. So we have now both the Siemens and the GE fronts pushing on multi-echo. We can do multi-band four. I think we can go up to multi-band six. Even with GRAPPA 2, TR goes down to 980 milliseconds. You can push it farther. We can talk about it offline. But more time points is not necessarily better.
But anyway, the same thing happens. You denoise the data. Your BOLD SNR goes up. Great. So just to kind of get to your question, 7T is a good example of bad susceptibility artifacts, because you dephase a lot faster because the entire decay process is faster. In a 4-TE data set, we have multi-band 3, 4 TEs of 9 going to 50. Look at the signal. You don't have egregious dropout. It actually looks like a whole brain.
AUDIENCE: [INAUDIBLE]
PRANTIK KUNDU: Yeah, exactly. So we can shorten it further. When I was messing around with multi-band, I didn't fully understand it, so I wasn't comfortable with really pushing it. But I know we can push it harder now. Same trick with the denoising-- the raw data minus the non-BOLD space, actually a spectrally not-so-complicated non-BOLD space, gives you clean data, a single frame of which you can see the DMN in, within 1.8 seconds.
So you want to talk about states, you got states. Just from a bare minimum decomposition, it turns out you can actually parcellate the [INAUDIBLE], including caudate, putamen, accumbens, thalamus, kind of posterior putamen and, of course, the DMN. But this is just from a single 10-minute data set. So in terms of subcortical areas and dropout, it's not a direct example, but--
All right, so how are we using this clinically? We're using it in epilepsy. So an important point is that we've been constrained by the filtering that we've been using. Nobody really knows it, but the reason why you guys use 20-second blocks is because that's the spectral envelope where you have the least amount of noise.
You go farther to the left, and you go to slower stuff. You have dominant, slow frequency drifts. You go farther to the right, you have dominant high frequency fluctuations. You actually experiment the way you experiment because you're forced to do it that way. You don't know it, but that's basically what it is.
Event-related designs are a little bit better, because you step out of the stationary spatial spectral fingerprint. It's closer to random sampling, so you're a little bit less sensitive to stationary frequency stuff. But at the end of the day, that's why you do what you do.
Just to finish up quickly, epilepsy is really cool, because physiologically, in EEG, you have two really interesting aspects of epilepsy or, frankly, any lesional pathology. When you have lesioned cortex, what happens is your EEG gets slower. That's actually a function of neural tissue damage. But then in certain types of epilepsy, alongside the slowing, you get complex discharges, like sharp waves and spike trains. And so these can be modeled as high frequency.
So you have, actually, two aspects of the data. You have a dominant slowing, and you have a high frequency component. High frequency because you're explaining complicated shapes. So do we see that in epilepsy patients? Actually, surprisingly, we do. So we're scanning lesional and nonlesional epilepsy patients with multi-echo multi-band at 7T. Believe it or not, the IRB has said OK to this.
We look at the BOLD spectral fingerprint, and we average the FFTs of all the component time series from the patients versus all the component time series from the controls. We see extremely significant increases in patient high-frequency BOLD and also low-frequency BOLD. It's anecdotal, but you can see it just in these time series.
AUDIENCE: [INAUDIBLE]
PRANTIK KUNDU: No, these are unmedicated. These are just recently diagnosed. They're actually refractory. Some of them have been medicated, but the medicines haven't been working. So they get taken off because they still have epilepsy. And so this is done in collaboration with the [INAUDIBLE] Epilepsy Center.
And even for dealing with lesional cortex, we're having some luck in finding aberrant networks. This is a map of a single network that has this bizarre, kind of one-sided hippocampal temporal network, but that's completely overlapping with a venous cavern in the hippocampus, which the epileptologists are associating as the seizure focus.
And even in tumor patients, we're basically using an altered all-correlation approach after denoising. We're getting some luck in terms of mapping functionally eloquent regions within tumor tissues, which is a major problem in epilepsy, because you don't want to resect more than you have to. But you don't want to be too conservative, because then you don't actually deal with the disease. I just threw this in because it's pretty, but we can talk more about it later.
So the thing I want to leave you with is, I would strongly encourage starting to think a little bit more about the designs that you guys use, that cognitive scientists use. Because my opinion is that there is a lot more hiding, a lot more information in the signal, than is conventionally assessed.
And Jen Evans from NIH tried this experiment, where basically, you want to say, OK, what kind of kooky experiments can we come up with? What kind of slightly off-the-cuff experiment designs can we play with?
So here's just a parametrically-varied block design. This is a combination of a long block. There's an amplitude modulation on that. And the most interesting thing is actually the sigmoid. So can you vary a task sigmoidally and see a sigmoidal change in BOLD? The task, in this case, was slowly varying visual contrast. Flashing checkerboard, vary the contrast slowly. You would expect to see a slow change that follows the contrast.
This is important for two reasons. One, we expect drug response to be modulated in a slow, non-linear, sigmoidal fashion, and two, neuronal adaptation or habituation. I would expect a habituation signal to look something like this. But if you filter your data to remove low frequencies, you've lost a major component of that. And so now, you're kind of in the weeds. You're filtering out the process, and you're in the weeds.
So this gives you, I think, an opportunity to look at complex waveforms that correspond to complex neural processes. Just as a proof of concept: for the sigmoid task, this is what the raw fMRI signal from V1 looked like. After conventional filtering, it looks like this, so not very different from the original data, and there's no activation in visual cortex from the fit of the sigmoid to the data.
In contrast, once you split the data, you see the same thing that was happening previously. You actually see a relatively simple noise process: it's just a drift. But the drift interferes with the actual BOLD signal, so the drift plus the sigmoid equals a flat line. When you isolate and remove the non-BOLD source, which is just the drift, you actually recover the sigmoid. And then when you see where it maps to, it's the [INAUDIBLE].
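As a toy illustration of that drift-cancels-sigmoid point (this is not the actual analysis; every number and signal here is invented), something like the following shows both the problem and the fix:

```python
# Toy simulation: a slow sigmoidal response plus an opposing scanner drift looks
# nearly flat, and high-pass filtering removes the sigmoid along with the drift.
# Subtracting the drift itself (standing in for an ME-ICA-identified non-BOLD
# component) recovers the sigmoid. All parameters are made up for illustration.
import numpy as np
from scipy.signal import butter, filtfilt

tr = 1.0
t = np.arange(600) * tr
sigmoid = 1.0 / (1.0 + np.exp(-(t - 300.0) / 40.0))      # slow task-locked response
drift = 1.0 - t / 600.0                                    # slow downward scanner drift
data = sigmoid + drift + 0.1 * np.random.randn(t.size)    # raw trace: roughly flat

# Conventional route: high-pass filter at 0.01 Hz. The drift goes, but so does
# most of the sigmoid's very-low-frequency energy.
b, a = butter(2, 0.01 / (0.5 / tr), btype='highpass')
filtered = filtfilt(b, a, data)

# Decomposition route: remove the identified drift component, then fit the sigmoid.
recovered = data - drift
regressor = sigmoid - sigmoid.mean()
beta_filtered = regressor @ (filtered - filtered.mean()) / (regressor @ regressor)
beta_recovered = regressor @ (recovered - recovered.mean()) / (regressor @ regressor)
print(beta_filtered, beta_recovered)   # recovered fit is near 1; filtered fit is attenuated
```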
So we are now messing with online anesthesia modulation to see what happens in BOLD as you change anesthesia. We have a really cool monkey rig, where we have an eight-channel custom phased-array coil, where the loop transmit is a [INAUDIBLE]. That's his brain. And now we're doing super-long scans, so 60-minute scans with multi-band, multi-echo. We modulate the anesthesia parametrically over the scan, and then we monitor tissue isoflurane; basically, exhaled isoflurane is an approximation to tissue isoflurane.
The gray trace that you see is a BOLD time series after denoising. As you slowly increase the anesthesia, you see this oscillatory process that starts, kind of gets wonky, and then damps. Then as you turn the anesthesia back down, you actually recover the oscillation.
If you take this gray time series and pass it through a continuous wavelet transform and pick an intermediate-scale coefficient, you actually get this curve. And that's important, because a wavelet decomposition is actually a kind of deconvolution. When you transform the time series from an oscillatory domain to a wavelet coefficient domain, you get a trace that looks pretty damn similar to your tissue isoflurane concentration, which is the blue. It's a very significant correlation.
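A rough sketch of that wavelet step, using PyWavelets. The choice of Morlet wavelet, the scale range, and taking the coefficient magnitude as the "curve" are my assumptions about what was described, not details from the talk.

```python
# Sketch: continuous wavelet transform of a denoised BOLD trace, take one
# intermediate scale, and correlate its coefficient magnitude with the measured
# isoflurane trace. Variable names and parameters are hypothetical.
import numpy as np
import pywt

def intermediate_scale_trace(bold_ts, scales=np.arange(2, 65), wavelet='morl'):
    coefs, _ = pywt.cwt(bold_ts, scales, wavelet)   # shape: (n_scales, n_timepoints)
    mid = len(scales) // 2                          # pick an intermediate scale
    return np.abs(coefs[mid])                       # coefficient magnitude over time

# r = np.corrcoef(intermediate_scale_trace(denoised_bold), isoflurane_trace)[0, 1]
```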
So anyway, this is stuff we're working on. We're in an R21 submission process with this guy. I think it should be interesting. There are other papers that my collaborators at the [INAUDIBLE] lab and at Cambridge are working on. EU-AIMS, which is the EU autism collaboration, is multi-site, multi-echo, but single-band. NSPN, which is another UK-based development project, is using this as well. These are on the order of several hundred subjects each.
We're working with the CMRR, with [INAUDIBLE], to try to get this into the Lifespan project, but we're getting some resistance from Steve Smith. So there's an ongoing multi-band versus multi-band multi-echo debate, which is fun.
And just as a conclusion, long story short: multi-echo and multi-echo multi-band enable at least a doubling of SNR. It's pretty simple; that's what it comes down to. Because you're sampling more data, and that data conforms to a physical model, you can use it in an interesting way.
But it kind of goes beyond just going faster. You can apply this in task. You can apply it in resting state. You can apply it across field strengths, from 1.25T in human to 11.7T in mouse. It works the same way because the physical process is the same.
This is uniquely suited to denoising where filtering is undesirable or you don't have a template. If you are HCP Lifespan, and you're scanning from age 6 to age 80, are you going to have a single fixed template for the whole thing? No, you're not. Are you going to have different fixed templates for different segments? You're not going to have that either, because then the only thing you're looking at is your own fixed template.
Once you start passing your filter over the data, what you eventually start seeing is your filter. You have to denoise in a way that's independent of the data. And at the end of the day, that's what this multi-echo stuff gives you. It's a different domain of data, a different domain of information that's informed by the physics. And if you incorporate it into your pipeline, I think it gives you good stuff.
Meica.py, as some of you know, is freely available. It's a work in progress. It works, but it's in progress. So feel free to email me with bugs, and I will do my best to respond in a timely fashion.
And I've had lots of help from a lot of people. Wenming taught me everything I knew about physics, basic MRI stuff. And that was mainly by just going to his office and harassing him as a grad student. So do that to all your professors. And OK, that's it. Thank you.
[APPLAUSE]
Questions?
AUDIENCE: [INAUDIBLE] multi-echo MRI can be applied to [INAUDIBLE]
PRANTIK KUNDU: So I'm working with Cameron Craddock on this. So yes, but I think there are a lot of interesting ways of doing it. I've been talking about ICA, but the multi-echo TE-dependence analysis can be handled in a much simpler way, which is just: as you get an incoming volume, do the linear TE-dependent and TE-independent fits per volume.
So when Glover did this, and [INAUDIBLE] and [INAUDIBLE] did this, they discovered that doing these fits on a time series basis-- like on incoming real-time data-- would be noisier. However, that's actually not the worst thing in the world, because you would have increased thermal noise in the data, but your contrast-to-noise is not low. A block effect would show through. And a lot of the motion-mediated artifacts would actually be handled in the T2* decay modeling that you do in real time.
So we're doing this with Cameron Craddock right now. There are other ways of doing this. Another way of handling it is getting a training data set, finding the artifacts or, alternatively, finding the BOLD things. And then as images come in, project each image onto your basis. You can get a real-time time course per network. Does that make sense? We can talk about it later, but yes, you can do it.
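To make the per-volume idea concrete, here is a minimal log-linear fit of the monoexponential decay model S(TE) = S0 * exp(-TE * R2*) to one incoming multi-echo volume. The shapes and echo times are hypothetical, and this is a sketch of the general fit, not the specific real-time implementation being built.

```python
# Minimal per-volume fit: log S(TE) = log S0 - TE * R2*, solved by least squares
# for every voxel of one incoming volume at once. Shapes and TEs are hypothetical.
import numpy as np

def fit_s0_r2star(echoes, tes):
    """echoes: (n_echoes, n_voxels) signal for one volume; tes: echo times in seconds."""
    tes = np.asarray(tes, dtype=float)
    logs = np.log(np.clip(echoes, 1e-6, None))           # guard against zero signal
    X = np.column_stack([np.ones_like(tes), -tes])        # design: [1, -TE]
    beta, *_ = np.linalg.lstsq(X, logs, rcond=None)       # beta shape: (2, n_voxels)
    return np.exp(beta[0]), beta[1]                       # S0 map, R2* map

# Toy example: three echoes at 15/30/45 ms for two voxels.
# s0, r2s = fit_s0_r2star(np.array([[1000., 800.],
#                                   [ 700., 500.],
#                                   [ 500., 320.]]), tes=[0.015, 0.030, 0.045])
```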
AUDIENCE: So you talked about how, with the increased signal-to-noise, you're able to have statistically powered smaller data sets. Do you see this translating to single-subject clinical applications as well?
PRANTIK KUNDU: That's the whole motivation of my work.
AUDIENCE: Are we there? Is this it, or is it going to be--
PRANTIK KUNDU: It's unclear. First of all, I hope it didn't come off as a sell. The point is that this data is really rich and complicated. But at the same time, there's a lot of structure in it. And so we need to understand it better.
I opened with the slide that showed all the stuff we do to the data to clean it up. We're going to get to the translation step when we get a model of the data that fully explains all of the data. The reason why we can't translate just yet is because we haven't been able to explain, roughly, 100% of the data.
The papers that I'm working on actually show you how you can segment your entire signal space into thermal noise, non-BOLD, and BOLD. Once you know that, especially once you know what thermal noise is, then you can make a clear suggestion or recommendation to a clinician saying, hey, this is my noise floor. This is my signal space. If my signal space is much bigger than my noise floor, this [INAUDIBLE].
So at the end of the day, translation is predicated on that ability to give a guarantee. If you make a recommendation to a surgeon, it's somebody's life, right? So you have to have some sort of guarantee saying, OK, I guarantee you, within a certain probability, that this will be OK. Once we can do that, we can translate. But to do that, we actually need to do something really technical, which is understand our thermal noise structure.
AUDIENCE: I was interested in the discussion of regression and ICA regression. My understanding of regression is [INAUDIBLE]. I'm wondering if you could just go to the next level of detail down and just say-- the degrees of freedom that we've been ignoring, is this related to the number of regressors, or the width of the vectors, the number of components? Is there some easy way to understand the degrees of freedom in a regression setting?
PRANTIK KUNDU: It's complicated. So there are degrees of freedom for correlation, and there are degrees of freedom in the time series. And we had this debate for a while until I kind of coalesced on an opinion. If you acquire 500 time points, you actually have 500 degrees of freedom. So that's degrees of freedom at the level of a time series.
So if you're fitting a block design to that, and you're regressing maybe six motion parameters, you have 500 minus 6. And if you're filtering out some low frequency and high frequency-- 500 minus 20.
AUDIENCE: 10 minus the [INAUDIBLE].
PRANTIK KUNDU: Right, exactly. However, when you start correlating one time series against another, what's your degrees of freedom there? It's actually the number of latent processes that are driving the two signals, because you know you have some unknown number of latent processes that are interacting to produce your target time series. So for correlation-- I mean, it's ironic-- but to do correlation, you must have a full decomposition.
To compute the correlation between any two regions, you must have a full decomposition to know what the space of possibilities is; that's how you get the degrees of freedom. And it turns out that the number of BOLD components, from the stats-- and it seems like this is actually borne out by the data-- is equal to the number of degrees of freedom for correlation. But the degrees of freedom for correlation are different than the degrees of freedom in the time series. They're both called degrees of freedom, but they're different.
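Purely as bookkeeping for the two notions just described: the "correlation DOF equals the number of BOLD components" rule is the speaker's claim, taken here as an assumption, and every number below is illustrative only.

```python
# Two different "degrees of freedom": residual DOF of a time-series regression
# versus an effective DOF for correlation, here taken (per the claim above) to be
# the number of BOLD components. Numbers are invented for illustration.
import numpy as np
from scipy import stats

n_timepoints = 500
n_regressors = 6 + 14                         # e.g. motion plus frequency regressors
dof_timeseries = n_timepoints - n_regressors  # 480

n_bold_components = 40                        # hypothetical ME-ICA BOLD component count
dof_correlation = n_bold_components

# Significance of an observed correlation r under each DOF assumption:
r = 0.3
for df in (dof_timeseries, dof_correlation):
    t = r * np.sqrt((df - 2) / (1.0 - r**2))
    p = 2 * stats.t.sf(abs(t), df - 2)
    print(df, p)                              # the correlation DOF gives a much larger p
```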
AUDIENCE: So we pick some number of components, and then verify that it accounts for 99.999% of the variance. Then call that the number of [INAUDIBLE].
PRANTIK KUNDU: Right, but the thing is, there is a number of degrees of freedom in the total signal space, and with MEICA, you can get to, like, 97%, 98% of the variance explained, which is pretty high actually. Just as an example, MELODIC, if you're lucky, explains 60%. So you're actually not explaining 40% of your variance in the average MELODIC decomposition.
But then the amount of variance that's actually explained by BOLD-- you'd be shocked, but it's about 5% to 10%. If you model drifts, drifts explain huge amounts of variance. Of all the signal you acquire, about 5% to 10% of that variance is actually useful.
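A rough way to do that variance bookkeeping from an ICA decomposition, ignoring cross-terms between components; the names and shapes are mine, not from the talk.

```python
# Approximate fraction of total data variance carried by each ICA component,
# given data ~= mixing @ components. Cross-terms between components are ignored,
# so this is a rough accounting, not an exact decomposition.
import numpy as np

def variance_fractions(data, mixing, components):
    """data: (voxels, time); mixing: (voxels, k); components: (k, time)."""
    total = np.var(data)
    fracs = np.empty(components.shape[0])
    for k in range(components.shape[0]):
        rank1 = np.outer(mixing[:, k], components[k])   # this component's contribution
        fracs[k] = np.var(rank1) / total
    return fracs

# Summing fracs over BOLD-classified components versus drift/non-BOLD components
# gives the kind of ~5-10% "useful variance" figure quoted above.
```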
AUDIENCE: Are there certain categories of questions to which ME is better suited?
PRANTIK KUNDU: Yeah, if you're interested in psychiatry. Because if you're interested in limbic processes, and you have a big hole in the orbitofrontal cortex, and your amygdala is kind of half-atrophied away, you can't do much science there. So basically, anything that really depends on hippocampus, amygdala, orbitofrontal cortex, striatum, brain stem, and the bottom of the cerebellum.
Cognitive people-- if you're focused on attention, you might get away with it. But even then, it's unclear. And it helps with anything where you're even close to the dropout areas. That's the argument that we make for the Lifespan project, or anybody.
That's been the annoying part of the argument that we've been having. Guys, this is free. You don't need new hardware. You don't need to manipulate your gradients. You don't need fifth-order shim. Just acquire a few more TEs and combine them, and you'll get lots more regions than you were previously getting.
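"Acquire a few more TEs and combine them" can be as simple as a T2*-weighted sum of the echoes. This sketch uses the commonly cited TE * exp(-TE/T2*) weighting, which may differ from any particular pipeline's exact choice, and all names are hypothetical.

```python
# Weighted combination of echoes using per-voxel T2* estimates. echo_data has
# shape (n_echoes, n_voxels, n_timepoints); t2star is (n_voxels,), e.g. from the
# log-linear fit sketched earlier. Weighting: TE * exp(-TE / T2*), normalized.
import numpy as np

def combine_echoes(echo_data, tes, t2star):
    tes = np.asarray(tes, dtype=float)[:, None]        # (n_echoes, 1)
    w = tes * np.exp(-tes / t2star[None, :])            # (n_echoes, n_voxels)
    w = w / w.sum(axis=0, keepdims=True)                # normalize weights per voxel
    return np.einsum('ev,evt->vt', w, echo_data)        # weighted sum over echoes

# Voxels whose signal has decayed away at the longer TEs are automatically
# down-weighted toward the shorter echoes, which is where the gain in
# dropout-region coverage comes from.
```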
AUDIENCE: [INAUDIBLE]
PRANTIK KUNDU: Sure, anything on the dopamine circuit, anything involving brain stem, striatum, caudate, definitely.
AUDIENCE: So I was talking to Tom Power this last spring, and I asked him his take on the multi-echo denoising, specifically as it relates to motion and motion artifact. And he said it was a lot better, but he said, it's like chopping the trees off the mountains. You're still leaving the mountains, which are [INAUDIBLE] the effect of motion in the data. [INAUDIBLE] you're suggesting that the picture, in terms of cleanup and motion, isn't [INAUDIBLE]
PRANTIK KUNDU: Sure, I'll comment on that. So he and I have been having an ongoing discussion over the past couple of years. And there's a point that we agree on, and there's a point that we disagree on.
The data that I showed you here was mostly on connectivity; note that I didn't really show much on time series correlation. The one time series correlation slide that I did show showed that, even with denoising, it kind of looks a mess. That was that development slide, so it's this thing.
So this is MEICA denoising. And then you do time series correlation. You see, some data sets are pretty messy. So the question is, why this is. And he sees this, too, and he's like, OK, you've chopped the mountain. You've chopped the trees off the mountain, whatever.
So the point is, my contention is, actually, the problem is a much more complicated one based on the statistical structure of the data. There's two things. One is the thing we talked about this morning. There's another aspect to the story that we'll try to get into tomorrow.
The problem is-- and I alluded to it earlier-- if you have a large amount of motion, you have a head traveling through space. The BOLD signal depends on homogeneous magnetic fields, and MRI is based on slice selection in graded magnetic fields. If you have a voxel moving through a graded magnetic field, any inference you can make on susceptibility, like small changes in dephasing, goes out the window.
So what happens is, as the brain moves, you incur two types of artifacts. One type of artifact is a spin density change. That's a positive signal change; sometimes it's seen as spikes. And the change in signal is because you're basically moving from one gray matter voxel to another, where one has higher intensity than the other. If this is an acquisition grid, this is my brain, and I'm moving like this: you have a fixed grid, but the tissue is moving. So you get all sorts of fluctuations.
But once you've stripped off the S0 modulations, you're left with the reality that as you were moving, your dephasing was a lot faster. What I'll show you tomorrow is that, latent in the data, you have big deflections, and they're actually negative deflections. And in moments of really high movement, the contrast actually inverts.
OK, so he and I agreed that motion, if you look at just the time series and do time series correlation, is still a problem. But it's not a problem because we haven't handled motion artifact in the way that many people think about motion artifacts. People think about motion artifacts as some sort of additive signal.
It's not just an additive signal. It's additive signal plus information loss. That information loss is conveyed as the signal intensity drops, and your S0 decreases. So that's where we agree.
Where we disagree is how we want to deal with it. He wants to deal with it-- basically, the whole global signal argument is, OK, if we just regress it out, everything will be peachy keen. My argument is, no, it's not. Yes, OK, if you regress it out, maybe your maps look a little bit better, but that's not addressing the problem. The problem is that we actually have unaccounted-for information loss. And if you want to do time series correlation, you can't do time series correlation on autocorrelated time series.