As I described in my last post, we can incorporate almost any physical effect into the calculation of our reference information in MRF. Since the beginning, we've found it incredibly beneficial to represent these data as discretized dictionaries, even if we're doing some other kind of optimization in an outer loop. I often get asked/criticized about this in various forms:
“Dictionaries take forever to calculate…”
“They have limited resolution so you can’t get an exact solution…”
“They take up too much memory…”
And these are legitimate questions, but the reality of development over the last 10 years has shown that compact dictionaries are not only easy to calculate and work with, they actually outperform methods like non-linear fitting in almost every respect, to the point where we now use dictionaries for almost everything we do outside of MRF as well.
Traditional Iterative Methods
But let’s start with the historical gold standard: non-linear fitting methods like Simplex minimization or the Levenberg–Marquardt algorithm. For decades, these were the only option for solving even medium-scale non-linear problems, like non-linear least-squares fitting, since computers didn’t have enough RAM to store the entire solution space and the concept of compressed sensing hadn’t yet changed the way we compute. These methods all share a basic set of procedures: they start from one or more initial guesses for the solution, and from there the best-guess solution moves downhill through the solution space towards a minimum. The difference between the methods is largely in how those updates are calculated.
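To make this concrete, here is a minimal sketch of what that workflow looks like, using SciPy’s Levenberg–Marquardt solver to fit a mono-exponential T2 decay. The signal model, noise level, and initial guess are made up for illustration; the point is just that the solver starts from a guess and iterates for an unpredictable number of steps.

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical mono-exponential decay model: S(TE) = M0 * exp(-TE / T2)
def residuals(params, te, signal):
    m0, t2 = params
    return m0 * np.exp(-te / t2) - signal

# Simulated "measured" data (values made up for this example)
rng = np.random.default_rng(0)
te = np.linspace(10.0, 320.0, 32)                 # echo times in ms
signal = 1.0 * np.exp(-te / 80.0) + 0.01 * rng.standard_normal(te.size)

# Levenberg-Marquardt starts from an initial guess and iterates until convergence;
# a poor guess can slow convergence or bias the answer toward a local minimum.
x0 = [0.5, 200.0]
fit = least_squares(residuals, x0, args=(te, signal), method="lm")
print("fitted M0, T2:", fit.x, "| function evaluations:", fit.nfev)
```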
While these methods can solve difficult non-linear problems, they share a set of common limitations. First, they often stop at local minima and are not guaranteed to find the global minimum. This leads to a second problem: the result can be highly dependent on the initial guess, biasing it toward a particular local minimum without you ever knowing. They also suffer from the well-known curse of dimensionality, whereby they have increasing difficulty navigating the solution space as the dimensionality increases, requiring specialized methods for these higher-dimensional problems. For inverse problems, the entire forward model typically has to be recalculated at each iteration, and it’s incredibly difficult to share information between different iterations or calculation threads; you’re basically on your own for each and every calculation, no matter whether you already did similar math on the last iteration or the last pixel. Finally, these methods are incredibly difficult to parallelize efficiently on clusters and GPUs, since the algorithms require an unknown number of steps to reach the minimum and thus a significant investment in coordination between the different calculation threads. (And anyone suggesting a constant number of iterations in something like Levenberg–Marquardt has no standing to criticize resolution in dictionaries!)
Solve The Whole Problem with a Dictionary
Dictionary-based methods solve all of these issues. One of the biggest benefits to me is that dictionaries are almost like brute-force solutions. They calculate solutions for the *entire* space, so you’re guaranteed to get to the global minimum (or to the closest discretized solution) every time. There’s no bias from an initial guess. It doesn’t matter how ugly your problem is; you’ll still get the right solution in the same number of calculations even if there are tons of local minima or super-flat minima. You can also precalculate the dictionary, which can be especially beneficial for problems that are very expensive to calculate. That also means we effectively share the information calculated for one pixel with every other pixel in our image. And because we know exactly how many calculations we need to do for each image, we can trivially parallelize the problem even on inflexible GPUs or FPGAs.
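For comparison, here is what the brute-force dictionary match itself boils down to, written as a minimal NumPy sketch. The dictionary contents, sizes, and parameter grid are placeholders rather than a real simulated MRF dictionary; the key point is that every entry is tested, so the best match over the whole grid comes back in a fixed number of operations with no initial guess.

```python
import numpy as np

# Placeholder dictionary: n_entries simulated signal evolutions of length n_timepoints,
# each row paired with the (T1, T2) values that would have been used to simulate it.
n_entries, n_timepoints = 5000, 1000
rng = np.random.default_rng(0)
dictionary = rng.standard_normal((n_entries, n_timepoints))      # stand-in for simulated signals
params = np.stack([rng.uniform(100, 3000, n_entries),            # T1 grid (ms)
                   rng.uniform(10, 300, n_entries)], axis=1)     # T2 grid (ms)

# Normalize each entry so the inner product acts as a correlation measure.
dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)

def dictionary_match(measured):
    """Return the (T1, T2) of the dictionary entry with the largest inner product."""
    measured = measured / np.linalg.norm(measured)
    scores = dictionary @ measured          # one inner product per dictionary entry
    best = np.argmax(np.abs(scores))        # global best match, no initial guess needed
    return params[best]

# Usage: match a single pixel's signal evolution.
pixel_signal = rng.standard_normal(n_timepoints)
print("matched T1, T2:", dictionary_match(pixel_signal))
```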
Now this may result in large dictionaries if you want it to, but one of the key observations of the last decade is that these dictionaries tend to be highly compressible. This really is intrinsic to anything that has to do with MRI: finite T1 and T2 values tend to smooth out neighboring dictionary entries, and smoothness means compressibility. Even in some of our largest dictionaries, we typically get 50-100x compression, so the compressed dictionary fits completely in the memory of a modern GPU. Compressed sensing has also taught us that we don’t even need to explicitly calculate the large dictionary; we can use methods like randomized SVD to calculate the compressed dictionary directly.
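As a rough sketch of how that compression works (using a generic randomized SVD written in NumPy, not any particular MRF pipeline), the idea is to find a small set of basis vectors along the time dimension and keep only each entry’s coefficients in that basis. The rank and matrix sizes below are arbitrary placeholders.

```python
import numpy as np

def randomized_svd(A, rank, n_oversample=10, n_iter=2, seed=0):
    """Basic randomized SVD (random range finder + small exact SVD)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Sketch the column space of A with a random Gaussian test matrix.
    Omega = rng.standard_normal((n, rank + n_oversample))
    Y = A @ Omega
    # A few power iterations sharpen the subspace estimate for slowly decaying spectra.
    for _ in range(n_iter):
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)
    # Project into the low-dimensional subspace and do a small exact SVD there.
    B = Q.T @ A
    U_small, S, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U_small)[:, :rank], S[:rank], Vt[:rank]

# Compress a placeholder dictionary to rank k and keep only the projected entries.
dictionary = np.random.default_rng(1).standard_normal((5000, 1000))
k = 25
U, S, Vt = randomized_svd(dictionary, rank=k)
compressed_dictionary = dictionary @ Vt.T     # shape (5000, 25): ~40x smaller along time
```

Because the randomized sketching step only ever needs products with the dictionary, the same idea can be applied block by block to signals simulated on the fly, which is in the spirit of the point above about never forming the full explicit dictionary.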
All of this directly relates to the curse of dimensionality. Unlike iterative methods that get lost in higher dimensional spaces, dictionaries tend to become more and more compressible as the dimensionality increases. So while the memory to store a full explicit dictionary increases exponentially with the dimensionality, practically we’ve seen that memory only goes up approximately linearly with the dimensionality when dealing with things like MRF dictionaries. This means that we can calculate practical dictionaries for everything that we’re doing in MRF.
But we don’t even have to go through the whole dictionary every time. We can also take a tree-based approach, which we know from many other computational methods can be orders of magnitude faster than explicit calculation. As an example, the concept of fast group matching in MRF from Cauley et al. is a simple 2-layer tree calculation that can add another factor of 10 or more acceleration over compression alone. An initial pass across a set of template signal evolutions narrows the subsequent calculations to only the relevant parts of the dictionary. The second stage then performs a compressed dictionary match, but in this case the compressed dictionaries in the leaf nodes are calculated over only a part of the larger dictionary, resulting in even higher compression ratios. Even with a 2-layer tree we’ve seen factors of 100 or more in calculation speed, and going to more tree layers can add more orders of magnitude to the speedup.
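Here is a toy two-stage sketch in the same spirit (not Cauley et al.’s actual implementation): the dictionary is split into groups, a cheap first pass scores one representative per group, and the full match only runs inside the few surviving groups. The contiguous grouping and the mean-signal representatives are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n_entries, n_timepoints, n_groups = 5000, 1000, 50
dictionary = rng.standard_normal((n_entries, n_timepoints))       # placeholder signals
dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)

# Offline: split the dictionary into groups and keep one representative per group.
# Contiguous chunks are used here; in practice grouping would follow signal similarity.
groups = np.array_split(np.arange(n_entries), n_groups)
representatives = np.stack([dictionary[g].mean(axis=0) for g in groups])
representatives /= np.linalg.norm(representatives, axis=1, keepdims=True)

def group_match(measured, n_keep=3):
    """Two-stage match: pick the best groups first, then search only inside them."""
    measured = measured / np.linalg.norm(measured)
    # Stage 1: cheap pass over group representatives narrows the search.
    group_scores = representatives @ measured
    best_groups = np.argsort(np.abs(group_scores))[-n_keep:]
    # Stage 2: full (or compressed) match restricted to the surviving groups.
    candidates = np.concatenate([groups[g] for g in best_groups])
    scores = dictionary[candidates] @ measured
    return candidates[np.argmax(np.abs(scores))]

print("best entry index:", group_match(rng.standard_normal(n_timepoints)))
```

In the real version described above, the second stage would match against a separately compressed dictionary for each surviving group, which is where the extra compression comes from.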
Off the Grid
All of these methods allow us to be more flexible in how we think about a dictionary. If you want higher resolution, you just calculate it and compress. But before we do that, we can also use concepts like polynomial fitting or quadratic/parabolic interpolation in the solution space to find even more resolution and acceleration without increasing the memory load. In this case, we can take advantage of the fact that the cost function (or inner product in a template match) is almost perfectly quadratic in form near the global minimum. (You can show that this is true for most cases where your solution isn’t being dominated by one of your regularization terms… and if your regularization term is dominant, you probably need to reformulate your problem anyway!) That means we can do a rough dictionary match to get the approximate location of the global minimum. With this knowledge in hand, we can calculate the cost function/inner product for neighboring values as well. Based on these values, we can fit the cost function/inner product to a quadratic form using analytical equations that are trivial to solve. This gives us essentially continuous resolution throughout the solution space in a fixed, rapid calculation time that can be completely parallelized.
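In one dimension this reduces to the classic three-point parabola-vertex formula. The sketch below uses a made-up, locally quadratic match-score curve and an arbitrary T1 grid spacing just to show how the off-grid estimate falls out analytically from the best grid point and its two neighbors.

```python
import numpy as np

def parabolic_peak(x, y):
    """Given three equally spaced grid points x and their match scores y,
    return the x-location of the vertex of the parabola through them."""
    dx = x[1] - x[0]
    denom = y[0] - 2.0 * y[1] + y[2]
    # Vertex offset from the center point; reduces to 0 when the peak is exactly on-grid.
    return x[1] + 0.5 * dx * (y[0] - y[2]) / denom

# Example: a T1 grid with 20 ms spacing, with a (made-up) true peak at 1008 ms, off-grid.
t1_grid = np.array([980.0, 1000.0, 1020.0])
scores = -(t1_grid - 1008.0) ** 2          # locally quadratic match score for illustration
print("off-grid T1 estimate:", parabolic_peak(t1_grid, scores))   # ~1008.0
```

The same closed-form idea extends to higher dimensions by fitting a quadratic surface to the neighbors of the best grid match.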
Broader Uses of Dictionaries
We actually studied all of this outside of MRF by comparing dictionaries against non-linear fitting for perfusion modeling in the liver. We showed that dictionary methods performed as well as optimized non-linear fitting methods in terms of accuracy but were 140 times faster to calculate, even before parallelization on a GPU. Internally we continue to use dictionary-based methods more and more, especially as our ability to deploy these methods on GPUs increases.
At this point, I’ve been using iterative methods to solve non-linear problems for over 30 years. I’m really good at them. But today, across a wide range of areas, I would think twice before reaching for an iterative fitting method for almost anything except truly large problems that can’t be compressed or broken into chunks that fit in memory… and those problems are going to be very rare. I personally think all of us should get good at these kinds of methods. You can do great things with some compression and AI at this point! In any case, folks proposing to do non-linear fitting in quantitative MRI/MRF need to do some hard thinking as to why. What benefit are you actually seeing? Just because they’re traditional methods doesn’t mean they’re the best solution. The real world is quantum. I promise you’ll be fine on the grid!
As always, let me know if you have any thoughts or questions. Thanks for reading!
—Mark