Episode 83 with Dr. Jerry Wolinsky on composite clinical trial outcome measures
MS researchers have been looking to composite endpoints as more sensitive and meaningful measures of MS progression and disability for testing new drugs.
Note: Each podcast includes an interview with a thought leader or newsmaker in MS and related demyelinating diseases. Listen to it here. Alternatively, you may subscribe to the podcast via iTunes or your favorite podcast app. In iTunes, for example, click File/Subscribe to Podcast and then enter this URL: http://msdiscovery.libsyn.com/rss
The 2015-16 series of MS podcasts is supported in part by a generous grant from Sanofi Genzyme. The content remains the sole responsibility of the Multiple Sclerosis Discovery Forum, an independent non-profit news organization.
Host – Dan Keller
Hello, and welcome to Episode Eighty-three of Multiple Sclerosis Discovery, the podcast of the MS Discovery Forum. I’m Dan Keller.
For years, MS researchers have been looking for a measure of MS progression and disability that would be meaningful to clinicians, clinical researchers, patients, and the regulatory agencies that approve new drugs, such as the Food and Drug Administration. To this end, people have looked to composite endpoints that are sensitive to small changes in patient condition and comparable across studies. At the ECTRIMS conference last fall in Barcelona, I met with Jerry Wolinsky, MD, professor of neurology and director of the MS Research Group at the University of Texas Health Science Center at Houston, who leads us along the path to develop a useful measure incorporating composite endpoints.
Interviewer – Dan Keller
In terms of assessing progression and disability in MS, is there some advantage to having composite endpoints as opposed to the standard tests we’ve looked at?
Interviewee – Jerry Wolinsky
There are several different ways to think about composite endpoints. One of the things that was introduced almost two decades ago was the MSFC, the MS Functional Composite. It used three different ways of looking at different components of disability in patients with MS. One was a test of cognition. One was a test of fine motor skills in the upper extremities. And one was a test of walking ability, specifically walking speed. That particular composite looked very attractive. There was a fair amount of theoretical and practical work behind instituting the composite, and it was used in a number of trials. And it was based on some very important, I think, statistical analysis.
So what it allowed one to do was to take patients either in a given study or across studies and try to normalize the data that you would get from those patients into something called a z-score, which is a way of ranking and evaluating how far across the group of patients people were scattered. And then one could conceptually add up the z-scores and have a composite, a single number that you could use to analyze trial data. It seemed to be rather sensitive, and it seemed to work well. But the z-score is dimensionless, and it makes little sense to the practicing clinician, or certainly to patients, to know that you’re minus-two or minus-five or plus-two, and that maybe this has moved by two-hundredths of a point from the time you started in the study until you got to the end of the study.
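As a rough illustration of the z-score idea described above, here is a minimal sketch in Python. It is not the official MSFC scoring procedure: the field names (`walk_s`, `peg_s`, `cog_correct`), the reference values, and the choice to average rather than sum the z-scores (which differs only by a constant factor) are all assumptions made for the example.

```python
from statistics import mean, pstdev


def z_score(value, reference):
    """Distance of one value from the reference group's mean, in SD units."""
    return (value - mean(reference)) / pstdev(reference)


def composite_z(patient, reference):
    """Average three component z-scores into one dimensionless number.

    Timed tests are negated so that a higher z-score always means
    better function (a longer time is worse, more correct answers is better).
    """
    z_walk = -z_score(patient["walk_s"], reference["walk_s"])
    z_peg = -z_score(patient["peg_s"], reference["peg_s"])
    z_cog = z_score(patient["cog_correct"], reference["cog_correct"])
    return mean([z_walk, z_peg, z_cog])


# Hypothetical reference group and patient, for illustration only.
reference = {
    "walk_s": [4.0, 6.0, 8.0],
    "peg_s": [18.0, 20.0, 22.0],
    "cog_correct": [40.0, 45.0, 50.0],
}
patient = {"walk_s": 7.0, "peg_s": 21.0, "cog_correct": 44.0}
print(composite_z(patient, reference))
```

The printed result is a unitless number, which is exactly the interpretability problem raised in the interview: it ranks a patient against a group but says nothing directly about seconds, steps, or daily function.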
So, it was highly sensitive, seemed very reproducible, and was maybe even a way to look across studies at different results, but neither patients nor physicians nor, most importantly, the FDA thought that this would be useful in day-to-day practice. So, while we’ve tested that kind of approach in multiple studies, it just hasn’t worked. But it did set up the notion that we could get a little bit more quantitative in things that could be useful on a daily basis, even using some of the same components of that MSFC.
So instead of thinking about how fast could one person walk compared to another, we said, how fast can a person walk using a timed walk of a fixed distance and at one point in time? And then say how much change over an interval of time would represent something that was likely to be reproducible and, more importantly, likely to be correlated with some measure of quality of life that also was deteriorating?
So then we got to the notion–and this was really best utilized thus far in the trials of 4-aminopyridine in terms of registration studies there–to say could you show a 20% improvement or more in this timed walk over an interval of time? And in that study, a certain number of patients were able to show it, and there was also some correlative data done to show that that amount of improvement correlated with things which were meaningful to the individual. And so I think that helped facilitate getting that drug through the registration process with the FDA.
One of the things that my colleagues and I did in looking at one of the trials in progressive disease, specifically the trial of rituximab in primary progressive MS, where we had the data that goes into the MSFC, because it had been collected in the study, was to try to develop a number of different composites. And actually, when you think about it, the main score that we use to rate studies is the EDSS score, and it itself is a composite. It takes into account graded changes in fine motor skills in what we would call the cerebellar system, in the pyramidal system, in the sensory systems, and cognitive systems. It’s just that the boundaries in moving in these individual functional scales are a little bit more subjective in terms of going from a zero to a one, two, or three. And then the scale itself is rather complicated in terms of how it is put together to come to the final score, the Expanded Disability Status Scale score. But it’s very well accepted by neurologists, and it’s accepted by the regulatory authorities as the standard.
So we took our standard changes on EDSS, which in this particular study had not shown efficacy across the group as a whole. So we looked at that in the placebo arm, and didn’t contaminate that with the treated arm, to say what was the rate of change on the EDSS alone? But then we also said, what about a 20% change over baseline that had occurred in an individual patient over intervals of testing, and not just one that occurred at a particular setting compared to baseline, but one that continued to be seen at the next 3 months and the next 3 months? So it looked like it was a sustained change in the same way that we use EDSS now in trials to talk about sustained or accumulated permanent disability, at least over some interval of time.
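The idea of a 20% change over baseline that persists at later visits can be sketched as follows. This is a hypothetical illustration, not the trial's actual analysis code: the function name, the assumption that visits are evenly spaced three months apart, and the assumption that a larger value means worse performance (as with a timed test in seconds) are all mine.

```python
def first_confirmed_worsening(baseline, visits, threshold=0.20, confirm_visits=1):
    """Index of the first visit showing sustained worsening, or None.

    A visit counts only if the measurement is at least `threshold` (20%)
    above baseline at that visit AND at each of the next `confirm_visits`
    visits, so a single bad day does not register as progression.
    """
    cutoff = baseline * (1 + threshold)
    for i in range(len(visits) - confirm_visits):
        window = visits[i : i + 1 + confirm_visits]
        if all(value >= cutoff for value in window):
            return i
    return None


# Hypothetical timed-walk results in seconds, one per 3-month visit.
baseline = 6.0
visits = [6.1, 7.5, 6.2, 7.3, 7.4, 7.6]

# The spike at index 1 is not confirmed; the change at index 3 is.
print(first_confirmed_worsening(baseline, visits))
print(first_confirmed_worsening(baseline, visits, confirm_visits=2))
```

Setting `confirm_visits=2` corresponds to requiring the change at "the next 3 months and the next 3 months"; the time to the returned visit is what would feed a progression curve.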
So we said, okay, we can construct a progression curve based on that. And then we said, what does that look like? And said, well, this has some dimensions to it that are interesting. And we did the same thing with the Timed 25-Foot Walk, and we didn’t fool around with the PASAT [Paced Auditory Serial Addition Test], the cognitive measure, because nobody likes it. Patients don’t appreciate it, and it’s a rather prolonged and not a simple test to use. And this is one that probably could be easily changed out with other cognitive tests that are probably as reliable and easier to complete. And we looked at how patients progressed using that change in the timed walk and said, well, that’s interesting too.
And then we went into the group as a whole and said, okay, how many patients changed on the EDSS over three months, confirmed? How many over six months, confirmed? How many did this on the Timed 25-Foot Walk? Did it cross the 20% threshold? How many did this on the 9-Hole Peg Test and, again, crossing the 20% threshold? And who were these patients, more importantly? So then we could develop a series of Venn diagrams–if you will, circles–that showed who did it on just one test, who did it on all tests, who did it on two tests? And looked to see could we get a larger and larger proportion of the population that were showing progression?
And the answer is: We could. And for some tests, the incremental change was small, and for other tests the incremental change was relatively large. But when we looked at the results of the study, then, using different kinds of composites, you fail just on EDSS; you fail on EDSS, or you fail on Timed 25-Foot Walk; you fail on Timed 25-Foot Walk or 9-Hole Peg Test—we don’t care about EDSS in that one—you fail on all three. We could see that we could increase the sensitivity, that is, the number of people who were showing progression, using these kinds of composites, and hoped, therefore, that we could increase the sensitivity to drug effect.
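The Venn-diagram bookkeeping described above maps naturally onto set operations. The sketch below uses made-up patient IDs and is only one reading of the composites listed; in particular, it treats a composite as "fail on any member test", and the function names are hypothetical.

```python
def composite_rates(edss, walk, peg, n_patients):
    """Proportion of patients counted as progressing under each composite.

    Each argument is a set of patient IDs with confirmed progression on
    that measure; a patient fails a composite by failing any member of it.
    """
    composites = {
        "EDSS": edss,
        "EDSS or walk": edss | walk,
        "walk or peg": walk | peg,
        "EDSS or walk or peg": edss | walk | peg,
    }
    return {name: len(ids) / n_patients for name, ids in composites.items()}


def venn_regions(edss, walk, peg):
    """Split progressors into those flagged by one, two, or all three tests."""
    everyone = edss | walk | peg
    counts = {pid: (pid in edss) + (pid in walk) + (pid in peg) for pid in everyone}
    return {k: {pid for pid, c in counts.items() if c == k} for k in (1, 2, 3)}


# Hypothetical progressors among 10 patients, for illustration only.
edss = {1, 2, 3}
walk = {2, 3, 4, 5}
peg = {3, 6}

print(composite_rates(edss, walk, peg, n_patients=10))
print(venn_regions(edss, walk, peg))
```

Each added test can only keep or enlarge the union, which is why the proportion showing progression grows; how much it grows depends on how many new patients that test picks up beyond those already flagged.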
So then we did the next step, which was to take both the placebo arm and the treated arms and say, okay, how did the curves change? So the overall curve showed no statistical benefit with the EDSS, until you went to subgroup analysis. And that was reported in the original paper. But when we modeled this, of course, the overall didn’t show the statistical effect. That’s where we were starting from. When we added in the Timed 25-Foot Walk, it looked like there was a better split. In fact, the effect size for the treatment improved. And this was not across subgroups, but across the entire population.
Interestingly enough, we probably got the biggest punch by throwing out the EDSS and just using the 9-Hole Peg Test and the Timed 25-Foot Walk. That has some advantages, because they can be done by anyone. In fact, they probably could be done remotely, or we probably could convert it to how many steps a day did you take and have your watch feed the message to us over the course of a day. There are a number of interesting different approaches that can be taken to this kind of concept, and some of these are being pursued by a collaborative group spearheaded through the NIH, as well as a private consortium, looking at newer ways to measure progression.
The good news is, I’m sure we’ll find things that are more sensitive. The good news is, I’m sure we’ll find things that are easier to apply. Another part of the good news is that the additional work increasingly is carried out with some representatives from the regulatory authorities to give us a feeling for what they really want to see. And what they would like to see is not just that we have composites that are sensitive and reproducible, but that each of those composites, before we use it, has been shown to have some relevance for what patients complain of and what patients are looking for. So that’s the good news.
The bad news is that not only do we have to develop them, validate them, and show that they work, but we’ll probably also have to constantly be comparing them back, in our future trials, to the standard, until we get our first drug that really works in these new, validated approaches that are being taken.
Interviewer – Dan Keller
Do you think that different drugs will show you different effects on different parameters within the composite score, or do things pretty much move in synchrony?
Interviewee – Jerry Wolinsky
You know, multiple sclerosis is such a heterogeneous disease. It is heterogeneous in many ways, but the simplest one to think about is that the lesions don’t exactly form in a way that suits us as trialists. So, many of the lesions are silent for whatever it is we’re trying to test, no matter how carefully we test for them except maybe with really high resolution MRI. So it depends where in the real estate the lesion has hit. So it’s easy to imagine that a relatively small lesion in the cerebellum, particularly well-situated, could cause some slowing of the ability to do the 9-Hole Peg Test, and yet it might take a very large lesion in the frontal lobe to have the same effect in that system.
In the same way, it may take just a small lesion in a pyramidal pathway, either in the spinal cord or in the internal capsule, to cause a significant change in the 25-Foot Walk and do nothing in the 9-Hole Peg Test. So, conceptually, we want to be able to test separately (or relatively separately; the brain is fairly interconnected) as many systems as we can and build upon them. Usually with these composites, you don’t lose too much by adding components, as long as they’re truly independent of each other. As they become more interdependent, then the more you add, you may lose some of your ability to find small changes statistically. They’ll cancel out.
Interviewer – Dan Keller
Even though these are composites, you’re still interested in the separate parameters? I mean, it looks like one parameter could offset another, and your composite score could be neutral, even though you have larger changes in the separate parameters.
Interviewee – Jerry Wolinsky
What you’re trying to do, if you’re setting up your composites correctly, is not to have them cancel. And with the z-score we talked about before, it can cancel. With a composite, where you’re expecting each of the scales to be moving in a particular ordinal fashion that is going from better to worse, you don’t care where the worsening comes from, if you’re saying we’ll take worse in any system. Where it gets tricky is, once you get good at that, then you might want to say, well, you get two points for getting worse in the walking system, because that’s more correlated with whether or not someone’s employable than it is if it’s in, let’s say, bladder measures, which we don’t have quantitatively (well, we do, but they’re just harder to apply), or perhaps in other visual pathway measures that have yet to be introduced into the composites very well.
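The difference between a z-score composite that can cancel and a "worse in any system" composite can be shown with a small sketch. Everything here is hypothetical: the change values, the worsening threshold of -0.5, and the weights (including the "two points for walking" idea) are placeholders chosen only to illustrate the logic.

```python
def mean_z_change(changes):
    """Average change across components; gains and losses can offset."""
    return sum(changes.values()) / len(changes)


def worsened_in_any(changes, threshold=-0.5):
    """True if any single component worsened by at least the threshold."""
    return any(delta <= threshold for delta in changes.values())


def weighted_worsening_points(changes, weights, threshold=-0.5):
    """Sum the weights of every component that crossed the threshold."""
    return sum(weights[name] for name, delta in changes.items() if delta <= threshold)


# Hypothetical change from baseline per component (negative = worse).
changes = {"walk": -1.0, "peg": 1.0, "cognition": 0.0}
# Hypothetical weights: walking counts double.
weights = {"walk": 2, "peg": 1, "cognition": 1}

print(mean_z_change(changes))        # worsening and improvement offset
print(worsened_in_any(changes))      # the walking decline is still caught
print(weighted_worsening_points(changes, weights))
```

In this example the averaged score hides a real decline in walking because the peg test improved by the same amount, while the any-system rule still registers it.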
Host – Dan Keller
Thank you for listening to Episode Eighty-three of Multiple Sclerosis Discovery. This podcast was produced by the MS Discovery Forum, MSDF, the premier source of independent news and information on MS research. MSDF’s executive editor is Carol Cruzan Morton. Msdiscovery.org is part of the nonprofit Accelerated Cure Project for Multiple Sclerosis. Robert McBurney is our President and CEO, and Hollie Schmidt is Vice President of Scientific Operations.
Msdiscovery.org aims to focus attention on what is known and not yet known about the causes of MS and related conditions, their pathological mechanisms, and potential ways to intervene. By communicating this information in a way that builds bridges among different disciplines, we hope to open new routes toward significant clinical advances.
We’re interested in your opinions. Please join the discussion on one of our online forums or send comments, criticisms, and suggestions to firstname.lastname@example.org.
For Multiple Sclerosis Discovery, I'm Dan Keller.