The recent publication of our AIP Hashimoto’s study, Efficacy of the Autoimmune Protocol Diet as Part of a Multi-Disciplinary, Supported Lifestyle Intervention for Hashimoto’s Thyroiditis, has gathered some attention, which naturally means both tremendous support within the AIP community and understandable criticism. Recognizing that summary blog posts, social media announcements, and virtual communication are all relatively poor ways to dig into nuance and engage in a healthy critical discussion, I’m offering this piece as an introduction to addressing many of the study’s findings and criticisms, and ultimately as a way to engage in a thought-provoking conversation that furthers the rigorous scientific process.
To get into the nuance required to explore some of the study’s finer details and criticisms while respecting your time as the reader, the entire dialogue will be broken into an extended series of posts (published on my website, Resilient Roots) following this first overview. At the end of this article, I will share a short overview of the subsequent posts, the topics I will cover, and how to continue growing your critical knowledge as a “citizen scientist.”
Research “Perfection” and Study Design
First and foremost, we should acknowledge that no publication or scientific experiment including the recent AIP study is ever perfect. Each and every study is flawed or limited in some way for a multitude of reasons. Most of these flaws or limitations are known before the study even begins (limitations based on the study design), while some of the limitations arise during the study investigation and testing itself. However, many limitations can be mitigated by refining:
- the methodological design of the study,
- the number of participants in the study,
- the population studied, and
- the methods of analysis or statistical calculations computed.
There are, of course, many other variables one could consider when it comes to designing research, but these are the main areas and points we will explore in the rest of this article and the series as a whole.
When it comes to methodological study design, the greater scientific community has a fairly well-accepted gold standard for most interventional/investigational experiments: the double-blind, randomized controlled trial (RCT). “Double-blind” means that neither the participants nor the researchers (or anyone else involved in the study, such as a statistician or a doctor delivering a treatment) are aware of who is getting which treatment. “Randomized” means that the participants are randomly placed into either the treatment group or the control group. “Controlled” means that the treatment group is compared to some sort of control: in drug and supplement trials this is often a placebo such as a sugar pill, and in interventional studies involving procedures the control may be a sham procedure such as sham acupuncture. RCTs often try to maintain as uniform an environment as possible so that potential confounders do not disrupt the study.
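As a rough illustration of the randomization and blinding just described, here is a minimal Python sketch. It is not how any particular trial (including ours) was run; the function name `randomize_double_blind`, the participant IDs, and the neutral arm codes “A” and “B” are all made up for illustration.

```python
import random


def randomize_double_blind(participant_ids, seed=None):
    """Randomly split participants into two arms hidden behind neutral codes.

    Hypothetical helper for illustration only, not a real trial protocol.
    """
    rng = random.Random(seed)

    # Randomized: shuffle the participants, then cut the list in half.
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2

    # Blinded: randomly decide which neutral code stands for which arm,
    # so the code itself gives nothing away.
    codes = ["A", "B"]
    rng.shuffle(codes)
    unblinding_key = {codes[0]: "treatment", codes[1]: "control"}

    # What participants and researchers would see during the trial: codes only.
    blinded_allocation = {pid: codes[0] for pid in shuffled[:half]}
    blinded_allocation.update({pid: codes[1] for pid in shuffled[half:]})
    return blinded_allocation, unblinding_key


# 20 made-up participant IDs (assumption, not study data).
participants = [f"P{n:02d}" for n in range(1, 21)]
blinded, key = randomize_double_blind(participants, seed=42)
```

In this sketch the `key` would be held by someone outside the day-to-day study team and only opened once data collection is finished. Real trials use more careful schemes, but the idea of separating “who is in which group” from “who knows it” is the same.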
Although the RCT is a gold standard in clinical research, RCTs are also extremely costly and often necessitate a large research team. Understanding the time and financial investment necessary to perform an RCT, the research community, in addition to accepting the double-blind RCT as the gold standard, recognizes that preliminary research is needed to inform study authors about the expectations and hypotheses worth testing in an RCT. Because of this, the scientific community has created a logical and fairly linear sequence to test new hypotheses and new areas of research, and hopefully help us all avoid spending hundreds of thousands or even millions of dollars testing hypotheses that never had any hope of showing something statistically or clinically meaningful. (It is important to note that statistical and clinical significance are not actually the same, and the difference between these two will be addressed in detail as part of a future post in the series on my website.)
Designing the AIP Hashi’s Study
The first step in assessing a new hypothesis or examining a previous hypothesis in a new study population is to start with a very small pilot trial assessing feasibility (can you actually do this treatment or intervention?) and, in some cases, initial efficacy (did the study participants get some measurable benefit from the intervention?). In the case of our recent study, we sought to build upon the first pilot trial testing the feasibility and efficacy of AIP for Inflammatory Bowel Disease (IBD) published by Konijeti et al in 2017 (1).
This pilot study involved a single arm (only 1 group, so there was no control group) of 15 individuals with IBD, either ulcerative colitis or Crohn’s disease. The study reported some fairly compelling findings both in the patients’ subjective symptoms and in the objective changes noted during endoscopy/colonoscopy (actually visualizing the participants’ intestinal, or gut, tissue with a small camera). All of this pointed to reasonable efficacy that should be explored in further studies with a similar population of people with IBD, as well as with other populations of people with autoimmune disease such as rheumatoid arthritis (RA) and Hashimoto’s thyroiditis (HT).
In seeking to design a follow-up study to Konijeti et al., the primary study team, including myself, naturopathic medical student Adam Sadowski, Angie Alt, and Mickey Trescott, discussed the pros and cons of study design and specifically which autoimmune disease to study next. We decided that HT would be the next reasonable population to examine using a similar pilot study design, based on:
- the prevalence of autoimmune thyroid disease,
- the lack of many other efficacious treatments for HT, other than thyroid hormone replacement,
- the significant continued disease burden or symptomatic burden of individuals with HT,
- anecdotal clinical efficacy noted by Mickey Trescott in her nutritional consulting practice and by Angie Alt in her online health coaching program SAD to AIP in SIX, and
- anecdotal clinical evidence from my own experience with patients utilizing Angie’s program.
It was also important to acknowledge the critical role of Angie’s SAD to AIP in SIX program, which was utilized in the AIP IBD study, and to build upon this group therapy model in the AIP HT study.
The primary downside of choosing to study a population of individuals with HT stemmed from the reality that there are a limited number of objective markers to assess the disease state or disease progression in HT, because:
- Thyroid antibodies often do not correlate well to clinical symptom burden and do not necessarily correlate to disease activity at the level of the thyroid tissue.
- Thyroid antibodies can fluctuate for various reasons, including acute illness, and may not be a reliable marker when tracked with only two samples or over two time points. Even if a thyroid antibody level decreased or increased on a second test, I would need at least one more lab draw, and preferably more, to determine whether this was a real change to a new baseline or just natural fluctuation or “statistical noise.”
For a pilot study of our size and our budget, there was simply no practical or ethical reason to pursue other methods, like multiple thyroid ultrasounds or invasive biopsies, making thyroid antibodies the most feasible and cost-effective choice, despite their limitations.
Even more so than antibodies, hormone levels fluctuate during the day and are dependent on the timing of replacement medication dosing if the individual is using such a medication. Recognizing the limitations in measuring thyroid hormone levels as a purely objective marker of thyroid function, we attempted to standardize the process as much as possible for the study, asking study members to not take replacement medication prior to laboratory testing and to do the test fasting at the same time on both testing occasions (before and after the 10-week program). While this was not a perfect solution, it was the most reasonable and practical solution given that we had women across the country going to separate labs.
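For readers who like to see the “statistical noise” problem in action, here is a small, hedged Python simulation. It assumes a person whose true antibody level never changes and asks how often random fluctuation alone would look like a change. The level (300), the fluctuation size (60), and the threshold (50) are arbitrary made-up numbers in arbitrary units, not values from our study, and the function names are hypothetical.

```python
import random
import statistics

TRUE_LEVEL = 300.0  # made-up, unchanging "true" antibody level (arbitrary units)
NOISE_SD = 60.0     # made-up size of natural draw-to-draw fluctuation


def simulate_draws(n_draws, rng):
    """Simulate lab values scattered around the same unchanged true level."""
    return [rng.gauss(TRUE_LEVEL, NOISE_SD) for _ in range(n_draws)]


def fraction_apparent_change(n_before, n_after, threshold, trials=10_000, seed=0):
    """Fraction of simulated people whose before/after averages differ by more
    than `threshold`, even though nothing real has changed."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        before = statistics.mean(simulate_draws(n_before, rng))
        after = statistics.mean(simulate_draws(n_after, rng))
        if abs(after - before) > threshold:
            hits += 1
    return hits / trials


# One draw before and one after, versus three draws at each time point.
one_each = fraction_apparent_change(1, 1, threshold=50)
three_each = fraction_apparent_change(3, 3, threshold=50)
print(f"single draws: {one_each:.2f}, three draws each: {three_each:.2f}")
```

Under these made-up settings, noise alone produces an apparent “change” noticeably more often with a single draw at each time point than when several draws are averaged, which is the intuition behind wanting more than two lab draws.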
More on Pilot Studies
I want to return to a discussion about the purpose and components of a pilot study, such as the AIP IBD study and the AIP HT study, for those who would like to further understand this part of the overall research process.
I like to think of a pilot trial as a “rough draft” for future studies or like a baby crawling before it can walk. There is a natural progression to research and we cannot start trying to run or talk before we simply roll over or stand. Below are a few more key points to consider regarding pilot trials and our pilot trial specifically:
1) Pilot trials are implemented as a “rough draft”.
What I mean by this is that they can serve as a template for more robust studies after some “edits” are made to the rough draft. We use them knowing that some wrinkles need to be ironed out, but where exactly those wrinkles are will only be found after the trial is completed or while we are conducting the trial. They create PRELIMINARY data, flush out methodology flaws, and help to refine a question being asked by the researchers. They help us to decide if pursuing future research is even worth the investment and time based on the preliminary data. I would hate to waste hundreds of thousands of dollars, or even potentially millions of dollars performing a large study without solid preliminary evidence, especially if it came from community crowdfunding efforts.
2) As mentioned earlier, pilot trials can seek to test feasibility, asking whether a given intervention is practically doable/deliverable to a given study population.
For our study specifically, we wanted to know how feasible it was for people to participate in the online health coaching group, interact with me as a physician, and eat AIP for 10 weeks. Feasibility like this is often measured by retention rate/dropout rate, the completion rate of surveys/laboratory tests, or the number of adverse effects reported by study participants. Another feasibility question to ask could be: is it possible to recruit 20 people meeting the study’s criteria via online invitations over a 1-2 month period? This can be measured by the number of people enrolled in the treatment arm of the study and the time it takes to reach the desired enrollment.
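To make these feasibility markers concrete, here is a minimal Python sketch of how they could be tallied. The counts below are placeholders invented for illustration and are not our study’s actual figures; the class and function names (`FeasibilityCounts`, `feasibility_summary`) are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FeasibilityCounts:
    """Raw counts a pilot team might track (all values here are placeholders)."""
    enrolled: int
    completed: int
    surveys_expected: int
    surveys_returned: int
    days_to_full_enrollment: int


def feasibility_summary(c):
    """Turn raw counts into the feasibility markers discussed above."""
    retention = c.completed / c.enrolled
    return {
        "retention_rate": retention,
        "dropout_rate": 1 - retention,
        "survey_completion_rate": c.surveys_returned / c.surveys_expected,
        "enrolled_per_week": c.enrolled / (c.days_to_full_enrollment / 7),
    }


# Placeholder numbers only: 20 enrolled over 35 days, 18 finished the program.
example = FeasibilityCounts(
    enrolled=20,
    completed=18,
    surveys_expected=40,
    surveys_returned=37,
    days_to_full_enrollment=35,
)
summary = feasibility_summary(example)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

A team would typically decide ahead of time what counts as “feasible” (for example, a minimum retention rate) so that these numbers can be judged against a target rather than interpreted after the fact.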
3) As mentioned earlier, pilot trials also may seek to assess efficacy (does the experiment generate expected and beneficial results in an IDEAL situation?).
As you may now realize from the previous discussion, efficacy can be assessed in a number of ways, both subjective and objective, and it is important in pilot study design to identify both objective and subjective markers that will help you determine the overall efficacy of the experiment. As we alluded to in the previous sections about objective markers for HT, even if one saw a drop in antibody levels as a result of an intervention, but symptoms/quality of life of the participants stayed the same or even worsened, would the study still be deemed efficacious? As you can see, there are many ways to assess and express efficacy in research and it is important as a citizen scientist to explore what exactly changed as part of an intervention to determine for yourself “did X work for Y.”
4) Lastly, pilot studies may seek to assess effectiveness, which can be thought of as whether an experiment can produce those results in a REAL WORLD setting.
In my opinion this is an incredibly important aspect of pilot studies, as well as of RCTs that involve dietary interventions. It is probably pretty obvious that there is a huge real-world difference between having 17 women eat AIP in their homes and running an entirely controlled trial in a hospital setting, where food is prepared for participants and there is no way for them to eat or act outside of study parameters. In our study, we wanted to explore the REAL WORLD experience of participants making changes in their unique lives without even one element of in-person interaction with a member of the study team. We wanted them to eat an AIP diet, explore movement or exercise, prioritize sleep, explore stress reduction activities, and spend time in nature, all in the setting of communal education and encouragement from our multidisciplinary team.
What did we really study?
This naturally brings up the need to clarify what really was studied in the AIP HT study and the limitations regarding any conclusions one can make from the study design and its results. The study was titled: Efficacy of the Autoimmune Protocol Diet as Part of a Multi-Disciplinary, Supported Lifestyle Intervention for Hashimoto’s Thyroiditis for a very specific reason. Our study at its core was NOT simply an AIP dietary intervention. It was complex and multifaceted. It involved:
- interaction between study participants with the health coaching team,
- virtual interaction with me as a functional physician on up to three separate occasions for all 17 participants individually throughout the 10-week study,
- guidance on numerous lifestyle recommendations as previously described, and
- interaction between study participants themselves in the closed online group.
There was SO much more happening than simply handing someone a list of AIP foods, giving them an AIP recipe book, or even meeting with a physician or nutritionist once to explore following the AIP diet. While this is a massive strength, in my opinion, to the intervention, we practically do not know what elements of this complex intervention resulted in the improvements seen in the quality of life and symptom burden. Was it the food? Was it sleeping more? Was it losing weight for some? Was it the social interaction with others doing the same thing? Was it receiving personalized care from a multi-disciplinary team? There are SO many possible answers as to why someone would receive benefit, and what was important to one participant may not have mattered to another. It is so complex!
While many researchers cringe at performing multi-faceted lifestyle interventions like this because of their inherent confounding, I personally find, in returning to the effectiveness point made earlier, that these types of studies are incredibly relevant and practical. I want to know what you, me, and Susie Q can do in our homes and how combining low-risk interventions may provide synergistic benefit beyond what might occur by simply changing one’s diet or walking more.
While you can certainly design very elegant studies to tease out which lifestyle factors actually influence the observed changes, these questions are not good questions to test in pilot studies, and, I would argue, start to lead away from the practical real-life application of such interventions. It is also totally reasonable to think that simply following an ancestral dietary template or a Mediterranean diet instead of AIP as part of a multi-faceted intervention could have resulted in similar improvements.
Even though we have significant clinical evidence of the benefit of AIP for many people and many autoimmune conditions (as a result of so many in this community sharing their personal stories utilizing AIP), our study was not designed to test the efficacy of AIP alone, even in the HT population studied, and we cannot conclude concretely that AIP specifically was behind the positive changes seen in quality of life, symptom burden, and inflammation.
As we state in the study text, and I hope is clear from the discussion above, our pilot was able to show preliminary evidence that the entire multi-faceted intervention that included a phased dietary elimination to get to AIP may be helpful to improve quality of life and symptom burden in middle-aged women with HT. At the risk of being either too conservative or too speculative, I think it is reasonable and important to acknowledge this conclusion as the primary conclusion from our study, and we should not simply go shouting from the rooftops that AIP can cure HT for all people. While AIP was almost assuredly a factor in the large multi-faceted intervention, it would be better to also acknowledge the greater multi-disciplinary intervention involving community support, personalized functional nutrition, group health-coaching, and integrated, personalized functional medical care.
Clinical Game-Changing and New Gold Standards
Through this lens, we can see that this study was actually critical to the development of evidence regarding the use of personalized health coaching, functional nutrition, and functional medicine, which are somewhat new and perhaps controversial paradigms/clinical models for addressing chronic disease. Even more astonishing is the fact that there was no new supplement or drug use as part of the study intervention! Participants were unable to start new supplements or drugs (outside of a medication for acute illness like an antibiotic) and all study recommendations focused on lifestyle changes and food modifications based on AIP dietary principles or recommendations based on organic acid nutritional testing to include specific foods with certain nutrient density (e.g., foods with more B6). As you can see, dietary and lifestyle changes on their own proved to be the driving therapeutic intervention, and we, in this larger integrated, personalized care movement, should be excited by these preliminary findings showing the potential benefit of dietary and lifestyle changes, in conjunction with community support, to support individuals with chronic disease.
As I will discuss in future blog posts as well as podcast conversations with Angie and Adam, the results of this study, as well as the results of the IBD study, are potentially clinically game-changing to the field of integrative and functional medicine. As a functional clinician seeing patients who, on average, have symptom burdens very similar to those of the patients in the AIP HT study, I can find it very challenging to know where to begin therapy when there appears to be so much in disarray. It can be easy to get lured into expensive testing and fancy supplement and drug protocols without emphasizing dietary and lifestyle changes. The two AIP studies combined are now beginning to accumulate evidence that clinical care for individuals with HT or stable IBD should perhaps begin with participation in a similar multi-faceted lifestyle intervention, followed by a secondary assessment after 10-12 weeks in order to determine the level of improvement (or lack thereof) without the use of expensive diagnostic or therapeutic interventions.
As a clinician, it makes my life much easier to have a patient participate in such an intervention and see symptom burden decrease by some fraction, thus elucidating what unresolved areas I should explore more deeply in my individual work as a functional clinician. This alone could save patients thousands of dollars, accelerate treatment progression, improve integrative care, help clinicians avoid unnecessary tests/therapies, improve the therapeutic alliance between patient and provider, or client and coach, and promote truly integrated care between healthcare professionals across all fields of medicine, nutrition, and coaching.
Perhaps we can all begin to see that a multi-faceted dietary and lifestyle intervention, such as the one studied here, may, with increasing evidence, become a “gold standard” for a dietary and lifestyle intervention utilizing a nutrient-dense, phased elimination diet (AIP). As a clinician, I would feel much more assured that a person “gave AIP and lifestyle change a reasonable shot” if they participated in Angie’s SAD to AIP in SIX program or in a similar intervention. I would feel much less confident that AIP either helped or didn’t help someone who simply went cold turkey on their own and stated after three weeks of a valiant attempt that nothing had changed. I am not saying that individuals cannot do AIP on their own, as you guys are a living testament to the radical empowerment and changes that can occur when you make changes on your own without such a support team. But for an outside clinician hearing from patients who say they “tried this and tried that” and it did or didn’t help, it can be very difficult to discern to what degree the patient actually did “this” or “that” therapy.
There is a powerful movement beginning from grassroots efforts and I cannot tell you how excited I am about the empowerment that occurred in 17 women with HT who decided to improve their lives through dietary and lifestyle changes with no guarantee that anything would work. I am so excited about the future of integrated health coaching, functional nutrition, and functional medicine all applied in a communal group setting. As Dr. Sarah Ballantyne shared in a panel discussion at the 2019 Nutritional Therapy Association’s Annual Conference, we have to remain diligent and persistent, changing when new evidence arises and avoiding dogma in our own echo chamber. AIP is a powerful tool, but a singular tool that must be examined and applied in the right context and not seen as a dogma for any autoimmune disease. It will also be critical to draw proper and balanced conclusions from the AIP research we conduct, as this will only strengthen our field as a whole, and make conducting larger research studies utilizing the AIP dietary intervention a reality.
It has been my intention in this first article to unpack some of the nuances of the AIP HT study, to discuss the major elements of the full multi-disciplinary intervention, and to suggest some reasonable conclusions we can make from the study’s results. I illustrated aspects of the overall scientific process, the major elements of pilot studies versus randomized controlled trials, and the role pilot studies play in establishing the initial evidence needed to inform future studies. I also began to unpack one of the primary limitations of the AIP HT study involving the identification and use of an objective study marker, and how we, as the study team, tried to balance the practical and financial concerns of conducting such a study while utilizing the most relevant objective markers possible for our study population.
In the future installments of this series we’ll explore:
- a more formal discussion of the primary study critiques
- an in-depth look at the study’s objective results, specifically exploring the findings regarding thyroid hormones and thyroid antibodies, and why simply looking at the statistically insignificant changes seen at the group level misses some of the most critical points
- all things statistical methods, outlining the techniques utilized by the study team to best represent the study data, as well as some of the nuance behind clinical and statistical significance
- some concluding remarks about the greater scientific process and community as a whole, as well as how we can all, as scientists and laymen, contribute positively to the quality and relevance of the research that we conduct
If you’d like to continue digging in, you can follow along on my website here: Resilient Roots. Thanks for sticking with me on the first part of this journey!