Clinical benefit is a subjective parameter that regulators and policy-makers must try to evaluate in a fair and objective manner. As such, the success or failure of investigational therapies is dictated by strict statistical thresholds for specific clinical trial endpoints. However, some agents that demonstrate statistically significant benefits in early studies go on to fail in subsequent assessments, and others might fall short of a significance threshold while still offering meaningful benefit. This has led some to question whether there is a systematic error in how we determine the statistical significance of benefit during the various phases of clinical development, including go/no-go decisions in phase II drug development. Errors in the phase III/registration trial setting are especially consequential: false-positive errors are costly, personally and financially, for patients and providers, and false-negative errors are costly for developers.
To address the issue from a statistical standpoint, Dr Changyu Shen et al conducted a rigorous analysis of phase III randomized superiority trials to quantify false-positive, false-negative, and true-negative benefit conclusions. The authors identified 362 eligible phase III trials from ClinicalTrials.gov for the analysis, focusing on those with overall survival (OS) and progression-related survival endpoints. The trials included patients with lung, breast, gastrointestinal, and hematologic cancers. Their analysis found that 87% of the phase III oncology clinical trials in the United States during the past 10 years have been negative for OS benefit (a group that includes both false-positive and true-negative results), and concluded that a very large group of ineffective therapies are being tested in phase III trials. This calculated high failure rate was based on a “target effect size” for OS, rather than on each trial’s primary endpoint and individual statistical power calculations.
In their analysis, Dr Shen and colleagues reported what they considered to be a high false-positive rate for OS (58.4%), measured against what they defined as the target effect size, when P = .05 was used as the statistical cutoff value. The authors then evaluated the potential effect of using different P value thresholds to reduce false-positive errors. Lowering the threshold to P = .005 reduced the false-positive rate to 34.7%. However, the false-negative rate increased from 0.9% (for P = .05) to 3.6% (for P = .005). Clearly, there is a balance between allowing an ineffective therapy access to the market and blocking access to a therapy that might provide meaningful benefit to patients. As an alternative to the across-the-board change in P value threshold evaluated by the authors, a flexible model based on the clinical setting could be used. In settings where there are essentially no other options, the detrimental effects of approving a false-positive drug would be lower than in settings where the new agent might displace effective therapies.
For example, in clinical settings where there are many patients and multiple current therapies (eg, breast, lung, and colorectal cancers), the more stringent cutoff (P = .005) might better reflect the value of a new therapy, whereas in orphan drug settings and when there is especially poor prognosis, the statistical significance threshold could be more relaxed. Indeed, some of the 87% of clinical trials considered to have been negative for OS benefit by Shen et al might have offered important benefits in progression-free survival (PFS) and symptom control, or benefit in rare cancer settings, wherein it is often not feasible to generate sufficient statistical power in a study.
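The trade-off between false-positive and false-negative errors at different P value thresholds can be illustrated with a simplified textbook calculation. The sketch below is not the method used by Shen et al and is not expected to reproduce their 58.4% and 34.7% figures. It assumes that 87% of tested therapies are truly ineffective (the proportion reported above), that each trial was sized for 80% power at P = .05 (an assumption for illustration only), and that the P value threshold equals the chance that an ineffective therapy reaches significance. The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def power_at_alpha(alpha, design_power=0.80, design_alpha=0.05):
    """Approximate power of a two-sided z-test at `alpha` for a trial that was
    sized to reach `design_power` at `design_alpha` (sample size held fixed)."""
    # Standardized effect implied by the original design (normal approximation,
    # ignoring the negligible wrong-direction rejection tail).
    effect = STD_NORMAL.inv_cdf(1 - design_alpha / 2) + STD_NORMAL.inv_cdf(design_power)
    return STD_NORMAL.cdf(effect - STD_NORMAL.inv_cdf(1 - alpha / 2))


def error_shares(alpha, prior_ineffective=0.87, design_power=0.80):
    """Share of significant trials that are false positives, and share of
    non-significant trials that are false negatives, under the assumptions above."""
    power = power_at_alpha(alpha, design_power)
    false_pos = alpha * prior_ineffective            # ineffective, yet significant
    true_pos = power * (1 - prior_ineffective)       # effective and significant
    false_neg = (1 - power) * (1 - prior_ineffective)  # effective, not significant
    true_neg = (1 - alpha) * prior_ineffective       # ineffective, not significant
    return {
        "power": power,
        "false_positive_share": false_pos / (false_pos + true_pos),
        "false_negative_share": false_neg / (false_neg + true_neg),
    }


for threshold in (0.05, 0.005):
    shares = error_shares(threshold)
    print(
        f"P = {threshold}: power {shares['power']:.2f}, "
        f"false-positive share {shares['false_positive_share']:.3f}, "
        f"false-negative share {shares['false_negative_share']:.3f}"
    )
```

Under these assumptions, the stricter threshold lowers the share of positive trials that are false positives, but with sample size held fixed it also lowers power, so more effective therapies are missed. The direction of both changes matches the pattern the authors reported, although the magnitudes depend entirely on the assumed inputs.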
The question of meaningful clinical benefit might not be resolved by an adjustment to P value cutoffs, but rather through holistic evaluation of the overall clinical trial data in the context of unmet medical needs. When the potential benefit of a New Drug Application or Biologics License Application is called into question, the US Food and Drug Administration (FDA) currently convenes advisory committee meetings to gain expert insights on the objective and subjective attributes of the new therapy. Expert discussion through independent platforms can also serve to help physicians select and sequence FDA-approved therapies on the basis of their experience with all the agents in the landscape. In this way, the science of therapy can move beyond statistical stringency. It is clear that we have learned a lot in the past decade, but further work is needed to optimize clinical trial design to reduce the potential risks of false-positive and false-negative benefit conclusions.
High level
The study by Dr Shen and colleagues provides important food for thought: clinical development in oncology has been hit-or-miss in the past decade, and the possibility exists that many therapies that do not provide a true OS benefit might have made it into the treatment landscape. This elevates the importance of expert oversight of real-world efficacy and discussion of clinical practice experiences in open forums. The publication also discusses the theoretical effect of increasing statistical stringency, such as lowering the P value threshold for OS endpoints, on the false-positive error rate of phase III trials. The authors suggested that higher stringency could be applied to phase II trials to decrease the number of candidate drugs that go into phase III development and fail, exposing fewer patients to therapies that do not offer sufficient efficacy and reducing development costs for drugs that will ultimately fail. However, a critical analysis of the available phase II trial data for drugs that failed vs those that succeeded in demonstrating benefit in the phase III setting would be needed to determine whether this approach would reveal any insightful correlations.
Ground level
Endpoints, P values, and sample size are important considerations when reviewing phase III trial data for new treatment options. Awareness of potential study design limitations can better frame the clinical meaning of a new therapy’s reported benefits. However, these parameters are not always clear from published study results. The recent publication criticizing pivotal trial statistics underscores the importance of discussing clinical practice experiences and the relative benefits of the different therapy options with peers and experts, to gain insights and guidance for therapy selection and sequencing in evolving treatment landscapes.