Deep dive: The challenge of causal inference in RWD analysis

Causal machine learning (ML) is an evolving field that aims to model and infer cause-and-effect relationships in data. Unlike traditional ML, which focuses primarily on making predictions based on associations in the data, causal ML seeks to answer questions about the effects of an intervention, e.g. treatment with a drug, from data that may have been collected under real-world conditions (real-world data, RWD). The challenge in this context is that, unlike in a randomised controlled trial (RCT), patients taking a specific drug may differ systematically from patients who were not treated with that drug, because the treatment is the consequence of an informed decision by a doctor. A naïve comparison of treated and untreated patients can therefore be distorted by confounders and lead to wrong conclusions about the effectiveness and safety of a drug under real-world conditions. The fundamental problem is that, for any individual, one can only observe the factual outcome under the treatment actually received, but never the counterfactual outcome under the alternative.
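To illustrate this, the following minimal sketch (purely illustrative, not part of the Real4Reg analyses) simulates a setting in which sicker patients are more likely to receive a drug and also tend to have worse outcomes. The naïve difference in mean outcomes between treated and untreated patients then deviates clearly from the true treatment effect; all variable names and numbers are assumptions made for the example.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    # Confounder: disease severity influences both treatment choice and outcome.
    severity = rng.normal(size=n)

    # Doctors tend to treat sicker patients (informed treatment decision).
    p_treat = 1 / (1 + np.exp(-2 * severity))
    treated = rng.binomial(1, p_treat)

    # True causal effect of treatment on the outcome is +1.0,
    # but higher severity worsens the outcome by -2.0 per unit.
    outcome = 1.0 * treated - 2.0 * severity + rng.normal(size=n)

    naive_diff = outcome[treated == 1].mean() - outcome[treated == 0].mean()
    print(f"True effect: 1.0, naive treated-vs-untreated difference: {naive_diff:.2f}")
    # The naive comparison is biased because treated patients are, on average, sicker.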

What causal ML offers

Causal ML combines statistical techniques with ML algorithms (e.g. Random Forests, neural networks) to estimate a defined causal quantity of interest, for example the conditional average treatment effect (CATE), also known as the individualized treatment effect. In contrast to the average treatment effect (ATE), which is often the quantity of interest in RCTs, the CATE describes the treatment effect averaged over subgroups of patients that share the same covariates (e.g. age, sex, certain diagnoses). To make such an estimate possible, assumptions about the causal structure of the problem must be made, in particular regarding the existence and observability of confounders. Furthermore, it has to be defined which treatment type (binary or continuous) and which covariates should be modelled. Depending on these choices, different techniques are available to the modeler. Examples include so-called meta-learners (S-, T-, X-, R- and doubly robust learners) as well as instrumental variable techniques. Many of these approaches have recently been implemented in publicly available programming libraries such as Microsoft’s EconML.
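As a concrete illustration, the sketch below uses the T-learner from Microsoft’s EconML library to estimate the CATE, i.e. the expected outcome difference E[Y(1) - Y(0) | X = x] between treatment and no treatment for patients with covariates x. The simulated data, the variable names and the choice of gradient boosting as base learner are illustrative assumptions, not recommendations from the project.

    import numpy as np
    from econml.metalearners import TLearner
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(1)
    n = 5_000

    # Covariates (e.g. age, disease severity) acting as observed confounders.
    X = rng.normal(size=(n, 2))
    p_treat = 1 / (1 + np.exp(-X[:, 1]))      # treatment depends on severity
    T = rng.binomial(1, p_treat)
    # Heterogeneous effect: the treatment helps patients with higher X[:, 0] more.
    Y = (1.0 + 0.5 * X[:, 0]) * T - 2.0 * X[:, 1] + rng.normal(size=n)

    # T-learner: fit separate outcome models for treated and untreated patients.
    est = TLearner(models=GradientBoostingRegressor())
    est.fit(Y, T, X=X)

    cate = est.effect(X)                      # estimated CATE per patient
    print("Mean of estimated CATEs (approximate ATE):", cate.mean())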

Checking the robustness

Because of these assumptions, it is vital to check the statistical robustness of the estimate against violations of the assumptions. This is done via refutation tests, which include adding random variables, randomly permuting the assignment of patients to the compared groups, and sub-sampling from the data. A refutation test can only detect an existing violation of the model assumptions; it cannot provide any guarantee that the model is correct. Nevertheless, such robustness checks are currently best practice in causal ML.
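As an example, the open-source DoWhy library (a companion to EconML that is not explicitly named above) provides such refutation tests out of the box; the simulated data, column names and the choice of a propensity-score estimator below are illustrative assumptions.

    import numpy as np
    import pandas as pd
    from dowhy import CausalModel

    rng = np.random.default_rng(2)
    n = 2_000
    severity = rng.normal(size=n)
    treated = rng.binomial(1, 1 / (1 + np.exp(-severity)))
    outcome = treated - 2 * severity + rng.normal(size=n)
    df = pd.DataFrame({"treated": treated.astype(bool),
                       "outcome": outcome,
                       "severity": severity})

    # Specify the assumed causal structure: severity confounds treatment and outcome.
    model = CausalModel(data=df, treatment="treated", outcome="outcome",
                        common_causes=["severity"])
    estimand = model.identify_effect()
    estimate = model.estimate_effect(
        estimand, method_name="backdoor.propensity_score_matching")

    # Refutation tests: add a random common cause, permute the treatment (placebo),
    # and re-estimate on random subsets of the data.
    for refuter in ["random_common_cause", "placebo_treatment_refuter",
                    "data_subset_refuter"]:
        print(model.refute_estimate(estimand, estimate, method_name=refuter))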

Interpretation of results

Findings from causal ML analyses should be interpreted with care, and the rationale behind choosing specific methods should be clearly stated. Where possible, the obtained estimates should be compared against those from an RCT, and the inherent limitations of RWD for causal inference should be acknowledged.

Causal ML in Real4Reg

In Real4Reg we plan to implement and apply causal ML techniques for different tasks:

  • Target trial emulation: The aim here is to estimate the ATE of SGLT2 inhibitors, a class of drugs used to treat type 2 diabetes. We will evaluate to what extent the ATE estimated via causal inference techniques from RWD differs from that reported in RCTs (see the sketch after this list).
  • Individualized drug effectiveness: The aim here is to estimate the CATE of SGLT2 inhibitors rather than the ATE. This will help us understand which patients may benefit most from this treatment.
  • Individualized adverse drug reactions (ADRs): The aim here is to estimate the CATE of fluoroquinolones, a class of antibiotics, to better understand which patients may have an elevated risk of ADRs.
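For the first task, the sketch below shows how an ATE could be obtained with a doubly robust learner from EconML, one of the meta-learners mentioned above; the simulated covariates, the effect size and the model defaults are placeholders and do not reflect the actual Real4Reg analyses or data.

    import numpy as np
    from econml.dr import LinearDRLearner

    rng = np.random.default_rng(3)
    n = 3_000
    # Hypothetical baseline covariates standing in for RWD (e.g. age, lab values).
    X = rng.normal(size=(n, 2))
    T = rng.binomial(1, 1 / (1 + np.exp(-X[:, 1])))   # confounded treatment choice
    Y = 0.8 * T - 1.5 * X[:, 1] + 0.3 * X[:, 0] + rng.normal(size=n)

    # Doubly robust learner: combines an outcome model and a propensity model,
    # so the estimate remains consistent if either component is well specified.
    est = LinearDRLearner()
    est.fit(Y, T, X=X)

    ate_rwd = est.effect(X).mean()     # ATE as the average of estimated CATEs
    print("ATE estimated from (simulated) RWD:", ate_rwd)
    # In the project, such an estimate would be compared against the ATE reported in RCTs.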

To achieve our aims, we will leverage large-scale RWD from Denmark, Finland, Germany and Portugal.

Conclusion

Causal ML significantly enriches the toolbox of RWD analysis techniques and offers the potential to better personalize treatment strategies in the future. Specifically, in the context of regulatory decision making, causal ML could lead to an improved understanding of which drugs are most effective for which subgroups of patients and which drugs may lead to unwanted side effects in specific patient subgroups. Altogether, this offers the potential for more effective and safer drug prescriptions in the future.
