Prediction Modelling Presentations

Our Prediction Modelling group hosts regular presentations to engage our community and welcome new members. Find out more about upcoming sessions here.

To be notified of upcoming presentations and be added to the prediction modelling distribution list, please email raquel.iniesta@kcl.ac.uk. Note that you do not have to be affiliated with King’s College London or NIHR Maudsley Biomedical Research Centre: our talks are open to everyone with an interest in Prediction Modelling.

All our presentations will be uploaded below, and available on our YouTube playlist.

Watch the latest presentation:

Haider Mian | Deploying machine learning algorithms to stratify patients with inflammatory bowel disease using routinely collected electronic health records | 1 February 2024

Personalised medicine approaches are eagerly awaited to facilitate individualisation of medical care for patients with inflammatory bowel disease (IBD). Multiple approaches have already been explored in attempts to stratify patients into different prognostic trajectories. In this study we aimed to use unsupervised machine learning algorithms to cluster patients based on their routinely collected electronic health records.

Click the drop-downs to see past presentations:

Yiyang Ge | Using topological data analysis to assess the quality of imputed missing data

25 October 2023

As the number of dimensions increases, big datasets from precision medicine research studies can exhibit complex shapes and unexpected behaviours. The statistical analysis of such data necessitates sophisticated analytical methods capable of capitalizing on the high dimension of these datasets. This talk will present novel methods of applying topological data analysis (TDA) to assess the quality of data imputation for missing values. The method establishes a pipeline that combines TDA with permutation testing to identify differences in topological data structures among datasets. This provides valuable information for tailoring the selection of missing data imputation strategies.
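The permutation-testing step can be illustrated with a generic sketch. This is not the pipeline from the talk: the function name `permutation_p_value` is hypothetical, and the absolute difference in means is only a stand-in for whatever topological summary the method compares between an original and an imputed dataset.

```python
import random


def permutation_p_value(sample_a, sample_b, n_permutations=2000, seed=0):
    """Estimate a p-value for the difference between two samples.

    The test statistic here (absolute difference in means) is a placeholder;
    a TDA-based pipeline would substitute a topological summary instead.
    """
    rng = random.Random(seed)

    def statistic(xs, ys):
        return abs(sum(xs) / len(xs) - sum(ys) / len(ys))

    observed = statistic(sample_a, sample_b)
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Reassign observations to the two groups at random.
        rng.shuffle(pooled)
        if statistic(pooled[:n_a], pooled[n_a:]) >= observed:
            at_least_as_extreme += 1

    # Add-one correction keeps the estimate strictly positive.
    return (at_least_as_extreme + 1) / (n_permutations + 1)
```

A small p-value suggests the two datasets differ on the chosen summary more than random relabelling would explain, which in the imputation setting would flag a strategy that distorts the data.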

Gabriele La Malfa | Fairness in Multi-Agent Systems

28 September 2023

In this presentation we delve into the field of algorithmic fairness in Multi-Agent Systems (MAS), focusing on the fairness of agents' decision-making processes. We first provide a definition of fairness and present the reasons why it is relevant for AI-based decisions. Various fairness metrics, e.g., demographic parity, conditional statistical parity and fairness through awareness, are discussed. We show how to apply these metrics in multi-agent systems, providing an explanation of the key adaptations. We complete the presentation with an application of the metrics to the Harvest Tree Game, an original configuration of multi-agent systems that are already well-known in the literature.
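As an illustration of the first metric named above, demographic parity compares the rate of favourable decisions across groups. The following is a minimal sketch and is not taken from the talk: the function name `demographic_parity_gap`, the group labels and the toy decisions are all hypothetical, and the multi-agent adaptations discussed in the presentation are not covered.

```python
from collections import defaultdict


def demographic_parity_gap(decisions, groups):
    """Return the largest difference in favourable-decision rate between groups.

    decisions: iterable of 0/1 outcomes (1 = favourable decision).
    groups: iterable of group labels, aligned with decisions.
    A gap of 0.0 means every group is favoured at the same rate.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for decision, group in zip(decisions, groups):
        totals[group] += 1
        positives[group] += decision
    rates = {g: positives[g] / totals[g] for g in totals}
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data: group "a" is favoured 3 times out of 4,
# group "b" once out of 4.
toy_decisions = [1, 1, 1, 0, 1, 0, 0, 0]
toy_groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap = demographic_parity_gap(toy_decisions, toy_groups)
```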

Dr Raquel Iniesta | Fair modelling: a qualitative framework for an ethical development & implementation of AI models for precision medicine

24 May 2023

The integration of AI-based decision tools into routine clinical care is opening the door to a completely new paradigm where doctors and machines can collaborate to decide on the right diagnosis or treatment for a patient, based on the individual patient's biomedical information. A number of important ethical challenges arise, from the development of the AI tools to their implementation. In this talk Raquel will introduce Fair modelling, a qualitative framework that aims to guide the ethical integration of AI decision systems in healthcare. During her talk, the role that clinicians, developers and patients have in ensuring an ethical development and deployment of AI models will be discussed. Several ethical challenges will be identified and connected with the four ethical principles of the medical profession: Respect for autonomy, Beneficence, Non-Maleficence and Justice.

Josefien Breedvelt | To taper or top-up? Using individual participant data meta-analyses and decision tree modelling for risk stratification and personalisation of psychological interventions to prevent depressive relapse

26 April 2023

Individual participant data meta-analyses (IPDMA) have in recent years been applied to a range of mental health conditions to understand individual differences in treatment response and aid the personalisation of interventions. This presentation covers the results of a large-scale effort to collect and synthesise available data from randomised controlled trials studying the efficacy of psychological interventions versus control to prevent depressive relapse for people in remission from depression (see also: itfra.org). It will further describe how individual participant data could be used to potentially improve risk stratification using decision tree analyses. It will reflect on the practical and methodological considerations of using IPDMA to aid the personalisation of interventions to individual participant characteristics. It will also cover plans for conducting IPDMA for preventing the onset and relapse of common mental health conditions.

Dr Huajie Jin | Using Whole Disease Modelling to Inform Resource Allocation Decisions in Schizophrenia Services

29 March 2023

Whole disease models (WDMs) are large-scale, system-level models which can evaluate multiple decision questions across an entire care pathway. Whilst WDMs can offer several advantages as a platform for undertaking economic analyses, the development of a WDM requires a significant initial investment of time and resources and presents additional challenges for model verification and validation.

During this talk, Lily discusses:

The motivations for her to develop a WDM for schizophrenia services in the UK
Methods for developing the schizophrenia WDM
Reflections on pros and cons of the whole disease modelling approach

Zeljko Kraljevic | Foresight -- Generative Pretrained Transformer (GPT) for Modelling of Patient Timelines using EHRs

22 February 2023

Electronic Health Records hold detailed longitudinal information about each patient's health status and general clinical history, a large portion of which is stored within unstructured text. Existing approaches focus mostly on structured data and a subset of single-domain outcomes. We will explore how temporal modelling of patients from free text and structured data, using deep generative transformers, can be used to forecast a wide range of future disorders, substances, procedures or findings.

Lei Luo | A novel convolutional neural network approach for classifying brain states under image stimuli

25 January 2023

The mechanism of human neural responses to different stimuli has always been of interest to neuroscientists. In clinical situations, tools to distinguish different diseases or states are required. However, classic classification methods have obvious shortcomings: traditional clinical categorical methods may not be adequate for behaviour prediction or brain state classification, and traditional machine learning models leave room for improvement in classification accuracy. With the increasing use of convolutional neural networks (CNN) in neuroimaging computer-assisted classification, an ensemble classifier of CNNs might be able to mine hidden patterns from MEG signals.

Robin Genuer | Random survival forests for competing causes with multivariate longitudinal endogenous covariates

28 November 2022

Joint models have been proposed to compute individual dynamic predictions from repeated measures of one or two markers. However, they hardly extend to the case where the complete patient history includes many more repeated markers. Our objective was thus to propose a solution for the dynamic prediction of a health event that may exploit repeated measures of a possibly large number of endogenous markers. We extended the random survival forest methodology to incorporate multivariate longitudinal endogenous markers. At each split of the nodes of the random forest trees, mixed models for the longitudinal markers are fitted and the predicted random effects are used alongside the other time-fixed predictors to split the subjects. The individual-specific event prediction is derived as the average over all trees of the leaf-specific cumulative incidence function computed using the Aalen-Johansen estimator. We demonstrate in a simulation study the performance of our methodology, both in a small and a large dimensional context. The method is applied to predict the individual risk of dementia in the elderly (accounting for the competing risk of death) according to the trajectories of cognitive functions, brain imaging markers, and general clinical evaluation. Our method is implemented in the R package DynForest.

Dr Lauric Ferrat | The use of machine learning to improve prognostic and diagnostic accuracy

27 July 2022

The use of machine learning to improve prognostic and diagnostic accuracy has been increasing at the expense of classic statistical models. In this talk Dr Lauric Ferrat presents results comparing the prediction performance of several well-known machine learning approaches to logistic regression. He then argues that the focus should be not on performance optimisation but on clinical utility and ease of model access.

Dr Mizanur Khondoker | Performance comparison of dynamic prediction based on joint models and landmark analysis

29 June 2022

In conventional prediction models, predictors are typically measured at a single fixed time point such as at baseline or the most recent follow-up. Dynamic prediction has emerged as a more appealing prediction technique that takes account of the longitudinal history of biomarkers when making predictions. In this talk Dr Mizanur Khondoker presents results from a simulation study comparing the prediction performance of two well-known approaches for dynamic prediction, namely joint modelling and landmarking approaches.

George Gifford | Network Clustering of Cognition and Clinical Symptoms in Psychosis

25 May 2022

Unsupervised learning techniques have been applied to psychosis groups in the hope of finding meaningful but undiscovered groupings of patients. A methodological option for unsupervised learning is network-based clustering, which relies on the topology of the data represented as a network. This study used cognitive and symptom data from a cohort of healthy controls and those with a Clinical High Risk of Psychosis to test the validity of graph clustering and to explore the use of a multilayer clustering method for multimodal unsupervised learning. Graph clustering was able to produce results highly similar to k-means clustering and to separate groups into those with significantly different functioning scores. Multilayer clustering was used to tune the similarity of clustering solutions between modalities.

Dr Nicholas Cummins | Artificial Intelligence: Challenges and Ethical Concerns

30 March 2022

Artificial Intelligence (AI) systems and applications are gaining greater prominence in everyday life. With this growth comes the need to discuss and debate the implications of this development. With this in mind, this talk aims to introduce some of the key concepts relating to what AI really means and the different means of achieving it, and to outline key challenges and ethical considerations.

Dr Joie Ensor | Building and validating prediction models: An overview of sample size guidance and the pmsampsize package

24 November 2021

In this talk, Joie discusses some of the considerations when deciding how much data is ‘enough’ when looking to i) develop a new clinical prediction model (CPM) and ii) validate an existing CPM. When designing a study to develop a new CPM, researchers must ensure a large enough sample size to develop a model that predicts as accurately as possible. Conversely, when designing a study to validate an existing CPM, we must ensure a sample size large enough to estimate model performance accurately and precisely in an external sample.

Dr Ewan Carr | Identifying homogeneous subgroups of patients and important features: a topological machine learning approach

20 October 2021

In this talk, Ewan introduces a pipeline that exploits recent developments in topological data analysis to identify homogeneous clusters in high-dimensional data. The approach is based on Mapper, an algorithm that reduces a point cloud into a one-dimensional graph. Written in Python and freely available online, the pipeline offers several advantages over existing clustering techniques. These include the ability to integrate prior knowledge into the clustering process and selection of optimal clusters; the use of the bootstrap to restrict the search to robust topological features; the use of machine learning to inspect clusters; and the ability to incorporate mixed data types.

Diana Shamsutdinova | Combining classical and machine-learning methods in Survival Analysis to boost predictive performance and preserve interpretability

22 September 2021

In this talk, Diana speaks about survival analysis, which deals with longitudinal data and estimates both the distribution of time-to-event in a population over the observation time and how the time-to-event depends on the risk factors.

Dr Andrew Lawrence | Introduction to dCVnet – software for clinical prediction

15 September 2021

In this talk, Andrew speaks about dCVnet, a software tool for prediction modelling. It produces tuned elastic-net regression models with cross-validated prediction performance measures. This approach can be useful in smaller samples or with many predictors. The tool is fast, easy to use and, in contrast to more general prediction modelling software, requires minimal statistical programming experience. dCVnet was developed recently with support from the Biomedical Research Centre for Mental Health at the South London and Maudsley NHS Trust and King’s College London and is freely available at https://github.com/AndrewLawrence/dCVnet.

Mihai Ermaliuc | Creating Ensembles of Generative Adversarial Network Discriminators for One-class Classification

7 July 2021

In this talk, Mihai introduces a new technique based on Generative Adversarial Networks (GANs) that achieves high performance in the one-class classification problem. The algorithm performs one-class classification through binary classification of the target class against synthetic samples. Mihai's work was recently nominated for the best PhD student paper award of the International Conference on Engineering Applications of Neural Networks, EANN 2021.

Lucy Bull | Harnessing repeated measurements of predictor variables: A review of existing methods for clinical risk prediction

21 April 2021

In this talk, Lucy Bull provides an overview of her methodological work that focuses on how we can make better use of routinely-collected medical data to enhance the reliability and applicability of clinical prediction models (CPMs). More specifically, Lucy highlights the motivations behind incorporating longitudinal data into clinical prediction models, provides a detailed overview of available methodology and discusses the challenges faced when applying such methodology to real-world data, using a case-study in chronic disease.

Isobel Ridler | Regularised Structural Equation Modelling Application to Psychometric Scales

31 March 2021

Isobel talks about structural equation modelling (SEM) and Regularised SEM (regSEM), a method that incorporates penalised likelihood into the SEM framework. In this seminar, regSEM is applied to a model of outcome prediction including a large psychometric scale, first in a simulation study and then in a real-world longitudinal data set. This allows a comparison of standard maximum likelihood estimation and regSEM, and demonstrates the ability of regSEM to perform sparse model selection and hence potentially optimise a scale for outcome prediction.

Dr Andreas Groll | Regularization and Effect Selection in Cox Frailty Models

17 February 2021

In this talk, Dr Andreas Groll investigates the effect structure in the Cox frailty model, which is the most widely used model that accounts for heterogeneity in survival data.

Since in survival models one has to account for possible variation of the effect strength over time, the selection of the relevant features has to distinguish between several cases: covariates can have time-varying effects, can have time-constant effects, or can be irrelevant. Regularization approaches are discussed that are able to distinguish between these types of effects to obtain a sparse representation that includes the relevant effects in a proper form. This idea is applied to a real-world data set, illustrating that the complexity of the influence structure can be strongly reduced by using such a regularization approach.

Dr Raquel Iniesta | Augmenting Machine Learning with Topological Data Analysis for precision

27 January 2021

Topological Data Analysis (TDA) is a recently emerged field offering promising tools to extract descriptors of the shape and structure of complex data.

In this talk, Raquel provides an overview of TDA methods that complement current analytical approaches based on machine learning for precision medicine studies. She also introduces two popular techniques from TDA, the persistence diagram and the Mapper graph, and discusses how effective these techniques are, based upon the available literature where TDA has been applied in the context of precision medicine. Lastly, she very briefly presents her and her team's ongoing work on how to integrate TDA with machine learning models to identify homogeneous subgroups of patients and predict clinical outcomes.

Dr Sam Leighton | Development and validation of a non-remission risk prediction model in First Episode Psychosis

18 November 2020

Sam covers the development and validation of a risk prediction model of symptom non-remission in first-episode psychosis. His development cohort consisted of 1027 patients with first episode psychosis recruited between 2005 and 2010 from 14 early intervention services across the National Health Service in England.

The prediction model showed good discrimination (C-statistic of 0.72 (0.66, 0.78)) and adequate calibration, with an intercept alpha of 0.14 (-0.11, 0.39) and a slope beta of 1.15 (0.76, 1.53). Our model improved the net benefit by 13%, equivalent to 13 more detected non-remitted first episode psychosis individuals per 100. Hence, using our model would be worthwhile if we accept using it on eight individuals to predict one additional non-remitted individual, or using our model on eight individuals will avoid unnecessary additional interventions in one individual.
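The arithmetic behind the net benefit figures can be sketched as follows. This is an illustration only: the variable names are hypothetical, and it assumes that the "eight individuals" figure is the reciprocal of the net benefit gain rounded up to a whole person, which the abstract does not state explicitly.

```python
import math

# Improvement in net benefit reported in the abstract (13%).
net_benefit_gain = 0.13

# Additional non-remitted individuals detected per 100 people assessed.
extra_detected_per_100 = net_benefit_gain * 100

# Assumed interpretation: number of people the model must be used on
# to detect one additional non-remitted individual, rounded up.
people_per_extra_detection = math.ceil(1 / net_benefit_gain)
```

Under this assumption, 1 / 0.13 is roughly 7.7, which rounds up to the eight individuals quoted above.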

Dr Florian Privé | Efficient penalized regression methods for genetic prediction

14 October 2020

Dr Florian Privé will start by introducing penalized regression models and their pros and cons, particularly in the context of genetic prediction, then explain how these models can still be used, even for very large datasets. He will present some results of using penalized regression to predict 240 different phenotypes based on 1M genetic variants for each of 500K individuals.

Dr Florian Privé is a postdoc in Aarhus, Denmark. He is interested in using statistical learning to advance precision medicine. He is specifically developing tools to analyze very large datasets and methods to build predictive models based on large genetic data. He is also fond of Data Science and is an R(cpp) enthusiast.

Filippo Corponi | Frontal lobes dysfunction across clinical clusters of acute schizophrenia

10 September 2020

Schizophrenia is a heterogeneous disease comprising manifold clinical phenotypes which may reflect distinct biological underpinnings. Frontal lobes are a key area of brain dysfunction in schizophrenia. The frontal assessment battery (FAB) is a battery screening for a dysexecutive syndrome in neurodegenerative diseases.

Filippo Corponi presents his work investigating the relationship between frontal lobe impairment and symptom profiles defined along the Positive and Negative Syndrome Scale (PANSS) principal components in patients with acute schizophrenia.

Dr Olesya Ajnakina | Using modern statistical learning methods to estimate all cause mortality risk

1 July 2020

Dr Olesya Ajnakina discusses her large population-based cohort study, which addresses the need to develop a robust prediction model for estimating an individual's risk of all-cause mortality. This allows relevant assessments and interventions to be targeted appropriately.

Having employed modern statistical learning algorithms and addressed the weaknesses of previous models, the new mortality model achieved good discrimination and calibration to quantify absolute 10-year risk of all-cause mortality in older adults, as shown by its performance in a separate validation cohort. The model can be useful for clinical, policy, and epidemiological applications.

Professor Daniel Stahl | Applying Statistical Learning Methods to Improve Analyses of Medical Research Studies

2 May 2020

The aim of this presentation is an introduction to statistical learning and prediction modelling.

Daniel explains the key differences between inferential statistical modelling and prediction modelling and then introduces the concept of prediction modelling and statistical learning. Finally, Daniel assesses the usefulness of statistical learning algorithms for applications in medical research as an alternative to classical statistical inference methods by reanalysing an event-related brain potential (ERP) dataset from infants at high or low risk of developing autism. Daniel also explains the concept of cross-validation for model selection and validation and provides a brief introduction to regularized regressions.
