Our Prediction Modelling group hosts regular presentations to engage our community and introduce new members to it.
Watch the latest presentation:
Haider Mian | Deploying machine learning algorithms to stratify patients with inflammatory bowel disease using routinely collected electronic health records | 1 February 2024
Personalised medicine approaches are eagerly awaited to facilitate individualisation of medical care for patients with inflammatory bowel disease (IBD). Multiple approaches have already been explored in attempts to stratify patients into different prognostic trajectories. In this study we aimed to use unsupervised machine learning algorithms to cluster patients based on their routinely collected electronic health records.
Click the drop-downs to see past presentations:
25 October 2023
As the number of dimensions increases, big datasets from precision medicine research studies can exhibit complex shapes and unexpected behaviours. The statistical analysis of such data necessitates sophisticated analytical methods capable of capitalizing on the high dimension of these datasets. This talk will present novel methods of applying topological data analysis (TDA) to devise a unique approach for assessing the quality of data imputation for missing values. The method establishes a pipeline that combines TDA with permutation testing to identify differences in topological data structures among datasets. This provides valuable information for tailoring the selection of missing data imputation strategies.
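The permutation-testing step can be illustrated with a minimal sketch. It does not reproduce the speaker's pipeline: the topological comparison is replaced by a placeholder statistic (an absolute difference in means), and the function names, the toy data and the add-one correction are all assumptions made for illustration.

```python
import random

def permutation_p_value(sample_a, sample_b, statistic, n_permutations=2000, seed=0):
    """Estimate how often a random relabelling gives a statistic at least as
    large as the one observed between the two samples."""
    rng = random.Random(seed)
    observed = statistic(sample_a, sample_b)
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # relabel the pooled observations at random
        if statistic(pooled[:n_a], pooled[n_a:]) >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)

def abs_mean_difference(a, b):
    """Placeholder statistic; a TDA pipeline would use a topological distance here."""
    return abs(sum(a) / len(a) - sum(b) / len(b))

# Hypothetical toy data: an original variable and a badly imputed version of it.
original = [0.1, 0.4, 0.35, 0.8, 0.55, 0.2, 0.65, 0.9]
imputed = [2.1, 2.4, 2.35, 2.8, 2.55, 2.2, 2.65, 2.9]
p_value = permutation_p_value(original, imputed, abs_mean_difference)
print(p_value)
```

A small p-value suggests that the two datasets differ in the chosen summary by more than random relabelling would explain, which is the kind of evidence that could inform the choice of imputation strategy.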
28 September 2023
In this presentation we delve into the field of algorithmic fairness in Multi-Agent Systems (MAS), focusing on the fairness of agents' decision-making processes. We first provide a definition of fairness and we present the reasons why it is relevant for AI-based decisions. Various fairness metrics, e.g., demographic parity, conditional statistical parity and fairness through awareness, are discussed. We show how to apply these metrics in multi-agent systems, providing an explanation of the key adaptations. We complete the presentation with an application of the metrics to the Harvest Tree Game, an original configuration of multi-agent systems that are already well-known in the literature.
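As an illustration of one of the metrics named above, the sketch below computes a demographic parity gap: the difference between groups in the share of favourable decisions. It is a generic single-decision-maker version, not the multi-agent adaptation presented in the talk, and the function names and toy data are hypothetical.

```python
from collections import defaultdict

def positive_rates(decisions, groups):
    """Share of favourable decisions (coded 1) received by each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for decision, group in zip(decisions, groups):
        totals[group] += 1
        positives[group] += decision
    return {g: positives[g] / totals[g] for g in totals}

def demographic_parity_gap(decisions, groups):
    """Largest difference in favourable-decision rates between any two groups.
    A gap of 0 means demographic parity holds exactly."""
    rates = positive_rates(decisions, groups)
    return max(rates.values()) - min(rates.values())

# Hypothetical toy data: 1 = favourable decision, 0 = unfavourable.
decisions = [1, 0, 1, 1, 0, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap = demographic_parity_gap(decisions, groups)
print(positive_rates(decisions, groups), gap)
```

Here group "a" receives a favourable decision three times out of four and group "b" once out of four, so the gap is 0.5.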
24 May 2023
The integration of AI-based decision tools into routine clinical care is opening the door to a completely new paradigm where doctors and machines can collaborate to decide on the right diagnosis or treatment for a patient, based on the individual patient's biomedical information. A number of important ethical challenges arise, from the development of the AI tools to their implementation. In this talk Raquel will introduce Fair modelling, a qualitative framework that aims to serve as an interrogation for an ethical integration of AI decision systems in healthcare. During her talk, the role that clinicians, developers and patients have in ensuring an ethical development and deployment of AI models will be discussed. Several ethical challenges will be identified and connected with the four ethical principles of the medical profession: Respect for autonomy, Beneficence, Non-Maleficence and Justice.
26 April 2023
Individual participant data meta-analyses (IPDMA) have in recent years been applied to a range of mental health conditions to understand individual differences in treatment response and aid the personalisation of interventions. This presentation covers the results of a large-scale effort to collect and synthesise available data from randomised controlled trials studying the efficacy of psychological interventions versus control to prevent depressive relapse for people in remission from depression (see also: itfra.org). It will further describe how individual participant data could be used to potentially improve risk stratification using decision tree analyses. It will reflect on the practical and methodological considerations of using IPDMA to aid the personalisation of interventions to individual participant characteristics. It will also cover plans for conducting IPDMA for preventing the onset and relapse of common mental health conditions.
29 March 2023
Whole disease models (WDMs) are large-scale, system-level models which can evaluate multiple decision questions across an entire care pathway. Whilst WDMs can offer several advantages as a platform for undertaking economic analyses, the development of a WDM requires a significant initial investment of time and resources and presents additional challenges for model verification and validation.
During this talk, Lily discusses:
- The motivations for her to develop a WDM for schizophrenia services in the UK
- Methods for developing the schizophrenia WDM
- Reflections on pros and cons of the whole disease modelling approach
22 February 2023
Electronic Health Records hold detailed longitudinal information about each patient's health status and general clinical history, a large portion of which is stored within the unstructured text. Existing approaches focus mostly on structured data and a subset of single-domain outcomes. We will explore how temporal modelling of patients from free text and structured data, using deep generative transformers, can be used to forecast a wide range of future disorders, substances, procedures or findings.
25 January 2023
The mechanism of human neural responses to different stimuli has always been of interest to neuroscientists. In clinical situations, tools to distinguish different diseases or states are required. However, classic classification methods have obvious shortcomings: traditional clinical categorical methods may not be adequate for behaviour prediction or brain state classification, and traditional machine learning models leave room for improvement in classification accuracy. With the increasing use of convolutional neural networks (CNN) in neuroimaging computer-assisted classification, an ensemble classifier of CNNs might be able to mine hidden patterns from MEG signals.
27 July 2022
The use of machine learning to improve prognostic and diagnostic accuracy has been increasing at the expense of classic statistical models. In this talk Dr Lauric Ferrat presents results comparing the prediction performance of several well-known machine learning approaches to logistic regression. He then argues that the focus should be on clinical utility and ease of model access rather than on performance optimisation.
29 June 2022
In conventional prediction models, predictors are typically measured at a single fixed time point such as at baseline or the most recent follow-up. Dynamic prediction has emerged as a more appealing prediction technique that takes account of the longitudinal history of biomarkers when making predictions. In this talk Dr Mizanur Khondoker presents results from a simulation study comparing the prediction performance of two well-known approaches for dynamic prediction, namely joint modelling and landmarking.
25 May 2022
Unsupervised learning techniques have been applied to psychosis groups in the hope of finding meaningful but undiscovered groupings of patients. A methodological option for unsupervised learning is network-based clustering, which relies on the topology of the data represented as a network. This study used cognitive and symptom data from a cohort of healthy controls and those with a Clinical High Risk of Psychosis to test the validity of graph clustering and to explore the use of a multilayer clustering method for multimodal unsupervised learning. Graph clustering was able to produce results highly similar to k-means clustering and to separate groups into those with significantly different functioning scores. Multilayer clustering was used to tune the similarity of clustering solutions between modalities.
30 March 2022
Artificial Intelligence (AI) systems and applications are gaining greater prominence in everyday life. With this growth comes the need to discuss and debate the implications of this development. With this in mind, this talk aims to introduce some of the key concepts relating to what AI really means and the different means of achieving it, and to outline key challenges and ethical considerations.
24 November 2021
In this talk, Joie discusses some of the considerations when deciding how much data is ‘enough’ when looking to i) develop a new clinical prediction model (CPM) and ii) validate an existing CPM. When designing a study to develop a new CPM, researchers must ensure a large enough sample size to develop a model that predicts as accurately as possible. Conversely, when designing a study to validate an existing CPM, we must ensure a sample size large enough to estimate model performance accurately and precisely in an external sample.
20 October 2021
In this talk, Ewan introduces a pipeline that exploits recent developments in topological data analysis to identify homogeneous clusters in high-dimensional data. The approach is based on Mapper, an algorithm that reduces a point cloud into a one-dimensional graph. Written in Python and freely available online, the pipeline offers several advantages over existing clustering techniques. These include the ability to integrate prior knowledge into the clustering process and selection of optimal clusters; the use of the bootstrap to restrict the search to robust topological features; the use of machine learning to inspect clusters; and the ability to incorporate mixed data types.
22 September 2021
In this talk, Diana speaks about survival analysis, which deals with longitudinal data and estimates both the distribution of time-to-event in a population over the observation time and how the time-to-event depends on risk factors.
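To make the first of these two goals concrete, the sketch below estimates a survival curve with the Kaplan-Meier estimator, a standard method for time-to-event data with censoring. The abstract does not say which estimators the talk covers, so the choice of Kaplan-Meier, the function name and the toy data are assumptions. Modelling the dependence on risk factors (for example with Cox regression) is not shown.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is 1 if the event was observed at times[i] and 0 if the
    observation was censored at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed_events = 0
        leaving = 0
        # Group all observations that share the same time point.
        while i < len(data) and data[i][0] == t:
            observed_events += data[i][1]
            leaving += 1
            i += 1
        if observed_events:
            survival *= 1 - observed_events / at_risk
            curve.append((t, survival))
        # Both events and censored observations leave the risk set.
        at_risk -= leaving
    return curve

# Hypothetical toy data: follow-up times and event indicators.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
print(curve)
```

With these toy data the estimated survival probability falls to about 0.8 after time 2, 0.6 after time 3 and 0.3 after time 5; the censored observation at time 8 does not change the curve.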
15 September 2021
In this talk, Andrew speaks about dCVnet, a software tool for prediction modelling. It produces tuned elastic-net regression models with cross-validated prediction performance measures. This approach can be useful in smaller samples or with many predictors. The tool is fast, easy to use and, in contrast to more general prediction modelling software, requires minimal statistical programming experience. dCVnet was developed recently with support from the Biomedical Research Centre for Mental Health at the South London and Maudsley NHS Trust and King’s College London and is freely available at https://github.com/AndrewLawrence/dCVnet.
7 July 2021
In this talk, Mihai introduces a new technique based on Generative Adversarial Networks (GANs) that is able to achieve high performance in the one-class classification problem. He talks about the introduction of an algorithm for one-class classification based on binary classification of the target class against synthetic samples. Mihai's work was recently nominated for the best PhD student paper award of the International Conference of Engineering Applications of Neural Networks, EANN 2021.
21 April 2021
In this talk, Lucy Bull provides an overview of her methodological work that focuses on how we can make better use of routinely-collected medical data to enhance the reliability and applicability of clinical prediction models (CPMs). More specifically, Lucy highlights the motivations behind incorporating longitudinal data into clinical prediction models, provides a detailed overview of available methodology and discusses the challenges faced when applying such methodology to real-world data, using a case-study in chronic disease.
31 March 2021
Isobel talks about structural equation modelling (SEM) and regularised SEM (regSEM), a method that incorporates penalised likelihood into the SEM framework. In this seminar, regSEM is applied to a model of outcome prediction including a large psychometric scale, first in a simulation study and then in a real-world longitudinal data set. This allows for a comparison of standard maximum likelihood estimation and regSEM, and demonstrates the ability of regSEM to perform sparse model selection and hence potentially optimise a scale for outcome prediction.
17 February 2021
In this talk, Dr Andreas Groll investigates the effect structure in the Cox frailty model, which is the most widely used model that accounts for heterogeneity in survival data.
Since in survival models one has to account for possible variation of the effect strength over time, the selection of the relevant features has to distinguish between several cases: covariates can have time-varying effects, can have time-constant effects or can be irrelevant. Regularization approaches are discussed that are able to distinguish between these types of effects to obtain a sparse representation that includes the relevant effects in a proper form. This idea is applied to a real-world data set, illustrating that the complexity of the influence structure can be strongly reduced by using such a regularization approach.
27 January 2021
Topological Data Analysis (TDA) is a recently emerged field offering promising tools to extract descriptors of the shape and structure of complex data.
In this talk, Raquel provides an overview of TDA methods that complement current analytical approaches based on machine learning for precision medicine studies. She also introduces two popular techniques from TDA, the persistence diagram and the Mapper graph, and discusses how effective these techniques are, based upon the available literature where TDA has been applied in the context of precision medicine. Lastly, she very briefly presents her and her team's ongoing work on how to integrate TDA with machine learning models to identify homogeneous subgroups of patients and predict clinical outcomes.
18 November 2020
The prediction model showed good discrimination (C-statistic of 0.72 (0.66, 0.78)) and adequate calibration, with an intercept alpha of 0.14 (-0.11, 0.39) and a slope beta of 1.15 (0.76, 1.53). Our model improved the net benefit by 13%, equivalent to 13 more detected non-remitted first episode psychosis individuals per 100. Hence, using our model would be worthwhile if we accept using it on eight individuals to predict one additional non-remitted individual, or using our model on eight individuals will avoid unnecessary additional interventions in one individual.
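The arithmetic linking the 13% net benefit gain to the "eight individuals" figure can be reproduced as follows. Reading the figure as the reciprocal of the net benefit gain, rounded up to a whole person, is an assumption; the abstract does not state how it was derived, and the variable names are illustrative.

```python
import math

# Net benefit gain reported in the abstract, expressed as a proportion.
net_benefit_gain = 0.13

# Additional non-remitted individuals detected per 100 people assessed.
extra_detected_per_100 = net_benefit_gain * 100

# Assumed interpretation: number of individuals the model must be used on
# to detect one additional non-remitted individual (reciprocal of the gain).
individuals_per_extra_detection = 1 / net_benefit_gain

# Round up, since the model cannot be applied to a fraction of a person.
individuals_needed = math.ceil(individuals_per_extra_detection)

print(round(extra_detected_per_100), round(individuals_per_extra_detection, 2), individuals_needed)
```

Under this reading, 1 / 0.13 is roughly 7.7, which rounds up to the eight individuals quoted above.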
14 October 2020
Dr Florian Privé will start by introducing penalized regression models and their pros and cons, particularly in the context of genetic prediction, then explain how these models can still be used, even for very large datasets. He will present some results of using penalized regression to predict 240 different phenotypes based on 1M genetic variants for each of 500K individuals.
Dr Florian Privé is a postdoc in Aarhus, Denmark. He is interested in using statistical learning to advance precision medicine. He is specifically developing tools to analyze very large datasets and methods to build predictive models based on large genetic data. He is also fond of data science and is an R(cpp) enthusiast.
10 September 2020
Schizophrenia is a heterogeneous disease comprising manifold clinical phenotypes which may reflect distinct biological underpinnings. Frontal lobes are a key area of brain dysfunction in schizophrenia. The frontal assessment battery (FAB) is a battery that screens for a dysexecutive syndrome in neurodegenerative diseases.
Filippo Corponi presents his work investigating the relationship between frontal lobe impairment and symptom profiles defined along the Positive and Negative Syndrome Scale (PANSS) principal components in patients with acute schizophrenia.
1 July 2020
Dr Olesya Ajnakina discusses her large population-based cohort study, which addresses the need to develop a robust prediction model for estimating an individual's risk of all-cause mortality. This allows relevant assessments and interventions to be targeted appropriately.
Having employed modern statistical learning algorithms and addressed the weaknesses of previous models, the new mortality model achieved good discrimination and calibration to quantify absolute 10-year risk of all-cause mortality in older adults, as shown by its performance in a separate validation cohort. The model can be useful for clinical, policy, and epidemiological applications.
2 May 2020
The aim of this presentation is an introduction to statistical learning and prediction modelling.
Daniel explains the key differences between inferential statistical modelling and prediction modelling and then introduces the concept of prediction modelling and statistical learning. Finally, Daniel assesses the usefulness of statistical learning algorithms for applications in medical research as an alternative to classical statistical inference methods by reanalysing an event-related brain potential (ERP) dataset from infants at high or low risk of developing autism. Daniel also explains the concept of cross-validation for model selection and validation and provides a brief introduction to regularized regressions.
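The idea of cross-validation mentioned above can be sketched with a minimal k-fold splitter: the data are divided into k folds, and each fold is held out once for testing while the remaining folds are used for training. This is a generic illustration, not the procedure used in the talk; the function name is hypothetical, and in practice the samples would normally be shuffled before splitting.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

# Example: 10 samples split into 3 folds.
folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Each sample appears in exactly one test fold, so averaging a model's performance over the folds uses every observation for validation once without ever testing on data the model was trained on.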