1 of 23

2021 BayesiaLab Conference — A Zoom Virtual Event

A Zoom Virtual Event — October 11–15, 2021

The 9th Annual BayesiaLab Conference

Since Judea Pearl first proposed Bayesian networks in the 1980s, this new paradigm's attractive mathematical and statistical properties have become well-understood and widely utilized within computer science. However, their enormous potential as a practical research framework—beyond computer science—has only emerged more recently. Promoting the practical use of Bayesian networks for research, analytics, and reasoning is the principal objective of our annual BayesiaLab Conference.

For the last eight years, this event has provided unique opportunities to learn about the state of the art in practical applications of Bayesian networks, from architecture to zoology and everything in between. Many of our regular conference attendees appreciate the cross-disciplinary nature of the program, and you may find that an innovative methodology from medical research could very well apply to a marketing science problem or vice versa.

Conference Presentations

If you missed some of the talks, we uploaded recordings of all presentations and the corresponding slides.

BayesiaLab 10 Update

Dr. Lionel Jouffe, CEO, Bayesia S.A.S.

Presented at the 9th Annual BayesiaLab Conference on October 11, 2021.

Presentation Video

About the Presenter

Dr. Lionel Jouffe is co-founder and CEO of France-based Bayesia S.A.S. Lionel holds a Ph.D. in Computer Science from the University of Rennes and has worked in Artificial Intelligence since the early 1990s. While working as a Professor/Researcher at ESIEA, Lionel started exploring the potential of Bayesian networks.
After co-founding Bayesia in 2001, he and his team have been working full-time on the development of BayesiaLab. Since then, BayesiaLab has emerged as the leading software package for knowledge discovery, data mining, and knowledge modeling using Bayesian networks. It enjoys broad acceptance in academic communities, business, and industry.

Previous Conference Presentations

Long-Term Effects of Marketing using Bayesian Networks

Presented at the 9th Annual BayesiaLab Conference on October 11, 2021.

Abstract

Long-term brand marketing is important to achieve sustainable growth. There are many important areas that companies can invest in paid ads, performance marketing, and affiliate marketing. They will all help with growth, but eventually, you will hit that glass ceiling. Plus, with nearly every market oversaturated at this point, we will need to spend a whole lot of money to have any chance of standing out.

In my talk, I demonstrate how Course5 is disrupting the space of Digital Media Optimization with its new solution offering and why it is important for the industry to consider and adopt such solutions in the challenging times we are in.

Presentation Video

About the Presenters

Anand Wilson, Lead Data Science Consultant, Advanced Analytics Course5 Intelligence

Anand comes with 9+ years of experience in applied artificial intelligence and data sciences. He has worked for marque clients such as Lenovo, Intel, Microsoft, Novartis, Novo Nordisk, GE, Mars Wrigley, PepsiCo, etc., enabling digital transformation using A.I.

In his current role, Anand focuses on developing and market solutions based on the Bayesian Network model theory, which enables us to quantify the causality in an observational study. A major area of work/research includes Knowledge Modelling, Machine Learning with BayesiaLab, and Inference.

Anand comes from applied statistics background. He has a master's degree in statistics. Anand carries an acute interest in machine reasoning, causal inference, and experimental designs, along with machine learning and data science.

Buvana Iyer, Principal Solution Architect, Discovery Solution Course5 Intelligence

Buvana does consult for C-level clients and has led projects achieving org-level implementation of enterprise analytics, including Software Development, BI & Analytics, and ML & AI solutions (at scale), adopting DevOps philosophy with agile delivery. Expertise in leveraging both traditional statistics and machine learning techniques to create solutions and deliver business value.

Buvana comes from applied Mathematics background. She has a master's in Mathematics. Buvana carries an acute interest in Predictive Analytics, Statistical modeling, machine reasoning, and experimental designs, along with machine learning and data science.

Presentation Slides

Previous Presentations by Course5 Intelligence

Customer Preference Sequencing for Better Customer Engagement (Laval Virtual World, 2020)
Modern Approaches to Causal Modelling in Customer Experience Measurement (Durham, 2019)

Learning a Bayesian Structure to Model Entrepreneurial Intentions Towards Business Creation

Abstract

Economic growth in most advanced countries is driven by small and medium enterprises, and most countries prioritize entrepreneurship for economic growth and innovation. This is very apparent in the United Arab Emirates, where an average of about 39% of adults want to start a business in the next three years. As such, Entrepreneurial intentions have been a major focus of research, but they have always been studied using generic models. We use Bayesian Networks (BN) to model entrepreneurial intentions as it provides an advantage over classical methods. To our knowledge, no study has used the BN framework to model entrepreneurial intentions within the UAE. Using the Theory of Planned Behavior (TPB) as a foundation, a cross-sectional study was conducted among a random sample of 324 Emirati University students in the UAE. We implemented Unsupervised Structural learning within BayesiaLab using the SopEQ unsupervised algorithm to minimize the “Minimum Description Length” (MDL) score. Our model provides confirmation of and more robust statistical support for existing theoretical frameworks. It helped not only find relationships among the different entrepreneurial factors but also assess the effects of changes in these variables on intentions. One of the strengths of our study is the inclusion of attitudes toward entrepreneurship and self-efficacy variables. Accordingly, the main conclusion that can be drawn from our model is that Entrepreneurial intentions are highly affected by attitude, self-efficacy, subjective norms, and opportunity feasibility. The results can be used by professionals for proposing new policies for university opportunities and government support.

Presentation Video

About the Presenter

Linda Smail, Ph.D. Department of Mathematics & Statistics College of Natural and Health Sciences Zayed University, Dubai, United Arab Emirates linda.smail@zu.ac.ae

Linda Smail is an Associate professor in the Department of Mathematics and Statistics at Zayed University, Dubai, United Arab Emirates, where she teaches Mathematics and Statistics courses. She obtained her Ph.D. in Mathematics from Marne-La-Vallée University, France, in 2004. Her research interests are in inference, learning graphical models, and applications of Bayesian Networks in different fields, from Education to Health.

Presentation Materials

Predicting the “Cracking” Behavior of Road Networks in UAE with Bayesian Belief Networks

Presented at the 9th Annual BayesiaLab Conference on October 12, 2021.

Abstract

Cracking is one of the major factors which leads to the deterioration of road structures. Globally, United Arab Emirates is a country with high-standard road networks. Highway organizations take the initiative to measure the cracking and confirm whether it is within the prescribed limit. However, such monitoring activities are expensive in terms of cost, labor, and machinery, which ultimately leads to failure in the timely repair and maintenance activities of the roads. This results in a reduction of the service life of the pavements. This study aims to develop a solution for this problem by studying the historical data of factors that influence cracking in roads. To perform this, data related to major road networks in the country are collected from the highway agency. The data include environmental factors, traffic intensity, and factors like road type and age of the road. A Supervised Learning algorithm will determine the role of each factor that contributes to cracking. Once the significance of each factor is analyzed, further analysis based on a dynamic Bayesian network will aid in estimating the future values of cracking on the roads without measuring it. This study thus can be a major contribution in the transportation field to improve the quality of road networks.

Presentation Video

About the Presenter

Ms. Babitha did her B.Tech in Civil Engineering and M.Tech in Construction Engineering and Management from India. She has worked as a Research Assistant at UAE University in the Civil and Environmental Engineering Department, and her work was mainly focused on applying Artificial Intelligence techniques in the Civil Engineering field. Currently, she is doing a Ph.D. at UAE University on a topic involving the application of Bayesian networks.

Presentation Materials

Explainable Bayesian Networks for Transport Policy

Abstract

To deal with increasing amounts of data, decision and policymakers frequently turn to advances in machine learning and artificial intelligence to capitalise on the potential reward. But there is also a reluctance to trust black-box models, especially when such models are used to support decisions and policies that affect people directly, like those associated with transport and people's mobility. Recent developments focus on explainable artificial intelligence to bolster models' trustworthiness. In this paper, we demonstrate the use of an explainable-by-design model, Bayesian Networks, on travel behaviour. The model incorporates various demographic and socioeconomic variables to describe full-day activity chains: activity and mode choice, as well as the activity and trip durations. More importantly, this paper shows how the model can be used to provide the most relevant explanation for people's observed travel behaviour. The overall goal is to show that model explanations can be quantified and, therefore, assist policymakers to truly make evidence-based decisions. This goal is achieved through two case studies to explain people's vulnerability as it pertains to their total trip duration.

Presentation Video

About the Presenter

Alta de Waal, Ph.D. BMW Software Factory South Africa

Alta is a senior data scientist at the BMW Software Factory in South Africa. She has more than 20 years of experience in the design, development, and implementation of different components in the AI value chain. Her current research focus is natural language processing (NLP) and explainable methods in AI for the purpose of actionable insights, fairness, and accountability in these systems.

Presentation Materials

Previous Presentations

Bayesian Networks for Knowledge Discovery and Curriculum Optimisation in Academic Programmes (Laval Virtual World, 2020)
Activity-Based Travel Demand Generation Using Bayesian Networks (Laval Virtual World, 2020)
Spatially Discrete Probability Maps for Anti-Poaching Efforts (Paris, 2017)

Treatment of Missing Data in Bayesian Structural Learning: A Simulation Study for Social Science

Presented at the 9th Annual BayesiaLab Conference on October 12, 2021.

Abstract

Bayesian networks allow us to uniquely visualise data and tackle complex interdisciplinary problems. Bayesian networks are based on Bayes' theorem. The premise of this theory is that initial (prior) beliefs can be updated based on new evidence. Part of the appeal of this method is its intuitive nature. The process of updating beliefs, given new information, is common to everyday scenarios. Bayesian networks can be used for variable inference (identifying the value of variables), parameter inference (identifying probabilistic dependencies between variables), and structure learning (understanding associations among variables). Social science is an area with large amounts of complex interdisciplinary data where Bayesian networks may be useful to unravel relationships among variables. However, the uptake of Bayesian networks in social science is relatively low. Here, we look at how Bayesian networks have been applied to antibiotic resistance and antimicrobial use and explore potential barriers to their use in this field of study. The complex nature of this biosocial phenomenon means that applications are increasingly making use of social science data, e.g., survey data. This type of data is often associated with high levels of missing data. Here, we further consider how this missing data can be addressed for Bayesian network structure learning. We compare a commonly used method in social science, multiple imputation by chained equations (MICE), with one specific for Bayesian network learning, structural expectation-maximization (SEM). We simulate multiple incomplete data sets with different missingness mechanisms, numbers of categorical variables, and amounts of missing data. We evaluate and compare the performance of MICE and SEM in capturing the real Bayesian network structure under each condition. We find that applying either method (MICE or SEM) provides better structure recovery than doing nothing, and SEM, in general, outperforms MICE. This finding is robust across missingness mechanisms, the number of variables, and the amount of missing data. This suggests that taking advantage of the additional information provided by network structure during SEM can improve the performance of Bayesian networks for social science and other interdisciplinary analyses.

Presentation Video

About the Presenters

Madeleine Clarkson Irvine Building University of St Andrews St Andrews, KY16 9AL, Fife, UK mcc23@st-andrews.ac.uk

Ms. Madeleine Clarkson has an undergraduate degree in Economics from the University of Cape Town, South Africa, and an MSc in the Control of Infectious Disease from the London School of Hygiene and Tropical Medicine(LSHTM), United Kingdom. She has worked as a research assistant in infectious disease modeling at Imperial and LSHTM. She is currently undertaking a Ph.D. in Bayesian Network analysis of Antimicrobial resistance at the University of St Andrews based within Dr. V Anne Smith's Lab.

Xuejia Ke Harold Mitchell Building University of St Andrews St Andrews, KY16 9TH, Fife, UK xk5@st-andrews.ac.uk

Ms. Xuejia Ke has an undergraduate degree in Pharmacy from China Pharmaceutical University, China, and an undergraduate degree in Pharmacology and Biochemistry from the University of Strathclyde, United Kingdom. She has an MSc in Bioinformatics from the University of Edinburgh, United Kingdom. She has worked on statistical models and software for RNA-seq quantification from subcellular fractions in her MSc project. She is currently undertaking a Ph.D. in Bayesian Network analysis of social science data at the University of St Andrews within Dr. V Anne Smith's lab.

Presentation Slides

The Capacity of Territories to Integrate the Innovations of Electric Battery and Hydrogen Mobility

Presentation Video

About the Presenter

I am Gilles Voiron, a research engineer and Geo Data Scientist, an expert in electric mobility. My job is to carry out studies based on precise data to help and advise businesses and communities in the ecosystem of electric mobility. The studies carried out considering local specifications allow lessons to be learned to propose recommendations for today and to plan the needs in the years to come through geo-prospective models on the horizon 2025-2030-2035.

Presentation Materials

Using Bayesian Networks for Environmental Health Risk Assessment

Presented at the 9th Annual BayesiaLab Conference on October 13, 2021.

Abstract

Bayesian networks are a very powerful tool to better understand the links between the living environment of the population and its health. The study reported here investigated the potential relationships between air pollution, socio-economy, and proven pathologies (e.g., respiratory, cardiovascular) within an industrial area in Southern France (Etang de Berre), gathering steel industries, oil refineries, shipping, road traffic, and experiencing a Mediterranean climate. A total of 178 variables were simultaneously integrated within a Bayesian model at an intra-urban scale. Unsupervised and supervised algorithms (maximum spanning tree, tree-augmented naive classifier), as well as sensitivity analyses, were used to better understand the links between all variables and highlighted correlations between population exposure to air pollutants and some pathologies. Adverse health effects (bronchus and lung cancers for 15–65 years old people) were observed for hydrofluoric acid at low background concentration (<0.003 μg m−3) while exposure to particulate cadmium (0.210–0.250 μg m−3) disrupts insulin metabolism for people over 65 years-old leading to diabetes. Bronchus and lung cancers for people over 65 years old occurred at low background SO2 concentration (6 μg m−3) below European limit values. When benzo[k]fluoranthene exceeded 0.672 μg m−3, we observed a high number of hospital admissions for respiratory diseases for 15-65 years-old people. The study also revealed the important influence of socio-economy (e.g., single-parent family, people with no qualification at 15 years old) on pathologies (e.g., cardiovascular diseases). Finally, diffuse polychlorinated biphenyl (PCB) pollution was observed in the study area and can potentially cause lung cancers.

Presentation Video

About the Presenter

Sandra Pérez, Ph.D. UMR ESPACE 7300 98, Bd E. Herriot BP 3209 06204 Nice Cedex France sandra.perez@univ-cotedazur.fr

Sandra is an associate professor in geography at the University of Cote d'Azur. She has been conducting research in environmental health for 15 years, more specifically on the pathogenic potential of geographic spaces.

Business Valuation Using Bayesian Networks

Abstract

Given wide plausible value ranges, the greatest value that a business valuation expert offers a client may be the ability to persuade others (e.g., judges) to locate their preponderance of probabilities (evidence) across the client’s interval within the plausible value range. Accomplishing this feat is a function of technical valuation expertise, as well as communication tools and techniques. This presentation explores Bayesian networks as a platform for facilitating the probabilistic estimation, negotiation, and communication of business value.

Presentation Video

Presentation Slides

About the Presenter

Kurt S. Schulzke, JD, CPA, CFE Associate Professor of Accounting & Law University of North Georgia kurt.schulzke@ung.edu

Kurt Schulzke, JD, CPA, CFE, teaches forensic accounting and audit analytics at the University of North Georgia. He has published on revenue recognition, materiality, expert witnessing, economic damages, and business valuation through a Bayesian networks lens in a variety of outlets, including the Columbia Journal of Transnational Law, Vanderbilt Journal of Transnational Law, Journal of Forensic Accounting Research, Tennessee Journal of Business Law, and The Value Examiner. With an M.S. in Applied Statistics from Kennesaw State University, he is equally adept as counsel, expert witness, or neutral in valuation-related matters.

Previous Conference Presentations

A Bayesian Network Model to Predict Outcomes in Acute Myocarditis Patients

Presented at the 9th Annual BayesiaLab Conference on October 13, 2021.

Abstract

Background

Acute myocarditis is an inflammation of the myocardium. It can occur at any age and has no typical presentation. Myocarditis is also an important cause of morbidity and mortality. Moreover, there is no current prognosis score existing.

Purpose

The objective of this research was to predict and quantify the risk of cardiovascular events defined by extracorporeal membrane oxygenation (ECMO), heart transplant, or death in acute myocarditis patients.

Methods

Using the AMPHIBIA registry, we developed a Bayesian network model to create a prognostic score. This Score quantifies the probability for a patient to reach our composite endpoint. We decided to exclusively retain baseline data while excluding all further data to predict early detection of the event. With the Bayesian theorem, we can draw a representation of our variables' dependencies, namely a Bayesian network. More precisely, a Bayesian network is a directed acyclic graph representing variables by a node and their conditional dependencies by a direct link. To create our predictive model, we first needed to discretize our continuous variables as the Bayesian networks require. Then, we used a supervised learning algorithm: the Markov Blanket. After defining our target variable, the Market Blanket only keeps strongly related variables (p-value << 0.01 in our model), excluding all other variables for the prediction of the event.

Results

Our model shows good performances using only 6 variables from our dataset with an area under the curve of 91% for predicting whether or not the patient will reach the endpoint. With the cross-validation method, the model also performs well, with an area under the curve of 90%. The clinician can directly use the prediction output to classify the patients at their entrance to consider low, medium, and high-risk patients and send them to the appropriate hospital department: standard or intensive care unit. Finally, looking at the posterior probabilities, the patients who will most likely reach the endpoint are women with pre-cardiogenic shock, high NTprobnp, high creatinine, low TP, and no chest pain. Conversely, the patients who won’t reach the endpoint are more often men with chest pain, no cardiogenic shock, high TP, low NT pro-BNP, and low creatinine.

Presentation Video

About the Presenter

My name is Gatien Hubert, and I am a fourth-year medical student at Sorbonne University, France.

At the same time, I have also followed statistics courses, still at Sorbonne University. As part of this course, I have done an internship in the INSERM Department of Cardiology at La Pitié Salpêtrière Hospital, where I enjoyed using Bayesialab to develop my models.

Presentation Slides

Implementation of Bayesian Networks for Modelling Stress Corrosion Cracking

Presented at the 9th Annual BayesiaLab Conference on October 13, 2021.

Abstract

In the oil and gas industry, materials employed in upstream operations and pipelines operate at elevated pressures and temperatures, while exposed to substantial concentrations of corrosive agents such as chlorides (Clˉ), carbon dioxide (CO2), and hydrogen sulfide (H2S). Such aggressive conditions give rise to different failure mechanisms associated with environmentally assisted cracking (EAC); predominately stress corrosion cracking (SCC). This corrosion phenomenon is considered a critical threat for production systems, as it can accelerate the mechanical failure of components due to the combined influence of non-cyclic stresses (i.e., residual, external, or operational), and corrosion-oxidation reactions in a reactive environment.

Due to their high mechanical properties and low corrosion rates, the use of corrosion-resistant alloys (CRAs), such as duplex stainless steel (DSS) alloys, have increasingly been employed to improve the integrity of production equipment and transportation facilities. However, the performance of DSS in the hydrocarbon recovery is not well documented in industry standards, such that the operating limits of DSS are often perceived as overly conservative. Thus, the susceptibility of DSS alloys to SCC is actively being investigated to optimize the material selection processes, and thereby ensure the reliability of hydrocarbon production systems. Nonetheless, despite the numerous investigations devoted to describing the SCC phenomenon, a thorough understanding of SCC mechanisms remains elusive.

Given the stochastic nature of failures by corrosion phenomena and the numerous factors involved in SCC, we focus on the implementation of Bayesian networks (BNs) to establish an explanatory framework for the SCC of DSS in downhole environments. Here, we report the initial stage of a BN model, where we exploit the advantage of BNs to reconcile various sources of information sources (e.g., literature relevant to SCC, results from atomistic simulations, experimental data) within one overall framework.

Additionally, we present the use of machine learning (ML) techniques that assisted in the elucidation of the BN configuration. To treat the uncertainty in our dataset due to unobserved data points, we employ advanced data imputation methods (e.g., tree-based models), together with the expectation and maximization (EM) algorithm, and entropy-based models in BayesiaLab. Future plans will also be detailed for which this BN model is intended to predict both the damage stages of DSS, and safe operating thresholds in downhole environments.

Presentation Video

About the Presenter

Abraham Rojas Zuniga M.Phil., Ph.D. Candidate Faculty of Science and Engineering Western Australian School of Mines and Curtin Corrosion Centre, Curtin University abraham.rojaszuniga@research.uwa.edu.au

I am a Petroleum Engineer with five years of industry experience. I am particularly interested in the application of various simulation techniques, from deterministic to data-driven approaches, to study different phenomena associated with material science and risk assessment.

Presentation Materials

Using Bayesian Networks to Prevent Breakdowns on Hydraulic Systems of Crawler Excavators

Abstract

In the public works environment, avoiding the breakdowns of construction machines is a major challenge. Indeed, this phenomenon can represent a significant economic cost at three different levels. First, we need to pay for the repair of the machine, which is called the direct cost. Then, a breakdown will eventually lead to a delay in the progress of the work or to the need to rent another machine to replace the one that is unavailable, all this representing the indirect cost. And finally, a breakdown can also affect the lifetime of a machine, and optimizing this lifetime is a priority when handling a fleet of public works equipment.

In order to reduce the number of breakdowns, our goal is to develop a system of predictive maintenance, comparably to what can be used in the industry, using the telematic data that the machines produce.

Presentation Video

About the Presenter

I studied for five years (2015-2020) at INSA Rennes (engineering school) in the Department of Applied Mathematics. After an internship (2019) and a one-year work-study contract (2019-2020), I am now working full-time as a data scientist at CHARIER, a public works company operating mainly in the West of France.

Apply BBN Models to Identify Consumer Persona in High-Dimensional Parameter Space

Presented at the 9th Annual BayesiaLab Conference on October 13, 2021.

Abstract

It is a challenge to cluster and segment data in high-dimensional space. Traditional clustering methods relying on distance (e.g., k-means) or density (DBScan) generally fail to identify meaningful clusters in high dimensional space. We investigated clustering methods in high-dimensional space using Bayesian Belief Network (BBN) models, k-means, Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN), Polytomous variable Latent Class Analysis (PoLCA), and Profile Regression. These methods were used to cluster a set of users and prospective users of Setlist Beauty, which is a digital iPhone App owned by P&G. There are around 500 variables to describe these users. We found that the BBN model performs very well in high-dimensional clustering. Most importantly, it provides metrics to inform us what variables/questions can differentiate consumers and what answers to these questions characterize a consumer segment. These segments and metrics helped deliver actionable insights for targeted advertisement, acquisition, and App feature improvement, etc.

Presentation Video

About the Presenter

Dr. Yong Zhang leverages Bayesian data and modeling science to develop strategies for product design, manufacturing, storage, and transportation across P&G to improve consumers’ life quality and drive positive influence on the environment and society. He develops first principle and data science/machine learning methods and tools through Front End Innovation projects to enable and promote the capability across P&G for breakthrough consumer understanding and product innovation. The methods and tools can be used to extract and integrate information from a variety of data sources to find a “Body of Evidence” for consumer and product research based on Nonparametric Bayesian statistics and deep learning algorithms.

Presentation Materials

Bayesian Belief Network Analysis of Computer Network Traffic Supporting Cybersystem Protection

Presented at the 9th Annual BayesiaLab Conference on October 14, 2021.

Abstract

Comprehension of computer network traffic structure is an important part of geo-intelligence information technology’s quest to safeguard computer systems that support important areas of research and defense. Of crucial importance is the need to understand the structure of computer network traffic patterns, which are crucial to building complex algorithms that defend against computer system intrusion and attack. Bayesian belief networks and machine learning are applied to open-source computer network traffic data to develop algorithms relevant to exhuming latent patterns and modeling computer server state changes. In particular, manifold learning and Bayesian statistical methods are applied to a multidimensional data set to explore whether a two-tier analytical approach based on statistical characterization and modeling is appropriate. Preliminary statistical analysis shows which server sites from a ten-dimensional array experience a high probability of attack. Results also reveal a pattern where certain computer server sites are connected, which in turn provides a guide as to where cyber security resources should be placed to support computer network health. The structural simplicity of the developed algorithmic array offers a rigorous but flexible methodology applicable to a variety of cyber defense systems.

Presentation Video

About the Authors

Nicholas V. Scott, Ph.D. Riverside Research Institute Open Innovation Center 2640 Hibiscus Way Beavercreek, OH 45431 nscott@riversideresearch.org

Dr. Nicholas Scott is a modeling scientist and physical oceanographer and has been a member of the professional staff at Riverside Research in Dayton, OH, since October 2012. He investigates the applicability of traditional and non-traditional signal and image processing techniques to extracting information from remotely sensed imagery. This includes hyperspectral and multispectral imagery. His present work includes statistical modeling of geo-intelligence information, sensor array time series analysis of environmental data, and applying pattern recognition techniques to turbulent flow imagery and numerically simulated data. He is also involved in applying probabilistic graphical modeling algorithms for information fusion and statistical inference.

Jack McCarthy Duke University, Dept. of Statistical Science jack.mccarthy@duke.edu

Previous Conference Presentations

Spatio-temporal Multicomponent Optimal Learning State Estimation of Direct Numerically Simulated Turbulent Features: A Smart Sensing Approach (Laval Virtual World, 2020)
Bayesian Structural Field Analysis (Durham, 2019)
Bayesian Network Modeling of Imagery Features From Direct Numerically Simulated Turbulent Sediment-Laden Oscillatory Flow (Chicago, 2018)

Cybersecurity Risk Assessment with Bayesian Networks

Presented at the 9th Annual BayesiaLab Conference on October 14, 2021.

Abstract

Risk assessment is challenging when data is unavailable, hard to obtain, or costly to process. Organizations often request estimates from experts instead. This talk demonstrates how to integrate cybersecurity data with expert estimates using Bayesian Networks. Cybersecurity analysts, resource managers, and executives can use Bayesian Network models to perform risk assessments, select security controls, and prioritize which suspicious events to investigate first. System administrators can configure autonomous sources of data including vulnerability scanners and cybersecurity event monitoring systems to automatically update these hybrid network models alongside inputs from risk analysts and executives.

Presentation Video

Presentation Materials

About the Presenter

Corey Neskey Vice President, Quantitative Risk Hive Systems corey.neskey@hivesystems.io

Corey has been providing analyses, architecting secure environments, and leading security program implementations in IT security and risk since 2011. His career started with informing executive decision-making using algebraic data analyses for explanation, simulation, and attribution (i.e., intelligence analysis, forensics, SOC, CIRT), and optimization. His toolset expanded to more descriptive and predictive methods (i.e., machine learning/AI for risk assessment, vulnerability prioritization, event correlation). He is now developing skills for integrating these analytical areas and expanding beyond algebraic methods and static probability calculus to using Bayesian network models.

Genes, Publicly Available Databases, and Bayesian Networks

A Strategic Approach to Probabilistic Networks in Poultry and Stress

Presented at the 9th Annual BayesiaLab Conference on October 14, 2021.

Abstract

Exploring publicly available genetic data repositories, such as Gene Expression Omnibus or Array Express, represents a great possibility to collect data previously published and get a deeper insight into a particular field of genetics. In the field of poultry genetics, experimental designs evaluate only a relatively small number of birds per study, requiring the combination of multiple sources into one bigger dataset for further analysis, focusing on one variable of interest, such as stress. Bayesian networks are a useful tool to overcome this challenge, as they can deal with uncertainty and noise resulting from different experimental designs, discovering relationships that are not necessarily linear. Therefore, our goal was to identify genes associated with stress in chickens, invoking an approach to Bayesian networks that involved the identification of genes of interest, the reduction of the dimensionality, followed by the learning of the structure of the consensus Bayesian network. Initially, genes identified in a previously published study were extracted from two other datasets with a similar experimental design. Our dataset consisted of 50 chickens, 101 genes and their expression values, and the stress condition. As the number of genes was rather too large to apply Bayesian networks algorithms directly, a supervised Naïve Bayes algorithm was implemented. The top 10 genes that contributed the most to the stress condition were used to learn the structure of the Bayesian network by the software Banjo to search for the best consensus network. Our results showed that all genes, as well as the condition, were included in the overall structure of the consensus network, indicating that all were interconnected. Interestingly, WNT7A, the gene that contributed the most to the condition according to Naïve Bayes, was found in close association with it in the network. Additionally, HSPH1 also displayed a relationship with the condition. The discovery of these two genes could be further explored in future studies as genes related to stress resistance or stress resilience with the aim of improving the welfare of chickens bred under commercial environments.

Presentation Video

Presentation Materials

About the Presenter

Emiliano Ariel Videla Rodríguez School of Biology University of St Andrews St Andrews, Fife KY16 9TH United Kingdom

Using Bayesian Networks to Map Winter Habitat for Mountain Goats in Coastal British Columbia

Presented at the 9th Annual BayesiaLab Conference on October 14, 2021.

Abstract

Mountain goats are an iconic wildlife species of western North America, inhabiting steep and largely inaccessible terrain in remote areas. But they are also at risk from genetic isolation, climate change, and a variety of other stressors. Managing populations is challenging because mountain goats are difficult and expensive to inventory, and biologists have to rely on models to predict the species’ abundance and distribution. I used landscape characteristics evident at point locations of mountain goat observations, along with an equal number of random locations, to learn the structure and parameters of a Bayesian network that predicted the suitability of habitats for mountain goats. I then used the model to process evidence scenario files of >100 million records to map the suitability of mountain goat habitat at a 25-m resolution throughout the study area. The model has subsequently been used to assess the effectiveness of current protected areas for mountain goats and to generate preliminary population estimates. Modeling the system as a Bayesian network provided a number of advantages over traditional parametric approaches because, as with many ecological studies, input variables were correlated, and animals exhibited non-linear responses to landscape conditions.

Presentation Video

Presentation Slides

About the Presenter

Steven F. Wilson, Ph.D. EcoLogic Research 302-99 Chapel Street Nanaimo, BC V9R 5H3 Canada steven.wilson@ecologicresearch.ca

Steve Wilson has 30 years of experience working at technical and professional levels in strategic and operational planning for wildlife and other ecological values. He specializes in quantitative approaches to decision support and policy analysis. Steve holds a Ph.D. in wildlife ecology from the University of British Columbia in Vancouver.

Previous Conference Presentations

Characterizing Background Metal Concentration in Soils from Southeastern U.S. Cities

Presented at the 9th Annual BayesiaLab Conference on October 14, 2021.

Abstract

Understanding the background metal concentrations of soils is important for setting remedial goals at polluted sites. To better understand urban background concentrations for contaminated site remediation and risk assessment, personnel from Region 4 at the U.S. Environmental Protection Agency led a collection and analysis effort for urban soils in five states of the southeastern U.S. Each of the cities within these states had 50 samples collected from randomly chosen grid cells with additional qualifying criteria for within-grid cell sampling. Seven cities in these five states were included in the current Bayesian network analysis (Gainesville, FL; Lexington, KY; Louisville, KY; Raleigh, NC; Winston-Salem, NC; Columbia, SC; and Memphis, TN). Chemical concentration data frequently contain analyzed values that are considered non-detected data. These data are often assumed to have a potential concentration that ranges from 0 to the method detection limit of the analysis. Preliminary work examined the influence of substitution for case file usage on discretization thresholds for these non-detected data. The final metals chosen for analysis and other urban site measurement data were condensed into a single case file with each case representing one sampling site with columns for concentrations of metals, coordinates, land use, nearby emission sources, city, and state information for each sampling site. Data clustering with expectation-maximization was used to create a new factor variable with cluster states based on the metals data from all cities. Relationships between the identified metals concentration clusters and nodes from the case file that were excluded from the clustering analysis (cities, nearby emission sources, and land use) were also examined. These analyses explored the relationship of different sampling site characteristics with the metals clusters through sensitivity analyses and probability distribution changes. Data clustering analysis can be useful for interpreting and exploring background metals concentration sampling data for urban regions.

EPA Disclaimer: The views expressed in this presentation are those of the authors and do not necessarily represent the views or policies of the U.S. Environmental Protection Agency.

Authors

John F. Carriger U.S. Environmental Protection Agency, Office of Research and Development, Center for Environmental Solutions and Emergency Response, Land Remediation Technology Division, Environmental Decision Analytics Branch, Cincinnati, OH

Robert G. Ford U.S. Environmental Protection Agency, Office of Research and Development, Center for Environmental Solutions and Emergency Response, Land Remediation Technology Division, Contaminated Sites and Sediments Branch, Cincinnati, OH

Tim Frederick Sydney Chan U.S. Environmental Protection Agency, Region 4, Superfund and Emergency Management Division, Resource and Scientific Integrity Branch, Scientific Support Section, Atlanta, GA

Yuen-Chang Fung Tetra Tech, Inc.

About the Presenter

John Carriger is a researcher with the U.S. Environmental Protection Agency’s Office of Research and Development. John received his Ph.D. in Marine Science from the College of William & Mary in 2009. His research interests are developing and applying causal modeling, decision analysis, and risk assessment tools to diverse environmental problems. John lives and works in Cincinnati, OH, USA.

Noise — A Flaw in Human Judgment

Abstract

Imagine that two doctors in the same city give different diagnoses to identical patients—or that two judges in the same courthouse give markedly different sentences to people who have committed the same crime. Suppose that different interviewers at the same firm make different decisions about indistinguishable job applicants—or that when a company is handling customer complaints, the resolution depends on who happens to answer the phone. Now imagine that the same doctor, the same judge, the same interviewer, or the same customer service agent makes different decisions depending on whether it is morning or afternoon, or Monday rather than Wednesday. These are examples of noise: variability in judgments that should be identical.

In Noise, Daniel Kahneman, Olivier Sibony, and Cass R. Sunstein show the detrimental effects of noise in many fields, including medicine, law, economic forecasting, forensic science, bail, child protection, strategy, performance reviews, and personnel selection. Wherever there is judgment, there is noise. Yet, most of the time, individuals and organizations alike are unaware of it. They neglect noise. With a few simple remedies, people can reduce both noise and bias, and so make far better decisions.

Packed with original ideas, and offering the same kinds of research-based insights that made Thinking, Fast and Slow and Nudge groundbreaking New York Times bestsellers, Noise explains how and why humans are so susceptible to noise in judgment—and what we can do about it.

Presentation Video

About the Keynote Speaker

Olivier Sibony is a professor, writer, and advisor specializing in the quality of strategic thinking and the design of decision processes. Olivier teaches Strategy, Decision Making, and Problem Solving at HEC Paris. He is also an Associate Fellow of Saïd Business School at Oxford University.

Before he was a professor, Olivier spent 25 years with McKinsey & Company in France and in the U.S., where he was a Senior Partner. There, he was, at various times, a leader of the Global Strategy Practice and of the Consumer Goods & Retail Sector.

Olivier’s research interests focus on improving the quality of decision-making by reducing the impact of behavioral biases. He is the author of articles in various publications, including “Before You Make That Big Decision,” co-authored with Nobel Prize winner Daniel Kahneman, which was selected as the cover feature of Harvard Business Review’s book selection of “10 Must-Reads on Making Smart Decisions”. In French, he also authored a book, Réapprendre à Décider.

Olivier builds on this research and on his experience to advise senior leaders on strategic and operational decision-making. He is a frequent keynote speaker and facilitator of senior management and supervisory board meetings. He also serves as a member of corporate, advisory, and investment boards.

Olivier Sibony is a graduate of HEC Paris and holds a Ph.D. from Université Paris-Dauphine.

BayesiaLab Independence of Causal Influence Models

Presented at the 9th Annual BayesiaLab Conference on October 15, 2021.

Abstract

One of the most challenging tasks when building Bayesian networks with a group of subject matter experts is the parametrization of the model, i.e., the quantification of the probabilistic relationships. Indeed, with the default tabular representation of Conditional Probability Distributions (CPDs), the number of probabilities to be elicited grows exponentially with respect to the number of parent nodes.

As a result, there is not only a problem regarding the time it takes to fill in these large Conditional Probability Tables (CPTs) but also a problem of consistency between the elicited probabilities.

Even though formulas offer an effective way to fill CPTs, it is not really possible to use them "live" in a brainstorming session because they can seem quite complex and abstract to experts. An elegant alternative approach is determining whether the model requires considering all interactions between the parent nodes. If not, we can "divorce" the parent nodes and effectively represent their Independent Causal Influence (ICI) on the child node.

BayesiaLab 10 includes a new tool for automatically generating different types of ICI models. Experts can choose the Combination Function (Or, And, Max, Min, Sum, Vote, Average) and then elicit Local Causal Effects using simple counterfactual questions.

ICI models are not only useful for modeling expert knowledge but can also be useful in the context of Supervised Learning. The model described above is indeed a Bayesian Regression Model. When data is available, Bayesian Parameter Updating can be utilized to automatically estimate the Local Effects.

Presentation Video

About the Presenter

Dr. Lionel Jouffe is co-founder and CEO of France-based Bayesia S.A.S. Lionel holds a Ph.D. in Computer Science from the University of Rennes and has worked in Artificial Intelligence since the early 1990s. While working as a Professor/Researcher at ESIEA, Lionel started exploring the potential of Bayesian networks.
After co-founding Bayesia in 2001, he and his team have been working full-time on the development of BayesiaLab. Since then, BayesiaLab has emerged as the leading software package for knowledge discovery, data mining, and knowledge modeling using Bayesian networks. It enjoys broad acceptance in academic communities, business, and industry.

Supervised Elicitation Model for Causal Analysis of Companies' Performance

Presented at the 9th Annual BayesiaLab Conference on October 15, 2021.

Abstract

We propose a decision support system that introduces “supervised elicitation,” an approach in machine learning and AI for elicitation practices. Thanks to a semi-automatic initialization of the causal analysis process, it alleviates the domain experts' workload and shortens the duration of iterative analysis, producing a disruptive innovation.

Supervised elicitation involves BayesiaLab earlier in the process, coupled with complementary methods borrowed from network science. Iteratively applying the method to a dataset of about 700 variables, we retained 100 decisive variables elicited for causal analysis.

The team IQ&AI implemented supervised elicitation for a multinational company willing to obtain an accurate and global insight into its performance factors from high-dimensional and sparse data sets.

Presentation Video

About the Presenters

Joel Païn has held managing positions all along his 25+ year career. More specifically, he created and managed two firms (Evaneo and Up & Up) and has been the CEO of several companies and investment firms (Positive Planet, CroiSens, SPA, FinanCités…). He has also acquired extensive experience in strategy consulting and restructuring: he has led many strategy analysis, strategy, and restructuring consulting assignments (with Deloitte, EY, and Up & Up). On these occasions, he has had the opportunity to measure the gap between the way consulting firms deliver strategic consultancy and the kind of answers and level of quality of service clients really expect to receive. He is convinced that bridging this gap is an issue, that can be at least partially solved thanks to new methodologies (IQ & AI), based on AI, experts, and Bayesian networks.

In 1984, Christophe Thovex started software programming while studying music in Paris, his first professional career until he turned 30. Since 2000, he worked as a consultant, analyst-programmer, engineer in industrial information systems, before getting involved in network science with a Ph.D. thesis (2009-12), 3 years after it was recognized by the US Research Council (2006). He has delivered numerous analyses, software codes, support services, reports, and research outcomes for various SMEs, large companies, French institutions, and territorial authorities – e.g., MAIF, Alstom Marine, Keolis, Bouygues Telecom, Bonduelle, APEC, or Rennes Métropole. As the main author of about 30 scientific publications since 2010, he still collaborates with the French “Conseil National pour la Recherche Scientifique” (CNRS) and to program committees/editorial boards for international conferences and journals (IEEE/ACM).

With more than 20 years of experience as a statistician and information system analyst, Emmanuel Keita is passionate about building bridges between expertise and data analysis, IQ, and AI, and therefore, about BayesiaLab! AI Associate Senior Consultant for Aveyo Consulting (Aveyo.fr), Emmanuel loves popularizing the advantages and fallacies of AI to a large audience (managers and general public) and also giving conferences and lectures to (future) data scientists. Committed to the societal issues of data science, Emmanuel is a National Defense Auditor (France, Prime Minister) and is currently involved in a private blockchain project (Digital seals, Avkee.com).

Presentation Materials

Automatic Generation of Bayesian Network Simulators from Financial Texts

Presented at the 9th Annual BayesiaLab Conference on October 15, 2021.

Abstract

Using Natural Language Understanding (NLU) on millions of texts and optimization, we automatically generate Bayesian Networks centered around financial targets such as the “USA inflation” or “ExxonMobil profits” with strong predictive capabilities. These Bayesian Networks are then loaded into the BayesiaLab simulation tool to enable the testing of various hypotheses on the future evolution of any of the drivers of the specified target. These capabilities can be applied to publicly listed companies, commodities, or macroeconomic indicators.

Presentation Video

Presenters

Dr. Olav Laudy, Chief Data Scientist, Causality Link Dr. Pierre Haren, CEO, Causality Link

2021 BayesiaLab Conference — A Zoom Virtual Event

The 9th Annual BayesiaLab Conference

Conference Presentations

BayesiaLab 10 Update

Presentation Video

About the Presenter

See Also

Previous Conference Presentations

Long-Term Effects of Marketing using Bayesian Networks

Abstract

Presentation Video

About the Presenters

Presentation Slides

Previous Presentations by Course5 Intelligence

Learning a Bayesian Structure to Model Entrepreneurial Intentions Towards Business Creation

Abstract

Presentation Video

About the Presenter

Presentation Materials

Predicting the “Cracking” Behavior of Road Networks in UAE with Bayesian Belief Networks

Abstract

Presentation Video

About the Presenter

Presentation Materials

Explainable Bayesian Networks for Transport Policy

Abstract

Presentation Video

About the Presenter

Presentation Materials

Previous Presentations

Treatment of Missing Data in Bayesian Structural Learning: A Simulation Study for Social Science

Abstract

Presentation Video

About the Presenters

Presentation Slides

The Capacity of Territories to Integrate the Innovations of Electric Battery and Hydrogen Mobility

Presentation Video

About the Presenter

Presentation Materials

Using Bayesian Networks for Environmental Health Risk Assessment

Abstract

Presentation Video

About the Presenter

Business Valuation Using Bayesian Networks

Abstract

Presentation Video

Presentation Slides

About the Presenter

Previous Conference Presentations

A Bayesian Network Model to Predict Outcomes in Acute Myocarditis Patients

Abstract

Background

Purpose

Methods

Results

Presentation Video

About the Presenter

Presentation Slides

Implementation of Bayesian Networks for Modelling Stress Corrosion Cracking

Abstract

Presentation Video

About the Presenter

Presentation Materials

Using Bayesian Networks to Prevent Breakdowns on Hydraulic Systems of Crawler Excavators

Abstract

Presentation Video

About the Presenter

Apply BBN Models to Identify Consumer Persona in High-Dimensional Parameter Space

Abstract

Presentation Video

About the Presenter

Presentation Materials

Bayesian Belief Network Analysis of Computer Network Traffic Supporting Cybersystem Protection

Abstract

Presentation Video

About the Authors

Previous Conference Presentations

Cybersecurity Risk Assessment with Bayesian Networks

Abstract

Presentation Video