resources – AI Resources

CPSC 5380: Big Data Systems: Trends and Challenges

Today’s Internet-scale applications and cloud services generate massive amounts of data. At the same time, the availability of inexpensive storage has made it possible for these services and applications to collect and store every piece of data they generate, in the hopes of improving their services by analyzing the collected data. This introduces interesting new opportunities and challenges in designing systems for collecting, analyzing, and serving the so-called big data. This course looks at technology trends that have paved the way for big data applications, surveys state-of-the-art systems for storage and processing of big data, and considers future research directions driven by open research problems. Our discussions span topics such as cluster architecture, big data analytics stacks, scheduling and resource management, batch and stream analytics, graph processing, ML/AI frameworks, and serverless platforms and disaggregated architectures.

AI/ML, Statistics/Data Science

CPSC 5710: Trustworthy Deep Learning

Rex (Zhitao) Ying, PhD

In recent years, deep learning has seen applications in many fields, from science and technology to finance, the humanities, and business. However, real-world, high-impact machine learning applications demand more than just model performance. In particular, deep learning models are often required to be “trustworthy,” so that domain experts can trust that the models consistently behave in a way that corresponds to their domain knowledge. For example, medical experts would expect a deep learning diagnosis model to be able to explicitly utilize medical domain knowledge in its prediction; an insurance company would expect a decision on insurance price to be explainable in terms of risk factors; a financial company would expect its fraud detection model to be robust to adversarial attacks; a physicist would expect models to be consistent with the underlying physical laws. This course introduces various fields of trustworthy deep learning, including model robustness, defenses against adversarial attacks, interpretability, explainability, fairness, privacy, domain adaptation, rules, and constraints. The course covers some of these aspects in the context of graph neural networks but also covers many other ML models in general deep learning, natural language processing, and computer vision.

AI/ML, Ethics

S&DS 5720: YData: Data Science for Political Campaigns

Joshua Kalla, PhD

Political campaigns have become increasingly data driven. Data science is used to inform where campaigns compete, which messages they use, how they deliver them, and which voters they target. In this course, we explore how data science is being used to design winning campaigns. Students gain an understanding of what data is available to campaigns, how campaigns use this data to identify supporters, and the use of experiments in campaigns. The course provides students with an introduction to political campaigns, an introduction to data science tools necessary for studying politics, and opportunities to practice the data science skills presented in S&DS 5230.

Statistics/Data Science, Social Sciences

CPSC 5700: Artificial Intelligence

Tesca Fitzgerald, PhD

Introduction to artificial intelligence research, focusing on reasoning and perception. Topics include knowledge representation, predicate calculus, temporal reasoning, vision, robotics, planning, and learning.

AI/ML, Engineering, Physical and Natural Sciences

S&DS 6890: Scientific Machine Learning

Lu Lu, PhD

AI/ML, Statistics/Data Science, Engineering, Medicine/Biomedical Sciences, Physical and Natural Sciences

CPSC 5460: Data and Information Visualization

Holly Rushmeier, MS, PhD

Visualization is a powerful tool for understanding data and concepts. This course provides an introduction to the concepts needed to build new visualization systems, rather than to use existing visualization software. Major topics are abstracting visualization tasks, using visual channels, spatial arrangements of data, navigation in visualization systems, using multiple views, and filtering and aggregating data. Case studies to be considered include a wide range of visualization types and applications in humanities, engineering, science, and social science.

Statistics/Data Science, Engineering, Humanities, Medicine/Biomedical Sciences, Physical and Natural Sciences, Social Sciences

CPSC 5150: Law and Large Language Models

Ruzica Piskac, PhD, Scott J. Shapiro, JD, PhD

This course is intended for computer science and law students interested in how artificial intelligence can be applied to legal reasoning. It combines basic AI theory with practical project work, focusing on using tools like large language models (LLMs) and other AI technologies for tasks common in legal practice. Students learn how to automate case summarization, draft legal memos and briefs, simulate oral arguments for better argumentation skills, and assist in the preparation of pro se motions for self-represented litigants. The course emphasizes hands-on experience, helping students build real-world skills in applying AI in legal settings. Our goal is to bring together students from computer science and from law and pair them in teams. Each team works on a project that automates a specific aspect of the legal process or legal reasoning, focusing on practical, real-world applications. In addition to all standard course requirements, graduate students need to present a recent, relevant research paper in class.

AI/ML, Humanities

CPSC 5860: Probabilistic Machine Learning

Andre Wibisono, MA, MEng, PhD

This course provides an overview of the probabilistic frameworks for machine learning applications. The course covers probabilistic generative models, learning and inference, algorithms for sampling, and a survey of generative diffusion models. It studies the theoretical analysis of these problems and how to design algorithms to solve them. It also familiarizes students with techniques and results in the literature and prepares them for research in machine learning.

AI/ML, Statistics/Data Science

S&DS 5650: Introductory Machine Learning

John Lafferty, PhD

This course covers the key ideas and techniques in machine learning without the use of advanced mathematics. Basic methodology and relevant concepts are presented in lectures, including the intuition behind the methods. Assignments give students hands-on experience with the methods on different types of data. Topics include linear regression and classification, tree-based methods, clustering, topic models, word embeddings, recurrent neural networks, dictionary learning, and deep learning. Examples come from a variety of sources including political speeches, archives of scientific articles, real estate listings, natural images, and others. Programming is central to the course and is based on the Python programming language.

AI/ML, Statistics/Data Science

CPSC 5810: Introduction to Machine Learning

Alex Wong, MS, PhD

This course focuses on fundamental topics in machine learning. We begin with an overview of different components of machine learning and types of learning paradigms. We introduce a linear function, discuss how one can train a linear function on a given dataset, and utilize it to tackle classification and regression problems. We then consider kernel methods to enable us to solve nonlinear problems. Additionally, we introduce the concept of generalization error and overfitting. We discuss the role of regularization and extend linear regression to ridge regression. We also cover topics in optimization, beginning from gradient descent and extending it to stochastic gradient descent and its momentum variant. We also cover the concept of alternating optimization and topics within it. We introduce the curse of dimensionality and discuss topics on dimensionality reduction. Finally, we conclude the course with neural networks: how to build them using the topics discussed, how to optimize them, and how to apply them to solve a range of machine learning tasks.

AI/ML, Statistics/Data Science, Physical and Natural Sciences

CPSC 7760: Topics in Industrial AI Applications

Xiuye (Sue) Chen, PhD

This seminar aims to familiarize students with cutting-edge topics in industrial AI research and their practical applications. We will explore a broad range of topics such as large language models, image generation, ML/AI systems considerations, autonomous vehicles, robotics, recommender systems, ambient intelligence, and AI applications in the life sciences and healthcare. Most sessions will be devoted to in-depth discussions of one to two key papers on modern AI applications. We will also feature a series of industry guest speakers, providing students with the opportunity to learn directly from practicing experts. In this seminar, students are expected to present papers, actively participate in class discussions, and work either individually or in groups on a final project that emphasizes the practical implementation of AI techniques. Students should be familiar enough with ML/AI concepts to read academic papers, and comfortable with programming to run open source code in the ML/AI space.

AI/ML, Medicine/Biomedical Sciences, Engineering

CPSC 5830: Deep Learning on Graph-Structured Data

Rex (Zhitao) Ying, PhD

Graph structure emerges in many important domain applications, including but not limited to computer vision, natural sciences, social networks, languages, and knowledge graphs. This course offers an introduction to deep learning algorithms applied to such graph-structured data. The first part of the course is an introduction to representation learning for graphs and covers common techniques in the field, including distributed node embeddings, graph neural networks, deep graph generative models, and non-Euclidean embeddings. The first part also touches upon topics of real-world significance, including auto-ML and explainability for graph learning. The second part of the course covers important applications of graph machine learning. We learn ways to model data as graphs and apply graph learning techniques to problems in domains including online recommender systems, knowledge graphs, biological networks, physical simulations, and graph mining. The course covers many deep learning techniques (graph neural networks, graph deep generative models) tailored to graph structures. Basic deep learning tutorials are also covered in this course.

AI/ML, Statistics/Data Science, Medicine/Biomedical Sciences, Physical and Natural Sciences

MGT 853: AI Strategy & Marketing

Vineet Kumar, PhD

Artificial Intelligence is a general-purpose technology that has the potential to transform many aspects of business and society. In business, the impact ranges from commonplace predictive improvements at one end of the spectrum to opportunities for creating entirely new markets at the other. As background, the course will briefly introduce students to Artificial Intelligence / Machine Learning methods comprising Unsupervised, Supervised, and Reinforcement Learning. Through a combination of lectures and case studies, we will evaluate how to integrate AI into decision making, and examine the strategic choices facing companies developing and using AI / ML technologies. We will evaluate how both consumers and decision-makers evaluate decisions made by AI systems, and the feasibility of explainable AI. The course will also examine issues at the intersection of AI and society, including fairness and bias, that are proving to be especially challenging. Note: This is a new course currently under development, so no syllabus is available yet. The syllabus will be posted to Canvas and the professor’s website when it becomes available during the spring-1 term.

AI/ML, Management/Business

MGT 575: Generative AI and Social Media

Tauhid Zaman, PhD, MEng

This course equips students with the tools and techniques of generative AI, focusing on its transformative applications for social media analysis and content creation. Emphasizing practical, hands-on learning, the curriculum trains students to leverage AI for analyzing, designing, and optimizing social media strategies. Key topics include:

1. Building social media apps for sentiment analysis, influencer identification, and audience segmentation
2. Harnessing generative AI to craft compelling text and visual content tailored to specific audiences
3. Automating and optimizing social media campaigns to boost engagement and impact

Students will work extensively with advanced AI tools such as ChatGPT, gaining experience in analysis, content generation, and app development. Course assignments and projects are grounded in real-world social media datasets, culminating in a group project where students will create a social media application powered by generative AI. As a fully project-based course, there are no exams. The assignments and projects are designed to be accessible to all students, regardless of prior coding experience, making it an ideal opportunity to develop expertise in applying generative AI to the dynamic field of social media.

AI/ML, Social Sciences

MGT 899: Generative AI & Entrepreneurship

Anand Ranganathan, PhD

The advent of Generative AI has revolutionized industries by enabling the creation of new content, solutions, and business models. In this course, we will explore how entrepreneurs can harness the power of Generative AI to innovate, build scalable ventures, and drive competitive advantage. Through a blend of theory, hands-on work, and market analysis, students will learn how to leverage AI to develop innovative products and build AI-driven businesses. Our course delves into the foundational principles of constructing, deploying, and managing Generative AI systems in real-world scenarios. We’ll explore widely used concepts, techniques, and frameworks, such as prompt engineering, working with external structured and unstructured datasets, knowledge extraction, agentic workflows, multimedia search and generation, code generation, chatbots, etc. Additionally, we’ll delve into various aspects of LLMOps, with a particular emphasis on metrics for evaluating Generative AI systems. We’ll also explore certain strategies for enhancing performance and accuracy, such as fine-tuning and Graph-RAG. Finally, we’ll analyze regulatory and ethical considerations in using AI for business. In each lecture, we will go through the concepts, techniques, and frameworks, followed by an analysis of entrepreneurial opportunities in the space. This will include a review of some sample companies in the space. We will explore the opportunity for startups to disrupt different industries and technology spaces, while also examining the risk that startups themselves will be disrupted by bigger players. By the end of the course, students will have a wide view of the landscape of Gen AI companies and the opportunities and challenges that exist.

AI/ML, Statistics/Data Science, Management/Business

BIS 568: Applied Artificial Intelligence in Healthcare

Wade Schulz, MD, PhD

Recent advances in machine learning (ML) offer tremendous promise to improve the care of patients. However, few ML applications are currently deployed within healthcare institutions and even fewer provide real value. This course is designed to empower students to overcome common pitfalls in bringing ML to the bedside and aims to provide a holistic approach to the complexities and nuances of ML in the healthcare space. The class focuses on key steps of model development and implementation centered on real-world applications. Students apply what they learn from the lectures, assignments, and readings to identify salient healthcare problems and tackle their solutions through end-to-end data engineering pipelines.

Students are expected to be proficient in programming (R, Python, or Julia preferred) and have some prior experience in machine learning, including data preprocessing (e.g., Python-Pandas, R-Tidyverse) and the development and validation of ML models (e.g., logistic regression, random forest, XGBoost). Otherwise, permission of the instructor is required.

AI/ML, Medicine/Biomedical Sciences

BIS 565: The Role of Ethics and Equity in Data Science and AI

Bhramar Mukherjee, PhD

With the explosion of conversational generative artificial intelligence (AI) tools, such as ChatGPT, Gemini, Llama, Claude Sonnet, DeepSeek, and many others, innovations in data science are greatly influencing the day-to-day decision making of the public, including decisions regarding health, well-being, prevention, treatment, and care. This new course thematically belongs to the intersectional field of critical data studies, data science, and public health. Critical data studies is an interdisciplinary field focused on the social, cultural, ethical, and epistemological aspects of data. We first define some of the fundamental technical terms and tools in modern data science, machine learning (ML), and AI such as random forest, neural networks, transformer, auto-encoder, embeddings, stable diffusion process, large language models/foundation models, reinforcement learning, and prediction-powered inference. We then introduce the notion of data equity and data ethics in broad philosophical terms, grounded in a theoretical framework that appeals to a set of key underlying principles, drawing primarily from the extant computer science and statistical/epidemiological literature. We introduce these core concepts and associated evaluation metrics. The discussed concepts include: fairness, accountability, transparency, ethics, privacy, governance, reflexivity, reproducibility, generalizability, representativeness, causality, confounding bias, selection bias, and information bias. The course consists of lectures, homework, paper presentations, discussion sessions, and a final project that involves critical appraisal of an open-source AI/data science tool or prediction model in terms of the principles taught in the course.

AI/ML, Statistics/Data Science, Ethics, Medicine/Biomedical Sciences, Public Health/Biostatistics, Social Sciences

Yale Summer Course in Public Health Modeling

Virginia Pitzer, ScD

The Yale School of Public Health’s Summer Course in Public Health Modeling is an exciting opportunity to learn to understand and implement the latest techniques from distinguished Yale faculty and network with an international group of public health researchers.

The course is designed to provide researchers, clinicians, industry professionals, and policymakers with the systems-based perspective and analytic tools they need to better understand and manage the complex forces that drive the health of populations. Course topics include prediction and control of infectious disease outbreaks such as COVID-19, optimal decision-making in healthcare delivery, and designing interventions to mitigate the effects of drug overdoses.

Course instructors are Yale faculty experts in epidemiology, biostatistics, health policy, and health care operations who have been at the forefront of informing model-guided responses to COVID-19 and other disease threats locally, nationally, and around the world.

AI/ML, Statistics/Data Science, Ethics, Medicine/Biomedical Sciences, Physical and Natural Sciences, Public Health/Biostatistics

PUBH 580: Seminar for Modeling in Public Health

A. David Paltiel, MBA, PhD, Virginia Pitzer, ScD

This yearlong, monthly seminar is targeted most specifically to students in the Public Health Modeling Concentration but open to all interested members of the Yale community. The seminar features talks by faculty from across Yale University doing modeling-related research, as well as invited speakers from other universities and public health agencies. The objectives are to offer students the opportunity to witness the scope and range of questions in public health policy and practice that may be addressed, understood, and informed using model-based approaches; appreciate the breadth of public health modeling research being conducted around the University and beyond; explore possible collaborations/relationships with other scholars and professionals; review, critique, and evaluate model-based public health research in a structured environment; and form their own opinions regarding the applicability, relevance, and responsible use of modeling methods. Two terms of this no-credit seminar are required of students in the Public Health Modeling Concentration. For each class, one or two readings are circulated/posted on the course website prior to the talk. Students are encouraged to read the articles and articulate questions for the speaker.

AI/ML, Statistics/Data Science, Medicine/Biomedical Sciences, Public Health/Biostatistics

Courses & Educational Programs

CPSC 5380: Big Data Systems: Trends and Challenges

CPSC 5710: Trustworthy Deep Learning

S&DS 5720: YData: Data Science for Political Campaigns

CPSC 5700: Artificial Intelligence

S&DS 6890: Scientific Machine Learning

CPSC 5460: Data and Information Visualization

CPSC 5150: Law and Large Language Models

CPSC 5860: Probabilistic Machine Learning

S&DS 5650: Introductory Machine Learning

CPSC 5520: Deep Learning Theory and Applications

CPSC 7520: Biomedical Data Science: Mining and Modeling

CPSC 5810: Introduction to Machine Learning

CPSC 7760: Topics in Industrial AI Applications

CPSC 5830: Deep Learning on Graph-Structured Data

S&DS 5170: Applied Machine Learning and Causal Inference

S&DS 5230: YData: An Introduction to Data Science

S&DS 6650: Intermediate Machine Learning

SOCY 5670: AI in Social Science Methods

Peter Salovey and Marta Moret Data Science Fellows Program

MGT 695: Intro to AI Applications

MGT 802: Large Language Models

MGT 554: AI for Business Decisions

MGT 853: AI Strategy & Marketing

MGT 860: Generative AI for Managers

MGT 575: Generative AI and Social Media

MGT 819: Data Science

MGT 899: Generative AI & Entrepreneurship

AI in Medicine Student Interest Group Monthly Seminar

Yale University and Boehringer Ingelheim Biomedical Data Science Fellowship Program

NIH/NLM Biomedical Informatics and Data Science Training Program

BIS 555: Machine Learning with Biomedical Data

EMD 538: Quantitative Methods for Infectious Disease Epidemiology

BIS 568: Applied Artificial Intelligence in Healthcare

BIS 565: The Role of Ethics and Equity in Data Science and AI

Yale Summer Course in Public Health Modeling

Quantitative Methods for Infectious Disease - Time Series Analysis

Ordinary Differential Equations (ODE) in R

Course Materials for Bootstrap Learning Python with R

BIS 557: How to Create an R Package

PUBH 580: Seminar for Modeling in Public Health

Getting started with Git in RStudio

Introduction to R Lab

Teaching Workshops and Journal Clubs for Faculty

AI at Yale Symposium

Learn About AI

Virtual AI Brown Bag Series