Invited Speakers

Amarda Shehu

Professor, Department of Computer Science
Volgenau School of Engineering
George Mason University
Program Director, National Science Foundation

Website: https://cs.gmu.edu/~ashehu/

From Genotypes to Phenotypes: Calling (Deep) Modelers to Step up Their Game

Abstract:

Anfinsen showed us that protein tertiary structure was largely encoded in the amino-acid sequence. This, combined with rapid advances in genome sequencing, held the promise that we would soon know what genes do, particularly genes central to human biology and disorders. Yet, the genotype-to-phenotype problem has proven more challenging than anticipated. A seemingly simple question of how the variation in a protein translates to changes in its phenotypic profile takes a large amount of wet-laboratory time and resources in carefully designed experiments. Computational models have largely proven incapable of stepping up to the challenge. Even AlphaFold2, with its promise of having solved the decades-old protein structure prediction problem, cannot capture fundamental knowledge about structure dynamics and its central role in protein function. In this talk, I will give an overview of work in my laboratory on two instantiations of this problem: linking protein sequence variations to dynamics-governed dysfunction in proteinopathies, and linking chemical and biological space in small molecule generation. I will present various models that integrate data in evolutionary computation-based frameworks, as well as purely data-driven, deep models that learn latent representations. I hope lessons from the successes and failures of my laboratory will inspire us to raise our game and pursue AI frameworks able to address those interesting, complex, and messy questions that molecular biology never fails to provide.

Biography:

Dr. Amarda Shehu is a Professor in the Department of Computer Science in the College of Engineering and Computing with affiliated appointments in the Department of Bioengineering and School of Systems Biology at George Mason University. She is also Co-Director of the Center for Advancing Human-Machine Partnerships (CAHMP), a Transdisciplinary Center for Advanced Study at George Mason University. Shehu obtained her Ph.D. from Rice University in 2008, where she was an NIH predoctoral fellow in the Nanobiology Program and was dually trained in AI and Molecular Biophysics. Shehu's research focuses on AI-driven scientific discoveries that bridge scientific disciplines and advance understanding. In particular, her laboratory has made many contributions in bioinformatics and computational biology regarding the relationship between macromolecular sequence, structure, dynamics, and function. Shehu has published over 150 technical papers with postdoctoral, graduate, undergraduate, and high-school students. She is the recipient of an NSF CAREER Award, and her research is regularly supported by various NSF programs, as well as state and private research awards. Shehu is also the recipient of the 2020 Beck Family Presidential Medal for Faculty Excellence in Research and Scholarship, the 2018 Mason University Teaching Excellence Award, the 2014 Mason Emerging Researcher/Scholar/Creator Award, and the 2013 Mason OSCAR Undergraduate Mentor Excellence Award. She currently serves as Program Director at the National Science Foundation in the Information and Intelligent Systems Division of the Computer and Information Science and Engineering Directorate.

Predrag Radivojac

Professor of Computer Science
Khoury College of Computer Sciences
Northeastern University

Website: https://www.ccs.neu.edu/home/radivojac/

On developing protocols and guidelines for the use of variant pathogenicity predictors in the clinic

Abstract:

The development of machine learning algorithms and tools for variant and genome interpretation has been an active field of research for more than two decades. While significant advances have been made in the research space, translation of these methodologies to the clinic has attracted less attention. Recently, the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG/AMP) jointly proposed guidelines for the interpretation of sequence variants, under which computational tools can be used. While these recommendations were a huge step forward, they are only qualitative in nature, do not fully utilize the potential of computational tools, and unintentionally leave room for their misuse. This presentation will describe the ongoing efforts of the Computational Subgroup of ClinGen's Sequence Variant Interpretation workgroup towards developing procedures and guidelines for the use of computational methods in the clinic. We will discuss the framework for characterizing the contributions of computational tools in the clinic and present an evaluation that can help clinicians make adequate use of the best-performing methods.

Biography:

Predrag Radivojac is a Professor of Computer Science at Northeastern University. He received his Bachelor's and Master's degrees in Electrical Engineering from the University of Novi Sad and University of Belgrade, Serbia. He received his Ph.D. in Computer Science from Temple University (2003) under the direction of Prof. Zoran Obradovic and co-direction of Prof. Keith Dunker. In 2004 he held a post-doctoral position in Keith Dunker's lab at Indiana University School of Medicine, after which he joined Indiana University Bloomington. He moved to Northeastern in 2018. Prof. Radivojac's research is in the areas of computational biology and machine learning with specific interests in protein function, MS/MS proteomics, genome interpretation, and precision health. He received the National Science Foundation (NSF) CAREER Award in 2007 and is an August-Wilhelm Scheer Visiting Professor at Technical University of Munich (TUM) as well as an honorary member of the Institute for Advanced Study at TUM. Prof. Radivojac's projects have been regularly supported by NSF and the National Institutes of Health (NIH). He is currently an Editorial Board member for the journals Bioinformatics and Human Genetics and an Associate Editor for PLoS Computational Biology. He served three elected terms on the Board of Directors of the International Society for Computational Biology (ISCB) between 2012 and 2021.

Marinka Zitnik

Assistant Professor
Department of Biomedical Informatics
Harvard Medical School

Website: https://zitniklab.hms.harvard.edu/

Few-Shot Learning for Network Biology

Abstract:

Prevailing methods for learning on knowledge graphs require abundant label information. However, labeled examples are scarce for problems at the scientific frontier, considerably limiting the methods' use for tasks that require reasoning about new phenomena, such as novel drugs in development, emerging pathogens, and patients with rare diseases. In this talk, I will describe algorithms that enable few-shot learning for network biology. At the core is the notion of local subgraphs that transfer information from one learning task to another, even when each task has only a handful of labeled examples. This principle is theoretically justified as we show that the evidence for a prediction can be found in the local subgraph surrounding target nodes or edges. I will illustrate few-shot learning methods on several problems, including modeling ultra high-order drug combinations and studying proteins across 1,840 species.

Biography:

Marinka Zitnik is an Assistant Professor at Harvard University with appointments in the Department of Biomedical Informatics, Broad Institute of MIT and Harvard, and Harvard Data Science. Her research recently won best paper and research awards from the International Society for Computational Biology, International Conference on Machine Learning, the Bayer Early Excellence in Science Award, Amazon Faculty Research Award, Rising Star Award in EECS, and Next Generation Recognition in Biomedicine, being the only young scientist who received such recognition in both EECS and Biomedicine.

Daisuke Kihara

Professor
Department of Biological Sciences
Department of Computer Science
Showalter University Faculty Scholar
Purdue University

Website: https://kiharalab.org

Deep-learning-assisted protein 3D structure modeling for medium-resolution cryo-EM density maps

Abstract:

Significant progress in cryo-electron microscopy (cryo-EM) poses a pressing need for software for structural interpretation of EM maps. In particular, protein structure modeling tools are needed for EM maps of around 4 Å resolution or worse, where building a main-chain structure is challenging. Our group is actively developing computational tools for protein structure modeling for cryo-EM by applying various algorithms, including deep learning. In this presentation, we focus on two such methods from our lab and recent extensions of that development. We have developed a de novo protein main-chain modeling tool named MAINMAST for cryo-EM maps of up to about 4 Å resolution (Nat. Commun., 2018). MAINMAST builds main-chain traces of a protein in an EM map from a minimum spanning tree that is constructed by connecting local high-density points. For maps at a lower resolution, deep learning can be effective for capturing structural information from density maps. The method named Emap2sec (Nat. Methods, 2019; Nat. Commun., 2021) uses a convolutional deep neural network (CNN) to scan an EM map with a 3D voxel and assigns a type of protein structure class, i.e. alpha helix, beta strand, or coil, or nucleotides, from density patterns of the voxel and its neighbors. Recently, we further enhanced the capability of structure modeling by essentially combining these two approaches, using structural features identified by deep learning to guide structure modeling. The approach can also be applied to related useful tasks, such as detecting modeling errors or density map segmentation.
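
As a rough illustration of the minimum-spanning-tree idea mentioned in the abstract, the sketch below connects a handful of 3D points with Prim's algorithm. It is not the MAINMAST implementation: the function name, the use of plain Euclidean distance as the edge weight, and the example coordinates are all assumptions made for illustration, and the step that selects local high-density points from a map is not shown.

```python
import math


def minimum_spanning_tree(points):
    """Return the edges (i, j) of a minimum spanning tree over `points`.

    Hypothetical helper: `points` stands in for local high-density points
    taken from an EM map. Edge weights are plain Euclidean distances,
    which is an assumption for this sketch. Uses Prim's algorithm on the
    complete graph, O(n^2) time.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known link from the tree to each point
    best_parent = [-1] * n       # tree point that provides that link
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the point outside the tree that is closest to the tree.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_parent[u] >= 0:
            edges.append((best_parent[u], u))
        # Update the cheapest links of the remaining points.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_parent[v] = u
    return edges


if __name__ == "__main__":
    # Made-up coordinates, not data from any EM map.
    demo_points = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 0, 0)]
    print(minimum_spanning_tree(demo_points))
```

A tree of this kind links every point using the smallest total edge length, which is why it is a natural starting structure for tracing a chain through nearby density peaks.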

Biography:

Daisuke Kihara is a full professor in the Department of Biological Sciences and the Department of Computer Science at Purdue University, West Lafayette, Indiana. He received a B.S. degree from the University of Tokyo, Japan in 1994, and a Ph.D. degree from Kyoto University, Japan in 1999. After working as a postdoctoral researcher with Prof. Jeffrey Skolnick for about 3 years, he joined Purdue University in 2003. He was promoted to full professor in 2014. He has been working on various topics in protein bioinformatics, including the development of algorithms for protein-protein docking, protein tertiary structure prediction, structure modeling from low-resolution cryo-electron microscopy data, protein function prediction, and computational drug design. He has published over 200 research articles. In 2013, he was named a University Faculty Scholar by Purdue University. In 2021, he was elected a Fellow of the American Institute for Medical and Biological Engineering (AIMBE).

Dong Xu

Shumaker Endowed Professor
Department of EECS and C.S. Bond Life Sciences Center
University of Missouri-Columbia

Website: http://digbio.missouri.edu

Applications of Deep Learning in Single-Cell Sequencing Data Analyses

Abstract:

Single-cell RNA-sequencing (scRNA-Seq) is widely used to reveal the heterogeneity and dynamics of tissues, organisms, and complex diseases, but its analysis still faces multiple challenges, including sequencing sparsity and complex differential patterns in gene expression. We introduce scGNN (single-cell Graph Neural Network) as a hypothesis-free deep learning framework for scRNA-Seq analyses. It integrates three iterative multi-modal autoencoders and outperforms existing tools for gene imputation and cell clustering on four benchmark scRNA-Seq datasets. In an Alzheimer's disease study with 13,214 single nuclei from postmortem brain tissues, scGNN successfully illustrates disease-related neural development and the differential mechanism by identifying ten cell clusters with enriched signature genes. We further developed RESEPT, a deep-learning framework for characterizing and visualizing tissue architecture from spatially resolved transcriptomics by reconstructing and segmenting a transcriptome-mapped RGB image. RESEPT can identify the tissue architecture and accurately represent the corresponding marker genes and biological functions. Both scGNN and RESEPT provide critical insights into the underlying mechanisms driving the complex tissue heterogeneities in development and diseases.

Biography:

Dong Xu is the Paul K. & Dianne Shumaker Endowed Professor in the Department of Electrical Engineering and Computer Science, with appointments in the Christopher S. Bond Life Sciences Center and the Informatics Institute at the University of Missouri-Columbia. He obtained his PhD from the University of Illinois, Urbana-Champaign in 1995 and did two years of postdoctoral work at the US National Cancer Institute. He was a Staff Scientist at Oak Ridge National Laboratory until 2003 before joining the University of Missouri, where he served as Department Chair of Computer Science during 2007-2016 and Director of the Information Technology Program during 2017-2020. Over the past 30 years, he has conducted research in many areas of computational biology and bioinformatics, including single-cell data analysis, protein structure prediction and modeling, protein post-translational modifications, protein localization prediction, computational systems biology, biological information systems, and bioinformatics applications in humans, microbes, and plants. His research since 2012 has focused on the interface between bioinformatics and deep learning. He has published more than 400 papers with more than 19,000 citations and an H-index of 69 according to Google Scholar. He was elected a Fellow of the American Association for the Advancement of Science (AAAS) in 2015 and of the American Institute for Medical and Biological Engineering (AIMBE) in 2020.

Jinbo Xu

Professor
Toyota Technological Institute at Chicago

Website: https://home.ttic.edu/~jinbo/

Protein Structure Prediction by Deep Learning

Abstract:

Accurate description of protein structure and function is a fundamental step towards understanding biological life and is highly relevant to the development of therapeutics. Although greatly improved, experimental protein structure determination is still low-throughput and costly, especially for membrane proteins. As such, researchers often resort to computational structure prediction. Predicting the structure of a protein without detectable homologs in PDB is very challenging and usually needs a large amount of computing power. This talk will present some deep learning methods (deep convolutional residual neural networks and Transformers) that have revolutionized protein structure prediction, and show that even with only a personal computer, deep learning may predict the structure of a protein much more accurately than ever before.

Biography:

Dr. Jinbo Xu is a full professor at the Toyota Technological Institute at Chicago, a computer science research and educational institute affiliated with the University of Chicago. Dr. Xu’s research lies in machine learning, optimization and computational biology. He has developed several popular bioinformatics programs such as the CASP-winning RaptorX (http://raptorx.uchicago.edu) for protein structure prediction and IsoRank/HubRank for biological network analysis. The deep learning method he invented for protein structure prediction has been widely adopted by the community and initiated the revolution in protein structure prediction, for which he was invited to give a keynote talk at the 2019 3DSIG session of ISMB, the largest bioinformatics conference in the world. Dr. Xu is an Associate Editor of Bioinformatics and has received many awards, including an Alfred P. Sloan Research Fellowship, an NSF CAREER award, RECOMB's Test-of-Time award, the RECOMB best paper award 2014, and the PLoS Computational Biology Research Prize.

Zheng Wang

Assistant Professor
Frost Junior Fellow
Department of Computer Science
Department of Biology
Sylvester Comprehensive Cancer Center
University of Miami

Website: http://www.cs.miami.edu/~zwang/

SCL: a lattice-based approach to infer 3D chromosome structures from single-cell Hi-C data

Abstract:

In contrast to population-based Hi-C data, single-cell Hi-C data are zero-inflated and do not indicate the frequency of proximate DNA segments. There are a limited number of computational tools that can model the three-dimensional structures of chromosomes based on single-cell Hi-C data. We developed SCL (Single-Cell Lattice), a computational method to reconstruct three-dimensional (3D) structures of chromosomes based on single-cell Hi-C data. We designed a loss function and a 2D Gaussian function specifically for the characteristics of single-cell Hi-C data. The Gaussian function and a graph network are used to impute missing Hi-C values. A chromosome is represented as beads-on-a-string and stored in a 3D cubic lattice. Metropolis-Hastings simulation and simulated annealing are used to simulate the structure and minimize the loss function. We evaluated the SCL-inferred 3D structures (at both 500 kb and 50 kb resolutions) using multiple criteria and compared them with the ones generated by another modeling software program. The results indicate that the 3D structures generated by SCL closely fit single-cell Hi-C data. We also found similar patterns of trans-chromosomal contact beads, Lamin-B1 enriched topological domains, and H3K4me3 enriched domains by mapping data from previous studies onto the SCL-inferred 3D structures. The C++ source code of SCL is freely available at http://dna.cs.miami.edu/SCL/.
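
To give a feel for the lattice-based optimization described in the abstract, the toy sketch below moves beads of a chain on a 3D cubic lattice with Metropolis acceptance under a geometric cooling schedule. It is not the SCL code (which is C++) and does not reproduce its loss function or imputation step: the loss used here (summed squared distance between contact pairs), the chain constraint, the cooling parameters, and all names are assumptions chosen only to make the example small and runnable.

```python
import math
import random

# The six unit moves on a cubic lattice.
MOVES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def sq_dist(p, q):
    """Squared Euclidean distance between two lattice points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def toy_loss(coords, contacts):
    """Hypothetical loss: beads listed as in contact should sit close together."""
    return sum(sq_dist(coords[i], coords[j]) for i, j in contacts)


def is_valid(coords, k, max_link_sq=3):
    """Assumed constraints: no two beads share a site, and bead k stays
    within sqrt(max_link_sq) of its neighbours along the string."""
    p = coords[k]
    if any(p == q for idx, q in enumerate(coords) if idx != k):
        return False
    for nb in (k - 1, k + 1):
        if 0 <= nb < len(coords) and sq_dist(p, coords[nb]) > max_link_sq:
            return False
    return True


def anneal(coords, contacts, steps=5000, t_start=5.0, t_end=0.05, seed=0):
    """Simulated annealing with Metropolis acceptance; returns the best state seen."""
    rng = random.Random(seed)
    coords = [tuple(c) for c in coords]
    current = toy_loss(coords, contacts)
    best, best_loss = list(coords), current
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        k = rng.randrange(len(coords))
        dx, dy, dz = rng.choice(MOVES)
        old = coords[k]
        coords[k] = (old[0] + dx, old[1] + dy, old[2] + dz)
        if not is_valid(coords, k):
            coords[k] = old  # reject moves that break the chain or overlap
            continue
        proposed = toy_loss(coords, contacts)
        delta = proposed - current
        # Metropolis rule: always accept improvements, sometimes accept worse states.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current = proposed
            if current < best_loss:
                best, best_loss = list(coords), current
        else:
            coords[k] = old
    return best, best_loss


if __name__ == "__main__":
    # Made-up example: a straight chain of 8 beads and two contact pairs.
    start = [(i, 0, 0) for i in range(8)]
    structure, final_loss = anneal(start, contacts=[(0, 7), (2, 5)])
    print(final_loss, structure)
```

Accepting some loss-increasing moves at high temperature lets the chain escape poor local arrangements, while the falling temperature makes the search increasingly greedy toward the end.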

Biography:

Dr. Zheng Wang is an assistant professor in the Department of Computer Science at the University of Miami. His lab's research interests are in bioinformatics, specifically 3D genome analysis, protein structure prediction, protein function prediction, and biological network analysis. He has collaborated with biologists, neuroscientists, and cancer scientists on research problems in the life science and medical fields. Deep learning, optimization, Metropolis-Hastings simulation, hidden Markov models, graph kernels, graph alignment, and graph convolutional networks are representative of the algorithms used in his research.