Bioinformatics is the science of managing, mining, and interpreting information from biological data. Various genome projects have contributed to an exponential growth in DNA and protein sequence databases. Rapid advances in high-throughput technologies, such as microarrays, mass spectrometry and new/next-generation sequencing, make it possible to quantitatively monitor the presence or activity of thousands of genes, RNAs, proteins, metabolites, and compounds in a given biological state. The ongoing influx of these data, the pressing need to address complex biomedical challenges, and the gap between the two have collectively created exciting opportunities for data mining researchers.
While tremendous progress has been made over the years, many of the fundamental problems in bioinformatics, such as protein structure prediction, gene-environment interaction, and regulatory network mapping, have not been convincingly addressed. In addition, new technologies such as next-generation sequencing are producing massive amounts of sequence data; managing, mining and compressing these data raises challenging issues. Finally, there is a pressing need to use these data and computational techniques to build network models of complex biological processes and disease phenotypes. Data mining will play an essential role in addressing these fundamental problems and in developing novel therapeutic and diagnostic solutions in the post-genomics era of medicine.
Workshop History (2001-2012)
Data Mining approaches seem ideally suited for Bioinformatics, since they are data-driven and do not require a comprehensive theory of life's organization at the molecular level. The extensive databases of biological information create both challenges and opportunities for developing novel KDD methods. To highlight these avenues, we organized the Workshops on Data Mining in Bioinformatics (BIOKDD 2001-2013), held annually or biannually in conjunction with the ACM SIGKDD Conference.
This will be the 12th year for the workshop.
Past workshops attracted 50-100 participants from academia, industry and government labs, underscoring the surge of interest in this exciting and rapidly expanding field. The program of each workshop included 10-11 contributed papers and 1-2 invited talks. Information on past workshops is available at the following web pages:
The goal of this workshop is to encourage KDD researchers to take on the numerous challenges that Bioinformatics offers. This year, the workshop will feature the theme "Building network and predictive models of biological processes and diseases using complex data". This field applies computational approaches, especially from data mining and machine learning, to the large amount and variety of biological data being generated, with the aim of building accurate predictive or descriptive network models of biological processes and diseases. These approaches have revolutionized modern biology by enabling novel discoveries in basic biology and in diseases such as cancer and diabetes, as well as the development of therapeutics.
We encourage papers that propose novel data mining techniques for areas including but not limited to:
Building predictive models for complex phenotypes from large-scale biological data
Discovering biological networks and pathways underlying biological processes and diseases
Processing of new/next-generation sequencing (NGS) data for genome structural variation analysis, discovery of biomarkers and mutations, and disease risk assessment
Discovery of genotype-phenotype associations
Novel methods and frameworks for mining and integrating big biological data
Metagenome analysis using sequencing data
RNA-seq and microarray-based gene expression analysis
Genome-wide analysis of non-coding RNAs
Genome-wide regulatory motif discovery
Correlating NGS with proteomics data analysis
Functional annotation of genes and proteins
Chemo-informatics: Drug discovery, Virtual screening and Combinatorial chemistry
Knowledge discovery in clinical data and electronic medical records
Special biological data management techniques
Information visualization techniques for biological data
Semantic webs and ontology-driven biological data integration methods
Privacy and security issues in mining genomic and health databases
Prof. Eric Schadt, Ph.D., Chair, Department of Genetics and Genomic Sciences and Director, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, USA, will deliver the plenary address titled "Building predictive models of disease" (abstract below) at the workshop. He will discuss his cutting-edge computational and experimental work in genomics and systems biology, which is expected to be of great interest to the BIOKDD audience. His work has received extensive media coverage (e.g., NY Times and CBS News).
Plenary address abstract:
The causal chain of events that leads to the development of complex diseases such as schizophrenia remains elusive. Such diseases are complex, resulting from the interplay of potentially hundreds (or thousands) of genetic loci and environmental factors. Genetic and environmental perturbations induce changes in the molecular interactions of cellular pathways whose collective effect may become clear through the organized structure of multiscale biological networks. We have developed a novel systems approach to study psychiatric disorders such as schizophrenia that models the global molecular, functional, and structural changes in the affected brain that in turn can lead us to the root causes of the disease. To characterize the molecular, cellular, and physiological systems associated with common human diseases, we constructed gene regulatory networks, functional and structural MRI-based networks, and high-content phenotypic networks, and then integrated these network models across all of the data modalities generated across multiple human cohorts comprising several thousand individuals. Because DNA variation was systematically assessed across all cohorts, it provides a common set of perturbations that can be leveraged not only to infer causal relationships among different molecular and higher-order traits, but also to help link networks at different scales (e.g., molecular and imaging) across cohorts. Through this integrative network-based approach, we rank-order the resulting network structures for relevance to different diseases, highlighting both known and novel biological pathways involved in disease pathogenesis and progression. We demonstrate that the causal network structures we construct from this big data integration exercise are useful predictors of response to gene perturbations and present a novel framework for testing models of the mechanisms underlying disease.
We further demonstrate that our approach can offer novel insights for drug discovery programs aimed at treating disease by screening our disease-associated networks against molecular signatures induced by marketed and novel compounds across a number of cell-based systems, including those derived from stem cells isolated from patients with disease.
Invited talks: BIOKDD'13 will feature invited talks by prominent researchers in computational biology and data mining:
Prof. Predrag Radivojac, Indiana University, will deliver a talk titled "State-of-the-art in protein function prediction". His summary of the talk: In this talk I will first describe the significance and computational problem formulation of protein function prediction. I will then present details of the first Critical Assessment of Functional Annotation (CAFA) experiment, where we evaluated the state of the art in the field. We provided evidence that modern methods significantly outperform simple BLAST alignments but that there is significant need and room for improvement. I will lay out possible avenues, proposed by my research group, for improving function prediction and its accuracy assessment. Finally, I will briefly discuss the CAFA 2013-2014 challenge, whose start is anticipated for Summer 2013.
Ananth Grama, Purdue University, will deliver a talk titled "Systems Biology of Cellular Aging and Age-Related Degeneracies". His summary of the talk: Cellular aging is a multi-factorial complex phenotype, characterized by the accumulation of damaged cellular components over the organism's life-span. The progression of aging depends on both the increasing rate of damage to DNA, RNA, proteins, and cellular organelles, as well as the gradual decline of the cellular defense mechanisms against stress. This can ultimately lead to a dysfunctional cell, with a higher risk factor for a number of diseases, including cancers, cardiovascular disease, and multiple neurodegenerative disorders. With a view to uncovering the pathways associated with aging, and their role in age-related degeneracies, we have developed a number of algorithms and statistical models that integrate and analyze disparate data over human and yeast interactomes. In this talk, we present two recent results: (i) we demonstrate the use of directed random walks in uncovering the downstream effectors of Target of Rapamycin (TOR), a highly conserved protein kinase that plays a key role in the aging process of various organisms; and (ii) we build tissue-specific networks for human cells and develop a complete framework for projecting these tissue-specific networks on to the yeast interactome. The goals of this effort are many-fold: strong alignments indicate tissues for which yeast is a good model organism (in terms of underlying biochemistry), alignments reveal specific pathways that are well conserved, and they serve as a first step in understanding the etiology of age-related degeneracies.
Refer to the BIOKDD fan page on Facebook (www.facebook.com/Biokdd) for real-time updates to the following dates:
May 22nd, 2013 Paper Submission Due
June 25th, 2013 Notification of Acceptance
July 5th, 2013 (5 PM EST) Camera-ready Paper Due
August 11th, 2013 Workshop Presentation
All papers will be published in the workshop proceedings and in the ACM Digital Library.
Submission of accepted papers. For accepted workshop papers, we require that each camera-ready paper be formatted strictly according to the official ACM Proceedings Format. Please submit a PDF file only. To prepare the camera-ready PDF file, you may use either the Microsoft Word template or the LaTeX file preparation instructions found here. All final camera-ready submissions must be accompanied by a completed digital copy (a scanned copy is acceptable) of the ACM copyright transfer form, or else the paper cannot be included in the final workshop proceedings.
Publication of proceedings and expanded papers. Authors of a selection of accepted papers will also be invited to submit to a special section of the IEEE/ACM Transactions on Computational Biology and Bioinformatics. Each paper submitted to the special issue should contain "a sufficient amount of new material" relative to the workshop version. These papers will go through further review before acceptance for publication in the special issue.
A special BIOKDD workshop registration is required for each accepted paper in addition to conference registration. The fee covers hospitality and administrative expenses related to the organization of the workshop. The registration fee is $60 for each workshop paper presenter. Attendees who are not presenting a BIOKDD workshop paper do not need to pay this fee. Complete the BIOKDD registration using the Google Checkout link below.
The KDD-2013 conference has a separate and mandatory registration process. If you register for the conference, you can get printed proceedings for a nominal fee directly at the conference registration desk.
Please use the following Google Checkout link to pay the workshop publication fees.
Mohammed Zaki, Ph.D.
Department of Computer Science
Rensselaer Polytechnic Institute
Troy, NY 12180
Jake Y. Chen, Ph.D.
Indiana University School of Informatics
Purdue University School of Science Department of Computer & Information Science
Indiana Center for Systems Biology and Personalized Medicine
Indianapolis, IN 46202