In conjunction with the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'13)

12th International Workshop on Data Mining in Bioinformatics (BIOKDD '13)
August 11, 2013 * Chicago, IL, USA

BIOKDD '13 Workshop



Bioinformatics is the science of managing, mining, and interpreting information from biological data. Various genome projects have contributed to an exponential growth in DNA and protein sequence databases. Rapid advances in high-throughput technologies, such as microarrays, mass spectrometry, and new/next-generation sequencing, make it possible to quantitatively monitor the presence or activity of thousands of genes, RNAs, proteins, metabolites, and compounds in a given biological state. The ongoing influx of these data, the pressing need to address complex biomedical challenges, and the gap between the two have collectively created exciting opportunities for data mining researchers.

While tremendous progress has been made over the years, many of the fundamental problems in bioinformatics, such as protein structure prediction, gene-environment interaction, and regulatory network mapping, have not been convincingly addressed. In addition, new technologies such as next-generation sequencing are producing massive amounts of sequence data; managing, mining, and compressing these data raise challenging issues. Finally, there is a pressing need to use these data and computational techniques to build network models of complex biological processes and disease phenotypes. Data mining will play an essential role in addressing these fundamental problems and in developing novel therapeutic and diagnostic solutions in the post-genomics era of medicine.

Workshop History (2001-2012)

Data Mining approaches seem ideally suited for Bioinformatics, since they are data-driven and do not require a comprehensive theory of life's organization at the molecular level. The extensive databases of biological information create both challenges and opportunities for developing novel KDD methods. To highlight these avenues we organized the Workshops on Data Mining in Bioinformatics (BIOKDD 2001-2013), held annually or biannually in conjunction with the ACM SIGKDD Conference. This will be the 12th year for the workshop.

Past workshops attracted 50-100 participants from academia, industry, and government labs, underscoring the surge of interest in this exciting and rapidly expanding field. The program of each workshop included 10-11 contributed papers and 1-2 invited talks. Information on past workshops is available at the following web pages:

Call for Papers

The goal of this workshop is to encourage KDD researchers to take on the numerous challenges that Bioinformatics offers. This year, the workshop will feature the theme "Building network and predictive models of biological processes and diseases using complex data". This field combines computational approaches, especially from data mining and machine learning, with the large amount and variety of biological data being generated. The goal is to build accurate predictive or descriptive network models of biological processes and diseases. These approaches have revolutionized modern biology by enabling novel discoveries in basic biology and in diseases like cancer and diabetes, as well as the development of therapeutics.

We encourage papers that propose novel data mining techniques for areas including but not limited to:

  • Building predictive models for complex phenotypes from large-scale biological data
  • Discovering biological networks and pathways underlying biological processes and diseases
  • Processing of new/next-generation sequencing (NGS) data for genome structural variation analysis, discovery of biomarkers and mutations, and disease risk assessment
  • Discovery of genotype-phenotype associations
  • Novel methods and frameworks for mining and integrating big biological data
  • Comparative genomics
  • Metagenome analysis using sequencing data
  • RNA-seq and microarray-based gene expression analysis
  • Genome-wide analysis of non-coding RNAs
  • Genome-wide regulatory motif discovery
  • Structural bioinformatics
  • Correlating NGS with proteomics data analysis
  • Functional annotation of genes and proteins
  • Chemo-informatics: Drug discovery, Virtual screening and Combinatorial chemistry
  • Knowledge discovery in clinical data and electronic medical records
  • Special biological data management techniques
  • Information visualization techniques for biological data
  • Semantic webs and ontology-driven biological data integration methods
  • Privacy and security issues in mining genomic and health databases

Papers should be at most 9 pages long, single-spaced, in font size 10 or larger, with one-inch margins on all sides. Using the ACM Proceedings Format is highly recommended. Papers should be submitted in PDF format through CMT at the following link:

For examples of the camera-ready format, refer to papers in previous BIOKDD proceedings (e.g., BIOKDD10).

Program Overview

  • Program

    9:00-9:10 AM      Welcome remarks      BioKDD'13 organizers
    9:10-9:35 AM      Heuristic Approaches for Time-Lagged Biclustering      Joana Gonçalves and Sara Madeira
    9:35-10:00 AM      Drug-Target Interaction Prediction for Drug Repurposing with Probabilistic Similarity Logic      Shobeir Fakhraei, Louiqa Raschid and Lise Getoor
    10:00-10:25 AM      Computational phenotype prediction of ionizing-radiation-resistant bacteria with a multiple-instance learning model      Sabeur Aridhi, Haitham Sghaier, Mondher Maddouri and Engelbert Mephu Nguifo
    10:25-11:00 AM      Coffee break
    11:00 AM-Noon      Keynote talk      Eric Schadt, Icahn School of Medicine at Mount Sinai
    Noon-1:30 PM      Lunch      On your own
    1:30-1:55 PM      Signal Detection in Genome Sequences Using Complexity Based Features      Mehdi Kargar, Aijun An, Nick Cercone, Kayvan Tirdad and Morteza Zihayat
    1:55-2:20 PM      A Fast and Scalable Clustering-based Approach for Constructing Reliable Radiation Hybrid Maps      Raed I. Seetan, Anne M. Denton, Omar Al-Azzam, Ajay Kumar, M. Javed Iqbal and Shahryar F. Kianian
    2:20-2:45 PM      Mining Spatially Cohesive Itemsets in Protein Molecular Structures      Cheng Zhou, Pieter Meysman, Boris Cule, Kris Laukens and Bart Goethals
    2:45-3:30 PM      Invited talk 1: State-of-the-art in protein function prediction      Predrag Radivojac, Indiana University
    3:30-4:00 PM      Coffee break
    4:00-4:25 PM      MFMS: Maximal frequent module set mining from multiple human gene expression data sets      Saeed Salem and Cagri Ozcaglar
    4:25-5:10 PM      Invited talk 2: Systems Biology of Cellular Aging and Age-Related Degeneracies      Ananth Grama, Purdue University
    5:10-5:15 PM      Closing remarks      BioKDD'13 organizers

  • Location: BIOKDD '13 will be held in conjunction with ACM KDD-2013 at the Chicago Sheraton in Chicago, USA. The following is the contact information for the hotel:
  • Chicago Sheraton
    301 East North Water Street
    Chicago, IL, USA
    Tel: +1 (312) 464-1000

  • Plenary Speaker: Prof. Eric Schadt, Ph.D., Chair, Department of Genetics and Genomic Sciences and Director, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, USA, will deliver the plenary address titled "Building predictive models of disease" (abstract below) at the workshop. He will discuss his cutting-edge work in genomics and systems biology, both computational and experimental, which is expected to be of great interest to the BIOKDD audience. His work has received extensive media coverage (e.g., NY Times and CBS News).

    Plenary address abstract: The causal chain of events that lead to the development of complex diseases such as schizophrenia remains elusive. Such diseases are complex, resulting from the interplay of potentially hundreds (or thousands) of genetic loci and environmental factors. Genetic and environmental perturbations induce changes in the molecular interactions of cellular pathways whose collective effect may become clear through the organized structure of multiscale biological networks. We have developed a novel systems approach to study psychiatric disorders such as schizophrenia that models the global molecular, functional, and structural changes in the affected brain that in turn can lead us to the root causes of the disease. To characterize the molecular, cellular, and physiological systems associated with common human diseases, we constructed gene regulatory networks, functional and structural MRI-based networks, and high-content phenotypic networks, and then integrated these network models across all of the data modalities generated across multiple human cohorts comprising several thousand individuals. Because DNA variation was systematically assessed across all cohorts, it provides a common set of perturbations that can be leveraged not only to infer causal relationships among different molecular and higher-order traits, but also to help link networks at different scales (e.g., molecular and imaging) across cohorts. Through this integrative network-based approach, we rank-order the resulting network structures for relevance to different diseases, highlighting both known and novel biological pathways involved in disease pathogenesis and progression. We demonstrate that the causal network structures we construct from this big data integration exercise are useful predictors of response to gene perturbations and present a novel framework to test models of the mechanisms underlying disease. We further demonstrate that our approach can offer novel insights for drug discovery programs aimed at treating disease by screening our disease-associated networks against molecular signatures induced by marketed and novel compounds across a number of cell-based systems, including those derived from stem cells isolated from patients with disease.

  • Invited talks: BIOKDD'13 will feature invited talks by prominent researchers in computational biology and data mining:

    Prof. Predrag Radivojac, Indiana University, will deliver a talk titled "State-of-the-art in protein function prediction". His summary of the talk: In this talk I will first describe the significance and computational problem formulation of protein function prediction. I will then present details of the first Critical Assessment of Functional Annotation (CAFA) experiment, where we evaluated the state of the art in the field. We provided evidence that modern methods significantly outperform simple BLAST alignments but that there is significant need and room for improvement. I will lay out possible avenues for improvements and accuracy assessment of function prediction proposed by my research group. Finally, I will briefly discuss the CAFA 2013-2014 challenge, whose start is anticipated for Summer 2013.

    Ananth Grama, Purdue University, will deliver a talk titled "Systems Biology of Cellular Aging and Age-Related Degeneracies". His summary of the talk: Cellular aging is a multi-factorial complex phenotype, characterized by the accumulation of damaged cellular components over the organism's life-span. The progression of aging depends on both the increasing rate of damage to DNA, RNA, proteins, and cellular organelles, and the gradual decline of the cellular defense mechanisms against stress. This can ultimately lead to a dysfunctional cell, with a higher risk factor for a number of diseases, including cancers, cardiovascular disease, and multiple neurodegenerative disorders. With a view to uncovering the pathways associated with aging, and their role in age-related degeneracies, we have developed a number of algorithms and statistical models that integrate and analyze disparate data over human and yeast interactomes. In this talk, we present two recent results: (i) we demonstrate the use of directed random walks in uncovering the downstream effectors of Target of Rapamycin (TOR), a highly conserved protein kinase that plays a key role in the aging process of various organisms; and (ii) we build tissue-specific networks for human cells and develop a complete framework for projecting these tissue-specific networks on to the yeast interactome. The goals of this effort are many-fold -- strong alignments indicate tissues for which yeast is a good model organism (in terms of underlying biochemistry), alignments reveal specific pathways that are well conserved, and they serve as a first step in understanding the etiology of age-related degeneracies.

Important Dates

Refer to the BIOKDD fan site on Facebook for updates to the following dates:

May 22nd, 2013      Paper Submission Due
June 25th, 2013      Notification of Acceptance
July 5th, 2013 (5 PM EST)      Camera-ready Paper Due
August 11th, 2013      Workshop Presentation


All papers will be published in the workshop proceedings and in the ACM Digital Library.

Submission of accepted papers. For accepted workshop papers, we require that each camera-ready paper be formatted strictly according to the official ACM Proceedings Format. Please submit a PDF file only. To prepare the camera-ready PDF file, you may use either the Microsoft Word template or the LaTeX file preparation instructions. All final camera-ready submissions must be accompanied by a completed digital copy (a scanned copy is acceptable) of the ACM copyright transfer form, or else the paper cannot be included in the final workshop proceedings.

Publication of proceedings and expanded papers. Authors of a selection of accepted papers will also be invited to submit to a special section of the reputed IEEE Transactions on Computational Biology and Bioinformatics. Each paper submitted to the special issue should contain "a sufficient amount of new material" relative to the workshop version. These papers will go through further review before acceptance for publication in the special issue.


  1. A special BIOKDD workshop registration is required for each accepted paper in addition to conference registration. The fee covers hospitality and administrative expenses related to the successful organization of the workshop. The registration fee is $60 for each workshop paper presenter. Those who do not present a BIOKDD workshop paper are not required to pay this fee. Complete the BIOKDD registration using the Google Checkout link below.
  2. The KDD-2013 conference has a separate and mandatory registration process. If you register with the conference, you can get printed proceedings for a nominal fee directly at the conference registration desk.
Please use the following Google Checkout link to pay the workshop publication fees.


General Chairs

Mohammed Zaki, Ph.D.
Department of Computer Science
Rensselaer Polytechnic Institute
Troy, NY 12180

Jake Y. Chen, Ph.D.
Indiana University School of Informatics
Purdue University School of Science Department of Computer & Information Science
Indiana Center for Systems Biology and Personalized Medicine
Indianapolis, IN 46202

Program Chairs

Gaurav Pandey, Ph.D.
Department of Genetics and Genomic Sciences
Icahn Institute for Genomics and Multiscale Biology
Icahn School of Medicine at Mount Sinai
New York, NY 10029

Huzefa Rangwala, Ph.D.
Department of Computer Science & Engineering     
George Mason University
Fairfax, VA 22030

George Karypis, Ph.D.
Department of Computer Science & Engineering     
University of Minnesota
Minneapolis, MN 55455

Program Committee

William S Noble    University of Washington
Ambuj Singh    University of California, Santa Barbara
Jinbo Xu    Toyota Technological Institute at Chicago
Andrea Tagarelli    University of Calabria, Italy
Asa Ben-Hur    Colorado State University
Bojan Losic    Icahn School of Medicine at Mount Sinai
Chad Myers    University of Minnesota
Chandan K. Reddy    Wayne State University
T.M. Murali    Virginia Tech
Francis Chin    University of Hong Kong
Gang Fang    Icahn School of Medicine at Mount Sinai
Jieping Ye    Arizona State University
Mohammad Al Hasan    IUPUI
Jun (Luke) Huan    University of Kansas
Jinze Liu    University of Kentucky
Tae Hyun Hwang    University of Texas Southwestern Medical Center
Vipin Kumar    University of Minnesota
Mehmet Koyuturk    Case Western Reserve University
Minghua Deng    Peking University, China
Jie Zheng    Nanyang Technological University, Singapore
Naren Ramakrishnan    Virginia Tech
Rui Chang    Icahn School of Medicine at Mount Sinai
Rui Kuang    University of Minnesota
Saeed Salem    North Dakota State University
Tamer Kahveci    University of Florida
Xia Ning    NEC Labs
Xiaohua (Tony) Hu    Drexel University
Sanghamitra Bandyopadhyay    Indian Statistical Institute, Kolkata, India
Ying Ding    Indiana University
Predrag Radivojac    Indiana University
Min Song    New Jersey Institute of Technology
Stefan Kramer    Johannes Gutenberg University Mainz
Vladimir Pavlovic    North Dakota State University
Isidore Rigoutsos    Jefferson University

Reference Book

Announcing a new 733-page BIOKDD book
"Biological Data Mining"
Edited by Jake Y. Chen and Stefano Lonardi
published by Chapman & Hall/CRC Press (Sept 2009).



©2004-2011 Dr. Jake Y. Chen. All Rights Reserved.