BIOKDD '12 Workshop


Bioinformatics is the science of managing, mining, and interpreting information from biological data. Various genome projects have contributed to an exponential growth in DNA and protein sequence databases. Advances in high-throughput technology such as microarrays and mass spectrometry have further created the fields of functional genomics and proteomics, in which one can monitor quantitatively the presence of multiple genes, proteins, metabolites, and compounds in a given biological state. The ongoing influx of these data, the inherent uncertainties in data collection processes, and the gap between data collection and knowledge curation have collectively created exciting opportunities for data mining researchers.

The past two decades have witnessed rapid technological advances in biological data collection and acquisition. These advances in biotechnology have enabled interrogation of cellular systems at various levels, leading to the generation and collection of large-scale biological data (mostly in public databases) at an exponential rate. The explosion of biological data is leading to a paradigm shift in research methods in the life sciences, from hypothesis-driven research to data-driven research. In the last decade, sophisticated algorithms for knowledge discovery and data mining have demonstrated great promise in extracting novel biological information from complex, heterogeneous, and very high-dimensional biological datasets.

While tremendous progress has been made over the years, many of the fundamental problems in bioinformatics, such as protein structure prediction, gene-environment interaction, and regulatory pathway mapping, are still open. Data mining will play an essential role in understanding these fundamental problems and in developing novel therapeutic and diagnostic solutions in post-genome medicine.

Workshop History (2001-2012)

Data Mining approaches seem ideally suited for Bioinformatics, since they are data-driven and do not require a comprehensive theory of life's organization at the molecular level. The extensive databases of biological information create both challenges and opportunities for developing novel KDD methods. To highlight these avenues we organized the Workshops on Data Mining in Bioinformatics (BIOKDD 2001-2012), held annually or biannually in conjunction with the ACM SIGKDD Conference. This will be the 11th year for the workshop.

Past workshops attracted 50-100 participants from academia, industry, and government labs, underscoring the surge of interest in this exciting and rapidly expanding field. Each workshop program included 10-11 contributed papers and 1-2 invited talks. Information on past workshops is available at the following web pages:

Call for Papers

Over the last ten years, BIOKDD has established a tradition of providing a platform for the presentation and discussion of advances in data mining techniques that primarily target biological data. BIOKDD 2012 will target submissions on analyzing a broad range of biological, biochemical, and clinical datasets. The data of interest include “omic” datasets (genomic, transcriptomic, proteomic, metabolomic, interactomic), biochemical datasets, and clinical datasets (ranging from physiological measurements to free text). Papers that integrate multiple types of data to extract novel information will also be of great interest to the workshop. The topics of interest, classified according to data type, include the following:

Genomics/Sequence Analysis:

  • Analysis of next-generation sequencing data
  • Phylogenetics and comparative genomics
  • Genome-wide association studies and genetic interactions
  • Motif finding
  • Population genetics, haplotype analysis

Transcriptomics/Functional Genomics:

  • Analysis of DNA microarray and RNA-seq data
  • RNAi and microRNA analysis
  • Biomarker discovery
  • RNA structure prediction

Proteomics:

  • Computational proteomics
  • Prediction of protein structure and interactions
  • Identification of drug targets
  • Post-translational regulation of proteins

Metabolomics:

  • Analysis and comparison of metabolic networks and pathways
  • Flux balance analysis

Systems Biology/Interactomics:

  • Analysis of protein interaction networks
  • Genetic regulatory networks, DNA-protein interactions
  • Modeling of biological systems
  • Mining of gene-environment, gene-drug, drug-drug interactions

Chemoinformatics:

  • Drug discovery
  • Virtual screening
  • Combinatorial chemistry

Health Informatics/Translational Science:

  • Knowledge discovery in clinical data and electronic medical records
  • Privacy and security issues in mining health databases
  • Integration of biological and clinical datasets
  • Mining of neurological and physiological data

Data Mining Methodologies:

  • Statistical learning methods in bioinformatics
  • Biomedical text mining
  • Biological data management techniques
  • Semantic webs and ontology-driven biological data integration methods

Papers should be at most 10 pages long, single-spaced, in font size 10 or larger, with one-inch margins on all sides. Papers should be submitted in PDF/PS format through EasyChair at the following link:

For examples of the camera-ready format, refer to papers in previous BIOKDD proceedings (e.g., BIOKDD10).

Important Dates

Refer to the BIOKDD fan site on Facebook for updates to the following dates. (The panel on the left side of this page shows real-time workshop updates.)

5/15/2012      Paper Submission Due (deadline extended from 5/7/2012)
6/06/2012      Notification of Acceptance
6/15/2012      Camera-ready Paper Due
8/12/2012      Workshop Presentation


All papers will be published in the workshop proceedings and in the ACM Digital Library.

Submission of accepted papers. For accepted workshop papers, we require that each camera-ready paper be formatted strictly according to the official ACM Proceedings Format. Please submit a PDF file only. To prepare the camera-ready PDF file, you may use either the Microsoft Word template or the LaTeX file preparation instructions found here. All final camera-ready submissions must be accompanied by a completed digital copy (a scanned copy is acceptable) of the ACM copyright transfer form; otherwise the paper cannot be included in the final workshop proceedings.

Publication of proceedings and expanded papers. A selection of accepted papers will also be invited for submission to a special issue of IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB). Each paper submitted to the special issue should contain "a sufficient amount of new material" relative to the workshop version, by TCBB's (and IEEE's) rules as specified here:
These papers will go through further review before acceptance for publication in TCBB.

Program Overview

  • Duration: 1 HALF DAY
  • Location: BIOKDD '12 will be held in conjunction with ACM KDD-2012, at a hotel in Beijing, China. The following is the contact information for the hotel:
  • TBD
    Beijing, China
    Tel: +86 (10)

  • Research Papers: Approximately 5 research papers will be accepted for oral presentation. Each paper will be peer reviewed by at least three members of the program committee. Papers accepted as "full papers" will have 25 minutes and those accepted as "short papers" will have 15 minutes (including oral presentation and questions and answers). A copy of the proceedings will be posted later.
  • Plenary Speaker:
    • Wei Wang, PhD, Professor, University of North Carolina, Chapel Hill.



  1. A special BIOKDD workshop registration is required for each accepted paper in addition to conference registration. The fee covers hospitality and administrative expenses related to the organization of the workshop. The registration fee is $60 for each workshop paper presenter. Those who do not present a BIOKDD workshop paper do not need to pay this registration fee. BIOKDD registration will open after June 1st, 2012.
  2. The KDD-2012 conference has a separate and mandatory registration process. If you register with the conference, you can get printed proceedings for a nominal fee directly at the conference registration desk.

Workshop publication fees can be paid through Google Checkout.


8:30-8:35: Opening Remarks

8:35-9:30: Keynote presentation. Mining Genetic Interactions in Genome‐Wide Association Study
Prof. Wei Wang (University of North Carolina)

Session I (9:30 am – 10:30 am)

9:30-9:50 Detecting Protein Complexes from Noisy Protein Interaction Data
Dmitry Efimov, Nazar Zaki, Jose Berengueres

9:50-10:10 Globalized Bipartite Local Model for Drug‐Target Interaction Prediction
Jian-Ping Mei, Chee-Keong Kwoh, Peng Yang, Xiao-Li Li, Jie Zheng

10:10-10:30 2D Similarity Kernels for Biological Sequence Classification
Pavel P. Kuksa

10:30-11:00 Coffee Break

Session II (11:00 am – 11:40 am)

11:00-11:20 Learning to Extract Chemical Names based on Random Text Generation and Incomplete Dictionary
Su Yan, W. Scott Spangler, Ying Chen

11:20-11:40 Biomedical Text Categorization with Concept Graph Representations Using a Controlled Vocabulary
Meenakshi Mishra, Jun Huan, Said Bleik, Min Song

11:40-11:50: Closing Remarks

Workshop Proceedings

  • BIOKDD '12 electronic proceedings will be made available online here.


General Chairs

Mohammed Zaki, Ph.D.
Department of Computer Science
Rensselaer Polytechnic Institute
Troy, NY 12180

Jake Y. Chen, Ph.D.
Indiana University School of Informatics
Purdue University School of Science Department of Computer & Information Science
Indiana Center for Systems Biology and Personalized Medicine
Indianapolis, IN 46202

Program Chairs

Tamer Kahveci, Ph.D.
Department of Computer and Information Science and Engineering
University of Florida
Gainesville, FL 32611-6125

Saeed Salem, Ph.D.
Department of Computer Science     
North Dakota State University
Fargo, ND 58102

Mehmet Koyuturk, Ph.D.
Department of Electrical Engineering & Computer Science
Center for Proteomics & Bioinformatics      
Case Western Reserve University
Cleveland, OH 44106

Program Committee

Asif Javed    Genome Institute of Singapore
Chris Bailey-Kellogg    Dartmouth College
Jinbo Xu    Toyota Technological Institute at Chicago
Yufeng Wu    University of Connecticut
Xiang Zhang    Case Western Reserve University
Ranadip Pal    Texas Tech University
Saad Sheikh    University of Florida
Saurabh Sinha    UIUC
T.M. Murali    Virginia Tech
Tolga Can    METU
Xiaoning Qian    University of South Florida
Anne Denton    NDSU
Mohammad Al Hasan    IUPUI
Jun (Luke) Huan    University of Kansas
Mukul Bansal    MIT
Fahad Saeed    NIH
Vipin Kumar    University of Minnesota
Aidong Zhang    University at Buffalo
Nirmalya Bandyopadhyay    Johns Hopkins University
Jie Zheng    Nanyang Technological University, Singapore
Dongxiao Zhu    Wayne State University

Reference Book

Announcing a new 733-page BIOKDD book
"Biological Data Mining"
Edited by Jake Y. Chen and Stefano Lonardi
Published by Chapman & Hall/CRC Press (Sept 2009).