BIOKDD '10 Workshop, "Mining BioComplexity: From Molecular Systems to Health"


Bioinformatics is the science of managing, mining, and interpreting information from biological data. Various genome projects have contributed to an exponential growth in DNA and protein sequence databases. Advances in high-throughput technology such as microarrays and mass spectrometry have further created the fields of functional genomics and proteomics, in which one can monitor quantitatively the presence of multiple genes, proteins, metabolites, and compounds in a given biological state. The ongoing influx of these data, the inherent uncertainties in data collection processes, and the gap between data collection and knowledge curation have collectively created exciting opportunities for data mining researchers.

While tremendous progress has been made over the years, many of the fundamental problems in bioinformatics, such as protein structure prediction, gene-environment interaction, and regulatory pathway mapping, are still open. Data mining will play an essential role in understanding these fundamental problems and in developing novel therapeutic and diagnostic solutions in post-genome medicine.

Workshop History (2001-2009)

Data Mining approaches seem ideally suited for Bioinformatics, since they are data-driven and do not require a comprehensive theory of life's organization at the molecular level. The extensive databases of biological information create both challenges and opportunities for developing novel KDD methods. To highlight these avenues we organized the Workshops on Data Mining in Bioinformatics (BIOKDD 2001-2009), held annually or biannually in conjunction with the ACM SIGKDD Conference. This will be the 9th year for the workshop.

Past workshops attracted 50-100 participants from academia, industry, and government labs, underscoring the surge of interest in this exciting and rapidly expanding field. Each workshop program included 10-11 contributed papers and a panel or invited talk.

Information on past workshops is available at:

Call for Papers

The goal of this workshop is to encourage KDD researchers and practitioners to take on the numerous challenges that Bioinformatics offers. This year the workshop will feature the theme of “Mining BioComplexity: From Molecular Systems to Health”. Complex biological systems consist of components that are themselves complex and that interact with one another. Understanding how the various components work in concert, using modern high-throughput biology and data mining methods, is crucial to the ultimate goals of a genome-based economy, such as genome medicine and new agricultural and energy solutions. Building on the study of complex biological systems, health informatics aims to discover novel and useful patterns in large volumes of health care related data and to explore the links between disease physiology and molecular bio-sciences. BIOKDD offers a premier forum for presenting data mining concepts and tools for integrating data from heterogeneous multimedia sources, especially those from new high-throughput experimental data collection technologies, and for expediting knowledge discovery in a wide range of areas including biological, biomedical, pharmaceutical, nursing, clinical care, dentistry, and public health research.

We encourage papers that propose novel data mining techniques for post-genome bioinformatics studies in areas such as:

  • Phylogenetics and comparative genomics
  • DNA microarray data analysis
  • Deep sequencing data analysis
  • RNAi and microRNA analysis
  • Protein/RNA structure prediction
  • Sequence and structural motif finding
  • Modeling of biological networks and pathways
  • Statistical learning methods in bioinformatics
  • Dynamics modeling for biological networks
  • Computational proteomics
  • Computational biomarker discovery
  • Gene-environment, gene-drug, and drug-drug interaction discovery
  • Computer-aided drug discovery
  • Biomedical text mining
  • Biological data management techniques
  • Semantic webs and ontology-driven biological data integration methods
  • Knowledge discovery in electronic medical records
  • Privacy and security issues in mining health databases

Papers should be at most 10 pages long, single-spaced, in font size 10 or larger with one-inch margins on all sides. Papers should be submitted in PDF/PS format through EasyChair at the following link:

For examples of the camera-ready format, refer to papers in previous BIOKDD proceedings (e.g., BIOKDD08).

Important Dates

5/10/2010      Deadline for Submission of Papers (extended from 5/4/2010!)
5/21/2010      Notification of Acceptance; Workshop Registration Open
6/01/2010      Submission of Camera Ready Papers
7/25/2010      Workshop Presentation


All papers will be published in the workshop proceedings and in the ACM Digital Library.

Submission of accepted papers. For accepted workshop papers, we require that each camera-ready paper be formatted strictly according to the official ACM Proceedings Format. Please submit a PDF file only. To prepare the camera-ready PDF file, you may use either the Microsoft Word template or the LaTeX file preparation instructions found here. All final camera-ready submissions must be accompanied by a completed digital copy (a scan is acceptable) of the ACM copyright transfer form; otherwise the paper cannot be included in the final workshop proceedings.

Publication of proceedings and expanded papers. Expanded versions of selected high-quality papers from the workshop will be invited for publication in a special issue of a major bioinformatics/biocomputing journal (in 2007, it was the Journal of Bioinformatics and Computational Biology; in 2008, it was IEEE/ACM Transactions on Computational Biology and Bioinformatics). Details of the journal/book publication will be announced online after the workshop.

Program Overview



  1. Workshop registration is required for each accepted paper. The fee covers hospitality and administrative expenses related to organizing the workshop. The registration fee is $60 for each workshop paper presenter. Attendees who do not present a BIOKDD workshop paper do not need to pay this fee. BIOKDD registration will be open from June 1, 2010 through July 25, 2010.
  2. The KDD-2010 conference has a separate registration process for those interested in the whole conference event. Conference registration ($750), however, is not required for participation in this workshop. If you register for the conference, you can get printed proceedings for a nominal fee directly at the conference registration desk.

To register officially for the workshop, please use the following Google Checkout to pay the fees.


8:25-8:30: Opening Remarks

Session I: Systems Biology

8:30-8:45 Discovery of Error-tolerant Biclusters from Noisy Gene Expression Data

8:45-9:00 Systematic Construction and Analysis of Co-expression Networks for Identification of Functional Modules and cis-regulatory Elements

9:00-9:15 A Fast Markov Blankets Method for Epistatic Interactions Detection in Genome-wide Association Studies

9:15-9:30 Combining Active Learning and Semi-supervised Learning Techniques to Extract Protein Interaction Sentences

9:30-10:30: Keynote Speech

10:30-11:15 Coffee Break & Poster Session

Session II: Proteins and Genes

11:15-11:30 Efficient Motif Finding Algorithms for Large-Alphabet Inputs

11:30-11:45 Planning Combinatorial Disulfide Cross-links for Protein Fold Determination

11:45-12:00 A New Approach for Detecting Bivariate Interactions in High Dimensional Data using Quadratic Discriminant Analysis

12:00-1:00: Panel Discussion


Workshop Proceedings

  • BIOKDD '10 electronic proceedings will be made available online here.


Program Chairs

Jun (Luke) Huan
Department of Electrical Engineering and Computer Science     
University of Kansas
Lawrence, KS, 66047-7621

Web site:

Jake Y. Chen
Indiana University School of Informatics
Purdue University School of Science Department of Computer & Information Science
Indiana Center for Systems Biology and Personalized Medicine
Indianapolis, IN 46202

Web site:

Mohammed Zaki
Department of Computer Science
Rensselaer Polytechnic Institute
Troy, NY 12180

Web site:

Program Committee (additional members being invited)

Chris Bailey-Kellogg

Dartmouth College

Xue-wen Chen

University of Kansas

Francisco Couto

University of Lisbon, Portugal

Jean X. Gao

University of Texas at Arlington

Miao He

Sun Yat-sen University, China

Tony Hu

Drexel University

Vipin Kumar

University of Minnesota at Twin Cities

Doheon Lee

KAIST, Korea

Jie Liang

University of Illinois at Chicago

Jiao Li


Jinze Liu

University of Kentucky

Stefano Lonardi

University of California, Riverside

Zoran Obradovic

Temple University

Srinivasan Parthasarathy

Ohio State University

Jianhua Ruan

University of Texas, San Antonio

Saeed Salem

North Dakota State University

Leming Shi

Food and Drug Administration

Ambuj Singh

University of California at Santa Barbara

Min Song

New Jersey's Science & Technology University

Vincent Tseng

National Cheng Kung University, Taiwan

James Wang

Penn State University

Jason Wang

New Jersey's Science & Technology University

Wei Wang

University of North Carolina at Chapel Hill

Dong Xu

University of Missouri at Columbia

Jinbo Xu

Toyota Technological Institute

Aidong Zhang

SUNY Buffalo

Shuxing Zhang

MD Anderson Cancer Research Institute

Sheng Zhong

University of Illinois at Urbana-Champaign

Reference Book

Announcing a new 733-page BIOKDD book
"Biological Data Mining"
Edited by Jake Y. Chen and Stefano Lonardi (BIOKDD '07-08 co-chairs)
published by Chapman & Hall/CRC Press (Sept 2009).