BIOKDD '11 Workshop, "Data Mining Challenges in Next‐generation Sequencing (NGS)"


Bioinformatics is the science of managing, mining, and interpreting information from biological data. Various genome projects have contributed to an exponential growth in DNA and protein sequence databases. Advances in high-throughput technology such as microarrays and mass spectrometry have further created the fields of functional genomics and proteomics, in which one can monitor quantitatively the presence of multiple genes, proteins, metabolites, and compounds in a given biological state. The ongoing influx of these data, the inherent uncertainties in data collection processes, and the gap between data collection and knowledge curation have collectively created exciting opportunities for data mining researchers.

While tremendous progress has been made over the years, many of the fundamental problems in bioinformatics, such as protein structure prediction, gene-environment interaction, and regulatory pathway mapping, are still open. Data mining will play an essential role in understanding these fundamental problems and in developing novel therapeutic and diagnostic solutions in post-genome medicine.

Workshop History (2001-2010)

Data Mining approaches seem ideally suited for Bioinformatics, since they are data-driven and do not require a comprehensive theory of life's organization at the molecular level. The extensive databases of biological information create both challenges and opportunities for developing novel KDD methods. To highlight these avenues we organized the Workshops on Data Mining in Bioinformatics (BIOKDD 2001-2010), held annually or biannually in conjunction with the ACM SIGKDD Conference. This will be the 10th year for the workshop.

Past workshops attracted 50-100 participants from academia, industry, and government labs, underscoring the surge of interest in this exciting and rapidly expanding field. The program of each workshop included 10-11 contributed papers and a panel/invited talk.

Information on past workshops is available at:

Call for Papers

The goal of this workshop is to encourage KDD researchers to take on the numerous challenges that Bioinformatics offers. This year, the workshop will feature the theme of “Data Mining Challenges in Next‐generation Sequencing (NGS)”. NGS is revolutionizing biological, biomedical, and health research. NGS technology poses enormous data analysis and knowledge discovery challenges, including expression analysis, mutational analysis, alternative splicing pattern discovery, whole transcription sequence alignment, epigenetics site discovery, storage and compression of high-volume sequence data, and clustering and classification of structural variations in a population.

We encourage papers that propose novel data mining techniques for post-genome bioinformatics studies in areas such as the following, although excellent papers without the use of NGS will also be considered:

  • NGS data processing
  • Genome structural variation analysis
  • Exome sequencing
  • Comparative assessment of data qualities between NGS and microarray-based technology
  • Comparative Genomics
  • Metagenomics using NGS
  • RNA‐seq expression analysis
  • Genome‐wide analysis of non‐coding RNAs
  • Mutational analysis and disease risk assessment
  • Genome‐wide motif finding
  • Modeling of biological networks and pathways from NGS data
  • NGS and structural bioinformatics applications
  • Correlating NGS with proteomics data analysis
  • Biomarker discoveries in NGS data
  • Gene functional annotation
  • Special biological data management techniques for NGS data
  • Special information visualization techniques for NGS data analysis
  • Semantic webs and ontology‐driven NGS data integration methods
  • Knowledge discovery of genotype‐phenotype associations in NGS
  • Privacy and security issues in mining genomic databases

Papers should be at most 10 pages long, single-spaced, in font size 10 or larger with one-inch margins on all sides. Papers should be submitted in PDF/PS format through EasyChair at the following link:

For examples of the camera-ready format, refer to papers in previous BIOKDD proceedings (e.g., BIOKDD10).

Important Dates

Refer to our workshop fan page on Facebook for updates to the following dates:

5/20/2011      Paper Submission Due (extended from original deadline: 5/14/2011)
6/06/2011      Notification of Acceptance
6/15/2011      Camera-ready Paper Due
8/21/2011      Workshop Presentation


All papers will be published in the workshop proceedings and in the ACM Digital Library.

Submission of accepted papers. For accepted workshop papers, we require that each camera-ready paper be formatted strictly according to the official ACM Proceedings Format. Please submit a PDF file only. To prepare the camera-ready PDF file, you may use either the Microsoft Word template or the LaTeX preparation instructions found here. All final camera-ready submissions must be accompanied by a completed digital copy (a scanned copy is acceptable) of the ACM copyright transfer form; otherwise the paper cannot be included in the final workshop proceedings.

Publication of proceedings and expanded papers. Expanded versions of selected high-quality papers from the workshop will be invited for publication in a special issue of a major bioinformatics/biocomputing journal (in 2007, it was the Journal of Bioinformatics and Computational Biology; in 2008, it was IEEE/ACM Transactions on Computational Biology and Bioinformatics; in 2010, it was BMC Bioinformatics). Details of the journal/book publication will be announced online after the workshop.

Program Overview

  • Duration: Half day
  • Location: BIOKDD '11 will be held in conjunction with ACM KDD-2011, at the Manchester Grand Hyatt in San Diego, CA, USA. The following is the contact information for the hotel:
  • Manchester Grand Hyatt
    One Market Place
    San Diego, California, USA 92101

    Tel: +1 619 232 1234
    Fax: +1 619 233 6464

  • Research Papers: We accepted 3 full papers and 2 short papers. Each paper was peer-reviewed by at least two members of the program committee, and papers with declared conflicts of interest were reviewed blindly to ensure impartiality. Full papers will have 25 minutes and short papers will have 15 minutes (including oral presentation and questions and answers). A copy of the proceedings can be found here.
  • Plenary Speakers: We invited the following two speakers.
    • Vineet Bafna, PhD, Professor, University of California, San Diego
    • Harry Gao, MD, PhD, Director, DNA Sequencing/Solexa Core Lab, City of Hope



  1. A special BIOKDD workshop registration is required for each accepted paper in addition to conference registration. The fee covers hospitality and administrative expenses related to the successful organization of the workshop. The registration fee is $60 for each workshop paper presenter. Those who do not present a BIOKDD workshop paper are not required to pay this registration fee. BIOKDD registration will open after June 1, 2011.
  2. The KDD-2011 conference has a separate and mandatory registration process. If you register with the conference, you can get printed proceedings for a nominal fee directly at the conference registration desk.

Please use the following Google Checkout to pay the workshop publication fees.


8:25 – 8:30 Opening Remarks

8:30 – 9:25 Invited Speaker Presentation 1

Session I (9:30 am – 10:10 am)

9:30 – 9:55 [L] Zhen Hu and Raj Bhatnagar. Algorithm for Low-Variance Biclusters to Identify Coregulation Modules in Sequencing Datasets

9:55 – 10:10 [S] Mina Maleki, Md. Mominul Aziz and Luis Rueda. Analysis of Obligate and Non-obligate Complexes using Desolvation Energies in Domain-domain Interactions

10:10 – 10:30 Coffee Break

10:30 – 11:25 Invited Speaker Presentation 2

Session II (11:30 am – 12:35 pm)

11:30 – 11:55 [L] K.S.M. Tozammel Hossain, Chris Bailey-Kellogg, Alan Friedman, Michael Bradley, Nathan Baker and Naren Ramakrishnan. Using Physicochemical Properties of Amino Acids to induce Graphical Models of Residue Couplings

11:55 – 12:20 [L] Hamching Lam and Daniel Boley. Analyze Influenza Virus Sequences Using Binary Encoding Approach

12:20 – 12:35 [S] Ankit Agrawal, Sanchit Misra, Ramanathan Narayanan, Lalith Polepeddi and Alok Choudhary. A Lung Cancer Outcome Calculator Using Ensemble Data Mining on SEER Data

12:35 – 12:45 Closing Remarks

Workshop Proceedings

  • BIOKDD '11 electronic proceedings will be made available online here.


General Chairs

Mohammed Zaki
Department of Computer Science
Rensselaer Polytechnic Institute
Troy, NY 12180

Web site:

Jake Y. Chen
Indiana University School of Informatics
Purdue University School of Science Department of Computer & Information Science
Indiana Center for Systems Biology and Personalized Medicine
Indianapolis, IN 46202

Web site:

Program Chairs

Mohammad Al Hasan
Department of Computer Science
Indiana University ‐ Purdue University
723 W. Michigan St., #SL‐277
Indianapolis, IN 46202

Web site:

Jun (Luke) Huan
Department of Electrical Engineering and Computer Science
University of Kansas
Lawrence, KS 66047-7621

Web site:

Program Committee

Vineet Chaoji       Yahoo Research, Bangalore
Bin Chen            Indiana University, Bloomington
Xiang Chen          Chinese Academy of Science
Md Tamjid Hoque
Jingshan Huang      University of South Alabama
Hasan Jamil         Wayne State University
Asif Javed          IBM TJ Watson Research Center
George Karypis      University of Minnesota, Twin Cities
Mei Liu             Vanderbilt University
Huzefa Rangwala     George Mason University
Chandan Reddy       Wayne State University
Isidore Rigoutsos   Thomas Jefferson University
Jianhua Ruan        University of Texas, San Antonio
Saeed Salem         North Dakota State University, Fargo
Min Song            New Jersey Institute of Technology
Vincent Tseng       National Taiwan University
Duygu Ucar          Stanford University
Vladimir Vacic      Columbia University
Jason Wang          New Jersey Institute of Technology
Wei Wang            University of North Carolina, Chapel Hill
Jinbo Xu            Toyota Technological Institute, Chicago
Jie Zheng           Nanyang Technological University, Singapore

Reference Book

Announcing a new 733-page BIOKDD book,
"Biological Data Mining",
edited by Jake Y. Chen and Stefano Lonardi (BIOKDD '07-08 co-chairs) and
published by Chapman & Hall/CRC Press (Sept 2009).