BIOKDD '08 Workshop

Announcing a new 733-page BIOKDD book
"Biological Data Mining"
Edited by Jake Y. Chen and Stefano Lonardi (BIOKDD '07-08 co-chairs)
published by Chapman & Hall/CRC Press (Sept 2009).


Bioinformatics is the science of managing, mining, and interpreting information from biological data. Various genome projects have contributed to an exponential growth in DNA and protein sequence databases. Advances in high-throughput technology such as microarrays and mass spectrometry have further created the fields of functional genomics and proteomics, in which one can quantitatively monitor the presence of multiple genes, proteins, metabolites, and compounds in a given biological state. The ongoing influx of these data, the presence of biological answers in the observed data despite noise, and the gap between data collection and knowledge curation have collectively created exciting opportunities for data mining researchers.

While tremendous progress has been made over the years, many of the fundamental problems in bioinformatics, such as protein structure prediction, gene-environment interaction, and regulatory pathway mapping, are still open. Data mining will play essential roles in understanding these fundamental problems and in developing novel therapeutic and diagnostic solutions in post-genome medicine.

Workshop History (2001-2007)

Data mining approaches seem ideally suited to bioinformatics, a field that is data-rich but lacks a comprehensive theory of life's organization at the molecular level. The extensive databases of biological information create both challenges and opportunities for developing novel KDD methods. To highlight these avenues, we organized the Workshops on Data Mining in Bioinformatics (BIOKDD 2001-2007), held annually in conjunction with the ACM SIGKDD Conference. This will be the 8th year for the workshop.

Past workshops attracted 50-100 participants, from academia, industry and government labs, underscoring the surge of interest in this exciting and rapidly expanding field.  The program of the workshops included 10-11 contributed papers, and 1-2 invited talks.
Information on past workshops is available at:

Call for Papers

The goal of this workshop is to encourage KDD researchers to take on the numerous challenges that bioinformatics offers. This year, the workshop will feature the themes of “complex biological systems” and “knowledge discovery”. Unlike single molecules, complex biological systems consist of components that are themselves complex and that interact with each other. Understanding how the various components work in concert, using modern high-throughput biology and data mining methods, is crucial to the ultimate goal of a genome-based economy, including genome medicine and new agricultural and energy solutions.

Topics of interest include:

  • Phylogenetics and comparative genomics
  • DNA microarray data analysis
  • RNAi and microRNA analysis
  • Protein/RNA structure prediction
  • Sequence and structural motif finding
  • Modeling of biological networks and pathways
  • Statistical learning methods in bioinformatics
  • Computational proteomics
  • Computational biomarker discoveries
  • Computational drug discoveries
  • Biomedical text mining
  • Biological data management techniques
  • Semantic webs and ontology-driven biological data integration methods

Papers should be at most 10 pages long, single-spaced, in font size 10 or larger with one-inch margins on all sides. Papers should be submitted in PDF/PS format through EasyChair at the following link:

For examples of the camera-ready format, see previous BIOKDD conference proceedings (e.g., BIOKDD '07).

Important Dates

5/26/2008      Deadline for Submission of Papers
6/23/2008      Notification of Acceptance; Workshop Registration Open
7/25/2008      Submission of Camera Ready Papers
8/24/2008      Workshop Presentation


All papers will be published in the workshop proceedings and in the ACM Digital Library.

Submission of accepted papers. For accepted workshop papers, we require that each camera-ready paper be formatted strictly according to the official ACM Proceedings Format. Please submit a PDF file only. To prepare the camera-ready PDF file, you may use either the Microsoft Word template or the LaTeX file preparation instructions found here. All final camera-ready submissions must be accompanied by a completed digital copy (a scan is acceptable) of the ACM copyright transfer form; otherwise the paper cannot be included in the final workshop proceedings.

Publication of proceedings and expanded papers. Expanded versions of selected high-quality papers from the workshop will be invited for publication in a special issue of a major bioinformatics/biocomputing journal (in 2007, it was the Journal of Bioinformatics and Computational Biology). Details of the journal/book publication will be announced after the workshop on the workshop's web site.

Online BIOKDD '08 Proceedings. You may download the PDF document here.

Program Overview

Duration: ONE HALF DAY

Location: BIOKDD '08 will be held on the morning of August 24th 2008, in conjunction with ACM KDD 2008, in Las Vegas, Nevada, USA. The following is the contact information for the hotel:

Loews Lake Las Vegas Resort
101 Montelago Boulevard
Henderson, Nevada



  1. The BIOKDD '08 workshop presenter-only registrations are now closed.
  2. To attend the workshop, you do not need to register with us.
  3. BIOKDD '08 attendees (except for presenters) may check with the ACM SIGKDD '08 conference and go through this registration process to officially register for the workshop and receive printed workshop proceedings from them.


The workshop will be held on August 24th 2008 at the SIGKDD '08 conference site.

8:20-8:30am: Opening Remarks

Session 1.

8:30-8:50am: Talk 1: Function Prediction Using Neighborhood Patterns
• Petko Bogdanov and Ambuj K. Singh, University of California, Santa Barbara, USA

8:50-9:10am: Talk 2: Statistical Modeling of Medical Indexing Processes for Biomedical Knowledge Information Discovery from Text
• Markus Bundschus, Mathaeus Dejori, Shipeng Yu, Volker Tresp, and Hans-Peter Kriegel, University of Munich, Germany

9:10-9:30am: Talk 3: Information Theoretic Methods for Detecting Multiple Loci Associated with Complex Diseases
• Pritam Chanda, Aidong Zhang, Lara Sucheston and Murali Ramanathan, State University of New York, Buffalo, USA

9:30-9:50am: Talk 4: A Fast, Large-scale Learning Method for Protein Sequence Classification
• Pavel Kuksa, Pai-Hsi Huang, and Vladimir Pavlovic, Rutgers University, USA

9:50-10:05am: Coffee Break

Session 2.

10:05-10:50am: Keynote Talk: Link Mining: exploring the power of links
• Philip Yu, University of Illinois, Chicago, USA

10:50-11:10am: Talk 5: Catching Old Influenza Virus with A New Markov Model
• HamChing Lam and Daniel Boley, University of Minnesota, USA

11:10-11:30am: Talk 6: GPD: A Graph Pattern Diffusion Kernel for Accurate Graph Classification with Applications in Cheminformatics
• Aaron Smalter, Luke Huan, Gerald Lushington and Yi Jia, University of Kansas, USA

11:30-11:50am: Talk 7: Reinforcing Mutual Information-based Strategy for Feature Selection for Microarray Data
• Jian Tang, Shuigeng Zhou, Feng Li and Jiang Kai, Fudan University, China

11:50am-12:10pm: Talk 8: Graph-based Temporal Mining of Metabolic Pathways with Microarray Data
• Chang hun You, Lawrence B. Holder, and Diane J. Cook, Washington State University, USA

12:10-12:20pm: Concluding Remarks

Workshop Proceedings

  • The BIOKDD '08 workshop was successfully completed on August 24th 2008. You may find the electronic proceedings here.


Program Chairs

Stefano Lonardi
Department of Computer Science & Engineering
University of California
Riverside, CA 92521


Jake Y. Chen
Indiana University School of Informatics
Purdue School of Science Department of Computer & Information Science
Indiana University–Purdue University Indianapolis
Indianapolis, IN 46202

Mohammed Zaki
Department of Computer Science
Rensselaer Polytechnic Institute
Troy, NY 12180-3590


Program Committee

Alberto Apostolico    Georgia Tech & University of Padova
Ann Loraine           University of North Carolina, Charlotte
Chad Myers            University of Minnesota
Chandan K. Reddy      Wayne State University
Dong Xu               University of Missouri
Giuseppe Lancia       University of Udine, Italy
Isidore Rigoutsos     IBM T. J. Watson Research Center
Jason Wang            New Jersey Institute of Technology
Jie Zheng             NCBI, USA
Jing Li               Case Western Reserve University
Knut Reinert          Freie Universität Berlin, Germany
Li Liao               University of Delaware
Luke Huan             University of Kansas
Mehmet Koyuturk       Case Western Reserve University
Michael Brudno        University of Toronto
Muhammad Abulaish     Jamia Millia Islamia, India
Natasa Przulj         UC Irvine
Phoebe Chen           Deakin University, Australia
Rui Kuang             University of Minnesota
Seungchan Kim         Arizona State University
Si Luo                Purdue University
Simon Lin             Northwestern University
Walid G. Aref         Purdue University
Wei Wang              University of North Carolina, Chapel Hill
Xiaohua (Tony) Hu     Drexel University
Yaoqi Zhou            Indiana University
Yves Lussier          University of Chicago