National Technical Reports Library - NTRL

The National Technical Information Service acquires, indexes, abstracts, and archives the largest collection of U.S. government-sponsored technical reports in existence. The NTRL offers online, free and open access to these authenticated government technical reports. Technical reports and documents in its repository may be available online for free either from the issuing federal agency, the U.S. Government Publishing Office’s Federal Digital System website, or through search engines.




Details

Comparison of Signature Pattern Analysis Methods in Molecular Epidemiology.


DE2001763365

Publication Date 2000
Personal Author Burr, T.; Charlton, W.; Stanbro, W.
Page Count 11
Abstract We consider the supervised learning problem of assigning test influenza sequences to their correct group (where the group is the host species). Assume that training cases (influenza sequences and their group labels) are available, usually via estimates of phylogenetic (evolutionary) trees as a special case of unsupervised learning. We compare three supervised learning methods: (1) a published signature pattern analysis (VESPA) approach; (2) an unpublished Bayesian approach that assumes sites are independent; and (3) a nearest-neighbor approach with flexible evolutionary distance measures. Although the Bayesian approach has the attractive feature of reporting estimated probabilities for each group for each test sequence, those probabilities are somewhat suspect because of the site-independence assumption, which is difficult to remove. We investigate the impact of this independence assumption and show that it can be conservative or anti-conservative (meaning that it leads to either overstating or understating the separability of the groups). The VESPA approach also assumes site independence, but it has the advantage of allowing for dependence among sequence scores in a way that can easily be estimated, as we illustrate. Finally, the distance-based method is always a strong contender, especially in this case because of the ease of incorporating evolutionary models into the distance measure. All three methods are of potential use on similar problems, with no single method emerging as the clear winner. We compare conclusions under the three approaches for sequences from the Nucleoprotein (NP) gene of the human influenza RNA virus from three host species. These data are available from the influenza database maintained at Los Alamos National Laboratory (http:/71inker.lanL gov u/searchJrarne. html).
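The abstract's second method, a Bayesian classifier that treats alignment sites as independent, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the nucleotide alphabet, the additive pseudocount, the use of group frequencies as priors, and the toy sequences are all assumptions made for illustration. As the abstract cautions, the probabilities such a model reports are only as trustworthy as the site-independence assumption.

```python
import math
from collections import Counter

# Assumed alphabet (hypothetical): the report's NP data may use other symbols.
ALPHABET = "ACGT"

def fit_site_frequencies(training, pseudocount=1.0):
    """Estimate per-group, per-site symbol probabilities with additive smoothing.

    `training` is a list of (aligned_sequence, group_label) pairs. Sequences are
    assumed to be aligned, of equal length, and to contain only ALPHABET symbols.
    """
    by_group = {}
    for seq, group in training:
        by_group.setdefault(group, []).append(seq)

    model = {}
    for group, seqs in by_group.items():
        total = len(seqs) + pseudocount * len(ALPHABET)
        site_probs = []
        for i in range(len(seqs[0])):
            counts = Counter(s[i] for s in seqs)
            site_probs.append(
                {a: (counts[a] + pseudocount) / total for a in ALPHABET}
            )
        model[group] = (len(seqs), site_probs)
    return model

def posterior(model, seq):
    """Return estimated group probabilities for one test sequence.

    Sites are treated as independent, so per-site log-probabilities are summed.
    The prior for each group is its share of the training cases (an assumption).
    """
    n_total = sum(n for n, _ in model.values())
    log_scores = {}
    for group, (n, site_probs) in model.items():
        log_p = math.log(n / n_total)
        for i, symbol in enumerate(seq):
            log_p += math.log(site_probs[i][symbol])
        log_scores[group] = log_p

    # Normalize in a numerically stable way by subtracting the largest log score.
    top = max(log_scores.values())
    weights = {g: math.exp(v - top) for g, v in log_scores.items()}
    norm = sum(weights.values())
    return {g: w / norm for g, w in weights.items()}

# Toy, made-up training cases; real input would be aligned NP gene sequences.
TRAINING = [
    ("ACGT", "human"),
    ("ACGA", "human"),
    ("TCGT", "swine"),
    ("TCCT", "swine"),
]

if __name__ == "__main__":
    model = fit_site_frequencies(TRAINING)
    print(posterior(model, "ACGT"))
```

Because the per-site terms are simply multiplied, correlated sites are counted as if each were independent evidence, which is one way the reported probabilities can overstate or understate group separability.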
Keywords
  • Epidemiology
  • Medicine
  • Molecules
  • Molecular biology
  • Signature pattern analysis
  • Supervised learning methods
  • Independence assumptions
Source Agency
  • Technical Information Center Oak Ridge Tennessee
NTIS Subject Category
  • 57U - Public Health & Industrial Medicine
  • 57F - Cytology, Genetics, & Molecular Biology
Corporate Authors Los Alamos National Lab., NM.; Department of Energy, Washington, DC.
Document Type Conference Proceedings
NTIS Issue Number 200124
Contract Number
  • W-7405-ENG-36