
A Study of Sequence Alignment Algorithms Based on a Gene Database System

Date: 15 January 2018 Author: 无忧论文网
Paper ID: lw200706282118257802 Word count: 40452 Category: Computer database papers
Region: China Language: Chinese Purpose: Master's thesis
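Abstract

Bioinformatics is one of the most important and most advanced fields of science today. It is widely used to acquire, process, analyze, and manage gene sequence data, and it has played a major role in advancing molecular biology and biomedical research.

This thesis builds on the Integrated Bioinformatics Management and Analysis System that I helped develop for the Shanghai Fudan Bioroad (博容) Biological Gene Company. The company's traditional manual analysis of gene sequences can no longer keep up with the rapid accumulation of data, so large-scale database management of biological information has become a necessary step as biotechnology and computing develop. To address the practical problem encountered in processing this information, namely sequence alignment against a massive gene database, we adopted fast and efficient sequence alignment algorithms in the system and fully automated the gene sequence analysis workflow. The main work of this thesis is the study of sequence alignment algorithms and the implementation of the gene database system.

Chapter 1 introduces the scope and current state of bioinformatics, identifies the difficulties and open problems the field faces, and explains the importance and application background of the Bioroad Integrated Bioinformatics Management and Analysis System. Chapter 2 examines Bioroad's bioinformatics processing workflow in depth, assesses the company's current situation, proposes specific improvements, and then describes the structure and functions of the system we developed.

Chapter 3 first presents the principles and methods of gene sequence alignment by dynamic programming, covering global alignment, local alignment, gapped alignment, and their combination, and then studies the heuristic BLAST algorithm. Chapter 4 proposes improvements to traditional sequence alignment algorithms, including a fast dynamic programming algorithm and several improvements to BLAST. These algorithms align gene sequences faster and more accurately, and they have been implemented in the Bioroad system.

With biotechnology advancing rapidly and domestic biotechnology companies facing international competition, this research on gene sequence alignment contributes to the question of how advanced computing and information processing technology can be used to further analyze and process gene information, and ultimately to combine biotechnology with computer technology and promote the development of biotechnology.

Keywords: sequence alignment, dynamic programming algorithm, BLAST algorithm, bioinformatics technology, gene database

Sequence Alignment Algorithm Study Based on the Gene Database System

Abstract

Bioinformatics is one of the most important and advanced scientific frontiers today. It has been widely applied to the analysis and management of gene sequence data and has proved its key role in molecular biology and medical research. This thesis is based on the development of the Bioroad gene database system. The traditional manual process that Bioroad Co. Ltd. used to analyze gene data can no longer meet the growing demands of bioinformatics and computing. A large gene database and a fully automated analysis process are therefore needed, and the new system we built provides both. At the same time, because the sequence data stored in the database grows almost exponentially, very fast and efficient alignment algorithms for extracting information from the database have become essential to molecular biologists. The study of such algorithms is the major work of this thesis.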
Chapter 1 first introduces the development of bioinformatics and points out the problems and difficulties that currently exist in the field. It then explains the importance and background of developing the gene data management system. Chapter 2 first analyzes the gene data analysis process at Bioroad Co. Ltd. and presents some improvements, then describes in detail the structure and functions of the system we built. Chapter 3 first studies the principles and methods of sequence alignment by dynamic programming, including global alignment, local alignment, and gapped alignment. It then analyzes the BLAST algorithm, which is based on heuristic methods and is widely adopted in practical systems. Chapter 4 gives some improvements to the traditional sequence alignment algorithms. These methods, such as an efficient algorithm to locate all locally optimal alignments and the two-hit algorithm, can greatly increase the speed and sensitivity of alignment between a query sequence and the database. This work is relevant to domestic genetics enterprises studying how to face international competition, including how to use advanced computer and information processing technology to further analyze and process genetic information and, ultimately, to fully combine bioinformatics with computer technology.

Keywords: sequence alignment, dynamic programming algorithm, BLAST algorithm, bioinformatics, gene database
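The abstract names dynamic programming for global and local alignment without showing the recurrence. The following is a minimal sketch of the textbook recurrences (Needleman-Wunsch for global alignment, Smith-Waterman for local alignment), not the thesis's own implementation or its improved algorithms. The function name `alignment_score`, the scoring values, and the linear gap penalty are all assumptions chosen for illustration.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best alignment score of strings a and b by dynamic programming.

    local=False uses the global (Needleman-Wunsch) recurrence.
    local=True uses the local (Smith-Waterman) recurrence.
    The scoring values are illustrative assumptions, with a linear gap penalty.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # A global alignment must pay for gaps at the start of either sequence.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # a[i-1] aligned to a gap
                score[i][j - 1] + gap,      # b[j-1] aligned to a gap
            )
            if local:
                cell = max(cell, 0)     # a local alignment may restart anywhere
                best = max(best, cell)  # and may end at any cell
            score[i][j] = cell
    return best if local else score[-1][-1]


if __name__ == "__main__":
    print(alignment_score("GATTACA", "GATTACA"))              # identical sequences
    print(alignment_score("TTACGTT", "GGACGGG", local=True))  # shared core "ACG"
```

Both variants fill a table of size (len(a)+1) by (len(b)+1), so time and memory grow with the product of the sequence lengths. That cost is why heuristic methods such as BLAST are preferred when a query must be compared against a whole database.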