

Date: June 13, 2019  Author: 论文网  Category: Computer Applications theses
Region: China  Language: Chinese  Purpose: undergraduate thesis (BA Thesis)  Word count: 15,748
Abstract
With the development of network technology, people have access to more and more information, but much of it is complicated and redundant. When we use a general-purpose search engine to find what we need, we often have to sift through many pages, which wastes a lot of time and energy. Topic search engines emerged to address this problem.
In addition, Nutch is convenient to use: it can be configured to suit a user's own needs, and because it is open, anyone can inspect how the search engine works. Its use in real-world applications also shows that Nutch is stable, which makes it valuable for people who want to study search engines.
This thesis describes a news topic search engine based on Nutch and its practical application. Many people now read news on the Internet, but many sites, in order to attract more visitors, fill their pages with low-quality news. At the same time, as living standards improve, people expect higher-quality news, so a new news topic search engine is urgently needed.
This thesis reviews the evolution of search engines and the challenges they face, and explains the advantages and current state of topic search engines. Building on an understanding of how Nutch runs, it studies in detail how to select web pages relevant to the topic and works out a concrete implementation plan for the news topic search engine. It then describes how components such as Nutch and Tomcat are assembled and tested, and compares the result with other news service platforms. Finally, the thesis is summarized.
Keywords: Nutch; Search Engines; Crawler; Fetching Strategy; News
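Contents
Abstract (Chinese)
Abstract (English)
Chapter 1 Overview of Related Technologies
1.1 Structure of a Search Engine
1.1.1 Overview of Search Engine Systems
1.1.2 Components of a Search Engine
1.1.3 Key Metrics of Search Engines and Their Analysis
1.2 Overview of Search Engine Technologies for Big Data
1.2.1 The Map-Reduce Computing Model
1.2.2 The HDFS Distributed File System
1.2.3 The HBASE Distributed Database
1.2.4 The Spark Cloud Computing Framework
1.3 Application Analysis of Intelligent Search Engines Based on Big Data Analytics
1.3.1 Requirements Analysis for Smart Search
1.3.2 Big Data Analytics and Smart Search
1.4 Web Robots
1.4.1 What Is a Web Robot
1.4.2 Structural Analysis of Web Robots
1.5 The Open-Source Search Engine Nutch
1.5.1 The Lucene Search Engine Toolkit
1.5.2 Introduction to Nutch
1.6 JavaCC Technology
Chapter 2 Strategy Design for a Vertical Search Engine
2.1 Based on Link Structure Features
2.1.1 The PageRank Algorithm
2.1.2 The HITS Algorithm
2.1.3 The Algorithm Implemented in This Thesis
2.2 Based on Content Evaluation
2.2.1 The Fish Search Algorithm
2.2.2 The Shark Search Algorithm
2.3 Other Related Strategies
2.3.1 Focused Search Based on Reinforcement Learning
2.3.2 Focused Search Based on Context Graphs
Chapter 3 Design and Implementation of the Vertical Search Engine
3.1 Requirements Analysis Phase
3.1.1 Requirements Analysis
3.1.2 Overall System Architecture Diagram
3.1.3 System Development and Runtime Environment
3.2 Functional Module Design
3.2.1 Web Crawler Module
3.2.2 Indexing Module
3.2.3 Retrieval Module
3.2.4 Improvements to the Lucene Scoring Algorithm
3.3 Testing the Improved Results
3.3.1 Testing the Indexing Module
3.3.2 Testing the Retrieval Module
Chapter 4 Summary and Outlook
References
Acknowledgments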
Chapter 1 Overview of Related Technologies
1.1 Structure of a Search Engine
1.1.1 Overview of Search Engine Systems

1.1.2 Components of a Search Engine  Web Spider
Chapter 4 Summary and Outlook