【Abstract】 Clustering is an unsupervised learning process that groups objects into clusters according to certain of their attributes, so that similarity between clusters is as small as possible and similarity within each cluster is as large as possible. Cluster analysis has many practical applications in economics, management, engineering, and other fields. A rich body of research already exists on clustering methods for precise numerical clustering information (generally the feature values of the objects being clustered, or a similarity matrix, together with the feature weights). In many practical problems, however, imprecise estimates of the objects' information, measurement error, and human judgment mean that evaluation information often takes the form of interval numbers, triangular fuzzy numbers, linguistic terms, or even linguistic interval information. Research on clustering methods with linguistic information is therefore significant both in theory and in application. This thesis analyzes and studies clustering methods with linguistic information. Its main content is as follows:
Chapter 1 introduces the background and significance of the research, sets out the research objectives and content, and presents the intended innovations and the research approach.
Chapter 2 reviews the research on the theory of clustering methods with linguistic information and on related problems, and summarizes the existing results.
Chapter 3 first introduces the concept of cluster analysis and two common clustering methods, then the concept of the linguistic variable, and finally the linguistic 2-tuple and its aggregation operators.
Chapter 4 first defines the linguistic interval variable and the interval 2-tuple on the basis of linguistic variables and 2-tuples, and gives the corresponding operators. It then describes the clustering problem with linguistic interval information and presents a maximal tree clustering method and a fuzzy c-means (FCM) clustering method based on linguistic interval information. Finally, a worked example is given for each of the two methods.
Chapter 5 addresses clustering problems whose evaluation information takes different forms, such as real numbers, interval numbers, and linguistic variables. It proposes a new FCM clustering method based on mixed evaluation information, which extends the traditional FCM clustering method, and gives a worked example.
The thesis closes by summarizing its main results, conclusions, and contributions, and by pointing out research that remains to be done.
【Key words】 clustering; linguistic information; linguistic interval information; linguistic 2-tuple; maximal tree clustering method; FCM clustering method
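Contents:
Abstract (Chinese)
Abstract (English)
Chapter 1 Introduction
1.1 Research background
1.1.1 Cluster analysis as an important means of accomplishing data mining tasks
1.1.2 Practical applications of cluster analysis in economics and management
1.1.3 The emergence of cluster analysis based on linguistic information
1.2 Problem statement
1.2.1 The need to study clustering methods based on linguistic information
1.2.2 The need to study clustering methods based on linguistic interval information
1.2.3 The need to study clustering methods based on information that mixes linguistic and other forms
1.3 Research objectives and content
1.3.1 Research objectives
1.3.2 Research content
1.4 Research methods and approach
1.4.1 Research methods
1.4.2 Research approach
1.5 Innovations of this thesis
1.6 Thesis structure
Chapter 2 Review of related literature
2.1 Literature sources and search methods
2.2 Review of research on linguistic evaluation information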
2.