Resource search results
Enhancedtextmining
- Enhanced text mining pipeline, including word segmentation, classification, clustering, and evaluation of the segmentation results.
InfoRetri
- Text classification based on naive Bayes (the accompanying English description names libsvm instead), including stop-word removal, word segmentation, feature extraction, and classification.
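A minimal sketch of the naive Bayes variant of this pipeline, assuming documents have already been segmented into token lists. The class name, the Laplace smoothing, and the use of raw term counts as features are illustrative assumptions, not details taken from the listed package.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes over pre-segmented token lists (hypothetical helper)."""

    def __init__(self, stop_words=()):
        self.stop_words = set(stop_words)
        self.class_doc_counts = Counter()        # label -> number of training documents
        self.term_counts = defaultdict(Counter)  # label -> term -> count
        self.vocabulary = set()

    def _features(self, tokens):
        # Feature extraction here is simply "tokens minus stop words".
        return [t for t in tokens if t not in self.stop_words]

    def fit(self, documents, labels):
        for tokens, label in zip(documents, labels):
            feats = self._features(tokens)
            self.class_doc_counts[label] += 1
            self.term_counts[label].update(feats)
            self.vocabulary.update(feats)
        return self

    def predict(self, tokens):
        feats = self._features(tokens)
        total_docs = sum(self.class_doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_doc_counts.items():
            score = math.log(doc_count / total_docs)  # log prior
            total_terms = sum(self.term_counts[label].values())
            for t in feats:
                # Laplace smoothing; the extra +1 reserves mass for unseen terms.
                count = self.term_counts[label][t]
                score += math.log((count + 1) / (total_terms + vocab_size + 1))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Working in log space avoids underflow when documents are long, which is why the scores are summed rather than multiplied.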
1
- Detects the similarity of Chinese articles: the articles are first segmented into words, then features are extracted and the angle between the feature vectors is computed to test whether the articles are similar.
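The vector-angle test can be sketched as a cosine similarity over term-frequency vectors. This assumes the articles are already segmented into token lists; the function names and the 0.8 threshold are hypothetical and not taken from the listed package.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between two term-frequency vectors."""
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    # Only terms present in both documents contribute to the dot product.
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def is_similar(tokens_a, tokens_b, threshold=0.8):
    # The threshold is an assumed value; tune it on labelled pairs.
    return cosine_similarity(tokens_a, tokens_b) >= threshold
```

A cosine close to 1 means a small angle between the vectors, so the two articles use nearly the same words in nearly the same proportions.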
NLPLibSVM
- Java version of libsvm with a word-segmentation training set. Includes libsvm.jar and training set samples.
ICTCLAS_api
- Performs word segmentation on a given text, segmenting according to the different parts of speech.
kmeansClassifier
- This program implements k-means classification, using the IK word segmenter to perform word segmentation.
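For orientation, a minimal k-means sketch over numeric feature vectors. It assumes the segmented documents have already been turned into equal-length vectors, and it seeds the centroids with the first k points purely to stay deterministic; the listed program may initialise and vectorise differently.

```python
def kmeans(points, k, max_iter=100):
    """Return (assignments, centroids) for a list of equal-length numeric vectors."""
    # Assumption: naive initialisation from the first k points.
    centroids = [list(p) for p in points[:k]]
    assignments = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance).
        new_assignments = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids
```

Example use: `kmeans([(0, 0), (10, 10), (0, 1), (10, 11)], k=2)` groups the two points near the origin together and the two points near (10, 10) together.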
fenciledebeiyesi
- Chinese text word segmentation system plus source code for text classification based on the Bayes algorithm, implemented in MATLAB.
Sogou-character-porfile
- Describes the process of person-tag processing, covering data collection, word segmentation, preprocessing, algorithm selection, and presentation of results.
kctp
- This code implements data preprocessing, including word segmentation, symbol removal, stop-word removal, etc.
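A minimal sketch of such a preprocessing step. The `segment` parameter is a placeholder for a real Chinese word segmenter; the whitespace default is only an assumption so the example stays self-contained, and the function names are hypothetical.

```python
import unicodedata


def strip_symbols(text):
    """Replace punctuation and symbol characters (Unicode categories P*, S*) with spaces."""
    return "".join(
        " " if unicodedata.category(ch)[0] in "PS" else ch
        for ch in text
    )


def preprocess(text, stop_words, segment=str.split):
    """Remove symbols, segment into tokens, then drop stop words."""
    cleaned = strip_symbols(text)
    return [tok for tok in segment(cleaned) if tok and tok not in stop_words]
```

Symbols are removed before segmentation here so that punctuation never ends up glued to a token; a pipeline that segments first and filters symbol tokens afterwards is equally plausible.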
textclustering-master
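- Mining and clustering of large texts. The method ignores word-frequency information and instead considers context: every character is used for word-position feature learning according to predefined features, which yields a trained model. Each character of the string to be segmented is then tagged with a word position, and the final segmentation result is obtained from the word-position definitions.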
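The last step, turning per-character word-position tags into words, can be sketched as follows. The four-tag B/M/E/S scheme (begin, middle, end, single) is an assumption; the description does not say which word-position tag set the project uses.

```python
def decode_word_positions(chars, tags):
    """Rebuild words from per-character B/M/E/S word-position tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        if tag == "S":            # single-character word
            if current:
                words.append(current)
                current = ""
            words.append(ch)
        elif tag == "B":          # first character of a multi-character word
            if current:
                words.append(current)
            current = ch
        elif tag == "M":          # middle character
            current += ch
        elif tag == "E":          # last character closes the word
            words.append(current + ch)
            current = ""
        else:
            raise ValueError(f"unknown word-position tag: {tag!r}")
    if current:                   # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words
```

In a full system the tag sequence would come from the trained model; here it is supplied by hand.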
新闻言论自动提取
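- Extracts, online, the speaking entities and the views they express from the content of a news article. It uses the pyltp language model from Harbin Institute of Technology to perform sentence splitting, word segmentation, and named entity recognition on the input news, and determines whether the news contains entities. Content that contains entities is run through dependency parsing; if the predicate appears in a list of similar words, the sentence that follows it is taken as the view content.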
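A simplified sketch of the final rule, working on already-parsed input rather than calling pyltp. It assumes each token carries a named-entity tag ("O" for non-entities) and a dependency arc given as (head, relation), with 1-based heads and "SBV" marking a subject; the function name and the choice to take the rest of the sentence as the view are illustrative assumptions.

```python
def extract_statements(words, ner_tags, arcs, say_words):
    """Return (entity, predicate, view) triples for one parsed sentence.

    words:     list of tokens
    ner_tags:  per-token named-entity tag, "O" meaning "not an entity"
    arcs:      per-token (head, relation); head is 1-based, 0 means root
    say_words: set of predicates treated as "said"-like verbs
    """
    results = []
    for i, (head, relation) in enumerate(arcs):
        # Look for an entity acting as the subject of some predicate.
        if relation != "SBV" or ner_tags[i] == "O":
            continue
        predicate_index = head - 1
        predicate = words[predicate_index]
        if predicate in say_words:
            # Simplification: everything after the predicate is the view.
            view = "".join(words[predicate_index + 1:]).lstrip(",,::“\"' ")
            results.append((words[i], predicate, view))
    return results
```

The say-word list stands in for the "list of similar words" in the description; how that list is built is not stated in the source.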