Resource search results
partition
- Implementation and tests of a word segmentation system. Segmentation is string-based: individual words are extracted according to segmentation markers.
PanGu_Release_V2.3.1.0
- Source code for the PanGu word segmentation algorithm, for use in search and anywhere else word segmentation is needed.
Yard0.2.0
- A very good Chinese word segmenter that uses the Sogou lexicon; well suited to segmenting Chinese text.
ICTCLAS50_Windows_32_JNI
- The Chinese word segmentation algorithm developed by the Chinese Academy of Sciences, with part-of-speech tagging and other features; a classic algorithm.
Topic-oriented-meta-search-engine
- Keywords: topic-oriented, meta-search engine, neural network, relevance, Chinese word segmentation.
zhongwenfenci
- A back-to-front (backward) segmentation program that implements Chinese word segmentation using hash lookup.
20117230242204
- Corpus processing code, including simple word segmentation and syntax tree analysis.
WordSegment
- A dictionary-based word segmentation system developed in C++, using a hash-based reverse maximum matching algorithm.
ICTCLAS50_Windows_32_C
- A C++ implementation of a Chinese word segmentation algorithm. It can be run directly or compiled first, and additional dictionaries can be added.
stopword-list
- Text must be preprocessed before classification or clustering. The first preprocessing step is word segmentation, during which stop words need to be removed. This file is the stop word list.
textcluster
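- Performs Chinese word segmentation and outputs the clustering result. The segmentation algorithm is the author's own and simply splits on spaces; if you need a more advanced segmentation algorithm, you can download one yourself.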
chinese-analyzer
- Primarily analyzes natural language, with support for Chinese word segmentation and recognition of Chinese text.
yuantongji
- Counts term frequencies for the terms produced by segmenting sentences. Written in C++.
segment
- A Chinese word segmentation program implemented with a double array and a dictionary. Its advantages are high efficiency, fast segmentation, and good robustness. Suitable for search engine segmentation.
CrfDeocder-windows-source
- Chinese word segmentation using conditional random fields (CRF). Includes two versions, one written with VC6 and one with VC8.
TextCategorizer
- The author's own implementation of a Chinese word segmenter and a Bayesian text classifier, with a segmentation dictionary and a Chinese stop word list attached. Intended for data mining study and exchange. Developed with Visual Studio 2010.
IKTEST3.2
- Word segmentation implemented by calling the open-source interfaces IKSegmentation and Lexeme.
Stemmer
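- In English, one word is often a "variant" of another, for example happy => happiness, where happy is called the stem of happiness. In information retrieval systems, a common step in term normalization is stemming, that is, removing the endings of inflected forms of English words. The most widely used suffix-stripping stemming algorithm of moderate complexity is the Porter stemming algorithm, also called the Porter Stemmer. See the official website for details. The stem filters in popular retrieval systems, including Lucene and Whoosh, use the …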
ChineseSegment
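- A complete Chinese word segmentation program, with source code, dictionary, and training set. The algorithm is simple and efficient, with high accuracy. It includes a new segmentation method that merges an annotated corpus with a dictionary. With the corpus split 2:1 into training and test sets, plus an external dictionary, accuracy can reach 95%. Suitable for beginners to study, and for applications that need a simple segmentation tool.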
Chinese-WordCut
- A Chinese word segmentation program that reads in a Txt document and segments the paragraphs in it.
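
Several entries above (zhongwenfenci, WordSegment) describe dictionary-based backward (reverse) maximum matching with hash lookup. As a rough illustration of that general technique only, and not the code of any listed package, here is a minimal Python sketch. The function name `backward_max_match`, the `max_len` parameter, and the toy dictionary are all assumptions made for this example; a Python `set` stands in for the hash table.

```python
def backward_max_match(text, dictionary, max_len=4):
    """Segment text from right to left, preferring the longest dictionary word.

    `dictionary` is a set of known words (a hash-based lookup);
    `max_len` is the assumed length of the longest word worth trying.
    """
    words = []
    end = len(text)
    while end > 0:
        # Try the longest candidate ending at `end`, then shrink it.
        for size in range(min(max_len, end), 0, -1):
            candidate = text[end - size:end]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                end -= size
                break
    # Words were collected back to front, so restore reading order.
    words.reverse()
    return words


# Hypothetical toy dictionary, for illustration only.
toy_dictionary = {"研究", "研究生", "生命", "起源"}
print(backward_max_match("研究生命的起源", toy_dictionary))
```

With this toy dictionary the sentence is split as 研究 / 生命 / 的 / 起源, whereas a left-to-right longest match would greedily take 研究生 first. Real segmenters of this kind load a large lexicon and usually add handling for numbers, Latin text, and unknown words.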