Search Resource List
darts-0.2.tar
- A double-array dictionary builder. Implements the trie algorithm with double arrays, which is more efficient than hashing for common-prefix queries over keys of variable length. Frequently used to build segmentation dictionaries.
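The common-prefix query this package accelerates can be sketched as follows. For clarity the sketch uses a plain nested-dict trie rather than the double-array (base/check) encoding; the query pattern is the same: walk the key character by character and report every dictionary word that is a prefix of it. The sample dictionary is a toy illustration, not the package's data.

```python
def build_trie(words):
    """Build a nested-dict trie; the None key marks end-of-word."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node[None] = True  # end-of-word marker
    return root

def common_prefix_search(trie, key):
    """Return every dictionary word that is a prefix of `key`."""
    hits, node = [], trie
    for i, ch in enumerate(key):
        if ch not in node:
            break
        node = node[ch]
        if None in node:
            hits.append(key[:i + 1])
    return hits

trie = build_trie(["北", "北京", "北京大学", "大学"])
print(common_prefix_search(trie, "北京大学生"))
# -> ['北', '北京', '北京大学']
```

In a double-array implementation the `node[ch]` step becomes two array lookups (`base`/`check`), which is what makes it faster than hashing each candidate prefix separately.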
myKbest_0513
- Chinese word segmentation using the N-shortest-path algorithm, from the ICTCLAS study group: http://groups.google.com/group/ictclas?msg=subscribe
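The idea behind N-shortest-path segmentation can be sketched in simplified form with N = 1: build a word lattice over the sentence and choose the path with the fewest words by dynamic programming. The full algorithm keeps the N best paths at each node; the dictionary here is a toy example, not ICTCLAS's lexicon.

```python
def segment_fewest_words(sentence, dictionary):
    """Return the segmentation of `sentence` using the fewest words."""
    n = len(sentence)
    INF = float("inf")
    cost = [0] + [INF] * n   # cost[i]: min number of words covering sentence[:i]
    back = [0] * (n + 1)     # back[i]: start index of the last word on the best path
    for i in range(n):
        if cost[i] == INF:
            continue
        for j in range(i + 1, n + 1):
            # a single character is always a fallback "atomic" word
            if j == i + 1 or sentence[i:j] in dictionary:
                if cost[i] + 1 < cost[j]:
                    cost[j], back[j] = cost[i] + 1, i
    words, i = [], n
    while i > 0:             # walk the back-pointers right to left
        words.append(sentence[back[i]:i])
        i = back[i]
    return words[::-1]

toy_dict = {"研究", "研究生", "生命", "命", "起源"}
print(segment_fewest_words("研究生命起源", toy_dict))
# -> ['研究', '生命', '起源']
```

Note how the fewest-words criterion resolves the classic ambiguity above: it prefers 研究/生命/起源 (3 words) over paths through 研究生 (which would force 命 and 起源, also 3 words, but the DP only replaces a path on a strict improvement, so the earlier-found split wins).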
SentenceSplitter
- A Chinese word segmentation component written in .NET, suitable for Chinese segmentation in small search engines.
wordseg
- Segmentation based on forward maximum matching. Uses a hash table to segment a continuous passage against the supplied dictionary and output the result.
Cidianku2
- A dictionary-based Chinese word segmentation tool written in Delphi. Still incomplete; improvements from experienced developers are welcome.
code
- Involves key techniques such as blacklists, text classification algorithms, segmentation of SMS message content, and feature vector selection.
WebPages_WordSplitting
- Automatically extracts web page content (a simple HTTPAnalyzer class is included) and segments the text using a built-in Chinese dictionary.
WebPages_InvertedFile
- Generates an inverted file from Chinese word segmentation results and writes the output to a text file.
fencisuanfa
- Sentence segmentation implemented with forward maximum matching, a dictionary-based algorithm characterized by high speed and high accuracy.
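Forward maximum matching (FMM), the algorithm behind this and the wordseg package above, can be sketched in a few lines: at each position, take the longest dictionary word starting there, falling back to a single character when nothing matches. A Python set stands in for the hash-table dictionary these packages mention; the sample dictionary is a toy illustration.

```python
def forward_max_match(sentence, dictionary, max_len=4):
    """Segment left to right, greedily taking the longest dictionary word."""
    result, i = [], 0
    while i < len(sentence):
        # try the longest candidate first, shrinking until a match is found
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            word = sentence[i:i + size]
            if size == 1 or word in dictionary:
                result.append(word)
                i += size
                break
    return result

toy_dict = {"中文", "分词", "算法", "最大", "匹配"}
print(forward_max_match("中文分词算法", toy_dict))
# -> ['中文', '分词', '算法']
```

`max_len` caps the candidate length at the longest word in the dictionary, which keeps each position's lookup count constant.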
liaotianfenci
- A segmentation algorithm based on the GB 2312 (GB2312) Chinese character encoding standard. It splits text into individual Chinese characters and can also recognize English letters, spaces, Chinese and English punctuation, and digits. Also known as atomic segmentation.
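"Atomic" segmentation as described above can be sketched as: each Chinese character (or punctuation mark) becomes its own token, runs of ASCII letters or digits are kept together, and whitespace is dropped. The original package works from GB2312 byte ranges; this sketch detects character classes via Unicode/ASCII tests instead, which covers the same behavior on decoded text.

```python
def atomic_segment(text):
    """Split text into atomic tokens: single CJK chars, ASCII alnum runs."""
    tokens, buf = [], ""
    for ch in text:
        if ch.isascii() and ch.isalnum():
            buf += ch                 # accumulate an English/digit run
            continue
        if buf:                       # flush the pending ASCII run
            tokens.append(buf)
            buf = ""
        if ch.isspace():
            continue                  # whitespace separates tokens
        tokens.append(ch)             # Chinese char or punctuation: one token
    if buf:
        tokens.append(buf)
    return tokens

print(atomic_segment("GB2312 编码 abc123"))
# -> ['GB2312', '编', '码', 'abc123']
```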
partition
- Implementation and testing of a segmentation system. String-based segmentation that extracts individual phrases according to segmentation markers.
PanGu_Release_V2.3.1.0
- The PanGu word segmentation algorithm, for search and anywhere segmentation is needed. Source code included.
Yard0.2.0
- A very good Chinese word segmenter that uses the Sogou dictionary; well suited to Chinese segmentation.
zhongwenfenci
- A back-to-front segmentation program that uses hash lookup to implement Chinese word segmentation.
20117230242204
- Code for corpus processing, including simple word segmentation and syntax-tree analysis.
WordSegment
- A segmentation system developed in C++ using a dictionary-based, hash-backed reverse maximum matching algorithm.
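Reverse (backward) maximum matching, the algorithm this package and zhongwenfenci above use, mirrors FMM from the other end: scan right to left, taking the longest dictionary word that ends at the current position. A Python set again plays the role of the hash-based dictionary; the sample dictionary is a toy illustration.

```python
def backward_max_match(sentence, dictionary, max_len=4):
    """Segment right to left, greedily taking the longest dictionary word."""
    result, j = [], len(sentence)
    while j > 0:
        # try the longest candidate ending at j, shrinking until a match
        for size in range(min(max_len, j), 0, -1):
            word = sentence[j - size:j]
            if size == 1 or word in dictionary:
                result.append(word)
                j -= size
                break
    return result[::-1]  # tokens were collected right to left

toy_dict = {"哈希", "逆向", "最大", "匹配"}
print(backward_max_match("逆向最大匹配", toy_dict))
# -> ['逆向', '最大', '匹配']
```

For Chinese, RMM often resolves overlap ambiguities differently from FMM, which is why many systems run both and compare the results.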
ICTCLAS50_Windows_32_C
- A Chinese word segmentation algorithm implemented in C++. It can be run directly or compiled and run, and custom dictionaries can be added.
yuantongji
- Computes word-frequency statistics over the terms of a segmented sentence. Written in C++.
segment
- A Chinese word segmentation program implemented with a double array and a dictionary. Its advantages are high efficiency, fast segmentation, and good robustness; suitable for search-engine segmentation.
Chinese-WordCut
- A Chinese word segmentation program that reads a Txt document and segments the paragraphs inside.