搜索资源列表
语料库
- 一份很重要的语料库,为你的分词程序是一个很好用的资料库文件-a very important corpus, as your segmentation procedure is a very good use of the database file
TextCategorization
- 基于朴素贝叶斯算法实现的中文文本分类程序。可以对中文文本进行分类识别,使用时先对分类器进行训练,然后进行识别。该Beta版本仅支持对3类文本进行分类,使用简单的中文分词方法,本程序尚不具备实用性,用于算法研究和改进。-based on Bayesian algorithms to achieve the Chinese text classification procedure. Can the Chinese text classification identification, the us
WordClassify
- 一个分词程序,c代码,有很详细的注释,便于阅读
CLucene
- clucene 源码,并且增加了自己写的正向最大匹配算法的分词程序。-clucene source code, and increase their own to write the forward maximum matching algorithm for the sub-word program.
GBKhash
- 利用了GBK编码的hash表,快速进行汉语分词的自然语言程序-Advantage of the GBK-encoded hash table, fast Chinese word segmentation of natural language program
ChineseSegment
- 一个完整的中文分词程序,有源码,词典,训练集。算法简洁高效,准确率高。包含了一种将标注语料和词典融合的新型分词方法。将语料分割为2:1为训练集和测试集,加上一个外部词典,准确率可以达到95 。适合入门者学习。也适合需要一个简单分词工具的应用。-A Chinese word segmentation procedures, source, dictionary, the training set. The algorithm is simple and efficient, high accura
hmmWordSegmentation
- 这是一个基于hmm模型的句子分词程序,语言是python,目前输入语句不支持标点符号。-This program is for divising a sentence into seperate words base on hmm.
NBclassfier
- 贝叶斯情感分类器,基于五倍交叉法来验证。程序可以直接运行,改程序是在基于已经分词的情况下实施的。-Bayesian classifier, emotion to verify five times based on the crossover. Program can be run directly, the program is based on the segmentation of the case.
IKAnalyzer
- JAVA实现简单客服的机器人系统,分词用系统用IK分词,机器人语言用AIML。程序已经实现java socket服务的建立。实现了中文分词,同义词输出,答案匹配。用到的库有IK、program-ab。搞了一个月的小成果,希望大家能用到。-JAVA simple customer service robotic systems, word by word IK systems, robot language with AIML. Procedures have been implemented t
遗忘算法(词库生成、分词、词权重)演示程序
- 通过非主流的遗传算法进行关键词提取,分词的功能(Through the non mainstream genetic algorithm for keyword extraction, word segmentation function)
jieba-jieba3k
- MATLAB 结巴分词的工具包,用于很多中文分词的模式识别代码程序,利用已有函数工具包提高工作效率,内有安装说明(MATLAB jieba toolkit, used for many Chinese word segmentation pattern recognition code programs, using existing function toolkits to improve work efficiency, with installation instructions)