Search results: resource list
FreeICTCLAS.rar
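- ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System), a Chinese lexical analysis system. Features: Chinese word segmentation, part-of-speech tagging, and unknown-word recognition. Segmentation accuracy reaches 97.58% (evaluated by the 973 expert group); recall for unknown-word recognition is above 90% throughout, and recall for Chinese personal names is close to 98%. Processing speed is 31.5 Kbytes/s. ICTCLAS can also output multiple high-probability results on request, offers several output formats, and supports both the Peking University part-of-speech tag set and the tag set issued by the 973 expert group. The system ...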
java-cluster.zip
- Text clustering implemented in Java, including the data preprocessing done before clustering: word segmentation, dimensionality reduction, building the vector space model, and so on.
Chinese-Segmentation.rar
- Chinese word segmentation source code written by the uploader in VC++, with complete documentation and a standard segmentation database.
windows_c_32.rar
- The latest version of the Chinese Academy of Sciences' Chinese analysis program; it performs word segmentation, part-of-speech tagging, and so on.
windowsC32.rar
- A Chinese lexical analysis and word segmentation system. Main features: Chinese word segmentation, part-of-speech tagging, named entity recognition, and new-word recognition; user dictionaries are also supported.
VisualC.rar
- A word segmentation algorithm implemented in C++, with a working example.
mmseg
- A word segmentation program based on a double-array trie; segmentation speed is 20 MB/s, and GBK and UTF-8 encodings are supported.
MyHL
- Calls the DLL of the research edition of 海量智能分词 (an intelligent word segmentation product) to obtain segmentation results (C#).
v1.4.02
- An open-source word segmentation system that supports personal-name recognition and dictionary management.
NICTCLAS_Release
- Chinese Academy of Sciences word segmentation program; the code is open source, but the dictionary is not.
CutWordApp
- A word segmenter implemented in C#, complete and runnable. It combines forward and reverse matching and is fairly efficient.
FreeICTCLAS
- C++ source code of ICTCLAS, suitable for learning C++ and for studying Chinese word segmentation algorithms.
segmentation
- A word segmentation system using forward maximum matching with a hashmap-based first-character hash lookup. The code is written in C++, and the system implements segmentation well.
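
The technique named in the entry above, forward maximum matching with a first-character lookup, can be sketched roughly as follows. This is not the code from the archive (which is C++ and is not shown in this listing); it is a minimal Python illustration, and the names `build_index` and `fmm_segment` and the toy dictionary are assumptions made for the example.

```python
from collections import defaultdict


def build_index(words):
    """Group dictionary words by their first character (the first-character hash)."""
    index = defaultdict(set)
    for word in words:
        if word:
            index[word[0]].add(word)
    return index


def fmm_segment(text, index):
    """Forward maximum matching: at each position take the longest dictionary word."""
    tokens = []
    i = 0
    while i < len(text):
        # Only words that start with the current character are candidates.
        candidates = index.get(text[i], ())
        max_len = max((len(w) for w in candidates), default=1)
        match = text[i]  # fall back to a single character if nothing matches
        # Try the longest possible span first, then shrink it.
        for length in range(min(max_len, len(text) - i), 1, -1):
            piece = text[i:i + length]
            if piece in candidates:
                match = piece
                break
        tokens.append(match)
        i += len(match)
    return tokens


# Hypothetical toy dictionary, for illustration only.
toy_index = build_index(["中文", "分词", "中文分词", "系统", "研究", "研究生", "生命", "命"])
print(fmm_segment("中文分词系统", toy_index))
print(fmm_segment("研究生命", toy_index))
```

The second call shows the known weakness of greedy forward matching: it commits to the longest word at each position (`研究生`) even when a shorter word would give a better overall split.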
Chinesewordsegmentationalgorithm
- A Chinese word segmentation algorithm; like Kingsoft PowerWord (金山词霸), it automatically splits out the words when the mouse moves over a sentence.
word-frequency
- Word frequency statistics written in Java; includes the 极易分词 segmentation package and the Lucene package. The program has been debugged and runs.
ICTCLAS
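- ICTCLAS, the Chinese lexical analysis system from the Institute of Computing Technology. Segmentation accuracy reaches 97.58% (evaluated by the 973 expert group); recall for unknown-word recognition is above 90% throughout, and recall for Chinese personal names is close to 98%. Processing speed is 31.5 Kbytes/s. ICTCLAS can also output multiple high-probability results on request, offers several output formats, and supports both the Peking University part-of-speech tag set and the tag set issued by the 973 expert group.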
SW_I_WordSegment
- SW-I Chinese word segmentation algorithm; an MFC program, debugged in Visual Studio 2008. The default dictionary is an mdb file. Because of its size it is not included with the source files, so download an mdb-format dictionary yourself.
OpenCNSegmenter
- Chinese word segmentation: splits Chinese sentences into words. A very good algorithm, obtained from the Internet.
ICTCLAS
- VC++ version of the Chinese Academy of Sciences word segmentation system. It compiles under VS2005 and contains all the source code, so you can add your own ideas on top of the Academy's algorithm or optimize the existing algorithms.
ChineseWordSeg
- Chinese automatic word segmentation software using the maximum probability method; segmentation accuracy is above 70%.
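
Several entries in this list rely on probability-based segmentation. As a rough illustration of the maximum probability method named in the last entry, the sketch below picks the split whose product of unigram word probabilities is highest, using dynamic programming. It is not taken from any of the archives; the function name `max_prob_segment`, the word counts, the `max_word_len` limit, and the penalty for unknown characters are all assumptions made for the example.

```python
import math


def max_prob_segment(text, freq, max_word_len=4):
    """Return the segmentation with the highest unigram probability."""
    total = sum(freq.values())
    # Assumed smoothing: an unknown single character is much rarer than any known word.
    unknown_logp = math.log(1.0 / (total * 10))
    n = len(text)
    best = [float("-inf")] * (n + 1)  # best[i]: best log-probability of text[:i]
    back = [0] * (n + 1)              # back[i]: start index of the last word in that split
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            if word in freq:
                logp = math.log(freq[word] / total)
            elif end - start == 1:
                logp = unknown_logp  # keep every position reachable
            else:
                continue
            score = best[start] + logp
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Walk the back-pointers from the end of the text to recover the words.
    tokens = []
    end = n
    while end > 0:
        start = back[end]
        tokens.append(text[start:end])
        end = start
    return tokens[::-1]


# Hypothetical word counts, for illustration only.
toy_freq = {"研究": 20, "研究生": 5, "生命": 15, "命": 1, "起源": 10}
print(max_prob_segment("研究生命起源", toy_freq))
```

With these counts the split `研究 / 生命 / 起源` scores higher than `研究生 / 命 / 起源`, which is the case a greedy longest-match segmenter gets wrong.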