Search resource list
textclassification
- An SVM-based word segmentation system that can split a passage of text into separate words.
Tibetan-monosyllabc-dictionary
- A Tibetan monosyllabic-word dictionary, used frequently in Tibetan word-segmentation research; shared here.
LBChSeg
- A forward maximum matching Chinese word segmentation algorithm written in C++. It segments Chinese text by scanning from left to right and matching the longest dictionary word at each position.
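The forward-maximum-matching idea this entry describes can be sketched in a few lines (shown here in Python rather than the package's C++; the dictionary, the maximum word length of 4, and the sample sentence are illustrative assumptions, not taken from the package):

```python
def fmm_segment(text, dictionary, max_word_len=4):
    """Forward maximum matching: scan left to right, and at each
    position take the longest dictionary word starting there,
    falling back to a single character when nothing matches."""
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words

print(fmm_segment("中文分词算法", {"中文", "分词", "算法"}))
# → ['中文', '分词', '算法']
```

The greedy longest-match choice is what makes this "maximum" matching; a reverse (right-to-left) variant is a common alternative.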
sphinx-chinese-2.2.1
- Chinese word segmentation package for sphinx, the full-text search engine for PHP.
search
- Segmentation-based search implemented on the cygwin platform: the input text is segmented into words, then keyword retrieval is performed.
SearchEngine
- dySE is a small open-source Java search engine, divided into three modules: a crawler module, a preprocessing module, and a search module. It covers in detail the implementation of multithreaded page crawling, main-content extraction, text extraction, word segmentation, index building, snapshots, and more.
fenci
- Suitable for Chinese word segmentation; runs under VC++ 6.0.
java_libsvm
- libsvm + itics word segmentation; can be used for text classification and prediction.
paoding
- Analyzer code for the Paoding segmenter; it extends the Analyzer class and can be used directly.
ICTCLAS50_Windows_64_JNI
- The ICTCLAS Chinese word segmenter from the Chinese Academy of Sciences.
IKAnalyzer
- IKAnalyzer is an open-source, lightweight Chinese word segmentation toolkit written in Java. Since version 1.0 was released in December 2006, three major versions have been published. It began as a Chinese segmentation component built around the open-source Lucene project, combining dictionary-based segmentation with grammar-analysis algorithms. The newer IKAnalyzer 3.0 evolved into a general-purpose Java segmentation component, independent of the Lucene project while still providing a default optimized implementation for Lucene.
201411149222244
- Download any Chinese text document; this program segments the document into words and can also count how many times each word occurs.
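The counting step this entry mentions is a one-liner once segmentation has produced a token list; a minimal sketch with `collections.Counter` (the sample tokens are made up for illustration):

```python
from collections import Counter

def word_counts(words):
    # Tally how many times each segmented word occurs.
    return Counter(words)

counts = word_counts(["分词", "统计", "分词"])
print(counts.most_common(1))
# → [('分词', 2)]
```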
use_nlpir
- Fixes the slow loading caused by the overly large lexicon of the NLPIR-ICTCLAS2014 segmentation system.
Project2
- A word-segmentation experiment program: it reads a Chinese dictionary from a txt file and segments Chinese text according to the words in that dictionary.
syzlsearch_v2.5
- A site-search solution built on Lucene. It integrates fine-grained Chinese word segmentation designed specifically for site search, effectively balancing precision and recall. It supports seamlessly importing data from multiple databases into the index, simultaneous search across the whole site, news, blogs, and other content types, ranking by relevance or by time, and filtering search results by time.
Enhancedtextmining
- An enhanced text-mining pipeline, including word segmentation, classification and clustering, evaluation of segmentation results, etc.
InfoRetri
- Naive-Bayes-based text classification, including stopword removal, word segmentation, feature extraction, and classification.
1
- Detects the similarity of Chinese articles: first segment the articles into words, then extract features and compute the angle between the feature vectors to test whether the articles are similar.
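The "feature vector angle" this entry refers to is ordinarily cosine similarity over term-frequency vectors; a minimal sketch under that assumption (the helper name and sample tokens are hypothetical, not from the package):

```python
import math
from collections import Counter

def cosine_similarity(tokens_a, tokens_b):
    # Build term-frequency vectors for the two segmented documents and
    # return the cosine of the angle between them:
    # 1.0 = same direction (very similar), 0.0 = no shared terms.
    va, vb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

print(cosine_similarity(["中文", "分词"], ["中文", "检索"]))
# → 0.5
```

A real system would typically weight the counts with TF-IDF before comparing, but the angle computation is the same.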
HanLP-1.2.7
- HanLP is an open-source Java toolkit dedicated to bringing NLP techniques to production environments. It supports Chinese word segmentation (N-shortest-path segmentation, CRF segmentation, index segmentation, user-defined dictionaries, part-of-speech tagging), named entity recognition (Chinese person names, transliterated names, Japanese names, place names, organization names), keyword extraction, automatic summarization, phrase extraction, pinyin conversion, simplified/traditional Chinese conversion, text recommendation, and dependency parsing (MaxEnt and neural-network dependency parsers).
sentence_split
- A Chinese word segmentation algorithm whose output uses \ as the delimiter; requires a dictionary.