Search results: resource list
luceneStudy
- A small Lucene search engine example that runs as-is, with the required JAR files included. Word segmentation uses Lucene's built-in tokenizer.
search_engine
- Several assignments from a search engine course. The first implements a word segmentation algorithm based on the forward maximum matching principle, the second an inverted index, the third a forward index, and the fourth the indexing functionality. The implementations are fairly simple, but all of them run correctly.
ansj_seg-master
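- A Java implementation of ICTCLAS. Essentially all of the data structures and algorithms have been rewritten. The dictionary is the one provided with the open-source edition of ICTCLAS, with some manual optimization. In-memory Chinese word segmentation runs at roughly 1 million characters per second (already faster than ICTCLAS); segmentation while reading from files runs at roughly 300,000 characters per second. Accuracy can reach 96% or higher.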
WordSequence
- Chinese word segmentation using forward maximum matching, with computation of its precision (P), recall (R), and F-measure.
mmsegger_src_1.0
- A forward maximum matching word segmentation algorithm, for word segmentation and natural language processing.
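Forward maximum matching, which several entries in this list name as their method, can be sketched as follows. This is a minimal illustration and not code from any of the listed packages; the class name `ForwardMaxMatch`, the `segment` method, its `maxWordLen` parameter, and the tiny dictionary in the usage note are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper: greedy, dictionary-based forward maximum matching.
final class ForwardMaxMatch {
    // Scans left to right; at each position takes the longest dictionary word
    // of at most maxWordLen chars, falling back to a single char if none matches.
    public static List<String> segment(String text, Set<String> dict, int maxWordLen) {
        int window = Math.max(1, maxWordLen); // guard against a non-positive window
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + window);
            // Shrink the candidate from the right until it is a dictionary word
            // or only one char is left.
            while (end > i + 1 && !dict.contains(text.substring(i, end))) {
                end--;
            }
            tokens.add(text.substring(i, end));
            i = end;
        }
        return tokens;
    }
}
```

With the dictionary {研究, 研究生, 生命, 起源} and a window of 3, `ForwardMaxMatch.segment("研究生命起源", dict, 3)` returns [研究生, 命, 起源]. The longest match at each position wins, even where a shorter first word (研究) would have given a better overall split.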
HLSeg_JAVA_Example
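- Chinese word segmentation with control over output granularity: it can output normal-granularity words as well as the fine-grained words used for retrieval, together with each word's sentence number, paragraph number, word number, part of speech, and similar information. On output granularity, we believe that different applications need different segmentation granularity. For example, automatic classification and keyword extraction need coarser granularity than search, because coarser units represent the semantic features of a text better, whereas retrieval has a recall requirement and therefore needs finer segmentation units, otherwise matches are missed. The 海量 (Hailiang) system currently provides two granularity rule sets. The default is the coarse-grained interface, used mainly in automatic classification, information mining, machine translation, speech synthesis, artificial intelligence, and other fields, ...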
test
- Computes sentence similarity, split into word-form similarity and word-order similarity. Word segmentation uses the segmentation tool provided by the Chinese Academy of Sciences.
WordSegment
- Automatic Chinese word segmentation, developed in the MyEclipse programming environment.
CutWords
- A word segmentation program implemented with forward maximum matching; commented, clear, and easy to understand.
ICTCLAS_Demo
- A program for SMS filtering analysis. It first segments the input messages with the ICTCLAS word segmentation system, then uses a Bayesian algorithm to analyze and train the model, and finally predicts which messages in the test set are spam. Because SMS messages involve personal privacy, add your own training and test data sets before use.
MapTest
- Inverted index. This program performs Chinese word segmentation with the ICTCLAS segmentation tool, builds an inverted index, and writes it to a specified file.
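The inverted index step in the entry above can be sketched as below: each term maps to the sorted set of documents that contain it. This is an illustration only, not the listed program. The class `InvertedIndex` and its methods are hypothetical, the input is assumed to be already segmented (the ICTCLAS step is not reproduced), and `toLines` only prepares a tab-separated text form that a caller could write to a file.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical in-memory inverted index over pre-segmented documents.
final class InvertedIndex {
    // term -> sorted ids of the documents containing that term
    private final Map<String, TreeSet<Integer>> postings = new TreeMap<>();

    // Registers every token of one document under the given document id.
    public void add(int docId, List<String> tokens) {
        for (String term : tokens) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Returns the ids of documents containing the term (empty if none).
    public SortedSet<Integer> lookup(String term) {
        TreeSet<Integer> ids = postings.get(term);
        if (ids == null) {
            return new TreeSet<Integer>();
        }
        return Collections.unmodifiableSortedSet(ids);
    }

    // One line per term: "term<TAB>id1,id2,...", in term order.
    public List<String> toLines() {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, TreeSet<Integer>> e : postings.entrySet()) {
            String ids = e.getValue().stream()
                    .map(String::valueOf)
                    .collect(Collectors.joining(","));
            lines.add(e.getKey() + "\t" + ids);
        }
        return lines;
    }
}
```

The lines returned by `toLines` could be passed to `java.nio.file.Files.write` to produce the output file; the file format here is an assumption, since the entry does not describe one.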
Segment
- Word segmentation implemented in Java; it splits a sentence into words according to Chinese usage.
FenciEvaluater
- Statistics for segmentation recall (R) and precision (P): computes the recall and precision values from a segmentation result and the reference result.
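One common way to compute these figures is to compare words by their character spans, so that a word counts as correct only if both of its boundaries agree with the reference. The sketch below follows that convention; whether the listed tool counts matches the same way is not stated, so treat this as an assumption. The class `SegEval` and its methods are hypothetical names.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical evaluator for word segmentation output.
final class SegEval {
    // Converts a token list into "start-end" character spans, so identical
    // words at different positions are kept apart.
    static Set<String> spans(List<String> tokens) {
        Set<String> result = new HashSet<>();
        int pos = 0;
        for (String t : tokens) {
            result.add(pos + "-" + (pos + t.length()));
            pos += t.length();
        }
        return result;
    }

    // Returns {precision, recall, F1} of predicted against the reference.
    public static double[] evaluate(List<String> predicted, List<String> reference) {
        Set<String> p = spans(predicted);
        Set<String> r = spans(reference);
        Set<String> correct = new HashSet<>(p);
        correct.retainAll(r); // spans present in both segmentations
        double precision = p.isEmpty() ? 0.0 : (double) correct.size() / p.size();
        double recall = r.isEmpty() ? 0.0 : (double) correct.size() / r.size();
        double f1 = (precision + recall == 0.0)
                ? 0.0
                : 2 * precision * recall / (precision + recall);
        return new double[] {precision, recall, f1};
    }
}
```

For the predicted split [研究生, 命, 起源] against the reference [研究, 生命, 起源], only 起源 has matching boundaries, so precision, recall, and F1 are each 1/3.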
CnFenci(0)
- Dictionary-based word segmentation that combines forward maximum matching with reverse minimum matching.
CWSSFenci
- Dictionary-based word segmentation in Java. The dictionary is stored in a hash table, and the segmenter is integrated with Lucene's token stream interface, so it can be used in Lucene.
WordRecoverTool
- A word segmentation algorithm based on a binary-tree database model; works well.
1569407281
- Image-guided surgery is another important application of segmentation. Recent advances in technology have made it possible to acquire images of the patient while the surgery is taking place. The goal is then to segment relevant regions of int
chinese-analyzer
- The imdict chinese analyzer word segmentation program, a reimplementation of the Chinese Academy of Sciences' ICTCLAS, bundled with the Lucene segmentation JAR; a complete program.
1127
- Uses the Chinese Academy of Sciences word segmentation module to segment text and saves the results to disk; widely applicable in natural language processing.
Java
- Java source code that performs word segmentation, removes stop words, and counts word frequencies.
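The stop-word removal and word-frequency steps named in the last entry can be sketched as follows, assuming the text has already been segmented into tokens. The class `TermFrequency` and its `count` method are hypothetical names chosen for this example, not taken from the listed source code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: counts token frequencies, skipping stop words.
final class TermFrequency {
    public static Map<String, Integer> count(List<String> tokens, Set<String> stopWords) {
        Map<String, Integer> freq = new HashMap<>();
        for (String t : tokens) {
            if (stopWords.contains(t)) {
                continue; // drop stop words before counting
            }
            freq.merge(t, 1, Integer::sum); // insert 1 or add 1 to the existing count
        }
        return freq;
    }
}
```

For the tokens [的, 分词, 的, 算法, 分词] with the stop-word set {的}, `count` returns a map with 分词 at 2 and 算法 at 1.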