Search Resource List
TDHCursorFactory
- An open-source text mining program written in Perl. It covers many text mining techniques, such as text clustering, word segmentation, indexing, search engines, and dictionaries.
CutwordShort
- A word segmentation program for use in search engines; it produces fairly good segmentation results, running at roughly 500k words/s (on a laptop).
webSearch
- Core code of the Xunlong Chinese Web search engine. Runtime environment: Microsoft .NET Framework 2.0; developed in C#. Note the required step: run nSearch\xOcx\install.bat to load the word segmentation component.
phpsojiqidll
- Some self-written PHP extension (EXT) DLLs, including the Sojiqi word segmentation extension, an imagick dynamic image processing class, and the Sojiqi core sorting algorithm DLL (www.sojiqi.com).
DictSeg
- A solid word segmentation component for Lucene; it works well and is now at version 1.4.
chinafenci
- Chinese word segmentation: reads a txt document and then classifies the resulting words.
SphinxV0.9.8.1source
- SphinxV0.9.8.1source.zip, VC++. An open-source search engine based on Lucene extensions that supports Chinese word segmentation, for Chinese-language users.
luceneCH2
- Chapter 2 of "Developing Your Own Search Engine", personally tested and working, plus a test program for a tokenizer.
Solution1
- Uses Lucene.Net and the Pangu word segmentation algorithm: web pages fetched by a targeted spider are indexed, and searches are then answered from the index library on the server.
N-gram
- An N-gram Chinese word segmentation system: it segments the text with forward and reverse maximum matching (FMM and RMM), computes probabilities, and then selects the best segmentation.
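The FMM/RMM segmentation step this entry describes can be sketched as follows. This is an illustrative sketch, not code from the package: the dictionary and example sentence are made up, and the probability-scoring step that would choose between the two candidate segmentations is omitted.

```python
# Forward and reverse maximum matching (FMM/RMM) sketch.
# DICT is an illustrative toy dictionary, not from any listed package.
DICT = {"研究", "研究生", "生命", "的", "起源"}
MAX_LEN = max(len(w) for w in DICT)

def fmm(text):
    """Forward maximum matching: greedily take the longest dictionary
    word starting at the current position; fall back to one character."""
    out, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + MAX_LEN), i, -1):
            if text[i:j] in DICT or j == i + 1:
                out.append(text[i:j])
                i = j
                break
    return out

def rmm(text):
    """Reverse maximum matching: the same idea, scanning from the end
    and taking the longest dictionary word that ends at the position."""
    out, j = [], len(text)
    while j > 0:
        for i in range(max(0, j - MAX_LEN), j):
            if text[i:j] in DICT or i == j - 1:
                out.insert(0, text[i:j])
                j = i
                break
    return out

# The classic ambiguous sentence: FMM and RMM disagree, and a
# probability model would then pick the better candidate.
print(fmm("研究生命的起源"))  # → ['研究生', '命', '的', '起源']
print(rmm("研究生命的起源"))  # → ['研究', '生命', '的', '起源']
```

Running both directions and comparing the candidates is exactly why a probability score is needed: when FMM and RMM disagree, the score breaks the tie.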
fenci
- A word segmentation algorithm written in C++; download it if you need it.
baidu
- An analysis of the Baidu word segmentation algorithm, with data analysis related to Baidu search.
totsearch
- Taote on-site search engine (C# edition), based on the Lucene.Net core. An efficient Chinese word segmentation algorithm analyzes the database content, indexes it, and saves the index to disk. Front-end searches are answered by reading the index files, avoiding the performance problems of traditional database queries under high concurrency and massive data. Since the front end no longer connects to the database, this provides a fast query solution for users who do not want to expose the database to the front end.
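The architecture this entry describes, indexing database content once and then serving queries from the index instead of the database, can be sketched with a minimal inverted index. The sample records and the AND-only query semantics below are illustrative assumptions, not the package's actual design:

```python
from collections import defaultdict

# Minimal inverted-index sketch: build the index once from "database"
# records, then answer queries without touching the records again.
records = {
    1: "lucene based search engine",
    2: "chinese word segmentation for search",
    3: "database query performance",
}

# term -> set of doc ids containing that term
index = defaultdict(set)
for doc_id, text in records.items():
    for term in text.split():
        index[term].add(doc_id)

def search(query):
    """Return the ids of documents containing every query term
    (AND semantics), by intersecting the terms' posting sets."""
    terms = query.split()
    if not terms:
        return set()
    result = index[terms[0]].copy()
    for t in terms[1:]:
        result &= index[t]
    return result

print(sorted(search("search engine")))  # → [1]
```

A real deployment (Lucene.Net included) additionally persists the index to disk, ranks results, and runs the query terms through the same segmenter used at index time, but the core trade-off is the one shown: query cost depends on posting-list sizes, not on the database.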
HZ_Freq
- A Java Chinese word segmentation system, offered for learning purposes. Best of luck on the road to success!
dotNetSegDemo
- Search engine word segmentation with Lucene.Net support (rar archive).
chinese_lucene
- A C# Chinese word segmentation class library, usable for search engine segmentation, with roughly 90% accuracy; supports Lucene.Net (rar archive).
fenci
- Helps us implement Chinese word segmentation; the program is rather rough, please bear with it.
windows_JNI_32
- Word segmentation software from the Chinese Academy of Sciences; quite comprehensive, with usage instructions and corresponding demos. Used for preprocessing in Chinese NLP. Very good!
IKAnalyzer3.2.5Stable_src
- IK Analyzer, a Chinese word segmentation tool.
TokenizerTest2011
- A C# word segmentation demo program (TokenizerTest2011.rar).