Resource search results
WordSeg
- Chinese sentence segmentation using the maximum matching method. Maximum matching is the most commonly used word segmentation algorithm; it is simple and practical, and its accuracy can exceed 80%.
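As an illustration of the maximum matching method named in this entry, here is a minimal sketch of the forward variant. It is not the WordSeg code: the function name `forward_max_match`, the toy dictionary, and the choice of the forward direction are assumptions made for the example.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Greedy forward maximum matching over a set of known words."""
    if max_len is None:
        # Longest dictionary entry bounds the window (at least 1 character).
        max_len = max(1, max((len(w) for w in dictionary), default=1))
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a word matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy dictionary, for illustration only.
toy_dict = {"研究", "研究生", "生命", "起源"}
print(forward_max_match("研究生命起源", toy_dict))
```

With this toy dictionary the greedy choice picks "研究生" first and leaves "命" stranded, which is the kind of ambiguity error that keeps maximum matching below perfect accuracy.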
ProbWordSeg
- Maximum probability word segmentation. This algorithm resolves ambiguity in Chinese word segmentation fairly well, but it is less efficient than maximum matching segmentation.
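One common way to realize maximum probability segmentation is dynamic programming over a unigram word model. The sketch below follows that approach; it is not the ProbWordSeg code. The function name `max_prob_segment`, the probability table, and the fallback probability for unknown single characters are all assumptions.

```python
import math


def max_prob_segment(text, word_probs, max_len=4, unknown_prob=1e-8):
    """Pick the segmentation whose product of word probabilities is largest."""
    n = len(text)
    # best[i] = (best log-probability of text[:i], start index of its last word)
    best = [(-math.inf, 0)] * (n + 1)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in word_probs:
                p = word_probs[word]
            elif len(word) == 1:
                p = unknown_prob  # assumed fallback for unseen single characters
            else:
                continue
            score = best[start][0] + math.log(p)
            if score > best[end][0]:
                best[end] = (score, start)
    # Follow the back-pointers from the end of the text to recover the words.
    tokens = []
    end = n
    while end > 0:
        start = best[end][1]
        tokens.append(text[start:end])
        end = start
    return tokens[::-1]


# Hypothetical unigram probabilities, for illustration only.
toy_probs = {"研究": 0.02, "研究生": 0.005, "生命": 0.01, "起源": 0.008}
print(max_prob_segment("研究生命起源", toy_probs))
```

Every position considers up to `max_len` candidate words, so the sketch does more work per character than a greedy matcher, which is consistent with the efficiency trade-off the entry mentions.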
CRFPP0[1].53
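- Conditional random fields (CRF), mainly used for sequence labeling; can be applied to word segmentation, part-of-speech tagging, syntactic parsing, text extraction, and similar tasks.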
FreeICTCLAS
- ICTCLAS from the Institute of Automation, Chinese Academy of Sciences, written in C++. Used for Chinese text word segmentation.
RostNat
- A very good corpus analysis tool with word segmentation, analysis, and more. Most importantly, it provides TF/IDF analysis results. Very practical.
ycsfwordseg
- A paper on word segmentation based on genetic algorithms.
Bayes_1
- First, classify the txt files in CATEGORY; second, split the English text in multiple txt files into words; finally, classify using Bayes' formula.
code
- Covers key techniques such as blacklists, text classification algorithms, word segmentation of SMS content, and feature vector selection.
vb
- Connects to a database, segments words, removes stop words, and computes weight values.
CLucene
- CLucene source code, with the uploader's own forward maximum matching word segmentation program added.
dict
- A preprocessed Chinese word segmentation dictionary; you may need it in your CWS (Chinese word segmentation) program.
YH_zhizhu_1.0
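- Junzhang Search (军长搜索) is a vertical search engine built on Microsoft .NET 2.0. The system has strong file and database indexing capabilities and supports Chinese and English word segmentation, file similarity analysis and ranking, real-time monitoring and updating of file data, very fast indexing, millisecond-level search, and highlighted search results. It consists of two parts: a C/S search spider and a B/S web interface for user search and result display. Its overall workflow closely imitates how large search engines work. The system supports indexing both within a site and across the whole web. Intended uses: industry vertical search engines, site search for large news portals, and large industry portals.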
ICTCLASV1.2
- A word segmentation tool from the Institute of Computing Technology, Chinese Academy of Sciences; performs word segmentation.
sample
- Chinese word segmentation. Chinese lexical analysis is the foundation of, and the key to, Chinese information processing.
segChnWord
- A Chinese word segmentation evaluation system that assesses segmentation quality and reports accuracy and related metrics.
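The entry does not say which metrics segChnWord computes. A common way to score a segmenter is to compare word spans against a gold-standard segmentation and report precision, recall, and F1; the sketch below assumes that scheme, and the function names `to_spans` and `evaluate` are hypothetical.

```python
def to_spans(tokens):
    """Convert a token list into a set of (start, end) character spans."""
    spans, pos = set(), 0
    for tok in tokens:
        spans.add((pos, pos + len(tok)))
        pos += len(tok)
    return spans


def evaluate(gold_tokens, predicted_tokens):
    """Return (precision, recall, F1) of predicted word spans against gold."""
    gold, pred = to_spans(gold_tokens), to_spans(predicted_tokens)
    correct = len(gold & pred)  # words whose boundaries both match
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = ["研究", "生命", "起源"]
predicted = ["研究生", "命", "起源"]
print(evaluate(gold, predicted))
```

A word counts as correct only when both of its boundaries agree with the gold segmentation, so a single wrong cut usually costs two words.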
WebPages_WordSplitting
- Automatically extracts web page content (includes a simple HTTPAnalyzer class) and segments it into words using a dictionary.
WebPages_InvertedFile
- Generates an inverted file from the Chinese word segmentation results and writes the output to a text file.
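The idea of building an inverted file from segmentation output can be sketched as follows. This is not the WebPages_InvertedFile code: the function names, the document IDs, and the tab-separated output layout are assumptions chosen for the example.

```python
import os
import tempfile
from collections import defaultdict


def build_inverted_index(segmented_docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, tokens in segmented_docs.items():
        for tok in tokens:
            index[tok].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def write_inverted_file(index, path):
    """Write one line per term: the term, a tab, then space-separated doc IDs."""
    with open(path, "w", encoding="utf-8") as f:
        for term in sorted(index):
            ids = " ".join(str(d) for d in index[term])
            f.write(term + "\t" + ids + "\n")


# Hypothetical, already-segmented documents keyed by document ID.
docs = {1: ["中文", "分词"], 2: ["分词", "评测"]}
index = build_inverted_index(docs)

out_path = os.path.join(tempfile.mkdtemp(), "inverted.txt")
write_inverted_file(index, out_path)
print(index)
```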
fencisuanfa
- Sentence word segmentation using forward maximum matching, a dictionary-based segmentation algorithm. The algorithm is fast and accurate.
liaotianfenci
- A segmentation algorithm based on the GB2312 Chinese character encoding standard. It splits text into individual Chinese characters and can recognize English, spaces, Chinese and English punctuation, digits, and so on. Also known as atomic segmentation.
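A rough sketch of atomic segmentation is shown below. It differs from the entry in one stated respect: it works on Unicode strings rather than on GB2312 byte pairs. Grouping runs of ASCII letters, digits, and whitespace into single atoms is also an assumption, since the entry only says these are recognized.

```python
import re

# Runs of ASCII letters, runs of ASCII digits, and runs of whitespace are kept
# together; every other character (Chinese characters, punctuation) is its own atom.
ATOM = re.compile(r"[A-Za-z]+|[0-9]+|\s+|.", re.DOTALL)


def atomic_segment(text):
    """Split text into atoms in their original order."""
    return ATOM.findall(text)


print(atomic_segment("用GB2312编码,ok!"))
```

Because the last alternative matches any single character, joining the atoms always reproduces the input text.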
Chinese-text-categorization-Study
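- This paper presents a comparative experimental study of Bayes, KNN, and SVM applied to Chinese text classification. ICTCLAS is used to segment the Chinese documents. For high-dimensional, large-data settings, TFIDF is used for feature selection and, at the same time, for weighting the feature terms, so that every text in the corpus has a uniform, processable structural model. The three classification algorithms are then used to train on and classify the weighted data.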