Search resource list
THULAC_lite_java_v1
- Chinese text segmentation: word-frequency counting, word segmentation, and stop-word removal. Supports UTF-8 encoding only.
NLPLibSVM
- Java version of the libsvm word-segmentation training set, including libsvm.jar and training-set samples.
ICTCLAS
- ICTCLAS, the Chinese word segmentation system from the Chinese Academy of Sciences. Import the project directly into Eclipse and it is ready to use; tested and works well.
divide
- Chinese word segmentation using the forward maximum matching algorithm, implemented in MATLAB 2013.
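The forward maximum matching (FMM) idea behind this package can be sketched briefly. The original is MATLAB code; the Python sketch below, with a toy dictionary (not the one shipped with the package), shows the greedy longest-match-first scan:

```python
# Forward maximum matching (FMM) segmentation -- a minimal sketch.
# The dictionary and max word length here are toy assumptions.
def fmm_segment(text, dictionary, max_len=4):
    """Greedily match the longest dictionary word at each position."""
    result = []
    i = 0
    while i < len(text):
        matched = None
        # Try the longest candidate first, shrinking one character at a time.
        for j in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + j]
            # Single characters always match (out-of-vocabulary fallback).
            if j == 1 or cand in dictionary:
                matched = cand
                break
        result.append(matched)
        i += len(matched)
    return result

words = {"中文", "分词", "算法", "中文分词"}
print(fmm_segment("中文分词算法", words))  # ['中文分词', '算法']
```

Because the scan is greedy left-to-right, FMM is fast but cannot resolve ambiguities that a probabilistic segmenter would catch.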
ICTCLAS_api
- Performs word segmentation on a given text, splitting according to part of speech.
kmeansClassifier
- Implements k-means classification, using IK Analyzer for word segmentation.
fenciledebeiyesi
- Chinese text segmentation system plus Bayesian text-classification source code, implemented in MATLAB.
NLP-speech-tagging
- Chinese word segmentation, part-of-speech tagging, and named entity recognition based on hidden Markov models.
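The HMM approach to segmentation tags each character with a word-position label (B/M/E/S: begin, middle, end, single) and decodes with Viterbi. A minimal Python sketch of the decoding step is below; all probabilities are illustrative placeholders, not a trained model from this package:

```python
import math

# Viterbi decoding over BMES character tags -- the decoding core of an
# HMM word segmenter. All probabilities are toy values, not trained ones.
STATES = "BMES"  # Begin, Middle, End, Single
LOW = -1e9       # stand-in for log(0): an impossible transition

start = {"B": math.log(0.6), "M": LOW, "E": LOW, "S": math.log(0.4)}
trans = {  # log P(next_tag | tag); e.g. B can only be followed by M or E
    "B": {"B": LOW, "M": math.log(0.3), "E": math.log(0.7), "S": LOW},
    "M": {"B": LOW, "M": math.log(0.3), "E": math.log(0.7), "S": LOW},
    "E": {"B": math.log(0.6), "M": LOW, "E": LOW, "S": math.log(0.4)},
    "S": {"B": math.log(0.6), "M": LOW, "E": LOW, "S": math.log(0.4)},
}

def viterbi(chars, emit):
    """emit(tag, char) -> log emission probability."""
    score = {t: start[t] + emit(t, chars[0]) for t in STATES}
    back = []
    for ch in chars[1:]:
        prev, score, ptr = score, {}, {}
        for t in STATES:
            best_p = max(STATES, key=lambda p: prev[p] + trans[p][t])
            score[t] = prev[best_p] + trans[best_p][t] + emit(t, ch)
            ptr[t] = best_p
        back.append(ptr)
    tag = max(STATES, key=score.get)
    tags = [tag]
    for ptr in reversed(back):
        tag = ptr[tag]
        tags.append(tag)
    return "".join(reversed(tags))

# With uniform emissions the decode depends only on the transitions:
print(viterbi("北京大学", lambda t, c: 0.0))  # BEBE -> 北京 / 大学
```

A real segmenter estimates the start, transition, and emission log-probabilities from a tagged corpus; the tag sequence is then cut into words at every E and S.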
Sogou-character-porfile
- Describes the process of person-profile tagging, covering data collection, word segmentation, preprocessing, algorithm selection, and presentation of results.
2_simplifyweibo
- A sentiment corpus of 200,000 entries, already word-segmented.
jieba for Python
- How to use jieba word segmentation in Python.
jieba
- Splits sentences into small independent words to extract information; matching against a data dictionary yields the useful key information, enabling intelligent question filtering and answering.
3130383
- Maximum-probability word segmentation. This algorithm resolves ambiguity in Chinese segmentation better than maximum matching, but segments less efficiently.
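Maximum-probability segmentation picks the split whose word probabilities multiply to the highest value, which is a dynamic program over prefix positions. A minimal Python sketch, with toy word frequencies (not the package's model), shows how it resolves the classic 研究生/命 vs 研究/生命 ambiguity:

```python
import math

# Maximum-probability segmentation: DP over prefixes, maximising the sum
# of word log-probabilities. Frequencies below are toy values.
FREQS = {"研究": 5, "生命": 4, "研究生": 3, "命": 1, "生": 2, "起源": 6, "的": 10}
TOTAL = sum(FREQS.values())

def best_segmentation(text, max_len=4):
    # best[i] = (log-prob of the best split of text[:i], last split point)
    best = [(-math.inf, 0)] * (len(text) + 1)
    best[0] = (0.0, 0)
    for i in range(1, len(text) + 1):
        for j in range(max(0, i - max_len), i):
            word = text[j:i]
            count = FREQS.get(word)
            if count is None and i - j > 1:
                continue  # unknown multi-character strings are not words
            logp = math.log((count or 0.5) / TOTAL)  # smooth unseen chars
            score = best[j][0] + logp
            if score > best[i][0]:
                best[i] = (score, j)
    # Recover the segmentation by walking the split points backwards.
    out, i = [], len(text)
    while i > 0:
        j = best[i][1]
        out.append(text[j:i])
        i = j
    return out[::-1]

print(best_segmentation("研究生命起源"))  # ['研究', '生命', '起源']
```

Maximum matching would greedily take 研究生 first and be forced into 研究生/命/起源; the probability model prefers 研究/生命 because that split is more likely overall.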
9178839
- Chinese word segmentation algorithms, including maximum matching and probability-based segmentation.
kctp
- Data preprocessing code, including word segmentation, symbol removal, and stop-word removal.
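A preprocessing pipeline of this kind can be sketched in a few lines of Python. The stop-word list and the whitespace tokenisation below are toy stand-ins (a real pipeline would plug in a proper segmenter), but the clean-then-filter structure is the same:

```python
import re

# Toy preprocessing pipeline: strip symbols, tokenise, drop stop words.
# STOPWORDS is an illustrative list; whitespace split stands in for a
# real word segmenter on pre-segmented text.
STOPWORDS = {"的", "了", "是", "and", "the"}

def preprocess(text):
    # Remove punctuation/symbols, keeping CJK characters, letters, digits.
    cleaned = re.sub(r"[^\w\u4e00-\u9fff]+", " ", text)
    tokens = cleaned.split()
    return [t for t in tokens if t not in STOPWORDS]

print(preprocess("机器 学习 的 方法, 很 有效!"))
# ['机器', '学习', '方法', '很', '有效']
```

Running stop-word removal after symbol stripping keeps the filter simple, since punctuation never reaches the token list.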
textclustering-master
- Clustering for large texts. The method ignores word-frequency information and instead uses context: every character is given word-position features learned from predefined features, yielding a trained model. Each character of the string to be segmented is then tagged with its word position, and the final segmentation is derived from the word-position definitions.
Alice
- ALICE chatbot with Chinese support; Chinese word segmentation uses the MMSeg algorithm (requires mmseg4j.jar).
DeepLearning
- Deep-learning word segmentation using an RNN, with configurable parameters.
文本分类_监管处罚Rcode
- NLP word segmentation; this code segments Chinese keywords to implement information classification.
text_classification_AI100-master
- LSTM sentiment analysis for Chinese, using jieba segmentation and an LSTM model; environment: Python 3.0.