Resource search results
pythonChinesecut
- A Chinese word segmentation program downloaded from the web. I have not only commented the code in detail, but also introduced some of the ideas behind Chinese word segmentation, which makes it easier to build on and modify. First published on www.pudn.com on September 3, 2007!
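The download itself is not shown here, but one classic segmentation idea it may touch on is dictionary-based forward maximum matching: at each position, greedily take the longest dictionary word. A minimal sketch, with a toy dictionary of my own (an assumption, not the program's actual word list):

```python
def fmm_segment(text, dictionary, max_len=4):
    """Forward maximum matching: at each position take the longest
    dictionary word that matches, falling back to a single character."""
    result = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            word = text[i:i + length]
            if length == 1 or word in dictionary:
                result.append(word)
                i += length
                break
    return result

words = {"中文", "切词", "处理"}  # toy dictionary for illustration
print(fmm_segment("中文切词处理", words))
```

The greedy fallback to a single character guarantees the loop always advances, at the cost of mis-segmenting out-of-vocabulary words.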
nlp
- Chinese natural language processing programs, including Chinese character frequency statistics and a Jensen-Shannon Divergence calculator, with classical-text examples included.
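For reference, Jensen-Shannon divergence is the symmetrized, bounded cousin of KL divergence: each distribution is compared against their mixture. A minimal sketch over aligned probability lists (not the archive's actual code):

```python
import math

def kl(p, q):
    # Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    # Jensen-Shannon divergence: average KL of each input to the mixture m
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

print(jsd([0.5, 0.5], [0.5, 0.5]))  # → 0.0 for identical distributions
```

With base-2 logs the value is bounded by 1, which makes it convenient for comparing character-frequency profiles of different texts.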
arabic-english-dictionary
- An Arabic-English dictionary program, including the dictionary definition files.
ngrams
- Natural language processing programs for word segmentation and word frequency statistics.
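The frequency-statistics half of that pipeline is a one-liner with the standard library once segmentation is done; the pre-segmented tokens below are illustrative, not from this archive:

```python
from collections import Counter

def word_freq(tokens):
    """Count token frequencies over an already-segmented text."""
    return Counter(tokens)

freq = word_freq("自然 语言 处理 语言".split())
print(freq.most_common(2))
```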
iniparse-0.4.tar
- A nice Python library that extends the functionality of the builtin module. It can read and write (manage) INI configuration files!
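iniparse's own API is not shown here; as an assumed baseline, this is the read-modify-write round-trip with the builtin `configparser` module that it extends (iniparse's advertised advantage is preserving comments and formatting on rewrite):

```python
import configparser
import io

# Parse an INI document, change a value, and serialize it back.
config = configparser.ConfigParser()
config.read_string("[server]\nhost = localhost\nport = 8080\n")
config["server"]["port"] = "9090"

buf = io.StringIO()
config.write(buf)
print(buf.getvalue())
```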
dealwith
- Post-processing for already-segmented text, including removing stop words, selecting words with particular part-of-speech tags, and stripping the POS tags.
smallseg_0.6.tar
- Source code of a simple Chinese word segmentation system, implementing language-model-based segmentation logic.
convertfilename
- Filenames from archives compressed under a different locale usually come out garbled after extraction; this script converts the garbled filenames back.
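A common instance of this problem: a zip built on a Chinese-locale Windows system stores GBK filenames, but the extractor decodes them as cp437. The repair is to undo the wrong decode and redo the right one. The codec pair below is an assumption about that scenario, not necessarily what this script uses:

```python
def fix_name(garbled, wrong_codec="cp437", right_codec="gbk"):
    """Undo a wrong decode: re-encode with the codec the extractor
    assumed, then decode with the codec the archive actually used."""
    return garbled.encode(wrong_codec).decode(right_codec)

original = "中文文档.txt"
garbled = original.encode("gbk").decode("cp437")  # simulate the mojibake
print(fix_name(garbled))  # → 中文文档.txt
```

cp437 maps every byte value to a character, which is exactly why the wrong decode "succeeds" silently and why the round-trip back is lossless.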
SV
- An IBM Model 1 expectation-maximization algorithm, which takes two texts in different languages and outputs the word alignment as a table, as well as the Viterbi alignment.
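The core of Model 1 training is a short EM loop: the E-step distributes each target word's count over the source words in proportion to the current translation probabilities, and the M-step renormalizes. A compact sketch with a toy parallel corpus (the sentence pairs and variable names are illustrative, not this download's data):

```python
from collections import defaultdict

def ibm_model1(pairs, iterations=10):
    """EM for IBM Model 1 translation probabilities t(f | e).
    `pairs` is a list of (source_tokens, target_tokens) sentence pairs."""
    t = defaultdict(lambda: 1.0)  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)  # expected co-occurrence counts
        total = defaultdict(float)  # normalizer per source word e
        for es, fs in pairs:
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / z  # fractional count for this link
                    count[(f, e)] += c
                    total[e] += c
        for (f, e) in count:
            t[(f, e)] = count[(f, e)] / total[e]  # M-step
    return t

pairs = [("the house".split(), "das haus".split()),
         ("the book".split(), "das buch".split())]
t = ibm_model1(pairs)
print(t[("das", "the")], t[("haus", "the")])
```

Because "das" co-occurs with "the" in both pairs while "haus" does so only once, EM concentrates t("das" | "the") well above t("haus" | "the"); the Viterbi alignment then just takes the argmax source word for each target word.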
Milkshake-0.1.1-src
- A to-do list application in Python.
BytePairEncoding
- Language processing in Python using byte-pair conversion (byte pair encoding). A good one.
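The BPE idea is simple: repeatedly find the most frequent adjacent symbol pair in the corpus vocabulary and fuse it into a new symbol. A minimal sketch of merge learning (the toy word list is my own, not from this package):

```python
from collections import Counter

def bpe_merges(words, num_merges):
    """Learn byte-pair-encoding merges: repeatedly fuse the most
    frequent adjacent symbol pair across the vocabulary."""
    vocab = Counter(tuple(w) for w in words)  # each word as a symbol tuple
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])  # fuse the pair
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges

print(bpe_merges(["low", "low", "lower", "lowest"], 2))
```

Applying the learned merges in order to a new word reproduces the same subword segmentation, which is why BPE became a standard vocabulary scheme for neural NLP.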
envec.py
- Identifies Chinese words, computes statistics over them, and prints the count of each Chinese word.
randomGen.py.tar
- First trains a trigram model on existing text, then uses the model to randomly generate an n-word text.
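That train-then-sample loop is easy to sketch: map each bigram context to the words that followed it, then random-walk from a seed bigram. The corpus and seed below are illustrative, not from this download:

```python
import random
from collections import defaultdict

def train_trigrams(tokens):
    """Map each (w1, w2) context to the list of words that followed it."""
    model = defaultdict(list)
    for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
        model[(a, b)].append(c)
    return model

def generate(model, seed, n, rng=random):
    """Random walk from a seed bigram, sampling successors up to n words."""
    out = list(seed)
    for _ in range(n - 2):
        successors = model.get((out[-2], out[-1]))
        if not successors:
            break  # dead end: context never continued in training text
        out.append(rng.choice(successors))
    return out

corpus = "the cat sat on the mat and the cat ran".split()
model = train_trigrams(corpus)
print(" ".join(generate(model, ("the", "cat"), 6)))
```

Storing successors as a list (with duplicates) makes `rng.choice` sample in proportion to the observed trigram frequencies for free.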
numdect.py
- Recognizes Chinese-character months and converts them to numeric months; handles the special case of months containing the character 十 (ten).
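The 十 special case is the whole difficulty: 十 alone is 10, and 十X is 10 plus a digit. A minimal sketch of one way to handle it (not necessarily this script's logic):

```python
DIGITS = {"一": 1, "二": 2, "三": 3, "四": 4, "五": 5,
          "六": 6, "七": 7, "八": 8, "九": 9}

def month_to_number(s):
    """Convert a Chinese month like '三月' or '十二月' to an int.
    '十' alone is 10; '十X' is 10 + the digit X."""
    s = s.rstrip("月")
    if s == "十":
        return 10
    if s.startswith("十"):
        return 10 + DIGITS[s[1]]
    return DIGITS[s]

print(month_to_number("十二月"))  # → 12
```

Months only run 1 to 12, so the general X十Y pattern of larger Chinese numerals never arises here.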
PJ
- Personal end-of-term Project on Chinese information processing; it can generate random lyrics in a consistent style.
translate
- A Google Translate client written in Python.
terminal
- A 5250 terminal connection.
jieba_plus
- Fixes several bugs in jieba word segmentation, including handling of full-width letters and digits; updates ongoing.
jieba
- The Python jieba ("结巴") word segmentation module; splits sentences into words.
jieba-0.39
- A very powerful Python package for Chinese word segmentation, used in natural language processing.