Resource search results
nlp
- Word frequency statistics for NLP: counts how often each word occurs in a corpus. Also includes a pinyin-to-character conversion system based on a hidden Markov model.
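The resource's own code is not shown. To illustrate the HMM part, here is a minimal Viterbi decoding sketch in Python for pinyin-to-character conversion. The candidate lists, start probabilities and bigram probabilities are hypothetical toy numbers. In practice they would be estimated from corpus frequency counts like those the first part produces. The emission probability P(pinyin | character) is assumed to be 1 for every listed candidate.

```python
import math

# Hypothetical toy model (not from the original resource).
# Emission P(pinyin | character) is taken as 1 for every listed candidate.
CANDIDATES = {"zhong": ["中", "钟"], "guo": ["国", "果"]}
START = {"中": 0.6, "钟": 0.4}                      # P(first character)
TRANS = {("中", "国"): 0.5, ("中", "果"): 0.1,      # P(next | previous)
         ("钟", "国"): 0.1, ("钟", "果"): 0.2}
FLOOR = 1e-8  # stand-in probability for unseen starts/bigrams


def viterbi(pinyin):
    """Return the most probable character string for a list of syllables."""
    # score[c]: best log-probability of any path ending in character c
    score = {c: math.log(START.get(c, FLOOR)) for c in CANDIDATES[pinyin[0]]}
    back = []  # back[i][c]: best predecessor of c at position i + 1
    for syllable in pinyin[1:]:
        new_score, pointers = {}, {}
        for c in CANDIDATES[syllable]:
            prev, best = max(
                ((p, s + math.log(TRANS.get((p, c), FLOOR))) for p, s in score.items()),
                key=lambda item: item[1],
            )
            new_score[c], pointers[c] = best, prev
        score = new_score
        back.append(pointers)
    # Start from the best final character and follow the back-pointers
    path = [max(score, key=score.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return "".join(reversed(path))


print(viterbi(["zhong", "guo"]))  # 中国
```

Working in log space avoids numeric underflow on long syllable sequences. FLOOR is an arbitrary smoothing value chosen for this sketch.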
lankasite2
- Lancaster Chinese corpus; useful for NLP, text processing and similar work.
ngramtool-20040527-mingw32-static
- Runs n-gram statistics over large-scale corpora on Windows and can remove redundant substrings.
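The tool itself is a Windows binary, so its internals are unknown. The Python sketch below illustrates the idea under one common interpretation of "redundant substring", which is an assumption. Under this reading, an n-gram is dropped when a one-character-longer n-gram containing it occurs exactly as often. In other words, the shorter string never appears outside that longer one. The function names are hypothetical.

```python
from collections import Counter


def ngram_counts(text, max_n):
    """Count every character n-gram of length 1..max_n."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def remove_redundant(counts, max_n):
    """Drop an n-gram if a one-character-longer n-gram containing it
    has exactly the same frequency (it never occurs on its own)."""
    kept = {}
    for gram, freq in counts.items():
        if len(gram) == max_n:  # longest grams have nothing to compare against
            kept[gram] = freq
            continue
        redundant = any(
            len(longer) == len(gram) + 1 and gram in longer and f == freq
            for longer, f in counts.items()
        )
        if not redundant:
            kept[gram] = freq
    return kept


print(remove_redundant(ngram_counts("aab", 2), 2))  # {'a': 2, 'aa': 1, 'ab': 1}
```

The pairwise comparison is quadratic in the number of distinct n-grams. A tool meant for large corpora would instead work from sorted n-gram lists.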
VQ
- A speech recognition system based on VQ (vector quantization). It comes with its own recorded corpus and can record and recognize the ten spoken digits (single syllables) in real time. Helpful for beginners in speech recognition.
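Below is a rough sketch of VQ-based recognition in Python. It assumes feature vectors, such as MFCC frames, have already been extracted from each recording. One codebook per word is trained with plain k-means; real systems often use the LBG splitting algorithm instead. An utterance is then assigned to the word whose codebook gives the lowest average quantization distortion. The function names are hypothetical.

```python
import math


def train_codebook(frames, k, iters=10):
    """Plain k-means codebook training (a simplification of LBG)."""
    codebook = [list(f) for f in frames[:k]]  # naive init: first k frames
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for f in frames:
            nearest = min(range(k), key=lambda j: math.dist(f, codebook[j]))
            clusters[nearest].append(f)
        for j, members in enumerate(clusters):
            if members:  # keep the old codeword if its cluster is empty
                codebook[j] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def distortion(frames, codebook):
    """Average Euclidean distance from each frame to its nearest codeword."""
    return sum(min(math.dist(f, c) for c in codebook) for f in frames) / len(frames)


def recognize(frames, codebooks):
    """Return the label whose codebook quantizes the utterance with least distortion."""
    return min(codebooks, key=lambda label: distortion(frames, codebooks[label]))
```

For example, with codebooks trained on a few frames per digit, `recognize(test_frames, {"one": cb_one, "two": cb_two})` returns the best-matching label.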
Encrypt
- Single-letter (monoalphabetic) encryption, based on characteristic statistics of an English corpus (vocabulary of about 70,000 words).
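The description does not say what "characteristic values" means. Assuming it refers to letter-frequency statistics, the sketch below shows a monoalphabetic substitution cipher and a first-pass key guess by frequency analysis. ENGLISH_ORDER is an approximate, commonly cited English letter ranking, not the resource's own corpus statistics. The function names are hypothetical.

```python
import string
from collections import Counter

ALPHABET = string.ascii_lowercase
# Approximate English letter ranking (assumed; not from the original corpus)
ENGLISH_ORDER = "etaoinshrdlcumwfgypbvkjxqz"


def make_cipher(key):
    """key: a permutation of the 26 lowercase letters.
    Returns translation tables for encryption and decryption."""
    enc = str.maketrans(ALPHABET, key)
    dec = str.maketrans(key, ALPHABET)
    return enc, dec


def frequency_guess(ciphertext):
    """First-pass key guess: map the i-th most frequent cipher letter
    to the i-th most frequent English letter."""
    counts = Counter(ch for ch in ciphertext.lower() if ch in ALPHABET)
    ranked = [ch for ch, _ in counts.most_common()]
    return dict(zip(ranked, ENGLISH_ORDER))
```

On short texts the guess is only a starting point. Practical solvers refine it using digram statistics or a dictionary, for example the kind of word list a 70,000-word corpus provides.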
WPCrawler
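- A web crawler, also called a web spider; some projects call it a "walker". Wikipedia defines it as "a network program that systematically scans the Internet for the purpose of indexing". There are many open-source web crawler projects, the best known of which include Heritrix and Apache Nutch. Sometimes information has to be collected from the web. A crawler program is needed when that information can be obtained only one way and gathering it by hand is tedious. Examples include counting how many articles a website publishes each month and which tags it uses, collecting a corpus for a natural language processing project, or collecting images for a pattern recognition project. Moreover, one of the indispensable components of a search engine is also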
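Here is a minimal breadth-first crawler in Python using only the standard library. To keep it testable offline, page retrieval goes through a caller-supplied `fetch(url)` function, which is a hypothetical parameter. A real crawler would download pages (for example with `urllib.request`). It should also honor robots.txt, which `urllib.robotparser` can parse, and apply rate limits.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl. `fetch(url)` returns HTML text or None on failure."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # unreachable page: skip it
            continue
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited
```

The `seen` set keeps each URL from being queued twice. A FIFO queue gives breadth-first order, so pages close to the start URL are fetched first.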
quanwenjiansuo
- Full-text retrieval program using longest matching. It can instantly find every sentence in which a query occurs and requires a corpus (for example, the People's Daily).
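How the original program indexes its corpus is not stated. One standard way to find every sentence containing a query quickly is a suffix array. The sketch below builds a naive one over sentence suffixes and binary-searches it. Splitting sentences on 。！？!? and the function names are assumptions.

```python
import re


def build_index(corpus):
    """Split the corpus into sentences and build a naive suffix array over them."""
    sentences = [s for s in re.split(r"(?<=[。！？!?])", corpus) if s]
    # (sentence_id, offset) pairs, sorted by the suffix each one starts
    suffixes = sorted(
        ((i, j) for i, s in enumerate(sentences) for j in range(len(s))),
        key=lambda p: sentences[p[0]][p[1]:],
    )
    return sentences, suffixes


def find_sentences(query, sentences, suffixes):
    """Return every sentence containing `query`, in corpus order."""
    def prefix(k):
        i, j = suffixes[k]
        return sentences[i][j:j + len(query)]

    lo, hi = 0, len(suffixes)
    while lo < hi:  # binary search: first suffix whose prefix >= query
        mid = (lo + hi) // 2
        if prefix(mid) < query:
            lo = mid + 1
        else:
            hi = mid
    hits = set()
    while lo < len(suffixes) and prefix(lo) == query:  # all matches are adjacent
        hits.add(suffixes[lo][0])
        lo += 1
    return [sentences[i] for i in sorted(hits)]
```

Sorting by slicing costs O(n² log n) and is only suitable for small corpora. A newspaper-scale corpus would need a proper suffix-array construction algorithm.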
aiml-en-us-foundation-alice.snapshot
- AIML-format dialogue data for the ALICE question answering system: a fairly complete English Q&A corpus provided for research. It can be translated into Chinese as a reference for designing a Chinese Q&A system.
Word2VEC
- Extracts cosine distances from the output of Word2vec training.
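Cosine similarity between two word vectors is u·v / (|u| |v|), and cosine distance is 1 minus that value. Below is a dependency-free sketch in Python. It assumes the trained vectors have already been loaded into a dict mapping words to lists of floats. The function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """u·v / (|u| |v|) for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def cosine_distance(u, v):
    return 1.0 - cosine_similarity(u, v)


def most_similar(word, vectors, topn=3):
    """Rank the other words by cosine similarity to `word`, highest first."""
    target = vectors[word]
    scored = [(other, cosine_similarity(target, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scored, key=lambda p: p[1], reverse=True)[:topn]
```

If the model is loaded with gensim instead, `KeyedVectors.similarity` and `KeyedVectors.most_similar` perform the same cosine computation.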
aiml
- Python version of AIML, including the ALICE corpus; available for download for anyone who needs it.
learning-data-mining-with-python
- Source code accompanying the book Learning Data Mining with Python (Chinese edition 《Python数据挖掘入门与实践》), Chapters 1-12. It runs in IPython Notebook and includes applied examples such as social media mining, authorship attribution, news corpus analysis and big data processing.
tc-corpus-answer
- Fudan Chinese text corpus: ten categories of text, not word-segmented. Those interested can
GMM_gulici
- Isolated word recognition based on GMMs (Gaussian mixture models), including source code and corpus.
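This Python sketch covers only the recognition step. Each word is modelled by a diagonal-covariance GMM whose (weight, mean, variance) components are assumed to be already trained, for example with EM; training is omitted. The utterance is assigned to the word whose GMM gives the highest total log-likelihood over all feature frames. The function names are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frames, gmm):
    """gmm: list of (weight, mean, var). Returns sum over frames of
    log sum_k w_k N(x | mean_k, var_k), computed with log-sum-exp."""
    total = 0.0
    for x in frames:
        logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        top = max(logs)
        total += top + math.log(sum(math.exp(lp - top) for lp in logs))
    return total


def recognize(frames, models):
    """Return the word whose GMM best explains the frames."""
    return max(models, key=lambda word: gmm_log_likelihood(frames, models[word]))
```

Subtracting the largest term before exponentiating (log-sum-exp) keeps very small frame likelihoods from underflowing to zero.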
aec-test-audio
- Audio samples for testing AEC (acoustic echo cancellation).
jevmkm
- Source code for an SVM text classifier with an English interface. Includes a corpus; the archive has no extraction password.
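The classifier's source is not reproduced here. As an illustration only, the sketch below trains a linear SVM on bag-of-words features using the Pegasos sub-gradient method. Pegasos is swapped in for whatever solver the original uses, and the bias term is omitted for brevity. Whitespace tokenization suits English text; Chinese text would first need word segmentation. The function names are hypothetical.

```python
from collections import Counter


def features(text):
    """Bag-of-words term counts with whitespace tokenization."""
    return Counter(text.lower().split())


def train_pegasos(docs, labels, lam=0.01, epochs=20):
    """Linear SVM (hinge loss + L2) trained with Pegasos; labels are +1/-1."""
    w = {}
    t = 0
    for _ in range(epochs):
        for x, y in zip(docs, labels):
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y * sum(w.get(f, 0.0) * v for f, v in x.items())
            for f in w:  # shrink step from the L2 regulariser
                w[f] *= 1.0 - eta * lam
            if margin < 1:  # hinge loss active: move towards the example
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def predict(w, x):
    score = sum(w.get(f, 0.0) * v for f, v in x.items())
    return 1 if score >= 0 else -1
```

A sparse dict keeps the weight vector proportional to the observed vocabulary, which is typical for text classification.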
databayy
- An important corpus; a very useful data file for your word segmentation program.
95777978
- Source code for an SVM text classifier with an English interface. Includes a corpus; the archive has no extraction password.
canaonstruction
- A corpus query system; useful for learning file operations in VC (Visual C++) and how to build a management platform.
LSI
- News similarity analysis based on a latent semantic model, using a news corpus of more than 3,000 articles.
DocumentSimilarity.py
- News similarity algorithm based on the vector space model. It uses a 1998 People's Daily corpus to compute article-to-article similarity and outputs an upper triangular matrix.
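Below is a minimal Python sketch of the vector space model approach. Each document becomes a TF-IDF weighted term vector, and pairwise cosine similarities fill an upper triangular matrix. TF-IDF weighting is an assumption, since the description only says "vector space model". The documents are assumed to be word-segmented already, and the function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Weight = term count * log(N / document frequency)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()}
            for doc in docs]


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_matrix(docs):
    """Strictly upper triangular: entry [i][j] is filled only for j > i; others are None."""
    vecs = tfidf_vectors(docs)
    n = len(vecs)
    return [[cosine(vecs[i], vecs[j]) if j > i else None for j in range(n)]
            for i in range(n)]
```

Similarity is symmetric, so only the upper triangle needs to be computed. This halves the work for large article collections.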