Search results
Frequency
- A packaged exe program that reports word-frequency statistics for a specified txt file; convenient when crawling data.
WordsCount
- A simple but comprehensive text word-frequency program for Windows, built on Visual Studio 2010. Very easy to use.
MFC-Look-it-up-in-the-dictionary
- A dictionary lookup, word segmentation, and word-frequency statistics program. Very practical; download recommended.
lda-c
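- LDA is a generative document-topic model, also described as a three-layer Bayesian probability model, whose three layers are words, topics, and documents. A document's topic proportions follow a Dirichlet distribution, and each topic's words follow a multinomial distribution. LDA is an unsupervised machine learning technique for uncovering latent topic information in a large document collection or corpus. It uses the bag-of-words approach, which treats each document as a word-frequency vector and so turns text into numeric data that is easy to model. Bag-of-words ignores the order of words, which simplifies the problem but also ...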
WordCount
- WORDCOUNT word-frequency statistics: traverses folders, identifies file types, counts word frequencies in document files, and outputs the results sorted. Includes the file-search class ScanWord and the word-counting class TextDoc, with JUnit test files for both classes, a readme, and a correctness document.
cipintongji
- A text word-frequency program; simple and easy to use. Source code is included and has been tested.
textquery
- A program that looks up how many times a word occurs in a book, based on Chapter 10 of C++ Primer; implemented with map and vector.
wordCount
- Function: word-frequency detection in a file. One file holds the text to scan and another holds the words to look for (one or more). Singular and plural forms count as the same word, and numbers and alphanumeric words can also be detected.
word_find
- English word-frequency statistics; singular and plural forms count as one word, and upper and lower case count as one word.
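The counting rule in this entry (plurals and letter case folded into one word) can be sketched in Python as follows. This is a minimal illustration, not the code from the download: the `normalize` and `word_frequencies` names are hypothetical, and the plural stripping is a naive suffix heuristic that mishandles words such as "this" or "bus".

```python
import re
from collections import Counter


def normalize(word):
    """Lowercase a word and strip a simple plural suffix (naive heuristic, an assumption)."""
    w = word.lower()
    if len(w) > 3 and w.endswith("ies"):
        return w[:-3] + "y"            # cities -> city
    if len(w) > 3 and w.endswith("es") and w[-3] in "sxz":
        return w[:-2]                  # boxes -> box, classes -> class
    if len(w) > 2 and w.endswith("s") and not w.endswith("ss"):
        return w[:-1]                  # cats -> cat, but glass stays glass
    return w


def word_frequencies(text):
    """Count alphabetic words after case folding and plural stripping."""
    words = re.findall(r"[A-Za-z]+", text)
    return Counter(normalize(w) for w in words)


if __name__ == "__main__":
    counts = word_frequencies("Cats chase a cat. The CAT sleeps; boxes and a box.")
    for word, n in counts.most_common(3):
        print(word, n)
```

A real tool would more likely use a lemmatizer or a dictionary of irregular plurals; the suffix rules here only show where that step fits.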
tfidf-computation-using-Lucene
- A tf-idf program that computes word-frequency statistics using Lucene.
NlPIR
- Chinese word segmentation and word-frequency statistics, 64-bit. An Eclipse project that can be run directly; works well.
Counting
- Counts word frequencies: reads content from a file and writes the results to a file.
nlp
- Word-frequency statistics for NLP: counts the word frequencies in a corpus. Also includes a phonetic-to-character conversion system based on hidden Markov models.
ICTCLAS2014
- Chinese natural language processing programs, including Chinese word-frequency statistics and new-word discovery, with sample documents.
5-4solution
- Word-frequency-based file similarity: a simple, basic implementation of file similarity computation.
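One common way to compute similarity from word frequencies is the cosine of the two word-count vectors. The entry does not say which measure the download uses, so the sketch below is an assumption; `term_counts` and `cosine_similarity` are hypothetical names.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Word-count vector of a text, stored sparsely as a Counter."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the two word-count vectors (0.0 if either is empty)."""
    a, b = term_counts(text_a), term_counts(text_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_similarity("the cat sat on the mat", "the cat lay on the rug"))
```

Texts with the same word proportions score close to 1 and texts with no shared words score 0, regardless of length.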
sort
- Counts English word frequencies using insertion sort and grouping by first letter, with some optimizations.
IDF
- IDF reflects how important a word is to a document within a document collection, and is often used as a weighting factor in text data mining and information extraction. Within a given document, term frequency (TF) is how often a given term occurs in that document. Inverse document frequency (IDF) is a measure of a term's general importance.
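The two quantities described above can be sketched in Python. The entry gives no formulas, so this assumes the common textbook definitions: TF is a term's count divided by the document length, and IDF is log(N / df), where N is the number of documents and df is the number of documents containing the term. Function names are hypothetical.

```python
import math
from collections import Counter


def term_frequency(tokens):
    """TF of each term: occurrences divided by the total number of tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}


def inverse_document_frequency(docs):
    """IDF of each term: log(N / df) over a list of tokenized documents."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))   # count each term once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


if __name__ == "__main__":
    docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "bird", "flew"]]
    idf = inverse_document_frequency(docs)
    tf = term_frequency(docs[0])
    for term in docs[0]:
        print(term, tf[term] * idf[term])
```

A term that appears in every document gets an IDF of 0, so its TF-IDF weight vanishes no matter how often it occurs. Many libraries use smoothed variants of this formula.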
word
- Counts word frequencies in a txt file. To use it, place the text file in the same directory as the py file and change the text file name as described in the py file's comments.
zipf_law
- Zipf found that the frequency distribution of words follows the same mathematical law across different languages. Zipf's law: if r is the rank of a word (such as "the") by number of occurrences in a text, and f(r) is the number of times that word occurs, then f(r) ~ r^(-1). Using Hamlet as an example, the word frequencies are analyzed in MATLAB to verify Zipf's law.
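The download uses MATLAB; as a rough Python equivalent of the check it describes, one can rank the word counts and fit a straight line to log f(r) against log r. Under Zipf's law the slope should be close to -1. The function names and the least-squares fit are assumptions for illustration, not the package's method.

```python
import math
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, count) pairs, with rank 1 for the most frequent word."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))


def zipf_exponent(pairs):
    """Least-squares slope of log(count) against log(rank)."""
    if len(pairs) < 2:
        raise ValueError("need at least two ranked words to fit a slope")
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(freq) for _, freq in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic counts that follow f(r) = 1000 / r exactly.
    ideal = [(r, 1000 / r) for r in range(1, 51)]
    print(zipf_exponent(ideal))
```

On a real text the fit is usually only approximate, and the slope depends on tokenization and on how many low-frequency words are included.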
hadoop_py
- Hadoop word-frequency statistics: counts how often a word occurs in a sentence or an article.
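A Hadoop word count follows the MapReduce pattern: the map step emits a (word, 1) pair per word, the framework sorts the pairs by key, and the reduce step sums the counts for each word. The sketch below simulates that flow in a single Python process so it runs without a cluster; it is not the code from the download, and `mapper`, `reducer`, and `word_count` are hypothetical names. With Hadoop Streaming, the two steps would instead be separate scripts that read stdin and write tab-separated lines to stdout.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Map step: emit (word, 1) for every whitespace-separated word."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reducer(sorted_pairs):
    """Reduce step: sum the counts of consecutive pairs sharing a key."""
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def word_count(lines):
    """Run map, sort, and reduce locally; sorted() stands in for Hadoop's shuffle."""
    shuffled = sorted(mapper(lines))
    return dict(reducer(shuffled))


if __name__ == "__main__":
    for word, count in word_count(["to be or not to be", "to see or not"]).items():
        print(f"{word}\t{count}")
```

The reducer relies on its input being sorted by word, which is what the shuffle phase guarantees in a real job.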