Search results: resource list
pca-test
- A simple PCA implementation. Load your own data to produce a data plot and the two basic indicator charts: the Q statistic and the T² statistic.
lda_perplexity
- Uses a trained model to test words and their probabilities, counts the words, and computes perplexity.
pls
- Multivariate statistical regression using partial least squares (PLS), suitable for data mining.
Hadoop
- Developed with Hadoop. Counts how often keywords appear in the input files and ranks the texts by those frequencies. The user must define the keywords and the input files.
Data
- All source code from the book 《R统计与数据挖掘》 (R Statistics and Data Mining); can be run directly.
programme
- Converts a color image to grayscale, computes statistics over the connected regions, and detects faces by applying criteria based on the statistical properties of those regions.
DataTest
- Counts how many times each IP address occurs among 100 million IPs. No "big data" category was available, so it is filed under data mining.
NaiveBayesClassifier.m
- I use MATLAB 2008a, which does not support a Naive Bayes classifier. The script supports normal and kernel distributions and uses the Statistics Toolbox for the 2008a release. It also includes a function for confusionmat.
code
- (Neural network) Trains a multilayer perceptron with multiple hidden layers on training data, then uses test data to compute the average recognition accuracy of the resulting network.
Part1
- Data mining of 500 New York Times news articles, including data preprocessing and basic descriptive statistics.
WordFrequenceCount
- Text-based word frequency calculation. Counts the words in a text, handles tens of thousands of words, and outputs the results in a single pass.
Statistical Modeling in R (统计建模于R)
- Modeling in the R language with worked example code, including hypothesis tests and the calculation of various statistics.
Crawler.tar
- A crawler written in Python 3.5 that scrapes Douban reviews of the film 《声之形》 (A Silent Voice), counts the frequency of words in the reviews, and builds a word cloud.
R
- Financial data analysis in R, including linear models, time series analysis, volatility analysis, and VaR (value at risk) analysis. (An Introduction to Analysis of Financial Data with R)
dataanalyse
- A data analysis tool built with pandas, NumPy, and SciPy. Computes statistics such as the mean, frequency counts, maximum, minimum, and quantiles.
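
As a rough illustration of the statistics that the dataanalyse entry lists (mean, frequency, maximum, minimum, quantiles), here is a minimal pandas sketch. It is not the code from that resource: the function name `describe_series`, the default quantile levels, and the sample data are all assumptions made for this example.

```python
import pandas as pd


def describe_series(values, quantiles=(0.25, 0.5, 0.75)):
    """Return basic descriptive statistics for a sequence of numbers.

    Hypothetical helper for illustration only; it is not part of the
    listed dataanalyse tool.
    """
    s = pd.Series(values)
    return {
        "mean": float(s.mean()),
        "min": s.min(),
        "max": s.max(),
        # value_counts() gives the frequency of each distinct value
        "frequency": s.value_counts().to_dict(),
        # quantile() uses linear interpolation by default
        "quantiles": {q: float(s.quantile(q)) for q in quantiles},
    }


summary = describe_series([1, 2, 2, 3, 4])
print(summary)
```

For larger tables, `DataFrame.describe()` reports most of these values per column in a single call.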
