Search Resources - 语料 (Corpus) - 搜珍网

Search results

transfer.py.tar

Downloads: 0
A Python script that converts full-width Chinese punctuation marks to half-width ones by arithmetic on their Unicode code points. It can be used to normalize punctuation width across a corpus.
Category: AI-NN-PR
- Published: 2017-04-03
- File size: 557
- Uploader: 滑车
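
The code-point arithmetic can be sketched as follows. This is an illustration, not the contents of transfer.py, and the function name `to_halfwidth` is hypothetical. Full-width forms U+FF01 to U+FF5E sit exactly 0xFEE0 above their ASCII counterparts, and the ideographic space U+3000 maps to the ASCII space U+0020:

```python
def to_halfwidth(text: str) -> str:
    """Map full-width ASCII variants to their half-width forms by code point."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:  # ideographic (full-width) space -> ASCII space
            code = 0x0020
        elif 0xFF01 <= code <= 0xFF5E:  # full-width '!' through '~'
            code -= 0xFEE0  # fixed offset between U+FF01 and U+0021
        out.append(chr(code))
    return "".join(out)


if __name__ == "__main__":
    print(to_halfwidth("你好，世界！（测试）"))  # 你好,世界!(测试)
```

Marks in the CJK Symbols and Punctuation block, such as 。 and 、, have no ASCII counterpart at a fixed offset, so this approach leaves them unchanged.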

Guess

Downloads: 0
A natural language processing program that reads a passage of text and guesses it one character at a time. Based on the preceding text, it uses a trigram or 4-gram model (selectable) to guess each character in turn and computes the guessing accuracy. The training text is the processed corpus of the January 1998 People's Daily.
Category: Windows Develop
- Published: 2017-11-19
- File size: 27244544
- Uploader: 王蔚
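
Character guessing with an n-gram model can be sketched as below, assuming the model simply predicts the character seen most often after the previous n-1 characters (n = 3 or 4). The function names are hypothetical, and whatever smoothing or back-off the original program uses is not known; an unseen history here simply counts as a miss:

```python
from collections import Counter, defaultdict


def train(text: str, n: int):
    """Count which character follows each (n-1)-character history."""
    model = defaultdict(Counter)
    for i in range(n - 1, len(text)):
        history = text[i - n + 1:i]
        model[history][text[i]] += 1
    return model


def guess_accuracy(model, text: str, n: int) -> float:
    """Guess each character from the preceding n-1 characters; return the hit rate."""
    hits = total = 0
    for i in range(n - 1, len(text)):
        history = text[i - n + 1:i]
        total += 1
        if history in model:  # membership test does not add keys to the defaultdict
            best, _ = model[history].most_common(1)[0]
            hits += best == text[i]
    return hits / total if total else 0.0
```

For example, training on 我爱北京天安门，我爱北京 with n = 3 and testing on 我爱北京 guesses both predictable characters correctly, giving an accuracy of 1.0. Passing n = 4 switches to the 4-gram variant.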

identified-in-set

Downloads: 1
A speech recognition algorithm based on MFCC features and DTW that recognizes Mandarin pronunciations of the ten digits 0-9. The program performs in-set recognition for five specific speakers; the speech corpus is included with the program.
Category: Other systems
- Published: 2017-11-23
- File size: 274462
- Uploader: lemywong
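
The matching step can be sketched as nearest-template classification under dynamic time warping. This assumes MFCC frames have already been extracted (one feature vector per frame), which is not shown; `dtw_distance`, `recognize`, and the template layout are illustrative names, not the program's:

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between two frames
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b's frame is reused
                                 cost[i][j - 1],      # b advances, a's frame is reused
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def recognize(query, templates):
    """Return the label whose nearest template (over all its recordings) is closest."""
    return min(templates,
               key=lambda label: min(dtw_distance(query, t) for t in templates[label]))
```

For the in-set setting above, each digit label would hold one template per enrolled speaker, so `templates["3"]` might contain five recordings of "three".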

identified-out-of-set

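Downloads: 0
A speech recognition algorithm based on MFCC features and DTW that recognizes Mandarin pronunciations of the ten digits 0-9. The program is trained on the voices of three specific speakers and then recognizes the pronunciations of 30 other speakers outside that group, i.e. out-of-set recognition with respect to the training speakers. The speech corpus is included with the program.
Category: Other systems
- Published: 2017-11-16
- File size: 1769309
- Uploader: lemywong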

TIMIT

Downloads: 1
Part of the TIMIT corpus. It is not complete; I just downloaded it from the web and hope it helps.
Category: Speech synthesis and recognition
- Published: 2013-12-14
- File size: 9192088
- Uploader: zhao

ChineseSegment

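Downloads: 0
A complete Chinese word segmentation program with source code, a dictionary, and a training set. The author describes the algorithm as simple, efficient, and accurate. It includes a new segmentation method that fuses an annotated corpus with a dictionary. With the corpus split 2:1 into training and test sets plus an external dictionary, accuracy reaches 95%. Suitable for beginners to learn from, and for applications that need a simple segmentation tool.
Category: AI-NN-PR
- Published: 2017-11-13
- File size: 14581979
- Uploader: 张忠辉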
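
The 2:1 split and the dictionary merge can be sketched as follows, assuming the annotated corpus is one sentence per line with words separated by spaces. This covers only data preparation, not the author's fusion method, and the helper names are hypothetical:

```python
import random


def split_corpus(sentences, ratio=(2, 1), seed=0):
    """Shuffle sentences reproducibly and split them train:test according to `ratio`."""
    items = list(sentences)
    random.Random(seed).shuffle(items)
    cut = len(items) * ratio[0] // sum(ratio)
    return items[:cut], items[cut:]


def build_lexicon(train_sentences, external_words=()):
    """Collect words from space-segmented training sentences plus an external dictionary."""
    lexicon = set(external_words)
    for sent in train_sentences:
        lexicon.update(sent.split())
    return lexicon
```

Only the training portion feeds the lexicon, so the test third stays unseen when accuracy is measured.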

pinyin_python

Downloads: 0
Takes any word-segmented article, removes duplicate words, sorts them, converts them to pinyin, and converts the pinyin to phonemes. Useful for preparing a corpus before Chinese speech recognition. The code has been tested under Python 2.7.
Category: Speech/Voice recognition/combine
- Published: 2017-11-07
- File size: 78680
- Uploader: main
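
A sketch of the pinyin-to-phoneme step, assuming pinyin with tone numbers is already available (the character-to-pinyin lookup needs a dictionary and is not shown) and that an initial/final split is an acceptable phoneme set; the original script's exact phoneme inventory is unknown. It is written for Python 3, and the function names are hypothetical:

```python
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_syllable(syllable: str):
    """Split a tone-numbered pinyin syllable such as 'zhong1' into (initial, final)."""
    for initial in INITIALS:  # two-letter initials come first so 'zh' wins over 'z'
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable  # zero-initial syllables such as 'an4'


def to_phonemes(pinyin_word: str):
    """Turn space-separated pinyin like 'zhong1 guo2' into a phoneme list."""
    phonemes = []
    for syl in pinyin_word.split():
        initial, final = split_syllable(syl)
        if initial:
            phonemes.append(initial)
        phonemes.append(final)
    return phonemes


def dedupe_and_sort(words):
    """Remove duplicate words and sort them, as a lexicon builder would."""
    return sorted(set(words))
```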

AIMLTest

Downloads: 0
This program uses AIML to implement machine dialogue: when you ask a question, it replies with a corresponding answer. The package includes a simple corpus for testing; worth a look if you work on dialogue systems.
Category: AI-NN-PR
- Published: 2017-11-13
- File size: 10246144
- Uploader: huangzhong
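
A toy illustration of how AIML categories map questions to answers, using only Python's standard library. The two categories are made up, and the matcher is simplified: a real AIML interpreter ranks exact words above wildcards, while this sketch takes the first pattern that matches in file order:

```python
import re
import xml.etree.ElementTree as ET

AIML = """<aiml version="1.0">
  <category><pattern>HELLO</pattern><template>Hi there!</template></category>
  <category><pattern>WHAT IS *</pattern><template>I am not sure what that is.</template></category>
</aiml>"""


def load_categories(source: str):
    """Return (compiled regex, template text) pairs from an AIML document."""
    root = ET.fromstring(source)
    rules = []
    for cat in root.iter("category"):
        pattern = cat.findtext("pattern").strip()
        # AIML's * wildcard matches one or more words; approximate it with .+
        regex = "^" + re.escape(pattern).replace(r"\*", ".+") + "$"
        rules.append((re.compile(regex), cat.findtext("template").strip()))
    return rules


def respond(rules, sentence: str) -> str:
    """Normalize input (strip punctuation, upper-case) and return the first matching template."""
    normalized = re.sub(r"[^\w\s]", "", sentence).upper().strip()
    for regex, template in rules:
        if regex.match(normalized):
            return template
    return "I don't understand."
```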

pfr199801

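Downloads: 0
The PFR People's Daily Annotated Corpus (version 1.0, hereafter the PFR corpus) is an annotated corpus built from the 1998 People's Daily by the Institute of Computational Linguistics at Peking University and Fujitsu Research and Development Center Co., Ltd., with the permission of the People's Daily News Information Center. To promote research in Chinese information processing, the three parties plan to release the PFR corpus publicly. As a first step, starting April 3, the January portion of the corpus is available for free download on the three parties' home pages. For the annotation guidelines, see "Processing of the Modern Chinese Corpus: Specification for Word Segmentation and Part-of-Speech Tagging". If you use the PFR corpus in research or a paper, please indicate the source…
Category: MultiLanguage
- Published: 2017-11-05
- File size: 2216152
- Uploader: icypriest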
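
A sketch for reading one line of the corpus, assuming the layout commonly seen in the released January 1998 file: a line ID such as 19980101-01-001-001/m, then word/tag tokens separated by whitespace, with compound entities bracketed as in [中国/ns 政府/n]nt. Check the format against your own copy before relying on it; the function name is hypothetical:

```python
import re


def parse_pfr_line(line: str):
    """Return the line ID and a list of (word, tag) pairs from one PFR-style line."""
    tokens = line.split()
    if not tokens:
        return None, []
    line_id = tokens[0].split("/")[0]  # e.g. 19980101-01-001-001
    pairs = []
    for tok in tokens[1:]:
        tok = tok.lstrip("[")                # opening bracket of a compound entity
        tok = re.sub(r"\][a-z]+$", "", tok)  # closing bracket plus compound tag, e.g. ]nt
        word, _, tag = tok.rpartition("/")   # split on the last slash
        pairs.append((word, tag))
    return line_id, pairs
```

This sketch drops the compound-entity tag (nt above); keep it if you need the bracketed groupings for named entity work.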

segment

Downloads: 0
Automatic Chinese word segmentation using the maximum matching method. seg.py: segmentation implementation; accuracy.py: segmentation performance evaluation; PD_1998_01_POS.txt: the People's Daily corpus.
Category: Other systems
- Published: 2017-11-16
- File size: 2825399
- Uploader: 高圆圆
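
Forward maximum matching and a span-based evaluation can be sketched as below. The lexicon, the maximum word length of 5, and the precision/recall/F1 metric are assumptions for illustration; accuracy.py may measure performance differently:

```python
def forward_max_match(text: str, lexicon: set, max_len: int = 5):
    """Greedy left-to-right segmentation: take the longest dictionary word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:  # fall back to a single character
                words.append(piece)
                i += size
                break
    return words


def spans(words):
    """Convert a word list to a set of (start, end) character offsets."""
    result, start = set(), 0
    for w in words:
        result.add((start, start + len(w)))
        start += len(w)
    return result


def prf(predicted, gold):
    """Precision, recall and F1 of predicted words against gold words, compared as spans."""
    p_spans, g_spans = spans(predicted), spans(gold)
    correct = len(p_spans & g_spans)
    precision = correct / len(p_spans) if p_spans else 0.0
    recall = correct / len(g_spans) if g_spans else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1
```

With the lexicon {研究, 研究生, 生命, 起源}, the string 研究生命起源 is segmented as 研究生 / 命 / 起源, a classic case where greedy forward matching goes wrong; against the gold 研究 / 生命 / 起源, only one of three words matches.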

pos_tag

Downloads: 1
Part-of-speech tagging with the Viterbi method. pos_tag.py: POS tagging; evaluate.py: POS tagging performance evaluation; PD_1998_01_POS.txt: the People's Daily corpus; 标准词性标注结果.txt (gold-standard POS tagging results): the last 10% of the corpus data (segmentation + POS tags).
Category: Other systems
- Published: 2016-10-07
- File size: 3090432
- Uploader: 高圆圆
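
A sketch of Viterbi decoding for a first-order HMM tagger. It assumes start, transition, and emission probabilities have already been estimated (for example from counts in PD_1998_01_POS.txt); the flat floor `unk` for unseen events is a crude stand-in for real smoothing, and all names are illustrative. Scores are summed in log space so long sentences do not underflow:

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, unk=1e-8):
    """Most likely tag sequence for `words` under a first-order HMM.

    start_p[t], trans_p[t1][t2] and emit_p[t][w] are probabilities; unseen
    events get the small floor `unk`.
    """
    def lg(p):
        return math.log(p if p > 0 else unk)

    # score[t]: best log-probability of a path ending in tag t
    score = {t: lg(start_p.get(t, 0)) + lg(emit_p[t].get(words[0], 0)) for t in tags}
    back = []  # back[i][t]: best predecessor of tag t at position i + 1
    for w in words[1:]:
        new_score, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda s: score[s] + lg(trans_p[s].get(t, 0)))
            new_score[t] = score[prev] + lg(trans_p[prev].get(t, 0)) + lg(emit_p[t].get(w, 0))
            pointers[t] = prev
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the highest-scoring final tag
    path = [max(tags, key=score.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]
```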

Text-Classification_libSVM

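Downloads: 0
Word segmentation with seg. Input parameter 1: the folder path of the input text corpus. For example, if the text files are all under the train//text folder, the parameter is train//text//* . Note: each article must be in its own txt file. Input parameter 2: the folder path where the segmented output files are stored, e.g. result//text. Note: do not append * here. This tool uses ICTCLAS, the Chinese word segmentation tool from the Chinese Academy of Sciences; please download it yourself from the ICTCLAS website, and put the Data folder, Configure.xml, ICTCLAS30.h, ICTCLAS3…
Category: AI-NN-PR
- Published: 2017-11-06
- File size: 4230849
- Uploader: 李勇军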
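
Once the documents are segmented, they need to be written in LIBSVM's sparse text format (`<label> <index>:<value> ...`, with indices ascending from 1) before training. A minimal sketch, assuming raw term counts as features and a vocabulary built from the same documents; the function name is hypothetical, and a real pipeline would reuse the training vocabulary when converting test files:

```python
from collections import Counter


def to_libsvm(docs, labels):
    """Turn segmented documents (lists of words) into LIBSVM lines.

    Feature values are raw term counts; indices start at 1 and are
    written in ascending order.
    """
    vocab = {}
    for words in docs:
        for w in words:
            vocab.setdefault(w, len(vocab) + 1)  # assign the next free index
    lines = []
    for words, label in zip(docs, labels):
        counts = Counter(vocab[w] for w in words)
        feats = " ".join(f"{i}:{c}" for i, c in sorted(counts.items()))
        lines.append(f"{label} {feats}".rstrip())
    return vocab, lines
```

Counts are the simplest choice; TF-IDF weights or length normalization often work better with SVMs but change only the value part of each `index:value` pair.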

fenci

Downloads: 0
A study of the 1998 People's Daily corpus using an HMM, ultimately implementing automatic word segmentation of Chinese sentences.
Category: CSharp
- Published: 2017-11-20
- File size: 4230586
- Uploader: txd
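
A sketch of how HMM segmentation is commonly set up, assuming the usual B/M/E/S character states (begin, middle, end of a word, or single-character word) and relative-frequency estimates from a segmented corpus; whether this program uses the same state set is not stated. Decoding a new sentence would then run the Viterbi algorithm over these four states:

```python
from collections import Counter, defaultdict


def bmes(word: str):
    """Label each character: S single, B/M/E begin/middle/end of a multi-char word."""
    if len(word) == 1:
        return ["S"]
    return ["B"] + ["M"] * (len(word) - 2) + ["E"]


def estimate(segmented_sentences):
    """Relative-frequency start, transition and emission probabilities from space-segmented sentences."""
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    for sent in segmented_sentences:
        chars, states = [], []
        for word in sent.split():
            chars.extend(word)
            states.extend(bmes(word))
        if not states:
            continue
        start[states[0]] += 1
        for a, b in zip(states, states[1:]):
            trans[a][b] += 1
        for c, s in zip(chars, states):
            emit[s][c] += 1

    def normalize(counter):
        total = sum(counter.values())
        return {k: v / total for k, v in counter.items()}

    return (normalize(start),
            {s: normalize(c) for s, c in trans.items()},
            {s: normalize(c) for s, c in emit.items()})
```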

NER

Downloads: 0
A simple named entity recognition system based on OpenNLP, using the CoNLL-2002 corpus.
Category: MultiLanguage
- Published: 2017-11-07
- File size: 2267
- Uploader: fish
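
OpenNLP's name finder trains on sentences with inline `<START:type> ... <END>` markers, while CoNLL-2002 lists one token and its IOB tag per line. A conversion sketch, assuming the tag is the last column and blank lines separate sentences; the lowercase type names are an arbitrary choice, and the function name is hypothetical:

```python
def conll_to_opennlp(lines):
    """Convert CoNLL-2002 lines ('token ... TAG', IOB tags, blank line between
    sentences) into OpenNLP name-finder training sentences."""
    type_names = {"PER": "person", "LOC": "location", "ORG": "organization", "MISC": "misc"}
    sentences, out = [], []
    open_type = None  # entity type currently inside <START:...>, if any

    def close_entity():
        nonlocal open_type
        if open_type is not None:
            out.append("<END>")
            open_type = None

    for line in list(lines) + [""]:  # the extra blank line flushes the last sentence
        parts = line.split()
        if parts and parts[0] == "-DOCSTART-":
            continue  # document separator, not a token
        if not parts:
            close_entity()
            if out:
                sentences.append(" ".join(out))
                out.clear()
            continue
        token, tag = parts[0], parts[-1]
        if tag == "O":
            close_entity()
        else:
            prefix, etype = tag.split("-", 1)
            # B- always starts an entity; I- starts one if it does not continue the open type
            if prefix == "B" or etype != open_type:
                close_entity()
                out.append(f"<START:{type_names.get(etype, etype.lower())}>")
                open_type = etype
        out.append(token)
    return sentences
```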

pu1

Downloads: 3
A spam email corpus for machine-learning spam filtering.
Category: AI-NN-PR
- Published: 2017-11-19
- File size: 1423024
- Uploader:
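
A minimal multinomial naive Bayes filter of the kind such a corpus is often used to evaluate. It assumes each message is already a list of tokens (PU1 encodes words as numbers, which works unchanged here); add-one smoothing and ignoring tokens never seen in training are design choices of this sketch, and the class name is hypothetical:

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        labels = list(labels)
        self.classes = set(labels)
        self.prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.counts[c].update(doc)
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(cnt.values()) for c, cnt in self.counts.items()}
        return self

    def predict(self, doc):
        def score(c):
            denom = self.totals[c] + len(self.vocab)
            # tokens outside the training vocabulary are skipped
            return self.prior[c] + sum(
                math.log((self.counts[c][t] + 1) / denom) for t in doc if t in self.vocab)
        return max(self.classes, key=score)
```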

computer-voice-input

Downloads: 0
Studies voice input by dividing the problem into three modules: a speech recognition module, a character conversion module, and a corpus construction module.
Category: software engineering
- Published: 2017-11-14
- File size: 3282610
- Uploader: lhj

Speech-Corpus

Downloads: 0
A continuous speech corpus containing data for training and testing speech recognition.
Category: Speech/Voice recognition/combine
- Published: 2017-11-11
- File size: 8136589
- Uploader: zhangxin

RMM

Downloads: 0
An implementation of the RMM algorithm, supporting both forward and reverse maximum matching, one of the important algorithms in natural language processing. To adapt it, simply replace the lexicon in the program. The bundled lexicon is taken from the 1988 People's Daily corpus, and the algorithm's Chinese segmentation precision is above 90%.
Category: Search Engine
- Published: 2017-11-09
- File size: 251403
- Uploader: he
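
Reverse maximum matching scans from the end of the string, which often resolves ambiguities that forward matching gets wrong. A sketch, assuming a set-based lexicon and a maximum word length of 5 (both illustrative; the function name is hypothetical):

```python
def reverse_max_match(text: str, lexicon: set, max_len: int = 5):
    """Greedy right-to-left segmentation: take the longest dictionary word ending at each position."""
    words, end = [], len(text)
    while end > 0:
        for size in range(min(max_len, end), 0, -1):
            piece = text[end - size:end]
            if size == 1 or piece in lexicon:  # single characters are always accepted
                words.append(piece)
                end -= size
                break
    return words[::-1]  # words were collected right to left
```

With the lexicon {研究, 研究生, 生命, 起源}, 研究生命起源 comes out as 研究 / 生命 / 起源, whereas forward matching yields 研究生 / 命 / 起源.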

bhav-saar-master

Downloads: 1
A natural language processing algorithm for sentiment analysis that classifies an article as positive or negative from its keywords. A Chinese dictionary has been added, so it can be used on Chinese text (segment the corpus into words first).
Category: Common data structures and algorithms
- Published: 2013-10-14
- File size: 28100473
- Uploader: jiang
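
The keyword approach can be sketched as counting hits against positive and negative word lists on already-segmented text. The lexicons here are placeholders, the function name is hypothetical, and the original project's weighting is not known:

```python
def keyword_sentiment(words, positive, negative):
    """Score pre-segmented text by lexicon hits: +1 per positive word, -1 per negative word."""
    score = sum((w in positive) - (w in negative) for w in words)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return label, score


if __name__ == "__main__":
    print(keyword_sentiment(["这部", "电影", "很", "精彩"], {"精彩", "喜欢"}, {"无聊"}))
```

Plain counting ignores negation (不 精彩 still scores as positive), which is the main weakness of this kind of keyword method.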

TFIDF

Downloads: 0
Computes TF-IDF values over a corpus. Implemented in Java.
Category: Java Develop
- Published: 2017-11-19
- File size: 1697
- Uploader: qfxu
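
A sketch of the TF-IDF computation, using the common definitions tf = term count / document length and idf = log(N / df), where N is the number of documents and df the number of documents containing the term. Other variants (raw counts, smoothed idf) exist, and which one the Java code uses is not stated; this Python version is only illustrative:

```python
import math
from collections import Counter


def tfidf(docs):
    """TF-IDF weight of every term in every document (docs are lists of tokens)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in counts.items()})
    return weights
```

A term that appears in every document gets idf = log(1) = 0, so it carries no weight regardless of how often it occurs.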


搜珍网 www.dssz.com

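This site is a search engine that collects and describes programming resources and source code; copyright belongs to the original authors.　　粤ICP备11031372号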

1999-2046 搜珍网 All Rights Reserved.