Resource search results
Research and Design of a Speech Synthesis Corpus Management System
- Describes the research and design of a corpus and its management system, using the latest development tools and existing software to meet the system's design goals.
v.206 (preprocessing)
- A lex-based parser that preprocesses the BNC corpus before text annotation, deleting content unrelated to the SGML markup and the part-of-speech tags.
POSTagger
- (1) Training stage: from a corpus that is already part-of-speech (POS) tagged, collects statistics such as the bigram transition matrix of POS tags and the number of times each word occurs with a given tag. (2) Uses a dynamic programming algorithm to quickly select the tag path and produce the tagging result. (3) Supports a choice of different POS tag sets.
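The two stages described for POSTagger can be sketched roughly as follows. This is a minimal illustration, not the package's own code: the names `train`, `viterbi`, and `START`, the add-one smoothing, and the use of word-given-tag probabilities are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical sentence-start marker


def train(tagged_sentences):
    """Count tag bigram transitions and how often each word occurs with each tag."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    return trans, emit


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (the smoothing is an assumption)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, trans, emit, tags, vocab_size):
    """Select the highest-scoring tag path by dynamic programming."""
    best = {START: (0.0, [])}  # tag -> (log score, best path ending in tag)
    for word in words:
        nxt = {}
        for tag in tags:
            e = log_prob(emit[tag], word, vocab_size + 1)
            score, path = max(
                ((s + log_prob(trans[prev], tag, len(tags)) + e, p)
                 for prev, (s, p) in best.items()),
                key=lambda item: item[0],
            )
            nxt[tag] = (score, path + [tag])
        best = nxt
    return max(best.values(), key=lambda item: item[0])[1]
```

The tag set is passed in as a parameter, which is one simple way to allow different POS tag sets to be swapped in.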
CJCorpus
- A Japanese-Chinese parallel bilingual corpus containing 4,053 sentences.
TestCorpusyuliaoguanli
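- 1. A simple corpus management system. 2. Adds and deletes corpus files and counts the characters in the corpus. 3. Searches the corpus for Chinese character strings and reduplicated forms. 4. Corpus files are stored in the corpus directory; query results are saved in the same directory as the corpus. 5. The corpus directory contains four text files for testing (test1 and test2 are two small files). 6. Handles only text files in GB internal encoding.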
Wordsegmentation2
- An NLP implementation that automatically builds a word segmentation dictionary from corpus statistics, segments the training set, lists every possible segmentation, and computes the probability of each. Users must supply their own corpus and test set.
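One way to picture the approach in Wordsegmentation2 is the sketch below. It is an illustration under stated assumptions, not the package's implementation: the function names are hypothetical, and each segmentation is scored with a simple unigram model (the product of relative word frequencies), which the listing does not specify.

```python
from collections import Counter


def build_dictionary(segmented_sentences):
    """Build a word -> relative frequency dictionary from a segmented corpus."""
    counts = Counter(w for sentence in segmented_sentences for w in sentence)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def all_segmentations(text, dictionary):
    """Yield every way of splitting text into dictionary words."""
    if not text:
        yield []
        return
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in dictionary:
            for rest in all_segmentations(text[end:], dictionary):
                yield [head] + rest


def rank_segmentations(text, dictionary):
    """Score each segmentation by the product of its word probabilities."""
    scored = []
    for segmentation in all_segmentations(text, dictionary):
        prob = 1.0
        for word in segmentation:
            prob *= dictionary[word]
        scored.append((segmentation, prob))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Enumerating every segmentation grows exponentially with sentence length, so a real tool would likely limit the input length or switch to dynamic programming when only the best split is needed.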
wordpos
- Given a corpus with word segmentation and part-of-speech annotations, tallies word frequencies and outputs the words sorted by number of occurrences.
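The counting step described for wordpos might look like the following sketch. The input format is an assumption: whitespace-separated `word/tag` tokens, as in People's Daily style corpora. The function name `word_frequencies` is hypothetical.

```python
from collections import Counter


def word_frequencies(lines):
    """Count words in 'word/tag' tokens and return them sorted by frequency."""
    counts = Counter()
    for line in lines:
        for token in line.split():
            # Split on the last slash so that words containing '/' survive.
            word, sep, _tag = token.rpartition("/")
            if sep and word:
                counts[word] += 1
    # most_common() orders by descending count.
    return counts.most_common()
```

Tokens without a tag separator are skipped here; whether the original tool does the same is not stated in the listing.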
sports_veronicasun
- Identifies sports articles in the People's Daily corpus for January 1998; written in C.
SegAndPosTools
- Implements corpus segmentation, feature value extraction, and a Bayes classifier.
tagging
- NLP: tags a corpus using a hidden Markov model and tests the results.
AssignMSRAWSInfo
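- Adds tag information to the MSRA corpus for subsequent processing. The files include before-and-after annotation examples that are easy to follow.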
CRF++-0.50
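- Source code for CRF++ 0.50, used for machine learning training in fields such as natural language recognition. Corpora larger than 2 GB do not cause memory overflow or similar problems.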
chinese
- Fundamentals of Chinese Information Processing. Lecture 1: Introduction to programming in the VC environment. Lecture 2: File handling. Lecture 3: Character encoding. Lecture 4: Character frequency statistics. Lecture 5: Sentence splitting. Lecture 6: Corpora.
chinese-text
- A text classification corpus: news texts manually organized and classified by editors, with the corresponding category information. The classification scheme has several dozen category nodes, and the collection contains roughly 100,000 web documents.
fenci
- Download a corpus yourself; the program computes weights and then segments the corpus into words.
11
- On building and organizing corpora for speech recognition, and on their analysis and statistics.
22
- On building and organizing corpora for speech recognition, and on their analysis and statistics.
3
- On building and organizing corpora for speech recognition, and on their analysis and statistics.
4
- On building and organizing corpora for speech recognition, and on their analysis and statistics.
aclImdb_v1.tar
- An English movie review corpus for English sentiment analysis. It includes a training set and a test set, both labeled.