Resource search: 语料 (corpus) - 搜珍网


Search results

Unsupervise

Downloads: 1
Part-of-speech tagging implemented with a hidden Markov model. The model is unsupervised. Includes a corpus and a test set for study.
Category: Chinese information processing
- Release date: 2014-01-17
- File size: 12274076
- Uploader: lyn

word-segment-tool-for-chinese

Downloads: 0
A Chinese word segmenter based on the Peking University corpus. Simple, with usage instructions included.
Category: Linux-Unix program
- Release date: 2017-03-24
- File size: 617639
- Uploader: witfox

Apriori_DIC

Downloads: 0
The classic data mining algorithms Apriori and DIC, together with Brin's paper on DIC and a training corpus.
Category: Algorithm
- Release date: 2017-03-29
- File size: 116329
- Uploader: luowei

cipinbijiao

Downloads: 0
Extracts the words that precede and follow place names in the Peking University corpus, with a configurable threshold to control which ones are selected.
Category: MultiLanguage
- Release date: 2017-05-03
- File size: 323922
- Uploader: 马龙

Language_model_learning_in_chinese

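Downloads: 0
Papers on language model learning (in Chinese): "A Statistical Language Model Based on the Maximum Entropy Method" (基于最大熵方法的统计语言模型.pdf), "Online Adaptation of Cache Language Models Based on Dialogue-Turn Decay" (基于对话回合衰减的cache语言模型在线自适应研究.pdf), "Building Dynamic Language Models from Web Page Corpora" (基于Web网页语料构建动态语言模型.pdf), and "A Survey of Statistical Language Models" (统计语言模型综述.pdf).
Category: Development Research
- Release date: 2017-04-10
- File size: 1246519
- Uploader: wen6860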

POSTagger_Src

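Downloads: 0
Includes a dictionary of entries with their part-of-speech tags and frequency information. Training corpus format requirements: each word is delimited by "/", and the text after "/" is that word's part-of-speech tag. Each tag must be followed by at least one space. All words of a sentence must be on the same line. Click "Start POS tagging" and select one or more text files to tag.
Category: MultiLanguage
- Release date: 2017-03-27
- File size: 174557
- Uploader: 张耀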
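
The training corpus format described for POSTagger_Src (word, "/", tag, then whitespace, one sentence per line) can be read with a few lines of Python. This is only a sketch under those stated rules, not code from the package; the function name `parse_tagged_line` and the sample sentence and its tags are made up for illustration.

```python
def parse_tagged_line(line):
    """Split one sentence line into (word, tag) pairs.

    Assumes the format described in the listing: each token is
    word/tag, and tokens are separated by one or more spaces.
    """
    pairs = []
    for token in line.split():  # split() tolerates repeated spaces
        # Split on the last "/" so a word containing "/" keeps its text.
        word, sep, tag = token.rpartition("/")
        if not sep or not word or not tag:
            raise ValueError(f"malformed token: {token!r}")
        pairs.append((word, tag))
    return pairs


# Hypothetical sample line; the tag names are illustrative only.
sentence = "我/r 爱/v 北京/ns 。/w"
print(parse_tagged_line(sentence))
```

Splitting on the last slash is a design choice made here; the listing does not say how a word that itself contains "/" should be handled.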

SpamFiltering

Downloads: 1
This program implements a spam filtering system using naive Bayes. The corpus used is LINspam-public. Usage instructions are included in the program. Discussion and suggestions for improvement are welcome.
Category: Kill Virus
- Release date: 2014-09-11
- File size: 119414
- Uploader: 李贺

segment

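Downloads: 0
1. This program illustrates the general process of guessing personal names with a probabilistic method. 2. Users can modify the values in the config.ini file. 3. Three files are provided for testing: test1 is a corpus from primary-school Chinese textbooks, test2 is a corpus with one sentence per line, and test3 is a corpus containing ambiguous strings.
Category: MultiLanguage
- Release date: 2017-03-29
- File size: 236528
- Uploader: allcy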

generate_wordlist

Downloads: 0
A program that generates a dictionary: it extracts every distinct word from a corpus and writes the words out in the required dictionary format.
Category: Windows Develop
- Release date: 2017-05-18
- File size: 4742436
- Uploader: 糊涂虫
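
The idea behind generate_wordlist, collecting each distinct word from a corpus, can be sketched as follows. The listing does not specify the corpus format or the dictionary format, so this sketch assumes a corpus that is already segmented with spaces and emits (word, frequency) pairs; the name `build_wordlist` and the sample lines are hypothetical.

```python
from collections import Counter


def build_wordlist(lines):
    """Collect every distinct word from whitespace-segmented lines.

    Returns (word, frequency) pairs, most frequent first; ties are
    broken by the word itself so the output order is deterministic.
    """
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical pre-segmented corpus lines.
corpus = ["语料 库 建设", "语料 处理"]
for word, freq in build_wordlist(corpus):
    print(f"{word}\t{freq}")
```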

viterbi

Downloads: 0
An implementation of the Viterbi algorithm for NLP. It processes a corpus, builds a model, and can then apply syntactic annotation to new text.
Category: Communication-Mobile
- Release date: 2017-05-02
- File size: 723453
- Uploader: skyxiang

jzym

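Downloads: 1
A spam filter. You can put a shortcut on the desktop and use it right away. After opening it, train first, then select the txt file you want to test. The bundled "邮件测试文件夹" (mail test folder) is for measuring the filtering accuracy achieved with this mail library; you can also test with mail you have prepared yourself. The three folder names "邮件测试文件夹", "合法邮件" (legitimate mail), and "垃圾邮件" (spam) must not be changed. You can add your own corpus directly to the "合法邮件" and "垃圾邮件" folders under the Sample folder; the larger the corpus, the more accurate the results.
Category: Java Develop
- Release date: 2017-04-03
- File size: 481620
- Uploader: yy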

072282

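Downloads: 0
Proposes a method for automatically constructing a domain-specific ontology using term extraction and multiple clustering techniques. In the term extraction stage, terms are scored with the LLR formula by comparing their occurrence probabilities in a domain corpus and a background corpus, which yields better extraction results. In the hierarchy discovery stage, context co-occurrence information is combined with the semantic similarity of words in HowNet to measure similarity between terms, aiming to capture the most plausible relatedness between them. The k-medoids clustering algorithm is also improved so that hierarchical relations among terms are found more accurately, and a domain-specific ontology is then constructed from them.
Category: Other systems
- Release date: 2017-04-17
- File size: 100753
- Uploader: xiaobai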

reuters

Downloads: 0
A preprocessing tool for the Reuters corpus. Simple and quick to use; it sorts the corpus into categories.
Category: Special Effects
- Release date: 2017-04-05
- File size: 762225
- Uploader: zxj

word_split

Downloads: 0
A word segmentation program based on reverse maximum matching. The corpus is relatively small.
Category: MultiLanguage
- Release date: 2017-04-09
- File size: 1517543
- Uploader: nancy
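
Reverse maximum matching, the technique named for word_split, scans the text from the end and at each step takes the longest dictionary word that ends at the current position. The sketch below is a generic illustration of that technique, not the uploaded program; the function name, the `max_len` parameter, and the toy dictionary are assumptions.

```python
def reverse_max_match(text, dictionary, max_len=4):
    """Segment text by reverse maximum matching.

    Scans right to left; at each position the longest candidate
    (up to max_len characters) found in the dictionary is taken.
    A single character is accepted as a fallback when nothing matches.
    """
    words = []
    end = len(text)
    while end > 0:
        for size in range(min(max_len, end), 0, -1):
            candidate = text[end - size:end]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                end -= size
                break
    words.reverse()  # words were collected from the end of the text
    return words


# Toy dictionary, for illustration only.
toy_dict = {"研究", "研究生", "生命", "起源", "的"}
print(reverse_max_match("研究生命的起源", toy_dict))
# → ['研究', '生命', '的', '起源']
```

With this toy dictionary, a left-to-right maximum match would take "研究生" first, which is why the reverse scan is often preferred for Chinese.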

segword

0下载：
segword训练语料处理程序，针对人民日报199801训练语料进行训练的程序-segword
所属分类：MultiLanguage
- 发布日期：2017-05-12
- 文件大小：2726561
- 提供者：weiwei

BootCaT-0.1.2.tar

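Downloads: 1
Open-source software, mainly used for Chinese information processing and information retrieval. The uploader mainly uses it to acquire bilingual corpora from the web. It is written in Perl with highly independent modules; once a set of seed URLs has been collected, it can be used for bilingual web acquisition. From the package description: "The perl scripts included in the BootCaT toolkit implement an iterative procedure to bootstrap specialized corpora and terms from the web, requiring only a list [...]"
Category: WEB(ASP,PHP,...)
- Release date: 2017-03-29
- File size: 51575
- Uploader: liwen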

bilingual-sentence-aligner.tar

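Downloads: 0
BILINGUAL SENTENCE ALIGNER. After obtaining bilingual parallel text, one usually wants a sentence-aligned corpus, and this step often determines whether the corpus quality is up to standard. The software is written in Perl; copyright belongs to Microsoft Corporation. It may be used for non-commercial purposes. From the license text: "BILINGUAL SENTENCE ALIGNER (c) Microsoft Corporation. All rights reserved. Your use of the Microsoft software ("Software") [...]"
Category: WEB(ASP,PHP,...)
- Release date: 2017-03-29
- File size: 19480
- Uploader: liwen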

crawler

Downloads: 1
A web crawler written during an internship. It crawls bilingual text corpora from the Financial Times ("金融时报") and ftchinese websites. Source code and an executable are included, along with usage instructions. A good example for natural language processing work.
Category: Java Develop
- Release date: 2016-04-25
- File size: 745366
- Uploader: 杨文海

PU123ACorpora.tar

Downloads: 0
A corpus for people working on spam filtering. Easy to use; hopefully it helps.
Category: MultiLanguage
- Release date: 2017-05-21
- File size: 6427967
- Uploader: 王嘉琪

clcl

Downloads: 0
On building and organizing corpora for speech recognition, together with statistical analysis.
Category: Speech/Voice recognition/combine
- Release date: 2017-04-24
- File size: 163370
- Uploader: comma
