Resource list
webcrawler
- A web crawler written in Java with fairly powerful collection features.
introduce-to--search-engine
- 《走进搜索引擎》 ("Into the Search Engine"), a classic introductory book on search engines by Liang Bin, a Nanjing University graduate now pursuing a Ph.D. at Tsinghua University.
PHPSou_v1.2_GBK_20111226
- A search engine developed in PHP, including a spider crawling system and more; suitable for personal search.
nutch
- A Nutch video tutorial: simple environment setup for a search engine, explained on video and easy to follow.
yssfor
- 1. A real search engine. 2. Flexible and efficient web spider. 3. Controllable main-text extraction. 4. Controllable Chinese word segmentation and new-word learning. 5. Unattended operation. 6. B/S architecture with virtual host support. 7. Powerful features, simple to use. 8. Personalization. 9. Strengthens a website's soft power.
Lucene+Nutch
- The book first describes how to configure the development platform, then covers Lucene and Nutch development in detail.
luceneAndnutch
- Contents of the companion disc (source code) for the book on building a search engine with Lucene + Nutch.
heritrix1.14.4
- The heritrix1.14.4.zip release; feel free to download.
JTextPro-1.0.tar
- JTextPro: A Java-based Text Processing tool that includes sentence boundary detection (using maximum entropy classifier), word tokenization (following Penn conventions), part-of-speech tagging (using CRFTagger), and phrase chunking (using CRFChunker
Char04
- Web search engine code containing various crawling algorithms and related subroutines. The code implements an eDonkey network crawling system that avoids being added to the central server's blacklist and gets around the limit on the number of results returned when the crawler searches the server.
crawler_without_ring_vs2008_PQ
- A web crawler that crawls related pages from the web for display.
wtxx
- A course project that strips useless information from downloaded web pages and indexes them with a local Lucene search engine; you enter a keyword and it finds the files that contain it.
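
The pipeline described for wtxx (strip markup from downloaded pages, index them locally, look up files by keyword) can be sketched roughly as follows. This is not the project's code and does not use Lucene: a plain in-memory inverted index stands in for it, and the names TextExtractor, strip_html, build_index and search, as well as the sample pages, are all made up for illustration.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def strip_html(html):
    """Return the text of a page with tags, scripts and styles removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def build_index(pages):
    """Map each lowercase word to the set of file names that contain it.

    `pages` is a dict of {file name: raw HTML}; a stand-in for files on disk.
    """
    index = defaultdict(set)
    for name, html in pages.items():
        for token in re.findall(r"\w+", strip_html(html).lower()):
            index[token].add(name)
    return index


def search(index, keyword):
    """Return the sorted file names whose text contains the keyword."""
    return sorted(index.get(keyword.lower(), set()))


if __name__ == "__main__":
    # Hypothetical sample pages, used only to exercise the sketch.
    pages = {
        "a.html": "<html><script>var x = 1;</script><body><p>Lucene search</p></body></html>",
        "b.html": "<p>web crawler</p>",
    }
    index = build_index(pages)
    print(search(index, "Lucene"))
    print(search(index, "crawler"))
```

A real Lucene index would add tokenization for Chinese text, ranking, and on-disk storage; the sketch only shows the strip, index, lookup sequence.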