Search resource list
htmlparser
- An HTML parser that is part of the Majestic-12 distributed search engine. Author: Alex Chudnovsky, Majestic-12 Ltd (UK). This is version 3.0; its performance has been optimized several times and the documentation is fairly complete. It can also be downloaded from http://www.majestic12.co.uk.
lucene-2.4.0
- Analyzer code that the uploader describes as the best available. It is provided only as class files, which can be decompiled for a closer look.
htmlcxx-0.83.tar
- htmlcxx 0.83, a well-known HTML and CSS parser.
Parser-LiveInternet
- Parser LiveInternet: a program for parsing LiveInternet.
pubchem
- A web crawler written in Python that targets PubChem and collects chemical substance information, recording it in CSV format. It is built with BeautifulSoup and uses the lxml parser. Crawling is slow, so expect to wait. The crawl range can be changed, and compounds can also be crawled by CID.
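
The parse-then-record step described for the pubchem crawler can be sketched roughly as follows. This is not the uploader's code: it swaps BeautifulSoup and lxml for the standard library's `html.parser` so that it runs without third-party packages, it performs no network requests, and the table layout, the sample rows, and the names `CompoundTableParser`, `records_to_csv` and `SAMPLE_HTML` are all assumptions made for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class CompoundTableParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a <td> cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows made of <th> cells stay empty
                self.rows.append(self._row)
            self._row = None


def records_to_csv(html, cids=None):
    """Return CSV text for the table rows, optionally limited to given CIDs."""
    parser = CompoundTableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["cid", "name", "formula"])
    for row in parser.rows:
        if len(row) != 3:
            continue  # skip rows that do not match the assumed layout
        if cids is not None and row[0] not in cids:
            continue
        writer.writerow(row)
    return out.getvalue()


# Hypothetical page fragment; a real crawler would download this instead.
SAMPLE_HTML = """
<table>
  <tr><th>CID</th><th>Name</th><th>Formula</th></tr>
  <tr><td>962</td><td>Water</td><td>H2O</td></tr>
  <tr><td>2244</td><td>Aspirin</td><td>C9H8O4</td></tr>
</table>
"""

if __name__ == "__main__":
    print(records_to_csv(SAMPLE_HTML, cids={"2244"}))
```

Passing a set of CID strings restricts the output to those compounds, which mirrors the "crawl by CID" option mentioned in the description; leaving `cids` as `None` writes every row found.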
