Selector

[TOC]

There are many libraries for parsing web pages, such as BeautifulSoup and lxml, but Scrapy uses its own Selector by default.

Scrapy selectors are instances of the Selector class.

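A selector is constructed by passing either text or a TextResponse object, and it automatically picks the parsing rules (XML or HTML) based on the input type.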

>>> from scrapy.selector import Selector
>>> from scrapy.http import HtmlResponse

Constructing from text:

>>> body = '<html><body><span>good</span></body></html>'
>>> Selector(text=body).xpath('//span/text()').extract()
[u'good']

Constructing from a response object:

>>> response = HtmlResponse(url='http://example.com', body=body)
>>> Selector(response=response).xpath('//span/text()').extract()
[u'good']

For convenience, response objects expose a ready-made selector through their selector attribute:

>>> response.selector.xpath('//span/text()').extract()
[u'good']

