Scrapy command notes
Scrapy basic commands
Scrapy has some useful subcommands, like "startproject", which I introduced in a previous entry.
Quickly build a crawler with a Python framework: "Python Framework Scrapy" - Kenkyu Hack
This is a note for scrapy subcommands.
startproject
Create a Scrapy project.

$ scrapy startproject newproject
You can edit the Python files under the newproject directory.
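For reference, a project generated by startproject typically looks roughly like this (exact file names can vary between Scrapy versions):

```
newproject/
    scrapy.cfg            # deploy/configuration file
    newproject/
        __init__.py
        items.py          # item definitions
        pipelines.py      # item pipelines
        settings.py       # project settings
        spiders/          # your spiders live here
            __init__.py
```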
genspider
Create a new spider from a template.

$ scrapy genspider -t basic newspider01 example.com
Created spider 'newspider01' using template 'basic' in module: scrapy_sample.spiders.newspider01
This creates a spider named "newspider01" that crawls "http://www.example.com/".
The following command shows the available templates.

$ scrapy genspider -l
basic
crawl
csvfeed
xmlfeed
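The generated spider file is a stub you fill in yourself. A sketch of what the basic template produces in Scrapy versions of this era (which used BaseSpider, as in the shell session below; the exact contents depend on your Scrapy version, and newer releases use scrapy.Spider instead):

```python
# Sketch of a genspider "basic" template output; details vary by Scrapy version.
from scrapy.spider import BaseSpider


class Newspider01Spider(BaseSpider):
    name = "newspider01"
    allowed_domains = ["example.com"]
    start_urls = ["http://www.example.com/"]

    def parse(self, response):
        # Called with the downloaded response for each start URL.
        pass
```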
crawl
Start crawling with a spider.

$ scrapy crawl newspider01
list
Show all spiders in the project.

$ scrapy list
newspider01
newspider02
view
Open a web page in a browser.

$ scrapy view http://www.example.com/

This opens the given URL in your browser.
shell
Check parameters in a Python console.

$ scrapy shell http://www.example.com/some/page.html
...
[s] Available Scrapy objects:
[s] hxs <HtmlXPathSelector xpath=None data=u'<html><head><title>Example Domain</title'>
[s] item {}
[s] request <GET http://www.example.com/some/page.html>
[s] response <200 http://www.iana.org/domains/example>
[s] settings <CrawlerSettings module=<module 'scrapy_sample.settings' from '/Users/shinya/scrapy_sample/scrapy_sample/settings.pyc'>>
[s] spider <BaseSpider 'default' at 0x10a0ef190>
[s] Useful shortcuts:
[s] shelp() Shell help (print this help)
[s] fetch(req_or_url) Fetch request (or URL) and update local objects
[s] view(response) View response in a browser
>>> print hxs
<HtmlXPathSelector xpath=None data=u'<html><head><title>Example Domain</title'>
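In the shell, hxs is a selector that evaluates XPath expressions against the downloaded page. As a rough standard-library analogy of XPath-style selection (this is not Scrapy's API, just an illustration using xml.etree.ElementTree on a hand-written snippet of the example page):

```python
# Rough stdlib analogy of XPath selection; not Scrapy's HtmlXPathSelector.
import xml.etree.ElementTree as ET

# A well-formed snippet standing in for the fetched page (assumption).
html = ("<html><head><title>Example Domain</title></head>"
        "<body><p>Some text</p></body></html>")
root = ET.fromstring(html)

# ElementTree supports a limited XPath subset via find()/findall().
title = root.find("./head/title").text
print(title)  # Example Domain
```

Real pages are rarely well-formed XML, which is one reason Scrapy ships its own HTML-tolerant selectors instead of ElementTree.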
You can find more information in the official Scrapy documentation.