python - Scraping Multiple Websites with a Single Spider using Scrapy
I am using Scrapy to scrape data from this website. The following is the code for the spider:
```python
class StackItem(scrapy.Item):
    def __setitem__(self, key, value):
        if key not in self.fields:
            self.fields[key] = scrapy.Field()
        self._values[key] = value


class BetaSpider(CrawlSpider):
    name = "betaspider"

    def __init__(self, *args, **kwargs):
        super(BetaSpider, self).__init__(*args, **kwargs)
        # The start URL is passed in from the command line / caller
        self.start_urls = [kwargs.get('start_url')]

    rules = (
        Rule(LinkExtractor(unique=True, allow=(r'.*\?id1=.*',),
                           restrict_xpaths=('//a[@class="prevnext next"]',)),
             callback="parse_items", follow=True),
    )

    def parse_items(self, response):
        posts = response.xpath("//article[@class='classified']")
        for post in posts:
            item = StackItem()
            item["job_role"] = post.xpath("div[@class='uu mb2px']/a/strong/text()").extract()
            item["company"] = post.xpath("p[1]/text()").extract()
            item["location"] = post.xpath("p[@class='mb5px b red']/text()").extract()
            item["desc"] = post.xpath("details[@class='aj mb10px']/text()").extract()
            item["read_more"] = post.xpath("div[@class='uu mb2px']/a/@href").extract()
            yield item
```
This is the code for the item pipeline:
```python
import csv

class MyExporter(object):
    def __init__(self):
        self.mycsv = csv.writer(open('out.csv', 'wb'))
        self.mycsv.writerow(['job role', 'company', 'location',
                             'description', 'read more'])

    def process_item(self, item, spider):
        self.mycsv.writerow([item['job_role'], item['company'],
                             item['location'], item['desc'],
                             item['read_more']])
        return item
```
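Note that a pipeline like this only runs if it is registered in the project's settings. A minimal sketch of that registration, assuming the module path `myproject.pipelines` (the project name here is an assumption for illustration, not from the original post):

```python
# settings.py -- the module path "myproject.pipelines" is an assumed
# project layout; adjust it to wherever MyExporter actually lives.
ITEM_PIPELINES = {
    "myproject.pipelines.MyExporter": 300,  # lower number = runs earlier
}
```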
This is working fine. Now, I have to scrape the following websites (for example) using the same spider.
I have to scrape the tags of the above-mentioned websites and store them in a CSV file using item pipelines.
Actually, the list of websites to be scraped is endless. In this project, the user enters a URL and the scraped results are returned to the user. So, I want a generic spider that can scrape any website.
For a single website, this works fine. But how can this be accomplished for multiple sites having different structures? Is Scrapy enough to solve it?
A different spider per site structure is better. You can use the API to run Scrapy from a script, instead of the typical way of running `scrapy crawl`. Remember that Scrapy is built on top of Twisted, an asynchronous networking library, so you need to run it inside the Twisted reactor.