python 2.7 - Need help extracting links from a TD in a webpage

I am new to Python and am trying my hand at building small web crawlers. I am trying to write a program in Python 2.7 with BeautifulSoup that extracts the profile URLs from this page and the subsequent pages:

http://www.bda-findadentist.org.uk/pagination.php?limit=50&page=1

Here I am trying to scrape the URLs that link to the details pages, such as this one:

http://www.bda-findadentist.org.uk/practice_details.php?practice_id=6034&no=61881

However, I am lost as to how to make my program recognize these URLs. They are not inside a div with a class or id; instead, each one sits inside a td tag that carries a bgcolor attribute:

<td bgcolor="e7f3f1"><a href="practice_details.php?practice_id=6034&amp;no=61881">view details</a></td>

Please advise on how I can make the program identify these URLs and scrape them. I tried the following, but none of them worked:

    for link in soup.select('td bgcolor=e7f3f1 a'):
    for link in soup.select('td#bgcolor#e7f3f1 a'):
    for link in soup.findAll('a[practice_id=*]'):
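For context on why these attempts match nothing: in CSS selector syntax an attribute test goes in square brackets (`td[bgcolor="e7f3f1"]`), `#` refers to an `id`, and `findAll` takes tag names and attribute filters rather than selector strings. Also, `practice_id` is not an attribute of the `a` tag; it is part of the `href` value. Below is a minimal sketch of two selectors that should match, assuming every details link has an `href` starting with `practice_details.php`, as the row quoted above does. The function names and the second sample cell are made up for illustration.

```python
from bs4 import BeautifulSoup

# First cell is the row quoted in the question; the second cell is a
# hypothetical non-matching link added only to show the filtering.
SAMPLE_HTML = '''
<table><tr>
<td bgcolor="e7f3f1"><a href="practice_details.php?practice_id=6034&amp;no=61881">view details</a></td>
<td bgcolor="ffffff"><a href="index.php">home</a></td>
</tr></table>
'''


def extract_detail_hrefs(html):
    """Return the href of every link that points at practice_details.php."""
    soup = BeautifulSoup(html, 'html.parser')
    # [href^="..."] matches href values that start with the given prefix.
    return [a['href'] for a in soup.select('a[href^="practice_details.php"]')]


def extract_hrefs_in_coloured_cells(html, colour='e7f3f1'):
    """Return the href of every link nested in a td with the given bgcolor."""
    soup = BeautifulSoup(html, 'html.parser')
    # Attribute selectors use square brackets: td[bgcolor="e7f3f1"] a
    selector = 'td[bgcolor="%s"] a' % colour
    return [a.get('href') for a in soup.select(selector)]


print(extract_detail_hrefs(SAMPLE_HTML))
```

Selecting on the `href` prefix is probably the more robust of the two, since a background colour is a styling detail that other cells on the page may share.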

My full program is as follows:

    import requests
    from bs4 import BeautifulSoup

    def bda_crawler(pages):
        page = 1
        while page <= pages:
            url = 'http://www.bda-findadentist.org.uk/pagination.php?limit=50&page=' + str(page)
            code = requests.get(url)
            text = code.text
            soup = BeautifulSoup(text)
            for link in soup.findAll('a[practice_id=*]'):
                href = "http://www.bda-findadentist.org.uk" + link.get('href')
                print (href)
            page += 1

    bda_crawler(2)
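One further detail in the program above: the `href` values on the page are relative (`practice_details.php?...`, with no leading slash), so concatenating them onto the host name would produce `...org.ukpractice_details.php...`. A small sketch of resolving them against the listing page URL with the standard library's `urljoin` follows; `absolute_url` and `PAGE_URL` are names chosen here for illustration.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2.7

# The listing page the relative links were found on (from the question).
PAGE_URL = 'http://www.bda-findadentist.org.uk/pagination.php?limit=50&page=1'


def absolute_url(page_url, href):
    """Resolve a possibly relative href against the page it came from."""
    return urljoin(page_url, href)


print(absolute_url(PAGE_URL, 'practice_details.php?practice_id=6034&no=61881'))
```

Inside the crawler loop, each `link.get('href')` could be passed through `absolute_url(url, ...)` in place of the string concatenation.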

Please help.

Many thanks.
