python - Filter out href in a list instead of soup.find_all

Hi, I am filtering the announcements on a website with the following script:

    gdata_even = soup.find_all("li", {"class": "list2col "})
    gdata_odd = soup.find_all("li", {"class": "list2col odd "})

Finally, I take some of the announcements in gdata, depending on whether the item contains certain words:

    for l in range(len_data):
        if _checkdate(gdata_even[l].text):
            if _checkwordsv2(gdata_even[l].text):
                pass
            else:
                initial_list.append(gdata_even[l].text.encode("utf-8"))
        if _checkdate(gdata_odd[l].text):
            if _checkwordsv2(gdata_odd[l].text):
                pass
            else:
                initial_list.append(gdata_odd[l].text.encode("utf-8"))
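The loop above walks both lists with one shared index, so it may raise IndexError if len_data exceeds the length of either list (the odd rows can easily be one shorter than the even rows). A minimal sketch of the same filtering, assuming Python 3: the two lists are merged in page order with itertools.zip_longest, and _checkdate / _checkwordsv2 are passed in as opaque predicates because their bodies are not shown. The helper names, sample texts and sample predicates below are hypothetical; under Python 3 the texts stay str instead of being encoded to UTF-8 bytes.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel so that legitimately falsy items are never dropped


def interleave(evens, odds):
    """Yield items in page order: evens[0], odds[0], evens[1], odds[1], ...

    Works when one list is shorter than the other, instead of
    raising IndexError like indexing both lists with range(len_data).
    """
    for even, odd in zip_longest(evens, odds, fillvalue=_MISSING):
        if even is not _MISSING:
            yield even
        if odd is not _MISSING:
            yield odd


def filter_announcements(evens, odds, check_date, has_excluded_word):
    """Keep texts that pass check_date and contain no excluded word.

    check_date and has_excluded_word stand in for the question's
    _checkdate and _checkwordsv2 (hypothetical replacements).
    """
    return [
        text
        for text in interleave(evens, odds)
        if check_date(text) and not has_excluded_word(text)
    ]


# Hypothetical sample data and predicates, for illustration only.
evens = ["25 aug 2015 deletion of instruments", "26 aug 2015 holiday notice"]
odds = ["25 aug 2015 trading update"]
kept = filter_announcements(
    evens,
    odds,
    check_date=lambda t: "aug 2015" in t,
    has_excluded_word=lambda t: "holiday" in t,
)
print(kept)  # ['25 aug 2015 deletion of instruments', '25 aug 2015 trading update']
```

In the real script, the inputs would be the .text of each element of gdata_even and gdata_odd.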

The problem I am facing is that each gdata_even[l] and gdata_odd[l] has output like the following:

    <li class="list2col "><div class="indexcol"><span class="date">25 aug 2015 12:00:06 cest</span></div><div class="contentcol"><div class="categories">frankfurt</div><h3><a href="/xetra-en/newsroom/xetra-newsboard/fra-deletion-of-instruments-from-xetra---25.08.2015-001/1913134">fra:deletion of instruments xetra - 25.08.2015-001</a></h3></div></li>

Here I want the link of the item embedded in the href, but the following code doesn't work:

    h3url = gdata[l].find("a").get("href")
    print h3url

Can you please assist? Thank you.

Maybe there is an error in how you are getting gdata, because the code should work:

    >>> from BeautifulSoup import BeautifulSoup
    >>> doc = '<li class="list2col "><div class="indexcol"><span class="date">25 aug 2015 12:00:06 cest</span></div><div class="contentcol"><div class="categories">frankfurt</div><h3><a href="/xetra-en/newsroom/xetra-newsboard/fra-deletion-of-instruments-from-xetra---25.08.2015-001/1913134">fra:deletion of instruments xetra - 25.08.2015-001</a></h3></div></li>'
    >>> soup = BeautifulSoup(doc)
    >>> h3url = soup.find('a').get('href')
    >>> print h3url
    /xetra-en/newsroom/xetra-newsboard/fra-deletion-of-instruments-from-xetra---25.08.2015-001/1913134
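To pull the link out of every announcement row rather than a single hard-coded snippet, here is a minimal sketch, assuming Python 3 with bs4 (BeautifulSoup 4) rather than the BeautifulSoup 3 import used in the session above. Two things worth checking in the original script: the failing snippet indexes gdata, while the lists were built as gdata_even and gdata_odd; and in bs4, class is treated as a multi-valued attribute, so class_="list2col" matches both "list2col " and "list2col odd " and one find_all call can cover both row types. The helper name announcement_links and the second (odd) row in the sample markup are hypothetical.

```python
from bs4 import BeautifulSoup

# The first <li> is the row from the question; the second, "odd" row
# is a hypothetical placeholder added only to show both classes matching.
HTML = """
<ul>
<li class="list2col "><div class="indexcol"><span class="date">25 aug 2015 12:00:06 cest</span></div>
<div class="contentcol"><div class="categories">frankfurt</div>
<h3><a href="/xetra-en/newsroom/xetra-newsboard/fra-deletion-of-instruments-from-xetra---25.08.2015-001/1913134">fra:deletion of instruments xetra - 25.08.2015-001</a></h3></div></li>
<li class="list2col odd "><div class="contentcol">
<h3><a href="/hypothetical/odd-row">hypothetical odd row</a></h3></div></li>
</ul>
"""


def announcement_links(soup):
    """Return (link text, href) pairs for every announcement row.

    class_="list2col" matches any <li> whose class list contains
    "list2col", i.e. both the even and the odd rows.
    """
    links = []
    for li in soup.find_all("li", class_="list2col"):
        a = li.find("a")  # None if this row has no link
        if a is None:
            continue
        links.append((a.get_text(strip=True), a.get("href")))
    return links


soup = BeautifulSoup(HTML, "html.parser")
for text, href in announcement_links(soup):
    print(text, href)
```

Skipping rows where find("a") returns None avoids the AttributeError that a bare .find("a").get("href") raises on a row without a link.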
