python - Filter out href in a list instead of soup.find_all
Hi, I filter out announcements on a website with the following script:
    gdata_even=soup.find_all("li", {"class":"list2col "})
    gdata_odd=soup.find_all("li", {"class":"list2col odd "})
Finally, I take the announcements in gdata, depending on whether the item contains certain words:
    for l in range(len_data):
        if _checkdate(gdata_even[l].text):
            if _checkwordsv2(gdata_even[l].text):
                pass
            else:
                initial_list.append(gdata_even[l].text.encode("utf-8"))
        if _checkdate(gdata_odd[l].text):
            if _checkwordsv2(gdata_odd[l].text):
                pass
            else:
                initial_list.append(gdata_odd[l].text.encode("utf-8"))
The problem I am facing is that gdata_even[l] and gdata_odd[l] have the following output:
<li class="list2col "><div class="indexcol"><span class="date">25 aug 2015 12:00:06 cest</span></div><div class="contentcol"><div class="categories">frankfurt</div><h3><a href="/xetra-en/newsroom/xetra-newsboard/fra-deletion-of-instruments-from-xetra---25.08.2015-001/1913134">fra:deletion of instruments xetra - 25.08.2015-001</a></h3></div></li>
Here I want the link of the item, which is embedded in the href, but the following code doesn't work:
    h3url = gdata[l].find("a").get("href")
    print h3url
Can anyone please assist? Thank you.
Maybe there is an error in how you are getting gdata, because the code should work.
    >>> from BeautifulSoup import BeautifulSoup
    >>> doc='<li class="list2col "><div class="indexcol"><span class="date">25 aug 2015 12:00:06 cest</span></div><div class="contentcol"><div class="categories">frankfurt</div><h3><a href="/xetra-en/newsroom/xetra-newsboard/fra-deletion-of-instruments-from-xetra---25.08.2015-001/1913134">fra:deletion of instruments xetra - 25.08.2015-001</a></h3></div></li>'
    >>> soup = BeautifulSoup(doc)
    >>> h3url = soup.find('a').get('href')
    >>> print h3url
    /xetra-en/newsroom/xetra-newsboard/fra-deletion-of-instruments-from-xetra---25.08.2015-001/1913134
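The same lookup can be applied to every list item rather than a single one. The following is only a minimal sketch, assuming Python 3 and the bs4 package (the successor to the BeautifulSoup 3 module used above); the sample markup and the names html, links, item and anchor are made up for illustration and are not from the original page. It guards against items that contain no link, since calling .get on the None returned by find raises an AttributeError.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, modelled on the item shown in the question.
html = """
<ul>
<li class="list2col "><h3><a href="/news/item-1">First item</a></h3></li>
<li class="list2col odd "><h3><a href="/news/item-2">Second item</a></h3></li>
<li class="list2col "><h3>No link here</h3></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

links = []
# class_="list2col" matches every <li> carrying that class token, so the
# even and odd rows come back together, in document order.
for item in soup.find_all("li", class_="list2col"):
    anchor = item.find("a", href=True)  # None when the item has no link
    if anchor is not None:
        links.append(anchor["href"])

print(links)
```

Collecting both row types in one pass also avoids indexing two separate lists with the same counter, which fails when the even and odd lists differ in length.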