regex - Find string between two sets of string (Python / urllib2 / Beautiful Soup)
I have the following source code from a web page that I am trying to parse data from:
<span class="reviewcount"> <a href="...reviews-whatiwant-city..." target="_blank" onclick="xx;">1,361 reviews</a> </span>
Edit (with Beautiful Soup):
To extract the information, I parse the data using Beautiful Soup. I use the following code:
spans = soup.findAll('span', attrs={"class": u"reviewcount"})
for span in spans:
    a = span.find('a')
    print re.search('(?<=reviews-)(.*?)(?=-city)', a.get('href'))
But the information I get is:
<_sre.SRE_Match object at 0x7f84fce05300>
<_sre.SRE_Match object at 0x7f84fce05300>
<_sre.SRE_Match object at 0x7f84fce05300>
<_sre.SRE_Match object at 0x7f84fce05300>
and not the bytes between "reviews-" and "-city". Could anyone assist me in finding the right syntax? Thanks.
re.search() returns a match object, not the matched text. You need to read the saved group value, and only if there is a match:
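spans = soup.find_all('span', attrs={"class": u"reviewcount"})
for span in spans:
    a = span.find('a')
    # re.search() returns None when the pattern is not found
    match = re.search(r'reviews\-(.*?)\-city', a.get('href'))
    if match:
        # group(1) is the text captured by the parentheses
        print(match.group(1))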
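To see the difference without fetching a page, here is a minimal standalone sketch that needs only the standard library. The href value is the placeholder string from the question, and extract_between is a hypothetical helper name used for illustration, not part of re or Beautiful Soup.

```python
import re

# Placeholder href taken from the snippet in the question
href = "...reviews-whatiwant-city..."


def extract_between(text, start="reviews-", end="-city"):
    """Return the text between start and end, or None if they are not found.

    Hypothetical helper for illustration only.
    """
    # re.escape() keeps the markers literal even if they contain regex characters
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    match = re.search(pattern, text)
    # re.search() returns None on failure, so check before calling group()
    return match.group(1) if match else None


print(extract_between(href))               # prints: whatiwant
print(extract_between("no markers here"))  # prints: None

# The lookaround pattern from the question also works once the matched
# text is read from the match object; group(0) is the whole match.
lookaround = re.search(r"(?<=reviews-)(.*?)(?=-city)", href)
if lookaround:
    print(lookaround.group(0))             # prints: whatiwant
```

With the capturing-group pattern the wanted text is in group(1), while with the lookbehind/lookahead pattern the markers are not part of the match, so group(0) already holds it.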