I need to extract the complete text, without the <p>, <a href>, rel, etc. markup, from the following HTML code.
<p>many of features made samsung galaxy s4 1 of anticipated phones in recent history -- such 5-inch 1920 x 1080 <a href="http://www.bubblews.com/news/421662-samsung-galaxy-s4-worlds-first-full-hd-super-amoled-display" rel="nofollow" target="_blank">full hd super amoled display</a>, powerful processors (<a href="http://www.samsung.com/global/business/semiconductor/minisite/exynos/blog_spotlight_on_the_exynos5octa.html" rel="nofollow" target="_blank">samsung exynos 5 octa</a> in international version , <a href="http://www.qualcomm.com/snapdragon/blog/topics/snapdragon 600" rel="nofollow" target="_blank">qualcomm snapdragon 600</a> in u.s. version) , 16gb, 32gb , 64gb storage options -- bringing grief rushed purchase fourth-generation galaxy s series smartphone upon late april release.</p>
I have tried the code below:

from bs4 import BeautifulSoup
from urllib2 import urlopen

base_url = "http://www.chicagoreader.com"

def get_category_links(section_url):
    html = urlopen(section_url).read()
    soup = BeautifulSoup(html, "lxml")
    for div in soup.findAll("div", attrs={'class': 'field-content'}):
        # contents[0] is only the first child node of the <p> element
        print div.find("p").contents[0]

but it gives the following output:
many of features made samsung galaxy s4 1 of anticipated phones in recent history -- such 5-inch 1920 x 1080
I am unable to get the complete text. The result should also include the text that comes after the <a href ... rel ...> tags. Please suggest how to get the output below.
many of features made samsung galaxy s4 1 of anticipated phones in recent history -- such 5-inch 1920 x 1080 full hd super amoled display powerful processors .samsung exynos 5 octa in international version , ">qualcomm snapdragon 600 in u.s. version) , 16gb, 32gb , 64gb storage options -- bringing grief rushed purchase fourth-generation galaxy s series smartphone upon late april release.
Thanks.
You can use .text:

>>> from bs4 import BeautifulSoup
>>> html = '<p>many of features made samsung galaxy s4 1 of anticipated phones in recent history -- such 5-inch 1920 x 1080 <a href="http://www.bubblews.com/news/421662-samsung-galaxy-s4-worlds-first-full-hd-super-amoled-display" rel="nofollow" target="_blank">full hd super amoled display</a>, powerful processors (<a href="http://www.samsung.com/global/business/semiconductor/minisite/exynos/blog_spotlight_on_the_exynos5octa.html" rel="nofollow" target="_blank">samsung exynos 5 octa</a> in international version , <a href="http://www.qualcomm.com/snapdragon/blog/topics/snapdragon 600" rel="nofollow" target="_blank">qualcomm snapdragon 600</a> in u.s. version) , 16gb, 32gb , 64gb storage options -- bringing grief rushed purchase fourth-generation galaxy s series smartphone upon late april release.</p>'
>>> soup = BeautifulSoup(html)
>>> print soup.p.text
many of features made samsung galaxy s4 1 of anticipated phones in recent history -- such 5-inch 1920 x 1080 full hd super amoled display, powerful processors (samsung exynos 5 octa in international version , qualcomm snapdragon 600 in u.s. version) , 16gb, 32gb , 64gb storage options -- bringing grief rushed purchase fourth-generation galaxy s series smartphone upon late april release.
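To see why the first child alone is not enough, here is a minimal sketch that uses only Python 3's standard-library html.parser instead of BeautifulSoup. It is not how bs4 is implemented; the TextCollector class and the shortened html string are made up for illustration. It shows the same idea, though: a paragraph containing links is split into several text nodes, and the full text is what you get by joining all of them in document order.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: collects every text node, ignoring tags and attributes."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called once for each run of text between two tags.
        self.chunks.append(data)


# Shortened stand-in for the paragraph in the question (assumption: one link is enough).
html = (
    '<p>such 5-inch 1920 x 1080 '
    '<a href="http://www.bubblews.com/news/421662-samsung-galaxy-s4-worlds-first-full-hd-super-amoled-display" '
    'rel="nofollow" target="_blank">full hd super amoled display</a>'
    ', powerful processors</p>'
)

collector = TextCollector()
collector.feed(html)
collector.close()

first_node = collector.chunks[0]        # analogous to the first child of <p>
full_text = "".join(collector.chunks)   # analogous to joining all text, as .text does

print(first_node)
print(full_text)
```

The first print stops right before the link, which matches the truncated output in the question; the second includes the link text and everything after it.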