Parsing HTML with BeautifulSoup in a Python Crawler

Date: 2022-07-03 08:08:36

Use BeautifulSoup to parse HTML and XML strings.

Example:

#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import re

from bs4 import BeautifulSoup

# String to parse (the sample document from the Beautiful Soup documentation)
html_doc = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create a BeautifulSoup object from the HTML string.
# from_encoding applies only to bytes input, so it is omitted for a str.
soup = BeautifulSoup(html_doc, 'html.parser')

# Print the first title tag
print(soup.title)
# Print the tag name of the first title tag
print(soup.title.name)
# Print the text contained in the first title tag
print(soup.title.string)
# Print the tag name of the first title tag's parent
print(soup.title.parent.name)
# Print the first p tag
print(soup.p)
# Print the class attribute of the first p tag
print(soup.p['class'])
# Print the href attribute of the first a tag
print(soup.a['href'])

# Tag attributes can be added, deleted, or modified.
# They are manipulated the same way as entries in a dictionary.

# Change the href attribute of the first a tag to http://www.baidu.com/
soup.a['href'] = 'http://www.baidu.com/'
# Add a name attribute to the first a tag
soup.a['name'] = '百度'
# Delete the class attribute of the first a tag
del soup.a['class']

# Print all child nodes of the first p tag
print(soup.p.contents)
# Print the first a tag
print(soup.a)
# Print all a tags as a list
print(soup.find_all('a'))
# Print the first tag whose id attribute equals link3
print(soup.find(id='link3'))
# Get all text content
print(soup.get_text())
# Print all attributes of the first a tag
print(soup.a.attrs)

# Print the href attribute of every a tag
for link in soup.find_all('a'):
    print(link.get('href'))

# Loop over the children of soup.p and print each one
for child in soup.p.children:
    print(child)

# Regular-expression match: tags whose names contain "b"
for tag in soup.find_all(re.compile('b')):
    print(tag.name)
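
Indexing a tag with tag['href'] raises KeyError when the attribute is missing, while tag.get('href') returns None. For pages whose markup is not under your control, the second form is usually the safer choice. The following is a minimal sketch rather than part of the original example; extract_links and SAMPLE_HTML are hypothetical names introduced here for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the second link deliberately has no href.
SAMPLE_HTML = """
<p class="story">Once upon a time there were three little sisters:
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>.</p>
"""


def extract_links(html):
    """Return one dict per <a> tag; missing attributes become None."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for tag in soup.find_all("a"):
        links.append({
            "id": tag.get("id"),
            "href": tag.get("href"),  # None instead of a KeyError
            "text": tag.get_text(strip=True),
        })
    return links


if __name__ == "__main__":
    for link in extract_links(SAMPLE_HTML):
        print(link)
```

If only tags that actually carry the attribute are wanted, soup.find_all("a", href=True) filters them at search time.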

Crawler design approach:

[Figure: crawler design diagram]
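
The design itself is not described in the text here. As a rough illustration, and assuming the common loop of fetching a page, parsing it, extracting its links, and queueing the new ones (an assumption, not necessarily the design the author had in mind), a crawler built around BeautifulSoup might look like the sketch below. The names crawl, fetch, max_pages, and FAKE_SITE are hypothetical; FAKE_SITE stands in for real HTTP downloads so the sketch runs offline.

```python
from collections import deque
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl; fetch(url) must return HTML text or None."""
    queue = deque([start_url])   # URLs waiting to be downloaded
    seen = {start_url}           # URLs already queued, to avoid loops
    titles = {}                  # url -> page title (the "parsed data")
    while queue and len(titles) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:         # download failed or page unknown
            continue
        soup = BeautifulSoup(html, "html.parser")
        titles[url] = soup.title.get_text(strip=True) if soup.title else None
        # Queue every link not seen before, resolved against the current URL
        for tag in soup.find_all("a", href=True):
            link = urljoin(url, tag["href"])
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return titles


# Offline stand-in for a real downloader (assumption for illustration only).
FAKE_SITE = {
    "http://example.com/": '<title>Home</title><a href="/a">A</a><a href="/b">B</a>',
    "http://example.com/a": '<title>Page A</title><a href="/">home</a>',
    "http://example.com/b": '<title>Page B</title><a href="/a">A</a>',
}

if __name__ == "__main__":
    print(crawl("http://example.com/", FAKE_SITE.get))
```

Passing the downloader in as fetch keeps the parsing logic independent of the HTTP library; a real crawler would supply a function that downloads the page and returns its text.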

Detailed manual:

https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/
