How to get the full source code of a web page in Python

Date: 2022-07-15 13:20:58

1. Code to fetch an entire page in Python:

import requests

# Fetch the page, then decode the body as UTF-8 before printing it.
res = requests.get('https://blog.csdn.net/yirexiao/article/details/79092355')
res.encoding = 'utf-8'
print(res.text)

2. Result: running the script prints the page's HTML source to the console.
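The snippet in step 1 hardcodes 'utf-8', so a page served in another encoding such as GBK would print as garbled text. Below is a minimal sketch of one way to pick the encoding from the Content-Type header instead. It uses only the standard library (urllib.request rather than requests), and the names charset_from_content_type, decode_page and fetch_html are hypothetical helpers introduced here for illustration, not part of the original article.

```python
import urllib.request


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset named in a Content-Type value, or default."""
    # Skip the media type itself and look through the parameters after it.
    for part in (content_type or "").split(";")[1:]:
        name, _, value = part.strip().partition("=")
        if name.lower() == "charset" and value:
            return value.strip("\"' ").lower()
    return default


def decode_page(body, content_type=None):
    """Decode raw page bytes using the declared charset when possible."""
    charset = charset_from_content_type(content_type)
    try:
        return body.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong charset: fall back to UTF-8, replacing bad bytes.
        return body.decode("utf-8", errors="replace")


def fetch_html(url, timeout=10):
    """Download url and return its source as text (assumed helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return decode_page(resp.read(), resp.headers.get("Content-Type"))
```

Calling print(fetch_html(url)) with the page address would then print the source much like step 1. When staying with requests, calling res.raise_for_status() before reading res.text is a comparable guard against printing an error page.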

Extended example:

# Requires Python 3 and the beautifulsoup4 package.
from bs4 import BeautifulSoup
import time
import urllib.request


def scanpage(url):
    """Request every link on the page that contains url; print status and timing."""
    start = time.time()
    html = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')
    # Collect the unique links whose href contains the site URL.
    page_urls = {}
    for link in soup.find_all('a', href=True):
        href = link.get('href')
        if url in href and href not in page_urls:
            page_urls[href] = 0
    n = 0
    for href in page_urls:
        t1 = time.time()
        try:
            # Store the HTTP status code returned for this link.
            page_urls[href] = urllib.request.urlopen(href).getcode()
        except Exception:
            print('connect failed')
        else:
            print(n, href, page_urls[href])
            print(time.time() - t1)  # seconds spent on this request
        n += 1
    print('total is ' + repr(n) + ' links')
    print(time.time() - start)  # total elapsed seconds


scanpage('http://news.163.com/')
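The example above keeps a link only when its href contains the site URL, so relative links such as /a.html are skipped. Below is a minimal sketch of one way to resolve relative links first and then filter by host. It assumes only the standard library (html.parser and urllib.parse) instead of BeautifulSoup, and LinkCollector and same_site_links are hypothetical names introduced here for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(html, base_url):
    """Return unique absolute links in html that share base_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(base_url).netloc
    links = []
    for href in collector.hrefs:
        # Turn relative hrefs into absolute URLs before comparing hosts.
        absolute = urljoin(base_url, href)
        if urlparse(absolute).netloc == host and absolute not in links:
            links.append(absolute)
    return links
```

The returned list could replace the href filtering loop in scanpage, with each entry then requested and timed as before.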

