
Sample code for scraping 《純陽劍尊》 from 頂點小說網 with Python

Views: 19 | Date: 2022-07-08 09:42:02

Scraping 《純陽劍尊》 from 頂點小說網

Code

import requests
from bs4 import BeautifulSoup

# Browser-like request header, used to get past basic anti-scraping checks
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36'
}


# Fetch the page and return its HTML text
def open_url(url):
    response = requests.get(url, headers=headers)
    # Let requests guess the encoding from the page content
    response.encoding = response.apparent_encoding
    html = response.text
    return html


# Extract the chapter title (the <h1> inside the first <dd>)
def get_title(html):
    soup = BeautifulSoup(html, 'lxml')
    title_tag = soup.find('dd')
    title = '\n' + title_tag.h1.get_text() + '\n'
    return title


# Extract the chapter body (every <dd id="contents"> tag)
def get_texts(html):
    soup2 = BeautifulSoup(html, 'lxml')
    text_tags = soup2.find_all('dd', id='contents')
    return text_tags


# Append the title to the output file
def save_title(filename, title):
    with open(filename, 'a+', encoding='utf-8') as file:
        file.write(title)


# Append the chapter text to the output file
def save_text(filename, text):
    with open(filename, 'a+', encoding='utf-8') as file:
        file.write(text)


# Main routine
def main():
    num = input('Which chapter of 《純陽劍尊》 do you want to download? (1-802) ')
    num = int(num)
    # Page IDs start right after 8184027, so chapter n lives at 8184027 + n
    number = 8184027 + num
    url = 'https://www.23us.so/files/article/html/15/15905/' + str(number) + '.html'
    filename = '純陽劍尊.txt'
    r = open_url(url)
    title = get_title(r)
    tags = get_texts(r)
    save_title(filename, title)
    for text_tag in tags:
        text = text_tag.get_text() + '\n'
        save_text(filename, text)
    print('Chapter {} has been downloaded!'.format(num))


if __name__ == '__main__':
    main()
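The main routine turns a chapter number into a page URL by adding it to a fixed offset (8184027). Below is a minimal sketch of that mapping as a standalone helper, with a range check that the original code does not perform. The names `chapter_url`, `BASE_URL`, `CHAPTER_OFFSET` and `MAX_CHAPTER` are hypothetical. The offset, URL prefix and 1-802 range are taken from the code above, and the sketch assumes, as that code does, that page IDs are consecutive.

```python
# Hypothetical helper names; the values come from the article's code.
BASE_URL = 'https://www.23us.so/files/article/html/15/15905/'
CHAPTER_OFFSET = 8184027
MAX_CHAPTER = 802


def chapter_url(num):
    """Return the page URL for chapter `num`, assuming consecutive page IDs."""
    if not 1 <= num <= MAX_CHAPTER:
        raise ValueError('chapter must be between 1 and {}'.format(MAX_CHAPTER))
    return '{}{}.html'.format(BASE_URL, CHAPTER_OFFSET + num)


if __name__ == '__main__':
    # Build the URL for the first chapter without sending any request
    print(chapter_url(1))
```

Rejecting out-of-range input before any request is sent avoids fetching a page that belongs to a different book or does not exist.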

Scraping result:

[Image: screenshot of the scraping result]
