Python is one of the easiest programming languages to pick up, and it makes writing crawlers very convenient. A couple of days ago a site owner asked me to help him scrape a novel. After a quick look at the site, it had no anti-scraping measures, so for simplicity I didn't even bother with multithreading. The Python source is below.
import requests
from bs4 import BeautifulSoup

def getdata(url):
    """Fetch one chapter page and return its text."""
    response = requests.get(url)
    response.encoding = 'UTF-8'
    # use the explicit parser name; the original passed "html", which is not a parser
    gksoup = BeautifulSoup(response.text, "html.parser")
    article = gksoup.find('div', attrs={'class': 'tagCol'})
    content = article.find('p').get_text()
    return content

# Fetch the novel's index page and collect the chapter links.
response = requests.get(url="https://www.huangdizhijia.com/novel-7283.html")
response.encoding = 'UTF-8'
gksoup = BeautifulSoup(response.text, "html.parser")
article = gksoup.find('div', attrs={'class': 'tagCol'})
mllist = article.find_all('a')

with open('output.txt', 'w', encoding='utf-8') as file:
    for chapter in mllist:
        url = 'https://www.huangdizhijia.com' + chapter.get("href")
        file.write(chapter.text + '\n')   # chapter title
        print(chapter.text)
        file.write(getdata(url) + '\n')   # chapter body
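The link-extraction step above (find the div with class tagCol, then collect its a tags) can also be sketched with only the standard library, which is handy for testing the logic offline. The HTML snippet below is made up to mirror the site's assumed structure, not copied from the real page:

```python
from html.parser import HTMLParser

# Hypothetical snippet mirroring the assumed index-page structure:
# a <div class="tagCol"> whose <a> tags point at chapter pages.
SAMPLE = """
<div class="tagCol">
  <a href="/novel-7283-1.html">Chapter 1</a>
  <a href="/novel-7283-2.html">Chapter 2</a>
</div>
"""

class ChapterLinkParser(HTMLParser):
    """Collect (title, absolute URL) pairs for <a> tags inside div.tagCol."""
    def __init__(self):
        super().__init__()
        self.in_tagcol = False
        self.current_href = None
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "tagCol":
            self.in_tagcol = True
        elif tag == "a" and self.in_tagcol:
            self.current_href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "div":
            self.in_tagcol = False
        elif tag == "a":
            self.current_href = None

    def handle_data(self, data):
        # Text between <a> and </a> is the chapter title.
        if self.current_href and data.strip():
            self.links.append((data.strip(),
                               "https://www.huangdizhijia.com" + self.current_href))

parser = ChapterLinkParser()
parser.feed(SAMPLE)
for title, url in parser.links:
    print(title, url)
```

This is only a sketch of the extraction logic; the real script above relies on BeautifulSoup, which tolerates malformed HTML far better.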
That's all there is to it. While it runs, it prints each chapter title as that chapter is scraped, and when it finishes the whole novel is written to a txt file (output.txt) in the current directory.
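I skipped multithreading here, but for a larger site the per-chapter fetches could be parallelized with a thread pool. A minimal sketch, using a stand-in getdata so it runs without network access (the real version would fetch each chapter page):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the real getdata(url), which fetches a chapter page.
def getdata(url):
    return "content of " + url

urls = ["https://www.huangdizhijia.com/novel-7283-%d.html" % i
        for i in range(1, 4)]

# map() preserves input order, so chapters land in the output file in sequence
# even though the fetches run concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    chapters = list(pool.map(getdata, urls))

print(len(chapters))
```

Note that hammering a small site with many concurrent requests is impolite; a short delay between fetches is usually the kinder choice.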