Install the requests and bs4 libraries beforehand (the lxml parser used below must be installed as well):
- # Two common ways for a scraper to parse HTML text: BeautifulSoup from bs4 and etree from lxml.
- # This article shows the first approach, BeautifulSoup from bs4.
- import requests
- from bs4 import BeautifulSoup
- # URL of the page to scrape
- url = "https://tophub.today/n/KqndgxeLl9"
- # Send a browser-like User-Agent (open chrome://version in Chrome to find yours)
- header = {'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.82 Safari/537.36"}
- # Fetch the page
- response = requests.get(url, headers=header)
- res = response.text  # response body decoded as a string (str)
- soup = BeautifulSoup(res, features="lxml")
- # Select every <td> cell that sits inside a table row
- heat = soup.select('tr td')
-
- # Print the text of each cell
- for h in heat:
-     print(h.get_text())
- s = 'num,title,heat\n'
-
- # Treat every four cells as one row; keep the first three and drop the fourth
- for i in range(0, len(heat), 4):
-     s += ",".join(td.get_text().strip() for td in heat[i:i + 3]) + "\n"
-
- with open('1.csv', 'w', newline='', encoding='utf8') as fw:
-     fw.write(s)
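
The opening comment mentions a second approach, etree from lxml, which the listing does not show. The following is a minimal sketch of that approach, assuming the same table cells are wanted. The function name `cell_texts` and the inline sample HTML are illustrative and do not come from the original article.

```python
from lxml import etree


def cell_texts(html):
    """Return the text of every <td> that is a direct child of a <tr>."""
    # etree.HTML parses (possibly malformed) HTML and returns the root element
    root = etree.HTML(html)
    # itertext() walks nested tags such as <a>, so link text is included
    return ["".join(td.itertext()).strip() for td in root.xpath("//tr/td")]


if __name__ == "__main__":
    # Hypothetical sample markup standing in for the downloaded page
    sample = (
        "<table><tr>"
        "<td>1</td><td><a href='#'>Some title</a></td><td>12345</td><td></td>"
        "</tr></table>"
    )
    for text in cell_texts(sample):
        print(text)
```

To use it on the live page, pass `response.text` from the `requests.get` call to `cell_texts` instead of the sample string.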

Running the script prints the text of each table cell and writes the rows to 1.csv.
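
Joining cells by hand with commas can produce a malformed file when a title itself contains a comma. One possible alternative is Python's standard `csv` module, which quotes such fields. The sketch below assumes, as the script does, that every four cells form one row and that only the first three are kept. The helper name `cells_to_csv` and the sample data are made up for illustration.

```python
import csv
import io


def cells_to_csv(cells, cells_per_row=4, keep=3, header=("num", "title", "heat")):
    """Group a flat list of cell strings into rows and return CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    for start in range(0, len(cells), cells_per_row):
        row = [cell.strip() for cell in cells[start:start + cells_per_row]]
        # Keep only the leading columns; the writer quotes fields when needed
        writer.writerow(row[:keep])
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical cell texts, in the order soup.select('tr td') would yield them
    sample = ["1", "Hello, world", "12345", "", "2", "Second title", "9876", ""]
    print(cells_to_csv(sample))
```

In the script, the list could be built with `[h.get_text() for h in heat]` and the returned text written to `1.csv` with the same `open(...)` call.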
