
Scraping Douban movie information with Python (paging with a for loop)

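For any website, the first step is to check whether the data you want is in the page source (the raw HTML the server returns).
If it is:
   request that URL directly.
If it is not:
   use a packet-capture tool (for example the Network panel of the browser's DevTools) to find out which URL actually loads the data.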
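The check described above can be sketched in a few lines. This is a minimal illustration, not the author's code: `data_in_page_source` and `choose_strategy` are hypothetical helper names, and the two page sources are made-up strings standing in for `resp.text`.

```python
import html


def data_in_page_source(page_source: str, sample_text: str) -> bool:
    """Return True if sample_text appears in the raw HTML sent by the server.

    page_source should be the raw response body (e.g. resp.text), not the
    DOM shown in the browser after JavaScript has run. html.unescape()
    turns entities such as &amp; back into characters, so a title like
    "Tom & Jerry" is still found.
    """
    return sample_text in html.unescape(page_source)


def choose_strategy(page_source: str, sample_text: str) -> str:
    """Pick one of the two approaches from the text above."""
    if data_in_page_source(page_source, sample_text):
        return "request the page URL directly and parse the HTML"
    return "open DevTools > Network and find the URL that loads the data"


# Hypothetical page sources, made up for illustration
server_rendered = "<ul><li>Tom &amp; Jerry</li></ul>"
js_rendered = '<div id="app"></div><script src="app.js"></script>'

print(choose_strategy(server_rendered, "Tom & Jerry"))
print(choose_strategy(js_rendered, "Tom & Jerry"))
```

In practice, pick a value you can see on the rendered page (a movie title, for instance) and search for it in the raw source; if it is missing, the page most likely fetches it later via a separate request.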


Option 1: put all the query parameters directly in the URL. This works, but the URL gets long and hard to read.
```python
import requests

url = "https://movie.douban.com/j/chart/top_list?type=13&interval_id=100%3A90&action=&start=0&limit=20"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36"
}
resp = requests.get(url, headers=headers)
# If resp.json() raises "requests.exceptions.JSONDecodeError: Expecting value",
# the response body is not JSON; print the raw text to see what came back.
print(resp.text)
dic = resp.json()
print(dic)
```
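To turn the long URL of Option 1 into a readable parameter dict, the standard library's `urllib.parse` can split it apart. A small sketch using the URL from the code above; `long_url`, `base_url` and `params` are illustrative names:

```python
from urllib.parse import parse_qsl, urlsplit

long_url = ("https://movie.douban.com/j/chart/top_list"
            "?type=13&interval_id=100%3A90&action=&start=0&limit=20")

parts = urlsplit(long_url)
# keep_blank_values=True keeps "action=" (an empty value) instead of dropping it;
# parse_qsl also decodes %3A back into ":"
params = dict(parse_qsl(parts.query, keep_blank_values=True))
base_url = f"{parts.scheme}://{parts.netloc}{parts.path}"

print(base_url)  # https://movie.douban.com/j/chart/top_list
print(params)
# {'type': '13', 'interval_id': '100:90', 'action': '', 'start': '0', 'limit': '20'}
```

The resulting dict is exactly the one used in Option 2.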
Option 2: keep the base URL short and pass the query parameters as a dict through `params`.
```python
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36"
}
url = "https://movie.douban.com/j/chart/top_list"
dic = {
    "type": "13",
    "interval_id": "100:90",
    "action": "",
    "start": "0",  # start=0 -> page 1, 20 -> page 2, 40 -> page 3
    "limit": "20",
}
# Send a GET request; requests appends dic to the URL as the query string
resp = requests.get(url, params=dic, headers=headers)
print(resp.json())
```
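To see what query string a params dict turns into, the standard library's `urlencode` can be used offline. As far as I know, requests builds the query string from `params` in a similar way, so this sketch suggests Option 2 sends the same URL as Option 1 (the `full_url` name is illustrative):

```python
from urllib.parse import urlencode

params = {
    "type": "13",
    "interval_id": "100:90",
    "action": "",
    "start": "0",
    "limit": "20",
}
# urlencode percent-encodes each value, so ":" becomes "%3A"
query = urlencode(params)
full_url = "https://movie.douban.com/j/chart/top_list?" + query

print(query)  # type=13&interval_id=100%3A90&action=&start=0&limit=20
```

This is why the dict can use the readable `"100:90"` while the hand-written URL in Option 1 needs `100%3A90`.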

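Implementation: fetch the first 5 pages (up to 100 movies) and write one `genre|title|url` line per movie to douban.txt.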
```python
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36"
}
url = "https://movie.douban.com/j/chart/top_list"

with open("douban.txt", mode="w", encoding="utf-8") as f:
    for i in range(5):
        start = i * 20  # 0, 20, 40, 60, 80
        # Each iteration builds a new set of parameters for the next page
        dic = {
            "type": "13",
            "interval_id": "100:90",
            "action": "",
            "start": start,  # 0 -> page 1, 20 -> page 2, 40 -> page 3
            "limit": "20",
        }
        resp = requests.get(url, params=dic, headers=headers)
        # Each element of the returned JSON list describes one movie
        for item in resp.json():
            types = item["types"]  # list of genre strings
            # Take the second genre as before, but avoid IndexError on short lists
            genre = types[1] if len(types) > 1 else (types[0] if types else "")
            title = item["title"]
            movie_url = item["url"]  # separate name so the request URL is not overwritten
            f.write(f"{genre}|{title}|{movie_url}\n")
```
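The `start` values in the loop follow a simple rule: page n (counting from 1) starts at offset (n - 1) × limit. A minimal sketch that wraps this rule in a generator so the loop body only has to send the request; `page_params` is a hypothetical helper name, and the parameter values are copied from the code above:

```python
def page_params(pages, limit=20):
    """Yield the query parameters for pages 1..pages.

    Page n (1-based) starts at offset (n - 1) * limit, so with limit=20
    the offsets are 0, 20, 40, ...
    """
    for i in range(pages):  # i == n - 1
        yield {
            "type": "13",
            "interval_id": "100:90",
            "action": "",
            "start": i * limit,
            "limit": limit,
        }


print([p["start"] for p in page_params(5)])  # [0, 20, 40, 60, 80]
```

In the crawler, the loop header would then read `for dic in page_params(5):`, followed by `resp = requests.get(url, params=dic, headers=headers)`. Passing the numbers as ints is fine, since they are converted to strings when the query string is built (the original loop already does this for `start`).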
Alternatively, the start of the loop can build the URL with an f-string instead of a params dict:
```python
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36"
}
for i in range(5):
    # Parameters are written straight into the URL; start = i*20 selects the page
    url = f"https://movie.douban.com/j/chart/top_list?type=13&interval_id=100%3A90&action=&start={i*20}&limit=20"
    resp = requests.get(url, headers=headers)
    lst = resp.json()  # list of movie dicts; process it as in the loop above
```
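With the f-string approach nothing is percent-encoded for you, which is why the URL contains `100%3A90` rather than `100:90`. If you prefer to keep the readable value in a variable, `urllib.parse.quote` can encode it inside the f-string. A sketch under that assumption; `BASE` and `interval_id` are illustrative names:

```python
from urllib.parse import quote

BASE = "https://movie.douban.com/j/chart/top_list"
interval_id = "100:90"  # readable form, as in the params dict of Option 2

urls = []
for i in range(5):
    # quote(..., safe="") turns ":" into "%3A", matching the hand-encoded URL
    urls.append(f"{BASE}?type=13&interval_id={quote(interval_id, safe='')}"
                f"&action=&start={i * 20}&limit=20")

print(urls[0])
# https://movie.douban.com/j/chart/top_list?type=13&interval_id=100%3A90&action=&start=0&limit=20
```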

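Statement: This article was contributed by a user and does not represent the position of the wpsshop blog; copyright belongs to the original author. Please cite the source when reposting: https://www.wpsshop.cn/w/我家自动化/article/detail/393356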