
App Scraping with mitmdump (draft, to be revised)


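1. Preparation

  • Install mitmproxy and mitmdump (on Windows, older releases do not support the mitmproxy console UI, so use mitmdump there)
  • Configure the phone to use the proxy on the machine running mitmdump, port 8080
  • Install and trust the mitmproxy CA certificate on the phone
  • Install and start MongoDB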

2. Write the script, script.py

3. Run mitmdump

    mitmdump -s script.py

4. Use the app on the phone; the matching output is printed as the traffic passes through the proxy
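Before running the full script, it can help to confirm that the phone's traffic actually reaches mitmdump. The following is a minimal sketch of such a check, not part of the original article: the file name `smoke_test.py` and the helper `summarize` are hypothetical, and the code assumes only the `flow.request.method`, `flow.request.url`, `flow.response.status_code` and `flow.response.content` attributes that mitmproxy exposes on an HTTP flow.

```python
# smoke_test.py (hypothetical file name): run with `mitmdump -s smoke_test.py`

def summarize(flow):
    """Build a one-line summary of a completed HTTP flow."""
    # content may be None when the body is missing, so fall back to empty bytes
    size = len(flow.response.content or b"")
    return f"{flow.request.method} {flow.response.status_code} {size}B {flow.request.url}"


def response(flow):
    """Hook that mitmdump calls once for every completed response."""
    print(summarize(flow))
```

If lines for `api.m.jd.com` or `cdnware.m.jd.com` show up while browsing in the app, the proxy and certificate are probably set up correctly and the full script can be used instead.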

The full script.py:

    import json
    import re
    from urllib.parse import unquote

    import pymongo

    client = pymongo.MongoClient('localhost', 27017)
    db = client['jd']
    comments_collection = db['comments']
    products_collection = db['products']

    # The request parameters carry the product ID as "sku":"<digits>"
    SKU_PATTERN = re.compile(r'sku".*?"(\d+)"')


    def response(flow):
        # Extract comment data
        if 'api.m.jd.com/client.action' in flow.request.url:
            # The request body is URL-encoded and contains the product ID
            body = unquote(flow.request.text)
            # Extract the product ID (None if the body has no sku field)
            match = SKU_PATTERN.search(body)
            product_id = match.group(1) if match else None
            # Parse the response body
            data = json.loads(flow.response.text)
            comments = data.get('commentInfoList') or []
            # Store every comment that has text
            for comment in comments:
                info = comment.get('commentInfo')
                if info and info.get('commentData'):
                    text = info.get('commentData')
                    date = info.get('commentDate')
                    nickname = info.get('userNickName')
                    pictures = info.get('pictureInfoList')
                    print(product_id, nickname, text, date)
                    # insert() was removed in PyMongo 4; use insert_one()
                    comments_collection.insert_one({
                        'id': product_id,
                        'text': text,
                        'date': date,
                        'nickname': nickname,
                        'pictures': pictures
                    })

        # Extract product data
        if 'cdnware.m.jd.com' in flow.request.url:
            data = json.loads(flow.response.text)
            info = (data.get('wareInfo') or {}).get('basicInfo')
            if info:
                product_id = info.get('wareId')
                name = info.get('name')
                images = info.get('wareImage')
                print(product_id, name, images)
                products_collection.insert_one({
                    'id': product_id,
                    'name': name,
                    'images': images
                })

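This code may not run as-is. One option is to wrap the database storage operations in a separate interface and call that interface from the main body of the script.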
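One way to read the suggestion above is to keep parsing and storage apart, so the parsing can be checked without a phone or a running MongoDB. The sketch below is only an illustration under that assumption: `extract_sku`, `parse_comments`, `Storage` and `InMemoryCollection` are hypothetical names, and the field names are the ones the script already uses. `Storage` works with any object that has an `insert_one()` method, which includes a pymongo collection.

```python
import json
import re
from urllib.parse import unquote

SKU_PATTERN = re.compile(r'sku".*?"(\d+)"')


def extract_sku(request_body):
    """Return the product ID from a URL-encoded request body, or None."""
    match = SKU_PATTERN.search(unquote(request_body or ""))
    return match.group(1) if match else None


def parse_comments(response_text, sku):
    """Convert a comment response body into a list of documents to store."""
    try:
        data = json.loads(response_text)
    except (TypeError, ValueError):
        # Not JSON (or no body at all): nothing to store
        return []
    if not isinstance(data, dict):
        return []
    docs = []
    for comment in data.get('commentInfoList') or []:
        info = comment.get('commentInfo') or {}
        if not info.get('commentData'):
            continue
        docs.append({
            'id': sku,
            'text': info.get('commentData'),
            'date': info.get('commentDate'),
            'nickname': info.get('userNickName'),
            'pictures': info.get('pictureInfoList'),
        })
    return docs


class Storage:
    """Hypothetical storage interface around anything with insert_one()."""

    def __init__(self, collection):
        self.collection = collection

    def save_many(self, docs):
        for doc in docs:
            self.collection.insert_one(doc)
        return len(docs)


class InMemoryCollection:
    """Stand-in for a pymongo collection, for trying things out offline."""

    def __init__(self):
        self.docs = []

    def insert_one(self, doc):
        self.docs.append(doc)
```

Inside `response(flow)`, the comment branch would then reduce to something like `storage.save_many(parse_comments(flow.response.text, extract_sku(flow.request.text)))`, with `storage = Storage(db['comments'])` created once at module level.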
