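If you are just starting to scrape Weibo data, be sure to use the mobile version of the Weibo site: https://m.weibo.cn/
Use Google Chrome, and analyze the site structure first.
Open the blogger's main page, right-click, choose Inspect, then open the Network tab.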





Click Headers; the containerid and type values are found here.
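To make the link between the Headers panel and the request concrete, here is a minimal sketch of how those query parameters become the request URL. The helper name build_url is hypothetical (it is not part of any library); the containerid and value shown are the ones used in this post's script, and you would substitute the values you see in your own Headers panel.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = 'https://m.weibo.cn/api/container/getIndex?'


def build_url(containerid, value, page):
    """Hypothetical helper: join the base URL with the encoded query string."""
    params = {'containerid': containerid, 'value': value, 'page': page}
    return BASE_URL + urlencode(params)


url = build_url('1076033937348351', '3937348351', 1)
print(url)

# Decode the query string again to compare it with what Headers shows
query = parse_qs(urlsplit(url).query)
print(query)
```

Comparing the printed URL with the Request URL shown in Headers is a quick way to check that no parameter was mistyped.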
Then switch to Preview to see the structure of the response.
data, cards, mblog, and the fields you want to scrape are all here.
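The nesting seen in Preview can be walked like an ordinary Python dict. The sketch below uses a hand-written sample payload, so every value in it is made up and only the key names (data, cards, mblog, user, and the count fields) come from this post. The function name iter_posts is hypothetical, and the second card is there only to illustrate the assumption that a card may lack an mblog.

```python
# Hypothetical, trimmed-down payload: values are invented for illustration
sample = {
    'data': {
        'cards': [
            {'mblog': {
                'id': '1',
                'text': 'hello world',
                'user': {'screen_name': 'demo_user'},
                'attitudes_count': 3,
                'comments_count': 1,
                'reposts_count': 0,
            }},
            {'card_type': 'other'},  # assumed case: a card without mblog
        ]
    }
}


def iter_posts(res_json):
    """Yield the mblog dict of every card that has one."""
    cards = res_json.get('data', {}).get('cards', [])
    for card in cards:
        mblog = card.get('mblog')
        if mblog:
            yield mblog


posts = list(iter_posts(sample))
for mblog in posts:
    print(mblog['id'], mblog['user']['screen_name'], mblog['attitudes_count'])
```

Using .get() at each level means a response without data, or a card without mblog, is skipped rather than raising a KeyError.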

import requests
from urllib.parse import urlencode
from pyquery import PyQuery

base_url = 'https://m.weibo.cn/api/container/getIndex?'


# Request method: fetch the data for a given page number
def get_page(page):
    params = {
        'containerid': '1076033937348351',
        'value': '3937348351',
        'page': page
    }
    response = requests.get(base_url + urlencode(params))
    return response.json()


# Parse the data
def parse_data(res_json):
    if res_json.get('data'):
        for node in res_json['data']['cards']:
            mblog = node.get('mblog')
            if not mblog:
                # Skip cards that carry no mblog, to avoid a KeyError
                continue
            item = dict()
            # The text field contains HTML; PyQuery extracts the plain text
            item['text'] = PyQuery(mblog['text']).text()
            item['id'] = mblog['id']
            item['screen_name'] = mblog['user']['screen_name']
            item['attitudes_count'] = mblog['attitudes_count']
            item['comments'] = mblog['comments_count']
            item['reposts_count'] = mblog['reposts_count']
            print(item)


def main():
    # Pages 1 through 19
    for page in range(1, 20):
        res_json = get_page(page)
        parse_data(res_json)


if __name__ == '__main__':
    main()
Running this prints the scraped posts. The next post covers how to import them into Excel.
