Two common kinds of commercial crawlers:

Collecting posts and comments from forums and social platforms: Weibo, Zhihu, Tieba, Twitter
Collecting data from e-commerce sites: JD.com, Taobao

import pprint
import time

import openpyxl
import requests

workbook = openpyxl.Workbook()
sheet = workbook.active

# Crawl pages 1-10
for i in range(1, 11):
    url = ''  # the request URL is left blank in the source
    print(url)
    response = requests.get(url)
    data = response.json()
    # pprint.pprint(data)
    cards = data['data']['cards']
    # Each page holds about 10 posts
    for card in cards:
        mblog = card.get('mblog', None)
        if mblog:
            # Extract fields only when the card has content
            mid = mblog.get('mid', None)
            text = mblog.get('text', None)
            source = mblog.get('source', None)
            author_name = mblog.get('user', {}).get('screen_name', None)
            print([author_name, source, text, mid])
            sheet.append([author_name, source, text, mid])

            # Fetch the comments under this post
            comments_url = ''  # also left blank in the source
            comments_response = requests.get(comments_url)
            comments_data = comments_response.json()
            # print('comments_data', comments_data)
            data_list = comments_data.get('data', {}).get('data') or []
            # print('data_list', data_list)
            for comment in data_list:
                # print(comment)
                comment_text = comment['text']
                comment_mid = comment['mid']
                username = comment['user']['screen_name']
                print([username, comment_mid, comment_text])
                sheet.append([username, comment_mid, comment_text])
            # For the demo, request comments for one post only
            break
    time.sleep(5)  # pause between pages
    break  # for the demo, stop after the first page

workbook.save('微博数据.xlsx')
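The parsing step of the script can be tried without any network access by moving it into a function. The sketch below is only an illustration: `extract_posts` is a hypothetical helper name, and `sample` is a made-up payload that mimics the `data` → `cards` → `mblog` shape the script reads. It is not a real API response.

```python
def extract_posts(payload):
    """Return [author_name, source, text, mid] rows from one page of JSON."""
    rows = []
    # Tolerate missing keys so an unexpected response yields no rows
    cards = (payload.get('data') or {}).get('cards') or []
    for card in cards:
        mblog = card.get('mblog')
        if not mblog:
            continue  # cards without a post body are skipped
        user = mblog.get('user') or {}
        rows.append([
            user.get('screen_name'),
            mblog.get('source'),
            mblog.get('text'),
            mblog.get('mid'),
        ])
    return rows


# Made-up payload (an assumption for illustration, not real data)
sample = {
    'data': {
        'cards': [
            {'mblog': {'mid': '1', 'text': 'hello', 'source': 'web',
                       'user': {'screen_name': 'alice'}}},
            {'card_type': 11},  # no 'mblog' key, so it is skipped
        ]
    }
}

rows = extract_posts(sample)
print(rows)
```

Each returned row can be passed directly to `sheet.append(row)`, which keeps the request loop short and lets the parsing be checked on its own.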