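Requirement: write code that measures management tone in the annual reports of listed companies. It should report the positive (pos) tone, the negative (neg) tone, the number of positive words, the number of negative words, the total word count of the text, the number of stop words, and the total number of sentences, and then compute a management optimism index. The results must be written to Excel, sorted by year and company code. The code must run under Python 3, and the range of report years to analyze should be adjustable.
Environment:
A Python 3 implementation that measures management tone in listed companies' annual reports and writes the results to Excel.
The code uses a Chinese stop-word list and sentiment word lists for the text analysis. Before running it, make sure the following Python libraries are installed: jieba and pandas (installing pandas also installs numpy). Writing .xlsx files with pandas additionally requires an Excel engine such as openpyxl.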
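A quick way to confirm the libraries are present before running the analysis is sketched below. The helper name `missing_packages` is hypothetical, and listing openpyxl assumes that pandas will use it as the .xlsx engine in your setup.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found as importable modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# jieba and pandas are imported by the code in this post; openpyxl is an
# assumption about which Excel engine pandas will use to write .xlsx files.
required = ["jieba", "pandas", "openpyxl"]
missing = missing_packages(required)
if missing:
    print("Missing packages, install with: pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```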
Implementation:
- import os
- import re
-
- import jieba
- import pandas as pd
-
-
- def load_wordlist(path):
-     """Read a UTF-8 file with one word per line into a set, skipping blank lines."""
-     with open(path, 'r', encoding='utf-8') as f:
-         return {line.strip() for line in f if line.strip()}
-
-
- # Stop-word list
- stopwords_path = 'stopwords.txt'
- stopwords = load_wordlist(stopwords_path)
-
- # Sentiment word lists
- posdict_path = 'posdict.txt'
- posdict = load_wordlist(posdict_path)
-
- negdict_path = 'negdict.txt'
- negdict = load_wordlist(negdict_path)
-
-
- # Tokenize the text; return the words left after stop-word removal
- # together with the number of stop words that were removed
- def segment(text):
-     # Drop the whitespace-only tokens that jieba emits for spaces
-     tokens = [word for word in jieba.cut(text) if word.strip()]
-     filtered_words = [word for word in tokens if word not in stopwords]
-     return filtered_words, len(tokens) - len(filtered_words)
-
-
- # Count positive sentences, negative sentences, positive words, negative words,
- # total words (after stop-word removal), stop words, and sentences
- def analyze_text(text):
-     pos_count = 0        # sentences with more positive than negative words
-     neg_count = 0        # sentences with more negative than positive words
-     pos_word_count = 0
-     neg_word_count = 0
-     word_count = 0
-     stopword_count = 0
-     sentence_count = 0
-
-     # Split the text into paragraphs
-     paragraphs = re.split(r'[\r\n]+', text)
-
-     for para in paragraphs:
-         if not para.strip():
-             continue
-         # Split the paragraph into sentences (full-width and ASCII punctuation)
-         sentences = re.split('[。!?!?]', para)
-         for sentence in sentences:
-             if not sentence.strip():
-                 continue
-             # Tokenize the sentence and filter out stop words
-             words, removed = segment(sentence)
-             sent_pos = sum(1 for word in words if word in posdict)
-             sent_neg = sum(1 for word in words if word in negdict)
-             word_count += len(words)
-             stopword_count += removed
-             pos_word_count += sent_pos
-             neg_word_count += sent_neg
-             # Classify the sentence by its own counts, not the running totals
-             if sent_pos > sent_neg:
-                 pos_count += 1
-             elif sent_pos < sent_neg:
-                 neg_count += 1
-             sentence_count += 1
-
-     return pos_count, neg_count, pos_word_count, neg_word_count, word_count, stopword_count, sentence_count
-
-
- # Optimism index: share of positive sentences among sentences with a clear tone.
- # sentence_count is accepted for interface compatibility but is not used.
- def calculate_optimism(pos_count, neg_count, sentence_count):
-     if pos_count + neg_count == 0:
-         return 0
-     optimism = pos_count / (pos_count + neg_count)
-     return optimism
-
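
To make the sentence-level tone and the optimism index concrete, here is a small worked example. It is only a sketch: the word lists and the sample text are invented for illustration, and plain substring counting stands in for jieba tokenization so that the snippet runs without any third-party library.

```python
import re

# Toy word lists, invented purely for this illustration.
POS_WORDS = {"增长", "提升", "盈利"}
NEG_WORDS = {"亏损", "下滑", "风险"}


def split_sentences(text):
    """Split on sentence-ending punctuation and drop empty pieces."""
    return [s.strip() for s in re.split(r"[。!?!?]", text) if s.strip()]


def count_hits(sentence, vocabulary):
    """Count vocabulary words by substring matching (a stand-in for jieba tokens)."""
    return sum(sentence.count(word) for word in vocabulary)


def optimism_index(text):
    """Share of positive sentences among sentences that lean one way or the other."""
    pos_sentences = 0
    neg_sentences = 0
    for sentence in split_sentences(text):
        pos_hits = count_hits(sentence, POS_WORDS)
        neg_hits = count_hits(sentence, NEG_WORDS)
        if pos_hits > neg_hits:
            pos_sentences += 1
        elif neg_hits > pos_hits:
            neg_sentences += 1
    decided = pos_sentences + neg_sentences
    return pos_sentences / decided if decided else 0


sample = "公司营业收入增长,盈利能力提升。受市场影响,海外业务出现亏损!公司将继续推进各项工作。"
print(optimism_index(sample))  # one positive, one negative, one neutral sentence -> 0.5
```

The third sentence contains no sentiment words, so it counts toward the sentence total but not toward the index, which is why the result is 1 / (1 + 1).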

The code below reads each annual report, analyzes its text, computes the metrics above, and writes them to Excel:
- # Read one annual report file and analyze its text
- def analyze_report(file_path):
-     with open(file_path, 'r', encoding='utf-8') as f:
-         text = f.read()
-     (pos_count, neg_count, pos_word_count, neg_word_count,
-      word_count, stopword_count, sentence_count) = analyze_text(text)
-     optimism = calculate_optimism(pos_count, neg_count, sentence_count)
-     return {
-         'pos_count': pos_count,
-         'neg_count': neg_count,
-         'pos_word_count': pos_word_count,
-         'neg_word_count': neg_word_count,
-         'word_count': word_count,
-         'stopword_count': stopword_count,
-         'sentence_count': sentence_count,
-         'optimism': optimism
-     }
-
-
- # Read and analyze every annual report file under a directory.
- # The company code is the part of the file name before the first underscore.
- # Assumption: the second part is the report year, e.g. 000001_2020.txt;
- # when it is missing or not numeric, the year is left empty.
- def analyze_reports(dir_path):
-     company_reports = {}
-     for root, dirs, files in os.walk(dir_path):
-         for file in files:
-             if file.endswith('.txt'):
-                 parts = os.path.splitext(file)[0].split('_')
-                 company_code = parts[0]
-                 year = int(parts[1]) if len(parts) > 1 and parts[1].isdigit() else None
-                 file_path = os.path.join(root, file)
-                 report_data = analyze_report(file_path)
-                 report_data['year'] = year
-                 company_reports.setdefault(company_code, []).append(report_data)
-     return company_reports
-
-
- # Collect the data in a pandas DataFrame and write it to an Excel file
- def save_to_excel(company_reports, output_path):
-     columns = ['year', 'company_code', 'pos_count', 'neg_count',
-                'pos_word_count', 'neg_word_count', 'word_count',
-                'stopword_count', 'sentence_count', 'optimism']
-     rows = []
-     for company_code, reports in company_reports.items():
-         for report in reports:
-             row = {'company_code': company_code, 'year': report.get('year')}
-             row.update({key: report[key] for key in columns[2:]})
-             rows.append(row)
-     df = pd.DataFrame(rows, columns=columns)
-     # Sort by year, then by company code
-     df = df.sort_values(['year', 'company_code'])
-     df.to_excel(output_path, index=False)
-
-
- # Directory holding the annual report files, and the output Excel path
- dir_path = 'annual_reports'
- output_path = 'annual_report_data.xlsx'
-
- # Analyze the annual reports and save the data to the Excel file
- company_reports = analyze_reports(dir_path)
- save_to_excel(company_reports, output_path)
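
The requirement also asks that the range of report years be adjustable. One possible way to do that, sketched here under the assumption that files are named `<company_code>_<year>.txt` (the post itself only fixes the company code before the first underscore), is to filter file names by year before analyzing them. `REPORT_NAME` and `select_reports` are hypothetical names introduced for this example.

```python
import re

# Assumed naming scheme: <company_code>_<year>.txt, e.g. 000001_2020.txt.
REPORT_NAME = re.compile(r"^(?P<code>[^_]+)_(?P<year>\d{4})\.txt$")


def select_reports(filenames, start_year, end_year):
    """Return (year, company_code, filename) tuples with start_year <= year <= end_year,
    sorted by year and then by company code."""
    selected = []
    for name in filenames:
        match = REPORT_NAME.match(name)
        if match is None:
            continue  # skip files that do not follow the assumed naming scheme
        year = int(match.group("year"))
        if start_year <= year <= end_year:
            selected.append((year, match.group("code"), name))
    return sorted(selected)


names = ["000001_2019.txt", "600000_2020.txt", "000001_2021.txt",
         "000002_2020.txt", "notes.md"]
for year, code, name in select_reports(names, 2020, 2021):
    print(year, code, name)
```

In practice the list of names could come from `os.listdir('annual_reports')`, and only the selected files would then be passed on for analysis.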

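Before using this code, place the stop-word list, the positive word list, and the negative word list in the same directory as the script, named stopwords.txt, posdict.txt, and negdict.txt respectively.
In addition, place the annual reports to be analyzed, as UTF-8 .txt files, in a folder named annual_reports. The code takes the part of each file name before the first underscore as the company code.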
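The layout described above can be checked before a long run with a short script. This is a minimal sketch; the helper name `missing_inputs` is hypothetical, and it only checks that the paths exist, not that their contents are valid.

```python
from pathlib import Path

# File and folder names as given in the text above.
REQUIRED_FILES = ("stopwords.txt", "posdict.txt", "negdict.txt")
REPORT_DIR = "annual_reports"


def missing_inputs(base_dir="."):
    """Return the required word-list files and report folder missing from base_dir."""
    base = Path(base_dir)
    missing = [name for name in REQUIRED_FILES if not (base / name).is_file()]
    if not (base / REPORT_DIR).is_dir():
        missing.append(REPORT_DIR + "/")
    return missing


problems = missing_inputs()
if problems:
    print("Missing inputs:", ", ".join(problems))
else:
    print("All input files and folders are in place.")
```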
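After the full script has run, an Excel file is created at the specified output path. It contains the optimism index and the other analysis metrics for each annual report, sorted by company code and year.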