Measuring Management Tone in Listed Companies' Annual Reports (Python Implementation)

Author: 盐析白兔 | 2024-06-15 00:41:34

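Requirement: write code that measures management tone in listed companies' annual reports. It should report the positive (pos) tone, the negative (neg) tone, the number of positive words, the number of negative words, the total word count, the number of stopwords, and the total number of sentences, and finally compute a management optimism indicator. The results are written to Excel, ordered by year and company code. The code must run on Python 3, and the range of report years should be adjustable.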

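Environment:

A Python 3 implementation that measures management tone in listed companies' annual reports and writes the results to Excel.

The code relies on a Chinese stopword list and sentiment word lists for the text analysis. Before running it, make sure the following Python libraries are installed:

jieba
pandas
openpyxl (used by pandas to write .xlsx files)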

Implementation:


import jieba
import pandas as pd
import os
import re
 
 
# Load a word list (one entry per line) into a set, closing the file afterwards
def load_wordlist(path):
    with open(path, 'r', encoding='utf-8') as f:
        return {line.strip() for line in f if line.strip()}
 
 
# Stopword list
stopwords_path = 'stopwords.txt'
stopwords = load_wordlist(stopwords_path)
 
# Sentiment word lists
posdict_path = 'posdict.txt'
posdict = load_wordlist(posdict_path)
 
negdict_path = 'negdict.txt'
negdict = load_wordlist(negdict_path)
 
 
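# Tokenize the text with jieba; return all tokens and the tokens left after stopword removal
def segment(text):
    # Whitespace-only tokens are dropped so they are not counted as words
    tokens = [word for word in jieba.cut(text) if word.strip()]
    filtered_words = [word for word in tokens if word not in stopwords]
    return tokens, filtered_words
 
 
# Count positive/negative sentences, positive/negative words, total words, stopwords, and sentences
def analyze_text(text):
    pos_count = 0        # sentences with more positive than negative words
    neg_count = 0        # sentences with more negative than positive words
    pos_word_count = 0
    neg_word_count = 0
    word_count = 0       # words remaining after stopword removal
    stopword_count = 0
    sentence_count = 0
    
    # Split the text into paragraphs
    paragraphs = re.split(r'\n|\r', text)
    
    for para in paragraphs:
        if not para.strip():
            continue
        # Split the paragraph into sentences
        sentences = re.split('[。！？]', para)
        for sentence in sentences:
            if not sentence.strip():
                continue
            # Tokenize the sentence; stopwords are counted before they are filtered out
            tokens, words = segment(sentence)
            word_count += len(words)
            stopword_count += len(tokens) - len(words)
            # Sentiment word counts for this sentence only
            sentence_pos = sum(1 for word in words if word in posdict)
            sentence_neg = sum(1 for word in words if word in negdict)
            pos_word_count += sentence_pos
            neg_word_count += sentence_neg
            # Classify the sentence by its own counts, not by the running totals
            if sentence_pos > sentence_neg:
                pos_count += 1
            elif sentence_pos < sentence_neg:
                neg_count += 1
            sentence_count += 1
            
    return pos_count, neg_count, pos_word_count, neg_word_count, word_count, stopword_count, sentence_count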
 
 
# Compute the optimism indicator: positive sentences as a share of positive plus negative sentences
def calculate_optimism(pos_count, neg_count, sentence_count):
    if pos_count + neg_count == 0:
        return 0
    optimism = pos_count / (pos_count + neg_count)
    return optimism
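
To make the sentence-level rule concrete, here is a small sketch that skips jieba and the dictionary files entirely. The sentences are already tokenized, and the two word sets (`toy_pos`, `toy_neg`) are made-up stand-ins rather than the real posdict.txt and negdict.txt; the helper names `classify_sentence` and `optimism_ratio` are likewise only for illustration. Each sentence is labeled from its own positive and negative word counts, and the final ratio is the same one calculate_optimism computes.

```python
# Toy stand-ins for posdict.txt / negdict.txt (hypothetical, for illustration only)
toy_pos = {"增长", "提升", "稳健"}
toy_neg = {"下降", "亏损", "风险"}

# Pre-tokenized sentences, so the example runs without jieba
toy_sentences = [
    ["公司", "营业收入", "稳健", "增长"],      # 2 positive, 0 negative
    ["净利润", "下降", "存在", "风险"],        # 0 positive, 2 negative
    ["毛利率", "提升", "但", "销量", "下降"],  # 1 positive, 1 negative
    ["研发", "投入", "提升"],                  # 1 positive, 0 negative
]


def classify_sentence(words, pos_words, neg_words):
    """Return 'pos', 'neg' or 'neutral' from the sentence's own word counts."""
    n_pos = sum(1 for w in words if w in pos_words)
    n_neg = sum(1 for w in words if w in neg_words)
    if n_pos > n_neg:
        return "pos"
    if n_pos < n_neg:
        return "neg"
    return "neutral"


def optimism_ratio(labels):
    """Share of positive sentences among the sentences that carry a tone."""
    pos = labels.count("pos")
    neg = labels.count("neg")
    return pos / (pos + neg) if pos + neg else 0


labels = [classify_sentence(s, toy_pos, toy_neg) for s in toy_sentences]
score = optimism_ratio(labels)
print(labels)
print(score)
```

Sentences with equal positive and negative counts are treated as neutral and do not enter the ratio, so here two positive sentences and one negative sentence give a score of 2/3.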

Read the annual reports, analyze the text, compute the metrics above, and write them to Excel:


# Read one annual report file and analyze its text
def analyze_report(file_path):
    with open(file_path, 'r', encoding='utf-8') as f:
        text = f.read()
    pos_count, neg_count, pos_word_count, neg_word_count, word_count, stopword_count, sentence_count = analyze_text(text)
    optimism = calculate_optimism(pos_count, neg_count, sentence_count)
    return {
        'pos_count': pos_count,
        'neg_count': neg_count,
        'pos_word_count': pos_word_count,
        'neg_word_count': neg_word_count,
        'word_count': word_count,
        'stopword_count': stopword_count,
        'sentence_count': sentence_count,
        'optimism': optimism
    }
 
 
# Read and analyze every annual report file under a directory
def analyze_reports(dir_path):
    company_reports = {}
    for root, dirs, files in os.walk(dir_path):
        for file in files:
            if file.endswith('.txt'):
                company_code = file.split('_')[0]
                file_path = os.path.join(root, file)
                report_data = analyze_report(file_path)
                if company_code in company_reports:
                    company_reports[company_code].append(report_data)
                else:
                    company_reports[company_code] = [report_data]
    return company_reports
 
 
# Collect the results in a pandas DataFrame and write them to an Excel file
def save_to_excel(company_reports, output_path):
    rows = []
    for company_code, reports in company_reports.items():
        for report in reports:
            # One row per report: company code followed by the metrics
            rows.append({'company_code': company_code, **report})
    df = pd.DataFrame(rows)
    if not df.empty:
        # Sort the rows by company code
        df = df.sort_values('company_code')
    df.to_excel(output_path, index=False)
 
 
# Set the annual report directory and the output Excel file path
dir_path = 'annual_reports'
output_path = 'annual_report_data.xlsx'
 
# Analyze the annual reports and save the results to the Excel file
company_reports = analyze_reports(dir_path)
save_to_excel(company_reports, output_path)

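Before using this code, place the stopword list, the positive word list, and the negative word list in the same directory as the script, named stopwords.txt, posdict.txt, and negdict.txt.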
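
The post does not show what the three word lists look like; the loading code reads them as UTF-8 text with one entry per line. The sketch below writes a tiny made-up stopword list to a temporary folder and loads it the same way. The helper `load_wordlist` and the sample words are written for this illustration only.

```python
import os
import tempfile


def load_wordlist(path):
    """Read a UTF-8 word list with one entry per line, skipping blank lines."""
    with open(path, "r", encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "stopwords.txt")
    # Sample content only; a real stopword list is far longer
    with open(sample_path, "w", encoding="utf-8") as f:
        f.write("的\n了\n\n和\n的\n")
    sample_stopwords = load_wordlist(sample_path)

print(sorted(sample_stopwords))
```

Loading into a set removes the duplicate entry and makes the membership checks in the analysis fast; the blank line is skipped rather than stored as an empty string.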

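The annual reports to be analyzed go in a folder named annual_reports, as UTF-8 .txt files whose names begin with the company code followed by an underscore.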

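After the full code runs, an Excel file is generated at the specified output path. It contains the optimism indicator and the other metrics, one row per report, labeled by company code. The code as written does not record the report year, so ordering by year requires extracting it first, for example from the file name.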
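
The requirement asks for an adjustable year range and for output ordered by year and company code. A possible way to get there is sketched below. It assumes file names of the form `<code>_<year>.txt`, such as `000001_2021.txt`; that naming scheme, the helpers `parse_report_name` and `select_reports`, and the `start_year`/`end_year` parameters are assumptions, not something the post specifies.

```python
import re

# Assumed naming scheme: 6-digit stock code, underscore, 4-digit year, e.g. 000001_2021.txt
NAME_PATTERN = re.compile(r"^(\d{6})_(\d{4})\.txt$")


def parse_report_name(file_name):
    """Return (company_code, year), or None if the name does not match the assumed scheme."""
    match = NAME_PATTERN.match(file_name)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def select_reports(file_names, start_year, end_year):
    """Keep reports within [start_year, end_year], sorted by year and then company code."""
    selected = []
    for name in file_names:
        parsed = parse_report_name(name)
        if parsed is None:
            continue  # skip files that do not follow the assumed naming scheme
        code, year = parsed
        if start_year <= year <= end_year:
            selected.append({"year": year, "company_code": code, "file": name})
    return sorted(selected, key=lambda row: (row["year"], row["company_code"]))


sample_files = ["600519_2021.txt", "000001_2022.txt", "000001_2021.txt",
                "000002_2019.txt", "notes.txt"]
for row in select_reports(sample_files, 2021, 2022):
    print(row["year"], row["company_code"], row["file"])
```

Each selected row could then be merged with the dictionary returned by analyze_report before the DataFrame is built, which would give the Excel sheet a year column to sort on.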

Source: https://www.wpsshop.cn/w/盐析白兔/article/detail/720205