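Receiving a file stream
1. The object sent by the front end is a binary file stream. There are two ways to save it locally:
(1) Write the stream to a file opened with open().
(2) Call file.save() directly on the uploaded file object: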
from flask import Flask, request

app = Flask(__name__)


@app.route('/upload', methods=['POST'])
def file_receive():
    # Get the uploaded file object
    file = request.files['file']
    # Get the file name
    filename = file.filename
    # file.save can also store the uploaded file
    # file.save(f'./{filename}')
    with open(f'./{filename}', 'wb') as f:
        f.write(file.stream.read())
    return {'success': 1}


if __name__ == '__main__':
    app.run()
To test the upload, use requests: open the file in binary mode with open() and send it to the back end:
import requests


def uploads():
    url = 'http://127.0.0.1:5000/upload'
    # Open the file in binary mode; the with block closes it after the request
    with open('C:\\Users\\xxx\\Desktop\\push\\test.mp4', 'rb') as f:
        r = requests.post(url, files={'file': f})
    print(r.text)


if __name__ == "__main__":
    uploads()
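A file name supplied by the client should not be trusted as a path. The following is a minimal sketch rather than the original author's code: it assumes Werkzeug's `secure_filename` helper and a hypothetical `UPLOAD_DIR`, and it uses Flask's test client so that it runs without starting a server.

```python
import io
import os
import tempfile

from flask import Flask, request
from werkzeug.utils import secure_filename

# Hypothetical upload directory; a temp dir keeps the sketch self-contained.
UPLOAD_DIR = tempfile.mkdtemp()

app = Flask(__name__)


@app.route('/upload', methods=['POST'])
def file_receive():
    file = request.files.get('file')
    if file is None or not file.filename:
        return {'success': 0, 'error': 'no file provided'}, 400
    # Reduce the client-supplied name to a safe, flat file name.
    filename = secure_filename(file.filename)
    if not filename:
        return {'success': 0, 'error': 'invalid file name'}, 400
    file.save(os.path.join(UPLOAD_DIR, filename))
    return {'success': 1, 'filename': filename}


# Exercise the route in-process instead of over HTTP.
client = app.test_client()
response = client.post(
    '/upload',
    data={'file': (io.BytesIO(b'hello'), 'my report.txt')},
    content_type='multipart/form-data',
)
result = response.get_json()
saved_names = os.listdir(UPLOAD_DIR)
```

Here `file.save()` replaces the manual open()/write() pair; both approaches store the same bytes.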
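I also tried sending a JSON body together with the file and reading it with request.data. It does not work: request.data comes back empty, frustratingly.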
from flask import Flask, request

app = Flask(__name__)


@app.route('/upload', methods=['POST'])
def file_receive():
    # Get the uploaded file object
    file = request.files['file']
    # Get the body parameters
    body = request.data
    filename = file.filename
    # file.save can also store the uploaded file
    # file.save(f'./{filename}')
    with open(f'./{filename}', 'wb') as f:
        f.write(file.stream.read())
    return {'success': 1}


if __name__ == '__main__':
    app.run()
requests test code:
import requests


def uploads():
    url = 'http://127.0.0.1:5000/upload'
    body = {'info': 'test'}
    with open('C:\\Users\\xxx\\Desktop\\push\\test.mp4', 'rb') as f:
        r = requests.post(url, json=body, files={'file': f})
    print(r.text)


if __name__ == "__main__":
    uploads()
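One likely explanation, stated here as an assumption about the libraries rather than something the notes above confirm: when `files=` is given, requests builds a multipart/form-data body and the `json=` argument is not sent, and Flask exposes multipart parts through `request.form` and `request.files`, which leaves `request.data` empty. A workaround sketch is to serialise the JSON into an ordinary form field and decode it on the server. The field name `body` is hypothetical, and the test client stands in for a real HTTP call.

```python
import io
import json

from flask import Flask, request

app = Flask(__name__)


@app.route('/upload', methods=['POST'])
def file_receive():
    file = request.files['file']
    # The JSON arrives as a text form field, not as the raw request body.
    body = json.loads(request.form.get('body', '{}'))
    return {
        'filename': file.filename,
        'size': len(file.read()),
        'body': body,
        'raw_data_length': len(request.data),
    }


# Exercise the route in-process instead of over HTTP.
client = app.test_client()
response = client.post(
    '/upload',
    data={
        'file': (io.BytesIO(b'fake video bytes'), 'test.mp4'),
        'body': json.dumps({'info': 'test'}),
    },
    content_type='multipart/form-data',
)
result = response.get_json()
```

With requests, the equivalent call would be `requests.post(url, data={'body': json.dumps(body)}, files=files)`.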
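Uploading and receiving files with Flask
Suppose we have some files and some parameters to send to the server in a single POST request. With the content type multipart/form-data, both can be sent together.
Preparing the parameters
First set up the data to send; change file_path to point to your own file.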
import requests

# File to upload
file_path = "path/to/your/file"  # replace with the path to your file
files = {
    "file1": ("filename", open(file_path, "rb"))
}
# Parameters to send with the file (sent as form fields, not as a JSON body)
params = {
    'key1': 'value1',
    'key2': 'value2'
}
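Writing the service
How does the server get the file and the parameters? Data sent this way has the content type multipart/form-data, so the server should read the form fields with request.form.to_dict().
Create a file named service.py and put the following script in it to start a service with the endpoint "upload-endpoint". The service does no processing here; it only prints what it receives.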
from flask import Flask, request, jsonify

app = Flask(__name__)


@app.route('/upload-endpoint', methods=['POST'])
def upload_endpoint():
    try:
        # Get the form parameters
        params = request.form.to_dict()
        print(params)
        # Get the uploaded file
        file1 = request.files['file1']
        print(file1)
        # Process the parameters and the file here as needed,
        # for example save the file on the server or read the parameters
        # Return a response
        response_data = {
            'message': 'Parameters and file received and processed successfully'
        }
        return jsonify(response_data), 200
    except Exception as e:
        error_message = str(e)
        return jsonify({'error': error_message}), 400


if __name__ == '__main__':
    app.run(debug=True, port=5000)
Request
# Send the POST request with the parameters and the file together
response = requests.post('http://127.0.0.1:5000/upload-endpoint', data=params, files=files)
How to upload a file and JSON parameters together with Python requests
parser.add_argument('-f', '--config_file', dest='config_file', type=argparse.FileType(mode='r'))
Improved as follows:
yaml_path='test.yaml'
parser.add_argument('-f', '--config_file', dest='config_file',type=argparse.FileType(mode='r'),default=yaml_path)
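The improvement works because argparse passes a string default through the `type` callable, so the default path is opened just as if it had been typed on the command line. A small self-contained sketch of that behaviour, using a throwaway temporary file in place of the real test.yaml (the file contents are made up for illustration):

```python
import argparse
import os
import tempfile

# Create a throwaway config file so the sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
yaml_path = os.path.join(tmp_dir, 'test.yaml')
with open(yaml_path, 'w', encoding='utf-8') as f:
    f.write('name: demo\n')

parser = argparse.ArgumentParser()
parser.add_argument(
    '-f', '--config_file',
    dest='config_file',
    type=argparse.FileType(mode='r'),
    default=yaml_path,  # a string default is converted by `type`, i.e. opened
)

# No -f given: argparse opens the default path for us.
args = parser.parse_args([])
with args.config_file as config:
    content = config.read()
```

This lets the script run from an IDE without any command-line arguments while still accepting `-f other.yaml` when one is supplied.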
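Much Python code now uses a parser to read its input arguments. When analysing such code in an IDE (such as PyCharm), it is impractical to go through the command line every time. The trick is this: after the script defines its command-line arguments, it calls args = parser.parse_args() to receive the actual command-line input. Replace that call with:
args = parser.parse_args("arguments normally passed on the command line".split())
For example:
args = parser.parse_args("--input ../example_graphs/karate.adjlist --output ./output".split())
However, when I wrote
arg_str = "--input ../example_graphs/karate.adjlist"
args = parser.parse_args(arg_str.split())
I got AttributeError: 'str' object has no attribute 'spilt'. The message shows that the method name had been typed as spilt; the correct name is split.
A third option is to pass the tokens as a list, with the value as its own element:
args = parser.parse_args(['--input', '../example_graphs/karate.adjlist'])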
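The options above can be sketched side by side. This is an illustrative example, not code from the original program: the `--input` and `--output` options mirror the ones mentioned above, and `shlex.split` is swapped in for `str.split` so that quoted values containing spaces stay together.

```python
import argparse
import shlex

parser = argparse.ArgumentParser()
parser.add_argument('--input')
parser.add_argument('--output')

# Option 1: pass an explicit list of tokens.
args_from_list = parser.parse_args(['--input', '../example_graphs/karate.adjlist'])

# Option 2: split a command-line string; shlex keeps quoted values intact.
command_line = '--input "my graphs/karate.adjlist" --output ./output'
args_from_string = parser.parse_args(shlex.split(command_line))
```

Plain `str.split()` would break the quoted path into two tokens, which is why `shlex.split` is the safer choice for anything beyond simple arguments.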
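Passing arguments in PyCharm without the command line
Parsing command-line arguments with argparse in Python | Linux 中国
There are several third-party libraries for command-line parsing, but the standard library's argparse holds its own against them.
Without adding many dependencies, you can write polished command-line tools with useful argument parsing.
Argument parsing in Python
The first step in parsing command-line arguments with argparse is to configure an ArgumentParser object. This is usually done at module level, because merely configuring a parser has no side effects.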
import argparse
PARSER = argparse.ArgumentParser()
The most important method of ArgumentParser is .add_argument(), which has several variants. By default it adds an option that expects a value.
PARSER.add_argument("--value")
To see it in action, call .parse_args():
PARSER.parse_args(["--value", "some-value"])
Namespace(value='some-value')
The = syntax also works:
PARSER.parse_args(["--value=some-value"])
Namespace(value='some-value')
To shorten what has to be typed on the command line, you can also give an option a short alias:
PARSER.add_argument("--thing", "-t")
The short option can be passed:
PARSER.parse_args("-t some-thing".split())
Namespace(value=None, thing='some-thing')
Or the long option:
PARSER.parse_args("--thing some-thing".split())
Namespace(value=None, thing='some-thing')
Types
Many kinds of arguments are available. Besides the default, the two most popular are booleans and counters. Booleans come in two variants: one that defaults to True and one that defaults to False.
PARSER.add_argument("--active", action="store_true")
PARSER.add_argument("--no-dry-run", action="store_false", dest="dry_run")
PARSER.add_argument("--verbose", "-v", action="count")
active is False unless --active is passed explicitly. dry_run is True by default unless --no-dry-run is passed. Short options that take no value can be combined.
Passing all the arguments gives the non-default state:
PARSER.parse_args("--active --no-dry-run -vvvv".split())
Namespace(value=None, thing=None, active=True, dry_run=False, verbose=4)
The defaults are plainer:
PARSER.parse_args("".split())
Namespace(value=None, thing=None, active=False, dry_run=True, verbose=None)
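Note that a counter is None when the flag is never given, which makes comparisons such as `verbose >= 1` fail. A small sketch of one common way around this, assuming you want an integer in every case, is to give the counter an explicit `default=0`:

```python
import argparse

parser = argparse.ArgumentParser()
# default=0 makes the counter an int even when -v is never passed.
parser.add_argument("--verbose", "-v", action="count", default=0)

quiet = parser.parse_args([])
loud = parser.parse_args(["-vv"])
```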
Subcommands
Classic Unix commands followed the principle "do one thing and do it well", but the modern trend is to group several closely related operations together.
git, podman and kubectl show how popular this pattern has become. The argparse library supports it too:
MULTI_PARSER = argparse.ArgumentParser()
subparsers = MULTI_PARSER.add_subparsers()
get = subparsers.add_parser("get")
get.add_argument("--name")
get.set_defaults(command="get")
search = subparsers.add_parser("search")
search.add_argument("--query")
search.set_defaults(command="search")
MULTI_PARSER.parse_args("get --name awesome-name".split())
Namespace(name='awesome-name', command='get')
MULTI_PARSER.parse_args("search --query name~awesome".split())
Namespace(query='name~awesome', command='search')
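A common variation on the subcommand setup, sketched here with hypothetical handler functions `run_get` and `run_search`, stores a callable with set_defaults() so that the chosen subcommand can be dispatched directly. Passing `dest` and `required=True` to add_subparsers() makes argparse record the subcommand name and reject invocations that omit it.

```python
import argparse


def run_get(args):
    # Hypothetical handler for the "get" subcommand.
    return f"get:{args.name}"


def run_search(args):
    # Hypothetical handler for the "search" subcommand.
    return f"search:{args.query}"


parser = argparse.ArgumentParser()
subparsers = parser.add_subparsers(dest="command", required=True)

get = subparsers.add_parser("get")
get.add_argument("--name")
get.set_defaults(func=run_get)

search = subparsers.add_parser("search")
search.add_argument("--query")
search.set_defaults(func=run_search)

args = parser.parse_args("search --query name~awesome".split())
output = args.func(args)  # dispatch to the handler chosen by the subcommand
```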
Program architecture
One way to use argparse is with the following structure:
## my_package/__main__.py
import sys
from my_package import toplevel

parsed_arguments = toplevel.PARSER.parse_args(sys.argv[1:])
toplevel.main(parsed_arguments)

## my_package/toplevel.py
import argparse

PARSER = argparse.ArgumentParser()
## .add_argument, etc.

def main(parsed_args):
    ...  # do stuff with parsed_args
In this case, run it with python -m my_package. Alternatively, you can use a console_scripts entry point when the package is installed.
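A related pattern, shown as a sketch rather than as part of the structure above, is to let main() accept an optional argument list. When `argv` is None, argparse falls back to `sys.argv[1:]`, so the same function serves an entry point, `python -m my_package`, and tests or IDE runs that pass arguments explicitly. The `--value` option is only a placeholder.

```python
import argparse

PARSER = argparse.ArgumentParser(prog="my_package")
PARSER.add_argument("--value", default="nothing")


def main(argv=None):
    # argv=None makes argparse read sys.argv[1:] instead.
    parsed_args = PARSER.parse_args(argv)
    return f"value={parsed_args.value}"


# Calling with an explicit list avoids touching the real command line.
result = main(["--value", "some-value"])
```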
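Summary
The argparse module is a powerful command-line argument parser, with many more features than could be covered here. It can do just about anything you can imagine.
Extracting archives in Python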
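If a zip is uploaded from the front end and only the extracted folder should be kept on the server, extract first and save afterwards (a file path exists only once something has been saved). The zip file sent by the front end can be read into memory as bytes and extracted from there.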
Here we work directly with the contents of the spam.zip file produced in the previous step: assume the input is bytes, then inspect the file information and contents of every entry in it.
import zipfile
import io
import os


def read_zipfiles(path, folder=''):
    for member in path.iterdir():
        filename = os.path.join(folder, member.name)
        if member.is_file():
            print(filename, ':', member.read_text())  # member.read_bytes()
        else:
            read_zipfiles(member, filename)


with open('spam.zip', 'rb') as myzip:
    zip_data = myzip.read()

with zipfile.ZipFile(io.BytesIO(zip_data)) as zip_file:
    read_zipfiles(zipfile.Path(zip_file))
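The same in-memory approach can extract an uploaded archive to disk without first saving the zip itself. A minimal sketch, assuming the upload is already available as bytes; the helper name `extract_zip_bytes` and the sample entries are made up for illustration.

```python
import io
import os
import tempfile
import zipfile

# Build a small archive in memory to stand in for the uploaded bytes.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('spam/eggs.txt', 'hello')
    archive.writestr('spam/sub/ham.txt', 'world')
zip_data = buffer.getvalue()


def extract_zip_bytes(data, target_dir):
    """Extract a zip given as bytes into target_dir and return the file paths."""
    stream = io.BytesIO(data)
    if not zipfile.is_zipfile(stream):
        raise ValueError('not a zip archive')
    stream.seek(0)
    with zipfile.ZipFile(stream) as archive:
        archive.extractall(target_dir)
        names = [n for n in archive.namelist() if not n.endswith('/')]
    return sorted(os.path.join(target_dir, n) for n in names)


target = tempfile.mkdtemp()
extracted = extract_zip_bytes(zip_data, target)
```

In a Flask view, `data` would typically come from `uploaded_file.read()`.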
# Handle the archive
if file and allowed_file(file.filename):
    filename = secure_filename(file.filename)
    file.save(os.path.join(app.config['UPLOAD_FOLDER'], filename))  # save the archive under the project path
    local_dir = os.path.join(base_dir, '11')  # new directory that will hold the extracted files
    hh = os.path.join(base_dir, filename)  # path of the archive, e.g. C:/Code/haha.zip
    print(hh)
    print(local_dir)
    shutil.unpack_archive(filename=hh, extract_dir=local_dir)  # extract into the directory set above
    os.remove(hh)  # finally delete the archive
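The save, unpack, delete sequence can be tried in isolation. The sketch below is an assumption-laden stand-in for the snippet above: it uses a temporary working directory instead of `base_dir`, and it creates the archive with shutil.make_archive in place of a real upload.

```python
import os
import shutil
import tempfile

work_dir = tempfile.mkdtemp()

# Create a sample folder and zip it (stand-in for the uploaded archive).
source_dir = os.path.join(work_dir, 'haha')
os.makedirs(source_dir)
with open(os.path.join(source_dir, 'a.txt'), 'w', encoding='utf-8') as f:
    f.write('content')
archive_path = shutil.make_archive(
    os.path.join(work_dir, 'haha'), 'zip', root_dir=source_dir
)

# Extract into a new directory, then delete the archive.
local_dir = os.path.join(work_dir, '11')
shutil.unpack_archive(filename=archive_path, extract_dir=local_dir)
os.remove(archive_path)

extracted_files = os.listdir(local_dir)
```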
dst = open(dst, "wb")
```python
from PIL import Image
import os

# Open the image
image = Image.open('example.jpg')
# Save the image into the target folder
if not os.path.exists('new_folder'):
    os.makedirs('new_folder')
image.save('new_folder/example_new.jpg')
```
The code above uses the os module to create a new folder, new_folder, and saves the image into it.
My code raises an error
python - IOError: Errno 13 Permission denied for specific files
Somewhat similar, but I did not understand it.
if suffix.lower() in ['jpg', 'png', 'jpeg']:
    # uploaded_file.save(image_folder + uploaded_file.filename.split('.')[-2])
    # image_folder = image_folder + uploaded_file.filename.split('.')[-2]
    save_path=image_folder + uploaded_file.filename.split('.')[-2]
    # # uploaded_file.save(save_path + uploaded_file.filename)
    # image_folder = image_folder + uploaded_file.filename.split('.')[-2]
    # uploaded_file.save('./images/hhh/'+ uploaded_file.filename)
    # image_folder = image_folder + 'hhh/'
    # save_path=image_folder + uploaded_file.filename.split('.')[-2]+'/'+ uploaded_file.filename.split('.')[-2]+'.'
    # print(save_path)
    # uploaded_file.save(save_path + suffix.lower())
    # image_folder = image_folder + uploaded_file.filename.split('.')[-2]
    print(uploaded_file.filename)
    print(type(uploaded_file.filename))
    save_path=os.path.join(save_path, uploaded_file.filename)
    print(save_path)
    with open(uploaded_file.filename, 'wb') as f:
        print('222')
        f.write(uploaded_file)
        print('111')
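One plausible reading of this error, offered as an assumption rather than a confirmed diagnosis: a "Permission denied" of this kind often appears when the path handed to open() or save() is a directory instead of a file, and write() also needs bytes (for example from `.read()`), not the upload object itself. The sketch below reproduces the directory case with standard-library stand-ins; `upload` is a hypothetical substitute for the uploaded file.

```python
import io
import os
import tempfile

image_folder = tempfile.mkdtemp()
save_dir = os.path.join(image_folder, 'photo')
os.makedirs(save_dir, exist_ok=True)

# Opening a directory for writing fails with an OSError subclass
# (commonly PermissionError on Windows, IsADirectoryError on Linux).
try:
    with open(save_dir, 'wb') as f:
        f.write(b'data')
    error_name = None
except OSError as exc:
    error_name = type(exc).__name__

# Writing to a file path inside the directory works. `upload` stands in
# for the uploaded file object; its bytes are obtained with .read().
upload = io.BytesIO(b'fake image bytes')
save_path = os.path.join(save_dir, 'photo.jpg')
with open(save_path, 'wb') as f:
    f.write(upload.read())
```

The fix is therefore to create the folder first and then write to a full file path inside it.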
# from paddleocr import PaddleOCR import os import sys import importlib __dir__ = os.path.dirname(__file__) sys.path.append(os.path.join(__dir__, '')) import cv2 import logging import numpy as np from pathlib import Path # import base64 # from io import BytesIO from PIL import Image def _import_file(module_name, file_path, make_importable=False): spec = importlib.util.spec_from_file_location(module_name, file_path) module = importlib.util.module_from_spec(spec) spec.loader.exec_module(module) if make_importable: sys.modules[module_name] = module return module tools = _import_file( 'tools', os.path.join(__dir__, 'tools/__init__.py'), make_importable=True) ppocr = importlib.import_module('ppocr', 'paddleocr') ppstructure = importlib.import_module('ppstructure', 'paddleocr') from ppocr.utils.logging import get_logger from tools.infer import predict_system from ppocr.utils.utility import check_and_read, get_image_file_list, alpha_to_color, binarize_img from ppocr.utils.network import maybe_download, download_with_progressbar, is_link, confirm_model_dir_url from tools.infer.utility import draw_ocr, str2bool, check_gpu from ppstructure.utility import init_args, draw_structure_result from ppstructure.predict_system import StructureSystem, save_structure_res, to_excel logger = get_logger() __all__ = [ 'PaddleOCR', 'PPStructure', 'draw_ocr', 'draw_structure_result', 'save_structure_res', 'download_with_progressbar', 'to_excel' ] SUPPORT_DET_MODEL = ['DB'] VERSION = '2.7.0.3' SUPPORT_REC_MODEL = ['CRNN', 'SVTR_LCNet'] BASE_DIR = os.path.expanduser("~/.paddleocr/") DEFAULT_OCR_MODEL_VERSION = 'PP-OCRv4' SUPPORT_OCR_MODEL_VERSION = ['PP-OCR', 'PP-OCRv2', 'PP-OCRv3', 'PP-OCRv4'] DEFAULT_STRUCTURE_MODEL_VERSION = 'PP-StructureV2' SUPPORT_STRUCTURE_MODEL_VERSION = ['PP-Structure', 'PP-StructureV2'] MODEL_URLS = { 'OCR': { 'PP-OCRv4': { 'det': { 'ch': { 'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv4/chinese/ch_PP-OCRv4_det_infer.tar', }, 'en': { 'url': 
'https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar', }, 'ml': { 'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/Multilingual_PP-OCRv3_det_infer.tar' } }, 'rec': { 'ch': { 'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv4/chinese/ch_PP-OCRv4_rec_infer.tar', 'dict_path': './ppocr/utils/ppocr_keys_v1.txt' }, 'en': { 'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv4/english/en_PP-OCRv4_rec_infer.tar', 'dict_path': './ppocr/utils/en_dict.txt' }, 'korean': { 'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv4/multilingual/korean_PP-OCRv4_rec_infer.tar', 'dict_path': './ppocr/utils/dict/korean_dict.txt' }, 'japan': { 'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv4/multilingual/japan_PP-OCRv4_rec_infer.tar', 'dict_path': './ppocr/utils/dict/japan_dict.txt' }, 'chinese_cht': { 'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/chinese_cht_PP-OCRv3_rec_infer.tar', 'dict_path': './ppocr/utils/dict/chinese_cht_dict.txt' }, 'ta': { 'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv4/multilingual/ta_PP-OCRv4_rec_infer.tar', 'dict_path': './ppocr/utils/dict/ta_dict.txt' }, 'te': { 'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv4/multilingual/te_PP-OCRv4_rec_infer.tar', 'dict_path': './ppocr/utils/dict/te_dict.txt' }, 'ka': { 'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv4/multilingual/ka_PP-OCRv4_rec_infer.tar', 'dict_path': './ppocr/utils/dict/ka_dict.txt' }, 'latin': { 'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/latin_PP-OCRv3_rec_infer.tar', 'dict_path': './ppocr/utils/dict/latin_dict.txt' }, 'arabic': { 'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv4/multilingual/arabic_PP-OCRv4_rec_infer.tar', 'dict_path': './ppocr/utils/dict/arabic_dict.txt' }, 'cyrillic': { 'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/cyrillic_PP-OCRv3_rec_infer.tar', 'dict_path': './ppocr/utils/dict/cyrillic_dict.txt' }, 'devanagari': { 'url': 
'https://paddleocr.bj.bcebos.com/PP-OCRv4/multilingual/devanagari_PP-OCRv4_rec_infer.tar', 'dict_path': './ppocr/utils/dict/devanagari_dict.txt' }, }, 'cls': { 'ch': { 'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar', } }, }, 'PP-OCRv3': { 'det': { 'ch': { 'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar', }, 'en': { 'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar', }, 'ml': { 'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/Multilingual_PP-OCRv3_det_infer.tar' } }, 'rec': { 'ch': { 'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar', 'dict_path': './ppocr/utils/ppocr_keys_v1.txt' }, 'en': { 'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar', 'dict_path': './ppocr/utils/en_dict.txt' }, 'korean': { 'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/korean_PP-OCRv3_rec_infer.tar', 'dict_path': './ppocr/utils/dict/korean_dict.txt' }, 'japan': { 'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/japan_PP-OCRv3_rec_infer.tar', 'dict_path': './ppocr/utils/dict/japan_dict.txt' }, 'chinese_cht': { 'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/chinese_cht_PP-OCRv3_rec_infer.tar', 'dict_path': './ppocr/utils/dict/chinese_cht_dict.txt' }, 'ta': { 'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/ta_PP-OCRv3_rec_infer.tar', 'dict_path': './ppocr/utils/dict/ta_dict.txt' }, 'te': { 'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/te_PP-OCRv3_rec_infer.tar', 'dict_path': './ppocr/utils/dict/te_dict.txt' }, 'ka': { 'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/ka_PP-OCRv3_rec_infer.tar', 'dict_path': './ppocr/utils/dict/ka_dict.txt' }, 'latin': { 'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/latin_PP-OCRv3_rec_infer.tar', 'dict_path': './ppocr/utils/dict/latin_dict.txt' }, 'arabic': { 'url': 
'https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/arabic_PP-OCRv3_rec_infer.tar', 'dict_path': './ppocr/utils/dict/arabic_dict.txt' }, 'cyrillic': { 'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/cyrillic_PP-OCRv3_rec_infer.tar', 'dict_path': './ppocr/utils/dict/cyrillic_dict.txt' }, 'devanagari': { 'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/devanagari_PP-OCRv3_rec_infer.tar', 'dict_path': './ppocr/utils/dict/devanagari_dict.txt' }, }, 'cls': { 'ch': { 'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar', } }, }, 'PP-OCRv2': { 'det': { 'ch': { 'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar', }, }, 'rec': { 'ch': { 'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar', 'dict_path': './ppocr/utils/ppocr_keys_v1.txt' } }, 'cls': { 'ch': { 'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar', } }, }, 'PP-OCR': { 'det': { 'ch': { 'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar', }, 'en': { 'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_ppocr_mobile_v2.0_det_infer.tar', }, 'structure': { 'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_det_infer.tar' } }, 'rec': { 'ch': { 'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar', 'dict_path': './ppocr/utils/ppocr_keys_v1.txt' }, 'en': { 'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_infer.tar', 'dict_path': './ppocr/utils/en_dict.txt' }, 'french': { 'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/french_mobile_v2.0_rec_infer.tar', 'dict_path': './ppocr/utils/dict/french_dict.txt' }, 'german': { 'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/german_mobile_v2.0_rec_infer.tar', 'dict_path': './ppocr/utils/dict/german_dict.txt' }, 
'korean': { 'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/korean_mobile_v2.0_rec_infer.tar', 'dict_path': './ppocr/utils/dict/korean_dict.txt' }, 'japan': { 'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/japan_mobile_v2.0_rec_infer.tar', 'dict_path': './ppocr/utils/dict/japan_dict.txt' }, 'chinese_cht': { 'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/chinese_cht_mobile_v2.0_rec_infer.tar', 'dict_path': './ppocr/utils/dict/chinese_cht_dict.txt' }, 'ta': { 'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/ta_mobile_v2.0_rec_infer.tar', 'dict_path': './ppocr/utils/dict/ta_dict.txt' }, 'te': { 'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/te_mobile_v2.0_rec_infer.tar', 'dict_path': './ppocr/utils/dict/te_dict.txt' }, 'ka': { 'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/ka_mobile_v2.0_rec_infer.tar', 'dict_path': './ppocr/utils/dict/ka_dict.txt' }, 'latin': { 'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/latin_ppocr_mobile_v2.0_rec_infer.tar', 'dict_path': './ppocr/utils/dict/latin_dict.txt' }, 'arabic': { 'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/arabic_ppocr_mobile_v2.0_rec_infer.tar', 'dict_path': './ppocr/utils/dict/arabic_dict.txt' }, 'cyrillic': { 'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/cyrillic_ppocr_mobile_v2.0_rec_infer.tar', 'dict_path': './ppocr/utils/dict/cyrillic_dict.txt' }, 'devanagari': { 'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/devanagari_ppocr_mobile_v2.0_rec_infer.tar', 'dict_path': './ppocr/utils/dict/devanagari_dict.txt' }, 'structure': { 'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_rec_infer.tar', 'dict_path': 'ppocr/utils/dict/table_dict.txt' } }, 'cls': { 'ch': { 'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar', } }, } }, 'STRUCTURE': { 'PP-Structure': { 
'table': { 'en': { 'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar', 'dict_path': 'ppocr/utils/dict/table_structure_dict.txt' } } }, 'PP-StructureV2': { 'table': { 'en': { 'url': 'https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/en_ppstructure_mobile_v2.0_SLANet_infer.tar', 'dict_path': 'ppocr/utils/dict/table_structure_dict.txt' }, 'ch': { 'url': 'https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_infer.tar', 'dict_path': 'ppocr/utils/dict/table_structure_dict_ch.txt' } }, 'layout': { 'en': { 'url': 'https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar', 'dict_path': 'ppocr/utils/dict/layout_dict/layout_publaynet_dict.txt' }, 'ch': { 'url': 'https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla_infer.tar', 'dict_path': 'ppocr/utils/dict/layout_dict/layout_cdla_dict.txt' } } } } } def parse_args(mMain=True): import argparse parser = init_args() parser.add_help = mMain parser.add_argument("--lang", type=str, default='ch') parser.add_argument("--det", type=str2bool, default=True) parser.add_argument("--rec", type=str2bool, default=True) parser.add_argument("--type", type=str, default='ocr') parser.add_argument( "--ocr_version", type=str, choices=SUPPORT_OCR_MODEL_VERSION, default='PP-OCRv4', help='OCR Model version, the current model support list is as follows: ' '1. PP-OCRv4/v3 Support Chinese and English detection and recognition model, and direction classifier model' '2. PP-OCRv2 Support Chinese detection and recognition model. ' '3. PP-OCR support Chinese detection, recognition and direction classifier and multilingual recognition model.' ) parser.add_argument( "--structure_version", type=str, choices=SUPPORT_STRUCTURE_MODEL_VERSION, default='PP-StructureV2', help='Model version, the current model support list is as follows:' ' 1. 
PP-Structure Support en table structure model.' ' 2. PP-StructureV2 Support ch and en table structure model.') for action in parser._actions: if action.dest in [ 'rec_char_dict_path', 'table_char_dict_path', 'layout_dict_path' ]: action.default = None if mMain: return parser.parse_args() else: inference_args_dict = {} for action in parser._actions: inference_args_dict[action.dest] = action.default return argparse.Namespace(**inference_args_dict) def parse_lang(lang): latin_lang = [ 'af', 'az', 'bs', 'cs', 'cy', 'da', 'de', 'es', 'et', 'fr', 'ga', 'hr', 'hu', 'id', 'is', 'it', 'ku', 'la', 'lt', 'lv', 'mi', 'ms', 'mt', 'nl', 'no', 'oc', 'pi', 'pl', 'pt', 'ro', 'rs_latin', 'sk', 'sl', 'sq', 'sv', 'sw', 'tl', 'tr', 'uz', 'vi', 'french', 'german' ] arabic_lang = ['ar', 'fa', 'ug', 'ur'] cyrillic_lang = [ 'ru', 'rs_cyrillic', 'be', 'bg', 'uk', 'mn', 'abq', 'ady', 'kbd', 'ava', 'dar', 'inh', 'che', 'lbe', 'lez', 'tab' ] devanagari_lang = [ 'hi', 'mr', 'ne', 'bh', 'mai', 'ang', 'bho', 'mah', 'sck', 'new', 'gom', 'sa', 'bgc' ] if lang in latin_lang: lang = "latin" elif lang in arabic_lang: lang = "arabic" elif lang in cyrillic_lang: lang = "cyrillic" elif lang in devanagari_lang: lang = "devanagari" assert lang in MODEL_URLS['OCR'][DEFAULT_OCR_MODEL_VERSION][ 'rec'], 'param lang must in {}, but got {}'.format( MODEL_URLS['OCR'][DEFAULT_OCR_MODEL_VERSION]['rec'].keys(), lang) if lang == "ch": det_lang = "ch" elif lang == 'structure': det_lang = 'structure' elif lang in ["en", "latin"]: det_lang = "en" else: det_lang = "ml" return lang, det_lang def get_model_config(type, version, model_type, lang): if type == 'OCR': DEFAULT_MODEL_VERSION = DEFAULT_OCR_MODEL_VERSION elif type == 'STRUCTURE': DEFAULT_MODEL_VERSION = DEFAULT_STRUCTURE_MODEL_VERSION else: raise NotImplementedError model_urls = MODEL_URLS[type] if version not in model_urls: version = DEFAULT_MODEL_VERSION if model_type not in model_urls[version]: if model_type in model_urls[DEFAULT_MODEL_VERSION]: version = 
DEFAULT_MODEL_VERSION else: logger.error('{} models is not support, we only support {}'.format( model_type, model_urls[DEFAULT_MODEL_VERSION].keys())) sys.exit(-1) if lang not in model_urls[version][model_type]: if lang in model_urls[DEFAULT_MODEL_VERSION][model_type]: version = DEFAULT_MODEL_VERSION else: logger.error( 'lang {} is not support, we only support {} for {} models'. format(lang, model_urls[DEFAULT_MODEL_VERSION][model_type].keys( ), model_type)) sys.exit(-1) return model_urls[version][model_type][lang] def img_decode(content: bytes): np_arr = np.frombuffer(content, dtype=np.uint8) return cv2.imdecode(np_arr, cv2.IMREAD_UNCHANGED) def check_img(img): if isinstance(img, bytes): img = img_decode(img) if isinstance(img, str): # download net image if is_link(img): download_with_progressbar(img, 'tmp.jpg') img = 'tmp.jpg' image_file = img img, flag_gif, flag_pdf = check_and_read(image_file) if not flag_gif and not flag_pdf: with open(image_file, 'rb') as f: img_str = f.read() img = img_decode(img_str) if img is None: try: buf = BytesIO() image = BytesIO(img_str) im = Image.open(image) rgb = im.convert('RGB') rgb.save(buf, 'jpeg') buf.seek(0) image_bytes = buf.read() data_base64 = str(base64.b64encode(image_bytes), encoding="utf-8") image_decode = base64.b64decode(data_base64) img_array = np.frombuffer(image_decode, np.uint8) img = cv2.imdecode(img_array, cv2.IMREAD_COLOR) except: logger.error("error in loading image:{}".format(image_file)) return None if img is None: logger.error("error in loading image:{}".format(image_file)) return None if isinstance(img, np.ndarray) and len(img.shape) == 2: img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR) return img class PaddleOCR(predict_system.TextSystem): def __init__(self, **kwargs): """ paddleocr package args: **kwargs: other params show in paddleocr --help """ params = parse_args(mMain=False) params.__dict__.update(**kwargs) assert params.ocr_version in SUPPORT_OCR_MODEL_VERSION, "ocr_version must in {}, but get 
{}".format( SUPPORT_OCR_MODEL_VERSION, params.ocr_version) params.use_gpu = check_gpu(params.use_gpu) if not params.show_log: logger.setLevel(logging.INFO) self.use_angle_cls = params.use_angle_cls lang, det_lang = parse_lang(params.lang) # init model dir det_model_config = get_model_config('OCR', params.ocr_version, 'det', det_lang) params.det_model_dir, det_url = confirm_model_dir_url( params.det_model_dir, os.path.join(BASE_DIR, 'whl', 'det', det_lang), det_model_config['url']) rec_model_config = get_model_config('OCR', params.ocr_version, 'rec', lang) params.rec_model_dir, rec_url = confirm_model_dir_url( params.rec_model_dir, os.path.join(BASE_DIR, 'whl', 'rec', lang), rec_model_config['url']) cls_model_config = get_model_config('OCR', params.ocr_version, 'cls', 'ch') params.cls_model_dir, cls_url = confirm_model_dir_url( params.cls_model_dir, os.path.join(BASE_DIR, 'whl', 'cls'), cls_model_config['url']) if params.ocr_version in ['PP-OCRv3', 'PP-OCRv4']: params.rec_image_shape = "3, 48, 320" else: params.rec_image_shape = "3, 32, 320" # download model if using paddle infer if not params.use_onnx: maybe_download(params.det_model_dir, det_url) maybe_download(params.rec_model_dir, rec_url) maybe_download(params.cls_model_dir, cls_url) if params.det_algorithm not in SUPPORT_DET_MODEL: logger.error('det_algorithm must in {}'.format(SUPPORT_DET_MODEL)) sys.exit(0) if params.rec_algorithm not in SUPPORT_REC_MODEL: logger.error('rec_algorithm must in {}'.format(SUPPORT_REC_MODEL)) sys.exit(0) if params.rec_char_dict_path is None: params.rec_char_dict_path = str( Path(__file__).parent / rec_model_config['dict_path']) logger.debug(params) # init det_model and rec_model super().__init__(params) self.page_num = params.page_num def ocr(self, img, det=True, rec=True, cls=True, bin=False, inv=False, alpha_color=(255, 255, 255)): """ OCR with PaddleOCR args: img: img for OCR, support ndarray, img_path and list or ndarray det: use text detection or not. 
If False, only rec will be exec. Default is True rec: use text recognition or not. If False, only det will be exec. Default is True cls: use angle classifier or not. Default is True. If True, the text with rotation of 180 degrees can be recognized. If no text is rotated by 180 degrees, use cls=False to get better performance. Text with rotation of 90 or 270 degrees can be recognized even if cls=False. bin: binarize image to black and white. Default is False. inv: invert image colors. Default is False. alpha_color: set RGB color Tuple for transparent parts replacement. Default is pure white. """ assert isinstance(img, (np.ndarray, list, str, bytes)) if isinstance(img, list) and det == True: logger.error('When input a list of images, det must be false') exit(0) if cls == True and self.use_angle_cls == False: logger.warning( 'Since the angle classifier is not initialized, it will not be used during the forward process' ) img = check_img(img) # for infer pdf file if isinstance(img, list): if self.page_num > len(img) or self.page_num == 0: self.page_num = len(img) imgs = img[:self.page_num] else: imgs = [img] def preprocess_image(_image): _image = alpha_to_color(_image, alpha_color) if inv: _image = cv2.bitwise_not(_image) if bin: _image = binarize_img(_image) return _image if det and rec: ocr_res = [] for idx, img in enumerate(imgs): img = preprocess_image(img) dt_boxes, rec_res, _ = self.__call__(img, cls) if not dt_boxes and not rec_res: ocr_res.append(None) continue tmp_res = [[box.tolist(), res] for box, res in zip(dt_boxes, rec_res)] ocr_res.append(tmp_res) return ocr_res elif det and not rec: ocr_res = [] for idx, img in enumerate(imgs): img = preprocess_image(img) dt_boxes, elapse = self.text_detector(img) if not dt_boxes: ocr_res.append(None) continue tmp_res = [box.tolist() for box in dt_boxes] ocr_res.append(tmp_res) return ocr_res else: ocr_res = [] cls_res = [] for idx, img in enumerate(imgs): if not isinstance(img, list): img = preprocess_image(img) img = 
                        [img]
                    if self.use_angle_cls and cls:
                        img, cls_res_tmp, elapse = self.text_classifier(img)
                        if not rec:
                            cls_res.append(cls_res_tmp)
                    rec_res, elapse = self.text_recognizer(img)
                    ocr_res.append(rec_res)
                if not rec:
                    return cls_res
                return ocr_res


import json
import os
import io
import zipfile
import shutil


class Result:
    def __init__(self, id, value):
        self.id = id
        self.value = value


def result_encoder(obj):
    if isinstance(obj, Result):
        return {'id': obj.id, 'PaddleOCR': obj.value}
    return json.JSONEncoder.default(obj)


import paddle
paddle.disable_signal_handler()  # the disable_signal_handler interface is available as of version 2.2

from flask import Flask, request

app = Flask(__name__)


@app.route('/OCR', methods=['GET','POST'])
def fun():
    print(request.files)
    uploaded_file = request.files['file']
    if not uploaded_file:
        return {'error': 'No file is provided'}
    suffix = uploaded_file.filename.split('.')[-1]  # get the file extension
    # The extension can also be used to filter file types, for example:
    if suffix.lower() not in ['jpg', 'png', 'jpeg', 'zip']:
        return {'error': 'The uploaded file type is invalid'}
    image_folder = './images/'
    if not os.path.exists(image_folder):
        os.makedirs(image_folder)
    if suffix.lower() in ['jpg', 'png', 'jpeg']:
        # uploaded_file.save(image_folder + uploaded_file.filename.split('.')[-2])
        # image_folder = image_folder + uploaded_file.filename.split('.')[-2]
        save_path=image_folder + uploaded_file.filename.split('.')[-2]
        if not os.path.exists(save_path):
            os.makedirs(save_path)
        uploaded_file.save(os.path.join(save_path, uploaded_file.filename))
        image_folder = save_path
    else:
        zip_buffer = io.BytesIO(uploaded_file.read())
        with zipfile.ZipFile(zip_buffer, 'r') as zip_ref:
            zip_ref.extractall(image_folder)  # extract into the target folder
        save_path=image_folder + uploaded_file.filename.split('.')[-2]
        # uploaded_file.save(image_folder+'/'+uploaded_file.filename)
        # with zipfile.ZipFile('/data1/xyj/PaddleOCR/images/app_test.zip', 'r') as zip_ref:
        #     zip_ref.extractall(image_folder)  # extract into the target folder
        # image_folder = save_path
        # with zipfile.ZipFile('/data1/xyj/PaddleOCR/images/app_test.zip', 'r') as zip_ref:
        #     for member in zip_ref.infolist():
        #         zip_ref.extract(member.filename, image_folder)
        # image_folder = save_path
        # shutil.unpack_archive('/data1/xyj/PaddleOCR/images/app_test.zip', image_folder, 'zip')
        image_folder = save_path
    # image_folder = "/data1/xyj/datasets/zh_test"
    # image_folder = request.json['image_folder']
    output_path = 'outputs/'
    if not os.path.exists(output_path):
        os.makedirs(output_path)
    output_path = os.path.join(output_path, "zh_test_PaddleOCR.json")
    if (os.path.exists(output_path)):
        os.remove(output_path)
    # The languages PaddleOCR supports can be switched with the lang parameter,
    # for example `ch`, `en`, `fr`, `german`, `korean`, `japan`
    ocr = PaddleOCR(use_angle_cls=True, lang="ch")  # need to run only once to download and load model into memory
    ans = {}
    for filename in os.listdir(image_folder):
        img_path = os.path.join(image_folder, filename)
        result = ocr.ocr(img_path, cls=True)
        for res in result:
            outputs=''
            if res is not None:
                for line in res:
                    outputs=outputs+line[1][0]+' '
            res = Result(filename, outputs)
            with open(output_path, "a", encoding="utf8") as file:
                json.dump(result_encoder(res), file, ensure_ascii=False, indent=4)
        ans[filename] = outputs
    # Convert the results to a JSON string
    json_data = json.dumps(ans, ensure_ascii=False)
    # Write the JSON string to a file
    with open("data.json", "w") as file:
        file.write(json_data)
    return {'result': json_data}


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5009)
app_ppocr.py