import fitz
from tqdm import tqdm

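def pdf2html(input_path, html_path):
    """Convert every page of a PDF to HTML and write the result to html_path."""
    doc = fitz.open(input_path)
    html_pages = []
    for page in tqdm(doc):
        # get_text is the current method name; older PyMuPDF releases called it getText
        html_pages.append(page.get_text('html'))
    doc.close()
    print("Writing HTML file")
    with open(html_path, 'w', encoding='utf8', newline="") as fp:
        # Write all pages, not just the last one processed by the loop
        fp.write("".join(html_pages))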

input_path = r'G:\back\pyfile\翻译\pdf_translate-master\3.pdf'  # if this raises an error, use an absolute path
html_path = r'G:\back\pyfile\翻译\pdf_translate-master\input.html'
pdf2html(input_path,html_path)
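
PyMuPDF's HTML output for a page is, as far as I can tell, a fragment (one `<div>` per page) rather than a complete document, so a browser has to guess the encoding of the saved file. A minimal sketch of one way to avoid that guess is to wrap the fragments in a page that declares UTF-8. The helper name `wrap_pages` and its `title` parameter are hypothetical and not part of PyMuPDF:

```python
from html import escape

def wrap_pages(html_pages, title="converted"):
    """Join per-page HTML fragments into one document that declares UTF-8.

    html_pages is assumed to be a list of strings, one per PDF page.
    """
    body = "\n".join(html_pages)
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{escape(title)}</title>\n"  # escape so the title cannot break the markup
        "</head>\n<body>\n"
        f"{body}\n"
        "</body>\n</html>\n"
    )
```

The returned string can be written with `open(html_path, 'w', encoding='utf8')` in place of the raw page HTML.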

