zoukankan      html  css  js  c++  java
  • pdf提取信息到excel

    1. 报错:No module named 'StringIO'

    Python3中出现“No module named 'StringIO'”错误处理方法_浅笑古今的博客-CSDN博客

    2. [Python] 自动化办公 PDF提取文字、表格、图片 - 简书 (jianshu.com)

    3. python写入excel(方式1) - 景月 - 博客园 (cnblogs.com)

    4. 有些PDF失效,所以最后添加了try except防止报错退出循环

    任务要求:从PDF中提取所有信息,将其存入到excel表格中,代码包含了读取PDF,制作excel表格以及将提取出来的内容添加到表格中,代码如下

    import xlsxwriter
    import pdfplumber
    import os
    # 创建excel文件
    xl = xlsxwriter.Workbook(r'C:UsersEleniDesktop	est	est.xlsx')
    # 添加sheet
    sheet = xl.add_worksheet('sheet1')
    sheet.write_string("A1","编号")
    sheet.write_string("B1","date")
    sheet.write_string("C1","project")
    sheet.write_string("D1","property")
    sheet.write_string("E1","vendor name")
    sheet.write_string("F1","vendor address")
    sheet.write_string("G1","agent company")
    sheet.write_string("H1","agent address")
    sheet.write_string("I1","agent contact")
    sheet.write_string("J1","agent phone")
    sheet.write_string("K1","agent fax")
    sheet.write_string("L1","Purchaser name")
    sheet.write_string("M1","Co purchaser name")
    sheet.write_string("N1","Purchaser address")
    sheet.write_string("O1","Purchaser email")
    sheet.write_string("P1","Purchaser phone")
    sheet.write_string("Q1","Purchaser FIRB")
    sheet.write_string("R1","Purchase reason")
    sheet.write_string("S1","Purchaser Solicitor company")
    sheet.write_string("T1","Purchaser Solicitor address")
    sheet.write_string("U1","Purchaser Solicitor contact")
    sheet.write_string("V1","Purchaser Solicitor phone")
    sheet.write_string("W1","Purchaser Solicitor fax")
    sheet.write_string("X1","Purchaser Solicitor email")
    sheet.write_string("Y1","Purchase Price")
    sheet.write_string("Z1","Deposit Token")
    sheet.write_string("AA1","agent notes")
    x=1
    for i in range(7,11):
        file_path = os.getcwd() + '\' + 'Desktop' + '\' + 'test' + '\' + str(i) + '.pdf'
        li=[]
        try:
            with pdfplumber.open(file_path) as pdf:
                page = pdf.pages[0]
                for n in range(3):
                    li.append(page.extract_table()[n][1])
                for n in range(4,6):
                    li.append(page.extract_table()[n][1])
                for n in range(14,19):
                    li.append(page.extract_table()[n][1])
                for n in range(20,27):
                    li.append(page.extract_table()[n][1])
                for n in range(28,36):
                    li.append(page.extract_table()[n][1])
                x+=1 #A的序号是从X+1开始进行编号
                y=x
                z=y-1
                sheet.write_string("A%d" % x,"%d" % z)
                sheet.write_string("B%d" % y, "%s" % li[0])
                sheet.write_string("C%d" % y, "%s" % li[1])
                sheet.write_string("D%d" % y, "%s" % li[2])
                sheet.write_string("E%d" % y, "%s" % li[3])
                sheet.write_string("F%d" % y, "%s" % li[4])
                sheet.write_string("G%d" % y, "%s" % li[5])
                sheet.write_string("H%d" % y, "%s" % li[6])
                sheet.write_string("I%d" % y, "%s" % li[7])
                sheet.write_string("J%d" % y, "%s" % li[8])
                sheet.write_string("K%d" % y, "%s" % li[9])
                sheet.write_string("L%d" % y, "%s" % li[10])
                sheet.write_string("M%d" % y, "%s" % li[11])
                sheet.write_string("N%d" % y, "%s" % li[12])
                sheet.write_string("O%d" % y, "%s" % li[13])
                sheet.write_string("P%d" % y, "%s" % li[14])
                sheet.write_string("Q%d" % y, "%s" % li[15])
                sheet.write_string("R%d" % y, "%s" % li[16])
                sheet.write_string("S%d" % y, "%s" % li[17])
                sheet.write_string("T%d" % y, "%s" % li[18])
                sheet.write_string("U%d" % y, "%s" % li[19])
                sheet.write_string("V%d" % y, "%s" % li[20])
                sheet.write_string("W%d" % y, "%s" % li[21])
                sheet.write_string("X%d" % y, "%s" % li[22])
                sheet.write_string("Y%d" % y, "%s" % li[23])
                sheet.write_string("Z%d" % y, "%s" % li[24])
        except:
            print('error')
            
    sheet.set_column('A:B', 60)
    xl.close()
  • 相关阅读:
    HDU_2030——统计文本中汉字的个数
    HDU_2028——求多个数的最小公倍数
    HDU_2027——统计元音
    HDU_2026——将单词的首字母变大写
    HDU_2025——查找最大的字母
    HDU_2024——判断字符串是否是c语言合法标识符
    HDU_2023——求平均成绩
    HDU_2022——海选女主角
    HDU_2021——最少RMB问题
    HDU_2020——按绝对值排序
  • 原文地址:https://www.cnblogs.com/eleni/p/14302775.html
Copyright © 2011-2022 走看看