  • [Python] Parsing Docx files

    1. cd D:\ProgramData\Anaconda3

    2. pip install python-docx
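
    Note that the package is installed as python-docx but imported as docx. A minimal sketch for checking that the import name resolves in the current interpreter; the helper name is_installed is made up for this illustration:

```python
import importlib.util

def is_installed(module_name):
    # Hypothetical helper: True if the top-level import name can be found,
    # without actually importing the module
    return importlib.util.find_spec(module_name) is not None

print("docx available:", is_installed("docx"))
```

    If this prints False, pip probably installed into a different Python environment than the one running the script.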

    3. Process the documents with Python code

    # -*- coding: utf-8 -*-
     
    
    
    import os
    import docx
    from win32com import client as wc
    
    docs = []
     
    def traverse(f):
        # Recursively collect every .doc/.docx file under folder f into docs
        fs = os.listdir(f)
        for f1 in fs:
            tmp_path = os.path.join(f,f1)
            if not os.path.isdir(tmp_path):
                #print('File: %s'%tmp_path)
                if os.path.splitext(tmp_path)[-1].lower() in (".doc", ".docx"):
                    docs.append(tmp_path)
            else:
                #print('Folder: %s'%tmp_path)
                traverse(tmp_path)
    
    
    def parseDoc(f):
        # python-docx reads .docx only; convert a .doc with doc2docx first
        doc = docx.Document(f)
        parag_num = 0
        for para in doc.paragraphs :
            print("----------------------------------------------------")
            print(para.text)
            print("----------------------------------------------------")
            parag_num += 1
        print('This document has', parag_num, 'paragraphs')
    
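    def doc2docx(full_path):
        # Use an absolute path so Word can locate the file
        full_path = os.path.abspath(full_path)
        # Only legacy .doc files need converting
        if not full_path.lower().endswith(".doc"):
            return
        newpath = full_path + "x"
    
        # Skip files that were already converted
        if os.path.exists(newpath):
            return
    
        # Start Word through COM so the .doc can be converted to .docx
        word = wc.Dispatch("Word.Application")
        try:
            # Open the file by its full path (folder + file name)
            doc = word.Documents.Open(full_path)
            # File format 16 saves as .docx; the file can only be read after this conversion
            doc.SaveAs(newpath,16)
            doc.Close()
        finally:
            # Close Word even if opening or saving failed
            word.Quit()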
    
                
    path = 'E:/NLP/Docs/'
    
    traverse(path)
     
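    # Quick test: parse only the first file found
    for k,v in enumerate(docs):
        if k < 1:
            print(k,v)
            parseDoc(v)  # v must be a .docx; run doc2docx(v) first for a .doc
            #doc2docx(v)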
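
    The recursive traverse function above can also be written with os.walk, returning the matches instead of appending to a global list. A minimal sketch under that assumption; find_word_files is a name chosen for illustration and is not part of the original script:

```python
import os

def find_word_files(root):
    # Illustrative helper: collect .doc/.docx paths anywhere under root
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[-1].lower() in (".doc", ".docx"):
                matches.append(os.path.join(dirpath, name))
    return matches

if __name__ == "__main__":
    # 'E:/NLP/Docs/' is the folder used in the script above
    for p in find_word_files('E:/NLP/Docs/'):
        print(p)
```

    Returning a list keeps the function reusable: calling it twice does not accumulate duplicate entries the way a shared global list would.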
  • Original article: https://www.cnblogs.com/defineconst/p/9915851.html