  • Just a file-node class for my project's data processing

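      I am now in the second semester of my second year of graduate school, and I have been writing code for this retrieval project for almost two years. I first touched the project in the second semester of my senior undergraduate year. During my first year of graduate school I had no idea how to do research, I was the only person on the project, and I wrote code haphazardly. I also had too much on my mind, so the only word for that period is chaos.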

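      But in October of the first semester of my second year, a classmate joined, and it is fair to say that the project only truly started then. After our system was finished, I decided on a whim to tidy up the code I had written. We have to write a paper, so the experimental comparison section needs a lot of data processing. I had in fact written similar data-processing code back in my first year, yet I ended up rewriting it, because writing it again took less time than recalling the old code. That is how I discovered that much of my code had been written more than once, and that is why I started this cleanup. What I want even more is to combine a file-tree data structure with a data-processing workflow, so that our data-processing module becomes a pipeline and its code can be reused later. Whoever does the grunt work should always look for ways to work more efficiently. With that idea in mind I implemented the class below. It is written in Python, because Python is really convenient for data processing, string handling, and batch processing. Perhaps I am the only one who can use this class. Why write a blog post, then? Perhaps because when I later supervise a new first-year graduate student on a paper, I will ask them to read and reuse the code we wrote. I do not have much time to mentor a new student, so I will point them to my blog.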

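      My data structure is simply a multiway tree that represents a directory structure. Each node is a file (or a directory), and a stack and a queue are used to implement the tree-traversal and node-insertion algorithms. Here is the code; I will come back and add comments when I have time: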

    import os
    from strOp import strExt
    from collections import deque
    from tblOp import tblConcat
    
    class FileNode:
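        def __init__(self, _fileName_s='',
                     _brothers=None,
                     _sons=None,
                     _isDir_b=False,
                     _parent=None
                     ):
            self.fileName_s = _fileName_s
            self.bro = _brothers
            # Create a fresh list per node; a mutable default ([]) would be shared by all nodes.
            self.sons = _sons if _sons is not None else []
            self.isDir_b = _isDir_b
            self.parent = _parent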
    
    def addNodeUnderPathUnrecur(root, _path_s):
        ''' inputs: 
                root -> the root of directory tree. It must give the root of the d
                _path_s -> add the sons under the path of _path_s. 
                           if _path_s is equal to 'D:\CS_DATA\' 
                           then all the file under it is added as sons of the node named 'CS_DATA'
            outputs:
                Add all the files under _path_s as its sons. The input must give the root of directory
        '''
        node = searchNodeFromGivenFilePath(root, _path_s)
        if node is None:
            print('can not find the path %s' % _path_s)
            return
        for fileName_s in os.listdir(_path_s):
            # _path_s must end with a path separator for this concatenation to work.
            newNode = FileNode(fileName_s, None, [], os.path.isdir(_path_s + fileName_s), node)
            if len(node.sons) != 0:
                # Link the previous last son to the new node as its brother.
                node.sons[-1].bro = newNode
            node.sons.append(newNode)
            # isSameName(node, newNode) is skipped: the file system guarantees unique names.
    
    def searchNodeFromGivenFilePath(root, _path_s):
        ''' inputs: 
                root -> Must give the root of directory. Meaning the absolute path of a node.
                _path_s -> The absolute path of a node. Examples: 'D:\CS_DATA\'
            output:
                Search the directory tree from root to find the node whose fileName_s is equal to 'CS_DATA'.
                So, you must give the absolute path. Whether 'D:\CS_DATA\' or 'D:\CS_DATA' would be fine.
        '''
        if _path_s[-1] != '\\':
            _path_s += '\\'
    
        folderStructure = _path_s.split('\\')
        if root.bro is not None:
            print('input root is not root of file tree')
            return
        if folderStructure[0] != root.fileName_s:
            print('the head of input path is not same as root')
            return
        stack = []
        stack.append(root)
        # The last element of folderStructure is '' (trailing separator), so skip it.
        for i in range(1, len(folderStructure)-1):
            if len(stack) == 0:
                print('stack is empty')
                return None
            node = stack.pop()
            flag = 0
            for j in node.sons:
                if folderStructure[i] == j.fileName_s:
                    stack.append(j)
                    flag = 1
            if flag == 0:
                print('can not find the folder %s' % folderStructure[i])
                return None
        node = stack.pop()
        return node
    
    def addNodeAsSonFromGivenNode(root, _sonPath_s):
        ''' inputs:
                root -> The root of the directory. Which directory that you want to add the node.
                _sonPath_s -> The absolute path of added node. 
                Examples: 'D:\CS_DATA\tree\' means add the node named 'tree' to its parent 'CS_DATA'
            outputs:
                The directory tree with added node.
        '''
        if _sonPath_s[-1] != '\\':
            _sonPath_s += '\\'
        fileStructure = _sonPath_s.split('\\')
        lenOfFileStructure = len(fileStructure)
        if lenOfFileStructure <= 2:
            print('There is no son in the input path %s' % _sonPath_s)
            return
    
        # fileStructure ends with '' (trailing separator), so the son is the second-to-last item.
        _sonFileName_s = fileStructure[-2]
        _parentPath_s = ''
        for i in range(len(fileStructure)-2):
            _parentPath_s = _parentPath_s + fileStructure[i] + '\\'
        _addNodeAsSonFromGivenNode(root, _parentPath_s, _sonFileName_s)
    
    def _addNodeAsSonFromGivenNode(root, _parentPath_s, _sonFileName_s):
        ''' inputs:
                root -> The root of directory tree.
                _parentPath_s -> The absolute path of parent
                _sonFileName_s -> the filename of added node
            outputs:
                This function is a auxiliary function of addNodeAsSonFromGivenNode
        '''
        if _parentPath_s[-1] != '\\':
            _parentPath_s += '\\'
    
        parentNode = searchNodeFromGivenFilePath(root, _parentPath_s)
        if parentNode is None:
            print('can not find the parent folder %s' % _parentPath_s)
            return None
        newNode = FileNode(_sonFileName_s, None, [], os.path.isdir(_parentPath_s+_sonFileName_s), parentNode)
        if isSameName(parentNode, newNode):
            return
        if len(parentNode.sons) != 0:
            # Link the previous last son to the new node as its brother.
            parentNode.sons[-1].bro = newNode
        parentNode.sons.append(newNode)
    
    def isSameName(parentNode, sonNode):
        ''' inputs:
                parentNode -> The parent node.
                sonNode -> the son node.
            outputs:
                If sonNode is already in parentNode.sons then return True.
        '''
        for node in parentNode.sons:
            if node.fileName_s == sonNode.fileName_s:
                print('has same node %s\\%s -> %s' % (parentNode.fileName_s, node.fileName_s, sonNode.fileName_s))
                return True
        return False
    
    def addNodeUnderPathRecur(root, _path_s):
        ''' inputs:
                root -> The root of directory.
                _path_s -> The absolute path wanted to be added. Examples: 'D:\CS_DATA\'
            outputs:
                1. Add all the file nodes under _path_s recursively. 
                2. The _path_s must exist in root.
            Unsafe:
                1. Some system directory can not be added recursively. Examples: 'D:\System Volume Information'
                2. I do not make the judgment between files whether have same name when adding.
                3. So, this function must use in the premise of operation system ensuring the rule for us.
        '''
        if _path_s[-1] != '\\':
            _path_s = _path_s + '\\'
    
        fileStructure = _path_s.split('\\')
        if fileStructure[0] == root.fileName_s and len(fileStructure) == 2:
            print('_path_s can not be the root')
            return
    
        returnNode = currentNode = searchNodeFromGivenFilePath(root, _path_s)
        if currentNode is None:
            print('can not find the path')
            return
        queue = deque([])
        fileName_sl = os.listdir(_path_s)
        for fileName_s in fileName_sl:
            file_s = _path_s + fileName_s
            newNode = FileNode(fileName_s, None, [], os.path.isdir(file_s), currentNode)
            queue.append(newNode)
        while(len(queue) != 0):
            newNode = queue.popleft()
            currentNode = newNode.parent
            lenOfSonsCurrentNode = len(currentNode.sons)
            if lenOfSonsCurrentNode == 0:
                currentNode.sons.append(newNode)
            else:
                currentNode.sons[lenOfSonsCurrentNode-1].bro = newNode
                currentNode.sons.append(newNode)
            
            if newNode.isDir_b == True:
                fullPathOfNewNode = getFullPathOfNode(newNode)
                subFileName_sl = os.listdir(fullPathOfNewNode)
                for subFileName_s in subFileName_sl:
                    subNewNode = FileNode(subFileName_s, None, [], os.path.isdir(fullPathOfNewNode+subFileName_s), newNode)
                    queue.append(subNewNode)
        return returnNode       
     
    def printBrosOfGivenNode(root, _path_s):
        ''' inputs:
                root -> The root of the directory.
                _path_s -> Examples: 'D:\CS_DATA' , 'D:\CS_DATA\'
            outputs:
                print out the bros of 'CS_DATA' for 'D:\CS_DATA'
                print out the sons of 'CS_DATA' for 'D:\CS_DATA\'
        '''
        node = searchNodeFromGivenFilePath(root, _path_s)
        if node is None:
            print('can not find the node')
            return
        printStr = ''
        if _path_s[-1] != '\\':
            # No trailing separator: print the node together with its brothers.
            if node.parent is None:
                print('the root has no bros')
                return
            headOfSons = node.parent.sons[0]
            printStr = headOfSons.fileName_s + ','
            while headOfSons.bro is not None:
                headOfSons = headOfSons.bro
                printStr = printStr + headOfSons.fileName_s + ','
        else:
            # Trailing separator: print the sons of the node.
            if len(node.sons) == 0:
                print('its sons is empty')
            else:
                for son in node.sons:
                    printStr = printStr + son.fileName_s + ','
        print(printStr[:-1])
    
    def crtFileTreeFromPath(_path_s):
        ''' inputs:
                _path_s -> Examples: 'D:\sketchDataset\' 
            outputs:
                This function will create the root node by 'D:',
                and then, call addNodeUnderPathUnrecur to add files under 'D:\',
                and then, again call addNodeUnderPathUnrecur to add files under 'D:\sketchDataset\'
                This process is a loop until the last separator of _path_s.
        '''
        if _path_s[-1] != '\\':
            _path_s += '\\'
        fileStructure = _path_s.split('\\')
        lenOfFileStructure = len(fileStructure)
        root = FileNode(_fileName_s=fileStructure[0], _isDir_b=os.path.isdir(fileStructure[0]))
    
        fileStr = root.fileName_s + '\\'
        addNodeUnderPathUnrecur(root, fileStr)
        for i in range(1, lenOfFileStructure-1):
            file_s = fileStructure[i]
            fileStr = fileStr + file_s + '\\'
            addNodeUnderPathUnrecur(root, fileStr)
        return root
    
    def searchLeafNodeUnderGivenNode(root, _path_s):
        ''' inputs:
                root -> For the given directory tree.
                _path_s -> The absolute path of node that wanted to search all the leafs under it.
            outputs:
                Return all the leafs under the given _path_s.
                Leaf is the file whose has not sons and it is not a directory
        '''
        node = searchNodeFromGivenFilePath(root, _path_s)
        leafs = []
        if node is None:
            print('can not find the node in searchLeafNodeUnderGivenNode')
            return leafs
        queue = deque([])
        queue.append(node)
        while(len(queue) != 0):
            currentNode = queue.popleft()
            if len(currentNode.sons) == 0 and (currentNode.isDir_b == False):
                leafs.append(currentNode)
            else:
                for son in currentNode.sons:
                    queue.append(son)
        return leafs        
    
    def getFullPathOfNode(givenNode):
        ''' 
            find the full(absolute) path of the input node.
        '''
        tmpNode = givenNode
        fullPathOfNode = tmpNode.fileName_s + '\\'
        while tmpNode.parent is not None:
            tmpNode = tmpNode.parent
            fullPathOfNode = tmpNode.fileName_s + '\\' + fullPathOfNode
        return fullPathOfNode
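
The code above is tied to Windows paths because it splits and joins on the backslash by hand. As a rough, portable illustration of the same idea (a multiway tree filled breadth-first with a queue, then scanned for leaf files), here is a minimal sketch. The names `Node`, `build_tree`, `leaves`, and `demo` are hypothetical and not part of the original module, and the sibling (`bro`) links are left out for brevity.

```python
import os
import shutil
import tempfile
from collections import deque


class Node(object):
    """Hypothetical, simplified stand-in for FileNode (no sibling links)."""

    def __init__(self, name, is_dir=False, parent=None):
        self.name = name
        self.is_dir = is_dir
        self.parent = parent
        self.sons = []  # a fresh list per node, never a shared default

    def full_path(self):
        # Walk up to the root, then join the names with the OS separator.
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return os.path.join(*reversed(parts))


def build_tree(top):
    """Scan the directory `top` breadth-first and return the root Node."""
    root = Node(top, is_dir=True)
    queue = deque([root])
    while queue:
        current = queue.popleft()
        current_path = current.full_path()
        for name in sorted(os.listdir(current_path)):
            is_dir = os.path.isdir(os.path.join(current_path, name))
            child = Node(name, is_dir, current)
            current.sons.append(child)
            if is_dir:
                queue.append(child)  # only directories need expanding
    return root


def leaves(root):
    """Collect every non-directory node below `root`, breadth-first."""
    found = []
    queue = deque([root])
    while queue:
        current = queue.popleft()
        if not current.is_dir:
            found.append(current)
        queue.extend(current.sons)
    return found


def demo():
    """Build a throwaway directory and list its leaves as (parent, file) pairs."""
    top = tempfile.mkdtemp()
    try:
        os.mkdir(os.path.join(top, 'airplane'))  # made-up category name
        for name in ('m1.off', 'm2.off'):        # made-up model files
            open(os.path.join(top, 'airplane', name), 'w').close()
        root = build_tree(top)
        return sorted((leaf.parent.name, leaf.name) for leaf in leaves(root))
    finally:
        shutil.rmtree(top)
```

Calling `demo()` builds a temporary directory with one category folder and two files, and returns the leaves paired with their parent folder. Using `os.path.join` instead of string concatenation is a design choice of this sketch, not something the original code does.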
    

     For example, to compute the validation set for sketch retrieval, append the following code after the code above:

    if __name__ == '__main__':
        root = crtFileTreeFromPath('D:\\sketchDataset\\')
        categroyNode = addNodeUnderPathRecur(root, 'D:\\sketchDataset\\category\\')
        leafs = searchLeafNodeUnderGivenNode(root, 'D:\\sketchDataset\\category\\')
        # Map each category (the leaf's parent folder) to the model files it contains.
        containModel_t = {}
        for leaf in leafs:
            modelId = strExt.extractModelIdWithSuffix(leaf.fileName_s, suffix_s='.off')
            containModel_t.setdefault(leaf.parent.fileName_s, []).append(modelId)
        categroyNode = addNodeUnderPathRecur(root, 'D:\\sketchDataset\\all_categorized_sketches\\')
        # Map each sketch to the category folder it sits in.
        sketchToCate_t = {}
        for son in categroyNode.sons:
            sketchNodes = son.sons
            for sketchNode in sketchNodes:
                sketchName = strExt.extractSketchNameWithSuffix(sketchNode.fileName_s, suffix_s='.txt')
                if sketchName not in sketchToCate_t:
                    sketchToCate_t[sketchName] = son.fileName_s
    
        wanted = tblConcat.concatTableByKey_ValAndVal_Vals(sketchToCate_t, containModel_t)
        print(wanted)

     The result is shown below: the validation models for sketch 165 are 'm1646.off', 'm1647.off', and so on.

    {'s165.txt': ['m1646.off', 'm1647.off', 'm1648.off', 'm1649.off', 'm1650.off', 'm1651.off', 'm1652.off', 'm1653.off', 'm1654.off', 'm1655.off', 'm1656.off', 'm1657.off', 'm1658.off', 'm1659.off', 'm1660.off', 'm1661.off', 'm1662.off', 'm1663.off', 'm1664.off', 'm1665.off'] ......}
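
The source of `tblConcat.concatTableByKey_ValAndVal_Vals` is not shown in this post. Judging only from the inputs and the output above, it appears to join a sketch-to-category table with a category-to-models table. The following is a minimal sketch of such a join under that assumption; the function name `concat_by_category` and the category names are made up for illustration.

```python
def concat_by_category(sketch_to_category, category_to_models):
    """Map each sketch to the model list of its category.

    Hypothetical stand-in for tblConcat.concatTableByKey_ValAndVal_Vals,
    whose real implementation is not shown in the post.
    """
    joined = {}
    for sketch, category in sketch_to_category.items():
        # A sketch whose category has no models gets an empty list.
        joined[sketch] = list(category_to_models.get(category, []))
    return joined


# Illustrative data only: the category names are invented.
sketch_to_category = {'s165.txt': 'airplane', 's166.txt': 'chair'}
category_to_models = {'airplane': ['m1646.off', 'm1647.off']}
result = concat_by_category(sketch_to_category, category_to_models)
```

Copying each list with `list(...)` keeps sketches in the same category from sharing one list object, so a later change to one entry does not leak into another.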
    
  • Original article: https://www.cnblogs.com/Key-Ky/p/4461700.html