zoukankan      html  css  js  c++  java
  • 项目笔记《DeepLung:Deep 3D Dual Path Nets for Automated Pulmonary Nodule Detection and Classification》(三)(下)结果评估

    在(上)中讲了如何得到csv文件并调用noduleCADEvaluationLUNA16.py求取froc值,这里就讲一讲froc值是如何求取的。

       annotations_filename = './annotations/annotations.csv'
        annotations_excluded_filename = './annotations/annotations_excluded.csv'
        seriesuids_filename = './annotations/seriesuids.csv'
        results_filename = './annotations/3DRes18FasterR-CNN.csv'#3D Faster R-CNN - Res18.csv' #top5.csv'#
        seriesuids_path = '/home/ustc/lclin/code/DeepLung/evaluationScript/series_uid9.csv'
        results_path = "/home/ustc/lclin/code/DeepLung/detector/results/wzy_res18/retrft969/val120/predanno-0.125pbb.csv"
        noduleCADEvaluation(annotations_filename,annotations_excluded_filename,seriesuids_path,results_path,'./')

    如上面代码所示,输入标签文件,结果文件,调用noduleCADEvaluation函数即可,任务完成,十分简单。

    接下来看一下这个函数。

    def noduleCADEvaluation(annotations_filename,annotations_excluded_filename,seriesuids_filename,results_filename,outputDir):
        '''
        function to load annotations and evaluate a CAD algorithm
        @param annotations_filename: list of annotations
        @param annotations_excluded_filename: list of annotations that are excluded from analysis
        @param seriesuids_filename: list of CT images in seriesuids
        @param results_filename: list of CAD marks with probabilities
        @param outputDir: output directory
        '''
        
        print annotations_filename
        
        (allNodules, seriesUIDs) = collect(annotations_filename, annotations_excluded_filename, seriesuids_filename) #根据标签和用户id,求出所有结节,所有用户
        
        evaluateCAD(seriesUIDs, results_filename, outputDir, allNodules, #根据结节,用户,结果文件,输出froc值
                    os.path.splitext(os.path.basename(results_filename))[0],
                    maxNumberOfCADMarks=100, performBootstrapping=bPerformBootstrapping,
                    numberOfBootstrapSamples=bNumberOfBootstrapSamples, confidence=bConfidence)

    这个函数调用了两个函数collect和evaluateCAD。先从collect开始看。

    def collect(annotations_filename,annotations_excluded_filename,seriesuids_filename):
        annotations          = csvTools.readCSV(annotations_filename)
        annotations_excluded = csvTools.readCSV(annotations_excluded_filename)
        seriesUIDs_csv = csvTools.readCSV(seriesuids_filename)
        
        seriesUIDs = [] #建立一个用户列表,将用户id添加进去
        for seriesUID in seriesUIDs_csv:
            seriesUIDs.append(seriesUID[0]) #seriesUID也是一个列表,只不过只有一个元素
    
        allNodules = collectNoduleAnnotations(annotations, annotations_excluded, seriesUIDs) 
        
        return (allNodules, seriesUIDs)

    该函数又调用了另一个函数collectNoduleAnnotations。

    def collectNoduleAnnotations(annotations, annotations_excluded, seriesUIDs):
        allNodules = {} #将所有结节存储在字典中
        noduleCount = 0
        noduleCountTotal = 0
        
        for seriesuid in seriesUIDs:
            # print 'adding nodule annotations: ' + seriesuid
            
            nodules = []
            numberOfIncludedNodules = 0    #对真正用来检测的结节计数
            
            # add included findings
            header = annotations[0]
            for annotation in annotations[1:]:
                nodule_seriesuid = annotation[header.index(seriesuid_label)]
                
                if seriesuid == nodule_seriesuid:  #将结节所属的用户id与要建立字典索引的id比较,若相同,就获取它
                    nodule = getNodule(annotation, header, state = "Included")
                    nodules.append(nodule)
                    numberOfIncludedNodules += 1
            
            # add excluded findings
            header = annotations_excluded[0]
            for annotation in annotations_excluded[1:]:
                nodule_seriesuid = annotation[header.index(seriesuid_label)]
                
                if seriesuid == nodule_seriesuid: #对无关结节也执行上面的操作,不同的是不对include计数
                    nodule = getNodule(annotation, header, state = "Excluded")
                    nodules.append(nodule)
                
            allNodules[seriesuid] = nodules
            noduleCount      += numberOfIncludedNodules
            noduleCountTotal += len(nodules)
        
        print 'Total number of included nodule annotations: ' + str(noduleCount)
        print 'Total number of nodule annotations: ' + str(noduleCountTotal)
        return allNodules

    这段代码比较简单,其中有一个获取结节的函数getNodule,需要看一下

    def getNodule(annotation, header, state = ""):
        nodule = NoduleFinding()
        nodule.coordX = annotation[header.index(coordX_label)] #依次将x,y,z添加到nodule对象的属性中
        nodule.coordY = annotation[header.index(coordY_label)]
        nodule.coordZ = annotation[header.index(coordZ_label)]
        
        if diameter_mm_label in header:          #检查有无直径标签,有则添加
            nodule.diameter_mm = annotation[header.index(diameter_mm_label)]
        
        if CADProbability_label in header:       #检查有无概率标签,有则添加
            nodule.CADprobability = annotation[header.index(CADProbability_label)]
        
        if not state == "":
            nodule.state = state
    
        return nodule

     这里又又出现一个函数,准确地说是类,NoduleFinding(),这个类存在于nodulefinding.py模块,也是在evaluationScript这个文件夹中。

    一起来欣赏下这个类,一目了然,而且这个模块也只有这段代码。

    class NoduleFinding(object):
      '''
      Represents a nodule
      '''
      
      def __init__(self, noduleid=None, coordX=None, coordY=None, coordZ=None, coordType="World",
                   CADprobability=None, noduleType=None, diameter=None, state=None, seriesInstanceUID=None):
    
        # set the variables and convert them to the correct type
        self.id = noduleid
        self.coordX = coordX
        self.coordY = coordY
        self.coordZ = coordZ
        self.coordType = coordType
        self.CADprobability = CADprobability
        self.noduleType = noduleType
        self.diameter_mm = diameter
        self.state = state
        self.candidateID = None
        self.seriesuid = seriesInstanceUID

    另外,大家可能会奇怪,xxx_label都是些什么,这些在文件noduleCADEvaluationLUNA16.py中都有定义,如下,看一下标签文件就明白了。

    seriesuid_label = 'seriesuid'
    coordX_label = 'coordX'
    coordY_label = 'coordY'
    coordZ_label = 'coordZ'
    diameter_mm_label = 'diameter_mm'
    CADProbability_label = 'probability'

    最后,回到起点,看看froc究竟是如何计算的,有请evaluateCAD函数,这段代码超级长。

    def evaluateCAD(seriesUIDs, results_filename, outputDir, allNodules, CADSystemName, maxNumberOfCADMarks=-1,#输入病人id,检测结果,评估结果的输出目录,所有标签结节,
                    performBootstrapping=True,numberOfBootstrapSamples=1000,confidence = 0.95):         #检测系统的名字,其余参数与自助采样有关,不太懂
        '''
        function to evaluate a CAD algorithm
        @param seriesUIDs: list of the seriesUIDs of the cases to be processed
        @param results_filename: file with results
        @param outputDir: output directory
        @param allNodules: dictionary with all nodule annotations of all cases, keys of the dictionary are the seriesuids
        @param CADSystemName: name of the CAD system, to be used in filenames and on FROC curve
        '''
    
        nodOutputfile = open(os.path.join(outputDir,'CADAnalysis.txt'),'w') #写入CADAnalysis文件
        nodOutputfile.write("
    ")
        nodOutputfile.write((60 * "*") + "
    ")
        nodOutputfile.write("CAD Analysis: %s
    " % CADSystemName)
        nodOutputfile.write((60 * "*") + "
    ")
        nodOutputfile.write("
    ")
    
        results = csvTools.readCSV(results_filename)#读取检测结果csv格式,seriesuid,coordX,coordY,coordZ,probability
    
        allCandsCAD = {}
        
        for seriesuid in seriesUIDs: #对每个病例读取相应的候选结节
            
            # collect candidates from result file
            nodules = {}  #从检测结果文件中获取与病例对应的候选结节,添加进字典
            header = results[0]
            
            i = 0
            for result in results[1:]:
                nodule_seriesuid = result[header.index(seriesuid_label)]
                
                if seriesuid == nodule_seriesuid:
                    nodule = getNodule(result, header)
                    nodule.candidateID = i
                    nodules[nodule.candidateID] = nodule
                    i += 1
    
            if (maxNumberOfCADMarks > 0):
                # number of CAD marks, only keep must suspicous marks
    
                if len(nodules.keys()) > maxNumberOfCADMarks:
                    # make a list of all probabilities
                    probs = []
                    for keytemp, noduletemp in nodules.iteritems():
                        probs.append(float(noduletemp.CADprobability))
                    probs.sort(reverse=True) # sort from large to small
                    probThreshold = probs[maxNumberOfCADMarks]
                    nodules2 = {}
                    nrNodules2 = 0
                    for keytemp, noduletemp in nodules.iteritems():
                        if nrNodules2 >= maxNumberOfCADMarks:
                            break
                        if float(noduletemp.CADprobability) > probThreshold:
                            nodules2[keytemp] = noduletemp
                            nrNodules2 += 1
    
                    nodules = nodules2
            
            # print 'adding candidates: ' + seriesuid
            allCandsCAD[seriesuid] = nodules #将病例与对应候选结节存入字典
        
        # open output files
        nodNoCandFile = open(os.path.join(outputDir, "nodulesWithoutCandidate_%s.txt" % CADSystemName), 'w')
        
        # --- iterate over all cases (seriesUIDs) and determine how
        # often a nodule annotation is not covered by a candidate
    
        # initialize some variables to be used in the loop 
        candTPs = 0    #这里是一堆定义,让人头大
        candFPs = 0
        candFNs = 0
        candTNs = 0
        totalNumberOfCands = 0 #总候选结节数,也就是你的检测结果文件中的所有结节数量
        totalNumberOfNodules = 0 #总标签结节数
        doubleCandidatesIgnored = 0
        irrelevantCandidates = 0
        minProbValue = -1000000000.0 # minimum value of a float
        FROCGTList = []
        FROCProbList = []
        FPDivisorList = []
        excludeList = []
        FROCtoNoduleMap = []
        ignoredCADMarksList = []
    
        # -- loop over the cases
        for seriesuid in seriesUIDs:
            # get the candidates for this case
            try:
                candidates = allCandsCAD[seriesuid]
            except KeyError:
                candidates = {}
    
            # add to the total number of candidates
            totalNumberOfCands += len(candidates.keys())
    
            # make a copy in which items will be deleted
            candidates2 = candidates.copy() #对候选结节复制一个副本
    
            # get the nodule annotations on this case
            try:
                noduleAnnots = allNodules[seriesuid] #获取所有结节中属于该病例的结节(既包括真的结节,也包括无关的结节)
            except KeyError:
                noduleAnnots = []
    
            # - loop over the nodule annotations
            for noduleAnnot in noduleAnnots: #对标签结节循环处理
                # increment the number of nodules
                if noduleAnnot.state == "Included": #如果是用来评测的真结节,则
                    totalNumberOfNodules += 1
    
                x = float(noduleAnnot.coordX) #获取结节坐标
                y = float(noduleAnnot.coordY)
                z = float(noduleAnnot.coordZ)
    
                # 2. Check if the nodule annotation is covered by a candidate
                # A nodule is marked as detected when the center of mass of the candidate is within a distance R of
                # the center of the nodule. In order to ensure that the CAD mark is displayed within the nodule on the
                # CT scan, we set R to be the radius of the nodule size.
                diameter = float(noduleAnnot.diameter_mm)
                if diameter < 0.0:
                  diameter = 10.0
                radiusSquared = pow((diameter / 2.0), 2.0)
    
                found = False
                noduleMatches = []
                for key, candidate in candidates.iteritems(): #对每一个候选结节,判断是否与真实结节相交
                    x2 = float(candidate.coordX)
                    y2 = float(candidate.coordY)
                    z2 = float(candidate.coordZ)
                    dist = math.pow(x - x2, 2.) + math.pow(y - y2, 2.) + math.pow(z - z2, 2.)
                    if dist < radiusSquared: #判断是否在半径距离内
                        if (noduleAnnot.state == "Included"): #若是用来检测的结节,添加进noduleMatches,并删除该候选结节
                            found = True
                            noduleMatches.append(candidate)
                            if key not in candidates2.keys():#这里不复杂,就是说把每个与标签结节相交的候选结节提取出来后,要删除副本中的id,这样是检测会否有其它标签结节也与该结节所属的候选结节相交
                                print "This is strange: CAD mark %s detected two nodules! Check for overlapping nodule annotations, SeriesUID: %s, nodule Annot ID: %s" % (str(candidate.id), seriesuid, str(noduleAnnot.id))
                            else:
                                del candidates2[key]
                        elif (noduleAnnot.state == "Excluded"): # an excluded nodule #若是无关结节,则添加进ignoredCADMarksList
                            if bOtherNodulesAsIrrelevant: #    delete marks on excluded nodules so they don't count as false positives
                                if key in candidates2.keys():
                                    irrelevantCandidates += 1
                                    ignoredCADMarksList.append("%s,%s,%s,%s,%s,%s,%.9f" % (seriesuid, -1, candidate.coordX, candidate.coordY, candidate.coordZ, str(candidate.id), float(candidate.CADprobability)))
                                    del candidates2[key]
                if len(noduleMatches) > 1: # double detection #如果一个标签结节对应多个候选结节,记录下数目
                    doubleCandidatesIgnored += (len(noduleMatches) - 1)
                if noduleAnnot.state == "Included": #若该结节是include结节
                    # only include it for FROC analysis if it is included
                    # otherwise, the candidate will not be counted as FP, but ignored in the
                    # analysis since it has been deleted from the nodules2 vector of candidates
                    if found == True: #若找到与之匹配的候选结节
                        # append the sample with the highest probability for the FROC analysis
                        maxProb = None
                        for idx in range(len(noduleMatches)):
                            candidate = noduleMatches[idx]
                            if (maxProb is None) or (float(candidate.CADprobability) > maxProb):
                                maxProb = float(candidate.CADprobability) #记录匹配的候选结节的概率,将最大概率存入maxProb
    
                        FROCGTList.append(1.0) #添加 1
                        FROCProbList.append(float(maxProb)) #添加刚刚的最大概率
                        FPDivisorList.append(seriesuid) #添加病例id
                        excludeList.append(False) #添加False
                        FROCtoNoduleMap.append("%s,%s,%s,%s,%s,%.9f,%s,%.9f" % (seriesuid, noduleAnnot.id, noduleAnnot.coordX, noduleAnnot.coordY, noduleAnnot.coordZ, float(noduleAnnot.diameter_mm), str(candidate.id), float(candidate.CADprobability)))
                        candTPs += 1 #添加1
                    else: #若未找到与之匹配的结节
                        candFNs += 1 
                        # append a positive sample with the lowest probability, such that this is added in the FROC analysis
                        FROCGTList.append(1.0)
                        FROCProbList.append(minProbValue)
                        FPDivisorList.append(seriesuid)
                        excludeList.append(True)
                        FROCtoNoduleMap.append("%s,%s,%s,%s,%s,%.9f,%s,%s" % (seriesuid, noduleAnnot.id, noduleAnnot.coordX, noduleAnnot.coordY, noduleAnnot.coordZ, float(noduleAnnot.diameter_mm), int(-1), "NA"))
                        nodNoCandFile.write("%s,%s,%s,%s,%s,%.9f,%s
    " % (seriesuid, noduleAnnot.id, noduleAnnot.coordX, noduleAnnot.coordY, noduleAnnot.coordZ, float(noduleAnnot.diameter_mm), str(-1)))
    
            # add all false positives to the vectors
            for key, candidate3 in candidates2.iteritems(): #此时candidata2中都是无人领取的结节,都是FP
                candFPs += 1
                FROCGTList.append(0.0)
                FROCProbList.append(float(candidate3.CADprobability))
                FPDivisorList.append(seriesuid)
                excludeList.append(False)
                FROCtoNoduleMap.append("%s,%s,%s,%s,%s,%s,%.9f" % (seriesuid, -1, candidate3.coordX, candidate3.coordY, candidate3.coordZ, str(candidate3.id), float(candidate3.CADprobability)))
    
        if not (len(FROCGTList) == len(FROCProbList) and len(FROCGTList) == len(FPDivisorList) and len(FROCGTList) == len(FROCtoNoduleMap) and len(FROCGTList) == len(excludeList)):
            nodOutputfile.write("Length of FROC vectors not the same, this should never happen! Aborting..
    ")
    
        nodOutputfile.write("Candidate detection results:
    ")
        nodOutputfile.write("    True positives: %d
    " % candTPs)
        nodOutputfile.write("    False positives: %d
    " % candFPs)
        nodOutputfile.write("    False negatives: %d
    " % candFNs)
        nodOutputfile.write("    True negatives: %d
    " % candTNs)
        nodOutputfile.write("    Total number of candidates: %d
    " % totalNumberOfCands)
        nodOutputfile.write("    Total number of nodules: %d
    " % totalNumberOfNodules)
    
        nodOutputfile.write("    Ignored candidates on excluded nodules: %d
    " % irrelevantCandidates)
        nodOutputfile.write("    Ignored candidates which were double detections on a nodule: %d
    " % doubleCandidatesIgnored)
        if int(totalNumberOfNodules) == 0:
            nodOutputfile.write("    Sensitivity: 0.0
    ")
        else:
            nodOutputfile.write("    Sensitivity: %.9f
    " % (float(candTPs) / float(totalNumberOfNodules)))
        nodOutputfile.write("    Average number of candidates per scan: %.9f
    " % (float(totalNumberOfCands) / float(len(seriesUIDs))))
    
        # compute FROC
        print FROCGTList
        print FROCProbList
        print len(FROCGTList)
        fps, sens, thresholds = computeFROC(FROCGTList,FROCProbList,len(seriesUIDs),excludeList) #这一步最为关键,计算FROC值,其实返回的是召回率与假阳性率的列表
        
        if performBootstrapping: #是否使用自助采样,不太懂
            fps_bs_itp,sens_bs_mean,sens_bs_lb,sens_bs_up = computeFROC_bootstrap(FROCGTList,FROCProbList,FPDivisorList,seriesUIDs,excludeList,
                                                                      numberOfBootstrapSamples=numberOfBootstrapSamples, confidence = confidence)
            
        # Write FROC curve #计算到上面就结束了,后面是一些画图
        with open(os.path.join(outputDir, "froc_%s.txt" % CADSystemName), 'w') as f:
            for i in range(len(sens)):
                f.write("%.9f,%.9f,%.9f
    " % (fps[i], sens[i], thresholds[i]))
        
        # Write FROC vectors to disk as well
        with open(os.path.join(outputDir, "froc_gt_prob_vectors_%s.csv" % CADSystemName), 'w') as f:
            for i in range(len(FROCGTList)):
                f.write("%d,%.9f
    " % (FROCGTList[i], FROCProbList[i]))
    
        fps_itp = np.linspace(FROC_minX, FROC_maxX, num=10001)
        
        sens_itp = np.interp(fps_itp, fps, sens)
        frvvlu = 0
        nxth = 0.125
        for fp, ss in zip(fps_itp, sens_itp):
            if abs(fp - nxth) < 3e-4:
                frvvlu += ss
                nxth *= 2
            if abs(nxth - 16) < 1e-5: break
        print(frvvlu/7, nxth)
        print(sens_itp[fps_itp==0.125]+sens_itp[fps_itp==0.25]+sens_itp[fps_itp==0.5]+sens_itp[fps_itp==1]+sens_itp[fps_itp==2]
            +sens_itp[fps_itp==4]+sens_itp[fps_itp==8])
        if performBootstrapping:
            # Write mean, lower, and upper bound curves to disk
            with open(os.path.join(outputDir, "froc_%s_bootstrapping.csv" % CADSystemName), 'w') as f:
                f.write("FPrate,Sensivity[Mean],Sensivity[Lower bound],Sensivity[Upper bound]
    ")
                for i in range(len(fps_bs_itp)):
                    f.write("%.9f,%.9f,%.9f,%.9f
    " % (fps_bs_itp[i], sens_bs_mean[i], sens_bs_lb[i], sens_bs_up[i]))
        else:
            fps_bs_itp = None
            sens_bs_mean = None
            sens_bs_lb = None
            sens_bs_up = None
    
        # create FROC graphs
        if int(totalNumberOfNodules) > 0:
            graphTitle = str("")
            fig1 = plt.figure()
            ax = plt.gca()
            clr = 'b'
            plt.plot(fps_itp, sens_itp, color=clr, label="%s" % CADSystemName, lw=2)
            if performBootstrapping:
                plt.plot(fps_bs_itp, sens_bs_mean, color=clr, ls='--')
                plt.plot(fps_bs_itp, sens_bs_lb, color=clr, ls=':') # , label = "lb")
                plt.plot(fps_bs_itp, sens_bs_up, color=clr, ls=':') # , label = "ub")
                ax.fill_between(fps_bs_itp, sens_bs_lb, sens_bs_up, facecolor=clr, alpha=0.05)
            xmin = FROC_minX
            xmax = FROC_maxX
            plt.xlim(xmin, xmax)
            plt.ylim(0.5, 1)
            plt.xlabel('Average number of false positives per scan')
            plt.ylabel('Sensitivity')
            plt.legend(loc='lower right')
            plt.title('FROC performance - %s' % (CADSystemName))
            
            if bLogPlot:
                plt.xscale('log', basex=2)
                ax.xaxis.set_major_formatter(FixedFormatter([0.125,0.25,0.5,1,2,4,8]))
            
            # set your ticks manually
            ax.xaxis.set_ticks([0.125,0.25,0.5,1,2,4,8])
            ax.yaxis.set_ticks(np.arange(0.5, 1, 0.1))
            # ax.yaxis.set_ticks(np.arange(0, 1.1, 0.1))
            plt.grid(b=True, which='both')
            plt.tight_layout()
    
            plt.savefig(os.path.join(outputDir, "froc_%s.png" % CADSystemName), bbox_inches=0, dpi=300)
    
        return (fps, sens, thresholds, fps_bs_itp, sens_bs_mean, sens_bs_lb, sens_bs_up)

     这里会计算FROC,调用的是computeFROC函数。

    def computeFROC(FROCGTList, FROCProbList, totalNumberOfImages, excludeList):
        # Remove excluded candidates
        FROCGTList_local = []
        FROCProbList_local = []
        for i in range(len(excludeList)):
            if excludeList[i] == False:
                FROCGTList_local.append(FROCGTList[i])
                FROCProbList_local.append(FROCProbList[i])
        
        numberOfDetectedLesions = sum(FROCGTList_local)
        totalNumberOfLesions = sum(FROCGTList)
        totalNumberOfCandidates = len(FROCProbList_local)
        fpr, tpr, thresholds = skl_metrics.roc_curve(FROCGTList_local, FROCProbList_local, pos_label=1)
        if sum(FROCGTList) == len(FROCGTList): # Handle border case when there are no false positives and ROC analysis give nan values.
          print "WARNING, this system has no false positives.."
          fps = np.zeros(len(fpr))
        else:
          fps = fpr * (totalNumberOfCandidates - numberOfDetectedLesions) / totalNumberOfImages
        sens = (tpr * numberOfDetectedLesions) / totalNumberOfLesions
        return fps, sens, thresholds

    到此,结束。

  • 相关阅读:
    LBS 经纬度定位
    LBS 经纬度定位
    GPS定位基本原理
    GPS定位基本原理
    Android Studio 之 启动和停止服务
    Android Studio 之 启动和停止服务
    【算法】最短路——两点最短总权和
    【算法】最短路——两点最短总权和
    【郑轻】[1743]解方程
    【郑轻】[1743]解方程
  • 原文地址:https://www.cnblogs.com/wzyuan/p/9718779.html
Copyright © 2011-2022 走看看