  • no2. Batch-reading crossdomain.xml (work in progress)

    Problems occur when reading too many URLs.

    #coding=utf-8
    # Python 2 script: read site URLs from xmlsource.txt, fetch each site's
    # crossdomain.xml, and write the declared domains to reslut.txt.
    import urllib
    import re
    import time


    def getxml(url):
        # Strip trailing slashes so the request path is not '//crossdomain.xml'.
        xml = urllib.urlopen(url.rstrip('/') + '/crossdomain.xml')
        xmlread = xml.read()
        xml.close()
        # Capture from 'domain=' up to the '/>' that closes the tag.
        reg = re.compile(r'(?=domain=)(.*?)(?=/>)')
        domaintxt = re.findall(reg, xmlread)
        return domaintxt

    f = open('xmlsource.txt', 'r')
    f1 = open('reslut.txt', 'w')
    context = f.readlines()
    f.close()
    for i in context:
        x = i.strip()
        if not x:
            continue  # skip blank lines in the input file
        # Fetch once per site and reuse the result for counting and printing.
        domains = getxml(x)
        header = 'website:' + x + ' have ' + str(len(domains)) + ' domain:'
        print header
        print >>f1, header
        for falresult in domains:
            falresult = falresult.replace('"', '')
            falresult = falresult.replace('domain=', '')
            print falresult
            print >>f1, falresult
        print ('\n')
        print >>f1, ('\n')
        time.sleep(1)  # pause between sites
    print ('Over')
    print >>f1, ('Over')
    f1.close()
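
The note at the top does not say what actually fails on long URL lists. One plausible cause, assumed here, is that a single unreachable host, hung connection, or non-XML response aborts the whole run. The Python 3 sketch below isolates each site so one failure is recorded and the loop continues, and it reads the `domain` attributes with `xml.etree.ElementTree` in place of the regex. The names `extract_domains`, `fetch_policy`, and `scan_sites`, as well as the 10-second timeout, are illustrative choices and not part of the original script.

```python
import urllib.request
import xml.etree.ElementTree as ET


def extract_domains(xml_text):
    """Return the domain attribute of every <allow-access-from> element."""
    root = ET.fromstring(xml_text)
    return [el.get("domain") for el in root.iter("allow-access-from")
            if el.get("domain") is not None]


def fetch_policy(site, timeout=10):
    """Download <site>/crossdomain.xml; the timeout value is an assumption."""
    url = site.rstrip("/") + "/crossdomain.xml"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def scan_sites(sites, fetch=fetch_policy):
    """Map each site to its domain list, or to None if fetching or parsing failed."""
    results = {}
    for site in sites:
        try:
            results[site] = extract_domains(fetch(site))
        except (OSError, ET.ParseError, ValueError):
            # URLError (including HTTP errors) and socket timeouts are OSError
            # subclasses; a missing or malformed policy file is recorded as None.
            results[site] = None
    return results
```

Passing `fetch` as a parameter keeps the loop testable without network access; for a real run, something like `scan_sites(line.strip() for line in open("xmlsource.txt") if line.strip())` would feed it the same input file.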
    

 xmlsource.txt (one site URL per line):

    http://www.sina.com.cn/
    http://www.discuz.net/
    http://www.rising.com.cn/
    http://www.ifeng.com//
    http://www.sdo.com/
    http://www.sogou.com/
    http://www.163.com/
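
Every entry in this list ends with a slash (one with two), so appending `/crossdomain.xml` directly yields paths such as `//crossdomain.xml`. A minimal Python 3 sketch of one way to normalize each line, assuming every entry is an absolute URL and the policy file is requested from the site root; `policy_url` is a hypothetical helper name:

```python
from urllib.parse import urlsplit, urlunsplit


def policy_url(site):
    """Build the crossdomain.xml URL from the scheme and host of a listed site."""
    parts = urlsplit(site.strip())
    if not parts.scheme or not parts.netloc:
        raise ValueError("expected an absolute URL, got %r" % site)
    # Drop any path, query, or fragment and request the file at the site root.
    return urlunsplit((parts.scheme, parts.netloc, "/crossdomain.xml", "", ""))
```

For example, `policy_url("http://www.ifeng.com//")` gives `http://www.ifeng.com/crossdomain.xml`.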

     

  • Original post: https://www.cnblogs.com/crac/p/5451639.html