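Selenium is a tool for testing web applications. Selenium tests run directly in the browser, just as a real user would operate it. Supported browsers include IE (7, 8, 9, 10, 11), Mozilla Firefox, Safari, Google Chrome, and Opera. Anyone learning the basics of web scraping with Python will run into the Selenium framework sooner or later.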
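1. First, of course, install the selenium module. This assumes you already have Python 3 and a Python editor such as PyCharm, IDLE, or Visual Studio Code. There are many other Python editors; for details see this link: https://baijiahao.baidu.com/s?id=1620388483830154843&wfr=spider&for=pc . I use PyCharm.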
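2. Because Selenium drives a real browser, you first need to download the driver that matches your browser. Most people use Chrome, Firefox, or IE; the corresponding drivers are listed at this link: https://www.cnblogs.com/momolei/p/10118526.html . Note: each browser version needs a matching version of xxx.exe, and this is very important. Put the downloaded xxx.exe in your Python 3 directory.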
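A quick way to check that the driver ended up somewhere Python can find it is to look it up on PATH. This is only a minimal sketch: it assumes your Python 3 directory is on PATH (which depends on how Python was installed), and the helper name find_driver is my own, not part of Selenium.

```python
import shutil


def find_driver(executable="chromedriver"):
    """Return the full path of the driver if it is on PATH, else None."""
    # On Windows, shutil.which also tries the usual extensions such as .exe
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_driver()
    print(path if path else "chromedriver not found on PATH")
```

If this prints a path, Selenium should be able to locate the same file; if not, you can still pass the full path to the driver explicitly.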
Once the steps above are done, install Selenium from a CMD window: pip install selenium
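To confirm that the install worked before opening the editor, a small check like the one below can help. It is a sketch only; the helper name is_installed is hypothetical and not part of Selenium.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the named top-level module can be found by Python."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("selenium installed:", is_installed("selenium"))
```

If it prints False, pip probably installed into a different Python than the one you are running.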
3. Next, check from your Python editor that Selenium can actually open the browser. For example:
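from selenium import webdriver

# Point Selenium at chromedriver.
# This is the Selenium 3 style call; newer Selenium 4 releases expect the path via a Service object.
browser = webdriver.Chrome(r"C:\Program Files (x86)\Google\Chrome\Application\chromedriver.exe")
# Set the page-load timeout (in seconds)
browser.set_page_load_timeout(10)
# Open the Baidu home page
browser.get("https://www.baidu.com")
print(browser.page_source)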
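If you see Baidu open and the page source printed, you are 50% of the way there. With the selenium module installed, you can move on to scraping the Cacti traffic graphs.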
4. Now for the main code.
from selenium import webdriver
from selenium.webdriver.common.by import By
import datetime

# Selenium 3 style call; newer Selenium 4 releases expect the path via a Service object.
driver = webdriver.Chrome(r'D:\python3.7\chromedriver.exe')
# Replace with the address of your Cacti server
driver.get('http://xxx/graph_view.php')

# Log in
name = driver.find_element(By.NAME, "login_username")
passwd = driver.find_element(By.NAME, "login_password")
name.send_keys('your login account')
passwd.send_keys('your login password')
# The login button is located by its Chinese label ("登录" means "log in")
submit = driver.find_element(By.XPATH, '//td/input[@value="登录"]')
submit.click()

# Build the time points: 06:00, 10:00, 18:00 and 22:00 for each of the last seven days
for i in range(6, -1, -1):
    if i == 6:
        threeDayAgo = (datetime.datetime.now() - datetime.timedelta(days=i))
        otherStyleTime = threeDayAgo.strftime("%Y-%m-%d %H:%M:%S")
        format_otherStyleTime1 = "%s 06:00:00" % otherStyleTime.split()[0]
        format_otherStyleTime2 = "%s 10:00:00" % otherStyleTime.split()[0]
        format_otherStyleTime3 = "%s 18:00:00" % otherStyleTime.split()[0]
        format_otherStyleTime4 = "%s 22:00:00" % otherStyleTime.split()[0]
        list1 = [format_otherStyleTime1, format_otherStyleTime2, format_otherStyleTime3, format_otherStyleTime4]
    elif i == 5:
        threeDayAgo = (datetime.datetime.now() - datetime.timedelta(days=i))
        otherStyleTime = threeDayAgo.strftime("%Y-%m-%d %H:%M:%S")
        format_otherStyleTime1 = "%s 06:00:00" % otherStyleTime.split()[0]
        format_otherStyleTime2 = "%s 10:00:00" % otherStyleTime.split()[0]
        format_otherStyleTime3 = "%s 18:00:00" % otherStyleTime.split()[0]
        format_otherStyleTime4 = "%s 22:00:00" % otherStyleTime.split()[0]
        list2 = [format_otherStyleTime1, format_otherStyleTime2, format_otherStyleTime3, format_otherStyleTime4]
    elif i == 4:
        threeDayAgo = (datetime.datetime.now() - datetime.timedelta(days=i))
        otherStyleTime = threeDayAgo.strftime("%Y-%m-%d %H:%M:%S")
        format_otherStyleTime1 = "%s 06:00:00" % otherStyleTime.split()[0]
        format_otherStyleTime2 = "%s 10:00:00" % otherStyleTime.split()[0]
        format_otherStyleTime3 = "%s 18:00:00" % otherStyleTime.split()[0]
        format_otherStyleTime4 = "%s 22:00:00" % otherStyleTime.split()[0]
        list3 = [format_otherStyleTime1, format_otherStyleTime2, format_otherStyleTime3, format_otherStyleTime4]
    elif i == 3:
        threeDayAgo = (datetime.datetime.now() - datetime.timedelta(days=i))
        otherStyleTime = threeDayAgo.strftime("%Y-%m-%d %H:%M:%S")
        format_otherStyleTime1 = "%s 06:00:00" % otherStyleTime.split()[0]
        format_otherStyleTime2 = "%s 10:00:00" % otherStyleTime.split()[0]
        format_otherStyleTime3 = "%s 18:00:00" % otherStyleTime.split()[0]
        format_otherStyleTime4 = "%s 22:00:00" % otherStyleTime.split()[0]
        list4 = [format_otherStyleTime1, format_otherStyleTime2, format_otherStyleTime3, format_otherStyleTime4]
    elif i == 2:
        threeDayAgo = (datetime.datetime.now() - datetime.timedelta(days=i))
        otherStyleTime = threeDayAgo.strftime("%Y-%m-%d %H:%M:%S")
        format_otherStyleTime1 = "%s 06:00:00" % otherStyleTime.split()[0]
        format_otherStyleTime2 = "%s 10:00:00" % otherStyleTime.split()[0]
        format_otherStyleTime3 = "%s 18:00:00" % otherStyleTime.split()[0]
        format_otherStyleTime4 = "%s 22:00:00" % otherStyleTime.split()[0]
        list5 = [format_otherStyleTime1, format_otherStyleTime2, format_otherStyleTime3, format_otherStyleTime4]
    elif i == 1:
        threeDayAgo = (datetime.datetime.now() - datetime.timedelta(days=i))
        otherStyleTime = threeDayAgo.strftime("%Y-%m-%d %H:%M:%S")
        format_otherStyleTime1 = "%s 06:00:00" % otherStyleTime.split()[0]
        format_otherStyleTime2 = "%s 10:00:00" % otherStyleTime.split()[0]
        format_otherStyleTime3 = "%s 18:00:00" % otherStyleTime.split()[0]
        format_otherStyleTime4 = "%s 22:00:00" % otherStyleTime.split()[0]
        list6 = [format_otherStyleTime1, format_otherStyleTime2, format_otherStyleTime3, format_otherStyleTime4]
    elif i == 0:
        threeDayAgo = (datetime.datetime.now() - datetime.timedelta(days=i))
        otherStyleTime = threeDayAgo.strftime("%Y-%m-%d %H:%M:%S")
        format_otherStyleTime1 = "%s 06:00:00" % otherStyleTime.split()[0]
        format_otherStyleTime2 = "%s 10:00:00" % otherStyleTime.split()[0]
        format_otherStyleTime3 = "%s 18:00:00" % otherStyleTime.split()[0]
        format_otherStyleTime4 = "%s 22:00:00" % otherStyleTime.split()[0]
        # days=i-1 is tomorrow: 06:00 tomorrow closes the last window
        threeDayAgo = (datetime.datetime.now() - datetime.timedelta(days=i-1))
        otherStyleTime1 = threeDayAgo.strftime("%Y-%m-%d %H:%M:%S")
        format_otherStyleTime5 = "%s 06:00:00" % otherStyleTime1.split()[0]
        list7 = [format_otherStyleTime1, format_otherStyleTime2, format_otherStyleTime3, format_otherStyleTime4, format_otherStyleTime5]
        # Named time_list rather than list so the built-in is not shadowed
        time_list = list1 + list2 + list3 + list4 + list5 + list6 + list7

# Each pair of neighbouring time points is one graph window.
# There are 29 points, so 28 windows; range(0, 30) would run past the end of the list.
for i in range(len(time_list) - 1):
    driver.find_element(By.NAME, "date1").clear()  # clear the old start time
    driver.find_element(By.NAME, "date2").clear()  # clear the old end time
    driver.find_element(By.NAME, "date1").send_keys(time_list[i])
    driver.find_element(By.NAME, "date2").send_keys(time_list[i + 1])
    driver.find_element(By.NAME, "button_refresh_x").click()
    # Open the graph image
    driver.find_element(By.XPATH, ".//tbody/tr[4]/td//table/tbody/tr/td[2]/a/img").click()
    # Selenium screenshots are PNG data, so save them with a .png name
    picture_name = '%s.png' % i
    driver.save_screenshot(picture_name)
    # Go back to the graph list
    driver.find_element(By.XPATH, './/tbody/tr/td//a[2]').click()
driver.close()
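All in one go: the traffic graphs you want now appear in the same directory as the .py file. The time windows I capture are dictated by my own work requirements; you can change them to whatever periods you need.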
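When changing the periods, it helps to keep in mind that each screenshot covers the interval between two neighbouring time points, so N points give N-1 windows. A minimal sketch of that pairing, with a hypothetical helper name to_windows and made-up sample timestamps:

```python
def to_windows(points):
    """Pair each time point with the next one: (start, end) per screenshot."""
    return list(zip(points, points[1:]))


if __name__ == "__main__":
    sample = ["2020-01-02 06:00:00", "2020-01-02 10:00:00", "2020-01-02 18:00:00"]
    for start, end in to_windows(sample):
        # start would go into the date1 field, end into date2
        print(start, "->", end)
```

Looping over the pairs directly avoids having to work out by hand how many times the screenshot loop should run.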
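I suspect the date for-loop in the middle could be simpler, but I have not yet worked out how to optimize it; I will update this post if I do. If you have a better idea, feel free to message me.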
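One possible simplification, offered as a sketch rather than a tested replacement: since every branch does the same thing for a different day offset, the seven branches can be folded into two nested loops. The function name build_time_points and its parameters are my own invention for illustration.

```python
import datetime


def build_time_points(now=None, days_back=6, hours=(6, 10, 18, 22)):
    """Build the timestamp strings for the last days_back+1 days.

    Ends with the first hour of tomorrow, which mirrors the extra
    days=i-1 entry in the original i == 0 branch.
    """
    if now is None:
        now = datetime.datetime.now()
    today = now.date()
    points = []
    for offset in range(days_back, -1, -1):
        day = today - datetime.timedelta(days=offset)
        for hour in hours:
            # str(day) is already in YYYY-MM-DD form
            points.append("%s %02d:00:00" % (day, hour))
    tomorrow = today + datetime.timedelta(days=1)
    points.append("%s %02d:00:00" % (tomorrow, hours[0]))
    return points


if __name__ == "__main__":
    for point in build_time_points():
        print(point)
```

With the default arguments this yields 29 strings, the same count as the seven lists combined. Changing the capture periods then only means editing the hours tuple or days_back.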
If you repost this, please include a link to the original. Thanks!