  • Writing Web Page Hack Scripts


    batch

    Q: How do I get the directory of the current script?

    %~dp0

    Q: How do I redirect output with tee?

    cmd 2>&1 | "C:\Program Files\Git\usr\bin\tee" pyDownload.log
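
    Where Git's tee is not installed, the same effect can be approximated from Python. The following is a minimal sketch rather than part of the original notes: run_and_tee is a hypothetical helper name, and it assumes the command prints line-oriented text.

```python
import subprocess
import sys


def run_and_tee(cmd, log_path):
    """Run cmd, echoing its combined stdout/stderr to the console and to log_path."""
    with open(log_path, "w", encoding="utf-8") as log:
        # stderr=subprocess.STDOUT plays the role of 2>&1 in the batch one-liner.
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
        for line in proc.stdout:
            sys.stdout.write(line)  # show the line on the console
            log.write(line)         # and keep a copy in the log file
        return proc.wait()
```

    A call such as run_and_tee([sys.executable, "myDownload.py"], "pyDownload.log") would then mirror the batch command; myDownload.py is a placeholder script name.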

    python

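    Download a web page with the wget module

    import subprocess

    def download(url, file):
        # Pass the command as a list so no shell quoting is needed.
        cmd = ['python', '-m', 'wget', url, '-o', file]
        ret = subprocess.run(cmd, timeout=8).returncode
        print('%s return %d' % (cmd, ret))
        if ret != 0:
            # Python 3 cannot raise a plain string; raise an exception object.
            raise RuntimeError('error, %s return %d' % (cmd, ret))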
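
    If the wget module is not installed, the standard library can do the same job. This is a minimal sketch under that assumption; download_with_urllib is a hypothetical name, and the 8-second default mirrors the timeout used above.

```python
import shutil
import urllib.request


def download_with_urllib(url, file, timeout=8):
    """Save the resource at url to the local path file."""
    # urlopen raises URLError/HTTPError on failure, so there is no return code to check.
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(file, "wb") as out:
        shutil.copyfileobj(resp, out)
```

    Failures surface as exceptions, so the caller can wrap the call in try/except instead of inspecting a return code.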
    

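    Get the substring between a and b

    def getStrIn(cont, a, b):
        # Find the end of marker a; return '' if a is missing.
        p1 = cont.find(a)
        if p1 < 0:
            return ''
        p1 = p1 + len(a)

        # Find marker b after a; return '' if b is missing.
        p2 = cont.find(b, p1)
        if p2 < 0:
            return ''
        return cont[p1:p2]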
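
    The same find()-based idea extends to collecting every match on a page, not only the first. A minimal sketch, assuming non-empty markers; get_all_between and the sample HTML are hypothetical and not from the original notes.

```python
def get_all_between(cont, a, b):
    """Return every substring of cont that sits between marker a and the next marker b."""
    results = []
    pos = 0
    while True:
        start = cont.find(a, pos)
        if start < 0:
            break
        start += len(a)
        end = cont.find(b, start)
        if end < 0:
            break
        results.append(cont[start:end])
        pos = end + len(b)  # continue searching after the closing marker
    return results


html = '<a href="/book/1">One</a><a href="/book/2">Two</a>'
print(get_all_between(html, 'href="', '"'))  # ['/book/1', '/book/2']
```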
    

    Exception handling

    import sys

    try:
        xxx  # placeholder for the code that may fail
    except Exception:
        print("Unexpected error:", sys.exc_info()[0])
    

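    Send a POST request and parse the JSON response

    import sys
    import requests

    # ebookId and token must be set beforehand. data= sends a form-encoded body;
    # pass json= instead if the server expects a JSON body.
    try:
        response = requests.post('http://www.hzcourse.com/web/refbook/queryAllChapterList', data={'ebookId': ebookId, 'token': token})
        resp_json = response.json()
    except Exception:
        print("Unexpected error:", sys.exc_info()[0])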
    

    Send a request, example 2 (GET with a custom User-Agent)

    import requests
    
    # Adjacent string literals are concatenated, so the URL contains no stray newlines or spaces.
    link = ("https://api-zero.livere.com/v1/comments/"
            "list?callback=jQuery1124049866736766120545_"
            "1506309304525&limit=10&offset=1&repSeq=3871836"
            "&requestPath=%2Fv1%2Fcomments%2Flist"
            "&consumerSeq=1020&livereSeq=28583"
            "&smartloginSeq=5154&_=1506309304527")
    headers = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'}
    
    r = requests.get(link, headers=headers)
    print(r.text)
    

    selenium

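    Q: What can I do when click() occasionally hangs?

    A: Unknown.

    Q: What can I do when get() occasionally hangs?

    A: I know of no fully reliable fix. It usually happens because the page has not finished loading. If the content you need is already there, click the browser's Stop button and get() returns. Calling driver.set_page_load_timeout(seconds) beforehand also makes get() raise TimeoutException instead of waiting indefinitely.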

    Open a web page

    from selenium import webdriver
    driver = webdriver.Chrome()
    driver.get('http://ebooks.cmanuf.com/all?id=1&type=2&code=AC05')
    

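    Get page content with a CSS selector

    from selenium.webdriver.common.by import By

    # Selenium 4 removed find_element_by_css_selector(); i is the zero-based item index.
    book = driver.find_element(By.CSS_SELECTOR, '#booklist > dd:nth-child(%d) > a' % (i + 1))
    book_href = book.get_attribute('href')
    book_text = book.text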
    

    tampermonkey

    Header template

    // ==UserScript==
    // @name         TimeBooking
    // @namespace    http://tampermonkey.net/
    // @version      0.1
    // @description  try to take over the world!
    // @author       You
    // @match        https://www.citibank.com.hk/*
    // @match        https://www.services.online-banking.hsbc.com.hk/*
    // @match        https://e-banking1.hangseng.com/*
    // @match        https://ebsnew.boc.cn/*
    // @match        https://its.bochk.com/cdc.overview.do
    // @grant          GM_xmlhttpRequest
    // ==/UserScript==
    
    (function() {
        'use strict';
        
    })();
    

    XPath usage template

            var xpath='//textarea';
            var tags_data_image =document.evaluate(xpath, document, null, XPathResult.ANY_TYPE,null);
    
            var textareavalue='';
            var tags=[];
            for(var tag=tags_data_image.iterateNext(); tag; tag=tags_data_image.iterateNext())
            {
                tags.push(tag);
                if(tag.value.length>0) {
                    textareavalue = tag.value;
    
                }
            }
    
    
    function _x(STR_XPATH) {
        var xresult = document.evaluate(STR_XPATH, document, null, XPathResult.ANY_TYPE, null);
        var xnodes = [];
        var xres;
        while (xres = xresult.iterateNext()) {
            xnodes.push(xres);
        }
    
        return xnodes;
    }
    
    $(_x('/html/.//div[@id="text"]')).attr('id', 'modified-text');
    

    Logging

    console.log(...)
    

    current url

    var currentLocation = window.location;
    
    currentLocation.host
    
    

    Find substring

    s.indexOf('citibank')
    

    JSON to string

    var sFinal = {"value1":s2, "value2":today};
    alert(JSON.stringify(sFinal));
    

    send JSON request

            GM_xmlhttpRequest ( {
                method:     "POST",
                url:        'https://maker.ifttt.com/trigger/bankmoney/with/key/feQcXd0QuePnJb23E97bv',
                data:       JSON.stringify(sFinal),
                headers:    {
                    "Content-Type": "application/json"
                },
                onload:     function (response) {
                    console.log ("got response " + response.responseText);
                    alert("Success " + response.responseText);
                }
            } );
    

    insert a node after a node

        function insertAfter(newElement,targetElement) {
            // Target is the element the new one should follow; look up its parent.
            var parent = targetElement.parentNode;
    
            // If the target is the parent's last child...
            if (parent.lastChild === targetElement) {
                // ...append the new element at the end.
                parent.appendChild(newElement);
            } else {
                // Otherwise insert the new element between the target and its next sibling.
                parent.insertBefore(newElement, targetElement.nextSibling);
            }
        }
    

    create a html node

            var btn = document.createElement("a");
            btn.innerText='DoMyTask!';
            btn.addEventListener("click",updateAll);
            btn.style.color = 'red';
    

    stop on debugger

     debugger;
    

    When the window has loaded, call a function

        window.addEventListener('load', function() {
            // Poll until the target exists; getRefObject() and main() are defined elsewhere in the script.
            var checkExist = setInterval(function() {
                if (getRefObject() != null) {
                    console.log("Exists!");
                    clearInterval(checkExist);
                    main();
                }
            }, 1000); // check every 1000 ms
        }, false);
    

    css selector

           var matches = document.querySelectorAll(".adaver_box, #div-ad-top, #adHeaderTop, #adFlashLink, *[id^='adRectangle'], #adTextLink, *[id^='divSkyscraper'], *[id^='div-ad-'], *[id^='google_ads'], .anv-ad-content");
            matches.forEach(function(element){
                element.parentNode.removeChild(element);
            });
    

    Use jquery

    // @require      http://ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js
    
    $('a[onmousedown^="return rwt("]').removeAttr('onmousedown');
    
    $('li.action-menu-item.ab_dropdownitem a[href^="http://webcache.googleusercontent."]').each(
        function() {
            $(this).closest('div.action-menu.ab_ctl').after(' ').after($(this));
        }
    );
    
    

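    How to analyze a web page

    Chrome's Inspect feature

    Step 1: Open Inspect. Open the Hello World article in Chrome. Right-click anywhere on the page and choose Inspect from the context menu to get the window shown in Figure 4-5.

    [Figure 4-5]

    Step 2: Find the real data URL. Click the Network tab, then refresh the page. Network now lists every file the browser received from the web server; this process is commonly called "packet capture". Since every file is listed, the comment data we want must be among them.

    These data are usually fetched as JSON. Under All in Network, locate the actual comment file, "list?callback=jQuery11240879907919223679". Click Preview to view the data, as shown in Figure 4-6.

    [Figure 4-6]

    Step 3: Crawl the real comment data URL. Now that the real URL is known, request it directly with requests to get the data. The code is as follows: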

    import requests
    
    # Adjacent string literals are concatenated, so the URL contains no stray newlines or spaces.
    link = ("https://api-zero.livere.com/v1/comments/"
            "list?callback=jQuery1124049866736766120545_"
            "1506309304525&limit=10&offset=1&repSeq=3871836"
            "&requestPath=%2Fv1%2Fcomments%2Flist"
            "&consumerSeq=1020&livereSeq=28583"
            "&smartloginSeq=5154&_=1506309304527")
    headers = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'}
    
    r = requests.get(link, headers=headers)
    print(r.text)
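
    Because the URL carries a callback= parameter, the body returned is typically JSONP, that is, JSON wrapped in a function call, and json.loads cannot parse it directly. The following is a minimal sketch of unwrapping it; parse_jsonp and the sample payload are hypothetical, and the field names of the real response are not shown in these notes.

```python
import json


def parse_jsonp(text):
    """Strip the callback wrapper from a JSONP response and parse the JSON inside."""
    start = text.find("(")
    end = text.rfind(")")
    if start < 0 or end < start:
        raise ValueError("not a JSONP payload")
    return json.loads(text[start + 1:end])


# Hypothetical payload shaped like a JSONP reply; the real fields may differ.
sample = 'jQuery1124_1506({"results": [{"content": "hello"}]});'
print(parse_jsonp(sample)["results"][0]["content"])  # hello
```

    With the request above, this would be called as data = parse_jsonp(r.text).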
    

    Links

    https://zhuanlan.zhihu.com/p/31127887 requests

    https://zhuanlan.zhihu.com/p/31127896 selenium

    https://zhuanlan.zhihu.com/p/73742321

    https://www.jianshu.com/p/beb200cda628 selenium

  • Original article: https://www.cnblogs.com/cutepig/p/12263568.html