  • Analyzing Internet User Behavior in the Big Data Era

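    Requirements:

    Write a program that captures everything a user does during an online session. For example, the user browses Baidu for 3 minutes, then Sina for 6 minutes, then goes on to other pages; the program should capture all of this browsing behavior.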
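The requirement boils down to turning a time-ordered list of page visits into time spent per site. Below is a minimal sketch of that bookkeeping, assuming the visits have already been collected as `(timestamp, url)` pairs and that a page stays in view until the next visit starts. The names `dwell_times`, `sample`, and `session_end` are made up for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse


def dwell_times(visits, session_end):
    """Sum the seconds spent on each host.

    visits: list of (timestamp_seconds, url), sorted by time.
    session_end: timestamp at which observation stopped.
    Assumption: a page stays in view until the next visit starts.
    """
    totals = defaultdict(float)
    for i, (start, url) in enumerate(visits):
        end = visits[i + 1][0] if i + 1 < len(visits) else session_end
        host = urlparse(url).hostname or "unknown"
        totals[host] += end - start
    return dict(totals)


# Hypothetical session: 3 minutes on Baidu, then 6 minutes on Sina.
sample = [
    (0, "https://www.baidu.com/"),
    (180, "https://www.sina.com.cn/"),
]
report = dwell_times(sample, session_end=540)
for host, seconds in report.items():
    print(f"{host}: {seconds / 60:.0f} min")
```

How the timestamps are obtained (browser history, packet capture, or input events) is a separate question; this sketch only covers the aggregation.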

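    Preparation:
    Packages that may be needed:
    pynput.mouse: contains classes for controlling and monitoring a mouse or trackpad.

    pynput.keyboard: contains classes for controlling and monitoring the keyboard.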

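    A mouse listener is a thread (threading.Thread), and all callbacks are invoked from that thread.

    Calling pynput.mouse.Listener.stop, raising StopException, or returning False from a callback stops the listener.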
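As one way to use the "return False" stop mechanism, the sketch below keeps the stopping rule in a small class so it can be tried without a mouse hook. `ClickLimiter` and `record_clicks` are hypothetical names; only `mouse.Listener(on_click=...)` and `listener.join()` come from pynput, and `record_clicks` needs pynput installed and a desktop session.

```python
class ClickLimiter:
    """Callback holder that asks the listener to stop after `limit` clicks."""

    def __init__(self, limit):
        self.limit = limit
        self.clicks = []

    def on_click(self, x, y, button, pressed):
        if pressed:
            return None  # a click is counted when the button is released
        self.clicks.append((x, y, str(button)))
        if len(self.clicks) >= self.limit:
            return False  # returning False from a callback stops the listener
        return None


def record_clicks(limit=3):
    """Block until `limit` clicks were seen, then return them."""
    from pynput import mouse  # imported lazily so ClickLimiter works without pynput

    limiter = ClickLimiter(limit)
    with mouse.Listener(on_click=limiter.on_click) as listener:
        listener.join()
    return limiter.clicks
```

Calling `record_clicks(3)` from an interactive session should return after the third click with the recorded positions.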

    Controlling the mouse:

    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    '''
    Administrator
    2018/8/16
    '''

    from pynput.mouse import Button, Controller
    import time

    mouse = Controller()
    print(mouse.position)
    time.sleep(3)
    print('The current pointer position is {0}'.format(mouse.position))

    # Set the pointer position
    mouse.position = (277, 645)
    print('Now we have moved it to {0}'.format(mouse.position))

    # Move the pointer by (x, y) relative to its current position
    mouse.move(5, -5)
    print(mouse.position)

    # Press and release the left button
    mouse.press(Button.left)
    mouse.release(Button.left)

    # Double click: the second argument is the number of clicks
    mouse.click(Button.left, 2)

    # Scroll vertically: the second argument (dy) is the number of steps
    mouse.scroll(0, 500)

    Monitoring the mouse:

    from pynput import mouse

    def on_move(x, y):
        print('Pointer moved to {0}'.format(
            (x, y)))

    def on_click(x, y, button, pressed):
        print('{0} at {1}'.format(
            'Pressed' if pressed else 'Released',
            (x, y)))
        if not pressed:
            # Stop listener
            return False

    def on_scroll(x, y, dx, dy):
        print('Scrolled {0} at {1}'.format(
            'down' if dy < 0 else 'up',
            (x, y)))

    # Collect events until released
    with mouse.Listener(
            on_move=on_move,
            on_click=on_click,
            on_scroll=on_scroll) as listener:
        listener.join()

    Handling mouse listener errors

    from pynput import mouse

    class MyException(Exception): pass

    def on_click(x, y, button, pressed):
        if button == mouse.Button.left:
            raise MyException(button)

    # Collect events until released
    with mouse.Listener(
            on_click=on_click) as listener:
        try:
            listener.join()
        except MyException as e:
            print('{0} was clicked'.format(e.args[0]))

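    A keyboard listener is a thread (threading.Thread), and all callbacks are invoked from that thread.

    Calling pynput.keyboard.Listener.stop, raising StopException, or returning False from a callback stops the listener.

    The key argument passed to callbacks is a pynput.keyboard.Key for special keys, a pynput.keyboard.KeyCode for ordinary alphanumeric keys, or None for unknown keys.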
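Because a callback can receive three kinds of value, a small normalizing helper can make logging simpler. The sketch below is one possible approach, not part of pynput: `describe_key` is a hypothetical name, and it relies only on the `char` attribute of KeyCode objects and the `name` attribute that enum members such as `Key.esc` carry. Stand-in objects are used so the helper can be tried without a keyboard hook.

```python
from types import SimpleNamespace


def describe_key(key):
    """Return a printable label for the value passed to a keyboard callback."""
    if key is None:
        return "<unknown>"  # the listener could not identify the key
    char = getattr(key, "char", None)  # KeyCode objects carry .char
    if char is not None:
        return char
    name = getattr(key, "name", None)  # special keys are enum members with .name
    return f"<{name}>" if name else str(key)


# Stand-ins that mimic the attributes used above (not real pynput objects).
fake_letter = SimpleNamespace(char="a")
fake_special = SimpleNamespace(name="esc")
print(describe_key(fake_letter), describe_key(fake_special), describe_key(None))
```

Inside a real `on_press(key)` callback the same function could be called as `describe_key(key)` in place of the try/except on `key.char`.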

    Controlling the keyboard

    from pynput.keyboard import Key, Controller

    keyboard = Controller()

    # Press and release space
    keyboard.press(Key.space)
    keyboard.release(Key.space)

    # Type a lower case A; this will work even if no key on the
    # physical keyboard is labelled 'A'
    keyboard.press('a')
    keyboard.release('a')

    # Type two upper case As
    keyboard.press('A')
    keyboard.release('A')
    with keyboard.pressed(Key.shift):
        keyboard.press('a')
        keyboard.release('a')

    # Type 'Hello World' using the shortcut type method
    keyboard.type('Hello World')

    Monitoring the keyboard

    from pynput import keyboard

    def on_press(key):
        try:
            print('alphanumeric key {0} pressed'.format(
                key.char))
        except AttributeError:
            print('special key {0} pressed'.format(
                key))

    def on_release(key):
        print('{0} released'.format(
            key))
        if key == keyboard.Key.esc:
            # Stop listener
            return False

    # Collect events until released
    with keyboard.Listener(
            on_press=on_press,
            on_release=on_release) as listener:
        listener.join()

    Handling keyboard listener errors

    from pynput import keyboard

    class MyException(Exception): pass

    def on_press(key):
        if key == keyboard.Key.esc:
            raise MyException(key)

    # Collect events until released
    with keyboard.Listener(
            on_press=on_press) as listener:
        try:
            listener.join()
        except MyException as e:
            print('{0} was pressed'.format(e.args[0]))

    Viewing browser history with Python

    # coding: utf8
    '''
    Created on 2018-08-16

    @author: Administrator
    '''
    # Count visits recorded in the browser history
    # se://version/ shows where the browser stores its files

    import os
    import sqlite3
    import operator
    from collections import OrderedDict
    import matplotlib.pyplot as plt

    def parse(url):
        # Return the host part of the URL without "www." (None if the URL is malformed)
        try:
            parsed_url_components = url.split('//')
            sublevel_split = parsed_url_components[1].split('/', 1)
            domain = sublevel_split[0].replace("www.", "")
            return domain
        except IndexError:
            print('URL format error!')

    def analyze(results):
        prompt = input("[.] Type <c> to print or <p> to plot\n[>] ")

        if prompt == "c":
            # Write one "site<TAB>count" line per site
            with open('./history.txt', 'w') as f:
                for site, count in results.items():
                    f.write(str(site) + '\t' + str(count) + '\n')
        elif prompt == "p":
            key = []
            value = []
            for k, v in results.items():
                key.append(k)
                value.append(v)
            # Plot at most the 25 most visited sites
            n = min(25, len(value))
            X = range(n)
            Y = value[:n]
            plt.bar(X, Y, align='edge')
            plt.xticks(X, key[:n], rotation=45)
            for x, y in zip(X, Y):
                plt.text(x + 0.4, y + 0.05, y, ha='center', va='bottom')
            plt.show()
        else:
            print("[.] Uh?")
            quit()

    if __name__ == '__main__':
        # Path to the user's history database (Chrome layout; this path is for the 360 browser)
        data_path = r"D:\360浏览器\360se6\User Data\Default"
        files = os.listdir(data_path)
        # "D:\360浏览器\360se6\User Data\Default\History"
        history_db = os.path.join(data_path, 'History')
        print(history_db)

        # Query the database
        c = sqlite3.connect(history_db)
        cursor = c.cursor()
        select_statement = "SELECT urls.url, urls.visit_count FROM urls, visits WHERE urls.id = visits.url;"
        cursor.execute(select_statement)

        results = cursor.fetchall()  # list of (url, visit_count) tuples

        sites_count = {}  # dict makes iterations easier :D

        # The join returns one row per visit, so each row adds 1 to its site
        for url, count in results:
            url = parse(url)
            if url in sites_count:
                sites_count[url] += 1
            else:
                sites_count[url] = 1

        sites_count_sorted = OrderedDict(sorted(sites_count.items(), key=operator.itemgetter(1), reverse=True))

        analyze(sites_count_sorted)

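    How it works: locate the SQLite database file in which the browser stores its browsing history, then read the data with code and process and aggregate it.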
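The principle can be sketched with the standard library alone. The version below works on a temporary copy of the file, since the browser may keep the original locked while it is running, and it also reads per-visit times. It assumes the Chromium schema, in which `visits.visit_time` is microseconds since 1601-01-01 UTC and `visits.visit_duration` is in microseconds; whether a given 360 browser build uses exactly these columns is an assumption to check. `read_visits` and `chromium_time` are hypothetical helper names.

```python
import shutil
import sqlite3
import tempfile
from datetime import datetime, timedelta
from pathlib import Path

CHROMIUM_EPOCH = datetime(1601, 1, 1)


def chromium_time(microseconds):
    """Convert a Chromium timestamp (microseconds since 1601-01-01 UTC)."""
    return CHROMIUM_EPOCH + timedelta(microseconds=microseconds)


def read_visits(history_path):
    """Return (url, visit time, seconds on page) rows from a copy of the file."""
    with tempfile.TemporaryDirectory() as tmp:
        copy_path = Path(tmp) / "History"
        shutil.copy2(history_path, copy_path)  # avoid touching the live database
        conn = sqlite3.connect(str(copy_path))
        try:
            rows = conn.execute(
                "SELECT urls.url, visits.visit_time, visits.visit_duration "
                "FROM visits JOIN urls ON urls.id = visits.url "
                "ORDER BY visits.visit_time"
            ).fetchall()
        finally:
            conn.close()
    # visit_duration may be NULL; treat that as zero seconds
    return [(url, chromium_time(t), (d or 0) / 1_000_000) for url, t, d in rows]
```

If the duration column is populated, summing it per site gives a direct estimate of time spent on each site, which is closer to the stated requirement than a visit count.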

    # -*- coding: utf-8 -*-


    # You can already start writing code on top of this

    # Count visits recorded in the browser history
    # se://version/ shows where the browser stores its files
    # matplotlib is the package used to display the data. It is mostly used for scientific computing, so you will probably use it a lot and it is worth learning well. numpy is another such package.
    #

    import os  # access to the operating system
    import sqlite3  # connects to SQLite database files
    import operator
    from collections import OrderedDict
    import matplotlib.pyplot as plt
    import re

    def parse(url):
        try:
            parsed_url_components = url.split('//')
            sublevel_split = parsed_url_components[1].split('/', 1)
            domain = sublevel_split[0].replace("www.", "")
            return domain
        except IndexError:
            print('URL format error!')
    def filter_data(url):
        # Return "name.suffix" for hosts with a listed suffix; otherwise record the host and return "ok"
        try:
            parsed_url_components = url.split('//')
            sublevel_split = parsed_url_components[1].split('/', 1)
            data = re.search(r'\w+\.(com|cn|net|tw|la|io|org|cc|info|cm|us|tv|club|co|in)', sublevel_split[0])
            if data:
                return data.group()
            else:
                yuming_count.add(sublevel_split[0])
                return "ok"
        except IndexError:
            print('URL format error!')

    def analyze(results):
        prompt = input("[.] Type <c> to print or <p> to plot\n[>] ")

        if prompt == "c":
            with open('./history.txt', 'w') as f:
                for site, count in results.items():
                    f.write(str(site) + '\t' + str(count) + '\n')
        elif prompt == "p":
            key = []
            value = []
            for k, v in results.items():
                key.append(k)
                value.append(v)
            n = min(25, len(value))
            X = range(n)
            Y = value[:n]
            plt.bar(X, Y, align='edge')
            plt.xticks(X, key[:n], rotation=45)
            for x, y in zip(X, Y):
                plt.text(x + 0.4, y + 0.05, y, ha='center', va='bottom')
            plt.show()
        else:
            print("[.] Uh?")
            quit()
    def analyze2(results):
        print("I can tell at a glance that you want a line chart")
        key = []
        value = []
        for k, v in results.items():
            key.append(k)
            value.append(v)
        n = 20
        X = key[:n]
        Y = value[:n]

        # Counts on the x axis, site names on the y axis
        plt.plot(Y, X, label="number count")
        plt.xticks(rotation=45)
        plt.xlabel('numbers')
        plt.ylabel('webname')
        plt.title('number count')
        plt.show()

    if __name__ == '__main__':
        # Get it running first
        # This reads the 360 browser's history.

        history_db = r"C:\Users\Administrator\Desktop\History"
        # files=os.listdir(data_path)

        # history_db = os.path.join(data_path, 'History')
        # print(history_db)

        # Query the database
        c = sqlite3.connect(history_db)
        cursor = c.cursor()
        select_statement = "SELECT urls.url, urls.visit_count FROM urls, visits WHERE urls.id = visits.url;"
        cursor.execute(select_statement)

        results = cursor.fetchall()  # list of (url, visit_count) tuples

        sites_count = {}  # dict makes iterations easier :D
        yuming_count = set()  # empty set that collects hosts matching none of the listed suffixes
        for url, count in results:
            url = filter_data(url)
            if url in sites_count:
                sites_count[url] += 1
            else:
                sites_count[url] = 1
        print(yuming_count)
        # print(sites_count)
        sites_count.pop("ok", None)  # drop the placeholder for unmatched hosts
        sites_count.pop(None, None)  # drop malformed URLs
        sites_count_sorted = OrderedDict(sorted(sites_count.items(), key=operator.itemgetter(1), reverse=True))
        #
        # # analyze(sites_count_sorted)
        analyze2(sites_count_sorted)
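The `parse` and `filter_data` helpers above split URLs by hand. As an alternative sketch, the standard library's `urllib.parse.urlparse` and `collections.Counter` can do the same grouping; `host_of`, `count_hosts`, and `sample_urls` are hypothetical names, and unlike the original `replace("www.", "")`, this version strips `www.` only when it is a prefix.

```python
from collections import Counter
from urllib.parse import urlparse


def host_of(url):
    """Return the host without a leading 'www.', or None if the URL has none."""
    host = urlparse(url).hostname
    if not host:
        return None
    return host[4:] if host.startswith("www.") else host


def count_hosts(urls):
    """Count visits per host, skipping URLs that have no host part."""
    hosts = (host_of(u) for u in urls)
    return Counter(h for h in hosts if h)


# Made-up URLs for illustration only.
sample_urls = [
    "https://www.baidu.com/s?wd=python",
    "http://news.sina.com.cn/",
    "https://www.baidu.com/",
    "about:blank",
]
print(count_hosts(sample_urls).most_common())
```

`Counter.most_common()` already returns the pairs sorted by count, so the `OrderedDict(sorted(...))` step would not be needed with this approach.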

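    Using Python to listen on the network interface the browser uses, and viewing in real time the sites the user visits and the data exchanged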


     

    https://blog.csdn.net/xuanhun521/article/details/51779292 

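    Python Hacker Programming 3: Network Data Sniffing and Filtering

    The lab environment for the course is as follows:

    •      Operating system: Windows 7

    •      IDE: PyCharm

    •      Python version: 3.6.4

    •      Main Python modules involved: pypcap, dpkt, scapy, scapy-http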
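To give a feel for the filtering step, here is a rough sketch that extracts the request method, host, and path from the raw bytes of an unencrypted HTTP request. `http_request_target` and `sniff_http` are hypothetical names. The capture function assumes scapy is installed and the script has capture privileges, and it only sees plain HTTP on port 80, since HTTPS payloads are encrypted.

```python
def http_request_target(payload):
    """Return (method, host, path) for a plain-HTTP request payload, else None."""
    head = payload.split(b"\r\n\r\n", 1)[0]
    lines = head.split(b"\r\n")
    parts = lines[0].split(b" ")
    if len(parts) != 3 or not parts[2].startswith(b"HTTP/"):
        return None  # not the first segment of an HTTP request
    host = None
    for line in lines[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"host":
            host = value.strip().decode("ascii", "replace")
            break
    method = parts[0].decode("ascii", "replace")
    path = parts[1].decode("ascii", "replace")
    return method, host, path


def sniff_http(count=10):
    """Print request targets seen on TCP port 80 for `count` packets."""
    from scapy.all import Raw, sniff  # lazy import: optional dependency

    def show(packet):
        if packet.haslayer(Raw):
            target = http_request_target(bytes(packet[Raw].load))
            if target:
                print(target)

    sniff(filter="tcp port 80", prn=show, store=False, count=count)
```

The parsing function can be tried on its own, for example with `http_request_target(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")`.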


    https://blog.csdn.net/sinat_22659313/article/details/53420492

    Analyzing Internet User Behavior in the Big Data Era





  • Original article: https://www.cnblogs.com/Mengchangxin/p/9487278.html