  • Go and Colly Notes

    Colly is a fairly full-featured HTTP client and scraping tool for Go.

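    Installation

    GoLand is used as the development environment.

    GOROOT: Go is installed in /opt/go, so GOROOT points to /opt/go by default.

    GOPATH: under Settings->Go->GOPATH, set Global GOPATH to /home/milton/WorkGo

    GOPROXY: under Settings->Go->Go Modules, set Environment to GOPROXY=https://goproxy.cn

    In GoLand's built-in Terminal, run go env to check the environment variables and confirm the paths are correct, then install Colly with one of the following commands: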

    # v1
    go get -u github.com/gocolly/colly
    
    # v2
    go get -u github.com/gocolly/colly/v2
    

    Basic usage

    Add the imports (the example below also uses fmt)

    import (
    	"fmt"

    	"github.com/gocolly/colly/v2"
    )
    

    Calling it

    func main() {
    	// Instantiate default collector
    	c := colly.NewCollector(
    		// Visit only domains: hackerspaces.org, wiki.hackerspaces.org
    		colly.AllowedDomains("hackerspaces.org", "wiki.hackerspaces.org"),
    	)
    
    	// On every a element which has href attribute call callback
    	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
    		link := e.Attr("href")
    		// Print link
    		fmt.Printf("Link found: %q -> %s\n", e.Text, link)
    		// Visit link found on page
    		// Only those links are visited which are in AllowedDomains
    		c.Visit(e.Request.AbsoluteURL(link))
    	})
    
    	// Before making a request print "Visiting ..."
    	c.OnRequest(func(r *colly.Request) {
    		fmt.Println("Visiting", r.URL.String())
    	})
    
    	// Start scraping on https://hackerspaces.org
    	c.Visit("https://hackerspaces.org/")
    }
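
    The callback above resolves each href with AbsoluteURL and relies on AllowedDomains to drop links to other hosts. The following is a standard-library sketch of those two ideas, not Colly's actual implementation; the helper names resolve and allowed are made up for illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolve (hypothetical helper) turns a possibly relative href into an
// absolute URL, using the URL of the page it was found on as the base.
// It returns "" when either URL cannot be parsed.
func resolve(pageURL, href string) string {
	base, err := url.Parse(pageURL)
	if err != nil {
		return ""
	}
	ref, err := url.Parse(href)
	if err != nil {
		return ""
	}
	return base.ResolveReference(ref).String()
}

// allowed (hypothetical helper) reports whether rawURL's host is one of
// the listed domains.
func allowed(rawURL string, domains ...string) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	for _, d := range domains {
		if u.Hostname() == d {
			return true
		}
	}
	return false
}

func main() {
	page := "https://hackerspaces.org/wiki/"
	for _, href := range []string{"/about", "../events", "https://example.com/"} {
		abs := resolve(page, href)
		fmt.Println(abs, allowed(abs, "hackerspaces.org", "wiki.hackerspaces.org"))
	}
}
```

    Relative links are resolved against the page they appear on, so the same href can point to different URLs on different pages.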
    

      

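    Using a proxy pool

    See the example in the documentation at http://go-colly.org/docs/examples/proxy_switcher/  Two things need attention when following it

    1. When creating the collector, set AllowURLRevisit; otherwise a repeated visit to an already visited URL is skipped instead of being requested again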

    c := colly.NewCollector(colly.AllowURLRevisit())
    

    2. Keep-alive must also be disabled; otherwise, when the same site is visited repeatedly, GetProxy is called only once and the proxy pool is never rotated. See issues #392, #366 and #339

    c := colly.NewCollector(colly.AllowURLRevisit())
    
    c.WithTransport(&http.Transport{
    	DisableKeepAlives: true,
    })
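
    To make the rotation itself concrete, here is a minimal standard-library sketch of a round-robin proxy selector. It is not Colly's own switcher: roundRobin is a hypothetical helper and the proxy addresses are placeholders. The returned function has the signature of the Proxy field of http.Transport, so it can be set on the same transport that disables keep-alive.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"sync/atomic"
)

// roundRobin (hypothetical helper) returns a proxy-selection function that
// cycles through the given proxy URLs, advancing by one on every call.
func roundRobin(rawURLs ...string) (func(*http.Request) (*url.URL, error), error) {
	if len(rawURLs) == 0 {
		return nil, fmt.Errorf("no proxies given")
	}
	proxies := make([]*url.URL, 0, len(rawURLs))
	for _, raw := range rawURLs {
		u, err := url.Parse(raw)
		if err != nil {
			return nil, err
		}
		proxies = append(proxies, u)
	}
	var next uint32
	return func(*http.Request) (*url.URL, error) {
		// Atomic counter, so concurrent callers each get the next proxy.
		i := atomic.AddUint32(&next, 1) - 1
		return proxies[i%uint32(len(proxies))], nil
	}, nil
}

func main() {
	// Placeholder addresses, not real proxy servers.
	pick, err := roundRobin("http://127.0.0.1:8001", "http://127.0.0.1:8002")
	if err != nil {
		panic(err)
	}
	// Keep-alive is disabled so connections are not reused between requests.
	transport := &http.Transport{Proxy: pick, DisableKeepAlives: true}
	_ = transport // in a Colly program this would be passed to c.WithTransport

	for i := 0; i < 3; i++ {
		u, _ := pick(nil) // the sketch ignores the request argument
		fmt.Println(u)
	}
}
```

    Calling pick three times walks through the first proxy, the second, and then the first again.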
    

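    Goroutine synchronization in Go (the counterpart of locks in Java)

    Mutex

    To avoid race conditions and data races in a Go program, use a Mutex so that a resource can be accessed by only one goroutine at a time. Declare a sync.Mutex, for example as a global variable or a struct field (its zero value is ready to use, or create a pointer with &sync.Mutex{}), and call Lock() and Unlock() inside the methods to acquire and release it. Note the use of the defer keyword.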

    import (
    	"strconv"
    	"sync"
    )
    
    var myBalance = &balance{amount: 50.00, currency: "GBP"}
    
    type balance struct {
    	amount   float64
    	currency string
    	mu       sync.Mutex
    }
    
    func (b *balance) Add(i float64) {
    	b.mu.Lock()
    	b.amount += i
    	b.mu.Unlock()
    }
    
    func (b *balance) Display() string {
    	b.mu.Lock()
    	defer b.mu.Unlock()
    	return strconv.FormatFloat(b.amount, 'f', 2, 64) + " " + b.currency
    }
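
    A short sketch of the balance type under concurrent use, to show what the lock buys: 100 goroutines each add 1.00 and the result is deterministic. The depositAll helper and the use of sync.WaitGroup are additions for illustration only.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
)

type balance struct {
	amount   float64
	currency string
	mu       sync.Mutex
}

func (b *balance) Add(i float64) {
	b.mu.Lock()
	defer b.mu.Unlock() // released when Add returns, even on panic
	b.amount += i
}

func (b *balance) Display() string {
	b.mu.Lock()
	defer b.mu.Unlock()
	return strconv.FormatFloat(b.amount, 'f', 2, 64) + " " + b.currency
}

// depositAll (hypothetical helper) starts n goroutines that each add 1.00
// and waits until all of them have finished.
func depositAll(b *balance, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			b.Add(1.00)
		}()
	}
	wg.Wait()
}

func main() {
	b := &balance{amount: 50.00, currency: "GBP"}
	depositAll(b, 100)
	fmt.Println(b.Display()) // prints "150.00 GBP"
}
```

    Without the Lock/Unlock pair in Add, the concurrent b.amount += i would be a data race, which go run -race reports.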
    

    For a read-write lock use RWMutex, which adds RLock() and RUnlock() on top of Mutex. Lock() is still exclusive against everything else, but one RLock() does not exclude another RLock()

    import (
    	"strconv"
    	"sync"
    )
    
    var myBalance = &balance{amount: 50.00, currency: "GBP"}
    
    type balance struct {
    	amount   float64
    	currency string
    	mu       sync.RWMutex
    }
    
    func (b *balance) Add(i float64) {
    	b.mu.Lock()
    	b.amount += i
    	b.mu.Unlock()
    }
    
    func (b *balance) Display() string {
    	b.mu.RLock()
    	defer b.mu.RUnlock()
    	return strconv.FormatFloat(b.amount, 'f', 2, 64) + " " + b.currency
    }
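
    The claim that read locks do not exclude each other can be checked with a small sketch. Each goroutine takes RLock and then waits until all the others hold it too, so the function can only return if the read locks overlap. The name overlappingReaders is hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// overlappingReaders (hypothetical helper) starts n goroutines that each
// take the read lock and then wait until all n hold it at the same time.
// It returns the number of readers that got in. With a plain sync.Mutex
// in place of the RWMutex the same code would deadlock.
func overlappingReaders(n int) int {
	var mu sync.RWMutex
	var holding sync.WaitGroup  // reaches zero once every reader holds RLock
	var finished sync.WaitGroup // reaches zero once every reader has returned
	holding.Add(n)
	finished.Add(n)

	var countMu sync.Mutex
	count := 0

	for i := 0; i < n; i++ {
		go func() {
			defer finished.Done()
			mu.RLock()
			defer mu.RUnlock()

			countMu.Lock()
			count++
			countMu.Unlock()

			holding.Done()
			holding.Wait() // still holding the read lock while waiting
		}()
	}
	finished.Wait()
	return count
}

func main() {
	fmt.Println("readers holding RLock together:", overlappingReaders(5))
}
```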
    

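    Channel

    A channel can be used much like a Semaphore in Java: the capacity of a buffered channel limits how many goroutines work at the same time, and once the channel is full a goroutine that sends to it blocks. The example below uses an unbuffered channel to hand values from one goroutine to another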

    package main
    
    import (
    	"fmt"
    	"strconv"
    	"time"
    )
    
    func makeCakeAndSend(cs chan string) {
    	for i := 1; i <= 3; i++ {
    		cakeName := "Strawberry Cake " + strconv.Itoa(i)
    		fmt.Println("Making a cake and sending ...", cakeName)
    		cs <- cakeName // send a strawberry cake
    	}
    }
    
    func receiveCakeAndPack(cs chan string) {
    	for i := 1; i <= 3; i++ {
    		s := <-cs // get whatever cake is on the channel
    		fmt.Println("Packing received cake: ", s)
    	}
    }
    
    func main() {
    	cs := make(chan string)
    	go makeCakeAndSend(cs)
    	go receiveCakeAndPack(cs)
    
    	// sleep for a while so that the program doesn't exit immediately
    	time.Sleep(4 * time.Second)
    }
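
    Sleeping for a fixed time works for a demo but either wastes time or cuts the goroutines short. A possible variant, sketched here with renamed hypothetical functions (makeCakes, packCakes, run), closes the channel when the producer is done and waits with sync.WaitGroup instead.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
)

// makeCakes sends n cake names and then closes the channel.
func makeCakes(cs chan<- string, n int) {
	defer close(cs) // tells the receiver that no more cakes are coming
	for i := 1; i <= n; i++ {
		cs <- "Strawberry Cake " + strconv.Itoa(i)
	}
}

// packCakes receives until the channel is closed and returns what it got.
func packCakes(cs <-chan string) []string {
	var packed []string
	for cake := range cs { // the loop ends when cs is closed
		packed = append(packed, cake)
	}
	return packed
}

// run wires the producer and the consumer together and waits for both.
func run(n int) []string {
	cs := make(chan string)
	var packed []string
	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		makeCakes(cs, n)
	}()
	go func() {
		defer wg.Done()
		packed = packCakes(cs)
	}()
	wg.Wait() // returns as soon as both goroutines have finished
	return packed
}

func main() {
	for _, cake := range run(3) {
		fmt.Println("Packed:", cake)
	}
}
```

    Because there is a single sender and a single receiver, the cakes arrive in the order they were sent.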
    

    The capacity of a channel can be set when it is created

    c := make(chan Type, n)
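
    A minimal sketch of the semaphore use of a buffered channel described earlier: a goroutine sends to the channel before it starts working and receives from it when done, so at most as many goroutines as the capacity work at once. runLimited is a hypothetical helper, and the one-millisecond sleep only simulates work.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runLimited (hypothetical helper) starts `tasks` goroutines but lets at
// most `limit` of them work at the same time. It returns the highest number
// of simultaneously working goroutines that was observed.
func runLimited(tasks, limit int) int {
	sem := make(chan struct{}, limit) // buffered channel used as a semaphore
	var wg sync.WaitGroup
	var mu sync.Mutex
	running, peak := 0, 0

	for i := 0; i < tasks; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{}        // acquire: blocks while the channel is full
			defer func() { <-sem }() // release the slot when done

			mu.Lock()
			running++
			if running > peak {
				peak = running
			}
			mu.Unlock()

			time.Sleep(time.Millisecond) // simulated work

			mu.Lock()
			running--
			mu.Unlock()
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	fmt.Println("peak concurrency:", runLimited(20, 3)) // never more than 3
}
```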
    

      

  • Original post: https://www.cnblogs.com/milton/p/13093544.html