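Starting with version 0.7.1, WebMagic uses a new proxy API, ProxyProvider. Whereas Site holds "configuration", ProxyProvider is positioned more as a "component", so the proxy is no longer set on Site but on HttpClientDownloader.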
API | Description |
---|---|
HttpClientDownloader.setProxyProvider(ProxyProvider proxyProvider) | Sets the proxy provider |
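ProxyProvider has one default implementation: SimpleProxyProvider. It is a simple round-robin ProxyProvider with no failure checking. You can configure any number of candidate proxies, and each request picks the next one in order. It is suited to scenarios where you run your own, relatively stable proxies.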
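The round-robin selection described above can be sketched in plain Java. This is only an illustration of the idea, not the source of SimpleProxyProvider: the class name RoundRobinPicker and its next() method are hypothetical, and candidates are represented as plain strings so the example has no WebMagic dependency.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of round-robin selection without failure checks.
// This is NOT WebMagic's SimpleProxyProvider; names here are assumptions.
final class RoundRobinPicker<T> {
    private final List<T> candidates;
    // Counter shared by all callers; incremented atomically so it is thread-safe.
    private final AtomicInteger pointer = new AtomicInteger(0);

    RoundRobinPicker(List<T> candidates) {
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("at least one candidate is required");
        }
        // Defensive copy so later changes to the caller's list have no effect.
        this.candidates = new ArrayList<>(candidates);
    }

    // Returns the next candidate in order, wrapping around at the end.
    // floorMod keeps the index non-negative even after the counter overflows.
    T next() {
        int index = Math.floorMod(pointer.getAndIncrement(), candidates.size());
        return candidates.get(index);
    }

    public static void main(String[] args) {
        RoundRobinPicker<String> picker = new RoundRobinPicker<>(
                Arrays.asList("101.101.101.101:8888", "102.102.102.102:8888"));
        // Four picks alternate between the two candidates.
        for (int i = 0; i < 4; i++) {
            System.out.println(picker.next());
        }
    }
}
```

Because nothing here tracks whether a candidate actually works, a dead proxy would keep being handed out on its turn, which is why this strategy fits a pool of stable, self-hosted proxies.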
Proxy examples:
- Set a single ordinary HTTP proxy at 101.101.101.101, port 8888, with the username "username" and the password "password"
HttpClientDownloader httpClientDownloader = new HttpClientDownloader();
httpClientDownloader.setProxyProvider(SimpleProxyProvider.from(new Proxy("101.101.101.101",8888,"username","password")));
spider.setDownloader(httpClientDownloader);
- Set up a proxy pool containing two IPs, 101.101.101.101 and 102.102.102.102, with no password
HttpClientDownloader httpClientDownloader = new HttpClientDownloader();
httpClientDownloader.setProxyProvider(SimpleProxyProvider.from(
new Proxy("101.101.101.101",8888)
,new Proxy("102.102.102.102",8888)));
If you have suggestions about the proxy feature, you are welcome to join the discussion in #579, "More ProxyProvider implementations".
package com.mwq.job.task;

import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;
import us.codecraft.webmagic.Page;
import us.codecraft.webmagic.Site;
import us.codecraft.webmagic.Spider;
import us.codecraft.webmagic.downloader.HttpClientDownloader;
import us.codecraft.webmagic.processor.PageProcessor;
import us.codecraft.webmagic.proxy.Proxy;
import us.codecraft.webmagic.proxy.SimpleProxyProvider;

@Component
public class ProxyTest implements PageProcessor {

    @Scheduled(fixedDelay = 1000)
    public void process() {
        // Create the downloader
        HttpClientDownloader httpClientDownloader = new HttpClientDownloader();
        // Give the downloader the proxy server details
        httpClientDownloader.setProxyProvider(SimpleProxyProvider.from(new Proxy("150.109.32.166", 80)));

        Spider.create(new ProxyTest())
                .addUrl("http://ip.chinaz.com/getip.aspx")
                .setDownloader(httpClientDownloader)
                .run();
    }

    @Override
    public void process(Page page) {
        // Print the response body, which shows the IP the target site sees
        System.out.println(page.getHtml().toString());
    }

    private Site site = Site.me();

    @Override
    public Site getSite() {
        return site;
    }
}
Two websites that offer free proxies:
Mimvp Proxy (米扑代理): https://proxy.mimvp.com/free.php
Xici Free Proxy (西刺免费代理): http://www.xicidaili.com/