最近想注册一个域名,使用万网尝试了很多域名,基本都已被注册。之前听说双拼域名很火,所以想写个脚本,看看哪些双拼域名还未被注册。
一、查询接口
网上搜索了一下,万网的域名查询接口比较简单易用,查询URL格式为: http://panda.www.net.cn/cgi-bin/check.cgi?area_domain=aaa.com
返回值及含义:
210 : Domain name is available
211 : Domain name is not available
212 : Domain name is invalid
214 : Unknown error
二、编程思路
1. DomainGenerator读取文件pinyin.txt,获取所有可用的拼音字母。遍历拼音字母, 组装成双拼域名。这个拼音列表是从网上搜索来的,可能会有纰漏。
2. 创建域名检测线程DomainRunner,每个线程采用httpclient调用万网的域名查询接口。
3. 每个线程调用DomainValidator检查返回结果。
4. 线程ResultRunner将可用域名写入domain.txt文件。
三、核心代码
DomainGenerator.java, 启动类,读取拼音列表,组装需要检测的域名,创建检测线程和结果处理线程。
- package com.learnworld;
-
- import java.util.List;
- import java.io.BufferedReader;
- import java.io.BufferedWriter;
- import java.io.FileReader;
- import java.io.FileWriter;
- import java.util.ArrayList;
- import java.util.concurrent.ArrayBlockingQueue;
- import java.util.concurrent.CountDownLatch;
- import java.util.concurrent.ExecutorService;
- import java.util.concurrent.Executors;
- import java.util.concurrent.LinkedBlockingQueue;
- import java.util.concurrent.atomic.AtomicInteger;
-
- import org.apache.http.impl.client.CloseableHttpClient;
- import org.apache.http.impl.client.HttpClients;
- import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
-
- public class DomainGenerator {
-
- public static void main(String[] args){
- // pinyin list, read from pinyin.txt
- List<String> items = new ArrayList<String>();
- // domain list, which need to check
- ArrayBlockingQueue<String> taskQueue = new ArrayBlockingQueue<String>(163620);
- // available domain list, which need to save into file
- LinkedBlockingQueue<String> resultQueue = new LinkedBlockingQueue<String>();
- // counter, need to count unavailable domain statistical information
- AtomicInteger count = new AtomicInteger(0);
-
- // Httpclient initialization
- PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
- cm.setMaxTotal(20);
- cm.setDefaultMaxPerRoute(20);
- CloseableHttpClient httpClient = HttpClients.custom().setConnectionManager(cm).build();
-
- try {
- // pinyin.txt, used to save all available pinyin
- BufferedReader reader = new BufferedReader(new FileReader("pinyin.txt"));
- // domain.txt, used to save all available domain result
- BufferedWriter writer = new BufferedWriter(new FileWriter("domain.txt"));
-
- String item = null;
- while((item = reader.readLine()) != null){
- items.add(item);
- }
-
- // generate domain list
- for (String item1 : items){
- for (String item2 : items) {
- taskQueue.offer(item1 + item2 + ".com");
- }
- }
-
- int domainThreadNum = 3;
- CountDownLatch downLatch = new CountDownLatch(domainThreadNum);
- ExecutorService executor = Executors.newFixedThreadPool(domainThreadNum + 1);
-
- // start domain check thread
- for(int i = 0; i < domainThreadNum; i++){
- executor.execute(new DomainRunner(taskQueue, resultQueue, downLatch, count, httpClient));
- }
-
- // start result handle thread
- executor.execute(new ResultRunner(resultQueue, writer));
-
- downLatch.await();
- System.out.println("All tasks are done!");
-
- // TODO, suggest use volatile flag to control ResultRunner
- executor.shutdownNow();
-
- reader.close();
- writer.close();
- httpClient.close();
- } catch (Exception e) {
- e.printStackTrace();
- }
-
- }
- }
DomainRunner:域名检测线程,从域名domainQueue中读取域名,调用接口进行检测。 如果域名可用,将结果放入resultQueue中等待写入文件。
- package com.learnworld;
-
- import java.io.IOException;
- import java.util.Calendar;
- import java.util.concurrent.ArrayBlockingQueue;
- import java.util.concurrent.CountDownLatch;
- import java.util.concurrent.LinkedBlockingQueue;
- import java.util.concurrent.atomic.AtomicInteger;
-
- import org.apache.http.HttpEntity;
- import org.apache.http.client.config.RequestConfig;
- import org.apache.http.client.methods.CloseableHttpResponse;
- import org.apache.http.client.methods.HttpGet;
- import org.apache.http.impl.client.CloseableHttpClient;
- import org.apache.http.protocol.BasicHttpContext;
- import org.apache.http.protocol.HttpContext;
- import org.apache.http.util.EntityUtils;
-
- public class DomainRunner implements Runnable {
-
- private ArrayBlockingQueue<String> domainQueue;
- private LinkedBlockingQueue<String> resultQueue;
- private CountDownLatch downLatch;
- private AtomicInteger count;
- private CloseableHttpClient httpClient;
-
- public DomainRunner(ArrayBlockingQueue<String> domainQueue,
- LinkedBlockingQueue<String> resultQueue, CountDownLatch downLatch,
- AtomicInteger count, CloseableHttpClient httpClient) {
- super();
- this.domainQueue = domainQueue;
- this.resultQueue = resultQueue;
- this.downLatch = downLatch;
- this.count = count;
- this.httpClient = httpClient;
- }
-
- @Override
- public void run() {
- String domain = null;
- while ((domain = domainQueue.poll()) != null) {
- boolean isDomainAvailable = false;
-
- RequestConfig requestConfig = RequestConfig.custom()
- .setSocketTimeout(5000)
- .setConnectTimeout(5000)
- .setConnectionRequestTimeout(5000)
- .build();
-
- HttpGet httpGet = new HttpGet("http://panda.www.net.cn/cgi-bin/check.cgi?area_domain=" + domain);
- httpGet.setConfig(requestConfig);
- httpGet.setHeader("Connection", "close");
- HttpContext context = new BasicHttpContext();
- CloseableHttpResponse response = null;
- try {
- response = httpClient.execute(httpGet, context);
- HttpEntity entity = response.getEntity();
- int status = response.getStatusLine().getStatusCode();
- if (status >= 200 && status < 300) {
- String resultXml = EntityUtils.toString(entity);
- isDomainAvailable = DomainValidator.isAvailableDomainForResponse(resultXml);
- EntityUtils.consumeQuietly(entity);
- } else {
- System.out.println(domain + " check error.");
- }
- } catch (Exception e) {
- e.printStackTrace();
- } finally {
- try {
- httpGet.releaseConnection();
- if (response != null) {
- response.close();
- }
-
- } catch (IOException e) {
- e.printStackTrace();
- }
- }
-
- // result handle
- if(isDomainAvailable) {
- resultQueue.offer(domain);
- } else {
- int totalInvalid = count.addAndGet(1);
- if (totalInvalid % 100 == 0) {
- System.out.println(totalInvalid + " " + Calendar.getInstance().getTime());
- }
- }
- }
-
- downLatch.countDown();
-
- }
-
- }
DomainValidator: 对万网返回结果进行检查,判断域名是否可用。
- package com.learnworld;
-
- public class DomainValidator {
-
- public static boolean isAvailableDomainForResponse(String responseXml){
- if(responseXml == null || responseXml.isEmpty()){
- return false;
- }
-
- if(responseXml.contains("<original>210")){
- return true;
- } else if(responseXml.contains("<original>211")
- || responseXml.contains("<original>212")
- || responseXml.contains("<original>214")){
- return false;
- } else {
- System.out.println("api callback error!");
- try {
- Thread.sleep(60000);
- } catch (InterruptedException e) {
- e.printStackTrace();
- }
-
- return false;
- }
- }
-
- }
ResultRunner: 结果处理线程,将可用域名写入文件domain.txt中。
- package com.learnworld;
-
- import java.io.BufferedWriter;
- import java.util.concurrent.LinkedBlockingQueue;
-
- public class ResultRunner implements Runnable{
-
- private LinkedBlockingQueue<String> resultQueue;
- BufferedWriter writer;
-
- public ResultRunner(LinkedBlockingQueue<String> resultQueue,
- BufferedWriter writer) {
- super();
- this.resultQueue = resultQueue;
- this.writer = writer;
- }
-
- @Override
- public void run() {
- String result = null;
- try {
- while ((result = resultQueue.take()) != null) {
- writer.write(result);
- writer.newLine();
- writer.flush();
- }
- } catch (Exception e) {
- e.printStackTrace();
- }
-
- }
-
- }
四、总结
1. 第一版程序采用单线程处理,性能很差,每100个域名大概需要90s左右,主要原因是网络IO延迟比较大。将代码修改为多线程处理,创建两个检测线程,每100个域名大概需要30s左右。
2. 提高检测线程数会加快处理性能,但建议不超过三个,原因有两个:
1) 万网采用了阿里云的过滤技术,如果一段时间内某个IP的请求数很高,就会将该IP加入屏蔽列表。 我开始采用了100个线程,不到1分钟就被屏蔽。
2)当请求数很高时,网络连接不能得到及时释放,很多TCP连接处于TIME_WAIT状态,进而出现BindException错误。
3. 我遍历了所有的双拼域名,目前约有1万个域名尚未被注册,结果见附件。我又遍历了四位及以下的纯英文字母域名,已经全部被注册。
需要注册双拼域名的童鞋要抓紧了~~