
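Using headless Chrome to fetch puzzles from the puzzle team club games

0. Which tool to use for scraping the site

For the previous two games, I read the puzzles off the screen and typed them in by hand, which made solving inconvenient. Puzzles such as the special daily battle are large, and entering them cell by cell takes a long time. Is there a way to grab a puzzle quickly and write it straight to a file? The answer is a crawler: web scraping.

However, the anti-scraping measures on the puzzle team club sites are thorough. The page source contains no puzzle data, and I could not work anything out from the captured network traffic (there were simply too many packets), so a headless browser was the only option.

I started with PhantomJS. The Python code that fetches the page source is: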

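from selenium import webdriver

def getChessByPhantomJS():
    # webdriver.PhantomJS is only available in older Selenium releases
    driver = webdriver.PhantomJS()
    driver.get('https://www.puzzle-dominosa.com/?size=8')
    source = driver.page_source
    driver.quit()
    return source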


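The result was disappointing: all I got back was a bare template page with no puzzle in it.

How does Chrome fare? (If you do not know how to set up headless Chrome, a quick search on Baidu will tell you.)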

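from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def getChessByChrome():
    path = r'D:chromedriver.exe'
    chrome_options = Options()
    # The next two lines are the usual boilerplate for headless mode
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-gpu')
    # executable_path / chrome_options are the Selenium 3-style keyword arguments
    driver = webdriver.Chrome(executable_path=path,chrome_options=chrome_options)
    try:
        driver.get('https://www.puzzle-dominosa.com/?size=8')
    except Exception as e:
        print(e)
    source = driver.page_source
    driver.quit()
    return source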


The result (or rather the run in progress, because the thing never exits):

DevTools listening on ws://127.0.0.1:62344/devtools/browser/8c9f8f4a-407a-4045-b41c-b9f898d4d37b
[1203/174652.884:INFO:CONSOLE(1)] "Uncaught TypeError: window.googletag.pubads is not a function", source: https://www.puzzle-dominosa.com/build/js/public/new/dominosa-95ac3646ef.js (1)


We can add a page-load timeout so that the program gives up and exits:

def getChessByChrome():
    path = r'D:chromedriver.exe'
    chrome_options = Options()
    # The next two lines are the usual boilerplate for headless mode
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-gpu')
    driver = webdriver.Chrome(executable_path=path,chrome_options=chrome_options)
    try:
        driver.set_page_load_timeout(30)
        driver.get('https://www.puzzle-dominosa.com/?size=8')
    except Exception as e:
        print(e)
    source = driver.page_source
    driver.quit()
    return source


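With that, the page source can be handed to a parsing function that outputs the puzzle.

1. How to get the dominosa puzzle

That is not the end of it, though: what we want is the puzzle itself, so we need to analyze the markup:

Figure 1. Puzzle markup of the dominosa game

See? The puzzle values show up directly in the class names: cell3 corresponds to a 3 in the puzzle. The cells are sibling elements, and the puzzle wraps to a new row once the number of siblings exceeds the row length.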
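To make the mapping concrete, here is a minimal sketch that turns a flat list of class attributes into rows. The sample class strings (`cell cell4` and so on) and the helper name `classes_to_rows` are assumptions for illustration; the real markup may carry additional class tokens.

```python
import re

def classes_to_rows(class_attrs, width):
    """Turn class strings such as 'cell cell3' into rows of `width` values."""
    values = []
    for attr in class_attrs:
        # Find the first token of the form cell<digits>; skip cells without one.
        match = re.search(r'\bcell(\d+)\b', attr)
        if match:
            values.append(int(match.group(1)))
    # Start a new row every `width` cells.
    return [values[i:i + width] for i in range(0, len(values), width)]

sample = ['cell cell4', 'cell cell5', 'cell cell2',
          'cell cell2', 'cell cell7', 'cell cell3']
print(classes_to_rows(sample, 3))  # [[4, 5, 2], [2, 7, 3]]
```

Matching on the token pattern rather than on a fixed position in the class string is a design choice of this sketch; it keeps working if the order of class tokens changes.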

The code can be written like this:

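from lxml import etree

def solve():
    source = getChessByChrome()
    htree = etree.HTML(source)
    # Number of cell elements on the board
    chessSize = len(htree.xpath('//div[@id="game"]/div/div/div/..'))
    # Use the puzzle ID from the <span> if present; otherwise fall back to
    # the paragraph text (the title of a special puzzle)
    puzzleId = htree.xpath('//div[@class="puzzleInfo"]/p/span/text()')
    if len(puzzleId) != 0:
        puzzleId = puzzleId[0]
    else:
        puzzleId = htree.xpath('//div[@class="puzzleInfo"]/p/text()')[0]
    # The board has x rows and x+1 columns, so chessSize == x * (x + 1)
    x = (round((4 * chessSize + 1)**0.5) - 1) // 2
    print(x)
    print(x+1)
    chess = ''
    for i,className in enumerate(htree.xpath('//div[@id="game"]/div/div/div/..')):
        # The second class token looks like 'cell3'; drop the 'cell' prefix
        value = className.xpath('./@class')[0].split(' ')[1][4:]
        if i % (x+1) == x:
            chess += value + '\n'
        else:
            chess += value + ' '
    # Drop the trailing newline before writing
    with open('dominosaChess' + puzzleId + '.txt','w') as f:f.write(chess[:-1])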


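This produces the puzzle file in the format required by the post "Solving the dominosa game with Dancing Links X".

Incidentally, to make puzzles easy to look up, the output file name includes the puzzle ID; for a special puzzle, it includes the special puzzle's title instead.

Here is a sample result to compare against the puzzle (file name dominosaChess7,092,762.txt):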

4 5 2 2 7 3 3 0 6
2 7 5 6 2 6 4 1 5
4 4 5 6 0 2 6 0 2
7 3 3 5 0 0 3 4 4
0 1 3 3 4 1 3 2 1
5 7 0 5 3 2 1 1 6
1 6 6 7 5 2 6 7 1
7 4 0 0 4 5 1 7 7

The corresponding puzzle screenshot:

Figure 2. The puzzle with ID 7,092,762
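As a check on the row and column arithmetic in solve(): the sample output above has 8 rows and 9 columns, so N = 72 cells, and solving x(x+1) = N gives x = (sqrt(4N+1) - 1)/2 = 8. The following is a minimal sketch of that calculation. It uses an integer square root to avoid floating-point rounding, and `grid_shape` is a hypothetical helper name, not part of the original script.

```python
import math

def grid_shape(cell_count):
    """Return (rows, cols) with cols == rows + 1 and rows * cols == cell_count."""
    root = math.isqrt(4 * cell_count + 1)
    rows = (root - 1) // 2
    if rows * (rows + 1) != cell_count:
        raise ValueError(f'{cell_count} cells do not form an x by (x+1) grid')
    return rows, rows + 1

print(grid_shape(72))  # (8, 9)
```

Raising an error on an unexpected cell count is an addition of this sketch; it surfaces a page that loaded only partially instead of silently writing a misaligned grid.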

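2. How to get the star battle puzzle

Getting a puzzle file in the format required by the post "Solving the star battle game with depth-first search (DFS)" takes a bit more work.

Take a look at the figure below:

Figure 3. Star battle puzzle markup

The class names in this markup each carry meaning: for example, bl means there is a dividing line on the left, and br means there is a dividing line on the right.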
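As an illustration, the border flags can be pulled out of a class attribute by splitting it into tokens, which avoids accidental substring matches. The sample strings below are assumptions modeled on the class names that appear in this post (`cell selectable`, `cell-off`, `bl`, `br`, `bt`, `bb`), and `border_flags` is a hypothetical helper.

```python
BORDER_TOKENS = {'bl', 'br', 'bt', 'bb'}

def border_flags(class_attr):
    """Return the set of border tokens (bl, br, bt, bb) in a class attribute."""
    return set(class_attr.split()) & BORDER_TOKENS

print(sorted(border_flags('cell selectable bl bt')))  # ['bl', 'bt']
print(sorted(border_flags('cell cell-off br')))       # ['br']
```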

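The markup only gives us the dividing lines, whereas what we need is a layout that marks which block each cell belongs to. To get that, we use BFS (breadth-first search).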

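from copy import deepcopy
from lxml import etree

def solve():
    # `url` is assumed to be a module-level variable holding the puzzle page URL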
    if url.find('size=') == -1:
        limit = 1
    else:
        size = url.split('size=')[1]
        size = int(size)
        if size >= 1 and size <= 4:
            limit = 1
        elif size <= 6:
            limit = 2
        elif size <= 8:
            limit = 3
        else:
            limit = size - 5
    source = getChessByFile()
    htree = etree.HTML(source)
    chessSize = len(htree.xpath('//div[@id="game"]/div/div'))
    puzzleId = htree.xpath('//div[@class="puzzleInfo"]/p/span/text()')
    if len(puzzleId) != 0:
        puzzleId = puzzleId[0]
    else:
        puzzleId = htree.xpath('//div[@class="puzzleInfo"]/p/text()')[0]
    chessSize = round(chessSize**0.5)
    chess = [[-1 for _ in range(chessSize)] for __ in range(chessSize)]
    borderss = [['' for _ in range(chessSize)] for __ in range(chessSize)]
    chessStr = ''
    maxBlockNumber = 0
    # br: border on the right; bl: on the left; bb: on the bottom; bt: on the top
    for i,className in enumerate(htree.xpath('//div[@id="game"]/div/div[contains(@class,"cell")]')):
        x = i // chessSize
        y = i % chessSize
        value = className.xpath('./@class')[0]
        if value[:4] != 'cell':
            continue
        value = value.replace('cell selectable','')
        value = value.replace('cell-off','')
        borderss[x][y] = value
    for i in range(chessSize):
        for j in range(chessSize):
            if chess[i][j] != -1:
                continue
            queue = [(i, j)]
            chess[i][j] = str(maxBlockNumber)
            while len(queue) > 0:
                oldQueue = deepcopy(queue)
                queue = []
                for pos in oldQueue:
                    x, y = pos[0], pos[1]
                    # up: allowed unless this cell has a top border
                    if x > 0 and borderss[x][y].find('bt') == -1 and chess[x-1][y] == -1:
                        queue.append((x-1, y))
                        chess[x-1][y] = chess[i][j]
                    # down: allowed unless this cell has a bottom border
                    if x < chessSize - 1 and borderss[x][y].find('bb') == -1 and chess[x+1][y] == -1:
                        queue.append((x+1, y))
                        chess[x+1][y] = chess[i][j]
                    # left: allowed unless this cell has a left border
                    if y > 0 and borderss[x][y].find('bl') == -1 and chess[x][y-1] == -1:
                        queue.append((x, y-1))
                        chess[x][y-1] = chess[i][j]
                    # right: allowed unless this cell has a right border
                    if y < chessSize - 1 and borderss[x][y].find('br') == -1 and chess[x][y+1] == -1:
                        queue.append((x, y+1))
                        chess[x][y+1] = chess[i][j]
                    # end of neighbor checks for this cell
            maxBlockNumber += 1
    chessStr = '\n'.join(' '.join(chessRow) for chessRow in chess)
    with open('starBattleChess' + puzzleId + '.txt','w') as f:f.write(str(limit)+'\n'+chessStr)


Here is a sample result to compare against the puzzle (file name starBattleChess3,876,706.txt):

The corresponding puzzle screenshot:

Figure 4. The puzzle with ID 3,876,706
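The region labeling in solve() can also be tried without a browser. The following is a minimal, self-contained sketch of the same breadth-first flood fill on a hand-written grid. It assumes each cell's borders are given as a set of flags and, like the script above, it checks only the current cell's own flags, so a dividing line is assumed to be marked on both cells it separates. The name `label_regions` and the sample grid are illustrative and not taken from the original script.

```python
from collections import deque

def label_regions(borders):
    """Label each cell of a square grid with the index of its region.

    borders[r][c] is a set of flags for that cell: 'bt', 'bb', 'bl', 'br'
    mean a dividing line on its top, bottom, left and right side.
    """
    size = len(borders)
    labels = [[-1] * size for _ in range(size)]
    # (row step, column step, flag on the current cell that blocks the move)
    moves = [(-1, 0, 'bt'), (1, 0, 'bb'), (0, -1, 'bl'), (0, 1, 'br')]
    next_label = 0
    for r0 in range(size):
        for c0 in range(size):
            if labels[r0][c0] != -1:
                continue  # already reached from an earlier start cell
            labels[r0][c0] = next_label
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for dr, dc, flag in moves:
                    nr, nc = r + dr, c + dc
                    inside = 0 <= nr < size and 0 <= nc < size
                    if inside and flag not in borders[r][c] and labels[nr][nc] == -1:
                        labels[nr][nc] = next_label
                        queue.append((nr, nc))
            next_label += 1
    return labels

# 2x2 grid split by a vertical line down the middle.
borders = [
    [{'br'}, {'bl'}],
    [{'br'}, {'bl'}],
]
for row in label_regions(borders):
    print(' '.join(map(str, row)))
```

Here a `deque` replaces the level-by-level list copy used in the script; the cells are visited in the same breadth-first order, and the resulting labels are the same.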


Original post: https://www.cnblogs.com/dgutfly/p/11978537.html