  • Using headless Chrome to fetch puzzle boards from the puzzle team club games

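    0. Which tool to use for scraping

      In the previous two posts I read each puzzle off the screen and typed it in by hand, which made solving inconvenient. Puzzles such as the special daily battle are large, and entering them cell by cell takes a long time. Is there a way to grab a puzzle quickly and write it straight to a file? Yes: web scraping.

      The catch is that puzzle team club's anti-scraping measures are very good. The raw page source contains no puzzle data, and I could not work the puzzle out from the captured network traffic (should I admit there were just too many packets?), so a headless browser was the only option.

      I started with PhantomJS. The Python code that fetches the page source is: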

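    from selenium import webdriver

    def getChessByPhantomJS():
        # webdriver.PhantomJS was removed in Selenium 4; this needs an older Selenium
        driver = webdriver.PhantomJS()
        driver.get('https://www.puzzle-dominosa.com/?size=8')
        source = driver.page_source
        driver.quit()
        return source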

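      But the result was disappointing: all it returned was a bare template page with no puzzle in it.

      How does Chrome do? (If you don't know how to set up headless Chrome, a quick Baidu search will help.)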

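    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    def getChessByChrome():
        # Path to chromedriver; adjust for your machine
        path = r'D:chromedriver.exe'
        chrome_options = Options()
        # The usual pair of flags for running Chrome headless
        chrome_options.add_argument('--headless')
        chrome_options.add_argument('--disable-gpu')
        # Selenium 3 style keyword arguments; newer Selenium versions use service= and options=
        driver = webdriver.Chrome(executable_path=path,chrome_options=chrome_options)
        try:
            driver.get('https://www.puzzle-dominosa.com/?size=8')
        except Exception as e:
            print(e)
        source = driver.page_source
        driver.quit()
        return source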

      Output (or rather, the run in progress, because the thing just never exits):

    DevTools listening on ws://127.0.0.1:62344/devtools/browser/8c9f8f4a-407a-4045-b41c-b9f898d4d37b
    [1203/174652.884:INFO:CONSOLE(1)] "Uncaught TypeError: window.googletag.pubads is not a function", source: https://www.puzzle-dominosa.com/build/js/public/new/dominosa-95ac3646ef.js (1)

      You can give the program a page-load timeout so that it exits:

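    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    def getChessByChrome():
        # Path to chromedriver; adjust for your machine
        path = r'D:chromedriver.exe'
        chrome_options = Options()
        # The usual pair of flags for running Chrome headless
        chrome_options.add_argument('--headless')
        chrome_options.add_argument('--disable-gpu')
        driver = webdriver.Chrome(executable_path=path,chrome_options=chrome_options)
        try:
            # Stop waiting for the page load after 30 seconds
            driver.set_page_load_timeout(30)
            driver.get('https://www.puzzle-dominosa.com/?size=8')
        except Exception as e:
            # A timeout lands here; whatever has rendered so far is still read below
            print(e)
        source = driver.page_source
        driver.quit()
        return source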

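      The page source can now be handed to a parsing function that writes out the puzzle.

    1. Getting the dominosa puzzle

      We are not done yet, though: what we want is the puzzle itself. For that we need to look at the page's markup:

     Figure 1. Markup of a dominosa puzzle

      See? The puzzle is encoded directly in the class names: cell3 stands for the number 3 in the puzzle. Sibling elements wrap to a new row once a row reaches the board's width.

      The code can be written like this: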
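      To make the mapping concrete, here is a minimal sketch of the decoding step on plain strings, with no browser or HTML parser involved. It assumes each cell's class attribute looks like `cell cell3`, which is inferred from the string slicing in the author's code rather than confirmed against the site. `decode_cells` and its parameters are hypothetical names.

```python
def decode_cells(class_attrs, columns):
    """Turn class attributes such as 'cell cell3' into rows of numbers.

    Assumption: the second class token is 'cell' followed by the value.
    """
    values = [attr.split(' ')[1][len('cell'):] for attr in class_attrs]
    # Start a new row every `columns` cells, mirroring how the page wraps.
    rows = [values[i:i + columns] for i in range(0, len(values), columns)]
    return '\n'.join(' '.join(row) for row in rows)


sample = ['cell cell0', 'cell cell1', 'cell cell1',
          'cell cell0', 'cell cell0', 'cell cell1']
print(decode_cells(sample, 3))
```

      With the six sample attributes and a width of 3 this prints two rows, `0 1 1` and `0 0 1`.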

    from lxml import etree

    def solve():
        source = getChessByChrome()
        htree = etree.HTML(source)
        # Board cells: the divs under #game that themselves contain a div
        cells = htree.xpath('//div[@id="game"]/div/div/div/..')
        chessSize = len(cells)
        puzzleId = htree.xpath('//div[@class="puzzleInfo"]/p/span/text()')
        if len(puzzleId) != 0:
            puzzleId = puzzleId[0]
        else:
            # Special puzzles have a title instead of a numeric ID
            puzzleId = htree.xpath('//div[@class="puzzleInfo"]/p/text()')[0]
        # The board has x rows and x+1 columns, so chessSize == x * (x + 1)
        x = (round((4 * chessSize + 1)**0.5) - 1) // 2
        print(x)
        print(x+1)
        chess = ''
        for i,className in enumerate(cells):
            # The second class token is e.g. 'cell3'; drop the 'cell' prefix
            value = className.xpath('./@class')[0].split(' ')[1][4:]
            if i % (x+1) == x:
                chess += value + '\n'
            else:
                chess += value + ' '
        with open('dominosaChess' + puzzleId + '.txt','w') as f:
            f.write(chess[:-1])

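      This produces the puzzle file expected by the solver in the earlier post, "Solving the dominosa game with Dancing Links X".

      Incidentally, to make puzzles easy to look up, the output file name includes the puzzle ID; for a special puzzle it includes the special puzzle's title instead.

      Sample output alongside the puzzle for comparison (file name dominosaChess7,092,762.txt):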
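      The expression `(round((4 * chessSize + 1)**0.5) - 1) // 2` in solve() recovers the board's dimensions from the cell count alone. A dominosa board using the numbers 0 to n has n+1 rows and n+2 columns, so the cell count has the form r(r+1), and solving that quadratic for r gives the expression. The sketch below shows the same calculation with an integer square root and a sanity check; `board_shape` is a hypothetical helper name, not part of the original script.

```python
import math


def board_shape(cell_count):
    """Return (rows, columns) for a board with rows * (rows + 1) cells."""
    # Positive root of rows**2 + rows - cell_count = 0.
    rows = (math.isqrt(4 * cell_count + 1) - 1) // 2
    if rows * (rows + 1) != cell_count:
        raise ValueError('cell count is not of the form r * (r + 1)')
    return rows, rows + 1


print(board_shape(72))
```

      For the 72-cell board in this section the result is 8 rows by 9 columns. The explicit check guards against a page that was only partly rendered when the source was read.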

    4 5 2 2 7 3 3 0 6
    2 7 5 6 2 6 4 1 5
    4 4 5 6 0 2 6 0 2
    7 3 3 5 0 0 3 4 4
    0 1 3 3 4 1 3 2 1
    5 7 0 5 3 2 1 1 6
    1 6 6 7 5 2 6 7 1
    7 4 0 0 4 5 1 7 7

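      Screenshot of the corresponding puzzle:

     Figure 2. The puzzle with ID 7,092,762

    2. Getting the star battle puzzle

      Producing the puzzle file expected by the earlier post, "Solving the star battle game with depth-first search (DFS)", takes a bit more work.

      Take a look at the figure below:

     Figure 3. Markup of a star battle puzzle

      The class names in this markup all carry meaning: for example, bl means there is a dividing line on the cell's left side, and br means there is one on its right.

      The page only gives us the dividing lines, whereas what we need is a layout that labels which block each cell belongs to. To get that we use BFS, breadth-first search.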

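    from copy import deepcopy
    from lxml import etree

    # url (the puzzle page address) and getChessByFile() are not shown here
    # and are assumed to be defined elsewhere in the script.
    # limit is the number written as the first line of the output file.
    def solve():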
        if url.find('size=') == -1:
            limit = 1
        else:
            size = url.split('size=')[1]
            size = int(size)
            if size >= 1 and size <= 4:
                limit = 1
            elif size <= 6:
                limit = 2
            elif size <= 8:
                limit = 3
            else:
                limit = size - 5
        source = getChessByFile()
        htree = etree.HTML(source)
        chessSize = len(htree.xpath('//div[@id="game"]/div/div'))
        puzzleId = htree.xpath('//div[@class="puzzleInfo"]/p/span/text()')
        if len(puzzleId) != 0:
            puzzleId = puzzleId[0]
        else:
            puzzleId = htree.xpath('//div[@class="puzzleInfo"]/p/text()')[0]
        chessSize = round(chessSize**0.5)
        chess = [[-1 for _ in range(chessSize)] for __ in range(chessSize)]
        borderss = [['' for _ in range(chessSize)] for __ in range(chessSize)]
        chessStr = ''
        maxBlockNumber = 0
        # br: line on the right; bl: on the left; bb: on the bottom; bt: on the top
        for i,className in enumerate(htree.xpath('//div[@id="game"]/div/div[contains(@class,"cell")]')):
            x = i // chessSize
            y = i % chessSize
            value = className.xpath('./@class')[0]
            if value[:4] != 'cell':
                continue
            value = value.replace('cell selectable','')
            value = value.replace('cell-off','')
            borderss[x][y] = value
        # Flood-fill every unlabelled cell; each fill is one block
        for i in range(chessSize):
            for j in range(chessSize):
                if chess[i][j] != -1:
                    continue
                queue = [(i, j)]
                chess[i][j] = str(maxBlockNumber)
                while len(queue) > 0:
                    oldQueue = deepcopy(queue)
                    queue = []
                    for pos in oldQueue:
                        x, y = pos[0], pos[1]
                        # up: allowed when this cell has no line on its top
                        if x > 0 and borderss[x][y].find('bt') == -1 and chess[x-1][y] == -1:
                            queue.append((x-1, y))
                            chess[x-1][y] = chess[i][j]
                        # down: allowed when this cell has no line on its bottom
                        if x < chessSize - 1 and borderss[x][y].find('bb') == -1 and chess[x+1][y] == -1:
                            queue.append((x+1, y))
                            chess[x+1][y] = chess[i][j]
                        # left: allowed when this cell has no line on its left
                        if y > 0 and borderss[x][y].find('bl') == -1 and chess[x][y-1] == -1:
                            queue.append((x, y-1))
                            chess[x][y-1] = chess[i][j]
                        # right: allowed when this cell has no line on its right
                        if y < chessSize - 1 and borderss[x][y].find('br') == -1 and chess[x][y+1] == -1:
                            queue.append((x, y+1))
                            chess[x][y+1] = chess[i][j]
                maxBlockNumber += 1
        chessStr = '\n'.join(' '.join(chessRow) for chessRow in chess)
        with open('starBattleChess' + puzzleId + '.txt','w') as f:
            f.write(str(limit)+'\n'+chessStr)
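      The flood fill inside solve() can also be tried on its own, without a browser. The following is a minimal sketch that swaps the list-and-deepcopy queue for `collections.deque`. `label_regions` is a hypothetical name, and the input is assumed to be a square grid of strings holding the bt/bb/bl/br flags. Like the author's code, it only looks at the flags of the cell being left, so it assumes a dividing line is flagged on both cells it separates.

```python
from collections import deque


def label_regions(borders):
    """Flood-fill a square grid of border flags into region numbers.

    borders[r][c] is a string that may contain 'bt', 'bb', 'bl', 'br'
    (a dividing line on the top, bottom, left or right of the cell).
    """
    size = len(borders)
    regions = [[-1] * size for _ in range(size)]
    # (row delta, column delta, flag that blocks the move)
    moves = [(-1, 0, 'bt'), (1, 0, 'bb'), (0, -1, 'bl'), (0, 1, 'br')]
    next_label = 0
    for r in range(size):
        for c in range(size):
            if regions[r][c] != -1:
                continue  # already part of an earlier region
            regions[r][c] = next_label
            queue = deque([(r, c)])
            while queue:
                x, y = queue.popleft()
                for dx, dy, flag in moves:
                    nx, ny = x + dx, y + dy
                    inside = 0 <= nx < size and 0 <= ny < size
                    if inside and flag not in borders[x][y] and regions[nx][ny] == -1:
                        regions[nx][ny] = next_label
                        queue.append((nx, ny))
            next_label += 1
    return regions


# 2x2 grid split by a vertical line down the middle.
demo = [['br', 'bl'],
        ['br', 'bl']]
print(label_regions(demo))
```

      For the 2x2 demo grid the left column becomes region 0 and the right column region 1, so the call prints `[[0, 1], [0, 1]]`.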

      Sample output alongside the puzzle for comparison (file name starBattleChess3,876,706.txt):

    1
    0 0 1 1 2
    0 0 3 1 2
    0 0 3 4 4
    0 3 3 4 4
    0 3 3 4 4

      Screenshot of the corresponding puzzle:

     Figure 4. The puzzle with ID 3,876,706
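      The first line of the star battle file is the number that solve() derives from the `size` query parameter of the page URL. As a sketch, the same branches can be written with `urllib.parse` instead of string splitting. The thresholds are copied from the author's code and have not been checked against the site. `stars_per_line` is a hypothetical name, and `example.com` is a placeholder host.

```python
from urllib.parse import urlparse, parse_qs


def stars_per_line(url):
    """Mirror the size-to-limit branches used in solve()."""
    query = parse_qs(urlparse(url).query)
    if 'size' not in query:
        return 1  # no size parameter: same default as the original code
    size = int(query['size'][0])
    if 1 <= size <= 4:
        return 1
    if size <= 6:
        return 2
    if size <= 8:
        return 3
    return size - 5


print(stars_per_line('https://example.com/?size=5'))
```

      Parsing the query string properly means a URL with further parameters after `size` still works, whereas `url.split('size=')[1]` followed by `int()` would fail on it.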

  • Original post: https://www.cnblogs.com/dgutfly/p/11978537.html