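Estimated reading time: 15 minutes
Background: I stumbled on this site while searching for something else. It's a lot of fun, and every level touches on quite a few Python topics.
Python version: 3.0
Talk is cheap, show me the code.
Home page: http://www.pythonchallenge.com/
Warm-up: click to start the challenge, which takes you to the warm-up level.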
http://www.pythonchallenge.com/pc/def/0.html
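1. Following the hint, enter 238.html.
2. This gives a new hint: No... the 38 is a little bit above the 2...
3. Look at the picture again. The 38 is an exponent, so the answer is 2**38. Enter http://www.pythonchallenge.com/pc/def/274877906944.html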
```
bogon:~ hbai$ python
Python 2.7.6 (default, Sep 9 2014, 15:04:36)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 2**38
274877906944
>>>
```
4. Congratulations, you are now officially on Level 1.
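The same calculation as a tiny script, which builds the Level 1 URL instead of typing it by hand. The variable names are just for illustration.

```python
# Warm-up hint: the picture shows 2 with 38 written above it, i.e. 2 to the power 38.
exponent = 38
answer = 2 ** exponent

# Put the number into the page name to get the next level's URL
url = 'http://www.pythonchallenge.com/pc/def/{}.html'.format(answer)
print(url)
```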
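Level 1: http://www.pythonchallenge.com/pc/def/map.html
Hint on the page: What about making trans? The picture shows a translation pattern (K -> M, O -> Q, E -> G), which means every letter is shifted two places forward in the alphabet.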
```python
# coding=utf-8
# Page: http://www.pythonchallenge.com/pc/def/map.html
# In Python 2.7 you would need: from string import maketrans

# Named "text" rather than "str" so the built-in str type is not shadowed
text = ("g fmnc wms bgblr rpylqjyrc gr zw fylb. rfyrq ufyr amknsrcpq ypc dmp. "
        "bmgle gr gl zw fylb gq glcddgagclr ylb rfyr'q ufw rfgq rcvr gq qm jmle. "
        "sqgle qrpgle.kyicrpylq() gq pcamkkclbcb. lmu ynnjw ml rfc spj.")

# Attempt 1: replace only the three letters from the picture (K -> M, O -> Q, E -> G).
# The sentence is still unreadable.
print(text.replace('k', 'm').replace('o', 'q').replace('e', 'g'))

# Following the hint, build a translation table that shifts every letter by 2
intab = "abcdefghijklmnopqrstuvwxyz"
outtab = "cdefghijklmnopqrstuvwxyzab"
trantab = str.maketrans(intab, outtab)

print(text.translate(trantab))

# Apply the same table to the current page name "map" to get the next URL
print('http://www.pythonchallenge.com/pc/def/' + 'map'.translate(trantab) + '.html')
```
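Translating with str.maketrans and str.translate solves the level. Applying the same shift to "map" gives "ocr", which is the name of the next page.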
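The hand-typed outtab above can also be built from string.ascii_lowercase by slicing, so any shift amount works. This is a minimal sketch. The helper name make_shift_table is hypothetical, and it only shifts lowercase letters, which is all this level needs.

```python
import string


def make_shift_table(shift):
    """Translation table moving each lowercase letter `shift` places forward, wrapping z -> a."""
    shift %= 26
    lower = string.ascii_lowercase
    shifted = lower[shift:] + lower[:shift]  # for shift=2: "cdef...xyzab"
    return str.maketrans(lower, shifted)


table = make_shift_table(2)
print('map'.translate(table))               # the next page name
print('g fmnc wms bgblr'.translate(table))  # start of the encoded message
```

Characters not in the table (spaces, punctuation) pass through translate unchanged, which is why the message keeps its word breaks.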
Level 2: http://www.pythonchallenge.com/pc/def/ocr.html
Following the hint, view the page source:
```html
<!-- find rare characters in the mess below: -->
<!-- %%$@_$^__#)^)&!_+]!*@&^}@[@%]()%+$&[(_@%+%$*^@$^!+]!&_#)_*}{}}!}_]$[%}@[{_@#_^{* @##&{#
....
-->
```
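Goal: find the characters that occur least often.
I copied the string into a local file and ran the script below. It is slow, but it eventually produces the answer: equality.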
```python
def check_CharFrequence(text):
    # Keep every character that appears fewer than 5 times, in its original order.
    # My method is slow: text.count(c) rescans the whole string for every
    # character, so the total work is O(N^2).
    decode = []
    for c in text:
        if text.count(c) < 5:
            decode.append(c)
    print(''.join(decode))


# C2_info.txt holds the "mess" copied from the page source
with open('C2_info.txt') as f:
    check_CharFrequence(f.read())
```
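The slowness comes from calling count once per character. A sketch of a linear-time alternative is to count every character once with collections.Counter and then filter. The function name rare_chars is hypothetical, and the threshold of 5 simply mirrors the script above.

```python
from collections import Counter


def rare_chars(text, threshold=5):
    """Return the characters of text that occur fewer than `threshold` times, in order."""
    counts = Counter(text)  # a single pass counts every character
    return ''.join(c for c in text if counts[c] < threshold)


# Usage with the file from above:
# with open('C2_info.txt') as f:
#     print(rare_chars(f.read()))
```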
Further reading: the official solutions page is at http://www.pythonchallenge.com/pcc/def/equality.html
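Level 3: http://www.pythonchallenge.com/pc/def/equality.html
The hint: One small letter, surrounded by EXACTLY three big bodyguards on each of its sides.
My first regex was re.compile('[A-Z]{3}([a-z])[A-Z]{3}', re.S). I saved the page source locally and ran it, but the result was still wrong.
The problem is the word EXACTLY. [A-Z]{3} also matches the last three letters of a longer run of capitals, so a letter guarded by four or more capitals slips through. After checking the published solution, I changed the pattern to require a lowercase letter just outside each group of three capitals: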
```python
# coding=utf-8
import re

# Page: http://www.pythonchallenge.com/pc/def/equality.html
# Previous level's official solutions: http://www.pythonchallenge.com/pcc/def/equality.html
# The next page is http://www.pythonchallenge.com/pc/def/linkedlist.php

# Hint: One small letter, surrounded by EXACTLY three big bodyguards on each of its sides.

# A short excerpt of the page data, handy for trying out the pattern
sampleStr = ('kAewtloYgcFQaJNhHVGxXDiQmzjfcpYbzxlWrVcqsmUbCunkfxZWDZjUZMiGqhRRiUvGmYmvnJIHEmbT'
             'MUKLECKdCthezSYBpIElRnZugFAxDRtQPpyeCBgBfaRVvvguRXLvkAdLOeCKxsDUvBBCwdpMMWmuELeG'
             'ENihrpCLhujoBqPRDPvfzcwadMMMbkmkzCCzoTPfbRlzBqMblmxTxNniNoCufprWXxgHZpldkoLCrHJq'
             'vYuyJFCZtqXLhWiYzOXeglkzhVJIWmeUySGuFVmLTCyMshQtvZpPwuIbOHNoBauwvuJYCmqznOBgByPw')

# My first method is NOT correct: it also matches letters guarded by 4+ capitals
# pattern = re.compile('[A-Z]{3}([a-z])[A-Z]{3}', re.S)

# CORRECT: a lowercase letter on each outer side caps each guard at exactly 3
pattern = re.compile('[a-z][A-Z]{3}([a-z])[A-Z]{3}[a-z]')
# print(pattern.findall(sampleStr))

# C3_info.txt holds the text copied from the page source
with open('C3_info.txt') as f:
    codeList = pattern.findall(f.read())
print(''.join(codeList))
```
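To see the difference between the two patterns, it helps to run them on a few tiny hand-made strings. The sketch below also adds a third variant of my own, which is not from the published solution. It uses a negative lookbehind and lookahead to say "not next to another capital", so it also accepts a match at the very start or end of the string, where the [a-z] version has no lowercase letter to anchor on.

```python
import re

# First attempt: also matches a letter guarded by 4+ capitals
loose = re.compile('[A-Z]{3}([a-z])[A-Z]{3}')
# Corrected pattern: a lowercase letter must sit just outside each guard
strict = re.compile('[a-z][A-Z]{3}([a-z])[A-Z]{3}[a-z]')
# Lookaround variant: no capital directly before or after the guards (works at string edges)
lookaround = re.compile('(?<![A-Z])[A-Z]{3}([a-z])[A-Z]{3}(?![A-Z])')

samples = ['xABCeFGHy', 'xABCDeFGHy', 'ABCeFGH']
for s in samples:
    print(s, loose.findall(s), strict.findall(s), lookaround.findall(s))
```

On 'xABCDeFGHy' the loose pattern wrongly reports 'e', because it matches BCD inside the four-letter run, while both stricter patterns return nothing.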
Level 4: http://www.pythonchallenge.com/pc/def/linkedlist.php
As usual, viewing the page source reveals this hint:
```html
<!-- urllib may help. DON'T TRY ALL NOTHINGS, since it will never end. 400 times is more than enough. -->
<center>
<a href="linkedlist.php?nothing=12345"><img src="chainsaw.jpg" border="0"/></a>
```
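Try opening http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=12345
Interesting: each page gives you the next nothing value, so you have to follow the chain page by page until it ends. The hint says 400 requests are more than enough; my script below allows 300.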
```python
# coding=utf-8
# Page: http://www.pythonchallenge.com/pc/def/linkedlist.php
import random
import re
import time
from urllib import request

firstpage = 'http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing='
pattern = re.compile(r'and the next nothing is (\d+)')


def looppage(page, num):
    """Fetch page+num, print it, and return the next nothing (None if there isn't one)."""
    response = request.urlopen(page + str(num))
    html = response.read().decode('utf-8')
    print(html)
    target = pattern.findall(html)
    if not target:
        # No "next nothing" on this page: a special instruction or the final answer
        return None
    print(page + target[0])
    return target[0]


# The chain starts at nothing=12345; this run resumes from 82682
return_num = '82682'
for i in range(300):
    print('Index %s:' % i)
    return_num = looppage(firstpage, return_num)
    if return_num is None:
        break
    time.sleep(random.randint(5, 10))  # random pause between requests
```
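Pitfalls I ran into while crawling:
1. The site sometimes stopped responding, so I added a random delay between requests. (Rotating through anonymous proxies would be safer, but I haven't finished that code yet...)
2. One page tells you to divide the current number by two, so I had to enter that value by hand once and then resume the crawl.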
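Both pitfalls can be handled in code rather than by hand. The following is a sketch under stated assumptions, not the author's script. The helper names fetch, next_nothing and crawl are hypothetical. The retry count, timeout and delays are arbitrary. It also assumes the special page contains the phrase "divide by two", which I have not checked against the live site. fetch passes a timeout to urlopen and retries after a random pause. next_nothing computes the halved value itself and returns None whenever a human should look at the page.

```python
import random
import re
import time
from urllib import request

NEXT_RE = re.compile(r'and the next nothing is (\d+)')


def fetch(url, retries=3, timeout=10):
    """Fetch url as text; on a timeout or network error, pause randomly and retry."""
    for attempt in range(retries):
        try:
            with request.urlopen(url, timeout=timeout) as resp:
                return resp.read().decode('utf-8')
        except OSError:  # urllib.error.URLError and socket timeouts are OSError subclasses
            if attempt == retries - 1:
                raise  # give up after the last attempt
            time.sleep(random.uniform(5, 10))


def next_nothing(html, current):
    """Return the next nothing as a string, or None when the chain needs a human."""
    match = NEXT_RE.search(html)
    if match:
        return match.group(1)
    # Assumption: the special page contains the phrase "divide by two"
    if 'divide by two' in html.lower():
        return str(int(current) // 2)
    return None  # anything else, e.g. a final page that just names the next level


def crawl(start, max_steps=400):
    """Follow the chain from `start`; return the page text where it stops, or None."""
    base = 'http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing='
    current = start
    for _ in range(max_steps):
        html = fetch(base + current)
        print(current, '->', html)
        nxt = next_nothing(html, current)
        if nxt is None:
            return html  # stop here and inspect the page by hand
        current = nxt
        time.sleep(random.uniform(1, 3))  # stay polite between requests
    return None
```

Calling crawl('12345') would start from the link in the page source. Keeping next_nothing free of network calls makes the parsing logic easy to check on its own.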
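At the end of the chain you get the answer: peak.html. The last page has no next number, so my original script crashed with an IndexError right after printing it: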
```
Index 107:
and the next nothing is 52899
http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=52899
Index 108:
and the next nothing is 66831
http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=66831
Index 109:
peak.html
Traceback (most recent call last):
  File "/Users/hbai/PycharmProjects/interview/Py_study/pythonchallenge/C4.py", line 57, in <module>
    return_num = looppage(firstpage,return_num)
  File "/Users/hbai/PycharmProjects/interview/Py_study/pythonchallenge/C4.py", line 32, in looppage
    print(page + target[0])
IndexError: list index out of range

Process finished with exit code 1
```
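Level 5: http://www.pythonchallenge.com/pc/def/peak.html
Hints online say this level is solved with the pickle module. My first attempt didn't work; I'll come back to it when I have time.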
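Before coming back to this level, it may help to recall the basic pickle API. This sketch is only a round trip on arbitrary nested data and is not a solution to the level. When reading a pickle from disk in Python 3, open the file in binary mode ('rb') and use pickle.load. Also, only unpickle data you trust, because loading a pickle can execute code.

```python
import pickle

# Arbitrary nested data (lists of tuples), chosen only for illustration
data = [[(' ', 3), ('#', 2)], [('#', 5)]]

blob = pickle.dumps(data)      # serialize to a bytes object
restored = pickle.loads(blob)  # deserialize back into Python objects

print(type(blob).__name__, restored == data)
```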
To Be Continued...