http://bluehua.org/page/2
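Scraping install statistics from ITC and Android Market with PhantomJS
September 14th, 2011
A colleague was using Python to scrape the pages of the major Android markets and analyze app download counts, and found that Android Market loads its data via Ajax, apparently encrypted as well, so the pages can't be parsed directly. Later I came across PhantomJS, a command-line WebKit that renders web pages from the command line. With it, no matter how the data is loaded or encrypted, those tricks are powerless against a standards-compliant browser. It works on the same principle as the command-line web screenshot tool I introduced earlier: it embeds a Qt4 WebKit and renders to an X server virtualized by Xvfb. The difference is that this one exposes a JS API, which makes it convenient for crawlers, site monitoring, and server-side screenshots.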
Usage:
Step 1: install PhantomJS
Mac OS & Windows:
Just download the .dmg or .exe installer: http://code.google.com/p/phantomjs/downloads/list
On Mac, once installation is finished, the executable is at: /Applications/phantomjs.app/Contents/MacOS/phantomjs
Ubuntu:
sudo add-apt-repository ppa:jerome-etienne/neoip
sudo apt-get update
sudo apt-get install phantomjs
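CentOS 5.3:
This is where the hassle begins. The Linux version of PhantomJS is implemented with PyQt4, so installing it is fairly troublesome.
First we need Qt 4.7, while yum installs 4.1 by default.
rpm -ivh http://software.freivald.com/centos/software.freivald.com-1.0.0-1.noarch.rpm
yum update fontconfig fontconfig-devel
yum install qt4 qt4-devel
# if qt4 is already installed, run this instead
yum update qt4 qt4-devel
Install Xvfb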
yum install xorg-x11-server-Xvfb xorg-x11-server-Xorg xorg-x11-fonts*
Install Python 2.7; the bundled version is 2.4, which doesn't work
wget http://python.org/ftp/python/2.7.2/Python-2.7.2.tar.bz2
tar jxvf Python-2.7.2.tar.bz2
cd Python-2.7.2
./configure --prefix=/opt/python27
make
make install
cd ..
Install setuptools
wget http://pypi.python.org/packages/source/s/setuptools/setuptools-0.6c11.tar.gz#md5=7df2a529a074f613b509fb44feefe74e
tar zxvf setuptools-0.6c11.tar.gz
cd setuptools-0.6c11
/opt/python27/bin/python setup.py install
cd ..
Install SIP
wget http://www.riverbankcomputing.com/static/Downloads/sip4/sip-4.12.4.tar.gz
tar zxvf sip-4.12.4.tar.gz
cd sip-4.12.4
/opt/python27/bin/python configure.py
make
make install
cd ..
Install PyQt4
wget http://www.riverbankcomputing.com/static/Downloads/PyQt4/PyQt-x11-gpl-4.8.5.tar.gz
tar zxvf PyQt-x11-gpl-4.8.5.tar.gz
cd PyQt-x11-gpl-4.8.5
/opt/python27/bin/python configure.py -q /usr/lib/qt4/bin/qmake
# on 64-bit systems:
#/opt/python27/bin/python configure.py -q /usr/lib64/qt4/bin/qmake
make
make install
cd ..
Finally, install pyphantomjs
mkdir pyphantomjs
cd pyphantomjs
wget http://phantomjs.googlecode.com/files/pyphantomjs-1.2.0-source.zip
unzip pyphantomjs-1.2.0-source.zip
/opt/python27/bin/python setup.py install
After all that, pyphantomjs is installed at /opt/python27/bin/pyphantomjs
Running /opt/python27/bin/pyphantomjs --help directly produces an error:
sip.setapi('QString', 2)
ValueError: API 'QString' has already been set to version 1
Fix
Edit /opt/python27/bin/pyphantomjs and add a few lines near the top:
#!/opt/python27/bin/python
# EASY-INSTALL-ENTRY-SCRIPT: 'PyPhantomJS==1.2.0','console_scripts','pyphantomjs'
#fix start
import sip
sip.setapi('QString', 2)
sip.setapi('QVariant', 2)
#fix end
__requires__ = 'PyPhantomJS==1.2.0'
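If the same edit has to be repeated on several machines, it could be scripted. The following is only a sketch under assumptions: the helper name `patch_entry_script` is made up for illustration, and it assumes the entry script starts with a shebang and comment lines followed by the `__requires__` line, as shown above. It inserts the fix lines right after the leading comments, so that `sip.setapi` runs before anything else gets a chance to import PyQt4.

```python
# The five lines from the manual fix above.
FIX_LINES = [
    "#fix start",
    "import sip",
    "sip.setapi('QString', 2)",
    "sip.setapi('QVariant', 2)",
    "#fix end",
]


def patch_entry_script(text):
    """Insert the sip.setapi fix after the leading comment lines (hypothetical helper)."""
    lines = text.splitlines()
    if "#fix start" in lines:
        return text  # already patched, leave untouched
    # Skip the shebang and any leading comment lines.
    index = 0
    while index < len(lines) and lines[index].startswith("#"):
        index += 1
    patched = lines[:index] + FIX_LINES + lines[index:]
    return "\n".join(patched) + "\n"


original = (
    "#!/opt/python27/bin/python\n"
    "# EASY-INSTALL-ENTRY-SCRIPT: 'PyPhantomJS==1.2.0','console_scripts','pyphantomjs'\n"
    "__requires__ = 'PyPhantomJS==1.2.0'\n"
)
print(patch_entry_script(original))
```

To apply it for real, read /opt/python27/bin/pyphantomjs, pass its contents through the function, and write the result back (after keeping a backup copy).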
Step 2:
Download the JS scripts we use to parse the data: android_itc_daliy_report
Edit the variables inside:
TIMEOUT = 120;//script execution timeout
ACCOUNT = '';//login account
PASSWORD = '';//password
Step 3: run the scraping scripts
On Mac OS:
#scrape the total install count from Android Market
/Applications/phantomjs.app/Contents/MacOS/phantomjs --load-images=no AndroidMarketDailyReport.js
#scrape ITC's daily install count; a date must be specified, and it must be one of the dates offered by the date picker on the web page
/Applications/phantomjs.app/Contents/MacOS/phantomjs --load-images=no ITCDailyReport.js 09/06/2011
On CentOS:
#first make sure Xvfb is running
Xvfb :0 -screen 0 1024x768x24 &
#scrape the install count from Android Market
DISPLAY=:0 /opt/python27/bin/pyphantomjs --load-images=no --ignore-ssl-errors=yes AndroidMarketDailyReport.js
#scrape ITC's daily install count; a date must be specified, and it must be one of the dates offered by the date picker on the web page
DISPLAY=:0 /opt/python27/bin/pyphantomjs --load-images=no --ignore-ssl-errors=yes ITCDailyReport.js 09/06/2011
Getting the output, using Mac OS as an example:
/Applications/phantomjs.app/Contents/MacOS/phantomjs --load-images=no ITCDailyReport.js 09/06/2011|grep REPORT
REPORT: soft_name 0000
REPORT: soft_name 0000
/Applications/phantomjs.app/Contents/MacOS/phantomjs --load-images=no AndroidMarketDailyReport.js |grep REPORT
REPORT: total 0000
REPORT: real 0000
If there is no output, something went wrong: a bad account, a timeout, and so on.
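Since the scripts report through `REPORT:` lines, a small wrapper can collect them instead of `grep`. This is a rough sketch, not part of the original scripts: the function names `parse_report` and `run_report`, the 180-second outer timeout, and the sample numbers are all assumptions for illustration. It also assumes each line has the shape `REPORT: <name> <integer>`, as in the output above.

```python
import subprocess


def parse_report(output):
    """Return (name, count) pairs from lines shaped like 'REPORT: <name> <count>'."""
    rows = []
    for line in output.splitlines():
        if not line.startswith("REPORT:"):
            continue
        # Split from the right so a name containing spaces stays intact.
        name, _, count = line[len("REPORT:"):].strip().rpartition(" ")
        if name and count.isdigit():
            rows.append((name.strip(), int(count)))
    return rows


def run_report(command, timeout=180):
    """Run a scraper command (list of arguments) and parse its stdout."""
    result = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    rows = parse_report(result.stdout)
    if not rows:
        # Mirrors the rule above: no REPORT output means something went wrong.
        raise RuntimeError("no REPORT lines: bad account, timeout, or similar")
    return rows


# Made-up sample output, only to show the parsing.
sample = "loading...\nREPORT: total 1234\nREPORT: real 1200\n"
print(parse_report(sample))  # [('total', 1234), ('real', 1200)]
```

On a Mac, `run_report` would be called with the same arguments as the command lines above, for example `["/Applications/phantomjs.app/Contents/MacOS/phantomjs", "--load-images=no", "AndroidMarketDailyReport.js"]`.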
Key | Mac | Windows | Linux | Notes |
rbKeyUp | 126 | 26 | 103 | |
rbKeyDown | 125 | 28 | 108 | |
rbKeyLeft | 123 | 25 | 105 | |
rbKeyRight | 124 | 27 | 106 | |
rbKeyBackspace | 117 | 8 | 14 | |
rbKeyEnter | 76 | * | 28 | |
rbKeyHome | 115 | 36 | 102 | |
rbKeyEnd | 119 | 35 | 107 | |
rbKeyPageDown | 121 | 34 | 109 | |
rbKeyPageUp | 116 | 33 | 104 | |
rbKeyReturn | 36 | 13 | * | |
rbKeyDelete | 51 | 46 | 111 | |
rbKeyTab | 48 | 9 | 15 | |
rbKeySpacebar | 49 | 20 | 57 | |
rbKeyShift | 56 | 10 | * | |
rbKeyControl | 59 | 11 | * | |
rbKeyMenu | 58 | 18 | 139 | The Alt key |
rbKeyPrintScreen | * | 42 | 210 | |
rbKeyEscape | 53 | 27 | 1 | |
rbKeyCapsLock | 57 | 20 | 58 | |
rbKeyHelp | 114 | 47 | 138 | |
rbKeyF1 | 122 | 112 | 59 | |
rbKeyF2 | 120 | 113 | 60 | |
rbKeyF3 | 99 | 114 | 61 | |
rbKeyF4 | 118 | 115 | 62 | |
rbKeyF5 | 96 | 116 | 63 | |
rbKeyF6 | 97 | 117 | 64 | |
rbKeyF7 | 98 | 118 | 65 | |
rbKeyF8 | 100 | 119 | 66 | |
rbKeyF9 | 101 | 120 | 67 | |
rbKeyF10 | 109 | 121 | 68 | |
rbKeyF11 | 103 | 122 | 87 | |
rbKeyF12 | 111 | 123 | 88 | |
rbKeyMacFn | 63 | * | * | |
rbKeyMacOption | 58 | * | * | |
rbKeyMacCommand | 55 | * | * | |
rbKeyWinLeftWindow | * | 91 | * | On “Natural” keyboards |
rbKeyWinRightWindow | * | 92 | * | On “Natural” keyboards |
rbKeyWinApplication | 110 | 93 | * | On “Natural” keyboards |
rbKeyQ | 12 | 81 | 16 | |
rbKeyW | 13 | 87 | 17 | |
rbKeyE | 14 | 69 | 18 | |
rbKeyR | 15 | 82 | 19 | |
rbKeyT | 17 | 84 | 20 | |
rbKeyY | 16 | 89 | 21 | |
rbKeyU | 32 | 85 | 22 | |
rbKeyI | 34 | 73 | 23 | |
rbKeyO | 31 | 79 | 24 | |
rbKeyP | 35 | 80 | 25 | |
rbKeyA | * | 65 | 30 | |
rbKeyS | 1 | 83 | 31 | |
rbKeyD | 2 | 68 | 32 | |
rbKeyF | 3 | 70 | 33 | |
rbKeyG | 5 | 71 | 34 | |
rbKeyH | 4 | 72 | 35 | |
rbKeyJ | 38 | 74 | 36 | |
rbKeyK | 40 | 75 | 37 | |
rbKeyL | 37 | 76 | 38 | |
rbKeyZ | 6 | 90 | 44 | |
rbKeyX | 7 | 88 | 45 | |
rbKeyC | 8 | 67 | 46 | |
rbKeyV | 9 | 86 | 47 | |
rbKeyB | 11 | 66 | 48 | |
rbKeyN | 45 | 78 | 49 | |
rbKeyM | 46 | 77 | 50 | |
rbKey0 | 29 | 48 | 11 | |
rbKey1 | 18 | 49 | 2 | |
rbKey2 | 19 | 50 | 3 | |
rbKey3 | 20 | 51 | 4 | |
rbKey4 | 21 | 52 | 5 | |
rbKey5 | 23 | 53 | 6 | |
rbKey6 | 22 | 54 | 7 | |
rbKey7 | 26 | 55 | 8 | |
rbKey8 | 28 | 56 | 9 | |
rbKey9 | 25 | 57 | 10 | |
rbKeyPeriod | 47 | 190 | 52 | |
rbKeyComma | 43 | 188 | 51 | |
rbKeySlash | 44 | 191 | 53 | The key with /? generally next to right shift key. |
rbKeyNum0 | 82 | 96 | 82 | On numeric keypad or with NumLock |
rbKeyNum1 | 83 | 97 | 79 | On numeric keypad or with NumLock |
rbKeyNum2 | 84 | 98 | 80 | On numeric keypad or with NumLock |
rbKeyNum3 | 85 | 99 | 81 | On numeric keypad or with NumLock |
rbKeyNum4 | 86 | 100 | 75 | On numeric keypad or with NumLock |
rbKeyNum5 | 87 | 101 | 76 | On numeric keypad or with NumLock |
rbKeyNum6 | 88 | 102 | 77 | On numeric keypad or with NumLock |
rbKeyNum7 | 89 | 103 | 71 | On numeric keypad or with NumLock |
rbKeyNum8 | 91 | 104 | 72 | On numeric keypad or with NumLock |
rbKeyNum9 | 92 | 105 | 73 | On numeric keypad or with NumLock |
rbKeyMultiply | 67 | 106 | 55 | On numeric keypad or with NumLock |
rbKeyAdd | 69 | 107 | 78 | On numeric keypad or with NumLock |
rbKeySubtract | 78 | 109 | 74 | On numeric keypad or with NumLock |
rbKeyDivide | 75 | 111 | 98 | On numeric keypad or with NumLock |
rbKeyDecimal | 65 | 110 | 83 | On numeric keypad or with NumLock |
rbKeyNumEqual | 81 | * | 117 | On numeric keypad or with NumLock |
Lua&ios dev
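August 13th, 2011
Another set of slides shared with my colleagues, covering just the basics of Lua to satisfy everyone's curiosity. As far as Lua itself goes, the basics are all I can offer. Use the right language in the right place, nothing more.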
Hello lua
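June 23rd, 2011
Lua is the next new language I'm going to learn. By the standard of learning one new language a year, I'm being extra diligent. Lua was created to be embedded in C or C++ programs, giving them a flexibility that compiled languages can't match.
Googling Lua, I found that most of the articles were written by C++ programmers doing game development. Apparently World of Warcraft isn't the only one using it.
Advantages of Lua for embedding:
1. Small: the whole interpreter is under 200K
Measured:
An empty iOS project, compiled: 205K
The same iOS project with Lua embedded: 414K
2. Fast
Measured:
I built three test programs, embedding Lua (statically linked), JavaScript (dynamically linked against JavaScriptCore, a 6.4M library), and Python (dynamically linked against Python 2.6, a 2.0M library)
All three programs do the same thing: initialize a script runtime, print a string, and tear it down
Here is the Lua one; the JS and Python versions are similar and only differ in the APIs they call: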
static int lua_printf(lua_State *L) {
    const char *cmsg = luaL_checkstring(L, 1);
    printf("%s\n", cmsg);
    return 0;
}

static void eval_lua(NSString *code) {
    lua_State *L;
    L = lua_open();
    luaopen_base(L);
    lua_register(L, "printf", lua_printf);
    luaL_loadstring(L, [code cStringUsingEncoding:NSUTF8StringEncoding]);
    lua_pcall(L, 0, LUA_MULTRET, 0);
    lua_close(L);
}

int main (int argc, const char * argv[]) {
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    NSTimeInterval start = [NSDate timeIntervalSinceReferenceDate];
    eval_lua(@"printf(\"你好 lua\")");
    NSTimeInterval duration = [NSDate timeIntervalSinceReferenceDate] - start;
    printf("total time:%.5f\n", duration);
    [pool release];
    return 0;
}
Head-to-head results
$ ls -lh *_test
-rwxr-xr-x 1 hualu staff 10K 6 23 20:32 js_test
-rwxr-xr-x 1 hualu staff 136K 6 23 20:37 lua_test
-rwxr-xr-x 1 hualu staff 9.4K 6 23 20:32 py_test
$ ./lua_test
你好 lua
total time:0.00018
$ ./js_test
你好 Javascript
total time:0.00557
$ ./py_test
你好 Python
total time:0.01311
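3. Lua supports multithreading, and each thread can be given its own independent interpreter (I haven't tested this myself, it's hearsay)
4. Simple syntax. This really does count as an advantage: it is far simpler and easier to understand than JS...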
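To put the measurements from items 1 and 2 in relative terms, here is a bit of arithmetic on the figures quoted above. The variable names are my own, and each timing comes from a single run, so the ratios are only a rough indication.

```python
# Figures copied from the measurements above.
sizes_kb = {"empty": 205, "with_lua": 414}
startup_s = {"lua_test": 0.00018, "js_test": 0.00557, "py_test": 0.01311}

# Size added to the iOS project by embedding Lua.
lua_overhead_kb = sizes_kb["with_lua"] - sizes_kb["empty"]

# How many times longer each test took compared with the Lua one.
slowdown = {name: t / startup_s["lua_test"] for name, t in startup_s.items()}

print(lua_overhead_kb)  # 209
for name, ratio in slowdown.items():
    print(f"{name}: {ratio:.1f}x")
# lua_test: 1.0x
# js_test: 30.9x
# py_test: 72.8x
```

So in this test the JavaScriptCore version took roughly 31 times as long as the Lua version and the Python version roughly 73 times as long, while embedding Lua added about 209K to the project.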
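With such a small footprint, Lua is a great fit for embedding in mobile apps. Lua scripts can be loaded dynamically from a server and run, sparing users the trouble of updating the app. As far as I know, Angry Birds is a good example of mixed-language programming: its level setup is controlled entirely by Lua.
In fact, geeks have already built Objective-C bridges for all three of the languages above. The projects are:
For Lua: wax
For JS: jscocoa
For Python: PyObjC
Their goal is no longer just embedding in Objective-C, but replacing Objective-C as the development language for Mac or iOS apps... Of course, I don't really approve of this lazy approach; scripting should be used in moderation.
Finally, some material I read:
Lua 5.1 Reference Manual
Adding a static library in Xcode
Lua or Python: which is better suited for embedding in an MMORPG?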
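Two notes on three20's TTURLRequest
June 20th, 2011
1. Even when requests are sent asynchronously, they are not concurrent; they are queued in TTURLRequestQueue and completed in order.
2. TTURLRequestQueue is suspended while a view is scrolling or while views are being switched with a transition effect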
- (void)scrollViewWillBeginDragging:(UIScrollView *)scrollView {
    // Suspend requests when the user starts dragging to scroll.
    // This is why TTImageView never loads while you are dragging.
    [TTURLRequestQueue mainQueue].suspended = YES;

    [_controller didBeginDragging];

    if ([scrollView isKindOfClass:[TTTableView class]]) {
        TTTableView* tableView = (TTTableView*)scrollView;
        tableView.highlightedLabel.highlightedNode = nil;
        tableView.highlightedLabel = nil;
    }
}
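The two behaviours can be pictured with a toy model. This is not three20's implementation, just a sketch in Python: the class `SerialRequestQueue`, its `send` method, and the `completed` list are invented for illustration, the handlers run synchronously instead of doing network I/O, and resuming the queue once scrolling stops is an assumption on my part.

```python
from collections import deque


class SerialRequestQueue:
    """Toy model: requests run one at a time, in order, unless suspended."""

    def __init__(self):
        self._pending = deque()
        self._suspended = False
        self.completed = []

    @property
    def suspended(self):
        return self._suspended

    @suspended.setter
    def suspended(self, value):
        self._suspended = bool(value)
        if not self._suspended:
            self._drain()  # resuming flushes whatever queued up meanwhile

    def send(self, name, handler):
        """Queue a request; it runs immediately only if the queue is not suspended."""
        self._pending.append((name, handler))
        self._drain()

    def _drain(self):
        # Strictly first-in, first-out: one request at a time.
        while self._pending and not self._suspended:
            name, handler = self._pending.popleft()
            handler()
            self.completed.append(name)


queue = SerialRequestQueue()
queue.send("avatar.png", lambda: None)
queue.suspended = True            # dragging begins
queue.send("photo1.png", lambda: None)
queue.send("photo2.png", lambda: None)
print(queue.completed)            # ['avatar.png']
queue.suspended = False           # assumed: scrolling has stopped
print(queue.completed)            # ['avatar.png', 'photo1.png', 'photo2.png']
```

Requests sent while the queue is suspended simply wait, which matches images not appearing until the drag is over.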