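Submit the job, paying attention to the resource settings. The Slurm script below requests 1 CPU, 4 GB of memory, and a 5-minute time limit, so adjust these to your analysis.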
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time 00:05:00
#SBATCH --job-name jupyter-notebook
#SBATCH --output jupyter-notebook-%J.log

# get tunneling info
XDG_RUNTIME_DIR=""
node=$(hostname -s)
user=$(whoami)
cluster="tigercpu"
port=8889

# print tunneling instructions jupyter-log
echo -e "
Command to create ssh tunnel:
ssh -N -f -L ${port}:${node}:${port} ${user}@${cluster}.princeton.edu

Use a Browser on your local machine to go to:
localhost:${port} (prefix w/ https:// if using password)
"

# load modules or conda environments here
module load anaconda3

# Run Jupyter
jupyter-lab --no-browser --port=${port} --ip=${node}
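The job script writes the exact tunnel command into its log, so one way to avoid copying the node name by hand is to pull that line back out of the log. The following is only a sketch: the helper name `tunnel_cmd_from_log` is made up for illustration, and it assumes the log contains the `ssh -N -f -L ...` line printed by the script above.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script):
# print the first "ssh -N -f -L ..." line found in a job log,
# with any leading whitespace removed.
tunnel_cmd_from_log() {
    local logfile="$1"
    grep -m 1 -E '^[[:space:]]*ssh -N -f -L ' "$logfile" | sed -E 's/^[[:space:]]+//'
}
```

After submitting the script with `sbatch` and waiting for the job to start, call `tunnel_cmd_from_log` on the log file named by the `--output` pattern, then run the printed command on your local machine.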
Forward the port on your local machine (replace the node name with the one printed in the job log):
ssh -N -f -L 8889:tiger-h26c2n22:8889 <yourusername>@tigercpu.princeton.edu
Then open localhost:8889 in your browser and start working.
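Some analyses are resource-intensive and produce result files of a gigabyte or more, so copying the results to a local machine for processing is no longer practical.
That means processing the data with Python or R on the HPC itself. My earlier attempts to use Jupyter there never worked, and trying to turn vim into an R IDE proved even harder: the configuration is complicated and the barrier to using it is high.
Today I happened to search for running Jupyter on PBS and actually found the correct setup. The core idea is an SSH mapping: a local port listens and forwards to the remote port. As long as the machines are on the same local network, you can connect by address and hostname, and the traffic travels over the SSH protocol. The relevant options from the ssh manual page: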
-N Do not execute a remote command. This is useful for just forwarding ports.
-f Requests ssh to go to background just before command execution. This is useful if ssh is going to ask for passwords or passphrases, but the user wants it in the background. This implies -n. The recommended way to start X11 programs at a remote site is with something like ssh -f host xterm. If the ExitOnForwardFailure configuration option is set to “yes”, then a client started with -f will wait for all remote port forwards to be successfully established before placing itself in the background.
-L local_socket:remote_socket Specifies that connections to the given TCP port or Unix socket on the local (client) host are to be forwarded to the given host and port, or Unix socket, on the remote side. This works by allocating a socket to listen to either a TCP port on the local side, optionally bound to the specified bind_address, or to a Unix socket. Whenever a connection is made to the local port or socket, the connection is forwarded over the secure channel, and a connection is made to either host port hostport, or the Unix socket remote_socket, from the remote machine. Port forwardings can also be specified in the configuration file. Only the superuser can forward privileged ports. IPv6 addresses can be specified by enclosing the address in square brackets. By default, the local port is bound in accordance with the GatewayPorts setting. However, an explicit bind_address may be used to bind the connection to a specific address. The bind_address of “localhost” indicates that the listening port be bound for local use only, while an empty address or ‘*’ indicates that the port should be available from all interfaces.
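To make the `-L` syntax above concrete, here is a minimal sketch of a shell helper that assembles the forwarding command. The function name `build_tunnel_cmd` and the local port 9999 are assumptions chosen for illustration. It uses the `bind_address:port:host:hostport` form with a `localhost` bind address, so the listening port is for local use only, and it shows that the local port does not have to match the remote one.

```shell
#!/bin/bash
# Hypothetical helper: assemble an ssh local port-forwarding command.
# Arguments: local_port remote_host remote_port login (user@gateway)
build_tunnel_cmd() {
    local local_port="$1" remote_host="$2" remote_port="$3" login="$4"
    # "localhost" as bind address keeps the listening port private to this machine.
    printf 'ssh -N -f -L localhost:%s:%s:%s %s\n' \
        "$local_port" "$remote_host" "$remote_port" "$login"
}

# Example: local port 9999 forwards to port 8889 on the compute node.
build_tunnel_cmd 9999 tiger-h26c2n22 8889 '<yourusername>@tigercpu.princeton.edu'
```

With this mapping the notebook would be reached at localhost:9999 rather than localhost:8889, which can help when 8889 is already taken on the local machine.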
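With this setup, most data analysis can be done on the HPC, which makes it realistic to keep all analysis work in one place.
Working on the HPC has another advantage: data management is more disciplined, and data is less likely to be lost.
Some basic configuration (in R):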
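# show the current library paths
.libPaths()
# add a personal library directory to the search path
.libPaths("/home/-/softwares/R_lib_361")
# install the R kernel and register it with Jupyter
install.packages('IRkernel')
IRkernel::installspec()
# install devtools without using lock directories
install.packages("devtools", dependencies=TRUE, INSTALL_opts = c('--no-lock'))
library(devtools)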
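After `IRkernel::installspec()` it can be worth checking that Jupyter actually sees the R kernel. The sketch below assumes the kernel was registered under IRkernel's default name `ir` and that `jupyter kernelspec list` prints one kernel name per line followed by its path. The helper name `has_kernel` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read `jupyter kernelspec list` output on stdin and
# succeed if a line starts with the given kernel name.
has_kernel() {
    local name="$1"
    grep -qE "^[[:space:]]*${name}[[:space:]]+"
}

# Usage sketch (requires jupyter on PATH, so it is left as a comment):
#   jupyter kernelspec list | has_kernel ir && echo "R kernel registered"
```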
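Add a table of contents to notebooks (these extensions apply to the classic Notebook interface, not JupyterLab):
pip install jupyter_contrib_nbextensions
jupyter contrib nbextension install --user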
References:
https://raw.githubusercontent.com/jalvesaq/Nvim-R/master/doc/Nvim-R.txt
https://gist.github.com/tgirke/7a7c197b443243937f68c422e5471899