Running Jupyter on Slurm
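In all of the example job scripts below, replace supervisor in the SBATCH options with the name of the billing account associated with your user account. Adjust the other SBATCH options to suit your actual needs.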
First, create a Slurm job script.
vi jupyterLab.sh
Running Jupyter on a CPU node
Paste the following text into the job script, then save the file.
#!/bin/bash
#SBATCH --job-name=jupyter
#SBATCH --partition=hpxg
#SBATCH --ntasks-per-node=1
#SBATCH --time=06:00:00
#SBATCH --output=./jupyter.log
hostip=`ip a|grep 10.255.255.255|awk {'print $2'}|awk -F/ {'print $1'}` # Do not delete or modify this line.
echo "ssh -L8888:${hostip}:8888 ${USER}@swarm.whu.edu.cn" > ./jupyter.log # Do not delete or modify this line.
module load jupyter # Load the system-provided Jupyter. To use your own Jupyter installation, comment out or delete this line.
jupyter lab --ip=${hostip} --port=8888 # Start the JupyterLab service on port 8888 of the compute node.
Running Jupyter on a GPU node
Paste the following text into the job script, then save the file.
#!/bin/bash
#SBATCH --account=supervisor
#SBATCH --job-name=jupyter
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --time=06:00:00
#SBATCH --output=./jupyter.log
hostip=`ip a|grep 10.255.255.255|awk {'print $2'}|awk -F/ {'print $1'}` # Do not delete or modify this line.
echo "ssh -L8888:${hostip}:8888 ${USER}@swarm.whu.edu.cn" > ./jupyter.log # Do not delete or modify this line.
module load jupyter # Load the system-provided Jupyter. To use your own Jupyter installation, comment out or delete this line.
jupyter lab --ip=${hostip} --port=8888 # Start the JupyterLab service on port 8888 of the compute node.
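The hostip line in the job scripts picks the compute node's address out of the `ip a` output. The sketch below runs the same grep/awk pipeline on a made-up excerpt of `ip a` output so the extraction can be followed step by step. The interface names and the second address are assumptions for illustration only; the real output on a compute node will differ.

```shell
#!/bin/bash
# Hypothetical excerpt of `ip a` output (illustrative, not captured from a real node).
sample_ip_output='2: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 state UP
    inet 10.34.0.12/8 brd 10.255.255.255 scope global ib0
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP
    inet 192.168.1.20/24 brd 192.168.1.255 scope global eth0'

# Same pipeline as in the job script, reading from stdin instead of `ip a`:
#   1. grep keeps the line whose broadcast address is 10.255.255.255
#   2. the first awk prints the second field (address/prefix)
#   3. the second awk splits on "/" and prints the address part
extract_hostip() {
    grep 10.255.255.255 | awk '{print $2}' | awk -F/ '{print $1}'
}

hostip=$(printf '%s\n' "$sample_ip_output" | extract_hostip)
echo "$hostip"   # prints 10.34.0.12
```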
Submit this job script to Slurm:
sbatch jupyterLab.sh
Now check whether your job is running:
squeue -u $USER
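If you would rather not rerun squeue by hand, the check can be wrapped in a small polling loop. This is a minimal sketch, assuming the job keeps the name jupyter set by --job-name in the scripts above. The function name wait_for_jupyter_job and its default interval and attempt count are hypothetical choices, not part of the cluster's tooling.

```shell
#!/bin/bash
# wait_for_jupyter_job [INTERVAL_SECONDS] [MAX_ATTEMPTS]
# Poll squeue until a job named "jupyter" owned by the current user is RUNNING.
# Returns 0 once the job is running, 1 if it has not started after MAX_ATTEMPTS polls.
wait_for_jupyter_job() {
    local interval="${1:-10}" attempts="${2:-60}" state i
    for ((i = 0; i < attempts; i++)); do
        # -h drops the header, -n filters by job name, -o %T prints the job state.
        state=$(squeue -h -u "${USER}" -n jupyter -o %T | head -n 1)
        if [ "${state}" = "RUNNING" ]; then
            echo "job is running"
            return 0
        fi
        sleep "${interval}"
    done
    echo "job did not start in time" >&2
    return 1
}
```

Calling wait_for_jupyter_job 10 60 on the login node polls every 10 seconds for up to 10 minutes.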
Once you have confirmed that the job has started, check the jupyter.log output to find the information needed to create the SSH tunnel.
[user@swarm ~]$ cat ./jupyter.log
If the job started successfully, the log file looks like the following. Only the first and last lines matter:
ssh -L8888:10.34.0.12:8888 xinming@swarm.whu.edu.cn
[I 2023-04-19 15:50:35.298 ServerApp] jupyterlab | extension was successfully linked.
[I 2023-04-19 15:50:35.305 ServerApp] nbclassic | extension was successfully linked.
[I 2023-04-19 15:50:38.073 ServerApp] notebook_shim | extension was successfully linked.
[I 2023-04-19 15:50:38.074 ServerApp] panel.io.jupyter_server_extension | extension was successfully linked.
[I 2023-04-19 15:50:38.260 ServerApp] notebook_shim | extension was successfully loaded.
[I 2023-04-19 15:50:38.261 LabApp] JupyterLab extension loaded from /software/conda/anaconda3/2023.03/lib/python3.10/site-packages/jupyterlab
[I 2023-04-19 15:50:38.262 LabApp] JupyterLab application directory is /software/conda/anaconda3/2023.03/share/jupyter/lab
[I 2023-04-19 15:50:38.270 ServerApp] jupyterlab | extension was successfully loaded.
[I 2023-04-19 15:50:38.313 ServerApp] nbclassic | extension was successfully loaded.
[I 2023-04-19 15:50:38.314 ServerApp] panel.io.jupyter_server_extension | extension was successfully loaded.
[I 2023-04-19 15:50:38.317 ServerApp] Serving notebooks from local directory: /home/xinming
[I 2023-04-19 15:50:38.317 ServerApp] Jupyter Server 1.23.4 is running at:
[I 2023-04-19 15:50:38.317 ServerApp] http://10.34.0.12:8888/lab?token=5ab3861accf57e32f1351ab895cb456c30a20cf9cd10f86c
[I 2023-04-19 15:50:38.317 ServerApp] or http://127.0.0.1:8888/lab?token=5ab3861accf57e32f1351ab895cb456c30a20cf9cd10f86c
[I 2023-04-19 15:50:38.317 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 2023-04-19 15:50:38.438 ServerApp]
To access the server, open this file in a browser:
file:///home/xinming/.local/share/jupyter/runtime/jpserver-17370-open.html
Or copy and paste one of these URLs:
http://10.34.0.12:8888/lab?token=5ab3861accf57e32f1351ab895cb456c30a20cf9cd10f86c
or http://127.0.0.1:8888/lab?token=5ab3861accf57e32f1351ab895cb456c30a20cf9cd10f86c
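Since only the first and last lines of the log are needed, they can be pulled out directly instead of being read off the full output. The following is a minimal sketch: the helper names tunnel_command and local_url are hypothetical, and the sample file reuses a few lines of the log shown above so the block can run anywhere.

```shell
#!/bin/bash
# Print the SSH tunnel command, which the job script writes as the first log line.
tunnel_command() {
    head -n 1 "${1:-./jupyter.log}"
}

# Print the last 127.0.0.1 URL (including the token) found in the log.
local_url() {
    grep -o 'http://127\.0\.0\.1:[0-9]*/lab?token=[0-9a-f]*' "${1:-./jupyter.log}" | tail -n 1
}

# Demo on a temporary file holding an excerpt of the log shown above.
log=$(mktemp)
cat > "$log" <<'EOF'
ssh -L8888:10.34.0.12:8888 xinming@swarm.whu.edu.cn
[I 2023-04-19 15:50:38.317 ServerApp] http://10.34.0.12:8888/lab?token=5ab3861accf57e32f1351ab895cb456c30a20cf9cd10f86c
[I 2023-04-19 15:50:38.317 ServerApp] or http://127.0.0.1:8888/lab?token=5ab3861accf57e32f1351ab895cb456c30a20cf9cd10f86c
http://10.34.0.12:8888/lab?token=5ab3861accf57e32f1351ab895cb456c30a20cf9cd10f86c
or http://127.0.0.1:8888/lab?token=5ab3861accf57e32f1351ab895cb456c30a20cf9cd10f86c
EOF
tunnel_command "$log"   # prints the ssh -L8888:... command
local_url "$log"        # prints the http://127.0.0.1:8888/lab?token=... URL
rm -f "$log"
```

On the cluster, running tunnel_command and local_url without arguments reads ./jupyter.log in the current directory.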
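On your local computer, open a new terminal window, then copy and run the command from the first line of ./jupyter.log to create an SSH tunnel. On Windows, run it in cmd (Command Prompt); on Linux or Mac, run it in a terminal.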
ssh -L8888:10.34.0.12:8888 xinming@swarm.whu.edu.cn # After entering your password and logging in, do not close this terminal window.
Then open a browser on your local computer and visit the URL from the last line of ./jupyter.log.
http://127.0.0.1:8888/lab?token=5ab3861accf57e32f1351ab895cb456c30a20cf9cd10f86c
If the log reports that port 8888 is already in use, cancel the job and change port 8888 in the job script to another port (any port above 1024).
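The port number appears several times in the job script (twice in the ssh -L line and once in --port), and all occurrences have to change together so that the tunnel and the URL still match. A minimal sketch of doing this with sed follows. The helper name change_port is hypothetical, and the upper bound of 65535 is the usual TCP port limit rather than something stated above.

```shell
#!/bin/bash
# change_port SCRIPT OLD NEW
# Replace every occurrence of the string OLD with NEW in SCRIPT, keeping a
# backup copy as SCRIPT.bak. NEW must be a number between 1025 and 65535.
# Note: this is a plain text substitution, so it also rewrites OLD wherever
# else it appears in the file (for example in comments).
change_port() {
    local script="$1" old="$2" new="$3"
    case "$new" in
        ''|*[!0-9]*) echo "port must be a number" >&2; return 1 ;;
    esac
    if [ "$new" -le 1024 ] || [ "$new" -gt 65535 ]; then
        echo "port must be between 1025 and 65535" >&2
        return 1
    fi
    sed -i.bak "s/${old}/${new}/g" "$script"
}
```

A typical sequence would be to cancel the job with scancel and the job ID reported by squeue, run change_port jupyterLab.sh 8888 8899 (8899 is only an example), and then resubmit with sbatch jupyterLab.sh.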