Next, open the SSH Keys page in your Alliance account: [https://ccdb.alliancecan.ca/ssh_authorized_keys](https://ccdb.alliancecan.ca/ssh_authorized_keys). Paste your key into the SSH Key field, give it a name (typically the host name of the computer where you generated it) and hit Add Key.
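
If you need to grab the key text to paste, printing the public half of your key pair is usually enough (the exact file name depends on the key type you generated, e.g. `id_ed25519.pub` or `id_rsa.pub`):

```
cat ~/.ssh/id_ed25519.pub    # print the public key so you can copy it into the SSH Key field
```
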
**NOTE:** You may need to wait up to 30 minutes after adding your SSH key before it will work for logging in via SSH. Have lunch and come back.

## SSH Access

In addition to your home directory, you have a minimum of an additional 250 GB of scratch space.

A detailed description of the scratch purging policy is available on the Alliance Canada website: [https://docs.alliancecan.ca/wiki/Scratch_purging_policy](https://docs.alliancecan.ca/wiki/Scratch_purging_policy)

Your scratch space directory will not exist when you initially log in. To have it set up, send a request to [[email protected]](mailto:[email protected]). Include the name of your PI in the email.

## Shared projects
For collaborative projects where many people need access to the same files, you need a shared project space. These are stored at `/project`.

To reduce the storage footprint for each user, we've made various commonly-used datasets available under `/datasets`.

Instead of copying these datasets on your home directory, you can create a symlink via:
```
ln -s /datasets/PATH_TO_DATASET ~/PATH_OF_LINK  # the link can live somewhere in your home directory so that PyTorch/TF can pick up the already-downloaded dataset
```
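
For example, if there were a (hypothetical) `imagenet` folder under `/datasets`, you could link it into a `datasets` folder in your home directory like this:

```
mkdir -p ~/datasets
ln -s /datasets/imagenet ~/datasets/imagenet    # "imagenet" is just an illustrative name; check what actually exists under /datasets
```
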
## Shared model weights

Similar to datasets, model weights are typically very large and can be shared among users.

Unlike the legacy Bon Echo (Vaughan) cluster, there is no dedicated checkpoint space in the Killarney cluster. Now that the `$SCRATCH` space has been greatly expanded, please use this for any training checkpoints.
# Migration from legacy Vaughan (Bon Echo) Cluster
**NOTE:** The migration approach detailed here requires that you set up a second SSH key on Killarney. Your public SSH key on the Vaughan cluster will be different from the one on your local machine.
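
A sketch of what that might look like from a session on the Vaughan cluster (the key type and file names below are just the common defaults):

```
ssh-keygen -t ed25519        # run on the Vaughan login node; creates ~/.ssh/id_ed25519 and id_ed25519.pub
cat ~/.ssh/id_ed25519.pub    # paste this output into the CCDB SSH Keys page, as described above
```
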
The easiest way to migrate data from the legacy Vaughan (Bon Echo) Cluster to Killarney is to use a file transfer command (such as `rsync` or `scp`) from an SSH session.

Start by connecting via `ssh` into the legacy Bon Echo (Vaughan) cluster:

```
username@my-desktop:~$ ssh v.vectorinstitute.ai
Password:
Duo two-factor login for username

Enter a passcode or select one of the following options:

 1. Duo Push to XXX-XXX-3089
 2. SMS passcodes to XXX-XXX-3089

Passcode or option (1-2): 1
Success. Logging you in...

Welcome to the Vector Institute HPC - Vaughan Cluster

Login nodes are shared among many users and therefore
must not be used to run computationally intensive tasks.
Those should be submitted to the slurm scheduler which
will dispatch them on compute nodes.

For more information, please consult the wiki at
https://support.vectorinstitute.ai/Computing

For issues using this cluster, please contact us at
If you forget your password, please visit our self-
service portal at https://password.vectorinstitute.ai.

Last login: Mon Aug 18 07:28:24 2025 from 184.145.46.175
```
Next, use the `rsync` command to copy files across to the Killarney cluster. In the following example, I'm copying the contents of a folder called `my_projects` to my Killarney home directory.
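
A command along these lines should work, run from the Vaughan login node (the Killarney host name below is an assumption, so substitute whatever address you normally connect to, along with your own username and paths):

```
rsync -avz ~/my_projects username@killarney.alliancecan.ca:~/    # -a preserves permissions and timestamps, -z compresses during transfer
```
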
For CPUs, A/I/O/T stands for **A**llocated, **I**dle, **O**ther (e.g. down) and **T**otal. Even if the GPUs on a node are available, if there are no idle CPUs on the node then you won't be able to use it.
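
A quick way to check both at once is `sinfo` with a custom format string, for example:

```
sinfo -N -o "%N %C %G"    # node name, CPUs as Allocated/Idle/Other/Total, and GRES (i.e. GPUs) per node
```
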
# Software Environments
## Pre-installed Environments
The cluster comes with preinstalled software environments called **modules**. These will allow you to access many different versions of Python, VS Code Server, RStudio Server, NodeJS and many others.

To see the available preinstalled environments, run:
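
On a standard Lmod/Environment Modules setup this is the usual command (the output will be a long list of names and versions):

```
module avail
```
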

To use an environment, use `module load`. For example, if you need to use Python 3.10.12:

```
module load python/3.10.12
```
## Custom Environments
If there isn't a preinstalled environment for your needs, you can use [uv](https://docs.astral.sh/uv/) or python-venv. For ongoing projects, it is highly recommended to use uv to manage dependencies. To run something quickly just once, python-venv might be easier. Here is a quick example of how to use python venv.

On the login node, run the following:
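
A minimal sketch, where the module version, environment location, and packages are placeholders:

```
module load python/3.10.12            # or whichever Python module you need
python -m venv ~/venvs/my_env         # create the virtual environment
source ~/venvs/my_env/bin/activate    # activate it
pip install --upgrade pip
pip install numpy                     # install whatever your project needs
deactivate                            # leave the environment when you're done
```
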

When a job exceeds its time limit, it will get stopped by the Slurm scheduler.

In order to avoid losing your work when your job exits, you will need to implement checkpoints: periodic snapshots of your work that you can load from, so you can stop and resume without losing much progress.

On the legacy Bon Echo cluster, there was a dedicated space in the file system for checkpoints. **⚠️ In Killarney, there is no dedicated checkpoint space.** Users are expected to manage their own checkpoints under their `$SCRATCH` folder. Recall that your scratch folder is not permanent, so you'll want to move any important checkpoints to your home or project folder.
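
For example, a job script might write checkpoints under `$SCRATCH` during training and copy the one you want to keep somewhere permanent afterwards (the script name, flag, and paths below are illustrative, not a cluster convention):

```
CKPT_DIR=$SCRATCH/checkpoints/my_run              # "my_run" is a placeholder run name
mkdir -p "$CKPT_DIR"
python train.py --checkpoint-dir "$CKPT_DIR"      # your training code decides what and when to save
mkdir -p ~/my_project/checkpoints
cp "$CKPT_DIR"/last.pt ~/my_project/checkpoints/  # keep the checkpoint you care about off scratch
```
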