README.md: 17 additions & 6 deletions
@@ -141,7 +141,10 @@ Create a `.env` file with the following environment variables:
 - `DISCORD_DEBUG_CLUSTER_STAGING_ID` : The ID of the "staging" server you want to connect to
 - `DISCORD_CLUSTER_STAGING_ID` : The ID of the "production" server you want to connect to
 - `GITHUB_TOKEN` : A Github token with permissions to trigger workflows, for now only new branches from [discord-cluster-manager](https://github.com/gpu-mode/discord-cluster-manager) are tested, since the bot triggers workflows on your behalf
+- `GITHUB_REPO` : The repository where the cluster manager is hosted.
+- `GITHUB_WORKFLOW_BRANCH` : The branch to start the GitHub Actions jobs from when submitting a task.
 - `DATABASE_URL` : The URL you use to connect to Postgres.
+- `DISABLE_SSL` : (Optional) set if you want to disable SSL when connecting to Postgres.
 
 Below is where to find these environment variables:
 
@@ -169,14 +172,22 @@ Below is where to find these environment variables:
 <img width="1440" alt="Screenshot 2024-12-30 at 8 51 59 AM" src="https://github.com/user-attachments/assets/e3467871-bd2c-4f94-b0c5-c8a6ef5ce89e">
 </details>
 
+- `GITHUB_REPO`: This should be set to this repository, which is usually `gpu-mode/discord-cluster-manager`.
+
+- `GITHUB_WORKFLOW_BRANCH`: Usually `main` or the branch you are working from.
+
 - `DATABASE_URL`: This contains the connection details for your local database, and has the form `postgresql://user:password@localhost/clusterdev`.
 
+- `DISABLE_SSL`: Set to `1` when developing.
+
 ### Verify Setup
 
+Install the kernel bot as editable using `pip install -e .`
+
 Run the following command to run the bot:
 
 ```
-python src/discord-cluster-manager/bot.py --debug
+python src/kernelbot/main.py --debug
 ```
 
 Then in your staging server, use the `/verifyruns` command to test basic functionalities of the bot and the `/verifydb` command to check database connectivity.
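
Before launching the bot, it can help to confirm that the variables named in the hunks above are actually set. The following is a minimal sketch and not part of the repository: `REQUIRED_VARS`, `missing_env_vars`, and `looks_like_postgres_url` are hypothetical names introduced here for illustration. It assumes that every variable shown in this diff except `DISABLE_SSL` (documented as optional) is required, and that the `.env` values have already been exported into the process environment.

```python
import os
from urllib.parse import urlparse

# Assumption: names copied from the README hunks in this diff. DISABLE_SSL is
# documented as optional, so it is deliberately left out of this list.
REQUIRED_VARS = (
    "DISCORD_DEBUG_CLUSTER_STAGING_ID",
    "DISCORD_CLUSTER_STAGING_ID",
    "GITHUB_TOKEN",
    "GITHUB_REPO",
    "GITHUB_WORKFLOW_BRANCH",
    "DATABASE_URL",
)


def missing_env_vars(env):
    """Return the required variable names that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def looks_like_postgres_url(url):
    """Check for the documented postgresql://user:password@host/dbname shape."""
    parsed = urlparse(url)
    return (
        # libpq accepts both the "postgresql" and "postgres" URI schemes.
        parsed.scheme in ("postgresql", "postgres")
        and bool(parsed.hostname)
        and len(parsed.path) > 1  # path is "/<database name>"
    )


if __name__ == "__main__":
    missing = missing_env_vars(os.environ)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    elif not looks_like_postgres_url(os.environ["DATABASE_URL"]):
        print("DATABASE_URL does not look like a Postgres connection URL")
    else:
        print("Environment looks complete")
```

This only checks that values are present and that `DATABASE_URL` is well-formed; it does not contact Discord, GitHub, or Postgres, so the `/verifyruns` and `/verifydb` commands remain the real end-to-end checks.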
@@ -232,7 +243,7 @@ specify the available GPUs that the leaderboard evaluates on.
 The Discord bot internally contains an `eval.py` script that handles the correctness and timing
 analysis for the leaderboard. The `reference_code` that the leaderboard creator submits must have
 the following function signatures with their implementations filled out. `InputType` and
-`OutputType` are generics that could be a `torch.Tensor`, `List[torch.Tensor]`, etc.
+`OutputType` are generics that could be a `torch.Tensor`, `List[torch.Tensor]`, etc.
 depending on the reference code specifications. We leave this flexibility to the leaderboard creator.
 
 ```python
@@ -257,8 +268,8 @@ handle the typing system for tensors. The `reference.cu` that the leaderboard cr
 the following function signatures with their implementations filled out:
 
 The main difference is we now need to define an alias for the type that the input / outputs are. A
-simple and common example is a list of FP32 tensors, which can be defined using a pre-defined array of
-`const int`s called `N_SIZES`, then define an array of containers, e.g.
+simple and common example is a list of FP32 tensors, which can be defined using a pre-defined array of
+`const int`s called `N_SIZES`, then define an array of containers, e.g.