Commit 5a2043e

Some setup fixes (#329)
1 parent 13e95da commit 5a2043e

3 files changed: +23 -12 lines changed

README.md

Lines changed: 17 additions & 6 deletions
@@ -141,7 +141,10 @@ Create a `.env` file with the following environment variables:
 - `DISCORD_DEBUG_CLUSTER_STAGING_ID` : The ID of the "staging" server you want to connect to
 - `DISCORD_CLUSTER_STAGING_ID` : The ID of the "production" server you want to connect to
 - `GITHUB_TOKEN` : A Github token with permissions to trigger workflows, for now only new branches from [discord-cluster-manager](https://github.com/gpu-mode/discord-cluster-manager) are tested, since the bot triggers workflows on your behalf
+- `GITHUB_REPO` : The repository where the cluster manager is hosted.
+- `GITHUB_WORKFLOW_BRANCH` : The branch to start the GitHub Actions jobs from when submitting a task.
 - `DATABASE_URL` : The URL you use to connect to Postgres.
+- `DISABLE_SSL` : (Optional) set if you want to disable SSL when connecting to Postgres.

 Below is where to find these environment variables:

@@ -169,14 +172,22 @@ Below is where to find these environment variables:
 <img width="1440" alt="Screenshot 2024-12-30 at 8 51 59 AM" src="https://github.com/user-attachments/assets/e3467871-bd2c-4f94-b0c5-c8a6ef5ce89e">
 </details>

+- `GITHUB_REPO`: This should be set to this repository, which is usually `gpu-mode/discord-cluster-manager`.
+
+- `GITHUB_WORKFLOW_BRANCH`: Usually `main` or the branch you are working from.
+
 - `DATABASE_URL`: This contains the connection details for your local database, and has the form `postgresql://user:password@localhost/clusterdev`.

+- `DISABLE_SSL`: Set to `1` when developing.
+
 ### Verify Setup

+Install the kernel bot as editable using `pip install -e .`
+
 Run the following command to run the bot:

 ```
-python src/discord-cluster-manager/bot.py --debug
+python src/kernelbot/main.py --debug
 ```

 Then in your staging server, use the `/verifyruns` command to test basic functionalities of the bot and the `/verifydb` command to check database connectivity.
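
As an illustration of how the environment variables documented in this README change might be consumed, here is a minimal Python sketch. It is not the bot's actual configuration code, which this commit does not show: the `Settings` dataclass and the `load_settings` helper are hypothetical, the fallback values for `GITHUB_REPO` and `GITHUB_WORKFLOW_BRANCH` are assumptions taken from the "usually" wording in the README, and treating any non-empty `DISABLE_SSL` value as "disabled" is likewise an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables named in the README."""

    github_repo: str
    workflow_branch: str
    database_url: str
    disable_ssl: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    return Settings(
        # Assumed defaults, based on the "usually ..." hints in the README.
        github_repo=env.get("GITHUB_REPO", "gpu-mode/discord-cluster-manager"),
        workflow_branch=env.get("GITHUB_WORKFLOW_BRANCH", "main"),
        # DATABASE_URL has no sensible default, so a missing value raises KeyError.
        database_url=env["DATABASE_URL"],
        # Assumption: any non-empty value (for example "1") disables SSL.
        disable_ssl=bool(env.get("DISABLE_SSL")),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to exercise in a test without touching the real environment.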
@@ -232,7 +243,7 @@ specify the available GPUs that the leaderboard evaluates on.
 The Discord bot internally contains an `eval.py` script that handles the correctness and timing
 analysis for the leaderboard. The `reference_code` that the leaderboard creator submits must have
 the following function signatures with their implementations filled out. `InputType` and
-`OutputType` are generics that could be a `torch.Tensor`, `List[torch.Tensor]`, etc.
+`OutputType` are generics that could be a `torch.Tensor`, `List[torch.Tensor]`, etc.
 depending on the reference code specifications. We leave this flexibility to the leaderboard creator.

 ```python
@@ -257,8 +268,8 @@ handle the typing system for tensors. The `reference.cu` that the leaderboard cr
 the following function signatures with their implementations filled out:

 The main difference is we now need to define an alias for the type that the input / outputs are. A
-simple and common example is a list of FP32 tensors, which can be defined using a pre-defined array of
-`const int`s called `N_SIZES`, then define an array of containers, e.g.
+simple and common example is a list of FP32 tensors, which can be defined using a pre-defined array of
+`const int`s called `N_SIZES`, then define an array of containers, e.g.
 `std::array<std::vector<float>, N_SIZES>`.

 ```cuda
@@ -293,7 +304,7 @@ bool check_implementation(output_t out, output_t ref) {
 ```

 The leaderboard submission for _Python code_ requires the following function signatures, where
-`InputType` and `OutputType` are generics that could be a `torch.Tensor`, `List[torch.Tensor]`,
+`InputType` and `OutputType` are generics that could be a `torch.Tensor`, `List[torch.Tensor]`,
 etc. depending on the reference code specifications.

 ```python
@@ -354,7 +365,7 @@ If you'd like to donate a GPU to our efforts, we can make you a CI admin in Gith

 ## Citation

-If you used our software please cite it as
+If you used our software please cite it as

 ```
 @misc{kernelbot2025,

src/kernelbot/cogs/admin_cog.py

Lines changed: 1 addition & 1 deletion
@@ -184,7 +184,7 @@ async def leaderboard_create_local(
 try:
     old_lb = db.get_leaderboard(leaderboard_name)
 except LeaderboardDoesNotExist:
-    pass
+    old_lb = None
 db.delete_leaderboard(leaderboard_name, force=True)

 # get existing forum thread or create new one
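
One plausible reading of this change (the rest of the function is not shown in the hunk, so this is an assumption): if `old_lb` is read later in the function, the old `pass` branch would leave the name unbound whenever the leaderboard does not exist. The sketch below reproduces that failure mode with hypothetical stand-ins; `get_leaderboard`, `before_fix`, and `after_fix` are illustrative names, not the project's code.

```python
class LeaderboardDoesNotExist(Exception):
    """Stand-in for the exception type named in the diff."""


def get_leaderboard(name):
    # Hypothetical lookup that always fails, to exercise the except branch.
    raise LeaderboardDoesNotExist(name)


def before_fix(name):
    try:
        old_lb = get_leaderboard(name)
    except LeaderboardDoesNotExist:
        pass  # old_lb is never assigned on this path
    return old_lb  # raises UnboundLocalError when the lookup failed


def after_fix(name):
    try:
        old_lb = get_leaderboard(name)
    except LeaderboardDoesNotExist:
        old_lb = None  # explicit "no previous leaderboard" marker
    return old_lb


if __name__ == "__main__":
    print(after_fix("missing"))
    try:
        before_fix("missing")
    except UnboundLocalError as exc:
        print(type(exc).__name__)
```

With the explicit `None`, later code can branch on `old_lb is None` instead of depending on whether the lookup succeeded.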

src/kernelbot/cogs/verify_run_cog.py

Lines changed: 5 additions & 5 deletions
@@ -53,20 +53,20 @@ async def trigger_run(self, interaction: discord.Interaction, gpu: GPU, reporter
     sub_code = create_mock_attachment(
         "submission.py", Path("examples/identity_py/submission.py").read_text()
     )
-    task = make_task_definition("examples/identity_py")
+    leaderboard = make_task_definition("examples/identity_py")
 else:
     sub_code = create_mock_attachment(
         "test.cu", Path("examples/identity_cuda/submission.cu").read_text()
     )
-    task = make_task_definition("examples/identity_cuda")
+    leaderboard = make_task_definition("examples/identity_cuda")

 return await submit_leaderboard(
     interaction,
     -1,
     sub_code,
     gpu,
     reporter=reporter,
-    task=task,
+    task=leaderboard.task,
     mode=SubmissionMode.TEST,
     seed=None,
 )
@@ -292,8 +292,8 @@ async def verify_runs(self, interaction: discord.Interaction):
 amd = get_gpu_by_name("mi300")
 t4 = get_gpu_by_name("T4")

-reporter = MultiProgressReporterDiscord("Verifying")
-await reporter.show(interaction)
+reporter = MultiProgressReporterDiscord(interaction)
+await reporter.show("Verifying")

 results = await asyncio.gather(
     self.verify_github_run(interaction, nvidia, reporter.add_run("NVIDIA-PY"), "py"),
