Conversation

JamesKunstle
Contributor

@JamesKunstle JamesKunstle commented Jun 3, 2025

Requires #590 to be merged first; this PR is based on that one.

Closes #462

@mergify mergify bot added testing Relates to testing ci-failure labels Jun 3, 2025
@JamesKunstle JamesKunstle force-pushed the add-liger-smoketest branch from bfb705b to 6f6a312 Compare June 3, 2025 23:48
@mergify mergify bot removed the ci-failure label Jun 3, 2025
@JamesKunstle JamesKunstle force-pushed the add-liger-smoketest branch from 6f6a312 to 86fba44 Compare June 4, 2025 00:27
@JamesKunstle
Contributor Author

solves #462

@booxter booxter added the hold label Jun 4, 2025
@mergify mergify bot added the one-approval label Jun 4, 2025
@booxter
Contributor

booxter commented Jun 4, 2025

@Mergifyio rebase

@booxter booxter removed the hold label Jun 4, 2025
Contributor

mergify bot commented Jun 4, 2025

rebase

✅ Branch has been successfully rebased

@booxter booxter force-pushed the add-liger-smoketest branch from 86fba44 to d40e922 Compare June 4, 2025 17:44
@mergify mergify bot added ci-failure and removed ci-failure labels Jun 4, 2025
    TorchrunArgs,
    TrainingArgs,
)
from instructlab.training.main_ds import run_training

MINIMAL_TRAINING_ARGS = {
    "max_seq_len": 140,  # this config fits nicely on 4xL40s and may need modification for other setups
    "max_batch_len": 15000,
Contributor

Are these parameter changes intentional? If so, why?

@booxter booxter self-requested a review June 4, 2025 19:40
@fynnsu
Collaborator

fynnsu commented Jun 6, 2025

Is the idea behind the parameter changes that this could run on a smaller EC2 instance? Should we also update the workflow file in this PR?

@mergify mergify bot added the ci-failure label Jun 20, 2025
@booxter
Contributor

booxter commented Jun 20, 2025

@Mergifyio rebase

JamesKunstle and others added 2 commits June 20, 2025 17:02
Runs through Liger with and without CPUOffload.
Parameterizes LoRA but doesn't enable it because of a memory usage bug.

Removes `smoketest.sh` from the `tests` directory; all tests should use
pytest in the future.

Signed-off-by: James Kunstle <[email protected]>
Let's see if there's any issue without these changes. If not, we can
cancel / postpone changes to a separate PR.

Signed-off-by: Ihar Hrachyshka <[email protected]>
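
The first commit message above describes a test matrix: Liger and CPU offload toggled, with LoRA exposed as a parameter but left disabled. A minimal sketch of how such a matrix might be enumerated is shown below. Every name here (`USE_LIGER_OPTIONS`, `build_smoke_matrix`, `case_id`, the dict keys) is hypothetical and not taken from the PR, and the sketch assumes the two toggles vary independently, which the commit message does not state outright.

```python
import itertools

# Hypothetical option names; the real test module may use different ones.
USE_LIGER_OPTIONS = [True, False]
CPU_OFFLOAD_OPTIONS = [True, False]
# LoRA is part of the matrix, but only the "disabled" value is listed,
# mirroring the commit message (not enabled because of a memory usage bug).
LORA_ENABLED_OPTIONS = [False]


def build_smoke_matrix():
    """Return one dict per smoke-test configuration."""
    return [
        {"use_liger": liger, "cpu_offload": offload, "lora": lora}
        for liger, offload, lora in itertools.product(
            USE_LIGER_OPTIONS, CPU_OFFLOAD_OPTIONS, LORA_ENABLED_OPTIONS
        )
    ]


def case_id(case):
    """Build a readable test ID such as 'liger-offload-nolora'."""
    return "-".join(
        name if enabled else "no" + name
        for name, enabled in (
            ("liger", case["use_liger"]),
            ("offload", case["cpu_offload"]),
            ("lora", case["lora"]),
        )
    )


SMOKE_CASES = build_smoke_matrix()
SMOKE_IDS = [case_id(c) for c in SMOKE_CASES]

# In a pytest module these lists would feed the standard decorator:
#   @pytest.mark.parametrize("case", SMOKE_CASES, ids=SMOKE_IDS)
#   def test_training_smoke(case): ...
```

Keeping LoRA in the matrix with a single disabled value means enabling it later is a one-line change to `LORA_ENABLED_OPTIONS`, with no change to the test function's signature.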
Contributor

mergify bot commented Jun 20, 2025

rebase

✅ Branch has been successfully rebased

@booxter booxter force-pushed the add-liger-smoketest branch from 8aad472 to a3e0fde Compare June 20, 2025 17:02
@mergify mergify bot added ci-failure and removed ci-failure labels Jun 20, 2025
Labels
one-approval testing Relates to testing
Development

Successfully merging this pull request may close these issues.

Rewrite smoke tests from tests/smoketest.sh as pytest smoke tests
3 participants