I am using the SageMaker PyTorch estimator with a custom Docker image stored in AWS ECR:
from sagemaker.pytorch.estimator import PyTorch
role = "arn:..."
estimator = PyTorch(
    image_uri="1...ecr...amazonaws.com/...:prototype",
    git_config={"repo": "https://github.com/celsofranssa/LightningPrototype.git", "branch": "sagemaker"},
    entry_point="main.py",
    role=role,
    region="us-...",
    instance_type="local",  # ml.g4dn.2xlarge
    instance_count=1,
    volume_size=225,
    hyperparameters=hparams
)
estimator.fit()
SageMaker correctly clones the repository from GitHub and checks out the specified branch.
The Bug:
However, it copies only main.py to /opt/ml/code inside the container instead of the whole cloned repository, which causes ModuleNotFoundError: No module named 'source':
Traceback (most recent call last):
2y9byzwyxr-algo-1-reuoy | File "/opt/ml/code/main.py", line 15, in <module>
2y9byzwyxr-algo-1-reuoy | from source.helper.EvalHelper import EvalHelper
2y9byzwyxr-algo-1-reuoy | ModuleNotFoundError: No module named 'source'
Logging the contents of /opt/ml/code shows only main.py:
import os
print(f"Content: {os.listdir(os.getcwd())}")
['main.py']
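Since os.listdir is not recursive, a fuller check is to walk the whole directory tree, which would rule out the source package sitting in some subdirectory. Below is a minimal sketch of such a check; the helper name list_tree is hypothetical and is not part of SageMaker or of the repository.

```python
import os


def list_tree(root):
    """Return the sorted paths, relative to root, of every file under root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            found.append(os.path.relpath(full_path, root))
    return sorted(found)
```

Calling print(list_tree(os.getcwd())) at the top of main.py, before the failing import, would log every file the container actually received. If the output is still just main.py, the rest of the repository was never copied at all.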