Hi,
I used the following commands to download the llama-3.1-8b dataset, following the README:
bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
https://inference.mlcommons-storage.org/metadata/llama3-1-8b-cnn-eval.uri
bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
https://inference.mlcommons-storage.org/metadata/llama3-1-8b-sample-cnn-eval-5000.uri
bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
https://inference.mlcommons-storage.org/metadata/llama3-1-8b-cnn-dailymail-calibration.uri
The following files were downloaded into the dataset folder:
llama3.1-8b]$ ls dataset
cnn_dailymail_calibration.json llama3-1-8b-cnn-dailymail-calibration.md5 llama3-1-8b-sample-cnn-eval-5000.md5
cnn_eval.json llama3-1-8b-cnn-eval.md5 sample_cnn_eval_5000.json
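Since the downloader also drops .md5 files next to the JSONs, here is a small sketch of how I would check the downloads against them (this assumes the usual md5sum-style "<digest>  <filename>" layout inside the .md5 files, which I have not verified):

```python
import hashlib
from pathlib import Path

def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

dataset_dir = Path("dataset")  # the folder shown in the listing above
for md5_file in sorted(dataset_dir.glob("*.md5")):
    # Assumes standard md5sum lines: "<hex digest>  <filename>"
    for line in md5_file.read_text().splitlines():
        if not line.strip():
            continue
        digest, name = line.split(maxsplit=1)
        target = dataset_dir / name.strip().lstrip("*")
        status = "OK" if md5sum(target) == digest else "MISMATCH"
        print(f"{target.name}: {status}")
```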
However, the inference command fails: it warns that the processed pickle file is not found and then crashes while loading the dataset.
llama3.1-8b]$ export DATASET_PATH=$LLAMA_FOLDER/dataset
llama3.1-8b]$ export CHECKPOINT_PATH=$LLAMA_FOLDER/Llama-3.1-8B-Instruct
llama3.1-8b]$ python -u main.py --scenario Offline \
--model-path ${CHECKPOINT_PATH} \
--batch-size 16 \
--dtype bfloat16 \
--user-conf user.conf \
--total-sample-count 13368 \
--dataset-path ${DATASET_PATH} \
--output-log-dir output \
--tensor-parallel-size ${GPU_COUNT} \
--vllm
No module named 'vllm._version'
from vllm.version import __version__ as VLLM_VERSION
INFO:datasets:PyTorch version 2.4.0 available.
WARNING:Llama-8B-Dataset:Processed pickle file /scratch/mn/inference/language/llama3.1-8b/dataset not found. Please check that the path is correct
INFO:Llama-8B-Dataset:Loading dataset...
Traceback (most recent call last):
File "/scratch/mn/inference/language/llama3.1-8b/main.py", line 216, in <module>
main()
File "/scratch/mn/inference/language/llama3.1-8b/main.py", line 173, in main
sut = sut_cls(
File "/scratch/mn/inference/language/llama3.1-8b/SUT_VLLM.py", line 56, in __init__
self.data_object = Dataset(
File "/scratch/mn/inference/language/llama3.1-8b/dataset.py", line 36, in __init__
self.load_processed_dataset()
File "/scratch/mn/inference/language/llama3.1-8b/dataset.py", line 52, in load_processed_dataset
self.processed_data = pd.read_json(self.dataset_path)
File "/home/mn/.local/lib/python3.10/site-packages/pandas/io/json/_json.py", line 791, in read_json
json_reader = JsonReader(
File "/home/mn/.local/lib/python3.10/site-packages/pandas/io/json/_json.py", line 904, in __init__
data = self._get_data_from_filepath(filepath_or_buffer)
File "/home/mn/.local/lib/python3.10/site-packages/pandas/io/json/_json.py", line 944, in _get_data_from_filepath
self.handles = get_handle(
File "/home/mn/.local/lib/python3.10/site-packages/pandas/io/common.py", line 873, in get_handle
handle = open(
IsADirectoryError: [Errno 21] Is a directory: '/scratch/mn/inference/language/llama3.1-8b/dataset'
I also don't understand why it complains that the dataset path is a directory. What else should it point to?
Any ideas?
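For reference, the read_json call at the bottom of the traceback opens the path as a single file, so I suspect --dataset-path needs to point at one of the downloaded JSON files rather than at the folder (cnn_eval.json is my guess for the evaluation split). A minimal sketch of that call:

```python
import pandas as pd

# dataset.py's load_processed_dataset() boils down to:
#     self.processed_data = pd.read_json(self.dataset_path)
# read_json() opens the path as a single file, which is why passing
# the dataset directory raises IsADirectoryError.
#
# Pointing it at one of the downloaded JSON files instead (cnn_eval.json
# is my guess for the evaluation split) avoids that particular error:
processed_data = pd.read_json("dataset/cnn_eval.json")
print(processed_data.columns)
print(len(processed_data))
```

So presumably I should set something like DATASET_PATH=$LLAMA_FOLDER/dataset/cnn_eval.json instead of the folder itself, but I would appreciate confirmation that this is the intended file.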