Data flow and management within a FRIDGE #107

@craddm

Description


We need to work out what the actual data flow is inside the FRIDGE.

For performance reasons, we will typically want the data to be on storage attached to the node where the workflows will run.

Here's an example:

Data Owner A has 4 TB of data to put in the FRIDGE.
Data Owner A is asked to upload it using MinIO.
MinIO thus needs to have 4 TB of space available to receive the data.
Where is that disk space?
Assume that MinIO is running on a non-GPU node on Dawn - that node needs 4 TB of additional storage attached to it just for MinIO to use.
Note that scaling the amount of storage up and down is not trivial once it's deployed: you have to add/remove additional storage pools.
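The sizing implications of this example can be sketched as a quick back-of-envelope calculation. This is a hypothetical helper (the function name and the headroom assumptions are mine, not part of any FRIDGE component), assuming a full copy of the dataset is staged onto the compute node as in option 1 below:

```python
# Hypothetical storage-planning helper; all numbers in TB.
# Assumes one full copy of the data is staged onto the compute node
# for each workflow run, in addition to the copy held by MinIO.

def required_capacity_tb(dataset_tb: float, staged_copies: int = 1) -> dict:
    """Estimate where disk space is needed for one dataset.

    dataset_tb    -- size of the data the owner uploads
    staged_copies -- full copies staged onto compute nodes for workflows
    """
    return {
        "minio_bucket_tb": dataset_tb,                  # to receive the upload
        "compute_node_tb": dataset_tb * staged_copies,  # ephemeral copies
        "total_tb": dataset_tb * (1 + staged_copies),
    }

# Data Owner A's 4 TB dataset, staged once for Researcher A's workflow:
print(required_capacity_tb(4))
```

The point the numbers make: even one researcher running one workflow can double the footprint of a dataset, and that doubling lands on whichever node the workflow is scheduled to.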

Researcher A then wants to run a workflow using that data.
How is the data made available to the Workflow pod?
Options:

  1. The Workflow requests the data as an input from the repository.
    This makes a copy of the data on ephemeral storage.
    This means there needs to be enough space to make a complete copy of the data on the GPU node where the workflow is running.
  2. Mount a volume containing the data
    This will require the data to be moved out of the MinIO bucket and onto a standard volume that can then be mounted on the workflow pod (thus we essentially need double the space?)
  3. Mount bucket as a volume
    May be possible using something like https://github.com/s3fs-fuse/s3fs-fuse
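Whichever option we pick, option 1 implies a pre-flight check: refuse to start the workflow if a complete copy of the data won't fit on the node's ephemeral storage. A minimal sketch of that check, using only the standard library (the function name and the 10% headroom default are my assumptions, not existing FRIDGE code):

```python
# Sketch of the space check option 1 would need: before pulling a dataset
# out of the object store onto the workflow node's ephemeral storage,
# verify that a complete copy will actually fit.
import shutil

def can_stage_dataset(dataset_bytes: int, scratch_path: str,
                      headroom: float = 0.1) -> bool:
    """Return True if scratch_path has room for the dataset plus headroom."""
    free = shutil.disk_usage(scratch_path).free
    return free >= dataset_bytes * (1 + headroom)

# e.g. fail fast rather than part-way through a 4 TB copy:
if not can_stage_dataset(4 * 1024**4, "/tmp"):
    print("not enough ephemeral storage for a full copy")
```

Failing fast here matters because a mid-copy failure leaves a partial dataset on the node that something then has to clean up.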

Once the workflow is complete, where does the output go?

  1. Into a MinIO bucket? There needs to be sufficient space available to MinIO.
  2. Into a volume? That output then needs to be transferred to MinIO.

Upshot: we likely need a mechanism for transferring things from MinIO buckets to volumes and vice versa
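A minimal sketch of what that transfer mechanism has to do, copy a file tree between the bucket side and the volume side while preserving structure. A real implementation would talk to MinIO itself (e.g. via the `minio` Python SDK or `mc mirror`); here local directories stand in for the bucket so the logic is runnable, and the function name and paths are hypothetical:

```python
# Filesystem stand-in for the MinIO bucket <-> volume transfer step.
import shutil
from pathlib import Path

def transfer(src: Path, dst: Path) -> int:
    """Copy every file under src into dst, preserving relative paths.

    Returns the number of files transferred.
    """
    count = 0
    for f in src.rglob("*"):
        if f.is_file():
            target = dst / f.relative_to(src)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, target)  # copy contents and metadata
            count += 1
    return count

# bucket -> volume before a workflow runs; volume -> bucket for outputs
# (paths are illustrative only):
# transfer(Path("/mnt/minio-export/input-bucket"), Path("/mnt/workflow-vol"))
```

The same routine works in both directions, which is the attraction of making it a single shared mechanism rather than two ad hoc scripts.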
