[WIP] Suggested modifications to directory layout of the bids-study `DatasetType` #2191

nikhil153 · 2025-09-02T02:04:38Z

This is a continuation of discussion on BIDS DatasetType started in #1972 and #2185. The proposed DatasetTypes include: bids-study, raw, and derivative.

I think this is a fantastic idea with the capacity to curate modular and nested standardized datasets.

However, the directory layout for bids-study proposed in #2185 is suboptimal from the data visibility perspective. Currently, the bids-study does not have a root-level subdirectory for bids-raw dataset. Based on previous conversations it was suggested that bids-raw dataset can be stowed inside sourcedata dir. This hidden location for a valid and probably the most common DatasetType is probably not ideal and will confuse new users.

So, I would like to make a case for treating sourcedata, bids-raw, and bids-derivative with equal importance by putting them on the same root-level inside the bids-study directory tree. In my experience this is more intuitive and helpful for data management of neuroimaging studies, where potentially different people will handle these three DatasetTypes.

The suggested bids- prefix for these directories is mostly to avoid possible confusion between source vs raw data connotations from past discussions and to indicate each of them can be stand-alone BIDS datasets.

Happy to hear your thoughts @michellewang, @jbpoline, @yarikoptic, @effigies, @mathdugre, @julia-pfarr, @nburgos, @AliceJoubert, @Adam-Ismaili-92, @surchs

…Type

codecov · 2025-09-02T02:08:56Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.71%. Comparing base (73281ae) to head (e0fe3ce).

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #2191   +/-   ##
=======================================
  Coverage   82.71%   82.71%           
=======================================
  Files          20       20           
  Lines        1608     1608           
=======================================
  Hits         1330     1330           
  Misses        278      278

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

jbpoline · 2025-09-03T00:03:02Z

This makes a lot of sense to me, and would help everyone to more straightforwardly differentiate source-data and raw-bids much more easily.

AliceJoubert · 2025-09-04T15:55:02Z

This solution feels much clearer, I think it's a good idea !

effigies

Two notes on the schema. I don't have a strong opinion about bids-raw/, except to note that this will be the first use of a hyphen that doesn't separate key-value entities. If you want to be consistent with older versions of the spec, you could call it rawdata/. Many people seemed to think that was supposed to be valid...

effigies · 2025-09-04T17:10:31Z

src/schema/rules/directories.yaml

 # No naming convention applies, and the requirement level and opacity would be superfluous.
 #
-study:
+bids-study:


These top-level keys are DatasetType values. If you want them to be more mnemonic, you'll need some alternative way to match to DatasetType.

I would also recommend against hyphens in keys, as we have heretofore used valid identifiers, which makes traversing the schema using dot notation (rules.directories.study) possible.

Yes, sorry this was not the right place to suggest changes to DatasetType values.
The idea was to introduce bids- prefix in the directory names as an easier way to infer DatasetType similar to sub- and ses- directories. But that would possibly only work for bids-raw and bids-derivative subdirs within a study layout. Not sure if that's ideal at this point, so will simply revert to original naming.

effigies · 2025-09-04T17:11:38Z

src/schema/rules/directories.yaml

+  bids-raw:
+    name: bids-raw


Here name would contain the actual intended directory name, while the key is just a value that shows up in subdirs. You could do something like:

Suggested change

bids-raw:

name: bids-raw

bids_raw:

name: bids-raw

I think we will drop the bids prefix idea and go back to calling it rawdata and rely on docs to explain that this is a raw bids Datatype clarify any confusion with the sourcedata. Does this work?

effigies · 2025-09-04T17:24:02Z

src/schema/rules/directories.yaml

-  derivatives:
-    name: derivatives


I oppose this change on the grounds that derivatives/ already has a defined meaning, and a different name would imply a different meaning. For example bids-derivatives/ would probably be expected to contain only BIDS-Derivatives datasets, and no nonstandard derivatives (like freesurfer/). If you're not changing the meaning, don't change the name.

Further, using the same directory names as other BIDS dataset types allows for code that recursively indexes datasets to care very little about the specific DatasetType of the current dataset. The proposal to add a raw directory would be an addition, but I think it's sensible to be able to crawl sourcedata/ and derivatives/ the same as any other dataset.

I see your point which I mostly agree. Although, I think what bids prefix would imply is a bit unclear.
Currently since the spec allows nonstandard pipeline output inside a valid BIDS dataset, I didn't think bids-derivatives DatasetType would imply change in that policy / meaning.
But based on earlier point, the proposal of bids suffix doesn't seem ideal here, so will revert to derivatives

effigies · 2025-09-04T17:25:25Z

src/schema/rules/directories.yaml

+  phenotype:
+    name: phenotype
+    level: optional
+    opaque: false


This was removed after this discussion: #2185 (comment)

I didn't see a justification for re-adding it, so I want to make sure this isn't slipped in with little consideration.

Yes, should have explained this addition based on offline conversations.
The reason for removing phenotype directory was that it would presumably appear in the raw dataset. But isn't that true for derivatives directory as well? It's a bit confusing having derivatives part of all three DatasetTypes but enforcing phenotype only be part of raw or derivative. In practice, phenotype data curation is likely to be more independent than that of derivatives, which is one of the reasons we prefer it to be at the top level.

nikhil153 · 2025-09-09T19:45:40Z

@effigies - Thanks for the feedback and patiently dealing with these multiple intertwined open issues and hurried PR from my side - a result of mixed-up online and offline conversations. Will respond individually to the inline comments.

julia-pfarr

A few thoughts:

I think we need to be more clear about what bids-study is. If I was a naive user, I would think bids-study is a layout for organizing my study in "the bids way". Meaning that as soon as I move from unorganized sourcedata to some sort of organized data, it should be BIDS compliant. Thus, I do not need a bids prefix for raw or derivatives since my assumption is "everything BIDS". If we want to expand bids-study to allow it to be consumed by other schemas or standards (e.g., Nipoppy), I think we can find a different way than the folder-naming to make them compatible and validate. E.g., adding expanded validation rules in the sense of „if type bids-study is used, everything that BIDS has a spec on (raw, derivatives, phenotype etc.) needs to be in BIDS standard“ or „„if type bids-study is used, at least X% of data needs to be BIDS“ or add an output-validator file that lists which of the data was validated by BIDS. (idk if any of those suggestions make sense, it’s just wild ideas :-D)
It is true that phenotype is supposed to be on the level of the raw data. However, the <measurement_tool_name>.tsv/.json are connected to the (recommended) sessions.tsv/.json file which should be kept at the root level. Thus, it would be more convenient to not have too deep nesting of phenotype. Also, as measurement data can also evolve from source to raw, the <measurement_tool_name>.tsv/.json might contain information for both types.
in offline conversation we also had the discussion about making bids-study not a DatasetType but a StudyType. Reasoning for that would be to have a clear specification for StudyType, maybe the validation would be wasier than changing the nesting logic of existing DatasetTypes and it is expandable to e.g. Meta or Mega Analysis etc. Wdyt?

Suggested modifications to directory layout of the bids-study Dataset…

b99a7bd

…Type

nikhil153 changed the title ~~[WIP] Suggested modifications to directory layout of the bids-study Dataset…~~ [WIP] Suggested modifications to directory layout of the bids-study DatasetType Sep 2, 2025

fixed minor typos

2d59198

effigies reviewed Sep 4, 2025

View reviewed changes

effigies added the discussion ongoing discussion label Sep 5, 2025

Merge branch 'bids-standard:master' into master

7fc71c7

removed bids prefix from the DatasetType and subdirs.

e0fe3ce

yarikoptic mentioned this pull request Sep 13, 2025

We were talking about "raw dataset" but did not make it clear that it BIDS #2210

Open

julia-pfarr reviewed Sep 22, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[WIP] Suggested modifications to directory layout of the bids-study `DatasetType` #2191

[WIP] Suggested modifications to directory layout of the bids-study `DatasetType` #2191

nikhil153 commented Sep 2, 2025 •

edited

Loading

Uh oh!

codecov bot commented Sep 2, 2025 •

edited

Loading

Uh oh!

jbpoline commented Sep 3, 2025

Uh oh!

AliceJoubert commented Sep 4, 2025

Uh oh!

effigies left a comment

Uh oh!

effigies Sep 4, 2025

Uh oh!

nikhil153 Sep 9, 2025

Uh oh!

effigies Sep 4, 2025

Uh oh!

nikhil153 Sep 10, 2025

Uh oh!

effigies Sep 4, 2025

Uh oh!

nikhil153 Sep 10, 2025

Uh oh!

effigies Sep 4, 2025

Uh oh!

nikhil153 Sep 10, 2025

Uh oh!

nikhil153 commented Sep 9, 2025

Uh oh!

julia-pfarr left a comment •

edited

Loading

Uh oh!

Uh oh!

[WIP] Suggested modifications to directory layout of the bids-study DatasetType #2191

Are you sure you want to change the base?

[WIP] Suggested modifications to directory layout of the bids-study DatasetType #2191

Conversation

nikhil153 commented Sep 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented Sep 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

jbpoline commented Sep 3, 2025

Uh oh!

AliceJoubert commented Sep 4, 2025

Uh oh!

effigies left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nikhil153 commented Sep 9, 2025

Uh oh!

julia-pfarr left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

[WIP] Suggested modifications to directory layout of the bids-study `DatasetType` #2191

[WIP] Suggested modifications to directory layout of the bids-study `DatasetType` #2191

nikhil153 commented Sep 2, 2025 •

edited

Loading

codecov bot commented Sep 2, 2025 •

edited

Loading

julia-pfarr left a comment •

edited

Loading