Commit 93be74e

tsalo, effigies, and yarikoptic authored
[ENH] Allow plus signs in labels (#1926)
* Allow plus signs in labels.
* Update test_rules.py
* Remove self from CODEOWNERS.
* Update formats.yaml
* Update src/schema/objects/formats.yaml

  Co-authored-by: Chris Markiewicz <[email protected]>
* [DATALAD RUNCMD] Replace more of label regexes present in various spots to gain +

  === Do not change lines below ===
  {
   "chain": [],
   "cmd": "git-sedi '\\[0-9a-zA-Z\\]' '[0-9a-zA-Z+]'",
   "exit": 0,
   "extra_inputs": [],
   "inputs": [],
   "outputs": [],
   "pwd": "."
  }
  ^^^ Do not change lines above ^^^
* Adjust (or replace) mentionings of alphanumeric as now would include + as well
* Allow for + in the paths, which can include labels
* Fix regex for `_run` index in a test_validator.py
* Update src/appendices/entity-table.md

  Co-authored-by: Yaroslav Halchenko <[email protected]>
* fix(test): Run is an index, not a label

---------

Co-authored-by: Chris Markiewicz <[email protected]>
Co-authored-by: Yaroslav Halchenko <[email protected]>
Co-authored-by: Christopher J. Markiewicz <[email protected]>
1 parent 79e4ea8 commit 93be74e

10 files changed: 46 additions and 36 deletions

src/appendices/entity-table.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ For example, if a file has an acquisition and reconstruction label, the
 acquisition entity must precede the reconstruction entity.
 REQUIRED and OPTIONAL entities for a given file type are denoted;
 empty cells imply that entities MUST NOT be specified.
-Entity formats indicate whether the value is alphanumeric
+Entity formats indicate whether the value is alphanumeric (and possibly including `+` character(s))
 (`<label>`) or numeric (`<index>`).
 
 A general introduction to entities is given in the section on

src/common-principles.md

Lines changed: 2 additions & 2 deletions
@@ -49,7 +49,7 @@ Each entity has the following attributes:
 1. *Index*: A non-negative integer, potentially zero-padded for
 consistent width.
 
-1. *Label*: An alphanumeric string.
+1. *Label*: An alphanumeric (and possibly including `+` character(s)) string.
 Note that labels MUST not collide when casing is ignored
 (see [Case collision intolerance](#case-collision-intolerance)).
 
@@ -1120,7 +1120,7 @@ A guide for using macros can be found at
 Additional files and directories containing raw data MAY be added as needed for
 special cases.
 All non-standard file entities SHOULD conform to BIDS-style naming conventions, including
-alphabetic entities and suffixes and alphanumeric labels/indices.
+alphabetic entities and suffixes and alphanumeric (and possibly including `+` character(s)) labels/indices.
 Non-standard suffixes SHOULD reflect the nature of the data, and existing
 entities SHOULD be used when appropriate.
 For example, an ASSET calibration scan might be named

src/schema/objects/columns.yaml

Lines changed: 5 additions & 5 deletions
@@ -110,7 +110,7 @@ derived_from:
 `sample-<label>` entity from which a sample is derived,
 for example a slice of tissue (`sample-02`) derived from a block of tissue (`sample-01`).
 type: string
-pattern: ^sample-[0-9a-zA-Z]+$
+pattern: ^sample-[0-9a-zA-Z+]+$
 desc_id:
 name: desc_id
 display_name: Description Label
@@ -125,7 +125,7 @@ desc_id:
 its `desc_id` column SHOULD contain all labels of the `desc` entity)
 used across the entire derivative dataset.
 type: string
-pattern: ^desc-[0-9a-zA-Z]+$
+pattern: ^desc-[0-9a-zA-Z+]+$
 description:
 name: description
 display_name: Description
@@ -371,7 +371,7 @@ participant_id:
 A participant identifier of the form `sub-<label>`,
 matching a participant entity found in the dataset.
 type: string
-pattern: ^sub-[0-9a-zA-Z]+$
+pattern: ^sub-[0-9a-zA-Z+]+$
 placement__motion:
 name: placement
 display_name: Placement
@@ -436,7 +436,7 @@ sample_id:
 A sample identifier of the form `sample-<label>`,
 matching a sample entity found in the dataset.
 type: string
-pattern: ^sample-[0-9a-zA-Z]+$
+pattern: ^sample-[0-9a-zA-Z+]+$
 sample_type:
 name: sample_type
 display_name: Sample type
@@ -468,7 +468,7 @@ session_id:
 A session identifier of the form `ses-<label>`,
 matching a session found in the dataset.
 type: string
-pattern: ^ses-[0-9a-zA-Z]+$
+pattern: ^ses-[0-9a-zA-Z+]+$
 sex:
 name: sex
 display_name: Sex
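
The effect of the updated identifier patterns can be checked with a few lines of Python. This is a minimal sketch and not part of the commit: the `ID_PATTERNS` mapping and the `is_valid_id` helper are hypothetical names introduced for illustration, and only the pattern strings are copied from the `columns.yaml` hunks above.

```python
import re

# Pattern strings copied from the updated columns.yaml entries.
# The dictionary itself is a hypothetical convenience, not part of the schema.
ID_PATTERNS = {
    "participant_id": r"^sub-[0-9a-zA-Z+]+$",
    "session_id": r"^ses-[0-9a-zA-Z+]+$",
    "sample_id": r"^sample-[0-9a-zA-Z+]+$",
    "desc_id": r"^desc-[0-9a-zA-Z+]+$",
}


def is_valid_id(column: str, value: str) -> bool:
    """Return True if `value` matches the pattern declared for `column`."""
    # fullmatch avoids accepting a trailing newline, which `$` alone would allow.
    return re.fullmatch(ID_PATTERNS[column], value) is not None


print(is_valid_id("participant_id", "sub-01+02"))  # plus sign is now accepted
print(is_valid_id("participant_id", "sub-01_02"))  # underscore is still rejected
```

With the previous character class `[0-9a-zA-Z]`, the first value would have been rejected as well.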

src/schema/objects/common_principles.yaml

Lines changed: 2 additions & 1 deletion
@@ -86,7 +86,8 @@ index:
 label:
 display_name: label
 description: |
-An alphanumeric value, possibly prefixed with arbitrary number of 0s for consistent indentation,
+An alphanumeric (and possibly including `+` character(s)) value, possibly prefixed with arbitrary
+number of 0s for consistent indentation,
 for example, it is `rest` in `task-rest` following `task-<label>` specification.
 Note that labels MUST not collide when casing is ignored
 (see [Case collision intolerance](SPEC_ROOT/common-principles.md#case-collision-intolerance)).

src/schema/objects/entities.yaml

Lines changed: 1 addition & 1 deletion
@@ -76,7 +76,7 @@ direction:
 name: dir
 display_name: Phase-Encoding Direction
 description: |
-The `dir-<label>` entity can be set to an arbitrary alphanumeric label
+The `dir-<label>` entity can be set to an arbitrary legitimate label
 (for example, `dir-LR` or `dir-AP`)
 to distinguish different phase-encoding directions.
 

src/schema/objects/formats.yaml

Lines changed: 14 additions & 6 deletions
@@ -9,8 +9,16 @@ index:
 label:
 display_name: Label
 description: |
-Freeform labels without special characters.
-pattern: '[0-9a-zA-Z]+'
+Free-form labels with alphanumeric and plus (+) characters.
+
+Plus signs MAY be used to concatenate multiple applicable labels,
+but no relationship is established by a partial match.
+In particular, the inheritance principle does not connect files
+containing entities such as `<name>-x+y` with either `<name>-x` or `<name>-y`.
+For example, metadata stored in a file at the root of the dataset with name `/acq-6p_T2w.json`
+does not apply to files with partially matching "acquisition" entity values
+such as `/sub-1/anat/sub-1_acq-6p+s2_T2w.nii`.
+pattern: '[0-9a-zA-Z+]+'
 # Metadata types
 boolean:
 display_name: Boolean
@@ -59,7 +67,7 @@ dataset_relative:
 The validation for this format is minimal.
 It simply ensures that the value is a string with any characters that may appear in a valid path,
 without starting with "/" (an absolute path).
-pattern: '(?!/)[0-9a-zA-Z/\_\-\.]+'
+pattern: '(?!/)[0-9a-zA-Z+/\_\-\.]+'
 date:
 display_name: Date
 description: |
@@ -90,7 +98,7 @@ file_relative:
 The validation for this format is minimal.
 It simply ensures that the value is a string with any characters that may appear in a valid path,
 without starting with "/" (an absolute path).
-pattern: '(?!/)[0-9a-zA-Z/\_\-\.]+'
+pattern: '(?!/)[0-9a-zA-Z+/\_\-\.]+'
 participant_relative:
 display_name: Path relative to the participant directory
 description: |
@@ -100,7 +108,7 @@ participant_relative:
 It simply ensures that the value is a string with any characters that may appear in a valid path,
 without starting with "/" (an absolute path) or "sub/"
 (a relative path starting with the participant directory, rather than relative to that directory).
-pattern: '(?!/)(?!sub-)[0-9a-zA-Z/\_\-\.]+'
+pattern: '(?!/)(?!sub-)[0-9a-zA-Z+/\_\-\.]+'
 rrid:
 display_name: Research resource identifier
 description: |
@@ -115,7 +123,7 @@ stimuli_relative:
 It simply ensures that the value is a string with any characters that may appear in a valid path,
 without starting with "/" (an absolute path) or "stimuli/"
 (a relative path starting with the stimuli directory, rather than relative to that directory).
-pattern: '(?!/)(?!stimuli/)[0-9a-zA-Z+/\_\-\.]+'
+pattern: '(?!/)(?!stimuli/)[0-9a-zA-Z+/\_\-\.]+'
 time:
 display_name: Time
 description: |
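
The new `label` description states that a partial match on a `+`-joined label establishes no relationship. One way to read this is that entity values are compared as whole strings. The sketch below illustrates that reading only; `entity_value` and `same_entity_value` are hypothetical helpers, not functions from `bidsschematools`, and the real inheritance logic involves more than a single entity comparison.

```python
import re
from typing import Optional

# Label format after this commit (copied from formats.yaml).
LABEL = r"[0-9a-zA-Z+]+"


def entity_value(path: str, entity: str) -> Optional[str]:
    """Extract the value of `entity` (for example "acq") from a path, if present."""
    # The entity key must start the name or follow "_" or "/", and the value
    # must be followed by "_", ".", "/" or the end of the string.
    match = re.search(
        rf"(?:^|[_/]){re.escape(entity)}-({LABEL})(?=[_./]|$)", path
    )
    return match.group(1) if match else None


def same_entity_value(path_a: str, path_b: str, entity: str) -> bool:
    """True only when both paths carry the entity with exactly the same value."""
    value_a = entity_value(path_a, entity)
    value_b = entity_value(path_b, entity)
    return value_a is not None and value_a == value_b


sidecar = "/acq-6p_T2w.json"
image = "/sub-1/anat/sub-1_acq-6p+s2_T2w.nii"
print(entity_value(sidecar, "acq"), entity_value(image, "acq"))
print(same_entity_value(sidecar, image, "acq"))
```

Under this reading, `6p` and `6p+s2` are simply two different labels, so the root-level sidecar from the example in the description is not associated with the image.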

src/schema/objects/metadata.yaml

Lines changed: 4 additions & 3 deletions
@@ -3683,9 +3683,10 @@ TaskName:
 Name of the task.
 No two tasks should have the same name.
 The task label included in the filename is derived from this `"TaskName"` field
-by removing all non-alphanumeric characters (that is, all except those matching `[0-9a-zA-Z]`).
-For example `"TaskName"` `"faces n-back"` or `"head nodding"` will correspond to task labels
-`facesnback` and `headnodding`, respectively.
+by removing all non-alphanumeric or `+` characters (that is, all except those matching `[0-9a-zA-Z+]`),
+and potentially replacing spaces with `+` to ease readability.
+For example `"TaskName"` `"faces n-back"` or `"head nodding"` could correspond to task labels
+`faces+n+back` or `facesnback` and `head+nodding` or `headnodding`, respectively.
 type: string
 TermURL:
 name: TermURL
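
The two derivations of a task label described in the updated `TaskName` text can be sketched as follows. The `task_label` function and its `use_plus` flag are hypothetical names, not part of the specification or its tooling. One assumption is worth stating: the description mentions replacing spaces with `+`, but its example `faces+n+back` also turns the hyphen into `+`, so this sketch replaces every run of non-alphanumeric characters with a single `+`.

```python
import re


def task_label(task_name: str, use_plus: bool = False) -> str:
    """Derive a task label from a "TaskName" value (illustrative sketch)."""
    if use_plus:
        # Assumption: any run of non-alphanumeric characters becomes one "+",
        # which reproduces the `faces+n+back` example.
        joined = re.sub(r"[^0-9a-zA-Z]+", "+", task_name)
        return joined.strip("+")
    # Default: drop everything outside the label character class.
    return re.sub(r"[^0-9a-zA-Z+]", "", task_name)


for name in ("faces n-back", "head nodding"):
    print(task_label(name), task_label(name, use_plus=True))
```

Both variants yield strings that match the label pattern `[0-9a-zA-Z+]+`; which one a dataset uses is left to the curator by the wording "could correspond".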
Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 
 SUMMARY:
 0 out of 1 files were successfully validated, using the following regular expressions:
-- `.*?/sub-(?P<subject>[0-9a-zA-Z]+)/(|ses-(?P<session>[0-9a-zA-Z]+)/)anat/sub-(?P=subject)(|_ses-(?P=session))(|_acq-(?P<acquisition>[0-9a-zA-Z]+))(|_ce-(?P<ceagent>[0-9a-zA-Z]+))(|_rec-(?P<reconstruction>[0-9a-zA-Z]+))(|_run-(?P<run>[0-9a-zA-Z]+))(|_part-(?P<part>(mag|phase|real|imag)))_(T1w|T2w|PDw|T2starw|FLAIR|inplaneT1|inplaneT2|PDT2|angio|T2star)\.(nii.gz|nii|json)$`
+- `.*?/sub-(?P<subject>[0-9a-zA-Z+]+)/(|ses-(?P<session>[0-9a-zA-Z+]+)/)anat/sub-(?P=subject)(|_ses-(?P=session))(|_acq-(?P<acquisition>[0-9a-zA-Z+]+))(|_ce-(?P<ceagent>[0-9a-zA-Z+]+))(|_rec-(?P<reconstruction>[0-9a-zA-Z+]+))(|_run-(?P<run>[0-9]+))(|_part-(?P<part>(mag|phase|real|imag)))_(T1w|T2w|PDw|T2starw|FLAIR|inplaneT1|inplaneT2|PDT2|angio|T2star)\.(nii.gz|nii|json)$`
 The following files were not matched by any regex schema entry:
 * `/home/chymera/.data2/datalad/000026/noncompliant/sub-EXC022/anat/sub-EXC022_ses-MRI_flip-1_VFA.nii.gz
 The following mandatory regex schema entries did not match any files:

tools/schemacode/src/bidsschematools/tests/test_rules.py

Lines changed: 4 additions & 4 deletions
@@ -18,8 +18,8 @@ def test_entity_rule(schema_obj):
 nii_rule = rules._entity_rule(rule, schema_obj)
 assert nii_rule == {
 "regex": (
-r"sub-(?P<subject>[0-9a-zA-Z]+)/"
-r"(?:ses-(?P<session>[0-9a-zA-Z]+)/)?"
+r"sub-(?P<subject>[0-9a-zA-Z+]+)/"
+r"(?:ses-(?P<session>[0-9a-zA-Z+]+)/)?"
 r"(?P<datatype>anat)/"
 r"(?(subject)sub-(?P=subject)_)"
 r"(?(session)ses-(?P=session)_)"
@@ -50,8 +50,8 @@ def test_entity_rule(schema_obj):
 json_rule = rules._entity_rule(rule, schema_obj)
 assert json_rule == {
 "regex": (
-r"(?:sub-(?P<subject>[0-9a-zA-Z]+)/)?"
-r"(?:ses-(?P<session>[0-9a-zA-Z]+)/)?"
+r"(?:sub-(?P<subject>[0-9a-zA-Z+]+)/)?"
+r"(?:ses-(?P<session>[0-9a-zA-Z+]+)/)?"
 r"(?:(?P<datatype>anat)/)?"
 r"(?(subject)sub-(?P=subject)_)"
 r"(?(session)ses-(?P=session)_)"

tools/schemacode/src/bidsschematools/tests/test_validator.py

Lines changed: 12 additions & 12 deletions
@@ -72,12 +72,12 @@ def test_write_report(tmp_path):
 
 validation_result["schema_tracking"] = [
 {
-"regex": ".*?/sub-(?P<subject>[0-9a-zA-Z]+)/"
-"(|ses-(?P<session>[0-9a-zA-Z]+)/)anat/sub-(?P=subject)"
-"(|_ses-(?P=session))(|_acq-(?P<acquisition>[0-9a-zA-Z]+))"
-"(|_ce-(?P<ceagent>[0-9a-zA-Z]+))"
-"(|_rec-(?P<reconstruction>[0-9a-zA-Z]+))"
-"(|_run-(?P<run>[0-9a-zA-Z]+))"
+"regex": ".*?/sub-(?P<subject>[0-9a-zA-Z+]+)/"
+"(|ses-(?P<session>[0-9a-zA-Z+]+)/)anat/sub-(?P=subject)"
+"(|_ses-(?P=session))(|_acq-(?P<acquisition>[0-9a-zA-Z+]+))"
+"(|_ce-(?P<ceagent>[0-9a-zA-Z+]+))"
+"(|_rec-(?P<reconstruction>[0-9a-zA-Z+]+))"
+"(|_run-(?P<run>[0-9a-zA-Z+]+))"
 "(|_part-(?P<part>(mag|phase|real|imag)))"
 "_(T1w|T2w|PDw|T2starw|FLAIR|inplaneT1|inplaneT2|PDT2|angio|T2star)"
 "\\.(nii.gz|nii|json)$",
@@ -86,12 +86,12 @@ def test_write_report(tmp_path):
 ]
 validation_result["schema_listing"] = [
 {
-"regex": ".*?/sub-(?P<subject>[0-9a-zA-Z]+)/"
-"(|ses-(?P<session>[0-9a-zA-Z]+)/)anat/sub-(?P=subject)"
-"(|_ses-(?P=session))(|_acq-(?P<acquisition>[0-9a-zA-Z]+))"
-"(|_ce-(?P<ceagent>[0-9a-zA-Z]+))"
-"(|_rec-(?P<reconstruction>[0-9a-zA-Z]+))"
-"(|_run-(?P<run>[0-9a-zA-Z]+))"
+"regex": ".*?/sub-(?P<subject>[0-9a-zA-Z+]+)/"
+"(|ses-(?P<session>[0-9a-zA-Z+]+)/)anat/sub-(?P=subject)"
+"(|_ses-(?P=session))(|_acq-(?P<acquisition>[0-9a-zA-Z+]+))"
+"(|_ce-(?P<ceagent>[0-9a-zA-Z+]+))"
+"(|_rec-(?P<reconstruction>[0-9a-zA-Z+]+))"
+"(|_run-(?P<run>[0-9]+))"
 "(|_part-(?P<part>(mag|phase|real|imag)))"
 "_(T1w|T2w|PDw|T2starw|FLAIR|inplaneT1|inplaneT2|PDT2|angio|T2star)"
 "\\.(nii.gz|nii|json)$",
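
The distinction drawn in the commit message ("Run is an index, not a label") can be seen by compiling the `schema_listing` regular expression and trying a few paths. This is a standalone sketch: the regex is rebuilt from the string shown in the hunk above, while `LABEL`, `INDEX`, `ANAT_REGEX`, `matches_anat`, and the example paths are hypothetical and not taken from the test suite.

```python
import re

LABEL = r"[0-9a-zA-Z+]+"  # label format after this commit
INDEX = r"[0-9]+"         # index format: digits only, no "+"

# Same structure as the "schema_listing" regex in test_validator.py.
ANAT_REGEX = re.compile(
    rf".*?/sub-(?P<subject>{LABEL})/"
    rf"(|ses-(?P<session>{LABEL})/)anat/sub-(?P=subject)"
    rf"(|_ses-(?P=session))(|_acq-(?P<acquisition>{LABEL}))"
    rf"(|_ce-(?P<ceagent>{LABEL}))"
    rf"(|_rec-(?P<reconstruction>{LABEL}))"
    rf"(|_run-(?P<run>{INDEX}))"
    r"(|_part-(?P<part>(mag|phase|real|imag)))"
    r"_(T1w|T2w|PDw|T2starw|FLAIR|inplaneT1|inplaneT2|PDT2|angio|T2star)"
    r"\.(nii.gz|nii|json)$"
)


def matches_anat(path: str) -> bool:
    """Return True if `path` matches the anatomical filename regex."""
    return ANAT_REGEX.match(path) is not None


# A "+" is accepted in a label (acq) but not in an index (run).
print(matches_anat("/data/sub-01/anat/sub-01_acq-6p+s2_run-1_T2w.nii.gz"))
print(matches_anat("/data/sub-01/anat/sub-01_run-1+2_T2w.nii.gz"))
```

Building the expression from `LABEL` and `INDEX` constants is a choice made here for readability; the test file spells the character classes out in full.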
