Merged
110 changes: 106 additions & 4 deletions docs/design/interface.qmd
@@ -145,10 +145,71 @@ Package standard that are not relevant to your use case.

See the help documentation with `help(Exclusion)` for more details.

#### {{< var done >}} `CustomCheck`
#### {{< var wip >}} `Extensions`

This sub-item of `Config` defines extensions, i.e., additional checks
that supplement those specified by the Data Package standard. It
contains sub-items that store these additional checks, such as
`RequiredCheck` and `CustomCheck`. The `Extensions` class might later be
expanded to include more types of extensions.

```` python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Extensions:
    """Extensions to the standard checks.

    This contains additional checks to be made alongside the standard
    Data Package checks.

    Attributes:
        required_checks: A list of `RequiredCheck` objects defining
            properties to set as required.
        custom_checks: A list of `CustomCheck` objects defining extra,
            custom checks to run alongside the standard checks.

    Examples:

        ```{python}
        import check_datapackage as cdp

        extensions = cdp.Extensions(
            required_checks=[
                cdp.RequiredCheck(
                    jsonpath="$.description",
                    message="Data Packages must include a description.",
                ),
                cdp.RequiredCheck(
                    jsonpath="$.contributors[*].email",
                    message="All contributors must have an email address.",
                ),
            ],
            custom_checks=[
                cdp.CustomCheck(
                    type="only-mit",
                    jsonpath="$.licenses[*].name",
                    message="Data Packages may only be licensed under MIT.",
                    check=lambda license_name: license_name == "mit",
                )
            ],
        )
        # check(descriptor, config=cdp.Config(extensions=extensions))
        ```
    """

    required_checks: list[RequiredCheck] = field(default_factory=list)
    custom_checks: list[CustomCheck] = field(default_factory=list)
````

Each extension class must implement its own `apply()` method that takes
the `datapackage.json` properties as a `dict` and returns a list of
`Issue` objects describing the issues found by that extension.
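
The `apply()` contract described above could look roughly like the sketch
below. It is only an illustration under stated assumptions: the `Issue`
and `RequiredCheck` classes here are simplified stand-ins, not the
package's actual classes, the `"required"` issue type is made up, and
only top-level paths such as `$.description` are handled.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Issue:
    """Hypothetical stand-in for the package's `Issue` class."""

    jsonpath: str
    type: str
    message: str


@dataclass(frozen=True)
class RequiredCheck:
    """Simplified stand-in that only supports top-level paths like `$.name`."""

    jsonpath: str
    message: str

    def apply(self, properties: dict[str, Any]) -> list[Issue]:
        # Strip the leading "$." to get the top-level property name.
        name = self.jsonpath.removeprefix("$.")
        if name in properties:
            return []
        # The "required" type label is an assumption made for this sketch.
        return [Issue(jsonpath=self.jsonpath, type="required", message=self.message)]


properties = {"name": "my-package"}
check = RequiredCheck(
    jsonpath="$.description",
    message="Data Packages must include a description.",
)
issues = check.apply(properties)
```

Because every extension returns a plain list of issues, the results from
several extensions can be concatenated without special handling.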

#### {{< var wip >}} `RequiredCheck`

A sub-item of `Config`. Expresses a custom check.
A sub-item of `Extensions` that allows users to mark specific properties
as required even though the Data Package standard does not require them.
See the help documentation with `help(RequiredCheck)` for more details.

#### {{< var done >}} `CustomCheck`

A sub-item of `Extensions` that allows users to add an additional, custom
check that `check-datapackage` will run alongside the standard checks.
See the help documentation with `help(CustomCheck)` for more details.
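
As a rough illustration of how a custom check might be evaluated, the
sketch below applies the `check` callable to every value matched by the
JSON path and reports an issue for each value that fails. The classes are
simplified stand-ins, and `find_values()` is a hypothetical helper that
supports only a tiny JSONPath subset (`$`, `.name`, and `[*]`); the real
package may resolve paths quite differently.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Issue:
    """Hypothetical stand-in for the package's `Issue` class."""

    jsonpath: str
    type: str
    message: str


def find_values(properties: dict[str, Any], jsonpath: str) -> list[Any]:
    """Resolve a tiny JSONPath subset: `$`, `.name`, and `[*]`."""
    current: list[Any] = [properties]
    # Turn "$.a[*].b" into the parts ["a", "*", "b"].
    parts = jsonpath.removeprefix("$").replace("[*]", ".*").split(".")
    for part in parts:
        if not part:
            continue
        matched: list[Any] = []
        for value in current:
            if part == "*" and isinstance(value, list):
                matched.extend(value)
            elif isinstance(value, dict) and part in value:
                matched.append(value[part])
        current = matched
    return current


@dataclass(frozen=True)
class CustomCheck:
    """Simplified stand-in for the package's `CustomCheck` class."""

    jsonpath: str
    message: str
    check: Callable[[Any], bool]
    type: str = "custom"

    def apply(self, properties: dict[str, Any]) -> list[Issue]:
        # Report one issue per matched value that fails the check.
        return [
            Issue(jsonpath=self.jsonpath, type=self.type, message=self.message)
            for value in find_values(properties, self.jsonpath)
            if not self.check(value)
        ]


only_mit = CustomCheck(
    type="only-mit",
    jsonpath="$.licenses[*].name",
    message="Data Packages may only be licensed under MIT.",
    check=lambda license_name: license_name == "mit",
)
properties = {"licenses": [{"name": "mit"}, {"name": "apache-2.0"}]}
issues = only_mit.apply(properties)
```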

### {{< var done >}} `Issue`
@@ -159,20 +220,61 @@ Package.
See the help documentation with
[`help(Issue)`](/docs/reference/Issue.qmd) for more details.

## {{< var planned >}} Configuration file

When we develop the CLI, we'll use a config file to store the settings
contained within the `Config` class. This file will be named `.cdp.toml`
and will be located in the same directory as the `datapackage.json`
file. This is an example of what that file could look like:

``` toml
# The Data Package standard version to check against.
version = "v2"
Member

Do we want this to be a string and not, e.g., a float 2.0? At the least, we should include the .0, since we know there's a v2.1 on the way.

Member Author

We could provide a fixed list of allowed values, like ["v2", "v2.0", "v1", "v2.1"], to allow shorthands such as v2.

Member

That's true, but I still don't like the ambiguity of "v2" 🤔 Like would that refer to "v2.0" or "v2.1"?

Member Author

Well, it's just that their documentation refers to v2: https://datapackage.org. The only place v2.0 is used is in the Changelog. But everywhere else, it is v2. I'd rather keep aligned with what they use in their language. But I'm also not super strongly opinionated on this.

Member

@lwjohnst86 That's true; they tend to refer to v2 on their website. They do, however, use 2.0 in their profiles, e.g., https://datapackage.org/profiles/2.0/datapackage.json.


# Whether to check properties that must *and should* be included.
strict = true

# Exclude all issues related to the "resources" property.
[[exclusions]]
jsonpath = "$.resources"

# Exclude all issues related to the "format" type in the schema.
[[exclusions]]
type = "format"

# Exclude issues that are both a "pattern" type and found in
# the "path" property of the "contributors" field.
[[exclusions]]
jsonpath = "$.contributors[*].path"
type = "pattern"

# Require that the "description" property is included in the Data Package.
[[extensions.required_checks]]
jsonpath = "$.description"
message = "This Data Package needs to include a 'description' property."

# A custom check to ensure that all resource names are lowercase.
[[extensions.custom_checks]]
jsonpath = "$.resources[*].name"
type = "name-lowercase"
message = "The value in the 'name' property of the 'resources' must be lowercase."
check = "lambda name: name.islower()"
```

## Flow

This is the potential flow of using `check-datapackage`:

```{mermaid}
%%| label: fig-interface-flow
%%| fig-cap: "Flow of functions and classes when using `check-datapackage`."
%%| fig-alt: "A flowchart showing the flow of using `check-datapackage`, starting with reading the datapackage.json and .cdp.yaml files, then checking the descriptor with the config, and finally explaining any issues found."
%%| fig-alt: "A flowchart showing the flow of using `check-datapackage`, starting with reading the `datapackage.json` and `.cdp.toml` files, then checking the descriptor with the config, and finally explaining any issues found."
flowchart TD
descriptor_file[(datapackage.json)]
read_json["read_json()"]
descriptor[/"Descriptor<br>(dict)"/]

config_file[(.cdp.yaml)]
config_file[(.cdp.toml)]
read_config["read_config()"]

config[/Config/]