Defining validation rules
On this page we explain the structure of rules. For a step-by-step guide to creating new rules, see the training.
Rule grouping
The validation of IDS data is done by applying rules to IDS structures. In this section we explain how rules are grouped together:
- Rule
A Rule is a set of checks that logically belong together. Rules are defined in a Python function and applied to one IDS. See the section Rule definition for more information on rules.
- Rule file
Multiple rules can be put in a rule file to group them together. This can be useful to, for example, group all checks belonging to a certain IDS together.
Rules could also be grouped based on functionality instead of IDS. This is completely up to the rule developer.
- Rule set
Multiple rule files in one folder make up a rule set. Rule sets can be selected by users when they want to validate a data entry. They are useful to group checks for, for example, specific scenarios (e.g. checking ITER machine limits) or specific workflows (e.g. checking constraints expected by the H&CD workflow).
Rules that check the validity of the data according to the generic Data Dictionary definition are grouped in a special rule set called generic. This generic rule set is built into the IMAS-Validator and enabled by default. You can disable the generic tests by supplying the --no-generic flag to the Command Line Interface.
Ruleset folder structure
See the code block below for an example directory structure. We can interpret the folder structure as follows:
- Directories containing rule sets (rule_dir, rule_dir_custom). The IMAS-Validator can find the rule sets in these directories through the Command Line Interface argument --extra-rule-dirs /path/to/rule_dir /path/to/rule_dir_custom or by setting the environment variable RULESET_PATH=/path/to/rule_dir:/path/to/rule_dir_custom.
- Inside these directories we find the rule sets: Diagnostics, ITER-MD, ECR, MyCustomRules. They can be enabled for validation with the Command Line Interface argument --ruleset <ruleset_name>.
- Inside the rule set directories we find the validation rule files containing validation functions.
├── rule_dir
|   ├── Diagnostics
|   |   ├── common_ids.py
|   |   └── equilibrium.py
|   └── ITER-MD
|       ├── common_ids.py
|       └── core_profiles.py
└── rule_dir_custom
    ├── ECR
    |   ├── common_ids.py
    |   └── core_profiles.py
    └── MyCustomRules
        ├── common_ids.py
        └── equilibrium.py
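To make the interpretation of this layout concrete, here is a minimal sketch that walks a RULESET_PATH-style search path and lists the rule sets and rule files it finds. It is not the IMAS-Validator's own discovery code: the function discover_rule_sets is a hypothetical name, and splitting the path on os.pathsep is an assumption based on the ":" separator shown above.

```python
import os
import tempfile
from pathlib import Path


def discover_rule_sets(ruleset_path):
    """Map each rule set name to the sorted rule file names it contains.

    Hypothetical helper: `ruleset_path` is assumed to be a list of
    directories joined by os.pathsep, like RULESET_PATH in the text.
    """
    found = {}
    for rule_dir in filter(None, ruleset_path.split(os.pathsep)):
        for rule_set in sorted(Path(rule_dir).iterdir()):
            if rule_set.is_dir():
                files = sorted(p.name for p in rule_set.glob("*.py"))
                found.setdefault(rule_set.name, []).extend(files)
    return found


# Recreate the example directory structure in a temporary folder.
layout = {
    "rule_dir/Diagnostics": ["common_ids.py", "equilibrium.py"],
    "rule_dir/ITER-MD": ["common_ids.py", "core_profiles.py"],
    "rule_dir_custom/ECR": ["common_ids.py", "core_profiles.py"],
    "rule_dir_custom/MyCustomRules": ["common_ids.py", "equilibrium.py"],
}
with tempfile.TemporaryDirectory() as tmp:
    for folder, files in layout.items():
        target = Path(tmp, folder)
        target.mkdir(parents=True)
        for name in files:
            (target / name).touch()
    search_path = os.pathsep.join(
        str(Path(tmp, d)) for d in ("rule_dir", "rule_dir_custom")
    )
    rule_sets = discover_rule_sets(search_path)

for name, files in sorted(rule_sets.items()):
    print(name, files)
```

Each key of rule_sets corresponds to a name that could be passed to --ruleset, and each value lists the rule files inside that rule set.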
Note
Make sure that anyone working with the tests or using the explore tool can properly understand your validation rules. Add docstrings explaining what kind of tests you have written.
Docstrings are surrounded by triple quotes ("""my explanation docstring""") and are defined:
- At rule level (under the function definition)
- At file level (top of the validation rule file)
- At rule set level (top of the __init__.py file in the rule set folder)
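As an illustration of where the file-level and rule-level docstrings sit, the sketch below parses a small rule file with the standard ast module and reads both back. This is only a way to inspect the docstrings, not a description of how the IMAS-Validator or the explore tool collects them; the rule validate_has_comment and the file contents are made up for the example. A rule set docstring would be placed the same way at the top of the __init__.py file.

```python
import ast

# A made-up rule file. It is only parsed, never executed, so the
# @validator name does not need to be defined here.
RULE_FILE_SOURCE = '''
"""Checks that apply to every IDS."""


@validator("*")
def validate_has_comment(ids):
    """Validate that ids_properties/comment is filled."""
    assert ids.ids_properties.comment != ""
'''

tree = ast.parse(RULE_FILE_SOURCE)

# File-level docstring: the first statement of the module.
file_doc = ast.get_docstring(tree)

# Rule-level docstrings: the first statement of each function body.
rule_docs = {
    node.name: ast.get_docstring(node)
    for node in tree.body
    if isinstance(node, ast.FunctionDef)
}

print(file_doc)
print(rule_docs)
```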
Rule definition
Validation rules are defined inside the Python files as follows:
- An @validator decorator indicates which IDSs (and optionally which occurrences) to apply the validator function to. This is done like @validator('summary'), @validator('summary:0') or @validator('summary:0', 'equilibrium:0'). More details on this decorator can be found in the API documentation: @validator.
- The @validator decorator is followed by a Python function definition: def <rule_name>(arguments...):. This sets the name of the rule, which should be unique.
  Note
  The full name of the rule is <rule_set>/<rule_file>/<rule_name>. For example, a rule validate_ids_plugins_metadata in the rule file common_ids.py in the rule set MyCustomRules will be called MyCustomRules/common_ids.py/validate_ids_plugins_metadata.
- A Python docstring describing what the rule checks. This description is available to users when any of the assertions in the rule fail. Therefore, it should give users an indication of what is being checked and how to fix any failing checks.
  The docstring starts and ends with three double quotes ("""). See the examples below.
- The checks are written in the function body of the rule. Use assert statements to check criteria. Several helper methods are available for common types of checks.
  You can write an assertion as follows: assert <check>[, "optional message"], see the examples below. When the check evaluates to False, this is reported as a failed validation. You can provide an optional message to clarify this specific check: this is recommended when the check itself is a complex expression and/or not immediately clear to users.
  Important
  In contrast to regular Python assert statements, the validation rule continues to be evaluated after a failed assert. This makes it possible to catch multiple validation failures in a single rule, instead of stopping after the first. It may, however, be surprising to regular Python developers:
  Rules continue evaluation after a failed assert
      @validator("core_profiles")
      def validate_profiles_1d(cp):
          assert len(cp.profiles_1d) > 0
          # In regular Python, we don't reach this line when profiles_1d is empty.
          # However, this is a validation rule and we could get an IndexError
          # because evaluation continues even when len(cp.profiles_1d) == 0
          first_profiles = cp.profiles_1d[0]
          ...
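One way to picture the "continue after a failed assert" behaviour is an AST rewrite that turns every assert into a call that records the failure and carries on. The sketch below is an illustration only and is not taken from the IMAS-Validator source; AssertRecorder, run_rule and _record are hypothetical names introduced for this example.

```python
import ast


class AssertRecorder(ast.NodeTransformer):
    """Rewrite `assert test, msg` into the call `_record(test, msg)`."""

    def visit_Assert(self, node):
        msg = node.msg if node.msg is not None else ast.Constant(value=None)
        call = ast.Call(
            func=ast.Name(id="_record", ctx=ast.Load()),
            args=[node.test, msg],
            keywords=[],
        )
        return ast.copy_location(ast.Expr(value=call), node)


def run_rule(source, func_name, *args):
    """Run a rule from source text and return the list of failed checks."""
    failures = []

    def _record(ok, message=None):
        # Unlike a real assert, a failed check is stored and execution goes on.
        if not ok:
            failures.append(message or "assertion failed")

    tree = AssertRecorder().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    namespace = {"_record": _record}
    exec(compile(tree, "<rule>", "exec"), namespace)
    namespace[func_name](*args)
    return failures


RULE_SOURCE = '''
def validate_values(values):
    assert len(values) > 0, "values is empty"
    assert sum(values) == 1, "values do not sum to 1"
'''

failures = run_rule(RULE_SOURCE, "validate_values", [])
print(failures)
```

Both checks are reported for the empty list, because the second assert still runs after the first one fails. In your own rules, guard any statement that depends on an earlier check (for example with an if block or an early return) so that it does not raise an unrelated exception.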
Attention
The @validator decorator and all helper methods are automatically available in rule files. You
should not try to import them manually from the imas_validator package.
Your IDE might complain about undefined variables, but you can safely ignore that.
"""This validation rule file shows example cases of how to define IDS validation rules"""

@validator("*")
def validate_ids_plugins_metadata(ids):
    """Validate mandatory attributes in the ids_properties.plugins."""
    plugins = ids.ids_properties.plugins
    for node in plugins.node:
        assert node.path != ""
        for name in node.put_operation:
            assert name != ""
        # etc.


@validator("gyrokinetics_local")
def validate_gyrokinetics_electron_definition(gk):
    """Validate that there is an electron species in the species AoS."""
    for species in gk.species:
        if species.charge_norm != -1:
            continue
        assert species.mass_norm == 2.724437108e-4
        assert species.temperature_norm == 1.0
        assert species.density_norm == 1.0
        break
    else:
        assert False, "No electron species found"


@validator("core_profiles")
def validate_ion_charge(cp, version=">=3.38.0, <4.0.0"):
    """Validate that profiles_1d/ion/z_ion is defined."""
    for p1d in cp.profiles_1d:
        for ion in p1d.ion:
            assert ion.z_ion.has_value


@validator("equilibrium:0")
def validate_has_comment(eq):
    """Validate that first occurrence of equilibrium has a comment."""
    assert eq.ids_properties.comment != ''
Note
The dd_version formatting follows the version specifiers of the packaging module. If a specific version number is required, it is formatted as "==3.38.1".
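To get a feel for these specifier strings, the short sketch below evaluates them with the packaging module directly (it requires the third-party packaging package to be installed). It only demonstrates the specifier syntax; how the IMAS-Validator applies the specifier internally is not shown here, and the variable names are chosen for this example.

```python
from packaging.specifiers import SpecifierSet

# The range used in the validate_ion_charge example above.
dd_range = SpecifierSet(">=3.38.0, <4.0.0")
# A specifier that matches exactly one version.
exact = SpecifierSet("==3.38.1")

in_range = dd_range.contains("3.39.0")    # inside the range
too_new = dd_range.contains("4.0.0")      # excluded by the upper bound
exact_match = exact.contains("3.38.1")    # the pinned version
exact_miss = exact.contains("3.38.0")     # any other version

print(in_range, too_new, exact_match, exact_miss)
```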
It is also possible to write rules that cross-validate multiple IDSs.
This is done by specifying all the necessary IDS names in the @validator decorator.
While specifying the occurrence number in the @validator decorator is optional
for single IDS validation, it is mandatory for multi-IDS validation.
@validator("summary:0", "core_profiles:0")
def cross_validate_summary_and_core_profiles(summary, core_profiles):
    """
    Validate that quantities defined in both
    summary and core_profiles are in agreement.
    """
    assert Approx(summary.time, core_profiles.time)
    assert Approx(
        summary.global_quantities.ip.value,
        core_profiles.global_quantities.ip
    )
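The "name:occurrence" notation and the rule that multi-IDS validators must name an occurrence for every IDS can be sketched as follows. The helpers parse_ids_spec and check_validator_args are hypothetical and written only for this illustration; they are not part of the imas_validator package, and the real decorator may handle these cases differently.

```python
def parse_ids_spec(spec):
    """Split 'summary:0' into ('summary', 0); no occurrence gives None."""
    name, sep, occurrence = spec.partition(":")
    return name, (int(occurrence) if sep else None)


def check_validator_args(*specs):
    """Parse all specs and enforce occurrences for multi-IDS rules."""
    parsed = [parse_ids_spec(spec) for spec in specs]
    if len(parsed) > 1 and any(occ is None for _, occ in parsed):
        raise ValueError("multi-IDS rules must name an occurrence for every IDS")
    return parsed


single = check_validator_args("summary")
multi = check_validator_args("summary:0", "core_profiles:0")

try:
    check_validator_args("summary", "core_profiles:0")
    rejected = False
except ValueError:
    rejected = True

print(single, multi, rejected)
```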