Generate JSON Schema from Data Models
JSONSchema Components and How to Set from Data Model
This document serves as a guide on what features in a CSV data model map to which components in a JSONSchema file. All examples of JSONSchema files were taken from this example data model. For documentation on how to generate a JSONSchema file, see the cli documentation.
Property Keys
Property keys are taken from the Attribute names in the data model. Class labels can be used as well as display names, provided they do not contain any characters blacklisted by synapse: "(", ")", ".", " ", "-"
.
Description
The JSONSchema description is taken from the data model’s description field. If the data model does not have a description, the description will be set to TBD
.
An attribute with a description:
{"$id": "http://example.com/MockComponent_validation",
"$schema": "http://json-schema.org/draft-07/schema#",
"description": "Component to hold mock attributes for testing all validation rules",
"properties": {...}
}
An attribute without a description:
"CheckAges": {
"description": "TBD",
"not": {
"type": "null"
},
"title": "Check Ages"
},
Type
Multiple types can be enforced in JSONSchema. Currently allowed types are in the table below. Types can be set explicitly in the data model or inferred from an attribute’s specified validation rules.
Type |
---|
|
|
|
|
Explicit Type Setting
To explicitly set the type of a property in the JSONSchema, add the olumnType
column to the data model and specify one of the allowed types in that column for the appropriate attribute’s row. It is acceptable to leave rows blank for attributes that you do not wish to specify type for.
Implicit Type Inference
Types are also inferred from the validation rules set for an attribute. The following validation rules map to the indicated JSONSchema types:
Validation Rule |
JSONSchema Type |
---|---|
list |
array |
regex module |
string |
float |
number |
int |
integer |
num |
number |
string |
string |
inRange |
integer or number |
date |
string |
datetime |
format: date |
URL |
format: uri |
An attribute with a specified type:
"CheckRange": {
"description": "TBD",
"maximum": 100.0,
"minimum": 50.0,
"title": "Check Range",
"type": "number"
}
An attribute without a specified type:
"YearofBirth": {
"description": "TBD",
"title": "Year of Birth"
}
Validation Checks
Certain validation checks from schematic are also present in JSONSchema validation.
Type Checks
Types discussed above are enforced in JSONSchema validation. For more information about these rules see the documentation for type rules.
Valid Values
If an attribute has valid values specified, the JSONSchema validation will enforce that provided values are one of the valid values specified.
This will show up in the JSONSchema as an enum
key with a list of valid values.
An attribute with valid values specified:
"FileFormat": {
"description": "TBD",
"oneOf": [
{
"enum": [
"BAM",
"CRAM",
"CSV/TSV",
"FASTQ"
],
"title": "enum"
}
],
"title": "File Format"
},
An attribute with valid values specified along with the list
rule:
"CheckListEnum": {
"description": "TBD",
"oneOf": [
{
"items": {
"enum": [
"ab",
"cd",
"ef",
"gh"
]
},
"title": "array",
"type": "array"
}
],
"title": "Check List Enum"
}
Required Attributes
For required attributes, the JSONSchema will have an additional not: {"type": "null"}
key value pair added to the property.
A required attribute:
"CheckDate": {
"description": "TBD",
"not": {
"type": "null"
},
"title": "Check Date"
}
Validation Rules
inRange
Aside from the type validation checks, the inRange
rule will also be translated to the JSONSchema if provided for an attribute. The attribute must be a number
type, and the maximum
and minimum
keys will be added to the JSONSchema for the property, with the values taken from the range specified in the data model.
An attribute with an inRange
validation rule:
"CheckRange": {
"description": "TBD",
"maximum": 100.0,
"minimum": 50.0,
"title": "Check Range",
"type": "number"
}
For more information about the inRange
rule see the rule documentation.
regex
module
If the regex
module is specified for an attribute, the JSONSchema will include a pattern
keyword with the value being the regex string provided in the data model. Note that in cases where regex match
is the specified rule, the character ^
will be automatically pre-prended to the regex string, which enables the match
functionality on the backend. This caret does not need to be added within the data model to enable this functionality.
For example, an attribute with a regex
rule regex search [a-f]
specified will yield a property like:
"CheckRegexSingle": {
"description": "TBD",
"pattern": "[a-f]",
"type": "string",
"title": "Check Regex Single"
},
While an attribute with a regex
rule regex match [a-f]
specified will yield a property like:
"CheckRegexFormat": {
"description": "TBD",
"pattern": "^[a-f]",
"type": "string",
"title": "Check Regex Format"
}
For more information about the regex
module rule see the rule documentation.
date
If the date
validation rule is specified for an attribute, the JSONSchema will include a format: date
key value pair.
An attribute with a date
validation rule specified:
"CheckDate": {
"description": "TBD",
"type": "string",
"format": "date",
"title": "Check Date"
}
For more information about the date
rule see the rule documentation.
URL
If the URL
validation rule is specified for an attribute, the JSONSchema will include a format: uri
key value pair.
An attribute with a URL
validation rule specified:
"CheckURL": {
"description": "TBD",
"type": "string",
"format": "uri",
"title": "Check URL"
}
For more information about the URL
rule see the rule documentation.
Conditional Dependencies
Conditional properties will be added to the JSONSchema if present in the data model. The conditional formatting will look like a series of "if": {}, "then": {}
key dictionary pairs, in addition to the regular attribute dictionaries.
An example of a data type with conditional dependencies:
{
"$id": "http://example.com/BulkRNA-seqAssay_validation",
"$schema": "http://json-schema.org/draft-07/schema#",
"allOf": [
{
"if": {
"properties": {
"FileFormat": {
"enum": [
"BAM"
]
}
}
},
"then": {
"properties": {
"GenomeBuild": {
"not": {
"type": "null"
}
}
},
"required": [
"GenomeBuild"
]
}
},
{
"if": {
"properties": {
"FileFormat": {
"enum": [
"CRAM"
]
}
}
},
"then": {
"properties": {
"GenomeBuild": {
"not": {
"type": "null"
}
}
},
"required": [
"GenomeBuild"
]
}
},
{
"if": {
"properties": {
"FileFormat": {
"enum": [
"CSV/TSV"
]
}
}
},
"then": {
"properties": {
"GenomeBuild": {
"not": {
"type": "null"
}
}
},
"required": [
"GenomeBuild"
]
}
},
{
"if": {
"properties": {
"FileFormat": {
"enum": [
"CRAM"
]
}
}
},
"then": {
"properties": {
"GenomeFASTA": {
"not": {
"type": "null"
}
}
},
"required": [
"GenomeFASTA"
]
}
}
],
"description": "TBD",
"properties": {
"Component": {
"description": "TBD",
"not": {
"type": "null"
},
"title": "Component"
},
"FileFormat": {
"description": "TBD",
"oneOf": [
{
"enum": [
"BAM",
"CRAM",
"CSV/TSV",
"FASTQ"
],
"title": "enum"
}
],
"title": "File Format"
},
"Filename": {
"description": "TBD",
"not": {
"type": "null"
},
"title": "Filename"
},
"GenomeBuild": {
"description": "TBD",
"oneOf": [
{
"enum": [
"GRCh37",
"GRCh38",
"GRCm38",
"GRCm39"
],
"title": "enum"
},
{
"title": "null",
"type": "null"
}
],
"title": "Genome Build"
},
"GenomeFASTA": {
"description": "TBD",
"title": "Genome FASTA"
},
"SampleID": {
"description": "TBD",
"not": {
"type": "null"
},
"title": "Sample ID"
}
},
"required": [
"Component",
"FileFormat",
"Filename",
"SampleID"
],
"title": "BulkRNA-seqAssay_validation",
"type": "object"
}