Troubleshooting
These are some common issues you may encounter when using schematic.
Debugging
Whether you are using DCA, the schematic API, or the schematic library/CLI, work through the following steps to debug your issue.
What was the command that caused the error?
Is the error listed down below?
Did you follow the workflow outlined in the tutorials section under: “Contributing your manifest with the CLI”?
If you are validating or submitting the manifest, how was the manifest initially generated? If it was created manually and NOT with schematic, it may contain errors.
If the manifest was generated by schematic, when was it generated? Did you download the previously submitted manifest from Synapse and modify it? Did you download it and resubmit it? Please run the manifest generate command again to have a fresh manifest.
Does your data model have any “reserved words” as attribute names?
The following are reserved words that should not be used as attribute names in your data model:
- id (any case variation):
When submitting a manifest, schematic automatically adds the “Id” column to the manifest. If you have “id” (any case variation) in your model, it could potentially cause a problem or confusion.
Avoid mapping the “display name” of “Id” to “id” (lowercase), as “id” is a reserved internal key for Synapse and cannot be used as an annotation key.
- entityId (any case variation):
The entityId column in a manifest refers to the Synapse ID of the file that a particular row of metadata describes.
It ensures your metadata is attached to the correct file in Synapse.
When generating a manifest, schematic automatically adds the entityId column to the manifest to ensure that the metadata is attached to the correct file in Synapse during the submission step.
When submitting or updating metadata, schematic uses entityId to know where the annotations should go.
- eTag (any case variation):
The eTag is a version identifier for a file in Synapse. It helps ensure that metadata is being applied to the correct version of an entity.
When submitting or updating metadata, schematic automatically adds the eTag column to the manifest.
Please also note that the following are reserved words for Synapse table columns. Any variations of the following would cause a conflict with Synapse table columns:
- ROW_ID (any case variation):
row_id
RowID
ROW ID (contains a space)
` row_id ` (contains leading/trailing spaces)
- ROW_VERSION (any case variation):
row_version
RowVersion
ROW VERSION (contains a space)
` row_version ` (contains leading/trailing spaces)
- ROW_ETAG (any case variation):
row_etag
RowETag
ROW ETAG (contains a space)
` row_etag ` (contains leading/trailing spaces)
- ROW_BENEFACTOR (any case variation):
row_benefactor
RowBenefactor
ROW BENEFACTOR (contains a space)
` row_benefactor ` (contains leading/trailing spaces)
- ROW_SEARCH_CONTENT (any case variation):
row_search_content
RowSearchContent
ROW SEARCH CONTENT (contains spaces)
` row_search_content ` (contains leading/trailing spaces)
- ROW_HASH_CODE (any case variation):
row_hash_code
RowHashCode
ROW HASH CODE (contains spaces)
` row_hash_code ` (contains leading/trailing spaces)
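The reserved-word rules above can be checked before a data model is published. The following is a minimal sketch, not part of schematic: the function names `normalize` and `find_reserved` are hypothetical, and the normalization (lowercasing and dropping whitespace and underscores) is an assumption based on the variations listed above.

```python
import re

# Reserved by schematic itself; the text above says any case variation collides.
SCHEMATIC_RESERVED = {"id", "entityid", "etag"}

# Reserved Synapse table columns, stored in normalized form
# (lowercase, with whitespace and underscores removed).
SYNAPSE_TABLE_RESERVED = {
    "rowid",
    "rowversion",
    "rowetag",
    "rowbenefactor",
    "rowsearchcontent",
    "rowhashcode",
}


def normalize(name: str) -> str:
    """Lowercase a name and drop whitespace and underscores."""
    return re.sub(r"[\s_]+", "", name).lower()


def find_reserved(attribute_names):
    """Return the attribute names that collide with a reserved word."""
    flagged = []
    for name in attribute_names:
        if name.lower() in SCHEMATIC_RESERVED:
            flagged.append(name)
        elif normalize(name) in SYNAPSE_TABLE_RESERVED:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    names = ["Filename", "ID", "RowID", " row_etag ", "ROW SEARCH CONTENT", "PatientID"]
    for name in find_reserved(names):
        print(f"reserved: {name!r}")
```

Running the script flags "ID", "RowID", " row_etag " and "ROW SEARCH CONTENT", and leaves "Filename" and "PatientID" alone.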
The following are reserved words for Synapse annotations. Variations of these words could potentially work (except id and etag which are already reserved words for schematic), but it is recommended to avoid them altogether. For more details, refer to the Synapse REST API documentation: EntityView.
name: The name of this entity. Must be 256 characters or less. Names may only contain: letters, numbers, spaces, underscores, hyphens, periods, plus signs, apostrophes, and parentheses.
description: The description of this entity. Must be 1000 characters or less.
id: The unique immutable ID for this entity. A new ID will be generated for new Entities. Once issued, this ID is guaranteed to never change or be re-issued.
etag: Synapse employs an Optimistic Concurrency Control (OCC) scheme to handle concurrent updates. Since the E-Tag changes every time an entity is updated it is used to detect when a client’s current representation of an entity is out-of-date.
createdOn: The timestamp when the entity was created.
modifiedOn: The timestamp when the entity was last modified.
createdBy: The ID of the user who created this entity.
modifiedBy: The ID of the user who last modified this entity.
parentId: The ID of the Entity that is the parent of this Entity.
concreteType: Indicates which implementation of Entity this object represents. The value is the fully qualified class name, e.g., org.sagebionetworks.repo.model.FileEntity.
versionNumber: The version number issued to this version of the object.
versionLabel: The version label for this entity.
versionComment: The version comment for this entity.
isLatestVersion: A boolean indicating if this is the latest version of the object.
columnIds: An array of ColumnModel IDs that define the schema of the object.
isSearchEnabled: A boolean specifying if full-text search is enabled. Note that enabling full-text search might slow down the indexing of the table or view.
viewTypeMask: A bitmask representing the types to include in the view.
type: Deprecated. Use viewTypeMask instead.
scopeIds: The list of IDs defining the scope of the view.
The following also have special meaning to schematic. Misusing these terms in your data model could lead to errors or unexpected behavior. Please read carefully before using them in your data model:
- Filename:
For data types that are stored in data files, the attribute Filename is used to denote the file name of each file in a dataset. If Filename is not included in the data type schema attributes, schematic interprets the data type as “tabular” (e.g., clinical, biospecimen data).
- Component:
The Component field in schematic is used to define higher-level groupings of attributes. - For example, a Patient might be described by components such as Demographics, Family History, Diagnosis, and Therapy, each with its own set of attributes and corresponding manifest. - Schematic allows declaration of “components” and relationships between components. - Schematic also enables validation and tracking of components across related entities (e.g., ensuring that all parts of a Patient record are present).
Create a GitHub issue or reach out to your respective DCC service desk. What is the schematic or DCA configuration used? Specifically, it’s most important to capture the following:
data_type: This is the same as Component in the data model.
master_fileview_id: This is the Synapse ID of the file view listing all project data.
data model url: This is the link to your data model.
dataset_id: This is the “top level folder” (folder annotated with contentType: Dataset).
What is the command or API call that you made? If you are using DCA, please provide the step at which you encountered the error (manifest generate, validate, submit, etc.).
schematic manifest -c /path/to/config.yml get -dt <your data type> -s

# OR (PLEASE REDACT YOUR BEARER TOKEN)
curl -X 'GET' \
  'https://schematic.api.sagebionetworks.org/v1/manifest/generate?schema_url=https%3A%2F%2Fraw.githubusercontent.com%2Fnf-osi%2Fnf-metadata-dictionary%2Fv9.8.0%2FNF.jsonld&title=Example&data_type=EpigeneticsAssayTemplate&use_annotations=true&dataset_id=syn63305821&asset_view=syn16858331&output_format=google_sheet&strict_validation=true&data_model_labels=class_label' \
  -H 'accept: application/json' ...
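Because the API call has to be shared with the bearer token removed, it can help to redact the token automatically before pasting the command into an issue. The following is a minimal sketch: `redact_bearer` is a hypothetical helper, and it assumes the token is passed in an `Authorization: Bearer ...` header.

```python
import re

# Matches "Authorization: Bearer <token>"; the header form is an assumption
# about how the token appears in the copied command.
_BEARER = re.compile(
    r"(Authorization:\s*Bearer\s+)[A-Za-z0-9._~+/=-]+",
    re.IGNORECASE,
)


def redact_bearer(command: str) -> str:
    """Replace any bearer token in a command string with a placeholder."""
    return _BEARER.sub(r"\1<REDACTED>", command)


if __name__ == "__main__":
    example = "curl -X 'GET' 'https://example.org' -H 'Authorization: Bearer abc.DEF-123'"
    print(redact_bearer(example))
```

Text without an Authorization header is returned unchanged, so the helper is safe to run on any command before sharing it.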
Manifest Submit: RuntimeError: failed with SynapseHTTPError(‘400 Client Error: nan is not a valid Synapse ID.’)
As of version 24.10.2 of schematic, the Filename column must contain the full path to each file on Synapse, including the project name. You will encounter this issue if you try to submit a manifest with wrong filenames. For example, if a file in your project has the full path my_project/my_folder/my_file.txt, you will get this error if the Filename value:
does not contain the full path (e.g. my_file.txt)
has the wrong filename (e.g. my_project/my_folder/wrong_file_name.txt)
has the wrong file path (e.g. my_project/wrong_folder/my_file.txt)
This is because we join the Filename column together with what’s in Synapse to append the entityId column if it’s missing.
To fix: You will want to first check if your “Top Level Folder” has a manifest with invalid Filename values in the column. If so, please generate a manifest with schematic which should fix the Filenames OR (the less preferred solution) manually update the Filenames to include the full path to the file and manually upload.
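The Filename join described above can be illustrated with a small lookup. This is a sketch only, not schematic's implementation: `attach_entity_ids` is a hypothetical function, and `synapse_paths` stands in for the listing of full file paths and Synapse IDs that exists in Synapse. The Synapse ID in the example is a placeholder.

```python
def attach_entity_ids(manifest_rows, synapse_paths):
    """Look up an entityId for each manifest row by its Filename.

    manifest_rows: list of dicts, each with a "Filename" key.
    synapse_paths: dict mapping full Synapse path -> Synapse ID
                   (hypothetical stand-in for the files found in Synapse).
    Returns (rows_with_entity_ids, unmatched_filenames).
    """
    rows_with_ids, unmatched = [], []
    for row in manifest_rows:
        syn_id = synapse_paths.get(row["Filename"])
        if syn_id is None:
            # No match in the join: the entityId stays empty, which is the
            # situation reported as "nan is not a valid Synapse ID".
            unmatched.append(row["Filename"])
        rows_with_ids.append({**row, "entityId": syn_id})
    return rows_with_ids, unmatched


if __name__ == "__main__":
    synapse_paths = {"my_project/my_folder/my_file.txt": "syn123"}
    manifest = [
        {"Filename": "my_project/my_folder/my_file.txt"},
        {"Filename": "my_file.txt"},
        {"Filename": "my_project/wrong_folder/my_file.txt"},
    ]
    _, unmatched = attach_entity_ids(manifest, synapse_paths)
    for name in unmatched:
        print(f"no Synapse match for Filename: {name}")
```

Any Filename reported as unmatched is a candidate cause of the error and should be corrected to the full Synapse path.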
Manifest Submit: TypeError: boolean value of NA is ambiguous
You may encounter this error if your manifest has a Component column but it is empty. This may occur if the manifest in your “Top Level Folder” does not contain this column; in that case, manifest generate creates an empty Component column for you.
To fix: Check if your manifest has an empty Component column. Please fill out this column with the correct Component values and submit the manifest again.
Manifest Submit: AssertionError: input_df lacks Id column.
You may encounter this error if your manifest has an “id” (lower case) column during submission.
To fix: Delete the id (any case variation) and eTag (any case variation) columns from your manifest and submit the manifest again.
Manifest validation: The submitted metadata does not contain all required column(s)
The required columns are determined by the data model, but Component should be a required column even if it’s not set that way in the data model. This is the validation error you may get if you don’t have the Component column.
To fix: Check whether your manifest is missing the Component column or any other required column. Add the Component column (and fill it out) along with any other missing required columns.
Manifest validation: The submitted metadata contains << ‘string’ >> in the Component column, but requested validation for << expected string >>
If the manifest has incorrect Component values, you might get the validation error message above. This is because the Component value is incorrect, and the validation rule uses the “display” value of what’s expected in the Component column. For example, the display name could be “Imaging Assay” but the actual Component name is “ImagingAssayTemplate”.
To fix: Check whether your manifest has invalid Component values and correct them. Using the above example, fill out your Component column with “ImagingAssayTemplate”.
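Several of the submission and validation errors above come down to the manifest's columns, so they can be caught with a quick local check before submitting. The following is a minimal sketch under stated assumptions: `precheck_manifest` is a hypothetical helper, not a schematic command, and it expects rows as dictionaries such as those produced by Python's `csv.DictReader`.

```python
def precheck_manifest(rows, expected_component):
    """Return a list of human-readable problems found in a manifest.

    rows: list of dicts, one per manifest row (e.g. from csv.DictReader).
    expected_component: the Component name from the data model,
                        e.g. "ImagingAssayTemplate" (not the display name).
    """
    problems = []
    columns = list(rows[0].keys()) if rows else []

    # id and eTag columns (any case variation) should be deleted before submit.
    for col in columns:
        if col.lower() in {"id", "etag"}:
            problems.append(f"remove column {col!r} before submitting")

    # Component is required even if the data model does not say so.
    if "Component" not in columns:
        problems.append("missing required column 'Component'")
        return problems

    for i, row in enumerate(rows, start=1):
        value = (row["Component"] or "").strip()
        if not value:
            problems.append(f"row {i}: Component is empty")
        elif value != expected_component:
            problems.append(
                f"row {i}: Component is {value!r}, expected {expected_component!r}"
            )
    return problems


if __name__ == "__main__":
    manifest = [
        {"Filename": "a.txt", "Component": "ImagingAssayTemplate", "id": "1"},
        {"Filename": "b.txt", "Component": "", "id": "2"},
        {"Filename": "c.txt", "Component": "Imaging Assay", "id": "3"},
    ]
    for problem in precheck_manifest(manifest, "ImagingAssayTemplate"):
        print(problem)
```

An empty result means none of these particular issues were found; it does not replace running manifest validation with schematic.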
Manifest Generate: KeyError: entityId
Fixed: v24.12.1
If there is currently a manifest in your “Top Level Folder” on Synapse with incorrect Filename values but with an entityId column, you will be able to run manifest generate to create a new manifest with the new Filenames. However, if this manifest on Synapse does NOT have the entityId column, you will encounter this error.
To fix: You will want to first check if your “Top Level Folder” has a manifest without the entityId column. If so, you can either submit your manifest using schematic OR (the less preferred solution) manually add the entityId column to the manifest on Synapse.
Manifest Generate: ValueError: cannot insert eTag, already exists
Fixed: v24.11.2
If you do NOT have a manifest in your “Top Level Folder” on Synapse, your File entities in this folder are annotated with an ‘eTag’ key, and you try to generate a manifest, it will fail.
To fix: This is fixed in schematic as of v24.11.2. On earlier versions, remove the ‘eTag’ annotation from your files.