Generate a manifest
A manifest is a structured file containing metadata that adheres to a specific data model. This page covers different ways to generate a manifest.
Prerequisites
Before Using the Schematic CLI
Install and Configure Schematic: Ensure you have installed schematic and set up its dependencies. See the Installation section for more details.
Understand Important Concepts: Familiarize yourself with the key concepts outlined on the “Welcome to Schematic’s documentation!” page.
Configuration File: Learn more about each attribute in the configuration file by referring to the relevant documentation.
Using the Schematic API in Production
Visit the Schematic API (Production Environment): https://schematic.api.sagebionetworks.org/v1/ui/#/
This will open the Swagger UI, where you can explore all available API endpoints.
Run help command
Run the following command to list the subcommands available for manifest generation:
schematic manifest -h
Run the following command to list all the options available for manifest generation:
schematic manifest --config path/to/config.yml get -h
Generate an empty manifest
Option 1: Use the CLI
You can generate a manifest by running the following command:
schematic manifest -c /path/to/config.yml get -dt <your_data_type> -s
-c /path/to/config.yml: Specifies the configuration file containing your data model location.
-dt <your_data_type>: Defines the data type for the manifest (e.g., “Patient”, “Biospecimen”).
-s: Generates a manifest as a Google Sheet.
To generate the manifest as an Excel spreadsheet instead, run:
schematic manifest -c /path/to/config.yml get -dt <your_data_type> --output-xlsx <your-output-manifest-path.xlsx>
To generate the manifest as a CSV file, run:
schematic manifest -c /path/to/config.yml get -dt <your_data_type> --output-csv <your-output-manifest-path.csv>
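The three output variants above differ only in the trailing flag, so a script can assemble the command from one helper. The following is a minimal sketch, not part of Schematic itself: the function name build_manifest_command and the file name patient_manifest.csv are hypothetical, and only the flags shown on this page are used.

```python
import shlex


def build_manifest_command(config_path, data_type, output_xlsx=None, output_csv=None):
    """Assemble the argument list for `schematic manifest ... get`.

    Hypothetical helper: it only uses the flags documented on this page.
    """
    args = ["schematic", "manifest", "-c", config_path, "get", "-dt", data_type]
    if output_xlsx is not None:
        args += ["--output-xlsx", output_xlsx]
    elif output_csv is not None:
        args += ["--output-csv", output_csv]
    else:
        # No output file requested: generate a Google Sheet instead.
        args.append("-s")
    return args


cmd = build_manifest_command(
    "/path/to/config.yml", "Patient", output_csv="patient_manifest.csv"
)
print(shlex.join(cmd))
# To execute the command (requires an installed and configured schematic):
# import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems when a path or data type contains spaces.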
Option 2: Use the API
Visit the manifest/generate endpoint.
Click “Try it out” to enable input fields.
Enter the following parameters and execute the request:
- schema_url: The URL of your data model.
- If your data model is hosted on GitHub, the URL should follow this format:
JSON-LD: https://raw.githubusercontent.com/<your-repo-path>/data-model.jsonld
CSV: https://raw.githubusercontent.com/<your-repo-path>/data-model.csv
- data_type: The data type or schema model for your manifest (e.g., “Patient”, “Biospecimen”).
You can specify multiple data types or enter “all manifests” to generate manifests for all available data types.
- output_format: The desired format for the generated manifest. Options include “excel” or “google_sheet”.
This will generate a manifest directly from the API.
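The same request can be prepared outside the Swagger UI. The sketch below only builds the request URL and sends nothing. It assumes that the endpoint lives under the base URL shown in the prerequisites, that it accepts these parameters as query-string values, and that multiple data types are passed by repeating the data_type parameter; confirm all three in the Swagger UI before relying on them. The function name empty_manifest_url is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: the API root matches the Swagger UI address given above.
BASE_URL = "https://schematic.api.sagebionetworks.org/v1"


def empty_manifest_url(schema_url, data_types, output_format="excel"):
    """Build a manifest/generate URL for one or more data types."""
    params = {
        "schema_url": schema_url,
        # Assumption: several data types are sent as repeated parameters.
        "data_type": list(data_types),
        "output_format": output_format,
    }
    # doseq=True expands the list into data_type=...&data_type=...
    return f"{BASE_URL}/manifest/generate?{urlencode(params, doseq=True)}"


url = empty_manifest_url(
    "https://raw.githubusercontent.com/<your-repo-path>/data-model.jsonld",
    ["Patient", "Biospecimen"],
)
# Round-trip the query string to check what would be sent.
query = parse_qs(urlsplit(url).query)
print(query["data_type"])
```

urlencode percent-encodes the data model URL, so it can be embedded safely as a parameter value.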
Generate a manifest using a dataset on Synapse
Option 1: Use the CLI
Note
See the Installation section for details on obtaining Synapse credentials and setting up the Synapse configuration file.
The top-level dataset can be either an empty folder or a folder containing files.
The following is an example of a top-level dataset:
syn12345678/
├── sample1.fastq
├── sample2.fastq
└── sample3.fastq
In this case, use syn12345678 to generate the manifest.
The following is another example of a top-level dataset, this time with subfolders:
syn12345678/
├── subfolder1/
│   ├── sample1.fastq
│   └── sample2.fastq
└── subfolder2/
    ├── sample3.fastq
    └── sample4.fastq
Here, too, use syn12345678 to generate the manifest:
schematic manifest -c /path/to/config.yml get -dt <your_data_type> -s -d <synapse_dataset_id>
-c /path/to/config.yml: Specifies the configuration file containing the data model location and asset view (master_fileview_id).
-dt <your_data_type>: Defines the data type/schema model for the manifest (e.g., “Patient”, “Biospecimen”).
-d <synapse_dataset_id>: Retrieves the existing manifest associated with a specific dataset on Synapse.
Option 2: Use the API
To generate a manifest using the Schematic API, follow these steps:
Visit the manifest/generate endpoint.
Click “Try it out” to enable input fields.
Enter the required parameters and execute the request:
- schema_url: The URL of your data model.
- If your data model is hosted on GitHub, the URL should follow this format:
JSON-LD: https://raw.githubusercontent.com/<your-repo-path>/data-model.jsonld
CSV: https://raw.githubusercontent.com/<your-repo-path>/data-model.csv
- output_format: The desired format for the generated manifest.
Options include “excel” or “google_sheet”.
- data_type: The data type or schema model for your manifest (e.g., “Patient”, “Biospecimen”).
You can specify multiple data types or enter “all manifests” to generate manifests for all available data types.
- dataset_id: The top-level Synapse dataset ID.
This can be a Synapse Project ID or a Folder ID.
- asset_view: The Synapse ID of the fileview containing the top-level dataset for which you want to generate a manifest.
Generate a manifest using a dataset on Synapse and pull annotations
Note
When you pull annotations from Synapse, the existing metadata (annotations) associated with files or folders in a Synapse dataset is automatically retrieved and pre-filled into the generated manifest. This saves time and ensures consistency between the Synapse dataset and the manifest.
For example, consider the following dataset:
syn12345678/
├── file1.txt
├── file2.txt
└── file3.txt
The corresponding annotations might look like this:
file1.txt - Annotation Key: species - Annotation Value: test1
file2.txt - Annotation Key: species - Annotation Value: test2
file3.txt - Annotation Key: species - Annotation Value: test3
When annotation pulling is enabled, the generated manifest includes the annotations above, pulled from Synapse.
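To make the pre-filling concrete, the sketch below turns the example annotations into manifest-like rows, one row per file and one column per annotation key. It illustrates the idea only and is not how Schematic implements it: the function name annotations_to_rows and the Filename column label are assumptions, and a real manifest contains the columns defined by your data model.

```python
import csv
import io

# Annotations from the example above, keyed by file name.
annotations = {
    "file1.txt": {"species": "test1"},
    "file2.txt": {"species": "test2"},
    "file3.txt": {"species": "test3"},
}


def annotations_to_rows(annotations):
    """Return a header and one pre-filled row per annotated file."""
    # Columns are the union of all annotation keys, sorted for stability.
    keys = sorted({key for values in annotations.values() for key in values})
    header = ["Filename"] + keys  # "Filename" is an assumed column label
    rows = [
        [name] + [annotations[name].get(key, "") for key in keys]
        for name in annotations
    ]
    return header, rows


header, rows = annotations_to_rows(annotations)

# Write the rows as CSV text to show what a pre-filled manifest could hold.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Files that lack a given annotation key get an empty cell, which is the part a contributor would still fill in by hand.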
Option 1: Use the CLI
Note
Ensure your Synapse credentials are configured before running the command. You can obtain a personal access token from Synapse by following the instructions here: https://python-docs.synapse.org/tutorials/authentication/#prerequisites
The top-level dataset can be either an empty folder or a folder containing files.
schematic manifest -c /path/to/config.yml get -dt <your_data_type> -s -d <synapse_dataset_id> -a
-c /path/to/config.yml: Specifies the configuration file containing the data model location and asset view (master_fileview_id).
-a: Pulls annotations from Synapse and fills out the manifest with the annotations.
-dt <your_data_type>: Defines the data type/schema model for the manifest (e.g., “Patient”, “Biospecimen”).
-d <synapse_dataset_id>: Retrieves the existing manifest associated with a specific dataset on Synapse.
Option 2: Use the API
To generate a manifest using the Schematic API, follow these steps:
Visit the manifest/generate endpoint.
Click “Try it out” to enable input fields.
Enter the required parameters and execute the request:
- schema_url: The URL of your data model.
- If your data model is hosted on GitHub, the URL should follow this format:
JSON-LD: https://raw.githubusercontent.com/<your-repo-path>/data-model.jsonld
CSV: https://raw.githubusercontent.com/<your-repo-path>/data-model.csv
- output_format: The desired format for the generated manifest.
Options include “excel” or “google_sheet”.
- data_type: The data type or schema model for your manifest (e.g., “Patient”, “Biospecimen”).
You can specify multiple data types or enter “all manifests” to generate manifests for all available data types.
- dataset_id: The top-level Synapse dataset ID.
This can be a Synapse Project ID or a Folder ID.
- asset_view: The Synapse ID of the fileview containing the top-level dataset for which you want to generate a manifest.
- use_annotations: A boolean value that determines whether to pull annotations from Synapse and fill out the manifest with the annotations.
Set this value to true to pull annotations.
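The parameters above can also be combined programmatically. The sketch below builds the request URL without sending it. It assumes the endpoint accepts these values as query-string parameters under the production base URL and that the boolean is written as lowercase true or false; verify both in the Swagger UI. The function name dataset_manifest_url and the asset view ID syn00000000 are hypothetical placeholders.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: the API root matches the production Swagger UI address.
BASE_URL = "https://schematic.api.sagebionetworks.org/v1"


def dataset_manifest_url(schema_url, data_type, dataset_id=None, asset_view=None,
                         use_annotations=False, output_format="excel"):
    """Build a manifest/generate URL for a Synapse dataset."""
    params = {
        "schema_url": schema_url,
        "data_type": data_type,
        "output_format": output_format,
        "dataset_id": dataset_id,
        "asset_view": asset_view,
        # Assumption: booleans are serialized as lowercase "true"/"false".
        "use_annotations": str(bool(use_annotations)).lower(),
    }
    # Drop optional parameters that were not supplied.
    params = {key: value for key, value in params.items() if value is not None}
    return f"{BASE_URL}/manifest/generate?{urlencode(params)}"


url = dataset_manifest_url(
    "https://raw.githubusercontent.com/<your-repo-path>/data-model.csv",
    "Biospecimen",
    dataset_id="syn12345678",
    asset_view="syn00000000",  # hypothetical fileview ID
    use_annotations=True,
)
query = parse_qs(urlsplit(url).query)
print(query["use_annotations"])
```

Omitting dataset_id and asset_view yields the same request as for an empty manifest, so one helper can cover both cases.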