TogoMD: the Togo Metabolome Data Format
The Togo Metabolome Data Format (TogoMD) defines an easy-to-use data format intended to promote the advanced use of metabolomics data. Using this format, we aim to integrate domestic metabolome databases.
Definition of Description Fields
XML Definition File (XSD)
The fields needed to describe metabolome data, from metadata to peak data, have been carefully selected, and their names and descriptions have been defined. This definition is provided as the XML schema below.
URI | https://metabolonote.kazusa-db.jp/TogoMetabolomeDbSchema.xsd |
---|---|
Version | 1.2.0 |
Last modified | Nov. 5, 2014 |
Correspondence of the XML Element/Attribute and Metabolonote Field Name
This section describes each XML element and attribute, and shows how each corresponds to the field name and property name used on Metabolonote pages.
* Peak information (P) is not used in Metabolonote.
Metabolonote ID label or field name | Metabolonote property name | XML element name | XML attribute or subelement name *1 | Value format *2 | Description |
---|---|---|---|---|---|
SE | | sample_set | | | Sample set information. Represents a set of experiments or a data acquisition project. |
ID | SE_ID | | id | /SE\d+/ | Sample set ID, unique within the system. When the data are private, any alphanumeric string can be used as a tentative ID. |
Title | SE_Title | | title | STRING | Short title. |
Description | SE_Description | | description | STRING | Important concepts for interpreting the data, such as the purpose of the experiments and the relationships between samples. |
Authors | SE_Authors | | authors | STRING | Authors. |
Reference | SE_Reference | | reference | STRING | Related reference information. |
Comment | SE_Comment | | comment | TEXT *3 | Comment. |
S | | sample | | | Sample information. Describes how each individual sample was prepared. |
ID | S_ID | | id | /S\d+/ | Sample ID, unique within the sample set (SE). |
Title | S_Title | | title | STRING | Short sample name. |
Organism - Scientific Name | S_Organism - Scientific Name | | organism_scientific_name | STRING | Scientific name of the organism. Required when biological samples are handled. |
Organism - ID | S_Organism - ID | | organism_id | Database Name:ID[\|Database Name:ID]... *4 | Taxonomic classification ID of the organism. |
Compound - ID | S_Compound - ID | | compound_id | Database Name:ID[\|Database Name:ID]... *4 | Compound ID. |
Compound - Source | S_Compound - Source | | compound_source | STRING | Availability of the reagent: the supplier's name and catalog ID. Required when standard compounds are handled. |
Preparation | S_Preparation | | preparation | STRING | Growth methods and conditions, any particular processing, sampled portions, sampling methods, and reagent preparation methods. |
Sample Preparation Details ID | S_Sample Preparation Details ID | | sample_preparation_details_id | /SS\d+/ | ID of the sample preparation details (SS) applied. |
Comment | S_Comment | | comment | TEXT *3 | Comment. |
M | | analytical_method | | | Analytical method information. Describes the instrumental analysis method used for each sample. |
ID | M_ID | | id | /M\d+/ | Analytical method ID, unique within the sample (S). |
Title | M_Title | | title | STRING | Short title. |
Method Set ID | M_Method Set ID | | analytical_method_details_id | /MS\d+/ | ID of the analytical method details (MS) applied. |
Sample Amount | M_Sample Amount | | sample_amount | STRING | Amount of sample used. Needed to normalize quantitative data for comparison with other samples. |
Comment | M_Comment | | comment | TEXT *3 | Comment. |
D | | data_analysis | | | Data analysis information. Describes computational data analysis methods, such as peak extraction. |
ID | D_ID | | id | /D\d+/ | Data analysis ID, unique within the analytical method (M). |
Title | D_Title | | title | STRING | Short title. |
Data Analysis Set ID | D_Data Analysis Set ID | | data_analysis_details_id | /DS\d+/ | ID of the data analysis details (DS) applied. |
Recommended decimal places of m/z | D_Recommended decimal places of m/z | | recommended_decimal_places_of_mass | {default OR INT}{[\|peak INT] OR [\|Instrument X INT]}... *5 | Recommended number of decimal places for m/z values. |
Comment | D_Comment | | comment | TEXT *3 | Comment. |
SS | | sample_preparation_details | | | Sample preparation details. Shared within the sample set. |
ID | SS_ID | | id | /SS\d+/ | Sample preparation details ID, unique within the sample set (SE). |
Title | SS_Title | | title | STRING | Short title. |
Description | SS_Description | | description | STRING | Details of sample preparation; for biological samples, for example, growth conditions and drug treatments. Descriptions that depend on the analytical method should not be given here but in the analytical method details (MS). |
Comment_of_details | SS_Comment of details | | comment_of_details | TEXT *3 | Comment. |
MS | | analytical_method_details | | | Analytical method details. Shared within the sample set. |
ID | MS_ID | | id | /MS\d+/ | Analytical method details ID, unique within the sample set (SE). |
Title | MS_Title | | title | STRING | Short title. |
Instrument | MS_Instrument | | instrument | STRING | Instrument name and vendor name. |
Instrument Type | MS_Instrument Type | | instrument_type | *6 | Instrument type. |
Ionization | MS_Ionization | | ionization_method | *6 | Ionization method. |
Ion Mode | MS_Ion Mode | | ion_mode | *6 | Whether the analysis was run in positive or negative mode. |
Description | MS_Description | | description | STRING | Details of the instrumental analysis method, including all details of the analytical instruments and analysis conditions. Sample preparation steps that do not depend on the individual sample, such as homogenization and metabolite extraction, are also described here. |
Comment_of_details | MS_Comment of details | | comment_of_details | TEXT *3 | Comment. |
DS | | data_analysis_details | | | Data analysis details. Shared within the sample set. |
ID | DS_ID | | id | /DS\d+/ | Data analysis details ID, unique within the sample set (SE). |
Title | DS_Title | | title | STRING | Short title. |
Description | DS_Description | | description | STRING | All details of the data analysis method, such as the software used and the parameters adopted. |
Comment_of_details | DS_Comment of details | | comment_of_details | TEXT *3 | Comment. |
AM | | annotation_method_details | | | Annotation method details. |
ID | AM_ID | | id | /AM\d+/ | Annotation method ID, unique within the sample set (SE). |
Title | AM_Title | | title | STRING | Short title. |
Description | AM_Description | | description | STRING | Details of the annotation method, including the criteria by which annotations were assigned. |
Comment_of_details | AM_Comment of details | | comment_of_details | TEXT *3 | Comment. |
P *7 | | peak | | | Peak information. Detailed description of each detected peak and its annotation. |
Peak ID *7 | | | @id | /P\d+/ | Peak ID, unique within the data analysis (D). |
Intensity *7 | | | intensity | DOUBLE | Peak intensity. Whether the value is relative or absolute is stated in the data analysis information (D). |
Retention Time (min) *7 | | | retention_time | DOUBLE | Retention time in minutes. For CE-MS, this is the migration time. |
Retention Index *7 | | | retention_index | DOUBLE | Retention index. For CE-MS, this is the migration index. |
Mass Detected *7 | | | mass_detected | DOUBLE | m/z value of the detected parent ion. For GC-MS, this is null. |
Ion Species *7 | | | ion_species | STRING *6 | For LC-MS, the type of ion detected, e.g. [M+H]+. |
Isotope Peaks *7 | | | isotope_peaks | MI:MASS INT[\|13C1:MASS INT[\|13C2:MASS INT[\|13C3:MASS INT...]]] *8 | m/z values and intensities of the isotope peaks. |
EI MS spectrum *7 | | | ei_mass_spectrum | *9 *10 | For GC-MS, the EI mass spectrum. |
MSn spectrum *7 | | | msn_spectrum | *9 *10 | For LC-MS and CE-MS, the MSn spectrum. |
UV absorption spectrum *7 | | | uv_absorption_spectrum | *9 *11 | For LC-MS, the UV-Vis absorption spectrum. NIR and IR spectra will also be supported in the future. |
Annotation *7 | | | annotation | STRING | Annotation information: the elemental formula, compound name, compound group name, and degree of confidence in the annotation. |
Annotation Method ID *7 | | | annotation_method_details_id | /AM\d+/ | ID of the annotation method details (AM). |
Annotated Compound ID *7 | | | annotated_compound_id | Database Name:ID[\|Database Name:ID]... *4 | Annotated compound ID. |
Comment *7 | | | comment | STRING | Comment. |
- *1 "@" indicates an attribute name; all other names are element names.
- *2 "STRING" denotes a string without line breaks, and "TEXT" a string that may contain line breaks. "INT" denotes an integer, "DOUBLE" a double-precision floating-point number, "MASS" an m/z value, and "ID" a database ID. A string enclosed in "/" is a regular expression. A block enclosed in "[" and "]" is optional. "..." means that the preceding bracketed block, or a similar pattern, may be repeated. The character "|" is a literal delimiter; it does not mean the regular-expression "OR". A block enclosed in "{" and "}" groups the alternatives on either side of "OR", and "OR" means that one of them is used. All other expressions are reserved words.
- *3 If a line begins with "[", the text up to the next "]" is treated as a subfield name, and the rest of the line is treated as the content of that subfield. This convention is reserved for future enhancements.
- *4 Only predefined strings may be used as the database name, although they are not always enumerated in the XSD.
- *5 "default": a reserved word meaning "use the values as described"; an integer may be given instead. "peak": the number of digits of the m/z values detected in the peak information. "Instrument X": the number of digits of the masses in msn_spectrum.
- *6 Only predefined strings may be used, although they are not always enumerated in the XSD.
- *7 Peak information (P) is not used in Metabolonote.
- *8 "MI": a reserved word denoting the monoisotopic ion; its MASS is identical to the detected m/z. Isotope labels such as "13C1" give the isotope and the number of such isotopic atoms in the molecule.
- *9 Not written in the peak-table file. See the section "Format of spectrum data file" for how to describe this information.
- *10 XML definition for MSn and EI MS spectra: the element can contain multiple ion elements, each with "mass" and "intensity" attributes.
- *11 XML definition for UV-Vis spectra: the element can contain multiple absorption elements, each with "wave_length" and "value" attributes.
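The value formats above lend themselves to mechanical checks. The following Python sketch is an illustration only and is not part of TogoMD: the helper names are hypothetical, and it assumes that in isotope_peaks the MASS and INT of each entry are separated by whitespace, as the pattern "MI:MASS INT" suggests.

```python
import re

# ID patterns from the "Value format" column, matched against the whole string.
ID_PATTERNS = {
    "SE": r"SE\d+", "S": r"S\d+", "M": r"M\d+", "D": r"D\d+",
    "SS": r"SS\d+", "MS": r"MS\d+", "DS": r"DS\d+", "AM": r"AM\d+",
    "P": r"P\d+",
}


def is_valid_id(cls: str, value: str) -> bool:
    """True if value matches the ID pattern of metadata class cls (e.g. "SE")."""
    return re.fullmatch(ID_PATTERNS[cls], value) is not None


def parse_db_ids(value: str) -> list[tuple[str, str]]:
    """Split 'Database Name:ID[|Database Name:ID]...' into (database, ID) pairs."""
    pairs = []
    for item in value.split("|"):
        database, _, identifier = item.partition(":")  # split at the first colon only
        pairs.append((database.strip(), identifier.strip()))
    return pairs


def parse_isotope_peaks(value: str) -> dict[str, tuple[float, int]]:
    """Parse 'MI:MASS INT[|13C1:MASS INT]...' into {label: (mass, intensity)}.

    Assumes MASS and INT are separated by whitespace.
    """
    peaks = {}
    for item in value.split("|"):
        label, _, rest = item.partition(":")
        mass, intensity = rest.split()
        peaks[label.strip()] = (float(mass), int(intensity))
    return peaks
```

For example, with made-up values, `parse_isotope_peaks("MI:181.0707 1000|13C1:182.0741 65")` returns the monoisotopic peak and the one-13C peak as `(mass, intensity)` tuples keyed by their labels.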
Other rules
Omission of top-level ID
When a metadata ID is written without its top-level ID, it is interpreted as belonging to the same top-level metadata as the description in which it appears. For example, the ID "DS2" written in the description of metadata "SE1_DS1" refers to "SE1_DS2".
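A minimal Python sketch of this rule follows. It assumes that IDs are underscore-joined components (as in "SE1_DS1") and that the top-level ID is always the leading SE component; `resolve_id` is a hypothetical helper, not part of any TogoMD tooling.

```python
import re


def resolve_id(ref: str, context_id: str) -> str:
    """Expand an ID written without its top-level (SE) component.

    ref: the ID as written, e.g. "DS2"
    context_id: full ID of the metadata in which ref appears, e.g. "SE1_DS1"
    """
    if re.match(r"SE\d+(_|$)", ref):
        return ref  # already starts with a top-level ID
    top_level = context_id.split("_")[0]  # "SE1" from "SE1_DS1"
    return f"{top_level}_{ref}"


print(resolve_id("DS2", "SE1_DS1"))  # SE1_DS2
```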
"PSEUDO: " a blank node
A metadata entry whose Title starts with "PSEUDO: " represents a blank node, prepared as a convenient placeholder for attaching lower-level metadata. Several processed data sets (D) can be used together in a further, integrated data analysis (D). The metadata for such an integrated analysis should not be tied to any particular sample or raw data, so a blank node is needed to build the metadata hierarchy. The prefix "PSEUDO: " at the start of the Title in the sample (S) or analytical method (M) class marks such a placeholder as a blank node.
ID Assignment
See here for the rules for ID Assignment.
File Type and Extension
Data type | Example of ID | File descriptor (extension) | Description | File format |
---|---|---|---|---|
Metadata | SE**, SE**_S**, SE**_S**_M**, SE**_S**_M**_D**, SE**_S**_M**_D**_P** | .info.txt | Files containing the metadata of each class (SE, S, M, D, MS, DS, AM). | The element names and attribute or subelement names of the XML schema, together with their values, in tab-delimited format. Sample file is here. |
Peak related data (for multiple peaks) | SE**_S**_M**_D** | .peak-table.txt | Information on the detected peaks, in table form. | The attribute or subelement names of the XML schema for peak information (P), excluding the spectrum data, and their values in tab-delimited text format. Sample file is here. |
Peak related data (for multiple peaks) | SE**_S**_M**_D** | .msn-list.txt | List of MSn spectrum data. | See the section "Format of spectrum data file" for details. A sample msn-list file is here. |
Peak related data (for multiple peaks) | SE**_S**_M**_D** | .uv-list.txt | List of UV-Vis spectrum data. | See the section "Format of spectrum data file". |
Peak related data (for multiple peaks) | SE**_S**_M**_D** | .ei-list.txt | List of EI mass spectrum data. | See the section "Format of spectrum data file". |
Peak related data (for a single peak) | SE**_S**_M**_D**_P** | .peak.txt | Information on a single detected peak. | Same format as .peak-table.txt, but containing data for only one peak. |
Peak related data (for a single peak) | SE**_S**_M**_D**_P** | .msn.txt | MSn spectrum data for a single peak. | Same as .msn-list.txt. |
Peak related data (for a single peak) | SE**_S**_M**_D**_P** | .uv.txt | UV-Vis spectrum data for a single peak. | Same as .uv-list.txt. |
Peak related data (for a single peak) | SE**_S**_M**_D**_P** | .ei.txt | EI mass spectrum data for a single peak. | Same as .ei-list.txt. |
Peak related data (for a single peak) | SE**_S**_M**_D**_P** | .peak-all.txt | All information related to a single peak. | The contents of .info.txt, .peak.txt, .msn.txt, .uv.txt, and .ei.txt (where they exist) concatenated into one file. |
Raw data (binary) | SE**_S**_M** | .bin.zip | Binary raw data generated by the analytical instrument. | A zip archive containing the binary raw file, the .info.txt file, and other files such as license information. |
Raw data (text) | SE**_S**_M**_D** | .txt.zip | Text files containing unprocessed, near-raw data extracted from the binary raw data. | A zip archive containing the text files listed below, the .info.txt file, and other files such as license information. |
Raw data (text) | SE**_S**_M**_D** | .raw-ms.txt | Chromatogram data. | To be discussed and defined according to requirements. If full mass data and MSn data are prepared in separate files, the raw-ms.txt files can be given branch numbers. At least one of raw-ms.txt or raw-ms-table.txt must be provided. If UV-Vis data exist, at least one of raw-uv.txt or raw-uv-table.txt must also be provided. |
Raw data (text) | SE**_S**_M**_D** | .raw-uv.txt | Raw UV-Vis spectrum data. | See the .raw-ms.txt row. |
Raw data (text) | SE**_S**_M**_D** | .raw-ms-table.txt | Mass chromatogram data in table format. | See the .raw-ms.txt row. |
Raw data (text) | SE**_S**_M**_D** | .raw-uv-table.txt | UV-Vis spectrum data in table format. | See the .raw-ms.txt row. |
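Because each file name is a metadata ID followed by a file descriptor, a file can be mapped back to its ID and metadata class. The Python sketch below assumes that every ID component consists of uppercase letters followed by digits (e.g. SE01, S01, P0001) and that the ID itself contains no dot; the helper names are hypothetical.

```python
import re


def file_name(metadata_id: str, descriptor: str) -> str:
    """Join an ID and a file descriptor, e.g. ("SE01_S01", ".info.txt")."""
    return metadata_id + descriptor


def split_file_name(name: str) -> tuple[str, str]:
    """Split 'SE01_S01_M01_D01.peak-table.txt' into (ID, '.peak-table.txt')."""
    metadata_id, dot, rest = name.partition(".")
    return metadata_id, dot + rest


def metadata_class(metadata_id: str) -> str:
    """Return the class prefix (SE, S, M, D, P, ...) of the last ID component."""
    last = metadata_id.split("_")[-1]
    match = re.fullmatch(r"([A-Z]+)\d+", last)
    if match is None:
        raise ValueError(f"unexpected ID component: {last!r}")
    return match.group(1)
```

For instance, the ID in "SE01_S01_M01_D01.peak-table.txt" ends in a D component, matching the table, where peak tables are attached to data analysis IDs.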
Format of data file
The data files described in this section are text files.
Common file header
The first line of every file must be the header line shown below.
- " <tab> " denotes a tab (control character). Values to be filled in are shown in square brackets "[]".
# <tab> id <tab> [Database name]:[Metadata ID].[File descriptor]
(Example)
# <tab> id <tab> kazusa:SE01_S01_M01_D01.info.txt
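A Python sketch of writing and parsing this first line is shown below. It assumes that the database name contains no colon and the metadata ID contains no dot, and it passes the file descriptor with its leading dot (e.g. ".info.txt"), as listed in the file type table; the function names are hypothetical.

```python
def make_id_header(database: str, metadata_id: str, descriptor: str) -> str:
    """Build the mandatory first line: '#', 'id' and the file ID, tab-separated."""
    return f"#\tid\t{database}:{metadata_id}{descriptor}"


def parse_id_header(line: str) -> tuple[str, str, str]:
    """Return (database, metadata ID, file descriptor) from a file's first line."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 3 or fields[0] != "#" or fields[1] != "id":
        raise ValueError("first line is not a TogoMD id header")
    database, _, file_id = fields[2].partition(":")
    metadata_id, dot, rest = file_id.partition(".")
    return database, metadata_id, dot + rest


line = make_id_header("kazusa", "SE01_S01_M01_D01", ".info.txt")
print(parse_id_header(line))  # ('kazusa', 'SE01_S01_M01_D01', '.info.txt')
```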
Optional header
Additional information can be given in header lines following the first line.
# <tab> license <tab> [License information]
(Example)
# <tab> license <tab> CC BY-SA
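Because the mandatory and optional header lines share the form "#<tab>key<tab>value", a reader can collect them into a dictionary. This Python sketch assumes that header lines appear only at the top of a file; `read_header` is a hypothetical name.

```python
from typing import Iterable


def read_header(lines: Iterable[str]) -> dict[str, str]:
    """Collect the leading '#<tab>key<tab>value' lines into {key: value}."""
    header = {}
    for line in lines:
        if not line.startswith("#"):
            break  # the first non-header line ends the header
        parts = line.rstrip("\r\n").split("\t", 2)
        if len(parts) == 3:
            header[parts[1]] = parts[2]
    return header


lines = [
    "#\tid\tkazusa:SE01_S01_M01_D01.peak-table.txt\n",
    "#\tlicense\tCC BY-SA\n",
    "id\tmass_detected\n",
]
print(read_header(lines))
# {'id': 'kazusa:SE01_S01_M01_D01.peak-table.txt', 'license': 'CC BY-SA'}
```

When given an open file object, the loop also consumes the first non-header line, so a caller that needs that line should read the lines into a list first.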
Peak table
A data file that contains information on multiple peaks in tab-delimited table format.
A column header line follows the common header line.
This line lists the attribute and subelement names of the XML schema for peak information (P) (see the section "Correspondence of the XML Element/Attribute and Metabolonote Field Name"), separated by tabs.
- Spectrum data (ei_mass_spectrum, msn_spectrum, and uv_absorption_spectrum) should not be included in this file.
The data values follow on the subsequent lines, one line per peak, separated by tabs.
(Example)
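As an illustration only, the Python sketch below reads a peak table with the standard csv module. It skips the "#" header lines, uses the column header line as keys, and disables quote handling so that only tabs separate values. The columns in the sample string are a hypothetical subset of the peak (P) fields, and spelling the ID column as "id" is an assumption.

```python
import csv


def read_peak_table(text: str) -> list[dict[str, str]]:
    """Return one dict per peak, keyed by the column header line."""
    body = [line for line in text.splitlines() if not line.startswith("#")]
    # QUOTE_NONE: only tabs separate values; quote characters are kept as data.
    reader = csv.DictReader(body, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


sample = (
    "#\tid\tkazusa:SE01_S01_M01_D01.peak-table.txt\n"
    "id\tmass_detected\tretention_time\n"  # hypothetical subset of P fields
    "P1\t181.0707\t3.25\n"
)
rows = read_peak_table(sample)
print(rows[0]["mass_detected"])  # 181.0707
```

Values are returned as strings; DOUBLE columns such as mass_detected can be converted with `float` where needed.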
Format of spectrum data file
This file format is used to describe the following data:
- MSn spectrum data
- EI mass spectrum data generated by GC-MS analysis
- UV-Vis absorption spectrum data
One or more data blocks, defined below, follow the common header line.
Each block consists of a header line starting with ">", followed by one or more data lines, each containing a pair of values. Tabs are used as delimiters.
(Example) msn-list data
A sample msn-list file is here.
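A minimal Python reader for this block structure is sketched below, assuming tab-separated columns and ignoring blank lines. Each block becomes its header fields plus a list of (x, y) pairs, where x is the m/z value or wavelength. The `to_msn_xml` helper shows one way to turn a block into the XML form given in note *10 of the field table (an msn_spectrum element holding ion elements with mass and intensity attributes). Both function names and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

Block = tuple[list[str], list[tuple[float, float]]]


def read_spectrum_blocks(text: str) -> list[Block]:
    """Split a spectrum data file into (header fields, data points) blocks."""
    blocks: list[Block] = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the common/optional header lines
        fields = line.split("\t")
        if line.startswith(">"):
            blocks.append((fields, []))  # new block; fields[0] is ">P..."
        elif blocks:
            x, y = float(fields[0]), float(fields[1])  # m/z or wavelength, intensity
            blocks[-1][1].append((x, y))
        else:
            raise ValueError("data line found before the first '>' header line")
    return blocks


def to_msn_xml(points: list[tuple[float, float]]) -> ET.Element:
    """Build <msn_spectrum> with one <ion mass=".." intensity=".."/> per point."""
    spectrum = ET.Element("msn_spectrum")
    for mass, intensity in points:
        ET.SubElement(spectrum, "ion", mass=str(mass), intensity=str(intensity))
    return spectrum


sample = (
    "#\tid\tkazusa:SE01_S01_M01_D01.msn-list.txt\n"
    ">P1\tms2\tITMS\t+\tc\tESI\tcid35.00\t50-500\n"
    "85.03\t120\n"
    "145.05\t980\n"
)
blocks = read_spectrum_blocks(sample)
print(ET.tostring(to_msn_xml(blocks[0][1]), encoding="unicode"))
```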
Header line
The columns of the header line are as follows.
Column | Description | Requirement | Value format *1 |
---|---|---|---|
1 | Peak ID | mandatory | />P\d+/ (">" + Peak ID) |
2 | Descriptor for MSn and detector type | mandatory | STRING *2 |
3 | Type of instrument | mandatory | STRING *3 |
4 | Ion mode | mandatory for MSn data | /[+-]/ (positive or negative) |
5 | Mass scan mode | mandatory for MSn and EI data | /[cp]/ (centroid or profile) |
6 | Ionization method | mandatory for MSn and EI data | STRING *4 |
7 | Collision energy | mandatory for MSn and EI data | STRING *5 |
8 | m/z scan range | mandatory for MSn and EI data | /[\d\.]+-[\d\.]+/ |
*1 Same as *2 of the table in the section "Correspondence of the XML Element/Attribute and Metabolonote Field Name".
*2 The descriptor is described in detail in the next section.
*3 Use one of the specified strings, such as ITMS, FTMS, TOF-MS (for EI), or PDA (for UV-Vis analysis).
*4 Use one of the specified strings, such as ESI or EI.
*5 The description may differ depending on the instrument type (e.g. cid35.00, 70eV).
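The Python sketch below maps a header line onto the eight columns above and checks the columns that have a fixed value format. Which columns are present for a given data type follows the Requirement column; treating missing or empty trailing columns as absent (None) is an assumption, as are the key names.

```python
import re
from typing import Optional

COLUMNS = ["peak_id", "descriptor", "instrument_type", "ion_mode",
           "scan_mode", "ionization", "collision_energy", "scan_range"]
# Fixed value formats from the table (regular expressions).
PATTERNS = {"peak_id": r">P\d+", "ion_mode": r"[+-]", "scan_mode": r"[cp]",
            "scan_range": r"[\d.]+-[\d.]+"}


def parse_block_header(line: str) -> dict[str, Optional[str]]:
    """Map a '>' header line onto named columns; absent or empty columns become None."""
    fields = line.rstrip("\r\n").split("\t")
    values = {name: (fields[i] or None) if i < len(fields) else None
              for i, name in enumerate(COLUMNS)}
    if any(values[name] is None for name in COLUMNS[:3]):
        raise ValueError("columns 1 to 3 are mandatory")
    for name, pattern in PATTERNS.items():
        value = values[name]
        if value is not None and not re.fullmatch(pattern, value):
            raise ValueError(f"invalid {name}: {value!r}")
    values["peak_id"] = values["peak_id"][1:]  # drop the leading '>'
    return values
```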
Descriptor for MSn and detector type
Data type | Descriptor |
---|---|
Multi-stage MS (MSn) | msn event descriptor [mass value of precursor ion @ msn event descriptor that generated the precursor ion] |
Electron ionization | EI |
UV-Vis absorption spectrum | PDA, etc. |
The data for a single peak ID should not contain multiple msn event descriptors with the same name.
For MS2, the "[mass value of precursor ion @...]" part can be omitted, because the precursor ion is obviously the peak metabolite itself.
For MS3 and later stages, the "[mass value of precursor ion @...]" part must be given, because the origin of the precursor ion has to be identified.
(Example)
ms3_1 [123.456@ms2_1]
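The Python sketch below splits an MSn descriptor into its event descriptor and the optional "[precursor m/z@source event]" part, and rejects MS3 or later descriptors that omit it. It handles MSn descriptors only (not EI or PDA); tolerating spaces around "[", "@", and "]" is an assumption, and the names are hypothetical.

```python
import re

# Event descriptor, optionally followed by "[precursor m/z@source event]".
DESCRIPTOR = re.compile(
    r"(?P<event>ms(?P<stage>\d+)(?:_\d+)?)"
    r"(?:\s*\[\s*(?P<mass>[\d.]+)\s*@\s*(?P<source>ms\d+(?:_\d+)?)\s*\])?"
)


def parse_msn_descriptor(text: str) -> dict:
    """Split e.g. 'ms3_1 [123.456@ms2_1]' into its parts."""
    match = DESCRIPTOR.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not an MSn descriptor: {text!r}")
    stage = int(match.group("stage"))
    mass = match.group("mass")
    if stage >= 3 and mass is None:
        raise ValueError("MS3 and later must give [precursor m/z@source event]")
    return {
        "event": match.group("event"),
        "stage": stage,
        "precursor_mz": float(mass) if mass is not None else None,
        "source_event": match.group("source"),  # None when omitted (MS2)
    }
```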
The msn event descriptor
"ms" followed by the stage number. If multiple data sets exist for the same stage, they are distinguished by branch numbers.
(Example)
Derived from the peak metabolite
- ms2
Derived from the peak metabolite, when multiple MS2 data sets exist (e.g. acquired at different retention times).
- ms2_1, ms2_2, etc.
Derived from a product ion generated by an MS2 analysis.
- ms3, ms3_1, etc.
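Event descriptors can be split into a stage number and an optional branch number. The Python sketch below also reports event descriptors that appear more than once in the data for one peak ID, which the rule on descriptor names forbids. The names are hypothetical.

```python
import re
from collections import Counter
from typing import Optional

EVENT = re.compile(r"ms(\d+)(?:_(\d+))?")


def split_event(name: str) -> tuple[int, Optional[int]]:
    """'ms2' -> (2, None); 'ms2_2' -> (2, 2)."""
    match = EVENT.fullmatch(name)
    if match is None:
        raise ValueError(f"not an msn event descriptor: {name!r}")
    branch = match.group(2)
    return int(match.group(1)), int(branch) if branch is not None else None


def duplicate_events(events: list[str]) -> list[str]:
    """Event descriptors used more than once within the data for one peak ID."""
    return sorted(name for name, count in Counter(events).items() if count > 1)


print(split_event("ms2_2"))                                  # (2, 2)
print(duplicate_events(["ms2_1", "ms2_2", "ms2_1", "ms3"]))  # ['ms2_1']
```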
Data lines
Column | Description | Requirement | Value format *1 |
---|---|---|---|
1 | m/z value or wavelength (nm) | mandatory | DOUBLE |
2 | intensity | mandatory | DOUBLE |
*1 Same as *2 of the table in the section "Correspondence of the XML Element/Attribute and Metabolonote Field Name".