From Metabolonote


TogoMD: the Togo Metabolome Data Format

The Togo Metabolome Data Format (TogoMD) defines an easy-to-use data format intended to enable advanced use of metabolomics data. Based on this format, we aim to integrate domestic metabolome databases.

Definition regarding Description Fields

XML Definition File (XSD)

The fields needed to describe metabolome data, from metadata to peak data, have been carefully selected, and their field names and descriptions defined. This definition is provided as the XML schema below.

URI	https://metabolonote.kazusa-db.jp/TogoMetabolomeDbSchema.xsd
Version	1.2.0
Last modified	Nov. 5, 2014

Correspondence of the XML Element/Attribute and Metabolonote Field Name

This section describes the XML elements and attributes, and shows how they correspond to the field names and property names used on each Metabolonote page.

* Peak information (P) is not used in Metabolonote.

ID label	Page field name (Metabolonote)	Property name (Metabolonote)	Element name (XML schema)	Attribute or subelement name (XML schema) *1	Value format *2	Description
SE			sample_set			Sample set information. Indicates a set of experiments or data obtaining projects.
	ID	SE_ID		id	/SE\d+/	Sample set ID, unique within the system. While the data are private, any alphanumeric string can be used as a tentative ID.
	Title	SE_Title		title	STRING	Short title
	Description	SE_Description		description	STRING	Describes concepts important for interpreting the data, such as the purpose of the experiments and the relationships between samples.
	Authors	SE_Authors		authors	STRING	Authors
	Reference	SE_Reference		reference	STRING	Related reference information
	Comment	SE_Comment		comment	TEXT *3	Comment

S			sample			Sample information. Describes the preparation methods for each individual sample.
	ID	S_ID		id	/S\d+/	Sample ID, unique within the sample set (SE).
	Title	S_Title		title	STRING	Short sample name
	Organism - Scientific Name	S_Organism - Scientific Name		organism_scientific_name	STRING	Scientific name. This is required when biological samples are handled.
	Organism - ID	S_Organism - ID		organism_id	Database Name:ID[\|Database Name:ID]... *4	Classification ID of the organism.
	Compound - ID	S_Compound - ID		compound_id	Database Name:ID[\|Database Name:ID]... *4	Compound ID
	Compound - Source	S_Compound - Source		compound_source	STRING	Availability of the reagent, such as the company name and catalog ID. Required when standard compounds are handled.
	Preparation	S_Preparation		preparation	STRING	Growth methods and conditions, particular processing, sampled portions, sampling methods, and preparation methods for reagents
	Sample Preparation Details ID	S_Sample Preparation Details ID		sample_preparation_details_id	/SS\d+/	ID of the applied sample preparation details (SS).
	Comment	S_Comment		comment	TEXT *3	Comment

M			analytical_method			Analytical method information. Describes the instrumental analysis method for each individual sample.
	ID	M_ID		id	/M\d+/	Analytical method ID, unique within the sample (S).
	Title	M_Title		title	STRING	Short title
	Method Set ID	M_Method Set ID		analytical_method_details_id	/MS\d+/	ID of the applied detailed analytical method information (MS).
	Sample Amount	M_Sample Amount		sample_amount	STRING	Amount of sample used. Needed for normalizing quantitative data for comparison with other samples.
	Comment	M_Comment		comment	TEXT *3	Comment

D			data_analysis			Data analysis information. Describes computational data analysis methods, such as peak extraction.
	ID	D_ID		id	/D\d+/	Data analysis ID, unique within the analytical method (M).
	Title	D_Title		title	STRING	Short title
	Data Analysis Set ID	D_Data Analysis Set ID		data_analysis_details_id	/DS\d+/	ID of the applied detailed data analysis method information (DS).
	Recommended decimal places of m/z	D_Recommended decimal places of m/z		recommended_decimal_places_of_mass	{default OR INT}{[\|peak INT] OR [\|Instrument X INT]}... *5	Recommended number of digits (decimal places) for m/z values.
	Comment	D_Comment		comment	TEXT *3	Comment

SS			sample_preparation_details			Detailed information about sample preparation. Shared in the sample set.
	ID	SS_ID		id	/SS\d+/	Sample preparation details ID, unique within the sample set (SE).
	Title	SS_Title		title	STRING	Short title
	Description	SS_Description		description	STRING	Details of sample preparation. For biological samples, for example, growth conditions and drug treatments are described here. Descriptions that depend on the analytical method should not be included here; describe them in the analytical method details (MS) instead.
	Comment_of_details	SS_Comment of details		comment_of_details	TEXT *3	Comment

MS			analytical_method_details			Detailed analytical method information. Shared within the sample set.
	ID	MS_ID		id	/MS\d+/	Detailed analytical method information ID, unique within the sample set (SE).
	Title	MS_Title		title	STRING	Short title
	Instrument	MS_Instrument		instrument	STRING	Instrument name and vendor name
	Instrument Type	MS_Instrument Type		instrument_type	*6	Instrument type
	Ionization	MS_Ionization		ionization_method	*6	Ionization method
	Ion Mode	MS_Ion Mode		ion_mode	*6	Positive or negative ion mode
	Description	MS_Description		description	STRING	Details of the instrumental analysis method, covering the analytical instruments and all analysis conditions. Sample preparation steps that do not depend on the individual sample, such as homogenization and metabolite extraction, should also be described here.
	Comment_of_details	MS_Comment of details		comment_of_details	TEXT *3	Comment

DS			data_analysis_details			Detailed information of data analysis methods. Shared within the sample set.
	ID	DS_ID		id	/DS\d+/	Detailed data analysis method information ID, unique within the sample set (SE).
	Title	DS_Title		title	STRING	Short title
	Description	DS_Description		description	STRING	Describes all details regarding data analysis methods such as software programs used and the parameters adopted.
	Comment_of_details	DS_Comment of details		comment_of_details	TEXT *3	Comment

AM			annotation_method_details			Detailed information about annotation methods.
	ID	AM_ID		id	/AM\d+/	Annotation method ID, unique within the sample set (SE).
	Title	AM_Title		title	STRING	Short title
	Description	AM_Description		description	STRING	Details of the annotation method, including the criteria by which annotations were assigned.
	Comment_of_details	AM_Comment of details		comment_of_details	TEXT *3	Comment

P *7			peak			Peak information. Detailed description of each individual peak obtained and its annotation.
	Peak ID *7			@id	/P\d+/	Peak ID, unique within the data analysis (D).
	Intensity *7			intensity	DOUBLE	Peak intensity. Whether the value is relative or absolute is described in the data analysis information (D).
	Retention Time (min) *7			retention_time	DOUBLE	Retention time in minutes. For CE-MS, this is the migration time.
	Retention Index *7			retention_index	DOUBLE	Retention index. For CE-MS, this is the migration index.
	Mass Detected *7			mass_detected	DOUBLE	m/z value of the detected parent ion. Null for GC-MS.
	Ion Species *7			ion_species	STRING *6	For LC-MS, the type of ion detected, e.g. [M+H]+.
	Isotope Peaks *7			isotope_peaks	MI:MASS INT[\|13C1:MASS INT[\|13C2:MASS INT[\|13C3:MASS INT...]]] *8	m/z values and intensities of the isotope peaks
	EI MS spectrum *7			ei_mass_spectrum	*9 *10	For GC-MS, the EI mass spectrum.
	MSn spectrum *7			msn_spectrum	*9 *10	For LC-MS and CE-MS, the MSn spectrum.
	UV absorption spectrum *7			uv_absorption_spectrum	*9 *11	For LC-MS, the UV-Vis absorption spectrum. NIR and IR spectra will also be available in the future.
	Annotation *7			annotation	STRING	Annotation information: the elemental formula, compound name, compound group name, and degree of confidence of the annotation.
	Annotation Method ID *7			annotation_method_details_id	/AM\d+/	ID of the applied annotation method details (AM)
	Annotated Compound ID *7			annotated_compound_id	Database Name:ID[\|Database Name:ID]... *4	Annotated compound ID
	Comment *7			comment	STRING	Comment

*1 "@" indicates an attribute name; all others are subelement names.
*2 "STRING" is a string without line breaks; "TEXT" is a string that may contain line breaks. "INT" is an integer and "DOUBLE" a double-precision floating-point number. "MASS" is an m/z value. "ID" is a database ID. A string enclosed in "/" is a regular expression. A portion enclosed in "[" and "]" is an optional block. "..." indicates that the preceding bracketed block, or a similar pattern, may be repeated. "|" is a literal delimiter character, not the "OR" of regular expressions. A portion enclosed in "{" and "}" groups alternatives separated by "OR", which means "OR" as in regular expressions. All other expressions are reserved words.
*3 When a line begins with "[", the text up to the next "]" is treated as a subfield name, and the rest of the line as the content of that subfield. This specification is reserved for future functional enhancements.
*4 Only predefined strings may be used as the database name, although they are not necessarily defined in the XSD.
*5 "default": a reserved word meaning "as described". An integer value can be given in its place. "peak": the number of digits of m/z detected in the peak information. "Instrument X": the number of digits of mass in msn_spectrum.
*6 Only predefined strings may be used, although they are not necessarily defined in the XSD.
*7 Peak information (P) is not used in Metabolonote.
*8 "MI": a reserved word indicating the monoisotopic ion, whose MASS is identical to the detected m/z. An isotope label (e.g. 13C1) indicates the isotope and the number of atoms of that isotope in the molecule giving the isotopic peak.
*9 Not written in the peak table file. See Spectrum Data Format for details on how to describe this information.
*10 In the XML definition for MSn and EI mass spectra, this element can contain multiple ion elements with "mass" and "intensity" attributes.
*11 In the XML definition for UV-Vis spectra, this element can contain multiple absorption elements with "wave_length" and "value" attributes.
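The pipe-delimited value formats in footnotes *4 and *8 can be split mechanically. The sketch below is a minimal Python illustration, not part of TogoMD itself: the function names are hypothetical, and it assumes a single run of whitespace between MASS and INT, as the format string suggests.

```python
import re

# Hypothetical helpers; the patterns follow the value formats in footnotes *4 and *8.
_ISOTOPE_ENTRY = re.compile(r"(MI|\d+[A-Z][a-z]?\d+):(\S+)\s+(\S+)")


def parse_db_ids(value: str) -> list[tuple[str, str]]:
    """Split 'Database Name:ID[|Database Name:ID]...' into (database, ID) pairs."""
    pairs = []
    for item in value.split("|"):
        # Split on the first ':' only, so an ID that itself contains ':' survives.
        database, sep, identifier = item.strip().partition(":")
        if not sep or not database or not identifier:
            raise ValueError(f"malformed Database Name:ID entry: {item!r}")
        pairs.append((database, identifier))
    return pairs


def parse_isotope_peaks(value: str) -> list[tuple[str, float, int]]:
    """Split 'MI:MASS INT[|13C1:MASS INT...]' into (label, m/z, intensity) tuples."""
    peaks = []
    for item in value.split("|"):
        m = _ISOTOPE_ENTRY.fullmatch(item.strip())
        if m is None:
            raise ValueError(f"malformed isotope peak entry: {item!r}")
        label, mass, intensity = m.groups()
        peaks.append((label, float(mass), int(intensity)))
    # The format starts with the monoisotopic (MI) entry.
    if peaks[0][0] != "MI":
        raise ValueError("isotope_peaks must start with the MI entry")
    return peaks
```

Returning plain tuples keeps the sketch free of extra types; a real validator would also check the database names against the predefined list mentioned in *4.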

Other rules

Omission of top-level ID

When a metadata ID is written with its top-level ID omitted, it is interpreted as belonging to the same top-level metadata as the metadata in which it appears. For example, the ID "DS2" written in the description of metadata "SE1_DS1" represents the ID "SE1_DS2".
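A minimal sketch of this expansion rule, assuming (as in the example) that only the top-level SE part is ever omitted; deeper relative references are not handled, and the function name is hypothetical:

```python
def resolve_metadata_id(ref: str, context_id: str) -> str:
    """Expand an ID written without its top-level (SE) part.

    ref:        the ID as written, e.g. "DS2" or "SE3_DS2"
    context_id: full ID of the metadata in which ref appears, e.g. "SE1_DS1"
    """
    if ref.startswith("SE"):
        return ref  # already carries a top-level ID
    top_level = context_id.split("_", 1)[0]
    if not top_level.startswith("SE"):
        raise ValueError(f"context ID {context_id!r} does not start with an SE ID")
    return f"{top_level}_{ref}"
```

For the example in the text, `resolve_metadata_id("DS2", "SE1_DS1")` gives `"SE1_DS2"`.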

"PSEUDO: " a blank node

A metadata entry whose Title starts with "PSEUDO: " represents a blank node, a placeholder prepared for attaching lower-level metadata. For example, several processed data sets (D) can be used together in a further, integrated data analysis (D). The metadata for such an integrated analysis is not tied to any particular sample or raw data, so a blank node is needed to build the metadata hierarchy. The prefix "PSEUDO: " at the head of the Title of a sample (S) or analytical method (M) marks such placeholder metadata as a blank node.

ID Assignment

See here for the rules for ID Assignment.

File Type and Extension

Data type	Example of ID	File descriptor (extension)	Description	File format
Metadata	SE**	.info.txt	Files that contain the metadata of each class (SE, S, M, D, MS, DS, AM)	The element name, attribute or subelement name of the XML schema, and their values are described in tab-delimited format. Sample file is here.
	SE_S
	SE_S_M**
	SE_S_M_D
	SE_S_M_D_P**
Peak related data (for multiple peaks)	SE_S_M_D	.peak-table.txt	Information on the detected peaks, in a table.	The attribute or subelement names of the XML schema for peak information (P) (excluding the spectrum data) and their values are described in tab-delimited text format. Sample file is here.
		.msn-list.txt	MSn spectrum data in list form.	See the section "Format of spectrum data file" for details. A sample of msn-list file is here.
		.uv-list.txt	UV-Vis spectrum data in list form.
		.ei-list.txt	EI mass spectrum data in list form.
Peak related data (for a single peak)	SE_S_M_D_P**	.peak.txt	Information on a single detected peak.	Same format as ".peak-table.txt", except that data for only one peak is included.
		.msn.txt	MSn spectrum data for a single peak	Same as ".msn-list.txt" file
		.uv.txt	UV-Vis spectrum data for a single peak	Same as ".uv-list.txt" file
		.ei.txt	EI mass spectrum data for a single peak	Same as ".ei-list.txt" file
		.peak-all.txt	All information related to a single peak	Data in .info.txt, .peak.txt, .msn.txt, .uv.txt, and .ei.txt (if they exist) concatenated into one file.
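The table ties each file descriptor to an ID level. The sketch below, a hypothetical helper rather than part of the format, splits a file name into metadata ID and descriptor and checks that level. It assumes file names are the metadata ID immediately followed by the descriptor, as in the header example later on (kazusa:SE01_S01_M01_D01.info.txt, without the database prefix), and covers only the descriptors in this first table.

```python
import re

# File descriptors from the table above, grouped by the ID level they attach to.
MULTI_PEAK = {".peak-table.txt", ".msn-list.txt", ".uv-list.txt", ".ei-list.txt"}
SINGLE_PEAK = {".peak.txt", ".msn.txt", ".uv.txt", ".ei.txt", ".peak-all.txt"}


def split_file_name(name: str) -> tuple[str, str]:
    """Split '<metadata ID><file descriptor>' and check the ID level fits the descriptor."""
    # Metadata IDs contain no '.', so the descriptor starts at the first dot.
    metadata_id, dot, rest = name.partition(".")
    descriptor = dot + rest
    last = metadata_id.split("_")[-1]
    if descriptor == ".info.txt":
        pass  # metadata files exist for every class
    elif descriptor in MULTI_PEAK:
        if not re.fullmatch(r"D\d+", last):
            raise ValueError(f"{descriptor} needs an SE_S_M_D ID, got {metadata_id!r}")
    elif descriptor in SINGLE_PEAK:
        if not re.fullmatch(r"P\d+", last):
            raise ValueError(f"{descriptor} needs an SE_S_M_D_P ID, got {metadata_id!r}")
    else:
        raise ValueError(f"unknown file descriptor: {descriptor!r}")
    return metadata_id, descriptor
```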

Data type	Example of ID	File descriptor (extension)	Description	File format
Raw data (binary)	SE_S_M**	.bin.zip	The binary raw data generated by the analytical instrument.	A zip compressed file includes the binary raw file, .info.txt file and other additional files such as license information.
Raw data (text)	SE_S_M_D	.txt.zip	Text files that contain unprocessed near-raw data extracted from the binary raw data.	A zip compressed file includes the text files below, .info.txt file, and other additional files such as license information.
	SE_S_M_D	.raw-ms.txt	Chromatogram data	The format will be discussed and defined as requirements arise. If full MS data and MSn data are prepared in separate files, the raw-ms.txt files can be given branch numbers. At least one raw-ms.txt or raw-ms-table.txt file must be provided. If UV-Vis data exist, at least one raw-uv.txt or raw-uv-table.txt file must be provided.
	SE_S_M_D	.raw-uv.txt	Raw UV-Vis spectrum data
	SE_S_M_D	.raw-ms-table.txt	Mass chromatogram data in table format.
	SE_S_M_D	.raw-uv-table.txt	UV-Vis spectrum data in table format.
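The provisioning rules for the raw text archive can be checked against its member list. This is a sketch under stated assumptions: the caller supplies `has_uv_data`, and since the naming of branch-numbered raw-ms.txt files is not fixed, files are matched by descriptor suffix only. The function name is hypothetical.

```python
def check_raw_text_archive(names: list[str], has_uv_data: bool) -> list[str]:
    """Return rule violations for the member names of a raw-data (text) .txt.zip archive."""

    def has(suffix: str) -> bool:
        return any(name.endswith(suffix) for name in names)

    problems = []
    if not has(".info.txt"):
        problems.append("missing .info.txt file")
    if not (has(".raw-ms.txt") or has(".raw-ms-table.txt")):
        problems.append("at least one .raw-ms.txt or .raw-ms-table.txt file is required")
    if has_uv_data and not (has(".raw-uv.txt") or has(".raw-uv-table.txt")):
        problems.append("UV-Vis data exist: a .raw-uv.txt or .raw-uv-table.txt file is required")
    return problems
```

The member names can come from the standard library, e.g. `zipfile.ZipFile(path).namelist()`.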

Format of data file

Data are described in text files.

Common file header

The first line of every file must be the header line shown below.

"<tab>" denotes a tab character. Values to be filled in are shown in square brackets "[]".

# <tab> id <tab> [Database name]:[Metadata ID].[File descriptor]

(Example)

# <tab> id <tab> kazusa:SE01_S01_M01_D01.info.txt
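A sketch of building and parsing this first line in Python, with "<tab>" written as a real tab. The function names are hypothetical, the descriptor is kept with its leading dot as in the file type table, and the character set allowed in database names is an assumption.

```python
import re

# [Database name]:[Metadata ID].[File descriptor], e.g. kazusa:SE01_S01_M01_D01.info.txt
_ID_VALUE = re.compile(r"([^:\t]+):([A-Za-z0-9_]+)(\.[A-Za-z0-9.-]+)")


def make_id_header(database: str, metadata_id: str, descriptor: str) -> str:
    """Build the mandatory first line. `descriptor` keeps its leading dot, e.g. '.info.txt'."""
    return f"#\tid\t{database}:{metadata_id}{descriptor}"


def parse_id_header(line: str) -> tuple[str, str, str]:
    """Split the first line of a data file into (database, metadata ID, descriptor)."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 3 or fields[0] != "#" or fields[1] != "id":
        raise ValueError(f"not a common file header: {line!r}")
    m = _ID_VALUE.fullmatch(fields[2])
    if m is None:
        raise ValueError(f"malformed id value: {fields[2]!r}")
    return m.group(1), m.group(2), m.group(3)
```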

Optional header

Additional information can be given on the lines following the first line.

# <tab> license <tab> [License information]

(Example)

# <tab> license <tab> CC BY-SA
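Taken together, the mandatory and optional headers form a block of "#" lines at the top of a file. The sketch below (hypothetical name) collects them into a dictionary. It assumes that no body line starts with "#", which the sample files are not reproduced here to confirm.

```python
def read_header(lines: list[str]) -> tuple[dict[str, str], int]:
    """Collect the leading '#<tab>key<tab>value' lines of a data file.

    Returns the header as a dict and the index of the first non-header line.
    """
    header: dict[str, str] = {}
    count = 0
    for line in lines:
        if not line.startswith("#"):
            break  # end of the header block
        fields = line.rstrip("\r\n").split("\t", 2)
        if len(fields) != 3 or fields[0] != "#":
            raise ValueError(f"malformed header line {count + 1}: {line!r}")
        if count == 0 and fields[1] != "id":
            raise ValueError("the first line must be the '#<tab>id<tab>...' header")
        header[fields[1]] = fields[2]
        count += 1
    if count == 0:
        raise ValueError("missing common file header")
    return header, count
```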

Peak table

A data file that contains information on multiple peaks in tab-delimited table format.

A column header line follows the common header line.

The column header line lists, separated by tabs, the attribute and subelement names of the XML schema for peak information (P), as given in the section "Correspondence of the XML Element/Attribute and Metabolonote Field Name".

The spectrum data (ei_mass_spectrum, msn_spectrum, and uv_absorption_spectrum) should not be included in this file.

The data values follow on subsequent lines, separated by tabs.

(Example)

Sample file is here.
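A peak table can be read with Python's standard `csv` module. This is a minimal sketch with a hypothetical function name. Column names are taken exactly as they appear in the file, since the sample file is not reproduced here (for instance, whether the peak ID column is written "id" or "@id"). Values are left as strings for the caller to convert.

```python
import csv
import io

SPECTRUM_COLUMNS = {"ei_mass_spectrum", "msn_spectrum", "uv_absorption_spectrum"}


def parse_peak_table(text: str) -> tuple[str, list[dict[str, str]]]:
    """Parse a .peak-table.txt file into (id header value, one dict per peak)."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("#\tid\t"):
        raise ValueError("missing common file header")
    start = 1
    while start < len(lines) and lines[start].startswith("#"):
        start += 1  # skip optional header lines such as '#\tlicense\t...'
    # QUOTE_NONE: values such as annotations may contain literal quote characters.
    reader = csv.DictReader(io.StringIO("\n".join(lines[start:])),
                            delimiter="\t", quoting=csv.QUOTE_NONE)
    forbidden = SPECTRUM_COLUMNS.intersection(reader.fieldnames or [])
    if forbidden:
        raise ValueError(f"spectrum columns are not allowed: {sorted(forbidden)}")
    return lines[0].split("\t", 2)[2], list(reader)
```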

Format of spectrum data file

This file format is used to describe the following data:

MSn spectrum data
EI mass spectrum data generated by GC-MS analysis
UV-Vis absorption spectrum data

One or more data blocks, defined below, follow the common header line.

Each block consists of a header line starting with ">", followed by one or more data lines, each containing a pair of values. Tabs are used as delimiters.

(Example) In the case of msn-list data

A sample of msn-list file is here.
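The block structure can be split with a small line-oriented reader. This is a sketch with a hypothetical name. It assumes that blank lines and "#" lines after the first line are headers or spacing, and it does not interpret the header columns described below.

```python
def parse_spectrum_blocks(text: str) -> list[tuple[list[str], list[tuple[float, float]]]]:
    """Split a spectrum data file into (header fields, [(x, y), ...]) blocks.

    The leading '>' is stripped from the first header field, so it holds the peak ID.
    """
    lines = text.splitlines()
    if not lines or not lines[0].startswith("#\tid\t"):
        raise ValueError("missing common file header")
    blocks: list[tuple[list[str], list[tuple[float, float]]]] = []
    for number, line in enumerate(lines[1:], start=2):
        if not line.strip() or line.startswith("#"):
            continue  # blank lines and optional header lines
        if line.startswith(">"):
            blocks.append((line[1:].split("\t"), []))
        elif not blocks:
            raise ValueError(f"line {number}: data line before the first '>' header")
        else:
            x, y = line.split("\t")  # exactly two tab-separated values
            blocks[-1][1].append((float(x), float(y)))
    if not blocks or any(not points for _, points in blocks):
        raise ValueError("need at least one block, and at least one data line per block")
    return blocks
```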

Header line

The columns of the header line are as follows.

Column	Description	Requirements	Value format *1
1	Peak ID	mandatory	/>P\d+/ (">" + Peak ID)
2	Descriptor for MSn and detector type	mandatory	STRING *2
3	Type of instrument	mandatory	STRING *3
4	Ion mode	mandatory for MSn data	/[+\|-]/ (positive or negative)
5	Mass scan mode	mandatory for MSn and EI data	/[c\|p]/ (centroid or profile)
6	Ionization method	mandatory for MSn and EI data	STRING *4
7	Collision energy	mandatory for MSn and EI data	STRING *5
8	m/z scan range	mandatory for MSn and EI data	/[\d\.]+-[\d\.]+/

*1 Same as *2 of the table in the section "Correspondence of the XML Element/Attribute and Metabolonote Field Name".

*2 The descriptor is described in detail in the next section.

*3 Use predefined strings, such as ITMS, FTMS, TOF-MS for EI, and PDA for UV-Vis analysis.

*4 Use predefined strings, such as ESI and EI.

*5 The description may differ depending on the type of instrument (e.g. cid35.00, 70eV).
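The requirements column of the table above can be checked per header line. In this sketch the dictionary keys are hypothetical names for the eight columns. MSn data are assumed to be recognized by an "ms" prefix in the descriptor and EI data by the descriptor "EI". Anything else (e.g. PDA) only needs columns 1 to 3.

```python
import re

# Hypothetical keys for the eight header columns, in the order of the table above.
COLUMNS = ["peak_id", "descriptor", "instrument_type", "ion_mode",
           "scan_mode", "ionization", "collision_energy", "scan_range"]


def check_block_header(line: str) -> dict[str, str]:
    """Validate a '>' header line against the requirements column of the table above."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) > len(COLUMNS):
        raise ValueError(f"too many columns: {len(fields)}")
    fields += [""] * (len(COLUMNS) - len(fields))  # missing trailing columns count as empty
    row = dict(zip(COLUMNS, fields))

    if not re.fullmatch(r">P\d+", row["peak_id"]):
        raise ValueError(f"bad peak ID column: {row['peak_id']!r}")
    if not row["descriptor"] or not row["instrument_type"]:
        raise ValueError("descriptor and instrument type are mandatory")

    is_msn = row["descriptor"].startswith("ms")
    is_ei = row["descriptor"] == "EI"
    if is_msn and row["ion_mode"] not in ("+", "-"):
        raise ValueError("ion mode (+ or -) is mandatory for MSn data")
    if is_msn or is_ei:
        if row["scan_mode"] not in ("c", "p"):
            raise ValueError("mass scan mode (c or p) is mandatory for MSn and EI data")
        if not row["ionization"] or not row["collision_energy"]:
            raise ValueError("ionization and collision energy are mandatory for MSn and EI data")
        if not re.fullmatch(r"[\d.]+-[\d.]+", row["scan_range"]):
            raise ValueError(f"bad m/z scan range: {row['scan_range']!r}")
    return row
```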

Descriptor for MSn and detector type

Multi-stage MS (MSn)	msn event descriptor [mass value of precursor ion @ msn event descriptor that generated the precursor ion]
Electron ionization	EI
UV-Vis absorption spectrum	PDA, etc.

The data for a single peak ID should not contain multiple msn event descriptors with the same name.

For MS2, the [mass value of precursor ion @...] part can be omitted, because the precursor ion is evidently the peak metabolite itself.

For MS3 and later stages, the [mass value of precursor ion @...] part must be given so that the origin of the precursor ion can be identified.

(Example)

ms3_1 [123.456@ms2_1]
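The descriptor rules above can be enforced with a regular expression. This is a sketch with hypothetical function names. It treats whitespace before "[" as optional, and it rejects MS3 and later stages that lack a precursor part. It also rejects repeated event names within one peak ID.

```python
import re
from typing import Optional

# msn event descriptor, optionally followed by [precursor m/z@parent event descriptor]
_DESCRIPTOR = re.compile(r"(ms(\d+)(?:_\d+)?)(?:\s*\[([\d.]+)@(ms\d+(?:_\d+)?)\])?")


def parse_msn_descriptor(text: str) -> tuple[str, int, Optional[float], Optional[str]]:
    """Return (event name, MS stage, precursor m/z, parent event) for an MSn descriptor."""
    m = _DESCRIPTOR.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not an MSn descriptor: {text!r}")
    name, stage, mass, parent = m.group(1), int(m.group(2)), m.group(3), m.group(4)
    if stage >= 3 and mass is None:
        raise ValueError(f"{name}: MS3 and later stages must give [precursor m/z@event]")
    return name, stage, (float(mass) if mass is not None else None), parent


def check_unique_events(descriptors: list[str]) -> None:
    """Within the data for one peak ID, each msn event name may appear only once."""
    seen: set[str] = set()
    for descriptor in descriptors:
        name = parse_msn_descriptor(descriptor)[0]
        if name in seen:
            raise ValueError(f"duplicate msn event descriptor: {name!r}")
        seen.add(name)
```

For the example above, `parse_msn_descriptor("ms3_1 [123.456@ms2_1]")` yields the event name `ms3_1`, stage 3, precursor m/z 123.456 and parent event `ms2_1`.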

The msn event descriptor

"ms" followed by the stage number. If multiple data exist for the same stage, they are distinguished by branch numbers.

(Example)

Derived from the peak metabolite

ms2

Derived from the peak metabolite, when multiple MS2 data exist (e.g. acquired at multiple retention times).

ms2_1, ms2_2, etc.

Derived from a product ion generated by an MS2 analysis.

ms3, ms3_1, etc.
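A sketch of the naming convention, with hypothetical function names. The text requires branch numbers only when several data sets exist for one stage. The examples also show forms such as "ms3_1", so omitting the branch number for a single data set is one choice consistent with the rule, not a stated requirement.

```python
import re
from typing import Optional


def name_msn_events(stage: int, count: int) -> list[str]:
    """Event names for `count` data sets of one stage: 'ms2' if single, else 'ms2_1', 'ms2_2', ..."""
    if count < 1:
        raise ValueError("count must be at least 1")
    base = f"ms{stage}"
    if count == 1:
        return [base]
    return [f"{base}_{i}" for i in range(1, count + 1)]


def parse_msn_event(name: str) -> tuple[int, Optional[int]]:
    """Split an event name such as 'ms3_1' into (stage, branch number or None)."""
    m = re.fullmatch(r"ms(\d+)(?:_(\d+))?", name)
    if m is None:
        raise ValueError(f"not an msn event descriptor: {name!r}")
    branch = m.group(2)
    return int(m.group(1)), (int(branch) if branch is not None else None)
```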

Data lines

Column	Description	Requirement	Value format *1
1	m/z value or wavelength (nm)	mandatory	DOUBLE
2	intensity	mandatory	DOUBLE

*1 Same as *2 of the table in the section "Correspondence of the XML Element/Attribute and Metabolonote Field Name".
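The two data-line columns map onto the XML spectrum elements described in footnotes *10 and *11 of the field table. The ion elements carry "mass" and "intensity" attributes, and the absorption elements carry "wave_length" and "value". The sketch below builds such an element with the standard `xml.etree.ElementTree` module. The function name is hypothetical, and the exact nesting and number formatting required by the XSD are not reproduced here.

```python
import xml.etree.ElementTree as ET

# Child element and attribute names from footnotes *10 and *11 of the field table.
_LAYOUT = {
    "msn_spectrum": ("ion", "mass", "intensity"),
    "ei_mass_spectrum": ("ion", "mass", "intensity"),
    "uv_absorption_spectrum": ("absorption", "wave_length", "value"),
}


def spectrum_to_xml(kind: str, pairs: list[tuple[float, float]]) -> ET.Element:
    """Turn (column 1, column 2) data-line pairs into a spectrum element."""
    if kind not in _LAYOUT:
        raise ValueError(f"unknown spectrum element: {kind!r}")
    child, x_attr, y_attr = _LAYOUT[kind]
    root = ET.Element(kind)
    for x, y in pairs:
        ET.SubElement(root, child, {x_attr: str(x), y_attr: str(y)})
    return root
```

The result can be serialized with `ET.tostring(root, encoding="unicode")`.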
