Input data format conventions

Input data must be submitted in a tab-separated values (TSV) file, which is a plain text format for storing tabular data. The file can be prepared with the help of spreadsheet software but must be saved as plain text with the tab character as the field separator. Although any file extension is accepted (.tsv, .txt, .csv, etc.), only the tab character is accepted as a field separator.
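As an illustration only, the following Python sketch reads tab-separated content with the standard csv module. The function name read_tsv and the sample content are hypothetical and are not part of the software described here; a real file would be opened with open(path, newline="", encoding="utf-8") instead of io.StringIO.

```python
import csv
import io


def read_tsv(stream):
    """Return all rows of a tab-separated stream as lists of strings."""
    # Only the tab character splits fields; commas and spaces stay in the field.
    return [row for row in csv.reader(stream, delimiter="\t")]


# Hypothetical sample content standing in for an input file.
sample = "col A\tcol B\n1,5\t2.5\n"
rows = read_tsv(io.StringIO(sample))
```

Note that the comma in the first data field is kept as part of the value, since only the tab character separates fields.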

Each input file may include data for several analytes, but keep in mind that the models for all analytes in one file are learned with the same parameters. The exception is when the model parameter 'k' is not fixed and the model is selected based on the lowest BIC. If the nature of the analyte data requires models with substantially different parameters, the analytes are better split into several files.

Column names in 1-row format

Column names can be given in two different formats, '1-row format' and '2-row format', which should not be mixed in the same file.

Each column in '1-row format' must have a name formatted according to one of the following patterns:

where the italicized parts can be customized. 'NAME' represents the analyte name and must be the same in all columns related to a given analyte (for more information, see 'Workflow examples').

Column names in 2-row format

In '2-row format', the analyte identification and the type of sample ('ref', 'test', ...) are split across two rows. The first row contains the names and the second contains the types, for example:


id       gene A   gene B   id       gene A   gene B   id       gene A   gene B
ref      ref      ref      test     test     test     Query 1  Query 1  Query 1

As you can see, the part that was in parentheses in 1-row format is placed on the second row in 2-row format.
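To make the pairing of the two header rows concrete, here is a minimal Python sketch that combines them into (name, type) pairs, one per column. The function names parse_two_row_header and columns_for are hypothetical and only illustrate the convention; they are not how the software itself reads the header.

```python
def parse_two_row_header(names_line, types_line):
    """Pair each column name (row 1) with its sample type (row 2)."""
    names = names_line.rstrip("\n").split("\t")
    types = types_line.rstrip("\n").split("\t")
    if len(names) != len(types):
        raise ValueError("the two header rows must have the same number of fields")
    return list(zip(names, types))


def columns_for(columns, name):
    """Return the 0-based indices of all columns belonging to one analyte name."""
    return [i for i, (col_name, _) in enumerate(columns) if col_name == name]


# The two header rows from the example above, written with real tab characters.
names_line = "id\tgene A\tgene B\tid\tgene A\tgene B\tid\tgene A\tgene B"
types_line = "ref\tref\tref\ttest\ttest\ttest\tQuery 1\tQuery 1\tQuery 1"

columns = parse_two_row_header(names_line, types_line)
gene_a_columns = columns_for(columns, "gene A")
```

Grouping by name in this way reflects the requirement that an analyte name be identical in all columns related to that analyte.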