Availability of data and materials

Submission of a manuscript to ISCI implies that materials described in the manuscript, including all relevant raw data, will be freely available to any scientist wishing to use them for non-commercial purposes, without breaching participant confidentiality.

Availability of data and materials section

All authors must include an “Availability of Data and Materials” section in their manuscript detailing where the data supporting their findings can be found. Authors who do not wish to share their data must state that data will not be shared, and give the reason. The following format for the Availability of Data and Materials section should be used:

“The dataset(s) supporting the conclusions of this article is (are) available in the [repository name] repository [unique persistent identifier and hyperlink to dataset(s) in http:// format].”

The following format is required when data are included as additional files:

“The dataset(s) supporting the conclusions of this article is (are) included within the article (and its additional file(s)).”
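As an illustration only, the two statement formats above could be assembled programmatically. The sketch below is not an ISCI tool: the function name `availability_statement`, its parameters, the punctuation between repository name, identifier, and hyperlink, and the placeholder values in the usage example are all assumptions.

```python
def availability_statement(repository=None, identifier=None, url=None,
                           n_datasets=1, additional_files=False):
    """Build an Availability of Data and Materials sentence.

    If `repository` is given, the repository wording is used and both
    `identifier` and `url` are required. Otherwise the data are assumed
    to be included within the article (hypothetical helper).
    """
    # Resolve the "dataset(s) ... is (are)" alternatives from the template.
    noun, verb = ("dataset", "is") if n_datasets == 1 else ("datasets", "are")
    opening = f"The {noun} supporting the conclusions of this article {verb} "

    if repository is not None:
        if not identifier or not url:
            raise ValueError("a repository deposit needs an identifier and a URL")
        return (opening +
                f"available in the {repository} repository, {identifier}, {url}.")

    where = "within the article"
    if additional_files:
        where += " and its additional files"
    return opening + f"included {where}."


if __name__ == "__main__":
    # Placeholder repository, identifier, and URL; not real records.
    print(availability_statement(repository="ExampleRepo",
                                 identifier="doi:10.0000/example",
                                 url="http://example.org/dataset"))
    print(availability_statement(n_datasets=2, additional_files=True))
```

Authors remain responsible for checking the generated sentence against the required wording before submission.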

ISCI endorses the FORCE11 Data Citation Principles and requires that all publicly available datasets be fully referenced in the reference list with an accession number or a unique identifier such as a digital object identifier (DOI).

List of recommended repositories

Recommended repositories are listed below by subject area and data type. If you have questions about the suitability of a given repository, please contact the Editor. If you represent a repository and would like it added to the list, or are an author who would like a repository added, please contact us at [email protected].

Nucleic acids sequences and variation

Sequence information should be deposited following the MIxS guidelines.

Data scope and type | Database
Annotated collection of all publicly available nucleotide sequences and their translated amino acid sequences | DNA Data Bank of Japan (DDBJ)
Nucleic acid sequence, gene, genome | European Molecular Biology Laboratory (EMBL/EBI) Nucleotide Sequence Database
Nucleic acid sequence, gene, genome | GenBank (National Center for Biotechnology Information)
SNPs, variation | dbSNP
Variation | European Variation Archive (EVA)
Variation | dbVar
Variation, structural variants | Database of Genomic Variants Archive (DGVa)
Metagenome, sequence alignment, sequence information | EBI Metagenomics
Sequencing | NCBI Sequence Read Archive (SRA)

Protein sequences

Data scope and type | Database
Protein information | Universal Protein Resource (UniProt)

Mass spectrometry

Mass spectrometry data should be supplied in the mzML format recommended by the HUPO Proteomics Standards Initiative Mass Spectrometry Standards Working Group guidelines.

Proteomics

Data scope and type | Database
Proteome | ProteomeXchange through the PRIDE website
Protein interaction | IMEx consortium

Structures

Data scope and type | Database
Protein structures | Worldwide Protein Data Bank
Nucleic acid structures | Nucleic Acid Database
Crystal structure data | Cambridge Crystallographic Data Centre
Crystal structure data, atomic coordinates | Crystallography Open Database (COD)
Microscopy, electron density map, structure | Electron Microscopy Data Bank (EMDB)
Imaging (all types) | Coherent X-ray Imaging Data Bank (CXIDB)

Neuroscience

Data scope and type | Database
Raw fMRI data | OpenfMRI
Resting-state fMRI, DTI, phenotypic information | Functional Connectomes Project International Neuroimaging Data-Sharing Initiative
Human brain statistical maps | NeuroVault
Digitally reconstructed neurons | NeuroMorpho

Chemical structures and assays

Data scope and type | Database
Chemical structures | PubChem Substance
Bioactivity screens | PubChem BioAssay
Nanomaterials and their composition; physico-chemical and in vitro nanomaterial characterizations | caNanoLab
Metabolite concentrations (time-series and steady-state), flux data, and enzyme measurements; tools to build ODE-based kinetic models | Kinetic Models of Biological Systems (KiMoSys)

Functional genomics data (such as microarray, RNA-seq or ChIP-seq data)

Where appropriate, authors should adhere to the standards proposed by the Functional Genomics Data Society and must deposit microarray data in MIAME-compliant format in one of the public repositories below, such as ArrayExpress or Gene Expression Omnibus (GEO).

Data scope and type | Database
Microarray | ArrayExpress
Microarray | Gene Expression Omnibus (GEO)
Expression, epigenetics, phenotype, genotype, genomic variants (GWAS studies) | Database of Genotypes and Phenotypes (dbGaP)
Protein-protein interaction, protein-DNA interaction, molecular interactions, protein-RNA interaction | The IntAct molecular interaction database (IntAct)
miRNA sequences and annotation | miRBase

Biological materials

We encourage the deposition of biological materials, such as plasmids, mutant strains, and cell lines, in an established public repository where one exists. Authors are also asked to check the list of known misidentified cell lines maintained by the International Cell Line Authentication Committee (ICLAC).

Data scope and type | Database
Cell lines, tissue biopsies, environmental isolates | BioSample database
Plasmids, DNA | Addgene or PlasmID

Phylogenetic data

Data scope and type | Database
Phylogenetic data (alignments, phylogenetic trees, or other relevant primary data) | TreeBASE
Phylogenetic data (alignments, phylogenetic trees, or other relevant primary data); general data repository | Dryad

Environmental and ecological data

Data scope and type | Database
Georeferenced data from earth systems research | PANGAEA
All environmental data, those funded by NERC | NERC Data Centres
Biodiversity metadata, occurrences (observations, specimens, etc.), checklists (names) | Global Biodiversity Information Facility
Ecological data, phylogenetic data; general data repository | Dryad

Other data

Many types of scientific data do not currently have a dedicated subject repository. In such cases, we recommend the following general repositories and workspaces, through which authors can archive their data and make them publicly available.

Data scope and type | Database
Ecological data, phylogenetic data; general data repository | Dryad
General, free (especially recommended for code archiving) | Zenodo
General (max file size 250 MB, unlimited storage) | FigShare
Big Data General, free (restrictions: must publish a Data Note with GigaScience) | GigaDB
General data repository and workflow management/versioning, connects to other services; free | Open Science Framework
General Lab Workspace (subscription) | LabArchives

Publication of clinical datasets

For datasets containing clinical data, authors have an ethical and legal responsibility to respect participants’ rights to privacy and to protect their identity. Ideally, authors should gain informed consent for publication of the dataset from participants at the point of recruitment to the trial. If this is not possible, authors must demonstrate that publication of such data does not compromise anonymity or confidentiality or breach local data protection laws, for the dataset to be considered for publication. Authors must consider whether the dataset contains any direct or indirect identifiers (see here for further information) and consult their local ethics committee or other appropriate body before submission if there is any possibility that participants will not be fully anonymous. Authors must state in their manuscript on submission whether informed consent was obtained for publication of patient data. If informed consent was not obtained, authors must state the reason for this, and which body was consulted in the preparation of the dataset.

Software and code

Any previously unreported software application or custom code described in the manuscript should be available for testing by reviewers in a way that preserves their anonymity. The Availability of Data and Materials section of the manuscript should describe how reviewers can access the unreported software application or custom code. This section should include a link to the most recent version of the software or code (e.g. on GitHub or SourceForge) as well as a link to the archived version referenced in the manuscript. The software or code should be archived in an appropriate repository with a DOI or other unique identifier. For software hosted on GitHub, we recommend archiving with Zenodo. If published, the software application/tool should be readily available to any scientist wishing to use it for non-commercial purposes, without restrictions (such as the need for a material transfer agreement). If the implementation is not made freely available, then the manuscript should focus clearly on the development of the underlying method and not discuss the tool in any detail.


Statistical methods

Authors should include full information on the statistical methods and measures used in their research, including justification of the appropriateness of the statistical test used (see the SAMPL guidelines for more information). Reviewers will be asked to check the statistical methods, and the manuscript may be sent for specialist statistical review if considered necessary.

Resource identification

To enable effective tracking of the key resources used to produce the scientific findings reported in the biomedical literature, authors are expected to describe all resources in enough detail to allow them to be uniquely identified. In support of the Resource Identification Initiative (RII), we encourage authors to use Research Resource Identifiers (RRIDs) within their manuscript to identify their model organisms, antibodies, or tools.
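Authors who wish to check that RRIDs appear in their manuscript text could use something like the following sketch. The pattern is an assumed, deliberately loose approximation of how RRIDs are commonly written (the prefix "RRID:", an authority prefix, an underscore, then an identifier); it is not the official RRID grammar, and the function name `find_rrids` and the identifiers in the usage example are placeholders.

```python
import re

# Assumed loose pattern, not the official RRID syntax:
# "RRID:" + alphabetic authority prefix + "_" + identifier characters.
RRID_PATTERN = re.compile(r"RRID:[A-Za-z]+_[A-Za-z0-9:_-]+")


def find_rrids(text):
    """Return every RRID-like string found in `text`, in order of appearance."""
    return RRID_PATTERN.findall(text)


if __name__ == "__main__":
    # Placeholder identifiers for illustration; they do not refer to real resources.
    methods = ("Cells were stained with an antibody (RRID:AB_0000000) "
               "and analysed with a software tool (RRID:SCR_000000).")
    for rrid in find_rrids(methods):
        print(rrid)
```

A match shows only that a string looks like an RRID; whether it resolves to the intended resource still has to be confirmed against the RRID registry.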

If human cell lines are used, authors are strongly encouraged to include the following information in their manuscript:

·       The source of the cell line, including when and from where it was obtained

·       Whether the cell line has recently been authenticated and by what method

·       Whether the cell line has recently been tested for mycoplasma contamination

Further information is available from the International Cell Line Authentication Committee (ICLAC). We recommend that authors check the NCBI database for misidentification and contamination of human cell lines.

Gene nomenclature

Standardized gene nomenclature should be used throughout. Human gene symbols and names can be found in the HUGO Gene Nomenclature Committee (HGNC) database; requests for new gene symbols should be submitted here and any enquiries about gene nomenclature can be directed here. Alternative gene aliases that are commonly used may also be reported, but should not be used alone in place of the HGNC symbol. Nomenclature committees for other species are listed here.

Reporting of sequence variants

We endorse the recommendations of the Human Variome Project Consortium for describing sequence variants (Human Genome Variation Society) and phenotypes (Human Phenotype Ontology).

We recommend that authors submit all variants described in a manuscript to the relevant public gene- or disease-specific database (LSDB); a list is available here. The database URL and the unique identifier should be reported in the manuscript.

Data

To drive the maximum re-use and utility of published research, we expect authors to comply with available field-specific standards for the preparation and recording of data. Please see the BioSharing website for information on field-specific data standards. Authors must comply with best practice in their field for sharing of data, with particular attention to maintaining patient confidentiality.

Authors using unpublished genomic data are expected to abide by the guidelines of the Fort Lauderdale and Toronto agreements. Based on broadly accepted scientific community standards, the key requirement of third parties using genomic data is to contact the owners of unpublished data (i.e. the principal investigator and sequencing center) prior to undertaking their research, to advise them about their planned analyses.