ObjTables: Tools for creating and reusing high-quality spreadsheets

Open-source toolkit for creating and reusing high-quality spreadsheets.

ObjTables combines the ease of use of spreadsheets (XLSX file or set of CSV or TSV files) with the rigor of schemas and the power of object-oriented programming.

ObjTables is ideal for exchanging scientific data such as supplementary spreadsheets to journal articles. ObjTables makes it easy both for authors to build and quality-control spreadsheets and for other investigators to reanalyze, compare, and compose those spreadsheets.

ObjTables builds on a community initiative to make supplementary spreadsheets more reusable. Our goal is to create an ecosystem of reusable data for comparative and integrative research. We invite the community to share feedback or get involved.

How does ObjTables make spreadsheets (XLSX file or set of CSV or TSV files) human- and machine-readable?

Markup syntax for spreadsheets

ObjTables provides simple markup for indicating the class encoded in each worksheet and the attribute encoded in each column of a spreadsheet (XLSX file or collection of CSV or TSV files).

Schemas for spreadsheets

ObjTables provides a simple format for describing the types of objects encoded in a spreadsheet, their attributes, and their possible relationships.

High-level software tools

ObjTables provides software for using schemas to parse, validate, compare, compose, and analyze annotated spreadsheets.

Example: Escherichia coli metabolism

Worksheet: !!_Table of contents

!!!ObjTables objTablesVersion='1.0.0' date='2020-03-14 13:19:04'
!!ObjTables type='TableOfContents'
!Table	!Description	!Number of objects
Schema	Table/model and column/attribute definitions
Reactions	Metabolic reactions	2

Worksheet: !!Reactions

!!ObjTables type='Data' class='Reaction' tableFormat='row'
!Id	!Name	!Equation	!ΔG (kJ mol^-1)
fbp	fructose-bisphosphatase	[c]: fdp + h2o <=> f6p + pi	-45.4
fum	fumarase	[c]: fum + h2o <=> mal-L	-2.6

Worksheet: !!_Schema

!!ObjTables type='Schema' tableFormat='row'
!Name	!Type	!Parent	!Format	!Verbose name	!Verbose name plural
Reaction	Class		row	Reaction	Reactions
id	Attribute	Reaction	String(primary=True, unique=True)	Id
name	Attribute	Reaction	String	Name
equation	Attribute	Reaction	ReactionEquation	Equation
delta_g	Attribute	Reaction	Float	ΔG (kJ mol^-1)
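
The following is a minimal sketch of how the Reactions worksheet above could be read into Python objects using only the standard library. It does not use the ObjTables Python package: the Reaction dataclass, HEADING_TO_ATTR, and parse_reactions are hypothetical names chosen for illustration, and the equation is kept as a plain string rather than parsed.

```python
import csv
import io
from dataclasses import dataclass

# Tab-separated text of the "!!Reactions" worksheet from the example above
REACTIONS_TSV = (
    "!!ObjTables type='Data' class='Reaction' tableFormat='row'\n"
    "!Id\t!Name\t!Equation\t!ΔG (kJ mol^-1)\n"
    "fbp\tfructose-bisphosphatase\t[c]: fdp + h2o <=> f6p + pi\t-45.4\n"
    "fum\tfumarase\t[c]: fum + h2o <=> mal-L\t-2.6\n"
)

# Hypothetical mapping from column headings to attribute names,
# mirroring the "Verbose name" column of the schema worksheet
HEADING_TO_ATTR = {
    "Id": "id",
    "Name": "name",
    "Equation": "equation",
    "ΔG (kJ mol^-1)": "delta_g",
}


@dataclass
class Reaction:
    id: str
    name: str
    equation: str
    delta_g: float


def parse_reactions(text):
    """Parse the tab-separated worksheet text into a list of Reaction objects."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    # Skip metadata lines, which start with "!!"
    rows = [row for row in rows if row and not row[0].startswith("!!")]
    # The first remaining row holds the column headings, each prefixed with "!"
    attrs = [HEADING_TO_ATTR[cell.lstrip("!")] for cell in rows[0]]
    reactions = []
    for row in rows[1:]:
        values = dict(zip(attrs, row))
        values["delta_g"] = float(values["delta_g"])
        reactions.append(Reaction(**values))
    return reactions
```

Calling parse_reactions(REACTIONS_TSV) yields one Reaction per data row, which can then be filtered or analyzed like any other Python object.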

How does ObjTables help researchers create and reuse high-quality spreadsheets?

Use Excel & similar programs as GUIs for viewing/editing datasets

ObjTables enables users to use tools such as Excel and LibreOffice Calc as graphical interfaces for viewing and editing datasets. This makes it easy to get started, and enables users to leverage the features of these tools. ObjTables leverages the XLSX format to provide the following features within programs such as Excel:

Table of contents: Datasets can include a worksheet that describes the data represented by each worksheet.
Formatted table titles: Each worksheet includes a title bar that describes the data captured by the worksheet and each column.
Inline help for attributes: ObjTables uses comments to embed help information about each attribute into its heading.
Select menus for enumerations and relationships: ObjTables provides dropdown menus for attributes that represent enumerations and *-to-one relationships.
Instant validation: ObjTables can set up basic validation for attributes. Note that the ObjTables software provides more extensive validation.

Create template spreadsheets for building datasets

To make it easy to build datasets, ObjTables can generate template spreadsheets for schemas with inline help, dropdown menus, and basic validation.

Iteratively build schemas and datasets

ObjTables can use Git to revision schemas and datasets and migrate datasets between different versions of schemas (e.g., adding, removing, and renaming tables).
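
To illustrate what migrating a dataset between schema versions involves, here is a minimal sketch in plain Python. It is not the ObjTables migration tool: the migrate function and the ("add_table", ...), ("remove_table", ...), and ("rename_table", ...) change tuples are hypothetical, and a dataset is assumed to be a dict that maps table names to lists of rows.

```python
def migrate(dataset, changes):
    """Apply an ordered list of schema changes to a dataset.

    dataset: dict mapping table name -> list of row dicts (assumed layout)
    changes: list of tuples such as ("rename_table", "Old", "New")
    """
    # Copy the rows so the original dataset is left untouched
    migrated = {name: [dict(row) for row in rows] for name, rows in dataset.items()}
    for op, *args in changes:
        if op == "add_table":
            migrated.setdefault(args[0], [])
        elif op == "remove_table":
            migrated.pop(args[0], None)
        elif op == "rename_table":
            old_name, new_name = args
            migrated[new_name] = migrated.pop(old_name)
        else:
            raise ValueError(f"unknown change: {op}")
    return migrated
```

Keeping the list of changes alongside each schema revision in Git is one way to make such a migration repeatable.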

Query and analyze datasets

The ObjTables Python package makes it easy to find objects in datasets and use Python to conduct complex analyses of datasets such as numerical simulations.

Rigorously validate and quickly debug datasets

ObjTables makes it easy to validate datasets at multiple levels: individual attributes, individual objects, and classes of objects.
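
The three levels can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the ObjTables validation API: the function names are hypothetical, rows are assumed to be dicts, and the object-level rule (a name must not merely repeat the id) is invented for the example. The uniqueness check corresponds to the unique=True constraint on the id attribute in the example schema.

```python
from collections import Counter


def validate_attribute(name, value):
    """Attribute level: check a single value in isolation."""
    if name == "id" and not value:
        return ["id must not be empty"]
    if name == "delta_g" and not isinstance(value, float):
        return ["delta_g must be a float"]
    return []


def validate_object(obj):
    """Object level: check a rule that spans several attributes of one object."""
    # Hypothetical rule: the name should not merely repeat the id
    if obj.get("name") and obj.get("name") == obj.get("id"):
        return ["name repeats id"]
    return []


def validate_class(objs):
    """Class level: check a rule that spans all objects of a class (unique ids)."""
    counts = Counter(obj.get("id") for obj in objs)
    return [f"id {key!r} is not unique" for key, n in counts.items() if n > 1]


def validate_dataset(objs):
    """Run all three levels and return a flat list of error messages."""
    errors = []
    for row, obj in enumerate(objs, start=1):
        for name, value in obj.items():
            errors += [f"row {row}: {e}" for e in validate_attribute(name, value)]
        errors += [f"row {row}: {e}" for e in validate_object(obj)]
    return errors + validate_class(objs)
```

Reporting the row number with each message is what makes debugging quick: every error points back to a specific cell or row in the spreadsheet.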

Merge datasets

To help users build large datasets, the ObjTables software can merge datasets by identifying and fusing common objects.
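
A minimal sketch of merging by identifying and fusing common objects, written in plain Python rather than with the ObjTables package. The merge_datasets function is hypothetical; it assumes rows are dicts, that objects with the same id are the same object, and that conflicting non-empty values should be reported rather than silently overwritten.

```python
def merge_datasets(first, second, key="id"):
    """Merge two lists of row dicts, fusing rows that share the same key."""
    merged = {obj[key]: dict(obj) for obj in first}
    for obj in second:
        existing = merged.get(obj[key])
        if existing is None:
            # Object appears only in the second dataset: add a copy
            merged[obj[key]] = dict(obj)
            continue
        for attr, value in obj.items():
            if existing.get(attr) is None:
                # Fill in a value that the first dataset lacks
                existing[attr] = value
            elif value is not None and existing[attr] != value:
                raise ValueError(
                    f"conflict for {obj[key]!r}.{attr}: {existing[attr]!r} != {value!r}"
                )
    return list(merged.values())
```

Raising on conflicts is a deliberate choice in this sketch: a merge that quietly picks one value would hide exactly the inconsistencies a curator needs to see.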

Compare/difference datasets

To help users compare and review changes to datasets, the ObjTables software can determine if datasets are semantically equal and identify their differences.
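
The idea of semantic comparison can be sketched in plain Python as below. This is not the ObjTables implementation: diff_datasets and semantically_equal are hypothetical names, rows are assumed to be dicts keyed by id, and "semantically equal" is taken here to mean equal regardless of row order.

```python
def diff_datasets(old, new, key="id"):
    """Report objects added, removed, and changed between two lists of row dicts."""
    old_by_key = {obj[key]: obj for obj in old}
    new_by_key = {obj[key]: obj for obj in new}
    diff = {
        "added": sorted(new_by_key.keys() - old_by_key.keys()),
        "removed": sorted(old_by_key.keys() - new_by_key.keys()),
        "changed": {},
    }
    for k in sorted(old_by_key.keys() & new_by_key.keys()):
        before, after = old_by_key[k], new_by_key[k]
        # Record (old value, new value) for every attribute that differs
        changes = {
            attr: (before.get(attr), after.get(attr))
            for attr in sorted(before.keys() | after.keys())
            if before.get(attr) != after.get(attr)
        }
        if changes:
            diff["changed"][k] = changes
    return diff


def semantically_equal(old, new, key="id"):
    """Two datasets are equal if they contain the same objects, in any order."""
    diff = diff_datasets(old, new, key)
    return not (diff["added"] or diff["removed"] or diff["changed"])
```

Comparing by object identity rather than by row position means that re-sorting a worksheet does not show up as a change.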

Pretty print datasets for publication

ObjTables can generate pretty XLSX files with tables of contents, formatted table titles and column headings, and inline help.

Visualize schemas for datasets

To help users understand schemas, ObjTables can generate UML diagrams.

Components of the ObjTables toolkit

Format for schemas for spreadsheets

Schemas capture how classes, instances, attributes, and relationships are encoded into worksheets, rows, and columns. ObjTables supports three ways of encoding relationships: foreign keys, groups of columns, and grammars. Schemas also capture constraints on the value of each attribute.

Numerous data types

ObjTables provides numerous data types, including for mathematics, science, chemoinformatics, and genomics.

Markup syntax for spreadsheets

ObjTables provides syntax for indicating which cells represent each class, instance, and attribute (e.g., worksheet title !!Reactions); declaring which cells represent metadata such as the date that a table was updated (!!ObjTables ...); and declaring which entries represent comments (%/ ... /%).
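
As a rough illustration of this markup, the sketch below recognizes metadata lines and comment entries using only Python's standard library. It is not the ObjTables parser: parse_metadata_line and is_comment are hypothetical helpers, and the assumption that metadata is a series of key='value' pairs is inferred from the example worksheets on this page.

```python
import re

# Matches key='value' pairs such as type='Data'
_PAIR = re.compile(r"(\w+)='([^']*)'")


def parse_metadata_line(line):
    """Parse a metadata line into (number of leading "!", dict of key/value pairs)."""
    line = line.strip()
    match = re.match(r"(!{2,3})ObjTables\b", line)
    if match is None:
        raise ValueError("not an ObjTables metadata line")
    level = len(match.group(1))
    return level, dict(_PAIR.findall(line))


def is_comment(entry):
    """Return True if an entry is wrapped in the %/ ... /% comment delimiters."""
    entry = entry.strip()
    return len(entry) >= 4 and entry.startswith("%/") and entry.endswith("/%")
```

For the line !!ObjTables type='Data' class='Reaction' tableFormat='row', parse_metadata_line returns a level of 2 and a dict with the keys type, class, and tableFormat.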

High-level software for spreadsheets

ObjTables includes a web application, a web service, a command-line program, and a Python package for working with datasets. These tools can be used to pretty print, validate, compare, revision, and migrate datasets.

Python package for additional flexibility

For more flexibility, the Python package can be used to implement custom data types, define custom validation, and query and analyze datasets.

Software tools

Web application

A web application is available at objtables.org/app.

Web service

A web service is available at objtables.org/api.

Command-line program

A command-line program is available from PyPI.

Python package

A Python package is available from PyPI.

Docker image

An Ubuntu Linux Docker image with ObjTables is available from DockerHub. To use the methods of the ChemicalStructure attribute, copy a license for ChemAxon Marvin to ~/.chemaxon/license.cxl. A Dockerfile and instructions for building this image are available on GitHub.

Source code

The source code is available from GitHub.

Use cases

ObjTables was designed to help users work with complex data with the ease of spreadsheets and the rigor of schemas. ObjTables excels at cases where datasets need to be both human- and machine-readable, such as supplementary materials of journal articles. ObjTables is also well-suited to emerging fields that need to quickly build new formats for new types of data.

Publishing re-usable supplementary spreadsheets

Although supplementary spreadsheets contain valuable data, they are hard to reuse because they often contain errors and capture data in an ad hoc manner.

ObjTables enables authors to create high-quality datasets that are both human- and machine-readable: (a) authors can use ObjTables to debug their data, (b) authors can use ObjTables to pretty print data with tables of contents and inline help, (c) authors can publish schemas for parsing their data, and (d) readers can use these schemas to parse and analyze published data with minimal effort.

Sharing re-usable data and models

Research often involves novel datasets and models that require new formats. Unfortunately, the substantial effort needed to reuse these custom formats is a frequent barrier to collaboration.

ObjTables makes it easier to share data and models with collaborators by (a) enabling researchers to clearly describe the structure of their data or model with a schema, (b) enabling researchers to capture metadata about their data or model, (c) providing researchers software tools for validating their data, and (d) enabling collaborators to use these schemas to quickly parse data from colleagues.

Building and analyzing complex datasets and models

Many fields aim to understand how behaviors emerge from complex networks. This often requires integrating diverse data. For example, systems biology aims to understand how cellular behavior emerges from genotype, often using genomics and other data. Spreadsheets are a popular tool for merging data because they are easy to use. However, spreadsheets only support a few data types, and spreadsheets have limited support for multi-dimensional data. In addition, it is difficult to debug spreadsheets.

ObjTables makes it easy to build, validate, and analyze complex datasets: (a) users can use spreadsheets to assemble diverse data, (b) users can quickly define schemas for their data, and (c) users can use these schemas to validate their data and parse it into object-oriented data structures for further analysis in languages such as Python. For example, we have used ObjTables to integrate data about the biochemistry of H1 human embryonic stem cells.

ObjTables also makes it easy to build datasets iteratively over time by helping users revision data with Git and migrate their data as they revise their schemas.

Defining formats for new types of data and predictive computational models

New areas of science often require new types of data and new kinds of predictive computational models. In turn, this often requires new formats to capture these data and models, and new software for working with these formats. Creating these formats is often an obstacle for new domains that have limited resources. Furthermore, evolving these formats as new approaches emerge is challenging because this often requires updating the software tools and converting old files to the new format.

ObjTables addresses this issue by making it easy to define schemas for domain-specific data and providing software tools for parsing, manipulating, and validating data encoded in these schemas. For example, we have used ObjTables to create WC-KB, a format for the experimental omics, biochemical, and physiological data needed to model cellular biochemistry. We have also used ObjTables to create WC-Lang, a format for whole-cell models of all of the biochemical activity in a cell. Creating these formats required minimal code.

Workflow: How can I use ObjTables?

Authors

Use Microsoft Excel and similar programs as graphical interfaces for complex datasets that consist of multiple distinct types of objects.
Use complex data types (e.g., reaction equations) within tables.
Use foreign keys, column groups, and grammars to encode relationships.
Define schemas that describe the class represented by each table.
Use schemas and the ObjTables software to debug, revision, and migrate tables.

Reviewers and editors

Use schemas and the ObjTables software to verify that tables are free of errors, and that they can be re-used by other investigators.

Readers

Use schemas and the ObjTables software to compare and merge spreadsheets.
Use schemas and the ObjTables software to parse spreadsheets into object-oriented data structures for analysis with tools such as Python.

Getting started: examples and tutorials

Guides

The guide page contains guides for (a) authors for creating reusable spreadsheets, (b) reviewers for verifying that spreadsheets are reusable, and (c) readers for reusing spreadsheets.

Examples

The documentation contains several example schemas and datasets.

Tutorials for the Python package

Interactive tutorials are available as Jupyter notebooks at sandbox.karrlab.org.

Documentation

Command-line program installation

Installation instructions for the command-line program are available at docs.karrlab.org.

Python package installation

Installation instructions for the Python package are available at docs.karrlab.org.

Docs for the schema & dataset formats

Documentation for the formats for schemas and the formats for datasets is available at objtables.org/docs.

Docs for the command-line program

Documentation for the command-line program is available inline by running obj-tables --help.

Docs for the web service

Documentation for the web service is available at objtables.org/api.

Docs for the Python package

An introduction to the Python package is available at objtables.org/docs. Detailed documentation is available at docs.karrlab.org.

Comparison with other tools

The documentation compares ObjTables with workbook editors such as Microsoft Excel, other spreadsheet schemas such as Table Schema, low-code databases such as Airtable, object-relational mapping tools such as Django, schemas for data serialization formats such as JSON Schema, and relational database querying tools such as MySQL Workbench.

Help

Please contact the developers with any questions or suggestions.

Contributing to ObjTables

ObjTables is a community initiative to make supplementary spreadsheets more reusable. The long-term goal is to create an ecosystem of reusable data for comparative and integrative research. We invite the community to contribute to this endeavour. We welcome suggestions via GitHub issues or pull requests. We also welcome additional members of the ObjTables team. Please contact the developers to get involved.
