Hi all, we’ve been looking into standardized schemas and reference lists (or ontologies) for storing soil lab test (or in-field test) data. This post is a summary of our investigation, both for posterity’s sake and to see if folks in the know (@jherrick @DanT @kanedan29 @mstenta @TomWatson and many others) have any feedback.
Application details
- Support tools that send samples to a lab, return data from a lab, or even ingest a spreadsheet from a lab into a standard format.
- Support the ability to tell which test methods are comparable: include both the exact lab method (e.g. LOI carbon) and the measured output with its units (e.g. g carbon / kg soil).
- Provide a clear schema for creating and storing entries consistently across services and labs.
- Example applications: storage of soil data in farm information management systems (like farmOS), movement of data to and from labs, and use of data by end-use applications like benchmarking and modeling.
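As a rough illustration of the "exact method plus measured output" requirement, a stored result could carry the method and unit next to the value so that a tool can refuse to compare unlike results. This is a minimal Python sketch only; the `SoilTestResult` class, its field names, the `comparable` helper, and the example values are hypothetical and not taken from any of the specs reviewed here.

```python
from dataclasses import dataclass


# Hypothetical record shape for one soil test result. Field names are
# illustrative assumptions, not taken from MODUS or any other spec.
@dataclass(frozen=True)
class SoilTestResult:
    sample_id: str
    analyte: str  # what was measured, e.g. "organic carbon"
    method: str   # exact lab method, e.g. "LOI" or "dry combustion"
    value: float
    unit: str     # unit of the measured output, e.g. "g/kg"


def comparable(a: SoilTestResult, b: SoilTestResult) -> bool:
    """Treat two results as comparable only if analyte, method, and unit all match."""
    return (a.analyte, a.method, a.unit) == (b.analyte, b.method, b.unit)


# Illustrative, made-up values.
loi = SoilTestResult("S1", "organic carbon", "LOI", 21.0, "g/kg")
dc = SoilTestResult("S2", "organic carbon", "dry combustion", 18.5, "g/kg")
loi2 = SoilTestResult("S3", "organic carbon", "LOI", 24.0, "g/kg")

print(comparable(loi, dc))    # False: same analyte, different method
print(comparable(loi, loi2))  # True: same analyte, method, and unit
```

Keying comparability on the method as well as the analyte is what keeps an LOI carbon value from being benchmarked against a dry-combustion one.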
So far, MODUS is the only spec really designed to handle intake/export for labs.
Left to do
- follow up with FarmLab in Australia, who may know of other standards
- follow up with Vic to see if NEON has a more standard schema buried in there
- follow up with Jason, who maintains MODUS, to get more details on its use.
Review of sources
MODUS
Link: https://bitbucket.org/modus/
Modus has an XML schema and a list of terms. Both are very practical and still kept up to date. Based on discussions with Jason (who maintains the current lists), many large-scale labs use the Modus spec to transfer bulk data to and from the lab, mainly with companies that do large-scale sampling.
Pluses:
- very nicely curated, well-documented list of terms that clearly separates entries by comparability (for example, soil carbon has several entries depending on the method; this is critical if we want to benchmark like against like).
- simple, well-designed schema for “Result” (returning data) and “Submit” (pushing data).
- Is still actively maintained
- Is easy to push updates to (send them to Jason, and he updates the Bitbucket repo!)
- Overall, a good design fit for the uses we’re interested in (soilstack push/pull to labs, farmOS ingesting lab data, coffee shop benchmarking lab data, etc.).
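To make the push/pull idea concrete, here is a minimal sketch of ingesting an XML results file into flat records using Python's standard `xml.etree.ElementTree`. The `<Results>`/`<Sample>`/`<Test>` layout, its attributes, the sample values, and the `parse_results` helper are all invented for illustration and do not reflect the actual Modus Result schema; a real importer would follow the schema published in the Modus repo.

```python
import xml.etree.ElementTree as ET

# Hypothetical lab-results XML. Element and attribute names are invented for
# illustration and do NOT reflect the real Modus Result schema.
SAMPLE_XML = """
<Results>
  <Sample id="S1">
    <Test analyte="organic carbon" method="LOI" unit="g/kg">21.0</Test>
    <Test analyte="pH" method="1:1 water" unit="pH">6.4</Test>
  </Sample>
  <Sample id="S2">
    <Test analyte="organic carbon" method="dry combustion" unit="g/kg">18.5</Test>
  </Sample>
</Results>
"""


def parse_results(xml_text: str) -> list[dict]:
    """Flatten each <Test> element into a dict that keeps method and unit next to the value."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for sample in root.iter("Sample"):
        for test in sample.iter("Test"):
            rows.append({
                "sample_id": sample.get("id"),
                "analyte": test.get("analyte"),
                "method": test.get("method"),
                "unit": test.get("unit"),
                "value": float(test.text),
            })
    return rows


for row in parse_results(SAMPLE_XML):
    print(row)
```

Flattening to one row per test (rather than per sample) keeps the method attached to every value, which is the property a downstream tool like farmOS ingest or benchmarking would rely on.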
Minuses:
- US-specific; unclear whether it’s used internationally
- XML isn’t the most modern format
- Not maintained by a community (really just one person at this point)
NEON Soil Archive
Link: https://data.neonscience.org/documents
Suggested by Tom H., this is more of a soil archive. They do use standard formats, but those formats aren’t published in an accessible way (there’s no clear JSON/XML/other schema that can be easily used). Their formats are quite general (they collect many different types of data), and they don’t seem to have a list of lab test types the way Modus does (which is central to our application).
Overall, they have a great soil library and accessible data, but the formatting doesn’t seem to be a fit.
European Soils Database
Links:
Again, this is mostly a database of soils, but it does contain its own schema. However, that schema doesn’t seem to have been designed with updates or expansion in mind. It does contain more detailed lab measurements, but it doesn’t reference specific methods (extraction methods, DOIs, etc.), which are required to determine comparability between samples (e.g. distinguishing LOI from dry combustion).
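The comparability problem can be shown with a small sketch: when the method is recorded, a benchmarking tool can average only within matching (analyte, method, unit) groups, so LOI and dry-combustion carbon values are never pooled. The data and the `benchmark_by_method` helper are hypothetical; without a method field, all four values below would collapse into a single group.

```python
from collections import defaultdict
from statistics import mean

# Illustrative, made-up results as (analyte, method, unit, value) tuples.
RESULTS = [
    ("organic carbon", "LOI", "g/kg", 21.0),
    ("organic carbon", "LOI", "g/kg", 24.0),
    ("organic carbon", "dry combustion", "g/kg", 18.5),
    ("organic carbon", "dry combustion", "g/kg", 19.5),
]


def benchmark_by_method(results):
    """Average values only within groups that share analyte, method, and unit."""
    groups = defaultdict(list)
    for analyte, method, unit, value in results:
        groups[(analyte, method, unit)].append(value)
    return {key: mean(values) for key, values in groups.items()}


for key, avg in benchmark_by_method(RESULTS).items():
    print(key, avg)
```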
Pluses:
- has an existing large dataset that uses this format
Minuses:
- the format is not regularly updated (or clearly updatable)
- there is no actual schema; it’s just a database with a published list of categories in XLS and DOC format.