Session: RDF/Knowledge graph

RDF and knowledge graphs

Thank you @jtwood for facilitating!!

Resources

  • Schema.org: Schema.org is a collaborative, community activity with a mission to create, maintain, and promote schemas for structured data on the Internet, on web pages, in email messages, and beyond.

Interests/Whys (popcorn around group)

  • Using RDF to semantically describe the API coming out of farmOS, going beyond JSON:API
  • Higher level conventions that bubble up lower-level data into semantic representations
  • Can RDF be used to help make collaborative standard definitions
  • Data schemas and how they relate to methodologies, tagging, understanding potential
  • Boils down to shared meaning: we should share meaning along with the data → “semantics”
  • Relationship aspect is interesting, connections, relationships: ability to visualize relationships between datasets
  • How can it help reach a social consensus on what is happening in the real world
  • Talking about the “shape” of data, typically you write software to change/transform the shape of data → can RDF help us make a consistent shape for data? Not worry about transforming shape of data, only focus on transmission of data.
  • Pay the cost of interoperability up front

Deeper dive on why

  • Global data aggregation, ecological state analysis, help come to novel conclusions
  • Global scientific database eg: aggregate data about two different grazing systems
  • Fewer data silos, more decentralized data stores
  • Data is useless → conclusions are useful :: Syntax is useless, semantics is useful
  • Usually sit at desk using data, without understanding how hard it is to collect all that data
  • Data for purpose
  • Be able to collect the bare minimum data for what you are doing
  • Talk about how much we want to use data & interoperate; shared meaning will make this useful
  • Example: here is a landmass, what data do I know about this space? Actionable things are all associated with this landmass
  • Visualization, storytelling, statistical modeling
  • There are different whys for different businesses/communities

What

  • RDF:
  • Example: “Who is the King of England?” … Wikidata vs Wikipedia. Wikipedia is human-readable, but it is hard for a computer to “use” Wikipedia. Wikidata, however, makes its data available as RDF, so computers can traverse the knowledge graph to answer the question.
  • Way of taking database schemas, socially aligning the metadata
  • Failures of coordination have prevented RDF from becoming more used
  • Concrete example (Mike farmOS example):
    • Log: I seeded my lettuce here, the lettuce is now associated in this location
    • Schema.org has a “thing”, and an “action” is a “thing”, and a “movement” is an “action”
    • Could we represent this log as a schema.org “movement”?
  • Conventions for a “soil test” → could this be described in a semantic way?
    • Rothamsted convention, etc, let people build their own conventions
  • If you had this in an RDF format, still, who would use this? You could have done this already
    • We could have, but didn’t need to build integrations/share meaning at that time
  • Coffee shop demo that used farmOS….
    • A good example of custom “conventions” for data coming in & going out

Barriers

  • Aaron's core point: we want alignment among the standards, but incentives are not aligned for this to happen
  • What are some actionable things to make this happen
  • Framework has been around for a long time, but the standards/schemas have not
  • Bigger issues of justice:
    • We’re inclined to figure out the barriers, doing this as technologists in a small group
    • Need to make the space for other unrepresented peoples to come in, in a cooperative way
  • Need funding to make something happen, but may drive it in a different direction, not always good
  • Everything on the web should be semantic, every web document should return things like this
  • Top down planning/boiling the ocean
  • Let's try bottom up. Here is a very specific thing I need to describe…
  • Business models are often structured with how you extract money from data
  • A lot more effort to do this, labor costs
  • Example: want to define ground truth of carbon, derived from models
  • Open Ag Data Alliance: tried to put standards together in XML or something, but it fizzled out
  • Semantic consensus in data standardization - this is core to huge business models. If we can make this less expensive, does this screw with our people?

Benefits

  • SurveyStack use case:
  • Defines what tillage is by asking questions about what a tillage event is. Later, it pulls this back out. Input and output are tightly controlled. If I enter my tillage event in a different way, it will not work in the coffee shop demo
  • If the tillage event has a semantic standard/meaning, then it is easier to save & share data in this format
  • RDF is a machine-readable format

Enablers

  • Ability to selectively share only some pieces of a dataset (schema is a prerequisite for this, but definitely requires business models to implement this capability)
  • We need better software libraries and tooling to help use RDF
  • Need an argument for why this is the best solution
  • Graph element
  • Are there other solutions that are better for interoperability?
  • The graph part is a good part
  • Tooling isn’t as good
  • At an early stage it seems like overkill, but adding it on later makes it complex
  • Methods for consensus setting
  • Top down deciding to bottom up… Marcus example
  • Funding specific tools for developers

Final questions:

  • Is access control critical to this?
  • Is Regen using RDF? In parts of the stack. You can use RDF tooling to canonicalize a dataset, get a hash, and save it on-chain
  • Are there connections between what we need to build?
    • YES
    • Methodology developer that has data in farmOS, we would use that data in Regen blockchain data module
    • MODUS lab test data
  • Are there other groups outside of ag that are using RDF successfully?
    • Maybe science/research?
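
The “canonicalize, get a hash, save on-chain” idea mentioned above can be sketched as follows. This is a simplified illustration, not Regen's actual implementation: it assumes the dataset contains no blank nodes, so deduplicating and sorting the N-Triples lines is enough to get a stable serialization, whereas real RDF canonicalization algorithms also have to label blank nodes consistently. The example statements and the `dataset_hash` function name are hypothetical.

```python
import hashlib


def dataset_hash(ntriples_lines):
    """Hash a set of N-Triples statements independently of their order.

    Simplifying assumption: no blank nodes, so deduplicate + sort is a
    sufficient canonical form for this sketch.
    """
    statements = {line.strip() for line in ntriples_lines if line.strip()}
    canonical = "\n".join(sorted(statements))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = [
    "<https://example.org/log1> <https://schema.org/object> <https://example.org/lettuce> .",
    "<https://example.org/log1> <https://schema.org/toLocation> <https://example.org/bed3> .",
]
b = list(reversed(a))  # same statements, different order

# Two parties holding the same statements get the same hash, so only the
# hash needs to be stored on-chain to later prove what the data said.
print(dataset_hash(a) == dataset_hash(b))  # True
```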

Follow up on this with OpenTEAM tech working group on Tuesday calls…

@Chroma_Signet @paul121 @clevinson @anna @mstenta @aaronc @jgaehring @bharman @jtwood @GOATforMikeD

@paul121 - Thanks for the notes. Could you explain how an RDF approach would differ from the translation approach that has also been discussed? They seem to have a common objective: enabling data interoperability between standards and systems. It would be good to have a simple explanation of the differences and their potential as solutions.

@chrowe @mikahpinegar - it was above my pay grade, but you guys seemed to see a way to test a translation approach. Is this RDF discussion helpful towards this?

Photos of easel notes https://photos.google.com/u/0/photo/AF1QipP8qRLT30g3EjugzzRVnRKkyrTXh93_Eg9a2rf_

For context about the translation idea, see this session: Aggregating Data. @mstenta, are they talking about what you were talking about?

I missed the Aggregating Data session, but I’m assuming that the “translating data” idea was along the lines of the “API switchboard” discussions we’ve had in the past? Basically: if we had a common standard data format, and every service provided a translation into that format, then a “switchboard” could simplify translating from service A -> common format -> service B as a way of integrating A and B.
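
A minimal sketch of that switchboard idea, assuming two made-up services: the record shapes, field names, and translator functions below are all hypothetical. Each service only provides a translation to and from the common format, and the switchboard composes them.

```python
CM_PER_INCH = 2.54


# Hypothetical service A stores tillage depth in centimetres under "depth".
def a_to_common(record):
    return {"type": "tillage", "date": record["when"], "depth_cm": record["depth"]}


def a_from_common(common):
    return {"when": common["date"], "depth": common["depth_cm"]}


# Hypothetical service B stores tillage depth in inches under "depth_in".
def b_to_common(record):
    return {
        "type": "tillage",
        "date": record["timestamp"],
        "depth_cm": record["depth_in"] * CM_PER_INCH,
    }


def b_from_common(common):
    return {"timestamp": common["date"], "depth_in": common["depth_cm"] / CM_PER_INCH}


# With N services this needs 2*N translators instead of N*(N-1) pairwise ones.
TRANSLATORS = {
    "service_a": {"to_common": a_to_common, "from_common": a_from_common},
    "service_b": {"to_common": b_to_common, "from_common": b_from_common},
}


def translate(record, source, target):
    """Route a record from source to target through the common format."""
    common = TRANSLATORS[source]["to_common"](record)
    return TRANSLATORS[target]["from_common"](common)


record_a = {"when": "2022-03-01", "depth": 10.0}
print(translate(record_a, "service_a", "service_b"))
```

The translators are the easy part; agreeing on what goes into the common format (here, that “tillage” has a date and a depth in centimetres) is where shared conventions come in.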

RDF could certainly help with this process! I think of it as one of (or part of) the technical solutions to the translation problem. The bulk of the work would be in defining the “conventions” for different types of data records, though, which in theory is a never-ending process. But if we pick some simple pieces to start with we could start small and test this approach with some specific data, then grow from there…

I’m curious what @chrowe @mikahpinegar @paul121 @jgaehring and others think, since they were in those sessions as well.

I mentioned this in the aggregation session, but Ink & Switch’s Cambria project is a pretty neat proof-of-concept that just uses JSON Schema. The PI, Geoffrey Litt, did express some reservations about JSON Schema’s limitations. I don’t know if RDF would contribute a higher level of interpretation that could achieve similar design aims, but perhaps that’s something worth exploring.

The research essay is excellent and fairly accessible with a minimum of technical knowledge:

And the project on GitHub:
