Unconference session: Linked Data & Ontologies

clevinson · November 24, 2024, 9:32pm

Summary

This was a hybrid unconference session on Thursday (11/7/24) in the stage / patio. Topics discussed included:

tools & mechanisms for leveraging AI to turn unstructured data (e.g. natural language / plain text) into structured data (when explicit schemas for structured datasets already exist)
general discussion around use cases & value proposition of using linked data & RDF as a tool for data interchange & interoperability between GOAT projects
evaluating community need for a meta-registry of ontologies used within the GOAT / open ag tech communities

Action Items

We discussed using the existing weekly AI meetings organized by Our-Sci as an immediate container for these conversations. This work will likely warrant spinning out a separate working group or set of regular calls if there is enough interest in continuing the conversation, but for now there seemed to be interest in using the existing calls until there is critical mass for a separate community call.

Raw Notes

Rob: When I had talked about ontology and linked data — a reason I had wanted to talk about it is that it’s an interesting way to standardize things without necessarily agreeing with people
Let’s say you have “Corn”; in the UK you have “Maize”. There can be multiple labels, and these labels can be relational
- Shipping wheat, the manifests are always shipping pieces of paper. Sometimes someone says “that’s not right” and crosses it out and changes it and writes the correct label.
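To illustrate the "multiple labels for one thing" idea, here is a minimal Python sketch. The concept identifiers (`ex:ZeaMays`, `ex:TriticumAestivum`), the locale tags, and the function names are all made up for this example and do not come from any ontology mentioned in the session. The point is only that two parties can keep their own labels and still agree on what is being referred to.

```python
# Hypothetical concept identifiers and labels, for illustration only.
CONCEPT_LABELS = {
    "ex:ZeaMays": {"en-US": ["corn"], "en-GB": ["maize"]},
    "ex:TriticumAestivum": {"en-US": ["wheat"], "en-GB": ["wheat"]},
}


def build_label_index(concept_labels):
    """Map each lower-cased label to the set of concepts that use it."""
    index = {}
    for concept_id, labels_by_locale in concept_labels.items():
        for labels in labels_by_locale.values():
            for label in labels:
                index.setdefault(label.lower(), set()).add(concept_id)
    return index


LABEL_INDEX = build_label_index(CONCEPT_LABELS)


def same_concept(label_a, label_b):
    """Return True if the two labels share at least one concept."""
    concepts_a = LABEL_INDEX.get(label_a.lower(), set())
    concepts_b = LABEL_INDEX.get(label_b.lower(), set())
    return bool(concepts_a & concepts_b)
```

With this index, `same_concept("Corn", "maize")` is true while `same_concept("corn", "wheat")` is false, which also covers the case of knowing when two things are definitely not the same.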
Robert: it’s important to know when things are most definitely not the same thing
Cory: it’s important to distinguish between an ontology and a JSON Schema
- ontologies are used to map concepts and vocabularies together in a formal, but extensible and open way
- tools like JSON Schema are used for data validation, to ensure that a given dataset has the required fields, and that those fields match certain criteria. Given that a JSON dataset adheres to a given schema, it can then be inferred how that dataset fits within a larger ontological structure, and what other data it is related to
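A rough Python sketch of that distinction follows. It uses a hand-rolled required-field and type check as a stand-in for a real JSON Schema validator, and the schema, field names, and `https://example.org/onto#...` terms are all hypothetical. Validation answers "is this record well-formed?", while the separate field-to-term mapping answers "what does each field mean in a shared vocabulary?".

```python
# Hypothetical schema: required fields and their expected Python types.
HARVEST_SCHEMA = {"crop": str, "yield_kg": (int, float), "field_id": str}

# Hypothetical mapping from field names to ontology terms (made-up IRIs).
FIELD_TO_TERM = {
    "crop": "https://example.org/onto#Crop",
    "yield_kg": "https://example.org/onto#YieldInKilograms",
    "field_id": "https://example.org/onto#Field",
}


def validate(record, schema):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for field, expected_type in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"wrong type for field: {field}")
    return problems


def annotate(record, schema, field_to_term):
    """Re-key a record by ontology term, but only if it passes validation."""
    problems = validate(record, schema)
    if problems:
        raise ValueError("; ".join(problems))
    return {field_to_term[field]: record[field] for field in schema}
```

Only once a record passes `validate` does `annotate` place its values under ontology terms, mirroring the idea that schema conformance is what lets you infer where a dataset fits in the larger structure.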
How do we bring this closer to the farmer? Many of these conversations at GOAT have been about bringing stuff closer to the farmer
- Open futures coalition: We have multiple working groups having these conversations, and have spent a lot of time thinking about how we can standardize
  - A lot of our realization is that it’s really best to build stuff in as usable way as possible for end users, and then on the backend try and standardize the systems
Rob: The problem with local slang is that it can be interpreted very differently in different contexts
There’s still an educational component where folks are coming up with their master list for the first time, what kind of workflows can we use to help people better navigate this space?
Greg: The solution to a lot of these things is actually well known. If you have many different systems interacting with different data, you need some commonality in how they are structuring data
- Even if you have a common standard, there’s still a lot of pairwise API integrations that would need to get built
- Maybe, AI can dramatically reduce the effort involved in creating those connections
  - We could actually maybe get on the same page
  - Many folks that I know use AirTable. What if you could have something smart enough to store something and ask questions to connect the dots, engaging this ontology development world with the AirTable world?
- AI can be useful also in being agile and supporting with integration work itself
Cory: and additionally, AI could also be used to build ontologies
An AI could be really good at “losing weight”, as AI systems are really just a large graph of words, and an ontology is just a much simpler graph of words
Rob: If we took everybody here and asked them to come up with an ontology for Wheat, the results would be both completely reasonable and completely incompatible with each other
- I think an LLM would actually have too many ways to screw it up
- I wrote a wheat ontology to describe malts, and someone got really upset at me for it
- The one thing that LLMs are really good at is giving descriptions to things
This week, Open Futures Coalition has a platform where folks can create articles & upload files, which can all be vectorized, and an AI chat assistant that uses RAG to help projects engage with this material
- We have multiple parallel instances with different master lists, but we’re asking ourselves to what extent we index this explicitly, and how we do that?
- One of our devs suggested recently that we remove our tagging system because the AI is so good at being able to navigate all the articles given a search query
- So what started as a technical conversation ended up becoming a usability conversation
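For readers unfamiliar with the retrieval step behind a RAG setup, here is a toy Python sketch. It is not the platform's implementation: it uses bag-of-words counts as a stand-in for a real embedding model, and the article titles and text are invented. It only shows how a search query can be matched against article content directly, without explicit tags.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, articles, k=2):
    """Return up to k article titles most similar to the query."""
    query_vec = vectorize(query)
    scored = [(cosine(query_vec, vectorize(body)), title)
              for title, body in articles.items()]
    scored.sort(reverse=True)
    return [title for score, title in scored[:k] if score > 0]


# Invented example articles.
ARTICLES = {
    "Cover crops": "Cover crops such as clover and rye protect soil between cash crops.",
    "Weed scouting": "Scouting fields for weeds early makes weed control cheaper.",
    "Soil sampling": "Soil sampling depth and timing affect lab results for soil carbon.",
}
```

In a full RAG pipeline the retrieved articles would then be passed to a language model as context for answering the question; that step is omitted here.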
Let’s take weeds…
- Supposing I send something out into a field for it to do a scan of the ground, and identify as many weeds as possible-
Cory: I’d love to have a reference
There are 3 different components here:
- Structural compatibility between our systems
- How are we aligning on vocabularies & ontologies (and contents) — is there a metadirectory that is useful across tools?
- How can we support our communities through UX or curricular or other approaches about how specific communities can navigate through their own ontologies and potentially build their own
There’s a weekly AI meeting where we’ve been sharing stuff that we have been working on, and our processes
It might not be a bad idea for an ontology group to meet

clevinson · November 24, 2024, 9:54pm

One other discussed action item, which I’d love to take the initiative on leading, is creating a mapping of all the existing data APIs and data schemas used by various GOAT community projects, and consolidating that information into a single resource. This can probably start as a spreadsheet, and eventually get migrated to a GitHub repo where projects can maintain records linking out to their existing API or data schema docs, and track compatibility across formats.
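As a starting point for discussion, here is one possible shape for such a record, sketched in Python. The field names, project names, URLs, and format labels are all placeholders I made up, not an agreed structure; the same columns would work equally well in a spreadsheet.

```python
from dataclasses import dataclass, field


@dataclass
class RegistryEntry:
    """One project's record in a hypothetical API / schema registry."""
    project: str
    docs_url: str
    formats: set = field(default_factory=set)


def shared_formats(entries):
    """For each pair of projects, list the formats that both declare."""
    result = {}
    for i, a in enumerate(entries):
        for b in entries[i + 1:]:
            common = a.formats & b.formats
            if common:
                result[(a.project, b.project)] = sorted(common)
    return result


# Placeholder entries; real ones would link to each project's own docs.
ENTRIES = [
    RegistryEntry("project-a", "https://example.org/a/docs", {"json-schema", "csv"}),
    RegistryEntry("project-b", "https://example.org/b/docs", {"json-ld", "csv"}),
    RegistryEntry("project-c", "https://example.org/c/docs", {"json-ld"}),
]
```

Calling `shared_formats(ENTRIES)` gives a first, crude view of which projects could exchange data without a new integration.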

A few questions:

Would the GOATech GitHub org be an appropriate place for this? Or is it better to start in spreadsheets and discuss GitHub later?
Is this kind of work something that can tap into the 11th Hour seed grant that @sudokita and @samuejao shared during the GOAT Rodeo session?

Symbioquine · November 25, 2024, 3:00pm

Not a criticism of the note-taking fidelity, but I recall that, for example, @Lalitha was at this one and asked good questions that aren’t credited here. Maybe we can try and at least tag the other attendees and invite them to add anything they recall was missed?

Symbioquine · November 25, 2024, 3:13pm

*(Those are the ones, not already tagged, who I can map to forum members from memory.)

clevinson · November 27, 2024, 6:41pm

Yes, apologies; it was hard to juggle note taking with fully participating in the conversation myself. I’d love supplemental reflections / thoughts from others on parts of the conversation that I didn’t fully capture.

jgaehring · November 30, 2024, 4:54pm

If this happens, count me in!

Cory, I can add you to the GOATech org on GitHub if you think you want to go that route. Or even if you don’t use it for this. Otherwise that org is not being used for much, so if yer ever in need of a non-partisan community space to host a project, it’s there.