Machine learning to translate data between structures

gbathree · January 24, 2023, 4:48pm

Hi GOAT, OT and other folks, I have a bit of a challenge:

There’s a lot of excitement about pre-populating forms with data that was previously submitted… possibly pulling from an older version of the same survey (CFDN, Sea Coast Eat Local, basically anyone who’s doing enrollment forms that will be updated), possibly pulling from another survey altogether.

I’ve been playing with ChatGPT and I’m kind of amazed at its ability to put things into the right box using AI. Obviously this isn’t a ChatGPT problem, but I’m wondering if there are good tools that use fuzzy logic or other general machine learning techniques to figure out which information from a previous survey (or schema) would likely belong in which data fields of the current survey (or schema). This is something that QuickBooks (in the US) does every year (they pull in all your last year’s info).
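To make the fuzzy-matching idea concrete, here is a minimal sketch using only Python's standard-library `difflib`. It compares field names only, not the answers stored in them, and it is not an existing SurveyStack feature. The function names, the example field names, and the 0.6 threshold are all hypothetical choices made for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase a field name, drop punctuation, and sort its words."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(sorted(tokens))


def similarity(a: str, b: str) -> float:
    """Character-level similarity (0.0 to 1.0) of two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def suggest_mapping(old_fields, new_fields, threshold=0.6):
    """For each new field, suggest the most similar old field, or None.

    The threshold is an arbitrary assumption; anything below it is left
    unmapped so a person can decide.
    """
    mapping = {}
    for new in new_fields:
        best = max(old_fields, key=lambda old: similarity(old, new), default=None)
        if best is not None and similarity(best, new) >= threshold:
            mapping[new] = best
        else:
            mapping[new] = None
    return mapping


# Hypothetical field names from an old and a new version of a survey.
old_survey = ["farm_name", "contact_email", "total_acres", "primary_crop"]
new_survey = ["Farm Name", "Email (contact)", "Acres total", "Irrigation type"]

suggestions = suggest_mapping(old_survey, new_survey)
for new_field, old_field in suggestions.items():
    print(f"{new_field!r} <- {old_field!r}")
```

Renamed or reordered labels such as "Email (contact)" still line up with `contact_email`, while a genuinely new question like "Irrigation type" comes back as `None` and is left for a human to handle. Name similarity alone will miss synonyms (for example "hectares" versus "acres"), which is where embedding-based or language-model approaches would presumably have to take over.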

If generalizable, it could be used for SurveyStack surveys, schemas (like OpenTEAM conventions or other conventions), etc. This may also be handy in trying to bulk transfer info from our DB to another, or even for DB migrations (imperfect, but perhaps it can make logical recommendations). Also, because it’s solving a general problem impacting lots of people, it’s likely to attract more investment or interest.

Do you think there’s machine learning code to do this kind of work (and what and where if so!), and what conditions would make this more or less likely to succeed? @paul121 @jgaehring @3wordchant @tibetsprague @wgardiner @ircwaves may be interested in this question.

jgaehring · January 24, 2023, 8:10pm

I can’t find her to @ her here on the forum but Rian Wanstreet might be a good person to talk to. I believe she worked on or knew people involved with Mozilla’s Common Voice project prior to joining OpenTEAM.

I haven’t looked at “open” options for machine learning in a long while, and generally lost interest when I realized any specialized applications of ML (by “specialized”, I mean anything that requires rolling your own training data, models and/or algorithms) are way more trouble than they’re worth for solving anything like the kinds of problems I encounter in my day-to-day work on agtech. It’s just overkill, and usually a simple regression analysis would be more appropriate for the small- to medium-sized datasets I’m likely to ever be working with. And the lack of truly free and open datasets and models (Common Voice being a notable exception) discouraged me from seeking more generalized ML approaches, akin to the one I think you’re describing. But that assessment of mine is several years old now, and I’d be really interested to know what options exist now. Obviously, ChatGPT is a major recent development with some potential to change that landscape, but my currently low bandwidth combined with an entrenched skepticism has stopped me taking the time to find out if the service-level/licensing agreements underpinning those services are any better than the state of the art, say, 5 years ago, let alone if any of it could be applicable to anything I’m working on. Still skeptical, but hope to be proved wrong!

gbathree · January 27, 2023, 4:15pm

Rian works at OT now so I can ask her!

Common Voice was awesome, but it was voice recognition more than anything else.

I think it’s something we all need to be thinking more about. Based on using it, I don’t think it’s vaporware. The open source question is very, very real, but the world will move regardless, so we need to think about this either way.

jgaehring · January 27, 2023, 5:22pm

Reminds me this popped up on my feed a few days ago and I still need to give it a listen:

I’ve thought off-and-on over the years that it’d be great for the GOAT community to connect somehow with people/orgs doing AI work of this nature, like Gebru’s Distributed AI Research Institute or similar:

Perhaps the time has come? This stuff is so way over my head, both the actual technology and the vast amount of scholarship on the ethical and sociological implications. I agree AI definitely deserves consideration from the ag community, if nothing else b/c the potential for negative outcomes and abuse seems especially high for our industry. I feel those considerations also require a real dialogue with the experts, though, if we hope to reach any meaningful conclusions about its possibilities for ag-tech.