Conversation about chatGPT

We had a nice conversation about chatGPT, the pluses, uses, minuses, concerns, alternatives, and all kinds of stuff.

Miro board with notes: Miro | Online Whiteboard for Visual Collaboration

We talked about the importance of the training data, and value in it. Here’s how chatGPT handles training data, with some concrete examples: OpenAI API

I recorded most of it with Otter, and then fed it to chatGPT to summarize (just to try it out). It did a bad job… it couldn’t really take in all of the discussion because it was too long (too many tokens!!!). I can’t upload the raw transcript either :\ But here are the links to the transcripts (I had to split it into two sets):


Also, just want to note that, according to their docs, you cannot (even with paid plans) increase the total number of tokens per request (which is 4096 for the fanciest model) - OpenAI API.
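One way around the limit for a long transcript is to split it into pieces that each fit in a request, summarize each piece, and then summarize the summaries. Here is a minimal sketch of just the splitting step. The `chunk_transcript` function, the 3000-token budget, and the 4-characters-per-token ratio are my own assumptions for illustration (the ratio is only a rule of thumb for English text, and a real tokenizer will give different counts).

```python
def chunk_transcript(text, max_tokens=3000, chars_per_token=4.0):
    """Split text into chunks of whole words that stay under an estimated token budget.

    The token count is only estimated from the character count, so leave some
    headroom below the real per-request limit.
    """
    max_chars = int(max_tokens * chars_per_token)
    chunks = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if len(candidate) > max_chars and current:
            # Adding this word would overflow the budget: close the chunk.
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    transcript = "so we talked about training data and tokens " * 2000
    pieces = chunk_transcript(transcript)
    print(len(pieces), "chunks")
```

Note that a single word longer than the budget would still end up as its own oversized chunk, which should not matter for a spoken transcript.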

You can improve the quality of your custom model by fine-tuning it (OpenAI API). Fine-tuning gives it query-answer pairs to help it provide more accurate answers. However, you can’t really just feed it a bunch of new bulk text (like all my meeting notes or something) and have it ‘learn’ from that the same way it ‘learned’ from the internet. Rather, you’d have to feed it only < 4096 tokens worth (which is not that much, roughly 3,000 words or a few pages).

So basically, we can’t really make a model for our local selves / groups per se, because (at least with how chatGPT is set up now) that seems like it would require a more significant retraining.

We can, however, train it to do specific things. One that I imagine would be really useful for our community is training data of the form `input: , output: …`, for example a farmer talking about his or her actions during that day, converted to a farm management data format (a JSON schema, maybe).
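To make the idea concrete, here is a rough sketch of what a couple of those training pairs might look like once written out as JSONL (one JSON object per line). Everything in it is hypothetical: the farmer notes, the field names (`activity`, `crop`, `location`, `quantity`), and the values are made up, and this is not an actual farm management schema. The `prompt` / `completion` key names are my assumption based on the fine-tuning docs as I read them, so check the current docs before relying on them.

```python
import json

# Hypothetical examples: notes, field names, and values are invented for illustration.
EXAMPLES = [
    {
        "input": "Planted two rows of garlic in the north bed this morning.",
        "output": {
            "activity": "planting",
            "crop": "garlic",
            "location": "north bed",
            "quantity": "2 rows",
        },
    },
    {
        "input": "Spread compost on the east field after lunch.",
        "output": {
            "activity": "amendment",
            "crop": None,
            "location": "east field",
            "quantity": None,
        },
    },
]


def to_jsonl(examples):
    """Turn input/output pairs into JSONL text, one prompt/completion record per line."""
    lines = []
    for ex in examples:
        record = {
            "prompt": ex["input"],
            # The structured record is stored as a JSON string the model should emit.
            "completion": json.dumps(ex["output"], sort_keys=True),
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_jsonl(EXAMPLES))
```

Keeping the completion as strict JSON means whatever comes back from the model can be parsed and checked against a schema before it goes anywhere near a database.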

This just reminded me of our conversation:

Was it Vic who mentioned a similar instance of getting a recommendation for a scholarly article that didn’t exist?

NYT just did a few pieces on the Bing chatbot. This is the long transcript of the conversation:
Follow-up articles were also interesting but this is the moment in time our future historians will be talking about… human or otherwise.

We discussed the possibility of replacing APIs or other interactions that have to be very manually plumbed…


Interesting newsletter on “open-ish-ness” in ML, and more broadly:

Reminds me, too, of @brianwdavis’s post:



The Distributed AI Research Institute has put up all their panel discussions from the Stochastic Parrots day event here. It’s super refreshing to hear these scholars talk clearly and casually about issues I care about (data governance & consent, data sourcing, worker exploitation) in an information field that depends so heavily on urgency & obfuscation.




I will check that out Vic, it’s on my watch list :)

I think I have a very scoped perspective on what I want these things to do…

I don’t much want a parrot to tell me how to manage my business, who to hire or fire, or what’s important in the world. If other people want to do that, god bless I suppose.

I DO want a parrot (ai) to connect to other dumber parrots (web services) so that they can communicate information to each other (APIs) without having to spend 500 hours of non-parrot time (humans who should be doing things that are meaningful to them) getting the dumber parrots (web services) to talk in the same dumb language (common schemas) rather than two separate dumb languages (two different schemas / APIs).

I ALSO want a parrot (ai) to take my words and put them into a form (db) so that other dumb parrots (web services) can use them when I want them to (permissioned), and so I can share stuff with my friends and their dumb parrots (apis and dbs).

Finally, I want to stop staring at (&$% screens all day so I can get @#$( dumb parrots to do things that I’m not even entirely sure I give two #$@#$ about and I’m becoming increasingly doubtful anyone else should either.


I would like for us to develop ways to connect web services with each other without the intermediary fiction that the pipeline is thinking. I do not understand how this tool (large language models/AI/chatGPT) does anything for the specific tasks we are trying to accomplish besides providing illusory cover for actors at either end of the process to convince ourselves it is doing something that it is not, in fact, doing.

  1. if greater schematization is the answer (and I’m not sure it always is tbh!!) then I don’t think ai is the right way to do it-- I want a traceable pipeline that can be consistently reconstructed & replicated all the way through the schema from either direction, every time. If greater schematization is not the answer, ai is DEFINITELY not going to help.

  2. there’s no way to ask a parrot to synthesize or condense information without de facto asking it what is important in the world - this is what synthesis is and does.

  3. if the objective is to reclaim more of our time, we’ve seen ample, conspicuous evidence that automation alone does not ever provide this (labor organizing does).
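As a rough illustration of what a traceable pipeline in point 1 could mean, here is a minimal sketch of an explicit, reversible field mapping between two services. The field names and the `translate` helper are hypothetical and do not reflect any real API. The point is only that every renamed field is written down, unmapped fields fail loudly, and a record survives a round trip in either direction.

```python
# Hypothetical field names for two services; neither reflects a real API.
FIELD_MAP = {
    "crop_name": "crop",
    "planted_on": "date",
    "field_id": "location",
}
# The reverse mapping is derived, so both directions always stay in sync.
REVERSE_MAP = {target: source for source, target in FIELD_MAP.items()}


def translate(record, mapping):
    """Rename the keys of a record using an explicit mapping.

    Raises KeyError on any field the mapping does not cover, so nothing is
    silently dropped or guessed.
    """
    unknown = set(record) - set(mapping)
    if unknown:
        raise KeyError(f"unmapped fields: {sorted(unknown)}")
    return {mapping[key]: value for key, value in record.items()}


if __name__ == "__main__":
    original = {"crop_name": "garlic", "planted_on": "2023-03-01", "field_id": "north-bed"}
    converted = translate(original, FIELD_MAP)
    restored = translate(converted, REVERSE_MAP)
    print(converted)
    print(restored == original)
```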


Me w/ my traceable pipeline / me outside touching grass

This is the kind of thing that keeps me excited –

In the last few weeks, it’s become clear that open source chatbots and general AI will win. The open source community, largely but not solely led by those at universities (which itself is a testament to the power of changing minds over time!), is creating models that are as good or better than the closed stuff. MIT licenses now abound!

This shows that the core training (fine tuning) can now be done overnight on desktop PCs, making very localized (think “Our GOAT community” or “Our Sci” or “OpenTEAM” level ai) possible today, with equivalent level responses to chatGPT 4.

That’s a big deal! That means @vicsf even if this is just about Parrots, then it’s OUR parrot. A parrot we use, just like all the other tech, to support us and our community to do our jobs better, support our community better, and help grow our movement faster. It’s a tool that we use in conjunction with other tools to support users, get feedback, decrease friction between services, etc.

It’s like I said at the beginning of this - we need to understand hammers and how we can use them. The brand will change, and I feel fortunate that open source is becoming the winning brand.

I seriously think we should create a GOAT level ai that we can train on information specifically from our services, and I would argue conversations that are happening within our services. An example is all the conversations I’m having with farmers - that would be an amazing resource for anyone wanting to know ‘how do farms use farm management systems’… Of course, right now they can ask me or Amber, but frankly we have limited time and they don’t even know to ask because not everyone knows the work was done.

In light of the major media attention given to Geoffrey Hinton’s resignation from Google, this is a really amazing interview with Yoshua Bengio, who shared the Turing Award w/ Hinton, along with pop-sci media darling Max Tegmark and Tawana Petty of the Algorithmic Justice League and formerly Data for Black Lives:

I strongly recommend this to my fellow GOATs who are interested in both AI’s technical promise as well as its ethical implications, particularly the part around the 20 min marker where Petty calls out the way women researchers and people of color have been sounding the alarm for years, only to be silenced (and occasionally fired) by the predominantly white male leaders in the AI field. After all, Hinton himself didn’t raise a finger of support for Timnit Gebru in 2020 when Google fired her for raising concerns about the negative social impacts of AI:


“Put another way, ChatGPT seems so human because it was trained by an AI that was mimicking humans who were rating an AI that was mimicking humans who were pretending to be a better version of an AI that was trained on human writing.”


There was a very good In-Depth today on IC-FOODS, which I recommend when the video posts to the OpenTEAM channel, but it led me to this short (4m5s) explainer video on Icicle, a related project out of OSU:

I think this could definitely pique the interest of a few folks on this thread, if you’re not already familiar.