We talked about the importance of training data and the value in it. Here's how ChatGPT handles training data, with some concrete examples: OpenAI API
I recorded most of it with Otter, then fed it to ChatGPT to summarize (just to try it out). It did a bad job: it couldn't take in the whole discussion because the transcript was too long (too many tokens!). I also can't upload the raw transcript here :\ But here are the links to the transcripts (I had to split them into two sets):
Also worth noting: per their docs, you cannot (even on paid plans) increase the maximum number of tokens per request, which is 4,096 for the most capable model - OpenAI API.
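One workaround for that per-request cap is to split a long transcript into chunks that each fit under the limit and summarize them separately. Here's a minimal sketch; it approximates token counts as words / 0.75 (a common rule of thumb, roughly one token per 0.75 words) rather than using a real tokenizer, and the function name and `reserve` parameter are my own invention:

```python
def split_into_chunks(text, max_tokens=4096, reserve=500):
    """Split text on paragraph boundaries so each chunk stays under
    max_tokens - reserve (reserve leaves room for the prompt and reply)."""
    budget = max_tokens - reserve
    chunks, current, current_tokens = [], [], 0
    for para in text.split("\n\n"):
        # Rough token estimate: ~1 token per 0.75 words.
        para_tokens = int(len(para.split()) / 0.75) + 1
        if current and current_tokens + para_tokens > budget:
            chunks.append("\n\n".join(current))
            current, current_tokens = [], 0
        current.append(para)
        current_tokens += para_tokens
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Note a single paragraph longer than the budget would still produce an oversized chunk; a more careful version would also split within paragraphs. A proper tokenizer (e.g. OpenAI's tiktoken library) would give exact counts.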
You can improve the quality of a custom model by fine-tuning it (OpenAI API). Fine-tuning takes prompt-completion pairs so the model can give more accurate answers for your use case. However, you can't just feed it a bunch of new bulk text (like all my meeting notes or something) and have it 'learn' from that the same way it 'learned' from the internet. And anything you include in a single request is still capped at 4,096 tokens (which is not that much, a few pages of text).
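For reference, here's a sketch of what preparing that fine-tuning data looks like, assuming the prompt-completion JSONL format described in OpenAI's fine-tuning guide (one JSON object per line; the guide suggests a fixed separator at the end of each prompt and a leading space on each completion). The example pairs are made up:

```python
import json

# Hypothetical prompt-completion pairs drawn from meeting notes.
pairs = [
    ("What did we decide about meeting cadence?",
     "We agreed to meet biweekly on Tuesdays."),
    ("Who owns the transcript cleanup task?",
     "Alex volunteered to clean up the Otter transcripts."),
]

def to_jsonl(pairs):
    """Serialize (prompt, completion) pairs, one JSON object per line."""
    lines = []
    for prompt, completion in pairs:
        lines.append(json.dumps({
            "prompt": prompt + "\n\n###\n\n",  # separator per the guide
            "completion": " " + completion,    # leading space per the guide
        }))
    return "\n".join(lines)
```

Even then, each individual training example has to fit within the model's token limit, so this shapes how the model answers rather than bulk-loading new knowledge into it.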
So basically, we can't really make a model of our local selves/groups, because (at least as ChatGPT is set up now) that would require much more significant retraining.