zlacker

Great article. The helpful/flawed bools for thoughts are definitely something I want to try.

>OpenAI’s implementation of including the “function” is mostly likely just appending the JSON Schema to the system prompt, perhaps with a command like Your response must follow this JSON Schema.

Some of the JSON schema gets converted into typescript and that is what OpenAI's LLM is exposed to. Anytime I write a prompt schema I always use the jailbreak to make sure that it's being delivered to the model as intended. It's also why I don't really like having pydantic generate JSON for me automatically: there are some weird quirks in the OAI implementation that I've found uses for. https://gist.github.com/CGamesPlay/dd4f108f27e2eec145eedf5c7....

Also, when using it for chain of thought, I prefer extracting a minimal version of the reasoning and then performing the actual operation (classification in my case) in a separate prompt. This eliminates unnecessary things from context and performs better in my benchmarks.

One implementation used a gpt-3.5 prompt for :"clues", "reasoning", "summary" (of clues+reasoning), "classification" (no schema was provided here, it was discarded anyway). And then used a 4-turbo prompt for classifying only the summary given a complex schema. Having a classification field in the 3.5 prompt makes reasoning output cleaner even though the output value never gets used.

My example for field order mattering:

I have a data pipeline for extracting structured deals out of articles. This had two major issues.

1. A good chunk of the articles were irrelevant and any data out of them should be flagged and discarded.

2. Articles could have multiple deals.

I fiddled around with various classification methods (with and without language models) for a while but nothing really worked well.

Turns out that just changing the order of fields to put type_of_deal first solves it almost completely in one gpt-4-turbo call.