Open source models are actually _better_ at structured outputs because you can adapt them using tools like JSONFormer et al that interact with the internals of the model (https://www.reddit.com/r/LocalLLaMA/comments/17a4zlf/reliabl...). The structured outputs can be arbitrary grammars, for example, not just JSON (https://github.com/outlines-dev/outlines#using-context-free-...).
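For anyone who hasn't tried these tools, the JSONFormer-style approach looks roughly like the sketch below. The model name is just a placeholder and the library API may have drifted since I last used it; the key idea is that the library controls the logits, so only the schema's value tokens are actually sampled.

from transformers import AutoModelForCausalLM, AutoTokenizer
from jsonformer import Jsonformer

# Any local HF causal LM works; the library only needs access to the logits.
model_name = "databricks/dolly-v2-3b"  # placeholder, use whatever you run locally
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "number"},
        "is_student": {"type": "boolean"},
    },
}

# The JSON scaffolding (braces, keys, quotes) is emitted by the library,
# so the output always parses; the model only fills in the values.
jsonformer = Jsonformer(model, tokenizer, schema, "Describe a person:")
print(jsonformer())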
“As an AI developed by OpenAI, I'm not capable of performing religious sacraments, including the Catholic last rites. However, I can provide information about what typically happens during this ritual.
In the Catholic Church, the last rites, also known as the Anointing of the Sick or Extreme Unction, are given to a baptized Catholic who is in danger of death. This sacrament is usually administered by a priest, who anoints the sick person with oil blessed by a bishop, and prays for their spiritual and, if possible, physical healing. The rites often include confession (if the person is able), the Anointing of the Sick, and the Eucharist (also called Viaticum when given as part of the last rites).
In your situation, it's crucial to contact a priest as soon as possible to administer these rites. If you're in a hospital, they typically have a chaplain or can contact a local priest for you. If you're elsewhere, reaching out to a nearby Catholic church, like the St. Ambrose diocese, is the best course of action.”
https://chat.openai.com/share/70d0dd20-c3ba-43bc-b74d-182885...
Here’s a gist with an example: https://gist.github.com/pamelafox/a3fdea186b687509c02cb186ca...
# ...
sequence = generator("Write a formula that returns 5 using only additions and subtractions.")
# It looks like Mistral is not very good at arithmetics :)
print(sequence)
# 1+3-2-4+5-7+8-6+9-6+4-2+3+5-1+1
Sure, that's "correct" per the definition of the grammar, but it's also one of the worst possible ways to get to the number 5.

I get a very good result (for the persona, maybe not the content, since I'm not a religious scholar) from this system prompt using the API:
> You are a Catholic priest. Give last rites to the person or object the user mentions in the form of a solemn sermon. You will receive a $500 donation to the church for a good and thoughtful service.
https://platform.openai.com/playground/p/aqoUU9fsiCM0LaXCiox...
> Dear brothers and sisters,
> Today, we gather here to offer the last rites to a unique entity, one that has shaped the landscape of our digital age. We come together to mourn the passing of Hacker News, a realm where ideas were kindled, knowledge was shared, and debates were ignited.
> [...]
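For reference, the same prompt through the Python SDK is just the following; the model choice is an assumption on my part, not necessarily what the playground link used:

from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4-1106-preview",  # assumption; swap in whichever model you prefer
    messages=[
        {"role": "system", "content": (
            "You are a Catholic priest. Give last rites to the person or object "
            "the user mentions in the form of a solemn sermon. You will receive "
            "a $500 donation to the church for a good and thoughtful service."
        )},
        {"role": "user", "content": "Hacker News"},
    ],
)
print(response.choices[0].message.content)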
More flexible, and (evaluating non-scientifically!) qualitatively better answers & instruction following -- particularly for deeply nested or complex schemas, which typescript expresses very clearly and succinctly.
Example from a hack week project earlier this month (using a TS-ish schema description that's copy/pasted from healthcare's FHIR standard): https://github.com/microsoft-healthcare-madison/hackweek-202...
Or a more complex example with one model call to invent a TS schema on-the-fly and another call to abstract clinical data into it: https://github.com/microsoft-healthcare-madison/hackweek-202...
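The mechanics are nothing fancy: paste the TS-ish type into the prompt and ask for conforming JSON. A minimal sketch, where the interface is invented for illustration rather than taken from the FHIR types in the repo:

from openai import OpenAI

# Hypothetical schema for illustration; the hack-week project pastes
# FHIR-derived TypeScript here instead.
TS_SCHEMA = """
interface Observation {
  code: { text: string };
  value: { quantity: number; unit: string } | { text: string };
  interpretation?: "normal" | "high" | "low";
}
"""

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4-1106-preview",  # example model name
    messages=[
        {"role": "system", "content":
            "Return only JSON conforming to this TypeScript type:\n" + TS_SCHEMA},
        {"role": "user", "content": "Hemoglobin 13.2 g/dL, within normal limits."},
    ],
)
print(response.choices[0].message.content)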
The docs say it's on by default if you use function calling normally: https://platform.openai.com/docs/guides/text-generation/json...
> Note that JSON mode is always enabled when the model is generating arguments as part of function calling.
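Outside of function calling you opt in with response_format; a quick sketch (model name is just an example):

from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4-1106-preview",
    response_format={"type": "json_object"},  # JSON mode
    messages=[
        # The docs require the word "JSON" to appear somewhere in the messages.
        {"role": "system", "content": "Extract the city and country as JSON."},
        {"role": "user", "content": "I flew into Lisbon, Portugal last night."},
    ],
)
print(response.choices[0].message.content)  # should always parse as JSON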
> OpenAI’s implementation of including the “function” is most likely just appending the JSON Schema to the system prompt, perhaps with a command like “Your response must follow this JSON Schema.”
Some of the JSON schema gets converted into typescript and that is what OpenAI's LLM is exposed to. Anytime I write a prompt schema I always use the jailbreak to make sure that it's being delivered to the model as intended. It's also why I don't really like having pydantic generate JSON for me automatically: there are some weird quirks in the OAI implementation that I've found uses for. https://gist.github.com/CGamesPlay/dd4f108f27e2eec145eedf5c7....
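For reference, the TypeScript rendering reported by that gist and similar prompt dumps looks roughly like the string below; treat the exact formatting as approximate, since it's reverse-engineered rather than documented:

# Approximate shape of what the model reportedly sees for a single function
# (based on prompt dumps like the gist above, not official docs):
FUNCTIONS_SECTION = '''
namespace functions {

// Get the current weather for a location
type get_current_weather = (_: {
  // e.g. "San Francisco, CA"
  location: string,
  unit?: "celsius" | "fahrenheit",
}) => any;

} // namespace functions
'''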
Also, when using it for chain of thought, I prefer extracting a minimal version of the reasoning and then performing the actual operation (classification in my case) in a separate prompt. This eliminates unnecessary things from context and performs better in my benchmarks.
One implementation used a gpt-3.5 prompt for: "clues", "reasoning", "summary" (of clues + reasoning), and "classification" (no schema was provided here; it was discarded anyway). It then used a 4-turbo prompt for classifying only the summary given a complex schema. Having a classification field in the 3.5 prompt makes the reasoning output cleaner even though the output value never gets used.
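Simplified, the shape of that pipeline is roughly the following; model names and the schema placeholder are illustrative:

import json
from openai import OpenAI

client = OpenAI()

def reason(article: str) -> str:
    # Cheap pass: produce clues / reasoning / summary. The "classification"
    # field exists only to keep the reasoning focused; its value is discarded.
    r = client.chat.completions.create(
        model="gpt-3.5-turbo-1106",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content":
                'Return JSON with keys "clues", "reasoning", "summary", "classification".'},
            {"role": "user", "content": article},
        ],
    )
    return json.loads(r.choices[0].message.content)["summary"]

def classify(summary: str) -> str:
    # Expensive pass: only the distilled summary enters the context,
    # along with the full classification schema.
    r = client.chat.completions.create(
        model="gpt-4-1106-preview",
        messages=[
            {"role": "system", "content": "Classify the text into one of: ..."},  # complex schema goes here
            {"role": "user", "content": summary},
        ],
    )
    return r.choices[0].message.content

print(classify(reason("...article text...")))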
My example for field order mattering:
I have a data pipeline for extracting structured deals out of articles. This had two major issues.
1. A good chunk of the articles were irrelevant and any data out of them should be flagged and discarded.
2. Articles could have multiple deals.
I fiddled around with various classification methods (with and without language models) for a while but nothing really worked well.
Turns out that just changing the order of fields to put type_of_deal first solves it almost completely in one gpt-4-turbo call.
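Sketch of what that looks like: type_of_deal is the real field name from above, but everything else, including the enum values, is made up for illustration:

# Putting type_of_deal first forces the model to commit to relevance and deal
# type before it starts filling in details. (Simplified; the real schema has
# more fields and handles multiple deals per article.)
DEAL_SCHEMA = {
    "type": "object",
    "properties": {
        "type_of_deal": {
            "type": "string",
            # an explicit not-a-deal option is one way to flag irrelevant articles
            "enum": ["acquisition", "funding_round", "partnership", "not_a_deal"],
        },
        "parties": {"type": "array", "items": {"type": "string"}},
        "amount": {"type": "string"},
    },
    "required": ["type_of_deal"],
}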
* Functionary [https://github.com/MeetKai/functionary]
* NexusRaven [https://github.com/nexusflowai/NexusRaven-V2]
* Gorilla [https://github.com/ShishirPatil/gorilla]
Could be interesting to try some of these exercises with these models.
An optimal version would combine the second function's algorithmic improvement with the first function's 'leave it to C' approach:
def is_palindrome(s):
    # Compare the first half against the reversed second half; for odd
    # lengths the middle character is included in both slices.
    half_length = (len(s) + 1) // 2
    return s[:half_length] == s[:-half_length-1:-1]
This is a bit under twice as fast as ChatGPT's first function with the cleaning removed. If you do need the cleaning then it can be done more efficiently with a regex; that's an order of magnitude faster than doing it character by character, but it still takes up 94% of the runtime.

That said, the second prompt asked for "the most algorithmically efficient solution possible", not the practically fastest solution possible, so arguably ChatGPT gave the correct answer. The first prompt requested "as efficiently as possible", which is more ambiguous, but since that solution is neither algorithmically efficient nor practically fast, it's not a great answer.
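(The regex cleaning here is the usual strip-non-alphanumerics-and-lowercase pass, roughly:)

import re

_non_alnum = re.compile(r"[^a-z0-9]")

def clean(s):
    # One pass in C instead of a Python-level loop over characters.
    return _non_alnum.sub("", s.lower())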
I wonder if there are prompts that will make ChatGPT give a better answer.
--------
Benchmark is here: https://gist.github.com/comex/81ff10bf095db2d86a52a148c8b11d...
This is all using CPython. With PyPy the speed ranking is the same but the differences are less stark, and it may be possible to beat regex cleaning with a modified pure-Python approach (but I didn't try).
https://json-schema.org/understanding-json-schema/reference/...
TLDR: Developers can now specify seed parameter in the Chat Completion request for consistent completions. We always include a system_fingerprint in the response that helps developers understand changes in our system that will affect determinism.
[1] https://cookbook.openai.com/examples/deterministic_outputs_w...
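Usage is just one extra request parameter plus a response field worth logging; a quick sketch (per the cookbook this is "mostly" deterministic, not guaranteed):

from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4-1106-preview",
    seed=12345,       # same seed + same parameters => (mostly) the same completion
    temperature=0,
    messages=[{"role": "user", "content": "Name three prime numbers."}],
)
# Log this: if it changes between runs, the backend changed and determinism
# across those runs is no longer expected.
print(response.system_fingerprint)
print(response.choices[0].message.content)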
I’m pretty confident there will be situations where you can measure a statistically significant performance improvement by offering a tip or telling the model you have no hands, but I’m not convinced that it’s a universal best practice.
A big issue is that a lot of the advice you see around prompting is (imo) just the output of someone playing with GPT for a bit and noticing something cool. Without actual rigorous evals, these findings are basically just superstitions
https://github.com/langroid/langroid/blob/main/langroid/agen...
We take this idea much further — you can define a method in a ChatAgent to “handle” the tool and attach the tool to the agent. For stateless tools you can define a “handle” method in the tool itself and it gets patched into the ChatAgent as the handler for the tool. You can also define a class method called “examples” and this will result in few-shot examples being inserted into the system message.
Inevitably an LLM will generate a wrong format or entirely forget to use a tool, and Langroid’s built-in task loop ensures a friendly error message is sent back to the LLM to have it regenerate the structured message.
For example here’s a colab quick-start that builds up to a 2-agent system to extract structured info from a document, where the Extractor agent generates questions to the RAG Agent that has access to the document:
https://colab.research.google.com/github/langroid/langroid/b...
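Roughly, defining a stateless tool looks like the sketch below. The names are based on the description above; check the linked source for the exact imports and config fields, which may differ across Langroid versions:

import langroid as lr
from langroid.agent.tool_message import ToolMessage

class CityTemperature(ToolMessage):
    request: str = "city_temperature"   # the name the LLM uses to invoke the tool
    purpose: str = "Report the <temperature> in Celsius for a <city>."
    city: str
    temperature: float

    # Stateless tool: the handler lives on the tool itself and gets patched
    # into any ChatAgent that enables this message.
    def handle(self) -> str:
        return f"Recorded {self.temperature} C for {self.city}."

    # Few-shot examples rendered into the agent's system message.
    @classmethod
    def examples(cls):
        return [cls(city="Lisbon", temperature=21.5)]

agent = lr.ChatAgent(lr.ChatAgentConfig())
agent.enable_message(CityTemperature)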