Format

What's the book format where you found this issue?

- [ ] pdf
- [x] web
- [ ] ipynb
Chapter
In what chapter did you find this issue?
2 - Structured Output
Issue Description
A few minor comments:
"Pos-training" -> Post-training
On fine-tuned JSON approaches, the book says: "JSON mode is typically a form of fine-tuning, where a base model went though a post-training process to learn target formats. However, while useful this strategy is not guaranteed to work all the time." However, from the OpenAI docs: "While both ensure valid JSON is produced, only Structured Outputs ensure schema adherance [sic]." My understanding and experience is that the response is guaranteed to adhere to the provided schema; presumably they go beyond fine-tuning to some of the other approaches in their Structured Outputs mode. Hopefully the same is true of other LLMs that offer structured output. Of course it doesn't guarantee integrity beyond valid JSON matching the provided schema: you can't force a length constraint or an enumerated type, and you could still get e.g. a refusal. TL;DR: just use JSON mode if available and it should guarantee good JSON? And beyond that, what is best practice, presumably outlines/instructor? https://platform.openai.com/docs/guides/structured-outputs
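The distinction above can be sketched in a few lines: a reply can be perfectly valid JSON (JSON mode's guarantee) while still failing the caller's schema (Structured Outputs' extra guarantee). This is a minimal illustration using Pydantic for validation; the `Reply` model and the sample string are hypothetical, not from the chapter.

```python
# Valid JSON vs. schema adherence: JSON mode only promises the former.
import json

from pydantic import BaseModel, ValidationError


class Reply(BaseModel):
    name: str
    age: int


raw = '{"name": "Ada", "age": "unknown"}'  # syntactically valid JSON

json.loads(raw)  # succeeds: this is all "valid JSON" guarantees

try:
    Reply.model_validate_json(raw)  # fails: "unknown" is not an int
except ValidationError:
    print("valid JSON, but not schema-adherent")
```

A provider offering schema-guaranteed structured output has to enforce the second check at generation time, not just the first.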
It might be worth mentioning that Pydantic just serves as a convenient, readable way to generate the JSON Schema that goes into the REST API call (maybe obvious).
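Concretely, the point above is that the Pydantic model is only a readable front end: the API ultimately receives a plain JSON Schema dict. A quick sketch, with a hypothetical `Book` model:

```python
# Pydantic class -> the raw JSON Schema dict that the REST call carries.
from pydantic import BaseModel


class Book(BaseModel):
    title: str
    chapter: int


schema = Book.model_json_schema()
# `schema` is an ordinary dict you could also write by hand and paste
# straight into the request body.
print(sorted(schema["properties"]))  # ['chapter', 'title']
```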
From the original OpenAI blog post: "Structured Outputs takes inspiration from excellent work from the open source community: namely, the outlines, jsonformer, instructor, guidance, and lark libraries." (https://archive.is/KIRVL#selection-19231.0-19240.0). The Applied LLMs folks mention: "(If you're importing an LLM API SDK, use Instructor; if you're importing Huggingface for a self-hosted model, use Outlines.)" (https://applied-llms.org/)