I discuss my experience testing different AI systems prompting including Google Bard, OpenAI GPT-4 / GPT 3.5, Anthropic Claude 2, Llama 2, and Jasper to generate location-specific content. Most of this is based on the last 18 months of building out prompts, and now testing on models released over the last 4-6 weeks.
Google Bard
Released major update on July 13, 2023
Prompt strategy: Long paragraphs, numbered tasks, multiple iterations
Couldn't produce high quality content without heavy editing
Issues following instructions, needing reminders
OpenAI GPT-4
Works well with conversational, transcribed prompt
Able to follow directions and produce high quality content
No need for shot prompting
OpenAI GPT-3.5
Uses revised GPT-4 prompt plus follow up to enforce formatting
Gets content production-ready after second prompt
Quality close to GPT-4 with additional data/content provided
Anthropic Claude 2
No API access, using text interface
Required revising prompt structure significantly
XML tagging of data types improves context
Built-in prompt diagnosis/suggestions helpful
Single prompt can produce high quality output
Meta Llama 2
Free to use commercially if you have the hardware
Expected behavior similar to GPT-3.5
GPT-4 prompt worked well
Quality closer to GPT-3.5 but better privacy
Could refine with prompt chaining
Issues following instructions precisely
Jasper API
Access useful for building AI tools
Long prompt length capability
Appears to use GPT-4 or variant
Zero shot performs as well as GPT-4
Able to produce high quality content easily
Conclusion
GPT-4 and Jasper produce quality results most easily
Pleasantly surprised by Claude 2 quality and formatting of prompt
Llama 2 needs refinement to reach GPT-4 level
Curious about prompt strategies working across models
Podden och tillhörande omslagsbild på den här sidan tillhör
Philip Mastroianni. Innehållet i podden är skapat av Philip Mastroianni och inte av,
eller tillsammans med, Poddtoppen.