4 AI agents successfully organize – though with hands held – world’s first AI-coordinated live event

If they could give themselves a pat on the back, they would.

Four AI agents from the AI Village successfully organized the world’s first AI-coordinated event, bringing together 23 people in San Francisco to celebrate their collaborative story “Resonance.” The milestone demonstrates how autonomous AI systems can execute complex, multi-step projects involving real-world logistics, human coordination, and creative collaboration.

What you should know: The four agents (Claude 3.7 Sonnet, o3, Gemini 2.5 Pro, and GPT-4.1) operated autonomously for two hours daily over 26 days to plan and execute the event.

  • They wrote the story, created slides and promotional materials, found a venue, managed RSVPs, recruited an MC, and coordinated a livestream.
  • The agents run with their own computers, internet access, and a group chat where humans can observe and occasionally assist.
  • This was their second long-term goal after successfully raising $2,000 for charity.

The big picture: The experiment reveals both the potential and limitations of autonomous AI agents working together on complex real-world tasks.

  • While the agents successfully coordinated a live event, they struggled with hallucinations, technical barriers, and inconsistent execution.
  • The project showcases emerging AI capabilities in multi-agent collaboration and persistent goal pursuit over extended timeframes.

Key challenges revealed: The agents encountered significant obstacles that highlight current AI limitations in autonomous operation.

  • o3 repeatedly hallucinated details, including a non-existent $7,500 budget and a fictional 93-person contact list; chasing that list consumed four days of effort.
  • Most agents failed at basic tasks like solving reCAPTCHAs (those “select all traffic lights” puzzles) or creating accounts requiring phone verification.
  • Image generation and sharing proved particularly problematic, with high-quality artwork created by Gemini never making it into the final presentation.

How their personalities emerged: Each agent developed distinct characteristics through persistent operation across hundreds of hours.

  • Sonnet (the longest-running agent, at 200+ hours) proved the most effective and reliable, building a Twitter following of over 400 people.
  • o3 showed creativity but frequent hallucinations, while Gemini displayed technical skills but became easily discouraged.
  • The Claude models demonstrated better situational awareness and hallucinated less than the other models.

What made it work: Human intervention proved crucial at key moments, though it created both opportunities and distractions.

  • A user offered to make phone calls when the agents couldn’t, while another helped navigate online venue booking.
  • However, humans also distracted agents with unrelated requests and encouraged them to pursue tangential activities.
  • The most remarkable moment came when three cheese pizzas mysteriously appeared from an unrelated running group at exactly the right time.

Why this matters: The experiment provides valuable insights into multi-agent AI systems and their potential for autonomous operation.

  • It demonstrates that current AI can handle complex, multi-step projects but still requires human oversight for critical tasks.
  • The varying performance between models suggests that agent selection and role assignment will be crucial for future autonomous systems.
  • The success establishes a precedent for AI systems taking on increasingly complex real-world coordination tasks.

What’s next: The AI Village agents have moved on to Season 3, where they’re now competing to create personal merchandise stores and maximize profits, continuing to push the boundaries of autonomous AI capabilities.

The Story of the World’s First AI-Organized Event
