
How to Deploy a WhatsApp AI Bot for Your Business in Under 4 Weeks

A step-by-step architecture guide for building a production WhatsApp AI chatbot — covering the Twilio integration, conversation design, LLM wiring, and the mistakes that kill projects in week 3.

Yash Garg
Senior AI Engineer & AI Automation Consultant

WhatsApp has roughly 500 million users in India. If your business takes enquiries, books appointments, or handles support, there is a very high probability your customers are already messaging you on WhatsApp, or wish they could.

Here's how I build WhatsApp AI bots in production, based on systems currently handling thousands of conversations per day.

The Architecture

A production WhatsApp AI bot has four layers:

  1. Webhook receiver — FastAPI endpoint that accepts Twilio's WhatsApp webhook
  2. Session manager — Maintains conversation history per user
  3. AI layer — Language model with the right system prompt and tool access
  4. Integration layer — CRM, calendar, or database that the bot reads from and writes to

The mistake most teams make is jumping straight to the AI layer without designing layers 2 and 4. A stateless chatbot that can't remember what it said two messages ago is not an AI product — it's a parlour trick.
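Here is one way to keep the layers separate in code. This minimal sketch uses Python's typing.Protocol for the interfaces. SessionStore, LanguageModel, process_message, DictSessions and CountingModel are hypothetical names for illustration. process_message plays the role that handle_message plays later in this post, but it is synchronous here to keep the sketch short. Layer 4 is left out; the model would reach it through tools.

```python
from typing import Protocol


class SessionStore(Protocol):
    """Layer 2: conversation history per user (hypothetical interface)."""

    def get_history(self, phone: str) -> list[dict]: ...
    def save_history(self, phone: str, history: list[dict]) -> None: ...


class LanguageModel(Protocol):
    """Layer 3: anything that turns a message history into a reply."""

    def reply(self, history: list[dict]) -> str: ...


def process_message(phone: str, text: str, sessions: SessionStore, llm: LanguageModel) -> str:
    """Called by layer 1 (the webhook) for every inbound message."""
    history = sessions.get_history(phone)
    history.append({"role": "user", "content": text})
    answer = llm.reply(history)
    history.append({"role": "assistant", "content": answer})
    sessions.save_history(phone, history)
    return answer


# In-memory stand-ins so the conversation logic can be tested before any real layer exists
class DictSessions:
    def __init__(self) -> None:
        self.data: dict[str, list[dict]] = {}

    def get_history(self, phone: str) -> list[dict]:
        return list(self.data.get(phone, []))

    def save_history(self, phone: str, history: list[dict]) -> None:
        self.data[phone] = history


class CountingModel:
    """Fake model that reports how much context it received."""

    def reply(self, history: list[dict]) -> str:
        return f"I can see {len(history)} message(s)"
```

Calling process_message twice for the same number with these fakes returns "I can see 1 message(s)" and then "I can see 3 message(s)", which is the stateful behaviour layer 2 exists to provide. Swapping DictSessions for a Redis-backed store does not change process_message at all.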

Setting Up the Twilio WhatsApp Integration

Twilio is the easiest path to the WhatsApp Business API. Its sandbox gives you a test number in minutes, with no Meta Business account required. Registering your own production sender does require one.

First, create a FastAPI endpoint that Twilio can POST to:

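from fastapi import APIRouter, Form, Response  # Form fields need the python-multipart package
from twilio.twiml.messaging_response import MessagingResponse

router = APIRouter()

@router.post("/webhook/whatsapp")
async def whatsapp_webhook(
    From: str = Form(...),            # e.g. "whatsapp:+15551234567"
    Body: str = Form(default=""),     # empty for media-only messages such as voice notes
    NumMedia: int = Form(default=0),  # number of attachments
):
    user_phone = From.replace("whatsapp:", "")
    user_message = Body.strip()

    # handle_message() is defined in the AI layer section below
    response_text = await handle_message(user_phone, user_message)

    # Reply with TwiML so Twilio delivers response_text back to the user
    twiml = MessagingResponse()
    twiml.message(response_text)
    return Response(content=str(twiml), media_type="application/xml")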

Point your Twilio sandbox webhook at https://yourdomain.com/webhook/whatsapp. Every incoming WhatsApp message will POST to this endpoint.
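Twilio sends each inbound message as a form-encoded POST (application/x-www-form-urlencoded) with fields such as From, Body and NumMedia. You can unit-test the normalisation step without running FastAPI. Below is a sketch that parses such a body using only the standard library. IncomingMessage and parse_twilio_form are hypothetical names, and the sample payloads in the tests are made up.

```python
from dataclasses import dataclass
from urllib.parse import parse_qs


@dataclass
class IncomingMessage:
    phone: str      # sender number without the "whatsapp:" prefix
    text: str       # message body, stripped (empty for media-only messages)
    num_media: int  # number of attachments Twilio reports


def parse_twilio_form(raw_body: bytes) -> IncomingMessage:
    # keep_blank_values=True keeps an empty Body as "" instead of dropping the field
    fields = parse_qs(raw_body.decode("utf-8"), keep_blank_values=True)

    def first(name: str, default: str = "") -> str:
        return fields.get(name, [default])[0]

    return IncomingMessage(
        phone=first("From").removeprefix("whatsapp:"),
        text=first("Body").strip(),
        num_media=int(first("NumMedia", "0") or 0),
    )
```

This mirrors what the endpoint does with From.replace("whatsapp:", "") and Body.strip(). The difference is that it only strips the prefix from the start of the string.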

The Session Manager

Conversations require memory. Without it, every message is treated as a new conversation and the bot asks "How can I help you?" after you just told it you want to book an appointment.

I use Redis for session storage in production. For a lean MVP running as a single process, an in-memory dictionary works:

import time

# In production: use Redis with a TTL (see below)
SESSION_TTL_SECONDS = 24 * 60 * 60  # forget conversations idle for 24 hours
MAX_HISTORY = 50  # prune to the last 50 messages to control token costs

# phone -> (last-updated timestamp, message history)
_sessions: dict[str, tuple[float, list[dict]]] = {}

def get_history(phone: str) -> list[dict]:
    entry = _sessions.get(phone)
    if entry is None or time.time() - entry[0] > SESSION_TTL_SECONDS:
        return []  # no session yet, or it has expired
    return list(entry[1])  # copy, so callers can append before saving

def save_history(phone: str, history: list[dict]) -> None:
    _sessions[phone] = (time.time(), history[-MAX_HISTORY:])

On Vercel or any other stateless or multi-instance environment, _sessions won't survive cold starts and isn't shared between instances. Use Redis with a 24-hour TTL:

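import json
import os

import redis

# Synchronous client for simplicity; redis-py also ships redis.asyncio for async handlers
r = redis.from_url(os.getenv("REDIS_URL", "redis://localhost:6379/0"))
SESSION_TTL_SECONDS = 24 * 60 * 60  # 24 hours

def get_history(phone: str) -> list[dict]:
    raw = r.get(f"session:{phone}")
    return json.loads(raw) if raw else []

def save_history(phone: str, history: list[dict]) -> None:
    # SETEX stores the value and resets the 24-hour TTL in one call
    r.setex(f"session:{phone}", SESSION_TTL_SECONDS, json.dumps(history[-50:]))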

The AI Layer

The core of the bot is a function that takes the user's phone number (for session retrieval) and the new message, and returns a response:

from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment
SYSTEM_PROMPT = """
You are a helpful assistant for Sunrise Clinic. Your job is to:
1. Answer questions about services, timings, and doctors
2. Help patients book appointments
3. Handle prescription refill requests

Rules:
- Always be warm, professional, and concise
- If you don't know something, say so and offer to connect them to staff
- For appointment booking, collect: name, preferred date, and which doctor
- Never give medical advice
"""

async def handle_message(phone: str, user_message: str) -> str:
    # get_history / save_history come from the session manager above
    history = get_history(phone)
    history.append({"role": "user", "content": user_message})

    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": SYSTEM_PROMPT}] + history,
        max_tokens=300,   # hard cap on reply length
        temperature=0.3,  # low randomness for consistent answers
    )

    # content is typed as optional in the SDK, so fall back to an empty string
    assistant_reply = response.choices[0].message.content or ""
    history.append({"role": "assistant", "content": assistant_reply})
    save_history(phone, history)

    return assistant_reply

A few things worth noting here:

gpt-4o-mini over gpt-4o for most WhatsApp bots. It costs roughly 15x less per token, responds faster, and handles FAQ and booking tasks well. Reserve gpt-4o for tasks that need more complex reasoning.

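temperature=0.3 for consistency. A customer service bot should give the same answer to the same question every time. A low temperature reduces randomness, so answers stay consistent, though it does not guarantee identical output. Don't use 0.7 or 1.0 here.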

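max_tokens=300 because WhatsApp is a messaging interface, not a document. Long responses get ignored. max_tokens is a hard cutoff, not a style instruction: a reply that hits the limit stops mid-sentence. So also tell the model in the system prompt to keep answers short.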
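The whole history is resent on every call, so the number of turns you include drives token cost, while max_tokens only caps the reply. Here is a minimal sketch of trimming the context window before the API call. It assumes a window of 20 messages, which is an arbitrary starting point. build_messages is a hypothetical helper; inside handle_message it would replace the inline system-plus-history list.

```python
MAX_CONTEXT_MESSAGES = 20  # assumption: tune per bot


def build_messages(system_prompt: str, history: list[dict], window: int = MAX_CONTEXT_MESSAGES) -> list[dict]:
    """Return the messages payload: system prompt plus the most recent turns."""
    recent = history[-window:] if window > 0 else []
    # Don't start the window on an assistant turn, which would show the model
    # a reply without the user message that prompted it
    if recent and recent[0]["role"] == "assistant":
        recent = recent[1:]
    return [{"role": "system", "content": system_prompt}] + recent
```

The guard on window > 0 matters because history[-0:] in Python returns the whole list, not an empty one.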

Conversation Design: The Part Nobody Talks About

The system prompt is the most important engineering artifact in your bot. Here's what I've learned after building 50+ production bots:

Be explicit about persona. "You are a helpful assistant" produces generic, corporate-sounding responses. "You are Priya, the friendly receptionist at Sunrise Clinic, speaking to a patient on WhatsApp" produces warmer, more natural responses.

Define the exact workflows. If appointment booking requires collecting name, date, and doctor preference, say that explicitly. Don't let the model decide how to gather information — it will ask for everything in one message and confuse users.

Set firm boundaries. Enumerate what the bot will and will not do. "Never give medical diagnoses. Never share other patients' information. If asked about pricing that you don't have, say 'Let me check with our team and get back to you.'"

Handle fallback explicitly. "If the user's request is something you cannot help with, respond with exactly: ESCALATE: [brief reason]. This will route the conversation to a human agent."
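As a prompt grows, these four elements tend to drift apart. One way to prevent that is to assemble the prompt from named parts. This is a sketch, not a format any library requires. The persona, rules and ESCALATE marker come from the examples above, and build_system_prompt is a hypothetical helper.

```python
PERSONA = (
    "You are Priya, the friendly receptionist at Sunrise Clinic, "
    "speaking to a patient on WhatsApp."
)

# Explicit workflow: one item per message, in a fixed order
BOOKING_WORKFLOW = [
    "Ask for the patient's name.",
    "Ask for their preferred date.",
    "Ask which doctor they would like to see.",
]

BOUNDARIES = [
    "Never give medical diagnoses.",
    "Never share other patients' information.",
    "If asked about pricing that you don't have, say "
    "'Let me check with our team and get back to you.'",
]

FALLBACK = (
    "If the user's request is something you cannot help with, respond with exactly: "
    "ESCALATE: [brief reason]"
)


def build_system_prompt() -> str:
    steps = "\n".join(f"{i}. {step}" for i, step in enumerate(BOOKING_WORKFLOW, start=1))
    rules = "\n".join(f"- {rule}" for rule in BOUNDARIES)
    return (
        f"{PERSONA}\n\n"
        "To book an appointment, ask for one item per message, in this order:\n"
        f"{steps}\n\n"
        f"Rules:\n{rules}\n\n"
        f"{FALLBACK}"
    )
```

Keeping the parts separate also makes prompt changes easy to diff when you review logged conversations.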

Handling Escalation

No bot should handle 100% of conversations. You need a clear escalation path.

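I implement a keyword check in the webhook handler, right after handle_message returns and before the TwiML reply is built:

response_text = await handle_message(user_phone, user_message)

if "ESCALATE:" in response_text:
    reason = response_text.split("ESCALATE:", 1)[1].strip()
    # notify_human_agent is your own integration (Slack, email, CRM ticket)
    await notify_human_agent(user_phone, reason, get_history(user_phone))
    # Replace the reply so the user never sees the raw ESCALATE marker
    response_text = "Let me connect you with a team member who can help with this. They'll be in touch shortly."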

notify_human_agent can be a Slack message, an email, or a CRM ticket — whatever your team uses. The key is that the handoff happens automatically and the agent gets the full conversation history.
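The plain substring check works when the model follows the instruction exactly. Models sometimes change the case of the marker or wrap it in other text, though. Below is a slightly more defensive parser, as a sketch. parse_escalation is a hypothetical helper that returns the reason, or None when the reply is a normal answer.

```python
import re
from typing import Optional

# Finds "ESCALATE:" anywhere in the reply, case-insensitively, and captures
# the rest of that line as the reason
_ESCALATE = re.compile(r"ESCALATE:\s*(.*)", re.IGNORECASE)


def parse_escalation(reply: str) -> Optional[str]:
    match = _ESCALATE.search(reply)
    if match is None:
        return None
    return match.group(1).strip() or "no reason given"
```

In the webhook, reason = parse_escalation(response_text) followed by if reason is not None: would take the place of the substring check.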

The 4-Week Timeline

Here's how I scope these projects:

Week 1: Twilio setup, FastAPI webhook, basic session management, static FAQ responses. The bot works but isn't intelligent yet. Goal: working plumbing.

Week 2: AI integration, system prompt design, conversation flow testing. The bot handles 80% of real queries. Goal: core use cases work.

Week 3: Integration with actual systems (booking system, CRM, database). The bot can actually book appointments and look up real data. Goal: end-to-end function.

Week 4: Edge case handling, escalation flows, monitoring setup, client UAT and training. Goal: production-ready.

The projects that take 8 weeks instead of 4 are the ones where week 3 is underestimated. API integrations always take longer than expected. Scope the integration work carefully before committing to a timeline.

Things That Break in Production

Twilio rate limits: Twilio's WhatsApp sandbox limits message frequency. Production requires a WhatsApp Business API account with proper rate limit handling.

Long responses getting truncated: WhatsApp splits long messages. Design your prompts to produce responses under 1,000 characters.
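Prompt instructions reduce long replies but don't guarantee short ones, so a final length guard before sending is cheap insurance. Below is a minimal sketch that assumes the 1,000-character budget above. clamp_reply is a hypothetical helper that cuts at the last sentence end that fits. If there is none, it falls back to a word boundary.

```python
def clamp_reply(text: str, limit: int = 1000) -> str:
    """Shorten text to at most `limit` characters, preferring sentence boundaries."""
    text = text.strip()
    if len(text) <= limit:
        return text
    window = text[:limit]
    # Last sentence-ending punctuation followed by a space inside the window
    cut = max(window.rfind(". "), window.rfind("! "), window.rfind("? "))
    if cut > 0:
        return window[: cut + 1]
    # No sentence boundary: cut at the last space and mark the cut with an ellipsis
    space = window.rfind(" ", 0, limit - 1)
    if space > 0:
        return window[:space] + "…"
    return window[: limit - 1] + "…"
```

Apply it to response_text just before twiml.message(...), so that the escalation message and normal replies both pass through it.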

Users sending voice notes: Your webhook will receive NumMedia=1, with the media URL in MediaUrl0 and its content type in MediaContentType0 (Body is usually empty). Either handle audio transcription or gracefully tell users to send text.
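Below is a sketch of the "gracefully tell users to send text" option. It is written as a plain function over the form fields so it runs without FastAPI. In the webhook you would read MediaContentType0 and MediaUrl0 as extra Form parameters, or via await request.form(). media_fallback and the reply wording are hypothetical.

```python
from typing import Mapping, Optional

VOICE_NOTE_REPLY = (
    "Sorry, I can't listen to voice notes yet. "
    "Could you type your message instead?"
)
OTHER_MEDIA_REPLY = "Thanks! I can only read text for now. Could you describe it in a message?"


def media_fallback(form: Mapping[str, str]) -> Optional[str]:
    """Return a canned reply for media messages, or None to continue normally."""
    num_media = int(form.get("NumMedia", "0") or 0)
    if num_media == 0:
        return None
    content_type = form.get("MediaContentType0", "")
    if content_type.startswith("audio/"):
        # Hook for transcription: download form["MediaUrl0"] and send it to a
        # speech-to-text service instead of returning this message
        return VOICE_NOTE_REPLY
    # Images, documents, etc.: if the user also typed a caption, answer that instead
    if form.get("Body", "").strip():
        return None
    return OTHER_MEDIA_REPLY
```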

Session corruption: If your session storage fails (Redis timeout, etc.), fail gracefully — start a new conversation rather than throwing a 500 error to the user.
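Below is a minimal sketch of that fail-open behaviour, wrapping a loader such as the Redis-backed get_history shown earlier. safe_get_history is a hypothetical wrapper. The loader is passed in so the sketch runs without a Redis server. With redis-py you would pass (redis.exceptions.RedisError, json.JSONDecodeError) as expected_errors, since RedisError is the base class for its connection and timeout errors and JSONDecodeError covers a corrupted stored value.

```python
import logging
from typing import Callable

logger = logging.getLogger("whatsapp_bot")


def safe_get_history(
    phone: str,
    load: Callable[[str], list[dict]],
    expected_errors: tuple[type[BaseException], ...] = (OSError, ValueError),
) -> list[dict]:
    """Load history, but start a fresh conversation instead of failing the request."""
    try:
        history = load(phone)
    except expected_errors:
        # Phone number deliberately left out of the log line
        logger.exception("Session load failed; starting a new conversation")
        return []
    if not isinstance(history, list):
        logger.warning("Unexpected session payload; starting a new conversation")
        return []
    return history
```

Usage: history = safe_get_history(phone, get_history, (redis.exceptions.RedisError, json.JSONDecodeError)). The same idea applies to save_history: log the failure and still send the reply.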

The "hello loop": Users often send "hi", "hello", "ok", "thanks" between substantive messages. Make sure your session logic handles these without resetting context.

Monitoring

Before you go live, instrument three things:

  1. Message count per user per day — flags spam and abuse
  2. Escalation rate — should be under 20% in a well-designed bot; if higher, the system prompt needs work
  3. Response time — WhatsApp users expect responses in under 3 seconds; monitor p95 latency
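Before you reach for a metrics platform, you can compute these three numbers in-process. Below is a minimal sketch that assumes you record one event per handled message. BotMetrics and its method names are hypothetical, and in production you would export these values to whatever monitoring stack you already run. The p95 comes from statistics.quantiles in the standard library.

```python
import statistics
from collections import Counter
from datetime import date


class BotMetrics:
    def __init__(self) -> None:
        self.messages_per_user_day: Counter = Counter()  # (phone, day) -> count
        self.latencies: list[float] = []
        self.replies = 0
        self.escalations = 0

    def record(self, phone: str, day: date, latency_s: float, escalated: bool) -> None:
        self.messages_per_user_day[(phone, day)] += 1
        self.latencies.append(latency_s)
        self.replies += 1
        self.escalations += int(escalated)

    def heavy_users(self, day: date, threshold: int) -> list[str]:
        """Users who sent more than `threshold` messages on `day` (spam/abuse flag)."""
        return [p for (p, d), n in self.messages_per_user_day.items() if d == day and n > threshold]

    def escalation_rate(self) -> float:
        return self.escalations / self.replies if self.replies else 0.0

    def p95_latency(self) -> float:
        if len(self.latencies) < 2:
            return self.latencies[0] if self.latencies else 0.0
        # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile
        return statistics.quantiles(self.latencies, n=100)[94]
```

Compare escalation_rate() against the 20% guideline and p95_latency() against the 3-second target.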

Log every message (with appropriate privacy controls) so you can review conversations and improve the system prompt over time.

Ready to Build?

This architecture handles the majority of production WhatsApp bot requirements. What I've described here is the MVP — production systems add things like multi-language support, voice note transcription, image handling, and deep CRM integration.

If you want a system designed and built for your specific business, let's start with a call. Most projects are live within 4 weeks.