
Web Scraping Agent with Pydantic AI

Build an intelligent web scraping agent that fetches pages, extracts structured data, and handles pagination — powered by Pydantic AI.

web scraping, data extraction, HTTP, parsing

Working Code

Pydantic AI
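import asyncio

import httpx
from markdownify import markdownify
from pydantic_ai import Agent, RunContext

agent = Agent(
    "openai:gpt-4o",
    system_prompt="You are a web scraping agent. Fetch pages, extract the requested data, and return it in structured format. Respect robots.txt.",
)


@agent.tool
async def fetch_url(ctx: RunContext, url: str) -> str:
    """Fetch a webpage and return its content as markdown."""
    # follow_redirects is enabled because httpx does not follow redirects by default.
    async with httpx.AsyncClient(follow_redirects=True) as client:
        response = await client.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=15)
        response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    # Truncate so the tool result stays small enough for the model's context.
    return markdownify(response.text)[:5000]


@agent.tool
async def extract_data(ctx: RunContext, text: str, instruction: str) -> str:
    """Extract structured data from text based on instruction."""
    # Placeholder: only echoes the request; the model performs the extraction.
    return f"Extracting from {len(text)} chars: {instruction}"


async def main() -> None:
    # Top-level `await` is not valid in a script, so run the agent inside a coroutine.
    result = await agent.run("Scrape the pricing page at https://example.com/pricing and extract all plan names and prices")
    print(result.output)


asyncio.run(main())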
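
The system prompt asks the model to respect robots.txt, but a prompt alone cannot enforce that. As a rough sketch, a check like the following could run inside `fetch_url` before the request is sent. It uses the standard library's `urllib.robotparser`; the helper name `is_allowed` and the sample rules are hypothetical, and fetching the real robots.txt file is left out.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so the text can come from any HTTP client.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules, for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(SAMPLE_ROBOTS, "Mozilla/5.0", "https://example.com/pricing"))
print(is_allowed(SAMPLE_ROBOTS, "Mozilla/5.0", "https://example.com/private/data"))
```

With the sample rules, the first URL is permitted and the second is blocked, so a tool could return a short refusal message instead of fetching the page.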

Step by Step

1. Install dependencies

Install Pydantic AI together with httpx and markdownify, the two libraries the fetch_url tool uses to download pages and convert HTML to markdown.
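
As a quick sanity check after installation, a snippet along these lines reports which of the imported modules are missing. It assumes the PyPI distributions `pydantic-ai`, `httpx`, and `markdownify` (for example via `pip install pydantic-ai httpx markdownify`); the helper name `missing_modules` is hypothetical.

```python
from importlib.util import find_spec

# Import names used by the working code.
REQUIRED_MODULES = ["pydantic_ai", "httpx", "markdownify"]


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All dependencies are importable.")
```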

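2. Define your tools

Define the tool functions the agent can call and register each one with the @agent.tool decorator. Here, fetch_url downloads a page and returns it as markdown, and extract_data is a placeholder for extraction logic.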
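
The introduction mentions pagination, which the two tools in the working code do not cover. One possible approach, sketched here with only the standard library, is a helper that finds a page's `rel="next"` link so the agent can request the following page. The names `find_next_url` and `_NextLinkParser` and the sample HTML are hypothetical, and real sites may mark pagination differently.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class _NextLinkParser(HTMLParser):
    """Records the href of the first <a> or <link> tag whose rel includes "next"."""

    def __init__(self) -> None:
        super().__init__()
        self.next_href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if self.next_href is not None or tag not in ("a", "link"):
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated values, or be missing entirely.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "next" in rel_values and attr_map.get("href"):
            self.next_href = attr_map["href"]


def find_next_url(html: str, base_url: str) -> Optional[str]:
    """Return the absolute URL of the next page, or None if no rel="next" link exists."""
    parser = _NextLinkParser()
    parser.feed(html)
    if parser.next_href is None:
        return None
    # Resolve relative links such as "?page=2" against the current page URL.
    return urljoin(base_url, parser.next_href)


# Hypothetical page fragment, for illustration only.
SAMPLE_HTML = '<nav><a href="?page=1" rel="prev">Back</a><a href="?page=2" rel="next">More</a></nav>'
print(find_next_url(SAMPLE_HTML, "https://example.com/products?page=1"))
```

A function like this operates on raw HTML, so it would need to run before the markdown conversion in `fetch_url`, or be exposed as its own tool.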

3. Create the agent and run

Create the Agent with a model name and a system prompt, register the tools on it, then call agent.run with a query and read the answer from result.output.
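
To make "structured format" concrete, the agent can be asked to return a validated Pydantic model instead of free text. The sketch below assumes a recent Pydantic AI release in which `Agent` accepts an `output_type` argument (consistent with the `result.output` attribute used in the working code) and Pydantic v2. The `Plan` and `PricingPage` models, the `build_pricing_agent` helper, and the sample data are hypothetical.

```python
from pydantic import BaseModel


class Plan(BaseModel):
    """One pricing plan as it appears on the page."""

    name: str
    price: str  # kept as text, e.g. "$9/month", to avoid guessing currency rules


class PricingPage(BaseModel):
    plans: list[Plan]


def build_pricing_agent():
    """Create an agent whose final answer is validated against PricingPage."""
    # Imported here so the models above can be used without Pydantic AI installed.
    from pydantic_ai import Agent

    return Agent(
        "openai:gpt-4o",
        output_type=PricingPage,
        system_prompt="Extract every plan name and price from the fetched page.",
    )


# The same models validate any dict-shaped data, independent of the agent.
sample = PricingPage.model_validate({"plans": [{"name": "Starter", "price": "$9/month"}]})
print(sample.plans[0].name)
```

With an agent built this way, `result.output` would be a `PricingPage` instance rather than a string, so downstream code can iterate over `result.output.plans` directly.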
