Kameleo with Python and Playwright — Browser Profile Automation in Production
Kameleo is a serious tool for serious targets. Here's how I integrate it with Python + Playwright in production — profile management, proxy pairing, parallel workers, and the rules that keep the system stable across multi-hour runs.
Why Kameleo, Specifically
If you've tried to scrape a Cloudflare-fronted site with plain Playwright, you know what happens. The browser loads, the page shows the "Just a moment…" challenge, and even after the challenge resolves, the next request gets blocked again. Switch to a residential proxy, same result. Turn headless mode off, same result. Spend a weekend on puppeteer-extra-stealth equivalents for Playwright, and you'll claw back maybe 30% of the targets — at the cost of a stack you have to maintain against every Chrome update.
Kameleo solves this by being an actual anti-detect browser. The fingerprint it presents — canvas, WebGL, audio context, fonts, timezone, screen metrics, navigator properties, the whole surface that bot-protection vendors check — is real, internally consistent, and randomized per profile. From the target site's perspective, every Kameleo profile looks like a different real human on a different real device. From your code's perspective, it's a Chromium instance you talk to over CDP, just like any other Playwright session.
This article walks through how I actually integrate Kameleo into production Python + Playwright systems. The Kameleo docs cover the API; this is the operational layer on top — profile lifecycle, proxy pairing, worker partitioning, and the rules that keep multi-hour runs stable.
The Setup
Kameleo runs as a desktop application (or CLI in headless server environments) that exposes a local HTTP API on port 5050 by default. You create profiles via the API, start them, and Kameleo gives you back a CDP endpoint Playwright can connect to.
The minimal Python integration looks like this:
import requests
from playwright.sync_api import sync_playwright

KAMELEO_API = "http://localhost:5050"

# 1. Pick a base profile (fingerprint)
fingerprints = requests.get(
    f"{KAMELEO_API}/profiles/fingerprints",
    params={"deviceType": "desktop", "browserProduct": "chrome"},
).json()
fingerprint = fingerprints[0]

# 2. Create a profile from that fingerprint, attach a proxy
profile = requests.post(f"{KAMELEO_API}/profiles", json={
    "name": "WORKER_01",
    "fingerprint": {"id": fingerprint["id"]},
    "proxy": {
        "value": "http",
        "extra": {
            "host": "proxy.example.com",
            "port": 8080,
            "id": "user",
            "secret": "pass",
        },
    },
}).json()

# 3. Start the profile
requests.post(f"{KAMELEO_API}/profiles/{profile['id']}/start")

# 4. Attach Playwright over CDP
with sync_playwright() as pw:
    browser = pw.chromium.connect_over_cdp(
        f"ws://localhost:5050/playwright/{profile['id']}"
    )
    context = browser.contexts[0]
    page = context.pages[0] if context.pages else context.new_page()
    page.goto("https://target.example.com/")
    # ... do work ...
    browser.close()

# 5. Stop the profile when done
requests.post(f"{KAMELEO_API}/profiles/{profile['id']}/stop")
That's the whole bridge. From step 4 onward, it's normal Playwright — the same API you'd use against a vanilla Chromium instance. Everything Kameleo provides (the fingerprint, the proxy, the anti-detect behavior) is invisible to your scraping code.
Profile Lifecycle
The first decision is whether to create profiles fresh on every run, or to keep a stable pool of named profiles you reuse. Both work; they have different tradeoffs.
Fresh profiles every run means a brand-new fingerprint, a clean cookie jar, and no carry-over state. Best for stateless scraping where you're walking through public pages and the target shouldn't have any reason to recognize you across runs. The downside is profile creation isn't free — Kameleo takes a couple of seconds per profile, and you're discarding warm-up benefits (cached resources, established TLS sessions) on every run.
Stable named profiles (WORKER_01 through WORKER_06, persisted across runs) means each worker keeps the same identity over time. Cookies persist, the fingerprint stays consistent, and the target sees a user who logged in last week and is back today. This is right for any session-bearing work — logged-in workflows, customer service automation, anything where re-auth on every run would be obviously bot-shaped.
In practice I default to stable named profiles for any project where a worker takes more than a few minutes per run, and fresh profiles only for short stateless jobs. The stable model also makes proxy pairing dramatically simpler.
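For the stable model, the startup path is a get-or-create helper. The sketch below assumes the same local endpoints as the setup snippet (GET /profiles returning the profile list); the http parameter is an injection point added here purely for testability, not part of any Kameleo client:

```python
KAMELEO_API = "http://localhost:5050"

def get_or_create_profile(name, proxy=None, http=None):
    """Reuse the profile with this name if Kameleo already has it, else create it."""
    if http is None:  # production default: the plain requests module
        import requests
        http = requests
    # Stable identity: if WORKER_01 exists, its fingerprint and cookies persist.
    for p in http.get(f"{KAMELEO_API}/profiles").json():
        if p.get("name") == name:
            return p
    # First run for this worker: mint a profile from a base fingerprint.
    fingerprints = http.get(
        f"{KAMELEO_API}/profiles/fingerprints",
        params={"deviceType": "desktop", "browserProduct": "chrome"},
    ).json()
    body = {"name": name, "fingerprint": {"id": fingerprints[0]["id"]}}
    if proxy:
        body["proxy"] = proxy
    return http.post(f"{KAMELEO_API}/profiles", json=body).json()
```

One call per worker at startup, and the same WORKER_NN identity comes back run after run.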
Proxy-per-Profile, Block-Only Rotation
The single most important operational rule for Kameleo work is: one profile, one proxy, never rotate proactively, only rotate on observed blocks.
The naive approach is to rotate the proxy on every request. That's exactly what bot-protection vendors look for — TLS fingerprint stays the same while the source IP jumps every 30 seconds. It's a louder bot signal than just sending traffic from one IP. Kameleo + a single proxy is a much quieter shape than vanilla Chromium + 100 proxies.
The architecture I use:
- A JSON file holds the proxy pool. Two pools, usually: proxies.json for mobile rotating proxies, dedicated_proxies.json for static datacenter ones.
- A proxy_state.json file tracks per-proxy state: last-used timestamp, in-use marker, blocked timestamp.
- At worker startup, the worker claims an available proxy (not in-use, not blocked, or blocked more than 24 hours ago) and pairs it with its Kameleo profile for the entire run.
- If the worker hits a block (HTTP 403, 429, 503, or a known body marker), it marks the proxy blocked with the current timestamp, stops the Kameleo profile, claims a new proxy, restarts the profile with the new proxy attached, and retries the failed URL exactly once.
- If the retry also blocks, the URL is marked error and the worker moves on. No further retries on that URL in this run.
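The claim/block state machine is small. A minimal sketch, assuming a proxy_state.json shaped like {"1": {"in_use": false, "blocked_at": null}, ...} (the field names are illustrative; your pool file can carry whatever else it needs):

```python
import json
import time
from pathlib import Path

BLOCK_COOLDOWN = 24 * 3600  # a blocked proxy becomes eligible again after 24h

def claim_proxy(state_path, now=None):
    """Claim the first proxy that is neither in use nor inside its cooldown.

    Returns the claimed proxy id, or None if the pool is exhausted.
    """
    now = time.time() if now is None else now
    state = json.loads(Path(state_path).read_text())
    for pid, entry in state.items():
        if entry.get("in_use"):
            continue
        blocked_at = entry.get("blocked_at")
        if blocked_at is not None and now - blocked_at < BLOCK_COOLDOWN:
            continue
        entry["in_use"] = True
        entry["last_used"] = now
        Path(state_path).write_text(json.dumps(state))
        return pid
    return None

def mark_blocked(state_path, pid, now=None):
    """Release a proxy and start its 24-hour cooldown."""
    state = json.loads(Path(state_path).read_text())
    state[pid].update(in_use=False, blocked_at=time.time() if now is None else now)
    Path(state_path).write_text(json.dumps(state))
```

In a real multi-worker deployment the read-modify-write needs a file lock (or per-worker disjoint slices, as below) so two workers can't claim the same proxy in the same instant.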
The result, on a real Cloudflare-fronted target, is consumption you can plan around. A typical overnight run with six workers and a 580-proxy pool might consume 5 proxies total — one or two per worker, mostly because the worker hit one bad proxy at startup. The rest of the pool sits idle, available for tomorrow's run.
Parallel Workers
For parallel runs, each worker process gets a disjoint slice of the proxy pool via an environment variable:
WORKER_PROXY_IDS=1,5,9,13 python worker.py --worker-id=0 --total-workers=6
The slicing is done once at launch (round-robin assignment of pool IDs to workers) so workers never compete for the same outbound IP. Each worker also runs as one OS process with one Kameleo profile. Six workers means six Kameleo profiles, six proxies in flight, six Playwright sessions — all independent.
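The slicing itself is two small functions. A sketch, where slice_pool and worker_proxy_ids are hypothetical names:

```python
import os

def slice_pool(proxy_ids, worker_id, total_workers):
    # Round-robin: worker k takes pool positions k, k+N, k+2N, ...
    # Slices are disjoint by construction, so no two workers can ever
    # end up behind the same outbound IP.
    return proxy_ids[worker_id::total_workers]

def worker_proxy_ids():
    # Parse the WORKER_PROXY_IDS env var ("1,5,9,13") set by the launcher.
    raw = os.environ.get("WORKER_PROXY_IDS", "")
    return [int(x) for x in raw.split(",") if x.strip()]
```

The launcher computes slice_pool once per worker and exports the result; each worker process only ever sees its own ids.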
Concurrency tops out around what your machine can sustain. Kameleo profiles aren't tiny — each one is a real Chromium process plus the Kameleo wrapper. On a 32 GB / 12-core machine, six concurrent profiles is comfortable. Twelve is doable but starts pushing memory. Above that, you want a second machine, not more workers on the first one.
Headed, Not Headless
Kameleo runs in headed mode. The browser windows are real — they just sit minimized while the workers do their thing. The CPU and memory cost is real but acceptable.
The reason is simple: the --headless flag, even in modern Chromium, leaves trace signals (specific navigator properties, missing UI features, predictable performance characteristics) that bot-protection vendors weight heavily. Kameleo could try to mask all of those, but the more reliable answer is to just run headed. The throughput cost is marginal; the detection cost of headless is not.
Block Detection
Block detection is the hinge that makes the rotation logic work, so it has to be precise. The signals I check after every navigation:
- HTTP status. 403, 429, 503, 451 are blocks. 200 is provisional success.
- Body markers. "Just a moment…" (Cloudflare), "Access denied", DataDome's challenge string, PerimeterX's "Press & Hold" marker. A small Python list, regex-matched against the page content.
- Navigation result. Did Playwright actually navigate to the URL you asked for, or did it land on a challenge URL?
- Content shape. For known target pages, a quick "is the expected element actually here?" check. If the page rendered but the data is missing, that's a soft block worth investigating.
The detection runs once per fetch. A positive signal raises BlockedError, the worker rotates and retries once, and the rotation is logged with the signal that triggered it. Months later, you can look at the logs and tell exactly which protection layer was triggering — useful for tuning and useful for explaining the system to a new operator.
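The four signals above collapse into a single gate that raises on the first hit. A sketch; the marker list is an illustrative subset (vendor strings drift, so in production it belongs in config, not code):

```python
import re

class BlockedError(Exception):
    """A fetch showed a block signal; the worker rotates and retries once."""
    def __init__(self, signal):
        super().__init__(signal)
        self.signal = signal  # logged so rotations are attributable later

BLOCK_STATUSES = {403, 429, 451, 503}
BLOCK_MARKERS = [re.compile(p, re.IGNORECASE) for p in (
    r"just a moment",       # Cloudflare interstitial
    r"access denied",
    r"press\s*&\s*hold",    # PerimeterX
)]

def check_blocked(status, body, final_url, requested_url, expected_found=True):
    """Run the four checks in order; raise BlockedError naming the trigger."""
    if status in BLOCK_STATUSES:
        raise BlockedError(f"status:{status}")
    for rx in BLOCK_MARKERS:
        if rx.search(body):
            raise BlockedError(f"marker:{rx.pattern}")
    if final_url != requested_url:   # landed on a challenge URL instead
        raise BlockedError(f"redirect:{final_url}")
    if not expected_found:           # page rendered but the data is missing
        raise BlockedError("soft-block:expected element missing")
```

Carrying the signal string on the exception is what makes the months-later log archaeology possible: every rotation entry says exactly which check fired.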
What This Buys You
The architecture above is what makes Kameleo viable for serious work. Without it — without the block-only rotation, the proxy-per-profile pairing, the worker partitioning — you'll burn through proxies, get inconsistent results, and spend more time fighting the tooling than scraping.
With it, you get a system that runs unattended for hours, recovers from blocks without dropping URLs, and produces predictable proxy consumption you can budget around. That's the operational baseline that makes Kameleo worth the cost.
Wrap-Up
Kameleo is one of those tools where the API is the easy part and the operational discipline around it is what determines whether the system holds up in production. Profile lifecycle, proxy-per-profile pairing, block-only rotation, headed mode, worker partitioning. None of these are exotic; all of them are non-optional.
For deeper coverage of the surrounding architecture — the worker pool, the queue, the recovery loop — see the Kameleo Automation, Playwright Automation, and Undetectable Browser Automation hubs.
Need a Custom Automation System?
Need help building a production scraping, browser automation, or AI data extraction system? I build custom Python, Playwright, Kameleo, Undetectable, MySQL, and dashboard-based automation systems for businesses.