The Real Problem Isn't Fetching

Anyone can write a price scraper. The hard part of competitive price monitoring is everything that happens after the price comes back. Storing the history so you can spot trends. Distinguishing a real price change from a scrape error. Routing alerts to the people who care, without burying them in noise. Handling the long tail of weird per-site behaviors — bundle pricing, member-only discounts, sale countdowns, regional variants — without your alert system going haywire every time a site changes something cosmetic.

This article walks through how I structure production price-monitoring systems. The fetcher is a small part of it; the schema, change-detection logic, and alert routing are where the leverage is. The patterns scale comfortably from a few hundred SKUs across one site to tens of thousands across a dozen.

Schema First

Price-monitoring schemas die when they conflate "the SKU we're tracking" with "an observation of that SKU at a point in time." Keep them in separate tables.

CREATE TABLE tracked_skus (
    id          BIGINT PRIMARY KEY AUTO_INCREMENT,
    site        VARCHAR(64) NOT NULL,
    site_sku    VARCHAR(128) NOT NULL,
    url         TEXT NOT NULL,
    label       VARCHAR(255),  -- our internal name
    category    VARCHAR(64),
    competitor  VARCHAR(64),
    is_active   BOOLEAN DEFAULT TRUE,
    created_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE KEY unq_site_sku (site, site_sku)
);

CREATE TABLE price_observations (
    id              BIGINT PRIMARY KEY AUTO_INCREMENT,
    tracked_sku_id  BIGINT NOT NULL,
    observed_at     TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    price_cents     INT,
    list_price_cents INT,         -- "was" price for sales
    in_stock        BOOLEAN,
    currency        CHAR(3),
    raw_payload     JSON,         -- everything else for forensics
    FOREIGN KEY (tracked_sku_id) REFERENCES tracked_skus(id),
    INDEX idx_sku_time (tracked_sku_id, observed_at)
);

Two design choices to call out:

Prices in cents, never in floats. Floating-point arithmetic on currency produces rounding errors that make change detection unreliable. Use integer cents (or whatever the smallest currency unit is) end to end. Convert to display format only at the UI layer.
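
A two-line illustration in Python (any IEEE-754 float behaves the same way):

# A float "price change" that isn't one: the subtraction picks up rounding noise.
print(21.10 - 21.00)           # 0.10000000000000142, not 0.1
print(21.10 - 21.00 == 0.10)   # False, so naive threshold logic misfires

# The same comparison in integer cents is exact.
print(2110 - 2100 == 10)       # True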

Append-only observations, never updates. Every scrape inserts a new row. The current price is "the most recent observation"; the history is "every observation, ever." This makes trend analysis trivial and means you never lose data because of a bad parse — you just have one observation that's wrong, with the raw payload preserved for debugging.

For trend queries, a materialized "current price" view (refreshed once per scrape cycle) keeps the dashboard fast without losing the underlying history.
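
MySQL has no native materialized views, so "materialize" here means a plain table rebuilt at the end of each cycle. A minimal sketch (the current_prices name is illustrative):

CREATE TABLE current_prices (
    tracked_sku_id  BIGINT PRIMARY KEY,
    price_cents     INT,
    in_stock        BOOLEAN,
    observed_at     TIMESTAMP
);

-- Refresh once per scrape cycle: keep only the latest observation per SKU.
REPLACE INTO current_prices
SELECT o.tracked_sku_id, o.price_cents, o.in_stock, o.observed_at
FROM price_observations o
JOIN (
    SELECT tracked_sku_id, MAX(observed_at) AS observed_at
    FROM price_observations
    GROUP BY tracked_sku_id
) latest
  ON o.tracked_sku_id = latest.tracked_sku_id
 AND o.observed_at = latest.observed_at;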

Change Detection

The naive approach is "if the new price differs from the last observed price, alert." This produces a torrent of noise. Real price-monitoring change detection has to handle three cases the naive version doesn't.

Scrape errors that look like price changes. The site served a challenge page; the parser returned None; the comparison "None != $19.99" looks like a change. Filter at the comparison layer: if either side is None, zero, or implausibly far from the running median, classify it as a parse failure rather than a price change. Log it as a parse error and don't alert. Don't treat the bad observation as a real price: keep the row and its raw payload for debugging, but exclude it from change detection and trend math.

Cosmetic fluctuations. Some sites round prices differently in different contexts, show different prices for logged-in vs. anonymous users, or A/B test microdiscounts. A change of less than, say, $0.50 or 1% is usually not signal. Make the threshold configurable per SKU or per category, with a conservative default.

Sale start / sale end vs. base-price changes. A displayed price dropping from $99 to $79 under a "SALE" badge, with $99 retained as the list price, is different from the base price itself moving from $99 to $79 with no badge. Track both fields. The first is a marketing event (often expected, often time-limited). The second is a real competitive move.

The change detector takes a new observation and the previous one and emits a structured ChangeEvent:

from dataclasses import dataclass
from typing import Any, Literal

@dataclass
class ChangeEvent:
    tracked_sku_id: int
    change_type: Literal["price_drop", "price_rise", "stock_in",
                         "stock_out", "sale_start", "sale_end",
                         "no_change", "parse_error"]
    old_value: Any
    new_value: Any
    delta_pct: float | None
    significance: Literal["high", "medium", "low", "none"]

The significance field is what drives alerting. High = page someone, alert immediately. Medium = batch into a daily digest. Low = log only, surface in the dashboard if asked. None = don't even log; it's below the noise floor.
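
A sketch of a detector implementing the three cases above. The thresholds, the 5x plausibility band, and the significance cutoffs are illustrative defaults, not fixed values; observations are assumed to be dicts shaped like price_observations rows, and stock transitions are omitted for brevity:

def _pct(old: int | None, new: int | None) -> float | None:
    """Percent change; None when the old value is missing or zero."""
    if not old or new is None:
        return None
    return round(100.0 * (new - old) / old, 2)

def detect_change(prev: dict, new: dict, *,
                  min_delta_cents: int = 50, min_delta_pct: float = 1.0,
                  median_cents: int | None = None) -> ChangeEvent:
    sku = new["tracked_sku_id"]
    p, n = prev.get("price_cents"), new.get("price_cents")

    # Parse failures masquerading as changes: None, zero, or a value outside
    # a plausibility band around the running median is not a price.
    if n is None or n == 0 or (
            median_cents and not (median_cents / 5 <= n <= median_cents * 5)):
        return ChangeEvent(sku, "parse_error", p, n, None, "low")  # log, never alert

    # A populated "was" price marks a sale badge, not a base-price move.
    if new.get("list_price_cents") and not prev.get("list_price_cents"):
        return ChangeEvent(sku, "sale_start", p, n, _pct(p, n), "medium")
    if prev.get("list_price_cents") and not new.get("list_price_cents"):
        return ChangeEvent(sku, "sale_end", p, n, _pct(p, n), "medium")

    # Cosmetic fluctuation: below the absolute or relative threshold.
    delta = n - p if p is not None else 0
    pct = _pct(p, n)
    if abs(delta) < min_delta_cents or abs(pct or 0) < min_delta_pct:
        return ChangeEvent(sku, "no_change", p, n, pct, "none")

    kind = "price_drop" if delta < 0 else "price_rise"
    tier = "high" if abs(pct or 0) >= 10 else "medium"  # placeholder cutoff
    return ChangeEvent(sku, kind, p, n, pct, tier)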

Fetching at Scale

For a few hundred SKUs across a couple of well-behaved sites, the fetcher is a plain requests loop with polite delays. For the actual production case — thousands of SKUs across sites with bot protection — the fetcher is the same multi-stage worker pool I describe in the production web scraping article. Queue table, parallel workers partitioned by SKU ID modulo, block-only proxy rotation, on-disk cache.
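
The partitioning detail is worth a sketch: each worker owns the SKUs whose ID is congruent to its index, so no queue row needs a lock. The function and the eight-worker count below are illustrative:

# Lock-free partitioning: worker i owns every SKU whose id % NUM_WORKERS == i.
# The SQL side of the same idea: WHERE is_active AND (id % 8) = <worker_idx>
NUM_WORKERS = 8

def skus_for_worker(sku_ids: list[int], worker_idx: int) -> list[int]:
    return [s for s in sku_ids if s % NUM_WORKERS == worker_idx]

# Every SKU lands in exactly one partition, so workers never contend.
all_ids = list(range(100))
claimed = sorted(s for i in range(NUM_WORKERS) for s in skus_for_worker(all_ids, i))
assert claimed == all_ids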

The cache is critical. Fetch the page once per cycle, parse from the cached HTML. If the parser breaks tomorrow when you add a new field, you can re-parse last week's cache to backfill the historical data — no extra fetches, no burned proxies.
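
A sketch of that discipline, assuming a local cache/ directory and a fetch_html callable standing in for the real fetch layer:

import hashlib
from pathlib import Path

CACHE_DIR = Path("cache")

def cache_path(url: str, cycle: str) -> Path:
    # One file per (URL, scrape cycle), keyed by a hash of the URL.
    key = hashlib.sha256(url.encode()).hexdigest()[:16]
    return CACHE_DIR / cycle / f"{key}.html"

def get_html(url: str, cycle: str, fetch_html) -> str:
    path = cache_path(url, cycle)
    if path.exists():
        return path.read_text()   # re-parse for free: no fetch, no proxy burn
    html = fetch_html(url)        # the only line that costs a request
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(html)
    return html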

Run frequency is per-category. Top-priority SKUs ("the 50 we react to") get scraped every hour. The long tail gets scraped daily or twice-daily. There is rarely a reason to scrape every SKU at the same frequency; tier the schedule and you'll cut bandwidth and proxy consumption significantly without losing meaningful signal.
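
The tiering can be as simple as an interval map consulted by the scheduler; the tier names and hours below just mirror the cadences described above:

from datetime import datetime, timedelta, timezone

# Hours between scrapes per tier; "top" is the 50 SKUs the team reacts to.
TIER_INTERVALS = {"top": 1, "standard": 12, "long_tail": 24}

def is_due(last_scraped: datetime | None, tier: str) -> bool:
    if last_scraped is None:
        return True
    interval = timedelta(hours=TIER_INTERVALS.get(tier, 24))
    return datetime.now(timezone.utc) - last_scraped >= interval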

Alert Routing

Alerts are where most price-monitoring systems lose the people they're supposed to help. If the team is getting 200 emails a day, they will route them to a folder no one reads. The alerting layer has to be opinionated.

Three channels, three jobs:

Real-time channel (Slack, SMS, push) — only for significance = high events. Top SKUs only, large changes only. The threshold should be set so the team gets at most a handful of these per day. If they're getting more, the threshold is wrong, not the system.

Daily digest (email) — everything significance = medium from the last 24 hours, grouped by category and competitor. One email, not 50. The team reads it with morning coffee.

Dashboard (web) — everything. The dashboard is where you go when you want to investigate a specific SKU, see trend lines, drill into raw observations. It's pull-based, not push-based; nobody gets paged by the dashboard.

The routing rules live in the tracked_skus table or a related alert_rules table — per-SKU thresholds, per-recipient channel preferences, suppression windows. They should be editable by the operations team without a code change.
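
In code, the router reduces to a dispatch on significance. The channel function and the rules dict (mirroring alert_rules) are placeholders for whatever the team actually runs:

digest_queue: list[ChangeEvent] = []

def send_realtime(event: ChangeEvent) -> None:
    """Stand-in for the Slack / SMS / push integration."""
    print(f"[ALERT] sku={event.tracked_sku_id} {event.change_type} {event.delta_pct}%")

def route(event: ChangeEvent, rules: dict) -> None:
    rule = rules.get(event.tracked_sku_id, {})
    if event.significance == "high" and not rule.get("suppressed", False):
        send_realtime(event)            # should fire a handful of times per day
    elif event.significance == "medium":
        digest_queue.append(event)      # flushed once daily as a single email
    # "low" is already stored with the observation (dashboard-only);
    # "none" is below the noise floor and goes nowhere.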

Per-Site Quirks

Every site has weird behavior you'll have to handle. The architectural answer is to put the quirks in the parser, never in the change detector or alert layer.

Examples from real projects:

  • Site shows different prices depending on which warehouse you geolocate to. Parser captures the warehouse; downstream filters by relevant region.
  • Site shows a "with promo code" price that's only valid for new customers. Parser captures both the displayed price and the post-promo price; alert layer is configured to ignore the promo path.
  • Site has a 24-hour sale countdown that resets every day. The "sale" is the regular price; the "list price" is fictional. Parser flags this site so sale-start / sale-end events get suppressed.
  • Site occasionally shows a "Sold Out" placeholder for items that are actually in stock — a CDN race condition. Parser checks the order button state in addition to the stock label.

The parser is per-site, in its own file, with a strict schema. Quirks get documented at the top of the file. When the site changes, you edit one file. When a new quirk shows up, you add it to one file. The change-detection and alerting logic stays clean.
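
The shape of one such parser module; the site, selectors, and helper names are invented for illustration:

"""Parser for example-competitor.com.

Known quirks:
  - The daily countdown "sale" is the regular price; suppress sale events.
  - "Sold Out" placeholder can appear for in-stock items (CDN race);
    trust the buy-button state over the stock label.
"""
import re

SUPPRESS_SALE_EVENTS = True  # countdown-sale quirk: the "was" price is fictional

def _cents(html: str, pattern: str) -> int | None:
    m = re.search(pattern, html)
    return round(float(m.group(1)) * 100) if m else None

def parse(html: str) -> dict:
    """Return a dict shaped like a price_observations row."""
    return {
        "price_cents": _cents(html, r'data-price="([\d.]+)"'),
        "list_price_cents": None if SUPPRESS_SALE_EVENTS
                            else _cents(html, r'data-was-price="([\d.]+)"'),
        # Quirk: the stock label lies during CDN races; the button does not.
        "in_stock": 'id="buy-button"' in html,
        "currency": "USD",
    }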

Wrap-Up

A production price monitor is a small fetcher wrapped in a careful schema, careful change detection, and careful alerting. The schema decouples observations from tracked SKUs and keeps everything append-only. The change detector classifies events into significance tiers instead of treating every diff as an alert. The alerting layer respects the team's attention budget — real-time only for high-signal, digest for medium, dashboard for everything.

What this gets you is a system the team actually trusts. The morning email is read because it's not noise. The Slack pings are acted on because they're rare. The dashboard answers questions because the underlying data is clean.

For the surrounding architecture — the fetching layer, anti-bot strategy, dashboard design — see the E-Commerce Automation, Python Web Scraping, and Automation Dashboards hubs.

Dustin Holdiman — Founder, ThinkGenius

Software engineer focused on production scraping, browser automation, anti-bot infrastructure, AI extraction pipelines, and the dashboards that let businesses actually run them. Builds custom Python, Playwright, Kameleo, Undetectable, MySQL, and operations-tooling systems for companies that have outgrown off-the-shelf tools.

Need a Custom Automation System?

Need help building a production scraping, browser automation, or AI data extraction system? I build custom Python, Playwright, Kameleo, Undetectable, MySQL, and dashboard-based automation systems for businesses.