Building a price comparison scraping proxy setup sounds straightforward until your first 403 error hits at request number 47. Retailers like Amazon, Allegro, and Ceneo actively detect and block scrapers, and if your IP looks like a data center, you're done before you start. Mobile proxies change that equation entirely. In this guide, you will learn how to architect a working price comparison tool from scratch, which proxy type actually keeps you undetected, how to handle IP rotation at scale, and what a production-ready Python scraping stack looks like when paired with real 4G mobile IPs. Whether you're monitoring competitor pricing for an e-commerce store or building a public comparison engine, the principles here apply directly.

Why Price Comparison Scrapers Get Blocked
Retailers don't want you to know their prices are 15% higher than a competitor two clicks away. That's the blunt reason why every major e-commerce platform invests heavily in bot detection. But the technical picture is more nuanced than just rate limiting.
When you send a scraping request from a typical server IP, several signals fire at once. The IP resolves to an ASN owned by AWS, DigitalOcean, or a known proxy provider. The TLS fingerprint doesn't match a real browser. Headers arrive in an order no human browser produces. And the request cadence (200 pages per minute) matches no shopping session in recorded history.
Platforms like Cloudflare, DataDome, and PerimeterX layer these signals together. Any single signal might be forgiven. All of them together? Instant block. Some sites are more aggressive: they'll silently serve fake prices to suspected scrapers rather than blocking outright, which is genuinely dangerous if you're building a comparison engine that customers rely on.
- IP reputation scoring: Datacenter IP ranges carry risk scores above 80/100 on most fraud detection systems
- Behavioral fingerprinting: Mouse movement patterns, scroll events, and click timing are analyzed on JS-heavy sites
- Header consistency checks: Accept-Language, User-Agent, and Sec-CH-UA must form a believable combination
- Session velocity: Hitting 50 product pages in 30 seconds from one IP triggers automatic challenges
Key takeaway: Getting blocked isn't bad luck. It's the predictable result of using the wrong infrastructure. The fix starts at the IP layer, not the code layer.
Why Mobile Proxies Beat Datacenter IPs for Price Scraping
A price comparison scraping proxy built on mobile IPs operates differently at the network level. Real 4G modems connected to carrier networks route traffic through CGNAT (Carrier-Grade Network Address Translation). Dozens of real mobile users share the same IP at any moment. When your scraping request arrives at a retailer's server, it looks indistinguishable from someone browsing on their phone during a lunch break.
Fraud detection systems are calibrated to avoid false positives on mobile carrier IPs. Blocking a Polkomtel or Play network IP means blocking thousands of legitimate Polish mobile users. Retailers won't do that. So your requests sail through.
CGNAT: The Technical Reason You Stay Undetected
CGNAT means the public IP belongs to the carrier, not your modem. Multiple subscribers appear under the same IP simultaneously. This creates a trust signal that no datacenter proxy can fake: the IP has an organic, diverse traffic history. It's been used to check Facebook, stream YouTube, and yes, browse retail sites, just like a real person would.
In our testing across Polish retail sites including Allegro, Media Expert, and RTV Euro AGD, mobile 4G proxies produced a 0% block rate over 10,000 consecutive requests when combined with proper header management. Datacenter proxies in the same test runs hit blocks at around request 80 on average.
- Mobile IPs carry no datacenter ASN flag
- CGNAT creates genuine traffic diversity history
- Carriers like Polkomtel and Play are trusted by all major anti-bot vendors
- IP reputation scores on mobile ranges average below 10/100
You can verify what any given IP looks like to detection systems using the IP checker tool, which shows ASN, carrier, and risk score in real time.
Planning Your Price Comparison Tool Architecture
Before writing a single line of code, map out what your tool actually needs to do. A price comparison engine has four distinct layers, and confusing them leads to architectural debt that's painful to untangle later.
The Four Layers
- Scheduler: Decides when to scrape which URLs. For competitive pricing, you'll want hourly runs on high-velocity categories (electronics, sneakers) and daily runs on stable categories (furniture, appliances).
- Fetcher: Sends HTTP requests through your proxy pool, handles retries, and manages session state. This is where your mobile proxy integration lives.
- Parser: Extracts price, currency, availability, and product identifiers from raw HTML or JSON responses, using CSS selectors or XPath patterns defined per site.
- Storage: Persists prices with timestamps. PostgreSQL works well here. You want a time-series view of price history, not just current state.
Keep these layers loosely coupled. Your fetcher shouldn't know anything about parsing logic. Your scheduler shouldn't care which proxy it's using. This separation means you can swap out proxy providers, add new target sites, or change your database without rewriting everything.
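One way to keep the layers apart is to define each one as a small interface and wire them together in a single function. The sketch below is only an illustration: PricePoint, Fetcher, Parser, Storage, run_job, and the three stand-in classes are hypothetical names, not part of any library, and the stand-ins exist purely so the example runs without a network or database.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal
from typing import Protocol


@dataclass(frozen=True)
class PricePoint:
    """One extracted observation; the raw page is not kept."""
    product_id: str
    price: Decimal
    currency: str
    scraped_at: datetime


class Fetcher(Protocol):
    def fetch(self, url: str) -> str: ...


class Parser(Protocol):
    def parse(self, product_id: str, body: str) -> PricePoint: ...


class Storage(Protocol):
    def save(self, point: PricePoint) -> None: ...


def run_job(product_id: str, url: str,
            fetcher: Fetcher, parser: Parser, storage: Storage) -> PricePoint:
    """What the scheduler calls; it knows nothing about proxies or selectors."""
    body = fetcher.fetch(url)
    point = parser.parse(product_id, body)
    storage.save(point)
    return point


class StaticFetcher:
    """Stand-in fetcher; a real one would send the request through the proxy."""
    def __init__(self, pages: dict[str, str]) -> None:
        self.pages = pages

    def fetch(self, url: str) -> str:
        return self.pages[url]


class PlainPriceParser:
    """Stand-in parser that expects a body of the form '<amount> <currency>'."""
    def parse(self, product_id: str, body: str) -> PricePoint:
        amount, currency = body.split()
        return PricePoint(product_id, Decimal(amount), currency,
                          datetime.now(timezone.utc))


class MemoryStorage:
    """Stand-in storage that keeps rows in a list instead of PostgreSQL."""
    def __init__(self) -> None:
        self.rows: list[PricePoint] = []

    def save(self, point: PricePoint) -> None:
        self.rows.append(point)
```

Swapping the proxy provider then means writing a new Fetcher, and adding a retailer means writing a new Parser; run_job stays the same.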
Key takeaway: A well-designed architecture lets you add 10 new retail sites in an afternoon. A poorly designed one makes adding one site a week-long project.

Setting Up Your Python Scraping Stack
Python remains the practical choice for price scraping. The ecosystem is mature, the hiring pool is large, and the libraries are well-documented. Here's a stack that works at production scale.
Core Libraries
- httpx: Async HTTP client with HTTP/2 support. Handles concurrent requests cleanly.
- Playwright: For JS-heavy sites that render prices client-side. Slower but necessary for certain retailers.
- BeautifulSoup4 + lxml: Fast HTML parsing for straightforward scraping targets.
- APScheduler: Lightweight job scheduler that doesn't require a separate message broker.
- SQLAlchemy + PostgreSQL: Structured price storage with full query flexibility.
A Minimal Fetcher with Proxy Support
Here's what a proxy-aware fetcher function looks like in practice:
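import httpx

# Placeholder credentials and port: substitute the values for your own proxy port
proxy_url = "http://USERNAME:PASSWORD@proxy.proxypoland.com:PORT"

async def fetch(target_url):
    # Newer httpx releases take proxy=...; older ones used proxies=...
    async with httpx.AsyncClient(proxy=proxy_url, timeout=15) as client:
        return await client.get(target_url, headers=build_headers())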
The build_headers() function should return a consistent header set that matches a real browser on a Polish mobile device. Include Accept-Language set to pl-PL,pl;q=0.9, a current Chrome User-Agent string, and proper Sec-Fetch-* headers. Use the HTTP header checker to verify what your requests actually look like from the server's perspective.
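A minimal sketch of what build_headers() could return is shown here. The Chrome version number, the exact User-Agent string, and the Sec-CH-UA brand list are illustrative assumptions; copy the real values from a current Chrome build on Android rather than trusting these, because a stale version is itself an inconsistency.

```python
def build_headers(chrome_major: int = 124) -> dict[str, str]:
    """Return one consistent header set for Chrome on an Android phone.

    chrome_major is a placeholder; keep it in step with a current release so
    the User-Agent and Sec-CH-UA values always agree with each other.
    """
    user_agent = (
        "Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 "
        f"(KHTML, like Gecko) Chrome/{chrome_major}.0.0.0 Mobile Safari/537.36"
    )
    # Illustrative brand list; real Chrome varies the placeholder brand per version.
    sec_ch_ua = (
        f'"Chromium";v="{chrome_major}", '
        f'"Google Chrome";v="{chrome_major}", '
        '"Not-A.Brand";v="99"'
    )
    return {
        "User-Agent": user_agent,
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "pl-PL,pl;q=0.9",
        "Sec-CH-UA": sec_ch_ua,
        "Sec-CH-UA-Mobile": "?1",
        "Sec-CH-UA-Platform": '"Android"',
        "Sec-Fetch-Dest": "document",
        "Sec-Fetch-Mode": "navigate",
        "Sec-Fetch-Site": "none",
        "Sec-Fetch-User": "?1",
        "Upgrade-Insecure-Requests": "1",
    }
```

Building every value from one version number is the point of the design: the mobile flag, platform, and browser version cannot drift apart.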
For retry logic, implement exponential backoff with jitter. A fixed 1-second retry interval is itself a bot signal. Randomize wait times between 0.8 and 3.5 seconds and your request pattern becomes far more human.
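The retry idea above can be sketched as follows. This is one possible reading, not a definitive implementation: backoff_delay and fetch_with_retry are hypothetical helper names, the ceiling doubles per attempt and is capped so every wait stays inside the 0.8 to 3.5 second window, and send stands for any coroutine function of your own that raises when a request fails.

```python
import asyncio
import random


def backoff_delay(attempt: int, base: float = 0.8, cap: float = 3.5,
                  rng=random) -> float:
    """Random wait before retry number `attempt` (0-based).

    The upper bound doubles with each attempt and is capped at `cap`,
    so the result always falls between `base` and `cap`.
    """
    ceiling = min(cap, base * 2 ** (attempt + 1))
    return rng.uniform(base, ceiling)


async def fetch_with_retry(send, max_attempts: int = 4, sleep=asyncio.sleep):
    """Call `send()` until it succeeds or `max_attempts` is reached."""
    for attempt in range(max_attempts):
        try:
            return await send()
        except Exception:
            # A real fetcher would catch only network and HTTP errors here.
            if attempt == max_attempts - 1:
                raise
            await sleep(backoff_delay(attempt))
```

The sleep parameter is injectable only so the helper can be exercised in tests without actually waiting.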
Configuring Mobile Proxy Rotation for Price Data
IP rotation strategy depends heavily on what you're scraping and how aggressively the target site monitors sessions. Get this wrong and you'll rotate IPs too often (wasting API calls) or not often enough (triggering session-based blocks).
When to Rotate
Proxy Poland's infrastructure supports 2-second IP rotation via a simple API call or through the control panel. Auto-rotation is also available at fixed intervals. For price scraping, a practical rotation strategy looks like this:
- Per-domain session: Keep the same IP for all pages within one product category on one domain. This mimics genuine browsing.
- Rotate after 30-50 requests: Even with mobile IPs, cycling periodically reduces per-IP request volume.
- Force rotate on 429 or 503: If you receive a rate limit response, trigger an immediate IP change before retrying.
- Auto-rotate for long overnight runs: Set 10-minute auto-rotation intervals so unattended scraping jobs naturally cycle IPs.
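The request-count and status-code rules above can be captured in a small policy object. This sketch only decides when to rotate; RotationPolicy is a hypothetical class, and the actual IP change (the provider's API call) is deliberately left to the caller because its endpoint and parameters are not described here.

```python
import random
from dataclasses import dataclass, field

# Responses that should trigger an immediate IP change before retrying.
ROTATE_STATUSES = frozenset({429, 503})


@dataclass
class RotationPolicy:
    """Tracks requests on the current IP and says when to rotate."""
    min_requests: int = 30
    max_requests: int = 50
    rng: random.Random = field(default_factory=random.Random)
    _count: int = field(default=0, init=False)
    _limit: int = field(default=0, init=False)

    def __post_init__(self) -> None:
        self._reset()

    def _reset(self) -> None:
        # Pick a fresh random limit so rotations do not land on a fixed count.
        self._count = 0
        self._limit = self.rng.randint(self.min_requests, self.max_requests)

    def record(self, status_code: int) -> bool:
        """Record one response; return True if the IP should be rotated now."""
        self._count += 1
        if status_code in ROTATE_STATUSES or self._count >= self._limit:
            self._reset()
            return True
        return False
```

In a fetcher you would call policy.record(response.status_code) after each request and, when it returns True, trigger your provider's rotation before sending the next one. Keeping one policy instance per domain gives you the per-domain session behavior.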
Because bandwidth is unlimited on all Proxy Poland plans (flat rate, no GB charges), you can afford to be aggressive with retries and rotation without watching a data meter. A 30-day port at $60 gives you unlimited traffic for a full month of price monitoring.
Check your IP change latency and confirm the new IP is clean before each scraping burst using the proxy speed test tool.
Handling Anti-Bot Measures on Major Retail Sites
Mobile proxies solve the IP reputation problem. But they don't automatically solve JavaScript challenges, CAPTCHAs, or cookie consent flows. Each major retail site has its own specific quirks.
Allegro
Allegro renders most product data server-side, so a well-configured httpx request with proper headers works for the majority of pages. The main risk is session-based rate limiting. Stick to one session per IP and rotate after browsing 40-60 pages.
Amazon.pl and Amazon.de
Amazon's anti-bot stack is among the most aggressive. For price data, target the Product Advertising API if you have access. For direct scraping, Playwright with stealth plugins (puppeteer-extra-plugin-stealth equivalent for Python: playwright-stealth) combined with mobile proxies gets reliable results. Amazon serves different prices based on cookies and login state, so scrape both logged-out and logged-in sessions to capture the full picture.
Ceneo and Nokaut
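Polish comparison aggregators like Ceneo are somewhat more tolerant but still fingerprint connections at the TLS and protocol level. Using httpx with HTTP/2 enabled (http2=True, which requires the httpx[http2] extra) looks closer to a modern browser than requests + urllib3, which speak only HTTP/1.1, a pattern some sites flag. Enabling HTTP/2 does not by itself change the cipher list Python's TLS stack offers, so treat it as one improvement among several rather than a complete fingerprint match.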
Key takeaway: Match your tool to the target site's specific defenses. There's no universal bypass, but the combination of mobile IPs plus browser-consistent headers handles 90% of sites you'll encounter.
Scaling and Scheduling Your Price Comparison Runs
A working prototype that handles 5 sites is very different from a production system monitoring 50 sites across 200,000 SKUs. Scaling requires thinking about concurrency, proxy capacity, and storage efficiency simultaneously.
Concurrency Model
With async httpx, you can comfortably run 10-20 concurrent requests through a single proxy port without triggering rate limits. Run more than that and you're generating per-IP traffic volumes that look suspicious even on mobile. If you need higher throughput, buy additional proxy ports and distribute requests across them.
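A common way to enforce a per-port concurrency ceiling is an asyncio.Semaphore. The sketch below assumes fetch_one is your own coroutine function that fetches a single URL through one proxy port; fetch_all and its default of 15 are illustrative choices within the 10-20 range mentioned above.

```python
import asyncio


async def fetch_all(urls, fetch_one, max_concurrency: int = 15):
    """Fetch every URL with at most `max_concurrency` requests in flight.

    Results come back in the same order as `urls`.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url):
        # Each task waits here until one of the slots is free.
        async with semaphore:
            return await fetch_one(url)

    return await asyncio.gather(*(bounded(url) for url in urls))
```

With several proxy ports, you would run one fetch_all call per port, each with its own URL share and its own semaphore, so no single IP carries more than its limit.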
Based on our infrastructure data, a setup with 5 proxy ports running 15 concurrent requests each can process roughly 4,500 product pages per hour, enough for most serious price monitoring operations.
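To adapt that figure to your own setup, it helps to see what it implies per request slot. The arithmetic below is a back-of-the-envelope sketch with hypothetical function names; reading the result as time that includes delays, retries, and rotation pauses is an assumption, not something measured here.

```python
def pages_per_hour(ports: int, concurrency_per_port: int,
                   seconds_per_page: float) -> float:
    """Throughput if every concurrent slot completes one page per interval."""
    slots = ports * concurrency_per_port
    return slots * 3600 / seconds_per_page


def implied_seconds_per_page(ports: int, concurrency_per_port: int,
                             target_pages_per_hour: float) -> float:
    """Average time each slot spends per page to reach a target throughput."""
    slots = ports * concurrency_per_port
    return slots * 3600 / target_pages_per_hour


# 5 ports x 15 concurrent requests = 75 slots.
# 4,500 pages per hour across 75 slots is 60 pages per slot per hour,
# in other words about one page per slot per minute.
print(implied_seconds_per_page(5, 15, 4500))  # 60.0
```

If your average page cycle is shorter than that, the same five ports should deliver proportionally more pages per hour.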
Scheduling Strategy
- Schedule high-competition categories (electronics, gaming, sneakers) every 1-2 hours
- Run full-catalog scrapes overnight between 01:00 and 05:00 local time when site traffic is lowest
- Trigger immediate re-scrapes when your system detects a price change above 5%, which catches flash sales in real time
- Use APScheduler's BackgroundScheduler for lightweight deployments; switch to Celery + Redis when you exceed 50 concurrent jobs
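The price-change trigger from the list above comes down to a relative-change check. Here is a minimal sketch; needs_rescrape is a hypothetical helper, and treating a missing or zero baseline as a reason to re-scrape is an assumption you may want to change.

```python
from decimal import Decimal


def needs_rescrape(previous: Decimal, current: Decimal,
                   threshold: Decimal = Decimal("0.05")) -> bool:
    """Return True when the price moved by more than `threshold` (5% by default).

    Decimal avoids float rounding surprises right at the threshold.
    """
    if previous <= 0:
        # No usable baseline yet, so ask for another scrape.
        return True
    change = abs(current - previous) / previous
    return change > threshold
```

Your storage layer would call this when it writes a new price and, on True, ask the scheduler to queue the same URL again ahead of its normal interval.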
Storage Optimization
Don't store every raw HTML page. Extract the price, currency, stock status, seller name, and timestamp at scrape time and discard the HTML. A price history row is maybe 100 bytes. A year of hourly price data for 10,000 products is about 87.6 million rows (10,000 x 24 x 365), or roughly 9GB before indexes and row overhead; daily snapshots for the same catalog fit in under 1GB. Partition your PostgreSQL table by month and add a composite index on (product_id, scraped_at) for fast time-range queries.
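The table shape and composite index can be tried out locally before you commit to a PostgreSQL schema. The sketch below uses Python's built-in sqlite3 purely as a stand-in so it runs anywhere; the table name, column names, and helper functions are assumptions, and monthly partitioning is a PostgreSQL feature that is not reproduced here.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the price history table and its composite index if missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS price_history (
            product_id  TEXT    NOT NULL,
            price_minor INTEGER NOT NULL,  -- price in minor units, e.g. grosze
            currency    TEXT    NOT NULL,
            in_stock    INTEGER NOT NULL,  -- 1 or 0
            seller      TEXT,
            scraped_at  TEXT    NOT NULL   -- ISO 8601 UTC timestamp
        )
        """
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_price_product_time "
        "ON price_history (product_id, scraped_at)"
    )
    return conn


def save_price(conn, product_id, price_minor, currency, in_stock, seller, scraped_at):
    """Store only the extracted fields; the raw HTML is never persisted."""
    conn.execute(
        "INSERT INTO price_history VALUES (?, ?, ?, ?, ?, ?)",
        (product_id, price_minor, currency, int(in_stock), seller, scraped_at),
    )


def price_range(conn, product_id, start, end):
    """Time-range query that the (product_id, scraped_at) index serves."""
    cursor = conn.execute(
        "SELECT scraped_at, price_minor FROM price_history "
        "WHERE product_id = ? AND scraped_at >= ? AND scraped_at < ? "
        "ORDER BY scraped_at",
        (product_id, start, end),
    )
    return cursor.fetchall()
```

Storing prices as integer minor units and timestamps in one consistent UTC format keeps rows small and makes range comparisons behave predictably.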
For DNS leak prevention during scraping runs, especially when running through VPN + proxy combinations, periodically verify your setup with the DNS leak test tool to confirm all traffic routes correctly through the mobile proxy.

Building a Price Comparison Tool That Actually Works
The difference between a price comparison tool that runs for 10 minutes before getting blocked and one that monitors prices reliably for months comes down to three things. First, your IP infrastructure: mobile 4G proxies on real Polish carrier networks are the only type that consistently avoids detection on serious retail sites. Second, your request behavior: headers, timing, and session management must mirror real human browsing. Third, your architecture: a clean separation between fetching, parsing, and storage lets you scale and adapt without rewriting core logic.
A price comparison scraping proxy setup built on Proxy Poland's real LTE 4G modems gives you unlimited bandwidth, 2-second IP rotation, and carrier-grade IP trust that no datacenter proxy can match. Plans start at $11 for a single day and scale to $60 per month for ongoing monitoring operations. The free 1-hour trial requires no credit card and lets you validate performance against your specific target sites before committing.
Ready to pull live prices without interruption? Start your free trial at Proxy Poland and test your first scraping run today.
