Why Google SERPs Are a Different Beast
Most anti-bot guides lump "the web" together and hand you a checklist. Google's Search results page does not care about most of that checklist. Browser fingerprinting, passive TLS signals, mouse entropy, and a behavioral model trained on billions of real queries make Google one of the hardest surfaces on the public internet to scrape at scale. This guide is narrowly scoped: how to keep a SERP scraper alive, not how to scrape the web in general.
If you are scraping product pages, news sites or social feeds, most of what follows is overkill. But if your job is rank tracking, competitor monitoring, or feature-box extraction (People Also Ask, AI Overviews, Knowledge Panels, Top Stories), you need a Google-specific playbook.
The Signals Google Actually Reads
In 2026, Google's SERP anti-abuse stack evaluates at least six layers before it serves you a result page. Failing any one of them is usually enough to earn a `/sorry/index` redirect or an inline `/sorry/` reCAPTCHA challenge.
- IP reputation + ASN class: datacenter ASNs (AS14061, AS16509, AS16276) are penalized harder than residential and mobile ASNs. Orange Polska mobile ranges are treated as consumer traffic.
- TLS fingerprint (JA3/JA4): `curl` and plain `requests` announce themselves immediately. Google compares JA4 against the User-Agent string.
- HTTP/2 pseudo-header ordering: browsers send `:method`, `:authority`, `:scheme`, `:path` in a specific order. Most scrapers do not.
- Consent state (`CONSENT` cookie): absence on EU IPs triggers a consent wall that kills a naïve scraper.
- Query entropy: same query, same session, high frequency = bot. Humans reformulate.
- Click-through and dwell: a session that only fetches `/search` and never clicks a result is a strong bot signal; Google remembers sessions via the `NID` cookie.
Hit the Right Endpoint
Do not scrape www.google.com/search blindly. The SERP surface changes based on which endpoint you hit:
- `/search?q=...&hl=en&gl=us&num=20&pws=0` — desktop, personalization off, 20 organic results.
- `/search?q=...&tbm=nws` — news vertical, different layout, different HTML.
- `/search?q=...&udm=14` — "Web" filter introduced in 2024 that hides AI Overviews. Useful for rank trackers that want classic blue links only.
- `/search?q=...&brd_json=1` — does not exist; any guide that tells you to append undocumented flags is wrong and usually triggers a block.
The `hl` (host language) and `gl` (geo) parameters matter more than people realize. They do not geolocate you — your IP does — but they do control the SERP UI, the spell-corrector, and which result set Google picks. If your IP is Polish mobile but `gl=us`, Google will sometimes serve you a transitional interstitial before the SERP. Match them: a Polish mobile IP should send `hl=pl&gl=pl` unless you are explicitly testing cross-locale ranking.
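A tiny helper keeps that rule from being violated by accident. This is a sketch: the mapping table and function name are illustrative, not part of any API.

```python
# Keep hl/gl consistent with the proxy's exit country.
# Hypothetical mapping; extend for the countries you actually exit from.
LOCALE_BY_COUNTRY = {
    "pl": {"hl": "pl", "gl": "pl"},
    "us": {"hl": "en", "gl": "us"},
    "de": {"hl": "de", "gl": "de"},
}

def serp_params(query: str, proxy_country: str, cross_locale: bool = False) -> dict:
    """Build /search params that match the exit IP, unless explicitly
    testing cross-locale ranking (which forces the US locale here)."""
    locale = LOCALE_BY_COUNTRY["us" if cross_locale else proxy_country]
    return {"q": query, "num": 20, "pws": 0, **locale}
```

Passing the result straight into your HTTP client's `params` argument keeps the locale decision in one place.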
Session Stickiness: The SERP-Specific Wrinkle
For generic scraping, everyone tells you to rotate IPs aggressively. For Google SERPs, that advice is half wrong. Google hands you an NID cookie on first contact and expects to see it come back with the same IP for at least a short window. Rotate the IP mid-session and you look like a hijacked cookie — instant challenge.
The practical rule: one IP per session, one session per keyword batch. A "session" here means a warm-up request to google.com, a consent acceptance if the IP is in the EEA, then 5–40 SERP fetches that reuse the same cookie jar. Then rotate, burn the cookies, and start over.
On ProxyPoland mobile proxies, you do this with the per-port sticky mode:
```python
import httpx

PROXY = "http://user:pass@api.proxypoland.com:10001"  # port 10001 = sticky

async def fetch_batch(keyword_batch):
    async with httpx.AsyncClient(
        proxy=PROXY,      # httpx >= 0.26 uses `proxy=`; older versions use `proxies=`
        http2=True,       # requires the httpx[http2] extra
        headers={
            "User-Agent": "Mozilla/5.0 (Linux; Android 14; SM-S918B) AppleWebKit/537.36",
            "Accept-Language": "en-US,en;q=0.9",
        },
        timeout=20.0,
        follow_redirects=True,
    ) as client:
        await client.get("https://www.google.com/")  # warm-up, collect NID
        for kw in keyword_batch:
            r = await client.get(
                "https://www.google.com/search",
                params={"q": kw, "hl": "en", "gl": "us", "num": 20, "pws": 0},
            )
            yield parse_serp(r.text)  # parse_serp: your own HTML parser
```
Rendering: When You Need a Headless Browser (and When You Don't)
The honest answer is: for classic ten-blue-link SERPs, raw HTTP is still fine in 2026 if your TLS and header stack are correct. Where a headless browser becomes non-optional:
- AI Overviews — injected via client-side JS after the initial response. Raw HTML has only a skeleton.
- People Also Ask expansion — each expansion is a separate XHR to `/async/ge_ansp`; parsing the initial HTML only gives you the four seed questions.
- Shopping / Products pack — carousels are hydrated from a lazy-loaded JSON blob.
- Discover / local pack reviews — same pattern.
For these, use Playwright with a mobile Chrome profile on a mobile proxy, and do the one critical thing half of SERP scrapers forget: do not rely on the `--disable-blink-features=AutomationControlled` flag alone; use patched undetected-chromedriver or Playwright with a stealth plugin. Google checks `navigator.webdriver`, Chromium's `Runtime.Enable` CDP chatter, and the presence of unusual CDP bindings.
Avoiding the /sorry/ reCAPTCHA
A /sorry/index?continue=... redirect is Google's soft-block. Hitting one does not mean your IP is dead — it means your current session needs to solve a reCAPTCHA v2 challenge before Google will serve SERPs again. Your options are:
- Cut the session: throw away cookies, rotate the mobile IP, warm up a new session. This is the correct default for non-interactive scrapers — solving reCAPTCHAs is expensive and slow.
- Solve it: 2Captcha / CapMonster can solve the v2 image challenge in ~15–40 s at $1–3 per 1,000 solves. Only worth it if you are paying per IP and burning one for every challenge is more expensive than solving.
- Back off and retry later: the rate limit on a specific IP/cookie pair usually decays within 15–60 minutes.
The single best predictor of /sorry/ frequency is query shape, not volume. Scraping 500 different long-tail keywords per hour from one mobile IP is fine. Scraping the same 5 high-commercial-intent head terms 100 times each is not — Google's query-entropy signal flags it in minutes.
Rate Limits That Actually Work on Google
Published numbers from other blogs (8–15 s between requests, 1 request per minute) are outdated and cargo-culted. In 2026 what matters is:
- Jitter, not fixed delay: 3–9 s uniformly randomized, with 10% chance of a 20–40 s "user reading" pause.
- Daily cap per IP: budget ~600–1,500 SERP fetches per mobile IP per day. Above that, soft blocks spike.
- Session cap: 20–50 SERPs per cookie jar, then rotate.
- No parallel fetches on the same IP: one request at a time per proxy port. Parallelism goes across ports, not within one.
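The jitter and caps above can be sketched as a small pacing helper. The numbers mirror the list; the class and function names are hypothetical.

```python
import random

def humanized_delay() -> float:
    """Seconds to sleep before the next SERP fetch: 3-9 s uniform jitter,
    with a 10% chance of a longer 20-40 s 'user reading' pause."""
    if random.random() < 0.10:
        return random.uniform(20.0, 40.0)
    return random.uniform(3.0, 9.0)

class SessionBudget:
    """Tracks the per-session (20-50 SERPs) and per-day (~600-1,500) caps."""

    def __init__(self, per_session: int = 35, per_day: int = 1000):
        self.per_session = per_session
        self.per_day = per_day
        self.session_count = 0
        self.day_count = 0

    def record_fetch(self) -> None:
        self.session_count += 1
        self.day_count += 1

    def should_rotate(self) -> bool:
        return self.session_count >= self.per_session

    def ip_exhausted(self) -> bool:
        return self.day_count >= self.per_day

    def rotate(self) -> None:
        self.session_count = 0  # day_count persists until the IP is retired
```

One worker per proxy port calls `humanized_delay()` between fetches and checks `should_rotate()` after each one, which enforces the no-parallelism rule by construction.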
Parsing: Don't Rely on Class Names
Google renames CSS classes roughly every 4–8 weeks. A selector like `div.yuRUbf > a` will break before your quarter ends. Anchor on stable attributes instead:
- Organic results: `div[data-hveid][data-ved] h3` and the nearest ancestor `<a href>` whose `href` does not start with `/search?`.
- Sitelinks: child `<a>` tags inside the same `data-hveid` block.
- Snippets: `div[data-sncf="1"]` or the first `<span>` after the `cite` tag.
- AI Overviews: block with `data-subtree="aio"` — stable since mid-2024.
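A sketch of the organic-result rule with BeautifulSoup, assuming the attribute anchors above still hold; the function name is hypothetical.

```python
from bs4 import BeautifulSoup

def parse_organic(html: str) -> list[dict]:
    """Pull organic results anchored on data-* attributes, not class names."""
    soup = BeautifulSoup(html, "html.parser")
    out = []
    for h3 in soup.select("div[data-hveid][data-ved] h3"):
        a = h3.find_parent("a")  # the title h3 normally sits inside the link
        href = a.get("href") if a else None
        if not href or href.startswith("/search?"):
            continue             # skip internal navigation links
        out.append({"title": h3.get_text(strip=True), "url": href})
    return out
```

Because nothing here depends on a generated class name, this survives Google's routine class renames; only a structural redesign breaks it.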
If you can, offload this to an API that maintains selectors for you (SerpApi, Oxylabs SERP API, Bright Data SERP). At ProxyPoland we provide the transport layer — the Polish mobile IPs that do not get blocked — but the parsing game is a full-time job and outsourcing it is rational.
Mobile Proxies vs Residential for SERP
Residential proxies (rotating pools sourced from consumer devices) are the default recommendation in most guides. For Google SERPs specifically, real mobile IPs have two structural advantages:
- CGNAT tolerance: Google knows mobile carriers NAT thousands of real users behind the same IP. A high request rate from a carrier IP is less suspicious than the same rate from a residential IP, because it is statistically normal.
- Stable session windows: consumer residential proxies churn the IP whenever the peer disconnects. That breaks the sticky-session pattern SERP scrapers need. A dedicated mobile port holds the IP as long as you want.
The downside: mobile IPs are slower (50–150 ms added latency) and more expensive per IP. For rank tracking at a few thousand queries per day, the math tips toward mobile. For "scrape all of SERP for every keyword on Earth", you need a large residential pool with a specialized SERP API layer on top.
A Working Stack, End to End
- Transport: ProxyPoland mobile ports in sticky mode, one port per concurrent worker.
- HTTP client: `curl_cffi` (Python) or `undici` with a patched JA4 fingerprint (Node). Plain `requests` and `axios` will be flagged.
- Headers: real mobile Chrome 120+ UA, matching `sec-ch-ua` hints, `Accept-Language` aligned with `hl`.
- Cookies: accept `NID`, `CONSENT`, `SOCS`; persist across the batch, drop on rotation.
- Renderer: Playwright with stealth, mobile emulation, only for AI Overviews / PAA expansion.
- Parser: attribute-based selectors, fuzzy fallbacks, daily regression fixtures.
- Rotation: on session end, on a /sorry/ trigger, or after 20–50 SERPs, whichever comes first.
What Not to Do
- Do not share cookie jars across IPs. One of the fastest ways to get every proxy you own soft-blocked.
- Do not scrape `suggestqueries.google.com` as a substitute — it has a much stricter rate limit than SERPs.
- Do not ignore the consent wall on EU IPs. Accept once per session, store the `SOCS` cookie, move on.
- Do not scrape signed-in Google (`accounts.google.com` cookies). You will lose accounts and proxies simultaneously.
- Do not set a custom `Cookie` header manually — Google compares it against expected JS-set values, and the mismatch is fingerprinted.
Bottom Line
Scraping Google SERPs in 2026 is a session-discipline problem, not a raw-IP problem. The IP has to be clean (mobile or good residential), but the reason most scrapers die is sloppy session management, mismatched hl/gl, header/JA4 drift, and naïve rotation timing. Get those four right and a single Polish mobile IP on ProxyPoland will quietly pull 1,000+ SERPs a day indefinitely. Get them wrong and it does not matter how many proxies you buy.
If you are starting a SERP scraper from scratch, begin with one sticky port, a curl_cffi client, and a 20-SERP session budget. Grow from there only after you see zero /sorry/ redirects over a full week. That restraint is the feature, not the bug.
