Why Google SERPs Are a Different Beast
Most anti-bot guides lump "the web" together and hand you a checklist. Google's Search results page does not care about most of that checklist. Browser fingerprinting, passive TLS signals, mouse entropy, and a behavioral model trained on billions of real queries make Google one of the hardest surfaces on the public internet to scrape at scale. This guide is narrowly scoped: how to keep a SERP scraper alive, not how to scrape the web in general.
If you are scraping product pages, news sites or social feeds, most of what follows is overkill. But if your job is rank tracking, competitor monitoring, or feature-box extraction (People Also Ask, AI Overviews, Knowledge Panels, Top Stories), you need a Google-specific playbook.
The Signals Google Actually Reads
In 2026, Google's SERP anti-abuse stack evaluates at least six layers before it serves you a result page. Failing any one of them is usually enough to earn a `/sorry/index` redirect or an inline `/sorry/` reCAPTCHA challenge.
- IP reputation + ASN class: datacenter ASNs (AS14061, AS16509, AS16276) are penalized harder than residential and mobile ASNs. Orange Polska mobile ranges are treated as consumer traffic.
- TLS fingerprint (JA3/JA4): `curl` and plain `requests` announce themselves immediately. Google compares JA4 against the User-Agent string.
- HTTP/2 pseudo-header ordering: browsers send `:method`, `:authority`, `:scheme`, `:path` in a specific order. Most scrapers do not.
- Consent state (`CONSENT` cookie): absence on EU IPs triggers a consent wall that kills a naïve scraper.
- Query entropy: same query, same session, high frequency = bot. Humans reformulate.
- Click-through and dwell: a session that only fetches `/search` and never clicks a result is a strong bot signal; Google remembers sessions via the `NID` cookie.
Hit the Right Endpoint
Do not scrape www.google.com/search blindly. The SERP surface changes based on which endpoint you hit:
- `/search?q=...&hl=en&gl=us&num=20&pws=0` — desktop, personalization off, 20 organic results.
- `/search?q=...&tbm=nws` — news vertical, different layout, different HTML.
- `/search?q=...&udm=14` — "Web" filter introduced in 2024 that hides AI Overviews. Useful for rank trackers that want classic blue links only.
- `/search?q=...&brd_json=1` — does not exist; any guide that tells you to append undocumented flags is wrong and usually triggers a block.
The `hl` (host language) and `gl` (geo) parameters matter more than people realize. They do not geolocate you — your IP does — but they do control the SERP UI, the spell-corrector, and which result set Google picks. If your IP is Polish mobile but `gl=us`, Google will sometimes serve you a transitional interstitial before the SERP. Match them: a Polish mobile IP should send `hl=pl&gl=pl` unless you are explicitly testing cross-locale ranking.
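A tiny helper keeps that rule from being violated by accident. This is a sketch: the mapping table and function name are illustrative, not part of any API.

```python
# Keep hl/gl consistent with the proxy's exit country.
# Hypothetical mapping; extend for the countries you actually exit from.
LOCALE_BY_COUNTRY = {
    "pl": {"hl": "pl", "gl": "pl"},
    "us": {"hl": "en", "gl": "us"},
    "de": {"hl": "de", "gl": "de"},
}

def serp_params(query: str, proxy_country: str, cross_locale: bool = False) -> dict:
    """Build /search params that match the exit IP, unless explicitly
    testing cross-locale ranking (which forces the US locale here)."""
    locale = LOCALE_BY_COUNTRY["us" if cross_locale else proxy_country]
    return {"q": query, "num": 20, "pws": 0, **locale}
```

Passing the result straight into your HTTP client's `params` argument keeps the locale decision in one place.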
Session Stickiness: The SERP-Specific Wrinkle
For generic scraping, everyone tells you to rotate IPs aggressively. For Google SERPs, that advice is half wrong. Google hands you an NID cookie on first contact and expects to see it come back with the same IP for at least a short window. Rotate the IP mid-session and you look like a hijacked cookie — instant challenge.
The practical rule: one IP per session, one session per keyword batch. A "session" here means a warm-up request to google.com, a consent acceptance if the IP is in the EEA, then 5–40 SERP fetches that reuse the same cookie jar. Then rotate, burn the cookies, and start over.
On ProxyPoland mobile proxies, you do this with the per-port sticky mode:
```python
import httpx

PROXY = "http://user:pass@api.proxypoland.com:10001"  # port 10001 = sticky

async def fetch_batch(keyword_batch):
    async with httpx.AsyncClient(
        proxy=PROXY,      # httpx >= 0.26 uses `proxy=`; older versions use `proxies=`
        http2=True,       # requires the httpx[http2] extra
        headers={
            "User-Agent": "Mozilla/5.0 (Linux; Android 14; SM-S918B) AppleWebKit/537.36",
            "Accept-Language": "en-US,en;q=0.9",
        },
        timeout=20.0,
        follow_redirects=True,
    ) as client:
        await client.get("https://www.google.com/")  # warm-up, collect NID
        for kw in keyword_batch:
            r = await client.get(
                "https://www.google.com/search",
                params={"q": kw, "hl": "en", "gl": "us", "num": 20, "pws": 0},
            )
            yield parse_serp(r.text)  # parse_serp: your own HTML parser
```
Rendering: When You Need a Headless Browser (and When You Don't)
The honest answer is: for classic ten-blue-link SERPs, raw HTTP is still fine in 2026 if your TLS and header stack are correct. Where a headless browser becomes non-optional:
- AI Overviews — injected via client-side JS after the initial response. Raw HTML has only a skeleton.
- People Also Ask expansion — each expansion is a separate XHR to `/async/ge_ansp`; parsing the initial HTML only gives you the four seed questions.
- Shopping / Products pack — carousels are hydrated from a lazy-loaded JSON blob.
- Discover / local pack reviews — same pattern.
For these, use Playwright with a mobile Chrome profile on a mobile proxy, and do the one critical thing half of SERP scrapers forget: do not rely on the `--disable-blink-features=AutomationControlled` flag alone; use patched undetected-chromedriver or Playwright with a stealth plugin. Google checks `navigator.webdriver`, Chromium's `Runtime.Enable` CDP chatter, and the presence of unusual CDP bindings.
Avoiding the /sorry/ reCAPTCHA
A /sorry/index?continue=... redirect is Google's soft-block. Hitting one does not mean your IP is dead — it means your current session needs to solve a reCAPTCHA v2 challenge before Google will serve SERPs again. Your options are:
- Cut the session: throw away cookies, rotate the mobile IP, warm up a new session. This is the correct default for non-interactive scrapers — solving reCAPTCHAs is expensive and slow.
- Solve it: 2Captcha / CapMonster can solve the v2 image challenge in ~15–40 s at $1–3 per 1,000 solves. Only worth it if you are paying per IP and burning one for every challenge is more expensive than solving.
- Back off and retry later: the rate limit on a specific IP/cookie pair usually decays within 15–60 minutes.
The single best predictor of /sorry/ frequency is query shape, not volume. Scraping 500 different long-tail keywords per hour from one mobile IP is fine. Scraping the same 5 high-commercial-intent head terms 100 times each is not — Google's query-entropy signal flags it in minutes.
Rate Limits That Actually Work on Google
Published numbers from other blogs (8–15 s between requests, 1 request per minute) are outdated and cargo-culted. In 2026 what matters is:
- Jitter, not fixed delay: 3–9 s uniformly randomized, with 10% chance of a 20–40 s "user reading" pause.
- Daily cap per IP: budget ~600–1,500 SERP fetches per mobile IP per day. Above that, soft blocks spike.
- Session cap: 20–50 SERPs per cookie jar, then rotate.
- No parallel fetches on the same IP: one request at a time per proxy port. Parallelism goes across ports, not within one.
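The jitter and caps above can be sketched as a small pacing helper. The numbers mirror the list; the class and function names are hypothetical.

```python
import random

def humanized_delay() -> float:
    """Seconds to sleep before the next SERP fetch: 3-9 s uniform jitter,
    with a 10% chance of a longer 20-40 s 'user reading' pause."""
    if random.random() < 0.10:
        return random.uniform(20.0, 40.0)
    return random.uniform(3.0, 9.0)

class SessionBudget:
    """Tracks the per-session (20-50 SERPs) and per-day (~600-1,500) caps."""

    def __init__(self, per_session: int = 35, per_day: int = 1000):
        self.per_session = per_session
        self.per_day = per_day
        self.session_count = 0
        self.day_count = 0

    def record_fetch(self) -> None:
        self.session_count += 1
        self.day_count += 1

    def should_rotate(self) -> bool:
        return self.session_count >= self.per_session

    def ip_exhausted(self) -> bool:
        return self.day_count >= self.per_day

    def rotate(self) -> None:
        self.session_count = 0  # day_count persists until the IP is retired
```

One worker per proxy port calls `humanized_delay()` between fetches and checks `should_rotate()` after each one, which enforces the no-parallelism rule by construction.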
Parsing: Don't Rely on Class Names
Google renames CSS classes roughly every 4–8 weeks. A selector like `div.yuRUbf > a` will break before your quarter ends. Anchor on stable attributes instead:
- Organic results: `div[data-hveid][data-ved] h3` and the nearest ancestor `<a href>` whose `href` does not start with `/search?`.
- Sitelinks: child `<a>` tags inside the same `data-hveid` block.
- Snippets: `div[data-sncf="1"]` or the first `<span>` after the `cite` tag.
- AI Overviews: block with `data-subtree="aio"` — stable since mid-2024.
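A sketch of the organic-result rule with BeautifulSoup, assuming the attribute anchors above still hold; the function name is hypothetical.

```python
from bs4 import BeautifulSoup

def parse_organic(html: str) -> list[dict]:
    """Pull organic results anchored on data-* attributes, not class names."""
    soup = BeautifulSoup(html, "html.parser")
    out = []
    for h3 in soup.select("div[data-hveid][data-ved] h3"):
        a = h3.find_parent("a")  # the title h3 normally sits inside the link
        href = a.get("href") if a else None
        if not href or href.startswith("/search?"):
            continue             # skip internal navigation links
        out.append({"title": h3.get_text(strip=True), "url": href})
    return out
```

Because nothing here depends on a generated class name, this survives Google's routine class renames; only a structural redesign breaks it.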
If you can, offload this to an API that maintains selectors for you (SerpApi, Oxylabs SERP API, Bright Data SERP). At ProxyPoland we provide the transport layer — the Polish mobile IPs that do not get blocked — but the parsing game is a full-time job and outsourcing it is rational.
Mobile Proxies vs Residential for SERP
Residential proxies (rotating pools sourced from consumer devices) are the default recommendation in most guides. For Google SERPs specifically, real mobile IPs have two structural advantages:
- CGNAT tolerance: Google knows mobile carriers NAT thousands of real users behind the same IP. A high request rate from a carrier IP is less suspicious than the same rate from a residential IP, because it is statistically normal.
- Stable session windows: consumer residential proxies churn the IP whenever the peer disconnects. That breaks the sticky-session pattern SERP scrapers need. A dedicated mobile port holds the IP as long as you want.
The downside: mobile IPs are slower (50–150 ms added latency) and more expensive per IP. For rank tracking at a few thousand queries per day, the math tips toward mobile. For "scrape all of SERP for every keyword on Earth", you need a large residential pool with a specialized SERP API layer on top.
A Working Stack, End to End
- Transport: ProxyPoland mobile ports in sticky mode, one port per concurrent worker.
- HTTP client: `curl_cffi` (Python) or `undici` with a patched JA4 fingerprint (Node). Plain `requests` and `axios` will be flagged.
- Headers: real mobile Chrome 120+ UA, matching `sec-ch-ua` hints, `Accept-Language` aligned with `hl`.
- Cookies: accept `NID`, `CONSENT`, `SOCS`; persist across the batch, drop on rotation.
- Renderer: Playwright with stealth, mobile emulation, only for AI Overviews / PAA expansion.
- Parser: attribute-based selectors, fuzzy fallbacks, daily regression fixtures.
- Rotation: on session end, on a /sorry/ trigger, or after 20–50 SERPs, whichever comes first.
What Not to Do
- Do not share cookie jars across IPs. One of the fastest ways to get every proxy you own soft-blocked.
- Do not scrape `suggestqueries.google.com` as a substitute — it has a much stricter rate limit than SERPs.
- Do not ignore the consent wall on EU IPs. Accept once per session, store the `SOCS` cookie, move on.
- Do not scrape signed-in Google (`accounts.google.com` cookies). You will lose accounts and proxies simultaneously.
- Do not set a custom `Cookie` header manually — Google compares it against expected JS-set values, and the mismatch is fingerprinted.
Bottom Line
Scraping Google SERPs in 2026 is a session-discipline problem, not a raw-IP problem. The IP has to be clean (mobile or good residential), but the reason most scrapers die is sloppy session management, mismatched hl/gl, header/JA4 drift, and naïve rotation timing. Get those four right and a single Polish mobile IP on ProxyPoland will quietly pull 1,000+ SERPs a day indefinitely. Get them wrong and it does not matter how many proxies you buy.
If you are starting a SERP scraper from scratch, begin with one sticky port, a curl_cffi client, and a 20-SERP session budget. Grow from there only after you see zero /sorry/ redirects over a full week. That restraint is the feature, not the bug.
