HTTP Status Codes for SEO: 2026 Complete Technical Guide

Master HTTP status codes for SEO in 2026: 2xx, 3xx, 4xx, 5xx, crawl budget, indexing effects, soft 404s, 429 rate limiting, log-file diagnostics, and migration QA.

Every HTTP status code your server returns sends a signal to Googlebot—affecting whether a URL gets crawled, indexed, or removed from search results. In 2026, with clarified crawl limits (2 MB HTML), separate per-hostname crawl budgets, and rising AI crawler traffic (4.2% of all HTML requests per Cloudflare network data, 2025 – Log Analysis Guide), understanding these signals has never been more critical. This guide covers every code category (2xx–5xx), soft 404s, 429 rate limiting, 304 caching, and provides original decision trees, checklists, and log-file diagnostic workflows—all backed by the latest Google documentation and industry research.


2xx Success: 200 OK and 206 Partial Content

200 OK – The Baseline

A 200 OK tells Googlebot the page is accessible and valid. Pages returning 200 are queued for rendering unless a noindex directive is present (Log Analysis Guide). Googlebot increases crawl frequency for pages that change often—news, prices, inventory—but crawl demand is driven by external links and user traffic (CaptainDNS Crawl Budget Guide).

Key 2026 update: Googlebot’s HTML crawl limit is 2 MB (Spotibo, citing Google Feb 3, 2026 documentation). Files exceeding 2 MB are truncated silently—no Search Console warning. The median HTML page on mobile is ~33 KB; the 90th percentile is ~151 KB (Web Almanac 2025). Keep HTML payloads under 2 MB, ideally under 100 KB, to avoid incomplete indexing.
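The size thresholds above can be checked during a build or crawl with a few lines of Python. This is a minimal sketch: the function name is hypothetical, and it assumes the 2 MB figure means 2 × 1024 × 1024 bytes of uncompressed HTML, which the source does not spell out.

```python
# Assumption: "2 MB" is read as 2 * 1024 * 1024 bytes of uncompressed HTML.
HTML_CRAWL_LIMIT_BYTES = 2 * 1024 * 1024
# The "ideally under 100 KB" target from the text.
IDEAL_HTML_BYTES = 100 * 1024


def classify_html_size(body: bytes) -> str:
    """Classify an HTML payload against the crawl limit and the ideal size."""
    size = len(body)
    if size > HTML_CRAWL_LIMIT_BYTES:
        return "truncation-risk"  # content past the limit may be ignored
    if size > IDEAL_HTML_BYTES:
        return "heavy"  # fully crawlable, but above the ideal target
    return "ok"
```

Running this over rendered templates in CI is one way to notice a page drifting toward the limit, since no Search Console warning is reported for truncation.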

Response time target: TTFB under 200 ms allows Google to increase the crawl rate limit (CaptainDNS). Slower responses reduce crawl frequency.

Soft 404 risk: Returning a 200 on a page with no valuable content creates a soft 404. Studies show 7.35% of web servers send 200 for unknown documents (Source 3, 2014 study; similar prevalence today). Detect soft 404s in logs by spotting multiple 200 responses with identical small byte counts (~2,450 bytes is a common signature) (Log Analysis Guide). Automated detection tools like Soft404Detector achieve precision 0.992, recall 0.980 using content-based heuristics (Source 3).
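The byte-count heuristic can be sketched as a grouping step over parsed log records. The function name, the 5,000-byte ceiling, and the minimum of three distinct URLs are illustrative assumptions, not values from the cited sources; tune them to your templates.

```python
def soft_404_candidates(records, max_bytes=5000, min_urls=3):
    """Group 200 responses by identical small byte counts.

    records: iterable of (url, status, bytes_sent) tuples.
    Returns {byte_count: sorted list of URLs} for every byte count shared
    by at least `min_urls` distinct URLs.
    """
    urls_by_size = {}
    for url, status, bytes_sent in records:
        if status == 200 and bytes_sent <= max_bytes:
            urls_by_size.setdefault(bytes_sent, set()).add(url)
    return {
        size: sorted(urls)
        for size, urls in urls_by_size.items()
        if len(urls) >= min_urls
    }
```

A match is only a candidate: many distinct URLs returning exactly the same small payload often means one shared "not found" template, but each group still needs a manual look.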

206 Partial Content

A 206 response serves byte-range requests for large files (video, PDF). Googlebot can resume interrupted downloads or fetch chunks of resources >2 MB (e.g., PDFs up to 64 MB). Proper 206 implementation improves crawl efficiency for large resources. It has no direct SEO impact but supports indexing of large files.


3xx Redirects: 301, 302, 307, 308

Link Equity and 2026 Understanding

Google has confirmed that 3xx redirects no longer inherently dilute PageRank (Rankability, citing Search Engine Land). However, redirect chains dilute equity indirectly via latency and risk of broken hops. A 302 redirect passes PageRank just like a 301 (since 2016), but a persistent 302 may confuse canonical selection. After ~12 months, Google may treat a long-standing 302 as a 301, but don’t rely on that (Digital Applied, 2026).

  • 301 (Permanent): Old URL removed from index, new URL indexed (DBETA).
  • 302 (Temporary): Both old and new URL may remain indexed temporarily (SE Ranking blog).
  • 307 (Temporary, must preserve method): Similar to 302.
  • 308 (Permanent, preserve method): Treated like 301 for ranking. Rarely needed for HTML pages.

Chain Length and Crawl Budget Waste

Google can follow up to 10 hops but recommends under 3 hops, max 5 (redirect.pizza, Rankability). John Mueller suggests under 5 hops (Rankability citing Search Engine Journal). Each hop adds 50–300 ms latency (redirect.pizza; Lighthouse benchmarks). A 3-hop chain at 200 ms/hop adds 600 ms to TTFB, hurting Core Web Vitals.

Every redirect consumes a crawl request. On large sites, 30–50% of crawl budget can be wasted on non-essential pages (Log Analysis Guide). Best practice: Redirect directly A→C, never A→B→C (Wadi Digital blog).
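The A→C rule can be applied to an existing redirect map before deployment. The sketch below assumes the map is a plain dictionary of source path to target path; the function name is hypothetical, and the default hop limit of 10 mirrors the figure quoted above.

```python
def flatten_redirects(redirects: dict[str, str], max_hops: int = 10) -> dict[str, str]:
    """Point every source directly at its final destination.

    Raises ValueError if a chain loops or exceeds `max_hops`.
    """
    flat = {}
    for source in redirects:
        seen = {source}
        target = redirects[source]
        hops = 1
        # Follow the chain until the target is no longer itself redirected.
        while target in redirects:
            if target in seen or hops >= max_hops:
                raise ValueError(f"redirect loop or overlong chain starting at {source}")
            seen.add(target)
            target = redirects[target]
            hops += 1
        flat[source] = target
    return flat
```

For example, `flatten_redirects({"/a": "/b", "/b": "/c"})` maps both `/a` and `/b` straight to `/c`, so neither crawlers nor users pay for the intermediate hop.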

AI Search and Redirect Occurrence

Only 0.79% of ChatGPT-cited URLs contain redirects, vs 5.75% in Google organic and 5.45% in AI Mode (Source 10). Users clicking ChatGPT links are 3–7 times more likely to land on the intended page directly. For AI crawlers, keep redirects minimal.

Common Redirect Pitfalls

  • CMS plugins enforcing 302: Many plugins (Redirection, Yoast Premium) default to 302. Always verify status code in logs (redirect.pizza).
  • WordPress Address/Site Address mismatch: Creates infinite redirect loops when SSL plugin fights CDN (redirect.pizza).
  • Conflicting www/non-www rules: Most common cause of redirect loops (redirect.pizza).

4xx Client Errors: 403, 404, 410, 451

403 Forbidden

Googlebot stops crawling and may retry later (Log Analysis Guide). If a page exists but returns 403, it may be treated as a soft 404. Use logs to differentiate.

404 Not Found – Crawl Budget and De-indexing Speed

Every 404 consumes a crawl request with zero indexing value (CaptainDNS). In log analysis, 2% of crawler requests typically return 404 (Log Analysis Guide). Spikes indicate broken internal links.

404 vs 410 for removal speed: A 410 (Gone) is processed faster by Google for de-indexing—days vs weeks (DBETA, Digital Applied, Log Analysis Guide). Use 410 for intentionally removed pages with no suitable redirect destination. Anti-pattern: Redirecting every deleted URL to homepage creates soft 404–like confusion. Best practice: 410 + a context-relevant link in response body (Wadi Digital blog).

410 Gone – Explicit Intent

A 410 is removed from the index faster than a 404. Use it for discontinued products, deleted accounts, and legal takedowns. Do not redirect these URLs to an unrelated page.

Soft 404s – Detection and Remediation (2026 Focus)

A soft 404 is a page that returns 200 OK but carries effectively empty or error-like content (e.g., an empty database result or a failed template) (DBETA, 2026). Prevalence: 7.35% of servers (Source 3). Detection via logs: multiple 200 responses with identical small byte counts. Google Search Console has reported soft 404s since 2010, originally under Crawl Errors (Google Search Central blog, 2010); in the current interface they appear in the Page Indexing report. A proposed automated detector (Soft404Detector) achieved precision 0.992 and recall 0.980 (Source 3).

2026 refinement: Clearer boundary: if a page should remain accessible but not indexed, use noindex (meta tag or header), not robots.txt Disallow (Google does not support noindex in robots.txt) (DBETA, 2026).

451 Unavailable for Legal Reasons

Indicates content blocked due to legal restrictions. Not relevant for typical commercial sites.


429 Too Many Requests – Rate Limiting

Causes and Googlebot Behavior

Server returns 429 when it receives too many requests in a given time period (Source 4, Source 6, Source 7). Common triggers: SEO crawlers (Screaming Frog concurrency on Shopify), search engine crawlers on overloaded sites, scrapers, internal scripts. User report: 19% of URLs returned 429 from Screaming Frog on Shopify (Source 4).

Include a Retry-After header with every 429. RFC 6585 permits the header (the response MAY include it) rather than requiring it, but it gives crawlers an explicit wait time (Source 6, Discourse post on Retry-After header). Cloudflare rate limiting default status is 429 (configurable) (Cloudflare WAF docs). Important: Cloudflare passes the origin’s Retry-After to the client but does not enforce it server-side; if a visitor retries early, Cloudflare still forwards the request (Cloudflare Community – hardekern).

Googlebot behavior: 429 causes Google to reduce crawl rate; persistent 429s create crawl gaps and may reduce the crawl capacity limit (Google Search Central docs). Do not misuse 403 or 404 for rate limiting—use 429 (or 503/500) (DBETA, 2026).
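On the server side, pairing 429 with Retry-After can be sketched as a small fixed-window limiter. This is an illustration under stated assumptions, not any vendor's implementation: the class name, the in-memory counter, and the fixed-window strategy are all hypothetical choices.

```python
import math
import time


class FixedWindowLimiter:
    """Allow `limit` requests per client in each `window_seconds` window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self.counters = {}  # client_id -> (window_start, request_count)

    def check(self, client_id):
        """Return (status_code, headers) for the next request from client_id."""
        now = self.clock()
        start, count = self.counters.get(client_id, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # previous window expired, start a new one
        if count >= self.limit:
            # Tell the client how many whole seconds remain in this window.
            retry_after = max(1, math.ceil(self.window - (now - start)))
            return 429, {"Retry-After": str(retry_after)}
        self.counters[client_id] = (start, count + 1)
        return 200, {}
```

The Retry-After value here is the time left in the current window, so a well-behaved crawler that waits that long should succeed on its next attempt.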

Client-Side Best Practices

  • Exponential backoff with jitter: initial 1.0s, max 60.0s, multiplier 2.0, timeout 300.0s (Source 6).
  • Batch API calls (Cloud Storage docs).
  • Whitelist trusted bots (Googlebot, Bingbot) via user-agent + reverse DNS verification (Hostwinds blog).
  • Block bad bots by IP, ASN, WAF (Hostwinds blog).
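The backoff parameters in the first bullet (initial 1.0 s, max 60.0 s, multiplier 2.0, timeout 300.0 s) can be sketched as a delay generator. The source does not say which jitter strategy to use; this sketch assumes "full jitter" (a uniform random delay between zero and the current ceiling), and the function names are hypothetical.

```python
import random
import time


def backoff_delays(initial=1.0, maximum=60.0, multiplier=2.0, timeout=300.0,
                   rng=random.random):
    """Yield sleep durations until the cumulative wait would exceed `timeout`."""
    ceiling = initial
    waited = 0.0
    while True:
        delay = rng() * ceiling  # full jitter: uniform in [0, ceiling)
        if waited + delay > timeout:
            return
        waited += delay
        yield delay
        ceiling = min(ceiling * multiplier, maximum)


def call_with_backoff(send, sleep=time.sleep, **backoff_kwargs):
    """Call `send()` (which returns a status code), retrying while it is 429."""
    status = send()
    for delay in backoff_delays(**backoff_kwargs):
        if status != 429:
            break
        sleep(delay)
        status = send()
    return status
```

Jitter spreads retries from many clients over time so they do not all hit the server at the same instant after a rate-limit window ends. If the server sends Retry-After, honoring that value takes priority over the computed delay.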

Key Data Points

  • Cloudflare API per user/account token: 1,200 per 5 minutes; per IP: 200/second (Cloudflare rate limiting docs).
  • Google Cloud Storage retries 429 by default with exponential backoff (Google Cloud Storage retry docs).
  • Gemini Enterprise Agent Platform: two quota frameworks – pay-as-you-go and Provisioned Throughput (Source 15).

304 Not Modified – Caching and Crawl Budget Efficiency

Googlebot sends a conditional GET with If-Modified-Since (validated against Last-Modified) or If-None-Match (validated against ETag). If the content is unchanged, the server returns 304 with no body (Log Analysis Guide). This saves crawl budget because no content is transferred, allowing Google to recrawl more URLs within the same budget (Google Search Central docs). In log analysis, 4% of crawler requests are typically 304 (Log Analysis Guide). A high 304 rate for stable resources is a positive signal.

Misconfiguration risks: If Last-Modified or ETag headers are missing, Googlebot fetches full 200 every time, wasting budget. Over-caching: sending 304 too aggressively can suppress recrawls even when content has changed—Google may delay detecting updates. 304 is NOT a redirect (DBETA, 2026)—it does not inform search engines of URL moves.
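The conditional GET exchange can be sketched with a content-derived ETag, which also guards against the over-caching risk because the tag changes whenever the body does. This is a simplified illustration: the function names are hypothetical, headers are a plain dictionary, and only an exact If-None-Match match is handled (no weak validators, no comma-separated lists).

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a strong ETag from the response body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_response(body: bytes, request_headers: dict) -> tuple[int, dict, bytes]:
    """Return (status, headers, body), answering 304 when the ETag still matches."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if request_headers.get("If-None-Match") == etag:
        return 304, headers, b""  # unchanged: send no body
    return 200, headers, body
```

Because the tag is a hash of the current body, a changed page can never be answered with a stale 304.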

Stale-while-revalidate and serve-stale CDN policies are recommended during outages to avoid leaking 5xx to Googlebot (Digital Applied, 2026).


5xx Server Errors: 500, 502, 503, 504

5xx errors trigger exponential backoff by Googlebot; sustained errors reduce crawl frequency and may drop URLs from the index (Log Analysis Guide, AIOSEO). Retryable codes (500, 502, 503, 504) are treated as transient.

503 with Retry-After: Google respects the header and slows down. If persistent, crawl rate drops (AIOSEO). Use 503 intentionally for maintenance; keep windows under 24 hours. Sustained 503 for multiple days signals site abandonment and can cause de-indexing.

Impact on crawl budget: Repeated 5xx reduces the crawl capacity limit (Google Search Central docs). In logs, 1% of crawler requests typically return 500 (Log Analysis Guide)—investigate immediately.

Validation process: Use URL Inspection in GSC → Request Indexing after fix; then Validate Fix in Page Indexing report. Validation takes up to 2 weeks (AIOSEO). Hostload exceeded error in URL Inspection is a critical symptom of insufficient server capacity (LinkGraph, 2026).


Crawl Budget – Mechanics and Optimization (2026 Update)

Crawl budget = min(crawl rate limit, crawl demand) (CaptainDNS). The crawl rate limit adjusts based on server health (TTFB <200 ms ideal). Crawl demand is driven by URL popularity, freshness, and sitemap presence.

2026 key update: Crawl budget is defined per hostname (e.g., example.com and shop.example.com each have separate budgets) (LinkGraph, January 2026). A new render budget concept was also introduced for JavaScript-heavy sites (LinkGraph, 2026).

Crawl budget is relevant mostly for sites >10,000 pages (CaptainDNS, citing Google). For smaller sites, pages are typically crawled same day.

Factors Affecting Crawl Capacity

  • Crawl health: Fast responses increase limit; errors decrease it.
  • Server performance: Slow TTFB, 5xx errors, high response times reduce capacity.
  • DNS impact: Slow DNS resolution adds latency before every request. Over 1,000 pages, 200 ms per resolution wastes 200 seconds (1,000 × 0.2 s). Googlebot interprets slow DNS as overload and reduces crawl rate. Target DNS resolution <50 ms, TTL 3,600–86,400 seconds (CaptainDNS).

Factors Affecting Crawl Demand

  • Perceived inventory: duplicates, blocked pages, redirects reduce efficiency.
  • Popularity, staleness, content freshness.

Crawl Budget Waste Sources

  • Redirect chains, 4xx/5xx errors, soft 404s.
  • noindex does NOT prevent crawling—use robots.txt Disallow or 410 to stop crawling (Log Analysis Guide).
  • Crawl traps: infinite calendars, facet combinations, session IDs in URLs (CaptainDNS).
  • Orphan pages with bot traffic (no internal links).
  • Tracking parameter pollution: Block ?utm_, ?fbclid, etc., in robots.txt rather than redirecting.

Best Practices to Maximize Efficiency

  • Consolidate duplicate content (Google Search Central docs).
  • Block unimportant URLs via robots.txt (not noindex) (Google Search Central docs).
  • Return 404/410 for removed pages.
  • Eliminate soft 404 errors.
  • Keep sitemaps up to date with <lastmod>.
  • Avoid long redirect chains.
  • Keep HTML efficient (<100 KB ideal).
  • Monitor availability issues.
  • Use clean URL structure.

Case Study (2026)

An e-commerce site reduced crawl waste by 73% (from 45% to 12%) after blocking filter URLs, cleaning sitemaps, and optimizing server response. New product indexing time dropped from 21 days to 4 days (LinkGraph, 2026).

GSC Crawl Stats Benchmarks

  • Average response time: <200 ms ideal; 200–500 ms good; 500–1000 ms problematic; >1000 ms guaranteed reduced crawl budget (Amit Tiwari, YouTube: "Crawl Budget Complete Guide").
  • Correlation: When response time decreases, requests increase, and vice versa.

Indexing Effects of HTTP Status Codes

| Status Code | Indexing Behavior |
|---|---|
| 200 OK | Normal indexing path; subject to quality signals, noindex, canonical. |
| 301/308 | Old URL removed from index; new URL indexed (DBETA). |
| 302/307 | Both old and new URL may be indexed temporarily (SE Ranking blog). |
| 404 | Google drops URL over time; may retry periodically. |
| 410 | Faster removal than 404 (explicit permanent deletion). |
| 403/429/5xx | Google may pause indexing; repeated errors can lead to de-indexing. |
| 304 | No impact on indexing; just caching behavior. Does not trigger recrawl. |

JavaScript and non-200 status: As of December 2025, pages with non-200 status may not be sent to rendering (Google JavaScript SEO basics). This means JavaScript content on error pages is not executed—preventing potential rendering-based soft 404 detection.

AI crawlers (2026): Separate from Googlebot; different indexing goals. Only 14% of top 10,000 domains had AI-specific robots.txt rules as of mid-2025 (Log Analysis Guide). Verify bots by IP reverse DNS, not just user-agent (Log Analysis Guide). GPTBot growth +305% YoY; PerplexityBot growth +157,490% from near zero (Log Analysis Guide).


Migration QA – 301 vs 302 vs 410 Decision Tree

Permanent vs Temporary Decision

Is the URL permanently moved?
├── Yes → Use 301 or 308
│        ├── Update internal links, canonicals, sitemaps to new URL.
│        └── Avoid homepage-only redirects (treated as soft 404).
├── No (A/B test, temporary promotion) → Use 302 or 307
└── Content intentionally removed?
     ├── No suitable redirect destination → Use 410 (faster de-indexing)
     └── Redirect to relevant context? → Use 301 to closest relevant page.
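The decision tree above can be encoded as a small function so a migration script applies it consistently across a URL map. The enum, the function name, and the `preserve_method` flag (which selects 308/307 over 301/302) are hypothetical names introduced for this sketch.

```python
from enum import Enum


class Change(Enum):
    MOVED_PERMANENTLY = "moved_permanently"
    MOVED_TEMPORARILY = "moved_temporarily"  # A/B test, temporary promotion
    REMOVED = "removed"


def status_for(change: Change, has_relevant_target: bool = False,
               preserve_method: bool = False) -> int:
    """Pick the status code that the decision tree recommends."""
    if change is Change.MOVED_PERMANENTLY:
        return 308 if preserve_method else 301
    if change is Change.MOVED_TEMPORARILY:
        return 307 if preserve_method else 302
    # Content intentionally removed: redirect only to a genuinely relevant
    # page, otherwise signal permanent removal with 410.
    return 301 if has_relevant_target else 410
```

For example, `status_for(Change.REMOVED)` returns 410, while `status_for(Change.REMOVED, has_relevant_target=True)` returns 301.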

Pre-Migration Checklist (condensed)

  1. Capture baseline: 12 months organic traffic, top keywords, indexed count, positions, Core Web Vitals (Siteimprove migration guide).
  2. Full site crawl (Screaming Frog/Sitebulb) – catalog every URL and status code.
  3. Build URL map: 1:1 mapping for most URLs; avoid homepage-only redirects (Wadi Digital blog).
  4. Update internal links at source code level – don’t rely on redirects.
  5. Audit hreflang, structured data.
  6. Set up new GSC property.
  7. Pre-launch QA crawl on staging – check canonical tags, no noindex, robots.txt not blocking important paths.
  8. Prioritize high-value URLs: pages with organic entrances, conversions, revenue (Siteimprove migration guide).

Cutover and Post-Migration Monitoring

  • Deploy 301 (or 308) redirects.
  • Remove staging protection.
  • Update DNS (lower TTL 24–48h before).
  • Submit sitemap to GSC.
  • Use Change of Address tool (domain moves only).
  • Test sample redirects with curl -I -L (redirect.pizza).
  • First 72 hours: largest swings in indexed count and crawl rate.
  • Daily checks for 2 weeks: Coverage report spikes.
  • Expect 10–25% traffic drop first 1–4 weeks; >30% drop after week 4 needs investigation (Ahrefs migration study).
  • Compare indexed page count within 5–10% of baseline.
  • Monitor logs: if Googlebot still hitting old URLs after 30 days, update internal links.
  • Maintain redirects for at least 12 months (Wadi Digital blog).
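The numeric thresholds in this checklist (a traffic drop above 30% after week 4, indexed count within 10% of baseline) can be turned into an automated check. This is a rough sketch: the function name and message wording are hypothetical, and it assumes baseline and current figures are measured over comparable periods.

```python
def migration_warnings(baseline_traffic, current_traffic,
                       baseline_indexed, current_indexed, weeks_since_launch):
    """Return a list of warning strings based on the checklist thresholds."""
    warnings = []
    traffic_drop = (baseline_traffic - current_traffic) / baseline_traffic
    if weeks_since_launch > 4 and traffic_drop > 0.30:
        warnings.append("traffic drop above 30% after week 4: investigate")
    elif weeks_since_launch <= 4 and traffic_drop > 0.25:
        warnings.append("traffic drop exceeds the expected 10-25% range")
    # Indexed page count should stay within 10% of the baseline.
    index_gap = abs(current_indexed - baseline_indexed) / baseline_indexed
    if index_gap > 0.10:
        warnings.append("indexed page count differs from baseline by more than 10%")
    return warnings
```

An empty list means the migration is inside the expected ranges; it does not replace the daily Coverage report checks.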

Common Migration Mistakes

  • Redirecting all old URLs to homepage (soft 404s).
  • Allowing redirect chains.
  • Leaving canonical tags pointing to staging URLs.
  • Skipping internal link update (forces redirect chain for navigation).
  • Cutting redirects after 30–60 days (keep 12 months minimum).

Log-File Diagnostics for Status Code Analysis

Logs are the ground truth for crawl diagnostics: exact timestamps, status codes, response times, bytes transferred for every Googlebot request (Log Analysis Guide). Search Console is sampled and delayed.

Key Diagnostic Patterns

  • Status code distribution: Healthy example: 87% 200, 6% 301, 4% 304, 2% 404, 1% 500 (Log Analysis Guide). Deviation warrants investigation.
  • Redirect chains: URLs appearing multiple times with 301/302 indicate internal links not updated.
  • Soft 404s: Multiple 200 responses with identical small byte counts.
  • Crawl budget waste: Parameter variants receiving more crawl than canonical product pages.
  • 5xx only to bots: Check if 5xx responses correlate with Googlebot IPs.
  • Low 304 rate: Suggests missing caching headers.
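The status code distribution in the first bullet can be computed straight from access logs. The sketch below assumes the common/combined log format used by Apache and nginx defaults; the regular expression and function name are illustrative and will need adjusting for custom log formats.

```python
import re
from collections import Counter

# Matches the request line, status, and byte count in common/combined logs,
# e.g.: "GET /page HTTP/1.1" 200 5120
LOG_PATTERN = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def status_distribution(lines):
    """Return {status_code: percentage of matched lines}, rounded to 0.1."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match:
            counts[match.group("status")] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {status: round(100 * n / total, 1) for status, n in counts.items()}
```

Filter the input to verified crawler requests first; comparing the result with the healthy example above (87% 200, 6% 301, 4% 304, 2% 404, 1% 500) then shows where to dig.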

Bot Identification (2026 Criticality)

Verify Googlebot in two steps: a reverse DNS lookup on the IP (host [IP]) must return a hostname ending in googlebot.com, google.com, or googleusercontent.com, and a forward lookup on that hostname must resolve back to the same IP (Log Analysis Guide). For non-Google bots, use IP JSON files (e.g., openai.com/gptbot.json). Unverified “Googlebot” requests can produce unreliable conclusions.
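The reverse-then-forward DNS check can be sketched with Python's standard socket module. The function name is hypothetical, the resolver functions are injectable so the logic can be tested offline, and `socket.gethostbyname_ex` only covers IPv4, so IPv6 crawler addresses would need a different forward lookup.

```python
import socket

# Hostname suffixes named in the text above.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com", ".googleusercontent.com")


def is_verified_googlebot(ip, reverse=socket.gethostbyaddr,
                          forward=socket.gethostbyname_ex):
    """Return True only if reverse and forward DNS both confirm the IP."""
    try:
        hostname = reverse(ip)[0]  # step 1: reverse lookup on the IP
        if not hostname.endswith(GOOGLE_SUFFIXES):
            return False
        # step 2: the hostname must resolve back to the same IP
        return ip in forward(hostname)[2]
    except OSError:  # covers socket.herror and socket.gaierror
        return False
```

DNS lookups are slow, so in practice the result is usually cached per IP before a full log file is filtered.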


Common CMS and CDN Pitfalls (2026)

WordPress

  • Plugin-enforced 302s: Redirection, Yoast Premium default to 302 – always verify (redirect.pizza).
  • Settings conflict: WordPress Address/Site Address mismatch creates loops (redirect.pizza).
  • SSL plugin vs CDN: Double SSL termination can cause chains (redirect.pizza).
  • Canonical tag issues: Staging environment canonicals surviving launch.

Shopify

  • Aggressive bot throttling: Default Screaming Frog concurrency triggers 429. Lower threads to 1–2 and add delay (Source 4, Hostwinds blog).

Cloudflare

  • Default 404 returning 200: Custom 404 pages may return 200 if not configured to pass through correct status – common soft 404 source.
  • Rate limiting as 503: Some rules return 503 instead of 429; Googlebot treats 503 as transient, may retry more aggressively.
  • Origin Retry-After not enforced: Cloudflare does not block early retries (Cloudflare Community).
  • Cookie-based redirect loops: CDN plus origin both handling www/non-www/HTTPS.

General CDN Issues

  • Stale-while-revalidate: Recommended during outages to avoid 5xx to Googlebot (Digital Applied, 2026).
  • Geo-IP redirect anti-pattern: Never use 302 based on IP for canonical locale – use hreflang instead.

FAQ – Practitioner Questions

Q: Should I use 404 or 410 for deleted pages? A: Use 410 for intentionally removed content – faster de-indexing (DBETA, Digital Applied). Use 404 for temporary removals or when unsure. Never redirect to homepage if the page no longer serves a purpose.

Q: How does Googlebot handle 429? A: It reduces crawl rate (Google Search Central docs). Include a Retry-After header. Persistent 429s can shrink crawl budget and delay indexing.

Q: Does 302 pass link equity? A: Yes, since 2016 Google passes PageRank through 302 redirects (Digital Applied). But a persistent 302 may confuse canonical selection; don’t rely on it as a permanent solution.

Q: How many redirect hops are safe? A: Keep under 3 hops, maximum 5 (redirect.pizza, Rankability). Google can follow 10 but each hop adds latency and wastes crawl budget.

Q: What’s the biggest crawl budget waste in 2026? A: Redirect chains, soft 404s, and parameter pollution. On large sites, 30–50% of crawl budget can be wasted (Log Analysis Guide).

Q: How do I detect soft 404s? A: Use log analysis: look for multiple 200 responses with identical small byte counts. Also check the soft 404 reporting in Google Search Console (first introduced under Crawl Errors; Google Search Central blog, 2010). Automated tools using a C4.5 classifier achieve about 99% precision (Source 3).

Q: Does noindex stop crawling? A: No – noindex prevents indexing but Googlebot still crawls the page (Log Analysis Guide). Use robots.txt Disallow or 410 to stop crawling.

Q: Should I worry about the 2 MB HTML crawl limit? A: Yes – if your HTML body exceeds 2 MB, Googlebot truncates it silently (Spotibo test). Keep payload under 100 KB for optimal indexing.


Summary of Key Statistics and Benchmarks

| Metric | Value | Source |
|---|---|---|
| Googlebot HTML crawl limit (2026) | 2 MB | Spotibo, Google |
| PDF crawl limit | 64 MB | Spotibo |
| Median HTML page weight (mobile) | ~33 KB | Web Almanac 2025 |
| 90th percentile HTML weight | ~151 KB | Web Almanac 2025 |
| Google max redirect hops | 10 | redirect.pizza |
| Recommended max chain | 3 hops | redirect.pizza, Rankability |
| Latency per redirect hop | 50–300 ms | redirect.pizza |
| Ideal TTFB for crawl speed | <200 ms | CaptainDNS |
| DNS resolution target | <50 ms | CaptainDNS |
| Cloudflare API per user token limit | 1,200/5 min | Cloudflare docs |
| GPTBot YoY growth | +305% | Log Analysis Guide |
| PerplexityBot growth | +157,490% | Log Analysis Guide |
| AI bot % of all HTML requests (Cloudflare 2025 avg) | 4.2% | Log Analysis Guide |
| Sites with AI robots.txt rules (mid-2025) | 14% | Log Analysis Guide |
| Crawl budget waste on large sites (est.) | 30–50% | Log Analysis Guide |
| Soft 404 prevalence | 7.35% of servers | Source 3 |
| Migration traffic loss (typical) | 10–25% first 4 weeks | Ahrefs study |
| 410 de-indexing speed | Days vs weeks for 404 | DBETA, Digital Applied |
| 429 retry exponential backoff (recommended) | 1s initial, 60s max | Source 6 |

For deeper dives, see our related guides: Crawl Budget Optimization, 301 Redirect Best Practices, and Log File Analysis for SEO.

Originally published in the EcomExperts SEO library.
