Google Indexing in 2026: Fix & Keep Pages Indexed
Why Google skips pages in 2026, how to fix every GSC index-status reason, and a diagnostic workflow for crawled-not-indexed and discovered-not-indexed URLs.
Google processes an estimated 400 billion pages per day through its crawl pipeline — yet deliberately indexes only a fraction of the web (Google Search Central). Since 2022, Google has made selection more aggressive: its systems now weigh the "cost" of indexing against the likelihood a page will satisfy user queries. The result is that pages with thin content, poor signals, or structural problems are excluded — even when they are crawlable.
For site owners in 2026, the challenge is not getting Googlebot to visit a URL. It is passing the quality and relevance bar that determines whether a visited URL gets kept in the index.
Quick answer:
Google indexes pages selectively based on content quality, crawl budget allocation, and perceived value to searchers. The most common blockers are "Crawled — currently not indexed" (quality threshold not met) and "Discovered — currently not indexed" (crawl budget not allocated). Fix the first by improving content quality and removing thin pages. Fix the second by improving internal linking, site speed, and eliminating index bloat that wastes crawl budget.
How Google Indexing Works in 2026
Indexing is a three-stage pipeline. Each stage can silently drop a URL.
Stage 1: Discovery
Googlebot discovers URLs through:
- Sitemaps submitted in Google Search Console (Google Search Central)
- Internal links crawled from already-indexed pages
- External links from other indexed sites pointing to yours
- URL inspection requests submitted manually via Search Console or the Indexing API
Discovery adds a URL to the crawl queue. It does not guarantee a visit — Googlebot prioritises the queue based on crawl budget signals, and low-authority or slow sites may wait days or weeks before a queued URL is actually fetched.
Stage 2: Crawling
When Googlebot fetches a URL, it downloads the HTML (and renders JavaScript for pages requiring it). At this point Googlebot reads:
- robots.txt directives (pre-fetch, not during the fetch itself)
- <meta name="robots"> tags
- X-Robots-Tag HTTP headers
- Canonical tags (<link rel="canonical">)
- HTTP status codes
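A URL blocked by robots.txt is not fetched at all, so Google never sees its content. A page that carries noindex or returns a non-200 status code will not proceed to indexing, even if crawled. Soft 404s (pages that return 200 but look like error or empty pages) are dropped in the same way. See Google's robots.txt documentation for full rule syntax.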
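As a rough illustration of the pre-fetch robots.txt check, the sketch below uses Python's standard-library parser. The robots.txt rules and URLs are hypothetical, and the standard-library parser does simple prefix matching, so it may not agree with Googlebot on wildcard rules. Treat it as a first pass, not a replacement for the robots.txt report in Search Console.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /search
"""


def is_crawl_allowed(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for candidate in (
        "https://example.com/products/trail-shoe",
        "https://example.com/cart/checkout",
        "https://example.com/search?q=shoes",
    ):
        print(candidate, is_crawl_allowed(ROBOTS_TXT, candidate))
```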
Stage 3: Indexing Decision
After crawling and rendering, Google evaluates whether the page is worth storing. This is where most invisibility problems originate. Google's systems ask:
- Is this page unique? Duplicate or near-duplicate content is consolidated under the canonical URL.
- Does this page provide value beyond what is already indexed? Thin pages, doorway pages, and over-optimised pages targeting the same intent as existing content are frequently excluded.
- Is this page trustworthy? Signals including E-E-A-T, external links, and site-level authority all factor into the evaluation (Google Search Quality Evaluator Guidelines).
Google does not publish the exact threshold, but its guidance consistently points to content that "meaningfully serves users" (Google Search Central: Core updates).
For a deeper look at the crawling side of this pipeline, see Crawling.
Reading Google Search Console Index Coverage
Search Console → Indexing → Pages gives you the definitive view of how Google sees every URL on your site. Understanding each status is essential before diagnosing a problem.
Index Coverage Status Reasons
| Status | What It Means | Primary Cause | Priority |
|---|---|---|---|
| Indexed | URL is in the index and eligible to rank | — | Monitor |
| Crawled — currently not indexed | Googlebot visited the page but decided not to index it | Content quality, thin content, duplication | High |
| Discovered — currently not indexed | URL is known but Googlebot has not yet crawled it | Crawl budget exhausted, low internal link equity, slow site | High |
| Duplicate without user-selected canonical | Multiple URLs with same content; Google chose its own canonical | Missing or inconsistent canonical tags | Medium |
| Duplicate, submitted URL not selected as canonical | You submitted a URL in a sitemap but Google chose a different canonical | Canonical mismatch between sitemap and tags | Medium |
| Page with redirect | URL redirects to another URL | Expected for redirected pages | Low (expected) |
| Soft 404 | Page returns 200 but Google treats it as empty/unhelpful | No-results search pages, thin user-generated content, empty category pages | High |
| Not found (404) | Page returns a 404 status | Deleted content, broken URL | Low (expected if intentional) |
| Blocked by robots.txt | robots.txt prevents crawling | Inadvertent or outdated robots.txt rules | Critical if unintentional |
| Excluded by 'noindex' tag | <meta name="robots" content="noindex"> or X-Robots-Tag present | Intentional or misconfigured noindex | Critical if unintentional |
| Alternate page with proper canonical tag | Non-canonical version of a page; canonical points elsewhere | Correct canonical implementation | Low (expected) |
| URL is unknown to Google | URL never discovered | No internal links, not in sitemap, not externally linked | Medium |
Sources: Google Search Central: Index Coverage report, Google Search Central: Fix indexing problems
Crawled — Currently Not Indexed: The Most Common Blocker
"Crawled — currently not indexed" is the status site owners encounter most frequently after Google's 2022–2024 quality tightening. It means Googlebot visited the page and decided it was not worth keeping.
Why Google Declines to Index a Crawled Page
1. Thin or low-value content. Pages with very little substantive text (a rough rule of thumb is under ~300 words, though Google publishes no word-count threshold), pages that mostly repeat other pages on your site, or pages that list items without adding commentary or analysis are routinely excluded. Google's quality raters look for content that "demonstrates first-hand expertise and a depth of knowledge" (Google Search Quality Evaluator Guidelines).
2. Content that matches intent better on another URL. If a page targets a keyword already covered by a stronger page on your site or a competing site, Google may choose not to index the weaker candidate. This is different from duplication — the content may be unique but deemed redundant in intent.
3. Poor E-E-A-T signals. Pages on YMYL topics (health, finance, legal, safety) with no identified author, no credentials, and no external signals of authority are frequently excluded. The same pattern increasingly applies to comparative content after the December 2025 core update (Google Search Central: E-E-A-T).
4. Soft 404 behaviour. A page returning HTTP 200 but containing a message like "no results found" or "this product is out of stock" with no substantive content will be treated as a soft 404. Google's documentation explicitly identifies e-commerce out-of-stock pages, empty site-search result pages, and gateway pages with no real content as soft 404 examples (Google Search Central: Fix soft 404 errors).
5. Parasite or boilerplate content. Automatically generated text with minimal variation — common in faceted navigation, location pages, and programmatic SEO at scale — is frequently excluded when the variations are superficial.
How to Fix It
- Run URL Inspection on a sample of affected URLs — the "Coverage" section shows last crawl date and detected issues.
- Audit content depth. Does this page contain anything not already in the top 5 results for its target query? If no, expand substantively or consolidate into a stronger page via 301.
- Add E-E-A-T signals. Name the author, add credentials and a bio, link to external sources, show publication and last-updated dates.
- Fix soft 404s. Out-of-stock pages should return 404/410, redirect to a category page, or show genuinely useful alternatives.
- Submit for reindexing. Use URL Inspection → "Request indexing" for priority pages. Update the sitemap's <lastmod> and resubmit via GSC for bulk signals (Google Search Central: Sitemaps).
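To find soft 404 candidates before Google flags them, a simple heuristic can scan your own templates. The sketch below is an assumption-laden approximation, not Google's classifier: the phrase list and the 50-word threshold are hypothetical values chosen for illustration.

```python
import re

# Hypothetical phrases and threshold; tune them against your own templates.
EMPTY_PAGE_PHRASES = ("no results found", "out of stock", "page not found")
MIN_WORDS = 50


def visible_text(html: str) -> str:
    """Strip scripts, styles and tags, then collapse whitespace."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", text).strip()


def looks_like_soft_404(status: int, html: str) -> bool:
    """Flag 200 responses that show an 'empty page' phrase and little other text."""
    if status != 200:
        return False  # a real 404/410 is not a soft 404
    text = visible_text(html).lower()
    has_phrase = any(phrase in text for phrase in EMPTY_PAGE_PHRASES)
    return has_phrase and len(text.split()) < MIN_WORDS


if __name__ == "__main__":
    page = "<html><body><h1>Search</h1><p>No results found.</p></body></html>"
    print(looks_like_soft_404(200, page))
```

Pages flagged this way are candidates for a 404/410, a redirect, or richer content, as listed above.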
Discovered — Currently Not Indexed: Crawl Budget Problems
"Discovered — currently not indexed" means Google knows the URL exists but has not yet chosen to crawl it. For small sites (under ~1,000 indexed pages), this rarely indicates a crawl budget problem — the queue will eventually clear. For large sites, it signals that Googlebot is spending its crawl allocation elsewhere and deprioritising these URLs.
What Determines Crawl Budget
Google defines two factors (Google Search Central: Crawl Budget):
- Crawl capacity limit: How fast Googlebot can crawl without overloading your server — slow TTFB directly reduces the crawl rate.
- Crawl demand: How much Googlebot wants to crawl, based on PageRank, external links, and content freshness.
Sites with low link equity, stale content, and no traffic get fewer crawl slots — leaving large URL sets perpetually in the "discovered" queue.
Index Bloat: The Hidden Crawl Budget Killer
Index bloat occurs when Google is indexing (or attempting to index) thousands of low-value URLs that consume crawl budget without contributing ranking equity. Common sources:
- Faceted navigation URLs: /products?color=red&size=M&sort=price generates exponential URL combinations
- Duplicate parameter URLs: tracking parameters (?utm_source=email) creating duplicate pages
- Session ID URLs: user-session tokens appended to URLs
- Pagination beyond page 2–3: paginated archives with no unique content
- Tag and category pages with minimal content, especially WordPress taxonomies
- Low-quality programmatic pages: location, template, or data-driven pages with thin variation
A site with 500 good pages and 50,000 bloat URLs will have its crawl budget consumed by the bloat, leaving good pages undiscovered or crawled infrequently (Google Search Central: URL structure).
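To see which bloat sources dominate a crawl export, URLs can be bucketed by their query parameters. The following is a minimal sketch. The parameter names in each set are assumptions based on the examples above and should be replaced with the ones your platform actually emits. The second function shows why facets multiply so quickly.

```python
from collections import Counter
from math import prod
from urllib.parse import parse_qsl, urlsplit

# Hypothetical parameter names; replace with the ones your site uses.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid", "fbclid"}
SESSION_PARAMS = {"sessionid", "sid", "phpsessid"}
FACET_PARAMS = {"color", "size", "sort"}


def classify_url(url: str) -> str:
    """Assign a URL to one bloat bucket based on its query parameter names."""
    query = urlsplit(url).query
    keys = {key.lower() for key, _ in parse_qsl(query, keep_blank_values=True)}
    if keys & SESSION_PARAMS:
        return "session_id"
    if keys & TRACKING_PARAMS:
        return "tracking_parameter"
    if keys & FACET_PARAMS:
        return "faceted_navigation"
    if "page" in keys:
        return "pagination"
    return "clean"


def facet_url_count(values_per_facet: list[int]) -> int:
    """Count URLs with at least one facet set, where each facet is unset or has one value."""
    return prod(n + 1 for n in values_per_facet) - 1


if __name__ == "__main__":
    urls = [
        "https://example.com/products?color=red&size=M&sort=price",
        "https://example.com/products?utm_source=email",
        "https://example.com/products",
    ]
    print(Counter(classify_url(u) for u in urls))
    print(facet_url_count([5, 4, 3]))  # e.g. 5 colours, 4 sizes, 3 sort orders
```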
How to Fix Discovered — Currently Not Indexed
- Diagnose internal linking. Pages with few or no internal links pointing to them get low crawl priority. Use Screaming Frog or similar to map pages with zero or one internal link. Add contextual links from high-traffic pages.
- Improve server response time. Target Time to First Byte (TTFB) under 200ms. Slow servers directly reduce Googlebot's crawl rate. (Google Search Central: Page Experience).
- Manage faceted navigation. Pick one control per URL pattern: use <meta name="robots" content="noindex"> on parameter combinations that don't deserve independent ranking, or disallow the pattern in robots.txt so it is not crawled at all. Do not combine the two on the same URLs, because Googlebot cannot see a noindex tag on a page it is not allowed to fetch. Where parameter URLs stay crawlable, implement canonical tags pointing to the clean URL.
- Remove or noindex bloat. Audit your GSC Index Coverage report for pages that are indexed but generating no impressions or clicks over 12 months. Noindex or consolidate these pages to reclaim crawl budget for content that matters.
- Submit a clean XML sitemap. Include only canonical URLs you want indexed. Remove redirects, noindex pages, and parameter URLs from the sitemap. Resubmit after cleanup (Google Search Central: Build a sitemap).
Canonical Tags and Duplicate Content
Duplicate content is the second-largest indexing problem after quality. Google handles duplicates by selecting a canonical URL from the duplicate set and indexing only that one. Your job is to ensure Google picks the URL you want.
Why Canonicalisation Fails
- Inconsistent internal linking: linking to both /page/ and /page (trailing slash mismatch) creates two candidate URLs.
- HTTP and HTTPS both accessible: ensure HTTP 301 redirects to HTTPS.
- www and non-www both accessible: pick one and 301 redirect the other. Search Console no longer offers a preferred-domain setting, so the redirect and canonical tags carry the signal.
- Sitemap includes non-canonical URLs: this contradicts the live canonical.
- Canonical tag conflicts with redirect: a redirect from /page-a to /page-b with a canonical on /page-b pointing back to /page-a creates an unresolvable loop.
Implementing Canonical Tags Correctly
Every page needs one <link rel="canonical" href="https://example.com/preferred-url"> in <head>. Self-referencing canonicals (a page pointing to itself) are correct — they block parameter-based duplication (Google Search Central: Consolidate duplicate URLs).
For e-commerce product variants: if they have distinct search demand, give each a self-referencing canonical with unique content. If not, point all variants to the base product page.
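A quick way to audit canonical tags across templates is to parse each page's <head> and compare the declared canonical with the URL that was fetched. The sketch below uses only the standard library. The status labels are hypothetical, and the comparison is a strict string match, so trailing-slash or protocol differences show up as "points_elsewhere".

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if "canonical" in attributes.get("rel", "").lower().split() and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def canonical_status(page_url: str, html: str) -> str:
    """Return 'missing', 'conflicting', 'self' or 'points_elsewhere' (labels are illustrative)."""
    parser = CanonicalParser()
    parser.feed(html)
    if not parser.canonicals:
        return "missing"
    if len(set(parser.canonicals)) > 1:
        return "conflicting"
    return "self" if parser.canonicals[0] == page_url else "points_elsewhere"


if __name__ == "__main__":
    html = '<head><link rel="canonical" href="https://example.com/shoe"></head>'
    print(canonical_status("https://example.com/shoe", html))
    print(canonical_status("https://example.com/shoe?color=red", html))
```

For variant URLs, "points_elsewhere" is the expected result when variants consolidate to the base product page.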
The Google Indexing API
The Indexing API allows direct notification to Google when a page is published or removed, bypassing the normal crawl queue. It was designed for job postings and live-event pages and is documented with that scope in mind (Google Search Central: Indexing API).
What the Indexing API Actually Does
- Signals to Google that a URL should be crawled immediately
- Does not guarantee indexing — it prioritises the crawl, not the indexing decision
- Works best for pages with high freshness requirements (job listings, live sports, breaking news)
Indexing API Limits (2026)
The Indexing API has strict quotas per project:
- 200 requests per day by default
- Bursting is not allowed — requests are spread across the day
- Exceeding the limit returns a 429 error
- Limit increases require an application to Google, which is rarely granted without a documented justified use case (Google Search Central: Indexing API quotas)
For most sites, the Indexing API is useful for newly published content and for signalling content removal. It is not a substitute for fixing the underlying crawl and quality issues that cause pages to remain unindexed.
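Because of the daily quota, submissions usually need to be planned rather than fired all at once. The sketch below builds the JSON body for the API's publish endpoint and splits a URL list into daily batches. It makes no network calls, and it leaves out authentication (the API requires an OAuth 2.0 access token, typically from a service account). The helper names are hypothetical.

```python
import json

# Publish endpoint and notification types as documented for the Indexing API.
INDEXING_API_ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"
DAILY_QUOTA = 200  # default per-project quota cited above


def build_notification(url: str, deleted: bool = False) -> str:
    """Return the JSON request body for one URL notification."""
    notification_type = "URL_DELETED" if deleted else "URL_UPDATED"
    return json.dumps({"url": url, "type": notification_type})


def plan_daily_batches(urls: list[str], quota: int = DAILY_QUOTA) -> list[list[str]]:
    """Split URLs into consecutive batches that each fit within one day's quota."""
    return [urls[i:i + quota] for i in range(0, len(urls), quota)]


if __name__ == "__main__":
    pending = [f"https://example.com/jobs/{n}" for n in range(450)]
    for day, batch in enumerate(plan_daily_batches(pending), start=1):
        print(f"day {day}: {len(batch)} notifications")
    print(build_notification(pending[0]))
```

Ordering the list by business priority before batching means the most important URLs go out on day one.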
IndexNow Protocol
IndexNow is an open-source protocol (supported by Microsoft Bing, Yandex, and Naver) that allows immediate URL submission when content changes (IndexNow.org). Google does not participate in IndexNow — Googlebot's crawl decisions are not influenced by IndexNow submissions. Do not rely on IndexNow as a Google indexing accelerator.
Internal Linking and Crawl Equity
Internal links are the primary mechanism Googlebot uses to discover and reprioritise pages after the initial crawl. A page with no internal links pointing to it is effectively invisible to Googlebot's crawl path — even if it is in your sitemap.
Link Equity Principles
- PageRank flows through internal links. Pages with more inlinks are crawled more frequently and evaluated with higher authority (Brin & Page, 1998).
- Contextual links > navigation links. Body-copy links carry relevance signals that sidebar/footer links do not.
- Orphan pages are deprioritised. A URL accessible only through a sitemap but with no internal links is unlikely to be crawled regularly on large sites.
Internal Linking Audit for Indexing
- Export all URLs from your sitemap.
- Crawl the site with Screaming Frog and filter to pages with fewer than 3 inlinks.
- For each, identify 3–5 related pages where a contextual link fits naturally.
- Add links, prioritising pages that show "Discovered — currently not indexed" in GSC.
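The audit steps above can be sketched in a few lines once you have a sitemap URL list and a crawler export of (source, target) link pairs. The input shapes here are assumptions, since crawler exports differ, and the default threshold of 3 mirrors the filter described above.

```python
from collections import Counter


def low_inlink_pages(sitemap_urls, links, min_inlinks=3):
    """Return (url, inlink_count) for sitemap URLs below min_inlinks, weakest first.

    links is an iterable of (source_url, target_url) pairs; duplicate pairs
    and self-links are ignored so each linking page counts once.
    """
    inlinks = Counter()
    seen = set()
    for source, target in links:
        if source == target or (source, target) in seen:
            continue
        seen.add((source, target))
        inlinks[target] += 1
    flagged = [(url, inlinks[url]) for url in sitemap_urls if inlinks[url] < min_inlinks]
    return sorted(flagged, key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    sitemap = ["/guide", "/pricing", "/orphan"]
    edges = [("/", "/guide"), ("/blog", "/guide"), ("/pricing", "/guide"), ("/", "/pricing")]
    print(low_inlink_pages(sitemap, edges))
```

URLs with a count of zero are orphans and are the first candidates for new contextual links.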
For more on how internal linking interacts with crawling and ranking, see Indexing.
Diagnostic Workflow: From Problem to Fix
Use this workflow when pages are not appearing in Google search results.
Step 1 — Confirm the Page is Not Indexed
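Search site:example.com/your-page-url in Google. No result suggests the page is not indexed, although the site: operator is not guaranteed to list every indexed URL. A result means the page is indexed but possibly not ranking (a different problem).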
Also check Search Console → URL Inspection for the exact page — this gives the most accurate view including last crawl date and detected issues.
Step 2 — Check for Blockers
Run through this checklist in order:
- robots.txt: does it block the URL or its containing directory?
- <meta name="robots">: is noindex present?
- X-Robots-Tag HTTP header: check with curl -I https://example.com/page
- HTTP status: does the page return 200? Check for soft 404 behaviour.
- Canonical tag: does it point to the correct URL (itself or the intended canonical)?
- JavaScript rendering: is the main content in the initial HTML, or does it require JS execution?
For JavaScript-rendered pages, use the URL Inspection Tool's "Test Live URL" → "View tested page" to see what Googlebot actually rendered. If the main content is missing from the rendered HTML, Googlebot likely cannot render the JavaScript (Google Search Central: JavaScript SEO).
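Several of the checklist items can be run against a response you have already fetched (for example with curl). The sketch below takes a status code, response headers, and raw HTML, and reports which blockers it finds. The function and label names are hypothetical, and the "expected phrase" check is only a crude stand-in for a rendering test.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            content = attributes.get("content", "")
            self.directives.update(part.strip().lower() for part in content.split(","))


def find_blockers(status, headers, html, expected_phrase=None):
    """Return a list of human-readable indexing blockers found in one response."""
    blockers = []
    if status != 200:
        blockers.append(f"HTTP status {status}")
    header_value = next((v for k, v in headers.items() if k.lower() == "x-robots-tag"), "")
    if "noindex" in header_value.lower():
        blockers.append("X-Robots-Tag: noindex")
    parser = RobotsMetaParser()
    parser.feed(html)
    # "none" is shorthand for "noindex, nofollow".
    if parser.directives & {"noindex", "none"}:
        blockers.append("meta robots noindex")
    if expected_phrase and expected_phrase.lower() not in html.lower():
        blockers.append("main content missing from initial HTML")
    return blockers


if __name__ == "__main__":
    html = '<head><meta name="robots" content="noindex, follow"></head><div id="app"></div>'
    print(find_blockers(200, {"Content-Type": "text/html"}, html, "Trail running shoes"))
```

An empty list means none of these particular blockers were detected. It does not mean the page will be indexed.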
Step 3 — Identify the GSC Status
Check Search Console → Indexing → Pages. Find your URL's status from the table above and follow the fix path for that status.
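When an export contains many URLs, it helps to group them by status and work through the groups in the priority order from the table above. The sketch below assumes rows of (url, status). It writes the status strings with a plain hyphen, which is an assumption about the export format, so adjust the keys to match your data.

```python
from collections import Counter

# Lower number = look at first. Labels follow the Priority column of the table above.
PRIORITY_ORDER = {"Critical if unintentional": 0, "High": 1, "Medium": 2, "Low": 3, "Monitor": 4}

STATUS_PRIORITY = {
    "Blocked by robots.txt": "Critical if unintentional",
    "Excluded by 'noindex' tag": "Critical if unintentional",
    "Crawled - currently not indexed": "High",
    "Discovered - currently not indexed": "High",
    "Soft 404": "High",
    "Duplicate without user-selected canonical": "Medium",
    "Duplicate, submitted URL not selected as canonical": "Medium",
    "URL is unknown to Google": "Medium",
    "Page with redirect": "Low",
    "Not found (404)": "Low",
    "Alternate page with proper canonical tag": "Low",
    "Indexed": "Monitor",
}


def triage(rows):
    """Return (status, url_count) pairs ordered by priority, then by count descending."""
    counts = Counter(status for _, status in rows)

    def sort_key(item):
        status, count = item
        priority = STATUS_PRIORITY.get(status, "Medium")  # unknown statuses default to Medium
        return (PRIORITY_ORDER[priority], -count, status)

    return sorted(counts.items(), key=sort_key)


if __name__ == "__main__":
    export = [
        ("/a", "Indexed"),
        ("/b", "Soft 404"),
        ("/c", "Soft 404"),
        ("/d", "Blocked by robots.txt"),
    ]
    for status, count in triage(export):
        print(f"{count:>4}  {status}")
```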
Step 4 — Assess Content Quality
For "Crawled — currently not indexed" pages:
- Open the page alongside the top 3 Google results for its target query.
- If your page is materially thinner, expand it with original content, data, or expert perspective.
- If it near-duplicates another page on your site, consolidate via 301 to the stronger URL.
Step 5 — Fix, Wait, Re-check
- Use URL Inspection → "Request indexing" for high-priority pages.
- Update <lastmod> in your sitemap and resubmit via GSC → Sitemaps.
- Wait 7–21 days before re-checking GSC status.
- For large-scale bloat removals, allow 4–8 weeks for crawl budget reallocation.
Index Bloat Management at Scale
Large sites (10,000+ pages) require ongoing index hygiene, not one-time fixes.
Regular Audit Process
Run this quarterly:
- Export the indexed set from GSC → Pages → Indexed and join it with impressions from the Performance report, sorted ascending. Pages with zero impressions over 90 days are your primary bloat candidates.
- Segment by URL pattern. Identify which templates generate the most zero-impression pages — faceted nav, tag pages, paginated archives, thin landing pages.
- Decide per-segment: noindex, consolidate, 301, or delete/410.
- Implement in batches. Remove 10–15% per month, not thousands at once, so Google can reassess crawl budget gradually.
- Track crawl rate. A rising crawl rate in GSC → Settings → Crawl Stats confirms the cleanup is working.
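The segmentation and batching steps can be approximated with a short script. This sketch assumes rows of (url, impressions) and groups zero-impression URLs by their first path segment, which is a simplification of "URL pattern". The 10% monthly batch size follows the range suggested above.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def segment_zero_impression(rows):
    """Group zero-impression URLs by first path segment, e.g. 'tag' for /tag/red."""
    segments = defaultdict(list)
    for url, impressions in rows:
        if impressions == 0:
            first_segment = urlsplit(url).path.strip("/").split("/")[0] or "(root)"
            segments[first_segment].append(url)
    return dict(segments)


def monthly_batches(urls, percent=10):
    """Split urls into batches of roughly `percent` of the total, one batch per month."""
    if not urls:
        return []
    size = max(1, (len(urls) * percent + 99) // 100)  # integer ceiling division
    return [urls[i:i + size] for i in range(0, len(urls), size)]


if __name__ == "__main__":
    rows = [
        ("https://example.com/tag/red", 0),
        ("https://example.com/tag/blue", 0),
        ("https://example.com/guides/indexing", 120),
        ("https://example.com/products?page=9", 0),
    ]
    for segment, urls in segment_zero_impression(rows).items():
        print(segment, len(urls))
```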
Monitoring Crawl Stats
Search Console → Settings → Crawl Stats shows total daily crawl requests, download size per request, and response time trends. A spike in crawl errors or drop in pages crawled per day often precedes discovery or indexing problems. Check weekly for large sites (Google Search Central: Crawl Stats).
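A weekly check can be as simple as comparing the latest week's average crawl requests with the week before. The sketch below assumes a list of daily request counts copied from the Crawl Stats export. Any alert threshold you apply to the result (for example 0.3) is your own choice, not a Google-defined value.

```python
from statistics import mean


def crawl_drop_ratio(daily_requests, window=7):
    """Return the fractional drop between the previous window and the latest one.

    0.5 means the recent average is half the previous average; negative values
    mean crawl activity increased.
    """
    if len(daily_requests) < 2 * window:
        raise ValueError("need at least two full windows of data")
    recent = mean(daily_requests[-window:])
    previous = mean(daily_requests[-2 * window:-window])
    if previous == 0:
        return 0.0
    return (previous - recent) / previous


if __name__ == "__main__":
    history = [1000] * 7 + [500] * 7
    print(crawl_drop_ratio(history))
```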
Sitemaps: What They Do and Don't Do
Sitemaps communicate URL existence to Google — they do not compel crawling or indexing. A URL in a sitemap is a suggestion, not an instruction (Google Search Central: Sitemaps).
Sitemap Best Practices for 2026
- Include only canonical URLs — never noindex pages, redirect sources, or parameter URLs.
- Keep <lastmod> accurate. Google detects cosmetic date changes and stops trusting the field.
- Use sitemap index files for sites over 50,000 URLs or 50MB per file.
- Separate content types. A news sitemap alongside your main sitemap helps Google prioritise freshness-sensitive content.
- Submit via Search Console: the GSC submission generates status reporting that robots.txt alone does not.
- Maximum 50,000 URLs per sitemap file (Google Search Central: Sitemap file requirements).
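The per-file URL limit and the sitemap index can be handled with the standard library. The sketch below is one possible approach, with hypothetical function names. It takes (loc, lastmod) pairs for canonical URLs only, splits them into files of at most 50,000 URLs, and builds an index that points at those files. It does not enforce the 50MB size limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000


def build_sitemap(entries):
    """Return sitemap XML for (loc, lastmod) pairs; lastmod may be None."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


def build_sitemap_files(entries, max_urls=MAX_URLS_PER_FILE):
    """Split entries into as many sitemap documents as the URL limit requires."""
    return [build_sitemap(entries[i:i + max_urls]) for i in range(0, len(entries), max_urls)]


def build_sitemap_index(sitemap_locations):
    """Return sitemap index XML listing the location of each sitemap file."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_locations:
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = loc
    return ET.tostring(index, encoding="unicode")


if __name__ == "__main__":
    pages = [(f"https://example.com/p{i}", "2026-01-15") for i in range(5)]
    files = build_sitemap_files(pages, max_urls=2)  # tiny limit to show the split
    locations = [f"https://example.com/sitemap-{n}.xml" for n in range(1, len(files) + 1)]
    print(build_sitemap_index(locations))
```

Only set lastmod when the page content actually changed, in line with the accuracy point above.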
Frequently Asked Questions
How long does it take Google to index a new page?
It varies by site authority and crawl frequency. High-authority sites with strong internal links can see new pages indexed within hours. Low-authority sites may wait days or weeks for a crawl, then still land in "Crawled — currently not indexed." URL Inspection → "Request indexing" accelerates the crawl for individual pages but does not guarantee indexing (Google Search Central: Request indexing).
Why does Google index some pages but not others on my site?
Google selects pages based on content quality, crawl budget, and canonicalisation. Hundreds of near-duplicate pages — same product in multiple colours, location pages with templated copy — will result in Google indexing only the strongest representative. Segment your GSC index coverage report by URL pattern to see which templates are systematically excluded.
Should I noindex low-quality pages or delete them?
The two work differently. noindex removes the page from the index but keeps the URL accessible and internal link equity flowing. Deleting (404/410) removes the page completely, and external links pointing at it stop passing equity. For content you may revive, noindex is safer. For permanently retired pages, either return 404/410 or 301 redirect to the nearest relevant URL; a single URL cannot do both. Do not 301 to an unrelated page to preserve equity, because Google may treat an irrelevant redirect as a soft 404 (Google Search Central: Remove content from Google).
What is the fastest way to get a new article indexed?
The most reliable combination: (1) publish with strong quality so the page passes the indexing bar; (2) add internal links from 2–3 high-traffic existing pages on publication day; (3) use URL Inspection → "Request indexing" in Search Console; (4) include the URL in your sitemap with today's <lastmod>. Internal links from existing indexed pages are the most reliable crawl trigger — do not rely on the sitemap or inspection request alone.
Does submitting a URL to the Indexing API guarantee indexing?
No. The Indexing API prioritises the crawl — it tells Googlebot to visit the URL sooner than it otherwise would. Once Googlebot visits, the standard indexing decision process applies: quality threshold, canonical evaluation, duplicate check. A low-quality page submitted via the Indexing API will still end up as "Crawled — currently not indexed." The API's quota of 200 requests per day also means it is impractical for large-scale indexing operations (Google Search Central: Indexing API overview).
Originally published in the EcomExperts SEO library.