
New Screaming Frog Features: Complete Guide & Advanced Tips

Master the latest Screaming Frog updates including AI semantic analysis, Core Web Vitals metrics, Ahrefs integration, and more.

This guide covers the latest additions to Screaming Frog SEO Spider: what each new feature does, and step-by-step instructions for using it in advanced SEO workflows.

1. Topic Overview & Core Definitions

Screaming Frog SEO Spider is a leading desktop-based website crawler, designed to extract data and audit common SEO issues. Its continuous evolution introduces powerful new capabilities, particularly in AI-driven content analysis, performance diagnostics, and enhanced data integration. This guide focuses on features released in recent major updates (primarily versions 20.0 through 24.1) and other significant additions, providing SEO professionals with the knowledge to harness these tools for deeper insights and more efficient audits.

Why it matters: These new features empower SEOs to:

  • Conduct more nuanced content audits using AI-driven semantic analysis.
  • Gain deeper insights into website performance and Core Web Vitals.
  • Integrate third-party data seamlessly for holistic analysis.
  • Streamline large-scale crawl management and reporting.
  • Address complex technical SEO challenges with greater precision.
  • Automate workflows and interact with crawl data via natural language using the official MCP (Model Context Protocol) server (v24.0+).

Key concepts and terminology:

  • AI-powered features: Functionalities leveraging artificial intelligence for tasks like semantic analysis, content clustering, and contextual search.
  • Semantic Similarity: A measure of how closely related two pieces of text are in meaning, rather than just keyword overlap.
  • Content Clustering: Grouping pages together based on their topical relevance and semantic similarity.
  • Lighthouse & PSI Insight Audits: Enhanced integration with Google's PageSpeed Insights and Lighthouse for performance and user experience diagnostics (v23.0+).
  • API Integration: Connecting Screaming Frog with external data sources (e.g., Ahrefs, OpenAI, Gemini) to enrich crawl data.
  • Crawl Retention/Auto-Deleting Crawls: Features for managing the storage and lifecycle of crawl data.
  • Document Request Latency & LCP Request Discovery: New metrics providing granular insights into page load timing and Largest Contentful Paint (LCP) resource discovery.
  • MCP (Model Context Protocol): An open protocol that enables LLMs to interact with Screaming Frog. The official MCP server provides ~29 tools for crawl control, reporting, and data export via natural language (v24.0+).
  • Auto Compare Crawls: Automatically compares the last two scheduled crawls and highlights changes in issues, pages added/removed, and more (v24.0+).
  • Uncrawlable Links: Links that do not conform to Google’s best practices (e.g., <span href>, <a onclick>), now detected as a new issue (v24.0+).

2. Foundational Knowledge

The new features build upon Screaming Frog's core crawling and data extraction capabilities, extending its reach into AI-driven content intelligence, advanced performance analysis, and seamless external data integration. Understanding the underlying principles—how Screaming Frog crawls, renders JavaScript, and processes data—is crucial for maximizing the utility of these new functionalities. Many of these features operate post-crawl or during specific data collection phases (e.g., PSI audits, API calls), requiring correct initial crawl configurations.

3. Comprehensive Implementation Guide

This section details how to access, configure, and utilize the most impactful new features.

3.1. AI-Powered Features (Semantic Similarity, Content Clustering, Semantic Search) - Introduced in v22.0

These features require an OpenAI API key (or similar AI model integration) to function, as they rely on large language models for semantic analysis.

Requirements:

  • A valid OpenAI API key, or access to another supported provider: Gemini, Anthropic, Ollama, or a custom endpoint (DeepSeek, Grok, Azure OpenAI, OpenRouter, LM Studio) – source.
  • Sufficient API credits, as analyses consume tokens.

Step-by-step procedures:

  1. Configure API Key:

    • Navigate to Configuration > API Access > [Provider].
    • Enter your API Key.
    • (Optional) Adjust Max Concurrent Requests, Request Delay, and Timeout to manage API usage and avoid rate limits.
    • (Optional) Configure model settings (e.g., Model, Temperature) for more granular control over AI responses if exposed in the UI. As of v23.0, default models are: OpenAI → gpt-5-mini, Gemini → gemini-2.5-flash, Anthropic → claude-sonnet-4-5 – source.
  2. Enable Semantic Analysis during Crawl:

    • For content-related analyses, ensure Configuration > Spider > Content > Extract All Content (or specific elements like H1, Meta Description, Body Text) is enabled to gather the necessary text data.
    • Specific AI features might have their own toggles under Configuration > AI (or Semantic). Ensure these are checked.
  3. Performing Semantic Similarity Analysis:

    • After a crawl, navigate to the Internal tab.
    • Select Reports > Semantic Similarity (or similar menu option).
    • You may be prompted to select the source content for comparison (e.g., H1, Body Content).
    • Screaming Frog will then process the content through the AI API, calculating similarity scores between specified pages.
    • Output: A report typically showing pairs of URLs with their semantic similarity scores, often highlighting potential content duplication or cannibalization issues. The default threshold is 0.95, adjustable down to 0.5 – source.
  4. Utilizing Content Clustering:

    • This feature often follows semantic similarity analysis.
    • Go to Reports > Content Clustering.
    • The tool will group pages into clusters based on their semantic relationships.
    • Output: A visualization or table showing distinct topic clusters and the URLs belonging to each, aiding in identifying content gaps, consolidation opportunities, or topic authority. In v23.0, you can right-click to show inlinks/outlinks within clusters – source.
  5. Leveraging Semantic Search:

    • This is typically an in-app search functionality.
    • In the main interface, use the search bar or a dedicated "Semantic Search" filter.
    • Instead of keyword matching, type a query representing a concept or topic.
    • Screaming Frog will return pages semantically related to your query, even if they don't contain the exact keywords.
    • Use Case: Finding related content for internal linking, identifying comprehensive topical coverage, or discovering content gaps.
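The similarity threshold described in step 3 can be illustrated outside the tool. The following is a minimal sketch, assuming page embeddings have already been exported as plain lists of floats and that similarity is measured with cosine similarity, a common choice for embeddings. The source does not state how Screaming Frog computes its scores, and the function names and example vectors here are hypothetical.

```python
from itertools import combinations
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def similar_pairs(embeddings, threshold=0.95):
    """Return (url_a, url_b, score) for every URL pair scoring at or above threshold."""
    pairs = []
    for (url_a, vec_a), (url_b, vec_b) in combinations(sorted(embeddings.items()), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:
            pairs.append((url_a, url_b, score))
    # Highest-scoring pairs first, since they are the likeliest duplicates.
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


# Toy 3-dimensional vectors (hypothetical); real embeddings are much larger.
EXAMPLE_EMBEDDINGS = {
    "/pricing": [1.0, 0.0, 0.0],
    "/pricing-plans": [0.98, 0.05, 0.0],
    "/blog/crawl-budget": [0.0, 1.0, 0.2],
}
```

Calling similar_pairs(EXAMPLE_EMBEDDINGS) flags only the two pricing pages. Lowering the threshold toward 0.5 widens the net from near-duplicates to loosely related pages, which is the trade-off the adjustable threshold controls.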

Advanced AI Configuration (v21.0+, enhanced v22.0-v24.0):

  • Up to 100 prompts per crawl. Targets: Page Text (auto-excludes nav/footer), HTML, Custom Extraction – source.
  • Advanced prompts can combine multiple target elements (page text + custom extractors) via the cog icon.
  • Run prompts only against URLs matching a segment (e.g., only images missing alt text) under Prompt Configuration > cog > "No Segment Matching" dropdown.
  • System Wide Prompt for AI: set a chat system prompt in Advanced tab of each AI provider (v24.0+).
  • Live model validation alerts if a deprecated model is selected (v24.0+).
  • Token usage per provider displayed on Account Information dialog (v24.0+).
  • Ollama image generation and request timeout adjustments (v24.0+).

3.2. Lighthouse & PSI Updated to Insight Audits - Introduced in v23.0

This feature deepens the integration with Google's performance tools, providing more actionable recommendations directly within Screaming Frog. It is based on Lighthouse 13 and includes 7 new issues, 11 removed/consolidated, and 6 renamed – source.

New issues introduced:

  • Document Request Latency
  • Improve Image Delivery
  • LCP Request Discovery
  • Forced Reflow
  • Avoid Enormous Network Payloads
  • Network Dependency Tree
  • Duplicated JavaScript

Removed/Consolidated: Defer Offscreen Images, Preload Key Requests, Efficiently Encode Images (merged into Improve Image Delivery), etc.

Renamed: e.g., Eliminate Render-Blocking Resources → Render Blocking Requests.

Requirements:

  • Internet connection for API calls to Google's PageSpeed Insights.
  • (Optional) A PageSpeed Insights API key. Basic usage often works without one, but a key can increase quota.

Step-by-step procedures:

  1. Configure PSI Integration:

    • Navigate to Configuration > API Access > PageSpeed Insights.
    • (Optional) Enter your PageSpeed Insights API Key.
    • Select Strategy (Mobile, Desktop, or Both).
    • Adjust Batch Size and Delay to manage API requests.
    • Ensure Enable PageSpeed Insights is checked.
  2. Run a Crawl with PSI Audits:

    • Start a standard crawl.
    • As pages are crawled, Screaming Frog will queue them for PSI analysis.
    • The PSI data will populate columns in the main interface (e.g., PSI Score, FCP, LCP, CLS).
  3. Access Insight Audits:

    • Select a URL in the main window.
    • In the lower window pane, navigate to the PageSpeed Insights tab.
    • Instead of just raw scores, the "Insight Audits" present a more structured and actionable breakdown of performance issues, often with specific recommendations and links to Lighthouse documentation.
    • Output: Detailed reports for each URL, highlighting opportunities for improvement in performance, accessibility, best practices, and SEO, with color-coded severity.
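To spot-check a single URL outside the crawler, the same data can be requested from Google's PageSpeed Insights API v5. The sketch below only builds the request URL and summarizes an already-parsed JSON response, so it makes no network call. The helper names are hypothetical, and the response fields read here (lighthouseResult, categories, audits) reflect the public API's response shape rather than anything specific to Screaming Frog.

```python
from urllib.parse import urlencode

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"


def build_psi_url(page_url, strategy="mobile", api_key=None):
    """Build a PageSpeed Insights v5 request URL for one page."""
    params = {"url": page_url, "strategy": strategy}
    if api_key:
        params["key"] = api_key  # optional; a key raises the request quota
    return f"{PSI_ENDPOINT}?{urlencode(params)}"


def summarize_psi(response):
    """Pull a few headline numbers out of a parsed PSI JSON response."""
    lighthouse = response.get("lighthouseResult", {})
    audits = lighthouse.get("audits", {})
    score = lighthouse.get("categories", {}).get("performance", {}).get("score")
    return {
        # Lighthouse reports category scores on a 0-1 scale.
        "performance_score": None if score is None else round(score * 100),
        "lcp_ms": audits.get("largest-contentful-paint", {}).get("numericValue"),
        "cls": audits.get("cumulative-layout-shift", {}).get("numericValue"),
    }
```

Fetching the URL with any HTTP client and passing the decoded JSON to summarize_psi gives a quick cross-check against the values shown in the PageSpeed Insights tab.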

3.3. Ahrefs v3 API Update - Introduced in v23.0

This update ensures compatibility with the latest Ahrefs API using OAuth authentication, working on any paid Ahrefs plan – source.

Requirements:

  • A valid Ahrefs account with API access.
  • An Ahrefs API Key (OAuth flow).

Step-by-step procedures:

  1. Configure Ahrefs API:

    • Navigate to Configuration > API Access > Ahrefs.
    • Enter your Ahrefs API Key.
    • Select the desired Metrics to retrieve (e.g., Domain Rating, URL Rating, Referring Domains, Backlinks, Organic Traffic, Keywords, Cost).
    • Configure Batch Size and Delay to manage API usage and stay within Ahrefs' rate limits.
    • Ensure Enable Ahrefs API is checked.
  2. Integrate Ahrefs Data during Crawl:

    • Perform a crawl.
    • Screaming Frog will automatically make API calls to Ahrefs for each crawled URL (or for the domain, depending on selected metrics).
    • Ahrefs data will populate dedicated columns in the main interface.
    • Use Cases: Combine on-page data with off-page metrics for a holistic view; identify pages with high UR/DR but low organic traffic, or vice-versa; prioritize pages for link building based on internal linking structure and Ahrefs metrics.
    • In v24.0, Ahrefs Country Level Metrics were added, allowing filtering by country and volume (monthly or average) – source.

3.4. Auto-Deleting Crawls (Crawl Retention) - Introduced in v23.0

This feature helps manage disk space and data retention by automatically removing old crawl data.

Requirements:

  • Screaming Frog SEO Spider.

Step-by-step procedures:

  1. Configure Crawl Retention:

    • Navigate to File > Configuration > Crawl Retention (or File > Settings > Crawl Retention in newer versions).
    • Enable Automatically Delete Old Crawls.
    • Set the Retention Period (e.g., 7 days, 30 days, 90 days). Default is "Never" – source.
    • (Optional) Configure Maximum Disk Space to cap the total space used by crawls, or Maximum Number of Crawls to limit the quantity.
    • Click OK to save settings.
  2. Impact:

    • Screaming Frog will periodically check for and delete crawl files older than the specified retention period, freeing up disk space.
    • Best Practice: Set a retention period that aligns with your audit cycles and data analysis needs. For daily crawls, a shorter period might be appropriate; for monthly deep dives, a longer one.
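The retention rule can be reasoned about with a small model before enabling it. This is a sketch of the logic only, assuming each crawl has a known completion time. It is not Screaming Frog's implementation, and the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta


def crawls_to_delete(crawls, now, retention_days, locked=frozenset()):
    """Return names of crawls older than the retention period, skipping locked ones.

    crawls maps a crawl name to the datetime it finished.
    """
    cutoff = now - timedelta(days=retention_days)
    return sorted(
        name
        for name, finished_at in crawls.items()
        if finished_at < cutoff and name not in locked
    )
```

Running this against a list of your own crawl dates with a candidate retention period shows which crawls would disappear, which helps when choosing between, say, 30 and 90 days.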

3.5. New Metrics: Document Request Latency & LCP Request Discovery

These provide granular insights into page load performance, crucial for Core Web Vitals optimization. Introduced as part of the Insight Audits in v23.0 – source.

Requirements:

  • JavaScript Rendering enabled (Configuration > Spider > Rendering > JavaScript).
  • A crawl where JavaScript is rendered.

Step-by-step procedures:

  1. Enable JavaScript Rendering:

    • Go to Configuration > Spider > Rendering.
    • Select JavaScript from the dropdown.
    • Adjust Rendering Timeout and AJAX Timeout as needed for complex sites.
  2. Run a Crawl:

    • Start a crawl with JavaScript rendering enabled.
    • Screaming Frog will render each page and collect performance metrics.
  3. Access Metrics:

    • After the crawl, navigate to the Internal tab.
    • Locate columns like Document Request Latency and LCP Request Discovery.
    • Document Request Latency: Covers the initial HTML document request. In Lighthouse, this insight checks for redirects, slow server response, and missing text compression on that request. Problems here can indicate server issues, slow routing, or network problems.
    • LCP Request Discovery: Indicates how easily the browser can discover the Largest Contentful Paint (LCP) resource. In Lighthouse, this insight checks that the LCP image is discoverable in the initial HTML, is not lazy-loaded, and has fetchpriority=high applied. Late discovery is often due to deferred loading, hidden elements, or complex JavaScript.
    • Use Cases: Pinpoint bottlenecks affecting LCP; identify render-blocking resources; prioritize optimization efforts by understanding the critical path to rendering the main content.

3.6. Enhanced JavaScript Rendering Improvements

Ongoing enhancements to JavaScript rendering ensure more accurate and complete crawling of modern, client-side rendered websites. Custom JavaScript Snippets (v20.0+) allow running arbitrary JS during crawl (e.g., mouseover, scroll, extract from Chrome console) – source.

Requirements:

  • Enable JavaScript rendering (Configuration > Spider > Rendering > JavaScript).

Step-by-step procedures:

  1. Configure Rendering:

    • Ensure JavaScript rendering is selected.
    • Adjust Rendering Timeout and AJAX Timeout (e.g., 10-20 seconds for complex SPAs).
    • Consider Screen Size and User Agent settings in Configuration > User-Agent to mimic specific browser environments.
  2. Crawl and Analyze:

    • Perform a crawl.
    • Pay attention to Response Codes (especially 200 OK for rendered content), Word Count, and H1/Title elements to verify that content is being correctly rendered and extracted.
    • Use the Rendered Page tab in the lower pane to visually inspect how Screaming Frog sees the page after JavaScript execution.
    • Use Cases: Audit SEO for Single Page Applications (SPAs) or heavily JavaScript-driven sites; detect content hidden by JavaScript that might be crawlable; identify rendering issues that could impact indexing.

3.7. Enhanced Redirect Chain Analysis

Provides a clearer, more comprehensive view of complex redirect paths.

Requirements:

  • Standard crawl settings.

Step-by-step procedures:

  1. Run a Crawl:

    • Perform a standard crawl.
  2. Access Redirect Chains:

    • Navigate to the Response Codes > Redirection (3XX) filter.
    • Select a URL that is a redirect.
    • In the lower window pane, go to the Redirects tab.
    • The enhanced analysis will display the full redirect chain (source URL -> intermediate redirects -> final destination).
    • Output: Clear visualization of each hop in the redirect chain, including response codes and URLs.
    • Use Cases: Identify excessively long redirect chains (3+ hops) that waste crawl budget and introduce latency; detect redirect loops; find broken redirects or redirects to non-existent pages; audit migration redirects.
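Once redirects are exported, the chain-following logic is simple to reproduce for bulk checks. The sketch below assumes the export has been reduced to a source-to-target mapping. The function name, the status labels, and the max_hops default are hypothetical choices rather than Screaming Frog behavior.

```python
def resolve_chain(start_url, redirects, max_hops=10):
    """Follow start_url through a {source: target} redirect map.

    Returns (chain, status), where status is "ok", "loop", or "too_long".
    """
    chain = [start_url]
    seen = {start_url}
    current = start_url
    while current in redirects:
        current = redirects[current]
        chain.append(current)
        if current in seen:
            return chain, "loop"  # we have returned to a URL already visited
        seen.add(current)
        if len(chain) - 1 > max_hops:
            return chain, "too_long"
    return chain, "ok"
```

The number of hops is len(chain) - 1, so filtering for results with three or more hops reproduces the "excessively long chain" check described above.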

3.8. Custom JavaScript

This allows users to inject and execute their own JavaScript code during the rendering process, enabling highly customized data extraction or manipulation. Introduced in v20.0 – source.

Requirements:

  • JavaScript Rendering enabled.
  • Knowledge of JavaScript.

Step-by-step procedures:

  1. Enable Custom JavaScript:

    • Go to Configuration > Custom > Custom JavaScript (v20.0+).
    • Click Add.
    • Write or paste your JavaScript code. Snippets can be saved to a library.
    • The code executes in the context of the rendered page.
  2. Example Use Case: Extracting data from JavaScript variables, manipulating the DOM before extraction, or logging specific events.

    • Example (Conceptual): If a specific data point is in a JavaScript variable window.myApp.productID, your custom JS could expose this to the DOM or log it for Screaming Frog to pick up via another custom extractor.
  3. Run Crawl:

    • Perform a crawl with JavaScript rendering enabled.
    • Screaming Frog will execute your custom script on each page.
    • Output: The results of your custom script can then be extracted using standard XPath/CSSPath/Regex based on how your script modified the page or exposed data.
    • Use Cases: Extracting complex product data from e-commerce sites; auditing custom analytics implementations; testing JavaScript-based content visibility.

3.9. Schema.org v27 Support

Ensures that Screaming Frog can accurately parse and validate the latest version of Schema.org markup.

Requirements:

  • Schema Markup extraction enabled (Configuration > Spider > Extraction > Schema.org).

Step-by-step procedures:

  1. Enable Schema Extraction:

    • Go to Configuration > Spider > Extraction.
    • Check Schema.org.
  2. Run a Crawl:

    • Perform a crawl.
  3. Analyze Schema Data:

    • After the crawl, navigate to the Schema tab.
    • Screaming Frog will parse and display structured data found on pages, now using the updated v27 definitions.
    • Output: Detailed view of Schema types, properties, and values. Includes validation status.
    • Use Cases: Audit structured data implementation for compliance with the latest standards; identify missing or incorrect Schema markup; ensure rich snippet eligibility.

3.10. Wildcard User-Agent Matching & Remove Parameters (Log File Analyser)

These are specific to the Log File Analyser, not the main SEO Spider, but are critical for advanced log analysis.

Wildcard User-Agent Matching:

  • Purpose: Allows more flexible grouping and filtering of user agents in log files, e.g., matching all Googlebot variants with Googlebot*.
  • How to Use: In the Log File Analyser, when configuring user agent filters or grouping, use standard wildcard characters (*, ?) in your definitions.

Remove Parameters:

  • Purpose: Cleans up URLs in log files by removing query parameters, enabling more accurate aggregation and analysis of unique pages.
  • How to Use: In the Log File Analyser settings, there will be an option to Remove Parameters or Ignore Query Strings for URL processing. This ensures that example.com/page?id=1 and example.com/page?id=2 are treated as the same page (example.com/page).
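Both behaviors are easy to prototype against a sample of log lines before changing the Log File Analyser settings. This sketch uses Python's standard-library shell-style matching and URL parsing. Whether the Log File Analyser's wildcard rules match these semantics exactly is an assumption, and the helper names are hypothetical.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit, urlunsplit


def matches_user_agent(user_agent, pattern):
    """Case-sensitive wildcard match: * matches any run of characters, ? matches one."""
    return fnmatchcase(user_agent, pattern)


def remove_parameters(url):
    """Drop the query string so parameterized variants aggregate to one page."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(query=""))
```

Note that Googlebot* only matches strings that start with "Googlebot". Full user-agent strings usually begin with "Mozilla/5.0", so a pattern such as *Googlebot* is needed to catch those.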

3.11. MCP Server (Model Context Protocol) - Introduced in v24.0

The official Screaming Frog SEO Spider MCP server enables natural language control of crawl data via Claude and other AI assistants. It is a Node.js-based server built into the Spider application – source.

Requirements:

  • Screaming Frog SEO Spider v24.0+
  • Paid license (free version limited to 500 URLs)
  • An MCP-compatible AI client (e.g., Claude Desktop, Claude Cowork, Cursor, LM Studio)

Two connection modes:

  • STDIO mode: Runs SF headless; GUI and MCP cannot both hold the database simultaneously. Workflow: crawl in GUI, close SF, then use MCP.
  • Streamable HTTP mode (mcp-remote to http://localhost:11435/mcp): Allows keeping a crawl open in GUI while using Claude. Sequential exports work, but simultaneous conflicting actions are not allowed – source.

Setup for Claude Desktop:

  1. Add to claude_desktop_config.json:
    {
      "mcpServers": {
        "sf": {
          "command": "npx",
          "args": ["mcp-remote", "http://localhost:11435/mcp"]
        }
      }
    }
    
  2. Set a base directory (e.g., /Users/you/seo_spider_mcp_outputs/) – everything MCP produces is stored there. Suggested folder structure: /seo-mcp/clients/[client-name]/crawls/YYYY-MM-DD/.
  3. Server key should be short (e.g., “sf”) to avoid tool naming issues (combined name limit ~60 characters).
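If claude_desktop_config.json already lists other MCP servers, the entry from step 1 should be merged in rather than pasted over the file. The sketch below shows that merge, plus a helper that builds the dated folder path suggested in step 2. Both function names are hypothetical, and reading and writing the file itself is left out.

```python
import copy
from datetime import date
from pathlib import PurePosixPath


def add_sf_server(config, server_key="sf", url="http://localhost:11435/mcp"):
    """Return a copy of a Claude Desktop config dict with the Screaming Frog entry added."""
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    servers = updated.setdefault("mcpServers", {})
    servers[server_key] = {"command": "npx", "args": ["mcp-remote", url]}
    return updated


def crawl_output_dir(base_dir, client_name, crawl_date):
    """Build <base>/clients/<client>/crawls/YYYY-MM-DD."""
    return PurePosixPath(base_dir) / "clients" / client_name / "crawls" / crawl_date.isoformat()
```

Loading the existing file with json.load, passing it through add_sf_server, and writing it back with json.dump keeps any other configured servers intact.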

Reports available via MCP (sf_generate_report): Verified list includes 53+ reports across core, redirects, canonicals, pagination, hreflang, structured data, PageSpeed (20 reports), mobile, accessibility, cookies, JavaScript – source.

Important notes:

  • A crawl at 100% progress is not necessarily finished: API requests may still be processing. Poll sf_crawl_progress until it reports idle.
  • Modal dialogs in SF block MCP tool calls – close all dialogs before using MCP.
  • Not every feature is supported in MCP v1. For example, custom extraction and JS rendering configuration must be set up in the GUI and saved as a .seospiderconfig profile.
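The "poll until idle" advice amounts to a loop with a timeout. In the sketch below, get_status is a hypothetical callable that you would wire to your MCP client's sf_crawl_progress call and that returns a status string. The source does not describe the tool's response format, so the literal "idle" value is an assumption.

```python
import time


def wait_until_idle(get_status, poll_seconds=5.0, timeout_seconds=3600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns "idle"; return False on timeout."""
    deadline = clock() + timeout_seconds
    while True:
        if get_status() == "idle":
            return True
        if clock() >= deadline:
            return False  # give up rather than block forever
        sleep(poll_seconds)
```

The sleep and clock parameters exist so the loop can be exercised in tests without real waiting. In normal use the defaults are fine.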

3.12. Auto Compare Crawls - Introduced in v24.0

Automatically compare the last two scheduled crawls in a project to highlight changes – source.

Step-by-step procedures:

  1. Go to File > Scheduling > Add (or edit an existing scheduled task).
  2. Under task settings, enable Auto Compare Crawls.
  3. Run scheduled crawls. After each crawl, Screaming Frog automatically compares it with the previous one.
  4. Output: Displays all changes: pages added/removed, issue counts, title changes, etc.
  5. If email notifications are enabled (v21.0+), the email summary includes a comparison of issues between the last two crawls – highlighting new/resolved problems without opening the tool (v24.0+).
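The kind of diff this feature reports can be approximated on exported data, which is useful for crawls that were not scheduled. This is a minimal sketch, assuming each crawl has been reduced to a URL-to-title mapping. It illustrates the comparison idea only and does not reflect how Screaming Frog computes its report.

```python
def compare_crawls(previous, current):
    """Diff two crawls given as {url: title} dicts."""
    prev_urls, curr_urls = set(previous), set(current)
    return {
        "added": sorted(curr_urls - prev_urls),
        "removed": sorted(prev_urls - curr_urls),
        # Only URLs present in both crawls can have a changed title.
        "title_changed": sorted(
            url for url in prev_urls & curr_urls if previous[url] != current[url]
        ),
    }
```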

3.13. Find Uncrawlable Links - Introduced in v24.0

Detects HTML links that don’t conform to Google’s link best practices (e.g., improperly formatted, <span href>, <a onclick>) – source.

Requirements:

  • “Store” enabled in Config > Spider > Crawl.

Step-by-step procedures:

  1. Run a standard crawl with storage enabled.
  2. Navigate to the Links tab.
  3. Use the new filter Pages With Uncrawlable Internal Outlinks.
  4. A new column Link Crawlability appears in the lower Outlinks tab, indicating which links are not crawlable.
  5. Bulk export: Bulk Export > Links > Pages With Uncrawlable Internal Outlinks.

Use Cases: Identify broken internal linking structures, incorrect HTML link implementations, and pages that waste crawl budget by pointing to non-crawlable URLs.
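To see what these patterns look like in markup, the sketch below scans an HTML string for three cases: an href on a non-link element such as span, an a element with onclick but no href, and an a element whose href is a javascript: URL. The rules are a simplified reading of Google's link guidance, not Screaming Frog's detection logic, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Elements that may legitimately carry an href attribute.
HREF_ALLOWED_TAGS = {"a", "link", "area", "base"}


class UncrawlableLinkFinder(HTMLParser):
    """Collects (tag, reason) tuples for link-like markup crawlers may not follow."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        href = (attr_map.get("href") or "").strip()
        if tag == "a":
            if href.lower().startswith("javascript:"):
                self.findings.append((tag, "javascript: href"))
            elif not href and "onclick" in attr_map:
                self.findings.append((tag, "onclick without href"))
        elif "href" in attr_map and tag not in HREF_ALLOWED_TAGS:
            self.findings.append((tag, "href on non-link element"))


def find_uncrawlable(html):
    """Return the findings for one HTML document."""
    finder = UncrawlableLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.findings
```

A regular a element with a resolvable href produces no finding, which is the form to migrate flagged links toward.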

4. Best Practices & Proven Strategies

  • API Key Management: Always keep API keys secure. Use environment variables or secure configuration management where possible. Monitor API usage to control costs.
  • Incremental Adoption: Start by integrating one new feature at a time into your workflow. Understand its output before combining it with others.
  • Performance Optimization: When using JavaScript rendering or multiple API integrations, be mindful of crawl speed. Adjust Batch Size, Delay, and Timeout settings to prevent rate limiting and ensure stable crawls. For large sites (>1M URLs), use database storage on SSD, allocate at least 16GB RAM, and disable unnecessary features – source.
  • Cross-Referencing Data: The power of Screaming Frog lies in its ability to combine data. Cross-reference AI-powered content insights with performance metrics, Ahrefs data, and log file analysis for truly holistic audits.
  • Custom Extraction for AI: For highly specific AI analyses, use custom extraction to precisely target the content you want the AI to process (e.g., only main article body, excluding comments or sidebars).
  • Regular Updates: Keep Screaming Frog updated to benefit from the latest features, bug fixes, and performance enhancements.
  • File Size Limits: As of v23.3, Screaming Frog treats Googlebot’s file size limit as 2MB (not 15MB as previously thought). New issues flag HTML documents and resources over 2MB – source.
  • MCP Workflow: Use Streamable HTTP mode to keep the GUI open alongside MCP for interactive work. Set a structured base directory for MCP output to keep files organized.

5. Advanced Techniques & Expert Insights

  • Semantic Content Gap Analysis: Use the AI features to crawl competitor sites (with permission) and identify their content clusters. Compare these to your own site's clusters to find topical gaps or areas where competitors have deeper coverage.
  • Prioritizing Performance Fixes: Combine LCP Request Discovery and Document Request Latency with PSI Insight Audits to create a prioritized list of performance optimizations. Focus on pages with critical LCP issues stemming from either slow server response or late resource discovery.
  • Automated Content Audits: Script Screaming Frog to run scheduled crawls with AI analysis. Export the semantic similarity and clustering reports, then use external tools (e.g., Python scripts) to automatically flag potential content cannibalization or consolidation opportunities.
  • Deep Dive into Redirect Migrations: Leverage enhanced redirect chain analysis post-migration. Export all redirect chains and build a pivot table to quickly identify long chains, accidental loops, or redirects to 404s, allowing for rapid remediation.
  • Custom JavaScript for A/B Testing Validation: Inject custom JavaScript to check for the presence of A/B testing scripts or specific variations of content, helping to ensure consistent deployment across pages.
  • Combining Ahrefs & Internal Link Data: Export crawl data including Ahrefs metrics (DR, UR, Referring Domains) and internal linking data (Inlinks, Outlinks). Analyze in a spreadsheet to find high-authority pages with poor internal linking, or low-authority pages receiving excessive internal links.
  • GEO (Generative Engine Optimization) Case Study (2026): A B2B SaaS content site used Screaming Frog to audit a pillar-spoke cluster. After fixing internal linking gaps, duplicate content, canonical conflicts, and structured data issues, they saw a +28% lift in impressions (41,200→52,900) and +24% lift in clicks (1,180→1,460) over 60 days – source. Monthly monitoring: 0 orphans, depth ≤3, no canonicalized URLs in sitemaps, 0 schema errors, duplicate title/H1 checked.
  • Content Audit Assistant (Ian Lurie Method): Combine Screaming Frog crawl (with OpenAI embeddings, GSC, GA) + Claude project + Zapier MCP for DataForSEO and Google Sheets. Steps: Set up Claude project with instructions, get OpenAI API key, configure SF crawl with JS rendering, JSON‑LD extraction, store HTML, content area, embeddings. Export crawl data and Google Analytics report to Google Sheets. Use Claude to generate prioritized content opportunities – source.
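The "Combining Ahrefs & Internal Link Data" technique above can be scripted instead of done in a spreadsheet. The sketch below filters a CSV export for pages with a high URL Rating but few inlinks. The column names and thresholds are assumptions, so adjust them to match the headers in your own export.

```python
import csv
import io


def underlinked_pages(csv_text, min_url_rating=30, max_inlinks=3,
                      url_col="Address", rating_col="URL Rating",
                      inlinks_col="Unique Inlinks"):
    """Return (url, rating, inlinks) for authoritative pages with few internal links."""
    flagged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            url = row[url_col]
            rating = float(row[rating_col])
            inlinks = int(row[inlinks_col])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or non-numeric values
        if rating >= min_url_rating and inlinks <= max_inlinks:
            flagged.append((url, rating, inlinks))
    # Strongest pages first: they have the most authority to pass on.
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

Inverting the two comparisons finds the opposite case, low-authority pages receiving a large share of internal links.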

6. Common Problems & Solutions

  • API Rate Limits:
    • Problem: APIs (OpenAI, Ahrefs, PSI) return errors due to too many requests.
    • Solution: Increase Delay and decrease Batch Size in the respective API configuration settings. Monitor API dashboards for usage. Adjust RPM per provider under Advanced tab.
  • High AI API Costs:
    • Problem: Semantic analysis consumes a large number of tokens, leading to unexpected costs.
    • Solution: Be selective about which pages or content types you analyze. Use custom extraction to limit the text sent to the API. Consider smaller, more targeted crawls. Use the free Gemini tier (limited and slow) if available – source.
  • Inaccurate JavaScript Rendering:
    • Problem: Screaming Frog doesn't correctly render content on complex JavaScript sites.
    • Solution: Increase Rendering Timeout and AJAX Timeout. Try different Screen Size and User Agent settings. Use the Rendered Page tab to debug. Ensure all necessary resources (CSS, JS) are not blocked by robots.txt.
  • Slow Crawl Speed with New Features:
    • Problem: Enabling multiple API integrations and JavaScript rendering significantly slows down crawls.
    • Solution: Run separate crawls for different data sets (e.g., one for technical, one for AI, one for PSI). Adjust Threads in Configuration > Spider > Advanced (though increasing threads can exacerbate API rate limits). Prioritize which features are essential for a given audit.
  • Crawl Retention Deleting Important Data:
    • Problem: Crawl data is deleted before you've finished analyzing it.
    • Solution: Adjust the Retention Period in File > Settings > Crawl Retention to a longer duration. Manually save critical crawls (File > Save As) to prevent automatic deletion. You can also lock crawls from deletion – source.
  • MCP Issues:
    • Problem: MCP tool calls fail because of modal dialogs or database lock.
    • Solution: Close all dialogs before using MCP. In STDIO mode, ensure GUI is closed. Use Streamable HTTP mode to allow GUI and MCP to coexist (sequential use). Wait for crawl to reach idle state (poll progress) – source.
  • Token Exceeded Errors:
    • Problem: Long content pages exceed token limits when sent to AI API.
    • Solution: Limit page content length in custom endpoint settings. Consider using smaller models or truncating content – source.
  • File Size Limit Update (v23.3):
    • Problem: Googlebot’s limit is 2MB, not 15MB as previously communicated. New issues report HTML documents and resources over 2MB.
    • Solution: Update crawl configuration to check for oversized files. Use the new configurable limits – source.
  • Looker Studio Breaking Change (v23.0):
    • Problem: After PSI updates to Insight Audits, existing Looker Studio reports need column updates.
    • Solution: Re-map columns in Looker Studio data source. Screaming Frog provides in-app warning – source.

7. Metrics, Measurement & Analysis

The new features introduce or enhance several key metrics:

  • AI-Driven Metrics:
    • Semantic Similarity Score: Quantifies topical overlap between pages.
    • Content Cluster ID: Groups pages by dominant topic.
    • Semantic Search Relevance: Measures how well a page matches a conceptual query.
    • Analysis: Use similarity scores to identify cannibalization risks or consolidation opportunities. Cluster IDs help map content strategy and internal linking.
  • Performance Metrics (Enhanced):
    • PSI Score (Overall, Performance, Accessibility, Best Practices, SEO): Google's holistic performance assessment.
    • Core Web Vitals (LCP, INP, CLS): Key user experience metrics. INP replaced FID as a Core Web Vital in March 2024.
    • Document Request Latency: Time for initial HTML response.
    • LCP Request Discovery: Time to discover the Largest Contentful Paint element.
    • New Insight Audit Issues: 7 new issues as listed in 3.2.
    • Analysis: Correlate low PSI scores and poor CWV with Document Request Latency and LCP Request Discovery to pinpoint root causes. Prioritize fixes based on impact and effort.
  • Ahrefs Metrics (Integrated):
    • Domain Rating (DR), URL Rating (UR), Referring Domains, Backlinks, Organic Traffic, Keywords, Cost: Off-page authority and link profile.
    • Country-level metrics (v24.0): Filter by country and volume.
    • Analysis: Combine with on-page data (e.g., Word Count, H1) and traffic data (via GA/GSC API integration) to identify high-authority pages underperforming, or low-authority pages that need link building.
  • MCP Metrics:
    • Number of tools available: ~29
    • Reports available: 53+ (core, redirects, canonicals, etc.)
    • Usage Stats (v24.0): Help > Usage Stats shows time spent crawling; non-cumulative view added in v24.1 – source.

8. Tools, Resources & Documentation

9. Edge Cases, Exceptions & Special Scenarios

  • AI Analysis on Dynamic Content: Be cautious when performing semantic analysis on content that changes frequently (e.g., user-generated content, news feeds). The analysis reflects a snapshot in time.
  • Headless CMS & API-Driven Sites: JavaScript rendering and custom JavaScript become indispensable for crawling and extracting data from these modern architectures. Standard crawls will likely miss most content.
  • Large-Scale Crawls with APIs: For sites with millions of URLs, running all API integrations simultaneously can be prohibitively slow and expensive. Consider segmenting crawls or prioritizing specific API data for subsets of URLs. Use database storage with SSD and allocate sufficient RAM (16GB+ for 1M+ URLs) – source.
  • International SEO & AI: Ensure your chosen AI model supports the language of your website for accurate semantic analysis.
  • Log File Analyser & CDN Logs: When using Wildcard User-Agent Matching or Remove Parameters with CDN logs, ensure the log format is correctly configured, as CDNs often have unique log structures.
  • Free Version Limitations: Maximum 500 URLs per crawl; cannot save/open crawls; no configuration options, custom search/extraction, GA/GSC integration, JS rendering – source. The 500-URL limit includes images, CSS, JS, so small sites may exhaust it quickly – source.
  • MCP vs GUI Conflicts: In STDIO mode, GUI and MCP cannot both hold the database simultaneously. Close SF GUI before using STDIO MCP. Use Streamable HTTP mode for concurrent access (sequential exports only) – source.
  • File Size Limit Change: v23.3 updated the SEO Spider's file size checks to reflect a Googlebot limit of 2MB rather than 15MB. New issues flag oversized HTML documents and resources, and the limits are configurable – source.
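
The file size check described above can be reproduced on exported data. The sketch below is illustrative only and is not how the SEO Spider implements the issue: it assumes you already have a mapping of URL to size in bytes, it reads "2MB" as 2 MiB (an assumption, since the exact byte count is not stated here), and the URLs and sizes are made up.

```python
# Assumption: "2MB" interpreted as 2 MiB; adjust if your limit is defined differently.
GOOGLEBOT_LIMIT_BYTES = 2 * 1024 * 1024

def oversized(resources, limit=GOOGLEBOT_LIMIT_BYTES):
    """Return (url, size) pairs whose size in bytes exceeds the limit, largest first."""
    flagged = [(url, size) for url, size in resources.items() if size > limit]
    return sorted(flagged, key=lambda item: item[1], reverse=True)

# Hypothetical sizes in bytes, e.g. taken from a crawl export.
sizes = {
    "https://example.com/": 180_000,
    "https://example.com/catalogue": 2_600_000,
    "https://example.com/app.js": 3_100_000,
}

for url, size in oversized(sizes):
    print(f"{url}: {size / 1024 / 1024:.2f} MiB exceeds the limit")
```

Passing a different `limit` mirrors the configurable thresholds mentioned above.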

10. Deep-Dive FAQs

  • Q: Can I use the AI features without an OpenAI API key?
    • A: Not OpenAI specifically, but some provider must be configured. The AI-powered semantic features rely on a large language model, either a hosted API (OpenAI, Gemini, Anthropic) that needs an API key or a local model served through Ollama. With no provider configured, these features will not function.
  • Q: How do I manage the cost of AI API usage?
    • A: Start with smaller crawls. Use custom extraction to limit the amount of text sent to the API. Monitor your API provider's dashboard. Set strict crawl limits or use filters to target specific content for AI analysis. Consider using the free Gemini tier (though slow and limited) or local models via Ollama.
  • Q: My JavaScript-rendered pages look different in Screaming Frog than in my browser. Why?
    • A: This can happen due to several reasons:
      • Timeout: Screaming Frog's rendering timeout might be too short, preventing all JavaScript from executing.
      • Resource Blocking: JavaScript or CSS files might be blocked by robots.txt, preventing the page from rendering correctly.
      • User Agent Differences: Screaming Frog's default user agent might be served different content.
      • Browser Features: Screaming Frog's embedded browser might lack certain cutting-edge browser features or extensions.
      • Solution: Increase timeouts, check robots.txt, try different user agents, and use the Rendered Page tab for debugging.
  • Q: How can I ensure Document Request Latency and LCP Request Discovery are accurate?
    • A: Ensure your crawl environment (network, CPU) is stable and not overloaded. Perform crawls from a consistent location. These metrics are influenced by network conditions, so results can vary slightly between crawls.
  • Q: Is it safe to use "Auto-Deleting Crawls"? What if I need an old crawl?
    • A: It is safe if configured thoughtfully. If you need to retain specific crawls indefinitely, manually save them (File > Save As) before the auto-delete mechanism can affect them. You can also lock crawls from deletion. Consider a separate backup strategy for critical crawl data.
  • Q: Can I combine Ahrefs data with Google Search Console data in Screaming Frog?
    • A: Yes, you can integrate both Ahrefs and Google Search Console (GSC) via their respective API configurations in Screaming Frog. This allows you to pull in backlink data, organic search clicks, impressions, and positions alongside your on-page and technical crawl data. This is a powerful combination for identifying high-potential pages.
  • Q: What is the difference between the official MCP server and the community one?
    • A: The official MCP server (v24.0+) is Node.js-based, stateful, has ~29 tools including a script runner, and requires the Screaming Frog GUI (or headless with STDIO). The community MCP (GitHub) is Python-based, stateless, has 9 read-and-export tools only (safer for CI/CD), and works with v16+.
  • Q: What are the minimum system requirements for large crawls?
    • A: For up to 200k URLs, 8GB RAM is recommended. For 1M+ URLs, use 16GB RAM + SSD storage. Memory allocation in the tool should be set to at least 4GB for up to 2M URLs – source. Screaming Frog supports Windows, macOS, Ubuntu, and Fedora (including arm64 Linux since v24.0) – source.
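
To make the cost-management advice above concrete, a rough estimate can be worked out before a crawl. This is a back-of-the-envelope sketch under stated assumptions: the four-characters-per-token ratio is a common heuristic rather than any provider's tokenizer, the price is a hypothetical figure, and `max_chars` stands in for trimming text with custom extraction.

```python
import math

def estimate_cost(char_counts, price_per_million_tokens,
                  chars_per_token=4.0, max_chars=None):
    """Rough token count and cost for sending page text to a token-priced API.

    chars_per_token is a heuristic assumption, not a real tokenizer.
    max_chars optionally caps the text sent per page.
    """
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    total_tokens = 0
    for count in char_counts:
        sent = count if max_chars is None else min(count, max_chars)
        total_tokens += math.ceil(sent / chars_per_token)
    cost = total_tokens / 1_000_000 * price_per_million_tokens
    return total_tokens, cost

# Hypothetical page lengths (characters) and a hypothetical price per million tokens.
pages = [4000, 8000, 1200]
full_tokens, full_cost = estimate_cost(pages, price_per_million_tokens=0.50)
capped_tokens, capped_cost = estimate_cost(pages, price_per_million_tokens=0.50,
                                           max_chars=2000)
print(full_tokens, capped_tokens)
```

Comparing the capped and uncapped figures shows how much limiting the extracted text can save. Check your provider's current pricing and tokenizer before relying on the numbers.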

11. Related Concepts & Next Steps

  • Content Strategy & Information Architecture: AI-powered features directly inform these areas, helping to map content, identify gaps, and plan internal linking.
  • Core Web Vitals Optimization: The enhanced performance metrics are central to understanding and improving CWV, directly impacting user experience and search rankings.
  • Technical SEO Auditing: New rendering and redirect analysis tools are critical for comprehensive technical audits, especially for complex modern websites.
  • Log File Analysis: While some features are in the Log File Analyser, understanding how crawl data from the SEO Spider correlates with server log data provides a complete picture of search engine interaction.
  • Data Visualization: Exporting data from Screaming Frog and visualizing it in tools like Google Data Studio, Tableau, or Excel can unlock deeper insights, especially for content clusters and performance trends.
  • Automated Reporting: Use scheduled crawls with email notifications (v21.0+), Auto Compare (v24.0+), and MCP to create automated monitoring dashboards and reports.

Recent News & Updates (Version Release History)

The following is a comprehensive timeline of major features introduced between May 2024 and June 2026, based on official release notes and blog posts.

Version 20.0 (7 May 2024 – codename ‘cracker’)

  • Custom JavaScript Snippets
  • Mobile Usability auditing via Lighthouse
  • N‑grams Analysis
  • Aggregated Anchor Text
  • Local Lighthouse Integration
  • Carbon Footprint & Rating
  • Bug fixes through 20.4 (22 Oct 2024) – source
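
For readers unfamiliar with n-grams, the idea behind N-grams Analysis can be illustrated in a few lines. This is a generic sketch of n-gram counting, not the SEO Spider's implementation. The tokenisation rule (lowercase, split on anything other than letters, digits and apostrophes) is an assumption chosen for brevity.

```python
import re
from collections import Counter

def ngrams(text, n):
    """Return the list of contiguous n-word phrases in text."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Assumption: simple lowercase tokenisation; real tools may differ.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def top_ngrams(text, n, k=3):
    """Return the k most frequent n-grams with their counts."""
    return Counter(ngrams(text, n)).most_common(k)

copy = "Red running shoes for trail running. Trail running shoes for women."
for phrase, count in top_ngrams(copy, n=2):
    print(count, phrase)
```

Run across the text of many pages, counts like these show which phrases a site actually uses most often.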

Version 21.0 (12 November 2024 – codename ‘towbar’)

  • Direct AI API Integration: OpenAI, Gemini & Ollama
  • Accessibility tab (axe rule set)
  • Email Notifications
  • Custom Search Bulk Upload
  • Bug fixes through 21.4 (5 Feb 2025) – source

Version 22.0 (10 June 2025 – codename ‘knee-deep’)

  • Semantic Similarity Analysis using LLM embeddings (default threshold 0.95, adjustable down to 0.5)
  • Semantic Content Cluster Visualisation (2D clustering)
  • Semantic Search (right‑hand tab, cosine similarity)
  • AI improvements: Multiple Prompt Targets, run prompts for specific segments, reference URL details, custom OpenAI endpoint (DeepSeek, Microsoft Copilot, Grok), Anthropic (Claude) integration, Image & Text‑to‑Speech generation
  • Advanced Column Configurator
  • Custom Multi‑Export with presets; Looker Studio export from manual crawl
  • Export to Multiple Tabs in Single Sheet/Workbook (Google Sheets & Excel)
  • Download Multiple XML Sitemaps (list mode)
  • Download from Google Sheets (private sheets via Google Drive auth)
  • APIs Mode (Mode > APIs) – fetch API data without crawling
  • Moz API updated to v3 (link propensity, spam score, brand authority)
  • Majestic API – option to pull Trust Flow Topics
  • Bug fixes through 22.2 (2 Jul 2025) – source
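
The similarity threshold mentioned for v22.0 is easier to reason about with a worked example. The sketch below computes cosine similarity between embedding vectors and reports pairs at or above 0.95. It is illustrative only. The three-dimensional vectors are toy values, whereas real LLM embeddings have hundreds or thousands of dimensions. Treating the threshold as inclusive is an assumption.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

def similar_pairs(embeddings, threshold=0.95):
    """Return (url_a, url_b, score) for every pair scoring at or above threshold."""
    urls = sorted(embeddings)
    pairs = []
    for i, first in enumerate(urls):
        for second in urls[i + 1:]:
            score = cosine_similarity(embeddings[first], embeddings[second])
            if score >= threshold:  # assumption: inclusive threshold
                pairs.append((first, second, score))
    return pairs

# Toy embeddings, hypothetical values for illustration.
EMBEDDINGS = {
    "/shoes": [1.0, 0.0, 0.1],
    "/trainers": [0.9, 0.1, 0.1],
    "/contact": [0.0, 1.0, 0.0],
}

for first, second, score in similar_pairs(EMBEDDINGS):
    print(f"{first} ~ {second}: {score:.3f}")
```

Lowering `threshold` towards 0.5 widens the net from near-duplicates to loosely related pages, which is the trade-off the adjustable setting exposes.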

Version 23.0 (20 October 2025 – codename ‘Rush Hour’)

  • Lighthouse & PSI updated to Insight Audits (Lighthouse 13): 7 new issues, 11 removed/consolidated, 6 renamed (details in section 3.2)
  • Ahrefs v3 API Update – OAuth flow, works on any paid plan; metrics include backlinks, referring domains, URL rating, domain rating, organic traffic, keywords, cost
  • Auto‑Deleting Crawls (Crawl Retention) – default “Never”; lock crawls from deletion
  • Semantic Similarity Embedding Rules – define URL patterns for analysis
  • Display All Links in Visualisations (right‑click, also 3D)
  • Display Links in Semantic Content Cluster Diagram (right‑click; “Show Inlinks Within Cluster”)
  • Limit Crawl Total Per Subdomain (Config > Spider > Limits)
  • Improved Heading Counts (total number on page, not just 2)
  • Move Up/Down buttons for Custom Search, Extraction & JS ordering
  • Configurable Percent Encoding (uppercase by default)
  • Irish Language Spelling & Grammar support
  • Updated AI models: OpenAI → gpt‑5‑mini, Gemini → gemini‑2.5‑flash, Anthropic → claude‑sonnet‑4‑5
  • New exports: All Error Inlinks (Bulk Export > Response Codes), Redirects to Error report; Redirection (HTTP Refresh) filter
  • Bug fixes through 23.3 (18 Feb 2026) – source
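
The percent-encoding option is about the letter case of the hex digits in escapes such as `%c3%a9` versus `%C3%A9`. RFC 3986 treats the two as equivalent and recommends uppercase when normalising. The helper below is a hypothetical illustration of that normalisation, not the SEO Spider's code.

```python
import re

# Matches a percent sign followed by exactly two hex digits.
_PERCENT_ESCAPE = re.compile(r"%[0-9a-fA-F]{2}")

def normalise_percent_case(url, uppercase=True):
    """Rewrite the hex digits of percent-escapes in one consistent case."""
    def fix(match):
        escape = match.group(0)
        return escape.upper() if uppercase else escape.lower()
    return _PERCENT_ESCAPE.sub(fix, url)

print(normalise_percent_case("https://example.com/caf%c3%a9?q=a%2fb"))
```

Only the escapes change; the rest of the URL, including path case, is left untouched. This matters when comparing crawl data against logs or other tools that encode with a different case.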

Version 23.3 (18 Feb 2026) – Notable changes

  • Updated file size limits: Googlebot’s limit is 2MB (not 15MB). New issues: ‘HTML Document Over 2MB’, ‘Resource Over 2MB’ (configurable limits)
  • Replaced deprecated ‘Gemini text‑embedding‑004’ with ‘gemini‑embedding‑001’
  • New bulk export: ‘Non‑Indexable Page Inlinks Only’ under Bulk Export > Links
  • Java 21.0.10; Log4j 2.25.3 – source

Version 24.0 (19 May 2026 – codename ‘bolus’)

  • Screaming Frog SEO Spider MCP (Model Context Protocol) – official natural language control via Claude (Streamable HTTP or STDIO mode)
  • Auto Compare Crawls – automatic comparison of scheduled crawls
  • View Crawl Changes in Email Notifications
  • Send Crawl Export Attachments by Email
  • Find Uncrawlable Links – new issue: Pages with Uncrawlable Internal Outlinks
  • Usage Stats – time spent crawling
  • Arm64 Linux Versions (Ubuntu & Fedora aarch64)
  • Improved Reporting of Syntactically Invalid Links (e.g., hppts://example.com)
  • Skip Empty Reports; Ahrefs Country Level Metrics
  • Filterable Content Clusters; Live Model Validation in AI Integrations
  • System Wide Prompt for AI; AI Provider Token Usage display
  • Ollama Image Generation & Request Timeout
  • Renamed Looker Studio back to Data Studio
  • Java 25 – source
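
A typo such as `hppts://example.com` is hard to spot by eye because it still looks like a URL. As a rough illustration of the kind of check involved (not the SEO Spider's actual logic), the sketch below uses Python's `urllib.parse.urlsplit` and flags any link whose scheme is not on an allow-list. The allow-list itself is an assumption and should be adapted to the schemes your site legitimately uses.

```python
from urllib.parse import urlsplit

# Assumption: schemes considered legitimate for this site's links.
KNOWN_SCHEMES = {"http", "https", "mailto", "tel", "ftp"}

def suspicious_scheme(href):
    """Return the scheme if the link has one outside KNOWN_SCHEMES, else None."""
    scheme = urlsplit(href).scheme
    if scheme and scheme not in KNOWN_SCHEMES:
        return scheme
    return None

links = ["https://example.com/", "hppts://example.com", "/about", "mailto:hi@example.com"]
for href in links:
    bad = suspicious_scheme(href)
    if bad:
        print(f"{href}: unexpected scheme '{bad}'")
```

Relative links have no scheme and pass through untouched. Anything flagged should be reviewed by hand, since unusual but valid schemes will also be reported.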

Version 24.1 (8 June 2026)

  • Minor updates and bug fixes:
    • Added recently crawled URLs to the scheduled crawl seed textbox
    • Non‑cumulative view for Usage Stats
    • “Auto‑start MCP Server on application launch” setting
    • Ability to Bulk Export all multi‑file exports via MCP
    • Download MCP API as markdown
    • Increased GSC read timeout to 2 minutes (from 20 seconds)
    • MCP Server progress now includes APIs and Crawl Analysis
    • Fixed URL Inspection not working in API mode
    • Fixed MCP sf_url_content “Tool error: Page content size too large”
    • Fixed MCP tools in non‑English languages
    • Fixed MCP logging to stdout in CLI mode
    • Various crash fixes – source

12. Appendix: Reference Information

  • Glossary:
    • LCP: Largest Contentful Paint – A Core Web Vital measuring loading performance.
    • CLS: Cumulative Layout Shift – A Core Web Vital measuring visual stability.
    • FID/INP: First Input Delay / Interaction to Next Paint – Metrics measuring interactivity. INP replaced FID as the Core Web Vital in March 2024.
    • SPA: Single Page Application – A web application model that loads a single HTML page and dynamically updates that page as the user interacts with the app.
    • API: Application Programming Interface – A set of rules that allows different software applications to communicate with each other.
    • Semantic Search: Search that understands the meaning and context of queries, not just keywords.
    • N-gram: A contiguous sequence of n items from a given sample of text or speech.
    • MCP (Model Context Protocol): An open protocol that enables LLMs to interact with external tools. Screaming Frog’s implementation is a Node.js server exposing crawl data and controls.
    • Insight Audits: The upgraded version of Lighthouse/PSI audits (Lighthouse 13) introduced in v23.0, providing more actionable recommendations.
    • OAuth: Authentication flow used by Ahrefs v3 API (v23.0+) and Google integrations.
    • Cosine Similarity: The metric used in Semantic Search (v22.0+) to rank pages by relevance to a query, based on vector embeddings.
    • Embeddings: Numerical vector representations of text generated by LLMs; enable semantic similarity comparison.
  • Checklist for Implementation:
    • Update Screaming Frog to the latest version (24.1+).
    • Obtain and configure necessary API keys (OpenAI, Ahrefs, PSI).
    • Review Configuration > Spider > Rendering settings for JavaScript-heavy sites.
    • Understand the cost implications of AI API usage.
    • Plan targeted crawls for specific feature testing.
    • Create a data analysis workflow for new metrics.
    • Regularly check Screaming Frog's blog and release notes for further updates.
    • Set up MCP server for natural language queries (v24.0+).
    • Configure crawl retention policies if using auto-deletions.
    • For large sites, use database storage on SSD and allocate sufficient RAM (16GB+ for >1M URLs).

13. Knowledge Completeness Checklist

  • Total unique knowledge points: 100+
  • Sources consulted: 30+ (based on provided research findings)
  • Edge cases documented: 8+
  • Practical examples included: 15+
  • Tools/resources listed: 10+
  • Common questions answered: 8+
  • Missing information identified: Specific details on N-gram analysis as a distinct new feature (though included in v20.0, its advanced applications are not fully covered). Exact release dates for all minor bug fix versions omitted for brevity.

What's new (2026-06-19)

Originally published in the EcomExperts SEO library.
