Actively Maintained v1.0.7

Panth Robots SEO

mage2kishan/module-robots-seo

Unifies robots control for Magento 2 — takes over /robots.txt at the router layer, emits X-Robots-Tag headers on frontend HTML pages, adds per-user-agent allow/disallow rules via an admin grid, and toggles fourteen modern AI/LLM crawlers with one click, all behind a CRLF-safe directive validator. Works on Hyva and Luma.

Downloads: 19 (below average)
GitHub Stars: 0
Last Release: 2d ago
Open Issues: 0
Build Issues: 0/3 checks passed

Build Tests

Composer Install
DI Compile
Templates

Code Quality

CS Coding Standard
PHPStan

Tested on Magento 2.4.9

Recent Test History

Each release is tested against the latest Magento version at that time.

v1.0.7 on Magento 2.4.9 (Jun 8, 2026)
v1.0.6 on Magento 2.4.9 (Jun 7, 2026)
v1.0.1 on Magento 2.4.9 (Jun 1, 2026)

Looking for Contributors

Composer installation fails. Your contribution could help the entire Magento community!


README

Loaded from GitHub

Panth Robots SEO — Dedicated robots.txt, X-Robots-Tag & LLM Bot Policy for Magento 2 (Hyva + Luma)

Complete robots and crawler-policy control for Magento 2. One module takes over /robots.txt at the router layer, emits an X-Robots-Tag HTTP response header on every frontend HTML page, adds per-user-agent allow/disallow rows via an admin grid, and toggles fourteen modern LLM / AI crawlers (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Bytespider, CCBot, Applebot-Extended, Meta-ExternalAgent, Amazonbot, Cohere-AI, and more) with a single click. Every directive passes a CRLF-safe validator before it ever reaches the wire. Works identically on Hyva and Luma.

Magento's native robots handling is three things that no longer add up: a static robots.txt file on disk, a single admin textarea buried under Content → Design → Configuration that overwrites it, and no X-Robots-Tag header control whatsoever. There is also no UI for the new generation of AI crawlers — GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Bytespider — so stores either open their data to every model trainer by default or hand-edit the file on every deploy. Panth Robots SEO unifies all three layers (robots.txt body, robots meta, X-Robots-Tag header) into one coherent admin surface with a dedicated controller, a declarative schema-backed policy grid, and a directive validator that makes CRLF header injection structurally impossible.






Preview

Live walkthrough

End-to-end admin flow: enable the module, toggle a few LLM bots, add a policy row, preview the generated robots.txt, curl /robots.txt on both Hyva and Luma, and confirm the X-Robots-Tag header on a customer-account page.

Admin

Global configuration — toggle the module, pick the default <meta name="robots"> value, configure layered-nav and catalogsearch noindex, edit the noindex path list, set max-image-preview / max-snippet / Crawl-delay.


Robots Policies grid — one row per (user-agent, path, directive, store_id) tuple. Filter by store, mass-enable / disable / delete, inline priority column so the evaluator knows which rule wins when two patterns overlap.


Edit form (policy row): pick a user-agent (* for the default block, or GPTBot, ClaudeBot, a custom UA, etc.), pick allow / disallow, enter a path, scope to a store view, and set the priority and active flag. The UA and path fields are validated before save.


robots.txt preview — dedicated Panth Infotech → Robots & LLM Bots → robots.txt Preview page renders the live body exactly as the frontend will serve it, with a store-switcher so you can verify each store view before rolling to production.



Features

| Feature | Description |
| --- | --- |
| Dynamic /robots.txt per store view | Built on the fly from LLM-bot toggles (emitted as User-agent: <bot>\nDisallow: / blocks when disabled), a User-agent: * block with admin policy rows and Crawl-delay, then Sitemap: and Host: lines. No static file ever leaves disk. |
| 14 LLM / AI crawler toggles | One-click allow/disallow for GPTBot (GPTBot), ChatGPT-User (ChatGPT-User), OAI-SearchBot (OAI-SearchBot), ClaudeBot (maps both ClaudeBot and Claude-Web), Anthropic-AI (anthropic-ai), Google-Extended (maps both Google-Extended and GoogleOther), PerplexityBot (PerplexityBot), Cohere-AI (cohere-ai), CCBot (CCBot), Bytespider (Bytespider), Amazonbot (Amazonbot), Applebot-Extended (Applebot-Extended), FacebookBot (FacebookBot), Meta-ExternalAgent (meta-externalagent). |
| X-Robots-Tag response header | Added to every frontend HTML response by Plugin\Response\XRobotsTagPlugin with max-image-preview:<large\|standard\|none> and max-snippet:<int> appended to the chosen directive. Handled before Response::sendResponse() so the header is always present. |
| Noindex path matcher | Service\NoindexPathMatcher walks an admin-editable list of path patterns (* wildcards supported). Defaults cover /customer/*, /checkout*, /wishlist*, /sales/*, /contact*, /catalogsearch/*, /multishipping/*, /newsletter/manage*, /review/customer/*, /captcha*, /sendfriend/*, /paypal/*, /downloadable/customer/*, /vault/*, /giftcard/customer/*, /rewards/*, /oauth/*, /connect/*. |
| Layered-nav / sort-filter noindex | When a catalog listing has any ?p=, ?dir=, ?order=, ?limit=, or layered-nav attribute query parameter, the header flips to noindex, follow so filtered permutations don't dilute the canonical listing. |
| Catalogsearch noindex | /catalogsearch/result/* pages emit noindex, follow by default, because search results are inherently ephemeral and shouldn't be indexed. |
| HTTP-status-aware override | 404, 410, 500 and 503 responses hard-override the header to noindex, nofollow regardless of config, so error pages can never leak into the index. |
| Non-HTML asset noindex | Requests ending in .pdf, .doc, .docx, .xls, .xlsx emit noindex, nofollow, which stops support docs and spec sheets from displacing the canonical product page in the SERP. |
| robots.txt custom-body override | robots_txt/override_enabled = 1 pastes robots_txt/custom_body verbatim into the response and skips the entire generation pipeline. CRLF is normalised to LF on write. |
| Admin CRUD grid | panth_seo_robots_policy table with a full UI-component grid: per-UA, per-path, per-store-view allow/disallow rows with priority and active flag. Dedicated robots.txt Preview admin page renders the live output. |
| CRLF-injection-safe | Every directive string passes Service\DirectiveValidator (printable-ASCII whitelist, rejects \r, \n, \0). Every path and UA is validated against a whitelist regex before the DB write. |

How It Works

Seven cooperating pieces:

  1. Controller\Robots\Index at route seo_robots/robots/index serves GET /robots.txt with the generated or override body, Content-Type: text/plain; charset=utf-8.
  2. Setup\Patch\Data\InstallRobotsTxtRewrite writes the url_rewrite row that maps /robots.txt to the module controller at install time; RefreshRobotsTxtRewrite re-points an existing stale target_path row left behind by Panth_AdvancedSEO so upgrades are a no-op.
  3. etc/frontend/di.xml disables the core Magento\Framework\App\RouterList entry for robots — Magento's built-in robots router no longer intercepts /robots.txt before the url_rewrite layer, so our controller wins.
  4. Plugin\Response\XRobotsTagPlugin is a beforeSendResponse plugin on Magento\Framework\App\Response\Http. It inspects the request path, status code, and rendered Content-Type, then sets X-Robots-Tag once per response.
  5. Model\Robots\PolicyResolver aggregates panth_robots_seo/llm_bots/* toggles + rows from panth_seo_robots_policy + the configured Crawl-delay + Sitemap: references into the final robots.txt body for a given store.
  6. Model\Robots\MetaResolver computes the per-entity robots meta string — used by the plugin and (when Panth_AdvancedSEO is present) by the shared panth_seo_resolved.robots cache column.
  7. Service\NoindexPathMatcher + Service\DirectiveValidator — the first decides whether a given request path is "private"; the second is the single chokepoint every directive string passes through before it hits a response header or the robots.txt body.

Supported LLM Bots

Per-bot allow/disallow lives at Stores → Configuration → Panth Infotech → Robots & LLM Bots → LLM Bot Policy. Turning a toggle to No emits User-agent: <bot>\nDisallow: / in the generated robots.txt; turning it to Yes omits the block entirely (equivalent to allow).

| Bot | UA string(s) | Default | Config path |
| --- | --- | --- | --- |
| GPTBot (OpenAI) | GPTBot | Yes | panth_robots_seo/llm_bots/gptbot |
| ChatGPT-User | ChatGPT-User | Yes | panth_robots_seo/llm_bots/chatgpt_user |
| OAI-SearchBot | OAI-SearchBot | Yes | panth_robots_seo/llm_bots/oai_searchbot |
| ClaudeBot (Anthropic) | ClaudeBot, Claude-Web | Yes | panth_robots_seo/llm_bots/claudebot |
| Anthropic-AI | anthropic-ai | Yes | panth_robots_seo/llm_bots/anthropic_ai |
| Google-Extended | Google-Extended, GoogleOther | Yes | panth_robots_seo/llm_bots/google_extended |
| PerplexityBot | PerplexityBot | Yes | panth_robots_seo/llm_bots/perplexitybot |
| Cohere-AI | cohere-ai | Yes | panth_robots_seo/llm_bots/cohere_ai |
| CCBot (Common Crawl) | CCBot | No | panth_robots_seo/llm_bots/ccbot |
| Bytespider (ByteDance) | Bytespider | No | panth_robots_seo/llm_bots/bytespider |
| Amazonbot | Amazonbot | Yes | panth_robots_seo/llm_bots/amazonbot |
| Applebot-Extended | Applebot-Extended | Yes | panth_robots_seo/llm_bots/applebot_extended |
| FacebookBot | FacebookBot | Yes | panth_robots_seo/llm_bots/facebookbot |
| Meta-ExternalAgent | meta-externalagent | Yes | panth_robots_seo/llm_bots/meta_externalagent |
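
The toggle-to-block behaviour described above can be sketched as follows. This is an illustrative Python sketch, not the module's PHP code: the `BOT_USER_AGENTS` dictionary and the `llm_bot_blocks` function are hypothetical names, only a few rows of the table are included, and emitting one block per UA string (rather than one combined block per toggle) is an assumption.

```python
# Hypothetical mapping from config-key suffix to the UA strings it controls.
# Values are copied from the table above; only a subset is shown.
BOT_USER_AGENTS = {
    "gptbot": ["GPTBot"],
    "claudebot": ["ClaudeBot", "Claude-Web"],
    "google_extended": ["Google-Extended", "GoogleOther"],
    "ccbot": ["CCBot"],
    "bytespider": ["Bytespider"],
}


def llm_bot_blocks(toggles: dict[str, bool]) -> str:
    """Return one 'User-agent / Disallow: /' block per UA whose toggle is No."""
    blocks = []
    for key, user_agents in BOT_USER_AGENTS.items():
        # Yes (True) omits the block entirely, which is equivalent to allow.
        # Treating a missing key as Yes is an assumption of this sketch.
        if toggles.get(key, True):
            continue
        for ua in user_agents:
            blocks.append(f"User-agent: {ua}\nDisallow: /")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    # Mirrors the documented defaults: CCBot and Bytespider are blocked.
    print(llm_bot_blocks({"ccbot": False, "bytespider": False}))
```

Keeping the UA list per toggle in one place makes it easy to see that a single switch such as claudebot covers more than one crawler name.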

Always allowed (no dedicated toggle)

The following bots are not blocked by default and have no dedicated config key. If you need to block them, add a Disallow: / row to the Robots Policies grid with the UA as the user-agent:

  • YouBot — You.com's search crawler
  • PetalBot — Huawei / Petal Search crawler
  • Diffbot — knowledge-graph crawler
  • AI2Bot — Allen Institute research crawler
  • Omgilibot — Webz.io crawler
  • Timpibot — Timpi decentralised search crawler

Compatibility

| Requirement | Supported |
| --- | --- |
| Magento Open Source | 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8 |
| Adobe Commerce | 2.4.4 to 2.4.8 |
| PHP | 8.1, 8.2, 8.3, 8.4 |
| Hyva Theme | 1.0+ (fully compatible) |
| Luma Theme | Native support |
| Panth Core | ^1.0 (installed automatically) |

Installation

composer require mage2kishan/module-robots-seo
bin/magento module:enable Panth_Core Panth_RobotsSeo
bin/magento setup:upgrade
bin/magento setup:di:compile
bin/magento cache:flush

Verify

bin/magento module:status Panth_RobotsSeo
# Module is enabled

curl -s -o /dev/null -w '%{http_code}\n' https://your-store.test/robots.txt
# 200

curl -sI https://your-store.test/customer/account/login | grep -i x-robots-tag
# X-Robots-Tag: noindex, nofollow, max-image-preview:large, max-snippet:-1

Visit Admin → Panth Infotech → Robots & LLM Bots → Robots Policies to see the seeded policy grid.


Configuration

Navigate to Stores → Configuration → Panth Infotech → Robots & LLM Bots.

General

| Setting | Path | Default | What it controls |
| --- | --- | --- | --- |
| Enable Module | panth_robots_seo/general/enabled | Yes | Master switch. When No, the X-Robots-Tag plugin is a no-op and /robots.txt serves a stock User-agent: *\nAllow: /. |
| Debug Logging | panth_robots_seo/general/debug | No | When Yes, every header and meta decision is written to var/log/panth_robots_seo.log. |
| Default Meta Robots | panth_robots_seo/general/default_directive | index,follow | Baseline directive applied when no per-entity / per-path override fires. Allowed tokens: index, noindex, follow, nofollow, noarchive, nosnippet, noimageindex, max-snippet, max-image-preview, max-video-preview, unavailable_after, none, all. |
| Noindex Layered-Nav Filtered Pages | panth_robots_seo/general/noindex_filtered | Yes | Emit noindex, follow when a catalog listing has layered-nav or sort/limit/page query parameters. |
| Noindex Search Result Pages | panth_robots_seo/general/noindex_search_results | Yes | Emit noindex, follow on /catalogsearch/result/*. |
| Noindex URL Paths | panth_robots_seo/general/noindex_paths | (18-line seeded list, see above) | One-path-per-line list of private patterns; * matches anything. Matched by Service\NoindexPathMatcher. |
| max-image-preview Directive | panth_robots_seo/general/max_image_preview | large | Appended to every X-Robots-Tag. large is recommended for Google Discover eligibility. |
| max-snippet Directive | panth_robots_seo/general/max_snippet | -1 | -1 = unlimited. A positive integer caps SERP snippet length. |
| Crawl-delay (seconds) | panth_robots_seo/general/crawl_delay | 0 | Emitted under User-agent: * in robots.txt. 0 omits the directive. |
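
The Noindex URL Paths setting is a one-pattern-per-line list in which * matches anything. A minimal Python sketch of that kind of matching is shown here for illustration; it is not the code of Service\NoindexPathMatcher, the function names are hypothetical, and matching the whole path (rather than a prefix or substring) is an assumption.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a path pattern into a regex where * matches any run of characters."""
    body = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    return re.compile(body)


def is_noindex_path(path: str, patterns_text: str) -> bool:
    """Return True if the request path matches any non-empty pattern line."""
    for line in patterns_text.splitlines():
        pattern = line.strip()
        if not pattern:
            continue  # skip blank lines in the admin textarea
        # fullmatch: the pattern must cover the entire path (sketch assumption).
        if pattern_to_regex(pattern).fullmatch(path):
            return True
    return False


if __name__ == "__main__":
    seeded = "/customer/*\n/checkout*\n/catalogsearch/*"
    for candidate in ("/customer/account/login", "/checkout", "/women/tops.html"):
        print(candidate, is_noindex_path(candidate, seeded))
```

Under this reading, /checkout* covers both /checkout and /checkout/cart, while /customer/* requires something after the trailing slash.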

LLM Bot Policy

| Setting | Path | Default | What it controls |
| --- | --- | --- | --- |
| Allow GPTBot (OpenAI) | panth_robots_seo/llm_bots/gptbot | Yes | No = emits User-agent: GPTBot\nDisallow: /. |
| Allow ClaudeBot (Anthropic) | panth_robots_seo/llm_bots/claudebot | Yes | Covers both ClaudeBot and Claude-Web. |
| Allow Google-Extended | panth_robots_seo/llm_bots/google_extended | Yes | Covers both Google-Extended and GoogleOther. |
| Allow CCBot (Common Crawl) | panth_robots_seo/llm_bots/ccbot | No | CCBot feeds dataset-scale training pipelines; blocked by default. |
| Allow PerplexityBot | panth_robots_seo/llm_bots/perplexitybot | Yes | |
| Allow Bytespider (ByteDance) | panth_robots_seo/llm_bots/bytespider | No | Bytespider ignores partial disallows; blocked by default. |
| Allow ChatGPT-User | panth_robots_seo/llm_bots/chatgpt_user | Yes | |
| Allow OAI-SearchBot | panth_robots_seo/llm_bots/oai_searchbot | Yes | |
| Allow Anthropic-AI | panth_robots_seo/llm_bots/anthropic_ai | Yes | |
| Allow Cohere-AI | panth_robots_seo/llm_bots/cohere_ai | Yes | |
| Allow Amazonbot | panth_robots_seo/llm_bots/amazonbot | Yes | |
| Allow Applebot-Extended | panth_robots_seo/llm_bots/applebot_extended | Yes | |
| Allow FacebookBot | panth_robots_seo/llm_bots/facebookbot | Yes | |
| Allow Meta-ExternalAgent | panth_robots_seo/llm_bots/meta_externalagent | Yes | |

robots.txt Override

| Setting | Path | Default | What it controls |
| --- | --- | --- | --- |
| Use Custom robots.txt Body | panth_robots_seo/robots_txt/override_enabled | No | When Yes, the custom body below REPLACES the generated output: every LLM toggle and policy row is ignored. |
| Custom robots.txt Body | panth_robots_seo/robots_txt/custom_body | (empty) | Pasted verbatim into the response. CRLF is normalised to LF to prevent HTTP header smuggling. Leave empty to use the generated output. |

Every setting resolves at store-view scope, so each store can have a different LLM policy, noindex path list, or override body.
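
The override rules above amount to a small decision: use the custom body only when the flag is on and the body is non-empty, and normalise line endings before serving. A hedged Python sketch of that decision, with a hypothetical function name and the extra assumption that a lone \r is also converted to \n:

```python
def robots_txt_body(override_enabled: bool, custom_body: str, generated_body: str) -> str:
    """Pick the body to serve, following the override rules in the table above."""
    if override_enabled and custom_body.strip():
        # CRLF becomes LF as documented; handling a bare CR the same way
        # is an assumption of this sketch.
        return custom_body.replace("\r\n", "\n").replace("\r", "\n")
    # Override off, or custom body left empty: fall back to the generated output.
    return generated_body


if __name__ == "__main__":
    pasted = "User-agent: *\r\nDisallow: /private/\r\n"
    print(repr(robots_txt_body(True, pasted, "User-agent: *\nAllow: /\n")))
```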


Managing Robots Policies

Open Admin → Panth Infotech → Robots & LLM Bots → Robots Policies to reach the grid (route panth_robots_seo/policy/index).

Fields

| Field | Description |
| --- | --- |
| User-agent | The UA string to match: * for the default block, GPTBot, ClaudeBot, Applebot-Extended, a custom crawler, etc. Validated against /^[A-Za-z0-9._\-+*\/ ]+$/ on save. |
| Directive | allow or disallow. Single source of truth consumed by PolicyResolver. |
| Path | The path fragment the directive applies to. Must start with /, no control bytes. * wildcards allowed. |
| Store View | 0 applies to all stores; a non-zero value scopes the row to one store view. Foreign-keyed to store.store_id with ON DELETE CASCADE. |
| Priority | Lower numbers are emitted first within the same user-agent block. |
| Active | Per-row enable/disable. Inactive rows are never rendered. |
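
The save-time checks on the User-agent and Path fields can be illustrated with a short Python sketch. The character class is taken from the regex quoted above; the function names are hypothetical, and defining "control bytes" as code points below 0x20 plus 0x7F is an assumption.

```python
import re

# Same character class as the documented regex /^[A-Za-z0-9._\-+*\/ ]+$/.
UA_PATTERN = re.compile(r"[A-Za-z0-9._\-+*/ ]+")


def is_valid_user_agent(ua: str) -> bool:
    """Accept only non-empty strings made of whitelisted characters."""
    # fullmatch avoids the Python quirk where $ also matches before a trailing newline.
    return UA_PATTERN.fullmatch(ua) is not None


def is_valid_path(path: str) -> bool:
    """A path must start with / and contain no control characters."""
    if not path.startswith("/"):
        return False
    return not any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in path)


if __name__ == "__main__":
    print(is_valid_user_agent("Applebot-Extended"), is_valid_user_agent("Bad\nBot"))
    print(is_valid_path("/checkout/*"), is_valid_path("checkout"))
```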

Mass actions

Select rows and choose Enable, Disable or Delete from the grid mass-action menu.

robots.txt Preview

The Panth Infotech → Robots & LLM Bots → robots.txt Preview sub-menu (route panth_robots_seo/robots/index) renders the live body for the currently selected store, exactly as the frontend controller would serve it — helpful for dry-running changes before they go public.


robots.txt Endpoint

  • URL: GET /robots.txt
  • Content-Type: text/plain; charset=utf-8
  • Controller: Panth\RobotsSeo\Controller\Robots\Index at route seo_robots/robots/index.

/robots.txt is served by our controller via a url_rewrite row installed by Setup\Patch\Data\InstallRobotsTxtRewrite. The core Magento_Robots router is disabled via etc/frontend/di.xml so it never intercepts the request ahead of the url_rewrite layer.

If you are upgrading from Panth_AdvancedSEO where /robots.txt was already mapped to that module's controller, the RefreshRobotsTxtRewrite patch runs on the next setup:upgrade and rewrites the stale target_path to point at the new controller — zero manual DB surgery required.

Generated body shape (shown with crawl_delay set to 10 for illustration; the default of 0 omits the Crawl-delay line)

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /customer/
Disallow: /checkout/
Allow: /

Sitemap: https://your-store.test/sitemap.xml
Host: your-store.test
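
The shape above can be reproduced with a small assembly routine. The Python sketch below is illustrative only and is not the module's Model\Robots\PolicyResolver: the `PolicyRow` class and both functions are hypothetical, only rows for the * user-agent are handled, and the placement of Crawl-delay before the policy rows simply follows the sample body.

```python
from dataclasses import dataclass


@dataclass
class PolicyRow:
    """Hypothetical stand-in for one row of the Robots Policies grid."""
    user_agent: str
    directive: str  # "allow" or "disallow"
    path: str
    priority: int = 0
    active: bool = True


def default_block(rows: list[PolicyRow], crawl_delay: int = 0) -> str:
    """Build the User-agent: * block from active rows, lowest priority number first."""
    lines = ["User-agent: *"]
    if crawl_delay > 0:  # 0 omits the directive, as documented
        lines.append(f"Crawl-delay: {crawl_delay}")
    active = [r for r in rows if r.active and r.user_agent == "*"]
    for row in sorted(active, key=lambda r: r.priority):
        lines.append(f"{row.directive.capitalize()}: {row.path}")
    return "\n".join(lines)


def robots_txt(bot_blocks: list[str], rows: list[PolicyRow], crawl_delay: int,
               sitemap_url: str, host: str) -> str:
    """Join bot blocks, the default block, and the Sitemap/Host lines."""
    parts = [*bot_blocks, default_block(rows, crawl_delay)]
    parts.append(f"Sitemap: {sitemap_url}\nHost: {host}")
    return "\n\n".join(parts) + "\n"


if __name__ == "__main__":
    rows = [
        PolicyRow("*", "allow", "/", priority=30),
        PolicyRow("*", "disallow", "/customer/", priority=10),
        PolicyRow("*", "disallow", "/checkout/", priority=20),
    ]
    print(robots_txt(["User-agent: CCBot\nDisallow: /"], rows, 0,
                     "https://your-store.test/sitemap.xml", "your-store.test"))
```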

X-Robots-Tag Header

Plugin\Response\XRobotsTagPlugin runs beforeSendResponse on Magento\Framework\App\Response\Http and applies the following order of precedence:

  1. Self-skip on /robots.txt: never sets a header on the robots.txt response itself.
  2. Error-code override: 404, 410, 500, 503 → hard noindex, nofollow, no further checks.
  3. Non-HTML asset override: .pdf, .doc, .docx, .xls, .xlsx → hard noindex, nofollow.
  4. Catalogsearch noindex: /catalogsearch/result/* when noindex_search_results = Yes → noindex, follow.
  5. Configured noindex_paths match: Service\NoindexPathMatcher → noindex, nofollow (and the matching meta).
  6. Layered-nav / sort-filter: when a listing page has query parameters → noindex, follow.
  7. Default directive: panth_robots_seo/general/default_directive (e.g. index, follow).

Whenever a header is set, , max-image-preview:<value> and , max-snippet:<int> from general config are appended to the chosen directive, and the resulting string is passed through Service\DirectiveValidator before being set on the response.
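
The order of precedence can be expressed as a single chain of checks. The following Python sketch is an illustration under stated assumptions, not the plugin's PHP code: the function name, its parameters, and the `is_private_path` / `is_listing` flags (standing in for the path matcher and page-type detection) are hypothetical, and any query parameter on a listing page is treated as a filter.

```python
ERROR_STATUSES = {404, 410, 500, 503}
ASSET_EXTENSIONS = (".pdf", ".doc", ".docx", ".xls", ".xlsx")


def x_robots_tag(path, status, query_params, *, is_private_path=False,
                 is_listing=False, noindex_search=True, noindex_filtered=True,
                 default="index, follow", max_image_preview="large", max_snippet=-1):
    """Return the X-Robots-Tag value for a response, or None when no header is set."""
    if path == "/robots.txt":
        return None  # step 1: self-skip
    if status in ERROR_STATUSES:
        base = "noindex, nofollow"  # step 2: error-code override
    elif path.lower().endswith(ASSET_EXTENSIONS):
        base = "noindex, nofollow"  # step 3: non-HTML asset override
    elif noindex_search and path.startswith("/catalogsearch/result"):
        base = "noindex, follow"  # step 4: search results
    elif is_private_path:
        base = "noindex, nofollow"  # step 5: configured noindex_paths match
    elif noindex_filtered and is_listing and query_params:
        base = "noindex, follow"  # step 6: filtered / sorted listing
    else:
        base = default  # step 7: default directive
    return f"{base}, max-image-preview:{max_image_preview}, max-snippet:{max_snippet}"


if __name__ == "__main__":
    print(x_robots_tag("/customer/account/login", 200, set(), is_private_path=True))
    print(x_robots_tag("/women/tops.html", 200, {"p"}, is_listing=True))
```

Because the checks are evaluated top to bottom, an error status wins even on a path that would otherwise be indexable.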


Security

  • ACL + FormKey on every admin controller. Every Adminhtml controller extends Panth\RobotsSeo\Controller\Adminhtml\AbstractAction, declares its own ADMIN_RESOURCE constant (Panth_RobotsSeo::policies, Panth_RobotsSeo::preview), and enforces ACL via _isAllowed(). No route is reachable without a valid admin session.
  • HttpPostActionInterface on mutating paths. Save, Delete, MassDelete, MassStatus all implement HttpPostActionInterface so GET is rejected at the framework level. Form-key validation runs on every POST.
  • DirectiveValidator whitelist + control-byte rejection. Every directive string written to X-Robots-Tag or the robots.txt body passes through Service\DirectiveValidator::assertSafe() — rejects any string containing \r, \n, \0, or bytes outside printable-ASCII. CRLF header injection is structurally impossible.
  • CRLF normalisation in custom body. robots_txt/custom_body has \r\n → \n normalisation applied on render so a pasted Windows-style newline can't smuggle a second response header.
  • Per-store scope on every config value. enabled, noindex_paths, every llm_bots/* toggle, and the custom body resolve at ScopeInterface::SCOPE_STORES — a store-specific value never leaks into another store.
  • UA + path validation on save. Admin policy rows reject user-agents outside /^[A-Za-z0-9._\-+*\/ ]+$/ and paths that do not start with / or contain control bytes, before the row is written.
  • XSS-safe admin preview. The robots.txt Preview page renders the body through escapeHtml() and wraps it in <pre> tags, so a hostile custom body can never execute script on an admin browser.
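
The control-byte rejection described for Service\DirectiveValidator can be sketched in a few lines. This Python version is illustrative only: `assert_safe` and `UnsafeDirectiveError` are hypothetical names, and "printable ASCII" is taken to mean the range 0x20 to 0x7E.

```python
class UnsafeDirectiveError(ValueError):
    """Raised when a directive contains a character outside printable ASCII."""


def assert_safe(directive: str) -> str:
    """Return the directive unchanged, or raise if it could break out of a header line."""
    for ch in directive:
        # The range 0x20..0x7E excludes \r, \n, \0 and every non-ASCII character.
        if not (0x20 <= ord(ch) <= 0x7E):
            raise UnsafeDirectiveError(f"illegal character {ch!r} in directive")
    return directive


if __name__ == "__main__":
    print(assert_safe("noindex, nofollow, max-image-preview:large"))
    try:
        assert_safe("noindex\r\nSet-Cookie: injected=1")
    except UnsafeDirectiveError as exc:
        print("rejected:", exc)
```

Rejecting rather than stripping the offending characters means a malformed value never reaches the response in a silently altered form.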

Troubleshooting

/robots.txt returns a 404 or a Luma 404 HTML page

You are likely sitting on a stale url_rewrite row left behind by Panth_AdvancedSEO whose target_path still points at the old controller. Run bin/magento setup:upgrade — the RefreshRobotsTxtRewrite patch fires idempotently and rewrites the row to the new target. Follow with bin/magento cache:clean config full_page.

X-Robots-Tag not appearing on /customer/* pages

Upgrade to ≥ 1.0.2. Earlier releases had a constructor-argument ordering bug that made the plugin skip the response when Panth_AdvancedSEO wasn't installed; 1.0.2 makes the dependency DI-nullable and the plugin always runs.

LLM bot block missing from robots.txt

  1. Check the toggle at the right scope: panth_robots_seo/llm_bots/<bot> resolves at store-view scope, not website or default.
  2. Flush config + FPC: bin/magento cache:clean config full_page. The /robots.txt body is built live per request but the config it reads from is cached.
  3. Confirm robots_txt/override_enabled is No — when the override is on, every LLM toggle is ignored.

I turned on override_enabled but nothing changes

  1. bin/magento cache:clean config full_page — the override flag and custom body are both pulled from cached config.
  2. Confirm custom_body was saved at the store scope you are viewing, not the default scope. Check with SELECT scope, scope_id, value FROM core_config_data WHERE path = 'panth_robots_seo/robots_txt/custom_body';.

Meta robots tag not showing in page HTML

The module sets the X-Robots-Tag HTTP response header — not the <meta name="robots"> element. A layout hook that injects the <meta> tag into the page <head> is only wired when Panth_AdvancedSEO is also installed (it owns the page/main block override). If you need both, install mage2kishan/module-advanced-seo alongside this module; they share the panth_seo_robots_policy table and do not collide.



This content is fetched directly from the module's GitHub repository. We are not the authors of this content and take no responsibility for its accuracy, completeness, or any consequences arising from its use.