robots.txt
The robots.txt file tells search engine crawlers which pages or files they may and may not request from your website. It is a web standard recognized by most well-behaved crawlers, which fetch it before requesting anything else from a given domain. You can use it to keep specific areas of the website (such as the CMS, admin pages, or user accounts) from being crawled. The file must be located in the root directory of each host; alternatively, you can redirect the root /robots.txt path to a target URL.
- Controlling Crawling Permissions: Through directives, the file tells crawlers which pages or directories they may access (Allow) and which they may not (Disallow). For instance, it can keep crawlers out of sensitive areas such as admin panels or temporary pages, which reduces server load and keeps those pages out of routine crawls.
- Specifying Crawling Rules: Common directives include:
  - User-agent: *: Applies to all crawlers.
  - Disallow: /private/: Prohibits crawling content under the /private/ directory.
  - Allow: /public/: Permits crawling the /public/ directory.
  - Sitemap: https://example.com/sitemap.xml: Optionally points to the location of the XML sitemap.
- Notes:
  - robots.txt is not a security mechanism. Compliance is voluntary, so it only affects crawlers that honor the protocol. Malicious crawlers may ignore it.
  - If the file does not exist, crawlers typically treat the whole site as crawlable. Lines they cannot parse are generally ignored, so a malformed file may leave the site effectively unrestricted.
  - It is commonly used in SEO (Search Engine Optimization) to focus crawlers on the content that matters, which helps search engines index the site more effectively.
The underlying Robots Exclusion Protocol is standardized by the IETF as RFC 9309 and is supported by search engine companies (such as Google). The file itself can be created with any plain-text editor.
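
To see how a compliant crawler might interpret the directives listed above, here is a minimal sketch using Python's standard urllib.robotparser module. The crawler name "ExampleBot", the helper build_parser, and the page URLs are assumptions made up for illustration; real crawlers may apply their own matching rules.

```python
from urllib.robotparser import RobotFileParser

# Sample file built from the directives described above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /public/
Sitemap: https://example.com/sitemap.xml
"""


def build_parser(text):
    """Hypothetical helper: parse robots.txt content held in a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is an invented user agent; it falls under the "*" group.
blocked = parser.can_fetch("ExampleBot", "https://example.com/private/page.html")
allowed = parser.can_fetch("ExampleBot", "https://example.com/public/page.html")
unlisted = parser.can_fetch("ExampleBot", "https://example.com/blog/post.html")

# Sitemap URLs declared in the file (site_maps() needs Python 3.8+).
sitemaps = parser.site_maps()

# An empty rule set imposes no restrictions on this parser.
empty_allows = build_parser("").can_fetch("ExampleBot", "https://example.com/private/")

print(blocked, allowed, unlisted, sitemaps, empty_allows)
```

In a real crawler, set_url() followed by read() would fetch the live file from the host's root instead of parsing a string. Paths that match no rule, such as /blog/post.html here, are treated as crawlable.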
XML sitemap
An XML sitemap is a file written in XML format, usually named sitemap.xml and placed in the root directory of a website (for example, https://example.com/sitemap.xml). It is a tool for website administrators to provide search engines with the site's structure and a list of pages, helping crawlers discover and index content more efficiently. Its primary functions include:


