ConduitScore
Technical Guides · 13 min read

What Is llms.txt? The Complete Guide

llms.txt tells AI systems what your site contains and which pages to prioritize for citations. Learn what it is, how to write one, and whether it matters.

Ben Stone

Co-founder, ConduitScore

robots.txt tells AI systems where they can go. llms.txt tells them what they will find when they get there.

That distinction matters more than it first appears. A site can be completely accessible to AI crawlers — no blocked paths, valid sitemap, clean HTML — and still receive zero citations from ChatGPT or Perplexity. The reason is not access. The reason is context. When an AI system retrieves 150 pages from a website, it has no machine-readable signal about which three pages represent the site's core expertise, which content is most current, or which author is the authoritative voice on a given topic.

llms.txt is how you provide that signal directly.

What llms.txt Is

llms.txt is a plain-text file placed at the root of your website, accessible at yourdomain.com/llms.txt. It is a structured, markdown-formatted document that describes your site to AI systems: what the site covers, which pages are most important, who the authors are, and what an AI should know before it starts retrieving and citing your content.

The format was proposed by Jeremy Howard, co-founder of fast.ai, in September 2024. Howard's argument was straightforward: robots.txt was designed in 1994 for search engine crawlers. It handles access permissions — which paths are allowed, which are denied. It was never designed to communicate content context, and that gap has become a problem as AI systems need to make citation decisions based on more than just raw text retrieval.

llms.txt fills that gap. It is not a replacement for robots.txt. It is a companion document that answers a different question: not "can you access this?" but "once you get there, what is this site, and what should you prioritize?"

The Counterintuitive Part: This Is a Context Problem, Not a Crawl Problem

Most site owners who learn about AI visibility assume the main obstacle is access. They check their robots.txt, confirm that GPTBot is not blocked, and conclude their site is AI-ready.

It is not — and the reason reveals something important about how AI citation decisions actually work.

When a system like Perplexity retrieves content to answer a question, it does not read every page on your site with equal attention. It retrieves a selection of pages, ranks them by relevance and trustworthiness, synthesizes the content, and cites the sources it considers most authoritative for the specific query.

That ranking step — relevance and trustworthiness — is where most sites lose citations they should be receiving. An AI system that retrieves your homepage, three blog posts, and your pricing page has no way to know that your research report from six months ago is the most authoritative piece of content on your site. It cannot tell, from the HTML alone, that the author of that report has 15 years of domain experience and has been cited by three industry publications.

llms.txt is how you tell it. Directly, in machine-readable format, without ambiguity.

The Format: What Goes in an llms.txt File

llms.txt uses simple markdown formatting. There is no rigid specification yet — the standard is still emerging — but the structure that has become most widely adopted follows this pattern:

The header section opens the file with a brief description of the site. Two to four sentences covering what the site does, who it serves, and what makes its content authoritative. This is the first thing an AI system reads when it retrieves your llms.txt.

The main content sections are H2-level headings that correspond to the major content areas of your site. Under each heading, you list key pages with their URLs and a one-sentence description of what each page covers. These descriptions are not for human readers — they are for AI systems that need to quickly assess whether a given page is relevant to a query they are trying to answer.

The author section lists the people who create content for the site, with brief credential information. AI systems use author credentials as part of their citation trustworthiness assessment. A named author with stated expertise is more citable than anonymous content.

The optional metadata includes content freshness signals (when the site last had significant updates), contact information, and links to key resources like the sitemap, privacy policy, or API documentation if applicable.

Here is what a well-structured llms.txt looks like for a hypothetical B2B SaaS company called Acme Analytics, which provides data pipeline software:

```markdown
# Acme Analytics

Acme Analytics provides cloud-native data pipeline software for engineering teams at mid-size companies. Our documentation, blog, and research cover data engineering, ETL architecture, real-time data processing, and data quality practices. Content is written by senior data engineers with 8-15 years of direct implementation experience.

## Documentation

- [Getting Started Guide](/docs/getting-started): Complete setup walkthrough for new users, from account creation to first pipeline run.
- [Pipeline Architecture Overview](/docs/architecture): Technical reference for Acme's three-tier processing model and how data flows through the system.
- [API Reference](/docs/api): Full API documentation with request/response examples for all endpoints.
- [Troubleshooting Guide](/docs/troubleshooting): Solutions to the 40 most common configuration and runtime errors.

## Blog

- [Why Most Data Pipelines Fail at Scale](/blog/pipeline-failure-modes): Analysis of the four architectural patterns that cause pipeline failures above 10M events/day, with specific case examples.
- [Real-Time vs. Batch Processing: A Decision Framework](/blog/realtime-vs-batch): Framework for choosing between streaming and batch architectures based on latency requirements, cost constraints, and team capacity.
- [Data Quality at the Source: Preventing Downstream Problems](/blog/data-quality-source): Practical guide to implementing validation at ingestion rather than transformation, with code examples.

## Research

- [State of Data Engineering 2025](/research/state-of-data-engineering-2025): Survey of 847 data engineering teams on tooling, architecture patterns, challenges, and salary benchmarks.

## Authors

- Maria Chen (Head of Product, Acme Analytics): 12 years in data engineering, former staff engineer at Databricks. Writes on pipeline architecture and product decisions.
- James Park (Senior Data Engineer): Specializes in real-time processing and stream analytics. Author of the Troubleshooting Guide.

## Contact

- General: hello@acmeanalytics.com
- Press/Research: press@acmeanalytics.com
- Last content update: March 2026
```

Notice what this file does. In under 400 words, it tells an AI system: what Acme Analytics is, what content categories exist, which specific pages are most authoritative, who the authors are and what their credentials are, and when the site was last updated. An AI answering a question about data pipeline failure modes can immediately identify the relevant blog post, understand who wrote it and why they are credible, and cite it with confidence.

Without llms.txt, that same AI system retrieves the page, has no context about its authority, and may deprioritize it in favor of content from a site that has provided clearer signals.

How llms.txt Fits Into the 14-Signal AI Visibility Framework

llms.txt is one of the 14 AI visibility signals that determine whether a site gets cited by AI systems. It falls within the Crawlability and Access category alongside robots.txt permissions and sitemap.xml.

The distinction within that category is important. robots.txt governs access — the binary question of whether an AI crawler is permitted to retrieve a page. A sitemap.xml governs discovery — it tells crawlers which pages exist. llms.txt governs context — it tells crawlers which pages matter and why.

All three work together. A site with a clean robots.txt and a valid sitemap but no llms.txt is accessible and discoverable, but not contextualized. A site with llms.txt but a misconfigured robots.txt that blocks AI crawlers is contextualized but inaccessible. You need all three.

Within the broader 14-signal framework, llms.txt also amplifies signals in other categories. When your llms.txt lists named authors with credentials, it reinforces the Citation Readiness signal. When it links to your most substantive content, it reinforces the Content Quality signal by making that content easier for AI systems to find and weight appropriately.

How to Write and Deploy Your Own llms.txt

The process is straightforward for any site owner or developer with access to the site root.

Step 1: Create the file. Open a text editor and create a new file named llms.txt. Use UTF-8 encoding. Do not use a .doc, .docx, or .html format — the file must be plain text.

Step 2: Write the header description. Two to four sentences describing what your site covers and who it is authoritative for. Be specific. "We cover digital marketing" is too vague. "We publish data-backed analysis of B2B SaaS marketing attribution, with a focus on multi-touch models and revenue operations" tells an AI system exactly what queries your content should be considered for.

Step 3: Identify your most important pages by category. For each major content area (blog, documentation, research, resources), select three to eight pages that best represent your expertise. These should be your most substantive, well-cited, or frequently referenced pieces — not your most recent ones. An AI system reading your llms.txt should come away knowing exactly which pages to retrieve first.

Step 4: Write one-sentence descriptions for each page. The description should answer: what specific question does this page answer, and for whom? Not "our blog post about SEO" but "a practitioner's guide to implementing breadcrumb schema markup for multi-level product category pages."

Step 5: Add author information. For each author who has published content on your site, include their name, their role or credential, and their areas of focus. Two to three sentences per author is sufficient.
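Steps 2 through 5 can be sketched as a small script that assembles the file from structured data, which makes quarterly updates a matter of editing one data structure rather than hand-formatting markdown. Every site name, URL, description, and author below is a hypothetical placeholder, not a required value:

```python
# Sketch: assemble an llms.txt file from structured page and author data.
# All URLs, titles, and names here are hypothetical placeholders.

sections = {
    "Blog": [
        ("/blog/pipeline-failure-modes",
         "Why Most Data Pipelines Fail at Scale",
         "Analysis of common failure patterns above 10M events/day."),
    ],
}

authors = [
    ("Maria Chen", "Head of Product", "12 years in data engineering."),
]

def build_llms_txt(site_name, description, sections, authors):
    # Header: site name as H1, then the two-to-four sentence description.
    lines = [f"# {site_name}", "", description, ""]
    # One H2 per content area, each page as a markdown link plus summary.
    for heading, pages in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for url, title, desc in pages:
            lines.append(f"- [{title}]({url}): {desc}")
        lines.append("")
    # Author section with name, role, and brief credentials.
    lines.append("## Authors")
    lines.append("")
    for name, role, bio in authors:
        lines.append(f"- {name} ({role}): {bio}")
    return "\n".join(lines) + "\n"

print(build_llms_txt(
    "Acme Analytics",
    "Cloud-native data pipeline software for engineering teams.",
    sections, authors))
```

Write the returned string to a file named llms.txt and deploy it as described in the next step.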

Step 6: Deploy the file. Place llms.txt in your site's root directory so it is accessible at yourdomain.com/llms.txt. For static sites, this means placing it alongside your index.html. For Next.js or similar frameworks, place it in the /public directory.

Step 7: Verify accessibility. Open a browser in incognito mode and navigate to yourdomain.com/llms.txt. Confirm the file loads as plain text. Confirm the response headers show text/plain as the content type. If your hosting environment is serving it as HTML, check your server configuration.
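The browser check above can also be scripted. The sketch below separates the fetch from the validation so the checks can run on any response; yourdomain.com is a placeholder, and the two validation rules (plain-text content type, opening H1) are assumptions based on the structure described in this guide, not a formal specification:

```python
# Sketch: verify an llms.txt file is served as plain text and starts
# with a top-level heading. Substitute your own domain in the live check.
from urllib.request import urlopen

def fetch(url):
    # Retrieve the body and Content-Type header for a live check.
    with urlopen(url) as resp:
        return resp.read().decode("utf-8"), resp.headers.get("Content-Type", "")

def looks_valid(body, content_type):
    # Served as plain text (or markdown), not HTML.
    is_text = content_type.split(";")[0].strip() in ("text/plain", "text/markdown")
    # Opens with an H1 header line, per the structure described above.
    has_header = body.lstrip().startswith("# ")
    return is_text and has_header

# Live check (uncomment and substitute your domain):
# body, ctype = fetch("https://yourdomain.com/llms.txt")
# print(looks_valid(body, ctype))

sample_body = "# Acme Analytics\n\nCloud-native data pipeline software.\n"
print(looks_valid(sample_body, "text/plain; charset=utf-8"))  # True
```

If the check reports a text/html content type, your server is wrapping the file; adjust the server or hosting configuration until the raw file is returned.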

Step 8: Do not block it in robots.txt. This sounds obvious, but it has happened. If your robots.txt has broad "Disallow: /" rules for non-Google bots, confirm that it does not block access to llms.txt. AI systems cannot use a file they cannot retrieve.
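If your robots.txt does use a broad disallow for a given bot, an explicit Allow line can carve out the file. A minimal sketch (SomeBot is a placeholder user-agent; Allow is standardized in RFC 9309 and widely supported, but not every crawler honors it, so prefer narrowing the Disallow rules where possible):

```text
# Restrict a specific bot from most paths, but keep llms.txt reachable.
User-agent: SomeBot
Disallow: /
Allow: /llms.txt
```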

Common Mistakes Site Owners Make With llms.txt

Too long. llms.txt should be under 2,000 words. The purpose is to provide a quick, structured context summary — not to reproduce your entire site's content. An AI system that retrieves a 10,000-word llms.txt is not being helped; it is being burdened.

Too vague. A header description that reads "We cover business topics for professionals" gives an AI system nothing useful. Be specific about your domain, your content type, and your target audience.

Not maintained. llms.txt has a "last updated" field for a reason. A file that lists content from 2023 as your most important pages, when you have published substantially better content since, actively misleads AI systems. Block quarterly time to review and update it.

Wrong file location. The file must be at the domain root: yourdomain.com/llms.txt. Not yourdomain.com/blog/llms.txt. Not yourdomain.com/files/llms.txt. AI systems look for it at the root path specifically.

Blocking it in robots.txt. As noted above: if your robots.txt disallows access to llms.txt, AI systems cannot retrieve it. The file does nothing.

Listing promotional pages instead of authoritative ones. Site owners sometimes use llms.txt to highlight their homepage, pricing page, and product tour. Those pages rarely drive AI citations. The pages that get cited are substantive content pages: research reports, technical guides, in-depth analyses. Prioritize those.

Is llms.txt Worth It Right Now?

The honest answer is: it is not universally adopted, but adoption is accelerating, and the cost of adding it is low enough that the question is almost academic.

As of early 2026, Perplexity has confirmed that it reads llms.txt files when available. Several other AI retrieval systems have signaled support. OpenAI has not published explicit confirmation for ChatGPT's web browsing, but the standard is visible in their public documentation discussions.

The counterargument — that you should wait until more AI systems officially support it — has a logical flaw. The sites that benefit from llms.txt will be the sites that had it in place when AI systems standardized on it, not the sites that added it afterward. The adoption curve for standards like this is steep and fast: robots.txt went from proposal to universal adoption in under three years.

The practical frame: llms.txt takes two to four hours to write well. It requires no ongoing technical maintenance, only content updates when your key pages change. The upside is being contextualized for AI citations on every query where your content is relevant. The downside of not having it is invisible — you will not see the citations you are not receiving.

The Larger Point About Machine-Readable Context

llms.txt represents a pattern that is becoming more broadly true about the web: content that is explicitly machine-readable is cited more than content that requires inference.

JSON-LD schema markup says explicitly: "this is an Article, authored by this person, published on this date." Author bylines say explicitly: "a named human with verifiable credentials wrote this." llms.txt says explicitly: "these are the pages on this site that best answer questions in this domain."

In each case, the explicit signal outperforms the inferred one. AI systems making citation decisions under time constraints favor sources that are transparent about what they are and why they are authoritative. That transparency is not a trick. It is the digital equivalent of what credible publications have always done: show their work.

Your llms.txt is where you show yours.

Scan your site at ConduitScore.com to check whether llms.txt is present and readable — the free scan checks this alongside 13 other AI visibility signals in about 15 seconds. Three free scans per month, no account required. You will see your current score and exactly which signals to fix first.

Check Your AI Visibility Score

See how your website performs across all 7 categories in 30 seconds.

Scan Your Website Free