Web site owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol.
It works like this: a robot wants to visit a Web site URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt, and finds:
    User-agent: *
    Disallow: /
The "User-agent: *" means this section applies to all robots.The "Disallow: /" tells the robot that it should not visit anypages on the site.
There are two important considerations when using /robots.txt:
- robots can ignore your /robots.txt. Especially malware robots that scan the web for security vulnerabilities, and email address harvesters used by spammers, will pay no attention.
- the /robots.txt file is a publicly available file. Anyone can see what sections of your server you don't want robots to use.
How To Use It
The "/robots.txt" file is a text file, with one or more records.Usually contains a single record looking like this:
    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /tmp/
    Disallow: /~joe/
In this example, three directories are excluded.
Note that you need a separate "Disallow" line for every URL prefix you want to exclude -- you cannot say "Disallow: /cgi-bin/ /tmp/" on a single line. Also, you may not have blank lines in a record, as they are used to delimit multiple records.
Note also that globbing and regular expressions are not supported in either the User-agent or Disallow lines. The '*' in the User-agent field is a special value meaning "any robot". Specifically, you cannot have lines like "User-agent: *bot*", "Disallow: /tmp/*" or "Disallow: *.gif".
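Each Disallow value is instead matched as a plain path prefix: a URL is excluded when its path starts with that value. A minimal sketch of that rule in Python (the is_disallowed helper is hypothetical, purely for illustration):

    def is_disallowed(path, disallow_prefixes):
        """Return True if path starts with any Disallow prefix."""
        return any(path.startswith(p) for p in disallow_prefixes if p)

    # "Disallow: /tmp/" already covers everything under /tmp/, so no '*' is needed:
    print(is_disallowed("/tmp/scratch.html", ["/cgi-bin/", "/tmp/"]))  # True
    print(is_disallowed("/docs/tmp.html", ["/cgi-bin/", "/tmp/"]))     # False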
What you want to exclude depends on your server. Everything not explicitly disallowed is considered fair game to retrieve. Here follow some examples:
To exclude all robots from the entire server
    User-agent: *
    Disallow: /
To allow all robots complete access
    User-agent: *
    Disallow:
(or just create an empty "/robots.txt" file, or don't use one at all)
To exclude all robots from part of the server
    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /tmp/
    Disallow: /junk/
To exclude a single robot
    User-agent: BadBot
    Disallow: /
To allow a single robot
    User-agent: Google
    Disallow:

    User-agent: *
    Disallow: /
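Note the blank line separating the two records. To see how a parser chooses the record for a given robot, you can run this file through urllib.robotparser (the robot names are just illustrative):

    import urllib.robotparser

    rp = urllib.robotparser.RobotFileParser()
    rp.parse([
        "User-agent: Google",
        "Disallow:",
        "",
        "User-agent: *",
        "Disallow: /",
    ])

    # The named robot matches its own record (an empty Disallow means full
    # access); every other robot falls through to the "*" record and is blocked.
    print(rp.can_fetch("Google", "http://www.example.com/"))   # True
    print(rp.can_fetch("SomeBot", "http://www.example.com/"))  # False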
To exclude all files except one
This is currently a bit awkward, as there is no "Allow" field. The easy way is to put all files to be disallowed into a separate directory, say "stuff", and leave the one file in the level above this directory:
    User-agent: *
    Disallow: /~joe/stuff/
Alternatively you can explicitly disallow all disallowed pages:
    User-agent: *
    Disallow: /~joe/junk.html
    Disallow: /~joe/foo.html
    Disallow: /~joe/bar.html
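Either variant can be verified the same way. Assuming the directory layout from the first variant, with a hypothetical /~joe/index.html left above the "stuff" directory:

    import urllib.robotparser

    rp = urllib.robotparser.RobotFileParser()
    rp.parse([
        "User-agent: *",
        "Disallow: /~joe/stuff/",
    ])

    print(rp.can_fetch("*", "http://www.example.com/~joe/index.html"))        # True
    print(rp.can_fetch("*", "http://www.example.com/~joe/stuff/hidden.html")) # False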