Hiding Web Pages from Search

EPA builds all web content in the Drupal WebCMS as of January 2013. All new microsites and resource directories will be created using Drupal. There is still content on EPA's legacy servers, and this content will be maintained there until it is transformed and moved into the Drupal WebCMS. The following information should be used only for minor updates/maintenance of existing pages; any significant updates or revisions to existing pages should be done in the context of One EPA Web content transformation into the Drupal WebCMS.

A .noindex or a .filesnoindex file is designed to "hide" your files during the search collection building process. If you are developing pages that you do not want the public to find, these files will prevent the circular indexer from adding your documents to the collection.

.noindex files block the indexer from all files in the directory they are placed in, and all subordinate directories. The .noindex file does not contain any information. Its mere presence in the directory prevents the EPA search engine from indexing the contents of the directory and its subdirectories.

A .filesnoindex file names individual documents in the directory where it is placed, so it must contain a list of the protected files, one per line.

Note that this convention is only in place on the Unix web servers, Buckeye (epapub.epa.gov) and Tulip (epaintra.epa.gov). For documents on other platforms, such as ColdFusion or NT servers, you may use robots.txt or robots metatags.

Sample robots.txt entry to prevent all search engines from crawling the test and reports directories:

User-agent: *
Disallow: /test/
Disallow: /reports/

Sample robots metatag:

<META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW">

For more information about robots.txt and robots metatags, visit robotstxt.org.

To create a .noindex file, follow these steps:

  1. Log in to the appropriate server:

    % telnet <server> (ex. telnet epapub.epa.gov)
    login: <username> (ex. abc) <Enter> 
    Password: <pw> (ex. abcdef) <Enter>

  2. Go to the data directory 
    (ex. /public/data/epapages/web/epahome/testsearch)

    % cd /public/data/epapages/web/epahome/testsearch <Enter>

  3. Create the .noindex file

    % touch .noindex <Enter>

Please Note:

  • The individual must have write access to the directory where the .noindex file is placed.
  • Subdirectories of a directory with a .noindex file will not be included in the collection. For example,

    ..data/epapages/web/epahome/testsearch/verity_test

    will be protected by a .noindex file placed in:

    ..data/epapages/web/epahome/testsearch/
  • Web pages in a .noindex directory are accessible from a web browser, but the user would have to know the URL. The search engine would not provide any information if someone were searching for the site.
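
The inheritance rule in the notes above can be checked from the shell. The following is a minimal sketch and is not part of any EPA tooling: the function name covering_noindex is hypothetical, and it assumes only what is stated above, namely that a .noindex file in a directory or in any parent directory hides that directory. It walks upward from a directory and prints the first directory that holds a .noindex file.

```shell
# covering_noindex DIR
# Hypothetical helper: print the nearest directory at or above DIR that
# contains a .noindex file. Returns non-zero if none is found.
covering_noindex() {
    dir=$(cd "$1" 2>/dev/null && pwd) || return 2
    while :; do
        if [ -e "$dir/.noindex" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
        # Stop once the filesystem root has been checked.
        [ "$dir" = "/" ] && return 1
        dir=$(dirname "$dir")
    done
}
```

For example, running covering_noindex on the verity_test directory shown above would print the testsearch directory if that is where the .noindex file was created.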

To create a .filesnoindex file, follow these steps:

  1. Create a file in a text editor, either on the Unix server or on your Windows client.
  2. Add the file names of all the files in a specific directory that you do not want indexed. Include only the filename and extension; do not include the path information. Each filename should be on a new line.
    example:
    file1.html
    file2.pdf
  3. Save the file as .filesnoindex in the directory that contains the listed files.
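
On the Unix side, the steps above can be combined. The sketch below is one way to do it, assuming a POSIX shell; the function names write_filesnoindex and is_listed are hypothetical, and the source does not describe any official tool for this. It writes one bare filename per line and then uses grep to confirm that a given name is listed.

```shell
# write_filesnoindex DIR FILE...
# Hypothetical helper: write the given filenames, one per line, to
# DIR/.filesnoindex. Names containing a path separator are rejected
# because the list should hold only filename and extension.
write_filesnoindex() {
    target_dir=$1
    shift
    for name in "$@"; do
        case $name in
            */*) echo "not a bare filename: $name" >&2; return 1 ;;
        esac
    done
    printf '%s\n' "$@" > "$target_dir/.filesnoindex"
}

# is_listed DIR FILE
# Succeeds if FILE appears as a whole line in DIR/.filesnoindex.
is_listed() {
    grep -Fxq -- "$2" "$1/.filesnoindex"
}
```

With the example names from step 2, this could be called as write_filesnoindex /public/data/epapages/web/epahome/testsearch file1.html file2.pdf, followed by is_listed with the same directory and one of the names.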

Please Note:

  • Windows may not let you create a file whose name begins with a "." (dot). If it does not, either create the file directly in Unix, or FTP an arbitrarily named file from Windows to Unix and rename the file on the Unix side.
  • A .noindex file protects the directory contents and all subordinate directories.
  • The .noindex convention is only in effect for the Unix web servers, Buckeye (epapub.epa.gov) and Tulip (epaintra.epa.gov). Use robots.txt and/or robots metatags to protect Domino, ColdFusion and other applications and databases.
  • When you create a .noindex or .filesnoindex file, an automated job will create an entry in the main robots.txt file for the intranet.epa.gov or www.epa.gov site. If you want external spiders to ignore your directory, but you still want the EPA search engine to index your directory, please contact the Search Engine Webmasters.
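
The rename described in the first note can be done on the Unix server once the upload has landed. This is a minimal sketch: the function name install_filesnoindex and the upload name filesnoindex.txt are hypothetical. Stripping carriage returns is an added assumption, since text files created on Windows can arrive with CRLF line endings; the source does not say whether the indexer tolerates them.

```shell
# install_filesnoindex DIR UPLOADED_NAME
# Hypothetical helper: turn an arbitrarily named uploaded list into
# DIR/.filesnoindex, removing any carriage returns along the way.
install_filesnoindex() {
    tr -d '\r' < "$1/$2" > "$1/.filesnoindex" || return 1
    rm -- "$1/$2"
    # Dot files are hidden from plain ls; -a shows them.
    ls -a "$1"
}
```

For example, after uploading filesnoindex.txt to a directory, calling install_filesnoindex with that directory and filesnoindex.txt leaves a .filesnoindex file in its place and lists the directory so the result can be confirmed.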
