ROT Review and Treatment


What is ROT?

ROT is Redundant, Outdated, and/or Trivial content. Finding and treating ROT is the low-hanging fruit of content management: it lets content owners get rid of the obviously bad content.

Redundant

Redundant content is characterized by repeating the same idea in multiple locations.

  • Duplicate pages/documents
  • Multiple pages on the same topic, written for the same/similar audience

Outdated

Outdated content consists of materials that are no longer in use or are out of date.

  • Pages that refer to something "new" that is now well established.
  • Pages that describe defunct projects, including cobwebbed pages (pages marked as no longer updated).
  • Pages that refer to events that have concluded, which have no intrinsic historical value (conference registration pages, meeting agendas, etc.)

Trivial

Trivial content is of little importance or value, and is considered insignificant to the overall scheme or purpose of your website.

  • Pages that are essentially about nothing at all
  • Default pages and index pages without “real” information


Why we care

ROT removal is the single biggest thing that can be done to improve EPA web site performance and user experience.

Redundant, Outdated and Trivial content:

  • Erodes the credibility and authority of EPA.gov content
  • Leads to poor judgment of all content by EPA.gov audience
  • Interferes with search engine results and makes it harder for people to find the information they are looking for
  • Makes maintenance, especially backup and data storage, more costly
  • Makes updating sites time-consuming and burdensome

The benefits of treating ROT include:

  • Web content that is easier to find and use
  • A better user experience
  • Lower website maintenance costs, both for day-to-day work and for backup and data storage


How to identify and treat ROT

First, realize that this is an ongoing process. Like cleaning your garage or maintaining your inbox, taking care of ROT is not a one-time process. Build content management, including ROT treatment, into your workflow.

Identifying ROT

At this time, Drupal WebCMS does not automatically flag content for review. Until it does, we recommend the following process:

  • Create periodic calendar entries to remind you to review your web content.
    • Chunk your reviews into manageable tasks by targeting specific metadata fields (content type, channel) or by certain types of pages, search terms, etc.
  • Make review easier by creating lists of relevant HTML and PDF content in your web area
    • For legacy content, use Rottweiler, which has a full file inventory for each TSSMS account.  You may need to combine TSSMS accounts or pull partial inventories.
    • In Drupal WebCMS, use the content tab to generate lists by web area, or other search fields.  Copy and paste the list into a spreadsheet for easy sorting and review.
  • Sort and filter your content lists to identify ROT
    • Sort by title and look for duplicates and/or redundant files
    • Sort by last modified date to find outdated files
  • Use site statistics, analytics, and metrics to help find trivial content
    • Pages that are only getting a few visits a month or less may not be worth keeping
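
Once the list is in a spreadsheet, the sorting and filtering steps above can also be scripted. The following is a minimal Python sketch, not an EPA tool. The column names (`url`, `title`, `last_modified`, `monthly_visits`), the one-year age cutoff, and the five-visit threshold are all assumptions chosen for illustration; adjust them to match your own export.

```python
from collections import defaultdict
from datetime import date, timedelta

def flag_rot(rows, today, max_age_days=365, min_visits=5):
    """Return {url: [reasons]} for rows that look like possible ROT.

    Each row is a dict with hypothetical keys: url, title,
    last_modified (a datetime.date), and monthly_visits (an int).
    """
    flags = defaultdict(list)

    # Redundant: group by normalized title and flag groups of two or more.
    by_title = defaultdict(list)
    for row in rows:
        by_title[row["title"].strip().lower()].append(row["url"])
    for urls in by_title.values():
        if len(urls) > 1:
            for url in urls:
                flags[url].append("duplicate title")

    cutoff = today - timedelta(days=max_age_days)
    for row in rows:
        # Outdated: last modified before the cutoff date.
        if row["last_modified"] < cutoff:
            flags[row["url"]].append("not updated recently")
        # Trivial: very few visits per month.
        if row["monthly_visits"] < min_visits:
            flags[row["url"]].append("low traffic")
    return dict(flags)

# Made-up sample inventory for illustration only.
inventory = [
    {"url": "/a", "title": "Air Quality", "last_modified": date(2014, 5, 1), "monthly_visits": 900},
    {"url": "/b", "title": "air quality ", "last_modified": date(2011, 2, 1), "monthly_visits": 2},
    {"url": "/c", "title": "Contact Us", "last_modified": date(2014, 6, 1), "monthly_visits": 40},
]
result = flag_rot(inventory, today=date(2014, 7, 1))
for url, reasons in sorted(result.items()):
    print(url, reasons)
```

A flag is only a prompt for human review. A page with low traffic or an old modification date may still be valuable, so treat the output as a shortlist rather than a removal list.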

Once you have your list of content, review it for the really obvious bad content. Typical signs of ROT:

  • /test/ directories
  • Password protected directories
  • Made-up file extensions, including initials and dates
  • Files labeled old, bak, backup, or bad
  • Duplicates
  • Old files (old is relative, but files that have not been updated in more than a year could be suspect)
  • Local copies of files hosted elsewhere, such as Agency images, EPA press releases, Federal Register and CFR notices, and non-EPA materials (e.g., NIH press releases)
  • Non-web formats (e.g. PowerPoint)
  • Old conference material, newsletters, calendars
  • .noindex directories. If you’re hiding it, do you need it?
  • Administrative files, such as Dreamweaver admin files and FTP logs (extensions and file names: .dwt, .lbi, .lck, .mno, dwsync.xml, ws_ftp.log)
  • Copyrighted documents
  • “Chunked” documents (documents broken up into chapters or other segments)
  • Redirects (useful in the short term, but should be a temporary measure)
  • Pages with “EPA no longer updates this content” notices (cobwebbed content). If it’s not worth maintaining, is it worth keeping?
  • Pages that are only getting a few visits each month
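
Several of the signs above are visible in the file path alone, so a file inventory can be pre-screened automatically. The sketch below is one hypothetical way to do that in Python. The admin extensions and file names come from the list above; the set of "non-web" extensions and the rule for spotting old/bak/backup/bad labels (the word must be set off by a dot, underscore, or hyphen) are assumptions.

```python
import re
from pathlib import PurePosixPath

# Taken from the list of administrative files above.
ADMIN_NAMES = {"dwsync.xml", "ws_ftp.log"}
ADMIN_EXTENSIONS = {".dwt", ".lbi", ".lck", ".mno"}
# Assumption: PowerPoint is the only non-web format checked here.
NON_WEB_EXTENSIONS = {".ppt", ".pptx"}
# Assumption: a label counts only when delimited by ".", "_", or "-".
STALE_LABEL = re.compile(r"(^|[._-])(old|bak|backup|bad)([._-]|$)")

def rot_signs(path):
    """Return a list of ROT warning signs found in a single file path."""
    p = PurePosixPath(path.lower())
    signs = []
    if "test" in p.parts[:-1]:  # any directory named "test"
        signs.append("test directory")
    if p.name in ADMIN_NAMES or p.suffix in ADMIN_EXTENSIONS:
        signs.append("administrative file")
    if p.suffix in NON_WEB_EXTENSIONS:
        signs.append("non-web format")
    if STALE_LABEL.search(p.name):
        signs.append("labeled old/bak/backup/bad")
    return signs

# Made-up paths for illustration only.
for path in ["/water/test/index.html", "/air/report_old.pdf",
             "/air/Templates/main.dwt", "/air/threshold.html"]:
    print(path, rot_signs(path))
```

Requiring a delimiter around the label keeps ordinary words such as "threshold" from being flagged just because they contain "old". Signs that depend on content or context (copyrighted documents, chunked documents, redirects) still need a manual look.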

Tools

Legacy Content Tools

  • Webman – provides a high-level overview of file age http://cfint.rtpnc.epa.gov/webman/webman/index.cfm
    • Total number of files
    • Age of the files
  • Maxamine (now Accenture) QA Reports / duplicate files report https://maxamine.epa.gov/maxcentral/
    • Shows duplicate files and their paths
    • Idiosyncrasy: our aliases confuse it, so /test/ and /test/index.html appear as duplicates
  • Maxamine QA report / broken link report
    • Broken links are ROT
  • Robots.txt http://www.epa.gov/robots.txt
    • Directories on epa.gov that are hidden from our search engine
    • If you don’t want the search engine to find it, why is it posted on the public web site?
  • Rottweiler Reports http://intranet.epa.gov/webmvall/rot/out2.html/index.htm (intranet only)
    • File Inventory
      • Listing of all files in a TSSMS area (review all files/file types)
      • Spreadsheet format
    • Orphan
      • Files that are not linked to by your own TSSMS content
      • EPA systems can fool this report. Verify that files are actually orphans before removing them
      • There are valid orphan pages (404 pages, “thank you” pages, etc.)
      • Includes password protected directory files
    • Unloved
      • Files that have not been requested/viewed since June 8, 2008, or if newer, have never been viewed
      • Information comes from the logs used for Maxamine reports.
      • This is a single point-in-time report: if the file was last viewed on June 9, 2008, it will not be on the report.
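
The idea behind the Orphan report can be illustrated as a set difference: every file in the inventory, minus every file that some page links to, minus a short list of known valid orphans. This Python sketch is an illustration of that idea only and does not describe how Rottweiler itself works. The function name, the shape of the `links` mapping, and the sample paths are all hypothetical.

```python
def find_orphans(inventory, links, allowed=frozenset()):
    """Return inventory paths that no page links to.

    inventory: iterable of file paths in the web area.
    links: dict mapping each page to the paths it links to.
    allowed: known valid orphans (404 pages, "thank you" pages, etc.).
    """
    linked = {target for targets in links.values() for target in targets}
    return sorted(set(inventory) - linked - set(allowed))

# Made-up data for illustration only.
inventory = ["/index.html", "/about.html", "/old-event.html", "/404.html"]
links = {
    "/index.html": ["/about.html"],
    "/about.html": ["/index.html"],
}
orphans = find_orphans(inventory, links, allowed={"/404.html"})
print(orphans)
```

As with the real report, the result is only as good as the link data. Files reached through other systems or from outside your own area will look like orphans, so verify each one before removing it.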

Drupal Content Tools

  • Content / Drupal Dashboard Node View (tab in Drupal)
    • Lists all Basic / Document pages, Webforms
    • Last modified date
  • Robots.txt http://www.epa.gov/robots.txt
    • Directories on epa.gov that are hidden from our search engine
    • If you don’t want the search engine to find it, why is it posted on the public web site?
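
To review hidden directories quickly, the Disallow entries can be pulled out of a saved copy of robots.txt. The Python sketch below is a simplified, assumed approach: it ignores User-agent groupings and wildcard rules and simply lists every non-empty Disallow value. The sample text is invented and is not the actual content of the EPA file.

```python
def disallowed_paths(robots_txt):
    """Return every non-empty Disallow value found in robots.txt text."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths

# Invented sample; not the real epa.gov robots.txt.
sample = """User-agent: *
Disallow: /test/
Disallow: /drafts/  # hypothetical directory
Disallow:
Allow: /public/
"""
hidden = disallowed_paths(sample)
print(hidden)
```

Each path in the result is a candidate for the question above: if search engines should not find it, does it belong on the public site?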

Treating ROT

Content identified as possible ROT can be treated in one of the following ways.

  • Remove from public server – content is no longer useful for the scope/audience of the site
  • Update and republish – content is still useful but changes are needed to make it current and accurate
  • Approve and republish – content is OK as is

Remove Bad Content

You will improve search results, inspire more user confidence in your site, and make content management an easier task if you remove or delete ROT. Target the obviously bad content first: duplicates, test materials, dated materials (conference logistics, public meeting notices), etc.

Select Good Candidates to Update

For pages or files with low usage: a low number of hits or a high bounce rate can hint that the file is not valuable to your audience and is a candidate for removal.

  • For valuable or good content, rethink how it is titled, where it is linked from, what kind of metadata it has, etc.  Is it easy to browse to?  Is it easy to search for?  Improve titles, metadata, link placement, etc., so that your audience can more readily find your content.

Replace ROT with Strategic Linking

  • Link to the NSCEP document. Before removing content with an EPA report number from your web area, make sure NSCEP has a copy. If they don’t, you can provide them with one and then remove your unwanted content.
  • Link to the FDsys FR or CFR, or to Regulations.gov. If the information can be found in the official docket, link to it.
  • Link to EPA’s or the owning agency’s newsroom. Don’t keep local copies of press releases, news releases, or other announcements.


Preventing future ROT

  • Be sure that content is relevant to the audience, task, and goal of the site
  • Limit each page to a single topic, audience and purpose
  • For every piece of content, ask yourself if it is necessary. Does it exist already in some other form? Should it exist?
  • Do not duplicate information that is already on another page or in another web area. Link to it instead
  • Do not post “your copy” of a document; link to the original
  • Write timeless content: use dates and phrases related to “when” judiciously


Other Resources

Frequent Questions about Web Sites and Records
http://www.epa.gov/records/faqs/webrecords.htm

Writing for the Web
http://www2.epa.gov/webguide/writing-web-requirements
