Improving Relevance Ranking in EPA Search Results


Relevance Ranking

The position of a document in the list of results for a query, known as relevancy rank, is determined by the following factors:

  • Google Page Rank: The most important factor in Google Page Rank is the number of pages at the EPA that link to your page, and the Google Page Rank of those linking pages. Your page will only accrue Google Page Rank when other pages link to it using the primary alias.
  • Document Language: If you believe a specific document should be ranked highly for a specific query, ensure that the language in the document and in the metadata includes the query terms. The more often the query terms occur in the content of the document, the higher the rank.
  • Metadata: The Google Search Appliance (GSA) will use the metadata to make connections between your document and the search query. Metadata must be supported by the terms in your document to get the full benefit.
  • URLs: The Google Search Appliance accumulates click history and referring link information based on URL. If a page is linked using a different host or path alias, that information is not credited to your page.
  • Link Text: Ensure the text of the links to your page is relevant to your page. The text that people click on to link to your documents counts as content in your document, and is weighted highly.
  • Document Use: The more often someone uses or clicks on your document, the more useful the GSA perceives it to be, ranking it higher.


Guidance for Improving the Relevancy Rank for a Page or Document

URLs:

For www.epa.gov and intranet.epa.gov content

  • Ensure you link to your internal documents using consistent URLs, and encourage other EPA departments to use the same URLs. The Google Search Appliance crawls and indexes content based on the primary alias selection tool and will index only URLs that use your TSSMS's primary alias. Ensure that the primary alias is correct and matches your URLs.
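
One way to audit link consistency is to scan the links on a page and flag any that reach the same content through a host other than the primary alias. The sketch below is illustrative only: `PRIMARY_HOST`, the `KNOWN_ALIASES` set, and the function name are assumptions made for this example and are not part of any EPA tool.

```python
from urllib.parse import urlsplit

# Assumed values for illustration; substitute your site's actual primary
# alias and the alternate hosts you know resolve to the same content.
PRIMARY_HOST = "www.epa.gov"
KNOWN_ALIASES = {"epa.gov"}  # hypothetical alternate host


def find_non_primary_links(urls):
    """Return the URLs whose host is a known alias rather than the primary host."""
    flagged = []
    for url in urls:
        host = (urlsplit(url).hostname or "").lower()
        if host != PRIMARY_HOST and host in KNOWN_ALIASES:
            flagged.append(url)
    return flagged


# Made-up example links; only the second uses an alias host.
links = [
    "http://www.epa.gov/air/index.html",
    "http://epa.gov/air/index.html",
    "http://www.example.org/other.html",
]
print(find_non_primary_links(links))
```

Links flagged this way are candidates for rewriting to the primary alias so that link and click history accumulate under a single URL.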

For Drupal WebCMS (www2.epa.gov) content

  • Linking is not an issue, with one exception: For PDFs stored in WebCMS, you must always link to the Document node (page) for a PDF and never directly to the PDF file itself. All metadata and history are associated with the document page.
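
A quick check for direct PDF links could look like the following. This is a minimal sketch: the function name and the example URLs are hypothetical, and it assumes a direct file link can be recognized by a path ending in `.pdf`.

```python
from urllib.parse import urlsplit


def links_directly_to_pdf(url):
    """Return True if the URL path ends in .pdf, i.e. it targets the file itself."""
    return urlsplit(url).path.lower().endswith(".pdf")


# Hypothetical URLs for illustration only.
candidate_links = [
    "https://www2.epa.gov/example-topic/example-report",
    "https://www2.epa.gov/example-files/example-report.pdf",
]
for link in candidate_links:
    if links_directly_to_pdf(link):
        print("Link to the document page instead of:", link)
```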


Link Text:

  • Use intuitive link text that contains keywords describing the content you are linking to.
  • Ensure the text of the links to your page is relevant to and useful for your page/document content.
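
As a rough illustration of this guidance, the sketch below flags link text that carries no keywords about the target page. The phrase list and the function name are assumptions chosen for the example, not an official list.

```python
# Assumed list of uninformative phrases; extend it to suit your content.
GENERIC_PHRASES = {"click here", "here", "read more", "more", "link", "this page"}


def is_generic_link_text(text):
    """Return True if the link text is a generic phrase with no descriptive keywords."""
    # Collapse whitespace, lowercase, and drop surrounding punctuation.
    normalized = " ".join(text.lower().split()).strip(" .:!")
    return normalized in GENERIC_PHRASES


print(is_generic_link_text("Click here."))
print(is_generic_link_text("Radon testing guidance for homeowners"))
```

The first call reports generic text and the second does not, since the second names the content being linked to.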


Metadata:

  • Ensure your documents have good-quality, useful metadata. The GSA will use the metadata to make connections between the search query and your document. The metadata in the keyword field must be consistent with the document content.


Document Content:

  • Back up the metadata with content in the document. The GSA performs a full text search taking into account all text in the document. If the query term or terms you would like to connect to your documents are not present in the body of the document, the GSA will rank this page lower.
  • Get to know your users and how they are searching for your content. Ensure the language in the document is consistent with intuitive search queries that users would enter in the search engine. Consult the query reports to see how and what terms people use to find your information.
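
To check that metadata keywords and target query terms are backed up by the document body, a simple comparison can help. The sketch below is an assumption-laden illustration: it does a plain case-insensitive whole-phrase match, which is not a description of how the GSA matches text, and the function name and sample data are invented for the example.

```python
import re


def missing_keywords(keywords, body_text):
    """Return the keywords or phrases that never appear in the body text."""
    body = body_text.lower()
    missing = []
    for keyword in keywords:
        # Match the keyword as a whole word or phrase, ignoring case.
        pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
        if not re.search(pattern, body):
            missing.append(keyword)
    return missing


# Invented sample data for illustration.
keywords = ["radon", "indoor air", "testing kits"]
body = "Radon is a gas that can build up in indoor air. Test your home for radon."
print(missing_keywords(keywords, body))
```

Any keyword reported as missing is a candidate either for removal from the metadata or for inclusion in the body text, using the wording your users actually search with.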


Redundant, Outdated and Trivial (ROT) Pages:

  • Clean up your redundant, outdated and trivial pages. The GSA does not distinguish this content from current, important pages, so ROT pages crowd the results.


Best Bets (Key Match)

If you have followed all of these guidelines, you may request to have a Best Bet created.

A Best Bet connects specific documents to specific queries and displays these documents at the top of the search results page when the specified query is searched. Best Bets are set apart from the rest of the search results: they appear above the results in a blue box with the title: Best Bets For Your Search.

Example of a best bet

How to submit a request for a Best Bet:

Submit a request to the search team at: http://www.epa.gov/epahome/comments.htm. This can also be reached by clicking on “contact us” above the query box on the search page.

Information to include in request:

  • Specify that it is a request for a Best Bet to be created
  • Include the specific query or queries to connect to the documents
  • Include the name and location of the documents you wish to have connected to the query or queries

NOTE: There is a limit of three Best Bets per query.

What will happen after you submit a request for a Best Bet?

  • You will receive a confirmation of receipt of the request.
  • The specified documents will be evaluated for compliance with the above guidelines.
  • The documents will be evaluated to ensure the content of the document is best suited for the specified search query.
  • If needed, the EPA librarians will make suggestions for improvements with respect to the guidelines above.
  • If, once all of these steps have been completed, the specified document(s) are still not ranked highly for the specific query, the Best Bet will be created.
  • The EPA librarians will continue to monitor the Best Bet and will remove it once the document's rank has improved and the document is returned on the first page of the search results when the specific query is searched.
