XML Search Results

Using XML Search Results

Before developing and deploying an application that uses XML results, you must contact the Search Webmasters to discuss system capacities and security restrictions.

The instructions on this page apply to both public-access and intranet search.

Why Use XML Results?

Application developers may choose to run searches and receive search results formatted in XML, rather than HTML, for any of these reasons:

  • To perform additional processing, such as looking up results in a database to retrieve additional information.
  • To apply your own XSL transformations.
  • To integrate search results into an existing presentation.
  • To store results from frequent searches in a machine-readable format.

Requesting XML Results

You can request XML results using a standard search form, as described in Create a Web Area Search Form, with the following two changes.

Required Parameters

Parameter | Value | Description
tmf | no | Instructs the webapp not to create the package required for HTML presentation (optionally, specify xml=yes)
result_template | xmlresults.xsl, xmlkeys.xsl, or a custom template | See "Using the Response Packets" for a description of the output formats

Optional Parameters

Parameter | Value | Description
results_per_page | 1 – 1000 | When requesting xmlkeys.xsl (URLs only), you can request up to 1000 documents per request. When requesting xmlresults.xsl (the full packet), the maximum is 500.
start | 1 – [1000 – results_per_page] | Use the start number to page through the results if you expect to retrieve more than the maximum results_per_page. The highest document number you can retrieve is 1000.
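
The parameters above can be combined into a request URL along the following lines. This is a minimal sketch, not the webapp's own client: the endpoint in SEARCH_URL is a hypothetical placeholder, the querytext parameter name is taken from the sample pagination URLs later on this page, and treating custom templates under the stricter 500-document limit is an assumption.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the search webapp URL you are given.
SEARCH_URL = "https://search.example.invalid/epasearch/epasearch"

# Per-request limits from the Optional Parameters table.
MAX_PER_PAGE = {"xmlkeys.xsl": 1000, "xmlresults.xsl": 500}
MAX_DOCUMENT_NUMBER = 1000  # highest document number that can be retrieved


def build_search_url(query, template="xmlresults.xsl", results_per_page=10, start=1):
    """Return a request URL that asks for XML rather than HTML results."""
    # Assumption: custom templates are held to the stricter 500 limit.
    limit = MAX_PER_PAGE.get(template, 500)
    if not 1 <= results_per_page <= limit:
        raise ValueError(f"results_per_page must be 1..{limit} for {template}")
    if not 1 <= start <= MAX_DOCUMENT_NUMBER:
        raise ValueError(f"start must be 1..{MAX_DOCUMENT_NUMBER}")
    params = {
        "querytext": query,
        "tmf": "no",                  # required: skip the HTML presentation package
        "result_template": template,  # required: selects the XML output format
        "results_per_page": results_per_page,
        "start": start,
    }
    return SEARCH_URL + "?" + urlencode(params)


def page_starts(results_per_page, last_document=MAX_DOCUMENT_NUMBER):
    """Return the start numbers needed to walk documents 1..last_document."""
    return list(range(1, last_document + 1, results_per_page))
```

For example, page_starts(500) gives the two start values needed to retrieve all 1000 reachable documents with xmlresults.xsl.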

Using the Response Packets

You can choose from the two results templates for XML results, or code a custom XSL template.

xmlresults.xsl

The output from xmlresults.xsl consists of the Google Search Appliance XML response packet, transformed to the original Northern Light format, plus a locally inserted section labeled EPAAddendum. See the EPAAddendum section below for its detailed contents.

Sample xmlresults Output

<?xml version="1.0" encoding="UTF-8" ?>
<search_results xmlns:fo="http://www.w3.org/1999/XSL/Format">
<server name="gsadmz1" ip="134.67.99.23" port="80"></server>
<database name="epa_default" encoding="utf-8"></database>
<context>xml_no_dtd</context>
<header total_documents="105000" next_record_available="11"></header>
<result_list>
<result>
<document number="1">
<url>http://www.epa.gov/air/</url>
<title><b>Air</b> and Radiation | US EPA</title>
<date>2010-09-22</date>
<caption><b>Air</b> pollution, clean <b>air</b>, and <b>air</b> quality information is provided by the US EPA's Office of <b>Air</b> and Radiation (OAR). OAR develops <b>...</b> </caption>
<author></author>
<keyword></keyword>
<epa_collection>all</epa_collection>
<mime-type></mime-type>
</document>
</result>

<result>
<document number="10">
<url>http://www.epa.gov/reg3artd/Indoor/iaq.html</url>
<title>Mid-Atlantic <b>Air</b> Protection | Region 3 | US EPA</title>
<date>2010-03-30</date>
<caption>[logo] US EPA. Mid-Atlantic <b>Air</b> Protection. <b>...</b> Indoor <b>Air</b> Quality In The Mid Atlantic Region. Welcome to Region 3's Indoor <b>Air</b> Quality page. <b>...</b> </caption>
<author></author>
<keyword></keyword>
<epa_collection>all</epa_collection>
<mime-type></mime-type>
</document>
</result>
</result_list>
<EPAAddendum>
… See EPAAddendum section…
</EPAAddendum>
</search_results>

Frequently Used Elements in xmlresults

Label | Description
header:total_documents | Use EPAAddendum:DocsFound instead
document | One instance per document returned, up to results_per_page
document:number | The document's position counted from the beginning of the results list (rather than from the beginning of this page)
url | The URL of this document as indexed by the search engine. This should be the preferred URL for this document.
title | The title extracted from the metadata on this page
caption | The dynamically generated description, which may or may not contain the description metadata extracted from this page
author | The author extracted from the metadata on this page
keyword | The list of keywords extracted from the metadata on this page
epa_collection | Currently the literal "all", for backward compatibility with Northern Light
mime-type | Contains "PDF" for PDF documents; otherwise empty
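
As an illustration of reading these elements, the sketch below parses an abridged copy of the sample packet with Python's standard xml.etree.ElementTree module. The function names and the returned dictionary keys are hypothetical. Note that title and caption can contain <b> highlight markup, so the text is gathered with itertext() rather than read from the element's text attribute alone.

```python
import xml.etree.ElementTree as ET

# Abridged from the sample xmlresults output above.
SAMPLE_RESULTS = """<search_results>
<result_list>
<result>
<document number="1">
<url>http://www.epa.gov/air/</url>
<title><b>Air</b> and Radiation | US EPA</title>
<date>2010-09-22</date>
<mime-type></mime-type>
</document>
</result>
</result_list>
</search_results>"""


def flatten_text(element):
    """Return an element's text, including text inside nested <b> tags."""
    if element is None:
        return ""
    return "".join(element.itertext()).strip()


def parse_results(xml_text):
    """Return one dictionary per <document> element in an xmlresults packet."""
    root = ET.fromstring(xml_text)
    documents = []
    for doc in root.iter("document"):
        documents.append({
            "number": int(doc.get("number")),
            "url": flatten_text(doc.find("url")),
            "title": flatten_text(doc.find("title")),
            "date": flatten_text(doc.find("date")),
            # mime-type holds "PDF" for PDF documents and is otherwise empty.
            "is_pdf": flatten_text(doc.find("mime-type")) == "PDF",
        })
    return documents


for document in parse_results(SAMPLE_RESULTS):
    print(document["number"], document["url"], document["title"])
```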

xmlkeys.xsl

xmlkeys.xsl returns only the URLs of the matching documents. Use xmlkeys when you need only the URL of each result document, since it transmits faster.

Sample xmlkeys Output

<?xml version="1.0" encoding="UTF-8" ?>
<search_results xmlns:fo="http://www.w3.org/1999/XSL/Format" DocsFound="105000" DocsReturned="10" NextStartDoc="21">
<url>http://www.epa.gov/apti/</url>
<url>http://www.epa.gov/airmarkets/</url>
<url>http://www.epa.gov/airtrends/</url>
<url>http://www.epa.gov/ord/npd/cleanair-research-intro.htm</url>
<url>http://www.epa.gov/ttn/atw/</url>
<url>http://www.epa.gov/reg5oair/</url>
<url>http://www.epa.gov/air/lead/</url>
<url>http://www.epa.gov/air/urbanair/</url>
<url>http://www.epa.gov/airquality/</url>
<url>http://www.epa.gov/airscience/</url>
</search_results>
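
A packet of this shape can be read with a few lines of Python. The following is a sketch only, using the standard xml.etree.ElementTree module; the function name and dictionary keys are hypothetical, and the attribute names are the ones shown in the sample above.

```python
import xml.etree.ElementTree as ET

# Abridged from the sample xmlkeys output above.
SAMPLE_KEYS = """<search_results xmlns:fo="http://www.w3.org/1999/XSL/Format" DocsFound="105000" DocsReturned="10" NextStartDoc="21">
<url>http://www.epa.gov/apti/</url>
<url>http://www.epa.gov/airmarkets/</url>
</search_results>"""


def parse_keys(xml_text):
    """Return the counts and URL list from an xmlkeys packet."""
    root = ET.fromstring(xml_text)
    next_start = root.get("NextStartDoc")
    return {
        "docs_found": int(root.get("DocsFound", "0")),
        "docs_returned": int(root.get("DocsReturned", "0")),
        # Assumption: the attribute may be absent or empty on the last page.
        "next_start": int(next_start) if next_start else None,
        "urls": [url.text for url in root.findall("url")],
    }


keys = parse_keys(SAMPLE_KEYS)
print(keys["docs_found"], len(keys["urls"]))
```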

EPAAddendum

EPAAddendum is included in xmlresults and xmlkeys. Use elements from EPAAddendum where available, because the Addendum will not change when the underlying search engine technology changes.

Sample EPAAddendum Content

<EPAAddendum>
<Status></Status>
<DocsFound>105000</DocsFound>
<DocsReturned>10</DocsReturned>
<QueryText>air</QueryText>
<OriginalQueryText>air</OriginalQueryText>
<SimpleQuery>air</SimpleQuery>
<VerySimpleQuery>air</VerySimpleQuery>
<Cluster>no</Cluster>
<SuggestedQuery></SuggestedQuery>
<SimpleSuggestedQuery></SimpleSuggestedQuery>
<MyEnvironmentQuery></MyEnvironmentQuery>
<PubQuery></PubQuery>
<ResultTemplate>xmlresults.xsl</ResultTemplate>
<Collection>all</Collection>
<SearchURL>gsadmz2.epa.gov</SearchURL>
<TypeOfSearch>epa</TypeOfSearch>
<Sort>term_relevancy</Sort>
<Search_FAQ>no</Search_FAQ>
<DocType>all</DocType>
<BooleOpt></BooleOpt>
<SearchIn></SearchIn>
<Weighted_search>no</Weighted_search>
<FilterClause></FilterClause>
<AreaName></AreaName>
<AreaSearchURL></AreaSearchURL>
<AreaContacts></AreaContacts>
<Referer>http://aggie.rtpnc.epa.gov:81/gsa/</Referer>
<SessionID>EF5D00BF3D67116B55BA8D2BB7317FAC</SessionID>
<SidebarTemplate>search_sidebar</SidebarTemplate>
<PageheadTemplate>epafiles_pagehead</PageheadTemplate>
<PagefootTemplate>epafiles_pagefoot</PagefootTemplate>
<Stylesheet>s/epa.css</Stylesheet>
<Pagination>
<BaseURL>typeofsearch=epa&filterclause=&max_results=100&referer=http%3A%2F%2Faggie.rtpnc.epa.gov%3A81%2Fgsa%2F&result_template=xmlresults.xsl&areaname=&areapagehead=epafiles_pagehead&areapagefoot=epafiles_pagefoot&areasidebar=search_sidebar&stylesheet=s/epa.css&sort=term_relevancy&faq=no&results_per_page=10&cluster=no&sessionid=EF5D00BF3D67116B55BA8D2BB7317FAC</BaseURL>
<Next StartDoc="11" URL="typeofsearch=epa&filterclause=&max_results=100&referer=http%3A%2F%2Faggie.rtpnc.epa.gov%3A81%2Fgsa%2F&result_template=xmlresults.xsl&areaname=&areapagehead=epafiles_pagehead&areapagefoot=epafiles_pagefoot&areasidebar=search_sidebar&stylesheet=s/epa.css&sort=term_relevancy&faq=no&results_per_page=10&cluster=no&sessionid=EF5D00BF3D67116B55BA8D2BB7317FAC&querytext=air&start=11"></Next>
<Start>1</Start>
<End>10</End>
<CurrentPage>1</CurrentPage>
<ResultsPerPage>10</ResultsPerPage>
<Page ellipsis="" number="1"></Page>
<Page URL="typeofsearch=epa&filterclause=&max_results=100&referer=http%3A%2F%2Faggie.rtpnc.epa.gov%3A81%2Fgsa%2F&result_template=xmlresults.xsl&areaname=&areapagehead=epafiles_pagehead&areapagefoot=epafiles_pagefoot&areasidebar=search_sidebar&stylesheet=s/epa.css&sort=term_relevancy&faq=no&results_per_page=10&cluster=no&sessionid=EF5D00BF3D67116B55BA8D2BB7317FAC&querytext=air&start=11" ellipsis="" number="2"></Page>
<Page …
</Pagination>
</EPAAddendum>

Label | Description
Status | The response from the query server. 1 is success. All others are fatal.
DocsFound | The total number of documents found for the query
DocsReturned | The number of documents in this page
QueryText | The search terms submitted
Sort | The method for ordering the results. Default is "term_relevancy".
DocType | The type of document requested. Default is "all". Other values are pdf and html.
Pagination:BaseURL | The URL for any request, excluding querytext and start
Pagination:Prev | The URL to request the previous page (absent for the first page)
Pagination:Next | The URL to request the next page
Pagination:Start | The number of the first document in this packet
Pagination:End | The number of the last document in this packet
Pagination:ResultsPerPage | The requested number of results per packet
Pagination:Page | One element per page returned in this response
Pagination:Page:number | The calculated page number, based on results per page and start number
Pagination:Page:URL | The URL of the page, relative to the URL of the webapp
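
The Addendum fields most useful for paging can be pulled out as sketched below, again with Python's standard xml.etree.ElementTree module. Several things here are assumptions rather than documented behavior: WEBAPP_URL is a hypothetical placeholder, the next-page URL is formed by appending the Next element's URL attribute to the webapp URL as a query string, and the ampersands in the packet are assumed to arrive escaped as &amp;amp; so that the XML parses. The raw Status value is returned as-is so the caller can decide how to treat anything other than 1.

```python
import xml.etree.ElementTree as ET

# Hypothetical endpoint: replace with the search webapp URL you are given.
WEBAPP_URL = "https://search.example.invalid/epasearch/epasearch"

# Abridged, with a shortened pagination URL for illustration.
SAMPLE_ADDENDUM = """<search_results>
<EPAAddendum>
<Status>1</Status>
<DocsFound>105000</DocsFound>
<DocsReturned>10</DocsReturned>
<Pagination>
<Next StartDoc="11" URL="querytext=air&amp;start=11"></Next>
</Pagination>
</EPAAddendum>
</search_results>"""


def read_addendum(xml_text, webapp_url=WEBAPP_URL):
    """Return the status, counts, and next-page link from EPAAddendum."""
    root = ET.fromstring(xml_text)
    addendum = root if root.tag == "EPAAddendum" else root.find("EPAAddendum")
    if addendum is None:
        raise ValueError("no EPAAddendum element in response")
    next_page = addendum.find("Pagination/Next")
    next_start = None
    next_url = None
    if next_page is not None:
        if next_page.get("StartDoc"):
            next_start = int(next_page.get("StartDoc"))
        if next_page.get("URL"):
            # Assumption: the URL attribute is a query string for the webapp.
            next_url = webapp_url + "?" + next_page.get("URL")
    return {
        "status": (addendum.findtext("Status") or "").strip(),
        "docs_found": int(addendum.findtext("DocsFound") or 0),
        "docs_returned": int(addendum.findtext("DocsReturned") or 0),
        "next_start": next_start,
        "next_url": next_url,
    }


info = read_addendum(SAMPLE_ADDENDUM)
print(info["status"], info["docs_found"], info["next_url"])
```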

Custom XML Output

If you want output formatted in some way other than what xmlresults and xmlkeys produce, you can code your own XSL stylesheet. You may reference the URL of a remote stylesheet in your result_template parameter, but the stylesheet must be hosted on an EPA server. You must go through the epasearch webapp to request native Google Search Appliance output.

The native GSA XML response is documented in the Google Search Appliance Protocol Reference.

http://nlquery.epa.gov/epasearch/xmlkeys.xsl is an example of a custom result template.
