Web Services, XML, and SOAP: PHP classes and library functions, along with customized function libraries of your own, let you construct sites with increasing functionality and the complexity that goes with it. But a maxim of web development is never to reinvent the wheel. If we can get access to other people's more complex code (i.e., when others welcome our access and offer it freely), we can extend our own code with relative ease. The web is full of prebuilt services that we can use, and XML and SOAP are two tools that let us do this. This chapter explores both, applying them to an extension of the book website that uses a web service provided by Amazon.
Goals for this chapter: 1) Understand XML and SOAP. 2) Use them in an application. Communication between a local site and Amazon can use XML over HTTP, an approach called Representational State Transfer, or REST. A request to Amazon returns an XML document as the response; we then parse the XML document and choose pieces to display on our own page. This relies on the highly structured nature of the document, and structure is what XML is all about. We'll also look at SOAP (originally Simple Object Access Protocol, but it isn't simple; it's just a name now), which allows your page to interact with Amazon's SOAP server to obtain the same information. We'll then build a backend that can use either of these alternative approaches.
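A REST-style request is just a URL with query parameters. The sketch below shows how such a URL might be assembled in PHP; the endpoint and the parameter names are hypothetical placeholders, not Amazon's actual interface.

```php
<?php
// Hypothetical endpoint and parameter names, for illustration only;
// the real Amazon interface defines its own.
function buildRestUrl(string $baseUrl, array $params): string
{
    // http_build_query() URL-encodes each value and joins pairs with '&'.
    return $baseUrl . '?' . http_build_query($params);
}

$url = buildRestUrl('http://webservice.example.com/xml', [
    'type'    => 'search',
    'keyword' => 'php & mysql',
    'page'    => 1,
]);

echo $url, "\n";
// The XML response could then be fetched with file_get_contents($url)
// and handed to an XML parser.
```

Letting http_build_query() do the encoding avoids broken URLs when a keyword contains spaces or an ampersand.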
XML: The Extensible Markup Language is the general format of which XHTML is one application, so you have been using XML all along. Its home base is the W3C. It is a tag-based text format that provides structure for documents. The rules for a particular XML vocabulary are defined through a Document Type Definition (DTD) or through a Schema, which can be accessed through an XML namespace; what this means is that you can create your own set of XML rules, if you are so inclined, and then use them in a document. The XHTML set of rules, for instance, says that <body> can contain things like divs, paragraphs, or images, and that images can't contain any other element but do have a set of acceptable attributes (src, width, id, etc.). An example of an XML document is given in listing 33.1, p. 820-22; this one describes the first edition of the text, complete with tags for authors, publisher, images from Amazon, and browsable topics. The rules are constructed as sets of nested tags; <authors>, for instance, contains <author>, which contains the name of an individual author, etc.
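To make the nesting concrete, here is a minimal sketch of a small document in the spirit of the book example, read with PHP's SimpleXML. Only <authors> and <author> come from the notes; the other tag names, the attribute, and all the values are made up for illustration.

```php
<?php
// A made-up document; tag names other than <authors>/<author>
// are assumptions, as are all the values.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<book edition="1">
  <title>Example Title</title>
  <authors>
    <author>First Author</author>
    <author>Second Author</author>
  </authors>
  <publisher>Example Press</publisher>
</book>
XML;

$book = simplexml_load_string($xml);

// Child elements are reached as properties, attributes as array keys.
$authors = [];
foreach ($book->authors->author as $author) {
    $authors[] = (string) $author;
}

echo (string) $book->title, ' (edition ', (string) $book['edition'], ")\n";
echo implode(', ', $authors), "\n";
```

Because the structure is predictable, the code can walk straight to the pieces it wants without searching the text.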
XML documents can be parsed and checked against the rules. A document that is structurally correct (tags properly nested, every opening tag matched by a closing tag, all attribute values enclosed in quotation marks, and so on) is said to be well-formed; a well-formed document that also conforms to its declared rules is said to be valid. Well-formed means that the document follows the W3C XML syntax rules: it has one or more elements, with a single root element that contains all the rest (e.g., the <html> element), and its elements are properly nested. Valid means that the document has a DTD (or schema) with which it complies. XML compliance should result in more durable, multi-platform code that executes efficiently.
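The difference between well-formed and valid can be checked in PHP with the DOM extension. This is a sketch under simple assumptions: the helper names are hypothetical, and the tiny DTD is invented for the example rather than taken from the book.

```php
<?php
// Returns true when the string parses as XML (well-formed).
function isWellFormed(string $xml): bool
{
    $previous = libxml_use_internal_errors(true); // collect errors quietly
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

// Returns true when the string is well-formed AND complies with the
// DTD declared in its DOCTYPE (valid).
function isValid(string $xml): bool
{
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml) && $doc->validate();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

// Internal DTD: <authors> must contain one or more <author> elements.
$dtd = '<!DOCTYPE authors [<!ELEMENT authors (author+)><!ELEMENT author (#PCDATA)>]>';

$good      = $dtd . '<authors><author>A. Writer</author></authors>';
$invalid   = $dtd . '<authors><editor>E. Ditor</editor></authors>';
$malformed = '<authors><author>A. Writer</authors>';

var_dump(isWellFormed($good), isValid($good));
var_dump(isWellFormed($invalid), isValid($invalid));
var_dump(isWellFormed($malformed));
```

The second document parses cleanly but breaks the DTD, so it is well-formed without being valid; the third has a mismatched closing tag and fails at the parsing stage.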
Web Services: You can build an interface to existing online applications, analogous to instantiating a class so that you can use its methods. The chapter uses the Amazon interface, but you can also access the Google interface, for example, to implement a Google Maps feature. A 10-page list of web services is available here.
SOAP: SOAP provides an XML-based protocol for requesting and receiving web services; that is, SOAP sends XML messages. SOAP messages are identified as XML (they begin with <?xml ...); the message is wrapped in an envelope element (<SOAP-ENV:Envelope>), which contains a body (<SOAP-ENV:Body>). A request to Amazon is presented in listing 33.2, p. 825. There is a SOAP library of functions to facilitate sending the request and decoding the response.
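The envelope-and-body layout can be seen in a minimal message. In this sketch the envelope namespace is the standard SOAP 1.1 one, but the operation name, its namespace, and its parameters are hypothetical and do not reproduce listing 33.2.

```php
<?php
// The envelope namespace is the SOAP 1.1 one; the operation and its
// parameters (KeywordSearch, keyword, page) are hypothetical.
$envelope = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Body>
    <KeywordSearch xmlns="urn:example-service">
      <keyword>php</keyword>
      <page>1</page>
    </KeywordSearch>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
XML;

$soapNs    = 'http://schemas.xmlsoap.org/soap/envelope/';
$serviceNs = 'urn:example-service';

$doc = simplexml_load_string($envelope);

// Elements in a namespace are reached through children(<namespace URI>).
$body   = $doc->children($soapNs)->Body;
$call   = $body->children($serviceNs)->KeywordSearch;
$params = $call->children($serviceNs);

echo $doc->getName(), "\n";            // local name of the root element
echo (string) $params->keyword, "\n";  // value carried inside the body
```

In practice a SOAP library builds and decodes these messages for you, so hand-parsing like this is mainly useful for seeing what travels over the wire.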
WSDL ("wiz-dul"): The Web Services Description Language describes the interface for a particular web service (e.g., Amazon's is here; you'll see the document tree in your browser). (More: W3C Draft; not yet a W3C recommendation, so it may change.)
Solution Components
In this chapter, we need a shopping cart for customers, and code to connect to Amazon via REST or SOAP. We also need code to parse the message received from Amazon; to improve performance, we might want to implement caching, too. We also want to pass the final contents of the cart back to Amazon for final checkout.
The shopping cart was built in chapter 27. It will do.
To interface with Amazon, we'll use the Amazon Web Services Developer's Kit; this was the link as of 4/23/07. You will need to sign up with Amazon for a developer's token, and if you want to collect a small commission from your sales, ask for an Amazon Associate ID. The downloaded developer's kit describes how the interface works and it has PHP code samples. Note the license limitations, including
- limit of one request per second
- must cache data coming from Amazon
- 24-hour limit on most cached data (3 months on some attributes)
- caches of prices for more than 1 hour require a disclaimer
- limits on linking back to Amazon pages, use of images, etc.
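The one-request-per-second limit can be respected with a small throttle placed in front of every live request. This is only a sketch; the function names are hypothetical and the developer's kit may offer its own mechanism.

```php
<?php
// Seconds to wait before the next request so that calls are spaced at
// least $minInterval seconds apart. Hypothetical helper.
function secondsToWait(float $lastRequest, float $now, float $minInterval = 1.0): float
{
    return max(0.0, $minInterval - ($now - $lastRequest));
}

// Sleep if needed, then record the time of the request about to be made.
function throttle(float &$lastRequest): void
{
    $wait = secondsToWait($lastRequest, microtime(true));
    if ($wait > 0) {
        usleep((int) round($wait * 1000000));
    }
    $lastRequest = microtime(true);
}

$last = 0.0;      // no request made yet, so the first call does not sleep
throttle($last);  // a live request would follow this call
```

Keeping the arithmetic in its own function makes the rule easy to check without actually sleeping.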
Parsing XML: PHP has a SimpleXML library, which you can use to parse XML.
Using SOAP with PHP: Use NuSOAP, a single file that you download (available for free use under the "Lesser GPL") and include with require_once().
Caching: We'll develop a means to store and reuse the data that is downloaded, until it has passed its use-by date.
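A file-based cache with a use-by date can be sketched as follows. The helper names and the one-hour lifetime are assumptions for illustration; the book's own cache functions may be organized differently.

```php
<?php
// Hypothetical helper names; the book's own cache functions may differ.
const CACHE_MAX_AGE = 3600; // seconds; assumed value for illustration

// Return cached contents if the file exists and is fresh, else null.
function cacheRead(string $file, int $maxAge = CACHE_MAX_AGE): ?string
{
    if (!is_file($file)) {
        return null;
    }
    if (time() - filemtime($file) > $maxAge) {
        return null; // past its use-by date
    }
    $data = file_get_contents($file);
    return $data === false ? null : $data;
}

function cacheWrite(string $file, string $data): void
{
    file_put_contents($file, $data, LOCK_EX);
}

// Serve from the cache when possible; otherwise call $fetch and store.
function cached(string $file, callable $fetch, int $maxAge = CACHE_MAX_AGE): string
{
    $data = cacheRead($file, $maxAge);
    if ($data === null) {
        $data = $fetch();
        cacheWrite($file, $data);
    }
    return $data;
}

$file  = sys_get_temp_dir() . '/demo_cache_' . uniqid() . '.xml';
$calls = 0;
$fetch = function () use (&$calls): string {
    $calls++;                       // stands in for a live request
    return '<result>fresh</result>';
};

$first  = cached($file, $fetch);
$second = cached($file, $fetch);    // served from the file this time
echo $calls, "\n";
unlink($file);
```

The file's modification time doubles as the timestamp, so no separate bookkeeping is needed to decide when an entry has expired.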
Overview of the Solution
The path through the application is outlined in 5 figures (33.1-5, p. 828-32). They include
- 33.1—Opening Screen with Selected Categories to choose; one of the categories is displayed by default, providing links to 10 books.
- 33.2—Book details page, for books selected from the opening screen.
- 33.3—Response page for search (item at top of all pages).
- 33.4—Display of cart contents page.
- 33.5—Amazon page to process final selections.
A complete list of files is given in table 33.1 (p. 832-3). You'll also need the nusoap.php file (available in the Common files for chapter 33).
Index.php: The core application is in listing 33.3, p. 833-5. It uses the same event-driven approach as in chapter 29 (i.e., an action parameter appended to the URL selects a branch of a switch/case structure that organizes the page). Basic features include:
- Use of $_SESSION['cart'] to contain the shopping cart, as before (see use of session_register(), which was deprecated in PHP 5.3 and removed in PHP 5.4).
- Several require_once() calls, including constants.php (listing 33.4, p. 836), which sets up constants for the SOAP connection to Amazon and information for the cache for downloaded data.
- Initializations based on any form inputs in $_GET or $_POST, and values for Amazon's categories (product categories are called "modes"; categories within a product type are called "nodes," and the current one is held in $browseNode).
- Input data is cleaned up (bottom, p. 837); input data from the customer is used to create cache file names, so these can't contain ".." or "/", for example.
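One way to make customer input safe for use in a cache file name is to allow only a known-good set of characters. This is a sketch of that whitelist idea with a hypothetical function name; it is not the book's code, which may instead strip specific sequences.

```php
<?php
// Hypothetical helper: keep only characters that are safe in a file
// name, so values such as "../etc/passwd" cannot escape the cache dir.
function safeCacheKey(string $input): string
{
    // Replace anything other than letters, digits, '_' and '-'.
    $clean = preg_replace('/[^A-Za-z0-9_-]/', '_', $input);
    return $clean === '' ? '_' : $clean;
}

echo safeCacheKey('../etc/passwd'), "\n";
echo safeCacheKey('php mysql'), "\n";
```

A whitelist is stricter than removing ".." and "/" alone, at the cost that two different inputs can map to the same file name.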
- Adding or dropping items from the cart, or emptying it (part of the tidying up process from possible form entries coming back to the page).
- Include for topbar.php, which brings in headers, style sheet, and a call to ShowSmallCart() (below).
- Main Events, including these possible actions (table 33.2, p. 838), which are then discussed at length:
  - browsenode—(default action) shows books in a category
  - detail—details of a particular book
  - image—shows larger image of book cover
  - search—shows results of search
  - addtocart, deletefromcart, emptycart, showcart—obvious
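The event-driven organization amounts to a switch on the action value. In this sketch the action names come from the list above, while the function name, the ASIN request key, and the handler bodies are placeholders rather than the book's code.

```php
<?php
// Handler bodies are placeholders; the real ones render pages.
function dispatch(array $request): string
{
    // Fall back to the default action when none is supplied.
    $action = $request['action'] ?? 'browsenode';

    switch ($action) {
        case 'detail':
            // 'ASIN' as the request key is an assumption.
            return 'show details for ASIN ' . ($request['ASIN'] ?? '?');
        case 'image':
            return 'show large cover image';
        case 'search':
            return 'show search results';
        case 'addtocart':
        case 'deletefromcart':
        case 'emptycart':
        case 'showcart':
            return 'show cart';
        case 'browsenode':
        default:
            return 'show books in category';
    }
}

echo dispatch(['action' => 'detail', 'ASIN' => 'B000EXAMPLE']), "\n";
echo dispatch([]), "\n";
```

Routing unknown values to the default branch means a mistyped or tampered action still produces a sensible page.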
Showing books—browsenode(): This action shows a list of categories, names the current category, and produces a page (10 items) of books in that category. Function showCategories() (listing 33.5, p. 839-40) uses a hand-coded array of categories, declared in constants.php, that maps Amazon's browse node numbers to names; the category names are displayed, including the name of the current node (category). ShowBrowseNode() (listing 33.6, p. 841) gets an AmazonResultSet object from the function getARS() (listing 33.7, p. 841-2) and displays it; this is the key to the whole link to Amazon. There are two possible ways to get the data, either from the cache (Amazon requires caching, and the cache is examined first) or live from Amazon; to go live, you instantiate a member of the AmazonResultSet class (listing 33.8, p. 842-7) and use one of its methods, determined by the $type parameter (set to browse, asin (for a particular book), or search (keyword search)). The $type value selects the method: ASINSearch() for an individual item, browseNodeSearch() for a category, or keywordSearch() for a keyword search.
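The selection of a method by $type can be sketched with a stand-in class. The three method names come from the notes; the stub class, the fetchLive() function, the signatures, and the bodies are assumptions, and the cache check that precedes a live request is left out.

```php
<?php
// Stub standing in for the book's AmazonResultSet class; the method
// names come from the text, the bodies and signatures are assumptions.
class ResultSetStub
{
    public $log = [];

    public function ASINSearch($asin)       { $this->log[] = "asin:$asin"; }
    public function browseNodeSearch($node) { $this->log[] = "browse:$node"; }
    public function keywordSearch($keyword) { $this->log[] = "search:$keyword"; }
}

// Pick the method from $type, as the text describes.
function fetchLive(string $type, string $value): ResultSetStub
{
    $ars = new ResultSetStub();
    switch ($type) {
        case 'asin':
            $ars->ASINSearch($value);
            break;
        case 'browse':
            $ars->browseNodeSearch($value);
            break;
        case 'search':
            $ars->keywordSearch($value);
            break;
        default:
            throw new InvalidArgumentException("Unknown type: $type");
    }
    return $ars;
}

$ars = fetchLive('browse', '5');
echo $ars->log[0], "\n";
```

Rejecting an unknown $type outright keeps a typo from silently returning an empty result set.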
AmazonResultSet.php (class): This class handles the connection to Amazon and contains the three functions for browsing a category, looking up an individual item, or searching by keyword. It connects by either REST or SOAP (set in constants.php). Browsing is handled by function browseNodeSearch() (listing 33.9, p. 848-9). If this is done by REST (XML over HTTP), the algorithm is discussed on p. 849-54; if by SOAP, the discussion is on p. 854-62. Both paths produce an array of product information, which can be added to a shopping cart; this is done with cartfunctions.php (listing 33.13, p. 859-62). In turn, the cart page provides a form (detail, p. 862) whose action points back to Amazon, passing an array of items and quantities along with the Amazon Associate ID. This is one-way, meaning that you can't easily go back and forth between your shopping cart and Amazon without creating duplicate cart items.