Simple Object Access Protocol (SOAP)

Introduction to SOAP

Programming the Web

A programmable Web site is made of programmable pages, such as Microsoft Active Server Pages (ASP), Java Server Pages (JSP) and even Java Servlets. By the expression "programming the Web", we mean being able to interrogate existing pages or URLs, and have them understand the question and return the requested information. This will lead us to at least two classes of problems:

1. Platform incompatibilities

2. Component interoperability

Platform Incompatibilities

The key to the success of the Web is ubiquity. One can connect to the Internet from anywhere using any browser. As far as connectivity goes, all the physical differences between the various hardware and software platforms disappeared with the HTTP protocol. The whole industry embraced it as the standard protocol for connecting machines, and companies developed their platform-specific code on top of it. Thus the characteristics of the platform are simply a non-issue as long as we (clients) use HTTP as the networking protocol to connect to a Web server.

There are a couple of other interesting protocols, such as Microsoft DCOM and OMG IIOP. These are very good for server-to-server communication but are not up to the mark for client-to-server communication. To get the best out of DCOM, we need a configuration based on Windows NT/2000. As far as IIOP is concerned, we need an Object Request Broker (ORB) to make sure we get a really interoperable environment. Also, in a client-to-server scenario, both protocols run into firewalls and proxy servers that usually block all but a few ports. HTTP port 80 is one of the exceptions. Thus the standard HTTP protocol is the best way to overcome the problem of platform incompatibilities for client-server Internet applications. We have described a new component technology here.

Thus HTTP is good for transporting the call from point to point, regardless of the platform. XML is suited to transporting the method call from client to server, once again regardless of the platform and the component technology used to write the server. Once the server has understood and processed the request, the return data will be packed in XML and transported back to the client through the same mechanism. To make this pattern a universal schema to do remote procedure invocation, we just need two additional steps:

1. Formalizing the XML schema to describe a method call

2. Arranging a run-time to process it

The XML schema, the structure of the XML protocol to identify the method and parameters to call, is the Simple Object Access Protocol (SOAP).
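The two steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not SOAP itself: the XML layout (root element named after the method, one child per parameter), the `add` method, and the `METHODS` registry are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def encode_call(method, params):
    """Step 1: describe a method call as XML.

    The root element names the method; each child names one parameter.
    (This layout is an assumption of the sketch, not the SOAP schema.)
    """
    root = ET.Element(method)
    for name, value in params.items():
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def add(a, b):
    """Hypothetical server-side method."""
    return int(a) + int(b)


# Hypothetical registry mapping method names to server-side callables.
METHODS = {"add": add}


def dispatch(xml_text):
    """Step 2: a minimal run-time that parses the call, invokes the
    method, and packs the return data back into XML."""
    call = ET.fromstring(xml_text)
    args = {child.tag: child.text for child in call}
    result = METHODS[call.tag](**args)
    reply = ET.Element(call.tag + "Response")
    ET.SubElement(reply, "return").text = str(result)
    return ET.tostring(reply, encoding="unicode")


request_xml = encode_call("add", {"a": 2, "b": 3})
response_xml = dispatch(request_xml)
print(request_xml)   # <add><a>2</a><b>3</b></add>
print(response_xml)  # <addResponse><return>5</return></addResponse>
```

Neither side needs to know what platform or component technology the other uses; both only need to agree on the XML.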

What is SOAP

SOAP is a simple, lightweight, XML-based protocol that facilitates the exchange of information in a decentralized, distributed environment. It is firewall-friendly and platform independent, and it inherits flexibility and extensibility from XML. It can be used not only as a data carrier but also, most importantly, for invoking remote procedures on servers, services, components and objects written in any language and running on any platform.

SOAP simplifies information exchange and provides interoperability across a variety of platforms because it is not tied to any specific object model. It has garnered a lot of interest lately, especially since the World Wide Web Consortium (W3C) undertook its standardization and Microsoft incorporated the protocol into its products, such as DNA 2000 and Microsoft .NET software. SOAP is the latest protocol to obtain world-wide acceptance for distributed computing and two-way remote collaborative software development. Above all, it can pass through firewalls and proxy servers.

It enables interoperability by providing a generalized specification for invoking methods on objects and components using the standard Hypertext Transfer Protocol (HTTP) and Extensible Markup Language (XML) data formats. Even though SOAP is mostly associated with HTTP as the underlying transport protocol, it can also be bound to asynchronous protocols such as SMTP and MSMQ. Finally, because SOAP travels over HTTP, SOAP traffic can be load balanced with the same techniques used for other HTTP traffic. It promises to be a major industry standard in the future, having finally obtained support from 'the other block' of IBM and even Sun Microsystems.

SOAP answers four questions: how to structure a message, how to encode it, how to send a remote procedure request and receive the response, and how to carry a SOAP message in the body of an HTTP message. To this end, the SOAP specification defines:

1. an envelope, a framework for expressing what is in a message, who should handle the individual parts of the message, and whether handling each component is optional or mandatory;

2. encoding rules, a serialization mechanism for exchanging instances of application-defined data types; and

3. the SOAP remote-procedure-call (RPC) representation, a convention for representing RPCs and responses.

SOAP Examples

Figure 1 is an example of a SOAP message embedded in an HTTP request. In this example, a GetCurrentTime SOAP request goes to a time service. The first five lines are standard HTTP:

    1. a POST request, that is, a client request to the server;
    2. the host name;
    3. a specification of the content type as "text/xml" and a character set of "utf-8";
    4. a content-length specification; and
    5. a SOAPAction header field indicating the intent of the HTTP request.

POST is a standard HTTP verb. An HTTP/1.1 request must carry a host identifier, and a request with a body, such as this one, also carries content-type and content-length specifications. Firewalls can use the SOAPAction field's content to filter SOAP request messages. The information to which the SOAPAction uniform resource identifier (URI) points must match the corresponding headers and tags in the SOAP payload; otherwise the firewall's security function will reject the request.

The next eleven lines constitute the SOAP message. The first line is the SOAP envelope tag. The second through fifth lines contain the namespace identifiers for the SOAP envelope and serialization. The sixth through tenth lines contain the body of the SOAP message. The GetCurrentTime element contains an element called "city". By specifying a city, the SOAP message becomes a request to return the current time for that city.
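A request of the shape described for Figure 1 can be sketched in Python. This is not the original figure: the service URL, the `urn:example:time-service` namespace, the SOAPAction value and the city name are placeholders chosen for illustration. Only the two `schemas.xmlsoap.org` namespace URIs come from SOAP 1.1. The request is built but not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SOAP_ENC = "http://schemas.xmlsoap.org/soap/encoding/"  # SOAP 1.1 encoding rules
TIME_NS = "urn:example:time-service"                    # hypothetical service namespace


def build_time_request(url, city):
    """Build (but do not send) an HTTP POST carrying a GetCurrentTime call."""
    ET.register_namespace("SOAP-ENV", SOAP_ENV)
    ET.register_namespace("m", TIME_NS)

    # SOAP payload: Envelope > Body > GetCurrentTime > city
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    envelope.set(f"{{{SOAP_ENV}}}encodingStyle", SOAP_ENC)
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{TIME_NS}}}GetCurrentTime")
    ET.SubElement(call, "city").text = city
    payload = ET.tostring(envelope, encoding="utf-8", xml_declaration=True)

    # HTTP part: the verb, content type, content length and SOAPAction.
    # The Host header is derived from the URL when the request is sent.
    headers = {
        "Content-Type": 'text/xml; charset="utf-8"',
        "Content-Length": str(len(payload)),
        "SOAPAction": f'"{TIME_NS}#GetCurrentTime"',  # hypothetical action URI
    }
    return urllib.request.Request(url, data=payload, headers=headers, method="POST")


request = build_time_request("http://www.example.com/TimeService", "Chicago")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing `request` to `urllib.request.urlopen` would transmit it; that step is left out because the endpoint is fictitious.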

Figure 2 is a sample response to the previous HTTP request. Similar to the request, this response consists of a SOAP message embedded in an HTTP response. The response message's HTTP portion contains a status of "200 OK" indicating normal request completion. The response returns after the server invokes the requested remote procedure and the server time application obtains the current time value in the specified city. The response includes a GetCurrentTimeResponse element in the SOAP reply containing a time element specifying the current time.
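On the client side, reading such a reply amounts to locating the time element inside the SOAP body. The sketch below assumes a response of the shape described for Figure 2; the sample XML, the `urn:example:time-service` namespace and the time value are invented for illustration and are not taken from the figure.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
TIME_NS = "urn:example:time-service"                    # hypothetical service namespace

# Hypothetical SOAP payload of a "200 OK" response.
SAMPLE_RESPONSE = f"""<SOAP-ENV:Envelope xmlns:SOAP-ENV="{SOAP_ENV}">
  <SOAP-ENV:Body>
    <m:GetCurrentTimeResponse xmlns:m="{TIME_NS}">
      <time>14:05:00</time>
    </m:GetCurrentTimeResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""


def extract_time(soap_xml):
    """Return the text of the time element in a GetCurrentTimeResponse."""
    envelope = ET.fromstring(soap_xml)
    prefixes = {"env": SOAP_ENV, "m": TIME_NS}
    node = envelope.find("env:Body/m:GetCurrentTimeResponse/time", prefixes)
    if node is None:
        raise ValueError("no GetCurrentTimeResponse/time element in SOAP body")
    return node.text


current_time = extract_time(SAMPLE_RESPONSE)
print(current_time)  # 14:05:00
```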

The SOAP Envelope

A SOAP message contains the envelope, an optional header, and a body. The envelope is the top element of the XML document representing the message. It may contain namespace declarations. The header provides a way to extend a message without prior agreement between the communicating parties; typical extensions that may be encoded as header entries include authentication and transaction management. If present, the header appears as the first child element within the envelope. The encodingStyle global attribute may appear on any element and applies to that element's contents and its children. Its value consists of one or more URIs identifying the rules for deserializing the SOAP message. The URIs appear in order, from most to least specific.

The SOAP body element is for exchanging information with the message recipient. Typically, it carries RPC calls and error reports. The body element's child elements, or body entries, are encoded as independent elements and are normally defined by the user. The W3C SOAP 1.1 document defines a fault element, a special case of a body entry that can carry error and status information.
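A receiver can inspect these parts with an ordinary XML parser. The sketch below is a minimal illustration: the sample message, the `urn:example:transactions` namespace and the fault text are made up, while the `mustUnderstand` attribute and the `faultcode` and `faultstring` elements are the names SOAP 1.1 uses for mandatory header entries and fault details.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
NS = {"env": SOAP_ENV}

# Hypothetical message with one mandatory header entry and a fault in the body.
FAULT_MESSAGE = f"""<env:Envelope xmlns:env="{SOAP_ENV}">
  <env:Header>
    <t:Transaction xmlns:t="urn:example:transactions" env:mustUnderstand="1">5</t:Transaction>
  </env:Header>
  <env:Body>
    <env:Fault>
      <faultcode>env:Server</faultcode>
      <faultstring>Time service unavailable</faultstring>
    </env:Fault>
  </env:Body>
</env:Envelope>"""


def mandatory_headers(envelope):
    """List the tags of header entries the recipient is required to process."""
    header = envelope.find("env:Header", NS)
    if header is None:  # the header is optional
        return []
    flag = f"{{{SOAP_ENV}}}mustUnderstand"
    return [entry.tag for entry in header if entry.get(flag) == "1"]


def read_fault(envelope):
    """Return (faultcode, faultstring) if the body carries a fault, else None."""
    fault = envelope.find("env:Body/env:Fault", NS)
    if fault is None:
        return None
    return fault.findtext("faultcode"), fault.findtext("faultstring")


envelope = ET.fromstring(FAULT_MESSAGE)
required = mandatory_headers(envelope)
fault = read_fault(envelope)
print(required)  # ['{urn:example:transactions}Transaction']
print(fault)     # ('env:Server', 'Time service unavailable')
```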

SOAP Encoding

The SOAP 1.1 specification implements data typing as part of its encoding. The SOAP type system is a simplification of that found in many programming languages. A type may be either simple or a compound constructed of multiple parts. The rules for encoding types in SOAP are summarized below:

1. A value may consist of a string, number, date or a composite of several such primitive values.

2. A simple value is one without named parts, such as a string, an integer or an enumerated value.

3. A simple type is a class of simple values.

4. A compound value is an aggregate of relations to other values. For example, one can assign a type to the aggregate of values that represents a purchase order, stock report, or street address.

5. A compound type is a class of compound values.

6. Within a compound value, each related value is distinguished by its accessor: a role name, an ordinal, or both.

7. An array is a compound value in which ordinal position serves as the only distinction among member values.

8. A struct is a compound value in which accessor name is the only distinction among member values and no accessor has the same name as any other.
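The difference between simple values, structs and arrays can be made concrete with a small serializer. This is a simplified sketch rather than the full SOAP encoding: it omits the type attributes the specification allows, and the element name `item` for array members as well as the purchase order data are assumptions of the example.

```python
import xml.etree.ElementTree as ET


def encode(name, value):
    """Serialize a Python value under the accessor `name`."""
    elem = ET.Element(name)
    if isinstance(value, dict):
        # Struct: members are distinguished only by accessor name.
        for key, member in value.items():
            elem.append(encode(key, member))
    elif isinstance(value, (list, tuple)):
        # Array: members are distinguished only by ordinal position,
        # so every member uses the same (arbitrary) element name.
        for member in value:
            elem.append(encode("item", member))
    else:
        # Simple value: no named parts, just text.
        elem.text = str(value)
    return elem


# Hypothetical compound value: a purchase order.
order = {"customer": "Acme", "quantities": [3, 7]}
xml_text = ET.tostring(encode("PurchaseOrder", order), encoding="unicode")
print(xml_text)
# <PurchaseOrder><customer>Acme</customer><quantities><item>3</item><item>7</item></quantities></PurchaseOrder>
```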

Remote Procedure Calls

The SOAP 1.1 specification defines a standard for RPCs and responses. To invoke a method, software designers must include the target object's URI, the method name, and the method's parameters in the SOAP message. Optionally, programmers can include the method signature and additional header data. The SOAP body element carries both RPC method calls and responses. A method invocation is modeled as a single struct containing an accessor for each "in" or "in/out" parameter. A method response is modeled as a single struct containing an accessor for the return value and each "out" or "in/out" parameter.
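The two structs can be sketched as follows. The `Divide` method, its parameter names and the `urn:example:calculator` namespace are hypothetical; naming the response struct after the method with "Response" appended is a common convention rather than a requirement.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
CALC_NS = "urn:example:calculator"                      # hypothetical service namespace


def _envelope(struct):
    """Wrap a struct in Envelope > Body."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    body.append(struct)
    return envelope


def rpc_request(method, in_params):
    """Invocation: one struct with an accessor per "in" or "in/out" parameter."""
    call = ET.Element(f"{{{CALC_NS}}}{method}")
    for name, value in in_params.items():
        ET.SubElement(call, name).text = str(value)
    return _envelope(call)


def rpc_response(method, return_value, out_params):
    """Response: one struct with the return value first, followed by an
    accessor per "out" or "in/out" parameter."""
    reply = ET.Element(f"{{{CALC_NS}}}{method}Response")
    ET.SubElement(reply, "return").text = str(return_value)
    for name, value in out_params.items():
        ET.SubElement(reply, name).text = str(value)
    return _envelope(reply)


# Divide(dividend [in], divisor [in], remainder [out]) returns the quotient.
request = rpc_request("Divide", {"dividend": 17, "divisor": 5})
quotient, remainder = divmod(17, 5)
response = rpc_response("Divide", quotient, {"remainder": remainder})
print(ET.tostring(request, encoding="unicode"))
print(ET.tostring(response, encoding="unicode"))
```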

SOAP and BizTalk

BizTalk is an initiative meant to facilitate document exchange over the Internet. At its heart, a BizTalk client works in much the same way as a SOAP client. Both rely on HTTP as the preferred transport protocol and both use a request-response message pattern. However, SOAP is more general than BizTalk or other business-to-business (B2B) initiatives like OASIS. As a general-purpose mechanism for remote method invocation, SOAP can certainly be used to transport BizTalk information. In fact, asking a remote BizTalk server to translate from one XML schema to another is an example of a remote method invocation on a server object that exposes a well-known, predefined interface. SOAP is not limited to the request-response message pattern. If it uses an asynchronous protocol like SMTP or MSMQ, the message pattern is necessarily different. In that case, the call is a one-way method, also known as the fire-and-forget message pattern.
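As a rough illustration of the one-way pattern, the sketch below places a SOAP envelope in the body of an e-mail message using Python's standard `email` package. It is not a standardized SMTP binding: the addresses, the subject line and the `urn:example:time-service` namespace are placeholders, and the message is only constructed, not sent.

```python
from email.message import EmailMessage

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
TIME_NS = "urn:example:time-service"                    # hypothetical service namespace

# Hypothetical SOAP payload to be delivered asynchronously.
SOAP_XML = f"""<SOAP-ENV:Envelope xmlns:SOAP-ENV="{SOAP_ENV}">
  <SOAP-ENV:Body>
    <m:GetCurrentTime xmlns:m="{TIME_NS}">
      <city>Chicago</city>
    </m:GetCurrentTime>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""


def one_way_message(sender, recipient, soap_xml):
    """Wrap a SOAP envelope in a mail message with a text/xml body."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "SOAP one-way message"  # placeholder subject
    msg.set_content(soap_xml, subtype="xml")
    return msg


message = one_way_message("client@example.com", "time-service@example.com", SOAP_XML)
print(message.get_content_type())  # text/xml
```

Handing `message` to `smtplib.SMTP(...).send_message` would deliver it; the sender does not wait for a reply, which is what makes the pattern fire-and-forget.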

SOAP Security

SOAP is a sort of XML dialect spoken on top of HTTP. As such, it is simply another payload that can be carried via HTTP. Lack of built-in security is sometimes listed as one of the disadvantages of SOAP, although from a security point of view SOAP is certainly no more dangerous than HTTP itself. Whatever vulnerabilities we face with plain HTTP packets, we have to cope with when using SOAP. On the other hand, any security feature we can add to HTTP can also be added to SOAP; for example, we can employ SSL and HTTPS. Even so, robust security features still need to be developed for SOAP itself.

SOAP Applications

As SOAP can send messages from one object to another across the Internet without knowing what type of object is sending or receiving the message, no distributed-object architecture is needed. SOAP can also be implemented in any programming language. SOAP is potentially useful for various generic service-provider functions. For instance, an application service provider could develop a SOAP implementation of a customer authentication function and then sell it to e-business operators, who would incorporate it into their Web sites.

A wire protocol like SOAP is the key element for programming the Web. Basically, it enables remote access to internal business systems over the Internet. Any URL can be seen as a service publicly available over the Internet. This is called a web service. Thus a web service is a middle-tier business component exposed via standard web protocols. A web service is a new type of component in the sense that it is reachable by anyone and from anywhere. Organizations can get considerable value from this. Integrating existing applications becomes easier, and integrating with other vendors' services is possible irrespective of their technologies and platforms. SOAP plays a critical role in web service design, and hence SOAP represents an evolution in the use of the Internet.

Although HTTP is the most common protocol used for SOAP, it is not the only possibility. Session Initiation Protocol (SIP), an Internet telephony protocol, can also be used to transport SOAP messages. SIP nodes could then request services from remote nodes, such as network servers and application service brokers. An even more ambitious use of SOAP with SIP would be to provide an RPC mechanism for call-processing scripts, thus providing full call-setup and call-connection capability. Much work remains before SOAP can become an industrial-strength protocol. For example, the SOAP specification does not say anything about bidirectional communication. Even so, SOAP and its implementers can look to a future that is bubbling with possibilities.
