Thursday, July 4, 2013

Sending large attachments via SOAP and MTOM in Java

     Sometimes you need to pass a large chunk of unstructured (possibly even binary) data via SOAP protocol — for instance, you wish to attach a file to a message. The default way to do this is to pass the data in an XML element with base64Binary type. What it effectively means is, your data will be Base64-encoded and passed inside the message body. Not only your data gets enlarged by about 30%, but also any client or server that sends or receives such message will have to parse it entirely which may be time and memory consuming on large volumes of data.

     To solve this problem, the MTOM standard was defined. Basically it allows you to pass the content of a base64Binary block outside of the SOAP message, leaving a simple reference element instead. As for the correspondent HTTP binding, the message is transferred as a SOAP with attachments with a multipart/related content type. I won't go into the details here, you may learn it all straight from the above mentioned standards and RFCs.

     The tricky part is, although we've disposed of a 30% volume overhead by passing the data outside of the message, the standards themselves don't specify the ways of processing the messages by the implementations of clients and servers — whether the messages should be completely read into memory with all their attachments during sending and receiving or offloaded on external storage. By default, the implementations (including Java's SAAJ) usually read the attachments completely into memory, thus causing a possibility of running out of memory on large files or heavy-loaded systems. In Java, this is usually signified by a "java.lang.OutOfMemoryError: Java heap space" error.

     In this post I will demonstrate a simple client-server application that can transfer SOAP attachments of arbitrary volume with disk offloading, using Apache CXF on the client and Oracle's SAAJ implementation (a part of JDK 6+) on the server. This will require some tuning for the mentioned frameworks. The complete code of the application is available on GitHub.

     First, we will place the common files (XSD and WSDL) in a separate project, as they will be used by both client and sever. The WSDL schema of the service is relatively straightforward: we have a port with a single operation that consists of a SimpleRequest request and a SimpleResponse response from the server. The file is transferred in the request to the server. The XSD schema of request and response is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<s:schema elementFormDefault="qualified"

    <s:element name="SampleRequest">
            <s:documentation>Service request</s:documentation>
                <s:element name="text" type="s:string" />
                <s:element name="file" type="s:base64Binary" xmime:expectedContentTypes="*/*" />

    <s:element name="SampleResponse">
            <s:documentation>Service response</s:documentation>
            <s:attribute name="text" type="s:string" />

     Take a note of the imported xmime schema, and the usage of xmime:expectedContentTypes="*/*" attribute on a binary data element. This enables us to generate correct JAXB code out of this schema, because by default the base64Binary element corresponds to a byte[] array field in the JAXB-mapped class. But as we'll see, the expectedContentTypes attribute alters the generation of the class:

@XmlType(name = "", propOrder = {
@XmlRootElement(name = "SampleRequest")
public class SampleRequest {

    @XmlElement(required = true)
    protected String text;
    @XmlElement(required = true)
    protected DataHandler file;

     Note that the file field is of type DataHandler, which allows for streaming processing of the data.

     We shall generate the JAXB classes for both client and server, and a service class for the client, using Apache CXF cxf-codegen-plugin for Maven during build-time. The configuration is as follows:

     In this Maven plugin configuration we explicitly specify the wsdlLocation property that will be included into the generated service class. Without it, the generated path to the WSDL file will be a local path on the developer's machine, which we obviously don't want.

     The client (module mtom-soap-client) is plain simple, as it is based on Apache CXF and a generated SampleService class. Here we only enable MTOM for underlying SOAP binding and specify an infinite timeout, as the transfer of large files may take time:

        // Creating a CXF-generated service
        Sample sampleClient = new SampleService().getSampleSoap12();

        // Setting infinite HTTP timeouts
        HTTPClientPolicy httpClientPolicy = new HTTPClientPolicy();
        HTTPConduit httpConduit = (HTTPConduit) ClientProxy.getClient(sampleClient).getConduit();

        // Enabling MTOM for the SOAP binding provider
        BindingProvider bindingProvider = (BindingProvider) sampleClient;
        SOAPBinding binding = (SOAPBinding) bindingProvider.getBinding();

        // Creating request object
        SampleRequest request = new SampleRequest();
        request.setFile(new DataHandler(new FileDataSource(args[0])));

        // Sending request
        SampleResponse response = sampleClient.sample(request);

        System.out.println(String.format("Server responded: \"%s\"", response.getText()));

     The server is based on the Spring WS framework. Only we won't use a typical default <annotation-config /> configuration here and specify a custom DefaultMethodEndpointAdapter configuration, because we need Spring WS to use our custom-configured jaxb2Marshaller bean:

<!-- The service bean -->
<bean class="ru.forketyfork.mtomsoap.server.SampleServiceEndpoint" p:uploadPath="/tmp"/>

<!-- SAAJ message factory configured for SOAP v1.2 -->
<bean id="messageFactory" class=""

<!-- JAXB2 Marshaller configured for MTOM -->
<bean id="jaxb2Marshaller" class="org.springframework.oxm.jaxb.Jaxb2Marshaller"

<!-- Endpoint mapping for the @PayloadRoot annotation -->
<bean class="" />

<!-- Endpoint adapter to marshal endpoint method arguments and return values as JAXB2 objects -->
<bean class="">
    <property name="methodArgumentResolvers">
            <ref bean="marshallingPayloadMethodProcessor" />
    <property name="methodReturnValueHandlers">
            <ref bean="marshallingPayloadMethodProcessor" />

<!-- JAXB@ Marshaller/Unmarshaller for method arguments and return values -->
<bean id="marshallingPayloadMethodProcessor" class="">
    <constructor-arg ref="jaxb2Marshaller" />
     Important thing to notice here is a mtomEnabled property of jaxb2Marshaller, the rest of the configuration is quite typical.

     The SampleServiceEndpoint class is a service that is bound via the @PayloadRoot annotation to process our SampleRequest requests:

    @PayloadRoot(namespace = "", localPart = "SampleRequest")
    public SampleResponse serve(@RequestPayload SampleRequest request) throws IOException {

        // randomly generating file name as a UUID
        String fileName = UUID.randomUUID().toString();
        File file = new File(uploadPath + File.separator + fileName);

        // writing attachment to file
        try(FileOutputStream fos = new FileOutputStream(file)) {

        // constructing the response
        SampleResponse response = new SampleResponse();
        response.setText(String.format("Hi, just received a %d byte file from ya, saved with id = %s",
                file.length(), fileName));

        return response;

     Notice how we work with the request.getFile() field of the request. Remember, the type of the field is DataHandler. What actually happens is, the request.getFile() wraps an InputStream that points to the attachment that was offloaded by SAAJ to disk when the request was received. So we may copy this file to another location or process it in any way while not loading it completely into memory.

     A final trick is to enable the attachment offloading for the Oracle's SAAJ implementation that is bundled with Oracle's JDK starting from version 6. To do that, we must run our server with the -Dsaaj.use.mimepull=true JVM argument.

     Once again, the complete code for the article is available on GitHub.