BeanIO 2.1 Reference Guide

Revision 2, © 2010-2013 Kevin Seim

Copies of this document may be made for your own use and for distribution to others, provided that you do not charge any fee for such copies and further provided that each copy contains this Copyright Notice, whether distributed in print or electronically.

Table of Contents

1.0. Introduction
1.1. What's new in 2.1?
1.2. Migrating from 2.0.x to 2.1
1.3. Migrating from 1.2.x to 2.0
2.0. Getting Started
2.1. My First Stream
3.0. Core Concepts
3.1. BeanReader
3.2. BeanWriter
3.3. Unmarshaller
3.4. Marshaller
3.5. Mapping Files
3.6. StreamFactory
3.7. Exception Handling
4.0. Stream Components
4.1. Streams
4.2. Records
4.3. Fields
4.4. Constants
4.5. Segments
4.6. Stream Validation
4.7. Templates
4.8. Advanced Topics
5.0. Mapping XML Streams
5.1. Introduction
5.2. XML Names
5.3. XML Namespaces
5.4. Streams
5.5. Groups
5.6. Records
5.7. Fields
5.8. Segments
6.0. Annotations and the Stream Builder API
6.1. The Stream Builder API
6.2. Annotations
7.0. Spring Batch Integration
7.1. BeanIO ItemReader/Writer Beans
7.2. BeanIO StreamFactory Bean
8.0. Configuration
8.1. Settings
Appendix A: XML Mapping File Reference
A.1. beanio
A.2. import
A.3. typeHandler
A.4. property
A.5. template
A.6. include
A.7. stream
A.8. parser
A.9. group
A.10. record
A.11. segment
A.12. field
Appendix B: Error Message Parameters
Appendix C: Upgrading a 1.x Mapping File Example

1.0. Introduction

BeanIO is an open source Java framework for reading Java objects from, and writing them to, a flat file, stream, or String. BeanIO is well suited for batch processing, and currently supports XML, CSV, delimited and fixed length file formats. BeanIO is licensed under the Apache 2.0 License.

1.1. What's new in 2.1?

BeanIO 2.1 includes the following significant enhancements:

1.2. Migrating from 2.0.x to 2.1

Release 2.1 is "mostly" backwards compatible with prior 2.0.x releases. A few behavior changes are noted below that will hopefully improve your experience with BeanIO. Where noted, legacy behavior can be restored using beanio.properties configuration settings.

1.3. Migrating from 1.2.x to 2.0

Release 2.0 is not backwards compatible with prior releases. Sorry. This section contains the steps you'll need to follow to update your code and mapping files.

1.3.1. Java Changes

The org.beanio.BeanReaderContext class was renamed RecordContext in order to support bean objects bound to multiple records.

The exception classes org.beanio.BeanReaderException and org.beanio.BeanWriterException are no longer abstract and may be thrown in a few rare (but fatal) scenarios. The exception classes org.beanio.BeanReaderIOException and org.beanio.BeanWriterIOException are now only thrown when the underlying input stream throws a java.io.IOException, or when a BeanReader/Writer method is invoked on a closed stream.

The org.beanio.stream.RecordReaderFactory and org.beanio.stream.RecordWriterFactory interfaces have been consolidated into the RecordParserFactory interface, which is also used to create RecordMarshaller and RecordUnmarshaller implementations for parsing individual records.

All type handlers and Spring related classes are unchanged or backwards compatible. Internal implementation classes have been moved to the org.beanio.internal package and their API may change in any release without further regard to backwards compatibility.

1.3.2. Mapping File Changes

The mapping file namespace has changed to http://www.beanio.org/2012/03 for all elements.

Release 2.0 includes more lenient defaults for some mapping components. A new stream attribute called strict has been added to support some legacy behavior. If strict is set to true, the following behavior is enabled (which mimics prior releases):

  1. A default order is calculated for groups and records that do not have order explicitly set, based on the order they appear in the mapping file.
  2. CSV, delimited and fixed length record elements will use default minLength and maxLength settings calculated based on their children. (If strict is false, release 2.0 defaults minLength to 0 and maxLength to unbounded.)

The ordered attribute has been removed from a stream. Since release 2.0, all record and group components are unordered by default. The order attribute is still supported. If you want to continue validating record order, you can set order attributes on ordered groups and records, or set strict to true as described above to have BeanIO calculate a default order.

The reader and writer elements have been combined into a single parser element. Format specific property names have not changed. If you have overridden the default RecordReaderFactory or RecordWriterFactory, you will need to modify your class to implement RecordParserFactory instead.

The minOccurs attribute for a record now defaults to 0, instead of 1.

All bean elements should be renamed segment. A segment element supports all the functionality of a bean element (and more).

For XML formatted streams, the minOccurs attribute for a bean/segment, or a field bound to an XML element, will always default to 1. Prior to release 2.0, minOccurs defaulted to 0 if not nillable. (This is now consistent with XML Schema and hopefully simpler to remember.) The default minOccurs attribute for a field bound to an XML attribute remains 0.

The xmlWrapper attribute has been removed. XML wrappers can be replaced by segment components.

Mapping file changes are illustrated using an example in Appendix C.

If desired, BeanIO's default minOccurs value for a group, record or field can be overridden using property values. See Section 8.0 Configuration for details.

2.0. Getting Started

To get started with BeanIO, download the latest stable version from Google Code, extract the contents of the ZIP file, and add beanio.jar to your application's classpath.

BeanIO requires a version 1.5 JDK or higher. In order to process XML formatted streams, BeanIO also requires an XML parser based on the Streaming API for XML (StAX), as specified by JSR 173. JDK 1.6 and higher includes a StAX implementation and therefore does not require any additional libraries. JDK 1.5 users will need to include the following:

Alternatively, Maven users can declare the following dependencies in their application's POM. Note that the version numbers used below are only examples and may have changed.

    <!-- BeanIO dependency -->
    <dependency>
      <groupId>org.beanio</groupId>
      <artifactId>beanio</artifactId>
      <version>2.1.0</version>
    </dependency>

    <!-- StAX dependencies for JDK 1.5 users -->
    <dependency>
      <groupId>javax.xml</groupId>
      <artifactId>jsr173</artifactId>
      <version>1.0</version>
    </dependency>
    <dependency>
      <groupId>com.sun.xml.stream</groupId>
      <artifactId>sjsxp</artifactId>
      <version>1.0.2</version>
    </dependency>

2.1. My First Stream

This section explores a simple example that uses BeanIO to read and write a flat file containing employee data. Let's suppose the file is in CSV format and has the following record layout:

Position | Field      | Format
0        | First Name | Text
1        | Last Name  | Text
2        | Job Title  | Text
3        | Salary     | Number
4        | Hire Date  | Date (MMDDYYYY)

A sample file is shown below.

Joe,Smith,Developer,75000,10012009
Jane,Doe,Architect,80000,01152008
Jon,Anderson,Manager,85000,03182007

Next, let's suppose we want to read records into the following Java bean for further processing. Remember that a Java bean must have a default no-argument constructor and public getters and setters for all exposed properties.

package example;
import java.util.Date;

public class Employee {
    String firstName;
    String lastName;
    String title;
    int salary;
    Date hireDate;
    
    // getters and setters not shown...
}

BeanIO uses an XML configuration file, called a mapping file, to define how bean objects are bound to records. Below is a mapping file, named mapping.xml, that could be used to read the sample employee file and unmarshal records into Employee objects. The same mapping file can be used to write, or marshal, Employee objects to a file or output stream.

<beanio xmlns="http://www.beanio.org/2012/03" 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.beanio.org/2012/03 http://www.beanio.org/2012/03/mapping.xsd">

  <stream name="employeeFile" format="csv">
    <record name="employee" class="example.Employee">
      <field name="firstName" />
      <field name="lastName" />
      <field name="title" />
      <field name="salary" />
      <field name="hireDate" format="MMddyyyy" />
    </record>
  </stream>
</beanio>
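The hireDate field above declares format="MMddyyyy". The sketch below shows how such a pattern turns the text 10012009 into a date and back, assuming the pattern follows java.text.SimpleDateFormat conventions. The HireDateFormat class is a hypothetical helper written for this illustration and is not part of BeanIO. Note that the pattern is case sensitive: MM is the month, dd the day of the month and yyyy the year, whereas an uppercase DD would mean the day of the year.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper (not part of BeanIO): converts hire date text
// to a java.util.Date and back using the pattern from the mapping file.
class HireDateFormat {

    // SimpleDateFormat is not thread-safe, so a new instance is created per call
    private static SimpleDateFormat newFormat() {
        SimpleDateFormat format = new SimpleDateFormat("MMddyyyy");
        format.setLenient(false); // reject impossible dates such as month 13
        return format;
    }

    // Parses field text such as "10012009" (October 1, 2009)
    static Date parse(String text) {
        try {
            return newFormat().parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid date: " + text, e);
        }
    }

    // Formats a date back into field text
    static String format(Date date) {
        return newFormat().format(date);
    }
}
```

Disabling lenient parsing is a choice made for this sketch so that invalid text fails fast instead of rolling over into the next year.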

To read the employee CSV file, a StreamFactory is used to load our mapping file and create a new BeanReader instance. The BeanReader is used to unmarshal Employee objects from the file employee.csv. (For the sake of brevity, proper exception handling is not shown.)

package example;

import org.beanio.*;
import java.io.*;

public class BeanReaderExample {
    public static void main(String[] args) throws Exception {
        // create a StreamFactory
        StreamFactory factory = StreamFactory.newInstance();
        // load the mapping file
        factory.load("mapping.xml");
        
        // use a StreamFactory to create a BeanReader
        BeanReader in = factory.createReader("employeeFile", new File("employee.csv"));
        Employee employee;
        while ((employee = (Employee) in.read()) != null) {
            // process the employee...
        }
        in.close();
    }
}

To write an employee CSV file, the same StreamFactory class is used to create a BeanWriter for marshalling Employee bean objects to the file employee.csv. In this example, the same mapping configuration file is used for both reading and writing an employee file.

package example;

import org.beanio.*;
import java.io.*;
import java.util.*;

public class BeanWriterExample {
    public static void main(String[] args) throws Exception {
        // create a StreamFactory
        StreamFactory factory = StreamFactory.newInstance();
        // load the mapping file
        factory.load("mapping.xml");
        
        Employee employee = new Employee();
        employee.setFirstName("Jennifer");
        employee.setLastName("Jones");
        employee.setTitle("Marketing");
        employee.setSalary(60000);
        employee.setHireDate(new Date());
        
        // use a StreamFactory to create a BeanWriter
        BeanWriter out = factory.createWriter("employeeFile", new File("employee.csv"));
        // write an Employee object directly to the BeanWriter
        out.write(employee);
        out.flush();
        out.close();
    }
}

Running BeanWriterExample produces a CSV file like the following. (The hire date reflects the date on which the example is run.)

Jennifer,Jones,Marketing,60000,01012011

3.0. Core Concepts

3.1. BeanReader

The org.beanio.BeanReader interface, shown below, is used to read bean objects from an input stream. The read() method returns an unmarshalled bean object for the next record or group of records read from the input stream. When the end of the stream is reached, null is returned.

The method setErrorHandler(...) can be used to register a custom error handler. If an error handler is not configured, read() simply throws the unhandled exception.

The method getRecordName() returns the name of the record (or group) mapped to the most recent bean object read from the input stream, as declared in the mapping file. Similarly, getLineNumber() returns the line number of the first record mapped to the most recent bean object read from the input stream. Additional information about records read from the stream is available by calling getRecordCount() and getRecordContext(int). Please consult the API documentation for further information.

Before discarding a BeanReader, close() should be invoked to close the underlying input stream.

package org.beanio;

public interface BeanReader {

    public Object read() throws BeanReaderException;
    
    public int getLineNumber();
    
    public String getRecordName();
    
    public int getRecordCount();
    
    public RecordContext getRecordContext(int index); 
    
    public int skip(int count) throws BeanReaderException;
    
    public void close() throws BeanReaderIOException;
    
    public void setErrorHandler(BeanReaderErrorHandler errorHandler);
}

3.2. BeanWriter

The org.beanio.BeanWriter interface, shown below, is used to write bean objects to an output stream. Calling the write(Object) method marshals a bean object to the output stream. In some cases where multiple record types are not discernible by class type or record identifying fields, the write(String,Object) method can be used to explicitly name the record type to marshal.

Before discarding a BeanWriter, close() should be invoked to close the underlying output stream.

package org.beanio;

public interface BeanWriter {

    public void write(Object bean) throws BeanWriterException;
    
    public void write(String recordName, Object bean) throws BeanWriterException;
    
    public void flush() throws BeanWriterIOException;
    
    public void close() throws BeanWriterIOException;
}

3.3. Unmarshaller

The org.beanio.Unmarshaller interface, shown below, is used to unmarshal a bean object from a String record.

package org.beanio;

import java.util.List;
import org.w3c.dom.Node;

public interface Unmarshaller {

    // For all stream formats
    public Object unmarshal(String record) throws BeanReaderException;
    
    // For CSV and delimited formatted streams
    public Object unmarshal(List<String> fields) throws BeanReaderException;
    public Object unmarshal(String[] fields) throws BeanReaderException;

    // For XML formatted streams
    public Object unmarshal(Node node) throws BeanReaderException;
    
    public String getRecordName();
    
    public RecordContext getRecordContext();
}

3.4. Marshaller

The org.beanio.Marshaller interface, shown below, is used to marshal a bean object into a String record.

package org.beanio;

import java.util.List;
import org.w3c.dom.Document;

public interface Marshaller {

    public Marshaller marshal(Object bean) throws BeanWriterException;
    
    public Marshaller marshal(String recordName, Object bean) throws BeanWriterException;
    
    // For all stream formats
    public String toString();
    
    // For CSV and delimited formatted streams
    public String[] toArray() throws BeanWriterException;
    public List<String> toList() throws BeanWriterException;
    
    // For XML formatted streams
    public Document toDocument() throws BeanWriterException;
}

Marshalling a single bean object to record text is now as simple as:

String recordText = marshaller.marshal(object).toString();

3.5. Mapping Files

BeanIO uses XML configuration files, called mapping files, to bind a stream layout to bean objects. Multiple layouts can be configured in a single mapping file using stream elements. Each stream is assigned a unique name for referencing the layout. In addition to its name, every stream must declare its format using the format attribute. Supported stream formats include csv, delimited, fixedlength, and xml. Mapping files are fully explained in section 4.0. Stream Components.

<beanio xmlns="http://www.beanio.org/2012/03" 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.beanio.org/2012/03 http://www.beanio.org/2012/03/mapping.xsd">

  <stream name="stream1" format="csv"... >
    <!-- record layout... -->
  </stream>
  
  <stream name="stream2" format="fixedlength"... >
    <!-- record layout... -->
  </stream>
    
</beanio>

3.6. StreamFactory

The org.beanio.StreamFactory class is used to load mapping files and create BeanReader, BeanWriter, Marshaller and Unmarshaller instances. The following code snippet shows how to instantiate a StreamFactory, load a mapping file and create the various BeanIO parsers. The load(...) method loads mapping files from the file system (relative to the current working directory), while the method loadResource(...) loads mapping files from the classpath.

// create a StreamFactory
StreamFactory factory = StreamFactory.newInstance();
// load 'mapping-1.xml' from the current working directory
factory.load("mapping-1.xml");
// load 'mapping-2.xml' from the classpath
factory.loadResource("mapping-2.xml");

// create a BeanReader to read from 'in.txt'
Reader in = new BufferedReader(new FileReader("in.txt"));
BeanReader beanReader = factory.createReader("streamName", in);

// create a BeanWriter to write to 'out.txt'
Writer out = new BufferedWriter(new FileWriter("out.txt"));
BeanWriter beanWriter = factory.createWriter("streamName", out);

// create an Unmarshaller to unmarshal bean objects from record text
Unmarshaller unmarshaller = factory.createUnmarshaller("streamName");

// create a Marshaller to marshal bean objects to record text
Marshaller marshaller = factory.createMarshaller("streamName");

3.7. Exception Handling

All BeanIO exceptions extend from BeanIOException, which extends from RuntimeException so that exceptions do not need to be explicitly caught unless desired. BeanReaderException and BeanWriterException extend from BeanIOException and may be thrown by a BeanReader or BeanWriter respectively.

A BeanReaderException is further broken down into the following subclasses thrown by the read() method.

Exception | Description
BeanReaderIOException | Thrown when the underlying input stream throws an IOException.
MalformedRecordException | Thrown when the underlying input stream is malformed based on the configured stream format, and therefore a record could not be accurately read from the stream. In many cases, further reads from the input stream will be unsuccessful.
UnidentifiedRecordException | Thrown when a record does not match any record definition configured in the mapping file. If the stream layout does not strictly enforce record sequencing, further reads from the input stream are likely to be successful.
UnexpectedRecordException | Thrown when a record is read out of order. Once record sequencing is violated, further reads from the input stream are likely to be unsuccessful.
InvalidRecordException | Thrown when a record is matched, but the record is invalid for one of the following reasons: a record level validation failed; one or more field level validations failed; or field type conversion failed. This exception has no effect on the state of the BeanReader and further reads from the input stream can be safely performed.
InvalidRecordGroupException | Extends from InvalidRecordException and is thrown when one or more records in a group (that are mapped to a single bean object) are invalid. This exception has no effect on the state of the BeanReader and further reads from the input stream can be safely performed.
BeanReaderException | Thrown directly in a few rare unrecoverable scenarios.

When a BeanReaderException is thrown, information about the failed record(s) can be accessed by calling exception.getRecordContext() to obtain an org.beanio.RecordContext. Please refer to the API javadocs for more information.

package org.beanio;

import java.util.Collection;
import java.util.Map;

public interface RecordContext {
    public int getLineNumber();
    public String getRecordText();
    public String getRecordName();
    public boolean hasErrors();
    public boolean hasRecordErrors();
    public Collection<String> getRecordErrors();
    public String getFieldText(String fieldName);
    public String getFieldText(String fieldName, int index);
    public boolean hasFieldErrors();
    public Map<String, Collection<String>> getFieldErrors();
    public Collection<String> getFieldErrors(String fieldName);
}

3.7.1. BeanReaderErrorHandler

If you need to handle an exception and continue processing, it may be simpler to register a BeanReaderErrorHandler using the beanReader.setErrorHandler() method. The BeanReaderErrorHandler interface is shown below. Any exception thrown by the error handler will be rethrown by the BeanReader.

package org.beanio;

public interface BeanReaderErrorHandler {
    public void handleError(BeanReaderException ex) throws Exception;
}

The following example shows how invalid records could be written to a reject file by registering an error handler that extends BeanReaderErrorHandlerSupport, an implementation of the BeanReaderErrorHandler interface. All other exceptions are left uncaught and will bubble up to the calling method.

    BeanReader input;
    BufferedWriter rejects;
    try {
        input.setErrorHandler(new BeanReaderErrorHandlerSupport() {
            public void invalidRecord(InvalidRecordException ex) throws Exception {
                // if a bean object is mapped to a record group,
                // the exception may contain more than one record
                for (int i=0, j=ex.getRecordCount(); i<j; i++) {
                    rejects.write(ex.getRecordContext(i).getRecordText());
                    rejects.newLine();
                }
            }
        });
        
        Object record = null;
        while ((record = input.read()) != null) {
            // process a valid record
        }
        
        rejects.flush();
    }
    finally {
        input.close();
        rejects.close();
    }

4.0. Stream Components

This section covers the basic components used by BeanIO to map an input stream or String to Java objects. All examples are shown using a mapping file, but the concepts (and most attributes) are the same whether using the stream builder API, mapping file, Java annotations, or any combination thereof.

4.1. Streams

A typical mapping file contains one or more stream layouts. A stream must have a name and format attribute configured. The name of the stream is used to reference the layout when creating a parser using a StreamFactory, and the format tells BeanIO how to interpret the stream. Supported formats include xml, csv, delimited and fixedlength.

<beanio xmlns="http://www.beanio.org/2012/03" 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.beanio.org/2012/03 http://www.beanio.org/2012/03/mapping.xsd">

  <stream name="stream1" format="csv"... >
    <!-- record layout... -->
  </stream>
  
  <stream name="stream2" format="fixedlength"... >
    <!-- record layout... -->
  </stream>
    
</beanio>

BeanIO parses (and formats) a record from a stream or text using a record parser generated by a RecordParserFactory. BeanIO allows you to create and customize your own RecordParserFactory, but in most cases you can simply configure BeanIO's default record parser factory using a stream's parser element. The parser element allows you to set format specific properties on a RecordParserFactory. For example, the following stream layout changes the delimiter to a pipe for the delimited stream 's1':

<beanio xmlns="http://www.beanio.org/2012/03" 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.beanio.org/2012/03 http://www.beanio.org/2012/03/mapping.xsd">

  <stream name="s1" format="delimited">
    <parser>
      <property name="delimiter" value="|" />
    </parser>
    <!-- record layout... -->
  </stream>
  
</beanio>

The next few sections list available parser properties for each stream format.

4.1.1. CSV Streams

CSV formatted streams are parsed according to RFC 4180 with one exception: multi-line records are disabled (but this can be overridden).

The following properties can be used to customize default CSV parsers:

Property Name | Type | Description | Affects
delimiter | char | The field delimiter. Defaults to a comma. | *
quote | char | The quotation mark character used to wrap fields containing a delimiter character, a quotation mark, or new lines. Defaults to the double quotation mark, ". | *
escape | Character | The character used to escape a quotation mark in a quoted field. Defaults to the double quotation mark, ". | *
comments | String[] | A comma separated list of values for identifying commented lines. If a line read from an input stream begins with any of the configured values, the line is ignored. A backslash may be used to escape a comma and itself. All whitespace is preserved. Enabling comments requires the input reader passed to StreamFactory to support marking. Among others, Java's BufferedReader and StringReader support marking. | BeanReader
multilineEnabled | boolean | If set to true, quoted fields may contain new line characters. Defaults to false. | BeanReader
whitespaceAllowed | boolean | If set to true, whitespace is ignored and allowed before and after quoted values. Defaults to false. For example, when enabled the following is allowed: Jennifer, "Jones" ,24 | BeanReader, Unmarshaller
unquotedQuotesAllowed | boolean | If set to true, field text containing quotation marks does not need to be quoted unless the field text starts with a quotation mark. Defaults to false. For example, when enabled the following is allowed: Jennifer,She said "OK" | BeanReader, Unmarshaller
recordTerminator | String | The character used to signify the end of a record. By default, any new line character (line feed (LF), carriage return (CR), or CRLF combination) is accepted when reading an input stream, and System.getProperty("line.separator") is used when writing to a stream. | BeanWriter
alwaysQuote | boolean | If set to true, field text is always quoted. By default, a field is only quoted if it contains a delimiter, a quotation mark or new line characters. | BeanWriter, Marshaller
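The quoting behavior described by the quote, escape and alwaysQuote properties can be sketched in plain Java as follows. This is an illustration of the rule as stated in the table, not BeanIO's writer code: CsvQuoting and its method names are hypothetical, and the sketch assumes that only embedded quotation marks need escaping.

```java
// Sketch of the default CSV quoting rule (hypothetical helper, not BeanIO code)
class CsvQuoting {

    // Quotes the text if alwaysQuote is set, or if it contains the delimiter,
    // the quotation mark or a new line character
    static String quoteIfNeeded(String text, char delimiter, char quote,
                                char escape, boolean alwaysQuote) {
        boolean needsQuotes = alwaysQuote
            || text.indexOf(delimiter) >= 0
            || text.indexOf(quote) >= 0
            || text.indexOf('\n') >= 0
            || text.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return text;
        }
        StringBuilder out = new StringBuilder().append(quote);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == quote) {
                out.append(escape); // escape an embedded quotation mark
            }
            out.append(c);
        }
        return out.append(quote).toString();
    }

    // Joins fields using the defaults from the table:
    // a comma delimiter, and '"' as both the quote and escape character
    static String toCsv(String... fields) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                line.append(',');
            }
            line.append(quoteIfNeeded(fields[i], ',', '"', '"', false));
        }
        return line.toString();
    }
}
```

With these defaults, a field such as She said "OK" is written with its quotation marks doubled and the whole field wrapped in quotation marks, which matches the escaping convention of RFC 4180.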

4.1.2. Delimited Streams

The default delimited parsers can be customized using the following properties:

Property Name | Type | Description | Affects
delimiter | char | The field delimiter. Defaults to the tab character. | *
escape | Character | The escape character allowed to escape a delimiter or itself. By default, escaping is disabled. | *
lineContinuationCharacter | Character | If this character is the last character before a new line or carriage return is read, the record will continue reading from the next line. By default, line continuation is disabled. | BeanReader
recordTerminator | Character | The character used to signify the end of a record. By default, any new line character (line feed (LF), carriage return (CR), or CRLF combination) is accepted when reading an input stream, and System.getProperty("line.separator") is used when writing to a stream. | BeanReader, BeanWriter
comments | String[] | A comma separated list of values for identifying commented lines. If a line read from an input stream begins with any of the configured values, the line is ignored. A backslash may be used to escape a comma and itself. All whitespace is preserved. Enabling comments requires the input reader passed to StreamFactory to support marking. Among others, Java's BufferedReader and StringReader support marking. | BeanReader

4.1.3. Fixed Length Streams

The default fixed length parsers can be customized using the following properties:

Property Name | Type | Description | Affects
lineContinuationCharacter | Character | If this character is the last character before a new line or carriage return is read, the record will continue reading from the next line. By default, line continuation is disabled. | BeanReader
recordTerminator | Character | The character used to signify the end of a record. By default, any new line character (line feed (LF), carriage return (CR), or CRLF combination) is accepted when reading an input stream, and System.getProperty("line.separator") is used when writing to a stream. | BeanReader, BeanWriter
comments | String[] | A comma separated list of values for identifying commented lines. If a line read from an input stream begins with any of the configured values, the line is ignored. A backslash may be used to escape a comma and itself. All whitespace is preserved. Enabling comments requires the input reader passed to StreamFactory to support marking. Among others, Java's BufferedReader and StringReader support marking. | BeanReader
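The comments property for CSV, delimited and fixed length streams requires a reader that supports marking. A reader's support can be checked with the standard Reader.markSupported() method, as the following sketch shows. The MarkSupport class and its ensureMarkSupported method are hypothetical names used only for this illustration.

```java
import java.io.BufferedReader;
import java.io.Reader;

// Hypothetical helper (not part of BeanIO)
class MarkSupport {

    // Returns the reader unchanged if it already supports mark/reset,
    // otherwise wraps it in a BufferedReader, which does
    static Reader ensureMarkSupported(Reader in) {
        return in.markSupported() ? in : new BufferedReader(in);
    }
}
```

A FileReader or InputStreamReader does not support marking on its own, so wrapping it this way before handing it to the StreamFactory is one way to satisfy the requirement.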

4.1.4. XML Streams

The default XML parsers can be customized using the following properties:

Property Name | Type | Description | Affects
suppressHeader | boolean | If set to true, the XML header is suppressed in the marshalled document. Defaults to false. | BeanWriter, Marshaller
version | String | The XML header version. Defaults to 1.0. | BeanWriter, Marshaller
encoding | String | The XML header encoding. Defaults to utf-8. Note that this setting has no bearing on the actual encoding of the output stream. If set to "", an encoding attribute is not included in the header. | BeanWriter, Marshaller
namespaces | String | A space delimited list of XML prefixes and namespaces to declare on the root element of a marshalled document. The property value should be formatted as: prefix1 namespace1 prefix2 namespace2... | BeanWriter, Marshaller
indentation | Integer | The number of spaces to indent each level of XML. By default, indentation is disabled using a value of -1. | BeanWriter, Marshaller
lineSeparator | String | The character(s) used to separate lines when indentation is enabled. By default, System.getProperty("line.separator") is used. | BeanWriter, Marshaller

4.2. Records

Each record type read from an input stream or written to an output stream must be mapped using a record element. A stream mapping must include at least one record. A record mapping is used to validate the record and bind field values to a bean object. A simple record configuration is shown below.

<beanio>

  <stream name="stream1" format="csv">
    <record name="record1" class="example.Record">
      <field name="firstName" />
      <field name="lastName" />
      <field name="age" />
    </record>
  </stream>
  
</beanio>

In this example, a CSV formatted stream is mapped to a single record composed of three fields: first name, last name and age. When a record is read from a stream using a BeanReader, the class example.Record is instantiated and its firstName, lastName and age attributes are set using standard Java bean setter naming conventions (e.g. setFirstName(String)).

Similarly, when a example.Record bean object is written to an output stream using a BeanWriter, its firstName, lastName and age attributes are retrieved from the bean object using standard Java bean getter naming conventions (e.g. getFirstName()).
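The getter and setter naming conventions mentioned above can be demonstrated with the JDK's own java.beans.PropertyDescriptor, which resolves a property name such as firstName to getFirstName() and setFirstName(String). The sketch below only illustrates the convention and makes no claim about how BeanIO is implemented internally. BeanPropertyDemo and the nested PersonRecord class (standing in for example.Record) are hypothetical.

```java
import java.beans.PropertyDescriptor;

// Illustration of Java bean naming conventions (hypothetical, not BeanIO code)
class BeanPropertyDemo {

    // A bean standing in for the example.Record class used by the mapping file
    public static class PersonRecord {
        private String firstName;
        private int age;

        public String getFirstName() { return firstName; }
        public void setFirstName(String firstName) { this.firstName = firstName; }
        public int getAge() { return age; }
        public void setAge(int age) { this.age = age; }
    }

    // Sets a property by name through its setter ("firstName" -> setFirstName)
    static void set(Object bean, String property, Object value) {
        try {
            new PropertyDescriptor(property, bean.getClass())
                .getWriteMethod().invoke(bean, value);
        } catch (Exception e) {
            throw new IllegalStateException("Cannot set property " + property, e);
        }
    }

    // Reads a property by name through its getter ("firstName" -> getFirstName)
    static Object get(Object bean, String property) {
        try {
            return new PropertyDescriptor(property, bean.getClass())
                .getReadMethod().invoke(bean);
        } catch (Exception e) {
            throw new IllegalStateException("Cannot get property " + property, e);
        }
    }
}
```

A property that lacks a matching public getter or setter cannot be resolved this way, which is why the bean must expose both for every mapped field.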

BeanIO also supports Map based records by setting a record's class attribute to map, or to the fully qualified class name of any class assignable to java.util.Map. Note that if you plan to use Map based records, field types may need to be explicitly configured using the type attribute, or BeanIO will assume the field is of type java.lang.String. The type attribute is further explained under Field Type Conversion in section 4.3. Fields.

<beanio>

  <stream name="stream1" format="csv">
    <record name="record1" class="map">
      <field name="firstName" />
      <field name="lastName" />
      <field name="age" type="int"/>
    </record>
  </stream>
  
</beanio>

4.2.1. Record Identification

Oftentimes, a stream is made up of multiple record types. A typical batch file may include one header, one trailer, and zero to many detail records. BeanIO allows a record to be identified by one or more of its fields using expected literal values or regular expressions. If desired, BeanIO can be used to validate the order of all records in the input stream.

To see how a stream can be configured to handle multiple record types, let's modify our Employee file to include a header and trailer record as shown below. Each record now includes a record type field that identifies the type of record.

Header,01012011
Detail,Joe,Smith,Developer,75000,10012009
Detail,Jane,Doe,Architect,80000,01152008
Detail,Jon,Anderson,Manager,85000,03182007
Trailer,3

The mapping file can now be updated as follows:

<beanio>

  <stream name="employeeFile" format="csv">
    <record name="header" minOccurs="1" maxOccurs="1" class="example.Header">
      <field name="recordType" rid="true" literal="Header" />
      <field name="fileDate" format="MMddyyyy" />
    </record>
    <record name="employee" minOccurs="0" maxOccurs="unbounded" class="example.Employee">
      <field name="recordType" rid="true" literal="Detail" />
      <field name="firstName" />
      <field name="lastName" />
      <field name="title" />
      <field name="salary" />
      <field name="hireDate" format="MMddyyyy" />
    </record>
    <record name="trailer" minOccurs="1" maxOccurs="1" class="example.Trailer">
      <field name="recordType" rid="true" literal="Trailer" />
      <field name="recordCount" />
    </record>  
  </stream>
  
</beanio>

There are several new record and field attributes introduced in this mapping file, so we'll explain each new attribute in turn.

First, a field used to identify a record must be configured as a record identifier using rid="true". There is no limitation to the number of fields that can be used to identify a record, but all fields where rid="true" must be satisfied before a record is identified. If there is no field configured as a record identifier, by default the record will always match.

    <record name="header" minOccurs="1" maxOccurs="1" class="example.Header">
      <field name="recordType" rid="true" literal="Header" />
      <field name="fileDate" />
    </record>

Second, all record identifying fields must have a matching validation rule configured. In our example, the literal value Header in the record type field is used to identify the header record. Literal values must match exactly and can be configured using the literal field attribute. Alternatively, record identifying fields may use a regular expression to match field text using the regex field attribute.

    <record name="header" minOccurs="1" maxOccurs="1" class="example.Header">
      <field name="recordType" rid="true" literal="Header" />
      <field name="fileDate" />
    </record>

Third, each record defines the minimum and maximum number of times it may repeat using the attributes minOccurs and maxOccurs. Based on our configuration, exactly one header and trailer record is expected, while the number of detail records is unbounded.

    <record name="header" minOccurs="1" maxOccurs="1" class="example.Header">
      <field name="recordType" rid="true" literal="Header" />
      <field name="fileDate" />
    </record>

If minOccurs and/or maxOccurs are not set, the minimum occurrences of a record defaults to 0 and maximum occurrences is unbounded.

It's also possible to identify delimited and fixed length records based on their length. The ridLength record attribute can be used to specify a range of lengths to identify the record.
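To make the matching rules concrete, here is a minimal sketch in plain Java of identifying a record by a literal or regular expression on its first field. It is not BeanIO's implementation: the RecordIdSketch class is hypothetical, it supports only one identifying field per record, and it splits on commas naively rather than parsing CSV properly.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

final class RecordIdSketch {

    // Record name -> pattern that the record type field (field 0) must match.
    private final Map<String, Pattern> rules = new LinkedHashMap<>();

    // A literal is quoted so that it must match the field text exactly.
    RecordIdSketch literal(String record, String literal) {
        rules.put(record, Pattern.compile(Pattern.quote(literal)));
        return this;
    }

    // A regular expression must match the entire field text.
    RecordIdSketch regex(String record, String regex) {
        rules.put(record, Pattern.compile(regex));
        return this;
    }

    // Returns the name of the first record whose rule matches, or null if none does.
    String identify(String line) {
        String recordType = line.split(",", -1)[0];
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            if (rule.getValue().matcher(recordType).matches()) {
                return rule.getKey();
            }
        }
        return null;
    }

    // Mirrors the three record types of the employee file mapping.
    static RecordIdSketch employeeFile() {
        return new RecordIdSketch()
                .literal("header", "Header")
                .literal("employee", "Detail")
                .literal("trailer", "Trailer");
    }

    public static void main(String[] args) {
        RecordIdSketch ids = employeeFile();
        System.out.println(ids.identify("Header,01012011"));
        System.out.println(ids.identify("Detail,Joe,Smith,Developer,75000,10012009"));
    }
}
```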

4.2.2. Record Ordering

As explained in the previous section, a stream can support multiple record types. By default, a BeanReader will read records in any order. But if desired, BeanIO can enforce record ordering using an order attribute on each record. The order attribute can be assigned any positive integer value. Records that are assigned the same number may be read from the stream in any order. If order is set for one record, it must be set for all other records (and groups) that share the same parent.

In our previous example, if we want to enforce that the header record is the first record in the file, the trailer is the last, and all detail records appear in the middle, the mapping file could be changed as follows. Using this configuration, if a detail record were to appear before the header record, the BeanReader will throw an UnexpectedRecordException when the detail record is read out of order.

<beanio>

  <stream name="employeeFile" format="csv">
    <record name="header" order="1" minOccurs="1" maxOccurs="1" class="example.Header">
      <field name="recordType" rid="true" literal="Header" />
      <field name="fileDate" format="MMddyyyy" />
    </record>
    <record name="employee" order="2" minOccurs="0" maxOccurs="unbounded" class="example.Employee">
      <field name="recordType" rid="true" literal="Detail" />
      <field name="firstName" />
      <field name="lastName" />
      <field name="title" />
      <field name="salary" />
      <field name="hireDate" format="MMddyyyy" />
    </record>
    <record name="trailer" order="3" minOccurs="1" maxOccurs="1" class="example.Trailer">
      <field name="recordType" rid="true" literal="Trailer" />
      <field name="recordCount" />
    </record>  
  </stream>
  
</beanio>
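The ordering rule can be approximated with a small plain-Java check. This is a simplified sketch under stated assumptions, not BeanIO's algorithm: the RecordOrderSketch class and its Rule type are hypothetical, maxOccurs is not checked, and records still missing at the end of the stream are not reported.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RecordOrderSketch {

    // Order and minimum occurrences configured for one record type.
    static final class Rule {
        final int order;
        final int minOccurs;
        Rule(int order, int minOccurs) { this.order = order; this.minOccurs = minOccurs; }
    }

    // Mirrors the order and minOccurs values of the mapping file above.
    static Map<String, Rule> employeeFileRules() {
        Map<String, Rule> rules = new LinkedHashMap<>();
        rules.put("header", new Rule(1, 1));
        rules.put("employee", new Rule(2, 0));
        rules.put("trailer", new Rule(3, 1));
        return rules;
    }

    // Returns the index of the first record read out of order, or -1 if there is none.
    static int firstViolation(Map<String, Rule> rules, List<String> recordNames) {
        Map<String, Integer> counts = new HashMap<>();
        int currentOrder = 0;
        for (int i = 0; i < recordNames.size(); i++) {
            String name = recordNames.get(i);
            Rule rule = rules.get(name);
            if (rule == null || rule.order < currentOrder) {
                return i; // unknown record, or the stream already moved past its slot
            }
            if (rule.order > currentOrder) {
                // Moving forward: every earlier record must have met its minOccurs.
                for (Map.Entry<String, Rule> entry : rules.entrySet()) {
                    Rule earlier = entry.getValue();
                    int seen = counts.getOrDefault(entry.getKey(), 0);
                    if (earlier.order < rule.order && seen < earlier.minOccurs) {
                        return i;
                    }
                }
            }
            currentOrder = rule.order;
            counts.merge(name, 1, Integer::sum);
        }
        return -1;
    }

    public static void main(String[] args) {
        Map<String, Rule> rules = employeeFileRules();
        System.out.println(firstViolation(rules, Arrays.asList("header", "employee", "trailer")));
        System.out.println(firstViolation(rules, Arrays.asList("employee", "header", "trailer")));
    }
}
```

In the second call the detail record is flagged at index 0, which corresponds to the point where the text says the exception is thrown.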

4.2.3. Record Grouping

In some cases, a stream may be further divided into batches or groups of records. Continuing with our employee file, let's suppose employee detail records are batched by department, where each group of employees has a department header and a department trailer record. Thus an input file may look something like this:

Header,01012011
DeptHeader,Development
Detail,Joe,Smith,Developer,75000,10012009
Detail,Jane,Doe,Architect,80000,01152008
DeptTrailer,2
DeptHeader,Product Management
Detail,Jon,Anderson,Manager,85000,03182007
DeptTrailer,1
Trailer,2

BeanIO allows you to define groups of records using a group element to wrap the record types that belong to the group. Groups support the same order, minOccurs, and maxOccurs attributes, although their meaning is applied to the entire group. Once a record type that belongs to a group is matched, all other records in that group where minOccurs is 1 or greater must be read from the stream before the group may repeat or a different record can be read. Our mapping file would now look like this:

<beanio>

  <stream name="employeeFile" format="csv">
    <record name="header" order="1" minOccurs="1" maxOccurs="1" class="example.Header">
      <field name="recordType" rid="true" literal="Header" />
      <field name="fileDate" format="MMddyyyy" />
    </record>
    <group name="departmentGroup" order="2" minOccurs="0" maxOccurs="unbounded">
      <record name="deptHeader" order="1" minOccurs="1" maxOccurs="1" class="example.DeptHeader">
        <field name="recordType" rid="true" literal="DeptHeader" />
        <field name="departmentName" />
      </record>
      <record name="employee" order="2" minOccurs="0" maxOccurs="unbounded" class="example.Employee">
        <field name="recordType" rid="true" literal="Detail" />
        <field name="firstName" />
        <field name="lastName" />
        <field name="title" />
        <field name="salary" />
        <field name="hireDate" format="MMddyyyy" />
      </record>
      <record name="deptTrailer" order="3" minOccurs="1" maxOccurs="1" class="example.DeptTrailer">
        <field name="recordType" rid="true" literal="DeptTrailer" />
        <field name="employeeCount" />
      </record>  
    </group>
    <record name="trailer" order="3" minOccurs="1" maxOccurs="1" class="example.Trailer">
      <field name="recordType" rid="true" literal="Trailer" />
      <field name="departmentCount" />
    </record>  
  </stream>
  
</beanio>

The stream definition itself is a record group with defaults minOccurs="0" and maxOccurs="1". If you want your BeanReader to throw an exception if the stream is empty, simply change minOccurs to 1, or if you want to allow the entire stream to repeat indefinitely, simply change maxOccurs to unbounded as shown below.

<beanio>

  <stream name="employeeFile" format="csv" minOccurs="1" maxOccurs="unbounded">
    <!-- Record layout... -->
  </stream>
  
</beanio>

4.3. Fields

A record is made up of one or more fields, which are validated and bound to bean properties using the field element. All fields must specify a name attribute, which by default, is used to get and set the field value from the bean object.

Default getter and setter methods can be overridden using getter and setter attributes as shown below. If a field is a constructor argument, setter can be set to '#N' where N is the position of the argument in the constructor starting at 1 (not shown).

<beanio>

  <stream name="stream1" format="csv">
    <record name="record1" class="example.Record">
      <field name="firstName" />
      <field name="lastName" setter="setSurname" getter="getSurname"/>
      <field name="age" />
    </record>
  </stream>
  
</beanio>

Fields found in a stream that do not map to a bean property can be declared using the ignore field attribute. Note that any configured validation rules are still applied to ignored fields (not shown).

<beanio>

  <stream name="stream1" format="csv">
    <record name="record1" class="example.Record">
      <field name="firstName" />
      <field name="lastName" />
      <field name="age" />
      <field name="filler" ignore="true" />
    </record>
  </stream>
  
</beanio>

By default, BeanIO expects fields to appear in a CSV, delimited or fixed length stream in the same order they are declared in the mapping file. If this is not the case, a position field attribute can be configured for each field. If a position is declared for one field, a position must be declared for all other fields in the same record. For delimited (and CSV) formatted streams, position should be set to the index of the first occurrence of the field in the record, beginning at 0. For fixed length formatted streams, position should be set to the index of the first character of the first occurrence of the field in the record, beginning at 0. A negative position can be used to specify a field location relative to the end of the record. For example, a position of -2 indicates the second to last field in a delimited record.

The following example shows how the position attribute can be used. Although the fields are declared in a different order, the record definition is identical to the previous example. When positions are explicitly configured for an input stream, there is no need to declare all fields in a record, unless desired for validation purposes.

<beanio>

  <stream name="stream1" format="csv">
    <record name="record1" class="example.Record">
      <field name="filler" position="3" ignore="true" />
      <field name="lastName" position="1" />
      <field name="age" position="2"/>
      <field name="firstName" position="0" />
    </record>
  </stream>
  
</beanio>
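How a negative position resolves to a field index can be sketched as follows. This is an illustration of the rule described above for delimited records, not BeanIO code; FieldPositionSketch is a hypothetical name and the comma split is deliberately naive.

```java
final class FieldPositionSketch {

    // Converts a configured position into a zero-based index.
    // Negative positions count back from the end of the record.
    static int resolve(int position, int fieldCount) {
        int index = position >= 0 ? position : fieldCount + position;
        if (index < 0 || index >= fieldCount) {
            throw new IndexOutOfBoundsException(
                    "position " + position + " is outside a record of " + fieldCount + " fields");
        }
        return index;
    }

    // Returns the text of the field at the given position in a comma delimited line.
    static String fieldAt(String line, int position) {
        String[] fields = line.split(",", -1);
        return fields[resolve(position, fields.length)];
    }

    public static void main(String[] args) {
        String line = "Joe,Smith,32,unused";
        System.out.println(fieldAt(line, 0));   // first field
        System.out.println(fieldAt(line, -2));  // second to last field
    }
}
```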

4.3.1. Field Type Conversion

The property type of a field is determined by introspecting the bean object the field belongs to. If the bean class is of type java.util.Map or java.util.Collection, BeanIO will assume the field is of type java.lang.String, unless a field type is explicitly declared using a field's type attribute.

The type attribute may be set to any fully qualified class name or to one of the supported type aliases below. Type aliases are not case sensitive, and the same alias may be used for primitive types. For example, int and java.lang.Integer bean properties will use the same type handler registered for the type java.lang.Integer, or alias integer or int.

Class Name                Primitive  Alias(es)
java.lang.String          -          string
java.lang.Boolean         boolean    boolean
java.lang.Byte            byte       byte
java.lang.Character       char       character, char
java.lang.Short           short      short
java.lang.Integer         int        integer, int
java.lang.Long            long       long
java.lang.Float           float      float
java.lang.Double          double     double
java.math.BigInteger      -          biginteger
java.math.BigDecimal      -          bigdecimal, decimal
java.util.Date (1)        -          datetime, date, time
java.util.Calendar (2)    -          calendar, calendar-datetime, calendar-date, calendar-time
java.util.UUID            -          uuid
java.net.URL              -          url
java.lang.Enum (3)        -          -

1 By default, the date alias is used for java.util.Date types that contain date information only, and the time alias is used for java.util.Date types that contain only time information. Only the datetime alias can be used to replace the default type handler for the java.util.Date class.

2 By default, the calendar-date alias is used for java.util.Calendar types that contain date information only, and the calendar-time alias is used for java.util.Calendar types that contain only time information. Only the calendar-datetime and calendar aliases can be used to replace the default type handler for the java.util.Calendar class.

3 By default, enums are converted using Enum.valueOf(Class, String). If format="toString", the enum will be converted using values computed by calling toString() for each enum value. In either case, conversion is case sensitive. As with other types, a custom type handler can also be used for enums.

Optionally, a format attribute can be used to pass a decimal format for java.lang.Number types, and for passing a date format for java.util.Date types. In the example below, the hireDate field uses the SimpleDateFormat pattern "yyyy-MM-dd", and the salary field uses the DecimalFormat pattern "#,##0". For more information about supported patterns, please reference the API documentation for Java's java.text.DecimalFormat and java.text.SimpleDateFormat classes.

<beanio>

  <stream name="employeeFile" format="csv">
    <record name="header" minOccurs="1" maxOccurs="1" class="map">
      <field name="recordType" rid="true" literal="Header" />
      <field name="fileDate" type="java.util.Date" />
    </record>
    <record name="employee" minOccurs="0" maxOccurs="unbounded" class="map">
      <field name="recordType" rid="true" literal="Detail" />
      <field name="firstName" />
      <field name="lastName" />
      <field name="title" />
      <field name="salary" type="int" format="#,##0" />
      <field name="hireDate" type="date" format="yyyy-MM-dd" />
    </record>
    <record name="trailer" minOccurs="1" maxOccurs="1" class="map">
      <field name="recordType" rid="true" literal="Trailer" />
      <field name="recordCount" type="int" />
    </record>  
  </stream>
  
</beanio>
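The two patterns used in this mapping file can be tried directly against the JDK classes they are passed to. The sketch below uses only java.text; the FormatPatternSketch class is hypothetical, and the choice of Locale.US symbols for the decimal format is an assumption made so that the grouping separator is always a comma.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

final class FormatPatternSketch {

    private static DecimalFormat salaryFormat() {
        return new DecimalFormat("#,##0", DecimalFormatSymbols.getInstance(Locale.US));
    }

    private static SimpleDateFormat hireDateFormat() {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
        format.setLenient(false); // reject impossible dates such as 2009-13-40
        return format;
    }

    // Parses salary text such as "75,000" using the pattern "#,##0".
    static int parseSalary(String text) {
        try {
            return salaryFormat().parse(text).intValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid salary: " + text, e);
        }
    }

    static String formatSalary(int salary) {
        return salaryFormat().format(salary);
    }

    // Parses hire date text such as "2009-10-01" using the pattern "yyyy-MM-dd".
    static Date parseHireDate(String text) {
        try {
            return hireDateFormat().parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid hire date: " + text, e);
        }
    }

    static String formatHireDate(Date date) {
        return hireDateFormat().format(date);
    }

    public static void main(String[] args) {
        System.out.println(parseSalary("75,000"));
        System.out.println(formatSalary(75000));
        System.out.println(formatHireDate(parseHireDate("2009-10-01")));
    }
}
```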

4.3.2. Custom Type Handlers

Field type conversion is performed by a type handler. BeanIO includes type handlers for common Java types, or you can create your own type handler by implementing the org.beanio.types.TypeHandler interface shown below. When writing a custom type handler, make sure to handle null values and empty strings. Only one instance of your type handler is created, so if you plan to concurrently read or write multiple streams, make sure your type handler is also thread safe.

package org.beanio.types;

public interface TypeHandler {
    public Object parse(String text) throws TypeConversionException;
    public String format(Object value);
    public Class<?> getType();
}

The following example shows a custom type handler for the java.lang.Boolean class and boolean primitive based on "Y" or "N" indicators.

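package example;

import org.beanio.types.TypeConversionException;
import org.beanio.types.TypeHandler;

public class YNTypeHandler implements TypeHandler {
    // Parses "Y" as true; any other text, including null or an empty string, as false.
    public Object parse(String text) throws TypeConversionException {
        return "Y".equals(text);
    }
    // Formats true as "Y"; false and null as "N".
    public String format(Object value) {
        return value != null && ((Boolean)value).booleanValue() ? "Y" : "N";
    }
    // The Java type converted by this handler.
    public Class<?> getType() {
        return Boolean.class;
    }
}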

A type handler may be explicitly named using the name attribute, and/or registered for all fields of a particular type by setting the type attribute. The type attribute can be set to the fully qualified class name or type alias of the class supported by the type handler. To reference a named type handler, use the typeHandler field attribute when configuring the field.

Many default type handlers included with BeanIO support customization through the use of one or more property elements, where the name attribute is a bean property of the type handler, and the value attribute is the property value.

Type handlers can be declared globally (for all streams in the mapping file) or for a specific stream. Globally declared type handlers may optionally use a format attribute to narrow the type handler scope to a specific stream format.

In the example below, the first DateTypeHandler is declared globally for all stream formats. The second DateTypeHandler overrides the first for java.util.Date types in an XML formatted stream, and the YNTypeHandler is declared only for the 'employeeFile' stream. Stream specific type handlers override global type handlers when declared with the same name or for the same type.

<beanio>

  <typeHandler type="java.util.Date" class="org.beanio.types.DateTypeHandler">
    <property name="pattern" value="MMddyyyy" />
    <property name="lenient" value="true" />
  </typeHandler>
  <typeHandler type="java.util.Date" format="xml" class="org.beanio.types.DateTypeHandler">
    <property name="pattern" value="yyyy-MM-dd" />
  </typeHandler>

  <stream name="employeeFile" format="csv">
    <typeHandler name="ynHandler" class="example.YNTypeHandler" />
  
    <record name="employee" minOccurs="0" maxOccurs="unbounded" class="map">
      <field name="recordType" rid="true" literal="Detail" />
      <field name="firstName" />
      <field name="lastName" />
      <field name="title" />
      <field name="salary" />
      <field name="hireDate" />
      <field name="exempt" typeHandler="ynHandler" />
    </record> 
  </stream>
  
</beanio>

4.3.3. Repeating Fields

Repeating fields are also supported by BeanIO. For example, let's assume our Employee bean object contains a list of account numbers.

package example;
import java.util.Date;
import java.util.List;

public class Employee {
    String firstName;
    String lastName;
    String title;
    int salary;
    Date hireDate;
    List<Integer> accounts;
    
    // getters and setters not shown...
}

And let's assume our input file now looks like this:

Joe,Smith,Developer,75000,10012009
Chris,Johnson,Sales,80000,05292006,100012,200034,200045
Jane,Doe,Architect,80000,01152008
Jon,Anderson,Manager,85000,03182007,333001

In this example, the accounts bean property can be defined in the mapping file using a collection field attribute. The collection attribute can be set to the fully qualified class name of a java.util.Collection subclass, or to one of the collection type aliases below.

Class                 Alias       Default Implementation
java.util.Collection  collection  java.util.ArrayList
java.util.List        list        java.util.ArrayList
java.util.Set         set         java.util.HashSet
(Java Array)          array       N/A

Repeating fields can declare the number of occurrences of the field using the minOccurs and maxOccurs field attributes. If not declared, minOccurs will default to 1, and maxOccurs will default to the minOccurs value or 1, whichever is greater.

<beanio>

  <stream name="employeeFile" format="csv">
    <record name="employee" class="example.Employee">
      <field name="firstName" />
      <field name="lastName" />
      <field name="title" />
      <field name="salary" />
      <field name="hireDate" format="MMddyyyy" />
      <field name="accounts" type="int" collection="list" minOccurs="0" maxOccurs="unbounded" />
    </record> 
  </stream>
  
</beanio>
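The effect of this mapping on the sample input can be sketched in plain Java: the first five fields are fixed, and whatever follows is collected into the accounts list. This is not BeanIO code; RepeatingFieldSketch is a hypothetical name and the line is split on commas without real CSV parsing.

```java
import java.util.ArrayList;
import java.util.List;

final class RepeatingFieldSketch {

    // firstName, lastName, title, salary and hireDate precede the repeating field.
    private static final int FIXED_FIELDS = 5;

    // Collects every field after the fixed ones as an Integer account number.
    static List<Integer> accounts(String line) {
        String[] fields = line.split(",", -1);
        List<Integer> accounts = new ArrayList<>();
        for (int i = FIXED_FIELDS; i < fields.length; i++) {
            accounts.add(Integer.valueOf(fields[i].trim()));
        }
        return accounts;
    }

    public static void main(String[] args) {
        System.out.println(accounts("Joe,Smith,Developer,75000,10012009"));
        System.out.println(accounts("Chris,Johnson,Sales,80000,05292006,100012,200034,200045"));
    }
}
```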

Flat file formats (CSV, delimited and fixed length) may only contain one field or segment of indeterminate length (i.e. where maxOccurs is greater than minOccurs). The positions of any components that follow are assumed to be relative to the end of the record.

If a field repeats a fixed number of times based on a preceding field in the same record, the occursRef attribute can be used to identify the name of the controlling field. If the controlling field is not bound to a separate property of its parent bean object, be sure to specify ignore="true". The following mapping file shows how to configure the accounts field occurrences to be dependent on the numberOfAccounts field. If desired, minOccurs and maxOccurs may still be specified to validate the referenced field occurrences value.

<beanio>

  <stream name="employeeFile" format="csv">
    <record name="employee" class="example.Employee">
      <field name="firstName" />
      <field name="lastName" />
      <field name="title" />
      <field name="salary" />
      <field name="hireDate" format="MMddyyyy" />
      <field name="numberOfAccounts" ignore="true" />
      <field name="accounts" type="int" collection="list" occursRef="numberOfAccounts" />
    </record> 
  </stream>
  
</beanio>
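As a rough illustration of occursRef, the sketch below reads the controlling count from the sixth field and then expects exactly that many account numbers. It is plain Java rather than BeanIO code; OccursRefSketch is a hypothetical name, and the sample line in main is invented for this example because the guide does not show an input file for this mapping.

```java
import java.util.ArrayList;
import java.util.List;

final class OccursRefSketch {

    // Index of the numberOfAccounts field, which follows the five fixed fields.
    private static final int COUNT_INDEX = 5;

    // Reads the controlling count, then exactly that many account numbers.
    static List<Integer> accounts(String line) {
        String[] fields = line.split(",", -1);
        int count = Integer.parseInt(fields[COUNT_INDEX].trim());
        if (fields.length != COUNT_INDEX + 1 + count) {
            throw new IllegalArgumentException(
                    "Expected " + count + " accounts but found " + (fields.length - COUNT_INDEX - 1));
        }
        List<Integer> accounts = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            accounts.add(Integer.valueOf(fields[COUNT_INDEX + 1 + i].trim()));
        }
        return accounts;
    }

    public static void main(String[] args) {
        // Hypothetical input line: the value 3 announces three account numbers.
        System.out.println(accounts("Chris,Johnson,Sales,80000,05292006,3,100012,200034,200045"));
    }
}
```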

Note that a repeating field cannot be used for record identification.

4.3.4. Fixed Length Fields

Fixed length fields require a little more configuration than their delimited counterparts. Let's redefine our employee file example using the fixed length format below.

Position  Field       Format           Length
0         First Name  Text             10
10        Last Name   Text             10
20        Job Title   Text             10
30        Salary      Number           6
36        Hire Date   Date (MMDDYYYY)  8

A fixed length version of the employee file might look like the following:

Joe       Smith     Developer 07500010012009
Jane      Doe       Architect 08000001152008
Jon       Anderson  Manager   08500003182007

The length of a fixed length field must be configured using the length field attribute. By default, fixed length fields are left justified and padded with spaces, but these settings can be overridden using the padding and justify field attributes. Field padding can be set to any single character, and field justification can be set to left or right. Using these attributes, our mapping file can now be updated as follows:

<beanio>

  <stream name="employeeFile" format="fixedlength">
    <record name="employee" class="example.Employee">
      <field name="firstName" length="10" />
      <field name="lastName" length="10" />
      <field name="title" length="10" />
      <field name="salary" length="6" padding="0" justify="right" />
      <field name="hireDate" length="8" format="MMddyyyy" />
    </record> 
  </stream>
  
</beanio>

The configured padding character is removed from the beginning of the field if right justified, or from the end of the field if left justified, until a character is found that does not match the padding character. If the entire field is padded, Number property types default to the padding character if it is a digit, and the padding character is ignored for Character types. To illustrate this, some examples are shown in the table below.

Justify  Type       Padding  Padded Text  Unpadded Text
left     String     " "      "George  "   "George"
left     String     " "      "        "   ""
left     Character  " "      "A"          "A"
left     Character  " "      " "          " "
right    Number     "0"      "00123"      "123"
right    Number     "0"      "00000"      "0"
right    Number     "9"      "00000"      "00000"
right    Number     "9"      "99999"      "9"
right    Number     "X"      "XXXXX"      ""

The marshalling and unmarshalling behavior of null field values for a padded field is further controlled using the required attribute. If required is set to true, null field values are marshalled by filling the field with the padding character. If required is set to false, a null field value is marshalled as spaces for fixed length streams and an empty string for non-fixed length streams. Similarly, if required is set to false, spaces are unmarshalled to a null field value regardless of the padding character. To illustrate this, the following table shows the field text for a right justified zero padded 3 digit number.

Required  Field Value  Field Text (Fixed Length)  Field Text (CSV, Delimited, XML)
true      0            "000"                      "000"
true      null         "000" (1)                  "000" (1)
false     0            "000"                      "000"
false     null         "   "                      ""

1 Applies to marshalling only. Unmarshalling "000" would produce a field value of 0.

As hinted at above, padding settings can be applied to any field for any stream type.
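The unpadding rules described earlier in this section (strip the padding character from the justified side, then apply the special cases for fully padded Number and Character fields) can be expressed as a short plain-Java sketch. It is written from the rules and table in this section rather than taken from BeanIO; PaddingSketch and its Kind enum are hypothetical names.

```java
final class PaddingSketch {

    // The property type of the field, which decides how a fully padded field is treated.
    enum Kind { STRING, CHARACTER, NUMBER }

    // Removes padding from the start (right justified) or end (left justified) of the text.
    static String unpad(String text, char padding, boolean rightJustified, Kind kind) {
        int start = 0;
        int end = text.length();
        if (rightJustified) {
            while (start < end && text.charAt(start) == padding) {
                start++;
            }
        } else {
            while (end > start && text.charAt(end - 1) == padding) {
                end--;
            }
        }
        if (start == end) {
            // The entire field consisted of the padding character.
            if (kind == Kind.NUMBER && Character.isDigit(padding)) {
                return String.valueOf(padding);
            }
            if (kind == Kind.CHARACTER) {
                return text; // padding is ignored for Character types
            }
            return "";
        }
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println("[" + unpad("George  ", ' ', false, Kind.STRING) + "]");
        System.out.println("[" + unpad("00123", '0', true, Kind.NUMBER) + "]");
        System.out.println("[" + unpad("00000", '0', true, Kind.NUMBER) + "]");
    }
}
```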

4.4. Constants

If a bean property does not map to a field in the stream, a constant property value can still be set using a property element. Like a field, all properties must specify a name attribute, which by default, is used to get and set the property value from the bean object. Properties also require a value attribute for setting the textual representation of the property value. The value text is type converted using the same rules and attributes (type, typeHandler and format) used for field type conversion described above. Collection type properties are not supported.

<beanio>

  <stream name="employeeFile" format="csv">
    <record name="employee" class="map">
      <property name="recordType" value="employee" />
      <field name="firstName" />
      <field name="lastName" />
      <field name="title" />
      <field name="salary" />
      <field name="hireDate" format="MMddyyyy" />
    </record> 
  </stream>
  
</beanio>

Constant properties may be useful in two scenarios:

4.5. Segments

A segment is a group of fields within a record. Segments are most often used to bind a group of fields to a nested bean object or collection of bean objects, and are configured in a mapping file using a segment element.

Prior to release 2.x, the bean element performed this task. A segment supports all the functionality of a bean element, but unlike the original bean element, a segment is not required to be bound to a bean object. This allows repeating segments to be fully validated during unmarshalling, without necessarily binding the fields to a bean object. An "unbound" segment also allows an arbitrary number of XML fields to be wrapped by other XML nodes without creating bean objects that mirror the same hierarchy.

4.5.1. Nested Beans

As mentioned, a record can be divided into nested bean objects using a segment element. First, let's suppose we store an address in our CSV employee file, so that the record layout might look like this:

Position  Field       Format
0         First Name  Text
1         Last Name   Text
2         Job Title   Text
3         Salary      Number
4         Hire Date   Date (MMDDYYYY)
5         Street      Text
6         City        Text
7         State       Text
8         Zip         Text

Second, let's suppose we want to store address information in a new Address bean object like the one below, and add an Address reference to our Employee class.

package example;

public class Address {
    String street;
    String city;
    String state;
    String zip;
    
    // getters and setters not shown...
}

package example;
import java.util.Date;

public class Employee {
    String firstName;
    String lastName;
    String title;
    int salary;
    Date hireDate;
    Address mailingAddress;
    
    // getters and setters not shown...
}

With this information, we can now update our employee CSV mapping file to accommodate the nested Address object. A segment must include a name attribute, and may optionally provide a class attribute to bind its children to a bean object. If class is set, the attribute must be set to the fully qualified class name of the bean object, or to map, or to the class name of any concrete java.util.Map implementation. If the bean class is of type java.util.Map, field values are stored in the Map using the configured field names for keys. By default, the name attribute is used to determine the getter and setter on its parent bean or record. Alternatively, getter or setter attributes can be used to override the default property name similar to a field property.

<beanio>

  <stream name="employeeFile" format="csv">
    <record name="employee" class="example.Employee">
      <field name="firstName" />
      <field name="lastName" />
      <field name="title" />
      <field name="salary" />
      <field name="hireDate" format="MMddyyyy" />
      <segment name="mailingAddress" class="example.Address">
        <field name="street" />
        <field name="city" />
        <field name="state" />      
        <field name="zip" />
      </segment>
    </record> 
  </stream>
  
</beanio>

If class is not set, fields will be automatically bound to the segment's parent bean object, which would be the Employee object in the example above.

If needed, segments can be further divided into other segments. There is no limit to the number of nested levels that can be configured in a mapping file.

4.5.2. Repeating Segments

Similar to repeating fields, BeanIO supports repeating segments, which may be bound to a collection of bean objects. Continuing our previous example, let's suppose the employee CSV file may contain 1 or more addresses for each employee. Thus our Employee bean object might look like this:

package example;
import java.util.Date;
import java.util.List;

public class Employee {
    String firstName;
    String lastName;
    String title;
    int salary;
    Date hireDate;
    List<Address> addressList;
    
    // getters and setters not shown...
}

And our input file might look like this:

Joe,Smith,Developer,75000,10012009,123 State St,Chicago,IL,60614
Jane,Doe,Architect,80000,01152008,456 Main St,Chicago,IL,60611,111 Michigan Ave,Chicago,IL,60611
Jon,Anderson,Manager,85000,03182007,1212 North Ave,Chicago,IL,60614

In our mapping file, in order to bind a segment to a collection, simply set its collection attribute to the fully qualified class name of a java.util.Collection or java.util.Map subclass, or to one of the collection type aliases below.

Class                 Alias       Default Implementation
java.util.Collection  collection  java.util.ArrayList
java.util.List        list        java.util.ArrayList
java.util.Set         set         java.util.HashSet
java.util.Map         map         java.util.LinkedHashMap
(Java Array)          array       N/A

Repeating segments can declare the number of occurrences using the minOccurs and maxOccurs attributes. If not declared, minOccurs will default to 1, and maxOccurs will default to the minOccurs value or 1, whichever is greater.

Just like repeating fields, if the number of occurrences of a segment is dependent on a preceding field in the same record, the occursRef attribute can be set to the name of the field that controls the number of occurrences.

Flat file formats (CSV, delimited and fixed length) may only contain one field or segment of indeterminate length (i.e. where maxOccurs is greater than minOccurs). The positions of any components that follow are assumed to be relative to the end of the record.

<beanio>

  <stream name="employeeFile" format="csv">
    <record name="employee" class="example.Employee">
      <field name="firstName" />
      <field name="lastName" />
      <field name="title" />
      <field name="salary" />
      <field name="hireDate" format="MMddyyyy" />
      <segment name="addressList" collection="list" minOccurs="1" maxOccurs="unbounded" class="example.Address">
        <field name="street" />
        <field name="city" />
        <field name="state" />      
        <field name="zip" />
      </segment>
    </record> 
  </stream>
  
</beanio>
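To show what this mapping does to the sample input, the following plain-Java sketch groups the fields after the first five into blocks of four, one Address per block. It is an approximation rather than BeanIO code: RepeatingSegmentSketch and its nested Address class are hypothetical, and the comma split assumes no field contains a comma or quote.

```java
import java.util.ArrayList;
import java.util.List;

final class RepeatingSegmentSketch {

    // Simplified stand-in for example.Address.
    static final class Address {
        final String street;
        final String city;
        final String state;
        final String zip;

        Address(String street, String city, String state, String zip) {
            this.street = street;
            this.city = city;
            this.state = state;
            this.zip = zip;
        }
    }

    private static final int FIXED_FIELDS = 5;   // firstName through hireDate
    private static final int SEGMENT_FIELDS = 4; // street, city, state, zip

    // Builds one Address per block of four trailing fields; at least one block is required.
    static List<Address> addresses(String line) {
        String[] f = line.split(",", -1);
        int remaining = f.length - FIXED_FIELDS;
        if (remaining < SEGMENT_FIELDS || remaining % SEGMENT_FIELDS != 0) {
            throw new IllegalArgumentException("Incomplete address segment in: " + line);
        }
        List<Address> result = new ArrayList<>();
        for (int i = FIXED_FIELDS; i < f.length; i += SEGMENT_FIELDS) {
            result.add(new Address(f[i], f[i + 1], f[i + 2], f[i + 3]));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Address> list = addresses(
                "Jane,Doe,Architect,80000,01152008,456 Main St,Chicago,IL,60611,111 Michigan Ave,Chicago,IL,60611");
        for (Address address : list) {
            System.out.println(address.street + ", " + address.city);
        }
    }
}
```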

When working with repeating segments, there are a few restrictions to keep in mind:

4.5.2.1. Inline Maps

As noted above, a segment can also be bound to a java.util.Map, which provides support for "inline" maps. For example, consider the following CSV file of users:

id1,firstName1,lastName1,id2,firstName2,lastName2
jsmith,Joe,Smith,jdoe,Jane,Doe

The following mapping file could be used to create a Map of User objects by ID. The key attribute is used to set the name of a descendant field to use for the Map key.

<beanio>

  <stream name="employeeFile" format="csv">
    <record name="employee" target="userMap">
      <segment name="userMap" class="example.User" collection="map" key="id" 
          minOccurs="1" maxOccurs="unbounded">
        <field name="id" />
        <field name="firstName" />
        <field name="lastName" />  
      </segment>
    </record> 
  </stream>
  
</beanio>

If a Map of last names by ID is needed instead, simply replace the class attribute with value and specify the name of the descendant field to use for the Map value. In this case, first name is effectively ignored.

<beanio>

  <stream name="employeeFile" format="csv">
    <record name="employee" target="userMap">
      <segment name="userMap" collection="map" key="id" value="lastName" 
          minOccurs="1" maxOccurs="unbounded">
        <field name="id" />
        <field name="firstName" />
        <field name="lastName" />  
      </segment>
    </record> 
  </stream>
  
</beanio>
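The result of the second mapping can be pictured with a short plain-Java sketch that walks the record three fields at a time, using the id as the key and the last name as the value. It is illustrative only and not BeanIO code; InlineMapSketch is a hypothetical name. A LinkedHashMap is used because that is the default implementation listed for the map alias.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class InlineMapSketch {

    private static final int SEGMENT_FIELDS = 3; // id, firstName, lastName

    // Builds a map of last names keyed by id; the first name is read but not stored.
    static Map<String, String> lastNamesById(String line) {
        String[] f = line.split(",", -1);
        if (f.length % SEGMENT_FIELDS != 0) {
            throw new IllegalArgumentException("Incomplete user segment in: " + line);
        }
        Map<String, String> users = new LinkedHashMap<>();
        for (int i = 0; i < f.length; i += SEGMENT_FIELDS) {
            users.put(f[i], f[i + 2]);
        }
        return users;
    }

    public static void main(String[] args) {
        System.out.println(lastNamesById("jsmith,Joe,Smith,jdoe,Jane,Doe"));
    }
}
```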

4.6. Stream Validation

A BeanReader will throw an InvalidRecordException if a record or one of its fields fails a configured validation rule. There are two types of errors reported for an invalid record: record level errors and field level errors. If a record level error occurs, further processing of the record is aborted and an exception is immediately thrown. If a field level error is reported, the BeanReader will continue to process the record's other fields before throwing an exception.

When an InvalidRecordException is thrown, the exception will contain the reported record and field level errors. The following code shows how this information can be accessed using the RecordContext.

    BeanReader in;
    try {
        Object record = in.read();
        if (record != null) {
            // process record...
        }
    }
    catch (InvalidRecordException ex) {
        RecordContext context = ex.getRecordContext();
        if (context.hasRecordErrors()) {
            for (String error : context.getRecordErrors()) {
                // handle record errors...
            }
        }
        if (context.hasFieldErrors()) {
            for (String field : context.getFieldErrors().keySet()) {
                for (String error : context.getFieldErrors(field)) {
                    // handle field error...
                }
            }
        }
    }

Alternatively, it may be simpler to register a BeanReaderErrorHandler for handling non-fatal exceptions. The example below shows how invalid records could be written to a reject file by extending BeanReaderErrorHandlerSupport. (Note that the example assumes the mapping file does not bind a record group to a bean object.)

    BeanReader input;
    BufferedWriter rejects;
    try {
        input.setErrorHandler(new BeanReaderErrorHandlerSupport() {
            public void invalidRecord(InvalidRecordException ex) throws Exception {
                rejects.write(ex.getRecordContext().getRecordText());
                rejects.newLine();
            }
        });
        
        Object record = null;
        while ((record = input.read()) != null) {
            // process a valid record
        }
        
        rejects.flush();
    }
    finally {
        input.close();
        rejects.close();
    }

Record and field level error messages can be customized and localized through the use of resource bundles. A resource bundle is configured at the stream level using the resourceBundle attribute as shown below.

<beanio>

  <typeHandler type="java.util.Date" class="org.beanio.types.DateTypeHandler">
    <property name="pattern" value="MMddyyyy" />
  </typeHandler>

  <stream name="employeeFile" format="csv" resourceBundle="example.messages" >
    <record name="employee" class="map">
      <field name="recordType" rid="true" literal="Detail" />
      <field name="firstName" />
      <field name="lastName" />
      <field name="title" />
      <field name="salary" />
      <field name="hireDate" />
    </record> 
  </stream>
  
</beanio>

Record level error messages are retrieved using the following prioritized list of keys. If a message is not configured under the name of the first key, the next key will be tried until a message is found, or a default message is used.

  1. recorderror.[record name].[rule]
  2. recorderror.[rule]

Similarly, field level error messages are retrieved using the following prioritized list of keys:

  1. fielderror.[record name].[field name].[rule]
  2. fielderror.[record name].[rule]
  3. fielderror.[rule]
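The lookup order for field level messages can be sketched in plain Java. This is only an illustration of the priority rule described above, not BeanIO's implementation: the class MessageKeySketch, its methods and the Map standing in for a resource bundle are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MessageKeySketch {

    // Candidate keys for a field level error, most specific first.
    static List<String> fieldErrorKeys(String record, String field, String rule) {
        return Arrays.asList(
            "fielderror." + record + "." + field + "." + rule,
            "fielderror." + record + "." + rule,
            "fielderror." + rule);
    }

    // Returns the first configured message, or the fallback if no key matches.
    static String resolve(Map<String, String> bundle, List<String> keys, String fallback) {
        for (String key : keys) {
            String message = bundle.get(key);
            if (message != null) {
                return message;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // A Map stands in for the resource bundle in this sketch.
        Map<String, String> bundle = new HashMap<>();
        bundle.put("fielderror.employee.hireDate.type", "Invalid date format");
        bundle.put("fielderror.maxLength", "Maximum field length exceeded for {3}");

        System.out.println(resolve(bundle, fieldErrorKeys("employee", "hireDate", "type"), "default"));
        System.out.println(resolve(bundle, fieldErrorKeys("employee", "title", "maxLength"), "default"));
    }
}
```

The first lookup matches the most specific key, while the second falls through to the rule-only key.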

More descriptive or localized labels can be configured for record and field names using the keys label.[record name] and label.[record name].[field name] respectively.

For example, the following resource bundle could be used to customize a few error messages for the employee file.

# 'employee' record label:
label.employee = Employee Record
# 'firstName' field label:
label.employee.firstName = First Name Field
# Unidentified record error message:
recorderror.unidentified = Unidentified record at line {0}
# Type conversion error message for the 'hireDate' field:
fielderror.employee.hireDate.type = Invalid date format
# Maximum field length error message for all fields:
fielderror.maxLength = Maximum field length exceeded for {3}

Error messages are formatted using a java.text.MessageFormat. Depending on the validation rule that was violated, different parameters are passed to the MessageFormat. Appendix B documents the parameters passed to the MessageFormat for each validation rule.
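As a reminder of how java.text.MessageFormat substitutes indexed parameters, here is a small standalone sketch. The parameter values are placeholders chosen for illustration (Appendix B lists what BeanIO actually passes for each rule), and the class name MessageFormatSketch is hypothetical.

```java
import java.text.MessageFormat;
import java.util.Locale;

final class MessageFormatSketch {

    // Formats a pattern with indexed parameters; a fixed locale keeps
    // number formatting predictable in this sketch.
    static String format(String pattern, Object... params) {
        return new MessageFormat(pattern, Locale.US).format(params);
    }

    public static void main(String[] args) {
        // Placeholder values only; index 3 is used by the pattern below.
        Object[] params = { 12, "a,b,c", "employee", "First Name Field" };
        System.out.println(format("Maximum field length exceeded for {3}", params));

        // MessageFormat treats a single quote as an escape character,
        // so an apostrophe must be doubled to appear in the output.
        System.out.println(format("Field ''{3}'' is too long", params));
    }
}
```

Because of the quoting rule, messages containing apostrophes (for example "can't") need the apostrophe doubled in the resource bundle.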

4.6.1. Record Validation

The following record level validation rules may be configured on a record element.

Attribute  Argument Type  Description
minLength Integer Validates the record contains at least minLength fields for delimited and CSV formatted streams, or has at least minLength characters for fixed length formatted streams.
maxLength Integer Validates the record contains at most maxLength fields for delimited and CSV formatted streams, or has at most maxLength characters for fixed length formatted streams.

4.6.2. Field Validation

BeanIO supports several common field validation rules when reading an input stream. All field validation rules are validated against the field text before type conversion. When field trimming is enabled, trim="true", all validations are performed after the field's text has first been trimmed. Field validations are ignored when writing to an output stream.

The following table lists supported field attributes for validation.

Attribute  Argument Type  Description
required Boolean When set to true, validates the field is present and the field text is not the empty string.
minLength Integer Validates the field text is at least N characters.
maxLength Integer Validates the field text does not exceed N characters.
literal String Validates the field text exactly matches the literal value.
regex String Validates the field text matches the given regular expression pattern.
minOccurs String Validates the minimum occurrences of the field in a stream. If the field is present in the stream, minOccurs is satisfied, and the required setting determines whether a value is required.
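The ordering described above (trim first, then validate the field text, all before type conversion) can be sketched as follows. This is not BeanIO code: the class FieldValidationSketch is hypothetical, and two behaviors are assumptions made for the sketch, namely that an empty field skips the remaining checks and that regex must match the entire field text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class FieldValidationSketch {

    // Returns the names of the violated rules for one field's text.
    // A null minLength, maxLength or regex means the rule is not configured.
    static List<String> validate(String text, boolean trim, boolean required,
                                 Integer minLength, Integer maxLength, String regex) {
        List<String> violations = new ArrayList<>();
        if (text != null && trim) {
            text = text.trim();                 // trimming happens before validation
        }
        if (text == null || text.isEmpty()) {
            if (required) {
                violations.add("required");
            }
            return violations;                  // assumption: nothing else to check
        }
        if (minLength != null && text.length() < minLength) {
            violations.add("minLength");
        }
        if (maxLength != null && text.length() > maxLength) {
            violations.add("maxLength");
        }
        // Assumption: the pattern must match the whole field text.
        if (regex != null && !Pattern.matches(regex, text)) {
            violations.add("regex");
        }
        return violations;
    }

    public static void main(String[] args) {
        // The same padded text passes maxLength=5 only when trimming is enabled.
        System.out.println(validate("  12345  ", true, false, null, 5, null));
        System.out.println(validate("  12345  ", false, false, null, 5, null));
    }
}
```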

4.7. Templates

When a common set of fields is used by multiple record types, configuration may be simplified using templates. A template is a reusable list of components (segments, fields, and properties/constants) that can be included by a record, segment or other template. The following example illustrates some of the ways a template can be used:

<beanio>

  <template name="address">
    <field name="street1" />
    <field name="street2" />
    <field name="city" />
    <field name="state" />
    <field name="zip" />
  </template>

  <template name="employee">
    <field name="firstName" />
    <field name="lastName" />
    <field name="title" />
    <field name="salary" />
    <field name="hireDate" format="MMddyyyy" />
    <segment name="mailingAddress" template="address" class="example.Address" />
  </template>
      
  <stream name="employeeFile" format="csv">
    <record name="employee" template="employee" class="example.Employee" />
  </stream>

  <stream name="addressFile" format="csv">
    <record name="address" class="example.Address">
      <field name="location" />
      <include template="address"/>
      <field name="attention" />
    </record> 
  </stream>

</beanio>

Templates are essentially copied into their destination using the include element. For convenience, record and segment elements support a template attribute which includes the template before any other children.

The include element can optionally specify a positional offset for included fields using the offset attribute. The following example illustrates this behavior. Even when using templates, remember that position must be declared for all fields or none.

<beanio>

  <template name="address">
    <field name="street1" position="0" />
    <field name="street2" position="1" />
    <field name="city" position="2" />
    <field name="state" position="3" />
    <field name="zip" position="4" />
  </template>

  <stream name="addressFile" format="csv">
    <record name="address" class="example.Address">
      <field name="location" position="0" />
      <include template="address" offset="1"/>
      <field name="attention" position="6" />
    </record> 
  </stream>

</beanio>
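The arithmetic behind offset is simple: each included field's position is its template position plus the offset. The plain-Java sketch below (not BeanIO code; TemplateOffsetSketch is a hypothetical name) applies this to the address template above, which is why the attention field is declared at position 6.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TemplateOffsetSketch {

    // Shifts every template field position by the include offset.
    static Map<String, Integer> include(Map<String, Integer> template, int offset) {
        Map<String, Integer> shifted = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> field : template.entrySet()) {
            shifted.put(field.getKey(), field.getValue() + offset);
        }
        return shifted;
    }

    public static void main(String[] args) {
        // Positions as declared in the 'address' template.
        Map<String, Integer> address = new LinkedHashMap<>();
        address.put("street1", 0);
        address.put("street2", 1);
        address.put("city", 2);
        address.put("state", 3);
        address.put("zip", 4);

        // The 'address' record: location, the included template, attention.
        Map<String, Integer> record = new LinkedHashMap<>();
        record.put("location", 0);
        record.putAll(include(address, 1));
        record.put("attention", 6);
        System.out.println(record);
    }
}
```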

4.8. Advanced Topics

4.8.1. Mapping Bean Objects that Span Multiple Records

Since release 2.0, BeanIO supports the binding of multiple consecutive records to a single bean object. This can be achieved by assigning a bean class to a stream or group containing the record configurations bound to the bean.

Let's suppose we are reading a CSV input file of orders that contains an order, followed by the customer that placed the order, followed by a detailed list of items that make up the order. A sample input file might look like this:

Order,101,2012-02-01,5.00
Customer,John,Smith
Item,Apple,2,2.00
Item,Orange,1,1.00
Order,102,2012-02-01,3.00
Customer,Jane,Johnson
Item,Ham,1,3.00

Let's then suppose we want to read the following Order class from the stream, which contains a reference to Customer and Item classes. (For brevity, getters and setters are not shown.)

package example;
import java.math.BigDecimal;
import java.util.Date;
import java.util.List;

public class Order {
    String id;
    Date date;
    BigDecimal amount;
    Customer customer;
    List<Item> items;
}

public class Customer {
    String firstName;
    String lastName;
}

public class Item {
    String name;
    int quantity;
    BigDecimal amount;
}

Now to read and write Order objects from our example stream, the following mapping file can be used:

<beanio>

  <stream name="orders" format="csv">
    <group name="order" class="example.Order" minOccurs="0" maxOccurs="unbounded">
      <record name="orderRecord" order="1" minOccurs="1">
        <field name="recordType" rid="true" literal="Order" ignore="true" />
        <field name="id" />
        <field name="date" format="yyyy-MM-dd" />
        <field name="amount" />
      </record>
      <record name="customer" class="example.Customer" order="2" minOccurs="1" maxOccurs="1">
        <field name="recordType" rid="true" literal="Customer" ignore="true" />
        <field name="firstName" />
        <field name="lastName" />
      </record>      
      <record name="items" class="example.Item" collection="list" order="3" minOccurs="1" maxOccurs="unbounded">
        <field name="recordType" rid="true" literal="Item" ignore="true" />
        <field name="name" />
        <field name="quantity" />
        <field name="amount" />
      </record>
    </group>
  </stream>
  
</beanio>

By configuring a class on a group component, BeanIO will automatically marshal or unmarshal all of the group's descendants in a single call to read or write from the stream. Also note that by not configuring a class on a record, in this case our "orderRecord", the fields are instead set on the bean class assigned to its parent group. Finally, repeating records can be aggregated into a collection using a collection attribute at the record level, as used for the "items" record. If necessary, getter and setter attributes can be configured on a record component as well.

If any record included in a group bound to a bean object is invalid, an InvalidRecordException is thrown, but only after reading all the other records in the group. In such cases, the InvalidRecordException will contain RecordContext objects for every record in the group read from the stream. If multiple records in the group are invalid, only one InvalidRecordException is thrown.

If a malformed or unidentified record is read from the stream while unmarshalling a record group, an exception is immediately thrown, and the BeanReader will most likely not be able to recover. For this reason, when unmarshalling untrusted sources, it is recommended that you read the stream twice, using the first pass to validate the integrity of the file including syntax, record identification, record ordering, possible header/trailer counts, etc. For example, the following mapping file might be used to validate our orders file.

<beanio>

  <stream name="orders-validation" format="csv">
    <group name="order" minOccurs="0" maxOccurs="unbounded">
      <record name="orderRecord" order="1" minOccurs="1">
        <field name="recordType" rid="true" literal="Order" ignore="true" />
      </record>
      <record name="customer" order="2" minOccurs="1">
        <field name="recordType" rid="true" literal="Customer" ignore="true" />
      </record>      
      <record name="items" order="3" minOccurs="1" maxOccurs="unbounded">
        <field name="recordType" rid="true" literal="Item" ignore="true" />
      </record>
    </group>
  </stream>
  
</beanio>

In this case, we are validating syntax, record ordering and record identification for the entire file in a single call to beanReader.read(), while leaving other record and field level validations for unmarshalling, which can be caught and handled without worrying whether the BeanReader will be able to recover.
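To make the ordering rule enforced by the validation mapping concrete, the following plain-Java sketch checks the same structure by hand: zero or more groups of one Order, one Customer and one or more Item records. It deliberately does not use BeanIO; it only illustrates what the first pass verifies, and the class name OrderFileStructureSketch is hypothetical. It also assumes the record type is always the first comma-separated field and ignores CSV quoting.

```java
import java.util.Arrays;
import java.util.List;

final class OrderFileStructureSketch {

    // Encodes each line's record type as one letter, then checks that the
    // file is zero or more groups of: Order, Customer, one or more Items.
    static boolean hasValidStructure(List<String> lines) {
        StringBuilder types = new StringBuilder();
        for (String line : lines) {
            String type = line.split(",", 2)[0];
            switch (type) {
                case "Order":    types.append('O'); break;
                case "Customer": types.append('C'); break;
                case "Item":     types.append('I'); break;
                default:         return false;   // unidentified record
            }
        }
        return types.toString().matches("(OCI+)*");
    }

    public static void main(String[] args) {
        List<String> file = Arrays.asList(
            "Order,101,2012-02-01,5.00",
            "Customer,John,Smith",
            "Item,Apple,2,2.00",
            "Item,Orange,1,1.00",
            "Order,102,2012-02-01,3.00",
            "Customer,Jane,Johnson",
            "Item,Ham,1,3.00");
        System.out.println(hasValidStructure(file));
    }
}
```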

5.0. Mapping XML Streams

This section provides further details for using BeanIO to marshall and unmarshall Java objects to and from XML formatted streams. This section assumes you are already familiar with the mapping file concepts documented in previous sections.

5.1. Introduction

BeanIO is similar to other OXM (Object to XML Mapping) libraries, except that it is also capable of marshalling and unmarshalling extremely large XML files by reading and writing Java beans one record at a time. BeanIO uses a streaming XML (StAX) parser to read and write XML, and will never hold more than the minimum amount of XML in memory needed to marshall or unmarshall a single bean object. That said, it is still possible to run out of memory (heap space) with poorly designed XML documents and/or misconfigured mapping files.

5.1.1. My First XML Stream

Before diving into the details, let's start with a basic example using the employee input file from Section 2.1 after it's been converted to XML (shown below).

<?xml version="1.0"?>
<employeeFile>
  <employee>
    <firstName>Joe</firstName>
    <lastName>Smith</lastName>
    <title>Developer</title>
    <salary>75000</salary>
    <hireDate>2009-10-12</hireDate>
  </employee>
  <employee>
    <firstName>Jane</firstName>
    <lastName>Doe</lastName>
    <title>Architect</title>
    <salary>80000</salary>
    <hireDate>2008-01-15</hireDate>
  </employee>
  <employee>
    <firstName>Jon</firstName>
    <lastName>Andersen</lastName>
    <title>Manager</title>
    <salary>85000</salary>
    <hireDate>2007-03-18</hireDate>
  </employee>
</employeeFile>

In this example, let's suppose we are unmarshalling the XML employee file into the same Employee bean object from Section 2.1 and repeated below.

package example;
import java.util.Date;

public class Employee {
    String firstName;
    String lastName;
    String title;
    int salary;
    Date hireDate;
    
    // getters and setters not shown...
}

Our original mapping file from Section 2.1 can now be updated to parse XML instead of CSV with only two minor changes. First, the stream format is changed to xml. Second, the hire date field format is removed and replaced with type="date". With XML, the date format does not need to be explicitly declared because it conforms to the W3C XML Schema date syntax. (This is explained further in Section 5.7.1.)

<beanio xmlns="http://www.beanio.org/2012/03" 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.beanio.org/2012/03 http://www.beanio.org/2012/03/mapping.xsd">

  <stream name="employeeFile" format="xml">
    <record name="employee" class="example.Employee">
      <field name="firstName" />
      <field name="lastName" />
      <field name="title" />
      <field name="salary" />
      <field name="hireDate" type="date" />
    </record>
  </stream>
</beanio>

That's it! No Java code changes are required, and as before, Employee bean objects will be unmarshalled from the XML input stream each time beanReader.read() is called.

And also as before, Employee objects can be marshalled to an XML output stream using beanWriter.write(Object). However, please note that when marshalling/writing XML, it is even more important to call beanWriter.close() so that the XML document can be properly completed.

5.1.2. A Note on XML Validation

Because BeanIO is built like a pull parser, it does not support XML validation against a DTD or XML schema. Where this functionality is needed, it is recommended to make two passes on the input document. The first pass can use a SAX parser or other means to validate the XML, and the second pass can use BeanIO to parse and process bean objects read from the document.
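One possible shape for the first (validation) pass is the standard javax.xml.validation API that ships with the JDK, used here as the "other means" mentioned above. The sketch is independent of BeanIO; the inline schema, the sample documents and the class name XmlValidationPassSketch are hypothetical and exist only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

final class XmlValidationPassSketch {

    // Hypothetical schema for one employee element, for illustration only.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='employee'><xs:complexType><xs:sequence>"
        + "<xs:element name='firstName' type='xs:string'/>"
        + "<xs:element name='hireDate' type='xs:date'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // First pass: true if the document is well formed and valid against XSD.
    static boolean isValid(String xml) {
        try {
            Validator validator = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)))
                .newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;                       // validation or parse error
        } catch (IOException e) {
            throw new UncheckedIOException(e);  // not expected for in-memory input
        }
    }

    public static void main(String[] args) {
        String good = "<employee><firstName>Joe</firstName><hireDate>2009-10-12</hireDate></employee>";
        String bad  = "<employee><firstName>Joe</firstName><hireDate>10/12/2009</hireDate></employee>";
        System.out.println(isValid(good));
        System.out.println(isValid(bad));
    }
}
```

Only documents that pass this check would then be handed to a BeanReader in the second pass.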

5.2. XML Names

Each BeanIO mapping component (stream, group, record, segment and field) is mapped to an XML element with the same local name. If the name of the stream, group, etc. does not match the XML element name, the xmlName attribute can be used. For example, if the root element in the previous example's employee file were renamed from "employeeFile" to "employees", and "title" were renamed to "position", the mapping file could be updated as shown below.

<beanio>

  <stream name="employeeFile" format="xml" xmlName="employees">
    <record name="employee" class="example.Employee">
      <field name="firstName" />
      <field name="lastName" />
      <field name="title" xmlName="position" />
      <field name="salary" />
      <field name="hireDate" type="date" />
    </record>
  </stream>
  
</beanio>

5.3. XML Namespaces

XML namespaces can be enabled through the use of the xmlNamespace attribute on any mapping component (stream, group, record, segment or field). By default, all mapping elements inherit their namespace (or lack thereof) from their parent. When a namespace is declared, the local name and namespace must match when unmarshalling XML, and appropriate namespace declarations are included when marshalling bean objects. For example, let's suppose our employee file contains namespaces as shown below.

<?xml version="1.0"?>
<employeeFile xmlns="http://example.com/employeeFile" xmlns:n="http://example.com/name">
  <e:employee xmlns:e="http://example.com/employee">
    <n:firstName>Joe</n:firstName>
    <n:lastName>Smith</n:lastName>
    <e:title>Developer</e:title>
    <e:salary>75000</e:salary>
    <e:hireDate>2009-10-12</e:hireDate>
  </e:employee>
  .
  .
  .
</employeeFile>

To unmarshall the file using namespaces, and to marshall Employee bean objects in the same fashion as they appear above, the following mapping file can be used.

<beanio>

  <stream name="employeeFile" format="xml" xmlNamespace="http://example.com/employeeFile">
    <parser>
      <property name="namespaces" value="n http://example.com/name"/>
    </parser>
    <record name="employee" class="example.Employee" xmlNamespace="http://example.com/employee" xmlPrefix="e">
      <field name="firstName" xmlNamespace="http://example.com/name" />
      <field name="lastName" xmlNamespace="http://example.com/name" />
      <field name="title" />
      <field name="salary" />
      <field name="hireDate" type="date" />
    </record>
  </stream>
  
</beanio>

From this example, the following behavior can be observed:

BeanIO also supports a special wildcard namespace. If xmlNamespace is set to '*', any namespace is allowed when unmarshalling XML, and no namespace declaration will be made when marshalling XML.

The following table summarizes namespace configuration options and their effect on the configured element and a child that inherits its parent namespace.

Mapping Configuration Marshalled Element And Child
[None]
<element>
  <child/>
</element>
xmlNamespace="*"
<element>
  <child/>
</element>
xmlNamespace=""
<element xmlns="">
  <child/>
</element>
xmlNamespace="http://example.com"
<element xmlns="http://example.com">
  <child/>
</element>
xmlNamespace="http://example.com" xmlPrefix="e"
<e:element xmlns:e="http://example.com">
  <e:child/>
</e:element>
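To see which namespace each element of the namespaced employee file shown earlier in this section belongs to, a namespace-aware StAX reader from the JDK can list the resolved names. This sketch does not involve BeanIO; the class name NamespaceSketch and the {namespace}localName output format are choices made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class NamespaceSketch {

    // Returns every element as "{namespaceURI}localName" in document order.
    static List<String> qualifiedNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    String ns = reader.getNamespaceURI();
                    names.add("{" + (ns == null ? "" : ns) + "}" + reader.getLocalName());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml =
              "<employeeFile xmlns='http://example.com/employeeFile' xmlns:n='http://example.com/name'>"
            + "<e:employee xmlns:e='http://example.com/employee'>"
            + "<n:firstName>Joe</n:firstName><n:lastName>Smith</n:lastName>"
            + "<e:title>Developer</e:title>"
            + "</e:employee></employeeFile>";
        for (String name : qualifiedNames(xml)) {
            System.out.println(name);
        }
    }
}
```

The prefix itself never appears in the resolved name; only the namespace URI and local name identify an element.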

5.4. Streams

When unmarshalling multiple records from an XML document, the stream configuration is mapped to the root element in the XML formatted stream. This default behavior has been demonstrated in previous examples. If on the other hand, an XML document contains only a single record, the document can be fully read or written by setting the stream configuration's xmlType attribute to none. This behavior is similar to other OXM libraries that marshall or unmarshall one bean object per XML document.

For example, if BeanIO was used to unmarshall a single employee record submitted via a web service, the XML document might look like the following. Notice there is no 'employeeFile' root element for containing multiple employee records.

<employee>
  <firstName>Joe</firstName>
  <lastName>Smith</lastName>
  <title>Developer</title>
  <salary>75000</salary>
  <hireDate>2009-10-12</hireDate>
</employee>

In this example, setting xmlType="none" on the stream, as shown in the mapping file below, allows BeanIO to unmarshall/marshall a single employee record.

<beanio>

  <stream name="employeeFile" format="xml" xmlType="none">
    <record name="employee" class="example.Employee">
      <field name="firstName" />
      <field name="lastName" />
      <field name="title" />
      <field name="salary" />
      <field name="hireDate" type="date" />
    </record>
  </stream>
  
</beanio>

5.5. Groups

Like other mapping elements, groups are mapped to XML elements by default. If a group is used only for control purposes, its xmlType attribute can be set to none.

5.6. Records

A record is always mapped to an XML element. As we've seen before, records are matched based on their group context and configured record identifying fields. XML records are further matched using their XML element name, as defined by xmlName, or if not present, name. Other than configured record identifying fields, segment and field names declared within the record are not used to identify records.

For example, let's suppose our employee file differentiated managers using 'manager' tags.

<?xml version="1.0"?>
<employeeFile>
  <employee>
    <firstName>Joe</firstName>
    <lastName>Smith</lastName>
    <title>Developer</title>
    <salary>75000</salary>
    <hireDate>2009-10-12</hireDate>
  </employee>
  <employee>
    <firstName>Jane</firstName>
    <lastName>Doe</lastName>
    <title>Architect</title>
    <salary>80000</salary>
    <hireDate>2008-01-15</hireDate>
  </employee>
  <manager>
    <firstName>Jon</firstName>
    <lastName>Andersen</lastName>
    <title>Manager</title>
    <salary>85000</salary>
    <hireDate>2007-03-18</hireDate>
  </manager>
</employeeFile>

To bind managers to a new Manager bean we could use the following mapping configuration.

<beanio>

  <stream name="employeeFile" format="xml">
    <record name="employee" class="example.Employee">
      <field name="firstName" />
      <field name="lastName" />
      <field name="title" />
      <field name="salary" />
      <field name="hireDate" type="date" />
    </record>
    <record name="manager" class="example.Manager">
      <field name="firstName" />
      <field name="lastName" />
      <field name="title" />
      <field name="salary" />
      <field name="hireDate" type="date" />
    </record>
  </stream>
  
</beanio>

5.7. Fields

A field is mapped to XML using the field's xmlType attribute, which defaults to element. The field XML type can be set to element, attribute, text, or none. The following table illustrates possible configurations, except for none which is not covered here.

Record Definition  Sample Record
<record name="person" class="map">
  <field name="name" xmlType="element"/>
</record>
<person>
  <name>John</name>
</person>
<record name="person" class="map">
  <field name="name" xmlType="attribute"/>
</record>
<person name="John"/>
<record name="person" class="map">
  <field name="name" xmlType="text"/>
</record>
<person>John</person>

5.7.1. Field Type Conversion

Field type conversion works the same way for XML formatted streams as it does for other formats. However, several default type handlers are overridden specifically for XML formatted streams to conform to the W3C XML Schema built-in data types. The following table summarizes the overridden type handlers:

Class or Type Alias                                                        XML Schema Data Type  Example
date, calendar-date                                                        date                  2011-01-01
datetime, java.util.Date, calendar, calendar-datetime, java.util.Calendar  dateTime              2011-01-01T15:14:13
time, calendar-time                                                        time                  15:14:13
boolean                                                                    boolean               true
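The lexical forms in the Example column can be checked with the JDK's own XML Schema date support. The sketch below uses javax.xml.datatype to parse a literal and report the built-in type it corresponds to. It only illustrates the formats; whether BeanIO's XML type handlers use this API internally is not stated here, and the class name XmlSchemaDateSketch is hypothetical.

```java
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.XMLGregorianCalendar;

final class XmlSchemaDateSketch {

    // Parses an XML Schema date, dateTime or time literal and returns the
    // local name of the matching built-in type. Literals that do not follow
    // an XML Schema lexical form cause an IllegalArgumentException.
    static String schemaTypeOf(String lexical) {
        try {
            XMLGregorianCalendar value =
                DatatypeFactory.newInstance().newXMLGregorianCalendar(lexical);
            return value.getXMLSchemaType().getLocalPart();
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException("no DatatypeFactory available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(schemaTypeOf("2011-01-01"));
        System.out.println(schemaTypeOf("2011-01-01T15:14:13"));
        System.out.println(schemaTypeOf("15:14:13"));
    }
}
```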

Like other type handlers, XML specific type handlers can be customized or completely replaced. Please consult BeanIO javadocs for customization details.

5.7.2. Marshalling Null Field Values

The nillable and minOccurs field attributes control how a null field value is marshalled. If minOccurs is 0, an element or attribute is not marshalled for the field. If an element type field has nillable set to true and minOccurs set to 1, the W3C XML Schema Instance attribute nil is set to true.

This behavior is illustrated in the following table.

Field Type  Record Definition  Marshalled Record (Field Value is Null)
element
<record name="person" class="map">
  <field name="name" />
</record>
<person>
  <name/>
</person>
<record name="person" class="map">
  <field name="name" minOccurs="0" />
</record>
<person/>
<record name="person" class="map">
  <field name="name" nillable="true"/>
</record>
<person>
  <name xsi:nil="true"/>
</person>
attribute
<record name="person" class="map">
  <field name="name" xmlType="attribute"/>
</record>
<person/>
<record name="person" class="map">
  <field name="name" xmlType="attribute" minOccurs="1"/>
</record>
<person name=""/>
text
<record name="person" class="map">
  <field name="name" xmlType="text"/>
</record>
<person/>
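For readers unfamiliar with xsi:nil, the following StAX sketch writes a person element by hand and marks a null name as nil, mirroring the nillable="true" row above. It is written against the JDK's javax.xml.stream API only and is not how BeanIO is invoked; the class name NilElementSketch is hypothetical.

```java
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class NilElementSketch {

    private static final String XSI = XMLConstants.W3C_XML_SCHEMA_INSTANCE_NS_URI;

    // Writes <person> with a <name> child; a null name is written as an
    // empty element carrying xsi:nil="true".
    static String writePerson(String name) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement("person");
            writer.writeNamespace("xsi", XSI);      // declare the xsi prefix once
            if (name == null) {
                writer.writeEmptyElement("name");
                writer.writeAttribute("xsi", XSI, "nil", "true");
            } else {
                writer.writeStartElement("name");
                writer.writeCharacters(name);
                writer.writeEndElement();
            }
            writer.writeEndElement();
            writer.flush();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("unable to write XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(writePerson("John"));
        System.out.println(writePerson(null));
    }
}
```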

5.8. Segments

A segment can be used to bind a group of fields to a nested bean object, or to wrap a field or group of fields under an XML element.

5.8.1. Nested Beans

Segments can be used to bind a group of fields to a bean object. The xmlType assigned to the segment determines the format of the XML. Possible values are element (default) and none. The difference can be explored using the Address and Employee beans defined in Section 4.4 and repeated here.

package example;

public class Address {
    String street;
    String city;
    String state;
    String zip;
    
    // getters and setters not shown...
}

package example;
import java.util.Date;

public class Employee {
    String firstName;
    String lastName;
    String title;
    int salary;
    Date hireDate;
    Address mailingAddress;
    
    // getters and setters not shown...
}

By default, a segment's xmlType is set to element, so it is not necessary to declare it in the mapping file below.

<beanio>

  <stream name="employeeFile" format="xml">
    <record name="employee" class="example.Employee">
      <field name="firstName" />
      <field name="lastName" />
      <field name="title" />
      <field name="salary" />
      <field name="hireDate" type="date" />
      <segment name="mailingAddress" class="example.Address" xmlType="element">
        <field name="street" />
        <field name="city" />
        <field name="state" />      
        <field name="zip" />
      </segment>
    </record> 
  </stream>
  
</beanio>

This mapping configuration can be used to process the sample XML document below. When a segment is mapped to an XML element, nillable and minOccurs will control the marshalling behavior of null bean objects in the same fashion as a field (see Section 5.7.2).

<?xml version="1.0"?>
<employeeFile>
  <employee>
    <firstName>Joe</firstName>
    <lastName>Smith</lastName>
    <title>Developer</title>
    <salary>75000</salary>
    <hireDate>2009-10-12</hireDate>
    <mailingAddress>
      <street>123 Main Street</street>
      <city>Chicago</city>
      <state>IL</state>
      <zip>12345</zip>
    </mailingAddress>
  </employee>
  .
  .
  .
</employeeFile>

Alternatively, if the segment's xmlType is set to none, the following XML document can be processed.

<?xml version="1.0"?>
<employeeFile>
  <employee>
    <firstName>Joe</firstName>
    <lastName>Smith</lastName>
    <title>Developer</title>
    <salary>75000</salary>
    <hireDate>2009-10-12</hireDate>
    <street>123 Main Street</street>
    <city>Chicago</city>
    <state>IL</state>
    <zip>12345</zip>
  </employee>
  .
  .
  .
</employeeFile>

5.8.2. Wrapped Segments

In some cases, an XML document may contain extraneous elements that do not map directly to a bean object or property value. In these cases, a segment (without a class attribute) can be used to wrap a field or group of fields.

Extending the previous example, let's suppose the Employee bean object is modified to hold a list of addresses.

package example;
import java.util.Date;
import java.util.List;

public class Employee {
    String firstName;
    String lastName;
    String title;
    int salary;
    Date hireDate;
    List<Address> addressList;
    
    // getters and setters not shown...
}

And let's further suppose that each employee's list of addresses is enclosed in a new element called addresses.

<?xml version="1.0"?>
<employeeFile>
  <employee>
    <firstName>Joe</firstName>
    <lastName>Smith</lastName>
    <title>Developer</title>
    <salary>75000</salary>
    <hireDate>2009-10-12</hireDate>
    <addresses>
      <mailingAddress>
        <street>123 Main Street</street>
        <city>Chicago</city>
        <state>IL</state>
        <zip>12345</zip>
      </mailingAddress>
    </addresses>
  </employee>
  .
  .
  .
</employeeFile>

The mapping file can now be updated as follows:

<beanio>

  <stream name="employeeFile" format="xml">
    <record name="employee" class="example.Employee">
      <field name="firstName" />
      <field name="lastName" />
      <field name="title" />
      <field name="salary" />
      <field name="hireDate" type="date" />
      <segment name="addresses">
        <segment name="mailingAddress" class="example.Address" collection="list" minOccurs="0" maxOccurs="unbounded">
          <field name="street" />
          <field name="city" />
          <field name="state" />      
          <field name="zip" />
        </segment>
      </segment>
    </record> 
  </stream>
  
</beanio>

The following table illustrates the effect of wrapping a field in a segment for each field xmlType, along with the effect of minOccurs and nillable when marshalling null field values.

Field Mapping Non-Null Field Value Null Field Value
<segment name="wrapper">
  <field name="field" />
</segment>
<wrapper>
  <field>value</field>
</wrapper>
<wrapper>
  <field/>
</wrapper>
<segment name="wrapper" minOccurs="0">
  <field name="field" />
</segment>
-
<segment name="wrapper" nillable="true">
  <field name="field" />
</segment>
<wrapper xsi:nil="true"/>
<segment name="wrapper">
  <field name="field" nillable="true" />
</segment>
<wrapper>
  <field xsi:nil="true"/>
</wrapper>
<segment name="wrapper">
  <field name="field" minOccurs="0"/>
</segment>
<wrapper/>
<segment name="wrapper">
  <field name="field" xmlType="attribute" />
</segment>
<wrapper field="value"/>
<wrapper/>
<segment name="wrapper">
  <field name="field" xmlType="attribute" minOccurs="1" />
</segment>
<wrapper field=""/>
<segment name="wrapper" minOccurs="0">
  <field name="field" xmlType="attribute" minOccurs="1" />
</segment>
-
<segment name="wrapper">
  <field name="field" xmlType="text" />
</segment>
<wrapper>value</wrapper>
<wrapper/>
<segment name="wrapper" nillable="true">
  <field name="field" xmlType="text" />
</segment>
<wrapper xsi:nil="true"/>
<segment name="wrapper" minOccurs="0">
  <field name="field" xmlType="text"/>
</segment>
-

Similarly, a segment can be used to wrap a repeating field as illustrated below.

Field Mapping  Collection  Null or Empty Collection
<segment name="wrapper">
  <field name="field" collection="list" 
    minOccurs="0" maxOccurs="10" />
</segment>
<wrapper>
  <field>value1</field>
  <field>value2</field>
</wrapper>
<wrapper />
<segment name="wrapper">
  <field name="field" collection="list" 
    minOccurs="1" maxOccurs="10" />
</segment>
<wrapper>
  <field/>
</wrapper>
<segment name="wrapper" minOccurs="0">
  <field name="field" collection="list" 
    minOccurs="1" maxOccurs="10" />
</segment>
-
<segment name="wrapper" nillable="true">
  <field name="field" collection="list" 
    minOccurs="1" maxOccurs="10" />
</segment>
<wrapper xsi:nil="true"/>

6.0. Annotations and the Stream Builder API

Since release 2.1, BeanIO includes support for Java annotations and a stream builder API.

6.1. The Stream Builder API

The stream builder API can be used to programmatically create a stream mapping without the need for a mapping file.


    StreamFactory factory = StreamFactory.newInstance();
    
    // create a new StreamBuilder and define its layout
    StreamBuilder builder = new StreamBuilder("employeeFile")
        .format("delimited")
        .parser(new DelimitedParserBuilder(','))
        .addRecord(new RecordBuilder("employee")
            .type(Employee.class)
            .minOccurs(1)
            .addField(new FieldBuilder("type").rid().literal("EMP").ignore())
            .addField(new FieldBuilder("firstName"))
            .addField(new FieldBuilder("lastName"))
            .addField(new FieldBuilder("title"))
            .addField(new FieldBuilder("salary"))
            .addField(new FieldBuilder("hireDate").format("MMddyyyy")));

    // pass the StreamBuilder to the factory
    factory.define(builder);
    
    BeanReader in = factory.createReader("employeeFile", new File("employee.csv"));
    // etc...

As in a mapping file, components are assumed to be ordered as they are added to their parent, unless at (for fields) or order (for records and groups) is explicitly set. For more information, refer to the Javadocs for the org.beanio.builder package.

6.2. Annotations

Java classes can also be annotated to augment the use of the stream builder API or mapping file. Classes may be annotated with @Record, and class attributes or getter/setter methods may be annotated with @Field. Any component annotated with @Record or @Segment may be further annotated using @Fields to include fields not bound to a Java property.

Continuing our previous example, the example.Employee class could be annotated like so:

@Record(minOccurs=1)
@Fields({
    @Field(name="type", at=0, rid=true, literal="EMP")
})
public class Employee {
    
    @Field(at=1)
    private String firstName;
    @Field(at=2)
    private String lastName;
    @Field(at=3)
    private String title;
    @Field(at=4)
    private String salary;
    @Field(at=5, format="MMddyyyy")
    private Date hireDate;

    // getters and setters...
}

Using an annotated Employee class, the stream builder example can be greatly simplified:


    StreamFactory factory = StreamFactory.newInstance();
    
    StreamBuilder builder = new StreamBuilder("employeeFile")
        .format("delimited")
        .parser(new DelimitedParserBuilder(','))
        .addRecord(Employee.class);

    factory.define(builder);

As can a mapping file:

<beanio>

  <stream name="employeeFile" format="delimited">
    <parser>
      <property name="delimiter" value="," />
    </parser>
    <record name="employee" class="example.Employee" />
  </stream>
  
</beanio>

When using annotations, it is strongly recommended to explicitly set the position (using at) for all fields and segments. BeanIO does not guarantee the order in which annotated components are added to a layout.

Annotation settings are generally named according to their mapping file counterparts and follow the same convention as well. Refer to Appendix A for a complete explanation of all settings.

Where used, annotated records cannot be overridden by mapping file components. Configuration settings other than class and descendant components will be ignored.

7.0. Spring Batch Integration

As of release 1.2, BeanIO can be used to read and write flat files with Spring Batch (2.1.x), a batch processing framework by SpringSource.

7.1. BeanIO ItemReader/Writer Beans

The class org.beanio.spring.BeanIOFlatFileItemReader implements Spring Batch's ItemReader interface and can be used to read flat files using a BeanIO stream mapping file. The following Spring bean definition shows a BeanIO item reader configuration that loads a BeanIO mapping file called 'mapping.xml' from the classpath to read a file called 'in.txt'. The location of the mapping file is set using the streamMapping property, and the name of the stream layout is specified using the streamName property.

  <bean id="itemReader" class="org.beanio.spring.BeanIOFlatFileItemReader">
    <property name="streamMapping" value="classpath:/mapping.xml" />
    <property name="streamName" value="stream" />
    <property name="resource" value="file:in.txt" />
  </bean>

Similarly, the class org.beanio.spring.BeanIOFlatFileItemWriter implements Spring Batch's ItemWriter interface and can be used to write flat files using a BeanIO stream mapping file. The following Spring bean definition shows a BeanIO item writer configuration that loads a BeanIO mapping file called 'mapping.xml' from the classpath to write a file called 'out.txt'.

  <bean id="itemWriter" class="org.beanio.spring.BeanIOFlatFileItemWriter">
    <property name="streamMapping" value="classpath:/mapping.xml" />
    <property name="streamName" value="stream" />
    <property name="resource" value="file:out.txt" />
  </bean>

BeanIO item readers and writers are restartable, and support many of the same properties supported by the flat file item reader and writer included with Spring Batch. Please refer to their API documentation for details.

7.2. BeanIO StreamFactory Bean

By default, a BeanIO item reader/writer creates its own stream factory, but in cases where this could cause one or more mapping files to be loaded multiple times, it may be preferable to create a shared stream factory instance. BeanIO's org.beanio.spring.BeanIOStreamFactory class can be used to create a shared stream factory that can be injected into BeanIO item readers and writers. The following Spring beans configuration file illustrates this:

<beans>

  <bean id="streamFactory" class="org.beanio.spring.BeanIOStreamFactory">
    <property name="streamMappings">
      <list>
        <value>classpath:/mapping1.xml</value>
        <value>file:/mapping2.xml</value>
      </list>
    </property>
  </bean>
  
  <bean id="itemReader" class="org.beanio.spring.BeanIOFlatFileItemReader">
    <property name="streamFactory" ref="streamFactory" />
    <property name="streamName" value="stream" />
    <property name="resource" value="file:in.txt" />
  </bean>
  
</beans>

8.0. Configuration

In some cases, BeanIO behavior can be controlled by setting optional property values. Properties can be set using System properties or a property file. BeanIO will load configuration settings in the following order of priority:

  1. System properties.
  2. A property file named beanio.properties. The file will be looked for first in the application's working directory, and then on the classpath.

The name and location of beanio.properties can be overridden using the System property org.beanio.configuration. In the following example, configuration settings will be loaded from the file named config/settings.properties, first relative to the application's working directory, and if not found, then from the root of the application's classpath.

java -Dorg.beanio.configuration=config/settings.properties example.Main
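
The lookup order described above can be sketched in plain Java. This is only an illustration of the documented priority, not BeanIO's actual implementation: the class SettingsSketch and its methods are hypothetical names, and the sketch assumes that a System property always takes precedence over a value loaded from the property file.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Hypothetical helper; it does not use or mirror any BeanIO class.
class SettingsSketch {

    /**
     * Loads a property file, looking first in the working directory and
     * then on the classpath. Returns empty properties if it is not found.
     */
    static Properties loadFile(String location) throws IOException {
        Properties props = new Properties();
        File file = new File(location);
        try (InputStream in = file.isFile()
                ? new FileInputStream(file)
                : SettingsSketch.class.getClassLoader().getResourceAsStream(location)) {
            if (in != null) {
                props.load(in);
            }
        }
        return props;
    }

    /** Resolves a setting: System property, then file value, then default. */
    static String resolve(String key, Properties fileProps, String defaultValue) {
        String value = System.getProperty(key);
        if (value == null) {
            value = fileProps.getProperty(key);
        }
        return value != null ? value : defaultValue;
    }
}
```

For example, the file location itself could be resolved with System.getProperty("org.beanio.configuration", "beanio.properties") before calling loadFile.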

8.1. Settings

The following configuration settings are supported by BeanIO:

Property Description Default
org.beanio.allowProtectedAccess Whether private and protected class variables and constructors can be accessed (i.e. make accessible using the reflection API). true
org.beanio.lazyIfEmpty Whether objects are lazily instantiated if String properties are empty (and not just null). true
org.beanio.errorIfNullPrimitive Whether null field values will cause an exception if bound to a primitive property. false
org.beanio.useDefaultIfMissing Whether default values apply to fields missing from the stream. true
org.beanio.propertyEscapingEnabled Whether property values (for typeHandler, reader and writer elements) support escape patterns for line feeds, carriage returns, tabs, etc. Set to true or false. true
org.beanio.nullEscapingEnabled Whether the null character can be escaped using \0 when property escaping is enabled. Set to true or false. true
org.beanio.marshalDefaultEnabled Whether a configured field default is marshalled for null property values. May be disabled for backwards compatibility by setting the value to false. true
org.beanio.defaultTypeHandlerLocale Sets the default type handler locale. Locale.getDefault()
org.beanio.defaultDateFormat Sets the default SimpleDateFormat pattern for date and calendar-date type fields in CSV, delimited and fixed length file formats. DateFormat.getDateInstance()
org.beanio.defaultDateTimeFormat Sets the default SimpleDateFormat pattern for datetime, calendar-datetime and calendar type fields in CSV, delimited and fixed length file formats. DateFormat.getDateTimeInstance()
org.beanio.defaultTimeFormat Sets the default SimpleDateFormat pattern for time and calendar-time type fields in CSV, delimited and fixed length file formats. DateFormat.getTimeInstance()
org.beanio.group.minOccurs Sets the default minOccurs for a group. 0
org.beanio.record.minOccurs Sets the default minOccurs for a record. 0
org.beanio.field.minOccurs.[format] Sets the default minOccurs for a field by stream format. 1
org.beanio.propertyAccessorFactory Sets the method of property invocation to use. Defaults to reflection, but may be set to asm to use ASM bytecode generation for setting bean properties. (In my limited benchmarking, this can improve marshalling and unmarshalling performance by up to 20% in a 1.5 JVM. Modern JVMs (1.6+) will likely see little to no performance improvement. I strongly recommend conducting your own benchmarking tests before changing this setting.) reflection
org.beanio.xml.defaultXmlType Sets the default XML type for a field in an XML formatted stream. May be set to element or attribute. element
org.beanio.xml.xsiNamespacePrefix Sets the default prefix for the namespace http://www.w3.org/2001/XMLSchema-instance. xsi
org.beanio.xml.sorted Whether XML fields are sorted by position (if assigned). true

Appendix A: XML Mapping File Reference

Appendix A is the complete reference for the BeanIO 2.x XML mapping file schema. The root element of a mapping file is beanio with namespace http://www.beanio.org/2012/03. The following notation is used to indicate the allowed number of child elements:

* Zero, one or more
+ One or more
? Zero or one

Ranges

Where noted, some attributes can be configured using a range notation. A range is expressed using the following syntax, where N and M are integer values:

N Upper and lower boundaries are set to N.
N-M Lower boundary is set to N. Upper boundary is set to M.
N+ Lower boundary is set to N. No upper boundary.
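
The three range forms can be illustrated with a small parser. This is a sketch under stated assumptions rather than BeanIO's own parsing code: the class RangeSketch is hypothetical, and representing "no upper boundary" as Integer.MAX_VALUE is a choice made for this example.

```java
// Hypothetical helper illustrating the N, N-M and N+ range forms.
class RangeSketch {

    /** Returns {lower, upper}; upper is Integer.MAX_VALUE when unbounded. */
    static int[] parse(String range) {
        String text = range.trim();
        if (text.endsWith("+")) {
            // "N+": lower boundary N, no upper boundary
            int min = Integer.parseInt(text.substring(0, text.length() - 1).trim());
            return new int[] { min, Integer.MAX_VALUE };
        }
        int dash = text.indexOf('-');
        if (dash > 0) {
            // "N-M": lower boundary N, upper boundary M
            int min = Integer.parseInt(text.substring(0, dash).trim());
            int max = Integer.parseInt(text.substring(dash + 1).trim());
            if (max < min) {
                throw new IllegalArgumentException("Upper boundary is below lower boundary: " + range);
            }
            return new int[] { min, max };
        }
        // "N": both boundaries are N
        int n = Integer.parseInt(text);
        return new int[] { n, n };
    }
}
```

With this sketch, "5" yields the pair (5, 5), "1-3" yields (1, 3), and "2+" yields a lower boundary of 2 with no upper limit.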

A.1. beanio

The beanio element is the root element for a BeanIO mapping file.

Children: property*, import*, typeHandler*, template*, stream*

A.2. import

The import element is used to import type handlers, templates and streams from an external mapping file. Streams declared in a mapping file being imported are not affected by global type handlers or templates declared in the file that imported it.

Attributes:

Attribute Description Required
resource The name of the resource to import.

The resource name must be qualified with 'classpath:' to load the resource from the classpath, or with 'file:' to load the file relative to the application's working directory.

Yes

A.3. typeHandler

A typeHandler element is used to declare a custom field type handler that implements the org.beanio.types.TypeHandler interface. A type handler can be registered for a specific Java type, or registered for a Java type and stream format combination, or explicitly named.

Attributes:

Attribute Description Required
name The type handler name. A field can always reference a type handler by name, even if the stream format does not match the configured type handler format attribute.

When configured, the name of a globally declared type handler must be unique within a mapping and any imported mapping files.

One of name or type is required.
type The fully qualified classname or type alias to register the type handler for. If format is also set, the type handler will only be used by streams that match the configured format. One of name or type is required.
class The fully qualified classname of the TypeHandler implementation. Yes
format When used in conjunction with the type attribute, a type handler can be registered for a specific stream format. Set to xml, csv, delimited, or fixedlength. If not set, the type handler may be used by any stream format. No

Children: property*

A.4. property

A property element has several uses.

  1. When used at the top of a mapping file as a direct child of beanio, a property may declare properties to use for property substitution in other attributes within the mapping file. Property substitution uses the syntax ${propertyName,default}, where all whitespace between the braces is retained. Properties cannot be imported from another file.
  2. Or, a property element may be used to customize other elements, such as a typeHandler or parser.
  3. Or finally, a property value can be used to set constant values on a bean object, which is further described below.
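
The ${propertyName,default} syntax from the first use above can be sketched as follows. This is an illustration only, not BeanIO's implementation: the class SubstitutionSketch is hypothetical, and throwing an exception when a property has neither a value nor a default is an assumption of this example.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating ${propertyName,default} substitution.
class SubstitutionSketch {

    // Group 1 is the property name, optional group 2 is the default value.
    private static final Pattern PLACEHOLDER =
        Pattern.compile("\\$\\{([^,}]*)(?:,([^}]*))?\\}");

    static String substitute(String text, Properties props) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String fallback = m.group(2); // null when no default was given
            // Whitespace inside the placeholder is kept as-is
            String value = props.getProperty(name, fallback);
            if (value == null) {
                // Assumption: an unresolved placeholder is treated as an error
                throw new IllegalArgumentException("Missing property: " + name);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Given a property named delimiter with the value "|", the text "${delimiter,;}" resolves to "|", while "${missing,;}" falls back to ";".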
Attribute Description Required
name The property name. Yes
value The property value.

When used to customize a typeHandler or parser, default type handlers only are used to convert property text to an object value. String and Character type property values can use the following escape sequences: \\ (Backslash), \n (Line Feed), \r (Carriage Return), \t (Tab), \0 (Null) and \f (Form Feed). A backslash preceding any other character is ignored.

Yes
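
The escape sequences listed for the value attribute can be illustrated with a small decoder. This is a sketch, not BeanIO's own code: the class EscapeSketch is hypothetical, "ignored" is taken to mean that the backslash is dropped and the following character kept, and a trailing lone backslash is left unchanged as a further assumption.

```java
// Hypothetical helper illustrating the documented escape sequences.
class EscapeSketch {

    static String unescape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c != '\\' || i + 1 == text.length()) {
                // Ordinary character, or a trailing backslash (kept as-is)
                out.append(c);
                continue;
            }
            char next = text.charAt(++i);
            switch (next) {
                case '\\': out.append('\\'); break; // backslash
                case 'n':  out.append('\n'); break; // line feed
                case 'r':  out.append('\r'); break; // carriage return
                case 't':  out.append('\t'); break; // tab
                case '0':  out.append('\0'); break; // null character
                case 'f':  out.append('\f'); break; // form feed
                default:   out.append(next);        // backslash is dropped
            }
        }
        return out.toString();
    }
}
```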

A property element, when used as child of a record or segment element, can be used to set constant values on a record or bean object that do not map to a field in the input or output stream. The following additional attributes are accepted in this scenario:

Attributes:

Attribute Description Required Format(s)
getter The getter method used to retrieve the property value from its parent bean class. By default, the getter method is determined through introspection using the property name. No *
setter The setter method used to set the property value on its parent bean class. By default, the setter method is determined through introspection using the property name. No *
rid Record identifier indicator for marshalling/writing only. Set to true if this property is used to identify the record mapping configuration used to marshall a bean object. More than one property or field can be used for identification. Defaults to false. No *
type The fully qualified class name or type alias of the property value. By default, BeanIO will derive the property type from the bean class. This attribute can be used to override the default or may be required if the bean class is of type Map. No *
typeHandler The name of the type handler to use for type conversion. By default, BeanIO will select a type handler based on type when set, or through introspection of the property's parent bean class. No *
format The decimal format pattern for Number type properties, or the simple date format pattern for Date type properties.

The format value can be accessed by any custom type handler that implements ConfigurableTypeHandler.

No *

A.5. template

The template element is used to create reusable lists of bean properties.

Note that templates are "expanded" at the time they are included. This means an imported template that relies on property substitution will use property values from the mapping file that included it and not the mapping file where the template was declared.

Attributes:

Attribute Description Required
name The name of the template. Template names must be unique within a mapping file and any imported mapping files. Yes

Children: ( field | property | segment | include )*

A.6. include

The include element is used to include a template in a record, segment, or another template.

Attributes:

Attribute Description Required
template The name of the template to include. Yes
offset The offset added to field positions included by the template. Defaults to 0. No

A.7. stream

A stream element defines the record layout of an input or output stream.

Attributes:

Attribute Description Required Format(s)
name The name of the stream. Yes *
format The stream format. Either xml, csv, delimited or fixedlength Yes *
mode By default, a stream mapping can be used for both reading input streams and writing output streams, called readwrite mode. Setting mode to read or write instead, respectively restricts usage to a BeanReader or a BeanWriter only, but relaxes some validations on the mapping configuration.

When mode is set to read, a bean class does not require getter methods.

When mode is set to write, a bean class may be abstract or an interface, and does not require setter methods.

No *
resourceBundle The name of the resource bundle for customizing error messages. No *
strict When set to true, BeanIO will calculate and enforce record ordering based on the order records are declared. The record order attribute can still be used to override a particular section of the stream.

When set to true, BeanIO will also calculate and enforce record lengths based on configured fields and their occurrences. The record minLength and maxLength attributes can still be used to override BeanIO defaults.

Defaults to false.

No *
minOccurs The minimum number of times the record layout must be read from an input stream. Defaults to 0. No *
maxOccurs The maximum number of times the record layout can repeat when read from an input stream. Defaults to 1. No *
occurs An alternative to specifying both minOccurs and maxOccurs that uses range notation. No *
ignoreUnidentifiedRecords If set to true, BeanIO will skip records that cannot be identified, otherwise an UnidentifiedRecordException is thrown. This feature is not recommended for use with record groups, since a record sequencing error could cause large portions of a stream to go unprocessed without any exception. No *
xmlType The XML node type mapped to the stream. If not specified or set to element, the stream is mapped to the root element of the XML document being marshalled or unmarshalled. If set to none, the XML input stream will be fully read and mapped to a child group or record. No xml
xmlName The local name of the XML element mapped to the stream. Defaults to the stream name. No xml
xmlNamespace The namespace of the XML element mapped to the stream. Defaults to '*' which will ignore namespaces while marshalling and unmarshalling. No xml
xmlPrefix The namespace prefix assigned to the declared xmlNamespace for marshalling XML. If not specified, the default namespace (i.e. xmlns="...") is used. No xml

Children: parser?, typeHandler*, ( record | group )+

A.8. parser

A parser element is used to customize or replace the default record parser factory for a stream.

Attributes:

Attribute Description Required
class The fully qualified class name of the org.beanio.stream.RecordParserFactory implementation to use for this stream. If not specified, one of the following default factories is used based on the stream format:

csv - org.beanio.stream.csv.CsvRecordParserFactory
delimited - org.beanio.stream.delimited.DelimitedRecordParserFactory
fixedlength - org.beanio.stream.fixedlength.FixedLengthRecordParserFactory
xml - org.beanio.stream.xml.XmlRecordParserFactory

Overriding the record parser factory for XML is not supported (but also not prevented).

No

Children: property*

A.9. group

A group element is used to group records together for validating occurrences of the group as a whole.

Attributes:

Attribute Description Required Format(s)
name The name of the group. Yes *
class The fully qualified class name of the bean object mapped to this group. A class may be bound to a group when its marshalled form spans multiple consecutive records.

During unmarshalling, if any record in the group fails validation, an InvalidRecordGroupException is thrown.

No *
value
target
The name of a child component (typically a record) to return in lieu of an assigned class.

There can be only one iteration of the named value. For example, if a repeating segment bound to a collection contains a repeating field (also bound to a collection), the segment can be targeted, but the field cannot.

No *
collection The collection type for repeating groups bound to a parent bean object (configured on a group). The value may be set to any fully qualified class name assignable to java.util.Collection, or to one of the collection type aliases: list, set or array. A collection type can only be set if class is also set.

BeanIO will not derive the collection type from its parent bean object.

No *
getter The getter method used to get the bean object bound to this group from its parent. By default, the getter method is determined through introspection using the group name. Ignored if class is not set. No *
setter The setter method used to set the bean object bound to this group on the bean object of its parent. By default, the setter method is determined through introspection using the group name. Ignored if class is not set. No *
order The order this group must appear within its parent group or stream.

If strict is set to true at the stream level, order will default to the order assigned to its preceding sibling plus one (i.e. the record or group that shares the same parent), or 1 if this group is the first child in its parent group or stream. If strict is false, defaults to 1.

If order is explicitly set for one group, it must be set for all other siblings that share the same parent.

No *
minOccurs The minimum number of occurrences of this group within its parent group or stream. Defaults to 1. No *
maxOccurs The maximum number of occurrences of this group within its parent group or stream. Defaults to unbounded. No *
occurs An alternative to specifying both minOccurs and maxOccurs that uses range notation. No *
xmlType The XML node type mapped to this group. If not specified or set to element, this group is mapped to an XML element. When set to none, this group is used only to define expected record sequencing. No xml
xmlName The local name of the XML element mapped to this group. Defaults to the group name. No xml
xmlNamespace The namespace of the XML element mapped to this group. Defaults to the namespace declared for the parent stream or group definition. No xml
xmlPrefix The namespace prefix assigned to the declared xmlNamespace for marshalling XML. If not specified, the default namespace is used (i.e. xmlns="..."). No xml

Children: record*

A.10. record

A record is used to define a record mapping within a stream.

Attributes:

Attribute Description Required Format(s)
name The name of the record. Yes *
class The fully qualified class name of the bean object mapped to this record.

If set to map or any java.util.Map implementation, a Map object will be used with field names for keys and field values for values.

If set to list, set, collection, or any java.util.Collection implementation, child fields are added to the declared collection, including null values for missing or null fields.

If neither class nor target is set, a BeanReader will fully validate the record, but no bean object will be returned and the reader will continue reading the next record.

No *
value
target
The name of a child segment or field to return in lieu of an assigned class.

There can be only one iteration of a named value. For example, if a repeating segment bound to a collection contains a repeating field (also bound to a collection), the segment can be targeted, but the field cannot.

If neither class nor value is set, a BeanReader will fully validate the record, but no bean object will be returned and the reader will continue reading the next record.

No *
getter The getter method used to get the bean object bound to this record from its parent. By default, the getter method is determined through introspection using the record name. Ignored if class is not set. No *
setter The setter method used to set the bean object bound to this record on the bean object of its parent. By default, the setter method is determined through introspection using the record name. Ignored if class is not set. No *
collection The collection type for repeating records bound to a parent bean object (configured on a group). The value may be set to any fully qualified class name assignable to java.util.Collection or java.util.Map, or to one of the collection type aliases: list, set, map or array. A collection type can only be set if class or target is also set.

BeanIO will not derive the collection type from its parent bean object.

No *
key The name of a descendant field to use for the Map key when collection is assignable to a java.util.Map. No *
order The order this record must appear within its parent group or stream.

If strict is set to true at the stream level, order will default to the order assigned to its preceding sibling plus one (i.e. the record or group that shares the same parent), or 1 if this record is the first child in its parent group or stream. If strict is false, defaults to 1.

If order is explicitly set for one record, it must be set for all other siblings that share the same parent.

No *
minOccurs The minimum number of occurrences of this record within its parent group or stream. Defaults to 0. No *
maxOccurs The maximum number of occurrences of this record within its parent group or stream. Defaults to unbounded. No *
occurs An alternative to specifying both minOccurs and maxOccurs that uses range notation. No *
lazy If set to true, the class or collection bound to this record will only be instantiated if at least one child attribute is not null or the empty String. Defaults to false. [Only applies if this record is bound to an attribute of a parent group.] No *
template The name of the template to include. The template is added to the record layout before any child of this record. No *
ridLength The expected length of this record for identifying it. The value uses range notation.

If the stream format is delimited or csv, record length is measured by number of fields. If the stream format is fixedlength, record length is measured in characters.

No csv, delimited, fixedlength
minLength If the stream format is delimited or csv, minLength is the minimum number of fields required by this record. If strict is true, defaults to the number of fields defined for the record, otherwise 0.

If the stream format is fixedlength, minLength is the minimum number of characters required by this record. If strict is true, defaults to the sum of all field lengths defined for the record, otherwise 0.

No csv, delimited, fixedlength
maxLength If the stream format is delimited or csv, maxLength is the maximum number of fields allowed by this record. If strict is true, defaults to the number of fields defined for the record, or if no fields are declared or strict is false, then unbounded.

If the stream format is fixedlength, maxLength is the maximum number of characters allowed by this record. If strict is true, defaults to the sum of all field lengths defined for the record, or if no fields are declared or strict is false, then unbounded.

No csv, delimited, fixedlength
xmlName The local name of the XML element mapped to this record. Defaults to the record name. No xml
xmlNamespace The namespace of the XML element mapped to this record. Defaults to the namespace declared for this record's parent group or stream. No xml
xmlPrefix The namespace prefix assigned to the declared xmlNamespace for marshalling XML. If not specified, the default namespace is used (i.e. xmlns="..."). No xml

Children: ( field | property | segment | include )*

A.11. segment

A segment is used to bind groups of fields to a nested bean object, or to validate repeating groups of fields, or in an XML formatted stream, to wrap one or more fields in an element.

Attributes:

Attribute Description Required Format(s)
name The name of the segment. If the segment is bound to a bean object, the segment name is used for the name of the bean property unless a getter or setter is set. Yes *
class The fully qualified class name of the bean object bound to this segment. If set to map or any java.util.Map implementation, a Map object will be used with field/segment names for keys and field/segment values for values. No *
value
target
The name of a child segment or field to return in lieu of an assigned class. If set, all other descendants are not bound to the parent bean property. No *
getter The getter method used to get the bean object bound to this segment from its parent. By default, the getter method is determined through introspection using the segment name. Ignored if class is not set. No *
setter The setter method used to set the bean object bound to this segment on the bean object of its parent. By default, the setter method is determined through introspection using the segment name. Ignored if class is not set. No *
collection The collection type for repeating segments bound to a parent bean object. The value may be set to any fully qualified class name assignable to java.util.Collection or java.util.Map, or to one of the collection type aliases: list, set, map or array. A collection type can only be set if class or target is also set.

BeanIO will not derive the collection type from its parent bean object.

There are a few restrictions specific to repeating segments in any "flat" format (delimited, CSV or fixedlength):

  • Repeating segments must appear in the stream consecutively.
  • A repeating segment cannot contain repeating fields or segments where the length is indeterminate (i.e. where minOccurs does not match maxOccurs).
  • Repeating segments must fully declare all child fields; there can be no gaps in the definition. (However, you can still skip over unbound values using the ignore field attribute.)
No *
key The name of a descendant field to use for the Map key when collection is assignable to a java.util.Map. No *
minOccurs The minimum consecutive occurrences of this segment. Defaults to 1.

If minOccurs is 0, a null bean object bound to this segment will not be marshalled (unless a subsequent field is marshalled for CSV, delimited and fixed length stream formats).

During unmarshalling, if the configured minimum occurrences is not met, an InvalidRecordException is thrown.

No *
maxOccurs The maximum consecutive occurrences of this segment. By default, maxOccurs is set to minOccurs or 1, whichever is greater. If there is no limit to the number of occurrences, the value may be set to unbounded.

If set for a CSV, delimited or fixed length stream, the value can only exceed minOccurs if the segment appears at the end of a record.

Maximum occurrences is not used for validation. If bounded, the size of a bound collection will not exceed the configured value, and additional occurrences are ignored.

No *
occurs An alternative to specifying both minOccurs and maxOccurs that uses range notation. No *
occursRef The name of a preceding field in the same record that controls the number of occurrences of this segment. If the controlling field is not bound to a separate property (i.e. ignore="true"), it is automatically set based on the size of the segment collection during marshalling. No csv, delimited, fixedlength
lazy If set to true, the class or collection bound to this segment will only be instantiated if at least one child attribute is not null or the empty String. Defaults to false. This functionality differs from minOccurs in that the fields may still exist in the input stream. No *
template The name of the template to include. The template is added to the layout before any child of this segment. No *
xmlType The XML node type mapped to this segment. If not specified or set to element, this bean is mapped to an XML element. If set to none, children of this segment are expected to be contained by this segment's parent. No xml
xmlName The local name of the XML element mapped to this segment. Defaults to the segment name. No xml
xmlNamespace The namespace of the XML element mapped to this segment. Defaults to the namespace declared for the parent record or segment. No xml
xmlPrefix The namespace prefix assigned to the declared xmlNamespace for marshalling XML. If not specified, the default namespace is used (i.e. xmlns="..."). No xml
nillable Set to true if the W3C Schema Instance attribute nil should be set to true when marshalling a null bean object. Defaults to false. During unmarshalling, an element where nil="true" will cause an InvalidRecordException if nillable is false. No xml

Children: ( field | property | segment | include )*
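
As a rough illustration of several of the segment attributes above, the following mapping fragment sketches a CSV record containing a lazy segment and a repeating segment whose size is controlled by occursRef. The stream, record, segment and field names (orders, order, billTo, itemCount, items and so on) are hypothetical and do not come from this guide, and binding everything to class="map" is only an assumption made to keep the sketch free of custom bean classes.

```xml
<beanio xmlns="http://www.beanio.org/2012/03">

  <stream name="orders" format="csv">
    <record name="order" class="map">
      <field name="orderId" />

      <!-- 'lazy': the Map bound to 'billTo' is only created if 'city' or
           'state' holds a value; both fields still occupy columns in the record -->
      <segment name="billTo" class="map" lazy="true">
        <field name="city" />
        <field name="state" />
      </segment>

      <!-- Controlling field for 'occursRef'. It precedes the segment in the same
           record and is not bound to a property of its own (ignore="true") -->
      <field name="itemCount" type="int" ignore="true" />

      <!-- Repeating segment bound to a List of Maps; the number of occurrences
           is taken from the preceding 'itemCount' field -->
      <segment name="items" class="map" collection="list" occursRef="itemCount">
        <field name="sku" />
        <field name="quantity" type="int" />
      </segment>
    </record>
  </stream>

</beanio>
```

Because itemCount is flagged ignore="true", the occursRef description above implies its value would be derived from the size of the items collection when marshalling, rather than from a separate property.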

A.12. field

A field element is used to bind a field belonging to a record or segment to a bean property.

Attributes:

Attribute Description Required Formats
name The name of the field. Unless a getter and/or setter is defined, the field name is used for the bean property name. Yes *
getter The getter method used to retrieve the property value for this field from its parent bean class. By default, the getter method is determined through introspection using the field name. No *
setter The setter method used to set the property value of this field on its parent bean class. By default, the setter method is determined through introspection using the field name.

If the field is a constructor argument, setter may be set to '#N', where N is the position of the argument in the constructor beginning at 1.

No *
rid Record identifier indicator. Set to true if this field is used to identify a record. More than one field can be used to identify a record. Defaults to false.

Record identifying fields must have regex or literal configured to match a record, unless the field is a named XML element or attribute.

No *
at
position
For delimited and CSV formatted streams, position (or at) is the index of the field within the record, beginning at 0. And for fixed length formatted streams, position is the index of the first character of the field within the record, beginning at 0.

Negative numbers can be used to indicate the position is relative to the end of the record. For example, the position -2 indicates the second to last field in a delimited record.

If the field repeats, or the field belongs to a segment that repeats, position should be set based on the first occurrence of the field in a record.

A position must be specified for all fields in a record, or for none at all. If positions are not specified, BeanIO will automatically calculate field positions based on the order in which the fields are defined in the mapping file.

Position, if defined, is also used in XML formatted streams for ordering fields within their parent record or segment. This is typically not needed when using a mapping file, but can be useful when using annotations.

If a position is configured for a parent segment (with annotations), the positions declared for fields added to the segment are assumed to be relative to their parent.

No *
until The maximum position of the field in the record. Only applies to fields that repeat where the number of occurrences is indeterminate (i.e. maxOccurs is greater than minOccurs). until must always be specified relative to the end of the record, and is therefore always a negative number. No csv, delimited, fixedlength
trim Set to true to trim the field text before validation and type conversion. Defaults to false. No *
lazy Set to true to convert empty field text to null before type conversion. For repeating fields bound to a collection, the collection will not be created if all field values are null or the empty String. Defaults to false. No *
required Set to true if this field is required. If a field is required and its field text is empty, a BeanReader will throw an InvalidRecordException when reading the record. Defaults to false. No *
minLength The minimum length of the field text before type conversion. Minimum length is only validated if the field length is greater than 0. Defaults to 0. No *
maxLength The maximum length of the field text before type conversion. Defaults to unbounded. No *
regex The regular expression pattern the field text must match. No *
literal Sets the literal or constant value of this field. When unmarshalled, an InvalidRecordException is thrown if the field text does not exactly match the literal value. No *
default The default value of this field.

When unmarshalling a stream, this value is set on the bean object when the field text is null or the empty string. And when marshalling, the default value is used when the property value is null or ignore is set to true (unless disabled).

A default value is converted to a Java object using the same type handler configured for the field.

No *
type The fully qualified class name or type alias of the field value. By default, BeanIO will derive the field type from its parent bean class. This attribute can be used to override the default, or may be needed when its parent class is of type java.util.Map. No *
collection If a repeating field is bound to a collection object, collection is the fully qualified class name of the java.util.Collection implementation, or a collection type alias. When a collection is configured, the type attribute is used to declare the property type of an item stored in the collection. May be set to array if the collection type is a Java array.

Repeating fields bound to a property value must have collection configured. BeanIO will not derive the collection type from a field's parent bean class.

No *
minOccurs The minimum consecutive occurrences of this field in a record. Defaults to 1, with one exception: a field in an XML formatted stream bound to an attribute defaults to 0.

minOccurs controls whether a field is marshalled for a null field value, and whether the field must be present during unmarshalling. If minOccurs is 1 or greater and the field is not present during unmarshalling, an InvalidRecordException is thrown.

No *
maxOccurs The maximum consecutive occurrences of this field in a record. By default, maxOccurs is set to minOccurs or 1, whichever is greater. If overridden for a non-XML stream format, the value can only exceed minOccurs if this is the last field in the record. The value may be set to unbounded if there is no limit to the number of occurrences of this field.

Maximum occurrences is not used for validation. When bounded, the size of a collection will not exceed the configured value, and additional occurrences are ignored.

No *
occurs An alternative to specifying both minOccurs and maxOccurs that uses range notation. No *
occursRef The name of a preceding field in the same record that controls the number of occurrences of this field. If the controlling field is not bound to a separate property (i.e. ignore="true"), it is automatically set based on the size of the field collection during marshalling. No csv, delimited, fixedlength
format The decimal format pattern for java.lang.Number field values, or the simple date format pattern for java.util.Date field properties.

The format value can also be accessed by any custom type handler that implements org.beanio.types.ConfigurableTypeHandler.

No *
typeHandler The name of the type handler to use for type conversion. By default, BeanIO will select a type handler based on the field type when set, or through introspection of this field's parent bean class. No *
ignore Set to true if this field is not a property of its parent bean class. Defaults to false. Note that any configured validation rule on an ignored field is still performed. No *
length The padded length of this field measured in characters. Length is required for fixed length formatted streams, and can be set for fields in other stream formats (along with a padding character) to enable field padding.

The length of the last field in a fixed length record may be set to unbounded to disable padding and allow a single variable length field at the end of the otherwise fixed length record.

Yes¹ *
padding The character used to pad this field. For fixed length formatted streams, padding defaults to a space. For non-fixed length formatted streams, padding is disabled unless a padding character and length are specified.

If padding is enabled, the required field attribute has some control over the marshalling and unmarshalling of null values.

When unmarshalling a field consisting of all spaces in a fixed length stream, if required is false, the field is accepted regardless of the padding character. If required is true, a required field validation error is triggered. And when marshalling a null field value, if required is false, the field text is formatted as spaces regardless of the configured padding character.

In other stream formats that are not fixed length, null field values are unmarshalled and marshalled as empty strings when required is false. When required is true, unmarshalling an empty string will trigger a required field validation error, and marshalling a null value will fill the field text with the padding character up to the padded length of the field.

No *
keepPadding Set to true if field padding should not be removed when unmarshalling a fixed length field. Defaults to false. No fixedlength
lenientPadding Set to true to disable enforcement of the padded field length when unmarshalling a fixed length field. Defaults to false. No fixedlength
justify
align
The justification (i.e. alignment) of the field text within its padding. Either left or right. Defaults to left. No *
xmlType The XML node type mapped to this field. The type can be set to element (default) to map this field to an XML element, attribute to map to an XML attribute, or text to map the field value to the enclosed text of its parent record or segment.

When set to text, xmlName and xmlNamespace have no effect.

No xml
xmlName The local name of the XML element or attribute mapped to this field. Defaults to the field name. No xml
xmlNamespace The namespace of the XML element mapped to this field. Defaults to the namespace configured for its immediate parent record or segment. No xml
xmlPrefix The namespace prefix assigned to the configured xmlNamespace for marshalling XML. If not specified, the default namespace (i.e. xmlns="...") is used. No xml
nillable Set to true if the W3C Schema Instance attribute nil should be set to true when the marshalled field value is null. Defaults to false. Unmarshalling a non-nillable field where nil="true" will cause an InvalidRecordException. No xml

¹ Only required for fixed length fields. If a literal value is supplied for a fixed length field, length will default to the length of the literal value.
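
To make a few of the field attributes above more concrete, the following mapping file is a minimal sketch covering three stream formats: explicit and negative positions in a delimited record, padding and justification in a fixed length record, and XML node types. All stream, record and field names (accounts, ledger, contacts, checksum, memo and so on) are hypothetical, and class="map" is assumed only so that no custom bean classes are needed. Since the parent class is a Map, the type attribute is set wherever a non-String value is wanted.

```xml
<beanio xmlns="http://www.beanio.org/2012/03">

  <!-- Delimited: a position is given for every field; -1 refers to the last
       field in the record -->
  <stream name="accounts" format="delimited">
    <record name="detail" class="map">
      <field name="recordType" position="0" rid="true" literal="D" />
      <field name="name" position="1" trim="true" required="true" maxLength="40" />
      <field name="checksum" position="-1" />
    </record>
  </stream>

  <!-- Fixed length: 'length' is required; 'amount' is right justified and
       padded with zeros, and the last field is left unbounded -->
  <stream name="ledger" format="fixedlength">
    <record name="entry" class="map">
      <field name="account" length="10" />
      <field name="amount" type="java.math.BigDecimal" length="12"
        padding="0" justify="right" format="0.00" />
      <field name="memo" length="unbounded" />
    </record>
  </stream>

  <!-- XML: 'id' is mapped to an attribute (minOccurs then defaults to 0),
       'firstName' is an optional element, and a null 'lastName' is
       marshalled with nil set to true -->
  <stream name="contacts" format="xml">
    <record name="person" class="map">
      <field name="id" xmlType="attribute" />
      <field name="firstName" minOccurs="0" />
      <field name="lastName" nillable="true" />
    </record>
  </stream>

</beanio>
```

Positions are either set on every field of a record or on none, which is why all three fields of the detail record declare one.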

Appendix B: Error Message Parameters

The following table shows the message parameters used to format an error message for each configurable validation rule.

Type Rule Name Index Value
Record Error malformed 0 Line Number
unidentified 0 Line Number
unexpected 0 Line Number
1 Record Label/Name
minLength 0 Line Number
1 Record Label/Name
2 Minimum Length
3 Maximum Length
maxLength 0 Line Number
1 Record Label/Name
2 Minimum Length
3 Maximum Length
Field Error required 0 Line Number
1 Record Label/Name
2 Field Label/Name
3 Field Text
nillable 0 Line Number
1 Record Label/Name
2 Field Label/Name
3 Field Text
minLength 0 Line Number
1 Record Label/Name
2 Field Label/Name
3 Field Text
4 Minimum Length
5 Maximum Length
maxLength 0 Line Number
1 Record Label/Name
2 Field Label/Name
3 Field Text
4 Minimum Length
5 Maximum Length
length 0 Line Number
1 Record Label/Name
2 Field Label/Name
3 Field Text
4 Fixed Length Field Length
regex 0 Line Number
1 Record Label/Name
2 Field Label/Name
3 Field Text
4 Regular Expression Pattern
type 0 Line Number
1 Record Label/Name
2 Field Label/Name
3 Field Text
4 TypeConversionException error message.
literal 0 Line Number
1 Record Label/Name
2 Field Label/Name
3 Field Text
4 Literal value
minOccurs 0 Line Number
1 Record Label/Name
2 Field or Bean Label/Name
3 -
4 Minimum occurrences
5 Maximum occurrences
maxOccurs 0 Line Number
1 Record Label/Name
2 Field or Bean Label/Name
3 -
4 Minimum occurrences
5 Maximum occurrences

Appendix C: Upgrading a 1.x Mapping File Example

This appendix illustrates typical changes required to update a 1.x mapping file to 2.x.

Given the following 1.x mapping file:

<beanio xmlns="http://www.beanio.org/2011/01" 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.beanio.org/2011/01 http://www.beanio.org/2011/01/mapping.xsd">

  <stream name="employees" format="delimited">
    <reader>
      <property name="delimiter" value="," />
    </reader>
    <writer>
      <property name="delimiter" value="," />
    </writer>
    <record name="header" class="example.Header" maxOccurs="1">
      <field name="recordType" rid="true" literal="H" />
      <field name="fileDate" format="yyyy-MM-dd" />
    </record>
    <record name="employee" class="example.Employee" minOccurs="0" minLength="6" maxLength="7">
      <field name="recordType" rid="true" literal="D" />
      <field name="firstName" />
      <field name="lastName" />
      <bean name="address" class="example.Address" >
        <field name="city" />
        <field name="state" />
        <field name="zip" />
      </bean>
      <field name="phoneNumber" />
    </record>
    <record name="trailer" class="example.Trailer" maxOccurs="1">
      <field name="recordType" rid="true" literal="T" />
      <field name="recordCount" />
    </record>
  </stream>

  <stream name="contacts" format="xml" ordered="false">
    <record name="person" class="example.Person" minOccurs="0">
      <field name="firstName" />
      <field name="lastName" minOccurs="1" />
      <field name="phone" collection="list" minOccurs="0" maxOccurs="5" xmlWrapper="phoneList" />
    </record>
    <record name="company" class="example.Person" minOccurs="0">
      <field name="companyName" minOccurs="1" />
      <field name="phone" />
    </record>
  </stream>

</beanio>

The following 2.x mapping file can be created:

<!-- Namespace updated -->
<beanio xmlns="http://www.beanio.org/2012/03" 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.beanio.org/2012/03 http://www.beanio.org/2012/03/mapping.xsd">

  <!-- Use 'strict' to have BeanIO calculate and enforce default record lengths and ordering -->
  <stream name="employees" format="delimited" strict="true">
    <!-- Combine 'reader' and 'writer' elements into 'parser' -->
    <parser>
      <property name="delimiter" value="," />
    </parser>
    <!-- 'minOccurs' defaults to 0 if not specified -->
    <record name="header" class="example.Header" minOccurs="1" maxOccurs="1">
      <field name="recordType" rid="true" literal="H" />
      <field name="fileDate" format="yyyy-MM-dd" />
    </record>
    <!-- 'minLength/maxLength' not needed, see 'phoneNumber' field below for explanation -->
    <record name="employee" class="example.Employee" minOccurs="0">
      <field name="recordType" rid="true" literal="D" />
      <field name="firstName" />
      <field name="lastName" />
      <!-- Change 'bean' elements to 'segment' elements -->
      <segment name="address" class="example.Address" >
        <field name="city" />
        <field name="state" />
        <field name="zip" />
      </segment>
      <!-- Use 'minOccurs' to denote optional fields at the end of a record.  When used with
           'strict', there is no need to set 'minLength' and 'maxLength' on the record, unless
           you are not mapping every field -->
      <field name="phoneNumber" minOccurs="0" />
    </record>
    <record name="trailer" class="example.Trailer" minOccurs="1" maxOccurs="1">
      <field name="recordType" rid="true" literal="T" />
      <field name="recordCount" />
    </record>
  </stream>

  <!-- Records are not ordered by default.  -->
  <stream name="contacts" format="xml">
    <!-- minOccurs defaults to 0 -->
    <record name="person" class="example.Person" minOccurs="0">
      <!-- Optional XML elements must set minOccurs to 0 -->
      <field name="firstName" minOccurs="0"/>
      <field name="lastName" minOccurs="1" />
      <!-- Use a 'segment' instead of an 'xmlWrapper' -->
      <segment name="phoneList" minOccurs="0">
        <field name="phone" collection="list" minOccurs="0" maxOccurs="5" />
      </segment>
    </record>
    <record name="company" class="example.Person" minOccurs="0">
      <field name="companyName" minOccurs="1" />
      <!-- Optional XML elements must set minOccurs to 0 -->
      <field name="phone" minOccurs="0"/>
    </record>
  </stream>

</beanio>