Flat File Parser

After a few projects where I had to parse legacy flat files, I decided enough was enough and wrote my own parser. It does exactly one thing efficiently: convert lines from a flat file to Java objects. I wanted something thin that did just that, with no other frills, though now that I have it working a few frills may be in order.

I have created a project at JavaForge where this tool will reside. If you find it useful, please drop a comment in the discussions forum on the JavaForge site: javaforge.com/project/2066.

The goal is to parse a flat file with either character-separated columns or fixed-length columns. The parser supports two ways of parsing a file. In the first approach you are responsible for reading the file and passing each line that needs to be transformed to the transformer. The second approach is SAX-like: you register a listener, and the transformer calls it whenever it finds a record and also when it cannot resolve a record. Let's run through the first approach first; at the end I will show you the SAX-like parsing approach.

Let’s create a Java bean class to represent a record whose columns are separated by a space character.

import java.util.Date;

import org.aver.fft.annotations.Column;
import org.aver.fft.annotations.Transform;

@Transform(spaceEscapeCharacter = "_", recordIdValue = "88")
public class DelimitedBean {

    .....

    // Column 1 is the key column; its value ("88") selects this bean class.
    @Column(position = 1, required = true)
    public int getRecordId() {
        return recordId;
    }

    @Column(position = 2, required = true)
    public String getNameOnCard() {
        return nameOnCard;
    }

    @Column(position = 3, required = true)
    public String getCardNumber() {
        return cardNumber;
    }

    @Column(position = 4, required = true)
    public int getExpMonth() {
        return expMonth;
    }

    @Column(position = 5, required = true)
    public int getExpYear() {
        return expYear;
    }

    @Column(position = 6, required = true)
    public double getAmount() {
        return amount;
    }

    @Column(position = 7, required = true)
    public String getCardSecurityCode() {
        return cardSecurityCode;
    }

    // The format attribute gives the date pattern used in the file.
    @Column(position = 8, required = true, format = "MMddyyyy")
    public Date getTransactionDate() {
        return transactionDate;
    }

    ... other methods here ...

}

As you can see, we use Java 5.0 annotations to mark up the record format. The class-level annotation is:

@Transform (spaceEscapeCharacter="_", recordIdValue="88")

By default the parser is set up to parse character-separated columns, with a space as the column separator. You can change the separator using columnSeparator. The spaceEscapeCharacter attribute indicates the character used to represent spaces within column data. The parser can replace that character with a space before loading the value into your Java object. The recordIdValue attribute identifies the value of the key column. The transformer keeps an internal mapping from the key value to the Java bean class that represents it. By default the first column is the key column. You can change that by passing recordIdColumn for character-separated columns, or recordIdStartColumn / recordIdEndColumn for fixed-length columns.
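To make the attributes above concrete, here is a minimal, self-contained sketch of the idea: split a line on the column separator, turn the escape character back into spaces, and use the key column to look up the record type. This is not the library's implementation; the class and method names (DelimitedLineSketch, splitColumns, resolveRecordType) are made up for illustration, and the record type is tracked by name only to keep the sketch standalone.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class DelimitedLineSketch {

    // Split on the column separator, then replace the escape character with a space.
    static List<String> splitColumns(String line, String columnSeparator, char spaceEscapeCharacter) {
        List<String> columns = new ArrayList<>();
        // Pattern.quote makes the separator literal, so "|" or "." would also work.
        for (String raw : line.split(Pattern.quote(columnSeparator))) {
            columns.add(raw.replace(spaceEscapeCharacter, ' '));
        }
        return columns;
    }

    // Look up the record type registered for the value in the key column (1-based).
    static String resolveRecordType(List<String> columns, int recordIdColumn,
                                    Map<String, String> typesByRecordId) {
        return typesByRecordId.get(columns.get(recordIdColumn - 1));
    }

    public static void main(String[] args) {
        // Hypothetical registry: key value -> name of the bean class that represents it.
        Map<String, String> typesByRecordId = new HashMap<>();
        typesByRecordId.put("88", "DelimitedBean");

        List<String> columns = splitColumns(
                "88 Mathew_Thomas 4111111111111111 02 2008 12.89 222 10212005", " ", '_');

        System.out.println(columns.get(1));                                 // name with the space restored
        System.out.println(resolveRecordType(columns, 1, typesByRecordId)); // type chosen by the key column
    }
}
```

A line whose key column holds a value that was never registered would come back as null from the lookup in this sketch, which is roughly the situation the "unresolvable record" callback exists for.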

That’s enough on defining the file format. Now here is how to actually read it.

Transformer spec = 
   TransformerFactory.getTransformer(new Class[] { DelimitedBean.class });

String line = 
   "88 Mathew_Thomas 4111111111111111 02 2008 12.89 222 10212005";

DelimitedBean bean = (DelimitedBean) spec.loadRecord(line);

You get a transformer instance as shown above. Pass it an array of all the classes that represent your various record types, each annotated as described above. The call to loadRecord then returns a fully loaded bean from which to read your data. That’s all.
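Since in this approach reading the file is your job, the surrounding loop might look roughly like the sketch below. It is only an illustration: the class name LineByLineSketch and the loadAll helper are hypothetical, the Function parameter stands in for the call to spec.loadRecord so the example runs without the library, and skipping blank lines is an assumption you may not want.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class LineByLineSketch {

    // Reads the source line by line and hands each non-blank line to loadRecord.
    static List<Object> loadAll(Reader source, Function<String, Object> loadRecord) {
        List<Object> records = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skipping blank lines is an assumption of this sketch
                }
                records.add(loadRecord.apply(line));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        String sample = "88 Mathew_Thomas 4111111111111111 02 2008 12.89 222 10212005";
        String data = sample + "\n\n" + sample + "\n";
        // Stand-in loader that just echoes the line. With the library on the
        // classpath this would be something like: line -> spec.loadRecord(line)
        List<Object> records = loadAll(new StringReader(data), line -> line);
        System.out.println(records.size() + " records loaded");
    }
}
```

For a real file you would pass a FileReader (or a reader with an explicit charset) instead of the StringReader used here.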

Now let's see how you define the same record for a fixed-column format. The parsing code above stays the same. The difference is in how you annotate your result bean class.

import java.util.Date;

import org.aver.fft.Transformer;
import org.aver.fft.annotations.Column;
import org.aver.fft.annotations.Transform;

@Transform(spaceEscapeCharacter = "_", columnSeparatorType = Transformer.ColumnSeparator.FIXLENGTH, recordIdStartColumn = 1, recordIdEndColumn = 2, recordIdValue = "88")
public class FixedColBean {

    // start and end give the character positions of the column within the line.
    @Column(position = 1, start = 1, end = 2, required = true)
    public int getRecordId() {
        return recordId;
    }

    @Column(position = 2, start = 3, end = 15, required = true)
    public String getNameOnCard() {
        return nameOnCard;
    }

    @Column(position = 3, start = 16, end = 31, required = true)
    public String getCardNumber() {
        return cardNumber;
    }

    @Column(position = 4, start = 32, end = 33, required = true)
    public int getExpMonth() {
        return expMonth;
    }

    @Column(position = 5, start = 34, end = 37, required = true)
    public int getExpYear() {
        return expYear;
    }

    @Column(position = 6, start = 38, end = 43, required = true)
    public double getAmount() {
        return amount;
    }

    @Column(position = 7, start = 44, end = 46, required = true)
    public String getCardSecurityCode() {
        return cardSecurityCode;
    }

    @Column(position = 8, start = 47, end = 54, required = true, format = "MMddyyyy")
    public Date getTransactionDate() {
        return transactionDate;
    }

    … other methods here …

}

The parsing logic stays the same. Just give it the correct line of data.
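As a rough illustration of what "the correct line of data" looks like for the layout above, the sketch below lays the earlier sample record out in fixed-width form and slices it by position. It assumes start and end are 1-based and inclusive, which is what the annotation values (1 to 2, 3 to 15, and so on) suggest, and it zero-pads the amount to six characters; both are my assumptions, and the class and method names are made up. It is not how the library itself is implemented.

```java
public class FixedWidthSliceSketch {

    // Extract a column given 1-based, inclusive start and end positions.
    static String column(String line, int start, int end) {
        return line.substring(start - 1, end);
    }

    // The sample record from the delimited example, laid out to match FixedColBean.
    static String sampleLine() {
        return "88"                 // 1-2   record id
             + "Mathew_Thomas"      // 3-15  name on card (13 characters)
             + "4111111111111111"   // 16-31 card number
             + "02"                 // 32-33 expiry month
             + "2008"               // 34-37 expiry year
             + "012.89"             // 38-43 amount (zero padding is an assumption)
             + "222"                // 44-46 card security code
             + "10212005";          // 47-54 transaction date, MMddyyyy
    }

    public static void main(String[] args) {
        String line = sampleLine();
        System.out.println(column(line, 1, 2));                        // record id
        System.out.println(column(line, 3, 15).replace('_', ' '));     // name with the space restored
        System.out.println(Integer.parseInt(column(line, 34, 37)));    // expiry year as an int
        System.out.println(Double.parseDouble(column(line, 38, 43)));  // amount as a double
    }
}
```

Because every column has a fixed width, shorter values need padding in the file; how the library treats that padding is something to check against its documentation.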

Now I will show you the SAX-like parsing approach.

package org.aver.fft;

import java.io.File;
import junit.framework.TestCase;

public class DelimitedFullFileReaderTestCase extends TestCase {

    public void testFullFileReader() {
        Transformer spec = TransformerFactory.getTransformer(new Class[] { DelimitedBean.class });
        // The transformer reads the file itself and calls back into the listener.
        spec.parseFlatFile(new File("c:/multi-record-delcol-file.txt"), new Listener());
    }

    class Listener implements RecordListener {

        // Called for each line that was resolved to a registered record class.
        public void foundRecord(Object o) {
            DelimitedBean bean = (DelimitedBean) o;
            System.out.println(bean.getNameOnCard());
        }

        // Called for each line that could not be resolved to a record class.
        public void unresolvableRecord(String rec) {
        }

    }

}
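If the callback style is unfamiliar, the following standalone sketch shows the general shape of such a dispatch loop: walk the lines, and for each one call either a "found" or an "unresolvable" callback. Everything in it is hypothetical (the SketchListener interface, the parse method, the second sample line with key 99), and it passes the raw line to foundRecord where the real transformer would pass a populated bean.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CallbackParseSketch {

    // Hypothetical listener, modelled on the two callbacks used in the test case above.
    interface SketchListener {
        void foundRecord(Object record);
        void unresolvableRecord(String line);
    }

    // Notify the listener for every line; a line counts as resolved when its
    // first space-separated column is a known record id.
    static void parse(List<String> lines, Set<String> knownRecordIds, SketchListener listener) {
        for (String line : lines) {
            String recordId = line.split(" ", 2)[0];
            if (knownRecordIds.contains(recordId)) {
                listener.foundRecord(line); // a real transformer would pass a loaded bean
            } else {
                listener.unresolvableRecord(line);
            }
        }
    }

    public static void main(String[] args) {
        final List<String> found = new ArrayList<>();
        final List<String> rejected = new ArrayList<>();

        List<String> lines = Arrays.asList(
                "88 Mathew_Thomas 4111111111111111 02 2008 12.89 222 10212005",
                "99 made_up_line_with_an_unregistered_key");

        parse(lines, new HashSet<>(Arrays.asList("88")), new SketchListener() {
            public void foundRecord(Object record) {
                found.add((String) record);
            }

            public void unresolvableRecord(String line) {
                rejected.add(line);
            }
        });

        System.out.println(found.size() + " found, " + rejected.size() + " unresolved");
    }
}
```

The appeal of this style is that the caller never holds the whole file in memory and gets one place to log or count the lines that did not match any record type.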

I have this project located at: www.javaforge.com/proj/summary.do?proj_id=271

Comments

  • 1/31/2008 6:45 PM Larry wrote:
    I can't access the project even though I just registered. I get a permission denied error.

    Any suggestions?
    1. 2/5/2008 6:53 AM Mathew Thomas wrote:
      I could not access it. Looks like JavaForge upgraded their site and my project is gone. I will get it reloaded into JavaForge in the next couple of days.

      Sorry about that.

  • 4/6/2008 6:44 AM Praveen wrote:
    Matt,
    Could we get access to the source and test files as well? I'm trying to use filereader 0.5, but I am getting null pointer exceptions.

    Thanks
    1. 5/13/2008 8:20 PM Mathew Thomas wrote:
      look for a version 0.6 over the next few days (could be tonight or over the weekend) that has the source code and also the license information in the jar file.

  • 4/25/2008 3:49 AM Chris wrote:
    This looks like an interesting library. Is the source available - I get permission denied on Javaforge? Also what licence is being used?
  • 5/5/2008 7:27 AM Leonardo wrote:
    Congratulations for your work! I was looking for something like that. Something that could replace BEA's MFL (http://e-docs.bea.com/workshop/docs81/doc/en/integration/dtguide/dtguideNonXML.html?skipReload=true) . But there is still a feature I am in need: nesting formats. Imagine that a given column is not a simple field, but a group of them. For instance:

    12 My_Name First_Son,Second_Son,Third_Son

    The third column holds all the children's names for the person named My Name and uses a different separator from the rest of the fields. Can I still solve this problem with your parser?

    Regards.
  • 5/6/2008 6:24 AM Chris Pheby wrote:
    Mathew, Is the source code available for this project? Also, what is the license? I'd like to use it but will need this information to do so.
    1. 5/13/2008 8:18 PM Mathew Thomas wrote:
      It uses the Apache License 2.0. I will try to get the source code published over the weekend and also have a 0.6 jar with the license information inside. I have had a few requests for source code, so I will get that out at the same time.


    2. 5/13/2008 10:07 PM Mathew Thomas wrote:
      I have put a new release 0.6 on the javaforge site (http://www.javaforge.com/project/2066).

      Please unzip the distribution file flatfilereader-0.6-dist.zip ... it contains the library, source code and javadoc. No code changes were made. All code and functionality from 0.5 is as-is.

  • 6/3/2009 10:20 AM Oliver wrote:
    Beautiful work. Thanks for sharing this piece of fine software.
    1. 6/7/2009 8:02 PM Mathew Thomas wrote:
      Thanks for the compliment. It is a simple piece of software... I have some more plans to enhance it, but finding time (and motivation) is tough.
