In my algorithm, I read a large file line by line (just a simple .txt format) and transform each line of the file into an object.
@Override
public void configure() {
from("file:from/")
.split(body().tokenize("\n"))
.streaming()
.process(handle());
}
private Processor handle() {
return exchange -> {
final String body = exchange.getIn().getBody(String.class);
// convert the line to a DTO, e.g.: MyDto dto = toDto(body); (placeholder names)
System.out.println(dto);
};
}
But the first and last lines of the file should be removed. These lines start with \\test.
My question is: how can I drop these lines using the Apache Camel API, without checking every line for this \\test prefix?
I don't want to do something like this for each file line (pseudocode):
if (getFirstStringCharacter().equals("\\test")) {
removeString();
}
Perhaps Apache Camel can perform some preliminary action before reading the file and simply ignore the first and last lines.
The split EIP sets (among others) two useful exchange properties on each split Exchange:
CamelSplitIndex
CamelSplitComplete
Assuming the marker \\test is always present in the first and the last line, your processor (handle) could skip processing
when CamelSplitIndex == 0 (first line)
OR when CamelSplitComplete is true (last line)
Example: skip first line
from("...")
.split(body().tokenize("\n"))
.streaming()
.filter( simple("${exchangeProperty.CamelSplitIndex} > 0") )
.process( handle() );
To answer your last question:
.filter( simple("${exchangeProperty.CamelSplitComplete} == false") )
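Both checks can also be combined into a single simple expression; a minimal sketch (not part of the original answer):
.filter( simple("${exchangeProperty.CamelSplitIndex} > 0 && ${exchangeProperty.CamelSplitComplete} == false") )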
For more complex conditions, I recommend using Camel Predicates, e.g.:
import org.apache.camel.support.builder.PredicateBuilder;
Predicate isNotFirst = PredicateBuilder.isGreaterThan( exchangeProperty("CamelSplitIndex"), constant(0) );
Predicate isNotLast = PredicateBuilder.isNotEqualTo( exchangeProperty("CamelSplitComplete"), constant(true) );
Predicate retained = PredicateBuilder.and(isNotFirst, isNotLast);
from("...")
.filter(retained)
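Putting it together, a minimal sketch of the complete route, reusing the endpoint and the handle() processor from the question:
from("file:from/")
.split(body().tokenize("\n"))
.streaming()
.filter(retained) // drop the first and the last line
.process(handle());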
I have a formatted text file called cars.txt; it's separated by tabs.
Name Length Width
truck1 18.6 8.1
suv1 17.4 7.4
coupe1 14.8 5.4
mini1 14.1 5.0
sedan1 16.4 6.1
suv2 17.5 7.3
mini2 14.3 5.2
sedan2 16.5 6.2
I need to read in this information so it can be used for calculations later on.
This is my current idea but I am having a hard time piecing together what I need to execute.
public class Class{
public void readFileIn(){
try{
Scanner sc = new Scanner(new FileReader("cars.txt"));
while (sc.hasNextLine()){
String line = sc.nextLine(); //consume the line so the loop advances
if (/*something that catches strings*/){
method1(string1, double1, double2);
method2(double1, double2);
}
}
}catch(FileNotFoundException exception){
System.out.println("File doesn't exist");
}
}
}
Scanner and BufferedReader are used less often these days, as Java provides ways to achieve the same result with less code.
I can see at least three possible approaches to solve your problem:
approach 1: if you can use at least Java 8, then I would suggest using the java.nio.file API to read the file as a stream of lines:
Stream<String> linesStream = Files.lines(Paths.get("cars.txt"));
Then, depending on what you need to do, you could either use forEach to loop over each line of the stream:
linesStream.forEach(line -> myMethod(line)); // myMethod stands for your own handling method
Or use Java Collectors to perform the calculations you need; you can also use collectors to separate your strings, etc.
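For instance, a minimal sketch that skips the header row and collects each line's fields (the file name and the whitespace separator are assumptions based on the question):
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
public class CarReader {
public static void main(String[] args) throws IOException {
try (Stream<String> lines = Files.lines(Paths.get("cars.txt"))) {
// skip the header row, then split every remaining line into its columns
List<String[]> rows = lines.skip(1)
.map(line -> line.split("\\s+"))
.collect(Collectors.toList());
for (String[] row : rows) {
System.out.printf("%s: length=%s, width=%s%n", row[0], row[1], row[2]);
}
}
}
}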
approach 2: you can use Apache Commons libraries to achieve the same goal. In particular you could use FileUtils and StringUtils. For instance:
File carFile=new File("cars.txt");
LineIterator lineIterator=lineIterator(carFile);
for(String line : lineIterator) {
String[] my values=StringUtils.split(line);
//do whatever you need
}
approach 3: use Jackson to transform your file into JSON or a Java object that you can then use for your own transformations. With a bit of digging in the Jackson documentation, you can apply its CSV support to your case.
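For the tab-separated file from the question, a minimal sketch using the jackson-dataformat-csv module might look like this (the tab separator and file name are assumptions):
import java.io.File;
import java.util.Map;
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;
public class JacksonCsvSketch {
public static void main(String[] args) throws Exception {
CsvMapper mapper = new CsvMapper();
// use the first row as header names and tabs as the column separator
CsvSchema schema = CsvSchema.emptySchema().withHeader().withColumnSeparator('\t');
MappingIterator<Map<String, String>> rows = mapper
.readerFor(Map.class)
.with(schema)
.readValues(new File("cars.txt"));
while (rows.hasNext()) {
System.out.println(rows.next()); // e.g. {Name=truck1, Length=18.6, Width=8.1}
}
}
}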
First of all, I recommend creating an Entry class that represents your data.
private class Entry {
private String name;
private double length;
private double width;
// getters and setters omitted
@Override
public String toString() {
// omitted
}
}
Next, create a method that takes a String as an argument and is responsible for parsing a line of text into an instance of Entry. The regex \\s+ matches any whitespace characters and will split your line into its individual columns. Remember that in production, Double.valueOf can throw a RuntimeException if you are not passing a valid String.
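A sketch of that method (the enclosing class name FileReadTest is assumed only to match the method reference used below; the setter names are assumed from the Entry class above):
private static Entry toEntry(String line) {
String[] columns = line.split("\\s+"); // split the line on any whitespace
Entry entry = new Entry();
entry.setName(columns[0]);
entry.setLength(Double.valueOf(columns[1])); // may throw NumberFormatException
entry.setWidth(Double.valueOf(columns[2]));
return entry;
}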
Finally, you can read the file, here using the Java 8 stream API. Skip the first line since it includes the column header and not actual data.
private void readFile() throws Exception {
Path path = Paths.get(/* path to your file */);
Files.readAllLines(path).stream().skip(1).map(FileReadTest::toEntry)
.forEach(this::action);
}
In my example, I am just printing each entry to the console:
private void action(Entry entry) {
System.out.println(entry);
}
Resulting output:
Entry[name='truck1', length=18.6, width=8.1]
Entry[name='suv1', length=17.4, width=7.4]
Entry[name='coupe1', length=14.8, width=5.4]
Entry[name='mini1', length=14.1, width=5.0]
Entry[name='sedan1', length=16.4, width=6.1]
Entry[name='suv2', length=17.5, width=7.3]
Entry[name='mini2', length=14.3, width=5.2]
Entry[name='sedan2', length=16.5, width=6.2]
Here's an example of how to properly read a text file - replace the charset with the one you need.
try (final BufferedReader br = Files.newBufferedReader(file.toPath(), StandardCharsets.UTF_8)) {
String line;
while ((line = br.readLine()) != null) {
System.out.println(line);
}
} catch (IOException e) {
e.printStackTrace();
}
Once you have the individual lines, you can split them by whitespace: str.split("\\s+");
You get an array with three entries. I guess you can figure out the rest.
So I'm writing a program that will read a CSV file and put each individual line of the file into an array. I would like to know if it's possible to give a name to each individual array created in the while loop. I would also love to know if you have any ideas on how I'd be able to separate the lines (containing the rows of the CSV file) by the columns of the CSV file.
This is my code:
package sample.package;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
public class SampleClass {
/**
* @param args
*/
public static String fileLocation; //Used to declare the file path
public static void readAndArray(String fileLocation) throws IOException { //Method to read a file and put each line into an array
BufferedReader lineRead = new BufferedReader(new FileReader(fileLocation)); //Read the file and be able to parse it into separate lines
String line; //Holds the current line so the while condition isn't just a boolean statement
while ((line = lineRead.readLine()) !=null) { //Make it possible to print out array as BufferedReader is used
String[] oneLine = new String[] {line}; //Put parsed line in new array and parsing that data for individual spots in array
System.out.println(oneLine[0]); //Print out each oneLine array
}
lineRead.close(); //Neatly close BufferedReader and FileReader
}
public static void main(String[] args) throws IOException{
readAndArray("filePath"); //Initialize method by inputting the file path
}
}
Thanks so much guys!
First off: welcome to Stack Overflow!
I assume that your question relates to some sort of educational programming task; if not, there are a number of libraries dealing with CSV files (with additional features like reading header rows and reading row entries by header/column name, etc.).
But ... why should it be a complicated task to write a CSV parser? I mean, it's basically just values separated by commas, phhh!?
To cut a long story short: There is RFC 4180, but don't expect that all your .csv files stick to it. Quoting Wikipedia:
The CSV file format is not fully standardized. The basic idea of
separating fields with a comma is clear, but that idea gets
complicated when the field data may also contain commas or even
embedded line-breaks. CSV implementations may not handle such field
data, or they may use quotation marks to surround the field. Quotation
does not solve everything: some fields may need embedded quotation
marks, so a CSV implementation may include escape characters or escape
sequences.
Once you are aware of (and hopefully understand) the warning about compatibility above, the following code example completely ignores it and provides a quick-and-dirty solution (don't use it in production, seriously ... how not to be fired 101: use a well-tested library like opencsv or Apache Commons CSV):
public static void main(String[] args)
{
Path path = Paths.get("some path"); // TODO Change to the path to a CSV file
try (Stream<String> lines = Files.lines(path))
{
List<List<String>> rows = lines
// Map each line of the file to its fields (String[])
.map(line -> line.split(","))
// Map the fields of each line to unmodifiable lists
.map(fields -> Collections.unmodifiableList(Arrays.asList(fields)))
// Collect the unmodifiable lists in an unmodifiable list (listception)
.collect(Collectors.toUnmodifiableList());
// Ensure file is not empty.
if (rows.isEmpty())
{
throw new IllegalStateException("empty file");
}
// Ensure all lines have the same number of fields.
int fieldsPerRow = rows.get(0).size();
if (!rows.stream().allMatch(row -> row.size() == fieldsPerRow))
{
throw new IllegalStateException("not all rows have the same number of fields");
}
// Assume the file has a header line appearing as the first line.
System.out.printf("Column names: %s\n", rows.get(0));
// Read the data rows.
rows.stream()
.skip(1) // Skip header line
.forEach(System.out::println);
}
catch (IOException | UncheckedIOException e)
{
e.printStackTrace(); // TODO Handle exception
}
}
This code assumes:
Fields are separated by commas
A line is considered to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a linefeed (behaviour of BufferedReader used by Files.lines())
All lines have the same number of fields
The first line is a header line
The implementor was too lazy to mind fields enclosed in double quotes
Quick and dirty ... and violates the RFC ;)
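For comparison, the well-tested-library route takes only a few lines; a sketch using Apache Commons CSV (assuming the commons-csv dependency and the same placeholder path):
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;
public class CommonsCsvSketch {
public static void main(String[] args) throws Exception {
try (Reader in = Files.newBufferedReader(Paths.get("some path")); // TODO change path
CSVParser parser = CSVFormat.DEFAULT.withFirstRecordAsHeader().parse(in)) {
for (CSVRecord record : parser) {
System.out.println(record); // quoted fields and embedded commas handled for you
}
}
}
}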
I have the following route:
from("direct:abc")
// read the file
.split(body().tokenize("\n", 3, false)).streaming().stopOnException()
.unmarshal(new BindyCsvDataFormat(Foo.class))
.process(new FooListProcessor());
The problem is that if I have 4 records in the file, the first group arrives at the processor as a List, but the second as a single Foo object.
I have to check the body with instanceof and create a list every time such a case occurs.
Foo class:
@CsvRecord(separator = ",")
public class Foo {
@DataField(pos = 1)
private String fooField;
@DataField(pos = 2, trim = true)
private String barField;
}
File content:
"lorem","ipsum"
"dolorem","sit"
"consectetur","adipiscing"
"eiusmod","incididunt"
Is there a way to force Bindy to always unmarshall into a List?
No, Bindy returns a single instance if there is a single instance, and a list if there are more.
I have logged a ticket for an improvement so you can configure this: https://issues.apache.org/jira/browse/CAMEL-12321
Just a small point: since it is not supported, as @Claus said, instead of doing the instanceof check in processor code, you could do it in the route like this and let Camel handle it for you.
from("file:///tmp/camel/input")
// read the file
.split(body().tokenize("\n", 3, false)).streaming().stopOnException()
.unmarshal(new BindyCsvDataFormat(Foo.class))
.choice()
.when(body().isInstanceOf(List.class))
.process(exchange -> { /* your logic here for the list */ })
.otherwise()
.process(exchange -> { /* your logic here for individual items */ })
.endChoice();
I have an input file like this
1234AA11BB4321BS33XY...
and I want to split it into single messages like this
Message 1: 1234AA11BB
Message 2: 4321BS33XY
Then I want to transform the records into Java objects, marshal them to XML with JAXB, and aggregate about 1000 records in the outgoing message.
Transformation and marshalling are no problem, but I can't split the String above.
There is no delimiter, only the length: every record is exactly 10 characters long.
I was wondering if there is an out of the box solution like
split(body().tokenizeBySize(10)).streaming()
Since in reality each record consists of 300 characters and there may be 500,000 records in a file, I want to split an InputStream.
In other examples I saw custom iterators used for splitting, but all of them were token- or XML-based.
Any idea?
By the way, we are bound to Java 6 and Camel 2.13.4.
Thanks
Nick
The easiest way would be to split by the empty string - .split().tokenize("", 10).streaming() - meaning that the tokenizer will take each character and group 10 tokens (characters) together, and then aggregate them into a single group, e.g.:
@Override
public void configure() throws Exception {
from("file:src/data?delay=3000&noop=true")
.split().tokenize("", 10).streaming()
.aggregate().constant(true) // all messages have the same correlator
.aggregationStrategy(new GroupedMessageAggregationStrategy())
.completionSize(1000)
.completionTimeout(5000) // use a timeout or a predicate
// to know when to stop
.process(new Processor() { // process the aggregate
@Override
public void process(final Exchange e) throws Exception {
final List<Message> aggregatedMessages =
(List<Message>) e.getIn().getBody();
StringBuilder builder = new StringBuilder();
for (Message message : aggregatedMessages) {
builder.append(message.getBody()).append("-");
}
e.getIn().setBody(builder.toString());
}
})
.log("Got ${body}")
.delay(2000);
}
EDIT
Here's my memory consumption in streaming mode with a 2s delay for a 100MB file: (chart not reproduced here)
Why not let a normal Java class do the splitting and refer to it? See here:
http://camel.apache.org/splitter.html
Code example taken from the documentation.
The Java DSL below uses method() to call the split method defined in a separate class.
from("direct:body")
// here we use a POJO bean mySplitterBean to do the split of the payload
.split().method("mySplitterBean", "splitBody")
Below you define your splitter and return each split message.
public class MySplitterBean {
/**
* The split body method returns something that is iterable, such as a java.util.List.
*
* @param body the payload of the incoming message
* @return a list containing each split part
*/
public List<String> splitBody(String body) {
// since this is based on a unit test you can of course
// use different logic for splitting, as Camel has out
// of the box support for splitting a String based on comma
// but this is for show and tell; since this is Java code
// you have full power over how you split your messages
List<String> answer = new ArrayList<String>();
String[] parts = body.split(",");
for (String part : parts) {
answer.add(part);
}
return answer;
}
}
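Adapted to the fixed-length records from the question (10 characters each, Java 6 compatible), a sketch of such a splitter bean:
import java.util.ArrayList;
import java.util.List;
public class FixedLengthSplitterBean {
private static final int RECORD_LENGTH = 10; // record size taken from the question
public List<String> splitBody(String body) {
// cut the body into consecutive fixed-length records
List<String> records = new ArrayList<String>();
for (int i = 0; i < body.length(); i += RECORD_LENGTH) {
records.add(body.substring(i, Math.min(i + RECORD_LENGTH, body.length())));
}
return records;
}
}
It can be wired in exactly like the example above, e.g. .split().method("fixedLengthSplitterBean", "splitBody"), assuming the bean is registered under that name. Note that this receives the whole body as one String; for the very large files mentioned in the question, a custom Iterator-based splitter reading from an InputStream would still be needed to keep memory usage low.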
I am writing a tool to parse some very big files, and I am implementing it using Camel. I have used Camel for other things before and it has served me well.
I am doing an initial Proof of Concept on processing files in streaming mode, because if I try to run a file that is too big without it, I get a java.lang.OutOfMemoryError.
Here is my route configuration:
@Override
public void configure() throws Exception {
from("file:" + from)
.split(body().tokenize("\n")).streaming()
.bean(new LineProcessor())
.aggregate(header(Exchange.FILE_NAME_ONLY), new SimpleStringAggregator())
.completionTimeout(150000)
.to("file://" + to)
.end();
}
from points to the directory where my test file is.
to points to the directory where I want the file to go after processing.
With that approach I could parse files that had up to hundreds of thousands of lines, so it's good enough for what I need. But I'm not sure the file is being aggregated correctly.
If I run cat /path_to_input/file I get this:
Line 1
Line 2
Line 3
Line 4
Line 5
Now on the output directory cat /path_to_output/file I get this:
Line 1
Line 2
Line 3
Line 4
Line 5%
I think this might be a pretty simple thing, although I don't know how to solve it. Both files have slightly different byte sizes as well.
Here is my LineProcessor class:
public class LineProcessor implements Processor {
@Override
public void process(Exchange exchange) throws Exception {
String line = exchange.getIn().getBody(String.class);
System.out.println(line);
}
}
And my SimpleStringAggregator class:
public class SimpleStringAggregator implements AggregationStrategy {
@Override
public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
if(oldExchange == null) {
return newExchange;
}
String oldBody = oldExchange.getIn().getBody(String.class);
String newBody = newExchange.getIn().getBody(String.class);
String body = oldBody + "\n" + newBody;
oldExchange.getIn().setBody(body);
return oldExchange;
}
}
Maybe I shouldn't even worry about this, but I would just like to have it working perfectly since this is just a POC before I get to the real implementation.
It looks like your input file's last character is a line break. You split the file on \n and add it back in the aggregator, except after the last line. Because nothing follows the last line, its terminator \n is lost. One solution might be appending the \n in advance:
String body = oldBody + "\n" + newBody + "\n";
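Be aware that this one-liner alone will insert a blank line between groups once the old body already ends with \n. A sketch of the strategy that instead terminates every line exactly once:
import org.apache.camel.Exchange;
import org.apache.camel.processor.aggregate.AggregationStrategy; // org.apache.camel.AggregationStrategy in Camel 3+
public class SimpleStringAggregator implements AggregationStrategy {
@Override
public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
String newBody = newExchange.getIn().getBody(String.class) + "\n"; // terminate each line once
if (oldExchange == null) {
newExchange.getIn().setBody(newBody);
return newExchange;
}
String oldBody = oldExchange.getIn().getBody(String.class);
oldExchange.getIn().setBody(oldBody + newBody);
return oldExchange;
}
}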
The answer from 0X00me is probably correct; however, you are probably doing unneeded work.
I assume you are using a Camel version newer than 2.3, in which case you can drop the aggregation implementation completely, as according to the Camel documentation:
Camel 2.3 and newer:
The Splitter will by default return the original input message.
Change your route to something like this (I can't test it):
@Override
public void configure() throws Exception {
from("file:" + from)
.split(body().tokenize("\n")).streaming()
.bean(new LineProcessor())
.to("file://" + to)
.end();
}
If you need to do custom aggregation, then you need to implement the aggregator. I process files this way daily and always end up with exactly what I started with.