Flat file comparison tool - Java

Flatfile1.txt
HDR06112016FLATFILE TXT
CLM12345 ABCDEF....
DTL12345 100A00....
DTL12345 200B00....
CLM54321 ABCDEF....
DTL54321 100C00....
DTL54321 200D00....
Flatfile2.txt
HDR06112016FLATFILE TXT
CLM54321 FEDCBA....
DTL54321 100C00....
DTL54321 200D00....
CLM12345 ABCDEF....
DTL12345 100A00....
DTL12345 200B00....
The mapping for both files is the same:
Header:
Field StartPosition EndPos Length
Identifier 1 3 3
Date 4 12 8
and so on
Clm:
Field StartPosition EndPos Length
Identifier 1 3 3
Key 4 12 8
Data 13 19 6
and so on
Dtl:
Field StartPosition EndPos Length
Identifier 1 3 3
Key 4 12 8
and so on
This is a sample; the real files may be up to 500 MB in size with about 50 fields. I need to compare the two files based on their mapping. The file format is one header, then each claim (e.g. 12345) on one line followed by one or more detail lines. A claim can appear at any position in the other file, so this is not a line-to-line comparison. Detail ordering within a claim is the same in both files.
Desired output:
For Key 54321, Data (pos 13:19) is not same.
Would you please help me compare the two files? Will this be feasible in Java, given that the files will be huge?

Java will work fine. You don't need to hold the files entirely in memory; you can open both and read from them incrementally, comparing as you go.
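As a sketch of one way to do it (assuming the grouped CLM/DTL blocks of one file fit in memory as an index; for truly huge files you would index byte offsets and seek with a RandomAccessFile instead): group each CLM line with its DTL lines by claim key, then compare block against block. The substring offsets below are assumptions based on the sample mapping and a key field padded to 8 characters; adjust them to your real 50-field layout.

```java
import java.io.*;
import java.util.*;

class FlatFileComparer {

    // Key field: positions 4-11 here (substring is 0-based, end-exclusive).
    // These offsets are assumptions from the sample mapping -- adjust to your layout.
    static String key(String line) {
        return line.substring(3, 11).trim();
    }

    // Group each CLM line together with its following DTL lines, keyed by claim key.
    static Map<String, List<String>> readBlocks(BufferedReader r) throws IOException {
        Map<String, List<String>> blocks = new LinkedHashMap<>();
        List<String> current = null;
        String line;
        while ((line = r.readLine()) != null) {
            if (line.startsWith("CLM")) {
                current = new ArrayList<>();
                blocks.put(key(line), current);
            }
            if (current != null && (line.startsWith("CLM") || line.startsWith("DTL"))) {
                current.add(line);
            }
        }
        return blocks;
    }

    // Compare file1's blocks against file2's; claims may appear in any order.
    static List<String> compare(BufferedReader file1, BufferedReader file2) throws IOException {
        Map<String, List<String>> blocks2 = readBlocks(file2);
        List<String> diffs = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : readBlocks(file1).entrySet()) {
            List<String> mine = e.getValue();
            List<String> other = blocks2.get(e.getKey());
            if (other == null) {
                diffs.add("Key " + e.getKey() + " is missing in file 2");
                continue;
            }
            // Detail ordering is the same in both files, so compare line by line.
            for (int i = 0; i < Math.min(mine.size(), other.size()); i++) {
                if (!mine.get(i).equals(other.get(i))) {
                    diffs.add("For Key " + e.getKey() + ", line " + (i + 1)
                            + " of the block is not same");
                }
            }
        }
        return diffs;
    }
}
```

To get field-level messages like "Data(pos 13:19) is not same", you would compare substrings field by field according to the mapping instead of comparing whole lines.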

Related

Retrieve the line number of the TSV file being read by FlatFileItemReader in a Spring Batch process

My requirement: I have a TSV file with millions of records. I use a FlatFileItemReader to read the values in chunks, which go to a processor that applies business logic, and then a JdbcBatchItemWriter writes the result. I want to insert the line number of each TSV row into the database along with the row, so that later, if my business logic finds an error in an inserted value, I can move it to an error file together with its row number, and the client knows on which line the error occurred.
Problem statement: I have to store the line number of each TSV record in the database along with its content.
e.g., sample TSV file:
Name Age Tier
Ram 24 3
Sam 12 1
Rose 18 2
Dan 43 3
Mini 10 1
Lily 19 2
How I want it to be stored
text line number
Lily|19|2 6
Dan|43|3 4
Ram|24|3 1
Sam|12|1 2
Mini|10|1 5
Rose|18|2 3
How it is being stored
text line number
Lily|19|2 1
Dan|43|3 1
Ram|24|3 6
Sam|12|1 4
Mini|10|1 5
Rose|18|2 5
In my step I have a task executor configured like this:
@Bean
public Step step1() throws Exception {
    return stepBuilderFactory().get("step1")
            .<InputFormatClass, OutputFormatClass>chunk(Integer.parseInt(chunksize))
            .reader(reader())
            .processor(processor())
            .writer(writerJdbc())
            .taskExecutor(taskExecutor())
            .build();
}
I suspect that because of the task executor (multi-threading), lines are being read in parallel and the line number is not assigned correctly.
Note: I am using a custom reader with the same implementation as FlatFileItemReader; I append the line number to the line being read in protected T doRead() throws Exception.
Please suggest a way to insert the row number correctly in a multithreaded environment.
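One option is to make the fetch and the line-number stamp a single atomic operation, so no other thread can interleave between them. Below is a minimal sketch with the framework classes stripped out: NumberedLine and the iterator source are stand-ins for your item class and the underlying file reader, not Spring Batch API.

```java
import java.util.Iterator;

// Stand-in for your mapped row plus its line number.
class NumberedLine {
    final String text;
    final int lineNumber;
    NumberedLine(String text, int lineNumber) {
        this.text = text;
        this.lineNumber = lineNumber;
    }
}

class SynchronizedNumberingReader {
    private final Iterator<String> source;  // stand-in for the underlying file reader
    private int lineNumber = 0;

    SynchronizedNumberingReader(Iterator<String> source) {
        this.source = source;
    }

    // Corresponds to doRead(): one lock covers both fetching the next line and
    // assigning its number, so concurrent chunks cannot interleave between them.
    public synchronized NumberedLine read() {
        if (!source.hasNext()) {
            return null;
        }
        return new NumberedLine(source.next(), ++lineNumber);
    }
}
```

In Spring Batch itself, the same idea is what SynchronizedItemStreamReader applies to a delegate reader; the key point is that the number must be stamped onto the item inside the synchronized read, before the item is handed off to a processor thread, since FlatFileItemReader is documented as not thread-safe.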

Writing an HTML file with Java duplicates the entry

I have a program that does some calculations in Excel and writes the output into a table tag in an HTML file. I add rows dynamically at runtime depending on the number of results, but while writing to the HTML file the entries come out wrong.
Suppose I have 50 rows in the HTML file. I append 49 rows at runtime to the template file and replace the values $id0, $age0, $time0 ... $id49, $age49, $time49 in the HTML. The first 10 rows are written properly; from the 11th row on, the values are wrong, even though the logs show the correct ones.
for (int i = 0; i < c; i++) {
    htmlString = htmlString.replace("$id" + i, cycle.get("id" + i).toString().trim());
    htmlString = htmlString.replace("$time" + i, cycle.get("time" + i).toString().trim());
    htmlString = htmlString.replace("$name" + i, cycle.get("name" + i).toString().trim());
}
The entries come out in the HTML as:
id Name age time
9 abc 8 8.08
10 xyz 12 9.19
11 xyz1 121 9.191
12 xyz12 122 9.192
The values for ids 11 and 12 are wrong: they show the 10th id's values with 1, 2, etc. appended.
I was able to resolve it by adding an extra character after each placeholder, like $id1:. The root cause is that String.replace("$id1", ...) also matches the prefix of "$id11". For example, with
id1=abc
id2=xyz
and without the extra character, $id11 came out as abc1.
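Another way to avoid the prefix clash without changing the placeholder names is to replace from the highest index down, so $id11 is consumed before $id1 ever gets a chance to match its prefix. A sketch (the fill helper and its map of values are hypothetical, standing in for the cycle map above):

```java
import java.util.Map;

class PlaceholderFix {
    // Replace $idN placeholders from the highest index down so that "$id1"
    // can never match inside "$id11".
    static String fill(String html, Map<Integer, String> values, int count) {
        for (int i = count - 1; i >= 0; i--) {
            String v = values.get(i);
            if (v != null) {
                html = html.replace("$id" + i, v);
            }
        }
        return html;
    }
}
```

Alternatively, a regex with a lookahead such as replaceAll("\\$id" + i + "(?!\\d)", ...) refuses to match when another digit follows, at the cost of having to escape the replacement string.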

Reading a large file into POJOs and in turn into an ArrayList

Hi, I have my program structured as below, in a service class of a Spring Boot project:
File 1 - loaded into arraylist1 (list of POJOs)
File 2 - loaded into arraylist2 (list of POJOs)
File 3 - loaded into arraylist3 (list of POJOs)
Input file - parsed and loaded into an ArrayList
Output ArrayList
for (each record in the input-file arraylist) {
    // for output field 1
    for (each record in file1's arraylist) {
        if the field is available in file1 -> assign the output column
        else -> reject the record
    }
    // for output field 2
    for (each record in file2's arraylist) {
        if the field is available in file2 -> assign the output column
        else -> reject the record
    }
    // for output field 3
    for (each record in file3's arraylist) {
        if the field is available in file3 -> assign the output column
        else -> reject the record
    }
    // assign the remaining output fields directly from input fields
    output field 4 = input field 4
    output field 5 = input field 5
    output field 6 = input field 6
    output field 7 = input field 7
    output field 8 = input field 8
    outputList.add(output pojo)
}
While reading File 2, which is 2 GB, the process hangs or throws an OutOfMemoryError. I am completely stuck with this; please help.
Thank you
When dealing with large inputs and outputs, the best approach is to chunk the work. For example, you could cap each ArrayList at something like 10,000 entries and process the data in chunks of that size.
However, given your file sizes, I feel you could perhaps use a database rather than trying to work with such large inputs in memory. You should rethink your approach.
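Separately from memory, the nested loops above make the join O(n*m) per output field. If the lookup files are only consulted by a join field, one angle (a sketch assuming a hypothetical "key|payload" line layout; your real parsing would map your actual fields) is to stream each lookup file once into a HashMap keyed by that field, so the inner loops become O(1) lookups and no List of POJOs is ever held:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;

class LookupIndex {
    // Stream a lookup file line by line into a key -> payload map.
    // The "key|payload" layout here is a hypothetical example format.
    static Map<String, String> indexByKey(Reader in) throws IOException {
        Map<String, String> index = new HashMap<>();
        try (BufferedReader r = new BufferedReader(in)) {
            String line;
            while ((line = r.readLine()) != null) {
                String[] parts = line.split("\\|", 2);
                if (parts.length == 2) {
                    index.put(parts[0], parts[1]);
                }
            }
        }
        return index;
    }
}
```

If even the keyed index does not fit in the heap for a 2 GB file, that is the point where a database (or at least a larger -Xmx) becomes the sensible choice.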

FOREACH in Cypher - Neo4j

I am very new to the Cypher query language, and I am working on relationships between nodes.
I have a CSV file containing a table with multiple columns and 1000 rows.
The template of my table is:
cdrType ANUMBER BNUMBER DURATION
2 123 456 10
2 890 456 5
2 123 666 2
2 123 709 7
2 345 789 20
I used these commands to create the nodes and property keys:
LOAD CSV WITH HEADERS FROM "file:///2.csv" AS ROW
CREATE (:ANUMBER {aNumber:ROW.aNumber} )
CREATE (:BNUMBER {bNumber:ROW.bNumber} )
Now I need to create a relationship for every row in the table, and I think a FOREACH loop is best in my case. I wrote this query, but it gives me an error. The query is:
MATCH (a:ANUMBER),(b:BNUMBER)
FOREACH(i in RANGE(0, length(ANUMBER)) |
CREATE UNIQUE (ANUMBER[i])-[s:CALLED]->(BNUMBER[i]))
and the error is :
Invalid input '[': expected an identifier character, whitespace,
NodeLabel, a property map, ')' or a relationship pattern (line 3,
column 29 (offset: 100)) " CREATE UNIQUE
(a:ANUMBER[i])-[s:CALLED]->(b:BNUMBER[i]))"
I need a relationship for every row, as in my case: 123 -called-> 456, 890 -called-> 456. I need a visual representation of this call data showing which number called which one, and for that I need to create a relationship for every row.
Does anyone have an idea how to solve this?
What about:
LOAD CSV WITH HEADERS FROM "file:///2.csv" AS ROW
CREATE (a:ANUMBER {aNumber:ROW.aNumber} )
CREATE (b:BNUMBER {bNumber:ROW.bNumber} )
MERGE (a)-[:CALLED]->(b);
It's not more complex than that, in my opinion.
Hope this helps!
Regards,
Tom

How can I easily customize the XML input files of the Curriculum Course example of OptaPlanner?

I'm quite a beginner at not only OptaPlanner but also Java, which I started learning two weeks ago. I am considering starting by modifying one of the bundled examples to build a scheduler for my current project.
Target
The XML input file of the "Curriculum Course" example of Red Hat OptaPlanner
Question
Is there any easy way to modify the XML input files or add customized ones?
Example data
Original XML: optaplanner-examples/data/curriculumcourse/unsolved/comp01.xml
<CourseSchedule id="1">
...
<curriculumList id="27">
<Curriculum id="28">
<code>q000</code>
</Curriculum>
...
<courseList id="42">
<Course id="43">
<curriculumList id="44">
<Curriculum reference="28"/>
<Curriculum reference="30"/>
</curriculumList>
</Course>
...
</courseList>
...
</CourseSchedule>
The problem is that I have to renumber all the id attributes every time I change the length of any list.
For the "Curriculum Course" example I can find the corresponding .ctt files, which are easy to change with an ordinary text editor. The following is the input data I want to create, expressed in the .ctt format:
.ctt modified from: optaplanner-examples/data/curriculumcourse/import/comp01.ctt
Name: Test01
Courses: 3
Rooms: 3
Days: 5
Periods_per_day: 13
Curricula: 3
Constraints: 0
COURSES:
c0001 t000 5 99 1
c0002 t001 5 99 1
c0003 t002 5 99 1
ROOMS:
P01 999
P02 999
P03 999
CURRICULA:
q001 1 c0001
q002 1 c0002
q003 1 c0003
UNAVAILABILITY_CONSTRAINTS:
END.
I wonder whether some tool could convert this text data into the XML files specific to the "Curriculum Course" example.
If you click the "Import..." button in the application, you can load the .ctt file and solve it directly; you don't need to convert .ctt to .xml. So there should be no problem placing your own xxx.ctt file under data/curriculumcourse/import and letting the app import it.
