Writing HTML file with Java duplicates the entry

I have a program that does some calculations on Excel data and writes the output into a table tag in an HTML file. I add rows dynamically at runtime depending on the number of results, but the entries written to the HTML file are not correct.
Suppose I have 50 rows in the HTML file. I append 49 rows at runtime to the template file and replace the values $id0, $age0, $time0 ... $id49, $age49, $time49 in the HTML file. The first 10 rows are written properly, but from the 11th row on the values are wrong, even though the logs show the correct values.
for (int i = 0; i < c; i++) {
    htmlString = htmlString.replace("$id" + i, cycle.get("id" + i).toString().trim());
    htmlString = htmlString.replace("$time" + i, cycle.get("time" + i).toString().trim());
    htmlString = htmlString.replace("$name" + i, cycle.get("name" + i).toString().trim());
}
The entries come out in the HTML as:

id  Name   age  time
9   abc    8    8.08
10  xyz    12   9.19
11  xyz1   121  9.191
12  xyz12  122  9.192

The values for id 11 and 12 are wrong: they show the 10th id's values with 1, 2, etc. appended.

I was able to resolve this by adding an extra character after each placeholder, e.g. $id1: instead of $id1.
Example:
id1=abc
id2=xyz
Without the extra character, $id11 was being rendered as abc1 (the replacement for $id1 followed by the leftover 1).
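An alternative that avoids the sentinel character altogether is to run the loop from the highest index down, so that "$id1" can never match as a prefix of a longer, not-yet-replaced placeholder such as "$id11". A minimal sketch reusing the question's htmlString, cycle and c:

for (int i = c - 1; i >= 0; i--) {
    // Replace $id49/$time49/$name49 first, ..., $id0 last, so a shorter
    // placeholder never clobbers the prefix of a longer one.
    htmlString = htmlString.replace("$id" + i, cycle.get("id" + i).toString().trim());
    htmlString = htmlString.replace("$time" + i, cycle.get("time" + i).toString().trim());
    htmlString = htmlString.replace("$name" + i, cycle.get("name" + i).toString().trim());
}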

Related

I want to convert a report, which is in text format, into an xlsx document, but the data in the text file has some missing column values.

Typical report data looks like this (the sample was posted as an image in the original question):
A simple approach that I wanted to follow was to use space as a delimiter, but the data is not well structured.
Read the first line of the file and split it into columns by checking for more than one whitespace character. In addition to that, note how wide each column is.
After that you can simply go through the other rows containing data and extract the information by checking the extent of the column you are at.
(And please don't put images of text into Stack Overflow; actual text is better.)
EDIT:
Python implementation:
import pandas as pd
import re

file = "path/to/file.txt"

with open(file, "r") as f:
    # Read the header line and split it on runs of whitespace to get the column names.
    line = f.readline()
    columns = re.split(" +", line)
    # Record the start offset of each column name within the header line.
    column_sizes = [next(re.finditer(column, line)).start() for column in columns]
    column_sizes.append(-1)
    # ------
    # Skip the line right after the header (e.g. a separator line in the report).
    f.readline()
    rows = []
    while True:
        line = f.readline()
        if len(line) == 0:
            break
        elif line[-1] != "\n":
            # Ensure a trailing newline so the last slice below stays consistent.
            line += "\n"
        row = []
        # Slice each data line at the recorded column offsets.
        for i in range(len(column_sizes) - 1):
            value = line[column_sizes[i]:column_sizes[i + 1]]
            row.append(value)
        rows.append(row)

columns = [column.strip() for column in columns]
df = pd.DataFrame(data=rows, columns=columns)
print(df)
df.to_excel(file.split(".")[0] + ".xlsx")
You are correct that exporting from text to CSV is not a practical start; however, it would be good for import. The idea is to save your source text as 100% well-structured plain text and then import it into Excel.
You can use Google Lens to get your data out of this picture and then copy and paste it into an Excel file; that is the easiest way.
Or first convert it into a PDF and then use Google Lens: go to File, scroll to the Print option, and in the print settings there is an option called Microsoft Print to PDF. Select it, press Print, give it a location when asked, and then use the resulting PDF.

AS400 SQL script on a parameter file returns encoded data

I'm integrating an application with the AS400 using the Java/JT400 driver. I'm having an issue when I extract data from a parameter file: the data retrieved seems to be encoded.
SELECT SUBSTR(F00001,1,20) FROM QS36F."FX.PARA" WHERE K00001 LIKE '16FFC%%%%%' FETCH FIRST 5 ROWS ONLY
Output
00001: C6C9D9C540C3D6D4D4C5D9C3C9C1D34040404040, - 1
00001: C6C9D9C5406040C3D6D4D4C5D9C3C9C1D3406040, - 2
How can I convert this to a readable format? Is there a function which I can use to decode this?
On a terminal connection to the AS400, the information is displayed correctly through the same SQL query.
I have no experience working with the AS400 before this and could really use some help. This issue is only with the parameter files; the database tables work fine.
What you are seeing is EBCDIC output instead of ASCII. This is due to the CCSID not being specified in the database as mentioned in other answers. The ideal solution is to assign the CCSID to your field in the database. If you don't have the ability to do so and can't convince those responsible to do so, then the following solution should also work:
SELECT CAST(SUBSTR(F00001,1,20) AS CHAR(20) CCSID(37))
FROM QS36F."FX.PARA"
WHERE K00001 LIKE '16FFC%%%%%'
FETCH FIRST 5 ROWS ONLY
Replace the CCSID with whichever one you need. The CCSID definitions can be found here: https://www-01.ibm.com/software/globalization/ccsid/ccsid_registered.html
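Since the question goes through the JT400 JDBC driver from Java, here is a rough sketch of issuing such a cast query from code. The host, user and password are placeholders, and the CCSID clause is written in the form the asker later confirmed to work:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ParaFileRead {
    public static void main(String[] args) throws Exception {
        // The JT400 driver (jt400.jar) registers itself on the class path;
        // the explicit load is optional on JDBC 4+ but does no harm.
        Class.forName("com.ibm.as400.access.AS400JDBCDriver");

        String url = "jdbc:as400://MYAS400HOST";   // placeholder host name
        String sql = "SELECT CAST(SUBSTR(F00001,1,20) AS CHAR(20) CCSID 37) "
                   + "FROM QS36F.\"FX.PARA\" "
                   + "WHERE K00001 LIKE '16FFC%%%%%' "
                   + "FETCH FIRST 5 ROWS ONLY";

        try (Connection conn = DriverManager.getConnection(url, "USER", "PASSWORD");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                // Because of the cast to CCSID 37, the driver hands back readable text.
                System.out.println(rs.getString(1));
            }
        }
    }
}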
Since the file is in QS36F, I would guess that the file is a flat file and not externally defined ... so the data in the file would have to be manually interpreted if being accessed via SQL.
You could try casting the field, after you substring it, into a character format.
(I don't have a S/36 file handy, so I really can't try it)
It is the hex representation of bytes of text in EBCDIC, the AS/400 character set.
import java.nio.charset.Charset;

static String fromEbcdic(String hex) {
    int m = hex.length();
    if (m % 2 != 0) {
        throw new IllegalArgumentException("Must be even length");
    }
    int n = m / 2;
    byte[] bytes = new byte[n];
    // Parse each pair of hex digits into one byte.
    for (int i = 0; i < n; ++i) {
        int b = Integer.parseInt(hex.substring(i * 2, i * 2 + 2), 16);
        bytes[i] = (byte) b;
    }
    // Cp500 is an EBCDIC charset; decode the bytes with it.
    return new String(bytes, Charset.forName("Cp500"));
}
passing "C6C9D9C540C3D6D4D4C5D9C3C9C1D34040404040".
Convert the file with Cp500 as charset:
Path path = Paths.get("...");
List<String> lines = Files.readAllLines(path, Charset.forName("Cp500"));
For line endings, which on the AS/400 are the NEL character (U+0085), one can use a regex:
content = content.replaceAll("\\R", "\r\n");
The regex \R matches exactly one line break, whether \r, \n, \r\n, or \u0085.
A big thank you for all the answers provided; they are all correct.
It is a flat parameter file on the AS400 and I have no control over changing anything in the system, so it has to be handled at runtime of the SQL query or once the data is received.
I had absolutely no clue what the code page was, as I have no prior experience with the AS400 and its files, so all your answers have helped resolve and enlighten me on this. :)
So the best answer is the last one. I changed the SQL as follows and I get the desired result.
SELECT CAST(F00001 AS CHAR(20) CCSID 37) FROM QS36F."FX.PARA" WHERE K00001 LIKE '16FFC%%%%%' FETCH FIRST 5 ROWS ONLY
00001: FIRE COMMERCIAL , - 1
00001: FIRE - COMMERCIAL - , - 2
Thanks once again.
Dilanke

To read a large file and store it in POJOs and, in turn, in an ArrayList

Hi, I have my program structured as below in a Spring Boot project, in a service class:
File 1 - loaded into arraylist1 (list of POJOs)
File 2 - loaded into arraylist2 (list of POJOs)
File 3 - loaded into arraylist3 (list of POJOs)
Input file - parsed and loaded into an arraylist.
Output arraylist:
for (iterate input file arraylist) {
    // for output field 1
    for (iterate file1) {
        if field available in file1 - assign output column
        else reject record
    }
    // for output field 2
    for (iterate file2) {
        if field available in file2 - assign output column
        else reject record
    }
    // for output field 3
    for (iterate file3) {
        if field available in file3 - assign output column
        else reject record
    }
    // assign the other output fields from the input fields
    output field 4 = input field 4
    output field 5 = input field 5
    output field 6 = input field 6
    output field 7 = input field 7
    output field 8 = input field 8
    outputList.add(output pojo)
}
While reading File 2, which is 2 GB, the process hangs or throws an OutOfMemoryError. I am completely stuck with this; please help.
Thank you
When dealing with large inputs/outputs, the best way to approach the problem is to chunk it. For example, you could set a maximum size for each ArrayList, say 10,000 entries, and process the data chunk by chunk, as sketched below.
However, given your file sizes, I feel you could perhaps use a database rather than trying to work with such large inputs in memory. You should rethink your approach.
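A minimal sketch of the chunking idea; processChunk and the record parsing are hypothetical placeholders for the application's own logic, and at no point are more than CHUNK_SIZE lines held in memory:

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ChunkedFileReader {

    private static final int CHUNK_SIZE = 10_000;

    public static void readInChunks(String path) throws IOException {
        List<String> chunk = new ArrayList<>(CHUNK_SIZE);
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                chunk.add(line);                 // in practice, parse the line into your POJO here
                if (chunk.size() == CHUNK_SIZE) {
                    processChunk(chunk);         // hypothetical per-chunk processing
                    chunk.clear();               // release the chunk before reading further
                }
            }
            if (!chunk.isEmpty()) {
                processChunk(chunk);             // last, partial chunk
            }
        }
    }

    private static void processChunk(List<String> chunk) {
        // placeholder: match against the lookup files, build output records, etc.
    }
}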

Forming DataFrames from CSV files with different headers in Spark

I am trying to read a folder of gzipped CSVs (without the .gz extension), each with a list of variables, e.g.:
CSV file 1: TIMESTAMP | VAR1 | VAR2 | VAR3
CSV file 2: TIMESTAMP | VAR1 | VAR3
Each file represents a day. The order of the columns can be different (or columns can be missing from one file).
The first option, reading the whole folder in one shot using spark.read, is discarded because the join between the files takes into account the column order and not the column names.
My next option is to read file by file:
for (String key : pathArray) {
    Dataset<Row> rawData = spark.read().option("header", true).csv(key);
    allDatasets.add(rawData);
}
And then do a full outer join on the column names:
Dataset<Row> data = allDatasets.get(0);
for (int i = 1; i < allDatasets.size(); i++) {
    ArrayList<String> columns = new ArrayList(Arrays.asList(data.columns()));
    columns.retainAll(new ArrayList(Arrays.asList(allDatasets.get(i).columns())));
    data = data.join(allDatasets.get(i), JavaConversions.asScalaBuffer(columns), "outer");
}
But this process is very slow, as it loads one file at a time.
The next approach is to use sc.binaryFiles, since with sc.readFiles it is not possible to work around adding custom Hadoop codecs (needed in order to read gzipped files without the .gz extension).
Using this latest approach and translating the code to Java, I end up with a JavaPairRDD<String, Iterable<Tuple2<String, String>>> containing the name of the variable (VAR1) and an iterable of TIMESTAMP,VALUE tuples for that variable.
I would like to form from this a DataFrame representing all the files; however, I am completely lost on how to transform this final PairRDD into a DataFrame. The DataFrame should represent the contents of all the files together. An example of the final DataFrame I would like to have is the following:
TIMESTAMP | VAR1 | VAR2 | VAR3
01        | 32   | 12   | 32    ==> start of contents of file 1
02        | 10   | 5    | 7     ==> end of contents of file 1
03        | 1    |      | 5     ==> start of contents of file 2
04        | 4    |      | 8     ==> end of contents of file 2
Any suggestions or ideas?
Finally I got it, with very good performance:
Read by month in the "background" (using a Java Executor to read the other folders of CSVs in parallel); with this approach the time the driver spends scanning each folder is reduced because it happens in parallel.
Next, the process extracts on the one hand the headers and on the other hand their contents (tuples of varname, timestamp, value).
Finally, union the contents using the RDD API and build the DataFrame with the headers.
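Since the answer stays at the prose level, here is a rough sketch of that last step only, under the assumption that each record has already been aligned into an Object[] matching the union of all headers; the names toDataFrame, records and allColumns are illustrative, not from the original:

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

import java.util.ArrayList;
import java.util.List;

public class UnionWithSchema {
    public static Dataset<Row> toDataFrame(SparkSession spark,
                                           JavaRDD<Object[]> records,
                                           List<String> allColumns) {
        // Build a schema from the union of all headers; nullable fields, so files
        // that are missing a column simply carry nulls for it.
        List<StructField> fields = new ArrayList<>();
        for (String name : allColumns) {
            fields.add(DataTypes.createStructField(name, DataTypes.StringType, true));
        }
        StructType schema = DataTypes.createStructType(fields);

        // Wrap each pre-aligned value array in a Row and create the DataFrame.
        JavaRDD<Row> rows = records.map(RowFactory::create);
        return spark.createDataFrame(rows, schema);
    }
}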

Flat file Comparison Tool

Flatfile1.txt
HDR06112016FLATFILE TXT
CLM12345 ABCDEF....
DTL12345 100A00....
DTL12345 200B00....
CLM54321 ABCDEF....
DTL54321 100C00....
DTL54321 200D00....
Flatfile2.txt
HDR06112016FLATFILE TXT
CLM54321 FEDCBA....
DTL54321 100C00....
DTL54321 200D00....
CLM12345 ABCDEF....
DTL12345 100A00....
DTL12345 200B00....
Mapping for both files will be the same:
Header:
Field       StartPosition  EndPos  Length
Identifier  1              3       3
Date        4              12      8
and so on

Clm:
Field       StartPosition  EndPos  Length
Identifier  1              3       3
Key         4              12      8
Data        13             19      6
and so on

Dtl:
Field       StartPosition  EndPos  Length
Identifier  1              3       3
Key         4              12      8
and so on
This is a sample; the real files may be up to 500 MB in size with about 50 fields. I need to compare the two files based on their mapping. The file format is one header, then a claim record (e.g. 12345) on one line, followed by its detail records, of which there can be more than one. The claims can appear in any order in the other file; it is not a line-to-line mapping. The ordering of the detail records will be the same in both files.
Desired output:
For Key 54321, Data (pos 13:19) is not the same.
Would you please help me compare the two files? Will it be feasible in Java, since the file size will be huge?
Java would work fine. You don't need to have the files entirely in memory; you can open them both and read from both incrementally, comparing as you go along.
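As a rough illustration (not a full solution), the sketch below keys the CLM records of the first file by claim key, keeping only the key and the data field in memory, and then streams the second file and reports mismatches. The file names and the 1-based offsets (key at 4-12, data at 13-19) are taken from the question's mapping and would need adjusting to the real layout; DTL records are not compared here, and claims present in only one file simply show up as mismatches:

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class FlatFileCompare {
    public static void main(String[] args) throws IOException {
        Map<String, String> firstFileData = new HashMap<>();

        // Pass 1: remember the data field of every CLM record in the first file, keyed by claim key.
        try (BufferedReader reader = Files.newBufferedReader(Paths.get("Flatfile1.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith("CLM") && line.length() >= 19) {
                    // key = positions 4-12, data = positions 13-19 (per the question's mapping)
                    firstFileData.put(line.substring(3, 12).trim(), line.substring(12, 19));
                }
            }
        }

        // Pass 2: stream the second file and compare each CLM record against the map.
        try (BufferedReader reader = Files.newBufferedReader(Paths.get("Flatfile2.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.startsWith("CLM") || line.length() < 19) {
                    continue;
                }
                String key = line.substring(3, 12).trim();
                String data = line.substring(12, 19);
                if (!data.equals(firstFileData.get(key))) {
                    System.out.println("For Key " + key + ", Data (pos 13:19) is not same.");
                }
            }
        }
    }
}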
