BOM Problem for matching 2 keys' value with TreeMap.getEntryUsingComparator()

I'm following the tutorial on this page (https://www.bezkoder.com/spring-boot-upload-csv-file/) to insert data from a CSV file into a MySQL database, but I got stuck on something in the class CSVHelper.
While debugging, I found that the problem is located in TreeMap.getEntryUsingComparator(), where the key doesn't match any entry of the headerMap.
When I checked the variables in the Debug view, I saw that the underlying values were different even though the text was the same ("Id").
The key argument ("Id") has the value [73, 100]
The headerMap key ("Id") has the value [-1, -2, 73, 0, 100, 0]
I have checked the header in the file and there is no space in it. All the other headers work fine.
After changing the order of the headers, it became clear that the problem affects whichever header name comes first: [-1, -2] is added at the beginning and a 0 after each of the other values.
So, what could this be? What can I do to solve it?
Project on Github, branch dev-mysql-csv

The extra values at the beginning of the input were caused by the BOM (Byte Order Mark). The CSV file wasn't saved in the right format; after I changed from "csv delimited with comma" to "csv delimited with semicolon", it worked.
But this works only when the separator is "," and not ";", which is very odd...
To handle the BOM, there is BOMInputStream. With it, I managed to run "csv delimited with comma".
I tried withRecordSeparator(";") and withDelimiter(";") to make it work with ";", but it failed.
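
For reference, the values shown in the debugger appear consistent with a header string whose first character is U+FEFF (the BOM), i.e. "\uFEFFId" rather than "Id". Below is a minimal standard-library sketch of dropping that character before the header is used as a map key. It is an alternative to BOMInputStream, not a replacement for it; the class and method names (BomSkipper, stripBom, readHeaderLine) are made up for illustration, and the sketch assumes the file is UTF-8.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the tutorial or of Commons CSV.
final class BomSkipper {

    // Removes a single leading U+FEFF character, if present.
    static String stripBom(String text) {
        if (text != null && !text.isEmpty() && text.charAt(0) == '\uFEFF') {
            return text.substring(1);
        }
        return text;
    }

    // Reads the first line of a UTF-8 stream and strips the BOM from it.
    // Java's UTF-8 decoder keeps the BOM as U+FEFF, so it must be removed by hand.
    static String readHeaderLine(InputStream in) {
        try {
            BufferedReader reader =
                    new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
            return stripBom(reader.readLine());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The same stripBom call could be applied to the first header name only, since the BOM can appear only at the very start of the file.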

Related

Cant parse pipe delimited header data into correct variable

I have a file with data in the first row that I want to extract. The data looks like this:
20200403|AS421|||FINN|
public void handleLine(String line) {
if (line.contains(firstJobConfig.DELIMITER_PIPE)){
headerInfo.setcreateDate(line.substring(0, line.indexOf(firstJobConfig.DELIMITER_PIPE)));
headerInfo.setformName(line.substring(line.indexOf(firstJobConfig.DELIMITER_PIPE)));
}
}
}
My code pulls 20200403 into my createDate variable, but I can't figure out how to get formName set to AS421. Right now it is set to |AS421|||FINN|. I know that line.substring(9,14) would work, but I want to start after the first pipe delimiter (|) and stop at the next one.

Right now, you're doing this: headerInfo.setformName(line.substring(line.indexOf(firstJobConfig.DELIMITER_PIPE))). You take a substring that starts at the index of the first delimiter and you don't specify where it ends, which is why the result of the second substring is |AS421|||FINN|. A better way is to use line.split("\\|"). In your case it returns an array of 5 elements: ["20200403","AS421","","","FINN"]. Then you can do:
String[] table = line.split("\\|");
headerInfo.setcreateDate(table[0]);
headerInfo.setformName(table[1]);
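
As a small self-contained sketch of the two options discussed here, the class below shows both the split approach and the "start after the first pipe and stop at the next one" approach using the two-argument indexOf. The class and method names (PipeHeader, splitFields, secondField) are hypothetical and only serve the illustration.

```java
// Hypothetical helper for the sample header line 20200403|AS421|||FINN|
final class PipeHeader {

    // Splits on every pipe. The pipe is escaped because split takes a regex.
    // Trailing empty strings are dropped by String.split, so the sample line
    // yields 5 elements.
    static String[] splitFields(String line) {
        return line.split("\\|");
    }

    // Returns the text between the first and the second pipe.
    static String secondField(String line) {
        int first = line.indexOf('|');
        if (first < 0) {
            throw new IllegalArgumentException("No pipe delimiter in line");
        }
        int second = line.indexOf('|', first + 1);
        if (second < 0) {
            throw new IllegalArgumentException("Only one pipe delimiter in line");
        }
        return line.substring(first + 1, second);
    }
}
```

With the sample line, splitFields(line)[1] and secondField(line) both give AS421.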

You can split the strings like below.
Add a + to match one or more instances of the pipe:
temp.split("\\|+");
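
One thing to keep in mind with the + variant: consecutive pipes are treated as a single delimiter, so empty fields disappear and later fields move to lower indexes. The sketch below compares the patterns on the sample line from the question; the class and method names are hypothetical.

```java
// Hypothetical comparison of split patterns on a pipe-delimited line.
final class PipeSplitComparison {

    // One delimiter per pipe: empty fields in the middle are kept.
    static String[] keepEmptyFields(String line) {
        return line.split("\\|");
    }

    // One or more pipes count as one delimiter: empty fields are collapsed.
    static String[] collapseEmptyFields(String line) {
        return line.split("\\|+");
    }

    // A negative limit also keeps trailing empty fields.
    static String[] keepAllFields(String line) {
        return line.split("\\|", -1);
    }
}
```

For 20200403|AS421|||FINN| the first method returns 5 elements with FINN at index 4, the second returns 3 elements with FINN at index 2, and the third returns 6 elements because the empty field after the last pipe is kept.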

AS400 SQL Script on a parameter file returns

I'm integrating an application to the AS400 using Java/JT400 driver. I'm having an issue when I extract data from a parameter file - the data retrieved seems to be encoded.
SELECT SUBSTR(F00001,1,20) FROM QS36F."FX.PARA" WHERE K00001 LIKE '16FFC%%%%%' FETCH FIRST 5 ROWS ONLY
Output
00001: C6C9D9C540C3D6D4D4C5D9C3C9C1D34040404040, - 1
00001: C6C9D9C5406040C3D6D4D4C5D9C3C9C1D3406040, - 2
How can I convert this to a readable format? Is there a function which I can use to decode this?
On the terminal connection to the AS400 the information is displayed correctly through the same SQL query.
I have no experience working with AS400 before this and could really use some help. This issue is only with the parameter files. The database tables work fine.

What you are seeing is EBCDIC output instead of ASCII. This is due to the CCSID not being specified in the database as mentioned in other answers. The ideal solution is to assign the CCSID to your field in the database. If you don't have the ability to do so and can't convince those responsible to do so, then the following solution should also work:
SELECT CAST(SUBSTR(F00001,1,20) AS CHAR(20) CCSID 37)
FROM QS36F."FX.PARA"
WHERE K00001 LIKE '16FFC%%%%%'
FETCH FIRST 5 ROWS ONLY
Replace the CCSID with whichever one you need. The CCSID definitions can be found here: https://www-01.ibm.com/software/globalization/ccsid/ccsid_registered.html

Since the file is in QS36F, I would guess that the file is a flat file and not externally defined ... so the data in the file would have to be manually interpreted if being accessed via SQL.
You could try casting the field, after you substring it, into a character format.
(I don't have a S/36 file handy, so I really can't try it)

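It is the hex representation of the bytes of a text in EBCDIC, the AS/400 charset.
import java.nio.charset.Charset;

// Decodes a hex string of EBCDIC bytes (two hex digits per byte) into a String.
static String fromEbcdic(String hex) {
    int m = hex.length();
    if (m % 2 != 0) {
        throw new IllegalArgumentException("Must be even length");
    }
    int n = m / 2;
    byte[] bytes = new byte[n];
    for (int i = 0; i < n; ++i) {
        // Parse each pair of hex digits as one byte.
        int b = Integer.parseInt(hex.substring(i * 2, i * 2 + 2), 16);
        bytes[i] = (byte) b;
    }
    // Cp500 is an EBCDIC code page; use the one that matches your system.
    return new String(bytes, Charset.forName("Cp500"));
}
Call it passing "C6C9D9C540C3D6D4D4C5D9C3C9C1D34040404040".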
Convert the file with Cp500 as charset:
Path path = Paths.get("...");
List<String> lines = Files.readAllLines(path, Charset.forName("Cp500"));
Line endings on AS/400 are the NEL character, U+0085. To normalize them, one can use a regex:
content = content.replaceAll("\\R", "\r\n");
The regex \R will match exactly one line break, whether \r, \n, \r\n, \u0085.
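
A minimal sketch of that normalization, wrapped in a hypothetical helper class (LineBreaks is not part of any library) so it can be tried in isolation:

```java
// Hypothetical helper showing the \R line-break matcher.
final class LineBreaks {

    // Replaces every line break (\n, \r, \r\n, NEL U+0085, ...) with \r\n.
    static String normalize(String content) {
        return content.replaceAll("\\R", "\r\n");
    }

    // Splits the content into lines on any kind of line break.
    static String[] lines(String content) {
        return content.split("\\R");
    }
}
```

Because \R treats \r\n as a single break, existing Windows line endings are not doubled by normalize.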

A Big thank you for all the answers provided, they are all correct.
It is a flat parameter file in the AS400 and I have no control over changing anything in the system. So it has to be at runtime of the SQL query or once received.
I had absolutely no clue about what the code page was as I have no prior experience with AS400 and files in it. Hence all your answers have helped resolve and enlighten me on this. :)
So, the best answer is the last one. I have changed the SQL as follows and I get the desired result.
SELECT CAST(F00001 AS CHAR(20) CCSID 37) FROM QS36F."FX.PARA" WHERE K00001 LIKE '16FFC%%%%%' FETCH FIRST 5 ROWS ONLY
00001: FIRE COMMERCIAL , - 1
00001: FIRE - COMMERCIAL - , - 2
Thanks once again.
Dilanke

How to remove minus and plus sign duplicates via Talend job?

I have loaded a local file into a Talend process and need to apply the conditions below to its data.
My CSV file data looks like this:
NO,DATE,MARK
123,2015-03-01,200
123,2015-03-01,-200
123,2015-03-01,200
123,2015-03-01,200
125,2016-01-01,80
Here the values "200" and "-200" both appear. If I have -200,
I need to remove the corresponding +200 value. After that, if rows have the same NO, DATE and MARK, I need to remove the duplicates:
"123,2015-03-01,200", "123,2015-03-01,200" = "123,2015-03-01,200"
Finally, my result should look like this:
NO,DATE,MARK
123,2015-03-01,200
125,2016-01-01,80
After that I need to sum the marks: 200 + 80 gives 125,2016-01-01,280. How can I do this with a Talend job?

Step by step, we can start by removing this:
123,2015-03-01,200
123,2015-03-01,-200
We can do it by summing MARK after grouping by NO and DATE, using the Talend component tAggregateRow. After that, we get:
123,2015-03-01,0
Now we can use the component tFilterRow to remove all rows having MARK == 0, and the component tUniqRow to remove duplicated rows.
The last step is to get the sum of MARK using tAggregateRow and store it in a context variable, then get the greatest NO and the latest DATE by using the component tSortRow, and keep only that row using tSampleRow. We can then assign the sum of MARK to that row.
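
To make the expected result concrete, here is a plain-Java sketch of the rules as stated in the question. It is not a Talend job and does not use any Talend API; the class and method names are hypothetical. Note one assumption: for the sample data, summing all four rows of NO 123 gives 400 rather than 0, so the sketch cancels each negative row against exactly one matching positive row instead of summing the group. Negative rows are always dropped, and the header row is assumed to be removed beforehand.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of the clean-up rules; rows are "NO,DATE,MARK" strings.
final class MarkCleaner {

    // Drops each negative row together with one matching positive row,
    // then removes exact duplicates while keeping the input order.
    static List<String> cancelAndDedupe(List<String> rows) {
        // Number of positive rows still to cancel, keyed by "NO,DATE,MARK".
        Map<String, Integer> toCancel = new HashMap<>();
        for (String row : rows) {
            String[] f = row.trim().split(",");
            int mark = Integer.parseInt(f[2]);
            if (mark < 0) {
                toCancel.merge(f[0] + "," + f[1] + "," + (-mark), 1, Integer::sum);
            }
        }
        Set<String> kept = new LinkedHashSet<>();
        for (String row : rows) {
            String[] f = row.trim().split(",");
            int mark = Integer.parseInt(f[2]);
            if (mark < 0) {
                continue; // negative rows never reach the output
            }
            String key = f[0] + "," + f[1] + "," + mark;
            int pending = toCancel.getOrDefault(key, 0);
            if (pending > 0) {
                toCancel.put(key, pending - 1); // cancelled by a negative row
                continue;
            }
            kept.add(key); // the set silently ignores duplicates
        }
        return new ArrayList<>(kept);
    }

    // Builds the final row: greatest NO, latest DATE (ISO dates sort as text),
    // and the sum of MARK. Assumes a non-empty list.
    static String summarize(List<String> rows) {
        int total = 0;
        int maxNo = Integer.MIN_VALUE;
        String latestDate = "";
        for (String row : rows) {
            String[] f = row.split(",");
            total += Integer.parseInt(f[2]);
            maxNo = Math.max(maxNo, Integer.parseInt(f[0]));
            if (f[1].compareTo(latestDate) > 0) {
                latestDate = f[1];
            }
        }
        return maxNo + "," + latestDate + "," + total;
    }
}
```

For the five sample rows, cancelAndDedupe leaves 123,2015-03-01,200 and 125,2016-01-01,80, and summarize on that result gives 125,2016-01-01,280.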

Batch Inserts in PostgreSQL using JDBC

I want to insert a file into PostgreSQL using JDBC.
I know the command below, but the file I downloaded does not use a delimiter ("|" or ","):
FileReader fr = new FileReader("mesowest.out.txt");
cm.copyIn("COPY tablename FROM STDIN WITH DELIMITER '|'", fr);
My file looks like this :
ABAUT 20131011/0300 8.00 37.84 -109.46 3453.00 21.47 8.33 241.90
ALTU1 20131011/0300 8.00 37.44 -112.48 2146.00 -9999.00 -9999.00 -9999.00
BDGER 20131011/0300 8.00 39.34 -108.94 1529.00 43.40 0.34 271.30
BULLF 20131011/0300 8.00 37.52 -110.73 1128.00 56.43 8.07 197.50
CAIUT 20131011/0300 8.00 38.35 -110.95 1381.00 54.88 8.24 250.00
CCD 20131011/0300 8.00 40.69 -111.59 2743.00 27.94 8.68 285.40
So my question is: is it necessary to add delimiters to this file in order to load it into the database using JDBC?

From the PostgreSQL documentation:
delimiter : The single character that separates columns within each row (line) of the file. The default is a tab character in text mode, a comma in CSV mode.
It looks like your data is tab-delimited, so using the default should work.
FileReader fr = new FileReader("mesowest.out.txt");
cm.copyIn("COPY tablename FROM STDIN", fr);

You need to transform your file in some way, yes.
It looks like it is currently either delimited by a variable number of spaces, or it has fixed-width fields. The difference is what would happen if 2146.00 were changed to 312146.00. Would it run into the previous field like "-112.48312146.00", as fixed width would, or would you add a space anyway even though that would break the column alignment?
I don't believe either of those are directly supported by COPY, so some transformation is necessary. Also, -9999.00 looks like a magic value that should probably be converted to NULL.
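
A minimal sketch of such a transformation, assuming the fields are separated by one or more whitespace characters (not fixed width) and that -9999.00 always means "missing". The class and method names are hypothetical. Each line is rewritten with "|" as the delimiter and \N, the default NULL marker of COPY in text format, in place of the magic value.

```java
// Hypothetical converter from whitespace-separated lines to COPY text format.
final class CopyLineConverter {

    // Assumption: this value marks missing data in the source file.
    static final String MISSING = "-9999.00";

    // Turns "ABAUT 20131011/0300 8.00 ..." into "ABAUT|20131011/0300|8.00|...",
    // replacing the missing-value marker with \N.
    static String toCopyLine(String line) {
        String[] fields = line.trim().split("\\s+");
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                out.append('|');
            }
            out.append(MISSING.equals(fields[i]) ? "\\N" : fields[i]);
        }
        return out.toString();
    }
}
```

The converted lines, joined with newlines, could then be passed through a Reader to the copyIn call from the question that uses WITH DELIMITER '|'.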

Java XMLInputFactory - truncates text when reading data with .getData()

I'm using XMLInputFactory to read data (sql queries) from xml file.
In some cases, the data is truncated. For example:
select CASE WHEN count(*) > 0 THEN 'LX1VQMSSRV069 OK' ELSE 'LX1VQMSSRV069 NOK' END from [PIWSLog].[dbo].[log]
is read as (text is truncated after the last '.'):
select CASE WHEN count(*) > 0 THEN 'LX1VQMSSRV069 OK' ELSE 'LX1VQMSSRV069 NOK' END from [PIWSLog].[dbo]
I've tested with several strings and it seems that the problem is with the characters in [].[].[].
I'm reading the data using:
mySQLquery = event.asCharacters().getData();
Another situation is if the string has '\n'. Like, if it has two '\n', the event.asCharacters().getData(); reads correctly, but if it has three '\n' it truncates the string after the second '\n'. This is very odd!
Any idea what's the problem and how can I solve it?

The XMLInputFactory API is not obliged to give you all of the characters of a String in one go. It's permitted to pass you a sequence of events, each containing a fragment of the string.
You'll probably find that if you read another event after the one containing the truncated string, you'll find the remainder of your string (possibly after several events).
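
A minimal sketch of collecting all the fragments, using only the standard javax.xml.stream API. The class name, method name and the element name passed in are hypothetical; the idea is to append every Characters event between the start and end tag instead of reading a single event.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper that reads the full text of the first matching element.
final class StaxTextReader {

    // Returns the concatenated text of the first element named elementName,
    // or null if there is no such element.
    static String readElementText(String xml, String elementName) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
            StringBuilder text = null; // non-null while inside the target element
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()
                        && event.asStartElement().getName().getLocalPart().equals(elementName)) {
                    text = new StringBuilder();
                } else if (text != null && event.isCharacters()) {
                    // The text may arrive in several pieces; keep appending.
                    text.append(event.asCharacters().getData());
                } else if (text != null && event.isEndElement()
                        && event.asEndElement().getName().getLocalPart().equals(elementName)) {
                    reader.close();
                    return text.toString();
                }
            }
            reader.close();
            return null;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

Two alternatives that may also work, depending on how the rest of the reading code is structured: setting the XMLInputFactory.IS_COALESCING property to true on the factory so adjacent character data is merged, or calling XMLEventReader.getElementText() right after the start element.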
