I want to insert a file into PostgreSQL using JDBC.
I know to use the command below, but the downloaded file is not separated by a delimiter ("|" or ","):
FileReader fr = new FileReader("mesowest.out.txt");
cm.copyIn("COPY tablename FROM STDIN WITH DELIMITER '|'", fr);
My file looks like this :
ABAUT 20131011/0300 8.00 37.84 -109.46 3453.00 21.47 8.33 241.90
ALTU1 20131011/0300 8.00 37.44 -112.48 2146.00 -9999.00 -9999.00 -9999.00
BDGER 20131011/0300 8.00 39.34 -108.94 1529.00 43.40 0.34 271.30
BULLF 20131011/0300 8.00 37.52 -110.73 1128.00 56.43 8.07 197.50
CAIUT 20131011/0300 8.00 38.35 -110.95 1381.00 54.88 8.24 250.00
CCD 20131011/0300 8.00 40.69 -111.59 2743.00 27.94 8.68 285.40
So my question is: is it necessary to add delimiters to this file in order to push it into the database using JDBC?
From the Postgres documentation:
delimiter : The single character that separates columns within each row (line) of the file. The default is a tab character in text mode, a comma in CSV mode.
It looks like your data is tab delimited, so using the default should work.
FileReader fr = new FileReader("mesowest.out.txt");
cm.copyIn("COPY tablename FROM STDIN", fr);
You need to transform your file in some way, yes.
It looks like it is currently either delimited by a variable number of spaces, or it has fixed-width fields. The difference being: what would happen if 2146.00 were changed to 312146.00? Would it run into the previous field, like "-112.48312146.00" (as fixed width would), or would you add a space anyway even though that would break the column alignment?
I don't believe either of those is directly supported by COPY, so some transformation is necessary. Also, -9999.00 looks like a magic value that should probably be converted to NULL.
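For illustration, here is a minimal sketch of that transformation in Java, assuming the fields are separated by runs of whitespace: it rewrites each line as tab-delimited text (COPY's default), maps the -9999.00 sentinel to \N (NULL in COPY's text format), and streams the result through the driver's CopyManager. The connection URL, credentials and table name are placeholders.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.StringReader;
import java.sql.Connection;
import java.sql.DriverManager;

import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class MesowestCopy {
    public static void main(String[] args) throws Exception {
        // placeholder connection details
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password");

        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new FileReader("mesowest.out.txt"))) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.trim().isEmpty()) continue;
                // split on runs of whitespace and re-join with tabs (COPY's default delimiter)
                String[] fields = line.trim().split("\\s+");
                for (int i = 0; i < fields.length; i++) {
                    // treat the -9999.00 sentinel as SQL NULL (\N in COPY text format)
                    String value = fields[i].equals("-9999.00") ? "\\N" : fields[i];
                    sb.append(value).append(i < fields.length - 1 ? '\t' : '\n');
                }
            }
        }

        CopyManager cm = ((PGConnection) conn).getCopyAPI();
        cm.copyIn("COPY tablename FROM STDIN", new StringReader(sb.toString()));
        conn.close();
    }
}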
Typical report data looks like this:
A simple approach I wanted to follow was to use a space as the delimiter, but the data is not well structured.
Read the first line of the file and split it into columns by checking for more than one whitespace character; in addition, record how wide each column is.
After that, you can simply go through the other rows and extract the data by checking the width of the column you are at.
(And please don't put images of text into Stack Overflow; actual text is better.)
EDIT:
Python implementation:
import pandas as pd
import re

file = "path/to/file.txt"

with open(file, "r") as f:
    # read the header line and split it on runs of spaces
    line = f.readline()
    columns = re.split(" +", line.rstrip("\n"))
    # the start offset of each header defines the fixed-width slices for the data rows
    column_sizes = [re.search(re.escape(column), line).start() for column in columns]
    column_sizes.append(-1)
    # skip the next line (e.g. a separator/underline row)
    f.readline()
    rows = []
    while True:
        line = f.readline()
        if len(line) == 0:  # end of file
            break
        elif line[-1] != "\n":
            line += "\n"
        row = []
        for i in range(len(column_sizes) - 1):
            value = line[column_sizes[i]:column_sizes[i + 1]].strip()
            row.append(value)
        rows.append(row)

columns = [column.strip() for column in columns]
df = pd.DataFrame(data=rows, columns=columns)
print(df)
df.to_excel(file.split(".")[0] + ".xlsx")
You are correct that exporting from text to CSV is not a practical start; however, it would be fine for import. So here is your 100% well-structured source text, saved as plain text.
And here is the import into Excel.
You can use Google Lens to get your data out of this picture and then copy and paste it into an Excel file. That is the easiest way.
Or first convert it into a PDF and then use Google Lens: go to File and scroll to the Print option; in the print settings there is an option called "Microsoft Print to PDF". Select that and press Print; it will ask you for a location, so choose one, and then use the resulting PDF.
I'm trying to follow the tutorial on this page (https://www.bezkoder.com/spring-boot-upload-csv-file/) in order to insert information from a CSV file into a MySQL DB, but I got stuck on something in the class CSVHelper.
While investigating, I found that the problem is located in TreeMap.getEntryUsingComparator(), where the key doesn't match any of the values in the headerMap.
When I checked the variables in the Debug view, I saw the first values were different whereas the text was the same ("Id").
The key argument ("Id") has the value [73, 100]
The headerMap key ("Id") has the value [-1, -2, 73, 0, 100, 0]
I have checked the header in the file and there is no space. Meanwhile, all the other headers work fine.
After changing the order of the headers, it became clear that the problem affects only the first header name: [-1, -2] is added at the beginning and 0 between the other values.
So, what do you think it can possibly be? What can I do to solve this?
The project is on GitHub, branch dev-mysql-csv.
This change at the beginning of the input was the consequence of the BOM (Byte Order Mark): [-1, -2] are the bytes 0xFF 0xFE, i.e. a UTF-16LE BOM, and the interleaved zeros mean the text itself was encoded as UTF-16. The CSV file wasn't saved in the right format; after changing from "CSV delimited with comma" to "CSV delimited with semicolon", it worked.
But this was only OK when the separator was "," and not ";", which is very odd...
In order to handle the BOM, there is BOMInputStream; with it I succeeded in reading the file delimited with a comma.
I tried to use withRecordSeparator(";") and withDelimiter(";") in order to make it work with ";", but it failed.
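For reference, a minimal sketch of reading such a file with Apache Commons IO's BOMInputStream placed in front of Commons CSV; the "Id" header comes from the question, while the file name, the list of BOMs to detect, and the fallback charset are assumptions:

import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;
import org.apache.commons.io.ByteOrderMark;
import org.apache.commons.io.input.BOMInputStream;

public class BomSafeCsvReader {
    public static void main(String[] args) throws Exception {
        try (InputStream in = Files.newInputStream(Paths.get("upload.csv"))) {
            // detect and strip a UTF-8 or UTF-16 BOM so the first header ("Id") is not corrupted
            BOMInputStream bomIn = new BOMInputStream(in,
                    ByteOrderMark.UTF_8, ByteOrderMark.UTF_16LE, ByteOrderMark.UTF_16BE);
            // decode with the charset announced by the BOM, falling back to UTF-8
            String charset = bomIn.hasBOM() ? bomIn.getBOMCharsetName() : "UTF-8";
            try (Reader reader = new InputStreamReader(bomIn, charset);
                 CSVParser parser = CSVParser.parse(reader,
                         CSVFormat.DEFAULT.withFirstRecordAsHeader().withDelimiter(';'))) {
                for (CSVRecord record : parser) {
                    System.out.println(record.get("Id"));
                }
            }
        }
    }
}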
I have loaded a local file into a Talend process and need to apply the condition below to this file's data.
My CSV file data looks like this:
NO,DATE,MARK
123,2015-03-01,200
123,2015-03-01,-200
123,2015-03-01,200
123,2015-03-01,200
125,2016-01-01,80
Above, both the "200" and "-200" values are present. If there is a -200,
I need to remove the corresponding +200 value. After that, if I have rows with the same NO, DATE, MARK, then I need to remove the duplicates, too:
"123,2015-03-01,200", "123,2015-03-01,200" = "123,2015-03-01,200"
Finally, my result should look like this:
NO,DATE,MARK
123,2015-03-01,200
125,2016-01-01,80
After that I need to sum 200 + 80, giving 125,2016-01-01,280. How can I do this in a Talend job?
Step by step, we can start by removing this:
123,2015-03-01,200
123,2015-03-01,-200
We can do it by summing MARK after grouping by NO and DATE, using the Talend component tAggregateRow. Afterwards, we will get:
123,2015-03-01,0
Now we can use the component tFilterRow to remove all rows having MARK == 0, and the component tUniqRow to remove duplicated rows.
The last step is to get the sum of MARK using tAggregateRow and store it in a context variable, then get the greatest NO and the latest DATE by using the component tSortRow, and then keep only that row using tSampleRow. We can then assign the sum of MARK to it.
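To make the intended transformation concrete, here is a plain-Java sketch of the same cancel / deduplicate / total logic on the sample rows. It only illustrates the logic, not the Talend component configuration:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class MarkCleanup {
    public static void main(String[] args) {
        // sample rows from the question: NO, DATE, MARK
        List<String[]> rows = new ArrayList<>(Arrays.asList(
                new String[]{"123", "2015-03-01", "200"},
                new String[]{"123", "2015-03-01", "-200"},
                new String[]{"123", "2015-03-01", "200"},
                new String[]{"123", "2015-03-01", "200"},
                new String[]{"125", "2016-01-01", "80"}));

        // step 1: cancel each negative MARK against one matching positive MARK
        //         with the same NO and DATE (the tAggregateRow + tFilterRow stage)
        List<String[]> cancelled = new ArrayList<>(rows);
        for (String[] neg : rows) {
            if (Double.parseDouble(neg[2]) >= 0) continue;
            cancelled.remove(neg);
            Iterator<String[]> it = cancelled.iterator();
            while (it.hasNext()) {
                String[] pos = it.next();
                if (pos[0].equals(neg[0]) && pos[1].equals(neg[1])
                        && Double.parseDouble(pos[2]) == -Double.parseDouble(neg[2])) {
                    it.remove();
                    break;
                }
            }
        }

        // step 2: drop exact duplicates, same NO, DATE and MARK (the tUniqRow stage)
        Set<String> unique = new LinkedHashSet<>();
        for (String[] r : cancelled) {
            unique.add(String.join(",", r));
        }
        unique.forEach(System.out::println); // 123,2015-03-01,200 and 125,2016-01-01,80

        // step 3: total MARK and keep the greatest NO and latest DATE
        //         (the final tAggregateRow + tSortRow + tSampleRow stage)
        double total = 0;
        String maxNo = "", maxDate = "";
        for (String line : unique) {
            String[] r = line.split(",");
            total += Double.parseDouble(r[2]);
            if (r[0].compareTo(maxNo) > 0) maxNo = r[0];
            if (r[1].compareTo(maxDate) > 0) maxDate = r[1];
        }
        System.out.println(maxNo + "," + maxDate + "," + (int) total); // 125,2016-01-01,280
    }
}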
How can I process a CSV file where some fields are wrapped in quotes?
An example line to process (the field delimiter is ','):
I am column1, I am column2, "yes, I'm am column3"
The example has three columns, but the following load will say that I have four columns:
A = load '/path/to/file' using PigStorage(',');
Any suggestions or links to resources?
Try loading the data, then do a FOREACH GENERATE to regenerate the data into whatever format you need. For the fields where you need to remove the quotes, use REPLACE($3, '"', '').
data = LOAD 'testdata' USING PigStorage(',');
data = FOREACH data GENERATE
    (chararray) $0 AS col1:chararray,
    (chararray) $1 AS col2:chararray,
    (chararray) REPLACE($3, '"', '') AS col3:chararray;
MAJOR ACC NO,MINOR ACC NO,STD CODE,TEL NO,DIST CODE
7452145,723456, 01,4213036,AAA
7254287,7863265, 01,2121920,AAA
FRUNDTE,FMACNO,FACCNO,FDISTCOD,FBILSEQ,FOOCTYP,FOOCDES,FOOCAMT,FSTD,FTELNO,FNORECON,FXFRACCN,FLANGIND,CUR
12345,71234,7643234,AAA,001,DX,WLR Promotion - Insitu /Pre-Cabled PSTN Connection,-37.87,,,0,,E,EUR
FRUNDTE,FMACNO,FACCNO,FDISTCOD,FBILSEQ,FORDNO,FREF,FCHGDES,FCHGAMT,CUR,FORENFRM,FORENTO
3242241,72349489,2345352,AAA,001,30234843P ,1,NEW CONNECTION - PRECABLED CHARGE,37.87,EUR,2123422,201201234
12123471,7618412389,76333232,AAA,001,3123443P ,2,BROKEN PERIOD RENTAL,5.40,EUR,201234523,20123601
I have a CSV file something like the one above and I want to extract certain columns from it. For example, I want to extract the first column of the first paragraph. I'm kind of new to Java; I am able to read the file, but I don't know how to extract certain columns from the different paragraphs. Any help will be appreciated.
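As a minimal sketch, assuming each "paragraph" starts with a header line whose first field is not numeric, the following collects the first column of the first paragraph only; the file name bill.csv is just a placeholder:

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.List;

public class FirstSectionColumn {
    public static void main(String[] args) throws Exception {
        List<String> firstColumn = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new FileReader("bill.csv"))) {
            String line;
            boolean inFirstSection = false;
            while ((line = br.readLine()) != null) {
                if (line.trim().isEmpty()) continue;
                String[] fields = line.split(",");
                // a line whose first field is not purely numeric is treated as a header,
                // i.e. the start of a new "paragraph"
                if (!fields[0].trim().matches("\\d+")) {
                    if (inFirstSection) break; // second header reached: stop after the first paragraph
                    inFirstSection = true;     // e.g. "MAJOR ACC NO,MINOR ACC NO,..."
                    continue;
                }
                if (inFirstSection) {
                    firstColumn.add(fields[0].trim()); // first column of a data row
                }
            }
        }
        System.out.println(firstColumn); // [7452145, 7254287] for the sample above
    }
}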