How to convert windows saved csv to webserver on linux - java

The problem is that I have a CSV file exported from Excel on Windows that contains the Swedish letters åäöÅÄÖ. When I upload it and convert the contents to a string, those letters come out garbled. The server is Tomcat 7 on Linux, configured to use ISO-8859-1.
I have tried different byte[] conversions, but none seem to work. I have removed all of the conversions I tried from this code.
public void run(InputStreamReader is) {
BufferedReader br = null;
String line = "";
String cvsSplitBy = ";";
try {
br = new BufferedReader(is);
while ((line = br.readLine()) != null) {
// use semicolon as separator
String[] playerInfo = line.split(cvsSplitBy);
System.out.println("Förnamn: " + playerInfo[0]
+ "Efternamn: " + playerInfo[1]
+ "Klubb= " + playerInfo[7]
+ " , datum=" + playerInfo[10]
+ " , Total= " + playerInfo[14]
+ " , serier= " + playerInfo[15]);
saveInfo(playerInfo);
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} finally {
if (br != null) {
try {
br.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
System.out.println("Done");
}
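One possible cause, offered as a guess rather than a diagnosis: the InputStreamReader passed to run() is created without a charset, so it decodes with whatever the JVM default happens to be. Excel on Windows often saves CSV in a single-byte code page such as windows-1252, but that is an assumption about this particular file. A minimal sketch that names the charset explicitly follows; the class and method names (CsvCharsetSketch, readRows) are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class CsvCharsetSketch {

    // Reads semicolon-separated rows, decoding bytes with the given charset
    // rather than the JVM default.
    static List<String[]> readRows(InputStream in, Charset charset) {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = br.readLine()) != null) {
                rows.add(line.split(";", -1)); // -1 keeps trailing empty fields
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file ASCII-only.
        String sample = "\u00C5sa;\u00D6berg;V\u00E4ster\u00E5s";
        // Simulate an upload saved in a single-byte Western European encoding.
        byte[] upload = sample.getBytes(StandardCharsets.ISO_8859_1);
        List<String[]> rows = readRows(new ByteArrayInputStream(upload), StandardCharsets.ISO_8859_1);
        System.out.println(rows.get(0).length + " fields, round trip ok: "
                + String.join(";", rows.get(0)).equals(sample));
    }
}
```

For a real upload you might call readRows(uploadStream, Charset.forName("windows-1252")); if the file turns out to be UTF-8, pass StandardCharsets.UTF_8 instead.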

Related

Reading from BigQuery and store data to Google storage (Special Character issue)

Reference: Can Google Data flow use existent VM and not temporary created ones?
The code works, but when it saves the BigQuery response to Google Cloud Storage, all the Japanese characters are corrupted.
PCollectionTuple QVCollections = rows.apply("FilterEmptyRows", ParDo.of(new FilterEmptyRowDoFn("TransactionId", "TransactionDateTime"))).apply("CreateQVFiles",ParDo.of(new TransactionToQVFilesDoFnJP())
.withOutputTags(BobShare.QVHeaders, TupleTagList.of(BobShare.QVEvents).and(BobShare.QVPayments)));
QVCollections.get(BobShare.QVEvents).apply("WriteQVEvents", TextIO.write().to(storagePath + CSV_OUTPUT_FOLDER + "events_" + timeSuffix).withoutSharding().withHeader(CSV_HEADER_EVENTS).withSuffix(".csv"));
QVCollections.get(BobShare.QVPayments).apply("WriteQVPayments", TextIO.write().to(storagePath + CSV_OUTPUT_FOLDER + "payments_" + timeSuffix).withoutSharding().withHeader(CSV_HEADER_PAYMENTS).withSuffix(".csv"));
QVCollections.get(BobShare.QVHeaders).apply("WriteQVHeaders", TextIO.write().to(storagePath + CSV_OUTPUT_FOLDER + "header_" + timeSuffix).withoutSharding().withHeader(CSV_HEADER_TRANSACTION).withSuffix(".csv"));
Based on what I have found, I need to use .withCoder(StringUtf8Coder.of()).
In addition, this is what I have tried (but it works only locally, with DirectRunner):
private static void uploadBlob(String project, String bucket, String filename, String localfile) {
String listFromCsv = readCsvFromLocalStorage(localfile);
Storage storage = StorageOptions.newBuilder().setProjectId(project).build().getService();
BlobId blobId = BlobId.of(bucket, filename);
BlobInfo blobInfo = BlobInfo.newBuilder(blobId).setContentType("application/json").setContentEncoding(UTF_8).build();
try {
storage.create(blobInfo, listFromCsv.getBytes(UTF_8));
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
}
private static String readCsvFromLocalStorage(String fileName) {
StringBuilder builder = new StringBuilder();
Path pathToFile = Paths.get(fileName);
try (BufferedReader br = Files.newBufferedReader(pathToFile,
StandardCharsets.UTF_8)) {
// read the first line from the text file
String line = br.readLine();
// loop until all lines are read
while (line != null) {
builder.append(line).append("\n");
line = br.readLine();
}
} catch (IOException ioe) {
ioe.printStackTrace();
}
return builder.toString();
}
private static void deleteLocalFile (String fileName)
{
try {
if (new File(fileName).delete()) {
System.out.println(fileName + " deleted.");
} else {
System.out.println(fileName + " could not be deleted.");
}
} catch (Exception e)
{
System.out.println(fileName + " could not be deleted.");
e.printStackTrace();
}
}
This is what the data looks like (corrupted):
JAPANESE CHARACTERS
Any suggestions?
You need to replace
BufferedReader br = Files.newBufferedReader(pathToFile,
StandardCharsets.UTF_8))
with
BufferedReader br = Files.newBufferedReader(pathToFile,
Charset.forName("UTF-8"))
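For what it is worth, Charset.forName("UTF-8") and StandardCharsets.UTF_8 refer to the same charset, so this swap on its own may not change the output. The sketch below (hypothetical class Utf8Sketch) only illustrates that equivalence and what decoding with the wrong charset does to Japanese text; it does not establish where the corruption happens in the Dataflow job.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Utf8Sketch {

    // True when the named lookup and the constant are the same charset.
    static boolean sameCharset() {
        return Charset.forName("UTF-8").equals(StandardCharsets.UTF_8);
    }

    // Encodes with one charset and decodes with another.
    static String recode(String text, Charset encodeWith, Charset decodeWith) {
        return new String(text.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        String japanese = "\u65E5\u672C\u8A9E"; // three kanji, written as escapes
        System.out.println("same charset: " + sameCharset());
        System.out.println("UTF-8 both ways intact: "
                + recode(japanese, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(japanese));
        System.out.println("decoded as ISO-8859-1 intact: "
                + recode(japanese, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1).equals(japanese));
    }
}
```

If the text survives when both sides use UTF-8 but not otherwise, the place to look is any step that encodes or decodes without naming a charset and therefore falls back to the platform default.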

Lag from BufferedReader

I am having trouble finding the source of lag in my code. I believe I have narrowed the possible source down to this method.
Essentially I start a script, store it in a Process variable p, read the script's output with a BufferedReader, and put each line into an ArrayList.
Somehow I get lag whenever the script produces output (it outputs at 5-minute intervals).
Any ideas?
public void runCommand(String path)
{
if (SystemUtils.IS_OS_WINDOWS)
{
ProcessBuilder builder = new ProcessBuilder("cmd.exe", "/c", "cd " + path + " && " + this.getCommand());
builder.redirectErrorStream(true);
try
{
p = builder.start();
}
catch (IOException e)
{
e.printStackTrace();
}
}
else
{
try
{
String name = ManagementFactory.getRuntimeMXBean().getName();
String pid = name.substring(0, name.indexOf("@")); // name is typically "pid@hostname"
p = Runtime.getRuntime().exec("./btrace.sh " + pid + " " + path + " " + this.getConfig().getPort());
}
catch (Exception e)
{
e.printStackTrace();
}
}
BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()));
String line;
try
{
// Print out everything that's happening.
while (true)
{
line = r.readLine();
if (line == null)
{
break;
}
if (this.isDebugEnabled)
{
System.out.println("[Script Output]: " + line);
}
lines.add(line);
}
r.close();
}
catch (IOException e)
{
e.printStackTrace();
}
}
It seems I found the cause of the lag.
After reading up on how an ArrayList resizes, I realized that could be taxing on performance, so I tried a LinkedList instead.
So far the issue seems to be fixed.
Thanks!

Debugging File Search / Merge Code

This program is meant to read two files located in a particular folder, merge them, and create a third file, which it does. It then searches the merged file for a keyword such as "test"; when it finds the keyword, it prints the position and the line, which it partly does. The problem is that when I run the program, it stops after finding the first occurrence of the keyword in a line and does not continue searching that line. So if "test" appears several times in a line, only the first occurrence is reported. I want it to print every occurrence. I think the indexOf logic is causing the issue.
import java.io.*;
public class Concatenate {
public static void main(String[] args) {
String sourceFile1Path = "C:/Users/me/Desktop/test1.txt";
String sourceFile2Path = "C:/Users/me/Desktop/test2.txt";
String mergedFilePath = "C:/Users/me/Desktop/merged.txt";
File[] files = new File[2];
files[0] = new File(sourceFile1Path);
files[1] = new File(sourceFile2Path);
File mergedFile = new File(mergedFilePath);
mergeFiles(files, mergedFile);
stringSearch(args);
}
private static void mergeFiles(File[] files, File mergedFile) {
FileWriter fstream = null;
BufferedWriter out = null;
try {
fstream = new FileWriter(mergedFile, true);
out = new BufferedWriter(fstream);
} catch (IOException e1) {
e1.printStackTrace();
}
for (File f : files) {
System.out.println("merging: " + f.getName());
FileInputStream fis;
try {
fis = new FileInputStream(f);
BufferedReader in = new BufferedReader(new InputStreamReader(fis));
String aLine;
while ((aLine = in.readLine()) != null) {
out.write(aLine);
out.newLine();
}
in.close();
} catch (IOException e) {
e.printStackTrace();
}
}
try {
out.close();
} catch (IOException e) {
e.printStackTrace();
}
}
private static void stringSearch(String args[]) {
try {
String stringSearch = "test";
BufferedReader bf = new BufferedReader(new FileReader("C:/Users/me/Desktop/merged.txt"));
int linecount = 0;
String line;
System.out.println("Searching for " + stringSearch + " in file");
while (( line = bf.readLine()) != null){
linecount++;
int indexfound = line.indexOf(stringSearch);
if (indexfound > -1) {
System.out.println(stringSearch + " was found at position " + indexfound + " on line " + linecount);
System.out.println(line);
}
}
bf.close();
}
catch (IOException e) {
System.out.println("IO Error Occurred: " + e.toString());
}
}
}
It's because you are searching for the word once per line in your while loop. Each iteration of the loop takes you to the next line of the file because you are calling bf.readLine(). Try something like the following. You may have to tweak it but this should get you close.
while (( line = bf.readLine()) != null){
linecount++;
int indexfound = line.indexOf(stringSearch);
while(indexfound > -1)
{
System.out.println(stringSearch + " was found at position " + indexfound + " on line " + linecount);
System.out.println(line);
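// continue after the current match; restarting at indexfound would find it again forever
indexfound = line.indexOf(stringSearch, indexfound + stringSearch.length());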
}
}
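The same idea can be pulled into a small helper that returns every position of the keyword in one line. This is a minimal sketch with hypothetical names (OccurrenceSketch, findAll); it advances past each match, so overlapping matches are not reported.

```java
import java.util.ArrayList;
import java.util.List;

final class OccurrenceSketch {

    // Returns the start index of every non-overlapping occurrence of keyword in line.
    static List<Integer> findAll(String line, String keyword) {
        List<Integer> positions = new ArrayList<>();
        if (keyword.isEmpty()) {
            return positions; // an empty keyword would otherwise loop forever
        }
        int index = line.indexOf(keyword);
        while (index > -1) {
            positions.add(index);
            // Resume the search just past the match that was found.
            index = line.indexOf(keyword, index + keyword.length());
        }
        return positions;
    }

    public static void main(String[] args) {
        String line = "a test and another test";
        for (int position : findAll(line, "test")) {
            System.out.println("test was found at position " + position);
        }
    }
}
```

Inside the readLine() loop you would call findAll(line, stringSearch) once per line and print each returned position together with linecount.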

File IO Exceptions in Java

I am reading input from a tab delimited file in a java class. The file opens properly and the information from the file seems to be read in correctly as well. Every line in the file winds up printing to the screen as expected but then at the end of file it seems like it tries to print one more line and I get an ArrayIndexOutOfBoundsException: 1.
It is worth noting that if I uncomment the line that prints sCurrentLine and comment out the line that prints the split array, I do not get the error.
Code:
BufferedReader br = null;
try {
String sCurrentLine;
br = new BufferedReader(new FileReader(fname));
while ((sCurrentLine = br.readLine()) != null){
String[] values = sCurrentLine.split("\\t", -1); // don't truncate empty fields
System.out.println("Col1: " + values[0] + " Col2: " + values[1] + " Col3: "
+ values[2] + " Col4: " + values[3] + " Col5: " + values[4] );
//System.out.println(sCurrentLine);
}
} catch (IOException e) {
System.out.println("IOException");
e.printStackTrace();
} finally {
try {
if(br != null){
br.close();
}
} catch (IOException ex) {
System.out.println("ErrorClosingFile");
ex.printStackTrace();
}
}
The last line does not have the same number of elements as the other lines. After splitting it, you try to access array fields that do not exist, which is what the exception indicates: http://docs.oracle.com/javase/7/docs/api/java/lang/ArrayIndexOutOfBoundsException.html. Before you access the fields of the array, check that it contains the expected number of items, like this:
BufferedReader br = null;
try {
String sCurrentLine;
br = new BufferedReader(new FileReader(fname));
while ((sCurrentLine = br.readLine()) != null){
String[] values = sCurrentLine.split("\\t", -1); // don't truncate empty fields
if (5 == values.length) {
System.out.println("Col1: " + values[0] + " Col2: " + values[1] + " Col3: "
+ values[2] + " Col4: " + values[3] + " Col5: " + values[4] );
}
// System.out.println(sCurrentLine);
}
} catch (IOException e) {
System.out.println("IOException");
e.printStackTrace();
} finally {
try {
if(br != null){
br.close();
}
} catch (IOException ex) {
System.out.println("ErrorClosingFile");
ex.printStackTrace();
}
}
The code seems OK. Do you have an empty line at the end?
while ((sCurrentLine = br.readLine()) != null){
if (sCurrentLine.isEmpty() || sCurrentLine.startsWith(";")) // skip empty and comment lines
continue;
String[] values = sCurrentLine.split("\\t"); // are you sure the -1 is required?
...
}
Try this:
String[] values = "".split("\\t", -1); // don't truncate empty fields
int index=1;
StringBuffer sb = new StringBuffer();
for (String value : values) {
sb.append("Col"+index+":").append(value).append(" ");
index++;
}
System.out.println(sb.toString());
It is obvious that you are reading an array position that does not exist.
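To see why the exception reports index 1 specifically: splitting an empty line still yields a one-element array, so values[0] exists and values[1] is the first access that fails. A small sketch, with hypothetical names (SplitSketch, fieldCount, describe), that shows the field counts and guards on the array length:

```java
final class SplitSketch {

    // Number of fields produced by a tab split that keeps trailing empties.
    static int fieldCount(String line) {
        return line.split("\\t", -1).length;
    }

    // Formats a line only when it has at least the expected number of fields.
    static String describe(String line, int expected) {
        String[] values = line.split("\\t", -1);
        if (values.length < expected) {
            return "skipped: " + values.length + " of " + expected + " fields";
        }
        return String.join(" | ", values);
    }

    public static void main(String[] args) {
        System.out.println(fieldCount(""));               // field count for an empty last line
        System.out.println(describe("a\tb\tc\td\te", 5)); // a complete row
        System.out.println(describe("", 5));              // the empty row is skipped, not indexed
    }
}
```

The -1 limit matters for rows with empty trailing columns: without it, split drops those trailing empty strings and a valid row can come back shorter than expected.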

how to remove double quotes while reading CSV

public class CSVTeast {
public static void main(String[] args) {
CSVTeast obj = new CSVTeast();
obj.run();
}
public void run() {
String csvFile = "D:\\text.csv";
BufferedReader br = null;
String line = "";
String cvsSplitBy = "~";
try {
br = new BufferedReader(new FileReader(csvFile));
while ((line = br.readLine()) != null) {
// use tilde as separator
String[] csvRead = line.split(cvsSplitBy);
System.out.println("Value [date= " + csvRead[5]
+ " , name=" + csvRead[9]+"]");
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} finally {
if (br != null) {
try {
br.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
System.out.println("Done");
}
}
Output is
Value [date= "POLICY_CHANGE_EFFECTIVE_DATE" , name="AGENCY_NAME"]
Value [date= "2014-04-01" , name="USI INSURANCE SERVICES] -- this value starts with a double quote but does not end with one.
Expected output
Value [date= POLICY_CHANGE_EFFECTIVE_DATE , name=AGENCY_NAME]
Value [date= 2014-04-01 , name=USI INSURANCE SERVICES]
You can try passing the value through the String.replace() method.
So your code would be:
public class CSVTeast {
public static void main(String[] args) {
CSVTeast obj = new CSVTeast();
obj.run();
}
public void run() {
String csvFile = "D:\\text.csv";
BufferedReader br = null;
String line = "";
String cvsSplitBy = "~";
try {
br = new BufferedReader(new FileReader(csvFile));
while ((line = br.readLine()) != null) {
String[] csvRead = line.split(cvsSplitBy);
System.out.println("Value [date= " + csvRead[5].replace("\"","")
+ " , name=" + csvRead[9].replace("\"","")+"]");
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} finally {
if (br != null) {
try {
br.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
System.out.println("Done");
}
}
There's a nice CSV Reader for Java that will handle the mess of this for you, http://opencsv.sourceforge.net/
It has a maven package if your project is maven, else you can download the JARs there.
If every CSV value is wrapped in quotation marks, you can do:
csvRead[5].substring(1, csvRead[5].length()-1)
That will remove the first and last character of that particular string. You then need to store the results somewhere or print it out.
It is also important to check whether the string starts with a double quote; otherwise the code will delete the first character of an unquoted CSV value. I do this in one of my apps, where the CSV value arrives in rowData[1] and is sometimes quoted and sometimes not, depending on the number of words in the value.
String item = (String.valueOf(rowData[1].charAt(0)).equals("\"") ? rowData[1].substring(1, rowData[1].length() - 1) : rowData[1]);
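The checks above can be combined into one helper that also copes with empty values and with a value that opens with a quote but never closes it, as in the question's output. A minimal sketch with hypothetical names (QuoteSketch, stripQuotes); it is not a full CSV parser and does not handle escaped quotes inside a value.

```java
final class QuoteSketch {

    // Removes one surrounding pair of double quotes, or a lone leading quote.
    static String stripQuotes(String value) {
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            return value.substring(1, value.length() - 1);
        }
        if (value.startsWith("\"")) {
            return value.substring(1); // unbalanced: only the opening quote is present
        }
        return value; // unquoted or empty values are returned unchanged
    }

    public static void main(String[] args) {
        System.out.println(stripQuotes("\"2014-04-01\""));
        System.out.println(stripQuotes("\"USI INSURANCE SERVICES"));
        System.out.println(stripQuotes("AGENCY_NAME"));
    }
}
```

Unlike replace("\"", ""), this leaves any quotes in the middle of a value alone.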
