Struggling to parse different text files based on their delimiters - java

I've been working on this on and off today.
Here is my method, which basically needs to accept a .data (txt) file location, and then go through the contents of that text file and break it up into strings based on the delimiters present. These are the 2 files.
The person file.
Person ID,First Name,Last Name,Street,City
1,Ola,Hansen,Timoteivn,Sandnes
2,Tove,Svendson,Borgvn,Stavanger
3,Kari,Pettersen,Storgt,Stavanger
The order file.
Order ID|Order Number|Person ID
10|2000|1
11|2001|2
12|2002|1
13|2003|10
public static void openFile(String url) {
//initialize array for data to be held
String[][] myStringArray = new String[10][10];
int row = 0;
try {
//open the file
FileInputStream fstream = new FileInputStream(url);
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
String strLine;
//Read File Line By Line
while ((strLine = br.readLine()) != null) {
//ignores any blank entries
if (!"".equals(strLine)) {
//splits by comma(\\| for order) and places individually into array
String[] splitStr = new String[5];
//splitStr = strLine.split("\\|");
/*
* This is the part that i am struggling with getting to work.
*/
if (strLine.contains("\\|")) {
splitStr = strLine.split("\\|");
} else if (strLine.contains(",")) {
splitStr = strLine.split(",");
}else{
System.out.println("error no delimiter detected");
}
for (int i = 0; i < splitStr.length; i++) {
myStringArray[row][i] = splitStr[i];
System.out.println(myStringArray[row][i]);
}
}
}
//Close the input stream
br.close();
} catch (FileNotFoundException ex) {
Logger.getLogger(Client.class.getName()).log(Level.SEVERE, null, ex);
} catch (IOException ex) {
Logger.getLogger(Client.class.getName()).log(Level.SEVERE, null, ex);
}
}
The person file is read and parsed correctly, but the order file with the "|" delimiter is not: I just get 'null' printouts.
What confuses me is that when I only have splitStr = strLine.split("\\|"); it works, but I need this method to detect which delimiter is present and then apply the correct split.
Any help will be much appreciated.

Apart from the fact that this should be done using a CSV library, the reason this code is failing is that contains() doesn't accept a regular expression. It matches its argument literally, so "\\|" looks for a backslash followed by a pipe. Remove the escape characters so the pipe character can be detected:
if (strLine.contains("|")) {
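Putting the two halves together, here is a minimal sketch of the detect-then-split step. The class name DelimiterSplitter, the method splitLine, and the list of candidate delimiters are assumptions made for illustration; they are not part of the original code. It relies on contains() matching literally and uses Pattern.quote() so that split(), which does take a regex, also treats the delimiter literally.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original code.
class DelimiterSplitter {
    // Candidate delimiters, checked in order (an assumption for this sketch).
    private static final String[] CANDIDATES = {"|", ","};

    // Returns the fields of a line, or an empty array if no known delimiter is present.
    static String[] splitLine(String line) {
        for (String delimiter : CANDIDATES) {
            // contains() does a literal match, so no escaping is needed here.
            if (line.contains(delimiter)) {
                // split() takes a regex; Pattern.quote() makes the delimiter literal.
                // The -1 limit keeps trailing empty fields.
                return line.split(Pattern.quote(delimiter), -1);
            }
        }
        return new String[0];
    }
}
```

Inside the read loop this would replace the if/else chain, for example String[] splitStr = DelimiterSplitter.splitLine(strLine);. A line with no delimiter then produces an empty array instead of an array of nulls.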

Related

Java ignore a specific symbol after another specific symbol

I have a .csv file. The data is separated by commas and I need to extract information from this file. If I just write the following, it works, but only partially:
String file = "FinalProject/src/Data.csv";
BufferedReader rd = null;
String line = "";
HashSet<String> platforms = new HashSet<String>();
try
{
rd = new BufferedReader(new FileReader(file));
rd.readLine();
while ((line = rd.readLine())!=null)
{
String [] arr = line.split("\"");
var words = new ArrayList<String>();
for(int i =0; i < arr.length;i++)
{
if(i % 2 == 0)
{
words.addAll(Arrays.asList(arr[i].split(",")));
}
else
{
words.add(arr[i]);
}
platforms.add(words.get(2));
}
}
}
catch (Exception e)
{
System.out.println("");
}
finally
{
try
{
rd.close();
}
catch (IOException e)
{
throw new RuntimeException(e);
}
}
When I compare the contents of the Set with the same data extracted from the database created from this .csv file, they differ. For example, my set has 38 values while the database has 40, all of them unique (nothing is repeated). I think the problem is caused by the commas that separate the data in the .csv file: some of them are inside quotes, and this probably causes the loss of values that I need. Is there a solution to this problem? Or perhaps there is a more efficient way to deal with the commas inside quotes so that they are ignored?
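One possible way to ignore commas inside quotes is to walk the line character by character and track whether the current position is inside a quoted field. The sketch below is only an illustration under stated assumptions: the class name QuotedCsvLine and the method split are hypothetical, a doubled quote ("") inside a quoted field is taken to mean a literal quote, and quoted fields that span several lines are not handled. A CSV library would cover those cases.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits one CSV line, ignoring commas inside double quotes.
class QuotedCsvLine {
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    // A doubled quote inside a quoted field is a literal quote.
                    current.append('"');
                    i++;
                } else {
                    // Opening or closing quote: toggle the state, do not store it.
                    inQuotes = !inQuotes;
                }
            } else if (c == ',' && !inQuotes) {
                // An unquoted comma ends the current field.
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        // The last field has no trailing comma.
        fields.add(current.toString());
        return fields;
    }
}
```

With this, the loop body could become platforms.add(QuotedCsvLine.split(line).get(2)); so that each row contributes its third column exactly once.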

Retrieve data from txt file and replace the data into a new txt file using java

I am trying to read in a text file, manipulate it a little, and write the updated records to a new text file.
Here is what I have so far:
ArrayList<String> linesList = new ArrayList<>();
BufferedReader br;
String empid, email;
String[] data;
try {
String line;
br = new BufferedReader(new FileReader("file.txt"));
while ((line = br.readLine()) !=null) {
linesList.add(line);
}
br.close();
}
catch (IOException e) { e.printStackTrace(); }
for (int i = 0; i < linesList.size(); i++) {
data = linesList.get(i).split(",");
empid = data[0];
ccode = data[3];
}
File tempFile = new File("File2.txt");
BufferedWriter bw = new BufferedWriter(new FileWriter(tempFile));
for (int i = 0; i < linesList.size(); i++) {
if(i==0){
bw.write(linesList.get(i));
bw.newLine();
}
else{
data = linesList.get(i).split(",");
String empid1 = data[0];
if(data[13].equals("IND")) {
String replace = data[3].replaceAll("IND", "IN");
ccode1 = replace;
System.out.println(ccode1);
}
else if(data[13].equals("USA")) {
String replace = data[3].replaceAll("USA", "US");
ccode1 = replace;
}
else {
ccode1 = replace; //This does not work as replace is not defined here, but how can I get it to work here.
}
String newData=empid1+","+ccode1;
bw.write(newData);
bw.newLine();
}
}
Here is what is inside the text file:
EID,First,Last,Country
1,John,Smith,USA
2,Jane,Smith,IND
3,John,Adams,USA
So, what I need help with is editing the three letter country code and replacing it with a 2 letter country code. For example: USA would become US, and IND would become IN. I am able to read in the country code, but am having trouble in changing the value and then replacing the changed value back into a different text file. Any help is appreciated. Thanks in advance.
Open the file in a text editor and use Search and Replace: ,USA with ,US, ,IND with ,IN, and so on.
To automate it, do the replacement in the same while loop where you read each line (strings are immutable, so keep the returned value):
//while(read){ line = line.replaceAll(",USA", ",US");
That will be the easiest way to complete your objective.
To save, open a BufferedWriter bw just like you opened a reader and use bw.write(). You would probably prefer to open both at the same time: the reader on your source file, and the writer on a new file with an _out suffix. That way you don't need to keep the file data in memory; you can read and write as you loop.
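The read-and-write-in-one-loop idea could look roughly like the sketch below. The class name CountryCodeRewriter and both method names are hypothetical. As a deliberate variation on the replaceAll call above, it only rewrites the code when it is the last field of the line, which assumes the country is always the final column, as in the sample file.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: copies a file line by line, shortening the country code.
class CountryCodeRewriter {
    // Rewrites a trailing ",USA" to ",US" and a trailing ",IND" to ",IN".
    // Assumes the country code is the last field on the line.
    static String shortenCountry(String line) {
        if (line.endsWith(",USA")) {
            return line.substring(0, line.length() - 3) + "US";
        }
        if (line.endsWith(",IND")) {
            return line.substring(0, line.length() - 3) + "IN";
        }
        return line; // header and unknown codes pass through unchanged
    }

    // Reads source and writes target in the same loop, so the file
    // contents never have to be held in memory.
    static void rewrite(Path source, Path target) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(shortenCountry(line));
                writer.newLine();
            }
        }
    }
}
```

The try-with-resources block closes both streams even if an exception is thrown, which the manual close() calls in the question do not guarantee.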
For harder ways, read the csv specs: https://www.rfc-editor.org/rfc/rfc4180#section-2
Notice that you have to account for the possibility of fields being enclosed in quotes, like "1","John","Smith","USA", which means you also have to replace ,\"USA with ,\"US.
The delimiter may or may not be a comma; you have to make sure your input always uses the same delimiter, or that you can detect it and switch at runtime.
You have to account for the case where a delimiter may be part of a field, or where quotes are part of a field.
Once you know about these issues and can solve them, you can, instead of using replace, parse the lines character by character using while( (/*int*/ c = br.read()) != -1), and do the replacement manually with if statements.
/*while(read)*/
if( c == delimiter ){
// if not inside a quoted field, start the next field; else add it to the field value
} else if( c == quote ){
// if the field value is empty, ignore it and expect a closing quote;
// else if no quote escape is marked, mark it; else add the quote to the field value
}
// (...)
else if( c == 13 || c == 10 ){
// finished the line: check the last field of the row just read and replace the data
}
To make it better/harder, define a parsing state machine, put the states into an Enum, and write the if gates with them in mind (this will make your code be more like a compiler parser).
You can find parsing code at different stages here: https://www.mkyong.com/java/how-to-read-and-parse-csv-file-in-java/
You need to change your approach a little. If you want to edit a file, create a new file, write the content to it, delete the old file, and rename the new file to the old name.
ArrayList<String> linesList = new ArrayList<>();
BufferedReader br;
String[] data;
File original=new File("D:\\abc\\file.txt");
try {
String line;
br = new BufferedReader(new FileReader(original));
while ((line = br.readLine()) !=null) {
linesList.add(line);
}
br.close();
}
catch (IOException e) { e.printStackTrace(); }
File tempFile = new File("D:\\abc\\tempfile.txt");
BufferedWriter bw = new BufferedWriter(new FileWriter(tempFile));
for (int i = 0; i < linesList.size(); i++) {
if(i==0){
bw.write(linesList.get(i));
bw.newLine();
}
else{
data = linesList.get(i).split(",");
String empid = data[0];
String name=data[1];
String lname=data[2];
String ccode = data[3].substring(0, 2);
// bw.newLine() below terminates the line, so no "\n" is appended here
String newData=empid+","+name+","+lname+","+ccode;
bw.write(newData);
bw.newLine();
}
}
bw.close();
if (!original.delete()) {
System.out.println("Could not delete file");
return;
}
// Rename the new file to the filename the original file had.
if (!tempFile.renameTo(original))
System.out.println("Could not rename file");

Read file, one line at a time and run code

I have a file with text in this format:
text:text2:text3
text4:text5:text6
text7:text8:text9
Now what I want to do is read the first line, split it at the ":" characters, and save the 3 strings into different variables. Those variables are then used as parameters for a method, before the program reads the next line and does the same thing over and over again. So far I've got this:
public static void main(String[] args) {
BufferedReader reader = null;
try {
File file = new File("C://Users//Patrick//Desktop//textfile.txt");
reader = new BufferedReader(new FileReader(file));
String line;
while ((line = reader.readLine()) != null) {
System.out.println(line);
}
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
reader.close();
} catch (IOException e) {
e.printStackTrace();
}
}
Also, I've tried this for the separation (although I'm not sure an array is the best option):
String[] strArr = sCurrentLine.split("\\:");
Use String[] parts = line.split(":"); to get an array with text, text2 etc. You can then loop through parts and call the method you want with each item in the array.
The escape in your original split is unnecessary, because : is not a special character in regex; split("\\:") and split(":") behave the same. You only need an escape character when the delimiter you are splitting on is a special character, such as | or a dot.
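As a rough sketch of the whole loop, assuming every valid line has exactly three colon-separated values: the class ColonLineProcessor and the method handle are hypothetical stand-ins for your own code, and handle simply joins its arguments so that there is something to observe.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: split each line at ":" and pass the parts to a method.
class ColonLineProcessor {
    // Stand-in for the real method that needs the three values.
    static String handle(String first, String second, String third) {
        return first + " / " + second + " / " + third;
    }

    // Processes every line and collects what handle() returned.
    static List<String> processAll(List<String> lines) {
        List<String> results = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.split(":");
            if (parts.length != 3) {
                continue; // skip blank or malformed lines
            }
            results.add(handle(parts[0], parts[1], parts[2]));
        }
        return results;
    }
}
```

In the reader loop from the question, the same three lines (split, length check, call) can go where System.out.println(line) is now.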

Split file into multiple files

I want to split a text file into pieces of 50 lines each.
For example, if the file has 1010 lines, I would get 21 files.
I know how to count the number of files and the number of lines, but as soon as I write, it doesn't work.
I use Camel Simple (Talend), but it's Java code.
private void ExtractOrderFromBAC02(ProducerTemplate producerTemplate, InputStream content, String endpoint, String fileName, HashMap<String, Object> headers){
ArrayList<String> list = new ArrayList<String>();
BufferedReader br = new BufferedReader(new InputStreamReader(content));
String line;
long numSplits = 50;
int sourcesize=0;
int nof=0;
int number = 800;
try {
while((line = br.readLine()) != null){
sourcesize++;
list.add(line);
}
System.out.println("Lines in the file: " + sourcesize);
double numberFiles = (sourcesize/numSplits);
int numberFiles1=(int)numberFiles;
if(sourcesize<=50) {
nof=1;
}
else {
nof=numberFiles1+1;
}
System.out.println("No. of files to be generated :"+nof);
for (int j=1;j<=nof;j++) {
number++;
String Filename = ""+ number;
System.out.println(Filename);
StringBuilder builder = new StringBuilder();
for (String value : list) {
builder.append("/n"+value);
}
producerTemplate.sendBodyAndHeader(endpoint, builder.toString(), "CamelFileName",Filename);
}
}
} catch (IOException e) {
e.printStackTrace();
}
finally{
try {
if(br != null)br.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}
For people who don't know Camel, this line is used to send the file:
producerTemplate.sendBodyAndHeader(endpoint, line.toString(), "CamelFileName", Filename);
endpoint ==> Destination (it's ok with another code)
line.toString () ==> Values
And then the file name (it's ok with another code)
You count the lines first:
while((line = br.readLine()) != null){
sourcesize++; }
and then you're at the end of the file, so the second loop reads nothing:
for (int i=1;i<=numSplits;i++) {
while((line = br.readLine()) != null){
You have to seek back to the start of the file before reading again.
But that's a waste of time and power, because you'll read the file twice.
It's better to read the file once and for all, put it in a List<String> (resizable), and proceed with your split using the lines stored in memory.
EDIT: it seems that you followed my advice and stumbled on the next issue. You should maybe have asked another question, but anyway: this creates a buffer with all the lines, so every file gets the whole content.
for (String value : list) {
builder.append("/n"+value);
}
You have to use indexes on the list to build the small files:
for (int k = 0; k < numSplits && current_line < list.size(); k++) {
builder.append("\n" + list.get(current_line++));
}
current_line being the global line counter in your file. That way each file gets its own 50 lines :)
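The same indexing idea can be written without a manual counter by slicing the list. This is a minimal sketch, not the Camel route itself: LineChunker and chunk are hypothetical names, and each returned string would be the body passed to producerTemplate.sendBodyAndHeader.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: groups lines into bodies of at most linesPerFile lines.
class LineChunker {
    static List<String> chunk(List<String> lines, int linesPerFile) {
        if (linesPerFile <= 0) {
            throw new IllegalArgumentException("linesPerFile must be positive");
        }
        List<String> bodies = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += linesPerFile) {
            // The last chunk may be shorter than linesPerFile.
            int end = Math.min(start + linesPerFile, lines.size());
            bodies.add(String.join("\n", lines.subList(start, end)));
        }
        return bodies;
    }
}
```

For 1010 lines and a chunk size of 50 this yields 21 bodies, the last one holding 10 lines. It also avoids the extra empty file that nof=numberFiles1+1 would produce when the line count is an exact multiple of 50.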

Get the offset of previous line in a file

I'm extracting data from a file line by line into a database, and I can't figure out a proper way to flag lines that I've already read into my database.
I use the following code to iterate through the file lines. I try to verify that the line has my flag, and otherwise I try to append the flag to the line:
List<String> fileLines = new ArrayList<String>();
File logFile = new File("C:\\MyStuff\\SyslogCatchAllCopy.txt");
try {
RandomAccessFile raf = new RandomAccessFile(logFile, "rw");
String line = "";
String doneReadingFlag = "##";
Scanner fileScanner = new Scanner(logFile);
while ((line = raf.readLine()) != null && !line.contains(doneReading)) {
Scanner s = new Scanner(line);
String temp = "";
if (!s.hasNext(doneReadingFlag)) {
fileLines.add(line);
raf.write(doneReadingFlag.getBytes(), (int) raf.getFilePointer(),
doneReadingFlag.getBytes().length);
} else {
System.err.println("Allready Red");
}
}
} catch (FileNotFoundException e) {
System.out.println("File not found" + e);
} catch (IOException e) {
System.out.println("Exception while reading the file ");
}
// return fileLines;
// MoreProccessing(fileLines);
This code appends the flag to the next line instead, and it overwrites the characters in that position.
Any help?
When you write to a file, it doesn't insert, so you should expect it to replace the characters.
You need to reserve space in the file for the information you want to change, or you can add the information to another file.
Or, instead of marking each line, you can store somewhere the line number (or better, the character position) you have read up to.
If you are not restarting your process, you can have the process read the file as it is appended (meaning you might not need to store where you are up to anywhere).
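The "store the position you have read up to" option could be sketched like this. The class LogTail and its members are hypothetical, and the sketch keeps the offset in memory only; persisting it to a side file between runs is left out. It leaves the log file itself untouched.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: remembers how far a file has been read instead of flagging lines.
class LogTail {
    final List<String> lines = new ArrayList<>();
    long offset; // byte position where the next read should start

    // Reads every line after the stored offset and advances the offset past them.
    void poll(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(offset);
            String line;
            while ((line = raf.readLine()) != null) {
                lines.add(line);
                offset = raf.getFilePointer();
            }
        }
    }
}
```

Calling poll again after the file has grown only picks up the newly appended lines, because reading resumes at the saved byte offset.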
@Peter Lawrey I did as you said, and it worked for me as follows:
ArrayList<String> fileLines=new ArrayList<String>();
File logFile=new File("C:\\MyStuff\\MyFile.txt");
RandomAccessFile raf = new RandomAccessFile(logFile, "rw");
String line="";
String doneReadingFlag="#";
long oldOffset=raf.getFilePointer();
long newOffset=oldOffset;
while ((line=raf.readLine())!=null)
{
newOffset=raf.getFilePointer();
if(!line.contains(doneReadingFlag))
{
fileLines.add(line);
raf.seek((long)oldOffset);
// writeBytes writes one byte per character; writeChars would write two
raf.writeBytes(doneReadingFlag);
raf.seek(newOffset);
System.out.println("Line added and flaged");
}
else
{
System.err.println("Already Red");
}
oldOffset=newOffset;
}
