Java split CSV with two patterns [duplicate] - java

I have a CSV in this format, and I need to split each line into 16 parts, according to the text fields:
"MY TEXT","EVER, OK",,,,,,,,,,"The, CARLO","DO","ALFA","OME, GA",,
This is what I have, but it doesn't work when a quoted field contains a comma, as in "OME, GA".
String[] lineToMap = line.split(",");
How can I handle these cases?
Solution
I discovered this library: OpenCSV. In my tests it works very well, but I have a problem with the encoding: characters like ü and ö show up as strange symbols (� instead of ü). I'm working with languages such as German and French. I tried UTF-8 but it didn't work. Any idea how to fix this?
CSVReader reader = new CSVReader(
        new InputStreamReader(new FileInputStream("data/test.csv"), "UTF-8"),
        ',', '"', 0);
// Read the CSV line by line and use the string array as you want
String[] nextLine;
try {
    while ((nextLine = reader.readNext()) != null) {
        for (String element : nextLine) {
            System.out.println(element);
        }
    }
} catch (IOException e) {
    e.printStackTrace();
}
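One possible cause of the � symbols, assuming the file was actually saved as ISO-8859-1 or Windows-1252 rather than UTF-8, is that the bytes are being decoded with the wrong charset. The sketch below uses only the standard library (the class and method names are made up for illustration) to show how the same byte decodes differently:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetDemo {
    // Decodes raw bytes with the given charset; malformed input becomes U+FFFD.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // 0xFC is u-umlaut (U+00FC) in ISO-8859-1, but is not valid UTF-8 on its own.
        byte[] latin1Bytes = { (byte) 0xFC };
        // Decoded as UTF-8, the byte turns into the replacement character U+FFFD.
        System.out.println(decode(latin1Bytes, StandardCharsets.UTF_8));
        // Decoded as ISO-8859-1, it is the intended u-umlaut.
        System.out.println(decode(latin1Bytes, StandardCharsets.ISO_8859_1));
    }
}
```

If that is what is happening, passing "ISO-8859-1" or "windows-1252" to the InputStreamReader instead of "UTF-8" may be worth trying. The console's own encoding can also garble output, so checking the decoded strings in a debugger is more reliable than looking at printed text.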

Try splitting on "," (including the quotes) instead, and handle the edge cases at the beginning and end of the array by hand.
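If you would rather not use a library, another option is a small hand-written splitter that tracks whether it is inside quotes. This is only a sketch with hypothetical names (QuotedCsvSplit, splitLine); it assumes one record per line and does not handle quoted fields that span several lines:

```java
import java.util.ArrayList;
import java.util.List;

class QuotedCsvSplit {
    // Splits one CSV line on commas that are outside double quotes.
    // Surrounding quotes are dropped; "" inside a quoted field becomes ".
    static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // doubled quote inside a quoted field
                    i++;
                } else {
                    inQuotes = !inQuotes; // opening or closing quote
                }
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString()); // field boundary
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field, possibly empty
        return fields;
    }

    public static void main(String[] args) {
        for (String field : splitLine("\"DO\",\"OME, GA\",,")) {
            System.out.println("[" + field + "]");
        }
    }
}
```

Unlike String.split with its default limit, this keeps empty fields at the end of the line, which matters when the number of columns is fixed.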

Related

How to output an string of ArrayList into a new file?

Hi guys, I have this sample text file in which the names of the people are stuck together without any spacing between them. Is it possible to read it with a BufferedReader, store the contents in an ArrayList of strings, and then separate the strings by name?
Text file details:
charles_luiharry_pinkertonarlene_purcellwayne_casanova
My code:
try {
    BufferedReader in = new BufferedReader(new FileReader(filename));
    String str;
    List<String> list = new ArrayList<String>();
    while ((str = in.readLine()) != null) {
        list.add(str);
    }
    String[] stringArr = list.toArray(new String[0]);
    FileWriter writer = new FileWriter("new_users.txt");
    for (String ss : list) {
        writer.write(ss);
    }
    writer.close();
} catch (IOException ioe) {
    ioe.printStackTrace();
}
Expected output :
charles_lui
harry_pinkerton
arlene_purcell
wayne_casanova
Real output:
A duplicate of the sample file.
Just add a line separator to your writer:
writer.write(ss);
writer.write(System.lineSeparator());
If that causes problems on your OS, use System.getProperty("line.separator") instead.
BufferedReader.readLine() reads and returns one line of input, where a line ends with \n, \r, or \r\n. It cannot detect the boundaries between the names in your input file. It is better to prepare the input so that each name is on its own line.
It's difficult even for a human to tell where one person's last name ends and the next person's first name begins, so how can you expect a computer to do it?
Proposed solution:
Modify the sample file and add a separator (say ';') between names.
Build one long string by concatenating all the lines in the file, removing '\n' or '\r\n' from the end of each line. (Optional: use a StringBuilder for performance.)
Split that string into an array of valid names.
This can be done by calling split(";") on the long string, with the separator as the argument.
Then print from the array.
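The steps above could look roughly like this. It is a sketch that assumes the input file has already been edited to contain ';' between names; the class and method names are hypothetical, and the lines are passed in as a list (as readLine() would return them, without line terminators) so the example runs without a file:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NameSplitter {
    // Joins all lines into one string, then splits it on the ';' separator.
    static List<String> splitNames(List<String> lines) {
        StringBuilder joined = new StringBuilder();
        for (String line : lines) {
            joined.append(line); // readLine() has already removed \n or \r\n
        }
        List<String> names = new ArrayList<>();
        for (String name : joined.toString().split(";")) {
            if (!name.isEmpty()) { // skip empty pieces from doubled separators
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "charles_lui;harry_pinkerton;",
                "arlene_purcell;wayne_casanova");
        for (String name : splitNames(lines)) {
            System.out.println(name); // one name per line
        }
    }
}
```

When writing to a file instead of the console, each name would be followed by System.lineSeparator(), as suggested in the other answer.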

Java string splitting, not working the way I envisioned it to be

I'm trying to make a fan Java program for a game for a few of my friends. I want to read the contents of a text file and eventually store them in an array or ArrayList, but I can't get String.split to work the way I hoped it would. I tried examples from this site that worked for other people, just to see if they would work, but I get the same output.
importCards.java
BufferedReader in = null;
try {
    in = new BufferedReader(new FileReader("dark.txt"));
    String read = null;
    while ((read = in.readLine()) != null) {
        read = in.readLine();
        String[] splited = read.split("||");
        for (String part : splited) {
            System.out.println(part);
        }
    }
} catch (IOException e) {
    System.out.println("There was a problem: " + e);
    e.printStackTrace();
} finally {
    try {
        in.close();
    } catch (Exception e) {
    }
}
The text file is formatted as follows
17||Dark Soul Endor||Dark||2||1||Human||Main Characters||5|500000||833||126||78||23||Release of Spirit - Dark||Dissolve all Light Runestones to inflict Dark on all enemies||Power of Dark||Dark Attack x 150%
However when I tried and printed it I got this
1
8
|
|
D
a
r
k
and so on
Regular expressions have special tokens such as |, which does not mean a literal | but OR. So "||" means "" OR "" OR "". When you split on the empty string, each character ends up in its own string.
What you intended was
String[] splited = read.split("\\|\\|");
You might be wondering why it is \\| and not \|. The reason is that \ has a special meaning in both Java string literals and regular expressions. When you write "\\|" in Java source, the resulting string is \|, i.e. two characters, which as a regular expression matches a literal | instead of acting as a special token.
BTW, I suggest you use , or \t (tab) as the separator instead. The file will not only be smaller, but you will also be able to edit it in your favourite spreadsheet editor such as Excel or LibreOffice Calc. This makes it much easier to see where the columns are, and even to add or remove a column or change their order.
Diablo II used , for its raw data files. ;)
If you read CSV or TSV files, there are libraries that make reading and importing easier, such as OpenCSV's CSVReader.
split() takes a regex, and | is a regex special char which means "or", so you're splitting by "empty string" or "empty string" or "empty string".
You need to escape them: "\\|\\|".
You need to escape || as \\|\\|
Eg:
String str = "17||Dark Soul Endor||Dark||2||1||Human||Main Characters||5|500000||833||126||78||23||Release of Spirit - Dark||Dissolve all Light Runestones to inflict Dark on all enemies||Power of Dark||Dark Attack x 150%";
String[] splited = str.split("\\|\\|");
for(String i:splited){
System.out.println(i);
}
Remove read = in.readLine(); because it makes you read two lines per loop iteration, and change the split to:
String[] splited = read.split("\\|\\|");
because | is a special character in regex.
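Putting both fixes together, a corrected version might look like the sketch below. Instead of escaping by hand it uses Pattern.quote, a standard-library alternative not mentioned in the answers, which turns "||" into a pattern that matches it literally. The names PipeSplit, splitRecord and readRecords are made up for this example, and a StringReader stands in for dark.txt:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class PipeSplit {
    // Splits on the literal two-character separator "||".
    static String[] splitRecord(String line) {
        return line.split(Pattern.quote("||"));
    }

    // Reads every line exactly once (a single readLine() per iteration).
    static List<String[]> readRecords(BufferedReader in) {
        List<String[]> records = new ArrayList<>();
        try {
            String line;
            while ((line = in.readLine()) != null) {
                records.add(splitRecord(line));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        String sample = "17||Dark Soul Endor||Dark||2\n18||Another Card||Light||3";
        for (String[] record : readRecords(new BufferedReader(new StringReader(sample)))) {
            System.out.println(String.join(" / ", record));
        }
    }
}
```

A single | inside a field, as in 5|500000 from the sample data, is left untouched because only the full "||" sequence is treated as a separator.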

Java split giving opposite order of arabic characters

I am splitting the following string using \\| in java (android) using the IntelliJ 12 IDE.
Everything is fine except the last part; somehow the split picks up those characters in the opposite order:
As you can see, the real positions 34, 35, 36 are correct and match the string, but when they get picked out into split part no. 5 they are in the wrong order: 36, 35, 34 ...
Is there any way I can get them in the right order?
My Code:
public ArrayList<Book> getBooksFromDatFile(Context context, String fileName)
{
    ArrayList<Book> books = new ArrayList<Book>();
    try
    {
        // load csv from assets
        InputStream is = context.getAssets().open(fileName);
        try
        {
            BufferedReader reader = new BufferedReader(new InputStreamReader(is));
            String line;
            while ((line = reader.readLine()) != null)
            {
                String[] RowData = line.split("\\|");
                books.add(new Book(RowData[0], RowData[1], RowData[2], RowData[3], RowData[4], RowData[5]));
            }
        }
        catch (IOException ex)
        {
            Log.e(TAG, "Error parsing csv file!");
        }
        finally
        {
            try
            {
                is.close();
            }
            catch (IOException e)
            {
                Log.e(TAG, "Error closing input stream!");
            }
        }
    }
    catch (IOException ex)
    {
        Log.e(TAG, "Error reading .dat file from assets!");
    }
    return books;
}
The characters in the String should always be in linguistic order, regardless of whether they're right-to-left or left-to-right characters. So we should see [34] = '١', [35] = '-', [36] = '٧'. It is up to rendering engines to display them using the correct right-to-left or left-to-right layout.
In the Unicode world, there are strong and weak characters. Examples of characters without strong directionality include:
"\\", "/", "+", "-", "=", ";", "$"
They are called "weak" characters because they do not carry any directional information of their own, so it is up to the software to decide in which direction they are placed.
To fix this issue you need to set the directional formatting explicitly, for example:
RightToLeftEmbedding + weakCharacter + PopDirectionalFormatting
using these constant values:
char RightToLeftEmbedding = (char)0x202B;
char PopDirectionalFormatting = (char)0x202C;
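As a minimal sketch of that idea, the helper below wraps a piece of text in the two control characters. The class and method names are hypothetical, and whether the result is displayed as intended depends on the rendering engine, which this example does not test:

```java
class BidiWrap {
    static final char RIGHT_TO_LEFT_EMBEDDING = (char) 0x202B;
    static final char POP_DIRECTIONAL_FORMATTING = (char) 0x202C;

    // Surrounds the text with RLE ... PDF so a renderer treats it as right-to-left.
    static String wrapRtl(String text) {
        return RIGHT_TO_LEFT_EMBEDDING + text + POP_DIRECTIONAL_FORMATTING;
    }

    public static void main(String[] args) {
        String wrapped = wrapRtl("-");
        // The stored (logical) order is unchanged: RLE, '-', PDF.
        for (int i = 0; i < wrapped.length(); i++) {
            System.out.printf("[%d] = U+%04X%n", i, (int) wrapped.charAt(i));
        }
    }
}
```

The wrapping only adds invisible control characters; the characters of the original text stay at the same relative positions in the String.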

Splitting String in case of a new line [duplicate]

I have a method in Java which returns the updated lines of a flat file as a String. I receive the String as a bunch of lines, and what I want to do is separate it line by line. How can I do that in Java?
I suggest using the system property line.separator for the split, to make your code work on any platform, e.g. *nix, Windows, Mac. Note that this only splits correctly when the string uses the platform's own line separator. Consider code like this:
String str = "line1\nline2\nline3";
String eol = System.getProperty("line.separator");
System.out.printf("After Split - %s%n", Arrays.toString(str.split(eol)));
String lines[] = fileString.split("\\r?\\n"); //should handle unix or win newlines
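A small runnable sketch of the "\\r?\\n" approach follows; the class and method names are made up for illustration. On Java 8 and later the regex "\\R" is another option that also matches a lone \r and some Unicode line breaks.

```java
import java.util.Arrays;

class LineSplit {
    // Splits on Unix (\n) or Windows (\r\n) line endings, regardless of platform.
    static String[] splitLines(String text) {
        return text.split("\\r?\\n");
    }

    public static void main(String[] args) {
        String mixed = "line1\nline2\r\nline3";
        System.out.println(Arrays.toString(splitLines(mixed)));
    }
}
```

Because split drops trailing empty strings by default, a final newline at the end of the text does not produce an extra empty element.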
I am not sure about splitting on a single separator string,
but I roughly wrote a method that may be relevant to you.
public void splitter (DocumentEvent e) {
    String split[], docStr = null;
    Document textAreaDoc = (Document)e.getDocument();
    try {
        docStr = textAreaDoc.getText(textAreaDoc.getStartPosition().getOffset(), textAreaDoc.getEndPosition().getOffset());
    } catch (BadLocationException e1) {
        e1.printStackTrace();
    }
    split = docStr.split("\\n");
}

Android/Java text file read problem [duplicate]

I'm currently trying to read a file from res/raw by using an InputStream that I initialize like this:
InputStream mStream = this.getResources().openRawResource(R.raw.my_text_file_utf_8);
I then pass that into this method to return the values:
public List<String> getWords(InputStream aFile) {
    List<String> contents = new ArrayList<String>();
    try {
        BufferedReader input = new BufferedReader(new InputStreamReader(aFile));
        try {
            String line = new String(); // not declared within while loop
            while ((line = input.readLine()) != null) {
                contents.add(line);
            }
        }
        finally {
            input.close();
        }
    }
    catch (IOException ex) {
        ex.printStackTrace();
    }
    return contents;
}
My problem: it reads all the values as it should, but if the file is, say, 104 lines long, it actually returns something like 134 lines in total, with the remaining 30 lines being null.
Have checked: I'm already using UTF-8, and I double-checked that there are no blank lines in the document itself.
I thought that, the way the while loop is written, it couldn't add a null line to the contents list. Am I missing something here?
Thanks for any constructive information! I'm pretty sure I'm overlooking something simple here, though...
Why don't you create HTML for your information and then parse it?
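For comparison with the getWords method in the question, here is a minimal sketch of the same read loop with an explicit UTF-8 charset and try-with-resources. The class and method names are hypothetical, and a ByteArrayInputStream stands in for the Android raw resource. With this loop shape a null can never be added to the list, so if nulls still show up they are likely introduced somewhere after the method returns; that is a guess, since the calling code is not shown.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class RawLines {
    // Reads all lines as UTF-8; the loop stops at the first null, so none is stored.
    static List<String> readLines(InputStream in) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] data = "one\ntwo\nthree\n".getBytes(StandardCharsets.UTF_8);
        List<String> lines = readLines(new ByteArrayInputStream(data));
        System.out.println(lines.size() + " lines, contains null: " + lines.contains(null));
    }
}
```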
