How Do I Split A String By Line Break? [duplicate] - java

I'm new to Android development and am trying to split a string at each of its line breaks. The string I'm trying to split is built from a database query like this:
public String getCoin() {
// TODO Auto-generated method stub
String[] columns = new String[]{ KEY_ROWID, KEY_NAME, KEY_QUANTITY, KEY_OUNCES, KEY_VALUE };
Cursor c = ourDatabase.query(DATABASE_TABLE, columns, null, null, null, null, null);
String result = "";
int iRow = c.getColumnIndex(KEY_ROWID);
int iName = c.getColumnIndex(KEY_NAME);
int iQuantity = c.getColumnIndex(KEY_QUANTITY);
int iOunces = c.getColumnIndex(KEY_OUNCES);
int iValue = c.getColumnIndex(KEY_VALUE);
for (c.moveToFirst(); !c.isAfterLast(); c.moveToNext()){
result = result + /*c.getString(iRow) + " " +*/ c.getString(iName).substring(0, Math.min(18, c.getString(iName).length())) + "\n";
}
c.close();
return result;
}
The string returned by getCoin() reads like this:
alpha
bravo
charlie
I want to split the string at the line break and place each substring into a String Array. This is my current code:
String[] separated = result.split("\n");
for (int i = 0; i < separated.length; i++) {
chartnames.add("$." + separated[i] + " some text" );
}
This gives me an output of:
"$.alpha
bravo
charlie some text"
instead of my desired output of:
"$.alpha some text, $.bravo some text, $.charlie some text"
Any help is greatly appreciated

You can split a string at line breaks with the following statement:
String textStr[] = yourString.split("\\r\\n|\\n|\\r");
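As a rough illustration of how that regex behaves on mixed line endings, here is a minimal sketch. The class and method names (LineSplitDemo, splitLines) are made up for this example. Note that the \r\n alternative comes first, so a Windows line break is consumed as a single separator instead of producing an empty string between \r and \n.

```java
import java.util.Arrays;

class LineSplitDemo {
    // Splits on Windows (\r\n), Unix (\n), and old Mac (\r) line breaks.
    // \r\n is tried first so that a Windows break counts as one separator.
    static String[] splitLines(String text) {
        return text.split("\\r\\n|\\n|\\r");
    }

    public static void main(String[] args) {
        String sample = "alpha\r\nbravo\ncharlie\r";
        // String.split drops trailing empty strings, so the final \r adds nothing.
        System.out.println(Arrays.toString(splitLines(sample))); // [alpha, bravo, charlie]
    }
}
```

Blank lines in the middle of the text are kept as empty array elements with this pattern; only trailing ones are dropped.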

It's a little overkill, but you can use the standard I/O classes:
BufferedReader rdr = new BufferedReader(new StringReader(result));
List<String> lines = new ArrayList<String>();
for (String line = rdr.readLine(); line != null; line = rdr.readLine()) {
lines.add(line);
}
rdr.close(); // good form to close streams, but unnecessary for StringReader
// lines now contains all the strings between line breaks of any type
The advantage of this is that BufferedReader.readLine() has all the logic worked out for detecting all sorts of line terminators.
As of Java 8, BufferedReader has a lines() method, so there's an easier way (thanks, @jaco0646):
List<String> lines = new BufferedReader(new StringReader(result))
.lines()
.collect(Collectors.toList());
or, if an array is needed instead:
String[] lines = new BufferedReader(new StringReader(result))
.lines()
.toArray(String[]::new);

Use the Apache Commons Lang helper class StringUtils.
The platform independent way:
String[] lines = StringUtils.split(string, "\r\n");
The platform dependent way. Maybe some CPU cycles faster. But I wouldn't expect it to matter.
String[] lines = StringUtils.split(string, System.lineSeparator());

If possible I would suggest using the Guava Splitter and Joiner classes in preference to String.split. But even then, it's important to make sure that you're properly escaping your regular expressions when declaring them. I'm not certain "\n" won't be properly interpreted by the regex compiler in Java, but I'm not sure it will be either.
Covering all possible line endings is tricky, since multiple consecutive EOL markers can mess up your matching. I would suggest
String [] separated = result.replaceAll("\\r", "").split("\\n");

Matcher m = Pattern.compile("(^.+$)+", Pattern.MULTILINE).matcher(fieldContents);
while (m.find()) {
System.out.println("whole group " + m.group());
}

I propose the following snippet, which is compatible with both PC and Mac line-ending styles.
String[] separated = result.replaceAll("\\r", "\n")
.replaceAll("\\n{2,}", "\n")
.split("\\n");
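One caveat with the normalize-then-split approach: in replaceAll, the replacement string treats a backslash as an escape character, so a replacement written as "\\n" produces the letter n, not a line feed. The replacement should be the plain Java newline "\n". A minimal sketch, assuming you also want blank lines dropped (the class and method names are made up for this example):

```java
import java.util.Arrays;

class NormalizeLineBreaksDemo {
    // Converts every CR to LF, collapses runs of LF, then splits on LF.
    // Side effect: blank lines in the middle of the text are removed too.
    static String[] splitNonEmptyLines(String text) {
        return text.replaceAll("\\r", "\n")      // CR -> LF (so CRLF becomes LF LF)
                   .replaceAll("\\n{2,}", "\n")  // collapse consecutive LFs into one
                   .split("\\n");
    }

    public static void main(String[] args) {
        String sample = "alpha\r\nbravo\rcharlie\n\ndelta";
        System.out.println(Arrays.toString(splitNonEmptyLines(sample)));
        // The pitfall: an escaped backslash in the replacement yields the letter n.
        System.out.println("a\rb".replaceAll("\\r", "\\n")); // anb
    }
}
```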

Related

String search breaking when wrapping .jar with JSmooth

I've got an oddball problem here. I've got a little java program that filters Minecraft log files to make them easier to read. On each line of these logs, there are usually multiple instances of the character "§", which returns a hex value of FFFD.
I am filtering out this character (as well as the character following it) using:
currentLine = currentLine.replaceAll("\uFFFD.", "");
Now, when I run the program through NetBeans, it works swell. My lines get outputted looking like this:
CxndyAnnie: Mhm
CxndyAnnie: Sorry
But when I build the .jar file and wrap it into a .exe file using JSmooth, that character no longer gets filtered out when I run the .exe, and my lines come out looking like this:
§e§7[§f$65§7] §1§nCxndyAnnie§e: Mhm
§e§7[§f$65§7] §1§nCxndyAnnie§e: Sorry
(note: the additional square brackets and $65 show up because their filtering depends on the special character and its following character being removed first)
Any ideas why this would no longer work after putting it through JSmooth? Is there a different way to do the text replace that would preserve its function?
By the way, I also attempted to remove this character using
currentLine = currentLine.replaceAll("§.", "");
but that didn't work in Netbeans nor as a .exe.
I'll go ahead and paste the full method below:
public static String[] filterLines(String[] allLines, String filterType, Boolean timeStamps) throws IOException {
String currentLine = null;
FileWriter saveFile = new FileWriter("readable.txt");
String heading;
String string1 = "[L]";
String string2 = "[A]";
String string3 = "[G]";
if (filterType.equals(string1)) {
heading = "LOCAL CHAT LOGS ONLY \r\n\r\n";
}
else if (filterType.equals(string2)) {
heading = "ADVERTISING CHAT LOGS ONLY \r\n\r\n";
}
else if (filterType.equals(string3)) {
heading = "GLOBAL CHAT LOGS ONLY \r\n\r\n";
}
else {
heading = "CHAT LINES CONTAINING \"" + filterType + "\" \r\n\r\n";
}
saveFile.write(heading);
for (int i = 0; i < allLines.length; i++) {
if ((allLines[i] != null ) && (allLines[i].contains(filterType))) {
currentLine = allLines[i];
if (!timeStamps) {
currentLine = currentLine.replaceAll("\\[..:..:..\\].", "");
}
currentLine = currentLine.replaceAll("\\[Client thread/INFO\\]:.", "");
currentLine = currentLine.replaceAll("\\[CHAT\\].", "");
currentLine = currentLine.replaceAll("\uFFFD.", "");
currentLine = currentLine.replaceAll("\\[A\\].", "");
currentLine = currentLine.replaceAll("\\[L\\].", "");
currentLine = currentLine.replaceAll("\\[G\\].", "");
currentLine = currentLine.replaceAll("\\[\\$..\\].", "");
currentLine = currentLine.replaceAll(".>", ":");
currentLine = currentLine.replaceAll("\\[\\$100\\].", "");
saveFile.write(currentLine + "\r\n");
//System.out.println(currentLine);
}
}
saveFile.close();
ProcessBuilder openFile = new ProcessBuilder("Notepad.exe", "readable.txt");
openFile.start();
return allLines;
}
FINAL EDIT
Just in case anyone stumbles across this and needs to know what finally worked, here's the snippet of code where I pull the lines from the file and re-encode it to work:
BufferedReader fileLines;
fileLines = new BufferedReader(new FileReader(file));
String[] allLines = new String[numLines];
int i=0;
while ((line = fileLines.readLine()) != null) {
byte[] bLine = line.getBytes();
String convLine = new String(bLine, Charset.forName("UTF-8"));
allLines[i] = convLine;
i++;
}
I also had a problem like this in the past with Minecraft logs. I don't remember the exact details, but it came down to a file-encoding problem: decoding as UTF-8 worked correctly, but some other text encodings, including the system default, did not.
First:
Make sure that you specify UTF-8 when converting the byte array read from the file into a String, so that allLines contains the correct text, like so:
Path fileLocation = Paths.get("C:/myFileLocation/logs.txt");
byte[] data = Files.readAllBytes(fileLocation);
String allLines = new String(data , Charset.forName("UTF-8"));
Second:
Using \uFFFD is not going to work, because \uFFFD is only used to replace an incoming character whose value is unknown or unrepresentable in Unicode.
However, if you use the correct encoding (shown in my first point), then \uFFFD is not necessary, because § is a known Unicode character, so you can simply use
currentLine = currentLine.replaceAll("§", "");
or use the Unicode escape for that character, U+00A7, like so
currentLine = currentLine.replaceAll("\u00A7", "");
or just use both of those lines in your code.
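To make the two points concrete, here is a small sketch, assuming the log file really is UTF-8 encoded. The class and method names are invented for this example. It decodes bytes with an explicit charset, then removes U+00A7 together with the one character after it, which is what the question's filter was aiming for. The section sign is written as a Unicode escape so the result does not depend on the encoding of the source file.

```java
import java.nio.charset.StandardCharsets;

class SectionSignDemo {
    // Removes formatting codes: U+00A7 followed by any single character.
    static String stripFormatting(String line) {
        return line.replaceAll("\u00A7.", "");
    }

    // Decodes raw bytes with an explicit charset instead of the platform default.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Stand-in for one line of the log file as it sits on disk.
        byte[] raw = "\u00A7e\u00A77[\u00A7f$65\u00A77] \u00A71\u00A7nCxndyAnnie\u00A7e: Mhm"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(stripFormatting(decodeUtf8(raw))); // [$65] CxndyAnnie: Mhm

        // Decoding the same bytes with a charset that cannot represent them
        // is where U+FFFD replacement characters come from.
        String wrong = new String(raw, StandardCharsets.US_ASCII);
        System.out.println(wrong.indexOf('\uFFFD') >= 0); // true
    }
}
```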

Deal with PatternSyntaxException and scanning texts

I want to find names in a collection of text documents from a huge list of about 1 million names. I'm making a Pattern from the names of the list first:
BufferedReader TSVFile = new BufferedReader(new FileReader("names.tsv"));
String dataRow = TSVFile.readLine();
dataRow = TSVFile.readLine();// skip first line (header)
String combined = "";
while (dataRow != null) {
String[] dataArray = dataRow.split("\t");
String name = dataArray[1];
combined += name.replace("\"", "") + "|";
dataRow = TSVFile.readLine(); // Read next line of data.
}
TSVFile.close();
Pattern all = Pattern.compile(combined);
After doing so I got a PatternSyntaxException, because some names contain a '+' or other regex metacharacters. I tried solving this by ignoring those few names:
if (name.contains("\"")) {
// ignore this name
}
That didn't work properly, and it is also messy, because you have to escape everything manually, run it many times, and waste time.
Then I tried using the quote method:
Pattern all = Pattern.compile(Pattern.quote(combined));
However, now I don't find any matches in the text documents anymore, even when I also use quote on them. How can I solve this issue?
I agree with the comment of @dragon66: you should not quote the pipe "|". So your code, using Pattern.quote(), would look like this:
BufferedReader TSVFile = new BufferedReader(new FileReader("names.tsv"));
String dataRow = TSVFile.readLine();
dataRow = TSVFile.readLine();// skip first line (header)
String combined = "";
while (dataRow != null) {
String[] dataArray = dataRow.split("\t");
String name = dataArray[1];
combined += Pattern.quote(name.replace("\"", "")) + "|"; //line changed
dataRow = TSVFile.readLine(); // Read next line of data.
}
TSVFile.close();
Pattern all = Pattern.compile(combined);
I also suggest checking whether your problem needs optimizing: replacing String combined = ""; with a mutable StringBuilder avoids creating unnecessary new strings inside the loop.
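A compact way to get both effects (each name quoted, no repeated string concatenation) is to join the quoted names with a collector. This is only a sketch with invented names (NamePatternDemo, buildPattern, findNames) and a three-name list standing in for the TSV file. Joining also avoids the trailing "|", which would otherwise add an empty alternative that matches at every position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class NamePatternDemo {
    // Quotes each name individually and joins them with an unquoted "|".
    static Pattern buildPattern(List<String> names) {
        String alternation = names.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile(alternation);
    }

    // Collects every match of the pattern in the given text, in order.
    static List<String> findNames(Pattern pattern, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical names containing regex metacharacters.
        List<String> names = Arrays.asList("C++ Joe", "Ann (Jr.)", "Bob");
        Pattern all = buildPattern(names);
        System.out.println(findNames(all, "Today Bob met C++ Joe and Ann (Jr.) downtown."));
    }
}
```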
guilhermerama presented the bugfix to your code.
I will add some performance improvements. As I pointed out, Java's regex library does not scale well and is even slower when used for searching.
But one can do better with multi-string search algorithms, for example by using StringsAndChars String Search:
//setting up a test file
Iterable<String> lines = createLines();
Files.write(Paths.get("names.tsv"), lines , CREATE, WRITE, TRUNCATE_EXISTING);
// read the pattern from the file
BufferedReader TSVFile = new BufferedReader(new FileReader("names.tsv"));
Set<String> combined = new LinkedHashSet<>();
String dataRow = TSVFile.readLine();
dataRow = TSVFile.readLine();// skip first line (header)
while (dataRow != null) {
String[] dataArray = dataRow.split("\t");
String name = dataArray[1];
combined.add(name);
dataRow = TSVFile.readLine(); // Read next line of data.
}
TSVFile.close();
// search the pattern in a small text
StringSearchAlgorithm stringSearch = new AhoCorasick(new ArrayList<>(combined));
StringFinder finder = stringSearch.createFinder(new StringCharProvider("test " + name(38) + "\n or " + name(799) + " : " + name(99999), 0));
System.out.println(finder.findAll());
The result will be
[5:10(00038), 15:20(00799), 23:28(99999)]
The search (finder.findAll()) takes (on my computer) < 1 millisecond. Doing the same with java.util.regex took around 20 milliseconds.
You may tune this performance by using other algorithms provided by RexLex.
The setup needs the following code:
private static Iterable<String> createLines() {
List<String> list = new ArrayList<>();
for (int i = 0; i < 100000; i++) {
list.add(i + "\t" + name(i));
}
return list;
}
private static String name(int i) {
String s = String.valueOf(i);
while (s.length() < 5) {
s = '0' + s;
}
return s;
}

how can i use split() with a big number of elements, java

I need to process a big text file. There are almost 400 columns in each line and almost 800000 lines in the file. Each line has a format like:
340,9,2,3........5,2,LA
What I want to do is, for each line, if the last column is LA, print the first column of that line.
I wrote a simple program to do it:
BufferedReader bufr = new BufferedReader(new FileReader ("A.txt"));
BufferedWriter bufw = new BufferedWriter(new FileWriter ("LA.txt"));
String line = null;
while ((line = bufr.readLine()) != null) {
String [] text = new String [388];
text = line.split(",");
if (text [387] == args[2]) {
bufw.write(text[0]);
bufw.newLine();
bufw.flush();
}
}
bufw.close();
bufr.close();
But it seems the length of an array can't be that big; I received a java.lang.ArrayIndexOutOfBoundsException.
Since I'm using split(",") to get the last column of a line and it goes out of the array's bounds, how can I deal with it? Thanks.
text does not need to be initialized, String.split will create a correctly sized array:
String[] text = line.split(",");
You're also comparing Strings using reference equality (==). You should be using .equals():
if (text[387].equals(args[2])) { ... }
You're probably getting java.lang.ArrayIndexOutOfBoundsException because the index 387 is too big for that line's array. If you want to get the last element, use this:
text[text.length - 1]
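One possible reason the array is shorter than expected (an assumption, since the actual data isn't shown): String.split(",") silently drops trailing empty strings, so a line whose last columns are empty yields fewer elements. Passing a negative limit keeps them. A small sketch with made-up names:

```java
class TrailingColumnsDemo {
    // Returns the last column, keeping trailing empty columns (limit -1).
    static String lastColumn(String line) {
        String[] cols = line.split(",", -1);
        return cols[cols.length - 1];
    }

    public static void main(String[] args) {
        String line = "340,9,LA,,";
        System.out.println(line.split(",").length);     // 3: trailing empties dropped
        System.out.println(line.split(",", -1).length); // 5: trailing empties kept
        System.out.println(lastColumn("340,9,2,LA"));   // LA
    }
}
```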
Modify and try this
String [] text = line.split(",");
if (text [text.length - 1].equals(args[2])) {
bufw.write(text[0]);
bufw.newLine();
bufw.flush();
}
Assuming args[2] is LA.
String [] text;
Change your code to this. You don't need to initialize a size. When the String.split method executes it will automatically initialize the correct size for your array.
If you just need the first and the last column, then there is no need to create an array out of the current line.
You could do something like this:
final String test = "340,9,2,354,63,5,5,45,634,5,5,2,LA";
final char delimiter = ',';
final String lastColumn = test.substring(test.lastIndexOf(delimiter) + 1);
if (lastColumn.equals("LA")) {
final String firstColumn = test.substring(0, test.indexOf(delimiter));
System.out.println(firstColumn);
}
This code extracts the last column first and tests it. If it matches "LA", it then extracts the first column. It ignores the remaining content of the line.
Your code would be:
BufferedReader bufr = new BufferedReader(new FileReader ("A.txt"));
BufferedWriter bufw = new BufferedWriter(new FileWriter ("LA.txt"));
final char delimiter = ',';
String line = null;
while ((line = bufr.readLine()) != null) {
final String lastColumn = line.substring(line.lastIndexOf(delimiter) + 1);
if (lastColumn.equals(args[2])) {
bufw.write(line.substring(0, line.indexOf(delimiter)));
bufw.newLine();
bufw.flush();
}
}
bufw.close();
bufr.close();
(this code is not tested yet, but you get the idea :))
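The same idea can be wrapped in a small helper that also guards against lines containing no delimiter at all, where indexOf returns -1 and substring(0, -1) would throw a StringIndexOutOfBoundsException. This is only a sketch; the class and method names are invented for the example, and returning null for "no match" is just one possible convention.

```java
class FirstIfLastDemo {
    // Returns the first column if the last column equals 'wanted', otherwise null.
    static String firstColumnIfLastEquals(String line, char delimiter, String wanted) {
        int first = line.indexOf(delimiter);
        if (first < 0) {
            return null; // no delimiter: the line has no separate first/last column
        }
        String lastColumn = line.substring(line.lastIndexOf(delimiter) + 1);
        return lastColumn.equals(wanted) ? line.substring(0, first) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstColumnIfLastEquals("340,9,2,354,63,5,2,LA", ',', "LA")); // 340
        System.out.println(firstColumnIfLastEquals("341,9,2,354,63,5,2,NY", ',', "LA")); // null
    }
}
```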

Splitting array on new line

I am submitting the following input through stdin:
4 2
30 one
30 two
15 three
25 four
My code is:
public static void main(String[] args) throws IOException {
BufferedReader stdin = new BufferedReader(new InputStreamReader(System.in));
String submittedString;
System.out.flush();
submittedString = stdin.readLine();
zipfpuzzle mySolver = new zipfpuzzle();
mySolver.getTopSongs(submittedString);
}
Which calls:
//Bringing it all together
public String getTopSongs(String myString) {
setUp(myString);
calculateQuality();
qualitySort();
return titleSort();
}
Which calls
public void setUp(String myString) {
String tempString = myString;
//Creating array where each element is a line
String[] lineExplode = tempString.split("\\n+");
//Setting up numSongsAlbum and songsToSelect
String[] firstLine = lineExplode[0].split(" ");
numSongsAlbum = Integer.parseInt(firstLine[0]);
songsToSelect = Integer.parseInt(firstLine[1]);
System.out.println(lineExplode.length);
//etc
}
However, for some reason lineExplode.length returns value 1... Any suggestions?
Kind Regards,
Dario
String[] lineExplode = tempString.split("\\n+");
The argument to String#split is a String that contains a regular expression
Your String#split regex will work fine on Strings with newline characters.
String[] lineExplode = tempString.split("\n");
The problem is that your tempString has none of these characters, hence the size of the array is 1.
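Why not just put the readLine in a loop and add the Strings to an ArrayList?
String submittedString;
// readLine() returns null at end of input, so check for that before testing for a blank line
while ((submittedString = stdin.readLine()) != null && !submittedString.isEmpty()) {
myArrayList.add(submittedString);
}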
Are you sure the file is using UNIX-style line endings (\n)? For a cross-platform split, use:
String[] lineExplode = tempString.split("[\\n\\r]+");
You should use "\\n" to split at newlines, but note that not every OS uses the same separator ( http://en.wikipedia.org/wiki/Newline ).
The system property line.separator is very useful here: it contains the separator character(s) for the OS the application is running on.
You should use:
String[] lineExplode = tempString.split("\\n");
using \n as separator
Or:
String lineSeparator = System.getProperty("line.separator");
String[] lineExplode = tempString.split(lineSeparator);
Using the current OS separator
Or:
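String lineSeparator = System.getProperty("line.separator");
String[] lineExplode = tempString.split("(?:" + lineSeparator + ")+");
Using the current OS separator and matching one or more consecutive occurrences (the group makes + apply to the whole separator, not just its last character)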
It's better to split this way:
String[] lineExplode =
tempString.split("(?:" + Pattern.quote(System.getProperty("line.separator")) + ")+");
This keeps the split on newlines platform independent.
UPDATE: After looking at your posted code it is clear that OP is reading just one line (till \n) in this line:
submittedString = stdin.readLine();
and there is no loop to read further lines from input.
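A minimal sketch of such a loop, reading until end of input so that no split on "\n" is needed at all (readLine already strips the line terminator). The class and method names are made up for this example, and a StringReader stands in for System.in so it can run without console input.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ReadAllLinesDemo {
    // Reads every line from the source until readLine() returns null (end of input).
    static List<String> readAllLines(Reader source) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // With real console input this would be new InputStreamReader(System.in).
        Reader input = new StringReader("4 2\n30 one\n30 two\n15 three\n25 four\n");
        List<String> lines = readAllLines(input);
        System.out.println(lines.size()); // 5
        String[] firstLine = lines.get(0).split(" ");
        System.out.println(Integer.parseInt(firstLine[0]) + " songs, select " + Integer.parseInt(firstLine[1]));
    }
}
```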

string tokenizer wrong usage in java

I believe I am not using StringTokenizer correctly. Here is my code:
buffer = new byte[(int) (end - begin)];
fin.seek(begin);
fin.read(buffer, 0, (int) (end - begin));
StringTokenizer strk = new StringTokenizer(new String(buffer),
DELIMS,true);
As you can see, I am reading a chunk of lines from a file (end and begin are line numbers) and transferring the data to a string tokenizer. My delimiters are:
DELIMS = "\r\n ";
because I want to separate words that have a space between them, or are on the next line.
However, this code sometimes also splits whole words. What could be the explanation? Is my DELIMS string wrong?
Also, I am passing "true" as an argument to the tokenizer because I want the delimiters to be treated as tokens as well. (I want this because I want to count the line I am currently on.)
Could you please help me? Thanks a lot.
To start with, your method for converting bytes into a String is a bit suspect, and this overall method will be less-than-efficient, especially for a larger file.
Are you required to use StringTokenizer? If not, I'd strongly recommend using Scanner instead. I'd provide you with an example, but will ask that you just refer to the Javadocs instead, which are quite comprehensive and already contain good examples. That said, it accepts delimiters as well - but as Regular Expressions, so just be aware.
You could always wrap your input stream in a LineNumberReader. That will keep track of the line number for you. LineNumberReader extends BufferedReader, which has a readLine() method. With that, you could use a regular StringTokenizer to get your words as tokens. You could use regular expressions or Scanner, but for this case, StringTokenizer is simpler for beginners to understand and quicker.
You must have a RandomAccessFile. You didn't specify that, but I'm guessing based on the methods you used. Try something like:
byte [] buffer = ...; // you know how to get this.
ByteArrayInputStream stream = new ByteArrayInputStream(buffer);
// if you have java.util.Scanner
{
int lineNum = 0;
Scanner s = new Scanner(stream);
while (s.hasNextLine()) {
lineNum++;
String line = s.nextLine();
System.out.format("I am on line %s%n", lineNum);
Scanner lineScanner = new Scanner(line);
while (lineScanner.hasNext()) {
String word = lineScanner.next();
// do whatever with word
}
}
}
// if you don't have java.util.Scanner, or want to use StringTokenizer
{
LineNumberReader reader = new LineNumberReader(
new InputStreamReader(stream));
String line = null;
while ((line = reader.readLine()) != null) {
System.out.println("I am on line " + reader.getLineNumber());
StringTokenizer tok = new StringTokenizer(line);
while (tok.hasMoreTokens()) {
String word = tok.nextToken();
// do whatever with word
}
}
}
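If you do want to stay with a single StringTokenizer that returns its delimiters, as in the question, line counting could look roughly like this. It is only a sketch with invented names, and it assumes lines end in \n or \r\n (a lone \r would not advance the counter). With returnDelims set to true, every delimiter character comes back as its own one-character token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenLineCountDemo {
    // Returns each word prefixed with the 1-based line number it was found on.
    static List<String> wordsWithLineNumbers(String text) {
        StringTokenizer tokens = new StringTokenizer(text, "\r\n ", true);
        List<String> result = new ArrayList<>();
        int line = 1;
        while (tokens.hasMoreTokens()) {
            String token = tokens.nextToken();
            if (token.equals("\n")) {
                line++;                      // a line feed ends the current line
            } else if (!token.equals("\r") && !token.equals(" ")) {
                result.add(line + ":" + token); // anything else is a word
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordsWithLineNumbers("one two\r\nthree")); // [1:one, 1:two, 2:three]
    }
}
```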
