A simple string.split() gone horribly wrong [duplicate] - java

This question already has answers here:
Split string with | separator in java
(12 answers)
Closed 10 years ago.
This code seems to be broken but I can't tell why:
System.out.println(line);
// prints: Some Name;1|IN03PLF;IN02SDI;IN03MAP;IN02SDA;IN01ARC
String args[] = line.split("|");
String candidatArgs[] = args[0].split(";");
String inscrieriString[] = args[1].split(";");
System.out.println(args[0]);
System.out.println(args[1]);
System.out.println(candidatArgs);
System.out.println("[0]:" + candidatArgs[0]);
System.out.println("[1]:" + candidatArgs[1]);
// prints: S
// [Ljava.lang.String;@4f77e2b6
// [0]:
I have no idea why that happens. By my logic:
String args[] = line.split("|");
[0]: Some Name;1
[1]: IN02SDI;IN03MAP;IN02SDA;IN01ARC
Instead of:
[0]: S
In case you'd like more code: This should compile even if it doesn't do much (removed as much un-necessary code as I could)
Main:
Have a file: Candidati.txt
containing: Some Name;1|IN03PLF;IN02SDI;IN03MAP;IN02SDA;IN01ARC
import java.util.ArrayList;
Repository repository = new Repository ("Candidati.txt"); // file name
ArrayList<Candidat> candidati = repository.getCandidati();
System.out.println(candidati);
Repository
import java.util.ArrayList;
public class Repository {
private String fisierCuCandidati;
private ArrayList<Candidat> listaCandidati;
public Repository (String fisierCuCandidati) {
this.fisierCuCandidati = fisierCuCandidati; // file name
this.listaCandidati = new ArrayList<Candidat>();
this.incarcaCandidati();
}
public void incarcaCandidati() {
FileReader in = null;
BufferedReader input = null;
//try {
in = new FileReader (this.fisierCuCandidati);
input = new BufferedReader (in);
String line;
while ((line = input.readLine()) != null) {
System.out.println(line);
String args[] = line.split("|");
String candidatArgs[] = args[0].split(";");
String inscrieriString[] = args[1].split(";");
System.out.println(args[0]);
System.out.println(args[1]);
System.out.println(candidatArgs);
System.out.println("[0]:" + candidatArgs[0]);
System.out.println("[1]:" + candidatArgs[1]);
}
}
Candidat
public class Candidat {
public Candidat (String nume) {
}
public Candidat (String nume, int id) {
}

String.split uses a regular expression so you need to escape the pipe |, which is a special character (meaning OR):
String args[] = line.split("\\|");
Also, to print the contents of the String array rather than the default Object.toString representation, you will need:
System.out.println(Arrays.toString(candidatArgs));

You either need to escape the pipe or use it inside a character class in your split, as String#split takes a regex, and | is a meta character in regex. So, use this instead:
String args[] = line.split("\\|");
or:
String args[] = line.split("[|]");
A character class works because, inside a character class, most metacharacters (including the pipe) lose their special meaning. So a pipe is just a pipe, not an alternation operator.
In addition, you should use the Arrays#toString method to print your array.

Change to:
String args[] = line.split("\\|");
Your | won't work because the parameter of split is a regex, and | has a special meaning (OR) in regex.

| is a special character in Java regexes, so you need to escape it as \\|. I generally do
line.split(Pattern.quote(separator))
where Pattern is java.util.regex.Pattern (http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html), and separator is whatever separator you use. Pattern.quote automatically takes care of escaping special characters.
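A minimal sketch pulling the answers above together, using the sample line from the question. The class name PipeSplitDemo and the helper splitLiteral are made up for illustration; the three variants are equivalent ways of treating the pipe as a literal character.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PipeSplitDemo {
    // Hypothetical helper: splits on a literal separator by quoting it first.
    static String[] splitLiteral(String line, String separator) {
        return line.split(Pattern.quote(separator));
    }

    public static void main(String[] args) {
        String line = "Some Name;1|IN03PLF;IN02SDI;IN03MAP;IN02SDA;IN01ARC";

        String[] escaped = line.split("\\|");       // escaped pipe
        String[] charClass = line.split("[|]");     // pipe inside a character class
        String[] quoted = splitLiteral(line, "|");  // Pattern.quote

        // prints: [Some Name;1, IN03PLF;IN02SDI;IN03MAP;IN02SDA;IN01ARC]
        System.out.println(Arrays.toString(escaped));
        // prints: true
        System.out.println(Arrays.equals(escaped, charClass) && Arrays.equals(escaped, quoted));

        String[] candidatArgs = escaped[0].split(";");
        // prints: [Some Name, 1]
        System.out.println(Arrays.toString(candidatArgs));
    }
}
```

Pattern.quote is the safest choice when the separator comes from a variable, since it works for any separator without knowing which characters are special.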

Related

NoSuchToken exception for StringTokenizer.nextToken()

When I try to run the code:
import java.io.*;
import java.util.*;
class dothis {
public static void main (String [] args) throws IOException {
BufferedReader f = new BufferedReader(new FileReader("ride.in"));
PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter("ride.out")));
StringTokenizer st = new StringTokenizer(f.readLine());
String s1 = st.nextToken();
String s2 = st.nextToken();
char[] arr = new char[6];
if (find(s1, arr, 1) == find(s2, arr, 1)) {
out.print("one");
} else {
out.println("two");
}
out.close();
}
}
With the data file:
ABCDEF
WERTYU
it keeps on outputting:
Exception in thread "main" java.util.NoSuchElementException
at java.util.StringTokenizer.nextToken(StringTokenizer.java:349)
at dothis.main(Unknown Source)
I did see a similar question on Stack Overflow, but in that case, the second line of the text file is blank, therefore there wasn't a second token to be read. However, the two first lines of this data file both contain a String. How come a token would not be read for the second line?
The StringTokenizer docs say that if you don't pass a delimiter string in the constructor, it's assumed to be:
" \t\n\r\f": the space character, the tab character, the newline character, the carriage-return character, and the form-feed character
You're then asking for two tokens to be read out of the string returned by f.readLine(), which is ABCDEF (with no delimiters), so the second call to nextToken() throws an exception.
You are creating the StringTokenizer from the first line only, so when you set the value of s2 there is no token left, because s1 already holds the first and only token (ABCDEF). If I am not wrong, you are getting the exception when you try to set s2?
readLine() returns a String up to, but not including, the next "\n" or "\r", and in your case you have one token on each line (I believe).
You really don't need StringTokenizer.
Each line you read is actually a token for you.
Also, if you are expecting each line to have more than one token, then you need to make sure you supply the right delimiter to your tokenizer.
StringTokenizer uses the default delimiter set, which is " \t\n\r\f": the space character, the tab character, the newline character, the carriage-return character, and the form-feed character. Delimiter characters themselves will not be treated as tokens.
If you are ready for Java 7/8, you could make it even simpler without the need of StringTokenizer.
import java.io.*;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.*;
class dothis {
public static void main (String [] args) throws IOException {
String contents = new String(Files.readAllBytes(Paths.get("ride.in")));
String[] lines = contents.split("\\r?\\n"); // also handles Windows-style \r\n line endings
PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter("ride.out")));
String s1 = lines[0];
String s2 = lines[1];
char[] arr = new char[6];
if (find(s1, arr, 1) == find(s2, arr, 1)) {
out.print("one");
} else {
out.println("two");
}
out.close();
}
}
In your code:
StringTokenizer st = new StringTokenizer(f.readLine()); // f.readLine() reads only one line
String s1 = st.nextToken();
String s2 = st.nextToken(); // that line holds a single token, so this throws NoSuchElementException
Try this:
String result = "";
String tmp;
while((tmp = f.readLine())!= null)
{
result += tmp+"\n";
}
StringTokenizer st = new StringTokenizer(result);
ArrayList<String> str = new ArrayList<String>();
int count = st.countTokens();
for(int i=0; i< count; i++)
{
str.add(st.nextToken());
}
Now check by using the above arraylist.
StringTokenizer is a legacy class whose use is discouraged in new code (although it is not formally deprecated), so the split() method of the String class is suggested instead.
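To make the point of these answers concrete, here is a small sketch that takes one value per line straight from readLine(), with no tokenizer involved. A StringReader stands in for ride.in so the example is self-contained; the class name TwoLinesDemo and the helper firstTwoLines are illustrative and not part of the original program.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.StringTokenizer;

class TwoLinesDemo {
    // Hypothetical helper: returns the first two lines of the input, trimmed.
    static String[] firstTwoLines(Reader source) {
        try (BufferedReader reader = new BufferedReader(source)) {
            String first = reader.readLine();
            String second = reader.readLine();
            if (first == null || second == null) {
                throw new IllegalStateException("expected at least two lines of input");
            }
            return new String[] { first.trim(), second.trim() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A tokenizer built from a single line only sees that line's tokens.
        StringTokenizer st = new StringTokenizer("ABCDEF");
        System.out.println(st.countTokens()); // prints: 1

        String[] names = firstTwoLines(new StringReader("ABCDEF\nWERTYU\n"));
        System.out.println(names[0]); // prints: ABCDEF
        System.out.println(names[1]); // prints: WERTYU
    }
}
```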

Remove a character followed by whitespace each newline of a string

I am writing a program to edit a rtf file. The rtf file will always come in the same format with
Q XXXXXXXXXXXX
A YYYYYYYYYYYY
Q XXXXXXXXXXXX
A YYYYYYYYYYYY
I want to remove the Q / A + whitespace and leave just the X's and Y's on each line. My first idea is to split the string into a new string for each line and edit it from there using str.split like so:
private void countLines(String str){
String[] lines = str.split("\r\n|\r|\n");
linesInDoc = lines;
}
From here my idea is to take each even array value and get rid of Q + whitespace, and take each odd array value and get rid of A + whitespace. Is there a better way to do this? Note: The first line sometimes contains a ~6 digit alphanumeric. I think an if statement checking for 2 non-whitespace chars would solve this.
Here is the rest of the code:
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import javax.swing.JEditorPane;
import javax.swing.text.BadLocationException;
import javax.swing.text.EditorKit;
public class StringEditing {
String[] linesInDoc;
private String readRTF(File file){
String documentText = "";
try{
JEditorPane p = new JEditorPane();
p.setContentType("text/rtf");
EditorKit rtfKit = p.getEditorKitForContentType("text/rtf");
rtfKit.read(new FileReader(file), p.getDocument(), 0);
rtfKit = null;
EditorKit txtKit = p.getEditorKitForContentType("text/plain");
Writer writer = new StringWriter();
txtKit.write(writer, p.getDocument(), 0, p.getDocument().getLength());
documentText = writer.toString();
}
catch( FileNotFoundException e )
{
System.out.println( "File not found" );
}
catch( IOException e )
{
System.out.println( "I/O error" );
}
catch( BadLocationException e )
{
}
return documentText;
}
public void editDocument(File file){
String plaintext = readRTF(file);
System.out.println(plaintext);
fixString(plaintext);
System.out.println(plaintext);
}
Unless I'm missing something, you could use String.substring(int) like
String lines = "Q     XXXXXXXXXXXX\n" //
+ "A     YYYYYYYYYYYY\n" //
+ "Q     XXXXXXXXXXXX\n" //
+ "A     YYYYYYYYYYYY\n";
for (String line : lines.split("\n")) {
System.out.println(line.substring(6));
}
Output is
XXXXXXXXXXXX
YYYYYYYYYYYY
XXXXXXXXXXXX
YYYYYYYYYYYY
If your format should be more general, you might prefer
System.out.println(line.substring(1).trim());
A BufferedReader will handle the newline \n for you.
You can use a matcher to validate that the line is in the desired format.
If the lines have a fixed length, simply use substring:
final String bodyPattern = "\\w{1,1}[ \\w]{5,5}\\d{12,12}";
try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
String line;
while ((line = br.readLine()) != null) {
if (line.matches(bodyPattern)) {
//
myString = line.substring(6);
}
}
}
//catch Block
You can adjust the regex pattern to your specific requirements
This is easily doable with a regex (assuming 'fileText' is your whole file's content):
removedPrefix = fileText.replaceAll("(A|Q) *(.+)\\r", "$2\r");
The regex means a Q or A, then any number of spaces, then anything (captured as group 2), and a closing carriage return. It leaves the first line with the digits alone, provided that line contains no A or Q. The result is the file content without the Q/A and the spaces. There are easier ways if you know the exact number of spaces before the text you need, but this works for any number of spaces and is very flexible.
If you process line by line, it's:
removedPrefix = currentLine.replaceAll("(A|Q) *(.+)", "$2");
As simple as that.
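One caveat with unanchored patterns like these is that an A or Q in the middle of a line (for example in the alphanumeric first line) can also match. A hedged variant follows, assuming the prefix is a single Q or A at the very start of a line followed by at least one space or tab. The class and method names are invented for this sketch.

```java
import java.util.regex.Pattern;

class PrefixStripper {
    // (?m) lets ^ match at the start of every line;
    // [ \t]+ requires one or more spaces or tabs after the Q or A.
    private static final Pattern PREFIX = Pattern.compile("(?m)^[QA][ \\t]+");

    static String stripPrefixes(String text) {
        return PREFIX.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String text = "12A456\n"
                + "Q     XXXXXXXXXXXX\n"
                + "A     YYYYYYYYYYYY\n";
        // prints:
        // 12A456
        // XXXXXXXXXXXX
        // YYYYYYYYYYYY
        System.out.print(stripPrefixes(text));
    }
}
```

Because the pattern removes only the prefix, the line endings are left exactly as they were in the input.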

Find a string in a very large formatted text file in java

Here is the thing:
I have a really big text file and it has a format like this:
0007476|000011434982|00249626000|R|2008-01-11 00:00:00|9999-12-31 23:59:59|000019.99
0007476|000014017887|00313865000|R|2011-04-19 00:00:00|9999-12-31 23:59:59|000599.99
...
...
And I need to find if a particular pattern exists in the file, say
0007476|whatever|00313865000|whatever
All I need is a boolean saying yes or no.
Now what I have done is to read the file line by line and do a regular expression matching:
Pattern pattern = Pattern.compile(regex);
Scanner scanner = new Scanner(new File(fileName));
String line;
while (scanner.hasNextLine()) {
line = scanner.nextLine();
if (pattern.matcher(line).matches()) {
scanner.close();
return true;
}
}
and the regex has the form
"0007476\\|\\d{12}\\|0031386500.*"
This method works, but it takes usually 15 seconds to search for a string that is far from the start line. Is there a faster way to achieve that? Thanks
The java String class has a contains method which returns a boolean. If your string is fixed, this is a lot faster than a regular expression:
if (string.contains("0007476|") && string.contains("|00313865000|")) {
// whatever
}
Hope that helped, if not, leave a comment.
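Note that two independent contains() calls do not check where in the line the fields sit. If the fields really are fixed-width, as they appear to be in the sample (7, 12 and 11 characters), a positional check is one option. That fixed-width layout is an assumption, as are the class and method names below.

```java
class FixedFieldMatcher {
    // Assumed layout: 7-char field, '|', 12-char field, '|', 11-char field, '|', ...
    // so the second '|' sits at index 7 + 1 + 12 = 20.
    private static final int SECOND_SEPARATOR_INDEX = 7 + 1 + 12;

    static boolean matches(String line, String first, String third) {
        return line.startsWith(first + "|")
                && line.startsWith("|" + third + "|", SECOND_SEPARATOR_INDEX);
    }

    public static void main(String[] args) {
        String a = "0007476|000011434982|00249626000|R|2008-01-11 00:00:00|9999-12-31 23:59:59|000019.99";
        String b = "0007476|000014017887|00313865000|R|2011-04-19 00:00:00|9999-12-31 23:59:59|000599.99";
        System.out.println(matches(a, "0007476", "00313865000")); // prints: false
        System.out.println(matches(b, "0007476", "00313865000")); // prints: true
    }
}
```

This avoids the regex engine entirely. Whether it is measurably faster on a given file would need to be timed, since reading the file is likely to dominate.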
I assume that you need the Scanner because the file is too big to read into a single String?
If that is not the case, you can probably use a regular expression that finds the match directly. Depending on whether or not you care about the specific text at the start of the line, you can use something along the lines of:
"(?m)^0007476\\|\\d{12}\\|0031386500.*$"
If you do need to break it up into smaller chunks because of memory usage, I would suggest not reading on a per-line basis (since the lines are rather short), but processing bigger chunks using something like a BufferedReader instead.
I fiddled around a bit with a 1.25GB file and the following is about 2.5 times faster than your implementation:
private static boolean matches() throws IOException {
String regex = "(?m)^0007476\\|\\d{12}\\|0031386500.*$";
Pattern pattern = Pattern.compile(regex);
try(BufferedReader br = new BufferedReader(new FileReader(FILENAME))) {
for(String lines; (lines = readLines(br, 10000)) != null; ) {
if (pattern.matcher(lines).find()) {
return true;
}
}
}
return false;
}
private static String readLines(BufferedReader br, int amount) throws IOException {
StringBuilder builder = new StringBuilder();
int lineCounter = 0;
// Check the counter first so that a line is never read and then discarded.
for(String line; lineCounter < amount && (line = br.readLine()) != null; lineCounter++ ) {
builder.append(line).append(System.lineSeparator());
}
return lineCounter > 0 ? builder.toString() : null;
}

Splitting array on new line

I am submitting the following input through stdin:
4 2
30 one
30 two
15 three
25 four
My code is:
public static void main(String[] args) throws IOException {
BufferedReader stdin = new BufferedReader(new InputStreamReader(System.in));
String submittedString;
System.out.flush();
submittedString = stdin.readLine();
zipfpuzzle mySolver = new zipfpuzzle();
mySolver.getTopSongs(submittedString);
}
Which calls:
//Bringing it all together
public String getTopSongs(String myString) {
setUp(myString);
calculateQuality();
qualitySort();
return titleSort();
}
Which calls
public void setUp(String myString) {
String tempString = myString;
//Creating array where each element is a line
String[] lineExplode = tempString.split("\\n+");
//Setting up numSongsAlbum and songsToSelect
String[] firstLine = lineExplode[0].split(" ");
numSongsAlbum = Integer.parseInt(firstLine[0]);
songsToSelect = Integer.parseInt(firstLine[1]);
System.out.println(lineExplode.length);
//etc
}
However, for some reason lineExplode.length returns value 1... Any suggestions?
Kind Regards,
Dario
String[] lineExplode = tempString.split("\\n+");
The argument to String#split is a String that contains a regular expression.
Your String#split regex will work fine on Strings with newline characters.
String[] lineExplode = tempString.split("\n");
The problem is that your tempString has none of these characters, hence the size of the array is 1.
Why not just put the readLine in a loop and add the Strings to an ArrayList?
String submittedString;
while ((submittedString = stdin.readLine()) != null && !submittedString.isEmpty()) {
myArrayList.add(submittedString);
}
Are you sure the file is using UNIX-style line endings (\n)? For a cross-platform split, use:
String[] lineExplode = tempString.split("[\\n\\r]+");
You should use "\\n" to split on newlines, but bear in mind that not every OS uses the same line separator ( http://en.wikipedia.org/wiki/Newline ).
The system property line.separator is very useful here: it contains the separator character(s) for the OS the application is running on.
You should use:
String[] lineExplode = tempString.split("\\n");
using \n as the separator.
Or:
String lineSeparator = System.getProperty("line.separator");
String[] lineExplode = tempString.split(lineSeparator);
Using the current OS separator
Or:
String lineSeparator = System.getProperty("line.separator");
String[] lineExplode = tempString.split("(?:" + lineSeparator + ")+");
Using one or more consecutive OS separators, so empty lines are skipped
It's better to split this way:
String[] lineExplode =
tempString.split("(?:" + Pattern.quote(System.getProperty("line.separator")) + ")+");
This keeps the split on new lines platform independent.
UPDATE: After looking at your posted code, it is clear that you are reading just one line (up to \n) in this line:
submittedString = stdin.readLine();
and there is no loop to read further lines from input.
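Putting that observation into code: a sketch that keeps calling readLine() until the input is exhausted, so no newline splitting is needed at all. A StringReader stands in for System.in here, and the class and helper names are illustrative.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ReadAllLines {
    // Hypothetical helper: collects every line until end of input.
    // readLine() strips the terminator, whether it is \n, \r or \r\n.
    static List<String> readLines(Reader source) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        String input = "4 2\n30 one\r\n30 two\n15 three\n25 four\n";
        List<String> lines = readLines(new StringReader(input));

        String[] firstLine = lines.get(0).split(" ");
        int numSongsAlbum = Integer.parseInt(firstLine[0]);
        int songsToSelect = Integer.parseInt(firstLine[1]);

        // prints: 5 lines, 4 songs, select 2
        System.out.println(lines.size() + " lines, " + numSongsAlbum
                + " songs, select " + songsToSelect);
    }
}
```

With real standard input, the reader would be new InputStreamReader(System.in), and the loop ends when the stream is closed (end of file).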

in java, how to print entire line in the file when string match found

I have a text file called "Sample.text". It contains multiple lines. I need to search this file for a particular string. If the string is found in the file, I need to print the entire line. The search string is in the middle of the line. I am also using a StringBuffer to append the string after reading it from the text file. The text file is also very large, so I don't want to iterate line by line. How do I do this?
You could do it with FileUtils from Apache Commons IO
Small sample:
StringBuffer myStringBuffer = new StringBuffer();
List<String> lines = FileUtils.readLines(new File("/tmp/myFile.txt"), "UTF-8");
for (String line : lines) {
if (line.contains("something")) {
myStringBuffer.append(line);
}
}
We can also use a regex for string or pattern matching from a file.
Sample code:
import java.util.regex.*;
import java.io.*;
/**
* Print all the strings that match a given pattern from a file.
*/
public class ReaderIter {
public static void main(String[] args) throws IOException {
// The RE pattern
Pattern patt = Pattern.compile("[A-Za-z][a-z]+");
// A FileReader (see the I/O chapter)
BufferedReader r = new BufferedReader(new FileReader("file.txt"));
// For each line of input, try matching in it.
String line;
while ((line = r.readLine()) != null) {
// For each match in the line, extract and print it.
Matcher m = patt.matcher(line);
while (m.find()) {
// Simplest method:
// System.out.println(m.group(0));
// Get the starting position of the text
int start = m.start(0);
// Get ending position
int end = m.end(0);
// Print whatever matched.
// Use String.substring(start, end);
System.out.println(line.substring(start, end));
}
}
}
}
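The regex sample prints only the matched text. To print the whole line containing a search string, as the question asks, a streaming read is usually enough: it holds one line in memory at a time, which also suits large files. A sketch with illustrative names follows; a StringReader stands in for Sample.text so the example is self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class LineGrep {
    // Hypothetical helper: returns every line that contains the search string.
    static List<String> matchingLines(Reader source, String needle) {
        List<String> matches = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.contains(needle)) {
                    matches.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "first line\n"
                + "the needle is in the middle here\n"
                + "last line\n";
        for (String line : matchingLines(new StringReader(text), "needle")) {
            System.out.println(line); // prints: the needle is in the middle here
        }
    }
}
```

For a real file, the reader would be a FileReader or Files.newBufferedReader on the file's path.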
