Reading file line with multiple delimiters in java

I am trying to read a file line by line with multiple delimiters. I am using a regex for splitting, but it's not treating a space (" ") as a delimiter. The file contains ;, #, ,, and space as delimiters. What am I doing wrong?
A line of the file looks like this: ADD R1, R2, R3
public static void initialize() throws IOException {
PC = 4000;
BufferedReader fileReader = new BufferedReader(new FileReader("test/ascii.txt"));
String str;
while((str = fileReader.readLine()) != null){
Instruction instruction = new Instruction();
String[] parts = str.split("[ ,:;#]");
instruction.instrAddr = String.valueOf(PC++);
System.out.println(instruction.instrAddr);
instruction.opcode = parts[0];
System.out.println(instruction.opcode);
instruction.dest = parts[1];
System.out.println(instruction.dest);
instruction.source_1 = parts[2];
System.out.println(instruction.source_1);
instruction.source_2 = parts[3];
System.out.println(instruction.source_2);
}
fileReader.close();}
The output prints 4000 (PC value), ADD, R1, " " and R2. How can I avoid the space? Is there anything wrong with the regex in str.split("[ ,:;#]")?

Are you sure those are actually spaces?
This should work for any white-space:
@Test
public void test() {
String s = "1 2,3:4;5#6\t7";
Assert.assertEquals(7, s.split("[\\s,:;#]").length);
}
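One likely cause, given the sample line ADD R1, R2, R3: a character class without a quantifier splits on each delimiter character separately, so a comma followed by a space leaves an empty string between them, and that empty string is what gets printed. The following is a minimal sketch under that assumption; the class and method names are made up for illustration.

```java
import java.util.Arrays;

class DelimiterSplit {
    // Splits on runs of whitespace, ',', ':', ';' or '#',
    // so ", " is treated as ONE delimiter instead of two.
    static String[] splitFields(String line) {
        return line.trim().split("[\\s,:;#]+");
    }

    public static void main(String[] args) {
        String line = "ADD R1, R2, R3";
        // Without '+': empty strings appear between ',' and ' '.
        System.out.println(Arrays.toString(line.split("[ ,:;#]")));
        // With '+': only the four fields remain.
        System.out.println(Arrays.toString(splitFields(line)));
    }
}
```

With the quantifier in place, parts[0] to parts[3] line up with the opcode, destination and two sources in the question's code.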

Related

NoSuchToken exception for StringTokenizer.nextToken()

When I try to run the code:
import java.io.*;
import java.util.*;
class dothis {
public static void main (String [] args) throws IOException {
BufferedReader f = new BufferedReader(new FileReader("ride.in"));
PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter("ride.out")));
StringTokenizer st = new StringTokenizer(f.readLine());
String s1 = st.nextToken();
String s2 = st.nextToken();
char[] arr = new char[6];
if (find(s1, arr, 1) == find(s2, arr, 1)) {
out.print("one");
} else {
out.println("two");
}
out.close();
}
}
With the data file:
ABCDEF
WERTYU
it keeps on outputting:
Exception in thread "main" java.util.NoSuchElementException
at java.util.StringTokenizer.nextToken(StringTokenizer.java:349)
at dothis.main(Unknown Source)
I did see a similar question on Stack Overflow, but in that case, the second line of the text file is blank, therefore there wasn't a second token to be read. However, the two first lines of this data file both contain a String. How come a token would not be read for the second line?

The StringTokenizer docs say that if you don't pass a delimiter string to the constructor, it is assumed to be:
" \t\n\r\f": the space character, the tab character, the newline character, the carriage-return character, and the form-feed character
You're then asking for two tokens from the string returned by f.readLine(), which is ABCDEF (with no delimiters), so an exception is thrown.

You are creating the StringTokenizer from the first line only, so when you set s2 there is no token left: s1 already holds the first and only token (ABCDEF). If I am not wrong, you are getting the exception when you try to set s2?

When you read a line, readLine() returns the text up to the next "\n" or "\r", and in your case each line holds a single token (I believe).
You really don't need StringTokenizer.
Each line you read is already a token.
If you expect a line to hold more than one token, make sure you pass the right delimiters to the tokenizer.
StringTokenizer uses the default delimiter set, which is " \t\n\r\f": the space character, the tab character, the newline character, the carriage-return character, and the form-feed character. Delimiter characters themselves will not be treated as tokens.
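The point made in these answers can be sketched as follows: each readLine() call returns one line, so a tokenizer built from the first line never sees the second. This is a minimal sketch, not the original program; the class and method names are hypothetical, and a StringReader stands in for ride.in.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.StringTokenizer;

class RideInput {
    // Reads the first two lines, one readLine() call per line.
    // Either element is null if the input has fewer lines.
    static String[] firstTwoLines(Reader source) {
        try (BufferedReader reader = new BufferedReader(source)) {
            String s1 = reader.readLine();
            String s2 = reader.readLine();
            return new String[] { s1, s2 };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A tokenizer over the first line alone holds exactly one token.
        System.out.println(new StringTokenizer("ABCDEF").countTokens());

        String[] lines = firstTwoLines(new StringReader("ABCDEF\nWERTYU\n"));
        System.out.println(lines[0] + " / " + lines[1]);
    }
}
```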

If you are ready for Java 7/8, you could make it even simpler without the need of StringTokenizer.
import java.io.*;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.*;
class dothis {
public static void main (String [] args) throws IOException {
String contents = new String(Files.readAllBytes(Paths.get("ride.in")));
String[] lines = contents.split("\r?\n"); // also handles Windows line endings
PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter("ride.out")));
String s1 = lines[0];
String s2 = lines[1];
char[] arr = new char[6];
if (find(s1, arr, 1) == find(s2, arr, 1)) {
out.print("one");
} else {
out.println("two");
}
out.close();
}
}

In your code:
StringTokenizer st = new StringTokenizer(f.readLine()); // f.readLine() reads only one line
String s1 = st.nextToken();
String s2 = st.nextToken(); // that line has no second token, so this throws NoSuchElementException
Try this:
String result = "";
String tmp;
while((tmp = f.readLine())!= null)
{
result += tmp+"\n";
}
StringTokenizer st = new StringTokenizer(result);
ArrayList<String> str = new ArrayList<String>();
int count = st.countTokens();
for(int i=0; i< count; i++)
{
str.add(st.nextToken());
}
Now check the values using the list above.
StringTokenizer is a legacy class whose use is discouraged in new code, so the split() method of the String class is recommended instead.

Split command on a nextElement

I am making a java servlet and am trying to make it display a preview of 3 different articles. I want it to preview the first sentence of each article, but can't seem to get split to work properly since I am reading the articles in with tokenizer. So I have something like:
while ((s = br.readLine()) != null) {
out.println("<tr>");
StringTokenizer s2 = new StringTokenizer(s, "|");
while (s2.hasMoreElements()) {
if (index == 0) {
out.println("<td class='first'>" + s2.nextElement() + "</td>");
}
out.println("</tr>");
}
index = 0;
}
How do I make s2.nextElement print out only the first sentence instead of the whole article? I imagine I could do split with a delimiter of ".", but can't get the code to work right. Thanks.

Try
s2.nextToken().split("\\.")[0];
to get the first sentence in the paragraph (nextElement() returns Object, so use nextToken(), which returns a String).

It would be better to use a Scanner:
Scanner scanner = new Scanner(new File("articles.txt"));
while (scanner.hasNextLine()) {
String article = scanner.nextLine();
String[] parts = article.split("\\s*\\|\\s*");
String title = parts[0];
String text = parts[1];
String date = parts[2];
String image = parts[3];
String firstSentence = text.replaceAll("\\..*", ".");
// Output what you like about the article using the extracted parts
}
Scanner.nextLine() reads in the whole line (Scanner.next() would return only one whitespace-delimited token, since the default delimiter is whitespace).
split("\\s*\\|\\s*") splits the line on pipe chars (which have to be escaped because the pipe char has special regex meaning) and the \s* consumes any whitespace that may surround the pipe chars.
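A minimal sketch of the pipe-splitting and first-sentence steps described above, assuming each line holds pipe-separated fields with the article text in the second field and that a sentence ends at the first period. The class name, method names and sample line are hypothetical.

```java
class ArticlePreview {
    // Splits one record on '|' and drops whitespace around the separators.
    static String[] fields(String line) {
        return line.trim().split("\\s*\\|\\s*");
    }

    // Returns the text up to and including the first '.',
    // or the whole text if it contains no period.
    static String firstSentence(String text) {
        int dot = text.indexOf('.');
        return dot < 0 ? text : text.substring(0, dot + 1);
    }

    public static void main(String[] args) {
        String line = "My title | First sentence. Second sentence. | 2014-01-01 | pic.png";
        String[] parts = fields(line);
        System.out.println(firstSentence(parts[1])); // prints "First sentence."
    }
}
```

Using indexOf with a guard avoids an exception when an article has no period at all, which a bare substring(0, indexOf(".")) would throw.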

What I did was change hasMoreElements() to hasMoreTokens(). I then found the index of the first occurrence of "." and stored it in an int. I then printed out the substring up to that index. Here is what my code looked like:
while((s = br.readLine()) != null){
out.println("<tr>");
StringTokenizer s2 = new StringTokenizer(s, "|");
while (s2.hasMoreTokens()){
if (index == 0){
String one = s2.nextToken();
int i = one.indexOf(".");
out.println("<td>"+one.substring(0 , i)+"."+"</td>");
}
}
}

Java String Matching in a Sorted File and grouping similar data

I have a sorted file and need to do the following pattern match. I read a row and compare it with the row just after it; if it matches, I append the string I used for matching to that row after a comma and move on to the next row. I am new to Java and overwhelmed by the options, from OpenCSV to BufferedReader. I intend to iterate through the file until it reaches the end. Rows may contain blanks and a date in quotes. The file size would be around 100 MB.
My file has data like
ABCD
ABCD123
ABCD456, 123
XYZ
XYZ890
XYZ123, 890
and output is expected as
ABCD, ABCD
ABCD123, ABCD
ABCD456, 123, ABCD
XYZ, XYZ
XYZ890, XYZ
XYZ123, 890, XYZ
Not sure about the best method. Can you please help me.

To open a file, you can use File and FileReader classes:
File csvFile = new File("file.csv");
FileReader fileReader = null;
try {
fileReader = new FileReader(csvFile);
} catch (FileNotFoundException e) {
e.printStackTrace();
}
You can get a line of the file using Scanner:
Scanner reader = new Scanner(fileReader);
while(reader.hasNextLine()){
String line = reader.nextLine();
parseLine(line);
}
You want to parse this line. To do that, look into regular expressions and the Pattern and Matcher classes:
private void parseLine(String line) {
Matcher matcher = Pattern.compile("(ABCD)").matcher(line);
if(matcher.find()){
System.out.println("find: " + matcher.group());
}
}
To find the next match in the same row, call matcher.find() again. If another match is found, it returns true and you can get the result with matcher.group().

Read the file line by line and use String.replaceAll() with a regex to rewrite each line as needed:
^([A-Z]+)([0-9]*)(, [0-9]+)?$
Replacement: $1$2$3, $1
Sample code:
String regex = "^([A-Z]+)([0-9]*)(, [0-9]+)?$";
String replacement = "$1$2$3, $1";
String newLine = line.replaceAll(regex,replacement);
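As a quick check, the regex and replacement above can be run against the sample rows from the question. This is only a sketch: it assumes the grouping key is the leading run of uppercase letters, as the regex does, and the class and method names are hypothetical.

```java
class GroupKeyAppender {
    // Same pattern as in the answer: letters, optional digits, optional ", digits".
    private static final String REGEX = "^([A-Z]+)([0-9]*)(, [0-9]+)?$";
    private static final String REPLACEMENT = "$1$2$3, $1";

    // Appends the leading letter group to the end of the row.
    // Rows that do not match the pattern are returned unchanged.
    static String appendKey(String line) {
        return line.replaceAll(REGEX, REPLACEMENT);
    }

    public static void main(String[] args) {
        String[] sample = {
            "ABCD", "ABCD123", "ABCD456, 123",
            "XYZ", "XYZ890", "XYZ123, 890"
        };
        for (String line : sample) {
            System.out.println(appendKey(line));
        }
    }
}
```

Rows that contain anything outside this shape (for example, quoted dates or lowercase text) pass through untouched, so the pattern would need extending for such data.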
For better performance, read 100 or more lines at a time into a buffer and call String#replaceAll() once per batch instead of once per line.
Sample code:
String regex = "([A-Z]+)([0-9]*)(, [0-9]+)?(\r?\n|$)";
String replacement = "$1$2$3, $1$4";
StringBuilder builder = new StringBuilder();
int counter = 0;
String line = null;
try (BufferedReader reader = new BufferedReader(new FileReader("abc.csv"))) {
while ((line = reader.readLine()) != null) {
builder.append(line).append(System.lineSeparator());
if (++counter % 100 == 0) { // flush after every 100 lines
String newLine = builder.toString().replaceAll(regex, replacement);
System.out.print(newLine);
builder.setLength(0); // reset the buffer
}
}
}
if (builder.length() > 0) {
String newLine = builder.toString().replaceAll(regex, replacement);
System.out.print(newLine);
}
Read more about Java 7 - The try-with-resources Statement

Splitting array on new line

I am submitting the following input through stdin:
4 2
30 one
30 two
15 three
25 four
My code is:
public static void main(String[] args) throws IOException {
BufferedReader stdin = new BufferedReader(new InputStreamReader(System.in));
String submittedString;
System.out.flush();
submittedString = stdin.readLine();
zipfpuzzle mySolver = new zipfpuzzle();
mySolver.getTopSongs(submittedString);
}
Which calls:
//Bringing it all together
public String getTopSongs(String myString) {
setUp(myString);
calculateQuality();
qualitySort();
return titleSort();
}
Which calls
public void setUp(String myString) {
String tempString = myString;
//Creating array where each element is a line
String[] lineExplode = tempString.split("\\n+");
//Setting up numSongsAlbum and songsToSelect
String[] firstLine = lineExplode[0].split(" ");
numSongsAlbum = Integer.parseInt(firstLine[0]);
songsToSelect = Integer.parseInt(firstLine[1]);
System.out.println(lineExplode.length);
//etc
}
However, for some reason lineExplode.length returns value 1... Any suggestions?
Kind Regards,
Dario

String[] lineExplode = tempString.split("\\n+");
The argument to String#split is a String that contains a regular expression

Your String#split regex will work fine on Strings with newline characters.
String[] lineExplode = tempString.split("\n");
The problem is that your tempString contains no newline characters, because readLine() returns a single line without its terminator; hence the size of the array is 1.
Why not just put the readLine in a loop and add the Strings to an ArrayList?
String submittedString;
// Stop at end of input (null) or at the first empty line
while ((submittedString = stdin.readLine()) != null && !submittedString.isEmpty()) {
myArrayList.add(submittedString);
}

Are you sure the file is using UNIX-style line endings (\n)? For a cross-platform split, use:
String[] lineExplode = tempString.split("[\\n\\r]+");
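On Java 8 or later, the regex linebreak matcher \R is another option: it matches \r\n as a unit as well as a lone \n or \r, so the characters do not have to be listed by hand. A small sketch with a hypothetical class name:

```java
import java.util.Arrays;

class LineSplit {
    // Splits on one or more line breaks of any style (Java 8+).
    static String[] splitLines(String text) {
        return text.split("\\R+");
    }

    public static void main(String[] args) {
        // Mixed Windows, Unix and old Mac line endings in one string.
        String text = "4 2\r\n30 one\n30 two\r15 three";
        System.out.println(Arrays.toString(splitLines(text)));
    }
}
```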

Use "\\n" to split on newlines, but note that not every OS uses the same line separator ( http://en.wikipedia.org/wiki/Newline ).
The system property line.separator is useful here: it contains the separator character(s) of the OS the application is running on.
You can use:
String[] lineExplode = tempString.split("\\n");
which uses \n as the separator.
Or:
String lineSeparator = System.getProperty("line.separator");
String[] lineExplode = tempString.split(lineSeparator);
which uses the current OS separator.
Or:
String lineSeparator = System.getProperty("line.separator");
String[] lineExplode = tempString.split("(?:" + lineSeparator + ")+");
which uses the current OS separator and treats consecutive separators as one.

It's better to split this way:
String[] lineExplode =
tempString.split("(?:" + Pattern.quote(System.getProperty("line.separator")) + ")+");
This keeps the split on newlines platform independent.
UPDATE: After looking at your posted code it is clear that OP is reading just one line (till \n) in this line:
submittedString = stdin.readLine();
and there is no loop to read further lines from input.
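The missing loop could look like the sketch below: read every line from the input first, then parse the header line. It assumes the first line holds two integers, as in the sample input; the class and method names are hypothetical, and BufferedReader.lines() (Java 8+) replaces the explicit readLine() loop.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.List;
import java.util.stream.Collectors;

class StdinLines {
    // Collects every line of the input; lines() keeps calling readLine()
    // until end of input, so nothing after the first line is lost.
    static List<String> readAll(Reader source) {
        BufferedReader reader = new BufferedReader(source);
        return reader.lines().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = readAll(new InputStreamReader(System.in));
        if (lines.isEmpty()) {
            return; // nothing was submitted
        }
        String[] firstLine = lines.get(0).trim().split("\\s+");
        if (firstLine.length < 2) {
            return; // header line does not have the expected two numbers
        }
        int numSongsAlbum = Integer.parseInt(firstLine[0]);
        int songsToSelect = Integer.parseInt(firstLine[1]);
        System.out.println(numSongsAlbum + " songs, select " + songsToSelect
                + ", " + (lines.size() - 1) + " data lines read");
    }
}
```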

reading from text file to string array

I can search for a string in my text file, but I also want to sort the data in this ArrayList and implement an algorithm. Is it possible to read a text file and store the values [Strings] it contains in a String[] array?
Also, is it possible to separate the Strings? So instead of my array having:
[Alice was beginning to get very tired of sitting by her sister on the, bank, and of having nothing to do:]
is it possible to get an array such as:
["Alice", "was", "beginning", "to", "get", ...]
public static void main(String[]args) throws IOException
{
Scanner scan = new Scanner(System.in);
String stringSearch = scan.nextLine();
BufferedReader reader = new BufferedReader(new FileReader("File1.txt"));
List<String> words = new ArrayList<String>();
String line;
while ((line = reader.readLine()) != null) {
words.add(line);
}
for(String sLine : words)
{
if (sLine.contains(stringSearch))
{
int index = words.indexOf(sLine);
System.out.println("Got a match at line " + index);
}
}
//Collections.sort(words);
//for (String str: words)
// System.out.println(str);
int size = words.size();
System.out.println("There are " + size + " Lines of text in this text file.");
reader.close();
System.out.println(words);
}

To split a line into an array of words, use this:
String[] words = sentence.split("[^\\w']+");
The regex [^\w'] means "not a word char or an apostrophe"
This will capture words with embedded apostrophes like "can't" and skip over all punctuation.
Edit:
A comment has raised the edge case of parsing a quoted word such as 'this' as this.
Here's the solution for that - you have to first remove wrapping quotes:
String[] words = input.replaceAll("(^|\\s)'([\\w']+)'(\\s|$)", "$1$2$3").split("[^\\w']+");
Here's some test code with edge and corner cases:
public static void main(String[] args) throws Exception {
String input = "'I', ie \"me\", can't extract 'can't' or 'can't'";
String[] words = input.replaceAll("(^|[^\\w'])'([\\w']+)'([^\\w']|$)", "$1$2$3").split("[^\\w']+");
System.out.println(Arrays.toString(words));
}
Output:
[I, ie, me, can't, extract, can't, or, can't]

Also is it possible to separate the Strings?
Yes, you can split a string on whitespace and punctuation like this:
String[] strSplit;
String str = "This is test for split";
strSplit = str.split("[\\s,;!?\"]+");
See String API
Moreover you can also read a text file word by word.
Scanner scan = null;
try {
scan = new Scanner(new BufferedReader(new FileReader("Your File Path")));
} catch (FileNotFoundException e) {
e.printStackTrace();
}
while(scan.hasNext()){
System.out.println( scan.next() );
}
See Scanner API
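Putting the pieces together, here is a minimal sketch that turns text into a String[] of words and sorts it. It reuses the [^\w']+ split from the earlier answer; the class and method names are hypothetical, and in a real program the text would come from File1.txt rather than a literal.

```java
import java.util.Arrays;

class WordSplit {
    // Splits on anything that is not a word character or apostrophe and
    // drops the empty string that split() yields for a leading delimiter.
    static String[] toWords(String text) {
        return Arrays.stream(text.split("[^\\w']+"))
                     .filter(w -> !w.isEmpty())
                     .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String text = "Alice was beginning to get very tired of sitting by her sister on the\n"
                    + "bank, and of having nothing to do:";
        String[] words = toWords(text);
        System.out.println(Arrays.toString(words));

        // Sort the words, ignoring case.
        Arrays.sort(words, String.CASE_INSENSITIVE_ORDER);
        System.out.println(Arrays.toString(words));
    }
}
```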
