Scanner's delimiter and regex in Java

I'm trying to read input word by word, but I couldn't figure out how to set Scanner's delimiter to match whitespace and punctuation marks except ' (the single quote).
Here's what I have so far:
BufferedReader input;
String line;
Scanner sc;
String word;
try {
input = new BufferedReader(new FileReader(path));
while ((line = input.readLine()) != null) { // ready() may return false before the end of the file
System.out.println("Current Line: " + line);
sc = new Scanner(line);
sc.useDelimiter("\\W\\s^\'");
//...
}
}
//...

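Your pattern "\\W\\s^\'" can never match. It asks for a non-word character, then a whitespace character, then ^ (which outside a character class is the start-of-input anchor, not a negation), then a quote. As a result, each whole line comes back as a single token.
I would put the negation inside a character class instead, so that any run of characters that are neither word characters nor ' counts as a delimiter:
sc.useDelimiter("[^\\w']+");
For example: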
String line= "Hello, world!\n 'Computer\n \n Science'\n Hell\n";
System.out.println(Arrays.toString(line.split("[^\\w']+")));
prints
[Hello, world, 'Computer, Science', Hell]
String line= "Hello, world!\n 'Computer\n \n Science'\n Hell\n";
Scanner scan = new Scanner(line);
scan.useDelimiter("[^\\w']+");
while(scan.hasNext())
System.out.print("|"+scan.next());
System.out.println("|");
prints
|Hello|world|'Computer|Science'|Hell|
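To see the difference side by side, here is a small sketch that runs the same Scanner loop once with the question's pattern and once with the character-class pattern. The class and method names (DelimiterDemo, tokens) are only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class DelimiterDemo {
    // Collects every token the Scanner produces for the given delimiter pattern.
    static List<String> tokens(String text, String delimiter) {
        List<String> result = new ArrayList<>();
        try (Scanner sc = new Scanner(text)) {
            sc.useDelimiter(delimiter);
            while (sc.hasNext()) {
                result.add(sc.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "Hello, world! It's 'quoted'.";
        // Question's pattern: "^" is a start-of-input anchor, so it never matches mid-line.
        System.out.println(tokens(line, "\\W\\s^\'"));
        // Negated character class: any run of characters that are neither word chars nor '.
        System.out.println(tokens(line, "[^\\w']+"));
    }
}
```

With the original pattern the whole line comes back as one token. With [^\\w']+ the apostrophe in It's and the quotes around 'quoted' are kept, while the trailing period is consumed as a delimiter.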

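You can also use StringTokenizer. Note that by default it splits only on whitespace (space, tab, newline, carriage return, form feed), so pass the delimiter characters explicitly:
StringTokenizer st1 = new StringTokenizer("a|b|c", "|");
while (st1.hasMoreTokens()) {
System.out.println(st1.nextToken());
}
Hope that helps in your case.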

Related

How to break a file into tokens based on regex using Java

I have a file in the following format. Records are separated by newlines, but some records contain line feeds themselves, like below. I need to get each record and process it separately. The file could be a few MB in size.
<?aaaaa>
<?bbbb
bb>
<?cccccc>
I have the code:
FileInputStream fs = new FileInputStream(FILE_PATH_NAME);
Scanner scanner = new Scanner(fs);
scanner.useDelimiter(Pattern.compile("<\\?"));
while (scanner.hasNext()) {
String line = scanner.next();
System.out.println(line);
}
scanner.close();
But in the result, the leading <? of each record is removed:
aaaaa>
bbbb
bb>
cccccc>
I know the Scanner consumes any input that matches the delimiter pattern. All I can think of is to add the delimiter back to each record manually.
Is there a way to keep the delimiter pattern from being removed?
Break on a newline only when preceded by a ">" char:
scanner.useDelimiter("(?<=>)\\R"); // Note you can pass a string directly
\R matches any line break sequence, including \r\n, regardless of platform (available since Java 8)
(?<=>) is a look behind that asserts (without consuming) that the previous char is a >
Plus it's cool because <=> looks like Darth Vader's TIE fighter.
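A self-contained sketch of that answer, run on an in-memory string instead of the file (records is a made-up helper name; for the real file you would construct the Scanner from the FileInputStream as in the question):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class RecordSplitter {
    // Splits on a line break only when the previous character is '>'.
    // The lookbehind does not consume the '>', so each record keeps it,
    // and the "<?" prefix is never part of the delimiter.
    static List<String> records(String data) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(data)) {
            scanner.useDelimiter("(?<=>)\\R");
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String data = "<?aaaaa>\n<?bbbb\nbb>\n<?cccccc>\n";
        for (String record : records(data)) {
            System.out.println("[" + record + "]");
        }
    }
}
```

The line break inside the second record is not preceded by >, so it stays part of that record.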
I'm assuming you want to ignore the newline character '\n' everywhere.
I would read the whole file into a String and then remove every '\n' from it. The relevant part of the code looks like this:
String fileString = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
fileString = fileString.replace("\n", ""); // files with Windows line endings may also need "\r" removed
Scanner scanner = new Scanner(fileString);
... //your code
Feel free to ask any further questions you might have!
Here is one way of doing it by using a StringBuilder:
public static void main(String[] args) throws FileNotFoundException {
try (Scanner in = new Scanner(new File("C:\\test.txt"))) {
StringBuilder builder = new StringBuilder();
while (in.hasNextLine()) {
String input = in.nextLine();
// A record can continue on the next line, so join the pieces with a space
if (builder.length() > 0) {
builder.append(' ');
}
for (int x = 0; x < input.length(); x++) {
builder.append(input.charAt(x));
if (input.charAt(x) == '>') {
// '>' ends a record: print it and start collecting the next one
System.out.println(builder.toString());
builder = new StringBuilder();
}
}
}
}
}
Input:
<?aaaaa>
<?bbbb
bb>
<?cccccc>
Output:
<?aaaaa>
<?bbbb bb>
<?cccccc>

How to read file line by line by CRLF

I have the following file:
and the following code:
Scanner scanner = new Scanner(new FileReader(new File("file.txt")));
scanner.useDelimiter("\r\n");
int i = 0;
while (scanner.hasNext()) {
scanner.nextLine();
i++;
}
System.out.println(i);
It returns 5, but I expected 2.
What am I doing wrong? I want to split by CRLF only (not by LF).
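Use scanner.next() so that the delimiter you specified is actually used.
scanner.nextLine() ignores the delimiter you set and splits on any line separator instead (the exact pattern is \r\n|[\n\r\u2028\u2029\u0085]), so lone LFs count as line ends too, hence the result of 5.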
while (scanner.hasNext()) {
scanner.next();
i++;
}
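The difference is easy to reproduce without a file. This sketch (class and method names are made up for the example) counts the same text both ways; each CRLF-terminated record contains a bare LF:

```java
import java.util.Scanner;

public class CrlfCount {
    // Counts tokens with next(), which honors the custom CRLF delimiter.
    static int countWithNext(String text) {
        int count = 0;
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter("\r\n");
            while (scanner.hasNext()) {
                scanner.next();
                count++;
            }
        }
        return count;
    }

    // Counts with nextLine(), which ignores the delimiter and splits on any line separator.
    static int countWithNextLine(String text) {
        int count = 0;
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter("\r\n");
            while (scanner.hasNextLine()) {
                scanner.nextLine();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "first\npart\r\nsecond\npart\r\n";
        System.out.println(countWithNext(text));     // 2
        System.out.println(countWithNextLine(text)); // 4
    }
}
```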

Java Scanner does not ignore new lines (\n)

I know that by default, the Scanner skips over whitespaces and newlines.
There is something wrong with my code because my Scanner does not ignore "\n".
For example, if the input is "this is\na test.", the desired output should be "this is a test."
This is what I did so far:
Scanner scan = new Scanner(System.in);
String token = scan.nextLine();
String[] output = token.split("\\s+");
for (int i = 0; i < output.length; i++) {
if (hashmap.containsKey(output[i])) {
output[i] = hashmap.get(output[i]);
}
System.out.print(output[i]);
if (i != output.length - 1) {
System.out.print(" ");
}
}
nextLine() ignores the specified delimiter (as optionally set by useDelimiter()), and reads to the end of the current line.
Since the input is two lines:
this is
a test.
only the first line (this is) is returned.
You then split that on whitespace, so output will contain [this, is].
Since you never use the scanner again, the second line (a test.) will never be read.
In essence, your title is right on point: Java Scanner does not ignore new lines (\n)
It specifically processed the newline when you called nextLine().
You don't have to use a Scanner to do this:
BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
String result = in.lines().collect(Collectors.joining(" "));
Or, if you really want to use a Scanner, this should also work:
Scanner scanner = new Scanner(System.in);
Spliterator<String> si = Spliterators.spliteratorUnknownSize(scanner, Spliterator.ORDERED);
String result = StreamSupport.stream(si, false).collect(Collectors.joining(" "));
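On Java 9 and later, Scanner also has a tokens() method that returns the tokens as a Stream<String>, which avoids the Spliterator plumbing. The sketch below also applies a word replacement like the question's loop does; the replacements map is a hypothetical stand-in for the question's hashmap, whose contents are not shown, and the class name is illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;
import java.util.stream.Collectors;

public class ReplaceWords {
    // Reads all whitespace-separated tokens (line breaks count as whitespace),
    // swaps any word found in the map, and rejoins everything with single spaces.
    static String translate(String input, Map<String, String> replacements) {
        try (Scanner scanner = new Scanner(input)) {
            return scanner.tokens()
                    .map(word -> replacements.getOrDefault(word, word))
                    .collect(Collectors.joining(" "));
        }
    }

    public static void main(String[] args) {
        Map<String, String> replacements = new HashMap<>(); // hypothetical stand-in for "hashmap"
        replacements.put("is", "was");
        System.out.println(translate("this is\na test.", replacements)); // this was a test.
    }
}
```

To read from the console instead, construct the Scanner from System.in; the stream then ends when the input is closed (end of file).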

Java Delete Line from File

The code below is adapted from a reference I saw online, so there may be some similarities. I'm trying to remove an entire line from a file whose fields are delimited by "|", based on the first field (in this instance aaaa or bbbb), but it is not working. Can someone advise me? Do I need to split the line first, or is my method wrong?
Example data in players.dat:
bbbb|aaaaa|cccc
aaaa|bbbbbb|cccc
Code is below
public class testcode {
public static void main(String[] args)throws IOException
{
File inputFile = new File("players.dat");
File tempFile = new File ("temp.dat");
BufferedReader read = new BufferedReader(new FileReader(inputFile));
BufferedWriter write = new BufferedWriter(new FileWriter(tempFile));
Scanner UserInput = new Scanner(System.in);
System.out.println("Please Enter Username:");
String UserIn = UserInput.nextLine();
String lineToRemove = UserIn;
String currentLine;
while((currentLine = read.readLine()) != null) {
// trim newline when comparing with lineToRemove
String trimmedLine = currentLine.trim();
if(trimmedLine.equals(lineToRemove)) continue;
write.write(currentLine + System.getProperty("line.separator"));
}
write.close();
read.close();
boolean success = tempFile.renameTo(inputFile);
}
}
Your code compares the entire line it reads from the file to the user name the user enters, but you say in your question that you actually only want to compare to the first part up to the first pipe (|). Your code doesn't do that.
What you need to do is read the line from the file, get the part of the string up to the first pipe symbol (split the string) and skip the line based on comparing the first part of the split string to the lineToRemove variable.
To make it easier, you could also append the pipe symbol to the user input and then do this:
String lineToRemove = UserIn + "|";
...
if (trimmedLine.startsWith(lineToRemove)) continue;
This spares you from splitting the string, and the trailing pipe keeps a username such as aaaa from also matching a line that starts with aaaaa.
Scanner.nextLine() returns the line without its line separator, but the user may still type stray spaces, so to be safe you could change the above to:
String lineToRemove = UserIn.trim() + "|";
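For comparison, here is a sketch of the split-based approach using java.nio.file.Files instead of a temp file. The class and method names are illustrative, and it assumes the file is UTF-8 and small enough to hold in memory:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Scanner;
import java.util.stream.Collectors;

public class RemovePlayer {
    // Keeps every line whose first "|"-separated field is not the given username.
    static List<String> withoutUser(List<String> lines, String username) {
        String target = username.trim();
        return lines.stream()
                .filter(line -> !line.split("\\|", 2)[0].equals(target))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get("players.dat");
        if (!Files.exists(file)) {
            System.out.println("players.dat not found");
            return;
        }
        Scanner userInput = new Scanner(System.in);
        System.out.println("Please Enter Username:");
        String username = userInput.nextLine();
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        // Overwrites the file in place with the remaining lines
        Files.write(file, withoutUser(lines, username), StandardCharsets.UTF_8);
    }
}
```

Splitting with a limit of 2 separates only the first field, and comparing the whole field means aaaa does not also remove a line starting with aaaaa.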

Java parsing text file and preserving line breaks?

I have been researching how to do this and am becoming a bit confused. I have tried Scanner so far, but it does not seem to preserve line breaks, and I can't figure out how to make it detect whether a line is empty. I would appreciate any advice. I have been using the Scanner class as below, but I am not sure how to check whether a line is empty. Thanks
for (String fileName : f.list()) {
fileCount++;
Scanner sc = new Scanner(new File(f, fileName));
int count = 0;
String outputFileText = "";
//System.out.println(fileCount);
String text="";
while (sc.hasNext()) {
String line = sc.nextLine();
}
}
If you're just trying to read the file, I would suggest using LineNumberReader instead.
LineNumberReader lnr = new LineNumberReader(new FileReader(f));
String line;
while ((line = lnr.readLine()) != null) {
/* do stuff; lnr.getLineNumber() gives the current line number */
}
Scanner's nextLine() already splits the input into lines for you, returning an empty String for a blank line. You just have to scan each line again to get your values:
Scanner lineScanner;
while (sc.hasNextLine()) // hasNext() would stop early if only blank lines remain
{
String nextInputLine = sc.nextLine();
if (nextInputLine.isEmpty())
{
// this was a blank line
}
lineScanner = new Scanner(nextInputLine);
while (lineScanner.hasNext())
{
String value = lineScanner.next(); // read the values
}
}
You probably want to use BufferedReader#readLine.
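To preserve line breaks, one option is to read with BufferedReader#readLine() and append a line break yourself after each line, since readLine() strips the terminator; a blank line then shows up as an empty string. A minimal sketch, assuming normalizing every line ending to \n is acceptable (copyLines is a made-up name, and a StringReader stands in for the FileReader):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class PreserveLines {
    // Copies text line by line, re-adding "\n" after each line so blank lines survive.
    static String copyLines(Reader source) {
        StringBuilder out = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            int lineNumber = 0;
            while ((line = reader.readLine()) != null) {
                lineNumber++;
                if (line.isEmpty()) {
                    // An empty string means the input had a blank line here.
                    System.out.println("Line " + lineNumber + " is blank");
                }
                out.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(copyLines(new StringReader("first\n\nthird\n")));
    }
}
```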
