Java or Eclipse 'fails' on Whitespaces(?)

After many questions asked by other users this is my first one for which I was not able to find a fitting answer.
The problem sounds weird, and it actually is:
I have had more than one situation in which whitespace was part of the problem, and the common solutions found on Stack Overflow or elsewhere did not help me.
First I wanted to split a String on whitespaces. Should be something like
String[] str = input.split(" ")
But neither (" ") nor a regex like ("\\s+") worked for me. That was not really a problem; I just chose a different character to split on. :)
Now I'm trying to clean up a string by removing all whitespace. The commonly suggested solution is
String str = input.replaceAll(" ", "")
I tried the regex again, and also (" *", ""), to prevent an exception if the string includes no whitespace. Again, none of these worked for me.
Now I'm asking myself whether this is some weird problem with my Java/Eclipse platform or whether I'm doing something fundamentally wrong. I do not think so, because all the code above works fine with any other character to split or clean on.
Hope to have made myself understood.
Regards Drebin
edit to make it clearer:
I'm caring just about the "replacing" right now.
My code does accept a combination of values and names separated by comma and series of these separated by semicolon, e.g.:
1,abc;2,def;3,ghi
This gets split twice, first on comma, then on semicolon. Works fine.
Now I want to clean such an input by removing all whitespace before processing it as explained above. For that I use, as already mentioned, String.replaceAll(" ", ""), but it does NOT work. Instead, everything in the string after the FIRST whitespace, no matter where it is, gets removed and is lost. E.g. the String from above would change to
1,abc;
if there is whitespace after the first semicolon.
Hope this part of code works for you:
import java.util.*;
public class Main {
public static void main(String[] args) {
// some info output
Scanner scan = new Scanner(System.in);
String input;
System.out.println("\n wait for input ...");
input = scan.next();
if(input.equals("info"))
{
// special case for information
}
else if(input.equals("end"))
{
scan.close();
System.exit(0);
}
else
{
// here is the problem:
String input2 = input.replaceAll(" ", "");
System.out.println("DEBUG: "+input2);
// further code for cleared Strings
}
}
}
I really do not know how to make it even clearer now ...

The next method of Scanner returns the next token - with the default delimiters that will be a single word, not the complete line.
Use the nextLine method if you want to get the complete line.
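A minimal sketch of the difference, assuming the typed line is `1,abc; 2,def;3,ghi` (a made-up input with a space after the first semicolon). The class and method names are illustrative only, and the scanner reads from a String instead of System.in so the example is self-contained:

```java
import java.util.Scanner;

class ScannerLineDemo {
    // Mirrors the question's code: next() stops at the first whitespace.
    static String firstToken(String text) {
        try (Scanner scan = new Scanner(text)) {
            return scan.next();
        }
    }

    // Reads the complete line first, then removes every whitespace character.
    static String cleanedLine(String text) {
        try (Scanner scan = new Scanner(text)) {
            return scan.nextLine().replaceAll("\\s+", "");
        }
    }

    public static void main(String[] args) {
        String typed = "1,abc; 2,def;3,ghi";
        System.out.println(firstToken(typed));  // prints 1,abc;
        System.out.println(cleanedLine(typed)); // prints 1,abc;2,def;3,ghi
    }
}
```

So replaceAll was never the culprit: the text after the first space had already been left behind by next() before replaceAll ran.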


(Java) Trying to read a txt file and count the number of occurrences for each word

I am supposed to write a program that reads a file called mobydick.txt. The file contains the entire text of Moby Dick the book. The mobydick.txt file looks like this
I have to read the file, display every unique word in the file and then display the number of occurrences of each unique word.
The output should look like:
WORD Number
the 43
whale 12
boat 93
This is my code so far:
import java.util.*;
import java.io.*;
public class Main
{
public static void main(String[] args) throws IOException
{
//Create input stream & scanner
FileInputStream fin = new FileInputStream("mobydick.txt");
Scanner fileInput = new Scanner(fin);
//Create Arraylist
ArrayList<String> words = new ArrayList<String>();
ArrayList<Integer> count = new ArrayList<Integer>();
//Read through file and find the words
while(fileInput.hasNext())
{
//Get next word
String nextWord = fileInput.next();
//Determine if the word is in the arraylist
if(words.contains(nextWord))
{
int index = words.indexOf(nextWord);
count.set(index, count.get(index) + 1);
}
else
{
words.add(nextWord);
count.add(1);
}
}
//close
fileInput.close();
fin.close();
System.out.println("WORDS COUNT");
//Print out the results
for(int i = 0; i < words.size(); i++)
{
System.out.print(words.get(i) + " " + count.get(i) + "\n");
}
}
}
However, when I run this code I get a strange looking output.
It's strange because if I run the same code for a smaller and simpler text file like this, the output looks exactly like I want it to.
What am I doing wrong with the mobydick.txt?
Just look at the text input file. It contains, for example, ago-never. Computer tools for programmers tend to be extremely stupid, because us programmers need them to be extremely simple. Scanner splits on whitespace. Period. - is not whitespace. Scanner is dutifully giving you ago-never as a single token. If the book contains Cosmic said: "Sheesh, this coding stuff is hard, man!"., then these are the tokens that scanner is going to give you:
Cosmic
said:
"Sheesh,
this
coding
stuff
is
hard,
man!".
Which is obviously not what you wanted. You wanted for example man. Not man!"..
A second issue is that text files are files, and therefore, bag-o-bytes. bytes aren't characters. So, when you turn your file into a scanner, you're implicitly asking the computer to take a wild stab at how to do that, and wild stab it will: It will use 'platform default encoding', which is java-ese for 'never what you want'. There is no easy answer here. Somebody needs to investigate or tell you what the encoding is. It's probably UTF-8. In which case, you gotta tell java about that:
new Scanner(fin, "UTF-8")
you didn't, so java picked 'platform default encoding', which is some arbitrary and generally wrong choice, and thus something like 'HaƤgen Dasz' messes up - only the most basic characters tend to survive conversion with the wrong charset encoding.
As to how to solve that first problem, possibly all you really need is to tell scanner that you want the 'thing that is between tokens' to be 'any amount of non-letters'. The delimiter is a regexp which is presumably a concept you haven't been taught yet; it's quite complicated. The regexp \W+ represents the notion of: "1 or more 'non-word' characters", and that as separator would mean that the sequence of exclamation point, quote, dot, newline - all disappear as merely the thing that separates tokens. - is also not a letter, so, ago-never in the input file would then give you two tokens: ago, and never.
You should still lowercase the inputs, scanners cannot do this for you.
To set the delimiter:
scanner.useDelimiter("\\W+"); // double backslash. That's not a typo.
EDIT: This answer used [^a-zA-Z]+ before, but as @VGR pointed out in a comment, \\W+ is easier to understand; it's probably more idiomatic in general.
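Putting the pieces together, here is a rough sketch of a word counter using the \\W+ delimiter and lowercasing. It swaps the question's two parallel ArrayLists for a TreeMap, which is my substitution rather than anything the question requires, and it scans a String instead of mobydick.txt so it runs on its own. The class and method names are hypothetical:

```java
import java.util.Locale;
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

class WordCountDemo {
    // Counts case-insensitive words; anything that is not a word character separates tokens.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter("\\W+");
            while (scanner.hasNext()) {
                String word = scanner.next().toLowerCase(Locale.ROOT);
                // merge() inserts 1 for a new word, otherwise adds 1 to the existing count.
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("The whale! The boat, the WHALE.");
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println(entry.getKey() + " " + entry.getValue());
        }
    }
}
```

For the real file you would construct the scanner from the stream with an explicit charset, as in new Scanner(fin, "UTF-8") above. Note that \W treats digits and underscores as word characters and splits on apostrophes, so a word like don't becomes two tokens.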

Java Splitting a string with multiple delimiters, some of which are 2-character sequences

long-time reader here but first-time poster! I am working on a college project that involves using Java to manipulate transcriptions of traditional music melodies written in the text-based abc notation standard (see here for a quick explainer on the abc standard, if you are interested).
I want to take the body of a whole tune transcription which is represented as a String, and split it into individual bars (i.e. into an array of Strings, one String for each bar). The abc standard has a number of different symbols and combinations of symbols that are used to delimit bars. These symbols are:
|
|]
||
[|
|:
:|
::
My idea was to use a regular expression with the String.split() method to break the tuneBody String below into the arrayOfBars array of Strings. My regex is below, and is intended to try to find any of the above symbols that can be used to delimit a bar in the music.
import java.util.Arrays;
public class TroubleshootRegex
{
//Split the tuneBody into individual bars
public static void main(String[] args)
{
//The musical notes from an abc tune transcription
String tuneBody = "|:G3 GAB|A3 ABd|edd gdd|edB dBA|\nGAG GAB|ABA ABd|edd gdd|BAF G3:|\nB2B d2d|ege dBA|B2B dBG|ABA AGA|\nBAB d^cd|ege dBd|gfg aga|bgg g3:|";
//The body of the tune after being split into individual bars
String[] arrayOfBars;
//This regex is my attempt to look for all the possible bar delimiters defined in the abc standard
String abcBarDelimiters = "[\\|]|\\|\\||\\[\\||\\|:|:\\||::|\\|]";
arrayOfBars = tuneBody.split(abcBarDelimiters);
System.out.println(Arrays.toString(arrayOfBars));
}
}
Unfortunately, when I run the above, I end up with a couple of issues. One of the issues is that I get an empty string at the start of the array, but a bit of research shows me that that's a known issue so I'll figure out a way to work around that. The bigger issue though that I can't seem to figure out on my own is that I end up with a colon included in the first bar of the music, whereas this should be filtered out as part of the initial delimiter when splitting the string if everything worked as intended. i.e. I want the initial "|:" delimiter from tuneBody to be removed during the string splitting. Here's the output:
[, :G3 GAB, A3 ABd, edd gdd, edB dBA,
GAG GAB, ABA ABd, edd gdd, BAF G3,
B2B d2d, ege dBA, B2B dBG, ABA AGA,
BAB d^cd, ege dBd, gfg aga, bgg g3]
I'm assuming that means that I probably have some kind of problem in my regex, but for the life of me I can't seem to figure out what the actual problem is, and I'm starting to go cross-eyed looking at it! It seems that it is matching the single pipe character at the start as a delimiter, rather than matching the character sequence |:
I'd be massively grateful if anyone who actually knows a bit about regexes can tell me why mine doesn't seem to do what I want, or how to get it to see the |: sequence as a whole as a delimiter, rather than a delimiter followed by a colon.
Thanks in advance!
One of the issues is that I get an empty string at the start of the array, but a bit of research shows me that that's a known issue so I'll figure out a way to work around that.
The problem is that your string starts with a delimiter so it will create an empty string as the first element of the split. The same would happen if you have two consecutive delimiters, e.g. ...|::|.... To solve that you could remove the empty strings you don't want, e.g. by using a list instead of an array.
The bigger issue though that I can't seem to figure out on my own is that I end up with a colon included in the first bar of the music, whereas this should be filtered out as part of the initial delimiter when splitting the string if everything worked as intended. i.e. I want the initial "|:" delimiter from tuneBody to be removed during the string splitting.
I'm not entirely sure here (but pretty sure): the problem is that the single pipe is the first option in your regex and thus it matches the pipe in |:. To fix that it should be sufficient to put the single pipe at the end.
You can also simplify your regex since you don't need character classes. Thus this should work:
String abcBarDelimiters = "\\|\\||\\[\\||\\|:|:\\||::|\\|\\]|\\|";
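As a small illustration of both points, the sketch below uses the reordered regex and drops empty pieces by collecting into a list. The trim() call, which also strips the newline at the start of a music line, is my own addition, and the class and method names are made up for the example:

```java
import java.util.ArrayList;
import java.util.List;

class BarSplitDemo {
    // Two-character delimiters are listed first; the single pipe is the last alternative,
    // so "|:" is consumed as a whole before a lone "|" gets a chance to match.
    static final String BAR_DELIMITERS = "\\|\\||\\[\\||\\|:|:\\||::|\\|\\]|\\|";

    static List<String> splitBars(String tuneBody) {
        List<String> bars = new ArrayList<>();
        for (String piece : tuneBody.split(BAR_DELIMITERS)) {
            String bar = piece.trim();
            if (!bar.isEmpty()) { // skip the empty string produced by a leading delimiter
                bars.add(bar);
            }
        }
        return bars;
    }

    public static void main(String[] args) {
        // prints [G3 GAB, A3 ABd, GAG GAB, BAF G3]
        System.out.println(splitBars("|:G3 GAB|A3 ABd|\nGAG GAB|BAF G3:|"));
    }
}
```

The regex engine tries alternatives from left to right at each position and takes the first one that matches, which is why the order of the alternatives matters here.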
To go easier on a regex beginner's eyes, try the following:
public static void main(String[] args) {
//The musical notes from an abc tune transcription
String tuneBody = "|:G3 GAB|A3 ABd|edd gdd|edB dBA|\nGAG GAB|ABA ABd|edd gdd|BAF G3:|\nB2B d2d|ege dBA|B2B dBG|ABA AGA|\nBAB d^cd|ege dBd|gfg aga|bgg g3:|";
//The body of the tune after being split into individual bars
String re1 = "\\|[\\]\\||:]?"; // |, |], ||, |:
String re2 = "\\[\\|"; // [|
String re3 = ":[\\|:]"; // :|, ::
String abcBarDelimiters = "(" + re1 + "|" + re2 + "|" + re3 + ")";
String[] arrayOfBars = tuneBody.split(abcBarDelimiters);
System.out.println(Arrays.toString(arrayOfBars));
}
... and as Thomas already said, the empty string at the beginning is due to the input starting with a delimiter.

How to make a space between a line in Java?

System.out.print("I have a question, can you assist me?");
System.out.println();
System.out.println("How can I make a gap between these two statements?");
I tried to use println(), thinking that it would create a blank line, but it didn't.
Try:
public class Main {
public static void main(String args[]) {
System.out.println("I have a question, can you assist me?\n");
System.out.println("How can I make a gap between these two statements?");
}
}
P.S. \n is the newline character and works fine at least on a Windows machine. For a truly cross-platform separator, use one of the methods below:
System.out.print("Hello" + System.lineSeparator()); // (Java 7 and later)
System.out.print("Hello" + System.getProperty("line.separator")); // (Java 6 and below)
Here's what's going on.
System.out.print("I have a question, can you assist me?");
You have now printed a bunch of characters, all on the same line. As you have used print and have not explicitly printed a newline character, the next character printed will also go onto this same line.
System.out.println();
This prints a newline character ('\n'), which is not the same as printing a blank line. Rather, it will cause the next character printed to go onto the line following the current one.
System.out.println("How can I make a gap between these two statements?");
Since you just printed a newline character, this text will go onto the line directly following your "I have a question" line. Also, since you have called println, if you print anything right after this, it will go onto a new line instead of the same one.
To put a blank line between the two statements, you can do this (I know, I know, not entirely cross-platform, but this is just a very simple example):
System.out.println("I have a question, can you assist me?");
System.out.println();
System.out.println("How can I make a gap between these two statements?");
Since you are now printing two newline characters between the two lines, you'll achieve the gap that you wanted.
Beware that adding a bare "\n" to the string you are outputting is liable to make your code platform specific. For console output, this is probably OK, but if the file is read by another (platform native) application then you can get strange errors.
Here are some recommended approaches that should work on all platforms:
Just use println consistently:
System.out.println("I have a question, can you assist me?");
System.out.println();
System.out.println("How can I make a gap?");
Note: println uses the platform default end-of-line sequence.
Use String.format:
String msg = "I have a question, can you assist me?%n%nHow can " +
"I make a gap?%n";
System.out.print(String.format(msg));
Note: %n means the platform default end-of-line sequence.
Note: there is a convenience printf method in the PrintStream class (the type of System.out) that does the same thing as String.format followed by print.
Manually insert the appropriate end-of-line sequence into the string; see the end of @userlond's answer for examples.
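To make the %n approach concrete, here is a minimal sketch. The helper name withGap is hypothetical; the only library facts it relies on are that %n in a format string expands to the platform line separator, the same value System.lineSeparator() returns:

```java
class BlankLineDemo {
    // Builds "first", one blank line, then "second", using the platform end-of-line sequence.
    static String withGap(String first, String second) {
        return String.format("%s%n%n%s%n", first, second);
    }

    public static void main(String[] args) {
        System.out.print(withGap(
                "I have a question, can you assist me?",
                "How can I make a gap between these two statements?"));
    }
}
```

The first %n ends the first line and the second %n produces the empty line, which is the same effect as calling println twice in a row.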
Use:
System.out.println("\n");
\n takes you to new line.
You can use the below code. By that method you can use as much line gap as you want. Just increase or decrease the number of "\n".
System.out.print("Hello \n\n\n\n");
System.out.print("World");
If you want to leave a single blank line in Java, you could use
System.out.println("");
But if you want to leave multiple blank lines in Java, you could use
System.out.println("\n\n\n\n");
where each '\n' represents a single line break.
What about this one?
public class Main {
public static void main(String args[]) {
System.out.println("Something...");
System.out.printf("%n"); // Not "\n"
System.out.println("Something else...");
}
}
"%n" should be crossplatform-friendly.

String.contains always appears false

public final void nameErrorLoop () {
while (error) {
System.out.println("Enter the employee's name.");
setName(kb.next());
if (name.contains("[A-Za-z]")) {
error = false;
}
else {
System.out.println("The name can only contain letters.");
}
}
error = true;
}
Despite a similar setup working in a different method, this method becomes stuck in a constant loop because the if statement is always false. I've tried .matches(), and nothing that I found on the Internet has helped so far. Any ideas why? Thanks for your help in advance.
Edit: I just noticed as I was finishing the project, that trying to print 'name' later only shows the first name, and the last is never printed. Is there any way I can get the 'name' string to include both?
String.contains doesn't use regular expressions - it just checks whether one string contains another, in the way that "foobar" contains "oob".
It sounds like you want to check that name only contains letters, in which case you should be checking something like:
if (name.matches("^[A-Za-z]+$"))
The + (instead of *) will check that it's non-empty; the ^ and $ will check that there's nothing other than the letters.
If you expect it to be a full name, however, you may well want to allow spaces, hyphens and apostrophes:
if (name.matches("^[-' A-Za-z]+$"))
Also consider accented characters - and punctuation from other languages.
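A short sketch contrasting the two methods, using the wider pattern from above. The class name, method name, and sample names are invented for illustration. Since matches() already requires the whole string to match, the sketch leaves out ^ and $:

```java
class NameCheckDemo {
    // True if the name is non-empty and consists only of letters, spaces, hyphens and apostrophes.
    static boolean isValidName(String name) {
        return name.matches("[-' A-Za-z]+");
    }

    public static void main(String[] args) {
        // contains() looks for the literal text "[A-Za-z]", which a normal name never includes.
        System.out.println("Alice".contains("[A-Za-z]"));      // false
        System.out.println("foobar".contains("oob"));          // true
        System.out.println(isValidName("Anne-Marie O'Neil"));  // true
        System.out.println(isValidName("R2D2"));               // false
    }
}
```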
Easy. .contains() is not what you think. It does exact String matching.
"anything".contains("something that's not a regular expression");
Either use this
Pattern p2=Pattern.compile("[A-Za-z]+");//call only once
p2.matcher(txt).find();//call in loop
or this:
for(char ch: "something".toCharArray()){
if(Character.isAlphabetic(ch)){
}
}

How to scan words in a text without newline character?

I am having serious trouble understanding how the Scanner class works. I would like, from this input:
AAAA BBBG GREZZ
ADFG GTRE
FREZZ
to get this output as an ArrayList:
[AAAA, BBBG, GREZZ, ADFG, GTRE, FREZZ]
My code is the following:
System.out.println("list of words?");
Scanner scan3 = new Scanner(System.in);
scan3.useDelimiter("[\\s+\\n]");
ArrayList<String> test = new ArrayList<String>();
while(scan3.hasNext()){
String temp = scan3.next();
if(temp.equals("STOP")){
break;
}else{
test.add(temp);
}
}
System.out.println(test);
My input is:
AAAA BBBG GREZZ
ADFG GTRE
FREZZ STOP
And my output is:
[AAAA, BBBG, GREZZ, , ADFG, GTRE, , FREZZ]
My question is twofold:
Why do I have an empty element inserted in the list (which is, I believe, related to the newline)?
As you might have noticed, I add the string "STOP" at the end in order to stop the loop, because the condition scan3.hasNext() always returns true. Is this the only way to proceed?
Many thanks for your help.
Just use String#split() with a suitable delimiter. It sounds like in this case you just need "[ \\n]+" (that's a space and a \n), but you may want to tweak that a bit; for example, you might want to split on all whitespace.
The reason for the empty entry in the list is because you are just scanning for a single character of whitespace. This gets matched twice when you have two line endings next to each other. You need to modify the regex so it matches more than one character.
Your + appears to be in the wrong place. Try instead using scan3.useDelimiter("[\\s\\n]+");
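The effect of moving the + can be sketched as follows. The example assumes Windows-style \r\n line endings, which is one plausible way to get two whitespace characters in a row; the class name and the delimiter parameter are my own additions, and the scanner reads from a String so the example is self-contained:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class WordListDemo {
    // Collects tokens until "STOP" or the end of the input, using the given delimiter regex.
    static List<String> readWords(String text, String delimiter) {
        List<String> words = new ArrayList<>();
        try (Scanner scan = new Scanner(text)) {
            scan.useDelimiter(delimiter);
            while (scan.hasNext()) {
                String word = scan.next();
                if (word.equals("STOP")) {
                    break;
                }
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String input = "AAAA BBBG GREZZ\r\nADFG GTRE\r\nFREZZ STOP";
        // A single-character delimiter yields empty tokens between "\r" and "\n".
        System.out.println(readWords(input, "[\\s+\\n]"));
        // One or more whitespace characters are treated as a single separator.
        System.out.println(readWords(input, "\\s+")); // [AAAA, BBBG, GREZZ, ADFG, GTRE, FREZZ]
    }
}
```

With a String or file as the source, hasNext() returns false once the input is exhausted, so the STOP marker is not needed. With System.in, hasNext() blocks until more input arrives or the stream is closed, which is why the loop in the question appeared never to end.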
Regarding,
As you might have noticed, I add the string "STOP" at the end in order to stop the loop, because the condition scan3.hasNext() always returns true. Is this the only way to proceed?
What other way would you wish to proceed? What other stop condition do you propose to use?
