Text searching with line Number Complication - java

EDIT:
Thanks dawww, the problem was with the encoding. I changed it to UTF-8, and now the program works perfectly well. Just a tad slow.
I am in desperate need of help.
THE PROBLEM:
I have a TreeSet of words that I took out of a text. They are all lower case and were extracted using the regex "[^a-zA-Z]". What I need is to compare each word of the TreeSet with the text I took it from, get the number of every line on which the word appears, store those numbers in an ArrayList, and return it.
I have the following Code:
public ArrayList<Integer> search(String word, String book) throws FileNotFoundException, IOException{
FileReader path = new FileReader(book);
LineNumberReader read = new LineNumberReader(path);
ArrayList<Integer> lines = new ArrayList<>();
String line;
for(line = read.readLine(); line != null; line = read.readLine()){
if(line.toLowerCase().contains(word)){
lines.add(read.getLineNumber());
}
}
return lines;
}
The idea is to use the search method's return value as a value in a Map<String, ArrayList<Integer>> (each word and its lines),
like this:
for(String s : words){
map.put(s, search(s , book));
}
words is the TreeSet with the strings I took from the text (Alice in Wonderland by Lewis Carroll).
The code doesn't work, and I don't know why. It compiles and runs, but the map is empty.

To check case-insensitively whether a line contains a word, you can use the Apache Commons Lang library, specifically this method: StringUtils.containsIgnoreCase(CharSequence str, CharSequence searchStr).
The library also has other utility methods that can help; for example, strip and trim are useful for cleaning Strings before operating on them.
Another problem may be the encoding of the file. A FileReader created with new FileReader(fileName) uses the platform default encoding. Try using new InputStreamReader(new FileInputStream(filePath), <encoding>) to read from the file.
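A minimal sketch of the encoding suggestion, assuming the book is stored as UTF-8 (the question's edit says this turned out to be the fix). The class name LineSearch and the extra charset parameter are illustrative and not part of the original code.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.LineNumberReader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class LineSearch {
    // Returns the 1-based numbers of all lines that contain `word`, ignoring case.
    // The charset is passed explicitly instead of relying on the platform default.
    static List<Integer> search(String word, String book, Charset charset) throws IOException {
        List<Integer> lines = new ArrayList<>();
        String needle = word.toLowerCase(Locale.ROOT);
        // try-with-resources closes the reader even if readLine() throws
        try (LineNumberReader reader = new LineNumberReader(
                new InputStreamReader(new FileInputStream(book), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.toLowerCase(Locale.ROOT).contains(needle)) {
                    // after readLine(), getLineNumber() is the number of the line just read
                    lines.add(reader.getLineNumber());
                }
            }
        }
        return lines;
    }
}
```

It would be called as LineSearch.search(s, book, StandardCharsets.UTF_8). Like the original, it matches substrings, so "cat" is also found inside "caterpillar".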

Remember that the contains method is case-sensitive.
You are converting the line to lower case with line.toLowerCase(),
so it may not match if word is not lower case as well.
Add a System.out.print statement for line.toLowerCase() and word to check:
System.out.print(line.toLowerCase()+" "+word);
If that is the case, the solution is to lower-case the word in the if condition as well:
if(line.toLowerCase().contains(word.toLowerCase())){
lines.add(read.getLineNumber());
}
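On the "tad slow" remark in the question's edit: calling search once per word reads the whole file again for every word. A rough sketch of a single-pass alternative follows, assuming whole-word matching is acceptable (the original contains check also matches substrings, so results can differ). WordIndex and index are made-up names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class WordIndex {
    // Builds a sorted map from each word to the 1-based lines it appears on,
    // reading the text only once.
    static Map<String, List<Integer>> index(BufferedReader reader) throws IOException {
        Map<String, List<Integer>> index = new TreeMap<>();
        String line;
        int lineNumber = 0;
        while ((line = reader.readLine()) != null) {
            lineNumber++;
            // Same idea as the question's regex: anything that is not a letter separates words
            for (String word : line.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
                if (word.isEmpty()) {
                    continue; // split yields "" when the line starts with a non-letter
                }
                List<Integer> lines = index.computeIfAbsent(word, k -> new ArrayList<>());
                // Record each line only once per word
                if (lines.isEmpty() || lines.get(lines.size() - 1) != lineNumber) {
                    lines.add(lineNumber);
                }
            }
        }
        return index;
    }
}
```

The reader passed in can be created with an explicit encoding, for example via Files.newBufferedReader(path, StandardCharsets.UTF_8).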

Related

Iterate through a dictionary array

I have a String array containing a poem which has deliberate spelling mistakes. I am trying to iterate through the String array to identify the spelling mistakes by comparing the String array to a String array containing a dictionary. If possible I would like a suggestion that allows me to continue using nested for loops
for (int i = 0; i < poem2.length; i++) {
boolean found = false;
for (int j = 0; j < dictionary3.length; j++) {
if (poem2[i].equals(dictionary3[j])) {
found = true;
break;
}
}
if (found==false) {
System.out.println(poem2[i]);
}
}
The output is printing out the correctly spelt words as well as the incorrectly spelt ones and I am aiming to only print out the incorrectly spelt ones. Here is how I populate the 'dictionary3' and 'poem2' arrays:
char[] buffer = null;
try {
BufferedReader br1 = new BufferedReader(new
java.io.FileReader(poem));
int bufferLength = (int) (new File(poem).length());
buffer = new char[bufferLength];
br1.read(buffer, 0, bufferLength);
br1.close();
} catch (IOException e) {
System.out.println(e.toString());
}
String text = new String(buffer);
String[] poem2 = text.split("\\s+");
char[] buffer2 = null;
try {
BufferedReader br2 = new BufferedReader(new java.io.FileReader(dictionary));
int bufferLength = (int) (new File(dictionary).length());
buffer2 = new char[bufferLength];
br2.read(buffer2, 0, bufferLength);
br2.close();
} catch (IOException e) {
System.out.println(e.toString());
}
String dictionary2 = new String(buffer);
String[] dictionary3 = dictionary2.split("\n");
Your basic problem is in the line
String dictionary2 = new String(buffer);
where you were trying to convert the characters of the dictionary, which are stored in buffer2, but you used buffer (without the 2 suffix). Naming variables this way suggests that you need either a loop or, in this case, a separate method that returns the array of words held in a given file (you could also pass the delimiter to split on as a method parameter).
So your dictionary2 held the characters from buffer, which represented the poem, not the dictionary data.
Another problem is
String[] dictionary3 = dictionary2.split("\n");
because you are splitting only on \n here, but some operating systems, such as Windows, use \r\n as the line separator sequence. So your dictionary array may contain words like foo\r instead of foo, which will cause poem2[i].equals(dictionary3[j]) to always fail.
To avoid this problem you can split on \\R (available since Java 8) or \r?\n|\r.
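A small illustration of the difference, using a made-up string with Windows line endings (LineBreaks and splitLines are hypothetical names):

```java
import java.util.Arrays;

class LineBreaks {
    // "\\R" matches any line break sequence: \r\n, \n, or \r (Java 8+)
    static String[] splitLines(String text) {
        return text.split("\\R");
    }

    public static void main(String[] args) {
        String windowsText = "foo\r\nbar\r\n";
        // Splitting on "\n" alone leaves the carriage return attached to each word
        String[] onNewline = windowsText.split("\n");
        System.out.println(onNewline[0].equals("foo"));               // false, the element is "foo\r"
        System.out.println(splitLines(windowsText)[0].equals("foo")); // true
        System.out.println(Arrays.toString(splitLines(windowsText))); // [foo, bar]
    }
}
```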
There are other problems in your code, such as closing the resource inside the try block. If an exception is thrown before close() is reached, close() will never be invoked, leaving the resource open. To solve this, close resources in a finally block (which is always executed after try, whether or not an exception is thrown), or better, use try-with-resources.
BTW you can simplify/clarify your code responsible for reading words from files
List<String> poem2 = new ArrayList<>();
Scanner scanner = new Scanner(new File(yourFileLocation));
while(scanner.hasNext()){//has more words
poem2.add(scanner.next());
}
For the dictionary you should use a Set/HashSet instead of a List to avoid duplicates (sets usually also perform better when checking whether they contain an element). Such collections already provide methods like contains(element), so you wouldn't need that inner loop.
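Putting the Set suggestion together, a sketch could look like the following. It takes the file contents as Strings to stay short; SpellCheck, words and misspelled are illustrative names. Note that the comparison is exact, so capitalised words or words with attached punctuation in the poem will be reported unless they are normalised first.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SpellCheck {
    // Splits on any run of whitespace, so "\r\n" line endings leave no stray "\r"
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        for (String w : text.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                result.add(w);
            }
        }
        return result;
    }

    // Returns the poem words that are not in the dictionary, in poem order
    static List<String> misspelled(String poemText, String dictionaryText) {
        Set<String> dictionary = new HashSet<>(words(dictionaryText));
        List<String> unknown = new ArrayList<>();
        for (String word : words(poemText)) {
            if (!dictionary.contains(word)) { // replaces the inner loop
                unknown.add(word);
            }
        }
        return unknown;
    }
}
```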
I copied your code and ran it, and I noticed two issues. Good news is, both are very quick fixes.
#1
When I printed out everything in dictionary3 after it is read in, it is the exact same as everything in poem2. This line in your code for reading in the dictionary is the problem:
String dictionary2 = new String(buffer);
You're using buffer, which was the variable you used to read in the poem. Therefore, buffer contains the poem and your poem and dictionary end up the same. I think you want to use buffer2 instead, which is what you used to read in the dictionary:
String dictionary2 = new String(buffer2);
When I changed that, the dictionary and poem appear to have the proper entries.
#2
The other problem, as Pshemo pointed out in their answer (which is completely correct, and a very good answer!) is that you are splitting on \n for the dictionary. The only thing I would say differently from Pshemo here is that you should probably split on \\s+ just like you did for the poem, to stay consistent. In fact, when I debugged, I noticed that the dictionary words all have "\r" appended to the end, probably because you were splitting on \n. To fix this, change this line:
String[] dictionary3 = dictionary2.split("\n");
To this:
String[] dictionary3 = dictionary2.split("\\s+");
Try changing those two lines, and let us know if that resolves your issue. Best of luck!
Convert your dictionary to an ArrayList and use contains instead.
Something like this should work (here dictionary3 is the ArrayList, not the array):
if (dictionary3.contains(poem2[i]))
found = true;
else
found = false;
With this approach you can also get rid of the inner loop, as the contains method handles that for you.
You can convert your dictionary array to an ArrayList like this:
new ArrayList<>(Arrays.asList(array))

Setting two different text files as separate string arrays and finding matches from the two arrays in Java

So basically I'm trying to take two text files (one with many jumbled words and one with many dictionary words). I am supposed to take these two text files and convert them to two separate arrays.
Following that, I need to compare the jumbled strings from the first array and match each dictionary word in the second array up to its jumbled counterpart (e.g. aannab in the first array to banana in the second array).
I know how to set one array from a string, however I don't know how to do two from two separate text files.
Use a HashMap for matching: the data from the first text file will be the keys of the Map and the data from the second text file will be the values. Then, by using a key, you will get the matching value.
You can read each file into an array like this:
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
String[] readFile(String filename) throws IOException {
    List<String> stringList = new ArrayList<>();
    // try-with-resources closes the reader even if readLine() throws
    try (BufferedReader br = new BufferedReader(
            new InputStreamReader(new FileInputStream(filename)))) {
        String line;
        while ((line = br.readLine()) != null) {
            stringList.add(line);
        }
    }
    return stringList.toArray(new String[0]);
}
Next, try to do the matching:
String[] jumbles = readFile("jumbles.txt");
String[] dict = readFile("dict.txt");
for (String jumble : jumbles) {
for (String word : dict) {
// can only be a match if the same length
if (jumble.length() == word.length()) {
//next loop through each letter of jumble and see if it
//appears in word.
}
}
}
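One possible way to combine the HashMap idea with the matching step, instead of comparing letter by letter: two words are anagrams exactly when their letters, sorted, are equal, so the sorted letters can serve as the map key. This is a sketch under that assumption; Unjumble, signature and match are made-up names, and it expects both files to use the same letter case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class Unjumble {
    // "banana" and "aannab" both become "aaabnn"
    static String signature(String word) {
        char[] letters = word.toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    // Maps each jumble to the dictionary words made of the same letters
    static Map<String, List<String>> match(String[] jumbles, String[] dict) {
        Map<String, List<String>> bySignature = new HashMap<>();
        for (String word : dict) {
            bySignature.computeIfAbsent(signature(word), k -> new ArrayList<>()).add(word);
        }
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String jumble : jumbles) {
            result.put(jumble,
                    bySignature.getOrDefault(signature(jumble), Collections.<String>emptyList()));
        }
        return result;
    }
}
```

Each dictionary word is processed once, so the nested loop over every jumble and every word is not needed.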
I know how to set one array from a string, however I don't know how to do two from two seperate text files
I would encourage you to divide your problem into what you know and what you don't know.
Search the internet for the parts you don't know; you will find a lot of ways to do them.
Then search for the parts you do know, to explore whether they can be done in a better way.
To help you here:
What you don't know:
Reading a file in Java.
Processing the content of the file you read.
What you know:
Turning a String into an array (check whether there are better ways for your use case).
Combine both :-)

String replace and output in Java [duplicate]

I have a question; please see the code below:
public void readFile(String path,String pathName,int num){
try{
PrintWriter out2=new PrintWriter(new PrintWriter(path));
File a=new File(pathName);
Scanner b=new Scanner(a);
while(b.hasNextLine()){
String message=b.nextLine();
Scanner h=new Scanner(message);
while(h.hasNext()){
String f=h.next();
if (f.equals("are")){
f.replace("are","ARE");
}
}
out2.printf("%s",message);
out2.println();
.......
The file content for scanner read is
who are you?
how are you?
what is up!
However, when I run the above code, the output in the new file is the same as the input file, meaning that "are" is not replaced by "ARE". I have no idea which part is wrong. Please advise, thanks guys!
This line just outputs the message unchanged to the new file.
out2.printf("%s",message);
Also the loop is strange too: why do you read it word by word, and then use String.replace()? You could do it line by line, using String.replaceAll():
while(b.hasNextLine()){
    String message=b.nextLine();
    // println keeps the line breaks that printf("%s", ...) alone would drop
    out2.println(message.replaceAll("(^|\\W)are(\\W|$)"," ARE "));
}
The string (^|\\W)are(\\W|$) is a regular expression. It matches the string are when it is preceded by either the start of the string (^) or a non-word character (\\W), and followed by either a non-word character or the end of the line ($).
As Scanner uses whitespace as its default delimiter, it might be even better to use (^|\\s)are(\\s|$). Note, however, that both versions replace the character before and after "ARE" with a single space.
Also, keep in mind that String.replace does not mutate the input String. You have to assign the result or use it in some other way, such as passing it to a function.
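As a variation on the regex above (a swapped-in technique, not what the answer used), the word boundary \b matches a position without consuming a character, so the surrounding spaces and punctuation are left untouched. A small sketch with a hypothetical helper name:

```java
class ReplaceWord {
    // \b matches the boundary between a word character and a non-word character,
    // so "are" inside "aware" is not touched and no neighbouring character is replaced
    static String upperCaseAre(String line) {
        return line.replaceAll("\\bare\\b", "ARE");
    }

    public static void main(String[] args) {
        System.out.println(upperCaseAre("who are you?"));   // who ARE you?
        System.out.println(upperCaseAre("are you aware?")); // ARE you aware?
        System.out.println(upperCaseAre("what is up!"));    // what is up!
    }
}
```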
String is final and immutable, so f.replace("are","ARE"); does not change f.
The result must be assigned to a variable, either a new one or f itself:
f = f.replace("are","ARE");
I do not understand why you are doing that. Here is an alternative approach:
Get a BufferedReader to read the file.
While there is data in the file, read the lines.
If line.contains("are") then line = line.replace("are","ARE")
println(line)
As to why your code did not work:
In the line f.replace("are","ARE"); you forgot to use the return value.
Assign it back, for example: message = message.replace("are","ARE");
Another option is to use StringBuffer or StringBuilder.
Strings are immutable. Therefore, you cannot call the replace method on object f and expect its value to change, since the replace method of a String object simply returns a new String object.
Either use a StringBuilder instead, or use:
f = f.replace("are","ARE");
On the other hand, StringBuilder objects are mutable. Therefore, you can run the StringBuilder version of the replace method directly on the object if you choose that route instead.
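For the StringBuilder route, a minimal sketch might look like this (MutableReplace and replaceFirstAre are made-up names). StringBuilder.replace takes a start index, an exclusive end index, and the replacement text, so the position has to be found first. This version replaces only the first occurrence and matches substrings, so "aware" would be affected too.

```java
class MutableReplace {
    static String replaceFirstAre(String text) {
        StringBuilder sb = new StringBuilder(text);
        int start = sb.indexOf("are");
        if (start >= 0) {
            // Modifies sb in place; no reassignment is needed, unlike String.replace
            sb.replace(start, start + "are".length(), "ARE");
        }
        return sb.toString();
    }
}
```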

How do I print content of reference in JAVA?

Newbie question coming up. Trying to get my head around Java.
How do I print out the content of the reference and not just its position? My program is meant to get some text from the user and print it out in reverse order.
Here is my program (so far):
package myProgram;
import javax.swing.JOptionPane;
public class someRandomClass {
public static void main(String[] args) {
String word = JOptionPane.showInputDialog("Write som text here");
StringBuilder outPut = new StringBuilder();
for (int i = word.length()-1; i>=0; i--){
outPut.append(i);
}
System.out.println(outPut.toString());
}
}
I am grateful for any help and tips! :)
In the line
outPut.append(i);
you are appending the value of your loop counter. You surely mean
outPut.append(word.charAt(i));
You seem to be appending the integers instead of the appropriate characters. Try this instead:
outPut.append(word.substring(i, i + 1));
This way, the individual characters of word are appended to your StringBuilder. Note that the append method could also take a char as an argument, so you are also able to use word.charAt(i).
So, you want to emit the character at the position? Try using String.charAt.
outPut.append(word.charAt(i));
I'd probably avoid that and just index the char[] from String.toCharArray, though.
To be honest, I'd avoid doing the reversal loop manually to begin with... try something as follows:
final String word = JOptionPane.showInputDialog("Enter text below");
System.out.println(new StringBuilder(word).reverse());
StringBuilder.reverse should do the work for you (likely in a more efficient way, too). You also don't need to call toString manually, as println will do that for you.

Need help parsing strings in Java

I am reading in a csv file in Java and, depending on the format of the string on a given line, I have to do something different with it. The three different formats contained in the csv file are (using random numbers):
833
"79, 869"
"56-57, 568"
If it is just a single number (833), I want to add it to my ArrayList. If it is two numbers separated by a comma and surrounded by quotation marks ("79, 869"), I want to parse out the first of the two numbers (79) and add it to the ArrayList. If it is three numbers surrounded by quotation marks, where the first two numbers are separated by a dash and the third follows a comma ("56-57, 568"), then I want to parse out the third number (568) and add it to the ArrayList.
I am having trouble using str.contains() to determine if the string on a given line contains a dash or not. Can anyone offer me some help? Here is what I have so far:
private static void getFile(String filePath) throws java.io.IOException {
BufferedReader reader = new BufferedReader(new FileReader(filePath));
String str;
while ((str = reader.readLine()) != null) {
if(str.endsWith("\"")){
if (str.contains(charDash)){
System.out.println(str);
}
}
}
}
Thanks!
I recommend using the version of indexOf that takes a char rather than a String, since it is likely to be faster: it is a simple loop over the characters, without the nested loop needed to match a substring.
I.e.
if (str.indexOf('-')!=-1) {
System.out.println(str);
}
(Note the single quotes, so this is a char, rather than a string.)
But then you have to split the line and parse the individual values. At present, you are testing if the whole line ends with a quote, which is probably not what you want.
The following code works for me (note: I wrote it with no optimization in mind - it's just for testing purposes):
public static void main(String args[]) {
ArrayList<String> numbers = GetNumbers();
}
private static ArrayList<String> GetNumbers() {
String str1 = "833";
String str2 = "79, 869";
String str3 = "56-57, 568";
ArrayList<String> lines = new ArrayList<String>();
lines.add(str1);
lines.add(str2);
lines.add(str3);
ArrayList<String> numbers = new ArrayList<String>();
for (Iterator<String> s = lines.iterator(); s.hasNext();) {
String thisString = s.next();
if (thisString.contains("-")) {
numbers.add(thisString.substring(thisString.indexOf(",") + 2));
} else if (thisString.contains(",")) {
numbers.add(thisString.substring(0, thisString.indexOf(",")));
} else {
numbers.add(thisString);
}
}
return numbers;
}
Output:
833
79
568
Although it gets a lot of hate these days, I still really like the StringTokenizer for this kind of stuff. You can set it up to return the delimiters as tokens and, at least to me, it makes the processing trivial without interacting with regexes.
You'd have to create it using ",- as your delimiters, then just kick it off in a loop.
st=new StringTokenizer(line, "\",-", true);
Then you set up a loop:
while(st.hasMoreTokens()) {
String token=st.nextToken();
Each case becomes its own little part of the loop:
// Use punctuation to set flags that tell you how to interpret the numbers.
// Compare with equals(): == on Strings compares references, not contents.
if(token.equals("\"")) {
isQuoted = !isQuoted;
} else if(token.equals(",")) {
...
} else if(...) {
...
} else { // The punctuation has been dealt with, must be a number group
// Apply flags to determine how to parse this number.
}
I realize that StringTokenizer is outdated now, but I'm not really sure why. Parsing regular expressions can't be faster and the syntax is--well split is a pretty sweet syntax I gotta admit.
I guess if you and everyone you work with is really comfortable with Regular Expressions you could replace that with split and just iterate over the resultant array but I'm not sure how to get split to return the punctuation--probably that "+" thing from other answers but I never trust that some character I'm passing to a regular expression won't do something utterly unexpected.
Will
if (str.indexOf(charDash.toString()) > -1){
System.out.println(str);
}
do the trick?
By the way, this is at least as fast as contains, because contains is itself implemented on top of indexOf.
Will this work?
if(str.contains("-")) {
System.out.println(str);
}
I wonder if the charDash variable is not what you are expecting it to be.
I think three regexes would be your best bet - because with a match, you also get the bit you're interested in. I suck at regex, but something along the lines of:
.*\-.*, (.+)
.*, (.+)
and
(.+)
ought to do the trick (in order, because the final pattern matches anything including the first two).
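A sketch of the three-regex idea with java.util.regex follows. The patterns are tightened to digits (\d+) rather than the .* and .+ suggested above, which is an assumption about the data; CsvNumber and extract are illustrative names. The surrounding quotation marks are removed before matching.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CsvNumber {
    // "56-57, 568" -> capture the number after the comma
    private static final Pattern RANGE_THEN_NUMBER = Pattern.compile("\\d+-\\d+,\\s*(\\d+)");
    // "79, 869" -> capture the first number
    private static final Pattern TWO_NUMBERS = Pattern.compile("(\\d+),\\s*\\d+");
    // 833 -> capture the number itself
    private static final Pattern SINGLE_NUMBER = Pattern.compile("(\\d+)");

    // Returns the number of interest, or null if the line has none of the three formats
    static String extract(String line) {
        String content = line.trim().replace("\"", "");
        for (Pattern p : new Pattern[] {RANGE_THEN_NUMBER, TWO_NUMBERS, SINGLE_NUMBER}) {
            Matcher m = p.matcher(content);
            if (m.matches()) { // the whole line must match, not just a part of it
                return m.group(1);
            }
        }
        return null;
    }
}
```

Because matches() requires the whole line to fit, the order of the patterns matters less here than with the looser .* versions.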
