Iterate through a dictionary array - java

I have a String array containing a poem which has deliberate spelling mistakes. I am trying to iterate through the String array to identify the spelling mistakes by comparing it to a String array containing a dictionary. If possible, I would like a suggestion that allows me to continue using nested for loops:
for (int i = 0; i < poem2.length; i++) {
boolean found = false;
for (int j = 0; j < dictionary3.length; j++) {
if (poem2[i].equals(dictionary3[j])) {
found = true;
break;
}
}
if (found==false) {
System.out.println(poem2[i]);
}
}
The output is printing out the correctly spelt words as well as the incorrectly spelt ones and I am aiming to only print out the incorrectly spelt ones. Here is how I populate the 'dictionary3' and 'poem2' arrays:
char[] buffer = null;
try {
BufferedReader br1 = new BufferedReader(new
java.io.FileReader(poem));
int bufferLength = (int) (new File(poem).length());
buffer = new char[bufferLength];
br1.read(buffer, 0, bufferLength);
br1.close();
} catch (IOException e) {
System.out.println(e.toString());
}
String text = new String(buffer);
String[] poem2 = text.split("\\s+");
char[] buffer2 = null;
try {
BufferedReader br2 = new BufferedReader(new java.io.FileReader(dictionary));
int bufferLength = (int) (new File(dictionary).length());
buffer2 = new char[bufferLength];
br2.read(buffer2, 0, bufferLength);
br2.close();
} catch (IOException e) {
System.out.println(e.toString());
}
String dictionary2 = new String(buffer);
String[] dictionary3 = dictionary2.split("\n");

Your basic problem is in line
String dictionary2 = new String(buffer);
where you were trying to convert the dictionary characters stored in buffer2 but used buffer (without the 2 suffix). Numbered variable names like these suggest that you need either a loop or, in this case, a separate method that returns the array of words held in a given file (you could also pass the delimiter to split on as a method parameter).
So your dictionary2 held the characters from buffer, which represent the poem, not the dictionary data.
Another problem is
String[] dictionary3 = dictionary2.split("\n");
because you are splitting only on \n, but some operating systems, such as Windows, use \r\n as the line separator. Your dictionary array may therefore contain words like foo\r instead of foo, which causes poem2[i].equals(dictionary3[j]) to always fail.
To avoid this problem you can split on \\R (available since Java 8) or \r?\n|\r.
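To make the difference concrete, here is a minimal sketch comparing the two splits on Windows-style text. The class and method names (LineSplitDemo, splitLines) are made up for illustration, and \R requires Java 8 or later.

```java
import java.util.Arrays;

class LineSplitDemo {
    // Splits text on any line terminator (\n, \r\n, \r, ...). Requires Java 8+.
    static String[] splitLines(String text) {
        return text.split("\\R");
    }

    public static void main(String[] args) {
        String windowsText = "foo\r\nbar\r\nbaz";
        // Splitting on "\n" alone leaves a trailing '\r' on every line but the last.
        System.out.println(windowsText.split("\n")[0].equals("foo")); // false
        System.out.println(splitLines(windowsText)[0].equals("foo")); // true
        System.out.println(Arrays.toString(splitLines(windowsText))); // [foo, bar, baz]
    }
}
```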
There are other problems in your code, such as closing the resource inside the try block. If an exception is thrown before that point, close() is never invoked and the resource is left open. To solve this, close resources in a finally block (which always runs after try, whether or not an exception was thrown) or, better, use try-with-resources.
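As a rough sketch of the try-with-resources approach, the hypothetical helper below (WordReaderSketch.readWords is not part of the original code) reads whitespace-separated words from any Reader. It assumes you would pass something like new FileReader(poem) in the real program; a StringReader is used here only so the example runs without a file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WordReaderSketch {
    // Reads all whitespace-separated words from the source.
    // The reader is closed automatically, even if readLine() throws.
    static String[] readWords(Reader source) {
        List<String> words = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String word : line.trim().split("\\s+")) {
                    if (!word.isEmpty()) { // blank lines yield one empty token
                        words.add(word);
                    }
                }
            }
        } catch (IOException e) {
            // Design choice for this sketch: rethrow unchecked instead of printing.
            throw new UncheckedIOException(e);
        }
        return words.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] words = readWords(new StringReader("roses are\r\n  red\n"));
        System.out.println(Arrays.toString(words)); // [roses, are, red]
    }
}
```

Because one method serves both files, the buffer/buffer2 mix-up cannot happen: each call has its own local variables.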
BTW, you can simplify the code that reads words from a file:
List<String> poem2 = new ArrayList<>();
try (Scanner scanner = new Scanner(new File(yourFileLocation))) {
    while (scanner.hasNext()) { // has more words
        poem2.add(scanner.next());
    }
}
For the dictionary, use a Set/HashSet instead of a List to avoid duplicates (sets usually also perform better when checking whether they contain an element). Such collections already provide methods like contains(element), so you wouldn't need the inner loop.
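A minimal sketch of the Set-based check might look like the following. The class and method names are illustrative, and the comparison is assumed to be exact (case-sensitive, with no punctuation stripping), just like equals in the original loops.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SpellCheckSketch {
    // Returns the poem words that do not appear in the dictionary, in poem order.
    static List<String> misspelled(String[] poemWords, String[] dictionaryWords) {
        Set<String> dictionary = new HashSet<>(Arrays.asList(dictionaryWords));
        List<String> unknown = new ArrayList<>();
        for (String word : poemWords) {
            if (!dictionary.contains(word)) { // replaces the inner loop
                unknown.add(word);
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        String[] poem = {"roses", "are", "rde"};
        String[] dictionary = {"are", "red", "roses", "violets"};
        System.out.println(misspelled(poem, dictionary)); // [rde]
    }
}
```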

I copied your code and ran it, and I noticed two issues. Good news is, both are very quick fixes.
#1
When I printed out everything in dictionary3 after it is read in, it is the exact same as everything in poem2. This line in your code for reading in the dictionary is the problem:
String dictionary2 = new String(buffer);
You're using buffer, which was the variable you used to read in the poem. Therefore, buffer contains the poem and your poem and dictionary end up the same. I think you want to use buffer2 instead, which is what you used to read in the dictionary:
String dictionary2 = new String(buffer2);
When I changed that, the dictionary and poem appear to have the proper entries.
#2
The other problem, as Pshemo pointed out in their answer (which is completely correct, and a very good answer!) is that you are splitting on \n for the dictionary. The only thing I would say differently from Pshemo here is that you should probably split on \\s+ just like you did for the poem, to stay consistent. In fact, when I debugged, I noticed that the dictionary words all have "\r" appended to the end, probably because you were splitting on \n. To fix this, change this line:
String[] dictionary3 = dictionary2.split("\n");
To this:
String[] dictionary3 = dictionary2.split("\\s+");
Try changing those two lines, and let us know if that resolves your issue. Best of luck!

Convert your dictionary to an ArrayList and use contains instead.
Something like this should work, once dictionary3 is a List rather than an array:
found = dictionary3.contains(poem2[i]);
With this approach you can also get rid of the inner loop, as the contains method handles that for you.
You can convert your dictionary array to an ArrayList like this:
new ArrayList<>(Arrays.asList(array))

Related

Java read csv file as matrix

I'm new to writing Java code, though I have experience with scripting languages. I'm trying to rewrite in Java a piece of code I had in Python.
Python code below:
import pandas as pd
myFile = 'dataFile'
df = pd.DataFrame(pd.read_csv(myFile,skiprows=0))
inData = df.as_matrix()
I'm looking for a method in Java that is equivalent to as_matrix in Python. This function converts the data frame into a matrix.
I have searched for some time but can't find a method that does the conversion the way Python does. Is there a third-party library or something along those lines that I could use? Any direction would help me a lot. Thank you heaps.
What you want to do is really simple and requires minimal code on your part, therefore I suggest you code it yourself. Here is an example implementation:
List<String[]> rowList = new ArrayList<String[]>();
try (BufferedReader br = new BufferedReader(new FileReader("pathtocsvfile.csv"))) {
String line;
while ((line = br.readLine()) != null) {
String[] lineItems = line.split(",");
rowList.add(lineItems);
}
// No explicit br.close() is needed: try-with-resources closes the reader.
} catch (Exception e) {
// Handle any I/O problems
}
String[][] matrix = new String[rowList.size()][];
for (int i = 0; i < rowList.size(); i++) {
String[] row = rowList.get(i);
matrix[i] = row;
}
What this does is really simple: It opens a buffered reader that will read the csv file line by line and paste the contents to an array of Strings after splitting them based on comma (which is your delimiter). Then it will add them to a list of arrays. I know this might not be perfect, so afterwards I take the contents of that list of arrays and turn it into a neat 2D matrix. Hope this helps.
Hint: many improvements could be made to this little piece of code (e.g. handling leading and trailing spaces, supporting user-defined delimiters, etc.), but this should be a good starting point.
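As one possible take on those improvements, here is a sketch that trims each cell and accepts a caller-supplied delimiter. The names (CsvMatrixSketch, readMatrix) are hypothetical, and the sketch assumes a simple file with no quoted fields or embedded delimiters; it reads from a Reader so that it can be tried without a file on disk.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class CsvMatrixSketch {
    // Reads delimiter-separated rows into a 2D String array, trimming each cell.
    static String[][] readMatrix(Reader source, String delimiter) {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                // Pattern.quote treats the delimiter literally (e.g. "|" or ".");
                // the -1 limit keeps trailing empty cells.
                String[] cells = line.split(Pattern.quote(delimiter), -1);
                for (int i = 0; i < cells.length; i++) {
                    cells[i] = cells[i].trim();
                }
                rows.add(cells);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows.toArray(new String[0][]);
    }

    public static void main(String[] args) {
        String[][] matrix = readMatrix(new StringReader("a, b ,c\n1,2,\n"), ",");
        System.out.println(Arrays.deepToString(matrix)); // [[a, b, c], [1, 2, ]]
    }
}
```

For a real file you would pass new FileReader("pathtocsvfile.csv") as the source. Unlike the pandas call in the question, this keeps the header row and leaves every value as a String.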

BufferedReader and enumerating multiple lines in Java

I am in the process of making a java application that reads through a .ttl file line by line and creates a graphml file to represent the ontology.
I am having some trouble figuring out how to enumerate a certain section.
I am using BufferedReader to read each line.
For example, I have the following:
else if (line.contains("owl:oneOf")){
// insert code to enumerate list contained in ( )
}
And this is what the .ttl looks like for oneOf:
owl:oneOf (GUIFlow:ExactlyOne
GUIFlow:OneOrMore
GUIFlow:ZeroOrMore
GUIFlow:ZeroOrOne )
I need to return those 4 objects as one list, to be used as part of a graphical representation of an ontology.
Apparently you have some kind of loop going through the file. Here are some ideas:
1) Introduce a "state" into the loop so that upon reading the next line it will know that it's actually inside the oneOf list. A dynamic array to store the list can serve as the state. You create the list when encountering the (, and you send the list wherever it is needed when encountering the ) and then delete the list after that. A complication is that according to your source format you will have to create the list before adding values to it, and process and delete the list after adding values, because ( and ) are on the same lines as actual values.
Vector<String> oneOfList = null;
while(reader.ready()){
String line=reader.readLine();
if(line.contains("foo")){
...
}
else if (line.contains("owl:oneOf")){
oneOfList = new Vector<String>();
}
if(oneOfList!=null){
String str = line.trim();
int a = str.indexOf("("); // -1 if not found, OK
int b = str.indexOf(")");
if(b<0) b=str.length();
oneOfList.add(str.substring(a+1,b).trim());
}
if (oneOfList != null && line.contains(")")) { // only close a list that was actually opened
storeOneOf(oneOfList);
oneOfList=null;
}
}
2) When the oneOf header is encountered, create another small loop to read its values. A possible drawback may be that you end up with two loops iterating over the file and two calls to reader.readLine, which may complicate things or may not.
while(reader.ready()){
String line=reader.readLine();
if(line.contains("foo")){
...
}
else if (line.contains("owl:oneOf")){
Vector<String> oneOfList = new Vector<String>();
while(true){
String str = line.trim();
int a = str.indexOf("("); // -1 if not found, OK
int b = str.indexOf(")");
int c = (b>=0) ? b : str.length();
oneOfList.add(str.substring(a+1,c).trim());
if(b>=0) break;
line=reader.readLine();
}
storeOneOf(oneOfList);
}
}
3) The above algorithms rely on the fact that the header, the ( and the first value are on the same line, etc. If the source file is formatted a bit differently, the parsing will fail. A more flexible approach may be to use StreamTokenizer which automatically ignores whitespace and separates the text into words and stand-alone symbols:
StreamTokenizer tokzr=new StreamTokenizer(reader);
tokzr.wordChars(':',':');
while( tokzr.nextToken() != tokzr.TT_EOF ){
if( tokzr.ttype==tokzr.TT_WORD && tokzr.sval.equals("foo") ){
...
}
else if ( tokzr.ttype==tokzr.TT_WORD && tokzr.sval.equals("owl:oneOf") ){
if(tokzr.nextToken()!='(') throw new Exception("\"(\" expected");
Vector<String> oneOfList = new Vector<String>();
while(tokzr.nextToken() == tokzr.TT_WORD){
oneOfList.add(tokzr.sval);
}
storeOneOf(oneOfList);
if(tokzr.ttype!=')') throw new Exception("\")\" expected");
}
}
Have you considered (and rejected) existing solutions, e.g. Jena?

Setting two different text files as separate string arrays and finding matches from the two arrays in Java

So basically I'm trying to take two text files (one with many jumbled words and one with many dictionary words). I am supposed to take these two text files and convert them to two separate arrays.
Following that, I need to compare the jumbled strings from the first array and match each dictionary word in the second array to its jumbled counterpart (e.g. aannab in the first array to banana in the second array).
I know how to set one array from a string, however I don't know how to do two from two separate text files.
Use a HashMap for matching, where the first text file's data will be the keys of the map and the second text file's data will be the values. Then, by using a key, you will get the matching value.
You can read each file into an array like this:
String[] readFile(String filename) throws IOException {
    List<String> stringList = new ArrayList<>();
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader br = new BufferedReader(
            new InputStreamReader(new FileInputStream(filename)))) {
        String line;
        while ((line = br.readLine()) != null) {
            stringList.add(line);
        }
    }
    return stringList.toArray(new String[0]);
}
Next, try to do the matching:
String[] jumbles = readFile("jumbles.txt");
String[] dict = readFile("dict.txt");
for (String jumble : jumbles) {
for (String word : dict) {
// can only be a match if the same length
if (jumble.length() == word.length()) {
//next loop through each letter of jumble and see if it
//appears in word.
}
}
}
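One way to fill in the letter comparison, sketched here under assumptions, is to swap the per-letter loop for a different technique: sort the letters of each word so that a jumble and its dictionary word produce the same key, then look the key up in a HashMap. The names (JumbleMatcherSketch, sortLetters, match) are made up for illustration, and if several dictionary words share the same letters only the first one is kept.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class JumbleMatcherSketch {
    // "banana" and "aannab" both become "aaabnn".
    static String sortLetters(String word) {
        char[] letters = word.toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    // Maps each jumble to a dictionary word with the same letters.
    // Jumbles without a match are left out of the result.
    static Map<String, String> match(String[] jumbles, String[] dict) {
        Map<String, String> bySortedLetters = new HashMap<>();
        for (String word : dict) {
            bySortedLetters.putIfAbsent(sortLetters(word), word);
        }
        Map<String, String> matches = new HashMap<>();
        for (String jumble : jumbles) {
            String word = bySortedLetters.get(sortLetters(jumble));
            if (word != null) {
                matches.put(jumble, word);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String[] jumbles = {"aannab", "xyz"};
        String[] dict = {"apple", "banana"};
        System.out.println(match(jumbles, dict)); // {aannab=banana}
    }
}
```

This avoids the nested loop over both arrays: each dictionary word is sorted once, and each jumble needs a single lookup.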
I know how to set one array from a string, however I don't know how to do two from two separate text files
I would encourage you to divide your problem into what you don't know and what you know.
Search the internet for the parts you don't know; you will find many ways to do them.
Then search for the parts you do know, to explore whether they can be done in a better way.
To help you here:
What you don't know:
Reading a file in Java.
Processing the content of the file you read.
What you know:
Converting a string to an array (search for whether there are better ways for your use case).
Combine both :-)

Removing punctuation is not working in Java with string replacement

So, what I'm trying to do is compile a single word list with no repeats out of 8 separate dictionary word lists. Some of the dictionaries have punctuation in them to separate the words. Below is what I have that pertains to the punctuation removal. I've tried several different solutions that I've found on stack overflow regarding regex expressions, as well as the one I've left in place in my code. For some reason, none of them are removing the punctuation from the source dictionaries. Can someone tell me what it is I've done wrong here and possibly how to fix it? I'm at a loss and had a coworker check it and he says this ought to be working as well.
int i = 1;
boolean checker = true;
Scanner inputWords;
PrintWriter writer = new PrintWriter(
"/home/htarbox/Desktop/fullDictionary.txt");
String comparison, punctReplacer;
ArrayList<String> compilation = new ArrayList<String>();
while (i <9)
{
inputWords = new Scanner(new File("/home/htarbox/Desktop/"+i+".txt"));
while(inputWords.hasNext())
{
punctReplacer = inputWords.next();
punctReplacer.replaceAll("[;.:\"()!?\\t\\n]", "");
punctReplacer.replaceAll(",", "");
punctReplacer.replaceAll("\u201C", "");
punctReplacer.replaceAll("\u201D", "");
punctReplacer.replaceAll("’", "'");
System.out.println(punctReplacer);
compilation.add(punctReplacer);
}
}
inputWords.close();
}
i = 0;
The line
punctReplacer.replaceAll(",", "");
returns a new String with your replacement (which you're ignoring). It doesn't modify the existing String. As such you need:
punctReplacer = punctReplacer.replaceAll(",", "");
Strings are immutable. Once created you can't change them, and any String manipulation method will return a new String.
As strings are immutable you have to reset your variable:
punctReplacer = punctReplacer.replaceAll("[;.:\"()!?\\t\\n]", "");
(btw, immutable means that you cannot change the value once it has been set, so with String you always have to reset the variable if you want to change it)
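A short sketch showing both the discarded result and the fix. The helper name clean and the exact character class are assumptions that merge the replacements from the question into one pass; adjust them to the punctuation in your dictionaries.

```java
class PunctuationStripper {
    // Removes the punctuation listed in the question and normalises the
    // typographic apostrophe. Each call returns a NEW String, so the result
    // must be assigned or returned.
    static String clean(String token) {
        String result = token.replaceAll("[;.:,\"()!?\u201C\u201D]", "");
        return result.replace('\u2019', '\'');
    }

    public static void main(String[] args) {
        String token = "\u201CHello,\u201D";
        token.replaceAll(",", "");               // result discarded: token is unchanged
        System.out.println(token.contains(",")); // true
        token = clean(token);                    // reassign to keep the change
        System.out.println(token);               // Hello
    }
}
```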

How can I get integers from strings in my situation?

Short story: I generate random digits (0-9) and an end symbol '/' (the line-end symbol; when I meet it, I move on to the next line in the file). After generating my numbers and writing them to the file, I want to read those numbers back from the file, not as strings but as integers.
Assume my file looks like this:
846525451454341*
*
0067617354809629733035*
3313449117867514*
02337436891267261671546*
469980603887044*
7*
9*
642*
*
0617044835719095066*
5*
7175887168189821760*
581*
76300152922692817*
As you may have noticed, a line may hold only '*' in some cases (as I said, it is randomly generated).
My purpose
I want to get these lines back as integers. For example, I read one line until I meet the end symbol ('/'), then I move on to the next line, and so on.
Some snippet:
public void readGeneratedFile() throws IOException {
try(BufferedReader r= new BufferedReader(new FileReader("C:\\java\\numbers.txt"))){
int ch;
s = new String();
while((ch=r.read())!=-1){
s+=String.valueOf(Character.toChars(ch)).replace(" ",""); // Here I take all file into String, But this approach is leading to boilerplate code;
}
// catch IOException , finally close the file.
My question
How can I get those lines back as integers? (Suppose I want to perform some operations on those numbers.) It would be great if you get the idea of what I want to do.
Thanks.
EDITED:
Sorry for the misunderstanding; that is not what I want. I want to get back separate values. For example, if I have the string 123456/564654/21, my Integer array[1][index] should look like 1,2,3,4,5,6; then, when I meet the end-of-line symbol '/', I jump to array[2][index] and fill it with the next line in the file.
You can also try it like this:
BufferedReader br = new BufferedReader(new FileReader("file.txt"));
try {
    String line = br.readLine();
    BigInteger integer;
    while (line != null) {
        line = line.replace("*", "").trim();
        if (!line.isEmpty()) { // lines holding only '*' have no digits to parse
            integer = new BigInteger(line);
            // Your stuff
        }
        line = br.readLine();
    }
} finally {
    br.close();
}
Your strings exceed the integer limit, so you need BigInteger.
Ex:
BigInteger b = new BigInteger("7175887168189821760");
And you cannot get them back as int values, since they exceed the limit.
Use a StringBuilder as char accumulator (StringBuilder.append()) and finally
int result = Integer.parseInt(builder.toString());
This will help
Integer.parseInt(String)
Since you have values larger than an Integer can store, you can go for
Long.parseLong(String)
The other way is to use BigInteger(String) for working on the big numbers
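To tie the options together, here is a sketch with hypothetical helpers (NumberLineParser, parseLine, fitsInLong, toDigits are illustrative names). It assumes each line holds only digits followed by the '*' marker shown in the sample file. parseLine returns a BigInteger for any length, fitsInLong reports whether Long.parseLong would have been enough, and toDigits produces the per-digit array described in the question's edit.

```java
import java.math.BigInteger;
import java.util.Arrays;

class NumberLineParser {
    // Strips the '*' end marker; returns null for lines that hold only the marker.
    static BigInteger parseLine(String line) {
        String digits = line.replace("*", "").trim();
        return digits.isEmpty() ? null : new BigInteger(digits);
    }

    // True if the value fits in a signed 64-bit long.
    static boolean fitsInLong(BigInteger value) {
        return value.bitLength() <= 63;
    }

    // Converts "123456*" into {1, 2, 3, 4, 5, 6}; assumes digit characters only.
    static int[] toDigits(String line) {
        String digits = line.replace("*", "").trim();
        int[] result = new int[digits.length()];
        for (int i = 0; i < digits.length(); i++) {
            result[i] = Character.digit(digits.charAt(i), 10);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("642*"));                                // 642
        System.out.println(parseLine("*"));                                   // null
        System.out.println(fitsInLong(parseLine("7175887168189821760*")));    // true
        System.out.println(fitsInLong(parseLine("0067617354809629733035*"))); // false
        System.out.println(Arrays.toString(toDigits("123456*")));             // [1, 2, 3, 4, 5, 6]
    }
}
```

Note that the 22-digit line from the sample file is too large even for a long, which is why BigInteger (or a per-digit array) is the safer choice for this data.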
