BufferedReader and enumerating multiple lines in Java

I am in the process of making a java application that reads through a .ttl file line by line and creates a graphml file to represent the ontology.
I am having some trouble figuring out how to enumerate a certain section.
I am using BufferedReader to read each line.
For example, I have the following:
else if (line.contains("owl:oneOf")){
// insert code to enumerate list contained in ( )
}
And this is what the .ttl looks like for oneOf:
owl:oneOf (GUIFlow:ExactlyOne
GUIFlow:OneOrMore
GUIFlow:ZeroOrMore
GUIFlow:ZeroOrOne )
I need to return those 4 objects as one list, to be used as part of a graphical representation of an ontology.

Apparently you have some kind of loop going through the file. Here are some ideas:
1) Introduce a "state" into the loop so that upon reading the next line it will know that it's actually inside the oneOf list. A dynamic array to store the list can serve as the state. You create the list when encountering the (, and you send the list wherever it is needed when encountering the ) and then delete the list after that. A complication is that according to your source format you will have to create the list before adding values to it, and process and delete the list after adding values, because ( and ) are on the same lines as actual values.
Vector<String> oneOfList = null;
while(reader.ready()){
String line=reader.readLine();
if(line.contains("foo")){
...
}
else if (line.contains("owl:oneOf")){
oneOfList = new Vector<String>();
}
if(oneOfList!=null){
String str = line.trim();
int a = str.indexOf("("); // -1 if not found, OK
int b = str.indexOf(")");
if(b<0) b=str.length();
oneOfList.add(str.substring(a+1,b).trim());
}
if (oneOfList != null && line.contains(")")){ // only close a list that is actually open
storeOneOf(oneOfList);
oneOfList=null;
}
}
2) When the oneOf header is encountered, create another small loop to read its values. A possible drawback may be that you end up with two loops iterating over the file and two calls to reader.readLine, which may complicate things or may not.
while(reader.ready()){
String line=reader.readLine();
if(line.contains("foo")){
...
}
else if (line.contains("owl:oneOf")){
Vector<String> oneOfList = new Vector<String>();
while(true){
String str = line.trim();
int a = str.indexOf("("); // -1 if not found, OK
int b = str.indexOf(")");
int c = (b>=0) ? b : str.length();
oneOfList.add(str.substring(a+1,c).trim());
if(b>=0) break;
line=reader.readLine();
if(line==null) break; // end of file reached before the closing )
}
storeOneOf(oneOfList);
}
}
3) The above algorithms rely on the fact that the header, the ( and the first value are on the same line, etc. If the source file is formatted a bit differently, the parsing will fail. A more flexible approach may be to use StreamTokenizer which automatically ignores whitespace and separates the text into words and stand-alone symbols:
StreamTokenizer tokzr=new StreamTokenizer(reader);
tokzr.wordChars(':',':');
while( tokzr.nextToken() != tokzr.TT_EOF ){
if( tokzr.ttype==tokzr.TT_WORD && tokzr.sval.equals("foo") ){
...
}
else if ( tokzr.ttype==tokzr.TT_WORD && tokzr.sval.equals("owl:oneOf") ){
if(tokzr.nextToken()!='(') throw new Exception("\"(\" expected");
Vector<String> oneOfList = new Vector<String>();
while(tokzr.nextToken() == tokzr.TT_WORD){
oneOfList.add(tokzr.sval);
}
storeOneOf(oneOfList);
if(tokzr.ttype!=')') throw new Exception("\")\" expected");
}
}
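For reference, here is idea 3 as a small self-contained sketch that can be run against an in-memory string. The class and method names (OneOfTokenizer, readOneOf) are made up for illustration, and it only handles the simple layout shown in the question. Real Turtle files contain characters that the default StreamTokenizer settings treat specially (for example, / starts a comment and digits are parsed as numbers), so the tokenizer would need more configuration for general input.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class OneOfTokenizer {
    /** Returns the values of the first owl:oneOf ( ... ) list, or an empty list if there is none. */
    static List<String> readOneOf(Reader reader) throws IOException {
        StreamTokenizer tok = new StreamTokenizer(reader);
        tok.wordChars(':', ':'); // keep prefixed names such as GUIFlow:ExactlyOne in one token
        List<String> values = new ArrayList<String>();
        while (tok.nextToken() != StreamTokenizer.TT_EOF) {
            if (tok.ttype == StreamTokenizer.TT_WORD && tok.sval.equals("owl:oneOf")) {
                if (tok.nextToken() != '(') {
                    throw new IOException("\"(\" expected");
                }
                // Collect words until something that is not a word shows up.
                while (tok.nextToken() == StreamTokenizer.TT_WORD) {
                    values.add(tok.sval);
                }
                if (tok.ttype != ')') {
                    throw new IOException("\")\" expected");
                }
                return values;
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        String ttl = "owl:oneOf (GUIFlow:ExactlyOne\nGUIFlow:OneOrMore\n"
                   + "GUIFlow:ZeroOrMore\nGUIFlow:ZeroOrOne )";
        System.out.println(readOneOf(new StringReader(ttl)));
    }
}
```

Because whitespace and line breaks are skipped by the tokenizer, the same method works whether the values are on one line or spread over several.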

Have you considered (and rejected) existing solutions, e.g. Jena?

Related

Looking for elegant way of searching through an array of strings for duplicate entries. Brute force method works

I have a file of alphanumeric VIN numbers from vehicles (saved as strings). I need to parse through this file and determine
1) Is a VIN duplicated? If so, how many times
2) Write the duplicated VIN and the total number of duplicates to a text file
I have gotten it to work using the brute-force method of two nested for loops, but I am looking for a more elegant way to parse the strings. I'm using Java 7 in NetBeans 8.2 and it doesn't appear to like using the .set or hashmap.
Constraints
1) The VINs may be in any order
2) The duplicates can be scattered through the file at random
/* a) Open input and output files
*/
try {
inputStream = new BufferedReader(new FileReader(fileName));//csv file
outputStream = new PrintWriter(new FileWriter("DuplicateVINs.txt"));
/* b) Read in file line by line
then slice out the 17 digit VIN from the extra data I don't care about
*/
while ((thisLine = inputStream.readLine()) != null) {
l = thisLine.substring(1, 18);
linesVIN.add(l.split(","));//why does this split have to be here?
}
/*c) Now that the List is full calculate its size and then write to array of strings
*/
String[][] inputArray = new String[linesVIN.size()][];
i=linesVIN.size();
System.out.println(i);
linesVIN.toArray(inputArray);
/* d) Will use two nested for loops to look for duplicates
*/
countj=0;
countk=0;
for (int j = 1;j<=i-1; j++){ //j loop
duplicateVIN=Arrays.toString(inputArray[j]);
for(int k=1;k<=i-1;k++){
if(duplicateVIN.equals(Arrays.toString(inputArray[k]))){
countk=countk+1;
foundFlag=true;
} else{
//
if(countk>=2){
//if(j!=k){
System.out.println(duplicateVIN + countk);
//} // see if removes the first duplicate
}
foundFlag=false;
countk=0;
}
} //ends k loop
countj=j;
} //ends j loop
} //Completes the try
[2q3CDZC90JH1qqqqq], 3
[2q4RC1NG1JR1qqqqq], 4
[2q3CDZC96KH1qqqqq], 2
[1q4PJMDN8KD1qqqqq], 7
I'm using Java 7 in NetBeans 8.2 and it doesn't appear to like using the .set or hashmap.
Your first step should be to figure out what you're doing wrong with a map. A hashmap is the perfect solution for this problem, and is really what you should be using.
Here's a broad example of how the solution would work, using the information you provided.
Map<String,Integer> countMap = new HashMap<String,Integer>();
while ((thisLine = inputStream.readLine()) != null) {
l = thisLine.substring(1, 18);
if(countMap.containsKey(l)){
countMap.put(l, countMap.get(l)+1);
}else{
countMap.put(l,1);
}
}
I'm assuming that the while loop you provided is properly iterating over all VIN numbers.
After this while loop is completed you would just need to output the values of each key, similar to this:
for(String vin : countMap.keySet()){
System.out.println("VIN: "+vin+" COUNT: "+countMap.get(vin));
}
If I've read your problem correctly, there is no need for a nested loop.
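Putting the two steps together, a minimal Java 7-compatible sketch might look like the following. The class and method names (DuplicateVins, countVins, duplicateReport) are invented for this example, and it assumes the 17-character VINs have already been sliced out of each line. Only VINs seen at least twice are reported, since that is what the question asks for.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DuplicateVins {
    /** Counts how often each VIN occurs; LinkedHashMap keeps first-seen order. */
    static Map<String, Integer> countVins(List<String> vins) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (String vin : vins) {
            Integer seen = counts.get(vin);
            counts.put(vin, seen == null ? 1 : seen + 1);
        }
        return counts;
    }

    /** Builds one "VIN, count" line for every VIN that occurs at least twice. */
    static List<String> duplicateReport(Map<String, Integer> counts) {
        List<String> lines = new ArrayList<String>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= 2) {
                lines.add(entry.getKey() + ", " + entry.getValue());
            }
        }
        return lines;
    }
}
```

Each returned line can then be written with outputStream.println(line). The whole thing is a single pass over the input plus a single pass over the map, regardless of the order in which the duplicates appear.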

Iterate through a dictionary array

I have a String array containing a poem which has deliberate spelling mistakes. I am trying to iterate through the String array to identify the spelling mistakes by comparing the String array to a String array containing a dictionary. If possible I would like a suggestion that allows me to continue using nested for loops
for (int i = 0; i < poem2.length; i++) {
boolean found = false;
for (int j = 0; j < dictionary3.length; j++) {
if (poem2[i].equals(dictionary3[j])) {
found = true;
break;
}
}
if (found==false) {
System.out.println(poem2[i]);
}
}
The output is printing out the correctly spelt words as well as the incorrectly spelt ones and I am aiming to only print out the incorrectly spelt ones. Here is how I populate the 'dictionary3' and 'poem2' arrays:
char[] buffer = null;
try {
BufferedReader br1 = new BufferedReader(new
java.io.FileReader(poem));
int bufferLength = (int) (new File(poem).length());
buffer = new char[bufferLength];
br1.read(buffer, 0, bufferLength);
br1.close();
} catch (IOException e) {
System.out.println(e.toString());
}
String text = new String(buffer);
String[] poem2 = text.split("\\s+");
char[] buffer2 = null;
try {
BufferedReader br2 = new BufferedReader(new java.io.FileReader(dictionary));
int bufferLength = (int) (new File(dictionary).length());
buffer2 = new char[bufferLength];
br2.read(buffer2, 0, bufferLength);
br2.close();
} catch (IOException e) {
System.out.println(e.toString());
}
String dictionary2 = new String(buffer);
String[] dictionary3 = dictionary2.split("\n");
Your basic problem is in line
String dictionary2 = new String(buffer);
where you were trying to convert the characters representing the dictionary, which are stored in buffer2, but you used buffer (without the 2 suffix). Naming variables this way suggests that you need either a loop or, in this case, a separate method that returns the array of words held in a given file (you could also add the delimiter to split on as a method parameter).
So your dictionary2 held characters from buffer which represented poem, not dictionary data.
Another problem is
String[] dictionary3 = dictionary2.split("\n");
because you are splitting only on \n, but some operating systems such as Windows use \r\n as the line separator. So your dictionary array may contain words like foo\r instead of foo, which will cause poem2[i].equals(dictionary3[j]) to always fail.
To avoid this problem you can split on \\R (available since Java 8) or \r?\n|\r.
There are other problems in your code, like closing the resource inside the try block. If an exception is thrown before that point, close() will never be invoked, leaving the resource open. To solve this, close resources in a finally block (which always executes after try, regardless of whether an exception is thrown), or better, use try-with-resources.
BTW you can simplify/clarify your code responsible for reading words from files
List<String> poem2 = new ArrayList<>();
// try-with-resources closes the scanner even if reading fails
try (Scanner scanner = new Scanner(new File(yourFileLocation))) {
while(scanner.hasNext()){//has more words
poem2.add(scanner.next());
}
}
For dictionary instead of List you should use Set/HashSet to avoid duplicates (usually sets also have better performance when checking if they contain some elements or not). Such collections already provide methods like contains(element) so you wouldn't need that inner loop.
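As a rough illustration of the Set-based approach, here is a sketch that works on text already read into strings. The names SpellCheck, toDictionary and misspelled are hypothetical, and it assumes the dictionary has one word per line and that matching is case-sensitive, as in the original equals() comparison.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SpellCheck {
    /** Builds a lookup set from dictionary text that has one word per line. */
    static Set<String> toDictionary(String dictionaryText) {
        // Accept \r\n, \n and \r so that no word keeps a stray '\r'.
        return new HashSet<>(Arrays.asList(dictionaryText.split("\\r?\\n|\\r")));
    }

    /** Returns the poem words that are not in the dictionary, in order of appearance. */
    static List<String> misspelled(String poemText, Set<String> dictionary) {
        List<String> unknown = new ArrayList<>();
        for (String word : poemText.trim().split("\\s+")) {
            if (!word.isEmpty() && !dictionary.contains(word)) {
                unknown.add(word);
            }
        }
        return unknown;
    }
}
```

The inner loop over the dictionary disappears because contains() on a HashSet does the lookup directly.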
I copied your code and ran it, and I noticed two issues. Good news is, both are very quick fixes.
#1
When I printed out everything in dictionary3 after it is read in, it is the exact same as everything in poem2. This line in your code for reading in the dictionary is the problem:
String dictionary2 = new String(buffer);
You're using buffer, which was the variable you used to read in the poem. Therefore, buffer contains the poem and your poem and dictionary end up the same. I think you want to use buffer2 instead, which is what you used to read in the dictionary:
String dictionary2 = new String(buffer2);
When I changed that, the dictionary and poem appear to have the proper entries.
#2
The other problem, as Pshemo pointed out in their answer (which is completely correct, and a very good answer!) is that you are splitting on \n for the dictionary. The only thing I would say differently from Pshemo here is that you should probably split on \\s+ just like you did for the poem, to stay consistent. In fact, when I debugged, I noticed that the dictionary words all have "\r" appended to the end, probably because you were splitting on \n. To fix this, change this line:
String[] dictionary3 = dictionary2.split("\n");
To this:
String[] dictionary3 = dictionary2.split("\\s+");
Try changing those two lines, and let us know if that resolves your issue. Best of luck!
Convert your dictionary to an ArrayList and use contains instead.
Something like this should work:
if(dictionary3.contains(poem2[i]))
found = true;
else
found = false;
With this method you can also get rid of that nested loop, as the contains method handles that for you.
You can convert your Dictionary to an ArrayList with the following method:
new ArrayList<>(Arrays.asList(array))

Checking each line of data in a text file and identifying invalid data

So I've looked around and couldn't find anything specifically related to what I'm wanting to accomplish, so I'm here to ask some of you folks if y'all could help. I am a Uni student, and am struggling to wrap my head around a specific task.
The task revolves around the following:
Being able to have the program we develop check each line of data in a file we input, and report any errors (such as missing data) to the console via messages.
I am currently using Scanner to scan the file and .split to split the text at each hyphen that it finds and then placing that data into a String[] splitText array... the code for that is as follows:
File Fileobject = new File(importFile);
Scanner fileReader = new Scanner(Fileobject);
while(fileReader.hasNext())
{
String line = fileReader.nextLine();
String[] splitText = line.split("-");
}
The text contained within the file we are scanning, is formatted as follows:
Title - Author - Price - Publisher - ISBN
Title, Author and Publisher are varying lengths, ISBN is 11 characters, and Price is to two decimal places. I am able to easily print valid data to the console, though it's the whole validating and printing errors (such as: "The book title may be missing.") to the console which has my head twisted.
Would IF statements be suited to checking each line of data? And if so, how would those be structured?
If you want to check the length/presence of each of the five columns, then consider the following:
while (fileReader.hasNextLine()) {
String line = fileReader.nextLine();
String[] splitText = line.split("-");
if (splitText.length < 5) {
System.out.println("One or more columns is entirely missing.");
continue; // skip this line
}
// trim() removes the spaces around each " - " separator
if (splitText[0].trim().isEmpty()) {
System.out.println("Title is missing");
}
if (splitText[1].trim().isEmpty()) {
System.out.println("Author is missing");
}
boolean isValidPrice = true;
try {
Double.parseDouble(splitText[2].trim());
}
catch (NumberFormatException e) {
isValidPrice = false;
}
if (!isValidPrice) {
System.out.println("Found an invalid price " + splitText[2] + " but expected a decimal.");
}
if (splitText[4].trim().length() != 11) {
System.out.println("Found an invalid ISBN.");
}
}
I do a two level validation above. If splitting the current line on dash does not yield 5 terms, then we have missing columns and we do not attempt to even guess what data might actually be there. If there are 5 expected columns, then we do a validation on each field by length and/or by expected value.
Yes, your best bet is to use if statements (I can't think of another way?). For cleanliness, I recommend you create a validateData(String data) method, or multiple validator functions.
For example, because you know each line is going to be in the Title - Author - Price - Publisher - ISBN format, you can write code like this:
public void validatePrice(String data) {
//Write your logic to validate.
}
public void validateAuthor(String data) {
//Write your logic to validate.
}
...
Then in your while loop you can call
validatePrice(splitText[2]);
validateAuthor(splitText[1]);
for each validator method.
Depending on your needs you can make this a bit more OOP-style, but this is one cleanish way to do it.
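To make the validator idea concrete, here is one possible sketch. The class BookLineValidator and its method names are made up for this example. It assumes that no field (title, author, publisher or ISBN) contains a hyphen itself, since the line is split on "-". Instead of printing directly, it collects the messages so the caller can decide how to report them.

```java
import java.util.ArrayList;
import java.util.List;

class BookLineValidator {
    static boolean isPresent(String field) {
        return !field.trim().isEmpty();
    }

    static boolean isValidPrice(String field) {
        try {
            Double.parseDouble(field.trim());
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    static boolean isValidIsbn(String field) {
        return field.trim().length() == 11; // the task states that the ISBN is 11 characters
    }

    /** Returns one message per problem found; an empty list means the line looks valid. */
    static List<String> validateLine(String line) {
        List<String> errors = new ArrayList<String>();
        String[] f = line.split("-", -1); // -1 keeps empty trailing fields
        if (f.length != 5) {
            errors.add("Expected 5 fields but found " + f.length + ".");
            return errors;
        }
        if (!isPresent(f[0])) errors.add("The book title may be missing.");
        if (!isPresent(f[1])) errors.add("The author may be missing.");
        if (!isValidPrice(f[2])) errors.add("The price is missing or not a number.");
        if (!isPresent(f[3])) errors.add("The publisher may be missing.");
        if (!isValidIsbn(f[4])) errors.add("The ISBN is not 11 characters long.");
        return errors;
    }
}
```

Inside the while loop you would then call validateLine(line) and print every returned message, for example together with the line number.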
The first thing you want to check for validation is that you have the proper number of entries (in this case, check that the array is of size 5), and after that, you want to check each piece of data.
If statements are a good way to go, and you can do something as simple as:
if(title.isBad()) print("error");
if(author.isBad()) print("error");
if(price.isBad()) print("error");
if(publisher.isBad()) print("error");
if(isbn.isBad()) print("error");
Replace .isBad() with whichever checks you need, such as string[i].isEmpty(), the length of the ISBN, etc.
For ones that take longer to check, such as the price, you'll want some nested for loops, checking that it contains a period, otherwise contains only digits, and has only 2 digits after the period.
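As an alternative to hand-written loops for the price, a regular expression can express the same three conditions in one place. This is only a sketch: the names PriceFormat and isTwoDecimalPrice are hypothetical, and it assumes a price is plain digits, a period, then exactly two digits, with no currency symbol or thousands separator.

```java
import java.util.regex.Pattern;

class PriceFormat {
    // One or more digits, a literal period, then exactly two digits.
    private static final Pattern PRICE = Pattern.compile("\\d+\\.\\d{2}");

    static boolean isTwoDecimalPrice(String text) {
        return PRICE.matcher(text.trim()).matches();
    }
}
```

Unlike Double.parseDouble, this rejects values such as 10 or 9.9 that are valid numbers but do not have two decimal places.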
Something helpful to know is the wrapper classes for the primitive data types; they allow you to do
Character.isLetter(strings[i].charAt(j))
in the place of
(strings[i].charAt(j) >= 'A' && strings[i].charAt(j) <= 'Z') ||
(strings[i].charAt(j) >= 'a' && strings[i].charAt(j) <= 'z')
and
try{
Double.parseDouble(strings[i]);
} catch (NumberFormatException e) {
// not a valid number
}
instead of manually checking the price.
Hope this helps!

How to read an empty set from a text file in Java

I have 3 String fields per line within my text file. There are 4 lines in total. The first 2 fields (field[0] and field[1]) are already filled in but field 3 (field[2]) is yet to be generated so it shall remain empty. Is there any way I can read in this text file line by line without getting a java.lang.ArrayIndexOutOfBoundsException: 1 error? I have included my code used for reading in the file.
import java.io.*;
public class PassGen {
public static void main(String args[]) throws Exception{
BufferedReader inKb = new BufferedReader(new InputStreamReader(System.in));
BufferedReader inF = new BufferedReader(new FileReader(new File("students.txt")));
String line = inF.readLine();
int cnt = 0;
Student pupil[] = new Student[6];
while(line != null) {
String field[] = line.split("//s");
pupil[cnt] = new Student(field[0], field[1], field[2]);
cnt++;
inF.readLine();
}
}
}
You can simply add a check on the number of fields:
if(field.length > 2) {
pupil[cnt] = new Student(field[0], field[1], field[2]);
} else {
pupil[cnt] = new Student(field[0], field[1], null);
}
Alternatively, you can use the overloaded split method that takes a limit parameter and set that to -1 to include the empty field. From the documentation of String#split(String regex, int limit):
The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
Note that you need to use \\s instead of //s for the whitespace regex (this needs to be corrected either way).
String field[] = line.split("\\s", -1);
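The difference between the two split variants is easy to see on a small input. The sketch below uses a made-up line and assumes the empty third field shows up as a trailing whitespace character after the second field; if the line simply ends after the second field, both variants return two elements and the length check above is still needed.

```java
class SplitLimitDemo {
    /** Splits on single whitespace characters and keeps trailing empty fields. */
    static String[] fields(String line) {
        return line.split("\\s", -1);
    }

    public static void main(String[] args) {
        String line = "Alice Smith "; // third field is empty
        System.out.println(line.split("\\s").length); // trailing empty string is discarded
        System.out.println(fields(line).length);      // trailing empty string is kept
    }
}
```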
I think your problem lies in the way you are managing your data, but you can use something like this to read from any array without getting any exceptions:
public static String getIfExists(final String[] values, final int position) {
return (values != null) && (values.length > position) ? values[position] : null;
}
Then you can fill every field like new Student(getIfExists(field, 0), getIfExists(field, 1), getIfExists(field, 2));
Of course you can optimize this a little more, but it would do the trick without having to think about how many fields you might get in the future or needing a lot of if/case conditions.

Java CSV data into array

I'm having some trouble trying to turn an excel file into an arraylist or just an array containing information stored in different cells.
The information is stored in excel like this example:
Owner's info ; Car's Owner ; Car's seller;
Date; Car brand ; Number of doors ; Car license plate ; Car color ;
Price
2.3.2013 ; Fiat ; 4 ; 23-21-AA ; black ; 10.000
2.1.2014 ; Renault ; 4 ; 23-12-BA ; blue ; 25.000
I will need to access information such as getBrand() , getLicense etc, so I wanted to store this different information into arraylists, OwnerInfo[ Owner[] , Seller[] ]
Later I would like to sum the car prices or something else, and because of that I'd like to access CarInfo[6] and sum them all.
I'm kinda lost on this, need some suggestions or tips.
public static void main(String args[]) {
try {
FileInputStream fstream = new FileInputStream(
"file.csv");
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String strLine;
while ((strLine = br.readLine()) != null) {
String[] tokens = strLine.split(";");
for (int i = 0; i < tokens.length; i++) {
System.out.println(tokens[i]);
}
}
in.close();
} catch (Exception e) {
System.err.println("Error: " + e.getMessage());
}
}
}
Another question is when I use
for (int i = 0; i < tokens.length; i++) {
System.out.println(tokens[0]);
}
It will print first column of excel (.csv) , but if I print tokens[1] it won't print anything. Why's that?
Also, if I do the same thing on a .txt file it will print the second "column".
Maybe you could reconsider your design. Right now you store each line in a Java array, which is not convenient for your later processing (as you described).
You can think about the following two alternatives:
1) If you love arrays, you can build a Map<String, String[]> (or use a numeric array type), where the key is the label from your header line, like number of doors or car owner (of course you can use a short name for that), and the arrays are the columns. Then, if you want some column's data, you just call map.get(name). If you can use a third-party library, consider an extended map type like Multimap from Guava to ease your implementation.
2) You can build your own type (class) for each row, like CarData, create one CarData object per row, and pack them into a Collection. If required, you could build some helper methods to get the interesting values from the list.
Personally I prefer the 2nd one, since it is flexible. Think about future requirements like sorting (Comparator), outputting lines in another format, reporting, and so on.
I hope I understood your problem right and hope the text above helps.
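A minimal sketch of the row-class alternative could look like this. The field names and the parse and totalPrice helpers are assumptions made for illustration, based on the six columns in the sample data. It also assumes the price text can be read directly as a decimal number; if the period in 10.000 is a thousands separator in your locale, strip it before parsing.

```java
import java.math.BigDecimal;
import java.util.List;

class CarData {
    final String date;
    final String brand;
    final int doors;
    final String licensePlate;
    final String color;
    final BigDecimal price;

    CarData(String date, String brand, int doors,
            String licensePlate, String color, BigDecimal price) {
        this.date = date;
        this.brand = brand;
        this.doors = doors;
        this.licensePlate = licensePlate;
        this.color = color;
        this.price = price;
    }

    String getBrand() { return brand; }
    String getLicense() { return licensePlate; }

    /** Parses one data row such as "2.3.2013 ; Fiat ; 4 ; 23-21-AA ; black ; 10.000". */
    static CarData parse(String line) {
        String[] t = line.split(";");
        if (t.length < 6) {
            throw new IllegalArgumentException("Expected 6 fields: " + line);
        }
        return new CarData(t[0].trim(), t[1].trim(), Integer.parseInt(t[2].trim()),
                t[3].trim(), t[4].trim(), new BigDecimal(t[5].trim()));
    }

    /** Sums the price column over all rows. */
    static BigDecimal totalPrice(List<CarData> cars) {
        BigDecimal total = BigDecimal.ZERO;
        for (CarData car : cars) {
            total = total.add(car.price);
        }
        return total;
    }
}
```

In the reading loop you would skip the header lines, call CarData.parse(strLine) for each data row, and add the result to a List<CarData>.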
You can use https://github.com/CyborTronik/fluent-ssv for transforming CSV to beans. In your case you need to provide values separator to stream builder.
So you will have something like:
carsStream = new SsvStreamBuilder<Car>()
.withSeparator(";")
.forEntity(Car.class)
.stream("~/path/to/file");
And voila, use stream of beans instead of arrays.
