Java: read CSV file as matrix

I'm new to writing Java code. I have experience writing code in scripting languages, and I'm trying to rewrite a piece of Python code in Java.
Python code below:
import pandas as pd
myFile = 'dataFile'
df = pd.DataFrame(pd.read_csv(myFile,skiprows=0))
inData = df.as_matrix()
I'm looking for a Java method equivalent to as_matrix in pandas, which converts a DataFrame into a matrix (a 2D array).
I've been searching for some time now but can't find a method that does the conversion like in Python. Is there a third-party library or something along those lines I could use? Any direction would help me a lot. Thank you heaps.

What you want to do is simple and requires minimal code on your part, so I suggest you code it yourself. Here is an example implementation:
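import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

List<String[]> rowList = new ArrayList<>();
// try-with-resources closes the reader automatically, even if an exception is thrown
try (BufferedReader br = new BufferedReader(new FileReader("pathtocsvfile.csv"))) {
    String line;
    while ((line = br.readLine()) != null) {
        // Split each line on commas into its individual fields
        String[] lineItems = line.split(",");
        rowList.add(lineItems);
    }
} catch (IOException e) {
    // Handle any I/O problems (missing file, unreadable file, etc.)
    e.printStackTrace();
}

// Copy the list of rows into a 2D array
String[][] matrix = new String[rowList.size()][];
for (int i = 0; i < rowList.size(); i++) {
    matrix[i] = rowList.get(i);
}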
This opens a BufferedReader that reads the CSV file line by line, splits each line on commas (your delimiter) into an array of Strings, and adds each array to a list. A list is used while reading because the number of rows isn't known in advance; once the whole file has been read, the list is copied into a 2D String matrix. Hope this helps.
Hint: there are many improvements that could be made to this little piece of code (e.g. trimming leading and trailing spaces, supporting user-defined delimiters, or handling quoted fields that contain commas, which a plain split(",") does not), but this should be a good starting point.
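For numeric data, pandas' as_matrix() gives you numbers, while the code above gives you Strings. If your CSV is purely numeric, a minimal sketch like the following parses each field into a double. It assumes there are no quoted fields or embedded commas, and it skips the first line when skipHeader is true, because pd.read_csv treats the first line as column names by default. The names CsvMatrix and readMatrix are illustrative, not from any library.

```java
import java.io.*;
import java.util.*;

class CsvMatrix {
    /**
     * Reads comma-separated numeric data into a double[][].
     * Assumes every field is a plain number (no quotes, no embedded commas).
     * When skipHeader is true the first line is treated as column names and ignored.
     */
    static double[][] readMatrix(BufferedReader in, boolean skipHeader) throws IOException {
        List<double[]> rows = new ArrayList<>();
        String line;
        boolean first = true;
        while ((line = in.readLine()) != null) {
            if (first && skipHeader) {
                first = false;
                continue; // skip the header line
            }
            first = false;
            if (line.trim().isEmpty()) {
                continue; // ignore blank lines
            }
            String[] fields = line.split(",", -1);
            double[] row = new double[fields.length];
            for (int i = 0; i < fields.length; i++) {
                row[i] = Double.parseDouble(fields[i].trim());
            }
            rows.add(row);
        }
        // Copy the list of rows into a 2D array
        return rows.toArray(new double[0][]);
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of the CSV file
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
            double[][] matrix = readMatrix(in, true);
            System.out.println(Arrays.deepToString(matrix));
        }
    }
}
```

An empty field makes Double.parseDouble throw NumberFormatException, whereas pandas would produce NaN, so handle that case if your data has gaps.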

Related

Is there a way to categorize code so that I can minimize something and use a comment to write what it does for organization?

This is a bit unorthodox, and I'm relatively new to coding, but here is what I want to do:
I am trying to make a flashcard simulator (like Quizlet) to practice coding, and so far I have successfully made a program that takes a text file with terms and definitions and converts it into two arrays (terms and definitions).
This code takes up a lot of space on screen, and I noticed that you can collapse certain loops and blocks so that only their first line shows, which reduces clutter. So I want to collapse the whole thing and just write a comment next to it saying what it does, so it reads like a single line of code.
The problem is that wrapping the code in a redundant loop (like a for loop that executes once) makes the arrays unusable outside of that loop. Any ideas how I can do this?
My best idea is a method, but since I am a beginner I don't entirely know how to write one yet, and my impression is that it would live outside the main method, separate from the rest of the code, which is not what I want because I only use it once.
Thanks,
This is my code if you want it:
BufferedReader reader = new BufferedReader(new FileReader("Flashcards.txt"));
String[] terms = new String[128];
String[] defs = new String[128];
String term;
String def;
String line;
byte c = 0;
while ((line = reader.readLine()) != null) {
    boolean after = false;
    def = "";
    term = "";
    for (short i = 0; i < line.length(); i++) {
        if (line.charAt(i) == ':')
            after = true;
        else if (!after)
            term = term + line.charAt(i);
        else
            def = def + line.charAt(i);
    }
    terms[c] = term;
    defs[c] = def.strip();
    c++;
}
reader.close();
In my personal opinion, a method isn't a bad idea, even if you are only going to use it once. A method lets you move all this code out of your main method and reduce it to one line of code there, collapse it outside of main, and make your code more maintainable in the future. As far as I know, Eclipse does not let you collapse arbitrary code in the way that you want. (IntelliJ IDEA and NetBeans do support custom folding regions marked by special comments, such as //region ... //endregion or // <editor-fold desc="..."> ... // </editor-fold>.)
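Following the suggestion to use a method, here is a minimal sketch that moves the parsing out of main so that main only contains one call. The name loadFlashcards is hypothetical. It uses Lists instead of fixed 128-element arrays, splits each line at the first ':' only (the original loop drops every colon, so a definition containing ':' loses it), and trims both the term and the definition. Lines without a ':' are skipped, which is an assumption; the original code would store such a line as a term with an empty definition.

```java
import java.io.*;
import java.util.*;

class Flashcards {
    /**
     * Reads "term: definition" lines and appends each part to the given lists.
     * Only the first ':' separates term from definition, so definitions may contain colons.
     * Lines with no ':' are skipped (an assumption made for this sketch).
     */
    static void loadFlashcards(BufferedReader reader, List<String> terms, List<String> defs) throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            terms.add(line.substring(0, colon).trim());
            defs.add(line.substring(colon + 1).trim());
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> terms = new ArrayList<>();
        List<String> defs = new ArrayList<>();
        // The parsing details live in loadFlashcards; here it is a single call
        try (BufferedReader reader = new BufferedReader(new FileReader("Flashcards.txt"))) {
            loadFlashcards(reader, terms, defs);
        }
        System.out.println(terms.size() + " cards loaded");
    }
}
```

If you still need arrays afterwards, terms.toArray(new String[0]) converts a list into one.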

Iterate through a dictionary array

I have a String array containing a poem which has deliberate spelling mistakes. I am trying to iterate through the String array to identify the spelling mistakes by comparing it to a String array containing a dictionary. If possible, I would like a suggestion that allows me to continue using nested for loops.
for (int i = 0; i < poem2.length; i++) {
    boolean found = false;
    for (int j = 0; j < dictionary3.length; j++) {
        if (poem2[i].equals(dictionary3[j])) {
            found = true;
            break;
        }
    }
    if (found == false) {
        System.out.println(poem2[i]);
    }
}
The output is printing out the correctly spelt words as well as the incorrectly spelt ones and I am aiming to only print out the incorrectly spelt ones. Here is how I populate the 'dictionary3' and 'poem2' arrays:
char[] buffer = null;
try {
    BufferedReader br1 = new BufferedReader(new java.io.FileReader(poem));
    int bufferLength = (int) (new File(poem).length());
    buffer = new char[bufferLength];
    br1.read(buffer, 0, bufferLength);
    br1.close();
} catch (IOException e) {
    System.out.println(e.toString());
}
String text = new String(buffer);
String[] poem2 = text.split("\\s+");
char[] buffer2 = null;
try {
    BufferedReader br2 = new BufferedReader(new java.io.FileReader(dictionary));
    int bufferLength = (int) (new File(dictionary).length());
    buffer2 = new char[bufferLength];
    br2.read(buffer2, 0, bufferLength);
    br2.close();
} catch (IOException e) {
    System.out.println(e.toString());
}
String dictionary2 = new String(buffer);
String[] dictionary3 = dictionary2.split("\n");
Your basic problem is in this line:
String dictionary2 = new String(buffer);
where you are trying to convert the characters representing the dictionary, stored in buffer2, but you used buffer (without the 2 suffix). Naming variables this way suggests that you either need a loop or, in this case, a separate method that returns the array of words held in a selected file (you can also add the delimiter the string should be split on as a method parameter).
So your dictionary2 held the characters from buffer, which represented the poem, not the dictionary data.
Another problem is this line:
String[] dictionary3 = dictionary2.split("\n");
because you are splitting only on \n, but some operating systems, like Windows, use \r\n as the line separator sequence. So your dictionary array may contain words like foo\r instead of foo, which will cause poem2[i].equals(dictionary3[j]) to always fail.
To avoid this problem you can split on "\\R" (available since Java 8) or "\\r?\\n|\\r".
There are other problems in your code, like closing resources within the try block. If an exception is thrown before that point, close() will never be invoked, leaving the resource unclosed. To solve this, close resources in a finally block (which always runs after try, regardless of whether an exception is thrown), or better, use try-with-resources.
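The separate method suggested above might look like this minimal sketch. The names readWords, poem.txt and dictionary.txt are hypothetical, and UTF-8 is an assumed encoding. It reads the file inside try-with-resources and, unlike the single read() call in the question (which may return fewer characters than requested), keeps calling read() until it returns -1. The text is trimmed before splitting so leading whitespace doesn't produce an empty first entry.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;

class WordFiles {
    /**
     * Returns the words in a file, split on the given regex (e.g. "\\s+" or "\\R").
     * The reader is closed automatically by try-with-resources.
     */
    static String[] readWords(Path file, String delimiterRegex) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            char[] chunk = new char[8192];
            int n;
            // read() may return fewer chars than requested, so loop until end of file (-1)
            while ((n = br.read(chunk, 0, chunk.length)) != -1) {
                sb.append(chunk, 0, n);
            }
        }
        String text = sb.toString().trim();
        return text.isEmpty() ? new String[0] : text.split(delimiterRegex);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file names for illustration
        String[] poem2 = readWords(Paths.get("poem.txt"), "\\s+");
        String[] dictionary3 = readWords(Paths.get("dictionary.txt"), "\\R");
        System.out.println(poem2.length + " poem words, " + dictionary3.length + " dictionary words");
    }
}
```

Because the same method reads both files, a mix-up like buffer versus buffer2 can no longer happen.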
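By the way, you can simplify the code responsible for reading words from files:
List<String> poem2 = new ArrayList<>();
// try-with-resources closes the Scanner; new Scanner(File) throws FileNotFoundException if the file is missing
try (Scanner scanner = new Scanner(new File(yourFileLocation))) {
    while (scanner.hasNext()) { // has more words
        poem2.add(scanner.next());
    }
}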
For the dictionary, instead of a List you should use a Set/HashSet to avoid duplicates (a HashSet's contains() check also takes constant time on average, whereas a List has to scan its elements). Such collections already provide methods like contains(element), so you wouldn't need that inner loop.
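Combining the Scanner loop with a HashSet dictionary, a minimal sketch could look like this. The name findMisspelled is hypothetical. The comparison is case-sensitive, and punctuation attached to a word (e.g. "fox,") counts as part of it, so you may want to normalize words first. To build the set from the existing array, new HashSet<>(Arrays.asList(dictionary3)) works.

```java
import java.util.*;

class SpellCheck {
    /**
     * Returns the words read from the scanner that are not in the dictionary.
     * HashSet.contains replaces the inner loop over the dictionary array.
     */
    static List<String> findMisspelled(Scanner poem, Set<String> dictionary) {
        List<String> misspelled = new ArrayList<>();
        while (poem.hasNext()) { // has more words
            String word = poem.next();
            if (!dictionary.contains(word)) {
                misspelled.add(word);
            }
        }
        return misspelled;
    }

    public static void main(String[] args) {
        Set<String> dictionary = new HashSet<>(Arrays.asList("the", "quick", "brown", "fox"));
        System.out.println(findMisspelled(new Scanner("the qick brown fox"), dictionary)); // prints [qick]
    }
}
```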
I copied your code and ran it, and I noticed two issues. The good news is that both are very quick fixes.
#1
When I printed out everything in dictionary3 after it is read in, it is the exact same as everything in poem2. This line in your code for reading in the dictionary is the problem:
String dictionary2 = new String(buffer);
You're using buffer, which was the variable you used to read in the poem. Therefore, buffer contains the poem and your poem and dictionary end up the same. I think you want to use buffer2 instead, which is what you used to read in the dictionary:
String dictionary2 = new String(buffer2);
When I changed that, the dictionary and poem appear to have the proper entries.
#2
The other problem, as Pshemo pointed out in their answer (which is completely correct, and a very good answer!), is that you are splitting on \n for the dictionary. The only thing I would do differently from Pshemo here is split on \\s+, just like you did for the poem, to stay consistent. In fact, when I debugged, I noticed that the dictionary words all have "\r" appended to the end, most likely because the file uses Windows \r\n line endings and splitting on \n leaves the \r behind. To fix this, change this line:
String[] dictionary3 = dictionary2.split("\n");
To this:
String[] dictionary3 = dictionary2.split("\\s+");
Try changing those two lines, and let us know if that resolves your issue. Best of luck!
Convert your dictionary to an ArrayList and use contains() instead. Arrays have no contains() method, so convert the array first:
List<String> dictionaryList = new ArrayList<>(Arrays.asList(dictionary3));
Then, inside the loop over the poem words, something like this should work:
found = dictionaryList.contains(poem2[i]);
With this method you can also get rid of that nested loop, as the contains() method does the search for you.

Setting two different text files as separate string arrays and finding matches from the two arrays in Java

So basically I'm trying to take two text files (one with many jumbled words and one with many dictionary words). I am supposed to take these two text files and convert them to two separate arrays.
After that, I need to compare the jumbled strings in the first array and match each one to its unjumbled counterpart among the dictionary words in the second array (e.g. aannab in the first array to banana in the second array).
I know how to set one array from a string, however I don't know how to do two from two separate text files.
Use a HashMap for matching, where the data from the first text file is the key of the Map and the data from the second text file is the value. Then you can look up the matching value by its key.
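One way to make the HashMap idea concrete uses a different keying scheme than the one described above: an anagram signature, i.e. a word's letters in sorted order. Every anagram of banana sorts to aaabnn, so each jumbled word can be looked up directly instead of being compared against every dictionary word. The class and method names are hypothetical, and the sketch assumes words are already lower case with no surrounding whitespace.

```java
import java.util.*;

class Unjumble {
    /** Sorts a word's letters, e.g. "banana" -> "aaabnn"; anagrams share the same key. */
    static String key(String word) {
        char[] letters = word.toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    /** Maps each jumbled word to the dictionary words that are anagrams of it. */
    static Map<String, List<String>> match(String[] jumbles, String[] dict) {
        // Index the dictionary once by signature
        Map<String, List<String>> byKey = new HashMap<>();
        for (String word : dict) {
            byKey.computeIfAbsent(key(word), k -> new ArrayList<>()).add(word);
        }
        // Look up each jumbled word; LinkedHashMap keeps the input order
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String jumble : jumbles) {
            result.put(jumble, byKey.getOrDefault(key(jumble), Collections.emptyList()));
        }
        return result;
    }

    public static void main(String[] args) {
        String[] jumbles = {"aannab", "tca"};
        String[] dict = {"banana", "cat", "act", "dog"};
        System.out.println(match(jumbles, dict)); // prints {aannab=[banana], tca=[cat, act]}
    }
}
```

Building the index takes one pass over the dictionary, after which each lookup is a single hash access.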
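You can read each file into an array like this:
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

String[] readFile(String filename) throws IOException {
    List<String> stringList = new ArrayList<>();
    // try-with-resources closes the reader (and the underlying stream) automatically;
    // declaring br inside try {} and closing it in finally {} would not compile
    try (BufferedReader br = new BufferedReader(
            new InputStreamReader(new FileInputStream(filename)))) {
        String line;
        while ((line = br.readLine()) != null) {
            stringList.add(line);
        }
    }
    return stringList.toArray(new String[0]);
}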
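Next, try to do the matching:
String[] jumbles = readFile("jumbles.txt");
String[] dict = readFile("dict.txt");
for (String jumble : jumbles) {
    for (String word : dict) {
        // can only be a match if the same length
        if (jumble.length() == word.length()) {
            // Loop through each letter of jumble and check that it appears in word;
            // removing each matched letter handles repeated letters correctly
            StringBuilder remaining = new StringBuilder(word);
            boolean match = true;
            for (char c : jumble.toCharArray()) {
                int idx = remaining.indexOf(String.valueOf(c));
                if (idx < 0) { match = false; break; }
                remaining.deleteCharAt(idx);
            }
            if (match) {
                System.out.println(jumble + " -> " + word);
            }
        }
    }
}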
I know how to set one array from a string, however I don't know how to do two from two separate text files
I would encourage you to divide your problem into what you don't know and what you know.
Search online for the things you don't know; you will find plenty of ways to do them.
Then search for what you already know, to explore whether it can be done in a better way.
To help you here:
What you don't know:
Reading a file in Java.
Processing the content of the file you read.
What you know:
Converting a String to an array (search whether there are better ways for your use case).
Then combine both :-)

Java: How to extract matching lines from a large text file fast?

Although I'm aware that there are plenty of solutions to this problem in general, I'm still not satisfied with the runtime they require in my particular case.
Consider a 35 GB text file in FASTA format, like this:
>Protein_1 So nice and cute little fella
MTTKKCLQKFHLESLGKLGDSFLKYAISIQLFKSYENHYEGLPSIKKNKIISNAALFKLG
YARKILRFIRNEPFDLKVGLIPSDNSQAYNFGKEFLMPSVKMCSRVK*
>Protein_2 Fancy incredible description of its function
MADDSKFCFFLVSTFLLLAVVVNVTLAANYVPGDDILLNCGGPDNLPDADGRKWGTDIGS
[…] etc.
I need to extract only the header lines, i.e. those starting with >.
Using grep '>' proteins.fasta > protein_descriptions.txt to achieve this takes only a couple of minutes.
But my Java 7 version has already been running for over 90 minutes:
public static void main(String[] args) throws Exception {
    BufferedReader fastaIn = new BufferedReader(new FileReader(args[0]));
    List<String> l = new ArrayList<String>();
    String str;
    while ((str = fastaIn.readLine()) != null) {
        if (str.startsWith(">")) {
            l.add(str); // List has no append(); add() appends to the end
        }
    }
    fastaIn.close();
    // …
}
Does anyone have an idea of how to speed this up to grep performance?
Your help will be much appreciated.
Cheers!
If you write each matching line to the output file immediately instead of accumulating objects in memory, it will improve performance (and it is closer to what you did with grep anyway).
...
BufferedWriter fastaOut = new BufferedWriter(new FileWriter(args[1]));
...
while ((str = fastaIn.readLine()) != null) {
    if (str.startsWith(">")) {
        fastaOut.write(str);
        fastaOut.newLine();
    }
}
...
fastaOut.close();
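Assembled into a complete program, the answer's approach might look like the sketch below. The class name FastaHeaders and the 1 MiB buffer sizes are assumptions, not tuned values. try-with-resources closes both files even if an exception occurs, and newLine() writes the platform's line separator.

```java
import java.io.*;

class FastaHeaders {
    /** Copies every line starting with '>' from in to out and returns how many were copied. */
    static long copyHeaders(BufferedReader in, BufferedWriter out) throws IOException {
        long count = 0;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.startsWith(">")) {
                out.write(line); // written immediately, nothing accumulates in memory
                out.newLine();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // args[0] = input FASTA file, args[1] = output file
        // 1 MiB buffers are an arbitrary choice, not a measured optimum
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]), 1 << 20);
             BufferedWriter out = new BufferedWriter(new FileWriter(args[1]), 1 << 20)) {
            long n = copyHeaders(in, out);
            System.err.println(n + " header lines written");
        }
    }
}
```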
BioJava (biojava.org) provides a FASTA reader.
For reading huge files you could consider using a SeekableByteChannel together with ByteBuffers.
The BioJava library uses ByteBuffers.
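As a sketch of the byte-level idea (not BioJava's API), the following reads raw bytes from a FileChannel, which implements SeekableByteChannel, through a ByteBuffer and copies only the lines whose first byte is '>'. It reads sequentially, so no seeking is used. It avoids decoding characters and creating a String per line; whether that is actually faster than the BufferedReader version on your data is something to measure. The class name and buffer size are illustrative, and an ASCII-compatible file (which FASTA is) is assumed.

```java
import java.io.*;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.nio.file.*;

class FastaHeaderScan {
    /**
     * Copies every line whose first byte is '>' from in to out, working on raw bytes.
     * Lines that span two buffer fills are handled because the only state carried
     * from byte to byte is atLineStart / inHeader.
     */
    static void copyHeaders(ReadableByteChannel in, OutputStream out) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(1 << 20); // 1 MiB, an arbitrary size
        boolean atLineStart = true;
        boolean inHeader = false;
        while (in.read(buf) != -1) {
            buf.flip(); // switch from filling the buffer to draining it
            while (buf.hasRemaining()) {
                byte b = buf.get();
                if (atLineStart) {
                    inHeader = (b == '>');
                }
                if (inHeader) {
                    out.write(b); // includes the terminating '\n'
                }
                atLineStart = (b == '\n');
            }
            buf.clear(); // ready for the next read
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] = input FASTA file, args[1] = output file
        try (FileChannel in = FileChannel.open(Paths.get(args[0]), StandardOpenOption.READ);
             OutputStream out = new BufferedOutputStream(new FileOutputStream(args[1]), 1 << 20)) {
            copyHeaders(in, out);
        }
    }
}
```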
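You could probably speed this up considerably using multiple threads. If the file is X bytes long and you have n threads, start each thread at a multiple of X/n and have it read X/n bytes. Since these offsets will usually fall in the middle of a line, each thread except the first should skip ahead to the next line break before it starts matching, and each thread should finish the line that straddles the end of its chunk. You will want to synchronize your ArrayList (or give each thread its own list and merge them afterwards) to ensure your results are added correctly.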

Get csv and compare lines. ArrayList? Java

I don't use Java very often, and now I've run into a problem.
I want to read a CSV file like this one:
A,B,C,D
A,B,F,K
E,F,S,A
A,B,C,S
A,C,C,S
Java arrays aren't dynamic, so I chose an ArrayList. This works so far. The problem is:
How can I store each row's list? I think another ArrayList would help.
This is what I got:
BufferedReader reader = new BufferedReader(
        new InputStreamReader(this.getClass().getResourceAsStream(
                "../data/" + filename + ".csv")));
List rows = new ArrayList();
String line;
while ((line = reader.readLine()) != null) {
    rows.add(Arrays.asList(line.split(",")));
}
Now I get an ArrayList with a size of 5 for rows.size().
How do I get row[0][0] for example?
What do I want to do? I want to find rows that are identical except for the last column.
For example, I want to find row 0 and row 3 (A,B,C,D and A,B,C,S).
Thank you very much!
Thank you all! You helped me a lot. =) Maybe Java and I will become friends =) THANKS!
You don't need to know the row size in advance; String.split() returns a String array:
List<String[]> rows = new ArrayList<>();
String line = null;
while ((line = reader.readLine()) != null)
    rows.add(line.split(",", -1)); // -1 keeps trailing empty fields
To access a specific row or value:
int len = rows.get(0).length;   // number of columns in row 0
String val = rows.get(0)[0];    // the equivalent of row[0][0]
Also, are you always comparing by the entire row except the last column? You could just strip off the last value (line.replaceFirst(",[^,]*$", ""), which removes the last comma and everything after it) and compare the rows as strings (you have to be careful with whitespace and other formatting, of course).
A slightly different way:
Set<String> rows = new HashSet<>();
String line = null;
while ((line = reader.readLine()) != null) {
    // add() returns false if the key (everything before the last comma) was already present
    if (!rows.add(line.substring(0, line.lastIndexOf(','))))
        System.out.println("duplicate found: " + line);
}
Of course, modify as necessary if you actually need to capture the matching lines.
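If you do need to capture the matching lines, one possible sketch groups row indices by everything before the last comma. The name findMatches is hypothetical; it uses a LinkedHashMap so groups appear in file order, and it keeps only groups with at least two rows.

```java
import java.util.*;

class DuplicateRows {
    /** Groups row indices by everything before the last comma; returns only groups with 2+ rows. */
    static Map<String, List<Integer>> findMatches(List<String> lines) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            int lastComma = line.lastIndexOf(',');
            // A line without commas has no last column to drop, so the whole line is the key
            String key = lastComma < 0 ? line : line.substring(0, lastComma);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(i);
        }
        // Drop keys that only occurred once
        groups.values().removeIf(rows -> rows.size() < 2);
        return groups;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("A,B,C,D", "A,B,F,K", "E,F,S,A", "A,B,C,S", "A,C,C,S");
        System.out.println(findMatches(lines)); // prints {A,B,C=[0, 3]}
    }
}
```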
You'll need to declare an ArrayList of arrays. Assuming the CSV file has a known number of columns, the only dynamic list needed here is the list of "rows" of your "table", formed by an ArrayList (rows) of char[] arrays (columns); char[] works here because every field in your example is a single character. (If not, then an ArrayList of ArrayLists is fine.)
It's just like a 2D table in any other language: an array of arrays. The only difference is that in this case one of the arrays needs to be dynamic.
To read the file you'll need two loops: one that reads each line, just as you're doing, and another one that reads it char by char.
Just a quick note: if you declare an array once like this:
char[] row = new char[5];
and then reuse it, adding each row to the ArrayList like this:
yourList.add(row);
you will have a list full of references to the same array. You'll need to use the .clone() method like this:
yourList.add(row.clone());
To access it like table[1][2], you'll need to use arraylist.get(1)[2] (or arraylist.get(1).get(2) if you use an ArrayList of ArrayLists).
