I've built a document classifier by following the MALLET example here: http://mallet.cs.umass.edu/classifier-devel.php
What I'd like to do next is get the most influential features for each class. I'm sure this is something simple, but I haven't been able to find out how to do it from Java.
Any help is appreciated.
I was running into the same problem. Here's what worked for me. (It's not completely self-contained: it assumes you already have a trained classifier and some test data.)
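import java.io.File;
import java.io.FileReader;
import java.io.PrintWriter;
import cc.mallet.pipe.iterator.CsvIterator;
import cc.mallet.types.Alphabet;
import cc.mallet.types.InstanceList;
import cc.mallet.types.LabelAlphabet;
import cc.mallet.types.PerLabelInfoGain;

// Assumes `classifier` is an already-trained MALLET Classifier
PrintWriter debugOut = new PrintWriter(new File(<filePath>));
// Pipe the test data through the classifier's own pipe so feature indices match its alphabet
InstanceList testInstances = new InstanceList(classifier.getInstancePipe());
CsvIterator reader = new CsvIterator(new FileReader(<path_to_testdata>),
        "(\\w+)\\s+(\\w+)\\s+(.*)", 3, 2, 1); // (data, label, name) field indices
testInstances.addThruPipe(reader);
// Information gain of every feature, computed separately for each label
PerLabelInfoGain plig = new PerLabelInfoGain(testInstances);
Alphabet alpha = classifier.getAlphabet();
LabelAlphabet la = classifier.getLabelAlphabet();
debugOut.println("debugging label numbers: " + la.size());
for (int q = 0; q < la.size(); q++) {
    debugOut.println("Class: " + la.lookupLabel(q));
    // Print the 10 highest-ranked features for label q
    for (int j = 0; j < 10; j++) {
        int alphaId = plig.getInfoGain(q).getIndexAtRank(j);
        Object feature = alpha.lookupObject(alphaId);
        debugOut.println(j + "\t" + plig.getInfoGain(q).getValueAtRank(j) + "\t" + feature);
    }
    debugOut.println("===============");
}
debugOut.close();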
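Resulting in the output below. (It appears to have been produced with the getValueAtRank(i) typo mentioned in the EDIT, which is why every rank within a class shows the same value; the feature ranking itself is unaffected.)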
debugging label numbers: 3
Class: sexism
0 0.1257616291393775 sexist
1 0.1257616291393775 rt
2 0.1257616291393775 female
3 0.1257616291393775 notsexist
4 0.1257616291393775 m
5 0.1257616291393775 women
6 0.1257616291393775 mt8_9
7 0.1257616291393775 sports
8 0.1257616291393775 islam
9 0.1257616291393775 men
===============
Class: none
0 0.09383300761779656 sexist
1 0.09383300761779656 mkr
2 0.09383300761779656 female
3 0.09383300761779656 muslims
4 0.09383300761779656 rt
5 0.09383300761779656 notsexist
6 0.09383300761779656 women
7 0.09383300761779656 islam
8 0.09383300761779656 mt8_9
9 0.09383300761779656 mohammed
===============
Class: racism
0 0.062072998255453926 islam
1 0.062072998255453926 muslims
2 0.062072998255453926 mkr
3 0.062072998255453926 mohammed
4 0.062072998255453926 muslim
5 0.062072998255453926 maxblumenthal
6 0.062072998255453926 quran
7 0.062072998255453926 years
8 0.062072998255453926 prophet
9 0.062072998255453926 1400
===============
EDIT: plig.getInfoGain(q).getValueAtRank(i) should obviously be plig.getInfoGain(q).getValueAtRank(j)
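To make the idea behind per-label information gain concrete, here is a self-contained sketch in plain Java (no MALLET) that scores each word by one-vs-rest information gain: for a given class, the label is treated as binary (this class vs. all others), and a word's score is how much knowing whether the word is present reduces the entropy of that binary label. This illustrates the technique under that assumption; it is not MALLET's implementation, and MALLET's PerLabelInfoGain may differ in its details. The toy corpus and the names PerClassInfoGainSketch, infoGain and topFeatures are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class PerClassInfoGainSketch {

    // Entropy (in bits) of a binary variable that is true with probability p.
    static double entropy(double p) {
        if (p <= 0 || p >= 1) return 0.0;
        return -(p * Math.log(p) + (1 - p) * Math.log(1 - p)) / Math.log(2);
    }

    // Information gain of "document contains word" about "label == target" (one-vs-rest).
    static double infoGain(List<Set<String>> docs, List<String> labels, String target, String word) {
        int n = docs.size(), pos = 0, withWord = 0, posWithWord = 0;
        for (int i = 0; i < n; i++) {
            boolean isPos = labels.get(i).equals(target);
            boolean has = docs.get(i).contains(word);
            if (isPos) pos++;
            if (has) withWord++;
            if (isPos && has) posWithWord++;
        }
        int without = n - withWord, posWithout = pos - posWithWord;
        double hBefore = entropy((double) pos / n);
        double hWith = withWord == 0 ? 0 : entropy((double) posWithWord / withWord);
        double hWithout = without == 0 ? 0 : entropy((double) posWithout / without);
        // Expected entropy after splitting the corpus on the word's presence
        double hAfter = ((double) withWord / n) * hWith + ((double) without / n) * hWithout;
        return hBefore - hAfter;
    }

    // The k words with the highest one-vs-rest gain for `target`; ties keep alphabetical order.
    static List<String> topFeatures(List<Set<String>> docs, List<String> labels, String target, int k) {
        Set<String> vocab = new TreeSet<>();
        for (Set<String> d : docs) vocab.addAll(d);
        Map<String, Double> gain = new HashMap<>();
        for (String w : vocab) gain.put(w, infoGain(docs, labels, target, w));
        List<String> words = new ArrayList<>(vocab); // alphabetical order
        words.sort(Comparator.comparingDouble((String w) -> gain.get(w)).reversed()); // stable sort
        return words.subList(0, Math.min(k, words.size()));
    }

    public static void main(String[] args) {
        // Hypothetical toy corpus: each document is a set of words plus a class label
        List<Set<String>> docs = List.of(
                Set.of("goal", "match", "team"), Set.of("goal", "team", "score"),
                Set.of("vote", "election", "party"), Set.of("vote", "party", "debate"),
                Set.of("code", "bug", "team"), Set.of("code", "release", "bug"));
        List<String> labels = List.of("sports", "sports", "politics", "politics", "tech", "tech");
        for (String c : new TreeSet<>(labels)) {
            System.out.println(c + ": " + topFeatures(docs, labels, c, 2));
        }
    }
}
```

Running main prints the two top-ranked words per class (politics: party, vote; sports: goal, team; tech: bug, code). Words that occur only in one class get the maximum gain, while a word like "team", which is spread across two classes, ranks lower.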
We have a database with 15,000,000 entries, and executing a SELECT like
SELECT technical_id, attribute_a, attribute_b, attribute_c from test WHERE ( attribute_a_fltr #> (?::character varying[]) OR attribute_a_fltr = '{n/a}') AND ( attribute_b_fltr #> (?::character varying[]) OR attribute_b_fltr = '{n/a}') AND ( attribute_c = ? OR attribute_c = 0)
slows down dramatically after 9 iterations. Here are the results (time is in milliseconds, measured from just before the first iteration):
Iteration: 0 Entries: 593 time :6931
Iteration: 1 Entries: 593 time :7879
Iteration: 2 Entries: 593 time :8721
Iteration: 3 Entries: 593 time :9490
Iteration: 4 Entries: 593 time :10240
Iteration: 5 Entries: 593 time :11016
Iteration: 6 Entries: 593 time :11736
Iteration: 7 Entries: 593 time :12461
Iteration: 8 Entries: 593 time :13168
Iteration: 9 Entries: 593 time :152329
Iteration: 10 Entries: 593 time :290717
Iteration: 11 Entries: 593 time :435933
Iteration: 12 Entries: 593 time :567401
Iteration: 13 Entries: 593 time :695307
Iteration: 14 Entries: 593 time :835853
Here is the Java code:
Connection connection = DriverManager.getConnection("jdbc:postgresql://localhost:5432/test-db", props);
PreparedStatement prepStatement = connection.prepareStatement(FILTER_7);
prepStatement.setString(1,"{AAAAAAIAAAAAIAAAAAAAAgAAAAAgAAAAAAACAAAAACAAAAAAAAIAAgAAIAAgAAAAAgACAAAgACAAAAAAAAIA}");
prepStatement.setString(2,"{gAAAAAAQAAAAAAIAAABAAAAAAAgAAAAAAQAAACAAAAAABAAAAIAAAAAAEAAAAAACAAAAQAAAAgAIAEAAAAAA}");
prepStatement.setInt(3, 1979);
long t0 = System.currentTimeMillis(); // set once, so the printed times are cumulative
long iter = 0;
while (true) {
    ResultSet resultSet = prepStatement.executeQuery();
    long count = 0;
    while (resultSet.next()) {
        ++count;
    }
    System.out.println("Iteration: " + iter + " Entries: " + count + " time :" + (System.currentTimeMillis() - t0));
    ++iter;
}
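Because t0 is set once before the loop, each logged time is cumulative. A small sketch that turns the logged readings into per-iteration durations (IterationDeltas and perIteration are illustrative names) makes the jump easier to see: iteration 0 takes about 6.9 s, iterations 1 to 8 take roughly 0.7 to 0.95 s each, and from iteration 9 onward each execution takes about 128 to 145 s.

```java
public class IterationDeltas {
    // Converts cumulative "time since t0" readings into per-iteration durations.
    static long[] perIteration(long[] cumulative) {
        long[] deltas = new long[cumulative.length];
        long previous = 0; // the first reading is measured from t0
        for (int i = 0; i < cumulative.length; i++) {
            deltas[i] = cumulative[i] - previous;
            previous = cumulative[i];
        }
        return deltas;
    }

    public static void main(String[] args) {
        // Cumulative times (ms) copied from the log in the question
        long[] cumulative = {6931, 7879, 8721, 9490, 10240, 11016, 11736, 12461,
                13168, 152329, 290717, 435933, 567401, 695307, 835853};
        long[] deltas = perIteration(cumulative);
        for (int i = 0; i < deltas.length; i++) {
            System.out.println("Iteration " + i + ": " + deltas[i] + " ms");
        }
    }
}
```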
This question already has answers here:
extract data column-wise from text file using Java
(2 answers)
Closed 4 years ago.
I have a text file. These are its contents:
Team P W L D F A Pts
1. Arsenal 38 26 9 3 79 - 36 87
2. Liverpool 38 24 8 6 67 - 30 80
3. Manchester_U 38 24 5 9 87 - 45 77
4. Newcastle 38 21 8 9 74 - 52 71
5. Leeds 38 18 12 8 53 - 37 66
6. Chelsea 38 17 13 8 66 - 38 64
7. West_Ham 38 15 8 15 48 - 57 53
8. Aston_Villa 38 12 14 12 46 - 47 50
9. Tottenham 38 14 8 16 49 - 53 50
How can I extract only the team names? I tried using a regex like this, but it doesn't work:
FileReader f;
f=new FileReader("file.txt");
BufferedReader b;
b=new BufferedReader(f);
s=b.readLine();
String[] name = s.split("\\w+");
for(int i=0;i<name.length;i++)
System.out.println(name[i]);
How do I solve this? Thanks in advance!
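import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

BufferedReader b = new BufferedReader(new FileReader("file.txt"));
// Lookbehind: match the run of non-space characters right after "<digit>. "
Pattern namePattern = Pattern.compile("(?<=\\d\\.\\s)\\S+");
String s;
// Parenthesize the assignment: != binds tighter than =
while ((s = b.readLine()) != null) {
    Matcher name = namePattern.matcher(s);
    if (name.find())
        System.out.println(name.group());
}
b.close();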
Here the regex (?<=\\d\\.\\s)\\S+ matches only the name that follows the serial number.
If you want to read the file line by line and it has the structure you showed, this code gets the club names:
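import java.io.File;
import java.util.Scanner;

File f = new File("file.txt");
Scanner sc = new Scanner(f); // throws FileNotFoundException if the file is missing
sc.nextLine(); // skip the "Team P W L D F A Pts" header
while (sc.hasNextLine()) {
    // "1. Arsenal 38 ..." -> ["1.", "Arsenal", "38", ...]
    String[] name = sc.nextLine().split("\\s+");
    System.out.println(name[1]);
}
sc.close();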
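Try replaceAll: match every character that is not a letter or underscore ([^a-zA-Z_]+) and replace it with the empty string. On a data row such as "1. Arsenal 38 26 ...", what remains is the team name. Note that replaceAll returns a new string, so assign the result back to s:
s = b.readLine();
s = s.replaceAll("[^a-zA-Z_]+", "");
System.out.println(s);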
Your string s is one line:
1. Arsenal 38 26 9 3 79 - 36 87
All you need to do is split on spaces and take the second entry:
s.split(" ")[1]
RegEx is overkill here. Do this for each line and add the name to a list at each step.
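A sketch of that approach that collects the names into a list, assuming the first line is the header and every other non-blank line looks like "1. Arsenal 38 ..." (TeamNames and teamNames are hypothetical names). In a real program the lines could come from Files.readAllLines(Paths.get("file.txt")); here they are hard-coded so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

public class TeamNames {
    // Returns the second whitespace-separated field of every row after the header.
    static List<String> teamNames(List<String> lines) {
        List<String> names = new ArrayList<>();
        for (int i = 1; i < lines.size(); i++) { // i = 1 skips the "Team P W L ..." header
            String line = lines.get(i).trim();
            if (line.isEmpty()) continue;         // ignore blank lines
            String[] fields = line.split("\\s+"); // "1. Arsenal 38 ..." -> ["1.", "Arsenal", "38", ...]
            if (fields.length > 1) names.add(fields[1]);
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "Team P W L D F A Pts",
                "1. Arsenal 38 26 9 3 79 - 36 87",
                "2. Liverpool 38 24 8 6 67 - 30 80",
                "3. Manchester_U 38 24 5 9 87 - 45 77");
        System.out.println(teamNames(lines)); // [Arsenal, Liverpool, Manchester_U]
    }
}
```

Splitting on \\s+ rather than a single space also tolerates rows where columns are separated by several spaces or tabs.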
File being read:
Rob Gronkowski 48
Zach Ertz 34
Travis Kelce 29
Evan Engram 15
Jimmy Graham 12
Cameron Brate 10
Delanie Walker 9
Kyle Rudolph 6
Austin Seferian-Jenkins 6
Jack Doyle 6
Hunter Henry 5
Jason Witten 4
Jordan Reed 4
Vernon Davis 3
Jared Cook 3
Tyler Kroft 3
Ed Dickson 3
Charles Clay 3
George Kittle 3
Antonio Brown 67
DeAndre Hopkins 62
A.J. Green 62
Mike Evans 62
Julio Jones 56
Michael Thomas 55
Dez Bryant 53
Michael Crabtree 45
Brandin Cooks 42
Tyreek Hill 42
Doug Baldwin 42
Keenan Allen 32
Jarvis Landry 29
Will Fuller 29
Amari Cooper 29
Stefon Diggs 29
Alshon Jeffery 27
Nelson Agholor 24
Adam Thielen 24
Chris Hogan 24
Golden Tate 24
Demaryius Thomas 22
Jordy Nelson 22
Larry Fitzgerald 22
DeSean Jackson 21
JuJu Smith-Schuster 19
Devante Parker 18
Devin Funchess 18
Kelvin Benjamin 18
T.Y. Hilton 17
Emmanuel Sanders 17
Marvin Jones 15
Rishard Matthews 14
Pierre Garcon 14
Cooper Kupp 14
Sterling Shepard 14
Paul Richardson 11
Danny Amendola 10
Le’Veon Bell 70
Kareem Hunt 63
Todd Gurley 63
Leonard Fournette 60
Melvin Gordon 60
LeSean McCoy 60
Mark Ingram 50
Devonta Freeman 50
Jordan Howard 50
Lamar Miller 41
Doug Martin 34
Carlos Hyde 34
Aaron Jones 27
Alvin Kamara 27
Jerick McKinnon 24
DeMarco Murray 21
Chris Thompson 21
Jay Ajayi 21
Joe Mixon 18
C.J. Anderson 17
Tevin Coleman 17
Christian McCaffrey 17
Derrick Henry 16
Alex Collins 16
Dion Lewis 15
Adrian Peterson 13
Duke Johnson 12
Marshawn Lynch 11
Ameer Abdullah 10
Bilal Powell 9
LeGarrette Blount 9
Marlon Mack 9
James White 8
Ezekiel Elliott 7
Latavius Murray 7
Frank Gore 7
Isaiah Crowell 7
Orleans Darkwa 7
Kenyan Drake 5
Matt Forte 5
Darren McFadden 5
Alfred Morris 5
Damien Williams 3
Tarik Cohen 3
Jonathan Stewart 3
Robert Kelley 3
Danny Woodhead 3
Ty Montgomery 2
Javorius Allen 2
Mike Gillislee 2
Thomas Rawls 2
Theo Riddick 2
DeAndre Washington 2
Eddie Lacy 2
Giovani Bernard 2
Andre Ellington 2
Austin Ekeler 2
Jalen Richard 2
Ted Ginn 10
Robby Anderson 10
Jermaine Kearse 9
Davante Adams 9
Kenny Stills 9
Sammy Watkins 9
Marqise Lee 5
Mohamed Sanu 5
Allen Hurns 5
Josh Doctson 5
Jamison Crowder 4
Jeremy Maclin 3
Randall Cobb 3
Tyrell Williams 3
Robert Woods 3
Corey Davis 3
Jordan Matthews 3
Tyler Lockett 3
John Brown 2
Willie Snead 2
Donte Moncrief 2
Deshaun Watson 31
Dak Prescott 26
Tom Brady 24
Russell Wilson 22
Drew Brees 22
Carson Wentz 20
Alex Smith 14
Kirk Cousins 13
Matthew Stafford 11
Marcus Mariota 11
Tyrod Taylor 11
Cam Newton 11
Matt Ryan 11
Philip Rivers 8
I'm having some problems and have been looking all over for answers. I found out that my for loop iteration is incorrect: it prints the series 0, 1, 2, 10, etc. Could someone point out my flaw so I can fix it? I appreciate anyone reading this, and apologize for the length of the code; I just wanted to include everything so I don't miss anything. The for loop is at line 87 of my file. Thanks again. Sincerely, a Java noob
Code:
package trades;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.Scanner;
import java.util.regex.*;
public class Fantasy {
public static void main(String[] args) {
int[] playerRanking = new int[75];
String infoComingIn = null;
//Finding file path
String filename = "C:\\Users\\Karanvir\\Desktop\\21days\\players.txt";
File filez = new File(filename);
BufferedReader br;
String[] playerNames = new String[75];
int counterOfReadLines = 0;
Pattern p = Pattern.compile("[0-9]{2,3}");
ArrayList<Integer> arrayList = new ArrayList<Integer>();
try {
br = new BufferedReader(new FileReader(filez));
playerNames[counterOfReadLines] = br.readLine();
while (br.readLine() != null) {
counterOfReadLines = counterOfReadLines + 1;
playerNames[counterOfReadLines] = br.readLine();
System.out.println(playerNames[counterOfReadLines - 1]);
}
br.close();
for (int i = 0; i < playerNames.length; i++) {
Matcher m = p.matcher(playerNames[i]);
if (m.find()) {
String matched = m.group(0);
int addToArray = Integer.parseInt(matched);
playerRanking[i] = addToArray;
System.out.println(i);
}
}
} catch (Exception e) {}
}
}
Okay, from the post I can point out one issue. You are incrementing the counterOfReadLines variable before the line
playerNames[counterOfReadLines] = br.readLine();
so inside the while loop, playerNames is filled starting at index 1 rather than 0, and when you run the loop below:
for (int i = 0; i < playerNames.length; i++) {
Matcher m = p.matcher(playerNames[i]);
if (m.find()) {
String matched = m.group(0);
int addToArray = Integer.parseInt(matched);
playerRanking[i] = addToArray;
System.out.println(i);
}
}
i starts from 0. So either start it from i = 1, or increment counterOfReadLines after the line
playerNames[counterOfReadLines] = br.readLine();
and the error should go away. If not, let me know. :)
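Two more things are worth checking in the question's code. The readLine() call in the while condition also consumes a line, so the loop stores only every other line; and [0-9]{2,3} requires at least two digits, so single-digit values such as "Delanie Walker 9" never match. A sketch of a reading loop that calls readLine() exactly once per line, stores the values in a growable list instead of a fixed 75-element array, and uses \\d+ anchored at the end of the line (a substitution for [0-9]{2,3}) might look like this; PlayerReader and readScores are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlayerReader {
    // Trailing integer at the end of a line, e.g. "Rob Gronkowski 48" -> 48.
    // \d+ also accepts single-digit values such as "Delanie Walker 9".
    private static final Pattern SCORE = Pattern.compile("(\\d+)\\s*$");

    static List<Integer> readScores(BufferedReader br) throws IOException {
        List<Integer> scores = new ArrayList<>();
        String line;
        // readLine() is called exactly once per iteration, so no line is skipped
        while ((line = br.readLine()) != null) {
            Matcher m = SCORE.matcher(line);
            if (m.find()) {
                scores.add(Integer.parseInt(m.group(1)));
            }
        }
        return scores;
    }

    public static void main(String[] args) throws IOException {
        // With a real file: new BufferedReader(new FileReader(filename))
        String sample = "Rob Gronkowski 48\nZach Ertz 34\nDelanie Walker 9\n";
        try (BufferedReader br = new BufferedReader(new StringReader(sample))) {
            System.out.println(readScores(br)); // [48, 34, 9]
        }
    }
}
```

Using try-with-resources also closes the reader if an exception is thrown, and letting exceptions propagate (instead of an empty catch block) makes failures like an out-of-bounds index visible.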
As the title says, I'd like to store each line (each System.out.print call) in an array/ArrayList.
So far I have tried ByteArrayOutputStream, but it appends everything into one buffer. I'd be glad to post snippets of the code if necessary. Sorry for the noob question.
Edit, here is the code:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
PrintStream ps = new PrintStream(baos);
PrintStream old = System.out;
String[] str= new String[10];
System.setOut(ps);
for(int x=0;x<str.length;x++){
ps.println("Test: "+x);
str[x] = baos.toString();
}
System.out.flush();
System.setOut(old);
for(int x=0;x<str.length;x++){
System.out.println(str[x]);
}
Output:
Test: 0
Test: 0
Test: 1
Test: 0
Test: 1
Test: 2
Test: 0
Test: 1
Test: 2
Test: 3
Test: 0
Test: 1
Test: 2
Test: 3
Test: 4
Test: 0
Test: 1
Test: 2
Test: 3
Test: 4
Test: 5
Test: 0
Test: 1
Test: 2
Test: 3
Test: 4
Test: 5
Test: 6
Test: 0
Test: 1
Test: 2
Test: 3
Test: 4
Test: 5
Test: 6
Test: 7
Test: 0
Test: 1
Test: 2
Test: 3
Test: 4
Test: 5
Test: 6
Test: 7
Test: 8
Test: 0
Test: 1
Test: 2
Test: 3
Test: 4
Test: 5
Test: 6
Test: 7
Test: 8
Test: 9
What I would like to have in str array is something like this:
str[0] = "Test: 0"
str[1] = "Test: 1"
str[2] = "Test: 2"
str[3] = "Test: 3"
str[4] = "Test: 4"
str[5] = "Test: 5"
str[6] = "Test: 6"
str[7] = "Test: 7"
str[8] = "Test: 8"
str[9] = "Test: 9"
I also looked for a way to clear the contents of the ByteArrayOutputStream, but had no luck.
Based on the code you posted, I made a small modification; see if this is what you are looking for:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
PrintStream ps = new PrintStream(baos);
PrintStream old = System.out;
String[] str = new String[10];
System.setOut(ps);
for (int x = 0; x < str.length; x++) {
    ps.println("Test: " + x);
    // Copy only what this iteration printed, without the trailing line separator
    str[x] = baos.toString().trim();
    // Discard the buffered bytes so the next iteration starts from an empty buffer
    baos.reset();
}
System.out.flush();
System.setOut(old);
for (int x = 0; x < str.length; x++) {
    System.out.println(str[x]);
}
The trick is that every time str[x] = baos.toString() executes, everything printed so far is still in the buffer, so you need to call reset() to discard the accumulated data. For more details about reset(), see the official ByteArrayOutputStream documentation.
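If the goal is simply one array entry per printed line, another option is to capture everything first and split it once at the end. The sketch below (CaptureLines and captureLines are hypothetical names) redirects System.out while a piece of code runs, always restores it in a finally block, and splits the captured text on \\R, which matches any line break (Java 8+).

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.Arrays;
import java.util.List;

public class CaptureLines {
    // Runs `body` with System.out redirected, then splits the captured text into lines.
    static List<String> captureLines(Runnable body) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        PrintStream old = System.out;
        System.setOut(new PrintStream(baos, true));
        try {
            body.run();
        } finally {
            System.out.flush();
            System.setOut(old); // always restore the real console
        }
        // \R matches \n, \r\n and other line breaks; trailing empty strings are dropped by split
        return Arrays.asList(baos.toString().split("\\R"));
    }

    public static void main(String[] args) {
        List<String> lines = captureLines(() -> {
            for (int x = 0; x < 10; x++) {
                System.out.println("Test: " + x);
            }
        });
        String[] str = lines.toArray(new String[0]); // str[0] = "Test: 0", ..., str[9] = "Test: 9"
        for (String s : str) {
            System.out.println(s);
        }
    }
}
```

Unlike the reset() approach, this doesn't need a hook between print calls, so it also works when the printing happens inside code you don't control.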
I am trying to compare File 1 and File 2.
File 1:
7.3 0.28 0.36 12.7 0.04 38 140 0.998 3.3 0.79 9.6 6 1
7.4 0.33 0.26 15.6 0.049 67 210 0.99907 3.06 0.68 9.5 5 1
7.3 0.25 0.39 6.4 0.034 8 84 0.9942 3.18 0.46 11.5 5 1
6.9 0.38 0.25 9.8 0.04 28 191 0.9971 3.28 0.61 9.2 5 1
5.1 0.11 0.32 1.6 0.028 12 90 0.99008 3.57 0.52 12.2 6 1
File 2:
5.1 0.11 0.32 1.6 0.028 12 90 0.99008 3.57 0.52 12.2 6 -1
7.3 0.25 0.39 6.4 0.034 8 84 0.9942 3.18 0.46 11.5 5 1
6.9 0.38 0.25 9.8 0.04 28 191 0.9971 3.28 0.61 9.2 5 -1
7.4 0.33 0.26 15.6 0.049 67 210 0.99907 3.06 0.68 9.5 5 -1
7.3 0.28 0.36 12.7 0.04 38 140 0.998 3.3 0.79 9.6 6 1
In both files, the last element of each line is the class label.
I want to check whether the class labels are equal,
i.e. compare the class label of
line1:7.3 0.28 0.36 12.7 0.04 38 140 0.998 3.3 0.79 9.6 6 1
with
line2:7.3 0.28 0.36 12.7 0.04 38 140 0.998 3.3 0.79 9.6 6 1
Matches.
compare
line1:7.4 0.33 0.26 15.6 0.049 67 210 0.99907 3.06 0.68 9.5 5 1
with
line2:7.4 0.33 0.26 15.6 0.049 67 210 0.99907 3.06 0.68 9.5 5 -1
Does not match.
Update: here is what I did:
String line1;
String line2;
int notequalcnt = 0;
while((line1 = bfpart.readLine())!=null){
found = false;
while((line2 = bfin.readLine())!=null){
if(line1.equals(line2)){
found = true;
break;
}
else{
System.out.println("not equal");
notequalcnt++;
}
}
}
But every line is reported as not equal.
Am I doing something wrong?
After the first iteration of the outer loop, file 2 has already been read to the end, so readLine() returns null and the inner loop never runs again. Open the file 2 reader inside the outer while loop instead. Use this code:
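import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class CompareFile {
    public static void main(String args[]) throws IOException {
        String line1;
        String line2;
        boolean found;
        int notequalcnt = 0;
        BufferedReader bfpart = new BufferedReader(new FileReader("file1.txt"));
        while ((line1 = bfpart.readLine()) != null) {
            found = false;
            // Re-open file 2 for every line of file 1 so the inner loop starts from its first line
            BufferedReader bfin = new BufferedReader(new FileReader("file2.txt"));
            while ((line2 = bfin.readLine()) != null) {
                System.out.println("line1: " + line1);
                System.out.println("line2: " + line2);
                if (line1.equals(line2)) {
                    System.out.println("equal");
                    found = true;
                    break;
                } else {
                    System.out.println("not equal");
                }
            }
            bfin.close();
            // Count file-1 lines that have no identical line anywhere in file 2
            if (!found)
                notequalcnt++;
        }
        bfpart.close();
        System.out.println("Lines of file 1 not found in file 2: " + notequalcnt);
    }
}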
You're comparing every line from file 1 with every line from file 2, and you are printing "not equal" every time any one of them doesn't match.
If file 2 has 6 lines, and you are looking for a given line from file 1 (say it's also in file 2), then 5 of the lines from file 2 won't match, and "not equal" will be output 5 times.
Your current implementation says "if any lines in file 2 don't match, it's not a match", but what you really mean is "if any lines in file 2 do match, it is a match". So your logic (pseudocode) should be more like this:
for each line in file 1 {
found = false
reset file 2 to beginning
for each line in file 2
if line 1 equals line 2
found = true, break.
if found
"found!"
else
"not found!"
}
Also you describe this as comparing "nth line of file 1 with nth line of file 2", but that's not actually what your implementation does. Your implementation is actually comparing the first line of file 1 with every line of file 2 then stopping, because you've already consumed every line of file 2 in that inner loop.
Your code has a lot of problems, and you probably need to sit back and work out your logic on paper first.
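Judging from the examples in the question, lines are paired by their feature values (everything before the last field), not by position, and only the last field is compared. A sketch under that assumption, as an alternative to the nested loops: index file 2 by its feature columns in a HashMap, then look up each file 1 line and compare the labels. Features are compared as exact strings, so "0.04" and "0.040" would count as different; LabelCompare, splitLabel and labelMismatches are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LabelCompare {
    // Splits "f1 f2 ... fn label" into {"f1 f2 ... fn", "label"}.
    static String[] splitLabel(String line) {
        String[] fields = line.trim().split("\\s+");
        String features = String.join(" ", Arrays.copyOf(fields, fields.length - 1));
        return new String[] {features, fields[fields.length - 1]};
    }

    // File-1 lines whose features are missing from file 2 or carry a different label there.
    static List<String> labelMismatches(List<String> file1, List<String> file2) {
        Map<String, String> labelByFeatures = new HashMap<>();
        for (String line : file2) {
            if (line.trim().isEmpty()) continue;
            String[] parts = splitLabel(line);
            labelByFeatures.put(parts[0], parts[1]);
        }
        List<String> mismatches = new ArrayList<>();
        for (String line : file1) {
            if (line.trim().isEmpty()) continue;
            String[] parts = splitLabel(line);
            // get() returns null when the features are absent, which also counts as a mismatch
            if (!parts[1].equals(labelByFeatures.get(parts[0]))) {
                mismatches.add(line);
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // In a real program these could come from Files.readAllLines(Paths.get("file1.txt")) etc.
        List<String> file1 = List.of(
                "7.3 0.28 0.36 12.7 0.04 38 140 0.998 3.3 0.79 9.6 6 1",
                "7.4 0.33 0.26 15.6 0.049 67 210 0.99907 3.06 0.68 9.5 5 1");
        List<String> file2 = List.of(
                "7.4 0.33 0.26 15.6 0.049 67 210 0.99907 3.06 0.68 9.5 5 -1",
                "7.3 0.28 0.36 12.7 0.04 38 140 0.998 3.3 0.79 9.6 6 1");
        for (String line : labelMismatches(file1, file2)) {
            System.out.println("label differs: " + line);
        }
    }
}
```

Each file is read once, so the cost grows linearly with the number of lines instead of re-reading file 2 for every line of file 1.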
If the goal is to find the matching lines, read each file's contents into an ArrayList and compare the values.
import java.io.File;
import java.util.ArrayList;
import java.util.Scanner;

// Read each file line by line (hasNextLine/nextLine; hasNext/next would split on spaces)
Scanner s = new Scanner(new File("file1.txt"));
ArrayList<String> file1_list = new ArrayList<String>();
while (s.hasNextLine()) {
    file1_list.add(s.nextLine());
}
s.close();
s = new Scanner(new File("file2.txt"));
ArrayList<String> file2_list = new ArrayList<String>();
while (s.hasNextLine()) {
    file2_list.add(s.nextLine());
}
s.close();
for (String line1 : file1_list) {
    if (file2_list.contains(line1)) {
        // found the line
    } else {
        // line not found
    }
}
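Check Apache Commons IO FileUtils for comparing files, e.g. FileUtils.contentEquals(file1, file2), which checks whether two files have identical contents: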
http://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/FileUtils.html