How to create pairs from two stacks, in a clever way - java

I have two stacks: Stack<String> file and Stack<String[]> author. They have a one-to-one relationship, i.e.
file author
file1 author3, author2 // file1 is written by author3 and author2
file2 author1, author2 // file2 is written by author1 and author2
I have tried to create a new data structure (I thought a Map would be best) to contain all the information in pairs. For example:
new data structure
author1, file2
author2, file1, file2
author3, file1
To create these pairs, I used HashMap<String, Set<String>> allInfo and implemented the concatenation as:
int len = author.size();
for (int i = 0; i < len; i++) {
    String[] temp = author.pop();
    int len2 = temp.length;
    for (int j = 0; j < len2; j++) {
        if (allInfo.containsKey(temp[j]) == false) {
            Set<String> list = new HashSet<String>();
            allInfo.put(temp[j], list);
        }
        Set<String> temp2 = allInfo.get(temp[j]);
        temp2.add(file.pop());
    }
}
However, this implementation seems ugly. How can I create these pairs more cleverly? (Relying on built-in Java methods is preferred.)

The code below is only a little better. There are (non-JDK) libraries around that provide a data structure called a multimap, which is more convenient. But you are stuck with the two stacks and the inverse ordering of associations, so you'll need a little coding effort.
while ( ! author.empty() ) {
    String f = file.pop(); // Note that in your code this is in the wrong place
    for ( String aut : author.pop() ) {
        Set<String> files = allInfo.get( aut );
        if ( files == null ) {
            files = new HashSet<>();
            allInfo.put( aut, files );
        }
        files.add( f );
    }
}
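If you are on Java 8 or newer, Map.computeIfAbsent collapses the get/null-check/put sequence into a single call; the loop above then shrinks to:

while ( ! author.empty() ) {
    String f = file.pop();
    for ( String aut : author.pop() ) {
        // Creates the set on first sight of this author, then adds the file.
        allInfo.computeIfAbsent( aut, k -> new HashSet<>() ).add( f );
    }
}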

How about having a custom type for your problem?
public class AuthorPublication {
    private String authorName;
    private Set<String> files;
    // setters and getters
}
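If you go that route, populating it from the allInfo map built above might look like this (my sketch; the setters are the ones implied by the "setters and getters" comment):

List<AuthorPublication> publications = new ArrayList<>();
for (Map.Entry<String, Set<String>> e : allInfo.entrySet()) {
    AuthorPublication ap = new AuthorPublication();
    ap.setAuthorName(e.getKey()); // hypothetical setter names
    ap.setFiles(e.getValue());
    publications.add(ap);
}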

Related

Java Reflection with arrays based upon inner class definition

Using Java Reflection:
How do I generically access arrays of other objects to retrieve their values?
Given this Java structure:
class Master
{
    static class innerThing
    {
        static StringBuilder NumOfThings = new StringBuilder( 2);
        static class Thing_def
        {
            static StringBuilder field1 = new StringBuilder( 3);
            static StringBuilder field2 = new StringBuilder( 3);
            static StringBuilder field3 = new StringBuilder(13);
        }
        static Thing_def[] Things = new Thing_def[2];
        static { for (int i = 0; i < Things.length; i++) Things[i] = new Thing_def(); }
    }
}
Using Reflection in this bit of code:
Field[] FieldList = DataClass.getDeclaredFields();
if (0 < FieldList.length)
{
    SortFieldList( FieldList );
    System.out.println();
    for (Field eachField : FieldList)
    {
        String fldType = eachField.getType().toString();
        if ( fldType.startsWith("class [L") )
            System.err.printf("\n### fldType= '%s'\n", fldType); //$$$$$$$$$$$$$$$
        if ( fldType.startsWith("class java.lang.StringBuilder") )
        {
            g_iFieldCnt++;
            String str = DataClass.getName().replaceAll("\\$", ".");
            System.out.printf("%s.%s\n", str, eachField.getName() );
        }//endif
    }//endfor
}//endif
I get the following output:
(Notice that it shows one copy of the fields in Thing_def.)
Master.innerThing.NumOfThings
### fldType= 'class [LMaster$innerThing$Thing_def;'
Master.innerThing.Thing_def.field1
Master.innerThing.Thing_def.field2
Master.innerThing.Thing_def.field3
In another part of the system I access the fields to generate a CSV file:
Field[] FieldList = DataClass.getDeclaredFields();
if (0 < FieldList.length)
{
    for (Field eachField : FieldList)
    {
        String fldType = eachField.getType().toString();
        if ( fldType.startsWith("class java.lang.StringBuilder") )
        {
            Field fld = DataClass.getDeclaredField( eachField.getName() );
            StringBuilder sb = (StringBuilder) fld.get(null);
            CSV_file.printf("%s,", sb ); // emit column to CSV
            //fld.set( DataClass, new StringBuilder() );
        }//endif
    }//endfor
}//endif
So in this case I actually need to access the array elements directly.
That is, I need to get at each Master.innerThing.Things[n].field.
So, the big questions are:
How do I generically access arrays like this?
How do I know that Thing_def does not hold data itself, but is merely a structural definition for Things[]?
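Not part of the original question, but for the first part the JDK's java.lang.reflect.Array class is the standard way to read elements of an array obtained reflectively, without knowing the component type at compile time. A rough sketch, with names following the example above:

import java.lang.reflect.Array;
import java.lang.reflect.Field;

static void dumpArrayField( Field arrayField ) throws Exception {
    if ( !arrayField.getType().isArray() ) return;
    Object array = arrayField.get( null ); // static field, so no instance needed
    Class<?> componentType = arrayField.getType().getComponentType(); // e.g. Thing_def
    for ( int n = 0; n < Array.getLength( array ); n++ ) {
        Object element = Array.get( array, n ); // e.g. Things[n]
        for ( Field inner : componentType.getDeclaredFields() ) {
            System.out.printf( "%s[%d].%s = %s%n",
                    arrayField.getName(), n, inner.getName(), inner.get( element ) );
        }
    }
}

As for the second part: in the posted Master example, Thing_def's fields are all declared static, so every Things[n] element shares them and carries no per-instance data at all. Checking Modifier.isStatic(inner.getModifiers()) is one way to detect such "structure-only" classes reflectively.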

How to extract key phrases from a given text with OpenNLP?

I'm using Apache OpenNLP and I'd like to extract the key phrases of a given text. I'm already gathering entities, but I would like to have key phrases.
The problem I have is that I can't use TF-IDF, because I don't have models for that and I only have a single text (not multiple documents).
Here is some code (prototyped - not so clean):
public List<KeywordsModel> extractKeywords(String text, NLPProvider pipeline) {
    SentenceDetectorME sentenceDetector = new SentenceDetectorME(pipeline.getSentencedetecto("en"));
    TokenizerME tokenizer = new TokenizerME(pipeline.getTokenizer("en"));
    POSTaggerME posTagger = new POSTaggerME(pipeline.getPosmodel("en"));
    ChunkerME chunker = new ChunkerME(pipeline.getChunker("en"));
    ArrayList<String> stopwords = pipeline.getStopwords("en");

    Span[] sentSpans = sentenceDetector.sentPosDetect(text);
    Map<String, Float> results = new LinkedHashMap<>();
    SortedMap<String, Float> sortedData = new TreeMap(new MapSort.FloatValueComparer(results));

    float sentenceCounter = sentSpans.length;
    float prominenceVal = 0;
    int sentences = sentSpans.length;
    for (Span sentSpan : sentSpans) {
        prominenceVal = sentenceCounter / sentences;
        sentenceCounter--;
        String sentence = sentSpan.getCoveredText(text).toString();
        int start = sentSpan.getStart();
        Span[] tokSpans = tokenizer.tokenizePos(sentence);
        String[] tokens = new String[tokSpans.length];
        for (int i = 0; i < tokens.length; i++) {
            tokens[i] = tokSpans[i].getCoveredText(sentence).toString();
        }
        String[] tags = posTagger.tag(tokens);
        Span[] chunks = chunker.chunkAsSpans(tokens, tags);
        for (Span chunk : chunks) {
            if ("NP".equals(chunk.getType())) {
                int npstart = start + tokSpans[chunk.getStart()].getStart();
                int npend = start + tokSpans[chunk.getEnd() - 1].getEnd();
                String potentialKey = text.substring(npstart, npend);
                if (!results.containsKey(potentialKey)) {
                    boolean hasStopWord = false;
                    String[] pKeys = potentialKey.split("\\s+");
                    if (pKeys.length < 3) {
                        for (String pKey : pKeys) {
                            for (String stopword : stopwords) {
                                if (pKey.toLowerCase().matches(stopword)) {
                                    hasStopWord = true;
                                    break;
                                }
                            }
                            if (hasStopWord == true) {
                                break;
                            }
                        }
                    } else {
                        hasStopWord = true;
                    }
                    if (hasStopWord == false) {
                        int count = StringUtils.countMatches(text, potentialKey);
                        results.put(potentialKey, (float) (Math.log(count) / 100) + (float) (prominenceVal / 5));
                    }
                }
            }
        }
    }
    sortedData.putAll(results);
    System.out.println(sortedData);
    return null;
}
What it basically does is give me the nouns back, sorted by a prominence value (where do they occur in the text?) and counts.
But honestly, this doesn't work so well.
I also tried it with the Lucene analyzer, but the results were also not so good.
So, how can I achieve what I want to do? I already know of KEA/Maui-indexer etc. (but I'm afraid I can't use them because of the GPL :( )
Also interesting: which other algorithms can I use instead of TF-IDF?
Example:
This text: http://techcrunch.com/2015/09/04/etsys-pulling-the-plug-on-grand-st-at-the-end-of-this-month/
Good output in my opinion: Etsy, Grand St., solar chargers, maker marketplace, tech hardware
Finally, I found something:
https://github.com/srijiths/jtopia
It uses the POS taggers from OpenNLP/Stanford NLP and has an ASL 2.0 (Apache) license. I haven't measured precision and recall yet, but it delivers great results in my opinion.
Here is my code:
Configuration.setTaggerType("openNLP");
Configuration.setSingleStrength(6);
Configuration.setNoLimitStrength(5);
// if tagger type is "openNLP" then give the openNLP POS tagger path
//Configuration.setModelFileLocation("model/openNLP/en-pos-maxent.bin");
// if tagger type is "default" then give the default POS lexicon file
//Configuration.setModelFileLocation("model/default/english-lexicon.txt");
// if tagger type is "stanford "
Configuration.setModelFileLocation("Dont need that here");
Configuration.setPipeline(pipeline);
TermsExtractor termExtractor = new TermsExtractor();
TermDocument topiaDoc = new TermDocument();
topiaDoc = termExtractor.extractTerms(text);
//logger.info("Extracted terms : " + topiaDoc.getExtractedTerms());
Map<String, ArrayList<Integer>> finalFilteredTerms = topiaDoc.getFinalFilteredTerms();
List<KeywordsModel> keywords = new ArrayList<>();
for (Map.Entry<String, ArrayList<Integer>> e : finalFilteredTerms.entrySet()) {
KeywordsModel keyword = new KeywordsModel();
keyword.setLabel(e.getKey());
keywords.add(keyword);
}
I modified the Configuration file a bit so that the POSModel is loaded from the pipeline instance.

java CSV file to array

I am a novice to Java; however, I cannot seem to figure this one out. I have a CSV file in the following format:
String1,String2
String1,String2
String1,String2
String1,String2
Each line is a pair. The 2nd line is a new record, same with the 3rd. In the real world the CSV file will change in size; sometimes it will be 3 records, or 4, or even 10.
My issue is: how do I read the values into an array and dynamically adjust its size? I would imagine we would first have to parse through the CSV file to get the number of records/elements, then create the array based on that size, then go through the CSV again and store the values in the array.
I'm just not sure how to accomplish this.
Any help would be appreciated.
You can use an ArrayList instead of an array. An ArrayList is a dynamic array. For example:
Scanner scan = new Scanner(new File("yourfile"));
ArrayList<String[]> records = new ArrayList<String[]>();
while (scan.hasNextLine())
{
    String[] record = scan.nextLine().split(",");
    records.add(record);
}
//now records has your records.
//here is a way to loop through the records (process)
for (String[] temp : records)
{
    for (String temp1 : temp)
    {
        System.out.print(temp1 + " ");
    }
    System.out.print("\n");
}
Just replace "yourfile" with the absolute path to your file.
If you don't like the first example, here is a more traditional for loop for processing the data:
for (int i = 0; i < records.size(); i++)
{
    for (int j = 0; j < records.get(i).length; j++)
    {
        System.out.print(records.get(i)[j] + " ");
    }
    System.out.print("\n");
}
Both for loops are doing the same thing though.
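One small hardening step (my addition, not part of the original answer): open the Scanner in a try-with-resources block so the file handle is closed even if reading fails. The constructor still throws FileNotFoundException, so declare or catch it:

List<String[]> records = new ArrayList<>();
try (Scanner scan = new Scanner(new File("yourfile"))) // auto-closed at the end of the block
{
    while (scan.hasNextLine())
    {
        records.add(scan.nextLine().split(","));
    }
}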
You can simply read the CSV into a list of rows (effectively a 2-dimensional array) in just two lines with the open source library uniVocity-parsers.
Refer to the following code as an example:
public static void main(String[] args) throws FileNotFoundException {
    /*
     * ---------------------------------------
     * Read CSV rows into a list of rows
     * ---------------------------------------
     */
    // 1st, creates a CSV parser with the configs
    CsvParser parser = new CsvParser(new CsvParserSettings());
    // 2nd, parses all rows from the CSV file into a list of String arrays
    List<String[]> resolvedData = parser.parseAll(new FileReader("/examples/example.csv"));
    // 3rd, process the parsed rows with business logic
    // ......
}
tl;dr
Use the Java Collections rather than arrays, specifically a List or Set, to auto-expand as you add items.
Define a class to hold your data read from CSV, instantiating an object for each row read.
Use the Apache Commons CSV library to help with the chore of reading/writing CSV files.
Class to hold data
Define a class to hold the data of each row being read from your CSV. Let's use a Person class with a given name and surname, to be more concrete than the example in your Question.
In Java 16 and later, more briefly define the class as a record.
record Person ( String givenName , String surname ) {}
In older Java, define a conventional class.
package work.basil.example;

public class Person {
    public String givenName, surname;

    public Person ( String givenName , String surname ) {
        this.givenName = givenName;
        this.surname = surname;
    }

    @Override
    public String toString ( ) {
        return "Person{ " +
                "givenName='" + givenName + '\'' +
                " | surname='" + surname + '\'' +
                " }";
    }
}
Collections, not arrays
Using the Java Collections is generally better than using mere arrays. The collections are more flexible and more powerful. See Oracle Tutorial.
Here we will use the List interface to collect each Person object instantiated from data read in from the CSV file. We use the concrete ArrayList implementation of List which uses arrays in the background. The important part here, related to your Question, is that you can add objects to a List without worrying about resizing. The List implementation is responsible for any needed resizing.
If you happen to know the approximate size of your list to be populated, you can supply an optional initial capacity as a hint when creating the List.
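For example, a capacity hint is just that, a hint; the list still grows on demand:

List<Person> people = new ArrayList<>(100); // initial capacity only, not a fixed size
people.add(new Person("Alice", "Albert")); // resizing is handled internally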
Apache Commons CSV
The Apache Commons CSV library does a nice job of reading and writing several variants of CSV and Tab-delimited formats.
Example app
Here is an example app, in a single PersonIo.java file. The Io is short for input-output.
Example data.
GivenName,Surname
Alice,Albert
Bob,Babin
Charlie,Comtois
Darlene,Deschamps
Source code.
package work.basil.example;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVRecord;

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class PersonIo {

    public static void main ( String[] args ) {
        PersonIo app = new PersonIo();
        app.doIt();
    }

    private void doIt ( ) {
        Path path = Paths.get( "/Users/basilbourque/people.csv" );
        List < Person > people = this.read( path );
        System.out.println( "People: \n" + people );
    }

    private List < Person > read ( final Path path ) {
        Objects.requireNonNull( path );
        if ( Files.notExists( path ) ) {
            System.out.println( "ERROR - no file found for path: " + path + ". Message # de1f0be7-901f-4b57-85ae-3eecac66c8f6." );
        }
        List < Person > people = List.of(); // Default to empty list.
        try {
            // Size the list to hold the data read from file.
            int initialCapacity = ( int ) Files.lines( path ).count();
            people = new ArrayList <>( initialCapacity );

            // Read CSV file.
            BufferedReader reader = Files.newBufferedReader( path );
            Iterable < CSVRecord > records = CSVFormat.RFC4180.withFirstRecordAsHeader().parse( reader );
            for ( CSVRecord record : records ) {
                // Each record is a pair such as: Alice,Albert
                String givenName = record.get( "GivenName" );
                String surname = record.get( "Surname" );

                // Use the read data to instantiate a Person.
                Person p = new Person( givenName , surname );

                // Collect each Person object.
                people.add( p );
            }
        } catch ( IOException e ) {
            e.printStackTrace();
        }
        return people;
    }
}
When run.
People:
[Person{ givenName='Alice' | surname='Albert' }, Person{ givenName='Bob' | surname='Babin' }, Person{ givenName='Charlie' | surname='Comtois' }, Person{ givenName='Darlene' | surname='Deschamps' }]

Reading from Excel to feed test cases

I have an ExcelReader class which reads from Excel and populates the appropriate lists.
public static List<Address> addressList;
public static List<User> userList;
These lists are populated by the ExcelReader like this:
addressList = new ArrayList<Address>();
Address a = new Address();
a.setAddress1(r.getCell(0).getStringCellValue());
a.setCity(r.getCell(1).getStringCellValue());
I want to use this data in my Selenium test cases. I was planning to use TestNG's @DataProvider annotation to feed the test cases, but it only accepts Object[][] and Iterator.
Is there a way to convert these lists into an Object[][] format?
I am also open to any suggestions if you prefer to use anything other than @DataProvider.
Thanks in advance.
There are lots of ways to do this, but here is an idea. I wrote an example here that does it.
The general gist of the idea, using the MetaModel API:
public static Object[][] get2ArgArrayFromRows( List<Row> rows ) {
    Object[][] myArray = new Object[rows.size()][2];
    int i = 0;
    SelectItem[] cols = rows.get(0).getSelectItems();
    for ( Row r : rows ) {
        Object[] data = r.getValues();
        for ( int j = 0; j < cols.length; j++ ) {
            if ( data[j] == null ) data[j] = ""; // force empty string where there are NULL values
        }
        myArray[i][0] = cols;
        myArray[i][1] = data;
        i++;
    }
    logger.info( "Row count: " + rows.size() );
    logger.info( "Column names: " + Arrays.toString( cols ) );
    return myArray;
}

public static Object[][] getCsvData( File csvFile )
{
    CsvConfiguration conf = new CsvConfiguration( 1 );
    DataContext csvContext = DataContextFactory.createCsvDataContext( csvFile, conf );
    Schema schema = csvContext.getDefaultSchema();
    Table[] tables = schema.getTables();
    Table table = tables[0]; // a representation of the csv file name including extension
    DataSet dataSet = csvContext.query()
            .from( table )
            .selectAll()
            .where("run").eq("Y")
            .execute();
    List<Row> rows = dataSet.toRows();
    Object[][] myArray = get2ArgArrayFromRows( rows );
    return myArray;
}
Now, this code above is just a ROUGH idea. What you really need to do is merge cols and data into a Map<String,String> object and then pass that as the first argument back to your test, containing all parameters from the CSV file, including browser type. Then, as the second argument, set it like so:
myArray[i][1] = new WebDriverBuilderHelper();
Then, in your @Test annotated method, instantiate the driver:
@Test(dataProvider = "dp")
public void testIt( Map<String,String> map, WebDriverBuilderHelper wdhelper ) {
    wdhelper.instantiateBrowser( map.get("browser") );
    wdhelper.navigateTo(url);
    ....
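For the narrower question of converting the lists you already have into Object[][], a plain loop is enough. A minimal sketch (mine, not from the answer above), assuming the static userList lives on the ExcelReader class from the question and handing one User to each test invocation:

@DataProvider(name = "users")
public static Object[][] userData() {
    List<User> users = ExcelReader.userList;
    Object[][] data = new Object[users.size()][1];
    for (int i = 0; i < users.size(); i++) {
        data[i][0] = users.get(i); // one row of parameters per test run
    }
    return data;
}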

In an ArrayList, how to remove a subdirectory if its parent is already present in the list?

I have an ArrayList<String> containing paths of directories, like:
/home, /usr...
I want to write code that removes a path from the list if the list already contains a parent directory of that path.
For e.g:
If the list contains:
/home
/home/games
then, /home/games should get removed as its parent /home is already in the list.
Below is the code:
for (int i = 0; i < checkedList.size(); i++) {
    File f = new File(checkedList.get(i));
    if (checkedList.contains(f.getParent())) {
        checkedList.remove(checkedList.get(i));
    }
}
Above, checkedList is a String ArrayList.
The problem comes when the list contains:
/home
/home/games/minesweeper
Now the minesweeper folder will not be removed, as its parent games is not in the list. How can I remove these kinds of elements too?
Another possible solution would be using String.startsWith(String); a sketch of that idea follows the output below.
But of course you could take advantage of the parent functionality of the File class in order to handle relative directories and other particularities. Here follows a draft of that solution:
List<String> listOfDirectories = new ArrayList<String>();
listOfDirectories.add("/home/user/tmp/test");
listOfDirectories.add("/home/user");
listOfDirectories.add("/tmp");
listOfDirectories.add("/etc/test");
listOfDirectories.add("/etc/another");

List<String> result = new ArrayList<String>();
for (int i = 0; i < listOfDirectories.size(); i++) {
    File current = new File(listOfDirectories.get(i));
    File parent = current;
    while ((parent = parent.getParentFile()) != null) {
        if (listOfDirectories.contains(parent.getAbsolutePath())) {
            current = parent;
        }
    }
    String absolutePath = current.getAbsolutePath();
    if (!result.contains(absolutePath)) {
        result.add(absolutePath);
    }
}
System.out.println(result);
This would print:
[/home/user, /tmp, /etc/test, /etc/another]
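For comparison, the String.startsWith(String) route mentioned at the start could look roughly like this (my sketch, not from the answer). Sorting puts every parent before its children, and appending the separator keeps /home from swallowing a sibling like /homework:

Collections.sort(listOfDirectories); // lexicographic order: parents sort before their children
List<String> kept = new ArrayList<String>();
for (String path : listOfDirectories) {
    boolean covered = false;
    for (String parent : kept) {
        if (path.startsWith(parent + "/")) { // trailing separator avoids false prefix matches
            covered = true;
            break;
        }
    }
    if (!covered) {
        kept.add(path);
    }
}
System.out.println(kept); // [/etc/another, /etc/test, /home/user, /tmp]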
You can do some string manipulation to get the base directory of each string.
int baseIndex = checkedList.get(i).indexOf("/", 1);
if (baseIndex != -1)
{
    String baseDirectory = checkedList.get(i).substring(0, baseIndex);
    if (checkedList.contains(baseDirectory))
    {
        checkedList.remove(checkedList.get(i));
    }
}
This gets the index of the second '/' and extracts the string up until that slash. If the second slash exists, it checks whether the list contains the base directory and removes the current string if there is a match. (Note that the -1 check must happen before calling substring, or it will throw.)
You can subtract the root from your string and add it to a HashSet.
For example:
If you have /home/games you can extract "home" from the string using string manipulation or a regular expression or whatever you want.
Before you add "home" to the HashSet, you must check whether it has already been added:
if (hashset.contains("home"))
{
    // then it's already added
}
else
{
    hashset.add("home");
}
Would doing the opposite work? If the parent is NOT found in your ArrayList, add the value to a final output ArrayList:
for (int i = 0; i < checkedList.size(); i++) {
    File f = new File(checkedList.get(i));
    if (!checkedList.contains(f.getParent())) {
        yourOutputList.add(checkedList.get(i));
    }
}
You should check every parent of each list item in turn.
I will assume that your list contains normalized, absolute File objects:
for (int i = 0; i < checkedList.size(); i++) {
    File curItem = checkedList.get(i);
    for (
        File curParent = curItem.getParentFile();
        curParent != null;
        curParent = curParent.getParentFile()
    ) {
        if (checkedList.contains(curParent)) {
            checkedList.remove(curItem);
            break;
        }
    }
}
Actually, I would rewrite it with a ListIterator:
for (ListIterator<File> iter = checkedList.listIterator(); iter.hasNext(); )
{
    File curItem = iter.next();
    for (
        File curParent = curItem.getParentFile();
        curParent != null;
        curParent = curParent.getParentFile()
    ) {
        if (checkedList.contains(curParent))
        {
            iter.remove();
            break;
        }
    }
}
