How to compare two maps with keys having multiple values? - java

I have the following two maps:
Map<String,List<String>> sourceTags = sourceList.get(block);
Map<String,List<String>> targetTags = targetList.get(block);
I want to compare the list of values in sourceTags with the list of values in targetTags for each key.
The values in a map entry look like this:
SourceTag = [20C=[:ABC//0000000519983150], 22F=[:CAMV//MAND, :CAMV//MANDA], 98A=[:XDTE//20160718, :MEET//20160602, :RDTE//20160719]]
TargetTag = [20C=[:ABC//0000000519983150], 22F=[:CAMV//MAND], 98A=[:MEET//20160602, :RDTE//20160719]]
I want the output as below:
For key 22F, compare the values under the sub-key CAMV. If the sub-key exists on both sides, report any difference in the values; if the sub-key is missing on one side, report that too.
Likewise for key 98A with sub-keys XDTE, MEET and RDTE: if a sub-key exists in both source and target and the values differ, report the difference; if a sub-key is not found, report it as missing in source or target. The same applies to the values.
if(sub-key found){
//compare their values
}else{
//report as sub-key not found
}
I have written the following program :
EDITED the Program
Set<String> tags = sourceTags.keySet();
for(String targetTag : tags){
if(targetTags.containsKey(targetTag)){
List<String> sourceValue = sourceTags.get(targetTag);
List<String> targetValue = targetTags.get(targetTag);
for(String sValue : sourceValue){
for(String tValue : targetValue){
if(sValue.length() > 4 && tValue.length() > 4){
//get keys for both source and target
String sKey = sValue.substring(1, 5);
String tKey = tValue.substring(1,5);
//get values for both source and target
String sTagValue= sValue.substring(sValue.lastIndexOf(sKey), sValue.length());
String tTagValue = tValue.substring(tValue.lastIndexOf(tKey),tValue.length());
if(sKey.equals(tKey)){
if(!sTagValue.equals(tTagValue)){
values = createMessageRow(corpValue, block ,targetTag, sTagValue,tTagValue);
result.add(values);
}
}
}
}
}
}else{
System.out.println(sourceTags.get(targetTag).get(0));
values = createMessageRow(corpValue,block,targetTag,sourceTags.get(targetTag).get(0),"","Tag: "+targetTag+" not availlable in target");
result.add(values);
}
}
After executing, the comparison report shows wrong values.
Please help!!

Actually, your code has a major logical flaw. When you compare the lists stored in the two maps under the same key, you do this:
for(int index = 0; index < Math.max(sourceValue.size(), targetValue.size()); index++){
if(index<sourceValue.size() && index<targetValue.size()){
//Do your comparisons...
}
}
That means that you proceed along the two lists with the same index and then you compare the two items. You never compare an item of the first list with an item of the second list that doesn't have the same index.
I'll give you an example: having two lists
LIST_A = (A, B, C)
LIST_B = (C, B, A)
these are the comparisons you're making:
A == C
B == B
C == A
It's obvious then that even though the two lists contain the same elements, the only match you'll find is B == B.
You need to compare every item of the first list with ALL the items of the second one, to get all the matching pairs. Something like (without optimizations and elegance for clarity's sake):
for(String sValue : sourceValue){
for(String tValue : targetValue){
if(sValue.length() > 4 && tValue.length() > 4){
String sKey = sValue.substring(1,5);
String tKey = tValue.substring(1,5);
if(sKey.equals(tKey)){
//Do your logic...
}
}
}
}
This way, you don't even need to keep walking the other list once the index passes the end of the first one, as you do now.
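The every-item-against-every-item comparison can also be done by first grouping each list by sub-key. The following is a minimal sketch, not the asker's actual program: the class name TagComparer, the method names, and the report wording are made up for illustration, and it assumes every value follows the ":SUBKEY//VALUE" layout shown in the question (anything else is skipped).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class TagComparer {

    // Groups values such as ":CAMV//MAND" by their sub-key ("CAMV").
    // Assumption: values look like ":SUBKEY//VALUE"; other values are skipped.
    static Map<String, Set<String>> bySubKey(List<String> values) {
        Map<String, Set<String>> grouped = new LinkedHashMap<>();
        for (String raw : values) {
            String v = raw.trim();
            int sep = v.indexOf("//");
            if (!v.startsWith(":") || sep < 0) {
                continue;
            }
            String subKey = v.substring(1, sep);
            grouped.computeIfAbsent(subKey, k -> new LinkedHashSet<>()).add(v.substring(sep + 2));
        }
        return grouped;
    }

    // Returns one report line per difference between source and target.
    static List<String> compare(Map<String, List<String>> source,
                                Map<String, List<String>> target) {
        List<String> report = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : source.entrySet()) {
            String tag = e.getKey();
            if (!target.containsKey(tag)) {
                report.add(tag + ": tag not found in target");
                continue;
            }
            Map<String, Set<String>> s = bySubKey(e.getValue());
            Map<String, Set<String>> t = bySubKey(target.get(tag));
            for (String subKey : s.keySet()) {
                if (!t.containsKey(subKey)) {
                    report.add(tag + "/" + subKey + ": sub-key not found in target");
                } else if (!s.get(subKey).equals(t.get(subKey))) {
                    report.add(tag + "/" + subKey + ": source=" + s.get(subKey)
                            + " target=" + t.get(subKey));
                }
            }
            for (String subKey : t.keySet()) {
                if (!s.containsKey(subKey)) {
                    report.add(tag + "/" + subKey + ": sub-key not found in source");
                }
            }
        }
        for (String tag : target.keySet()) {
            if (!source.containsKey(tag)) {
                report.add(tag + ": tag not found in source");
            }
        }
        return report;
    }

    public static void main(String[] args) {
        Map<String, List<String>> source = new LinkedHashMap<>();
        source.put("22F", Arrays.asList(":CAMV//MAND", ":CAMV//MANDA"));
        source.put("98A", Arrays.asList(":XDTE//20160718", ":MEET//20160602"));
        Map<String, List<String>> target = new LinkedHashMap<>();
        target.put("22F", Arrays.asList(":CAMV//MAND"));
        target.put("98A", Arrays.asList(":MEET//20160602"));
        for (String line : compare(source, target)) {
            System.out.println(line);
        }
    }
}
```

Grouping into sets means a sub-key that appears several times (like CAMV under 22F) is compared as a whole, so the order of values within a list does not matter.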

Related

Optimisation of searching HashMap with list of values

I have a map in which values have references to lists of objects.
//key1.getElements() - produces the following
[Element N330955311 ({}), Element N330955300 ({}), Element N3638066598 ({})]
I would like to search the list of every key and find the occurrence of a given element (>= 2).
My current approach is very slow. I have a lot of data, and I know execution time is relative, but it takes about 40 seconds.
My approach..
public String occurance>=2 (String id)
//Search for id
//Outer loop through Map
//get first map value and return elements
//inner loop iterating through key.getElements()
//if match with id..then iterate count
//return Strings with count == 2 else return null
The reason this is so slow is that I'm searching for a lot of ids (about 8000) and I have about 3000 keys in my map. So it's more than 8000*3000*8000 operations (given that every id/element exists in the key/valueSet map at least once).
Please help me with a more efficient way to make this search. I'm not too deep into practicing Java, so perhaps there's something obvious I'm missing.
Edited in real code after request:
public void findAdjacents() {
for (int i = 0; i < nodeList.size(); i++) {
count = 0;
inter = null;
container = findIntersections(nodeList.get(i));
if (container != null) {
intersections.add(container);
}
}
}
public String findIntersections(String id) {
Set<Map.Entry<String, Element>> entrySet = wayList.entrySet();
for (Map.Entry entry : entrySet) {
w1 = (Way) wayList.get(entry.getKey());
for (Node n : w1.getNodes()) {
container2 = String.valueOf(n);
if (container2.contains(id)) {
count++;
}
if (count == 2) {
inter = id;
count = 0;
}
}
}
// inter is still null when no intersection was found
return inter;
}
Based on the pseudocode you provided, there is no need to iterate over all the keys in the map. You can call get(id) on the map directly. If the map contains the id, you get its list of elements, which you can iterate to check whether the element occurs at least twice. If the id is not present, get returns null. That lets you optimize your code a bit.
Thanks
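A different way to cut the cost, which is not what the answer above describes, is to count every node id once in a single pass and then look each searched id up in that count. The sketch below is only an illustration under assumptions: the ways are modelled as a hypothetical Map from a way id to a list of node id strings (standing in for wayList and Way.getNodes()), and ids are matched exactly rather than with String.contains as in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class IntersectionFinder {

    // Counts, in a single pass, how often each node id appears across all ways.
    // "ways" is a hypothetical stand-in: way id -> node ids contained in that way.
    static Map<String, Integer> countOccurrences(Map<String, List<String>> ways) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> nodeIds : ways.values()) {
            for (String nodeId : nodeIds) {
                counts.merge(nodeId, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns the ids from nodeList that occur at least twice, in nodeList order.
    static List<String> findIntersections(List<String> nodeList,
                                          Map<String, List<String>> ways) {
        Map<String, Integer> counts = countOccurrences(ways);
        List<String> intersections = new ArrayList<>();
        for (String id : nodeList) {
            if (counts.getOrDefault(id, 0) >= 2) {
                intersections.add(id);
            }
        }
        return intersections;
    }

    public static void main(String[] args) {
        Map<String, List<String>> ways = new HashMap<>();
        ways.put("W1", Arrays.asList("N1", "N2"));
        ways.put("W2", Arrays.asList("N2", "N3"));
        System.out.println(findIntersections(Arrays.asList("N1", "N2", "N3"), ways));
    }
}
```

With this layout the work is roughly one pass over all nodes plus one hash lookup per searched id, instead of a full scan of every way for every id.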

sorting list of values in java after converting string to integer

I am writing a Selenium automation test case and I get a list of web elements using the code below.
List<WebElement> list = GlobalVariables.BrowserDriver.findElements(By.xpath(".//*[@id='Tbls_T']/tbody/tr/td[4]"));
Then I iterate over the web elements and read the values using
for (WebElement webElement : list) {
System.out.println(webElement.getText());
}
So I get the values as strings. Sample values are given below:
-100,000
-80,000
0.100
2
87.270
3,000.000
I want to check whether these values are in sorted order. I think I should convert them to numbers and then check with some kind of sorting method. I tried converting the values to a list of integers and using a library such as Guava to check the ordering, but the negative values are giving me trouble.
Is there any way I can check the order of the values above?
Thanks in advance.
You just want to check whether the incoming list is sorted.
Compare the previous value with the current value. If the previous value is greater than the current value, the list is not sorted.
Sample code:
WebElement prev=null;
boolean isSorted = true;
for (WebElement currentWebElement : list) {
if (prev == null) {
prev = currentWebElement;
continue;
} else {
if (Double.compare(Double.parseDouble(prev.getText().replaceAll(",", "")),
Double.parseDouble(currentWebElement.getText().replaceAll(",", ""))) > 0) {
isSorted = false;
break;
}
prev = currentWebElement;
}
}
Create two copies of your data.
Convert the first copy to numbers and sort it.
Convert the second copy to numbers and leave it in its original order.
Check whether the two lists are equal. If they are, your list is sorted.
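The copy, sort and compare steps can be sketched as follows. This is an illustrative version only: the class name SortedCheck is made up, the input is a plain list of strings rather than WebElement objects, and it assumes the comma is only ever a thousands separator. BigDecimal is used instead of integers because the sample values contain decimals.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SortedCheck {

    // Parses display strings such as "3,000.000" or "-80,000" by dropping the
    // grouping commas. Assumption: "," is never used as a decimal separator.
    static List<BigDecimal> parseAll(List<String> texts) {
        List<BigDecimal> numbers = new ArrayList<>();
        for (String text : texts) {
            numbers.add(new BigDecimal(text.trim().replace(",", "")));
        }
        return numbers;
    }

    // True when the values are already in ascending (non-decreasing) order.
    static boolean isSortedAscending(List<String> texts) {
        List<BigDecimal> original = parseAll(texts);
        List<BigDecimal> sorted = new ArrayList<>(original); // second copy
        Collections.sort(sorted);                            // stable sort
        return original.equals(sorted);
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList(
                "-100,000", "-80,000", "0.100", "2", "87.270", "3,000.000");
        System.out.println(isSortedAscending(values));
    }
}
```

In the Selenium test, the list of strings would come from calling getText() on each element before passing it to isSortedAscending.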
Use another list of doubles and Collections.sort(). You may have to remove the commas from the strings first.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
public class ConvertToIntSort
{
public static void main( String[] args )
{
String[] strlist = {"-100000","0.100", "2" , "3000.000","87.270", "-80000" };
java.util.List<String> stringlist = Arrays.asList(strlist);
java.util.List<Double> intList = new ArrayList<Double>();
for(String str : stringlist)
{
//converting and adding to double list
intList.add(Double.valueOf(str));
}
//Sorting is done here
Collections.sort(intList);
for(Double num : intList)
{
System.out.println(num);
}
}
}

Get the string with the highest value in java (strings have the same 'base' name but different suffixes)

I have a list with some strings in it:
GS_456.java
GS_456_V1.java
GS_456_V2.java
GS_460.java
GS_460_V1.java
And it goes on. I want a list with the strings with the highest value:
GS_456_V2.java
GS_460_V1.java
.
.
.
I can only think of using lots of for statements, but isn't there a more practical way? I'd like to avoid using too many for statements, since I already use them a lot when I execute some queries.
EDIT: The strings with the V1, V2,.... are the names of recent classes created. When someone creates a new version of GS_456 for example, they'll do it and add its version at the end of the name.
So, GS_456_V2 is the most recent version of the GS_456 java class. And it goes on.
Thanks in advance.
You will want to process the file names in two steps.
Step 1: split the list into sublists, with one sublist per file name (ignoring suffix).
Here is an example that splits the list into a Map:
private static Map<String, List<String>> nameMap = new HashMap<String, List<String>>();
private static void splitEmUp(final List<String> names)
{
for (String current : names)
{
List<String> listaly;
// "GS_456_V2.java" splits into [GS, 456, V2, java]; index 1 is the file number
String[] splitaly = current.split("_|\\.");
listaly = nameMap.get(splitaly[1]);
if (listaly == null)
{
listaly = new LinkedList<String>();
nameMap.put(splitaly[1], listaly);
}
listaly.add(current);
}
}
Step 2: find the highest version suffix for each name. Here is an example:
private static List<String> findEmAll()
{
List<String> returnValue = new LinkedList<String>();
Set<String> keySet = nameMap.keySet();
for (String key : keySet)
{
List<String> listaly = nameMap.get(key);
String highValue = null;
if (listaly.size() == 1)
{
highValue = listaly.get(0);
}
else
{
int highVersion = 0;
for (String name : listaly)
{
String[] versions = name.split("_V|\\.");
if (versions.length == 3)
{
int versionNumber = Integer.parseInt(versions[1]);
if (versionNumber > highVersion)
{
highValue = name;
highVersion = versionNumber;
}
}
}
}
returnValue.add(highValue);
}
return returnValue;
}
I guess you don't want simply the lexicographic order (the solution would be obvious).
First, remove the ".java" part and split your string on the character "_".
int dotIndex = string.indexOf(".");
String[] parts = string.substring(0, dotIndex).split("_");
You are interested in parts[1] and parts[2]. The first is easy, it's just a number.
int fileNumber = Integer.parseInt(parts[1]);
The second one is always of the form "VX" with X being a number. But this part may not exist (if it's the base version of the file). In which case we can say that version is 0.
int versionNumber = parts.length < 3 ? 0 : Integer.parseInt(parts[2].substring(1));
Now you can compare based on these two numbers.
To make things simple, build a class FileIdentifier based on this:
class FileIdentifier {
int fileNumber;
int versionNumber;
}
Then write a function that creates a FileIdentifier from a file name, with logic based on what I explained earlier.
FileIdentifier getFileIdentifierFromFileName(String filename){ /* .... */ }
Then you make a comparator on String, in which you get the FileIdentifier for the two strings and compare upon FileIdentifier members.
Then, to get the string with "the highest value", you simply put all your strings in a list, and use Collections.sort, providing the comparator.
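Since the question asks for the newest file per base name rather than a single overall maximum, the FileIdentifier idea can be combined with a map keyed by file number. The following is a rough sketch under assumptions: the names LatestVersions, parse and latest are invented here, and every file name is assumed to look like "GS_<number>.java" or "GS_<number>_V<version>.java".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class LatestVersions {

    // Parsed form of a name such as "GS_456_V2.java".
    static final class FileIdentifier {
        final int fileNumber;
        final int versionNumber;

        FileIdentifier(int fileNumber, int versionNumber) {
            this.fileNumber = fileNumber;
            this.versionNumber = versionNumber;
        }
    }

    // A missing "_V<n>" part counts as version 0 (the base version).
    static FileIdentifier parse(String fileName) {
        int dotIndex = fileName.indexOf('.');
        String base = dotIndex < 0 ? fileName : fileName.substring(0, dotIndex);
        String[] parts = base.split("_");
        int fileNumber = Integer.parseInt(parts[1]);
        int versionNumber = parts.length < 3 ? 0 : Integer.parseInt(parts[2].substring(1));
        return new FileIdentifier(fileNumber, versionNumber);
    }

    // Keeps, for each file number, the name with the highest version number.
    static List<String> latest(List<String> fileNames) {
        Map<Integer, String> best = new TreeMap<>();
        for (String name : fileNames) {
            FileIdentifier id = parse(name);
            String current = best.get(id.fileNumber);
            if (current == null || parse(current).versionNumber < id.versionNumber) {
                best.put(id.fileNumber, name);
            }
        }
        return new ArrayList<>(best.values());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "GS_456.java", "GS_456_V1.java", "GS_456_V2.java",
                "GS_460.java", "GS_460_V1.java");
        System.out.println(latest(names));
    }
}
```

The TreeMap keeps the result ordered by file number; a single loop over the names is enough, so no nested for statements are needed.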

Most efficient way of finding common subexpressions in a list of strings in Java

I have a list of strings that represents package directories. I want to iterate the list, to find largest part of the strings where the packages are the same, then extract this substring, subtract that from the original list of strings to get specific packages so I create the appropriate directories.
I was thinking of creating the original list as a static hash set, then using the retainAll method, storing the result in a new String.
Would something like this be the most performant option, or is there a better way to do it?
Many thanks
This works for me, explanation in comments
// returns the length of the longest common prefix of all strings in the given array
public static int longestCommonPrefix(String[] strings) {
// Null or no contents, return 0
if (strings == null || strings.length == 0) {
return 0;
// only 1 element? return its length
} else if (strings.length == 1 && strings[0] != null) {
return strings[0].length();
// more than 1
} else {
// copy the array and sort it on the lengths of the strings,
// shortest one first.
// this will raise a NullPointerException if an array element is null
String[] copy = Arrays.copyOf(strings, strings.length);
Arrays.sort(copy, new Comparator<String>() {
@Override
public int compare(String o1, String o2) {
// ascending length, so copy[0] is the shortest string
return o1.length() - o2.length();
}
});
int result = 0; // init result
// iterate through every letter of the shortest string
for (int i = 0; i < copy[0].length(); i++) {
// compare the corresponding char of all other strings
char currenChar = copy[0].charAt(i);
for (int j = 1; j < strings.length; j++) {
if (currenChar != copy[j].charAt(i)) { // mismatch
return result;
}
}
// all match
result++;
}
// done iterating through shortest string, all matched.
return result;
}
}
If changing the original array does not bother you, you can omit the line String[] copy = Arrays.copyOf(strings, strings.length); and just sort your strings array.
To retrieve the text, change the return type to String, return copy[0].substring(0, result) at the mismatch inside the loop, and return copy[0] at the end of the method. The two early-return branches then need to return an empty string and strings[0] respectively.
If you are just looking for the single most common package, I would do the following:
Grab the first element from the list (call it the reference package). Using this package name I would iterate through the list. For each remaining element in the list, see if the element contains the reference package. If so move to the next element. If not trim your reference package by one package (taking aa.bb.cc.serverside and converting to aa.bb.cc). Then see if the current element contains this new reference package. Repeat this until the reference package is empty or until the element matches. Then continue down the list of packages.
This will give you the largest most common package. Loop back through removing this from all elements in the list.
EDIT: Slight modification, better keep the . at the end of the package name to ensure complete package name.
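The reference-package trimming described above can be sketched like this. It is one possible reading of the answer, not a definitive implementation: the class and method names are invented, and it uses startsWith with a trailing dot instead of contains, following the EDIT note, so that aa.bb.cc does not match aa.bb.ccc.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CommonPackage {

    // Starts with the first package as the reference and trims one segment
    // off its end whenever an element does not live under it.
    static String commonPackage(List<String> packages) {
        if (packages.isEmpty()) {
            return "";
        }
        String ref = packages.get(0);
        for (String p : packages) {
            while (!ref.isEmpty() && !(p.equals(ref) || p.startsWith(ref + "."))) {
                int dot = ref.lastIndexOf('.');
                ref = dot < 0 ? "" : ref.substring(0, dot);
            }
        }
        return ref;
    }

    // Removes the common package (and the following dot) from every element.
    static List<String> stripCommon(List<String> packages) {
        String common = commonPackage(packages);
        List<String> specific = new ArrayList<>();
        for (String p : packages) {
            if (common.isEmpty()) {
                specific.add(p);
            } else if (p.equals(common)) {
                specific.add("");
            } else {
                specific.add(p.substring(common.length() + 1));
            }
        }
        return specific;
    }

    public static void main(String[] args) {
        List<String> packages = Arrays.asList(
                "aa.bb.cc.serverside", "aa.bb.cc.client", "aa.bb.dd");
        System.out.println(commonPackage(packages));
        System.out.println(stripCommon(packages));
    }
}
```

The reference only ever shrinks, so once it has been trimmed for one element it still matches all elements checked before it.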
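Just sort them. Strings that share a prefix end up next to each other, and the longest common prefix of the whole list is the common prefix of the first and last entries.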

Java Algorithm: pair list entries by multiple case criteria

I fear this won't be an easy question. I've been thinking about a proper solution for this problem for a long time and hope that a fresh bunch of brains have a better view on the problem - let's get to it:
Data:
What we're working with here is a csv file containing multiple columns, the relevant ones for this problem are:
User ID (Integer, ranging from 3 to 8 digits, multiple entries with the same UserID exist) LIST IS SORTED BY THIS
Query (String)
Epoc (Long, epoc time value)
clickurl (String)
Every entry in the data we're working with here has non-null values for these attributes.
Example Data:
SID,UID,query,rawdate,timestamp,timegap,epoc,lengthwords,lengthchars,rank,clickurl
5,142,westchester.gov,2006-03-20 03:55:57,Mon Mar 20 03:55:57 CET 2006,0,1142823357504,1,15,1,http://www.westchestergov.com
10,142,207 ad2d 530,2006-04-08 01:31:14,Sat Apr 08 01:31:14 CEST 2006,10000,1144452674507,3,12,1,http://www.courts.state.ny.us
11,142,vera.org,2006-04-08 08:38:42,Sat Apr 08 08:38:42 CEST 2006,11000,1144478322507,1,8,1,http://www.vera.org
Note: there are multiple entries that have the same value for 'Epoc', this is due to the tools used to gather the data
Note2: the list has a size of ~700000, just fyi
Goal: Match pairs of entries that have the same query
Scope: entries that share the same UserID
Due to the mentioned anomaly in the data gathering process, the following has to be considered:
If two entries share the same value for 'Query' and for 'Epoc' , the following elements in the list have to be checked for these criteria until the next entry has a different value for one of these attributes. The group of entries that shared the same Query and Epoc values are to be considered as -one- entry, so in order to match a pair, another entry has to be found that matches the 'Query' value. For lack of a better name, let's call a group that shares the same Query and Epoc value a 'chain'
Now that this is out, it gets a bit easier, there are 3 types of pair compositions we can get out of this:
Entry & Entry
Entry & Chain
Chain & Chain
Type 1 here just means two entries in the list that share the same value for 'Query', but not for 'Epoc'.
So this sums up the Equal Query Pairs
There's also the case of Different Query Pairs which can be described as the following:
After we have matched the equal query pairs, there's the possibility that there are entries which have not been paired with other entries because their query didn't match - every entry that has not been matched to another entry because of this is part of the set called 'different queries'
The members of this set have to be paired without following any criteria, but chains are still treated as -one- entry of the pair.
As for matching the pairs in general, there may be no redundant pairs - a single entry can be part of n many pairs, but two individual entries can only form one pair.
EXAMPLE:
The following entries are to be paired
UID,Query,Epoc,Clickurl
772,Donuts,1141394053510,https://www.dunkindonuts.com/dunkindonuts/en.html
772,Donuts,1141394053510,https://www.dunkindonuts.com/dunkindonuts/en.html
772,Donuts,1141394053510,https://www.dunkindonuts.com/dunkindonuts/en.html
772,raspberry pi,1141394164710,http://www.raspberrypi.org/
772,stackoverflow,1141394274810,http://en.wikipedia.org/wiki/Buffer_overflow
772,stackoverflow,1141394274850,http://www.stackoverflow.com
772,tall women,1141394275921,http://www.tallwomen.org/
772,raspberry pi,1141394277991,http://www.raspberrypi.org/
772,Donuts,114139427999,http://de.wikipedia.org/wiki/Donut
772,stackoverflow,1141394279999,http://www.stackoverflow.com
772,something,1141399299991,http:/something.else/something/
In this example, Donuts is a chain, so the pairs are (line numbers without the header):
Equal Query Pairs:(1-3,9) (4,8) (5,6) (5,10) (6,10)
Different Query Pairs: (7,11)
My -failed- approach to the problem:
The algorithm I developed to solve this works as follow:
Iterate the list of entries until the value for UserID changes.
Then, applied to a separate list that only contains the just iterated elements that share the same UserID:
for (int i = 0; i < list.size(); i++) {
Entry tempI = list.get(i);
Boolean iMatched = false;
//boolean to save whether or not c1 is set
Boolean c1done = false;
Boolean c2done = false;
//Hashsets holding the clickurl values of the entries that form a pair
HashSet<String> c1 = null;
HashSet<String> c2 = null;
for (int j = i + 1; j < list.size(); j++) {
Entry tempJ = list.get(j);
// Queries match
if (tempI.getQuery().equals(tempJ.getQuery())) {
// whether or not the Entry at position i has been matched
if (!iMatched) {
iMatched = true;
}
HashSet<String> e1 = new HashSet<String>();
HashSet<String> e2 = new HashSet<String>();
int k = 0;
// Times match
HashSet<String> chainset = new HashSet<String>();
if (tempI.getEpoc() == tempJ.getEpoc()) {
chainset.add(tempI.getClickurl());
chainset.add(tempJ.getClickurl());
} else {
e1.add(tempI.getClickurl());
if (c1 == null) {
c1 = e1;
c1done = true;
} else {
if (c2 == null) {
c2 = e1;
c2done = true;
}
}
}
//check how far the chain goes and get their entries
if ((j + 1) < list.size()) {
Entry tempjj = list.get(j + 1);
if (tempjj.getEpoc() == tempJ.getEpoc()) {
k = j + 1;
//search for the end of the chain
while ((k < list.size())
&& (tempJ.getQuery().equals(list.get(k)
.getQuery()))
&& (tempJ.getEpoc() == list.get(k).getEpoc())) {
chainset.add(tempJ.getClickurl());
chainset.add(list.get(k).getClickurl());
k++;
}
j = k + 1; //continue the iteration at the end of the chain
if (c1 == null) {
c1 = chainset;
c1done = true;
} else {
if (c2 == null) {
c2 = chainset;
c2done = true;
}
}
// Times don't match
}
} else {
e2.add(tempJ.getClickurl());
if (c1 == null) {
c1 = e2;
c1done = true;
} else {
if (c2 == null) {
c2 = e2;
c2done = true;
}
}
}
/** Block that compares the clicks in the Hashsets and computes the resulting data
* left out for now to not make this any more complicated than it already is
**/
// Queries don't match
} else {
if (!dq.contains(tempJ)) { //note: dq is an ArrayList holding the entries of the different query set
dq.add(tempJ);
}
}
if (j == al.size() - 1) {
if (!iMatched) {
dq.add(tempI);
}
}
}
if (dq.size() >= 2) {
for (int z = 0; z < dq.size() - 1; z++) {
if (dq.get(z + 1) != null) {
/** Filler, iterate dq just like the normal list with two loops
*
**/
}
}
}
}
So I try to match the pairs using an excessive number of loops, which results in a horribly long runtime; so far I have never seen it finish.
Okay I hope I didn't forget anything crucial, I'll add further needed information later
If you've made it this far, thanks for reading - hopefully you have an idea that might help me
Use SQL to import the data into a db and then perform the queries. Your txt file is too large; it's no wonder that it takes so long to go through it. :)
First, remove all but one entry from each chain. To do this, sort by (userid, query, epoc) and remove duplicates.
Then scan the sorted list. Take all entries for a (userid, query) pair. If there is only one, save it in a list for later processing; otherwise emit all pairs.
For all the entries of a given user that you have saved for later processing (these are type 2 & 3), emit pairs.
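The steps above can be sketched roughly as follows. Note that this sketch swaps the explicit sort for hash-based grouping, which collapses chains the same way but also merges identical (userid, query, epoc) rows that are not adjacent; that is an assumption about the data, not something the question guarantees. The class QueryPairs, the String[] row layout {uid, query, epoc} and the "query@epoc" labels are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class QueryPairs {

    static final class Result {
        final List<String> equalQueryPairs = new ArrayList<>();
        final List<String> differentQueryPairs = new ArrayList<>();
    }

    // Emits every unordered pair of items exactly once.
    static void addAllPairs(List<String> items, List<String> out) {
        for (int i = 0; i < items.size(); i++) {
            for (int j = i + 1; j < items.size(); j++) {
                out.add(items.get(i) + " + " + items.get(j));
            }
        }
    }

    // Each row is {uid, query, epoc}. Rows with the same uid, query and epoc
    // form a chain and collapse into a single entry.
    static Result pair(List<String[]> rows) {
        Map<String, Map<String, Set<String>>> grouped = new LinkedHashMap<>();
        for (String[] r : rows) {
            grouped.computeIfAbsent(r[0], k -> new LinkedHashMap<>())
                   .computeIfAbsent(r[1], k -> new LinkedHashSet<>())
                   .add(r[2]);
        }
        Result result = new Result();
        for (Map<String, Set<String>> byQuery : grouped.values()) {
            List<String> unmatched = new ArrayList<>(); // saved for later processing
            for (Map.Entry<String, Set<String>> e : byQuery.entrySet()) {
                List<String> entries = new ArrayList<>();
                for (String epoc : e.getValue()) {
                    entries.add(e.getKey() + "@" + epoc);
                }
                if (entries.size() == 1) {
                    unmatched.add(entries.get(0));
                } else {
                    addAllPairs(entries, result.equalQueryPairs);
                }
            }
            // Entries whose query matched nothing are paired with each other.
            addAllPairs(unmatched, result.differentQueryPairs);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"772", "Donuts", "1141394053510"});
        rows.add(new String[] {"772", "Donuts", "1141394053510"});
        rows.add(new String[] {"772", "Donuts", "114139427999"});
        rows.add(new String[] {"772", "tall women", "1141394275921"});
        rows.add(new String[] {"772", "something", "1141399299991"});
        Result r = pair(rows);
        System.out.println(r.equalQueryPairs);
        System.out.println(r.differentQueryPairs);
    }
}
```

Grouping is a single pass over the rows; the pair emission itself is still quadratic within each user, which is unavoidable when all pairs have to be listed.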
