How to avoid ConcurrentHashMap usage - Java

I have written this code inside the run() method of the Reducer class in Hadoop:
@Override
public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    ConcurrentHashMap<String, HashSet<Text>> map = new ConcurrentHashMap<String, HashSet<Text>>();
    while (context.nextKey()) {
        String line = context.getCurrentKey().toString();
        HashSet<Text> values = new HashSet<Text>();
        for (Text t : context.getValues()) {
            values.add(new Text(t));
        }
        map.put(line, new HashSet<Text>());
        for (Text t : values) {
            map.get(line).add(new Text(t));
        }
    }
    ConcurrentHashMap<String, HashSet<Text>> newMap = new ConcurrentHashMap<String, HashSet<Text>>();
    for (String keyToMerge : map.keySet()) {
        String[] keyToMergeTokens = keyToMerge.split(",");
        for (String key : map.keySet()) {
            String[] keyTokens = key.split(",");
            if (keyToMergeTokens[keyToMergeTokens.length - 1].equals(keyTokens[0])) {
                String newKey = keyToMerge;
                for (int i = 1; i < keyTokens.length; i++) {
                    newKey += "," + keyTokens[i];
                }
                if (!newMap.contains(newKey)) {
                    newMap.put(newKey, new HashSet<Text>());
                    for (Text t : map.get(keyToMerge)) {
                        newMap.get(newKey).add(new Text(t));
                    }
                }
                for (Text t : map.get(key)) {
                    newMap.get(newKey).add(new Text(t));
                }
            }
        }
    }
    // call the reducers
    for (String key : newMap.keySet()) {
        reduce(new Text(key), newMap.get(key), context);
    }
    cleanup(context);
}
My problem is that even though my input is quite small, the job takes 30 minutes to run, mainly because of the newMap.put() call. If I comment out that line, it runs quickly without any problems.
As you can see, I use a ConcurrentHashMap. I didn't want to use one, because I believe run() is called only once on each machine (it doesn't run concurrently), so a plain HashMap should be enough; but if I replace the ConcurrentHashMap with a plain HashMap I get a ConcurrentModificationException.
Does anyone have an idea how to make it work without these delays?
Thanks in advance!
*Java 6
*Hadoop 1.2.1

I don't know if it would solve your performance problems, but I see one inefficient thing you are doing:
newMap.put(newKey, new HashSet<Text>());
for (Text t : map.get(keyToMerge)) {
    newMap.get(newKey).add(new Text(t));
}
It would be more efficient to keep the HashSet in a variable instead of looking it up in newMap each time:
HashSet<Text> newSet = new HashSet<Text>();
newMap.put(newKey, newSet);
for (Text t : map.get(keyToMerge)) {
    newSet.add(new Text(t));
}
Another inefficient thing you are doing is creating a HashSet of values and then creating another, identical HashSet to put in the map. Since the original HashSet (values) is never used again, you are constructing all those Text objects for no reason at all.
Instead of:
while (context.nextKey()) {
    String line = context.getCurrentKey().toString();
    HashSet<Text> values = new HashSet<Text>();
    for (Text t : context.getValues()) {
        values.add(new Text(t));
    }
    map.put(line, new HashSet<Text>());
    for (Text t : values) {
        map.get(line).add(new Text(t));
    }
}
You can simply write:
while (context.nextKey()) {
    String line = context.getCurrentKey().toString();
    HashSet<Text> values = new HashSet<Text>();
    for (Text t : context.getValues()) {
        values.add(new Text(t));
    }
    map.put(line, values);
}
EDIT:
I just saw the additional code you posted as an answer (from your cleanup() method):
// clear map
for (String s : map.keySet()) {
    map.remove(s);
}
map = null;
// clear newMap
for (String s : newMap.keySet()) {
    newMap.remove(s);
}
newMap = null;
The reason this code gives you a ConcurrentModificationException is that foreach loops don't support modification of the collection you are iterating over.
To overcome this, you can use an Iterator:
// clear map
Iterator<Map.Entry<String, HashSet<Text>>> iter1 = map.entrySet().iterator();
while (iter1.hasNext()) {
    iter1.next();
    iter1.remove();
}
map = null;
// clear newMap
Iterator<Map.Entry<String, HashSet<Text>>> iter2 = newMap.entrySet().iterator();
while (iter2.hasNext()) {
    iter2.next();
    iter2.remove();
}
newMap = null;
That said, you don't really have to remove each item separately.
You can simply write:
map = null;
newMap = null;
When you remove the references to the maps, the garbage collector can collect them; removing the items one by one first makes no difference.
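Putting it together, here is a minimal sketch of the simplified cleanup() (assuming map and newMap are instance fields of your Reducer, as your cleanup() snippet implies). With the per-entry removal loops gone, nothing structurally modifies the maps while they are being iterated, so a plain HashMap should no longer throw:
@Override
protected void cleanup(Context context) throws IOException, InterruptedException {
    // Dropping the references is enough; the garbage collector
    // reclaims the maps once nothing points to them.
    map = null;
    newMap = null;
    super.cleanup(context);
}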

Related

Find duplicates in first column and take average based on third column

My issue here is that I need to compute the average time for each Id.
Sample data:
T1,2020-01-16,11:16pm,start
T2,2020-01-16,11:18pm,start
T1,2020-01-16,11:20pm,end
T2,2020-01-16,11:23pm,end
I have written code that keeps the first and third columns in a map, something like
T1, 11:16pm
but I was not able to compute the values after putting them in the map. I also tried keeping them in a String array and splitting line by line, but I ran into the same issue with that approach.
public class AverageTimeGenerate {
    public static void main(String[] args) throws IOException {
        File file = new File("/abc.txt");
        try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
            while (true) {
                String line = reader.readLine();
                if (line == null) {
                    break;
                }
                ArrayList<String> list = new ArrayList<>();
                String[] tokens = line.split(",");
                for (String s : tokens) {
                    list.add(s);
                }
                Map<String, String> map = new HashMap<>();
                String[] data = line.split(",");
                String ids = data[0];
                String dates = data[1];
                String transactionTime = data[2];
                String transactionStartAndEndTime = data[3];
                String[] transactionIds = ids.split("/n");
                String[] timeOfEachTransaction = transactionTime.split("/n");
                for (String id : transactionIds) {
                    for (String time : timeOfEachTransaction) {
                        map.put(id, time);
                    }
                }
            }
        }
    }
}
Can anyone tell me whether it is possible to find duplicates in a map and compute values in the map, or is there any other way I can do this, so that the output looks like:
T1 2:00
T2 5:00
I don't know what your logic is for computing the average time, but you can save the data in a map per transaction: the transaction id is the key, and all of its times go in an ArrayList.
Map<String,List<String>> map = new HashMap<String,List<String>>();
You can do it like this:
Map<String, String> result = Files.lines(Paths.get("abc.txt"))
        .map(line -> line.split(","))
        .map(arr -> {
            try {
                return new AbstractMap.SimpleEntry<>(arr[0],
                        new SimpleDateFormat("HH:mm").parse(arr[2]));
            } catch (ParseException e) {
                return null;
            }
        }).collect(Collectors.groupingBy(Map.Entry::getKey,
                Collectors.collectingAndThen(Collectors
                                .mapping(Map.Entry::getValue, Collectors.toList()),
                        list -> toStringTime.apply(convert.apply(list)))));
To simplify, I've declared two functions:
Function<List<Date>, Long> convert = list -> (list.get(1).getTime() - list.get(0).getTime()) / 2;
Function<Long, String> toStringTime = l -> l / 60000 + ":" + l % 60000 / 1000;
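For reference, here is a more self-contained sketch of the same idea. It assumes each id has exactly one start line and one end line (in file order, on the same date) and the file name abc.txt from the question, and it prints the raw elapsed time per id as minutes:seconds (T1 4:00, T2 5:00 for the sample data); halve or average it as your definition requires:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.*;

public class AverageTimeSketch {
    public static void main(String[] args) throws IOException, ParseException {
        SimpleDateFormat fmt = new SimpleDateFormat("hh:mma"); // parses "11:16pm"
        // Collect the timestamps per id, in file order: [start, end].
        Map<String, List<Date>> byId = new LinkedHashMap<>();
        for (String line : Files.readAllLines(Paths.get("abc.txt"))) {
            String[] f = line.split(",");
            byId.computeIfAbsent(f[0], k -> new ArrayList<>()).add(fmt.parse(f[2]));
        }
        // Elapsed time per id, printed as minutes:seconds.
        for (Map.Entry<String, List<Date>> e : byId.entrySet()) {
            long ms = e.getValue().get(1).getTime() - e.getValue().get(0).getTime();
            System.out.printf("%s %d:%02d%n", e.getKey(), ms / 60000, ms % 60000 / 1000);
        }
    }
}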

How to loop to the next element in a HashMap

I have a set of strings like this:
A_2007-04, A_2007-09, A_Agent, A_Daily, A_Execute, A_Exec, B_Action, B_HealthCheck
I want output as:
Key = A, Value = [2007-04,2007-09,Agent,Execute,Exec]
Key = B, Value = [Action,HealthCheck]
I'm using a HashMap to do this, where:
pckg: {A, B}
count: total number of strings
reports: set of strings
The logic I used is a nested loop:
for (String l : reports[i]) {
    for (String r : pckg) {
        String[] g = l.split("_");
        if (g[0].equalsIgnoreCase(r)) {
            report.add(g[1]);
            dirFiles.put(g[0], report);
        } else {
            break;
        }
    }
}
I'm getting output as:
Key = A, Value = [2007-04,2007-09,Agent,Execute,Exec]
How do I get the second key? Can someone suggest logic for this?
Assuming that you use Java 8, this can be done with computeIfAbsent, which initializes the List of values whenever the key is new:
List<String> tokens = Arrays.asList(
        "A_2007-04", "A_2007-09", "A_Agent", "A_Daily", "A_Execute",
        "A_Exec", "P_Action", "P_HealthCheck"
);
Map<String, List<String>> map = new HashMap<>();
for (String token : tokens) {
    String[] g = token.split("_");
    map.computeIfAbsent(g[0], key -> new ArrayList<>()).add(g[1]);
}
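For reference, printing the map after the loop (a HashMap's iteration order is unspecified) gives something like:
System.out.println(map);
// e.g. {P=[Action, HealthCheck], A=[2007-04, 2007-09, Agent, Daily, Execute, Exec]}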
In terms of raw code this should do what I think you are trying to achieve:
// Create a collection of String any way you like, but for testing
// I've simply split a flat string into an array.
String flatString = "A_2007-04,A_2007-09,A_Agent,A_Daily,A_Execute,A_Exec,"
+ "P_Action,P_HealthCheck";
String[] reports = flatString.split(",");
Map<String, List<String>> mapFromReportKeyToValues = new HashMap<>();
for (String report : reports) {
int underscoreIndex = report.indexOf("_");
String key = report.substring(0, underscoreIndex);
String newValue = report.substring(underscoreIndex + 1);
List<String> existingValues = mapFromReportKeyToValues.get(key);
if (existingValues == null) {
// This key hasn't been seen before, so create a new list
// to contain values which belong under this key.
existingValues = new ArrayList<>();
mapFromReportKeyToValues.put(key, existingValues);
}
existingValues.add(newValue);
}
System.out.println("Generated map:\n" + mapFromReportKeyToValues);
Though I recommend tidying it up and organising it into a method or methods as fits your project code.
Another good approach, I think, is to use a Map<String, ArrayList<String>>:
String reports[] = {"A_2007-04", "A_2007-09", "A_Agent", "A_Daily",
        "A_Execute", "A_Exec", "P_Action", "P_HealthCheck"};
Map<String, ArrayList<String>> map = new HashMap<>();
for (String rep : reports) {
    String s[] = rep.split("_");
    String prefix = s[0], suffix = s[1];
    ArrayList<String> list = new ArrayList<>();
    if (map.containsKey(prefix)) {
        list = map.get(prefix);
    }
    list.add(suffix);
    map.put(prefix, list);
}
// Print
for (Map.Entry<String, ArrayList<String>> entry : map.entrySet()) {
    String key = entry.getKey();
    ArrayList<String> valueList = entry.getValue();
    System.out.println(key + " " + valueList);
}
for (String l : reports[i]) {
    String[] g = l.split("_");
    for (String r : pckg) {
        if (g[0].equalsIgnoreCase(r)) {
            report = dirFiles.get(g[0]);
            if (report == null) {
                report = new ArrayList<String>(); // create a new report list
            }
            report.add(g[1]);
            dirFiles.put(g[0], report);
        }
    }
}
I removed the else part of the if condition. The break there exits the inner loop, so you never get to evaluate the keys beyond the first one.
I added checking for existing values, as suggested by Orin2005.
I have also moved the statement String[] g = l.split("_"); outside the inner loop so that it doesn't get executed multiple times.

How to change a job parameter of a map-reduce job at run-time?

I have written a map job which takes a bunch of tweets and a list of keywords, and emits tweet counts for the keywords:
@Override
public void map(Object key, Text value, Context output) throws IOException,
        InterruptedException {
    JSONObject tweetObject = null;
    ArrayList<String> keywords = this.getKeyWords();
    try {
        tweetObject = (JSONObject) parser.parse(value.toString());
    } catch (ParseException e) {
        e.printStackTrace();
    }
    if (tweetObject != null) {
        String tweetText = (String) tweetObject.get("text");
        StringTokenizer st = new StringTokenizer(tweetText);
        ArrayList<String> tokens = new ArrayList<String>();
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        for (String keyword : keywords) {
            for (String token : tokens) {
                token = token.toLowerCase();
                if (token.equals(keyword) || token.contains(keyword)) {
                    output.write(new Text(keyword), one);
                    break;
                }
            }
        }
    }
    output.write(new Text("count"), one);
}
ArrayList<String> getKeyWords() {
    ArrayList<String> keywords = new ArrayList<String>();
    keywords.add("vodka");
    keywords.add("tequila");
    keywords.add("mojito");
    keywords.add("margarita");
    return keywords;
}
Right now my keyword list is static/hard-coded into the map-reduce jar file. How can I make it dynamic, i.e. be able to change the keywords at run-time?
What is the best way to do this?
Multiple ways off the top of my head: query a web service, or read a file.
In any case you probably don't want to do this for every record you map. It is fairly common to use a caching layer (e.g. Guava) to cache the external data source and invalidate it, for example, by time or on modification.
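As a minimal sketch of one such approach (not covered in the answer above): pass the keywords through the Hadoop Configuration when the job is submitted, and read them once per task in setup() instead of on every map() call. The property name tweet.keywords is made up for this example:
// Driver side:
Configuration conf = new Configuration();
conf.set("tweet.keywords", "vodka,tequila,mojito,margarita");
Job job = new Job(conf, "tweet-keyword-count");

// Mapper side: read the list once per task instead of per record.
private List<String> keywords;

@Override
protected void setup(Context context) {
    String csv = context.getConfiguration().get("tweet.keywords", "");
    keywords = Arrays.asList(csv.split(","));
}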

Java: print values inside a HashMap

I have a HashMap that holds data. I connect to the database via XML-RPC (using Jetty 9), and I call this function from my Java client with this code:
Object[] params = new Object[] { stString };
HashMap<String, Object[]> v1;
v1 = (HashMap<String, Object[]>) server.execute("DBRamService.getRmsValues", params);
I need to print it in my Java client. How can I do that?
This is the function that gets the data from the database:
HashMap<String, Object[]> result = new HashMap<String, Object[]>();
ArrayList<Double> vaArrL = new ArrayList<Double>();
try {
    // connect to the Postgres DB and fetch the data
    while (rs.next()) {
        vaArrL.add(rs.getDouble("va"));
    }
    int sz = vaArrL.size();
    result.put("va", vaArrL.toArray(new Object[sz]));
} catch (Exception e) {
    System.out.println(e);
    e.printStackTrace();
}
return result;
}
The following snippet loops through vaArrL and prints the values:
for (int i = 0; i < vaArrL.size(); i++) {
    System.out.println(vaArrL.get(i));
}
Looping through the HashMap using an Iterator:
Iterator<Entry<String, Object[]>> it = result.entrySet().iterator();
while (it.hasNext()) {
    Entry<String, Object[]> pairs = it.next();
    for (Object obj : pairs.getValue()) {
        System.out.println(obj);
    }
}
Here is how to iterate through a HashMap and get all the keys and values:
// example hash map
HashMap<String, Object[]> v1 = new HashMap<String, Object[]>();
v1.put("hello", new Object[] { "a", "b" });
// print keys and values
for (Map.Entry<String, Object[]> entry : v1.entrySet()) {
    System.out.println("Key: " + entry.getKey() + " Values: " + Arrays.asList(entry.getValue()));
}
If you need to print in a different format, you can iterate over the elements of the value array like this:
for (Map.Entry<String, Object[]> entry : v1.entrySet()) {
    System.out.println("Key:");
    System.out.println(entry.getKey());
    System.out.println("Values:");
    for (Object valueElement : entry.getValue()) {
        System.out.println(valueElement);
    }
}
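On Java 8 or later, a shorter sketch (not from the answers above) using Map.forEach does the same thing:
v1.forEach((key, values) ->
        System.out.println("Key: " + key + " Values: " + Arrays.toString(values)));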

Heap Space error - How to optimize this code

I am getting heap space errors with the code below. Does anybody have an idea how to optimize it?
This happens for large files [180 MB]. The method parameter has around 50 metatag key-values corresponding to each locale, and the error shows up after handling 4500 pages.
Note: I tried changing the foreach to an iterator so I could use iterator.remove() to free up space.
public static String myChildPropsToString(final UnicodeProperties myLayoutProps) {
    final StringBuilder sb = new StringBuilder(myLayoutProps.size());
    final String[] matchTarget = new String[] { StringPool.RETURN_NEW_LINE, StringPool.NEW_LINE, StringPool.RETURN };
    final String[] replaceTargetBy = new String[] { "_SAFE_NEWLINE_CHARACTER_", "_SAFE_NEWLINE_CHARACTER_",
            "_SAFE_NEWLINE_CHARACTER_" };
    // COMMENTED OUT TO TRY THE ITERATOR.REMOVE VARIANT
    //
    // for (final Map.Entry<String, String> entry : myLayoutProps.entrySet()) {
    //     final String value = entry.getValue();
    //
    //     if (Validator.isNotNull(value)) {
    //         StringUtil.replace(value, matchTarget, replaceTargetBy);
    //
    //         sb.append(entry.getKey());
    //         sb.append(StringPool.EQUAL);
    //         sb.append(value);
    //         sb.append(StringPool.NEW_LINE);
    //     }
    // }
    final Iterator<Entry<String, String>> propsIterator = myLayoutProps.entrySet().iterator();
    while (propsIterator.hasNext()) {
        final Entry<String, String> entry = propsIterator.next();
        if (Validator.isNotNull(entry.getValue())) {
            StringUtil.replace(entry.getValue(), matchTarget, replaceTargetBy);
            sb.append(entry.getKey());
            sb.append(StringPool.EQUAL);
            sb.append(entry.getValue());
            sb.append(StringPool.NEW_LINE);
        }
    }
    propsIterator.remove();
    return sb.toString();
}
From my code I am setting this on a parent properties object as follows:
UnicodeProperties myParentProps = new UnicodeProperties();
// set some values on the parent
UnicodeProperties myLayoutProps = new UnicodeProperties();
// set some values on the child
....
myParentProps.setProperty("childProp", myChildPropsToString(myLayoutProps));
Any help would be deeply appreciated!
Try putting the propsIterator.remove(); inside the while loop:
final Iterator<Entry<String, String>> propsIterator = myLayoutProps.entrySet().iterator();
while (propsIterator.hasNext()) {
    final Entry<String, String> entry = propsIterator.next();
    if (Validator.isNotNull(entry.getValue())) {
        StringUtil.replace(entry.getValue(), matchTarget, replaceTargetBy);
        sb.append(entry.getKey());
        sb.append(StringPool.EQUAL);
        sb.append(entry.getValue());
        sb.append(StringPool.NEW_LINE);
    }
    propsIterator.remove();
}
