Is putting different keys into a HashMap asynchronously dangerous? - java

(Don't judge the design, be merciful)
I have a Map<String, String> that I need to populate with sub-maps from asynchronous calls. I am using map.putAll(dataMap) to insert each sub-map into the main map.
However, the asynchronous part is making me kind of nervous. The reason is, I know that I won't attempt to insert the same key twice (it is a sure fact), but I don't know whether the fact that I insert data asynchronously will trigger concurrency problems.
Should I use a ConcurrentHashMap to be safe, or are there no risks with inserting into a classic HashMap asynchronously since I know I won't insert the same key twice? Or is there a third object that I don't know of that would fit the job perfectly?
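For illustration, here is a minimal sketch of the ConcurrentHashMap option: several tasks each build a sub-map with keys distinct from the other tasks' and merge it into a shared map with putAll. The class name, key format, and counts are made up for the example; with a plain HashMap the same concurrent writes could corrupt the table or silently drop entries.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConcurrentMerge {

    // Runs `tasks` workers, each merging its own sub-map of distinct keys
    // into a shared ConcurrentHashMap, and returns the merged map.
    static Map<String, String> mergeAll(int tasks, int keysPerTask) throws InterruptedException {
        Map<String, String> target = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(tasks);
        for (int t = 0; t < tasks; t++) {
            final int id = t;
            pool.submit(() -> {
                Map<String, String> subMap = new HashMap<>();
                for (int i = 0; i < keysPerTask; i++) {
                    subMap.put("task" + id + "-key" + i, "value" + i);
                }
                // Safe: ConcurrentHashMap tolerates concurrent writers
                target.putAll(subMap);
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return target;
    }

    public static void main(String[] args) throws InterruptedException {
        // With distinct keys per task, no entries are lost: 4 * 1000 = 4000
        System.out.println(mergeAll(4, 1000).size());
    }
}
```

Note that even with distinct keys, unsynchronized writes to a plain HashMap are a data race; the distinct-keys guarantee does not make it safe.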

Related

Can I use LinkedHashMap in Hazelcast?

Can I somehow use LinkedHashMap in Hazelcast (Java Spring)? I need to get unique records from the Hazelcast shared in-memory cache, but in the order in which I inserted them. I found in the Hazelcast documentation (https://docs.hazelcast.org/docs/latest-dev/manual/html-single/) that they offer distributed implementations of common data structures. But the map doesn't preserve element order, and the list and queue don't remove duplicate data. Do you know if I can use LinkedHashMap, or somehow get unique data while preserving its order?
Ordered or linked storage isn't compatible with the goals of a data grid - highly concurrent and distributed storage.
Ordered retrieval is possible. Hazelcast's Paging Predicate with a comparator would do it. Or, if the volume is not too high, you could retrieve the entry set and sort it yourself.
The catch is, you have to provide the field to order upon.
If your data already has some sort of sequence number or timestamp that is always unique, this is easy.
If not, perhaps something like an AtomicLong would do it. A getAndIncrement() would give you a unique number to use for each insert.
Watch out though: this has a race condition if two or more threads insert concurrently. To solve it you'd need some sort of singleton @Service running somewhere to perform the "get next seqno; insert" step.
Also, if you restart the grid, the sequence number in the atomic counter will need to be repositioned to the right place.
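The sequence-number idea above can be sketched with a plain java.util.concurrent AtomicLong and a sorted concurrent map (in Hazelcast you would use its distributed counterpart for the counter). The class and method names here are illustrative, not Hazelcast API:

```java
import java.util.Collection;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;

public class SequencedStore {
    private final AtomicLong seq = new AtomicLong();

    // Keyed by insertion sequence number; iteration over values()
    // therefore follows insertion order.
    private final ConcurrentSkipListMap<Long, String> byOrder = new ConcurrentSkipListMap<>();

    public long insert(String record) {
        long n = seq.getAndIncrement(); // unique number per insert
        byOrder.put(n, record);
        return n;
    }

    // Records in the order their sequence numbers were assigned
    public Collection<String> inOrder() {
        return byOrder.values();
    }
}
```

As the answer notes, assigning the number and performing the insert are two separate steps, so under concurrency the sequence order may not exactly match real insertion time unless the pair is serialized somewhere.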

Pass a map (or concurrent hashmap) in a DoFn(apache crunch)

Since there's a limit on Hadoop counter size (and we don't want to increase it for just one job), I am creating a map (Map) in which I increment a key's value if certain conditions are met (same idea as counters). There is already a DoFn (returning a custom-made object) which is processing the data, so I am interested in passing a map into it and grouping it outside based on keys.
I think a ConcurrentHashMap might work, but I have been unable to implement it.
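As a hedged sketch of the counter-style map (independent of Crunch itself; the class and key names are made up), ConcurrentHashMap.merge gives an atomic increment-or-insert without explicit locking:

```java
import java.util.concurrent.ConcurrentHashMap;

public class ConditionCounters {
    private final ConcurrentHashMap<String, Long> counters = new ConcurrentHashMap<>();

    // Thread-safe increment: merge atomically adds 1 to the current
    // value, or inserts 1 if the key is absent.
    public void increment(String key) {
        counters.merge(key, 1L, Long::sum);
    }

    public long count(String key) {
        return counters.getOrDefault(key, 0L);
    }
}
```

In a real MapReduce/Crunch job, a shared in-memory map only counts within one worker; the per-worker maps would still have to be combined (grouped by key) afterwards, as the question suggests.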

How to avoid duplicate insert in DB through Java?

I need to insert employees into an Employee table. What I want is to avoid duplicate inserts, i.e. if two threads try to insert the same employee at the same time, then the last transaction should fail. For example, if first_name and hire_date are the same for two employees (the same employee coming from two threads), then fail the last transaction.
Approach 1: The first approach I can think of is to put a constraint at the column level (like a combined unique constraint on first_name and hire_date), or to check in the query whether the employee exists and throw an error (I believe this is possible through PL/SQL).
Approach 2: Can it be done at the Java level too, e.g. create a method which first checks whether the employee exists and then throws an error? In that case I need to make the method synchronized (or use a synchronized block), but that will impact performance: it will unnecessarily hold up other transactions as well. Is there a way I can use a lock (ReentrantLock) or a synchronized block keyed on name/hire date, so that only those specific transactions are put on hold which have the same name and hire date?
public void save(Employee emp){
    //hibernate api to save
}
I believe Approach 1 should be preferred as it's simpler and easier to implement. Right? Even if so, I would like to know whether it can be handled efficiently at the Java level.
What i want is is to avoid duplicate inserts
and
but it will impact performance it will unnecassrily hold other transactions also
So, you want highly concurrent inserts that guarantee no duplicates.
Whether you do this in Java or in the database, the only way to avoid duplicate inserts is to serialize (or, Java-speak, synchronize). That is, have one transaction wait for another.
The Oracle database will do this automatically for you if you create a PRIMARY KEY or UNIQUE constraint on your key values. Simultaneous inserts that are not duplicates will not interfere or wait for one another. However, if two sessions simultaneously attempt duplicate inserts, the second will wait until the first completes. If the first session completed via COMMIT, then the second transaction will fail with a duplicate key on index violation. If the first session completed via ROLLBACK, the second transaction will complete successfully.
You can do something similar in Java as well, but the problem is you need a locking mechanism that is accessible to all sessions. synchronize and similar alternatives work only if all sessions are running in the same JVM.
Also, in Java, a key to maximizing concurrency and minimizing waits would be to wait only for actual duplicates. You can achieve something close to that by hashing the incoming key values and then synchronizing only on that hash. That is, for example, put 65,536 objects into a list. Then, when an insert wants to happen, hash the incoming key values to a number between 1 and 65,536, get that object from the list, and synchronize on it. Of course, you can also synchronize on the actual key values, but a hash is usually as good and can be easier to work with, especially if the incoming key values are unwieldy or sensitive.
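A minimal sketch of that lock-striping idea (the class name and the 65,536 stripe count are just illustrative; this only works within a single JVM, as noted above):

```java
public class StripedLocks {
    private static final int STRIPES = 65536;
    private final Object[] locks = new Object[STRIPES];

    public StripedLocks() {
        for (int i = 0; i < STRIPES; i++) {
            locks[i] = new Object();
        }
    }

    // Maps the employee's natural key to one of the stripe locks.
    // Equal keys always get the same lock; different keys rarely collide.
    public Object lockFor(String firstName, String hireDate) {
        int h = (firstName + "|" + hireDate).hashCode();
        return locks[Math.floorMod(h, STRIPES)];
    }
}
```

Usage would be along the lines of: synchronized (stripes.lockFor(name, hireDate)) { if (!exists(name, hireDate)) save(emp); } — so only inserts that hash to the same stripe ever wait for each other.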
That all said, this should absolutely all be done in the database using a simple PRIMARY KEY constraint on your table and appropriate error handling.
One of the main reasons of using databases is that they give you consistency.
You are volunteering to put some of that responsibility back into your application. That very much sounds like the wrong approach. Instead, you should study exactly which capabilities your database offers; and try to make "as much use of them as possible".
In that sense you try to fix a problem on the wrong level.
Pseudo Code :
void save(Employee emp){
    if(!isEmployeeExist(emp)){
        //Hibernate api to save
    }
}

boolean isEmployeeExist(Employee emp){
    // build and run a query finding the employee
    return true; // if the employee exists, else return false
}
Good question. I would strongly suggest using MERGE (INSERT and UPDATE in a single DML statement) in this case. Let Oracle handle transactions and locks. It's the best option in your case.
You should create a primary key or unique constraint (approach 1) regardless of any other solution, to preserve data integrity.
-- Sample statement
MERGE INTO employees e
USING (SELECT * FROM hr_records) h
ON (e.id = h.emp_id)
WHEN MATCHED THEN
UPDATE SET e.address = h.address
WHEN NOT MATCHED THEN
INSERT (id, address)
VALUES (h.emp_id, h.address);
Since the row is not inserted yet, isolation levels such as READ_COMMITTED/REPEATABLE_READ will not apply to it.
The best option is to apply a DB constraint (unique). If that is not possible, then in a multi-node setup you can't achieve this through Java locks either, since a request can go to any node.
So, in that case, we need distributed-lock-like functionality.
We can create a lock table where we define that, for each table, only one insertion (or one batch of insertions) is possible from a node at a time.
Ex:
Table_Name, Lock_Acquired
emp, 'N'
Now a piece of code can read this row and try to update Lock_Acquired to 'Y',
so any other code in another thread or on another node won't be able to proceed further, and the lock will be granted only once the previous lock has been released.
This will give you a highly concurrent system which avoids duplication; however, it will suffer from scalability issues. So decide accordingly what you want to achieve.

Get the data from databases for each primaryKey in parallel

I have a list of primary keys, e.g. empids, and I want to get the employee information for each empid from the database. Or rather, I want to get the data from different databases based on different types of empids, using multiple threads.
Currently I'm fetching the first employee's information and saving it into a Java bean, then fetching the second employee and saving it into a bean, and so on. Finally I add all these beans to an ArrayList. But now I want to get the data from the databases in parallel, meaning I want to fetch the employee information for each employee at the same time and save it into a bean.
Basically, I'm looking for parallel processing rather than sequential, to improve performance.
I don't think you're looking for parallelism in this case. You really are looking for a single query that will return all employees whose id is in the collection of Ids that you have. One database connection, one thread, one query, and a result set.
If you are using hibernate, this is super easy with Hibernate Criteria where you can use a Restrictions.IN on the employeeId and pass it the collection of ids. The query underneath will be something like select a, b, c, ..., n from Employee where employee_id in (1,2,3,4...,m)
If you are using straight JDBC, you can achieve the same in your native query, you will need to change the ResultSet parsing because you will now expect a collection back.
You can create a Callable task for fetching the employee information and return the ArrayList from that Callable (thread).
You can then submit the tasks using an Executor and keep the handles of the Futures to loop over the results.
// pseudo code
Future<ArrayList<Employee>> fut = executor.submit(new EmployeeInfoTask(empIds));
// EmployeeInfoTask is a Callable
ArrayList<Employee> result = fut.get(); // blocks until the task completes
for (Employee e : result) {
    // print e;
}
See Executor, Callable
EDIT - for Java 1.4
In this case you can still make the database calls in different threads, but you will need to make each thread write to a shared Employee collection. Don't forget to synchronize access to this collection.
You will also need to join() all the threads which you have spawned, so that you know when all the threads are done.
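Putting the Executor/Callable suggestion together, here is a self-contained sketch. The fetchEmployee method is a placeholder standing in for the real DAO/database call, and the names are made up for the example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelFetch {

    // Placeholder for the real per-id database lookup
    static String fetchEmployee(int empId) {
        return "employee-" + empId;
    }

    // Submits one Callable per id, then collects results in submit order
    public static List<String> fetchAll(List<Integer> empIds) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int id : empIds) {
                futures.add(pool.submit(() -> fetchEmployee(id)));
            }
            List<String> result = new ArrayList<>();
            for (Future<String> f : futures) {
                result.add(f.get()); // blocks until that task finishes
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the futures are consumed in the order they were submitted, the result list keeps the same order as the input id list even though the lookups run concurrently.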

Can Hibernate return a collection of result objects OTHER than a List?

Does the Hibernate API support object result sets in the form of a collection other than a List?
For example, I have process that runs hundreds of thousands of iterations in order to create some data for a client. This process uses records from a Value table (for example) in order to create this output for each iteration.
With a List I would have to iterate through the entire list in order to find a certain value, which is expensive. I'd like to be able to return a TreeMap and specify a key programmatically so I can search the collection for the specific value I need. Can Hibernate do this for me?
I assume you are referring to the Query.list() method. If so: no, there is no way to return top-level results other than a List. If you are receiving too many results, why not issue a more constrained query to the database? If the query is difficult to constrain, you can populate your own Map with the contents of Hibernate's List and then throw away the list.
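The "populate your own Map from Hibernate's List" step is cheap relative to the query itself. A hedged sketch, where Value and getCode() are hypothetical names standing in for your entity and its lookup field:

```java
import java.util.List;
import java.util.TreeMap;

public class ValueIndex {

    // Indexes a query's List result by a field of your choosing;
    // subsequent lookups are O(log n) instead of a linear scan.
    public static TreeMap<String, Value> index(List<Value> results) {
        TreeMap<String, Value> byCode = new TreeMap<>();
        for (Value v : results) {
            byCode.put(v.getCode(), v);
        }
        return byCode;
    }
}

// Stand-in for the mapped entity class
class Value {
    private final String code;

    Value(String code) {
        this.code = code;
    }

    String getCode() {
        return code;
    }
}
```

A HashMap would give O(1) lookups if sorted iteration isn't needed; TreeMap is shown here because the question asked for one.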
If I understand correctly, you load a bunch of data from the database to memory and then use them locally by looking for certain objects in that list.
If this is the case, I see 2 options.
Don't load all the data; instead, for each iteration access the database with a query returning only the specific record that you need. This will make more database queries, so it will probably be slower, but with much less memory consumption. This solution could easily be improved by adding a cache, so that the most frequently used values are retrieved quickly. It will of course need some performance measurement, but I usually favor a naive solution with good caching, as the cache can be implemented as a cross-cutting concern and be very transparent to the programmer.
If you really want to load all your data in memory (which is actually a form of caching), the time to transform your data from a list to a TreeMap (or any other efficient structure) will probably be small compared to the full processing. So you could do the data transformation yourself.
As I said, in the general case, I would favor a solution with caching ...
From Java Persistence with Hibernate:
A java.util.Map can be mapped with <map>, preserving key and value pairs. Use a java.util.HashMap to initialize a property.
A java.util.SortedMap can be mapped with the <map> element, and the sort attribute can be set to either a comparator or natural ordering for in-memory sorting. Initialize the collection with a java.util.TreeMap instance.
Yes, that can be done.
However, you'll probably have to have your domain class implement Comparable; I don't think you can do it using a Comparator.
Edit:
It seems like I misunderstood the question. If you're talking about the result of an ad hoc query, then the above will not help you. It might be possible to make it work by binding an object with a TreeMap property to a database view if the query is fixed.
And of course you can always build the map yourself with very little work and processing overhead.