I have a TestDTO class that holds the two input values from the user.
The next step is to fetch several values from the database; let's say I am fetching ten String values that are needed to execute the business logic.
I want to know the best way to hold this data (in terms of memory footprint and performance):
1. Add 10 more fields to the existing TestDTO class and set the database values at run time.
2. Use a java.util.Collection (List/Map/...).
3. Create another DTO/bean class for the 10 String values.
If you want modularity in your code, the 3rd option is better, but for simplicity you could use a HashMap, like:
Map<String, String> map = new HashMap<>();
map.put("string1",value);
.....
and so on.
This post may be useful for you: https://forums.oracle.com/thread/1153857
If TestDTO and the new values fetched are coming from the same table in the database, then they should be in the same class. Otherwise, the new values should ideally be in another DTO. I do not know the exact scenario that you have, but given these constraints the 2nd option goes out of the window, and options 1 and 3 will depend on your scenario. Preferably, always hold values from a single table in one object.
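If you do go with option 3, a minimal sketch could look like this (the class and field names are made up purely for illustration):

// Holder for the ten values fetched from the database (option 3).
public class LookupValuesDTO {
    private String value1;
    private String value2;
    // ... value3 through value10, each with its getter and setter ...

    public String getValue1() { return value1; }
    public void setValue1(String value1) { this.value1 = value1; }
}

// TestDTO keeps the two user inputs plus a reference to the fetched values
// (option 1 would instead add the ten fields directly here).
public class TestDTO {
    private String input1;
    private String input2;
    private LookupValuesDTO lookupValues; // populated after the database fetch

    // getters and setters omitted
}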
I am working on using the Hibernate SearchSession class in Java to perform a search against a database, the code I currently have to search a table looks something like this:
SearchSession searchSession = Search.session( entityManagerFactory.unwrap( SessionFactory.class )
        .withOptions()
        .tenantIdentifier( "locations" )
        .openSession() );
SearchResult<Location> result = searchSession.search( Location.class )
        .where( f -> f.bool()
                .must( f.match()
                        .field( "locationName" )
                        .matching( phrase ).fuzzy() )
        )
        .fetch( page * limit, limit );
This search works and properly returns results from the database, but there is no uniqueness constraint on the locationName column and the database holds multiple records with the same value in locationName. As a result, when we try to display them on the UI of the application it looks like there are duplicate values, even though they're unique in the database.
Is there a way to make a SearchSession only return a result if another result with an identical value (such as locationName) has not been returned before? Applying a uniqueness constraint to the database table isn't an option in this scenario, and we were hoping there's a way to handle filtering out duplicate values in the session over taking the results from the search and removing duplicate values separately.
Is there a way to make a SearchSession only return a result if another result with an identical value (such as locationName) has not been returned before?
Not really, at least not at the moment.
If you're using the Elasticsearch backend and are fine with going native, you can insert native JSON into the Elasticsearch request, in particular collapsing.
I think something like this might work:
SearchResult<Location> result = searchSession.search( Location.class )
        .extension( ElasticsearchExtension.get() )
        .where( f -> f.bool()
                .must( f.match()
                        .field( "locationName" )
                        .matching( phrase ).fuzzy() )
        )
        .requestTransformer( context -> {
            JsonObject collapse = new JsonObject();
            collapse.addProperty( "field", "locationName_keyword" );
            JsonObject body = context.body();
            body.add( "collapse", collapse );
        } )
        // You probably need a sort, as well:
        .sort( f -> f.field( "id" ) )
        .fetch( page * limit, limit );
You will need to add a locationName_keyword field to your Location entity:
@Indexed
@Entity
public class Location {
    // ...
    @Id
    @GenericField(sortable = Sortable.YES) // Add this
    private Long id;
    // ...
    @FullTextField
    @KeywordField(name = "locationName_keyword", sortable = Sortable.YES) // Add this
    private String locationName;
    // ...
}
(You may need to also assign a custom normalizer to the locationName_keyword field, if the duplicate locations have a slightly different locationName (different case, ...))
Note however that the "total hit count" in the Search result will indicate the number of hits before collapsing. So if there's only one matching locationName, but 5 Location instances with that name, the total hit count will be 5, but users will only see one hit. They'll be confused for sure.
That being said, it might be worth having another look at your situation to determine whether collapsing is really necessary here:
As a result, when we try to display them on the UI of the application it looks like there are duplicate values, even though they're unique in the database.
If you have multiple documents with the same locationName, then surely you have multiple rows in the database with the same locationName? Duplication doesn't appear spontaneously when indexing.
I would say the first thing to do would be to step back, and consider whether you really want to query the Location entity, or if another, related entity wouldn't make more sense. When two locations have the same name, do they have a relationship to another, common entity instance (e.g. of type Shop, ...)?
=> If so, you should probably query that entity type instead (.search(Shop.class)), and take advantage of @IndexedEmbedded to allow filtering based on Location properties (i.e. add @IndexedEmbedded to the location association in the Shop entity type, then use the field location.locationName when adding a predicate that should match the location name).
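For example, a rough sketch of that setup, assuming a hypothetical Shop entity that owns the location association (adapt the mapping to your actual model):

@Indexed
@Entity
public class Shop {
    @Id
    @GenericField(sortable = Sortable.YES)
    private Long id;

    // Index the associated locations' fields as part of each Shop document.
    @IndexedEmbedded
    @OneToMany(mappedBy = "shop") // assumes Location has a "shop" association
    private List<Location> location;

    // ...
}

// Query Shop instead of Location; each shop is returned once even if
// several of its locations share the same name.
SearchResult<Shop> result = searchSession.search( Shop.class )
        .where( f -> f.match()
                .field( "location.locationName" )
                .matching( phrase ).fuzzy() )
        .fetch( page * limit, limit );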
If there is no such related, common entity instance, then I would try to find out why locations are duplicated exactly, and more importantly why that duplication makes sense in the database, but not to users:
Are the users not interested in all the locations? Then maybe you should add another filter to your query (by "type", ...) that would help remove duplicates. If necessary, you could even run multiple search queries: first one with very strict filters, and if there are no hits, fall back to another one with less strict filters.
Are you using some kind of versioning or soft deletion? Then maybe you should avoid indexing soft-deleted entities or older versions; you can do that with conditional indexing or, if that doesn't work, with a filter in your search query.
If your data really is duplicated (legacy database, ...) without any way to pick a duplicate over another except by "just picking the first one", you could consider whether you need an aggregation instead of full-blown search. Are you just looking for the top location names, or maybe a count of locations by name? Then aggregations are the right tool.
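If aggregations do fit, a terms aggregation counting locations per name might look roughly like this (note that the locationName_keyword field would also need aggregable = Aggregable.YES for this to work):

// import org.hibernate.search.engine.search.aggregation.AggregationKey;
AggregationKey<Map<String, Long>> countsByName = AggregationKey.of( "countsByName" );

SearchResult<Location> result = searchSession.search( Location.class )
        .where( f -> f.matchAll() )
        .aggregation( countsByName, f -> f.terms()
                .field( "locationName_keyword", String.class ) )
        .fetch( 0 );

Map<String, Long> locationCountByName = result.aggregation( countsByName );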
Problem
I want to know whether it is possible to create a state machine that would contain all the methods, with the values of MethodById defined in the machine.
P.S. this is my first question ever on here. If I do it wrong I'm sorry but that is why.
Description (TL;DR)
I'm trying to cross-reference data about sales representatives. Each rep has territories specified by zip codes.
One dataset has the reps, their territories and their company.
Another data set has their names, phone number and email.
I made a Sales-rep class that takes from the first data-set and needs to be updated with the second data-set.
I also need the Sales-reps to be put in a look-up table (I used a hashmap for this) of <key: zip code, value: Sales-rep object>.
What I want is for each Sales-rep object to have an ID that is standard across all my datasets. I can't use the data I'm provided with because it comes from many different sources and it's impossible to standardize any data field.
Names, for example, are listed so many different ways it would be impossible to reconcile them and use that as an ID.
If I can get an ID like this (something like an SSN but less sensitive) then I want to try what my question is about.
I want to iterate through all the elements in my <key: zip code, value: Sales-rep object> hashmap, we will call it RepsByZipCode. When I iterate through each Salesrep object I want to get an ID that I can use in a different hashmap called MethodById <key: ID, value: a method run on the Object with this ID>.
I want it to run a different method for each key on the Object with the matching key (AKA the ID). The point is to run a different method on each different object in linear time so that by the end of the for loop, each object in RepsByZipCode will have some method run on it that can update information (thus completing the cross-referencing).
This also makes the code very extendable because I can change the method for each key if I want to update things differently. Ex:
//SalesRep object constructor:
SalesRep(String name, String email, ..., String id)

Map<String, SalesRep> RepsByZipCode = new HashMap<>(); // key: zip code
//code fills in the above with the first dataset

Map<String, ???> MethodById = new HashMap<>(); // key: ID, value: a method to run on the object with this ID
//code fills in the above with the second dataset

for (String zipKey : RepsByZipCode.keySet()) {
    SalesRep rep = RepsByZipCode.get(zipKey);
    String id = rep.getId();
    MethodById.get(id); // somehow apply this method to rep
    //each time this runs, one entry in RepsByZipCode is updated with one
    //method from MethodById.
}
//after this for loop, all of RepsByZipCode has been updated in linear time
You could put these methods into different classes that implement a common interface, and store an instance of each class in your map. If you're using at least Java 8 and your methods are simple enough, you could use lambdas to avoid some boilerplate.
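For instance, a minimal sketch using java.util.function.Consumer as the value type; the SalesRep setters and the sample values here are placeholders, not taken from your datasets:

// The "???" in MethodById becomes Consumer<SalesRep>.
Map<String, Consumer<SalesRep>> MethodById = new HashMap<>();

// Filled from the second dataset: one update action per rep ID.
MethodById.put("rep-42", rep -> {
    rep.setEmail("jane@example.com"); // hypothetical setters on SalesRep
    rep.setPhone("555-0100");
});

// Iterate the zip-code map and apply each rep's own update; one O(1) lookup per rep.
for (String zipKey : RepsByZipCode.keySet()) {
    SalesRep rep = RepsByZipCode.get(zipKey);
    Consumer<SalesRep> update = MethodById.get(rep.getId());
    if (update != null) {
        update.accept(rep);
    }
}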
I was asked this question in an interview today, which I answered to the best of my abilities. But I still don't understand if this is the correct answer.
There is a cache which has the Employee object as the key. The cache is populated with data from the database. Now there is a UI where we can enter any or all of the 3 attributes from the Employee object: name, ID and date of joining. This search could lead to multiple matching results, and to achieve this we need to check the cache for the data.
To this I replied saying that my map would be keyed by the Employee object, with a list of the matching employee records as the value. For the same EmployeeDetails object,
I will have multiple keys in the map (the EmployeeDetails class is the object which contains the complete detail of the employee, including address etc.; the Employee object just has 3 attributes: name, ID and date of joining).
One key would have only the name populated, another only the ID, and a third only the date of joining, plus keys for combinations of attributes. So the map will have the following keys -
Employee object with only the name populated -> the value would be a list of all the Employee objects with the same name.
Employee object with only the ID populated -> the value would be a list of all the Employee objects with the same ID. Ideally the list size in this case should be 1.
Employee object with only the date of joining populated -> a list of all the Employee objects with the same date of joining.
Similarly there would be a number of other Employee keys. For some of them, all three attributes - name, ID and date of joining - would be populated.
In this way, I could have achieved the requirement to display all the matching employee results when only some of the attributes out of name, ID and date of joining are set on the UI.
I just want to understand if this is the correct way to achieve the outcome (display of list of matching results on the UI). Since I did not get selected, I believe there is something else which I possibly missed!
A reasonable short answer is to maintain 3 separate maps for each of the 3 fields, with each one mapping from each field value to the list of employees with that value for the field.
To perform a lookup, retrieve the lists for each of the values that the user specified, and then (if you have more than one criterion) iterate through the shortest one to filter out employees that don't match the other criteria.
In the cases where you have more than one criterion, one of them has to be name or ID. In real life, the lists for these fields will be very short, so you won't have to iterate through any large collections.
This solution essentially uses the maps as indexes and implements the query like a relational DB. If you were to mention that in an interview, you would get extra points, but you'd need to be able to back it up.
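For illustration, a rough sketch of those three index maps; EmployeeDetails and its getters are assumed from the question, so adjust names and field types to your model:

// Needs java.util.* and java.time.LocalDate.
class EmployeeIndex {
    private final Map<String, List<EmployeeDetails>> byName = new HashMap<>();
    private final Map<String, List<EmployeeDetails>> byId = new HashMap<>();
    private final Map<LocalDate, List<EmployeeDetails>> byDateOfJoining = new HashMap<>();

    void add(EmployeeDetails e) {
        byName.computeIfAbsent(e.getName(), k -> new ArrayList<>()).add(e);
        byId.computeIfAbsent(e.getId(), k -> new ArrayList<>()).add(e);
        byDateOfJoining.computeIfAbsent(e.getDateOfJoining(), k -> new ArrayList<>()).add(e);
    }

    // Start from the list for one supplied criterion (ID first, since it is
    // the most selective), then filter by the remaining criteria.
    List<EmployeeDetails> search(String name, String id, LocalDate doj) {
        List<EmployeeDetails> candidates =
                id != null ? byId.getOrDefault(id, Collections.emptyList())
              : name != null ? byName.getOrDefault(name, Collections.emptyList())
              : byDateOfJoining.getOrDefault(doj, Collections.emptyList());
        List<EmployeeDetails> result = new ArrayList<>();
        for (EmployeeDetails e : candidates) {
            if ((name == null || name.equals(e.getName()))
                    && (id == null || id.equals(e.getId()))
                    && (doj == null || doj.equals(e.getDateOfJoining()))) {
                result.add(e);
            }
        }
        return result;
    }
}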
One of the neat things about Java 8 is the Streams API. With this API, you can hold all of those Employee objects in a plain List and get the same results you were trying to achieve with multiple mapping objects, with less overhead.
This API has a .filter() method that you can apply to a List turned into a Stream, to return only the objects that meet the criteria described in the body of the filter.
List<Employee> emps = getEmps();
List<Employee> matchedEmps = emps.stream()
        .filter(e -> e.getID().equals(searchedID))
        .filter(e -> e.getName().equals(searchedName))
        .collect(Collectors.toList());
As you can see you can chain filters to match multiple criteria, although it may be more efficient just to have all matching done in one filter:
List<Employee> matchedEmps = emps.stream()
        .filter(e -> {
            boolean matches = e.getID().equals(searchedID);
            return matches && e.getName().equals(searchedName);
        })
        .collect(Collectors.toList());
I would have a map with the Employee object as key and EmployeeDetails as value. I would get the Collection of values from the map, create a custom Comparator for each specific search, iterate through the values collection and use the comparator to compare the values. The search results should be added to a results Collection during the iteration.
One way is to create a mapping of Employee to EmployeeDetails; then, to search for a given employee ID, you have to iterate over all keys and search. The complexity will be O(N).
Second, to improve the time complexity: just as a database uses indexing to avoid a full scan, you can try a similar thing here, i.e. create index maps such as id-to-Employee and email-to-Employee, and whenever you add an employee to the main map, also update the index maps.
Third, if possible you can create a trie and put the employee at the end node. After getting the Employee, you can get the EmployeeDetails.
I'm tuning an app we run on App Engine and one of the largest costs is data store reads and writes. I have noticed one of the biggest offenders of the writes is when we persist an order.
The basic data model is that an Order has many Items - we store both separately and relate them like this:
@PersistenceCapable
public class Order implements Serializable {
    @Persistent(mappedBy = "order")
    @Element(dependent = "true")
    private List<Item> orderItems;
    // other fields too obviously
}

@PersistenceCapable
public class Item implements Serializable {
    @Persistent(dependent = "true")
    @JsonIgnore
    private Order order;
    // more fields...
}
Appstats is showing two datastore puts for an order with a single item - but both are using massive numbers of writes. I want to know the best way to optimize this from anyone who's got experience.
AppStats data:
real=34ms api=1695ms cost=6400 billed_ops=[DATASTORE_WRITE:64]
real=42ms api=995ms cost=3600 billed_ops=[DATASTORE_WRITE:36]
Some of the areas I know of that would probably help:
fewer indexes - there are implicit indexes on a number of Order and Item properties that I could tell App Engine not to index; for example, item.quantity is not something I need to query by. But is that what all these writes are for?
de-relate item and order, so that I just have a single entity OrderItem, removing the need for a relationship at all (but paying for it with extra storage).
In terms of explicit indexes, I only have 1 on the order table, by order date, and one on the order items, by SKU/date, plus the implicit one for the relationship.
If the items were a collection, not a list, would that remove the need for an index on the children _IDX entirely?
So, my question would be, are any of the above items going to herald big wins, or are there other options I've missed that would be better to focus on initially?
Bonus points: Is there a good 'guide to less datastore writes' article somewhere?
Billing docs clearly state:
New Entity Put (per entity, regardless of entity size): 2 writes + 2 writes per indexed property value + 1 write per composite index value
Existing Entity Put (per entity): 1 write + 4 writes per modified indexed property value + 2 writes per modified composite index value
Also relevant: App Engine predefines a simple index on each property of an entity.
On to questions:
Yes, the number of write ops is related to the number of indexed properties. Make them unindexed to save write ops (see the sketch after these answers).
Combining the two entities together would save you 1 write (or 2 in the case of new entities).
You don't need "explicit" indexes for one property only. These are generated automatically by App Engine. You only need to explicitly configure compound indexes, spanning more than one property.
No. Collection or List (= Collection with order) is just the Java representation; the Datastore API always uses a list internally (= items added retain their order).
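Regarding unindexed properties: if I remember correctly, the App Engine JDO plugin lets you switch off the automatic per-property index with a DataNucleus extension, roughly like this (double-check the key name against the current docs; quantity is just an example property):

// import javax.jdo.annotations.Extension;
@Persistent
@Extension(vendorName = "datanucleus", key = "gae.unindexed", value = "true")
private Integer quantity; // e.g. item.quantity, which is never queried by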
Update:
The number of indexes affects the cost of a write but not its speed. Writes are done in two phases: a commit phase where the entity data is saved, and an apply phase where the indexes are built. The put operation returns after the commit phase and is not affected by the number of indexes.
In your case you are calling two puts, one after another. As you can see from the Appstats graph, they happen consecutively. You might want to execute them in parallel as async operations (not sure if that's available in JDO).
I have to write a bulk operation version of something our webapp
lets you do on a more limited basis from the UI. The desired
operation is to assign objects to a category. A category can have
multiple objects but a given object can only be in one category.
The workflow for the task is:
1) Using the browser, a file of the following form is uploaded:
# ObjectID, CategoryID
Oid1, Cid1
Oid2, Cid1
Oid3, Cid2
Oid4, Cid2
[etc.]
The file will most likely have tens to hundreds of lines, but
definitely could have thousands of lines.
In an ideal world a given object id would only occur once in the file
(reflecting the fact that an object can only be assigned to one category)
But since the file is created outside of our control, there's no guarantee
that's actually true and the processing has to deal with that possibility.
2) The server will receive the file, parse it, pre-process it
and show a page something like:
723 objects to be assigned to 126 categories
142 objects not found
42 categories not found
Do you want to continue?
[Yes] [No]
3) If the user clicks the Yes button, the server will
actually do the work.
Since I don't want to parse the file in both steps (2) and (3), as
part of (2), I need to build a container that will live across
requests and hold a useful representation of the data that will let me
easily provide the data to populate the "preview" page and will let me
efficiently do the actual work. (While obviously we have sessions, we
normally keep very little in-memory session state.)
There is an existing
assignObjectsToCategory(Set<ObjectId> objectIds, CategoryId categoryId)
function that is used when assignment is done through the UI. It is
highly desirable for the bulk operation to also use this API since it
does a bunch of other business logic in addition to the simple
assignment and we need that same business logic to run when this bulk
assign is done.
Initially it was going to be OK that if the file "illegally" specified
multiple categories for a given object -- it would be OK to assign the
object arbitrarily to one of the categories the file associated it
with.
So I was initially thinking that in step (2) as I went through the
file I would build up and put into the cross-request container a
Map<CategoryId, Set<ObjectId>> (specifically a HashMap for quick
lookup and insertion) and then when it was time to do the work I could
just iterate on the map and for each CategoryId pull out the
associated Set<ObjectId> and pass them into assignObjectsToCategory().
However, the requirement on how to handle duplicate ObjectIds changed.
And they are now to be handled as follows:
If an ObjectId appears multiple times in the file and
all times is associated with the same CategoryId, assign
the object to that category.
If an ObjectId appears multiple times in the file and
is associated with different CategoryIds, consider that
an error and make mention of it on the "preview" page.
That seems to mess up my Map<CategoryId, Set<ObjectId>> strategy
since it doesn't provide a good way to detect that the ObjectId I
just read out of the file is already associated with a CategoryId.
So my question is how to most efficiently detect and track these
duplicate ObjectIds?
What came to mind is to use both "forward" and "reverse" maps:
public class CrossRequestContainer
{
    ...
    Map<CategoryId, Set<ObjectId>> objectsByCategory; // HashMap
    Map<ObjectId, List<CategoryId>> categoriesByObject; // HashMap
    Set<ObjectId> illegalDuplicates;
    ...
}
Then as each (ObjectId, CategoryId) pair was read in, it would
get put into both maps. Once the file was completely read in, I
could do:
for (Map.Entry<ObjectId, List<CategoryId>> entry : categoriesByObject.entrySet()) {
    List<CategoryId> categories = entry.getValue();
    if (categories.size() > 1) {
        ObjectId object = entry.getKey();
        if (!all_categories_are_equal(categories)) {
            illegalDuplicates.add(object);
            // Since this is an "illegal" duplicate I need to remove it
            // from every category that it appeared with in the file.
            for (CategoryId category : categories) {
                objectsByCategory.get(category).remove(object);
            }
        }
    }
}
When this loop finishes, objectsByCategory will no longer contain any "illegal"
duplicates, and illegalDuplicates will contain all the "illegal" duplicates to
be reported back as needed. I can then iterate over objectsByCategory, get the Set<ObjectId> for each category, and call assignObjectsToCategory() to do the assignments.
But while I think this will work, I'm worried about storing the data twice, especially
when the input file is huge. And I'm also worried that I'm missing something re: efficiency
and this will go very slowly.
Are there ways to do this that won't use double memory but can still run quickly?
Am I missing something that even with the double memory use will still run a lot
slower than I'm expecting?
Given the constraints you've described, I don't think there's a way to do this using a lot less memory.
One possible optimization, though, is to only maintain lists of categories for objects which are listed in multiple categories, and otherwise just map object to category, i.e.:
Map<CategoryId, Set<ObjectId>> objectsByCategory; // HashMap
Map<ObjectId, CategoryId> categoryByObject; // HashMap
Map<ObjectId, Set<CategoryId>> illegalDuplicates; // HashMap
Yes, this adds yet another container, but it will (hopefully) contain only a few entries; also, the memory requirements of the categoryByObject map are reduced (cutting out one list overhead per entry).
The logic is a little more complicated of course. When a duplicate is initially discovered, the object should be removed from the categoryByObject map and added into the illegalDuplicates map. Before adding any object into the categoryByObject map, you will need to first check the illegalDuplicates map.
Finally, it probably won't hurt performance to build the objectsByCategory map in a separate loop after building the other two maps, and it will simplify the code a bit.
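A rough sketch of that flow, reusing the three containers above (the method names and the surrounding structure are just illustrative):

// Process one (objectId, categoryId) pair read from the file.
void addPair(ObjectId object, CategoryId category) {
    if (illegalDuplicates.containsKey(object)) {
        // Already known to be conflicting; just record the extra category.
        illegalDuplicates.get(object).add(category);
        return;
    }
    CategoryId existing = categoryByObject.get(object);
    if (existing == null || existing.equals(category)) {
        categoryByObject.put(object, category); // first sighting, or a harmless repeat
    } else {
        // Conflict: move the object out of categoryByObject into illegalDuplicates.
        categoryByObject.remove(object);
        Set<CategoryId> conflicting = new HashSet<>();
        conflicting.add(existing);
        conflicting.add(category);
        illegalDuplicates.put(object, conflicting);
    }
}

// After the whole file has been read, build objectsByCategory in one extra pass.
void buildObjectsByCategory() {
    for (Map.Entry<ObjectId, CategoryId> entry : categoryByObject.entrySet()) {
        objectsByCategory
                .computeIfAbsent(entry.getValue(), k -> new HashSet<>())
                .add(entry.getKey());
    }
}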