Performant structure without data duplication - java

Say I have the following classes:
public class Tagged {
private List<String> tags;
}
public class ContainerOfTagged {
private List<Tagged> tagged;
}
With this structure, whenever I need to find a Tagged with a specific tag, I need to iterate over all the Tagged objects in ContainerOfTagged, and over all the tags of each Tagged. That could affect performance, depending on the size of the lists.
A simple solution would be changing the ContainerOfTagged class to use a Map, mapping tags in lists of Tagged:
public class ContainerOfTagged {
private Map<String, List<Tagged>> tagMapping;
}
Now all I need to do is provide a tag, and the Map will return all Tagged with said tag. However, by doing this I'm causing data duplication, since the same tags exist in both the Tagged and ContainerOfTagged classes.
So, is there a way to solve this problem with a performant solution that doesn't duplicate data?

You can't really avoid "duplicating" the tags, but remember that you are not really duplicating them as the Lists and Maps only store references to the tag string, not the values (however, the references are likely to take up quite a lot of space in themselves).
The problem is that you need two indexes:
You need to find the list of tags, given the Tagged object.
You need to find the Tagged object, given a tag.
Ideally, your solution would look like this. You can address your concerns about things getting out of sync by having a single method to manage tagging.
Note that in Tagged you should use a Set instead of a list to avoid duplication of tags.
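import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
public class Tagged {
    // A Set rules out duplicate tags on one object.
    final Set<String> tags = new HashSet<>();
}
public class TagContainer {
    // One tag can belong to many Tagged objects, so index a Set per tag.
    private final Map<String, Set<Tagged>> tagIndex = new HashMap<>();
    // The single entry point for tagging keeps both structures in sync.
    public void tag(String tag, Tagged tagged) {
        tagged.tags.add(tag);
        tagIndex.computeIfAbsent(tag, k -> new HashSet<>()).add(tagged);
    }
}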
If memory utilisation is a major concern you could try some kind of reference compression. Using this technique, you could store your tags in an array and then refer to them by index. If you had few enough tags, you could use a byte or short instead of a reference, but the code would be a lot messier and I would not recommend it.
EDIT:
In my first post, I proposed that Tagged should be an interface called Tagable. This is cleaner, but lengthens the solution, so I reverted to a class. However, you could perhaps consider having a Tagable interface and implementing it in the Tagged class.
public interface Tagable {
    Set<String> getTags();
    void tag(String tag);
}
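As a rough illustration of the "single method to manage tagging" idea, here is a minimal sketch that also covers lookup and removal. The names TaggedItem, TagIndex, untag and find are hypothetical and not part of the answer; the sketch assumes objects are compared by identity.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical names throughout; only the idea of a single tag() method comes from the answer.
class TaggedItem {
    private final Set<String> tags = new HashSet<>();

    // Read-only view, so callers cannot bypass the index.
    Set<String> getTags() {
        return Collections.unmodifiableSet(tags);
    }

    // Package-private: intended to be called by TagIndex only.
    boolean addTag(String tag) { return tags.add(tag); }
    boolean removeTag(String tag) { return tags.remove(tag); }
}

class TagIndex {
    private final Map<String, Set<TaggedItem>> byTag = new HashMap<>();

    // Updates the object's own tag set and the reverse index together.
    void tag(String tag, TaggedItem item) {
        item.addTag(tag);
        byTag.computeIfAbsent(tag, k -> new HashSet<>()).add(item);
    }

    void untag(String tag, TaggedItem item) {
        item.removeTag(tag);
        Set<TaggedItem> items = byTag.get(tag);
        if (items != null) {
            items.remove(item);
            if (items.isEmpty()) {
                byTag.remove(tag); // drop empty buckets
            }
        }
    }

    // Single hash lookup instead of scanning every item and every tag.
    Set<TaggedItem> find(String tag) {
        return Collections.unmodifiableSet(byTag.getOrDefault(tag, Collections.emptySet()));
    }
}
```

Both structures hold references to the same String and TaggedItem instances, so the extra cost is the references and the map buckets rather than copies of the tag text.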

Related

Fluently adding to a Collection (add and return the value)

Time and again, I find myself in the situation where I want to use a value, and add it to a collection at the same time, e.g.:
List<String> names = new ArrayList<>();
person1.setName(addTo(names, "Peter"));
person2.setName(addTo(names, "Karen"));
(Note: using java.util.Collection.add(E) doesn't work of course, because it returns a boolean.)
Sure, it's easy to write a utility method myself like:
public static <E> E addTo(Collection<? super E> coll, E elem) {
coll.add(elem);
return elem;
}
But is there really not something like this already in JavaSE, Commons Collections, Guava, or maybe some other "standard" library?
The following will work if you use Eclipse Collections:
MutableList<String> names = Lists.mutable.empty();
person1.setName(names.with("Peter").getLast());
person2.setName(names.with("Karen").getLast());
The with method returns the collection being added to so you can easily chain adds if you want to. By using getLast after calling with on a MutableList (which extends java.util.List) you get the element you just added.
Note: I am a committer for Eclipse Collections.
This looks like a very strange pattern to me. A line like person1.setName(addTo(names, "Peter")) seems inverted and is very difficult to properly parse:
An existing person object is assigned a name, that name will first be added to a list of names, and the name is "Peter".
Contrast that with (for example) person1.setName("Peter"); names.add(person1.getName());:
Make "Peter" the name of an existing person object, then add that name to a list of names.
I appreciate that it's two statements instead of one, but that's a very low cost relative to the unusual semantics you're proposing. The latter formatting is easier to understand, easier to refactor, and more idiomatic.
I would be willing to wager that many scenarios that might benefit from your addTo() method have other problems and would be better-served by a different refactoring earlier on.
At its core the issue seems to be that you're trying to represent a complex data type (Person) while simultaneously constructing an unrelated list consisting of a particular facet of those objects. A potentially more straightforward (and still fluent) option would be to construct a list of Person objects and then transform that list to extract the values you need. Consider:
List<Person> people = ImmutableList.of(new Person("Peter"), new Person("Karen"));
List<String> names = people.stream().map(Person::getName).collect(toList());
Notice that we no longer need the isolated person1 and person2 variables, and there's now a more direct relationship between people and names. Depending on what you need names for you might be able to avoid constructing the second list at all, e.g. with List.forEach().
If you're not on Java 8 yet you can still use a functional syntax with Guava's functional utilities. The caveat on that page is a worthwhile read too, even in Java-8-land.
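For reference, the "build the list of people first, then derive the names" approach from this answer can be sketched with the JDK alone. The Person class and the NamesDemo helper below are hypothetical stand-ins, assuming Person exposes a getName() accessor.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical minimal Person; a real class presumably has more fields.
class Person {
    private final String name;

    Person(String name) { this.name = name; }

    String getName() { return name; }
}

class NamesDemo {
    // Derives the names from the people instead of collecting them as a side effect.
    static List<String> namesOf(List<Person> people) {
        return people.stream().map(Person::getName).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(new Person("Peter"), new Person("Karen"));
        System.out.println(namesOf(people));
        // If each name is only consumed once, no second list is needed at all.
        people.forEach(p -> System.out.println(p.getName()));
    }
}
```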

Is it a bad practice to add elements to List using getter method in java?

Suppose I have a private ArrayList or LinkedList inside a class, and I will never assign a new reference to it; in other words, this will never happen:
myLinkedList = anotherLinkedList;
So that I won't need to use setMyLinkedList(anotherLinkedList).
But! I need to add elements to it, or remove elements from it.
Should I write a new kind of setter that only does the task of adding instead of setting, like myLinkedList.add(someElement)?
Or is it OK to do this through the getter, without violating the encapsulation principle?
getMyLinkedList().add(someElement)
( + Suppose I am going to lose my mark if I disobey encapsulation :-")
I don't think it is a particularly great practice to do something like:
myObj.getMyList().add(x);
since you are exposing a private class variable in a non-read-only way, but that being said I do see it pretty frequently (I'm looking at you, auto-generated classes). I would argue that instead of doing it that way, you should return an unmodifiable list and allow users of the class to add to the list via an explicit method:
public class MyClass{
private final List<String> myList = new ArrayList<String>();
public List<String> getList(){
return Collections.unmodifiableList(this.myList);
}
public void addToList(final String s){
this.myList.add(s);
}
}
EDIT After reviewing your comments, I wanted to add a bit about your setter idea:
I meant using that line of code inside a new kind of setter inside the class itself, like public void setter(someElement){this.myLinkedList.add(someElement);}
If I'm understanding you correctly, you are saying you want to expose a method that only adds to your list. Overall this is what I think you should be shooting for, and what many have outlined in the answers, however, labeling it as a setter is a bit misleading since you are not reassigning (setting) anything. That, and I strongly recommend returning a read only list from your getter method if possible.
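To make the effect of the read-only getter concrete, here is a small sketch (the Playlist and PlaylistDemo names are hypothetical). It relies on the documented behaviour of Collections.unmodifiableList: the returned list is a view that throws UnsupportedOperationException on modification but still reflects later changes made through the owning class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class following the getter-plus-add-method pattern described above.
class Playlist {
    private final List<String> songs = new ArrayList<>();

    // Read-only view of the internal list.
    List<String> getSongs() { return Collections.unmodifiableList(songs); }

    // The only way for callers to add an element.
    void addSong(String song) { songs.add(song); }
}

class PlaylistDemo {
    // Returns true if an attempt to write through the getter was rejected.
    static boolean getterRejectsWrites(Playlist playlist) {
        try {
            playlist.getSongs().add("sneaky");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }
}
```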
I would suggest that in this case it would be best to follow your encapsulation principles and use a method for adding elements to the list. You have restricted access to your list by making it private so that other classes cannot directly access the datatype.
Let the class that stores your ArrayList have direct access to the list, but when other classes want to add to the list, use an add() method.
In general, you should not assume that the list being returned by the getter is the original one. It could be decorated or proxied for example.
If you want to prevent that a new list is set on the target object, you could define an add method on the target class instead.
As soon as you have a Collection of any kind, it is generally not a bad idea to add methods like add(), remove() to the interface of your class if it makes sense that clients can add or remove objects from your private list.
The reason it is useful to implement these extra methods (it might seem like overkill, because after all they mostly just delegate to the Collection) is that you protect your list against evil clients doing things to it that you don't want. The interface of most Collections contains more than just add() and remove(), and mostly you don't want clients messing around with things you can't control. That is why the encapsulation principle is so important to your teacher.
Another plus: if at any time you decide that a certain condition must be met when an object is added to your list, this can easily be implemented in the method you already have. If you give a client direct access to the reference of your list, it is not at all easy to implement this kind of thing (which is not rare).
Hope this helps
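As an illustration of the point about enforcing a condition on insertion, here is a minimal sketch. The Roster class and its rule (no nulls, no blank names, no duplicates) are invented for the example; the point is only that the rule lives in one place.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class: callers can only add and remove through these methods.
class Roster {
    private final List<String> members = new ArrayList<>();

    // Example rule, assumed for illustration: reject nulls, blanks and duplicates.
    boolean add(String name) {
        if (name == null || name.trim().isEmpty() || members.contains(name)) {
            return false;
        }
        return members.add(name);
    }

    boolean remove(String name) { return members.remove(name); }

    // Read-only view for callers that need to inspect the contents.
    List<String> members() { return Collections.unmodifiableList(members); }
}
```

Had the list been handed out directly, the same rule would have to be repeated (and trusted) at every call site.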
So you have a class containing a List field (it should be final, since you don't intend to assign to it), and you want to allow callers to add to the List, but not be able to replace it.
You could either provide a getter for the list:
public List<E> getMyList() {
return myList;
}
Or provide a method to add to that list:
public void addToMyList(E e) {
myList.add(e);
}
Both are valid design decisions, but which you use will depend on your use case. The first option gives callers direct access to the List, effectively making it public. This is useful when users will be modifying and working with the list repeatedly, but can be problematic as you can no longer trust the List is in any sort of reliable state (the caller could empty it, or reorder it, or even maliciously insert objects of a different type). So the first option should only be used when you intend to trust the caller.
The second option gives the caller less power, because they can only add one element at a time. If you want to provide additional features (insertion, add-all, etc.) you'll have to wrap each operation in turn. But it gives you more confidence, since you can be certain the List is only being modified in ways you approve of. This latter option also hides (encapsulates) the implementation detail that you're using a List at all, so if encapsulation is important for your use case, you want to go this way to avoid exposing your internal data structures, and only expose the behavior you want to grant to callers.
It depends on the application - both are acceptable. Take a good look at the class you're writing and decide if you want to allow users to directly access the contents of the list, or if you would prefer that they go through some intermediate process first.
For example, say you have a class ListEncrypter which holds your MyLinkedList list. The purpose of this class is to encrypt anything that is stored in MyLinkedList. In this case, you'd want to provide a custom add method in order to process the added item before placing it in the list, and if you want to access the element, you'd also process it:
public void add(Object element)
{
    MyLinkedList.add(encrypt(element));
}
public Object get(int index)
{
    return decrypt(MyLinkedList.get(index));
}
In this case, you clearly want to deny the user's access to the MyLinkedList variable, since the contents will be encrypted and they won't be able to do anything with it.
On the other hand, if you're not really doing any processing of the data (and you're sure you won't ever need to in the future), you can skip creating the specialized methods and just allow the user to directly access the list via the get method.
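A self-contained sketch of the "process on the way in, process on the way out" wrapper follows. The EncodingList name is hypothetical, and Base64 is used purely as a stand-in for the encrypt()/decrypt() pair: it is an encoding, not encryption.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

// Hypothetical wrapper: the internal list is never exposed, only add/get/size.
class EncodingList {
    private final List<String> stored = new ArrayList<>();

    void add(String element) { stored.add(encode(element)); }

    String get(int index) { return decode(stored.get(index)); }

    int size() { return stored.size(); }

    // Base64 is NOT encryption; it only stands in for a real encrypt() here.
    private static String encode(String plain) {
        return Base64.getEncoder().encodeToString(plain.getBytes(StandardCharsets.UTF_8));
    }

    // Stand-in for decrypt().
    private static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }
}
```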

Dynamic SQL-Application

I want to create a dynamic SQL Java application. Normally I create a Java POJO with hard-coded columns. For example:
public class DbEntry{
private int id;
private String name;
// public setters and getters
}
Now the problem is that the user can change the database columns as needed, for example by adding new columns. But once the columns change, the hard-coded POJO can no longer represent the whole DB entry. I have read about dynamic bytecode generation, but I would rather not use it if there is another, better solution.
Consider this class:
public class DbEntry{
List<Integer> integerList;
List<String> strList;
public Integer getInt(int index){
return integerList.get(index);
}
public String getStr(int index){
return strList.get(index);
}
//todo: add some constructors/factory methods
}
For fixed columns, you can write some global constants like static int I_ID=0 and static int I_NAME=0. Then you can get the id and name of a DbEntry by calling dbEntry.getInt(I_ID) and dbEntry.getStr(I_NAME).
For changeable columns you can use a List<String>, add new column names to the list and then you can call dbEntry.getStr(collst.indexOf("name"))
Or you can write a class using strings as keys, so you can call dbEntry.getStr("name"), e.g.:
public class DbEntry{
Map<String,Integer> integerMap;
Map<String,String> strMap;
public Integer getInt(String key){
return integerMap.get(key);
}
public String getStr(String key){
return strMap.get(key);
}
//todo: add some constructors/factory methods
}
This class looks more straightforward, but it wastes some memory: every dbEntry in the same table has the same set of column names, so a single list is enough for storing the column names of a table, and a HashMap uses more memory than an ArrayList. Despite this disadvantage, which data structure to use still depends on your requirements.
Or you may want to make it an interface with getInt, getStr, getDate, getBlob, so you can have the flexibility by implementing the interface using different data structures.
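One way to combine the two ideas above (string keys for callers, a single shared list of column names per table) is sketched below. The TableSchema and Row names are hypothetical, and unlike the answer's classes this variant keeps all values in one list and casts on read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical: one TableSchema instance is shared by all rows of a table.
class TableSchema {
    private final List<String> columns = new ArrayList<>();

    TableSchema(String... names) { columns.addAll(Arrays.asList(names)); }

    // Called at runtime when the user adds a column.
    void addColumn(String name) { columns.add(name); }

    int indexOf(String name) {
        int i = columns.indexOf(name);
        if (i < 0) {
            throw new IllegalArgumentException("Unknown column: " + name);
        }
        return i;
    }
}

class Row {
    private final TableSchema schema;
    private final List<Object> values = new ArrayList<>();

    Row(TableSchema schema, Object... values) {
        this.schema = schema;
        this.values.addAll(Arrays.asList(values));
    }

    Object get(String column) {
        int i = schema.indexOf(column);
        // Rows created before a column was added have no value for it yet.
        return i < values.size() ? values.get(i) : null;
    }

    String getStr(String column) { return (String) get(column); }

    Integer getInt(String column) { return (Integer) get(column); }
}
```

Each row stores only its values; the column names exist once per table, which addresses the memory concern raised about the map-based version.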
I have seen this done, and it is a lot of work. What you end up doing is having a dynamic model, typically modelling classes and attributes. You expose the Classes and Attributes (and their definition) to a sysadmin role.
The rest of the application sends and retrieves instance data using this dynamic model. As a start, you won't have static Java classes representing them. In your above example, the DbEntry doesn't exist. You'll end up with a generic Model Object that allows you to return DbEntry objects in a common model. Something like
interface DynamicObject {
    // Not called getClass(): that would clash with the final Object.getClass().
    ClassDefinition getClassDefinition(); // a ClassDefinition that contains details about DbEntry
    Collection<AttributeDetails> getAttributes();
    AttributeValue getValue(AttributeDetails details);
    void setValue(AttributeDetails details, AttributeValue value);
}
This above is all bespoke code written/defined by you. I am unaware of any third party framework that provides this to you. That said, I haven't looked very hard.
The bottom line is, for what you want to do, the Classes and Attributes end up being modelled by the application and the rest of the application works off that model. Only by doing that, will you prevent the need for making static Java changes when the model changes.
It is not trivial, and carries with it a fair amount of maintenance. I have seen this done, and over time it did become a fairly arduous task to maintain.
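To give a feel for what such a dynamic model involves, here is a deliberately small sketch. All names (EntityDefinition, AttributeDefinition, DynamicEntity) are hypothetical and simpler than the interface outlined above; a real implementation would also need persistence, type conversion and migration of existing data.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical metadata describing one attribute (column).
class AttributeDefinition {
    final String name;
    final Class<?> type;

    AttributeDefinition(String name, Class<?> type) {
        this.name = name;
        this.type = type;
    }
}

// Hypothetical metadata describing one modelled class (table).
class EntityDefinition {
    final String name;
    private final Map<String, AttributeDefinition> attributes = new LinkedHashMap<>();

    EntityDefinition(String name) { this.name = name; }

    // Can be called at runtime, e.g. when an administrator adds a column.
    void addAttribute(String attrName, Class<?> type) {
        attributes.put(attrName, new AttributeDefinition(attrName, type));
    }

    AttributeDefinition attribute(String attrName) {
        AttributeDefinition a = attributes.get(attrName);
        if (a == null) {
            throw new IllegalArgumentException("Unknown attribute: " + attrName);
        }
        return a;
    }

    Collection<AttributeDefinition> attributes() { return attributes.values(); }
}

// Instance data: values are validated against the definition, not against a Java class.
class DynamicEntity {
    private final EntityDefinition definition;
    private final Map<String, Object> values = new HashMap<>();

    DynamicEntity(EntityDefinition definition) { this.definition = definition; }

    void set(String attrName, Object value) {
        AttributeDefinition a = definition.attribute(attrName);
        if (value != null && !a.type.isInstance(value)) {
            throw new IllegalArgumentException(attrName + " expects " + a.type.getSimpleName());
        }
        values.put(attrName, value);
    }

    Object get(String attrName) {
        definition.attribute(attrName); // fails fast on unknown attributes
        return values.get(attrName);
    }
}
```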

Updating only some attributes of objects in cache

Let's say I have a class Item. Items have simple object attributes and attributes that are collections of other objects:
public class Item
{
//Object attributes
String name;
int id;
Color color;
//Collection of object attributes
List<Parts> parts;
Map<Integer,Owner> ownersById;
}
I have a fairly simple web application that allows crud operations on these items. This is split up into separate operations:
a page where you can update the simple object attributes (name, id...).
a page where you can edit the collection of parts.
a page where you can edit the map of owners.
Because the server load was getting too high, I implemented a cache in the application which holds the "most recently used item objects" with their simple attributes and their collection attributes.
Whenever an edit is made to the name of an item, I want to do the following things:
Persist the change to the item's name. This is done by converting the item object to XML (without any collection attributes) and calling a web service named "updateItemData".
Update the current user's cache by updating the relevant item's name inside the cache. This way the cache stays relevant without having to load the item again after persisting it.
To do this I created the following method:
public void updateItem(Item itemWithoutCollectionData)
{
WebServiceInvoker.updateItemService(itemWithoutCollectionData);
Item cachedItemWithCollectionData = cache.getItemById(itemWithoutCollectionData.getId());
cachedItemWithCollectionData.setName(itemWithoutCollectionData.getName());
cachedItemWithCollectionData.setColor(itemWithoutCollectionData.getColor());
}
This method is very annoying because I have to copy the attributes one by one, since I cannot know beforehand which ones the user just updated. Bugs arose because the objects changed in one place but not in this piece of code. I also cannot just do the following: cachedItem = itemWithoutCollectionData; because this would make me lose the collection information, which is not present in the itemWithoutCollectionData variable.
Is there a way to either:
Perhaps by reflection, to iterate over all the non-collection attributes in a class and thus write the code in a way that it does not matter if future fields are added or removed in the Item class
Find a way so that, if my Item class gains a new attribute, a warning is shown in the class that deals with the caching to signal "hey, you need to update me too!"?
Use an alternative, which might seem a bit overkill: wrap all the non-collection attributes in a class, for example ItemSimpleData, and use that object instead of separate attributes. However, this doesn't work well with inheritance. How would you implement this method in the following structure?
classes:
public class BaseItem
{
String name;
int id;
}
public class ColoredItem extends BaseItem
{
Color color;
}
There are many things that can be done to enhance what you currently have, but I am going to point out just two that may help you with your problem.
Firstly, I am assuming that public void updateItem is a simplified version of your production code. So make sure this method is thread safe, since that is a common source of problems when it comes to caching.
Secondly, you mentioned that
Perhaps by reflection, to iterate over all the non-collection
attributes in a class and thus write the code in a way that it does
not matter if future fields are added or removed in the Item class.
If I understand the problem correctly, you can easily achieve this using BeanUtils.copyProperties(). Here is an example:
http://www.mkyong.com/java/how-to-use-reflection-to-copy-properties-from-pojo-to-other-java-beans/
I hope it helps.
Cheers,
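If pulling in a library is not an option, the reflection idea from the question can also be sketched with the JDK alone. The SimpleFieldCopier helper below is hypothetical: it copies every non-static, non-final field whose declared type is neither a Collection nor a Map, and walks up the class hierarchy so inherited fields are covered. The two demo classes mirror the question's structure, with a String standing in for Color.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of any library.
final class SimpleFieldCopier {
    private SimpleFieldCopier() {}

    static void copySimpleFields(Object source, Object target) {
        if (!source.getClass().isInstance(target)) {
            throw new IllegalArgumentException("target must be an instance of " + source.getClass());
        }
        // Walk the hierarchy so fields declared in superclasses are copied too.
        for (Class<?> c = source.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                int mods = f.getModifiers();
                if (Modifier.isStatic(mods) || Modifier.isFinal(mods) || f.isSynthetic()
                        || Collection.class.isAssignableFrom(f.getType())
                        || Map.class.isAssignableFrom(f.getType())) {
                    continue; // leave collections and constants untouched
                }
                try {
                    f.setAccessible(true);
                    f.set(target, f.get(source));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("Cannot copy field " + f.getName(), e);
                }
            }
        }
    }
}

// Demo classes mirroring the question; String stands in for Color.
class BaseItem {
    String name;
    int id;
}

class ColoredItem extends BaseItem {
    String color;
    List<String> parts = new ArrayList<>();
}
```

A field added to either class later is picked up automatically, at the cost of bypassing setters and any validation they perform.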

Naming Convention for methods that return different types with similar parameters

I have a SearchService which uses an algorithm for querying a database and returning the results. There are a couple of different formats the data can be returned as, depending on what the invoker wants from the service. These formats are:
A list of entities that directly match against a table in the database
A list of primary keys (Longs) of the records that match
A list of 'search results', each composed of a bunch of fields that are generally relevant to what a user would want to see from a search result (say a person's name, address, phone number, etc.)
Currently my SearchService looks like:
public interface SearchService {
public List<People> searchPeopleReturnEntity(SearchRequest request);
public List<Long> searchPeopleReturnId(SearchRequest request);
public List<SearchResult> searchPeopleReturnSearchResult(SearchRequest request);
}
I'm looking for advice on best practices regarding this. Currently the naming convention seems pretty clunky and I believe there is a better solution than what I have now.
I'd call them something simple like getPeople, getIds, getSearchResults.
If you need these same 3 methods for entities other than people, I'd consider making some generic intermediate type defining them, allowing you to write something like this:
List<People> people = service.getPeople(request).asEntities();
List<Long> fooIds = service.getFoos(request).asIds();
// or something like this
List<People> people = service.searchPeople().getEntities(request);
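The "generic intermediate type" could look roughly like the sketch below. Matches is a hypothetical name, and the id extractor passed to the constructor is an assumption about how ids would be obtained from an entity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical intermediate type returned by a search method.
final class Matches<E> {
    private final List<E> entities;
    private final Function<E, Long> idOf;

    Matches(List<E> entities, Function<E, Long> idOf) {
        this.entities = new ArrayList<>(entities); // defensive copy
        this.idOf = idOf;
    }

    // The matched entities themselves.
    List<E> asEntities() { return new ArrayList<>(entities); }

    // The same matches, reduced to their primary keys.
    List<Long> asIds() {
        List<Long> ids = new ArrayList<>();
        for (E e : entities) {
            ids.add(idOf.apply(e));
        }
        return ids;
    }
}
```

The service then needs one search method per entity type, and the caller picks the representation at the call site.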
I'd call them findPeople(), findPeopleIDs() and findPeopleResults().
If your service can return other instances as well (besides People), I'd name them
findPeopleEntities()
findPeopleIds()
getSearchResult() or getPeopleSearchResult() (you generally don't find search results ;) )
If SearchResult is used for people only, I'd also name it PeopleSearchResult. Otherwise I'd give it a generic parameter like SearchResult<T> and then List<SearchResult<People>> getPeopleSearchResult().
Or you could use the searchBy... approach, though I agree that this implies the methods return the same kind of result.
Perhaps you can refactor a bit:
search... will return a List (I'm guessing of IDs in the database?)
then add a getPeople(List) which actually returns the list of objects for those IDs
so your search... methods always return IDs and then you have specialized functions to transform those into "proper" search results?
In case it isn't obvious, there is no real "best practice" naming scheme for this kind of method.
One idea might be:
public <T> T findPeople(SearchRequest request, Class<T> resultClass);
You can then return different things according to whether resultClass is Person.class, Long.class, or SearchResult.class.
Or, less horribly, you could do:
public <T> T findPeople(SearchRequest request, ResultConverter<T> resultConverter);
Where ResultConverter is an interface which takes some sort of raw search result and returns a suitable converted result. You could have canned instances for the common ones:
public class ResultConverters {
public static final ResultConverter<Long> ID;
public static final ResultConverter<Person> PERSON;
public static final ResultConverter<SearchResult> SEARCH_RESULT;
}
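A minimal sketch of the converter idea follows. PersonRecord, InMemorySearchService and the substring match are hypothetical stand-ins for the real entity, service and search algorithm, and this variant returns a List<T> rather than a single T.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical raw row as it might come back from the database.
class PersonRecord {
    final long id;
    final String name;

    PersonRecord(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Turns one raw search hit into whatever shape the caller asked for.
interface ResultConverter<T> {
    T convert(PersonRecord raw);
}

final class ResultConverters {
    private ResultConverters() {}

    // Canned instances for the common cases.
    static final ResultConverter<Long> ID = raw -> raw.id;
    static final ResultConverter<String> NAME = raw -> raw.name;
    static final ResultConverter<PersonRecord> ENTITY = raw -> raw;
}

class InMemorySearchService {
    private final List<PersonRecord> table;

    InMemorySearchService(List<PersonRecord> table) { this.table = table; }

    // Stand-in for the real search: case-insensitive substring match on the name.
    <T> List<T> findPeople(String query, ResultConverter<T> converter) {
        List<T> out = new ArrayList<>();
        for (PersonRecord p : table) {
            if (p.name.toLowerCase().contains(query.toLowerCase())) {
                out.add(converter.convert(p));
            }
        }
        return out;
    }
}
```

The search logic is written once, and adding a new result format means adding a converter rather than another method on the service.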
