Realm for Java/Android - preserving the state of query/result - java

In the project currently under development, we are integrating the Realm Database into the customer's app to improve responsiveness while working on a huge data set of ~20.000 records. For on-screen presentation, we are incorporating Realm's Android RecyclerView. The majority of the use cases are read operations, followed by the possibility of advanced search and/or filtering of the records.
Where the shoe pinches is that on some of our views, only a subset of all the records of a given type is supposed to be displayed, and that subset is selected by the back-end. Using the information passed by the API, we perform the initial filtering and set up the view.
Now, using the aforementioned technologies, is there a readable and maintainable way to store either this pre-filtered subset or the query fetching it for further reference, so that the initial state of the view can always be restored once the SearchView and/or filters are cleared? Or is storing the API response and re-applying the conditions given through it the only way to do it? Applying any new conditions to the query seems to alter it for good, and the same goes for applying new queries to the results. Shouldn't there be a way to create a fresh result set based on an old one without disturbing the latter?
Edit: Our app being 'bilingual', both Java- and Kotlin-based solutions are welcomed, should they differ.

As we came to realise earlier this week, and just as @EpicPandaForce has mentioned in the comments, while a RealmQuery object cannot be "snapshotted" by assigning it to a spare variable before extending it, the same is not true for RealmResults objects. And so:
RealmResults<Obj> widerResults = realmInstance.where(Obj.class).in("id", idArray).findAll();
RealmResults<Obj> narrowerResults = widerResults.where().equalTo("flag", true).findAll();
Will provide two independent result sets. The wider one can be used as per the use case I highlighted - to treat it as a starting point for further subqueries. The changes to objects found in the sets themselves will still be reflected in both sets.
Providing an answer for all the lost souls out there, should they get stuck like we did.

Related

Choosing databasetype for a decentralized calendar project

I am developing a decentralised calendar system. It should save the data on each device and synchronise whenever both devices have an internet connection. My first idea was to just use a relational database and try to synchronise the data once connected. But the theory says something else. Brewer's CAP theorem describes the theory behind it, but I am not sure whether this theorem is outdated. If I apply this theorem, I have an "AP [Availability/Partition Tolerance]" system: "A" because I need the data for my calendar at any given time, and "P" because it can happen that there is no connection between the devices and the data can't be synchronised. The example databases are CouchDB, Riak or Cassandra. I have only worked with relational databases and don't know how to go on now. Is it that bad to use a relational database for my project?
This is for my bachelor thesis. I just wanted to start using Postgres but then I found this theorem...
The whole project is based on Java.
I think the CAP theorem isn't really helpful to your scenario. Distributed systems that deal with partitions need to decide what to do when one part wants to make a modification to the data, but can't reach the other part. One solution is to make the write wait - and this is giving up the "availability" because of the "partition", one of the options presented by the CAP theorem. But there are more useful options. The most useful (highly-available) option is to allow both parts to be written independently, and reconcile the conflicts when they can connect again. The question is how to do that, and different distributed systems choose different approaches.
Some systems, like Cassandra or Amazon's DynamoDB, use "last writer wins" - when we see a conflict between two conflicting writes, the last one (according to some synchronized clock) wins. For this approach to make sense you need to be very careful about how you model your data (e.g., watch out for cases where the conflict resolution results in an invalid mixture of two states).
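To make the "last writer wins" idea and its pitfall concrete, here is a minimal sketch in plain Java. It is not how Cassandra or DynamoDB implement it; the LwwMap class, its method names and the timestamps are all made up for illustration, and it assumes the devices' clocks are comparable.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical last-writer-wins map: per key, the write with the newest timestamp survives.
class LwwMap {
    // A value together with the timestamp of the write that produced it.
    static final class Versioned {
        final String value;
        final long timestamp;

        Versioned(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final Map<String, Versioned> entries = new HashMap<>();

    // Apply a write; an older write never overwrites a newer one.
    void put(String key, String value, long timestamp) {
        entries.merge(key, new Versioned(value, timestamp), (old, neu) -> {
            if (neu.timestamp != old.timestamp) {
                return neu.timestamp > old.timestamp ? neu : old;
            }
            // Equal timestamps: break the tie deterministically so all replicas agree.
            return neu.value.compareTo(old.value) > 0 ? neu : old;
        });
    }

    String get(String key) {
        Versioned v = entries.get(key);
        return v == null ? null : v.value;
    }

    // Reconcile with another replica by replaying its writes through put().
    void mergeFrom(LwwMap other) {
        other.entries.forEach((key, v) -> put(key, v.value, v.timestamp));
    }

    public static void main(String[] args) {
        LwwMap phone = new LwwMap();
        LwwMap laptop = new LwwMap();
        phone.put("title", "Standup", 100L);
        phone.put("room", "A", 110L);
        laptop.put("title", "Review", 105L);
        laptop.put("room", "B", 104L);

        phone.mergeFrom(laptop);
        System.out.println(phone.get("title") + " in room " + phone.get("room"));
    }
}
```

Running main prints "Review in room A", a combination that neither device ever wrote. That is the "invalid mixture of two states" to watch out for when each field is resolved separately.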
In other systems (and also in Cassandra and DynamoDB - in their "collection" types) writes can still happen independently on different nodes, but there is more sophisticated conflict resolution. A good example is Cassandra's "list": One can send an update saying "add item X to the list", and another update saying "add item Y to the list". If these updates happen on different partitions, the conflict is later resolved by adding both X and Y to the list. A data structure such as this list, which allows the content to be modified independently in certain ways on two nodes and then automatically reconciled in a sensible way, is known as a Conflict-free Replicated Data Type (CRDT).
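One of the simplest CRDTs is a grow-only set, sketched below in plain Java. This is an illustration only: it uses a set rather than Cassandra's list so that the merge does not depend on ordering, and the GrowOnlySet class and its methods are hypothetical names.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical grow-only set (G-Set): items can only be added, and merge is a set union.
class GrowOnlySet {
    private final Set<String> items = new TreeSet<>();

    // Local update: can be applied on any replica without coordination.
    void add(String item) {
        items.add(item);
    }

    // Union is commutative, associative and idempotent, so replicas can merge
    // in any order, any number of times, and still end up with the same content.
    void merge(GrowOnlySet other) {
        items.addAll(other.items);
    }

    Set<String> view() {
        return Collections.unmodifiableSet(items);
    }

    public static void main(String[] args) {
        GrowOnlySet left = new GrowOnlySet();
        GrowOnlySet right = new GrowOnlySet();
        left.add("X");   // written on one partition
        right.add("Y");  // written on the other partition

        left.merge(right);
        right.merge(left);
        System.out.println(left.view().equals(right.view()));
    }
}
```

Removal is what makes real CRDTs harder: a grow-only set cannot express "delete this meeting", so a calendar would need a richer type.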
Finally, another approach was used in Amazon's Dynamo paper (not to be confused with their current DynamoDB service!), known as "vector clocks": When you want to write to an object - e.g., a shopping cart - you first read the current state of the object and get with it a "vector clock", which you can think of as the "version" of the data you got. You then make the modification (e.g., add an item to the shopping cart), and write back the new version while saying what was the old version you started with. If two of these modifications happen in parallel on different partitions, we later need to reconcile the two updates. The vector clocks allow the system to determine if one modification is "newer" than the other (in which case there is no conflict), or they really do conflict. And when they do, application-specific logic is used to reconcile the conflict. In the shopping cart example, if we see the conflict is that in one partition item A was added to the shopping cart and in the other partition, item B was added to the shopping cart, the straightforward resolution is to just add both items A and B to the shopping cart.
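The comparison step of vector clocks can be sketched roughly as follows. This is a simplified illustration, not the Dynamo implementation; the VectorClock class, the Order enum and the node names are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical vector clock: one counter per node that has written the object.
class VectorClock {
    enum Order { BEFORE, AFTER, EQUAL, CONCURRENT }

    private final Map<String, Integer> counters = new HashMap<>();

    VectorClock copy() {
        VectorClock c = new VectorClock();
        c.counters.putAll(counters);
        return c;
    }

    // Returns the clock of a new version written by nodeId on top of this version.
    VectorClock tick(String nodeId) {
        VectorClock next = copy();
        next.counters.merge(nodeId, 1, Integer::sum);
        return next;
    }

    // BEFORE/AFTER mean one version descends from the other (no conflict);
    // CONCURRENT means neither saw the other, so the application must reconcile.
    Order compare(VectorClock other) {
        boolean someLess = false;
        boolean someGreater = false;
        Set<String> nodes = new HashSet<>(counters.keySet());
        nodes.addAll(other.counters.keySet());
        for (String node : nodes) {
            int mine = counters.getOrDefault(node, 0);
            int theirs = other.counters.getOrDefault(node, 0);
            if (mine < theirs) someLess = true;
            if (mine > theirs) someGreater = true;
        }
        if (someLess && someGreater) return Order.CONCURRENT;
        if (someLess) return Order.BEFORE;
        if (someGreater) return Order.AFTER;
        return Order.EQUAL;
    }

    public static void main(String[] args) {
        VectorClock base = new VectorClock().tick("server");
        VectorClock onPhone = base.tick("phone");    // edit made while offline
        VectorClock onLaptop = base.tick("laptop");  // another offline edit
        System.out.println(base.compare(onPhone));
        System.out.println(onPhone.compare(onLaptop));
    }
}
```

In the shopping cart example, a CONCURRENT result is the point where the application-specific logic (keep both items) would run.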
You should probably pick one of these approaches. Just saying "the CAP theorem doesn't let me do this" is usually not an option ;-) In fact, in some ways, the problem you're facing is different than some of the systems I mentioned. In those systems, the common case is every node is always connected (no partition), with very low latency, and they want this common case to be fast. In your case, you can probably assume the opposite: the two parts are usually not connected, or if they are connected there is high latency, so conflict resolution becomes the norm, rather than the exception. So you need to decide how to do this conflict resolution - what happens if one adds a meeting on one device and a different meeting on the other device (most likely, just keep both as two meetings...), how do you know that one device modified a pre-existing meeting and didn't add a second meeting (vector clocks? unique meeting ids? etc.) so the conflict resolution ends up fixing the existing meeting instead of adding a second one? And so on. Once you do that, where you store the data on both partitions (probably completely different database implementations in the client and server) and which protocol you send the updates on become implementation details.
There's another issue you'll need to consider. When do we do these reconciliations? In many systems like I listed above, the reconciliation happens on read: If the client wants to read data and we suddenly see two conflicting versions on two reachable nodes, we reconcile. In your calendar application, you need a slightly different approach: It is possible that the client will only ever try to read (use) the calendar when not connected. You need to use the rare opportunities when he is connected to reconcile all the differences. Moreover, you may need to "push" changes - e.g., if the data on the server changed, the client may need to be told, "hey, I have some changed data, come and reconcile", so the end-user will immediately see an announcement on a new meeting, for example, that was added remotely (e.g., perhaps by a different user sharing the same calendar). You'll need to figure out how you want to do this. Again, there is no magic solution like "use Cassandra".

How do I store objects if I want to search them by multiple attributes later?

I want to code a simple project in java in order to keep track of my watched/owned tv shows, movies, books, etc.
Searching and retrieving the metadata from an API (themovieDB, Google Books) is already working.
How would I store some of this metadata together with user-input (like progress or rating)?
I'm planning on displaying the data in a table like form (example). Users should also be able to search the local data with multiple attributes. Is there any easy way to do this? I already thought about a database since it seemed that was the easiest solution.
Any suggestions?
Thanks in advance!
You can use a lightweight database such as H2, HSQLDB or SQLite. These databases can be embedded in the Java app itself and do not require an extra server.
If you have only a little data, you can also save it as XML or JSON by using any XML or JSON parser (e.g. Gson).
Your DB table will have various attributes which are fetched from the API as well as user inputs. You can write queries on top of these DBs to fetch and show the various results.
Either write everything to files, or store everything on a database. It depends on what you want though.
If you choose to write everything to files, you'll have to implement both the writing and the reading to suit your needs. You'll also have to deal with read/write bugs and performance issues yourself.
If you choose a database, you'll just have to implement the high level read and write methods, i.e., the methods that format the data and store it on the appropriate tables. The actual reading and writing is already implemented and optimized for performance.
Overall, databases are usually the smart choice. Be careful which one you choose, though. Some types might be better for reading, while others are better for writing. You should carefully evaluate what's best, given your problem's domain.
There are many ways to accomplish this but as another user posted, a database is the clear choice.
However, if you're looking to make a program to learn with or something simple for personal use, you could also use a multidimensional array of strings to hold the name of the program, as well as any other metadata fields, and treat the array like a table in Excel. This is not the best way to do it, but you can get away with it with very simple code. To search you would only need to loop through the array elements and check that the name of the program (i.e. movieArray[x][0]) matches the search string. Once located, you can perform actions on or edit the other array indexes pertaining to that movie.
For a little more versatility, you would create a class to hold the movie information with fields to hold any metadata. The advantage here is that the metadata fields can be different types rather than having to conform to the array type, and they're packaged together in the instance of the class. If you're getting the info from an API then you can update or create the objects from the API response. These objects can be stored in an ArrayList and searched with a loop that checks for a certain value, i.e.
for (Movie m : movieArrayList) {
    // Return the first movie whose title matches the search string.
    if (m.getTitle().equals("Arrival")) {
        return m;
    }
}
Alternatively, of course, for large scale a database would be the best answer, but it all depends on what this is really for and what its needs will be in the real world.
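Since the question asks about searching by multiple attributes, the loop-based approach above can be extended by combining several conditions into one filter. The following is only a sketch: the Movie fields (title, year, rating) and the MovieSearch class are invented for the example and are not taken from any API mentioned in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical in-memory search over a list of movies using combinable predicates.
class MovieSearch {
    // Illustrative metadata holder; a real one would mirror the API response plus user input.
    static final class Movie {
        final String title;
        final int year;
        final int rating;

        Movie(String title, int year, int rating) {
            this.title = title;
            this.year = year;
            this.rating = rating;
        }
    }

    // Returns every movie that satisfies all criteria; an empty criteria list matches everything.
    static List<Movie> search(List<Movie> movies, List<Predicate<Movie>> criteria) {
        Predicate<Movie> all = criteria.stream().reduce(m -> true, Predicate::and);
        return movies.stream().filter(all).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Movie> movies = new ArrayList<>();
        movies.add(new Movie("Arrival", 2016, 9));
        movies.add(new Movie("Alien", 1979, 8));
        movies.add(new Movie("Passengers", 2016, 6));

        List<Predicate<Movie>> criteria = new ArrayList<>();
        criteria.add(m -> m.year == 2016);
        criteria.add(m -> m.rating >= 8);

        for (Movie m : search(movies, criteria)) {
            System.out.println(m.title);
        }
    }
}
```

Each search field in the UI would contribute one predicate, so adding a new searchable attribute does not change the search method itself.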

Methods of Data Storage - Opinion

I have been working on an app and have encountered some limitations relating to my lack of experience in Java IO and data persistence. Basically I need to store information on a few Spinner objects. So far I have saved information on each Spinner into a text file using the format of:
//Blank Line
Name //the first drop-down entry of the spinner
Type //an enum value
Entries //a semicolon-separated list of the drop-down entry String values
//Blank line
And then, assuming this rigid syntax is always followed, I've extracted this information from the saved .txt whenever the app is started. But things such as editing these entries and working with certain aspects of the Scanner have been an absolute nightmare. If anything is off by even one line or blank space, BAM! Everything is ruined. There must be a better way to store information for easy access: something with some searchability, something that won't be erased the moment the app closes, and something that isn't so rigid in its layout that the most minor of changes destroys everything.
Any recommendations for how to save a simple String, a simple int, and an array of String outside the app? I am looking for a recommendation from an experienced developer here. I have seen the storage options, but am unsure which would be best for just a few simple things. Everything I need could be represented in a 3 X n table wherein n is the number of spinners.
Since your requirements are so minimal, I think the shared preferences approach is probably the best option. If your requirements were more complicated, then using a database would start to make more sense.
Using shared preferences for simple data like yours really is as simple as the example shown on the storage options page.
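To show what a key-value layout for the spinners could look like, here is a small sketch. SharedPreferences is Android-only, so this example uses java.util.Properties as a stand-in to keep it runnable anywhere; on Android the same keys would go through SharedPreferences.Editor calls such as putString and putInt. The SpinnerStore class, the key naming scheme and the assumption that entries never contain a semicolon are all illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

// Hypothetical key-value storage for spinners: three keys per spinner, prefixed by its index.
class SpinnerStore {
    static void save(Properties props, int index, String name, int type, List<String> entries) {
        String prefix = "spinner." + index + ".";
        props.setProperty(prefix + "name", name);
        props.setProperty(prefix + "type", Integer.toString(type));
        // Assumes no entry contains ';' (the same separator the question already uses).
        props.setProperty(prefix + "entries", String.join(";", entries));
    }

    static String name(Properties props, int index) {
        return props.getProperty("spinner." + index + ".name");
    }

    static int type(Properties props, int index) {
        return Integer.parseInt(props.getProperty("spinner." + index + ".type"));
    }

    static List<String> entries(Properties props, int index) {
        String joined = props.getProperty("spinner." + index + ".entries", "");
        return joined.isEmpty() ? Collections.emptyList() : Arrays.asList(joined.split(";"));
    }

    // Writes the properties to text and reads them back, as a file on disk would.
    static Properties roundTrip(Properties props) {
        try {
            StringWriter out = new StringWriter();
            props.store(out, null);
            Properties loaded = new Properties();
            loaded.load(new StringReader(out.toString()));
            return loaded;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        save(props, 0, "Colour", 2, Arrays.asList("Red", "Green", "Blue"));
        Properties loaded = roundTrip(props);
        System.out.println(name(loaded, 0) + " " + type(loaded, 0) + " " + entries(loaded, 0));
    }
}
```

Because every value is looked up by key, a stray blank line or reordered entry no longer breaks the parsing the way it does with a position-based text file.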

Find and delete duplicates in a Lotus Notes database

I am very new to Lotus Notes. Recently my teammates were facing a problem regarding duplicates in Lotus Notes, as shown below in Case A and Case B.
So we bought an app named ScanEZ. Using this tool we can remove either the first occurrence or the second occurrence. In Case A and Case B the second items are considered redundant because they do not have children, so we can remove all the second items as given below and thus remove the duplicates.
But in Case C the order is reversed: the child item comes first and the parent item comes second, so I am unable to use the ScanEZ app.
Is there any better way, software or script to accomplish my task? As I am new to this field I have no idea. Kindly help me.
Thanks in advance.
Probably the easiest way to approach this would be to force the view to always display documents with children first. That way the tool you have purchased will behave consistently for you. You would do this by adding a hidden sorted column to the right of the column that you have circled. The formula in this column would be @DocChildren, and the sort options for the column would be set to 'Descending'. (Note that if you are uncomfortable making changes in this view, you can make a copy of it, make your changes in the copy, and run ScanEZ against the copy as well. You can also do all of this in a local replica of the database, and only replicate it back to the server when you are satisfied that you have the right results.)
The other way would be to write your own code in LotusScript or Java, using the Notes classes. There are many different ways that you could write that code,
I agree with Richard's answer. If you want more details on how to go through the document collection, you could isolate the documents into a view that shows only the duplicates. Then write an agent to look at the UNID of the document, the date modified and other such data elements to ensure that you are getting the last updated document. I would add a field to the document as in FLAG='keep'. Then delete documents that don't have your flag in the document with a second agent. If you take this approach you can often use the same agents in other databases.
Since you are new to Notes keep in mind that Notes is a document database. There are several different conflicts like save conflicts or replication conflicts. Also you need to look at database settings on how duplicates can be handled. I would read up on these topics just so you can explain it to your co-workers/project manager.
Eventually in your heavily travelled databases you might be able to automate this process after you work down the source of the duplicates.
These are clearly not duplicates.
The definition of duplicate is that they are identical and so it does not matter which one is kept and which one is removed. To you, the fact that one has children makes it more important, which means that they are not pure duplicates.
What you have not stated is what you want to do if multiple documents with similar dates/subjects have children (a case D if you will).
To me this appears as three separate problems.
The first problem is to sort out the cases where more than one document in a set has children.
Then sort out the cases where only one document in a set has children.
Then sort out the cases where none of the documents in a set has children.
The approach in each case will be different. The article from Ytira only really covers the last of these cases.
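The three-way split described above can be sketched in plain Java before writing any agent. This does not use the Notes classes: Doc, its subject and childCount fields, and the choice of grouping by subject are hypothetical stand-ins for whatever identifies a duplicate set in the real database.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical triage of duplicate sets by how many documents in each set have children.
class DuplicateTriage {
    enum Case { MANY_WITH_CHILDREN, ONE_WITH_CHILDREN, NONE_WITH_CHILDREN }

    // Stand-in for a Notes document: only the fields needed for the triage.
    static final class Doc {
        final String subject;
        final int childCount;

        Doc(String subject, int childCount) {
            this.subject = subject;
            this.childCount = childCount;
        }
    }

    // Groups documents that are assumed to be duplicates of each other (same subject).
    static Map<String, List<Doc>> groupBySubject(List<Doc> docs) {
        Map<String, List<Doc>> groups = new LinkedHashMap<>();
        for (Doc d : docs) {
            groups.computeIfAbsent(d.subject, k -> new ArrayList<>()).add(d);
        }
        return groups;
    }

    // Decides which of the three problems a duplicate set belongs to.
    static Case classify(List<Doc> set) {
        long withChildren = set.stream().filter(d -> d.childCount > 0).count();
        if (withChildren > 1) return Case.MANY_WITH_CHILDREN;
        if (withChildren == 1) return Case.ONE_WITH_CHILDREN;
        return Case.NONE_WITH_CHILDREN;
    }

    public static void main(String[] args) {
        List<Doc> docs = new ArrayList<>();
        docs.add(new Doc("Invoice 17", 2));
        docs.add(new Doc("Invoice 17", 0));
        docs.add(new Doc("Memo", 0));
        docs.add(new Doc("Memo", 0));
        groupBySubject(docs).forEach((subject, set) ->
                System.out.println(subject + ": " + classify(set)));
    }
}
```

Only the ONE_WITH_CHILDREN sets can be cleaned up mechanically by keeping the parent; the other two cases need a rule of their own.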

Resource considerations for a Java program using an SQL DB

I'm fairly new to programming, at least when it comes to anything substantial. I am about to start work on management software for my employer which draws its data from, and stores its data to, an SQL database. I will likely be using JDBC to interact with it.
To try and accurately describe the problem I am going to focus on a very small portion of the program. In the database, there is a table that stores Job records. There are a couple of thousand of them. I want to display all available Jobs (as a text reference from the table) in a scroll-able panel in the program with a search function.
So, my question is... Should I create Job objects from each record in one go and have the program work with the objects to display them, OR should I simply display strings taken directly from the records? The first method would mean that other details of each job are stored in advance, so that when I open a record in the UI the load times should be minimal. However, it also sounds like it would take a great deal of resources when it initially populates the panel and generates the objects. The second method would mean issuing a large quantity of queries to the database, which might avoid the initial resource overhead, but I don't want to put too much strain on the SQL Server because other software in-house relies on it.
Really, I don't know anything about how I should be doing this. But that really is my question. Apologies if I am displaying my ignorance in this post, and thank you in advance for any help you can offer.
"A couple thousand" is a very small number for modern computers. If you have any sort of logic to perform on these records (they're not all modified solely via stored procedures), you're going to have a much easier time using an object-relational mapping (ORM) tool like Hibernate. Look into the JPA specification, which allows you to create Java classes that represent database objects and then simply annotate them to describe how they're stored in the database. Using an ORM like this system does have some overhead, but it's nearly always worthwhile, since computers are fast and programmers are expensive.
Note: This is a specific example of the rule that you should do things in the clearest and easiest-to-understand way unless you have a very specific reason not to, and in particular that you shouldn't optimize for speed unless you've measured your program's performance and have determined that a specific section of the code is causing problems. Use the abstractions that make the code easy to understand and come back later if you actually have to speed things up.
