Data Structures in Java to implement joins

Hi, I am trying to implement a simple join algorithm in Java.
I have three relations: M(ABX), N(ACY) and O(BCZ). They are currently stored in comma-separated files and contain only integers (for example, file M has rows like 1,5,6; 2,7,9; ...). What is the best data structure to use in Java to implement the join MxNxO? That is, M and N join on attribute A, producing the schema (ABXCY), which then joins with O on attributes B and C, producing the final result (ABXCYZ) containing all join results.

Perhaps an embedded database like hsqldb would be the right choice. It's flexible, performant, and easy to use.

There are no specialized data structures that you can readily use for this.
You would have to represent the tables extracted from your CSV files as a List<List<Integer>>, then iterate over the lists and compare the values in the join columns to build intermediate lists, and so on until you have joined all the relations.
That is, you would need to implement this logic yourself.
The best way, IMHO, is to follow the answer of @Ernest Friedman-Hill.
Not only will you get this functionality faster, you will also get it error-free, because you will not need to test that the join algorithm works correctly over every dataset. The embedded database will do this for you.
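If you do want to implement it by hand, a hash join is one common way to avoid comparing every row with every other row. The following is only a minimal sketch, assuming each relation has already been parsed into a list of int[] rows; the class HashJoinSketch and its method names are made up for illustration and are not part of any library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, for illustration only.
class HashJoinSketch {

    // Equi-joins two relations of int rows on the given column positions.
    // Each output row is the left row followed by the right row's non-key columns.
    static List<int[]> join(List<int[]> left, int[] leftKeys,
                            List<int[]> right, int[] rightKeys) {
        // Build phase: index the right relation by its join-key values.
        Map<List<Integer>, List<int[]>> index = new HashMap<>();
        for (int[] row : right) {
            index.computeIfAbsent(key(row, rightKeys), k -> new ArrayList<>()).add(row);
        }
        // Probe phase: look up each left row's key in the index.
        List<int[]> result = new ArrayList<>();
        for (int[] l : left) {
            List<int[]> matches = index.getOrDefault(key(l, leftKeys), Collections.emptyList());
            for (int[] r : matches) {
                result.add(concat(l, r, rightKeys));
            }
        }
        return result;
    }

    // Extracts the join-key values of a row as a list, which has value-based equals/hashCode.
    private static List<Integer> key(int[] row, int[] cols) {
        List<Integer> k = new ArrayList<>(cols.length);
        for (int c : cols) {
            k.add(row[c]);
        }
        return k;
    }

    // Appends to l every column of r whose position is not listed in skip.
    private static int[] concat(int[] l, int[] r, int[] skip) {
        int[] out = Arrays.copyOf(l, l.length + r.length - skip.length);
        int pos = l.length;
        for (int i = 0; i < r.length; i++) {
            final int col = i;
            if (Arrays.stream(skip).noneMatch(s -> s == col)) {
                out[pos++] = r[i];
            }
        }
        return out;
    }
}
```

With M(ABX) and N(ACY), join(m, new int[]{0}, n, new int[]{0}) would give rows in the order ABXCY, and joining that result with O(BCZ) via join(mn, new int[]{1, 3}, o, new int[]{0, 1}) would give ABXCYZ. Everything is held in memory, so this only suits files that fit comfortably in the heap.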

Related

How to construct predicates dynamically in Java

I am not sure if that is possible or not and after a lot of research I ended up here to ask for your help or even guidance.
So, let's say I have a JSON array that contains 10 different types of objects. This JSON is retrieved through an API with sports games.
What I need to do is filter these objects in my application. I am using Java, and so far I have decided to use stream filter and predicates. I am aware that I can create different types of predicates and pass them to the stream.filter() function, but is it possible to do this dynamically somehow?
For example, I need to filter this array by time. This predicate will be
return p -> p.getTime() > 1;
And then:
return match.stream().filter( predicate ).collect(Collectors.<Match>toList());
What if another filter has a different condition, such as the team name? Is it possible to add the other predicate somehow, and also to put an "AND" or "OR" condition between the two? I need to do this dynamically, using one filter function with different predicates.
Is there a way to build something like a custom query, store it in a database, then retrieve it and use it as a predicate? Or can the predicate itself be stored in a database?
If I am completely wrong on this, please guide me to another way to do it. Otherwise, any help would be appreciated. Thank you and happy new year to all. :)
This is an interesting problem, and I think it is not an uncommon one, considering data lake scenarios.
I think, as suggested in a comment above, the way to go is to use a Predicate. You can build a predicate that combines the conditions with AND or OR and then supply it to the stream processor. Like this (assuming that you have a base class Data to which you have mapped your API output):
/* Create the predicate with the conditions. Showing 2 here with an "AND" combination. */
Predicate<Data> p = d -> d.getTime() > 1;
p = p.and( d -> d.getName().equals( "Football" ) ); // and() returns a new predicate, so reassign it; use ".or()" here if that is what you need.
/* Supply this predicate to the stream processor. */
match.stream().filter( p ).collect(Collectors.<Match>toList());
Using an and() call is the same as calling .filter() one after the other on the stream processor. Something like this:
stream.filter(...).filter(...)...
So, you will be able to construct such a stream call in a for loop.
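To build the combined predicate from a list whose size is only known at run time, one option is to reduce the list with Predicate::and or Predicate::or. This is just a sketch under assumptions: the Match class below is a hypothetical stand-in for the one in the question, and allOf, anyOf and filter are made-up helper names.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical helper class, for illustration only.
class PredicateSketch {

    // Hypothetical stand-in for the Match class from the question.
    static class Match {
        private final String name;
        private final int time;

        Match(String name, int time) {
            this.name = name;
            this.time = time;
        }

        String getName() { return name; }
        int getTime() { return time; }
    }

    // Combines all conditions with AND; an empty list accepts everything.
    static <T> Predicate<T> allOf(List<Predicate<T>> conditions) {
        return conditions.stream().reduce(x -> true, Predicate::and);
    }

    // Combines all conditions with OR; an empty list rejects everything.
    static <T> Predicate<T> anyOf(List<Predicate<T>> conditions) {
        return conditions.stream().reduce(x -> false, Predicate::or);
    }

    // Applies a single, already combined predicate to the list.
    static List<Match> filter(List<Match> matches, Predicate<Match> p) {
        return matches.stream().filter(p).collect(Collectors.toList());
    }
}
```

The list of conditions can then be filled from user input or configuration, for example conds.add(m -> m.getTime() > 1), and passed as filter(matches, allOf(conds)) or filter(matches, anyOf(conds)).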
Is there a way to build something like a custom query, store it in a database, then retrieve it and use it as a predicate? Or can the predicate itself be stored in a database?
You may do this within your Predicate itself. That is, instead of writing the logic as shown above, you can make a database call to fetch your Java code. However, you will then have to compile it dynamically using JavaCompiler, which may be a bit complicated. Alternatively, you may consider a JVM-based scripting language like Groovy for such things.

JPA query from java Object

How can I use JPA to query over a plain object (not an entity)?
For example, this simple pseudocode:
String[] theList = {"a", "b", "c", "d"};
Query q = new Query("Select tl from theList tl");
The reason behind this: the queries are created and executed dynamically, but the objects in the FROM clause of the JPQL query aren't necessarily mapped tables. In some cases they are just plain objects. So what I actually need is to modify the query while the program is running so that it meets the criteria, but I don't know how to modify the query.
Edit: I don't use native queries because of code portability. They would be the last option.
What you're looking for is called LINQ, and unfortunately (?) it is available only on .NET (in C#, for example).
However, you can partially emulate it with Stream(s).
A Stream offers basically all the operators you need:
.filter(): WHERE
.max(): MAX
.sorted(): ORDER BY
.limit(): LIMIT
.skip(): OFFSET
.collect(groupingBy()): GROUP BY
And so on. Just have a look at the Javadoc!
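As a rough illustration of that mapping, here is a sketch that runs two SQL-like queries over a plain String array. The class and method names (StreamQuerySketch, topMatches, countByLength) are invented for this example, and the SQL in the comments is only an approximate equivalent.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper class, for illustration only.
class StreamQuerySketch {

    // Roughly: SELECT s FROM theList s WHERE length(s) > minLen ORDER BY s DESC LIMIT n
    static List<String> topMatches(String[] theList, int minLen, int n) {
        return Arrays.stream(theList)
                .filter(s -> s.length() > minLen)    // WHERE
                .sorted(Comparator.reverseOrder())   // ORDER BY ... DESC
                .limit(n)                            // LIMIT
                .collect(Collectors.toList());
    }

    // Roughly: SELECT length(s), count(*) FROM theList s GROUP BY length(s)
    static Map<Integer, Long> countByLength(String[] theList) {
        return Arrays.stream(theList)
                .collect(Collectors.groupingBy(String::length, Collectors.counting()));
    }
}
```

Unlike a JPQL string, the "query" here is fixed at compile time; to vary it at run time you would pass in the predicates and comparators as parameters.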
I think Spring's JdbcTemplate would meet your requirement.
JdbcTemplate gives you the flexibility to run native queries and map them to a Java class.
However, you'll have to explicitly map your Java class with the column names in the database.
I solved this using JoSQL. It is a powerful open-source tool that lets you query Java objects using "SQL". It is not JPA, but it satisfied my needs.
Another tool I have seen that does this is Querydsl.

How to fetch data from multiples tables in Hibernate?

I am new to Hibernate and want to know a few things. I want to implement the following query in Hibernate; please guide me.
SELECT p.num_is_active
FROM ins.cnfgtr_user_log t, ins.service_user_auth p
WHERE t.source = 'GC'
and t.tokenid = p.txt_auth_token
and t.sessionid = 100000000195756
and t.userid = p.txt_user_id
and t.userid = 'MASTERADMIN'
I also want to know: do I have to maintain two separate POJOs for these two tables? Do these POJOs need to be complete? I mean, do they need to contain all the columns of the tables, or can they contain only the ones needed for this query?
Q: Do I have to maintain two separate POJOs for these two tables?
Answer: Yes, you are supposed to. In ORM, each table is represented by its own POJO for modularity reasons.
Q: Do these POJOs need to be complete?
Answer: They need not be, except that columns marked as “not null” must be included. You can use JPA/Hibernate joins for the query itself.
Hope this is helpful!

How do I perform a query like this in Hibernate? Joins

I have several object classes mapped to a database using annotations and need some help working out how to put together a Hibernate query to get the results I want.
I'm using Hibernate 3.6.5. I've been using Criteria, but happy with Query etc if it works!
I'm new to Hibernate (can manage simple Criteria to filter objects by property but the join stuff is all new), so any explanation in the answer (or suggested reading) would be great.
A RawRead has a tagcode field which contains a String.
Checkpoint, IncidentItem and Guard classes all also have a TagCode property.
I want to retrieve all RawRead objects where the TagCode doesn't match any tagcode value from any of the other classes (IncidentItem, Guard, Checkpoint).
A sort of brain dump/pseudo SQL code:
select raw.* from
RAWREADS raw, checkpoints c, GUARDS g, INCIDENTITEMS i
where
raw.tagcode != c.TAGNO
and raw.TAGCODE != g.IDTAG
and raw.TAGCODE != i.IDTAG;
I realise that won't be efficient etc, just an illustration of my thoughts.
Can you suggest what to look at in Hibernate language?
EDIT/Additions:
The RawRead object is mapped to Guard and Checkpoint (it has a property called checkpoint and one called guard that are both instances of those classes; both are @ManyToOne).
IncidentItem does not have any mapping to the other classes.
In order to join objects in HQL there has to be a relationship mapped between them in the annotations at the application level. If there are no mapped relationships, you'll need to do a query like this in plain SQL.

Best Java data structure to store a 3 column Oracle table? 3 column array? or double map?

What is the best data structure to store an Oracle table that's about 140 rows by 3 columns? I was thinking about a multidimensional array.
By best I do not necessarily mean most efficient (but I'd be curious to know your opinions), since the program will run as a job with plenty of time to run, but I do have some restrictions:
It is possible for multiple keys to be "null" at first, so the first column might have multiple null values. I also need to be able to access elements from the other columns. Is there anything better than a linear search to access the data?
So again, something like [][][] would work, but is there something like a 3 column map where I can access by the key or by the second column? I know maps have only two values.
All data will probably be strings or cast as strings.
Thanks
A custom class with 3 fields, and a java.util.List of that class.
There's no benefit in shoe-horning data into arrays in this case: you get no improvement in performance, and certainly no improvement in code maintainability.
This is another example of people writing FORTRAN in an object-oriented language.
Java's about objects. You'd be much better off if you started using objects to abstract your problem, hide details away from clients, and reduce coupling.
What sensible object, with meaningful behavior, do those three items represent? I'd start with that, and worry about the data structures and persistence later.
All data will probably be strings or cast as strings.
This is fine if they really are strings, but I'd encourage you to look deeper and see if you can do better.
For example, if you write an application that uses credit scores you might be tempted to persist it as a number column in a database. But you can benefit from looking at the problem harder and encapsulating that value into a CreditScore object. When you have that, you realize that you can add something like units ("FICO" versus "TransUnion"), scale (range from 0 to 850), and maybe some rich behavior (e.g., rules governing when to reorder the score). You encapsulate everything into a single object instead of scattering the logic for operating on credit scores all over your code base.
Start thinking less in terms of tables and columns and more about objects. Or switch languages. Python has the notion of tuples built in. Maybe that will work better for you.
If you need to access your data by key and by another key, then I would just use 2 maps for that and define a separate class to hold your record.
class Record {
String field1;
String field2;
String field3;
}
and
Map<String, Record> firstKeyMap = new HashMap<String, Record>();
Map<String, Record> secondKeyMap = new HashMap<String, Record>();
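Since the question mentions that several rows may share a null first column, a variant worth considering is to let each map hold a list of records per key, so that duplicates do not overwrite each other. This is a sketch only; RecordIndexSketch, Row and the finder methods are hypothetical names, and it relies on HashMap accepting null as a key.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, for illustration only.
class RecordIndexSketch {

    // One table row; the field names are placeholders for the real column names.
    static class Row {
        final String first;
        final String second;
        final String third;

        Row(String first, String second, String third) {
            this.first = first;
            this.second = second;
            this.third = third;
        }
    }

    // Each index maps a column value to every row carrying that value.
    private final Map<String, List<Row>> byFirst = new HashMap<>();
    private final Map<String, List<Row>> bySecond = new HashMap<>();

    // Registers the row in both indexes; HashMap permits a null key.
    void add(Row row) {
        byFirst.computeIfAbsent(row.first, k -> new ArrayList<>()).add(row);
        bySecond.computeIfAbsent(row.second, k -> new ArrayList<>()).add(row);
    }

    // Returns all rows whose first column equals key (possibly null), or an empty list.
    List<Row> findByFirst(String key) {
        return byFirst.getOrDefault(key, Collections.emptyList());
    }

    // Returns all rows whose second column equals key (possibly null), or an empty list.
    List<Row> findBySecond(String key) {
        return bySecond.getOrDefault(key, Collections.emptyList());
    }
}
```

Lookups are then hash lookups instead of a linear search, at the cost of keeping two indexes in sync; for about 140 rows either approach is fast enough, so this is mainly about readability.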
I'd create an object that maps to your record and then create a collection of these objects.
