JPA - find by multiple attributes in collections of objects

I have an event object with following attributes:
class Event {
String name;
String location;
LocalDateTime date;
String description;
}
Let's say I get a list of events from a web API:
List<Event> events = getEvents(); // e.g. 5 events
And now I want to check how many of these events I already have in my DB.
An event is unique if the combination of its name, location and date is unique.
So basically I want to create a query that does this:
Optional<Event> getByNameAndLocationAndDate(String name, String location, LocalDateTime date);
but for a list of items in just one query. Something like:
Optional<Event> getByNameAndLocationAndDate(List<Event> events);
Is it possible with JPA?

There is no built-in or particularly pretty way of doing this, but you can generate the query in a loop:
public List<Event> getByNameAndLocationAndDate(List<Event> events) {
    if (events.isEmpty()) {
        return new ArrayList<>();
    }
    // Build one "(name and location and date)" group per event, joined with "or".
    final StringBuilder queryBuilder = new StringBuilder("select e from Event e where ");
    for (int i = 0; i < events.size(); i++) {
        if (i > 0) {
            queryBuilder.append("or");
        }
        queryBuilder.append(" (e.name = :name" + i);
        queryBuilder.append(" and e.location = :location" + i);
        queryBuilder.append(" and e.date = :date" + i + ") ");
    }
    // Passing the result class is required to get a TypedQuery<Event>.
    final TypedQuery<Event> query = em.createQuery(queryBuilder.toString(), Event.class);
    // Bind the numbered parameters in the same order in which they were generated.
    for (int j = 0; j < events.size(); j++) {
        final Event event = events.get(j);
        query.setParameter("name" + j, event.getName());
        query.setParameter("location" + j, event.getLocation());
        query.setParameter("date" + j, event.getDate());
    }
    return query.getResultList();
}
Like I said, not very pretty. It might be better with the Criteria API. Then again, unless you have very strict requirements for execution speed, you might be better off looping through the list and checking one event at a time. That runs more queries against the database, but the code is much prettier.
Edit: Here is an attempt using the Criteria API. I haven't used it much and put this together by googling, so there is no guarantee it works as is.
public List<Event> getByNameAndLocationAndDate(List<Event> events) {
if (events.isEmpty()) {
return new ArrayList<>();
}
final CriteriaBuilder cb = em.getCriteriaBuilder();
final CriteriaQuery<Event> query = cb.createQuery(Event.class);
final Root<Event> root = query.from(Event.class);
final List<Predicate> predicates = events.stream().map(event -> {
return cb.and(cb.equal(root.get("name"), event.getName()),
cb.equal(root.get("location"), event.getLocation()),
cb.equal(root.get("date"), event.getDate()));
}).collect(Collectors.toList());
query.select(root).where(cb.or(predicates.toArray(new Predicate[]{})));
return em.createQuery(query).getResultList();
}

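Try
List<Event> findByNameInAndLocationInAndDateIn(List<String> names, List<String> locations, List<LocalDateTime> dates);
Note that this returns a list, not a single event, and that the three IN conditions are independent: it matches any stored event whose name, location and date each appear somewhere in the lists, even if that exact combination was not in your input. If you need to verify whether one specific event is missing from the database, you have to search one by one or compare the returned events against your input afterwards.
I'm not sure this behaves the way you want, so treat it as a pre-filter at best.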
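Because a query with three separate IN conditions can return events whose exact combination was never in the input, the result usually has to be narrowed down again. The following is a minimal sketch of that second step, assuming the candidates have already been loaded; the `Event` and `EventKey` records and the `alreadyStored` method are hypothetical stand-ins, not part of the question's code.

```java
import java.time.LocalDateTime;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class EventMatcher {
    // Hypothetical stand-in for the Event entity from the question.
    record Event(String name, String location, LocalDateTime date, String description) {}

    // Identity of an event: only name, location and date count, not the description.
    record EventKey(String name, String location, LocalDateTime date) {
        static EventKey of(Event e) {
            return new EventKey(e.name(), e.location(), e.date());
        }
    }

    // Keeps only the candidates whose (name, location, date) triple occurs in the incoming list.
    static List<Event> alreadyStored(List<Event> incoming, List<Event> candidates) {
        Set<EventKey> wanted = incoming.stream().map(EventKey::of).collect(Collectors.toSet());
        return candidates.stream()
                .filter(candidate -> wanted.contains(EventKey.of(candidate)))
                .collect(Collectors.toList());
    }
}
```

Records provide `equals` and `hashCode` over all their components, which is what makes the set lookup compare the full triple.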

Related

Filter the query result only and not the entire table

I am new to spring and I am trying to do something like this. Let us say my table has the following columns.
task
is_completed
completed_at
The user will provide the following options in the query parameters.
is_completed=true or false
from_date = dd-mm-yyyy
to_date = dd-mm-yyyy
I will check for each parameter one by one and then filter the table.
In Django, I can do something like this:
tasks = Task.objects.all() # All the tasks will be stored in tasks
tasks = tasks.filter(is_completed=True) # completed tasks will be filtered from all tasks
tasks = tasks.filter(completed_at__gte=from_date, completed_at__lte=to_date) # completed tasks will be filtered based on the completed date
How can I achieve this with Spring JPA? Is there any way I can save the filtered results and query the filtered results again instead of querying the entire database?
That way I can check whether each parameter has a value and filter like this:
if (is_completed == true) {
// filter completed tasks
}
if (from_date ){
// filter completed tasks that are completed from or after this date
}
if (to_date ){
// filter completed tasks that are completed till this date
}
The problem with the current approach is I have to write SQL queries for each combination. This becomes complex when there are multiple parameters.

Let's assume this is your Task class.
class Task {
private String id;
private boolean isCompleted;
private LocalDate isCompletedAt;
public Task(String id, boolean isCompleted, LocalDate isCompletedAt) {
this.id = id;
this.isCompleted = isCompleted;
this.isCompletedAt = isCompletedAt;
}
public boolean isCompleted() {
return isCompleted;
}
public LocalDate getCompletedAt() {
return isCompletedAt;
}
@Override
public String toString() {
return "Task{" +
"id='" + id + '\'' +
", isCompleted=" + isCompleted +
", isCompletedAt=" + isCompletedAt +
'}';
}
}
The code below filters the data in memory, after it has been loaded from the repository:
class TaskFilter {
public static void main(String[] args) {
// Sample user input
boolean isCompleted = true;
LocalDate fromDate = LocalDate.parse("2020-11-04");
LocalDate toDate = LocalDate.parse("2021-11-06");
// Simulate data retrieved from JPA repository eg repository.findAll()
List<Task> tasks = List.of(
new Task("1", true, LocalDate.parse("2020-10-04")),
new Task("2", false, LocalDate.parse("2010-12-02")),
new Task("3", false, LocalDate.parse("2021-04-24")),
new Task("4", true, LocalDate.parse("2021-03-12"))
);
// Create a stream on retrieved data
Stream<Task> tasksStream = tasks.stream();
// Filter that stream based on user input
if(isCompleted) {
tasksStream = tasksStream.filter(Task::isCompleted);
}
// Both bounds are inclusive ("from or after", "till this date"), so negate the opposite comparison.
if(fromDate != null) {
tasksStream = tasksStream.filter(task -> !task.getCompletedAt().isBefore(fromDate));
}
if(toDate != null) {
tasksStream = tasksStream.filter(task -> !task.getCompletedAt().isAfter(toDate));
}
// Finally collect that in a list. This is a must operation, because stream is not executed unless terminal operation is called.
List<Task> filteredTaskList = tasksStream.collect(Collectors.toList());
System.out.println(filteredTaskList);
}
}
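If the filtering should happen in the database rather than in memory, one option is to assemble the query text from whichever parameters are present, which avoids writing one query per combination. This is only a sketch under assumptions: the class `TaskQueryBuilder`, the record `TaskQuery` and the entity attribute names `completed` and `completedAt` are hypothetical and would have to match your actual entity mapping.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TaskQueryBuilder {
    // Holds the generated JPQL text and the named parameters to bind to it.
    record TaskQuery(String jpql, Map<String, Object> params) {}

    // Each argument may be null, meaning "do not filter on this".
    static TaskQuery build(Boolean completed, LocalDate fromDate, LocalDate toDate) {
        List<String> conditions = new ArrayList<>();
        Map<String, Object> params = new LinkedHashMap<>();
        if (completed != null) {
            conditions.add("t.completed = :completed"); // assumed attribute name
            params.put("completed", completed);
        }
        if (fromDate != null) {
            conditions.add("t.completedAt >= :fromDate"); // assumed attribute name
            params.put("fromDate", fromDate);
        }
        if (toDate != null) {
            conditions.add("t.completedAt <= :toDate");
            params.put("toDate", toDate);
        }
        String jpql = "select t from Task t";
        if (!conditions.isEmpty()) {
            jpql += " where " + String.join(" and ", conditions);
        }
        return new TaskQuery(jpql, params);
    }
}
```

The result would then be passed to `em.createQuery(jpql, Task.class)`, with one `setParameter(name, value)` call per map entry. User input only ever ends up in the parameter map, never in the query text itself.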

Web scraping using multithreading

I wrote some code to look up movie names on IMDB, but if I search for "Harry Potter", for instance, I will find more than one movie. I would like to use multithreading, but I don't have much knowledge in this area.
I am using the strategy design pattern to search several websites, and inside one of the methods I have this code:
for (Element element : elements) {
String searchedUrl = element.select("a").attr("href");
String movieName = element.select("h2").text();
if (movieName.matches(patternMatcher)) {
Result result = new Result();
result.setName(movieName);
result.setLink(searchedUrl);
result.setTitleProp(super.imdbConnection(movieName));
System.out.println(movieName + " " + searchedUrl);
resultList.add(result);
}
}
which, for each element (which is the movie name), will create a new connection to IMDB to look up ratings and other details, on the super.imdbConnection(movieName) line.
The problem is that I would like to open all the connections at the same time, because with 5-6 movies found the process takes much longer than expected.
I am not asking for code, I want some ideas. I thought about creating an inner class which implements Runnable and using it, but I don't see the point of that.
How can I rewrite that loop to use multithreading?
I am using Jsoup for parsing, Element and Elements are from that library.

The simplest way is parallelStream():
List<Result> resultList = elements.parallelStream()
.map(element -> {
String searchedUrl = element.select("a").attr("href");
String movieName = element.select("h2").text();
if(movieName.matches(patternMatcher)){
Result result = new Result();
result.setName(movieName);
result.setLink(searchedUrl);
result.setTitleProp(super.imdbConnection(movieName));
System.out.println(movieName + " " + searchedUrl);
return result;
}else{
return null;
}
}).filter(Objects::nonNull)
.collect(Collectors.toList());
If you don't like parallelStream() and want to manage the threads yourself, you can do this:
List<Element> elements = new ArrayList<>();
//create a function which returns an implementation of `Callable`
//input: Element
//output: Callable<Result>
//a lambda is used instead of an anonymous class so that `super` still refers to the parent of the enclosing class
Function<Element, Callable<Result>> scrapFunction = element -> () -> {
    String searchedUrl = element.select("a").attr("href");
    String movieName = element.select("h2").text();
    if (movieName.matches(patternMatcher)) {
        Result result = new Result();
        result.setName(movieName);
        result.setLink(searchedUrl);
        result.setTitleProp(super.imdbConnection(movieName));
        System.out.println(movieName + " " + searchedUrl);
        return result;
    } else {
        return null;
    }
};
//create a fixed pool of threads (at least one, because a pool size of 0 is rejected)
ExecutorService executor = Executors.newFixedThreadPool(Math.max(1, elements.size()));
//submit a Callable<Result> for every Element
//by using scrapFunction.apply(...)
List<Future<Result>> futures = elements.stream()
        .map(e -> executor.submit(scrapFunction.apply(e)))
        .collect(Collectors.toList());
//collect all results from Callable<Result>
List<Result> resultList = futures.stream()
        .map(e -> {
            try {
                return e.get();
            } catch (Exception ignored) {
                return null;
            }
        }).filter(Objects::nonNull)
        .collect(Collectors.toList());
//stop the pool once all results are in, otherwise its threads keep the JVM alive
executor.shutdown();
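The same idea can also be written with CompletableFuture, which keeps the submit and collect steps short. This is a minimal sketch rather than a drop-in replacement: `ParallelLookup`, `lookupAll` and the `lookup` function are hypothetical names, and `lookup` stands in for the slow per-movie call such as `imdbConnection`.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

class ParallelLookup {
    // Runs `lookup` for every name on a bounded pool and returns the results in input order.
    static <R> List<R> lookupAll(List<String> names, Function<String, R> lookup, int maxThreads) {
        int threads = Math.max(1, Math.min(maxThreads, names.size()));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Start all lookups first so that they run concurrently...
            List<CompletableFuture<R>> futures = names.stream()
                    .map(name -> CompletableFuture.supplyAsync(() -> lookup.apply(name), pool))
                    .collect(Collectors.toList());
            // ...and only then wait for each of them.
            return futures.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }
}
```

Capping the pool size keeps the number of simultaneous connections to the remote site bounded, which tends to matter more for scraping than squeezing out the last bit of parallelism.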

How to query list of objects grouped by their count (by frequency of mentions for last day)

The principle is like on Stack Overflow: every question has tags, and I need to display these tags ordered by how often they were mentioned during the last day.
public List<TagDto> getAllTagsByCount() {
List<TagDto> tagDtos = new ArrayList<>();
try {
tagDtos = entityManager.createQuery("SELECT t.id, t.name FROM Tag t") // have no idea how to write such query
.unwrap(Query.class)
.setResultTransformer(new ResultTransformer() {
@Override
public Object transformTuple(Object[] objects, String[] strings) {
return TagDto.builder()
.id((Long) objects[0])
.name((String) objects[1])
.build();
}
@Override
public List transformList(List list) {
return list;
}
})
.getResultList();
} catch (Exception e) {
e.printStackTrace();
}
return tagDtos;
}
If you need any additional code, please let me know.

What you are looking for is probably something like the following (add an ORDER BY on the count if the tags should come back sorted by frequency):
List<Tuple> tuples = entityManager.createQuery("SELECT t.name, COUNT(*) FROM Tag t WHERE t.date BETWEEN :yesterday AND :tomorrow GROUP BY t.name", Tuple.class)
.setParameter("yesterday", LocalDateTime.now().minus(1, ChronoUnit.DAYS).with(LocalTime.of(0, 0, 0)))
.setParameter("tomorrow", LocalDateTime.now().plus(1, ChronoUnit.DAYS).with(LocalTime.of(0, 0, 0)))
.getResultList();
You do not seem to have basic SQL knowledge though, so I would recommend you try to learn SQL first.
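To make the semantics of such a grouped count concrete, here is an in-memory equivalent written with streams: filter by time, group by tag name, count, then sort by descending count. It is only an illustration, not a replacement for the database query; the `Mention` record and the `countSince` method are hypothetical.

```java
import java.time.LocalDateTime;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class TagFrequency {
    // Hypothetical record of one tag usage: the tag name and when it was used.
    record Mention(String tagName, LocalDateTime usedAt) {}

    // Counts mentions per tag name since `since`, ordered by descending count, then by name.
    static List<Map.Entry<String, Long>> countSince(List<Mention> mentions, LocalDateTime since) {
        return mentions.stream()
                .filter(m -> !m.usedAt().isBefore(since))                                // WHERE
                .collect(Collectors.groupingBy(Mention::tagName, Collectors.counting())) // GROUP BY + COUNT
                .entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey()))                      // ORDER BY
                .collect(Collectors.toList());
    }
}
```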

Mapping several columns from sql to a java object

I am trying to retrieve and process code from JIRA, unfortunately the pieces of information (which are in the Metadata-Plugin) are saved in a column, not a row.
Picture of JIRA-MySQL-Database
The goal is to save this in an object with following attributes:
public class DesiredObject {
private String Object_Key;
private String Aze.kunde.name;
private Long Aze.kunde.schluessel;
private String Aze.projekt.name;
private Long Aze.projekt.schluessel
//getters and setters here
}
My workbench is STS and it's a Spring-Boot-Application.
I can fetch a List of Object-Keys with the JRJC using:
JiraController jiraconnect = new JiraController();
List<JiraProject> jiraprojects = new ArrayList<JiraProject>();
jiraprojects = jiraconnect.findJiraProjects();
This is perfectly working, also the USER_KEY and USER_VALUE are easily retrievable, but I hope there is a better way than to perform
three SQL-Searches for each project and then somehow build an object from all those lists.
I was starting with
for (JiraProject jp : jiraprojects) {
String SQL = "select * from jira_metadata where ENRICHED_OBJECT_KEY = ?";
List<DesiredObject> do = jdbcTemplateObject.query(SQL, new Object[] { "com.atlassian.jira.project.Project:" + jp.getProjectkey() }, XXX);
}
to get a list with every object, but I'm stuck because I can't figure out a mapper (XXX) that is able to write this into an object.
Usually I go with
object.setter(rs.getString("SQL-Column"));
But that isn't working, as all my columns are called the same. (USER_KEY & USER_VALUE)
The Database is automatically created by JIRA, so I can't "fix" it.
The Object_Keys are unique which is why I tried to use those to collect all the data from my SQL-Table.
I hope all you need to enlighten me is in this post, if not feel free to ask for more!
Edit: Don't worry if you see both 'project' and 'projekt'; that's because I gave most of my classes German names and descriptions.

I created a HashMap keyed by the object key prefixed with a unique token in brackets, e.g. "(1)JIRA".
String SQL = "select * from ao_cc6aeb_jira_metadata";
List<JiraImportObjekt> jioList = jdbcTemplateObject.query(SQL, new JiraImportObjektMapper());
HashMap<String, String> hmap = new HashMap<String, String>();
Integer unique = 1;
for (JiraImportObjekt jio : jioList) {
hmap.put("(" + unique.toString() + ")" + jio.getEnriched_Object_Key(),
jio.getUser_Key() + "(" + jio.getUser_Value() + ")");
unique++;
}
I changed this into a TreeMap:
Map<String, String> tmap = new TreeMap<String, String>(hmap);
And then I iterated through that TreeMap via
String aktuProj = new String();
for (String s : tmap.keySet()) {
if (aktuProj.equals(s.replaceAll("\\([^\\(]*\\)", ""))) {
} else { //Add Element to list and start new Element }
//a lot of other stuff
}
What I did was to put all the data in the right order, iterate through and process everything like I wanted it.
Object hinfo = hmap.get(s);
if (hinfo.toString().replaceAll("\\([^\\(]*\\)", "").equals("aze.kunde.schluessel")) {
Matcher m = Pattern.compile("\\(([^)]+)\\)").matcher(hinfo.toString());
while (m.find()) {
jmo[obj].setAzeKundeSchluessel(Long.parseLong(m.group(1), 10));
// logger.info("AzeKundeSchluessel: " +
// jmo[obj].getAzeKundeSchluessel());
}
} else ...
After the loop I needed to add the last Element.
Now I have a List with the Elements which is easy to use and ready for further steps.
I cut out a lot of code because most of it is customized for my problem.. the roadmap should be enough to solve it though.
Good luck!
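The grouping step described above can also be done without the bracketed tokens and regular expressions, by collecting the key/value rows into one map per object key. This is a minimal sketch assuming each row has already been read into a small holder; `MetadataGrouper`, `Row` and `groupByObjectKey` are hypothetical names, not part of the original code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MetadataGrouper {
    // Hypothetical stand-in for one table row: object key, USER_KEY and USER_VALUE.
    record Row(String objectKey, String userKey, String userValue) {}

    // Collects all key/value rows that share an object key into one map per object.
    static Map<String, Map<String, String>> groupByObjectKey(List<Row> rows) {
        Map<String, Map<String, String>> grouped = new LinkedHashMap<>();
        for (Row row : rows) {
            grouped.computeIfAbsent(row.objectKey(), key -> new LinkedHashMap<>())
                   .put(row.userKey(), row.userValue());
        }
        return grouped;
    }
}
```

Each inner map then holds entries such as `aze.kunde.name` and `aze.kunde.schluessel` for one project, so the target object can be filled with plain lookups and `Long.parseLong` where a number is expected.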

Solr Performance for many documents query

I want to have Solr always retrieve all documents found by a search (I know Solr wasn't built for that, but anyways) and I am currently doing this with this code:
...
QueryResponse response = solr.query(query);
int offset = 0;
int totalResults = (int) response.getResults().getNumFound();
List<Article> ret = new ArrayList<Article>((int) totalResults);
query.setRows(FETCH_SIZE);
while(offset < totalResults) {
//requires an int? wtf?
query.setStart((int) offset);
int left = totalResults - offset;
if(left < FETCH_SIZE) {
query.setRows(left);
}
response = solr.query(query);
List<Article> current = response.getBeans(Article.class);
offset += current.size();
ret.addAll(current);
}
...
This works, but it is pretty slow if a query gets over 1000 hits (I've read about that on here: it is caused by Solr, because I am setting the start every time, which for some reason takes some time). What would be a nicer (and faster) way to do this?

To improve the suggested answer you could use a streamed response. This has been added especially for the case that one fetches all results. As you can see in Solr's Jira that guy wants to do the same as you do. This has been implemented for Solr 4.
This is also described in Solrj's javadoc.
Solr will pack the response and create a whole XML/JSON document before it starts sending the response. Then your client is required to unpack all that and offer it as a list to you. By using streaming and parallel processing, which you can do when using such a queued approach, the performance should improve further.
Yes, you will lose the automatic bean mapping, but as performance is a factor here, I think this is acceptable.
Here is a sample unit test:
public class StreamingTest {
@Test
public void streaming() throws SolrServerException, IOException, InterruptedException {
HttpSolrServer server = new HttpSolrServer("http://your-server");
SolrQuery tmpQuery = new SolrQuery("your query");
tmpQuery.setRows(Integer.MAX_VALUE);
final BlockingQueue<SolrDocument> tmpQueue = new LinkedBlockingQueue<SolrDocument>();
server.queryAndStreamResponse(tmpQuery, new MyCallbackHander(tmpQueue));
SolrDocument tmpDoc;
do {
tmpDoc = tmpQueue.take();
} while (!(tmpDoc instanceof PoisonDoc));
}
private class PoisonDoc extends SolrDocument {
// marker to finish queuing
}
private class MyCallbackHander extends StreamingResponseCallback {
private BlockingQueue<SolrDocument> queue;
private long currentPosition;
private long numFound;
public MyCallbackHander(BlockingQueue<SolrDocument> aQueue) {
queue = aQueue;
}
@Override
public void streamDocListInfo(long aNumFound, long aStart, Float aMaxScore) {
// called before start of streaming
// probably use for some statistics
currentPosition = aStart;
numFound = aNumFound;
if (numFound == 0) {
queue.add(new PoisonDoc());
}
}
@Override
public void streamSolrDocument(SolrDocument aDoc) {
currentPosition++;
System.out.println("adding doc " + currentPosition + " of " + numFound);
queue.add(aDoc);
if (currentPosition == numFound) {
queue.add(new PoisonDoc());
}
}
}
}

You might improve performance by increasing FETCH_SIZE. Since you are getting all the results, pagination doesn't make sense unless you are concerned with memory or some such. If 1000 results are liable to cause a memory overflow, I'd say your current performance seems pretty outstanding though.
So I would try getting everything at once, simplifying this to something like:
//WHOLE_BUNCHES is a constant representing a reasonable max number of docs we want to pull here.
//Integer.MAX_VALUE would probably invite an OutOfMemoryError, but that would be true of the
//implementation in the question anyway, since they were still being stored in the list at the end.
query.setRows(WHOLE_BUNCHES);
QueryResponse response = solr.query(query);
int totalResults = (int) response.getResults().getNumFound(); //If you even still need this figure.
List<Article> ret = response.getBeans(Article.class);
If you need to keep the pagination though:
You are performing this first query:
QueryResponse response = solr.query(query);
and are populating the number of found results from it, but you are not pulling any results with the response. Even if you keep pagination here, you could at least eliminate one extra query here.
This:
int left = totalResults - offset;
if(left < FETCH_SIZE) {
query.setRows(left);
}
is unnecessary. setRows specifies a maximum number of rows to return, so asking for more than are available won't cause any problems.
Finally, apropos of nothing, but I have to ask: what argument would you expect setStart to take if not an int?
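Putting the pagination points together, the loop can be reduced to "keep asking for full pages until a short one comes back", which needs neither the initial count query nor the adjustment of the last page size. The sketch below is deliberately independent of SolrJ: `PagedFetcher`, `fetchAll` and the `fetchPage` function are hypothetical, with `fetchPage` standing in for a call that sets start and rows on the query and returns the mapped results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

class PagedFetcher {
    // Fetches every item by paging; `fetchPage` takes (start, rows) and may return fewer than `rows` items.
    static <T> List<T> fetchAll(BiFunction<Integer, Integer, List<T>> fetchPage, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<T> all = new ArrayList<>();
        int offset = 0;
        while (true) {
            List<T> page = fetchPage.apply(offset, pageSize);
            all.addAll(page);
            offset += page.size();
            // A short (or empty) page means there is nothing left to fetch.
            if (page.size() < pageSize) {
                return all;
            }
        }
    }
}
```

The trade-off is one extra, empty request when the total happens to be an exact multiple of the page size.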

Use the logic below to fetch the Solr data in batches:
public List<Map<String, Object>> getData(int id,Set<String> fields){
final int SOLR_QUERY_MAX_ROWS = 3;
long start = System.currentTimeMillis();
SolrQuery query = new SolrQuery();
String queryStr = "id:" + id;
LOG.info(queryStr);
query.setQuery(queryStr);
query.setRows(SOLR_QUERY_MAX_ROWS);
QueryResponse rsp = server.query(query, SolrRequest.METHOD.POST);
List<Map<String, Object>> mapList = null;
if (rsp != null) {
long total = rsp.getResults().getNumFound();
System.out.println("Total count found: " + total);
// Solr query batch
mapList = new ArrayList<Map<String, Object>>();
// The first response already holds the first batch.
addAllData(mapList, rsp,fields);
// Fetch the remaining batches; the last one may be shorter than SOLR_QUERY_MAX_ROWS.
for (int start = SOLR_QUERY_MAX_ROWS; start < total; start += SOLR_QUERY_MAX_ROWS) {
query.setStart(start);
rsp = server.query(query, SolrRequest.METHOD.POST);
addAllData(mapList, rsp,fields);
}
}
long end = System.currentTimeMillis();
LOG.debug("SOLR Performance: getData: " + (end - start));
return mapList;
}
private void addAllData(List<Map<String, Object>> mapList, QueryResponse rsp,Set<String> fields) {
for (SolrDocument sdoc : rsp.getResults()) {
Map<String, Object> map = new HashMap<String, Object>();
for (String field : fields) {
map.put(field, sdoc.getFieldValue(field));
}
mapList.add(map);
}
}
