Is it possible to get all process or task variables using TaskService:
processEngine.getTaskService().createTaskQuery().list();
I know it is possible to get variables via
processEngine.getTaskService().getVariable()
or
processEngine.getRuntimeService().getVariable()
but each of the operations above goes to the database. If I have a list of 100 tasks, I'll make 100 queries to the DB. I don't want to use this approach.
Is there any other way to get task or process related variables?
Unfortunately, there is no way to do that via the "official" query API! However, what you could do is write a custom MyBatis query as described here:
https://app.camunda.com/confluence/display/foxUserGuide/Performance+Tuning+with+custom+Queries
(Note: Everything described in the article also works for bare Activiti; you do not need the fox engine for that!)
This way you can write a query which selects tasks along with their variables in one step. At my company we used this solution, as we had exactly the same performance problem.
A drawback of this solution is that custom queries need to be maintained. For instance, if you upgrade your Activiti version, you will need to ensure that your custom query still fits the database schema (e.g., via integration tests).
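To make the shape of such a query concrete, here is a minimal sketch of an annotation-based MyBatis mapper that fetches tasks together with their variables in one round trip. The mapper and DTO names are assumptions, and the mapper still has to be registered with the engine's MyBatis configuration as the article describes; only the ACT_RU_TASK and ACT_RU_VARIABLE tables and their columns come from the Activiti schema:

import java.util.List;
import org.apache.ibatis.annotations.Select;

// Hypothetical row holder: one row per (task, variable) pair.
class TaskWithVariable {
    public String taskId;
    public String taskName;
    public String varName;
    public String varText; // TEXT_ only covers string-like variable values
}

// Hypothetical mapper interface.
public interface TaskVariableMapper {

    // Joins on TASK_ID_ to get task-scoped variables; join on
    // PROC_INST_ID_ instead if you need process-scoped variables.
    @Select("SELECT t.ID_ AS taskId, t.NAME_ AS taskName, "
            + "v.NAME_ AS varName, v.TEXT_ AS varText "
            + "FROM ACT_RU_TASK t "
            + "LEFT JOIN ACT_RU_VARIABLE v ON v.TASK_ID_ = t.ID_")
    List<TaskWithVariable> selectTasksWithVariables();
}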
If it is not possible to use the API, as elsvene says, you can query the database yourself. Activiti has several tables in the database.
There is act_ru_variable, where the currently running processes store their variables. For already finished processes there is act_hi_procvariable. You can probably find a detailed explanation of what is in each table in the Activiti user guide.
So you just need to run queries like:
SELECT *
FROM act_ru_variable
WHERE *Something*
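For example, to list the variables of a single running process instance (the column names are from the Activiti schema; the instance id is just a placeholder):

SELECT NAME_, TEXT_
FROM act_ru_variable
WHERE PROC_INST_ID_ = '12345'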
The following test sends a value object (Person) to a process which just adds a few tracking infos for demonstration.
I had the same problem: I needed to get the value object back after executing the service, to do some validation in my test.
The following piece of code shows the execution and the retrieval of the task variables after the execution has finished.
@Test
public void justATest() {
    // Send a value object (Person) into the process as a variable
    Map<String, Object> inVariables = new HashMap<String, Object>();
    Person person = new Person();
    person.setName("Jens");
    inVariables.put("person", person);
    ProcessInstance processInstance = runtimeService.startProcessInstanceByKey("event01", inVariables);
    String processDefinitionId = processInstance.getProcessDefinitionId();
    String id = processInstance.getId();
    System.out.println("id " + id + " " + processDefinitionId);

    // Read the variables back from the history service after execution
    List<HistoricVariableInstance> outVariables =
            historyService.createHistoricVariableInstanceQuery().processInstanceId(id).list();
    for (HistoricVariableInstance historicVariableInstance : outVariables) {
        String variableName = historicVariableInstance.getVariableName();
        System.out.println(variableName);
        Person person1 = (Person) historicVariableInstance.getValue();
        System.out.println(person1.toString());
    }
}
I have written an application to scrape a huge set of reviews. For each review I store the review itself in Review_Table(User_Id, Trail_Id, Rating), the user (Id, Username, UserLink) and the Trail, which is built previously in the code (Id, ...60 other attributes):
for (Element card : reviewCards) {
    String userName = card.select("expression").text();
    String userLink = card.select("expression").attr("href");
    String userRatingString = card.select("expression").attr("aria-label");
    Double userRating;
    if (userRatingString.equals("NaN Stars")) {
        userRating = 0.0;
    } else {
        userRating = Double.parseDouble(userRatingString.replaceAll("[^0-9.]", ""));
    }
    User u;
    Rating r;
    // probably this is the bottleneck
    if (userService.getByUserLink(userLink) != null) {
        u = new User(userName, userLink, new HashSet<Rating>());
        r = Rating.builder()
                .user(u)
                .userRating(userRating)
                .trail(t)
                .build();
    } else {
        u = userService.getByUserLink(userLink);
        r = Rating.builder()
                .user(u)
                .userRating(userRating)
                .trail(t)
                .build();
    }
    i = i + 1;
    ratingSet.add(r);
    userSet.add(u);
}
saveToDb(userSet, t, link, ratingSet);
savedEntities = savedEntities + 1;
log.info(savedEntities + " Saved Entities");
}
The code works fine for small to medium-sized datasets, but I hit a huge bottleneck for larger ones. Suppose I have 13K user entities already stored in the Postgres DB and another batch of 8500 reviews comes in to be scraped: I have to check for every review whether the user of that review is already stored. This takes forever.
I tried to define an index on the UserLink attribute in Postgres, but the speed didn't improve at all.
I tried to load all the users stored in the DB into a set and use the contains method to check whether a particular user already exists (this way I thought I could bypass the database bottleneck of 8K reads and writes, but in a risky way, because if there were too many users in the DB table I would run into a memory overflow). The speed, again, didn't improve.
At this point I don't have any other ideas to improve this.
Well, for one, you would certainly benefit from not querying for each user individually in a loop. What you can do is query and cache only the UserLink or UserName values; that is, fetch and cache the complete set of just one of them, because that is all you seem to need for the check in the if-else.
You can actually query for individual fields with a Spring Data JPA @Query, either directly or with Spring Data JPA projections to fetch a subset of fields if needed, and cache and use those for the lookup. If you think the users could run into millions or billions, you could consider a distributed cache like Apache Ignite, where your collection could scale easily.
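As a minimal sketch of that lookup (the User entity is from the question; the UserRepository name and method are assumptions):

import java.util.Set;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;

public interface UserRepository extends JpaRepository<User, Long> {

    // Fetch only the userLink column once; the resulting Set can be
    // cached in memory and gives O(1) "already stored?" lookups.
    @Query("select u.userLink from User u")
    Set<String> findAllUserLinks();
}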
Btw, the if-else branches seem to be inverted, are they not?
Next, don't store each review individually, which is what the above code appears to do. You can write in batches. Also, since you are using Postgres, you can use the CopyManager provided by the Postgres JDBC driver for bulk data transfer, wiring it in via a Spring Data custom repository. For example, you can keep writing to a local text/csv file on a set schedule (every x minutes), use it to write that batch to the table (after those x minutes), and then remove the file. This would be really quick.
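A minimal sketch of such a bulk write, assuming a ratings(user_id, trail_id, rating) table; only the CopyManager API itself comes from the Postgres JDBC driver:

import java.io.StringReader;
import java.sql.Connection;
import javax.sql.DataSource;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class RatingBulkWriter {

    private final DataSource dataSource;

    public RatingBulkWriter(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // Streams a CSV batch into the table in a single COPY round trip.
    public long copyRatingsCsv(String csv) throws Exception {
        try (Connection conn = dataSource.getConnection()) {
            CopyManager copyManager = conn.unwrap(PGConnection.class).getCopyAPI();
            return copyManager.copyIn(
                    "COPY ratings (user_id, trail_id, rating) FROM STDIN WITH (FORMAT csv)",
                    new StringReader(csv));
        }
    }
}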
The other option is to write a stored procedure that combines the above and invoke it, again via a custom repository.
Please let me know which one you would like elaborated.
UPDATE (Jan 12 2022):
One other item I missed: when you query for UserLink or UserName you can use a very efficient form of select query that Postgres supports, instead of using an IN clause, like below:
#Select("select u from user u where u.userLink = ANY('{:userLinks}'::varchar[])", nativeQuery = true)
List<Users> getUsersByLinks(#Param("userLinks") String[] userLinks);
I have to check for changes in an old embedded DBF database which is populated by an old third-party application. I don't have access to the source code of that application and cannot put a trigger or anything similar on the database. Due to business constraints I cannot change that...
My objective is to capture new records, deleted records and modified records from a table (~1500 records) of that database with a Java application for further processing. The database is accessible in my Spring application through JPA/Hibernate with the HXTT DBF driver.
I am looking now for a way to efficiently capture changes made by the third-party app in the database.
Do I have to periodically read the whole table and check whether each record is still unchanged, or apply some kind of diff between two readings? Is there a kind of "trigger" I can set up in my Java app? How do I listen properly for those changes?
There is no JPA mechanism for getting callbacks from a database when the data changes.
The only option is to build your own change detection. Typically you would start by detecting which entities were added, which were removed, and which still exist. For the ones that still exist you will need to check whether they changed, so the entity needs an equals() method.
An entity is identified by its primary key, so you will need to get the set of all primary keys; once you have that, you can easily use Guava's Sets methods to produce the 3 sets of added, removed, and existing (before and now), like this:
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;
import com.google.common.collect.Sets;

List<MyEntity> old = new ArrayList<>();     // loaded from the DB last time
List<MyEntity> current = new ArrayList<>(); // loaded from the DB now

Map<Long, MyEntity> oldMap = old.stream().collect(Collectors.toMap(MyEntity::getId, Function.identity()));
Map<Long, MyEntity> currentMap = current.stream().collect(Collectors.toMap(MyEntity::getId, Function.identity()));

Set<Long> oldKeys = oldMap.keySet();
Set<Long> currentKeys = currentMap.keySet();

Sets.SetView<Long> deletedKeys = Sets.difference(oldKeys, currentKeys);
Sets.SetView<Long> addedKeys = Sets.difference(currentKeys, oldKeys);
Sets.SetView<Long> couldBeChanged = Sets.intersection(oldKeys, currentKeys);

for (Long id : couldBeChanged) {
    if (!oldMap.get(id).equals(currentMap.get(id))) {
        // entity with this id was changed
    }
}
I'm new to DynamoDB and I'm struggling to work out how to do this (using the Java SDK).
I currently have a table (in Mongo) for notifications. The schema is basically as follows (I've simplified it):
id: string
notifiedUsers: [123, 345, 456, 567]
message: "this is a message"
created: 12345678000 (epoch millis)
I wanted to migrate to DynamoDB, but I can't work out the best way to select all notifications that went to a particular user after a certain date.
I gather I can't have an index on a list like notifiedUsers, therefore I can't use a query in this case - is that correct?
I'd prefer not to scan and then filter, there could be a lot of records.
Is there a way to do this using a query or another approach?
EDIT
This is what I'm trying now; it's not working and I'm not sure where to take it (if anywhere):
Condition rangeKeyCondition = new Condition()
        .withComparisonOperator(ComparisonOperator.CONTAINS.toString())
        .withAttributeValueList(new AttributeValue().withS(userId));

if (startTimestamp != null) {
    rangeKeyCondition = rangeKeyCondition
            .withComparisonOperator(ComparisonOperator.GT.toString())
            .withAttributeValueList(new AttributeValue().withS(startTimestamp));
}

NotificationFeedDynamoRecord replyKey = new NotificationFeedDynamoRecord();
replyKey.setId(partitionKey);

DynamoDBQueryExpression<NotificationFeedDynamoRecord> queryExpression =
        new DynamoDBQueryExpression<NotificationFeedDynamoRecord>()
                .withHashKeyValues(replyKey)
                .withRangeKeyCondition(NOTIFICATIONS, rangeKeyCondition);
In case anyone else comes across this question: in the end we flattened the schema, so that there is now a record per userId. This has led to problems, because with DynamoDB it's not possible to atomically batch-write records. With the original schema we had one record and could write it atomically, ensuring that all users got the notification. Now we cannot be certain, and this is causing pain.
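For reference, under the flattened schema the per-user query becomes a plain key condition. A minimal sketch, assuming a hypothetical UserNotificationRecord mapped class with userId as the hash key and created (epoch millis, stored as a number) as the range key, plus an initialized DynamoDBMapper named mapper:

// 'created' is a numeric sort key, so GT works as an "after this date" filter
Condition afterDate = new Condition()
        .withComparisonOperator(ComparisonOperator.GT.toString())
        .withAttributeValueList(new AttributeValue().withN(String.valueOf(startTimestamp)));

UserNotificationRecord hashKey = new UserNotificationRecord();
hashKey.setUserId(userId);

DynamoDBQueryExpression<UserNotificationRecord> queryExpression =
        new DynamoDBQueryExpression<UserNotificationRecord>()
                .withHashKeyValues(hashKey)
                .withRangeKeyCondition("created", afterDate);

List<UserNotificationRecord> notifications = mapper.query(UserNotificationRecord.class, queryExpression);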
I want to fetch the 10 latest records from the BATCH_JOB_EXECUTION table joined with the BATCH_JOB_INSTANCE table.
So how can I access these tables?
In this application I have used Spring Data JPA. It's another application which uses Spring Batch and created these tables. In other words, I would just like to run a JOIN query and map the result directly to my custom object with just the necessary fields. As far as possible, I would like to avoid making separate models for the two tables. But I don't know the best approach here.
If you want to do it from Spring Batch code you need to use JobExplorer and apply filters on either START_TIME or END_TIME. Alternatively, just send an SQL query with your desired JOIN to the DB using JDBC. The DDLs of the metadata tables can be found here.
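A minimal sketch of the JDBC route with Spring's JdbcTemplate; the JobSummary holder class is an assumption, the table and column names come from the Spring Batch metadata schema, and LIMIT assumes a database that supports it:

// Maps the 10 latest executions (plus the job name from the instance table)
// straight into a custom holder, without JPA entities for the two tables.
List<JobSummary> latest = jdbcTemplate.query(
        "SELECT i.JOB_NAME, e.START_TIME, e.END_TIME, e.STATUS " +
        "FROM BATCH_JOB_EXECUTION e " +
        "JOIN BATCH_JOB_INSTANCE i ON i.JOB_INSTANCE_ID = e.JOB_INSTANCE_ID " +
        "ORDER BY e.START_TIME DESC LIMIT 10",
        (rs, rowNum) -> new JobSummary(
                rs.getString("JOB_NAME"),
                rs.getTimestamp("START_TIME"),
                rs.getTimestamp("END_TIME"),
                rs.getString("STATUS")));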
EDIT
If you want to try to do it in Spring Batch, I guess you need to iterate through the JobExecutions and find the ones that interest you, then do your thing )) Something like:
List<JobInstance> jobInstances = jobExplorer.getJobInstances(jobName);
for (JobInstance jobInstance : jobInstances) {
    List<JobExecution> jobExecutions = jobExplorer.getJobExecutions(jobInstance);
    for (JobExecution jobExecution : jobExecutions) {
        if (true /* jobExecution.getWhatever()... */) {
            // do your thing...
        }
    }
}
Good Luck!
Since JobExplorer no longer has the method getJobInstances(jobName), I have done this (this example uses BatchStatus as the condition), adapted with streams:
List<JobInstance> lastExecutedJobs = jobExplorer.getJobInstances(jobName, 0, Integer.MAX_VALUE);
Optional<JobExecution> jobExecution = lastExecutedJobs
        .stream()
        .map(jobExplorer::getJobExecutions)
        .flatMap(List::stream)
        .filter(je -> BatchStatus.COMPLETED.equals(je.getStatus()))
        .findFirst();
To return N elements, you could use other capabilities of streams (limit, max, collectors, ...).
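For instance, a variant of the pipeline above that collects the ten most recent completed executions instead of just the first; the explicit sort is there so we don't rely on any ordering from the explorer (completed executions always have a start time):

List<JobExecution> latestCompleted = lastExecutedJobs
        .stream()
        .map(jobExplorer::getJobExecutions)
        .flatMap(List::stream)
        .filter(je -> BatchStatus.COMPLETED.equals(je.getStatus()))
        .sorted(Comparator.comparing(JobExecution::getStartTime).reversed())
        .limit(10)
        .collect(Collectors.toList());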
I'm looking for a solution to query completed tasks in Activiti, filtering on the completion date. Because completed task entries are moved into the act_hi_taskinst table by the BPMN engine once they're finished, I would have expected the required filters to be in the HistoricTaskInstanceQuery class. However, there is nothing like the startedAfter/startedBefore and finishedAfter/finishedBefore methods of HistoricProcessInstanceQuery. The table has the start_time_ and end_time_ columns, so there's no reason why this kind of query should not be possible.
Is there another way to filter by these properties, or is the only way to get around this currently to query the act_hi_taskinst table directly, bypassing the Activiti engine?
Activiti provides a native query API, so there is no need to query act_hi_taskinst directly.
Your query may look like this one:
NativeHistoricTaskInstanceQuery taskQuery = historyService.createNativeHistoricTaskInstanceQuery();
taskQuery.sql("SELECT * FROM " + managementService.getTableName(HistoricTaskInstance.class) + " WHERE start_time_ = #{startTime} AND end_time_ = #{endTime}");
taskQuery.parameter("startTime", startTime).parameter("endTime", endTime);
List<HistoricTaskInstance> tasks = taskQuery.list();