Query batch job metadata in Spring Batch - Java

I want to fetch the 10 latest records from the BATCH_JOB_EXECUTION-table joined with the BATCH_JOB_INSTANCE-table.
So how can I access these tables?
In this application I have used Spring Data JPA. It's another application which uses Spring Batch and created these tables. In other words, I would just like to run a JOIN query and map the result directly to a custom object with just the necessary fields. As far as possible, I would like to avoid creating separate models for the two tables, but I don't know the best approach here.

If you want to do it from Spring Batch code, you need to use JobExplorer and apply filters on either START_TIME or END_TIME. Alternatively, just send an SQL query with your desired JOIN to the database over JDBC. The DDL of the metadata tables can be found here.
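For the plain-SQL route, a query along these lines returns the 10 latest executions joined with their instances (column names follow the Spring Batch metadata schema; the LIMIT clause is MySQL/PostgreSQL syntax, other databases use FETCH FIRST or TOP):

```sql
SELECT i.JOB_NAME,
       e.JOB_EXECUTION_ID,
       e.START_TIME,
       e.END_TIME,
       e.STATUS,
       e.EXIT_CODE
FROM BATCH_JOB_EXECUTION e
JOIN BATCH_JOB_INSTANCE i
  ON e.JOB_INSTANCE_ID = i.JOB_INSTANCE_ID
ORDER BY e.START_TIME DESC
LIMIT 10;
```

Run through JdbcTemplate with a RowMapper, each row can be mapped straight onto a custom object, so no separate entity models for the two tables are needed.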
EDIT
If you want to try to do it in Spring Batch, I guess you need to iterate through the JobExecutions, find the ones that interest you, and then do your thing )) Something like:
List<JobInstance> jobInstances = jobExplorer.getJobInstances(jobName);
for (JobInstance jobInstance : jobInstances) {
    List<JobExecution> jobExecutions = jobExplorer.getJobExecutions(jobInstance);
    for (JobExecution jobExecution : jobExecutions) {
        if (true /* jobExecution.getWhatever()... */) {
            // do your thing...
        }
    }
}
Good Luck!

Since JobExplorer no longer has a getJobInstances(jobName) method, I have done this (this example uses BatchStatus as the condition), adapted with streams:
List<JobInstance> lastExecutedJobs = jobExplorer.getJobInstances(jobName, 0, Integer.MAX_VALUE);
Optional<JobExecution> jobExecution = lastExecutedJobs
        .stream()
        .map(jobExplorer::getJobExecutions)
        .flatMap(List::stream)
        .filter(je -> BatchStatus.COMPLETED.equals(je.getStatus()))
        .findFirst();
To return N elements, you can use other Stream capabilities (limit, max, collectors, ...).
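To make the "latest N" concrete, here is a self-contained sketch of the same stream pipeline on plain stand-in objects (the Execution record and its fields are placeholders for JobExecution, not Spring Batch types):

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class LatestExecutions {
    // Stand-in for JobExecution: just an end time and a status string.
    record Execution(long endTime, String status) {}

    // Returns the n most recent COMPLETED executions, newest first.
    static List<Execution> latestCompleted(List<Execution> all, int n) {
        return all.stream()
                .filter(e -> "COMPLETED".equals(e.status()))
                .sorted(Comparator.comparingLong(Execution::endTime).reversed())
                .limit(n)
                .collect(Collectors.toList());
    }
}
```

Swapping `findFirst()` for `sorted(...).limit(n)` is the only change needed in the original pipeline to go from one result to the latest N.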


Change where clause for batch delete

I would like to create a batch delete, something like:
DELETE t WHERE t.my_attribute = ?
First try was:
private void deleteRecord() {
    // loop
    final MyRecord myRecord = new MyRecord();
    myRecord.setMyAttribute(1234);
    getDslContext().batchDelete(myRecord).execute();
}
But here the SQL always contains the primary key instead of my attribute.
My second try was to create a delete statement with a bind value, but I found no way to create a where clause with ?:
// loop
getDslContext().delete(MY_RECORD).where(???)
    .bind(12234);
Can anybody help me further?
The DELETE statement itself
Just add your comparison predicate as you would in SQL:
getDslContext()
.delete(T)
.where(T.MY_ATTRIBUTE.eq(12234))
.execute();
This is assuming you are using the code generator, so you can static import your com.example.generated.Tables.T table reference.
Batching that
You have two options of batching such statements in jOOQ:
1. Using the explicit batch API
As explained here, create a query with a dummy bind value as shown above, but don't execute it directly; use the Batch API instead:
// Assuming these are your input attributes
List<Integer> attributes = ...
Query query = getDslContext().delete(T).where(T.MY_ATTRIBUTE.eq(0));
getDslContext()
    .batch(query)
    .bind(attributes
        .stream().map(a -> new Object[] { a }).toArray(Object[][]::new)
    ).execute();
2. Collect individual executions in a batched connection
You can always use the convenient batched collection in jOOQ to transparently collect executed SQL and delay it into a batch:
getDslContext().batched(c -> {
    for (Integer attribute : attributes)
        c.dsl()
            .delete(T)
            .where(T.MY_ATTRIBUTE.eq(attribute))
            .execute(); // Doesn't execute the query yet
}); // Now the entire batch is executed
In the latter case, the SQL string might be re-generated for every single execution, so the former is probably better for simple batches.
Bulk execution
However, why batch when you can run a single query? Just do this, perhaps?
getDslContext()
.delete(T)
.where(T.MY_ATTRIBUTE.in(attributes))
.execute();
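One caveat with the single-query variant: some databases cap the number of elements in an IN list (Oracle's classic limit is 1000), so a very large attribute list may still need chunking. A plain-Java partition helper (Guava's Lists.partition does the same):

```java
import java.util.ArrayList;
import java.util.List;

public class Chunks {
    // Split a list into consecutive sublists of at most `size` elements.
    static <T> List<List<T>> partition(List<T> list, int size) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < list.size(); i += size) {
            chunks.add(list.subList(i, Math.min(i + size, list.size())));
        }
        return chunks;
    }
}
```

Each chunk can then be deleted with its own `where(T.MY_ATTRIBUTE.in(chunk))` statement.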

Spring + Hibernate loading large amounts of records

I'm trying to find the best/optimal way of loading larger amounts of data from a MySQL database in a Spring/Hibernate service.
I pull about 100k records from a 3rd-party API (in chunks of usually 300-1000). I then need to pull the translations for each record from the database; since there are 30 languages, there are 30 rows per record, so 1000 records from the API means 30,000 rows from the database.
The records from the API come in the form of POJOs (very small in size). Say I get 1000 records: I split the list into multiple 100-record lists, collect the ids of the records, and select all translations from the database for them. I only need two values from the table, which I then add to my POJOs before pushing the POJOs on to the next service.
Basically this:
interface i18nRepository extends CrudRepository<Translation, Long> {}

List<APIRecord> records = api.findRecords(...);
List<List<APIRecord>> partitioned = Lists.partition(records, 100); // Guava
for (List<APIRecord> chunk : partitioned) {
    List<Long> ids = new ArrayList<>();
    for (APIRecord record : chunk) {
        ids.add(record.getId());
    }
    List<Translation> translations = i18nRepository.findAllByRecordIdIn(ids);
    for (APIRecord record : chunk) {
        for (Translation translation : translations) {
            // compare with equals(): the ids are Longs, == compares references
            if (translation.getRecordId().equals(record.getId())) {
                record.addTranslation(translation);
            }
        }
    }
}
As far as Spring Boot/Hibernate properties go, I only have the defaults set. I would like to make this as efficient, fast, and memory-light as possible. One idea I had was to use a lower-level API instead of Hibernate to bypass the object mapping.
In my opinion, you should bypass JPA/Hibernate for bulk operations.
There's no way to make bulk operations efficient in JPA.
Consider using Spring's JpaTemplate and native SQL.
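Whichever data-access layer ends up being used, the nested loops in the question also scan every translation for every record, which is quadratic per chunk. Indexing the translations by record id first makes each lookup O(1); a minimal sketch with stand-in types (class and field names assumed, not the original entities):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TranslationIndex {
    // Stand-in for the Translation entity.
    record Translation(long recordId, String language, String text) {}

    // Group translations by record id so each APIRecord finds its
    // translations with a single map lookup instead of a full scan.
    static Map<Long, List<Translation>> byRecordId(List<Translation> translations) {
        return translations.stream()
                .collect(Collectors.groupingBy(Translation::recordId));
    }
}
```

In the loop, `index.getOrDefault(record.getId(), List.of())` then replaces the inner for-loop entirely.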

Faster way of updating database table using Hibernate (Java 8 reduction?)

I am working on a monitoring tool developed in Spring Boot using Hibernate as ORM.
I need to compare each row (already-persisted rows of sent messages) in my table and see whether a MailId (unique) has received feedback (status: OPENED, BOUNCED, DELIVERED, ...) or not.
I get the feedback by reading CSV files from a network folder. Parsing and reading the files goes very fast, but updating my database is very slow. My algorithm is not very efficient: I loop through a list that can contain hundreds of thousands of objects and look each one up in my table.
This is the method that performs the update by modifying the "target" object (a row in the database table):
@Override
public void updateTargetObjectFoo() throws CSVProcessingException, FileNotFoundException {
    // performProcessing reads the files in a folder, parses them into Java
    // objects, and maps them into a feedBackList of type Foo
    List<Foo> feedBackList = performProcessing(env.getProperty("foo_in"),
            EXPECTED_HEADER_FIELDS_STATUS, Foo.class, ".LETTERS.STATUS.");
    for (Foo foo : feedBackList) {
        // findByKey does a simple SELECT in MySQL where MailId = foo.getMailId()
        Foo persistedFoo = fooDao.findByKey(foo.getMailId());
        if (persistedFoo != null) {
            persistedFoo.setStatus(foo.getStatus());
            persistedFoo.setDnsCode(foo.getDnsCode());
            persistedFoo.setReturnDate(foo.getReturnDate());
            persistedFoo.setReturnTime(foo.getReturnTime());
            // saveAccount does a MySQL UPDATE of the (modified) persisted row
            fooDao.saveAccount(persistedFoo);
        }
    }
}
What if I did this selection/comparison and update on the Java side, and then re-updated the whole list in the database?
Would that be faster?
Thanks to all for your help.
Hibernate is not particularly well-suited for batch processing.
You may be better off using Spring's JdbcTemplate to do JDBC batch processing.
However, if you must do this via Hibernate, this may help: https://docs.jboss.org/hibernate/orm/5.2/userguide/html_single/chapters/batch/Batching.html
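Independent of the API chosen, the dominant cost above is the per-row SELECT inside the loop. Indexing the CSV feedback by MailId and loading all matching rows with one IN query removes it; a minimal sketch with stand-in types (class and method names hypothetical):

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class FeedbackIndex {
    // Stand-in for the parsed CSV feedback row.
    record Feedback(String mailId, String status) {}

    // Index CSV feedback by MailId. The persisted rows can then be loaded
    // with a single findByMailIdIn(index.keySet()) query and each row
    // matched against the index in O(1).
    static Map<String, Feedback> byMailId(List<Feedback> feedback) {
        return feedback.stream()
                .collect(Collectors.toMap(Feedback::mailId,
                        Function.identity(),
                        (first, second) -> second)); // keep the latest on duplicate MailIds
    }
}
```

Combined with JDBC batch updates (chunks of, say, 500), this turns N SELECTs and N single-row UPDATEs into a handful of round trips.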

find all items where a list field contains a value in dynamodb

I'm new to DynamoDB and I'm struggling to work out how to do this (using the Java SDK).
I currently have a table (in Mongo) for notifications. The schema is basically as follows (I've simplified it):
id: string
notifiedUsers: [123, 345, 456, 567]
message: "this is a message"
created: 12345678000 (epoch millis)
I want to migrate to DynamoDB, but I can't work out the best way to select all notifications that went to a particular user after a certain date.
I gather I can't have an index on a list field like notifiedUsers, and therefore can't use a query in this case - is that correct?
I'd prefer not to scan and then filter; there could be a lot of records.
Is there a way to do this using a query or another approach?
EDIT
This is what I'm trying now, it's not working and I'm not sure where to take it (if anywhere).
Condition rangeKeyCondition = new Condition()
        .withComparisonOperator(ComparisonOperator.CONTAINS.toString())
        .withAttributeValueList(new AttributeValue().withS(userId));
if (startTimestamp != null) {
    rangeKeyCondition = rangeKeyCondition
            .withComparisonOperator(ComparisonOperator.GT.toString())
            .withAttributeValueList(new AttributeValue().withS(startTimestamp));
}

NotificationFeedDynamoRecord replyKey = new NotificationFeedDynamoRecord();
replyKey.setId(partitionKey);

DynamoDBQueryExpression<NotificationFeedDynamoRecord> queryExpression =
        new DynamoDBQueryExpression<NotificationFeedDynamoRecord>()
                .withHashKeyValues(replyKey)
                .withRangeKeyCondition(NOTIFICATIONS, rangeKeyCondition);
In case anyone else comes across this question: in the end we flattened the schema, so that there is now one record per userId. This has led to problems, because DynamoDB cannot atomically batch-write records. With the original schema we had one record and could write it atomically, ensuring that all users got that notification. Now we cannot be certain, and this is causing pain.
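For reference, the flattened design turns the date filter into an ordinary key condition, so neither a scan nor a CONTAINS filter is needed (table and attribute names here are illustrative, not from the original schema):

```
Table: notifications
  Partition key: userId   (string)
  Sort key:      created  (number, epoch millis)

Query key condition: userId = :userId AND created > :startTimestamp
```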

Activiti BPM get Variables within Task

Is it possible to get all process or task variables using TaskService?
processEngine.getTaskService().createTaskQuery().list();
I know there is a way to get variables via
processEngine.getTaskService().getVariable()
or
processEngine.getRuntimeService().getVariable()
but each of the operations above goes to the database. If I have a list of 100 tasks, I'll make 100 queries to the DB. I don't want to use this approach.
Is there any other way to get task or process related variables?
Unfortunately, there is no way to do that via the "official" query API! However, what you could do is writing a custom MyBatis query as described here:
https://app.camunda.com/confluence/display/foxUserGuide/Performance+Tuning+with+custom+Queries
(Note: Everything described in the article also works for bare Activiti, you do not need the fox engine for that!)
This way you could write a query which selects tasks along with the variables in one step. At my company we used this solution as we had the exact same performance problem.
A drawback of this solution is that custom queries need to be maintained. For instance, if you upgrade your Activiti version, you will need to ensure that your custom query still fits the database schema (e.g., via integration tests).
If it is not possible to use the API, as elsvene says, you can query the database yourself. Activiti has several tables in the database.
You have act_ru_variable, where the currently running processes store their variables. For already finished processes you have act_hi_procvariable. You can probably find a detailed explanation of what is in each table in the Activiti user guide.
So you just need to make queries like
SELECT *
FROM act_ru_variable
WHERE *Something*
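For example, to fetch open tasks together with their variables in a single round trip, a join along these lines should work (exact column names depend on the Activiti version; the trailing underscores are Activiti's naming convention, so verify against your schema):

```sql
SELECT t.ID_   AS task_id,
       t.NAME_ AS task_name,
       v.NAME_ AS variable_name,
       v.TEXT_ AS variable_value
FROM ACT_RU_TASK t
JOIN ACT_RU_VARIABLE v
  ON v.PROC_INST_ID_ = t.PROC_INST_ID_;
```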
The following test sends a value object (Person) to a process, which just adds a few tracking infos for demonstration.
I had the same problem: I needed to get the value object after executing the service, to do some validation in my test.
The following piece of code shows the execution and the retrieval of the task variables after execution has finished.
@Test
public void justATest() {
    Map<String, Object> inVariables = new HashMap<String, Object>();
    Person person = new Person();
    person.setName("Jens");
    inVariables.put("person", person);
    ProcessInstance processInstance = runtimeService.startProcessInstanceByKey("event01", inVariables);
    String processDefinitionId = processInstance.getProcessDefinitionId();
    String id = processInstance.getId();
    System.out.println("id " + id + " " + processDefinitionId);
    List<HistoricVariableInstance> outVariables =
            historyService.createHistoricVariableInstanceQuery().processInstanceId(id).list();
    for (HistoricVariableInstance historicVariableInstance : outVariables) {
        String variableName = historicVariableInstance.getVariableName();
        System.out.println(variableName);
        Person person1 = (Person) historicVariableInstance.getValue();
        System.out.println(person1.toString());
    }
}
