I need to generate an Excel file based on the user's request.
Input:
DateRange - 2022/02/01-2022/02/07
Scenario
The system retrieves the logs from the database based on the DateRange. Each log contains a person's name and the date it was added. The system also retrieves the list of people from the database. After retrieving the logs and the people, I want to count how many times each person appears on each date.
Database Info:
logs table - 10k or more
person table - at least 1,500 people.
Expected output:
Problem
With the data above, counting the occurrences of each person can take 10,000 (logs) * 1,500 (people) = 15 million or more iterations. This makes the response very slow: it takes about 60 seconds or more.
Here is my code:
// initialize days
List<Date> days = getDaysFromRequest(); // get the range from request
for (Person person : getPersonList()) {
    // .... code here to display Persons
    for (Date day : days) {
        // .... code here to display day
        int total = 0;
        for (UserLog log : getUserLog()) {
            // compare with equals(); == only compares object references
            if (day.equals(log.dateAdded) && log.personName.equals(person.Name)) {
                total++;
            }
        }
        System.out.println(total); // write total here in the Excel sheet, e.g. at address B2
    }
}
How should I optimize this?
If I understand correctly, all the information you need is in the logs, and any missing count defaults to zero. Therefore I would do something like:
// Map<personName, Map<dateAdded, count>>
Map<String, Map<LocalDate, Long>> occurrenceByNameAndDate =
    userLogs.stream().collect(Collectors.groupingBy(log -> log.personName,
        Collectors.groupingBy(log -> log.dateAdded,
            Collectors.counting())));
and then use the map like this:
personList.forEach(person -> dateRange.forEach(day -> {
    // a person or day with no logs falls back to a count of zero
    long count = occurrenceByNameAndDate
        .getOrDefault(person.Name, Collections.emptyMap())
        .getOrDefault(day, 0L);
    writeToExcel(person, day, count);
}));
Or do it on the DB side
SELECT personName, dateAdded,COUNT(*)
FROM UserLog
WHERE dateAdded between(...)
GROUP BY personName,dateAdded
I have a list that stores objects of type GoldNetValue, each containing a date and a gold rate. Consecutive records in the list are normally 10 minutes apart, but in some cases no data is available for a particular interval.
Sample values:
{GoldNetValue[2018-03-02 13:20, 87], GoldNetValue[2018-03-02 13:30, 86.4], GoldNetValue[2018-03-02 13:40, 85.6], GoldNetValue[2018-03-02 13:50, 85.8], GoldNetValue[2018-03-02 14:10, 86.1], GoldNetValue[2018-03-02 14:30, 86.8]}
I need to loop through the list, create a new GoldNetValue object for each missing date with a noDataAvailable flag enabled, and insert it back into the list. The expected interval is always 10 minutes.
int diffMins = 10;
Date tempDate = new Date();
for (int i = 0; i < goldNetList.size(); i++)
{
    GoldNetValue goldValue = (GoldNetValue) goldNetList.get(i);
    if (goldValue.getDate() != null && goldValue.getGoldRate() != null)
    {
        tempDate = goldValue.getDate();
    }
    if() // logic yet to be implemented
}
Let's say that from 13:30 to 13:50 only one record is available. I need to create an object with the date 13:40 and noDataFlag enabled, and store it back in the list.
I am a newbie who has just started to learn coding.
How can I iterate through the list and create flagged objects for these kinds of gaps?
Thank you for your time
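One possible way to fill the gaps is sketched below. It rests on several assumptions that are not in the question: the list is sorted by date in ascending order, timestamps fall exactly on the 10-minute grid, and java.time.LocalDateTime is used instead of java.util.Date. The GoldNetValue class shown here is a hypothetical, simplified stand-in for the real one, and GapFiller and fillGaps are illustrative names.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

final class GapFiller {

    // Hypothetical stand-in for the GoldNetValue class from the question.
    static final class GoldNetValue {
        final LocalDateTime date;
        final Double goldRate;          // null when no data is available
        final boolean noDataAvailable;

        GoldNetValue(LocalDateTime date, Double goldRate, boolean noDataAvailable) {
            this.date = date;
            this.goldRate = goldRate;
            this.noDataAvailable = noDataAvailable;
        }
    }

    // Returns a new list in which every missing slot between two consecutive
    // records is filled with a placeholder whose noDataAvailable flag is set.
    // Assumes the input is sorted by date in ascending order.
    static List<GoldNetValue> fillGaps(List<GoldNetValue> sorted, int stepMinutes) {
        List<GoldNetValue> result = new ArrayList<>();
        LocalDateTime expected = null;
        for (GoldNetValue value : sorted) {
            // Emit placeholders until we reach the timestamp of the current record.
            while (expected != null && expected.isBefore(value.date)) {
                result.add(new GoldNetValue(expected, null, true));
                expected = expected.plusMinutes(stepMinutes);
            }
            result.add(value);
            expected = value.date.plusMinutes(stepMinutes);
        }
        return result;
    }

    public static void main(String[] args) {
        List<GoldNetValue> input = List.of(
            new GoldNetValue(LocalDateTime.of(2018, 3, 2, 13, 50), 85.8, false),
            new GoldNetValue(LocalDateTime.of(2018, 3, 2, 14, 10), 86.1, false),
            new GoldNetValue(LocalDateTime.of(2018, 3, 2, 14, 30), 86.8, false));
        for (GoldNetValue v : fillGaps(input, 10)) {
            System.out.println(v.date + " " + (v.noDataAvailable ? "no data" : v.goldRate));
        }
    }
}
```

Building a new list avoids inserting into the list while iterating over it, which is easy to get wrong with index-based loops.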
I have built a method which takes two datasets: dataset1 and retirementSimpleData.
It matches the two datasets based on a primary key/number, defined as cnum, in the code below.
I wanted to return the value of the difference between the getAssets value and the getSums value, and this is working, except for one little problem.
Some of the cnums that exist in dataset1 don't exist in retirementSimpleData. Similarly, some cnums that exist in retirementSimpleData may not exist in dataset1. This results in no data being returned for that cnum.
I would like to implement two passes at the end which check in one direction to see if I missed anything. The second pass would check in the opposite direction.
However, I am not sure how I would go about implementing this.
public void getReportData(int index) {
    String date1 = Util.dateTimeToShortString(reportDate);
    String date2 = reportDao.getPreviousRetirementDate(date1);
    List<SurveyCompareAssetsCheckData> dataset1 = reportDao.getSurveyCheckCompareAssetsData(date1);
    List<RetSurveyAssets> retirementSimpleData = reportDao.findRetSurveyByDate(date1);
    for (SurveyCompareAssetsCheckData surveyCompareAssetsCheckData : dataset1) {
        for (RetSurveyAssets surveyCompareAssetsCheckData2 : retirementSimpleData) {
            // only rows whose cnum exists in both datasets are updated
            if (surveyCompareAssetsCheckData.getCnum() == surveyCompareAssetsCheckData2.getCnum()) {
                surveyCompareAssetsCheckData.setRetirementsimple(surveyCompareAssetsCheckData2.getSums());
                surveyCompareAssetsCheckData.setDifference(surveyCompareAssetsCheckData.getAssets() - surveyCompareAssetsCheckData2.getSums());
            }
        }
    }
}
Caveat: dataset1 and retirementSimpleData both come from existing SQL queries which I am not allowed to touch; otherwise I would simply have defined new SQL for these methods in my "DAOImpl." Therefore, I have to work with the data I am getting and check for this programmatically.
Below is the report generated by my code. As you can see, I am ending up with zeros: the difference is shown (incorrectly) as zero because cnum #45 in this example simply doesn't exist in the second dataset (retirementSimpleData).
What is the data type of cnum? If it is int, its default value is zero.
You have to add else-if conditions to check for that, for example:
else if (surveyCompareAssetsCheckData2.getCnum() == 0) {
    // ... logic here ...
}
else if (surveyCompareAssetsCheckData.getCnum() == 0) {
    // ... logic here ...
}
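The two passes described in the question could also be sketched with a map keyed by cnum, which avoids the nested loop and makes missing rows explicit. This is only an illustration under assumptions: SurveyRow, RetirementRow and ReportRow are hypothetical, simplified stand-ins for the real classes, cnum is assumed to be an int that is unique within each dataset, and a missing side is represented as null rather than zero.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class CnumReconciler {

    // Hypothetical, simplified stand-ins for the two row types in the question.
    static final class SurveyRow {
        final int cnum;
        final double assets;
        SurveyRow(int cnum, double assets) { this.cnum = cnum; this.assets = assets; }
    }

    static final class RetirementRow {
        final int cnum;
        final double sums;
        RetirementRow(int cnum, double sums) { this.cnum = cnum; this.sums = sums; }
    }

    // One line of the report; a null side means the cnum was missing from that dataset.
    static final class ReportRow {
        final int cnum;
        final Double assets;
        final Double sums;
        ReportRow(int cnum, Double assets, Double sums) {
            this.cnum = cnum;
            this.assets = assets;
            this.sums = sums;
        }
        // The difference is only defined when both sides are present.
        Double difference() {
            return (assets == null || sums == null) ? null : assets - sums;
        }
    }

    static List<ReportRow> reconcile(List<SurveyRow> dataset1, List<RetirementRow> retirement) {
        // Index the second dataset by cnum so each lookup is a single map access.
        Map<Integer, RetirementRow> byCnum = new HashMap<>();
        for (RetirementRow r : retirement) {
            byCnum.put(r.cnum, r);
        }

        List<ReportRow> report = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();
        // Pass 1: every cnum in dataset1, whether or not it has a match.
        for (SurveyRow s : dataset1) {
            RetirementRow r = byCnum.get(s.cnum);
            report.add(new ReportRow(s.cnum, s.assets, r == null ? null : r.sums));
            seen.add(s.cnum);
        }
        // Pass 2: cnums that exist only in the retirement dataset.
        for (RetirementRow r : retirement) {
            if (!seen.contains(r.cnum)) {
                report.add(new ReportRow(r.cnum, null, r.sums));
            }
        }
        return report;
    }
}
```

Keeping the missing side as null lets the report distinguish "no matching row" from a genuine difference of zero, which is the confusion described in the question.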
I have a Cassandra server that is queried by another service, and I need to reduce the number of queries.
My first thought was to build a Bloom filter of the whole database every couple of minutes and send it to the service.
But since the database holds a couple of hundred gigabytes (and is expected to grow to a couple of terabytes), overloading the database every few minutes doesn't seem like a good idea.
After searching for a better solution for a while, I remembered that Cassandra maintains its own Bloom filters.
Is it possible to copy the *-Filter.db files and use them in my code instead of creating my own bloom filter?
I created a table named test:
CREATE TABLE test (
a int PRIMARY KEY,
b int
);
and inserted one row:
INSERT INTO test(a,b) VALUES(1, 10);
After flushing the data to disk, we can use the *-Filter.db file. In my case it was la-2-big-Filter.db.
Here is sample code to check whether a partition key exists:
Murmur3Partitioner partitioner = new Murmur3Partitioner();
try (DataInputStream in = new DataInputStream(new FileInputStream(new File("la-2-big-Filter.db")));
     IFilter filter = FilterFactory.deserialize(in, true)) {
    for (int i = 1; i <= 10; i++) {
        DecoratedKey decoratedKey = partitioner.decorateKey(Int32Type.instance.decompose(i));
        if (filter.isPresent(decoratedKey)) {
            System.out.println(i + " is present ");
        } else {
            System.out.println(i + " is not present ");
        }
    }
}
Output :
1 is present
2 is not present
3 is not present
4 is not present
5 is not present
6 is not present
7 is not present
8 is not present
9 is not present
10 is not present
I'm writing a simple Java program that does a simple task: it takes a folder of text files as input and returns the 5 most frequent words per document as output.
At first I tried to do it without any database support, but when I started having memory problems I changed my approach and configured the program to run with SQLite.
Everything works fine now, but just adding the words to the database takes a long time (67 seconds for 801 words).
Here is how I initialize the database:
this.Execute(
"CREATE TABLE words ("+
"word VARCHAR(20)"+
");"
);
this.Execute(
"CREATE UNIQUE INDEX wordindex ON words (word);"
);
Then, once the program has counted the documents in the folder (let's say N), I add N counter columns and N frequency columns to the table:
for (int i = 0; i < fileList.size(); i++)
{
    db.Execute("ALTER TABLE words ADD doc"+i+" INTEGER");
    db.Execute("ALTER TABLE words ADD freq"+i+" DOUBLE");
}
Finally, I add words using the following function:
public void AddWord(String word, int docid)
{
    String query = "UPDATE words SET doc"+docid+"=doc"+docid+"+1 WHERE word='"+word+"'";
    int rows = this.ExecuteUpdate(query);
    if (rows <= 0)
    {
        query = "INSERT INTO words (word,doc"+docid+") VALUES ('"+word+"',1)";
        this.ExecuteUpdate(query);
    }
}
Am I doing something wrong, or is it normal for an update query to take this long to execute?
Wrap all commands inside one transaction; otherwise you get one transaction (with the associated storage synchronization) per command.
12 per second is slow but not unreasonable. With a database like MySQL I would expect it to be closer to 100/second on an HDD.
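A minimal sketch of the single-transaction approach with plain JDBC might look like the following. The class and method names (WordBatchWriter, addWords) are hypothetical, the table layout is the one from the question, and the caller is assumed to supply an open java.sql.Connection to the SQLite database. The COALESCE guard is an addition, not part of the original code: columns created with ALTER TABLE ... ADD and no default start out as NULL, and NULL + 1 stays NULL.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class WordBatchWriter {

    // Adds all words of one document inside a single transaction.
    static void addWords(Connection conn, List<String> words, int docId) throws SQLException {
        String col = "doc" + docId; // built from an int, so it cannot carry user input
        String update = "UPDATE words SET " + col + " = COALESCE(" + col + ", 0) + 1 WHERE word = ?";
        String insert = "INSERT INTO words (word, " + col + ") VALUES (?, 1)";

        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // start one transaction for the whole batch
        try (PreparedStatement upd = conn.prepareStatement(update);
             PreparedStatement ins = conn.prepareStatement(insert)) {
            for (String word : words) {
                upd.setString(1, word);
                if (upd.executeUpdate() == 0) { // word is not in the table yet
                    ins.setString(1, word);
                    ins.executeUpdate();
                }
            }
            conn.commit(); // one commit for the batch instead of one per statement
        } catch (SQLException e) {
            conn.rollback(); // leave the table unchanged if anything failed
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

Binding the word through a PreparedStatement parameter also avoids the quoting problems that string concatenation has with words containing an apostrophe.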
I have a requirement for a drop-down containing some conditions on age,
such as: less than 10 days, between 10 and 30 days, between 1 and 3 months, between 4 and 12 months, between 1 and 2 years.
I have a domain class with an integer property age. I calculate the age from the DOB to the current date and store it in the DB. The search page has a criterion for searching by age, so how can I display these conditions in the drop-down, and when I select an option, how do I display the results based on age?
At present I am displaying all ages from the DB in the drop-down. Please find the code below and help me with this; if anything is unclear, please comment so that I can explain.
This is my drop-down containing all DOBs:
<td><span id="availableAge" ></span></td>
This is my script to get the DOBs from the controller with an AJAX call:
function generateAge(data){
    var list ="<select style='width:100px' id='age' name='age'><option value=''>-Select-</option>";
    var opt;
    for(var i=0; i<data.ageDetails.length; i++){
        opt = "<option value="+data.ageDetails[i].age+">";
        opt = opt+data.ageDetails[i].age;
        opt = opt+"</option>";
        list = list+opt;
    }
    list = list+"</select>";
    var listObj = document.getElementById("availableAge");
    if(listObj){
        listObj.innerHTML = list;
    }
}
It's a bad idea to store age in the DB, as it changes all the time; better to stick with the DOB.
As the option set is fixed, define something like an enum for it and use its values() to render the select:
enum AgeCriteriaEnum { NONE, LESS_THAN_10, BETWEEN_10_AND_30, ... so on }
and then just do a switch() like:
AgeCriteriaEnum ageEnum = AgeCriteriaEnum.valueOf(params.ageEnum)
Date today = new Date()
Patient.withCriteria {
    switch(ageEnum) {
        case AgeCriteriaEnum.NONE:
            break;
        case AgeCriteriaEnum.LESS_THAN_10:
            ge('dob', today-10)
            break;
        case AgeCriteriaEnum.BETWEEN_10_AND_30:
            lt('dob', today-10)
            ge('dob', today-30)
            break;
        //... so on
    }
}
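The same idea can be sketched in plain Java by letting each enum constant carry its own age range, so that both the drop-down labels and the DOB bounds come from one place. Everything here is an assumption for illustration: the AgeCriteria name, the labels, and the choice of inclusive day bounds (0 to 9 and 10 to 30) are not from the answer above, and only the first two buckets are shown.

```java
import java.time.LocalDate;

enum AgeCriteria {
    LESS_THAN_10("Less than 10 days", 0, 9),
    BETWEEN_10_AND_30("Between 10 and 30 days", 10, 30);
    // The remaining buckets from the question would follow the same pattern.

    final String label;  // text shown in the drop-down
    final int minDays;   // youngest age in the bucket, inclusive
    final int maxDays;   // oldest age in the bucket, inclusive

    AgeCriteria(String label, int minDays, int maxDays) {
        this.label = label;
        this.minDays = minDays;
        this.maxDays = maxDays;
    }

    // The oldest allowed age gives the earliest allowed date of birth.
    LocalDate earliestDob(LocalDate today) {
        return today.minusDays(maxDays);
    }

    // The youngest allowed age gives the latest allowed date of birth.
    LocalDate latestDob(LocalDate today) {
        return today.minusDays(minDays);
    }

    // True when the given date of birth falls inside this bucket.
    boolean matches(LocalDate dob, LocalDate today) {
        return !dob.isBefore(earliestDob(today)) && !dob.isAfter(latestDob(today));
    }
}
```

The select can then be rendered by iterating over AgeCriteria.values() and using name() as the option value and label as the option text, while earliestDob and latestDob supply the bounds for the database query.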