How should I store a date interval in Cassandra? - java

I'm working on an application that stores sensor measurements. Sometimes the sensors send erroneous measurements (e.g. the measured value is out of bounds). We do not want to persist each measurement error separately, but we do want to persist statistics about these errors, such as the sensor id, the date of the first error, the date of the last error, and other info like the number of successive errors, which I'll omit here...
Here is a simplified version of the "ErrorStatistic" class:
package foo.bar.repository;

import org.joda.time.DateTime;

import javax.annotation.Nonnull;
import javax.annotation.Nullable;

import static com.google.common.base.Preconditions.checkNotNull;

public class ErrorStatistic {

    @Nonnull
    private final String sensorId;

    @Nonnull
    private final DateTime startDate;

    @Nullable
    private DateTime endDate;

    public ErrorStatistic(@Nonnull String sensorId, @Nonnull DateTime startDate) {
        this.sensorId = checkNotNull(sensorId);
        this.startDate = checkNotNull(startDate);
        this.endDate = null;
    }

    @Nonnull
    public String getSensorId() {
        return sensorId;
    }

    @Nonnull
    public DateTime getStartDate() {
        return startDate;
    }

    @Nullable
    public DateTime getEndDate() {
        return endDate;
    }

    public void setEndDate(@Nonnull DateTime endDate) {
        this.endDate = checkNotNull(endDate);
    }
}
I am currently persisting these ErrorStatistic using Hector as follows:
private void persistErrorStatistic(ErrorStatistic errorStatistic) {
    Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
    String rowKey = errorStatistic.getSensorId();
    String columnName = errorStatistic.getStartDate().toString(YYYY_MM_DD_FORMATTER);
    byte[] value = serialize(errorStatistic);
    HColumn<String, byte[]> column = HFactory.createColumn(columnName, value, StringSerializer.get(), BytesArraySerializer.get());
    mutator.addInsertion(rowKey, COLUMN_FAMILY, column);
    mutator.execute();
}

private static final DateTimeFormatter YYYY_MM_DD_FORMATTER = DateTimeFormat.forPattern("yyyy-MM-dd");
When we receive the first measurement in error, we create an ErrorStatistic with sensorId and startDate set, and a null endDate. This ErrorStatistic is kept in our in-memory model, and persisted in Cassandra.
We then update the ErrorStatistic in memory for the next measurements in error, until we receive a valid measurement, at which point the ErrorStatistic is persisted and removed from our in-memory model.
Cassandra thus contains ErrorStatistics with open-ended intervals (e.g. [2012-08-01T00:00Z|null]), and closed intervals (e.g. [2012-08-01T00:00Z|2013-01-12T10:23Z]).
I want to be able to query these ErrorStatistics by date.
For example, if I have these 3 error statistics:
sensorId = foo
startDate = 2012-08-01T00:00Z
endDate = 2012-09-03T02:10Z

sensorId = foo
startDate = 2012-10-04T03:12Z
endDate = 2013-02-01T12:28Z

sensorId = foo
startDate = 2013-03-05T23:22Z
endDate = null
(this means we have not received a valid measurement since 2013-03-05)
If I query Cassandra with the date:
2012-08-04T10:00Z --> it should return the first ErrorStatistic
2012-09-04T00:00Z --> it should return that there were no errors at this time
2014-01-03T00:00Z --> it should return the last ErrorStatistic (since it is open-ended)
I am not sure how I should store and "index" these ErrorStatistic objects, to efficiently query them. I am quite new to Cassandra, and I might be missing something obvious.
Edit: the following was added in response to Joost's suggestion that I should focus on the type of queries I am interested in.
I will have two types of query:
The first, as you guessed, is to list all ErrorStatistics for a given sensor and time range. This seems relatively easy. The only problem I will have is when an ErrorStatistic starts before the time range I'm interested in (e.g. I query all errors for the month of April, and I want my query to return ErrorStatistics[2012-03-29:2012-04-02] too...)
The second query seems harder. I want to find, for a given sensor and date, the ErrorStatistic whose interval contains the given date, or whose startDate precedes the given date with a null endDate (meaning we are still receiving errors for this sensor). I don't know how to do this efficiently. I could just load all ErrorStatistics for the given sensor and check the intervals in Java, but I'd like to avoid that if possible. I guess I want Cassandra to start at a given date and look backward until it finds the first ErrorStatistic whose startDate precedes the given date (if any), then load it and check in Java whether its endDate is null or after the given date. But I have no idea if that's possible, or how efficient it would be.

The question you have to ask yourself is what questions you want to ask of the ErrorStatistics. Cassandra schema design typically starts with a 'table per query' approach. Don't start with the data (entities) you have, but with your questions/queries. This is a different mindset than 'traditional' RDBMS design, and I found it takes some time to get used to.
For example, do you want to query the statistics per sensor? Then a table with a composite key (sensor id, timeuuid) could be a solution. Such a table allows quick lookups per sensor id, with the results sorted by time.
If you want to query the sensor statistics based on time only, a (composite) key with a time unit may be more helpful, possibly with sharding elements to better distribute the load over nodes. Note that there is a catch: range queries over partition keys are not feasible using the Cassandra random or murmur partitioners. There are other partitioners, but they tend to cause uneven load distribution in your cluster.
In short, start with the answers you want, and then work 'backwards' to your table design. With a proper schema, your code will follow.
Addition (2013-09-05): it is good to know that Cassandra sorts data within the scope of a single partition key, which is very useful. For example, the statistics would be ordered by start_date in descending order (newest first) if you define the table as:
create table SensorByDate
(
    sensor_id uuid,
    start_date timestamp,
    end_date timestamp,
    measurement int,
    primary key (sensor_id, start_date)
)
with clustering order by (start_date DESC);
In this example the sensor_id is the partition key and determines the node this row is stored on. The start_date is the second item in the composite key and determines the sort order.
To get the most recent statistic whose start_date precedes a given date (the DESC clustering order makes limit 1 return the newest such row), you could formulate a query like:
select * from SensorByDate
where sensor_id = ? and start_date < ? limit 1
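That pattern also covers the asker's second query (find the statistic covering a given date): fetch the newest row starting on or before the date, then check end_date client-side. A minimal sketch, assuming the DataStax Java driver 3.x rather than Hector; the isErrorAt helper is an illustrative assumption, and end_date comes from the table above:

import java.util.Date;
import java.util.UUID;

import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class ErrorStatisticLookup {

    private final Session session;

    public ErrorStatisticLookup(Session session) {
        this.session = session;
    }

    // True if an error interval covers the given date: the DESC clustering
    // order makes LIMIT 1 return the statistic with the greatest start_date
    // on or before the date; the end_date check is then done in Java,
    // treating null (column absent) as a still-open interval.
    public boolean isErrorAt(UUID sensorId, Date date) {
        ResultSet rs = session.execute(
                "SELECT start_date, end_date FROM SensorByDate "
                        + "WHERE sensor_id = ? AND start_date <= ? LIMIT 1",
                sensorId, date);
        Row row = rs.one();
        if (row == null) {
            return false; // no statistic starts on or before this date
        }
        Date endDate = row.getTimestamp("end_date"); // null for open intervals
        return endDate == null || endDate.after(date);
    }
}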

Related

Data stored in Class extending application returning null

I have a data class that extends Application, and one of the data sets it's supposed to be storing is a HashMap of POI locations and times visited.
public class CharacterSheet extends Application {
    private HashMap<PointOfInterest, Date> coolDowns = new HashMap<>();

    public HashMap GetAllCoolDowns() { return coolDowns; } // dev only?

    public Date GetCoolDown(PointOfInterest poi) { return coolDowns.get(poi); }

    public Date PutCoolDown(PointOfInterest poi, Date date) { return coolDowns.put(poi, date); }
}
Then, in a Google Maps activity, I grab the onPoiClick:
@Override
public void onPoiClick(final PointOfInterest poi) {
    // POI cool down
    Date currentTime = Calendar.getInstance().getTime();
    Date lastTime = ((CharacterSheet) this.getApplication()).GetCoolDown(poi); // this ONLY returns null??
    if (lastTime != null) {
        int timeDiff = currentTime.compareTo(lastTime);
        makeToast("Time Since last visit: " + timeDiff);
    } else {
        makeToast("First");
    }
    ((CharacterSheet) this.getApplication()).PutCoolDown(poi, currentTime);
    makeToast("This?" + ((CharacterSheet) this.getApplication()).GetCoolDown(poi));
}
The order should be: click POI, get the current time, get the last time visited; if the last time is null we have never been here before, so store the time and date in a HashMap with the POI as key.
The next time we turn up, lastTime should not be null, as we stored this POI and time already... but no matter what, it returns null.
The last line of code is a makeToast helper telling me what is in the data class; this gives me the date value of when I clicked, not a null value.
There is a fragment generated later on in onPoiClick, but still before the user can do anything, which you end up looking at and have to "back" out of. I don't know how this could affect it, as all the code is finished before even calling for data for the fragment, but I feel it should be mentioned:
PlacesClient placesClient = Places.createClient(this);
String placeId = poi.placeId;
List<Place.Field> placeFields = Arrays.asList(Place.Field.ID, Place.Field.TYPES);
FetchPlaceRequest request = FetchPlaceRequest.newInstance(placeId, placeFields);
placesClient.fetchPlace(request).addOnSuccessListener(new OnSuccessListener<FetchPlaceResponse>() {
    @Override
    public void onSuccess(FetchPlaceResponse fetchPlaceResponse) {
        Place place = fetchPlaceResponse.getPlace();
        PlaceDataHolder holder = new PlaceDataHolder(place);
        String placeName = poi.name;
        makeLootFragment(holder, placeName);
    }
});
The fragment launched is the entire point of clicking the POI, so if it is the cause I'll need to think of another way of handling the cooldowns... but I really don't see why it would interfere.
I was not being thoughtful enough about the PointOfInterest that was returned. It comes with a new UUID every time the request is made, rendering it useless as a key: every time I clicked, it changed. I discovered this by changing my post-operation check from making sure the entry had gone into the HashMap to inspecting the contents of the entire HashMap, and soon saw that entries were building up despite only clicking one POI.
Solving this was simple enough. I created a new String variable from poi.name and used that in place of the POI, and had to change the HashMap to accept a String rather than a PointOfInterest.
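A minimal sketch of that fix, keyed by the POI's public name field (method names kept from the question; assumes the Maps SDK's PointOfInterest as already used above):

import java.util.Date;
import java.util.HashMap;

import android.app.Application;

import com.google.android.gms.maps.model.PointOfInterest;

public class CharacterSheet extends Application {

    // Keyed by the POI's name instead of the PointOfInterest object itself,
    // whose identity changes on every click.
    private final HashMap<String, Date> coolDowns = new HashMap<>();

    public Date GetCoolDown(PointOfInterest poi) {
        return coolDowns.get(poi.name);
    }

    public Date PutCoolDown(PointOfInterest poi, Date date) {
        return coolDowns.put(poi.name, date);
    }
}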

Select BETWEEN dates returns wrong results

I have a table in MySQL with its column definitions as below:
CREATE TABLE APPOINTMENT (
    CD_APPOINTMENT BIGINT NOT NULL,
    -- omitted for brevity
    APPOINT_DATE DATE NOT NULL
);
My JPA entity is defined as:
@Entity
@Table(name = "APPOINTMENT")
public class Appointment {

    protected Long id;

    protected Date date = new Date();

    // other attributes omitted for brevity

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    @Column(name = "CD_APPOINTMENT")
    public Long getId() {
        return id;
    }

    public void setId(Long id) {
        this.id = id;
    }

    @Temporal(TemporalType.DATE)
    @Column(name = "APPOINT_DATE", columnDefinition = "DATE")
    public Date getDate() {
        return date;
    }
}
As I'm using Spring, I have benefits of Spring Data JPA. Following that line, I'm using Spring Data JPA Repositories.
I'm testing on 2019-07-12 (in my timezone [UTC-3]).
When I run:
appointmentRepository.save(appointment);
the Appointment is successfully (more or less) saved.
Fine! The column APPOINT_DATE has the value 2019-07-12, yes? Well, it seems OK.
When I run:
SELECT * FROM APPOINTMENT;
the retrieved rows look as expected:
CD_APPOINTMENT | APPOINT_DATE
---------------|-------------
             1 | 2019-07-12
The strange part appears when I try to filter BETWEEN dates.
If I run my JPQL:
SELECT ap FROM Appointment AS ap WHERE ap.date BETWEEN :startDate AND :endDate
startDate and endDate are parameters received via @Param annotations in Spring, and both have the value 2019-07-12.
I get 0 rows, but I was expecting to get one (the Appointment inserted above). At first I thought it was a problem with the JPQL, but it's not: if I execute the same JPQL against a different RDBMS (H2, for example), the query works perfectly.
And if I run the same JPQL but in SQL, directly on the MySQL database:
SELECT * FROM APPOINTMENT where APPOINT_DATE BETWEEN '2019-07-12' AND '2019-07-12'
just like the JPQL, it returns 0 rows.
If I run now() on the MySQL database, it returns the CORRECT date and time.
How can I fix it?
Has anybody seen something like that already? Because I have not.
BETWEEN '2019-07-12' AND '2019-07-13'
It is best not to use BETWEEN for dates/times. One reason is that there might be a time component that throws off the comparison.
I would suggest:
SELECT *
FROM APPOINTMENT
WHERE APPOINT_DATE >= '2019-07-12' AND
      APPOINT_DATE < '2019-07-13'
This logic works with and without a time component. And it can take advantage of an index on the date column.
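A sketch of the same half-open range as a Spring Data JPA repository method; the repository, method, and parameter names are illustrative:

import java.util.Date;
import java.util.List;

import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;

public interface AppointmentRepository extends JpaRepository<Appointment, Long> {

    // endExclusive is the day AFTER the last day wanted, so a time
    // component on APPOINT_DATE cannot push matching rows out of range.
    @Query("SELECT ap FROM Appointment ap WHERE ap.date >= :start AND ap.date < :endExclusive")
    List<Appointment> findInRange(@Param("start") Date start,
                                  @Param("endExclusive") Date endExclusive);
}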
My MySQL instance is on Amazon RDS.
Its default time zone is UTC. I switched from UTC to Brazil/East and now it's working as expected.

JPA set timestamp generated by database but not CURRENTTIME or UPDATETIME

I have a class like this
@Entity
@Table(name = "Event")
public class Event {

    @Transient
    public static final long MAX_TIMESTAMP = 253402214400000L; // 9999-12-31 00:00:00

    private Date creationTime;
    private Date expiryTime;
    private String otherValue;

    public Event(int timeout, String otherValue) {
        this.creationTime = new Date();
        this.expiryTime = (timeout == 0 ? new Date(MAX_TIMESTAMP) : new Date(creationTime.getTime() + SECONDS.toMillis(timeout)));
        this.otherValue = otherValue;
    }
}
I call the save() method of CrudRepository to save this data,
and I have a ScheduledExecutorService to find timed-out events:
@Query("SELECT t FROM Event t WHERE t.expiryTime < CURRENT_TIMESTAMP")
List<Event> findTimeoutEvents();
This CURRENT_TIMESTAMP is the database's time, but expiryTime is not. That means I must make sure their clocks agree; sometimes the application and the database are not on the same machine, so I cannot guarantee their time is the same.
Can I have expiryTime generated by the database? How can I pass the parameter timeout to the database?
The database may be PostgreSQL or MySQL.
Thank you very much.
First of all, I am not sure your code works, since an instance of java.util.Date (if expiryTime is a java.util.Date object) cannot be compared to the int 0.
As for generating an expiryTime: yes, you obviously can. Check out how triggers work.
I would also like to add that if you use spring-boot-starter-data-jpa, you may annotate the creationTime field with the @CreationTimestamp annotation. But I would personally set the default value to CURRENT_TIMESTAMP on the DB side.
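A minimal sketch of the @CreationTimestamp route. Note that Hibernate fills this value in from the JVM clock at insert time; the database-side default or trigger suggested above is what actually ties expiryTime to the database's clock:

import java.util.Date;

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Temporal;
import javax.persistence.TemporalType;

import org.hibernate.annotations.CreationTimestamp;

@Entity
public class Event {

    @Id
    @GeneratedValue
    private Long id;

    // Set by Hibernate when the entity is first persisted (JVM clock).
    @CreationTimestamp
    @Temporal(TemporalType.TIMESTAMP)
    private Date creationTime;

    // Left to a database trigger/default, as suggested above, so that it is
    // computed from the database's clock rather than the application's.
    private Date expiryTime;
}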

Store a local year for Date

I am going to store only a year value in the database and retrieve it.
This is my domain (POJO)
@Entity
public class Publisher {

    public Publisher(..., Date establishDate) {
        // assign other variables
        this.setEstablishDate(establishDate);
    }

    @NotNull
    private Date establishDate;

    ...
}
And here is my DTO:
@NotNull
@JsonFormat(shape = JsonFormat.Shape.STRING, pattern = "yyyy")
private Long establish_date;
Here, I am creating a new publisher:
new Publisher(..., new Date(this.establish_date));
I sent JSON with the value 1370 for establish_date (to POST a new publisher), but in the database it displays as 1970-01-01 03:30:01.
Why?
And when I GET the Publisher, it displays establish_date as 1000!
What is wrong?
You are using the wrong constructor. The argument specifies the milliseconds since 1970, not a year: Date(long). You may use the other constructor: Date(int, int, int).
Note that most of the Date API is deprecated. There are better alternatives, like Calendar and DateTime. Since you are only storing the year, you could also use a plain integer. This will make things a lot easier.
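A minimal sketch of the plain-integer option (field and method names are illustrative); java.time.Year can wrap the value when validation or year arithmetic is needed:

import java.time.Year;

import javax.persistence.Entity;
import javax.persistence.Id;

@Entity
public class Publisher {

    @Id
    private Long id;

    // A year is just a number; no Date/timezone conversion can distort it.
    private int establishYear;

    public void setEstablishYear(int establishYear) {
        this.establishYear = establishYear;
    }

    public Year getEstablishYear() {
        return Year.of(establishYear); // e.g. Year.of(1370)
    }
}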

How to setup jadira PersistentLocalDateTime with java.time.LocalDateTime?

I am trying to persist java.time.LocalDateTime using Hibernate and JPA. I used the Jadira Framework ("org.jadira.usertype:usertype.core:3.2.0.GA" & "org.jadira.usertype:usertype.extended:3.2.0.GA"). I created a package-info.java file and declared @TypeDefs({@TypeDef(defaultForType = java.time.LocalDateTime.class, typeClass = org.jadira.usertype.dateandtime.threeten.PersistentLocalDateTime.class)}) there. I tested the solution and the java.time.LocalDateTime fields are stored/retrieved to/from DATETIME columns in my MySQL database (almost) correctly.
The only problem is that the values in the database are +2 hours off from the correct time values of the fields in Java. I'm in CEST (UTC+2), so I understood this to be some problem with time zones. I debugged the code of PersistentLocalDateTime and this is what I found.
PersistentLocalDateTime uses org.jadira.usertype.dateandtime.threeten.columnmapper.AbstractTimestampThreeTenColumnMapper.
AbstractTimestampThreeTenColumnMapper has a field ZoneOffset databaseZone, by default set to ZoneOffset.of("Z") (UTC).
Because it thinks my database is in the UTC timezone (while the application is in UTC+2), it adds two hours to my time during conversion to the database (and subtracts two hours during conversion from the database). So in the application I see the correct date and time, but in the database I do not.
I found that I can add parameters to the @TypeDef, so I specified them as below:
@TypeDef(defaultForType = LocalDateTime.class, typeClass = PersistentLocalDateTime.class,
    parameters = {
        @Parameter(name = "databaseZone", value = "+02:00")
    }),
but I've got an exception:
java.lang.IllegalStateException: Could not map Zone +02:00 to Calendar
at org.jadira.usertype.dateandtime.threeten.columnmapper.AbstractTimestampThreeTenColumnMapper.getHibernateType(AbstractTimestampThreeTenColumnMapper.java:59)
I debugged a little bit more. AbstractTimestampThreeTenColumnMapper has two methods:
public final DstSafeTimestampType getHibernateType() {
    if (databaseZone == null) {
        return DstSafeTimestampType.INSTANCE;
    }
    Calendar cal = resolveCalendar(databaseZone);
    if (cal == null) {
        throw new IllegalStateException("Could not map Zone " + databaseZone + " to Calendar");
    }
    return new DstSafeTimestampType(cal);
}

private Calendar resolveCalendar(ZoneOffset databaseZone) {
    String id = databaseZone.getId();
    if (Arrays.binarySearch(TimeZone.getAvailableIDs(), id) != -1) {
        return Calendar.getInstance(TimeZone.getTimeZone(id));
    } else {
        return null;
    }
}
The getHibernateType method throws the exception because the resolveCalendar method returns null. Why does it return null? Because time zone IDs from java.time.ZoneOffset and java.util.TimeZone do not match. As far as I can see, the only value that matches is Z. Any other value causes exceptions.
Is there any way to set this up correctly? Or is it a bug in the Jadira Framework?
It looks like a serious bug. The problem is that the jadira.usertype.databaseZone parameter is parsed into a ZoneOffset instead of a ZoneId. This way, the resolveCalendar method compares two different kinds of identifiers: zones and offsets. What is funny is that the parameter is named databaseZone, but it does not contain a zone; it contains only an offset.
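A small standalone demo of that mismatch (the class name is hypothetical): the ID of a ZoneOffset such as "+02:00" never appears among java.util.TimeZone's available IDs, so resolveCalendar cannot find a match:

import java.time.ZoneOffset;
import java.util.Arrays;
import java.util.TimeZone;

public class ZoneIdMismatchDemo {
    public static void main(String[] args) {
        String offsetId = ZoneOffset.of("+02:00").getId(); // "+02:00"
        // TimeZone IDs look like "Europe/Paris" or "GMT+02:00", never "+02:00",
        // so the lookup over TimeZone.getAvailableIDs() finds nothing.
        boolean found = Arrays.asList(TimeZone.getAvailableIDs()).contains(offsetId);
        System.out.println(found); // false -> resolveCalendar returns null
    }
}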
https://github.com/JadiraOrg/jadira/issues/42
https://github.com/JadiraOrg/jadira/issues/43
