Firebase Real Time DB - DatabaseReference.push().getKey() is a TimeStamp - java

According to the documentation, the code below uses push() to create a child node in the Realtime Database whose key is based on a timestamp.
public void uploadToDB(String s) {
databaseReference.push().setValue(s);
}
For example, my push() calls returned the following keys:
a) -MpfCu14jtIkEk28D3CB
b) -MpfCxv_Nzv3YJ87MfZH
My questions are:
Are they timestamps?
If yes, can I decode them back to a readable timestamp?

Are they timestamps?
No, those push IDs are not timestamps. However, they do contain a time component.
As Michael Lehenbauer mentioned in this blog article:
Push IDs are string identifiers that are generated client-side. They are a combination of a timestamp and some random bits. The timestamp ensures they are ordered chronologically, and the random bits ensure that each ID is unique, even if thousands of people are creating push IDs at the same time.
And to answer the second question:
If yes, can I decode them back to a readable timestamp?
If you reverse-engineer the ID format, probably yes. Please check the following answer:
How are Firebase IDs generated?
But I would not count on that. To order your nodes by time, you should add a property of type "timestamp", as explained in my answer from the following post:
How to save the current date/time when I add new value to Firebase Realtime Database
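For the curious, here is a minimal sketch of what such a decoding could look like. It assumes the layout of the publicly shared push ID generator: the first 8 characters are base-64 digits of the client's clock in milliseconds, using the alphabet below. That layout is an implementation detail rather than a documented contract, and the class and method names are made up for this example.

```java
import java.time.Instant;

/** Sketch: recover the time component from a Firebase push ID (assumed layout). */
final class PushIdDecoder {
    // Alphabet assumed from the published push ID generator, in ascending ASCII order.
    private static final String PUSH_CHARS =
            "-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz";

    private PushIdDecoder() {}

    /** Returns the epoch milliseconds encoded in the first 8 characters. */
    static long decodeTimestamp(String pushId) {
        if (pushId == null || pushId.length() < 8) {
            throw new IllegalArgumentException("push ID must have at least 8 characters");
        }
        long millis = 0;
        for (int i = 0; i < 8; i++) {
            int digit = PUSH_CHARS.indexOf(pushId.charAt(i));
            if (digit < 0) {
                throw new IllegalArgumentException("unexpected character: " + pushId.charAt(i));
            }
            millis = millis * 64 + digit; // each character is one base-64 digit
        }
        return millis;
    }

    public static void main(String[] args) {
        for (String id : new String[] {"-MpfCu14jtIkEk28D3CB", "-MpfCxv_Nzv3YJ87MfZH"}) {
            System.out.println(id + " -> " + Instant.ofEpochMilli(decodeTimestamp(id)));
        }
    }
}
```

Keep in mind that the decoded value reflects the clock of the device that generated the ID, which is one more reason to store a server-side timestamp property instead.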

Related

Firebase orderByChild on startAt() and endAt() return wrong results

I am searching for records in Firebase Database by a formatted date range that is stored as a String. I am running a filtering query on requestPlacedDate.
Query query = ordersDatabaseRef.limitToFirst(1000).orderByChild(ConstanceFnc.requestPlacedDate).startAt(startDate).endAt(endDate);
query.addListenerForSingleValueEvent(orderListener);
Firebase returns data that includes earlier and later dates instead of only the specified date range, which is what I expect from a query using startAt() and endAt().
I am searching for records in Firebase Database by a formatted date range that is stored as a String.
You aren't getting the desired results because your requestPlacedDate field holds a String and not a Timestamp. When you order String elements, the order is lexicographical. I answered a similar question, so please check my answer from the following post:
How to order the nodes in firebase console based on key
To accomplish the correct order for your results, you should change the type of your field to be Timestamp, as explained in my answer from the following post:
How to save the current date/time when I add new value to Firebase Realtime Database
Once your field holds the Timestamp value, your Query will return the expected range.
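To see why the String field misbehaves, here is a small stand-alone sketch that does not touch Firebase at all. It assumes the dates are formatted as dd-MM-yyyy, which is only a guess since the question does not show the format, and all names are made up for the example. It compares a range check on the raw strings with the same check on epoch milliseconds.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class DateRangeDemo {
    // Assumed format for illustration; the question does not say which one is used.
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("dd-MM-yyyy");

    private DateRangeDemo() {}

    /** Range check on the raw strings: compares character by character. */
    static boolean inRangeAsString(String value, String start, String end) {
        return value.compareTo(start) >= 0 && value.compareTo(end) <= 0;
    }

    /** Converts a formatted date to epoch milliseconds at midnight UTC. */
    static long toEpochMillis(String date) {
        return LocalDate.parse(date, FORMAT).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
    }

    /** Range check on the numeric timestamps: compares points in time. */
    static boolean inRangeAsMillis(String value, String start, String end) {
        long v = toEpochMillis(value);
        return v >= toEpochMillis(start) && v <= toEpochMillis(end);
    }

    public static void main(String[] args) {
        String start = "01-01-2021";
        String end = "31-01-2021";
        String march = "15-03-2021";
        System.out.println(inRangeAsString(march, start, end)); // true: "15" sorts between "01" and "31"
        System.out.println(inRangeAsMillis(march, start, end)); // false: March is after January
    }
}
```

With the field stored as a number, the same start and end values in milliseconds are what you would pass to startAt() and endAt().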

Can you convert a Timestamp to a FieldValue?

In my app, a user uploads a document to the database under their subcollection, with a Timestamp attached to the document. My Security Rules require that the Timestamp is always equal to FieldValue.serverTimestamp(). This saves a FieldValue to the database.
The problem is, whenever I get the documentSnapshot back from Firebase and access the field, it comes back as a Timestamp and not a FieldValue. Is there a way I can convert a Timestamp to a FieldValue, or vice-versa?
No, you really do want that Timestamp instead of a FieldValue. FieldValue tokens are only used for writing field values that need to have their actual values computed on the Firestore backend instead of on the client. They are just tokens, not actual values. When you fetch the document and read the value back out, it will have the actual Timestamp value (or whatever the final value is), which is what you need to work with.

Cannot use HashMap for TIMESTAMP in child

I am trying to create a new timestamp and want to insert it as a child. How do I do that?
I have read this post:
Write serverValue.timestamp as child in Firebase .. and still don't understand
I've tried to use ServerValue.TIMESTAMP as a child key, without success.
This is my code:
Object timestamp = ServerValue.TIMESTAMP;
reference.child(String.valueOf(timestamp)).child(uid).child("Status").setValue(cA);
I've read this:
How to save the current date/time when I add new value to Firebase Realtime Database
I followed the code in it, but it does not work properly.
What should I do?
The ServerValue.TIMESTAMP can only be written into a value, it cannot be used as the key in the database. So if you want to store the timestamp, you will have to write it as the value of a property. If you want to have chronologically ordered, unique keys, use the push() method.
So combined:
reference.push().setValue(ServerValue.TIMESTAMP);

Tweet Analysis: How to design

I need advice in designing a system meant for tweet analysis.
Objective: For a given hashtag, find out the frequency of co-occurrence with other hash-tags. Find out hourly pattern. We should be able to answer queries of this format: For a given date (say 13/Apr/2013) and for a given one hour time period (say 3:00-4:00 PM ) what are the top 5 co-occurring hashtag with "#iPhone".
My approach: I am using the "twitter4j" library to access Twitter data. I can query and get 100 tweets per call (Twitter allows only that many). I can extract the time and other relevant data. I am planning to have a thread that queries Twitter every 5 minutes, in order to observe hourly patterns. Here is where I am stuck: How should I store this information in the DB? Should I maintain a hashmap with the co-occurring hashtag as key and its frequency of occurring with "#iPhone" as value? Or should I store unaggregated data directly in the DB? What is the best way to query Twitter to observe hourly patterns? Should I store the time in "epoch" format in the DB, or the date as one column and the hour as another?
Thanks a lot for your valuable inputs.
I would suggest using Twitter's Streaming API. It lets you keep a persistent HTTP connection to Twitter so that you can filter tweets as they arrive. Twitter recommends the Streaming API for tweet-analysis applications.
You will still have to pre-process some of the data so that the analysis is faster. Also look into twitter4j's built-in Streaming API support.
For an example, please look into the following GitHub code.
As ay89 said, use key - tag and value - freq, aggregate before storing to DB, and use epoch.
In addition, because this is a multithreaded program, you have two options for synchronization:
Option 1 is to use a ConcurrentHashMap. When the aggregator runs, it will use:
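for (Key key : hashMap.keySet()) {
    // put() returns the previous count atomically, so no increment
    // can slip in between reading the value and resetting it
    Integer count = hashMap.put(key, 0);
    Database.save(key, count);
}
In other words, reset a tag's frequency to 0 when it is written to the database. The method adding tweet data will use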
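public void increment(Key key) {
    // make sure the key exists, otherwise get() returns null and unboxing fails
    hashMap.putIfAbsent(key, 0);
    boolean done = false;
    while (!done) {
        int current = hashMap.get(key);
        int newValue = current + 1;
        // succeeds only if the value is still `current`; otherwise retry
        done = hashMap.replace(key, current, newValue);
    }
}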
This is a thread-safe way to increment the frequency.
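On Java 8 or later, the same idea can be written more compactly with ConcurrentHashMap.merge(), which performs the read-modify-write as one atomic step. The following is only a sketch: the class name, the String keys and the drain() helper are assumptions made for this example, not part of the code above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class HashtagCounter {
    private final ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();

    /** Atomically adds one to the tag's count, creating the entry if needed. */
    void increment(String tag) {
        counts.merge(tag, 1, Integer::sum);
    }

    /** Removes every entry and returns the removed counts, ready to be saved. */
    Map<String, Integer> drain() {
        Map<String, Integer> snapshot = new HashMap<>();
        for (String tag : counts.keySet()) {
            Integer removed = counts.remove(tag); // atomic per key
            if (removed != null) {
                snapshot.put(tag, removed);
            }
        }
        return snapshot;
    }

    public static void main(String[] args) throws InterruptedException {
        HashtagCounter counter = new HashtagCounter();
        Runnable work = () -> {
            for (int i = 0; i < 10_000; i++) {
                counter.increment("#android");
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(counter.drain()); // {#android=20000}
    }
}
```

An increment that arrives while drain() is running simply creates a fresh entry, which is picked up by the next aggregation run.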
Option 2 probably makes more sense. Your aggregator will replace the hashmap with a new instance.
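class DataStore {
    private Map<Key, Value> map = new HashMap<>();
    public synchronized void add(Key key, Value value) {
        // called by the method querying tweet data
    }
    public void aggregate() {
        // called by the aggregator thread every five minutes
        Map<Key, Value> oldMap;
        synchronized (this) {
            // swap under the same lock as add(), so no writer still touches the old map
            oldMap = map;
            map = new HashMap<>();
        }
        Database.save(oldMap);
    }
}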
Bottom line is that you don't want to modify the hashmap in an uncontrolled fashion while the aggregator is saving it to the database. The second option is simpler because it simply creates a new hashmap for the querying thread to modify while the aggregator saves the old hashmap to the database.
Since you only have to retrieve the frequency, it's better to store it in a hash (key: tag, value: frequency). Non-aggregated data stored in the DB would take more space (mostly for information that is not required), and you would ultimately have to aggregate it later anyway.
Epoch time is a good way to store the time, since you can localize it to any time zone later on if required.
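To make the "tag as key, frequency as value, epoch time" advice concrete, here is a rough sketch that buckets counts by hour, derived from the epoch seconds of each tweet, and answers a "top N co-occurring tags in this hour" query. All names are invented for the example, the buckets are UTC hours, and the class is not thread-safe on its own, so it would need one of the synchronization options discussed above.

```java
import java.time.Instant;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class HourlyCooccurrence {
    private static final long SECONDS_PER_HOUR = 3600;

    // hour bucket (epoch seconds / 3600) -> co-occurring tag -> frequency
    private final Map<Long, Map<String, Integer>> buckets = new HashMap<>();

    private static long hourOf(Instant time) {
        return Math.floorDiv(time.getEpochSecond(), SECONDS_PER_HOUR);
    }

    /** Records one tweet: every tag other than the tracked one counts as a co-occurrence. */
    void record(Instant tweetTime, String trackedTag, List<String> tags) {
        if (!tags.contains(trackedTag)) {
            return;
        }
        Map<String, Integer> counts =
                buckets.computeIfAbsent(hourOf(tweetTime), h -> new HashMap<>());
        for (String tag : tags) {
            if (!tag.equals(trackedTag)) {
                counts.merge(tag, 1, Integer::sum);
            }
        }
    }

    /** Returns up to {@code limit} tags, most frequent first, for the hour containing {@code time}. */
    List<String> top(Instant time, int limit) {
        Map<String, Integer> counts =
                buckets.getOrDefault(hourOf(time), Collections.<String, Integer>emptyMap());
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

When flushing to the DB, the hour bucket and the tag together would form the key of each stored row, so a query for a given date and hour only has to compute the bucket number.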

How can I insert common data into a temp table from disparate schemas?

I am not sure how to solve this problem:
We import order information from a variety of online vendors (Amazon, Newegg, etc.). Each vendor has its own terminology and structure for orders, which we have mirrored into a database. Our data imports into the database with no issues. The problem I am faced with is writing a method that extracts the required fields from the database, regardless of the schema.
For instance assume we have the following structures:
Newegg structure:
"OrderNumber" integer NOT NULL, -- The Order Number
"InvoiceNumber" integer, -- The invoice number
"OrderDate" timestamp without time zone, -- Create date.
Amazon structure:
"amazonOrderId" character varying(25) NOT NULL, -- Amazon's unique, displayable identifier for an order.
"merchant-order-id" integer DEFAULT 0, -- A unique identifier optionally supplied for the order by the Merchant.
"purchase-date" timestamp with time zone, -- The date the order was placed.
How can I select these items and place them into a temporary table for me to query against?
The temporary table could look like:
"OrderNumber" character varying(25) NOT NULL,
"TransactionId" integer,
"PurchaseDate" timestamp with time zone
I understand that some of the databases represent an order number with an integer and others a character varying; to handle that I plan on casting the datatypes to String values.
Does anyone have a suggestion for me to read about that will help me figure this out?
I don't need an exact answer, just a nudge in the right direction.
The data will be consumed by Java, so if any particular Java classes will help, feel free to suggest them.
First, you can create a VIEW to provide this functionality:
CREATE VIEW orders AS
SELECT '1'::int AS source -- or any other tag to identify source
,"OrderNumber"::text AS order_nr
,"InvoiceNumber" AS transaction_id -- no cast .. is int already
,"OrderDate" AT TIME ZONE 'UTC' AS purchase_date -- !! see explanation
FROM tbl_newegg
UNION ALL -- not UNION!
SELECT 2
,"amazonOrderId"
,"merchant-order-id"
,"purchase-date"
FROM tbl_amazon;
You can query this view like any other table:
SELECT * FROM orders WHERE order_nr = '123' AND source = 2; -- order_nr is text, so compare with a string literal
The source is necessary if the order_nr is not unique. How else would you guarantee unique order-numbers over different sources?
A timestamp without time zone is ambiguous in a global context. It is only meaningful in connection with its time zone. If you mix timestamp and timestamptz, you need to place the timestamp in a certain time zone with the AT TIME ZONE construct to make this work. For more explanation read this related answer.
I use UTC as the time zone; you might want to provide a different one. A simple cast "OrderDate"::timestamptz would assume your current time zone. AT TIME ZONE applied to a timestamp results in timestamptz. That's why I did not add another cast.
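Since the data will be consumed by Java, the same distinction shows up on the client side. The sketch below assumes a JDBC 4.2 style mapping where a timestamp without time zone is read as LocalDateTime and a timestamptz as OffsetDateTime. That mapping and all names here are assumptions for illustration. It shows that the zone-less value only becomes a point in time once a zone is chosen for it, which is what AT TIME ZONE does in the view.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

final class PurchaseDates {
    private PurchaseDates() {}

    /** A value without zone only becomes a point in time once a zone is chosen for it. */
    static Instant fromLocal(LocalDateTime orderDate, ZoneId assumedZone) {
        return orderDate.atZone(assumedZone).toInstant();
    }

    /** A value with an offset already identifies a point in time. */
    static Instant fromOffset(OffsetDateTime purchaseDate) {
        return purchaseDate.toInstant();
    }

    public static void main(String[] args) {
        LocalDateTime newegg = LocalDateTime.of(2013, 4, 13, 15, 0);
        OffsetDateTime amazon =
                OffsetDateTime.of(2013, 4, 13, 17, 0, 0, 0, ZoneOffset.ofHours(2));
        System.out.println(fromLocal(newegg, ZoneOffset.UTC));                // 2013-04-13T15:00:00Z
        System.out.println(fromOffset(amazon));                               // 2013-04-13T15:00:00Z
        System.out.println(fromLocal(newegg, ZoneId.of("America/New_York"))); // 2013-04-13T19:00:00Z
    }
}
```

The same wall-clock value lands on two different instants depending on the assumed zone, so whichever zone you pick in the view has to match how the source data was actually recorded.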
While you can, I advise never using camel-case identifiers in PostgreSQL. That avoids many kinds of possible confusion. Note the lower-case identifiers (without the now unnecessary double quotes) I supplied.
Don't use varchar(25) as type for the order_nr. Just use text without arbitrary length modifier if it has to be a string. If all order numbers consist of digits exclusively, integer or bigint would be faster.
Performance
One way to make this fast would be to materialize the view. I.e., write the result into a (temporary) table:
CREATE TEMP TABLE tmp_orders AS
SELECT * FROM orders;
ANALYZE tmp_orders; -- temp tables are not auto-analyzed!
ALTER TABLE tmp_orders
ADD constraint orders_pk PRIMARY KEY (order_nr, source);
You need an index. In my example, the primary key constraint provides the index automatically.
If your tables are big, make sure you have enough temporary buffers to handle this in RAM before you create the temp table. Otherwise it will actually slow you down.
SET temp_buffers = '1000MB';
This has to happen before the first use of temporary objects in your session. Don't set it high globally, just for your session. A temp table is dropped automatically at the end of your session anyway.
To get an estimate of how much RAM you need, create the table once and measure:
SELECT pg_size_pretty(pg_total_relation_size('tmp_orders'));
More on object sizes under this related question on dba.SE.
All the overhead only pays off if you have to process a number of queries within one session. For other use cases there are other solutions. If you know the source table at the time of the query, it would be much faster to direct your query to the source table instead. If you don't, I would question the uniqueness of your order_nr once more. If it is, in fact, guaranteed to be unique, you can drop the column source I introduced.
For only one or a few queries, it might be faster to use the view instead of the materialized view.
I would also consider a plpgsql function that queries one table after the other until the record is found. Might be cheaper for a couple of queries, considering the overhead. Indexes for every table needed of course.
Also, if you stick to text or varchar for your order_nr, consider COLLATE "C" for it.
Sounds like you need to create an abstract class that will define the basics of interacting with the data, then derive a class per database schema you need to access. This will allow the core code to operate on a single object type, and each implementation can then specify the queries in a form specific to that database schema.
Something like:
public class Order
{
private String orderNumber;
private BigDecimal orderTotal;
... etc ...
}
public abstract class AbstractOrderInformation
{
public abstract ArrayList<Order> getOrders();
...
}
with a Newegg class:
public class NeweggOrderInformation extends AbstractOrderInformation
{
public ArrayList<Order> getOrders() {
... do the work of getting the newegg order
}
...
}
Then you can have an arbitrarily large number of formats and when you need information, you can just iterate over all the implementations and get the Orders from each.
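A compact, runnable variant of that idea could look like the sketch below. The class names and sample values are placeholders, and the two implementations return hard-coded rows where the real ones would run their schema-specific queries. Note how the integer order number is converted to a String so that both sources produce the same Order type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class Order {
    final String source;
    final String orderNumber;

    Order(String source, String orderNumber) {
        this.source = source;
        this.orderNumber = orderNumber;
    }
}

abstract class OrderInformation {
    abstract List<Order> getOrders();
}

/** Stand-in for a class that would query the Newegg tables (integer order numbers). */
final class NeweggOrders extends OrderInformation {
    @Override
    List<Order> getOrders() {
        int orderNumber = 1001; // hypothetical row value
        return Arrays.asList(new Order("newegg", String.valueOf(orderNumber)));
    }
}

/** Stand-in for a class that would query the Amazon tables (string order ids). */
final class AmazonOrders extends OrderInformation {
    @Override
    List<Order> getOrders() {
        return Arrays.asList(new Order("amazon", "AMZ-0001")); // hypothetical row value
    }
}

final class OrderCollector {
    private OrderCollector() {}

    /** Asks every implementation for its orders and merges them into one list. */
    static List<Order> collect(List<OrderInformation> sources) {
        List<Order> all = new ArrayList<>();
        for (OrderInformation source : sources) {
            all.addAll(source.getOrders());
        }
        return all;
    }
}
```

Keeping the source name on each Order serves the same purpose as the source column in the view: it keeps order numbers distinguishable if two vendors happen to use the same one.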
