I want to store millions of time series entries (a long timestamp, a double value) with Java. (Our monitoring system currently stores every entry in a large MySQL table, but performance is very bad.)
Are there any time series databases implemented in Java?
Check out http://opentsdb.net/, as used by StumbleUpon.
Check out http://square.github.com/cube/, as used by Square.
I hope to see additional suggestions in this thread.
The performance was bad because of a flawed database design. I am using MySQL, and the table had this layout:
+-------------+--------------------------------------+------+-----+-------------------+-----------------------------+
| Field       | Type                                 | Null | Key | Default           | Extra                       |
+-------------+--------------------------------------+------+-----+-------------------+-----------------------------+
| fk_category | smallint(6)                          | NO   | PRI | NULL              |                             |
| method      | enum('min','max','avg','sum','none') | NO   | PRI | none              |                             |
| time        | timestamp                            | NO   | PRI | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| value       | float                                | NO   |     | NULL              |                             |
| accuracy    | tinyint(1)                           | NO   |     | 0                 |                             |
+-------------+--------------------------------------+------+-----+-------------------+-----------------------------+
My mistake was an inappropriate index. After adding a multi-column primary key, all my queries are lightning fast:
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| job   |          0 | PRIMARY  |            1 | fk_category | A         |          18 | NULL     | NULL   |      | BTREE      |         |               |
| job   |          0 | PRIMARY  |            2 | method      | A         |          18 | NULL     | NULL   |      | BTREE      |         |               |
| job   |          0 | PRIMARY  |            3 | time        | A         |   452509710 | NULL     | NULL   |      | BTREE      |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Thanks for all your answers!
You can take a look at KDB. It's used primarily by financial companies to query market time series data.
What do you need to do with the data and when?
If you are just saving the values for later, a plain text file might do nicely; you can load it into a database later.
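If you go that route, appending one CSV-style line per sample is enough. A minimal sketch (the class name and file layout are just placeholders):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class SeriesFileWriter {

    private final Path file;

    public SeriesFileWriter(Path file) {
        this.file = file;
    }

    // Appends one "timestamp,value" line per sample; the file can later be
    // bulk-loaded into the database (e.g. with LOAD DATA INFILE in MySQL).
    public void append(long timeMillis, double value) throws IOException {
        Files.write(file, List.of(timeMillis + "," + value),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}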
Related
I have a requirement where I have to save employee details to a child table. The employee data stays the same across multiple parent records (and columns). I want to use the existing child table, inserting a row if it is not present and updating it if it is already present. How can I do that using Spring Data JPA / Hibernate?
Parent table
| id | project_id | owner_emp_id | developer_emp_id | lead_emp_id | tester_emp_id |
|----|------------|--------------|------------------|-------------|---------------|
| 1  | 100        | emp_10       | emp_20           | emp_20      | emp_30        |
| 2  | 200        | emp_11       | emp_21           | emp_22      | emp_30        |
Employee child table
| emp_id | first_name | last_name | email | phone |
|--------|------------|-----------|-------|-------|
| emp_10 | ..         | ..        | ..    | ..    |
| emp_20 | ..         | ..        | ..    | ..    |
| emp_30 | ..         | ..        | ..    | ..    |
| emp_11 | ..         | ..        | ..    | ..    |
| emp_21 | ..         | ..        | ..    | ..    |
| emp_22 | ..         | ..        | ..    | ..    |
From the example above, when saving the child table for the first parent record, emp_20's data needs to be saved only once. Similarly, when inserting the second record, emp_30 is already present, so it should be updated (or the save skipped). How do I do this in JPA?
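With Spring Data JPA, the simplest upsert is often just save(): for an entity with a caller-assigned id, it updates the existing row if one exists and inserts otherwise. A minimal sketch, with hypothetical entity and repository names (the jakarta.persistence imports assume Spring Boot 3; older stacks use javax.persistence):

import jakarta.persistence.Entity;
import jakarta.persistence.Id;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Entity
class Employee {
    @Id
    String empId;       // caller-assigned key, e.g. "emp_20"
    String firstName;
    String lastName;
    String email;
    String phone;
}

interface EmployeeRepository extends JpaRepository<Employee, String> {}

@Service
class EmployeeUpsertService {

    private final EmployeeRepository repository;

    EmployeeUpsertService(EmployeeRepository repository) {
        this.repository = repository;
    }

    // save() merges (UPDATE) when a row with this empId already exists and
    // persists (INSERT) when it does not, so one call covers both cases.
    @Transactional
    public Employee upsert(Employee incoming) {
        return repository.save(incoming);
    }
}

If you would rather skip the write entirely when the employee already exists, check repository.existsById(empId) first and only save when it returns false.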
I am trying to write a Cucumber feature file using data tables. The object that I need to build from the data table has a field that itself consists of two fields. Example:
| Name  | Owner  | Properties.Key | Properties.value |
| Name1 | myself | someKey1       | someValue1       |
| Name2 | robins | someKey2       | someValue2       |
I was wondering whether, instead of writing it this way, there's a better way to express the nested objects using data tables, something more like SpecFlow. Example:
| Name  | Owner  | Properties |
| name1 | myself | {nested}   |
| name2 | robins | {nested}   |

| key      | value      |
| someKey1 | someValue1 |
| someKey2 | someValue2 |
Or is there any other way to create a nested data table?
Also, what would the step definition for this table look like in Java?
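For the flat Properties.Key / Properties.value layout from the first example, one option is to rebuild the nesting by hand in the step definition. A minimal sketch (the Item class and step wording are hypothetical):

import io.cucumber.datatable.DataTable;
import io.cucumber.java.en.Given;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ItemSteps {

    // Hypothetical target object with a nested key/value field.
    static class Item {
        String name;
        String owner;
        Map<String, String> properties = new HashMap<>();
    }

    @Given("the following items exist:")
    public void theFollowingItemsExist(DataTable table) {
        List<Item> items = new ArrayList<>();
        for (Map<String, String> row : table.asMaps()) {
            Item item = new Item();
            item.name = row.get("Name");
            item.owner = row.get("Owner");
            // The dotted headers are folded into the nested map by hand.
            item.properties.put(row.get("Properties.Key"),
                                row.get("Properties.value"));
            items.add(item);
        }
        // ... hand the items to the system under test ...
    }
}

Cucumber itself treats a data table as flat, so the SpecFlow-style nested table is not supported out of the box; the usual workaround is a naming convention like the dotted headers above, or a custom @DataTableType transformer.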
I am trying to export a table from HDFS to MySQL using Sqoop, but I am getting Java exceptions.
The command I'm using is as follows:
sqoop export --connect jdbc:mysql://172.31.54.174/Database --driver com.mysql.jdbc.Driver --username user --password userpassword --table accounts --export-dir /user/pri/accounts
Executing this command gives me the error below:
17/03/29 07:54:26 INFO mapreduce.Job: map 0% reduce 0%
17/03/29 07:54:30 INFO mapreduce.Job: Task Id : attempt_1489328678238_4886_m_000002_0, Status : FAILED
Error: java.io.IOException: Can't export data, please check failed map task logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.RuntimeException: Can't parse input data: '\N'
at accounts.__loadFromFields(accounts.java:691)
at accounts.parse(accounts.java:584)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
... 10 more
Caused by: java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
at java.sql.Timestamp.valueOf(Timestamp.java:204)
at accounts.__loadFromFields(accounts.java:643)
... 12 more
The file that I am exporting contains data like this:
1,2008-10-23 16:05:05.0,\N,Donald,Becton,2275 Washburn Street,Oakland,CA,94660,5100032418,2014-03-18 13:29:47.0,2014-03-18 13:29:47.0
2,2008-11-12 03:00:01.0,\N,Donna,Jones,3885 Elliott Street,San Francisco,CA,94171,4150835799,2014-03-18 13:29:47.0,2014-03-18 13:29:47.0
I have also created the accounts table; its structure is as follows:
+----------------+-------------+------+-----+---------+-------+
| Field          | Type        | Null | Key | Default | Extra |
+----------------+-------------+------+-----+---------+-------+
| acct_num       | varchar(20) | NO   | PRI |         |       |
| acct_create_dt | datetime    | NO   |     | NULL    |       |
| acc_close_dt   | datetime    | YES  |     | NULL    |       |
| first_name     | varchar(20) | NO   |     | NULL    |       |
| last_name      | varchar(20) | NO   |     | NULL    |       |
| address        | varchar(30) | NO   |     | NULL    |       |
| city           | varchar(20) | NO   |     | NULL    |       |
| state          | varchar(20) | NO   |     | NULL    |       |
| zipcode        | varchar(20) | NO   |     | NULL    |       |
| phone_number   | varchar(20) | YES  |     | NULL    |       |
| created        | datetime    | NO   |     | NULL    |       |
| modified       | datetime    | NO   |     | NULL    |       |
+----------------+-------------+------+-----+---------+-------+
I am also attaching a screenshot of the error.
As you can see from your logs, '\N' is an escape sequence (the representation of NULL in the export files), so it does not fit into a varchar column; I don't understand why you are including it. The timestamp format issue is indicated as well. Also check whether any column you are using as the primary key repeats itself in your existing data.
Add --input-null-string '\\N' --input-null-non-string '\\N' to your sqoop export command.
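Applied to the command from the question, that gives:

sqoop export --connect jdbc:mysql://172.31.54.174/Database --driver com.mysql.jdbc.Driver --username user --password userpassword --table accounts --export-dir /user/pri/accounts --input-null-string '\\N' --input-null-non-string '\\N'

These options tell Sqoop to interpret '\N' in the input files as SQL NULL, so the empty acc_close_dt values no longer reach the timestamp parser.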
So I have the following table that I must map to Java objects:
+---------+-----------+---------------------+---------------------+--------+
| task_id | attribute | lastModified        | activity            | row_id |
+---------+-----------+---------------------+---------------------+--------+
|       1 |         1 | 2016-08-23 21:05:09 | first activity      |      1 |
|       1 |         3 | 2016-08-23 21:08:28 | connect to db       |      2 |
|       1 |         3 | 2016-08-23 21:08:56 | create web services |      3 |
|       1 |         4 | 2016-08-23 21:08:56 | data dump           |      4 |
|       1 |         5 | 2016-08-23 21:08:56 | test cases          |      5 |
|       1 |         6 | 2016-08-23 21:08:57 | dao object          |      6 |
|       1 |         7 | 2016-08-23 21:08:57 | buy streetfood      |      7 |
|       2 |         6 | 2016-08-23 21:08:57 | drink coke          |      8 |
|       2 |         6 | 2016-08-23 21:09:00 | drink tea           |      9 |
|       2 |         1 | 2016-08-23 21:12:30 | make tea            |     10 |
|       2 |         2 | 2016-08-23 21:13:32 | charge phone        |     11 |
|       2 |         3 | 2016-08-23 21:13:32 | shower              |     12 |
|       2 |         4 | 2016-08-23 21:13:32 | sleep               |     13 |
+---------+-----------+---------------------+---------------------+--------+
Here, each Task object (identified by the task_id column) has multiple Attribute objects, and each Attribute object has the lastModified and activity fields. So far my approach has been to map each row of the table to a Row object via MyBatis and then do some Java-side processing to sort everything out. Is there a way to map this table directly via MyBatis annotations and/or XML so that the two Task objects are created, each with a populated list of Attribute objects inside?
Here is the MyBatis documentation: http://www.mybatis.org/mybatis-3/sqlmap-xml.html. Maybe you can use a MyBatis collection to solve your problem.
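For reference, here is a minimal annotation-based sketch of that idea. The table name (task_activity) and all class/method names are assumptions, since the question does not name them; the XML <collection> element described in the linked docs achieves the same result with a single JOIN instead of one extra query per task:

import java.util.List;
import org.apache.ibatis.annotations.Many;
import org.apache.ibatis.annotations.Result;
import org.apache.ibatis.annotations.Results;
import org.apache.ibatis.annotations.Select;

public interface TaskMapper {

    @Select("SELECT DISTINCT task_id FROM task_activity")
    @Results({
        @Result(property = "taskId", column = "task_id", id = true),
        // For each task_id, run the query below and collect the rows
        // into the attributes list.
        @Result(property = "attributes", column = "task_id",
                many = @Many(select = "selectAttributesForTask"))
    })
    List<Task> selectAllTasks();

    @Select("SELECT attribute, lastModified, activity, row_id AS rowId "
            + "FROM task_activity WHERE task_id = #{taskId}")
    List<TaskAttribute> selectAttributesForTask(long taskId);

    class Task {
        public long taskId;
        public List<TaskAttribute> attributes;
    }

    class TaskAttribute {
        public int attribute;
        public java.sql.Timestamp lastModified;
        public String activity;
        public long rowId;
    }
}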
I have the tables accounts and action. accounts needs to be modified according to the instructions stored in action.
In action, each row contains an account id, an operation (i=insert, u=update, d=delete, x=invalid operation), and a new value for the account.
On an insert, if the account already exists, an update should be done instead.
On an update, if the account does not exist, it is created by an insert.
On a delete, if the row does not exist, no action is taken.
Input
accounts:
+----+-------+
| id | value |
+----+-------+
|  1 |  1000 |
|  2 |  2000 |
|  3 |  1500 |
|  4 |  6500 |
|  5 |   500 |
+----+-------+
action:
+------------+---+-----------+--------+
| account_id | o | new_value | status |
+------------+---+-----------+--------+
|          3 | u |       599 |        |
|          6 | i |      2099 |        |
|          5 | d |           |        |
|          7 | u |      1599 |        |
|          1 | i |       399 |        |
|          9 | d |           |        |
|         10 | x |           |        |
+------------+---+-----------+--------+
Output
accounts:
+----+-------+
| id | value |
+----+-------+
|  1 |   399 |
|  2 |   800 |
|  3 |   599 |
|  4 |  1400 |
|  6 | 20099 |
|  7 |  1599 |
+----+-------+
action:
+------------+---+-----------+--------------------------------------+
| account_id | o | new_value | status                               |
+------------+---+-----------+--------------------------------------+
|          3 | u |       599 | Update: Success                      |
|          6 | i |     20099 | Update: Success                      |
|          5 | d |           | Delete: Success                      |
|          7 | u |      1599 | Update: ID not found. Value inserted |
|          1 | i |       399 | Insert: Acc exists. Updated instead  |
|          9 | d |           | Delete: ID not found                 |
|         10 | x |           | Invalid operation: No action taken   |
+------------+---+-----------+--------------------------------------+
I am experienced with Java and JDBC, but unfortunately I just don't know how to start here.
Do I need an additional table? Do I have to use triggers?
I've seen two techniques for an upsert. With the first technique, within a transaction, you first test whether the row exists and use the result to decide between an insert and an update. With the second technique, you attempt the update and check the number of rows affected (JDBC gives you this): if it is zero, you do an insert; if it is one, you're done.
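A sketch of the second technique in plain JDBC, using the accounts table from the question (transaction handling and the status bookkeeping omitted):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class AccountUpsert {

    // Tries an UPDATE first; executeUpdate() returns the number of affected
    // rows, and zero means the account does not exist yet, so we fall back
    // to an INSERT.
    public static void upsert(Connection conn, int id, int value) throws SQLException {
        try (PreparedStatement update = conn.prepareStatement(
                "UPDATE accounts SET value = ? WHERE id = ?")) {
            update.setInt(1, value);
            update.setInt(2, id);
            if (update.executeUpdate() == 0) {
                try (PreparedStatement insert = conn.prepareStatement(
                        "INSERT INTO accounts (id, value) VALUES (?, ?)")) {
                    insert.setInt(1, id);
                    insert.setInt(2, value);
                    insert.executeUpdate();
                }
            }
        }
    }
}

The delete case is analogous: execute the DELETE and inspect the affected-row count to decide which message to write back into the status column. No extra table or trigger is needed for any of this.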