Storing a custom level system in MySQL - Java

I am wondering how I would store my custom network level in a MySQL table. I could make four columns: 'level', 'exp', 'expreq' and 'total'. But that takes four columns, and since I am already storing name, rank and other data in the same table, it will end up with too many columns. Are there better ways? Should I make another table?

In a relational data model, and for the sake of extensibility, you would have to do it in a different table: the master table points to a detail table, which can hold as many attributes as you need.
BUT
This has an obvious impact on storage and memory as the data grows. In addition, this approach is often replaced by a less-normalized version of the tables, introducing concepts like "custom fields".
OR
If it were me, and this table will only be accessed from application code, I would store the values in JSON format in a very simple table, and let the program carry the processing overhead.
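A minimal sketch of that idea, assuming MySQL 5.7+ (which ships a native JSON column type); the table and column names are invented for illustration:

CREATE TABLE player_stats (
    player_id   INT UNSIGNED NOT NULL PRIMARY KEY,
    name        VARCHAR(64)  NOT NULL,
    player_rank VARCHAR(32)  NOT NULL,  -- 'rank' itself is reserved in MySQL 8.0+
    -- the whole level system lives in one document
    level_data  JSON         NOT NULL
);

INSERT INTO player_stats VALUES
    (1, 'Martin', 'admin', '{"level": 12, "exp": 340, "expreq": 500, "total": 4840}');

-- Pull a single field back out of the document:
SELECT JSON_EXTRACT(level_data, '$.level') FROM player_stats WHERE player_id = 1;

The trade-off is that the database can no longer validate or index the individual fields without extra work, so this fits best when the application owns all the logic.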

Related

How to efficiently implement an unread/read message count in a Java based application

I have heard a lot about denormalization, which is used to improve the performance of certain applications, but I've never tried to do anything related.
So, I'm just curious: which places in a normalized DB make performance worse, or in other words, what are the principles of denormalization?
How can I use this technique if I need to improve performance?
Denormalization is generally used to either:
Avoid a certain number of queries
Remove some joins
The basic idea of denormalization is that you'll add redundant data, or group some together, to be able to get that data more easily -- at a smaller cost, which is better for performance.
A quick example?
Consider a "Posts" and a "Comments" table, for a blog
For each Post, you'll have several lines in the "Comment" table
This means that to display a list of posts with the associated number of comments, you'll have to:
Do one query to list the posts
Do one query per post to count how many comments it has (Yes, those can be merged into only one, to get the number for all posts at once)
Which means several queries.
Now, if you add a "number of comments" field into the Posts table:
You only need one query to list the posts
And no need to query the Comments table: the number of comments is already denormalized into the Posts table.
And one query that returns one extra field is better than several queries.
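In SQL, the two approaches look roughly like this (table and column names are illustrative):

-- Normalized: count the comments with a join at read time.
SELECT p.id, p.title, COUNT(c.id) AS comment_count
FROM posts p
LEFT JOIN comments c ON c.post_id = p.id
GROUP BY p.id, p.title;

-- Denormalized: the count is stored on the post itself.
SELECT id, title, comment_count FROM posts;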
Now, there are some costs, yes:
First, this costs some space on both disk and in memory, as you have redundant information:
The number of comments is stored in the Posts table
And you can also find that number by counting rows in the Comments table
Second, each time someone adds/removes a comment, you have to:
Save/delete the comment, of course
But also, update the corresponding number in the Posts table.
But, if your blog has a lot more people reading than writing comments, this is probably not so bad.
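One common way to keep that counter in sync is a pair of triggers; a minimal MySQL sketch, assuming illustrative posts/comments tables like in the example above:

CREATE TRIGGER comments_after_insert AFTER INSERT ON comments
FOR EACH ROW
    UPDATE posts SET comment_count = comment_count + 1 WHERE id = NEW.post_id;

CREATE TRIGGER comments_after_delete AFTER DELETE ON comments
FOR EACH ROW
    UPDATE posts SET comment_count = comment_count - 1 WHERE id = OLD.post_id;

Alternatively, the application can perform the same update inside the transaction that saves the comment.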
Denormalization is a time-space trade-off. Normalized data takes less space, but may require joins to construct the desired result set, hence more time. If it's denormalized, data is replicated in several places. It then takes more space, but the desired view of the data is readily available.
There are other time-space optimizations, such as
denormalized view
precomputed columns
As with any such approach, this improves reading data (because it is readily available), but updating data becomes more costly (because you need to update the replicated or precomputed data).
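For instance, a precomputed column can be expressed as a stored generated column in MySQL 5.7+; a hypothetical order-lines table:

CREATE TABLE order_lines (
    id         INT UNSIGNED  NOT NULL PRIMARY KEY,
    quantity   INT           NOT NULL,
    unit_price DECIMAL(10,2) NOT NULL,
    -- computed once at write time, read back for free
    line_total DECIMAL(12,2) AS (quantity * unit_price) STORED
);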
The word "denormalizing" leads to confusion of the design issues. Trying to get a high performance database by denormalizing is like trying to get to your destination by driving away from New York. It doesn't tell you which way to go.
What you need is a good design discipline, one that produces a simple and sound design, even if that design sometimes conflicts with the rules of normalization.
One such design discipline is star schema. In a star schema, a single fact table serves as the hub of a star of tables. The other tables are called dimension tables, and they are at the rim of the schema. The dimensions are connected to the fact table by relationships that look like the spokes of a wheel. Star schema is basically a way of projecting multidimensional design onto an SQL implementation.
Closely related to star schema is snowflake schema, which is a little more complicated.
If you have a good star schema, you will be able to get a huge variety of combinations of your data with no more than a three way join, involving two dimensions and one fact table. Not only that, but many OLAP tools will be able to decipher your star design automatically, and give you point-and-click, drill down, and graphical analysis access to your data with no further programming.
Star schema design occasionally violates second and third normal forms, but it results in more speed and flexibility for reports and extracts. It's most often used in data warehouses, data marts, and reporting databases. You'll generally have much better results from star schema or some other retrieval oriented design, than from just haphazard "denormalization".
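A minimal star schema sketch, with invented names -- a sales fact table, two dimension tables, and the kind of three-way join mentioned above:

CREATE TABLE dim_date    (date_id INT PRIMARY KEY, calendar_date DATE, month_no INT, year_no INT);
CREATE TABLE dim_product (product_id INT PRIMARY KEY, name VARCHAR(100), category VARCHAR(50));
CREATE TABLE fact_sales (
    date_id    INT NOT NULL,
    product_id INT NOT NULL,
    units_sold INT NOT NULL,
    revenue    DECIMAL(12,2) NOT NULL
);

-- Two dimensions and one fact table: a three-way join.
SELECT d.year_no, p.category, SUM(f.revenue) AS revenue
FROM fact_sales f
JOIN dim_date d    ON d.date_id = f.date_id
JOIN dim_product p ON p.product_id = f.product_id
GROUP BY d.year_no, p.category;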
The critical issues in denormalizing are:
Deciding what data to duplicate and why
Planning how to keep the data in synch
Refactoring the queries to use the denormalized fields.
One of the easiest types of denormalizing is to copy an identity field down to other tables to avoid a join. As identities should never change, the issue of keeping the data in sync rarely comes up. For instance, we populate our client id into several tables because we often need to query them by client and do not necessarily need, in those queries, any of the data from the tables that would sit between the client table and the table we are querying if the data were totally normalized. You still have to do one join to get the client name, but that is better than joining through 6 parent tables to get the client name when that is the only piece of data you need from outside the table you are querying.
However, there would be no benefit to this unless we were often doing queries where data from the intervening tables was not needed.
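Sketched with invented names: say orders hang off projects, and projects off clients, but orders also carries a copied client_id:

-- Fully normalized, this filter would walk orders -> projects -> clients.
-- With the copied key, one join fetches the name and the filter hits orders directly:
SELECT o.*, c.client_name
FROM orders o
JOIN clients c ON c.client_id = o.client_id
WHERE o.client_id = 42;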
Another common denormalization might be to add a name field to other tables. As names are inherently changeable, you need to ensure that they stay in sync with triggers. But if this saves you joining through 5 tables instead of 2, it can be worth the cost of the slightly longer insert or update.
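A sketch of such a sync trigger in MySQL, reusing the invented clients/orders names from above and assuming orders also carries a copied client_name:

CREATE TRIGGER clients_after_update AFTER UPDATE ON clients
FOR EACH ROW
    UPDATE orders SET client_name = NEW.client_name WHERE client_id = NEW.client_id;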
If you have certain requirements, like reporting etc., it can help to denormalize your database in various ways:
introduce some data duplication to save yourself some JOINs (e.g. copy certain information into a table and accept the duplicated data, so that everything you need is in that table and doesn't have to be found by joining another table)
pre-compute certain values and store them in a table column, instead of computing them on the fly every time you query the database. Of course, those computed values might get "stale" over time and you might need to re-compute them at some point, but just reading out a fixed value is typically cheaper than computing something (e.g. counting child rows)
There are certainly more ways to denormalize a database schema to improve performance, but you just need to be aware that you do get yourself into a certain degree of trouble doing so. You need to carefully weigh the pros and cons - the performance benefits vs. the problems you get yourself into - when making those decisions.
Consider a database with a properly normalized parent-child relationship.
Let's say the cardinality is an average of 2:1 -- two child rows per parent.
You have two tables: Parent, with p rows, and Child, with 2p rows.
The join operation means that for p parent rows, 2p child rows must be read. The total number of rows read is p + 2p = 3p.
Consider denormalizing this into a single table holding only the (widened) child rows, 2p of them. The number of rows read is then 2p.
Fewer rows == less physical I/O == faster.
As per the last section of this article,
https://technet.microsoft.com/en-us/library/aa224786%28v=sql.80%29.aspx
one could use "virtual denormalization", where you create Views with some denormalized data so that simpler SQL queries run faster, while the underlying Tables remain normalized for faster add/update operations (as long as you can get away with refreshing the Views at regular intervals rather than in real time). I'm just taking a class on Relational Databases myself but, from what I've been reading, this approach seems logical to me.
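A sketch of such a view, reusing the illustrative posts/comments tables from earlier; the base tables stay normalized:

CREATE VIEW post_summary AS
SELECT p.id, p.title, COUNT(c.id) AS comment_count
FROM posts p
LEFT JOIN comments c ON c.post_id = p.id
GROUP BY p.id, p.title;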
Benefits of de-normalization over normalization
Basically, de-normalization is used for a plain DBMS, not for an RDBMS. As we know, an RDBMS works with normalization, which means not repeating data again and again; even so, some data is repeated when you use foreign keys.
When you use a plain DBMS, there is a need to remove normalization, and for that there is a need for repetition. Still, it improves performance, because there are no relations among the tables and each table has an independent existence.

HBase - How to add a super column family?

I am trying to create a Java application that converts a MySQL database to a NoSQL HBase database.
So far it reads the data from MySQL and inserts it into HBase correctly.
But now I am trying to handle the relationships between the MySQL tables,
and I understood that if there are relationships, you should add one of the tables as a super column family.
I looked in the Apache website documentation and couldn't find anything.
Any ideas?
A column family has nothing to do with relationships. Instead, you have to design your row keys as inverted indexes, which lets you retrieve data from one table in effectively O(1) when you know a key from another. Or, to avoid a join, try to store all the data in one row. Any tool that provides an SQL interface for HBase spawns jobs which take time to start and execute. HBase is fast if you do a Get operation or scan successive rows.
Hope this was useful.
Update
For more details about column families, check out the great book Architecting HBase Applications:
A column family is an HBase-specific concept that you will not find in other RDBMS applications. For the same region, different column families will store the data into different files and can be configured differently. Data with the same access pattern and the same format should be grouped into the same column family. As an example regarding the format, if you need to store a lot of textual metadata information for customer profiles in addition to image files for each customer's profile photo, you might want to store them into two different column families: one compressed (where all the textual information will be stored), and one not compressed (where the image files will be stored). As an example regarding the access pattern, if some information is mostly read and almost never written, and some is mostly written and almost never read, you might want to separate them into two different column families. If the different columns you want to store have a similar format and access pattern, regroup them within the same column family.
The write cache memory area for a given RegionServer is shared by all the column families configured for all the regions hosted by the given host. Abusing column families will put pressure on the memstore, which will generate many small files, which in turn will generate a lot of compactions that might impact the performance. There is no technical limitation on the number of column families you can configure for a table. However, over the last three years, most of the use cases we had the chance to work on only required a single column family. Some required two column families, but each time we have seen more than two column families, it has been possible and recommended to reduce the number to improve efficiency. If your design includes more than three column families, you might want to take a deeper look at it and see if all those families are really required; most likely, they can be regrouped. If you do not have any consistency constraints between your two column families and data will arrive into them at a different time, instead of creating two column families for a single table, you can also create two tables, each with a single column family. This strategy is useful when it comes time to decide the size of the regions. Indeed, while it was better to keep the two column families almost the same size, by splitting them across two different tables, it is now easier to let them grow independently.
Also this answer can be useful.

Variable structure in a DB table to be read in a Web form

I have a "variable" structure to be put in a table DB. By "variable" I mean a sequence of couples field/value in which the "kind" of field determines the value type, I don't know exactly field order and I don't know how many times fields can repeat. Sometimes group of fields will repeat several times (it is a fiscal model).
Additional requirement: I should map these variable data into web page forms, handling some CRUD work.
jQuery UI, Struts 2, Hibernate. Preferred DBMS: MySQL.
The solutions I thought of:
A vertical table. I could have some performance issues, which I could resolve with materialized views that "pivot" the rows into columns when I need to process data massively. I have not gone far in this direction, as it seems very expensive in development effort.
LOB fields. Pack my columns into one of those, perhaps with a "mapping" table to decode each column. My idea is to pull out the searchable fields as "real" columns, so that the LOB holds only the less interesting mass of data and doesn't cause performance problems.
Or better, 2a: use XML inside the LOB field. This could make it more comfortable to pack/unpack the data, especially since I have to map the data to a web form.
What do you think? Also, is there some way to create views automatically from XML fields, or a better way to map such data to a web form? I suspect Hibernate Tools won't work in any of the cases I described.
I hope I have been clear, it's still a bit confusing even to me :)
Your option 1 is the Entity-Attribute-Value antipattern.
See my answer to Product table, many kinds of product, each product has many parameters and my blog post EAV FAIL for alternatives and some reasons why EAV is wrong, at least for a relational database (I cover EAV in my book, SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming).
Also read this article about how a similar structure nearly doomed a company: Bad CaRMa.
Your options 2 & 3 are similar to described in How FriendFeed uses MySQL to store schema-less data. I don't know of any automatic way for an ORM to maintain that structure for you. You do have the chore of keeping your inverted index tables in sync with your LOB data.

Recommended way of adding prefixes to the String keys(row keys) to be stored in database

What is the recommended way in Java to add prefixes to String keys to be stored in the database of a web application?
I have an EntityId per Entity but I want to store different kinds of data for an Entity, in different rows distinguished by prefixed EntityId keys like this format:
EntityId | PrefixForThisDataCategory
In general, databases resist that kind of thing. They're happy to store "prefix" characters in a separate column, though. If that column needs to be part of a composite key, they're happy to do that, too.
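The separate-column version, sketched with invented names; the category that the prefix would have encoded becomes part of a composite primary key:

CREATE TABLE entity_data (
    entity_id     BIGINT      NOT NULL,
    data_category VARCHAR(20) NOT NULL,  -- what the prefix would have encoded
    payload       TEXT,
    PRIMARY KEY (entity_id, data_category)
);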
But if you want to store different kinds of data in different rows of the same table, I hope I can discourage you. Databases--SQL databases, that is--are designed to keep different kinds of data in different tables. People in one table, addresses in another table . . . not addresses in some rows of a table of people.
Of course, you might be aiming at something completely different.
If I understand your question correctly, you basically want to store data representing different sets of information in a common table and use one or more fields to differentiate what type of data has been stored.
THIS IS A REALLY BAD IDEA ! - had to say it :-)
I can tell you from experience on many projects that storing data this way always leads to problems and really messy code. My strong recommendation would be to store the data in separate tables.
There is one variation I can think of, however, that is similar to your request: derived (subclass) mapping in Hibernate. There, Hibernate maps sets of data into tables based on the class being stored. This is only for mapping class hierarchies and is controlled by Hibernate, so you don't have to worry about it.
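For example, with Hibernate's joined-subclass strategy the tables underneath might look like this (all names invented; the point is one table per class in the hierarchy, joined on the shared key):

CREATE TABLE contact         (id BIGINT NOT NULL PRIMARY KEY, name VARCHAR(100));
CREATE TABLE person_contact  (id BIGINT NOT NULL PRIMARY KEY, birth_date DATE);
CREATE TABLE company_contact (id BIGINT NOT NULL PRIMARY KEY, vat_number VARCHAR(20));
-- person_contact.id and company_contact.id each reference contact.id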

persisting dynamic properties and query

I have a requirement to implement a contact database. This contact database is special in that the user should be able to dynamically (at runtime) add properties he/she wants to track about a contact. Some of these properties are strings, others numbers and dates. Some of the properties have pre-defined values, others are free fields, etc. The user also wants to be able to query such a structure quickly and easily. The database needs to comfortably handle 500,000 contacts, each having around 10 properties.
It leads to dynamic property model having Contact class with dynamic properties.
class Contact {
    // each dynamic property maps to the collection of values entered for it
    private Map<DynamicProperty, Collection<DynamicValue>> propertiesAndValues;
    // other useful methods
}
The question is how I can store such a structure in "some database" - it does not have to be an RDBMS - so that I can easily express queries such as:
Get all contacts whose name starts with Martin and who are from a company of size 5000 or less, ordered by the time the contact was inserted into the database, first 100 results only (pagination) - where each of these criteria corresponds to a dynamic property.
I need:
filtering - equals, partial match (and greater/less than for integers and dates), and maybe aggregation - but that is not necessary at this point
sorting
pagination
I was considering an RDBMS, but this leads more or less to the following structure, which is quite hard to query and tends to be slow for this amount of data:
contact(id serial pk,....);
dynamic_property(dp_id serial pk, ...);
-- only one of the value columns is non-null per row
dynamic_property_value(dpv_id serial pk, dynamic_property_fk int, integer_value int, date_value timestamp, text_value text);
contact_properties(pav_id serial pk, contact_id_fk int, dynamic_property_fk int);
property_and_its_value(pav_id_fk int, dpv_id int);
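To see why it is hard to query, here is what the single filter "company size of 5000 or less" looks like against this sketch (assuming dynamic_property has a name column):

SELECT c.id
FROM contact c
JOIN contact_properties cp      ON cp.contact_id_fk = c.id
JOIN dynamic_property dp        ON dp.dp_id = cp.dynamic_property_fk
JOIN property_and_its_value pv  ON pv.pav_id_fk = cp.pav_id
JOIN dynamic_property_value dpv ON dpv.dpv_id = pv.dpv_id
WHERE dp.name = 'company_size' AND dpv.integer_value <= 5000;

Every additional dynamic-property filter repeats this whole join chain.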
I am considering the following options:
store contacts in an RDBMS and use Lucene for querying - is there anything that would help with this?
store the dynamic properties as XML in the RDBMS and use its XPath support - unfortunately that seems to be pretty slow for 500,000 contacts
use another database - MongoDB or Jackrabbit - to store this information
Which way would you go and why?
Wikipedia has a great entry on Entity-Attribute-Value modeling, which is a data modeling technique for representing entities with arbitrary properties. It's typically used for clinical data, but might apply to your situation as well.
Have you considered using Lucene for your querying needs? You could probably get away with just using Lucene, storing all your data in the index, although I wouldn't recommend using Lucene as your only persistence store.
Alternatively, you could use Lucene along with an RDBMS and take advantage of something like Compass.
You could try another kind of database, like CouchDB, which is a document-oriented DB and is distributed.
If you want a dumb solution, you could add some 50 columns to your contacts table, like STRING_COLUMN1, STRING_COLUMN2 ... up to 10, and DATE_COLUMN1 .. DATE_COLUMN10. You have another DESCRIPTION column, so if a row has a name (which is a string), then STRING_COLUMN1 stores the value of the name and the DESCRIPTION column value would be "STRING_COLUMN1-NAME". In this case querying can be a bit tricky. I know many purists laugh at this, but I have seen a similar requirement solved this way in one of the apps :)
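Sketched with fewer spare columns than the 50 suggested (all names invented):

CREATE TABLE contact_values (
    contact_id     INT NOT NULL,
    -- records which spare column holds which logical field,
    -- e.g. 'STRING_COLUMN1-NAME;DATE_COLUMN1-BIRTHDAY'
    description    VARCHAR(255),
    string_column1 VARCHAR(255),
    string_column2 VARCHAR(255),
    date_column1   DATE,
    date_column2   DATE
);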
