I have an Oracle table that contains a number of CLOB fields. The CLOB fields themselves are plain-text CSV files that are about 4 KB-8 KB in size. There are about 200 records in my table. I have an SQL query that I run through JdbcTemplate, and it runs pretty quickly (about 2 seconds).
Unfortunately, when I try to extract the CLOB field into a String for all 200 records, the execution time goes from 2 seconds to over 20 seconds, which is far too slow.
I am using the following line to convert the CLOB to a String.
String clobValue = clob.getSubString(1, (int) clob.length());
It seems to be this conversion that is killing performance for me. Are there any alternatives open to me on the Oracle or Java side to speed this up?
I have tried some other suggestions from here, but performance doesn't improve - Most efficient solution for reading CLOB to String, and String to CLOB in Java?
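For reference, the streamed variant of that conversion looks roughly like this (a minimal sketch, assuming the Clob comes straight from the ResultSet; the java.io.BufferedReader and the 4 KB buffer size are my own choices):

// Sketch: read the CLOB through a character stream instead of getSubString
StringBuilder sb = new StringBuilder((int) clob.length());
try (Reader reader = new BufferedReader(clob.getCharacterStream())) {
    char[] buffer = new char[4096];
    int read;
    while ((read = reader.read(buffer)) != -1) {
        sb.append(buffer, 0, read);
    }
}
String clobValue = sb.toString();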
Surely there is a different, more efficient way to do this?
Related
I have a PNG file which is stored as a CLOB in an Oracle DB, and I want to get this data with JdbcTemplate using the code below:
LobHandler lob = new DefaultLobHandler();
// the original query read "select table from column"; the table name below is just a placeholder
return jdbcTemplate.queryForObject(
        "select CLOB_DATA from some_table",
        arr,
        (rs, rowNum) -> lob.getClobAsString(rs, "CLOB_DATA")
);
I am converting it to a byte array as follows, since the encoding in the Oracle DB is WE8MSWIN1252:
clob.getBytes(Charset.forName("windows-1252"));
However, I get different bytes when I read the file manually with the code below and compare it with the DB data:
File path = new File("path/to/file");
Files.readAllBytes(path.toPath());
Some chars are not loaded correctly. What could be the problem?
I have a PNG file which is stored as a CLOB in an Oracle DB.
Don't do this. If you have binary data then store it in a binary format, such as a BLOB, and don't store it in a format which was not intended for that purpose.
However, I get different bytes when I read the file manually with the code below and compare it with the DB data.
That's because you're using a CLOB for something it wasn't intended for.
What could be the problem?
The problem is that you are using the wrong data type for your data.
I cannot change it to a BLOB since there is already data stored as a CLOB in the DB.
Change them all to a BLOB and then just convert the BLOB to a CLOB when you need to get character data out. Trying to do it the other way round (without any encoding of the binary to make it safe to store as a string) is going to continue creating issues like the one you already have.
If you really must use a CLOB (please, DON'T) then store the binary in an encoded format, such as Base64, which can be safely stored as character data.
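A minimal sketch of that approach, assuming the PNG is read from a local path (the path and variable names are illustrative; uses java.util.Base64 and java.nio.file.Files):

// Sketch: Base64-encode the PNG so it can be stored safely as character data
byte[] pngBytes = Files.readAllBytes(Paths.get("path/to/file.png"));
String encoded = Base64.getEncoder().encodeToString(pngBytes);
// store `encoded` in the CLOB column; when reading it back:
byte[] decoded = Base64.getDecoder().decode(encoded);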
During execution of a program that relies on the oracle.sql package there is a large performance hit for persisting > 200 million Timestamps when compared to persisting the same number of longs.
Basic Schema
Java to persist:
Collection<ARRAY> longs = new ArrayList<ARRAY>(SIZE);
Collection<ARRAY> timeStamps = new ArrayList<ARRAY>(SIZE);
for (int i = 0; i < SIZE; i++)
{
    longs.add(new ARRAY(description, connection, i));
    timeStamps.add(new ARRAY(description, connection, new Timestamp(i)));
}

Statement timeStatement = conn.createStatement();
timeStatement.setObject(1, timeStamps);
timeStatement.execute(); // 5 minutes

Statement longStatement = conn.createStatement();
longStatement.setObject(1, longs);
longStatement.execute(); // 1 minute 15 seconds
My question is: what does Oracle do to Timestamps that makes them so awful to insert in bulk?
Configuration:
64 bit RHEL 5
jre 6u16
ojdbc14.jar
64 GB dedicated to the JVM
UPDATE
java.sql.Timestamp is being used
Number takes 4 bytes, Timestamp takes 11 bytes. In addition, Timestamp has metadata associated with it. For each Timestamp, Oracle seems to compute the metadata and store it with the field.
Oracle timestamps are not stored as an absolute value since the epoch, the way a java.sql.Timestamp holds it internally. They are a big bitmask containing values for the various "human" fields: centuries, months, and so on.
So each one of your nanosecond-since-epoch timestamps is getting parsed into a "human" date before storage.
Adding to Srini's post, for documentation on memory use by data type:
Oracle Doc on Data Types: http://docs.oracle.com/cd/E11882_01/timesten.112/e21642/types.htm#autoId31 (includes memory size for Number and Timestamp)
The docs state that Number takes 5-22 bytes, Timestamp takes 11 bytes, Integer takes 4 bytes.
Also - to your point on querying against a date range - could you insert the dates as long values instead of timestamps and then use a stored procedure to convert when you are querying the data? This will obviously impact the speed of the queries, so it could be kicking the problem down the road, but.... :)
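As a rough illustration of that idea (the column name and the milliseconds-since-epoch convention are assumptions on my part), the conversion on the read side could be as simple as:

// Sketch: the column stores the time as a NUMBER of epoch milliseconds
long epochMillis = rs.getLong("EVENT_TIME_MS"); // hypothetical column name
Timestamp ts = new Timestamp(epochMillis);      // convert only when querying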
My site needs to store the IP and timestamp of every visit in MySQL. I am concerned that very quickly I will have 1e6 rows in my database.
What is the best way to compress a date in MySQL or Java? Does MySQL already compress dates? Ideally, I would like to un-compress the date values rather quickly to generate reports.
Update: sorry, I meant a million per day. But I guess that is still minuscule.
Mate, one million rows is a tiny database. I wouldn't be worrying too much about that. In any case, MySQL uses a pretty compressed format (3 bytes) anyway as per this page:
DATE: A three-byte integer packed as DD + MM×32 + YYYY×16×32
In other words, at bit level (based on the 1000-01-01 thru 9999-12-31 range):
0yyyyyyy yyyyyyym mmmddddd
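As a quick worked example of that packing (the date 2024-06-15 is just an illustration):

// 2024-06-15 packed as DD + MM*32 + YYYY*16*32
int packed = 15 + 6 * 32 + 2024 * 16 * 32; // = 1036495, which fits comfortably in 3 bytes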
Use the built-in MySQL DATETIME type. A million rows isn't that many.
A MySQL TIMESTAMP would be only 4 bytes. An integer representing the timestamp would be the same. It would be efficient to save it as a MySQL type, since you'd be able to index and/or query on that column efficiently.
Any "compressed" form that isn't a MySQL type would be inefficient to query.
I am inserting a very big XML into a Sybase column which has type 'text'.
I am writing it using setString in a PreparedStatement and reading it using getString.
But when I select it using getString I don't get the complete XML.
What can I do to read/write the complete XML?
Doesn't Sybase provide support for the CLOB data type (which would be more suitable for storing large XMLs)? In the PreparedStatement, you will need to use setClob() instead of setString().
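A minimal sketch of that change, assuming a JDBC 4.0+ driver (the table and column names are made up for illustration):

// Sketch: write the XML through setClob with a Reader instead of setString
PreparedStatement ps = conn.prepareStatement(
        "insert into xml_docs (id, xml_data) values (?, ?)");
ps.setInt(1, 1);
ps.setClob(2, new StringReader(xml)); // setClob(int, Reader) requires JDBC 4.0+
ps.executeUpdate();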
Sybase ASE 15 has a bug when writing text columns of more than 8192 bytes: If your string (XML) has an invalid character (that does not conform to your Sybase database's legal character set) after position 8192 then Sybase will only write 8192 characters of your text and tell you that the operation was successful.
I'm working with third party user data that may or may not fit into our database. The data needs to be truncated if it is too long.
We are using iBatis with Connector/J. If the data is too long, a SQL exception is thrown. I have two choices: either truncate the strings in Java or truncate the strings in SQL using substring.
I don't like truncating the strings in SQL, because it means writing table structure into our iBatis XML; on the other hand, SQL knows about our database collation (which isn't consistent and would be expensive to make consistent) and can truncate strings in a multibyte-safe manner.
Is there a way to have Connector/J just perform the insert anyway, and if not, which route would people recommend?
According to the MySQL documentation, inserting data that exceeds the column length can be treated as a warning:
Inserting a string into a string column (CHAR, VARCHAR, TEXT, or BLOB) that exceeds the column's maximum length. The value is truncated to the column's maximum length.
One of the Connector/J properties is jdbcCompliantTruncation. This is its description:
This sets whether Connector/J should throw java.sql.DataTruncation exceptions when data is truncated. This is required by the JDBC specification when connected to a server that supports warnings (MySQL 4.1.0 and newer). This property has no effect if the server sql-mode includes STRICT_TRANS_TABLES. Note that if STRICT_TRANS_TABLES is not set, it will be set as a result of using this connection string option.
If I understand correctly, setting this property to false doesn't throw the exception but inserts the truncated data. This solution doesn't require you to truncate the data in program code or SQL statements; it delegates the truncation to the database.
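A minimal sketch of what that would look like on the connection URL (host, schema, and credentials are placeholders):

// Sketch: let MySQL truncate with a warning instead of Connector/J throwing DataTruncation
// (remember: this has no effect if the server sql-mode includes STRICT_TRANS_TABLES)
String url = "jdbc:mysql://localhost:3306/mydb?jdbcCompliantTruncation=false";
Connection conn = DriverManager.getConnection(url, "user", "password");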