What is the range of the MS SQL xml argument? - java

I am using MS SQL with the J2EE Spring framework.
When inserting data into a table, I use a bulk insert with an xml argument in MS SQL.
Can anyone say how much data can be passed this way?
I would like to know the size limit for the xml argument.
T.Saravanan

On the SQL Server side, it is 2 GB:
The stored representation of xml data type instances cannot exceed 2 gigabytes (GB) in size
"Stored" means after some processing for efficiency
SQL Server internally represents XML in an efficient binary representation that uses UTF-16 encoding. User-provided encoding is not preserved, but is considered during the parse process.

Related

Java / Sql-server parameter binding does not work as expected

We notice a strange behaviour in our application concerning bind parameters. We use Java with JDBC to connect to a Sql Server database. In a table cell we have the value 'µ', and we compare it with a bind parameter, which is also set to the value 'µ'.
Now, in a sql statement like "... where value != ?", where 'value' is the value of 'µ' in the database and ? the bind variable, which is also set to 'µ', we notice that we get a record, though we would expect that 'µ' equals 'µ'.
The method that we use to fill the bind parameter is java.sql.PreparedStatement.setString(int, String).
Some facts:
The character value of µ in different encodings is:
ISO-8859-1 (Latin-1, not plain ASCII) : 0xB5
UTF-8 : 0xC2B5
UTF-16 (= Java) : 0x00B5
Then I investigated which bytes the database actually sees. To do so, I tried a SQL statement like this:
select convert(VARBINARY(MAX), value), -- selects µ from database table
convert(VARBINARY(MAX), N'µ'), -- selects µ from literal
convert(VARBINARY(MAX), ?) -- selects µ from bind parameter
from ...
The result for the three values is:
B500
B500
C200B500 <-- Here is the problem!
So, the internal representation of µ in the database and as NVARCHAR literal is B500.
We can't understand what is going on here. We have the value 'µ' in a Java variable (which should internally be 0x00B5). When it is passed as a bind variable, it seems as if it is converted to UTF-8 (which gives the byte sequence 0xC2B5), and then the database treats it as two characters, turning it into the sequence C200B500.
To make things even more confusing:
(1) On another machine with a different database the same code works as expected. The result of the three lines is B500/B500/B500, so the bind variable is converted to a proper B500.
(2) On the same machine, the same database but a different program (but using the same jdbc driver library and the same connect parameters) this also works as expected, giving the result of B500/B500/B500.
Some additional facts, maybe they are important:
The database is Sql Server 2014
Java is Java 7
The application in question is a webapp running in Tomcat 7.
Jdbc library is sqljdbc 4.2
Any help to sort this out is greatly appreciated!
I have now found the solution. It had nothing at all to do with Sql Server or binding, but instead...
Tomcat 7 does not use UTF-8 by default (I wasn't aware of that). The µ in question comes from another application that provides this value via webservice calls. That application uses UTF-8 by default. So it was sending a UTF-8 µ, but the webservice did not expect UTF-8, interpreted the two bytes as two separate characters, and filled the internal String variable with the characters 0xC2 and 0xB5 (which is, for Sql Server, C200B500).
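The effect described in this answer can be reproduced without a database or Tomcat. The following is a minimal sketch: the class and method names are made up for illustration, and it assumes the receiving side decoded the UTF-8 bytes as ISO-8859-1, which matches the observed bytes.

```java
import java.nio.charset.StandardCharsets;

public class MuMojibakeDemo {

    // Render bytes as upper-case hex, e.g. {0xB5, 0x00} becomes "B500".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Bytes as they appear in an NVARCHAR value (UTF-16, little-endian).
    static String asNvarcharHex(String s) {
        return hex(s.getBytes(StandardCharsets.UTF_16LE));
    }

    // Simulates a receiver that wrongly decodes UTF-8 bytes as ISO-8859-1.
    static String misdecodeUtf8AsLatin1(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String mu = "\u00B5"; // the micro sign, written as an escape to avoid source-encoding issues
        System.out.println(asNvarcharHex(mu));                        // prints B500
        System.out.println(asNvarcharHex(misdecodeUtf8AsLatin1(mu))); // prints C200B500
    }
}
```

The second line of output is the same C200B500 that showed up for the bind parameter, which suggests the String was already wrong before it reached JDBC.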

performance is slow with hibernate and MS sql server

I'm using Hibernate and the database is SQL Server.
SQL Server differentiates its data types that support Unicode from the ones that only support ASCII. For example, the character data types that support Unicode are nchar, nvarchar, longnvarchar, whereas their ASCII counterparts are char, varchar and longvarchar respectively. By default, all of Microsoft’s JDBC drivers send strings in Unicode format to SQL Server, irrespective of whether the data type of the corresponding column supports Unicode. When the column data types support Unicode, everything is smooth. But when they do not, serious performance issues arise, especially during data fetches. SQL Server tries to convert the non-Unicode data in the table to Unicode before doing the comparison. Moreover, if an index exists on the non-Unicode column, it will be ignored. This ultimately leads to a full table scan during the fetch, slowing down search queries drastically.
The solution we used: we found a property called sendStringParametersAsUnicode, which gets rid of this Unicode conversion. It defaults to ‘true’, which makes the JDBC driver send every string to the database in Unicode format. We switched this property off.
My question: now we cannot send data in Unicode. If in future a varchar column is changed to nvarchar (only one column, not all varchar columns), we will need to send that string in Unicode format.
Please suggest how to handle this scenario.
Thanks.
You need to specify the property sendStringParametersAsUnicode=false in the connection string URL:
jdbc:sqlserver://localhost:1433;databaseName=mydb;sendStringParametersAsUnicode=false
Unicode is the native string representation for communication with SQL Server. If you are converting to MBCS (multibyte character sets), then you are doing 2 conversions for every string. I suggest that if you are concerned with performance, use all Unicode instead of all MBCS.
ref: http://social.msdn.microsoft.com/Forums/en/sqldataaccess/thread/249c629f-b8f2-4a8a-91e8-aad0d83919ca
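For the mixed case raised in the question (most columns varchar, one column nvarchar), one possible approach in plain JDBC is to keep sendStringParametersAsUnicode=false and bind the nvarchar parameter with setNString, which is part of JDBC 4.0. The sketch below is only an illustration: the class name, the statement shape, and the column names code and title are hypothetical, and how this maps onto Hibernate mappings is not covered here.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class MixedUnicodeBinding {

    // Builds a connection URL with Unicode string parameters switched off.
    static String buildUrl(String host, int port, String database) {
        return "jdbc:sqlserver://" + host + ":" + port
                + ";databaseName=" + database
                + ";sendStringParametersAsUnicode=false";
    }

    // Hypothetical statement: "... WHERE code = ? AND title = ?"
    // where code is a varchar column and title is an nvarchar column.
    static void bind(PreparedStatement ps, String code, String title) throws SQLException {
        ps.setString(1, code);   // sent as a non-Unicode string because of the URL property
        ps.setNString(2, title); // explicitly sent as a national (Unicode) string
    }
}
```

With this split, only the parameters that target nvarchar columns are sent as Unicode, so the varchar comparisons should not need the conversion described in the question.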

Writing Big XML in sybase and reading it?

I am inserting a very big XML document into a Sybase column of type 'text'.
I am writing it using setString in PreparedStatement and reading it using getString.
But when I select it using getString I don't get the complete XML.
What can I do to read/write the complete XML?
Doesn't Sybase provide support for the CLOB data type (that would be more suitable for storing large XML)? In the PreparedStatement, you will need to use setClob() instead of setString().
Sybase ASE 15 has a bug when writing text columns of more than 8192 bytes: If your string (XML) has an invalid character (that does not conform to your Sybase database's legal character set) after position 8192 then Sybase will only write 8192 characters of your text and tell you that the operation was successful.
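As an alternative to setString/getString, the standard JDBC character-stream methods can be used to write and read large text values. This is a minimal sketch with hypothetical class and method names. Whether it avoids the truncation on a particular Sybase driver and server is an assumption that would need to be checked against that setup.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class LargeTextIo {

    // Writes the XML through a character stream instead of setString.
    static void writeXml(PreparedStatement ps, int index, String xml) throws SQLException {
        ps.setCharacterStream(index, new StringReader(xml), xml.length());
    }

    // Reads the whole column value through a character stream instead of getString.
    static String readXml(ResultSet rs, int column) throws SQLException {
        Reader reader = rs.getCharacterStream(column);
        return reader == null ? null : readAll(reader);
    }

    // Drains a Reader into a String and closes it.
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[8192];
        try (Reader r = reader) {
            int n;
            while ((n = r.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

Comparing the length of the string that was written with the length of the string that was read back is a quick way to detect the silent truncation mentioned above.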

What to use to store serialized data that can be queried?

I need to extract data from an incoming message that could be in any format. The extracted data to store is also dependent upon the format, i.e. format A could extract field X, Y, Z, but format B could extract field A, B, C. I also need to view Message B by searching for field C within the message.
Right now I'm configuring and storing the extraction strategy (XSLT) and executing it at runtime when its related format is encountered, but I'm storing the extracted data in an Oracle database as an XmlType column. Oracle seems to have pretty lax development/support for XmlType, as it requires an old jar that forces you to use a pretty old DOM DocumentBuilderFactory impl (looks like Java 1.4 code), which collides with Spring 3 and doesn't play very nicely with Hibernate. The XML queries are slow and non-intuitive as well.
I'm concluding that Oracle with XmlType isn't a very good way to store the extracted data, so my question is, what is the best way to store the serialized/queryable data?
NoSQL (Cassandra, CouchDB, MongoDB, etc.)?
A JCR like JackRabbit?
A blob with manual de/serialization?
Another Oracle solution?
Something else??
One alternative that you haven't listed is using an XML database. (Notice that Oracle is one of the ten or so XML database products.)
(Obviously, a blob type won't allow querying "inside" the persisted XML objects unless you read each blob instance into memory and do the querying there; e.g. using XSLT.)
I have had great success in storing complex xml objects in PostgreSQL. Together with the functional index features, you can even create indexes on node values of the stored xml files, and use those indexes to do very fast lookups using index scans without having to reparse the XML file.
This, however, will only work if you know your query patterns; arbitrary XPath queries will still be slow.
Example (untested):
Create a simple table:
create table test123 (
id serial primary key,
myxml xml
);
Now let's assume you have XML documents like:
<test>
<name>Peter</name>
<info>Peter is a <i>very</i> good cook</info>
</test>
Now create a functional index on the name node:
create index idx_test123_name on test123 (((xpath('/test/name/text()', myxml))[1]::text));
Now do your fast XML lookups:
SELECT myxml FROM test123 WHERE (xpath('/test/name/text()', myxml))[1]::text = 'Peter';
You should also consider creating an index using text_pattern_ops, so you can have fast prefix lookups like:
SELECT myxml FROM test123 WHERE (xpath('/test/name/text()', myxml))[1]::text like 'Pe%';

Truncating strings

I'm working with third party user data that may or may not fit into our database. The data needs to be truncated if it is too long.
We are using iBATIS with Connector/J. If the data is too long, a SQL exception is thrown. I have two choices: either truncate the strings in Java or truncate them in SQL using substring.
I don't like truncating the strings in SQL, because that means writing table structure into our iBATIS XML. On the other hand, SQL knows about our database collation (which isn't consistent and would be expensive to make consistent) and can truncate strings in a multibyte-safe manner.
Is there a way to have Connector/J simply insert the data and let the database truncate it? If not, which route would people recommend?
According to the MySQL documentation it's possible that inserting data that exceeds the length could be treated as a warning:
"Inserting a string into a string column (CHAR, VARCHAR, TEXT, or BLOB) that exceeds the column's maximum length. The value is truncated to the column's maximum length."
One of the Connector/J properties is jdbcCompliantTruncation. This is its description:
"This sets whether Connector/J should throw java.sql.DataTruncation exceptions when data is truncated. This is required by the JDBC specification when connected to a server that supports warnings (MySQL 4.1.0 and newer). This property has no effect if the server sql-mode includes STRICT_TRANS_TABLES. Note that if STRICT_TRANS_TABLES is not set, it will be set as a result of using this connection string option."
If I understand correctly then setting this property to false doesn't throw the exception but inserts the truncated data. This solution doesn't require you to truncate the data in program code or SQL statements, but delegates it to the database.
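The property can be set directly in the JDBC URL. The following is a minimal sketch in which the class name, host, port, and database name are placeholders. As the quoted description notes, the setting has no effect when the server's sql-mode already includes STRICT_TRANS_TABLES.

```java
public class TruncationConfig {

    // Builds a Connector/J URL that asks the driver not to raise
    // java.sql.DataTruncation when the server truncates a value.
    static String buildUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?jdbcCompliantTruncation=false";
    }

    public static void main(String[] args) {
        // Placeholder connection details for illustration only.
        System.out.println(buildUrl("localhost", 3306, "mydb"));
    }
}
```

The resulting string would be passed to DriverManager.getConnection or to the data source configuration used by the persistence layer.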
