We have an event system producing database events for change data capture.
The system sends an event which contains the INSERT or UPDATE statement with ? placeholders and an array of the ordered values matching each question mark.
I want to use this to write hourly backup files, so if I get a statement like:
insert into T0(a,b,c) VALUES(?,?,?)
with an array of values 1, 2 and it's his, then I write a line to the backup file for that hour as:
insert into T0(a,b,c) VALUES(1,2,'it''s his');
A few things:
Is it only strings that need escaping? We don't have or allow binary columns.
Is there a Java library that can do this already (from the Spring eco-system, Apache or otherwise)?
I've seen the Postgres JDBC code for escaping https://github.com/pgjdbc/pgjdbc/blob/master/pgjdbc/src/main/java/org/postgresql/core/Utils.java - is that sufficient?
I was also thinking of creating a SQLite database for each hour, writing to SQLite and then dumping it to the hr.sql text file. This has the advantage of capitalising on all the hard work and thought already put into SQLite's escaping, but it feels like overkill if there's a way to do the toString in Java and then append a line to the file.
There's also a performance cost to using SQLite, which adds to my hesitation about going that route.
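A minimal sketch of the append-a-line route, assuming each event carries a timestamp and that files are named per UTC hour. The yyyy-MM-dd'T'HH.sql naming and the HourlyBackupWriter class are hypothetical, not part of the event system. Each call opens the file in append mode, which is simple but not the fastest option at high event rates; also note that a string value containing a newline would make one statement span several lines.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class HourlyBackupWriter {
    // Hypothetical naming scheme: one file per UTC hour, e.g. 2024-05-01T13.sql
    private static final DateTimeFormatter HOUR_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH").withZone(ZoneOffset.UTC);

    /** Returns the backup file that an event timestamped at eventTime belongs to. */
    static Path fileForHour(Path backupDir, Instant eventTime) {
        return backupDir.resolve(HOUR_FORMAT.format(eventTime) + ".sql");
    }

    /** Appends one already-rendered statement as a single line, creating dir and file if needed. */
    static void appendStatement(Path backupDir, Instant eventTime, String renderedSql) throws IOException {
        Files.createDirectories(backupDir);
        Files.write(fileForHour(backupDir, eventTime),
                (renderedSql + System.lineSeparator()).getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```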
Found some options.
The Postgres JDBC driver does it here https://github.com/pgjdbc/pgjdbc/blob/master/pgjdbc/src/main/java/org/postgresql/core/Utils.java and another, even simpler, implementation https://github.com/p6spy/p6spy/blob/master/src/main/java/com/p6spy/engine/common/Value.java#L172 is literally doing
ESPECIAL_CHARACTER_PATTERN.matcher(stringValue).replaceAll("''")
Where private static final Pattern ESPECIAL_CHARACTER_PATTERN = Pattern.compile("'");
In both cases only strings need this, as I thought; binary values are handled separately, but we don't have or need binary.
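Putting those findings together, a sketch of the "toString" step: replace each ? with a literal, doubling single quotes in strings the way p6spy does. SqlLiteralRenderer is a hypothetical name. It assumes the statement text itself contains no literal ? characters (for example inside a quoted string or a comment), writes numbers and booleans bare, and quotes everything else via toString(). With standard SQL quoting this yields 'it''s his' rather than 'it\'s his'; on MySQL without the NO_BACKSLASH_ESCAPES mode, backslashes inside values would also need to be doubled.

```java
import java.util.List;

class SqlLiteralRenderer {
    /**
     * Replaces each '?' in the template with a SQL literal for the matching value.
     * Assumes the template contains no '?' inside string literals or comments.
     */
    static String render(String template, List<?> values) {
        StringBuilder out = new StringBuilder(template.length() + 16 * values.size());
        int next = 0;
        for (int i = 0; i < template.length(); i++) {
            char c = template.charAt(i);
            if (c == '?') {
                if (next >= values.size()) {
                    throw new IllegalArgumentException("More placeholders than values");
                }
                out.append(toLiteral(values.get(next++)));
            } else {
                out.append(c);
            }
        }
        if (next != values.size()) {
            throw new IllegalArgumentException("More values than placeholders");
        }
        return out.toString();
    }

    /** NULL for null, numbers and booleans bare, everything else as a quoted string. */
    static String toLiteral(Object value) {
        if (value == null) {
            return "NULL";
        }
        if (value instanceof Number || value instanceof Boolean) {
            return value.toString();
        }
        // Standard SQL escaping: double every single quote.
        return "'" + value.toString().replace("'", "''") + "'";
    }
}
```

Appending ";" to the result gives the one-statement-per-line format from the question.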
Digging further I rediscovered ESAPI https://github.com/ESAPI/esapi-java-legacy
They have a lib for escaping SQL https://cheatsheetseries.owasp.org/cheatsheets/SQL_Injection_Prevention_Cheat_Sheet.html#defense-option-4-escaping-all-user-supplied-input
https://github.com/ESAPI/esapi-java-legacy/blob/develop/src/main/java/org/owasp/esapi/codecs/MySQLCodec.java
Related
We have an application that runs with any of IBM Informix, MySQL and Oracle, and we are using Java with Hibernate to connect to the database. We will store XML, CSV and other text-based files inside the database (clob column). The entities in Java are byte[] objects.
One feature request for the application is now to "grep" inside the data, so I need to find all files containing specific content.
On regular char/varchar fields I can use LIKE '%xyz%', but this does not work on byte[] / blobs.
The first approach was to load each entity, convert the byte[] into a string and use Java's contains method. If the user enters filter parameters on other (non-clob) columns, I apply those filters before testing the clob, to reduce the number of blobs I have to scan.
That worked quite well for 100 files (clobs), as long as the application and database were on the same server. But I think it will get really slow with 1,000,000 files in the database, and the database is not always on the same network. So I don't think it is a good idea.
My next thought was to create a database procedure, but I am not sure whether this is possible for Informix, MySQL and Oracle.
The last, and least favored, option is to store the content somewhere other than a clob. Maybe I can use a different datatype for it?
Does anyone have a good idea how to realize this? I need a solution for all three DBMSs. The application knows which kind of DBMS it is connected to, so it would be okay to have three different solutions (one for each DBMS).
I am completely open to changing what kind of datatype I use (BLOB, CLOB ...) — I can modify that as I want.
Note: the clobs will range from about 5 KiB to about 500 KiB, with a maximum of 1 MiB.
Look into Apache Lucene or another text indexing library.
https://en.wikipedia.org/wiki/Lucene
http://en.wikipedia.org/wiki/Full_text_search
If you go with a DB specific solution like Oracle Text Search you will have to implement a custom solution for each database. I know from experience that Oracle Text search takes significant time to learn and involves a lot of tweaking to get just right.
Also, if you use a DB solution, you would get different results from each DB even if the data sets were the same (each DB has its own methods of indexing and retrieving the data).
By going with a third-party solution like Lucene, you only have to learn one solution, and results will be consistent regardless of the DB.
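To make the indexing idea concrete, here is a toy in-memory inverted index. It is not Lucene's API; TinyInvertedIndex is a hypothetical class that only illustrates the structure. Each document is split into lowercase terms once, at write time, and a search intersects term-to-id sets instead of scanning every clob. Unlike LIKE '%xyz%', it matches whole tokens, not arbitrary substrings; Lucene adds analyzers, persistent on-disk indexes, ranking and much more on top of the same basic idea.

```java
import java.util.*;

class TinyInvertedIndex {
    // term -> ids of the documents containing that term
    private final Map<String, Set<Long>> postings = new HashMap<>();

    /** Lowercases the text and splits it on anything that is not a letter or digit. */
    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** Indexes a document once, when it is stored. */
    void add(long docId, String content) {
        for (String term : tokenize(content)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    /** Returns ids of documents containing every term of the query (AND semantics). */
    Set<Long> search(String query) {
        Set<Long> result = null;
        for (String term : tokenize(query)) {
            Set<Long> ids = postings.getOrDefault(term, Collections.<Long>emptySet());
            if (result == null) {
                result = new TreeSet<>(ids);
            } else {
                result.retainAll(ids);
            }
        }
        return result == null ? Collections.<Long>emptySet() : result;
    }
}
```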
I have a table and a CSV file, and I want to update the table from the CSV.
The CSV file is as follows (no delta):
1,yes
2,no
3,yes
4,yes
Steps in Java:
Read the CSV file and build two lists, yesContainList and noContainList.
Add the id values that have yes and no to the respective list.
Join each list into a comma-separated string.
Update the table using the comma-separated string.
It works, but it is somewhat slow when handling lakhs (hundreds of thousands) of records.
Could anyone tell me whether this is the correct way, or whether there is a better way to do this update?
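For reference, the Java steps described in the question might look roughly like this sketch. CsvFlagSplitter is a hypothetical name, and the CSV is assumed to be plain "id,yes" / "id,no" lines with no quoting. One detail worth handling if the target is Oracle: an IN list may hold at most 1000 expressions (ORA-01795), so the ids are split into chunks of 1000, each used in a statement such as UPDATE my_table SET flag = 'Y' WHERE id IN (...), where my_table and flag are placeholder names.

```java
import java.util.*;

class CsvFlagSplitter {
    // Oracle rejects IN lists with more than 1000 expressions (ORA-01795).
    static final int MAX_IN_LIST = 1000;

    /** Parses "id,flag" lines into a map with "yes" and "no" id lists; blank lines are skipped. */
    static Map<String, List<Long>> partition(List<String> lines) {
        Map<String, List<Long>> byFlag = new HashMap<>();
        byFlag.put("yes", new ArrayList<>());
        byFlag.put("no", new ArrayList<>());
        for (String line : lines) {
            if (line.isBlank()) {
                continue;
            }
            String[] parts = line.split(",", 2);
            if (parts.length < 2) {
                throw new IllegalArgumentException("Expected id,flag but got: " + line);
            }
            List<Long> bucket = byFlag.get(parts[1].trim().toLowerCase(Locale.ROOT));
            if (bucket == null) {
                throw new IllegalArgumentException("Unexpected flag in line: " + line);
            }
            bucket.add(Long.parseLong(parts[0].trim()));
        }
        return byFlag;
    }

    /** Joins ids into comma-separated strings of at most MAX_IN_LIST ids each. */
    static List<String> inListChunks(List<Long> ids) {
        List<String> chunks = new ArrayList<>();
        for (int from = 0; from < ids.size(); from += MAX_IN_LIST) {
            StringJoiner joiner = new StringJoiner(",");
            for (Long id : ids.subList(from, Math.min(from + MAX_IN_LIST, ids.size()))) {
                joiner.add(id.toString());
            }
            chunks.add(joiner.toString());
        }
        return chunks;
    }
}
```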
There are 2 basic techniques to do this:
Use SQL*Loader (sqlldr).
Use an external table.
Both methods are explained here:
Update a column in table using SQL*Loader?
Bulk operations, imports, exports and other heavy SQL jobs are best not done outside the RDBMS, for performance reasons.
By fetching and sending large tables through ODBC-like APIs, you pay for network round trips, memory usage, I/O and more.
When designing a client-server application (like J2EE), would you design a heavy batch operation to be called and controlled synchronously from the user interface layer, or would you design a server-side process triggered by a client command?
Think about your java code as UI layer and RDBMS as server side.
BTW, RDBMSs have built-in features for these operations, like SQL*Loader in Oracle.
To avoid building SQL statements as strings in a class, I've placed them in .sql files in the same package and read each file's contents into a string in a static initializer. I do this because the SQL is very complex, due to the ERP system it queries.
This method works, but since the reading mechanism simply reads the whole file, removing excess whitespace and newlines, a comment at the end of a line can break the query: once the lines are joined, everything after the -- is commented out too. Fully commented lines (i.e. lines beginning with --) are removed.
I could enhance the simple reader to remove comments properly, but I wonder whether something already exists that can read an SQL file and clean it up.
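One way to do the cleanup by hand, as a sketch rather than a library: strip -- and /* */ comments only when they fall outside quoted text, and keep the line breaks so a trailing comment can no longer swallow the rest of the query. SqlResourceLoader is a hypothetical name. The quote tracking covers '...' and "..." but not vendor-specific forms such as PostgreSQL dollar quoting or MySQL # comments.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

class SqlResourceLoader {
    /** Reads a .sql resource located next to the anchor class and strips its comments. */
    static String load(Class<?> anchor, String resourceName) throws IOException {
        try (InputStream in = anchor.getResourceAsStream(resourceName)) {
            if (in == null) {
                throw new IOException("Resource not found: " + resourceName);
            }
            return stripComments(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }

    /**
     * Removes line comments and block comments that are outside quoted text,
     * keeps line breaks, trims trailing spaces and drops lines left blank.
     */
    static String stripComments(String sql) {
        StringBuilder out = new StringBuilder(sql.length());
        char quote = 0; // 0 = outside quotes, otherwise the quote char that opened the literal
        int i = 0;
        while (i < sql.length()) {
            char c = sql.charAt(i);
            char next = i + 1 < sql.length() ? sql.charAt(i + 1) : 0;
            if (quote != 0) {
                out.append(c);
                if (c == quote) {
                    quote = 0; // a doubled quote ('') simply reopens on the next char
                }
                i++;
            } else if (c == '\'' || c == '"') {
                quote = c;
                out.append(c);
                i++;
            } else if (c == '-' && next == '-') {
                while (i < sql.length() && sql.charAt(i) != '\n') {
                    i++; // skip to the newline, which is then kept
                }
            } else if (c == '/' && next == '*') {
                int end = sql.indexOf("*/", i + 2);
                i = end < 0 ? sql.length() : end + 2;
                out.append(' '); // keep tokens on either side of the comment apart
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString().lines()
                .map(String::stripTrailing)
                .filter(line -> !line.isBlank())
                .collect(Collectors.joining("\n"));
    }
}
```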
I've seen this same problem solved in a project I've worked on by storing queries in XML, and loading the XML into a custom StoredQueriesCache object at runtime. To get a query, we would call a method on the StoredQueriesCache object and just pass the query name (which is defined in the XML), and it would return the query.
Writing something like this is fairly simple. The XML would look something like this below...
<Query>
<Name>SomeUniqueQueryName</Name>
<SQL>
SELECT someColumn FROM someTable WHERE somePredicate
</SQL>
</Query>
You would have one element for every stored query. The XML would be loaded into memory from file at application startup or, depending on your needs, lazily loaded from file. Your StoredQueriesCache object that holds the XML would then have methods to return individual queries by name. In my experience, comments in the queries have never caused any issue, since line breaks are preserved in the XML node's inner text; if you want, though, the StoredQueriesCache methods that retrieve the queries could strip comments out.
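A minimal sketch of such a StoredQueriesCache using the JDK's built-in DOM parser. It assumes the Query elements are wrapped in a single root element (Queries here, a hypothetical name) and that every Query has exactly one Name and one SQL child. SQL containing < or & must be escaped or wrapped in a CDATA section to remain valid XML.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class StoredQueriesCache {
    private final Map<String, String> queries;

    private StoredQueriesCache(Map<String, String> queries) {
        this.queries = Collections.unmodifiableMap(queries);
    }

    /** Loads every Query element, keyed by its Name; fails fast on bad XML or duplicate names. */
    static StoredQueriesCache load(InputStream xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
            NodeList nodes = doc.getElementsByTagName("Query");
            Map<String, String> map = new HashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element query = (Element) nodes.item(i);
                String name = query.getElementsByTagName("Name").item(0).getTextContent().trim();
                String sql = query.getElementsByTagName("SQL").item(0).getTextContent().trim();
                if (map.put(name, sql) != null) {
                    throw new IllegalStateException("Duplicate query name: " + name);
                }
            }
            return new StoredQueriesCache(map);
        } catch (IOException | SAXException | ParserConfigurationException e) {
            throw new IllegalStateException("Could not load stored queries", e);
        }
    }

    /** Returns the SQL stored under the given name. */
    String get(String name) {
        String sql = queries.get(name);
        if (sql == null) {
            throw new IllegalArgumentException("Unknown query: " + name);
        }
        return sql;
    }
}
```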
I've found this to be the most organized way of storing queries without embedding them in code, and without using stored procedures. There should honestly be a library that does this for you; maybe I'll write one!
I'm quite new to Java programming and am writing my first desktop app. The app takes a unique ISBN and first checks whether it is already held in the local DB; if it is, it just reads from the local DB, and if not, it requests the data from isbndb.com and enters it into the local DB, which is in XML format. What I'm wondering is which of the following two methods would create the least overhead when checking whether the entry already exists.
Method 1.) File Exists.
When creating a DB entry, the app would create a separate file for every ISBN, named after the ISBN (i.e. 3846504937540.xml), and when checking it would use the file-exists method to see whether an entry already exists for the user-provided ISBN.
Method 2.) SAX XML Parser.
All entries would go into a single large XML file, and when checking for existing entries, the SAX XML parser would parse the file and compare the user-provided ISBN against those in the XML DB.
Note :
The resulting entries could number in the thousands over time.
Any information would be greatly appreciated.
I don't think either of your methods is all that great. I strongly suggest using a DBMS to store the data. If you don't have a DBMS on the system, or if you want an app that can run on systems without an installed DBMS, take a look at using SQLite. You can use it from Java with SQLiteJDBC by David Crawshaw.
As far as your two methods are concerned, the first will generate a huge amount of file clutter, not to mention maintenance and consistency headaches. The second method will be slow once you have a sizable number of entries, because you basically have to read (on average) half the database for every query. With a DBMS, you can avoid this by defining indexes on the info you need to look up quickly. The DBMS will automatically maintain the indexes.
I don't much like the idea of relying on the file system for this task: I don't know how critical your application is, but many things may happen to these XML files :) Plus, if the folder gets very big, you would need to split the files into some hierarchical folder structure to get decent performance.
On the other hand, I don't see the point of using an XML file as a database if you need to update it frequently.
I would use a relational database, and add a new record in a table for each entry, with an index on the isbn_number column.
If you are dealing with thousands of records, you may very well go with SQLite, and you can replace it with a more powerful non-embedded DB if you ever need to, with no (or little :) ) code modification.
I think you'd better use a DBMS instead of either of your two methods.
If you want the least overhead just for checking existence, then option 1 is probably what you want, since it's a direct lookup. Parsing the XML for each check requires you to pass through the whole XML file in the worst case. You can add caching to option 2, but that gets more complicated than option 1.
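A sketch of the caching variant of option 2: parse the file once with SAX at startup and keep the ISBNs in a HashSet, so each existence check is a hash lookup instead of a file scan. The book element with an isbn attribute and the IsbnCache name are hypothetical; new entries would have to be added to both the set and the file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class IsbnCache {
    /** Streams through the XML once and collects the isbn attribute of every book element. */
    static Set<String> loadIsbns(InputStream xml) {
        Set<String> isbns = new HashSet<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("book".equals(qName)) {
                    String isbn = attrs.getValue("isbn");
                    if (isbn != null) {
                        isbns.add(isbn);
                    }
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(xml, handler);
        } catch (IOException | SAXException | ParserConfigurationException e) {
            throw new IllegalStateException("Could not read XML store", e);
        }
        return isbns;
    }
}
```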
With option 1, though, beware that there is a limit on how many files you can store in a directory, so you will probably have to store the XML files in multiple layers (for example /xmldb/38/46/3846504937540.xml).
That said, neither option is a good way to store data in the long run; you will find them quite restrictive and hard to manage as the data grows.
People have already recommended using a DBMS, and I agree. On top of that, I would suggest looking into a document-based database like MongoDB.
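A sketch of the layered layout for option 1 (IsbnFileStore is a hypothetical name). One deliberate deviation from the example path: nearly all ISBN-13 values start with 978 or 979, so directories taken from the leading digits would fill very unevenly. This version therefore takes the two directory levels from the last four characters instead, so 3846504937540 maps to 75/40/3846504937540.xml.

```java
import java.nio.file.Files;
import java.nio.file.Path;

class IsbnFileStore {
    /** E.g. 3846504937540 -> root/75/40/3846504937540.xml (levels from the last four characters). */
    static Path pathFor(Path root, String isbn) {
        if (!isbn.matches("\\d{9}[\\dX]|\\d{13}")) {
            throw new IllegalArgumentException("Not an ISBN-10/13 without hyphens: " + isbn);
        }
        int n = isbn.length();
        return root.resolve(isbn.substring(n - 4, n - 2))
                   .resolve(isbn.substring(n - 2))
                   .resolve(isbn + ".xml");
    }

    /** Direct existence check; no file contents are read. */
    static boolean exists(Path root, String isbn) {
        return Files.exists(pathFor(root, isbn));
    }
}
```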
Extend your db table to not only include the XML string but also the ISBN number.
Then you select the XML column based on the ISBN column.
Query: Java escaped, "select XMLString from cacheTable where isbn='"+ isbn +"'"
A different approach could be to use an ORM like Hibernate.
In an ORM, instead of saving the whole XML document in one column, you use a different column for each element and attribute, and you could even split up your document over several tables for a simpler long-term design.
I am building an application at work and need some advice. I have a somewhat unique problem: I need to gather data housed in an MS SQL Server and transplant it to a MySQL server every 15 minutes.
I have done this previously in C# with a DataGrid, but now I am trying to build a Java version that I can run on an Ubuntu server, and I cannot find a similar model in Java.
Just to give a little background
When I pull the data from MS SQL Server, it always has 9 columns but can have anywhere from 0 to 1000 rows.
Rather than inserting into the MySQL server blindly, I manipulate some of the data first:
I convert a time column to CST based on a STATE column
I strip some characters to prevent SQL injection
I tried using the ResultSet, but I am having issues with the "forward only result sets" rules.
What would be the best data structure to hold that information, manipulate it, and then parse it to insert later into mySQL?
This sounds like a job for PreparedStatements!
Defined here: http://download.oracle.com/javase/6/docs/api/java/sql/PreparedStatement.html
Quick example: http://download.oracle.com/javase/tutorial/jdbc/basics/prepared.html
PreparedStatements allow you to batch up sets of data before pushing them into the target database. They also let you use PreparedStatement.setString, which binds the value as a parameter so you don't have to escape characters yourself.
For the time conversion, I would retrieve the STATE value from the row and then the time value. Before calling PreparedStatement.setTimestamp (or setDate, depending on the column type), convert the time to CST if necessary.
I don't think you would need all the overhead that an ORM tool requires.
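A sketch of that conversion using java.time, assuming the source times are wall-clock times local to the row's STATE and that "CST" means US Central time (America/Chicago, which switches to CDT in summer). The state-to-zone map and the StateTimeConverter name are hypothetical, and the map is deliberately partial; states that span more than one time zone would need a finer rule.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.util.Map;

class StateTimeConverter {
    // Hypothetical, deliberately partial mapping from STATE code to time zone.
    private static final Map<String, ZoneId> STATE_ZONES = Map.of(
            "NY", ZoneId.of("America/New_York"),
            "IL", ZoneId.of("America/Chicago"),
            "CO", ZoneId.of("America/Denver"),
            "AZ", ZoneId.of("America/Phoenix"),
            "CA", ZoneId.of("America/Los_Angeles"));

    // US Central time, including daylight saving (CDT) in summer.
    private static final ZoneId CENTRAL = ZoneId.of("America/Chicago");

    /** Reinterprets a wall-clock time from the given state as US Central wall-clock time. */
    static LocalDateTime toCentral(LocalDateTime localTime, String state) {
        ZoneId source = STATE_ZONES.get(state);
        if (source == null) {
            throw new IllegalArgumentException("No time zone known for state " + state);
        }
        return localTime.atZone(source).withZoneSameInstant(CENTRAL).toLocalDateTime();
    }
}
```

The result can then be bound with ps.setTimestamp(index, java.sql.Timestamp.valueOf(converted)) before ps.addBatch().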
You could consider using an ORM technology like Hibernate. This might seem a little heavyweight at first, but it means you can maintain the various table mappings for various databases with ease as well as having the power of Java's RegEx lib for any manipulation requirements.
So you'd have a Java class that represents the source table (with its Hibernate mapping) and another Java class that represents the target table and lastly a conversion utility class that does any manipulation of that data. Hibernate takes care of the CRUD SQL for you, so no need to worry about Database specific SQL (as long as you get the mapping correct).
It also lessens the SQL injection problem.