Java code to import CSV into Access


I posted the code below to the Sun developers forum since I thought it was erroring (the true error was before this code was even hit). One of the responses I got said it would not work and to throw it away. But it is actually working. It might not be the best code (I am new to Java) but is there something inherently "wrong" with it?
=============
CODE:
private static void ImportFromCsvToAccessTable(String mdbFilePath, String accessTableName,
        String csvDirPath, String csvFileName) throws ClassNotFoundException, SQLException {
    Connection msConn = getDestinationConnection(mdbFilePath);
    try {
        // Jet "make-table" query: Access reads the CSV through its Text driver and
        // creates accessTableName from it; Java only submits the statement.
        String strSQL = "SELECT * INTO " + accessTableName
                + " FROM [Text;HDR=YES;DATABASE=" + csvDirPath + ";].[" + csvFileName + "]";
        PreparedStatement selectPrepSt = msConn.prepareStatement(strSQL);
        try {
            boolean result = selectPrepSt.execute();
            System.out.println("result = " + result);
        } finally {
            selectPrepSt.close(); // release the statement as well as the connection
        }
    } catch (Exception e) {
        System.out.println(e);
    } finally {
        msConn.close();
    }
}

The literal answer is no - there is never anything "inherently wrong" with code, it's a matter of whether it meets the requirements - which may or may not include being maintainable, secure, robust or fast.
The code you are running is actually a JET query purely within Access - the Java code is doing nothing except telling Access to run the query.
On the one hand, if it ain't broke don't fix it. On the other hand, there's a good chance it will break in the near future so you could try fixing it in advance.
The two likely reasons it might break are:
SQL injection risk. Depending on where csvDirPath and csvFileName come from (e.g. csvFileName might come from the name of the file uploaded by a user?), and on how clever the Access JDBC driver is, you could be open to someone breaking or deleting your data by inserting a semicolon (or some brackets to make a subquery) and some additional SQL commands into the query.
You are relying on the columns of the CSV file being compatible with the columns of the Access table. If you have unchecked CSV being uploaded, or if the CSV generator has a particular way of handling nulls, or if you one day get an unusual date or number format, you may get an error on inserting into the Access table.
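For the first risk: PreparedStatement parameters can only stand in for values, not for table names or for the directory and file names inside the Jet Text connect string, so the query above cannot simply be parameterised. A minimal sketch of the usual fallback, checking the names against a whitelist before concatenating them, follows. The class name JetSqlGuard and the allowed-character patterns are hypothetical; tighten them to whatever names your application actually produces.

```java
import java.util.regex.Pattern;

final class JetSqlGuard {
    // Hypothetical whitelist: a letter followed by letters, digits or underscores.
    private static final Pattern TABLE_NAME = Pattern.compile("[A-Za-z][A-Za-z0-9_]{0,63}");
    // Hypothetical whitelist: a plain file name ending in .csv (no path separators, brackets, ';' or quotes).
    private static final Pattern CSV_FILE_NAME =
            Pattern.compile("[A-Za-z0-9_\\-]+\\.csv", Pattern.CASE_INSENSITIVE);

    private JetSqlGuard() {}

    static String requireTableName(String name) {
        if (name == null || !TABLE_NAME.matcher(name).matches()) {
            throw new IllegalArgumentException("Rejected table name: " + name);
        }
        return name;
    }

    static String requireCsvFileName(String name) {
        if (name == null || !CSV_FILE_NAME.matcher(name).matches()) {
            throw new IllegalArgumentException("Rejected CSV file name: " + name);
        }
        return name;
    }

    // The directory ends up inside the [Text;...;] connect string, so ';', '[' and ']' are refused.
    static String requireDirPath(String dir) {
        if (dir == null || dir.isEmpty() || dir.matches(".*[;\\[\\]].*")) {
            throw new IllegalArgumentException("Rejected directory: " + dir);
        }
        return dir;
    }

    // Same make-table query as the original method, built only from checked names.
    static String buildImportSql(String accessTableName, String csvDirPath, String csvFileName) {
        return "SELECT * INTO " + requireTableName(accessTableName)
                + " FROM [Text;HDR=YES;DATABASE=" + requireDirPath(csvDirPath) + ";].["
                + requireCsvFileName(csvFileName) + "]";
    }
}
```

Rejecting anything unexpected is deliberately blunt; for user uploads it is usually simpler to save the file under a generated name and never use the uploaded name in SQL at all.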
Having said all that, we are all about pragmatism here. If the above code is from a utility class which you are going to use by hand a few times a week/month/year/ever, then it isn't really a problem.
If it is a class which forms part of a web application, then the 'official' Java way to do it would be to read records out of the CSV file (either using a CSV parser or a CSV/text JDBC driver), get the columns out of the recordset, do some validation or sanity checking on them, and then use a new PreparedStatement to insert them into the Access database. Much more trouble but much more robust.
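A minimal sketch of that read-validate-insert approach follows. It assumes a hypothetical two-column CSV (id,name) with a header row and a hypothetical Access table Customers(CustomerID, CustomerName). The comma split is deliberately naive (it does not handle quoted fields or embedded commas), which is exactly why a proper CSV parser is the better choice in real code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class CsvToAccessImport {
    // One validated CSV row; the columns are hypothetical.
    static final class Row {
        final int id;
        final String name;

        Row(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    private CsvToAccessImport() {}

    // Reads "id,name" lines after a header row and rejects bad rows before anything touches
    // the database. The caller owns (and closes) the Reader.
    static List<Row> readAndValidate(Reader csv) {
        List<Row> rows = new ArrayList<>();
        BufferedReader in = new BufferedReader(csv);
        try {
            in.readLine(); // skip the header row
            int lineNo = 1;
            String line;
            while ((line = in.readLine()) != null) {
                lineNo++;
                if (line.trim().isEmpty()) {
                    continue;
                }
                String[] fields = line.split(",", -1); // naive split: no quoted fields
                if (fields.length != 2) {
                    throw new IllegalArgumentException("Line " + lineNo + ": expected 2 fields, got " + fields.length);
                }
                int id;
                try {
                    id = Integer.parseInt(fields[0].trim());
                } catch (NumberFormatException e) {
                    throw new IllegalArgumentException("Line " + lineNo + ": bad id '" + fields[0] + "'", e);
                }
                String name = fields[1].trim();
                if (name.isEmpty()) {
                    throw new IllegalArgumentException("Line " + lineNo + ": empty name");
                }
                rows.add(new Row(id, name));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    // Inserts the rows through one parameterised statement: values are bound, never concatenated.
    static void insert(Connection conn, List<Row> rows) throws SQLException {
        String sql = "INSERT INTO Customers (CustomerID, CustomerName) VALUES (?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (Row r : rows) {
                ps.setInt(1, r.id);
                ps.setString(2, r.name);
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }
}
```

If the driver in use does not support batch updates, calling executeUpdate() once per row inside the same loop works too.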
You can probably find a combination of tools (e.g. object-relational layers or other data access tools) which will do a lot of that for you, but setting up the tools is going to be as much hassle as writing the code. Then again, you'll learn a lot from either one.

One word of warning: JDBC -> Access queries (which are bridged through ODBC) do not work on 64-bit systems, or more precisely from 64-bit processes, as there is no 64-bit Access database driver. The driver is included in 32-bit copies of Windows and can only be accessed by 32-bit processes; you can run "odbcad32" or look at the ODBC control panel to see that the driver is present.
While I don't see the connection string in your code snippet, I am not aware of any noncommercial Access JDBC drivers for Java, only JDBC->ODBC bridging that relies on Windows having the Access (*.mdb) driver. Microsoft no longer supports this driver and has no plans to port it to 64-bit, so infrastructure-wise it is something to think about.

#david.w.fenton.myopenid.com: "Can you provide a citation about MS's plans to never introduce 64-bit ODBC drivers for Jet?"
David, I found a post on Microsoft's Connect Feedback about that.
http://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=125117
"At the moment there are no plans to ship a 64-bit version of JET driver by Office team. We may considere alternate options and will update you when we have a concrete plan."
Thanks,
SSIS team.
Posted by Microsoft on 10/3/2007 at 9:47 PM
There's been no update from Microsoft in that feedback thread.

Question to Joshua McKinnon:
Can you provide a citation about MS's plans to never introduce 64-bit ODBC drivers for Jet? This sounds reasonable, so I'm not doubting you at all, I would just like to know if you have a source for it that you can point to.
Surely MS is providing access to Jet on 64-bit systems through OLEDB, though, right? That doesn't help with JDBC, but certainly provides a method to use Jet data (they have to provide something, since Jet 4 is part of the OS, as it is used as the data store for Active Directory, and has been used thus since Windows 2000).

Related

Using an enum in Java as an alternative to a small database

I have a java application which uses a class "Country". The business logic associates each country in the world with:
An ISO code.
A telephone country calling code
A list of TZ database timezone ids suitable for that country
Normally, this data is perfect for a database. However, given that this data is quite stable (sure, it changes now and then, but it is normally not used to perform create/update/delete operations on those records), I was thinking about putting the information in an enumeration like this:
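public enum Country
{
    // TZ database IDs use forward slashes and underscores
    US(1, new String[]{"America/New_York", "America/Los_Angeles" /* , ... */}),
    // ...
    NL(31, new String[]{"Europe/Amsterdam"});
    // ...
    Country(int telephoneCode, String[] timezoneIds) // enum constructors cannot be public
    {
        // ...
    }
}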
Possible advantages to this approach:
Data is directly accessible from Java source code, so there is no delay from database queries.
No need to set up and maintain a database file somewhere (which is a big deal for me, because I'm not familiar with JDBC yet).
The disadvantages could be:
Data is "hidden" inside the source code and is not updated automatically if politics change.
Code must be redistributed if enums change. (See this question.)
How does the performance of "big" enums (more than 200 possible values) compare to database queries for getting specific information?
I would like to ask about the bolded question (disadvantage #3), but any comments regarding the bigger picture are welcome.
I am aware of the Locale class in Java, which can be used to get a list of all possible countries known to the Java system. Unfortunately, telephone and time zone information is not available. My intent is to merge this data into one enumeration. Is this practice considered a plausible alternative or a hacky anti-pattern?

I can only agree with the comments so far. A list of countries may change but it will always be small enough to keep the whole list in memory. Any difference in performance doesn't really matter if it is only queried once and then cached.
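On the performance question: with a couple of hundred constants, an index built once when the enum class is initialised turns every lookup into a single hash-map get, which costs far less than any database round trip. A minimal sketch, with two illustrative entries and a hypothetical byIsoCode method that uses the constant name as the ISO code:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

enum Country {
    // Illustrative entries only; the constant name doubles as the ISO 3166-1 alpha-2 code.
    US(1, "America/New_York", "America/Los_Angeles"),
    NL(31, "Europe/Amsterdam");

    private final int callingCode;
    private final List<String> timezoneIds;

    // Built once, when the enum class is initialised (after all constants exist).
    private static final Map<String, Country> BY_ISO_CODE;
    static {
        Map<String, Country> index = new HashMap<>();
        for (Country c : values()) {
            index.put(c.name(), c);
        }
        BY_ISO_CODE = Collections.unmodifiableMap(index);
    }

    Country(int callingCode, String... timezoneIds) {
        this.callingCode = callingCode;
        this.timezoneIds = List.of(timezoneIds);
    }

    int callingCode() { return callingCode; }

    List<String> timezoneIds() { return timezoneIds; }

    // Hypothetical lookup helper: returns null for unknown codes instead of throwing like valueOf.
    static Country byIsoCode(String isoCode) {
        return BY_ISO_CODE.get(isoCode);
    }
}
```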
It probably depends on who is using the software (personal use or a large company), but I would say the biggest concern here is #2, not #3. Whether the values live in code or in the database should be decided based on who might have to change them later. Are you really OK with needing a developer to change a value, recompile everything and roll out a new version or patch of the software? Just updating a text file or the database will be much easier.
If it's a personal project that argument probably doesn't matter for you, but even then a text file (e.g. in CSV format) might be easier to maintain than code. I would still recommend using a simple database, just for the learning experience. You don't need to set up a huge database system like MySQL; you could use a small embeddable database like H2 or Apache Derby.
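If the data moves to a text file, it can be read once at startup into an in-memory map and queried from there. A minimal sketch, assuming a hypothetical semicolon-separated format with one country per line (ISO code; calling code; time zones separated by |) and a hypothetical CountryData class; in a real application the Reader would come from a file or a classpath resource rather than a string.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CountryData {
    final String isoCode;
    final int callingCode;
    final List<String> timezoneIds;

    CountryData(String isoCode, int callingCode, List<String> timezoneIds) {
        this.isoCode = isoCode;
        this.callingCode = callingCode;
        this.timezoneIds = timezoneIds;
    }

    // Parses lines such as "NL;31;Europe/Amsterdam" into an unmodifiable map keyed by ISO code.
    // Blank lines and lines starting with '#' are skipped.
    static Map<String, CountryData> load(Reader source) {
        Map<String, CountryData> byIso = new HashMap<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty() || line.startsWith("#")) {
                    continue;
                }
                String[] parts = line.split(";");
                if (parts.length != 3) {
                    throw new IllegalArgumentException("Bad line: " + line);
                }
                List<String> zones = Collections.unmodifiableList(Arrays.asList(parts[2].split("\\|")));
                CountryData c = new CountryData(parts[0], Integer.parseInt(parts[1]), zones);
                byIso.put(c.isoCode, c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Collections.unmodifiableMap(byIso);
    }
}
```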
Just for reference how often country codes and names can change:
ISO 3166-1 (which defines the codes for existing countries) was changed 17 times in the years 2007-2014. That's about 2.4 times per year.

There is no problem creating an enum: generally speaking, the list of countries changes infrequently.
My suggestions:
use an enum
store the data in a DB and load the items during initialization (static cache)
use a hit-and-miss cache (JCS, EHCache, ...)
use a ResourceBundle
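The hit-and-miss idea can be sketched with the JDK alone: load an entry the first time it is requested, then serve it from memory. This is a plain ConcurrentHashMap stand-in, not the JCS or EHCache API; the LazyCache class and its loader function (for example a database lookup) are hypothetical. Misses are cached too, wrapped in Optional, because ConcurrentHashMap cannot store null values.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class LazyCache<K, V> {
    private final Map<K, Optional<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // e.g. a database lookup; may return null for "not found"

    LazyCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // The first call for a key is a miss and runs the loader; later calls are hits served from memory.
    V get(K key) {
        return entries.computeIfAbsent(key, k -> Optional.ofNullable(loader.apply(k))).orElse(null);
    }
}
```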

class file format limit(s) exceeded

I have a very long SQL statement; its size is more than 150,000 bytes. I tried to run it in the Derby editor.
When I run it, the editor throws this error:
class file format limit(s) exceeded: method1:e1 code length
(158045 > 65535) in generated class
Is there any way to increase this limit?
P.S. I don't want to split the code into several pieces, and I don't want to use a temporary table.

This has been a problem for Derby over the years. See, for example, DERBY-176, DERBY-732, DERBY-766, and DERBY-1714 (there are more like these).
In all the cases that I'm familiar with, it has been possible to rewrite the SQL that the application issues in order to bring it within a limit that the JVM can handle (this is more a Java limitation than a Derby limitation).
Are you able to share the SQL that generates this problem? Perhaps the community can suggest a better way to frame your SQL that doesn't generate such extensive bytecodes.

Unfortunately, there is no way to increase this limit. See the relevant part of the JVM spec: the code length of a single method must be less than 65536 bytes.
I would suggest that rather than hard-coding the SQL into the class, you read it in from a text-file either as a resource on the classpath, or in a known location on the filing-system.
If the Derby editor itself is generating the class, you may not be able to use the Derby editor with this particular query. It might be wise to rethink the query. I would have thought there might be a better way to write the query. It can't be very maintainable at that length!

There is no way to increase this limit, so I used SQLite instead of Derby.
It can run as an in-memory database too, and its default limit on SQL statement length is at least 1,000,000 bytes. You can also raise that limit above 1,000,000,000 bytes at compile time.

Resource considerations for a Java program using an SQL DB

I'm fairly new to programming, at least when it comes to anything substantial. I am about to start work on management software for my employer which draws its data from, and stores its data to, an SQL database. I will likely be using JDBC to interact with it.
To try and accurately describe the problem, I am going to focus on a very small portion of the program. In the database, there is a table that stores Job records; there are a couple of thousand of them. I want to display all available Jobs (as a text reference from the table) in a scrollable panel in the program, with a search function.
So, my question is: should I create Job objects from each record in one go and have the program work with the objects to display them, or should I simply display strings taken directly from the records? The first method would mean that other details of each job are stored in advance, so that when I open a record in the UI the load times should be minimal; however, it also sounds like it would take a great deal of resources when it initially populates the panel and generates the objects. The second method would mean issuing a large number of queries to the database, which might avoid the initial resource overhead, but I don't want to put too much strain on the SQL Server because other in-house software relies on it.
Really, I don't know anything about how I should be doing this, but that is my question. Apologies if I am displaying my ignorance in this post, and thank you in advance for any help you can offer.

"A couple thousand" is a very small number for modern computers. If you have any sort of logic to perform on these records (they're not all modified solely via stored procedures), you're going to have a much easier time using an object-relational mapping (ORM) tool like Hibernate. Look into the JPA specification, which allows you to create Java classes that represent database objects and then simply annotate them to describe how they're stored in the database. Using an ORM system like this does have some overhead, but it's nearly always worthwhile, since computers are fast and programmers are expensive.
Note: This is a specific example of the rule that you should do things in the clearest and easiest-to-understand way unless you have a very specific reason not to, and in particular that you shouldn't optimize for speed unless you've measured your program's performance and have determined that a specific section of the code is causing problems. Use the abstractions that make the code easy to understand and come back later if you actually have to speed things up.
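The answer recommends JPA/Hibernate; for comparison, here is what the "load everything once" option looks like in plain JDBC, with the search box filtering the in-memory list instead of querying the server. A minimal sketch only: the JobCatalog and Job classes and the Job/JobID/Reference table and column names are hypothetical, and an ORM would generate most of the loading code for you.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class JobCatalog {
    // Minimal value object; add more columns as the UI needs them.
    static final class Job {
        final int id;
        final String reference;

        Job(int id, String reference) {
            this.id = id;
            this.reference = reference;
        }
    }

    private JobCatalog() {}

    // One query for the whole list: a couple of thousand rows is a single, cheap round trip.
    static List<Job> loadAll(Connection conn) throws SQLException {
        List<Job> jobs = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement("SELECT JobID, Reference FROM Job ORDER BY Reference");
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                jobs.add(new Job(rs.getInt("JobID"), rs.getString("Reference")));
            }
        }
        return jobs;
    }

    // Case-insensitive substring search over the already-loaded jobs.
    static List<Job> search(List<Job> jobs, String text) {
        String needle = text.toLowerCase(Locale.ROOT);
        List<Job> hits = new ArrayList<>();
        for (Job j : jobs) {
            if (j.reference != null && j.reference.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(j);
            }
        }
        return hits;
    }
}
```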

Parsing IBM 3270 data in java

I was wondering if anyone had experience retrieving data with the 3270 protocol. My understanding so far is:
Connection
I need to connect to an SNA server using telnet, issue a command and then some data will be returned. I'm not sure how this connection is made since I've read that a standard telnet connection won't work. I've also read that IBM have a library to help but not got as far as finding out any more about it.
Parsing
I had assumed that the data being returned would be a string of 1920 characters, since the 3278 screen was 80x24 characters. I would simply need to parse these characters into the appropriate fields. The more I read about the 3270 protocol, the less this seems to be the case: I read in the documentation provided with a trial of the Jagacy 3270 Java library that attributes were marked in the protocol with the char 'A' before the attribute, and my understanding is that there are more chars denoting other factors, such as whether fields are editable.
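Under the simplified model described above (the screen already decoded into 1920 plain characters, 24 rows of 80), extracting a field is just offset arithmetic on row and column. A minimal sketch with hypothetical names; note that the raw 3270 data stream carries orders, buffer addresses and field attribute bytes, so something (typically a TN3270 client library) has to turn it into a plain screen image like this first.

```java
final class ScreenText {
    static final int ROWS = 24;
    static final int COLS = 80;

    private ScreenText() {}

    // Returns the trimmed text of one field, addressed by 1-based row and column as on the terminal.
    static String field(String screen, int row, int col, int length) {
        if (screen.length() != ROWS * COLS) {
            throw new IllegalArgumentException("Expected " + (ROWS * COLS) + " chars, got " + screen.length());
        }
        if (row < 1 || row > ROWS || col < 1 || length < 0 || col + length - 1 > COLS) {
            throw new IllegalArgumentException("Field falls outside the screen");
        }
        int start = (row - 1) * COLS + (col - 1);
        return screen.substring(start, start + length).trim();
    }

    // Convenience: the whole of one screen row.
    static String row(String screen, int row) {
        return field(screen, row, 1, COLS);
    }
}
```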
I'm reasonably sure my thinking has been too simplistic. Take an example like a screen containing a list of items - pressing a special key on one of the 24 visible rows drills down into more detailed information regarding that row.
It's also been suggested to me that print commands can be issued. This has some positive implications: if the string returned is not 1920 characters long because it contains characters such as 'A' denoting how users interact with the terminal, printing would eradicate these. It would also avoid having to page through lots of data. The flip side is that I wouldn't know how to retrieve the data from the print command back into Java.
So..
I currently don't have access to the SNA server but have some screen shots of what the terminal will look like once I get a connection and was therefore going to start work on parsing. With so many assumptions and not a lot of idea on what the data will look like I feel really stumped. Does anyone have any knowledge of these systems that might help me back on track?

You've picked a ripper of a problem there. 3270 is a very complex protocol indeed. I wouldn't bother about trying to implement it, it's a fool's errand, and I'm speaking from painful personal experience. Try to find a TN3270 (Telnet 3270) client API.

This might not specifically answer your question, but...
If you are using Rational Developer for z/OS, your Java code should be able to use the integrated HATS product to deal with the 3270 stream. It might not fit your project, but I thought I would mention it: if all you are trying to do is some simple screen scraping, it makes things very easy.

MS Access - Can't Open Any More Tables

At work we have to deal with several MS Access mdb files, so we use the default JdbcOdbcBridge driver which comes with the Sun JVM, and for most cases it works great.
The problem is that when we have to deal with some larger files, we frequently get exceptions with the message "Can't open any more tables". How can we avoid that?
We already close all our instances of PreparedStatement and ResultSet, and even set their variables to null, but the exception still happens. What should we do? How can we avoid these nasty exceptions? Does anyone here know how?
Is there any additional configuration to the ODBC drivers on Windows that we can change to avoid this problem?

"Can't open any more tables" is a better error message than the "Can't open any more databases," which is more commonly encountered in my experience. In fact, that latter message is almost always masking the former.
The Jet 4 database engine has a limit of 2048 table handles. It's not entirely clear to me whether this is simultaneous or cumulative within the life of a connection. I've always assumed it is cumulative, since opening fewer recordsets at a time in practice seems to make it possible to avoid the problem.
The issue is that "table handles" doesn't just refer to table handles, but to something much more.
Consider a saved QueryDef with this SQL:
SELECT tblInventory.* From tblInventory;
Running that QueryDef uses TWO table handles.
What? you might ask. It only uses one table! But Jet uses a table handle for the table and a table handle for the saved QueryDef.
Thus, if you have a QueryDef like this:
SELECT qryInventory.InventoryID, qryAuthor.AuthorName
FROM qryInventory INNER JOIN qryAuthor ON qryInventory.AuthorID = qryAuthor.AuthorID
...if each of your source queries has two tables in it, you're using these table handles, one for each:
Table 1 in qryInventory
Table 2 in qryInventory
qryInventory
Table 1 in qryAuthor
Table 2 in qryAuthor
qryAuthor
the top-level QueryDef
So, you might think you have only four tables involved (because there are only four base tables), but you'll actually be using 7 table handles in order to use those 4 base tables.
If in a recordset, you then use the saved QueryDef that uses 7 table handles, you've used up yet another table handle, for a total of 8.
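The counting rule described above can be written as a small recursive model: a base table costs one handle, and a saved QueryDef costs one handle plus the handles of its sources. A toy sketch with hypothetical class names; it reproduces the arithmetic in this answer, not Jet's internal bookkeeping.

```java
import java.util.Arrays;
import java.util.List;

final class HandleCount {
    interface Source {
        int handles();
    }

    // A base table costs a single handle.
    static final class Table implements Source {
        public int handles() {
            return 1;
        }
    }

    // A saved QueryDef costs one handle for itself plus whatever its sources cost.
    static final class SavedQuery implements Source {
        private final List<Source> sources;

        SavedQuery(Source... sources) {
            this.sources = Arrays.asList(sources);
        }

        public int handles() {
            int total = 1; // the QueryDef itself
            for (Source s : sources) {
                total += s.handles();
            }
            return total;
        }
    }
}
```

Each two-table source query costs 3, so the top-level QueryDef costs 1 + 3 + 3 = 7, and a recordset opened on it adds one more for 8.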
Back in the Jet 3.5 days, the original table handles limitation was 1024, and I bumped up against it on a deadline when I replicated the data file after designing a working app. The problem was that some of the replication tables are open at all times (perhaps for each recordset?), and that used up just enough more table handles to put the app over the top.
In the original design of that app, I was opening a bunch of heavyweight forms with lots of subforms and combo boxes and listboxes, and at that time I used a lot of saved QueryDefs to preassemble standard recordsets that I'd use in many places (just like you would with views on any server database). What fixed the problem was:
loading the subforms only when they were displayed.
loading the rowsources of the combo boxes and listboxes only when they were onscreen.
getting rid of all the saved QueryDefs and using SQL statements that joined the raw tables, wherever possible.
This allowed me to deploy that app in the London office only one week later than planned. When Jet SP2 came out, it doubled the number of table handles, which is what we still have in Jet 4 (and, I presume, the ACE).
In terms of using Jet from Java via ODBC, the key point would be, I think:
use a single connection throughout your app, rather than opening and closing them as needed (which leaves you in danger of failing to close them).
open recordsets only when you need them, and clean up and release their resources when you are done.
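In Java, the two points above map onto one long-lived Connection plus try-with-resources around each Statement and ResultSet, so closing never depends on garbage collection. A minimal sketch; the JetSession class and firstColumn method are hypothetical names, and whether this alone cures the error depends on the bridge driver actually releasing the Jet handles when close() is called.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class JetSession implements AutoCloseable {
    private final Connection conn; // one connection for the life of the application

    JetSession(Connection conn) {
        this.conn = conn;
    }

    // Opens the statement and result set only for the duration of the query and closes
    // both (result set first) when the block exits, even if an exception is thrown.
    List<String> firstColumn(String sql) throws SQLException {
        List<String> values = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(sql);
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                values.add(rs.getString(1));
            }
        }
        return values;
    }

    // Closed once, when the application shuts down.
    @Override
    public void close() throws SQLException {
        conn.close();
    }
}
```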
Now, it could be that there are memory leaks somewhere in the JDBC=>ODBC=>Jet chain where you think you are releasing resources and they aren't getting released at all. I don't have any advice specific to JDBC (as I don't use it -- I'm an Access programmer, after all), but in VBA we have to be careful about explicitly closing our objects and releasing their memory structures because VBA uses reference counting, and sometimes it doesn't know that a reference to an object has been released, so it doesn't release the memory for that object when it goes out of scope.
So, in VBA code, any time you do this:
Dim db As DAO.Database
Dim rs As DAO.Recordset
Set db = DBEngine(0).OpenDatabase("[database path/name]")
Set rs = db.OpenRecordset("[SQL String]")
...after you've done what you need to do, you have to finish with this:
rs.Close ' closes the recordset
Set rs = Nothing ' clears the pointer to the memory formerly used by it
db.Close
Set db = Nothing
...and that's even if your declared variables go out of scope immediately after that code (which should release all the memory used by them, but doesn't do so 100% reliably).
Now, I'm not saying this is what you do in Java, but I'm simply suggesting that if you're having problems and you think you're releasing all your resources, perhaps you need to determine if you're depending on garbage collection to do so and instead need to do so explicitly.
Forgive me if I'd said anything that's stupid in regard to Java and JDBC -- I'm just reporting some of the problems that Access developers have had in interacting with Jet (via DAO, not ODBC) that report the same error message that you're getting, in the hope that our experience and practice might suggest a solution for your particular programming environment.

Recently I tried UCanAccess, a pure Java JDBC driver for MS Access. Check out: http://sourceforge.net/projects/ucanaccess/ (it works on Linux too ;-). Loading the required libraries takes some time. I have not tested it for more than read-only purposes yet.
Anyway, I experienced problems as described above with the sun.jdbc.odbc.JdbcOdbcDriver. After adding close() statements following creation of statement objects (and calls to executeUpdate on those) as well as System.gc() statements, the error messages stopped ;-)
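For reference, opening a UCanAccess connection is plain JDBC with a jdbc:ucanaccess:// URL pointing at the .mdb or .accdb file. A minimal sketch: the UcanaccessConnect class is a hypothetical helper, and the UCanAccess jar and its dependencies must be on the classpath. If your version does not register itself with DriverManager, load net.ucanaccess.jdbc.UcanaccessDriver with Class.forName first.

```java
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

final class UcanaccessConnect {
    private UcanaccessConnect() {}

    // Builds a URL of the form jdbc:ucanaccess://<absolute path to the database file>.
    static String url(Path databaseFile) {
        return "jdbc:ucanaccess://" + databaseFile.toAbsolutePath();
    }

    // No ODBC data source and no 32-bit restriction: the driver reads the file directly.
    static Connection open(Path databaseFile) throws SQLException {
        return DriverManager.getConnection(url(databaseFile));
    }
}
```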

There's an outside chance that you're simply running out of free network connections. We had this problem on a busy system at work.
Something to note is that network connections, though closed, may not release the socket until garbage collection time. You could check this with NETSTAT /A /N /P TCP. If you have a lot of connections in the TIME_WAIT state, you could try forcing a garbage collection on connection closes or perhaps regular intervals.

You should also close your Connection object.
Looking into an alternative to the JDBC-ODBC driver would also be a good idea. I don't have any experience with an alternative myself, but this would be a good place to start:
Is there an alternative to using sun.jdbc.odbc.JdbcOdbcDriver?

I had the same problem, but none of the above worked. I eventually located the issue.
I was using this to read the value of a form to put back into a lookup list record source.
LocationCode = [Forms]![Support].[LocationCode].Column(2)
ContactCode = Forms("Support")("TakenFrom")
Changed it to the below and it works.
LocationCode = Forms("Support")("LocationCode")
ContactCode = Forms("Support")("TakenFrom")
I know I should have written it better but I hope this helps someone else in the same situation.
Thanks
Greg
