Is there a simple Java library or approach that will take a SQL query and load data from a CSV file into an Oracle database? Please help.
You don't have to use Java to load a data file into a table unless it is absolutely necessary. Instead, I'd recommend Oracle's command-line SQL*Loader utility, which was designed specifically for this purpose.
For similar tasks I usually use Groovy scripts, as they are really easy and quick to write and run on the JVM, of course.
...an example:
import groovy.sql.Sql
def file1 = new File(/C:\Documents and Settings\USER\Desktop\Book1.csv/)
def reader = new FileReader(file1)
def sql = Sql.newInstance("jdbc:oracle:thin:@XXXXXX:XXXX:XXX", "SCHEMA",
"USER", "oracle.jdbc.driver.OracleDriver")
reader.each { line ->
fields = line.split(';')
sql.executeInsert("insert into YOUR_TABLE values(${fields[0]},${fields[1]},${fields[2]})")
}
It's a basic example. If you have double quotes and semicolons inside the values of your CSV, you will probably want to use something like OpenCSV to handle that.
You could transform each line of the CSV into an INSERT statement using regular expressions, and then send it to Oracle over JDBC.
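A minimal Java sketch of that idea, under some assumptions: the class and method names are made up for illustration, the delimiter is passed in by the caller, fields are bound as strings, and quoted fields that contain the delimiter are not handled. Rather than pasting the values into the SQL text, it binds them through a PreparedStatement and sends the rows as one JDBC batch, which sidesteps quoting problems.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class CsvInsertSketch {

    // Builds "INSERT INTO <table> VALUES (?, ?, ...)" with one placeholder per column.
    // The table name is concatenated as-is, so it must come from trusted code.
    static String buildInsertSql(String table, int columnCount) {
        String placeholders = String.join(", ", Collections.nCopies(columnCount, "?"));
        return "INSERT INTO " + table + " VALUES (" + placeholders + ")";
    }

    // Splits one line on the literal delimiter, keeping trailing empty fields.
    // Assumption: no quoted fields that contain the delimiter.
    static String[] splitLine(String line, String delimiter) {
        return line.split(Pattern.quote(delimiter), -1);
    }

    // Binds every field as a string and sends all rows in a single JDBC batch.
    // The column count is taken from the first line; missing fields become NULL.
    static void insertAll(Connection conn, String table, List<String> lines, String delimiter)
            throws SQLException {
        if (lines.isEmpty()) {
            return;
        }
        int columnCount = splitLine(lines.get(0), delimiter).length;
        try (PreparedStatement ps = conn.prepareStatement(buildInsertSql(table, columnCount))) {
            for (String line : lines) {
                String[] fields = splitLine(line, delimiter);
                for (int i = 0; i < columnCount; i++) {
                    ps.setString(i + 1, i < fields.length ? fields[i] : null);
                }
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }
}
```

With an open Connection, a call would look like CsvInsertSketch.insertAll(conn, "YOUR_TABLE", lines, ";"). For large files, SQL*Loader or an external table will most likely still be faster.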
I think this tool will help you for any type of database import-export problem.
http://www.dmc-fr.com/home_en.php
Do you have that CSV in a file on the database server or can you store it there? Then you may try to have Oracle open it by declaring a DIRECTORY object for the path the file is in and then create an EXTERNAL TABLE which you can query in SQL afterwards. Oracle does the parsing of the file for you.
If you are open to Python, you can do a bulk load using SQL*Loader:
import os
from subprocess import Popen, PIPE
# userid, datafile, ctlfile and fn are placeholders that must be defined beforehand
loadConf=('sqlldr userid=%s DATA=%s control=%s LOG=%s.log BAD=%s.bad DISCARD=%s.dsc' % (userid,datafile, ctlfile,fn,fn,fn)).split(' ')
p = Popen(loadConf, stdin=PIPE, stdout=PIPE, stderr=PIPE, shell=False, env=os.environ)
output, err = p.communicate()
It will be much faster than row-by-row inserts.
I uploaded a basic working example here.
I'm trying to create a Model in Jena that won't load the entire data into memory but will instead read from the filesystem.
I found a whole lot of available configurations, but they all seem to be in-memory (for example on OntModelSpec).
Use Apache Jena TDB - see documentation here.
TDB stores your dataset on disk, but accesses it very efficiently: you shouldn't experience any real performance difference over an in-memory model.
Typically, if I'm dealing with a large model or dataset I work like this:
Load model on commandline:
# /tmp/DB is where TDB will store the indexed model
$ tdbloader2 --loc /tmp/DB file.nt
(use tdbloader on Windows)
(Optional) Try a query:
$ tdbquery --loc /tmp/DB @query.sparql
Access like any old model from java:
Dataset dataset = TDBFactory.createDataset("/tmp/DB") ;
Model model = dataset.getDefaultModel() ;
... continue as before ...
You can create your own implementation of org.apache.jena.graph.Graph that does not keep the data in memory.
An example is D2RQ, where de.fuberlin.wiwiss.d2rq.jena.GraphD2RQ works against databases, but it is based on an outdated version of Jena.
Hi, I'm using Java with Solr as the search engine, and I need to export data in CSV format. These are the steps I follow:
-- Based on the input parameters, I build a query, say Q.
-- I then create a Java Process p to execute that query.
-- I need to write the response I get from the Solr server to a CSV file at a given location.
String Q = "http://pcam-stg-app-03:9999/solr/brm-royalty/select?q=VENDOR_NAME:\"SOME_NAME\"&f1=vendor_name,vendor_id&wt=csv";
Process p = Runtime.getRuntime().exec(Q);
p.getInputStream();
I need to write this stream to a CSV file. Any ideas?
Beware of just paging through the query to get all results...that is an antipattern. If the query returned many docs, and you want all of them, you need to use cursorMark.
A combination of the CSV response writer and SolrJ to page through all of the results, sending each line to something like Apache Commons IO FileUtils:
FileUtils.writeStringToFile(new File("output.csv"), outputLine + System.getProperty("line.separator"), true);
would be quite quick to knock up in Java.
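For the simpler case where one request returns everything you need, the response body can be copied straight to a file using only the JDK. This is a rough sketch, not a tested Solr client: the class and method names are invented, it assumes the select URL is already percent-encoded (the quotes around the vendor name would have to be escaped), and it reads the URL over HTTP instead of handing it to Runtime.exec, which starts operating-system programs and does not fetch URLs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Solr or SolrJ.
final class SolrCsvExportSketch {

    // Copies the whole stream to the target file, replacing any existing file.
    // Returns the number of bytes written.
    static long saveStream(InputStream in, Path target) throws IOException {
        return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }

    // Opens the (already percent-encoded) URL and saves the response body.
    static long export(String encodedUrl, Path target) throws IOException {
        try (InputStream in = URI.create(encodedUrl).toURL().openStream()) {
            return saveStream(in, target);
        }
    }
}
```

A call would look like SolrCsvExportSketch.export(url, Paths.get("output.csv")). For result sets too large for one request, the cursorMark paging mentioned above still applies.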
Is it possible to get URLs into Nutch directly from a database, a service, etc.? I'm not interested in approaches where the data is taken from the database or service and written to seed.txt.
No. This cannot be done directly with the default Nutch codebase. You need to modify Injector.java to achieve that.
EDIT:
Try using DBInputFormat, an InputFormat that reads input data from an SQL table. You need to modify the Inject code here (line 3 in the snippet below):
JobConf sortJob = new NutchJob(getConf());
sortJob.setJobName("inject " + urlDir);
FileInputFormat.addInputPath(sortJob, urlDir);
sortJob.setMapperClass(InjectMapper.class);
I have the following feeds from my vendor,
http://scores.cricandcric.com/cricket/getFeed?key=4333433434343&format=xml&tagsformat=long&type=schedule
I want to get the data from those XML files as Java objects so that I can insert it into my database regularly.
The data consists of regular updates from the vendor, which I use to update my website.
Can you please suggest what options are available to get this working?
Should I use web services or just XStream to get my final output? Please advise, as I am a newcomer to this.
The vendor says he can give me the data in three formats: RSS, XML, or JSON. I am not sure which is the easiest and cheapest to consume.
I would suggest just writing a program that parses the XML and inserts the data directly into your database.
Example
This Groovy script inserts data into an H2 database.
//
// Dependencies
// ============
import groovy.sql.Sql
@Grapes([
@Grab(group='com.h2database', module='h2', version='1.3.163'),
@GrabConfig(systemClassLoader=true)
])
//
// Main program
// ============
def sql = Sql.newInstance("jdbc:h2:db/cricket", "user", "pass", "org.h2.Driver")
def dataUrl = new URL("http://scores.cricandcric.com/cricket/getFeed?key=4333433434343&format=xml&tagsformat=long&type=schedule")
dataUrl.withReader { reader ->
def feeds = new XmlSlurper().parse(reader)
feeds.matches.match.each {
def data = [
it.id,
it.name,
it.type,
it.tournamentId,
it.location,
it.date,
it.GMTTime,
it.localTime,
it.description,
it.team1,
it.team2,
it.teamId1,
it.teamId2,
it.tournamentName,
it.logo
].collect {
it.text()
}
sql.execute("INSERT INTO matches (id,name,type,tournamentId,location,date,GMTTime,localTime,description,team1,team2,teamId1,teamId2,tournamentName,logo) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)", data)
}
}
Well... you could use an XML parser (stream or DOM), or a JSON parser (again stream or 'DOM'), and build the objects on the fly. But with this data, which seems to consist of records of cricket matches, why not go with a CSV format?
This seems to be your basic 'datum':
<id>1263</id>
<name>Australia v India 3rd Test at Perth - Jan 13-17, 2012</name>
<type>TestMatch</type>
<tournamentId>137</tournamentId>
<location>Perth</location>
<date>2012-01-14</date>
<GMTTime>02:30:00</GMTTime>
<localTime>10:30:00</localTime>
<description>3rd Test day 2</description>
<team1>Australia</team1>
<team2>India</team2>
<teamId1>7</teamId1>
<teamId2>1</teamId2>
<tournamentName>India tour of Australia 2011-12</tournamentName>
<logo>/cricket/137/tournament.png</logo>
Of course you would still have to parse a csv, and deal with character delimiting (such as when you have a ' or a " in a string), but it will reduce your network traffic quite substantially, and likely parse much faster on the client. Of course, this depends on what your client is.
Actually, you have a RESTful source that can return data in several formats; you only need to read from it, and no further interaction is needed.
So you can use any XML parser to parse the XML data and put the extracted values into whatever data structure you want or already have.
I have not heard of XStream, but you can find more information about selecting the best parser for your situation at this StackOverflow question.
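As a rough illustration of the "any XML parser" route, here is a sketch using the DOM parser that ships with the JDK. Some assumptions: the Match class and its choice of four fields are invented for this example, and each record is assumed to sit inside a match element, as the Groovy script above implies. Mapping the remaining fields would follow the same pattern.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the vendor's API.
final class MatchFeedSketch {

    // Plain holder for a few of the feed's fields (illustrative subset).
    static final class Match {
        final String id;
        final String name;
        final String location;
        final String date;

        Match(String id, String name, String location, String date) {
            this.id = id;
            this.name = name;
            this.location = location;
            this.date = date;
        }
    }

    // Parses the feed text and returns one Match per <match> element.
    static List<Match> parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Restrict the parser, since the feed comes from an external party.
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = doc.getElementsByTagName("match");
        List<Match> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            result.add(new Match(text(e, "id"), text(e, "name"),
                    text(e, "location"), text(e, "date")));
        }
        return result;
    }

    // Returns the trimmed text of the first descendant with this tag, or "" if absent.
    private static String text(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? "" : list.item(0).getTextContent().trim();
    }
}
```

The resulting list can then be written to the database with ordinary JDBC inserts, one row per Match.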
In Java, I have written a program that reads a UTF8 text file. The text file contains a SQL query of the SELECT kind. The program then executes the query on the Microsoft Access 2007 database and writes all fields of the first row to a UTF8 text file.
The problem I have is when a row is returned that contains unicode characters, such as "♪". These characters show up as "?" in the text file.
I know that the text files are read and written correctly, because a dummy UTF8 character ("◎") is read from the text file containing the SQL query and written to the text file containing the resulting row. The UTF8 character looks correct when the written text file is opened in Notepad, so the reading and writing of the text files are not part of the problem.
This is how I connect to the database and how I execute the SQL query:
Connection c = DriverManager.getConnection("jdbc:odbc:Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=C:/database.accdb;Pwd=temp");
ResultSet r = c.createStatement().executeQuery(sql);
I have tried passing a charSet property to the connection, but it makes no difference:
Properties p = new Properties();
p.put("charSet", "utf-8");
p.put("lc_ctype", "utf-8");
p.put("encoding", "utf-8");
Connection c = DriverManager.getConnection("...", p);
Tried with "utf8"/"UTF8"/"UTF-8", no difference. If I enter "UTF-16" I get the following exception:
java.lang.IllegalArgumentException: Illegal replacement
Been searching around for hours with no results and now turn my hope to you. Please help!
I also accept workaround suggestions. =) What I want to be able to do is to make a Unicode query (for example one that searches for posts that contain the "あ" character) and to have results with Unicode characters received and saved correctly.
Thank you!
Update. Here is a self-contained example of the issue:
package test;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.util.Properties;
public class Standalone {
public static void main(String[] args) {
try {
Properties p = new Properties();
p.put("charSet", "UTF8");
Connection c = DriverManager.getConnection("jdbc:odbc:Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=./dummy.accdb;Pwd=pass", p);
ResultSet r = c.createStatement().executeQuery("SELECT TOP 1 * FROM main;");
r.next();
OutputStreamWriter osw = new OutputStreamWriter(new FileOutputStream(new File("results.txt")), Charset.forName("UTF-8"));
osw.write(new BufferedReader(new InputStreamReader(new FileInputStream("utf8.txt"), Charset.forName("UTF-8"))).readLine() +" : "+ r.getString("content"));
osw.close();
c.close();
System.out.println("Done.");
} catch (Exception e) {
e.printStackTrace();
}
}
}
What the example does is that it opens the database "dummy.accdb" encrypted with the password "pass" and pulls the first post out of the table "main". It then reads the text file "utf8.txt" and writes a text file "results.txt" which will contain the first row of "utf8.txt" plus the value of the field "content" it got from the database.
In the file "utf8.txt" I have stored "♜♞♝♛♚♝♞♜♟♖♘♗♕♔♗♘♖♙".
In the database's "main" table's "content" field I have stored "♫♪あキタℳℴℯ♥∞۞♀♂".
After the application has finished running the "results.txt" has the following content: "♜♞♝♛♚♝♞♜♟♖♘♗♕♔♗♘♖♙ : ?????Moe?8???".
It successfully read and wrote the UTF8 characters of the "utf8.txt" text file, but failed to obtain the correct characters from the database. This is where the problem lies.
Update. Thought I should mention that the field in the database is of the type "memo". I have tried having "Unicode Compression" set both to "No" and to "Yes" (recreating the post between tries to make sure no compression was there when "No" was selected). To my understanding Access uses UTF-16 when it saves Unicode characters, however with compression on it changes to UTF-8. In any case this did not make any difference.
Bonus question, anyone know how to connect to the database using a pure ODBC provider in Java? Or any other kind of method? This would provide me with a good workaround.
Update. I have been trying to feed these four to getConnection:
"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=./dummy.accdb"
"jdbc:odbc:Provider=Microsoft.Jet.OLEDB.4.0;Data Source=./dummy.accdb"
"jdbc:odbc:Driver={Microsoft.Jet.OLEDB.4.0};Data Source=./dummy.accdb"
"jdbc:odbc:Provider=Microsoft.ACE.OLEDB.12.0;Data Source=./dummy.accdb"
The first gives the error "java.sql.SQLException: No suitable driver found for Provider=Microsoft.Jet.OLEDB.4.0;Data Source=./dummy.accdb" and the two in the middle get "java.sql.SQLException: [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified". The last one gets "java.sql.SQLException: [Microsoft][ODBC Driver Manager] Data source name too long".
I don't understand what getConnection wants. The parameter description is as follows: "url - a database url of the form jdbc:subprotocol:subname". Huh? I clearly don't get what that means.
Anyone know any alternative working ways of connecting to the Access 2007 database through Java? Maybe the providers I tried aren't supported but some other might be?
An alternative to using JDBC with Access is the Jackcess library. The pro of using this library is that there are no problematic translation layers (like ODBC) or external system dependencies. The con is that it does not support executing queries.
Since you mentioned that switching to some DB other than Access is possible, I urge you to do so. Building your software on Microsoft Office products has always been a maintenance nightmare for me, so choose anything else from this list: http://java-source.net/open-source/database-engines.
I would go with Apache Derby for this, or just use Java DB, which comes preinstalled with any current Sun Java installation (and is in fact a repackaged Derby).
Now that the JDBC-ODBC Bridge has been dropped from Java SE 8 and Oracle has confirmed that this issue will never be fixed (ref: here) a good alternative would be to use UCanAccess. For more information, see
UCanAccess on SourceForge
Manipulating an Access database from Java without ODBC
utf-8 is not supported:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6345277