The following diagram depicts the simplified ingestion flow we are building to ingest data from different RDBMSs into Hive.
Step 1: Using a JDBC connection to the data source, the source data is streamed and saved to a CSV file on HDFS using the HDFS Java API.
Basically, we execute a 'SELECT *' query and save each row to the CSV file until the ResultSet is exhausted.
Step 2: Using the LOAD DATA INPATH command, the Hive table is populated from the CSV file created in Step 1.
We use JDBC ResultSet.getString() to get the column data.
This works fine for non-binary data.
But for BLOB and CLOB columns, we cannot write the column data into a text/CSV file.
My question: is it possible to use the ORC or Avro format to handle binary columns? Do these formats support writing row by row?
(Update: We are aware of technologies such as Sqoop and NiFi; the reason for implementing our own custom ingestion flow is beyond the scope of this question.)
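For illustration only, and without settling the ORC/Avro question: one stopgap that keeps the CSV-based flow is to Base64-encode the bytes of a binary column before writing them as a CSV field, since the basic Base64 alphabet contains no commas, quotes, or line breaks. The sketch below uses only java.util.Base64; the class name BinaryCsvField is hypothetical, and it assumes the column bytes were read with ResultSet.getBytes() rather than getString(). Decoding on the Hive side (for example with Hive's unbase64 function) is an assumption about the target table, not something the flow above already does.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper: turns binary column data into a CSV-safe text field and back.
final class BinaryCsvField {

    // Encode raw column bytes as one Base64 token (no line breaks, commas, or quotes).
    public static String encode(byte[] columnBytes) {
        return columnBytes == null ? "" : Base64.getEncoder().encodeToString(columnBytes);
    }

    // Decode a field written by encode() back into the original bytes.
    public static byte[] decode(String field) {
        return Base64.getDecoder().decode(field);
    }

    public static void main(String[] args) {
        // Bytes that would break a plain CSV row: NUL, newline, comma, double quote, 0xFF.
        byte[] blob = {0, 10, 44, 34, (byte) 0xFF};
        String field = encode(blob);
        System.out.println("CSV field: " + field);
        System.out.println("Round trip ok: " + Arrays.equals(blob, decode(field)));
    }
}
```

The encoded field is roughly a third larger than the raw bytes, so this is more suitable for moderately sized binary columns than for very large ones.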
Question
How do I store entire files in my H2 database and retrieve them using JDBC?
Some Background
I have some text files that serve as templates for various documents that will be generated in my Spring Boot app. Currently, the text files are stored in the local file system on my PC, but that is not a long-term solution. I need to store them in the database and write the JDBC code needed to retrieve them.
Are there any technologies/libraries out there that would help me with this? If so, please link me to them and provide an example of how to do it in Spring Boot.
Note: It is a new requirement given to me that the text files should be stored in the database, and not the file system.
You have to use a BLOB column in your database table.
CREATE TABLE my_table(ID INT PRIMARY KEY, document BLOB);
BLOB stands for Binary Large Object.
http://www.h2database.com/html/datatypes.html#blob_type
To store it with JdbcTemplate, you have to create a ByteArrayInputStream from the document's bytes and bind it to the BLOB parameter of the statement:
ByteArrayInputStream inputStream = new ByteArrayInputStream(document);
preparedStatement.setBlob(3, inputStream);
Please find more examples here:
https://www.logicbig.com/tutorials/spring-framework/spring-data-access-with-jdbc/jdbc-template-with-clob-blob.html
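As a rough sketch of the full round trip, the following uses plain JDBC (not JdbcTemplate) against the my_table definition above. The class name TemplateStore and its method names are hypothetical, the Connection is assumed to come from your Spring Boot DataSource, and reading the BLOB with ResultSet.getBytes() assumes the templates are small enough to hold in memory.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper for storing text templates in the BLOB column of my_table.
final class TemplateStore {
    static final String INSERT_SQL = "INSERT INTO my_table (ID, document) VALUES (?, ?)";
    static final String SELECT_SQL = "SELECT document FROM my_table WHERE ID = ?";

    // Convert template text to the bytes that are stored in the BLOB.
    public static byte[] toBytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Convert stored BLOB bytes back to template text.
    public static String fromBytes(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Insert one template; the BLOB is parameter 2 because the statement has two placeholders.
    public static void save(Connection con, int id, String text) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(INSERT_SQL)) {
            ps.setInt(1, id);
            ps.setBlob(2, new ByteArrayInputStream(toBytes(text)));
            ps.executeUpdate();
        }
    }

    // Load one template, or return null when no row has the given ID.
    public static String load(Connection con, int id) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(SELECT_SQL)) {
            ps.setInt(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? fromBytes(rs.getBytes(1)) : null;
            }
        }
    }
}
```

To load a file from disk before saving it, new String(Files.readAllBytes(path), StandardCharsets.UTF_8) would produce the text argument for save().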
I need to bind a group of CSV files named in the format "YYYY-MM-DD hh:mm:ss.csv", all located in the same folder, to a single table that contains the data from all the files.
I need to read the data from a Java EE application thus I would like to create a connection pool inside the application server. I found the CsvJdbc driver that allows the reading of multiple files as a single entity. A good starting point was this page in the section with this paragraph:
To read several files (for example, daily log files) as a single table, set the database connection property indexedFiles. The following example demonstrates how to do this.
The example would work for me, but the problem is that my filenames have no header word. So the corresponding table name becomes an empty string, which obviously makes it impossible to query the table.
How can I tell the driver to map the pattern to a table when the filenames have no header part?
P.S. I already tried to use HSQLDB as a front end to the CSV files, but it does not support multiple files.
Set up CsvJdbc to read several files as described in http://csvjdbc.sourceforge.net/doc.html and then use an empty table name in the SQL query, because your CSV filenames do not have any header before the part matched by the fileTailPattern regular expression. For example:
props.put("fileTailPattern", "(\\d+)-(\\d+)-(\\d+) (\\d+):(\\d+):(\\d+)");
props.put("fileTailParts", "Year,Month,Day,Hour,Minutes,Seconds");
...
ResultSet results = stmt.executeQuery("SELECT * FROM \"\" AS T1");
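To see how the capture groups of that pattern line up with the names in fileTailParts, here is a small sketch that uses only java.util.regex. It does not call CsvJdbc at all; the class FileTailDemo and the sample filename are hypothetical, and stripping the ".csv" extension before matching is an assumption made for the demonstration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: pairs each regex capture group with the matching fileTailParts name.
final class FileTailDemo {
    static final String FILE_TAIL_PATTERN = "(\\d+)-(\\d+)-(\\d+) (\\d+):(\\d+):(\\d+)";
    static final String FILE_TAIL_PARTS = "Year,Month,Day,Hour,Minutes,Seconds";

    // Returns part name -> captured text, or an empty map when the name does not match.
    public static Map<String, String> split(String fileName) {
        String base = fileName.endsWith(".csv")
                ? fileName.substring(0, fileName.length() - ".csv".length())
                : fileName;
        Map<String, String> parts = new LinkedHashMap<>();
        Matcher m = Pattern.compile(FILE_TAIL_PATTERN).matcher(base);
        if (!m.matches()) {
            return parts;
        }
        String[] names = FILE_TAIL_PARTS.split(",");
        for (int i = 0; i < names.length; i++) {
            parts.put(names[i], m.group(i + 1)); // group 0 is the whole match
        }
        return parts;
    }

    public static void main(String[] args) {
        // prints {Year=2013, Month=05, Day=01, Hour=10, Minutes=30, Seconds=00}
        System.out.println(split("2013-05-01 10:30:00.csv"));
    }
}
```

Because the whole base name is consumed by the pattern, nothing is left over for a table name, which is why the query uses the empty identifier.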
Hi, I am creating a table from a schema file and loading it from a data file through JDBC. I am doing a batch upload using PreparedStatement and executeBatch. The data file contents have the following structure:
key time rowid stream
X 11:40 1 A
Y 3:30 2 B
I am now able to load the table into the database successfully. But I would like to verify that the table loaded into the database matches the same data file. How do I do that? How do I compare a table in the database with a data file? I am new to JDBC. Please guide me. Thanks in advance.
Like Loki said, you can use a tool like DBUnit. Another option is to make a rudimentary integration test whereby your test generates a dump file of your table and compares this dump with the original "good" file.
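A rudimentary version of that dump-and-compare check might look like the sketch below. The class DumpComparer and its method names are hypothetical; it assumes each table row can be rendered as one whitespace-separated line in the same column order as the data file, and it compares rows regardless of order.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: compares the rows of a data file with the rows dumped from a table.
final class DumpComparer {

    // Collapse runs of whitespace so tab- and space-separated rows compare equal.
    static String normalize(String row) {
        return row.trim().replaceAll("\\s+", " ");
    }

    // Render every row of a query result as one space-separated line.
    public static List<String> dump(ResultSet rs) throws SQLException {
        List<String> rows = new ArrayList<>();
        int columns = rs.getMetaData().getColumnCount();
        while (rs.next()) {
            StringBuilder line = new StringBuilder();
            for (int i = 1; i <= columns; i++) {
                if (i > 1) line.append(' ');
                line.append(rs.getString(i));
            }
            rows.add(line.toString());
        }
        return rows;
    }

    // Rows present in expected but absent from actual (duplicates are counted).
    public static List<String> missing(List<String> expected, List<String> actual) {
        List<String> remaining = new ArrayList<>();
        for (String a : actual) remaining.add(normalize(a));
        List<String> notFound = new ArrayList<>();
        for (String e : expected) {
            String row = normalize(e);
            if (!remaining.remove(row)) notFound.add(row);
        }
        return notFound;
    }

    public static void main(String[] args) {
        List<String> file = Arrays.asList("X 11:40 1 A", "Y 3:30 2 B");
        List<String> table = Arrays.asList("Y\t3:30\t2\tB", "X 11:40 1 A");
        System.out.println("Missing from table: " + missing(file, table));
        System.out.println("Extra in table: " + missing(table, file));
    }
}
```

The table matches the file when both missing(file, table) and missing(table, file) are empty. Values such as times may come back from the driver in a different textual form than in the file (for example with seconds added), so they may need their own normalization before comparing.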
You need DbUnit. See more details here: http://dbunit.sourceforge.net/howto.html
DbUnit helps you write test cases against data from the database.
I want to insert the data from an XML file into a MySQL table using Java, choosing which columns to insert into. How can this be done?
It really depends on the format of your XML file. If your XML file is a direct export from MySQL, please refer to this question.
If your XML is in some other format, then I would probably use JAXB to parse the XML into POJOs, then write some logic to map the POJOs to the database table.
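As a rough illustration of the parse-then-map idea, the sketch below swaps JAXB for the DOM parser that ships with the JDK, so it needs no extra dependency. The XML layout (one element per row, one child element per column), the class XmlRows, and the table and column names are all hypothetical. The generated statement uses placeholders, so the extracted values would be bound with PreparedStatement.setString() rather than concatenated.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: extracts chosen columns from row elements and builds an INSERT statement.
final class XmlRows {

    // For each <rowTag> element, read the text of the child elements named in columns.
    public static List<Map<String, String>> parse(String xml, String rowTag, List<String> columns) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList rows = doc.getElementsByTagName(rowTag);
            List<Map<String, String>> result = new ArrayList<>();
            for (int i = 0; i < rows.getLength(); i++) {
                Element row = (Element) rows.item(i);
                Map<String, String> values = new LinkedHashMap<>();
                for (String column : columns) {
                    NodeList cells = row.getElementsByTagName(column);
                    values.put(column, cells.getLength() > 0 ? cells.item(0).getTextContent() : null);
                }
                result.add(values);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Build "INSERT INTO table (a, b) VALUES (?, ?)" for the chosen columns.
    // Table and column names must come from trusted code, never from the XML itself.
    public static String insertSql(String table, List<String> columns) {
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table + " (" + String.join(", ", columns) + ") VALUES (" + placeholders + ")";
    }

    public static void main(String[] args) {
        String xml = "<people><person><name>Ann</name><city>Oslo</city><age>30</age></person></people>";
        List<String> columns = Arrays.asList("name", "city"); // the columns chosen for insertion
        System.out.println(insertSql("people", columns));
        System.out.println(parse(xml, "person", columns));
    }
}
```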
I have a .data file in the above format. I am writing a Java program that will take the values from the .data file and put them in a buffer. My Java program is connected to MySQL (on Windows) via JDBC. So I need to read the values from the file in the above format and put them in the buffer like this:
Insert Into building values ("--", "---",----)
In this way, I store these values and JDBC will populate the database tables in MySQL (on Windows). Please tell me the best way.
Check out the answers to this question for reading file lines and splitting them into chunks. I know the question says Groovy: but most answers are Java. Then insert the values you retrieved via JDBC.
Actually, since your data file is obviously CSV, you could also use a CSV library like OpenCSV to read the values.
The data is in CSV format, so use a CSV library to parse the file and then just add some JDBC code to insert this into database.
Or just call MySQL CSV import command from Java:
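try {
    // Run the mysqlimport client; replace the placeholders with real options, database, and file names
    String command = "mysqlimport [options] db_name textfile1 [textfile2 ...]";
    Process child = Runtime.getRuntime().exec(command);
    // Wait for the import to finish and check its exit status
    int exitCode = child.waitFor();
    if (exitCode != 0) System.err.println("mysqlimport failed with exit code " + exitCode);
} catch (java.io.IOException | InterruptedException e) {
    // Report the failure instead of silently swallowing it
    e.printStackTrace();
}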
This is the fourth question for the same task... If your data file is well formatted like in the example you provided, then you don't have to split the line into values:
Source: "AAH196","Austin","TX","Virginia Beach","VA"
Target: INSERT INTO BUILDING VALUES("AAH196","Austin","TX","Virginia Beach","VA");
<=> "INSERT INTO BUILDING VALUES(" + Source + ");"
Just take a complete row from your CSV file and concatenate it into an SQL statement.
(See my answer to question 1 of 4. BTW, if SQL injection is a potential problem, splitting a line into values is not a solution either.)
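If SQL injection is a concern, the usual remedy is to bind the values through PreparedStatement placeholders instead of concatenating them. The sketch below is one way to do that under stated assumptions: the class BuildingLoader is hypothetical, the small parser only handles quoted fields like those in the example row (it is not a full CSV implementation), and the number of placeholders is taken from the number of fields in the line.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: parses one quoted CSV line and inserts it with bound parameters.
final class BuildingLoader {

    // Split a line such as "AAH196","Austin","TX" into fields; commas inside quotes
    // are kept and a doubled quote ("") inside a quoted field becomes one quote.
    public static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    current.append(c);
                } else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"');
                    i++; // skip the second quote of the escaped pair
                } else {
                    inQuotes = false;
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    // Build "INSERT INTO BUILDING VALUES (?, ?, ...)" with one placeholder per field.
    public static String insertSql(int fieldCount) {
        return "INSERT INTO BUILDING VALUES (" + String.join(", ", Collections.nCopies(fieldCount, "?")) + ")";
    }

    // Insert one parsed row; every value is bound, never concatenated into the SQL text.
    public static void insert(Connection con, List<String> fields) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(insertSql(fields.size()))) {
            for (int i = 0; i < fields.size(); i++) {
                ps.setString(i + 1, fields.get(i));
            }
            ps.executeUpdate();
        }
    }

    public static void main(String[] args) {
        List<String> fields = parseLine("\"AAH196\",\"Austin\",\"TX\",\"Virginia Beach\",\"VA\"");
        System.out.println(fields);
        System.out.println(insertSql(fields.size()));
    }
}
```

For many rows, calling addBatch() per line and executeBatch() once on a single PreparedStatement would avoid preparing the statement repeatedly.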
You can bind your CSV to Java beans using opencsv.
http://opencsv.sourceforge.net/
You can then make these beans persistent using an ORM framework such as Hibernate or Cayenne, or with JPA, which is based on annotations and maps your fields to tables easily without you writing any SQL statements.
This would be a perfect job for Groovy. Here's a gist with a small skeleton script to build upon.