I am using DBUnit to export my dataset as XML. Is there any "simple" way to get all dependent tables and data given a table name and some criterion?
private void exportDBDepended() throws Exception {
    String[] depTableNames = TablesDependencyHelper.getAllDependentTables(connection, "dbo.Users");
    IDataSet depDataset = connection.createDataSet(depTableNames);
    FlatXmlDataSet.write(depDataset, new FileOutputStream("dependents.xml"));
}
This gives me all records and dependent data, but I would like to fetch only the top 100 records, or apply some other per-table condition.
Thanks in advance,
Dario
I didn't manage to solve my requirement programmatically, but http://jailer.sourceforge.net/ is more than enough for preparing XML DB data.
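For anyone who still wants a purely programmatic route, a partial workaround is DBUnit's QueryDataSet, which exports an arbitrary SQL query per table. Note that it does not follow foreign keys for you, so every dependent table needs its own consistent WHERE clause; dbo.Orders and UserId below are hypothetical names used only for illustration. A minimal sketch:
QueryDataSet partialDataSet = new QueryDataSet(connection);
// top 100 users only (SQL Server TOP syntax, matching the dbo. prefix above)
partialDataSet.addTable("dbo.Users", "SELECT TOP 100 * FROM dbo.Users");
// each dependent table must be restricted to the same subset by hand
partialDataSet.addTable("dbo.Orders",
    "SELECT * FROM dbo.Orders WHERE UserId IN (SELECT TOP 100 UserId FROM dbo.Users)");
FlatXmlDataSet.write(partialDataSet, new FileOutputStream("partial.xml"));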
What I want to do is read an existing table and generate a new table that has the same schema as the original, plus a few extra columns (computed from some columns of the original table). The original table's schema can change without notice to me (though the fields I use in my Dataflow job won't), so I would like to always read the schema instead of defining some custom class that hard-codes it.
In Dataflow SDK 1.x, I can get the TableSchema via
final DataflowPipelineOptions options = ...
final String projectId = ...
final String dataset = ...
final String table = ...
final TableSchema schema = new BigQueryServicesImpl()
        .getDatasetService(options)
        .getTable(projectId, dataset, table)
        .getSchema();
For Dataflow SDK 2.x, BigQueryServicesImpl has become a package-private class.
I read the responses in "Get TableSchema from BigQuery result PCollection<TableRow>", but I'd prefer not to make a separate query to BigQuery. As that response is now almost two years old, are there other thoughts or ideas from the SO community?
Due to how BigQueryIO is set up now, it needs to know the table schema before the pipeline begins to run, so reading the schema dynamically inside the same pipeline is a good feature idea but is not currently feasible. In the example you linked, the table schema is queried before the pipeline runs.
If new columns are added, then unfortunately a new pipeline must be launched.
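If you only need the schema at pipeline-construction time, one workaround is to fetch it yourself with the low-level BigQuery API client (which Beam already depends on) instead of Beam's now package-private services. A minimal sketch, assuming application-default credentials; "schema-lookup" is an arbitrary application name:
GoogleCredential credential = GoogleCredential.getApplicationDefault()
    .createScoped(BigqueryScopes.all());
Bigquery bigquery = new Bigquery.Builder(
        GoogleNetHttpTransport.newTrustedTransport(),
        JacksonFactory.getDefaultInstance(),
        credential)
    .setApplicationName("schema-lookup")
    .build();
// query the schema before the pipeline is constructed and launched
TableSchema schema = bigquery.tables()
    .get(projectId, dataset, table)
    .execute()
    .getSchema();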
I am working on a monitoring tool developed in Spring Boot, using Hibernate as the ORM.
I need to compare each row in my table (already-persisted rows of sent messages) and see whether a MailId (unique) has received feedback (status: OPENED, BOUNCED, DELIVERED...) or not.
I get the feedback by reading CSV files from a network folder. Parsing and reading the files goes very fast, but updating my database is very slow. My algorithm is not very efficient, because I loop through a list that can hold hundreds of thousands of objects and look each one up in my table.
This is the method that performs the update in my table by updating the "target" object (a row in the database table):
@Override
public void updateTargetObjectFoo() throws CSVProcessingException, FileNotFoundException {
    // performProcessing reads the files in a folder, parses them into Java objects,
    // and maps them into a feedBackList of type Foo
    List<Foo> feedBackList = performProcessing(env.getProperty("foo_in"), EXPECTED_HEADER_FIELDS_STATUS, Foo.class, ".LETTERS.STATUS.");
    for (Foo foo : feedBackList) {
        // findByKey does a simple SELECT in MySQL where MailId = foo.getMailId()
        Foo persistedFoo = fooDao.findByKey(foo.getMailId());
        if (persistedFoo != null) {
            persistedFoo.setStatus(foo.getStatus());
            persistedFoo.setDnsCode(foo.getDnsCode());
            persistedFoo.setReturnDate(foo.getReturnDate());
            persistedFoo.setReturnTime(foo.getReturnTime());
            // saveAccount issues a MySQL UPDATE on the table
            fooDao.saveAccount(persistedFoo);
        }
    }
}
What if I did this selection/comparison and update on the Java side, and then re-saved the whole list to the database?
Would that be faster?
Thanks to all for your help.
Hibernate is not particularly well-suited for batch processing.
You may be better off using Spring's JdbcTemplate to do JDBC batch processing.
However, if you must do this via Hibernate, this may help: https://docs.jboss.org/hibernate/orm/5.2/userguide/html_single/chapters/batch/Batching.html
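For reference, a minimal JDBC-batch sketch of the update loop above; the table and column names (foo, status, dns_code, return_date, return_time, mail_id) are guesses based on your entity, not your actual schema:
String sql = "UPDATE foo SET status = ?, dns_code = ?, return_date = ?, return_time = ? WHERE mail_id = ?";
jdbcTemplate.batchUpdate(sql, new BatchPreparedStatementSetter() {
    @Override
    public void setValues(PreparedStatement ps, int i) throws SQLException {
        Foo foo = feedBackList.get(i);
        ps.setString(1, foo.getStatus());
        ps.setString(2, foo.getDnsCode());
        ps.setObject(3, foo.getReturnDate());
        ps.setObject(4, foo.getReturnTime());
        ps.setString(5, foo.getMailId());
    }
    @Override
    public int getBatchSize() {
        return feedBackList.size();
    }
});
This also removes the per-row SELECT entirely: rows whose mail_id has no match are simply left untouched by the UPDATE.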
I have been using the open-source data set provider Casper to get an in-memory representation of a collection of database objects in Java.
GitHub repository: https://github.com/casperds/casperdatasets
Below is the code that I have been using to pull data into Casper datasets:
String[] primaryKeys = { "QUESTION_ID" };
if (resultSet != null)
{
    container = CDataCacheDBAdapter.loadData(resultSet, null, primaryKeys, new HashMap<Object, Object>());
    lCDataRowset = container.getAll();
    preparedStatement.close();
    resultSet.close();
}
The problem with this is that when I don't mention primary keys, the DBAdapter does not load data, and if I mention some column as the primary key, then ORDER BY has no effect on the dataset; it just orders by the primary keys.
I want to be able to pull data into the dataset in the order specified in the query.
Did anybody face this issue? Any kind of help is appreciated! Thanks
Well, it turned out to be a very stupid issue. If you pass null for the primaryKeys parameter, then it returns data in the order the query returns it, the same order you see in MySQL Query Browser.
I thought this could help someone someday; that's why I'm keeping this post, otherwise I would have deleted it.
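In other words, the call from the question becomes (a sketch of the same loadData call, just with null primary keys):
// pass null instead of primaryKeys to preserve the query's ORDER BY
container = CDataCacheDBAdapter.loadData(resultSet, null, null, new HashMap<Object, Object>());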
I am using MapReduce and HFileOutputFormat to produce HFiles and bulk load them directly into an HBase table.
Now, while reading the input files, I want to produce HFiles for two tables and bulk load both outputs in a single MapReduce job.
I searched the web and saw some links about MultiHfileOutputFormat, but couldn't find a real solution.
Do you think it is possible?
My way is:
Use HFileOutputFormat as well; when the job is completed, doBulkLoad writes into table1.
Keep a List of Puts in the mapper, and a MAX_PUTS value as a global constant. When puts.size() > MAX_PUTS, do:
String tableName = conf.get("hbase.table.name.dic", table2);
HTable table = new HTable(conf, tableName);
table.setAutoFlushTo(false);
table.setWriteBufferSize(1024 * 1024 * 64);
table.put(puts);
table.close();
puts.clear();
Notice: you must have a cleanup function to write the leftover puts; see the sketch below.
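A minimal sketch of that cleanup, assuming the same puts list and configuration key as above (not part of the original answer):
@Override
protected void cleanup(Context context) throws IOException, InterruptedException {
    if (!puts.isEmpty()) {
        Configuration conf = context.getConfiguration();
        // table2 is the same second-table name as in the mapper code above
        HTable table = new HTable(conf, conf.get("hbase.table.name.dic", table2));
        table.setAutoFlushTo(false);
        table.put(puts); // write whatever is left over
        table.close();   // close() flushes the write buffer
        puts.clear();
    }
}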
I'm new to HBase. What's the best way to retrieve results from a table, row by row? I would like to read all of the data in the table. My table has two column families, say col1 and col2.
From the HBase shell, you can use the scan command to list data in a table, or get to retrieve a single record. Reference here
I think this is what you need, covering both the HBase shell and the Java API: http://cook.coredump.me/post/19672191046/hbase-client-example
However, you should understand that the HBase shell's scan is very slow (it is not cached) and is intended only for debugging.
Another useful piece of information for you is here: http://hbase.apache.org/book/perf.reading.html
That chapter is right about reading from HBase, but it is somewhat harder to understand because it assumes some familiarity and contains more advanced advice. I'd recommend reading the guide from the beginning.
Use the Scan API of HBase; there you can specify the start row and end row and retrieve data from the table.
Here is an example:
http://eternaltechnology.blogspot.in/2013/05/hbase-scanner-example-scanning.html
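For completeness, a minimal sketch of a full-table scan with the classic client API; "mytable" is an assumed table name, and col1/col2 are the families from the question:
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "mytable");
Scan scan = new Scan();
scan.addFamily(Bytes.toBytes("col1"));
scan.addFamily(Bytes.toBytes("col2"));
// optionally bound the scan: scan.setStartRow(...), scan.setStopRow(...)
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
    // each Result is one row; print its row key
    System.out.println(Bytes.toString(result.getRow()));
}
scanner.close();
table.close();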
I was looking for something like this!
Map function:
public void map(ImmutableBytesWritable row, Result value, Context context) throws InterruptedException, IOException {
    // read two cells from the row's column family
    String x1 = Bytes.toString(value.getValue(Bytes.toBytes("ColumnFamily"), Bytes.toBytes("X1")));
    String x2 = Bytes.toString(value.getValue(Bytes.toBytes("ColumnFamily"), Bytes.toBytes("X2")));
}
Driver file:
// Configuration for job2
Configuration config2 = new Configuration();
Job job2 = new Job(config2, "kmeans2");
job2.setJarByClass(Converge.class);
job2.setMapperClass(Converge.Map.class);
job2.setReducerClass(Converge.Reduce.class);
job2.setInputFormatClass(TableInputFormat.class);
job2.setOutputFormatClass(NullOutputFormat.class);
job2.setOutputKeyClass(Text.class);
job2.setOutputValueClass(Text.class);
job2.getConfiguration().set(TableInputFormat.INPUT_TABLE, "tablename");
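As a side note, an equivalent and more conventional way to wire the table into the mapper is TableMapReduceUtil (a sketch; it assumes Converge.Map extends TableMapper<Text, Text>):
Scan scan = new Scan();
scan.setCaching(500);        // fetch rows in larger batches to cut down RPCs
scan.setCacheBlocks(false);  // recommended for MapReduce scans
TableMapReduceUtil.initTableMapperJob("tablename", scan, Converge.Map.class, Text.class, Text.class, job2);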