I have 1.2M records in my MongoDB database, and I want to store all of this data in HBase programmatically. Basically, I put each retrieved record into HBase in a loop. After the operation finished, I had only 39,912 records in HBase.
Here's what I've tried:
Configuration config = HBaseConfiguration.create();
String tableName = "storedtweet";
String familyName = "msg";
String qualifierName = "msg";
HTable table = new HTable(config, tableName);
// using Spring Data MongoDB to interact with MongoDB
List<StoredTweet> storedTweetList = mongoDAO.getMongoTemplate().findAll(StoredTweet.class);
for (StoredTweet storedTweet : storedTweetList) {
    // the tweet id becomes the HBase row key
    Put p = new Put(Bytes.toBytes(storedTweet.getTweetId()));
    p.add(Bytes.toBytes(familyName), Bytes.toBytes(qualifierName), Bytes.toBytes(storedTweet.getMsg()));
    table.put(p);
    table.flushCommits();
}
If a row key already exists and you put it again, the HBase Put overwrites the previous value. I think some records in your data share the same tweet id (which you set as the row key). That's why some records disappear.
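If the duplicates are expected and every record should be kept, one option (a sketch, not part of the answer above) is to make the row key unique, for example by appending the Mongo document id; getId() here is a hypothetical accessor for it:

// Sketch: append the Mongo document id (hypothetical getId()) so duplicate
// tweet ids map to distinct HBase row keys instead of overwriting each other.
byte[] rowKey = Bytes.toBytes(storedTweet.getTweetId() + "_" + storedTweet.getId());
Put p = new Put(rowKey);
p.add(Bytes.toBytes(familyName), Bytes.toBytes(qualifierName), Bytes.toBytes(storedTweet.getMsg()));
table.put(p);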
Related
I am using the Spring JDBC template for data insertion in Oracle. I have a requirement to bulk insert using the Spring JDBC template's batch update, and I need the auto-generated primary key so I can pass it to another method, but I am not able to get the auto-generated key when using batch update.
Can you please provide a solution?
Assuming you have an auto-generated PK in Oracle, see this sample code:
final String insertNewFieldSql = Config.getSqlProperty("insert_new_field_record");
GeneratedKeyHolder holder = new GeneratedKeyHolder();
MapSqlParameterSource parameters = null;
for (ParsedData field : fields) {
    parameters = new MapSqlParameterSource();
    parameters.addValue("FIELD_1", parsedEmail.getDbRecordId())
              .addValue("FIELD_2", field.getName());
    // passing the PK column name makes the driver return the generated key
    namedParameterJdbcTemplate.update(insertNewFieldSql, parameters, holder, new String[] {"PK_FIELD_ID"});
    Long newFieldKey = holder.getKey().longValue();
    logger.log(Level.FINEST, "row was added: " + newFieldKey);
}
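Note that the loop above issues one statement per row, so it is not a true batch. If batching matters, a common alternative (my sketch, not part of the answer above) is to pre-fetch the keys from the Oracle sequence and bind them yourself, so batchUpdate can be used and the keys are already known; FIELD_SEQ, FIELD_TABLE, and the SQL string are hypothetical:

// Sketch: fetch a key per row from a (hypothetical) sequence, bind it
// explicitly, then insert everything in one batch.
String insertWithPkSql = "INSERT INTO FIELD_TABLE (PK_FIELD_ID, FIELD_1, FIELD_2) "
        + "VALUES (:PK_FIELD_ID, :FIELD_1, :FIELD_2)"; // hypothetical table and SQL
List<Long> generatedKeys = new ArrayList<>();
List<SqlParameterSource> batch = new ArrayList<>();
for (ParsedData field : fields) {
    Long key = namedParameterJdbcTemplate.getJdbcOperations()
            .queryForObject("SELECT FIELD_SEQ.NEXTVAL FROM DUAL", Long.class);
    generatedKeys.add(key);
    batch.add(new MapSqlParameterSource()
            .addValue("PK_FIELD_ID", key)
            .addValue("FIELD_1", parsedEmail.getDbRecordId())
            .addValue("FIELD_2", field.getName()));
}
namedParameterJdbcTemplate.batchUpdate(insertWithPkSql, batch.toArray(new SqlParameterSource[0]));
// generatedKeys now holds the PKs to pass to the other method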
I want to load an entire column of my PostgreSQL table with data from a CSV file, but when I do that I get an exception saying that the primary key of my table should not be null. It looks like Liquibase is creating new rows to insert the data. Is there a way to load the data into existing rows?
DatabaseChangeLog dbChangeLog = new DatabaseChangeLog();
Liquibase liquibase = new Liquibase(dbChangeLog, new FileSystemResourceAccessor(), database);
ChangeSet loadChangeSet = new ChangeSet(id + "", "nasri", false, false, "", "", "", liquibase.getDatabaseChangeLog());
LoadDataChange loadDataChange = new LoadDataChange();
loadDataChange.setTableName(key);
loadDataChange.setChangeSet(loadChangeSet);
loadDataChange.setResourceAccessor(new FileSystemResourceAccessor());
String path = context.getBundle().getVersion() + "." + key + "." + columnKey + "." + targetFieldKey + ".csv";
loadDataChange.setFile(path);
loadDataChange.setSchemaName("public");
LoadDataColumnConfig columnConfig = new LoadDataColumnConfig();
columnConfig.setName(targetFieldKey);
columnConfig.setType("String");
loadDataChange.addColumn(columnConfig);
loadChangeSet.addChange(loadDataChange);
liquibase.getDatabaseChangeLog().addChangeSet(loadChangeSet);
liquibase.update("");
There is a class called LoadUpdateDataChange.
The description says:
Loads or updates data from a CSV file into an existing table. Differs from loadData by issuing a SQL batch that checks for the existence of a record. If found, the record is UPDATEd, else the record is INSERTed. Also, generates DELETE statements for a rollback.
Looks like it should do what you are looking for (I have not used this myself, though).
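For the asker's programmatic setup, it should drop in where LoadDataChange was used. A minimal sketch, assuming the CSV also contains the primary key column so existing rows can be matched ("id" is a placeholder):

// Sketch: LoadUpdateDataChange in place of LoadDataChange; setPrimaryKey
// names the column Liquibase matches on to decide UPDATE vs INSERT.
LoadUpdateDataChange loadUpdateDataChange = new LoadUpdateDataChange();
loadUpdateDataChange.setTableName(key);
loadUpdateDataChange.setChangeSet(loadChangeSet);
loadUpdateDataChange.setResourceAccessor(new FileSystemResourceAccessor());
loadUpdateDataChange.setFile(path);
loadUpdateDataChange.setSchemaName("public");
loadUpdateDataChange.setPrimaryKey("id"); // placeholder: the table's real PK column
loadUpdateDataChange.addColumn(columnConfig);
loadChangeSet.addChange(loadUpdateDataChange);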
To update records, LoadUpdateDataChange is not working for me. It always tries to insert new records; it won't update existing ones.
If you want to update existing records, you should use something like the below:
<update tableName="someTable">
    <column name="update_column_name" value="updated value" />
    <where>primaryKey = condition</where>
</update>
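Since the original question drove Liquibase from Java, the programmatic equivalent of that <update> change would look roughly like this (a sketch; method names are taken from liquibase.change.core.UpdateDataChange and should be treated as assumptions):

// Sketch: programmatic <update> with a WHERE clause targeting existing rows.
UpdateDataChange updateDataChange = new UpdateDataChange();
updateDataChange.setTableName("someTable");
ColumnConfig column = new ColumnConfig();
column.setName("update_column_name");
column.setValue("updated value");
updateDataChange.addColumn(column);
updateDataChange.setWhere("primaryKey = condition"); // same placeholder condition as above
loadChangeSet.addChange(updateDataChange);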
Is there any way to list all tables present in a specific cache, and to list all caches present on an Apache Ignite server?
UPDATE:
I am running the following code to list the cache names and the tables present in my cache. The program lists all cache names present on the server; however, the table listing is printed as an empty collection. Meanwhile, the SQL query in the example works fine.
public static void main(String[] args) throws Exception {
    System.out.println("Run Spring example!!");
    Ignition.setClientMode(true);

    IgniteConfiguration cfg = new IgniteConfiguration();
    cfg.setIncludeEventTypes(EVTS_CACHE);
    cfg.setPeerClassLoadingEnabled(true);

    TcpDiscoveryMulticastIpFinder discoveryMulticastIpFinder = new TcpDiscoveryMulticastIpFinder();
    Set<String> set = new HashSet<>();
    set.add("hostname:47500..47509");
    discoveryMulticastIpFinder.setAddresses(set);

    TcpDiscoverySpi discoverySpi = new TcpDiscoverySpi();
    discoverySpi.setIpFinder(discoveryMulticastIpFinder);
    cfg.setDiscoverySpi(discoverySpi);

    Ignite ignite = Ignition.start(cfg);
    System.out.println("All available caches on server: " + ignite.cacheNames());

    CacheConfiguration<String, BinaryObject> cacheConfiguration = new CacheConfiguration<>(CACHE_NAME);
    // this configuration is freshly created, so its query entities are empty
    Collection<QueryEntity> entities = cacheConfiguration.getQueryEntities();
    System.out.println("All available tables in cache: " + entities);

    cacheConfiguration.setIndexedTypes(String.class, BinaryObject.class);
    //cacheConfiguration.setCacheMode(CacheMode.PARTITIONED);
    IgniteCache<String, BinaryObject> cache = ignite.getOrCreateCache(cacheConfiguration).withKeepBinary();

    QueryCursor<List<?>> query = cache.query(new SqlFieldsQuery("select Field1 from table1 where Field1='TEST'"));
    for (List<?> l : query.getAll()) {
        System.out.println(l);
    }
}
Get all cache names: Ignite.cacheNames(). Then use Ignite.cache(String) to get the cache instance.
Get SQL tables:
CacheConfiguration ccfg = cache.getConfiguration(CacheConfiguration.class);
Collection<QueryEntity> entities = ccfg.getQueryEntities();
Each query entity represents a table.
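Putting the two steps together, a small sketch (assuming Ignite 2.x, where QueryEntity exposes getTableName()) that prints every table in every cache:

// Sketch: walk all caches and print the table behind each query entity.
for (String name : ignite.cacheNames()) {
    IgniteCache<?, ?> c = ignite.cache(name);
    CacheConfiguration<?, ?> ccfg = c.getConfiguration(CacheConfiguration.class);
    for (QueryEntity entity : ccfg.getQueryEntities()) {
        System.out.println(name + " -> " + entity.getTableName());
    }
}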
You can also read the table names using an H2 query:
SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = '<cache name>'
ClientConfiguration cfg = new ClientConfiguration()
        .setAddresses(host + ":" + port)
        .setUserName(username)
        .setUserPassword(pwd);
IgniteClient igniteClient = Ignition.startClient(cfg);
ClientCache<Integer, String> cache = igniteClient.getOrCreateCache(cacheName);
QueryCursor<List<?>> cursor = cache.query(new SqlFieldsQuery(
        "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA='" + cacheName + "'"));
for (List<?> row : cursor) {
    System.out.println(row.get(0));
}
You can get all cache names using Ignite.cacheNames(). In order to get all table names you can use the SHOW TABLES command:

QueryCursor<List<?>> cursor = cache.query(new SqlFieldsQuery("SHOW TABLES FROM \"" + CACHE_NAME + "\""));
for (List<?> row : cursor) {
    System.out.println(row.get(0));
}

You can find more details about the SHOW command here: http://www.h2database.com/html/grammar.html#show
I am working with Cassandra, using the Hector client to read and upsert data in the database. I am able to retrieve data if I query only one column.
Now I am trying to retrieve the data for rowKey 1011, but with the column names as a collection of strings. Below is my API that retrieves data from the Cassandra database using the Hector client:
public Map<String, String> getAttributes(String rowKey, Collection<String> attributeNames, String columnFamily) {
    final Cluster cluster = CassandraHectorConnection.getInstance().getCluster();
    final Keyspace keyspace = CassandraHectorConnection.getInstance().getKeyspace();
    try {
        // queries a single, hard-coded column for the given row key
        ColumnQuery<String, String, String> columnQuery = HFactory
                .createStringColumnQuery(keyspace)
                .setColumnFamily(columnFamily)
                .setKey(rowKey)
                .setName("c1");
        QueryResult<HColumn<String, String>> result = columnQuery.execute();
        System.out.println("Column name from cassandra: " + result.get().getName()
                + ", column value from cassandra: " + result.get().getValue());
    } catch (HectorException e) {
        LOG.error("Exception in CassandraHectorClient::getAttributes " + e + ", RowKey = " + rowKey
                + ", Attribute Names = " + attributeNames);
    } finally {
        cluster.getConnectionManager().shutdown();
    }
    return null;
}
As you can see in the method above, I retrieve the data for a particular rowKey and for column c1. Now I need to retrieve the data for a collection of columns for a particular rowKey.
Meaning something like this:
I want to retrieve the data for multiple columns but for the same rowKey. How can I do this using the Hector client? I don't want to retrieve the data for all the columns and then iterate to find the individual columns I am looking for.
Use a column name made up of a composite key, as a combination of UTF8Type and TIMEUUID, then:
sliceQuery.setKey("your row key");
Composite startRange = new Composite();
startRange.addComponent(0, "c1",Composite.ComponentEquality.EQUAL);
Composite endRange = new Composite();
endRange.addComponent(0, "c1",Composite.ComponentEquality.GREATER_THAN_EQUAL);
sliceQuery.setRange(startRange,endRange, false, Integer.MAX_VALUE);
QueryResult<ColumnSlice<Composite, String>> result = sliceQuery.execute();
ColumnSlice<Composite, String> cs = result.get();
The above code will give you all records for your row key. After that, iterate as follows:
for (HColumn<Composite, String> col : cs.getColumns()) {
    System.out.println("column key's first part : " + col.getName().get(0, HFactoryHelper.stringSerializer).toString());
    System.out.println("column key's second part : " + col.getName().get(1, HFactoryHelper.uuidSerializer).toString());
    System.out.println("column key's value : " + col.getValue());
}
Somewhere you will have to write logic to maintain the set of records.
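As an aside (not part of the answer above): if the column names are plain strings rather than composites, Hector's SliceQuery with setColumnNames may fit the original question more directly, fetching exactly the requested columns for one row. A minimal sketch:

// Sketch: fetch only the named columns for a single row key.
SliceQuery<String, String, String> sliceQuery = HFactory.createSliceQuery(
        keyspace, StringSerializer.get(), StringSerializer.get(), StringSerializer.get());
sliceQuery.setColumnFamily(columnFamily);
sliceQuery.setKey(rowKey);
sliceQuery.setColumnNames(attributeNames.toArray(new String[0]));
Map<String, String> attributes = new HashMap<String, String>();
for (HColumn<String, String> col : sliceQuery.execute().get().getColumns()) {
    attributes.put(col.getName(), col.getValue());
}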
I am using this code for fetching user_id and user_code:
Keyspace keyspace = HFactory.createKeyspace("test", cluster);
CqlQuery<String, String, ByteBuffer> cqlQuery = new CqlQuery<String, String, ByteBuffer>(keyspace, stringSerializer, stringSerializer, new ByteBufferSerializer());
cqlQuery.setQuery("select user_id, user_code from User");
QueryResult<CqlRows<String, String, ByteBuffer>> result = cqlQuery.execute();
Iterator iterator = result.get().iterator();
while (iterator.hasNext()) {
    Row<String, String, ByteBuffer> row = (Row<String, String, ByteBuffer>) iterator.next();
    System.out.println("\nInserted data is as follows:\n" + row.getColumnSlice().getColumns().get(0).getValue().getInt());
    System.out.println("\nInserted data is as follows:\n" + Charset.forName("UTF-8").decode(row.getColumnSlice().getColumns().get(1).getValueBytes()));
}
The problem is that I am converting the fields according to their specific types. What if the query is arbitrary? How do I handle that scenario?
CQL queries return metadata about the columns they contain, similar to a JDBC result set.
I don't know if or how Hector exposes this information. For CQL, a better choice would be the newer pure CQL driver here: https://github.com/datastax/java-driver
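For illustration, a sketch with the DataStax Java driver (2.x/3.x API assumed): the result set carries column definitions, so an arbitrary query can be decoded without knowing the types up front:

// Sketch: read column names and decode values generically via metadata.
Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
Session session = cluster.connect("test");
ResultSet rs = session.execute("select user_id, user_code from User");
ColumnDefinitions defs = rs.getColumnDefinitions();
for (Row row : rs) {
    for (ColumnDefinitions.Definition d : defs) {
        // getObject decodes each column according to its reported type
        System.out.println(d.getName() + " = " + row.getObject(d.getName()));
    }
}
cluster.close();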