I'm currently using HBase v0.98.6. I would like to check the current connection status from an external Java program. Right now, I'm doing something like this to check:
connectionSuccess = true;
try {
    HConnection hConnection = HConnectionManager.createConnection(config);
} catch (Exception ex) {
    connectionSuccess = false;
}
When the connection is working, this returns fairly quickly. The problem is that when the connection is not working, it takes 20 minutes before connectionSuccess is finally set to false. Is there a way to reduce this time limit? I'm only interested in the connection status at the current time.
The reason it takes so long is that, by default, a failed connection is retried multiple times (I think 6? don't quote me), and each connection attempt takes a while. Try a combination of these settings to limit the time per connection attempt before it times out, and the number of permitted retry attempts.
hbase.client.retries.number = 3
hbase.client.pause = 1000
zookeeper.recovery.retry = 1 (i.e. no retry)
Credit to Lars from http://hadoop-hbase.blogspot.com/2012/09/hbase-client-timeouts.html
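To get a feel for how these settings add up, here is a back-of-the-envelope sketch. It assumes that every attempt runs into its full timeout and that the client sleeps a fixed pause between attempts. RetryBudget and worstCaseMillis are names made up for illustration, and the 10-second per-attempt timeout is an assumed figure, not an HBase default. HBase also applies a backoff multiplier to hbase.client.pause, and ZooKeeper retries come on top, so treat the result as a rough estimate rather than a guarantee.

```java
// Back-of-the-envelope estimate only; class and method names are hypothetical.
class RetryBudget {

    /**
     * Estimated wait for a failing call, assuming every attempt runs into its
     * full timeout and the client sleeps a fixed pause between attempts
     * (no backoff multiplier, which is a simplification).
     */
    static long worstCaseMillis(int attempts, long perAttemptTimeoutMs, long pauseMs) {
        if (attempts < 1) {
            throw new IllegalArgumentException("attempts must be at least 1");
        }
        return attempts * perAttemptTimeoutMs + (attempts - 1) * pauseMs;
    }

    public static void main(String[] args) {
        // 3 attempts, an assumed 10 s per-attempt timeout, 1 s pause between attempts
        System.out.println(worstCaseMillis(3, 10_000, 1_000) + " ms");
    }
}
```

Lowering either the attempt count or the per-attempt timeout shrinks the estimate roughly linearly, which is why the two are usually tuned together.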
You can set the retries value to 1 to get the status of the connection at the current time.
Configuration conf = HBaseConfiguration.create();
conf.setInt("hbase.client.retries.number",1);
conf.setInt("zookeeper.recovery.retry",0);
Or you can use the built-in HBaseAdmin method below, which does the same thing.
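connectionSuccess = true;
try {
    HBaseAdmin.checkHBaseAvailable(config);
} catch (Exception e) {
    // Covers MasterNotRunningException as well as any other exception
    // the call may throw (e.g. ZooKeeperConnectionException)
    connectionSuccess = false;
}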
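If all you need is a yes/no answer within a fixed time, another option is to run the check on a separate thread and give up after a deadline. This is a generic Java technique, not an HBase feature. Below is a minimal sketch: BoundedCheck and succeedsWithin are names I made up, and the lambdas in main are stand-ins for the real probe, which could be something like () -> { HBaseAdmin.checkHBaseAvailable(config); return null; }.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of HBase: bounds any blocking check with a hard deadline.
class BoundedCheck {

    /** Returns true only if the probe completes without throwing within timeoutMillis. */
    static boolean succeedsWithin(Callable<?> probe, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "connection-probe");
            t.setDaemon(true); // a hanging probe must not keep the JVM alive
            return t;
        });
        Future<?> future = executor.submit(probe);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return false;
        } catch (Exception e) {
            // TimeoutException (probe too slow) or ExecutionException (probe threw)
            return false;
        } finally {
            future.cancel(true);    // interrupt the probe if it is still running
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-in probes: one that returns immediately, one that blocks too long
        System.out.println(succeedsWithin(() -> "ok", 1000));
        System.out.println(succeedsWithin(() -> { Thread.sleep(5000); return null; }, 100));
    }
}
```

Note that the deadline only bounds how long the caller waits. A probe that ignores interruption may keep retrying in the background, so it still makes sense to combine this with the reduced retry settings above.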
My org.apache.hadoop.conf.Configuration object contains the following key-value pairs:
Configuration conf = HBaseConfiguration.create();
//configuring timeout and retry parameters
conf.set("hbase.rpc.timeout", "10000");
conf.set("hbase.client.scanner.timeout.period", "10000");
conf.set("hbase.cells.scanned.per.heartbeat.check", "10000");
conf.set("zookeeper.session.timeout", "10000");
conf.set("phoenix.query.timeoutMs", "10000");
conf.set("phoenix.query.keepAliveMs", "10000");
conf.set("hbase.client.retries.number", "3");
conf.set("hbase.client.pause", "1000");
conf.set("zookeeper.recovery.retry", "1");
I have recently started using JedisCluster for my application. There is little to no documentation and there are few examples for it. I tested a use case and the results are not what I expected.
public class test {
    private static JedisCluster setConnection(HashSet<HostAndPort> IP) {
        JedisCluster jediscluster = new JedisCluster(IP, 30000, 3,
            new GenericObjectPoolConfig() {{
                setMaxTotal(500);
                setMinIdle(1);
                setMaxIdle(500);
                setBlockWhenExhausted(true);
                setMaxWaitMillis(30000);
            }});
        return jediscluster;
    }

    // Sums the idle connections over the pools of all cluster nodes
    public static int getIdleconn(Map<String, JedisPool> nodes) {
        int i = 0;
        for (String k : nodes.keySet()) {
            i += nodes.get(k).getNumIdle();
        }
        return i;
    }

    public static void main(String[] args) {
        HashSet<HostAndPort> IP = new HashSet<HostAndPort>() {
            {
                add(new HostAndPort("host1", port1));
                add(new HostAndPort("host2", port2));
            }};
        JedisCluster cluster = setConnection(IP);
        System.out.println(getIdleconn(cluster.getClusterNodes()));
        cluster.set("Dummy", "0");
        cluster.set("Dummy1", "0");
        cluster.set("Dummy3", "0");
        System.out.println(getIdleconn(cluster.getClusterNodes()));
        try {
            Thread.sleep(60000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        System.out.println(getIdleconn(cluster.getClusterNodes()));
    }
}
The output for this snippet is:
0
3
3
Questions:
1. I have set the timeout to 30000 in JedisCluster(IP, 30000, 3, new GenericObjectPoolConfig()). I believe this is the connection timeout, which means idle connections are closed after 30 seconds. This doesn't seem to be happening, though: after sleeping for 60 seconds, the number of idle connections is still 3. What am I doing or understanding wrong here? I want the pool to close a connection if it has not been used for more than 30 seconds.
2. setMinIdle(1): does this mean that, regardless of the connection timeout, the pool will always maintain one connection?
3. I prefer availability over throughput for my app. What should the value for setMaxWaitMillis be if the connection timeout is 30 seconds?
4. Though rare, the app fails with redis.clients.jedis.exceptions.JedisNoReachableClusterNodeException: No reachable node in cluster. I think this is connected to question 1. How can I prevent this?
1. 30000 (30 seconds) here refers to the socket timeout, i.e. the timeout for a single socket (read) operation. It is not related to closing idle connections.
Closing idle connections is controlled by GenericObjectPoolConfig, so check the parameters there (in commons-pool2 these are the eviction settings, such as minEvictableIdleTimeMillis and timeBetweenEvictionRunsMillis).
2. Yes (mostly).
3. setMaxWaitMillis is the timeout for getting a connection object from the connection pool. It is not related to the 30 seconds and does not really solve anything for you in terms of availability.
4. Keep your cluster nodes available.
There have been changes in Jedis related to this. You can try a recent version (4.x, or even better 4.2.x).
I am trying to load around 50K messages into a Kafka topic. At the beginning of some runs I get the exception below, but not every time.
org.apache.kafka.common.KafkaException: Cannot execute transactional method because we are in an error state
at org.apache.kafka.clients.producer.internals.TransactionManager.maybeFailWithError(TransactionManager.java:784) ~[kafka-clients-2.0.0.jar:?]
at org.apache.kafka.clients.producer.internals.TransactionManager.beginAbort(TransactionManager.java:229) ~[kafka-clients-2.0.0.jar:?]
at org.apache.kafka.clients.producer.KafkaProducer.abortTransaction(KafkaProducer.java:679) ~[kafka-clients-2.0.0.jar:?]
at myPackage.persistUpdatesPostAction(MyCode.java:??) ~[aKafka.jar:?]
...
Caused by: org.apache.kafka.common.errors.ProducerFencedException: Producer
attempted an operation with an old epoch. Either there is a newer producer with
the same transactionalId, or the producer's transaction has been expired by the
broker.
Code Block is below:
public void persistUpdatesPostAction(List<Message> messageList) {
    if ((messageList == null) || (messageList.isEmpty())) {
        return;
    }
    logger.createDebug("Messages in batch(postAction) : " + messageList.size());
    Producer<String,String> producer = KafkaUtils.getProducer(Thread.currentThread().getName());
    try {
        producer.beginTransaction();
        createKafkaBulkInsert1(producer, messageList, "Topic1");
        createKafkaBulkInsert2(producer, messageList, "Topic2");
        createKafkaBulkInsert3(producer, messageList, "Topic3");
        producer.commitTransaction();
    } catch (Exception e) {
        producer.abortTransaction();
        producer.close();
        KafkaUtils.removeProducer(Thread.currentThread().getName());
    }
}
-----------
static Properties setPropertiesProducer() {
    Properties temp = new Properties();
    temp.put("bootstrap.servers", "localhost:9092");
    temp.put("acks", "all");
    temp.put("retries", 1);
    temp.put("batch.size", 16384);
    temp.put("linger.ms", 5);
    temp.put("buffer.memory", 33554432);
    temp.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    temp.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    return temp;
}

public static Producer<String, String> getProducer(String aThreadId) {
    if ((producerMap.size() == 0) || (producerMap.get(aThreadId) == null)) {
        Properties temp = producerProps;
        temp.put("transactional.id", aThreadId);
        Producer<String, String> producer = new KafkaProducer<String, String>(temp);
        producerMap.put(aThreadId, producer);
        producer.initTransactions();
        return producer;
    }
    return producerMap.get(aThreadId);
}

public static void removeProducer(String aThreadId) {
    logger.createDebug("Removing Thread ID :" + aThreadId);
    if (producerMap.get(aThreadId) == null)
        return;
    producerMap.remove(aThreadId);
}
Caused by: org.apache.kafka.common.errors.ProducerFencedException: Producer
attempted an operation with an old epoch. Either there is a newer producer with
the same transactionalId, or the producer's transaction has been expired by the
broker.
This exception message is not very helpful. I believe it is trying to say that the broker no longer has any record of the transaction-id that the client is sending. This can be for either of two reasons:
Someone else was using the same transaction-id and committed it already. In my experience, this is less likely unless you are sharing transaction-ids between clients. We ensure that our ids are unique by using UUID.randomUUID().
The transaction timed out and was removed by broker automation.
In our case, we were hitting transaction timeouts every so often, and those generated this exception. There are two properties that govern how long the broker will remember a transaction before aborting it and forgetting about it.
transaction.max.timeout.ms -- A broker property that specifies the maximum number of milliseconds until a transaction is aborted and forgotten. Default in many Kafka versions seems to be 900000 (15 minutes). Documentation from Kafka says:
The maximum allowed timeout for transactions. If a client’s requested transaction time exceeds this, then the broker will return an error in InitProducerIdRequest. This prevents a client from too large of a timeout, which can stall consumers reading from topics included in the transaction.
transaction.timeout.ms -- A producer client property that sets the timeout in milliseconds when a transaction is created. Default in many Kafka versions seems to be 60000 (1 minute). Documentation from Kafka says:
The maximum amount of time in ms that the transaction coordinator will wait for a transaction status update from the producer before proactively aborting the ongoing transaction.
If the transaction.timeout.ms property set in the client exceeds the transaction.max.timeout.ms property in the broker, the producer will immediately throw something like the following exception:
org.apache.kafka.common.KafkaException: Unexpected error in
InitProducerIdResponse The transaction timeout is larger than the maximum value
allowed by the broker (as configured by transaction.max.timeout.ms).
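To make the relationship between the two properties concrete, here is a small pre-flight check you could run before creating the producer. It is only a sketch: TransactionTimeoutCheck and its methods are hypothetical names, not part of the Kafka client, the broker limit has to be supplied by you (the client cannot read it from its own Properties), and the default values are the ones quoted above, so verify them against your Kafka version.

```java
import java.util.Properties;

// Hypothetical pre-flight check; class and method names are illustrative, not Kafka API.
class TransactionTimeoutCheck {

    // Defaults as quoted in the answer above; verify them for your Kafka version.
    static final long DEFAULT_TRANSACTION_TIMEOUT_MS = 60_000L;      // producer: transaction.timeout.ms
    static final long DEFAULT_TRANSACTION_MAX_TIMEOUT_MS = 900_000L; // broker: transaction.max.timeout.ms

    /** Reads transaction.timeout.ms from the producer properties, falling back to the assumed default. */
    static long producerTimeoutMs(Properties producerProps) {
        // Use get() rather than getProperty() because values are often put as Integer or Long
        Object value = producerProps.get("transaction.timeout.ms");
        return value == null ? DEFAULT_TRANSACTION_TIMEOUT_MS : Long.parseLong(value.toString());
    }

    /** True if the producer's requested timeout does not exceed the broker's maximum. */
    static boolean acceptedByBroker(Properties producerProps, long brokerMaxTimeoutMs) {
        return producerTimeoutMs(producerProps) <= brokerMaxTimeoutMs;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("transaction.timeout.ms", 1_000_000); // larger than the assumed broker maximum
        System.out.println(acceptedByBroker(props, DEFAULT_TRANSACTION_MAX_TIMEOUT_MS));
    }
}
```

If a batch can legitimately take longer than the producer timeout, raising transaction.timeout.ms (within the broker's limit) is the knob to look at first.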
There was a race condition in my Producer initialization code. I fixed it by changing the Producer map to a ConcurrentHashMap to make it thread-safe.
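For anyone wondering what such a fix can look like, below is a minimal sketch and not the code from my application. ProducerRegistry is a made-up class, and the type parameter P stands in for Producer<String, String> so the example runs without the Kafka client on the classpath. The point is that ConcurrentHashMap.computeIfAbsent applies the factory at most once per key, so two threads can never create two producers for the same transactional.id.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Illustrative registry; P stands in for Producer<String, String>.
class ProducerRegistry<P> {
    private final ConcurrentHashMap<String, P> producers = new ConcurrentHashMap<>();
    private final Function<String, P> factory;

    ProducerRegistry(Function<String, P> factory) {
        this.factory = factory;
    }

    /** Returns the producer for this id, creating it exactly once even under contention. */
    P getProducer(String transactionalId) {
        // ConcurrentHashMap applies the mapping function at most once per absent key
        return producers.computeIfAbsent(transactionalId, factory);
    }

    /** Forgets the producer, e.g. after it has been closed following a fatal error. */
    void removeProducer(String transactionalId) {
        producers.remove(transactionalId);
    }

    public static void main(String[] args) {
        ProducerRegistry<Object> registry = new ProducerRegistry<>(id -> new Object());
        // Both lookups return the same instance
        System.out.println(registry.getProducer("tx-1") == registry.getProducer("tx-1"));
    }
}
```

One caveat: the factory runs while the map holds a lock on the affected bin, so a slow factory (creating a real producer and calling initTransactions() can take a while) may briefly block other updates that hash to the same bin.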
I wrote a unit test to reproduce this. From this piece of Java code, you can easily see how the exception is caused by two producers sharing the same transactional id.
@Test
public void SendOffset_TwoProducerDuplicateTrxId_ThrowException() {
    // create two producers with the same transactional id
    Producer producer1 = KafkaBuilder.buildProducer(trxId, servers);
    Producer producer2 = KafkaBuilder.buildProducer(trxId, servers);
    offsetMap.put(new TopicPartition(topic, 0), new OffsetAndMetadata(1000));
    // initialize and begin two transactions
    sendOffsetBegin(producer1);
    sendOffsetBegin(producer2);
    try {
        // committing the first transaction is expected to throw an exception
        sendOffsetEnd(producer1);
        // execution is not expected to reach this line
        Assert.assertTrue(false);
    } catch (Throwable t) {
        // the exception is expected to be caught here
        Assert.assertTrue(t instanceof ProducerFencedException);
    }
}

private void sendOffsetBegin(Producer producer) {
    producer.initTransactions();
    producer.beginTransaction();
    producer.sendOffsetsToTransaction(offsetMap, consumerGroup);
}

private void sendOffsetEnd(Producer producer) {
    producer.commitTransaction();
}
When running multiple instances of the application, transactional.id
must be the same on all instances to satisfy fencing zombies when
producing records on a listener container thread. However, when
producing records using transactions that are not started by a
listener container, the prefix has to be different on each instance.
https://docs.spring.io/spring-kafka/reference/html/#transaction-id-prefix
The DropwizardMetricServices#submit() method I'm using doesn't submit the gauge metric the second time.
My use case is to remove the gauge metric from JMX after reading it, and my application can send the same metric again (with a different value).
The first time, the gauge metric is submitted successfully (my application then removes it once it has read the metric). But the same metric is not submitted the second time.
So I'm a bit confused: why would DropwizardMetricServices#submit() not work the second time?
Below is the code:
Submit metric:
private void submitNonSparseMetric(final String metricName, final long value) {
    validateMetricName(metricName);
    metricService.submit(metricName, value); // metricService is the DropwizardMetricServices
    log(metricName, value);
    LOGGER.debug("Submitted the metric {} to JMX", metricName);
}
Code that reads and removes the metric:
protected void collectMetrics() {
    // Create the connection
    Long currTime = System.currentTimeMillis()/1000; // Graphite needs
    Socket connection = createConnection();
    if (connection == null) {
        return;
    }
    // Get the output stream
    DataOutputStream outputStream = getDataOutputStream(connection);
    if (outputStream == null) {
        closeConnection();
        return;
    }
    // Get metrics from JMX
    Map<String, Gauge> g = metricRegistry.getGauges(); // metricRegistry is com.codahale.metrics.MetricRegistry
    for (Entry<String, Gauge> e : g.entrySet()) {
        String key = e.getKey();
        if (p2cMetric(key)) {
            String metricName = convertToMetricStandard(key);
            String metricValue = String.valueOf(e.getValue().getValue());
            String metricToSend = String.format("%s %s %s\n", metricName, metricValue, currTime);
            try {
                writeToStream(outputStream, metricToSend);
                // Remove the metric from JMX after successfully sending metric to graphite
                removeMetricFromJMX(key);
            } catch (IOException e1) {
                LOGGER.error("Unable to send metric to Graphite - {}", e1.getMessage());
            }
        }
    }
    closeOutputStream();
    closeConnection();
}
I think I found the issue.
As per the DropwizardMetricServices doc - https://docs.spring.io/spring-boot/docs/current/api/org/springframework/boot/actuate/metrics/dropwizard/DropwizardMetricServices.html#submit-java.lang.String-double- ,
the submit() method is described as "Set the specified gauge value."
So, I think DropwizardMetricServices#submit() is meant only for setting the value of an existing gauge metric in JMX, not for adding a new metric to JMX.
Once I replaced DropwizardMetricServices#submit() with the MetricRegistry#register() method (com.codahale.metrics.MetricRegistry) to submit all my metrics, it worked as expected, and my metrics are re-added to JMX after my application has removed them.
But I'm still wondering what makes DropwizardMetricServices#submit() add only new metrics to JMX and not a metric that has already been removed from JMX. Does DropwizardMetricServices cache (in memory) all the metrics submitted to JMX, so that submit() does not resubmit the metric?
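One plausible explanation, which I have not verified against every Spring Boot version, is exactly such a cache: the service keeps its own map of gauge objects and registers a gauge with the MetricRegistry only the first time it sees the name. The sketch below models that assumed behaviour with plain maps. CachedGaugeModel is a simplified stand-in I wrote for illustration and is NOT the real DropwizardMetricServices or MetricRegistry code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-in model of the assumed caching behaviour; not the real Spring Boot classes.
class CachedGaugeModel {
    // Stand-in for the MetricRegistry, i.e. what would be exposed over JMX
    final Map<String, double[]> registry = new ConcurrentHashMap<>();
    // Assumed private cache of gauge holders kept inside the service
    private final Map<String, double[]> cache = new ConcurrentHashMap<>();

    void submit(String name, double value) {
        double[] holder = cache.get(name);
        if (holder == null) {
            // First time this name is seen: create the holder and register it
            holder = new double[] { value };
            cache.put(name, holder);
            registry.put(name, holder);
            return;
        }
        // Name already cached: only the value is updated, nothing is re-registered
        holder[0] = value;
    }

    public static void main(String[] args) {
        CachedGaugeModel service = new CachedGaugeModel();
        service.submit("gauge.orders", 1.0);
        service.registry.remove("gauge.orders"); // the application removes the metric after reading it
        service.submit("gauge.orders", 2.0);     // the cache still knows the name
        System.out.println(service.registry.containsKey("gauge.orders"));
    }
}
```

If the real service behaves like this model, removing the gauge directly from the MetricRegistry leaves the cached entry behind, which would match what I observed and would explain why registering through MetricRegistry#register() works.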
I set the maximum number of active connections to 1 using the code below:
ConnectionPool initializePool(DataSource dataSource) {
    if (!(org.apache.tomcat.jdbc.pool.DataSource.class.isInstance(dataSource))) {
        return null;
    }
    org.apache.tomcat.jdbc.pool.DataSource tomcatDataSource = (org.apache.tomcat.jdbc.pool.DataSource) dataSource;
    final String poolName = tomcatDataSource.getName();
    try {
        ConnectionPool pool = tomcatDataSource.createPool();
        pool.getPoolProperties().setMaxActive(1);
        pool.getPoolProperties().setInitialSize(1);
        pool.getPoolProperties().setTestOnBorrow(true);
        return pool;
    } catch (SQLException e) {
        logger.info(String.format(" !--! creation of pool failed for %s", poolName), e);
    }
    return null;
}
Now, using threads, I have opened a number of concurrent connections to the DB. I also printed out the current number of active connections using the code listed below:
System.out.println("Current Active Connections = " + ((org.apache.tomcat.jdbc.pool.DataSource) datasource).getActive());
System.out.println("Max Active Connections = " + ((org.apache.tomcat.jdbc.pool.DataSource) datasource).getMaxActive());
I see results similar to the ones below. The number of active connections is displayed as more than 1, but I want to restrict the maximum number of active connections to 1. Are there any other parameters that I need to set?
Current Active Connections = 9
Max Active Connections = 1
EDIT: However when I try with 15 or 20 as max active , it is always limited to 15 or 20 respectively.
Try setting maxIdle and minIdle as well:
ConnectionPool pool = tomcatDataSource.createPool();
pool.getPoolProperties().setMaxActive(1);
pool.getPoolProperties().setInitialSize(1);
pool.getPoolProperties().setMaxIdle(1);
pool.getPoolProperties().setMinIdle(1);
pool.getPoolProperties().setTestOnBorrow(true);
I am using the P4Java library in my build.gradle file to sync a large zip file (>200MB) residing in a remote Perforce repository, but I am encountering a "java.net.SocketTimeoutException: Read timed out" error either during the sync process or (mostly) while deleting the temporary client created for the sync operation. I am following http://razgulyaev.blogspot.in/2011/08/p4-java-api-how-to-work-with-temporary.html for working with temporary clients using the P4Java API.
I tried increasing the socket read timeout from the default 30 sec, as suggested in http://answers.perforce.com/articles/KB/8044, and also introducing a sleep, but neither approach solved the problem. Probing the server with getServerInfo() right before performing the sync or delete operation results in a successful connection check. Can someone please point me to where I should look for answers?
Thank you.
Providing the code snippet:
void perforceSync(String srcPath, String destPath, String server) {
    // Generating the file(s) to sync-up
    String[] pathUnderDepot = [
        srcPath + "*"
    ]
    // Increasing timeout from default 30 sec to 60 sec
    Properties defaultProps = new Properties()
    defaultProps.put(PropertyDefs.PROG_NAME_KEY, "CustomBuildApp")
    defaultProps.put(PropertyDefs.PROG_VERSION_KEY, "tv_1.0")
    defaultProps.put(RpcPropertyDefs.RPC_SOCKET_SO_TIMEOUT_NICK, "60000")
    // Instantiating the server
    IOptionsServer p4Server = ServerFactory.getOptionsServer("p4java://" + server, defaultProps)
    p4Server.connect()
    // Authorizing
    p4Server.setUserName("perforceUserName")
    p4Server.login("perforcePassword")
    // Just check if connected successfully
    IServerInfo serverInfo = p4Server.getServerInfo()
    println 'Server info: ' + serverInfo.getServerLicense()
    // Creating new client
    IClient tempClient = new Client()
    // Setting up the name and the root folder
    tempClient.setName("tempClient" + UUID.randomUUID().toString().replace("-", ""))
    tempClient.setRoot(destPath)
    tempClient.setServer(p4Server)
    // Setting the client as the current one for the server
    p4Server.setCurrentClient(tempClient)
    // Creating Client View entry
    ClientViewMapping tempMappingEntry = new ClientViewMapping()
    // Setting up the mapping properties
    tempMappingEntry.setLeft(srcPath + "...")
    tempMappingEntry.setRight("//" + tempClient.getName() + "/...")
    tempMappingEntry.setType(EntryType.INCLUDE)
    // Creating Client view
    ClientView tempClientView = new ClientView()
    // Attaching client view entry to client view
    tempClientView.addEntry(tempMappingEntry)
    tempClient.setClientView(tempClientView)
    // Registering the new client on the server
    println p4Server.createClient(tempClient)
    // Forming the FileSpec collection to be synced-up; declared before the try
    // so that the retry in the catch block can also see it
    List<IFileSpec> fileSpecsSet = FileSpecBuilder.makeFileSpecList(pathUnderDepot)
    // Surrounding the underlying block with try as we want some action
    // (namely client removing) to be performed in any way
    try {
        // Syncing up the client
        println "Syncing..."
        tempClient.sync(FileSpecBuilder.getValidFileSpecs(fileSpecsSet), true, false, false, false)
    }
    catch (Exception e) {
        println "Sync failed. Trying again..."
        sleep(60 * 1000)
        tempClient.sync(FileSpecBuilder.getValidFileSpecs(fileSpecsSet), true, false, false, false)
    }
    finally {
        println "Done syncing."
        try {
            p4Server.connect()
            IServerInfo serverInfo2 = p4Server.getServerInfo()
            println '\nServer info: ' + serverInfo2.getServerLicense()
            // Removing the temporary client from the server
            println p4Server.deleteClient(tempClient.getName(), false)
        }
        catch(Exception e) {
            println 'Ignoring exception caught while deleting tempClient!'
            /*sleep(60 * 1000)
            p4Server.connect()
            IServerInfo serverInfo3 = p4Server.getServerInfo()
            println '\nServer info: ' + serverInfo3.getServerLicense()
            sleep(60 * 1000)
            println p4Server.deleteClient(tempClient.getName(), false)*/
        }
    }
}
One unusual thing I observed while deleting tempClient was that the client was actually deleted, yet "java.net.SocketTimeoutException: Read timed out" was still thrown, which is why I ended up commenting out the second delete attempt in the second catch block.
Which version of P4Java are you using? Have you tried this with the newest P4Java? There have been notable fixes dealing with RPC sockets from version 2013.2 onward, as can be seen in the release notes:
http://www.perforce.com/perforce/doc.current/user/p4javanotes.txt
Here are some variations you can try at the point where your code increases the timeout and instantiates the server:
a] Have you tried passing the props in their own argument? For example:
Properties prop = new Properties();
prop.setProperty(RpcPropertyDefs.RPC_SOCKET_SO_TIMEOUT_NICK, "300000");
UsageOptions uop = new UsageOptions(prop);
server = ServerFactory.getOptionsServer(ServerFactory.DEFAULT_PROTOCOL_NAME + "://" + serverPort, prop, uop);
Or something like the following:
IOptionsServer p4Server = ServerFactory.getOptionsServer("p4java://" + server, defaultProps)
You can also set the timeout to "0" to give it no timeout.
b]
props.put(RpcPropertyDefs.RPC_SOCKET_SO_TIMEOUT_NICK, "60000");
props.put(RpcPropertyDefs.RPC_SOCKET_POOL_SIZE_NICK, "5");
c]
Properties props = System.getProperties();
props.put(RpcPropertyDefs.RPC_SOCKET_SO_TIMEOUT_NICK, "60000");
IOptionsServer server =
ServerFactory.getOptionsServer("p4java://perforce:1666", props, null);
d] In case you have Eclipse users using our P4Eclipse plugin, the property can be set in the plugin preferences (Team->Perforce->Advanced) under the Custom P4Java Properties.
"sockSoTimeout" : "3000000"
REFERENCES
Class RpcPropertyDefs
http://perforce.com/perforce/doc.current/manuals/p4java-javadoc/com/perforce/p4java/impl/mapbased/rpc/RpcPropertyDefs.html
P4Eclipse or P4Java: SocketTimeoutException: Read timed out
http://answers.perforce.com/articles/KB/8044