Exception With Jedis Pipeline - java

When I use Jedis as in the code below:
public class JedisTest extends Sync {
private static final String _SET_KEY_1 = "test1";
private static final String _SET_KEY_2 = "test2";
public void process() throws SQLException {
Set<String> appSet = getAllUserableAppkey();
final ShardedJedis jedis = RedisHelper.getJedis();
final ShardedJedisPipeline pipeline = jedis.pipelined();
for (String key : appSet) {
Set<String> result = jedis.smembers(_SET_KEY_1);
Set<String> result2 = jedis.smembers(_SET_KEY_2);
String rangName = String.format("%s::%s", "test", key);
for (int i = 0; i < 10; i++) {
pipeline.sadd(rangName, String.valueOf(i));
public Set<String> getAllUserableAppkey() {
public static void main(String[] args) throws Exception {
JedisTest jedisTest = new JedisTest();
try {
} catch (SQLException e) {
It throws the following exception:
Exception in thread "main" java.lang.ClassCastException: java.lang.Long cannot be cast to java.util.List
at redis.clients.jedis.Connection.getBinaryMultiBulkReply(Connection.java:224)
at redis.clients.jedis.Connection.getMultiBulkReply(Connection.java:217)
at redis.clients.jedis.Jedis.smembers(Jedis.java:1055)
at redis.clients.jedis.ShardedJedis.smembers(ShardedJedis.java:339)
at com.snda.sync.impl.test.JedisTest.process(JedisTest.java:29)
at com.snda.sync.impl.test.JedisTest.main(JedisTest.java:58)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
I can make it work by changing the code like this:
public void process() throws SQLException {
Set<String> appSet = getAllUserableAppkey();
final ShardedJedis jedis = RedisHelper.getJedis();
for (String key : appSet) {
final ShardedJedisPipeline pipeline = jedis.pipelined();
Set<String> result = jedis.smembers(_SET_KEY_1);
Set<String> result2 = jedis.smembers(_SET_KEY_2);
//log.warn("result1 :{},result2:{}",result,result2);
String rangName = String.format("%s::%s", "test", key);
for (int i = 0; i < 10; i++) {
pipeline.sadd(rangName, String.valueOf(i));
But I don't know why that exception is thrown. Does pipeline.sadd() conflict with jedis.smembers()?
Thanks for any answers!
I am using the latest Jedis version, 2.7.2.

You should not use the Jedis instance directly while it is pipelined.
A pipeline uses the Jedis instance's stream (it does not open a new one). A normal operation reads its response immediately, whereas a pipeline reads all of its responses later, so mixing the two usages leaves Jedis matching responses to the wrong requests.
P -- Pipelined / N -- Normal
Request --> P(1) P(2) N(3) N(4) P(5)
Redis Response --> P(1) P(2) N(3) N(4) P(5)
Matched request-response --> N(1 : should be 3) N(2 : should be 4) P(3 : should be 1) P(4 : should be 2) P(5)
As you can see, responses can easily end up matched to the wrong requests.
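The mismatch in the diagram above can be reproduced with a toy model. This is a minimal sketch, not Jedis code: `ToyConnection` and its methods are hypothetical stand-ins that only mimic one shared, ordered reply stream.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class SharedStreamDemo {
    /** Toy connection: the server answers every request, in order, on one stream. */
    static class ToyConnection {
        private final Deque<String> stream = new ArrayDeque<>();
        private int pendingPipelined = 0;

        /** Pipelined call: send now, read the reply later in sync(). */
        void pipelined(String command) {
            stream.addLast("reply(" + command + ")");
            pendingPipelined++;
        }

        /** Normal call: send, then immediately read the NEXT reply on the stream. */
        String normal(String command) {
            stream.addLast("reply(" + command + ")");
            return stream.pollFirst();
        }

        /** Reads one reply for every pipelined command that is still outstanding. */
        List<String> sync() {
            List<String> replies = new ArrayList<>();
            for (; pendingPipelined > 0; pendingPipelined--) {
                replies.add(stream.pollFirst());
            }
            return replies;
        }
    }

    public static void main(String[] args) {
        ToyConnection mixed = new ToyConnection();
        mixed.pipelined("sadd");
        // The normal call picks up the reply that belongs to the pipelined sadd.
        System.out.println(mixed.normal("smembers"));
        System.out.println(mixed.sync());
    }
}
```

In the toy model, the normal `smembers` call receives the reply meant for `sadd`, which corresponds to the `Long cannot be cast to List` error in the question. Draining the pipeline with a sync before issuing normal commands, or sending the reads through the pipeline as well, keeps replies and requests aligned.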


Apache HBase MapReduce job takes too much time while reading the datastore

I have set up an Apache HBase, Nutch and Hadoop cluster with 3 workers and 1 master, and I have crawled about 30 million documents. I have written my own HBase MapReduce job that reads the crawled data and adjusts some scores based on custom logic.
For this purpose, I group the documents of the same domain, compute their effective bytes and derive a score. Later, in the reducer, I assign that score to each URL of that domain (via a cache). This portion of the job takes too much time, about 16 hours. The code snippet follows:
for ( int index = 0; index < Cache.size(); index++) {
String Orig_key = Cache.get(index);
float doc_score = log10;
WebPage page = datastore.get(Orig_key);
if ( page == null ) {
if (mark) {
page.getMarkers().put( Queue, Q1);
context.write(Orig_key, page);
If I remove the statement that reads the document from the datastore, the job finishes in only 2 to 3 hours. That is why I think the statement WebPage page = datastore.get(Orig_key); is causing the problem. Is that right?
If so, what is the best approach? The Cache object is simply a list that contains the URLs of the same domain.
public class DomainAnalysisJob implements Tool {
public static final Logger LOG = LoggerFactory
private static final Collection<WebPage.Field> FIELDS = new HashSet<WebPage.Field>();
private Configuration conf;
protected static final Utf8 URL_ORIG_KEY = new Utf8("doc_orig_id");
protected static final Utf8 DOC_DUMMY_MARKER = new Utf8("doc_marker");
protected static final Utf8 DUMMY_KEY = new Utf8("doc_id");
protected static final Utf8 DOMAIN_DUMMY_MARKER = new Utf8("domain_marker");
protected static final Utf8 LINK_MARKER = new Utf8("link");
protected static final Utf8 Queue = new Utf8("q");
private static URLNormalizers urlNormalizers;
private static URLFilters filters;
private static int maxURL_Length;
static {
* Maps each WebPage to a host key.
public static class Mapper extends GoraMapper<String, WebPage, Text, WebPage> {
protected void setup(Context context) throws IOException ,InterruptedException {
Configuration conf = context.getConfiguration();
urlNormalizers = new URLNormalizers(context.getConfiguration(), URLNormalizers.SCOPE_DEFAULT);
filters = new URLFilters(context.getConfiguration());
maxURL_Length = conf.getInt("url.characters.max.length", 2000);
protected void map(String key, WebPage page, Context context)
throws IOException, InterruptedException {
String reversedHost = null;
if (page == null) {
if ( key.length() > maxURL_Length ) {
String url = null;
try {
url = TableUtil.unreverseUrl(key);
url = urlNormalizers.normalize(url, URLNormalizers.SCOPE_DEFAULT);
url = filters.filter(url); // filter the url
} catch (Exception e) {
LOG.warn("Skipping " + key + ":" + e);
if ( url == null) {
context.getCounter("DomainAnalysis", "FilteredURL").increment(1);
try {
reversedHost = TableUtil.getReversedHost(key.toString());
catch (Exception e) {
page.getMarkers().put( URL_ORIG_KEY, new Utf8(key) );
context.write( new Text(reversedHost), page );
public DomainAnalysisJob() {
public DomainAnalysisJob(Configuration conf) {
public Configuration getConf() {
return conf;
public void setConf(Configuration conf) {
this.conf = conf;
public void updateDomains(boolean buildLinkDb, int numTasks) throws Exception {
NutchJob job = NutchJob.getInstance(getConf(), "rankDomain-update");
job.getConfiguration().setInt("mapreduce.task.timeout", 1800000);
if ( numTasks < 1) {
"mapred.map.tasks", job.getNumReduceTasks()));
} else {
ScoringFilters scoringFilters = new ScoringFilters(getConf());
HashSet<WebPage.Field> fields = new HashSet<WebPage.Field>(FIELDS);
StorageUtils.initMapperJob(job, fields, Text.class, WebPage.class,
StorageUtils.initReducerJob(job, DomainAnalysisReducer.class);
public int run(String[] args) throws Exception {
boolean linkDb = false;
int numTasks = -1;
for (int i = 0; i < args.length; i++) {
if ("-rankDomain".equals(args[i])) {
linkDb = true;
} else if ("-crawlId".equals(args[i])) {
getConf().set(Nutch.CRAWL_ID_KEY, args[++i]);
} else if ("-numTasks".equals(args[i]) ) {
numTasks = Integer.parseInt(args[++i]);
else {
throw new IllegalArgumentException("unrecognized arg " + args[i]
+ " usage: updatedomain -crawlId <crawlId> [-numTasks N]" );
LOG.info("Updating DomainRank:");
updateDomains(linkDb, numTasks);
return 0;
public static void main(String[] args) throws Exception {
final int res = ToolRunner.run(NutchConfiguration.create(),
new DomainAnalysisJob(), args);
public class DomainAnalysisReducer extends
GoraReducer<Text, WebPage, String, WebPage> {
public static final Logger LOG = DomainAnalysisJob.LOG;
public DataStore<String, WebPage> datastore;
protected static float q1_ur_threshold = 500.0f;
protected static float q1_ur_docCount = 50;
public static final Utf8 Queue = new Utf8("q"); // Markers for Q1 and Q2
public static final Utf8 Q1 = new Utf8("q1");
public static final Utf8 Q2 = new Utf8("q2");
protected void setup(Context context) throws IOException,
InterruptedException {
Configuration conf = context.getConfiguration();
try {
datastore = StorageUtils.createWebStore(conf, String.class, WebPage.class);
catch (ClassNotFoundException e) {
throw new IOException(e);
q1_ur_threshold = conf.getFloat("domain.queue.threshold.bytes", 500.0f);
q1_ur_docCount = conf.getInt("domain.queue.doc.count", 50);
LOG.info("Conf updated: Queue-bytes-threshold = " + q1_ur_threshold + " Queue-doc-threshold: " + q1_ur_docCount);
protected void cleanup(Context context) throws IOException, InterruptedException {
protected void reduce(Text key, Iterable<WebPage> values, Context context)
throws IOException, InterruptedException {
ArrayList<String> Cache = new ArrayList<String>();
int doc_counter = 0;
int total_ur_bytes = 0;
for ( WebPage page : values ) {
// cache
String orig_key = page.getMarkers().get( DomainAnalysisJob.URL_ORIG_KEY ).toString();
// do not consider those doc's that are not fetched or link URLs
if ( page.getStatus() == CrawlStatus.STATUS_UNFETCHED ) {
int ur_score_int = 0;
int doc_ur_bytes = 0;
int doc_total_bytes = 0;
String ur_score_str = "0";
String langInfo_str = null;
// read page and find its Urdu score
langInfo_str = TableUtil.toString(page.getLangInfo());
if (langInfo_str == null) {
ur_score_str = TableUtil.toString(page.getUrduScore());
ur_score_int = Integer.parseInt(ur_score_str);
doc_total_bytes = Integer.parseInt( langInfo_str.split("&")[0] );
doc_ur_bytes = ( doc_total_bytes * ur_score_int) / 100; //Formula to find ur percentage
total_ur_bytes += doc_ur_bytes;
float avg_bytes = 0;
float log10 = 0;
if ( doc_counter > 0 && total_ur_bytes > 0) {
avg_bytes = (float) total_ur_bytes/doc_counter;
log10 = (float) Math.log10(avg_bytes);
log10 = (Math.round(log10 * 100000f)/100000f);
context.getCounter("DomainAnalysis", "DomainCount").increment(1);
// if average bytes and doc count, are more than threshold then mark as q1
boolean mark = false;
if ( avg_bytes >= q1_ur_threshold && doc_counter >= q1_ur_docCount ) {
mark = true;
for ( int index = 0; index < Cache.size(); index++) {
String Orig_key = Cache.get(index);
float doc_score = log10;
WebPage page = datastore.get(Orig_key);
if ( page == null ) {
if (mark) {
page.getMarkers().put( Queue, Q1);
context.write(Orig_key, page);
In my testing and debugging, I found that the statement WebPage page = datastore.get(Orig_key); is the main cause of the long run time. The job took about 16 hours to complete, but when I replaced this statement with WebPage page = WebPage.newBuilder().build(); the time dropped to 6 hours. Is this due to I/O?
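The per-domain score computed in the reducer above can be isolated into a small pure function, which makes it possible to check the arithmetic separately from the datastore reads. A minimal sketch, assuming the same formula as the reducer (average bytes per document, log10, rounded to five decimals); the class and method names are hypothetical.

```java
public class DomainScore {
    /** Returns log10 of the average bytes per document, rounded to 5 decimals. */
    static float score(int totalUrBytes, int docCount) {
        if (docCount <= 0 || totalUrBytes <= 0) {
            return 0f; // same default the reducer uses when nothing was counted
        }
        float avgBytes = (float) totalUrBytes / docCount;
        float log10 = (float) Math.log10(avgBytes);
        return Math.round(log10 * 100000f) / 100000f;
    }

    /** Mirrors the reducer's Q1 test: both the average and the count must reach their thresholds. */
    static boolean isQ1(int totalUrBytes, int docCount, float bytesThreshold, float docThreshold) {
        if (docCount <= 0) {
            return false;
        }
        float avgBytes = (float) totalUrBytes / docCount;
        return avgBytes >= bytesThreshold && docCount >= docThreshold;
    }

    public static void main(String[] args) {
        System.out.println(score(1000, 10));
        System.out.println(isQ1(50000, 60, 500.0f, 50f));
    }
}
```

The thresholds 500.0 and 50 in `main` are the defaults the reducer reads from its configuration.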

Get row on Spark in map Call

I am trying to aggregate data from a file in HDFS.
I need to enrich that data with values from a specific table in HBase,
but I get this exception:
org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
at org.apache.spark.SparkContext.clean(SparkContext.scala:1623)
at org.apache.spark.rdd.RDD.map(RDD.scala:286)
at org.apache.spark.api.java.JavaRDDLike$class.mapToPair(JavaRDDLike.scala:113)
at org.apache.spark.api.java.AbstractJavaRDDLike.mapToPair(JavaRDDLike.scala:46)
at ......
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:577)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:174)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:197)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.NotSerializableException: org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation
Serialization stack:
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:38)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:80)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:164)
I know that the problem occurs when we try to access HBase inside the map function.
My question is: how can I complete my RDDs with the values contained in the HBase table?
For example:
The files in HDFS are CSV:
In HBase we have data associated with the name toto.
I need to compute the sum of Number1 and Number2 (that is the easiest part)
and aggregate it with the data in the table.
For example:
The key for the reducer will be tata, which is retrieved by getting the row key toto from the HBase table.
Any suggestions?
Finally a colleague did it, thanks to your advice.
This is the code of the map function that aggregates a file with data from the HBase table:
private final Logger LOGGER = LoggerFactory.getLogger(AbtractGetSDMapFunction.class);
* Namespace name
public static final String NAMESPACE = "NameSpace";
private static final String ID = "id";
private Connection connection = null;
private static final String LINEID = "l";
private static final String CHANGE_LINE_ID = "clid";
private static final String CHANGE_LINE_DATE = "cld";
private String constClientPortHBase;
private String constQuorumHBase;
private int constTimeOutHBase;
private String constZnodeHBase;
public void initConnection() {
Configuration conf = HBaseConfiguration.create();
conf.setInt("timeout", constTimeOutHBase);
conf.set("hbase.zookeeper.quorum", constQuorumHBase);
conf.set("hbase.zookeeper.property.clientPort", constClientPortHBase);
conf.set("zookeeper.znode.parent", constZnodeHBase);
try {
connection = HConnectionManager.createConnection(conf);
} catch (Exception e) {
LOGGER.error("Error in the configuration of the connection with HBase.", e);
public Tuple2<String, myInput> call(String row) throws Exception {
//this is where you need to init the connection for hbase to avoid serialization problem
....do your work
State state = getCurrentState(myInput.getKey());
....do your work
public AbtractGetSDMapFunction( String constClientPortHBase, String constQuorumHBase, String constZnodeHBase, int constTimeOutHBase) {
this.constClientPortHBase = constClientPortHBase;
this.constQuorumHBase = constQuorumHBase;
this.constZnodeHBase = constZnodeHBase;
this.constTimeOutHBase = constTimeOutHBase;
* Table Name
public static final String TABLE_NAME = "Table";
public State getCurrentState(String key) throws TechnicalException {
LOGGER.debug("start key {}", key);
String buildRowKey = buildRowKey(key);
State currentState = new State();
String columnFamily = State.getColumnFamily();
if (!StringUtils.isEmpty(buildRowKey) && null != columnFamily) {
try {
Get scan = new Get(Bytes.toBytes(buildRowKey));
addColumnsToScan(scan, columnFamily, ID);
Result result = getTable().get(scan);
currentState.setCurrentId(getLong(result, columnFamily, ID));
} catch (IOException ex) {
throw new TechnicalException(ex);
LOGGER.debug("end ");
return currentState;
private Table getTable() throws IOException, TechnicalException {
Connection connection = getConnection();
// Table retrieve
if (connection != null) {
Table table = connection.getTable(TableName.valueOf(NAMESPACE, TABLE_NAME));
return table;
} else {
throw new TechnicalException("Connection to Hbase not available");
private Long getLong(Result result, String columnFamily, String qualifier) {
Long toLong = null;
if (null != columnFamily && null != qualifier) {
byte[] value = result.getValue(Bytes.toBytes(columnFamily), Bytes.toBytes(qualifier));
toLong = (value != null ? Bytes.toLong(value) : null);
return toLong;
private String getString(Result result, String columnFamily, String qualifier) {
String toString = null;
if (null != columnFamily && null != qualifier) {
byte[] value = result.getValue(Bytes.toBytes(columnFamily), Bytes.toBytes(qualifier));
toString = (value != null ? Bytes.toString(value) : null);
return toString;
public Connection getConnection() {
return connection;
public void setConnection(Connection connection) {
this.connection = connection;
private void addColumnsToScan(Get scan, String family, String qualifier) {
if (org.apache.commons.lang.StringUtils.isNotEmpty(family) && org.apache.commons.lang.StringUtils.isNotEmpty(qualifier)) {
scan.addColumn(Bytes.toBytes(family), Bytes.toBytes(qualifier));
private String buildRowKey(String key) throws TechnicalException {
StringBuilder rowKeyBuilder = new StringBuilder();
return rowKeyBuilder.toString();
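Why opening the connection inside call() avoids the "Task not serializable" error can be reproduced with plain Java serialization. This is a minimal sketch under stated assumptions: `FakeConnection` is a hypothetical stand-in for the HBase connection, and marking the field `transient` is an extra safeguard that the code above does not use (there the field is simply still null when the function is serialized).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class LazyConnectionDemo {
    /** Hypothetical stand-in for an HBase connection: deliberately NOT Serializable. */
    static class FakeConnection {
        private final String quorum;
        FakeConnection(String quorum) { this.quorum = quorum; }
        String get(String rowKey) { return quorum + ":" + rowKey; }
    }

    /** Holds a live connection as a field, so serializing the function fails. */
    static class EagerFunction implements Serializable {
        private final FakeConnection connection = new FakeConnection("zk-host");
        String call(String row) { return connection.get(row); }
    }

    /** Carries only serializable settings and opens the connection on first use. */
    static class LazyFunction implements Serializable {
        private final String quorum;
        private transient FakeConnection connection; // never shipped with the closure

        LazyFunction(String quorum) { this.quorum = quorum; }

        String call(String row) {
            if (connection == null) {
                connection = new FakeConnection(quorum); // created where the task runs
            }
            return connection.get(row);
        }
    }

    /** Returns true if Java serialization accepts the object. */
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The function object travels with only strings and ints (quorum, port, znode, timeout), and each task builds its own connection after deserialization.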

How to reuse redis(JRedis) pool connection in Java

I am using Redis (3.2.100) for Windows to cache my database data in Java. This is my Redis init code:
private static Dictionary<Integer, JedisPool> pools = new Hashtable();
static {
JedisPoolConfig config = new JedisPoolConfig();
for (int i = 0; i < 16; i++) {
JedisPool item = new JedisPool(config, "", 6379,10*1000);
pools.put(i, item);
This is the cache code:
public static String get(String key, Integer db) {
JedisPool poolItem = pools.get(db);
Jedis jredis = poolItem.getResource();
String result = jredis.get(key);
return result;
The problem is that after the program has run for a while, the getResource method throws:
redis.clients.jedis.exceptions.JedisException: Could not get a resource from the pool
So how can I reuse or close the connection? Using this command, I found that the number of clients has reached the maximum:
D:\Program Files\Redis>redis-cli.exe info clients
# Clients
How can I fix it?
Remember to close the Redis connection. Modify the function like this:
public static String get(String key, Integer db) {
JedisPool poolItem = pools.get(db);
Jedis jredis = null;
String result = null;
try {
jredis = poolItem.getResource();
result = jredis.get(key);
} catch (Exception e) {
log.error("get value error", e);
} finally {
if (jredis != null) {
jredis.close(); // hands the connection back to the pool
return result;
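The effect of not returning a connection can be shown with a toy pool. This is a minimal sketch, not Jedis code: `TinyPool` and `Resource` are hypothetical stand-ins, and the pool size of 2 is chosen only to make exhaustion quick. With a Jedis version where `Jedis` implements `Closeable`, the same try-with-resources shape should apply to `poolItem.getResource()`.

```java
import java.util.concurrent.Semaphore;

public class PoolDemo {
    /** Toy pool that hands out at most maxTotal resources at a time. */
    static class TinyPool {
        private final Semaphore permits;

        TinyPool(int maxTotal) { permits = new Semaphore(maxTotal); }

        Resource getResource() {
            if (!permits.tryAcquire()) {
                throw new IllegalStateException("Could not get a resource from the pool");
            }
            return new Resource(this);
        }

        int available() { return permits.availablePermits(); }
    }

    /** Toy connection; close() gives its slot back to the pool. */
    static class Resource implements AutoCloseable {
        private final TinyPool pool;
        private boolean closed;

        Resource(TinyPool pool) { this.pool = pool; }

        String get(String key) { return "value-of-" + key; }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                pool.permits.release();
            }
        }
    }

    /** Leaks one pool slot per call, like the original get(). */
    static String leakyGet(TinyPool pool, String key) {
        return pool.getResource().get(key);
    }

    /** try-with-resources closes the resource even if get() throws. */
    static String safeGet(TinyPool pool, String key) {
        try (Resource r = pool.getResource()) {
            return r.get(key);
        }
    }
}
```

With a pool of size 2, `safeGet` can be called any number of times, while the third `leakyGet` fails because both slots are still checked out.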

Call function failing in JACOB

I am trying to call a function named "set" through a COM interface,
but I get this exception:
Exception in thread "main" com.jacob.com.ComFailException: Can't map name to dispid: set
even though calling the same function from MATLAB works fine.
This is the function I am using:
public void setAttribute(String attribute, int value) {
Variant[] vars = new Variant[3];
vars[0] = new Variant("AttValue");
vars[1] = new Variant(attribute);
vars[2] = new Variant(value);
signalGroup.invoke("set", vars);
public void setIndexedAttribute(String attribute, Variant value) {
Variant[] indecies = new Variant[1];
indecies[0] = new Variant(attribute);
setProperty(signalGroup, "AttValue", indecies, value);
public void setProperty(Dispatch activex, String attributeName, Variant[] indecies,
Variant value) {
Variant[] variants = new Variant[indecies.length + 1];
for (int i = 0; i < indecies.length; i++) {
variants[i] = indecies[i];
variants[variants.length - 1] = value;
Dispatch.invoke(activex, attributeName, Dispatch.Put, variants,new int[variants.length]);
Example usage:
sg_1.setIndexedAttribute("State", new Variant(10));

How to Mock repository Items in ATG

I am trying to create a mock class for a droplet. I am able to mock the repository calls and req.getParameter, but I need help mocking the list of repository items returned from the repository. Below is the sample code.
for (final RepositoryItem item : skuList) {
final String skuId = (String) item.getPropertyValue("id");
final String skuType = (String) item.getPropertyValue("skuType");
if (this.isLoggingDebug()) {
this.logDebug("skuType [ " + skuType + " ]");
final String skuActive = (String) item.getPropertyValue("isActive");
if (EJSD.equalsIgnoreCase(skuType) && (skuActive.equals("1"))) {
skuCode = (String) item.getPropertyValue(ESTConstants.SKU_MISC1);
} else if (PJPROMIS.equalsIgnoreCase(skuType) && skuId.contains("PP") && (skuActive.equals("1"))) {
String tmp = "";
if (skuId.lastIndexOf("-") > -1) {
tmp = skuId.substring(skuId.lastIndexOf("-") + 1);
tmp = tmp.toUpperCase();
if (this.getDefaultDisplayNameMap() != null) {
String val = this.getDefaultDisplayNameMap().get(tmp);
if (StringUtils.isNotEmpty(val)) {
displayNameMap.put(skuId, val);
} else {
val = (String) item.getPropertyValue("displayName");
displayNameMap.put(skuId, val);
} else {
final String val = (String) item.getPropertyValue("displayName");
displayNameMap.put(skuId, val);
There are a multitude of ways to 'mock' the list. I've been doing it this way as I feel it is more readable.
@Mock private RepositoryItem skuMockA;
@Mock private RepositoryItem skuMockB;
List<RepositoryItem> skuList = new ArrayList<RepositoryItem>();
@BeforeMethod(groups = { "unit" })
public void setup() throws Exception {
testObj = new YourDropletName();
skuList = new ArrayList<RepositoryItem>();
So when you then call this within a test it will be something like this:
So the key really is that you are not mocking the List but instead the contents of the List.
Creating a mock with Mockito is a good option,
but here is a different way of mocking the repository item.
Create a common implementation of RepositoryItem, say MockRepositoryItemImpl, in your test package:
public class MockRepositoryItemImpl implements RepositoryItem {
private Map<String, Object> properties;
properties = new HashMap<>();
public Object getPropertyValue(String propertyName){
return properties.get(propertyName);
public void setPropertyValue(String propertyName, Object propertyValue){
properties.put(propertyName, propertyValue);
Use this implementation to create the mock object in your test case.
MockRepositoryItemImpl mockSKU = new MockRepositoryItemImpl(); // declared as the concrete type so setPropertyValue is visible
mockSKU.setPropertyValue("id", "sku0001");
mockSKU.setPropertyValue("displayName", "Mock SKU");
mockSKU.setPropertyValue("skuType", "Type1");
mockSKU.setPropertyValue("isActive", "1");
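The idea of faking the list contents rather than the list itself can be tried without ATG on the classpath. This is a minimal sketch under stated assumptions: `Item` is a hypothetical one-method stand-in for RepositoryItem, and `displayNames` is a simplified version of the droplet loop that only keeps active SKUs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FakeItemDemo {
    /** Hypothetical stand-in for the part of RepositoryItem the droplet reads. */
    interface Item {
        Object getPropertyValue(String propertyName);
    }

    /** Map-backed fake: every property is just an entry in a HashMap. */
    static class MapBackedItem implements Item {
        private final Map<String, Object> properties = new HashMap<>();

        MapBackedItem with(String propertyName, Object value) {
            properties.put(propertyName, value);
            return this; // allows chained setup in tests
        }

        @Override
        public Object getPropertyValue(String propertyName) {
            return properties.get(propertyName);
        }
    }

    /** Simplified code under test: maps SKU id to display name for active SKUs. */
    static Map<String, String> displayNames(List<Item> skuList) {
        Map<String, String> result = new HashMap<>();
        for (Item item : skuList) {
            if ("1".equals(item.getPropertyValue("isActive"))) {
                result.put((String) item.getPropertyValue("id"),
                           (String) item.getPropertyValue("displayName"));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Item> skuList = new ArrayList<>();
        skuList.add(new MapBackedItem().with("id", "sku0001")
                .with("displayName", "Mock SKU").with("isActive", "1"));
        skuList.add(new MapBackedItem().with("id", "sku0002")
                .with("displayName", "Inactive SKU").with("isActive", "0"));
        System.out.println(displayNames(skuList));
    }
}
```

The list is an ordinary ArrayList; only its elements are fakes, so the loop under test iterates exactly as it would over real items.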
