Spark MapwithState stateSnapshots Not scaling (Java)

I am using Spark to consume a Kafka stream carrying status messages from IoT devices, which send regular health updates and report the state of their various sensors. My Spark application listens to a single topic using a Spark direct stream. I need to trigger different alarms based on the sensor states of each device. However, when I add more IoT devices sending data through Kafka, Spark does not scale, even after adding more machines and increasing the number of executors. Below is a stripped-down version of my Spark application, with the notification-triggering part removed, which shows the same performance issue.
// Method for updating the device state; it is just an in-memory object which tracks the device state.
private static Optional<DeviceState> trackDeviceState(Time time, String key, Optional<ProtoBufEventUpdate> updateOpt,
        State<DeviceState> state) {
    int batchTime = toSeconds(time);
    ProtoBufEventUpdate eventUpdate = (updateOpt == null)?null:updateOpt.orNull();
    if(eventUpdate!=null)
        eventUpdate.setBatchTime(ProximityUtil.toSeconds(time));
    if (state!=null && state.exists()) {
        DeviceState deviceState = state.get();
        if (state.isTimingOut()) {
            deviceState.markEnd(batchTime);
        }
        if (updateOpt.isPresent()) {
            deviceState = DeviceState.updatedDeviceState(deviceState, eventUpdate);
            state.update(deviceState);
        }
    } else if (updateOpt.isPresent()) {
        DeviceState deviceState = DeviceState.newDeviceState(eventUpdate);
        state.update(deviceState);
        return Optional.of(deviceState);
    }
    return Optional.absent();
}
SparkConf conf = new SparkConf()
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .set("spark.streaming.receiver.writeAheadLog.enable", "true")
    .set("spark.rpc.netty.dispatcher.numThreads", String.valueOf(Runtime.getRuntime().availableProcessors()));
JavaStreamingContext context= new JavaStreamingContext(conf, Durations.seconds(10));
Map<String, String> kafkaParams = new HashMap<String, String>();
kafkaParams.put("zookeeper.connect", "192.168.60.20:2181,192.168.60.21:2181,192.168.60.22:2181");
kafkaParams.put("metadata.broker.list", "192.168.60.20:9092,192.168.60.21:9092,192.168.60.22:9092");
kafkaParams.put("group.id", "spark_iot");
HashSet<String> topics=new HashSet<>();
topics.add("iottopic");
JavaPairInputDStream<String, ProtoBufEventUpdate> inputStream = KafkaUtils.
    createDirectStream(context, String.class, ProtoBufEventUpdate.class, KafkaKryoCodec.class, ProtoBufEventUpdateCodec.class, kafkaParams, topics);
JavaPairDStream<String, ProtoBufEventUpdate> updatesStream = inputStream.mapPartitionsToPair(t -> {
    List<Tuple2<String, ProtoBufEventUpdate>> eventupdateList=new ArrayList<>();
    t.forEachRemaining(tuple->{
        String key=tuple._1;
        ProtoBufEventUpdate eventUpdate =tuple._2;
        Util.mergeStateFromStats(eventUpdate);
        eventupdateList.add(new Tuple2<String, ProtoBufEventUpdate>(key,eventUpdate));
    });
    return eventupdateList.iterator();
});
JavaMapWithStateDStream<String, ProtoBufEventUpdate, DeviceState, DeviceState> devceMapStream = null;
devceMapStream=updatesStream.mapWithState(StateSpec.function(Engine::trackDeviceState)
    .numPartitions(20)
    .timeout(Durations.seconds(1800)));
devceMapStream.checkpoint(new Duration(batchDuration*1000));
JavaPairDStream<String, DeviceState> deviceStateStream = devceMapStream
    .stateSnapshots()
    .cache();
deviceStateStream.foreachRDD(rdd->{
    if(rdd != null && !rdd.isEmpty()){
        rdd.foreachPartition(tuple->{
            tuple.forEachRemaining(t->{
                SparkExecutorLog.error("Engine::getUpdates Tuple data "+ t._2);
            });
        });
    }
});
Even when the load increases, I don't see CPU usage increasing on the executor instances; most of the time the executors' CPUs are idle. I tried increasing the number of Kafka partitions (Kafka currently has 72 partitions; I also tried reducing it to 36) and increasing the number of devceMapStream partitions, but I couldn't see any performance improvement. The code is not spending any time on I/O.
I am running our Spark application with 6 executor instances on Amazon EMR (YARN), with each machine having 4 cores and 32 GB of RAM. I tried increasing the number of executor instances to 9 and then to 15, but didn't see any performance improvement. I also experimented with spark.default.parallelism, setting it to 20, 36, 72 and 100; 20 gave the best performance (maybe the number of cores per executor has some influence on this).
spark-submit --deploy-mode cluster --class com.ajay.Engine --supervise --driver-memory 5G --driver-cores 8 --executor-memory 4G --executor-cores 4 --conf spark.default.parallelism=20 --num-executors 36 --conf spark.dynamicAllocation.enabled=false --conf spark.streaming.unpersist=false --conf spark.eventLog.enabled=false --conf spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties --conf spark.executor.extraJavaOptions=-XX:+HeapDumpOnOutOfMemoryError --conf spark.executor.extraJavaOptions=-XX:HeapDumpPath=/tmp --conf spark.executor.extraJavaOptions=-XX:+UseG1GC --conf spark.driver.extraJavaOptions=-XX:+UseG1GC --conf spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties s3://test/engine.jar
At present Spark struggles to complete a batch within 10 seconds (I have also tried other batch durations such as 5, 10 and 15 seconds). It takes 15-23 seconds to complete one batch at an input rate of 1600 records per second, with about 17000 records in each batch. I need to use the state stream to check the state of the devices periodically, to see whether a device is raising any alarms or any sensors have stopped responding. How can I improve the performance of my Spark application?

mapWithState does the following:
"applying a function to every key-value element of this stream, while maintaining some state data for each unique key"
(as per its docs: PairDStreamFunctions#mapWithState)
This also means that within every batch all the elements with the same key are processed in sequence. Because the function in StateSpec is arbitrary and provided by us, with no state combiners defined, the work can't be parallelized any further, no matter how you partition the data before mapWithState. In other words, when keys are diverse, parallelization will be good; but if all the RDD elements share just a few unique keys, the whole batch will be processed mostly by a number of cores equal to the number of unique keys.
In your case, keys come from Kafka:
t.forEachRemaining(tuple->{
String key=tuple._1;
and your code snippet doesn't show how they are generated.
From my experience, this is what may be happening: part of each batch is processed quickly by multiple cores, while another part, in which a substantial share of the elements have the same key, takes more time and delays the batch. That's why you see just a few tasks running most of the time while other executors are under-loaded.
To see if that's the case, check your key distribution: how many elements are there for each key? Could it be that just a couple of keys account for 20% of all the elements? If so, you have these options:
change your keys generation algorithm
artificially split problematic keys before mapWithState and combine state snapshots later to make sense for the whole
cap the number of elements with the same key to be processed in each batch, either ignore elements after first N in every batch, or send them elsewhere, into some "can't process in time" Kafka stream and deal with them separately
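The key-distribution check and the "artificially split problematic keys" option can be sketched in plain Java, independent of Spark. This is only an illustration under assumptions: the class and method names (HotKeySalting, hotKeys, saltKey, unsaltKey), the '#' separator and the bucket count are all hypothetical, and it presumes that device keys never contain '#' and that per-bucket state can be merged back into one state per device.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class HotKeySalting {

    // Returns the keys whose share of the batch is strictly above `threshold` (0.2 = 20%).
    static Set<String> hotKeys(List<String> batchKeys, double threshold) {
        Map<String, Integer> counts = new HashMap<>();
        for (String key : batchKeys) {
            counts.merge(key, 1, Integer::sum);
        }
        Set<String> hot = new HashSet<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if ((double) e.getValue() / batchKeys.size() > threshold) {
                hot.add(e.getKey());
            }
        }
        return hot;
    }

    // Spreads a hot key over `buckets` sub-keys; all other keys pass through unchanged.
    // Assumption: the original keys never contain '#'.
    static String saltKey(String key, long sequence, int buckets, Set<String> hot) {
        if (!hot.contains(key)) {
            return key;
        }
        return key + "#" + (sequence % buckets);
    }

    // Recovers the original key so that per-bucket snapshots can be merged downstream.
    static String unsaltKey(String salted) {
        int idx = salted.lastIndexOf('#');
        return idx < 0 ? salted : salted.substring(0, idx);
    }

    public static void main(String[] args) {
        List<String> batchKeys = Arrays.asList("dev1", "dev1", "dev1", "dev2", "dev3");
        Set<String> hot = hotKeys(batchKeys, 0.2);
        List<String> salted = new ArrayList<>();
        long sequence = 0;
        for (String key : batchKeys) {
            salted.add(saltKey(key, sequence++, 4, hot));
        }
        System.out.println("hot keys: " + hot);
        System.out.println("salted keys: " + salted);
    }
}
```

In a streaming job the salting would be applied to the keys before mapWithState, and the state snapshots of the sub-keys would be grouped by unsaltKey afterwards. Whether that is acceptable depends on whether the device state can be combined from partial states.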

Related

Getting OutOfMemoryError GC overhead limit exceeded when collecting a dataset in Java Spark

I have some data of approximate size 250MB.
I want to load the data and convert it to a map
class MyData implements Serializable {
    private Map<String, List<SomeObject>> myMap;
    MyData(SparkSession sparkSession, String inputPath) {
        Dataset<Klass> ds = sparkSession.read().json(inputPath).as(Encoders.bean(Klass.class));
        // Method references take no parentheses
        myMap = ds.collectAsList().stream().collect(Collectors.toMap(
            Klass::getField1,
            Klass::getField2
        ));
    }
}
This is my spark execution configuration
--master yarn --deploy-mode cluster --executor-cores 2 --num-executors 200 --executor-memory 10240M
Is it not a good practice to convert dataset to a list/map ? Or is it a configuration issue ? Or a code issue ?

It looks like you're collecting all the data in the Dataset into the Spark driver with:
myMap = ds.collectAsList()...
Therefore you should set the driver memory with --driver-memory 2G on the command line (aka your "spark execution configuration").
The default value for this parameter is 1G, which is likely not quite enough for 250M of raw data.

Cassandra, Java and MANY Async request : is this good?

I'm developing a Java application with Cassandra. Here is my table:
id | registration | name
1 1 xxx
1 2 xxx
1 3 xxx
2 1 xxx
2 2 xxx
... ... ...
... ... ...
100,000 34 xxx
My table has a very large number of rows (more than 50,000,000). I have a list of String ids, myListIds, to iterate over. I could use:
SELECT * FROM table WHERE id IN (1,7,18, 34,...,)
//imagine more than 10,000,000 numbers in 'IN'
But this is a bad pattern. So instead I'm using async requests this way:
// mapFutures : key = id & value = future holding the data from Cassandra
Map<String, ResultSetFuture> mapFutures = new HashMap<>();
for (String id : myListIds)
{
    ResultSetFuture resultSetFuture = session.executeAsync(statement.bind(id));
    mapFutures.put(id, resultSetFuture);
}
Then I will process my data with the getUninterruptibly() method.
Here is my problem: I'm making maybe more than 10,000,000 Cassandra requests (one request for each 'id'), and I'm putting all these results inside a Map.
Can this cause a heap memory error? What's the best way to deal with that?
Thank you

Note: your question is "is this a good design pattern".
If you are having to perform 10,000,000 cassandra data requests then you have structured your data incorrectly. Ultimately you should design your database from the ground up so that you only ever have to perform 1-2 fetches.
Now, granted, if you have 5000 cassandra nodes this might not be a huge problem(it probably still is) but it still reeks of bad database design. I think the solution is to take a look at your schema.

I see the following problems with your code:
An overloaded Cassandra cluster: it won't be able to process so many async requests, and your requests will fail with NoHostAvailableException.
An overloaded Cassandra driver: your client app will fail with I/O exceptions, because the system will not be able to process so many async requests (see the details about connection tuning: https://docs.datastax.com/en/developer/java-driver/3.1/manual/pooling/).
And yes, memory issues are possible, depending on the data size.
A possible solution is to limit the number of async requests and process the data in chunks (e.g. see this answer).
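One way to limit the number of in-flight async requests is a semaphore, sketched below in plain Java. This is a hedged illustration, not driver code: CompletableFuture stands in for the driver's ResultSetFuture, and BoundedAsyncFetch, fetchAll, asyncQuery and handler are hypothetical names. Each result is passed to a handler as soon as it arrives instead of being stored in a map, which also addresses the memory concern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;
import java.util.function.Function;

class BoundedAsyncFetch {

    // Issues one async query per id, never allowing more than `limit` in flight.
    // Returns the peak number of concurrent requests observed (for demonstration only).
    static <R> int fetchAll(List<String> ids, int limit,
                            Function<String, CompletableFuture<R>> asyncQuery,
                            BiConsumer<String, R> handler) {
        Semaphore permits = new Semaphore(limit);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (String id : ids) {
            permits.acquireUninterruptibly();   // blocks while `limit` requests are outstanding
            peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
            CompletableFuture<R> future;
            try {
                future = asyncQuery.apply(id);
            } catch (RuntimeException e) {
                // The request was never started, so give the permit back.
                inFlight.decrementAndGet();
                permits.release();
                throw e;
            }
            future.thenAccept(row -> handler.accept(id, row))
                  .whenComplete((ignored, error) -> {
                      // Runs on success and on failure, so a permit is never leaked.
                      inFlight.decrementAndGet();
                      permits.release();
                  });
        }
        permits.acquireUninterruptibly(limit);  // wait for the remaining requests to finish
        return peak.get();
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            ids.add("id" + i);
        }
        AtomicInteger handled = new AtomicInteger();
        // The lambda below is a stand-in for something like session.executeAsync(statement.bind(id)).
        int peak = BoundedAsyncFetch.<String>fetchAll(ids, 16,
                id -> CompletableFuture.supplyAsync(() -> "row-" + id, pool),
                (id, row) -> handled.incrementAndGet());
        pool.shutdown();
        System.out.println("handled=" + handled.get() + ", peak in flight=" + peak);
    }
}
```

The limit (16 here) is arbitrary; a sensible value depends on the connection pool settings described in the pooling documentation linked above.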

Apache Spark with Spark JobServer crash after some hours

I'm using Apache Spark 2.0.2 together with Apache JobServer 0.7.0.
I know this is not best practice, but it is a first step. My server has 52 GB of RAM and 6 CPU cores, CentOS 7 x64, Java(TM) SE Runtime Environment (build 1.7.0_79-b15), and it runs the following applications with the specified memory configuration:
JBoss AS 7 (6 Gb)
PDI Pentaho 6.0 (12 Gb)
MySQL (20 Gb)
Apache Spark 2.0.2 (8 Gb)
I start it and everything works as expected, and it keeps working for several hours.
I have a jar with 2 implemented jobs, which extend the My_Job class.
public class VIQ_SparkJob extends JavaSparkJob {
protected SparkSession sparkSession;
protected String TENANT_ID;
@Override
public Object runJob(SparkContext jsc, Config jobConfig) {
sparkSession = SparkSession.builder()
.sparkContext(jsc)
.enableHiveSupport()
.config("spark.sql.warehouse.dir", "file:///value_iq/spark-warehouse/")
.config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
.config("spark.kryoserializer.buffer", "8m")
.getOrCreate();
Class<?>[] classes = new Class<?>[2];
classes[0] = UsersCube.class;
classes[1] = ImportCSVFiles.class;
sparkSession.sparkContext().conf().registerKryoClasses(classes);
TENANT_ID = jobConfig.getString("tenant_id");//parameters.getString("tenant_id");
return true;
}
@Override
public SparkJobValidation validate(SparkContext sc, Config config) {
return SparkJobValid$.MODULE$; //To change body of generated methods, choose Tools | Templates.
}
}
This job imports the data from some .csv files and stores it as Parquet files partitioned by tenant. There are 2 entities: users, which occupies 674 MB on disk as Parquet files, and user_processed, with 323 MB.
@Override
public Object runJob(SparkContext jsc, Config jobConfig) {
super.runJob(jsc, jobConfig);
String entity= jobConfig.getString("entity");
Dataset<Row> ds = sparkSession.read()
.option("header", "true")
.option("inferschema", true)
.csv(csvPath);
ds.withColumn("tenant_id", ds.col("tenant_id").cast("int"))
.write()
.mode(SaveMode.Append)
.partitionBy(JavaConversions.asScalaBuffer(asList("tenant_id")))
.parquet("/value_iq/spark-warehouse/"+entity);
return null;
}
This one is to query the parquet files:
@Override
public Object runJob(SparkContext jsc, Config jobConfig) {
super.runJob(jsc, jobConfig); //To change body of generated methods, choose Tools | Templates.
String query = jobConfig.getString("query");
Dataset<Row> lookup_values = getDataFrameFromMySQL("value_iq", "lookup_values").filter(new Column("lookup_domain").equalTo("customer_type"));
Dataset<Row> users = getDataFrameFromParket(USERS + "/tenant_id=" + TENANT_ID);
Dataset<Row> user_profiles = getDataFrameFromParket(USER_PROCESSED + "/tenant_id=" + TENANT_ID);
lookup_values.createOrReplaceTempView("lookup_values");
users.createOrReplaceTempView("users");
user_profiles.createOrReplaceTempView("user_processed");
//CREATING VIEWS DE AND EN
sparkSession
.sql(Here I join the 3 datasets)
.coalesce(200)
.createOrReplaceTempView("cube_users_v_de");
List<String> list = sparkSession.sql(query).limit(1000).toJSON().takeAsList(1000);
String result = "[";
for (int i = 0; i < list.size(); i++) {
result += (i == 0 ? "" : ",") + list.get(i);
}
result += "]";
return result;
}
Every day I run the first job, saving some CSV files to Parquet. During the day I execute some queries with the second one. But after some hours it crashes because it runs out of memory. This is the error log:
k.memory.TaskMemoryManager [] [akka://JobServer/user/context-supervisor/application_analytics] - Failed to allocate a page (8388608 bytes), try again.
WARN .netty.NettyRpcEndpointRef [] [] - Error sending message [message = Heartbeat(0,[Lscala.Tuple2;@18c54652,BlockManagerId(0, 157.97.107.42, 55223))] in 1 attempt
I have the master and one worker in this server. My spark-defaults.conf
spark.debug.maxToStringFields 256
spark.shuffle.service.enabled true
spark.shuffle.file.buffer 64k
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.minExecutors 1
spark.dynamicAllocation.maxExecutors 5
spark.rdd.compress true
This is my Spark Jobserver settings.sh
DEPLOY_HOSTS="myserver.com"
APP_USER=root
APP_GROUP=root
JMX_PORT=5051
INSTALL_DIR=/bin/spark/job-server-master
LOG_DIR=/var/log/job-server-master
PIDFILE=spark-jobserver.pid
JOBSERVER_MEMORY=2G
SPARK_VERSION=2.0.2
MAX_DIRECT_MEMORY=2G
SPARK_CONF_DIR=$SPARK_HOME/conf
SCALA_VERSION=2.11.8
I create my context with the following curl:
curl -k --basic --user 'user:password' -d "" 'https://localhost:4810/contexts/application?num-cpu-cores=5&memory-per-node=8G'
And the Spark driver uses 2 GB.
The created application looks like
ExecutorID  Worker                                     Cores  Memory  State    Logs
0           worker-20170203084218-157.97.107.42-50199  5      8192    RUNNING  stdout stderr
Those are my executors
Executor ID  Address              Status  RDD Blocks  Storage Memory     Disk Used  Cores
driver       157.97.107.42:55222  Active  0           0.0 B / 1018.9 MB  0.0 B      0
0            157.97.107.42:55223  Active  0           0.0 B / 4.1 GB     0.0 B      5
I have a process that checks the memory used per process, and the peak was 8468 MB.
There are 4 processes related to Spark:
The master process. It starts with 1 GB of memory assigned; I don't know where this configuration comes from, but it seems to be enough. It uses only 0.4 GB at peak.
The worker process. Same memory usage as the master.
The driver process, which has 2 GB configured.
The context, which has 8 GB configured.
The following table shows how the memory used by the driver and the context behaves. After getting java.lang.OutOfMemoryError: Java heap space, the context fails, but the driver accepts another context, so it remains fine.
system_user | RAM(Mb) | entry_date
--------------+----------+---------------------
spark.driver 2472.11 2017-02-07 10:10:18 //Till here everything was fine
spark.context 5470.19 2017-02-07 10:10:18 //it was running for more thant 48 hours
spark.driver 2472.11 2017-02-07 10:11:18 //Then I execute three big concurrent queries
spark.context 0.00 2017-02-07 10:11:18 //and I get java.lang.OutOfMemoryError: Java heap space
//in $LOG_FOLDER/job-server-master/server_startup.log
# I've check and the context was still present in the jobserver but unresponding.
#in spark the application was killed
spark.driver 2472.11 2017-02-07 10:16:18 //Here I have deleted and created again
spark.context 105.20 2017-02-07 10:16:18
spark.driver 2577.30 2017-02-07 10:19:18 //Here I execute the three big
spark.context 3734.46 2017-02-07 10:19:18 //concurrent queries again.
spark.driver 2577.30 2017-02-07 10:20:18 //Here after the queries where
spark.context 5154.60 2017-02-07 10:20:18 //executed. No memory issue.
I have 2 questions:
1- When I check the Spark UI, why does my driver, which has 2 GB configured, only show 1 GB, and likewise executor 0, which only shows 4.4 GB? Where does the rest of the configured memory go? When I check the processes on the system, however, the driver does use 2 GB.
2- If I have enough memory on the server, why am I running out of memory?

How could a distributed queue-like-thing be implemented on top of a RDBMS or NOSQL datastore or other messaging system (e.g., rabbitmq)?

From the wouldn't-it-be-cool-if category of questions ...
By "queue-like-thing" I mean supports the following operations:
append(entry:Entry) - add entry to tail of queue
take(): Entry - remove entry from head of queue and return it
promote(entry_id) - move the entry one position closer to the head; the entry that currently occupies that position is moved to the promoted entry's old position
demote(entry_id) - the opposite of promote(entry_id)
Optional operations would be something like:
promote(entry_id, amount) - like promote(entry_id) except you specify the number of positions
demote(entry_id, amount) - opposite of promote(entry_id, amount)
of course, if we allow amount to be positive or negative, we can consolidate the promote/demote methods with a single move(entry_id, amount) method
It would be ideal if the following operations could be performed on the queue in a distributed fashion (multiple clients interacting with the queue):
queue = ...
queue.append( a )
queue.append( b )
queue.append( c )
print queue
"a b c"
queue.promote( b.id )
print queue
"b a c"
queue.demote( a.id )
print queue
"b c a"
x = queue.take()
print x
"b"
print queue
"c a"
Are there any data stores that are particularly apt for this use case? The queue should always be in a consistent state even if multiple users are modifying the queue simultaneously.
If it weren't for the promote/demote/move requirement, there wouldn't be much of a problem.
Edit:
Bonus points if there are Java and/or Python libraries to accomplish the task outlined above.
Solution should scale extremely well.

Redis supports lists and ordered sets: http://redis.io/topics/data-types#lists
It also supports transactions and publish/subscribe messaging. So, yes, I would say this can be easily done on redis.
Update: In fact, about 80% of it has been done many times: http://www.google.co.uk/search?q=python+redis+queue
Several of those hits could be upgraded to add what you want. You would have to use transactions to implement the promote/demote operations.
It might be possible to use lua on the server side to create that functionality, rather than having it in client code. Alternatively, you could create a thin wrapper around redis on the server, that implements just the operations you want.
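To make concrete what such a transaction (or server-side script) would have to do atomically, here is a minimal in-memory Java sketch of the promote/demote semantics from the question. It is not Redis code: synchronized stands in for the transaction, and SwapQueue and its methods are hypothetical names. Lookups and moves are O(n), which is fine for illustrating the semantics but not for a large queue.

```java
import java.util.ArrayList;
import java.util.List;

class SwapQueue<T> {
    private final List<T> items = new ArrayList<>();

    // Adds the entry at the tail.
    public synchronized void append(T entry) {
        items.add(entry);
    }

    // Removes and returns the head, or null when the queue is empty.
    public synchronized T take() {
        return items.isEmpty() ? null : items.remove(0);
    }

    // Moves the entry `amount` positions toward the head (positive) or the tail
    // (negative), clamped to the ends of the queue. Returns false if the entry is absent.
    public synchronized boolean move(T entry, int amount) {
        int from = items.indexOf(entry);
        if (from < 0) {
            return false;
        }
        int to = Math.max(0, Math.min(items.size() - 1, from - amount));
        items.remove(from);
        items.add(to, entry);
        return true;
    }

    public boolean promote(T entry) {
        return move(entry, 1);
    }

    public boolean demote(T entry) {
        return move(entry, -1);
    }

    // Returns a copy of the current order, head first.
    public synchronized List<T> snapshot() {
        return new ArrayList<>(items);
    }

    public static void main(String[] args) {
        SwapQueue<String> queue = new SwapQueue<>();
        queue.append("a");
        queue.append("b");
        queue.append("c");
        queue.promote("b");
        System.out.println(queue.snapshot());
        queue.demote("a");
        System.out.println(queue.snapshot());
        System.out.println(queue.take());
        System.out.println(queue.snapshot());
    }
}
```

The main method replays the sequence from the question (append a, b, c; promote b; demote a; take). The read-modify-write inside move is the part that has to run as one atomic unit when several clients share the queue.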

Python: "Batteries Included"
Rather than looking to a data store like RabbitMQ, Redis, or an RDBMS, I think python and a couple libraries have more than enough to solve this problem. Some may complain that this do-it-yourself approach is re-inventing the wheel but I prefer running a hundred lines of python code over managing another data store.
Implementing a Priority Queue
The operations that you define: append, take, promote, and demote, describe a priority queue. Unfortunately python doesn't have a built-in priority queue data type. But it does have a heap library called heapq and priority queues are often implemented as heaps. Here's my implementation of a priority queue meeting your requirements:
import heapq

class PQueue:
    """
    Implements a priority queue with append, take, promote, and demote
    operations.
    """
    def __init__(self):
        """
        Initialize empty priority queue.
        self.toll is max(priority) and max(rowid) in the queue
        self.heap is the heap maintained for take command
        self.rows is a mapping from rowid to items
        self.pris is a mapping from priority to items
        """
        self.toll = 0
        self.heap = list()
        self.rows = dict()
        self.pris = dict()
    def append(self, value):
        """
        Append value to our priority queue.
        The new value is added with lowest priority as an item. Items are
        threeple lists consisting of [priority, rowid, value]. The rowid
        is used by the promote/demote commands.
        Returns the new rowid corresponding to the new item.
        """
        self.toll += 1
        item = [self.toll, self.toll, value]
        self.heap.append(item)
        self.rows[self.toll] = item
        self.pris[self.toll] = item
        return self.toll
    def take(self):
        """
        Take the highest priority item out of the queue.
        Returns the value of the item.
        """
        item = heapq.heappop(self.heap)
        del self.pris[item[0]]
        del self.rows[item[1]]
        return item[2]
    def promote(self, rowid):
        """
        Promote an item in the queue.
        The promoted item swaps position with the next highest item.
        Returns the number of affected rows.
        """
        if rowid not in self.rows: return 0
        item = self.rows[rowid]
        item_pri, item_row, item_val = item
        next = item_pri - 1
        if next in self.pris:
            iota = self.pris[next]
            iota_pri, iota_row, iota_val = iota
            iota[1], iota[2] = item_row, item_val
            item[1], item[2] = iota_row, iota_val
            self.rows[item_row] = iota
            self.rows[iota_row] = item
            return 2
        return 0
The demote command is nearly identical to the promote command so I'll omit it for brevity. Note that this depends only on python's lists, dicts, and heapq library.
Serving our Priority Queue
Now with the PQueue data type, we'd like to allow distributed interactions with an instance. A great library for this is gevent. Though gevent is relatively new and still beta, it's wonderfully fast and well tested. With gevent, we can setup a socket server listening on localhost:4040 pretty easily. Here's my server code:
pqueue = PQueue()
def pqueue_server(sock, addr):
    text = sock.recv(1024)
    cmds = text.split(' ')
    if cmds[0] == 'append':
        result = pqueue.append(cmds[1])
    elif cmds[0] == 'take':
        result = pqueue.take()
    elif cmds[0] == 'promote':
        result = pqueue.promote(int(cmds[1]))
    elif cmds[0] == 'demote':
        result = pqueue.demote(int(cmds[1]))
    else:
        result = ''
    sock.sendall(str(result))
    print 'Request:', text, '; Response:', str(result)
if args.listen:
    server = StreamServer(('127.0.0.1', 4040), pqueue_server)
    print 'Starting pqueue server on port 4040...'
    server.serve_forever()
Before that runs in production, you'll of course want to do some better error/buffer handling. But it'll work just fine for rapid prototyping. Notice that this doesn't require any locking around the pqueue object. Gevent doesn't actually run code in parallel, it just gives that impression. The drawback is that more cores won't help, but the benefit is lock-free code.
Don't get me wrong, the gevent StreamServer will process multiple requests at the same time. But it switches between answering requests through cooperative multitasking. This means you have to yield the coroutine's time slice. While gevent's socket I/O functions are designed to yield, our pqueue implementation is not. Fortunately, the pqueue completes its tasks really quickly.
A Client Too
While prototyping, I found it useful to have a client as well. It took some googling to write a client so I'll share that code too:
if args.client:
    while True:
        msg = raw_input('> ')
        sock = gsocket.socket(gsocket.AF_INET, gsocket.SOCK_STREAM)
        sock.connect(('127.0.0.1', 4040))
        sock.sendall(msg)
        text = sock.recv(1024)
        sock.close()
        print text
To use the new data store, first start the server and then start the client. At the client prompt you ought to be able to do:
> append one
1
> append two
2
> append three
3
> promote 2
2
> promote 2
0
> take
two
Scaling Extremely Well
Given your thinking about a data store, it seems you're really concerned with throughput and durability. But "scale extremely well" doesn't quantify your needs. So I decided to benchmark the above with a test function. Here's the test function:
def test():
    import time
    import urllib2
    import subprocess
    import random
    random = random.Random(0)
    from progressbar import ProgressBar, Percentage, Bar, ETA
    widgets = [Percentage(), Bar(), ETA()]
    def make_name():
        alphabet = 'abcdefghijklmnopqrstuvwxyz'
        return ''.join(random.choice(alphabet)
                       for rpt in xrange(random.randrange(3, 20)))
    def make_request(cmds):
        sock = gsocket.socket(gsocket.AF_INET, gsocket.SOCK_STREAM)
        sock.connect(('127.0.0.1', 4040))
        sock.sendall(cmds)
        text = sock.recv(1024)
        sock.close()
    print 'Starting server and waiting 3 seconds.'
    subprocess.call('start cmd.exe /c python.exe queue_thing_gevent.py -l',
                    shell=True)
    time.sleep(3)
    tests = []
    def wrap_test(name, limit=10000):
        def wrap(func):
            def wrapped():
                progress = ProgressBar(widgets=widgets)
                for rpt in progress(xrange(limit)):
                    func()
                secs = progress.seconds_elapsed
                print '{0} {1} records in {2:.3f} s at {3:.3f} r/s'.format(
                    name, limit, secs, limit / secs)
            tests.append(wrapped)
            return wrapped
        return wrap
    def direct_append():
        name = make_name()
        pqueue.append(name)
    count = 1000000
    @wrap_test('Loaded', count)
    def direct_append_test(): direct_append()
    def append():
        name = make_name()
        make_request('append ' + name)
    @wrap_test('Appended')
    def append_test(): append()
    ...
    print 'Running speed tests.'
    for tst in tests: tst()
Benchmark Results
I ran 6 tests against the server running on my laptop. I think the results scale extremely well. Here's the output:
Starting server and waiting 3 seconds.
Running speed tests.
100%|############################################################|Time: 0:00:21
Loaded 1000000 records in 21.770 s at 45934.773 r/s
100%|############################################################|Time: 0:00:06
Appended 10000 records in 6.825 s at 1465.201 r/s
100%|############################################################|Time: 0:00:06
Promoted 10000 records in 6.270 s at 1594.896 r/s
100%|############################################################|Time: 0:00:05
Demoted 10000 records in 5.686 s at 1758.706 r/s
100%|############################################################|Time: 0:00:05
Took 10000 records in 5.950 s at 1680.672 r/s
100%|############################################################|Time: 0:00:07
Mixed load processed 10000 records in 7.410 s at 1349.528 r/s
Final Frontier: Durability
Finally, durability is the only problem I didn't completely prototype. But I don't think it's that hard either. In our priority queue, the heap (list) of items has all the information we need to persist the data type to disk. Since, with gevent, we can also spawn functions in a multi-processing way, I imagined using a function like this:
def save_heap(heap, toll):
    name = 'heap-{0}.txt'.format(toll)
    with open(name, 'w') as temp:
        for val in heap:
            temp.write(str(val))
            gevent.sleep(0)
and adding a save function to our priority queue:
def save(self):
    heap_copy = tuple(self.heap)
    toll = self.toll
    gevent.spawn(save_heap, heap_copy, toll)
You could now copy the Redis model of forking and writing the data store to disk every few minutes. If you need even greater durability, then couple the above with a system that logs commands to disk. Together, those are the AOF and RDB persistence methods that Redis uses.
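The command-logging idea can be sketched as follows, in Java rather than the Python used above. It is a simplified illustration under assumptions: LoggedQueue is a hypothetical name, only append and take are supported, values must not contain newlines, and a real implementation would also need to force writes to disk and truncate the log after each snapshot.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

class LoggedQueue {
    private final Deque<String> items = new ArrayDeque<>();
    private final Path log;

    // Rebuilds the in-memory queue by replaying every command found in the log.
    LoggedQueue(Path log) {
        this.log = log;
        if (Files.exists(log)) {
            try {
                for (String line : Files.readAllLines(log, StandardCharsets.UTF_8)) {
                    apply(line);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    synchronized void append(String value) {
        record("append " + value);
    }

    synchronized String take() {
        return record("take");
    }

    synchronized List<String> snapshot() {
        return new ArrayList<>(items);
    }

    // Appends the command to the log first, then applies it to the in-memory state.
    private String record(String command) {
        try {
            Files.write(log, Collections.singletonList(command), StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return apply(command);
    }

    private String apply(String command) {
        if (command.startsWith("append ")) {
            items.addLast(command.substring("append ".length()));
            return null;
        }
        if (command.equals("take")) {
            return items.pollFirst();
        }
        throw new IllegalArgumentException("unknown command: " + command);
    }

    public static void main(String[] args) {
        Path log = Paths.get(System.getProperty("java.io.tmpdir"),
                "logged-queue-" + System.nanoTime() + ".log");
        LoggedQueue first = new LoggedQueue(log);
        first.append("one");
        first.append("two");
        first.take();
        // A second instance simulates a restart: it recovers the state from the log alone.
        LoggedQueue recovered = new LoggedQueue(log);
        System.out.println(recovered.snapshot());
        log.toFile().delete();
    }
}
```

Replaying the whole log gets slower as it grows, which is why pairing it with periodic snapshots, as described above, keeps recovery time bounded.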

Websphere MQ can do almost all of this.
The promote/demote is almost possible, by removing the message from the queue and putting it back on with a higher/lower priority, or, by using the "CORRELID" as a sequence number.

What's wrong with RabbitMQ? It sounds exactly like what you need.
We extensively use Redis as well in our Production environment, but it doesn't have some of the functionality Queues usually have, like setting a task as complete, or re-sending the task if it isn't completed in some TTL. It does, on the other hand, have other features a Queue doesn't have, like it is a generic storage, and it is REALLY fast.

Use Redisson: it implements the familiar List, Queue, BlockingQueue and Deque Java interfaces in a distributed fashion on top of Redis. Example with a Deque:
Redisson redisson = Redisson.create();
RDeque<SomeObject> queue = redisson.getDeque("anyDeque");
queue.addFirst(new SomeObject());
queue.addLast(new SomeObject());
SomeObject obj = queue.removeFirst();
SomeObject someObj = queue.removeLast();
redisson.shutdown();
Other samples:
https://github.com/mrniko/redisson/wiki/7.-distributed-collections/#77-list
https://github.com/mrniko/redisson/wiki/7.-distributed-collections/#78-queue
https://github.com/mrniko/redisson/wiki/7.-distributed-collections/#710-blocking-queue

If you for some reason decide to use an SQL database as a backend, I would not use MySQL as it requires polling (well and would not use it for lots of other reasons), but PostgreSQL supports LISTEN/NOTIFY for signalling other clients so that they do not have to poll for changes. However, it signals all listening clients at once, so you still would require a mechanism for choosing a winning listener.
As a sidenote I am not sure if a promote/demote mechanism would be useful; it would be better to schedule the jobs appropriately while inserting...

Inconsistent counter values between replicas in Cassandra

I've got a 3 machine Cassandra cluster using rack unaware placements strategy with a replication factor of 2.
The column family is defined as follows:
create column family UserGeneralStats with comparator = UTF8Type and default_validation_class = CounterColumnType;
Unfortunately after a few days of production use I got some inconsistent values for the counters:
Query on replica 1:
[default@StatsKeyspace] list UserGeneralStats['5261666978': '5261666978'];
Using default limit of 100
-------------------
RowKey: 5261666978
=> (counter=bandwidth, value=96545030198)
=> (counter=downloads, value=1013)
=> (counter=previews, value=10304)
Query on replica 2:
[default@StatsKeyspace] list UserGeneralStats['5261666978': '5261666978'];
Using default limit of 100
-------------------
RowKey: 5261666978
=> (counter=bandwidth, value=9140386229)
=> (counter=downloads, value=339)
=> (counter=previews, value=1321)
As the standard read repair mechanism doesn't seem to repair the values, I tried to force an anti-entropy repair using nodetool repair. It didn't have any effect on the counter values.
Data inspection showed that the lower counter values are the correct ones, so I suspect that either Cassandra (or Hector, which I use as the API to call Cassandra from Java) retried some increments.
Any ideas how to repair the data and possibly prevent the situation from happening again?

If neither RR nor repair fixes it, it's probably a bug.
Please upgrade to 0.8.3 (out today) and verify it's still present in that version, then you can file a ticket at https://issues.apache.org/jira/browse/CASSANDRA.
