I am trying to parallelize a simple Spark program that processes HBase data. No matter how many resources I throw at the system, it cannot go below 11 minutes.
// Get HBase RDD
JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD =
        jsc.newAPIHadoopRDD(
                conf,
                TableInputFormat.class,
                ImmutableBytesWritable.class,
                Result.class
        );

long count = hBaseRDD.count();
The problem is that my program is only as fast as its slowest task (the largest bar in the Spark UI event timeline).
I also found that ZooKeeper takes a long time before the session is closed:
18/05/19 17:26:55 INFO zookeeper.ClientCnxn: Session establishment complete on server <IP>:2181, sessionid = 0x163662b64eb046d, negotiated timeout = 40000
18/05/19 17:38:00 INFO zookeeper.ZooKeeper: Session: 0x163662b64eb046d closed
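For reference, a minimal sketch of the two knobs commonly tried here, assuming the bottleneck is the number of input partitions: newAPIHadoopRDD creates one partition per HBase region, so a small number of regions caps the scan parallelism no matter how many executors are added. The table name and values below are placeholders, not from my setup.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;

// Sketch only: "my_table" and the numeric values are placeholders.
Configuration conf = HBaseConfiguration.create();
conf.set(TableInputFormat.INPUT_TABLE, "my_table");
// Fetch more rows per RPC so each region scan spends less time round-tripping.
conf.set("hbase.mapreduce.scan.cachedrows", "500");

JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD =
        jsc.newAPIHadoopRDD(conf, TableInputFormat.class,
                ImmutableBytesWritable.class, Result.class);

// The count itself can run at most one task per region; repartitioning only
// helps stages after the scan. Pre-splitting the table into more regions is
// what raises the parallelism of the scan itself.
long count = hBaseRDD.count();
JavaPairRDD<ImmutableBytesWritable, Result> spread = hBaseRDD.repartition(64);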
I'm using Kafka + Redis in my project.
I get messages from Kafka, process them, and save them to Redis, but after running for some time my code starts giving the error below:
io.smallrye.mutiny.TimeoutException
at io.smallrye.mutiny.operators.uni.UniBlockingAwait.await(UniBlockingAwait.java:64)
at io.smallrye.mutiny.groups.UniAwait.atMost(UniAwait.java:65)
at io.quarkus.redis.client.runtime.RedisClientImpl.await(RedisClientImpl.java:1046)
at io.quarkus.redis.client.runtime.RedisClientImpl.set(RedisClientImpl.java:687)
at worker.redis.process.implementation.ProductImplementation.refresh(ProductImplementation.java:34)
at worker.redis.Worker.refresh(Worker.java:51)
at kafka.InComingProductKafkaConsume.lambda$consume$0(InComingProductKafkaConsume.java:38)
at business.core.hpithead.ThreadStart.doRun(ThreadStart.java:34)
at business.core.hpithead.core.NotifyingThread.run(NotifyingThread.java:27)
at java.base/java.lang.Thread.run(Thread.java:833)
The record 51761 from topic-partition 'mer-outgoing-master-item-0' has waited for 153 seconds to be acknowledged. This waiting time is greater than the configured threshold (150000 ms). At the moment 2 messages from this partition are awaiting acknowledgement. The last committed offset for this partition was 51760. This error is due to a potential issue in the application which does not acknowledged the records in a timely fashion. The connector cannot commit as a record processing has not completed.
@Incoming("mer_product")
@Blocking
public CompletionStage<Void> consume2(Message<String> payload) {
    var objectDto = configThreadLocal.mapper.readValue(payload.getPayload(),
            new TypeReference<KafkaPayload<ItemKO>>() {});
    worker.refresh(objectDto.payload.castDto());
    return payload.ack();
}
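The error indicates the consumer thread is held up longer than the 150000 ms threshold while the blocking Redis set is awaited. For illustration only, a hedged sketch of an alternative approach (the key name, class name, and bean wiring are assumptions): use the reactive Redis client so the consumer thread is never blocked, and ack the record only once the write completes.

import io.quarkus.redis.client.reactive.ReactiveRedisClient;
import io.smallrye.mutiny.Uni;
import org.eclipse.microprofile.reactive.messaging.Incoming;
import org.eclipse.microprofile.reactive.messaging.Message;

import javax.enterprise.context.ApplicationScoped;
import javax.inject.Inject;
import java.util.List;
import java.util.concurrent.CompletionStage;

@ApplicationScoped
public class InComingProductKafkaConsumeReactive {

    @Inject
    ReactiveRedisClient reactiveRedis;   // non-blocking counterpart of the client in the stack trace

    @Incoming("mer_product")
    public CompletionStage<Void> consume(Message<String> payload) {
        // SET <key> <value> asynchronously; ack the Kafka record only after Redis confirms
        // the write, so a slow Redis shows up as consumer lag instead of a throttled-ack failure.
        return reactiveRedis.set(List.of("product:incoming", payload.getPayload()))
                .onItem().transformToUni(resp -> Uni.createFrom().completionStage(payload.ack()))
                .subscribeAsCompletionStage();
    }
}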
I am running a Spark Structured Streaming job (it is bounced every day) on EMR. After a few hours of execution the application hits an OOM error and gets killed. The following are my configuration and Spark SQL code.
I am new to Spark and need your valuable input.
The EMR cluster has 10 instances, each with 16 cores and 64GB of memory.
Spark-Submit arguments:
num_of_executors: 17
executor_cores: 5
executor_memory: 19G
driver_memory: 30G
The job reads input as micro-batches from Kafka at an interval of 30 seconds. The average number of rows read per batch is 90k.
spark.streaming.kafka.maxRatePerPartition: 4500
spark.streaming.stopGracefullyOnShutdown: true
spark.streaming.unpersist: true
spark.streaming.kafka.consumer.cache.enabled: true
spark.hadoop.fs.s3.maxRetries: 30
spark.sql.shuffle.partitions: 2001
Spark SQL aggregation code:
dataset.groupBy(functions.col(NAME),
                functions.window(functions.col(TIMESTAMP_COLUMN), "30 seconds"))
        .agg(functions.concat_ws(SPLIT, functions.collect_list(DEPARTMENT)).as(DEPS))
        .select(NAME, DEPS)
        .map((row) -> {
            Map<String, Object> map = Maps.newHashMap();
            map.put(NAME, row.getString(0));
            map.put(DEPS, row.getString(1));
            return new KryoMapSerializationService().serialize(map);
        }, Encoders.BINARY());
Some logs from the driver:
20/04/04 13:10:51 INFO TaskSetManager: Finished task 1911.0 in stage 1041.0 (TID 1052055) in 374 ms on <host> (executor 3) (1998/2001)
20/04/04 13:10:52 INFO TaskSetManager: Finished task 1925.0 in stage 1041.0 (TID 1052056) in 411 ms on <host> (executor 3) (1999/2001)
20/04/04 13:10:52 INFO TaskSetManager: Finished task 1906.0 in stage 1041.0 (TID 1052054) in 776 ms on <host> (executor 3) (2000/2001)
20/04/04 13:11:04 INFO YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 3.
20/04/04 13:11:04 INFO DAGScheduler: Executor lost: 3 (epoch 522)
20/04/04 13:11:04 INFO BlockManagerMasterEndpoint: Trying to remove executor 3 from BlockManagerMaster.
20/04/04 13:11:04 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(3, <host>, 38533, None)
20/04/04 13:11:04 INFO BlockManagerMaster: Removed 3 successfully in removeExecutor
20/04/04 13:11:04 INFO YarnAllocator: Completed container container_1582797414408_1814_01_000004 on host: <host> (state: COMPLETE, exit status: 143)
And by the way, I am using collectAsList in my foreachBatch code:
List<Event> list = dataset.select("value")
        .selectExpr("deserialize(value) as rows")
        .select("rows.*")
        .selectExpr(NAME, DEPS)
        .as(Encoders.bean(Event.class))
        .collectAsList();
With these settings, you may be causing your own issues.
num_of_executors: 17
executor_cores: 5
executor_memory: 19G
driver_memory: 30G
You are basically creating extra containers here that you then have to shuffle between. Instead, start off with something like 10 executors, 15 cores, and 60g of memory. If that works, then you can play with these a bit to try to optimize performance. I usually try splitting my containers in half at each step (but I also haven't needed to do this since Spark 2.0).
Let spark.sql.shuffle.partitions keep the default of 200. The more you break this up, the more work you make Spark do to calculate the shuffles. If anything, I'd try to match the parallelism to the number of executors, so in this case just 10. This is how you would tune Hive queries when 2.0 came out.
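A minimal sketch of these two suggestions expressed as configuration, assuming the 10-node, 16-core/64GB fleet described in the question (the values are starting points to iterate on, and the app name is a placeholder):

import org.apache.spark.sql.SparkSession;

// Fewer, fatter executors and the default shuffle partition count; in practice
// leave a little headroom per node for YARN and OS overhead.
SparkSession spark = SparkSession.builder()
        .appName("streaming-aggregation")                  // placeholder
        .config("spark.executor.instances", "10")
        .config("spark.executor.cores", "15")
        .config("spark.executor.memory", "60g")
        .config("spark.sql.shuffle.partitions", "200")     // Spark's default, instead of 2001
        .getOrCreate();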
Breaking the job up into more pieces than it needs just puts all of the scheduling load on the master.
Using Datasets and Encoders is also generally not as performant as plain DataFrame operations. I have found great lifts in performance from refactoring this kind of code into DataFrame operations.
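As a concrete illustration of that last point (a sketch, not the original code): the same windowed aggregation kept as plain DataFrame columns, with the per-row Kryo map dropped and the batch written out by the sink rather than collected to the driver. The column constants are the ones from the question; the output path is a placeholder.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.functions;

// Same group-by-window aggregation, but the rows stay as DataFrame columns
// (no Encoders.bean, no per-row Kryo serialization).
Dataset<Row> aggregated = dataset
        .groupBy(functions.col(NAME),
                 functions.window(functions.col(TIMESTAMP_COLUMN), "30 seconds"))
        .agg(functions.concat_ws(SPLIT, functions.collect_list(DEPARTMENT)).as(DEPS))
        .select(NAME, DEPS);

// Inside foreachBatch, write the micro-batch out directly instead of calling
// collectAsList(), which pulls the whole batch (~90k rows every 30 seconds here)
// onto the driver and is a common driver-OOM pattern.
aggregated.write().mode("append").parquet("s3://some-bucket/output/");   // placeholder sink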
My problem is that the Storm KafkaSpout stopped consuming messages from the Kafka topic after a period of time. When debug is enabled in Storm, I get a log file like this:
2016-07-05 03:58:26.097 o.a.s.d.task [INFO] Emitting: packet_spout __metrics [#object[org.apache.storm.metric.api.IMetricsConsumer$TaskInfo 0x2c35b34f "org.apache.storm.metric.api.IMetricsConsumer$TaskInfo#2c35b34f"] [#object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x798f1e35 "[__ack-count = {default=0}]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x230867ec "[__sendqueue = {sojourn_time_ms=0.0, write_pos=5411461, read_pos=5411461, overflow=0, arrival_rate_secs=0.0, capacity=1024, population=0}]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x7cdec8eb "[__complete-latency = {default=0.0}]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x658fc59 "[__skipped-max-spout = 0]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x3c1f3a50 "[__receive = {sojourn_time_ms=4790.5, write_pos=2468305, read_pos=2468304, overflow=0, arrival_rate_secs=0.20874647740319383, capacity=1024, population=1}]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x262d7906 "[__skipped-inactive = 0]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x73648c7e "[kafkaPartition = {Partition{host=slave103:9092, topic=packet, partition=12}/fetchAPICallCount=0, Partition{host=slave103:9092, topic=packet, partition=12}/fetchAPILatencyMax=null, Partition{host=slave103:9092, topic=packet, partition=12}/lostMessageCount=0, Partition{host=slave103:9092, topic=packet, partition=12}/fetchAPILatencyMean=null, Partition{host=slave103:9092, topic=packet, partition=12}/fetchAPIMessageCount=0}]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x4e43df61 "[kafkaOffset = {packet/totalLatestCompletedOffset=154305947, packet/partition_12/spoutLag=82472754, packet/totalEarliestTimeOffset=233919465, packet/partition_12/earliestTimeOffset=233919465, packet/partition_12/latestEmittedOffset=154307691, packet/partition_12/latestTimeOffset=236778701, packet/totalLatestEmittedOffset=154307691, packet/partition_12/latestCompletedOffset=154305947, packet/totalLatestTimeOffset=236778701, packet/totalSpoutLag=82472754}]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x49fe816b "[__transfer-count = {__ack_init=0, default=0, __metrics=0}]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x63e2bdc0 "[__fail-count = {}]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x3b17bb7b "[__skipped-throttle = 1086120]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x1315a68c "[__emit-count = {__ack_init=0, default=0, __metrics=0}]"]]]
2016-07-05 03:58:55.042 o.a.s.d.executor [INFO] Processing received message FOR -2 TUPLE: source: __system:-1, stream: __tick, id: {}, [30]
2016-07-05 03:59:25.042 o.a.s.d.executor [INFO] Processing received message FOR -2 TUPLE: source: __system:-1, stream: __tick, id: {}, [30]
2016-07-05 03:59:25.946 o.a.s.d.executor [INFO] Processing received message FOR -2 TUPLE: source: __system:-1, stream: __metrics_tick, id: {}, [60]
My test topology is really simple: one KafkaSpout and a Counter bolt. When the topology works fine, the value between FOR and TUPLE is a positive number; when the topology stops consuming messages, the value becomes negative. So I'm curious about what causes the Processing received message FOR -2 TUPLE problem, and how to fix it.
By the way, my experiment environment is:
OS: Red Hat Enterprise Linux Server release 7.0 (Maipo)
Kafka: 0.10.0.0
Storm: 1.0.1
With help from the Storm mailing list I was able to tune KafkaSpout and resolve the issue. The following settings work for me:
config.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 2048);
config.put(Config.TOPOLOGY_BACKPRESSURE_ENABLE, false);
config.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 16384);
config.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 16384);
I tested by sending bursts of 20k-50k messages with a 1-second pause between bursts. Each message was 2048 bytes.
I am running a 3-node cluster, my topology has 4 spouts, and the topic has 64 partitions.
After 200M messages it is still working...
Check if the producer is actually writing to the topic you expect.
Make sure that the spouts can reach Kafka at the network level. You can check this using telnet.
Can the spouts reach ZooKeeper? Check this using telnet as well.
Source: KafkaSpout is not receiving anything from Kafka
If all three of the above are true, then:
Kafka has a fixed retention window for topics. If retention fills up, it drops messages from the tail.
So here is what might be happening: the rate at which you are pushing data to Kafka is faster than the rate at which the consumers can consume the messages.
Source : Storm-kafka spout not fast enough to process the information
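As a quick way to run the first check above, here is a hedged sketch that consumes directly from the topic with a plain Kafka client, independent of Storm. The broker and topic names come from the metrics log in the question; the group id is arbitrary.

import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.util.Collections;
import java.util.Properties;

Properties props = new Properties();
props.put("bootstrap.servers", "slave103:9092");   // broker seen in the spout metrics above
props.put("group.id", "spout-debug");              // arbitrary debug group
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("auto.offset.reset", "earliest");

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(Collections.singletonList("packet"));        // topic from the question
    ConsumerRecords<String, String> records = consumer.poll(5000);  // poll(long) in the 0.10.x client
    // If this prints 0 while the producer claims to be writing, the problem is
    // upstream of Storm (producer, topic name, or retention), not the spout.
    System.out.println("Fetched " + records.count() + " records");
}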
I am trying to connect to mongo query routers in a test environment (I set up just one query router for the test, pointing to one config server (instead of three), which in turn points to a two-node shard with no replicas). I can insert/fetch documents using the mongo shell (and have verified that the documents are going to the sharded nodes). However, when I try to test the connection to the mongo database, I get the output copied below (the code being used is also copied underneath). I am using MongoDB v3.2.0 and Java driver v3.2.2 (I am trying to use the async API).
[info] 14:34:44.562 227 [main] MongoAuthentication INFO - testing 1
[info] 14:34:44.595 260 [main] cluster INFO - Cluster created with settings {hosts=[192.168.0.1:27018], mode=MULTIPLE, requiredClusterType=SHARDED, serverSelectionTimeout='30000 ms', maxWaitQueueSize=30}
[info] 14:34:44.595 260 [main] cluster INFO - Adding discovered server 192.168.0.1:27018 to client view of cluster
[info] 14:34:44.652 317 [main] cluster DEBUG - Updating cluster description to {type=SHARDED, servers=[{address=192.168.0.1:27018, type=UNKNOWN, state=CONNECTING}]
[info] Outputting database names:
[info] 14:34:44.660 325 [main] cluster INFO - No server chosen by ReadPreferenceServerSelector{readPreference=primary} from cluster description ClusterDescription{type=SHARDED, connectionMode=MULTIPLE, all=[ServerDescription{address=192.168.0.1:27018, type=UNKNOWN, state=CONNECTING}]}. Waiting for 30000 ms before timing out
[info] Counting the number of documents
[info] 14:34:44.667 332 [main] cluster INFO - No server chosen by ReadPreferenceServerSelector{readPreference=primaryPreferred} from cluster description ClusterDescription{type=SHARDED, connectionMode=MULTIPLE, all=[ServerDescription{address=192.168.0.1:27018, type=UNKNOWN, state=CONNECTING}]}. Waiting for 30000 ms before timing out
[info] - Count result: 0
[info] 14:34:45.669 1334 [cluster-ClusterId{value='577414c420055e5bc086c255', description='null'}-192.168.0.1:27018] connection DEBUG - Closing connection connectionId{localValue:1}
Part of the code being used:
final MongoClient mongoClient = MongoClientAccessor.INSTANCE.getMongoClientInstance();
final CountDownLatch listDbsLatch = new CountDownLatch(1);
System.out.println("Outputting database names:");
mongoClient.listDatabaseNames().forEach(new Block<String>() {
    @Override
    public void apply(final String name) {
        System.out.println(" - " + name);
    }
}, new SingleResultCallback<Void>() {
    @Override
    public void onResult(final Void result, final Throwable t) {
        listDbsLatch.countDown();
    }
});
The enum being used is responsible for reading config options and passing a MongoClient reference to its caller. The enum itself calls other classes which I can copy as well if needed. I have the following option configured for ReadPreference:
mongo.client.readPreference=PRIMARYPREFERRED
Any thoughts on what I might be doing wrong or might have misinterpreted? The goal is to connect to the shard via the mongos (query router) so that I can insert/fetch documents in the Mongo shard.
The mongo shard setup (query router, config servers, and shards) was not configured correctly. Ensure that the config server replica set is launched first, that mongos (the query router) is brought up pointing at those config servers, that the shards are brought up and then registered via mongos, and that the collection is enabled for sharding. Obviously, also make sure that the driver is connecting to the mongos (query router) process.
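As a hedged follow-up check (not part of the original answer), using the same 3.2 async driver as the question: once the shards are registered through mongos, a listShards command against the admin database should return both shard nodes. The connection string points at the mongos address from the logs.

import com.mongodb.ConnectionString;
import com.mongodb.async.SingleResultCallback;
import com.mongodb.async.client.MongoClient;
import com.mongodb.async.client.MongoClients;
import com.mongodb.async.client.MongoDatabase;
import org.bson.Document;

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

MongoClient client = MongoClients.create(new ConnectionString("mongodb://192.168.0.1:27018"));
MongoDatabase admin = client.getDatabase("admin");
final CountDownLatch latch = new CountDownLatch(1);

admin.runCommand(new Document("listShards", 1), new SingleResultCallback<Document>() {
    @Override
    public void onResult(final Document result, final Throwable t) {
        if (t != null) {
            // "No server chosen ... state=CONNECTING" style failures here mean mongos
            // itself is unreachable or not yet wired to the config servers.
            t.printStackTrace();
        } else {
            System.out.println(result.toJson());  // should list the registered shards
        }
        latch.countDown();
    }
});

latch.await(35, TimeUnit.SECONDS);  // a little longer than the 30s server selection timeout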
From the Spark UI I can see that the executors have not been executing any tasks for a long time.
When I look at stderr on the Executors tab, I see the logs below.
6/02/04 05:30:56 INFO storage.MemoryStore: Block broadcast_91 of size 153016 dropped from memory (free 6665239401)
16/02/04 06:11:20 WARN hdfs.DFSClient: Slow ReadProcessor read fields took 31337ms (threshold=30000ms); ack: seqno: 1240 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 4835789, targets: [DatanodeInfoWithStorage[10.25.36.18:1004,DS-f6e20cf7-0ccb-45aa-988f-f3310d5acf89,DISK], DatanodeInfoWithStorage[10.25.36.11:1004,DS-61ad0a2d-a6fd-402e-b0a1-61682d1755fb,DISK], DatanodeInfoWithStorage[10.25.36.5:1004,DS-c77503a2-0c7f-4b5c-8f4a-9c61cb4f18d7,DISK]]
I do not see any log output for a long time, and no errors either. It just keeps running.
Has anyone faced the same problem? How can we improve this?
Update:
It actually takes a long time in the saveAsTextFile() method.