Assume the original message is 500 bytes (before sending it to Kafka). What will the size of the message be after sending it to Kafka? And what if we use compression?
Additional information: I am putting a ByteBuffer of 2048 bytes into a topic (with a single partition) without any key.
Topic name: ub3
Path: /data/kafka-logs/ub3-0
[hdpusr#hdpdev2 ub3-0]$ $KAFKA_HOME/bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list hdpdev2:8092 --topic ub3 --time -1 --offsets 1 | awk -F ":" '{sum += $3} END {print sum}'
184
[hdpusr#hdpdev2 ub3-0]$ du -sh *
10M 00000000000000000000.index
448K 00000000000000000000.log
10M 00000000000000000000.timeindex
4.0K leader-epoch-checkpoint
[hdpusr#hdpdev2 ub3-0]$
[hdpusr#hdpdev2 ub3-0]$
[hdpusr#hdpdev2 ub3-0]$ $KAFKA_HOME/bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list hdpdev2:8092 --topic ub3 --time -1 --offsets 1 | awk -F ":" '{sum += $3} END {print sum}'
86284
[hdpusr#hdpdev2 ub3-0]$ du -sh *
10M 00000000000000000000.index
256M 00000000000000000000.log
10M 00000000000000000000.timeindex
4.0K leader-epoch-checkpoint
[hdpusr#hdpdev2 ub3-0]$
[hdpusr#hdpdev2 ub3-0]$ $KAFKA_HOME/bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list hdpdev2:8092 --topic ub3 --time -1 --offsets 1 | awk -F ":" '{sum += $3} END {print sum}'
172405
[hdpusr#hdpdev2 ub3-0]$ du -sh *
10M 00000000000000000000.index
512M 00000000000000000000.log
10M 00000000000000000000.timeindex
4.0K leader-epoch-checkpoint
[hdpusr#hdpdev2 ub3-0]$
[hdpusr#hdpdev2 ub3-0]$ $KAFKA_HOME/bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list hdpdev2:8092 --topic ub3 --time -1 --offsets 1 | awk -F ":" '{sum += $3} END {print sum}'
258491
[hdpusr#hdpdev2 ub3-0]$ du -sh *
10M 00000000000000000000.index
596M 00000000000000000000.log
10M 00000000000000000000.timeindex
4.0K leader-epoch-checkpoint
[hdpusr#hdpdev2 ub3-0]$
[hdpusr#hdpdev2 ub3-0]$ $KAFKA_HOME/bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list hdpdev2:8092 --topic ub3 --time -1 --offsets 1 | awk -F ":" '{sum += $3} END {print sum}'
344563
[hdpusr#hdpdev2 ub3-0]$ du -sh *
10M 00000000000000000000.index
1.1G 00000000000000000000.log
10M 00000000000000000000.timeindex
4.0K leader-epoch-checkpoint
[hdpusr#hdpdev2 ub3-0]$
The short answer is: who knows?
But let's try to find out some numbers. I started Kafka in Docker using this guide, then wrote a simple producer:
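import java.util.Properties;
import java.util.UUID;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;
import org.apache.kafka.common.serialization.StringSerializer;
public class App {
    public static void main(String[] args) throws Exception {
        final Producer<String, byte[]> producer = producer();
        // Send one record to the "test" topic and block until the broker acknowledges it
        producer.send(
            new ProducerRecord<>(
                "test",
                key(),
                value()
            )
        ).get();
    }
    private static Producer<String, byte[]> producer() {
        final Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.CLIENT_ID_CONFIG, "so57472830");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
        return new KafkaProducer<>(props);
    }
    private static String key() {
        return UUID.randomUUID().toString();
    }
    // value() is supplied by each of the variations below
}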
So, we will be sending to localhost:9092, with the client id so57472830, into a topic named test. The payloads are byte arrays and the keys are string UUIDs. As you'll see later, all these values (except host:port) contribute to the "overhead". Here I assume that the overhead is everything except the message payload itself.
Let's start with a "Hello, world!":
private static byte[] value() {
    return "Hello, world!".getBytes();
}
Run the app and capture the traffic to localhost:9092. I used Wireshark for that.
There I found the message with the payload and looked at the whole TCP stream ("Follow TCP stream" in Wireshark).
The whole stream took 527 bytes, of which the client sent 195 (highlighted in pink in Wireshark).
This also means that Kafka sent 527 - 195 = 332 bytes in response.
Our payload was 13 bytes. The outbound traffic also contains the client id twice (2 × 10 bytes) and the message key, a UUID rendered as a 36-character string (36 bytes). So, of the 195 bytes sent, 195 - (13 + 2 × 10 + 36) = 126 bytes are a mystery (probably what you called "overhead" in your question).
Let's send 500 random bytes:
private static byte[] value() {
    final byte[] result = new byte[500];
    new Random().nextBytes(result);
    return result;
}
Outbound traffic was 684 bytes (the entire conversation took 1016).
Again, the server sent 332 bytes in response, and the outbound mystery (overhead) made up 684 - (500 + 2 × 10 + 36) = 128 bytes.
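The subtraction can be re-checked in code. Note that UUID.randomUUID().toString() produces a 36-character string, so StringSerializer (UTF-8) sends a 36-byte key; a UUID is 16 bytes only in its binary form. This small sketch (the class name OverheadArithmetic is hypothetical) takes the sizes from the JDK and the totals 195 and 684 from the Wireshark captures above.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

public class OverheadArithmetic {
    // Client id as configured in the producer: 10 bytes in UTF-8
    static final int CLIENT_ID_BYTES = "so57472830".getBytes(StandardCharsets.UTF_8).length;
    // A random UUID as a string is always 36 characters (32 hex digits + 4 hyphens)
    static final int KEY_BYTES = UUID.randomUUID().toString().getBytes(StandardCharsets.UTF_8).length;

    // Outbound bytes not explained by the payload, the key and the two client-id copies
    static int unexplained(int outboundBytes, int payloadBytes) {
        return outboundBytes - (payloadBytes + 2 * CLIENT_ID_BYTES + KEY_BYTES);
    }

    public static void main(String[] args) {
        System.out.println("Hello, world! run: " + unexplained(195, 13));
        System.out.println("500 random bytes run: " + unexplained(684, 500));
    }
}
```

It prints 126 and 128: the unexplained part barely changes with the payload size.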
All these numbers are not final and may change with producer versions or specific config settings. One of those settings, which you mentioned, is compression. Let's check it out. Be warned that compression depends on the data: random bytes are harder to compress than constant ones because they have more entropy. So, let's send 500 repeating bytes with GZIP compression. Without compression, these 500 repeated bytes give the same numbers as the random ones above.
Add props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "gzip"); to the producer() method and change the value():
private static byte[] value() {
    final byte[] result = new byte[500];
    Arrays.fill(result, (byte) 'a');
    return result;
}
When compression is enabled, the message (key and value, but not the client id and topic) is compressed, and the outbound traffic drops to 208 bytes.
I'd say that the overhead is about the same as in the examples above; compression affects the size of the message itself.
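To see how much the data itself matters, here is a JDK-only sketch (no Kafka involved; the class name GzipSizes is hypothetical) that gzips the two 500-byte payloads used above. Kafka compresses the records of a whole batch with the configured codec, so the sizes on the wire will differ, but the contrast between repeated and random bytes is the same.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

public class GzipSizes {
    // Size of the gzip output (header + deflate data + trailer) for the given bytes
    static int gzippedSize(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.size(); // gz is closed (finished) at this point
    }

    // Same payload as the Arrays.fill variant of value()
    static byte[] repeated() {
        byte[] b = new byte[500];
        Arrays.fill(b, (byte) 'a');
        return b;
    }

    // Same shape as the Random variant of value(), seeded for repeatability
    static byte[] random() {
        byte[] b = new byte[500];
        new Random(42).nextBytes(b);
        return b;
    }

    public static void main(String[] args) {
        System.out.println("500 x 'a':   " + gzippedSize(repeated()) + " bytes");
        System.out.println("500 random:  " + gzippedSize(random()) + " bytes");
    }
}
```

The repeated bytes shrink to a few dozen bytes, while the random bytes come out slightly larger than 500 because gzip adds its own header and trailer.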
All of that applies to the network traffic, but after your edit I see you are interested in the storage size. Nevertheless, I would give the same answer: "who knows". The numbers definitely depend on your configuration.
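For storage, one way to get a lower bound to compare the du numbers against is to add up the on-disk record format. This is my own estimate, not Kafka code: it assumes message format v2 (magic 2, Kafka 0.11 and later), no compression, no record headers and records that share the batch's base timestamp; the class and method names are hypothetical. A v2 batch has a fixed 61-byte header, and each record stores its length, timestamp delta, offset delta, key length, value length and header count as zigzag varints.

```java
public class RecordV2Size {
    // Fixed part of a v2 record batch: base offset, length, leader epoch, magic, CRC,
    // attributes, last offset delta, two timestamps, producer id/epoch, base sequence, record count
    static final int BATCH_OVERHEAD = 61;

    // Bytes used by a zigzag-encoded varint/varlong
    static int varintSize(long value) {
        long v = (value << 1) ^ (value >> 63); // zigzag keeps small negative numbers small
        int bytes = 1;
        while ((v & ~0x7FL) != 0) {
            v >>>= 7;
            bytes++;
        }
        return bytes;
    }

    // One record without headers; keyLen < 0 means a null key (stored as length -1)
    static int recordSize(int keyLen, int valueLen, int offsetDelta, long timestampDelta) {
        int body = 1                                             // attributes (int8)
                + varintSize(timestampDelta)
                + varintSize(offsetDelta)
                + varintSize(keyLen < 0 ? -1 : keyLen) + Math.max(keyLen, 0)
                + varintSize(valueLen) + valueLen
                + varintSize(0);                                 // header count
        return varintSize(body) + body;                          // length prefix + body
    }

    // Uncompressed batch of n identical records
    static long batchSize(int n, int keyLen, int valueLen) {
        long size = BATCH_OVERHEAD;
        for (int i = 0; i < n; i++) {
            size += recordSize(keyLen, valueLen, i, 0);
        }
        return size;
    }

    public static void main(String[] args) {
        System.out.println("one 500-byte value, no key:  " + batchSize(1, -1, 500));
        System.out.println("one 2048-byte value, no key: " + batchSize(1, -1, 2048));
        System.out.println("per record, 100 per batch:   " + batchSize(100, -1, 2048) / 100.0);
    }
}
```

For a single unkeyed 2048-byte value this gives 2118 bytes. The 10M .index and .timeindex files in the listing are sized up front rather than growing per message, and du -sh rounds, so treat any comparison as approximate.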
I have a small Kafka Streams app that counts favourite colours:
public class FavouriteColor {
    private static final String INPUT_TOPIC_NAME = "favourite-colour-input";
    private static final String OUTPUT_TOPIC_NAME = "favourite-colour-output";
    private static final String INTERMEDIATE_TOPIC_NAME = "favourite-colour-output";
    private static final String APPLICATION_ID = "favourite-colour-java";
    public static void main(String[] args) {
        Properties config = new Properties();
        config.put(StreamsConfig.APPLICATION_ID_CONFIG, APPLICATION_ID);
        config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092");
        config.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        config.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        config.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        config.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, "0");
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> textLines = builder.stream(INPUT_TOPIC_NAME);
        KStream<String, String> usersAndColours = textLines
                .filter((key, value) -> value.contains(","))
                .selectKey((key, value) -> value.split(",")[0].toLowerCase())
                .mapValues(value -> value.split(",")[1].toLowerCase())
                .filter((user, colour) -> Arrays.asList("green", "blue", "red").contains(colour));
        usersAndColours.to(INTERMEDIATE_TOPIC_NAME);
        KTable<String, String> usersAndColoursTable = builder.table(INTERMEDIATE_TOPIC_NAME);
        KTable<String, Long> favouriteColours = usersAndColoursTable
                .groupBy((user, colour) -> new KeyValue<>(colour, colour))
                .count(Named.as("CountsByColours"));
        favouriteColours.toStream().to(OUTPUT_TOPIC_NAME, Produced.with(Serdes.String(), Serdes.Long()));
        KafkaStreams streams = new KafkaStreams(builder.build(), config);
        streams.cleanUp();
        streams.start();
        System.out.println(streams);
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
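For reference, here is a plain-Java sketch (no Kafka; the class and method names are hypothetical) of what this topology is meant to compute: the latest colour per user, as the KTable holds it, and per-colour counts that subtract a user's previous colour when it changes, which is what groupBy(...).count() does on a KTable. Fed the four inputs used further down in the question, it ends with blue=0, green=1, red=2.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FavouriteColourSketch {
    private static final List<String> ALLOWED = Arrays.asList("green", "blue", "red");

    // colour -> number of users whose latest valid colour is that colour
    static Map<String, Long> count(List<String> lines) {
        Map<String, String> userToColour = new HashMap<>(); // plays the role of the KTable
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            if (!line.contains(",")) continue;
            String[] parts = line.split(",");
            if (parts.length < 2) continue; // e.g. "alice," (the topology above would throw here)
            String user = parts[0].toLowerCase();
            String colour = parts[1].toLowerCase();
            if (!ALLOWED.contains(colour)) continue;
            String previous = userToColour.put(user, colour);
            if (previous != null) {
                counts.merge(previous, -1L, Long::sum); // subtract the user's old colour
            }
            counts.merge(colour, 1L, Long::sum);        // add the new colour
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("stephane,blue", "john,green", "stephane,red", "alice,red")));
    }
}
```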
The topics are created and producers/ consumers are started using the terminal:
kafka-topics --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic favourite-colour-input
kafka-topics --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic user-keys-and-colours --config cleanup.policy=compact
kafka-topics --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic favourite-colour-output --config cleanup.policy=compact
kafka-console-consumer --bootstrap-server localhost:9092 \
--topic favourite-colour-output \
--from-beginning \
--formatter kafka.tools.DefaultMessageFormatter \
--property print.key=true \
--property print.value=true \
--property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer \
--property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer
kafka-console-producer --bootstrap-server localhost:9092 --topic favourite-colour-input
I provided the following inputs into the terminal:
stephane,blue
john,green
stephane,red
alice,red
I received the error in the consumer terminal:
stephane Processed a total of 1 messages
[2021-11-27 21:31:58,155] ERROR Unknown error when running consumer: (kafka.tools.ConsoleConsumer$)
org.apache.kafka.common.errors.SerializationException: Size of data received by LongDeserializer is not 8
at org.apache.kafka.common.serialization.LongDeserializer.deserialize(LongDeserializer.java:26)
at org.apache.kafka.common.serialization.LongDeserializer.deserialize(LongDeserializer.java:21)
at org.apache.kafka.common.serialization.Deserializer.deserialize(Deserializer.java:60)
at kafka.tools.DefaultMessageFormatter.$anonfun$writeTo$2(ConsoleConsumer.scala:519)
at scala.Option.map(Option.scala:242)
at kafka.tools.DefaultMessageFormatter.deserialize$1(ConsoleConsumer.scala:519)
at kafka.tools.DefaultMessageFormatter.writeTo(ConsoleConsumer.scala:568)
at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:115)
at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:75)
at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:52)
at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
What's the issue here? I did some brief research and found similar questions asked by other people, but the solutions don't seem to work for me.
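You defined the value deserializer to be the one for Long, but it looks like your data is a String instead. The cause is in your code: INTERMEDIATE_TOPIC_NAME is "favourite-colour-output", the same as OUTPUT_TOPIC_NAME, so usersAndColours.to(INTERMEDIATE_TOPIC_NAME) writes String values such as "blue" into the topic you are reading with LongDeserializer, which expects exactly 8 bytes. Point INTERMEDIATE_TOPIC_NAME at "user-keys-and-colours", which looks like the compacted topic you created for it.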
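To see why the length matters, here is a stdlib-only sketch that mimics the check in the stack trace. It is not Kafka's class; LongValueCheck and decodeLong are hypothetical names. A value written by a Long serializer is exactly 8 big-endian bytes, while the string "blue" is 4 bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class LongValueCheck {
    // Same rule as the error message: a serialized long must be exactly 8 bytes
    static long decodeLong(byte[] data) {
        if (data.length != 8) {
            throw new IllegalArgumentException("Size of data received is not 8: " + data.length);
        }
        return ByteBuffer.wrap(data).getLong(); // ByteBuffer is big-endian by default
    }

    public static void main(String[] args) {
        byte[] countValue = ByteBuffer.allocate(8).putLong(2L).array(); // what the count would look like
        System.out.println(decodeLong(countValue));

        byte[] colourValue = "blue".getBytes(StandardCharsets.UTF_8);  // what the intermediate stream writes
        try {
            decodeLong(colourValue);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```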
I customized the Kubernetes CoreDNS config to resolve a custom name, which works fine in pods (checked with ping xx).
But it is not resolved in a Java application (JDK 14).
The nameserver is fine:
/ # cat /etc/resolv.conf
nameserver 10.96.0.10
search xxxx-5-production.svc.cluster.local svc.cluster.local cluster.local
/ # ping xx
PING xx (192.168.65.2): 56 data bytes
64 bytes from 192.168.65.2: seq=0 ttl=37 time=0.787 ms
Edit: I use a CoreDNS rewrite to map the host name xx to host.docker.internal. This is the change to the CoreDNS config:
rewrite name regex (^|(?:\S*\.)*)xx\.?$ {1}host.docker.internal
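With options ndots:5 (as in the Java pod's /etc/resolv.conf printed further down), a lookup of the bare name xx goes through the search list before the name itself. This JDK-only sketch (class and method names are hypothetical) lists the names tried in that order, following the resolv.conf(5) rules and assuming the search list default.svc.cluster.local svc.cluster.local cluster.local, and checks each one against the rewrite regex. It assumes CoreDNS applies the regex with a find-style match and that Java's regex engine behaves like Go's for this pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RewriteCheck {
    // Pattern copied from the "rewrite name regex" line; {1} in the replacement is group 1
    static final Pattern RULE = Pattern.compile("(^|(?:\\S*\\.)*)xx\\.?$");

    // Rewritten name, or null when the rule does not match
    static String rewrite(String qname) {
        Matcher m = RULE.matcher(qname);
        return m.find() ? m.group(1) + "host.docker.internal" : null;
    }

    // Fewer than ndots dots: search domains first, the name as-is last; otherwise the name first
    static List<String> candidates(String name, List<String> search, int ndots) {
        List<String> tried = new ArrayList<>();
        long dots = name.chars().filter(c -> c == '.').count();
        if (dots >= ndots) tried.add(name + ".");
        for (String domain : search) tried.add(name + "." + domain + ".");
        if (dots < ndots) tried.add(name + ".");
        return tried;
    }

    public static void main(String[] args) {
        List<String> search = List.of("default.svc.cluster.local", "svc.cluster.local", "cluster.local");
        for (String qname : candidates("xx", search, 5)) {
            System.out.println(qname + " -> " + rewrite(qname));
        }
    }
}
```

Only the last name tried, xx., matches the rule; the three search-list names do not.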
I added some debug code to the entry point:
static void runCommand(String... commands) {
    try {
        ProcessBuilder cat = new ProcessBuilder(commands);
        Process start = cat.start();
        // Read the output before waitFor(): waiting first can hang if the child fills its stdout pipe
        String output = new BufferedReader(new InputStreamReader(start.getInputStream())).lines().collect(Collectors.joining());
        String err = new BufferedReader(new InputStreamReader(start.getErrorStream())).lines().collect(Collectors.joining());
        start.waitFor();
        log.info("\n{}: stout {}", Arrays.toString(commands), output);
        log.info("\n{}: sterr{}", Arrays.toString(commands), err);
    } catch (IOException | InterruptedException e) {
        log.error(e.getClass().getCanonicalName(), e);
    }
}
public static void main(String[] args) {
    try {
        InetAddress xx = Inet4Address.getByName("xx");
        log.info("{}: {}", "InetAddress xx", xx.getHostAddress());
    } catch (IOException e) {
        log.error(e.getClass().getCanonicalName(), e);
    }
    runCommand("cat", "/etc/resolv.conf");
    runCommand("ping", "xx", "-c", "1");
    runCommand("ping", "host.docker.internal", "-c", "1");
    runCommand("nslookup", "xx");
    runCommand("ifconfig");
    SpringApplication.run(FileServerApp.class, args);
}
Here is the output:
01:01:39.950 [main] ERROR com.j.file_server_app.FileServerApp - java.net.UnknownHostException
java.net.UnknownHostException: xx: Name or service not known
at java.base/java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(InetAddress.java:932)
at java.base/java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1505)
at java.base/java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:851)
at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1495)
at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1354)
at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1288)
at java.base/java.net.InetAddress.getByName(InetAddress.java:1238)
at com.j.file_server_app.FileServerApp.main(FileServerApp.java:43)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:48)
at org.springframework.boot.loader.Launcher.launch(Launcher.java:87)
at org.springframework.boot.loader.Launcher.launch(Launcher.java:51)
at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:52)
01:01:39.983 [main] INFO com.j.file_server_app.FileServerApp -
[cat, /etc/resolv.conf]: stout nameserver 10.96.0.10search default.svc.cluster.local svc.cluster.local cluster.localoptions ndots:5
01:01:39.985 [main] INFO com.j.file_server_app.FileServerApp -
[cat, /etc/resolv.conf]: sterr
01:01:39.991 [main] INFO com.j.file_server_app.FileServerApp -
[ping, xx, -c, 1]: stout
01:01:39.991 [main] INFO com.j.file_server_app.FileServerApp -
[ping, xx, -c, 1]: sterrping: unknown host
01:01:39.998 [main] INFO com.j.file_server_app.FileServerApp -
[ping, host.docker.internal, -c, 1]: stout PING host.docker.internal (192.168.65.2): 56 data bytes64 bytes from 192.168.65.2: icmp_seq=0 ttl=37 time=0.757 ms--- host.docker.internal ping statistics ---1 packets transmitted, 1 packets received, 0% packet lossround-trip min/avg/max/stddev = 0.757/0.757/0.757/0.000 ms
01:01:39.998 [main] INFO com.j.file_server_app.FileServerApp -
[ping, host.docker.internal, -c, 1]: sterr
01:01:40.045 [main] INFO com.j.file_server_app.FileServerApp -
[nslookup, xx]: stout Server: 10.96.0.10Address: 10.96.0.10#53Non-authoritative answer:Name: host.docker.internalAddress: 192.168.65.2** server can't find xx: NXDOMAIN
01:01:40.045 [main] INFO com.j.file_server_app.FileServerApp -
[nslookup, xx]: sterr
01:01:40.048 [main] INFO com.j.file_server_app.FileServerApp -
[ifconfig]: stout eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 10.1.3.14 netmask 255.255.0.0 broadcast 0.0.0.0 ether ce:71:60:9a:75:05 txqueuelen 0 (Ethernet) RX packets 35 bytes 3776 (3.6 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 22 bytes 1650 (1.6 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536 inet 127.0.0.1 netmask 255.0.0.0 loop txqueuelen 1000 (Local Loopback) RX packets 1 bytes 29 (29.0 B) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 1 bytes 29 (29.0 B) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
01:01:40.048 [main] INFO com.j.file_server_app.FileServerApp -
[ifconfig]: sterr
It looks like CoreDNS is not working, but in the frontend pod ping works. This is the frontend Dockerfile:
FROM library/nginx:stable-alpine
RUN mkdir /app
EXPOSE 80
ADD dist /app
COPY nginx.conf /etc/nginx/nginx.conf
Using docker inspect on the frontend and backend containers, both network settings are:
"NetworkSettings": {
"Bridge": "",
"SandboxID": "",
"HairpinMode": false,
"LinkLocalIPv6Address": "",
"LinkLocalIPv6PrefixLen": 0,
"Ports": {},
"SandboxKey": "",
"SecondaryIPAddresses": null,
"SecondaryIPv6Addresses": null,
"EndpointID": "",
"Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"IPAddress": "",
"IPPrefixLen": 0,
"IPv6Gateway": "",
"MacAddress": "",
"Networks": {}
}
Both frontend and backend have a Service of type LoadBalancer. My question is: why does name resolution behave differently in these two pods?
I have a system that handles HTTP POST requests, and its core runs on Spring 5 (standalone Tomcat). In short, it looks like this:
client (Apache AB) ----> micro service (java or golang) --> RabbitMQ --> Core(spring + tomcat).
The thing is, when I use my Java (Spring) service, everything is fine. ab shows this output:
ab -n 1000 -k -s 2 -c 10 -s 60 -p test2.sh -A 113:113 -T 'application/json' https://127.0.0.1:8449/SecureChat/chat/v1/rest-message/send
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient)
Completed 100 requests
...
Completed 1000 requests
Finished 1000 requests
Server Software:
Server Hostname: 127.0.0.1
Server Port: 8449
SSL/TLS Protocol: TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,2048,256
Document Path: /rest-message/send
Document Length: 39 bytes
Concurrency Level: 10
Time taken for tests: 434.853 seconds
Complete requests: 1000
Failed requests: 0
Keep-Alive requests: 0
Total transferred: 498000 bytes
Total body sent: 393000
HTML transferred: 39000 bytes
Requests per second: 2.30 [#/sec] (mean)
Time per request: 4348.528 [ms] (mean)
Time per request: 434.853 [ms] (mean, across all concurrent requests)
Transfer rate: 1.12 [Kbytes/sec] received
0.88 kb/s sent
2.00 kb/s total
Connection Times (ms)
min mean[+/-sd] median max
Connect: 4 14 7.6 17 53
Processing: 1110 4317 437.2 4285 8383
Waiting: 1107 4314 437.2 4282 8377
Total: 1126 4332 436.8 4300 8403
That is over TLS.
But when I use my Golang service, I get a timeout:
Benchmarking 127.0.0.1 (be patient)...apr_pollset_poll: The timeout specified has expired (70007)
Total of 92 requests completed
And this output:
ab -n 100 -k -s 2 -c 10 -s 60 -p test2.sh -T 'application/json' http://127.0.0.1:8089/
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient)...^C
Server Software:
Server Hostname: 127.0.0.1
Server Port: 8089
Document Path: /
Document Length: 39 bytes
Concurrency Level: 10
Time taken for tests: 145.734 seconds
Complete requests: 92
Failed requests: 1
(Connect: 0, Receive: 0, Length: 1, Exceptions: 0)
Keep-Alive requests: 91
Total transferred: 16380 bytes
Total body sent: 32200
HTML transferred: 3549 bytes
Requests per second: 0.63 [#/sec] (mean)
Time per request: 15840.663 [ms] (mean)
Time per request: 1584.066 [ms] (mean, across all concurrent requests)
Transfer rate: 0.11 [Kbytes/sec] received
0.22 kb/s sent
0.33 kb/s total
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 1229 1494 1955.9 1262 20000
Waiting: 1229 1291 143.8 1262 2212
Total: 1229 1494 1955.9 1262 20000
That is over plain TCP.
I guess I have some mistakes in my code. I put it all in one file:
func initAmqp(rabbitUrl string) {
	var err error
	conn, err = amqp.Dial(rabbitUrl)
	failOnError(err, "Failed to connect to RabbitMQ")
}
func main() {
	err := gcfg.ReadFileInto(&cfg, "config.gcfg")
	if err != nil {
		log.Fatal(err)
	}
	PrintConfig(cfg)
	if cfg.Section_rabbit.RabbitUrl != "" {
		initAmqp(cfg.Section_rabbit.RabbitUrl)
	}
	mux := http.NewServeMux()
	mux.Handle("/", NewLimitHandler(1000, newTestHandler()))
	server := http.Server{
		Addr:         cfg.Section_basic.Port,
		Handler:      mux,
		ReadTimeout:  20 * time.Second,
		WriteTimeout: 20 * time.Second,
	}
	defer conn.Close()
	log.Println(server.ListenAndServe())
}
func NewLimitHandler(maxConns int, handler http.Handler) http.Handler {
	h := &limitHandler{
		connc:   make(chan struct{}, maxConns),
		handler: handler,
	}
	for i := 0; i < maxConns; i++ {
		h.connc <- struct{}{}
	}
	return h
}
func newTestHandler() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		handler(w, r)
	})
}
func handler(w http.ResponseWriter, r *http.Request) {
	if b, err := ioutil.ReadAll(r.Body); err == nil {
		fmt.Println("message is ", string(b))
		res := publishMessages(string(b))
		w.Write([]byte(res))
		w.WriteHeader(http.StatusOK)
		counter++
	} else {
		w.WriteHeader(http.StatusInternalServerError)
		w.Write([]byte("500 - Something bad happened!"))
	}
}
func publishMessages(payload string) string {
	ch, err := conn.Channel()
	failOnError(err, "Failed to open a channel")
	q, err = ch.QueueDeclare(
		"",    // name
		false, // durable
		false, // delete when unused
		true,  // exclusive
		false, // noWait
		nil,   // arguments
	)
	failOnError(err, "Failed to declare a queue")
	msgs, err := ch.Consume(
		q.Name, // queue
		"",     // consumer
		true,   // auto-ack
		false,  // exclusive
		false,  // no-local
		false,  // no-wait
		nil,    // args
	)
	failOnError(err, "Failed to register a consumer")
	corrId := randomString(32)
	log.Println("corrId ", corrId)
	err = ch.Publish(
		"",                            // exchange
		cfg.Section_rabbit.RabbitQeue, // routing key
		false,                         // mandatory
		false,                         // immediate
		amqp.Publishing{
			DeliveryMode:  amqp.Transient,
			ContentType:   "application/json",
			CorrelationId: corrId,
			Body:          []byte(payload),
			Timestamp:     time.Now(),
			ReplyTo:       q.Name,
		})
	failOnError(err, "Failed to Publish on RabbitMQ")
	defer ch.Close()
	result := ""
	for d := range msgs {
		if corrId == d.CorrelationId {
			failOnError(err, "Failed to convert body to integer")
			log.Println("result = ", string(d.Body))
			return string(d.Body)
		} else {
			log.Println("waiting for result = ")
		}
	}
	return result
}
Can someone help?
EDIT
Here are my variables:
type limitHandler struct {
	connc   chan struct{}
	handler http.Handler
}
var conn *amqp.Connection
var q amqp.Queue
EDIT 2
func (h *limitHandler) ServeHTTP(w http.ResponseWriter, req *http.Request) {
	select {
	case <-h.connc:
		fmt.Println("ServeHTTP")
		h.handler.ServeHTTP(w, req)
		h.connc <- struct{}{}
	default:
		http.Error(w, "503 too busy", http.StatusServiceUnavailable)
	}
}
EDIT 3
func failOnError(err error, msg string) {
	if err != nil {
		log.Fatalf("%s: %s", msg, err)
		panic(fmt.Sprintf("%s: %s", msg, err))
	}
}
My code is attempting to decompress an input stream read from a gzipped file.
Here is the code snippet:
InputStream is = new GZIPInputStream(new ByteArrayInputStream(fcontents.getBytes()));
The file itself is fine:
$cat storefront3.gz | gunzip
180028796
80026920
180028796
180026921
8002790180
800001
1800002
1800007
800008
800009
The data read in before the code snippet above (via FileInputStream) certainly looks like gzip data (note that the original file was storefront3.tsv):
��[�Rstorefront3.tsvu���0k{)�?�/FBģ��Y'��Q�a���s~���}6���d�{2+���O���D�m~�O��
But I get the following:
Caused by: java.io.IOException: Not in GZIP format
at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:141)
at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:56)
at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:65)
Here is a hex dump of the .gz file:
23:40:44/storefronts:72 $od -cx storefront3.gz
0000000 037 213 \b \b 201 [ 347 R \0 003 s t o r e f
8b1f 0808 5b81 52e7 0300 7473 726f 6665
0000020 r o n t 3 . t s v \0 u 212 273 025 200 0
6f72 746e 2e33 7374 0076 8a75 15bb 3080
0000040 \f 003 k { 032 ) 200 ? 373 / F B ģ ** 302 131
030c 7b6b 291a 3f80 2ffb 4246 a3c4 cdc2
0000060 Y ' 261 200 Q 331 a 276 276 350 001 s ~ 222 262 175
2759 80b1 d951 be61 e8be 7301 927e dcb2
0000100 } 6 226 231 367 d 200 { 2 + 211 337 342 020 O 022
367d 9996 64f7 7b80 2b32 df89 10e2 f14f
0000120 022 343 035 246 D 211 m ~ 003 326 O 235 030 236 \0 \0
e312 a61d 8944 7e6d d603 9d4f 9e18 0000
0000140 \0
0000
UPDATE
I also tried using FileInputStream. The following gives the same error:
GZIPInputStream strm = new GZIPInputStream(new FileInputStream(tmpFileName));
Since fcontents contains your gzipped data, shouldn't it be a byte[] rather than a String?
I recommend reading the file into a byte array (for example with Apache Commons IO's IOUtils.toByteArray), because reading it into a String will most likely corrupt your data.
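A JDK-only sketch of the failure (class and method names are hypothetical; UTF-8 is assumed as the String charset, other charsets corrupt the data differently): the second gzip magic byte 0x8b is not valid UTF-8 on its own, so decoding it into a String turns it into U+FFFD, and getBytes() writes that back as EF BF BD. The header no longer starts with 1f 8b, hence "Not in GZIP format". Keeping the data as a byte[] works; for a file, Files.readAllBytes(path) also does this without Commons IO.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipStringPitfall {
    // Compress bytes in memory (stands in for the contents of storefront3.gz)
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Decompress; a bad header surfaces as ZipException("Not in GZIP format") wrapped in UncheckedIOException
    static byte[] gunzip(byte[] data) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The byte[] -> String -> byte[] round trip that fcontents.getBytes() implies
    static byte[] viaString(byte[] data) {
        return new String(data, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] compressed = gzip("180028796\n80026920\n".getBytes(StandardCharsets.US_ASCII));
        System.out.print(new String(gunzip(compressed), StandardCharsets.US_ASCII)); // works
        try {
            gunzip(viaString(compressed));
        } catch (UncheckedIOException e) {
            System.out.println("after String round trip: " + e.getCause().getMessage());
        }
    }
}
```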
What I want is to parse the result of a ping, line by line. It's a bit tricky for me and I've tried a lot of things, but... I'm using ping on Android.
For example:
PING google.com (173.194.35.9) 56(84) bytes of data.
64 bytes from mil01s16-in-f9.1e100.net (173.194.35.9): icmp_seq=1 ttl=52 time=33.0 ms
--- google.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 33.086/33.086/33.086/0.000 ms
On the first line, I want the IP address and the "56(84) bytes of data". On the second line, "64 bytes", 1, 52, 33.0 ms, etc.
If I ping an IP directly, the output changes a little:
PING 192.168.0.12 (192.168.0.12) 56(84) bytes of data.
64 bytes from 192.168.0.12: icmp_seq=1 ttl=64 time=0.134 ms
--- 192.168.0.12 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.134/0.134/0.134/0.000 ms
But that should work too!
A short explanation along with the answer would be great!
Thanks so much!
Description
This expression will capture the IP, the bytes of data, the bytes, icmp_seq, ttl and time. I wasn't sure what "etc." covered, so it also captures the fields of the statistics lines.
^PING\b # match ping
[^(]*\(([^)]*)\) # capture IP
\s([^.]*)\. # capture the bytes of data
.*?^(\d+\sbytes) # capture bytes
.*?icmp_seq=(\d+) # capture icmp_seq
.*?ttl=(\d+) # capture ttl
.*?time=(.*?ms) # capture time
.*?(\d+)\spackets\stransmitted # the rest of these lines will capture the other portions of the ping result
.*?(\d+)\sreceived
.*?(\d+%)\spacket\sloss
.*?time\s(\d+ms)
.*?=\s([^\/]*)\/([^\/]*)\/([^\/]*)\/(.*)\sms
Example
Live Example: http://www.rubular.com/r/uEDoEZwY7U
Sample Text
PING google.com (173.194.35.9) 56(84) bytes of data.
64 bytes from mil01s16-in-f9.1e100.net (173.194.35.9): icmp_seq=1 ttl=52 time=33.0 ms
--- google.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 33.086/33.086/33.086/0.000 ms
PING 192.168.0.12 (192.168.0.12) 56(84) bytes of data.
64 bytes from 192.168.0.12: icmp_seq=1 ttl=64 time=0.134 ms
--- 192.168.0.12 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.134/0.134/0.134/0.000 ms
Sample Code
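import java.util.regex.Matcher;
import java.util.regex.Pattern;
class Module1 {
    public static void main(String[] args) {
        // The two sample outputs from the question, one after the other
        String sourcestring =
              "PING google.com (173.194.35.9) 56(84) bytes of data.\n"
            + "64 bytes from mil01s16-in-f9.1e100.net (173.194.35.9): icmp_seq=1 ttl=52 time=33.0 ms\n"
            + "--- google.com ping statistics ---\n"
            + "1 packets transmitted, 1 received, 0% packet loss, time 0ms\n"
            + "rtt min/avg/max/mdev = 33.086/33.086/33.086/0.000 ms\n"
            + "PING 192.168.0.12 (192.168.0.12) 56(84) bytes of data.\n"
            + "64 bytes from 192.168.0.12: icmp_seq=1 ttl=64 time=0.134 ms\n"
            + "--- 192.168.0.12 ping statistics ---\n"
            + "1 packets transmitted, 1 received, 0% packet loss, time 0ms\n"
            + "rtt min/avg/max/mdev = 0.134/0.134/0.134/0.000 ms\n";
        // Java string literals cannot span lines, so the pattern is concatenated.
        // COMMENTS keeps the "# ..." notes working; each note must end with "\n".
        Pattern re = Pattern.compile(
              "^PING\\b                       # match ping\n"
            + "[^(]*\\(([^)]*)\\)             # capture IP\n"
            + "\\s([^.]*)\\.                  # capture the bytes of data\n"
            + ".*?^(\\d+\\sbytes)             # capture bytes\n"
            + ".*?icmp_seq=(\\d+)             # capture icmp_seq\n"
            + ".*?ttl=(\\d+)                  # capture ttl\n"
            + ".*?time=(.*?ms)                # capture time\n"
            + ".*?(\\d+)\\spackets\\stransmitted\n"
            + ".*?(\\d+)\\sreceived\n"
            + ".*?(\\d+%)\\spacket\\sloss\n"
            + ".*?time\\s(\\d+ms)\n"
            + ".*?=\\s([^\\/]*)\\/([^\\/]*)\\/([^\\/]*)\\/(.*?)\\sms\n",
            Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL | Pattern.COMMENTS);
        Matcher m = re.matcher(sourcestring);
        int mIdx = 0;
        while (m.find()) {
            for (int groupIdx = 0; groupIdx <= m.groupCount(); groupIdx++) {
                System.out.println("[" + mIdx + "][" + groupIdx + "] = " + m.group(groupIdx));
            }
            mIdx++;
        }
    }
}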
Capture Groups
[0][0] = PING google.com (173.194.35.9) 56(84) bytes of data.
64 bytes from mil01s16-in-f9.1e100.net (173.194.35.9): icmp_seq=1 ttl=52 time=33.0 ms
--- google.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 33.086/33.086/33.086/0.000 ms
[0][1] = 173.194.35.9
[0][2] = 56(84) bytes of data
[0][3] = 64 bytes
[0][4] = 1
[0][5] = 52
[0][6] = 33.0 ms
[0][7] = 1
[0][8] = 1
[0][9] = 0%
[0][10] = 0ms
[0][11] = 33.086
[0][12] = 33.086
[0][13] = 33.086
[0][14] = 0.000
[1][0] = PING 192.168.0.12 (192.168.0.12) 56(84) bytes of data.
64 bytes from 192.168.0.12: icmp_seq=1 ttl=64 time=0.134 ms
--- 192.168.0.12 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.134/0.134/0.134/0.000 ms
[1][1] = 192.168.0.12
[1][2] = 56(84) bytes of data
[1][3] = 64 bytes
[1][4] = 1
[1][5] = 64
[1][6] = 0.134 ms
[1][7] = 1
[1][8] = 1
[1][9] = 0%
[1][10] = 0ms
[1][11] = 0.134
[1][12] = 0.134
[1][13] = 0.134
[1][14] = 0.000
I think you only need to parse the second line.
For example:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class PingBytesAndIp {
    public static void main(String[] args) {
        String domainPing = "64 bytes from mil01s16-in-f9.1e100.net (173.194.35.9): icmp_seq=1 ttl=52 time=33.0 ms";
        String ipPing = "64 bytes from 192.168.0.12: icmp_seq=1 ttl=64 time=0.134 ms";
        String wholeDomainPing = "PING google.com (173.194.35.9) 56(84) bytes of data.\r\n" +
                "64 bytes from mil01s16-in-f9.1e100.net (173.194.35.9): icmp_seq=1 ttl=52 time=33.0 ms\r\n\r\n" +
                "--- google.com ping statistics ---\r\n" +
                "1 packets transmitted, 1 received, 0% packet loss, time 0ms\r\n" +
                "rtt min/avg/max/mdev = 33.086/33.086/33.086/0.000 ms";
        Pattern pattern = Pattern.compile(
                // "[digits] bytes" ... then either "from [ip]" or "([ip])"
                "(\\d+(?=\\sbytes)).*?(((?<=(from\\s))[\\d\\.]+)|((?<=\\()[\\d\\.]+(?=\\))))",
                Pattern.MULTILINE
        );
        for (String input : new String[] {domainPing, ipPing, wholeDomainPing}) {
            Matcher matcher = pattern.matcher(input);
            if (matcher.find()) {
                System.out.println("Bytes: " + matcher.group(1));
                System.out.println("IP: " + matcher.group(2));
            }
        }
        // etc...
    }
}
Output:
Bytes: 64
IP: 173.194.35.9
Bytes: 64
IP: 192.168.0.12
Bytes: 64
IP: 173.194.35.9
Edit: added an example for the whole input (first scenario) and the Pattern.MULTILINE flag to the Pattern.
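Since the question asks for line-by-line parsing, another option is one small pattern per kind of line instead of a single pattern for the whole output. This is a sketch that assumes the Linux iputils-style output shown in the question (other ping builds on Android may format lines differently); the class, method and map-key names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PingLineParser {
    static final Pattern HEADER = Pattern.compile("PING \\S+ \\(([\\d.]+)\\) (.*) bytes of data\\.");
    static final Pattern REPLY = Pattern.compile("(\\d+) bytes from .*?icmp_seq=(\\d+) ttl=(\\d+) time=([\\d.]+) ms");
    static final Pattern SUMMARY = Pattern.compile("(\\d+) packets transmitted, (\\d+) received, (\\d+)% packet loss, time (\\d+)ms");
    static final Pattern RTT = Pattern.compile("rtt min/avg/max/mdev = ([\\d.]+)/([\\d.]+)/([\\d.]+)/([\\d.]+) ms");

    // Adds the fields found in one line; lines that match nothing are ignored
    static void parseLine(String line, Map<String, String> fields) {
        Matcher m;
        if ((m = HEADER.matcher(line)).matches()) {
            fields.put("ip", m.group(1));
            fields.put("dataSize", m.group(2));
        } else if ((m = REPLY.matcher(line)).matches()) {
            fields.put("replyBytes", m.group(1));
            fields.put("icmpSeq", m.group(2));
            fields.put("ttl", m.group(3));
            fields.put("timeMs", m.group(4));
        } else if ((m = SUMMARY.matcher(line)).matches()) {
            fields.put("transmitted", m.group(1));
            fields.put("received", m.group(2));
            fields.put("lossPercent", m.group(3));
            fields.put("totalTimeMs", m.group(4));
        } else if ((m = RTT.matcher(line)).matches()) {
            fields.put("rttMin", m.group(1));
            fields.put("rttAvg", m.group(2));
            fields.put("rttMax", m.group(3));
            fields.put("rttMdev", m.group(4));
        }
    }

    // Splits on any line break (\n or \r\n); with several replies the last one wins
    static Map<String, String> parse(String output) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : output.split("\\R")) {
            parseLine(line, fields);
        }
        return fields;
    }

    public static void main(String[] args) {
        String output = String.join("\n",
                "PING 192.168.0.12 (192.168.0.12) 56(84) bytes of data.",
                "64 bytes from 192.168.0.12: icmp_seq=1 ttl=64 time=0.134 ms",
                "--- 192.168.0.12 ping statistics ---",
                "1 packets transmitted, 1 received, 0% packet loss, time 0ms",
                "rtt min/avg/max/mdev = 0.134/0.134/0.134/0.000 ms");
        System.out.println(parse(output));
    }
}
```

When reading the output of a ping process, parseLine can be called on each line as it arrives from a BufferedReader.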