I need some help. Our service uses lettuce 5.1.6 and is deployed across 22 Docker nodes in total.
Every time the service is deployed, a few of the Docker nodes start reporting ERROR: READONLY You can't write against a read only slave.
After restarting the affected Docker nodes, the error no longer appears.
redis server configuration (8 masters, 8 slaves):
stop-writes-on-bgsave-error no
slave-serve-stale-data yes
slave-read-only yes
cluster-enabled yes
cluster-config-file "/data/server/redis-cluster/{port}/conf/node.conf"
lettuce configuration:
ClientResources res = DefaultClientResources.builder()
        .commandLatencyPublisherOptions(
                DefaultEventPublisherOptions.builder()
                        // emit command latency metrics every 5 seconds
                        .eventEmitInterval(Duration.ofSeconds(5))
                        .build())
        .build();

redisClusterClient = RedisClusterClient.create(res, REDIS_CLUSTER_URI);
redisClusterClient.setOptions(
        ClusterClientOptions.builder()
                .maxRedirects(99)
                .socketOptions(SocketOptions.builder().keepAlive(true).build())
                .topologyRefreshOptions(
                        ClusterTopologyRefreshOptions.builder()
                                // refresh the cached topology when adaptive triggers fire
                                // (MOVED/ASK redirects, persistent reconnects, ...)
                                .enableAllAdaptiveRefreshTriggers()
                                .build())
                .build());

RedisAdvancedClusterCommands<String, String> command = redisClusterClient.connect().sync();
command.setex("some key", 18000, "some value");
The Exception that appears:
io.lettuce.core.RedisCommandExecutionException: READONLY You can't write against a read only slave.
at io.lettuce.core.ExceptionFactory.createExecutionException(ExceptionFactory.java:135)
at io.lettuce.core.LettuceFutures.awaitOrCancel(LettuceFutures.java:122)
at io.lettuce.core.cluster.ClusterFutureSyncInvocationHandler.handleInvocation(ClusterFutureSyncInvocationHandler.java:123)
at io.lettuce.core.internal.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:80)
at com.sun.proxy.$Proxy135.setex(Unknown Source)
at com.xueqiu.infra.redis4.RedisClusterImpl.lambda$setex$164(RedisClusterImpl.java:1489)
at com.xueqiu.infra.redis4.RedisClusterImpl$$Lambda$1422/1017847781.apply(Unknown Source)
at com.xueqiu.infra.redis4.RedisClusterImpl.execute(RedisClusterImpl.java:526)
at com.xueqiu.infra.redis4.RedisClusterImpl.executeTotal(RedisClusterImpl.java:491)
at com.xueqiu.infra.redis4.RedisClusterImpl.setex(RedisClusterImpl.java:1489)
With distributed middleware, part of the partition/shard bookkeeping is handled on the client side, and lettuce manages the Redis Cluster slot-to-node mapping this way:
The client keeps an array called slotCache, which locally caches, for each slot, the node that owns it.
When a key needs to be read from or written to the server, the client computes the key's slot with CRC16 and then looks the node up in this cache.
On the server side, Redis Cluster records the slot-to-node mapping in each node's local node.conf, and the nodes exchange this metadata through gossip ping/pong messages until an eventually consistent view is formed.
However, if the slot mapping on the server side is wrong, the client faithfully consumes that wrong data.
That is exactly what happened here: some server nodes mapped slots to slave nodes, so the client cached those slots as belonging to slaves and sent read and write requests to them, producing the error.
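For reference, the slot calculation the client performs can be reproduced with lettuce's own SlotHash helper. A minimal sketch (the key is only an example):

import java.nio.charset.StandardCharsets;

import io.lettuce.core.cluster.SlotHash;

public class SlotForKey {
    public static void main(String[] args) {
        // Same calculation the client does before indexing its local slotCache:
        // CRC16 of the key (honoring {hash tags}), modulo SlotHash.SLOT_COUNT (16384).
        int slot = SlotHash.getSlot("some key".getBytes(StandardCharsets.UTF_8));
        System.out.println("slot = " + slot); // a value in [0, 16383]
    }
}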
Lettuce source code investigation
1. Lettuce initialization: Partitions.java
/**
 * Update the partition cache. Updates are necessary after the partition details have changed.
 */
public void updateCache() {

    synchronized (partitions) {

        if (partitions.isEmpty()) {
            this.slotCache = EMPTY;
            this.nodeReadView = Collections.emptyList();
            return;
        }

        RedisClusterNode[] slotCache = new RedisClusterNode[SlotHash.SLOT_COUNT];
        List<RedisClusterNode> readView = new ArrayList<>(partitions.size());

        for (RedisClusterNode partition : partitions) {

            readView.add(partition);
            for (Integer integer : partition.getSlots()) {
                slotCache[integer.intValue()] = partition;
            }
        }

        this.slotCache = slotCache;
        this.nodeReadView = Collections.unmodifiableCollection(readView);
    }
}
2. Lettuce command sending: PooledClusterConnectionProvider.java
private CompletableFuture<StatefulRedisConnection<K, V>> getWriteConnection(int slot) {

    CompletableFuture<StatefulRedisConnection<K, V>> writer;

    // avoid races when reconfiguring partitions.
    synchronized (stateLock) {
        writer = writers[slot];
    }

    if (writer == null) {
        RedisClusterNode partition = partitions.getPartitionBySlot(slot);

        if (partition == null) {
            clusterEventListener.onUncoveredSlot(slot);
            return Futures.failed(new PartitionSelectorException("Cannot determine a partition for slot " + slot + ".",
                    partitions.clone()));
        }

        // Use always host and port for slot-oriented operations. We don't want to get reconnected on a different
        // host because the nodeId can be handled by a different host.
        RedisURI uri = partition.getUri();
        ConnectionKey key = new ConnectionKey(Intent.WRITE, uri.getHost(), uri.getPort());

        ConnectionFuture<StatefulRedisConnection<K, V>> future = getConnectionAsync(key);

        return future.thenApply(connection -> {

            synchronized (stateLock) {
                if (writers[slot] == null) {
                    writers[slot] = CompletableFuture.completedFuture(connection);
                }
            }

            return connection;
        }).toCompletableFuture();
    }

    return writer;
}
The sending principle of lettuce, in short:
When the client starts it loads the cluster topology and stores the slot-to-node mapping locally in the slotCache array.
When sending, the client computes CRC16 of the key, indexes slotCache by the resulting slot to find the owning node, and then obtains a connection to that node.
Note that essentially all cluster-mode middleware clients follow this pattern: fetch the server-side network topology, then do the mapping logic on the client (the same approach appears, for example, in Kafka clients working across data centers).
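To see the client-side view directly, you can dump the topology that lettuce has cached and compare it with the cluster nodes output collected below. A sketch, reusing the redisClusterClient created in the configuration above (Partitions and RedisClusterNode live in io.lettuce.core.cluster.models.partitions):

Partitions partitions = redisClusterClient.getPartitions();
for (RedisClusterNode node : partitions) {
    // node id, address, role and the number of slots the client believes this node owns
    System.out.printf("%s %s role=%s slots=%d%n",
            node.getNodeId(), node.getUri(), node.getRole(), node.getSlots().size());
}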
Redis cluster information troubleshooting
./bin/redis-cli -h 10.10.28.2 -p 25661 cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size: 3
cluster_current_epoch:8
cluster_my_epoch:6
cluster_stats_messages_ping_sent:615483
cluster_stats_messages_pong_sent:610194
cluster_stats_messages_meet_sent:3
cluster_stats_messages_fail_sent:8
cluster_stats_messages_auth-req_sent:5
cluster_stats_messages_auth-ack_sent:2
cluster_stats_messages_update_sent:4
cluster_stats_messages_sent:1225699
cluster_stats_messages_ping_received:610188
cluster_stats_messages_pong_received:603593
cluster_stats_messages_meet_received:2
cluster_stats_messages_fail_received:4
cluster_stats_messages_auth-req_received:2
cluster_stats_messages_auth-ack_received:2
cluster_stats_messages_received:1213791
./bin/redis-cli -h 10.10.28.2 -p 25661 cluster nodes
5e9d0c185a2ba2fc9564495730c874bea76c15fa 10.10.28.3:25662@35662 slave 2281f330d771ee682221bc6c239afd68e6f20571 0 1595921769000 15 connected
79cb673db12199c32737b959cd82ec9963106558 10.10.25.2:25651@35651 master - 0 1595921770000 18 connected 4096-6143
2281f330d771ee682221bc6c239afd68e6f20571 10.10.28.2:25661@35661 myself,master - 0 1595921759000 15 connected 10240-12287
6a9ea568d6b49360afbb650c712bd7920403ba19 10.10.28.3:25686@35686 master - 0 1595921769000 14 connected 12288-14335
5a12dd423370e6f4085e593f9cd0b3a4ddfa9757 10.10.27.2:25656@35656 master - 0 1595921771000 13 connected 14336-16383
f5148dba1127bd9bada8ecc39341a0b72ef25d8e 10.10.25.3:25652@35652 slave 79cb673db12199c32737b959cd82ec9963106558 0 1595921769000 18 connected
f6788b4829e601642ed4139548153830c430b932 10.10.26.3:25666@35666 master - 0 1595921769870 16 connected 8192-10239
f54cfebc12c69725f471d16133e7ca3a8567dc18 10.10.28.15:25687@35687 slave 6a9ea568d6b49360afbb650c712bd7920403ba19 0 1595921763000 14 connected
f09ad21effff245cae23c024a8a886f883634f5c 10.10.28.15:25667@35667 slave f6788b4829e601642ed4139548153830c430b932 0 1595921770870 16 connected
ff5f5a56a7866f32e84ec89482aabd9ca1f05e20 10.10.25.3:25681@35681 master - 0 1595921773876 0 connected 0-2047
19c57214e4293b2e37d881534dcd55318fa96a70 10.10.50.16:25677@35677 slave 5f677e012808b09c67316f6ac5bdf0ec005cd598 0 1595921768000 17 connected
d8b4f99e0f9961f2e866b92e7351760faa3e0f2b 10.10.30.9:25671@35671 master - 0 1595921773000 6 connected 2048-4095
068e3bc73c27782c49782d30b66aa8b1140666ce 10.10.27.3:25682@35682 slave ff5f5a56a7866f32e84ec89482aabd9ca1f05e20 0 1595921771872 12 connected
e8b0311aeec4e3d285028abc377f0c277f9a5c74 10.10.49.9:25672@35672 slave d8b4f99e0f9961f2e866b92e7351760faa3e0f2b 0 1595921770000 6 connected
f03bc2ca91b3012f4612ecbc8c611c9f4a0e1305 10.10.27.3:25657@35657 slave 5a12dd423370e6f4085e593f9cd0b3a4ddfa9757 0 1595921762000 13 connected
5f677e012808b09c67316f6ac5bdf0ec005cd598 10.10.50.7:25676@35676 master - 0 1595921772873 17 connected 6144-8191
./bin/redis-cli -h 10.10.28.3 -p 25662 cluster nodes
f5148dba1127bd9bada8ecc39341a0b72ef25d8e 10.10.25.3:25652@35652 slave 79cb673db12199c32737b959cd82ec9963106558 0 1595921741000 18 connected
f6788b4829e601642ed4139548153830c430b932 10.10.26.3:25666@35666 master - 0 1595921744000 16 connected 8192-10239
f03bc2ca91b3012f4612ecbc8c611c9f4a0e1305 10.10.27.3:25657@35657 slave 5a12dd423370e6f4085e593f9cd0b3a4ddfa9757 0 1595921740000 13 connected
5f677e012808b09c67316f6ac5bdf0ec005cd598 10.10.50.7:25676@35676 master - 0 1595921743127 17 connected 6144-8191
79cb673db12199c32737b959cd82ec9963106558 10.10.25.2:25651@35651 master - 0 1595921743000 18 connected 4096-6143
2281f330d771ee682221bc6c239afd68e6f20571 10.10.28.2:25661@35661 master - 0 1595921744129 15 connected 10240-12287
f09ad21effff245cae23c024a8a886f883634f5c 10.10.28.15:25667@35667 slave f6788b4829e601642ed4139548153830c430b932 0 1595921740000 16 connected
f54cfebc12c69725f471d16133e7ca3a8567dc18 10.10.28.15:25687@35687 slave 6a9ea568d6b49360afbb650c712bd7920403ba19 0 1595921745130 14 connected
5e9d0c185a2ba2fc9564495730c874bea76c15fa 10.10.28.3:25662@35662 myself,slave 2281f330d771ee682221bc6c239afd68e6f20571 0 1595921733000 5 connected 0-1820
068e3bc73c27782c49782d30b66aa8b1140666ce 10.10.27.3:25682@35682 slave ff5f5a56a7866f32e84ec89482aabd9ca1f05e20 0 1595921744000 12 connected
d8b4f99e0f9961f2e866b92e7351760faa3e0f2b 10.10.30.9:25671@35671 master - 0 1595921739000 6 connected 2048-4095
5a12dd423370e6f4085e593f9cd0b3a4ddfa9757 10.10.27.2:25656@35656 master - 0 1595921742000 13 connected 14336-16383
ff5f5a56a7866f32e84ec89482aabd9ca1f05e20 10.10.25.3:25681@35681 master - 0 1595921746131 0 connected 1821-2047
6a9ea568d6b49360afbb650c712bd7920403ba19 10.10.28.3:25686@35686 master - 0 1595921747133 14 connected 12288-14335
19c57214e4293b2e37d881534dcd55318fa96a70 10.10.50.16:25677@35677 slave 5f677e012808b09c67316f6ac5bdf0ec005cd598 0 1595921742126 17 connected
e8b0311aeec4e3d285028abc377f0c277f9a5c74 10.10.49.9:25672@35672 slave d8b4f99e0f9961f2e866b92e7351760faa3e0f2b 0 1595921745000 6 connected
./bin/redis-cli -h 10.10.49.9 -p 25672 cluster nodes
d8b4f99e0f9961f2e866b92e7351760faa3e0f2b 10.10.30.9:25671@35671 master - 0 1595921829000 6 connected 2048-4095
79cb673db12199c32737b959cd82ec9963106558 10.10.25.2:25651@35651 master - 0 1595921830000 18 connected 4096-6143
ff5f5a56a7866f32e84ec89482aabd9ca1f05e20 10.10.25.3:25681@35681 master - 0 1595921830719 0 connected 0-1820
f54cfebc12c69725f471d16133e7ca3a8567dc18 10.10.28.15:25687@35687 slave 6a9ea568d6b49360afbb650c712bd7920403ba19 0 1595921827000 14 connected
5f677e012808b09c67316f6ac5bdf0ec005cd598 10.10.50.7:25676@35676 master - 0 1595921827000 17 connected 6144-8191
2281f330d771ee682221bc6c239afd68e6f20571 10.10.28.2:25661@35661 master - 0 1595921822000 15 connected 10240-12287
5e9d0c185a2ba2fc9564495730c874bea76c15fa 10.10.28.3:25662@35662 slave 2281f330d771ee682221bc6c239afd68e6f20571 0 1595921828714 15 connected
068e3bc73c27782c49782d30b66aa8b1140666ce 10.10.27.3:25682@35682 slave ff5f5a56a7866f32e84ec89482aabd9ca1f05e20 0 1595921832721 12 connected
6a9ea568d6b49360afbb650c712bd7920403ba19 10.10.28.3:25686@35686 master - 0 1595921825000 14 connected 12288-14335
f5148dba1127bd9bada8ecc39341a0b72ef25d8e 10.10.25.3:25652@35652 slave 79cb673db12199c32737b959cd82ec9963106558 0 1595921830000 18 connected
19c57214e4293b2e37d881534dcd55318fa96a70 10.10.50.16:25677@35677 slave 5f677e012808b09c67316f6ac5bdf0ec005cd598 0 1595921829716 17 connected
e8b0311aeec4e3d285028abc377f0c277f9a5c74 10.10.49.9:25672@35672 myself,slave d8b4f99e0f9961f2e866b92e7351760faa3e0f2b 0 1595921832000 4 connected 1821-2047
f09ad21effff245cae23c024a8a886f883634f5c 10.10.28.15:25667@35667 slave f6788b4829e601642ed4139548153830c430b932 0 1595921826711 16 connected
f03bc2ca91b3012f4612ecbc8c611c9f4a0e1305 10.10.27.3:25657@35657 slave 5a12dd423370e6f4085e593f9cd0b3a4ddfa9757 0 1595921829000 13 connected
f6788b4829e601642ed4139548153830c430b932 10.10.26.3:25666@35666 master - 0 1595921831720 16 connected 8192-10239
5a12dd423370e6f4085e593f9cd0b3a4ddfa9757 10.10.27.2:25656@35656 master - 0 1595921827714 13 connected 14336-16383
./bin/redis-trib.rb check 10.10.30.9:25671
>>> Performing Cluster Check (using node 10.10.30.9:25671)
M: d8b4f99e0f9961f2e866b92e7351760faa3e0f2b 10.10.30.9:25671
slots:2048-4095 (2048 slots) master
1 additional replica(s)
S: e8b0311aeec4e3d285028abc377f0c277f9a5c74 10.10.49.9:25672
slots: (0 slots) slave
········
········
S: f03bc2ca91b3012f4612ecbc8c611c9f4a0e1305 10.10.27.3:25657
slots: (0 slots) slave
replicates 5a12dd423370e6f4085e593f9cd0b3a4ddfa9757
M: 5a12dd423370e6f4085e593f9cd0b3a4ddfa9757 10.10.27.2:25656
slots:14336-16383 (2048 slots) master
1 additional replica(s)
[ERR] Nodes don't agree about configuration!
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
Be suspicious of everything and stay vigilant; diligence makes up for weaknesses.
At first I assumed the cluster was healthy: most nodes behaved normally online, only a few had the problem, and a restart fixed it.
The initial checks went through the healthy nodes and found nothing. Even though the logs showed early on that the topology information of several nodes was inconsistent, without the client-side mapping relationship it was hard to see the problem.
Comparing the outputs showed that some slots were mapped to slave nodes.
Running the check confirmed that the cluster itself had a problem, which is also explained in the related open-source issue below.
I saw the same issue as you and tried to investigate it.
I figured out that it is caused by lettuce.
When we run a Redis command, lettuce analyzes it to decide which Redis endpoint the command should be sent to:
If it is a READ command, it is sent to a slave node (in my case by setting ReadFrom.Any_Rep; note that other ReadFrom options may change this behavior).
If it is a WRITE command, it is sent to the master node.
To determine what counts as a READ command, lettuce uses the ReadOnlyCommands class, which lists all read commands.
In my case I used the EVAL command to write a key to Redis, but lettuce classified it as a READ command and sent it to a slave node, hence the exception.
So please check the ReadOnlyCommands class and make sure your write command is not listed there. This was a mistake on the lettuce side, and it is already fixed in newer versions.
In your version, ReadOnlyCommands for the cluster setup is:
class ReadOnlyCommands {

    private static final Set<CommandType> READ_ONLY_COMMANDS = EnumSet.noneOf(CommandType.class);

    static {
        for (CommandName commandNames : CommandName.values()) {
            READ_ONLY_COMMANDS.add(CommandType.valueOf(commandNames.name()));
        }
    }

    /**
     * @param protocolKeyword must not be {@literal null}.
     * @return {@literal true} if {@link ProtocolKeyword} is a read-only command.
     */
    public static boolean isReadOnlyCommand(ProtocolKeyword protocolKeyword) {
        return READ_ONLY_COMMANDS.contains(protocolKeyword);
    }

    /**
     * @return an unmodifiable {@link Set} of {@link CommandType read-only} commands.
     */
    public static Set<CommandType> getReadOnlyCommands() {
        return Collections.unmodifiableSet(READ_ONLY_COMMANDS);
    }

    enum CommandName {
        ASKING, BITCOUNT, BITPOS, CLIENT, COMMAND, DUMP, ECHO, EVAL, EVALSHA, EXISTS, //
        GEODIST, GEOPOS, GEORADIUS, GEORADIUSBYMEMBER, GEOHASH, GET, GETBIT, //
        GETRANGE, HEXISTS, HGET, HGETALL, HKEYS, HLEN, HMGET, HSCAN, HSTRLEN, //
        HVALS, INFO, KEYS, LINDEX, LLEN, LRANGE, MGET, PFCOUNT, PTTL, //
        RANDOMKEY, READWRITE, SCAN, SCARD, SCRIPT, //
        SDIFF, SINTER, SISMEMBER, SMEMBERS, SRANDMEMBER, SSCAN, STRLEN, //
        SUNION, TIME, TTL, TYPE, ZCARD, ZCOUNT, ZLEXCOUNT, ZRANGE, //
        ZRANGEBYLEX, ZRANGEBYSCORE, ZRANK, ZREVRANGE, ZREVRANGEBYLEX, ZREVRANGEBYSCORE, ZREVRANK, ZSCAN, ZSCORE, //

        // Pub/Sub commands are no key-space commands so they are safe to execute on slave nodes
        PUBLISH, PUBSUB, PSUBSCRIBE, PUNSUBSCRIBE, SUBSCRIBE, UNSUBSCRIBE
    }
}
So you can check it easily: EVAL is indeed in the list above.
Solution: upgrading lettuce is the best way to fix this, or you can try to override this setting.
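As far as I know there is no public hook in 5.1.x for overriding ReadOnlyCommands itself, but along the same lines you can make sure that commands misclassified as read-only are still routed to the master by pinning ReadFrom. A hedged sketch, reusing the redisClusterClient from the question (ReadFrom is io.lettuce.core.ReadFrom, the connection type is io.lettuce.core.cluster.api.StatefulRedisClusterConnection):

StatefulRedisClusterConnection<String, String> connection = redisClusterClient.connect();
// MASTER is the default ReadFrom; setting it explicitly guards against a
// replica-preferring policy configured elsewhere, so even commands that are
// wrongly classified as read-only stay on the master of the slot.
connection.setReadFrom(ReadFrom.MASTER);
connection.sync().setex("some key", 18000, "some value");

Note that this does not fix a topology that genuinely maps slots to slave nodes (as in the cluster check above); for that the server-side slot assignment has to be repaired and the client topology refreshed.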
CentOS 7, Tomcat 8.5.
a.war and rest.war are deployed in the same Tomcat.
a.war uses the following code to call rest.war:
import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.message.BasicHeader;
import org.apache.http.protocol.HTTP;

DefaultHttpClient httpClient = new DefaultHttpClient();
HttpPost httpPost = new HttpPost(url);
httpPost.addHeader(HTTP.CONTENT_TYPE, "application/json");
StringEntity se = new StringEntity(json.toString());
se.setContentType("text/json");
se.setContentEncoding(new BasicHeader(HTTP.CONTENT_TYPE, "application/json"));
httpPost.setEntity(se);
HttpResponse response = httpClient.execute(httpPost);
However, if the url passed to HttpPost(url) is <public ip>:80, then httpClient.execute(httpPost) throws connection refused,
while if the url is localhost:80 or 127.0.0.1:80, then httpClient.execute(httpPost) succeeds.
Why? And how can I solve this problem?
Note: if I access a.war from a browser on my own computer using the public IP, e.g. http://<public ip>/a, all operations succeed.
My Tomcat connector is:
<Connector
port="80"
protocol="HTTP/1.1"
connectionTimeout="60000"
keepAliveTimeout="15000"
maxKeepAliveRequests="-1"
maxThreads="1000"
minSpareThreads="200"
maxSpareThreads="300"
minProcessors="100"
maxProcessors="900"
acceptCount="1000"
enableLookups="false"
executor="tomcatThreadPool"
maxPostSize="-1"
compression="on"
compressionMinSize="1024"
redirectPort="8443" />
My server has no domain name, only a public IP; its /etc/hosts is:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
Update: output of some commands run on the server:
ss -nltp
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 *:111 *:* users:(("rpcbind",pid=643,fd=8))
LISTEN 0 128 *:80 *:* users:(("java",pid=31986,fd=53))
LISTEN 0 128 *:22 *:* users:(("sshd",pid=961,fd=3))
LISTEN 0 1 127.0.0.1:8005 *:* users:(("java",pid=31986,fd=68))
LISTEN 0 128 :::111 :::* users:(("rpcbind",pid=643,fd=11))
LISTEN 0 128 :::22 :::* users:(("sshd",pid=961,fd=4))
LISTEN 0 80 :::3306 :::* users:(("mysqld",pid=1160,fd=19))
netstat -nltp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN 643/rpcbind
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 31986/java
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 961/sshd
tcp 0 0 127.0.0.1:8005 0.0.0.0:* LISTEN 31986/java
tcp6 0 0 :::111 :::* LISTEN 643/rpcbind
tcp6 0 0 :::22 :::* LISTEN 961/sshd
tcp6 0 0 :::3306 :::* LISTEN 1160/mysqld
ifconfig
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
loop txqueuelen 1000 (Local Loopback)
RX packets 1396428 bytes 179342662 (171.0 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 1396428 bytes 179342662 (171.0 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
p2p1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.1.25 netmask 255.255.255.0 broadcast 192.168.1.255
ether f8:bc:12:a3:4f:b7 txqueuelen 1000 (Ethernet)
RX packets 5352432 bytes 3009606926 (2.8 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 2839034 bytes 559838396 (533.9 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: p2p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether f8:bc:12:a3:4f:b7 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.25/24 brd 192.168.1.255 scope global noprefixroute dynamic p2p1
valid_lft 54621sec preferred_lft 54621sec
route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
default gateway 0.0.0.0 UG 100 0 0 p2p1
192.168.1.0 0.0.0.0 255.255.255.0 U 100 0 0 p2p1
ip route
default via 192.168.1.1 dev p2p1 proto dhcp metric 100
192.168.1.0/24 dev p2p1 proto kernel scope link src 192.168.1.25 metric 100
iptables -L -n -v --line-numbers
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
num pkts bytes target prot opt in out source destination
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
num pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
num pkts bytes target prot opt in out source destination
You probably have configured one of these:
1. A firewall on the public IP's ports, so that nothing gets through.
2. Tomcat may bind a specific IP, e.g. localhost (see the Connector elements in Tomcat's server.xml).
3. Apache httpd, nginx or another reverse proxy might handle various virtual host names, and they might also handle localhost differently than the public IP.
4. Port forwarding: if you only forward localhost:80 to localhost:8080 (Tomcat's default port), you might not have anything on publicip:80 that forwards that traffic as well.
Edit after your comment:
Incoming traffic seems to be fine, but outgoing you do have those problems. Adding from @stringy05's comment: check whether the IP in question is routable from your server: you're connecting to that IP from the server, so use another means to create an outgoing connection to it, e.g. curl.
Explanation for #1 & #3:
If you connect to an external http server, it will handle the request differently based on the hostname used. It might well be that the IP "hostname" is blocked, either by a high level firewall, or just handled differently than the URL by the webserver itself. In most cases you can check this by connecting to the webserver in question from any other system, e.g. your own browser.
If Tomcat is listening (bound) on your public IP address it should work, but maybe your public IP address belongs to some other device, like a SOHO router; then your problem is similar to this:
https://superuser.com/questions/208710/public-ip-address-answered-by-router-not-internal-web-server-with-port-forwardi
But without a DNS name you cannot simply add a line to /etc/hosts; you can, however, add the public IP address to one of your network interface cards (NICs), like lo (loopback), eth0, etc., as described in one of these articles:
https://www.garron.me/en/linux/add-secondary-ip-linux.html
https://www.thegeekdiary.com/centos-rhel-6-how-to-addremove-additional-ip-addresses-to-a-network-interface/
E.g. with public IP address 1.2.3.4 you would need the following (which will only be effective until the next reboot and, worst case, might interfere with your ability to connect to the server with e.g. SSH!):
sudo ip addr add 1.2.3.4/32 dev lo
It may be useful to have the output of these commands to better understand your setup; feel free to share it in your question (with the public IP address consistently anonymized):
Either one of these (ss = socket stat, newer replacement for good old netstat):
ss -nltp
netstat -nltp
And one of these:
ifconfig
ip addr show
And last but not least either one of these:
route
ip route
I don't expect that we need to know your firewall config, but if you use it, it may be interesting to keep an eye on it while you are at it:
iptables -L -n -v --line-numbers
Try putting your public domain name into the local /etc/hosts file of your server like this:
127.0.0.1 localhost YOURPUBLIC.DOMAIN.NAME
This way your Java code does not need to go through the external IP address but instead connects directly to Tomcat.
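For illustration only, the calling code would then use that host name instead of the raw IP (the name and path here are placeholders):

DefaultHttpClient httpClient = new DefaultHttpClient();
// YOURPUBLIC.DOMAIN.NAME now resolves to 127.0.0.1 via /etc/hosts,
// so the request stays on the local machine and reaches Tomcat directly.
HttpPost httpPost = new HttpPost("http://YOURPUBLIC.DOMAIN.NAME/rest/some-endpoint");
httpPost.addHeader(HTTP.CONTENT_TYPE, "application/json");
httpPost.setEntity(new StringEntity(json.toString(), "UTF-8"));
HttpResponse response = httpClient.execute(httpPost);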
Good luck!
I think the curl timeout explains it: you have a firewall rule somewhere that is stopping the server from accessing the public IP address.
If there's no reason the service can't be accessed using localhost or the local hostname, then do that; but if you need to call the service via the public IP, then it's a matter of working out why the request from the server times out.
Some usual suspects:
The server might not actually have internet access - can you curl https://www.google.com?
There might be a forward proxy required - a sys admin will know this sort of thing
There might be IP whitelisting on some infrastructure around your server (think AWS security groups, load balancer IP whitelists, that sort of thing). To fix that you need to know the public IP of your server (curl https://canihazip.com/s) and get it added to the whitelist.
I spent some time looking at the ports used by the JVMs on the krt boxes. I see each JVM opens 10 ports.
Five are defined on the command line, for mgmt, http, debug, jmx and ajp. Of the other five I can account for 1 for ActiveMQ and 2 for JDBC. Two are unknown to me: one connects back to the server and the other does not show what it is listening to. The one option I found online is to increase the range of ephemeral ports (we have 32k starting, we can go to 16k). I am not sure how we can dictate the port numbers for the five that are not defined today.
Some commands to describe the situation.
[krtdev7@surya:/env/krtdev7/bin]$krtport KRTDataHistory-1
PORT ASSIGNMENTS:
=================
mgmt/shutdown=17091
http=17291
ajp=17491
jmx=17691
debug=17891
[krtdev7@surya:/env/krtdev7/bin] $ netstat -ap|grep 16831
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 0.0.0.0:17291 0.0.0.0:* LISTEN 16831/java
tcp 0 0 0.0.0.0:17491 0.0.0.0:* LISTEN 16831/java
tcp 0 0 0.0.0.0:36691 0.0.0.0:* LISTEN 16831/java
tcp 0 0 0.0.0.0:40596 0.0.0.0:* LISTEN 16831/java
tcp 0 0 0.0.0.0:17691 0.0.0.0:* LISTEN 16831/java
tcp 0 0 localhost:17091 0.0.0.0:* LISTEN 16831/java
tcp 0 0 0.0.0.0:17891 0.0.0.0:* LISTEN 16831/java
tcp 0 0 surya.internal.su:51631 sky.internal.s:ncube-lm ESTABLISHED 16831/java
tcp 0 0 surya.internal.su:40938 agni.internal.sun:61616 ESTABLISHED 16831/java
tcp 0 0 surya.internal.su:51630 sky.internal.s:ncube-lm ESTABLISHED 16831/java
unix 2 [ ] STREAM CONNECTED 16386441 16831/java
Now we can see that the 5 extra ports are assigned. Could anybody let me know how to control these 5 extra port assignments, i.e. how to make the JVM choose these 5 extra ports from a given range of ports?
I have a problem setting up a Camel netty consumer on port 514 in order to catch syslog messages.
My route:
from("netty:udp://127.0.0.1:514?sync=false")
.process(new Processor(){
public void process(Exchange exchange) throws Exception {
processor.processAntyMalwareLog(exchange);
}
}).log("I've got message");
The application starts:
Route: route3 started and consuming from: Endpoint[udp://127.0.0.1:514]
and port 514 is open, but it is not listening:
>netstat -lnp | grep 514
udp6 0 0 127.0.0.1:514 :::* 21513/java
I can see with tcpdump (tcpdump -i eth1 -nn -A -s 0 port 514 and udp) that the messages are being sent and received properly.
Can anyone point out where I am making a mistake?
You need to use the client mode, e.g. set clientMode=true. See more details in the netty docs:
http://camel.apache.org/netty.html
And upgrade to and use Netty 4 if possible:
http://camel.apache.org/netty4.html
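Applied to the route from the question this would look roughly as follows. This is only a sketch: the netty4 scheme assumes the camel-netty4 component is on the classpath, and whether clientMode is appropriate depends on how the syslog sender connects.

from("netty4:udp://127.0.0.1:514?sync=false&clientMode=true")
    .process(new Processor() {
        public void process(Exchange exchange) throws Exception {
            processor.processAntyMalwareLog(exchange);
        }
    })
    .log("I've got message");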
I'm trying a scenario where my local IP is pinging server1_ip and server2_ip, but it's causing hogging on the server, as there is more than one connection on the same IPs' port, as below:
[root@local ~]# netstat -antup -p|grep 8000
tcp 1 1 ::ffff:local_ip:58972 ::ffff:server1_ip:8000 LAST_ACK -
tcp 1 1 ::ffff:local_ip:49169 ::ffff:server2_ip:8000 LAST_ACK -
tcp 1 0 ::ffff:local_ip:49172 ::ffff:server2_ip:8000 CLOSE_WAIT 25544/java
tcp 1 0 ::ffff:local_ip:58982 ::ffff:server1_ip:8000 CLOSE_WAIT 25544/java
tcp 1 1 ::ffff:local_ip:58975 ::ffff:server1_ip:8000 LAST_ACK -
tcp 1 1 ::ffff:local_ip:49162 ::ffff:server2_ip:8000 LAST_ACK -
There are 2 threads, and on some functionality I need to stop a thread and also close its socket connection on port 8000,
which I'm doing with the following method, which is part of my thread:
protected void disconnect() {
    if (this.mSocket != null) {
        try {
            this.mSocket.shutdownInput();
            this.mSocket.shutdownOutput();
            this.mOutputStream.flush();
            this.mOutputStream.close();
            this.mInputStream.close();
            this.mSocket.close();
        } catch (Exception vException) {
            vException.printStackTrace();
        }
    }
    this.mInputStream = null;
    this.mOutputStream = null;
    this.mSocket = null;
}
But when this method is called, it leaves the connection in the LAST_ACK state.
Please let me know the cause of this and a solution to this problem.
CLOSE_WAIT and LAST_ACK are intermediate states of a TCP connection reached just before closing. The TCP connection should eventually reach the CLOSED state: CLOSE_WAIT -> LAST_ACK -> CLOSED. So what you are seeing in netstat is normal.
See this diagram of TCP connection state transitions: http://www.cs.northwestern.edu/~agupta/cs340/project2/TCPIP_State_Transition_Diagram.pdf
The only issue in your code is that you call flush on the OutputStream after shutting down output. You can also remove the shutdownInput and shutdownOutput calls; they are useful when you want to close only a single direction of the communication, input or output.
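Following that advice, a revised disconnect() might look like this (a sketch: flush and close the streams first, then close the socket, without the shutdown calls):

protected void disconnect() {
    if (this.mSocket != null) {
        try {
            // Flush while the output stream is still writable, then close the
            // streams and the socket; closing starts the normal TCP teardown,
            // after which the connection eventually reaches CLOSED.
            if (this.mOutputStream != null) {
                this.mOutputStream.flush();
                this.mOutputStream.close();
            }
            if (this.mInputStream != null) {
                this.mInputStream.close();
            }
            this.mSocket.close();
        } catch (Exception vException) {
            vException.printStackTrace();
        }
    }
    this.mInputStream = null;
    this.mOutputStream = null;
    this.mSocket = null;
}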