Detect protocol from Java SOCKS socket

I'm developing a custom SOCKS5 server in Java. Other than the first CONNECT message that includes the HOST and PORT, is there any way to inspect the subsequent messages to determine the protocol of the data? For example, if the application data starts with "GET /...", the request is likely HyperText Transfer Protocol (HTTP), but that is far from a complete solution. Is there a way to see if the data is, say, HTTPS, or FTP, or "NetFlix streaming", etc.?
Secondarily, if the data is HTTP or HTTPS, how would I forward the request to a dedicated HTTP proxy?

Is there any way to inspect the subsequent messages to determine the protocol of the data? ... Is there a way to see if the data is, say, HTTPS, or FTP, or "NetFlix streaming", etc.?
Basically you have the destination port, the destination IP address, maybe the hostname (if DNS resolution is also done through the SOCKS5 server) and the payload. Based on knowledge of well-known target hosts, target ports and typical payloads, you could build heuristics to guess the protocol.
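For reference, the destination port, address and optional hostname all arrive in the client's SOCKS5 request (RFC 1928, section 4): VER, CMD, RSV, ATYP, the address, then a two-byte port. A minimal sketch of reading it, assuming method negotiation has already been handled; the class and field names are made up for illustration:

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

/** Destination of a SOCKS5 request (RFC 1928, section 4). Names are illustrative. */
public final class Socks5Request {
    public final int command;       // 1 = CONNECT, 2 = BIND, 3 = UDP ASSOCIATE
    public final String host;       // hostname or textual IPv4/IPv6 address
    public final boolean isDomain;  // true if the client sent a name (ATYP 0x03)
    public final int port;

    private Socks5Request(int command, String host, boolean isDomain, int port) {
        this.command = command;
        this.host = host;
        this.isDomain = isDomain;
        this.port = port;
    }

    /** Reads one request; assumes method negotiation (and auth) already happened. */
    public static Socks5Request read(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        int version = in.readUnsignedByte();
        if (version != 5) throw new IOException("not a SOCKS5 request: version " + version);
        int command = in.readUnsignedByte();
        in.readUnsignedByte(); // RSV, always 0x00
        int addressType = in.readUnsignedByte();
        String host;
        boolean isDomain = false;
        switch (addressType) {
            case 0x01: // IPv4, 4 bytes
                host = InetAddress.getByAddress(readBytes(in, 4)).getHostAddress();
                break;
            case 0x03: // domain name: 1 length byte + name; no DNS lookup happens here
                host = new String(readBytes(in, in.readUnsignedByte()), StandardCharsets.US_ASCII);
                isDomain = true;
                break;
            case 0x04: // IPv6, 16 bytes
                host = InetAddress.getByAddress(readBytes(in, 16)).getHostAddress();
                break;
            default:
                throw new IOException("unsupported address type " + addressType);
        }
        int port = in.readUnsignedShort(); // network byte order
        return new Socks5Request(command, host, isDomain, port);
    }

    private static byte[] readBytes(DataInputStream in, int n) throws IOException {
        byte[] b = new byte[n];
        in.readFully(b);
        return b;
    }
}
```

When `isDomain` is true you also know the hostname the client asked for, which is a useful extra signal for the heuristics.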
You will find such heuristics in today's intrusion detection systems, better firewalls and traffic classifiers; they differ a lot in detection quality, and a determined user can often fool them. This is a very wide topic, but you might start by looking at free deep packet inspection (DPI) libraries like nDPI and reading more about DPI on Wikipedia.
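A deliberately naive version of such heuristics could look like the sketch below. It only looks at the destination port and the first bytes each side sent; the class name, the returned labels and the method list are illustrative assumptions, and real classifiers like nDPI do far more:

```java
import java.nio.charset.StandardCharsets;

/** Toy protocol guesser: destination port plus the first bytes each side sent. */
public final class ProtocolGuess {
    private static final String[] HTTP_METHODS =
            {"GET", "HEAD", "POST", "PUT", "DELETE", "OPTIONS", "PATCH", "CONNECT"};

    public static String guess(int port, byte[] clientFirst, byte[] serverFirst) {
        // TLS record header: content type 0x16 (handshake), then major version 0x03.
        if (clientFirst.length >= 3 && clientFirst[0] == 0x16 && clientFirst[1] == 0x03) {
            return port == 443 ? "HTTPS (TLS)" : "TLS, inner protocol unknown";
        }
        String client = latin1(clientFirst);
        for (String method : HTTP_METHODS) {
            if (client.startsWith(method + " ")) return "HTTP";
        }
        // Server-speaks-first protocols identify themselves in their greeting.
        String server = latin1(serverFirst);
        if (server.startsWith("SSH-")) return "SSH";
        if (server.startsWith("220")) {
            if (port == 21) return "FTP";
            if (port == 25 || port == 587) return "SMTP";
            return "FTP or SMTP";
        }
        // Nothing recognisable in the payload: fall back to the port alone.
        switch (port) {
            case 80: return "HTTP?";
            case 443: return "HTTPS?";
            default: return "unknown";
        }
    }

    private static String latin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1); // one char per byte
    }
}
```

Note that TLS hides the inner protocol: past the handshake, port and hostname (for example the SNI) are most of what is left to go on.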
Secondarily, if the data is HTTP or HTTPS, how would I forward the request to a dedicated HTTP proxy?
First, change the target from the one requested by the client to the proxy. This must of course be done before any data is transferred, which might conflict with the DPI you do on the data stream: some protocols (like SMTP) start with data from the server, while others like HTTP(S) start with data from the client. Thus you probably need to decide whether this is HTTP(S) before seeing any payload, i.e. based only on the target port. For HTTPS you would then need to establish a tunnel using a CONNECT request as described in RFC 2817. For HTTP you would modify the request to include not only the path but the full URL (i.e. http://host[:port]/path).
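The two HTTP(S) cases could be sketched as follows, assuming the upstream proxy was already chosen from the target port. `openTunnel` sends a CONNECT request (RFC 2817, section 5.2) and, on a 2xx reply, returns a socket you can splice to the client's TLS bytes; `toAbsoluteForm` rewrites a plain HTTP request line. All names are hypothetical, and IPv6 literals and proxy authentication are not handled:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Helpers for handing SOCKS traffic to an HTTP proxy. Names are illustrative. */
public final class UpstreamProxy {
    /** CONNECT request in authority-form (IPv6 literals would need [brackets]). */
    public static String connectRequest(String host, int port) {
        String authority = host + ":" + port;
        return "CONNECT " + authority + " HTTP/1.1\r\nHost: " + authority + "\r\n\r\n";
    }

    /** Opens a tunnel; on success the socket carries the client's TLS bytes verbatim. */
    public static Socket openTunnel(String proxyHost, int proxyPort, String host, int port)
            throws IOException {
        Socket socket = new Socket(proxyHost, proxyPort);
        try {
            OutputStream out = socket.getOutputStream();
            out.write(connectRequest(host, port).getBytes(StandardCharsets.US_ASCII));
            out.flush();
            String statusLine = readHeaders(socket.getInputStream()).split("\r\n", 2)[0];
            if (!statusLine.matches("HTTP/1\\.[01] 2\\d\\d.*")) {
                throw new IOException("proxy refused tunnel: " + statusLine);
            }
            return socket;
        } catch (IOException e) {
            socket.close();
            throw e;
        }
    }

    /** Reads up to and including the blank line, one byte at a time, so no tunnel data is consumed. */
    public static String readHeaders(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int state = 0; // progress through "\r\n\r\n"
        int b;
        while ((b = in.read()) != -1) {
            buf.write(b);
            if (b == '\n' && (state == 1 || state == 3)) state++;
            else if (b == '\r') state = (state == 2) ? 3 : 1;
            else state = 0;
            if (state == 4) return new String(buf.toByteArray(), StandardCharsets.US_ASCII);
        }
        throw new IOException("connection closed before end of headers");
    }

    /** Plain HTTP: "GET /path HTTP/1.1" becomes "GET http://host[:port]/path HTTP/1.1". */
    public static String toAbsoluteForm(String requestLine, String host, int port) {
        String[] parts = requestLine.split(" ", 3); // method, target, version
        if (parts.length != 3 || !parts[1].startsWith("/")) return requestLine; // leave others alone
        String authority = port == 80 ? host : host + ":" + port;
        return parts[0] + " http://" + authority + parts[1] + " " + parts[2];
    }
}
```

Reading the proxy's reply byte by byte is slower than buffering, but it guarantees that the first bytes of the tunnelled stream are not swallowed by a reader's buffer.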
As you can see, all of this relies on heuristics which work in most but not all cases. Apart from that, this can be a very complex task, depending on the quality of traffic classification you need.

Related

Was HTTP request received over TCP or UDP?

Is there a way in Java to know if an HTTP request was received over TCP or over UDP?
Quote from RFC 2616:
HTTP communication usually takes place over TCP/IP connections. The
default port is TCP 80 [19], but other ports can be used. This does
not preclude HTTP from being implemented on top of any other protocol
on the Internet, or on other networks. HTTP only presumes a reliable
transport; any protocol that provides such guarantees can be used;
the mapping of the HTTP/1.1 request and response structures onto the
transport data units of the protocol in question is outside the scope
of this specification.
I would say this rules out plain UDP. Other reliable transport protocols would still be possible.
As @ceekay says, the RFC states that HTTP presumes a reliable transport, so plain UDP is out. But one could build some other reliable protocol on top of UDP (which is what HTTP/3 does with QUIC), or not use the TCP/IP stack at all.
But since your question is about Java, the answer depends on the Java libraries and frameworks you use. All the libraries I know, like HtmlUnit (http://htmlunit.sourceforge.net), hide this information from you, so you deal with HTTP(S) only, without knowing the details of the underlying transport. In theory, though, some library could expose this information.
But I don't really see why this would matter to you (in the vast majority of cases HTTP will use TCP). If you tell us why you are asking, maybe we can give you a more specific answer.

Does Fiddler work as reverse proxy in the following scenario? If not any other tool to use for debugging?

I use Fiddler a lot (especially the composer and reverse proxy features) in web debugging. I am able to successfully send traffic to my reverse proxy server ("myreverseproxy:8888"). This basically uses custom rules explained here, and it works in most cases.
However, I tried to redirect traffic in the same way for one of our third-party DLLs. These are native code, so I can't decompile them to see what's happening. I don't have the source, but it connects to a host on a specified port, and I wanted to observe the packets to take a closer look at headers, authentication etc.
But in this particular case the reverse proxy is not working. Probably the third-party DLL is not using the HTTP stack.
My questions:
Is there anything I can do in this particular case to understand how this external native assembly connects to the server? Or can Fiddler not be used here because it probably can't intercept this traffic?
If that is the case, what other tools can I use to intercept the traffic?
Update
Here are the things I observed:
From the client side, Fiddler is unable to capture the packets (probably because the APIs Fiddler relies on work at a higher level).
Then I set up a reverse proxy to make sure I catch the traffic at the 'reverse proxy server'; however, Fiddler still couldn't catch the traffic. Thanks to Lex, it looks like Fiddler won't even show the packet if it can't decode the payload as HTTP (I'm not entirely sure, but that's my conjecture from Lex's response and other posts).
Then I installed Wireshark and captured the packets; by default Wireshark showed the interactions between client and server as TLSv1, which explains why Fiddler couldn't catch them.
Just to check whether the default decoder assigned by Wireshark was wrong, I applied the HTTP decoder, but it couldn't decode any details.
So this confirms the native module is using raw TLSv1 rather than HTTPS, which explains why I can't use Fiddler.
If the protocol is known, you can use Wireshark to capture and analyze the packets.
If the protocol is proprietary, you only get raw bytes, which take much more effort to analyze.
Fiddler is designed for HTTP based protocols only, so it is not as general purpose as Wireshark.

Jetty - proxy server with dynamic registration

We have a number of Jetty http(s) servers, all behind different firewalls. The http servers are at customer sites (not under our control). Opening ports in the firewalls at these sites is not an option. Right now, these servers only serve JSON documents in response to REST requests.
We have web clients that need to interact with a given http server based on URL parameter or header value.
This seems like a straightforward proxy server situation - except for the firewall.
The approach that I'm currently trying is this:
Have a centralized proxy server (also Jetty based) that listens for inbound registration requests from the remote HTTP servers. The registration request will take the form of a WebSocket connection, which will be kept alive as long as the remote HTTP server is available. On registration, the proxy server will capture the WebSocket connection and map it to a resource identifier.
The web client will connect to the proxy server, and include the resource identifier in the URL or header.
The proxy server will determine the appropriate Websocket to use, then pass the request on to the HTTP server. So the request and response will travel over the Websocket. Once the response is received, it will be returned to the web client.
So this is all well and good in theory - what I'm trying to figure out is:
a) is there a better way to achieve this?
b) What's the best way to set up Jetty to do the proxying on the HTTP Server end of the pipe?
I suppose that I could use Jetty's HttpClient, but what I really want to do is just pull the HTTP bytes from the websocket and pipe them directly into the Jetty connector. It doesn't seem to make sense to parse everything out. I suppose that I could open a regular socket connection on localhost, grab the bytes from the websocket, and do it that way - but it seems silly to route through the OS like that (I'm already operating inside the HTTP Server's Jetty environment).
It sure seems like this is the sort of problem that may have already been solved... Maybe by using a custom jetty Connection that works on WebSockets instead of TCP/IP sockets?
Update: as I've been playing with this, it seems like another tricky problem is how to handle request/response behavior (and ideally support muxing over the websocket channel). One potential resource that I've found is the WAMP sub-protocol for websockets: http://wamp.ws/
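Short of adopting WAMP, one common way to get request/response semantics and muxing over a single WebSocket is to tag every request with an id and complete a future when a reply carrying the same id comes back. A transport-agnostic sketch: the frame format (id, a newline, then the payload) and the `send` callback are assumptions to be wired to whichever WebSocket API you use:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;
import java.util.function.Function;

/**
 * Request/response correlation over one bidirectional message channel.
 * Assumed frame format: "<id>\n<payload>"; `send` writes one text frame.
 */
public final class ChannelMux {
    private final AtomicLong nextId = new AtomicLong();
    private final Map<Long, CompletableFuture<String>> pending = new ConcurrentHashMap<>();
    private final Consumer<String> send;

    public ChannelMux(Consumer<String> send) {
        this.send = send;
    }

    /** Proxy side: tag the request with a fresh id; the future completes on the matching reply. */
    public CompletableFuture<String> request(String payload) {
        long id = nextId.incrementAndGet();
        CompletableFuture<String> reply = new CompletableFuture<>();
        pending.put(id, reply); // register before sending so a fast reply is not lost
        send.accept(id + "\n" + payload);
        return reply;
    }

    /** Call from the WebSocket's message callback with every incoming frame. */
    public void onFrame(String frame) {
        int newline = frame.indexOf('\n');
        if (newline < 0) return; // not one of ours
        CompletableFuture<String> reply = pending.remove(Long.parseLong(frame.substring(0, newline)));
        if (reply != null) reply.complete(frame.substring(newline + 1));
    }

    /** Remote (HTTP server) side: answer a request frame, echoing its id. */
    public static String answer(String frame, Function<String, String> handler) {
        int newline = frame.indexOf('\n');
        return frame.substring(0, newline + 1) + handler.apply(frame.substring(newline + 1));
    }

    /** Fail all outstanding requests, e.g. when the remote server's WebSocket closes. */
    public void closeAll(Throwable cause) {
        for (Long id : pending.keySet()) {
            CompletableFuture<String> reply = pending.remove(id);
            if (reply != null) reply.completeExceptionally(cause);
        }
    }
}
```

Because replies are matched by id rather than by order, the remote server may answer concurrent requests in any order; the `handler` passed to `answer` is where something like the RESTEasy mock dispatcher could plug in.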
In case anyone else is looking for an answer to this one - RESTEasy has a mocking framework that can be used to invoke the REST functionality without running through a full servlet container: http://docs.jboss.org/resteasy/docs/2.0.0.GA/userguide/html_single/index.html#RESTEasy_Server-side_Mock_Framework
This, combined with WAMP, appears to do what I'm looking for.

Intercept HTTP requests on linux

I need something that can intercept HTTP requests, extract their information (content, destination, ...), perform various analysis tasks, and finally determine whether the request should be dropped. Legitimate requests must then be forwarded to the application.
Basically, the same functionality as an IDS. But mind you, I am NOT looking for a packet sniffer/filter; I want something that operates at the HTTP level.
It should be implementable on linux and run on the same system as the application(s) to which the requests are headed.
As a bonus, HTTPS could be supported (viewing the decrypted request content).
Try mitmproxy.
mitmproxy is an SSL-capable man-in-the-middle proxy for HTTP. It provides a console interface that allows traffic flows to be inspected and edited on the fly.
mitmdump is the command-line version of mitmproxy, with the same functionality but without the user interface. Think tcpdump for HTTP.
Features
Intercept HTTP requests and responses and modify them on the fly.
Save complete HTTP conversations for later replay and analysis.
Replay the client side of an HTTP conversation.
Replay HTTP responses of a previously recorded server.
Reverse proxy mode to forward traffic to a specified server.
Make scripted changes to HTTP traffic using Python.
SSL certificates for interception are generated on the fly.
Example
I set up an example Jekyll Bootstrap app which is listening on port 4000 on my localhost. To intercept its traffic I'd do the following:
% mitmproxy --mode reverse:http://localhost:4000 -p 4001
Then I connect to my mitmproxy on port 4001 from my web browser (http://localhost:4001), and the browser's requests show up in mitmproxy.
You can then select any of the GET results to see the header info associated with that GET.
Try using Burp Proxy; it sounds like what you need.
I use Wireshark for this; if you provide the server's private keys, it can even decrypt HTTPS.
You should be able to use squid proxy for that (https://en.wikipedia.org/wiki/Squid_(software))
You should learn more about ICAP, then turn your HTTP filtering application into an ICAP server.
I ended up using LittleProxy because it is java, fast and lightweight.
It is originally a forward proxy, so I had to adapt it for reverse proxy functionality by forwarding every request to localhost.
I did this simply by editing the HttpRequestHandler, where I hardcoded the host and port:
hostAndPort = "localhost:80";
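For comparison, the same forward-everything-to-localhost idea, plus the drop/forward decision from the question, can be sketched with only the JDK (`com.sun.net.httpserver.HttpServer` and `java.net.http.HttpClient`, Java 11+). This is not LittleProxy's API; the ports, the `UPSTREAM` constant and the `isAllowed` policy are placeholders, whole bodies are buffered in memory, and HTTPS is not handled:

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Executors;

/** Listens on 8080, inspects each request, forwards allowed ones to the local app. */
public final class FilteringReverseProxy {
    static final String UPSTREAM = "http://localhost:80"; // assumption: the protected application
    // Hop-by-hop headers and headers that HttpClient/HttpServer set themselves.
    static final Set<String> SKIP = Set.of("connection", "keep-alive", "proxy-connection", "te",
            "trailer", "transfer-encoding", "upgrade", "host", "content-length", "expect");

    /** Placeholder policy standing in for the real analysis. */
    public static boolean isAllowed(String method, URI uri) {
        String path = uri.getRawPath();
        return !Set.of("TRACE", "CONNECT").contains(method)
                && path != null && path.startsWith("/") && !path.contains("..");
    }

    public static void main(String[] args) throws IOException {
        HttpClient client = HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build();
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/", exchange -> handle(client, exchange));
        server.setExecutor(Executors.newCachedThreadPool());
        server.start();
    }

    static void handle(HttpClient client, HttpExchange ex) throws IOException {
        try {
            byte[] body = ex.getRequestBody().readAllBytes();
            URI in = ex.getRequestURI();
            if (!isAllowed(ex.getRequestMethod(), in)) {
                ex.sendResponseHeaders(403, -1); // dropped: no body
                return;
            }
            String target = UPSTREAM + in.getRawPath() + (in.getRawQuery() == null ? "" : "?" + in.getRawQuery());
            HttpRequest.Builder req = HttpRequest.newBuilder(URI.create(target))
                    .method(ex.getRequestMethod(), body.length == 0
                            ? HttpRequest.BodyPublishers.noBody()
                            : HttpRequest.BodyPublishers.ofByteArray(body));
            for (Map.Entry<String, List<String>> h : ex.getRequestHeaders().entrySet()) {
                if (SKIP.contains(h.getKey().toLowerCase(Locale.ROOT))) continue;
                for (String value : h.getValue()) {
                    try {
                        req.header(h.getKey(), value);
                    } catch (IllegalArgumentException restricted) {
                        // HttpClient rejects some headers; the exact set depends on the JDK version.
                    }
                }
            }
            HttpResponse<byte[]> resp;
            try {
                resp = client.send(req.build(), HttpResponse.BodyHandlers.ofByteArray());
            } catch (IOException | InterruptedException e) {
                if (e instanceof InterruptedException) Thread.currentThread().interrupt();
                ex.sendResponseHeaders(502, -1); // upstream unreachable
                return;
            }
            resp.headers().map().forEach((name, values) -> {
                if (!SKIP.contains(name.toLowerCase(Locale.ROOT))) {
                    ex.getResponseHeaders().put(name, new ArrayList<>(values));
                }
            });
            byte[] out = resp.body();
            ex.sendResponseHeaders(resp.statusCode(), out.length == 0 ? -1 : out.length);
            if (out.length > 0) {
                try (OutputStream os = ex.getResponseBody()) {
                    os.write(out);
                }
            }
        } finally {
            ex.close();
        }
    }
}
```

The application would then listen only on localhost, so every outside request has to pass through `isAllowed` first.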
Why not Apache HttpClient (http://hc.apache.org/httpclient-legacy/tutorial.html)?
This simple library is useful.

How to detect SSL gracefully

I've got a web service which may be bound either to SSL or to plain HTTP. The Java clients are configured with the server host and port. When a client connects, I construct the server endpoint like http://host:port/service. Clients don't know whether the server is using SSL; the server always binds to a single port, so it's either secure or not. Now, the question is how to make a client discover this without introducing another parameter. Can I try a plain HTTP request and then fall back to SSL (or vice versa) on a certain exception? Or must I explicitly introduce a new connection parameter for the clients?
On the server side, you could use a mechanism like Grizzly's port unification implementation. This can be used to serve HTTP and HTTPS on the same port. This relies on the fact that in both cases, the client talks first and either sends an HTTP request or an SSL/TLS Client Hello message. It's quite handy for this on the server side (although I'm not sure I'd recommend running two protocols on the same port in general).
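The core of port unification is peeking at the first byte the client sends: a TLS record carrying a ClientHello starts with 0x16 (handshake), while an HTTP/1.x request starts with an upper-case ASCII method name. A minimal sketch of that check; it is not Grizzly's implementation, and the class and enum names are made up:

```java
import java.io.IOException;
import java.io.PushbackInputStream;

/** Server-side check: does this connection start with TLS or with plain HTTP? */
public final class PortUnification {
    public enum Kind { TLS, PLAIN_HTTP, UNKNOWN }

    /**
     * Peeks at the first byte and pushes it back. A TLS ClientHello record starts with
     * 0x16 (handshake); an HTTP/1.x request starts with an upper-case method name.
     */
    public static Kind sniff(PushbackInputStream in) throws IOException {
        int first = in.read(); // blocks until the client sends something (it always talks first)
        if (first == -1) return Kind.UNKNOWN;
        in.unread(first);
        if (first == 0x16) return Kind.TLS;
        if (first >= 'A' && first <= 'Z') return Kind.PLAIN_HTTP;
        return Kind.UNKNOWN;
    }
}
```

Typical wiring would be `new PushbackInputStream(socket.getInputStream())` right after `accept()`; the HTTP branch can parse from that stream directly. For the TLS branch, `SSLSocketFactory.createSocket(Socket, InputStream consumed, boolean autoClose)` (Java 8+) lets a server-mode `SSLSocket` take over bytes that were already read from the raw socket.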
From the client's point of view (which is what you're asking about), the consequences of that are:
The fact that the client talks first means that it will always have to try first. Expect an exception of some sort if you try to talk SSL/TLS to a plain HTTP service and vice versa.
If the server uses port unification, there is no way you're going to be able to find out reliably.
Port unification aside (this is a rare case after all), you could try to cache results of past attempts.
More fundamentally, from a security point of view, not knowing which protocol should be used introduces a vulnerability: your system will be open to downgrade attacks (in a similar way as blindly relying on automatic redirects would). If your user-agent supports HSTS, it would be worth looking into that (although it would require the user-agent to remember which sites are to be used with HTTPS).
Either way, if you're concerned about security, you must configure the client to know when to use https://.
