Apache HttpClient, User-Agent, and corporate proxy - java

This is somewhat of a speculative question in that the answer may not be apparent in the info I have available, but I am hoping that someone with sufficient experience will recognize a likely answer based on common practices for corporate proxies.
I work (not as a software developer) behind a corporate proxy. In my spare time I was messing around with a Java program I'm developing. This program needs to make a few very simple HTTP GET requests, and I'm using Apache HttpClient for that. I was concerned at first about whether I'd make it through the proxy server. In our web browsers, the proxy server is simply entered into the network settings... no authentication needed. So, I added the following to my Java program:
myClient.getParams().setParameter(ConnRoutePNames.DEFAULT_PROXY, MY_PROXY);
Sure enough, it worked! However, I had another concern. The HTTP requests coming from my program probably had some strange User-Agent specified (I've since confirmed this is the case), and I did not want them to ever trigger any sort of suspicion in automated or manual packet inspections. So I said to myself, "why not just set the User-Agent header to be the same as the browser on this machine?"
myClient.getParams().setParameter(CoreProtocolPNames.USER_AGENT, BROWSER_AGENT);
Here is where it gets weird. If the BROWSER_AGENT string above is set to exactly the same value as the corporate-supplied browser on my machine (either IE or FF), I get an "authentication failed, missing credentials" type error message back from the corporate proxy server. But if I set the User-Agent header to something generic, like, say, Mozilla/5.0, or even a totally bogus string, or even an empty string, it all works fine! The parts that confuse me are:
When User-Agent is set to the same as my browser (a long complex string), I "fail authentication" somehow, which makes no sense since in the real browser I provide no authentication information (unless it comes from some pre-installed certificate maybe?)
If the corporation requires authentication for any requests sent to the proxy server on port 80, then how come they let random User-Agent strings get through? Oversight? Some other reason I can't comprehend?
Hopefully this question is not too speculative to be deemed constructive. I'd love to hear from people with experience in this area. Thanks.

By default, HttpClient identifies itself in the User-Agent header (a string along the lines of Apache-HttpClient/4.x). As you have seen, you can override this with any string you want.
It looks like your proxy server is configured to automatically add user credentials based on browser type, but at some point your admin added an exception rule: when the user agent is not recognized, just let it through. Personally, I think that is a very bad security policy, since, as you found out, any program can get through your proxy without authentication just by using a bogus User-Agent.
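For reference, a minimal sketch of overriding both settings with the HttpClient 4.3+ builder API; the proxy host, port, and target URL are placeholders, not the asker's actual values:

import org.apache.http.HttpHost;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public class ProxyClient {
    public static void main(String[] args) throws Exception {
        // Route every request through the corporate proxy (hypothetical host/port)
        // and present a browser-style User-Agent instead of the library default.
        try (CloseableHttpClient client = HttpClients.custom()
                .setProxy(new HttpHost("proxy.example.com", 8080))
                .setUserAgent("Mozilla/5.0")
                .build()) {
            System.out.println(client.execute(new HttpGet("http://example.com/"))
                    .getStatusLine());
        }
    }
}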

Related

Same HTTP Requests Give Different Results?

What differentiates these two requests, causing them to get different results/responses from the server even though they should be the same?
1. A request initiated by Chrome after a simple click/navigation (successful, response code 302).
2. The same request copied as cURL and imported into Postman, where it hung.
I did the same with Java's HttpURLConnection (mimicking all the request headers and cookies that Chrome sent), but it also hung and waited forever. Is this simply server logic that doesn't accept non-browser clients?
Here are the steps that I tried:
1. Visited this link: https://www.tokopedia.com/p/handphone-tablet/handphone
2. I opened the inspector and opened the Network - All tab
3. I clicked one of the products
4. I clicked the top request from the Network - All tab
5. I copied it as cURL bash
6. I imported it to Postman
7. I ran that request
8. Postman hung
Actually the problem might even go deeper than what the other answers say.
So neither setting the User-Agent request header nor using telnet may solve the problem (unless you also perform the TLS handshake manually over telnet, which is near impossible to complete).
TLS fingerprinting
If the connection is an SSL/TLS connection, the server can observe how the client sets up the handshake, and most applications have their own specific signature / cipher list.
So by the TLS handshake alone you can tell Chrome from Postman, Firefox, or Java. Java usually (unless a JVM implementation REALLY wants to go off-road) has the same signature across all platforms, using the same ciphers/algorithms across all implementations.
The technique is called JA3. Salesforce published an article about JA3 analysis; they describe the technique and show a list of signatures and applications so you can guesstimate which app you're talking to, without even needing to decrypt the data: https://engineering.salesforce.com/tls-fingerprinting-with-ja3-and-ja3s-247362855967
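To make this concrete, here is a small sketch using plain JSSE (no external libraries) that prints the protocol versions and cipher suites this JVM will offer in its ClientHello; these offered values, together with extensions and curve order, are among the inputs a JA3 hash is computed from. The target host is a placeholder:

import java.util.Arrays;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class Ja3Inputs {
    public static void main(String[] args) throws Exception {
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        try (SSLSocket socket = (SSLSocket) factory.createSocket("www.example.com", 443)) {
            // What this JVM will offer during the handshake; a server can
            // fingerprint these offered values to recognize "a Java client".
            System.out.println(Arrays.toString(socket.getEnabledProtocols()));
            System.out.println(Arrays.toString(socket.getEnabledCipherSuites()));
            socket.startHandshake();
        }
    }
}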
My Solution
I had the same problem too: I wanted to scan the NVidia and AMD servers for graphics-card availability. It did not work from Java, so after a lot of research (and finding the project mentioned above), I simply used Selenium to control Firefox. That got the proper server responses, and I achieved my goal this way.
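As a rough illustration of that workaround (Selenium and geckodriver must be installed; the URL is a placeholder, not the actual vendor page):

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class BrowserFetch {
    public static void main(String[] args) {
        // A real Firefox process performs the TLS handshake, so the server
        // sees a genuine browser fingerprint rather than Java's.
        WebDriver driver = new FirefoxDriver();
        try {
            driver.get("https://www.example.com/product-page");
            System.out.println(driver.getPageSource().length());
        } finally {
            driver.quit();
        }
    }
}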
The only way to be sure that the exact same data is sent is to send it manually yourself through something like telnet. I had a similar problem once: it turned out that the browser was sending the data in one big chunk, while my code was sending it line by line. No site should have this problem, but it's possible that yours does.
The server might be checking the User-Agent request header and blocking traffic that does not originate from a browser. Try setting the header in curl or in your Java code to a value corresponding to (any) browser. I've encountered such behavior on some e-shops and commercial websites.
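For example, a sketch with HttpURLConnection (the class the asker used), sending a browser-like User-Agent instead of Java's default "Java/..." value; note that if the server also fingerprints TLS, as described above, this alone will not help:

import java.net.HttpURLConnection;
import java.net.URL;

public class BrowserAgent {
    public static void main(String[] args) throws Exception {
        URL url = new URL("https://www.tokopedia.com/p/handphone-tablet/handphone");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // Present a desktop-browser User-Agent string.
        conn.setRequestProperty("User-Agent",
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36");
        System.out.println(conn.getResponseCode());
    }
}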

How to put my own proxy between any client and any server (via a web page)

What I want to do is build a web application (a proxy) that a user uses to request the web page he wants, and
my application forwards the request to the main server,
modifies the HTML code,
and sends the modified version back to the client.
The question now is: how do I keep my application between the client and the main server
(for example, when the user clicks any link inside the modified page,
makes an AJAX request, submits a form, and so on)?
In other words:
How do I guarantee that every request from the client (after the first URL request) is sent to my proxy, and that every response comes to my proxy first?
The question is: why do you need a proxy? Why do you want to build one yourself instead of using an already existing one like HAProxy?
EDIT: sorry, I didn't read your whole post correctly. You can start with:
http://www.jtmelton.com/2007/11/27/a-simple-multi-threaded-java-http-proxy-server/
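If you do want to build one yourself to experiment, the core of such a proxy is just a forwarding loop. Below is a deliberately minimal sketch under simplifying assumptions (GET only, HTTP/1.0, no keep-alive, no HTTPS/CONNECT support, Java 9+ for transferTo; port 8888 is arbitrary):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.URL;

public class MiniProxy {
    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(8888)) {
            while (true) {
                try (Socket client = server.accept()) {
                    handle(client);
                } catch (IOException ignored) {
                    // keep serving the next client
                }
            }
        }
    }

    static void handle(Socket client) throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(client.getInputStream()));
        // Proxy clients send the absolute-form request line,
        // e.g. "GET http://host/path HTTP/1.1".
        String requestLine = in.readLine();
        if (requestLine == null) return;
        URL url = new URL(requestLine.split(" ")[1]);
        int port = url.getPort() == -1 ? 80 : url.getPort();
        String path = url.getFile().isEmpty() ? "/" : url.getFile();
        try (Socket upstream = new Socket(url.getHost(), port)) {
            Writer out = new OutputStreamWriter(upstream.getOutputStream());
            out.write("GET " + path + " HTTP/1.0\r\n"
                    + "Host: " + url.getHost() + "\r\n\r\n");
            out.flush();
            // To rewrite the HTML, buffer and modify the response here
            // instead of piping the raw bytes straight through.
            upstream.getInputStream().transferTo(client.getOutputStream());
        }
    }
}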
If the user is willing to, or can be forced [1] to, configure his clients (e.g. web browser) to use a web proxy, then your problem is already solved. Another way to do this (assuming that the user is cooperative) is to get them to install a trusted browser plugin that dynamically routes selected URLs through your proxy. But you can't do this using an untrusted webapp: the browser sandbox won't (shouldn't) let you.
Doing it without the user's knowledge and consent requires some kind of interference at the network level. For example, a "smart" switch could recognize TCP/IP packets on port 80 and deliberately route them to your proxy instead of to the IP address that the client's browser specified. This kind of thing is known as "deep packet inspection". It would be very difficult to implement yourself, and it requires significant compute power in your network switch if you are going to achieve high network rates through the switch.
The second problem is that making meaningful on-the-fly modifications to arbitrary HTML + Javascript responses is a really difficult problem.
The final problem is that this is only going to work with HTTP. HTTPS protects against "man in the middle" attacks ... such as this ... that monitor or interfere with the requests and responses. The best you could hope to do would be to capture the encrypted traffic between the client and the server.
[1] The normal way to force a user to do this is to implement a firewall that blocks all outgoing HTTP connections apart from those made via your proxy.
UPDATE
The problem now is what I should change in the HTML code to force the client to request everything through my app. For example, a link's href attribute may become www.aaaa.com?url=www.google.com, but what should I do for AJAX and forms?
Like I said, it is a difficult task. You have to deal with the following problems:
Finding and updating absolute URLs in the HTML. (Not hard; see the jsoup sketch after this list.)
Finding and dealing with the base URL (if any). (Not hard)
Dealing with the URLs that you don't want to change; e.g. links to CSS, javascript (maybe), etc. (Harder ...)
Dealing with HTML that is syntactically invalid ... but not to the extent that the browser can't cope. (Hard)
Dealing with cross-site issues. (Uncertain ...)
Dealing with URLs in requests being made by javascript embedded in / called from the page. This is extremely difficult, given the myriad ways that javascript could assemble the URL.
Dealing with HTTPS. (Impossible to do securely; i.e. without the user not trusting the proxy to see private info such as passwords, credit card numbers, etc that are normally sent securely.)
and so on.
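For the first item in that list, here is a sketch using the jsoup HTML parser; the proxy prefix URL is hypothetical. Note it covers only static anchor links, which is why the javascript item above remains so much harder:

import java.net.URLEncoder;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class LinkRewriter {
    // Rewrites every <a href> in the page so it routes back through the proxy.
    static String rewrite(String html, String baseUrl) throws Exception {
        String proxyPrefix = "https://myproxy.example.com/fetch?url="; // hypothetical
        Document doc = Jsoup.parse(html, baseUrl);
        for (Element link : doc.select("a[href]")) {
            String absolute = link.attr("abs:href"); // resolved against baseUrl
            link.attr("href", proxyPrefix + URLEncoder.encode(absolute, "UTF-8"));
        }
        return doc.outerHtml();
    }
}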

HttpResponse body is being altered

We are facing a peculiar issue at the moment and we have no clue what is causing this.
We have a web-service hosted on serverA.
When this web-service is invoked from serverB (using the command, curl http://serverA:8008/service/getId), we get the required response. (the web service returns an Id which is an integer).
When the same web-service is invoked from serverC, we get the required response but the digit 2 in the response is getting replaced by _ .
For example, we get 5002 when the web-service is invoked from serverB.
When the same web service is invoked from serverC, we get 500_
We checked the wireshark details from serverA and the data going out from serverA is the same for both the servers.
We have no clue at the moment why this is happening. I would like to add that serverC is in DMZ while serverB is not.
Any input/help in this regard is highly appreciated.
Gathering the facts:
1. The server doesn't change the response on its own.
2. The web service gives the same response for the same input.
The only remaining culprit is your firewall. Can you stop it for testing purposes and see whether the response comes through as expected? Or
try checking the firewall settings and creating a hole/exception for the web service.
Thanks everyone for your efforts; the issue is now resolved. It was an incorrect firewall rule that was causing this. I asked our network engineer how a firewall setting can alter an HTTP response body, and the following is the reply I got:
For certain protocols the firewall does deep-level packet inspection, so rather than just checking the port number it actually looks into the payload. This allows it to block malware, malformed packets that might be exploiting a vulnerability, and the like. So that it knows what to inspect, you have to specify in the rule what the traffic is: you say it's on port 8008 and it's HTTP. The problem was that for some reason this rule had been set to use port 8008, but the traffic type was set to passive-mode FTP rather than HTTP. Once I corrected it to HTTP, it started working.
Try putting serverB in the DMZ too and see what happens.
If it acts the same, it's a network issue.
If not, you might have two different versions of the code on the servers.
This sounds to me like you have special characters in your URL and they cause the overwriting of the port number, but only if the characters are recognized in the character set. Can you use a hex editor to check the URL for special characters (backspace, specifically)?
I can't solve your problem, but look for any transcoders on the path.
Send a request from serverC to serverA.
1) Wireshark at A, to see whether it receives the request correctly. A possible transcoder may convert host-less URLs to host-ful ones (GET /service/getId to GET http://serverA:8008/service/getId), or may drop the Host header, etc. If you see nothing wrong here, proceed to step 2.
2) Wireshark at C, to see whether the response arrives valid. Look at whether Content-Type is set correctly. If it is set correctly and the response is still getting manipulated, try adding the header Cache-Control: no-transform; many transcoders respect it. If this also fails and you can't remove whatever transcoders or viruses you may have, go to step 3.
3) Just go HTTPS; it is immune to such things.
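The header from step 2 can be set on the origin server for every response with a simple servlet filter; a sketch (intermediaries are not obliged to honor no-transform, but well-behaved transcoders do):

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletResponse;

public class NoTransformFilter implements Filter {
    public void init(FilterConfig config) {}

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        // Ask intermediaries (proxies, transcoders) not to modify the payload.
        ((HttpServletResponse) res).setHeader("Cache-Control", "no-transform");
        chain.doFilter(req, res);
    }

    public void destroy() {}
}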
This is a feature of Apache, designed to hide parts of the HTTP response.
I did not see a fix immediately, and do not have the time to look right now. I'll try to edit one in later.
If you want to try to find it, here is the link to the documentation: http://xianshield.org/guides/apache2.0guide.html
Use [Ctrl] + [F] to find this statement (without quotes): "Configure and build the Apache Server"

Security matter: are parameters in url secure?

Over the last few months I have been teaching myself web development in Java (servlets and JSP). I am developing a web server, which mainly serves an application; it is running on Google App Engine. My concern is that although I am using SSL connections, sending parameters in the URL (e.g. https://www.xyz.com/server?password=1234&username=uname) may not be secure. Should I use another way, or is it really secure? I don't know whether this URL is delivered as plain text as a whole (with the parameters)?
Any help would be appreciated!
Everything is encrypted, including the URL and its parameters. You might still avoid them because they might be stored in server-side logs and in the browser history, though.
Your problem goes further than the web server and Google App Engine.
Sending a password through a web form to your server is a very common security concern. See these SO threads:
Is either GET or POST more secure than the other? (namely, POST simply does not display the parameters in the URL, so that alone is not enough)
Are https URLs encrypted? (describes something similar to what you intend to do)
The complete HTTP request including the request line is encrypted inside SSL.
Example http request for the above URL which will all be contained within the SSL tunnel:
GET /server?password=1234&username=uname HTTP/1.1
Host: www.xyz.com
...
It is possible, though, that your application will log the requested URL; as this contains the user's password, that may not be OK.
Well, apart from the issues to do with logging and visibility of URLs (i.e., what happens before and after the secure communication) both GET and POST are equally secure; there is very little information that is exchanged before the encrypted channel is established, not even the first line of the HTTP protocol. But that doesn't mean you should use GET for this.
The issue is that logging in is changing the state of the server and should not be repeated without the user getting properly notified that this is happening (to prevent surprises with Javascript). The state that is being changed is of the user session information on the server, because what logging in does is associate a verified identity with that session. Because it is a (significant) change of state, the operation should not be done by GET; while you could do it by PUT technically, POST is better because of the non-idempotency assumptions associated with it (which in turn encourages browsers to pop up a warning dialog).
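For comparison, here is what the same credentials look like sent as a POST body instead of in the query string, following the format of the GET example above. Either way the message is encrypted under TLS, but the POST form keeps the password out of URL-based logs and browser history:

POST /server HTTP/1.1
Host: www.xyz.com
Content-Type: application/x-www-form-urlencoded
Content-Length: 28

password=1234&username=uname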

Secure connection between client and server

I'm developing a server component that will serve requests for an embedded client, which is also under my control.
Right now everything is beta and the security works like this:
client sends username / password over https.
server returns access token.
client makes further requests over http with the access token in a custom header.
This is fine for a demo, but it has some problems that need to be fixed before releasing it:
Anyone can copy a login request, re-send it and get an access token back. As some users replied this is not an issue since it goes over https. My mistake.
Anyone can listen and get an access key just by inspecting the request headers.
I can think of symmetric-key encryption with a timestamp so that I can reject duplicate requests, but I was wondering if there are some well-known good practices for this scenario (which seems a pretty common one).
Thanks a lot for the insight.
PS: I'm using Java for the server and the client is coded in C++, just in case.
I don't get the first part: if the login request is over https, how can anyone just copy it?
Regarding the second part, this is a pretty standard session-hijacking scenario. See this question. Of course you don't have the built-in browser options here, but the basic idea is the same: either send the token only over a secure connection when it matters, or in some way associate the token with the sending device.
In a browser, basically all you have is IP address (which isn't very good), but in your case you may be able to express something specific about your device that you validate against the request to ensure the same token isn't being used from somewhere else.
Edit: You could just be lucky here and be able to rule out the IP address changing behind proxies, and actually use it for this purpose.
But at the end of the day, it is much more secure to use https from a well-known and reviewed library rather than trying to roll your own here. I realize that https is an overhead, but rolling your own has big risks around missing obvious things that an attacker can exploit.
First question, just to get it out there: if you're concerned enough about nefarious client-impersonator accesses, why not carry out the entire conversation over HTTPS? Is the minimal performance hit significant enough for this application that it's not worth the added layer of security?
Second, how can someone replay the login request? If I'm not mistaken, that's taking place over HTTPS; if the connection is set up correctly, HTTPS prevents replay attacks using one-time nonces (see here).
One of the common recommendations is: use https.
Man-in-the-middle attacks aside, using https for the entire session should be reliable enough. You do not even need to worry about access tokens; https takes care of that for you.
Using http for further requests introduces vulnerabilities. Anybody with a network sniffer can intercept your traffic, steal the token, and spoof your requests. You can build protection to prevent this (token encryption, use-once tokens, etc.), but in doing so you would be re-creating https.
Going back to the https man-in-the-middle attack: it is based on somebody's ability to insert himself between your server and your client and funnel your requests through his code. That is doable, e.g. if the attacker has access to the physical network. The problem such an attacker faces is that he cannot give you a proper digital certificate, because he does not have the private key that was used to sign it. When https is accessed through a browser, the browser shows a warning but may still let you through to the page.
In your case it is your client that will communicate with the server, and you can make sure that all proper validations of the certificate are in place. If you do that, you should be fine.
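A sketch of what "all proper validations" means on the client side with plain JSSE; the key point is to keep the default trust manager and hostname verification rather than replacing them. The URL is a placeholder:

import java.net.URL;
import javax.net.ssl.HttpsURLConnection;

public class StrictTlsClient {
    public static void main(String[] args) throws Exception {
        // Do NOT install a trust-all TrustManager or an always-true
        // HostnameVerifier; the JSSE defaults already validate the chain
        // and the host name.
        HttpsURLConnection conn = (HttpsURLConnection)
                new URL("https://api.example.com/login").openConnection();
        conn.connect();
        // Throws SSLPeerUnverifiedException if the server wasn't authenticated.
        System.out.println(conn.getPeerPrincipal());
    }
}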
Edit
Seconding Yishai: yes, some overhead is involved, primarily CPU, but if this additional overhead pushes your server overboard, you have bigger problems with your app.
