Java User Agent - java

I have recently started seeing user agents like Java/1.6.0_14 (and variations) on my site.
What does this mean? Is it a browser, a bot, or something else?

This likely means someone is crawling your website using Java. This isn't much to be concerned about unless you notice the crawler using large amounts of your bandwidth or not respecting your robots.txt file. Legitimate crawlers usually take the time to set a custom user agent so that it is easy to contact their operator if you have a problem, but even if they're using the default user agent, it's more than likely perfectly benign.
However, if you do notice a spike in 404 hits or lots of hits from the Java client, you're likely under attack by spammers looking for security holes in your website. If your site is built well, there's not a whole lot they can do other than burn some of your bandwidth, but if they find a security hole, they'll be sure to exploit it. Dealing with spammers properly is beyond the scope of this answer, but a scorched-earth solution (which will work as a short-term fix at the very least) would be to block all user agents that contain the string 'java'.
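A minimal sketch of that scorched-earth check, assuming you can read the User-Agent header as a plain string somewhere in your request handling. The class and method names are hypothetical, and where you hook the check in depends on your server. Note that this blunt rule also matches any agent string that happens to contain "javascript".

```java
import java.util.Locale;

/** Sketch: flags requests whose User-Agent contains "java" in any letter case. */
final class JavaAgentFilter {

    /** Returns true when the header is present and contains "java". */
    static boolean isJavaClient(String userAgent) {
        if (userAgent == null) {
            return false; // no header at all: not flagged by this rule
        }
        return userAgent.toLowerCase(Locale.ROOT).contains("java");
    }

    public static void main(String[] args) {
        System.out.println(isJavaClient("Java/1.6.0_14"));                   // true
        System.out.println(isJavaClient("Mozilla/5.0 (X11; Linux x86_64)")); // false
    }
}
```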

It means your site is being accessed through the JVM on someone's machine. It could be a crawler or simply someone scraping data. You can replicate the user-agent string using the HttpURLConnection class. Here is a sample:
import java.net.*;

public class Request {
    public static void main(String[] args) {
        try {
            URL url = new URL("http://google.ca");
            // No User-Agent is set here, so the JVM default ("Java/<version>") is sent.
            HttpURLConnection con = (HttpURLConnection) url.openConnection();
            con.connect();
            System.out.println(con.getResponseCode());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Java's HttpURLConnection class will send the JVM version information as the User-Agent header.
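For comparison, here is a sketch of how a client could replace that default with its own identifier by using setRequestProperty. The agent string and contact URL are made-up placeholders. The request is only prepared, not sent, so running it does not touch the network.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

final class CustomAgentRequest {

    // Hypothetical identifier; replace it with your own crawler name and contact URL.
    static final String AGENT = "ExampleCrawler/1.0 (+http://example.com/contact)";

    /** Prepares (but does not send) a request carrying a custom User-Agent. */
    static HttpURLConnection prepare(String address) throws IOException {
        HttpURLConnection con =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        // Overrides the "Java/<version>" value the JVM would otherwise send.
        con.setRequestProperty("User-Agent", AGENT);
        return con;
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection con = prepare("http://example.com/");
        System.out.println(con.getRequestProperty("User-Agent"));
    }
}
```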

Related

Block or redirect to other site in browser using java

I'm trying to write a small application that blocks sites (by IP) while the user is browsing (Chrome, IE, Firefox). It could also redirect to another site. As long as the user can't reach the blocked site, I would be satisfied with the result.
The problem is that I've searched Google for a few hours and still can't find a good solution. So far I've found two options:
Use the hosts file - this would be a little problematic for my application, because I want to block a site only for a period of time. If the application crashes, it won't restore the hosts file.
Use the "Windows Filtering Platform" - it's written in C++, so it will be harder for me to use. I would love to use Java. I could still call C++ from a Java application, but that isn't a satisfying solution either.
I would appreciate any help.
I think I have found a solution:
Blocking a website from access for all browsers
I will give it a try :). But if anybody has a better idea, don't hesitate to answer this post :).
I did similar work some time ago: I used the hosts file to block all the entries that Spybot Search & Destroy marked as "dangerous" sites. If you want to make sure the sites are freed again when the app crashes, you could use a second program or thread (I don't know how complex your application is) that checks whether the program is still running.
Microsoft has the following entry for displaying task names:
http://msdn.microsoft.com/en-us/library/windows/desktop/aa446864(v=vs.85).aspx
Maybe try this code to check whether your application is still alive.
However, the user will notice a second task in the Task Manager!
To patch the hosts file I used this Java method, which saves the entries from a default list model:
try (BufferedWriter out = new BufferedWriter(
        new FileWriter("C:\\Windows\\System32\\drivers\\etc\\hosts"))) {
    // Write every entry of the list model as one line of the hosts file.
    for (int save = 0; save < Blocker.model.size(); save++) {
        out.write((String) Blocker.model.getElementAt(save));
        out.newLine();
    }
} catch (IOException fail) {
    JOptionPane.showMessageDialog(null, "Saving could not be completed",
            "About", JOptionPane.ERROR_MESSAGE);
}
I'm not sure if this will really help you, but good luck with your project.
(Note that you have to run as administrator to get write access to the hosts file.)
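As a lighter-weight alternative to a second watchdog program, one could keep a backup of the hosts file and restore it from a JVM shutdown hook. This is only a sketch with hypothetical names, and it swaps in a different technique from the one described above: a shutdown hook runs on normal exit and after an uncaught exception, but not when the process is killed outright, so it does not fully replace a watchdog. The demo uses temporary files in place of the real hosts file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class HostsBackup {

    /** Copies the current hosts file to a backup location before it is patched. */
    static void backup(Path hosts, Path backup) throws IOException {
        Files.copy(hosts, backup, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Puts the saved copy back, undoing whatever the blocker wrote. */
    static void restore(Path hosts, Path backup) throws IOException {
        Files.copy(backup, hosts, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Registers a hook that restores the file when the JVM shuts down. */
    static void restoreOnExit(Path hosts, Path backup) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try {
                restore(hosts, backup);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }));
    }

    public static void main(String[] args) throws IOException {
        // Temporary files stand in for the real hosts file, so no admin rights are needed.
        Path hosts = Files.createTempFile("hosts", ".txt");
        Path backup = Files.createTempFile("hosts", ".bak");
        Files.writeString(hosts, "127.0.0.1 localhost\n");
        backup(hosts, backup);
        Files.writeString(hosts, "127.0.0.1 localhost\n127.0.0.1 blocked.example\n");
        restore(hosts, backup);
        System.out.print(Files.readString(hosts));
    }
}
```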

How to Ensure Input from URL isn't from a Redirected Page

I have the following lines of code that gather the source code from a given URL:
URL url = new URL(websiteAddress);
URLConnection connection = url.openConnection(); // throws an IOException
connection.setConnectTimeout(timeoutInMilliseconds);
bufferedReader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
String line;
while ((line = bufferedReader.readLine()) != null) {
outputString += line;
}
However, the problem that I'm having is that wi-fi hotspots often redirect you to a page where you have to click "I Agree." If you run this code before you have clicked that checkbox, then it gathers the source code from the hotspot login page, rather than the intended page.
What I want to do is have some way of checking whether or not the intended page was reached. I was hoping that calling connection.getURL() after creating the InputStreamReader would show me the actual web page that was arrived at, but no such luck. How can I determine whether or not the intended URL has been redirected?
One way would be to look for a specific element in your web page; if it's not there, then you know that you may be on some other page (possibly redirected to a login page).
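A minimal sketch of that marker check, assuming the page source has already been read into a string. The marker text and class name are hypothetical; pick something that only the intended page contains.

```java
final class PageMarkerCheck {

    /** Returns true when the downloaded source contains the expected marker text. */
    static boolean looksLikeExpectedPage(String html, String marker) {
        return html != null && html.contains(marker);
    }

    public static void main(String[] args) {
        String expected = "<html><body><div id=\"report\">data</div></body></html>";
        String portal = "<html><body>Please click I Agree to continue</body></html>";
        System.out.println(looksLikeExpectedPage(expected, "id=\"report\"")); // true
        System.out.println(looksLikeExpectedPage(portal, "id=\"report\""));   // false
    }
}
```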
The only thing I can suggest is to have a server where you know what the response is, and query that first to ensure connectivity to at least that server. That will (typically) be enough to assume full connectivity.
You can then go on to query the url you're interested in.
The challenge is that if a computer asks for the page at some URL, the way a lot of Wi-Fi hotspots work is to intercept that request and return their own page. There's often no clue, from the computer's point of view, that the page returned is not the page requested.
One option would be to call setFollowRedirects(false) on HttpURLConnection. By default, a connection will quietly follow redirects and try to reach a page which returns a 200 HTTP response. Disabling redirect following makes it easier to confirm that the expected page was returned: simply check that the response code is 200.
That said, @rec's comment is worth taking into account - it isn't enough to simply check the response code, because there are many different ways a router could interrupt your request, many of which are not detectable. A malicious router could, for instance, intercept all your requests and change the responding content in a subtle but dangerous way - this is called a man-in-the-middle attack.
By definition you cannot avoid MitM attacks unless you can open a secure and trusted connection (generally, HTTPS) between yourself and the remote site. However, assuming you aren't really concerned about attacks, the better tactic is simply to assume the data you get back could be broken in any number of ways, and to make your scraping logic more robust to that possibility.
I can't speak directly to how you would make your logic more robust without understanding your use case and the issues you've run into. The gist, however, is to add checks where issues might arise, and to throw an exception that you then handle gracefully higher up the stack.
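A sketch of the redirect check, using the per-connection setInstanceFollowRedirects rather than the static setFollowRedirects so that other connections in the same JVM are unaffected. The class and method names are hypothetical, and the demo only prepares the connection without sending a request.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

final class RedirectCheck {

    /** Opens a connection that reports redirects instead of silently following them. */
    static HttpURLConnection openWithoutRedirects(String address) throws IOException {
        HttpURLConnection con =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        con.setInstanceFollowRedirects(false);
        return con;
    }

    /** True only for a plain 200; a 3xx status means the request was redirected. */
    static boolean reachedDirectly(int status) {
        return status == HttpURLConnection.HTTP_OK;
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection con = openWithoutRedirects("http://example.com/");
        System.out.println(con.getInstanceFollowRedirects()); // false
        // After connecting, pass con.getResponseCode() to reachedDirectly(...).
        System.out.println(reachedDirectly(HttpURLConnection.HTTP_MOVED_TEMP)); // false
    }
}
```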
For instance, if your code was:
System.out.println(outputString.substring(outputString.indexOf('A')));
This would fail if outputString didn't actually contain an 'A' character. So check that explicitly:
int aPos = outputString.indexOf('A');
if (aPos < 0) {
    throw new InvalidParseException("Didn't find an 'A', cannot proceed");
}
System.out.println(outputString.substring(aPos));
And handle the InvalidParseException wherever makes the most sense for your use case.
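Put together as a runnable sketch: InvalidParseException is not a JDK class, so it is defined here as a hypothetical checked exception, and the method name fromFirstA is made up for illustration.

```java
final class ParseCheck {

    /** Hypothetical checked exception, named after the one used above. */
    static class InvalidParseException extends Exception {
        InvalidParseException(String message) {
            super(message);
        }
    }

    /** Returns the text from the first 'A' onward, or fails with a clear message. */
    static String fromFirstA(String text) throws InvalidParseException {
        int aPos = text.indexOf('A');
        if (aPos < 0) {
            throw new InvalidParseException("Didn't find an 'A', cannot proceed");
        }
        return text.substring(aPos);
    }

    public static void main(String[] args) {
        try {
            System.out.println(fromFirstA("xxAbc"));         // Abc
            System.out.println(fromFirstA("no match here")); // throws
        } catch (InvalidParseException e) {
            // Handle the bad page here, e.g. retry later or report it to the user.
            System.err.println("Unexpected page content: " + e.getMessage());
        }
    }
}
```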

Downloading html source code is slow

I'm using jsoup in my Android app, but the problem is that the HTML source takes too much time to download. Here is my code:
long t = System.currentTimeMillis();
String url = "http://www.stackoverflow.com/";
Document doc = null;
try {
Connection c = Jsoup.connect(url);
doc = c.get();
System.out.println(System.currentTimeMillis() - t);
} catch (IOException e) {
e.printStackTrace();
}
Executing this code takes 1.265 seconds, which feels really weird because I can download the whole website (with images and all that good stuff) using a web browser in less than 0.5 seconds on the same device. Did I do something wrong? Or maybe there is a faster way of getting the HTML source of a website? Thanks in advance.
Where are you running this code? On your device? If you are using an LTE/3G network, that figure wouldn't be too far off.
The other reason I can think of, if you are using Wi-Fi, is that your wireless router is poorly placed relative to your device.
From that code I don't see anything that could cause more delay. 1.2 seconds may not be that bad if you don't have the host's DNS entry cached and the server is far away from you.
Also, try setting the user agent to the same as your browser's when comparing times. The server may give different priorities based on the user agent, and in this case you are using the default Java user agent.

Executing URI commands in Java

One way that Steam lets users launch games and perform many other operations, is by using URI protocols, for example (from Valve developer community):
steam://run/<id> will launch the game that corresponds to the specified ID.
steam://validate/<id> will validate the game files of the specified ID.
How can I get Java to 'run' these? I don't even know what you call it, i.e. do you 'run' URIs, or 'execute' them, or what? Presumably these URIs don't have anything to return, and the URI class in Java doesn't have anything related to 'executing' them. URL does, but it doesn't work!
I've tried this:
...
try
{
    URI testURI = URI.create("steam://run/240");
    URL testURL = testURI.toURL();
    // URL testURL = new URL("steam://run/240") doesn't work either
    testURL.openConnection(); // Doesn't work
    // testURL.openStream() doesn't work either
}
catch (MalformedURLException e)
{
    System.err.println(e.getMessage());
}
...
Each combination gives the error: unknown protocol: steam.
The system that Steam uses to handle the URIs is definitely working, because for example, I can type the above URI into Firefox and it works.
My eternal gratitude to the person who provides the answer!
Thanks
Try Desktop.browse(URI). This should start the "default action", which is the Steam client for a steam:// URI, e.g.
URI uri = new URI("steam://store/240");
if (Desktop.isDesktopSupported()) {
Desktop.getDesktop().browse(uri);
}
You wrote: "The system that Steam uses to handle the URIs is definitely working, because for example, I can type the above URI into Firefox and it works."
It is working because Firefox (or other browsers) can associate unknown protocols with applications. When you load steam://xxx for the first time, Firefox asks you which application you want to open it with. If it didn't ask you, Steam probably installed a browser plugin for that.
A Uniform Resource Identifier (URI) just identifies a resource, it doesn't necessarily describe how to access it. Moreover, for custom protocols, such as "steam" the vendor can define any underlying access conventions which compatible client programs must know to interact.
In order to "execute" a URI like this you need to know exactly how the protocol is implemented (is it over HTTP? TCP? UDP?) and how to speak with the server at the other end.
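The difference between identifying and accessing can be seen directly in the JDK. In this sketch (class and method names are hypothetical), java.net.URI parses steam://run/240 without complaint, while converting it to a java.net.URL fails because the JVM has no protocol handler registered for "steam".

```java
import java.net.MalformedURLException;
import java.net.URI;

final class SchemeCheck {

    /** True when this JVM has a URL protocol handler for the URI's scheme. */
    static boolean hasUrlHandler(String address) {
        try {
            URI.create(address).toURL();
            return true;
        } catch (MalformedURLException e) {
            return false; // e.g. "unknown protocol: steam"
        }
    }

    public static void main(String[] args) {
        // Parsing only identifies the resource; nothing is contacted or launched.
        URI uri = URI.create("steam://run/240");
        System.out.println(uri.getScheme());

        System.out.println(hasUrlHandler("steam://run/240"));     // false
        System.out.println(hasUrlHandler("http://example.com/")); // true
    }
}
```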
The Valve Developer Community wiki page might have some useful information.

Can a web service return a stream?

I've been writing a little application that will let people upload & download files to me. I've added a web service to this application to provide the upload/download functionality that way, but I'm not too sure how well my implementation is going to cope with large files.
At the moment the definitions of the upload & download methods look like this (written using Apache CXF):
boolean uploadFile(@WebParam(name = "username") String username,
        @WebParam(name = "password") String password,
        @WebParam(name = "filename") String filename,
        @WebParam(name = "fileContents") byte[] fileContents)
        throws UploadException, LoginException;
byte[] downloadFile(@WebParam(name = "username") String username,
        @WebParam(name = "password") String password,
        @WebParam(name = "filename") String filename)
        throws DownloadException, LoginException;
So the file gets uploaded and downloaded as a byte array. But if I have a file of some stupid size (e.g. 1GB) surely this will try and put all that information into memory and crash my service.
So my question is - is it possible to return some kind of stream instead? I would imagine this isn't going to be terribly OS independent though. Although I know the theory behind web services, the practical side is something that I still need to pick up a bit of information on.
Cheers for any input,
Lee
Yes, it is possible with Metro. See the Large Attachments example, which looks like it does what you want.
JAX-WS RI provides support for sending and receiving large attachments in a streaming fashion.
Use MTOM and DataHandler in the programming model.
Cast the DataHandler to StreamingDataHandler and use its methods.
Make sure you call StreamingDataHandler.close() and also close the StreamingDataHandler.readOnce() stream.
Enable HTTP chunking on the client-side.
Stephen Denne has a Metro implementation that satisfies your requirement. My answer is provided below after a short explanation as to why that is the case.
Most Web Service implementations that are built using HTTP as the message protocol are REST compliant, in that they only allow simple send-receive patterns and nothing more. This greatly improves interoperability, as all the various platforms can understand this simple architecture (for instance a Java web service talking to a .NET web service).
If you want to maintain this you could provide chunking.
boolean uploadFile(String username, String password, String fileName, int currentChunk, int totalChunks, byte[] chunk);
This would require some footwork in cases where you don't get the chunks in the right order (Or you can just require the chunks come in the right order), but it would probably be pretty easy to implement.
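A sketch of the server-side footwork, assuming every chunk except possibly the last has the same fixed size. All names are hypothetical. Each chunk is written at its own offset in the target file, so chunks may arrive in any order and only one chunk is held in memory at a time.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.BitSet;

final class ChunkedUpload {
    private final Path target;
    private final int chunkSize;
    private final int totalChunks;
    private final BitSet received;

    ChunkedUpload(Path target, int chunkSize, int totalChunks) {
        this.target = target;
        this.chunkSize = chunkSize;
        this.totalChunks = totalChunks;
        this.received = new BitSet(totalChunks);
    }

    /** Writes one chunk at its offset; returns true once every chunk has arrived. */
    synchronized boolean accept(int index, byte[] chunk) throws IOException {
        if (index < 0 || index >= totalChunks || chunk.length > chunkSize) {
            throw new IllegalArgumentException("bad chunk " + index);
        }
        try (RandomAccessFile file = new RandomAccessFile(target.toFile(), "rw")) {
            file.seek((long) index * chunkSize); // position depends only on the index
            file.write(chunk);
        }
        received.set(index);
        return received.cardinality() == totalChunks;
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("upload", ".bin");
        ChunkedUpload upload = new ChunkedUpload(target, 4, 3);
        // Deliberately out of order: last chunk first.
        upload.accept(2, "rld".getBytes(StandardCharsets.UTF_8));
        upload.accept(0, "hell".getBytes(StandardCharsets.UTF_8));
        boolean done = upload.accept(1, "o wo".getBytes(StandardCharsets.UTF_8));
        System.out.println(done + " " + Files.readString(target));
    }
}
```

Re-sending a chunk simply overwrites the same region, which makes client retries harmless in this sketch.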
When you use a standardized web service, the sender and receiver rely on the integrity of the XML data sent from one to the other. This means that a web service request and its answer are only complete when the last tag has been sent. With this in mind, a web service cannot be treated as a stream.
This is logical because standardized web services rely on the HTTP protocol, which is "stateless": it works like "open connection ... send request ... receive data ... close request". The connection will be closed at the end anyway. So something like streaming is not intended to be used here, nor in the layers above HTTP (like web services).
So sorry, but as far as I can see there is no possibility for streaming in web services. Even worse: depending on the implementation/configuration of a web service, byte[] data may be encoded as Base64 rather than placed in a CDATA section, and the request might get even more bloated.
P.S.: Yup, as others wrote, "chunking" is possible. But this is no streaming as such ;-) - anyway, it may help you.
I hate to break it to those of you who think a streaming web service is not possible, but in reality, all http requests are stream based. Every browser doing a GET to a web site is stream based. Every call to a web service is stream based. Yes, all. We don't notice this at the level where we are implementing services or pages because lower levels of the architecture are dealing with this for you - but it is being done.
Have you ever noticed in a browser that sometimes it can take a while to fetch a page - the browser just keeps cranking away showing the hourglass? That is because the browser is waiting on a stream.
Streams are the reason mime/types have to be sent before the actual data - it's all just a byte stream to the browser, it wouldn't be able to identify a photo if you didn't tell it what it was first. It's also why you have to pass the size of a binary before sending - the browser won't be able to tell where the image stops and the page picks up again.
It's all just a stream of bytes to the client. If you want to prove this for yourself, just get a hold of the output stream at any point in the processing of a request and close() it. You will blow up everything. The browser will immediately stop showing the hourglass, and will display a "cannot find" or "connection reset at server" or some other such message.
That a lot of people don't know that all of this stuff is stream based shows just how much stuff has been layered on top of it. Some would say too much stuff - I am one of those.
Good luck and happy development - relax those shoulders!
For WCF I think it's possible to define a member on a message as a stream and set the binding appropriately - I've seen this work with WCF talking to a Java web service.
You need to set the transferMode="StreamedResponse" in the httpTransport configuration and use mtomMessageEncoding (need to use a custom binding section in the config).
I think one limitation is that you can only have a single message body member if you want to stream (which kind of makes sense).
Apache CXF supports sending and receiving streams.
One way to do it is to add an uploadFileChunk(byte[] chunkData, int size, int offset, int totalSize) method (or something like that) that uploads parts of the file, and the server writes them to disk.
Keep in mind that a web service request basically boils down to a single HTTP POST.
If you look at the output of a .ASMX file in .NET , it shows you exactly what the POST request and response will look like.
Chunking, as mentioned by @Guvante, is going to be the closest thing to what you want.
I suppose you could implement your own web client code to handle the TCP/IP and stream things into your application, but that would be complex to say the least.
I think using a simple servlet for this task would be a much easier approach, or is there any reason you can not use a servlet?
For instance you could use the Commons open source library.
The RMIIO library for Java provides for handing a RemoteInputStream across RMI - we only needed RMI, though you should be able to adapt the code to work over other types of RMI . This may be of help to you - especially if you can have a small application on the user side. The library was developed with the express purpose of being able to limit the size of the data pushed to the server to avoid exactly the type of situation you describe - effectively a DOS attack by filling up ram or disk.
With the RMIIO library, the server side gets to decide how much data it is willing to pull, where with HTTP PUT and POSTs, the client gets to make that decision, including the rate at which it pushes.
Yes, a webservice can do streaming. I created a webservice using Apache Axis2 and MTOM to support rendering PDF documents from XML. Since the resulting files could be quite large, streaming was important because we didn't want to keep it all in memory. Take a look at Oracle's documentation on streaming SOAP attachments.
Alternately, you can do it yourself, and tomcat will create the Chunked headers. This is an example of a spring controller function that streams.
@RequestMapping(value = "/stream")
public void hellostreamer(HttpServletRequest request, HttpServletResponse response) throws CopyStreamException, IOException
{
    response.setContentType("text/xml");
    OutputStreamWriter writer = new OutputStreamWriter(response.getOutputStream());
    writer.write("this is streaming");
    writer.close();
}
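Whichever framework hands you the streams, the memory concern from the question comes down to copying through a small fixed buffer instead of building one large byte[]. A minimal sketch with hypothetical names, using in-memory streams for the demo; in a service, the output stream would be the response stream and the input a file or attachment stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class StreamCopy {

    /** Copies everything from in to out, holding at most 8 KiB in memory at once. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[20000]; // stands in for a large file
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(payload), sink);
        System.out.println(copied); // 20000
    }
}
```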
It's actually not that hard to "handle the TCP/IP and stream things into your application". Try this...
class MyServlet extends HttpServlet
{
    public void doGet(HttpServletRequest request, HttpServletResponse response)
            throws IOException
    {
        response.getOutputStream().println("Hello World!");
    }
}
And that is all there is to it. You have, in the above code, responded to an HTTP GET request sent from a browser, and returned to that browser the text "Hello World!".
Keep in mind that "Hello World!" is not valid HTML, so you may end up with an error on the browser, but that really is all there is to it.
Good Luck in your development!
Rodney
