sock = new Socket("www.google.com", 80);
out = new BufferedOutputStream(sock.getOutputStream());
in = new BufferedInputStream(sock.getInputStream());
When I try to print the content of "in", like below:
BufferedInputStream bin = new BufferedInputStream(in);
int b;
while ( ( b = bin.read() ) != -1 )
{
char c = (char)b;
System.err.print(""+(char)b); //This prints out content that is unreadable.
//Isn't it supposed to print out html tag?
}
If you want to print the content of a web page, you need to speak the HTTP protocol. You do not have to implement it yourself; the best way is to use an existing implementation such as the Java API's HttpURLConnection or Apache's HttpClient.
Here is an example of how to do it with HttpURLConnection:
URL url = new URL("http", "www.google.com", "/"); // protocol, host, file
HttpURLConnection urlc = (HttpURLConnection)url.openConnection();
urlc.setAllowUserInteraction( false );
urlc.setDoInput( true );
urlc.setDoOutput( false );
urlc.setUseCaches( true );
urlc.setRequestMethod("GET");
urlc.connect();
// check that you have received a status code 200 (OK)
// get the encoding from the Content-Type header
BufferedReader in = new BufferedReader(new InputStreamReader(urlc.getInputStream()));
String line = null;
while((line = in.readLine()) != null) {
System.out.println(line);
}
// close sockets, handle errors, etc.
You can also save traffic by adding the Accept-Encoding header to the request and checking the Content-Encoding header of the response.
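As a rough sketch of that Content-Encoding check: if the request carried Accept-Encoding: gzip, the body may come back gzip-compressed and has to be unwrapped before it is read as text. The names BodyDecoder, wrap and readAll are made up for this illustration, only gzip is handled, and the "response" is simulated in memory rather than fetched from a server.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class BodyDecoder {

    /** Wraps the raw body according to the Content-Encoding header value (may be null). */
    static InputStream wrap(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(raw);
        }
        return raw; // no or unknown encoding: pass the stream through unchanged
    }

    /** Reads a stream fully into a byte array. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Simulate a gzip-compressed response body instead of hitting the network.
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(compressed)) {
            gz.write("<html>hello</html>".getBytes(StandardCharsets.UTF_8));
        }
        InputStream body = wrap(new ByteArrayInputStream(compressed.toByteArray()), "gzip");
        System.out.println(new String(readAll(body), StandardCharsets.UTF_8));
    }
}
```

With the HttpURLConnection example above, this would mean calling urlc.setRequestProperty("Accept-Encoding", "gzip") before connect() and passing urlc.getContentEncoding() as the second argument to wrap.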
Here is an HttpClient Example, taken from here:
// Create an instance of HttpClient.
HttpClient client = new HttpClient();
// Create a method instance.
GetMethod method = new GetMethod(url);
// Provide a custom retry handler if necessary
method.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
new DefaultHttpMethodRetryHandler(3, false));
try {
// Execute the method.
int statusCode = client.executeMethod(method);
if (statusCode != HttpStatus.SC_OK) {
System.err.println("Method failed: " + method.getStatusLine());
}
// Read the response body.
byte[] responseBody = method.getResponseBody();
// Deal with the response.
// Use caution: ensure the character encoding is correct and the body is not binary data
System.out.println(new String(responseBody));
} catch (HttpException e) {
System.err.println("Fatal protocol violation: " + e.getMessage());
e.printStackTrace();
} catch (IOException e) {
System.err.println("Fatal transport error: " + e.getMessage());
e.printStackTrace();
} finally {
// Release the connection.
method.releaseConnection();
}
It is very easy to create a String from an InputStream using the Java 8 Stream API:
new BufferedReader(new InputStreamReader(in)).lines().collect(Collectors.joining("\n"))
In IntelliJ I can even set this as a debug expression.
I guess it works similarly in Eclipse.
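One caveat with the one-liner above: InputStreamReader without a second argument uses the platform default charset. Here is a small sketch of the same idea with the charset passed explicitly; StreamText and readAll are names I made up, and the input is an in-memory stream rather than a real socket.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

final class StreamText {

    /** Reads the whole stream as text, joining lines with "\n"; closes the stream when done. */
    static String readAll(InputStream in, Charset charset) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            // lines() splits on \n, \r and \r\n, so line endings are normalised to "\n"
            return reader.lines().collect(Collectors.joining("\n"));
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "<html>\r\n<body>hi</body>\r\n</html>".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(data), StandardCharsets.UTF_8));
    }
}
```

Note that a trailing line break in the input is dropped, because lines() only yields the line contents.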
If you want to fetch the content of a web page, you should take a look at Apache HttpClient instead of coding this yourself, except for learning purposes or some other really good reason.
I am new to Laravel and I would like to save data to my online server via a Laravel API from a Java program, but I am getting errors.
This is my route in api.php:
Route::middleware('auth:api')->get('/user', function (Request $request) {
return $request->user();
});
Route::post('hooks','ApiTestController@store');
My ApiTestController: it just handles the POST request and then saves to the table.
public function store(Request $request)
{
$postdata = json_decode($request->input('post_data'), true);
$datas = $postdata['header'];
$data = $datas[0];
$testH = new TestH();
$testH->test_date = $data['test_date'];
$testH->expiration = $data['test_date'];
$testH->source = $data['source'];
$testH->save();
return $testH;
}
And my Java code:
try {
//local development server url
URL url = new URL("http://127.0.0.1:8000/api/hooks");
URLConnection con = url.openConnection();
// activate the output
con.setDoOutput(true);
PrintStream ps = new PrintStream(con.getOutputStream());
//create the JSON String
String json = null;
StringWriter sw = new StringWriter();
JSONWriter wr = new JSONWriter(sw);
try {
wr.object().key("header").array();
wr.object();
wr.key("test_date").value(new Date());
wr.key("source").value("TEST");
wr.key("expiration").value(new Date());
wr.endObject();
wr.endArray().endObject();
json = sw.toString();
System.out.println(json);
} catch (JSONException ex) {
Logger.getLogger(WebConnectSample.class.getName()).log(Level.SEVERE, null, ex);
}
// send to laravel server
ps.print("post_data="+json);
HttpURLConnection httpConn = (HttpURLConnection) con;
InputStream is;
if (httpConn.getResponseCode() >= 419) {
is = httpConn.getErrorStream();
} else {
is = httpConn.getInputStream();
}
// read the server reply
BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
String line = null;
while ((line = in.readLine()) != null) {
System.out.println(line);
// close the print stream
}
ps.close();
} catch (Exception e) {
e.printStackTrace();
}
}
The thing is, when I don't save via $testH->save(), everything works fine. But if I include it, the Java program prints the following error:
Type error: Argument 1 passed to Illuminate\Routing\Middleware\ThrottleRequests::addHeaders() must be an instance of Symfony\Component\HttpFoundation\Response, string given, called in C:\Users\relixusdev\Documents\WebProjects\tcmsite\vendor\laravel\framework\src\Illuminate\Routing\Middleware\ThrottleRequests.php on line 61
Any idea what part causes the error? Does it have to do with authentication? I just want to be able to save to the online database from my Java program.
Try using a route group with a prefix, as below:
Route::group(['prefix' => 'api'], function() {
Route::post('hooks','ApiTestController@store');
});
If anyone comes here with the same problem: I found out that my table did not have the created_at and updated_at columns. I didn't realize that Laravel requires them by default. Silly me.
I am trying to get the source of a webpage using the following code:
public static String getFile(String sUrl) throws ClientProtocolException, IOException {
DefaultHttpClient httpclient = new DefaultHttpClient();
StringBuilder b = new StringBuilder();
// Prepare a request object
HttpGet httpget = new HttpGet(sUrl);
// Execute the request
HttpResponse response = httpclient.execute(httpget);
// Examine the response status
System.out.println(response.getStatusLine());
//status code should be 200
if (response.getStatusLine().getStatusCode() != 200) {
return null;
}
// Get hold of the response entity
HttpEntity entity = response.getEntity();
// If the response does not enclose an entity, there is no need
// to worry about connection release
if (entity != null) {
InputStream instream = entity.getContent();
try {
BufferedReader reader = new BufferedReader(new InputStreamReader(instream));
// do something useful with the response
String s = reader.readLine();
while (s != null) {
b.append(s);
b.append("\n");
s = reader.readLine();
}
} catch (IOException ex) {
// In case of an IOException the connection will be released
// back to the connection manager automatically
throw ex;
} catch (RuntimeException ex) {
// In case of an unexpected exception you may want to abort
// the HTTP request in order to shut down the underlying
// connection and release it back to the connection manager.
httpget.abort();
throw ex;
} finally {
// Closing the input stream will trigger connection release
instream.close();
}
// When HttpClient instance is no longer needed,
// shut down the connection manager to ensure
// immediate deallocation of all system resources
httpclient.getConnectionManager().shutdown();
}
return b.toString();
}
It works fine, but certain symbols such as dashes and single quotes are not copied correctly.
I am trying to save the page source as text/html in Amazon S3 and display it by accessing the page saved on the S3 server.
The symbols that I mentioned above are displayed as � .
Is there any solution for this?
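You need to make sure that you read the content with the encoding of the page; otherwise your system default encoding is used (which, as you have seen, is apparently not the correct one). The charset is declared in the Content-Type header; entity.getContentEncoding() returns the Content-Encoding header, which describes compression such as gzip, not the character set:
// EntityUtils.getContentCharSet returns null when no charset is declared
String charset = EntityUtils.getContentCharSet(entity);
BufferedReader reader = new BufferedReader(
new InputStreamReader(instream, charset != null ? charset : "ISO-8859-1"));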
First, you need to specify the encoding that the InputStreamReader uses. The constructor you are calling takes the default encoding of your system.
The encoding may be delivered in the headers. It defaults to ISO-8859-1 (Latin-1), but in practice it is usually Windows-1252 (Windows Latin-1).
String charset = "Windows-1252"; // Can be used as default.
String enc = EntityUtils.getContentCharSet(entity); // charset from Content-Type, or null
if (enc != null) {
charset = enc;
}
BufferedReader reader = new BufferedReader(
new InputStreamReader(instream, charset));
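If you only have the raw Content-Type header value at hand, the charset parameter can also be pulled out by hand. This is a minimal sketch, not a full header parser; ContentTypeCharset and charsetOf are hypothetical names, and the fallback charset is whatever default you decide on.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class ContentTypeCharset {

    /** Returns the charset named in a Content-Type value, or the fallback if none is usable. */
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim();
                // The value may be quoted, e.g. charset="utf-8"
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: use the fallback instead
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        Charset fallback = StandardCharsets.ISO_8859_1;
        System.out.println(charsetOf("text/html; charset=UTF-8", fallback));
        System.out.println(charsetOf("text/html", fallback));
    }
}
```

The resulting Charset can be passed straight to new InputStreamReader(instream, charset).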
For HTML entities, Apache Commons Lang has:
String s = ...
s = StringEscapeUtils.unescapeHtml4(s);
I want to send the values of two variables to a PHP file from a Java applet, and I tried the following code.
try {
URL url = new URL(getCodeBase(),"abc.php");
URLConnection con = url.openConnection();
con.setDoOutput(true);
PrintStream ps = new PrintStream(con.getOutputStream());
ps.print("score="+score);
ps.print("username="+username);
con.getInputStream();
ps.close();
} catch (Exception e) {
g.drawString(""+e, 200,100);
}
I got the following error:
java.net.UnknownServiceException:protocol doesn't support output
java.net.UnknownServiceException:protocol doesn't support output
This means that you are using a protocol that doesn't support output.
Here getCodeBase() refers to a file URL, so something like
file:/path/to/the/applet
The protocol is file, which doesn't support output. You are looking for the http protocol, which does.
Maybe you wanted getDocumentBase(), which actually returns the URL of the web page the applet is on, i.e.
http://www.path.to/the/applet
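To catch this before calling setDoOutput(true), you can check the protocol of the resolved URL. This is a minimal sketch under the assumption that only http and https targets should be posted to; OutputCapable and supportsHttpOutput are made-up names, and the file URL stands in for what getCodeBase() returned in your case.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

final class OutputCapable {

    /** True if the URL uses http or https, i.e. a protocol that can carry a POST body. */
    static boolean supportsHttpOutput(URL url) {
        String protocol = url.getProtocol();
        return "http".equalsIgnoreCase(protocol) || "https".equalsIgnoreCase(protocol);
    }

    public static void main(String[] args) throws MalformedURLException {
        // Stand-in for a code base served from the local file system
        URL fromFile = URI.create("file:/path/to/the/applet/").resolve("abc.php").toURL();
        // Stand-in for a document base served by a web server
        URL fromWeb = URI.create("http://www.path.to/the/").resolve("abc.php").toURL();

        System.out.println(fromFile + " supports output: " + supportsHttpOutput(fromFile));
        System.out.println(fromWeb + " supports output: " + supportsHttpOutput(fromWeb));
    }
}
```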
Here's some code I used with my own applet, to send values (via POST) to a PHP script on my server:
I would use it like this:
String content = "";
content = content + "a=update&gid=" + gid + "&map=" + getMapString();
content = content + "&left_to_deploy=" + leftToDeploy + "&playerColor=" + playerColor;
content = content + "&uid=" + uid + "&player_won=" + didWin;
content = content + "&last_action=" + lastActionCode + "&appletID=" + appletID;
String result = "";
try {
result = requestFromDB(content);
System.out.println("Sending - " + content);
} catch (Exception e) {
status = e.toString();
}
As you can see, I am adding up all my values to send into a "content" string, then calling my requestFromDB method (which posts my "request" values, and returns the server's response) :
public String requestFromDB(String request) throws Exception
{
// This will accept a formatted request string, send it to the
// PHP script, then collect the response and return it as a String.
URL url;
URLConnection urlConn;
DataOutputStream printout;
DataInputStream input;
// URL of CGI-Bin script.
url = new URL ("http://" + siteRoot + "/globalconquest/applet-update.php");
// URL connection channel.
urlConn = url.openConnection();
// Let the run-time system (RTS) know that we want input.
urlConn.setDoInput (true);
// Let the RTS know that we want to do output.
urlConn.setDoOutput (true);
// No caching, we want the real thing.
urlConn.setUseCaches (false);
// Specify the content type.
urlConn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
// Send POST output.
printout = new DataOutputStream (urlConn.getOutputStream ());
printout.writeBytes (request);
printout.flush ();
printout.close ();
// Get response data.
input = new DataInputStream (urlConn.getInputStream ());
String str;
String a = "";
while (null != ((str = input.readLine())))
{
a = a + str;
}
input.close ();
System.out.println("Got " + a);
if (a.trim().equals("1")) {
// Error!
mode = "error";
}
return a;
} // requestFromDB
In my PHP script, I would only need to look at $_POST for my values. Then I would just print a response.
Note! Your PHP script MUST be on the same server as the applet for security reasons, or this will not work.
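One thing the content string above does not do is escape the values: a value containing & or = or a space would corrupt the form body. Here is a hedged sketch of building the same kind of string with URLEncoder; FormBody and encode are names I made up, and the parameter values are placeholders.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class FormBody {

    /** Builds an application/x-www-form-urlencoded body, keeping the map's iteration order. */
    static String encode(Map<String, String> params) throws UnsupportedEncodingException {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (body.length() > 0) {
                body.append('&'); // separator between key=value pairs
            }
            body.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                .append('=')
                .append(URLEncoder.encode(e.getValue(), "UTF-8"));
        }
        return body.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("a", "update");
        params.put("player", "Al & Bo"); // placeholder value with characters that need escaping
        System.out.println(encode(params)); // prints a=update&player=Al+%26+Bo
    }
}
```

The returned string can be handed to requestFromDB in place of the hand-concatenated one; PHP decodes it back into $_POST as usual.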
So I wrote a little code that downloads 4chan pages. I get the raw HTML page and parse it for my needs. The code below was working fine, but it suddenly stopped working. When I run it, the server does not accept my request; it seems to be waiting for something more. However, as far as I know, the HTTP request should look like this:
GET /ck HTTP/1.1
Host: boards.4chan.org
(extra new line)
If I change this format in any way, I receive a "400 Bad Request" status code. But if I change HTTP/1.1 to 1.0, the server responds with "200 OK" and I get the whole page. This makes me think the error is in the Host line, since that became mandatory in HTTP/1.1. Still, I cannot figure out what exactly needs to be changed.
The call is simply this, to get one whole board:
downloadHTMLThread( "ck", -1);
Or, for a specific thread, you just change -1 to that thread's number. For example, the link below would be fetched like this:
//http://boards.4chan.org/ck/res/3507158
//url.getDefaultPort() is 80
//url.getHost() is boards.4chan.org
//url.getFile() is /ck/res/3507158
downloadHTMLThread( "ck", 3507158);
Any advice would be appreciated, thanks.
public static final String BOARDS = "boards.4chan.org";
public static final String IMAGES = "images.4chan.org";
public static final String THUMBS = "thumbs.4chan.org";
public static final String RES = "/res/";
public static final String HTTP = "http://";
public static final String SLASH = "/";
public String downloadHTMLThread( String board, int thread) {
BufferedReader reader = null;
PrintWriter out = null;
Socket socket = null;
String str = null;
StringBuilder input = new StringBuilder();
try {
URL url = new URL(HTTP+BOARDS+SLASH+board+(thread==-1?SLASH:RES+thread));
socket = new Socket( url.getHost(), url.getDefaultPort());
reader = new BufferedReader( new InputStreamReader( socket.getInputStream()));
out = new PrintWriter(socket.getOutputStream(), true);
out.println( "GET " +url.getFile()+ " HTTP/1.1");
out.println( "HOST: " + url.getHost());
out.println();
long start = System.currentTimeMillis();
while ((str = reader.readLine()) != null) {
input.append( str).append("\r\n");
}
long end = System.currentTimeMillis();
System.out.println( input);
System.out.println( "\nTime: " +(end-start)+ " milliseconds");
} catch (Exception ex) {
ex.printStackTrace();
input = null;
} finally {
if( reader!=null){
try {
reader.close();
} catch (IOException ioe) {
// nothing to see here
}
}
if( socket!=null){
try {
socket.close();
} catch (IOException ioe) {
// nothing to see here
}
}
if( out!=null){
out.close();
}
}
return input==null? null: input.toString();
}
Try using Apache HttpClient instead of rolling your own:
static String getUriContentsAsString(String uri) throws IOException {
HttpClient client = new DefaultHttpClient();
HttpResponse response = client.execute(new HttpGet(uri));
return EntityUtils.toString(response.getEntity());
}
If you are doing this to really learn the internals of HTTP client requests, then you might start by playing with curl from the command line. This will let you get all your headers and request body squared away. Then it will be a simple matter of adjusting your request to match what works in curl.
Judging by the code, I think that you are sending 'HOST' instead of 'Host'. Since this header is compulsory in HTTP/1.1 but ignored in HTTP/1.0, that might be the problem.
Anyway, you could use a program to capture the packets sent (e.g. Wireshark), just to make sure.
Using println is quite convenient, but the line separator it appends depends on the system property line.separator. I think (although I'm not sure) that the line separator used in the HTTP protocol has to be '\r\n'. If you're capturing the packets, I think it'd be a good idea to check that each line sent ends with '\r\n' (bytes 0x0D 0x0A), just in case your OS line separator is different.
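To rule out the line-separator issue, the request can be assembled with explicit \r\n instead of println. A minimal sketch follows; the class name RawRequest and the Connection: close header are my additions, not part of the original code. I added Connection: close because HTTP/1.1 connections stay open by default, so a readLine() loop that waits for end-of-stream can look like the server is "waiting for something more".

```java
import java.nio.charset.StandardCharsets;

final class RawRequest {

    private static final String CRLF = "\r\n";

    /** Builds a minimal HTTP/1.1 GET request with explicit CRLF line endings. */
    static String get(String host, String path) {
        return "GET " + path + " HTTP/1.1" + CRLF
             + "Host: " + host + CRLF
             + "Connection: close" + CRLF   // ask the server to close after the response
             + CRLF;                        // empty line ends the header section
    }

    public static void main(String[] args) {
        String request = get("boards.4chan.org", "/ck");
        byte[] wire = request.getBytes(StandardCharsets.US_ASCII);
        // Show the separators visibly instead of sending anything over the network
        System.out.println(request.replace("\r\n", "\\r\\n\n"));
        System.out.println(wire.length + " bytes");
    }
}
```

On a real socket this would be written with socket.getOutputStream().write(wire) followed by flush(), rather than through a PrintWriter.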
Use www.4chan.org as the host instead. Since boards.4chan.org is a 302 redirect to www.4chan.org, you won't be able to scrape anything from boards.4chan.org.
I noticed a strange phenomenon when using the Apache HttpClient libraries, and I want to know why it occurs. I created some sample code to demonstrate.
Consider the following code:
//Example URL
String url = "http://www.amazon.com/gp/offer-listing/05961580/ref=dp_olp_used?ie=UTF8";
GetMethod get = new GetMethod(url);
HttpMethodRetryHandler httpHandler = new DefaultHttpMethodRetryHandler(1, false);
get.getParams().setParameter(HttpMethodParams.RETRY_HANDLER, httpHandler );
get.getParams().setCookiePolicy(CookiePolicy.IGNORE_COOKIES);
HttpConnectionManager connectionManager = new SimpleHttpConnectionManager();
HttpClient client = new HttpClient( connectionManager );
client.getParams().setParameter("http.useragent", FIREFOX );
String line;
StringBuilder stringBuilder = new StringBuilder();
String toStreamBody = null;
String toStringBody = null;
try {
int statusCode = client.executeMethod(get);
if( statusCode != HttpStatus.SC_OK ){
System.err.println("Internet Status: " + HttpStatus.getStatusText(statusCode) );
System.err.println("While getting page: " + url );
}
//toString
toStringBody = get.getResponseBodyAsString();
//toStream
InputStreamReader isr = new InputStreamReader(get.getResponseBodyAsStream());
BufferedReader rd = new BufferedReader(isr);
while ((line = rd.readLine()) != null) {
stringBuilder.append(line);
}
} catch (java.io.IOException ex) {
System.out.println( "Failed to get page: " + url);
} finally {
get.releaseConnection();
}
toStreamBody = stringBuilder.toString();
This code prints nothing:
System.out.println(toStringBody); // ""
This code prints the web page:
System.out.println(toStreamBody); // "Whole Page"
But it gets even stranger...
Replace:
get.getResponseBodyAsString();
With:
get.getResponseBodyAsString(150000);
Now we get the error:
Failed to get page: http://www.amazon.com/gp/offer-listing/0596158068/ref=dp_olp_used?ie=UTF8
I was unable to find another website besides Amazon that replicates this behavior, but I assume there are others.
I am aware that the documentation at http://hc.apache.org/httpclient-3.x/performance.html discourages the use of getResponseBodyAsString(), but it does not say that the page will not load, only that you may be at risk of an out-of-memory exception. Is it possible that getResponseBodyAsString() is returning the page before it loads? Why does this only happen with Amazon?
Did you test with any other URL?
The URL in code that you provided redirects with 302 to http://www.amazon.com/dp/05961580/?tag=stackoverfl08-20, which then returns 404 (not found).
HttpClient does not handle redirects: http://hc.apache.org/httpclient-3.x/redirects.html
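If the client does not follow a redirect for you, one manual approach is to check for a 3xx status, read the Location header, and resolve it against the URL you requested (it may be relative). This is only a sketch of that resolution step; Redirects, isRedirect and target are hypothetical names, and the Location value below is a placeholder, not what Amazon actually sends.

```java
import java.net.URI;

final class Redirects {

    /** True for the status codes that normally carry a Location header to follow. */
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
            || status == 307 || status == 308;
    }

    /** Resolves a Location header value (absolute or relative) against the requested URI. */
    static URI target(URI requested, String locationHeader) {
        return requested.resolve(locationHeader.trim());
    }

    public static void main(String[] args) {
        URI requested = URI.create(
            "http://www.amazon.com/gp/offer-listing/05961580/ref=dp_olp_used?ie=UTF8");
        int status = 302;                 // placeholder: status returned by the first request
        String location = "/dp/05961580/"; // placeholder Location header value
        if (isRedirect(status)) {
            System.out.println("Follow: " + target(requested, location));
        }
    }
}
```

The resolved URI is then requested again, ideally with a cap on the number of hops so that redirect loops cannot run forever.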