I'm not very well versed in how the internet works, so I'm not really sure what this Java method is doing or how best to replicate it in Python. I have tried several approaches, including urllib and sockets, but nothing seems to work. The only time I get a response at all, it's a huge HTML document, when the response should only be about five lines of XML.
Any help would be greatly appreciated, thanks :).
try {
    URL url = new URL(sPROTOCOL, sHOSTNAME, sPAGENAME);
    URLConnection url_con = url.openConnection();
    url_con.setDoInput(true);
    url_con.setDoOutput(true);
    url_con.setUseCaches(false);
    url_con.setRequestProperty("content-type", "application/x-www-form-urlencoded");

    String input_xml = make_XML(sAppID, sAppPassword, sUserID, sPassword);
    if (bDEBUG) {
        System.out.println("\nINPUT XML------------------\n" + input_xml);
        System.out.println("\nEND INPUT XML--------------\n");
    }

    BufferedWriter writebuf = new BufferedWriter(new OutputStreamWriter(url_con.getOutputStream()));
    writebuf.write("XMLData=");
    writebuf.write(URLEncoder.encode(input_xml, "UTF-8")); //Java 1.4.x and later
    //writebuf.write(URLEncoder.encode(input_xml));        //Java 1.3.1 and earlier
    writebuf.flush();
    writebuf.close();
    writebuf = null;

    HashMap hm = parseResp(url_con);
It looks like it's opening a connection to sHOSTNAME, sending the XML data generated by make_XML (apparently as a single POST parameter called XMLData, so sPROTOCOL must be HTTP), and then processing the response in parseResp.
In Python you would use httplib. The final example at http://docs.python.org/library/httplib.html does something similar (but sends three parameters). Note that the code you posted is somewhat ugly in that it writes the POST body out by hand; in Python you would just pass the URL-encoded XML as the request body.
Related
I have a strange bug that I've been debugging for about two days. My server builds a string representing an HTML file and sends it to another API to be converted into a PDF. However, all the foreign characters, such as Chinese, get converted to question marks.
When I inspect these variables in debug mode they all look fine, but when the string is sent, the characters change to question marks. I tried
new String(originalString.getBytes(), StandardCharsets.UTF_8)
but the value returned by this constructor turns all the Chinese characters into question marks as well.
I can view originalString normally in debug mode with breakpoints, but I can't log it or System.out.println() it; the printed string turns every foreign character into a question mark.
Looking into the String itself, the correct String has slot 364 as a Chinese character with a code point of 20000-something, but after conversion it becomes ? with code 63.
I do have the console charset set to UTF-8.
This is how I send it over to the other API.
FYI, this is old legacy code written by someone else; apologies for the horrible style.
String htmlData = princeServices.createPdf(pdfData, teacher, connection);
htmlData = tidyHTML(htmlData, pdfData); // This is the HTML data; can be viewed in debug mode

URL urlobj = null;
if ("Landscape".equalsIgnoreCase(pdfData.getPrintLayout())) {
    urlobj = new URL(princePath + "Landscape");
} else {
    urlobj = new URL(princePath);
}

HttpURLConnection conn = (HttpURLConnection) urlobj.openConnection();
HttpURLConnection.setFollowRedirects(true);
conn.setRequestMethod("POST");
conn.setDoOutput(true);
conn.setRequestProperty("Authorization", "Basic " + printBase64Binary(/* password here */));
conn.setRequestProperty("Content Length", Integer.toString(htmlData.length()));

OutputStreamWriter outToPrinceServer = new OutputStreamWriter(conn.getOutputStream());
outToPrinceServer.write(htmlData);
outToPrinceServer.flush();

resp.setContentType("application/pdf");
resp.addHeader("Content-Disposition", "attachment; filename=" + "planbook.pdf");
ServletOutputStream outToBrowser = resp.getOutputStream();
ByteStreams.copy(conn.getInputStream(), outToBrowser);
Save my life pls :)
System.out.println() writes to a "local" console using the platform's default charset, which means that depending on your OS language settings it will not display special characters.
See this answer for more info: https://stackoverflow.com/a/27218881/967768
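A minimal sketch of that idea (the class and helper names are just illustrative): route the output through an explicitly UTF-8 PrintStream and check the round trip, instead of relying on the platform default charset.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

public class Utf8Console {
    // Print s through an explicitly UTF-8 PrintStream, then decode the bytes
    // back as UTF-8; returns true if no character was lost.
    static boolean survivesUtf8(String s) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        PrintStream utf8Out = new PrintStream(buf, true, "UTF-8");
        utf8Out.print(s);
        return new String(buf.toByteArray(), StandardCharsets.UTF_8).equals(s);
    }

    public static void main(String[] args) throws Exception {
        // Chinese characters survive a UTF-8 round trip; with a mismatched
        // charset they would come back as '?' (code 63), as in the question.
        System.out.println(survivesUtf8("\u4e00\u4e07")); // prints: true

        // To make the real console UTF-8, replace System.out itself:
        // System.setOut(new PrintStream(System.out, true, "UTF-8"));
    }
}
```

If the question marks persist even with a UTF-8 PrintStream, the loss is happening somewhere else in the pipeline (for example in the OutputStreamWriter above, which also uses the platform default charset).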
Essentially, like a bulletproof tank, I want my program to absorb 404 errors and keep on rolling, crushing the interwebs and leaving corpses dead and bloodied in its wake, or, whatever.
I keep getting this error:
Exception in thread "main" org.jsoup.HttpStatusException: HTTP error fetching URL. Status=404, URL=https://en.wikipedia.org/wiki/Hudson+Township+%28disambiguation%29
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:537)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:493)
at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:205)
at org.jsoup.helper.HttpConnection.get(HttpConnection.java:194)
at Q.Wikipedia_Disambig_Fetcher.all_possibilities(Wikipedia_Disambig_Fetcher.java:29)
at Q.Wikidata_Q_Reader.getQ(Wikidata_Q_Reader.java:54)
at Q.Wikipedia_Disambig_Fetcher.all_possibilities(Wikipedia_Disambig_Fetcher.java:38)
at Q.Wikidata_Q_Reader.getQ(Wikidata_Q_Reader.java:54)
at Q.Runner.main(Runner.java:35)
But I can't understand why because I am checking to see if I have a valid URL before I navigate to it. What about my checking procedure is incorrect?
I tried to examine the other Stack Overflow questions on this subject, but they're not very authoritative; plus, I implemented many of the solutions from this one and this one, and so far nothing has worked.
I'm using the Apache Commons URL validator; this is the code I've been using most recently:
//get its normal wiki disambig page
String URL_check = "https://en.wikipedia.org/wiki/" + associated_alias;
UrlValidator urlValidator = new UrlValidator();
if (urlValidator.isValid(URL_check))
{
    Document docx = Jsoup.connect(URL_check).get();
    //this can handle the less structured ones.
and
//check the validity of the URL
String URL_czech = "https://www.wikidata.org/wiki/Special:ItemByTitle?site=en&page=" + associated_alias + "&submit=Search";
UrlValidator urlValidator = new UrlValidator();
if (urlValidator.isValid(URL_czech))
{
    URL wikidata_page = new URL(URL_czech);
    URLConnection wiki_connection = wikidata_page.openConnection();
    BufferedReader wiki_data_pagecontent = new BufferedReader(
            new InputStreamReader(
                    wiki_connection.getInputStream()));
URLConnection throws an exception when the status code of the page you're downloading is anything other than 2xx (such as 200, 201, etc.). Instead of passing Jsoup a URL or String to parse, consider passing it an input stream containing the page data.
Using the HttpURLConnection class, we can try to download the page with getInputStream() inside a try/catch block, and if that fails, attempt to read it via getErrorStream().
Consider this bit of code, which will download your wiki page even if the server returns a 404:
String URL_czech = "https://en.wikipedia.org/wiki/Hudson+Township+%28disambiguation%29";
URL wikidata_page = new URL(URL_czech);
HttpURLConnection wiki_connection = (HttpURLConnection) wikidata_page.openConnection();
InputStream wikiInputStream = null;

try {
    // try to connect and use the input stream
    wiki_connection.connect();
    wikiInputStream = wiki_connection.getInputStream();
} catch (IOException e) {
    // failed, try using the error stream
    wikiInputStream = wiki_connection.getErrorStream();
}

// parse the input stream using Jsoup
Jsoup.parse(wikiInputStream, null, wikidata_page.getProtocol() + "://" + wikidata_page.getHost() + "/");
The Status=404 error means there's no page at that location. Just because a URL is valid doesn't mean there's anything there. A validator can't tell you that. The only way you can determine that is by fetching it, and seeing if you get an error, as you're doing.
I've studied up on the Oracle documentation and examples and still can't get this to work.
I have a Java Applet that is simply trying to send a text field to a PHP script via POST, using URLConnection and OutputStreamWriter. The Java side seems to work fine, no exceptions are thrown, but PHP is not showing any output on my page. I am a PHP noob so please bear with me on that part.
Here is the relevant Java portion:
try {
    URL url = new URL("myphpfile.php");
    URLConnection con = url.openConnection();
    con.setDoOutput(true);
    out = new OutputStreamWriter(con.getOutputStream());
    String outstring = "field1=" + field1 + "&field2=" + field2;
    out.write(outstring);
    out.close();
} catch (Exception e) {
    System.out.println("HTTPConnection error: " + e);
    return;
}
and here is the relevant PHP code:
<?php
$field1= $_POST['field1'];
$field2= $_POST['field2'];
print "<table><tr><th>Column1</th><th>Column2</th></tr><tr><td>" .
$field1 . "</td><td>" . $field2 . "</td></tr></table>";
?>
All I see are the table headers Column1 and Column2 (let's just keep these names generic for testing purposes). What am I doing wrong? Do I need to tell my PHP script to check when my Java code does the write?
Don't use $_POST; use $_REQUEST or $_GET.
Where do you set $field1 and $field2 in your PHP script?
Try URL url = new URL("myphpfile.php?field1=" + field1 + "&field2=" + field2);
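A slightly fuller sketch of that suggestion (the class name, the helper, and the example.com host are all hypothetical): build the GET URL with the field values properly encoded, and note that the Java side also has to read the connection's input stream, or the response PHP produces is never fetched.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLEncoder;

public class GetToPhp {
    // Hypothetical helper: append both fields to the URL, percent-encoded.
    static String withParams(String base, String field1, String field2) throws Exception {
        return base + "?field1=" + URLEncoder.encode(field1, "UTF-8")
                + "&field2=" + URLEncoder.encode(field2, "UTF-8");
    }

    // Reading the response is what completes the request; without consuming
    // the stream, the PHP output is never seen on the Java side.
    static String fetch(String urlString) throws Exception {
        URLConnection con = new URL(urlString).openConnection();
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // example.com stands in for wherever myphpfile.php is actually hosted.
        System.out.println(withParams("http://example.com/myphpfile.php", "a b", "c&d"));
        // String html = fetch(...); // would perform the real request
    }
}
```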
Well, I feel like I've tried every possible thing that can be tried with PHP, so I eventually went with JSObject. Now THAT was easy.
Working Java code:
JSObject window = JSObject.getWindow(this);
// invoke JavaScript function
String result = "<table><tr><th>Column1</th><th>Column2</th></tr><tr><td>"
+ field1 + "</td><td>" + field2 + "</td></tr></table>";
window.call("writeResult", new Object[] {result});
Relevant working JavaScript:
function writeResult(result) {
var resultElem =
document.getElementById("anHTMLtagID");
resultElem.innerHTML = result;
}
From here I can even send the results from Javascript to PHP via Ajax to do database-related actions. Yay!
I'm building a simple news reader app and I am using HTMLCleaner to retrieve and parse the data. I've successfully gotten the data I need using the command-line version of HTMLCleaner and using xmllint. For example:
java -jar htmlcleaner-2.6.jar src=http://www.reuters.com/home nodebyxpath=//div[@id=\"topStory\"]
and
curl www.reuters.com | xmllint --html --xpath //div[@id='"topStory"'] -
both return the data I want. But when I try to make this request using HTMLCleaner in my code, I get no results. Even more troubling, even basic queries like //div only return 8 nodes in my app, while the command line reports 70+, which is correct.
Here is the code I have now. It's in an Android class extending AsyncTask, so it runs in the background. The final code will actually extract the text data I need, but for now I'm having trouble just getting it to return a result. When I log the title node, the node count is zero.
I've tried every manner of escaping the XPath query strings, but it makes no difference.
The HTMLCleaner code is in a separate source folder in my project and is (at least I think) compiled to Dalvik with the rest of my app, so an incompatible jar file shouldn't be the problem.
I've tried to dump the HTMLCleaner output, but it doesn't work well with LogCat, and a lot of the page markup is missing when I dump it, which made me think HTMLCleaner was parsing incorrectly and discarding most of the page; but how can that be the case when the command-line version works fine?
Also, the app does not crash and I'm not logging any exceptions.
protected Void doInBackground(URL... argv) {
    final HtmlCleaner cleaner = new HtmlCleaner();
    TagNode lNode = null;
    try {
        lNode = cleaner.clean(argv[0].openConnection().getInputStream());
        Log.d("LoadMain", argv[0].toString());
    } catch (IOException e) {
        Log.d("LoadMain", e.getMessage());
    }

    final String lTitle = "//div[@id=\"topStory\"]";
    // final String lBlurp = "//div[@id=\"topStory\"]//p";

    try {
        Object[] x = lNode.evaluateXPath(lTitle);
        // Object[] y = lNode.evaluateXPath(lBlurp);
        Log.d("LoadMain", "Title Nodes: " + x.length);
        // Log.d("LoadMain", "Title Nodes: " + y.length);
        // this.mBlurbs.add(new BlurbView(this.mContext, x.getText().toString(), y.getText().toString()));
    } catch (XPatherException e) {
        Log.d("LoadMain", e.getMessage());
    }
    return null;
}
Any help is greatly appreciated. Thank you.
UPDATE:
I've narrowed the problem down to something to do with the HTTP request. If I load the HTML source as an asset, I get what I want, so clearly the problem is in how the HTTP response is received. In other words, lNode = cleaner.clean(getAssets().open("reuters.html")); works fine.
The problem was that the HTTP request was being redirected to the mobile website. This was solved by changing the User-Agent property, like so:
private static final String USER_AGENT = "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:23.0) Gecko/20100101 Firefox/23.0";
HttpURLConnection lConn = (HttpURLConnection) argv[0].openConnection();
lConn.setRequestProperty("User-Agent", USER_AGENT);
lConn.connect();
lNode = cleaner.clean( lConn.getInputStream() );
I was typing up a question, but I finally solved the problem and didn't want to toss the write-up (encouraged by https://blog.stackoverflow.com/2011/07/its-ok-to-ask-and-answer-your-own-questions/), so I decided to share the problem and solution.
The problem: I want to retrieve some bytes from a Java application server (that is, via a Servlet) to load into a Flash game for a replay feature.
There are some questions that address the opposite direction, that is, from AS3 to a server (PHP, Java, etc.): How to send binary data from AS3 through Java to a filesystem?, How can I send a ByteArray (from Flash) and some form data to PHP?, Uploading bytearray via URLRequest and Pushing ByteArray to POST. I didn't find anything like what I'm sharing (correct me if I'm wrong).
Well, as I said, I was encouraged by Stack Overflow to answer my own question, so here it is.
The Servlet doGet method that gives the byte array:
protected void doGet(HttpServletRequest req, HttpServletResponse resp)
        throws ServletException, IOException {
    MouseInput oneInput = getMouseInput(); // abstracted (I'm using Google App Engine)
    byte[] inputInBytes = oneInput.getInBytes();
    OutputStream o = resp.getOutputStream();
    o.write(inputInBytes);
    o.flush();
    o.close();
}
MouseInput.getInBytes method body:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
DataOutputStream dos = new DataOutputStream(baos);
dos.writeInt(this.type);
dos.writeDouble(this.localX);
dos.writeDouble(this.localY);
dos.writeBoolean(this.buttonDown);
return baos.toByteArray();
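For what it's worth, the reason the AS3 reads further down line up with these writes is that DataOutputStream emits big-endian data, which matches ByteArray's default endianness. A JDK-only round trip of the same write/read sequence, with hypothetical values, to illustrate:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class MouseInputRoundTrip {
    public static void main(String[] args) throws IOException {
        // Write in the same order the servlet does (big-endian by spec).
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(baos);
        dos.writeInt(1);          // type (hypothetical value)
        dos.writeDouble(10.5);    // localX
        dos.writeDouble(20.25);   // localY
        dos.writeBoolean(true);   // buttonDown

        // Read back in the same sequence, as the AS3 side does with
        // readInt / readDouble / readDouble / readBoolean.
        DataInputStream dis = new DataInputStream(new ByteArrayInputStream(baos.toByteArray()));
        System.out.println(dis.readInt());     // prints: 1
        System.out.println(dis.readDouble());  // prints: 10.5
        System.out.println(dis.readDouble());  // prints: 20.25
        System.out.println(dis.readBoolean()); // prints: true
    }
}
```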
My ActionScript code to receive the byte array data:
var url:String = "http://localhost:8888/input"; // servlet url
var request:URLRequest = new URLRequest(url);

// get rid of the cache issue:
var urlVariables:URLVariables = new URLVariables();
urlVariables.nocache = new Date().getTime();
request.data = urlVariables;
request.method = URLRequestMethod.GET;

var loader:URLLoader = new URLLoader();
loader.dataFormat = URLLoaderDataFormat.BINARY;
loader.addEventListener(Event.COMPLETE, function (evt:Event):void {
    var loader:URLLoader = URLLoader(evt.target);
    var bytes:ByteArray = loader.data as ByteArray;
    trace(bytes); // yeah, you'll get nothing!
    // the bytes obtained from the request (see the Servlet and
    // MouseInput.getInBytes method body code above) were written in
    // the same sequence they are read here:
    trace(bytes.readInt());
    trace(bytes.readDouble());
    trace(bytes.readDouble());
    trace(bytes.readBoolean());
});
loader.addEventListener(IOErrorEvent.IO_ERROR, function (evt:Event):void {
    trace("error");
});
loader.load(request);
Well, it works! Obviously you can make some adjustments, like not using anonymous functions for better readability, but it was fine to illustrate the point. Now I can save some memory for the game replay feature (for debugging purposes) with a ByteArray instead of the heavy XML I was trying before.
Hope it helps, and any criticism is appreciated!
Cheers