Equivalent to Angular $sanitize in Scala/Java on the server side - java

I sanitize a string in Angular like so:
var sanitized = $sanitize($scope.someHtml);
This works well if the user tries to enter malicious HTML/JavaScript on the application screen.
But if the user presses F12 and sends to the server an HTTP request bypassing the UI code without sanitizing the string, the server will take it. Is there a way to run sanitize on the server side as well? I'm using Scala/Java.

Take a look at jsoup, a Java library (easily used from Scala) for HTML parsing, DOM manipulation, and so on.
The given link explains how to clean a document using a Whitelist (renamed Safelist in recent jsoup releases), so that only the specified tags and attributes are accepted.

Related

How to parse a HTML Source Code without getting the entire source code.

I am interested in extracting a particular div from the source code of a website. I am able to do this using JSoup, by getting the entire source code using
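import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
// Fetch and parse the whole page, then pick out the one element of interest.
Document doc = Jsoup.connect("http://example.com").get();
Element div = doc.getElementById("importantDiv");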
However, the problem is that I need to do this about 20000 times a day, to be able to get all the changes that are happening in the div. Creating the whole document every time would use a lot of network bandwidth, which I would like to avoid. Is there a way to extract the required element without re-creating the entire document on the client side?
NOTE : The code snippet is an example and not the actual URL or ID which I need to extract.
I don't believe you can request specific portions of a web page. JSoup is basically a web client class, and the web client has no control over what the server sends it. The server is the one that dictates what is sent, so you can't really request a segment of a webpage without requesting the entire web page.
Do you have access to this webpage, or is it an external website?
If you don't have control of the server side, you cannot do it. You will need to download the complete html. But note that it's just the HTML, not the rest of the resources like stylesheets, images, javascripts, etc.
To save bandwidth you would need to install some code in the server, so that it serves just the bits of information required.
Take a look at the URLConnection class. You can use it to open a connection to a URL, get the connection's input stream, and read only as many bytes as you need. This works and you won't have to download the entire document, but unfortunately you can't start downloading from an offset. You will always have to start from the beginning of the document.
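As a rough sketch of the URLConnection approach just described, the class below reads no more than a fixed number of bytes from the response and then closes the stream. The names PrefixFetcher, readPrefix and fetchPrefix are made up for illustration, and the byte limit is something you would have to pick by checking where the div of interest ends in the real page.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper, not part of the JDK or JSoup.
class PrefixFetcher {

    // Reads at most maxBytes from the stream, stopping early at end of stream.
    static byte[] readPrefix(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int remaining = maxBytes;
        while (remaining > 0) {
            int n = in.read(buffer, 0, Math.min(buffer.length, remaining));
            if (n == -1) {
                break; // the body was shorter than the limit
            }
            out.write(buffer, 0, n);
            remaining -= n;
        }
        return out.toByteArray();
    }

    // Opens the URL and returns only the first maxBytes of the response body.
    static byte[] fetchPrefix(String url, int maxBytes) throws IOException {
        URLConnection connection = new URL(url).openConnection();
        InputStream in = connection.getInputStream();
        try {
            return readPrefix(in, maxBytes);
        } finally {
            in.close(); // stop reading; the rest of the body is abandoned
        }
    }
}
```

A call such as PrefixFetcher.fetchPrefix("http://example.com", 8192) would return at most the first 8 KB. The actual bandwidth saving is probably smaller than the limit suggests, since the server may already have sent more data than the client chose to read.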

Httpclient in java without using any third party libraries

I am trying to automate a few things at my workplace, where internet access is restricted (only a few websites are allowed).
Req: I have a form with a single text box and a single submit button. I have to put something in the text box and submit the form, then parse the HTML of the response and extract a specific piece of text. The pages are written in JSP.
Constraint: I don't have access to third party libraries and have to work with Java 6.
Please point me in the right direction.
HttpURLConnection ships with Java, so you may consider using it. It covers most of the functionality of Apache HttpClient. Here is a simple example of how to use HttpURLConnection.
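A minimal sketch of submitting a one-field form with HttpURLConnection, written against APIs that exist in Java 6 (no try-with-resources, no third party libraries). The class name FormPoster, the method names, the action URL and the field name are all assumptions for illustration; the real values have to be taken from the form's HTML. The sketch also assumes the server accepts and returns UTF-8.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.util.Map;

// Hypothetical helper for posting an HTML form.
class FormPoster {

    // Encodes name/value pairs as application/x-www-form-urlencoded.
    static String encodeForm(Map<String, String> fields) throws UnsupportedEncodingException {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(URLEncoder.encode(field.getKey(), "UTF-8"));
            body.append('=');
            body.append(URLEncoder.encode(field.getValue(), "UTF-8"));
        }
        return body.toString();
    }

    // Posts the fields to the form's action URL and returns the response HTML.
    static String post(String actionUrl, Map<String, String> fields) throws IOException {
        byte[] payload = encodeForm(fields).getBytes("UTF-8");
        HttpURLConnection connection = (HttpURLConnection) new URL(actionUrl).openConnection();
        connection.setRequestMethod("POST");
        connection.setDoOutput(true); // we are going to send a request body
        connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        connection.setFixedLengthStreamingMode(payload.length);

        OutputStream out = connection.getOutputStream();
        try {
            out.write(payload);
        } finally {
            out.close();
        }

        // Assumption: the response is UTF-8 text.
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), "UTF-8"));
        try {
            StringBuilder html = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                html.append(line).append('\n');
            }
            return html.toString();
        } finally {
            reader.close();
            connection.disconnect();
        }
    }
}
```

With the text box named, say, "query", you would put that name and your value into a LinkedHashMap, pass it to FormPoster.post together with the form's action URL, and then search the returned string for the text you need. If the JSP page relies on a session cookie or hidden fields, those would have to be sent as well.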
I would use something like Tamper Data for Firefox to capture the HTTP request that gets sent to the server, and then use HttpURLConnection (part of the JDK) to re-create that request.

PHP, Java Applet communication

Here's what I would like to do.
I have a PHP file on my server from which I would like to call a Java applet. The applet function will send a GET request to read a page from a third party server. I then want the page read by the applet function to be sent to the PHP script. Simply put, I want the return value of the applet's request function in a PHP variable. Is this possible?
I want to do it this way because I already have the code to parse the page information in PHP, so I don't want to rewrite that in Java.
I want the Java applet because the request has to be sent using the client's information, such as its IP address. So I don't want to use proxies.
Note: I am not trying to hack anyone's server. I am not an advanced programmer in either Java or PHP. Please reply in a descriptive manner, possibly with pseudocode.
"I already have the code to parse the page information in PHP, so I don't want to rewrite that in java again."
PHP should be able to get that page more easily than can a Java applet. The applet would need to be trusted or in communication with a site that uses the 'cross-domain resources' file that explicitly allows hot-linking.
Searches on 'php proxie' seemed to spill out around 7.32 million hits. I'd start there.

Accessing HTML DOM elements from Java

I'm developing (with Java) a P2P application. One of the features includes a chat service. When a user sends a message to all of the application users, each user gets the message and updates its chat HTML page.
How can I access, from my Java code, the DOM of this page and change it, without the need to refresh the page in order to see the new message?
Is there any object in Java that can get me this access? For example, can I call a JavaScript function that inserts the new message?
If by "from Java" you mean an applet, then:
You can define some JavaScript functions in your HTML page to return or modify what you want, and then call those functions from the applet. Look here.
If by "Java" you mean a web server, then you have to use some AJAX solution; you can look, for example, at jQuery.
What you're really looking for is a technology known as Comet. Comet is Reverse Ajax. It's a technique that uses long-lived HTTP connections to hold a connection open from a client browser to a server so that the server can push updates back to the client browser.
The basic flow is that the server pushes a command back to the browser in the response, and JavaScript parses the response via a callback function, and then the JavaScript updates the DOM, all without reloading the page.
You can learn more about Comet on the CometD Website, and if you're developing on Google App Engine, this blog post on the ChannelAPI will be helpful.
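To make the idea of holding a request open until there is something to push a little more concrete, here is a small server-side sketch of long polling, one of the techniques usually grouped under Comet. It is only an illustration using java.util.concurrent, not CometD's API: ChatHub, subscribe, publish and awaitNext are invented names, and the HTTP handler that would call awaitNext is left out.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical in-memory hub; a real server would also remove stale inboxes.
class ChatHub {

    // One queue of pending messages per connected browser.
    private final List<BlockingQueue<String>> subscribers =
            new CopyOnWriteArrayList<BlockingQueue<String>>();

    // Called once when a browser opens the chat page.
    BlockingQueue<String> subscribe() {
        BlockingQueue<String> inbox = new LinkedBlockingQueue<String>();
        subscribers.add(inbox);
        return inbox;
    }

    // Called when any user sends a message: copy it into every inbox.
    void publish(String message) {
        for (BlockingQueue<String> inbox : subscribers) {
            inbox.offer(message);
        }
    }

    // Body of the long-poll request handler: block until a message arrives
    // or the timeout elapses. Returns null on timeout, in which case the
    // browser simply issues the next poll.
    static String awaitNext(BlockingQueue<String> inbox, long timeoutMillis)
            throws InterruptedException {
        return inbox.poll(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}
```

On the browser side, the JavaScript callback would append the returned message to the chat element and immediately start another request, which is what keeps the page updating without a reload.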

How do you set javascript as enabled when using DefaultHttpClient?

I'm trying to use DefaultHttpClient to log into xbox.com. I realize that you can't be logged in without visiting http://login.live.com, so I was going to submit the form on that page and then use the cookies in any requests to xbox.com.
The problem is that requesting anything from live.com using DefaultHttpClient returns the following message.
Windows Live ID requires JavaScript to sign in. This web browser either does not support JavaScript, or scripts are being blocked.
How do I make DefaultHttpClient tell the server that JavaScript is available? I tried looking in the default options and also adding it as a parameter object, but I can't see what I have to do.
The reason this is happening is that this line of HTML is getting parsed from live:
<noscript><meta http-equiv="Refresh" content="0; URL=http://login.live.com/jsDisabled.srf?mkt=EN-US&lc=1033"/>Windows Live ID requires JavaScript to sign in. This web browser either does not support JavaScript, or scripts are being blocked.<br /><br />To find out whether your browser supports JavaScript, or to allow scripts, see the browser's online help.</noscript>
This is used to redirect you if your client does not have JavaScript enabled (and therefore parses <noscript> tags).
You could try to use a less intelligent HTTP library which does no parsing of the content, but which instead simply does the transport and leaves the parsing to you.
Use Wireshark to trace the communication using both a browser and your program, and look for the differences. It's hard to say what, exactly, live.com/xbox.com are looking for, but there is likely some AJAX-y code used to get the actual content.
