How can I extract web app content from html code?

I'm currently trying to gather data from CS:GO gambling sites so that I can analyze it. I wrote a very short program that fetches the HTML code from this website, but it doesn't return the content of the web app.
My problem is that the information I need is inside this web app. I can view it in Chrome, so I assume there is a way to get it.
Maybe the pictures help to understand what I'm looking for:
HTML code; marked the line I want
import java.io.IOException;
import org.jsoup.Jsoup;

public class Main {
    public static void main(String[] args) {
        try {
            // Fetches only the initial HTML document; no JavaScript is executed
            String html = Jsoup.connect("https://www.wtfskins.com/crash").get().html();
            System.out.println(html);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
So that's what I get. I need the content of
<body> <app-root>
loading... // That's the problem
</app-root>
<script src="https://code.jquery.com/jquery-3.1.1.min.js" integrity="sha256-hVVnYaiADRTO2PzUGmuLJr8BLUSjGIZsDYGmIJLv2b8=" crossorigin="anonymous"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/tether/1.4.0/js/tether.min.js" integrity="sha384-DztdAPBWPRXSA/3eYEEUWrWCy7G5KFbe8fFjk5JAIxUYHKkDx6Qin1DkWx51bBrb" crossorigin="anonymous"></script>
<script src="/assets/js/jquery-ui.min.js"></script>
<script src="/assets/js/bootstrap.js"></script>
<script src="/assets/js/sha3.js"></script>
<script src="/assets/js/sha256.js"></script>
<script type="text/javascript" src="inline.318b50c57b4eba3d437b.bundle.js"></script>
<script type="text/javascript" src="polyfills.2b75d68d2d6cb678fc8d.bundle.js"></script>
<script type="text/javascript" src="main.7932c68952979c366236.bundle.js"></script>
</body>

The data is loaded into the page after the initial DOM has been built.
When you fetch the page with Jsoup, you only get the response to the initial HTML request.
This image shows that the HTML request really returns a nearly empty HTML structure.
If you check the Network tab of the browser's dev tools, you will see that after the initial load there are additional XHR requests that fetch the data.
The ngcontent attributes on the tags indicate that the page is rendered with Angular, which is a JavaScript framework.
This is done to make page loads more efficient, and it also makes scraping a bit harder.
AFTER CHECKING
The Network tab shows multiple requests after the page load that have JSON responses. You need to look at those and see which request headers are mandatory to request them.
As the image shows, one of the interesting ones is:
https://www.wtfskins.com/api/v1/p2ptrading/usertrades/
You can start by reading "How the Web works", along with the subtopics on asynchronous JavaScript requests and REST API basics. If you are not familiar with web development, the research will take a bit of time.
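As a rough sketch of requesting such a JSON endpoint directly from Java instead of parsing the empty HTML shell, the following uses the JDK's built-in java.net.http client (Java 11+). The class name ApiFetch, the header values, and the "send" argument are illustrative assumptions; the headers the site really requires have to be copied from the Network tab, and the endpoint may also need cookies or a session.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class ApiFetch {
    // Endpoint mentioned above; whether it answers without a session is not known.
    public static final String TRADES_URL =
            "https://www.wtfskins.com/api/v1/p2ptrading/usertrades/";

    // Builds a GET request that asks for JSON.
    // The header set is an assumption: mirror what the browser's Network tab shows.
    public static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(10))
                .header("Accept", "application/json")
                .header("User-Agent", "Mozilla/5.0")
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = buildRequest(TRADES_URL);
        if (args.length == 0 || !args[0].equals("send")) {
            // Dry run: only show what would be requested.
            System.out.println(request.method() + " " + request.uri());
            return;
        }
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body()); // raw JSON text, still to be parsed
    }
}
```

Running it with the argument send performs the request and prints the status code and the raw body. If you prefer to stay with Jsoup, its connection also needs ignoreContentType(true) before it will return a non-HTML response.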

Related

Which Strategy is Better To Use When Working With Push Notification In terms of performance And Efficiency?

Now that I have discovered the push technique in PrimeFaces, I am wondering which of these two approaches performs better.
I am creating a dashboard in Java with Spring and PrimeFaces.
Right now I have one channel for all dashlets. I broadcast a message containing a UUID to all pages, and each page is responsible for checking whether the given UUID concerns it. If it does, the page performs the desired action, such as showing the message or just refreshing itself.
For example, on the Java side:
pushContext.push("/refresh-chan", getUuid().toString());
and on the XHTML side:
<p:socket onMessage="handleMessage" channel="/refresh-chan" />
<script type="text/javascript">
    function handleMessage(uuid) {
        var myUuid = '#{myBean.id}';
        //console.log(myUuid + "-" + uuid);
        if (uuid == myUuid) {
            //console.log("- I am refreshing myself : my id=" + uuid);
            location.reload();
        }
    }
</script>
But there is another way to handle this: I can create a separate channel for each dashlet and push to that dashlet alone.
Note: the code is imaginary and may not work:
pushContext.push("/refresh-chan"+getUuid().toString(),"may leave empty");
and in the XHTML page (yes, still imaginary):
<p:socket onMessage="handleMessage" channel="/refresh-chan#{myBean.id}" />
<script type="text/javascript">
function handleMessage(dummy) {
location.reload();
}
</script>
In addition to performance and efficiency, consider that this code is part of a framework that I am building. Other developers will have to write their own XHTML pages and add the XHTML part of the code (shown above) to their pages too, so it should be easy for them to understand what is happening.
Thank you.

Inject to GWT authorize.net Verified Merchant Seal

I have a GWT app with a payment feature. I would like to insert the following code into a UiBinder template. It is provided by Authorize.Net for verified merchants, and I have added my site to the Verified Merchant Seal Domains List on the Authorize.Net server:
<div class="AuthorizeNetSeal">
<script type="text/javascript" language="javascript">var ANS_customer_id="MY_ID";</script>
<script type="text/javascript" language="javascript" src="//verify.authorize.net/anetseal/seal.js"></script>
MerchantServices
</div>
I tried the following:
UIBinder:
<g:HTMLPanel>
<div class="AuthorizeNetSeal">
Merchant Services
</div>
</g:HTMLPanel>
and in the constructor of the control, after initWidget(...):
ScriptInjector.fromString("var ANS_customer_id = 'MY_ID';").inject();
ScriptInjector.fromUrl("//verify.authorize.net/anetseal/seal.js").inject();
I tried scheduleDeferred. I also tried setCallback() for ScriptInjector.fromUrl, and the success method is called.
But the seal doesn't appear.
Please help.
Thanks

Add script elements to the head section of your host page. If scripts are small and/or you do not use code splitting, there is little to no advantage in using a ScriptInjector.
Verify with your browser console that the script loads correctly from a URL you provided.

You need to inject them in the correct window:
ScriptInjector.fromString("var ANS_customer_id = 'MY_ID';").setWindow(ScriptInjector.TOP_WINDOW).inject();
But seal.js uses document.write, so it cannot be injected that way anyway; it must be present in the HTML document when the browser loads it.
You could put this snippet in your HTML host page in a hidden <div> that you later relocate (Document.get().getElementById(…) and other appendChild DOM methods) where you want it within your app.
…and you should ask Authorize.net to provide an async version of their script so that if their servers are slow they don't slow down the loading of your app.

Thanks for your answers. I created a separate HTML page with the HTML code above and embedded it in an iframe.

GWT: How to make sure a javascript is run after the GWT page is constructed

I have a javascript file main.js. The main.js contains something like this:
$(document).ready(function() {
Cufon.replace('#myform p.head', { fontFamily: 'HelveticaNeueLT Std Thin' });
......
});
I suppose this runs the function after the whole page is loaded and applies the change to the matching elements.
But what I found out is that this only works when the script is loaded after all the HTML elements, e.g.:
<body>
HTML......
<script type="text/javascript" src="js/main.js"></script>
</body>
However, if this script is put on top of all the HTML, it stops working:
<body>
<script type="text/javascript" src="js/main.js"></script>
HTML......
</body>
This happens on both the static HTML and the GWT page. Because GWT always puts the generated HTML at the end of the body contents, the script always comes before the HTML, hence it does not work. For example, my HTML for the GWT module looks like this:
<body>
<script type="text/javascript" src="js/main.js"></script>
</body>
And after compilation, the HTML generated from my UiBinder gives an HTML page like:
<body>
<script type="text/javascript" src="js/main.js"></script>
Generated HTML....
</body>
My questions are:
Is there any way in GWT to specify that the generated HTML goes between particular statements in the <body> tag?
Is there any other way, instead of $(document).ready, to guarantee that my code runs as the last thing in a page load?
Many thanks

While I find it strange that the script doesn't work as intended when moved up in a static page ($(document).ready(…) is supposed to wait until the </html> is reached, i.e. DOMContentLoaded, before running the function passed to it), that is not the reason it doesn't work with your GWT application (in other words, your diagnosis is wrong).
GWT's onModuleLoad also runs at DOMContentLoaded (or later, but never earlier), so you probably have a race condition between your app's onModuleLoad and jQuery's $(document).ready(…). You could try putting the <script> for your GWT app before main.js, but because onModuleLoad might run after DOMContentLoaded anyway, there's no guarantee it will work (even less so across browsers).
I think you'd better remove the main.js or replace the $(document).ready(…) with a simple function, and call Cufon (and/or whatever else you were doing in $(document).ready(…)) from within your GWT app, at the moment appropriate for your needs (i.e. after you attached the #myform p.head element/widget to the document).
The easiest way to do that is to put the script in a JSNI method and then call that method where appropriate. Just make sure you use $wnd.Cufon instead of Cufon (and similarly for all other globals), and replace all occurrences of document with $doc and window with $wnd.
public static native void cufon() /*-{
    $wnd.Cufon.replace('#myform p.head', { fontFamily: 'HelveticaNeueLT Std Thin' });
}-*/;

Getting Started With GWT

I have been looking into GWT for a couple of days now and I am somewhat confused.
I come from a PHP/JSP background, so when I wanted to create a website with multiple pages I would just create a PHP page for each one and let the user select what to view.
Now that I am looking into GWT, I don't really understand how this is done.
Let's say I would like my site to have three pages (index.html, help.html, contact.html). When a GWT app is loaded, the onModuleLoad() method is called. How would I code the widgets for each separate page using only this one method?
Looking at the example GWT application that is created in Eclipse, a single HTML page is created. How would I create an application with multiple pages if there is only a single onModuleLoad() method?

GWT can be used in a Web 2.0, client-side application way as mentioned by Chris Lercher and nvcleemp, or you can use it in conjunction with a more traditional page view/reload model. If you simply want to inject DHTML functionality into existing, static pages, you can look for specific element ids to inject into, or you could read a configuration variable embedded in the page's JavaScript when onModuleLoad() is called to determine what state/mode you are in and what type of GWT client functionality you should be running.
For example, using the different injection points:
page 1:
<html>
<head>
...
<script type="text/javascript" src="yourmodule.nocache.js"></script>
...
</head>
<body>
...
<div id="injectMode1"></div>
...
</body>
</html>
page 2:
<html>
<head>
...
<script type="text/javascript" src="yourmodule.nocache.js"></script>
...
</head>
<body>
...
<div id="injectMode2"></div>
...
</body>
</html>
Your GWT EntryPoint:
@Override
public void onModuleLoad() {
    // The ids must match the <div> ids in the host pages (injectMode1, injectMode2)
    final Panel mode1 = RootPanel.get("injectMode1");
    if (mode1 != null) {
        mode1.add(new ModeOneWidget());
    }
    final Panel mode2 = RootPanel.get("injectMode2");
    if (mode2 != null) {
        mode2.add(new ModeTwoWidget());
    }
}
EDIT:
Using javascript variables, on each page that you want to embed GWT functionality you can do something similar to:
page foo:
<html>
<head>
...
<script type='text/javascript'>
var appMode="mode1";
</script>
<script type="text/javascript" src="yourmodule.nocache.js"></script>
...
</head>
...
Your GWT EntryPoint:
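// Reads the appMode variable set by the host page
private static final native String getAppMode() /*-{
    return $wnd.appMode;
}-*/;

@Override
public void onModuleLoad() {
    String appMode = getAppMode();
    if (appMode != null) {
        // MODE1 is assumed to be a String constant such as "mode1"
        if (appMode.equals(MODE1)) {
            ...
        }
        ...
    }
}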

GWT uses JavaScript to modify the page content. So you don't load a new page [*].
With GWT, you don't need the server to create dynamic HTML content anymore. It's created dynamically on the client side (using static JavaScript code). When you need to load something from the server, you just load data objects (in JSON or XML format, or using GWT-RPC). The client may then use this data to build HTML snippets (to set innerHTML) or DOM objects to modify the browser's DOM tree.
With GWT, you don't have to build these snippets manually: You can use Widgets and UiBinder (client side HTML templating, enhanced with GWT tags and dynamic parameters).
[*] There are some special cases (e.g. if you have a https login page, whereas the rest of the app might use http), where you do load a new page, but that means either that your other page doesn't use GWT at all, or that you create a separate GWT module for it. Of course you can share some of the Java classes between these modules.

GWT is used to build applications like Google Reader or Gmail, which means that there is just 'one' page. You could have a 'window' inside that page that shows the contact information and a 'window' that shows the help information. When the user clicks the corresponding link, you show that 'window'.

google.load issue

Hi, I am experimenting with the Google AJAX API at the moment and, following the examples from the documentation, I have two script tags in my HTML file:
<script src="http://www.google.com/jsapi" type="text/javascript"></script>
<script language="Javascript" type="text/javascript">google.load('search', '1');</script>
All works fine, but it does not work when I am using jQuery and try to call google.load('search', '1'); in an external JavaScript file after $(document).ready(function()
I get the following error: null is null or not an object.
I am obviously missing something fundamental as I am just learning JavaScript, but I was under the impression that it is best to use JavaScript unobtrusively. The second script tag, which actually contains some JS code, isn't unobtrusive. Can anyone help with this, please?

From what you have explained, it seems your page is set up something like this:
<script src="http://www.google.com/jsapi" type="text/javascript"></script>
<script type="text/javascript">
google.load('jquery');
$(document).ready(function(){
... do stuff ...
});
</script>
<script src="/my/external.js" type="text/javascript"></script>
However, this will not work as you expect since the document.ready event will not fire until the DOM is completely loaded. JavaScript files, however, are executed as they are loaded. So the actual execution looks like this:
Load Google JSAPI
Load jQuery
Load External.js
Call Document Ready
Depending on what the rest of your code looks like, you might want to either put all your initialization code in a separate file, or move your search load back into the main document.
ABOUT UNOBTRUSIVE CODE:
David, unobtrusive JavaScript has to do with how it affects the page, not with whether it is in-page or external.
It is more about not making your site so dependent on JavaScript that it does not function with it disabled.
For instance, this is obtrusive:
<a href="#" onclick="doSomething()">Click Me</a>
Because it will only work with JavaScript enabled. Additionally the code is inline which is bad because it does not separate functionality from structure (HTML).
However, taking a similar piece of code:
<a href="/do/something" id="do-something">Click Me</a>
and using this javascript/jquery snippet:
$(document).ready(function(){
$("#do-something").click(function(e){
doSomethingNicer();
e.preventDefault(); // Keep the browser from following the href
});
});
It becomes unobtrusive because the page still works (it loads /do/something by default), but it works in a nicer way when JavaScript is enabled (it executes the JavaScript instead of loading that URL). This is also called Progressive Enhancement.
