I am working with Liferay and I need to show a preview of the HTML output of a URL as an embedded window in a JSP view. I am assessing different possibilities.
Somehow store the interface to preview as a screenshot image and show it as an embedded image. The good thing is that the formatting would be exactly the same.
Parse the URL output stream with a BufferedReader and strip all html, script and body tags with indexOf. Embed images as cid:
Some kind of include, jsp:include or liferay-util:include, from the direct URL or from a downloaded temporary HTML output
Any jQuery AJAX $().html() kind of solution
Any HTML-level solution: iframe, applet, frame, applet window or whatever else exists
What do you think is the best or recommended way: the simplest, most reliable and most exact-looking? Any code or reference?
And what if I had to send it as JavaMail Message content to an email address?
Thank you!!
This should probably be done client-side, in JavaScript, or even via iframes. Either put the iframes in the page directly, or have JavaScript code that generates the iframes, and point the iframes at the URL to be previewed. Keep it simple.
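As a minimal sketch, a plain iframe in the JSP view is enough (the URL below is just a placeholder for whatever page you want to preview); the browser renders the remote page itself, so the formatting matches the original exactly:

<%-- Sketch: embed the page to be previewed in an inline frame --%>
<iframe src="http://www.example.com/page-to-preview" width="100%" height="600" style="border:0"></iframe>

The same markup can be generated client-side with document.createElement("iframe") if the URL is only known in JavaScript.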
I'm working on server-side HTML rendering.
The case: the user has a simple page with 3 cells. In each cell he can enter HTML, CSS and JS code. After that, it is sent to the server, which renders the HTML and CSS code taking the JavaScript code into account.
My idea was to "simulate" a headless browser. So far I have only found PhantomJS, but I don't think it is very comfortable to use.
The result should be only the rendered HTML DOM.
Thank you.
Try headless Chrome; it works on all operating systems:
https://chromium.googlesource.com/chromium/src/+/lkgr/headless/README.md
On Linux you have one more option: you can run any normal browser with a virtual screen buffer.
Thank you for your response. As far as I can see, I would have to use Node.js. Is there a way to stay in the Java environment without Node.js?
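For what it's worth, headless Chrome does not require Node.js: it can also be driven from Java, for example through Selenium WebDriver. A minimal sketch (assuming the selenium-java dependency and a chromedriver binary on the PATH; long-running scripts may still need explicit waits):

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;

public class RenderedDom {
    public static void main(String[] args) {
        // Start Chrome without a visible window.
        ChromeOptions options = new ChromeOptions();
        options.addArguments("--headless");
        WebDriver driver = new ChromeDriver(options);
        try {
            driver.get("https://example.com/page-with-js"); // placeholder URL
            // The page source at this point is the DOM after the page's JavaScript has run.
            String renderedDom = driver.getPageSource();
            System.out.println(renderedDom);
        } finally {
            driver.quit();
        }
    }
}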
I'm trying to use Jsoup to gather wave height information from Surfline.com. I have the element I want in the screenshot, and it's showing in the dev tools. When I scrape the site with Jsoup, the returned string includes everything seen in the dev tools except the "1-2ft", which is what I need. The site is JavaScript heavy and I'm assuming that Jsoup is grabbing the HTML before the JavaScript actually runs (I have no clue really). Do I need to specifically tell Jsoup to wait for the page load, or am I missing some other critical component?
This is the code I'm using.
Document doc = Jsoup.connect("http://www.surfline.com/surf-report/folly-beach-pier-southside-southeast_5294/").get();
Elements content = doc.select("div[id=current-surf-range]");
System.out.println(content);
and this is the output I'm seeing in my IDE
<div id="current-surf-range" style="font-size:21px;font-weight:bold;padding-top:7px; padding-bottom: 7px;"></div>
It seems really odd that the contents of the div wouldn't be returned with it. This is my first time using Jsoup, and I tried to read through the docs as best I could, but nothing seemed to touch on this particular issue. Any insight would be awesome and greatly appreciated.
What you see in the browser is not necessarily what you would get when downloading the page by URL with your HTTP library of choice. In fact, you should never expect them to be the same. On the modern web, pages are quite dynamic and are loaded asynchronously, involving multiple API calls to different resource providers and JavaScript being executed in the browser (which has a JavaScript engine).
What you get with Jsoup in this case is the initial HTML that the browser starts to form the page with. Then there is a set of XHR calls to the Surfline API that bring the data into the browser, which then dynamically fills in different parts of the page, including the current surf range.
The simplest way to approach the problem is to switch to the browser automation tool Selenium, which fires up a real browser. You can then wait for the current surf range element to have a value and, if you wish to continue with Jsoup, get the page source and feed it to Jsoup for further parsing.
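A sketch of that approach (the Selenium 4 API and chromedriver on the PATH are assumed; the wait condition is just one way of detecting that the value has been loaded):

import java.time.Duration;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.support.ui.WebDriverWait;

public class SurfRange {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("http://www.surfline.com/surf-report/folly-beach-pier-southside-southeast_5294/");
            // Wait until the page's JavaScript has filled in the surf range div.
            new WebDriverWait(driver, Duration.ofSeconds(15))
                    .until(d -> !d.findElement(By.id("current-surf-range")).getText().trim().isEmpty());
            // Hand the rendered HTML to Jsoup to keep using the familiar select() API.
            Document doc = Jsoup.parse(driver.getPageSource());
            System.out.println(doc.select("div#current-surf-range").text());
        } finally {
            driver.quit();
        }
    }
}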
Another approach would involve inspecting, in the browser developer tools, the requests that the page makes, and then simulating those requests in your code, parsing the JSON responses and extracting the surf forecast data.
I'm writing an Android app that parses a web page, filters the image links from it and loads them in a WebView.
It works fine for static pages, but I have no idea how to handle pages that dynamically add content as I scroll down, such as 9gag, imgur, Facebook, etc.
Is there a solution for this? I guess the dynamic content is handled by JavaScript. Maybe there's a way to call this JavaScript code before parsing the page?
I'd appreciate any advice.
Thanks in advance.
You should try looking at the requests that dynamic pages make.
All of them use a pattern of dynamic pagination, or a cursor.
Imgur, for example, issues requests with a URL like this:
https://imgur.com/gallery/hot/viral/page/4/hit?set=0
where you specify the page, and the set is the portion of the page (normally they go up to 3).
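A rough sketch of replaying such paged requests from Java with plain HttpURLConnection (the page/set ranges and the User-Agent header are only illustrative, and the exact response format is not guaranteed to stay stable):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class PagedFetcher {
    public static void main(String[] args) throws Exception {
        // Walk the first few "pages" the site would load as you scroll down.
        for (int page = 0; page < 3; page++) {
            for (int set = 0; set <= 2; set++) {
                URL url = new URL("https://imgur.com/gallery/hot/viral/page/" + page + "/hit?set=" + set);
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                conn.setRequestProperty("User-Agent", "Mozilla/5.0");
                StringBuilder body = new StringBuilder();
                try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        body.append(line).append('\n');
                    }
                }
                // body now holds the fragment the page would normally insert as you
                // scroll; parse it for image links just like the static case.
                System.out.println(url + " -> " + body.length() + " chars");
            }
        }
    }
}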
I want to display an external webpage (exactly as it's rendered on that site) inside a webpage in my application, in a way that's fast and better for SEO crawlers, and I was wondering if there's a way to do that with Java EE?
If not, then which is better for performance and SEO: the XMLHttpRequest way or the iframe way?
Please advise with sample code or a link if possible, thanks.
Update: an example website is http://www.akhbarak.net/
If you need to display content from different pages inline, use an iframe (iframe stands for inline frame; it has nothing to do with Apple).
If you'd like to use AJAX to display pages, I would recommend colorbox.
Note that accessing pages on a different domain via AJAX is next to impossible; it would be a very, very big security hole. I would not recommend doing it. You would have to use a proxy on your own server to fetch the page and return its HTML.
That said, using the iframe in your source code, so it is loaded with the rest of the page, seems like your best bet. Sites like Facebook and Twitter use this in embeddable "like" and "tweet" widgets so that those widgets can make requests on their own domain, that is, twitter.com or facebook.com. While managing lots of iframes isn't very fun, it is a very accepted way of doing what you want to do.
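If you do end up needing the server-side proxy mentioned above, a bare-bones sketch as a Java servlet could look like this (the url parameter name is arbitrary, the javax.servlet API is assumed, and a real proxy should whitelist which hosts it is allowed to fetch):

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet("/proxy")
public class ProxyServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        // Fetch the remote page on the server so the browser only ever talks to our own domain.
        String target = req.getParameter("url");
        HttpURLConnection conn = (HttpURLConnection) new URL(target).openConnection();
        resp.setContentType(conn.getContentType());
        try (InputStream in = conn.getInputStream(); OutputStream out = resp.getOutputStream()) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
    }
}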
In theory, you could
load the whole page into a PHP variable,
replace the body tags with divs,
take out the html tags,
pull out the entire head section and put it in the encompassing page's head,
and replace all links with absolute ones (i.e. '/images' changes to 'http://example.com/images').
Would it be easy to do? Probably not. It's the only way I can think of to accomplish it so that the site appears as part of yours though.
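Since the question is about Java EE rather than PHP, the same idea can be sketched with Jsoup (the selector list is simplistic and the URL is taken from the question's update; real pages will usually need more cleanup than this):

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class InlineExternalPage {
    public static void main(String[] args) throws Exception {
        // Load the whole page; Jsoup remembers the base URL for resolving relative links.
        Document doc = Jsoup.connect("http://www.akhbarak.net/").get();
        // Rewrite relative links to absolute ones, e.g. '/images' -> 'http://www.akhbarak.net/images'.
        for (Element el : doc.select("a[href], link[href], img[src], script[src]")) {
            String attr = el.hasAttr("href") ? "href" : "src";
            el.attr(attr, el.absUrl(attr));
        }
        // Keep only the body content so it can be dropped inside your own page's markup.
        String embeddableHtml = doc.body().html();
        System.out.println(embeddableHtml);
    }
}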
I need a solution for getting HTML content from a browser. When rendering in a browser, the JS will be run; when just downloading the page, it won't be. So plain HTML libraries like lxml, beautifulsoup and others are not going to work.
I've found a project named pywebkitgtk, but its purpose is to create a browser with a front end.
Is there any way to put a URL into a "fake browser", render it, run all of its JavaScript and save the result into an HTML file? I don't need any front end; just a back end is fine.
I need to use Python or Java to do that.
selenium-rc lets you drive an actual browser for your purpose, under the control of any of several languages of your choice, which include both Python and Java. Check it out!
For a detailed example of use with Python, see here.
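Note that selenium-rc has since been superseded by Selenium WebDriver; if Java is the chosen language, a minimal sketch of rendering a URL and saving the resulting HTML to a file (chromedriver on the PATH is assumed, and the URL is a placeholder) would be:

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class SaveRenderedPage {
    public static void main(String[] args) throws Exception {
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("https://example.com/js-heavy-page"); // placeholder URL
            // getPageSource() reflects the DOM after the browser has run the page's JavaScript.
            Files.write(Paths.get("rendered.html"),
                    driver.getPageSource().getBytes(StandardCharsets.UTF_8));
        } finally {
            driver.quit();
        }
    }
}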