I've got an image on an html page that is also an input.
<input type="image" src=...
I couldn't care less about clicking the image. I want to save the image to a File. It seems to be impossible which seems ridiculous. I tried casting from HtmlImageInput to HtmlImage but I just get an error. How can I do this? Do I need to switch from HtmlUnit to something else? I don't care what I need to do to get this done.
By the way, I tried using selenium and taking a screenshot but it's taking a screenshot of the wrong area. Tried multiple different xpaths to the same element and it always takes the wrong screenshot.
Thanks for reporting.
Similar to HtmlImage, .saveAs(File) has just been added to HtmlImageInput.
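With a recent build that includes it, the call could look roughly like this (the selector and output path are just placeholders):
try (WebClient webClient = new WebClient()) {
    HtmlPage page = webClient.getPage("http://localhost:8080");
    HtmlImageInput input = page.querySelector("input[type='image']");
    input.saveAs(new File("image.png")); // same idea as HtmlImage.saveAs(File)
}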
BTW, if you can't use the latest snapshot, then you can use:
try (WebClient webClient = new WebClient()) {
    HtmlPage page = webClient.getPage("http://localhost:8080");
    HtmlImageInput input = page.querySelector("input");
    URL url = page.getFullyQualifiedUrl(input.getSrcAttribute());

    final String accept = webClient.getBrowserVersion().getImgAcceptHeader();
    final WebRequest request = new WebRequest(url, accept);
    request.setAdditionalHeader("Referer", page.getUrl().toExternalForm());

    WebResponse imageWebResponse = webClient.loadWebResponse(request);
    // write the response bytes to disk (the target path here is just an example)
    try (InputStream in = imageWebResponse.getContentAsStream()) {
        Files.copy(in, Paths.get("image.png"), StandardCopyOption.REPLACE_EXISTING);
    }
}
HtmlImage codeImg = (HtmlImage) findElement(xpath, index);
InputStream is = null;
byte[] data = null;
try {
    is = codeImg.getWebResponse(true).getContentAsStream();
    // read the whole stream; available() only reports bytes readable without blocking
    data = IOUtils.toByteArray(is);
} catch (IOException e) {
    log.error("get img stream meets error :", e);
} finally {
    IOUtils.closeQuietly(is);
}
if (ArrayUtils.isEmpty(data)) {
    String errorMessage = String.format("download img verify code with xpath %s failed.", xpath);
    throw new EnniuCrawlException(TargetResponseError.ERROR_RESPONSE_BODY, errorMessage);
}
String base64Img = Base64Utils.encodeToString(data);
USECASE:
I have a document stored on HelloSign which is supposed to be sent to a signer after prepopulating it with some data. Additionally, I have a field in the document wherein I should be able to upload the signer's image from my DB.
What I have done:
TemplateSignatureRequest request = new TemplateSignatureRequest();
request.setTitle(title);
request.setSubject(emailSubject);
request.setMessage(message);
request.setSigner("ROLE", "<<email_id>>", name);
request.setClientId(CLIENT_ID);
request.setTemplateId(TEMPLATE_ID);
request.setTestMode(true);
request.setCustomFields(customFields);
HelloSignClient client = new HelloSignClient(API_KEY);
client.sendTemplateSignatureRequest(request);
QUESTION: Is there a way I can directly populate the image in the request object by using something like:
request.setDocuments(docs);
Or is there any other way I can achieve this?
Note: I could not mark the image part in the doc as a custom field since I could not find an option to do it on HelloSign
I am trying to replace the Picture section in the image below
TemplateSignatureRequest extends AbstractRequest, which has a function for adding a file:
public void addFile(File file) throws HelloSignException {
this.addFile(file, (Integer)null);
}
This was taken from the library. So you can simply use
request.addFile(file);
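If that inherited method does the job for your template, usage against the request built in the question could look roughly like this (the file path is just a placeholder):
File signerImage = new File("/path/to/signer-image.png"); // placeholder path
request.addFile(signerImage);                             // may throw HelloSignException
client.sendTemplateSignatureRequest(request);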
I reached out to apisupport@hellosign.com to ask them if there is any way to achieve this, and this is the response I got:
"This is currently not available, However, We're always looking for ways to improve HelloSign API and we regularly release new versions of our products with better performance, additional features, and security enhancements. I'll reach out to our product team and pass this idea along as a feature enhancement for them to review to see if this is something we can place on our roadmap"
So, I figured out a workaround using PdfStamper:
private byte[] stampImageToDoc() throws Exception {
    try {
        PdfReader pdfReader = new PdfReader(<<template_pdf_path>>);
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        PdfStamper pdfStamper = new PdfStamper(pdfReader, os);
        PdfContentByte cb = pdfStamper.getOverContent(1);

        File file = new File(<<imagePath>>);
        byte[] imageFile = FileUtils.readFileToByteArray(file);
        if (imageFile != null) {
            Image image = Image.getInstance(imageFile);
            image.scaleAbsoluteHeight(150);
            image.scaleAbsoluteWidth(150);
            image.setAbsolutePosition(29, 500); // position on the page
            cb.addImage(image);
        }
        pdfStamper.close();
        return os.toByteArray();
    } catch (DocumentException e) {
        e.printStackTrace();
        throw e;
    } catch (IOException e) {
        e.printStackTrace();
        throw e;
    }
}
Instead of using TemplateSignatureRequest, we use SignatureRequest and attach the stamped doc to the request:
SignatureRequest request = new SignatureRequest();
List<Signer> signers = new ArrayList<>();
Signer signer = new Signer(req.getStudentEmail(), "DME");
signers.add(signer);
request.setTitle(title);
request.setSubject(emailSubject);
request.setMessage(message);
request.setSigners(signers);
request.setClientId(CLIENT_ID);
request.setTestMode(true);
// Image
byte[] docBytes = stampImageToDoc();
List<Document> docs = new ArrayList<>();
Document d = new Document();
File tempFile = new File(<<temporary_path>>);
FileUtils.writeByteArrayToFile(tempFile, docBytes);
d.setFile(tempFile);
docs.add(d);
request.setDocuments(docs);
HelloSignClient client = new HelloSignClient(API_KEY);
client.sendSignatureRequest(request);
Note: This might not be the best solution, but it's just a workaround I could think of.
I'm trying to understand how to use HtmlUnit and Jsoup together and have been successful in understanding the basics. However, when I try to store the text from a specific webpage into a string, it only ends up holding a single line rather than the whole text.
I know the code I've written works, because when I print out p.text() it prints the whole text stored on the website.
private static String getText() {
try {
final WebClient webClient = new WebClient();
final HtmlPage page = webClient.getPage("https://www.gov.uk/government/policies/brexit");
List<HtmlAnchor> anchors = page.getAnchors();
HtmlPage page1 = anchors.get(18).click();
String url = page1.getUrl().toString();
Document doc = Jsoup.connect(url).get();
Elements paragraphs = doc.select("div[class=govspeak] p");
for (Element p : paragraphs)
System.out.println(p.text());
} catch (Exception e) {
e.printStackTrace();
Logger.getLogger(HTMLParser.class.getName()).log(Level.SEVERE, null, e);
}
return null;
}
When I introduce a String to store the text from p.text(), it only holds a single line rather than the whole text.
private static String getText() {
String text = "";
try {
final WebClient webClient = new WebClient();
final HtmlPage page = webClient.getPage("https://www.gov.uk/government/policies/brexit");
List<HtmlAnchor> anchors = page.getAnchors();
HtmlPage page1 = anchors.get(18).click();
String url = page1.getUrl().toString();
Document doc = Jsoup.connect(url).get();
Elements paragraphs = doc.select("div[class=govspeak] p");
for (Element p : paragraphs)
text=p.text();
} catch (Exception e) {
e.printStackTrace();
Logger.getLogger(HTMLParser.class.getName()).log(Level.SEVERE, null, e);
}
return text;
}
Ultimately, all I want to do is store the whole text into a string. Any help would be greatly appreciated, thanks in advance.
Document doc = Jsoup.connect(url).get();
String text = doc.text();
That's basically it. Since Jsoup already takes care of stripping all the HTML tags from the text, you can call doc.text() and you'll get the content of the whole page without HTML tags.
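Dropped into the question's helper, that could look roughly like this (the URL stands in for whichever page you end up on):
private static String getText() throws IOException {
    // fetch the page with Jsoup and return all of its visible text in one call
    Document doc = Jsoup.connect("https://www.gov.uk/government/policies/brexit").get();
    return doc.text();
}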
for (Element p : paragraphs)
text+=p.text(); // Append the text.
In your code, you are overwriting the value of the variable text on every iteration; that's why only the last line is returned by the function.
I think it is a strange idea to use the HtmlUnit result as the starting point for Jsoup. There are various drawbacks to your approach (e.g. think about cookies), and of course HtmlUnit has already parsed the HTML code, so you will do the work twice.
I hope this code fulfills your requirements without Jsoup.
private static String getText() throws FailingHttpStatusCodeException, MalformedURLException, IOException {
StringBuilder text = new StringBuilder();
try (WebClient webClient = new WebClient()) {
final HtmlPage page = webClient.getPage("https://www.gov.uk/government/policies/brexit");
List<HtmlAnchor> anchors = page.getAnchors();
HtmlPage page1 = anchors.get(18).click();
DomNodeList<DomNode> paragraphs = page1.querySelectorAll("div[class=govspeak] p");
for (DomNode p : paragraphs) {
text.append(p.asText());
}
}
return text.toString();
}
I am using Selenium WebDriver + TestNG. Help me solve the following issue if possible:
Search for all broken images on a page and show them in the console (using an assertion) after the test fails.
The following test fails after the first broken image is found; I need the test to check all images and show the results when it fails:
public class BrokenImagesTest3_ {
@Test
public static void links() throws IOException, StaleElementReferenceException {
System.setProperty("webdriver.chrome.driver", "/C: ...");
WebDriver driver = new ChromeDriver();
driver.manage().window().maximize();
driver.get("https://some url");
driver.manage().timeouts().implicitlyWait(20, TimeUnit.SECONDS);
//Find total number of images on the page and print it in the console.
List<WebElement> total_images = driver.findElements(By.tagName("img"));
System.out.println("Total Number of images found on page = " + total_images.size());
//for loop to check the response code of each image URL.
boolean isValid = false;
for (int i = 0; i < total_images.size(); i++) {
String image = total_images.get(i).getAttribute("src");
if (image != null) {
//Call getResponseCode function for each URL to check response code.
isValid = getResponseCode(image);
//Print message based on value of isValid which Is returned by getResponseCode function.
if (isValid) {
System.out.println("Valid image:" + image);
System.out.println("----------XXXX-----------XXXX----------XXXX-----------XXXX----------");
System.out.println();
} else {
System.out.println("Broken image ------> " + image);
System.out.println("----------XXXX-----------XXXX----------XXXX-----------XXXX----------");
System.out.println();
}
} else {
//If the <img> tag does not contain a src attribute value, print this message
System.out.println("String null");
System.out.println("----------XXXX-----------XXXX----------XXXX-----------XXXX----------");
System.out.println();
continue;
}
}
driver.close();
}
//Function to get the response code of a link URL.
//The link URL is valid if the response code = 200.
//The link URL is invalid if the response code = 404 or 505.
public static boolean getResponseCode(String chkurl) {
boolean validResponse = false;
try {
//Get response code of image
HttpClient client = HttpClientBuilder.create().build();
HttpGet request = new HttpGet(chkurl);
HttpResponse response = client.execute(request);
int resp_Code = response.getStatusLine().getStatusCode();
System.out.println(resp_Code);
Assert.assertEquals(resp_Code, 200);
if (resp_Code != 200) {
validResponse = false;
} else {
validResponse = true;
}
} catch (Exception e) {
}
return validResponse;
}
}
The reason your code stops at the first failure is that you are using an Assert to check that resp_Code equals 200. TestNG stops executing the test method at the first failed assert.
I would do this a little differently. You can use a CSS selector, "img[src]", to find only images that have a src attribute, so you don't have to deal with the null case. When I look for broken images, I use the naturalWidth property; it will be 0 if the image is broken. Using these two pieces, the code would look like...
List<WebElement> images = driver.findElements(By.cssSelector("img[src]"));
System.out.println("Total Number of images found on page = " + images.size());
int brokenImagesCount = 0;
for (WebElement image : images)
{
if (isImageBroken(image))
{
brokenImagesCount++;
System.out.println(image.getAttribute("outerHTML"));
}
}
System.out.println("Count of broken images: " + brokenImagesCount);
Assert.assertEquals(brokenImagesCount, 0, "Count of broken images is 0");
then add this function
public boolean isImageBroken(WebElement image)
{
    // naturalWidth is "0" when the browser failed to load the image
    return image.getAttribute("naturalWidth").equals("0");
}
I'm only writing out the images that are broken; I prefer this since it keeps the log cleaner. Printing the WebElement image itself would produce output that isn't useful, so I changed that to write the outerHTML, which is the HTML of the IMG tag.
assertEquals() throws an AssertionError, not an Exception. If the codes are not equal, it throws an AssertionError, and your test stops and finishes as failed.
If you catch Error instead of Exception in your catch block, it should work as you expect.
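A minimal sketch of that idea, keeping the question's assert but widening the catch so one failed image no longer aborts the loop (it reuses the HttpClient and TestNG Assert already imported in the question):
public static boolean getResponseCode(String chkurl) {
    boolean validResponse = false;
    try {
        HttpClient client = HttpClientBuilder.create().build();
        HttpResponse response = client.execute(new HttpGet(chkurl));
        int respCode = response.getStatusLine().getStatusCode();
        Assert.assertEquals(respCode, 200);      // throws AssertionError on mismatch
        validResponse = true;
    } catch (Exception | AssertionError e) {     // AssertionError is an Error, not an Exception
        validResponse = false;
    }
    return validResponse;
}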
As an addendum to JeffC's answer, I prefer to collect the offending src attributes and report them in the failure message rather than logging them separately, something like:
List<WebElement> images = driver.findElements(By.cssSelector("img[src]"));
System.out.println("Total Number of images found on page = " + images.size());
StringBuilder brokenImages = new StringBuilder();
for (WebElement image : images)
    if (isImageBroken(image))
        brokenImages.append(image.getAttribute("src")).append(";");
Assert.assertEquals(brokenImages.length(), 0,
        "The following images failed to load: " + brokenImages);
(only an answer as it's easier to explain with code than in a comment)
I have a servlet which is responsible for enabling a user to update a reports table and upload a report at the same time. I have written code that enables a user to upload a document and also update the table with other details, e.g. date submitted.
However, the user will not always have to upload a document. In that case it should still be possible to edit a report's details and come back later to upload the file, i.e. the user can submit the form without selecting a file and it still updates the table.
This part is what is not working. If a user selects a file and makes some changes, the code works. If a user doesn't select a file and tries to submit the form, it redirects to my servlet but the page is blank: no stack trace, no error is thrown.
Below is part of the code I have in my servlet:
if(param.equals("updateschedule"))
{
String[] allowedextensions = {"pdf","xlsx","xls","doc","docx","jpeg","jpg","msg"};
final String path = request.getParameter("uploadlocation_hidden");
final Part filepart=request.getPart("uploadreport_file");
int repid = Integer.parseInt(request.getParameter("repid_hidden"));
int reptype = Integer.parseInt(request.getParameter("reporttype_select"));
String webdocpath = request.getParameter("doclocation_hidden");
String subperiod = request.getParameter("submitperiod_select");
String duedate = request.getParameter("reportduedate_textfield");
String repname = request.getParameter("reportname_textfield");
String repdesc = request.getParameter("reportdesc_textarea");
String repinstr = request.getParameter("reportinst_textarea");
int repsubmitted = Integer.parseInt(request.getParameter("repsubmitted_select"));
String datesubmitted = request.getParameter("reportsubmitdate_textfield");
final String filename = getFileName(filepart);
OutputStream out = null;
InputStream filecontent=null;
String extension = filename.substring(filename.lastIndexOf(".") + 1, filename.length());
if(Arrays.asList(allowedextensions).contains(extension))
{
try
{
out=new FileOutputStream(new File(path+File.separator+filename));
filecontent = filepart.getInputStream();
int read=0;
final byte[] bytes = new byte[1024];
while((read=filecontent.read(bytes))!=-1)
{
out.write(bytes,0,read);
}
String fulldocpath = webdocpath+"/"+filename;
boolean succ = icreditdao.updatereportschedule(repid, reptype, subperiod, repname, repsubmitted,datesubmitted, duedate,fulldocpath, repdesc, repinstr);
if(succ==true)
{
response.sendRedirect("/webapp/Pages/Secured/ReportingSchedule.jsp?msg=Report Schedule updated successfully");
}
}
catch(Exception ex)
{
throw new ServletException(ex);
}
}
I'm still teaching myself Java EE. Any help will be appreciated, and I'm open to other alternatives. I have thought of using jQuery to detect whether a file has been selected and then use a different set of code, e.g.
if(param.equals("updatewithnofileselected"))
{//update code here}
but I think there must be a better solution. Using JDK 6 and Servlet 3.0.
Try this one:
MultipartParser parser = new MultipartParser(request, 500000000, false, false, "UTF-8");
Part part;
while ((part = parser.readNextPart()) != null) {
    if (part.isFile()) {
        // a file was actually submitted with the form
        if (part.getName().equals("updateschedule")) {
            // updateschedule: save the file and update the table
        }
    } else if (part.isParam()) {
        // plain form field, no file content
        if (part.getName().equals("updatewithnofileselected")) {
            // update code here.
        }
    }
}
I used this when working with a multipart form and it's working fine.
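Alternatively, staying with the Servlet 3.0 API the question already uses, you could branch on whether the submitted part actually carries a file name before running the upload code. A rough sketch only, reusing the question's own helpers and variables (getFileName, icreditdao, the parsed form fields):
final Part filepart = request.getPart("uploadreport_file");
final String filename = (filepart == null) ? "" : getFileName(filepart);

if (filename.isEmpty()) {
    // no file selected: update only the report details and keep the existing document path
    boolean succ = icreditdao.updatereportschedule(repid, reptype, subperiod, repname,
            repsubmitted, datesubmitted, duedate, webdocpath, repdesc, repinstr);
    if (succ) {
        response.sendRedirect("/webapp/Pages/Secured/ReportingSchedule.jsp?msg=Report Schedule updated successfully");
    }
} else {
    // a file was selected: run the existing extension check and copy loop from the question
}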
I am trying to search all image tags on a specific page. An example page would be www.chapitre.com
I am using the following code to search for all images on the page:
HtmlPage page = HTMLParser.parseHtml(webResponse, webClient.openWindow(null,"testwindow"));
List<?> imageList = page.getByXPath("//img");
ListIterator li = imageList.listIterator();
while (li.hasNext() ) {
HtmlImage image = (HtmlImage)li.next();
URL url = new URL(image.getSrcAttribute());
//For now, only load 1X1 pixels
if (image.getHeightAttribute().equals("1") && image.getWidthAttribute().equals("1")) {
System.out.println("This is an image: " + url + " from page " + webRequest.getUrl() );
}
}
This doesn't return all the image tags on the page. For example, an image tag with src="http://ace-lb.advertising.com/site=703223/mnum=1516/bins=1/rich=0/logs=0/betr=A2099=[+]LP2", width="1" and height="1" should be captured, but it's not. Am I doing something wrong here?
Any help is really appreciated.
Cheers!
That's because
URL url = new URL(image.getSrcAttribute());
is throwing an exception :)
Try this code:
public Main() throws Exception {
WebClient webClient = new WebClient();
webClient.setJavaScriptEnabled(false);
HtmlPage page = webClient.getPage("http://www.chapitre.com");
List<HtmlImage> imageList = (List<HtmlImage>) page.getByXPath("//img");
for (HtmlImage image : imageList) {
try {
new URL(image.getSrcAttribute());
if (image.getHeightAttribute().equals("1") && image.getWidthAttribute().equals("1")) {
System.out.println(image.getSrcAttribute());
}
} catch (Exception e) {
System.out.println("You didn't see this comming :)");
}
}
}
You can even get those 1x1 pixel images by xpath.
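For example, something along these lines (assuming the width and height are set as attributes rather than via CSS):
List<?> onePixelImages = page.getByXPath("//img[@width='1' and @height='1']");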
Hope this helps.