Page orientation in Aspose - java

I am using the aspose-words-15.6.0 API for Java. I want to set the page orientation to portrait or landscape based on the page number.
Scenario:
I have a document with 3 pages, and I want the page orientation to be as follows:
1st Page : Portrait.
2nd Page : Landscape.
3rd Page : Portrait.
EDIT:
I have tried DocumentBuilder. There seems to be a way to achieve this, but I am missing something (see the screenshot attached to the original question).
Any help would be greatly appreciated.

There is no concept of a page in MS Word documents. Pages are created by Microsoft Word on the fly, and unfortunately there is no straightforward way to set the orientation per page. However, you can specify orientation settings for a whole section using the Section.PageSetup.Orientation property (section.getPageSetup().setOrientation(...) in Java), and a section may contain more than one page.
Alternatively, you may be able to create a separate section for each page of the Word document using Aspose.Words and then specify the page orientation for the section corresponding to each page. Please report this requirement in the Aspose.Words forum; we will then develop code for it and provide you with more information.
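If a document has already been split so that each page is its own section (an assumption; producing that split is the part the answer offers to help with), the orientations can then be applied section by section. A minimal sketch against the Aspose.Words for Java API; the class name and the file names in `main` are hypothetical:

```java
import com.aspose.words.Document;
import com.aspose.words.Orientation;
import com.aspose.words.Section;

public class SectionOrientationSketch {

    // Applies orientations[i] to section i. Assumes one section per page,
    // because Aspose.Words stores orientation per section, not per page.
    public static void apply(Document doc, int[] orientations) {
        int count = Math.min(doc.getSections().getCount(), orientations.length);
        for (int i = 0; i < count; i++) {
            Section section = doc.getSections().get(i);
            section.getPageSetup().setOrientation(orientations[i]);
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = new Document("input.docx"); // hypothetical input file
        apply(doc, new int[] {Orientation.PORTRAIT, Orientation.LANDSCAPE, Orientation.PORTRAIT});
        doc.save("output.docx"); // hypothetical output file
    }
}
```

Sections beyond the length of the array keep whatever orientation they already had.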
EDIT:
If you want to build document from scratch, please use the following code:
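import com.aspose.words.BreakType;
import com.aspose.words.Document;
import com.aspose.words.DocumentBuilder;
import com.aspose.words.Orientation;

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
// Section 1: portrait
builder.writeln("Content on first page");
builder.getPageSetup().setOrientation(Orientation.PORTRAIT);
// A new-page section break starts section 2, which is then switched to landscape
builder.insertBreak(BreakType.SECTION_BREAK_NEW_PAGE);
builder.writeln("Content on second page");
builder.getPageSetup().setOrientation(Orientation.LANDSCAPE);
// Section 3: back to portrait
builder.insertBreak(BreakType.SECTION_BREAK_NEW_PAGE);
builder.writeln("Content on third page");
builder.getPageSetup().setOrientation(Orientation.PORTRAIT);
// getMyDir() is a helper (not part of the Aspose API) that returns the output folder
doc.save(getMyDir() + "15.10.0.docx");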
I work with Aspose as a Developer Evangelist.

Related

Jsoup cannot extract dynamically generated DOM content in Android Studio

I use Jsoup with Java in Android Studio to extract movie information from a web page (as pictured in the original question), but I cannot get the movie link. Please help me find a way to get the link highlighted in the picture.
Document document = Jsoup.connect(strings[0]).get();
Elements el = document.select("video.jw-video.jw-reset"); //-->Error
According to the documentation, Document#select(String cssQuery) accepts a CSS query only.
You may use Document#getElementsByClass(String className) in your case.
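A minimal sketch of both lookups, assuming the `<video>` element is actually present in the HTML that Jsoup receives. The class name and the sample markup and URL are hypothetical. Keep in mind that Jsoup does not execute JavaScript, so if a player script injects the `<video>` tag at runtime, neither call will find it in the downloaded page.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class VideoLinkSketch {

    // getElementsByClass takes a single class name, without a dot or tag name.
    public static String srcByClass(String html) {
        Element video = Jsoup.parse(html).getElementsByClass("jw-video").first();
        return video == null ? null : video.attr("src");
    }

    // select takes a CSS query: tag name plus both classes.
    public static String srcByCss(String html) {
        Element video = Jsoup.parse(html).select("video.jw-video.jw-reset").first();
        return video == null ? null : video.attr("src");
    }

    public static void main(String[] args) {
        // Hypothetical static markup; a JavaScript player may only create this element when it runs
        String html = "<div><video class=\"jw-video jw-reset\" src=\"https://example.com/movie.mp4\"></video></div>";
        System.out.println(srcByClass(html));
        System.out.println(srcByCss(html));
    }
}
```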

Linking between different sections in the same PDF

We are creating a PDF document consisting of different sections that are generated separately and then added to a single document. Now we need to provide a hyperlink that navigates the user to a point in a different section.
We are using iText 7.0.5 and have tried using Link and link annotations, without success. The linking works within the same section but not between different sections.
Your use case: merge PDFs and provide a means of navigating in the resulting document.
How to do it: look at the following iText sample: https://github.com/itext/i7js-examples/blob/develop/src/test/java/com/itextpdf/samples/sandbox/merge/MergeWithToc.java
In that sample several PDFs are merged and then a table of contents is created, so that one can click on its items and be navigated to the appropriate page.
To do so, the sample sets a named destination on some elements of your "sections":
// Put the destination at the very first page of each merged document
if (i == 1) {
text.setDestination("p" + pageNo);
}
doc.add(new Paragraph(text).setFixedPosition(pageNo, 549, 810, 40).setMargin(0).setMultipliedLeading(1));
and then sets an action on the appropriate TOC element:
p.setAction(PdfAction.createGoTo("p" + entry.getKey()));
Perhaps the same logic can be applied in your case as well.
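The snippets above are excerpts from the sample and do not run on their own. Below is a minimal, self-contained sketch of the same named-destination idea in a single document, assuming the iText 7.0.x/7.1.x package layout (in 7.2 and later `AreaBreakType` lives in `com.itextpdf.layout.properties`). The destination name `sec2`, the class name and the output file name are made up for the example. For separately generated sections, one option is to create the destinations in the final merged document, as MergeWithToc does, rather than relying on links written into the individual source PDFs.

```java
import java.io.FileNotFoundException;

import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.kernel.pdf.action.PdfAction;
import com.itextpdf.layout.Document;
import com.itextpdf.layout.element.AreaBreak;
import com.itextpdf.layout.element.Paragraph;
import com.itextpdf.layout.element.Text;
import com.itextpdf.layout.property.AreaBreakType;

public class NamedDestinationSketch {

    public static void create(String dest) throws FileNotFoundException {
        PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
        Document doc = new Document(pdf);

        // Page 1: a clickable paragraph that jumps to the named destination "sec2"
        doc.add(new Paragraph("Go to section 2").setAction(PdfAction.createGoTo("sec2")));

        // Page 2: the target text registers the named destination when it is laid out
        doc.add(new AreaBreak(AreaBreakType.NEXT_PAGE));
        Text target = new Text("Section 2").setDestination("sec2");
        doc.add(new Paragraph(target));

        doc.close();
    }

    public static void main(String[] args) throws FileNotFoundException {
        create("named-destination.pdf"); // hypothetical output file
    }
}
```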

Going to the next page of an ASPX form with Jsoup

I'm trying to go to the next page on an aspx form using JSoup.
I can find the next button itself. I just don't know what to do with it.
The idea is that, for that particular form, if the next button exists, we would simulate a click and go to the next page. But any solution other than simulating a click would be fine too, as long as we get to the next page.
I also need to update the results once we go to the next page.
// Connecting, entering the data and making the first request
...
// Submitting the form
Document searchResults = form.submit().cookies(resp.cookies()).post();
// reading the data. Everything up to this point works as expected
...
// finding the next button (this part also works as expected)
Element nextBtn = searchResults.getElementById("ctl00_MainContent_btnNext");
if (nextBtn != null) {
// click? I don't know what to do here.
searchResults = ??? // updating the search results to include the results from the second page
}
The page itself is www.somePage.com/someForm.aspx, so I can't use the solution stated here:
Android jsoup, how to select item and go to next page
I was unable to find any other suggestions.
Any ideas? What am I missing? Is simulating a click even possible with Jsoup? The documentation says nothing about it, but I'm sure people are able to navigate these types of forms.
Also, I'm working with Android, so I can't use HtmlUnit, as stated here:
importing HtmlUnit to Android project
Thank you.
This is not a job for Jsoup! Jsoup is a parser with a nice DOM API that lets you deal with wild HTML as if it were well-formed and free of errors and nonsense.
In your specific case you may be able to scrape the target site directly from your app by finding links and retrieving HTML pages recursively. Something like:
private void scrape(String url) throws IOException {
Document doc = Jsoup.connect(url).get();
// Analyze current document content here...
// Then follow the "next" link, if any ('#' selects by id; '.' would select by class)
for (Element link : doc.select("#ctl00_MainContent_btnNext[href]")) {
// absUrl() resolves a relative href against the current page's URL
scrape(link.absUrl("href"));
}
}
But in the general case, what you want requires far more functionality than Jsoup provides: a user agent capable of interpreting HTML, CSS and JavaScript, with a scriptable API you can call from your app to simulate a click. For example, Selenium:
WebDriver driver = new FirefoxDriver();
driver.get("http://www.somePage.com/someForm.aspx"); // the form page from the question
driver.findElement(By.id("ctl00_MainContent_btnNext")).click();
Selenium can't be bundled in an Android app, so I suggest you put your Selenium code on a server and make it accessible with some REST API.
Pagination on ASPX can be a pain. The best thing you can do is use your browser to inspect the form data it sends to the server, then try to emulate that in code.
I've written a detailed tutorial on how to handle it here but it uses the univocity HTML parser (which is commercial closed source) instead of JSoup.
In short, you should get the <form> element with id="aspnetForm" and read its form elements to build a POST request for the next page. The form data usually looks something like this:
__EVENTTARGET =
__EVENTARGUMENT =
__VIEWSTATE = /wEPDwUKMTU0OTkzNjExNg8WBB4JU29ydE9yZ ... a very long string
__VIEWSTATEGENERATOR = 32423F7A
... and other gibberish
Then you need to look at each one of these and compare it with what your browser sends. Sometimes you need to get values from other elements of the page to generate a similar POST request. You may also have to REMOVE some of the parameters you get. Again, make your code behave exactly like your browser.
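With Jsoup, the first part of that (collecting the hidden fields of `aspnetForm`) might look like the sketch below. The class and method names are hypothetical, and so is the value passed as `__EVENTTARGET` (for example `ctl00$MainContent$btnNext`): whether the "next" control posts back through `__EVENTTARGET` at all, and under which name, has to be confirmed against what your browser actually sends.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AspxFormSketch {

    // Collects the hidden inputs (__VIEWSTATE, __VIEWSTATEGENERATOR, ...) of
    // <form id="aspnetForm"> and sets the postback target for the "next" control.
    public static Map<String, String> nextPageParams(Document page, String eventTarget) {
        Map<String, String> params = new LinkedHashMap<>();
        for (Element input : page.select("form#aspnetForm input[type=hidden]")) {
            params.put(input.attr("name"), input.val());
        }
        // Assumption: the "next" control is a postback identified via __EVENTTARGET
        params.put("__EVENTTARGET", eventTarget);
        params.put("__EVENTARGUMENT", "");
        return params;
    }
}
```

Only hidden fields are collected here; visible inputs the browser also sends would need to be added the same way. The map could then be posted with something like `Jsoup.connect(url).data(params).cookies(resp.cookies()).post()`.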
After some (frustrating) trial and error you will get it working. The server should return a pipe-delimited result, which you can break down and parse. Something like:
25081|updatePanel|ctl00_ContentPlaceHolder1_pnlgrdSearchResult|
<div>
<div style="font-weight: bold;">
... more stuff
|__EVENTARGUMENT||343908|hiddenField|__VIEWSTATE|/wEPDwU... another very long string ...1Pni|8|hiddenField|__VIEWSTATEGENERATOR|32423F7A| other gibberish
From THAT sort of response you need to generate new POST requests for the subsequent pages, for example:
String viewState = substringBetween(ajaxResponse, "__VIEWSTATE|", "|");
Then:
request.setDataParameter("__VIEWSTATE", viewState);
There will be more data parameters to extract from each response, but a lot depends on the site you are targeting.
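The sample response above appears to follow a layout of repeated `length|type|id|content|` records, where `length` is the number of characters in `content` (for example `8|hiddenField|__VIEWSTATEGENERATOR|32423F7A|`). Reading each record by its length, rather than splitting on every `|`, keeps content that itself contains a pipe intact. A plain-Java sketch under that assumption, collecting the `hiddenField` records so their values can be fed into the next POST; the class name is hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DeltaResponseSketch {

    // Parses records of the form length|type|id|content| and returns id -> content
    // for every record whose type is "hiddenField" (e.g. __VIEWSTATE).
    public static Map<String, String> hiddenFields(String response) {
        Map<String, String> fields = new LinkedHashMap<>();
        int pos = 0;
        while (pos < response.length()) {
            int p1 = response.indexOf('|', pos);
            if (p1 < 0) break;
            int p2 = response.indexOf('|', p1 + 1);
            if (p2 < 0) break;
            int p3 = response.indexOf('|', p2 + 1);
            if (p3 < 0) break;
            int length = Integer.parseInt(response.substring(pos, p1).trim());
            int end = p3 + 1 + length;
            if (end > response.length()) break; // truncated record
            String type = response.substring(p1 + 1, p2);
            String id = response.substring(p2 + 1, p3);
            // Content is taken by length, so pipes inside it do not break parsing
            String content = response.substring(p3 + 1, end);
            if ("hiddenField".equals(type)) {
                fields.put(id, content);
            }
            pos = end + 1; // skip the content and its trailing '|'
        }
        return fields;
    }
}
```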
Hope this helps a little.

Jsoup: remove everything before an H2 tag

I get the HTML source of a website using the Jsoup.connect() method. The following is a piece of that HTML source (link: https://learn.microsoft.com/en-us/visualstudio/install/workload-component-id-vs-community):
.....
<p>When you set dependencies in your VSIX manifest, you must specify Component IDs
only. Use the tables on this page to determine our minimum component dependencies.
In some scenarios, this might mean that you specify only one component from a workload.
In other scenarios, it might mean that you specify multiple components from a single
workload or multiple components from multiple workloads. For more information, see
the
How to: Migrate Extensibility Projects to Visual Studio 2017 page.</p>
.....
<h2 id="visual-studio-core-editor-included-with-visual-studio-community-2017">Visual Studio core editor (included with Visual Studio Community 2017)</h2>
.....
<h2 id="see-also">See also</h2>
.....
What I want to do with Jsoup is remove every piece of HTML before <h2 id="visual-studio-core-editor-included-with-visual-studio-community-2017">Visual Studio core editor (included with Visual Studio Community 2017)</h2>
and everything from <h2 id="see-also">See also</h2> onward (including that heading).
I tried the following, but it didn't work for me:
try {
document = Jsoup.connect(Constants.URL).get();
}
catch (IOException iex) {
iex.printStackTrace();
}
document = Parser.parse(document.toString().replaceAll(".*?Visual Studio 2017 Workload and Component IDs page.</p>", "") , Constants.URL);
document = Parser.parse(document.toString().replaceAll("<h2 id=\"see-also\">See also</h2>?.*", "") , Constants.URL);
return null;
Any help would be appreciated.
A simple way: get the whole HTML of the page as a string, take the substring you need, and parse that substring again with Jsoup.
Document doc = Jsoup.connect("https://learn.microsoft.com/en-us/visualstudio/install/workload-component-id-vs-community").get();
String page = doc.html();
// 8 = length of `<h2 id="`, so each cut starts at an <h2> tag (assumes id is its first attribute)
String html = page.substring(page.indexOf("visual-studio-core-editor-included-with-visual-studio-community-2017") - 8,
page.indexOf("unaffiliated-components") - 8);
Document doc2 = Jsoup.parse(html);
System.out.println(doc2);
I'll just make a small change to Eritrean's answer above; I needed this modification to get the required output:
document = Jsoup.parse(document.html().substring(document.html().indexOf("visual-studio-core-editor-included-with-visual-studio-community-2017")-26,
document.html().indexOf("see-also")-8));
System.out.println(document);
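Instead of cutting the serialized HTML at hard-coded offsets (`-8`, `-26`), the same result can be obtained by walking the DOM: start at the first `<h2>` and copy element siblings until the "See also" heading. This is a sketch that assumes both headings are siblings under the same parent, which is worth checking against the live page; the class and method names are hypothetical.

```java
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SectionExtractSketch {

    // Copies the element with id startId and every following sibling element
    // into a new document, stopping before the sibling whose id is endId.
    public static Document between(Document src, String startId, String endId) {
        Document out = Document.createShell(src.baseUri());
        Element e = src.getElementById(startId);
        while (e != null && !endId.equals(e.id())) {
            out.body().appendChild(e.clone());
            e = e.nextElementSibling();
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        Document page = Jsoup.connect(
                "https://learn.microsoft.com/en-us/visualstudio/install/workload-component-id-vs-community").get();
        System.out.println(between(page,
                "visual-studio-core-editor-included-with-visual-studio-community-2017", "see-also"));
    }
}
```

Because the walk uses element ids rather than character positions, it keeps working if the serializer changes attribute order or whitespace.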

Adding a page with PDFBox doesn't work

I'm trying to add a page to an existing PDF-Document that I'm performing multiple different actions on before and after the page is supposed to be added.
Currently I open the document at the start and write content on its first and second pages. On the second page I add some images as well. The content differs per PDF, and sometimes there is so much of it that two pages (or sometimes even three) aren't enough. Now I'm trying to add a third or even fourth page once a certain amount of text/images has been written on the second page.
Somehow, no matter what I do, the third page I want to add doesn't show up in the final document. Here's my code to add the page:
if(doc.getNumberOfPages() < p+1){
PDDocument emptyDoc = PDDocument.load("./data/EmptyPage.pdf");
PDPage emptyPage = (PDPage)emptyDoc.getDocumentCatalog().getAllPages().get(0);
doc.addPage(emptyPage);
emptyDoc.close();
}
When I check doc.getNumberOfPages() before, it says 2. Afterwards it says 3. The final document still just has 2 pages. The code after the if-clause contains multiple contentStreams that are supposed to write text on the new page (and on the first and second page).
contentStream = new PDPageContentStream(doc, (PDPage) allPages.get((int)p), true, true);
In the end, I save the document via
doc.save(tarFolder+nr.get(i)+".pdf");
doc.close();
I've created a whole new project with a class that's supposed to do exactly the same thing: add a page to another PDF. This code works perfectly fine and the third page shows up, so it seems like I'm just missing something. My code works fine for pages 1 and 2; only lately have we had cases where a third or fourth page is needed, so I want to integrate that into my main project.
Here's the new project that works:
PDDocument doc = PDDocument.load("D:\\test.pdf");
PDDocument doc2 = PDDocument.load("D:\\EmptyPage.pdf");
List<PDPage> allPages = doc2.getDocumentCatalog().getAllPages();
PDPage page = (PDPage) allPages.get(pageNumber);
doc.addPage(page);
doc.save("D:\\testoutput.pdf");
What's weird in my main project is that the third page I add is counted by getNumberOfPages() but doesn't show up in the final document. The program throws an error if I don't add the page, because it tries to write content on the third page.
Any idea what I'm doing wrong?
Thanks in advance!
Edit:
If I add the page at the beginning, when my document is loaded the first time, the page gets added and exists in my final document - like this:
doc = PDDocument.load(config.getFolder("template"));
PDDocument emptyDoc = PDDocument.load("./data/EmptyPage.pdf");
PDPage emptyPage = (PDPage)emptyDoc.getDocumentCatalog().getAllPages().get(0);
doc.addPage(emptyPage);
However, since some documents don't need that extra page, it gets unnecessarily complicated - and I feel like removing the page if it isn't needed isn't really pretty, since I'd like to avoid adding it in the first place. Maybe somebody has an idea now?
I found an answer thanks to Tilman Hausherr.
If I move the
emptyDoc.close()
to the end of my code, right after:
doc.save(tarFolder+nr.get(i)+".pdf");
doc.close();
the page shows up in the final document without any issues.
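Another way to avoid the problem is not to borrow the page from a second document at all: a blank `PDPage` created in the target document has no ties to a `PDDocument` that might be closed before `save()`. This is a swapped-in alternative, not the fix found above. It is a sketch against the PDFBox 2.x API (the question's code uses the older 1.8 API, e.g. `getAllPages()`), copying the first page's media box so the new pages have the same size; the class name and output file name are hypothetical.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

public class AddPageSketch {

    // Appends blank pages, sized like the first page, until doc has `required` pages.
    // The pages are created in doc itself, so no second PDDocument has to stay open.
    public static void ensurePageCount(PDDocument doc, int required) {
        PDRectangle size = PDRectangle.A4;
        if (doc.getNumberOfPages() > 0) {
            PDRectangle box = doc.getPage(0).getMediaBox();
            size = new PDRectangle(box.getLowerLeftX(), box.getLowerLeftY(), box.getWidth(), box.getHeight());
        }
        while (doc.getNumberOfPages() < required) {
            doc.addPage(new PDPage(size));
        }
    }

    public static void main(String[] args) throws IOException {
        try (PDDocument doc = new PDDocument()) {
            doc.addPage(new PDPage(PDRectangle.A4));
            ensurePageCount(doc, 3);
            // Write on the newly added third page (index 2)
            try (PDPageContentStream cs = new PDPageContentStream(
                    doc, doc.getPage(2), PDPageContentStream.AppendMode.APPEND, true)) {
                cs.beginText();
                cs.setFont(PDType1Font.HELVETICA, 12);
                cs.newLineAtOffset(50, 700);
                cs.showText("Text on the added page");
                cs.endText();
            }
            doc.save(new File("with-extra-page.pdf")); // hypothetical output file
        }
    }
}
```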
