Html search validation android - java

I have an Android app with a search feature. It loops through locally stored HTML files and wraps every word that equals the entered word in a span with a background color, the same as if you press Ctrl+F on your desktop. The problem I am having is that if the user searches for head, body, div, span, etc., it also adds a span inside the HTML tags themselves. My question: is there an Android validation library that deals with this issue, or do I need to make my own blacklist? I am aware of Android form validation libraries, but I am not sure they are built for what I am looking for.

I've used jsoup before to strip out unwanted HTML tags. You could do this in order to make the HTML data more "searchable". Also look at Android's Html.escapeHtml(CharSequence), which returns an HTML-escaped String for the given text.
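As a rough illustration of keeping the highlight out of the tags, here is a minimal library-free sketch. It is not the jsoup approach mentioned above: it simply splits the markup on tags with a regular expression and highlights only the text in between. The class name TextOnlyHighlighter and the yellow background are made up for the example, and it assumes that '<' and '>' only ever appear as tag delimiters (no '>' inside attribute values, comments or scripts). A real HTML parser such as jsoup would be sturdier.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: highlights a word in the text between tags only.
final class TextOnlyHighlighter {
    // Matches one tag; assumes '<' and '>' are used only as tag delimiters.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    static String highlight(String html, String word) {
        // Quote the word so user input is never treated as a regex.
        Pattern wordPattern = Pattern.compile(Pattern.quote(word), Pattern.CASE_INSENSITIVE);
        StringBuilder out = new StringBuilder();
        Matcher tags = TAG.matcher(html);
        int pos = 0;
        while (tags.find()) {
            // Text before the tag: highlight matches here.
            out.append(mark(html.substring(pos, tags.start()), wordPattern));
            // The tag itself: copied through untouched.
            out.append(tags.group());
            pos = tags.end();
        }
        // Trailing text after the last tag.
        out.append(mark(html.substring(pos), wordPattern));
        return out.toString();
    }

    private static String mark(String text, Pattern wordPattern) {
        // $0 re-inserts the matched text, preserving its original case.
        return wordPattern.matcher(text)
                .replaceAll("<span style=\"background-color:yellow\">$0</span>");
    }
}
```

With this, searching for body in <body><span>body text</span></body> wraps only the word inside the text, and the <body> and </body> tags are left alone.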

Related

Split PDF Into Separate Files By Child Bookmarks

I am trying to split a PDF file (a book) into multiple files by child bookmarks, in code.
Use case: the table of contents of a book is available to the user. The user can select up to n sections (not necessarily sequential) to preview. The application needs to extract these sections and merge them into a single preview PDF.
While looking for a solution online I found a few tools: Aspose, Spire (E-IceBlue), etc.
All of them can split a PDF by pages (top-level bookmarks), but I need to split the PDF by child bookmarks. That means the area to extract can start and/or end in the middle of a page.
Ideally I would be able to do this in Java code, but if someone knows a solution in any other programming language or a CLI program, that would also be great.
It depends whether you insist that the non-chosen content on a page be redacted or not. For example, if section 6.3.2 takes up the middle half of a page, do you care if the end of 6.3.1 and the beginning of 6.3.3 are shown in the output on the same page?
If you don't care, cpdf can do this easily. Just output the bookmark data as JSON:
cpdf -list-bookmarks-json -utf8 in.pdf > marks.json
Then you can parse this JSON to show the list of bookmarks, and choose which pages to extract based on child bookmark page numbers.
As for redaction, you could use -add-rectangle or -hard-box to clean up the output based on the coordinates from the JSON bookmarks file, but that's not real redaction -- it just removes the content from view.
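The page-selection step can be sketched in Java roughly as follows. This is only an illustration under some assumptions: the bookmarks are taken to be already parsed from marks.json into simple objects holding a level, a title and a page number (the Bookmark and SectionRanges classes are made up for the example), and the total page count is known. A section is taken to run from its own page to the page of the next bookmark at the same or a shallower level, inclusive, since the section may end partway down that page.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical holder for one bookmark entry (level, title, page number).
final class Bookmark {
    final int level;
    final String text;
    final int page;

    Bookmark(int level, String text, int page) {
        this.level = level;
        this.text = text;
        this.page = page;
    }
}

final class SectionRanges {
    // Page range for the bookmark at the given index, e.g. "43-47".
    static String rangeFor(List<Bookmark> marks, int index, int lastPage) {
        Bookmark chosen = marks.get(index);
        int end = lastPage; // default: section runs to the end of the book
        for (int i = index + 1; i < marks.size(); i++) {
            if (marks.get(i).level <= chosen.level) {
                // Inclusive: the chosen section may end partway down this page.
                end = marks.get(i).page;
                break;
            }
        }
        return end == chosen.page ? String.valueOf(end) : chosen.page + "-" + end;
    }

    // Builds a cpdf call of the form: cpdf in.pdf <ranges> -o out.pdf
    static List<String> extractCommand(List<Bookmark> marks, List<Integer> chosen,
                                       int lastPage, String in, String out) {
        StringBuilder ranges = new StringBuilder();
        for (int index : chosen) {
            if (ranges.length() > 0) {
                ranges.append(',');
            }
            ranges.append(rangeFor(marks, index, lastPage));
        }
        return Arrays.asList("cpdf", in, ranges.toString(), "-o", out);
    }
}
```

The resulting list can be handed to ProcessBuilder. Pages shared with neighbouring sections are included whole, which matches the "don't care" case described above.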

Is there a way to set page orientation to landscape when a specific element is found in OpenHtmlToPDF (or another HTML to PDF converter)?

The project I am currently working on takes data from a WYSIWYG editor and converts all the input to a PDF document. The problem is that sometimes it is necessary to add wider tables, and these come out truncated.
To solve this problem, I added to the editor (specifically, CKEditor) a HR button but I renamed it to "Change page orientation", so users can click that before inserting a table. In Java, I used iText 7 to detect this element (<hr>) and change the page orientation. This works like a charm.
Example using iText with a simple table
Now the requirements have changed, and for licensing reasons we need to replace iText with another HTML to PDF converter while keeping this functionality.
I found OpenHTMLToPdf and I like it, but I haven't found a way to replicate this page orientation change when an hr (or another specific element) is found.
How can I solve that? I can use whatever library as long as they are open source.

add custom title after page break

I am using the docx4j library with templating to generate a report from my application.
I have the following requirement:
When a page break falls in the middle of a paragraph's content, I need to add a custom title before the content on the next page starts, as you can see in the figure.
I know that if we need to repeat the same title, we can achieve it by using a table with a repeating header row. But then the title will always be the same. Here I need a custom title.
The paragraph is populated from the backend, so how do we figure out at code level where the page breaks happen?
Thanks in advance.
This has to do with the Word Object Model. Word really does not have the concept of pages in the underlying structure of a document. See "Word Doesn't Know What a Page Is" by Daiya Mitchell, MVP.
Because of that, this would be better posted in the Word Answers forum hosted by Microsoft.
There are ways to deal with this using headers (not table headers necessarily although they can be used) or using a shape anchored to the table to occlude the word "continued" in the original header.
When you say templating, are you talking about Word templates (term of art) or something else?

how to web scrape autocompleting textfield

I am trying to fill in a website form (compareraja.in) to search and compare mobile phones using Java. I am currently using the jaunt library, but I can't work out how to fill an autocompleting text field. What I want is to select a particular item from the autocomplete list after typing some initial letters into the text field. Is it possible with jsoup, jaunt, htmlunit or any other library?
If yes, which is the best and easiest choice? Also, how can it be done?
My suggestion is that first of all you retrieve the whole autocompletion list that appears after you've typed some letters. If you open the web page in e.g. Chrome and go to Developer Tools (F12), Network tab, you will see that each time you type a letter in the text field, a corresponding XMLHttpRequest (XHR) is logged in the list.
For example, I've typed "htc ":
On the Network tab, the Headers section of the last XHR contains all the necessary query parameters:
And the Response section shows the received data, which is exactly what is shown in the autocompletion list:
So you can just make a GET request to the URL http://www.compareraja.in/autocompletedata.ashx?q=htc+&c=mobiles&limit=150 (you can even click this link or paste it into the browser's address bar to test), where your URL-encoded initial letters should be placed instead of htc+. It works fine without the timestamp parameter for me.
After that it's easy to parse the response by splitting the text on \n and ; characters, and to fill the text field with the selected item.
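A minimal Java sketch of the request and parsing steps described above might look like this. The class name AutocompleteClient is made up, and the response layout is an assumption based on the description (one suggestion per line, with ;-separated fields on each line); the meaning of the individual fields is not known here, so check the real response in the Network tab first. It uses java.net.http.HttpClient, which needs Java 11 or later.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical client for the autocomplete endpoint seen in the Network tab.
final class AutocompleteClient {
    // URL-encodes the typed prefix ("htc " becomes "htc+") and builds the URL.
    static String buildUrl(String prefix) {
        return "http://www.compareraja.in/autocompletedata.ashx?q="
                + URLEncoder.encode(prefix, StandardCharsets.UTF_8)
                + "&c=mobiles&limit=150";
    }

    // Performs the GET request and returns the raw response body.
    static String fetch(String prefix) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(buildUrl(prefix))).GET().build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    // Assumed layout: one suggestion per line, fields separated by ';'.
    static List<String[]> parse(String body) {
        List<String[]> rows = new ArrayList<>();
        for (String line : body.split("\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                rows.add(trimmed.split(";"));
            }
        }
        return rows;
    }
}
```

Calling AutocompleteClient.parse(AutocompleteClient.fetch("htc ")) would then give one String[] per suggestion, from which the wanted item can be picked and submitted with whichever form library is in use.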

SWT Browser focus on next and previous highlight text

I am developing a small application with SWT Browser widget. I am highlighting a search text word with
<a id="xyz" href=''><mark>test</mark></a>
in an HTML document, and I replace all occurrences of the search word in the HTML text this way, so that every occurrence is highlighted:
htmltext = htmltext.replaceAll("(?i)" + Pattern.quote(searchword), "<a id='xyz' href=''><mark>$0</mark></a>");
I want to implement functionality so that if I click a Next button, the next highlighted word gets focus, and if I click a Previous button, the previous highlighted word gets focus. How can I accomplish Next and Previous hit navigation using JavaScript in an Eclipse RCP application?
This is best solved by combining JavaScript with Java code. It depends on what kind of HTML content you are going to handle: whether it's stateful (e.g. cannot be reloaded), dynamic with a lot of JS code, or plain static. In most cases, the best solution puts most of the logic in JS, with just minimal Java code to bind JS actions to the SWT GUI.
There are several things you need to implement:
keyword searching
toggling highlighting
toggling highlight from one word to another
1. Search: note that you won't be able to find words that span several HTML elements, like W<span>o</span>rd. If that's OK, then you can just search and replace from Java as you do now. I'd go for individually tagging each word match with an id, such as <span id="match1">, and remembering how many matches were found in total.
You could likely do such search on JS side as well by adding a function that iterates through DOM and searches for specific text and wraps it with another DOM object.
2. Toggling highlighting: It's best done in JavaScript. Append to your HTML a JS code fragment that toggles DOM element style. Something like:
function highlight(id) {
    document.getElementById(id).className = 'highlighted';
}
You'll be able to call this JS from SWT by invoking swtBrowser.execute("highlight('match1')").
You should also implement a function that removes the highlighting.
3. Toggling highlighting between elements:
This can be done on either the Java side or the JS side. I would probably go with JS and add two more functions, highlightNext() and highlightPrev(), that would just call the highlight() function with the proper ids.
Then in Java you could make SWT buttons that call the JS functions through swtBrowser.execute().
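For the Java-side variant of steps 1 and 3, a minimal sketch could look like the following. The MatchTagger and HitNavigator classes are made up for the example. It tags each match with its own id (match1, match2, ...) and keeps track of which match is current, returning the script string that would be passed to swtBrowser.execute(). Like the replaceAll call in the question, it does not skip text inside tags, and it assumes a highlight(id) function such as the one above is present in the page.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: wraps each match in <span id="matchN"> and counts matches.
final class MatchTagger {
    final String html;
    final int count;

    private MatchTagger(String html, int count) {
        this.html = html;
        this.count = count;
    }

    static MatchTagger tag(String htmlText, String searchWord) {
        Pattern p = Pattern.compile(Pattern.quote(searchWord), Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(htmlText);
        StringBuffer out = new StringBuffer();
        int n = 0;
        while (m.find()) {
            n++;
            String replacement = "<span id=\"match" + n + "\">" + m.group() + "</span>";
            // quoteReplacement keeps '$' and '\' in the matched text literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return new MatchTagger(out.toString(), n);
    }
}

// Hypothetical helper: tracks the current match and builds the JS call.
final class HitNavigator {
    private final int count;
    private int current = 0; // 0 means no match selected yet

    HitNavigator(int count) {
        this.count = count;
    }

    // Script for the next match, wrapping around; null if there are no matches.
    String next() {
        if (count == 0) {
            return null;
        }
        current = current % count + 1;
        return script();
    }

    // Script for the previous match, wrapping around; null if there are no matches.
    String previous() {
        if (count == 0) {
            return null;
        }
        current = current <= 1 ? count : current - 1;
        return script();
    }

    private String script() {
        return "highlight('match" + current + "')";
    }
}
```

A Next button's listener would then do something like swtBrowser.execute(navigator.next()) after a null check, and the Previous button the same with previous().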
