How to format dates in Jahia 6 CMS? - java

I am helping a friend of mine put up a site for his business. I’ve read different posts and sites trying to find the ideal CMS tool, but people have different views of what is the best, so I finally just picked one of them at random.
So I went for an evaluation of Jahia 6.0-CE. As you’ve probably guessed by now, I don’t have so much experience with CMS tools. I just want to setup the CMS, write the templates for the site and let my friend manage the content from there on.
So I extracted the sources from SVN and went for a test drive. I managed to create some simple templates to get a hang of things but now I have an issue with a date format.
In my definitions.cnd I declared the field like so:
date myDateField (datetimepicker[format='dd.MM.yyyy'])
This is formatted in the page and the selector also presents this in the dd.MM.yyyy format when inserting the content. But how about sites in other countries, countries that represent the date as MM.dd.yyyy for example?
If I specify the format in the CND, hard coded, how can I change this later on so that it adapts based on the browser’s language? Do I extract the content from the repository and format it by hand in the JSP template based on a Locale, or is there a better way?
Thank you.

Indeed, when a format is specified in the CND, you have to formated in this jsp "by hand". You can use jst and fmt tags, so you will have in your template (jsp file) someting like that:
<fmt:formatDate pattern="yyyy-MM-dd" value="${myDate}"/>
You can set the pattern to the current local ones.
Regards.

To get fast answers about jahia templating, or else, you can go on the dedicated website Jahia.org

FYI you can create an account into the Jahia forum in less than one minute at https://sso.jahia.com/cas/casRegister.jsp

Related

Alternative to Markdown with Color support

I am writing on a Note App (Android and REST API built with PHP/Slim 3). I am wondering if there is something else than Markdown to save notes to a readable and interchangeable format. The problem with Markdown for me is that there is no solution to style texts (e.g. colored text). It is also hard to extend Markdown with custom attributes.
I am already thinking of creating an own data format (or using XML). But this means a lot of work for parsing it. I like the idea of using a standard format to interchange it between client/server and between other applications. But the featureset of Markdown is very limited (by design for sure).
Do you have any tips on this topic?
This question verges on overly-broad, i.e. it may lead to an argument over technologies rather than a "this is the solution" situation.
That being said, here's an answer I think won't be controversial: when you say
"readable, interchangeable format... solution to style texts... custom attributes"
I think HTML. I don't recommend trying to roll-your-own format, because 1.) you are correct that it will be difficult and 2.) it will be even more difficult to match the feature sets of existing solutions
To sum it up: I like the idea of using HTML instead of Markdown. It is an open standard format and exchangable as well as human-readable.
The problem I see with all of these solutions: How to write a WYSIWYG-Editor with this in mind? I am already working with Markdown using the Markwon library: https://github.com/noties/Markwon
It is no problem to write Markdown in an Android EditText widget and render it. You can easily convert it back to plaintext (you can save it). It is much more complicated to get a WYSIWYG experience. You have to deal with every User input, writing in a second file or string which contains the Markup while the user just sees the rendered result. The user can edit/delete anything anywhere in the EditText and you have to take care that those changes will affect the Markdown String/File too. I didn't find an easy solution for this.
The easiest way would be to somehow parse the content of the EditText back to Markdown. But there is no getSpannables-method or alike for the EditText widget. I am thinking of looping through the EditText and see what character is there and how it's formatted. But I think this will have disadvantages too, because there are other things like bulleted lists and checkboxes..

updating Notes document(s) via java

I am re-writing a Domino application with XPages. I have setup a basic CRUD implementation with help of Java classes. I am now at the point that I am creating/editing documents.
Since I am not so familiar in this area my code for now only worked with formats like text and date.
Where can I find examples how to work with other formats like multiple value fields, rich text, attachments, names, authors?
I assume I cannot cover every type of field via getItemValue(String) and replaceItemValue, or can I?
If you want so save yourself a lot of headaches, deploy the OpenNTF Domino API (ODA). It takes care of recycling, provides proper Java collections, allows for easy extraction of MIME and JSON.
There's an intro on openntf.org and you find some YouTube videos on it. Or head to Paul's for more info: http://www.intec.co.uk/ibm-connect-2017/
I tend to use Views wherever I can as I believe it is quicker than getting the document. It can be a little unwieldy though if you have lots of columns.
So using dates you need to convert from Notes DateTime to a Java Date.
Getting
account.setDateExpiry1(((DateTime)entry.getColumnValues().get(17)).toJavaDate())
or
account.setDateExpiry1(((DateTime)entry.getColumnValue("Column Name")).toJavaDate());
If I get the document I would use something like this.
item = doc.getFirstItem("DateApproved");
account.setDateExpiry1(((DateTime)item.getDateTimeValue()).toJavaDate());
or
account.setDateExpiry1(((DateTime)doc.getItemValueDateTimeArray("DateApproved").get(0)).toJavaDate());
Setting
With Dates you have to create a Notes DateTime object.
So something like
Date tmpDate =(Date)account.getDateCompleted();
doc.replaceItemValue("dateCompleted", (DateTime)Session.createDateTime((Date) tmpDate));
Similar concepts apply to Name Fields etc, however, there does not appear to be an easy way or direct 'java' replacement for the XPages upload and download attachment controls. You need a solid understanding of the more advance techniques in Java on this. I have struggled but I do need to revisit it. There are some examples on this forum. The same applies to Rich Text, my understanding is you need to become fully conversant with MIME - which I am not.

Is there a Java library which will convert Olsen timezone ids to Windows timezone ids

I have a legacy Windows application which reads data from a database. One of the columns is 'TimeZoneInfoId'. Which in the legacy world was written by another windows application so it stores the Windows string:
TimeZoneInfo.CurrentTimeZone.StandardName
I now need to write to this table from a Java application. So I'm trying to find a library that will map a time zone ID from the tz database (formerly known as the Olson database) to the windows timezone id. Ideally I'd like to use a library that in theory I could update to later releases in the future as I've read that timezone info can sometimes change.
I've searched a bit online already, and the answers I've found generally say either write your own mapping/lookup or use NodaTime and do the conversion in .NET (if you really need a library that can be updated).
I can't update the legacy code (wihtout a complete re-write) so just asking the question here since most of the answers I've found are a little old so maybe there is something new that I can avail of ;)
If not it will have to be a custom lookup function I hfea.
Well, assuming you meant the value came from TimeZoneInfo.Local.Id, then you can do this conversion. If it came from TimeZone.CurrentTimeZone.StandardName, then there will be some entries that fail because StandardName is a localized string, and the English form doesn't necessarily match the Id
Getting to your question, you could just look at the source XML data in CLDR, and parse it similar to how I described here.
Or, if you really want a library, consider that ICU4J has the methods getWindowsId and getIDForWindowsID that uses the same data to do the conversion. However, ICU can be really big, so if you only need it for this one thing then you might be better off going to the XML directly.

Is it possible to do this type of search in Java

I am stuck on a project at work that I do not think is really possible and I am wondering if someone can confirm my belief that it isn't possible or at least give me new options to look at.
We are doing a project for a client that involved a mass download of files from a server (easily did with ftp4j and document name list), but now we need to sort through the data from the server. The client is doing work in Contracts and wants us to pull out relevant information such as: Licensor, Licensee, Product, Agreement date, termination date, royalties, restrictions.
Since the documents are completely unstandardized, is that even possible to do? I can imagine loading in the files and searching it but I would have no idea how to pull out information from a paragraph such as the licensor and restrictions on the agreement. These are not hashes but instead are just long contracts. Even if I were to search for 'Licensor' it will come up in the document multiple times. The documents aren't even in a consistent file format. Some are PDF, some are text, some are html, and I've even seen some that were as bad as being a scanned image in a pdf.
My boss keeps pushing for me to work on this project but I feel as if I am out of options. I primarily do web and mobile so big data is really not my strong area. Does this sound possible to do in a reasonable amount of time? (We're talking about at the very minimum 1000 documents). I have been working on this in Java.
I'll do my best to give you some information, as this is not my area of expertise. I would highly consider writing a script that identifies the type of file you are dealing with, and then calls the appropriate parsing methods to handle what you are looking for.
Since you are dealing with big data, python could be pretty useful. Javascript would be my next choice.
If your overall code is written in Java, it should be very portable and flexible no matter which one you choose. Using a regex or a specific string search would be a good way to approach this;
If you are concerned only with Licensor followed by a name, you could identify the format of that particular instance and search for something similar using the regex you create. This can be extrapolated to other instances of searching.
For getting text from an image, try using the API's on this page:
How to read images using Java API?
Scanned Image to Readable Text
For text from a PDF:
https://www.idrsolutions.com/how-to-search-a-pdf-file-for-text/
Also, PDF is just text, so you should be able to search through it using a regex most likely. That would be my method of attack, or possibly using string.split() and make a string buffer that you can append to.
For text from HTML doc:
Here is a cool HTML parser library: http://jericho.htmlparser.net/docs/index.html
A resource that teaches how to remove HTML tags and get the good stuff: http://www.rgagnon.com/javadetails/java-0424.html
If you need anything else, let me know. I'll do my best to find it!
Apache tika can extract plain text from almost any commonly used file format.
But with the situation you describe, you would still need to analyze the text as in "natural language recognition". Thats a field where; despite some advances have been made (by dedicated research teams, spending many person years!); computers still fail pretty bad (heck even humans fail at it, sometimes).
With the number of documents you mentioned (1000's), hire a temp worker and have them sorted/tagged by human brain power. It will be cheaper and you will have less misclassifications.
You can use tika for text extraction. If there is a fixed pattern, you can extract information using regex or xpath queries. Other solution is to use Solr as shown in this video.You don't need solr but watch the video to get idea.

Extracting webpage information based on a template in Java

Right now I use Jsoup to extract certain information (not all the text) from some third party webpages, I do it periodically. This works fine until the HTML of certain webpage changes, this change leads to a change in the existing Java code, this is a tedious task, because these webpage change very frequently. Also it requires a programmer to fix the Java code. Here is an example of HTML code of my interest on a webpage:
<div>
<p><strong>Score:</strong>2.5/5</p>
<p><strong>Director:</strong> Bryan Singer</p>
</div>
<div>some other info which I dont need</div>
Now here is what I want to do, I want to save this webpage (an HTML file) locally and create a template out of it, like:
<div>
<p><strong>Score:</strong>{MOVIE_RATING}</p>
<p><strong>Director:</strong>{MOVIE_DIRECTOR}</p>
</div>
<div>some other info which I dont need</div>
Along with the actual URLs of the webpages these HTML templates will be the input to the Java program which will find out the location of these predefined keywords (e.g. {MOVIE_RATING}, {MOVIE_DIRECTOR}) and extract the values from the actual webpages.
This way I wouldn't have to modify the Java program every time a webpage changes, I will just save the webpage's HTML and replace the data with these keywords and rest will be taken care by the program. For example in future the actual HTML code may look like this:
<div>
<div><b>Rating:</b>**1/2</div>
<div><i>Director:</i>Singer, Bryan</div>
</div>
and the corresponding template will look like this:
<div>
<div><b>Rating:</b>{MOVIE_RATING}</div>
<div><i>Director:</i>{MOVIE_DIRECTOR}</div>
</div>
Also creating these kind of templates can be done by a non-programmer, anyone who can edit a file.
Now the question is, how can I achieve this in Java and is there any existing and better approach to this problem?
Note: While googling I found some research papers, but most of them require some prior learning data and accuracy is also a matter of concern.
The approach you gave is pretty much similar to the Gilbert's except
the regex part. I don't want to step into the ugly regex world, I am
planning to use template approach for many other areas apart from
movie info e.g. prices, product specs extraction etc.
The template you describe is not actually a "template" in the normal sense of the word: a set static content that is dumped to the output with a bunch of dynamic content inserted within it. Instead, it is the "reverse" of a template - it is a parsing pattern that is slurped up & discarded, leaving the desired parameters to be found.
Because your web pages change regularly, you don't want to hard-code the content to be parsed too precisely, but want to "zoom in" on its' essential features, making the minimum of assumptions. i.e. you want to commit to literally matching key text such as "Rating:" and treat interleaving markup such as"<b/>" in a much more flexible manner - ignoring it and allowing it to change without breaking.
When you combine (1) and (2), you can give the result any name you like, but IT IS parsing using regular expressions. i.e. the template approach IS the parsing approach using a regular expression - they are one and the same. The question is: what form should the regular expression take?
3A. If you use java hand-coding to do the parsing then the obvious answer is that the regular expression format should just be the java.util.regex format. Anything else is a development burden and is "non-standard" and will be hard to maintain.
3B. If you use want to use an html-aware parser, then jsoup is a good solution. Problem is you need more text/regular expression handling and flexibility than jsoup seems to provide. It seems too locked into specific html tags and structures and so breaks when pages change.
3C. You can use a much more powerful grammar-controlled general text parser such as ANTLR - a form of backus-naur inspired grammar is used to control the parsing and generator code is inserted to process parsed data. Here, the parsing grammar expressions can be very powerful indeed with complex rules for how text is ordered on the page and how text fields and values relate to each other. The power is beyond your requirements because you are not processing a language. And there's no escaping the fact that you still need to describe the ugly bits to skip - such as markup tags etc. And wrestling with ANTLR for the first time involves educational investment before you get productivity payback.
3D. Is there a java tool that just uses a simple template type approach to give a simple answer? Well a google search doesn't give too much hope https://www.google.com/search?q=java+template+based+parser&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-GB:official&client=firefox-a. I believe that any attempt to create such a beast will degenerate into either basic regex parsing or more advanced grammar-controlled parsing because the basic requirements for matching/ignoring/replacing text drive the solution in those directions. Anything else would be too simple to actually work. Sorry for the negative view - it just reflects the problem space.
My vote is for (3A) as the simplest, most powerful and flexible solution to your needs.
Not really a template-based approach here, but jsoup can still be a workable solution if you just externalize your Selector queries to a configuration file.
Your non-programmer doesn't even have to see HTML, just update the selectors in the configuration file. Something like SelectorGadget will make it easier to pick out what selector to actually use.
How can I achieve this in Java and is there any existing and better approach to this problem?
The template approach is a good approach. You gave all of the reasons why in your question.
Your templates would consist of just the HTML you want to process, and nothing else. Here's my example based on your example.
<div>
<p><strong>Score:</strong>{MOVIE_RATING}</p>
<p><strong>Director:</strong>{MOVIE_DIRECTOR}</p>
</div>
Basically, you would use Jsoup to process your templates. Then, as you use Jsoup to process the web pages, you check all of your processed templates to see if there's a match.
On a template match, you find the keywords in the processed template, then you find the corresponding values in the processed web page.
Yes, this would be a lot of coding, and more difficult than my description indicates. Your Java programmer will have to break this description down into simpler and simpler tasks until she or he can code the tasks.
If the web page changes frequently, then you'll probably want to confine your search for the fields like MOVIE_RATING to the smallest possible part of the page, and ignore everything else. There are two possibilities: you could either use a regular expression for each field, or you could use some kind of CSS selector. I think either would work and either "template" can consist of a simple list of search expressions, regex or css, that you would apply. Just roll through the list and extract what you can, and fail if some particular field isn't found because the page changed.
For example, the regex could look like this:
"Score:"(.)*[0-9]\.[0-9]\/[0-9]
(I haven't tested this.)
Or you can try different approach, using what i would call 'rules' instead of templates: for each piece of information that you need from the page, you can define jQuery expression(s) that extracts the text. Often when page change is small, the same well written jQuery expressions would still give the same results.
Then you can use Jerry (jQuery in Java), with the almost the same expressions to fetch the text you are looking for. So its not only about selectors, but you also have other jQuery methods for walking/filtering the DOM tree.
For example, rule for some Director text would be (in sort of sudo-java-jerry-code):
$.find("div#movie").find("div:nth-child(2)")....text();
There could be more (and more complex) expressions in the rule, spread across several lines, that for example iterate some nodes etc.
If you are OO person, each rule may be defined in its own implementation. If you are groovy person, you can even rewrite rules when needed, without recompiling your project, and still being in java. Etc.
As you see, the core idea here is to define rules how to find your text; and not to match to patterns as that may be fragile to minor changes - imagine if just a space has been added between two divs:). In this example of mine, I've used jQuery-alike syntax (actually, it's Jerry-alike syntax, since we are in Java) to define rules. This is only because jQuery is popular and simple, and known by your web developer too; at the end you can define your own syntax (depending on parsing tool you are using): for example, you may parse HTML into DOM tree and then write rules using your helper methods how to traverse it to the place of interest. Jerry also gives you access to underlaying DOM tree, too.
Hope this helps.
I used the following approach to do something similar in a personal project of mine that generates a RSS feed out of here the leading real estate website in spain.
Using this tool I found the rented place I'm currently living in ;-)
Get the HTML code from the page
Transform the HTML into XHTML. I used this this library I guess there might be today better options available
Use XPath to navigate the XHTML to the information you're interesting in
Of course every time they change the original page you will have to change the XPath expression. The other approach I can think of -semantic analysis of the original HTML source- is far, far beyond my humble skills ;-)

Categories