How to get rid of files with names bin$ - java

Using the jOOQ generator via the Gradle plugin, I now get, alongside the POJOs and tables, not only classes with normal names but also heaps of files whose names start with bin$.
They are not necessary: only yesterday the generator did not create these files, and everything works fine with or without them. But I don't want the project littered with dozens of superfluous files.

Since version 10, Oracle puts dropped tables into the recycle bin. Their names start with BIN$. So jOOQ simply generates classes for the dropped tables. This can be prevented in two ways: stop using the recycle bin in Oracle, or filter the tables for which the jOOQ generator creates classes.
ALTER SYSTEM SET RECYCLEBIN = OFF DEFERRED;
purge dba_recyclebin;
Or change the generator settings (the example is for Gradle):
generator{
...
database {
...
excludes = '(?i:BIN\\$.*)'
Edit: After several attempts (by Lukas) and checks (by me), Lukas finally found the correct form for excludes. IMHO, that form has only one explanation: jOOQ doesn't work with regexes correctly, because Groovy does not parse strings in single quotes.

jOOQ's <excludes/> setting is a Java regular expression. You have to properly form it like this:
excludes = '(?i:BIN\\$.*)'
Explanation:
Use (?i:...) for case-insensitivity. Just in case. Pun intended.
Use \\ before the $ sign because the $ means "end of line" in regular expressions. You want to escape that. And because Groovy/Gradle parses (as in "looks for escape sequences in") your string, you need to escape the backslash too, for it to reach the Java Pattern.compile() call.
Use .* to indicate that after the $, you want to match any number of characters. . = any character and * = any number of repetitions
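To see how the expression behaves once it reaches Pattern.compile(), here is a minimal plain-Java sketch. It does not reproduce jOOQ's own matching logic; the class name, the helper and the table names are made up for illustration, and whole-name matching via matches() is an assumption of the sketch.

```java
import java.util.regex.Pattern;

class RecycleBinExcludeDemo {
    // After the string literal is parsed, the regex engine sees: (?i:BIN\$.*)
    static final Pattern EXCLUDES = Pattern.compile("(?i:BIN\\$.*)");

    // Hypothetical helper: true if the whole table name matches the excludes pattern.
    static boolean isExcluded(String tableName) {
        return EXCLUDES.matcher(tableName).matches();
    }

    public static void main(String[] args) {
        System.out.println(isExcluded("BIN$abc123==$0")); // true: recycle-bin style name
        System.out.println(isExcluded("bin$xyz"));        // true: (?i:...) ignores case
        System.out.println(isExcluded("BINARY_DATA"));    // false: no literal $ after BIN
        System.out.println(isExcluded("CUSTOMER"));       // false
    }
}
```

A Java string literal needs the same doubled backslash as the Groovy string, which is why the pattern text looks identical in both.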

Related

Remove Attributes by Name. Filter broken?

There is an attribute filter which should remove every attribute whose name matches a specified regular expression from a set of Instances.
I have problems with the regex.
I tried several simple expressions, all of which are valid (tested on regexr).
But the filter does not seem to accept them.
The relevant code follows.
Instances dataset1_x=new Instances(dataset1);
RemoveByName filterX=new RemoveByName();
filterX.setInputFormat(dataset1_x);
filterX.setInvertSelection(true);
filterX.setExpression(Pattern.quote("^.*i$"));
//filterX.setExpression("^.*i$"); also doesn't work
Instances dataset1_=Filter.useFilter(dataset1_x,filterX);
This should match all names ending with an "i".
The resulting dataset is named
"dataset-weka.filters.unsupervised.attribute.StringToNominal-Rlast-weka.filters.unsupervised.attribute.Remove-weka.filters.unsupervised.attribute.RemoveByName-E^.*id$"
Note that ^.*id$ is the default expression. It has not changed.
Yet filterX.getExpression() returns the regex that was set before.
This usage of the filter also matches several code examples.
The same happens if I set the regex using Filter.setOptions().
This is an issue in version 3.9.0 dev and also in 3.8 stable.
Using the WEKA GUI, the filter works correctly.
So another guess is that, when set programmatically, the regex must have a special format. Unfortunately, the API does not provide examples.
You need to set the expression and the InvertSelection flag before setting the input format.
More generally, I assume that you have to set all options before setting the input format.
The following works.
Instances dataset1_x=new Instances(dataset1);
RemoveByName filterX=new RemoveByName();
filterX.setInvertSelection(true);
filterX.setExpression(Pattern.quote("^.*i$"));
filterX.setInputFormat(dataset1_x);
Instances dataset1_=Filter.useFilter(dataset1_x,filterX);
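Both snippets pass the expression through Pattern.quote. As a plain-JDK sketch (no WEKA classes involved; the attribute names are invented, and whole-name matching is an assumption), this shows what that call does to the string, which may be worth checking when the selection is not what you expect:

```java
import java.util.regex.Pattern;

class PatternQuoteDemo {
    // Hypothetical helper: true if the whole name matches the given regex.
    static boolean matchesWhole(String regex, String name) {
        return Pattern.compile(regex).matcher(name).matches();
    }

    public static void main(String[] args) {
        String raw = "^.*i$";
        String quoted = Pattern.quote(raw); // wraps the text in \Q...\E, i.e. a literal

        System.out.println(quoted);                        // \Q^.*i$\E
        System.out.println(matchesWhole(raw, "alibi"));    // true: name ends with i
        System.out.println(matchesWhole(quoted, "alibi")); // false: not the literal text
        System.out.println(matchesWhole(quoted, "^.*i$")); // true: only the literal text
    }
}
```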

How to use single quotes with MessageFormat

On my current project, we are using properties files for strings. Those strings are then "formatted" using MessageFormat. Unfortunately, MessageFormat handles single quotes in a way that becomes a bit of a hindrance in languages such as French, which use a lot of apostrophes.
For instance, suppose we have this entry
login.userUnknown=User {0} does not exist
When this gets translated into French, we get:
login.userUnknown=L'utilisateur {0} n'existe pas
MessageFormat does not like this...
And I do not like the following, i.e. having to double the single quotes:
login.userUnknown=L''utilisateur {0} n''existe pas
The reason I don't like it is that it causes spellchecking errors everywhere.
Question: I am looking for an alternative to the instruction below, one that does not need doubled quotes but still uses positional placeholders ({0}, {1}…). Is there anything else that I can use?
MessageFormat.format(Messages.getString("login.userUnknown"), username);
No, there is no other way; according to the Javadoc, this is how it is supposed to be done.
A single quote itself must be represented by doubled single quotes '' throughout a String
As a workaround, you could do it programmatically using replace("'", "''"), or for this particular use case you could use the typographic apostrophe ’ instead, which is actually more correct than a single quote.
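A small sketch of how the variants behave, using only java.text.MessageFormat. The class and helper names and the user name "bob" are invented for the example.

```java
import java.text.MessageFormat;

class ApostropheDemo {
    // Thin wrapper around the standard static MessageFormat.format call.
    static String format(String pattern, Object... args) {
        return MessageFormat.format(pattern, args);
    }

    // The replace("'", "''") workaround: double every single quote before formatting.
    static String doubleQuotes(String pattern) {
        return pattern.replace("'", "''");
    }

    public static void main(String[] args) {
        String plain = "L'utilisateur {0} n'existe pas";
        String typographic = "L\u2019utilisateur {0} n\u2019existe pas";

        // Single quotes open and close a quoted section, so {0} is not substituted
        System.out.println(format(plain, "bob"));               // Lutilisateur {0} nexiste pas
        // Doubled quotes come out as one literal quote each
        System.out.println(format(doubleQuotes(plain), "bob")); // L'utilisateur bob n'existe pas
        // The typographic apostrophe is not special to MessageFormat
        System.out.println(format(typographic, "bob"));
    }
}
```

Note that blindly doubling quotes would also double quotes that were put there on purpose, for example to escape a literal brace, so it only suits bundles that never rely on MessageFormat quoting.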
Probably too late for you, but someone else might find this useful: Instead of Java's MessageFormat, use ICU (International Components for Unicode) (or rather its Java port ICU4J). It's basically a set of tools and data to support you in internationalizing your application. And among those tools is their own version of MessageFormat. It's very similar (maybe even backwards compatible) and can handle single quotes exactly like you want it. It can even handle doubled/escaped single quotes so you can try it as a drop-in replacement for Java's MessageFormat without having to unescape your single quotes first.

Properties file escape a value

I've looked through How to escape the equals sign in properties files but didn't find my answer.
I have a Java Properties File that includes sets such as:
SOME_KEY = SOME_VALUE
This is normal. However, some of the values actually contain escape/control characters, such as URLs. This properties file is to be hand-edited by a user on rare occasions. I want the user to be able to simply paste in a URL and not worry about special rules, etc.
So I have this showing in my file now:
SOME_KEY = http://www.example.com/something.asp?some=
where some= is the base of dynamic URL where something after the = will cause the URL to respond differently.
From reading http://docs.oracle.com/javase/6/docs/api/java/util/Properties.html it doesn't seem to make mention of needing to escape any escape/control characters after the first unescaped = or : is encountered, but I need/want to make sure.
I know that if my KEY had one of those characters present, then it would have to be escaped otherwise it'd be misread... such as:
SOME\=KEY = SOME_VALUE
Would make for a literal SOME=KEY as the key value.
In this above situation, excluding the obvious escaping of the KEY, is it necessary to hand-escape the values?
After the first = without escape, no.
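This can be checked with a minimal sketch that loads the two entries from the question through java.util.Properties. The class and helper names are invented; the file content is fed from a string rather than a real file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class PropertiesValueDemo {
    // Hypothetical helper: parse properties-file text held in a string.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // In the file itself the second line reads: SOME\=KEY = SOME_VALUE
        String text = "SOME_KEY = http://www.example.com/something.asp?some=\n"
                    + "SOME\\=KEY = SOME_VALUE\n";
        Properties props = parse(text);

        // The ':' and the trailing '=' inside the value are kept as-is
        System.out.println(props.getProperty("SOME_KEY"));
        // The escaped '=' becomes part of the key
        System.out.println(props.getProperty("SOME=KEY"));
    }
}
```

One caveat: the backslash is still the escape character inside values, so a pasted value containing backslashes (a Windows path, for instance) would still need attention.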
If you use Eclipse, you might want to install the JBoss Tools Properties Editor. You do not need to worry about manually escaping values such as SOME=KEY or Unicode characters; the plugin escapes the characters for you to avoid reading and encoding problems.
http://www.jboss.org/tools

Detecting words that start with an accented uppercase using regular expressions

I want to extract the words that begin with a capital — including accented capitals — using regular expressions in Java.
This is my conditional for words beginning with capital A through Z:
if (link.text().matches("^[A-Z].+") == true)
But I also want words that begin with an accented uppercase character, too.
Do you have any ideas?
Start with http://download.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html
\p{javaUpperCase} Equivalent to java.lang.Character.isUpperCase()
To match an uppercase letter at the beginning of the string, you need the pattern ^\p{Lu}.
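A minimal sketch comparing the original [A-Z] condition with the \p{Lu} variant. The class and method names and the sample words are invented, and the trailing .+ is carried over from the question.

```java
class UppercaseStartDemo {
    // Uppercase letter in the Unicode general category Lu, followed by at least one character.
    static boolean startsWithUppercase(String text) {
        return text.matches("^\\p{Lu}.+");
    }

    // The original condition from the question: ASCII capitals only.
    static boolean startsWithAsciiUppercase(String text) {
        return text.matches("^[A-Z].+");
    }

    public static void main(String[] args) {
        String accentedUpper = "\u00C9cole"; // starts with E-acute, uppercase
        String accentedLower = "\u00E9cole"; // starts with e-acute, lowercase

        System.out.println(startsWithUppercase(accentedUpper));      // true
        System.out.println(startsWithUppercase(accentedLower));      // false
        System.out.println(startsWithUppercase("Apple"));            // true
        System.out.println(startsWithAsciiUppercase(accentedUpper)); // false
    }
}
```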
Unfortunately, Java does not support the mandatory \p{Uppercase} property, necessary for meeting UTS#18’s RL1.2.
That’s hardly the only thing missing from Java regular expressions to meet even Level 1, the most bareboned Basic Unicode Functionality. Without Level 1, you really can’t work with Unicode text using regular expressions. Too much is broken or absent.
UTS#18’s RL1.1 will finally be met with JDK7, but I do not believe there are currently any plans to meet RL1.2, RL1.2a, or any of the others that it’s currently lacking, nor even meeting the two Strong Recommendations. Alas!
Indeed, of the very short list of mandatory properties required by RL1.2, Java is missing the \p{Alphabetic}, \p{Uppercase}, \p{Lowercase}, \p{White_Space}, \p{Noncharacter_Code_Point}, \p{Default_Ignorable_Code_Point}, \p{ANY}, and \p{ASSIGNED} properties. Those are all mandatory but either completely missing or else fail to obey The Unicode Standard with respect to their definitions. This is also the problem with the POSIX compatible properties in Java: they’re all broken with respect to UTS#18.
Prior to JDK7, it is also missing the mandatory Script properties. JDK7 does get script properties at long last, but that’s all — nothing else. Java is still light years away from meeting even RL1.2a, which is a daily gotcha for zillions of programmers.
In JDK7, you can finally also use two-part properties in the form \p{name=value} if they’re block, script, or general categories. That means these are all the same in JDK7’s Pattern class:
\p{Block=Number_Forms}, \p{blk=Number_Forms}, and \p{InNumber_Forms}.
\p{Script=Latin}, \p{sc=Latin}, \p{IsLatin}, and \p{Latin}.
\p{General_Category=Lu}, \p{GC=Lu}, and \p{Lu}.
However, you still cannot use the long forms like \p{Lowercase_Letter} and \p{Letter_Number}, and the POSIX-looking properties are all broken from RL1.2a’s perspective. Plus super-basic properties from RL1.2 like \p{White_Space} and \p{Alphabetic} are still missing.
There was some talk of trying to fix \b and \B, which are miserably broken with respect to \w and \W, but I don't know how they’re going to fix all that without fully complying with RL1.2a. And no, I have no idea when they will add those basic properties to Java. You can’t get by without them, either.
To fully work with Unicode using regexes in Java at even Level 1, you really cannot use the standard Pattern class that Java comes with. The easiest way to do so is to instead use JNI to connect up with ICU regex libraries using the Google Android code, which is available.
There do exist other languages that are at least Level-1 compliant (or better) with UTS#18, but if you want to stay within Java, ICU is currently your only real option.
Java has a method, java.lang.Character.isUpperCase. It's not exactly a regular expression, but it might suffice.
http://download.oracle.com/javase/1.5.0/docs/api/java/lang/Character.html#isUpperCase(int)
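A short sketch of that non-regex route, checking the first code point of a word. The class and method names are made up for the example.

```java
class IsUpperCaseDemo {
    // True if the word is non-empty and its first code point is an uppercase character.
    static boolean startsWithUppercase(String word) {
        return !word.isEmpty() && Character.isUpperCase(word.codePointAt(0));
    }

    public static void main(String[] args) {
        System.out.println(startsWithUppercase("\u00C9cole")); // true: E-acute, uppercase
        System.out.println(startsWithUppercase("\u00E9cole")); // false: e-acute, lowercase
        System.out.println(startsWithUppercase(""));           // false: nothing to check
    }
}
```

Using codePointAt rather than charAt keeps the check valid for characters outside the Basic Multilingual Plane.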

Should I use java.text.MessageFormat for localised messages without placeholders?

We are localising the user-interface text for a web application that runs on Java 5, and have a dilemma about how we output messages that are defined in properties files - the kind used by java.util.Properties.
Some messages include a placeholder that will be filled using java.text.MessageFormat. For example:
search.summary = Your search for {0} found {1} items.
MessageFormat is annoying, because a single quote is a special character, despite being common in English text. You have to type two for a literal single quote:
warning.item = This item''s {0} is not valid.
However, three-quarters of the application's 1000 or so messages do not include a placeholder. This means that we can output them directly, avoiding MessageFormat, and leave the single quotes alone:
help.url = The web page's URL
Question: should we use MessageFormat for all messages, for consistent syntax, or avoid MessageFormat where we can, so most messages do not need escaping?
There are clearly pros and cons either way.
Note that the API documentation for MessageFormat acknowledges the problem and suggests a non-solution:
The rules for using quotes within message format patterns unfortunately have shown to be somewhat confusing. In particular, it isn't always obvious to localizers whether single quotes need to be doubled or not. Make sure to inform localizers about the rules, and tell them (for example, by using comments in resource bundle source files) which strings will be processed by MessageFormat.
Just write your own implementation of MessageFormat without this annoying feature. You may look at the code of the SLF4J Logger.
They have their own message formatter, which can be used as follows:
logger.debug("Temperature set to {}. Old temperature was {}.", t, oldT);
Empty placeholders could be used with default ordering, and numbered ones for localization cases where different languages reorder words or parts of sentences.
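As a rough idea of what such a home-grown formatter could look like, here is a minimal sketch that substitutes numbered placeholders and treats single quotes as ordinary characters. It is not SLF4J's code; the class name is invented, and it deliberately drops everything else MessageFormat offers (number, date and choice formats).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SimpleMessageFormatter {
    // Matches {0}, {1}, ... with up to three digits; no format types, no quote handling.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\d{1,3})\\}");

    static String format(String pattern, Object... args) {
        Matcher m = PLACEHOLDER.matcher(pattern);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int index = Integer.parseInt(m.group(1));
            // Leave the placeholder untouched when no argument was supplied for it
            String replacement = index < args.length ? String.valueOf(args[index]) : m.group();
            // quoteReplacement keeps '$' and '\' in arguments from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(format("This item's {0} is not valid.", "name")); // This item's name is not valid.
        System.out.println(format("{1} before {0}", "a", "b"));              // b before a
        System.out.println(format("The web page's URL"));                    // The web page's URL
    }
}
```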
In the end we decided to side-step the single quote problem by always using ‘curly’ quotes:
warning.item = This item\u2019s {0} is not valid.
Use the ` character instead of ' for quoting. We use it all the time without problems.
Use MessageFormat only when you need it; otherwise it only bloats the code and adds no value.
In my opinion, consistency is important for this sort of thing. Properties files and MessageFormat already have lots of limitations. If you find these troubling you could "compile" your properties files to generate properly-formed ones. But I'd say go with using MessageFormat everywhere. This way, as you maintain the code you don't need to worry about which strings are formatted and which aren't. It becomes simpler to deal with, since you can hand off message processing to a library and not worry about the details at a high level.
Another alternative: when loading the properties file, just wrap the InputStream in a FilterInputStream that doubles every single quote.
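A minimal sketch of that idea, assuming the file is read through Properties.load(InputStream) and never contains quotes that are already doubled or written as Unicode escapes. All class and helper names are invented.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.text.MessageFormat;
import java.util.Properties;

class QuoteDoublingDemo {
    // Emits every single-quote byte (0x27) twice; all other bytes pass through unchanged.
    static class QuoteDoublingInputStream extends FilterInputStream {
        private boolean pendingQuote;

        QuoteDoublingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            if (pendingQuote) {          // second copy of the quote read last time
                pendingQuote = false;
                return '\'';
            }
            int b = super.read();
            if (b == '\'') {
                pendingQuote = true;
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            if (len == 0) {
                return 0;
            }
            int n = 0;
            while (n < len) {            // simple byte-by-byte fill via read()
                int b = read();
                if (b < 0) {
                    break;
                }
                buf[off + n++] = (byte) b;
            }
            return n == 0 ? -1 : n;
        }
    }

    // Hypothetical helper: load properties text with every single quote doubled.
    static Properties load(String text) {
        Properties props = new Properties();
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        try (InputStream in = new QuoteDoublingInputStream(new ByteArrayInputStream(bytes))) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = load("warning.item = This item's {0} is not valid.\n");
        String pattern = props.getProperty("warning.item");
        System.out.println(pattern);                               // This item''s {0} is not valid.
        System.out.println(MessageFormat.format(pattern, "name")); // This item's name is not valid.
    }
}
```

With this approach every message then has to go through MessageFormat, since the loaded values carry doubled quotes.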
