Is there a way to build UTF-8 source files using ant - java

I have UTF-8 string literals hardcoded in my java files.
Eclipse builds this application correctly, so the resulting class files contain those strings in UTF-8.
But if I build with the ant build.xml, the resulting class files contain strings with incorrect encoding.
I already tried adding encoding="UTF-8" to the javac task, but with no success.
How can this be fixed?
P.S. I know it is bad practice to hardcode string literals in source files, but in this situation I need them there, so please don't suggest extracting them to a resource bundle.
Any help is greatly appreciated

The proper way is
<javac ... encoding="UTF-8" ... />
If the strings in the resulting class files are in the wrong encoding, then either your source encoding is probably not UTF-8, or these files are compiled by some other javac task, not the one you modified.
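To rule out the first cause, you can check whether the source files really are UTF-8 before blaming ant. Below is a minimal sketch of such a check; the class name Utf8Check and the method isValidUtf8 are made up for illustration. It uses a strict decoder that reports malformed bytes instead of silently replacing them.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of ant or the JDK tools.
class Utf8Check {
    /** Returns true if the bytes decode as strict UTF-8 with no malformed sequences. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Pass the paths of the source files to check as arguments.
        for (String name : args) {
            byte[] bytes = Files.readAllBytes(Paths.get(name));
            System.out.println(name + ": "
                    + (isValidUtf8(bytes) ? "valid UTF-8" : "NOT valid UTF-8"));
        }
    }
}
```

A file that passes could still be plain ASCII, which is valid UTF-8 too, so this only tells you when a file is definitely not UTF-8.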

The encoding must match the encoding of the file. Your guess of utf-8 may not be correct. Please check other suitable encoding names like iso-8859-1.
Please check upvoted answers in How do I set -Dfile.encoding within ant's build.xml?

This is not actually an answer to the original problem, but I don't want this question to remain unanswered. Maybe moderators will decide to delete it. Anyway.
It looks like this is some kind of bug that appears with a specific combination of ant, JDK and Windows.
I dug really deep and wasn't able to fix this in a, let's say, normal way.
So I decided to externalize the strings to a properties file, which is better practice anyway...
Normally the solution suggested by Mikhail Vladimirov should work fine, but not this time.

Related

Who is responsible for docs and implementation of the JavaDoc-tool?

This question is related to an issue raised for Maven, which doesn't seem to escape paths forwarded to argument files supported by the JavaDoc-tool on Windows. The problem is that it's unclear from the documentation of JavaDoc itself how paths under Windows should be provided in those files.
The following is for Java 7:
If a filename contains embedded spaces, put the whole filename in double quotes, and double each backslash ("My Files\Stuff.java").
https://docs.oracle.com/javase/7/docs/technotes/tools/windows/javadoc.html#argumentfiles
The following from Java 8:
If a file name contains embedded spaces, then put the whole file name in double quotation marks.
https://docs.oracle.com/javase/8/docs/technotes/tools/windows/javadoc.html
In docs of Java 11 that part is completely missing, no mention of quotes, spaces or backslashes anymore:
https://docs.oracle.com/en/java/javase/11/javadoc/javadoc-command.html#GUID-EFE927BC-DB00-4876-808C-ED23E1AAEF7D
If you have a look at the URIs, in former versions of Java they were Windows-specific, while the last one is not. So I guess things have been refactored and some details of the argument files have been simply lost.
So, I need a place where I can talk to people about those differences in the documentation AND, in the end, about how things are supposed to work on Windows: whether backslash is an escape character in paths only, and so on. I would simply like to get some awareness from people who might know why the docs lack some details now and who could maybe even provide those details again.
So who/where do I write to? I don't know if it's Oracle or the OpenJDK project or someone completely different. Thanks!
I think, but don't take this as authoritative, that the javadoc tool is just an optional tool (can anyone show a formal obligation for any JDK to include a javadoc tool implementation?) with a kind of de-facto standard set by the original owners, Sun and then Oracle.
But de-facto is only de-facto. Formally and strictly speaking, no JDK implementer has any obligation to make their javadoc tool behave like all the others do.
I think the two best places are the javadoc-dev mailing list and the bugs database. Starting at some point in time (9, I guess) they have unified the parsing of the @files across tools. I failed to find the code in the Mercurial repo last time I looked.

Using Freemarker to generate Java .properties files

I am currently using Freemarker to generate a number of configuration files. So far these have been either xml files or proprietary format text files. I would now like to generate some Java .properties files but have hit a couple of issues.
The first is character encoding. As far as I can see simply adding
<#ftl encoding="8859_1">
to the start of the file should sort this out.
The second issue is the escaping of the keys and values. The keys are probably ok as I would be hardcoding these in the template anyway so I can escape them in the template. The values will be coming from my data model and so will need escaping.
I can see how I can create my own user defined directive and by installing it as a shared variable use it in my template.
Is this the best or only way to do this? I would have thought generating .properties files is something that has been tackled many times before and was hoping something may already exist before I start writing my own code.
The class java.util.Properties has various store methods to save properties to OutputStreams or Writers. This seems preferable to trying to adapt freemarker.
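A minimal sketch of that approach, assuming you can build the key/value pairs in Java instead of in a template. The class name PropertiesStoreSketch and its helper methods are made up for illustration. store(OutputStream, String) handles the escaping of keys and values and writes non-Latin-1 characters as Unicode escapes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper class for illustration only.
class PropertiesStoreSketch {
    /** Serializes the properties in the standard .properties format. */
    static byte[] write(Properties props) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // null means no custom comment header; by default a timestamp comment is still written.
        props.store(out, null);
        return out.toByteArray();
    }

    /** Parses bytes in .properties format back into a Properties object. */
    static Properties read(byte[] data) throws IOException {
        Properties props = new Properties();
        props.load(new ByteArrayInputStream(data));
        return props;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // A key with a space and a value with separators and a non-ASCII character.
        props.setProperty("my key", "caf\u00e9 = 1:2");
        byte[] data = write(props);
        System.out.print(new String(data, StandardCharsets.ISO_8859_1));
        System.out.println("round trip ok: " + props.equals(read(data)));
    }
}
```

The drawback is that you lose control over ordering and comments, which a template would give you.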
I don't see which charset issues are specific to generating properties files. But note that the charset of the template and the charset of the output are independent, so you might as well use the same charset for these templates as for the others (like maybe UTF-8).
As for escaping, always use auto-escaping if you can. In 2.3.24 that will be especially sleek, but unless you are allowed to use unreleased versions, you will have to wait for that until the end of February or so. (If you can use unreleased/unofficial versions, you can find out about the internal testing releases in the developer list archive.) Before 2.3.24, there's <#escape x as propEsc(x)>all the template content here</#escape>, where propEsc is a TemplateMethodModelEx (not a TemplateDirectiveModel) that you have added as a shared variable or such. That way all ${...}-s will be escaped automatically.
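For what it's worth, here is a sketch of the escaping logic such a propEsc method could delegate to, written in plain Java with no FreeMarker dependency. The class name PropEscSketch and the method escapeValue are made up for illustration, and wrapping it in a TemplateMethodModelEx is left out. It only covers property values, not keys, and it is a hand-written approximation of the .properties rules rather than anything FreeMarker ships.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper; the escaping rules follow the java.util.Properties file format.
class PropEscSketch {
    /** Escapes a string so it can be written as the value part of a key=value line. */
    static String escapeValue(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                case '\f': sb.append("\\f"); break;
                case ' ':
                    // A leading space would be skipped by the parser unless escaped.
                    sb.append(i == 0 ? "\\ " : " ");
                    break;
                default:
                    if (c < 0x20 || c > 0x7e) {
                        // Anything outside printable ASCII becomes a Unicode escape.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String value = " Gr\u00fc\u00dfe,\nWorld";
        String line = "greeting=" + escapeValue(value);
        System.out.println(line);

        // Check the result by parsing the line back with java.util.Properties.
        Properties props = new Properties();
        props.load(new StringReader(line + "\n"));
        System.out.println("round trip ok: " + value.equals(props.getProperty("greeting")));
    }
}
```

Because the output is pure ASCII, the charset chosen for the generated file stops mattering for the values.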

Why do i need library file with -sources suffix?

I don't understand why there are often two files in libraries, one with a -sources suffix.
Here's what I mean
The sources are useful if you want to step into the library when debugging. You don't need them, but they might save you if you can't understand why the library behaves in a certain way.
To add to the answer below, the -sources archive is the actual source code, while the other file is the compiled version of it.

Is it possible to edit a pre-existing .class file from within my program?

This may seem like an odd thing to ask, but it'd take me forever to explain why I need it...
What I need is a way to edit a pre-existing Java .class file within its JAR file, with either a command prompt, or within my Python program. I need it to happen automatically, once the user pushes a button.
I have absolutely no clue how to do this, or if it's possible.
A jar file is a zip package, so you only need to extract the file, edit the content and put it back. The harder part is how to edit the .class file. The Java .class file is a binary format; there are several libraries that may help you.
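The "extract and put back" part can be sketched in Java with the zip file system provider that ships with the JDK. The class name JarEntryReplaceSketch and its methods are made up for illustration. The sketch only swaps the bytes of an existing entry; producing the edited class bytes is still the job of a bytecode library.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;

// Hypothetical helper for illustration; it treats the jar as a plain zip archive.
class JarEntryReplaceSketch {
    /** Opens the jar as a zip file system; the caller must close it. */
    private static FileSystem open(Path jar) throws IOException {
        URI uri = URI.create("jar:" + jar.toAbsolutePath().toUri());
        return FileSystems.newFileSystem(uri, Collections.<String, Object>emptyMap());
    }

    /** Overwrites one entry, e.g. "com/example/Foo.class", with new bytes. */
    static void replaceEntry(Path jar, String entryName, byte[] newBytes) throws IOException {
        try (FileSystem zip = open(jar)) {
            Files.write(zip.getPath(entryName), newBytes);
        } // closing the file system writes the modified archive back to disk
    }

    /** Reads the current bytes of one entry. */
    static byte[] readEntry(Path jar, String entryName) throws IOException {
        try (FileSystem zip = open(jar)) {
            return Files.readAllBytes(zip.getPath(entryName));
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java JarEntryReplaceSketch <jar> <entry name> <file with new bytes>
        Path jar = Paths.get(args[0]);
        byte[] replacement = Files.readAllBytes(Paths.get(args[2]));
        replaceEntry(jar, args[1], replacement);
        System.out.println("replaced " + args[1] + " (" + replacement.length + " bytes)");
    }
}
```

If the jar is signed, changing an entry invalidates the signature, so that case needs extra handling.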
Yes, you can do this. How you do it depends on what you want to achieve. For cross-cutting concerns, look at AspectJ. Using AspectJ you can add your custom code even after the class is compiled.
You will have a problem with this approach if the class has already been loaded by a JVM classloader, as it may not reread the .class file until the application has been rerun.
I know that BCEL exists but I've not used it, so I don't know if it can be used a) from Python, or b) at runtime.
EDIT: Actually, Jeffrey's list is better as it provides a much more comprehensive list of Byte Code manipulators.

Java problems with UTF-8 in different OS

I'm programming an application with other people for a college assignment, and sometimes we use non-English characters in comments or in Strings displayed in the views. The problem is that each of us is using a different OS and sometimes a different IDE.
Concretely, one of us is using MacOS, another Windows 7, and another one and I use Ubuntu Linux. Furthermore, all of them use Eclipse and I use gedit. We have no idea whether Eclipse or gedit can be configured to work properly with UTF-8; at least I haven't found anything for mine.
The fact is that what I write with non-English characters appears on the Windows and MacOS machines as strange symbols and vice versa, and sometimes what my non-Linux friends write provokes compilation warnings like this: warning: unmappable character for encoding UTF8.
Do you have any idea how to solve this? It is not really urgent but it would be a help.
Thank you.
Not sure about gedit, but you can certainly configure eclipse to use whatever encoding you like for source code. It's part of the project properties (and saved in the .settings directory within the project).
Eclipse works fine with UTF-8. See Michael's answer about configuring it. Maybe for Windows and/or MacOS it is really necessary. Ubuntu uses UTF-8 as the default encoding so I don't think it's necessary to configure Eclipse there.
As for Gedit, this picture shows that it is possible to change the encoding when saving a file in Gedit.
Anyway, you need to make sure that all of you use UTF-8 for your sources. This is the only reasonable way to achieve cross-platform portability of your sources.
You could avoid the issue in Strings by using character escape sequences, and using only ASCII encoding for the files.
For example, an en dash can be expressed as "\u2013".
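If you have more than a handful of such characters, you can generate the escapes instead of looking them up one by one. Here is a small sketch; the class name UnicodeEscapeSketch and the method toAsciiLiteral are made up for illustration. It leaves ASCII untouched and rewrites everything else as a Unicode escape that is safe to paste into a Java string literal.

```java
// Hypothetical helper for illustration only.
class UnicodeEscapeSketch {
    /** Rewrites every non-ASCII char as a backslash-u escape with four hex digits. */
    static String toAsciiLiteral(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 0x7e) {
                // Characters outside the BMP show up as two surrogate escapes, which is valid Java.
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The escape in the literal below is turned into an en dash by the compiler,
        // regardless of the encoding the source file is saved in.
        String range = "pages 10\u201320";
        System.out.println("length: " + range.length());
        System.out.println("as ASCII-only literal: " + toAsciiLiteral(range));
    }
}
```

The JDK used to ship a native2ascii tool for the same job, so the helper above is only needed if you want to do it from your own code.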
You can quickly search for the Java code for individual characters here.
As Sergey notes below, this works best for small numbers of non-ASCII characters. An alternative is to put all the UTF-8 strings in resource files. Eclipse provides a handy wizard for this.
If your UTF-8 file contains a BOM (byte order mark) then you will have a problem. It is a known bug, see here and here.
The BOM is optional with UTF-8 and most of the time it is not there because it breaks many tools (like Javadoc, XML parsers, ...).
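Detecting and removing a UTF-8 BOM is simple, since it is always the three bytes EF BB BF at the very start of the file. A minimal sketch follows; the class name BomSketch and its methods are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper for illustration only.
class BomSketch {
    /** Returns true if the data starts with the UTF-8 byte order mark EF BB BF. */
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    /** Returns the data without a leading UTF-8 BOM; input without a BOM is returned as-is. */
    static byte[] stripUtf8Bom(byte[] data) {
        return hasUtf8Bom(data) ? Arrays.copyOfRange(data, 3, data.length) : data;
    }

    public static void main(String[] args) throws IOException {
        // Pass source files as arguments; files with a BOM are rewritten in place.
        for (String name : args) {
            Path file = Paths.get(name);
            byte[] data = Files.readAllBytes(file);
            if (hasUtf8Bom(data)) {
                Files.write(file, stripUtf8Bom(data));
                System.out.println(name + ": BOM removed");
            } else {
                System.out.println(name + ": no BOM");
            }
        }
    }
}
```

Since main rewrites files in place, try it on a copy first.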
More info here.
