I have a JUnit test that tests adding Strings to a custom Dictionary type. Everything works fine for everyone else on Linux/Windows machines; however, I'm the first dev in my shop on a Mac, and this unit test fails for me. The offending lines are where Unicode string literals are used:
dict.add( "Su字/会意pin", "Su字/会意pin" );
dict.add( "字/会意", "字/会意" );
Is there a platform-independent way to specify the Unicode strings? I've tried changing the encoding of the file in Eclipse to UTF-8 instead of the default MacRoman, but the test still fails.
In the flags for the javac compiler, set the -encoding flag; in your case you'd invoke it as
javac -encoding UTF-8
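If you want the test source itself to be immune to whatever default encoding each developer's machine happens to use, you can also spell the literals with \u escapes so the file stays pure ASCII. A minimal sketch of your two lines (the escapes are the code points for 字, 会 and 意):

dict.add( "Su\u5b57/\u4f1a\u610fpin", "Su\u5b57/\u4f1a\u610fpin" );
dict.add( "\u5b57/\u4f1a\u610f", "\u5b57/\u4f1a\u610f" );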
Related
We have a number of unit tests which assemble a multi-line string in memory and compare it against a reference string in a file.
We've gotten a bit carried away and implemented the tests to use System.getProperty("line.separator") to make the code OS-agnostic, but the reference files are just files with \n line endings (Linux), so the tests which compared generated content to reference file content fail on a Windows machine because System.getProperty("line.separator") returns \r\n there.
This is test code, so we'll probably just define a final String LINE_ENDING = "\n" and update the tests to use it instead of the "line.separator" property value. That said, I'd really like to understand why I'm unable to specify a different line separator. I tried mvn -DargLine="-Dline.separator=\n" test, but the newline special character appears to have been interpreted as a literal letter "n", so the tests failed again. To my surprise, trying \\n instead of \n made no difference either.
Can anyone show how one would set the line.separator parameter properly?
Final note: the above commands were issued on a Linux machine. When running one of the tests from within Eclipse on a Windows machine, the \n special character (passed in the debug configuration as the JVM parameter -Dline.separator=\n) seems to be interpreted as the literal value "\\n". Searching the web has proved frustratingly fruitless.
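For clarity, the workaround I have in mind is roughly the following (readReferenceFile and buildMultiLineString are placeholders for our actual test helpers):

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class ReferenceFileTest {
    // Fix the separator in test code so the generated string matches the
    // \n-terminated reference files on any OS.
    static final String LINE_ENDING = "\n";

    @Test
    public void generatedContentMatchesReferenceFile() throws Exception {
        String expected = readReferenceFile();               // reference files use \n endings
        String actual = buildMultiLineString(LINE_ENDING);   // assemble with a fixed separator
        assertEquals(expected, actual);
    }
}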
I have a Java application that feeds a file on a Unix machine; each string contains multiple US (unit separator) characters.
Locally, when I run it in Eclipse on a Windows machine, it displays fine on the console:
1▼somedata▼somedata▼0▼635064▼0▼somedata▼6
But when I run the program on the Unix machine, the content of the file appears as:
1â¼N/Aâ¼somedataoâ¼somedataâ¼somedata
Changing the LANG variable to any of the values listed by locale -a doesn't seem to help.
This looks like a character-set mismatch. On Linux you most probably have UTF-8, while Java uses UTF-16 internally. Try converting from UTF-16 to UTF-8 with iconv and see how it looks on Linux:
cat file | iconv -f UTF-16 -t UTF-8
Actually, it would have looked much worse if it really were UTF-16, so maybe it is simply a font mismatch. You can still play with the character encoding (work out what the source encoding is and convert it to UTF-8) if that's the issue. Or perhaps your source is UTF-8 and the destination uses some local encoding.
This makes sense because your special character appears as two characters on the Unix machine, which suggests the source is most likely UTF-8 and the Unix side is rendering it with an encoding where each byte is a single character.
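If the mismatch really is the platform default encoding, one option (just a sketch, not your actual code; FeedWriter and feed.txt are made-up names) is to write the feed file with an explicit charset on the Java side, so the bytes are the same no matter which machine produces them:

import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class FeedWriter {
    public static void main(String[] args) throws Exception {
        // Write with an explicit charset instead of the platform default;
        // \u001F is the US (unit separator) control character from the question.
        try (Writer out = new OutputStreamWriter(new FileOutputStream("feed.txt"),
                                                 StandardCharsets.UTF_8)) {
            out.write("1\u001Fsomedata\u001Fsomedata\u001F0\u001F635064");
        }
    }
}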
I have been trying to use Spanish characters in my program, but wherever I used them, Java would print out '?'.
I am using Slackware, and executing my code there.
I updated lang.sh, and added: export JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF-8
After this, when I tried printing, it no longer printed question marks, but other junk characters instead. I printed the default Charset on screen, and it has been set successfully, but the output is still not printing properly.
Help?
Thanks!
EDIT: I'm writing the code on Windows in NetBeans, and executing the .class or .jar on Slackware.
Further, I cannot seem to execute the locale command; I get the error "bash: locale: command not found".
This is what confuses me: when I echo any special characters on the Slackware console, they are displayed perfectly, but when I run a Java program that simply prints its command-line arguments (and I enter the special characters as command-line input), it outputs garbage.
If you are using an ssh client such as PuTTY, check that it is using a UTF-8 charset as well.
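If the terminal side checks out, another thing worth trying (a rough sketch, not your actual code; SpanishDemo is a made-up name) is to print through a PrintStream with an explicit UTF-8 encoding instead of relying on System.out's platform default:

import java.io.PrintStream;

public class SpanishDemo {
    public static void main(String[] args) throws Exception {
        // Force the bytes written to stdout to be UTF-8, independent of the
        // encoding the JVM picked up from the environment.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println("añejo, ¿qué tal?");
    }
}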
I am using Jsoup to clean incoming text from users. Alas, it seems it does not support non-ASCII chars in the cleaning:
assertEquals("привет", Jsoup.clean("привет", Whitelist.none()));
This does not work.
Any idea?
What is the default encoding when you run your code? Maybe it is not UTF-8 but the Linux/Windows default. You can use the VM argument -Dfile.encoding=UTF8 to ensure UTF-8.
I checked your code with jsoup 1.6.3 too --> the test passes.
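If you're unsure what the default actually is at runtime, a quick throwaway check is:

import java.nio.charset.Charset;

public class EncodingCheck {
    public static void main(String[] args) {
        // Prints the charset the JVM is actually using as its default.
        System.out.println(Charset.defaultCharset());
    }
}

Run it with and without the flag (java -Dfile.encoding=UTF8 EncodingCheck) to see whether the setting is being picked up.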
I am encountering an issue compiling a source file with a special character in the class name. The class file compiles fine in the Eclipse IDE, but not from javac. I believe I need to leverage the -encoding flag, but haven't hit the right setting yet. I would appreciate any pointers:
File Name: DeptView和SDO.java
Java Source:
public interface DeptView\u548cSDO {
public int getDeptno();
public void setDeptno(int value);
}
Error Message:
Running javac *.java results in the following error message:
javac: file not found: DeptView?SDO.java
UPDATE
I am currently trying to compile at a Windows XP command prompt.
Ultimately this compile will need to be part of an Ant build and run on different operating systems.
I work on the tool that is producing this generated source file.
One solution is to list the file name of each compilation unit in a separate file, say files, and pass @files as a command-line argument to javac. Otherwise, you will have to set the locale of your shell so that it is using the correct character encoding.
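As a sketch of the argument-file approach (assuming the source really is saved as UTF-8): put the source file name in a plain-text file called files, one name per line, and then run

javac -encoding UTF-8 @files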
Have you tried using -encoding UTF8, -encoding UTF16LE (little endian), or -encoding UTF16BE (big endian)? (Whether you need LE or BE depends on the system you are using -- Windows is LE, from what I remember.)