This question already has answers here:
Decode Base64 data in Java
(21 answers)
Closed 9 years ago.
I have used the com.sun.org.apache.xerces.internal.impl.dv.util.Base64 class for encoding and decoding strings, but I want to use a java.* package for encoding and decoding instead of the com.sun.* internal package.
Can you please suggest an appropriate java.* package?
If you can wait until Java 8 is released - there will be a java.util.Base64 class.
In the meantime you should use the solution from Joachim Sauer's comment. (See Decode Base64 data in Java - second answer)
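Once Java 8 is out, usage looks roughly like this (a minimal sketch of the java.util.Base64 API):

import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Demo {
    public static void main(String[] args) {
        // Encode the string's bytes to a Base64 string, then decode it back.
        String encoded = Base64.getEncoder()
                .encodeToString("hello".getBytes(StandardCharsets.UTF_8));
        String decoded = new String(Base64.getDecoder().decode(encoded),
                StandardCharsets.UTF_8);
        System.out.println(encoded + " -> " + decoded); // aGVsbG8= -> hello
    }
}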
Use the java Packages:
java.net.URLDecoder
java.net.URLEncoder
And use it like this:
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public static String decodeString(final String string) {
    try {
        return URLDecoder.decode(string, "UTF-8");
    } catch (final UnsupportedEncodingException e) {
        // TLog/LOG are this poster's app-specific logger; swap in your own logging.
        TLog.d(LOG, "Decoding Not Supported");
    }
    return string;
}
What do you mean?
You want to use:
import java.*;
instead of using:
import com.sun.apache.*;
?
Seems a little bit hard. One way to do this:
Download the source code of the com.sun.org.apache.xerces.internal.impl.dv.util.Base64 package.
Update the package name.
Re-package the source code.
Import the jar file again.
I don't think you should do this, though; there might be licensing issues.
This question already has answers here:
How to get a file's Media Type (MIME type)?
(28 answers)
Closed 1 year ago.
Objective: given a file, determine whether it is of a given type (XML, JSON, Properties, etc.)
Consider the case of XML. Up until we ran into this issue, the following sample approach worked fine:
try {
    // saxReader is a dom4j SAXReader; f is the java.io.File under test
    saxReader.read(f);
} catch (DocumentException e) {
    logger.warn(" - File is not XML: " + e.getMessage());
    return false;
}
return true;
As expected, when the XML is well formed the test passes and the method returns true; if something goes wrong and the file can't be parsed, false is returned.
This breaks down, however, when we deal with a malformed (but still recognizably XML) file.
I'd rather not rely on the .xml extension (that fails all the time), on looking for the <?xml version="1.0" encoding="UTF-8"?> string inside the file, etc.
Is there another way this can be handled?
What would you need to see inside the file to "suspect it may be XML even though DocumentException was caught"? This is needed for parsing purposes.
File type detection tools:
Mime Type Detection Utility
DROID (Digital Record Object Identification)
ftc - File Type Classifier
JHOVE, JHOVE2
NLNZ Metadata Extraction Tool
Apache Tika
TrID, TrIDNet
Oracle Outside In (commercial)
Forensic Innovations File Investigator TOOLS (commercial)
Apache Tika gives me the fewest issues, and unlike Java 7's Files.probeContentType it is not platform specific.
import java.io.File;
import java.io.IOException;
import org.apache.tika.Tika;

File inputFile = ...
// Tika.detect(File) throws IOException
String type = new Tika().detect(inputFile);
System.out.println(type);
For an XML file I got 'application/xml';
for a properties file I got 'text/plain'.
You can, however, pass a custom Detector to new Tika(...); a sketch follows the dependency block below.
<dependency>
<groupId>org.apache.tika</groupId>
<artifactId>tika-core</artifactId>
<version>1.xx</version>
</dependency>
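That customization might look like this; TextDetector here is one of tika-core's built-in Detector implementations, chosen just for illustration:

import org.apache.tika.Tika;
import org.apache.tika.detect.TextDetector;

// Build a Tika facade around a specific Detector instead of the default.
Tika tika = new Tika(new TextDetector());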
For those who do not need very precise detection, there is Java 7's Files.probeContentType method (mentioned by rjdkolb):

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

Path filePath = Paths.get("/path/to/your/file.jpg");
String contentType = Files.probeContentType(filePath);
This question already has answers here:
Java : How to determine the correct charset encoding of a stream
(16 answers)
Closed 3 years ago.
I am trying to find the encoding of a file using a Java program, but it always reports UTF-8 as the output, even though it is an ANSI file.
import java.io.FileInputStream;
import java.io.InputStreamReader;

// getEncoding() returns the charset the reader itself uses (the platform
// default unless one was passed in); it does not detect the file's encoding.
String encoding = new InputStreamReader(new FileInputStream("FILE_NAME")).getEncoding();
The juniversalchardet library is old and does not look properly supported anymore:
https://code.google.com/archive/p/juniversalchardet/
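For reference, a minimal sketch of how that library is driven (UniversalDetector is its entry point; "FILE_NAME" is a placeholder):

import java.io.FileInputStream;
import java.io.IOException;
import org.mozilla.universalchardet.UniversalDetector;

public class DetectEncoding {
    public static void main(String[] args) throws IOException {
        UniversalDetector detector = new UniversalDetector(null);
        try (FileInputStream fis = new FileInputStream("FILE_NAME")) {
            byte[] buf = new byte[4096];
            int nread;
            // Feed the file's bytes to the detector until it is confident.
            while ((nread = fis.read(buf)) > 0 && !detector.isDone()) {
                detector.handleData(buf, 0, nread);
            }
        }
        detector.dataEnd();
        System.out.println(detector.getDetectedCharset()); // null if no guess
    }
}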
There are so many answers that say we can find the encoding of a file, like
Java : How to determine the correct charset encoding of a stream
These solutions don't look good, though. According to Jörg W Mittag, we cannot determine the encoding of a file for sure.
I'm not sure about Scala, but have you already tried this library?

public static Charset guessCharset2(File file) throws IOException {
    return CharsetToolkit.guessEncoding(file, 4096, StandardCharsets.UTF_8);
}
This question already has answers here:
How to do URL decoding in Java?
(11 answers)
Closed 8 years ago.
I see that java.net.URLDecoder.decode(String) is deprecated in 6.
I have the following String:
String url ="http://172.20.4.60/jsfweb/cat/%D7%9C%D7%97%D7%9E%D7%99%D7%9D_%D7%A8%D7%92%D7%99%D7%9C%D7%99%D7%9"
How should I decode it in Java 6?
You should use java.net.URI to do this, as the URLDecoder class does x-www-form-urlencoded decoding, which is wrong here (despite the name, it's for form data).
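A minimal sketch of that approach (the URL is a made-up example; getPath() returns the decoded path):

import java.net.URI;
import java.net.URISyntaxException;

public class UriDecode {
    public static void main(String[] args) throws URISyntaxException {
        URI uri = new URI("http://example.com/a%20b%20c");
        System.out.println(uri.getPath()); // prints "/a b c"
    }
}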
Now you need to specify the character encoding of your string. Based on the information on the URLDecoder page:
Note: The World Wide Web Consortium Recommendation states that UTF-8 should be used. Not doing so may introduce incompatibilities.
The following should work for you:
java.net.URLDecoder.decode(url, "UTF-8");
Please see Draemon's answer below.
As the documentation mentions, decode(String) is deprecated because it always uses the platform default encoding, which is often wrong.
Use the two-argument version instead. You will need to specify the encoding used in the escaped parts.
Only the decode(String) method is deprecated. You should use the decode(String, String) method to explicitly set a character encoding for decoding.
As noted by previous posters, you should use java.net.URI class to do it:
System.out.println(String.format("Decoded URI: '%s'", new URI(url).getPath()));
What I want to note additionally is that if you have a path fragment of a URI and want to decode it separately, the same approach with the one-argument constructor works, but if you try to use the four-argument constructor it does not:
String fileName = "Map%20of%20All%20projects.pdf";
URI uri = new URI(null, null, fileName, null);
System.out.println(String.format("Not decoded URI *WTF?!?*: '%s'", uri.getPath()));
This was tested in Oracle JDK 7. The fact that this does not work is counter-intuitive, runs contrary to JavaDocs and it should be probably considered a bug.
It could trip up people who are trying to use an approach symmetrical to encoding. As noted, for example, in the post "how to encode URL to avoid special characters in java", in order to encode a URI it's a good idea to construct it by passing the different URI parts separately, since different encoding rules apply to different parts:
String fileName2 = "Map of All projects.pdf";
URI uri2 = new URI(null, null, fileName2, null);
System.out.println(String.format("Encoded URI: '%s'", uri2.toASCIIString()));
What is the best way to add non-ASCII file names to a zip file using Java, in such a way that the files can be properly read in both Windows and Linux?
Here is one attempt, adapted from https://truezip.dev.java.net/tutorial-6.html#Example, which works in Windows Vista but fails in Ubuntu Hardy. In Hardy the file name is shown as abc-ЖДФ.txt in file-roller.
import java.io.IOException;
import java.io.PrintStream;
import de.schlichtherle.io.File;
import de.schlichtherle.io.FileOutputStream;
public class Main {
    public static void main(final String[] args) throws IOException {
        try {
            PrintStream ps = new PrintStream(new FileOutputStream(
                    "outer.zip/abc-åäö.txt"));
            try {
                ps.println("The characters åäö works here though.");
            } finally {
                ps.close();
            }
        } finally {
            File.umount();
        }
    }
}
Unlike java.util.zip, TrueZIP allows specifying the zip file encoding. Here's another sample, this time explicitly specifying the encoding. Neither IBM437, UTF-8, nor ISO-8859-1 works in Linux. IBM437 works in Windows.
import java.io.IOException;
import de.schlichtherle.io.FileOutputStream;
import de.schlichtherle.util.zip.ZipEntry;
import de.schlichtherle.util.zip.ZipOutputStream;
public class Main {
    public static void main(final String[] args) throws IOException {
        for (String encoding : new String[] { "IBM437", "UTF-8", "ISO-8859-1" }) {
            ZipOutputStream zipOutput = new ZipOutputStream(
                    new FileOutputStream(encoding + "-example.zip"), encoding);
            ZipEntry entry = new ZipEntry("abc-åäö.txt");
            zipOutput.putNextEntry(entry);
            zipOutput.closeEntry();
            zipOutput.close();
        }
    }
}
The encoding for the file entries in ZIP was originally specified as IBM Code Page 437, which makes many characters used in other languages impossible to store.
The PKWARE specification refers to the problem and adds a bit, but that is a later addition (from 2007, thanks to Cheeso for clearing that up, see comments). If that bit is set, the filename entry has to be encoded in UTF-8. This extension is described in 'APPENDIX D - Language Encoding (EFS)', at the end of the linked document.
For Java it is a known bug to get into trouble with non-ASCII characters. See bug #4244499 and the high number of related bugs.
My colleague used URL-encoding as a workaround: encode the filenames before storing them into the ZIP and decode them after reading. If you control both storing and reading, that may work for you; a sketch follows.
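A sketch of what such a round-trip might look like (the file name is only an example; this illustrates the idea, it is not the colleague's actual code):

import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class ZipNameWorkaround {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // Store an ASCII-only name in the ZIP, restore the original after reading.
        String stored = URLEncoder.encode("abc-åäö.txt", "UTF-8"); // abc-%C3%A5%C3%A4%C3%B6.txt
        String original = URLDecoder.decode(stored, "UTF-8");
        System.out.println(stored + " -> " + original);
    }
}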
EDIT: In the bug report, someone suggests using the ZipOutputStream from Apache Ant as a workaround. This implementation allows the specification of an encoding.
In ZIP files, according to the spec owned by PKWARE, the encoding of file names and file comments is IBM437. In 2007 PKWARE extended the spec to also allow UTF-8. This says nothing about the encoding of the files contained within the zip, only the encoding of the filenames.
I think all tools and libraries (Java and non-Java) support IBM437 (which is a superset of ASCII), and fewer tools and libraries support UTF-8. Some tools and libs support other code pages; for example, if you zip something using WinRAR on a computer running in Shanghai, you will get the Big5 code page. This is not "allowed" by the zip spec, but it happens anyway.
The DotNetZip library for .NET does Unicode, but of course that doesn't help you if you are using Java!
Using the Java built-in support for ZIP, you will always get IBM437. If you want an archive with something other than IBM437, then use a third party library, or create a JAR.
Miracles indeed happen, and Sun/Oracle really did fix the long-standing bug/RFE:
Now it's possible to set the filename encoding upon creating the zip file/stream (requires Java 7).
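A minimal sketch (java.util.zip.ZipOutputStream gained a Charset constructor argument in Java 7):

import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class Utf8ZipNames {
    public static void main(String[] args) throws IOException {
        // The Charset argument controls how entry names are encoded in the archive.
        try (ZipOutputStream out = new ZipOutputStream(
                new FileOutputStream("utf8-example.zip"), StandardCharsets.UTF_8)) {
            out.putNextEntry(new ZipEntry("abc-åäö.txt"));
            out.closeEntry();
        }
    }
}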
You can still use the Apache Commons Compress implementation of the zip stream: http://commons.apache.org/compress/apidocs/org/apache/commons/compress/archivers/zip/ZipArchiveOutputStream.html#setEncoding%28java.lang.String%29
Calling setEncoding("UTF-8") on your stream should be enough.
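A short sketch of that, assuming commons-compress is on the classpath (file and entry names are illustrative):

import java.io.File;
import java.io.IOException;
import org.apache.commons.compress.archivers.zip.ZipArchiveEntry;
import org.apache.commons.compress.archivers.zip.ZipArchiveOutputStream;

public class CommonsUtf8Zip {
    public static void main(String[] args) throws IOException {
        ZipArchiveOutputStream zip = new ZipArchiveOutputStream(new File("utf8-names.zip"));
        zip.setEncoding("UTF-8"); // entry names will be written as UTF-8
        zip.putArchiveEntry(new ZipArchiveEntry("abc-åäö.txt"));
        zip.closeArchiveEntry();
        zip.close();
    }
}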
From a quick look at the TrueZIP manual - they recommend the JAR format:
It uses UTF-8 for file name encoding and comments - unlike ZIP, which only uses IBM437.
This probably means that the API is using the java.util.zip package for its implementation; that documentation states that it is still using a ZIP format from 1996. Unicode support wasn't added to the PKWARE .ZIP File Format Specification until 2006.
Did it actually fail, or was it just a font issue (e.g. the font having different glyphs for those char codes)? I've seen similar issues in Windows where rendering "broke" because the font didn't support the charset, but the data was actually intact and correct.
Non-ASCII file names are not reliable across ZIP implementations and are best avoided. There is no provision for storing a charset setting in ZIP files; clients tend to guess with 'the current system codepage', which is unlikely to be what you want. Many combinations of client and codepage can result in inaccessible files.
Sorry!