PLEASE help me to understand what is going on here:
my code:
import java.io.File;
public class Main {
public static void main(String[] args) {
String name = "d:\\downloads\\testfile.mp3";
File file1 = new File(name);
System.out.println(file1.getAbsolutePath());
File file = new File("d:\\downloads\\testfile.mp3");
System.out.println(file.getAbsolutePath());
}
}
The output:
J:\Louw\Programming\PathTest\d:\downloads\testfile.mp3
d:\downloads\testfile.mp3
Question:
Why would the String variable produce a different absolute path than passing the string literal directly to the new File object? (Obviously the first path also causes a "FileNotFound" exception if I try to use it later.)
my Eclipse java development environment is:
Eclipse Java EE IDE for Web Developers.
Version: Neon.2 Release (4.6.2)
Build id: 20161208-0600
Please assist.
I am not 100% sure that this is the correct explanation, but it is consistent with what you see, so I believe it is worth checking.
When I copy your code into my Eclipse, your string name begins with a character with Unicode value 8234 (202A hexadecimal). This character is not printed, so the two strings look the same, but they are not. The character is not in the string that you pass when constructing the second File object. On fileformat.info the character is called “left-to-right embedding”; I don’t know what this means.
It would make sense that such a character in front of d:\\ would cause Java not to recognize the string as an absolute path name and therefore take it as a relative one, relative to your working directory.
It remains to be determined whether that character is in your source file too or only has crept in on Stack Overflow or in my copy-paste operation.
If the 8234 is indeed the culprit: in my Eclipse I can delete it with backspace as any other character, and everything works as expected. Failing that, you can always delete a sequence of characters containing at least the " before and the d after and type them again.
Where that char may come from, I have no good idea. It sounds unlikely that you should have typed Alt-202A on your keyboard without knowing you had done so.
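One way to confirm the theory without relying on the editor is to scan the string for invisible characters. The following is only a minimal sketch: the class and method names are made up for illustration, and it assumes that reporting characters in the Unicode "format" and "control" categories is enough to catch this kind of problem (U+202A is a format character).

```java
class InvisibleCharFinder {
    // Lists every format or control character in s with its index and code point.
    static String describeInvisible(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            int type = Character.getType(cp);
            if (type == Character.FORMAT || type == Character.CONTROL) {
                sb.append(String.format("index %d: U+%04X%n", i, cp));
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The same path as in the question, with U+202A written as an explicit escape.
        String suspicious = "\u202Ad:\\downloads\\testfile.mp3";
        System.out.print(describeInvisible(suspicious)); // prints "index 0: U+202A"
    }
}
```

If the method returns an empty string for the literal in your own source file, the stray character most likely crept in during copy and paste rather than being in your code.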
Your code is fine, and is doing what you expect. It's printing d:\downloads\testfile.mp3 twice.
Either something in how you execute your program is printing J:\Louw\Programming\PathTest\ with no newline to stdout before running your program, or you're seeing the system prompt and interpreting it as output.
You have a strange non-printable Unicode character at the start of your name String. The second instance of the string looks the same, but doesn't include that character. Paste the second string over the top of your first string and the problem will go away.
I have been attempting to do a HelloWorld for an online Java class I started, but I have run into a weird error. I did look around on this website, and while I did see some similar instances of this error, none of them had the exact same issue that I'm having.
So for my class, I entered the code below (as instructed in the video, copied every letter, bracket, and symbol) into Notepad and saved it, set up javac, and attempted to run a javac.
public class HelloWorld {
public static void main(String[] args) {
System.out.println("Hello world!");
}
}
Now when I run the javac, it gives me this error.
(The \ shows up like a weird W-like symbol on the command prompt for me, but it still functions properly).
The little arrow should point to the error in the text file, but the problem is it shows up as a Chinese character for some reason. I haven't been able to figure out what causes this, but my guess is it's something beyond the scope of this text document. However, my system doesn't use Chinese, and the system locale isn't in Chinese either, so I have no idea what it could be. I don't think it can be any of the brackets, as they look accurate to me, unless there's something I completely missed. Any help would be greatly appreciated.
It looks like your program file has something in it that is being interpreted as a multi-byte character. The character in error seems to be right before the "p" in public which is why the compiler is giving the error message. It is expecting a keyword and getting a Chinese character.
What editor did you use? I think the real problem is something about how your editor is set up. This error message is just a symptom. Another possibility is that your system is set up for interpreting certain byte sequences as Chinese. That would explain "\ " being interpreted as a character and something starting with a curly brace and going through "p" being seen as a sequence of them.
Windows Notepad is the worst text editor: it will change your line breaks, and it will insert a UTF-8 BOM into the text file when you save it as UTF-8.
In your case, that's why your Java file can't compile.
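If you want to check whether a source file really starts with a UTF-8 BOM (the bytes EF BB BF), something along these lines should do. It is a sketch only: the class and method names are invented for illustration, and it assumes the file content has already been read into a byte array, for example with Files.readAllBytes.

```java
import java.nio.charset.StandardCharsets;

class BomStripper {
    // True if the data starts with the UTF-8 byte order mark EF BB BF.
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    // Decodes the data as UTF-8, skipping the BOM if one is present.
    static String decodeWithoutBom(byte[] data) {
        int offset = hasUtf8Bom(data) ? 3 : 0;
        return new String(data, offset, data.length - offset, StandardCharsets.UTF_8);
    }
}
```

If hasUtf8Bom returns false for your file, the BOM theory does not apply and the file is probably saved in some other encoding altogether.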
There's no one fixed ANSI encoding - there are lots of them. Usually when people say "ANSI" they mean "the default locale/codepage for my system" which is obtained via Encoding.Default, and is often Windows-1252 but can be other locales.
I've run into what I thought was unusual behavior when working with some File objects.
import java.io.File;
public class MyClass
{
public static void main(String[] args)
{
File file = new File("C:\\x..");
System.out.println(file.isDirectory());
System.out.println(file.listFiles());
}
}
Assuming that some directory C:\x exists, file.isDirectory() will return true with the added two dots at the end of the path. This replicates the behavior in the command line, where cd x.. will change the directory to x.
However, when calling file.listFiles(), the method returns null, which is only supposed to happen if the file is not a directory. This seems to be inconsistent with the definition of the listFiles().
Why is this so? And why does having two dots at the end of the path go to the same directory as if there were no dots?
This problem seems to be exclusive to Windows. Linux correctly (?) returns false for isDirectory().
Windows trims trailing dots from path and filenames. I am unable to find a concrete reference for this, it's just one of those mysterious things that has always been that way.
It trims the trailing dots from the full pathname, not the individual components.
Therefore, while "C:\x...." is the same as "C:\x", "C:\x....\filename" is not the same as "C:\x\filename", as the latter does not have trailing dots.
You would have to look at the JDK's native FileSystem source on Windows to see precisely how it is obtaining a list of files, but I suspect it's doing some kind of search on e.g. "C:\x..\*.*" using Windows' FindFirstFile API call or something, where the dots are no longer trailing. In other words, presuming "C:\x" is a directory, while the path "C:\x.." is a directory, "C:\x..\*.*" matches nothing, and "C:\x..\subdirectory" does not map to "C:\x\subdirectory".
You should avoid using such paths. I'm not sure how you obtained that path string in the first place.
You can use File.getCanonicalPath() or File.getCanonicalFile() to convert it back to a more useable pathname.
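A minimal sketch of that suggestion follows. The helper name is hypothetical, and what the canonical form of "C:\x.." looks like depends on the platform and JDK, so no output is shown here.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

class CanonicalPathDemo {
    // Returns the canonical form of the given path, wrapping the checked exception.
    static String canonical(String path) {
        try {
            return new File(path).getCanonicalPath();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // On Windows this is expected to normalize the odd trailing dots;
        // on other systems the name is just an ordinary relative file name.
        System.out.println(canonical("C:\\x.."));
    }
}
```

Calling listFiles() on new File(canonical(path)) instead of on the original File is one way to try this out.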
By the way, if you want to have some fun with Windows, type the following in a command prompt (assuming "c:\windows\temp" exists, otherwise replace with some other path):
echo > \\?\c:\windows\temp\x.
The \\?\ prefix in Windows disables filename processing and expansion. Now delete the resulting file. Try it in Explorer too.
Spoiler:
You'll have to delete it with a wildcard from the console, e.g. del x?
You forgot to add extra slashes before the two dots, so it should be c:\\x\\.. and this will indeed point you to C:
Can anyone tell me how to cope with illegal file names in java? When I run the following on Windows:
File badname = new File("C:\\Temp\\a:b");
System.out.println(badname.getAbsolutePath()+" length="+badname.length());
FileWriter w = new FileWriter(badname);
w.write("hello world");
w.close();
System.out.println(badname.getAbsolutePath()+" length="+badname.length());
The output shows that the file has been created and has the expected length, but in C:\Temp all I can see is a file called "a" with 0 length. Where is java putting the file?
What I'm looking for is a reliable way to throw an error when the file can't be created. I can't use exists() or length() - what other options are there?
In that particular example, the data is being written to a named stream. You can see the data you've written from the command line as follows:
more < .\a:b
For information about valid file names, look here.
To answer your specific question: exists() should be sufficient. Even in this case, after all, the data is being written to the designated location - it just wasn't where you expected it to be! If you think this case will cause problems for your users, check for the presence of a colon in the file name.
I would suggest looking at Regular Expressions. They allow you to break apart a string and see if certain characteristics apply. The other method that would work is splitting the String into a char[], and then processing each point to see what's in it, and if it's legal... but I think RegEx would work much better.
You should take a look at Regular Expressions and create a pattern which will match any illegal character, something like this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String fileName = "...";
Pattern pattern = Pattern.compile("[:;!?]");
// Pattern has no match() method; matcher() creates the Matcher for the input.
Matcher matcher = pattern.matcher(fileName);
if (matcher.find())
{
//Do something when the file name has an illegal character.
}
Note: I have not tested this code, but it should be enough to get you on the right track. The above code will match any string which contains a ':', ';', '!' or '?'. Feel free to add/remove characters as you see fit.
You can use File.renameTo(File dest);.
Get the file name first:
String fileName = fullPath.substring(fullPath.lastIndexOf('\\') + 1);
Create an array of all special chars not allowed in file names.
For each char in the array, check whether fileName contains it. I guess Java has a pre-built API for it.
Check this.
Note: This solution assumes that parent directory exists
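The array-based check described above could look roughly like this. It is a sketch under assumptions: the class and method names are made up, and the reserved set is the usual Windows list (< > : " / \ | ? * plus control characters); other file systems have different rules.

```java
class FileNameChecker {
    // Characters reserved in Windows file names (assumption: Windows rules apply).
    private static final String RESERVED = "<>:\"/\\|?*";

    // Returns the part after the last backslash or forward slash.
    static String lastComponent(String fullPath) {
        int cut = Math.max(fullPath.lastIndexOf('\\'), fullPath.lastIndexOf('/'));
        return fullPath.substring(cut + 1);
    }

    // True if the name contains a reserved or control character.
    static boolean hasReservedChar(String fileName) {
        for (int i = 0; i < fileName.length(); i++) {
            char c = fileName.charAt(i);
            if (c < 0x20 || RESERVED.indexOf(c) >= 0) {
                return true;
            }
        }
        return false;
    }
}
```

Checking only the last component matters here: the colon after the drive letter in C:\Temp\a:b is legal, while the colon inside a:b is the one that selects a named stream.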
I have to read the names of some files and put them in a list as strings. It's not so hard, I just have some problems with characters like ä, ö, ü ...: they always show up as '?' in my string.
What's the problem? Well, the encoding. OK, this should be easy... that's what I thought. So I tried to use functions like:
new String(insert.getBytes("UTF-8"))
or
new String(insert.getBytes("ISO-8859-1"), "UTF-8")
because most of the files are ISO-8859-1.
It's not helping. This is my code:
...
File[] fileList = dir.listFiles();
String insert;
for(File f : fileList) {
...
insert=f.getName().substring(0,f.getName().length()-4);
insert=insert.charAt(0)+insert.substring(1,insert.length()).toLowerCase().replaceFirst("([0-9]*(_s?(i)?(_dat)?)*$)", "").replaceFirst("_", " ");
...
System.out.println("test UTF8: " + new String(insert.getBytes("UTF-8"))); //not helping
System.out.println("test ISO , UTF8: " + new String(insert.getBytes("ISO-8859-1"), "UTF-8")); //not helping
...
names.add(insert);
}
At the end there are a lot of strings with '?' characters in my list.
How can I fix the problem? And what's the best way if there are not only ISO-8859-1 files? (Let's say there are a lot of files with unknown encodings.)
Thank You!
Given the extended comments back and forth under the question, it now looks like this is either a font problem or (perhaps more likely) a filename encoding problem.
I asked Lissy to run the following commands to let us figure out what the problem is. If she is sure that the filenames contain "ä", but that character does not appear when she lists the files with ls, then these commands will tell us whether this is a font or an encoding problem.
touch filenäme
ls filen*me
If this shows "filenäme" in the output of ls then we know the problem is with the creation/copy of the files onto this system. This could happen if the program which created the files didn't realize what the filesystem encoding was or was too stupid to do the right thing. The convmv program will probably be the best way to fix this.
convmv -f ENCODING -t utf8 -r .
The question is what is the proper encoding. Possibilities include UTF-16, cp850, or perhaps iso8859-1. convmv --list will show you the list of currently known (to your system) encodings. Since the listed command above only shows you what it might do, it is safe to run several times with different encodings until you find one which works for all files.
If this is a font problem, we'll have to look into that
Unexpected question marks, splats, etc. in a String are a sign that something somewhere doesn't recognize a particular character when converting from one character set to another.
In your case, the problem could be occurring in a couple of places:
It could be occurring when your Java program is reading the file names from the directory (in the dir.listFiles() call).
It could be happening when you print the characters to the console stream.
In either case, the root cause is most likely a mismatch between what Java thinks the locale settings should be and the settings that the operating system and/or command shell are using.
As an experiment, try to list a directory containing the problematic file names from the command line. Do you see question marks or other splats there?
A second experiment to perform is to modify your Java program to dump one of the problem Strings as a sequence of numbers representing the character codes for each of the characters. Do you see the character code for an ASCII / Unicode '?' (63, or 3F hexadecimal)?
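The second experiment can be sketched as follows; the class and method names are invented for illustration. A literal question mark shows up as 003F, whereas an intact "ä" shows up as 00E4.

```java
class CharCodeDump {
    // Returns the UTF-16 code units of s as space-separated 4-digit hex numbers.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump("a?"));     // prints "0061 003F"
        System.out.println(dump("\u00e4")); // prints "00E4"
    }
}
```

If the dump of a file name already contains 003F where you expected an umlaut, the damage happened while reading the directory; if it contains 00E4, only the console output is at fault.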
The encoding of the content of a file has nothing to do with the encoding of the file name itself.
You should get correct results from System.out.println(insert)
If you don't, it means that the shell has a different character encoding than the default character encoding for your system (this rarely happens; it would usually be the result of an explicit command to switch encodings in the shell).
If the file names are displayed correctly when you list the directory in the shell, I would expect them to be displayed correctly without specifying an encoding in your Java program.
If the shell is incapable of displaying the character (it is substituting the replacement character 0xFFFD (�) for these unprintable characters), there's nothing you can do from your Java application to change that. You need to change the terminal character encoding, install the right fonts, etc.; that is an operating system issue, not a Java issue.
At the same time, even if your terminal can't display the correct results, the Java program should be handling the character encodings correctly without your intervention.
The library behind the File API is figuring out the correct character encoding for your system and doing the necessary decoding into characters. Likewise, the database driver should negotiate with the database to determine the correct encoding, and do any necessary encoding into bytes on behalf of your application.
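To see why re-encoding a String that Java has already decoded does not help, consider this small sketch. The class and method names are hypothetical, and it uses explicit charsets on both sides, which is a slight simplification of the calls in the question.

```java
import java.nio.charset.StandardCharsets;

class ReEncodeDemo {
    // Encoding and decoding with the same charset just gives the original back.
    static String utf8RoundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    // Encoding as ISO-8859-1 and decoding as UTF-8 damages non-ASCII characters.
    static String latin1AsUtf8(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "\u00e4"; // the letter a with umlaut, written as an escape
        System.out.println(utf8RoundTrip(name).equals(name)); // prints "true"
        System.out.println(latin1AsUtf8(name).equals(name));  // prints "false"
    }
}
```

In other words, the first conversion is a no-op and the second can only make things worse, so neither can repair a name that was already correct (or already broken) inside the String.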
In a comment you wrote:
#mdrg: well, theres a Problem. I have to read the name of the files and then put them into a database. And there are a lot of '?' , that shouldnt be... – Lissy 27 mins ago
My guess is that the column you're inserting the filenames into specifies US-ASCII as the encoding and replaces characters outside that range with a replacement character, which in your case is the question mark.
So you have to find out the encoding for the column in your database table where you store the filenames. Various products have various syntaxes for retrieving that information.
In Java 1.6 you can use System.console() instead of System.out to display accented characters on the console.
public class Test {
public static void main(String args[]){
String s = "caractères français : à é \u00e9"; // Unicode for "é"
System.console().writer().println(s);
}
}
and the output is
C:\temp>java Test
caractères français : à é é
I am using the Runtime.getRuntime().exec() method to invoke an exe. The problem I face with this method is that when I pass an exe path (c:\JPN_char_folder\mypath\myexe.exe) containing characters from another language (e.g. Japanese), it says "System cannot find the file specified". Would you please suggest some ideas to get around this? I even tried passing that exe path after converting it to UTF-8 as well, but I still could not solve this.
-Robert.
I don't think that Japanese characters are the issue; it's the c: drive.
You need to write it this way:
String path = "c:\\JPN_char_folder\\mypath\\myexe.exe";
See if that helps.
Most probably you have an encoding problem somewhere.
There are several steps here that the path value takes:
InstallAnywhere retrieves the path
InstallAnywhere puts it into a variable
Java reads the variable
Java puts it into a String
Java creates a java.io.File instance from String
Java runtime passes path (via File) to OS
Somewhere along this sequence something goes wrong with the path :-(.
It's hard to tell where; your best bet probably is to try and print out the value at every step along the path, to see where it goes wrong.
At least from inside Java, you should probably print out the String both as text, and as a list of Unicode code points (using String.codePointAt). That way you can see the real data Java uses.
Another approach:
Print out the value Java gets from InstallAnywhere (as text & as codepoints, as above)
Try to put the path into your Java program as a String literal, and fiddle until you can open the file that way. Then print that String as well.
Now you can compare the two results; that should give you an idea where the path gets messed up.
Note:
Does the path contain characters outside the Basic Multilingual Plane (BMP)? Java handles these a bit awkwardly, so you need to pay extra attention. Maybe you can check this first.
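A quick way to check the BMP question is to compare the number of code points with the number of chars. This is a sketch with a made-up class name; it relies on the fact that characters outside the BMP occupy two chars (a surrogate pair) in a Java String.

```java
class BmpCheck {
    // True if s contains at least one character outside the Basic Multilingual Plane.
    static boolean hasSupplementaryChars(String s) {
        return s.codePointCount(0, s.length()) != s.length();
    }

    public static void main(String[] args) {
        System.out.println(hasSupplementaryChars("\u65E5\u672C"));  // prints "false"
        System.out.println(hasSupplementaryChars("a\uD83D\uDE00")); // prints "true"
    }
}
```

Most everyday Japanese text lies inside the BMP, so a false result here would point back to an encoding problem somewhere earlier in the chain.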
Even if you're using Windows, you can use slashes when specifying directories. This will help you with escaping backslash hell.
For example, on my system, 7z is located in directory c:\Program Files\7-Zip\.
Executing this
File file = new File("c:/Program Files/7-Zip/7z.exe");
if(file.exists()) {
System.out.println(file.getAbsolutePath());
}
Results in
c:\Program Files\7-Zip\7z.exe
being printed on the console.
I'd suggest you try using this idiom, i.e. check if the .exe file exists before trying to execute it.
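Put together, the check-then-run idiom might look like the sketch below. The class and method names are hypothetical, the 7-Zip path is just the example path from this answer, and ProcessBuilder is used here instead of Runtime.exec() as a swapped-in alternative.

```java
import java.io.File;

class ExeCheck {
    // True if the path points to an existing regular file that may be executed.
    static boolean looksRunnable(String path) {
        File exe = new File(path);
        return exe.isFile() && exe.canExecute();
    }

    public static void main(String[] args) throws Exception {
        // Forward slashes work on Windows too, so no backslash escaping is needed.
        String path = args.length > 0 ? args[0] : "c:/Program Files/7-Zip/7z.exe";
        if (looksRunnable(path)) {
            new ProcessBuilder(path).inheritIO().start().waitFor();
        } else {
            System.out.println("Not found or not executable: " + path);
        }
    }
}
```

If looksRunnable returns false for a path that you can see in Explorer, the String that reaches Java is probably not the path you think it is, which again suggests an encoding problem upstream.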