Here I have the following bit of code, taken from this Oracle Java tutorial:
// Defaults to READ
try (SeekableByteChannel sbc = Files.newByteChannel(file)) {
    ByteBuffer buf = ByteBuffer.allocate(10);
    // Read the bytes with the proper encoding for this platform. If
    // you skip this step, you might see something that looks like
    // Chinese characters when you expect Latin-style characters.
    String encoding = System.getProperty("file.encoding");
    while (sbc.read(buf) > 0) {
        buf.rewind();
        System.out.print(Charset.forName(encoding).decode(buf));
        buf.flip(); // LINE X
    }
} catch (IOException x) {
    System.out.println("caught exception: " + x);
}
So basically I do not get any output from it.
I put some flags in the while loop to check whether it is entered, and it is. I also changed the decoding to Charset.defaultCharset().decode(buf); result: no output.
Of course there is text in the file passed to newByteChannel(file).
Any idea?
Thanks a lot in advance.
EDIT: Solved. It was just the file I was trying to access, which had previously been accidentally corrupted. After changing the file, everything works.
The code looks wrong. Try changing the rewind() to flip() (to prepare the buffer for reading after the channel has written into it), and the flip() to compact() (to preserve any undecoded bytes and prepare the buffer for the next read).
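A minimal sketch of the corrected loop, with the flip()/compact() swap applied (the class and method names here are made up for illustration, and the file is assumed to be UTF-8; note that Charset.decode replaces a multi-byte character split across reads, so fully general decoding needs a streaming CharsetDecoder):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ChannelRead {
    // Read a whole file through a small buffer, decoding as UTF-8.
    static String read(Path file) throws IOException {
        StringBuilder out = new StringBuilder();
        try (SeekableByteChannel sbc = Files.newByteChannel(file)) {
            ByteBuffer buf = ByteBuffer.allocate(10);
            while (sbc.read(buf) > 0) {
                buf.flip();                              // switch from writing into the buffer to reading from it
                out.append(StandardCharsets.UTF_8.decode(buf));
                buf.compact();                           // switch back to writing, ready for the next read
            }
        }
        return out.toString();
    }
}
```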
I am trying to download a web page with all its resources. First I download the HTML, and to keep the file formatted I use the function below.
There is an issue: I find the number 10 in the final file, which turns out to be the decimal code of LF (the line-feed character). This breaks my JavaScript functions.
Example of the final result:
<!DOCTYPE html>10<html lang="fr">10 <head>10 <meta http-equiv="content-type" content="text/html; charset=UTF-8" />10
Can someone help me find the real issue?
public static String scanfile(File file) {
    StringBuilder sb = new StringBuilder();
    try {
        BufferedReader bufferedReader = new BufferedReader(new FileReader(file));
        while (true) {
            String readLine = bufferedReader.readLine();
            if (readLine != null) {
                sb.append(readLine);
                sb.append(System.lineSeparator());
                Log.i(TAG, sb.toString());
            } else {
                bufferedReader.close();
                return sb.toString();
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
        return null;
    }
}
There are multiple problems with your code.
Charset error
BufferedReader bufferedReader = new BufferedReader(new FileReader(file));
This will fail in subtle ways.
Files (and, for that matter, data given to you by webservers) come as bytes: a stream of numbers, each between 0 and 255.
So, if you are a webserver and you want to send the character ö, what byte(s) do you send?
The answer is complicated. The mapping that explains how some character is rendered in byte(s)-form is called a character set encoding (shortened to 'charset').
Anytime bytes are turned into characters or vice versa, there is always a charset involved. Always.
So, you're reading a file (that'd be bytes), and turning it into a Reader (which is chars). Thus, charset is involved.
Which charset? The API of new FileReader(path) explains which one: "The system default". You do not want that.
Thus, this code is broken. You want one of two things:
Option 1 - write the data as is
When doing the job of querying the webserver for the data and relaying this information onto disk, you'd want to just store the bytes (after all, webserver gives bytes, and disks store bytes, that's easy), but the webserver also sends the encoding, in a header, and you need to save this separately. Because to read that 'sack of bytes', you need to know the charset to turn it into characters.
How would you do this? Well, up to you. You could for example decree that the data file starts with the name of a charset encoding (as sent via that header), then a 0 byte, and then the data, unmodified. I think you should go with option 2, however
Option 2
Another, better option for text-based documents (which HTML is), is this: When reading the data, convert it to characters, using the encoding as that header tells you. Then, to save it to disk, turn the chars back to bytes, using UTF-8, which is a great encoding and an industry standard. That way, when reading, you just know it's UTF-8, period.
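As a sketch of option 2, the read-with-declared-charset, write-as-UTF-8 round trip might look like this (the class and method names are mine, not from the question):

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Transcode {
    // Decode the body using the charset the server declared in its header,
    // then store the text on disk as UTF-8, so readers never have to guess.
    static void saveAsUtf8(byte[] body, String serverCharset, Path target) throws IOException {
        String text = new String(body, Charset.forName(serverCharset));
        Files.writeString(target, text, StandardCharsets.UTF_8);
    }
}
```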
To read a UTF-8 text file, you do:
Files.newBufferedReader(file.toPath());
The reason this works is that the Files API, unlike most other APIs (and unlike FileReader, which you should never ever use), defaults to UTF-8 and not to the platform default. If you want, you can make it more readable:
Files.newBufferedReader(file.toPath(), StandardCharsets.UTF_8);
same thing - but now in the code it is clear what's happening.
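To see why the charset matters, decode the same two bytes with two different charsets:

```java
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    public static void main(String[] args) {
        byte[] utf8 = "ö".getBytes(StandardCharsets.UTF_8); // two bytes: 0xC3 0xB6
        // Decoding UTF-8 bytes with the wrong charset produces mojibake:
        System.out.println(new String(utf8, StandardCharsets.ISO_8859_1)); // prints "Ã¶"
        System.out.println(new String(utf8, StandardCharsets.UTF_8));      // prints "ö"
    }
}
```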
Broken exception handling
} catch (IOException e) {
e.printStackTrace();
return null;
}
This is not okay: if you catch an exception, either [A] throw something else, or [B] handle the problem. 'Log it and keep going' is definitely not handling it. This strategy of exception handling means one error produces a thousand follow-on failures and a thousand stack traces, and all of them except the first are undesired and irrelevant. Never write it this way.
The easy solution is to just put throws IOException on your scanFile method. The method inherently interacts with files; it SHOULD be throwing that. Note that your public static void main(String[] args) method can, and usually should, be declared to throws Exception.
It also makes your code simpler and shorter, yay!
Resource Management failure
A FileReader is a resource. You MUST close it, no matter what happens. You are not doing that: if .readLine() throws an exception, your code jumps to the catch handler and bufferedReader.close() is never executed.
The solution is to use the ARM (Automatic Resource Management) construct:
try (var br = Files.newBufferedReader(file.toPath(), StandardCharsets.UTF_8)) {
    // code goes here
}
This construct ensures that close() is invoked, regardless of how the 'code goes here' block exits. Even if it 'exits' via an exception or a return statement.
The problem
Aside from the three items above, your 'read a file and print it' code is mostly fine. The problem is that the HTML file on disk is corrupted; the error lies in the code that reads the data from the web server and saves it to disk, which you did not paste.
Specifically, System.lineSeparator() returns the separator as an actual string (such as "\n"), not its numeric code. Thus, assuming the code you pasted really is the code you are running, if you are seeing a literal '10' show up, the HTML file on disk already has it in there. It's not the read code.
Closing thoughts
More generally the job of 'just print a file on disk with a known encoding' can be done in far fewer lines of code:
public static String scanFile(String path) throws IOException {
    return Files.readString(Paths.get(path));
}
You should just use the above code instead. It's simple, short, doesn't have any bugs, cannot leak resources, has proper exception handling, and will use UTF-8.
Actually, there is no problem in this function. I was mistakenly adding the 10 in another function in my code.
I need to read text from a file and, for instance, print it to the console. The file is in UTF-8. It seems that I'm doing something wrong, because some Russian characters are printed incorrectly. What's wrong with my code?
StringBuilder content = new StringBuilder();
try (FileChannel fChan = (FileChannel) Files.newByteChannel(Paths.get("D:/test.txt"))) {
    ByteBuffer byteBuf = ByteBuffer.allocate(16);
    Charset charset = Charset.forName("UTF-8");
    while (fChan.read(byteBuf) != -1) {
        byteBuf.flip();
        content.append(new String(byteBuf.array(), charset));
        byteBuf.clear();
    }
    System.out.println(content);
}
The result:
Здравствуйте, как поживае��е?
Это п��имер текста на русском яз��ке.ом яз�
The actual text:
Здравствуйте, как поживаете?
Это пример текста на русском языке.
UTF-8 uses a variable number of bytes per character, which gives you a boundary error: you have mixed buffer-based code with byte-array-based code, and you can't do that here. It is possible to read enough bytes to be stuck halfway into a character; you then turn your input into a byte array and convert it, which fails, because you can't convert half a character. (Also, byteBuf.array() returns the whole backing array and ignores how many bytes were actually read, which is why the tail of a previous chunk reappears in your output.)
What you really want is either to first read ALL the data and then convert the entire input, or, to keep any half-characters in the bytebuffer when you flip back, or better yet, ditch all this stuff and use code that is written to read actual characters. In general, using the channel API complicates matters a ton; it's flexible, but complicated - that's how it goes.
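If you really do need the channel API, the 'keep half-characters in the buffer when you flip back' variant looks roughly like this: a streaming CharsetDecoder leaves incomplete byte sequences in the buffer, and compact() carries them into the next read (a sketch; the class and method names are made up):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class DecodeLoop {
    // Decode a file chunk-by-chunk; the decoder keeps partial characters
    // in the byte buffer across reads, so split multi-byte sequences survive.
    static String decode(Path file) throws IOException {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder();
        StringBuilder out = new StringBuilder();
        ByteBuffer bytes = ByteBuffer.allocate(16);
        CharBuffer chars = CharBuffer.allocate(16);
        try (ReadableByteChannel ch = Files.newByteChannel(file)) {
            boolean eof = false;
            while (!eof) {
                eof = ch.read(bytes) == -1;
                bytes.flip();
                dec.decode(bytes, chars, eof); // leaves incomplete sequences in 'bytes'
                chars.flip();
                out.append(chars);
                chars.clear();
                bytes.compact();               // carry leftover bytes into the next read
            }
            dec.flush(chars);
            chars.flip();
            out.append(chars);
        }
        return out.toString();
    }
}
```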
Unless you can explain why you need it, don't use it. Do this instead:
Path target = Paths.get("D:/test.txt");
try (var reader = Files.newBufferedReader(target)) {
    // read a line at a time here. Yes, it will be UTF-8 decoded.
}
or better yet, as you apparently want to read the whole thing in one go:
Path target = Paths.get("D:/test.txt");
var content = Files.readString(target);
NB: unlike most Java methods that convert bytes to chars or vice versa, the Files API defaults to UTF-8 (instead of the useless, untestable-bug-causing 'platform default encoding' that most Java APIs use). That's why this last, incredibly simple code is nevertheless correct.
I am currently saving an int[] from a HashMap to a file named after the key that maps to that int[]. This exact key must be reachable from another program, so I can't restrict the file names to English-only characters. But even though I use ISO_8859_1 as the charset for the filenames, the files get all messed up in the file tree: the English letters are correct, but the special ones are not.
/**
 * Save array to file
 */
public void saveStatus() {
    try {
        for (String currentKey : hmap.keySet()) {
            byte[] currentKeyByteArray = currentKey.getBytes();
            String bytesString = new String(currentKeyByteArray, StandardCharsets.ISO_8859_1);
            String fileLocation = "/var/tmp/" + bytesString + ".dat";
            FileOutputStream saveFile = new FileOutputStream(fileLocation);
            ObjectOutputStream out = new ObjectOutputStream(saveFile);
            out.writeObject(hmap.get(currentKey));
            out.close();
            saveFile.close();
            System.out.println("Saved file at " + fileLocation);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Could it have to do with how Linux is encoding characters, or is it more likely to do with the Java code?
EDIT
I think the problem lies with the OS, because when looking at the text files with cat, for example, the problem is the same. However, vim is able to decode the letters correctly. In that case, would I perhaps have to change the language settings of the terminal?
You have to pass the charset to the getBytes call as well; otherwise it uses the platform default, and decoding those bytes as ISO-8859-1 mangles anything non-ASCII.
currentKey.getBytes(StandardCharsets.ISO_8859_1);
Also, why are you using StandardCharsets.ISO_8859_1? To accept a wider range of characters, use StandardCharsets.UTF_8.
The valid characters of a filename or path vary depending on the file system used. While it should be possible to just use a Java string as a filename (as long as it does not contain characters invalid in the given file system), there might be interoperability issues and bugs.
In other words, leave out all the Charset magic, as @RealSkeptic recommends, and it should work. But changing the environment might result in unexpected behavior.
Depending on your requirements, you might therefore want to encode the key to make sure it only uses a reduced character set. One variant of Base64 might work (assuming your file system is case sensitive!). You might even find a library (Apache Commons?) offering a function to reduce a string to characters safe for use in a file name.
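No extra library is strictly needed for the Base64 variant: the JDK's URL-safe Base64 alphabet (A-Z, a-z, 0-9, '-', '_') is safe in filenames on common file systems, and the mapping is reversible, so the other program can recover the key (a sketch; the class and method names are made up):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class SafeName {
    // Encode an arbitrary key into filesystem-safe characters.
    static String toFileName(String key) {
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(key.getBytes(StandardCharsets.UTF_8));
    }

    // Recover the original key from the file name.
    static String toKey(String fileName) {
        return new String(Base64.getUrlDecoder().decode(fileName), StandardCharsets.UTF_8);
    }
}
```

Note the case-sensitivity caveat from the answer above still applies: Base64 distinguishes 'a' from 'A', so two keys can collide on a case-insensitive file system.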
The code below gets a byte array from an HTTP request and saves it in bytes[]; the final data will be saved in message[].
I check whether it contains a header by converting it to a String. If it does, I read some information from the header, then cut the header off by saving the bytes after it to message[].
I then try to output message[] to a file using FileOutputStream. It partly works, but only saves 10 KB of information, one iteration of the while loop (it seems to be overwriting). If I use FileOutputStream(file, true) to append, it works... once; the next time I run the program, the file is just added onto again, which isn't what I want. How do I write multiple chunks of bytes to the same file, one per iteration, but still overwrite the file completely when I run the program again?
byte bytes[] = new byte[(10 * 1024)];
while (dis.read(bytes) > 0) {
    // Set all the bytes to the message
    byte message[] = bytes;
    String string = new String(bytes, "UTF-8");
    // Does bytes contain header?
    if (string.contains("\r\n\r\n")) {
        String theByteString[] = string.split("\r\n\r\n");
        String theHeader = theByteString[0];
        String[] lmTemp = theHeader.split("Last-Modified: ");
        String[] lm = lmTemp[1].split("\r\n");
        String lastModified = lm[0];
        // Cut off the header and save the rest of the data after it
        message = theByteString[1].getBytes("UTF-8");
        // cache
        hm.put(url, lastModified);
    }
    // Output message[] to file.
    File f = new File(hostName + path);
    f.getParentFile().mkdirs();
    f.createNewFile();
    try (FileOutputStream fos = new FileOutputStream(f)) {
        fos.write(message);
    } catch (IOException ioe) {
        ioe.printStackTrace();
    }
}
You're opening a new FileOutputStream on each iteration of the loop. Don't do that. Open it outside the loop, then loop and write as you are doing, then close at the end of the loop. (If you use a try-with-resources statement with your while loop inside it, that'll be fine.)
That's only part of the problem though - you're also doing everything else on each iteration of the loop, including checking for headers. That's going to be a real problem if the byte array you read contains part of the set of headers, or indeed part of the header separator.
Additionally, as noted by EJP, you're ignoring the return value of read apart from using it to tell whether or not you're done. You should always use the return value of read to know how much of the byte array is actually usable data.
Fundamentally, you either need to read the whole response into a byte array to start with - which is easy to do, but potentially inefficient in memory - or accept the fact that you're dealing with a stream, and write more complex code to detect the end of the headers.
Better though, IMO, would be to use an HTTP library which already understands all this header processing, so that you don't need to do it yourself. Unless you're writing a low-level HTTP library yourself, you shouldn't be dealing with low-level HTTP details, you should rely on a good library.
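For example, the JDK's built-in java.net.http.HttpClient (Java 11+) parses the status line and headers for you, so the body handler receives only the payload and header lookup is a map query rather than string splitting (a sketch; the class and method names are mine):

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class Fetch {
    // Save the response body to a file; headers never end up in the payload.
    static String fetchTo(String url, Path target) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).build();
        HttpResponse<Path> response =
                client.send(request, HttpResponse.BodyHandlers.ofFile(target));
        // No manual split("Last-Modified: ") needed:
        return response.headers().firstValue("Last-Modified").orElse("unknown");
    }
}
```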
Open the file ahead of the loop.
NB you need to store the result of read() in a variable, and pass that variable to new String() as the length. Otherwise you are converting junk in the buffer beyond what was actually read.
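Both fixes together, the stream opened once outside the loop and the read count honoured on every write, can be sketched like this (the class and method names are made up):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class CopyLoop {
    // Copy everything from 'in' to 'out'. The caller opens 'out' exactly once,
    // so earlier chunks are not overwritten by later ones.
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[10 * 1024];
        int n;
        while ((n = in.read(buffer)) > 0) {
            out.write(buffer, 0, n); // only the first 'n' bytes are valid this iteration
        }
    }
}
```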
There is an issue with reading the data: you read only part of the response (because at that moment not all of the data had been transferred to you yet), so obviously you write only that part.
Check this answer for how to read the full data from the InputStream:
Convert InputStream to byte array in Java
Hello, I am currently working with sockets and input/output streams. I have a strange problem with the loop I use to send bytes. For some reason it gets stuck when it tries to read from the InputStream when it is supposed to stop. Does anyone have any idea what's wrong?
int bit;
final byte[] request = new byte[1024];
if (response instanceof InputStream) {
    while ((bit = response.read(request)) > 0) { // <-- Stuck here
        incoming.getOutputStream().write(request, 0, bit);
        incoming.getOutputStream().flush();
    }
}
incoming.close();
InputStream.read blocks until input data is available, end of file is detected, or an exception is thrown.
You don't catch the exception, and don't check for EOF.
What I've done in the past to leave each side open is to add a termination character to the end of each message, one you wouldn't expect to see in the message itself. If you are building the messages yourself, you could use a character such as ';' or maybe a double pipe '||'. Then just check for that character on the receiving end. It's a workaround, not a solution; it was necessary in my case, but may not be for yours.
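A sketch of that terminator approach on the receiving side (the class and method names are made up; this assumes the terminator byte never appears inside a message):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class DelimitedRead {
    // Read one message: everything up to (but not including) the terminator.
    // Returns what was read so far if the stream ends first.
    static byte[] readMessage(InputStream in, byte terminator) throws IOException {
        ByteArrayOutputStream msg = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != terminator) {
            msg.write(b);
        }
        return msg.toByteArray();
    }
}
```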