I have to write code to upload and download a file on a remote machine. But when I upload the file, newlines are not preserved, and some binary characters are inserted automatically. Also, I'm not able to save the file in its actual format; I have to save it as "filename.ser". I'm using Java's serialization/deserialization concept.
Thanks in advance.
How exactly are you transmitting the files? If you're using implementations of InputStream and OutputStream, they work on a byte-by-byte level so you should end up with a binary-equal output.
If you're using implementations of Reader and Writer, they convert the bytes to characters according to some character mapping, and then perform the reverse process when saving. Depending on the platform encodings of the various machines (and possibly other effects if you're not specifying the charset explicitly), you could well end up with differences in the binary file.
The fact that you mention newlines makes me think that you're using Readers to send strings (and possibly that you're stitching the strings back together yourself by manually adding newlines). If you want the files to be binary equal, then send them as a stream of bytes and store that stream verbatim. If you want them to be equal as strings in a given character set, then use Readers and Writers but specify the character set explicitly. If you want them to be transmitted as strings in the platform default set (not very useful), then accept that they're not going to be binary equal as files.
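A minimal sketch of the byte-for-byte approach described above (the class name and buffer size are just placeholder choices): copy the stream verbatim and no newline translation or charset conversion can occur.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ByteCopy {
    // Copy a file verbatim, byte for byte. No Reader/Writer and no
    // charset is involved, so the output is binary-identical to the
    // input -- newlines and "binary characters" pass through untouched.
    public static void copy(File src, File dst) throws IOException {
        try (InputStream in = new FileInputStream(src);
             OutputStream out = new FileOutputStream(dst)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }
}
```

The same pattern applies if the destination is a socket's OutputStream rather than a file.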
(Also, your question really doesn't provide much information to solve it. To me, it basically reads "I wrote some code to do X, and it doesn't work. Where did I go wrong?" You seem to assume that your code is correct by not listing it, but at the same time recognise that it's not...)
Related
I have implemented a program that transfers any text file using a UDP socket in Java. I am using PrintWriter to write and read. But with that I am not able to transfer any file other than text (say I want to transfer a PDF). What should be done in this case? I am using the code below to write the file.
Output_File_Write = new PrintWriter("dummy.txt");
Output_File_Write.print(new String(p.getData()));
Writers / PrintWriters are for writing text files. They take (Unicode-based) character data and encode it using the default character encoding (or a specified one), and write that to the file.
A PDF document (as you get it from the network) is in a binary format, so you need to use a FileOutputStream to write the file.
It is also a little bit concerning that you are attempting to transfer documents using UDP. UDP provides no guarantees that the datagrams sent will all arrive, or that they will arrive in the same order as they were sent. Unless you can always fit the entire document into a single datagram, you will have to do a significant amount of work to detect that datagrams have been dropped or have arrived in the wrong order ... and take remedial action.
Using TCP would be far simpler.
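As a sketch of the suggested fix (assuming the same DatagramPacket setup as in the question), the received bytes can be written through a FileOutputStream with no character decoding in between:

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.DatagramPacket;

public class PacketWriter {
    // Write the raw payload of a received datagram straight to a file.
    // Unlike PrintWriter, no byte-to-char conversion happens, so binary
    // formats such as PDF survive intact. Using getOffset()/getLength()
    // avoids writing the unused tail of the packet's backing buffer.
    public static void writePacket(DatagramPacket packet, FileOutputStream out)
            throws IOException {
        out.write(packet.getData(), packet.getOffset(), packet.getLength());
    }
}
```

Note this sketch does nothing about the dropped/reordered-datagram problem described above; that still argues for TCP.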
AFAIK PrintWriter is meant to be used with text. Quote from the docs:
Prints formatted representations of objects to a text-output stream. This class implements all of the print methods found in PrintStream. It does not contain methods for writing raw bytes, for which a program should use unencoded byte streams.
To send binary data you would need to use an API suited to it, for example PrintStream.
Pretty simple question: what's the performance difference between a Byte Stream and a Character Stream?
The reason I ask is because I'm implementing level loading from a file, and initially I decided I would just use a Byte Stream for the purpose, because it's the simplest type, and thus it should perform the best. But then I figured that it might be nice to be able to read and write the level files via a text editor instead of writing a more complex level editor (to start off with). In order for it to be legible by a text editor, I would need to use Character streams instead of Byte streams, so I'm wondering if there's really any performance difference worth mentioning between the two methods? At the moment it doesn't really matter much since level loading is infrequent, but I'd be interested to know for future reference, for instances where I might need to load levels from hard drive on the fly (large levels).
Pretty simple question: what's the performance difference between a Byte Stream and a Character Stream?
I assume you are comparing Input/OutputStream with Reader/Writer streams. If that is the case, the performance is almost the same. Unless you have a very fast drive, the bottleneck will almost certainly be the disk, in which case it doesn't matter much what you do in Java.
The reason I ask is because I'm implementing level loading from a file, and initially I decided I would just use a Byte Stream for the purpose, because it's the simplest type, and thus it should perform the best. But then I figured that it might be nice to be able to read and write the level files via a text editor instead of writing a more complex level editor (to start off with).
All files are actually a stream of bytes. So when you use Reader/Writer, it uses an encoder to convert bytes to chars and back again. There is nothing stopping you from reading and writing the bytes directly, which does exactly the same thing.
In order for it to be legible by a text editor, I would need to use Character streams instead of Byte streams,
You wouldn't, but it might make it easier. If you only want ASCII encoding, there is no difference. If you want UTF-8 encoding with non-ASCII characters using chars is likely to be simpler.
so I'm wondering if there's really any performance difference worth mentioning between the two methods?
I would worry about correctness first and performance second.
I might need to load levels from hard drive on the fly (large levels).
Java can read/write text at about 90 MB/s; most hard drives and networks are not that fast. However, if you need to write GBs per second and you have a fast SSD, it might make a difference. SSDs can sustain 500 MB/s or more, and in that case I would suggest you use NIO to maximise performance.
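A rough sketch of the NIO route mentioned above, reading through a FileChannel into a direct buffer (the class name and buffer size are arbitrary choices, not a tuned implementation):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class NioRead {
    // Drain a file through a FileChannel with a direct ByteBuffer,
    // returning the total byte count. Direct buffers let the OS copy
    // into the buffer without an intermediate Java heap array.
    public static long countBytes(Path path) throws IOException {
        long total = 0;
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024);
            int n;
            while ((n = ch.read(buf)) != -1) {
                total += n;
                buf.clear(); // reset position/limit for the next read
            }
        }
        return total;
    }
}
```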
Java has only one kind of stream: a byte stream. The classes java.io.InputStream and java.io.OutputStream are defined in terms of bytes.
To convert bytes to characters, and eventually Strings, you will always be using the functionality in java.nio.charset. However, for your convenience, Java provides the Reader and Writer classes, which adapt byte streams into stream-like objects that operate on characters and Strings.
There is a CPU time cost, of course, in conversion. However, the cost is very low. If you manage to write a program that has performance dominated by this cost, you've written a very lean program indeed.
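To illustrate the adapter idea, here is a minimal sketch (the class name is made up) that decodes a byte stream through a Reader with an explicit charset — which is essentially what InputStreamReader does over any InputStream:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class CharsetAdapter {
    // Adapt a byte stream into a character stream with an explicit
    // charset; the InputStreamReader does the byte-to-char conversion
    // using java.nio.charset machinery under the hood.
    public static String decode(byte[] bytes) throws IOException {
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        }
    }
}
```

Passing the charset explicitly, rather than relying on the platform default, is what makes the result reproducible across machines.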
I don't know Java, so take this with a pinch of salt.
A character stream typically means each thing you read is decoded into an individual character based on the current locale, which means it's important for internationalised text data which can't be represented with just 128 or 256 different choices. The set of all possible characters is defined in the Unicode system and how you get from individual bytes to characters is defined by the encoding. More information here: http://www.joelonsoftware.com/articles/Unicode.html
A byte stream on the other hand just reads in values from 0 to 255 and doesn't try and interpret them as characters from any particular language. As such, a byte stream should always be somewhat faster. But if you had international characters in there, they'll not display properly unless you know exactly how they were encoded.
For most purposes, human-readable data can be stored in ASCII, which only uses 7 bits of data per character and gives you 128 different characters. This will be readable by any typical text editor, and since ASCII characters are a subset of Unicode and of the UTF-8 encoding, you can read an ASCII file either as bytes or as UTF-8 characters, and the content will be unchanged.
If you ever need to store binary values for more efficient serialisation (eg. to store the number 123456789 as a 4 byte integer instead of as a 9 byte string) then you'll need to switch to a byte stream, but you also give up human-readability at this point so the issue becomes somewhat irrelevant.
It's unlikely that the size of the level will ever have much effect on your loading times - a typical hard drive can read well over a hundred megabytes per second. Code whichever way is easiest for you, and only optimise this later if your profiling shows there is a problem.
I am storing large amounts of information inside text files that are written via Java. I have two questions relating to this:
Is there any efficiency boost to writing in binary or bytecode over Strings?
What would I use to write the various data types into a file?
I already have a setup based around Strings, but I want to compare and at least know how to write the file in bytecode or binary.
When I read in the file, it will be translated into Strings again, but according to my research, if I write the file straight as bytes it removes the added translation step between Strings and bytes on both ends, both when writing the file and when reading it.
cHao has a good point about just using Strings anyway, but I am still interested in how to write varied data types into the file.
In other words, can I still use FileReader and BufferedReader to read and translate back into Strings, or is there something else to use? Also, for writing in binary, is it still just the FileWriter class that I use?
If you want to write it in "binary", and you want to save space, why not just zip it using the JDK? That meets all your requirements.
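A small sketch of that JDK-only approach, using the GZIP streams from java.util.zip (the class name is a placeholder; ZipOutputStream would work similarly if you need multiple entries):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class Gzip {
    // Compress a byte array with the JDK's built-in GZIP stream --
    // a compact binary format without designing a custom encoding.
    public static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    // Reverse the process, recovering the original bytes exactly.
    public static byte[] decompress(byte[] gzipped) throws IOException {
        try (GZIPInputStream gz =
                new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
            return bos.toByteArray();
        }
    }
}
```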
I want to write two simple utilities:
Receives a Binary file, and converts it to a text file (ASCII format).
Receives a text file in the format of the above file and restores the original binary file.
The reason I need this is very stupid, but it is still a reason. I have two computers: one with internet access and one without. I write software on the one without internet; I get emails on the second one. I need to transfer binary files from one to the other (e.g. jars), but the only communication channel between them is a clipboard (text only).
Might be a very localized problem - but I assume it has some solution in the worlds of data encryption/compression/network transfer.
The only thing I could come up with is to go over the binary file and convert each byte into its hex representation, so for every byte I'll get two ASCII characters (i.e. two bytes). Is there anything better? (This solution doubles the amount of data and might not be possible to transfer via the clipboard.)
One limitation: I need a Java-based solution (I want to write it myself).
Google for Base64, and use Apache Commons Codec for a ready-to-use implementation.
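For illustration, here is a minimal sketch using java.util.Base64 (built into the JDK since Java 8) instead of commons-codec; the class name is a placeholder. Base64 expands the data by only ~4/3 rather than the 2x of hex, and the MIME variant wraps lines so the text stays clipboard-friendly:

```java
import java.util.Base64;

public class ClipboardCodec {
    // Encode arbitrary bytes as pure ASCII text. The MIME encoder
    // inserts a line break every 76 characters, which plays nicely
    // with email and clipboards.
    public static String encode(byte[] binary) {
        return Base64.getMimeEncoder().encodeToString(binary);
    }

    // Recover the original bytes; the MIME decoder ignores the
    // line breaks added during encoding.
    public static byte[] decode(String text) {
        return Base64.getMimeDecoder().decode(text);
    }
}
```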
When reading zip files (using Java's ZipInputStream or any other library) from an unknown source, is there any way of detecting which entries are "character data" (and if so, the encoding) or "binary data"? And, if binary, is there any way of determining more information (MIME types, etc.)?
EDIT: does a byte order mark (BOM) occur in zip entries, and if so, do we have to handle it specially?
It basically boils down to heuristics for determining the contents of files. For instance, for text files (ASCII) it should be possible to make a fairly good guess by checking the range of byte values used in the file -- although this will never be completely fool-proof.
You should try to limit the classes of file types you want to identify, e.g. is it enough to discern between "text data" and "binary data" ? If so you should be able to get a fairly high success rate for detection.
For UNIX systems, there is always the file command which tries to identify file types based on (mostly) content.
Maybe implement a Java component that is capable of applying the rules defined in /usr/share/file/magic. I would love to have something like that. (You would basically have to be able to look at the first x couple of bytes.)
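The byte-range heuristic from the first answer could be sketched like this (the class name and the exact set of "acceptable" bytes are arbitrary assumptions, and as noted above it is never fool-proof):

```java
public class TextSniffer {
    // Sample the first bytes of an entry and treat it as ASCII text
    // only if every byte is printable ASCII or common whitespace.
    // A single null or high-bit byte flags the entry as binary.
    public static boolean looksLikeAscii(byte[] sample) {
        for (byte b : sample) {
            int v = b & 0xFF;
            boolean printable = v >= 0x20 && v <= 0x7E;
            boolean whitespace = v == '\n' || v == '\r' || v == '\t';
            if (!printable && !whitespace) {
                return false;
            }
        }
        return true;
    }
}
```

For more discrimination than text-vs-binary (MIME types, specific formats), matching magic-number prefixes as in /usr/share/file/magic is the natural extension.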