I am a beginner and I have a file with variable-sized records; there are two fields per row,
i.e. a 7-15 digit key, followed by a space, and then a string which is also variable-sized for each record.
I am trying to read only a page-size worth of bytes into my buffer and then process them.
The problem is that I use java.io.RandomAccessFile with the seek() method to reach a particular position, then use the readFully() method to read those 1024 bytes into my buffer. I have written functions to convert bytes to an int and bytes to a String, but the problem is that I don't know how many bytes form the 7-15 digit key and how many bytes form my string.
When you say a row, do you mean each row has a line separator in between? If that is the case, you can use something like BufferedReader's readLine() method. That gives you a String which is one line, without the line separator.
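For the key/value split, a minimal sketch of that approach (the file name records.txt is a placeholder, and it assumes each line really is "<key><space><string>" as described):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class RecordReader {
    public static void main(String[] args) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader("records.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                int space = line.indexOf(' ');
                // a 7-15 digit key always fits in a long (max ~19 digits)
                long key = Long.parseLong(line.substring(0, space));
                String value = line.substring(space + 1);
                // process key and value here
            }
        }
    }
}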
I have a use case where I have one S3 file. The file is not very large, but it can contain 10-50 million single-row records. I want to read a specific byte range, and I have read that we can use the Range header in S3 GetObject.
Like this:
final GetObjectRequest request = new GetObjectRequest(s3Bucket, key);
request.withRange(byteStartRange, byteEndRange);
return s3Client.getObject(request);
But I want to know: does the byte range always guarantee a complete line?
For example, my S3 file content is:
dhjdjdjdjdk
djdjjdfddkkd
dhdjjdjdjdd
cjjjdjdddd
......
If I specify the byte range to be some range X to Y, will it guarantee a full-line read, or can it return an incomplete line that falls within the byte range?
No, the Range will not guarantee a complete line.
It will return only the specific range of bytes requested. Amazon S3 has no insight into the contents of an object; it does not parse or recognize newline characters.
You will need to request a large enough range that it (hopefully) contains a complete line. Then your code would need to determine where the line ends and the next line begins.
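A sketch of that over-fetch-and-trim idea, using the AWS SDK for Java v1 calls from the question (the variable names and the println are placeholders, and the boundary handling is deliberately simplistic):

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.S3Object;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class RangeReader {
    static void readRange(AmazonS3 s3Client, String s3Bucket, String key,
                          long byteStartRange, long byteEndRange) throws IOException {
        GetObjectRequest request = new GetObjectRequest(s3Bucket, key)
                .withRange(byteStartRange, byteEndRange);
        try (S3Object object = s3Client.getObject(request);
             BufferedReader reader = new BufferedReader(new InputStreamReader(
                     object.getObjectContent(), StandardCharsets.UTF_8))) {
            if (byteStartRange > 0) {
                reader.readLine(); // discard the tail of a line begun before the range
            }
            String line;
            while ((line = reader.readLine()) != null) {
                // the final line may be cut off at byteEndRange; a real
                // implementation would carry it over to the next range
                System.out.println(line);
            }
        }
    }
}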
I am trying to convert COMP-3 and EBCDIC characters in my Java code, but I'm running into an out-of-memory exception because the amount of data handled is huge, about 5 GB. My code is currently as follows:
byte[] data = Files.readAllBytes(path);
This results in an out-of-memory exception, which I can understand, but I can't use a file scanner either, since the data in the file won't be split into lines.
Can anyone point me in the correct direction on how to handle this?
Note: the file may contain records of different lengths, hence splitting it based on record length seems not possible.
As Bill said, you could (should) ask for the data to be converted to display characters on the mainframe, and if English-speaking you can do an ASCII transfer.
Also, how are you deciding where the COMP-3 fields start?
You do not have to read the whole file into memory; you can still read the file in blocks. This method will fill an array of bytes:
/**
 * Fills buf from the stream, looping because a single read() call may
 * return fewer bytes than requested. Returns the number of bytes read
 * (possibly fewer than buf.length at end of file), or -1 if the stream
 * was already at end of file.
 */
protected final int readBuffer(InputStream in, final byte[] buf)
        throws IOException {
    int total = 0;
    int num = in.read(buf, 0, buf.length);
    while (num >= 0 && total + num < buf.length) {
        total += num;
        num = in.read(buf, total, buf.length - total);
    }
    if (num > 0) {
        total += num;
    }
    return (total == 0 && num < 0) ? -1 : total;
}
If all the records are the same length, create an array of the record length, and the above method will read one record at a time, as in the sketch below.
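A minimal driver loop might look like this (recordLength and processRecord are placeholders for your actual record length and conversion logic):

try (InputStream in = new BufferedInputStream(Files.newInputStream(path))) {
    byte[] record = new byte[recordLength]; // placeholder fixed length
    int n;
    while ((n = readBuffer(in, record)) >= 0) {
        // n may be smaller than recordLength for a truncated final record
        processRecord(record, n); // your COMP-3 / EBCDIC conversion here
    }
}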
Finally, the JRecord project has classes to read fixed-length files etc., and it can do COMP-3 conversion. Note: I am the author of JRecord.
I'm running into an out-of-memory exception as the amount of data handled is huge, about 5 GB.
You only need to read one record at a time.
My code is currently as follows:
byte[] data = Files.readAllBytes(path);
This is resulting in an out-of-memory exception, which I can understand
Me too.
but I can't use a file scanner as well, since the data in the file won't be split into lines.
You mean you can't use the Scanner class? That's not the only way to read a record at a time.
In any case, not all files have record delimiters. Some have fixed-length records, some have length words at the start of each record, and some have record-type attributes at the start of each record, or at least in the fixed part of the record.
I'll have to split it based on an attribute record_id at a particular position (say at the beginning of each record) that will tell me the record length
So read that attribute, decode it if necessary, and read the rest of the record according to the record length you derive from the attribute. One at a time.
I direct your attention to the methods of DataInputStream, especially readFully(). You will also need a Java COMP-3 library. There are several available. Most of the rest can be done by built-in EBCDIC character set decoders.
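A minimal sketch of that shape, assuming (purely for illustration) that each record starts with a 4-byte EBCDIC length field; the layout and the code page (Cp1047 here) would need to match your actual file:

import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;

public class RecordLengthReader {
    public static void main(String[] args) throws IOException {
        Charset ebcdic = Charset.forName("Cp1047"); // adjust to your system's code page
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(Paths.get(args[0]))))) {
            byte[] lengthField = new byte[4]; // hypothetical 4-digit length attribute
            while (true) {
                try {
                    in.readFully(lengthField);
                } catch (EOFException e) {
                    break; // clean end of file
                }
                int recordLength = Integer.parseInt(new String(lengthField, ebcdic).trim());
                byte[] rest = new byte[recordLength - lengthField.length];
                in.readFully(rest); // one record at a time, never 5 GB at once
                // decode EBCDIC text and COMP-3 fields from rest here
            }
        }
    }
}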
I am using an OBJ Loader library that I found on the 'net and I want to look into speeding it up.
It works by reading an .OBJ file line by line (it's basically a text file with lines of numbers).
I have a 12 MB OBJ file that equates to approx 400,000 lines. Suffice it to say, it takes forever to read it line by line.
Is there a way to speed it up? It uses a BufferedReader to read the file (which is stored in my assets folder)
Here is the link to the library: click me
Just an idea: you could first get the size of the file using the File class, after getting the file:
File file = new File(sdcard,"sample.txt");
long size = file.length();
The size returned is in bytes. Divide the file size into a sizable number of chunks, e.g. 5, 10, 20, etc., with a byte range specified and saved for each chunk. Create a byte array of the same size as each chunk, then "assign" each chunk to a separate worker thread, which should read its chunk into its corresponding array using the read(buffer, offset, length) method, i.e. read "length" bytes of its chunk into the array "buffer" beginning at array index "offset". You will have to convert the bytes into characters. Then concatenate all the arrays to get the final complete file contents. Insert checks on the chunk boundaries so the threads do not overlap each other. Again, this is just an idea; a rough sketch follows, and hopefully it will work when actually implemented.
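A rough sketch of that idea (the chunk count and file path are parameters; note the boundary caveat in the comments):

import java.io.File;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedFileReader {
    public static byte[][] readInChunks(String path, int chunks) throws Exception {
        long size = new File(path).length();
        long chunkSize = (size + chunks - 1) / chunks; // ceiling division
        ExecutorService pool = Executors.newFixedThreadPool(chunks);
        List<Future<byte[]>> parts = new ArrayList<>();
        for (int i = 0; i < chunks; i++) {
            final long offset = i * chunkSize;
            final int length = (int) Math.max(0, Math.min(chunkSize, size - offset));
            parts.add(pool.submit(() -> {
                byte[] buf = new byte[length];
                // each thread opens its own handle so seek positions don't clash
                try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
                    raf.seek(offset);
                    raf.readFully(buf);
                }
                return buf;
            }));
        }
        byte[][] result = new byte[chunks][];
        for (int i = 0; i < chunks; i++) {
            result[i] = parts.get(i).get();
        }
        pool.shutdown();
        // caution: a chunk boundary can split a line in two; the parser
        // must re-join lines across boundaries after concatenating
        return result;
    }
}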
I am building a history parser; there's an application that has already done the logging task (text-based).
Now my supervisor wants me to create an application to read that log.
The log is created at the end of the month and is separated by [date]:
[19-11-2014]
- what goes here
- what goes here
[20-11-2014]
- what goes here
- what goes here
etc...
If the log file has a small size, there's no problem processing the content with a DataInputStream to get the byte[], converting it to a String, and then doing the filtering (with substring and such).
But when the file has a large size (about 100 MB), it throws a Java heap space exception. I know that this is because the length of the content exceeds the String max length; when I try not to convert the byte[] into a String, no exception is thrown.
Now the question is, how do I split the byte[] into several byte[]s,
where each new byte[] contains only a single:
[date]
- what goes here
So if within a month we have 9 dates in the log, it would be split into 9 byte[]s.
The splitting marker would be based on [\\d{2}-\\d{2}-\\d{4}]; if it were a String I could just use a regex to find all the markers, get the indexOf, and then substring it.
But how do I do this without converting to a String first? As it is, that would throw the Java heap space exception.
I think there are several concepts here that you're missing.
First, an InputStream is a Stream, which means it is a flow of bytes. What you do with that flow is up to you, but saving the whole stream to memory defeats the point of the stream construct altogether.
Second, a DataInputStream is used to read back data that was serialized to a binary file by a DataOutputStream. Reading just a string is overkill for this type of stream, since a simple InputStream can do that.
As for your specific problem, I would use a BufferedReader and read one line at a time until reaching the next date. At that point you can do whatever processing you need on the last chunk of lines you read, and free the memory, thus not running into the same problem.
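A sketch of that chunk-at-a-time approach (the file name history.log and the process method are placeholders; the pattern matches the [dd-MM-yyyy] headers from the question):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class LogSplitter {
    private static final Pattern DATE_LINE =
            Pattern.compile("\\[\\d{2}-\\d{2}-\\d{4}\\]");

    public static void main(String[] args) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader("history.log"))) {
            List<String> chunk = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null) {
                if (DATE_LINE.matcher(line).matches() && !chunk.isEmpty()) {
                    process(chunk); // handle the previous date's entries
                    chunk.clear();  // free the memory before moving on
                }
                chunk.add(line);
            }
            if (!chunk.isEmpty()) {
                process(chunk); // last chunk
            }
        }
    }

    private static void process(List<String> chunk) {
        // filtering / substring logic for one [date] block goes here
        System.out.println(chunk.get(0) + " -> " + (chunk.size() - 1) + " entries");
    }
}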
Good evening!
I have been reading through stuff on the internet for hours now, but I can't find any way to get the content of a file from the internet into an int array.
I have a .txt file (that I download from the internet) which is loaded over a BufferedInputStream. There is a byte array which I tried to make use of, but I didn't have much success. Inside the file are random letters such as "abcCC". I need the int value of each character (such as 97, 98, 99, 67, 67); I would add them to an array and then count how often a specific value appears. My problem, though, is getting those values from the stream; I can't seem to find a way to do so.
Thank you for any ideas!
The Java API already contains a method that seems very convenient for you. InputStream defines the following method.
public abstract int read() throws IOException;
Reads the next byte of data from the input stream. The value byte is returned as an int in the range 0 to 255. If no byte is available because the end of the stream has been reached, the value -1 is returned. This method blocks until input data is available, the end of the stream is detected, or an exception is thrown.
It makes it trivial to read a file one byte at a time as int values (assuming the mapping from characters to integers is one-to-one; a Java int is actually four bytes in size).
An example of using this method to read characters individually as int and then casting to char follows. Again, this assumes that each character is encoded as a single byte and that there is a one-to-one mapping from characters to ints. If you're dealing with a multi-byte character encoding and/or you want to support integers greater than 255, the problem becomes more complex.
public static void main(String[] args) {
    ByteArrayInputStream in = new ByteArrayInputStream("abc".getBytes());
    int value;
    while ((value = in.read()) > -1) {
        System.out.println((char) value);
    }
}
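To get from there to the counting you asked about, a small extension of the same idea (a sketch, using an int[256] histogram indexed by byte value):

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ByteHistogram {
    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("abcCC".getBytes());
        int[] counts = new int[256]; // one slot per possible byte value
        int value;
        while ((value = in.read()) != -1) {
            counts[value]++;
        }
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] > 0) {
                System.out.println((char) i + " (" + i + "): " + counts[i]);
            }
        }
    }
}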
I would use a java.util.Scanner. You can initialize it with many different sources, including a File, an InputStream, or a Readable. Later you can process the whole input line by line or in any way you like, as in the sketch below.
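A minimal sketch (the file name sample.txt is a placeholder):

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class ScanLines {
    public static void main(String[] args) throws FileNotFoundException {
        try (Scanner scanner = new Scanner(new File("sample.txt"))) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine();
                // process the line here
                System.out.println(line);
            }
        }
    }
}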
Please refer to the excellent Javadoc of Scanner.
Regards,
z1pi