I have a project in which I have to write to a random access file.
I am reading a country record with some information, including id, name, year of independence, etc. That information is what I have to write to the file.
My questions are:
How can I measure the size of the record I'm writing to the random access file?
I know how to do it via: filewrite.int(variable). But the project requires me to somehow have a constant size of where I write each character.
I know 2 bytes is one character, but how can I say "I want to write this line (where each line is a country with its information) so write the line from byte 1 to byte 15 and have a constant size"?
Thanks!
You shouldn't have to measure the size of a record. In this instance, I'd say you should define a fixed length for each field of a record. This way, you will always know how long a record will be. You should be able to use RandomAccessFile. Read through the documentation on that class. If you want to make your life easier, write a service class for your particular file to wrap the RandomAccessFile methods.
An example of a method signature for the service class would be:
void writeCountry(int recordNumber, String country) throws IllegalArgumentException;
Then when you implement this, you will do some math to figure out where to seek to, and what to write to file.
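The seek arithmetic described above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the class name CountryFile, the 30-byte name field, the record layout (int id, name, int year) and the single-byte ISO-8859-1 encoding are all assumptions made for the example, and the signature is widened from the one above so that each field is passed separately.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CountryFile {
    // Hypothetical layout: 4-byte id + 30-byte name + 4-byte year = 38 bytes per record.
    public static final int NAME_BYTES = 30;
    public static final int RECORD_BYTES = 4 + NAME_BYTES + 4;

    private final RandomAccessFile raf;

    public CountryFile(String path) throws IOException {
        raf = new RandomAccessFile(path, "rw");
    }

    // Writes one record at the slot given by recordNumber (0-based).
    public void writeCountry(int recordNumber, int id, String name, int year) throws IOException {
        byte[] encoded = name.getBytes(StandardCharsets.ISO_8859_1);
        if (encoded.length > NAME_BYTES) {
            throw new IllegalArgumentException("name too long: " + name);
        }
        // copyOf pads the name with zero bytes up to the fixed field width
        byte[] field = Arrays.copyOf(encoded, NAME_BYTES);
        raf.seek((long) recordNumber * RECORD_BYTES);
        raf.writeInt(id);
        raf.write(field);
        raf.writeInt(year);
    }

    // Reads back the name field of a record, skipping the 4-byte id.
    public String readName(int recordNumber) throws IOException {
        raf.seek((long) recordNumber * RECORD_BYTES + 4);
        byte[] field = new byte[NAME_BYTES];
        raf.readFully(field);
        // trim() drops the zero-byte padding (and any surrounding whitespace)
        return new String(field, StandardCharsets.ISO_8859_1).trim();
    }

    public void close() throws IOException {
        raf.close();
    }
}
```

Because every record occupies exactly RECORD_BYTES bytes, record n always starts at byte n * RECORD_BYTES, so records can be written or updated in any order.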
You can define a fixed length for each field. Then format your values before writing them out. Since each line is a fixed number of characters, you can compute the byte offset of any line from its line number.
You could use the Formatter class to help.
http://docs.oracle.com/javase/6/docs/api/java/util/Formatter.html
Specifically take a look at the Width and Precision sections of the doc.
Take a look at the output of something like this:
public static void main(String [] args) {
Formatter formatter = new Formatter(System.out);
formatter.format("%5.5s %3.3s %3.3s %3.3s", "012345678901", "b", "c", "d");
}
Keep in mind what should happen if the value is larger than the field.
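As a rough sketch of one way to handle both cases, the helper below pads short values and truncates long ones so that every field comes out at exactly the requested width. The class name FixedWidth and the method fit are hypothetical names used only for this example; silently truncating is just one possible policy, and rejecting oversized values may suit your data better.

```java
public class FixedWidth {
    // Left-justifies the value, pads it with spaces on the right, and truncates
    // it if it is longer than width, so the result is always width characters.
    public static String fit(String value, int width) {
        return String.format("%-" + width + "." + width + "s", value);
    }

    public static void main(String[] args) {
        System.out.println("[" + fit("Peru", 8) + "]");          // padded to 8 characters
        System.out.println("[" + fit("Liechtenstein", 8) + "]"); // truncated to 8 characters
    }
}
```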
import java.io.IOException;
import java.io.RandomAccessFile;
public class RandomAccessDemo
{
public static void main(String[] args)
{
try
{
RandomAccessFile raf = new RandomAccessFile("test.txt", "rw");
raf.writeInt(10);
raf.writeInt(20);
raf.writeInt(30);
raf.writeInt(400);
raf.seek((3 - 1) * 4); // To reach the 3rd integer, skip 2 integers of 4 bytes each.
raf.writeInt(99);
raf.seek(0); // Going back to start point
int i = 0;
while(raf.length()>raf.getFilePointer())
{
i = raf.readInt();
System.out.println(i);
}
raf.close();
}
catch (Exception e)
{
System.out.println(e);
}
}
}
View my post on Random Access Files in Java at ankit.co
I am basically looking for a solution that allows me to stream the lines and replace them IN THE SAME FILE, a la Files.lines
Any mechanism in Java 8/NIO for replacing the lines of a big file without loading it in memory?
Basically, no.
Any change to a file that involves changing the number of bytes between offsets A and B can only be done by rewriting the file, or creating a new one. In either case, everything after B has to be loaded / read into memory.
This is not a Java-specific restriction. It is a consequence of the way that modern operating systems represent files, and the low-level (i.e. syscall) APIs that they provide to applications.
In the specific case where you replace one line (or sequence of lines) with a line (or sequence of lines) of exactly the same length, then you can do the replacement using either RandomAccessFile, or by mapping the file into memory. Note that the latter approach won't cause the entire file to be read into memory.
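A minimal sketch of the same-length case using a memory-mapped region might look like this. The class and method names are hypothetical, the caller is assumed to already know the byte offset of the line, and ISO-8859-1 is assumed so that one character is one byte.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SameLengthReplace {
    // Overwrites the bytes starting at offset with replacement.
    // The file length never changes, so nothing after the region is touched.
    public static void overwrite(Path file, long offset, String replacement) throws IOException {
        byte[] bytes = replacement.getBytes(StandardCharsets.ISO_8859_1);
        try (FileChannel channel =
                 FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            if (offset < 0 || offset + bytes.length > channel.size()) {
                throw new IllegalArgumentException("replacement does not fit inside the file");
            }
            // Only the affected region is mapped, not the whole file.
            MappedByteBuffer map =
                channel.map(FileChannel.MapMode.READ_WRITE, offset, bytes.length);
            map.put(bytes);
            map.force(); // ask the OS to flush the change to the storage device
        }
    }
}
```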
It is also possible to replace or delete lines while updating the file "in place" (changing the file length ...). See @Sergio Montoro's answer for an example. However, with an in-place update, there is a risk that the file will be corrupted if the application is interrupted. It also involves reading and rewriting all bytes in the file after the insertion / deletion point, which entails loading them into memory.
There was a mechanism in Java 1: RandomAccessFile; but any such in-place mechanism requires that you know the start offset of the line, and that the new line is the same length as the old one.
Otherwise you have to copy the file up to that line, substitute the new line in the output, and then continue the copy.
You certainly don't have to load the entire file into memory.
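The copy-and-substitute approach can be sketched like this, holding only one line in memory at a time. The names RewriteLines and rewrite are hypothetical, and the sketch assumes UTF-8 text, LF line endings on output, and permission to create a temporary file next to the original.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.function.UnaryOperator;

public class RewriteLines {
    // Streams the file line by line into a temporary file, applying transform
    // to every line, then moves the temporary file over the original.
    public static void rewrite(Path file, UnaryOperator<String> transform) throws IOException {
        Path dir = file.toAbsolutePath().getParent();
        Path temp = Files.createTempFile(dir, "rewrite", ".tmp");
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(temp, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(transform.apply(line));
                out.write('\n');
            }
        }
        Files.move(temp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

A call such as RewriteLines.rewrite(path, line -> line.replace("old", "new")) leaves the original file untouched until the copy is complete, which avoids the corruption risk of an in-place update.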
Yes.
A FileChannel allows random read/write to any position of a file. Therefore, if you have a read ahead buffer which is long enough you can replace lines even if the new line is longer than the former one.
The following example is a toy implementation which makes two assumptions: 1st) the input file is ISO-8859-1 Unix LF encoded and 2nd) each new line is never going to be longer than the next line (one line read ahead buffer).
Unless you definitely cannot create a temporary file, you should benchmark this approach against the more natural stream in -> stream out, because I do not know what performance a spinning drive will give you for an algorithm that constantly moves forward and backward in a file.
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import static java.nio.file.StandardOpenOption.*;
import java.io.IOException;
public class ReplaceInFile {
public static void main(String args[]) throws IOException {
Path file = Paths.get(args[0]);
ByteBuffer writeBuffer;
long readPos = 0L;
long writePos;
String line_m;
String line_n;
String line_t;
FileChannel channel = FileChannel.open(file, READ, WRITE);
channel.position(0);
writePos = readPos;
line_m = readLine(channel);
do {
readPos += line_m.length() + 1;
channel.position(readPos);
line_n = readLine(channel);
line_t = transformLine(line_m)+"\n";
// wrap() sizes the buffer to exactly the encoded line, so no stray trailing zero byte is written
writeBuffer = ByteBuffer.wrap(line_t.getBytes("ISO8859_1"));
System.out.print("replaced line "+line_m+" with "+line_t);
channel.position(writePos);
writeBuffer.rewind();
while (writeBuffer.hasRemaining()) {
channel.write(writeBuffer);
}
writePos += line_t.length();
line_m = line_n;
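// the write position must not run past the line that has already been read ahead
assert line_m.isEmpty() || writePos <= readPos + line_m.length() + 1;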
} while (line_m.length() > 0);
channel.close();
System.out.println("Done!");
}
public static String transformLine(String input) throws IOException {
return input.replace("<", "&lt;").replace(">", "&gt;");
}
public static String readLine(FileChannel channel) throws IOException {
ByteBuffer readBuffer = ByteBuffer.allocate(1);
StringBuffer line = new StringBuffer();
do {
int read = channel.read(readBuffer);
if (read<1) break;
readBuffer.rewind();
char c = (char) readBuffer.get();
readBuffer.rewind();
if (c=='\n') break;
line.append(c);
} while (true);
return line.toString();
}
}
I have to parse a txt file for a tax calculator that has this form:
Name: Mary Jane
Age: 23
Status: Married
Receipts:
Id: 1
Place: Restaurant
Money Spent: 20
Id: 2
Place: Mall
Money Spent: 30
So, what I have done so far is:
public void read(File file) throws FileNotFoundException {
Scanner scanner = new Scanner(file);
String[] tokens = null;
while (scanner.hasNext()) {
String line = scanner.nextLine();
tokens = line.split(":");
String lastToken = tokens[tokens.length - 1];
System.out.println(lastToken);
}
}
So, I want to read only the second column of this file (Mary Jane, 23, Married) into a class taxpayer(name, age, status), and the receipts' info into an ArrayList.
I thought of taking the last token and saving it to a String array, but I can't work out how to store a String in a String array. Can someone help me? Thank you.
The fastest way, if your data is ASCII and you don't need charset conversion, is to use a BufferedInputStream and do all the parsing yourself -- find the line terminators, parse the numbers. Do NOT use a Reader, or create Strings, or create any objects per line, or use parseInt. Just use byte arrays and look at the bytes. It's a little messier, but pretend you're writing C code, and it will be faster.
Also give some thought to how compact the data structure you're creating is, and whether you can avoid creating an object per line there too by being clever.
Frankly, I think the "fastest" is a red herring. Unless you have millions of these files, it is unlikely that the speed of your code will be relevant.
And in fact, your basic approach to parsing (read line using Scanner, split line using String.split(...)) seems pretty sound.
What you are missing is that the structure of your code needs to match the structure of the file. Here's a sketch of how I would do it.
If you are going to ignore the first field of each line, you need a method that:
reads a line, skipping empty lines
splits it, and
returns the second field.
If you are going to check that the first field contains the expected keyword, then modify the method to take a parameter, and check the field. (I'd recommend this version ...)
Then call the above method in the correct pattern; e.g.
call it 3 times to extract the name, age and marital status
call it 1 time to skip the "receipts" line
use a while loop to call the method 3 times to read the 3 fields for each receipt.
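The calling pattern above could be sketched like this. It is only an illustration under some assumptions: the class name TaxFileSketch and the method names are hypothetical, plain String arrays stand in for the taxpayer and receipt classes, and the keywords are taken from the sample file in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class TaxFileSketch {
    // Reads the next non-empty line, checks that it starts with the expected
    // keyword, and returns the text after the first colon.
    public static String readField(Scanner scanner, String keyword) {
        while (scanner.hasNextLine()) {
            String line = scanner.nextLine().trim();
            if (line.isEmpty()) {
                continue; // skip empty lines
            }
            String[] parts = line.split(":", 2);
            if (!parts[0].trim().equals(keyword)) {
                throw new IllegalStateException("expected '" + keyword + "' but got: " + line);
            }
            return parts.length > 1 ? parts[1].trim() : "";
        }
        throw new IllegalStateException("unexpected end of input, expected '" + keyword + "'");
    }

    // Returns the taxpayer row {name, age, status} followed by one
    // {id, place, money} row per receipt.
    public static List<String[]> parse(Scanner scanner) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {
            readField(scanner, "Name"),
            readField(scanner, "Age"),
            readField(scanner, "Status")});
        readField(scanner, "Receipts"); // header line, value ignored
        while (scanner.hasNext()) {     // true while any non-blank text remains
            rows.add(new String[] {
                readField(scanner, "Id"),
                readField(scanner, "Place"),
                readField(scanner, "Money Spent")});
        }
        return rows;
    }
}
```

Checking the keyword on every line means a malformed file fails early with a clear message instead of silently shifting fields.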
First, why do you need to invest time in the fastest possible solution? Is it because the input file is huge? I also do not understand how you want to store the result of parsing. Consider a new class with all the fields you need to extract from the file per person.
A few tips:
- Avoid unnecessary per-line memory allocations. line.split(":") in your code is an example of this.
- Use buffered input.
- Minimize input/output operations.
If these are not enough for you try to read this article http://nadeausoftware.com/articles/2008/02/java_tip_how_read_files_quickly
Do you really need it to be as fast as possible? In situations like this, it's often fine to create a few objects and do a bit of garbage collection along the way in order to have more maintainable code.
I'd use two regular expressions myself (one for the taxpayer and another for the receipts loop).
My code would look something like:
public class ParsedFile {
private Taxpayer taxpayer;
private List<Receipt> receipts;
// getters and setters etc.
}
public class FileParser {
private static final Pattern TAXPAYER_PATTERN =
// this pattern includes capturing groups in brackets ()
Pattern.compile("Name: (.*?)\\s*Age: (.*?)\\s*Status: (.*?)\\s*Receipts:", Pattern.DOTALL);
public ParsedFile parse(File file) {
BufferedReader reader = new BufferedReader(new FileReader(file));
String firstChunk = getNextChunk(reader);
Taxpayer taxpayer = parseTaxpayer(firstChunk);
List<Receipt> receipts = new ArrayList<Receipt>();
String chunk;
while ((chunk = getNextChunk(reader)) != null) {
receipts.add(parseReceipt(chunk));
}
return new ParsedFile(taxpayer, receipts);
}
private Taxpayer parseTaxpayer(String chunk) {
Matcher matcher = TAXPAYER_PATTERN.matcher(chunk);
if (!matcher.matches()) {
throw new IllegalArgumentException(chunk + " does not match " + TAXPAYER_PATTERN.pattern());
}
// this is where we use the capturing groups from the regular expression
return new Taxpayer(matcher.group(1), matcher.group(2), ...);
}
private Receipt parseReceipt(String chunk) {
// TODO implement
}
private String getNextChunk(BufferedReader reader) {
// keep reading lines until either a blank line or end of file
// return the chunk as a string
}
}
I have a requirement to edit the .bat file with Java.
The file contains the following line of text:
testrunner.bat -ParId=12810 -PsysDate=2014-07-03 "C:\SOAP METHODS\DELINQ-soapui-project.xml"
The line contains -ParId=12810 and -PsysDate=2014-07-03. I need to write new content after each = sign, i.e. I need to assign different values to the -ParId and -PsysDate variables.
What's wrong with rewriting the complete file?
I don't know much about regex, in fact I have almost never used it, but you can use regex for your problem, something like:
class RegexExample {
public static void main(String[] args) {
String input = "testrunner.bat -ParId=12810 -PsysDate=2014-07-03 'C:\\SOAP METHODS\\DELINQ-soapui-project.xml'";
input = input.replaceAll("ParId=[0-9]+","ParId=newValueID");
input = input.replaceAll("PsysDate=\\w+\\-\\w+\\-\\w+","PsysDate=newValueDate");
System.out.println(input);
}
}
I know it is not the most efficient or pretty, but you can start from there; there are many references to be found on Google :)
If the file always contains the same text (apart from the parameters), you could do:
String formatstr = "testrunner.bat -ParId=%d -PsysDate=%s \"C:\\SOAP METHODS\\DELINQ-soapui-project.xml\"";
String output = String.format(formatstr,id,datestring);
// write output to file
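Putting the pieces together, rewriting the complete file might look roughly like the sketch below. The class and method names are hypothetical, and the sketch assumes the file is small enough to read in one go and that ISO-8859-1 is an acceptable encoding for a byte-for-byte round trip.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;

public class BatEditor {
    // Replaces whatever follows -ParId= and -PsysDate= (up to the next
    // whitespace) with the new values, leaving the rest of the text alone.
    public static String withParams(String text, String parId, String sysDate) {
        return text
            .replaceAll("-ParId=\\S*", "-ParId=" + Matcher.quoteReplacement(parId))
            .replaceAll("-PsysDate=\\S*", "-PsysDate=" + Matcher.quoteReplacement(sysDate));
    }

    // Reads the whole .bat file, substitutes the values, and writes it back.
    public static void update(Path bat, String parId, String sysDate) throws IOException {
        String text = new String(Files.readAllBytes(bat), StandardCharsets.ISO_8859_1);
        Files.write(bat, withParams(text, parId, sysDate).getBytes(StandardCharsets.ISO_8859_1));
    }
}
```

Matcher.quoteReplacement keeps characters such as $ or \ in the new values from being treated as regex replacement syntax.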
I'm used to Python and Django, but I've recently started learning Java. Since I don't have much time because of work, I missed a lot of classes, and I'm a bit confused now that I have an assignment to do.
EDIT
The program is supposed to assign points according to the time each athlete made in the bike and running events. I have 4 extra tables for male and female athletes with points and times.
I have to compare them and find the corresponding points for each time (linear interpolation).
So my idea was to read the file and use an ArrayList.
One of the things I'm having difficulty with is creating a two-dimensional array.
I have a file similar to this one:
12 M 23:56 62:50
36 F 59:30 20:60
Where the first number is the athlete, the second is the gender, and the rest are the times of different races (which need to be converted into seconds).
Since I can't make a mixed array (int and char), I have to convert the gender to 0 and 1.
So here is what I've done so far:
public static void main(String[] args) throws FileNotFoundException {
Scanner fileTime = new Scanner (new FileReader ("time.txt"));
while (fileTime.hasNext()) {
String value = fileTime.next();
// Replace gender with 0 and 1, so that the string can be converted into an integer
if (value.equals("F"))
value = "0";
else if (value.equals("M"))
value = "1";
// Check which values contain ':'
int index = value.indexOf(":");
if (index != -1) {
String [] temp = value.split(":");
for (int i=0; i<temp.length; i++) {
// convert string to int
int num = Integer.parseInt(temp[i]);
// I wanted to multiply the first number by 60 to convert into seconds and add the second number to the first
num * 60; // but this way I multiplying everything
}
}
}
I'm aware that there are probably easier ways to do this, but honestly I'm a bit confused; any pointers are welcome.
Just because an array works well to store the data in one language does not mean it is the best way to store the data in another language.
Instead of trying to make a two dimensional array, you can make a single array (or collection) of a custom class.
public class Athlete {
private int _id;
private boolean _isMale;
private int[] _times;
//...
}
How you intend to use the data may change the way you structure the class. But this is a simple direct representation of the data line you described.
Python is a dynamically-typed language, which means you can think of each row as a tuple, or even as a list/array if you like. The Java idiom is to be stricter in typing. So, rather than having a list of lists of elements, your Java program should define a class that represents the information in each line, and then instantiate and populate objects of that class. In other words, if you want to program in idiomatic Java, this is not a two-dimensional array problem; it's a List<MyClass> problem.
Try reading the file line by line:
while (fileTime.hasNext())
Instead of hasNext use hasNextLine.
Read the next line instead of next token:
String value = fileTime.next();
// can be
String line = fileTime.nextLine();
Split the line into four parts with something as follows:
String[] parts = line.split("\\s+");
Access the parts using parts[0], parts[1], parts[2] and parts[3]. You already know what each part holds, so they are easy to process.
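Combining the line-by-line reading with a small class per athlete, a sketch could look like the following. The names AthleteReader, Athlete, toSeconds and parseLine are hypothetical, the gender is kept as a char rather than converted to 0 and 1, and every time is assumed to be in mm:ss form.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class AthleteReader {
    // One object per input line instead of one row of a two-dimensional array.
    public static class Athlete {
        public final int id;
        public final char gender;   // 'M' or 'F'
        public final int[] seconds; // one entry per race time

        public Athlete(int id, char gender, int[] seconds) {
            this.id = id;
            this.gender = gender;
            this.seconds = seconds;
        }
    }

    // Converts "mm:ss" to seconds: only the minutes part is multiplied by 60.
    public static int toSeconds(String time) {
        String[] parts = time.split(":");
        return Integer.parseInt(parts[0]) * 60 + Integer.parseInt(parts[1]);
    }

    // Parses a line such as "12 M 23:56 62:50".
    public static Athlete parseLine(String line) {
        String[] parts = line.trim().split("\\s+");
        int[] seconds = new int[parts.length - 2];
        for (int i = 2; i < parts.length; i++) {
            seconds[i - 2] = toSeconds(parts[i]);
        }
        return new Athlete(Integer.parseInt(parts[0]), parts[1].charAt(0), seconds);
    }

    // Reads every non-blank line from the scanner into a list of athletes.
    public static List<Athlete> readAll(Scanner scanner) {
        List<Athlete> athletes = new ArrayList<>();
        while (scanner.hasNextLine()) {
            String line = scanner.nextLine();
            if (!line.trim().isEmpty()) {
                athletes.add(parseLine(line));
            }
        }
        return athletes;
    }
}
```

With the file from the question, this would be called as AthleteReader.readAll(new Scanner(new FileReader("time.txt"))).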
import java.io.*;
class BS{
public void pStr(){
try{
String command="cat /usr/share/doc/bash/rbash.pdf";
Process ps=Runtime.getRuntime().exec(command);
InputStream in = ps.getInputStream();
int c;
while((c=in.read())!=-1){
System.out.print((char)c);
}
}catch(Exception e){
e.printStackTrace();
}
}
public static void main(String args[]){
new BS().pStr();
}
}
jabira-whosechild-lm.local 23:54:00 % java BS|wc
384 2003 43885
jabira-whosechild-lm.local 23:54:05 % wc /usr/share/doc/bash/rbash.pdf
384 2153 43885 /usr/share/doc/bash/rbash.pdf
Why do I see a difference in the number of characters that are read and printed to the console?
The method InputStream.read() reads only one byte.
Your source code line System.out.print((char)c); is wrong. It calls the method PrintStream.print(char c), which encodes the character using the platform's default character encoding and so may write two bytes for some non-ASCII character values.
You need to call a method that always writes one byte value. The correct method is System.out.write(c);.
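A minimal sketch of a byte-for-byte copy, which never goes through a char conversion, is shown below. The class name ByteCopy and the 8192-byte buffer size are arbitrary choices for the example; the command is the one from the question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ByteCopy {
    // Copies every byte unchanged from in to out and returns the number of bytes copied.
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n); // raw bytes, no character encoding involved
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        Process ps = Runtime.getRuntime().exec(
            new String[] {"cat", "/usr/share/doc/bash/rbash.pdf"});
        copy(ps.getInputStream(), System.out);
    }
}
```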
Isn't it that the number of characters is the same, but the number of words are different?
I'm guessing that somewhere in your c=in.read() and print((char)c) code there are some encoding issues going on.
Can you save the output to another PDF file and do a binary compare of them? If they are identical then that's really weird! If they're not, then you might find a clue in the differences.