Java: How to read huge records (>10k) - java

Looking for a best practice for reading a file with more than 10k records line by line and storing the lines in an ArrayList.
My program reads only about 3.5k records and ignores the rest.
URL cityurl = ClassLoader.getSystemResource(citypath);
citybr = new BufferedReader(new FileReader(cityurl.getFile()));
for (String city = citybr.readLine(); city != null; city = citybr.readLine()) {
citycountryairport.add(citybr.readLine());
}
Thanks in advance!!

BufferedReader is a good choice for reading large files because it reads through a fixed-size buffer instead of loading the whole file into memory; see the BufferedReader documentation.
Each time you call
readLine();
the next line of the file is consumed. In your code, change:
citycountryairport.add(citybr.readLine());
to :
citycountryairport.add(city);
Otherwise the lines read by the loop header
city = citybr.readLine()
are never added to your list: the loop body calls readLine() a second time and stores only that line, so every other record is skipped.
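For reference, here is a minimal sketch of the corrected loop. It assumes the file is read from a plain path rather than the classpath, and the names `LineLoader` and `readAllLines` are made up for illustration; they are not from the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original question's code.
class LineLoader {

    // Reads every line of the file into a list, calling readLine() exactly once per iteration.
    static List<String> readAllLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line); // store the line that was just read; do not read again
            }
        }
        return lines;
    }
}
```

Because the only call to readLine() is in the loop condition, no record is consumed without being stored.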

Related

Updating a particular column in csv with huge amount of data with java

I have a CSV file, 'Master List', with 800K records; each record has 13 values.
The combination of cell[0] and cell[1] identifies a unique record, and I need to update the value of cell[12], the status, for every record.
I have another CSV file, 'Updated subset list', which is a subset of 'Master List'. For each record in this second file (far fewer, say 10,000), I need to update cell[11], the status column, of the matching record.
I tried a plain BufferedReader, CsvParser from commons-csv, and CsvParser from univocity.parsers.
But reading the whole file and building a List of 800K records causes an out-of-memory error.
The same code will be deployed on different servers, so I want efficient code for reading a huge CSV file and updating the same file.
Partially reading a huge file and writing to the same file might corrupt the data.
Any suggestions on how I can do this?
File inputF = new File(inputFilePath);
if (inputF.exists()) {
InputStream inputFS = new FileInputStream(inputF);
BufferedReader br = new BufferedReader(new InputStreamReader(inputFS));
// skip the header of the file
String line = br.readLine();
mandatesList = new ArrayList<DdMandates>();
while ((line = br.readLine()) != null) {
mandatesList.add(mapToItem(line));
}
br.close();
}
The memory issue was resolved by processing in chunks. Reading and writing a single line at a time might take more time; I didn't try it, because my issue was resolved by using batches of 100K records at a time and clearing the list after writing each batch.
Now the issue is that updating the status takes too much looping.
I have two CSVs. The master sheet (Master List) has 800K records, and the subset CSV has, say, 10K records. The subset CSV is updated by another system and carries the updated status, say 'OK' or 'NOT OK'. I need to apply this status to the master sheet. What is the best way to do that? The naive approach I am using is the following:
// Master list have batches but it contains 800 k records and 12 columns
List<DdMandates> mandatesList = new ArrayList<DdMandates>();
// Subset list have updated status
List<DdMandates> updatedMandatesList = new ArrayList<DdMandates>();
// Read Subset csv file and map DdMandates item and then add to updated mandate list
File inputF = new File(Property.inputFilePath);
if(inputF.exists()) {
InputStream inputFS = new FileInputStream(inputF);
BufferedReader br = new BufferedReader(new InputStreamReader(inputFS, "UTF-8"));
checkFilterAndmapToItem(br);
br.close();
}
In method checkFilterAndmapToItem(BufferedReader br):
private static void checkFilterAndmapToItem(BufferedReader br) {
FileWriter fileWriter = null;
try {
// skip the header of the csv
String line = br.readLine();
int batchSize = 0, currentBatchNo=0;
fileWriter = new FileWriter(Property.outputFilePath);
//Write the CSV file header
fileWriter.append(FILE_HEADER.toString());
//Add a new line separator after the header
fileWriter.append(NEW_LINE_SEPARATOR);
if( !Property.batchSize.isEmpty()) {
batchSize = Integer.parseInt(Property.batchSize.trim());
}
while ((line = br.readLine()) != null) {
DdMandates item = new DdMandates();
String[] p = line.concat(" ").split(SEPERATOR);
// Parse each p[x] and map it to the item of type DdMandates.
// Iterate over the updated mandate list to check whether this item is present in it;
// if so, copy the updated status to the item. This is an inner loop over about 10K elements.
mandatesList.add(item);
if (batchSize != 0 && mandatesList.size() == batchSize) {
currentBatchNo++;
logger.info("Batch no. : "+currentBatchNo+" is executing...");
processOutputFile(fileWriter);
mandatesList.clear();
}
}
// process the output file here for the last batch ...
}
This is a while loop (800K iterations) with an inner loop (10K iterations) for each element,
so at least 800K * 10K iterations in total.
Please help me find the best way to reduce the iterations.
Thanks in advance
Suppose you are reading the 'Main Data File' in batches of 50K:
Store this data in a Java HashMap, using cell[0] and cell[1] as the key and the rest of the columns as the value.
The complexity of get and put is O(1) most of the time.
So looking up the 10K records in that particular batch costs O(10K).
HashMap<String, DdMandates> hmap = new HashMap<String, DdMandates>();
Use key=DdMandates.get(0)+DdMandates.get(1)
Note: If 50K records exceed the available heap memory, create smaller batches.
For a further performance gain you can use multi-threading by creating small batches and processing them on different threads.
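The lookup idea can be sketched as follows. Note that this is a variant of the suggestion above: it indexes the smaller updated subset (about 10K rows) in a HashMap and streams the master file once, rather than indexing each master batch, but it relies on the same O(1) lookups. It assumes simple comma-separated rows without quoted commas, a header line in both files, and a status column whose index is passed in as a parameter. `StatusMerger` and its methods are hypothetical names, not the poster's code.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; names and CSV layout are assumptions, not the poster's code.
class StatusMerger {

    // Builds the composite key from the first two cells; the separator avoids
    // collisions such as "ab"+"c" versus "a"+"bc".
    static String key(String[] cells) {
        return cells[0] + "|" + cells[1];
    }

    // Loads the small "updated subset" file into a map: key -> new status.
    static Map<String, String> loadUpdates(Path subset, int statusIndex) throws IOException {
        Map<String, String> updates = new HashMap<>();
        try (BufferedReader in = Files.newBufferedReader(subset, StandardCharsets.UTF_8)) {
            in.readLine(); // skip the header
            String line;
            while ((line = in.readLine()) != null) {
                String[] cells = line.split(",", -1); // -1 keeps trailing empty cells
                updates.put(key(cells), cells[statusIndex]);
            }
        }
        return updates;
    }

    // Streams the master file once and writes a new file; returns the number of rows updated.
    static int merge(Path master, Path output, Map<String, String> updates, int statusIndex)
            throws IOException {
        int updated = 0;
        try (BufferedReader in = Files.newBufferedReader(master, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            String header = in.readLine();
            if (header != null) {
                out.write(header);
                out.newLine();
            }
            String line;
            while ((line = in.readLine()) != null) {
                String[] cells = line.split(",", -1);
                String newStatus = updates.get(key(cells)); // expected O(1) lookup
                if (newStatus != null) {
                    cells[statusIndex] = newStatus;
                    updated++;
                }
                out.write(String.join(",", cells));
                out.newLine();
            }
        }
        return updated;
    }
}
```

With 800K master rows and 10K updates this performs roughly 800K map lookups instead of 800K * 10K comparisons, and only one master row is held in memory at a time. Writing to a separate output file also sidesteps the concern about corrupting the master file while it is being read.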
The first suggestion: when you create an ArrayList with the default constructor, its initial capacity is 10, so it has to grow repeatedly as it fills. If you work with a large amount of data, set the capacity up front, like:
private static final int LIST_CAPACITY = 800000;
mandatesList = new ArrayList<DdMandates>(LIST_CAPACITY);
The second suggestion: don't hold the data in memory. Read the data line by line, apply your business logic, then let the memory be freed, like:
FileInputStream inputStream = null;
Scanner sc = null;
try {
inputStream = new FileInputStream(path);
sc = new Scanner(inputStream, "UTF-8");
while (sc.hasNextLine()) {
String line = sc.nextLine();
/* your business rule here */
}
// note that Scanner suppresses exceptions
if (sc.ioException() != null) {
throw sc.ioException();
}
} finally {
if (inputStream != null) {
inputStream.close();
}
if (sc != null) {
sc.close();
}
}

How to access values of a line, while reading in a text file in Java

I am trying to load two files at the same time, but start by working with the first file, gps1. I want to read gps1 line by line and, depending on the sentence type (explained below), do different things with each line before moving to the next.
gps1 has multiple lines, and each line falls into one of a few categories, all starting with $GPS (then other characters). Some of these types have a timestamp that I need to collect, and some do not.
File gps1File = new File(gpsFile1);
File gps2File = new File(gpsFile2);
FileReader filegps1 = new FileReader(gpsFile1);
FileReader filegps2 = new FileReader(gpsFile2);
BufferedReader buffer1 = new BufferedReader(filegps1);
BufferedReader buffer2 = new BufferedReader(filegps2);
String gps1;
String gps2;
while ((gps1 = buffer1.readLine()) != null) {
The gps1 data file is as follows
$GPGSA,A,3,28,09,26,15,08,05,21,24,07,,,,1.6,1.0,1.3*3A
$GPRMC,151018.000,A,5225.9627,N,00401.1624,W,0.11,104.71,210214,,*14
$GPGGA,151019.000,5225.9627,N,00401.1624,W,1,09,1.0,38.9,M,51.1,M,,0000*72
$GPGSA,A,3,28,09,26,15,08,05,21,24,07,,,,1.6,1.0,1.3*3A
Thanks
I don't really understand the problem you are facing, but if you want to get at the contents of each line you can use a StringTokenizer:
StringTokenizer st = new StringTokenizer(gps1, ",");
And then access the tokens one by one:
while (st.hasMoreTokens()) {
String s = st.nextToken();
}
EDIT:
NB: the first token will be your "$GPXXX" attribute
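As a rough sketch of dispatching on the sentence type: the version below uses String.split instead of StringTokenizer, because StringTokenizer skips empty fields (such as the ",,,," runs in the sample data), which shifts field positions. It assumes that the time is the second field of $GPRMC and $GPGGA sentences, as in the sample lines, and that other types carry no timestamp. `NmeaLine` and its methods are hypothetical names.

```java
import java.util.Optional;

// Hypothetical helper; only the sentence types from the sample data are handled.
class NmeaLine {

    // Returns the sentence type, e.g. "$GPRMC" (the first comma-separated field).
    static String type(String line) {
        return line.split(",", -1)[0];
    }

    // Returns the time field (hhmmss.sss) for sentence types that carry one.
    static Optional<String> timestamp(String line) {
        String[] fields = line.split(",", -1); // -1 keeps empty fields such as ",,,,"
        switch (fields[0]) {
            case "$GPRMC":
            case "$GPGGA":
                return fields.length > 1 ? Optional.of(fields[1]) : Optional.empty();
            default:
                return Optional.empty(); // e.g. $GPGSA has no time field
        }
    }
}
```

Inside the `while ((gps1 = buffer1.readLine()) != null)` loop, `NmeaLine.timestamp(gps1)` could then be checked with `isPresent()` to decide how to handle each line.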

How to append existing line in text file

How do I append to an existing line in a text file? What if the line to be edited is in the middle of the file? Please offer a suggestion, given the following code.
I have gone through and tried the following:
How to add a new line of text to an existing file in Java?
How to append existing line within a java text file
My code:
filePath = new File("").getAbsolutePath();
BufferedReader reader = new BufferedReader(new FileReader(filePath + "/src/DBTextFiles/Customer.txt"));
try
{
String line = null;
while ((line = reader.readLine()) != null)
{
if (!(line.startsWith("*")))
{
//System.out.println(line);
//check if target customer exists, via 2 fields - customer name, contact number
if ((line.equals(customername)) && (reader.readLine().equals(String.valueOf(customermobilenumber))))
{
System.out.println ("\nWelcome (Existing User) " + line + "!");
//w target customer, alter total number of bookings # 5th line of 'Customer.txt', by reading lines sequentially
reader.readLine();
reader.readLine();
int total_no_of_bookings = Integer.valueOf(reader.readLine());
System.out.println (total_no_of_bookings);
reader.close();
valid = true;
//append total number of bookings (5th line) of target customer # 'Customer.txt'
try {
BufferedWriter writer = new BufferedWriter(new FileWriter(new File(filePath + "/src/DBTextFiles/Customer.txt")));
writer.write(total_no_of_bookings + 1);
//writer.write("\n");
writer.close();
}
catch (IOException ex)
{
ex.printStackTrace();
}
//finally
// {
//writer.close();
//}
}
}
}
To append content to an existing file you need to open it in append mode, for example with FileWriter(String fileName, boolean append), passing true as the second parameter.
If the line is in the middle, you need to read the entire file into memory, edit it, and write it back once all editing is done.
This is workable for small files. If your files are too big, I would suggest writing the unchanged and edited content to a temp file and, when done, deleting the old file and renaming the temp file to the old name.
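A minimal sketch of the temp-file approach, assuming the line to change is identified by a 0-based index; `LineReplacer` and `replaceLine` are hypothetical names used only for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper illustrating the copy-to-temp-file-then-rename approach.
class LineReplacer {

    // Replaces the line at lineIndex (0-based) and copies all other lines unchanged.
    static void replaceLine(Path file, int lineIndex, String replacement) throws IOException {
        // Create the temp file next to the original so the final move stays on one file system.
        Path temp = Files.createTempFile(file.toAbsolutePath().getParent(), "edit", ".tmp");
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(temp, StandardCharsets.UTF_8)) {
            String line;
            int index = 0;
            while ((line = in.readLine()) != null) {
                out.write(index == lineIndex ? replacement : line);
                out.newLine();
                index++;
            }
        }
        // Both streams are closed here; swap the temp file in place of the original.
        Files.move(temp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

For the booking counter in the question, the replacement would be built with `String.valueOf(total_no_of_bookings + 1)`. Passing the int straight to `Writer.write(int)` writes a single character with that code, not the number as text.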
The reader.readLine() method advances to the next line each time it is called. I am not sure whether this is intended in your program, but you may want to store the result of reader.readLine() in a String so that it is called only once per line.
To add a line in the middle of the text file, I believe you will have to rewrite the file up to the point at which you wish to add the line, write the new line, then write the rest of the file. This could be achieved by storing the whole file in a String array and then writing it back with the new line inserted.
Example of writing:
BufferedWriter writer = new BufferedWriter(new FileWriter(new File(path)));
writer.write(someStuff);
writer.write("\n");
writer.close();
You should probably be following the advice in the answer to the second link you posted. You can access the middle of a file using a random access file, but if you start appending at an arbitrary position in the middle of a file without recording what's there when you start writing, you'll be overwriting its current contents, as noted in this answer. Your best bet, unless the files in question are intractably large, is to assemble a new file using the existing file and your new data, as others have previously suggested.
AFAIK you cannot do that. I mean, appending a line is possible but not inserting in the middle. That has nothing to do with java or another language...a file is a sequence of written bytes...if you insert something in an arbitrary point that sequence is no longer valid and needs to be re-written.
So basically you have to create a function to do that read-insert-slice-rewrite

read data from a file in java

I have a text file in the following format:
Details.txt
The file is a .txt file. I want to read a course title from this file and print the corresponding textbook and instructor information, but I am not sure what process to follow. Storing the information in an array won't be efficient! How should I proceed? NOTE: I can't change the info in the file. The file will be opened with the following code:
File newFile=new File("C:/details");
But how should I extract the data from this file according to the labels course title, textbook, and instructor?
First read the file correctly line by line, and search for your entered course title, lets consider "Java"
Now you hit your title and you know you need 3 consecutive lines from your file as all information related to that title are there.
if (str.startsWith(title)) { // for title = "Java"
line1 = 1st line // contains ISBN and First Name
line2 = 2nd line // Title and Last Name
line3 = 3rd line // Author and Department
line4 = 4th line // Email
break; // this will take you out of while loop
}
Now on those four lines do string operations and extract your data as you need and use it.
I am home so I can't give you exact code. But if you follow this it will solve your issue. Let me know if any problem you got while doing this.
Follow this to get some info on String operations
Use String Tokenizer and separate each string and then store them in a Linked List or Array List. Have Separate List for each title like course title, instructor etc. and then print them
//Find the directory for the SD Card using the API
//*Don't* hardcode "/sdcard"
File sdcard = Environment.getExternalStorageDirectory();
//Get the text file
File file = new File(sdcard,"file.txt");
//Read text from file
StringBuilder text = new StringBuilder();
try {
BufferedReader br = new BufferedReader(new FileReader(file));
String line;
while ((line = br.readLine()) != null) {
text.append(line);
text.append('\n');
}
}
catch (IOException e) {
//You'll need to add proper error handling here
}
//Find the view by its id
TextView tv = (TextView)findViewById(R.id.text_view);
//Set the text
tv.setText(text);
You can use FileUtils.readFileToString(new File("C:/details.txt"));
Now you can extract the required data based on your wish
Use Scanner class
Scanner s=new Scanner(new File("C:/Details.txt"));
while (s.hasNextLine())
{
System.out.println(s.nextLine());
}
If you want to work word by word, use StringTokenizer.
see this article

Java - Load file, replace string, save

I have a program that loads lines from a user file, then selects the last part of the String (which would be an int)
Here's the style it's saved in:
nameOfValue = 0
nameOfValue2 = 0
and so on. I have selected the value for sure - I debugged it by printing. I just can't seem to save it back in.
if(nameOfValue.equals(type)) {
System.out.println(nameOfValue+" equals "+type);
value.replace(value, Integer.toString(Integer.parseInt(value)+1));
}
How would I save it back? I've tried BufferedWriter, but it just erases everything in the file.
My suggestion is, save all the contents of the original file (either in memory or in a temporary file; I'll do it in memory) and then write it again, including the modifications. I believe this would work:
public static void replaceSelected(File file, String type) throws IOException {
// we need to store all the lines
List<String> lines = new ArrayList<String>();
// first, read the file and store the changes
BufferedReader in = new BufferedReader(new FileReader(file));
String line = in.readLine();
while (line != null) {
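if (line.contains("=") && line.substring(0, line.indexOf('=')).trim().equals(type)) { // exact name match, so "nameOfValue" does not also match "nameOfValue2"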
String sValue = line.substring(line.indexOf('=')+1).trim();
int nValue = Integer.parseInt(sValue);
line = type + " = " + (nValue+1);
}
lines.add(line);
line = in.readLine();
}
in.close();
// now, write the file again with the changes
PrintWriter out = new PrintWriter(file);
for (String l : lines)
out.println(l);
out.close();
}
And you'd call the method like this, providing the File you want to modify and the name of the value you want to select:
replaceSelected(new File("test.txt"), "nameOfValue2");
I think the most convenient way is:
Read the text file line by line using a BufferedReader.
For each line, find the int part using a regular expression and replace it with your new value.
Write the resulting lines to a new file.
Delete the source file and rename the new file to the original name.
Please let me know if you need a Java program implementing the above algorithm.
Hard to answer without the complete code...
Is value a String? If so, replace will create a new String, but you are not saving that String anywhere. Remember that Strings in Java are immutable.
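A small sketch of what immutability means here, assuming value holds the numeric text such as "0"; the class and method names are made up for illustration.

```java
// Hypothetical demo class; not part of the question's program.
class ImmutableStrings {

    // Shows that replace() leaves the receiver unchanged and returns a new String.
    static String[] demo() {
        String value = "0";
        value.replace("0", "1");                  // result discarded, value is still "0"
        String updated = value.replace("0", "1"); // keep the returned String instead
        return new String[] { value, updated };
    }

    // Returns the numeric text incremented by one, e.g. "0" becomes "1".
    static String incremented(String value) {
        return Integer.toString(Integer.parseInt(value.trim()) + 1);
    }
}
```

In the question's code this would amount to assigning the result, for example `value = ImmutableStrings.incremented(value);`, and then writing the changed line back to the file.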
You say you use a BufferedWriter; did you flush and close it? This is often the cause of values mysteriously disappearing when they should be there, and it is exactly why Java has a finally keyword.
It is also difficult to answer without more details on your problem. What exactly are you trying to achieve? There may be simpler ways to do this that already exist.
