I have an app where I process a very large file, extract the lines that share the same first 5 characters (I call this currentLineId), use them to create an object, and do something with it. Here is a sample of the file contents:
AZDFS12345678998765432345678
AZDFS09876545432345678987654
AZDFS34568987654567890987654
AZDFS12345670987654345678998
AZDFS12345098734567765123456
// the lines above have the same first 5 characters, they create Object1.
FGHJUY121324
FGHJUY089909
FGHJUYTTUTUU
//same for the lines above, they create Object2.
NB: the lines will always be ordered so that lines with the same first 5 characters sit together (directly above/below each other), so I won't have matching lines scattered all over the file.
My current function code:
private void processScpFile(File file) {
    LOGGER.info("Processing File: {} ", file.getName());
    try (var br = new BufferedReader(new FileReader(file))) {
        String currentLine;
        String lastLineId = null;
        List<String> similarLineIdsList = new ArrayList<>();
        while ((currentLine = br.readLine()) != null) {
            if (StringUtils.isEmpty(lastLineId)) {
                lastLineId = currentLine.substring(0, 5);
            }
            if (lastLineId.equals(currentLine.substring(0, 5))) {
                similarLineIdsList.add(currentLine);
            } else {
                doSomethingWithTheList(similarLineIdsList);
                similarLineIdsList.clear();
                similarLineIdsList.add(currentLine);
                lastLineId = currentLine.substring(0, 5);
            }
        }
        doSomethingWithTheList(similarLineIdsList);
    } catch (IOException e) {
        LOGGER.error("Couldn't read file, {}", e.getMessage(), e);
    }
}
Now this has worked well up until now, but going forward I have to process files with, for instance, over 100k lines sharing the same first 5 characters, which makes this process very slow.
Do you have any suggestions on how to make this process faster? Thank you.
Edit: to be precise, it's building the list of lines with the same first 5 characters that gets slower as the number of similar lines grows.
Related
I have this code which is used to read lines from a file and insert them into Postgres:
try {
    BufferedReader reader;
    try {
        reader = new BufferedReader(new FileReader(
                "C:\\in_progress\\test.txt"));
        String line = reader.readLine();
        while (line != null) {
            System.out.println(line);
            Thread.sleep(100);
            Optional<ProcessedWords> isFound = processedWordsService.findByKeyword(line);
            if (!isFound.isPresent()) {
                ProcessedWords obj = ProcessedWords.builder()
                        .keyword(line)
                        .createdAt(LocalDateTime.now())
                        .build();
                processedWordsService.save(obj);
            }
            // read next line
            line = reader.readLine();
        }
        reader.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
catch (Exception e) {
    e.printStackTrace();
}
How can I remove a line from the file after the line is inserted into the SQL database?
The issues with the current code:
Adhere to the Single Responsibility Principle. Your code is doing too many things: it reads from a file, performs the findByKeyword() call, prepares the data, and hands it out to be stored in the database. It can hardly be tested thoroughly, and it's very difficult to maintain.
Always use try-with-resources to get your resources closed under all circumstances.
Don't catch the general Exception type. Your code should only catch those exceptions which are more or less expected and for which there's a clear scenario on how to handle them; don't catch all the exceptions.
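A minimal sketch of the last two points combined (the file name and the handleLine() helper are placeholders, not part of the original code):
try (BufferedReader reader = new BufferedReader(new FileReader("words.txt"))) {
    String line;
    while ((line = reader.readLine()) != null) {
        handleLine(line); // delegate the actual work to a dedicated collaborator
    }
} catch (IOException e) {
    // an expected failure with a clear handling scenario: log it and stop
    logger.error("Couldn't read the file", e);
}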
How can I remove a line from the file after the line is inserted into the SQL database?
It is not possible to remove a line from a file in the literal sense. You can override the contents of the file or replace it with another file.
My advice would be to read the file data into memory, process it, and then write the lines which should be retained back into the same file (i.e. override the file contents).
You can argue that the file is huge and dumping it into memory would result in an OutOfMemoryError. And you want to read a line from a file, process it somehow, then store the processed data into the database and then write the line into a file... so that everything is done line by line, all actions in one go for a single line, and as a consequence all the code is crammed into one method. I hope that's not the case, because otherwise it's a clear XY problem.
Firstly, the file system isn't a reliable means of storing data, and it's not very fast. If the file is massive, then reading and writing it will take a considerable amount of time, and if that's done just in order to use a tiny bit of information, then this approach is wrong: that information should be stored and structured differently (i.e. consider placing it into a DB), so that it would be possible to retrieve the required data, and there would be no problem with removing entries that are no longer needed.
But if the file is lean and it doesn't contain critical data, then it's totally fine; I will proceed assuming that's the case.
The overall approach is to generate a map Map<String, Optional<ProcessedWords>> based on the file contents, save the words for which the optional comes back empty, and prepare a list of lines to retain when overriding the previous file content.
The code below is based on the NIO2 file system API.
public void readProcessAndRemove(ProcessedWordsService service, Path path) {
    Map<String, Optional<ProcessedWords>> result;
    try (var lines = Files.lines(path)) {
        result = processLines(service, lines);
    } catch (IOException e) {
        result = Collections.emptyMap();
        logger.error("Failed to read lines from {}", path, e);
    }
    List<String> linesToRetain = prepareAndSave(service, result);
    writeToFile(linesToRetain, path);
}
Processing the stream of lines returned by Files.lines():
private static Map<String, Optional<ProcessedWords>> processLines(ProcessedWordsService service,
                                                                  Stream<String> lines) {
    return lines.collect(Collectors.toMap(
            Function.identity(),
            service::findByKeyword,
            (first, second) -> first  // merge function, in case the file contains duplicate lines
    ));
}
Saving the words for which findByKeyword() returned an empty optional:
private static List<String> prepareAndSave(ProcessedWordsService service,
                                           Map<String, Optional<ProcessedWords>> wordByLine) {
    wordByLine.forEach((k, v) -> {
        if (v.isEmpty()) saveWord(service, k);
    });
    return getLinesToRetain(wordByLine);
}
private static void saveWord(ProcessedWordsService service, String line) {
    ProcessedWords obj = ProcessedWords.builder()
            .keyword(line)
            .createdAt(LocalDateTime.now())
            .build();
    service.save(obj);
}
Generating a list of lines to retain:
private static List<String> getLinesToRetain(Map<String, Optional<ProcessedWords>> wordByLine) {
    return wordByLine.entrySet().stream()
            .filter(entry -> entry.getValue().isPresent())
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
}
Overriding the file contents using Files.write(). Note: since varargs OpenOption isn't provided with any arguments, this call would be treated as if the CREATE, TRUNCATE_EXISTING, and WRITE options are present.
private static void writeToFile(List<String> lines, Path path) {
    try {
        Files.write(path, lines);
    } catch (IOException e) {
        logger.error("Failed to write to {}", path, e);
    }
}
For Reference
import java.io.*;

public class RemoveLinesFromAfterProcessed {

    public static void main(String[] args) throws Exception {
        String fileName = "TestFile.txt";
        String tempFileName = "tempFile";

        File mainFile = new File(fileName);
        File tempFile = new File(tempFileName);

        try (BufferedReader br = new BufferedReader(new FileReader(mainFile));
             PrintWriter pw = new PrintWriter(new FileWriter(tempFile))
        ) {
            String line;
            while ((line = br.readLine()) != null) {
                if (toProcess(line)) { // #1
                    // process the line and add it to the DB
                    // ignore the line (i.e., don't add it to the temp file)
                } else {
                    // add to temp file.
                    pw.write(line + "\n"); // #2
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }

        // delete the old file
        boolean hasDeleted = mainFile.delete(); // #3
        if (!hasDeleted) {
            throw new Exception("Can't delete file!");
        }
        boolean hasRenamed = tempFile.renameTo(mainFile); // #4
        if (!hasRenamed) {
            throw new Exception("Can't rename file!");
        }
        System.out.println("Done!");
    }

    private static boolean toProcess(String line) {
        // any condition
        // sample condition for example
        return line.contains("aa");
    }
}
Read the file.
1: The condition to decide whether to delete the line or to retain it.
2: Write those line which you don't want to delete into the temporary file.
3: Delete the original file.
4: Rename the temporary file to original file name.
The basic idea is the same as what @Shiva Rahul said in his answer.
However, another approach can be: store all the line numbers you want to delete in a list. Once you have all the required line numbers, you can use LineNumberReader to check each line and duplicate your main file.
I have mostly used this technique in batch inserts where I was unsure how many lines a particular file might have, and where a lot of processing had to be done before removing lines.
It may not be suitable for your case; I'm just posting the suggestion here in case anyone bumps into this thread.
private void deleteLines(String inputFilePath, String outputDirectory, List<Integer> lineNumbers) throws IOException {
    File tempFile = new File("temp.txt");
    File inputFile = new File(inputFilePath);
    // LineNumberReader lets us fetch the line number of each line;
    // try-with-resources ensures both streams are closed
    try (LineNumberReader lineReader = new LineNumberReader(new FileReader(inputFile));
         BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter(tempFile))) {
        String currentLine;
        while ((currentLine = lineReader.readLine()) != null) {
            // if the current line number is present in the remove list, put an empty line in the new file
            if (lineNumbers.contains(lineReader.getLineNumber())) {
                currentLine = "";
            }
            bufferedWriter.write(currentLine + System.getProperty("line.separator"));
        }
    }
    // delete the main file and rename the temp file to the original file name
    boolean delete = inputFile.delete();
    // boolean b = tempFile.renameTo(inputFile); // use this to save the temp file in the same directory
    boolean b = tempFile.renameTo(new File(outputDirectory + inputFile.getName()));
}
To use this function, all you have to do is gather all the required line numbers. inputFilePath is the path of the source file, and outputDirectory is where I want to store the file after processing.
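For example, a hypothetical call site (the line numbers and paths below are made up purely for illustration, and IOException handling is omitted):
List<Integer> toDelete = List.of(3, 7, 12); // line numbers already inserted into the DB
deleteLines("C:\\in_progress\\test.txt", "C:\\out\\", toDelete); // note the trailing separator on the output directory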
Hopefully my explanation does me some justice. I am pretty new to Java. I have a text file that looks like this:
Java
The Java Tutorials
http://docs.oracle.com/javase/tutorial/
Python
Tutorialspoint Java tutorials
http://www.tutorialspoint.com/python/
Perl
Tutorialspoint Perl tutorials
http://www.tutorialspoint.com/perl/
I have properties for language name, website description, and website url. Right now, I just want to list the information from the text file exactly how it looks, but I need to assign those properties to them.
The problem I am getting is "index 1 is out of bounds for length 1"
try {
    BufferedReader in = new BufferedReader(new FileReader("Tutorials.txt"));
    while (in.readLine() != null) {
        TutorialWebsite tw = new TutorialWebsite();
        str = in.readLine();
        String[] fields = str.split("\\r?\\n");
        tw.setProgramLanguage(fields[0]);
        tw.setWebDescription(fields[1]);
        tw.setWebURL(fields[2]);
        System.out.println(tw);
    }
} catch (IOException e) {
    e.printStackTrace();
}
I wanted to test something, so I removed the new lines and put commas instead and used str.split(","), which printed it out just fine, but I'm sure I would get points taken off if I changed the format.
readLine returns a "string containing the contents of the line, not including any line-termination characters", so why are you trying to split each line on "\\r?\\n"?
Where is str declared? Why are you reading two lines for each iteration of the loop, and ignoring the first one?
I suggest you start from
String str;
while ((str = in.readLine()) != null) {
    System.out.println(str);
}
and work from there.
The first readLine gets the language, the second gets the description, and the third gets the url, and then the pattern repeats. There is nothing to stop you using readLine three times for each iteration of the while loop.
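For instance, a minimal sketch of that idea (assuming the file always holds complete language/description/url triples, and TutorialWebsite is as in the question):
try (BufferedReader in = new BufferedReader(new FileReader("Tutorials.txt"))) {
    String language;
    while ((language = in.readLine()) != null) {
        TutorialWebsite tw = new TutorialWebsite();
        tw.setProgramLanguage(language);
        tw.setWebDescription(in.readLine()); // second line of the triple
        tw.setWebURL(in.readLine());         // third line of the triple
        System.out.println(tw);
    }
} catch (IOException e) {
    e.printStackTrace();
}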
You can read the whole file into a String like this:
// str will hold all the file contents
StringBuilder str = new StringBuilder();
// try with resources, to make sure BufferedReader is closed safely
try (BufferedReader in = new BufferedReader(new FileReader("Tutorials.txt"))) {
    String line;
    while ((line = in.readLine()) != null) {
        str.append(line);
        str.append("\n");
    }
} catch (IOException e) {
    e.printStackTrace();
}
Later you can split the string with
String[] fields = str.toString().split("[\\n\\r]+");
Why not try it like this:
allocate a List to hold the TutorialWebsite instances;
use try-with-resources to open the file, read the lines, and trim any white space;
put the lines in an array;
then iterate over the array, filling in the class instances;
then print the list.
The loop ensures the array length is a multiple of nFields, discarding any remainder. So if your total line count is not divisible by nFields, you will not read the remainder of the file. You would still have to adjust the setters if additional fields were added.
int nFields = 3;
List<TutorialWebsite> list = new ArrayList<>();
try (BufferedReader in = new BufferedReader(new FileReader("tutorials.txt"))) {
    String[] lines = in.lines().map(String::trim).toArray(String[]::new);
    for (int i = 0; i < (lines.length / nFields) * nFields; i += nFields) {
        TutorialWebsite tw = new TutorialWebsite();
        tw.setProgramLanguage(lines[i]);
        tw.setWebDescription(lines[i + 1]);
        tw.setWebURL(lines[i + 2]);
        list.add(tw);
    }
} catch (IOException ioe) {
    ioe.printStackTrace();
}
list.forEach(System.out::println);
An improvement would be to use a constructor and pass the strings to it when each instance is created.
And remember that the file name as specified is relative to the directory in which the program is run.
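For the constructor improvement, a minimal sketch (the field names are assumed from the setters used above):
class TutorialWebsite {
    private final String programLanguage;
    private final String webDescription;
    private final String webURL;

    TutorialWebsite(String programLanguage, String webDescription, String webURL) {
        this.programLanguage = programLanguage;
        this.webDescription = webDescription;
        this.webURL = webURL;
    }

    @Override
    public String toString() {
        return programLanguage + " | " + webDescription + " | " + webURL;
    }
}
The loop body then shrinks to list.add(new TutorialWebsite(lines[i], lines[i + 1], lines[i + 2]));, and the instances become immutable as a bonus.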
I want to cut a text file.
I want to cut the file into chunks of 50 lines.
For example, if the file is 1010 lines, I would recover 21 files.
I know how to count the number of files and the number of lines, but as soon as I write, it doesn't work.
I use Camel Simple (Talend), but it's Java code.
private void ExtractOrderFromBAC02(ProducerTemplate producerTemplate, InputStream content,
                                   String endpoint, String fileName, HashMap<String, Object> headers) {
    ArrayList<String> list = new ArrayList<String>();
    BufferedReader br = new BufferedReader(new InputStreamReader(content));
    String line;
    long numSplits = 50;
    int sourcesize = 0;
    int nof = 0;
    int number = 800;
    try {
        while ((line = br.readLine()) != null) {
            sourcesize++;
            list.add(line);
        }
        System.out.println("Lines in the file: " + sourcesize);

        double numberFiles = (sourcesize / numSplits);
        int numberFiles1 = (int) numberFiles;
        if (sourcesize <= 50) {
            nof = 1;
        } else {
            nof = numberFiles1 + 1;
        }
        System.out.println("No. of files to be generated: " + nof);

        for (int j = 1; j <= nof; j++) {
            number++;
            String Filename = "" + number;
            System.out.println(Filename);

            StringBuilder builder = new StringBuilder();
            for (String value : list) {
                builder.append("\n" + value);
            }
            producerTemplate.sendBodyAndHeader(endpoint, builder.toString(), "CamelFileName", Filename);
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            if (br != null) br.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
For people who don't know Camel, this line is used to send the file:
producerTemplate.sendBodyAndHeader(endpoint, builder.toString(), "CamelFileName", Filename);
endpoint ==> the destination (this works fine in my other code)
builder.toString() ==> the values being sent
And then the file name (also fine in my other code).
You count the lines first:
while ((line = br.readLine()) != null) {
    sourcesize++;
}
and by then you're at the end of the file, so you read nothing here:
for (int i = 1; i <= numSplits; i++) {
    while ((line = br.readLine()) != null) {
You have to seek back to the start of the file before reading again.
But that's a waste of time & power because you'll read the file twice
It's better to read the file once and for all, put it in a List<String> (resizable), and proceed with your split using the lines stored in memory.
EDIT: it seems that you followed my advice and stumbled on the next issue. You should maybe have asked another question; well... this creates a buffer with all the lines:
for (String value : list) {
    builder.append("\n" + value);
}
You have to use indexes on the list to build small files.
for (int k = 0; k < numSplits; k++) {
    builder.append("\n" + list.get(current_line++));
}
with current_line being the global line counter in your file. That way you create files of 50 different lines each time :)
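Putting that together, a sketch of the corrected splitting loop (variable names follow the question; the bounds check against the end of the list is an addition):
int current_line = 0;
for (int j = 1; j <= nof; j++) {
    number++;
    String filename = "" + number;
    StringBuilder builder = new StringBuilder();
    // take at most numSplits lines for this chunk, stopping at the end of the list
    for (int k = 0; k < numSplits && current_line < list.size(); k++) {
        builder.append("\n").append(list.get(current_line++));
    }
    producerTemplate.sendBodyAndHeader(endpoint, builder.toString(), "CamelFileName", filename);
}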
I'm working in Android, trying to write some lines into a file. After a certain number of lines, say 100, I want the file to delete the first line and then append a line to the end. So basically, I want to keep writing to the file while keeping it capped at 100 lines. I have been reading and found that files in Java aren't too friendly with what I'm trying to do. I haven't found anything here either: http://docs.oracle.com/javase/tutorial/essential/io/file.html
Is there a better way of keeping a file to the limit of 100 lines, and deleting the oldest lines when adding the new lines after that?
More specifically, I want a textView to display the 100 most recent events that a service has sent.
As of now, I have this method to display my STORETEXT file:
public void readFileInEditor() {
    try {
        InputStream in = openFileInput(STORETEXT);
        if (in != null) {
            InputStreamReader tmp = new InputStreamReader(in);
            BufferedReader reader = new BufferedReader(tmp);
            String str;
            StringBuilder buf = new StringBuilder();
            while ((str = reader.readLine()) != null) {
                buf.append(str + "\n");
            }
            in.close();
            writelog.setText(buf.toString());
        }
    } catch (java.io.FileNotFoundException e) {
        // that's OK, we probably haven't created it yet
    } catch (Throwable t) {
        Toast.makeText(this, "Exception: " + t.toString(),
                Toast.LENGTH_LONG).show();
    }
}
and I write to the file like this...
OutputStreamWriter out = new OutputStreamWriter(openFileOutput(
STORETEXT, MODE_APPEND));
out.write("Some User Activity");
out.write("\n");
out.close();
I want to modify my code to only write the 100 most recent activities, and then set that to my textView. Thanks for any help.
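One way to do this (a sketch only, reusing STORETEXT, openFileInput(), and openFileOutput() from the question; not tested on a device) is to read the existing lines, append the new entry, trim the list to the last 100, and rewrite the file:
private void appendCapped(String newLine) throws IOException {
    List<String> lines = new ArrayList<>();
    // read whatever is currently stored (the file may not exist yet)
    try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(openFileInput(STORETEXT)))) {
        String str;
        while ((str = reader.readLine()) != null) {
            lines.add(str);
        }
    } catch (FileNotFoundException e) {
        // first write, nothing to trim yet
    }
    lines.add(newLine);
    // drop the oldest entries so at most 100 remain
    while (lines.size() > 100) {
        lines.remove(0);
    }
    // rewrite the whole file; MODE_PRIVATE truncates instead of appending
    try (OutputStreamWriter out = new OutputStreamWriter(
            openFileOutput(STORETEXT, MODE_PRIVATE))) {
        for (String line : lines) {
            out.write(line);
            out.write("\n");
        }
    }
}
readFileInEditor() can then be called afterwards to refresh the textView.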
I have a text file called "high.txt". I need the data inside for my Android app, but I have absolutely no idea how to read it into an ArrayList of Strings. I tried the normal way of doing it in Java, but apparently that doesn't work in Android since it can't find the file. So how do I go about doing this? I have put it in my res folder. But how do you take the input stream that you get from opening the file within Android and read it into an ArrayList of Strings? I am stuck on that part.
Basically it would look something like this:
3. What do you do for an upcoming test?
L: make sure I know what I'm studying and really review and study for this thing. Its what Im good at. Understand the material really well.
CL: Time to study. I got this, but I really need to make sure I know it,
M: Tests can be tough, but there are tips and tricks. Focus on the important, interesting stuff. Cram in all the little details just to get past this test.
CR: -sigh- I don't like these tests. Hope I've studied enough to pass or maybe do well.
R: Screw the test. I'll study later, day before should be good.
This is for a sample question and all the lines will be stored as separate strings in the array list.
If you put the text file in your assets folder you can use code like this which I've taken and modified from one of my projects:
public static void importData(Context context) {
    try (BufferedReader br = new BufferedReader(
            new InputStreamReader(context.getAssets().open("high.txt")))) {
        String line;
        while ((line = br.readLine()) != null) {
            String[] columns = line.split(",");
            Model model = new Model();
            // "dd/MM/yyyy" matches the day-first sample data shown below
            model.date = DateUtil.getCalendar(columns[0], "dd/MM/yyyy");
            model.name = columns[1];
            dbHelper.insertModel(model);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Within the loop you can do anything you need with the columns; what this example does is create an object from each row and save it in the database.
For this example the text file would look something like this:
15/04/2013,Bob
03/03/2013,John
21/04/2013,Steve
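Since the question asks for an ArrayList of Strings specifically, a minimal variant of the same assets approach (again assuming high.txt sits in the assets folder) could be:
public static List<String> readLines(Context context) {
    List<String> lines = new ArrayList<>();
    try (BufferedReader br = new BufferedReader(
            new InputStreamReader(context.getAssets().open("high.txt")))) {
        String line;
        while ((line = br.readLine()) != null) {
            lines.add(line); // one list entry per line, exactly as the file reads
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return lines;
}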
If you want to read a file from external storage, then use the method below.
public void readFileFromExternal() {
    String path = Environment.getExternalStorageDirectory().getPath()
            + "/AppTextFile.txt";
    try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
        StringBuilder results = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            results.append(line);
        }
        Log.d("FILE", "Data in your file : " + results);
    } catch (IOException e) {
        Log.e("FILE", "Couldn't read " + path, e);
    }
}
// find all files in the folder /assets/txt/
String[] elements = new String[0];
try {
    elements = getAssets().list("txt");
} catch (IOException e) {
    e.printStackTrace();
}
// for every file, read the text line by line
for (String fileName : elements) {
    Log.d("xxx", "File: " + fileName);
    try (BufferedReader bufferedReader = new BufferedReader(
            new InputStreamReader(getAssets().open("txt/" + fileName)))) {
        String line;
        while ((line = bufferedReader.readLine()) != null) {
            Log.d("xxx", line);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}