I want to split a text file into chunks of 50 lines. For example, if the file is 1010 lines, I should end up with 21 files.
I know how to count the number of files and the number of lines, but as soon as I write the output, it doesn't work.
I am using Camel Simple (in Talend), but the code is plain Java.
private void ExtractOrderFromBAC02(ProducerTemplate producerTemplate, InputStream content, String endpoint, String fileName, HashMap<String, Object> headers) {
    ArrayList<String> list = new ArrayList<String>();
    BufferedReader br = new BufferedReader(new InputStreamReader(content));
    String line;
    long numSplits = 50;
    int sourcesize = 0;
    int nof = 0;
    int number = 800;
    try {
        while ((line = br.readLine()) != null) {
            sourcesize++;
            list.add(line);
        }
        System.out.println("Lines in the file: " + sourcesize);
        double numberFiles = (sourcesize / numSplits);
        int numberFiles1 = (int) numberFiles;
        if (sourcesize <= 50) {
            nof = 1;
        } else {
            nof = numberFiles1 + 1;
        }
        System.out.println("No. of files to be generated :" + nof);
        for (int j = 1; j <= nof; j++) {
            number++;
            String Filename = "" + number;
            System.out.println(Filename);
            StringBuilder builder = new StringBuilder();
            for (String value : list) {
                builder.append("/n" + value);
            }
            producerTemplate.sendBodyAndHeader(endpoint, builder.toString(), "CamelFileName", Filename);
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            if (br != null) br.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
For people who don't know Camel, this line is used to send the file:

producerTemplate.sendBodyAndHeader(endpoint, builder.toString(), "CamelFileName", Filename);

endpoint ==> the destination (this part works fine with other code)
builder.toString() ==> the content to write
Filename ==> the name of the file (this also works fine with other code)
You count the lines first:

while ((line = br.readLine()) != null) {
    sourcesize++;
}

and then you are at the end of the file, so this reads nothing:

for (int i = 1; i <= numSplits; i++) {
    while ((line = br.readLine()) != null) {

You would have to seek back to the start of the file before reading again, but that is a waste of time and power, because you would read the file twice.

It is better to read the file once and for all, put it in a List<String> (resizable), and do your split using the lines stored in memory.
EDIT: it seems that you followed my advice and stumbled on the next issue. You should maybe have asked another question, well... this builds a buffer containing all the lines:

for (String value : list) {
    builder.append("/n" + value);
}

You have to use indexes on the list to build small files:

for (int k = 0; k < numSplits; k++) {
    builder.append("\n" + list.get(current_line++));
}

with current_line being the global line counter in your file. (Note also that "/n" is a literal slash followed by an n; the newline escape is "\n".) That way you create files of 50 different lines each time :)
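Putting it together, a minimal sketch of the corrected write loop (hedged: it reuses the variable names from the question, and the last file simply receives whatever lines remain):

int current_line = 0;
for (int j = 1; j <= nof; j++) {
    number++;
    String filename = "" + number;
    StringBuilder builder = new StringBuilder();
    // Take at most numSplits lines for this chunk; stop early on the last file.
    for (int k = 0; k < numSplits && current_line < list.size(); k++) {
        builder.append(list.get(current_line++)).append("\n");
    }
    producerTemplate.sendBodyAndHeader(endpoint, builder.toString(), "CamelFileName", filename);
}

With 1010 lines and numSplits = 50 this produces 21 files, the last one holding 10 lines.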
I have a .csv file. The data is divided by commas and I need to extract information from this file. The thing is, if I just write the following, it works, but only partially:
String file = "FinalProject/src/Data.csv";
BufferedReader rd = null;
String line = "";
HashSet<String> platforms = new HashSet<String>();
try {
    rd = new BufferedReader(new FileReader(file));
    rd.readLine();
    while ((line = rd.readLine()) != null) {
        String[] arr = line.split("\"");
        var words = new ArrayList<String>();
        for (int i = 0; i < arr.length; i++) {
            if (i % 2 == 0) {
                words.addAll(Arrays.asList(arr[i].split(",")));
            } else {
                words.add(arr[i]);
            }
            platforms.add(words.get(2));
        }
    }
} catch (Exception e) {
    System.out.println("");
} finally {
    try {
        rd.close();
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
When I check the contents of the set and extract the same data from the database created from this .csv file, they differ. For example, my set has 38 values while the database has 40, all of them unique (nothing is repeated). I think the problem is caused by the commas that separate the data in the .csv file: some of them are inside quotes, and this probably loses some of the values I need. Is there a solution to this problem? Or perhaps a more efficient way to handle the commas inside quotes so that they are ignored?
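One common workaround (a sketch, not a full CSV parser; it assumes fields never contain escaped quotes) is to split only on commas that are outside quotes, using a lookahead that requires an even number of quotes between the comma and the end of the line:

// Split on commas followed by an even number of double quotes,
// i.e. commas that are not inside a quoted field.
String[] fields = line.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", -1);

For anything beyond simple files, a dedicated CSV parsing library is the more robust choice.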
I'm trying to read nearly 120,000 lines from a file, put the data into a new record, and add this record to a list.
My problem is that I can't load all the data and I get weird behavior.
In particular, reading the file a first time with BufferedReader I can count the rows and the result is correct, but when I try to load the data into memory with a while loop, the loop iterates only about 60,000 times and the final list contains only about 5000 objects.
I've also tried using other classes for loading the data, but I always get the same problem.
I am currently using Java 17 with Spring and JavaFX.
Thank you.
I am attaching the latest version of my method:
public void getFixList(FixReadyCallback callback) {
    List<Fix> fixList;
    int firstCount = 0;
    int whileCount = 0;
    try {
        File file = new File("src/main/resources/fligh_data/fix.dat");
        BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file), "UTF-8"));
        String currentLine = null;
        while (reader.readLine() != null) {
            firstCount++;
        }
        fixList = new ArrayList<>(firstCount);
        reader.close();
        reader = new BufferedReader(new InputStreamReader(new FileInputStream(file), "UTF-8"));
        while ((currentLine = reader.readLine()) != null) {
            whileCount++;
            currentLine = reader.readLine();
            if (currentLine.matches(
                    "[-]?[0-9]{2}\\.[0-9]{6}\\s+[-]?[0-9]{3}\\.[0-9]{6}\\s+[0-9A-Z]{2,5}")) {
                String[] splitted = currentLine.split("\\s+");
                String denomination = splitted[2];
                double latitude = Double.parseDouble(splitted[0]);
                double longitude = Double.parseDouble(splitted[1]);
                Coordinates coordinates = new Coordinates(latitude, longitude);
                fixList.add(new Fix(denomination, coordinates));
            }
        }
        System.out.println("FIRST_COUNT -> " + firstCount);
        System.out.println("WHILE_COUNT -> " + whileCount);
        System.out.println("LIST_SIZE -> " + fixList.size());
        reader.close();
        callback.onReady(fixList);
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
And the output of the terminal:
FIRST_COUNT -> 119724
WHILE_COUNT -> 59862
LIST_SIZE -> 5128
while ((currentLine = reader.readLine()) != null) {
    whileCount++;
    currentLine = reader.readLine();
    ...
}

This skips every other line in your file: currentLine is already the next line of the file, and then you overwrite it with the line after that. I think you only meant to read one line per loop iteration, so it seems pretty clear that you should simply delete the second readLine() call:

while ((currentLine = reader.readLine()) != null) {
    whileCount++;
    ...
}
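For completeness, a sketch of the question's loop with the extra read removed and everything else unchanged:

while ((currentLine = reader.readLine()) != null) {
    whileCount++;
    if (currentLine.matches(
            "[-]?[0-9]{2}\\.[0-9]{6}\\s+[-]?[0-9]{3}\\.[0-9]{6}\\s+[0-9A-Z]{2,5}")) {
        String[] splitted = currentLine.split("\\s+");
        // latitude, longitude, then the fix name, exactly as in the original parsing code
        fixList.add(new Fix(splitted[2],
                new Coordinates(Double.parseDouble(splitted[0]),
                        Double.parseDouble(splitted[1]))));
    }
}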
I'm curious how to create an inverted index on data that doesn't fit into memory. Right now I'm reading a file directory and indexing the files based on their contents, using a HashMap to store the index. The code below is a snippet from a function I call on an entire directory. What do I do if this directory is massive and the HashMap can't fit all the entries? Yes, this does sound like premature optimization; I'm just having fun. I don't want to use Lucene, so don't even mention it, because I'm tired of seeing it as the majority answer to "index" questions. This HashMap is my only constraint; everything else is stored in files so I can easily reference things later on.
I'm just curious how I can do this, since the map stores entries like so:

keyword -> file1,file2,file3,etc..(locations)
keyword2 -> file9,file11,file13,etc..(locations)

My thought was to create a file that would somehow be able to update itself to match the format above, but I feel that's not efficient.
Code Snippet
br = new BufferedReader(new FileReader(file));
while ((line = br.readLine()) != null) {
    for (String _word : line.split("\\W+")) {
        word = _word.toLowerCase();
        if (!ignore_words.contains(word)) {
            fileLocations = index.get(word);
            if (fileLocations == null) {
                fileLocations = new LinkedList<Long>();
                index.put(word, fileLocations);
            }
            fileLocations.add(file_offset);
        }
    }
}
br.close();
Update:
So I managed to come up with something, but performance-wise I feel it is slow, especially with a large amount of data. I basically created a file that has the word and its offset on one line for each time the word appears. Let's name it index.txt.
It has a format like this:

word1:offset
word2:offset
word1:offset   <- encountered again
word3:offset
etc...

I then created one file per word and appended the offset to that file each time the word was encountered in the index.txt file. So the word files have a format like this:

word1.txt -- format
word1:offset1:offset2:offset3:offset4...and so on

Each time word1 is encountered in the index.txt file, its offset is appended to the end of word1.txt. Then, finally, I go through all the word files I created and overwrite index.txt with the final output, looking like this:

word1:offset1:offset2:offset3:offset4:...
word2:offset9:offset11:offset13:offset14:...
etc..

Then, to finish up, I delete all the word files.
The nasty code snippet for this is below; it's a fair amount.
public void createIndex(String word, long file_offset) {
    PrintWriter writer;
    try {
        writer = new PrintWriter(new FileWriter(this.file, true));
        writer.write(word + ":" + file_offset + "\n");
        writer.close();
    } catch (IOException ioe) {
        ioe.printStackTrace();
    }
}

public void mergeFiles() {
    String line;
    String wordLine;
    String[] contents;
    BufferedReader reader;
    BufferedReader mergeReader;
    PrintWriter writer;
    PrintWriter mergeWriter;
    try {
        reader = new BufferedReader(new FileReader(this.file));
        while ((line = reader.readLine()) != null) {
            contents = line.split(":");
            writer = new PrintWriter(new FileWriter(new File(contents[0] + ".txt"), true));
            if (this.words.get(contents[0]) == null) {
                this.words.put(contents[0], contents[0]);
                writer.write(contents[0] + ":");
            }
            writer.write(contents[1] + ":");
            writer.close();
        }
        reader.close();
        // This could be put in its own method below.
        mergeWriter = new PrintWriter(new FileWriter(this.file));
        for (String word : this.words.keySet()) {
            mergeReader = new BufferedReader(new FileReader(new File(word + ".txt")));
            while ((wordLine = mergeReader.readLine()) != null) {
                mergeWriter.write(wordLine + "\n");
            }
            mergeReader.close(); // close each word file, or deleteFiles() may fail
        }
        mergeWriter.close();
        deleteFiles();
    } catch (IOException ioe) {
        ioe.printStackTrace();
    }
}

public void deleteFiles() {
    File toDelete;
    for (String word : this.words.keySet()) {
        toDelete = new File(word + ".txt");
        if (toDelete.exists()) {
            toDelete.delete();
        }
    }
}
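If the per-word intermediate files turn out to be the bottleneck, a common alternative is external sorting: buffer postings in a sorted in-memory run, spill the run to disk when it grows too large, and merge the sorted runs in a single pass at the end. A minimal sketch of that idea (my own code, not the question's; the spill threshold is arbitrary):

import java.io.*;
import java.util.*;

public class SpillingIndexer {
    private final TreeMap<String, List<Long>> run = new TreeMap<>(); // current sorted run
    private final List<File> runFiles = new ArrayList<>();
    private static final int MAX_ENTRIES = 100_000; // arbitrary spill threshold
    private int entries = 0;

    public void add(String word, long offset) throws IOException {
        run.computeIfAbsent(word, k -> new ArrayList<>()).add(offset);
        if (++entries >= MAX_ENTRIES) spill();
    }

    // Write the current run to a temp file in sorted word order, then clear it.
    private void spill() throws IOException {
        File f = File.createTempFile("index-run", ".txt");
        try (PrintWriter w = new PrintWriter(new FileWriter(f))) {
            for (Map.Entry<String, List<Long>> e : run.entrySet())
                for (long off : e.getValue())
                    w.println(e.getKey() + ":" + off);
        }
        runFiles.add(f);
        run.clear();
        entries = 0;
    }

    // Merge the sorted runs into "word:off1:off2:..." lines in one pass.
    public void merge(File out) throws IOException {
        if (!run.isEmpty()) spill();
        PriorityQueue<Run> pq = new PriorityQueue<>(Comparator.comparing((Run r) -> r.line));
        for (File f : runFiles) {
            BufferedReader r = new BufferedReader(new FileReader(f));
            String first = r.readLine();
            if (first != null) pq.add(new Run(r, first)); else r.close();
        }
        try (PrintWriter w = new PrintWriter(new FileWriter(out))) {
            String current = null;
            while (!pq.isEmpty()) {
                Run top = pq.poll();
                String[] parts = top.line.split(":", 2);
                if (parts[0].equals(current)) {
                    w.print(":" + parts[1]);            // same word: append offset
                } else {
                    if (current != null) w.println();
                    w.print(parts[0] + ":" + parts[1]); // new word: start a new line
                    current = parts[0];
                }
                String next = top.reader.readLine();
                if (next != null) pq.add(new Run(top.reader, next)); else top.reader.close();
            }
            if (current != null) w.println();
        }
        for (File f : runFiles) f.delete();
    }

    private static final class Run {
        final BufferedReader reader;
        final String line;
        Run(BufferedReader reader, String line) { this.reader = reader; this.line = line; }
    }
}

Memory stays bounded by the run size, and the merge holds only one line per run in memory, so this scales to directories far larger than the heap.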
I have a csv file that currently has 20 lines of data.
The data contains employee info and is in the following format:
first name, last name, Employee ID
So one line would look like this: Emma, Nolan, 2
I know how to write to the file in Java and have all 20 lines print to the console, but I'm not sure how to get Java to print one specific line to the console.
I also want to take the employee ID number in the last entry and have Java add 1 to it once I add new employees. I'm thinking this needs to be done with a counter, I'm just not sure how.
You can do something like this:
BufferedReader reader = new BufferedReader(new FileReader(<<your file>>));
List<String> lines = new ArrayList<>();
String line = null;
while ((line = reader.readLine()) != null) {
    lines.add(line);
}
reader.close();
System.out.println(lines.get(0));

With BufferedReader you are able to read lines directly. This example reads the file line by line and stores the lines in an ArrayList; you can then access any line with lines.get(lineNumber).
You can read text from a file one line at a time and then do whatever you want with that line: print it, compare it, etc.

// Construct a BufferedReader object from the input file
BufferedReader r = new BufferedReader(new FileReader("employeeData.txt"));
int i = 1;
try {
    // "Prime" the while loop
    String line = r.readLine();
    while (line != null) {
        // Print a single line of the input file to the console
        System.out.println("Line " + i + ": " + line);
        // Prepare for the next loop iteration
        line = r.readLine();
        i++;
    }
} finally {
    // Free up file descriptor resources
    r.close();
}
// Remember the next available employee number in a one-up scheme
int nextEmployeeId = i;
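If the IDs can't be assumed to match line numbers, a small variation (a sketch; it assumes the ID is the third comma-separated field of the last line, as in the question's "Emma, Nolan, 2" format) is to parse the last record instead:

String lastLine = null;
String line;
BufferedReader r = new BufferedReader(new FileReader("employeeData.txt"));
while ((line = r.readLine()) != null) {
    lastLine = line; // remember the most recent line
}
r.close();
// The third field is the employee ID, e.g. "Emma, Nolan, 2" -> 2
int nextEmployeeId = Integer.parseInt(lastLine.split(",")[2].trim()) + 1;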
BufferedReader reader = new BufferedReader(new FileReader("yourfile.csv"));
String line = "";
while ((line = reader.readLine()) != null) {
    String[] employee = line.trim().split(",");
    // if you want to check whether it contains some name:
    // index 0 is the first name, index 1 is the last name, index 2 is the ID
}

Alternatively, if you want more control over reading CSV files, you can think about CsvBeanReader, which will give you more access to the file's contents.
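For reference, a hedged sketch of what that might look like with Super CSV's CsvBeanReader (assuming an Employee bean with String properties firstName, lastName, and id, plus matching setters):

import org.supercsv.io.CsvBeanReader;
import org.supercsv.io.ICsvBeanReader;
import org.supercsv.prefs.CsvPreference;

ICsvBeanReader beanReader = new CsvBeanReader(new FileReader("yourfile.csv"),
        CsvPreference.STANDARD_PREFERENCE);
String[] nameMapping = {"firstName", "lastName", "id"}; // assumed bean properties
Employee emp;
while ((emp = beanReader.read(Employee.class, nameMapping)) != null) {
    System.out.println(emp);
}
beanReader.close();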
Here is an approach I use for reading CSV files. The most flexible way is to read all the data in the CSV file into a 2D array first; it makes manipulating the data a lot easier.
That way you can specify which line of the file to print to the console by giving its index in the array, e.g. System.out.println(employee_Data[1][y]); for record 1, where y is the index variable for the fields. You would of course need a for loop to print every field of a line.
By the way, if you want to use the employee data in a larger program, which may for example store the data in a database or write it to another file, I'd recommend encapsulating this entire code block in a function named Read_CSV_File(), which returns a 2D String array.
My Code
// The return type of this function is a 2D String array.
// The CSVFile_path can be for example "employeeData.csv".
public static String[][] Read_CSV_File(String CSVFile_path) {
    List<String[]> records = new ArrayList<>();
    try {
        String line;
        BufferedReader in = new BufferedReader(new FileReader(CSVFile_path));
        // The program keeps looping until readLine() returns null,
        // i.e. the end of the file has been reached.
        while ((line = in.readLine()) != null) {
            records.add(line.split(","));
        }
        // This frees up the BufferedReader file descriptor resources
        in.close();
    } catch (IOException ioException) {
        // If an error occurs, it is caught and a message is displayed to the user.
        System.out.println("Exception: " + ioException);
    }
    // This assigns the data to the 2D array
    String[][] employee_Data = records.toArray(new String[0][]);
    // This prints to the console the specific line of your choice, here record 1
    System.out.println("Employee 1:");
    for (int y = 0; y < employee_Data[1].length; y++) {
        // Prints out all fields of record 1
        System.out.print(employee_Data[1][y] + ", ");
    }
    return employee_Data;
}
For reading a large file:

log.debug("****************Start Reading CSV File*******");
copyFile(inputCSVFile);
StringBuilder stringBuilder = new StringBuilder();
String line = "";
BufferedReader brOldFile = null;
try {
    String inputfile = inputCSVFile;
    log.info("inputfile:" + inputfile);
    brOldFile = new BufferedReader(new FileReader(inputfile));
    while ((line = brOldFile.readLine()) != null) {
        // line = replaceSpecialChar(line);
        /* do your stuff here */
        stringBuilder.append(line);
        stringBuilder.append("\n");
    }
    log.debug("****************End reading CSV File**************");
} catch (Exception e) {
    log.error(" exception in readStaffInfoCSVFile ", e);
} finally {
    if (null != brOldFile) {
        try {
            brOldFile.close();
        } catch (IOException e) {
            // ignore failures on close
        }
    }
}
return stringBuilder.toString();
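As a side note (not part of the original snippet): on Java 11+ the same whole-file read can be done in one call, assuming the file is UTF-8 encoded.

// Reads the entire file into a String in one call (java.nio.file).
String contents = Files.readString(Path.of(inputCSVFile));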
I'm extracting data from a file line by line into a database, and I can't figure out a proper way to flag the lines that I've already read into the database.
I have the following code, which I use to iterate through the file's lines; I try to verify that a line already carries my flag, and otherwise append the flag to it:
List<String> fileLines = new ArrayList<String>();
File logFile = new File("C:\\MyStuff\\SyslogCatchAllCopy.txt");
try {
    RandomAccessFile raf = new RandomAccessFile(logFile, "rw");
    String line = "";
    String doneReadingFlag = "##";
    Scanner fileScanner = new Scanner(logFile);
    while ((line = raf.readLine()) != null && !line.contains(doneReadingFlag)) {
        Scanner s = new Scanner(line);
        String temp = "";
        if (!s.hasNext(doneReadingFlag)) {
            fileLines.add(line);
            raf.write(doneReadingFlag.getBytes(), (int) raf.getFilePointer(),
                    doneReadingFlag.getBytes().length);
        } else {
            System.err.println("Already read");
        }
    }
} catch (FileNotFoundException e) {
    System.out.println("File not found" + e);
} catch (IOException e) {
    System.out.println("Exception while reading the file ");
}
// return fileLines;
// MoreProccessing(fileLines);
This code appends the flag to the next line and overwrites the characters at that position.
Any help?
When you write to a file, it doesn't insert, so you should expect it to replace the existing characters.
You need to reserve space in the file for the information you want to change, or you can record the information in another file.
Or, instead of marking each line, you can store somewhere the line number (or, better, the character position) you have read up to.
If you are not restarting your process, you can have the process read the file as it is appended, meaning you might not need to store where you are up to at all.
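A minimal sketch of the "store the position elsewhere" idea (my example, not the answer's code; the .pos side file is made up): persist raf.getFilePointer() when you stop, and seek back to it on the next run.

// Restore the last position, if any, then continue from there.
File progress = new File("C:\\MyStuff\\SyslogCatchAllCopy.txt.pos"); // hypothetical side file
RandomAccessFile raf = new RandomAccessFile(logFile, "r");
if (progress.exists()) {
    try (BufferedReader pr = new BufferedReader(new FileReader(progress))) {
        raf.seek(Long.parseLong(pr.readLine()));
    }
}
String line;
while ((line = raf.readLine()) != null) {
    // process the line...
}
// Remember how far we got for the next run.
try (PrintWriter pw = new PrintWriter(new FileWriter(progress))) {
    pw.println(raf.getFilePointer());
}
raf.close();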
@Peter Lawrey, I did as you said and it worked for me, as follows:
ArrayList<String> fileLines = new ArrayList<String>();
File logFile = new File("C:\\MyStuff\\MyFile.txt");
RandomAccessFile raf = new RandomAccessFile(logFile, "rw");
String line = "";
String doneReadingFlag = "#";
long oldOffset = raf.getFilePointer();
long newOffset = oldOffset;
while ((line = raf.readLine()) != null) {
    newOffset = raf.getFilePointer();
    if (!line.contains(doneReadingFlag)) {
        fileLines.add(line);
        raf.seek(oldOffset);
        // write() puts a single byte; writeChars() would write two bytes per
        // character and overwrite one character too many.
        raf.write(doneReadingFlag.getBytes());
        raf.seek(newOffset);
        System.out.println("Line added and flagged");
    } else {
        System.err.println("Already read");
    }
    oldOffset = newOffset;
}