Create multiple files from one text file in java - java

I have an input.txt file that contains, say, 520 lines.
I need to write Java code that does the following:
Create the first file, file-001.txt, from the first 200 lines, then file-002.txt from lines 201-400, then file-003.txt from the remaining lines.
I have written the code below, but it only writes the first 200 lines. What do I need to change to make it handle the scenario above?
public class DataMaker {
public static void main(String args[]) throws IOException{
DataMaker dm=new DataMaker();
String file= "D:\\input.txt";
int roll=1;
String rollnum ="file-00"+roll;
String outputfilename="D:\\output\\"+rollnum+".txt";
String urduwords;
String path;
ArrayList<String> where = new ArrayList<String>();
int temp=0;
try(BufferedReader br = new BufferedReader(new FileReader(file))) {
for(String line; (line = br.readLine()) != null; ) {
++temp;
if(temp<201){ //may be i need some changes here
dm.filewriter(line+" "+temp+")",outputfilename);
}
}
} catch (FileNotFoundException e) {
System.out.println("File not found");
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
void filewriter(String linetoline,String filename) throws IOException{
BufferedWriter fbw =null;
try{
OutputStreamWriter writer = new OutputStreamWriter(
new FileOutputStream(filename, true), "UTF-8");
fbw = new BufferedWriter(writer);
fbw.write(linetoline);
fbw.newLine();
}catch (Exception e) {
System.out.println("Error: " + e.getMessage());
}
finally {
fbw.close();
}
}
}
One way would be a chain of if/else statements, but that is not practical because my actual file has 6000+ lines.
I want to run the code once and get 30+ output files.

You can change the following bit:
if(temp<201){ //may be i need some changes here
dm.filewriter(line+" "+temp+")",outputfilename);
}
to this:
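dm.filewriter(line, "D:\\output\\file-00" + (((temp - 1) / 200) + 1) + ".txt");
Because temp is incremented before it is used, it is 1-based, so (temp - 1) / 200 sends lines 1-200 to the first file, lines 201-400 to the second, and so on. With more than nine files the fixed "file-00" prefix produces names such as file-0010.txt; String.format("file-%03d.txt", n) keeps the number three digits wide.
Also, consider batching each group of 200 lines and writing it in one go rather than creating a new writer for every line.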
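The batching idea can be sketched roughly as follows. This is a minimal sketch, not the original poster's code: the class name LineChunkSplitter, the split method and the "file-%03d.txt" naming pattern are assumptions chosen for illustration. It keeps one writer open per output file and opens the next file only when there is a line to put in it, so no empty trailing file appears when the line count is an exact multiple of the chunk size.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class LineChunkSplitter {

    /**
     * Copies input into files of at most linesPerFile lines each
     * and returns the number of files written.
     */
    public static int split(Path input, Path outputDir, int linesPerFile) throws IOException {
        Files.createDirectories(outputDir);
        int fileCount = 0;
        int linesInCurrent = 0;
        BufferedWriter writer = null;
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Open the next file lazily, only when there is a line to write.
                if (writer == null) {
                    fileCount++;
                    // Hypothetical naming scheme: file-001.txt, file-002.txt, ...
                    Path out = outputDir.resolve(String.format("file-%03d.txt", fileCount));
                    writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8);
                    linesInCurrent = 0;
                }
                writer.write(line);
                writer.newLine();
                if (++linesInCurrent == linesPerFile) {
                    writer.close();
                    writer = null;
                }
            }
        } finally {
            if (writer != null) {
                writer.close();
            }
        }
        return fileCount;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("Usage: java LineChunkSplitter <input-file> <output-dir>");
            return;
        }
        int files = split(Path.of(args[0]), Path.of(args[1]), 200);
        System.out.println("Wrote " + files + " files");
    }
}
```

For the paths in the question, this would be run with D:\input.txt and D:\output as the two arguments.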

You could write a method that creates a Writer for the current file, reads up to limit lines, closes the Writer, and then returns true if it read the full limit and false if it could not (that is, stop: don't attempt to read more lines or write the next file).
You would then call this method in a loop, passing the Reader, the new file name, and the limit.
Here is an example:
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.OutputStreamWriter;
public class DataMaker {
public static void main(final String args[]) throws IOException {
DataMaker dm = new DataMaker();
String file = "D:\\input.txt";
int roll = 1;
String rollnum = null;
String outputfilename = null;
boolean shouldContinue = false;
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
do {
rollnum = "file-00" + roll;
outputfilename = "D:\\output\\" + rollnum + ".txt";
shouldContinue = dm.fillFile(outputfilename, br, 200);
roll++;
} while (shouldContinue);
} catch (FileNotFoundException e) {
System.out.println("File not found");
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
private boolean fillFile(final String outputfilename, final BufferedReader reader, final int limit)
throws IOException {
String line = null;
int temp = 0;
// try-with-resources closes the writer even if a write fails,
// and never calls close() on a writer that failed to open
try (BufferedWriter fbw = new BufferedWriter(new OutputStreamWriter(
new FileOutputStream(outputfilename, true), "UTF-8"))) {
while (temp < limit && ((line = reader.readLine()) != null)) {
temp++;
fbw.write(line);
fbw.newLine();
}
}
// continue only if a full chunk was read; a short chunk means the input is exhausted
return temp == limit;
}
}

Related

java split large files into smaller files while splitting the multiline record without breaking the record in incomplete state

I have a file in which each record spans multiple lines. The only way to identify the end of a record is that the next record starts with ABC. A sample is below. The file could be 5-10 GB, and I am looking for efficient Java logic that only splits the file (without processing every line), but each new file must begin at the start of a record, which in this case is a line starting with "ABC".
To add a few more details: I only need to split the file, and the last record in each output file must be complete.
Can someone please suggest an approach?
HDR
ABCline1goesonforrecord1 //first record
line2goesonForRecord1
line3goesonForRecord1
line4goesonForRecord1
ABCline2goesOnForRecord2 //second record
line2goesonForRecord2
line3goesonForRecord2
line4goesonForRecord2
line5goesonForRecord2
ABCline2goesOnForRecord3 //third record
line2goesonForRecord3
line3goesonForRecord3
line4goesonForRecord3
TRL
This is the code you need. I tested it on a 10 GB file and it took 64 seconds to split.
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.TimeUnit;
public class FileSplitter {
private final Path filePath;
private BufferedWriter writer;
private int fileCounter = 1;
public static void main(String[] args) throws Exception {
long startTime = System.nanoTime();
new FileSplitter(Path.of("/tmp/bigfile.txt")).split();
System.out.println("Time to split " + TimeUnit.NANOSECONDS.toSeconds(System.nanoTime() - startTime));
}
private static void generateBigFile() throws Exception {
var writer = Files.newBufferedWriter(Path.of("/tmp/bigfile.txt"), StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING);
for (int i = 0; i < 100_000; i++) {
writer.write(String.format("ABCline1goesonforrecord%d\n", i + 1));
for (int j = 0; j < 10_000; j++) {
writer.write(String.format("line%dgoesonForRecord%d\n", j + 2, i + 1));
}
}
writer.flush();
writer.close();
}
public FileSplitter(Path filePath) {
this.filePath = filePath;
}
void split() throws IOException {
try (var stream = Files.lines(filePath, StandardCharsets.UTF_8)) {
stream.forEach(line -> {
if (line.startsWith("ABC")) {
closeWriter();
openWriter();
}
writeLine(line);
});
}
closeWriter();
}
private void writeLine(String line) {
if (writer != null) {
try {
writer.write(line);
writer.write("\n");
} catch (IOException e) {
throw new UncheckedIOException("Failed to write line to file part", e);
}
}
}
private void openWriter() {
if (this.writer == null) {
var filePartName = filePath.getFileName().toString().replace(".", "_part" + fileCounter + ".");
try {
writer = Files.newBufferedWriter(Path.of("/tmp/split", filePartName), StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING);
} catch (IOException e) {
throw new UncheckedIOException("Failed to open file part", e);
}
fileCounter++;
}
}
private void closeWriter() {
if (writer != null) {
try {
writer.flush();
writer.close();
writer = null;
} catch (IOException e) {
throw new UncheckedIOException("Failed to close writer", e);
}
}
}
}
By the way, the Scanner-based solution works too.
Regarding not reading all the lines: I don't see why you would want that. It is possible, but it would overcomplicate the solution, and I'm fairly sure you would lose performance because of the extra logic the splitting would need.
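The splitter in this answer starts a new part at every line beginning with "ABC", which gives one output file per record. If fewer, larger parts are wanted, one possible variation is to roll over only at a record boundary and only once the current part has reached a target size. The sketch below is an illustration under assumptions, not the answerer's code: the class name RecordChunkSplitter, the maxCharsPerPart parameter and the part-N.txt naming are hypothetical, and size is counted in characters rather than bytes. As in the answer, lines before the first record (such as HDR) are skipped and a trailing TRL line ends up in the last part.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class RecordChunkSplitter {

    /**
     * Splits input into parts of roughly maxCharsPerPart characters, starting a
     * new part only at a line that begins with recordPrefix. Returns the part count.
     */
    public static int split(Path input, Path outputDir, String recordPrefix, long maxCharsPerPart)
            throws IOException {
        Files.createDirectories(outputDir);
        int parts = 0;
        long charsInPart = 0;
        BufferedWriter writer = null;
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                boolean startsRecord = line.startsWith(recordPrefix);
                // Roll over only on a record boundary, so no record is cut in half.
                if (startsRecord && (writer == null || charsInPart >= maxCharsPerPart)) {
                    if (writer != null) {
                        writer.close();
                    }
                    parts++;
                    // Hypothetical naming scheme: part-1.txt, part-2.txt, ...
                    writer = Files.newBufferedWriter(
                            outputDir.resolve("part-" + parts + ".txt"), StandardCharsets.UTF_8);
                    charsInPart = 0;
                }
                if (writer != null) { // lines before the first record are skipped
                    writer.write(line);
                    writer.newLine();
                    charsInPart += line.length() + 1;
                }
            }
        } finally {
            if (writer != null) {
                writer.close();
            }
        }
        return parts;
    }
}
```

A part can exceed the target by up to one record, because the size check happens only when the next record begins.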
I didn't test this, but something like the following should work. It does not load the whole file into memory, only one line at a time, so memory use should not be a problem.
public void spiltRecords(String filename) {
/*
HDR
ABCline1goesonforrecord1 //first record
line2goesonForRecord1
line3goesonForRecord1
line4goesonForRecord1
ABCline2goesOnForRecord2 //second record
line2goesonForRecord2
line3goesonForRecord2
line4goesonForRecord2
line5goesonForRecord2
ABCline2goesOnForRecord3 //third record
line2goesonForRecord3
line3goesonForRecord3
line4goesonForRecord3
TRL
*/
try {
// First pass: record the line index at which each record starts, so the
// existing file is never edited in case things go wrong.
LinkedList<Long> startOfRecordIndexes = new LinkedList<>();
long index = 0;
try (Scanner scanFile = new Scanner(new File(filename))) {
while (scanFile.hasNextLine()) {
if (scanFile.nextLine().startsWith("ABC")) {
startOfRecordIndexes.add(index);
}
index++;
}
}
// Second pass: Scanner.reset() only resets the delimiter, locale and radix; it does
// not rewind the input, so a new Scanner is opened to read from the start again.
index = 0;
BufferedWriter writer = null;
try (Scanner scanFile = new Scanner(new File(filename))) {
while (scanFile.hasNextLine()) {
String line = scanFile.nextLine();
if (!startOfRecordIndexes.isEmpty() && index == startOfRecordIndexes.peek()) {
if (writer != null) {
writer.write("TRL");
writer.newLine();
writer.close();
}
// "Give unique filename" is a placeholder: supply a distinct name per record
writer = new BufferedWriter(new OutputStreamWriter(
new FileOutputStream("Give unique filename"), StandardCharsets.UTF_8));
writer.write("HDR");
writer.newLine();
startOfRecordIndexes.remove();
}
// The input's own HDR and TRL lines are skipped; each output gets its own pair
if (writer != null && !line.equals("TRL")) {
writer.write(line);
writer.newLine();
}
index++;
}
} finally {
// Close the last record
if (writer != null) {
writer.write("TRL");
writer.newLine();
writer.close();
}
}
} catch (IOException e) {
// deal with exception
}
}

Concatenate multiple files

I am trying to concatenate multiple text files. The program is working correctly, but if I do not know the total number of files, then how should the for loop be changed?
public class MultipleMerge {
public static void main(String[] args) {
BufferedReader br = null;
BufferedWriter bw = null;
String inFileName = "C:\\Users\\dokania\\Desktop\\Bio\\Casp10\\fasta\\out";
File file = new File("C:\\Users\\dokania\\Desktop\\New folder\\out.txt");
try {
String s;
int fileCounter = 0;
FileWriter fw = new FileWriter(file.getAbsoluteFile());
bw = new BufferedWriter(fw);
for (fileCounter = 0; fileCounter < 157; fileCounter++) {
br = new BufferedReader(new FileReader(inFileName + (fileCounter++) + ".fa"));
while ((s = br.readLine()) != null) {
bw.write(s + "\n");
}
}
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
if (br != null) {
br.close();
bw.close();
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
}
Try getting an array of the files in the directory:
File[] array = new File("C:\\Users\\dokania\\Desktop\\Bio\\Casp10\\fasta\\").listFiles();
Then go through all the files with a for-each loop:
for(File file:array){
//...
}
You may need to pass a FileFilter to listFiles() so that only the input files are selected:
http://docs.oracle.com/javase/7/docs/api/java/io/FileFilter.html
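Putting the directory listing and the filter together, a merge that does not need to know the file count in advance could look roughly like this. It is a sketch under assumptions: the class name DirectoryMerge, the merge method and its parameters are made up for illustration, and each input file is assumed small enough to read into memory in one go.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class DirectoryMerge {

    /** Concatenates every file in inputDir ending with extension into output; returns the file count. */
    public static int merge(File inputDir, String extension, Path output) throws IOException {
        // Keep only regular files with the wanted extension.
        FileFilter filter = f -> f.isFile() && f.getName().endsWith(extension);
        File[] files = inputDir.listFiles(filter);
        if (files == null) {
            throw new IOException("Not a readable directory: " + inputDir);
        }
        // listFiles() guarantees no order; this sort is lexicographic by path,
        // so out10.fa comes before out2.fa.
        Arrays.sort(files);
        try (BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (File file : files) {
                for (String line : Files.readAllLines(file.toPath(), StandardCharsets.UTF_8)) {
                    writer.write(line);
                    writer.newLine();
                }
            }
        }
        return files.length;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("Usage: java DirectoryMerge <input-dir> <output-file>");
            return;
        }
        int merged = merge(new File(args[0]), ".fa", Path.of(args[1]));
        System.out.println("Merged " + merged + " files");
    }
}
```

If the output file lives in the input directory, give it a different extension so it is not picked up as an input.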
You could use command line arguments:
public class CommandLineTest {
public static void main(String[] args) {
int howManyFiles = Integer.parseInt(args[0]);
}
}
The code above reads the first command-line argument and parses it as an integer. In your own code you should also check that an argument was actually supplied and that it really is an integer.
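One way to do that check is sketched below. The class name FileCountArgument and the parseFileCount helper are hypothetical, and rejecting negative values is an assumption about what counts as a valid file count.

```java
import java.util.OptionalInt;

public class FileCountArgument {

    /** Returns the first argument as a non-negative int, or empty if it is missing or invalid. */
    public static OptionalInt parseFileCount(String[] args) {
        if (args.length == 0) {
            return OptionalInt.empty();
        }
        try {
            int value = Integer.parseInt(args[0].trim());
            return value >= 0 ? OptionalInt.of(value) : OptionalInt.empty();
        } catch (NumberFormatException e) {
            // The argument was present but not an integer.
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        OptionalInt howManyFiles = parseFileCount(args);
        if (howManyFiles.isPresent()) {
            System.out.println("Merging " + howManyFiles.getAsInt() + " files");
        } else {
            System.err.println("Usage: java FileCountArgument <number-of-files>");
        }
    }
}
```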

How to check if txt file contains a String, if so don't duplicate it

I'm making a Minecraft mod that takes everyone's username from a server and adds it to a txt file. It works, but the problem is that I don't want the names duplicated when I run the command again. Nothing I've tried has worked so far. Before writing a name to the list, how can I check whether the txt file already contains that username and, if so, not add it again? Thank you.
for (int i = 0; i < minecraft.thePlayer.sendQueue.playerInfoList.size(); i++) {
List playerList = minecraft.thePlayer.sendQueue.playerInfoList;
GuiPlayerInfo playerInfo = (GuiPlayerInfo) playerList.get(i);
String playerName = StringUtils.stripControlCodes(playerInfo.name);
try {
fileWriter = new FileWriter(GameDirectory() + "\\scraped.txt", true);
bufferedReader = new BufferedReader(new FileReader(GameDirectory() + "\\scraped.txt"));
lineNumberReader = new LineNumberReader(new FileReader(GameDirectory() + "\\scraped.txt"));
} catch (IOException e) {
e.printStackTrace();
}
printWriter = new PrintWriter(fileWriter);
try {
fileWriter.write(playerName + "\r\n");
lineNumberReader.skip(Long.MAX_VALUE);
} catch (IOException e) {
e.printStackTrace();
}
printWriter.flush();
}
addMessage("Scraped " + lineNumberReader.getLineNumber() + " usernames!");
EDIT: Really need an answer guys :( Thank you
EDIT: this is what I have now, but it's not even writing it anymore.
List playerList = minecraft.thePlayer.sendQueue.playerInfoList;
for (int i = 0; i < minecraft.thePlayer.sendQueue.playerInfoList.size(); i++) {
GuiPlayerInfo playerInfo = (GuiPlayerInfo) playerList.get(i);
String playerName = StringUtils.stripControlCodes(playerInfo.name);
String lines;
try {
if ((lines = bufferedReader.readLine()) != null) {
if (!lines.contains(playerName)) {
bufferedWriter.write(playerName);
bufferedWriter.newLine();
bufferedWriter.flush();
}
}
} catch (IOException e) {
e.printStackTrace();
}
}
int linenumber = 0;
try {
while (lineNumberReader.readLine() != null) {
linenumber++;
}
} catch (IOException e) {
e.printStackTrace();
}
The logic of your second piece of code is wrong. If you write out the pseudo-code of it, it's easy to see why:
Open a File Reader at the start of the file
For every Player on the server
    Save the player name
    Read the next line of the file
    If we have not reached the end of the file
        If the player name is not on this line of the file
            Write the name of the player to the file
You need to read the entire file outside of the loop, and then check if the player exists anywhere in the file, not just if it happens to be on the line which is the same position as the player on the server.
The easiest way to do this is to keep the players in a list while you're processing, and read/write them to file, like this:
public static List<String> loadPlayerList() throws FileNotFoundException
{
final List<String> players = new ArrayList<>();
// try-with-resources closes the Scanner (and the underlying file) when done
try(final Scanner scanner = new Scanner(new File(GameDirectory() + "\\scraped.txt")))
{
while(scanner.hasNextLine())
players.add(scanner.nextLine());
}
return players;
}
public static void writePlayersList(final List<String> players) throws IOException
{
try(final BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(
new FileOutputStream((GameDirectory() + "\\scraped.txt")))))
{
for(final String player : players)
{
writer.write(player);
writer.newLine();
}
}
}
public static void main(String[] args) throws IOException
{
final List<String> players = loadPlayerList();
for(final GuiPlayerInfo player : minecraft.thePlayer.sendQueue.playerInfoList)
{
final String playerName = StringUtils.stripControlCodes(player.name);
if(!players.contains(playerName))
players.add(playerName);
}
writePlayersList(players);
}
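As a variation on the list-based approach above, a Set can do the duplicate check by itself. The following is a minimal sketch, not part of the original answer: the class name UniqueNameFile and the addNames method are assumptions, and it has no dependency on the Minecraft classes, so the caller would pass in the already-stripped player names. A LinkedHashSet is used so the existing order of the file is preserved.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

public class UniqueNameFile {

    /** Adds names to the file, skipping ones already present; returns how many were new. */
    public static int addNames(Path file, Collection<String> names) throws IOException {
        Set<String> known = new LinkedHashSet<>();
        if (Files.exists(file)) {
            known.addAll(Files.readAllLines(file, StandardCharsets.UTF_8));
        }
        int before = known.size();
        known.addAll(names); // a Set ignores elements it already contains
        // Rewrites the whole file: one name per line, in first-seen order.
        Files.write(file, known, StandardCharsets.UTF_8);
        return known.size() - before;
    }
}
```

The return value could feed a message such as the "Scraped N usernames!" line in the question, counting only the names that were actually new.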

Java : Can't capture console output of a console program (Blockland.exe)

I'm trying to capture the output of a console program and write it to a file that another program will read. Each new line of output should overwrite the file, so the file only ever contains one line at a time. When I ran this code, it didn't work: the process started perfectly, but the file is not being created or written to, and I am not getting any of the System.out.println output of "Streaming : blah blah blah".
You can read the code below or use this pastebin : http://pastebin.com/raw.php?i=Yahsqxma
import java.io.*;
import java.util.Scanner;
public class OpenRC {
static BufferedReader consoleInput = null;
static String os = System.getProperty("os.name").toLowerCase();
static Process server;
public static void main(String[] args) throws IOException {
// OpenRC by Pacnet2013
System.out.println(System.getProperty("user.dir"));
if(os.indexOf("win") >= 0) {
os = "Windows";
}
else if(os.indexOf("mac") >= 0) {
os = "Mac";
}
else if(os.indexOf("nux") >= 0) {
os = "Linux";
}
switch(os){
case "Linux" : //cause I need WINE
File file = new File(System.getProperty("user.dir") + "/OpenRC.txt");
try {
Scanner scanner = new Scanner(file);
String path = scanner.nextLine();
System.out.println("Got BlocklandEXE - " + path);
String port = scanner.nextLine();
System.out.println("Got port - " + port);
scanner.close();
server = new ProcessBuilder("wine", path + "Blockland.exe", "ptlaaxobimwroe", "-dedicated", "-port" + port).start();
if(consoleInput != null)
consoleInput.close();
consoleInput = new BufferedReader(new InputStreamReader(server.getInputStream()));
streamLoop();
} catch (FileNotFoundException e) {
System.out.println("You don't have an OpenRC Config file OpenRC.txt in the directory of this program");
}
}
}
public static void streamConsole()
{
String line = "";
int numLines = 0;
try
{
if (consoleInput != null)
{
while((line = consoleInput.readLine()) != null && consoleInput.ready())
{
numLines++;
}
}
}
catch (IOException e)
{
System.out.println("There may be a problem - An IOException (java.io.IOException) was caught so some lines may not display / display correctly");
}
if(!line.equals("") && !(line == null))
{
System.out.println("Streaming" + numLines + line);
writeToFile(System.getProperty("user.dir"), line);
}
}
public static void streamLoop()
{
try
{
Thread.sleep(5000);
}
catch (InterruptedException e)
{
System.out.println("A slight problem may have happened while trying to read a command");
}
streamConsole();
streamLoop(); //it'll go on until you close this program
}
public static void writeToFile(String filePath, String content)
{
try {
File file = new File(filePath);
if (!file.exists()) {
file.createNewFile();
System.out.println("Creating new stream text file");
}
FileWriter writer = new FileWriter(file.getAbsoluteFile());
BufferedWriter bw = new BufferedWriter(writer);
bw.write(content);
bw.close();
System.out.println("Wrote stream text file");
} catch (IOException e) {
e.printStackTrace();
}
}
}
You are running a DOS console application, which does not necessarily write to stdout or stderr, but it writes to the "console". It's nearly impossible to capture the "console" output reliably. The only tool that I have ever seen that is able to capture console output is expect by Don Libes, and that does all sorts of hacks.

Continually read the lines being appended to a log file

Following up on my previous question, I found out that Maven can't really output the JBoss console, so I'd like to build a workaround. Here is the deal:
While JBoss is running, it writes its console log to the server.log file. I'm trying to retrieve the data as it comes in, since JBoss updates the file every few seconds. I've run into some difficulties, so I need help.
What I actually need is:
read the file server.log
when a few more lines are added to server.log, output the change
Here is the code I have so far. The problem is that it runs indefinitely and starts from the beginning of the file every time; I'd like it to print only the new lines from server.log. I hope that makes sense. Here is the code:
import java.io.*;
class FileRead
{
public static void main(String args[])
{
try{
for(;;){ //run indefinitely
// Open the file
FileInputStream fstream = new FileInputStream("C:\\jboss-5.1.0.GA\\server\\default\\log\\server.log");
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
String strLine;
//Read File Line By Line
while ((strLine = br.readLine()) != null) {
// Print the content on the console
System.out.println (strLine);
}
//Close the input stream
br.close();
}
}
catch (Exception e){//Catch exception if any
System.err.println("Error: " + e.getMessage());
}
}
}
Following Montecristo's suggestion, I did this:
import java.io.*;
class FileRead {
public static void main(String args[]) {
try {
// Open the file that is the first
// command line parameter
FileInputStream fstream = new FileInputStream(
"C:\\jboss-5.1.0.GA\\server\\default\\log\\server.log");
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
String line;
// Read File Line By Line
while ((line = br.readLine()) != null) {
// Print the content on the console
line = br.readLine();
if (line == null) {
Thread.sleep(1000);
} else {
System.out.println(line);
}
}
// Close the input stream
br.close();
} catch (Exception e) {// Catch exception if any
System.err.println("Error: " + e.getMessage());
}
}
}
It is still not working: it just printed the original file. Although the file changes constantly, nothing gets printed except the original log file.
HERE IS THE SOLUTION (thanks, Montecristo):
import java.io.*;
class FileRead {
public static void main(String args[]) {
try {
FileInputStream fstream = new FileInputStream(
"C:\\jboss-5.1.0.GA\\server\\default\\log\\server.log");
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
String line;
while (true) {
line = br.readLine();
if (line == null) {
Thread.sleep(500);
} else {
System.out.println(line);
}
}
} catch (Exception e) {
System.err.println("Error: " + e.getMessage());
}
}
}
See also:
http://vanillajava.blogspot.co.uk/2012/08/java-memes-which-refuse-to-die.html
I don't know if you're going in the right direction but if I've understood correctly you'll find this useful: java-io-implementation-of-unix-linux-tail-f
You can use RandomAccessFile.
import java.io.IOException;
import java.io.RandomAccessFile;
public class LogFileReader {
public static void main( String[] args ) {
String fileName = "abc.txt";
try {
RandomAccessFile bufferedReader = new RandomAccessFile( fileName, "r"
);
long filePointer;
while ( true ) {
final String string = bufferedReader.readLine();
if ( string != null )
System.out.println( string );
else {
filePointer = bufferedReader.getFilePointer();
bufferedReader.close();
Thread.sleep( 2500 );
bufferedReader = new RandomAccessFile( fileName, "r" );
bufferedReader.seek( filePointer );
}
}
} catch ( IOException | InterruptedException e ) {
e.printStackTrace();
}
}
}
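The same RandomAccessFile idea can be factored into a helper that remembers an offset between polls, which also makes it possible to notice when the log has been truncated or rotated. This is a sketch under assumptions rather than a drop-in replacement: the class name LogTail and the readNewLines method are hypothetical, the log is assumed to be UTF-8, and a line that is still being written when the poll happens is returned as-is.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.List;

public class LogTail {

    /**
     * Appends to sink every line found at or after offset and
     * returns the offset to resume from on the next call.
     */
    public static long readNewLines(Path file, long offset, List<String> sink) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            if (raf.length() < offset) {
                offset = 0; // the file shrank (truncated or rotated): start over
            }
            raf.seek(offset);
            String line;
            while ((line = raf.readLine()) != null) {
                // readLine() maps each byte to one char; re-decode those bytes as UTF-8.
                sink.add(new String(line.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8));
            }
            return raf.getFilePointer();
        }
    }
}
```

A caller would keep the returned offset, sleep for a short interval, and call readNewLines again with that offset, printing whatever was added to the list.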
