I'm creating an XML file in Java using JAXB and XMLStreamWriter. This will become a very large file and has to be split into several pieces of max. 200 MB. These pieces don't have to be readable XML anymore.
The name of this file is very specific, using the date and several parameters, and at the end the pieces are numbered like this: '3.1', '3.2', '3.3', where the first number is the total number of chunks created and the second number is the sequence number of the chunk. The first part of the filename (apart from the numbering) is created in the Java application.
Now I want to create a UNIX script that calls the Java application with the needed parameters, splits the file and renames the chunks.
I know the commands to call the Java application and to split and rename files, but I don't know how to combine them, because I only know the filename inside the Java application, so the script can't decide which file has to be split and renamed.
Does anyone have an idea how to deal with it?
EDIT:
Ok I'll try to be a bit less vague.
The application I created produces very large XML files. The names of these files are in the following format: FI.DB2P.107601.20130130.20010.T.1.1. This name contains some identification numbers and the date when the file is created. The first part of the name is created in the Java application, like this: FI.DB2P.107601.20130130.20010.T.
Now this file should be split into several chunks of max. 200 MB each. Then the created chunks should have the same name as the 'base-file' but they have to end with 'T.3.1', 'T.3.2' and 'T.3.3' for example.
My question now is how I can obtain, in the Unix script, the name of the file created by the Java application. The filename is pretty complex and contains data from the database, so I can't define the name in the Unix script itself.
I hope it's a bit clearer now.
Is it not the case that your Java process will call the Unix script and will therefore be able to pass it the filename on the command line?
The Unix script can take the filename, run something like split on it, and then fix up the chunk names to those that your Java process is expecting.
Unless I misunderstand your question that should be fairly easy to do.
When you create your XMLStreamWriter you know (hopefully) the name of the file:
String fileName = "FI.DB2P.107601.20130130.20010.T.1.1";
XMLStreamWriter writer = factory.createXMLStreamWriter(new FileWriter(fileName));
Then it's not a problem to pass this name as a parameter to your shell script:
Runtime.getRuntime().exec("yourscript.sh " + fileName);
yourscript.sh will have code to split the file and append an incrementing counter to each chunk's name; something like this might work:
#!/bin/bash
split -b 200m -a 5 "$1" split_file
i=1
for file in split_file*
do
  mv "$file" "${1}_${i}"
  i=$(( i + 1 ))
done
ps: this script is not thread safe :)
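If you need the exact 'base.count.index' numbering from the question rather than a plain counter suffix, one option is to drop the mv loop above and let the Java side rename the chunks after the script finishes, since it already knows the base name. This is only a sketch under those assumptions; the split_file prefix is the one produced by the split command above:
import java.io.File;
import java.util.Arrays;

class ChunkRenamer {
    // baseName is e.g. "FI.DB2P.107601.20130130.20010.T"
    static void renameChunks(File dir, String baseName) {
        File[] chunks = dir.listFiles((d, name) -> name.startsWith("split_file"));
        if (chunks == null || chunks.length == 0) {
            return;                               // nothing was split
        }
        Arrays.sort(chunks);                      // split's alphabetic suffixes sort in creation order
        int count = chunks.length;                // first number: total number of chunks
        for (int i = 0; i < count; i++) {
            File target = new File(dir, baseName + "." + count + "." + (i + 1));
            if (!chunks[i].renameTo(target)) {
                System.err.println("Could not rename " + chunks[i] + " to " + target);
            }
        }
    }
}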
Related
I'm a Uni student trying to multiply two matrices stored in txt files via Java and Eclipse. We were given a pre-compiled class file, but not its source code, essentially making it a blackbox class. We're supposed to use vim and the Linux terminal to write and run our Java code, but I find that Eclipse is far more time-efficient. However, when using the Linux terminal and vim my program works as intended, whereas when using Eclipse it does not.
Here's my source code, showing only the lines that use the blackbox class:
String fileOne = ArrayReader.getFileName("Enter the file name of matrix one");
int[][] matrixOne = ArrayReader.readArray(fileOne);
String fileTwo = ArrayReader.getFileName("Enter the file name of matrix two");
int[][] matrixTwo = ArrayReader.readArray(fileTwo);
The getFileName function prints its argument as a prompt and expects the user to enter the file name (including the extension) of the file containing the matrix elements. If it doesn't find the file, it prints an error message saying so and asks for the file name again. The readArray function simply reads the elements and assigns them to the elements of the integer matrix.
I've tried putting the txt files in both my src and bin folders in my project directory, and inputting the file names with and without the file extension multiple times, but to no avail.
Any ideas?
I should put this in a comment, but I don't have enough reputation.
Can you provide more details about the error so we can help? You could also try to decompile the class to view its source code; you may find your answer there. You can also hardcode the file name (write it directly in the code) to test whether everything works correctly.
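To follow up on the hardcoding suggestion: a quick, hedged way to see where the program is actually looking for the files is to print the working directory and try an absolute path. The path below is only a placeholder, and whether ArrayReader accepts absolute paths is an assumption, since the class is a blackbox:
// Eclipse normally starts the JVM in the project root, while a terminal run
// uses whatever directory you launched it from; this prints the effective one.
System.out.println("Working directory: " + System.getProperty("user.dir"));

// Hardcoding an absolute path (placeholder below) rules out working-directory issues,
// assuming the blackbox ArrayReader passes the string straight to the file API.
int[][] matrixOne = ArrayReader.readArray("/home/youruser/matrixOne.txt");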
The ArrayReader class expects the computer to be using Linux, not Windows.
I have two files in the directory '/tmp': test.txt and test.pl.
The content of test.txt is:
abcdefg
The content of test.pl is:
#!/usr/bin/perl -w
chdir '/tmp';
$data = `more test.txt`;
open (MYFILE,">","newtest.txt") || die ("can not open this file");
print MYFILE $data;
Then in a Java class I write:
Process ps1= Runtime.getRuntime().exec("perl /tmp/test.pl ");
Then the content of newtest.txt generated by the Perl script is:
:::::::::::
test.txt
:::::::::::
abcdefg
Here is the problem: there is a difference,
:::::::
test.txt
:::::::
but when I run 'perl test.pl' directly in Linux, there is no difference between the two files.
Does anyone know the reason? Thanks!
more is the culprit here. It's intended for viewing a file one page at a time and waits for a key press (e.g. SPACE) to show the next page. Try cat instead of more to see if that is the reason.
more tries to be smart and autodetect whether it was called in an interactive session, but sometimes it fails. Depending on its settings it might then behave like cat, but you cannot be sure...
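As a side note for debugging cases like this from Java, it helps to capture whatever the spawned Perl process prints and to wait for it to finish before looking at /tmp/newtest.txt. A minimal sketch, using the same paths as in the question:
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class RunPerl {
    public static void main(String[] args) throws Exception {
        // ProcessBuilder with redirectErrorStream avoids juggling two streams.
        ProcessBuilder pb = new ProcessBuilder("perl", "/tmp/test.pl");
        pb.redirectErrorStream(true);
        Process ps = pb.start();

        try (BufferedReader out = new BufferedReader(new InputStreamReader(ps.getInputStream()))) {
            String line;
            while ((line = out.readLine()) != null) {
                System.out.println("perl: " + line);
            }
        }

        // Wait for the script to finish before inspecting /tmp/newtest.txt.
        System.out.println("perl exited with " + ps.waitFor());
    }
}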
I'm using Hadoop 2.7.1 and coding in Java. I'm able to run a simple mapreduce program where I provide a folder as input to the MapReduce program.
However, I want to use a file as input (full paths are inside); this file lists all the other files to be processed by the mapper function.
Below is the file content,
/allfiles.txt
- /tmp/aaa/file1.txt
- /tmp/bbb/file2.txt
- /tmp/ccc/file3.txt
How can I specify the input path to the MapReduce program as a file, so that it can start processing each file listed inside? Thanks.
In your driver class, you can read in the file, and add each line as a file for input:
//Read allfiles.txt and put each line into a List (requires at least Java 1.7)
List<String> files = Files.readAllLines(Paths.get("allfiles.txt"), StandardCharsets.UTF_8);
//Loop through the file names and add them as input
for(String file : files) {
//This Path is org.apache.hadoop.fs.Path
FileInputFormat.addInputPath(conf, new Path(file));
}
This is assuming that your allfiles.txt is local to the node on which your MR job is being run, but it's only a small change if allfiles.txt is actually on the HDFS.
I strongly recommend that you check that each file exists on HDFS before you add it as input.
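A hedged sketch of that check with the standard FileSystem API, reusing the conf and files variables from the snippet above:
//org.apache.hadoop.fs.FileSystem and org.apache.hadoop.fs.Path
FileSystem fs = FileSystem.get(conf);

for (String file : files) {
    Path p = new Path(file);
    if (fs.exists(p)) {
        FileInputFormat.addInputPath(conf, p);
    } else {
        System.err.println("Skipping missing input: " + file);
    }
}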
Instead of creating a file with the paths to the other files, you could use globs.
In your example, you could have defined your inputs as -input /tmp/*/file?.txt
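If the job is driven from Java rather than Hadoop Streaming, the same glob can be handed to FileInputFormat directly, since input paths are expanded as glob patterns at submission time; a one-line sketch reusing the conf variable from the earlier snippet:
//The glob is expanded when the job is submitted; no listing file is needed.
FileInputFormat.addInputPath(conf, new Path("/tmp/*/file?.txt"));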
I'm currently coding a program, but I need to make it execute a VBS file, TempDir.vbs. However, the path to this file contains spaces.
Unfortunately, the solutions from other topics don't work when the path contains spaces.
In my case:
C:\\Users\\"the user"\\AppData\\Roaming\\Microsoft\\Windows\\Start Menu\\Programs\\Startup
The code I'm currently using is:
Runtime.getRuntime().exec("wscript.exe " + "\"\"\"" + path + "\"\"\"" + "TempDir.vbs");
So, how can I execute the file TempDir.vbs?
Instead of using Runtime.exec(String), use Runtime.exec(String[]):
Runtime.getRuntime().exec(new String[] {
"wscript.exe",
path + "TempDir.vbs"
});
As mentioned in a comment to a now deleted answer by ziesemer, if the .vbs file is a console script, you might need to use cscript.exe. See this for explanation: Difference between wscript and cscript
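Along the same lines, ProcessBuilder keeps each argument separate, so spaces in the path need no extra quoting. A minimal sketch; whether a separator is needed between path and the script name depends on how path is built in your program:
// ProcessBuilder passes each argument as-is, so the spaces in path need no quoting.
ProcessBuilder pb = new ProcessBuilder(
        "wscript.exe",               // or "cscript.exe" for console scripts
        path + "TempDir.vbs");       // add a separator here if path has no trailing backslash
pb.inheritIO();                      // let the script's output appear in this console
Process p = pb.start();              // start() throws IOException
p.waitFor();                         // waitFor() throws InterruptedException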
I have a Java file Animal.java, and I want to be able to write a terminal command like so:
java Animal -a -print < data.txt
I know that the -a and -print appear as variables in the array arg (which is an input to the main method -- so arg[0] is -a and arg[1] is -print), but how can I access the data from data.txt?
This is not possible in Java; instead, you may pass the location of the file (not required if it's already known and fixed) and then access the file's data using an InputStream / readLine(), etc.
You cannot manipulate files in the same way that Unix pipes do. You need to treat your arguments so that a prefix or a position defines the name of the file you want to read; then it is possible to read the file passed as an argument.
// Parse the arguments and work out which one is the file name
// fileIndex is the position of the file name in argv
FileInputStream fstream = new FileInputStream(argv[fileIndex]);
// Process the file.
See an example here: Passing a file as a command line argument and reading its lines
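A minimal, hedged sketch of that approach, assuming the file name is simply passed as the last argument (e.g. java Animal -a -print data.txt instead of using shell redirection):
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class Animal {
    public static void main(String[] args) throws IOException {
        // Flags such as -a and -print stay in args[0], args[1], ...;
        // here the file name is assumed to be the last argument.
        if (args.length == 0) {
            System.err.println("usage: java Animal [-a] [-print] <file>");
            return;
        }
        String fileName = args[args.length - 1];

        try (BufferedReader reader = new BufferedReader(new FileReader(fileName))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);   // replace with the real processing
            }
        }
    }
}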
Imagine the structure src/mypack/Main.java; run in the terminal:
$ cd src/mypack/
/src/mypack$ javac Main.java
/src/mypack$ cd ..
/src$ java mypack.Main < path/your/file