I am converting an EDI file to XML. However, my input file, which happens to be in BIF format, is approximately 100 MB, and it is giving me a Java out-of-memory error.
I tried to consult the Smooks documentation on huge-file conversion; however, the example there covers XML to EDI.
Below is the output I get when running my main method:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:596)
at java.lang.StringBuffer.append(StringBuffer.java:367)
at java.io.StringWriter.write(StringWriter.java:94)
at java.io.Writer.write(Writer.java:127)
at freemarker.core.TextBlock.accept(TextBlock.java:56)
at freemarker.core.Environment.visit(Environment.java:257)
at freemarker.core.MixedContent.accept(MixedContent.java:57)
at freemarker.core.Environment.visitByHiddingParent(Environment.java:278)
at freemarker.core.IteratorBlock$Context.runLoop(IteratorBlock.java:157)
at freemarker.core.Environment.visitIteratorBlock(Environment.java:501)
at freemarker.core.IteratorBlock.accept(IteratorBlock.java:67)
at freemarker.core.Environment.visit(Environment.java:257)
at freemarker.core.Macro$Context.runMacro(Macro.java:173)
at freemarker.core.Environment.visit(Environment.java:686)
at freemarker.core.UnifiedCall.accept(UnifiedCall.java:80)
at freemarker.core.Environment.visit(Environment.java:257)
at freemarker.core.MixedContent.accept(MixedContent.java:57)
at freemarker.core.Environment.visit(Environment.java:257)
at freemarker.core.Environment.process(Environment.java:235)
at freemarker.template.Template.process(Template.java:262)
at org.milyn.util.FreeMarkerTemplate.apply(FreeMarkerTemplate.java:92)
at org.milyn.util.FreeMarkerTemplate.apply(FreeMarkerTemplate.java:86)
at org.milyn.event.report.HtmlReportGenerator.applyTemplate(HtmlReportGenerator.java:76)
at org.milyn.event.report.AbstractReportGenerator.processFinishEvent(AbstractReportGenerator.java:197)
at org.milyn.event.report.AbstractReportGenerator.processLifecycleEvent(AbstractReportGenerator.java:157)
at org.milyn.event.report.AbstractReportGenerator.onEvent(AbstractReportGenerator.java:92)
at org.milyn.Smooks._filter(Smooks.java:558)
at org.milyn.Smooks.filterSource(Smooks.java:482)
at com.***.xfunctional.EdiToXml.runSmooksTransform(EdiToXml.java:40)
at com.***.xfunctional.EdiToXml.main(EdiToXml.java:57)
import java.io.*;
import java.util.Arrays;
import java.util.Locale;
import javax.xml.transform.stream.StreamSource;
import org.milyn.Smooks;
import org.milyn.SmooksException;
import org.milyn.container.ExecutionContext;
import org.milyn.event.report.HtmlReportGenerator;
import org.milyn.io.StreamUtils;
import org.milyn.payload.StringResult;
import org.milyn.payload.SystemOutResult;
import org.xml.sax.SAXException;
public class EdiToXml {
private static byte[] messageIn = readInputMessage();
protected static String runSmooksTransform() throws IOException, SAXException, SmooksException {
Locale defaultLocale = Locale.getDefault();
Locale.setDefault(new Locale("en", "EN"));
// Instantiate Smooks with the config...
Smooks smooks = new Smooks("smooks-config.xml");
try {
// Create an exec context - no profiles....
ExecutionContext executionContext = smooks.createExecutionContext();
StringResult result = new StringResult();
// Configure the execution context to generate a report...
executionContext.setEventListener(new HtmlReportGenerator("target/report/report.html"));
// Filter the input message to the outputWriter, using the execution context...
smooks.filterSource(executionContext, new StreamSource(new ByteArrayInputStream(messageIn)),result);
Locale.setDefault(defaultLocale);
return result.getResult();
} finally {
smooks.close();
}
}
public static void main(String[] args) throws IOException, SAXException, SmooksException {
System.out.println("\n\n==============Message In==============");
System.out.println("======================================\n");
pause(
"The EDI input stream can be seen above. Press 'enter' to see this stream transformed into XML...");
String messageOut = EdiToXml.runSmooksTransform();
System.out.println("==============Message Out=============");
System.out.println(messageOut);
System.out.println("======================================\n\n");
pause("And that's it! Press 'enter' to finish...");
}
private static byte[] readInputMessage() {
try {
InputStream input = new BufferedInputStream(new FileInputStream("/home/****/Downloads/BifInputFile.DATA"));
return StreamUtils.readStream(input);
} catch (IOException e) {
e.printStackTrace();
return "<no-message/>".getBytes();
}
}
private static void pause(String message) {
try {
BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
System.out.print("> " + message);
in.readLine();
} catch (IOException e) {
}
System.out.println("\n");
}
}
<?xml version="1.0"?>
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd" xmlns:edi="http://www.milyn.org/xsd/smooks/edi-1.4.xsd">
<!--
Configure the EDI Reader to parse the message stream into a stream of SAX events.
-->
<edi:reader mappingModel="edi-to-xml-bif-mapping.xml" validate="false"/>
</smooks-resource-list>
I edited this line in the code to use a stream directly:
smooks.filterSource(executionContext, new StreamSource(new FileInputStream("/home/***/Downloads/sample-text-file.txt")), result);
However, I now get the error below. Does anybody have a guess at the best approach?
Exception in thread "main" org.milyn.SmooksException: Failed to filter source.
at org.milyn.delivery.sax.SmooksSAXFilter.doFilter(SmooksSAXFilter.java:97)
at org.milyn.delivery.sax.SmooksSAXFilter.doFilter(SmooksSAXFilter.java:64)
at org.milyn.Smooks._filter(Smooks.java:526)
at org.milyn.Smooks.filterSource(Smooks.java:482)
at ****.EdiToXml.runSmooksTransform(EdiToXml.java:41)
at com.***.***.EdiToXml.main(EdiToXml.java:58)
Caused by: org.milyn.edisax.EDIParseException: EDI message processing failed [EDIFACT-BIF-TO-XML][1.0]. Must be a minimum of 1 instances of segment [UNH]. Currently at segment number 1.
at org.milyn.edisax.EDIParser.mapSegments(EDIParser.java:504)
at org.milyn.edisax.EDIParser.mapSegments(EDIParser.java:453)
at org.milyn.edisax.EDIParser.parse(EDIParser.java:428)
at org.milyn.edisax.EDIParser.parse(EDIParser.java:386)
at org.milyn.smooks.edi.EDIReader.parse(EDIReader.java:111)
at org.milyn.delivery.sax.SAXParser.parse(SAXParser.java:76)
at org.milyn.delivery.sax.SmooksSAXFilter.doFilter(SmooksSAXFilter.java:86)
... 5 more
The message was valid and the XML mapping was good. I was just not using the optimal method for reading and writing the message.
I came to realize that the filterSource method of Smooks can be fed an InputStream and an OutputStream directly. Kindly find below the piece of code that let the program run efficiently without hitting the Java memory error.
//Instantiate a FileInputStream
FileInputStream inputStream = new FileInputStream(inputFileName);
//Instantiate a FileOutputStream
FileOutputStream outputStream = new FileOutputStream(outputFileName);
try {
// Filter the input message to the outputWriter...
smooks.filterSource(new StreamSource(inputStream), new StreamResult(outputStream));
Locale.setDefault(defaultLocale);
} finally {
smooks.close();
inputStream.close();
outputStream.close();
}
Thanks to the community.
Regards.
I'm the original author of Smooks and that EDIFACT parsing stuff. Jason emailed me asking for advice on this, but I haven't been involved in it for a number of years now, so I'm not sure how helpful I'd be.
Smooks doesn't read the full message into memory. It streams it through a parser that converts it to a stream of SAX events, making it "look like" XML to anything downstream of it. If those events are then used to build a big Java object model in memory, that might result in OOM errors etc.
Looking at the exception message, it simply looks like the EDIFACT input doesn't match the definition file being used.
Caused by: org.milyn.edisax.EDIParseException: EDI message processing failed [EDIFACT-BIF-TO-XML][1.0]. Must be a minimum of 1 instances of segment [UNH]. Currently at segment number 1.
Those EDIFACT definition files were originally generated directly from the definitions published by the EDIFACT group, but I do remember that many people “tweak” the message formats, which seems like what might be happening here (and hence the above error). One solution to that would be to tweak the pre-generated definitions to match.
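For example, if the input legitimately omits the UNH segment that the stock definition requires, the corresponding entry in the mapping model could be relaxed along these lines. This is a hypothetical fragment: minOccurs/maxOccurs are attributes of the Smooks EDI mapping model, but the actual element content has to come from your own definition file:
<medi:segment segcode="UNH" xmltag="UNH" minOccurs="0" maxOccurs="1">
    <!-- fields exactly as in the pre-generated definition -->
</medi:segment>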
I know that a lot of changes have been made in Smooks in this area in the last year or two (using Apache Daffodil for the definitions) but I wouldn’t be the best person to talk about that. You can try the Smooks mailing list for help on that.
Here is my code:
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
public class ExplicitChannelRead {
public static void main(String[] args) {
int count;
Path filePath = null;
// First, obtain a path to a file.
try {
filePath = Paths.get("test1.txt");
}
catch(InvalidPathException e) {
System.out.println("Path error: "+e);
return;
}
// Next, obtain a channel to that file within a try-with-resources block.
try(SeekableByteChannel fChan =
Files.newByteChannel(filePath, StandardOpenOption.CREATE_NEW)) {
// Allocate a buffer.
ByteBuffer mBuf = ByteBuffer.allocate(128);
while((count=fChan.read(mBuf)) != -1) {
//Rewind the buffer so that it can be read.
mBuf.rewind();
for(int i=0; i<count; i++) System.out.print((char)mBuf.get());
}
System.out.println();
} catch (IOException e) {
e.printStackTrace();
// System.out.println("I/O error: "+e);
}
}
}
On running the above code I get this exception:
java.nio.file.NoSuchFileException: test1.txt
at java.base/sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:85)
at java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103)
at java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:108)
at java.base/sun.nio.fs.WindowsFileSystemProvider.newByteChannel(WindowsFileSystemProvider.java:235)
at java.base/java.nio.file.Files.newByteChannel(Files.java:375)
at java.base/java.nio.file.Files.newByteChannel(Files.java:426)
at fileNIO.ExplicitChannelRead.main(ExplicitChannelRead.java:31)
I don't understand why the test1.txt file is not being created: it doesn't exist yet, and I am using the StandardOpenOption.CREATE_NEW option.
When I use the StandardOpenOption.WRITE option along with StandardOpenOption.CREATE_NEW, I see the file test1.txt being created, and at that point I get the exception:
Exception in thread "main" java.nio.channels.NonReadableChannelException
I understand the cause of this exception: I have opened the file in write mode, yet in the code I am performing a read operation on it.
It seems to me that a new file can't be created when the file is opened in read mode.
I have reproduced what you are seeing (on Linux with Java 17).
As I noted in the comments, the behavior seems to contradict what the javadocs say should happen, but what I discovered is this:
With READ, or with neither READ nor WRITE, a NoSuchFileException is thrown.
With WRITE (and no READ), the file is created, but then NonReadableChannelException is thrown.
With both READ and WRITE, it works. At least ... it did for me.
I guess this sort of makes sense. You need READ to read the file and WRITE to create it. And the javadocs state that READ is the default if you don't specify READ, WRITE or APPEND.
But creating an empty file1 with CREATE_NEW and then immediately trying to read it is a use-case that borders on pointless, so it is not entirely surprising that they didn't (clearly) document how to achieve this.
1 - As a comment noted, CREATE_NEW is specified to fail if the file already exists. If you want "create it if it doesn't exist", then you should use CREATE instead.
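For reference, a minimal sketch of the combination that worked for me (Java 17; the flip()/clear() buffer handling is my own variation on the question's loop):
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
public class CreateThenRead {
    public static void main(String[] args) {
        Path filePath = Paths.get("test1.txt");
        // READ + WRITE lets the channel create the file (CREATE_NEW)
        // and still be readable afterwards.
        try (SeekableByteChannel fChan = Files.newByteChannel(filePath,
                StandardOpenOption.CREATE_NEW,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            ByteBuffer mBuf = ByteBuffer.allocate(128);
            int count;
            // The file is brand new and empty, so this read returns -1 immediately.
            while ((count = fChan.read(mBuf)) != -1) {
                mBuf.flip(); // switch the buffer from filling to draining
                for (int i = 0; i < count; i++) {
                    System.out.print((char) mBuf.get());
                }
                mBuf.clear(); // make the buffer ready for the next read
            }
            System.out.println();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}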
So I'm currently taking AP Comp Sci A and am trying to learn Java, and while developing a small program to keep up with my teacher I've run into a couple issues.
The program is intended to take entries and log them in a journal. I'd eventually like to have it stored in an HTML format and then be able to email my logs to a teacher in an HTML table, but this issue is preventing that.
Basically, with my catch, I'm trying to create the file and then write the starting HTML code (the <html> and <body> tags, and then the necessary tags for the HTML table), but even if the file doesn't exist the catch isn't running correctly, presumably because of the 'throws IOException' Eclipse had me add.
I also attempted to add commands to my program, but nothing happens when they are used. No exceptions thrown, nothing printed, etc.
Here's my code:
import java.util.Scanner;
import java.util.Calendar;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.text.SimpleDateFormat;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
public class LogSend {
/**
* @param args
* @throws IOException
*/
public static void main(String[] args) throws IOException {
// TODO Auto-generated method stub
System.out.println("");
System.out.println("Type '!help' for commands");
Scanner cmd = new Scanner(System.in);
String initialCmd = cmd.nextLine();
if (initialCmd == "!help") {
System.out.println("The commands are:");
System.out.println("'!help' for commands");
System.out.println("'!log' to log an entry");
} else if (initialCmd == "!log") {
String timeStamp = new SimpleDateFormat("yyyyMMdd_HHmmss").format(Calendar.getInstance().getTime());
System.out.println(timeStamp);
Scanner entry = new Scanner(System.in);
String journalEntry = entry.nextLine();
try (PrintWriter saveLog = new PrintWriter(new FileWriter("log.html", true))) {
saveLog.println(timeStamp+":"+"<b>"+journalEntry+"</b>"+"<br>");
}
}
/*String timeStamp = new SimpleDateFormat("yyyyMMdd_HHmmss").format(Calendar.getInstance().getTime());
System.out.println(timeStamp);
Scanner entry = new Scanner(System.in);
String journalEntry = entry.nextLine();
try (PrintWriter saveLog = new PrintWriter(new FileWriter("log.html", true))) {
saveLog.println(timeStamp+":"+"<b>"+journalEntry+"</b>"+"<br>");
} /*catch(FileNotFoundException e) {
FileWriter fileWriter = new FileWriter("log.html");
PrintWriter saveLog = new PrintWriter(fileWriter);
saveLog.println("<html>");
saveLog.println("<body>");
saveLog.println(timeStamp+":"+journalEntry);
saveLog.println("</body>");
saveLog.println("</html>");
}*/
}
}
Sorry if this is all a bit stupid, I'm brand new to Java and find I learn best through just making programs. I appreciate the help.
NOTE: I've commented out the catch because it's simply not working, but that's the code I used.
First, String comparison goes by the equals method:
initialCmd == "!help" // wrong: compares object references, not content
initialCmd.equals("!help") // or equalsIgnoreCase
Then
try (PrintWriter saveLog = new PrintWriter(new FileWriter("log.html", true))) {
saveLog.println(timeStamp+":"+"<b>"+journalEntry+"</b>"+"<br>");
}
is okay, as it writes to the file in append mode (the true) and will almost never throw a FileNotFoundException (=could not create file).
You may do:
try (...
...
} catch (IOException e) {
e.printStackTrace();
}
As FileNotFoundException is also an IOException. That also covers out-of-disk-space, missing rights, and a wrong directory path.
One remark: FileWriter will use the default (= platform) encoding. For the full Unicode range that a String is capable of, you could use UTF-8:
try (PrintWriter saveLog = new PrintWriter(
Files.newBufferedWriter(Paths.get("log.html"),
StandardCharsets.UTF_8,
StandardOpenOption.APPEND,
StandardOpenOption.CREATE))) {
saveLog.printf("%s:<b>%s</b><br>%n", timeStamp, journalEntry);
}
Also note that one cause for an exception might be two threads logging to the file.
The above does not throw an exception on a non-existing file, assuming the same content is being written.
It would not be wrong to invest time in the java.util.logging framework, which can be customized to about the same functionality, and more; a minimal sketch follows.
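For instance, assuming a plain-text journal.log file (the file name, logger name, and message here are placeholders, not from the question):
import java.io.IOException;
import java.util.logging.FileHandler;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;
public class JournalLogger {
    public static void main(String[] args) throws IOException {
        Logger logger = Logger.getLogger("journal");
        // Append to journal.log; the handler creates the file if it is missing
        // and synchronizes concurrent writers for you.
        FileHandler handler = new FileHandler("journal.log", true);
        handler.setFormatter(new SimpleFormatter()); // timestamped plain-text records
        logger.addHandler(handler);
        logger.info("my journal entry");
    }
}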
For HTML one would like to write a beginning. One can do that by including the log file in a real HTML file:
<!DOCTYPE html [
<!ENTITY log SYSTEM "log.html">
]>
<html>
<head>
<title>Logs</title>
<meta charset="UTF-8">
<meta http-equiv="refresh" content="5"> <!-- reload every 5s -->
</head>
<body>
&log;
</body>
</html>
@index.php For clarification, passing true as the 2nd argument of FileWriter (as @JoopEggen suggests) causes the file to be created if it doesn't exist and to be opened for appending if it does exist. Therefore you won't hit the catch, as you'll create a new log.html and append to it once created. Isn't that what you're trying to do anyway? So if the catch is just for creating the file when it doesn't exist, it is no longer required?
On the other hand, if you're trying to log the fact the file never existed you can use java.io.File and do something like:
File f = new File(filePathString);
//Note we do !f.isDirectory() as exists will return true for directories too
if(f.exists() && !f.isDirectory()) {
// log that file didn't exist
// create file and append custom error
}
Though with this method you need to be careful not to run into race conditions.
Hope this helps, let me know how you get on!
I have defined an Avro schema and generated some classes with avro-tools for the schemas. Now I want to serialize the data to disk. I found some answers about Scala for this, but not for Java. The class Article is generated with avro-tools from a schema defined by me.
Here's a simplified version of how I try to do it:
JavaPairRDD<String, String> filesRDD = context.wholeTextFiles(inputDataPath);
JavaRDD<Article> processingFiles = filesRDD.map(fileNameContent -> {
// The name of the file
String fileName = fileNameContent._1();
// The content of the file
String fileContent = fileNameContent._2();
// An object from my avro schema
Article a = new Article(fileContent);
Processing processing = new Processing();
// .... some processing of the content here ... //
processing.serializeArticleToDisk(avroFileName);
return a;
});
where serializeArticleToDisk(avroFileName) is defined as follows:
public void serializeArticleToDisk(String filename) throws IOException{
// Serialize article to disk
DatumWriter<Article> articleDatumWriter = new SpecificDatumWriter<Article>(Article.class);
DataFileWriter<Article> dataFileWriter = new DataFileWriter<Article>(articleDatumWriter);
dataFileWriter.create(this.article.getSchema(), new File(filename));
dataFileWriter.append(this.article);
dataFileWriter.close();
}
where Article is my avro schema.
Now, the mapper throws this error:
java.io.FileNotFoundException: hdfs:/...path.../avroFileName.avro (No such file or directory)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
at org.apache.avro.file.SyncableFileOutputStream.<init>(SyncableFileOutputStream.java:60)
at org.apache.avro.file.DataFileWriter.create(DataFileWriter.java:129)
at org.apache.avro.file.DataFileWriter.create(DataFileWriter.java:129)
at sentences.ProcessXML.serializeArticleToDisk(ProcessXML.java:207)
. . . rest of the stacktrace ...
although the file path is correct.
I use a collect() method afterwards, so everything else within the map function works fine (except for the serialization part).
I am quite new to Spark, so I am not sure if this might actually be something trivial. I suspect that I need to use some writing functions rather than doing the writing in the mapper (not sure if this is true, though). Any ideas on how to tackle this?
EDIT:
The last line of the error stack trace I showed is actually on this part:
dataFileWriter.create(this.article.getSchema(), new File(filename));
This is the part that throws the actual error. I am assuming the dataFileWriter needs to be replaced with something else. Any ideas?
This solution does not use DataFrames and does not throw any errors:
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.io.NullWritable;
import org.apache.avro.mapred.AvroKey;
import org.apache.spark.api.java.JavaPairRDD;
import scala.Tuple2;
. . . . .
// Serializing to AVRO
JavaPairRDD<AvroKey<Article>, NullWritable> javaPairRDD = processingFiles.mapToPair(r -> {
return new Tuple2<AvroKey<Article>, NullWritable>(new AvroKey<Article>(r), NullWritable.get());
});
Job job = AvroUtils.getJobOutputKeyAvroSchema(Article.getClassSchema());
javaPairRDD.saveAsNewAPIHadoopFile(outputDataPath, AvroKey.class, NullWritable.class, AvroKeyOutputFormat.class,
job.getConfiguration());
where AvroUtils.getJobOutputKeyAvroSchema is:
public static Job getJobOutputKeyAvroSchema(Schema avroSchema) {
Job job;
try {
job = new Job();
} catch (IOException e) {
throw new RuntimeException(e);
}
AvroJob.setOutputKeySchema(job, avroSchema);
return job;
}
Similar things for Spark + Avro can be found here -> https://github.com/CeON/spark-utils.
It seems that you are using Spark in the wrong way.
map is a transformation. Just calling map doesn't trigger computation of the RDD; you have to call an action like foreach() or collect(), as sketched below.
Also note that the lambda supplied to map will be serialized at the driver and transferred to some node in the cluster.
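A self-contained sketch of that transformation/action distinction (the local master and toy data are mine, not from the question):
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
public class LazyDemo {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("lazy-demo").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(1, 2, 3));
        // Transformation: only records the lineage, nothing executes yet.
        JavaRDD<Integer> doubled = numbers.map(n -> n * 2);
        // Action: this is what actually triggers the computation.
        System.out.println(doubled.collect()); // prints [2, 4, 6]
        sc.stop();
    }
}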
ADDED
Try using Spark SQL and spark-avro to save a Spark DataFrame in Avro format:
// Load a text file and convert each line to a JavaBean.
JavaRDD<Person> people = sc.textFile("/examples/people.txt")
.map(Person::parse);
// Apply a schema to an RDD
DataFrame peopleDF = sqlContext.createDataFrame(people, Person.class);
peopleDF.write()
.format("com.databricks.spark.avro")
.save("/output");
Newbie in HDFS and Hadoop:
I am developing a program that should get all the files of a specific directory, where we can find several small files of any type.
It should get every file and append it to a compressed SequenceFile, where the key must be the path of the file and the value must be the file's content. For now my code is:
import java.net.*;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.io.compress.BZip2Codec;
public class Compact {
public static void main (String [] args) throws Exception{
try{
Configuration conf = new Configuration();
FileSystem fs =
FileSystem.get(new URI("hdfs://quickstart.cloudera:8020"),conf);
Path destino = new Path("/user/cloudera/data/testPractice.seq");//test args[1]
if ((fs.exists(destino))){
System.out.println("exist : " + destino);
return;
}
BZip2Codec codec=new BZip2Codec();
SequenceFile.Writer outSeq = SequenceFile.createWriter(conf
,SequenceFile.Writer.file(fs.makeQualified(destino))
,SequenceFile.Writer.compression(SequenceFile.CompressionType.BLOCK,codec)
,SequenceFile.Writer.keyClass(Text.class)
,SequenceFile.Writer.valueClass(FSDataInputStream.class));
FileStatus[] status = fs.globStatus(new Path("/user/cloudera/data/*.txt"));//args[0]
for (int i=0;i<status.length;i++){
FSDataInputStream in = fs.open(status[i].getPath());
outSeq.append(new org.apache.hadoop.io.Text(status[i].getPath().toString()), new FSDataInputStream(in));
fs.close();
}
outSeq.close();
System.out.println("End Program");
}catch(Exception e){
System.out.println(e.toString());
System.out.println("File not found");
}
}
}
But after every execution I receive this exception:
java.io.IOException: Could not find a serializer for the Value class: 'org.apache.hadoop.fs.FSDataInputStream'. Please ensure that the configuration 'io.serializations' is properly configured, if you're using custom serialization.
File not found
I understand the error must be in the type of file I am creating and the type of object I define for adding to the SequenceFile, but I don't know which one I should use. Can anyone help me?
FSDataInputStream, like any other InputStream, is not intended to be serialized. What should serializing an "iterator" over a stream of bytes even do?
What you most likely want to do is store the content of the file as the value. For example, you can switch the value type from FSDataInputStream to BytesWritable and just get all the bytes out of the FSDataInputStream. One drawback of using a key/value SequenceFile for such a purpose is that the content of each file has to fit in memory. That is fine for small files, but you have to be aware of this issue.
I am not sure what you are really trying to achieve but perhaps you could avoid reinventing the wheel by using something like Hadoop Archives ?
Thanks a lot for your comments. The problem was the serializer, like you say, and finally I used BytesWritable:
FileStatus[] status = fs.globStatus(new Path("/user/cloudera/data/*.txt")); // args[0]
for (int i = 0; i < status.length; i++) {
    FSDataInputStream in = fs.open(status[i].getPath());
    byte[] content = new byte[(int) fs.getFileStatus(status[i].getPath()).getLen()];
    in.readFully(content); // read the whole file into the byte array
    in.close();
    outSeq.append(new org.apache.hadoop.io.Text(status[i].getPath().toString()),
                  new org.apache.hadoop.io.BytesWritable(content));
}
outSeq.close();
Probably there are other, better solutions in the Hadoop ecosystem, but this problem was an exercise for a degree I am working on, and for now we are reinventing the wheel to understand the concepts ;-).
I have a config file named config.txt that looks like this:
IP=192.168.1.145
PORT=10022
URL=http://www.stackoverflow.com
I want to change some values in the config file from Java, say the port to 10045. How can I achieve this easily?
IP=192.168.1.145
PORT=10045
URL=http://www.stackoverflow.com
In my attempt, I need to write lots of code to read every line, find the PORT, delete the original 10022, and then write 10045. My code is clumsy and hard to read. Is there a more convenient way in Java?
Thanks a lot!
If you want something short you can use this.
public static void changeProperty(String filename, String key, String value) throws IOException {
Properties prop = new Properties();
prop.load(new FileInputStream(filename));
prop.setProperty(key, value);
prop.store(new FileOutputStream(filename),null);
}
Unfortunately it doesn't preserve the order of the fields or any comments.
If you want to preserve order, reading a line at a time isn't so bad.
This untested code would keep comments, blank lines and order. It won't handle multi-line values.
public static void changeProperty(String filename, String key, String value) throws IOException {
final File tmpFile = new File(filename + ".tmp");
final File file = new File(filename);
PrintWriter pw = new PrintWriter(tmpFile);
BufferedReader br = new BufferedReader(new FileReader(file));
boolean found = false;
final String toAdd = key + '=' + value;
for (String line; (line = br.readLine()) != null; ) {
if (line.startsWith(key + '=')) {
line = toAdd;
found = true;
}
pw.println(line);
}
if (!found)
pw.println(toAdd);
br.close();
pw.close();
tmpFile.renameTo(file);
}
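For example, to apply the change from the question (as untested as the method itself):
changeProperty("config.txt", "PORT", "10045");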
My suggestion would be to read the entire config file into memory (maybe into a list of (attribute:value) pair objects), do whatever processing you need to do (and consequently make any changes), then overwrite the original file with all the changes you have made.
For example, you could read the config file you have provided by line, use String.split("=") to separate the attribute:value pairs - making sure to name each pair read accordingly. Then make whatever changes you need, iterate over the pairs you have read in (and possibly modified), writing them back out to the file.
Of course, this approach would work best if you had a relatively small number of lines in your config file, that you can definitely know the format for.
This code works for me:
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Properties;
public void setProperties( String key, String value) throws IOException {
Properties prop = new Properties();
FileInputStream ip;
try {
ip = new FileInputStream("config.txt");
prop.load(ip);
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
prop.setProperty(key, value);
PrintWriter pw = new PrintWriter("config.txt");
prop.store(pw, null);
}
Use the Properties class to load/save configuration. Then simply set the value and save it again.
Properties p = new Properties();
p.load(...);
p.put("key", "value");
p.store(...);
It's easy and straightforward.
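Applied to the question's config.txt, a minimal self-contained sketch (the file name and key come from the question; as noted elsewhere in this thread, comments and field order are not preserved):
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Properties;
public class ChangePort {
    public static void main(String[] args) throws IOException {
        Properties p = new Properties();
        try (FileInputStream in = new FileInputStream("config.txt")) {
            p.load(in); // reads the KEY=VALUE lines
        }
        p.setProperty("PORT", "10045");
        try (FileOutputStream out = new FileOutputStream("config.txt")) {
            p.store(out, null); // rewrites the whole file
        }
    }
}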
As an aside, if your application is a single application that does not need to scale to run on multiple computers, do not bother using a database to save config; it is utter overkill. However, if your application needs real-time config changes and needs to scale, Redis works pretty well to distribute config and handle the synchronization for you. I have used it for this purpose with great success.
Consider using java.util.Properties and its load() and store() methods.
But remember that this will not preserve comments and extra line breaks in the file.
Also, certain chars need to be escaped.
If you are open to using third-party libraries, explore http://commons.apache.org/configuration/. It supports configurations in multiple formats. Comments will be preserved as well. (Except for a minor bug -- apache-commons-config PropertiesConfiguration: comments after the last property are lost.)