Write output in different directories - java

Are there any settings that I can use to make the output go in a separate timestamped (give a format) directory every time I run the job?
I use the following Scalding code to write my flow output:
val out: TypedPipe[MyType] = ... // pipeline construction elided
out.write(PackedAvroSource[MyType]("my/output/path"))
By default Scalding replaces the output in the my/output/path directory in HDFS. I'd like the output to go into a different my/output/path/MMDDyyyyHHmm/ path depending on when the job runs. I am about to write some utils to add a timestamp to the path myself, but I'd rather use existing ones if available.

Try concatenating the date onto the directory path.
// date.toString() contains spaces and colons, which make poor HDFS path segments,
// so format the requested MMddyyyyHHmm pattern explicitly.
val timestamp = new java.text.SimpleDateFormat("MMddyyyyHHmm").format(new java.util.Date())
val direct = "my/output/path/" + timestamp + "/"
out.write(PackedAvroSource[MyType](direct))
For more information on date and time formatting, see the java.text.SimpleDateFormat documentation.

You can use a PartitionedDelimited sink to write to multiple directories. See the comments in https://github.com/twitter/scalding/blob/develop/scalding-core/src/main/scala/com/twitter/scalding/typed/PartitionedDelimitedSource.scala for more info.
This would preclude you from using the Avro format, but perhaps you could write a PartitionedPackedAvro?
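If you do end up writing the path util yourself, here is a minimal Java-side sketch using the thread-safe java.time API; since it only builds a string, it is usable from a Scala job as well. The class and method names are illustrative, not an existing Scalding utility:

import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public final class TimestampedPaths {
    // Note: "dd" (day-of-month) is the intended pattern letter here; "DD" would mean day-of-year.
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("MMddyyyyHHmm");

    private TimestampedPaths() {}

    // Builds e.g. "my/output/path/061520241530/" from "my/output/path".
    public static String timestamped(String base) {
        return base + "/" + LocalDateTime.now().format(FMT) + "/";
    }
}

The sink only ever sees the resulting path string, so a helper like this works regardless of which source or sink type you write to.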

Related

How to get the first file and also the last modified file using java

How can I get the first created file and also the last modified file using Java?
I have written the snippet below to get the last modified date and time. How can I get the date of the first created file in a folder?
DateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
File directory = new File("/path/to/folder"); // declaration assumed; not shown in the original snippet
long time = directory.lastModified(); // milliseconds since the epoch
System.out.println(sdf.format(time));
Java NIO and java.time
While the old File class does offer what we need here, I prefer the more modern Java NIO file API (java.nio.file, available since Java 7) for file operations. It offers many more operations, which keeps the code ready for future requirements that File may not support, and in some situations it is nicer to work with; for example, it has some support for stream operations (since Java 8).
I definitely recommend that you use java.time, the modern Java date and time API, for your date and time work.
The two recommendations go nicely hand in hand, since java.nio.file.attribute.FileTime has a toInstant method for converting to java.time.Instant.
To find the file that was least recently modified (the one whose last modification is the longest time ago):
Path dp = Paths.get("/path/to/your/dir");
try (Stream<Path> files = Files.list(dp)) { // the stream from Files.list should be closed
    Optional<Path> firstModified = files
            .min(Comparator.comparing(f -> getLastModified(f)));
    firstModified.ifPresentOrElse( // ifPresentOrElse requires Java 9+
            p -> System.out.println(p + " modified "
                    + getLastModified(p).atZone(ZoneId.systemDefault())
                                        .format(FORMATTER)),
            () -> System.out.println("No files"));
}
Example output:
./useCount.txt modified 2016-12-26 15:11:54
The code uses this auxiliary method:
private static Instant getLastModified(Path p) {
try {
return Files.readAttributes(p, BasicFileAttributes.class).lastModifiedTime().toInstant();
} catch (IOException ioe) {
throw new IllegalStateException(ioe);
}
}
— and this formatter:
private static final DateTimeFormatter FORMATTER
= DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss", Locale.ROOT);
For the creation time instead of the last modification time, use creationTime() in this line:
return Files.readAttributes(p, BasicFileAttributes.class).creationTime().toInstant();
For the file that has been modified last just use max() instead of min() in this line:
.max(Comparator.comparing(f -> getLastModified(f)));
Example output:
./bin modified 2021-10-12 07:57:08
By the way, directory.lastModified() used in your question gives you the last time the directory itself was modified. It is (usually) not the same as the time the newest file in the directory was modified.
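A small sketch illustrating that difference (the directory path is a placeholder):

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

public class DirVsNewestEntry {
    public static void main(String[] args) throws IOException {
        Path dir = Paths.get("/path/to/your/dir");
        // The directory's own last-modified time...
        System.out.println("directory itself: " + Files.getLastModifiedTime(dir));
        // ...versus the newest last-modified time among its entries.
        try (Stream<Path> entries = Files.list(dir)) {
            entries.map(p -> {
                        try {
                            return Files.getLastModifiedTime(p);
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    })
                    .max(Comparator.naturalOrder())
                    .ifPresent(t -> System.out.println("newest entry:     " + t));
        }
    }
}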
Tutorial links
Java NIO Tutorial on Jenkov
Oracle tutorial: Date Time explaining how to use java.time.

Is there a Java code to convert csv files into pbix?

We need Java code that automatically converts csv files into pbix files, so they can be opened and worked on further in Power BI Desktop. I know Power BI offers this super cool feature, which converts csv files and many other formats into pbix manually. However, we need a function that automatically converts our reports directly into pbix, so that no intermediate files need to be created and stored anywhere.
We have already been able to develop a function with three parameters: the first corresponds to the selected report from our database; the second corresponds to the directory in which the converted report should be generated; and the third is the converted output file itself. The first two parameters work well, and the code is able to generate a copy of any report we select into any directory we select. However, it is able to generate csv files only; any file in another format comes out the same size as the csv and won't open.
This is what we've tried so far for the conversion part of the code:
Util.writeFile("C:\\" + "test.csv", byteString);
The above piece of code works just fine; however, csv is not what we want, since the original reports are already in csv format anyway.
Util.writeFile("C:\\" + "test.pbix", byteString);
Util.writeFile("C:\\" + "test.pdf", byteString);
Util.writeFile("C:\\" + "test.xlsx", byteString);
Each of the three lines above generates one file in the indicated format; however, each generated file is only as large as its corresponding csv (it should be much larger) and therefore cannot be opened.
File file = new File("C:\\" + "test1.csv");
File file2 = new File("C:\\" + "test1.pbix");
file.renameTo(file2); // renameTo reports failure via its boolean return value, not an exception
The above piece of code does not generate any file at all, but I thought it worth mentioning, as it doesn't throw any exception either.
P.S. We would also be interested in Java code that converts csv into the format of any other BI reporting software besides Power BI, like Tableau, BIRT, Knowage, etc.
P.S.2 The first piece of code uses a class (sailpoint.tools.Util) which is apparently only available to those who have access to SailPoint.
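For what it's worth, a .pbix file is (to the best of my knowledge) a ZIP-based container rather than renamed CSV bytes, which is consistent with the observation above that the generated copies are too small to be valid. A minimal sketch that checks the ZIP magic bytes (the file path is a placeholder):

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class PbixMagicCheck {
    public static void main(String[] args) throws IOException {
        // A valid ZIP-based container starts with the magic bytes 'P','K'.
        // A CSV renamed to .pbix fails this check.
        try (InputStream in = Files.newInputStream(Paths.get("C:\\test.pbix"))) {
            int b1 = in.read();
            int b2 = in.read();
            System.out.println("ZIP container? " + (b1 == 'P' && b2 == 'K'));
        }
    }
}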

File wildcard use *

I am trying to read a file whose name is K2Ssal.<timestamp>.
I want to handle the timestamp part of the file name as a wildcard.
How can I achieve this?
I tried * after the file name, but it is not working:
var getK2SSal: Iterator[String] = Source.fromFile("C://Users/nrakhad/Desktop/Work/Data stage migration/Input files/K2Ssal.*").getLines()
You can use Files.newDirectoryStream with directory + glob:
import java.nio.file.{Paths, Files}
val yourFile = Files.newDirectoryStream(
  Paths.get("/path/to/the/directory"), // the directory containing the file
  "K2Ssal.*"                           // glob matching the file name
).iterator.next // first match; throws NoSuchElementException if nothing matches
Misconception on your end: unless the library call is specifically implemented to do so, using a wildcard simply doesn't work the way you expect it to.
Meaning: a file system doesn't know about wildcards; it only knows about existing files and folders. The fact that you can put * on certain commands, and that the wildcard is replaced with file names, is a property of the tool(s) you are using. Most programming APIs that query the file system do not include that special wildcard handling.
In other words: there is no sense in adding that asterisk like that.
You have to step back and write code that actively searches for files itself. Here are some examples for Scala.
You can read the directory and filter on files based upon the string.
val l = new File("""C://Users/nrakhad/Desktop/Work/Data stage migration/Input files/""").listFiles // listFiles returns null if the path is not a directory
val s = l.filter(_.toString.contains("K2Ssal."))

Merging two .odt files from code

How do you merge two .odt files? Doing it by hand, opening each file and copying the content, would work but is unfeasible.
I have tried the ODF Toolkit Simple API (simple-odf-0.8.1-incubating) to achieve that task, creating an empty TextDocument and merging everything into it:
private File masterFile = new File(...);
...
TextDocument t = TextDocument.newTextDocument();
t.save(masterFile);
...
for(File f : filesToMerge){
joinOdt(f);
}
...
void joinOdt(File joinee){
TextDocument master = (TextDocument) TextDocument.loadDocument(masterFile);
TextDocument slave = (TextDocument) TextDocument.loadDocument(joinee);
master.insertContentFromDocumentAfter(slave, master.getParagraphByReverseIndex(0, false), true);
master.save(masterFile);
}
And that works reasonably well; however, it loses information about fonts: the original files are a combination of Arial Narrow and Wingdings (for check boxes), while the output masterFile is all in Times New Roman. At first I suspected the last parameter of insertContentFromDocumentAfter, but changing it to false breaks (almost) all formatting. Am I doing something wrong? Is there any other way?
I think this is "works as designed".
I tried this once with a global document, which imports documents and displays them as-is... as long as the paragraph styles have different names!
Same-named styles get overwritten with the values from the "master" document.
So I ended up cloning the standard styles with unique (per document) names.
HTH
My case was a rather simple one: the files I wanted to merge were generated the same way and used the same basic formatting. Therefore, starting off of one of my files instead of an empty document fixed my problem.
However, this question will remain open until someone comes up with a more general solution to formatting retention (possibly based on ngulam's answer and comments?).

how to append timestamp to file name for java.util.logging.FileHandler.pattern

Hi, I was wondering if anyone knows a way to append a timestamp to the log file name specified through the logging.properties setting java.util.logging.FileHandler.pattern.
It seems like something pretty straightforward, but I can't seem to find a solution anywhere.
Thanks
I am afraid that by configuration alone you can't set the file name the way you want.
Have a look at the code in FileHandler.generate() to convince yourself.
What you can do is write your own FileHandler that handles this naming (see the sketch below) or switch to another log framework.
If you use java.util.logging: I wrote a Formatter and a Handler some years ago that can still be useful; feel free to use them.
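A minimal sketch of such a handler, assuming only the standard FileHandler(String pattern) constructor; the class name and file-name pattern are illustrative:

import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.logging.FileHandler;

// Builds a timestamped file name once, then delegates everything else to FileHandler.
public class TimestampedFileHandler extends FileHandler {
    public TimestampedFileHandler() throws IOException, SecurityException {
        super("app_" + new SimpleDateFormat("yyyyMMdd_HHmmss").format(new Date())
                + "_%u.%g.log");
    }
}

Because the class has a public no-argument constructor, it can be listed in the handlers entry of logging.properties and java.util.logging will instantiate it like any built-in handler.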
You could instantiate the FileHandler in code with the pattern, limit, count, etc. parameters, so the pattern string can be built to include the date and time.
Example Code:
// An explicit pattern avoids characters such as ':' that the default locale
// format can produce and that are invalid in file names.
String timeStamp = new SimpleDateFormat("yyyyMMdd_HHmmss").format(new Date());
FileHandler fh = new FileHandler("./jay_log_%u.%g_" + timeStamp + ".log", 30000, 4);
logger.addHandler(fh);
You could use SLF4J, which can route back to the java.util.logging package, which seems to have this feature:
http://javablog.co.uk/2008/07/12/logging-with-javautillogging/
Alternatively, for a no-third-party-frameworks approach, you can use a custom Formatter, a sample of which is already available here:
http://www.kodejava.org/examples/458.html
