I am creating a java.util.logging.FileHandler that is allowed to cycle through files. When multiple instances of my application are run, a new log file is created for each instance of the application. I need to know what file is being used by the application because I want to upload the log file to my servers for further review. How can I tell what file is being used by a certain FileHandler?
The easiest way is to put some kind of identifier in the file name itself, i.e. the pattern argument when you create the FileHandler. Since these are instances of the same application, one way to distinguish them is by their process id, so you could make that part of the pattern. A better approach is to pass in an identifier through the command line and use that to make your filename. That way you control the files being created in some sense. Finally, if your application has some knowledge of why it's different from all the others, for example it connects to a particular database server, then you could just use that database server name as part of the filename.
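As a minimal sketch of the process-id idea, the pattern can be built before the handler is created. The application name "myapp", the size limit and the file count are illustrative assumptions, and ProcessHandle requires Java 9 or later.

```java
import java.io.IOException;
import java.util.logging.FileHandler;
import java.util.logging.Logger;

final class PidLogPattern {
    /** Builds a FileHandler pattern such as "%t/myapp-12345-%g.log". */
    static String patternFor(String appName, long pid) {
        // %t = system temp directory, %g = rotation generation number
        return "%t/" + appName + "-" + pid + "-%g.log";
    }

    public static void main(String[] args) throws IOException {
        long pid = ProcessHandle.current().pid(); // Java 9+
        String pattern = patternFor("myapp", pid); // "myapp" is a made-up name
        FileHandler handler = new FileHandler(pattern, 1_000_000, 3);
        Logger.getLogger("myapp").addHandler(handler);
        System.out.println("Logging with pattern " + pattern);
        handler.close();
    }
}
```

Because the application builds the pattern itself, it knows which files belong to it without asking the handler.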
EDIT: There does not seem to be any API to get the name of the file being used by a FileHandler. I would suggest looking into the logging extensions in x4juli (which ports a lot of the log4j functionality to the java.util.logging specs):
http://www.x4juli.org/
You should be able to substitute an instance of their FileHandler which provides a getFile() method:
http://www.x4juli.org/api/org/x4juli/handlers/FileHandler.html
Actually, you could do this more simply by extending FileHandler yourself. For example...
MyFileHandler.java:
import java.io.IOException;
import java.util.logging.FileHandler;
public class MyFileHandler extends FileHandler {
// The pattern this handler was created with.
protected final String pattern;
public MyFileHandler(String pattern) throws IOException {
// Without this call the no-arg FileHandler() constructor runs and the pattern is ignored.
super(pattern);
this.pattern = pattern;
}
public String getMyFileHandlerPattern() {
return pattern;
}
}
DeleteMe.java:
import java.io.IOException;
import java.util.logging.Handler;
import java.util.logging.Logger;
public class DeleteMe {
public static void main(String[] args) throws IOException {
Logger log = Logger.getLogger(DeleteMe.class.getName());
MyFileHandler output = new MyFileHandler("output.log");
log.addHandler(output);
for (Handler handler : log.getHandlers()) {
if (handler instanceof MyFileHandler) {
MyFileHandler x = (MyFileHandler) handler;
if ("output.log".equals(x.getMyFileHandlerPattern())) {
System.out.println("found handler writing to output.log");
}
}
}
}
}
OK, I do have to say that FileHandler not providing a way to determine the log file is seriously dumb.
I wound up writing a function called "chooseFile()" which searches /tmp for the next available log file name and returns that file. You can then pass the name of that file into new FileHandler().
/**
* Utility: select a log file. File is created immediately to reserve
* its name.
*/
static public File chooseFile(final String basename) throws IOException {
final int nameLen = basename.length();
File tmpDir = new File(System.getProperty("java.io.tmpdir"));
String[] logs = tmpDir.list(new FilenameFilter() {
public boolean accept(File d, String f) {
return f.startsWith(basename);
}
});
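int count = 0;
if (logs != null) { // list() returns null if the directory cannot be read
for (String name : logs) {
// Java has no atoi(); keep only the digits that follow the basename, e.g. "3" in "mylog3.log".
String digits = name.substring(nameLen).replaceAll("\\D.*$", "");
int n = digits.isEmpty() ? 0 : Integer.parseInt(digits);
if (n >= count) count = n + 1;
}
}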
String filename = String.format("%s%d.log", basename, count);
File logFile = new File(tmpDir, filename);
logFile.createNewFile();
return logFile;
}
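A related sketch, not the code above: if sequential numbering is not required, Files.createTempFile can reserve a unique file in one step, and the resulting path can be handed to FileHandler. The class and method names here are made up for illustration, and the generated name is random rather than numbered.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.logging.FileHandler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.SimpleFormatter;

final class ReservedLogFile {
    /** Creates an empty, uniquely named log file in the default temp directory. */
    static Path reserve(String basename) {
        try {
            // createTempFile picks an unused name and creates the file immediately.
            return Files.createTempFile(basename, ".log");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Logs one record to the reserved file and returns the file size afterwards. */
    static long writeOneRecord(Path logFile) {
        try {
            // The application chose the path, so it always knows which file is in use.
            FileHandler handler = new FileHandler(logFile.toString());
            handler.setFormatter(new SimpleFormatter());
            handler.publish(new LogRecord(Level.INFO, "hello"));
            handler.close();
            return Files.size(logFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```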
Here's my rather hacky way around it. It works for the default if you don't use any format strings, and should work if you use the g and u format strings in filename, but not the others.
public class FriendlyFileHandler extends FileHandler {
/***
* In order to ensure the most recent log file is the file this one owns,
* we flush before checking the directory for most recent file.
*
* But we must keep other log handlers from flushing in between and making
* a NEW recent file.
*/
private static Object[] flushLock = new Object[0];
private String pattern;
public FriendlyFileHandler(String pattern, int maxLogLengthInBytes, int count) throws IOException,
SecurityException {
super(pattern, maxLogLengthInBytes, count);
this.pattern = pattern;
}
/***
* Finds the most recent log file matching the pattern.
* This is just a guess - if you have a complicated pattern
* format it may not work.
*
* IMPORTANT: This log file is still in use. You must
* removeHandler() on the logger first, .close() this handler,
* then add a NEW handler to your logger. THEN, you can read
* the file.
*
* Currently supported format strings: g, u
*
* @return A File of the current log file, or null on error.
*/
public synchronized File getCurrentLogFile() {
synchronized(flushLock) {
// so the file has the most recent date on it.
flush();
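final String patternRegex =
// handle incremental number formats; quoteReplacement keeps the
// backslash, which replaceAll would otherwise strip from "\\d*"
pattern.replaceAll("%[gu]", Matcher.quoteReplacement("\\d*")) +
// handle default case where %g is appended to end
"(\\.\\d*)?$";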
final Pattern re = Pattern.compile(patternRegex);
final Matcher matcher = re.matcher("");
// check all files in the directory where this log would be
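// getAbsoluteFile() ensures a parent exists even when the pattern has no directory part
final File basedir = new File(pattern).getAbsoluteFile().getParentFile();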
final File[] logs = basedir.listFiles(new FileFilter() {
@Override
public boolean accept(final File pathname) {
// only get files that are part of the pattern
matcher.reset(pathname.getAbsolutePath());
return matcher.find();
}
});
return findMostRecentLog(logs);
}
}
private File findMostRecentLog(File[] logs) {
if (logs != null && logs.length > 0) { // listFiles() returns null on I/O error
long mostRecentDate = 0;
int mostRecentIdx = 0;
for (int i = 0; i < logs.length; i++) {
final long d = logs[i].lastModified();
if (d >= mostRecentDate) {
mostRecentDate = d;
mostRecentIdx = i;
}
}
return logs[mostRecentIdx];
}
else {
return null;
}
}
@Override
public synchronized void flush() {
// only let one Handler flush at a time.
synchronized(flushLock) {
super.flush();
}
}
}
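The pattern-to-regex step in getCurrentLogFile() is easy to get wrong, so here is a small standalone sketch of it. The class name and sample paths are assumptions for illustration. Only %g and %u are handled, and other characters in the pattern (such as ".") are not escaped, just as in the handler above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LogPatternRegex {
    /** Turns a FileHandler pattern using only %g and %u into a regex for file names. */
    static Pattern toRegex(String handlerPattern) {
        // quoteReplacement is needed because replaceAll treats a backslash in the
        // replacement as an escape, so a plain "\\d*" would end up as "d*".
        String body = handlerPattern.replaceAll("%[gu]", Matcher.quoteReplacement("\\d*"));
        // Optional ".N" suffix covers the case where the generation is appended.
        return Pattern.compile(body + "(\\.\\d*)?$");
    }

    public static void main(String[] args) {
        Pattern p = toRegex("/tmp/app%g.log");
        System.out.println(p.matcher("/tmp/app3.log").find());   // true
        System.out.println(p.matcher("/tmp/other.log").find());  // false
    }
}
```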
I have a class Artical:
The first variable is the article's code, the second is its name, and the third is its price.
public class Artical {
private final String codeOfArtical;
private final String nameOfArtical;
private double priceOfArtical;
public Artical(String codeOfArtical, String nameOfArtical, double priceOfArtical) {
this.codeOfArtical= codeOfArtical;
this.nameOfArtical= nameOfArtical;
this.priceOfArtical= priceOfArtical;
}
public void setPriceOfArtical(double priceOfArtical) {
this.priceOfArtical= priceOfArtical;
}
public String getCodeOfArtical() {
return codeOfArtical;
}
public String getNameOfArtical() {
return nameOfArtical;
}
public double getPriceOfArtical() {
return priceOfArtical;
}
}
In the main class I want to write something like:
Artical a1 = new Artical("841740102156", "LG Monitor", 600.00);
new ShowArticalClass(a1).do();
new WriteArticalInFileClass(new File("baza.csv"), a1).do();
so that data in file will be written in format like this:
841740102156; Monitor LG; 600.00;
914918414989; Intel CPU; 250.00;
The two classes ShowArticalClass and WriteArticalInFileClass aren't important; they are abstract classes.
So my question is: how do I produce this format, where every line is a new Artical?
A very naive implementation can be the following:
Create a class that in turn creates a CSVWriter (assuming you want to write to a CSV). That class will expose a public method allowing you to pass in a path where the desired csv file lives as well as the Artical object you want to write to this file. Using that class you will format your data and write them to the file. An example of this could be:
public class CsvWriter {
private static final Object LOCK = new Object();
private static CsvWriter writer;
private CsvWriter() {}
public static CsvWriter getInstance() {
synchronized (LOCK) {
if (null == writer) {
writer = new CsvWriter();
}
return writer;
}
}
public void writeCsv(String filePath, Artical content) throws IOException {
try (var writer = createWriter(filePath)) {
writer.append(getDataline(content)).append("\n");
}
}
private String getDataline(Artical content) {
// Use the getter names that the Artical class actually declares.
return String.join(",", content.getCodeOfArtical(), content.getNameOfArtical(), Double.toString(content.getPriceOfArtical()));
}
private PrintWriter createWriter(String stringPath) throws IOException {
var path = Paths.get(stringPath);
try {
if (Files.exists(path)) {
System.out.printf("File under path %s exists. Will append to it%n", stringPath);
return new PrintWriter(new FileWriter(path.toFile(), true));
}
return new PrintWriter(path.toFile());
} catch (Exception e) {
System.out.println("An error has occurred while writing to a file");
throw e;
}
}
}
Note that this handles the case where the given file already exists (by appending to it). Otherwise the file is created and written to directly.
Call this write method in a fashion similar to this:
public static void main(String... args) throws IOException {
var artical = new Artical("1", "Test", 10.10);
CsvWriter.getInstance().writeCsv("/tmp/test1.csv", artical);
var artical2 = new Artical("2", "Test", 11.14);
CsvWriter.getInstance().writeCsv("/tmp/test1.csv", artical2);
}
With that as a starting point you can go ahead and modify the code to handle a list of Artical objects.
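One possible shape for that extension is sketched below. It assumes the semicolon-separated layout from the question rather than commas, uses a minimal stand-in class (Item) instead of Artical so the snippet is self-contained, and the names ArticleLines, toLine and appendAll are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class ArticleLines {
    /** Minimal stand-in for the question's Artical class. */
    static final class Item {
        final String code;
        final String name;
        final double price;
        Item(String code, String name, double price) {
            this.code = code;
            this.name = name;
            this.price = price;
        }
    }

    /** Formats one item as "code; name; price;" with two decimals. */
    static String toLine(Item item) {
        // Locale.ROOT keeps the decimal point regardless of the default locale.
        return String.format(Locale.ROOT, "%s; %s; %.2f;", item.code, item.name, item.price);
    }

    /** Appends one line per item, creating the file if it does not exist yet. */
    static void appendAll(Path file, List<Item> items) {
        List<String> lines = new ArrayList<>();
        for (Item item : items) {
            lines.add(toLine(item));
        }
        try {
            Files.write(file, lines, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, toLine(new Item("841740102156", "LG Monitor", 600.00)) yields "841740102156; LG Monitor; 600.00;".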
If you really need to support CSV files, though, I would strongly recommend looking at the various CSV libraries that are out there instead of implementing your own code.
I need to get a list of all caller methods for a method of interest for me in Java. Is there a tool that can help me with this?
Edit: I forgot to mention that I need to do this from a program. I'm using Java Pathfinder and I want to run it on all the methods that call my method of interest.
For analyzing bytecode, I would recommend ASM. Given a list of Classes to analyze, a visitor can be made which finds the method calls you're interested in. One implementation which analyses classes in a jar file is below.
Note that ASM uses internalNames with '/' instead of '.' as a separator. Specify the target method as a standard declaration without modifiers.
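To make the naming concrete, here is a small JDK-only sketch (no ASM required) that derives an internal name and a method descriptor of the kind shown in the output further down. The helper class and method names are made up for illustration.

```java
import java.io.PrintStream;
import java.lang.invoke.MethodType;

final class AsmNames {
    /** Internal name: the binary class name with '/' instead of '.'. */
    static String internalName(Class<?> type) {
        return type.getName().replace('.', '/');
    }

    /** JVM method descriptor for the given return and parameter types. */
    static String descriptor(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        System.out.println(internalName(PrintStream.class));     // java/io/PrintStream
        System.out.println(descriptor(void.class, String.class)); // (Ljava/lang/String;)V
    }
}
```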
For example, to list methods that could be calling System.out.println("foo") in the java runtime jar:
java -cp "classes;asm-3.1.jar;asm-commons-3.1.jar" App \
c:/java/jdk/jre/lib/rt.jar \
java/io/PrintStream "void println(String)"
Edit: source and line numbers added. Note that this only indicates the last target method invocation per calling method; the original question only wanted to know which methods. I leave it as an exercise for the reader to show line numbers of the calling method declaration, or the line numbers of every target invocation, depending on what you're actually after. :)
results in:
LogSupport.java:44 com/sun/activation/registries/LogSupport log (Ljava/lang/String;)V
LogSupport.java:50 com/sun/activation/registries/LogSupport log (Ljava/lang/String;Ljava/lang/Throwable;)V
...
Throwable.java:498 java/lang/Throwable printStackTraceAsCause (Ljava/io/PrintStream;[Ljava/lang/StackTraceElement;)V
--
885 methods invoke java/io/PrintStream println (Ljava/lang/String;)V
source:
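import java.io.BufferedInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import org.objectweb.asm.ClassAdapter;
import org.objectweb.asm.ClassReader;
import org.objectweb.asm.Label;
import org.objectweb.asm.MethodAdapter;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.commons.EmptyVisitor;
import org.objectweb.asm.commons.Method;
public class App {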
private String targetClass;
private Method targetMethod;
private AppClassVisitor cv;
private ArrayList<Callee> callees = new ArrayList<Callee>();
private static class Callee {
String className;
String methodName;
String methodDesc;
String source;
int line;
public Callee(String cName, String mName, String mDesc, String src, int ln) {
className = cName; methodName = mName; methodDesc = mDesc; source = src; line = ln;
}
}
private class AppMethodVisitor extends MethodAdapter {
boolean callsTarget;
int line;
public AppMethodVisitor() { super(new EmptyVisitor()); }
public void visitMethodInsn(int opcode, String owner, String name, String desc) {
if (owner.equals(targetClass)
&& name.equals(targetMethod.getName())
&& desc.equals(targetMethod.getDescriptor())) {
callsTarget = true;
}
}
public void visitCode() {
callsTarget = false;
}
public void visitLineNumber(int line, Label start) {
this.line = line;
}
public void visitEnd() {
if (callsTarget)
callees.add(new Callee(cv.className, cv.methodName, cv.methodDesc,
cv.source, line));
}
}
private class AppClassVisitor extends ClassAdapter {
private AppMethodVisitor mv = new AppMethodVisitor();
public String source;
public String className;
public String methodName;
public String methodDesc;
public AppClassVisitor() { super(new EmptyVisitor()); }
public void visit(int version, int access, String name,
String signature, String superName, String[] interfaces) {
className = name;
}
public void visitSource(String source, String debug) {
this.source = source;
}
public MethodVisitor visitMethod(int access, String name,
String desc, String signature,
String[] exceptions) {
methodName = name;
methodDesc = desc;
return mv;
}
}
public void findCallingMethodsInJar(String jarPath, String targetClass,
String targetMethodDeclaration) throws Exception {
this.targetClass = targetClass;
this.targetMethod = Method.getMethod(targetMethodDeclaration);
this.cv = new AppClassVisitor();
JarFile jarFile = new JarFile(jarPath);
Enumeration<JarEntry> entries = jarFile.entries();
while (entries.hasMoreElements()) {
JarEntry entry = entries.nextElement();
if (entry.getName().endsWith(".class")) {
InputStream stream = new BufferedInputStream(jarFile.getInputStream(entry), 1024);
ClassReader reader = new ClassReader(stream);
reader.accept(cv, 0);
stream.close();
}
}
}
public static void main( String[] args ) {
try {
App app = new App();
app.findCallingMethodsInJar(args[0], args[1], args[2]);
for (Callee c : app.callees) {
System.out.println(c.source+":"+c.line+" "+c.className+" "+c.methodName+" "+c.methodDesc);
}
System.out.println("--\n"+app.callees.size()+" methods invoke "+
app.targetClass+" "+
app.targetMethod.getName()+" "+app.targetMethod.getDescriptor());
} catch(Exception x) {
x.printStackTrace();
}
}
}
Edit: the original question was edited to indicate a runtime solution was needed - this answer was given before that edit and only indicates how to do it during development.
If you are using Eclipse you can right click the method and choose "Open call hierarchy" to get this information.
Updated after reading comments: Other IDEs support this as well in a similar fashion (at least Netbeans and IntelliJ do)
Annotate the method with @Deprecated (or tag it with @deprecated), turn on deprecation warnings, run your compile and see which warnings get triggered.
The "run your compile" step can be done either by invoking an external Ant process or by using the Java 6 compiler API.
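A rough sketch of the compiler-API route follows. It compiles two tiny in-memory classes (Lib and Caller, both invented for the demo) with -Xlint:deprecation and collects the location of every deprecation warning. It assumes a JDK is available (the system compiler is null on a bare JRE) and that the diagnostic code contains the word "deprecated", which is how javac names these warnings in my experience.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

final class DeprecationProbe {
    /** Wraps source text as an in-memory compilation unit. */
    static JavaFileObject source(String className, String code) {
        return new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
    }

    /** Returns "file:line" for each use of the deprecated method. */
    static List<String> findUsages() {
        JavaFileObject lib = source("Lib",
                "public class Lib {\n  @Deprecated public static void target() {}\n}\n");
        JavaFileObject caller = source("Caller",
                "public class Caller {\n  void a() { Lib.target(); }\n  void b() { Lib.target(); }\n}\n");
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        try {
            // Class files go to a throwaway directory; only the diagnostics matter here.
            String outDir = Files.createTempDirectory("probe").toString();
            compiler.getTask(null, null, diagnostics,
                    Arrays.asList("-Xlint:deprecation", "-d", outDir),
                    null, Arrays.asList(lib, caller)).call();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> usages = new ArrayList<>();
        for (Diagnostic<? extends JavaFileObject> d : diagnostics.getDiagnostics()) {
            if (d.getCode() != null && d.getCode().contains("deprecated")) {
                usages.add(d.getSource().getName() + ":" + d.getLineNumber());
            }
        }
        return usages;
    }
}
```

Note that this reports call sites by source line, not by enclosing method, and that javac does not warn about uses inside the same outermost class as the deprecated method.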
right click on method
Go to references and (depending on your requirement)
choose workspace/project/Hierarchy.
This pops up a panel that shows all references to this function. Eclipse FTW!
In Eclipse, highlight the method name and then press Ctrl+Shift+G.
There isn't a way to do this (programmatically) via the Java reflection libraries - you can't ask a java.lang.reflect.Method "which methods do you call?"
That leaves two other options I can think of:
Static analysis of the source code. I'm sure this is what the Eclipse Java toolset does - you could look at the Eclipse source behind the JDT, and find what it does when you ask Eclipse to "Find References" to a method.
Bytecode analysis. You could inspect the bytecode for calls to the method. I'm not sure what libraries or examples are out there to help with this - but I can't imagine that something doesn't exist.
Yes, most modern IDEs will let you search for usages of a method or variable. Alternatively, you could use a debugger and set a trace point on the method entry, printing a stack trace (or whatever you need) every time the method is invoked.
Finally, you could use some simple shell util to just grep for the method, such as
find . -name '*.java' -exec grep -H methodName {} \;
The only method that will let you find invocations made through reflection, though, is using the debugger.
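If attaching a debugger is inconvenient, a similar effect can be approximated from inside the method by recording the calling stack frame. This is a sketch under the assumption that you can edit the method of interest; the class and method names are invented. For reflective calls the recorded frame would be the reflection machinery rather than the original caller.

```java
import java.util.Set;
import java.util.TreeSet;

final class CallerLog {
    /** Every distinct "class.method" that has called methodOfInterest() so far. */
    static final Set<String> CALLERS = new TreeSet<>();

    /** The method of interest: records who called it. */
    static synchronized void methodOfInterest() {
        // Element 0 is this method itself, element 1 is its direct caller.
        StackTraceElement caller = new Throwable().getStackTrace()[1];
        CALLERS.add(caller.getClassName() + "." + caller.getMethodName());
    }

    static void first() {
        methodOfInterest();
    }

    static void second() {
        methodOfInterest();
    }

    public static void main(String[] args) {
        first();
        second();
        System.out.println(CALLERS);
    }
}
```

This only sees callers that actually run, so it complements rather than replaces static searches.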
I made a small example based on @Chadwick's. It's a test that checks whether calls to getDatabaseEngine() are made by methods that carry the @Transaction annotation.
/**
* Ensures that methods that call {@link DatabaseProvider#getDatabaseEngine()}
* implement the {@link Transaction} annotation.
*
* @throws Exception If something occurs while testing.
*/
@Test
public void ensure() throws Exception {
final Method method = Method.getMethod(
DatabaseEngine.class.getCanonicalName() + " getDatabaseEngine()");
final ArrayList<java.lang.reflect.Method> faultyMethods = Lists.newArrayList();
for (Path p : getAllClasses()) {
try (InputStream stream = new BufferedInputStream(Files.newInputStream(p))) {
ClassReader reader = new ClassReader(stream);
reader.accept(new ClassAdapter(new EmptyVisitor()) {
@Override
public MethodVisitor visitMethod(final int access, final String name, final String desc, final String signature, final String[] exceptions) {
return new MethodAdapter(new EmptyVisitor()) {
@Override
public void visitMethodInsn(int opcode, String owner, String nameCode, String descCode) {
try {
final Class<?> klass = Class.forName(Type.getObjectType(owner).getClassName());
if (DatabaseProvider.class.isAssignableFrom(klass) &&
nameCode.equals(method.getName()) &&
descCode.equals(method.getDescriptor())) {
final java.lang.reflect.Method method = klass.getDeclaredMethod(name,
getParameters(desc).toArray(new Class[]{}));
for (Annotation annotation : method.getDeclaredAnnotations()) {
if (annotation.annotationType().equals(Transaction.class)) {
return;
}
}
faultyMethods.add(method);
}
} catch (Exception e) {
Throwables.propagate(e);
}
}
};
}
}, 0);
}
}
if (!faultyMethods.isEmpty()) {
fail("\n\nThe following methods must implement @Transaction because they're calling getDatabaseEngine().\n\n" + Joiner.on("\n").join
(faultyMethods) + "\n\n");
}
}
/**
* Gets all the classes from target.
*
* @return The list of classes.
* @throws IOException If something occurs while collecting those classes.
*/
private List<Path> getAllClasses() throws IOException {
final ImmutableList.Builder<Path> builder = new ImmutableList.Builder<>();
Files.walkFileTree(Paths.get("target", "classes"), new SimpleFileVisitor<Path>() {
@Override
public FileVisitResult visitFile(final Path file, final BasicFileAttributes attrs) throws IOException {
if (file.getFileName().toString().endsWith(".class")) {
builder.add(file);
}
return FileVisitResult.CONTINUE;
}
});
return builder.build();
}
/**
* Gets the list of parameters given the description.
*
* @param desc The method description.
* @return The list of parameters.
* @throws Exception If something occurs getting the parameters.
*/
private List<Class<?>> getParameters(String desc) throws Exception {
ImmutableList.Builder<Class<?>> obj = new ImmutableList.Builder<>();
for (Type type : Type.getArgumentTypes(desc)) {
obj.add(ClassUtils.getClass(type.getClassName()));
}
return obj.build();
}
1) In Eclipse: right-click the method and select "Open Call Hierarchy", or press Ctrl+Alt+H.
2) In JDeveloper: right-click the method and select "Calls", or press Alt+Shift+H.
The closest that I could find was the method described in the selected answer to this Stack Overflow question.
You can do this with something in your IDE such as "Find Usages" (which is what it is called in Netbeans and JDeveloper). A couple of things to note:
If your method implements a method from an interface or base class, you can only know that your method is POSSIBLY called.
A lot of Java frameworks use reflection to call your method (e.g. Spring, Hibernate, JSF), so be careful of that.
On the same note, your method could be called by some framework, reflectively or not, so again be careful.
I am using java.util.logging to generate log files. It makes use of FileHandler to write records into the log files.
The FileHandler class has a limit parameter that decides when to create a new file once the size (in bytes) of the current log file exceeds the limit.
Is there any way to make the file handler rotate on some other parameter instead of size, such as a maximum of N records per log file, so that the (N+1)th record starts a new log file?
If this can't be achieved using standard java.util.logging, is there any other open-source library that implements this behaviour (like log4j or any other open-source logger)?
The FileHandler can be extended to listen for a rotation by overriding the setLevel method. Then force the FileHandler to always rotate by setting the limit to one byte and then prevent the rotation from happening if your conditions are not met.
Here is a sample solution:
import java.io.File;
import java.io.IOException;
import java.util.logging.FileHandler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.SimpleFormatter;
public class CountingFileHandler extends FileHandler {
private static final RuntimeException PREVENT_ROTATE = new RuntimeException();
private final long maxRecords;
private long count;
public CountingFileHandler(String pattern, long maxRecords, int files) throws IOException {
super(pattern, 1, files, false);
this.maxRecords = maxRecords;
}
@Override
public synchronized void setLevel(Level lvl) {
if (Level.OFF.equals(lvl)) { //Rotation sets the level to OFF.
if (++count < maxRecords) {
throw PREVENT_ROTATE;
}
count = 0L;
}
super.setLevel(lvl);
}
@Override
public synchronized void publish(LogRecord record) {
try {
super.publish(record);
} catch (RuntimeException re) {
if (re != PREVENT_ROTATE) {
throw re;
}
}
}
public static void main(String[] args) throws Exception {
System.out.println(new File(".").getCanonicalPath());
CountingFileHandler cfh = new CountingFileHandler("test%g.log", 2, 5);
cfh.setFormatter(new SimpleFormatter());
for (int i = 0; i < 10; i++) {
cfh.publish(new LogRecord(Level.SEVERE, Integer.toString(i)));
}
cfh.close();
}
}
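For comparison, here is a different and more pedestrian approach than the setLevel trick above: a wrapper Handler that counts records and opens a fresh FileHandler every N records. It is only a sketch under simplifying assumptions: files are named prefix0.log, prefix1.log and so on, old files are never deleted, and the class name is made up.

```java
import java.io.IOException;
import java.util.logging.ErrorManager;
import java.util.logging.FileHandler;
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.SimpleFormatter;

final class RecordCountHandler extends Handler {
    private final String prefix;   // e.g. "/tmp/app" gives /tmp/app0.log, /tmp/app1.log, ...
    private final int maxRecords;  // records per file before switching
    private FileHandler current;
    private int nextIndex;
    private int count;

    RecordCountHandler(String prefix, int maxRecords) {
        this.prefix = prefix;
        this.maxRecords = maxRecords;
    }

    @Override
    public synchronized void publish(LogRecord record) {
        if (!isLoggable(record)) {
            return;
        }
        if (current == null || count >= maxRecords) {
            rotate();
        }
        if (current != null) {
            current.publish(record);
            count++;
        }
    }

    /** Closes the current file, if any, and opens the next one in sequence. */
    private void rotate() {
        if (current != null) {
            current.close();
        }
        current = null;
        count = 0;
        try {
            current = new FileHandler(prefix + nextIndex + ".log");
            current.setFormatter(new SimpleFormatter());
            nextIndex++;
        } catch (IOException e) {
            reportError("cannot open log file", e, ErrorManager.OPEN_FAILURE);
        }
    }

    /** Number of log files opened so far. */
    synchronized int filesOpened() {
        return nextIndex;
    }

    @Override
    public synchronized void flush() {
        if (current != null) {
            current.flush();
        }
    }

    @Override
    public synchronized void close() {
        if (current != null) {
            current.close();
        }
    }
}
```

With maxRecords set to 2, publishing five records would open three files.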
Otherwise, if you just want some max limit for a single log file you can just install com.sun.mail.util.logging.DurationFilter with a duration of Long.MAX_VALUE. That filter is included in the javax.mail.jar or the logging-mailhandler.jar. This solution won't provide the rotations you want.
I'm looking for a way to read ENTIRE files so that every file will be read entirely to a single String.
I want to pass a pattern of JSON text files on gs://my_bucket/*/*.json, have a ParDo then process each and every file entirely.
What's the best approach to it?
I am going to give the most generally useful answer, even though there are special cases [1] where you might do something different.
I think what you want to do is to define a new subclass of FileBasedSource and use Read.from(<source>). Your source will also include a subclass of FileBasedReader; the source contains the configuration data and the reader actually does the reading.
I think a full description of the API is best left to the Javadoc, but I will highlight the key override points and how they relate to your needs:
FileBasedSource#isSplittable() you will want to override and return false. This will indicate that there is no intra-file splitting.
FileBasedSource#createForSubrangeOfFile(String, long, long) you will override to return a sub-source for just the file specified.
FileBasedSource#createSingleFileReader() you will override to produce a FileBasedReader for the current file (the method should assume it is already split to the level of a single file).
To implement the reader:
FileBasedReader#startReading(...) you will override to do nothing; the framework will already have opened the file for you, and it will close it.
FileBasedReader#readNextRecord() you will override to read the entire file as a single element.
[1] One example easy special case is when you actually have a small number of files, you can expand them prior to job submission, and they all take the same amount of time to process. Then you can just use Create.of(expand(<glob>)) followed by ParDo(<read a file>).
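The "expand the glob, then read each file whole" idea from the footnote can be illustrated with plain JDK calls. This sketch uses the local filesystem as a stand-in for GCS (an assumption; gs:// paths need the Dataflow/GCS APIs instead), and the class and method names are invented.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

final class WholeFiles {
    /** Maps each file in dir that matches the glob to its entire contents. */
    static Map<String, String> readAll(Path dir, String glob) {
        Map<String, String> contents = new TreeMap<>();
        try (DirectoryStream<Path> matches = Files.newDirectoryStream(dir, glob)) {
            for (Path file : matches) {
                // One file becomes one String, never split into lines.
                byte[] bytes = Files.readAllBytes(file);
                contents.put(file.getFileName().toString(),
                        new String(bytes, StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return contents;
    }

    /** Creates a temp directory with one JSON and one text file, then reads the JSON. */
    static Map<String, String> demo() {
        try {
            Path dir = Files.createTempDirectory("whole-files");
            Files.write(dir.resolve("a.json"), "{\"a\":1}".getBytes(StandardCharsets.UTF_8));
            Files.write(dir.resolve("b.txt"), "ignored".getBytes(StandardCharsets.UTF_8));
            return readAll(dir, "*.json");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```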
I was looking for a similar solution myself. Following Kenn's recommendations and a few other references such as XMLSource.java, I created the following custom source, which seems to be working fine.
I am not a developer so if anyone has suggestions on how to improve it, please feel free to contribute.
public class FileIO {
// Match TextIO.
public static Read.Bounded<KV<String,String>> readFilepattern(String filepattern) {
return Read.from(new FileSource(filepattern, 1));
}
public static class FileSource extends FileBasedSource<KV<String,String>> {
private String filename = null;
public FileSource(String fileOrPattern, long minBundleSize) {
super(fileOrPattern, minBundleSize);
}
public FileSource(String filename, long minBundleSize, long startOffset, long endOffset) {
super(filename, minBundleSize, startOffset, endOffset);
this.filename = filename;
}
// This will indicate that there is no intra-file splitting.
@Override
public boolean isSplittable(){
return false;
}
@Override
public boolean producesSortedKeys(PipelineOptions options) throws Exception {
return false;
}
@Override
public void validate() {}
@Override
public Coder<KV<String,String>> getDefaultOutputCoder() {
return KvCoder.of(StringUtf8Coder.of(),StringUtf8Coder.of());
}
@Override
public FileBasedSource<KV<String,String>> createForSubrangeOfFile(String fileName, long start, long end) {
return new FileSource(fileName, getMinBundleSize(), start, end);
}
@Override
public FileBasedReader<KV<String,String>> createSingleFileReader(PipelineOptions options) {
return new FileReader(this);
}
}
/**
* A reader that reads the entire text of a file from a {@link FileSource}.
*/
private static class FileReader extends FileBasedSource.FileBasedReader<KV<String,String>> {
private static final Logger LOG = LoggerFactory.getLogger(FileReader.class);
private ReadableByteChannel channel = null;
private long nextOffset = 0;
private long currentOffset = 0;
private boolean isAtSplitPoint = false;
private final ByteBuffer buf;
private static final int BUF_SIZE = 1024;
private KV<String,String> currentValue = null;
private String filename;
public FileReader(FileSource source) {
super(source);
buf = ByteBuffer.allocate(BUF_SIZE);
buf.flip();
this.filename = source.filename;
}
private int readFile(ByteArrayOutputStream out) throws IOException {
int byteCount = 0;
while (true) {
if (!buf.hasRemaining()) {
buf.clear();
int read = channel.read(buf);
if (read < 0) {
break;
}
buf.flip();
}
byte b = buf.get();
byteCount++;
out.write(b);
}
return byteCount;
}
@Override
protected void startReading(ReadableByteChannel channel) throws IOException {
this.channel = channel;
}
@Override
protected boolean readNextRecord() throws IOException {
currentOffset = nextOffset;
ByteArrayOutputStream buf = new ByteArrayOutputStream();
int offsetAdjustment = readFile(buf);
if (offsetAdjustment == 0) {
// EOF
return false;
}
nextOffset += offsetAdjustment;
isAtSplitPoint = true;
currentValue = KV.of(this.filename,CoderUtils.decodeFromByteArray(StringUtf8Coder.of(), buf.toByteArray()));
return true;
}
@Override
protected boolean isAtSplitPoint() {
return isAtSplitPoint;
}
@Override
protected long getCurrentOffset() {
return currentOffset;
}
@Override
public KV<String,String> getCurrent() throws NoSuchElementException {
return currentValue;
}
}
}
A much simpler method is to generate the list of filenames and write a function to process each file individually. I'm showing Python, but Java is similar:
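def generate_filenames():
    for shard in range(0, 300):
        yield 'gs://bucket/some/dir/myfilname-%05d-of-00300' % shard

with beam.Pipeline(...) as p:
    # beam.Create turns the generated names into a PCollection;
    # readfile is your own function that reads one file.
    (p | beam.Create(generate_filenames())
       | beam.FlatMap(lambda filename: readfile(filename))
       | ...)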
FileIO does that for you without the need to implement your own FileBasedSource.
Create matches for each of the files that you want to read:
mypipeline.apply("Read files from GCS", FileIO.match().filepattern("gs://mybucket/myfilles/*.txt"))
Also, if you do not want Dataflow to throw an exception when no file matches your file pattern, you can read like this:
mypipeline.apply("Read files from GCS", FileIO.match().filepattern("gs://mybucket/myfilles/*.txt").withEmptyMatchTreatment(EmptyMatchTreatment.ALLOW))
Read your matches using FileIO:
.apply("Read file matches", FileIO.readMatches())
The above code returns a PCollection<FileIO.ReadableFile>. Then you create a DoFn that processes these ReadableFiles to meet your use case.
.apply("Process my files", ParDo.of(MyCustomDoFnToProcessFiles.create()))
The full documentation for FileIO is in the Apache Beam Javadoc.
I'm learning design patterns and at this stage, I'm playing around with creational patterns, Factory Method to be precise.
The idea is to have a Factory Method to create either documents or links(symlinks) in a "virtual" filesystem.
Would someone be willing to take a look at the code, advise me, and guide me to the right path?
FileFactory.java
public final class FileFactory {
public FileFactory(){}
public static IFileFactory createSoftLink(){
return new SymbolicLinkFactory();
}
public static IFileFactory createDocument(){
return new DocumentFileFactory();
}
}
Interface IFileFactory.java
public interface IFileFactory {
FileSystemElement createFile (String name, String mimeType, String currentDirectory, String user) throws IOException;}
DocumentFileFactory.java
public class DocumentFileFactory implements IFileFactory {
protected DocumentFileFactory() {}
@Override
public mFile createFile (String name, String mimeType, String currentDirectory, String user) throws IOException {
String fName = name;
if (mimeType.equalsIgnoreCase("docx")) {
fName += ".docx";
}else if (mimeType.equalsIgnoreCase("pptx")) {
fName += ".pptx";
}else if (mimeType.equalsIgnoreCase("xlsx")) {
fName += ".xlsx";
}else if (mimeType.equalsIgnoreCase("docm")) {
fName += ".docm";
}else if (mimeType.equalsIgnoreCase("pptm")) {
fName += ".pptm";
}else if (mimeType.equalsIgnoreCase("xlxm")) {
fName += ".xlxm";
}else {
fName += "."+mimeType;
}
mFile file = new mFile();
file.rename(fName);
file.create(fName, currentDirectory, user);
Path filePath = Paths.get(currentDirectory+System.getProperty("file.separator")+fName);
file.setPath(filePath);
return file;
}
}
SymbolicLinkFactory.java
public class SymbolicLinkFactory implements IFileFactory {
private FileSystemElement fsElement;
protected SymbolicLinkFactory() {}
@Override
public FileSystemElement createFile(String name, String mimeType, String currentDirectory, String user) throws IOException {
SoftLink sl = new SoftLink(name, fsElement, fsElement.getPath(), Paths.get(currentDirectory));
return sl;
}
public void setFileSystemElement(FileSystemElement fsElement) {
this.fsElement = fsElement;
}
}
In general it looks fine to me. It would be better if you had some logic that showed why you would use the factory pattern, for example something that checks whether the document you want to create already exists and returns a soft-link creator if it does.
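A minimal sketch of that kind of decision logic might look like the following. Document and Link are simplified stand-ins for the question's mFile and SoftLink, and tracking existing names in a Set is an assumption made to keep the example self-contained.

```java
import java.util.HashSet;
import java.util.Set;

final class ExistsAwareFactory {
    interface Element {
        String describe();
    }

    /** Stand-in for the question's mFile. */
    static final class Document implements Element {
        private final String name;
        Document(String name) { this.name = name; }
        public String describe() { return "document " + name; }
    }

    /** Stand-in for the question's SoftLink. */
    static final class Link implements Element {
        private final String target;
        Link(String target) { this.target = target; }
        public String describe() { return "link -> " + target; }
    }

    private final Set<String> existing = new HashSet<>();

    /** Creates a document the first time a name is seen, and a link to it afterwards. */
    Element create(String name) {
        // Set.add returns false when the name is already present.
        if (existing.add(name)) {
            return new Document(name);
        }
        return new Link(name);
    }
}
```

The caller just asks for an element; the factory is the only place that decides which concrete type fits.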
As mentioned by Josh, the implementation of createFile is a bit confusing, and diverts attention from your problem/goal.
As the code stands, there is no need for the FileFactory class. All it does is to centralize/shortcut to your implementations of the IFileFactory interface.
There are two other common use cases for a factory class that might be relevant to your situation:
Dependency injection: hide which implementation is actually used inside the FileFactory class, and allow switching the implementation at startup/runtime.
Factory/class families: here you are only creating one instance. Imagine instead that you had two kinds of items, a file item and a folder. In that case it's important that a "file item" you create is of the same kind as your "folder", for example so that a file system file goes in a file system folder, and a Wiki page goes in a Wiki URL.
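Both ideas can be sketched together: one interface per family of related elements, and a single place that decides which family is active. All names below are hypothetical, and Strings stand in for real element types to keep the example short.

```java
final class FamilyDemo {
    /** A family creates items and folders of matching kinds. */
    interface ElementFactory {
        String createItem(String name);
        String createFolder(String name);
    }

    static final class DiskFactory implements ElementFactory {
        public String createItem(String name) { return "disk file " + name; }
        public String createFolder(String name) { return "disk folder " + name; }
    }

    static final class WikiFactory implements ElementFactory {
        public String createItem(String name) { return "wiki page " + name; }
        public String createFolder(String name) { return "wiki URL " + name; }
    }

    // The only place that knows which family is in use; callers never name a concrete class.
    private static ElementFactory active = new DiskFactory();

    /** Switches the implementation, e.g. at startup from configuration or in a test. */
    static void configure(ElementFactory factory) {
        active = factory;
    }

    static ElementFactory factory() {
        return active;
    }
}
```

Because items and folders always come from the same factory object, a wiki page cannot end up in a disk folder by accident.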