We have an application based on the netbeans rich client platform.
Default behaviour is that only one instance of the software can run, and this can be overridden by specifying a different user-dir as argument when starting the application.
Are there any alternatives to this? Our clients rely on being able to run several instances of the application.
(We would prefer if we didnt have to implement our own launcher that examines netbeans userdir for locks to pass alternative userdirs)
Is there a build-in way in the netbeans RCP to easily support multiple instances?
cheers
/B
So, unable to find a proper way to do this (seems like it will be available with nb 7 though through a new launcher input parameter), my workmate implemented it in a somewhat hackish way.
He created an alternative main-class, which checks and manipulates the user-dir property so each instance gets its own userdir.
Im still surprised that netbeans does not have this functionality!
import java.io.File;
import org.netbeans.Main;
/**
* Launches Netbeans Rich Client Platform application where mulitple instances of
* the application can run simultaneously for the same user.
* <p>
* Netbeans does not allow to run several instances of an application for the same
* user simultaneously, but this class does the job by fooling Netbeans.
* If an instance is already running then a new folder is created and used for this instance.
* </p>
* <p>
* This is quite tricky and this class is indeed a hack to override Netbeans behaviour.
* An example makes it a little easier to understand, when application is first started
* Netbeans userdir is set from a configuration file like {#code etc/xxxxx.conf} to something like:<br>
* {#code /home/username/.xxxxx/dev/} which includes a lock file.
* <br>
* If application is started again then this lock file is checked and Netbeans tries to connect to the other instance through a socket.
* This class does the trick by never using this folder but instead creating unique directories for each instance like:<br>
* {#code /home/username/.xxxxx/instance_01/netbeans/}<br>
* {#code /home/username/.xxxxx/instance_02/netbeans/}<br>
* {#code /home/username/.xxxxx/instance_03/netbeans/}<br>
* ...
* </p>
*
*/
public class MultipleInstancesMain
{
/** Key for Netbeans default user dir */
private static final String NETBEANS_USER = "netbeans.user";
/** Argument to Netbeans for alternate user dir */
private static final String USERDIR_ARG = "--userdir";
/** Like "C:\Documents and Settings\username\Application Data\.xxxxx" or "/home/username/.xxxxx" */
private static final File MAIN_DIR = getMainDir();
/** Sub dir of MAIN_DIR for each instance of application */
private static final String INSTANCE_DIR = "instance";
/** Sub dir of INSTANCE_DIR where netbeans stores it's settings, logs and cache */
private static final String NETBEANS_SUBDIR = "netbeans";
/** Name of Netbeans lock file inside of NETBEANS_SUBDIR */
private static final String LOCKFILE = "lock";
/** Max number of instance directories we allow */
private static final int MAX_DIR_COUNT = 999;
/** Padding of instance dir */
private static final String PAD = "000";
private static final int PAD_LENGTH = PAD.length();
/**
* Lock file could be present even though application is not running (after crash).
* So we treat it as "dead" after some time, to prevent "dead directories".
*/
private static final long LOCKFILE_MAX_AGE_IN_MILLIS = 1000L * 60L * 60L * 24L * 30L; // 30 days
/**
* Launches application with multiple instances support.
* #param args command line arguments
*/
public static void main(String[] args) throws Exception
{
// Get directory for this instance
String[] userDir = new String[2];
userDir[0] = USERDIR_ARG;
userDir[1] = getNetbeansDir();
// Remove default dir and set this class not to run again
deleteDefaultNetbeansDir();
System.clearProperty("netbeans.mainclass");
// Start Netbeans again with new userdir and default main class
startNetbeans(args, userDir);
}
/**
* Starts Netbeans.
* #param oldArgs command line arguments
* #param newArgs new arguments added
*/
private static void startNetbeans(String[] oldArgs, String[] newArgs) throws Exception
{
String[] args = new String[oldArgs.length + newArgs.length];
for (int i = 0; i <oldArgs.length; i++)
{
args[i] = oldArgs[i];
}
for (int i = 0; i < newArgs.length; i++)
{
args[oldArgs.length + i] = newArgs[i];
}
Main.main(args);
}
/**
* #return the directory that Netbeans will use for this instance of the application
*/
private static String getNetbeansDir()
{
for(int i = 1; i <= MAX_DIR_COUNT; i++)
{
File instanceDir = getSuffixedInstanceDir(i);
if (isLockFileFree(instanceDir))
{
File dirToUse = new File(instanceDir, NETBEANS_SUBDIR);
return dirToUse.getAbsolutePath();
}
}
// This would probably never happen, but we don't want an eternal loop above
String errorMessage = String.format("Unable to find Netbeans userdir, %s dirs checked in '%s'",
MAX_DIR_COUNT, MAIN_DIR.getAbsolutePath());
throw new RuntimeException(errorMessage);
}
private static File getSuffixedInstanceDir(int count)
{
String suffix = PAD + count;
suffix = suffix.substring(suffix.length() - PAD_LENGTH);
File suffixedDir = new File(MAIN_DIR, INSTANCE_DIR + "_" + suffix);
return suffixedDir;
}
/**
* Checks is if Netbeans lock file is free.
* #param instanceDir directory to look for Netbeans directory and lock file in
* #return true if {#code instanceDir} does not have a Netbeans folder with a occupied lock file
*/
private static boolean isLockFileFree(File instanceDir)
{
File netbeansDir = new File(instanceDir, NETBEANS_SUBDIR);
File lockFile = new File(netbeansDir, LOCKFILE);
if (lockFile.exists())
{
return deleteLockFileIfOldEnough(lockFile);
}
else
{
return true;
}
}
/**
* Deletes the lock file if it's old enough.
* #param lockFile lock file to delete
* #return true if it was deleted
* #see #LOCKFILE_MAX_AGE_IN_MILLIS
*/
private static boolean deleteLockFileIfOldEnough(File lockFile)
{
long currentTime = System.currentTimeMillis();
long fileCreated = lockFile.lastModified();
long ageInMillis = currentTime - fileCreated;
if (ageInMillis > LOCKFILE_MAX_AGE_IN_MILLIS)
{
return lockFile.delete();
}
else
{
return false;
}
}
/**
* Netbeans is started with a default userdir, but we need to have a unique dir for each instance.
* Main dir is the directory where directories of all instances are.
* #return main directory in users home area where application settings, logs and cache is stored
*/
private static File getMainDir()
{
String defaultNetbeansDir = System.getProperty(NETBEANS_USER);
File mainDir = new File(defaultNetbeansDir).getParentFile();
return mainDir;
}
/**
* Since we don't use default Netbeans dir at all, we remove it completely.
*/
private static void deleteDefaultNetbeansDir()
{
File defaultNetbeansDir = new File(System.getProperty(NETBEANS_USER));
Thread t = new Thread(new DirectoryDeleter(defaultNetbeansDir), "NetbeansDirDeleter");
t.start();
}
/**
* There are unpredictable behaviour when deleting Netbeans default directory, sometime it succeeds and sometimes not.
* But after some attempts it always succeeds, by deleting it in the background it will eventually be deleted.
* #author username
*/
private static class DirectoryDeleter implements Runnable
{
private static final long SLEEP_TIME = 3000;
private final File dirToDelete;
DirectoryDeleter(File dirToDelete)
{
this.dirToDelete = dirToDelete;
}
/**
* #see java.lang.Runnable#run()
*/
public void run()
{
while(!deleteDirOrFile(dirToDelete))
{
try
{
Thread.sleep(SLEEP_TIME);
}
catch (InterruptedException e)
{
// No idea to do anything here, should never happen anyway...
continue;
}
}
}
/**
* Deletes a file or directory
* #param dirFile directory or file to delete
* #return true if deletion succeeded
*/
private boolean deleteDirOrFile(File dirFile)
{
if (dirFile.isDirectory())
{
for (File f : dirFile.listFiles())
{
boolean deleted = deleteDirOrFile(f);
if (!deleted)
{
return false;
}
}
}
// The directory is now empty so delete it
return dirFile.delete();
}
}
}
I'm the mysterious developer who wrote the MultipleInstancesMain class above =)
As answer to the questions by Swati Sharma and uvaraj above, these notes might help:
(1) The alternative main class above should work for applications built on Netbeans Platform 6.5.1 (our does), other versions I don't know about.
(2) We keep this class in a tiny project that is built by the main build.xml ant script of our application (the one in the "suite" module). The jar is then copied to the /[appname]/modules/ folder (so it's on Netbeans classpath)
(3) The separate project can NOT be a Netbeans module, I guess there will be cyclic dependency or Netbeans classloader can't handle it or something like that.
(4) Our application is then started by adding parameter to configuration file /[appname]/etc/.conf:
default_options="-J-Dnetbeans.mainclass=com.appname.startup.MultipleInstancesMain"
Hope it helps
// Uhlén
Related
In android 10 and in android 11 how to load all files from internal and external storage. Actually I want to load a specific extension files (i.e.pdf, txt) within my app, either the specific extension file is saved in internal or external storage.
I have used the following code which is working fine upto android 8.1 API Level 26
I have taken the StorageUtil class from internet, which will load all the possible physical path
public class StorageUtil {
// Primary physical SD-CARD (not emulated)
private static final String EXTERNAL_STORAGE = System.getenv("EXTERNAL_STORAGE");
// All Secondary SD-CARDs (all exclude primary) separated by File.pathSeparator, i.e: ":", ";"
private static final String SECONDARY_STORAGES = System.getenv("SECONDARY_STORAGE");
// Primary emulated SD-CARD
private static final String EMULATED_STORAGE_TARGET = System.getenv("EMULATED_STORAGE_TARGET");
// PhysicalPaths based on phone model
#SuppressLint("SdCardPath")
#SuppressWarnings("SpellCheckingInspection")
private static final String[] KNOWN_PHYSICAL_PATHS = new String[]{
"/storage/sdcard0",
"/storage/sdcard1", //Motorola Xoom
"/storage/extsdcard", //Samsung SGS3
"/storage/sdcard0/external_sdcard", //User request
"/mnt/extsdcard",
"/mnt/sdcard/external_sd", //Samsung galaxy family
"/mnt/sdcard/ext_sd",
"/mnt/external_sd",
"/mnt/media_rw/sdcard1", //4.4.2 on CyanogenMod S3
"/removable/microsd", //Asus transformer prime
"/mnt/emmc",
"/storage/external_SD", //LG
"/storage/ext_sd", //HTC One Max
"/storage/removable/sdcard1", //Sony Xperia Z1
"/data/sdext",
"/data/sdext2",
"/data/sdext3",
"/data/sdext4",
"/sdcard1", //Sony Xperia Z
"/sdcard2", //HTC One M8s
"/storage/microsd" //ASUS ZenFone 2
};
/**
* Returns all available storages in the system (include emulated)
* <p/>
* Warning: Hack! Based on Android source code of version 4.3 (API 18)
* Because there is no standard way to get it.
*
* #return paths to all available storages in the system (include emulated)
*/
public static String[] getStorageDirectories(Context context) {
// Final set of paths
final Set<String> availableDirectoriesSet = new HashSet<>();
if (!TextUtils.isEmpty(EMULATED_STORAGE_TARGET)) {
// Device has an emulated storage
availableDirectoriesSet.add(getEmulatedStorageTarget());
} else {
// Device doesn't have an emulated storage
availableDirectoriesSet.addAll(getExternalStorage(context));
}
// Add all secondary storages
Collections.addAll(availableDirectoriesSet, getAllSecondaryStorages());
String[] storagesArray = new String[availableDirectoriesSet.size()];
return availableDirectoriesSet.toArray(storagesArray);
}
private static Set<String> getExternalStorage(Context context) {
final Set<String> availableDirectoriesSet = new HashSet<>();
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.M) {
// Solution of empty raw emulated storage for android version >= marshmallow
// because the EXTERNAL_STORAGE become something like: "/Storage/A5F9-15F4",
// so we can't access it directly
File[] files = getExternalFilesDirs(context, null);
for (File file : files) {
if (file != null) {
String applicationSpecificAbsolutePath = file.getAbsolutePath();
String rootPath = applicationSpecificAbsolutePath.substring(
0,
applicationSpecificAbsolutePath.indexOf("Android/data")
);
availableDirectoriesSet.add(rootPath);
}
}
} else {
if (TextUtils.isEmpty(EXTERNAL_STORAGE)) {
availableDirectoriesSet.addAll(getAvailablePhysicalPaths());
} else {
// Device has physical external storage; use plain paths.
availableDirectoriesSet.add(EXTERNAL_STORAGE);
}
}
return availableDirectoriesSet;
}
private static String getEmulatedStorageTarget() {
String rawStorageId = "";
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.JELLY_BEAN_MR1) {
// External storage paths should have storageId in the last segment
// i.e: "/storage/emulated/storageId" where storageId is 0, 1, 2, ...
final String path = Environment.getExternalStorageDirectory().getAbsolutePath();
final String[] folders = path.split(File.separator);
final String lastSegment = folders[folders.length - 1];
if (!TextUtils.isEmpty(lastSegment) && TextUtils.isDigitsOnly(lastSegment)) {
rawStorageId = lastSegment;
}
}
if (TextUtils.isEmpty(rawStorageId)) {
return EMULATED_STORAGE_TARGET;
} else {
return EMULATED_STORAGE_TARGET + File.separator + rawStorageId;
}
}
private static String[] getAllSecondaryStorages() {
if (!TextUtils.isEmpty(SECONDARY_STORAGES)) {
// All Secondary SD-CARDs split into array
return SECONDARY_STORAGES.split(File.pathSeparator);
}
return new String[0];
}
/**
* Filter available physical paths from known physical paths
*
* #return List of available physical paths from current device
*/
private static List<String> getAvailablePhysicalPaths() {
List<String> availablePhysicalPaths = new ArrayList<>();
for (String physicalPath : KNOWN_PHYSICAL_PATHS) {
File file = new File(physicalPath);
if (file.exists()) {
availablePhysicalPaths.add(physicalPath);
}
}
return availablePhysicalPaths;
}
/**
* Returns absolute paths to application-specific directories on all
* external storage devices where the application can place persistent files
* it owns. These files are internal to the application, and not typically
* visible to the user as media.
* <p>
* This is like {#link Context#getFilesDir()} in that these files will be
* deleted when the application is uninstalled, however there are some
* important differences:
* <ul>
* <li>External files are not always available: they will disappear if the
* user mounts the external storage on a computer or removes it.
* <li>There is no security enforced with these files.
* </ul>
* <p>
* External storage devices returned here are considered a permanent part of
* the device, including both emulated external storage and physical media
* slots, such as SD cards in a battery compartment. The returned paths do
* not include transient devices, such as USB flash drives.
* <p>
* An application may store data on any or all of the returned devices. For
* example, an app may choose to store large files on the device with the
* most available space, as measured by {#link android.os.StatFs}.
* <p>
* Starting in {#link Build.VERSION_CODES#KITKAT}, no permissions
* are required to write to the returned paths; they're always accessible to
* the calling app. Before then,
* {#link android.Manifest.permission#WRITE_EXTERNAL_STORAGE} is required to
* write. Write access outside of these paths on secondary external storage
* devices is not available. To request external storage access in a
* backwards compatible way, consider using {#code android:maxSdkVersion}
* like this:
*
* <pre class="prettyprint"><uses-permission
* android:name="android.permission.WRITE_EXTERNAL_STORAGE"
* android:maxSdkVersion="18" /></pre>
* <p>
* The first path returned is the same as
* {#link Context#getExternalFilesDir(String)}. Returned paths may be
* {#code null} if a storage device is unavailable.
*
* #see Context#getExternalFilesDir(String)
*/
private static File[] getExternalFilesDirs(Context context, String type) {
if (Build.VERSION.SDK_INT >= 19) {
return context.getExternalFilesDirs(type);
} else {
return new File[]{context.getExternalFilesDir(type)};
}
}
And loadDirectoryFiles function which will load all the files within specific directory
object LoadDirectory {
fun loadDirectoryFiles(directory: File) {
try {
val fileList: Array<File> = directory.listFiles()
if (fileList.isNotEmpty()) {
for (i in fileList.indices) {
if (fileList[i].isDirectory) {
loadDirectoryFiles(fileList[i])
} else {
val name: String = fileList[i].name.toLowerCase(Locale.getDefault())
//Now look for desired extension
if (name.endsWith(Constants.PDF_EXTENSION)) {
Constants.allFileList.add(fileList[i])
}
}
}
}
}
catch (e : Exception){
Log.d("TAG","Error: ${e.message} and directory: ${directory.absoluteFile}")
}
}
}
and here its usage
val storagePaths = StorageUtil.getStorageDirectories(this)
for (path in storagePaths) {
val storageFile = File(path)
LoadDirectory.loadDirectoryFiles(storageFile)
}
The above code is working fine upto android 8.1 but is not working on above android 8.1, Now how to achieve this by loading a specific extension (i.e. pdf) files from internal and external storage
Thanks in advance for helping me
I am learning Java (SE 8) using the Eclipse IDE (Oxygen) on Windows. I have done some "hobby" programming before but this is my first formal class in the subject. I want to be able to print the output that the assignment requires in the normal console (System.out.println) and print informational what's happening text into a different console at the same time.
In pseudocode what I want is something like:
printToConsole1("Normal program outputs.");
printToConsole2("Behind the scenes information.");
Can I do anything like this in Java?
Using a logging framework like Logback or Log4j2, you can write messages at specific logging levels or from different loggers to two separate files. In this vein, you can accomplish what you're looking for; you would write out the "normal" output to either the console or one file, and the behind-the-scenes information to a separate file.
How to configure these in your application is well outside the scope of Stack Overflow, but I strongly encourage you to peruse them.
Hey, there's no one saying you can't have two terminal windows open, y'know?
You can print to standard err (System.err.println()) and my IDE (which is not Eclipse) will format those lines differently -- they show up in red, making them easy spot.
Regarding swing-console, no I was thinking of something different. Namely just using the logging framework as Makoto suggested, and then just using a log handler that will display log lines to a window. The following is a short example I created.
/**
* A log handler which creates a visible window for logging output.
*
* <p>A WindowHandler is a standard log handler which may be utilized
* in just the same was as any other log handler.</p>
*
* <p>In addition, a convenience static method is provided, WindowHandler.show(),
* which will programmatically install a root handler and display a window,
* allowing debugging output to be easily captured and viewed on screen.</p>
*
* #author Brenden
*/
public class WindowHandler extends java.util.logging.Handler {
// These static globals must be accessed on the EDT only!
private static WindowHandler globalHandler;
private static final ArrayList<WeakReference<WindowHandler>> windowList =
new ArrayList<>();
private JTextArea text;
private JFrame logWindow;
/**
* Returns the window component (usually a JFrame) associated with this
* WindowHandler.
*
* #return The top-level window for this handler.
*/
public Component getWindow() {
return logWindow;
}
/**
* Returns the JTextArea associated with the logging output captured by
* this handler.
*
* #return A JTextArea where logging is displayed.
*/
public JTextArea getTextArea() {
return text;
}
/**
* A list of ALL WindowHandler created so far in the system.
*
* <p>This list is useful for getting a list of windows on screen, for
* example, and then performing some operation on these windows. For
* example, hiding all windows, or closing them, or tiling them on screen
* for easier viewing.</p>
*
* #return An array list of all current window handlers.
*/
public static ArrayList<WindowHandler> getHandlers() {
ArrayList<WindowHandler> retVal = new ArrayList<>();
for( WeakReference<WindowHandler> wr : windowList ) {
WindowHandler h = wr.get();
if( h != null ) retVal.add( h );
}
return retVal;
}
/**
* A convenience method for starting an instance of a WindowHandler.
*
* <p>Call this method at the beginning of your program. After calling
* this method, all debugging output will be captured by the
* WindowHandler. Note that individual loggers still have their respective
* log levels. WindowHandler is set to Level.ALL by default, but won't
* receive output from loggers that have their own levels set too high
* to capture output.</p>
*/
public static synchronized void show() {
utilEDTWait( () -> {
if( globalHandler == null ) {
Logger root = Logger.getLogger( "" );
root.addHandler( globalHandler = new WindowHandler() );
} else {
globalHandler.getWindow().setVisible( true );
}
} );
}
/**
* Creates a new WindowHandler.
*
* <p>This method creates a new handler, sets its level to ALL, and creates
* the necessary Swing components on the EDT, and displays those components
* as well. After creation, this handler may still be configured as normal
* for any other logging handler.</p>
*/
public WindowHandler() {
utilEDTWait( () -> {
setLevel( Level.ALL );
JFrame frame = new JFrame( "DEBUG" );
text = new JTextArea();
text.setEditable( false );
text.setPreferredSize( new Dimension( 300, 300 ) );
JScrollPane scroll = new JScrollPane( text );
frame.add( scroll );
frame.setDefaultCloseOperation( JFrame.HIDE_ON_CLOSE );
frame.pack();
frame.setLocation( windowList.size()*20, windowList.size()*20 );
frame.setVisible( true );
logWindow = frame;
windowList.add( new WeakReference<>( this ) );
} );
}
#Override
public void publish( LogRecord record ) {
SwingUtilities.invokeLater( () -> {
StringBuilder log = new StringBuilder( text.getText() );
log.append( record.getMessage() );
log.append( " -- " );
log.append( record.getLoggerName() );
log.append( " : " );
log.append( record.getLevel() );
log.append( " // " );
log.append( new Date( record.getMillis() ) );
log.append( "\n" );
text.setText( log.toString() );
} );
}
#Override
public void flush() {
}
#Override
public void close() throws SecurityException {
}
private static void utilEDTWait( Runnable r ) {
if( SwingUtilities.isEventDispatchThread() ) {
r.run();
} else {
try {
SwingUtilities.invokeAndWait( r );
} catch( InterruptedException ex ) {
Thread.currentThread().interrupt();
throw new RuntimeException( ex );
} catch( InvocationTargetException ex ) {
Logger.getLogger( WindowHandler.class.getName() ).log( Level.SEVERE, null, ex );
}
}
}
}
No. Your Java Application has the same limitations as any process: a standard input (System.in), a standard output (System.out), and standard error output (System.err), and both the regular and error output are displayed in the same Console. They are, however, displayed using different colors. Typically you would either wrap invocations of one using a master boolean value as an on/off switch, or use a logging library of some kind that would make the two kinds of output easier to differentiate.
Separating the two into consoles of their own would make for an interesting feature request.
I'm having a difficult time understanding the concepts of .withFileNamePolicy of TextIO.write(). The requirements for supplying a FileNamePolicy seem incredibly complex for doing something as simple as specifying a GCS bucket to write streamed filed.
At a high level, I have JSON messages being streamed to a PubSub topic, and I'd like to write those raw messages to files in GCS for permanent storage (I'll also be doing other processing on the messages). I initially started with this Pipeline, thinking it would be pretty simple:
public static void main(String[] args) {
PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();
Pipeline p = Pipeline.create(options);
p.apply("Read From PubSub", PubsubIO.readStrings().fromTopic(topic))
.apply("Write to GCS", TextIO.write().to(gcs_bucket);
p.run();
}
I got the error about needing WindowedWrites, which I applied, and then needing a FileNamePolicy. This is where things get hairy.
I went to the Beam docs and checked out FilenamePolicy. It looks like I would need to extend this class which then also require extending other abstract classes to make this work. Unfortunately the documentation on Apache is a bit scant and I can't find any examples for Dataflow 2.0 doing this, except for The Wordcount Example, which even then uses implements these details in a helper class.
So I could probably make this work just by copying much of the WordCount example, but I'm trying to better understand the details of this. A few questions I have:
1) Is there any roadmap item to abstract a lot of this complexity? It seems like I should be able to do supply a GCS bucket like I would in a nonWindowedWrite, and then just supply a few basic options like the timing and file naming rule. I know writing streaming windowed data to files is more complex than just opening a file pointer (or object storage equivalent).
2) It looks like to make this work, I need to create a WindowedContext object which requires supplying a BoundedWindow abstract class, and PaneInfo Object Class, and then some shard info. The information available for these is pretty bare and I'm having a hard time knowing what is actually needed for all of these, especially given my simple use case. Are there any good examples available that implement these? In addition, it also looks like I need the set the # of shards as part of TextIO.write, but then also supply # shards as part of the fileNamePolicy?
Thanks for anything in helping me understand the details behind this, hoping to learn a few things!
Edit 7/20/17
So I finally got this pipeline to run with extending the FilenamePolicy. My challenge was needing to define the window of the streaming data from PubSub. Here is a pretty close representation of the code:
public class ReadData {
public static void main(String[] args) {
PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();
Pipeline p = Pipeline.create(options);
p.apply("Read From PubSub", PubsubIO.readStrings().fromTopic(topic))
.apply(Window.into(FixedWindows.of(Duration.standardMinutes(1))))
.apply("Write to GCS", TextIO.write().to("gcs_bucket")
.withWindowedWrites()
.withFilenamePolicy(new TestPolicy())
.withNumShards(10));
p.run();
}
}
class TestPolicy extends FileBasedSink.FilenamePolicy {
#Override
public ResourceId windowedFilename(
ResourceId outputDirectory, WindowedContext context, String extension) {
IntervalWindow window = (IntervalWindow) context.getWindow();
String filename = String.format(
"%s-%s-%s-%s-of-%s.json",
"test",
window.start().toString(),
window.end().toString(),
context.getShardNumber(),
context.getShardNumber()
);
return outputDirectory.resolve(filename, ResolveOptions.StandardResolveOptions.RESOLVE_FILE);
}
#Override
public ResourceId unwindowedFilename(
ResourceId outputDirectory, Context context, String extension) {
throw new UnsupportedOperationException("Unsupported.");
}
}
In Beam 2.0, the below is an example of writing the raw messages from PubSub out into windowed files on GCS. The pipeline is fairly configurable, allowing you to specify the window duration via a parameter and a sub directory policy if you want logical subsections of your data for ease of reprocessing / archiving. Note that this has an additional dependency on Apache Commons Lang 3.
PubSubToGcs
/**
* This pipeline ingests incoming data from a Cloud Pub/Sub topic and
* outputs the raw data into windowed files at the specified output
* directory.
*/
public class PubsubToGcs {
/**
* Options supported by the pipeline.
*
* <p>Inherits standard configuration options.</p>
*/
public static interface Options extends DataflowPipelineOptions, StreamingOptions {
#Description("The Cloud Pub/Sub topic to read from.")
#Required
ValueProvider<String> getTopic();
void setTopic(ValueProvider<String> value);
#Description("The directory to output files to. Must end with a slash.")
#Required
ValueProvider<String> getOutputDirectory();
void setOutputDirectory(ValueProvider<String> value);
#Description("The filename prefix of the files to write to.")
#Default.String("output")
#Required
ValueProvider<String> getOutputFilenamePrefix();
void setOutputFilenamePrefix(ValueProvider<String> value);
#Description("The shard template of the output file. Specified as repeating sequences "
+ "of the letters 'S' or 'N' (example: SSS-NNN). These are replaced with the "
+ "shard number, or number of shards respectively")
#Default.String("")
ValueProvider<String> getShardTemplate();
void setShardTemplate(ValueProvider<String> value);
#Description("The suffix of the files to write.")
#Default.String("")
ValueProvider<String> getOutputFilenameSuffix();
void setOutputFilenameSuffix(ValueProvider<String> value);
#Description("The sub-directory policy which files will use when output per window.")
#Default.Enum("NONE")
SubDirectoryPolicy getSubDirectoryPolicy();
void setSubDirectoryPolicy(SubDirectoryPolicy value);
#Description("The window duration in which data will be written. Defaults to 5m. "
+ "Allowed formats are: "
+ "Ns (for seconds, example: 5s), "
+ "Nm (for minutes, example: 12m), "
+ "Nh (for hours, example: 2h).")
#Default.String("5m")
String getWindowDuration();
void setWindowDuration(String value);
#Description("The maximum number of output shards produced when writing.")
#Default.Integer(10)
Integer getNumShards();
void setNumShards(Integer value);
}
/**
* Main entry point for executing the pipeline.
* #param args The command-line arguments to the pipeline.
*/
public static void main(String[] args) {
Options options = PipelineOptionsFactory
.fromArgs(args)
.withValidation()
.as(Options.class);
run(options);
}
/**
* Runs the pipeline with the supplied options.
*
* #param options The execution parameters to the pipeline.
* #return The result of the pipeline execution.
*/
public static PipelineResult run(Options options) {
// Create the pipeline
Pipeline pipeline = Pipeline.create(options);
/**
* Steps:
* 1) Read string messages from PubSub
* 2) Window the messages into minute intervals specified by the executor.
* 3) Output the windowed files to GCS
*/
pipeline
.apply("Read PubSub Events",
PubsubIO
.readStrings()
.fromTopic(options.getTopic()))
.apply(options.getWindowDuration() + " Window",
Window
.into(FixedWindows.of(parseDuration(options.getWindowDuration()))))
.apply("Write File(s)",
TextIO
.write()
.withWindowedWrites()
.withNumShards(options.getNumShards())
.to(options.getOutputDirectory())
.withFilenamePolicy(
new WindowedFilenamePolicy(
options.getOutputFilenamePrefix(),
options.getShardTemplate(),
options.getOutputFilenameSuffix())
.withSubDirectoryPolicy(options.getSubDirectoryPolicy())));
// Execute the pipeline and return the result.
PipelineResult result = pipeline.run();
return result;
}
/**
* Parses a duration from a period formatted string. Values
* are accepted in the following formats:
* <p>
* Ns - Seconds. Example: 5s<br>
* Nm - Minutes. Example: 13m<br>
* Nh - Hours. Example: 2h
*
* <pre>
* parseDuration(null) = NullPointerException()
* parseDuration("") = Duration.standardSeconds(0)
* parseDuration("2s") = Duration.standardSeconds(2)
* parseDuration("5m") = Duration.standardMinutes(5)
* parseDuration("3h") = Duration.standardHours(3)
* </pre>
*
* #param value The period value to parse.
* #return The {#link Duration} parsed from the supplied period string.
*/
private static Duration parseDuration(String value) {
Preconditions.checkNotNull(value, "The specified duration must be a non-null value!");
PeriodParser parser = new PeriodFormatterBuilder()
.appendSeconds().appendSuffix("s")
.appendMinutes().appendSuffix("m")
.appendHours().appendSuffix("h")
.toParser();
MutablePeriod period = new MutablePeriod();
parser.parseInto(period, value, 0, Locale.getDefault());
Duration duration = period.toDurationFrom(new DateTime(0));
return duration;
}
}
WindowedFilenamePolicy
/**
* The {#link WindowedFilenamePolicy} class will output files
* to the specified location with a format of output-yyyyMMdd'T'HHmmssZ-001-of-100.txt.
*/
#SuppressWarnings("serial")
public class WindowedFilenamePolicy extends FilenamePolicy {
/**
* Possible sub-directory creation modes.
*/
public static enum SubDirectoryPolicy {
NONE("."),
PER_HOUR("yyyy-MM-dd/HH"),
PER_DAY("yyyy-MM-dd");
private final String subDirectoryPattern;
private SubDirectoryPolicy(String subDirectoryPattern) {
this.subDirectoryPattern = subDirectoryPattern;
}
public String getSubDirectoryPattern() {
return subDirectoryPattern;
}
public String format(Instant instant) {
DateTimeFormatter formatter = DateTimeFormat.forPattern(subDirectoryPattern);
return formatter.print(instant);
}
}
/**
* The formatter used to format the window timestamp for outputting to the filename.
*/
private static final DateTimeFormatter formatter = ISODateTimeFormat
.basicDateTimeNoMillis()
.withZone(DateTimeZone.getDefault());
/**
* The filename prefix.
*/
private final ValueProvider<String> prefix;
/**
* The filenmae suffix.
*/
private final ValueProvider<String> suffix;
/**
* The shard template used during file formatting.
*/
private final ValueProvider<String> shardTemplate;
/**
* The policy which dictates when or if sub-directories are created
* for the windowed file output.
*/
private ValueProvider<SubDirectoryPolicy> subDirectoryPolicy = StaticValueProvider.of(SubDirectoryPolicy.NONE);
/**
* Constructs a new {#link WindowedFilenamePolicy} with the
* supplied prefix used for output files.
*
* #param prefix The prefix to append to all files output by the policy.
* #param shardTemplate The template used to create uniquely named sharded files.
* #param suffix The suffix to append to all files output by the policy.
*/
public WindowedFilenamePolicy(String prefix, String shardTemplate, String suffix) {
this(StaticValueProvider.of(prefix),
StaticValueProvider.of(shardTemplate),
StaticValueProvider.of(suffix));
}
/**
* Constructs a new {#link WindowedFilenamePolicy} with the
* supplied prefix used for output files.
*
* #param prefix The prefix to append to all files output by the policy.
* #param shardTemplate The template used to create uniquely named sharded files.
* #param suffix The suffix to append to all files output by the policy.
*/
public WindowedFilenamePolicy(
ValueProvider<String> prefix,
ValueProvider<String> shardTemplate,
ValueProvider<String> suffix) {
this.prefix = prefix;
this.shardTemplate = shardTemplate;
this.suffix = suffix;
}
/**
* The subdirectory policy will create sub-directories on the
* filesystem based on the window which has fired.
*
* #param policy The subdirectory policy to apply.
* #return The filename policy instance.
*/
public WindowedFilenamePolicy withSubDirectoryPolicy(SubDirectoryPolicy policy) {
return withSubDirectoryPolicy(StaticValueProvider.of(policy));
}
/**
* The subdirectory policy will create sub-directories on the
* filesystem based on the window which has fired.
*
* #param policy The subdirectory policy to apply.
* #return The filename policy instance.
*/
public WindowedFilenamePolicy withSubDirectoryPolicy(ValueProvider<SubDirectoryPolicy> policy) {
this.subDirectoryPolicy = policy;
return this;
}
/**
* The windowed filename method will construct filenames per window in the
* format of output-yyyyMMdd'T'HHmmss-001-of-100.txt.
*/
#Override
public ResourceId windowedFilename(ResourceId outputDirectory, WindowedContext c, String extension) {
Instant windowInstant = c.getWindow().maxTimestamp();
String datetimeStr = formatter.print(windowInstant.toDateTime());
// Remove the prefix when it is null so we don't append the literal 'null'
// to the start of the filename
String filenamePrefix = prefix.get() == null ? datetimeStr : prefix.get() + "-" + datetimeStr;
String filename = DefaultFilenamePolicy.constructName(
filenamePrefix,
shardTemplate.get(),
StringUtils.defaultIfBlank(suffix.get(), extension), // Ignore the extension in favor of the suffix.
c.getShardNumber(),
c.getNumShards());
String subDirectory = subDirectoryPolicy.get().format(windowInstant);
return outputDirectory
.resolve(subDirectory, StandardResolveOptions.RESOLVE_DIRECTORY)
.resolve(filename, StandardResolveOptions.RESOLVE_FILE);
}
/**
* Unwindowed writes are unsupported by this filename policy so an {#link UnsupportedOperationException}
* will be thrown if invoked.
*/
#Override
public ResourceId unwindowedFilename(ResourceId outputDirectory, Context c, String extension) {
throw new UnsupportedOperationException("There is no windowed filename policy for unwindowed file"
+ " output. Please use the WindowedFilenamePolicy with windowed writes or switch filename policies.");
}
}
In Beam currently the DefaultFilenamePolicy supports windowed writes, so there's no need to write a custom FilenamePolicy. You can control the output filename by putting W and P placeholders (for the window and pane respectively) in the filename template. This exists in the head beam repository, and will also be in the upcoming Beam 2.1 release (which is being released as we speak).
As we know we can perform svn operations like checkout, commit, update using tools like Tortoise svn etc.
Now I am trying to perform svn operations like svn checkout, commit, update using ant script (so svn process will be much easier).
I am trying using svnkit sdk with their given as follows:
/*
* ====================================================================
* Copyright (c) 2004-2010 TMate Software Ltd. All rights reserved.
*
* This software is licensed as described in the file COPYING, which
* you should have received as part of this distribution. The terms
* are also available at http://svnkit.com/license.html
* If newer versions of this license are posted there, you may use a
* newer version instead, at your option.
* ====================================================================
*/
import java.util.Collection;
import java.util.Iterator;
import java.util.Set;
import org.tmatesoft.svn.core.SVNCommitInfo;
import org.tmatesoft.svn.core.SVNException;
import org.tmatesoft.svn.core.SVNLogEntry;
import org.tmatesoft.svn.core.SVNLogEntryPath;
import org.tmatesoft.svn.core.SVNURL;
import org.tmatesoft.svn.core.auth.ISVNAuthenticationManager;
import org.tmatesoft.svn.core.internal.io.dav.DAVRepositoryFactory;
import org.tmatesoft.svn.core.internal.io.fs.FSRepositoryFactory;
import org.tmatesoft.svn.core.internal.io.svn.SVNRepositoryFactoryImpl;
import org.tmatesoft.svn.core.io.ISVNEditor;
import org.tmatesoft.svn.core.io.SVNRepository;
import org.tmatesoft.svn.core.io.SVNRepositoryFactory;
import org.tmatesoft.svn.core.wc.SVNWCUtil;
/*
* The following example program demonstrates how you can use SVNRepository to
* obtain a history for a range of revisions including (for each revision): all
* changed paths, log message, the author of the commit, the timestamp when the
* commit was made. It is similar to the "svn log" command supported by the
* Subversion client library.
*
* As an example here's a part of one of the program layouts (for the default
* values):
*
* ---------------------------------------------
* revision: 1240
* author: alex
* date: Tue Aug 02 19:52:49 NOVST 2005
* log message: 0.9.0 is now trunk
*
* changed paths:
* A /trunk (from /branches/0.9.0 revision 1239)
* ---------------------------------------------
* revision: 1263
* author: sa
* date: Wed Aug 03 21:19:55 NOVST 2005
* log message: updated examples, javadoc files
*
* changed paths:
* M /trunk/doc/javadoc-files/javadoc.css
* M /trunk/doc/javadoc-files/overview.html
* M /trunk/doc/examples/src/org/tmatesoft/svn/examples/wc/StatusHandler.java
* ...
*
*/
public class History {
/*
* args parameter is used to obtain a repository location URL, a start
* revision number, an end revision number, user's account name & password
* to authenticate him to the server.
*/
public static void main(String[] args) {
/*
* Default values:
*/
String url = "svnUrl";
String name = "username";
String password = "password";
long startRevision = 0;
long endRevision = -1;//HEAD (the latest) revision
/*
* Initializes the library (it must be done before ever using the
* library itself)
*/
setupLibrary();
if (args != null) {
/*
* Obtains a repository location URL
*/
url = (args.length >= 1) ? args[0] : url;
/*
* Obtains the start point of the revisions range
*/
startRevision = (args.length >= 2) ? Long.parseLong(args[1])
: startRevision;
/*
* Obtains the end point of the revisions range
*/
endRevision = (args.length >= 3) ? Long.parseLong(args[2])
: endRevision;
/*
* Obtains an account name (will be used to authenticate the user to
* the server)
*/
name = (args.length >= 4) ? args[3] : name;
/*
* Obtains a password
*/
password = (args.length >= 5) ? args[4] : password;
}
SVNRepository repository = null;
try {
/*
* Creates an instance of SVNRepository to work with the repository.
* All user's requests to the repository are relative to the
* repository location used to create this SVNRepository.
* SVNURL is a wrapper for URL strings that refer to repository locations.
*/
repository = SVNRepositoryFactory.create(SVNURL.parseURIEncoded(url));
} catch (SVNException svne) {
/*
* Perhaps a malformed URL is the cause of this exception.
*/
System.err
.println("error while creating an SVNRepository for the location '"
+ url + "': " + svne.getMessage());
System.exit(1);
}
ISVNAuthenticationManager authManager = SVNWCUtil.createDefaultAuthenticationManager(name, password);
repository.setAuthenticationManager(authManager);
/*
* Gets the latest revision number of the repository
*/
try {
endRevision = repository.getLatestRevision();
} catch (SVNException svne) {
System.err.println("error while fetching the latest repository revision: " + svne.getMessage());
System.exit(1);
}
Collection logEntries = null;
try {
logEntries = repository.log(new String[] {""}, null,
startRevision, endRevision, true, true);
} catch (SVNException svne) {
System.out.println("error while collecting log information for '"
+ url + "': " + svne.getMessage());
System.exit(1);
}
for (Iterator entries = logEntries.iterator(); entries.hasNext();) {
/*
* gets a next SVNLogEntry
*/
SVNLogEntry logEntry = (SVNLogEntry) entries.next();
System.out.println("---------------------------------------------");
/*
* gets the revision number
*/
System.out.println("revision: " + logEntry.getRevision());
/*
* gets the author of the changes made in that revision
*/
System.out.println("author: " + logEntry.getAuthor());
/*
* gets the time moment when the changes were committed
*/
System.out.println("date: " + logEntry.getDate());
/*
* gets the commit log message
*/
System.out.println("log message: " + logEntry.getMessage());
/*
* displaying all paths that were changed in that revision; cahnged
* path information is represented by SVNLogEntryPath.
*/
String logMessage = "log message";
try {
ISVNEditor editor = repository.getCommitEditor( logMessage , null /*locks*/ , true /*keepLocks*/ , null /*mediator*/ );
History.copyDir(editor, "C:/svnCommitCode/src","svnurl/src", logEntry.getRevision());
} catch (SVNException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
if (logEntry.getChangedPaths().size() > 0) {
System.out.println();
System.out.println("changed paths:");
/*
* keys are changed paths
*/
Set changedPathsSet = logEntry.getChangedPaths().keySet();
for (Iterator changedPaths = changedPathsSet.iterator(); changedPaths
.hasNext();) {
/*
* obtains a next SVNLogEntryPath
*/
SVNLogEntryPath entryPath = (SVNLogEntryPath) logEntry
.getChangedPaths().get(changedPaths.next());
/*
* SVNLogEntryPath.getPath returns the changed path itself;
*
* SVNLogEntryPath.getType returns a charecter describing
* how the path was changed ('A' - added, 'D' - deleted or
* 'M' - modified);
*
* If the path was copied from another one (branched) then
* SVNLogEntryPath.getCopyPath &
* SVNLogEntryPath.getCopyRevision tells where it was copied
* from and what revision the origin path was at.
*/
System.out.println(" "
+ entryPath.getType()
+ " "
+ entryPath.getPath()
+ ((entryPath.getCopyPath() != null) ? " (from "
+ entryPath.getCopyPath() + " revision "
+ entryPath.getCopyRevision() + ")" : ""));
}
}
}
}
/*
* Initializes the library to work with a repository via
* different protocols.
*/
private static void setupLibrary() {
/*
* For using over http:// and https://
*/
DAVRepositoryFactory.setup();
/*
* For using over svn:// and svn+xxx://
*/
SVNRepositoryFactoryImpl.setup();
/*
* For using over file:///
*/
FSRepositoryFactory.setup();
}
private static SVNCommitInfo copyDir( ISVNEditor editor , String srcDirPath , String dstDirPath , long revision ) throws SVNException {
editor.openRoot( -1 );
editor.addDir( dstDirPath , srcDirPath , revision );
System.out.println("done--------------------");
//Closes dstDirPath.
editor.closeDir( );
//Closes the root directory.
editor.closeDir( );
return editor.closeEdit( );
}
}
I am able to get some of the outputs as history:
revision: 7
author: username
date: Wed Apr 23 15:47:58 2014
log message: testing
changed paths:
A /testCode/src
But I am getting below error when I am calling SVNCommitInfo copyDir() method:
org.tmatesoft.svn.core.SVNException: svn: E160013: '/testCode/!svn/bc/2/C:/svnCommitCode/src' path not found: 404 Not Found (http://svnUrl)
I am providing both source (my local system directory path) and destination path (svn directory path), what I am doing wrong over here? (Means on svn same 'src' folder exist, I just want to replace with current local directory.)
Can anyone guide me in this context?
This solution worked for me.
First call this method getSVNClientManager to get authenticated,it will return clientManager which will be used to get different kind of svn clients instances to do different activities.
public SVNClientManager getSVNClientManager () throws IOException{
SVNURL url = SVNURL
.parseURIDecoded("<path to the base svn repository>");
SVNRepository repository = SVNRepositoryFactory.create(url, null);
ISVNOptions myOptions = SVNWCUtil.createDefaultOptions(true);
//provide svn username and password
//username = name used to connect to svn
//password = password used to connect to svn
ISVNAuthenticationManager myAuthManager = SVNWCUtil
.createDefaultAuthenticationManager("<username>", "<password>");
repository.setAuthenticationManager(myAuthManager);
//clientManager will be used to get different kind of svn clients instances to do different activities
//like update, commit, view diff etc.
SVNClientManager clientManager = SVNClientManager.newInstance(
myOptions, myAuthManager);
return clientManager ;
}
and then call method commitToSvn()
public void commitToSvn(SVNClientManager clientManager)
throws SVNException {
SVNCommitClient commitClient = clientManager.getCommitClient();
File fileToCheckin = new File("LocalDir/SampleFileFolder/SampleFile1");
boolean recursive = true;
SVNCommitInfo importInfo = commitClient
.doImport(
fileToCheckin ,
SVNURL.parseURIDecoded("<path at which we want to check-in the file>"),
"testing svn kit integration", recursive);
System.out.println(importInfo.getNewRevision());
}
Similarly we can call checkout method exportFromSvn()
public void exportFromSvn(SVNClientManager clientManager) throws SVNException {
SVNUpdateClient updateClient = clientManager.getUpdateClient();
SVNURL url = SVNURL.parseURIDecoded("<svn url to export from>");
//destination path
File dstPath = new File("LocalDirNew");
//the revision number which should be looked upon for the file path
SVNRevision pegRevision = SVNRevision.create(<right svn revision number>);
//the revision number which is required to be exported.
SVNRevision revision = SVNRevision.create(<right svn revision number>);
//if there is any special character for end of line (in the file) then it is required. For our use case, //it can be null, assuming there are no special characters. In this case the OS specific EoF style will //be assumed
String eolStyle = null;
//this would force the operation
boolean force = true;
//Till what extent under a directory, export is required, is determined by depth. INFINITY means the whole subtree of that directory will be exported
SVNDepth recursive = SVNDepth.INFINITY;
updateClient.doExport(url, dstPath, pegRevision, revision, eolStyle, force, recursive );
}
I am trying to run the following code for BasicCrawlController in java but I get some error:
/**
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package edu.uci.ics.crawler4j.examples.basic;
import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;
/**
* #author Yasser Ganjisaffar <lastname at gmail dot com>
*/
public class MyWebCrawler {
public static void main(String[] args) throws Exception {
if (args.length != 2) {
System.out.println("Needed parameters: ");
System.out.println("\t rootFolder (it will contain intermediate crawl data)");
System.out.println("\t numberOfCralwers (number of concurrent threads)");
return;
}
/*
* crawlStorageFolder is a folder where intermediate crawl data is
* stored.
*/
String crawlStorageFolder = args[0];
/*
* numberOfCrawlers shows the number of concurrent threads that should
* be initiated for crawling.
*/
int numberOfCrawlers = Integer.parseInt(args[1]);
CrawlConfig config = new CrawlConfig();
config.setCrawlStorageFolder(crawlStorageFolder);
/*
* Be polite: Make sure that we don't send more than 1 request per
* second (1000 milliseconds between requests).
*/
config.setPolitenessDelay(1000);
/*
* You can set the maximum crawl depth here. The default value is -1 for
* unlimited depth
*/
config.setMaxDepthOfCrawling(2);
/*
* You can set the maximum number of pages to crawl. The default value
* is -1 for unlimited number of pages
*/
config.setMaxPagesToFetch(1000);
/*
* Do you need to set a proxy? If so, you can use:
* config.setProxyHost("proxyserver.example.com");
* config.setProxyPort(8080);
*
* If your proxy also needs authentication:
* config.setProxyUsername(username); config.getProxyPassword(password);
*/
/*
* This config parameter can be used to set your crawl to be resumable
* (meaning that you can resume the crawl from a previously
* interrupted/crashed crawl). Note: if you enable resuming feature and
* want to start a fresh crawl, you need to delete the contents of
* rootFolder manually.
*/
config.setResumableCrawling(false);
/*
* Instantiate the controller for this crawl.
*/
PageFetcher pageFetcher = new PageFetcher(config);
RobotstxtConfig robotstxtConfig = new RobotstxtConfig();
RobotstxtServer robotstxtServer = new RobotstxtServer(robotstxtConfig, pageFetcher);
CrawlController controller = new CrawlController(config, pageFetcher, robotstxtServer);
/*
* For each crawl, you need to add some seed urls. These are the first
* URLs that are fetched and then the crawler starts following links
* which are found in these pages
*/
controller.addSeed("http://www.ics.uci.edu/");
controller.addSeed("http://www.ics.uci.edu/~lopes/");
controller.addSeed("http://www.ics.uci.edu/~welling/");
/*
* Start the crawl. This is a blocking operation, meaning that your code
* will reach the line after this only when crawling is finished.
*/
controller.start(BasicCrawler.class, numberOfCrawlers);
}
}
the error is:
log4j:WARN No appenders could be found for logger (org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager).
log4j:WARN Please initialize the log4j system properly.
Exception in thread "main" java.lang.RuntimeException: Uncompilable source code - Erroneous tree type: <any>
at mywebcrawler.MyWebCrawler.main(MyWebCrawler.java:107)
what is the problem with the code? it is completely copied from thw web site of crawler4j!
You are missing the log4j properties file.
What is BasicCrawler, is that your own class? How did you define it, is it a generic class? Haven't you forgot specifying the generic type?
BasicCrawler is a class that I copied from crawler4j documentation:
import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.parser.HtmlParseData;
import edu.uci.ics.crawler4j.url.WebURL;
import java.util.List;
import java.util.regex.Pattern;
public class BasicCrawler extends WebCrawler {
private final static Pattern FILTERS = Pattern.compile(".*(\\.(css|js|bmp|gif|jpe?g" + "|png|tiff?|mid|mp2|mp3|mp4"
+ "|wav|avi|mov|mpeg|ram|m4v|pdf" + "|rm|smil|wmv|swf|wma|zip|rar|gz))$");
/**
* You should implement this function to specify whether the given url
* should be crawled or not (based on your crawling logic).
*/
#Override
public boolean shouldVisit(WebURL url) {
String href = url.getURL().toLowerCase();
return !FILTERS.matcher(href).matches() && href.startsWith("http://www.aut.ac.ir/");
}
/**
* This function is called when a page is fetched and ready to be processed
* by your program.
*/
#Override
public void visit(Page page) {
int docid = page.getWebURL().getDocid();
String url = page.getWebURL().getURL();
String domain = page.getWebURL().getDomain();
String path = page.getWebURL().getPath();
String subDomain = page.getWebURL().getSubDomain();
String parentUrl = page.getWebURL().getParentUrl();
System.out.println("Docid: " + docid);
System.out.println("URL: " + url);
System.out.println("Domain: '" + domain + "'");
System.out.println("Sub-domain: '" + subDomain + "'");
System.out.println("Path: '" + path + "'");
System.out.println("Parent page: " + parentUrl);
if (page.getParseData() instanceof HtmlParseData) {
HtmlParseData htmlParseData = (HtmlParseData) page.getParseData();
String text = htmlParseData.getText();
String html = htmlParseData.getHtml();
List<WebURL> links = htmlParseData.getOutgoingUrls();
System.out.println("Text length: " + text.length());
System.out.println("Html length: " + html.length());
System.out.println("Number of outgoing links: " + links.size());
}
System.out.println("=============");
}
}