Is there any nice way to print the progress in a Kafka Streams app? I feel that my app is falling behind, and I want a nice way to show the progress of processing the events in my app.
Out of the box, not within the Streams API.
You're more than welcome to import methods that ConsumerGroupCommand.scala uses to get the group lag and calculate / print from there.
Or you can install an external tool like Burrow or Remora, which have REST APIs for accessing lag information.
I wrote the following class to help me print the lag/progress easily:
package util;
import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.ListConsumerGroupOffsetsResult;
import org.apache.kafka.clients.admin.ListOffsetsResult;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;
import java.util.*;
import java.util.concurrent.*;
import java.util.function.Function;
import java.util.stream.Collectors;
@Slf4j
public class LagLogger implements AutoCloseable {
private ScheduledExecutorService scheduledExecutorService = Executors.newScheduledThreadPool(1);
private String topic;
private String consumerGroupName;
private int logDelayInMilliSeconds;
private Properties kafkaStreamsProperties;
private boolean closed;
private AdminClient adminClient;
public LagLogger(String topic, String consumerGroupName, Properties kafkaStreamProperties, int logDelayInMilliSeconds) {
this.topic = topic;
this.kafkaStreamsProperties = kafkaStreamProperties;
this.logDelayInMilliSeconds = logDelayInMilliSeconds;
this.consumerGroupName = consumerGroupName;
adminClient = AdminClient.create(LagLogger.this.kafkaStreamsProperties);
}
public class LagVisualizerTask implements AutoCloseable, Runnable {
public LagVisualizerTask() {
}
public void run() {
if (closed) {
return; // stop logging once this task has been closed
}
ListConsumerGroupOffsetsResult listConsumerGroupOffsetsResult = adminClient.listConsumerGroupOffsets(LagLogger.this.consumerGroupName);
// Current offsets.
Map<TopicPartition, OffsetAndMetadata> topicPartitionOffsetAndMetadataMap = null;
try {
topicPartitionOffsetAndMetadataMap = listConsumerGroupOffsetsResult.partitionsToOffsetAndMetadata().get();
} catch (InterruptedException e) {
e.printStackTrace();
} catch (ExecutionException e) {
e.printStackTrace();
}
// all topic partitions.
Set<TopicPartition> topicPartitions = topicPartitionOffsetAndMetadataMap.keySet();
// list of end offsets for each partitions.
ListOffsetsResult listOffsetsResult = adminClient.listOffsets(topicPartitions.stream()
.collect(Collectors.toMap(Function.identity(), tp -> OffsetSpec.latest())));
StringBuilder stringBuilder = new StringBuilder();
stringBuilder.append(topic+": ");
for (var entry : topicPartitionOffsetAndMetadataMap.entrySet()) {
if (entry.getKey().topic().equals(LagLogger.this.topic)) {
long current_offset = entry.getValue().offset();
long end_offset = 0;
try {
end_offset = listOffsetsResult.partitionResult(entry.getKey()).get().offset();
} catch (InterruptedException e) {
e.printStackTrace();
} catch (ExecutionException e) {
e.printStackTrace();
}
stringBuilder.append(current_offset);
stringBuilder.append(" --> ");
stringBuilder.append(end_offset);
stringBuilder.append(" (" + String.format("%.2f", end_offset == 0 ? 100.0 : ((double) current_offset / end_offset) * 100) + "%)"); // guard against an end offset of 0
stringBuilder.append(" / ");
}
}
log.info(stringBuilder.toString());
}
public void close() {
closed = true;
}
}
public LagVisualizerTask startNewLagVisualizerTask() {
LagVisualizerTask lagVisualizerTask = new LagVisualizerTask();
scheduledExecutorService.scheduleWithFixedDelay(lagVisualizerTask,0, LagLogger.this.logDelayInMilliSeconds, TimeUnit.MILLISECONDS);
return lagVisualizerTask;
}
public void close() {
if (scheduledExecutorService != null) {
scheduledExecutorService.shutdownNow();
scheduledExecutorService = null;
}
if (adminClient != null) {
adminClient.close(); // also release the admin client's network resources
adminClient = null;
}
}
}
Which can be used as follows:
LagLogger lagVisualizer = new LagLogger(INPUT_TOPIC_NAME, APPLICATION_ID, configuration.getKafkaStreamsProperties(), DELAY_BETWEEN_LOGS);
lagVisualizer.startNewLagVisualizerTask();
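Since LagLogger implements AutoCloseable, it can also be scoped with try-with-resources so the scheduler (and admin client) are cleaned up when the app shuts down. A minimal sketch; kafkaStreams stands in for whatever KafkaStreams instance the application already builds and is an assumption on my part:
try (LagLogger lagVisualizer = new LagLogger(INPUT_TOPIC_NAME, APPLICATION_ID,
        configuration.getKafkaStreamsProperties(), DELAY_BETWEEN_LOGS)) {
    lagVisualizer.startNewLagVisualizerTask();
    kafkaStreams.start(); // assumed: the Streams instance whose consumer group is being monitored
    // run the application; when this block exits, the logger's scheduler is shut down
}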
I have tried to reproduce the problem I am facing. My problem statement: multiple files are present in a folder, and I need to do a word count for each file and print the result. Each file should be processed in parallel (of course, there is a limit to the parallelism). I have written the following code to accomplish it, and it runs fine. The cluster has a MapR installation of Spark, with spark.scheduler.mode = FIFO.
Q1: Is there a better way to accomplish the task mentioned above?
Q2: I have observed that the application does not stop even when it has completed the word counting of the available files. I am unable to figure out how to deal with that.
package groupId.artifactId;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
public class Executor {
/**
* @param args
*/
public static void main(String[] args) {
final int threadPoolSize = 5;
SparkConf sparkConf = new SparkConf().setMaster("yarn-client").setAppName("Tracker").set("spark.ui.port","0");
JavaSparkContext jsc = new JavaSparkContext(sparkConf);
ExecutorService executor = Executors.newFixedThreadPool(threadPoolSize);
List<Future> listOfFuture = new ArrayList<Future>();
for (int i = 0; i < 20; i++) {
if (listOfFuture.size() < threadPoolSize) {
FlexiWordCount flexiWordCount = new FlexiWordCount(jsc, i);
Future future = executor.submit(flexiWordCount);
listOfFuture.add(future);
} else {
boolean allFutureDone = false;
while (!allFutureDone) {
allFutureDone = checkForAllFuture(listOfFuture);
System.out.println("Threads not completed yet!");
try {
Thread.sleep(2000);//waiting for 2 sec, before next check
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
printFutureResult(listOfFuture);
System.out.println("printing of future done");
listOfFuture.clear();
System.out.println("future list got cleared");
}
}
try {
executor.awaitTermination(5, TimeUnit.MINUTES);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
private static void printFutureResult(List<Future> listOfFuture) {
Iterator<Future> iterateFuture = listOfFuture.iterator();
while (iterateFuture.hasNext()) {
Future tempFuture = iterateFuture.next();
try {
System.out.println("Future result " + tempFuture.get());
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (ExecutionException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
private static boolean checkForAllFuture(List<Future> listOfFuture) {
boolean status = true;
Iterator<Future> iterateFuture = listOfFuture.iterator();
while (iterateFuture.hasNext()) {
Future tempFuture = iterateFuture.next();
if (!tempFuture.isDone()) {
status = false;
break;
}
}
return status;
}
}
package groupId.artifactId;
import java.io.Serializable;
import java.util.Arrays;
import java.util.concurrent.Callable;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;
public class FlexiWordCount implements Callable<Object>,Serializable {
private static final long serialVersionUID = 1L;
private JavaSparkContext jsc;
private int fileId;
public FlexiWordCount(JavaSparkContext jsc, int fileId) {
super();
this.jsc = jsc;
this.fileId = fileId;
}
private static class Reduction implements Function2<Integer, Integer, Integer>{
@Override
public Integer call(Integer i1, Integer i2) {
return i1 + i2;
}
}
private static class KVPair implements PairFunction<String, String, Integer>{
@Override
public Tuple2<String, Integer> call(String paramT)
throws Exception {
return new Tuple2<String, Integer>(paramT, 1);
}
}
private static class Flatter implements FlatMapFunction<String, String>{
@Override
public Iterable<String> call(String s) {
return Arrays.asList(s.split(" "));
}
}
@Override
public Object call() throws Exception {
JavaRDD<String> jrd = jsc.textFile("/root/folder/experiment979/" + fileId +".txt");
System.out.println("inside call() for fileId = " + fileId);
JavaRDD<String> words = jrd.flatMap(new Flatter());
JavaPairRDD<String, Integer> ones = words.mapToPair(new KVPair());
JavaPairRDD<String, Integer> counts = ones.reduceByKey(new Reduction());
return counts.collect();
}
}
Why is the program not closing automatically?
Ans: You have not closed the SparkContext. Try changing the main method to this:
public static void main(String[] args) {
final int threadPoolSize = 5;
SparkConf sparkConf = new SparkConf().setMaster("yarn-client").setAppName("Tracker").set("spark.ui.port","0");
JavaSparkContext jsc = new JavaSparkContext(sparkConf);
ExecutorService executor = Executors.newFixedThreadPool(threadPoolSize);
List<Future> listOfFuture = new ArrayList<Future>();
for (int i = 0; i < 20; i++) {
if (listOfFuture.size() < threadPoolSize) {
FlexiWordCount flexiWordCount = new FlexiWordCount(jsc, i);
Future future = executor.submit(flexiWordCount);
listOfFuture.add(future);
} else {
boolean allFutureDone = false;
while (!allFutureDone) {
allFutureDone = checkForAllFuture(listOfFuture);
System.out.println("Threads not completed yet!");
try {
Thread.sleep(2000);//waiting for 2 sec, before next check
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
printFutureResult(listOfFuture);
System.out.println("printing of future done");
listOfFuture.clear();
System.out.println("future list got cleared");
}
}
try {
executor.awaitTermination(5, TimeUnit.MINUTES);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
jsc.stop();
}
Is there a better way?
Ans: Yes, you should pass the directory of the files to the SparkContext and use .textFile over the directory; in this case Spark will parallelize the reads from the directory across the executors. If you create threads yourself and then reuse the same Spark context to re-submit a job for each file, you add the extra overhead of submitting an application to the YARN queue.
I think the fastest approach is to pass the entire directory directly, create an RDD out of it, and then let Spark launch parallel tasks to process all the files on different executors. You can experiment with the .repartition() method on the RDD, as it will launch that many tasks to run in parallel.
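A minimal sketch of that approach (the directory path is the one used above; the class name is illustrative, and it follows the same Spark 1.x Java API style as the question). Note that in this form the counts are aggregated over the whole directory rather than kept per file:
import java.util.Arrays;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;
public class DirectoryWordCount {
    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setMaster("yarn-client").setAppName("DirectoryWordCount");
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);
        // One RDD over the whole directory: Spark creates a partition per input split
        // and schedules the read + word-count tasks across the executors itself.
        JavaRDD<String> lines = jsc.textFile("/root/folder/experiment979/");
        JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
            public Iterable<String> call(String s) { return Arrays.asList(s.split(" ")); }
        });
        JavaPairRDD<String, Integer> ones = words.mapToPair(new PairFunction<String, String, Integer>() {
            public Tuple2<String, Integer> call(String w) { return new Tuple2<String, Integer>(w, 1); }
        });
        JavaPairRDD<String, Integer> counts = ones.reduceByKey(new Function2<Integer, Integer, Integer>() {
            public Integer call(Integer a, Integer b) { return a + b; }
        });
        List<Tuple2<String, Integer>> result = counts.collect();
        for (Tuple2<String, Integer> t : result) {
            System.out.println(t._1() + " : " + t._2());
        }
        jsc.stop(); // closing the context lets the application terminate
    }
}
If per-file counts are required, as in the original problem statement, jsc.wholeTextFiles(...) returns (fileName, fileContent) pairs, which would let the file name be carried along as part of the key.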
I am currently developing an application that requires random access to many (60k-100k) relatively large files.
Since opening and closing streams is a rather costly operation, I'd prefer to keep the FileChannels for the largest files open until they are no longer needed.
The problem is that this kind of usage is not covered by Java 7's try-with-resources statement, so I'm required to close all the FileChannels manually.
But that is becoming increasingly complicated, since the same files can be accessed concurrently throughout the software.
I have implemented a ChannelPool class that keeps track of opened FileChannel instances for each registered Path. At certain intervals, the ChannelPool can then be told to close those channels whose Path is only weakly referenced by the pool itself.
I would prefer an event-listener approach, but I'd also rather not have to listen to the GC.
The FileChannelPool from Apache Commons doesn't address my problem, because channels still need to be closed manually.
Is there a more elegant solution to this problem? And if not, how can my implementation be improved?
import java.io.IOException;
import java.lang.ref.WeakReference;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Map;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.ConcurrentHashMap;
public class ChannelPool {
private static final ChannelPool defaultInstance = new ChannelPool();
private final ConcurrentHashMap<String, ChannelRef> channels;
private final Timer timer;
private ChannelPool(){
channels = new ConcurrentHashMap<>();
timer = new Timer();
}
public static ChannelPool getDefault(){
return defaultInstance;
}
public void initCleanUp(){
// wait 2 seconds then repeat clean-up every 10 seconds.
timer.schedule(new CleanUpTask(this), 2000, 10000);
}
public void shutDown(){
// must be called manually.
timer.cancel();
closeAll();
}
public FileChannel getChannel(Path path){
ChannelRef cref = channels.get(path.toString());
System.out.println("getChannel called " + channels.size());
if (cref == null){
cref = ChannelRef.newInstance(path);
if (cref == null){
// failed to open channel
return null;
}
ChannelRef oldRef = channels.putIfAbsent(path.toString(), cref);
if (oldRef != null){
try{
// close new channel and let GC dispose of it
cref.channel().close();
System.out.println("redundant channel closed");
}
catch (IOException ex) {}
cref = oldRef;
}
}
return cref.channel();
}
private void remove(String str) {
ChannelRef ref = channels.remove(str);
if (ref != null){
try {
ref.channel().close();
System.out.println("old channel closed");
}
catch (IOException ex) {}
}
}
private void closeAll() {
for (Map.Entry<String, ChannelRef> e : channels.entrySet()){
remove(e.getKey());
}
}
private void maintain() {
// close channels for dereferenced paths
for (Map.Entry<String, ChannelRef> e : channels.entrySet()){
ChannelRef ref = e.getValue();
if (ref != null){
Path p = ref.pathRef().get();
if (p == null){
// gc'd
remove(e.getKey());
}
}
}
}
private static class ChannelRef{
private FileChannel channel;
private WeakReference<Path> ref;
private ChannelRef(FileChannel channel, WeakReference<Path> ref) {
this.channel = channel;
this.ref = ref;
}
private static ChannelRef newInstance(Path path) {
FileChannel fc;
try {
fc = FileChannel.open(path, StandardOpenOption.READ);
}
catch (IOException ex) {
return null;
}
return new ChannelRef(fc, new WeakReference<>(path));
}
private FileChannel channel() {
return channel;
}
private WeakReference<Path> pathRef() {
return ref;
}
}
private static class CleanUpTask extends TimerTask {
private ChannelPool pool;
private CleanUpTask(ChannelPool pool){
super();
this.pool = pool;
}
@Override
public void run() {
pool.maintain();
pool.printState();
}
}
private void printState(){
System.out.println("Clean up performed. " + channels.size() + " channels remain. -- " + System.currentTimeMillis());
for (Map.Entry<String, ChannelRef> e : channels.entrySet()){
ChannelRef cref = e.getValue();
String out = "open: " + cref.channel().isOpen() + " - " + cref.channel().toString();
System.out.println(out);
}
}
}
EDIT:
Thanks to fge's answer I have now exactly what I needed. Thanks!
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import com.google.common.cache.RemovalListener;
import com.google.common.cache.RemovalNotification;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
public class Channels {
private static final LoadingCache<Path, FileChannel> channelCache =
CacheBuilder.newBuilder()
.weakKeys()
.removalListener(
new RemovalListener<Path, FileChannel>(){
@Override
public void onRemoval(RemovalNotification<Path, FileChannel> removal) {
FileChannel fc = removal.getValue();
try {
fc.close();
}
catch (IOException ex) {}
}
}
)
.build(
new CacheLoader<Path, FileChannel>() {
@Override
public FileChannel load(Path path) throws IOException {
return FileChannel.open(path, StandardOpenOption.READ);
}
}
);
public static FileChannel get(Path path){
try {
return channelCache.get(path);
}
catch (ExecutionException ex){}
return null;
}
}
Have a look here:
http://code.google.com/p/guava-libraries/wiki/CachesExplained
You can use a LoadingCache with a removal listener, which will close the channel for you when it is evicted, and you can specify expiry after access or after write.
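For example, a sketch of the expiry knob the answer mentions, applied to the same loader/listener pattern as the EDIT above (the 10-minute window is an arbitrary assumption):
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import com.google.common.cache.RemovalListener;
import com.google.common.cache.RemovalNotification;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.TimeUnit;
public class ExpiringChannels {
    static final LoadingCache<Path, FileChannel> channelCache = CacheBuilder.newBuilder()
            // evict channels that have not been requested for 10 minutes;
            // the removal listener then closes them
            .expireAfterAccess(10, TimeUnit.MINUTES)
            .removalListener(new RemovalListener<Path, FileChannel>() {
                @Override
                public void onRemoval(RemovalNotification<Path, FileChannel> removal) {
                    try {
                        removal.getValue().close();
                    } catch (IOException ex) {
                        // the channel is being discarded anyway
                    }
                }
            })
            .build(new CacheLoader<Path, FileChannel>() {
                @Override
                public FileChannel load(Path path) throws IOException {
                    return FileChannel.open(path, StandardOpenOption.READ);
                }
            });
}
Guava performs this eviction lazily during normal cache operations, so if the cache can sit idle for long stretches, calling channelCache.cleanUp() from a periodic task (much like the Timer in the original pool) makes the expired channels get closed promptly.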
I'm trying to learn how to pool connections so I can get better throughput by not having to open/close channels to a server, but I can't seem to get it to work. I had a slightly modified version of my code that worked when I forked a thread and made each thread run a loop to dump data, but now I'm trying to use ThreadPoolExecutor to send jobs via a single thread and then spawn 2 threads to handle the work. My experiment should hopefully show around 2 channels open at any given time (or as many threads as I have), but instead, when I change my code, I get IllegalStateException: Pool not open.
I'm really confused about whether my design of the pool is wrong or my understanding of ThreadPoolExecutor is flawed. My understanding of ThreadPoolExecutor was that it keeps the threads alive as long as there is work to be done and doesn't keep killing/respawning them on each iteration.
Here's the code (you can ignore all the RabbitMQ stuff; the gist of it is that you need to open a connection to a server, then open a channel on it. I am trying to open one connection to the server and then share a pool of channels). My idea was to create an instance of the ObjectPool class and then pass it to a Runnable which borrows channels as needed.
Code:
import org.apache.commons.pool.BasePoolableObjectFactory;
import org.apache.commons.pool.ObjectPool;
import org.apache.commons.pool.PoolableObjectFactory;
import org.apache.commons.pool.impl.GenericObjectPool;
import java.io.IOException;
import java.util.NoSuchElementException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.MessageProperties;
public class PoolExample {
private static ExecutorService executor_worker;
static {
final int numberOfThreads_ThreadPoolExecutor = 2;
executor_worker =
new ThreadPoolExecutor(numberOfThreads_ThreadPoolExecutor, numberOfThreads_ThreadPoolExecutor, 1000, TimeUnit.SECONDS,
new LinkedBlockingDeque<Runnable>());
}
public static void main(String[] args) throws Exception {
System.out.println("starting..");
PoolableObjectFactory<MyPooledObject> factory = new MyPoolableObjectFactory();
ObjectPool<MyPooledObject> pool = new GenericObjectPool<MyPooledObject>(factory);
for (int x = 0; x<500000000; x++) {
executor_worker.submit(new Thread(new MyRunnable(x, pool)));
}
}
}
class MyPooledObject {
//Connection connection;
Channel channel;
public MyPooledObject() throws IOException {
System.out.println("hello world");
ConnectionFactory factory = new ConnectionFactory();
factory.setHost("localhost");
Connection connection = factory.newConnection();
channel = connection.createChannel();
}
public Channel sing() throws IOException {
//System.out.println("mary had a little lamb");
return channel;
}
public void destroy() {
System.out.println("goodbye cruel world");
}
}
class MyPoolableObjectFactory extends BasePoolableObjectFactory<MyPooledObject> {
@Override
public MyPooledObject makeObject() throws Exception {
return new MyPooledObject();
}
@Override
public void destroyObject(MyPooledObject obj) throws Exception {
obj.destroy();
}
}
class MyRunnable implements Runnable{
protected int x = 0;
protected ObjectPool<MyPooledObject> pool = null;
public MyRunnable(int x, ObjectPool<MyPooledObject> pool) {
// TODO Auto-generated constructor stub
this.x = x;
this.pool = pool;
}
public void run(){
try {
MyPooledObject obj;
obj = pool.borrowObject();
Channel channel = obj.sing();
String message = Integer.toString(x);
channel.basicPublish( "", "task_queue",
MessageProperties.PERSISTENT_TEXT_PLAIN,
message.getBytes());
pool.returnObject(obj);
} catch (NoSuchElementException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IllegalStateException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
} finally {
try {
pool.close();
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
}
Any idea what's wrong with my design? Or is my entire approach to pooling objects flawed?
UPDATE 1: As requested, here is the stack trace (I get many of these, continuously):
stacktrace:
java.lang.IllegalStateException: Pool not open
at org.apache.commons.pool.BaseObjectPool.assertOpen(BaseObjectPool.java:140)
at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1079)
at MyRunnable.run(PoolExample.java:85)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:680)
If it helps, line 85 in my code (where this error is triggered) is: obj = pool.borrowObject();
UPDATE 2: Very odd. I get the error, but it does write 2 items to the queue. I don't want to send anyone on a wild goose chase, but I'm thinking that means it can successfully borrow objects when it is creating them, but not when they are returned to the pool?
UPDATE 3: I redesigned the code so it doesn't go through the middle step I had above. I no longer get the error, but it basically does nothing. I launch 10 threads and expect 10 channels, but I only get 1 channel for a few seconds, and then it turns off as well.
code:
import org.apache.commons.pool.BasePoolableObjectFactory;
import org.apache.commons.pool.ObjectPool;
import org.apache.commons.pool.PoolableObjectFactory;
import org.apache.commons.pool.impl.GenericObjectPool;
import java.io.IOException;
import java.util.NoSuchElementException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.MessageProperties;
public class PoolExample {
private static ExecutorService executor_worker;
static {
final int numberOfThreads_ThreadPoolExecutor = 20;
executor_worker =
new ThreadPoolExecutor(numberOfThreads_ThreadPoolExecutor, numberOfThreads_ThreadPoolExecutor, 1000, TimeUnit.SECONDS,
new LinkedBlockingDeque<Runnable>());
}
public static void main(String[] args) throws Exception {
System.out.println("starting..");
ObjectPool<Channel> pool =
new GenericObjectPool<Channel>(
new ConnectionPoolableObjectFactory(), 5);
for (int x = 0; x<500000000; x++) {
executor_worker.submit(new MyRunnable(x, pool));
}
executor_worker.shutdown();
pool.close();
}
}
class ConnectionPoolableObjectFactory extends BasePoolableObjectFactory<Channel> {
Channel channel;
public ConnectionPoolableObjectFactory() throws IOException {
System.out.println("hello world");
ConnectionFactory factory = new ConnectionFactory();
factory.setHost("localhost");
Connection connection = factory.newConnection();
channel = connection.createChannel();
}
@Override
public Channel makeObject() throws Exception {
return channel;
}
@Override
public boolean validateObject(Channel channel) {
return channel.isOpen();
}
@Override
public void destroyObject(Channel channel) throws Exception {
channel.close();
}
@Override
public void passivateObject(Channel channel) throws Exception {
//System.out.println("sent back to queue");
}
}
class MyRunnable implements Runnable{
protected int x = 0;
protected ObjectPool<Channel> pool;
public MyRunnable(int x, ObjectPool<Channel> pool) {
// TODO Auto-generated constructor stub
this.x = x;
this.pool = pool;
}
public void run(){
try {
Channel channel = pool.borrowObject();
String message = Integer.toString(x);
channel.basicPublish( "", "task_queue",
MessageProperties.PERSISTENT_TEXT_PLAIN,
message.getBytes());
pool.returnObject(channel);
} catch (NoSuchElementException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IllegalStateException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
I have made a program that continuously monitors a log file, but I don't know how to monitor multiple log files. This is what I did to monitor a single file. What changes should I make to the following code so that it also monitors multiple files?
package com.read;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.util.Date;
import java.util.Timer;
import java.util.TimerTask;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class FileWatcherTest {
public static void main(String args[]) {
final File fileName = new File("D:/logs/myFile.log");
// monitor a single file
TimerTask fileWatcherTask = new FileWatcher(fileName) {
long addFileLen = fileName.length();
FileChannel channel;
FileLock lock;
String a = "";
String b = "";
@Override
protected void onChange(File file) {
RandomAccessFile access = null;
try {
access = new RandomAccessFile(file, "rw");
channel = access.getChannel();
lock = channel.lock();
if (file.length() < addFileLen) {
access.seek(file.length());
} else {
access.seek(addFileLen);
}
} catch (Exception e) {
e.printStackTrace();
}
String line = null;
try {
while ((line = access.readLine()) != null) {
System.out.println(line);
}
addFileLen = file.length();
} catch (IOException ex) {
Logger.getLogger(FileWatcherTest.class.getName()).log(
Level.SEVERE, null, ex);
}
try {
lock.release();
} catch (IOException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
} // Close the file
try {
channel.close();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
};
Timer timer = new Timer();
// repeat the check every second
timer.schedule(fileWatcherTask, new Date(), 1000);
}
}
package com.read;
import java.util.*;
import java.io.*;
public abstract class FileWatcher extends TimerTask {
private long timeStamp;
private File file;
static String s;
public FileWatcher(File file) {
this.file = file;
this.timeStamp = file.lastModified();
}
public final void run() {
long timeStamp = file.lastModified();
if (this.timeStamp != timeStamp) {
this.timeStamp = timeStamp;
onChange(file);
}
}
protected abstract void onChange(File file);
}
You should use threads. Here's a good tutorial:
http://docs.oracle.com/javase/tutorial/essential/concurrency/
You would do something like:
public class FileWatcherTest {
public static void main(String args[]) {
(new Thread(new FileWatcherRunnable("first.log"))).start();
(new Thread(new FileWatcherRunnable("second.log"))).start();
}
private static class FileWatcherRunnable implements Runnable {
private String logFilePath;
// you should inject the file path of the log file to watch
public FileWatcherRunnable(String logFilePath) {
this.logFilePath = logFilePath;
}
public void run() {
// your code from main goes in here
}
}
}
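For illustration, a sketch of one way the run() body could be filled in. It polls the file itself instead of using Timer, since the Runnable already has its own thread; the tail-reading logic from onChange() above is reduced to a placeholder here, and java.io.File would need to be imported:
public void run() {
    File file = new File(logFilePath);
    long lastModified = file.lastModified();
    try {
        while (!Thread.currentThread().isInterrupted()) {
            long current = file.lastModified();
            if (current != lastModified) {
                lastModified = current;
                // here: read the newly appended lines, as in onChange() in the question
                System.out.println(logFilePath + " changed");
            }
            Thread.sleep(1000); // poll once per second, like the Timer in the question
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // stop watching when the thread is interrupted
    }
}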
I am programming a link collector from a specified number of pages. To make it more efficient I am using a ThreadPool with a fixed size. Because I am really a newbie in the multithreading area, I have problems fixing some issues. My idea is that every thread does the same thing: connect to a page and collect every URL. After that, the URLs are added to the queue for the next thread.
But this doesn't work. At first the program analyzes the base URL and adds the URLs from it. But initially I wanted to do it only with LinksToVisit.add(baseurl) and run it with the thread pool; however, it always polls the queue and the threads add nothing new, so the top of the queue is null, and I don't know why.
I tried to do it with ArrayBlockingQueue, but with no success. Working around it by analyzing the base URL first is not a good solution, because when the base URL has, for example, only one link, the collector doesn't follow it. So I think I am going about it the wrong way or missing something important. As the HTML parser I am using Jsoup. Thanks for answers.
Source (unnecessary methods removed):
package collector;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.text.DecimalFormat;
import java.util.Iterator;
import java.util.Map;
import java.util.Scanner;
import java.util.Map.Entry;
import java.util.concurrent.*;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class Collector {
private String baseurl;
private int links;
private int cvlinks;
private double time;
private int chcount;
private static final int NTHREADS = Runtime.getRuntime().availableProcessors()*2;
private ConcurrentLinkedQueue<String> LinksToVisit = new ConcurrentLinkedQueue<String>();
private ConcurrentSkipListMap<String, Double> SortedCharMap = new ConcurrentSkipListMap<String, Double>();
private ConcurrentHashMap<String, Double> CharMap = new ConcurrentHashMap<String, Double>();
public Collector(String url, int links) {
this.baseurl = url;
this.links = links;
this.cvlinks = 0;
this.chcount = 0;
try {
Document html = Jsoup.connect(url).get();
if(cvlinks != links){
Elements collectedLinks = html.select("a[href]");
for(Element link:collectedLinks){
if(cvlinks == links) break;
else{
String current = link.attr("abs:href");
if(!current.equals(url) && current.startsWith(baseurl)&& !current.contains("#")){
LinksToVisit.add(current);
cvlinks++;
}
}
}
}
AnalyzeDocument(html, url);
} catch (IOException e) {
e.printStackTrace();
}
CollectFromWeb();
}
private void AnalyzeDocument(Document doc,String url){
String text = doc.body().text().toLowerCase().replaceAll("[^a-z]", "").trim();
chcount += text.length();
String chars[] = text.split("");
CharCount(chars);
}
private void CharCount(String[] chars) {
for(int i = 1; i < chars.length; i++) {
if(!CharMap.containsKey(chars[i]))
CharMap.put(chars[i],1.0);
else
CharMap.put(chars[i], CharMap.get(chars[i]).doubleValue()+1);
}
}
private void CollectFromWeb(){
long startTime = System.nanoTime();
ExecutorService executor = Executors.newFixedThreadPool(NTHREADS);
CollectorThread[] workers = new CollectorThread[this.links];
for (int i = 0; i < this.links; i++) {
if(!LinksToVisit.isEmpty()){
int j = i+1;
System.out.println("Collecting from "+LinksToVisit.peek()+" ["+j+"/"+links+"]");
//Runnable worker = new CollectorThread(LinksToVisit.poll());
workers[i] = new CollectorThread(LinksToVisit.poll());
executor.execute(workers[i]);
}
else break;
}
executor.shutdown();
while (!executor.isTerminated()) {}
SortedCharMap.putAll(CharMap);
this.time =(System.nanoTime() - startTime)*10E-10;
}
class CollectorThread implements Runnable{
private Document html;
private String url;
public CollectorThread(String url){
this.url = url;
try {
this.html = Jsoup.connect(url).get();
} catch (IOException e) {
e.printStackTrace();
}
}
@Override
public void run() {
if(cvlinks != links){
Elements collectedLinks = html.select("a[href]");
for(Element link:collectedLinks){
if(cvlinks == links) break;
else{
String current = link.attr("abs:href");
if(!current.equals(url) && current.startsWith(baseurl)&& !current.contains("#")){
LinksToVisit.add(current);
cvlinks++;
}
}
}
}
AnalyzeDocument(html, url);
}
}
}
Instead of using the LinksToVisit queue, just call executor.execute(new CollectorThread(current)) directly from CollectorThread.run(). The ExecutorService has its own internal queue of tasks which it will run as threads become available.
The other problem here is that calling shutdown() after adding the first set of URLs to the queue will prevent new tasks from being added to the executor. You can fix this by instead making the executor shut down when it has emptied its queue:
class Queue extends ThreadPoolExecutor {
Queue(int nThreads) {
super(nThreads, nThreads, 0L, TimeUnit.MILLISECONDS,
new LinkedBlockingQueue<Runnable>());
}
protected void afterExecute(Runnable r, Throwable t) {
if(getQueue().isEmpty()) {
shutdown();
}
}
}
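A sketch of how CollectorThread.run() could hand newly found links straight back to the executor, as suggested above. It assumes the ExecutorService (for example the auto-shutdown Queue just shown) is stored in a field of Collector so the inner class can reach it; the rest mirrors the original run():
@Override
public void run() {
    if (cvlinks != links) {
        Elements collectedLinks = html.select("a[href]");
        for (Element link : collectedLinks) {
            if (cvlinks == links) break;
            String current = link.attr("abs:href");
            if (!current.equals(url) && current.startsWith(baseurl) && !current.contains("#")) {
                cvlinks++;
                // no shared queue: submit the newly found page directly; the executor
                // queues it internally and runs it when a worker thread is free
                executor.execute(new CollectorThread(current));
            }
        }
    }
    AnalyzeDocument(html, url);
}
Since several tasks now increment cvlinks concurrently, it would also be worth making it an AtomicInteger (or otherwise synchronizing the updates) so the page limit is respected reliably.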