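This is my code:
import java.io.File;
import org.quartz.*;
import org.quartz.impl.StdSchedulerFactory;

public class Test {
    public static void main(String[] args) throws Exception {
        String logPath = "D:\\mywork\\OMS\\Tymon\\testlog\\testlog.log";
        File file = new File(logPath);
        SchedulerFactory schedFact = new StdSchedulerFactory();
        Scheduler sched = schedFact.getScheduler();
        // Quartz 1.x API: job "a" in group "b", trigger "c" in group "d"
        JobDetail jobDetail = new JobDetail("a", "b", TestJob.class);
        CronTrigger trigger = new CronTrigger("c", "d");
        trigger.setCronExpression("0/23 * * * * ?"); // fires at seconds 0, 23 and 46 of every minute
        sched.scheduleJob(jobDetail, trigger);
        sched.start(); // triggers only fire once the scheduler is started
    }
}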
When the job is running, the file "D:\mywork\OMS\Tymon\testlog\testlog.log" can't be renamed or deleted.
It seems the file handle is always held.
How can I fix it?
Please help.
Why do you create the File file = new File(logPath) object?
It seems you never use it anywhere else in your logic.
You opened the file:
File file = new File(logPath);
but where did you close it?
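Note that java.io.File is only a path; constructing one does not open the file. OS handles are held by streams, readers, and channels until they are closed, so the thing to check is wherever TestJob (not shown) or a logging framework writes to testlog.log. A minimal sketch, assuming the job appends to the log itself, that releases the handle on every run with try-with-resources (the class and method names are hypothetical):

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class LogAppendSketch {
    // Hypothetical helper: appends one line and closes the writer before returning,
    // so no handle on the log file survives between job runs.
    static void appendLine(Path log, String line) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(log, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            w.write(line);
            w.newLine();
        } // w.close() runs here, even if write() throws
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("testlog", ".log");
        appendLine(log, "job ran");
        // The writer is already closed, so the file can be deleted (or renamed) right away.
        System.out.println(Files.deleteIfExists(log));
    }
}
```

A job's execute(...) could call a helper like this each time it fires, instead of keeping a writer open in a field for the lifetime of the scheduler.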
I am trying to load a group of files, run some checks on them, and later save them in HDFS. I haven't found a good way to create and save these Sequence files, though. Here is my loader main function:
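import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.input.PortableDataStream;
import scala.Tuple2;

SparkConf sparkConf = new SparkConf().setAppName("writingHDFS")
        .set("spark.streaming.stopGracefullyOnShutdown", "true");
JavaSparkContext jsc = new JavaSparkContext(sparkConf);
//JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new Duration(5*1000));
JavaPairRDD<String, PortableDataStream> imageByteRDD = jsc.binaryFiles("file:///home/cloudera/Pictures/cat");
JavaPairRDD<String, String> imageRDD = jsc.wholeTextFiles("file:///home/cloudera/Pictures/");
// Key = file path, value = file content, both wrapped as Hadoop Text
imageRDD.mapToPair(new PairFunction<Tuple2<String, String>, Text, Text>() {
    public Tuple2<Text, Text> call(Tuple2<String, String> arg0) throws Exception {
        return new Tuple2<Text, Text>(new Text(arg0._1), new Text(arg0._2));
    }
}).saveAsNewAPIHadoopFile("hdfs://localhost:8020/user/hdfs/sparkling/try.seq", Text.class, Text.class, SequenceFileOutputFormat.class);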
It simply loads some images as text files, uses each file name as the key of the PairRDD, and uses the native saveAsNewAPIHadoopFile.
Now I would like to save the files one by one in an rdd.foreach or rdd.foreachPartition, but I cannot find a proper method:
This Stack Overflow answer creates a Job for the occasion. It seems to work, but it needs the files as input paths, while I already have an RDD made of them.
A couple of solutions I found create a directory for each file (OutputStream out = fs.create(new Path(dst));), which wouldn't be much of a problem if it weren't for the fact that I get an exception saying Mkdirs didn't work.
EDIT: I may have found a way, but I get a "Task not serializable" exception:
JavaPairRDD<String, PortableDataStream> imageByteRDD = jsc.binaryFiles("file:///home/cloudera/Pictures/cat");
imageByteRDD.foreach(new VoidFunction<Tuple2<String, PortableDataStream>>() {
    public void call(Tuple2<String, PortableDataStream> fileTuple) throws Exception {
        Text key = new Text(fileTuple._1());
        BytesWritable value = new BytesWritable(fileTuple._2().toArray());
        // serializableConfiguration is defined outside this snippet (not shown)
        SequenceFile.Writer writer = SequenceFile.createWriter(serializableConfiguration.getConf(),
                SequenceFile.Writer.file(new Path("/user/hdfs/sparkling/" + key)),
                SequenceFile.Writer.compression(SequenceFile.CompressionType.RECORD, new BZip2Codec()),
                SequenceFile.Writer.keyClass(Text.class), SequenceFile.Writer.valueClass(BytesWritable.class));
        key = new Text("MiaoMiao!");
        writer.append(key, value);
        writer.close(); // flush and release the HDFS file
    }
});
I have tried wrapping the entire function in a Serializable class, but no luck. Help?
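Background on the "Task not serializable" error: Spark ships the function object to the executors using Java serialization, and an anonymous class created inside an instance method keeps a hidden reference to the enclosing instance when it uses that instance's fields. Serializing the function then drags the (non-serializable) outer object along. A plain-JDK sketch, with no Spark involved and hypothetical class names, that reproduces the difference between an anonymous class and a static nested class:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ClosureCaptureDemo {
    // Stand-in for Spark's function interfaces, which extend Serializable
    interface SerFunction extends Serializable {
        String call(String path);
    }

    // Non-serializable outer class, like a driver class holding a field
    static class Driver {
        String destDir = "/user/hdfs/sparkling/"; // hypothetical field

        // The anonymous class reads an outer field, so it holds a reference to this Driver
        SerFunction anonymous() {
            return new SerFunction() {
                public String call(String path) { return destDir + path; }
            };
        }
    }

    // Static nested class: copies what it needs, no hidden outer reference
    static class PrefixFunction implements SerFunction {
        private final String destDir;
        PrefixFunction(String destDir) { this.destDir = destDir; }
        public String call(String path) { return destDir + path; }
    }

    // Returns true if plain Java serialization of the object succeeds
    static boolean serializes(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Driver d = new Driver();
        System.out.println(serializes(d.anonymous()));                 // false
        System.out.println(serializes(new PrefixFunction(d.destDir))); // true
    }
}
```

The same idea applies to the Spark code: pass a static nested (or top-level) class that only holds serializable fields, and create non-serializable objects such as a Hadoop Configuration inside call() on the executor.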
The way I did it was this (pseudocode; I'll try to edit this answer as soon as I get to my office):
Configuration conf = ConfigurationSingletonClass.getConfiguration();
etcetera, etcetera...
EDIT: I got to my office; here is the complete segment of code. The configuration is created inside rdd.foreachPartition (foreach was a little too much). The loop over the iterator does the file writing itself, in sequence file format.
JavaPairRDD<String, PortableDataStream> imageByteRDD = jsc.binaryFiles(SOURCE_PATH);
imageByteRDD.foreachPartition(new VoidFunction<Iterator<Tuple2<String, PortableDataStream>>>() {
    public void call(Iterator<Tuple2<String, PortableDataStream>> arg0) throws Exception {
        // Built on the executor, so no Configuration has to be serialized with the function
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", HDFS_PATH);
        while (arg0.hasNext()) {
            Tuple2<String, PortableDataStream> fileTuple = arg0.next();
            Text key = new Text(fileTuple._1());
            // Split the last path segment (e.g. "name.ext") into base name and extension
            String[] pathParts = key.toString().split(SEP_PATH);
            String[] nameParts = pathParts[pathParts.length - 1].split(DOT_REGEX);
            String fileName = nameParts[0];
            String fileExtension = nameParts[nameParts.length - 1];
            BytesWritable value = new BytesWritable(fileTuple._2().toArray());
            SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                    SequenceFile.Writer.file(new Path(DEST_PATH + fileName + SEP_KEY + getCurrentTimeStamp() + DOT + fileExtension)),
                    SequenceFile.Writer.compression(SequenceFile.CompressionType.RECORD, new BZip2Codec()),
                    SequenceFile.Writer.keyClass(Text.class), SequenceFile.Writer.valueClass(BytesWritable.class));
            try {
                // Key = parent directory + file name + extension
                key = new Text(pathParts[pathParts.length - 2] + SEP_KEY + fileName + SEP_KEY + fileExtension);
                writer.append(key, value);
            } finally {
                writer.close();
            }
        }
    }
});
Hope this will help.
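The answer derives the parent directory, base name, and extension by splitting the path key. A minimal plain-Java sketch of that step; the "/" separator and the "_" joiner are assumptions standing in for SEP_PATH and SEP_KEY, which the answer doesn't show, and the example path is hypothetical:

```java
public class PathParts {
    // Returns {parentDir, baseName, extension} for a path such as "file:/home/u/cat/kitty.jpg".
    static String[] split(String path) {
        String[] segments = path.split("/");
        String lastSegment = segments[segments.length - 1];
        int dot = lastSegment.lastIndexOf('.');
        // No dot (or a leading dot only): treat the whole segment as the base name
        String base = dot > 0 ? lastSegment.substring(0, dot) : lastSegment;
        String ext = dot > 0 ? lastSegment.substring(dot + 1) : "";
        String parent = segments.length >= 2 ? segments[segments.length - 2] : "";
        return new String[] {parent, base, ext};
    }

    public static void main(String[] args) {
        String[] p = split("file:/home/cloudera/Pictures/cat/kitty.jpg");
        System.out.println(String.join("_", p)); // cat_kitty_jpg
    }
}
```

Using lastIndexOf('.') rather than splitting on every dot keeps multi-dot names intact: "archive.tar.gz" becomes base "archive.tar" and extension "gz".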
I am trying to execute a series of jobs where one job executes two others, and one of those two executes another.
Job 1 --> Job 3
      --> Job 2 --> Job 4
The jobs send data from the DB.
This is what I have done:
public class MembersJob implements Job {
    List<Member> unsentMem = new ArrayList<Member>();
    JSONArray customerJson = new JSONArray();
    Depot depot;

    public MembersJob() {
        depot = new UserHandler().getDepot();
    }

    public void execute(JobExecutionContext jec) throws JobExecutionException {
        if (Util.getStatus()) {
            // ... (not shown)
        } else {
            System.out.println("No internet connection");
        }
    }

    public void runSucceedingJobs(JobExecutionContext context) {
        JobDataMap jobDataMap = context.getJobDetail().getJobDataMap();
        Object milkCollectionJobObj = jobDataMap.get("milkCollectionJob");
        MilkCollectionsJob milkCollectionsJob = (MilkCollectionsJob) milkCollectionJobObj;
        // Fixed key: this read "milkCollectionJob" again, which returns the MilkCollectionsJob, not the ProductsJob
        Object productsJobObj = jobDataMap.get("productsJob");
        ProductsJob productsJob = (ProductsJob) productsJobObj;
        try {
            // ... (not shown)
        } catch (JobExecutionException ex) {
            Logger.getLogger(MembersJob.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}
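Calling the jobs in series:
import static org.quartz.CronScheduleBuilder.cronSchedule;
import static org.quartz.TriggerBuilder.newTrigger;
import org.quartz.*;
import org.quartz.impl.StdSchedulerFactory;

//Members Job
JobKey membersJobKey = new JobKey("salesJob", "group1");
JobDetail membersJob = JobBuilder.newJob(MembersJob.class)
        .withIdentity(membersJobKey)
        .build();
membersJob.getJobDataMap().put("milkCollectionJob", new MilkCollectionsJob());
membersJob.getJobDataMap().put("productsJob", new ProductsJob());
CronTrigger membersTrigger = newTrigger()
        .withIdentity("salesTrigger", "group1")
        .withSchedule(cronSchedule("0/10 * * * * ?")) // every 10 seconds
        .build();
Scheduler scheduler = new StdSchedulerFactory().getScheduler();
scheduler.scheduleJob(membersJob, membersTrigger);
scheduler.start();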
The problem is that the members job starts but does not start the other jobs when it is done. What is the easiest and fastest way to achieve this?
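Something has to trigger the follow-up jobs once the first one finishes. Quartz 2.x ships org.quartz.listeners.JobChainingJobListener for linking one job to the next (see its documentation for how it handles more than one successor). As a swapped-in, plain-JDK alternative, the first job could run the rest of the graph itself with CompletableFuture. A minimal sketch; the job bodies are hypothetical stand-ins for the DB-sending work:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

public class JobChainSketch {
    // Records the order in which the stand-in jobs ran (thread-safe list)
    static final List<String> log = new CopyOnWriteArrayList<>();

    // Hypothetical job body: just records its name
    static Runnable job(String name) {
        return () -> log.add(name);
    }

    // Job 1 --> Job 3
    //       --> Job 2 --> Job 4
    static CompletableFuture<Void> runChain() {
        CompletableFuture<Void> job1 = CompletableFuture.runAsync(job("Job 1"));
        CompletableFuture<Void> job3 = job1.thenRunAsync(job("Job 3"));
        CompletableFuture<Void> job4 = job1.thenRunAsync(job("Job 2")).thenRunAsync(job("Job 4"));
        // Completes when both branches are done; a failure upstream skips its dependents
        return CompletableFuture.allOf(job3, job4);
    }

    public static void main(String[] args) {
        runChain().join();
        System.out.println(log); // Job 1 first, Job 2 before Job 4; Job 3 may interleave
    }
}
```

Called from MembersJob.execute(...) with runChain().join(), the Quartz trigger would still fire only the first job, and each branch would start as soon as its predecessor completes.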
In the code below, many threads in my system are BLOCKED when adding a new conference (the addConference() API) to the collection, because of a system-level lock (addConfLock). Every thread adding a new conference is BLOCKED, and adding a conference takes a long time because each conference object has to be built from the DB by executing complex SQL. So the thread BLOCKING time is proportional to the DB transaction time.
I would like to separate the creation of the conf object from the synchronized block that adds the conf to the collection.
I tried a solution; please suggest any other solution, or explain what is wrong with mine.
Below is the original code.
class Conferences extends OurCollectionImpl {
    // Contains all ongoing conferences
}

// Single conference instance
class Conf {
    String confId;
    Date startTime;
    Date participants;

    public void load() {
        // Load conference details from DB, and set its instance members
    }
}

class ConfMgr {
    Conferences confs = new Conferences();
    Object addConfLock = new Object();

    public boolean addConference(DBConn conn, String confID) {
        synchronized (addConfLock) {
            Conf conf = null;
            conf = confs.get(confID);
            if (conf != null) {
                return true;
            }
            conf = new Conf();
            conf.load(); // This is the BIG JOB within the SYNC BLOCK; it NEEDS TO BE SEPARATED
            // ... (rest not shown)
        }
    }
}
//My solution
public boolean addConference(DBConn conn, String confID) {
    Conf conf = null;
    synchronized (addConfLock) {
        conf = confs.get(confID);
        if (conf != null) {
            return true;
        }
        conf = Conf.getInstance(conn, confID); // argument order matches getInstance(DBConn, String)
    }
    // Caution: "F" and "T" are interned string literals shared by every Conf, and the field is
    // reassigned below, so this does not give each conference its own stable lock.
    synchronized (conf.builded) { // SYNC is narrowed to the individual conf object level
        if (conf.builded.equals("T")) {
            return true;
        }
        // ... (not shown)
        synchronized (addConfLock) {
            // ... (not shown)
            conf.builded = "T";
        }
    }
    // ... (not shown)
}

// Single conference instance
class Conf {
    String confId;
    Date startTime;
    Date participants;
    String builded = "F"; // This is to avoid building the object again.
    private static HashMap<String, Conf> instanceMap = new HashMap<String, Conf>();

    /**
     * The code below prevents two threads that request
     * the same confID from creating two conferences.
     */
    public static Conf getInstance(DBConn conn, String confID) {
        // This synchronized block ensures a single instance is created per confID
        synchronized (Conf.class) {
            Conf conf = instanceMap.get(confID);
            if (conf == null) {
                conf = new Conf();
                instanceMap.put(confID, conf);
            }
            return conf;
        }
    }

    public void load() {
        // Load conference details from DB, and set its instance members
    }
}
The problem, as you stated, is that you are locking everything for every conf id.
How about changing the design of the locking mechanism so that there is a separate lock for each conf id? Then you will lock only when transactions are for the same conf id; otherwise execution will be parallel.
Here is how you can achieve it:
Your Conferences class should use a ConcurrentHashMap to store the different confs.
Take a lock on the conf object you retrieve from conf = confs.get(confID), i.e. synchronized(conf).
Your application should perform much better then.
The problem is in this code:
synchronized (addConfLock) {
    conf = confs.get(confID);
    if (conf != null) {
        return true;
    }
    conf = Conf.getInstance(conn, confID);
}
If you use a ConcurrentHashMap as the collection, you can remove this: it provides a putIfAbsent API, so you don't need to lock.
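Combining both answers, a minimal sketch of per-conference loading on a ConcurrentHashMap with putIfAbsent. It swaps in the "memoizer" pattern (one FutureTask stored per confID), so the slow DB load runs outside any lock and only callers asking for the same confID wait for it. The loader function and the type parameter C are hypothetical stand-ins for Conf.load() and Conf:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class ConfRegistry<C> {
    private final ConcurrentMap<String, FutureTask<C>> confs = new ConcurrentHashMap<>();
    private final Function<String, C> loader; // hypothetical: runs the expensive DB load for one confID

    public ConfRegistry(Function<String, C> loader) { this.loader = loader; }

    // Returns the conference, loading it at most once per confID (unless a load fails).
    // No global lock: different confIDs load in parallel.
    public C addConference(String confID) throws InterruptedException {
        FutureTask<C> task = confs.get(confID);
        if (task == null) {
            FutureTask<C> fresh = new FutureTask<>(() -> loader.apply(confID));
            task = confs.putIfAbsent(confID, fresh); // atomic: exactly one thread wins
            if (task == null) {
                task = fresh;
                fresh.run(); // the winning thread runs the DB load, outside any lock
            }
        }
        try {
            return task.get(); // other callers for the same confID wait here
        } catch (ExecutionException e) {
            confs.remove(confID, task); // allow a later call to retry a failed load
            throw new IllegalStateException("load failed for " + confID, e.getCause());
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger loads = new AtomicInteger();
        ConfRegistry<String> registry = new ConfRegistry<>(id -> {
            loads.incrementAndGet();
            return "conf-" + id; // stand-in for a Conf built from the DB
        });
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                try {
                    registry.addConference("42");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        System.out.println(loads.get()); // 1: eight concurrent callers, one DB load
    }
}
```

computeIfAbsent would be shorter, but ConcurrentHashMap documents that other updates may block while the mapping function runs, so a long DB load should not run inside it; storing a FutureTask keeps the expensive work outside the map.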
I'm trying to run an embedded ApacheDS in my application. After some reading, I built this:
public void startDirectoryService() throws Exception {
    service = new DefaultDirectoryService();
    service.getChangeLog().setEnabled(false);

    Partition apachePartition = addPartition("apache", "dc=apache,dc=org");
    addIndex(apachePartition, "objectClass", "ou", "uid");

    // Inject the apache root entry if it does not already exist
    try {
        service.getAdminSession().lookup(apachePartition.getSuffixDn());
    } catch (LdapNameNotFoundException lnnfe) {
        LdapDN dnApache = new LdapDN("dc=Apache,dc=Org");
        ServerEntry entryApache = service.newEntry(dnApache);
        entryApache.add("objectClass", "top", "domain", "extensibleObject");
        entryApache.add("dc", "Apache");
        service.getAdminSession().add(entryApache);
    }
}
But I can't connect to the server after running it. What is the default port? Or am I missing something?
Here is the solution:
service = new DefaultDirectoryService();
service.getChangeLog().setEnabled( false );
Partition apachePartition = addPartition("apache", "dc=apache,dc=org");
LdapServer ldapService = new LdapServer();
ldapService.setTransports(new TcpTransport(389));
Here is an abbreviated version of how we use it:
File workingDirectory = ...;
Partition partition = new JdbmPartition();
DirectoryService directoryService = new DefaultDirectoryService();
LdapService ldapService = new LdapService();
ldapService.setSocketAcceptor(new SocketAcceptor(null));
I wasn't able to make it run with cringe's, Kevin's, or Jörg Pfünder's version; I constantly received NPEs from within my JUnit test. I debugged that and combined all of them into a working solution:
public class DirContextSourceAnonAuthTest {
private static DirectoryService directoryService;
private static LdapServer ldapServer;
public static void startApacheDs() throws Exception {
String buildDirectory = System.getProperty("buildDirectory");
File workingDirectory = new File(buildDirectory, "apacheds-work");
directoryService = new DefaultDirectoryService();
SchemaPartition schemaPartition = directoryService.getSchemaService()
LdifPartition ldifPartition = new LdifPartition();
String workingDirectoryPath = directoryService.getWorkingDirectory()
ldifPartition.setWorkingDirectory(workingDirectoryPath + "/schema");
File schemaRepository = new File(workingDirectory, "schema");
SchemaLdifExtractor extractor = new DefaultSchemaLdifExtractor(
SchemaLoader loader = new LdifSchemaLoader(schemaRepository);
SchemaManager schemaManager = new DefaultSchemaManager(loader);
List<Throwable> errors = schemaManager.getErrors();
if (!errors.isEmpty())
throw new Exception("Schema load failed : " + errors);
JdbmPartition systemPartition = new JdbmPartition();
systemPartition.setPartitionDir(new File(directoryService
.getWorkingDirectory(), "system"));
ldapServer = new LdapServer();
ldapServer.setTransports(new TcpTransport(11389));
public static void stopApacheDs() throws Exception {
public void anonAuth() throws NamingException {
DirContextSource.Builder builder = new DirContextSource.Builder(
DirContextSource contextSource =;
DirContext context = contextSource.getDirContext();
The 2.x sample is located at the following link:
The default port for LDAP is 389.
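Binding to 389 usually requires root on Unix-like systems (ports below 1024 are privileged), which is one reason test setups like the one above use a high port such as 11389. A quick way to check whether the embedded server is actually listening is an anonymous JNDI connection through the JDK's built-in LDAP provider. A minimal sketch, with localhost and the port argument as assumptions:

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

public class LdapPortCheck {
    // Builds a JNDI environment for an anonymous connection to a local LDAP server
    static Hashtable<String, String> env(int port) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://localhost:" + port);
        env.put(Context.SECURITY_AUTHENTICATION, "none"); // anonymous bind
        return env;
    }

    public static void main(String[] args) {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 389;
        try {
            // Creating the context opens the connection, so failure shows up here
            DirContext ctx = new InitialDirContext(env(port));
            System.out.println("Connected on port " + port);
            ctx.close();
        } catch (NamingException e) {
            System.out.println("Could not connect on port " + port + ": " + e);
        }
    }
}
```

Running it with the port passed to TcpTransport (for example 11389) tells you quickly whether the LdapServer was started at all.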
Since ApacheDS 1.5.7, you will get a NullPointerException. Please use the tutorial at
This project helped me:
Embedded sample project
I use this dependency in pom.xml:
Furthermore, in 2.0.* the working directory and other paths are no longer defined in DirectoryService but in a separate class, InstanceLayout, which you need to instantiate and then call
InstanceLayout il = new InstanceLayout(BASE_PATH);
In the map phase I read an HDFS file and write the updates to HBase.
Versions: Hadoop 2.5.1, HBase 1.0.0
The exception is as follows:
Exception in thread "main" org.apache.hadoop.mapred.InvalidJobConfException: Output directory not set.
Maybe there is something wrong with the setOutputFormatClass line; the IDE shows this prompt for it:
The method setOutputFormatClass(Class<? extends OutputFormat>) in the type Job is not applicable for the arguments (Class<TableOutputFormat>)
The code is as follows:
public class HdfsAppend2HbaseUtil extends Configured implements Tool {

    public static class HdfsAdd2HbaseMapper extends Mapper<Text, Text, ImmutableBytesWritable, Put> {
        public void map(Text ikey, Text ivalue, Context context)
                throws IOException, InterruptedException {
            String oldIdList = HBaseHelper.getValueByKey(ikey.toString());
            StringBuffer sb = new StringBuffer(oldIdList);
            String newIdList = ivalue.toString();
            sb.append("\t" + newIdList);
            Put p = new Put(ikey.toString().getBytes());
            p.addColumn("idFam".getBytes(), "idsList".getBytes(), sb.toString().getBytes());
            context.write(new ImmutableBytesWritable(), p);
        }
    }

    public int run(String[] paths) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "master,salve1");
        conf.set("", "2181");
        Job job = Job.getInstance(conf, "AppendToHbase");
        job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, "reachableTable");
        FileInputFormat.setInputPaths(job, new Path(paths[0]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Append Start: ");
        long time1 = System.currentTimeMillis();
        long time2;
        String[] pathsStr = {Const.TwoDegreeReachableOutputPathDetail};
        int exitCode = ToolRunner.run(new HdfsAppend2HbaseUtil(), pathsStr);
        time2 = System.currentTimeMillis();
        System.out.println("Append Cost " + "\t" + (time2 - time1) / 1000 + " s");
    }
}
You didn't specify the output directory to write the output to, as you did for the input path.
Specify it like this:
FileOutputFormat.setOutputPath(job, new Path(<output path>));
At last I know why. Just as I suspected, something was wrong with the line that triggered this prompt:
The method setOutputFormatClass(Class<? extends OutputFormat>) in the type Job is not applicable for the arguments (Class<TableOutputFormat>)
In fact, here we need to import
instead of importing
The former extends org.apache.hadoop.mapred.FileOutputFormat,
and the latter extends
Finally, thank you all very much!