How can I use OpenOffice in server mode as a multithreaded service? - java

What is your experience with running OpenOffice in server mode? I know OpenOffice is not multithreaded, and I now need to use its services from our server.
What can I do to work around this limitation?
I'm using Java.

With the current version of JODConverter (3.0-SNAPSHOT), it's quite easy to run multiple parallel conversions against OOo in headless mode, as the library now supports starting up several instances and keeping them in a pool; you just provide several port numbers or named pipes when constructing an OfficeManager instance:
final OfficeManager om = new DefaultOfficeManagerConfiguration()
        .setOfficeHome("/usr/lib/openoffice")
        .setPortNumbers(8100, 8101, 8102, 8103)
        .buildOfficeManager();
om.start();
You can then use the library, e.g. for converting documents, without having to deal with the pool of OOo instances in the background:
OfficeDocumentConverter converter = new OfficeDocumentConverter(om);
converter.convert(new File("src/test/resources/test.odt"), new File("target/test.pdf"));
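When you're done (e.g. at application shutdown), stop the pool as well so no soffice processes linger; with this API that is a single call on the same OfficeManager instance as above:

om.stop();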

Yes, I am using OpenOffice as a document conversion server.
Unfortunately, the solution to your problem is to spawn a pool of OpenOffice processes.
The commons-pool branch of JODConverter (before it moved to code.google.com) implemented this out-of-the-box for you.
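If you end up rolling the pool yourself rather than relying on the library, the core idea is simply one headless OOo process per port, each with its own user installation directory so the instances really run independently. A rough sketch, not out-of-the-box JODConverter code; the install path, port list and the -env:UserInstallation trick are assumptions about your setup:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class OfficePool {
    // Assumed install path and ports; adjust for your environment.
    private static final String SOFFICE = "/usr/lib/openoffice/program/soffice";
    private static final int[] PORTS = {8100, 8101, 8102, 8103};

    public static List<Process> startAll() throws IOException {
        List<Process> workers = new ArrayList<Process>();
        for (int port : PORTS) {
            // One headless OOo instance per port, each with a private user profile
            // so the processes do not trip over each other's lock files.
            ProcessBuilder pb = new ProcessBuilder(SOFFICE,
                    "-headless", "-nofirststartwizard", "-nologo",
                    "-env:UserInstallation=file:///tmp/ooo_" + port,
                    "-accept=socket,host=127.0.0.1,port=" + port + ";urp;");
            workers.add(pb.start());
        }
        return workers;
    }
}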

Thanks Bastian. I found another way, based on Bastian's answer. Opening several ports lets the library serve several threads in parallel. But even without many ports (a few are enough) you can improve throughput by increasing the task queue timeout (see the JODConverter documentation). One more thing: we decided not to start and stop the OfficeManager for each conversion. In the end, I solved this task with the following approach:
import java.io.File;

import org.artofsolving.jodconverter.OfficeDocumentConverter;
import org.artofsolving.jodconverter.office.DefaultOfficeManagerConfiguration;
import org.artofsolving.jodconverter.office.OfficeManager;

public class JODConverter {

    private static volatile OfficeManager officeManager;
    private static volatile OfficeDocumentConverter converter;

    public static void startOfficeManager() {
        try {
            officeManager = new DefaultOfficeManagerConfiguration()
                    .setOfficeHome(new File("libre office home path")) // placeholder: your LibreOffice home path
                    .setPortNumbers(8100, 8101, 8102, 8103, 8104)
                    .setTaskExecutionTimeout(600000L)  // for big files
                    .setTaskQueueTimeout(200000L)      // how long a task may wait if all ports are busy
                    .buildOfficeManager();
            officeManager.start();

            // 2) Create the JODConverter converter
            converter = new OfficeDocumentConverter(officeManager);
        } catch (Throwable e) {
            e.printStackTrace();
        }
    }

    public static void convertPDF(File inputFile, File outputFile) throws Throwable {
        converter.convert(inputFile, outputFile);
    }

    public static void stopOfficeManager() {
        officeManager.stop();
    }
}
I call JODConverter's convertPDF whenever a conversion is needed; the OfficeManager is stopped only when the application shuts down.
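For completeness, a minimal sketch of how the class above is meant to be wired into the application lifecycle (the file names are just placeholders):

import java.io.File;

public class ConversionExample {
    public static void main(String[] args) {
        JODConverter.startOfficeManager();        // once, at application startup
        try {
            // per request, from any thread
            JODConverter.convertPDF(new File("in.docx"), new File("out.pdf"));
        } catch (Throwable t) {
            t.printStackTrace();
        } finally {
            JODConverter.stopOfficeManager();     // once, at application shutdown
        }
    }
}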

OpenOffice can be used in headless mode, but it has not been built to handle a lot of requests in a stressful production environment.
Using OpenOffice in headless mode has several issues:
The process might die/become unavailable.
There are several memory leak issues.
Opening several OpenOffice "workers" does not scale as expected and needs some tweaking to really get independent processes (several OpenOffice copies, several services, running under different users).
As suggested, jodconverter can be used to access the OpenOffice process.
http://code.google.com/p/jodconverter/wiki/GettingStarted

You can try this:
http://www.jopendocument.org/
It's an open-source Java-based library that allows you to work with OpenOffice documents without OpenOffice itself, thus removing the need for the OOo server.

Vlad is correct about having to run multiple instances of OpenOffice on different ports.
I'd just like to add that OpenOffice doesn't seem to be stable. We run 10 instances of it in a production environment and set the code up to re-try with another instance if the first attempt fails. This way when one of the OpenOffice servers crashes (or doesn't crash but doesn't respond either) production is not affected. Since it's a pain to keep restarting the servers on a daily basis, we're slowly converting all our documents to JasperReports (see iReport for details). I'm not sure how you're using the OpenOffice server; we use it for mail merging (filling out forms for customers). If you need to convert things to PDF, I'd recommend iText.
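A minimal sketch of that retry idea, assuming one OfficeDocumentConverter per running instance (the class and field names here are illustrative, not our production code; the import matches the JODConverter 3.x package layout):

import java.io.File;
import java.util.List;

import org.artofsolving.jodconverter.OfficeDocumentConverter;

public class FailoverConverter {

    private final List<OfficeDocumentConverter> converters; // one per OpenOffice instance

    public FailoverConverter(List<OfficeDocumentConverter> converters) {
        this.converters = converters;
    }

    public void convert(File in, File out) throws Exception {
        Exception last = null;
        for (OfficeDocumentConverter converter : converters) {
            try {
                converter.convert(in, out);   // try this instance
                return;                       // success, we're done
            } catch (Exception e) {
                last = e;                     // remember the failure and try the next instance
            }
        }
        if (last != null) {
            throw last;
        }
        throw new IllegalStateException("no converters configured");
    }
}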

Related

WebDAV FileSystemProvider - Java NIO

I have a Java application with lots of NIO methods like Files.copy, Files.move, Files.delete, FileChannel...
What I'm now trying to achieve: I want to access a remote WebDAV server and modify data on that server with basic functions like upload, delete or update, without changing every method in my application. So here comes my idea:
I think a WebDAV FileSystem implementation would do the trick: adding a custom WebDAV FileSystemProvider which manages the mentioned file operations on the remote data. I've googled a lot, and Apache VFS with the Sardine implementation looks good - BUT it seems that Apache VFS is not compatible with NIO?
Here's some example code, as I imagine it:
public class WebDAVManagerTest {

    private static DefaultFileSystemManager fsManager;
    private static WebdavFileObject testFile1;
    private static WebdavFileObject testFile2;
    private static FileSystem webDAVFileSystem1;
    private static FileSystem webDAVFileSystem2;

    @Before
    public static void initWebDAVFileSystem(String webDAVServerURL) throws FileSystemException, org.apache.commons.vfs2.FileSystemException {
        try {
            fsManager = new DefaultFileSystemManager();
            fsManager.addProvider("webdav", new WebdavFileProvider());
            fsManager.addProvider("file", new DefaultLocalFileProvider());
            fsManager.init();
        } catch (org.apache.commons.vfs2.FileSystemException e) {
            throw new FileSystemException("Exception initializing DefaultFileSystemManager: " + e.getMessage());
        }

        String exampleRemoteFile1 = "/foo/bar1.txt";
        String exampleRemoteFile2 = "/foo/bar2.txt";

        testFile1 = (WebdavFileObject) fsManager.resolveFile(webDAVServerURL + exampleRemoteFile1);
        webDAVFileSystem1 = (FileSystem) fsManager.createFileSystem(testFile1);
        Path localPath1 = webDAVFileSystem1.getPath(testFile1.toString());

        testFile2 = (WebdavFileObject) fsManager.resolveFile(webDAVServerURL + exampleRemoteFile2);
        webDAVFileSystem2 = (FileSystem) fsManager.createFileSystem(testFile2);
        Path localPath2 = webDAVFileSystem2.getPath(testFile2.toString());
    }
}
After that I want to work in my application with localPath1 + localPath2. So that e.g. a Files.copy(localPath1, newRemotePath) would copy a file on the WebDAV server to a new directory.
Is this the right course of action? Or are there other libraries to achieve that?
Apache VFS uses its own FileSystem interface, not the NIO one. You have three options, with varying levels of effort:
Change your code to use an existing WebDAV project that uses its own FileSystem, i.e. Apache VFS.
Find an existing project that uses WebDAV and implements the NIO FileSystem interface.
Implement the NIO FileSystem interface yourself.
Option 3 has already been done, so you may be able to customize what someone else has already written; have a look at nio-fs-provider or nio-fs-webdav. I'm sure there are others, but these two were easy to find using Google.
Implementing a WebDAV NIO FileSystem from scratch would be quite a lot of work, so I wouldn't recommend starting there; I'd take what someone has already done and make it work for me, i.e. option 2.
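To illustrate option 2: once a provider like nio-fs-webdav is on the classpath, the standard NIO entry point is FileSystems.newFileSystem, and from there the usual Files calls go against the remote server. A rough sketch; the "webdav" URI scheme and the env keys are assumptions that depend on the provider you pick, so check its documentation:

import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;

public class WebDavNioExample {
    public static void main(String[] args) throws Exception {
        // Credentials and scheme are provider-specific.
        Map<String, Object> env = new HashMap<>();
        env.put("username", "user");
        env.put("password", "secret");

        try (FileSystem webdav = FileSystems.newFileSystem(
                URI.create("webdav://example.org/dav"), env)) {
            Path remote = webdav.getPath("/foo/bar1.txt");
            Path copy = webdav.getPath("/foo/bar1-copy.txt");
            // Plain NIO calls are now routed through the WebDAV provider.
            Files.copy(remote, copy, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}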

Java code runs out of space memory on AWS but not MacOSX

I need another set of eyes on this.
I've written out a zip file into hundreds of gigabytes with this exact code with no modifications locally on MacOSX.
With 100% unchanged code, just deployed to an AWS instance running Ubuntu, this same code runs into Out of Memory issues (heap space).
Here's the code that's being run, streaming MyBatis to a CSV file on disk:
File directory = new File(feedDirectory);
File file;
try {
    file = File.createTempFile(("feed-" + providerCode + "-"), ".csv", directory);
} catch (IOException e) {
    throw new RuntimeException("Unable to create file to write feed to disk: " + e.getMessage(), e);
}

String filePath = file.getAbsolutePath();
log.info(String.format("File name for %s feed is %s", providerCode, filePath));

// output file
try (FileOutputStream out = new FileOutputStream(file)) {
    streamData(out, providerCode, startDate, endDate);
} catch (IOException e) {
    throw new RuntimeException("Unable to write feed to file: " + e.getMessage());
}
public void streamData(OutputStream outputStream, String providerCode, Date startDate, Date endDate) throws IOException {
    try (CSVPrinter printer = CsvUtil.openPrinter(outputStream)) {
        StreamingHandler<FStay> handler = stayPrintingHandler(printer);
        warehouse.doForAllStaysByProvider(providerCode, startDate, endDate, handler);
    }
}

private StreamingHandler<FStay> stayPrintingHandler(CSVPrinter printer) {
    StreamingHandler<FStay> handler = new StreamingHandler<>();
    handler.setHandler((stay) -> {
        try {
            EXPORTER.writeStay(printer, stay);
        } catch (IOException e) {
            log.error("Issue with writing output: " + e.getMessage(), e);
        }
    });
    return handler;
}

// The EXPORTER method
import org.apache.commons.csv.CSVPrinter;

public void writeStay(CSVPrinter printer, FStay stay) throws IOException {
    List<Object> list = asList(stay);
    printer.printRecord(list);
}

List<Object> asList(FStay stay) {
    List<Object> list = new ArrayList<>(46);
    list.add(stay.getUid());
    list.add(stay.getProviderCode());
    //....
    return list;
}
Here's a graph of the JVM heap space (using jvisualvm) when I run this locally. I've run this consistently with Java 8 (jdk1.8.0_51 and 1.8.0_112) locally and have gotten great results, even writing out a terabyte of data.
^ In the above, the max heap space is set to 4 gigs, and the most it ever increases to is 1.5 gigs, before going back down to around 500 MB, while streaming data to the CSV file as it's supposed to.
However, when I run this on Ubuntu with jdk 1.8.0_111, the exact same operation will not complete, running out of heap space (java.lang.OutOfMemoryError: Java heap space)
I've upped the Xmx value from 8 gigs to 16 to 25 gigs, and still run out of heap space. Meanwhile... the total size of the file is only 10 Gigs in total... which really perplexes me.
Here's what the JVisualVm graph looks like on the Ubuntu box:
I've no doubt it's the exact same code running in both environments, with the same operation being performed in each (same database server providing the same data)
The only differences I can think of at this point are:
Operating system - Ubuntu vs Mac OS X
Hosted VM in AWS vs hard metal laptop
Network speed is faster in AWS between database and Ubuntu server
JDK version is 1.8.0_111 in Ubuntu, tried 1.8.0_51 and 1.8.0_112 locally
Can anyone help shed any light on this problem?
Update
I've tried replacing all the 'try-with-resources' statements with explicit flush/close statements and no luck.
What's more, I tried to force a garbage collection on the Ubuntu box as soon as I started to see the data come in, and it had no effect-- there is something definitely stopping the heap from being collected on the Ubuntu machine... while running the exact same code on OS X let me write the full enchilada again no problem.
Update 2
In addition to the differences in the environments above, the only other difference I can think of is if the connection between the servers in AWS is so fast that it streams the data faster than it can flush the data to disk... but that still doesn't explain the issue where I only have 10 gigs of data total, and it blows up a JVM with 20 Gigs of heap space.
Is there any likelihood of there being a bug at the Ubuntu/Java level for this?
Update 3
Tried replacing the output of the CSVPrinter to use an entirely separate library (OpenCSV's CSVWriter in lieu of Apache's CSV library) and the same result occurs.
As soon as this code starts receiving data from the database, the heap starts blowing up and the garbage collector fails to reclaim any memory... but only on Ubuntu. On OS X, everything is reclaimed immediately and the heap never grows.
I've also tried flushing the stream after every write, but had no luck with that as well.
Update 4
Got the heap dump to print out, and according to this I should be looking at the database driver. Specifically the InboundDataHandler in amazon's redshift driver.
I'm using myBatis with a custom result handler. I tried setting the result handler to effectively do nothing when it gets a result (new ResultHandler<>() { // method overridden to do literally nothing}) and I know I'm not holding on to any references there.
Since it's the InboundDataHandler defined by AWS/Redshift... it makes me think it may be lower than the myBatis level... either:
Error in the SqlSessionFactory I'm setting up
Bug in the Redshift driver that only pops up in Ubuntu / AWS
Bug in the result handler I have overwritten
Here's the heap dump screenshot:
Here's where I'm setting up my SqlSessionFactoryBean:
@Bean
public javax.sql.DataSource redshiftDataSource() throws ClassNotFoundException {
    log.info("Got to datasource config");
    // Dynamically load driver at runtime.
    Class.forName(dataWarehouseDriver);
    DataSource dataSource = new DataSource();
    dataSource.setURL(dataWarehouseUrl);
    dataSource.setUserID(dataWarehouseUsername);
    dataSource.setPassword(dataWarehousePassword);
    return dataSource;
}

@Bean
public SqlSessionFactoryBean sqlSessionFactory() throws ClassNotFoundException {
    SqlSessionFactoryBean factoryBean = new SqlSessionFactoryBean();
    factoryBean.setDataSource(redshiftDataSource());
    return factoryBean;
}
Here's the myBatis code I'm running as a test to verify that it's not me holding on to records in my ResultHandler:
warehouse.doForAllStaysByProvider(providerCode, startDate, endDate, new ResultHandler<FStay>() {
    @Override
    public void handleResult(ResultContext<? extends FStay> resultContext) {
        // do nothing
    }
});
Is there a way I can force the SQL connection to not hang on to records or something? I'll again re-iterate that on my local machine, there is no issue with this memory leak... it only surfaces when running the code in the hosted AWS environment. And in both cases, the Database driver and server are the same.
Update 6
I think it's finally fixed. Thanks to all who pointed me in the direction of the heap dump. That helped narrow it down to the offending class in a huge way.
After that, I did some research on the AWS redshift driver, and it explicitly says that your clients should specify a limit for any operations on large data. So I found out how to do that in my myBatis configuration:
<select id="doForAllStaysByProvider" fetchSize="1000" resultMap="FStayResultMap">
select distinct
f_stay.uid,
And this did the trick.
Mind you, this fetchSize isn't necessary when handling much larger data sets pulled remotely from AWS (database in AWS, code executing on a laptop at home), and it shouldn't be necessary at all, since I'm overriding the MyBatis ResultHandler<>, which handles each row individually and never holds on to any objects.
Yet something funky happens with the AWS Redshift JDBC driver only when it runs inside AWS (database in AWS, code executing on an AWS instance) which causes this InboundDataHandler to never release its resources unless a fetchSize is specified.
Here's the heap of the server running now, getting much further than it ever has before in AWS, with the heap space never moving above 500Mb, and after i hit 'force gc' in jvisualvm, it shows the 'used' heap at less than 100mb:
Thanks again in a huge way to all those who helped guide this!
Finally figured out a solution.
The heap dump was the biggest aid: it indicated the InboundDataHandler class of Amazon's Redshift/PostgreSQL JDBC driver was the prime culprit.
The code to set up the SqlSession appeared legit, so traveling over to Amazon's documentation landed this gem:
To avoid client-side out-of-memory errors when retrieving large data
sets using JDBC, you can enable your client to fetch data in batches
by setting the JDBC fetch size parameter.
We hadn't run into this before, as we stream results with custom ResultHandlers in MyBatis... but there seems to be something different when the AWS Redshift JDBC driver is running on AWS itself vs outside AWS connecting in.
Taking the guidance from the documentation, we added a 'fetchSize' to our MyBatis select query:
<select id="doForAllStaysByProvider" fetchSize="1000" resultMap="FStayResultMap">
select distinct
f_stay.uid,
And voila! Everything worked swimmingly. This is the only change we made and the heap never went above a couple hundred MBs.
You can see in one of the graphs above where the heap goes off the charts: as soon as data starts arriving on Amazon, the heap marches right up linearly and never reclaims an ounce of space once it starts.
My guess is the Redshift JDBC driver is doing something different when it's in Amazon's environment for some kind of optimization... that's all I can think of to explain the behavior.
Clearly Amazon knows what's going on since they documented it up front. I may not know the full 'why' of what's happening, but at least everything is resolved in what appears to be a satisfactory way.
Thanks to all those who helped.

JRuby: Calling Java Code From A Rack App And Keeping It In Memory

I currently know Java and Ruby, but have never used JRuby. I want to use some RAM- and computation-intensive Java code inside a Rack (sinatra) web application. In particular, this Java code loads about 200MB of data into RAM, and provides methods for doing various calculations that use this in-memory data.
I know it is possible to call Java code from Ruby in JRuby, but in my case there is an additional requirement: This Java code would need to be loaded once, kept in memory, and kept available as a shared resource for the sinatra code (which is being triggered by multiple web requests) to call out to.
Questions
Is a setup like this even possible?
What would I need to do to accomplish it? I am not even sure if this is a JRuby question per se, or something that would need to be configured in the web server. I have experience with Passenger and Unicorn/nginx, but not with Java servers, so if this does involve configuration of a Java server such as Tomcat, any info about that would help.
I am really not sure where to even start looking, or if there is a better way to be approaching this problem, so any and all recommendations or relevant links are appreciated.
Yes, a setup like this is possible (see Deployment below), and to accomplish it I would suggest using a singleton.
Singletons in Jruby
With reference to the question "best/most elegant way to share objects between a stack of rack mounted apps/middlewares?", I agree with Colin Surprenant's answer, namely the singleton-as-module pattern, which I prefer over using the Singleton mixin.
Example
I post here some test code you can use as a proof of concept:
JRuby sinatra side:
#file: sample_app.rb
require 'sinatra/base'
require 'java' # https://github.com/jruby/jruby/wiki/CallingJavaFromJRuby

java_import org.rondadev.samples.StatefulCalculator # import your Java class here

# singleton-as-module, loaded once and kept in memory
module App
  module Global
    extend self

    def calc
      @calc ||= StatefulCalculator.new
    end
  end
end

# you could call a method here to load data into the stateful Java object
App::Global.calc.turn_on

class Sample < Sinatra::Base
  get '/' do
    "Welcome, calculator register: #{App::Global.calc.display}"
  end

  get '/add_one' do
    "added one to calculator register, new value: #{App::Global.calc.add(1)}"
  end
end
You can start it in Tomcat with trinidad, or simply with rackup config.ru, but then you need:
#file: config.ru
root = File.dirname(__FILE__) # => "."
require File.join( root, 'sample_app' ) # => true
run Sample # ..in sample_app.rb ..class Sample < Sinatra::Base
Something about the Java side:
package org.rondadev.samples;

public class StatefulCalculator {

    private StatelessCalculator calculator;
    double register = 0;

    public double add(double a) {
        register = calculator.add(register, a);
        return register;
    }

    public double display() {
        return register;
    }

    public void clean() {
        register = 0;
    }

    public void turnOff() {
        calculator = null;
        System.out.println("[StatefulCalculator] Good bye !");
    }

    public void turnOn() {
        calculator = new StatelessCalculator();
        System.out.println("[StatefulCalculator] Welcome !");
    }
}
Please note that the register here is only a double, but in your real scenario it could be a big data structure.
Deployment
You can deploy using Mongrel, Thin (experimental), Webrick (but who would do that?), and even Java-centric application containers like Glassfish, Tomcat, or JBoss. source: jruby deployments
With TorqueBox, which is built on the JBoss Application Server:
JBoss AS includes high-performance clustering, caching and messaging functionality.
trinidad is a RubyGem that allows you to run any Rack-based app within an embedded Apache Tomcat container.
Thread synchronization
Sinatra will use the Mutex#synchronize method to place a lock on every request to avoid race conditions among threads. If your Sinatra app is multithreaded and not thread safe, or any gem you use is not thread safe, you would want to set :lock, true so that only one request is processed at a given time. Otherwise, by default, lock is false, which means synchronize will yield to the block directly.
source: https://github.com/zhengjia/sinatra-explained/blob/master/app/tutorial_2/tutorial_2.md
Here are some instructions for how to deploy a sinatra app to Tomcat.
The Java code can be loaded once and reused if you keep a reference to the Java instances you have loaded. You can keep that reference in a global variable in Ruby.
One thing to be aware of is that the Java library you are using may not be thread safe. If you are running your Ruby code in Tomcat, multiple requests can execute concurrently, and those requests may all access your shared Java library. If your library is not thread safe, you will have to use some sort of synchronization to prevent multiple threads from accessing it at the same time.
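On the Java side, that synchronization can be as blunt as a wrapper with synchronized methods around the shared instance; a minimal sketch, reusing the StatefulCalculator from the answer above (coarse-grained on purpose):

package org.rondadev.samples;

// Serializes access to the shared, non-thread-safe calculator so that
// concurrent Rack requests cannot interleave calls on it.
public class SynchronizedCalculator {

    private final StatefulCalculator delegate = new StatefulCalculator();

    public synchronized double add(double a) {
        return delegate.add(a);
    }

    public synchronized double display() {
        return delegate.display();
    }

    public synchronized void turnOn() {
        delegate.turnOn();
    }

    public synchronized void turnOff() {
        delegate.turnOff();
    }
}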

Programmatically multi language content in Alfresco

The requirement I received is to model some existing content, available in a SQL Server database, using Alfresco content management, so I created my new content model and it seems to work fine. But I have a problem with multi-language support: I know that in Alfresco it is possible to add multiple languages to one node (how can I do that in Java for a massive load?), but I also used some aspects that need to be translated.
What do you usually do in that case? I thought of following these steps:
Create the English content and add the aspects
Create the new translated child and add the aspects
Is that correct? How can I make a node multilingual programmatically (in Java), and how can I add the new translated content with its aspects? I took a look at the Alfresco documentation but didn't find it; could you help me find some documentation or a tutorial about that?
UPDATE:
I'm trying to make a content node multilingual:
void makeTranslation(Reference contentNodeRef, Locale locale) throws AlfrescoRuntimeException, Exception {
    try {
        NodeRef nodeRef = new NodeRef("workspace://SpacesStore/" + contentNodeRef.getUuid());
        MultilingualContentServiceImpl multilingualContentServiceImpl = new MultilingualContentServiceImpl();
        multilingualContentServiceImpl.makeTranslation(nodeRef, locale);
    } catch (org.alfresco.error.AlfrescoRuntimeException ex) {
        throw new AlfrescoRuntimeException(ex.getMessage());
    } catch (Exception ex) {
        throw new Exception(ex.getMessage());
    }
}
But makeTranslation raises a NullPointerException because MultilingualContentServiceImpl is not initialized correctly. Any suggestion on how to initialize it? I have to use Spring, but how?
Any suggestion or reply will be very helpful!
Thanks,
Andrea
You can use MultilingualContentService to add translations. But! I guess your properties should be of type d:mltext (like cm:title and cm:description are) to support multilingual content.
This means that if you access Alfresco using a browser set to English, you will see a different description than someone using German language settings in their browser. This can be a little confusing, because in Share there is (was?) no indicator that the property is multilingual.
If you want your translations to appear everywhere, no matter what browser language people are using, then the better approach is to define an aspect (for example ex:translatable) with as many properties as you need translations. Then you can programmatically (using Java or JavaScript) use the search service to find the nodes you want and add the aspect to them. Finally, you add the properties (translations) of that aspect to the node.
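A hedged sketch of both approaches on the Java side, assuming the services are obtained from a Spring-injected ServiceRegistry (the "translatable" aspect, its namespace and property names are made-up examples, not part of the Alfresco model):

import java.io.Serializable;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

import org.alfresco.service.ServiceRegistry;
import org.alfresco.service.cmr.ml.MultilingualContentService;
import org.alfresco.service.cmr.repository.NodeRef;
import org.alfresco.service.cmr.repository.NodeService;
import org.alfresco.service.namespace.QName;

public class TranslationHelper {

    private final ServiceRegistry serviceRegistry; // inject via Spring; don't instantiate *Impl classes yourself

    public TranslationHelper(ServiceRegistry serviceRegistry) {
        this.serviceRegistry = serviceRegistry;
    }

    // Approach 1: register the node as a translation for the given locale.
    public void makeTranslation(NodeRef nodeRef, Locale locale) {
        MultilingualContentService mlService = serviceRegistry.getMultilingualContentService();
        mlService.makeTranslation(nodeRef, locale);
    }

    // Approach 2: add a custom "translatable" aspect carrying the translated properties.
    public void addTranslatableAspect(NodeRef nodeRef, String titleEn, String titleDe) {
        NodeService nodeService = serviceRegistry.getNodeService();
        Map<QName, Serializable> props = new HashMap<>();
        props.put(QName.createQName("http://example.com/model/translation/1.0", "titleEn"), titleEn);
        props.put(QName.createQName("http://example.com/model/translation/1.0", "titleDe"), titleDe);
        nodeService.addAspect(nodeRef, QName.createQName("http://example.com/model/translation/1.0", "translatable"), props);
    }
}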
I hope this helps to clear things a bit... :)

Is it possible to connect to java from a firefox extension in the way described below

My objective is to:
Use Firefox to take a series of screendump images and save them on the local filesystem with a reference.
Also, via my custom extension, send a reference to a Java program that performs the FTP to a remote server.
This is pretty intimidating
https://developer.mozilla.org/en/JavaScript/Guide/LiveConnect_Overview
Is it possible?
Can you see any potential problems or things I'd need to consider?
(I'm aware of file system problems but its for local use only)
Are there any tutorials / references that might be handy?
I've tried linking to Java, but I hit problems using my own classes; I'm getting a class-not-found exception when I try:
JS:
var myObj = new Packages.message();
Java file:
public class Message {

    private String message;

    public Message() {
        this.message = "Hello";
    }

    public String getMessage() {
        return this.message;
    }
}
I'm not using a package on the Java side.
I'm just trying to run a quick test to see if this is viable. I'm under time pressure from above, so I just wanted to see whether it's a worthwhile time investment or a dead end.
You might consider this Java tutorial instead: http://www.oracle.com/technetwork/java/javase/documentation/liveconnect-docs-349790.html.
What Java version are you using? Is your Message class an object inside a Java applet?
