How to modify the class file? - java

I am working on a project in Eclipse to which I have added this Maven dependency for PDFBox:
Maven dependency
<dependency>
<groupId>org.apache.pdfbox</groupId>
<artifactId>pdfbox</artifactId>
<version>1.6.0</version>
</dependency>
I get the following error on some PDF files:
Parsing Error, Skipping Object
java.io.IOException: expected='endstream' actual='' org.apache.pdfbox.io.PushBackInputStream#1b8d77fe
at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:439)
at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:552)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:184)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1088)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1053)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:74)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
at org.apache.tika.Tika.parseToString(Tika.java:357)
at edu.uci.ics.crawler4j.crawler.BinaryParser.parse(BinaryParser.java:37)
at edu.uci.ics.crawler4j.crawler.WebCrawler.handleBinary(WebCrawler.java:223)
at edu.uci.ics.crawler4j.crawler.WebCrawler.processPage(WebCrawler.java:460)
at edu.uci.ics.crawler4j.crawler.WebCrawler.run(WebCrawler.java:129)
at java.lang.Thread.run(Thread.java:662)
When I googled it, I found that there was a bug in BaseParser.java, and a patch (https://issues.apache.org/jira/browse/PDFBOX-195) has been provided for that one file. So my question is: how can I modify just this Java file? I can see BaseParser.class in Eclipse because I have attached the PDFBox source. Any suggestions will be appreciated.

Given that BaseParser.java is an Apache file, there is absolutely no reason why you cannot download the source, make your changes and recompile it. I have done this with Apache code in the past. It was pretty straightforward and took me only a few minutes. Remember to submit your fix back to Apache so that it can be included in a future release.

You can:
create a subclass manually (and use it, if that is possible)
download the source, fix it, recompile, and finally overwrite the class in the jar
create a subclass programmatically (using cglib or ASM)
download only BaseParser, stub out all its dependencies (just create empty classes with the needed methods), recompile it, and put it in the jar (or in the JVM's ./ext or ./endorsed directory, if you want)
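The "overwrite it in the jar" step of the second option can be sketched as follows. This is only an illustration, assuming the patched class has already been compiled. JarPatcher and replaceEntry are hypothetical names, and the code relies on the JDK's built-in zip file system rather than anything from PDFBox.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Map;

// Hypothetical helper: overwrites a single entry of an existing jar in place.
final class JarPatcher {

    /**
     * @param jar          the jar to modify (work on a copy, not the one in ~/.m2)
     * @param compiledFile the recompiled .class file on disk
     * @param entryName    path inside the jar, for example
     *                     "org/apache/pdfbox/pdfparser/BaseParser.class"
     */
    static void replaceEntry(Path jar, Path compiledFile, String entryName) throws IOException {
        // The "jar:" scheme selects the JDK's zip file system provider.
        URI jarUri = URI.create("jar:" + jar.toUri());
        try (FileSystem zipFs = FileSystems.newFileSystem(jarUri, Map.of())) {
            Path target = zipFs.getPath(entryName);
            if (target.getParent() != null) {
                // Make sure the package directories exist inside the archive.
                Files.createDirectories(target.getParent());
            }
            Files.copy(compiledFile, target, StandardCopyOption.REPLACE_EXISTING);
        } // closing the file system writes the modified jar back to disk
    }
}
```

A signed jar would fail verification after such a change, so this is mainly useful for unsigned libraries.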

Generally, one doesn't modify a class file directly; one downloads the source code and then rebuilds the class file with javac. Yes, it is possible to modify class files without doing that, but patch files are generally source code patches, not binary patches.
Stefanglase has mentioned that the release you are working with should have the patch applied, but there is a small chance that a recent change reintroduced the issue. You might want to verify that you're not solving the wrong problem before you get too deep into it.
On the off chance that you really want to modify a binary, you open it with a hexadecimal editor, or hex editor for short. This allows you to set any byte in the file to any value, which means you must have a strong knowledge of the file's internal format, what is allowed and disallowed, and how to make allowable changes that actually implement your expected behavior. In short, you'll be doing a compiler's work by hand.
It can be done, but it is the sort of task that requires a lot of knowledge that few people already have, so the cost of acquiring that knowledge and successfully implementing the change is likely much higher than that of rebuilding from the available patched source. Even with the general principles and techniques already in hand, one cannot say with certainty that implementing the change this way costs less than rebuilding the entire library from patched source.
Good Luck.

Related

How to Serialize classes, then read them with a modified version of that same class in Java

I am developing a Minecraft plugin which uses a class that I made called customPlayer. When I save the plugin data from a running instance, I put all of these objects into a HashMap<String,customPlayer> and save them with ObjectOutputStream. Loading these classes back into the same version of the plugin works great, but my problem arises when I modify the class and try to read the object using that modified class (usually associated with a new version of my plugin).
I thought about it for a bit, and thought I came up with a clever solution. My idea was to just include the old class files as an External Library inside the new version of the plugin, cross my fingers and hope it worked. It didn't.
Is there a better way to do this? I'm new to serialization and this kind of stuff, so any suggestions would be greatly appreciated. Below I will include a few screenshots of the customPlayer class and the crash log of the server. Ideally, any solution that is presented should be easy to use with future modifications to the class (updates to the jar downloaded via a GitHub repo).
[Screenshot: instance variables and constructor of customPlayer.java]
Is there a better way to do this?
There certainly is. Stop using Serialization and ObjectOutputStream. These classes are a disaster (even the OpenJDK core team effectively agrees with this assessment). The output they generate is not particularly efficient (it takes more bytes than needed), it is not human readable, it cannot (easily) be read by anything except Java code, and it results in hairy situations such as the one you ran into.
Instead, use e.g. Jackson to turn your objects into JSON, or use Google's protobuf to turn them into efficient binary blobs.
You can read this JSON or these binary blobs in any language you want and you'll have your pick of the litter as far as libraries go. You will need to write some explicit code to 'save' an object (turn it into JSON / protobuf), and to 'read' one, but now you are free to change your code.
If you insist on continuing with serialization, you need to add a field named serialVersionUID and, for changes the default mechanism cannot handle, set up readObject and writeObject. It's convoluted rocket science that's hard to get right. The details are in the javadoc of java.io.Serializable.
Do yourself a favour though. Don't do it.
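For anyone who stays with serialization anyway, here is a minimal sketch of the serialVersionUID part only. PlayerRecord and its fields are hypothetical stand-ins for the asker's customPlayer class, whose real fields are not shown in the question. Pinning the ID covers compatible changes such as adding a field. It does not help with incompatible ones such as changing a field's type.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for the plugin's player class; the fields are invented.
final class PlayerRecord implements Serializable {
    // Without an explicit ID the JVM derives one from the class's shape, so
    // almost any edit to the class makes old saves fail with
    // InvalidClassException. Keep this value unchanged across plugin versions.
    private static final long serialVersionUID = 1L;

    final String name;
    final int level;

    PlayerRecord(String name, int level) {
        this.name = name;
        this.level = level;
    }

    // Serializes one record to bytes (a file stream would work the same way).
    static byte[] save(PlayerRecord record) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(record);
        }
        return bytes.toByteArray();
    }

    // Reads a record back; fields added in a later class version would simply
    // get their default values (null, 0, false) when loading older data.
    static PlayerRecord load(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (PlayerRecord) in.readObject();
        }
    }
}
```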

Java how to extract from a jar and put classes back into working project?

I am trying to take files from a jar that is part of a working project and put them back into the project so I can run it while making subtle changes to the classes.
I have read that it is possible to extract a jar, decompile, edit, reassemble the jar and run the project, but I don't want to do all that every time I make a small edit.
I have tried extracting and decompiling the jar, and then creating a new package in eclipse with the same name as the original jar, and then adding all the files back in; however I get hundreds of errors.
I am very new to java and I realize this is beyond my current skill level, so any help is greatly appreciated if there is a simple way to do this. None of the other threads on this give a clear answer.
Compilation and decompilation are lossy processes, so in general, you can't expect to be able to recompile decompiled code. If you want to make changes to an application and run the modified version, your best bet is to disassemble it with Krakatau, edit the assembly file, and reassemble. The Krakatau assembly format is designed to be very close to the classfile format, so you can make changes without disrupting everything. The downside is that you have to understand Java bytecode.
I'd also suggest checking out Konloch's Bytecode Viewer or Samczsun's Helios, which might be able to do what you want.

With gcj compiled java & XStream. (Exception: Cannot create XmlPullParser)

I'm enhancing a client, which is part of a bigger project. Because of the lack of speed, I was forced to switch to CNI, and therefore I had to generate native code with the GNU gcj compiler (GCC 4.6.3).
The compiling and linking work fine (thanks to the -findirect-dispatch flag), and I don't have any problems executing the output.
But when it comes to the communication between the client and the server, the client immediately disconnects. The reason:
[XStreamClient Reader] WARN - Client disconnected (Exception:
com.thoughtworks.xstream.io.StreamException: Cannot create
XmlPullParser)
(This exception only appears in the gcj-compiled version of the client. When I run the code with the Java interpreter, things work well, but too slowly.)
The challenging part is that I can't retrieve the source code of where this exception occurs, because it is in a pre-compiled library (Java class files) that the client uses. (And I cannot contact the author of that library.)
I guess the library invokes the XppReader, which then tries to create an XmlPullParser and fails.
I bind in the XStream (version 1.4.3) library (and other required *.jars) by unpacking them, compiling the resulting *.class files, and then linking the object files. This seems to work for all other libraries, too. (My OS is Ubuntu.)
What i already did to overcome this problem:
I googled intensively for XStream/XmlPullParser and gcj and replaced the "xmlpull"- and "kxml2"-files with different versions.
But nothing worked.
Does anyone of you have a clue of what might be the solution?
EDIT:
I figured out that the XmlPullParser creation fails because the META-INF directory with the /services/org.xmlpull.v1.XmlPullParserFactory file cannot be found by the XmlPullParserFactory.newInstance function.
This is because I only compiled and linked the *.class files from the *.jars.
So as soon as I find a way to link the META-INF directory into the executable in a way that the function can find and access it, the problem should be solved.
Does anyone already know a way to do so?
I think xmlpull needs an implementation, and xpp3 can serve as one.
Please add the following to your pom.xml and, if required, add these jar files to the software that needs them.
<dependency>
<groupId>xmlpull</groupId>
<artifactId>xmlpull</artifactId>
<version>1.1.3.1</version>
</dependency>
<dependency>
<groupId>xpp3</groupId>
<artifactId>xpp3</artifactId>
<version>1.1.3.3</version>
</dependency>
I think that you've made a couple of mistakes in your implementation platform choices:
You probably didn't need to go to the lengths of implementing stuff in native code "for speed". For most things you can get roughly comparable speed in Java as in native code, especially if you take the time to profile and optimize your Java code.
Assuming that you did, CNI was a poor choice. You would have been better off using JNI or JNA, both of which allow you to use Oracle HotSpot / OpenJDK releases.
GCJ is a poor choice because (as you have observed) some things don't work, and debugging is more difficult. (See also Is GNU's Java Compiler (GCJ) dead?)
Relying on a library that you cannot get source code for is unfortunate.
My advice would be to revisit as many of those "missteps" as possible.
As I already edited into my question, the creation fails because the XmlPullParserFactory.newInstance method is not able to access the /META-INF/services/org.xmlpull.v1.XmlPullParserFactory file using the following line of code:
InputStream is = context.getResourceAsStream (RESOURCE_NAME);
(RESOURCE_NAME equals "/META-INF/services/org.xmlpull.v1.XmlPullParserFactory")
I must admit that I didn't find a way to bind the needed META-INF directory into the executable, which would have been one of the most elegant solutions.
But since the XmlPullParserFactory.java file (and the XStream library) is open source, you just need to add one line of code to that source file and replace the old class with the new one, and that's it.
In the public static XmlPullParserFactory newInstance (String classNames, Class context) function, the program only reads from the RESOURCE_NAME file when classNames == null.
So to avoid this, we assign the content of the RESOURCE_NAME file to the classNames variable ourselves, by placing this line of code above the if (classNames == null || classNames.length() == 0 || "DEFAULT".equals(classNames)) statement:
classNames = "org.xmlpull.mxp1.MXParser,org.xmlpull.mxp1_serializer.MXSerializer";
"org.xmlpull.mxp1.MXParser,org.xmlpull.mxp1_serializer.MXSerializer" is my RESOURCE_NAME-file's content. If the content of your file differs from mine -> put in yours instead.
Best regards, Chris
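A slightly more defensive variant of the same idea is to try the resource first and use the hard-coded names only when it cannot be found. The sketch below is not the xmlpull source. FactoryClassNames and resolve are hypothetical names, and the fallback string is the service-file content quoted in the answer above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating "read the service file, else fall back".
final class FactoryClassNames {
    static final String RESOURCE_NAME =
            "/META-INF/services/org.xmlpull.v1.XmlPullParserFactory";

    // Assumption: this matches the content of the service file in your setup.
    static final String FALLBACK =
            "org.xmlpull.mxp1.MXParser,org.xmlpull.mxp1_serializer.MXSerializer";

    // Returns the first line of the resource, or the fallback if the resource
    // is missing (as in the gcj-linked executable), empty, or unreadable.
    static String resolve(Class<?> context, String resourceName, String fallback) {
        try (InputStream is = context.getResourceAsStream(resourceName)) {
            if (is == null) {
                return fallback;
            }
            BufferedReader reader =
                    new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8));
            String line = reader.readLine();
            return (line == null || line.trim().isEmpty()) ? fallback : line.trim();
        } catch (IOException e) {
            return fallback;
        }
    }
}
```

With this shape the same class file keeps working on a regular JVM, where the service file is found, and in the native build, where it is not.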

How to make javadoc.jar take priority over javadoc in sources.jar?

I have a limited selection of original source code overlaid onto decompiled code in a sources jar.
This is great, as it lets me easily drill down into the code when debugging. However, it seems to have the side effect of stopping the javadoc from the associated javadoc.jar from working in Eclipse, even though I have a separate javadoc.jar file with the javadoc in it.
I assume this is happening because Eclipse finds the 'source code', assumes that all the javadoc is in the source, and therefore sees no need to check the javadoc.jar file.
I'd like to be able to tell Eclipse (preferably using Maven) not to use the sources.jar for javadoc and to use only the javadoc.jar. I would still like to use the sources.jar for source code.
I have assumed that Eclipse prefers to display javadoc from sources; I may be wrong, so please correct me if so.
Also, I may just be doing something simple the wrong way, so please let me know if that is the case.
I am hunting for the same thing. I have some source jars I created with jad (and since they are decompiled, they have no JavaDoc in them) and attached as source attachments. I also have the JavaDoc attached. It seems to be a limitation of Eclipse: it will scrape the JavaDoc from the sources and display it (even if it's empty) rather than looking at the JavaDoc. I wish it would notice that the JavaDoc was missing from the source and try the JavaDoc location instead. If I don't find a solution, I'm going to post the question and/or feature request over at the Eclipse site.
One workaround might be to integrate into the Java decompiler (like jad) the ability to examine both the source and the javadoc, and put the javadoc back into the source. It would then also have parameter names for methods available, so it could put those back in too. Lots of people have suggested this, but I cannot find anyone who has done it.
A couple of caveats. First, jad hasn't been maintained in a long time, the JD-Core/JD-Eclipse website has vanished, and I have not found a better Java decompiler than jad. What happened to all the great Java decompiling gurus and solutions? Second, it might be tricky with the "align for debugging" feature to make sure the JavaDoc comments don't take up more room than is available.

Patching Java software

I'm trying to create a process to patch our current java application so users only need to download the diffs rather than the entire application. I don't think I need to go as low level as a binary diff since most of the jar files are small, so replacing an entire jar file wouldn't be that big of a deal (maybe 5MB at most).
Are there standard tools for determining which files changed and generating a patch for them? I've seen tools like xdelta and vpatch, but I think they work at a binary level.
I basically want to figure out - which files need to be added, replaced or removed. When I run the patch, it will check the current version of the software (from a registry setting) and ensure the patch is for the correct version. If it is, it will then make the necessary changes. It doesn't sound like this would be too difficult to implement on my own, but I was wondering if other people had already done this. I'm using NSIS as my installer if that makes any difference.
Thanks,
Jeff
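The "which files need to be added, replaced or removed" step described above can be sketched as a comparison of two manifests. This is only an illustration under stated assumptions: PatchPlan is a hypothetical class, and each manifest is assumed to map a relative file name to some content hash (how the hash is produced is left open).

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical patch planner: compares the installed version's manifest with
// the target version's manifest (file name -> content hash).
final class PatchPlan {
    final Set<String> added = new TreeSet<>();
    final Set<String> replaced = new TreeSet<>();
    final Set<String> removed = new TreeSet<>();

    static PatchPlan between(Map<String, String> installed, Map<String, String> target) {
        PatchPlan plan = new PatchPlan();
        for (Map.Entry<String, String> entry : target.entrySet()) {
            String oldHash = installed.get(entry.getKey());
            if (oldHash == null) {
                plan.added.add(entry.getKey());      // new in the target version
            } else if (!oldHash.equals(entry.getValue())) {
                plan.replaced.add(entry.getKey());   // present, but content differs
            }
        }
        for (String name : installed.keySet()) {
            if (!target.containsKey(name)) {
                plan.removed.add(name);              // no longer shipped
            }
        }
        return plan;
    }
}
```

Working at the level of whole jar files, as the question suggests, keeps the plan small and avoids mixing classes compiled at different times inside one jar.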
Be careful when doing this--I recommend not doing it at all.
The biggest problem is public static final constants. Their values are actually compiled into the classes that use them, not referenced. This means that even if a Java file that uses such a constant doesn't change, its class must be recompiled or it will still refer to the old value.
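The values that javac copies into calling classes are compile-time constants: static final primitives or Strings initialized with a constant expression. The sketch below contrasts one with two forms that are resolved at run time. Limits and Client are hypothetical classes, imagined as living in two separately shipped jars.

```java
// Hypothetical library class, imagined as shipped in library.jar.
final class Limits {
    // Compile-time constant: javac copies the value 10 into every class that
    // reads it, so those classes keep seeing 10 until they are recompiled.
    public static final int MAX_USERS = 10;

    // Initialized by a method call, so not a constant expression: callers
    // read this field from Limits at run time.
    public static final int MAX_SESSIONS = Integer.parseInt("20");

    // An accessor is also resolved at run time.
    static int maxUsers() {
        return MAX_USERS;
    }
}

// Hypothetical caller, imagined as shipped in client.jar.
final class Client {
    static int inlined() {
        return Limits.MAX_USERS;     // compiled as the literal 10
    }

    static int viaField() {
        return Limits.MAX_SESSIONS;  // compiled as a field read on Limits
    }

    static int viaMethod() {
        return Limits.maxUsers();    // compiled as a call into Limits
    }
}
```

If only library.jar were replaced with a build where MAX_USERS is 50, inlined() would still return 10 while viaMethod() would return 50.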
You also want to be very careful of changing method signatures--you will get some very subtle bugs if you change a method signature and do not recompile all files that call that method--even if the calling java files don't actually need to change (for instance, change a parameter from an int to a long).
If you decide to go down this path, be ready for some really hard-to-debug errors (generally no traces or significant indications, just strange behavior like the number received not matching the one sent) at customer sites that you cannot reproduce, and a lot of pissed-off customers.
Edit (too long for comment):
A binary diff of the class files might work, but I'd assume that some kind of version number or date gets compiled in and that they'd change a little on every compile for no reason. That could be easily tested, though.
You could adopt some strict development practices, such as not using public final statics (make them private) and never changing method signatures (deprecate instead), but I'm not convinced that I know all the possible problems; I just know the ones we encountered.
Also, binary diffs of the jar files would be useless; you'd have to diff the classes and re-integrate them into the jars (which doesn't sound easy to track).
Can you package your resources separately then minimize your code a bit? Pull out strings (Good for i18n)--I guess I'm just wondering if you could trim the class files enough to always do a full build/ship.
On the other hand, Sun seems to do an okay job of making class files that are completely compatible with the previous JRE release, so they must have guidelines somewhere.
You may want to see if Java WebStart can help you as it is designed to do exactly those things you want to do.
I know that the documentation describes how to create and do incremental updates, but we deploy the whole application as it changes very rarely. It is then an issue of updating the JNLP when ready.
How is it deployed?
On a local network I just leave everything as .class files in a folder. The startup script uses robocopy or rsync to copy from network share to local. If any .class file is different it is synced down. If not, it doesn't sync.
For non-local network I created my own updater. It downloads a text file of md5sums and compares to local files. If different it pulls file down from http.
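The per-file check in such an updater might look like the sketch below. Md5Check is a hypothetical class. Fetching the md5sum list and the files over HTTP is left out, so the expected hash is simply passed in.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for an md5sum-based updater.
final class Md5Check {

    // Returns the MD5 digest of the data as 32 lowercase hex characters,
    // the same form the md5sum tool prints.
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    // True when the local file is missing or its hash differs from the
    // expected value taken from the downloaded md5sum list.
    static boolean needsDownload(Path localFile, String expectedMd5) throws IOException {
        if (!Files.exists(localFile)) {
            return true;
        }
        return !md5Hex(Files.readAllBytes(localFile)).equalsIgnoreCase(expectedMd5);
    }
}
```

MD5 is fine for spotting changed files, but it is not a defence against tampering. A signed manifest or a stronger hash would be needed for that.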
A long time ago, the way we solved this was to use the classpath and jar files. Our application was built into a jar file, and it had a launcher jar file. The launcher classpath had a patch.jar that was read into the classpath before the main application.jar. This meant that we could update the patch.jar to supersede any classes in the main application.
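The same ordering trick can be sketched programmatically with a URLClassLoader, which searches its URLs in the order given. This is a variant for illustration, not the launcher described above. PatchFirstLauncher is a hypothetical name and the jar locations are assumptions.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;

// Hypothetical launcher helper: builds a class loader that looks in
// patch.jar before application.jar.
final class PatchFirstLauncher {

    static URLClassLoader loaderFor(Path patchJar, Path applicationJar)
            throws MalformedURLException {
        URL[] searchOrder = {
            patchJar.toUri().toURL(),        // searched first, so patched classes win
            applicationJar.toUri().toURL()   // supplies everything that was not patched
        };
        // The parent loader is still consulted before these URLs, so the
        // application's classes must not also be on the launcher's own
        // classpath, or the patch will be ignored.
        return new URLClassLoader(searchOrder, PatchFirstLauncher.class.getClassLoader());
    }
}
```

The launcher would then load the application's main class by name through this loader and invoke its main method reflectively.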
However, this was a long time ago. You may be better using something like the Java Web Start type of approach, which offers more seamless application updating.
