How Use the Apache Tika 1.16 SourceCodeParser? - java

Note to community: Please do not close this as duplicate because the particular issue I am researching has manifested as a null pointer exception. As you can see from the stack trace, the NPE is buried 4 layers deep in the Tika library. That means of all the great advice that was given in the existing StackExchange post on NPE, none of the Tika developers saw fit to apply that advice (checking for null pointers) in four modules. Rather than learn Tika and retrofit their code with a patch to do that work, it think it would be more efficient to ask if anyone had achieved the common use case of using the SourcCodeParser.
I am looking for help with a published example for the Tika library here. I did not author the example code. I have seen many similar questions relating to the Tika library, which has 20 contributors and thousands of lines of code. Please do not close this question as I believe this can be quickly easily answered by anyone who used this Parser before. I have already read the post on NullPointerException, and am following this advice from that question:
I still can't find the problem
If you tried to debug the problem and still don't have a solution, you
can post a question for more help, but make sure to include what
you've tried so far. At a minimum, include the stacktrace in the
question, and mark the important line numbers in the code.
As I spent much time authoring this post, retrieving and including relevant stack trace and source code, I would really appreciate it if you would allow this to spend a little bit of time in an unclosed state so that someone who is familiar with Tika might take a look at what appears to be fairly common issue. As you would know as a Java expert, many null pointer exception issues can be non-trivial, particularly when working with a large unfamiliar framework. I really appreciate your help.
I wrote a simple program to test the Tika SourceCodeParser, by substituting it for the AutoDetectParser in the XHTML parsing example from the Tika Examples page. When executing the parse command on line 137, there is a NullPointerException. It appears that there may be a delegate missing from the in on line 180 of the Parser code.
The AutoDetectParser works but does not identify the source code as java.
When I use the Tika desktop app, it works fine and recognizes the code as Java.
How do I initialize the SourceCodeParser to avoid the NullPointerException when operating it?
Example using Tika "Example" Package
LocalFile.toTikaXhtmlString()
123 /** Parses as Tika using source code parser.
124 *
125 * #param filePathParam path to file to parse
126 */
127 public static String toTikaXhtmlString(final String filePathParam)
128 throws IOException, SAXException, TikaException
129 {
130 SourceCodeParser parser = new SourceCodeParser();
131 ContentHandler handler = new ToXMLContentHandler();
132 Metadata metadata = new Metadata();
133 File file = new File(filePathParam);
134 try (InputStream stream
135 = ContentHandlerExample.class
136 .getResourceAsStream(filePathParam)) {
137 parser.parse(stream, handler, metadata);
138 return handler.toString();
139 } catch (Exception e) {
140 System.out.println("Caught exception.");
141 System.out.println(e.toString());
142 e.printStackTrace();
143 throw e;
144 }
145
146 }
I also tried avoiding the Tika 'ContentHandlerExample' class using direct call with InputStreamReader, to the same result:
public static String toTikaXhtmlString(final String filePathParam)
throws IOException, SAXException, TikaException
{
SourceCodeParser parser = new SourceCodeParser();
ContentHandler handler = new ToXMLContentHandler();
Metadata metadata = new Metadata();
File file = new File(filePathParam);
try (InputStream stream = new FileInputStream(file)) {
parser.parse(stream, handler, metadata);
return handler.toString();
} catch (Exception e) {
throw new RuntimeException(e.getMessage());
}
}
JUNIT Test
108 #Test
109 public void parseFile() {
110 String fileName, verifyInput, resultContent;
111
112 //arrange
113 fileName = "/Users/johnmeyer/Projects/code-proc/FileParseTest-run.txt";
114
115 String fileContent = "/** Test */ public MyTestClass {"
116 + "public static void main(String[] args) {"
117 + "System.out.println(\"This is a test.\"); }";
118
119
120 LocalFile.putText(fileName, fileContent);
121
122 verifyInput = LocalFile.getContent(fileName);
123
124 assertEquals(fileContent, verifyInput);
125 //act (and clean up)
126
127 try {
128
129 resultContent = LocalFile.toTikaXhtmlString(fileName);
130 } catch (Exception e) {
131 throw new RuntimeException(e.getMessage());
132 }
133
134 LocalFile.delete(fileName);
135
136 //assert
137 assertEquals(fileContent, resultContent);
138 }
Stack Trace
[INFO] Running us.johnmeyer.test.tools.FileParseTest Caught exception.
java.lang.NullPointerException java.lang.NullPointerException at
org.apache.commons.io.input.ProxyInputStream.markSupported(ProxyInputStream.java:181)
at
org.apache.tika.detect.AutoDetectReader.getBuffered(AutoDetectReader.java:137)
at
org.apache.tika.detect.AutoDetectReader.(AutoDetectReader.java:114)
at
org.apache.tika.parser.code.SourceCodeParser.parse(SourceCodeParser.java:93)
at
org.apache.tika.parser.AbstractParser.parse(AbstractParser.java:53)
at
us.johnmeyer.utilities.LocalFile.toTikaXhtmlString(LocalFile.java:137)
at
us.johnmeyer.test.tools.FileParseTest.parseFile(FileParseTest.java:129)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at
org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at
org.junit.runners.ParentRunner.run(ParentRunner.java:309) at
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:369)
at
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:275)
at
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:239)
at
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:160)
at
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:373)
at
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:334)
at
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:119)
at
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:407)
Tika Source Code
17 package org.apache.tika.io;
18
19 import java.io.FilterInputStream;
20 import java.io.IOException;
21 import java.io.InputStream;
22
23 /**
24 * A Proxy stream which acts as expected, that is it passes the method
25 * calls on to the proxied stream and doesn't change which methods are
26 * being called.
27 * <p>
28 * It is an alternative base class to FilterInputStream
29 * to increase reusability, because FilterInputStream changes the
30 * methods being called, such as read(byte[]) to read(byte[], int, int).
31 * <p>
32 * See the protected methods for ways in which a subclass can easily decorate
33 * a stream with custom pre-, post- or error processing functionality.
34 *
35 * #author Stephen Colebourne
36 * #version $Id$
37 */
38 public abstract class ProxyInputStream extends FilterInputStream {
40 /**
41 * Constructs a new ProxyInputStream.
42 *
43 * #param proxy the InputStream to delegate to
44 */
45 public ProxyInputStream(InputStream proxy) {
46 super(proxy);
47 // the proxy is stored in a protected superclass variable named 'in'
48 }
...
174 /**
175 * Invokes the delegate's <code>markSupported()</code> method.
176 * #return true if mark is supported, otherwise false
177 */
178 #Override
179 public boolean markSupported() {
180 return in.markSupported();
181 }

Related

Apache Commons Imaging (EXIF): Setting tags/tags missing

I'm using the Apache Commons Imaging library (Java 8, you can find my code here) and I've come across a few problems with tags:
If I open the image info of e.g. this .jpg file with Win 10, there are "Origin" tags, e.g. "Authors" and "Date acquired":
You can find a list of EXIF tags here and it includes the "Authors" one and also an additional "Artist" one. The "Artist" tag seemed to have existed in the library at one point (source) but the apidocs don't list it anymore and ExifTagConstants.TIFF_TAG_ARTIST doesn't exist. Same thing with the "GPSAltitude" tag: It should exist according to the EXIF list but I can't seem to find it in the library.
I tried to use the "Maker" tag instead:
final TiffOutputDirectory exifDir = outputSet.getOrCreateExifDirectory();
exifDir.removeField(ExifTagConstants.EXIF_TAG_MAKER_NOTE);
exifDir.add(ExifTagConstants.EXIF_TAG_MAKER_NOTE, "Test Maker");
But ExifTagConstants.EXIF_TAG_MAKER_NOTE's type of TagInfoUndefineds doesn't seem to be valid for exifDir.add.
I also tried to add the date:
exifDir.removeField(ExifTagConstants.EXIF_TAG_DATE_TIME_DIGITIZED);
exifDir.add(ExifTagConstants.EXIF_TAG_DATE_TIME_DIGITIZED, "1970/01/01");
This just throws an exception:
org.apache.commons.imaging.ImageWriteException: Tag expects 20 byte(s), not 1
The only tag I've managed to successfully write so far is ExifTagConstants.EXIF_TAG_USER_COMMENT.
How do I use/write these tags (author/artist, date, altitude,...)?
Edit:
I managed to find two of the tags:
exifDir.add(MicrosoftTagConstants.EXIF_TAG_XPAUTHOR, "Test Author");//Author
exifDir.add(GpsTagConstants.GPS_TAG_GPS_ALTITUDE, new RationalNumber(3, 1));//Altitude
... but they aren't written into the file.
exifDir.add(ExifTagConstants.EXIF_TAG_USER_COMMENT, "my comment");
works, so I know it's actually writing tags, it just doesn't work for the two above. Any idea what's wrong?
As #haraldK already mentioned: GPS data isn't part of the actual EFIX directory, which is also the case in the "Apache Commons Imaging" library.
So instead of writing the altitude with
double someDouble = 123.123456789;
int alt = (int) Math.round(someDouble*1000); //round to 3 decimal places
final TiffOutputDirectory exifDir = outputSet.getOrCreateExifDirectory();
exifDir.removeField(GpsTagConstants.GPS_TAG_GPS_ALTITUDE);
exifDir.add(GpsTagConstants.GPS_TAG_GPS_ALTITUDE, new RationalNumber(alt, 1000));
use:
final TiffOutputDirectory gpsDir = outputSet.getOrCreateGPSDirectory();
gpsDir.removeField(GpsTagConstants.GPS_TAG_GPS_ALTITUDE);
gpsDir.add(GpsTagConstants.GPS_TAG_GPS_ALTITUDE, new RationalNumber(alt, 1000));
This will write "123.123" into the "Altitude" field that can be viewed through the Windows Explorer (right-click on image -> Properties -> Details).
As for the other tags:
final TiffOutputDirectory exifDir = outputSet.getOrCreateExifDirectory();
final TiffOutputDirectory gpsDir = outputSet.getOrCreateGPSDirectory();
final TiffOutputDirectory rootDir = outputSet.getOrCreateRootDirectory();
final TiffOutputDirectory intDir = outputSet.getInteroperabilityDirectory(); //Not sure what this one is used for
//Writing into the "Authors" field
rootDir.removeField(MicrosoftTagConstants.EXIF_TAG_XPAUTHOR);
rootDir.add(MicrosoftTagConstants.EXIF_TAG_XPAUTHOR, "Me");
//Writing into the "Program Name" field
rootDir.removeField(ExifTagConstants.EXIF_TAG_SOFTWARE);
rootDir.add(ExifTagConstants.EXIF_TAG_SOFTWARE, "My App");
//Writing into the "Date taken" field
exifDir.removeField(ExifTagConstants.EXIF_TAG_DATE_TIME_ORIGINAL);
exifDir.add(ExifTagConstants.EXIF_TAG_DATE_TIME_ORIGINAL, "1970:01:01 12:34:56");
//Writing into the "Digitized Date"
exifDir.removeField(ExifTagConstants.EXIF_TAG_DATE_TIME_DIGITIZED);
exifDir.add(ExifTagConstants.EXIF_TAG_DATE_TIME_DIGITIZED, "1970:01:01 12:34:56");
//Not visible in "Properties" (use image editing software to see it)
//Writing the GPS time stamp
gpsDir.removeField(GpsTagConstants.GPS_TAG_GPS_DATE_STAMP);
gpsDir.add(GpsTagConstants.GPS_TAG_GPS_DATE_STAMP, "1970:01:01");
//Apparently only writes year & not visible in "Properties" (use image editing software to see it)
I haven't found the right tag for "Date Acquired" yet, so if someone knows the exact one, please comment on this answer.
Old way to go :
You can looking at a specification of EXIF metadatas and you can implement some custom parsing to get them with How to extract EXIF metadatas from JPEG files.
Recent way to go :
You can read EXIF datas with that ImageData class which is working very well
Now if you want to modify/update the images metadatas, you can use the documentation in which you will find the following method :
/**
48 * This example illustrates how to add/update EXIF metadata in a JPEG file.
49 *
50 * #param jpegImageFile
51 * A source image file.
52 * #param dst
53 * The output file.
54 * #throws IOException
55 * #throws ImageReadException
56 * #throws ImageWriteException
57 */
58 public void changeExifMetadata(final File jpegImageFile, final File dst)
59 throws IOException, ImageReadException, ImageWriteException {
60
61 try (FileOutputStream fos = new FileOutputStream(dst);
62 OutputStream os = new BufferedOutputStream(fos);) {
63
64 TiffOutputSet outputSet = null;
65
66 // note that metadata might be null if no metadata is found.
67 final ImageMetadata metadata = Imaging.getMetadata(jpegImageFile);
68 final JpegImageMetadata jpegMetadata = (JpegImageMetadata) metadata;
69 if (null != jpegMetadata) {
70 // note that exif might be null if no Exif metadata is found.
71 final TiffImageMetadata exif = jpegMetadata.getExif();
72
73 if (null != exif) {
74 // TiffImageMetadata class is immutable (read-only).
75 // TiffOutputSet class represents the Exif data to write.
76 //
77 // Usually, we want to update existing Exif metadata by
78 // changing
79 // the values of a few fields, or adding a field.
80 // In these cases, it is easiest to use getOutputSet() to
81 // start with a "copy" of the fields read from the image.
82 outputSet = exif.getOutputSet();
83 }
84 }
85
86 // if file does not contain any exif metadata, we create an empty
87 // set of exif metadata. Otherwise, we keep all of the other
88 // existing tags.
89 if (null == outputSet) {
90 outputSet = new TiffOutputSet();
91 }
92
93 {
94 // Example of how to add a field/tag to the output set.
95 //
96 // Note that you should first remove the field/tag if it already
97 // exists in this directory, or you may end up with duplicate
98 // tags. See above.
99 //
100 // Certain fields/tags are expected in certain Exif directories;
101 // Others can occur in more than one directory (and often have a
102 // different meaning in different directories).
103 //
104 // TagInfo constants often contain a description of what
105 // directories are associated with a given tag.
106 //
107 final TiffOutputDirectory exifDirectory = outputSet.getOrCreateExifDirectory();
108 // make sure to remove old value if present (this method will
109 // not fail if the tag does not exist).
110 exifDirectory.removeField(ExifTagConstants.EXIF_TAG_APERTURE_VALUE);
111 exifDirectory.add(ExifTagConstants.EXIF_TAG_APERTURE_VALUE,
112 new RationalNumber(3, 10));
113 }
114
115 {
116 // Example of how to add/update GPS info to output set.
117
118 // New York City
119 final double longitude = -74.0; // 74 degrees W (in Degrees East)
120 final double latitude = 40 + 43 / 60.0; // 40 degrees N (in Degrees
121 // North)
122
123 outputSet.setGPSInDegrees(longitude, latitude);
124 }
125
126 // printTagValue(jpegMetadata, TiffConstants.TIFF_TAG_DATE_TIME);
127
128 new ExifRewriter().updateExifMetadataLossless(jpegImageFile, os,
129 outputSet);
130 }
131 }
I have just found a usefull example for your use case I think. You will know now how to find them, I am sure it will be easier to modify them :
#Override
public Exif getExif(Photo photo) throws ServiceException {
File file = new File(photo.getPath());
String exposure = "not available";
double aperture = 0.0;
double focalLength = 0.0;
int iso = 0;
boolean flash = false;
String make = "not available";
String model = "not available";
double altitude = 0.0;
try {
final ImageMetadata metadata = Imaging.getMetadata(file);
final JpegImageMetadata jpegMetadata = (JpegImageMetadata) metadata;
if (jpegMetadata.findEXIFValueWithExactMatch(ExifTagConstants.EXIF_TAG_EXPOSURE_TIME)
!= null) {
exposure = jpegMetadata
.findEXIFValueWithExactMatch(ExifTagConstants.EXIF_TAG_EXPOSURE_TIME)
.getValueDescription().split(" ")[0];
}
if (jpegMetadata.findEXIFValueWithExactMatch(ExifTagConstants.EXIF_TAG_APERTURE_VALUE)
!= null) {
aperture = jpegMetadata
.findEXIFValueWithExactMatch(ExifTagConstants.EXIF_TAG_APERTURE_VALUE)
.getDoubleValue();
}
if (jpegMetadata.findEXIFValueWithExactMatch(ExifTagConstants.EXIF_TAG_FOCAL_LENGTH)
!= null) {
focalLength = jpegMetadata
.findEXIFValueWithExactMatch(ExifTagConstants.EXIF_TAG_FOCAL_LENGTH)
.getDoubleValue();
}
if (jpegMetadata.findEXIFValueWithExactMatch(ExifTagConstants.EXIF_TAG_ISO) != null) {
iso = jpegMetadata.findEXIFValueWithExactMatch(ExifTagConstants.EXIF_TAG_ISO)
.getIntValue();
}
if (jpegMetadata.findEXIFValueWithExactMatch(ExifTagConstants.EXIF_TAG_FLASH) != null) {
flash = jpegMetadata.findEXIFValueWithExactMatch(ExifTagConstants.EXIF_TAG_FLASH)
.getIntValue() != 0;
}
if (jpegMetadata.findEXIFValueWithExactMatch(TiffTagConstants.TIFF_TAG_MAKE) != null) {
make = jpegMetadata.findEXIFValueWithExactMatch(TiffTagConstants.TIFF_TAG_MAKE)
.getValueDescription();
}
if (jpegMetadata.findEXIFValueWithExactMatch(TiffTagConstants.TIFF_TAG_MODEL) != null) {
model = jpegMetadata.findEXIFValueWithExactMatch(TiffTagConstants.TIFF_TAG_MODEL)
.getValueDescription();
}
if (jpegMetadata.findEXIFValueWithExactMatch(GpsTagConstants.GPS_TAG_GPS_ALTITUDE)
!= null) {
altitude = jpegMetadata
.findEXIFValueWithExactMatch(GpsTagConstants.GPS_TAG_GPS_ALTITUDE)
.getDoubleValue();
}
return new Exif(photo.getId(), exposure, aperture, focalLength, iso, flash, make, model,
altitude);
} catch (IOException | ImageReadException e) {
throw new ServiceException(e.getMessage(), e);
}
}
I found the Artist Tag:
exifDirectory.add(TiffTagConstants.TIFF_TAG_ARTIST, "JANE DOUGH");
Haven't tested whether this will work though.
EDIT:
It works well.
Online Exif Viewer Image
The image shows that the Artist Tag works.

H2OGeneralizedLinearEstimator() - Prediction Error

I am trying to predict test times in a Kaggle comp using the H2OGeneralizedLinearEstimator function. The model trains normally in line 3 and the metrics are all reasonable. However when I come to the predict step I get an error despite the test data frame matching the train data frame.
Has anyone seen this error before?
h2o_glm = H2OGeneralizedLinearEstimator()
h2o_glm.train(training_frame=train_h2o,y='y')
h2o_glm_predictions = h2o_glm.predict(test_data=test_h2o).as_data_frame()
test_pred = pd.read_csv('test.csv')[['ID']]
test_pred['y'] = h2o_glm_predictions
test_pred.to_csv('h2o_glm_predictions.csv',index=False)
glm Model Build progress: |███████████████████████████████████████████████| 100%
glm prediction progress: | (failed)
OSError Traceback (most recent call last) in () 3 h2o_glm.train(training_frame=train_h2o,y='y') 4 ----> 5 h2o_glm_predictions = h2o_glm.predict(test_data=test_h2o).as_data_frame() 6 7 test_pred = pd.read_csv('test.csv')[['ID']]
/Applications/anaconda/lib/python3.6/site-packages/h2o/model/model_base.py in predict(self, test_data) 130 j = H2OJob(h2o.api("POST /4/Predictions/models/%s/frames/%s" % (self.model_id, test_data.frame_id)), 131 self._model_json["algo"] + " prediction") --> 132 j.poll() 133 return h2o.get_frame(j.dest_key) 134
/Applications/anaconda/lib/python3.6/site-packages/h2o/job.py in poll(self) 71 if (isinstance(self.job, dict)) and ("stacktrace" in list(self.job)): 72 raise EnvironmentError("Job with key {} failed with an exception: {}\nstacktrace: " ---> 73 "\n{}".format(self.job_key, self.exception, self.job["stacktrace"])) 74 else: 75 raise EnvironmentError("Job with key %s failed with an exception: %s" % (self.job_key, self.exception))
OSError: Job with key
$03017f00000132d4ffffffff$_868312f4c32f683871930a1145c1476a failed
with an exception: DistributedException from /127.0.0.1:54321: 'null',
caused by java.lang.ArrayIndexOutOfBoundsException stacktrace:
DistributedException from /127.0.0.1:54321: 'null', caused by
java.lang.ArrayIndexOutOfBoundsException at
water.MRTask.getResult(MRTask.java:478) at
water.MRTask.getResult(MRTask.java:486) at
water.MRTask.doAll(MRTask.java:390) at
water.MRTask.doAll(MRTask.java:396) at
hex.glm.GLMModel.predictScoreImpl(GLMModel.java:1215) at
hex.Model.score(Model.java:1077) at
water.api.ModelMetricsHandler$1.compute2(ModelMetricsHandler.java:351)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1349) at
jsr166y.CountedCompleter.exec(CountedCompleter.java:468) at
jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263) at
jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974) at
jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477) at
jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104) Caused
by: java.lang.ArrayIndexOutOfBoundsException
To summarize the comments above, the current solution is to add a response column (with fake data if it doesn't exist) to the test_data frame. However, this is a bug that should be fixed. The JIRA is here.

convert jcas object to json object

I require the output in json format, for which I have to convert the jcas object in json.
I tried using the method given in uima guide but was not successful.
Can anyone suggest me a solution.
Using JsonCasSerializer you can do this
final String note = "Serum Cholesterol 154 150 250 mgs/dl\n-\nSerum Triglycerides 67 90 200 mgs /dl\n-\nSerum HDL: Cholesterol 38 35 55 mgs /dl\n-\nSerum LDL: Cholesterol 49 85 150 mgs/d1\n-\nSerum VLDL: Cholesterol 13 10 40 mgs/dl\n-\nTotal Cholesterol / HDL Ratio: 3.90\";
final JCas jcas = JCasFactory.createJCas();
jcas.setDocumentText(note);
final AnalysisEngineDescription aed = getFastPipeline();
SimplePipeline.runPipeline(jcas, aed);
CAS cas = jcas.getCas();
JsonCasSerializer jcs = new JsonCasSerializer();
jcs.setPrettyPrint(true); // do some configuration
StringWriter sw = new StringWriter();
jcs.serialize(cas, sw); // serialize into sw
System.out.println(sw.toString());
This gave me an output of the document in JSON format.

Apache Shiro HashedCredentialsMatcher generates wrong salt

I'm facing the problem that Shiro shows some odd behavior in converting a byte
array to a salt.
I started to implement all classes involved in the process into my application which are:
org.apache.shiro.realm.AuthenticatingRealm
org.apache.shiro.authc.credential.HashedCredentialsMatcher
Upon User creation the user password is hashed with a generated salt and then stored in my database:
import org.apache.shiro.crypto.hash.Sha256Hash;
import org.apache.shiro.crypto.RandomNumberGenerator;
import org.apache.shiro.crypto.SecureRandomNumberGenerator;
RandomNumberGenerator rng = new SecureRandomNumberGenerator();
Object salt = rng.nextBytes();
String hashedPasswordBase64 = new Sha256Hash(password, salt, 1024).toBase64();
shiro.ini looks like this:
# SALTED JDBC REALM
saltedJdbcRealm=com.mycompany.ssp.SaltedJdbcRealm
dataSource = org.postgresql.ds.PGSimpleDataSource
dataSource.databaseName = Self-Service-Portal
dataSource.serverName = localhost
dataSource.portNumber = 5432
dataSource.user = postgres
dataSource.password = admin
saltedJdbcRealm.dataSource = $dataSource
saltedJdbcRealm.authenticationQuery = SELECT umgmt_users.password, umgmt_users.salt FROM umgmt_users WHERE umgmt_users.user = ?
sha256Matcher = org.apache.shiro.authc.credential.Sha256CredentialsMatcher
# base64 encoding, not hex in this example:
sha256Matcher.storedCredentialsHexEncoded = false
sha256Matcher.hashIterations = 1024
saltedJdbcRealm.credentialsMatcher = $sha256Matcher
################################################################################
# SECURITY MANAGER #
securityManager.realms = $saltedJdbcRealm
strategy = org.apache.shiro.authc.pam.FirstSuccessfulStrategy
securityManager.authenticator.authenticationStrategy = $strategy
################################################################################
my custom saltedJdbcRealm just overrides the doGetAuthenticationInfo. This code is from this blog ->
#Override
protected AuthenticationInfo doGetAuthenticationInfo(AuthenticationToken token) throws AuthenticationException {
//identify account to log to
UsernamePasswordToken userPassToken = (UsernamePasswordToken) token;
String username = userPassToken.getUsername();
if (username == null) {
log.debug("Username is null.");
return null;
}
// read password hash and salt from db
PasswdSalt passwdSalt = getPasswordForUser(username);
if (passwdSalt == null) {
log.debug("No account found for user [" + username + "]");
return null;
}
// return salted credentials
SimpleAuthenticationInfo info = new SimpleAuthenticationInfo(username, passwdSalt.password, getName());
info.setCredentialsSalt(new SimpleByteSource(passwdSalt.salt));
return info;
}
Debugging after return info goes like this:
AuthenticatingRealm.java: Mehtod: assertCredentialsMatch()
HashedCredentialsMatcher.java: Method: doCredentialsMatch()
HashedCredentialsMatcher.java: Method: hashProvidedCredentials()
Looking for the error I ended up finding it here in
org.apache.shiro.authc.credential.HashedCredentialsMatcher.java:
protected Object hashProvidedCredentials(AuthenticationToken token, AuthenticationInfo info) {
Object salt = null;
if (info instanceof SaltedAuthenticationInfo) {
// STOP HERE AND SEE BELOW PART 1!!!
salt = ((SaltedAuthenticationInfo) info).getCredentialsSalt();
// STOP HERE AND SEE BELOW PART 2!!!
} else {
//retain 1.0 backwards compatibility:
if (isHashSalted()) {
salt = getSalt(token);
}
}
return hashProvidedCredentials(token.getCredentials(), salt, getHashIterations());
}
Part 1:
lets take a look at the variable info:
The full byte array is the following:
57 109 102 43 65 87 118 88 70 76 105 82 116 104 113 108 116 100 101 108 79 119 61 61
which correctly represents the salt in my database:
9mf+AWvXFLiRthqltdelOw==
Next Step in the code is to extract the Salt from the info variable and store it in the variable salt of type Object.
Part 2:
looking at the variable salt after this line:
salt = ((SaltedAuthenticationInfo) info).getCredentialsSalt();
executed I get this result:
OW1mK0FXdlhGTGlSdGhxbHRkZWxPdz09
Edit:
I did another example and show you the 2 methods that 1) hash the submitted password 2) take the password from database for comparison & that they are not
the same:
I start off with 2 variables, token (submitted password) & info (stored password information):
Stored Credentials:
credentials:
d5fHxI7kYQYtyqo6kwvZFDATIIsZThvFQeDVidpDDEQ
storedBytes before decoding:
100 53 102 72 120 73 55 107 89 81 89 116 121 113 111 54 107 119 118 90 70 68 65 84 73 73 115 90 84 104 118 70 81 101 68 86 105 100 112 68 68 69 81 61
storedBytes after decoding:
119 -105 -57 -60 -114 -28 97 6 45 -54 -86 58 -109 11 -39 20 48 19 32 -117 25 78 27 -59 65 -32 -43 -119 -38 67 12 68
hash:
7797c7c48ee461062dcaaa3a930bd9143013208b194e1bc541e0d589da430c44
Submitted Credentials:
char[] credentials:
[0] = 1
[1] = 2
[2] = 3
byte[] bytes:
50 69 81 77 57 55 80 53 53 112 89 52 122 69 78 54 57 98 53 56 82 65 61 61
which is 2EQM97P55pY4zEN69b58RA== and this is whats inside the database
cachedBase64:
MkVRTTk3UDU1cFk0ekVONjliNThSQT09
return value is this hash:
af9a7ef0ea9fa4d93eae1ca5d16c03c516f4822ec3e9017f14f694175848a6ab
As the 2 Hash values are not the same I get why my Application is telling me wrong password BUT I created this user with the password 123 using the code above (first code block)
Edit End
So does anyone know why the hash calculation is not giving the same hash value for the same password??? Or what else I might have done wrong (i doubt that the shiro code is wrong so it may be something wrong in my code with generation the password hash/salt or shiro.ini configuration?)
ufff, after a little more playing around with those functions I found the solution why the submitted password is hashed with a wrong salt value
I added 3 lines to the method hashProvidedCredentials inside
org.apache.shiro.authc.credential.HashedCredentialsMatcher.java
protected Object hashProvidedCredentials(AuthenticationToken token, AuthenticationInfo info) {
Object salt = null;
if (info instanceof SaltedAuthenticationInfo) {
salt = ((SaltedAuthenticationInfo) info).getCredentialsSalt();
// Get base64 Decoder
java.util.Base64.Decoder Decoder = java.util.Base64.getDecoder();
// decode salt from database
byte[] encodedJava8 = null;
encodedJava8 = Decoder.decode(((SaltedAuthenticationInfo) info).getCredentialsSalt().getBytes());
// save decoded salt value in previous salt Object
salt = ByteSource.Util.bytes(encodedJava8);
// The 3 steps above are nessecary because the Object salt is of type
// SimpleByteSource and:
// - it holds a byte[] which holds the salt in its correct form
// - it also holds a cachedBase64 encoded version of this byte[]
// (which is of course not the actual salt)
// The Problem is that the next method call below that hashes the
// submitted password uses the cachedBase64 value to hash the
// passwort and not the byte[] which represents the actual salt
// Therefor it is nessecary to:
// - create SimpleByteSource salt with the value from the database
// - decode the byte[] so that the cachedBase64 represents the actual salt
// - store the decoded version of the byte[] in the SimpleByteSource variable salt
} else {
//retain 1.0 backwards compatibility:
if (isHashSalted()) {
salt = getSalt(token);
}
}
return hashProvidedCredentials(token.getCredentials(), salt, getHashIterations());
}
Now the user submitted password on login is hashed the same way as it was when being generated this way:
RandomNumberGenerator rng = new SecureRandomNumberGenerator();
Object salt = rng.nextBytes();
//Now hash the plain-text password with the random salt and multiple
//iterations and then Base64-encode the value (requires less space than Hex):
String hashedPasswordBase64 = new Sha256Hash(password, salt, 1024).toBase64();
Note: This is not the final version of password hashing. Salt is going to be at least 256bit & iterations are going to be around 200k-300k.
Having the problem solved, I narrowed down the problem to 4 possible options:
1)
There is a major Error in the shiro code (HashedCredentialsMatcher.java) (at least from my point of view it is) because password varification using a salt will always fail this way (see my description inside the code block).
2)
I either uses the wrong CredentialsMatcher for hased & salted passwords and I have no clue which one to use instead.
3)
My Implementation of the doGetAuthenticationInfo Method in my custom Realm has an error. For my Custom Realm I used this tutorial:
Apache Shiro Part 2 - Realms, Database and PGP Certificates
4)
I made a mistake on creation of the password hash (although that code is from the official Apache Shiro Website Link
From my Point of view the options 1 & 4 are not the problem so its either 2 or 3 which cause this problem and make it nessecary to add some code to HashedCredentialsMatcher.java Method: hashProvidedCredentials()
So concluding, does anyone have any idea on this issue just to get clarification??

Struts2 ExecAndWait NullPointerException

I have used ExecuAndWait in Struts2
I am getting error NPE while setting attribute in request in action(which is long running)
Here is stack track:
java.lang.NullPointerException
at org.apache.catalina.connector.Request.notifyAttributeAssigned(Request.java:1563)
at org.apache.catalina.connector.Request.setAttribute(Request.java:1554)
at org.apache.catalina.connector.RequestFacade.setAttribute(RequestFacade.java:542)
at javax.servlet.ServletRequestWrapper.setAttribute(ServletRequestWrapper.java:239)
at com.os.gfnactions.SiteAction.createSite(SiteAction.java:1298)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at com.opensymphony.xwork2.DefaultActionInvocation.invokeAction(DefaultActionInvocation.java:450)
at com.opensymphony.xwork2.DefaultActionInvocation.invokeActionOnly(DefaultActionInvocation.java:289)
at com.os.interceptor.BackgroundProcess$1.run(BackgroundProcess.java:60)
at java.lang.Thread.run(Unknown Source)
Source snipate :
Action class:
public String createSite() throws Exception
{
----
HttpServletRequest request = ServletActionContext.getRequest();
request.setAttribute("test", "test"); {At this line I got error}
---
}
From ExecuteAndWaitInterceptor.java
231 if ((!executeAfterValidationPass || secondTime) && bp == null) {
232 bp = getNewBackgroundProcess(name, actionInvocation, threadPriority);
233 session.put(KEY + name, bp);
234 performInitialDelay(bp); // first time let some time pass before showing wait page
235 secondTime = false;
236 }
From BackgroundProcess.java
public More ...BackgroundProcess(String threadName, final ActionInvocation invocation, int threadPriority) {
50 this.invocation = invocation;
51 this.action = invocation.getAction();
52 try {
53 final Thread t = new Thread(new Runnable() {
54 public void More ...run() {
55 try {
56 beforeInvocation();
57 result = invocation.invokeActionOnly();
58 afterInvocation();
59 } catch (Exception e) {
60 exception = e;
61 }
62
63 done = true;
64 }
65 });
66 t.setName(threadName);
67 t.setPriority(threadPriority);
68 t.start();
69 } catch (Exception e) {
70 exception = e;
71 }
72 }
Concept of Struts2 ExecuteAndWait
Whenever there is a long-running request, it going to execute in separate thread and result return WAIT, So again client resubmit the same request in some interval to know the status of its process(which is running in the thread)
My Problem: In above case when main request(Which initiate thread to invoke action) return with WAIT and again other request come to know status of action at this time in my action class I have request.setAttribute("test", "test");, at this line it throws an error which I have mention above.
I had a similar problem with execAndWait interceptor throwing NPE. You can find my case study here: execAndWait interceptor not working with validation on how I solved this problem.. In this problem what I find out was, that execAndWait runs in separate thread and keeps throwing wait until action is completed and in mean time it cycles itself. I faced this problem, because I used model driven interceptor with it. Due to that the getModel() of model driven interceptor was repeatedly being called by execAndWait interceptor. In getModel method, it was setting the new POJO object again and again from scratch.
And then it was entering into validate method for validation. Inside validation process it found one of the POJO field to be null. Obviously, it happened due to re-creating of raw new POJO object in getModel. And thus was throwing Null Pointer Exception.
So what I did was I used SessionAware interface. And stored the POJO object for the first time when it entered validate. Because execAndWait will surely call all methods again and re-write an object from scratch. For this purpose I checked the availability of that object in getModel() method. If that found in session, then return that same object, rather of creating a new object.
I hope you'll find a way from this.

Categories