convert jcas object to json object - java

I need the output in JSON format, so I have to convert the JCas object to JSON.
I tried the method given in the UIMA guide but was not successful.
Can anyone suggest a solution?

Using JsonCasSerializer you can do this
final String note = "Serum Cholesterol 154 150 250 mgs/dl\n-\nSerum Triglycerides 67 90 200 mgs /dl\n-\nSerum HDL: Cholesterol 38 35 55 mgs /dl\n-\nSerum LDL: Cholesterol 49 85 150 mgs/d1\n-\nSerum VLDL: Cholesterol 13 10 40 mgs/dl\n-\nTotal Cholesterol / HDL Ratio: 3.90";
final JCas jcas = JCasFactory.createJCas();
jcas.setDocumentText(note);
final AnalysisEngineDescription aed = getFastPipeline();
SimplePipeline.runPipeline(jcas, aed);
CAS cas = jcas.getCas();
JsonCasSerializer jcs = new JsonCasSerializer();
jcs.setPrettyPrint(true); // do some configuration
StringWriter sw = new StringWriter();
jcs.serialize(cas, sw); // serialize into sw
System.out.println(sw.toString());
This gave me an output of the document in JSON format.


JAVA- Integer.parseInt( str ) gives NumberFormatException, input is a str representing an integer

UPDATE: chardetect considers the .srt file to be encoded as UTF-8 with a confidence score of 1.0. I have been reading in the file with Files.readAllLines set to interpret it as UTF-8. With the help of others here, I determined that the single-digit string in fact has a length of 2. The trick now is to figure out where that extra character is coming from. The following is an excerpt of the hex dump:
00000000: efbb bf31 0d0a 3030 3a30 303a 3034 2c35 ...1..00:00:04,5
00000010: 3031 202d 2d3e 2030 303a 3030 3a30 362c 01 --> 00:00:06,
00000020: 3439 300d 0ae3 8199 e381 9fe3 8198 e381 490.............
00000030: 8ae3 8198 e381 b6e3 828a e381 95e3 818f ................
00000040: e381 b2e3 8293 0d0a 2d20 4c75 7069 6e20 ........- Lupin
00000050: 3230 3033 2070 7265 7365 6e74 7320 2d0d 2003 presents -.
00000060: 0a0d 0a32 0d0a 3030 3a30 303a 3036 2c35 ...2..00:00:06,5
00000070: 3030 202d 2d3e 2030 303a 3030 3a30 382c 00 --> 00:00:08,
00000080: 3032 340d 0a41 2053 7475 6469 6f20 4768 024..A Studio Gh
00000090: 6962 6c69 2046 696c 6d0d 0a0d 0a33 0d0a ibli Film....3..
000000a0: 3030 3a30 303a 3133 2c30 3437 202d 2d3e 00:00:13,047 -->
000000b0: 2030 303a 3030 3a31 352c 3132 360d 0a3c 00:00:15,126..<
000000c0: 666f 6e74 2063 6f6c 6f72 3d22 2338 3838 font color="#888
000000d0: 3838 3822 3ee3 81a1 e381 b2e3 828d e380 888">...........
000000e0: 80e3 8192 e382 93e3 818d e381 a7e3 81ad ................
000000f0: e380 82e3 8080 e381 bee3 819f e380 80e3 ................
00000100: 8182 e381 8ae3 8186 e381 ade3 8080 e382 ................
Original question:
I'm reading in a .srt file using java.nio.file.Files.readAllLines().
In .srt files, every subtitle has a number line - an integer that indexes the subtitle.
As captured, this line is a string.
When I use Integer.parseInt( numberLineString ), I get
java.lang.NumberFormatException
I've troubleshot this as best I can, by:
homing in on a very specific subtitle, so I know this isn't being caused by some errant subtitle that might have an erroneous index with non-digit characters in it.
removing any potential \n or \r from the index number.
printing the variable passed to Integer.parseInt() to verify it is definitely the index string, representing an integer.
As you can see, the exception message even reports that it was thrown for the input string "1":
Exception in thread "main" java.lang.NumberFormatException: For input string: "1"
at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:68)
at java.base/java.lang.Integer.parseInt(Integer.java:658)
at java.base/java.lang.Integer.parseInt(Integer.java:776)
at japanese.engine.kana.JPSubsParser.parseTitles(JPSubsParser.java:218)
at japanese.engine.kana.srtManager.main(srtManager.java:25)
I'm at a loss - any help would be appreciated.
Here is an extract from the file being read:
1
00:00:04,501 --> 00:00:06,490
すたじおじぶりさくひん
- Lupin 2003 presents -
2
00:00:06,500 --> 00:00:08,024
A Studio Ghibli Film
3
00:00:13,047 --> 00:00:15,126
<font color="#888888">ちひろ げんきでね。 また あおうね りさ</font>
<font color="#8888FF">Good Luck, Chihiro. We'll meet again</font>
Here is the relevant Java code:
public int parseTitles( Map<String, ArrayList<String>> subsMap ) {
/* Setup, get iterator for the Map
* Instantiate a charMatch object for checking for the various characters */
Iterator<String> keyIter = subsMap.keySet().iterator();
JPcharMatch charMatch = new JPcharMatch();
// Loop through the map
int tempCounter = 0;
while (keyIter.hasNext() && tempCounter == 0) {
//get the current title
ArrayList<String> titleLines = subsMap.get(keyIter.next());
String trimmedLine0 = titleLines.get(0).replace("\n", "").replace("\r", "").replace("\"", "");
int indexLineInt = Integer.parseInt( trimmedLine0 );
public class srtManager {
public static void main( String[] args ) {
Path subsFilePath = Paths.get("/Volumes/Multimedia/Coding/experimenting_srts/Sen to Chihiro Kanji+Hir+Eng.srt");
SubsFileDAO subsFileDAO = new SubsFileDAO(subsFilePath);
List<String> fileLines = subsFileDAO.getLinesList();
JPSubsParser parser = new JPSubsParser( fileLines );
Map<String, ArrayList<String>> subsMap = parser.getSubsMap();
// TODO analyze to detect kanji lines and store.
parser.parseTitles(subsMap);
}
}
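You are trying to parse the BOM (byte order mark) at the beginning of the file as part of the number; the hex dump starts with ef bb bf, the UTF-8 encoding of the BOM, directly before the digit 1.
You can remove it by hand before parsing:
// Decode the BOM bytes as UTF-8 explicitly so the result does not depend on the platform default charset
String BOM = new String(new byte[] { (byte) 0xef, (byte) 0xbb, (byte) 0xbf }, StandardCharsets.UTF_8);
// replace() treats the BOM as a literal string; replaceAll() would interpret it as a regex
int indexLineInt = Integer.parseInt( trimmedLine0.replace(BOM, ""));
Alternatively, there are solutions such as BOMInputStream (from Apache Commons IO) that will remove the BOM transparently from the provided InputStream.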
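For a version that avoids string replacement altogether, the following is a minimal sketch using only the JDK. The class and method names (BomStripper, stripBom, parseIndex) are made up for illustration and are not part of the question's code. It relies on the fact that Java's UTF-8 decoder keeps the BOM as the single character U+FEFF, which is why the first line has a length of 2.

```java
import java.nio.charset.StandardCharsets;

public class BomStripper {
    // The UTF-8 bytes EF BB BF decode to this single character.
    private static final char BOM = '\uFEFF';

    /** Returns the line without a leading BOM character, if one is present. */
    static String stripBom(String line) {
        return (!line.isEmpty() && line.charAt(0) == BOM) ? line.substring(1) : line;
    }

    /** Parses a subtitle index line, tolerating a leading BOM and surrounding whitespace. */
    static int parseIndex(String line) {
        return Integer.parseInt(stripBom(line).trim());
    }

    public static void main(String[] args) {
        // Same first four bytes as in the hex dump: EF BB BF 31
        byte[] raw = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '1' };
        String firstLine = new String(raw, StandardCharsets.UTF_8);
        System.out.println(firstLine.length());    // 2: the BOM plus the digit
        System.out.println(parseIndex(firstLine)); // 1
    }
}
```

Only the first line of a file can carry a BOM, so it is enough to apply stripBom once, to the first element returned by Files.readAllLines.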

Apache Commons Imaging (EXIF): Setting tags/tags missing

I'm using the Apache Commons Imaging library (Java 8, you can find my code here) and I've come across a few problems with tags:
If I open the image info of e.g. this .jpg file with Win 10, there are "Origin" tags, e.g. "Authors" and "Date acquired":
You can find a list of EXIF tags here and it includes the "Authors" one and also an additional "Artist" one. The "Artist" tag seemed to have existed in the library at one point (source) but the apidocs don't list it anymore and ExifTagConstants.TIFF_TAG_ARTIST doesn't exist. Same thing with the "GPSAltitude" tag: It should exist according to the EXIF list but I can't seem to find it in the library.
I tried to use the "Maker" tag instead:
final TiffOutputDirectory exifDir = outputSet.getOrCreateExifDirectory();
exifDir.removeField(ExifTagConstants.EXIF_TAG_MAKER_NOTE);
exifDir.add(ExifTagConstants.EXIF_TAG_MAKER_NOTE, "Test Maker");
But ExifTagConstants.EXIF_TAG_MAKER_NOTE's type of TagInfoUndefineds doesn't seem to be valid for exifDir.add.
I also tried to add the date:
exifDir.removeField(ExifTagConstants.EXIF_TAG_DATE_TIME_DIGITIZED);
exifDir.add(ExifTagConstants.EXIF_TAG_DATE_TIME_DIGITIZED, "1970/01/01");
This just throws an exception:
org.apache.commons.imaging.ImageWriteException: Tag expects 20 byte(s), not 1
The only tag I've managed to successfully write so far is ExifTagConstants.EXIF_TAG_USER_COMMENT.
How do I use/write these tags (author/artist, date, altitude,...)?
Edit:
I managed to find two of the tags:
exifDir.add(MicrosoftTagConstants.EXIF_TAG_XPAUTHOR, "Test Author");//Author
exifDir.add(GpsTagConstants.GPS_TAG_GPS_ALTITUDE, new RationalNumber(3, 1));//Altitude
... but they aren't written into the file.
exifDir.add(ExifTagConstants.EXIF_TAG_USER_COMMENT, "my comment");
works, so I know it's actually writing tags, it just doesn't work for the two above. Any idea what's wrong?
As @haraldK already mentioned: GPS data isn't part of the actual EXIF directory, which is also the case in the "Apache Commons Imaging" library.
So instead of writing the altitude with
double someDouble = 123.123456789;
int alt = (int) Math.round(someDouble*1000); //round to 3 decimal places
final TiffOutputDirectory exifDir = outputSet.getOrCreateExifDirectory();
exifDir.removeField(GpsTagConstants.GPS_TAG_GPS_ALTITUDE);
exifDir.add(GpsTagConstants.GPS_TAG_GPS_ALTITUDE, new RationalNumber(alt, 1000));
use:
final TiffOutputDirectory gpsDir = outputSet.getOrCreateGPSDirectory();
gpsDir.removeField(GpsTagConstants.GPS_TAG_GPS_ALTITUDE);
gpsDir.add(GpsTagConstants.GPS_TAG_GPS_ALTITUDE, new RationalNumber(alt, 1000));
This will write "123.123" into the "Altitude" field that can be viewed through the Windows Explorer (right-click on image -> Properties -> Details).
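The rounding step can be checked without the imaging library. The sketch below uses only the JDK; the class and method names are hypothetical. It computes the numerator that is passed to RationalNumber together with the fixed divisor 1000, and the decimal value that the pair represents.

```java
public class AltitudeRational {
    // Three decimal places, matching the divisor used in the answer.
    static final int DIVISOR = 1000;

    /** Numerator for a rational with divisor 1000, rounded to the nearest thousandth. */
    static int numerator(double altitude) {
        return (int) Math.round(altitude * DIVISOR);
    }

    /** The value that numerator/DIVISOR stands for. */
    static double asDecimal(int numerator) {
        return (double) numerator / DIVISOR;
    }

    public static void main(String[] args) {
        int alt = numerator(123.123456789);
        System.out.println(alt + "/" + DIVISOR + " = " + asDecimal(alt)); // 123123/1000 = 123.123
    }
}
```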
As for the other tags:
final TiffOutputDirectory exifDir = outputSet.getOrCreateExifDirectory();
final TiffOutputDirectory gpsDir = outputSet.getOrCreateGPSDirectory();
final TiffOutputDirectory rootDir = outputSet.getOrCreateRootDirectory();
final TiffOutputDirectory intDir = outputSet.getInteroperabilityDirectory(); //Not sure what this one is used for
//Writing into the "Authors" field
rootDir.removeField(MicrosoftTagConstants.EXIF_TAG_XPAUTHOR);
rootDir.add(MicrosoftTagConstants.EXIF_TAG_XPAUTHOR, "Me");
//Writing into the "Program Name" field
rootDir.removeField(ExifTagConstants.EXIF_TAG_SOFTWARE);
rootDir.add(ExifTagConstants.EXIF_TAG_SOFTWARE, "My App");
//Writing into the "Date taken" field
exifDir.removeField(ExifTagConstants.EXIF_TAG_DATE_TIME_ORIGINAL);
exifDir.add(ExifTagConstants.EXIF_TAG_DATE_TIME_ORIGINAL, "1970:01:01 12:34:56");
//Writing into the "Digitized Date"
exifDir.removeField(ExifTagConstants.EXIF_TAG_DATE_TIME_DIGITIZED);
exifDir.add(ExifTagConstants.EXIF_TAG_DATE_TIME_DIGITIZED, "1970:01:01 12:34:56");
//Not visible in "Properties" (use image editing software to see it)
//Writing the GPS time stamp
gpsDir.removeField(GpsTagConstants.GPS_TAG_GPS_DATE_STAMP);
gpsDir.add(GpsTagConstants.GPS_TAG_GPS_DATE_STAMP, "1970:01:01");
//Apparently only writes year & not visible in "Properties" (use image editing software to see it)
I haven't found the right tag for "Date Acquired" yet, so if someone knows the exact one, please comment on this answer.
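The date strings above follow the EXIF pattern "YYYY:MM:DD HH:MM:SS", which is 19 characters long. That is consistent with the 20 bytes the exception in the question asks for, assuming one extra byte for the terminating NUL of the ASCII field. The following JDK-only sketch (the class and method names are hypothetical) produces such a string from a LocalDateTime instead of typing it by hand:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class ExifDates {
    // Colons in the date part, a space, then colons in the time part.
    private static final DateTimeFormatter EXIF_DATE_TIME =
            DateTimeFormatter.ofPattern("uuuu:MM:dd HH:mm:ss");

    /** Formats a timestamp the way the date/time tags in the answer expect it. */
    static String toExifDateTime(LocalDateTime timestamp) {
        return EXIF_DATE_TIME.format(timestamp);
    }

    public static void main(String[] args) {
        System.out.println(toExifDateTime(LocalDateTime.of(1970, 1, 1, 12, 34, 56)));
        // 1970:01:01 12:34:56
    }
}
```

The result can be passed as the String argument of exifDir.add(...) for EXIF_TAG_DATE_TIME_ORIGINAL or EXIF_TAG_DATE_TIME_DIGITIZED.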
Old way to go:
You can look at the EXIF specification and implement some custom parsing to read the metadata, as described in How to extract EXIF metadatas from JPEG files.
Recent way to go:
You can read EXIF data with the ImageData class, which works very well.
If you want to modify or update an image's metadata, the documentation contains the following method:
/**
 * This example illustrates how to add/update EXIF metadata in a JPEG file.
 *
 * @param jpegImageFile
 *            A source image file.
 * @param dst
 *            The output file.
 * @throws IOException
 * @throws ImageReadException
 * @throws ImageWriteException
 */
public void changeExifMetadata(final File jpegImageFile, final File dst)
        throws IOException, ImageReadException, ImageWriteException {

    try (FileOutputStream fos = new FileOutputStream(dst);
            OutputStream os = new BufferedOutputStream(fos);) {

        TiffOutputSet outputSet = null;

        // note that metadata might be null if no metadata is found.
        final ImageMetadata metadata = Imaging.getMetadata(jpegImageFile);
        final JpegImageMetadata jpegMetadata = (JpegImageMetadata) metadata;
        if (null != jpegMetadata) {
            // note that exif might be null if no Exif metadata is found.
            final TiffImageMetadata exif = jpegMetadata.getExif();

            if (null != exif) {
                // TiffImageMetadata class is immutable (read-only).
                // TiffOutputSet class represents the Exif data to write.
                //
                // Usually, we want to update existing Exif metadata by
                // changing
                // the values of a few fields, or adding a field.
                // In these cases, it is easiest to use getOutputSet() to
                // start with a "copy" of the fields read from the image.
                outputSet = exif.getOutputSet();
            }
        }

        // if file does not contain any exif metadata, we create an empty
        // set of exif metadata. Otherwise, we keep all of the other
        // existing tags.
        if (null == outputSet) {
            outputSet = new TiffOutputSet();
        }

        {
            // Example of how to add a field/tag to the output set.
            //
            // Note that you should first remove the field/tag if it already
            // exists in this directory, or you may end up with duplicate
            // tags. See above.
            //
            // Certain fields/tags are expected in certain Exif directories;
            // Others can occur in more than one directory (and often have a
            // different meaning in different directories).
            //
            // TagInfo constants often contain a description of what
            // directories are associated with a given tag.
            //
            final TiffOutputDirectory exifDirectory = outputSet.getOrCreateExifDirectory();
            // make sure to remove old value if present (this method will
            // not fail if the tag does not exist).
            exifDirectory.removeField(ExifTagConstants.EXIF_TAG_APERTURE_VALUE);
            exifDirectory.add(ExifTagConstants.EXIF_TAG_APERTURE_VALUE,
                    new RationalNumber(3, 10));
        }

        {
            // Example of how to add/update GPS info to output set.

            // New York City
            final double longitude = -74.0; // 74 degrees W (in Degrees East)
            final double latitude = 40 + 43 / 60.0; // 40 degrees N (in Degrees
            // North)

            outputSet.setGPSInDegrees(longitude, latitude);
        }

        // printTagValue(jpegMetadata, TiffConstants.TIFF_TAG_DATE_TIME);

        new ExifRewriter().updateExifMetadataLossless(jpegImageFile, os,
                outputSet);
    }
}
I have just found a useful example that I think fits your use case. Once you know how to find the tags, I am sure it will be easier to modify them:
@Override
public Exif getExif(Photo photo) throws ServiceException {
File file = new File(photo.getPath());
String exposure = "not available";
double aperture = 0.0;
double focalLength = 0.0;
int iso = 0;
boolean flash = false;
String make = "not available";
String model = "not available";
double altitude = 0.0;
try {
final ImageMetadata metadata = Imaging.getMetadata(file);
final JpegImageMetadata jpegMetadata = (JpegImageMetadata) metadata;
if (jpegMetadata.findEXIFValueWithExactMatch(ExifTagConstants.EXIF_TAG_EXPOSURE_TIME)
!= null) {
exposure = jpegMetadata
.findEXIFValueWithExactMatch(ExifTagConstants.EXIF_TAG_EXPOSURE_TIME)
.getValueDescription().split(" ")[0];
}
if (jpegMetadata.findEXIFValueWithExactMatch(ExifTagConstants.EXIF_TAG_APERTURE_VALUE)
!= null) {
aperture = jpegMetadata
.findEXIFValueWithExactMatch(ExifTagConstants.EXIF_TAG_APERTURE_VALUE)
.getDoubleValue();
}
if (jpegMetadata.findEXIFValueWithExactMatch(ExifTagConstants.EXIF_TAG_FOCAL_LENGTH)
!= null) {
focalLength = jpegMetadata
.findEXIFValueWithExactMatch(ExifTagConstants.EXIF_TAG_FOCAL_LENGTH)
.getDoubleValue();
}
if (jpegMetadata.findEXIFValueWithExactMatch(ExifTagConstants.EXIF_TAG_ISO) != null) {
iso = jpegMetadata.findEXIFValueWithExactMatch(ExifTagConstants.EXIF_TAG_ISO)
.getIntValue();
}
if (jpegMetadata.findEXIFValueWithExactMatch(ExifTagConstants.EXIF_TAG_FLASH) != null) {
flash = jpegMetadata.findEXIFValueWithExactMatch(ExifTagConstants.EXIF_TAG_FLASH)
.getIntValue() != 0;
}
if (jpegMetadata.findEXIFValueWithExactMatch(TiffTagConstants.TIFF_TAG_MAKE) != null) {
make = jpegMetadata.findEXIFValueWithExactMatch(TiffTagConstants.TIFF_TAG_MAKE)
.getValueDescription();
}
if (jpegMetadata.findEXIFValueWithExactMatch(TiffTagConstants.TIFF_TAG_MODEL) != null) {
model = jpegMetadata.findEXIFValueWithExactMatch(TiffTagConstants.TIFF_TAG_MODEL)
.getValueDescription();
}
if (jpegMetadata.findEXIFValueWithExactMatch(GpsTagConstants.GPS_TAG_GPS_ALTITUDE)
!= null) {
altitude = jpegMetadata
.findEXIFValueWithExactMatch(GpsTagConstants.GPS_TAG_GPS_ALTITUDE)
.getDoubleValue();
}
return new Exif(photo.getId(), exposure, aperture, focalLength, iso, flash, make, model,
altitude);
} catch (IOException | ImageReadException e) {
throw new ServiceException(e.getMessage(), e);
}
}
I found the Artist Tag:
exifDirectory.add(TiffTagConstants.TIFF_TAG_ARTIST, "JANE DOUGH");
Haven't tested whether this will work though.
EDIT:
It works well.
Online Exif Viewer Image
The image shows that the Artist Tag works.

How to use the Apache Tika 1.16 SourceCodeParser?

Note to community: Please do not close this as a duplicate because the particular issue I am researching has manifested as a null pointer exception. As you can see from the stack trace, the NPE is buried 4 layers deep in the Tika library. That means that, of all the great advice given in the existing StackExchange post on NPE, none of the Tika developers saw fit to apply that advice (checking for null pointers) in four modules. Rather than learn Tika and retrofit their code with a patch to do that work, I think it would be more efficient to ask if anyone has achieved the common use case of using the SourceCodeParser.
I am looking for help with a published example for the Tika library here. I did not author the example code. I have seen many similar questions relating to the Tika library, which has 20 contributors and thousands of lines of code. Please do not close this question, as I believe it can be answered quickly and easily by anyone who has used this parser before. I have already read the post on NullPointerException, and am following this advice from that question:
I still can't find the problem
If you tried to debug the problem and still don't have a solution, you
can post a question for more help, but make sure to include what
you've tried so far. At a minimum, include the stacktrace in the
question, and mark the important line numbers in the code.
As I spent much time authoring this post, retrieving and including relevant stack trace and source code, I would really appreciate it if you would allow this to spend a little bit of time in an unclosed state so that someone who is familiar with Tika might take a look at what appears to be fairly common issue. As you would know as a Java expert, many null pointer exception issues can be non-trivial, particularly when working with a large unfamiliar framework. I really appreciate your help.
I wrote a simple program to test the Tika SourceCodeParser, by substituting it for the AutoDetectParser in the XHTML parsing example from the Tika Examples page. When executing the parse command on line 137, there is a NullPointerException. It appears that the delegate stream stored in the field in may be null on line 180 of the ProxyInputStream code.
The AutoDetectParser works but does not identify the source code as java.
When I use the Tika desktop app, it works fine and recognizes the code as Java.
How do I initialize the SourceCodeParser to avoid the NullPointerException when operating it?
Example using Tika "Example" Package
LocalFile.toTikaXhtmlString()
123 /** Parses as Tika using source code parser.
124 *
125 * @param filePathParam path to file to parse
126 */
127 public static String toTikaXhtmlString(final String filePathParam)
128 throws IOException, SAXException, TikaException
129 {
130 SourceCodeParser parser = new SourceCodeParser();
131 ContentHandler handler = new ToXMLContentHandler();
132 Metadata metadata = new Metadata();
133 File file = new File(filePathParam);
134 try (InputStream stream
135 = ContentHandlerExample.class
136 .getResourceAsStream(filePathParam)) {
137 parser.parse(stream, handler, metadata);
138 return handler.toString();
139 } catch (Exception e) {
140 System.out.println("Caught exception.");
141 System.out.println(e.toString());
142 e.printStackTrace();
143 throw e;
144 }
145
146 }
I also tried avoiding the Tika 'ContentHandlerExample' class by opening the file directly with a FileInputStream, with the same result:
public static String toTikaXhtmlString(final String filePathParam)
throws IOException, SAXException, TikaException
{
SourceCodeParser parser = new SourceCodeParser();
ContentHandler handler = new ToXMLContentHandler();
Metadata metadata = new Metadata();
File file = new File(filePathParam);
try (InputStream stream = new FileInputStream(file)) {
parser.parse(stream, handler, metadata);
return handler.toString();
} catch (Exception e) {
throw new RuntimeException(e.getMessage());
}
}
JUNIT Test
108 @Test
109 public void parseFile() {
110 String fileName, verifyInput, resultContent;
111
112 //arrange
113 fileName = "/Users/johnmeyer/Projects/code-proc/FileParseTest-run.txt";
114
115 String fileContent = "/** Test */ public MyTestClass {"
116 + "public static void main(String[] args) {"
117 + "System.out.println(\"This is a test.\"); }";
118
119
120 LocalFile.putText(fileName, fileContent);
121
122 verifyInput = LocalFile.getContent(fileName);
123
124 assertEquals(fileContent, verifyInput);
125 //act (and clean up)
126
127 try {
128
129 resultContent = LocalFile.toTikaXhtmlString(fileName);
130 } catch (Exception e) {
131 throw new RuntimeException(e.getMessage());
132 }
133
134 LocalFile.delete(fileName);
135
136 //assert
137 assertEquals(fileContent, resultContent);
138 }
Stack Trace
[INFO] Running us.johnmeyer.test.tools.FileParseTest
Caught exception.
java.lang.NullPointerException
java.lang.NullPointerException
    at org.apache.commons.io.input.ProxyInputStream.markSupported(ProxyInputStream.java:181)
    at org.apache.tika.detect.AutoDetectReader.getBuffered(AutoDetectReader.java:137)
    at org.apache.tika.detect.AutoDetectReader.<init>(AutoDetectReader.java:114)
    at org.apache.tika.parser.code.SourceCodeParser.parse(SourceCodeParser.java:93)
    at org.apache.tika.parser.AbstractParser.parse(AbstractParser.java:53)
    at us.johnmeyer.utilities.LocalFile.toTikaXhtmlString(LocalFile.java:137)
    at us.johnmeyer.test.tools.FileParseTest.parseFile(FileParseTest.java:129)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:369)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:275)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:239)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:160)
    at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:373)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:334)
    at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:119)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:407)
Tika Source Code
17 package org.apache.tika.io;
18
19 import java.io.FilterInputStream;
20 import java.io.IOException;
21 import java.io.InputStream;
22
23 /**
24 * A Proxy stream which acts as expected, that is it passes the method
25 * calls on to the proxied stream and doesn't change which methods are
26 * being called.
27 * <p>
28 * It is an alternative base class to FilterInputStream
29 * to increase reusability, because FilterInputStream changes the
30 * methods being called, such as read(byte[]) to read(byte[], int, int).
31 * <p>
32 * See the protected methods for ways in which a subclass can easily decorate
33 * a stream with custom pre-, post- or error processing functionality.
34 *
35 * @author Stephen Colebourne
36 * @version $Id$
37 */
38 public abstract class ProxyInputStream extends FilterInputStream {
40 /**
41 * Constructs a new ProxyInputStream.
42 *
43 * @param proxy the InputStream to delegate to
44 */
45 public ProxyInputStream(InputStream proxy) {
46 super(proxy);
47 // the proxy is stored in a protected superclass variable named 'in'
48 }
...
174 /**
175 * Invokes the delegate's <code>markSupported()</code> method.
176 * @return true if mark is supported, otherwise false
177 */
178 @Override
179 public boolean markSupported() {
180 return in.markSupported();
181 }
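One possibility worth ruling out, since the trace points at line 137 of the first listing: Class.getResourceAsStream returns null when the given name is not found on the classpath, and an absolute file-system path such as the one in the test is normally not a classpath resource. A null stream wrapped in a proxy stream would fail exactly where markSupported() dereferences in. This is an assumption about the cause, not a confirmed diagnosis. The JDK-only sketch below (class and method names are hypothetical) fails fast with a descriptive message instead of passing null on to the parser:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class StreamOpener {

    /** Opens a classpath resource; throws instead of returning null when it is missing. */
    static InputStream openResource(Class<?> anchor, String name) {
        InputStream stream = anchor.getResourceAsStream(name);
        if (stream == null) {
            throw new IllegalArgumentException("Not found on classpath: " + name);
        }
        return stream;
    }

    /** Opens a file-system path; throws NoSuchFileException when the file is missing. */
    static InputStream openFile(String path) throws IOException {
        return Files.newInputStream(Paths.get(path));
    }
}
```

With helpers like these, a missing input shows up as an explicit error at the call site rather than as an NPE several frames inside the library.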

Apache Shiro HashedCredentialsMatcher generates wrong salt

I'm facing the problem that Shiro shows some odd behavior in converting a byte
array to a salt.
I started to implement all classes involved in the process into my application which are:
org.apache.shiro.realm.AuthenticatingRealm
org.apache.shiro.authc.credential.HashedCredentialsMatcher
Upon User creation the user password is hashed with a generated salt and then stored in my database:
import org.apache.shiro.crypto.hash.Sha256Hash;
import org.apache.shiro.crypto.RandomNumberGenerator;
import org.apache.shiro.crypto.SecureRandomNumberGenerator;
RandomNumberGenerator rng = new SecureRandomNumberGenerator();
Object salt = rng.nextBytes();
String hashedPasswordBase64 = new Sha256Hash(password, salt, 1024).toBase64();
shiro.ini looks like this:
# SALTED JDBC REALM
saltedJdbcRealm=com.mycompany.ssp.SaltedJdbcRealm
dataSource = org.postgresql.ds.PGSimpleDataSource
dataSource.databaseName = Self-Service-Portal
dataSource.serverName = localhost
dataSource.portNumber = 5432
dataSource.user = postgres
dataSource.password = admin
saltedJdbcRealm.dataSource = $dataSource
saltedJdbcRealm.authenticationQuery = SELECT umgmt_users.password, umgmt_users.salt FROM umgmt_users WHERE umgmt_users.user = ?
sha256Matcher = org.apache.shiro.authc.credential.Sha256CredentialsMatcher
# base64 encoding, not hex in this example:
sha256Matcher.storedCredentialsHexEncoded = false
sha256Matcher.hashIterations = 1024
saltedJdbcRealm.credentialsMatcher = $sha256Matcher
################################################################################
# SECURITY MANAGER #
securityManager.realms = $saltedJdbcRealm
strategy = org.apache.shiro.authc.pam.FirstSuccessfulStrategy
securityManager.authenticator.authenticationStrategy = $strategy
################################################################################
My custom SaltedJdbcRealm just overrides doGetAuthenticationInfo. This code is from this blog:
@Override
protected AuthenticationInfo doGetAuthenticationInfo(AuthenticationToken token) throws AuthenticationException {
//identify account to log to
UsernamePasswordToken userPassToken = (UsernamePasswordToken) token;
String username = userPassToken.getUsername();
if (username == null) {
log.debug("Username is null.");
return null;
}
// read password hash and salt from db
PasswdSalt passwdSalt = getPasswordForUser(username);
if (passwdSalt == null) {
log.debug("No account found for user [" + username + "]");
return null;
}
// return salted credentials
SimpleAuthenticationInfo info = new SimpleAuthenticationInfo(username, passwdSalt.password, getName());
info.setCredentialsSalt(new SimpleByteSource(passwdSalt.salt));
return info;
}
Debugging after return info goes like this:
AuthenticatingRealm.java: Method: assertCredentialsMatch()
HashedCredentialsMatcher.java: Method: doCredentialsMatch()
HashedCredentialsMatcher.java: Method: hashProvidedCredentials()
Looking for the error I ended up finding it here in
org.apache.shiro.authc.credential.HashedCredentialsMatcher.java:
protected Object hashProvidedCredentials(AuthenticationToken token, AuthenticationInfo info) {
Object salt = null;
if (info instanceof SaltedAuthenticationInfo) {
// STOP HERE AND SEE BELOW PART 1!!!
salt = ((SaltedAuthenticationInfo) info).getCredentialsSalt();
// STOP HERE AND SEE BELOW PART 2!!!
} else {
//retain 1.0 backwards compatibility:
if (isHashSalted()) {
salt = getSalt(token);
}
}
return hashProvidedCredentials(token.getCredentials(), salt, getHashIterations());
}
Part 1:
Let's take a look at the variable info:
The full byte array is the following:
57 109 102 43 65 87 118 88 70 76 105 82 116 104 113 108 116 100 101 108 79 119 61 61
which correctly represents the salt in my database:
9mf+AWvXFLiRthqltdelOw==
The next step in the code is to extract the salt from the info variable and store it in the variable salt of type Object.
Part 2:
Looking at the variable salt after this line has executed:
salt = ((SaltedAuthenticationInfo) info).getCredentialsSalt();
I get this result:
OW1mK0FXdlhGTGlSdGhxbHRkZWxPdz09
Edit:
I did another example to show the 2 methods that 1) hash the submitted password and 2) take the password from the database for comparison, and that their results are not the same:
I start off with 2 variables, token (submitted password) & info (stored password information):
Stored Credentials:
credentials:
d5fHxI7kYQYtyqo6kwvZFDATIIsZThvFQeDVidpDDEQ
storedBytes before decoding:
100 53 102 72 120 73 55 107 89 81 89 116 121 113 111 54 107 119 118 90 70 68 65 84 73 73 115 90 84 104 118 70 81 101 68 86 105 100 112 68 68 69 81 61
storedBytes after decoding:
119 -105 -57 -60 -114 -28 97 6 45 -54 -86 58 -109 11 -39 20 48 19 32 -117 25 78 27 -59 65 -32 -43 -119 -38 67 12 68
hash:
7797c7c48ee461062dcaaa3a930bd9143013208b194e1bc541e0d589da430c44
Submitted Credentials:
char[] credentials:
[0] = 1
[1] = 2
[2] = 3
byte[] bytes:
50 69 81 77 57 55 80 53 53 112 89 52 122 69 78 54 57 98 53 56 82 65 61 61
which is 2EQM97P55pY4zEN69b58RA==, and this is what is inside the database
cachedBase64:
MkVRTTk3UDU1cFk0ekVONjliNThSQT09
return value is this hash:
af9a7ef0ea9fa4d93eae1ca5d16c03c516f4822ec3e9017f14f694175848a6ab
As the 2 hash values are not the same, I understand why my application is telling me the password is wrong, BUT I created this user with the password 123 using the code above (first code block).
Edit End
So does anyone know why the hash calculation is not giving the same hash value for the same password? Or what else might I have done wrong? (I doubt that the Shiro code is wrong, so it may be something in my code that generates the password hash/salt, or in the shiro.ini configuration.)
After a little more playing around with those functions, I found out why the submitted password is hashed with a wrong salt value.
I added 3 lines to the method hashProvidedCredentials inside
org.apache.shiro.authc.credential.HashedCredentialsMatcher.java
protected Object hashProvidedCredentials(AuthenticationToken token, AuthenticationInfo info) {
Object salt = null;
if (info instanceof SaltedAuthenticationInfo) {
salt = ((SaltedAuthenticationInfo) info).getCredentialsSalt();
// Get a Base64 decoder
java.util.Base64.Decoder decoder = java.util.Base64.getDecoder();
// Decode the salt from the database
byte[] decodedSalt = decoder.decode(((SaltedAuthenticationInfo) info).getCredentialsSalt().getBytes());
// Save the decoded salt value in the salt Object
salt = ByteSource.Util.bytes(decodedSalt);
// The 3 steps above are necessary because the Object salt is of type
// SimpleByteSource and:
// - it holds a byte[] which holds the salt in its correct form
// - it also holds a cachedBase64 encoded version of this byte[]
// (which is of course not the actual salt)
// The problem is that the next method call below, which hashes the
// submitted password, uses the cachedBase64 value to hash the
// password and not the byte[] which represents the actual salt.
// Therefore it is necessary to:
// - create the SimpleByteSource salt with the value from the database
// - decode the byte[] so that the cachedBase64 represents the actual salt
// - store the decoded version of the byte[] in the SimpleByteSource variable salt
} else {
//retain 1.0 backwards compatibility:
if (isHashSalted()) {
salt = getSalt(token);
}
}
return hashProvidedCredentials(token.getCredentials(), salt, getHashIterations());
}
Now the user submitted password on login is hashed the same way as it was when being generated this way:
RandomNumberGenerator rng = new SecureRandomNumberGenerator();
Object salt = rng.nextBytes();
//Now hash the plain-text password with the random salt and multiple
//iterations and then Base64-encode the value (requires less space than Hex):
String hashedPasswordBase64 = new Sha256Hash(password, salt, 1024).toBase64();
Note: This is not the final version of the password hashing. The salt is going to be at least 256 bits and the iterations are going to be around 200k-300k.
Having solved the problem, I narrowed the cause down to 4 possible options:
1)
There is a major error in the Shiro code (HashedCredentialsMatcher.java), at least from my point of view, because password verification using a salt will always fail this way (see my description inside the code block).
2)
I am using the wrong CredentialsMatcher for hashed and salted passwords, and I have no clue which one to use instead.
3)
My implementation of the doGetAuthenticationInfo method in my custom realm has an error. For my custom realm I used this tutorial:
Apache Shiro Part 2 - Realms, Database and PGP Certificates
4)
I made a mistake when creating the password hash (although that code is from the official Apache Shiro website).
From my point of view, options 1 and 4 are not the problem, so it is either 2 or 3 that causes this and makes it necessary to add some code to the hashProvidedCredentials() method in HashedCredentialsMatcher.java.
So, in conclusion, does anyone have any idea about this issue, just to get clarification?
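The values quoted in Part 1 and Part 2 can be reproduced with the JDK alone, which may help decide between the options. The byte array 57 109 102 43 ... is the ASCII text of the stored string 9mf+AWvXFLiRthqltdelOw==, not the 16 raw salt bytes that the text encodes, and Base64-encoding those text bytes again gives the value seen in cachedBase64. That would point to option 3: the realm wraps the stored Base64 string without decoding it first. The sketch below is an illustration under that assumption; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class SaltBytesDemo {

    /** Bytes of the Base64 text itself, as obtained when the stored string is wrapped undecoded. */
    static byte[] textBytes(String storedSalt) {
        return storedSalt.getBytes(StandardCharsets.US_ASCII);
    }

    /** The raw salt bytes that the stored Base64 text stands for. */
    static byte[] decodedBytes(String storedSalt) {
        return Base64.getDecoder().decode(storedSalt);
    }

    static String base64(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) {
        String stored = "9mf+AWvXFLiRthqltdelOw==";
        // Encoding the text bytes a second time reproduces the cachedBase64 value from Part 2.
        System.out.println(base64(textBytes(stored)));    // OW1mK0FXdlhGTGlSdGhxbHRkZWxPdz09
        // Decoding first yields the 16 raw bytes; encoding them round-trips to the stored text.
        System.out.println(decodedBytes(stored).length);  // 16
        System.out.println(base64(decodedBytes(stored))); // 9mf+AWvXFLiRthqltdelOw==
    }
}
```

If that is what happens here, decoding the salt once in doGetAuthenticationInfo before wrapping it, for example with ByteSource.Util.bytes(Base64.decode(passwdSalt.salt)) using Shiro's own org.apache.shiro.codec.Base64, should make the patch to HashedCredentialsMatcher unnecessary. This assumes passwdSalt.salt holds the Base64 string from the database.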

Get values from an emitted tuple with a nimbus client

I have a running topology... on a remote cluster ... runs like a charm...
StormTopology runningTopology= new StormTopology();
nimbus.getClient().send_getTopology("MyNewTopo-14-1451353801");
runningTopology = nimbus.getClient().recv_getTopology();
Bolt readerBolt = new Bolt();
readerBolt = runningTopology.get_bolts().get("determineClickType");
System.out.println("Bolt com : " + readerBolt.get_bolt_object().toString());
/* Console dump: */
/* Returns: <ComponentObject serialized_java:80 01 80 01 00 02 00 00 00 ....... */
/* More Console dump:
Stream : StreamInfo(output_fields:[id, ack-val], direct:false)
Stream : StreamInfo(output_fields:[clickedType, count], direct:false)
Stream : StreamInfo(output_fields:[id], direct:false)
Stream : StreamInfo(output_fields:[task-info, data-points], direct:false)
Stream : StreamInfo(output_fields:[event], direct:false)
*/
I don't know how to proceed, to read the actual information which is emitted from a BaseBasicBolt defined as "determineClickType"
What is the correct approach to get values from an emitted tuple by a bolt with a nimbus client?
If I understand your question correctly, you want to get the tuples that are emitted by a bolt. To do this, you need to define a second bolt and connect it to the first bolt:
TopologyBuilder b = new TopologyBuilder();
b.addSpout(...);
b.addBolt("determineClickType", new MyBolt());
b.addBolt("receiver", new MyReceiverBolt()).shuffleGrouping("determineClickType");
The bolt with name "receiver" will now get the output tuples from the bolt called "determineClickType".
