Which type is BLOB? - java

I have about 100,000 BLOBs in my database and have to work with them. Everything is fine when someone tells me what type of BLOB I am dealing with, but there will be situations where I don't know the type in advance. So how can I find out what type my BLOB is?
Last time I handled a BLOB I was given specific information about it: it was a zipped file. So I did this:
try {
    // read the BLOB into a byte array
    byte[] str = this.jdbcTemplate.queryForObject("SELECT SAVEDATA FROM JDBEVPP1.TEVP005 WHERE GFNR = 357302", byte[].class);
    ByteArrayInputStream bys = new ByteArrayInputStream(str);
    GZIPInputStream gzip = new GZIPInputStream(bys);
    //...etc...
}
How can I find out what type a BLOB is using Java code?

The final answer to this question turned out to be this comment from Robert Harvey:
"The usual way to identify a binary file of some type is to have some "magic numbers" at the beginning of the file that you can use to identify the type. See en.wikipedia.org/wiki/… and en.wikipedia.org/wiki/File_format#Magic_number"
And also this comment from Erwin Smout:
"By reading the detailed specs of the database design. Absent that, by trying to locate the original author of the system and hoping he still remembers. Absent that, by trying to locate other code that uses the same BLOB and kind of re-engineering the spec from there. In most shops you will have to go all the way to the third step alas. "

Related

Replacing text in XWPFParagraph without changing format of the docx file

I am developing a font converter app which will convert Unicode font text to Krutidev/Shree Lipi (Marathi/Hindi) font text. In the original docx file there are formatted words (i.e. color, font, size of the text, hyperlinks, etc.).
I want to keep format of the final docx same as the original docx after converting words from Unicode to another font.
PFA.
Here is my code:
try {
    fileInputStream = new FileInputStream("StartDoc.docx");
    document = new XWPFDocument(fileInputStream);
    XWPFWordExtractor extractor = new XWPFWordExtractor(document);
    List<XWPFParagraph> paragraph = document.getParagraphs();
    Converter data = new Converter();
    for (XWPFParagraph p : document.getParagraphs())
    {
        for (XWPFRun r : p.getRuns())
        {
            String string2 = r.getText(0);
            data.uniToShree(string2);
            r.setText(string2, 0);
        }
    }
    // Write the document to the file system
    FileOutputStream out = new FileOutputStream(new File("Output.docx"));
    document.write(out);
    out.close();
    System.out.println("Output.docx written successfully");
}
catch (IOException e) {
    System.out.println("We had an error while reading the Word Doc");
}
Thank you for asking your question here.
I worked with POI some years ago, though on Excel workbooks, but I'll still try to help you reach the root cause of your error.
The Java runtime is smart enough to give you good debugging information on its own!
A good first step to disambiguate the error is to not overwrite the exception message that is handed to you.
Try printing the results of e.getLocalizedMessage() or e.getMessage() and see what you get.
Getting the stack trace with the printStackTrace method is also often useful for pinpointing where your error lies; see the sketch below.
Share your findings from the above method calls so we can help you debug the issue further.
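As a minimal sketch, the catch block from your code could surface the exception details instead of a fixed message:
catch (IOException e) {
    // print the exception's own message and the full stack trace
    System.out.println("Error while processing the Word doc: " + e.getMessage());
    e.printStackTrace();
}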
[EDIT 1:]
So it seems you are able to process the file correctly with respect to the font conversion of the data, but you are not able to reproduce the formatting of the original data in the converted file.
(thus, "We had an error while reading the Word Doc" is a misleading message to be printing ;) )
Now, there are 2 elements to a Word document:
Content
Structure or Schema
You are able to convert the data because you are working only on the content of your doc files.
In order to retain the formatting of the contents, your solution needs to be aware of the formatting of the doc files as well and take care of that.
MS Word, which defines the doc files and their extension (.docx), follows a particular set of schemas that define the rules of formatting. These schemas are defined in Microsoft's XML namespace packages [1].
You can obtain the XML (HTML) form of the doc file you want quite easily (see the steps in [1], the code in [2], or the sketch after the links below). You can even apply different schemas, or possibly your own schema definitions based on the definitions provided by MS's namespaces. You can do this programmatically, for which you need to get versed in XML, XSL and XSLT concepts (w3schools [3] is a good starting point), but that method is no less complex than writing your own version of MS Word; alternatively you can use MS Word's built-in tools as shown in [1].
[1]. https://www.microsoftpressstore.com/articles/article.aspx?p=2231769&seqNum=4#:~:text=During%20conversion%2C%20Word%20tags%20the,you%20can%20an%20HTML%20file.
[2]. https://svn.apache.org/repos/asf/poi/trunk/src/scratchpad/testcases/org/apache/poi/hwpf/converter/TestWordToHtmlConverter.java
[3]. https://www.w3schools.com/xml/
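To see the schema-governed structure for yourself, here is a minimal sketch, assuming only that the .docx is a standard OOXML ZIP container and Java 9+ for InputStream.readAllBytes (the file name is illustrative):
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class DumpDocxXml {
    public static void main(String[] args) throws Exception {
        // A .docx file is a ZIP archive; the main content and its formatting live in word/document.xml
        try (ZipFile zip = new ZipFile("StartDoc.docx")) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            try (InputStream in = zip.getInputStream(entry)) {
                System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
            }
        }
    }
}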
My answer provides you with a cursory overview of how to achieve what you want to, but depending on your inclination and time availability, you may want to use your discretion before you decide to head onto one path than the other.
Hope it helps!

How to read DICOM string value with backslash (VR=LO, Value="0.4323\0.2325")?

Our C++ software uses ITK to write DICOM files. In them we have a private tag with LO (Long String) as VR and 2 decimal values as the value, like 0.3234\0.34223.
The LO choice is inherent to ITK.
In another Java application, I use dcm4che3 to read/write them. Since it respects the DICOM protocol, backslashes are forbidden, and dcm4che interprets the value as "0.3234" and never reaches the second value.
All DICOM viewer applications I use can display this value.
So my question is: Is there a trick in dcm4che to read this complete value as a string "0.3234\0.34223" despite the presence of a backslash?
Below, the code I use:
public DicomInfo uploadFile(MultipartFile file) throws IOException, ParseException {
    DicomInfo infos = new DicomInfo();
    Attributes attrs = readDicomAttributes(file);
    infos.setTags(toAttributesObject(attrs).toString());
    return infos;
}
private static JsonObject toAttributesObject(Attributes targetSeriesAttrs)
{
    StringWriter strWriter = new StringWriter();
    JsonGenerator gen = Json.createGenerator(strWriter);
    JSONWriter writer = new JSONWriter(gen);
    writer.write(targetSeriesAttrs);
    gen.flush();
    gen.close();
    return Json.createReader(new StringReader(strWriter.toString())).readObject();
}
public Attributes readDicomAttributes(MultipartFile file) throws IOException
{
    DicomInputStream dis = new DicomInputStream(file.getInputStream());
    Attributes dataSet = dis.readDataset(-1, Tag.PixelData);
    Attributes fmi = dis.readFileMetaInformation();
    dis.close();
    fmi.addAll(dataSet);
    return fmi;
}
In the JSON I get for this tag:
\"00110013\":{\"vr\":\"LO\",\"Value\":[\"0.4323\"]},
As you can see it is LO and the second part is already lost.
The method I use to get the specific attribute:
attr.getStrings(0x00110013)
sends back an array with only one value, 0.4323.
The problem happens during the readDataset call.
When I open the tags with software like Free DICOM Viewer, I see the complete data, so the data is there.
OK, I found the source of the problem... It is the addAll call: fmi.addAll(dataSet);
On dataSet, getStrings works perfectly. On fmi after addAll, the attribute has lost the second value.
So my problem to solve now is this addAll issue: dcm4che3 java lib: Attributes.addAll method seems to lose multiple LO values
See the answer from Paolo, and please believe us that the backslash is not a violation of the VR. As he said, the attribute is 2-dimensional, i.e. it has two values of VR LO which are separated by the backslash.
I know a bit about the dcm4che project and the people behind it, and it is nearly unthinkable to me that it is generally incapable of handling this.
I strongly suspect that your problem is related to the fact that your attribute is private. That is, without any additional information to the tag and its value, dcm4che (and any other product) can never know that the attribute's value is encoded as VR LO (Long String).
The default transfer syntax in DICOM is Implicit Little Endian. This means that the dataset does not convey explicit information about the VR of the attributes in the dataset. This information is implicitly encoded by the tag of the attribute, and the data dictionary (DICOM Part 6) must be used to look up the tag and obtain the corresponding VR. Obviously this only works for well-known DICOM tags defined in the standard and fails for private ones.
So one option is to try encoding the dataset in Explicit Little Endian, which means that the VR is part of the attribute's encoding in the dataset. I would expect this to be sufficient.
Mature toolkits (like dcm4che) allow for extending the data dictionary by configuration, that is, you can extend the "official" data dictionary used by the tookit with your custom tag definitions - including the VR. So the tag can be looked up in the case that the VR is not explicitly given in the dataset itself.
Again, I am not an expert on dcm4che, but a quick Google search for "dcm4che private dictionary" yields this promising link.
I am more than confident that you can solve the problem in dcm4che and that you do not have to migrate to a different toolkit.
The solution to this problem is to write
dataSet.addAll(fmi);
return dataSet;
instead of
fmi.addAll(dataSet);
return fmi;
since the addAll method loses multiple values of private LO tags.
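Applied to the readDicomAttributes method shown above, the corrected version then looks like this (same dcm4che3 calls, only the merge direction and the returned object change):
public Attributes readDicomAttributes(MultipartFile file) throws IOException
{
    DicomInputStream dis = new DicomInputStream(file.getInputStream());
    Attributes dataSet = dis.readDataset(-1, Tag.PixelData);
    Attributes fmi = dis.readFileMetaInformation();
    dis.close();
    // merge the file meta information into the dataset, not the other way around,
    // so the multi-valued private LO attribute keeps both of its values
    dataSet.addAll(fmi);
    return dataSet;
}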
LO can have multiple values separated by a backslash.
The DICOM standard says that in the VR "LO" the backslash cannot be used in values because it is used to separate the different elements.
In VRs that don't allow multiple values, the backslash can be used within a value.
So dcm4che is wrong here.

Can I transform a JSON-LD to a Java object?

EDIT: I changed my mind. I would like to find a way to generate the Java class and load the JSON as an object of that class.
I just discovered that there exists a variant of JSON called JSON-LD.
It seems to me a more structured way of defining JSON, reminiscent of XML with an associated schema, like XSD.
Can I create a Java class from JSON-LD, load it at runtime and use it to convert JSON-LD to an instantiation of that class?
I read the documentation of both implementations but found nothing about it. Maybe I read them badly?
Doing a Google search brought me to a library that will decode the JSON-LD into an "undefined" Object.
// Open a valid json(-ld) input file
InputStream inputStream = new FileInputStream("input.json");
// Read the file into an Object (The type of this object will be a List, Map, String, Boolean,
// Number or null depending on the root object in the file).
Object jsonObject = JsonUtils.fromInputStream(inputStream);
// Create a context JSON map containing prefixes and definitions
Map context = new HashMap();
// Customise context...
// Create an instance of JsonLdOptions with the standard JSON-LD options
JsonLdOptions options = new JsonLdOptions();
// Customise options...
// Call whichever JSONLD function you want! (e.g. compact)
Object compact = JsonLdProcessor.compact(jsonObject, context, options);
// Print out the result (or don't, it's your call!)
System.out.println(JsonUtils.toPrettyString(compact));
https://github.com/jsonld-java/jsonld-java
Apparently, it can also read the JSON from a plain string, as well as from a file or some other source. How you access the contents of the resulting object, I can't tell. The documentation seems to be moderately decent, though.
It seems to be an active project, as the last commit was only 4 days ago and has 30 contributors. The license is BSD 3-Clause, if that makes any difference to you.
I'm not in any way associated with this project. I'm not an author, nor have I made any pull requests. It's just something I found.
Good luck and I hope this helped!
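If you do want a typed Java object rather than the generic Map/List structure, one option is to feed the compacted output into a JSON mapper such as Jackson. A minimal sketch, assuming a hypothetical Person POJO whose fields mirror the keys of the compacted document:
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.github.jsonldjava.utils.JsonUtils;

// Hypothetical POJO matching the compacted JSON-LD keys
@JsonIgnoreProperties(ignoreUnknown = true) // tolerate keys like @context
public class Person {
    public String name;
    public String homepage;
}

// ...after obtaining `compact` exactly as in the snippet above:
ObjectMapper mapper = new ObjectMapper();
Person person = mapper.readValue(JsonUtils.toPrettyString(compact), Person.class);
System.out.println(person.name);
Whether this is practical depends on how predictable the compacted keys are for your context; for ad-hoc documents the generic Map/List access may be the simpler route.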
See this page: JSON-LD Module for Jackson

UTF8 characters showing weirdly or random basis in Android TextView

In at least 5 applications I have attempted to display UTF8-encoded characters, and every time, quite sporadically and rarely, I see random characters being replaced by diamond question marks (see image for details).
I enclose a page layout to demonstrate my issue. The layout is very basic; it is a very simple poll I am creating. The "Съгласен съм" text is taken from a database, where it has just been inserted by a script using a copy-pasted constant. The text is displayed in TextViews.
Has anyone ever encountered such an issue? Please advise!
EDIT: Something I forgot to mention is that the amount and position of the weird characters varies across different Android phone models.
Finally I got it all sorted out in all my applications. The issues actually boiled down to 3 different reasons, and I will list all of them below so that these findings of mine can help people in the future.
Reason 1: Incorrect encoding of a user-created file.
This actually was the problem with the application I posted about in the question. The problem was that the encoding of the insert script I used for introducing the values into the database was "UTF8 without BOM". I converted this encoding to "UTF8" using Notepad++ and reinserted the values into the database, and the issue was resolved. Thanks to #user3249477 for pointing me in this direction. By the way, "UTF8 without BOM" seems to be the default encoding Eclipse uses when creating UTF8 files, so take care!
Reason 2: Incorrect encoding of generated file.
The problem behind reason 1 pointed me to what to look for in some of the other cases I was facing. In one of my applications I am provided with raw data that I insert into my backend database using a simple Java application. The problem there turned out to be that I was passing through an intermediate format: files stored on the file system that I used to verify I had interpreted the raw data correctly. I noticed that these files were also created as "UTF8 without BOM". I used this code to write to these files:
BufferedOutputStream outputStream = new BufferedOutputStream(new FileOutputStream(outputFilePath));
writer = new BufferedWriter(new OutputStreamWriter(outputStream, STRING_ENCODING));
writer.append(string);
Which I changed to:
BufferedOutputStream outputStream = new BufferedOutputStream(new FileOutputStream(outputFilePath));
writer = new BufferedWriter(new OutputStreamWriter(outputStream, STRING_ENCODING));
// prepending a bom
writer.write('\ufeff');
writer.append(string);
This follows the prescription from this answer. The added line basically made all the intermediate files be encoded as "UTF8" with BOM and resolved my encoding issues.
Reason 3: Incorrect parsing of HTTP responses
The last issue I encountered in a few of my applications was that I was not interpreting the UTF8 HTTP responses correctly. I used to have the following code:
HttpResponse response = httpClient.execute(host, request, (HttpContext) null);
String responseBody = null;
responseBody = IOHelper.getInputStreamContents(responseStream);
Where IOHelper is a utility I wrote myself that reads stream contents into a String. I replaced this code with the method already provided by the Android API:
HttpResponse response = httpClient.execute(host, request, (HttpContext) null);
String responseBody = null;
if (response.getEntity() != null) {
responseBody = EntityUtils.toString(response.getEntity(), HTTP.UTF_8);
}
And this fixed the encoding issues I was having with HTTP responses.
In conclusion, I can say that one needs to take special care with BOM / without-BOM content when using UTF8 encoding in Android (a small sketch of tolerating a leading BOM on read follows below). I am very happy I learnt so many new things during this investigation.
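For completeness, a minimal sketch of reading a UTF8 text file while tolerating an optional leading BOM (the file name is illustrative; Java's UTF-8 decoder does not strip the BOM for you, it simply comes through as U+FEFF):
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class ReadUtf8WithOptionalBom {
    public static void main(String[] args) throws Exception {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream("input.txt"), StandardCharsets.UTF_8))) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        }
        String text = sb.toString();
        // drop the BOM if the file was written as "UTF8 with BOM"
        if (!text.isEmpty() && text.charAt(0) == '\uFEFF') {
            text = text.substring(1);
        }
        System.out.println(text);
    }
}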

org.springframework.web.multipart.commons.CommonsMultipartFile cannot be cast to java.sql.Blob

This line of code:
document.setContent((Blob) file);
Is throwing this error:
org.springframework.web.multipart.commons.CommonsMultipartFile cannot be cast to java.sql.Blob
Where file is a MultipartFile.
Eclipse does not show any warning for the "offending" line of code above. In fact, Eclipse added the (Blob) cast itself as part of a suggested fix.
Is there some other way that I can quickly and easily convert a MultipartFile to a Blob?
I have been working on this problem for a number of days now to no avail. See postings here and here. Simply being able to convert the MultipartFile to a Blob in a line or two of code would be an easy, elegant solution to an otherwise overcomplicated problem.
Take a look at the docs and this question.
What I've come up with is to use byte[]:
byte[] contents = file.getBytes();
Blob blob = new SerialBlob(contents);
document.setContent(blob);
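Note that SerialBlob lives in javax.sql.rowset.serial and its constructor declares SQLException (and MultipartFile.getBytes() declares IOException), so a slightly fuller sketch would look roughly like this:
import java.io.IOException;
import java.sql.Blob;
import java.sql.SQLException;
import javax.sql.rowset.serial.SerialBlob;

// ...
try {
    byte[] contents = file.getBytes();      // MultipartFile -> byte[]
    Blob blob = new SerialBlob(contents);   // byte[] -> Blob
    document.setContent(blob);
} catch (IOException | SQLException e) {
    throw new RuntimeException("Could not wrap the uploaded file in a Blob", e);
}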
