ArrayIndexOutOfBoundsException in Solr WordBreakSolrSpellChecker getSuggestions - java

We are escaping the special characters for exact search (" ") and it works fine, except for a few cases where it throws an
ArrayIndexOutOfBoundsException at org.apache.solr.spelling.WordBreakSolrSpellChecker.getSuggestions.
Search text: "PRINTING 9-27 TEST CARDS ADD-ON MATT LAMINATION ON 2-SIDE OF TEST CARDS PER BOX OF 100 PCS". In our config, spellcheck.dictionary is default and the spellcheck.dictionary wordbreak entry is commented out.
We cannot apply any patch right now; we have already checked the issue LUCENE-5494.
Can anyone suggest a workaround to get the results in spite of the exception, or any configuration changes to suppress suggestions or spellcheck? Commenting out the wordbreak dictionary also didn't help. Solr version is 4.10.4.

Due to security reasons, I cannot post anything related to the code; sorry for the minimal information in the query.
Anyway, it might be useful to someone like me. The reason the exception still showed up even after commenting out the wordbreak dictionary is that the changes made to the solrconfig.xml file were not being picked up. I was testing on my local machine, which is a standalone environment. Restarting the container (WebLogic) did not pick up the changes, and reloading the core through the admin screen didn't help either. Importing and reloading the config set and then restarting the container did the trick.
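For anyone who just needs to suppress the component while a patch is out of reach: spellcheck can also be turned off per request. A minimal SolrJ 4.x sketch (the core URL and the query are placeholders, not from the original post):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SuppressSpellcheck {
    public static void main(String[] args) throws Exception {
        // HttpSolrServer is the SolrJ 4.x client; URL is a placeholder.
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/mycore");
        SolrQuery q = new SolrQuery("\"MATT LAMINATION ON 2-SIDE OF TEST CARDS\"");
        q.set("spellcheck", false); // skip the spellcheck component for this request
        QueryResponse rsp = solr.query(q);
        System.out.println(rsp.getResults().getNumFound());
    }
}

The same effect can be had by appending spellcheck=false to the raw request URL.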

Related

Eclipse doesn't split string at line width when formatting

I'm currently working on an application in Eclipse where I'm running a really huge SQL statement that spans about 20 lines when split in Notepad to fit on the screen. So I want the string for the query to be formatted to something more readable than a single line. Autoformatting has always worked for me in Eclipse, but now neither Ctrl + Alt + F nor right-clicking and selecting the "Format" option from the menu produces a line break after a certain number of characters.
I have already checked the preferences and tried my own profile with 120- and 100-character line widths, but that didn't fix anything. I really don't know why Eclipse won't format this anymore; normally Eclipse would split the string into several lines in this case.
Other formatting is applied when running autoformat, however (e.g. if(xyz){ still becomes if (xyz) {).
Thank you for your help in advance.
As far as I can tell, autoformatting as you describe was never supported (at least as far back as 2008), and I have been using Eclipse much longer than that.
You can do one of several things:
Simply put the cursor in the string and hit Return; Eclipse will split the literal for you (see the sketch after this list).
Toggle word wrap with Alt+Shift+Y.
Try writing a regex to do what you want (not certain whether this will work).
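For illustration, this is the kind of wrapped literal Eclipse produces when you press Return inside a long string (the query and names here are made up):

// Illustrative only: pressing Return inside the literal yields concatenation
String sql = "SELECT o.id, o.created_at, c.name "
        + "FROM orders o "
        + "JOIN customers c ON c.id = o.customer_id "
        + "WHERE o.created_at > ?";

You still place each break by hand, but Eclipse adds the closing quote, the + and the opening quote for you.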

JTidy reports "3 errors were found!"... but does not say what they are

I have a large block of programmatically generated HTML. I ran it through Tidy (version r938) with the following Java code:
StringReader inStr = new StringReader(htmlInput); // the generated HTML
StringWriter outStr = new StringWriter();         // receives the tidied output
Tidy tidy = new Tidy();
tidy.setXHTML(true);                              // ask Tidy to emit XHTML
tidy.parseDOM(inStr, outStr);
I get the following output:
InputStream: Document content looks like HTML 4.01 Transitional
247 warnings, 3 errors were found!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.
Trouble is, Tidy doesn't tell me what 3 errors it found.
I'm fibbing here a little. The output above actually follows a long list of all 247 warnings (mostly about trimming out empty div elements). I can suppress those with tidy.setShowWarnings(false); either way, I see no error report, so I can't figure out what I need to fix. 300 KB of HTML is too much for me to eyeball.
I've tried numerous approaches to finding the errors. I can't run the file through validate.w3.org, sadly, as it lives on a proprietary network. The most informative approach was to open it in IntelliJ IDEA; this revealed a dozen or so duplicate div IDs, which I fixed. The errors still occurred.
I've looked around for other mentions of this problem. While I find plenty of hits on things like "How can I get the error/warning messages out of the parsed HTML using JTidy?", they all appear to be asking for different things, or assume conditions that simply don't hold for me. I'm getting warnings just fine, for example; it's the errors I need, and they're not being reported, even if I call setShowErrors(100) or the like.
Am I going to have to dive into Tidy's source code and debug it, starting where it reports errors? Or is there something much simpler I could do?
Here's what I ended up doing to track down the errors:
Download JTidy's source. Most people should be able to go straight to the source.
Unzip the source into my dev area, right on top of my existing source code. This also meant removing the Maven entry for JTidy from my pom.xml. (It also meant beating IntelliJ into submission (read: editing the relevant .iml files and restarting IJ a lot) when it got extremely confused by this.)
Set a breakpoint in Report.error. The first line of org.w3c.tidy.Report.error() increments lexer.errors; error() is called from many places in the lexer.
Run my program in debug mode. Expect this to take a little while if the input HTML is large; a 300 KB file took around 10-15 seconds on my machine to stop on an error that turned out to be at the very end of the file.
Look at the contents of lexbuf. lexbuf is a byte array, so your IDE might not show it as text. It might also be large. You probably want to look at the index the lexer was at within lexbuf. If you have to, take that section of the byte array and cross-reference it with an ASCII table to get the text (a small helper for this follows the list).
Search for that text in your HTML. Assuming it appears only once, there's your error. (In my case, it appeared exactly three times, and sure enough, I had three errors reported.)
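For the lexbuf step, a small helper along these lines can save the manual ASCII-table lookup; lexbuf and the index are values observed in the debugger session, not a public JTidy API:

import java.nio.charset.StandardCharsets;

public class LexbufWindow {
    // Render a window of the lexer's byte buffer around the error position as text.
    static String windowAsText(byte[] lexbuf, int errorIndex, int radius) {
        int from = Math.max(0, errorIndex - radius);
        int to = Math.min(lexbuf.length, errorIndex + radius);
        return new String(lexbuf, from, to - from, StandardCharsets.US_ASCII);
    }
}

Evaluating something like windowAsText(lexbuf, i, 40) in the debugger's expression view shows the offending markup directly.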
This was much more involved than it probably should have been. I suspect Report.error() was being called inappropriately.
In my case, error() was called with the constant BAD_CDATA_CONTENT. This constant is used only by Report.warning(). error() doesn't know what to do with it, and just exits silently with no message at all. If I change the call in Lexer.getCDATA() from error() to warning(), I get the exact line and column of my error. (I also get what appears to be reasonably well-formed XHTML, instead of an empty document.)
I'd submit a ticket to the JTidy project with some suggestions, but SourceForge isn't letting me log in for some reason. So, here:
Given that this "error" appears not to doom the document to unparseability, I'll tentatively suggest that that call be made a warning instead. (In my specific case, it was an HTML tag inside a string constant or comment inside a script element; shouldn't have hurt anything. I asked another question about it, just in case.)
Report.error() should have a default case that reports an unhandled error code if it gets one.
Hope this helps anyone else having what I'm guessing is a rather esoteric problem.

MariaDB FOUND_ROWS() is not working

I'm using mariadb-10.1.16.
FOUND_ROWS() always works correctly in HeidiSQL.
But when I use MyBatis in Java, FOUND_ROWS() sometimes returns 0.
I don't know what the problem is.
How can I solve it?
Let's see the statement before it, the one where you have SQL_CALC_FOUND_ROWS. And be sure that there are no intervening queries (see the sketch below).
Now I am going to go out on a limb...
Turn on the general log until the "0 value" happens.
Turn off the general log (to avoid filling up the disk).
Find the two SELECTs in the log.
Find what is between them. Then file a bug with MyBatis saying that the extra statements wrecked your code.
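For reference, the pattern that has to hold, as a minimal JDBC sketch (connection URL, credentials and table name are placeholders): both SELECTs must run on the same connection with nothing in between, which is exactly what a connection pool or an extra framework-issued query can silently break.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class FoundRowsDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mariadb://localhost/test", "user", "pass");
             Statement st = conn.createStatement()) {
            ResultSet page = st.executeQuery(
                    "SELECT SQL_CALC_FOUND_ROWS id FROM t LIMIT 10");
            while (page.next()) { /* consume the page */ }
            // Must be the very next statement on the same connection.
            ResultSet total = st.executeQuery("SELECT FOUND_ROWS()");
            if (total.next()) {
                System.out.println("total rows: " + total.getLong(1));
            }
        }
    }
}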

Docx4j: replacing text in a docx template in Java

I'm new to Docx4j and my task is to replace some text in a docx template.
I read the Getting Started guide of docx4j, but I don't think I fully understood the whole concept.
Well, anyway... I already tried [the unmarshalling template of Docx4j][1],
which worked fine with the given docx, but then I got the same problem when I tried it on my own template.
The exceptions say that the HashMap doesn't contain valid keys or values, and therefore the placeholders aren't replaced.
I got rid of the
<w:proofErr w:type="spellEnd"/>
tags by disabling the spell checking, but it still didn't work... And it also takes quite some time to run the app.
I didn't understand the data-bound example in the Getting_Started.pdf, so I'm running out of options...
How can I simply replace some strings of text in a docx?
EDIT:
I found out that if I add some text to the unmarshallFromTemplate.docx and save it, the new lines of text won't be replaced.
The runs (<w:r> tags) are somehow split into multiple tags:
<w:p w:rsidR="002512F8" w:rsidRDefault="002512F8" w:rsidP="002512F8"><w:r><w:t>My</w:t></w:r><w:r w:rsidR="001A5174"><w:t xml:space="preserve"> favourite ice cream is ${DEGREE</w:t></w:r><w:r><w:t>}.</w:t></w:r><w:bookmarkStart w:id="0" w:name="_GoBack"/><w:bookmarkEnd w:id="0"/></w:p>
Editing the text in the document.xml and adding the missing information didn't help much.
Well, anyway, here is the document.xml of the Template.docx that I'm using:
http://uploaded.net/file/vz4qr23o
EDIT 2:
Well, guys, I found a quite suitable workaround and I don't know why it took so long to figure it out.
As I was saying: the runs were split up, and the reason for this, in my opinion, was the ${}. Therefore I simply used a # before my placeholders and rewrote every placeholder so that each one sits in a single run.
I had to switch to the document.xml a couple of times and rewrite the passages, but then it worked. Then I simply used replace(placeholder, xml) on the text of the marshalled document.xml, and then unmarshalled it again.
Worked. End of story, fuck the nightly build or the mappings. Thanks!
docx4j source code has been on GitHub for a while now; that svn repository is obsolete.
The equivalent sample is now called VariableReplace. That code is a bit more efficient, but you need to build it yourself, or use a current nightly build.
You'll probably find running VariablePrepare addresses your issue.
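For what it's worth, a minimal sketch of that flow against the docx4j 3.x API (the file names and the DEGREE mapping are illustrative; check the method names against your docx4j version):

import java.io.File;
import java.util.HashMap;

import org.docx4j.model.datastorage.migration.VariablePrepare;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;

public class ReplacePlaceholders {
    public static void main(String[] args) throws Exception {
        WordprocessingMLPackage pkg =
                WordprocessingMLPackage.load(new File("template.docx"));
        VariablePrepare.prepare(pkg); // coalesce split runs so ${...} stays intact
        HashMap<String, String> mappings = new HashMap<String, String>();
        mappings.put("DEGREE", "vanilla");
        MainDocumentPart main = pkg.getMainDocumentPart();
        main.variableReplace(mappings); // replaces ${DEGREE} in the main part
        pkg.save(new File("out.docx"));
    }
}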
The placeholder search and replace code built into docx4j works just fine, but if you're having issues with placeholders getting broken up by rsid entities, you need to ensure that grammar and spell checking are disabled when saving your "template" (i.e. source) document. This helps prevent your text runs from becoming fragmented (note that you might want to disable proof-reading too, as that inserts bookmark tags here, there, and everywhere).
Once you've done the search and replace and have a new / updated document, you can re-enable spell-checking easily enough. This thread has more on RSIDs: turnoff rsid's spell check & grammar check in generated xml

Strange intermittent character encoding behavior in Tomcat server

From time to time we encounter a very strange encoding problem in Tomcat in our production environment.
I have not yet been able to pinpoint exactly where in the code the problem happens, but it involves the replacement of non-ASCII characters with approximated ASCII characters.
For example, replacing the character 'å' with 'a'. Since the site is in Swedish, the characters 'å', 'ä' and 'ö' are quite common. But for some reason the replacement of the 'ö' character always works, so a string like "Köp inte grisen i säcken" becomes "Kop inte grisen i säcken", i.e. the 'ä' is not replaced as it should be, while the 'ö' character is.
Some quick facts about the problem:
It happens very seldom (we have noticed it 3-4 times, the first time maybe 1-2 years ago).
A restart of the troubled server makes the problem go away (until the next time).
It has never occurred on more than one front end server at the same time.
It doesn't always happen on the same front end server.
No user input on the front end is involved.
All front end servers connect to the same CMS and DB, with the relevant config being identical.
All front end servers have the same relevant configuration (linux config, tomcat config, java environment config like "file.encoding" etc), and are started using the same script (all according to the hosting/service provider).
All front end servers use the same exact war file for the site, and the same jar files.
No other encoding problems can be seen on the site while this character replacement problem occurs.
We have never been able to reproduce the problem in any other environment.
We use Tomcat 5.5 and Java 5, because of CMS requirements.
I can only think of two plausible causes for this behavior:
The hosting provider sometimes starts/restarts the front end servers in a different way, maybe under another user account with other environment variables or other file access rights, or maybe using some other script than the normal one.
Some process running during Tomcat or webapp startup depends on some other process, and sometimes (intermittently but seldom) these two (or more) processes happen to run in an order that causes this encoding defect.
But even if 1 or 2 above is the case, it still doesn't fully explain what really happens. What exact difference could explain this? All of "file.encoding", "file.encoding.pkg", "sun.io.unicode.encoding", "sun.jnu.encoding" and all other relevant environment variables match on all front end machines (verified visually using a debug page while the problem was occurring).
Can someone think of a plausible explanation for this strange intermittent behavior? Simply upgrading the Tomcat and/or Java version is not really a relevant answer, since we don't know whether that would solve the problem, and it still wouldn't explain what the problem was. I'm more interested in understanding exactly what causes the problem.
Regards
/Jimi
UPDATE:
I think I have found the code that performs the character replacements. On initialization (triggered by the first call to do a replacement) it builds a HashMap<Character, String> and fills it like this:
lookup.put(new Character('å'), "a");
Then, when it should replace characters in a String, it loops over each character and does a lookup in the hash map with the character as the key; if a replacement String is found it is used, otherwise the original character is kept.
This part of the code is more than 3 years old and was written by a developer who is long gone. If I were to rewrite this code today I would do something totally different, and that might even solve the problem. But it would still not explain exactly what happened. Can someone see a possible explanation?
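For reference, a hedged reconstruction of the routine as described above (the original source is not available; all names here are invented for illustration):

import java.util.HashMap;
import java.util.Map;

public class AsciiApproximator {
    // Note: non-ASCII char literals depend on javac's -encoding setting.
    private static final Map<Character, String> LOOKUP = new HashMap<Character, String>();
    static {
        LOOKUP.put(Character.valueOf('å'), "a");
        LOOKUP.put(Character.valueOf('ä'), "a");
        LOOKUP.put(Character.valueOf('ö'), "o");
    }

    static String toAscii(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            String replacement = LOOKUP.get(Character.valueOf(c));
            if (replacement != null) {
                sb.append(replacement);
            } else {
                sb.append(c); // a decomposed 'ä' (a + U+0308) falls through here untouched
            }
        }
        return sb.toString();
    }
}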
Normalize the input to Normalization Form C (NFC) before doing the replacement.
For instance, ä can be just one character, U+00E4, or it can be two characters: a (U+0061) followed by the combining diaeresis U+0308.
If your replacement only looks for the composed form, then the decomposed form will still remain, because neither \u0061 nor \u0308 matches \u00e4:
import java.text.Normalizer;

public class NormalizeDemo {
    public static void main(String[] args) {
        String decomposed = "\u0061\u0308"; // 'a' + combining diaeresis
        String composed = "\u00e4";         // precomposed 'ä'
        System.out.println(decomposed);
        System.out.println(composed);
        System.out.println(composed.equals(decomposed));
        System.out.println(Normalizer
                .normalize(decomposed, Normalizer.Form.NFC).equals(composed));
    }
}
Output
ä
ä
false
true
