Solr Query: replacing whitespace with +

The application I'm working on uses Solr to index and search entries. I've been reading a bit about the logic and syntax behind it. Currently there's a bit of code that confuses me, and I'm hoping someone can clear up why the person who wrote it did it the way they did.
trimmedSearchField = SolrQueryUtil.escapeQueryString(trimmedSearchField).replaceAll("\\s+", "+");
String qString = "+(title:" + trimmedSearchField + "^100 OR description_t:" + trimmedSearchField + "^10 " +
"OR +" + trimmedSearchField +"^1)";
I just want to bring attention to the .replaceAll method: why would we want to replace whitespace with +? My goal is to refactor a search bar a bit, and I get better results omitting the replaceAll call.
Example: two elements with the descriptions "Helen of Troy" and "Helen from Troy" respectively. With replaceAll present, searching "Helen of Troy" returns only the first element; with replaceAll removed, both appear (which is what I want).

That .replaceAll() call just collapses each run of consecutive whitespace into a single '+', which means 'required' in Lucene (and Solr) query syntax.
So it makes 'trimmedSearchField' mandatory in those fields.
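To see what the call actually produces, here is a minimal sketch of the whitespace replacement on its own. SolrQueryUtil.escapeQueryString is the application's own helper and is left out; the class and method names below are made up for illustration.

```java
public class WhitespaceToPlus {
    // Collapses every run of whitespace into a single '+',
    // as the replaceAll call in the question does.
    static String joinWithPlus(String input) {
        return input.trim().replaceAll("\\s+", "+");
    }

    public static void main(String[] args) {
        String joined = joinWithPlus("Helen   of Troy");
        System.out.println(joined);

        // Rebuild the query string from the question with the joined value.
        String qString = "+(title:" + joined + "^100 OR description_t:" + joined + "^10 "
                + "OR +" + joined + "^1)";
        System.out.println(qString);
    }
}
```

Printing the final query string like this makes it easier to compare what Solr receives with and without the replacement.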

Related

Generate Strings from regex

I am using a regex to extract some parts of a String. Let's say I am capturing a few groups and storing some of them in a DB. Now I want to recreate the original string from the stored groups. Is there a way this can be done?
For example Original String
longPrefix-202007_3847c820e158484dbc6ff486fc08cf6a.someSuffix
Regex
^(longPrefix-)(\d+)(_)([a-fA-F0-9]+)(.someSuffix)
After this there will be 5 groups:
group1: longPrefix-
group2: 202007
group3: _
group4: 3847c820e158484dbc6ff486fc08cf6a
group5: .someSuffix
I am only storing group2 and group4, because those are the only parts that change.
The question is: can I generate the original string using only group2, group4 and the regex?

I'm not aware of any Java regex API that lets you use a pattern in that direction.
In other words, in theory you can write software that determines, given that the regexp matched plus the contents of group 2 and group 4, what the input must have been. But that's such an unusual thing to want that no Java-based regexp implementation I know of gives you the API calls you'd need to write such a thing.
So either dig in and spend a few weeks writing that yourself, or just write it for this specific case: pair the regexp with a simple method taking in two strings and just emitting return "longPrefix-" + in1 + "_" + in2 + ".someSuffix";.
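A minimal sketch of that special-case approach, assuming the prefix, separator and suffix really are fixed literals. The class and method names are hypothetical, and the dot before "someSuffix" is escaped here, which the pattern in the question does not do.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NameCodec {
    // Same shape as the pattern in the question; "\\." matches a literal dot.
    private static final Pattern NAME =
            Pattern.compile("^(longPrefix-)(\\d+)(_)([a-fA-F0-9]+)(\\.someSuffix)");

    // Returns the two variable parts (group 2 and group 4),
    // or null if the input does not match.
    static String[] extract(String original) {
        Matcher m = NAME.matcher(original);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(2), m.group(4) };
    }

    // Rebuilds the original string from the two stored parts and the fixed literals.
    static String rebuild(String datePart, String hexPart) {
        return "longPrefix-" + datePart + "_" + hexPart + ".someSuffix";
    }

    public static void main(String[] args) {
        String original = "longPrefix-202007_3847c820e158484dbc6ff486fc08cf6a.someSuffix";
        String[] parts = extract(original);
        System.out.println(rebuild(parts[0], parts[1]).equals(original));
    }
}
```

Keeping extract and rebuild next to each other means a change to the pattern is hard to make without noticing that the rebuild method needs the same change.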

Regex to parse command line options

I'm faced with a need for parsing a string into key-value pairs, where the value may be optional. Standard command line parsers are not useful, because all the ones I checked accept a String[] and not a String. Thus, I resorted to regex, and sure enough, faced with the following:
Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems.
First, the input string:
"/opt/sensu/embedded/bin/ruby /opt/sensu/embedded/bin/check-graphite-stats.rb " +
"--crit 25 --host 99.99.999.9999:8082 --period -5mins --target 'alias(scale(divideSeries(" +
"summarize(sumSeries(nonNegativeDerivative(transformNull(exclude(" +
"\\\"unknown\\\"), 0))), \\\"30d\\\", \\\"sum\\\", false),summarize(" +
...gigantuous string
\\\"sum\\\", false)), 100), \\\"3pp error rate\\\")' " +
"--unknown-ignore --warn 5"
Next, my regex:
(--(?<option>.+?)\s+(?<value>.+?(?=--))?)+?
the above almost works, but not quite.
Output:
--crit 25
--host 99.99.999.9999:8082
--period -5mins
--target 'gigantuous string'
--unknown-ignore
--warn
Why is the value of --warn not picked up?

Because you're doing a positive lookahead to the next -- at the end of the regex ((?=--)), the value of the last parameter in the string isn't picked up as it's not followed by --. Accepting the end of the string as an alternative ((?:(?=--)|$)) and then filtering values that don't start with -- (by replacing .+? with .(?:[^-].+?)?) should behave in the way you want:
(--(?<option>.+?)\s+(?<value>.(?:[^-].+?)?(?:(?=--)|$))?)+?
(However, as others have mentioned, I'd be very surprised that there isn't a Java argument parsing library that would suit your use case. Even if it means writing the code to split your string into arguments yourself, it might be less brittle.)
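As a rough sketch, the corrected expression can be exercised like this. The command line is a shortened, made-up stand-in for the one in the question (no quoted --target value), and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionRegexDemo {
    // The corrected regex from the answer, with backslashes doubled for a Java string.
    private static final Pattern OPTION = Pattern.compile(
            "(--(?<option>.+?)\\s+(?<value>.(?:[^-].+?)?(?:(?=--)|$))?)+?");

    // Collects option -> value pairs in order; flags without a value map to null.
    static Map<String, String> parse(String commandLine) {
        Map<String, String> options = new LinkedHashMap<>();
        Matcher m = OPTION.matcher(commandLine);
        while (m.find()) {
            String value = m.group("value");
            // Captured values keep the whitespace before the next option, so trim it.
            options.put(m.group("option"), value == null ? null : value.trim());
        }
        return options;
    }

    public static void main(String[] args) {
        String line = "ruby check.rb --crit 25 --host 10.0.0.1:8082 "
                + "--period -5mins --unknown-ignore --warn 5";
        parse(line).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

One thing to watch for: a single-character value that is followed by another option (for example "--warn 5 --crit 25") appears to swallow the rest of the line with this expression, which is one more argument for splitting the string and handing it to a real argument parser.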

Is there a way to copy the contents of a String in Eclipse using a keyboard shortcut or menu item?

Let's say I have this string:
String strName = "aaa" +
+ "bbb" +
+ "ccc" +
+ "ddd" +
+ "eee";
Let's say I have a lot of those type of strings, like tens of thousands. But let's say there are several times as many which I don't want to touch.
I need to grab the contents of these strings quicker than selecting the string and copying it, but since I need to manually determine whether or not these strings are the right ones, automating this process would take a long time and be prone to a lot of errors.
What I want to do is take the contents:
"aaa" +
+ "bbb" +
+ "ccc" +
+ "ddd" +
+ "eee";
And insert them into my clipboard with, hopefully, either a keyboard shortcut or menu item. The formatting varies constantly, so I'm just looking to copy what's between ""; without having to select it manually.
Is there a keyboard shortcut or menu item which can insert the contents of a string into the clipboard?
Here's how I'd expect it to work:
String strName =
^------ select this, hit keyboard shortcut, contents are copied
v----------clipboard now has this data:
"aaa" +
+ "bbb" +
+ "ccc" +
+ "ddd" +
+ "eee";
Assume automation of this process using third party tools outside of the editor, and find/replace, are not allowed. I've already written a program, and cannot use it. I'm looking for an in-editor shortcut, or menu item.

(Obviously a lot depends on the details of what you want to do, which you don't explain.)
I have found that for jobs where one wants some automation ("tens of thousands" of items) but also needs to keep some human eyeballs involved ("I need to manually determine..."), regular expressions are often helpful. Eclipse's "Find/Replace" (Ctrl+F) dialog helps you find occurrences matching a certain pattern, then still leaves you the choice of changing each one (Replace/Find vs. Find without replacing). Eclipse doesn't have much help on regular expressions, but you can look at the Javadoc for Pattern for a fairly thorough manpage. Or look at various tutorials (Google), but be aware that there are different dialects and you want the one that is used in Java.
If your strings do not have a nice pattern you can use, what can maybe help is to mark each positive OR negative line's first character with some character that is not used (much) in the code, e.g. ~ or #. Then you can easily match those lines with a regexp. For instance, once those lines are marked, you can remove all lines matching (or NOT matching) that starting pattern, by replacing them with an empty string.
It might also be helpful to make copies of your files or extract just the lines you are interested in to a new file, then work on those copies, so that you don't mess up your source code files - again depending on what you need to do.
There are of course other text editors available that also do (different dialects of) regular expressions, like Notepad++ and Notetab (Windows environments) - sometimes the small differences in dialect are useful so I open my text in the required editor to do something Eclipse doesn't allow me to do.
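The marker idea above can be tried out in plain Java first, since Eclipse uses the same regex dialect. This is only a sketch, assuming '~' was chosen as the marker and that lines are removed rather than kept; the class and method names are invented for the example.

```java
public class MarkedLines {
    // Removes every line whose first character is the marker '~'.
    // (?m) makes ^ match at the start of each line; \R matches any line break (Java 8+).
    static String dropMarked(String text) {
        return text.replaceAll("(?m)^~.*\\R?", "");
    }

    public static void main(String[] args) {
        String text = "String a = \"keep\";\n~String b = \"drop\";\nString c = \"keep\";\n";
        System.out.print(dropMarked(text));
    }
}
```

The same pattern without the (?m) prefix and with an empty replacement may work in the Find/Replace dialog with "Regular expressions" checked, but that is worth verifying on a copy of the file first.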

Copy the expression from the first " to the last one, and pass it to the following program as the input:
perl -npe 's/^([+] )?"(.*)"( +[+] *|;)[\r\n]*$/$2/g';
Note: you might need an additional blank line (or two) after the last quoted line.
Edit: this will not unescape escape sequences.

Removing items from String

I am trying to replace all occurrences of a substring from a String.
I want to replace "\t\t\t" with "<3tabs>"
I want to replace "\t\t\t\t\t\t" with "<6tabs>"
I want to replace "\t\t\t\t" with "< >"
I am using
s = s.replace("\t\t\t\t", "< >");
s = s.replace("\t\t\t", "<3tabs>");
s = s.replace("\t\t\t\t\t\t", "<6tabs>");
But it's no use; it does not replace anything. Then I tried using
s = s.replaceAll("\t\t\t\t", "< >");
s = s.replaceAll("\t\t\t", "<3tabs>");
s = s.replaceAll("\t\t\t\t\t\t", "<6tabs>");
Again, no use; it does not replace anything. After trying these two methods I tried StringBuilder.
I was able to replace the items through StringBuilder. My question is: why am I unable to replace the items directly through String with the two approaches above? Is there any method with which I can directly replace items in a String?

Try the replacements in this order:
String s = "This\t\t\t\t\t\tis\t\t\texample\t\t\t\t";
s = s.replace("\t\t\t\t\t\t", "<6tabs>");
s = s.replace("\t\t\t\t", "< >");
s = s.replace("\t\t\t", "<3tabs>");
System.out.print(s);
output:
This<6tabs>is<3tabs>example< >

The 6-tab replacement is never going to find a match, because the replacements before it will already have consumed part of every 6-tab run.
You need to start with the largest match first.
Strings are immutable, so you can't modify them directly; s.replace() returns a new String with the modifications in it. You do assign that back to s, though, so that part should work fine.
Put the calls in the correct order and step through them with a debugger to see what is happening.
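A small sketch comparing the two orderings on a run of six tabs; the class and method names are made up for the example.

```java
public class TabReplaceOrder {
    // The order used in the question: the 4-tab pattern runs first
    // and eats into the 6-tab run.
    static String shortestFirst(String s) {
        s = s.replace("\t\t\t\t", "< >");
        s = s.replace("\t\t\t", "<3tabs>");
        s = s.replace("\t\t\t\t\t\t", "<6tabs>");
        return s;
    }

    // Longest pattern first, so each run is matched by the largest pattern that fits.
    static String longestFirst(String s) {
        s = s.replace("\t\t\t\t\t\t", "<6tabs>");
        s = s.replace("\t\t\t\t", "< >");
        s = s.replace("\t\t\t", "<3tabs>");
        return s;
    }

    public static void main(String[] args) {
        String input = "a\t\t\t\t\t\tb";
        // Show leftover tabs as a visible escape so the difference is easy to see.
        System.out.println(shortestFirst(input).replace("\t", "\\t"));
        System.out.println(longestFirst(input).replace("\t", "\\t"));
    }
}
```

With the question's order, the six tabs become "< >" followed by two leftover tabs, which neither of the later patterns can match.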

Take a look at this
Go through your text, divide it into a char[] array, then use a for loop to go through the individual characters.
Don't print them out straight, but print them using a %x tag (or %d if you like decimal numbers).
char[] characters = myString.toCharArray();
for (char c : characters)
{
    // Print each character's code in hex, one per line.
    System.out.printf("%x%n", (int) c);
}
Get an ASCII table and look up all the numbers for the characters, and see whether there are any \n or \f or \r. Do this before or after.
Different operating systems use different line terminating characters; this is the first reference I found from Google with "line terminator Linux Windows." It says Windows uses \r\n and Linux \n. You should be able to confirm which one you have from your example. Obviously if you strip \n and leave \r you will still have the text break into separate lines.
You might be more successful if you write a regular expression (see this part of the Java Tutorial, etc) which includes whitespace and line terminators, and use it as a delimiter with the String.split() method, then print the individual tokens in order.
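A minimal sketch of that split-based approach, assuming that any run of whitespace (spaces, tabs, \r, \n, \f) should act as a single delimiter. The class and method names are hypothetical.

```java
import java.util.Arrays;

public class SplitOnWhitespace {
    // \s covers space, tab, \n, \r and \f, so "\\s+" treats any mix of them as one delimiter.
    // trim() avoids an empty first token when the text starts with whitespace.
    static String[] tokens(String text) {
        return text.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String text = "first line\r\nsecond\tline\n";
        System.out.println(Arrays.toString(tokens(text)));
    }
}
```

Because the delimiter is a whole run of whitespace, it does not matter whether the text came from Windows or Linux.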

A question on swapping XML attributes within a tag in Java

The question sounds a bit confusing, but it is actually straightforward. This is a follow-up to my previous post:
Need a little help on this regular expression
After successfully transforming the String, it now looks like:
<media id="pc011018" rights="licensed"
type="photo">
<title>Sri Lankans harvest tea</title>
Now the only task left is to swap the three attributes of media node, so the output String should be:
<media type="photo" id="pc011018" rights="licensed">
<title>Sri Lankans harvest tea</title>
I can think of one way of doing this: first, I extract the string enclosed by the first pair of "<" and ">" brackets. Then I use a StringTokenizer to split that string into the three attribute strings (type, id, rights), rearrange them in a StringBuffer, turn it back into a String, and finally concatenate it with the remaining <title> substring.
I am just wondering if there is a better and more efficient way than using StringTokenizer? Please kindly help, thanks.

A real hacky way of doing this
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "<media id=\"pc011018\" rights=\"licensed\" type=\"photo\"><title>Sri Lankans harvest tea</title></media>";
// Capture the three attribute values and everything after the opening tag.
Pattern r = Pattern.compile("<media id=\"(.*)\" rights=\"(.*)\" type=\"(.*)\">(.*)");
Matcher m = r.matcher(input);
if (m.find()) {
    System.out.println("<media type=\"" + m.group(3) + "\" id=\"" + m.group(1) + "\" rights=\"" + m.group(2) + "\">" + m.group(4));
}
Will only work if the data is always as you describe
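If the attributes can arrive in a different order or be split across lines, a slightly more tolerant variant is to collect them by name and write them back in the order you want. This is only a sketch and still plain regex, not a real XML parser: it assumes double-quoted attribute values and a single <media> opening tag, and the class and method names are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MediaAttributeOrder {
    // The opening <media ...> tag; [^>]* also spans line breaks inside the tag.
    private static final Pattern OPEN_TAG = Pattern.compile("<media\\s+([^>]*)>");
    // One name="value" pair with a double-quoted value.
    private static final Pattern ATTRIBUTE = Pattern.compile("(\\w+)\\s*=\\s*\"([^\"]*)\"");

    // Rewrites the first <media> tag with its attributes in the given order.
    // Attributes not named in 'order' are dropped in this sketch.
    static String reorder(String xml, String... order) {
        Matcher tag = OPEN_TAG.matcher(xml);
        if (!tag.find()) {
            return xml; // nothing to rewrite
        }
        Map<String, String> attrs = new LinkedHashMap<>();
        Matcher a = ATTRIBUTE.matcher(tag.group(1));
        while (a.find()) {
            attrs.put(a.group(1), a.group(2));
        }
        StringBuilder rebuilt = new StringBuilder("<media");
        for (String name : order) {
            if (attrs.containsKey(name)) {
                rebuilt.append(' ').append(name).append("=\"").append(attrs.get(name)).append('"');
            }
        }
        rebuilt.append('>');
        return xml.substring(0, tag.start()) + rebuilt + xml.substring(tag.end());
    }

    public static void main(String[] args) {
        String input = "<media id=\"pc011018\" rights=\"licensed\"\ntype=\"photo\">\n"
                + "<title>Sri Lankans harvest tea</title>";
        System.out.println(reorder(input, "type", "id", "rights"));
    }
}
```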
