Escape all the characters in a pattern except some metachars

I want to allow the user to use a "*" metachar in a search, and I would like to pass the pattern the user enters to Pattern.compile. So I have to escape every metachar the user enters except the *. I am doing it with the code below; is there a better way of doing this?
private String escapePattern(String pattern) {
final String PATTERN_MATCH_ALL = ".*";
if(null == pattern || "".equals(pattern.trim())) {
return PATTERN_MATCH_ALL;
}
String remaining = pattern;
String result = "";
int index;
while((index = remaining.indexOf("*")) >= 0) {
if(index > 0) {
result += Pattern.quote(remaining.substring(0, index)) + PATTERN_MATCH_ALL;
}
if(index < remaining.length()-1) {
remaining = remaining.substring(index + 1);
} else
remaining = "";
}
return result + Pattern.quote(remaining) + PATTERN_MATCH_ALL;
}

How about
"\\Q" + pattern.replace("*", "\\E.*\\Q") + "\\E";

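One caveat with the \Q...\E one-liner is that it stops quoting early if the user input itself contains "\E". A minimal sketch of an alternative, assuming the goal is still "every character literal except *", splits on the asterisk and lets Pattern.quote handle each literal piece. The class and method names here (WildcardPattern, toRegex) are made up for illustration.

```java
import java.util.regex.Pattern;

public class WildcardPattern {
    // Hypothetical helper: quotes every literal run and turns each '*' into ".*".
    static String toRegex(String pattern) {
        StringBuilder regex = new StringBuilder();
        // A limit of -1 keeps trailing empty strings, so a trailing '*' is not lost.
        String[] parts = pattern.split("\\*", -1);
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*"); // one ".*" per '*' in the input
            }
            if (!parts[i].isEmpty()) {
                regex.append(Pattern.quote(parts[i])); // safe even if the piece contains \E
            }
        }
        return regex.toString();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile(toRegex("a.b*c"));
        System.out.println(p.matcher("a.bXYZc").matches()); // true
        System.out.println(p.matcher("aXbc").matches());    // false: the '.' is literal
    }
}
```

Unlike the method in the question, this sketch does not append a trailing ".*", so callers who want prefix matching would add it themselves or use Matcher.find().
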
Related

How do I make a method that inserts dashes into a string in a looping pattern?

For context, the method needs to insert dashes into a string in a 1, 2, 4, 1, 2, 4... pattern. For example, a string that holds "Overflow" would be output as "O-ve-rflo-w". Would I use nested for loops in this situation?

The other answer using a pattern is a great solution, however, you could also use a recursive method. This may not be a compact solution, but the logic is easy to follow:
//Process the string in chunks of 7 characters
public static String addFormatting(String input){
String formatted = "";
//Add first character
if(input.length() >= 1) formatted = input.substring(0, 1);
//Add dash and the next 2 characters, else the remainder of the string
if(input.length() >= 3) formatted += "-" + input.substring(1,3);
else if (input.length() > 1) formatted += "-" + input.substring(1);
//Add dash and the next 4 characters, else the remainder of the string
if(input.length() >= 7) formatted += "-" + input.substring(3,7);
else if (input.length() > 3) formatted += "-" + input.substring(3);
//Add dash and recursively format the next chunk
if(input.length() > 7){
formatted += "-";
return formatted + addFormatting(input.substring(7));
}
//else return the complete formatted once it has been fully processed
else return formatted;
}
To call the method, simply use addFormatting("OverflowisagreatQnAsite!"); the returned string would be O-ve-rflo-w-is-agre-a-tQ-nAsi-t-e!

You can do the following:
private static String applyPattern(List<Integer> pattern, String str) {
int currentPatternIndex = 0;
int iterationsTillNextDash = pattern.get(currentPatternIndex);
StringBuilder stringBuilder = new StringBuilder();
for (char aChar : str.toCharArray()) {
if (iterationsTillNextDash == 0) {
stringBuilder.append('-');
iterationsTillNextDash = pattern.get(++currentPatternIndex % pattern.size());
}
iterationsTillNextDash--;
stringBuilder.append(aChar);
}
return stringBuilder.toString();
}
Usage:
String strWithDashes = applyPattern(Arrays.asList(1, 2, 4), "Overflow");
System.out.println(strWithDashes);
Output:
O-ve-rflo-w

Here is a simple hard-coded example for your situation. Perhaps you can figure out a way to use modulus % in your code for words longer than "overflow".
class Main {
public static String addDashes(String s)
{
String s_with_dashes = "";
for(int i = 0; i < s.length(); i++)
{
if(i == 1 || i == 3 || i == 7)
{
s_with_dashes += '-';
}
s_with_dashes += s.charAt(i);
}
return s_with_dashes;
}
public static void main(String[] args)
{
String s = "Overflow";
String s_with_dashes = addDashes(s);
System.out.println(s_with_dashes);
}
}
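As a rough sketch of the modulus idea hinted at above: the 1, 2, 4 pattern repeats every 7 characters, so the position inside the current cycle is i % 7, and a dash belongs before positions 0, 1 and 3 of each cycle (except at the very start of the string). The class name DashCycle is an assumption made for this example.

```java
public class DashCycle {
    static String addDashes(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            int pos = i % 7; // 1 + 2 + 4 = 7 characters per cycle
            // A group starts at cycle positions 0, 1 and 3; no dash before the first character.
            if (i > 0 && (pos == 0 || pos == 1 || pos == 3)) {
                out.append('-');
            }
            out.append(s.charAt(i));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(addDashes("Overflow")); // O-ve-rflo-w
    }
}
```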

Method:
private static String addDashes(String string, int... pattern) {
String output = "";
int index = 0;
while (true)
for (int p : pattern) {
if (index + p >= string.length())
return output += string.substring(index);
output += string.substring(index, index += p) + "-";
}
}
Call Method:
System.out.println(addDashes("Overflow", 1,2,4));
Output:
O-ve-rflo-w

How do I reverse the order of only the digits in a string?

Given a string in Java, how can I obtain a new string where all adjacent sequences of digits are reversed?
My code:
import static java.lang.System.*;
public class P2
{
public static void main(String[] args)
{
if(args.length < 1)
{
err.printf("Usage: java -ea P2 String [...]\n");
exit(1);
}
String[] norm = new String[args.length];
for(int i = 0; i<norm.length;i++)
{
norm[i] = args[i];
}
}
public String invertDigits(String[] norm)
{
}
}
And as an example, this is what it should do:
Inputs: 1234 abc9876cba a123 312asd a12b34c56d
1234 -> 4321
abc9876cba -> abc6789cba
a123 -> a321
312asd -> 213asd
a12b34c56d -> a21b43c65d

Although the question is heavily downvoted, the proposed problem seems clear now. I chose to solve it using a regular expression match in a recursive function.
private static String reverseDigits(String s) {
// the pattern will match a sequence of 1 or more digits
Matcher matcher = Pattern.compile("\\d+").matcher(s);
// fetch the position of the next sequence of digits
if (!matcher.find()) {
return s; // no more digits
}
// keep everything before the number
String pre = s.substring(0, matcher.start());
// take the number and reverse it
String number = matcher.group();
number = new StringBuilder(number).reverse().toString();
// continue with the rest of the string, then concat!
return pre + number + reverseDigits(s.substring(matcher.end()));
}
And here's the iterative approach.
private static String reverseDigits(String s) {
//if (s.isEmpty()) return s;
String res = "";
int base = 0;
Matcher matcher = Pattern.compile("\\d+").matcher(s);
while (!matcher.hitEnd()) {
if (!matcher.find()) {
return res + s.substring(base);
}
String pre = s.substring(base, matcher.start());
base = matcher.end();
String number = matcher.group();
number = new StringBuilder(number).reverse().toString();
res += pre + number;
}
return res;
}
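The same loop can also be written with Matcher.appendReplacement and Matcher.appendTail, which take care of copying the text between matches. This is only a sketch of that variant; the class name DigitReverser is invented for the example. Because the replacement consists of digits only, it needs no escaping with Matcher.quoteReplacement.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DigitReverser {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    static String reverseDigits(String s) {
        Matcher m = DIGITS.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Reverse the current run of digits
            String reversed = new StringBuilder(m.group()).reverse().toString();
            // Copies the text since the previous match, then the replacement
            m.appendReplacement(out, reversed);
        }
        // Copies whatever follows the last match
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(reverseDigits("a12b34c56d")); // a21b43c65d
    }
}
```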

String str = "1234";
// Note: this swaps digits across the whole string (e.g. "a12b34" becomes "a43b21");
// it does not reverse each run of digits separately.
//indexes
int i = 0, j = str.length()-1;
// find digits (if any); test the bounds before calling charAt to avoid an exception
while (i < str.length() && !Character.isDigit(str.charAt(i))) {
i++;
}
while (j >= 0 && !Character.isDigit(str.charAt(j))) {
j--;
}
// while we haven't searched all the digits
while (i < j) {
// switch digits
str = str.substring(0, i) + str.charAt(j) + str.substring(i + 1, j) + str.charAt(i) + str.substring(j + 1);
i++;
j--;
// find the next digits
while (i < str.length() && !Character.isDigit(str.charAt(i))) {
i++;
}
while (j >= 0 && !Character.isDigit(str.charAt(j))) {
j--;
}
}
System.out.println(str);

Another dynamic approach without using regex classes:
public static String reverseOnlyNumbers(String s) {
StringBuilder digits = new StringBuilder();
StringBuilder result = new StringBuilder();
boolean start = false;
for (int i = 0; i < s.length(); i++) {
Character c = s.charAt(i);
if (Character.isDigit(c)) {
start = true;
digits.append(c);
}else {
start = false;
if (digits.length() > 0) {
result.append(digits.reverse().toString());
digits = new StringBuilder();
}
result.append(c);
}
}
return start ? result.append(digits.reverse()).toString() : result.toString();
}

Java Method is broken

I am trying to get it to return a compressed word. For example, reaction should be #act$. But it is getting returned as react$. I feel like my issue is not including the original word in the return statement. Can anyone help? Thanks!
public static String compress (String word) {
String newWord = "";
int the = word.indexOf("the");
if (the >= 0) {
newWord = word.substring(0,the) + "&" + word.substring(the+3);
}
int ion = newWord.indexOf("ion");
if (ion >= 0) {
newWord = newWord.substring(0,ion) + "$" + word.substring(ion+3);
}
int ing = newWord.indexOf("ing");
if (ing >= 0) {
newWord = newWord.substring(0,ing) + "~" + word.substring(ing+3);
}
int an = newWord.indexOf("an");
if (an >= 0) {
newWord = newWord.substring(0,an) + "#" + word.substring(an+2);
}
int re = newWord.indexOf("re");
if (re >= 0) {
newWord = newWord.substring(0,re) + "#" + word.substring(re+2);
}
int con = newWord.indexOf("con");
if (con >= 0) {
newWord = newWord.substring(0,con) + "%" + word.substring(con+3);
}
return newWord;
}

A more compact version using String.replace:
public static String compress(String word) {
word = word.replace("the", "&");
word = word.replace("ion", "$");
word = word.replace("ing", "~");
word = word.replace("an", "#");
word = word.replace("re","#");
word = word.replace("con","%");
return word;
}

You're mixing up your uses of newWord and word in a confusing way. If the first if clause doesn't fire, newWord will still be an empty string and none of the other conditions will fire. On the other hand, if newWord does get set to something, you still go on using word substrings, in ways that don't make any sense.
You would be better off just using one variable through the whole method.
public static String compress(String word) {
int the = word.indexOf("the");
if (the >= 0) {
word = word.substring(0,the) + "&" + word.substring(the+3);
}
int ion = word.indexOf("ion");
if (ion >= 0) {
word = word.substring(0,ion) + "$" + word.substring(ion+3);
}
int ing = word.indexOf("ing");
if (ing >= 0) {
word = word.substring(0,ing) + "~" + word.substring(ing+3);
}
int an = word.indexOf("an");
if (an >= 0) {
word = word.substring(0,an) + "#" + word.substring(an+2);
}
int re = word.indexOf("re");
if (re >= 0) {
word = word.substring(0,re) + "#" + word.substring(re+2);
}
int con = word.indexOf("con");
if (con >= 0) {
word = word.substring(0,con) + "%" + word.substring(con+3);
}
return word;
}
Note also that, written this way, you can only use each replacement once per word: if you have "thethe" you will compress it to "&the", not "&&". If you want to use replacements multiple times, you would have to use a loop. Or, easier still, use String.replace.
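To illustrate the String.replace suggestion, here is a small table-driven sketch. The RULES array and the class name Compressor are assumptions made for the example; the sequences, symbols and their order are taken from the method above.

```java
public class Compressor {
    // Each row is {sequence, symbol}; the order follows the method above.
    private static final String[][] RULES = {
        {"the", "&"}, {"ion", "$"}, {"ing", "~"},
        {"an", "#"}, {"re", "#"}, {"con", "%"}
    };

    static String compress(String word) {
        for (String[] rule : RULES) {
            // String.replace substitutes every occurrence, not just the first
            word = word.replace(rule[0], rule[1]);
        }
        return word;
    }

    public static void main(String[] args) {
        System.out.println(compress("reaction")); // #act$
        System.out.println(compress("thethe"));   // &&
    }
}
```

Keeping the rules in one ordered table makes the order of application explicit, which matters because an earlier replacement can remove letters that a later rule would otherwise match.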

Remove specified trailing and leading punctuation from a word (Java) [duplicate]

I have a constant called PUNCTUATION that contains everything that is defined to be punctuation, i.e.
PUNCTUATION = " !\"',;:.-_?)([]<>*#\n\t\r"
Only problem is that I am not sure how to use this to remove all leading and trailing punctuation from a specified word. I have tried methods like replaceAll and startsWith but have had no luck.
Any suggestions anyone?

Completely untested, but should work:
public static String trimChars(String source, String trimChars) {
char[] chars = source.toCharArray();
int length = chars.length;
int start = 0;
while (start < length && trimChars.indexOf(chars[start]) > -1) {
start++;
}
while (start < length && trimChars.indexOf(chars[length - 1]) > -1) {
length--;
}
if (start > 0 || length < chars.length) {
return source.substring(start, length);
} else {
return source;
}
}
And you'd call it this way:
String trimmed = trimChars(input, PUNCTUATION);

A method that clears all chars in a string from the start and end (this should be more time-efficient than applying regex patterns):
public class StringUtil {
// public so that callers can pass it to strip()
public static final String PUNCTUATION = " !\"',;:.-_?)([]<>*#\n\t\r";
public static String strip(String original, String charsToRemove) {
if (original == null) {
return null;
}
int end = original.length();
int start = 0;
char[] val = original.toCharArray();
// skip leading characters that are in charsToRemove
while (start < end && charsToRemove.indexOf(val[start]) >= 0) {
start++;
}
// skip trailing characters that are in charsToRemove
while (start < end && charsToRemove.indexOf(val[end - 1]) >= 0) {
end--;
}
return ((start > 0) || (end < original.length())) ? original.substring(start, end) : original;
}
}
Use like this:
assertEquals("abc", StringUtil.strip(" !abc;-< ", StringUtil.PUNCTUATION));

String PUNCTUATION = " !\"',;:.-_?)([]<>*#\n\t\r";
String pattern = "([" + PUNCTUATION.replaceAll("(.)", "\\\\$1") + "]+)";
//[\ \!\"\'\,\;\:\.\-\_\?\)\(\[\]\<\>\*\#\t\n]
pattern = "\\b" + pattern + "|" + pattern + "\\b";
String text = ".\n<>#aword,... \n\t..# asecondword,?";
System.out.println( text.replaceAll(pattern, "") );
//awordasecondword
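\b matches a word boundary.
First, put your characters into [ ] (a character class) and escape the special characters.
"\\b" + pattern matches punctuation that starts at a word boundary, which is the punctuation trailing a word, and
pattern + "\\b" matches punctuation that ends at a word boundary, which is the punctuation leading a word.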
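If only the two ends of a single word should be stripped, rather than the punctuation around every word in a longer text, the character class can be anchored to the start and end of the input instead of to word boundaries. The following is a sketch under that assumption; the class name PunctuationTrimmer and the method name trim are invented for the example.

```java
public class PunctuationTrimmer {
    static final String PUNCTUATION = " !\"',;:.-_?)([]<>*#\n\t\r";

    static String trim(String word) {
        StringBuilder cls = new StringBuilder("[");
        for (char c : PUNCTUATION.toCharArray()) {
            // A backslash before a non-alphanumeric character makes it a literal
            if (!Character.isLetterOrDigit(c)) {
                cls.append('\\');
            }
            cls.append(c);
        }
        cls.append("]+");
        // ^ ties the first alternative to the start of the input, \z the second to its very end
        return word.replaceAll("^" + cls + "|" + cls + "\\z", "");
    }

    public static void main(String[] args) {
        System.out.println(trim(" !abc;-< ")); // abc
        System.out.println(trim("a-b"));       // a-b (inner punctuation is kept)
    }
}
```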

How to truncate a HTML fragment to a given length(for preview) in Java? [duplicate]

Is there any utility (or sample source code) that truncates HTML (for preview) in Java? I want to do the truncation on the server and not on the client.
I'm using HTMLUnit to parse HTML.
UPDATE:
I want to be able to preview the HTML, so the truncator would maintain the HTML structure while stripping out the elements after the desired output length.

I've written another Java version of truncateHTML. This function truncates a string to a given number of characters while preserving whole words and HTML tags.
public static String truncateHTML(String text, int length, String suffix) {
// if the plain text is shorter than the maximum length, return the whole text
if (text.replaceAll("<.*?>", "").length() <= length) {
return text;
}
String result = "";
boolean trimmed = false;
if (suffix == null) {
suffix = "...";
}
/*
* This pattern creates tokens, where each line starts with the tag.
* For example, "One, <b>Two</b>, Three" produces the following:
* One,
* <b>Two
* </b>, Three
*/
Pattern tagPattern = Pattern.compile("(<.+?>)?([^<>]*)");
/*
* Checks for an empty tag, for example img, br, etc.
*/
Pattern emptyTagPattern = Pattern.compile("^<\\s*(img|br|input|hr|area|base|basefont|col|frame|isindex|link|meta|param).*>$");
/*
* Modified the pattern to also include H1-H6 tags
* Checks for closing tags, allowing leading and ending space inside the brackets
*/
Pattern closingTagPattern = Pattern.compile("^<\\s*/\\s*([a-zA-Z]+[1-6]?)\\s*>$");
/*
* Modified the pattern to also include H1-H6 tags
* Checks for opening tags, allowing leading and ending space inside the brackets
*/
Pattern openingTagPattern = Pattern.compile("^<\\s*([a-zA-Z]+[1-6]?).*?>$");
/*
* Find HTML entities such as &gt; ...
*/
Pattern entityPattern = Pattern.compile("(&[0-9a-z]{2,8};|&#[0-9]{1,7};|&#x[0-9a-f]{1,6};)");
// splits all html-tags to scanable lines
Matcher tagMatcher = tagPattern.matcher(text);
int numTags = tagMatcher.groupCount();
int totalLength = suffix.length();
List<String> openTags = new ArrayList<String>();
boolean proposingChop = false;
while (tagMatcher.find()) {
String tagText = tagMatcher.group(1);
String plainText = tagMatcher.group(2);
if (proposingChop &&
tagText != null && tagText.length() != 0 &&
plainText != null && plainText.length() != 0) {
trimmed = true;
break;
}
// if there is any html-tag in this line, handle it and add it (uncounted) to the output
if (tagText != null && tagText.length() > 0) {
boolean foundMatch = false;
// if it's an "empty element" with or without xhtml-conform closing slash
Matcher matcher = emptyTagPattern.matcher(tagText);
if (matcher.find()) {
foundMatch = true;
// do nothing
}
// closing tag?
if (!foundMatch) {
matcher = closingTagPattern.matcher(tagText);
if (matcher.find()) {
foundMatch = true;
// delete tag from openTags list
String tagName = matcher.group(1);
openTags.remove(tagName.toLowerCase());
}
}
// opening tag?
if (!foundMatch) {
matcher = openingTagPattern.matcher(tagText);
if (matcher.find()) {
// add tag to the beginning of openTags list
String tagName = matcher.group(1);
openTags.add(0, tagName.toLowerCase());
}
}
// add html-tag to result
result += tagText;
}
// calculate the length of the plain text part of the line; handle entities (e.g. &nbsp;) as one character
int contentLength = plainText.replaceAll("&[0-9a-z]{2,8};|&#[0-9]{1,7};|&#x[0-9a-f]{1,6};", " ").length();
if (totalLength + contentLength > length) {
// the number of characters which are left
int numCharsRemaining = length - totalLength;
int entitiesLength = 0;
Matcher entityMatcher = entityPattern.matcher(plainText);
while (entityMatcher.find()) {
String entity = entityMatcher.group(1);
if (numCharsRemaining > 0) {
numCharsRemaining--;
entitiesLength += entity.length();
} else {
// no more characters left
break;
}
}
// keep us from chopping words in half
int proposedChopPosition = numCharsRemaining + entitiesLength;
int endOfWordPosition = plainText.indexOf(" ", proposedChopPosition-1);
if (endOfWordPosition == -1) {
endOfWordPosition = plainText.length();
}
int endOfWordOffset = endOfWordPosition - proposedChopPosition;
if (endOfWordOffset > 6) { // chop the word if it's extra long
endOfWordOffset = 0;
}
proposedChopPosition = numCharsRemaining + entitiesLength + endOfWordOffset;
if (plainText.length() >= proposedChopPosition) {
result += plainText.substring(0, proposedChopPosition);
proposingChop = true;
if (proposedChopPosition < plainText.length()) {
trimmed = true;
break; // maximum length is reached, so get off the loop
}
} else {
result += plainText;
}
} else {
result += plainText;
totalLength += contentLength;
}
// if the maximum length is reached, get off the loop
if(totalLength >= length) {
trimmed = true;
break;
}
}
for (String openTag : openTags) {
result += "</" + openTag + ">";
}
if (trimmed) {
result += suffix;
}
return result;
}

I think you're going to need to write your own XML parser to accomplish this. Pull out the body node, add nodes while the binary length stays below some fixed size, and then rebuild the document. If HTMLUnit doesn't create semantic XHTML, I'd recommend tagsoup.
If you need an XML parser/handler, I'd recommend XOM.
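A rough sketch of the parse, trim and rebuild idea, using the DOM parser that ships with the JDK rather than XOM or tagsoup. It assumes the input is already well-formed XHTML (named entities such as &nbsp; would be rejected without a DTD) and it counts characters rather than bytes. The class name DomTruncator is made up for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomTruncator {
    // Remaining character budget, shared across the recursive walk
    private int remaining;

    private DomTruncator(int maxChars) {
        this.remaining = maxChars;
    }

    static String truncate(String xhtml, int maxChars) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            new DomTruncator(maxChars).walk(doc.getDocumentElement());
            // Serialize the trimmed tree back to a string
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    private void walk(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // fetch before a possible removal
            if (remaining <= 0) {
                node.removeChild(child); // budget used up: drop everything that follows
            } else if (child.getNodeType() == Node.TEXT_NODE) {
                String text = child.getNodeValue();
                if (text.length() > remaining) {
                    child.setNodeValue(text.substring(0, remaining));
                }
                remaining -= Math.min(text.length(), remaining);
            } else {
                walk(child); // elements cost nothing themselves; only their text counts
            }
            child = next;
        }
    }

    public static void main(String[] args) {
        System.out.println(truncate("<div><p>Hello <b>world</b></p><p>more</p></div>", 8));
    }
}
```

Because the tree is edited in place and then serialized, every element that survives is closed properly, which is the main advantage over the regex-based approaches.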

There is a PHP function that does it here: http://snippets.dzone.com/posts/show/7125
I've made a quick and dirty Java port of the initial version, but there are subsequent improved versions in the comments that could be worth considering (especially one that deals with whole words):
public static String truncateHtml(String s, int l) {
Pattern p = Pattern.compile("<[^>]+>([^<]*)");
int i = 0;
List<String> tags = new ArrayList<String>();
Matcher m = p.matcher(s);
while(m.find()) {
if (m.start(0) - i >= l) {
break;
}
String t = StringUtils.split(m.group(0), " \t\n\r\0\u000B>")[0].substring(1);
if (t.charAt(0) != '/') {
tags.add(t);
} else if ( tags.get(tags.size()-1).equals(t.substring(1))) {
tags.remove(tags.size()-1);
}
i += m.start(1) - m.start(0);
}
Collections.reverse(tags);
return s.substring(0, Math.min(s.length(), l+i))
+ ((tags.size() > 0) ? "</"+StringUtils.join(tags, "></")+">" : "")
+ ((s.length() > l) ? "\u2026" : "");
}
Note: You'll need Apache Commons Lang for StringUtils.split() and StringUtils.join().

I can offer you a Python script I wrote to do this: http://www.ellipsix.net/ext-tmp/summarize.txt. Unfortunately I don't have a Java version, but feel free to translate it yourself and modify it to suit your needs if you want. It's not very complicated, just something I hacked together for my website, but I've been using it for a little more than a year and it generally seems to work pretty well.
If you want something robust, an XML (or SGML) parser is almost certainly a better idea than what I did.

I found this blog: dencat: Truncating HTML in Java
It contains a Java port of the Python Django template function truncate_html_words.

public class SimpleHtmlTruncator {
public static String truncateHtmlWords(String text, int max_length) {
String input = text.trim();
if (max_length > input.length()) {
return input;
}
if (max_length < 0) {
return new String();
}
StringBuilder output = new StringBuilder();
/**
* Pattern pattern_opentag = Pattern.compile("(<[^/].*?[^/]>).*");
* Pattern pattern_closetag = Pattern.compile("(</.*?[^/]>).*"); Pattern
* pattern_selfclosetag = Pattern.compile("(<.*?/>).*");*
*/
String HTML_TAG_PATTERN = "<(\"[^\"]*\"|'[^']*'|[^'\">])*>";
Pattern pattern_overall = Pattern.compile(HTML_TAG_PATTERN + "|" + "\\s*\\w*\\s*");
Pattern pattern_html = Pattern.compile("(" + HTML_TAG_PATTERN + ")" + ".*");
Pattern pattern_words = Pattern.compile("(\\s*\\w*\\s*).*");
int characters = 0;
Matcher all = pattern_overall.matcher(input);
while (all.find()) {
String matched = all.group();
Matcher html_matcher = pattern_html.matcher(matched);
Matcher word_matcher = pattern_words.matcher(matched);
if (html_matcher.matches()) {
output.append(html_matcher.group());
} else if (word_matcher.matches()) {
if (characters < max_length) {
String word = word_matcher.group();
if (characters + word.length() < max_length) {
output.append(word);
} else {
output.append(word.substring(0,
(max_length - characters) > word.length()
? word.length() : (max_length - characters)));
}
characters += word.length();
}
}
}
return output.toString();
}
public static void main(String[] args) {
String text = SimpleHtmlTruncator.truncateHtmlWords("<html><body><br/><p>abc</p><p>defghij</p><p>ghi</p></body></html>", 4);
System.out.println(text);
}
}
