How to tokenize a String like a lexer in Java?
Please refer to the question above. I have never used Java regex. How can I build a new string in which every matched symbol (such as '(' ')' '.' '<' '>' '"') is separated from its neighbours by a single space? For example, before the regex:
String c= "List<String> uncleanList = Arrays.asList(input1.split("x"));" ;
I want the resulting string to look like this:
String r= " List < String > uncleanList = Arrays . asList ( input1 . split ( " x " ) ) ; "
Referring to the code that you linked to, matcher.group() will give you a single token. Simply use a StringBuilder to append each token followed by a space, which gives you a new string in which the tokens are space-separated.
String c = "List<String> uncleanList = Arrays.asList(input1.split(\"x\"));" ;
Pattern pattern = Pattern.compile("\\w+|[+-]?[0-9\\._Ee]+|\\S");
Matcher matcher = pattern.matcher(c);
StringBuilder sb = new StringBuilder();
while (matcher.find()) {
String token = matcher.group();
sb.append(token).append(" ");
}
String r = sb.toString();
System.out.println(r);
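A slightly more reusable variant of the same idea, as a minimal sketch: collect the tokens in a list and join them with single spaces, which also avoids the trailing space. The class name LexerSketch, the method names, and the simplified pattern \w+|\S (a run of word characters, or any single non-space character) are assumptions for illustration, not part of the linked code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LexerSketch {
    // Assumed token rule: a run of word characters, or one non-whitespace symbol.
    private static final Pattern TOKEN = Pattern.compile("\\w+|\\S");

    // Returns the tokens in the order they appear in the input.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    // Joins the tokens with exactly one space between them.
    static String spaced(String input) {
        return String.join(" ", tokenize(input));
    }

    public static void main(String[] args) {
        String c = "List<String> uncleanList = Arrays.asList(input1.split(\"x\"));";
        System.out.println(spaced(c));
    }
}
```

Keeping the tokens in a list also lets you inspect or filter them before joining, which is closer to what a lexer would hand to a parser.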
String c = "List<String> uncleanList = Arrays.asList(input1.split('x'));";
// Pad every listed symbol exactly once. Calling c.replace(symbol, ...) inside a
// matcher.find() loop would pad a repeated symbol again for each occurrence found.
c = c.replaceAll("[<>\".()]", " $0 ");
Actually, if you look closely, you only have to separate the characters that are neither letters nor spaces: ((?![a-zA-Z]|\ ).)
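That observation can be sketched with a negated character class instead of a lookahead. This is only an illustration under stated assumptions: the class name SymbolSpacer and the method spaceOut are hypothetical, and the class [^\w\s] treats digits and underscores as part of a word (unlike the letters-only pattern above) so that names like input1 stay in one piece.

```java
class SymbolSpacer {
    // Pads every character that is neither a word character nor whitespace,
    // then collapses runs of whitespace so tokens are one space apart.
    static String spaceOut(String input) {
        return input
                .replaceAll("[^\\w\\s]", " $0 ") // $0 is the matched symbol itself
                .replaceAll("\\s+", " ")
                .trim();
    }

    public static void main(String[] args) {
        String c = "List<String> uncleanList = Arrays.asList(input1.split(\"x\"));";
        System.out.println(spaceOut(c));
    }
}
```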
Related
How to extract the strings between the delimiters '<' and '>' from the string
“Rahul<is>an<entrepreneur>”
I tried using the substring() method, but I could only extract one string from the primary string. How can I loop over it and get all the strings between the delimiters?
You could use Pattern and Matcher for pattern lookup. For example, see code below:
String STR = "Rahul<is>an<entrepreneur>";
Pattern pattern = Pattern.compile("<(.*?)>", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(STR);
while (matcher.find()) {
System.out.println(matcher.start() + " " + matcher.end() + " " + matcher.group());
}
The output gives you the start and end indexes of each match, followed by the matched substring:
5 9 <is>
11 25 <entrepreneur>
If you just want the strings without the delimiters, take the substring between the match's start and end indexes (inside the loop):
STR.substring(matcher.start() + 1, matcher.end() - 1);
This gives you only the matching strings.
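The same result can be had without index arithmetic by reading the capture group directly. A minimal sketch, assuming the delimiters are never nested; the class name BetweenDelimiters and the method extract are made up for this example, and the pattern <([^>]*)> is a variation on the one above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BetweenDelimiters {
    // Group 1 captures everything between '<' and the next '>'.
    private static final Pattern ANGLE = Pattern.compile("<([^>]*)>");

    static List<String> extract(String input) {
        List<String> parts = new ArrayList<>();
        Matcher matcher = ANGLE.matcher(input);
        while (matcher.find()) {
            parts.add(matcher.group(1)); // group(1) excludes the delimiters
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(extract("Rahul<is>an<entrepreneur>"));
    }
}
```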
This worked for me:
String str = "Rahul<is>an<entrepreneur>";
String[] tempStr = str.split("<");
for (String st : tempStr) {
if (st.contains(">")) {
int index = st.indexOf('>');
System.out.println(st.substring(0, index));
}
}
Output:
is
entrepreneur
I want to find and replace a substring beginning with string 'sps.jsp' and ending with substring 'FILE_ARRAY_INDEX=12'.
Following is my string content
beginning with strings............[sps.jsp]..anything between.. [FILE_ARRAY_INDEX=12] ending with some strings....
Below is my code
Pattern r = Pattern.compile("sps.jsp[\\s\\S]*?FILE_ARRAY_INDEX=12");
Matcher m = r.matcher(InputStr);
if (m.find( ))
{
System.out.println("Found value: " + m.group() );
}
I'm not able to get my pattern and replace it with a new string.
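All you need is String::replaceAll with the regex sps\.jsp(.*?)FILE_ARRAY_INDEX=12 (the dot is escaped so that it only matches a literal '.'):
String inputStr = "....";//your input
// The surrounding [ and ] are outside the match, so they stay in place.
inputStr = inputStr.replaceAll("sps\\.jsp(.*?)FILE_ARRAY_INDEX=12", "some string");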
Outputs
beginning with strings............[some string] ending with some strings....
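If the start and end markers or the replacement text come from variables, it may be safer to quote them. The following is a sketch under assumptions: the class SpanReplacer and its method are hypothetical names, Pattern.DOTALL is added on the assumption that the text between the markers may span several lines, and Matcher.quoteReplacement keeps any '$' or '\' in the replacement from being interpreted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpanReplacer {
    // Replaces everything from `start` through the nearest following `end`.
    static String replaceSpan(String input, String start, String end, String replacement) {
        Pattern p = Pattern.compile(
                Pattern.quote(start) + ".*?" + Pattern.quote(end), // markers matched literally
                Pattern.DOTALL);                                    // let '.' cross line breaks
        return p.matcher(input).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String in = "begin [sps.jsp]..anything between.. [FILE_ARRAY_INDEX=12] end";
        System.out.println(replaceSpan(in, "sps.jsp", "FILE_ARRAY_INDEX=12", "some string"));
    }
}
```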
I have to replace a set of substrings in a String with other substrings, for example:
"^t" with "\t"
"^=" with "\u2014"
"^+" with "\u2013"
"^s" with "\u00A0"
"^?" with "."
"^#" with "\\d"
"^$" with "[a-zA-Z]"
So, I've tried with:
String oppip = "pippo^t^# p^+alt^shefhjkhfjkdgfkagfafdjgbcnbch^";
Map<String,String> tokens = new HashMap<String,String>();
tokens.put("^t", "\t");
tokens.put("^=", "\u2014");
tokens.put("^+", "\u2013");
tokens.put("^s", "\u00A0");
tokens.put("^?", ".");
tokens.put("^#", "\\d");
tokens.put("^$", "[a-zA-Z]");
String regexp = "^t|^=|^+|^s|^?|^#|^$";
StringBuffer sb = new StringBuffer();
Pattern p = Pattern.compile(regexp);
Matcher m = p.matcher(oppip);
while (m.find())
m.appendReplacement(sb, tokens.get(m.group()));
m.appendTail(sb);
System.out.println(sb.toString());
But it doesn't work. tokens.get(m.group()) throws an exception.
Any idea why?
You don't have to use a HashMap. Consider using simple arrays, and a loop:
String oppip = "pippo^t^# p^+alt^shefhjkhfjkdgfkagfafdjgbcnbch^";
String[] searchFor =
{"^t", "^=", "^+", "^s", "^?", "^#", "^$"},
replacement =
{"\\t", "\\u2014", "\\u2013", "\\u00A0", ".", "\\d", "[a-zA-Z]"};
for (int i = 0; i < searchFor.length; i++)
oppip = oppip.replace(searchFor[i], replacement[i]);
// Print the result.
System.out.println(oppip);
For completeness, you can use a two-dimensional array for a similar approach:
String oppip = "pippo^t^# p^+alt^shefhjkhfjkdgfkagfafdjgbcnbch^";
String[][] tasks =
{
{"^t", "\\t"},
{"^=", "\\u2014"},
{"^+", "\\u2013"},
{"^s", "\\u00A0"},
{"^?", "."},
{"^#", "\\d"},
{"^$", "[a-zA-Z]"}
};
for (String[] replacement : tasks)
oppip = oppip.replace(replacement[0], replacement[1]);
// Print the result.
System.out.println(oppip);
In regex the ^ means "begin-of-text" (or "not" within a character class as negation). You have to place a backslash before it, which becomes two backslashes in a java String.
String regexp = "\\^[t=+s?#$]";
I have reduced it a bit further.
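Putting the escaped pattern together with the original map-based loop could look like the sketch below. The class name CaretTokens and the method expand are assumptions for illustration. It also wraps each replacement in Matcher.quoteReplacement, because appendReplacement treats '\' and '$' in the replacement string specially, so a value such as "\\d" would otherwise lose its backslash.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaretTokens {
    private static final Map<String, String> TOKENS = new HashMap<>();
    static {
        TOKENS.put("^t", "\t");
        TOKENS.put("^=", "\u2014");
        TOKENS.put("^+", "\u2013");
        TOKENS.put("^s", "\u00A0");
        TOKENS.put("^?", ".");
        TOKENS.put("^#", "\\d");
        TOKENS.put("^$", "[a-zA-Z]");
    }

    // A literal caret followed by one of the seven token characters.
    private static final Pattern TOKEN = Pattern.compile("\\^[t=+s?#$]");

    static String expand(String input) {
        StringBuffer sb = new StringBuffer();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // Every possible match has a map entry, so get() never returns null here.
            m.appendReplacement(sb, Matcher.quoteReplacement(TOKENS.get(m.group())));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("pippo^t^# p^+alt^shefhjkhfjkdgfkagfafdjgbcnbch^"));
    }
}
```

A caret that is not followed by one of the listed characters, such as the trailing one in the sample input, is left untouched.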
I am currently trying to test the regex pattern matching the following
[#123456]
[#aabc36]
I have tried #[A-Fa-f0-9]{6}|[A-Fa-f0-9]{3} and successfully match #aabc36 but when it comes to adding the brackets [] , it fails.
I have tried below pattern for matching
[#[A-Fa-f0-9]{6}|[A-Fa-f0-9]{3}]
The below is my method for regex replacement
public String replaceColor(String text , String bbcode , String imageLocation ){
//"\\[("+bbcode+")\\]" for [369] , [sosad]
// String imageLocation = "file:///android_asset/smileyguy.png";
// builder.append("<img src=\"" + imageLocation + "\" />");
StringBuffer imageBuffer = new StringBuffer ("");
// Pattern pattern = Pattern.compile("\\"+bbcode);
Pattern pattern = Pattern.compile(Pattern.quote(bbcode));
Matcher matcher = pattern.matcher(text);
//populate the replacements map ...
StringBuilder builder = new StringBuilder();
int i = 0;
while (matcher.find()) {
//String orginal = replacements.get(matcher.group(1));
imageBuffer.append("<img src=\"" + imageLocation + "\" />");
String replacement = imageBuffer.toString();
builder.append(text.substring(i, matcher.start()));
if (replacement == null) {
builder.append(matcher.group(0));
} else {
builder.append(replacement);
}
i = matcher.end();
}
builder.append(text.substring(i, text.length()));
return builder.toString();
}
To match [ and ] literally, you should escape them. Otherwise they are treated as metacharacters that delimit a character class (a set of characters).
\[#[A-Fa-f0-9]{6}\]|\[[A-Fa-f0-9]{3}\]
In Java string literals, \ itself must be escaped:
"\\[#[A-Fa-f0-9]{6}\\]|\\[[A-Fa-f0-9]{3}\\]"
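You need to escape the brackets with a \ in order to match them, as they are regex metacharacters. The alternation also needs a group; otherwise | splits the whole pattern into "[#" plus six hex digits on one side and three hex digits plus "]" on the other:
\[#(?:[A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})\]
In a Java string you will also need to escape the backslash, so:
String pattern = "\\[#(?:[A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})\\]";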
If you want to include brackets in the pattern to match, you must escape them with a \. But because Java already uses \ as an escape character in string literals, you must use two of them: "\\[...\\]"
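A quick way to check an escaped pattern is to run it against the sample inputs. The sketch below is illustrative only: the class BracketedColor and its methods are hypothetical, and it assumes that both the six-digit and the three-digit form carry the leading '#'.

```java
import java.util.regex.Pattern;

class BracketedColor {
    // Escaped brackets, with the two digit counts grouped so '|' applies only to them.
    private static final Pattern COLOR =
            Pattern.compile("\\[#(?:[A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})\\]");

    // True only if the whole string is one bracketed color tag.
    static boolean isColorTag(String s) {
        return COLOR.matcher(s).matches();
    }

    // Removes every bracketed color tag from the text.
    static String stripTags(String text) {
        return COLOR.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(isColorTag("[#123456]"));
        System.out.println(isColorTag("[#aabc36]"));
        System.out.println(stripTags("red[#ff0000] and short[#f00]"));
    }
}
```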
I am trying to parse out 3 pieces of information from a String.
Here is my code:
text = "H:7 E:7 P:10";
String pattern = "[HEP]:";
Pattern p = Pattern.compile(pattern);
String[] attr = p.split(text);
I would like it to return:
String[0] = "7"
String[1] = "7"
String[2] = "10"
But all I am getting is:
String[0] = ""
String[1] = "7 "
String[2] = "7 "
String[3] = "10"
Any suggestions?
A not-so-elegant solution I just devised:
String text = "H:7 E:7 P:10";
String pattern = "[HEP]:";
text = text.replaceAll(pattern, "");
String[] attr = text.split(" ");
From the javadoc, http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html#split(java.lang.CharSequence) :
The array returned by this method contains each substring of the input
sequence that is terminated by another subsequence that matches this
pattern or is terminated by the end of the input sequence.
You get the empty string first because you have a match at the beginning of the string, it seems.
If I try your code with String text = "A H:7 E:7 P:10" I get indeed:
A 7 7 10
Hope it helps.
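If you want to keep the split approach, one option is to let the pattern swallow the whitespace in front of each label and then drop the empty leading element. This is a minimal sketch, assuming Java 8 or later for the stream calls; the class AttrSplitter and the method values are names invented for the example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class AttrSplitter {
    // Optional whitespace followed by one of the labels H:, E: or P:
    private static final Pattern LABEL = Pattern.compile("\\s*[HEP]:");

    static String[] values(String text) {
        return Arrays.stream(LABEL.split(text))
                .filter(s -> !s.isEmpty()) // drop the empty piece before the first label
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(values("H:7 E:7 P:10")));
    }
}
```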
I would write a full regular expression like the following:
Pattern pattern = Pattern.compile("H:(\\d+)\\sE:(\\d+)\\sP:(\\d+)");
Matcher matcher = pattern.matcher("H:7 E:7 P:10");
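if (!matcher.matches()) {
// No match: the group() calls below would throw IllegalStateException, so fail fast here.
throw new IllegalArgumentException("Input does not look like 'H:<n> E:<n> P:<n>'");
}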
String hValue = matcher.group(1);
String eValue = matcher.group(2);
String pValue = matcher.group(3);
Based on your comment, I take it that you only want to get the numbers from that string (in a particular order?).
So I would recommend something like this:
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("H:7 E:7 P:10");
while(m.find()) {
System.out.println(m.group());
}