I have a string like this:
"This is #awesome #dude"
From this string I want to extract awesome and dude and build the string
output==> "awesome,dude"
My code looks like this:
Matcher matcher = Pattern.compile("(?<=#)\\w+").matcher(textStr);
while (matcher.find()){
mergedStr += matcher.group() +",";
}
But this leaves an artifact at the end:
output==> "awesome,dude," //<-- egghh comma.. in the end
What is a better way to solve this?
Another approach:
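Matcher matcher = Pattern.compile("(?<=#)\\w+").matcher(textStr);
String mergedStr = "";
boolean needComma = false;
while (matcher.find()) {
    if (needComma) {
        mergedStr += ","; // separator goes before every item except the first
    }
    mergedStr += matcher.group();
    needComma = true;
}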
But there are a dozen different approaches.
This is one option:
Matcher matcher = Pattern.compile("(?<=#)\\w+").matcher(textStr);
while (matcher.find()){
if (!mergedStr.isEmpty())
mergedStr += ",";
mergedStr += matcher.group();
}
Here is another common approach:
Matcher matcher = Pattern.compile("(?<=#)\\w+").matcher(textStr);
StringBuilder sb = new StringBuilder();
while (matcher.find()){
sb.append(matcher.group()).append(",");
}
return sb.toString().replaceAll(",$", "");
If you don't want to use a regex to strip the trailing comma, you could do it like this:
Matcher matcher = Pattern.compile("(?<=#)\\w+").matcher(textStr);
StringBuilder sb = new StringBuilder();
while (matcher.find()){
sb.append(matcher.group()).append(",");
}
if (sb.length() == 0) {
return "";
}
else {
return sb.toString().substring(0, sb.length() - 1);
}
A useful pattern that I often use for this kind of thing is to append the first item, and then append the remainder of the items preceded by the separator. This avoids unnecessary conditionals in loops or postprocessing to remove trailing separators.
I know, microoptimizations blah, blah, sixth circle of hell, blah, blah, but just including here for your amusement:
Matcher matcher = Pattern.compile("(?<=#)\\w+").matcher(textStr);
StringBuilder mergedStr = new StringBuilder();
if (matcher.find()) {
mergedStr.append(matcher.group());
while (matcher.find()) {
mergedStr.append(',').append(matcher.group());
}
}
return mergedStr.toString();
Also, I'm not 100% convinced that replacing a quadratic algorithm (string concatenation) with a linear algorithm (StringBuilder) qualifies as a microoptimization in the bad sense.
String input = "#awesome#dude";
List<String> strSplit = new ArrayList<String>();
String result = "";
Matcher matcher = Pattern.compile("(?<=#)\\w+").matcher(input);
while (matcher.find()){
strSplit.add(matcher.group());
}
for(int j = 0; j< strSplit.size(); j++){
result = result + strSplit.get(j);
if(j < strSplit.size() -1){
result = result+",";
}
}
System.out.println("Result : " + result);
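On Java 8 or later, java.util.StringJoiner takes care of the separator bookkeeping that the answers above do by hand. A minimal sketch, assuming the same pattern as in the question; the class and method names (HashtagJoiner, extractTags) are made up for illustration:

```java
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashtagJoiner {
    // Collects every word that follows a '#' and joins the words with commas.
    static String extractTags(String text) {
        Matcher matcher = Pattern.compile("(?<=#)\\w+").matcher(text);
        StringJoiner joiner = new StringJoiner(",");
        while (matcher.find()) {
            joiner.add(matcher.group()); // the joiner inserts "," only between items
        }
        return joiner.toString(); // empty string when nothing matched
    }

    public static void main(String[] args) {
        System.out.println(extractTags("This is #awesome #dude"));
    }
}
```

Because the joiner only ever puts the separator between elements, there is no trailing comma to strip and no first-item special case.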
I would like to have a Java method for replacing the leading digits in XML element names. For example, <1396-tt5m>25K</1396-tt5m> needs to be transformed to <a-tt5m>25K</a-tt5m>. Please take a look at my method:
public static String removeLeadNumbersFromXMLTagElements(String xml) throws TransformerException {
Pattern p = Pattern.compile("(<[^>]*?[^[0-9]][^>]*?>)");
Matcher m = p.matcher(xml);
StringBuffer result = new StringBuffer();
while (m.find()) {
String replace = m.group().replaceAll("[^[0-9]]+", "a");
m.appendReplacement(result, replace);
}
m.appendTail(result);
return result.toString();
}
But the result of my method is <a-ttam>25K</a-ttam>. Could you please help me with the correct regex? Thank you in advance.
Try using this:
public static String removeLeadNumbersFromXMLTagElements(String xml) throws TransformerException {
Pattern p = Pattern.compile("(\\<.*?)[0-9]+(.*?\\>)");
Matcher m = p.matcher(xml);
StringBuffer result = new StringBuffer();
while (m.find()) {
String replace = m.group(1) + "a" + m.group(2);
m.appendReplacement(result, replace);
}
m.appendTail(result);
return result.toString();
}
This is not exactly what you asked for, but it should solve the problem. It takes each tag and replaces any leading digits with "a", leaving everything else alone. This code replaces your while loop. Your code is fine for identifying tags, but (as you noted) it is replacing all digits, not just the leading ones.
while (m.find()) {
//System.out.println(m.group());
String work = m.group();
String replace = m.group();
if (work.substring(0, 2).equals("</")) {
//System.out.println("end tag");
if (work.length() > 2 && Character.isDigit(work.charAt(2))) {
replace = "</a";
int i = 3;
while (i < work.length() && Character.isDigit(work.charAt(i))) {
i++;
}
replace += work.substring(i);
}
} else if (work.substring(0, 1).equals("<")) {
//System.out.println("begin tag");
if (work.length() > 1 && Character.isDigit(work.charAt(1))) {
replace = "<a";
int i = 2;
while (i < work.length() && Character.isDigit(work.charAt(i))) {
i++;
}
replace += work.substring(i);
}
}
m.appendReplacement(result, replace);
}
My solution: I finally found that I can use String replace = m.group().replaceFirst("[^[0-9]]+", "a") instead of replaceAll. That also works!
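If the digits to replace always come directly after < or </, a single replaceAll with a captured prefix may be enough. A minimal sketch under that assumption; the names LeadingDigits and replaceLeadingDigits are hypothetical, and the method treats the XML as plain text, so a literal "<5" inside text content would be rewritten too:

```java
class LeadingDigits {
    // Replaces a run of digits that directly follows "<" or "</" with "a".
    static String replaceLeadingDigits(String xml) {
        // Group 1 keeps the "<" or "</" prefix; only the digit run is swapped out.
        return xml.replaceAll("(</?)[0-9]+", "$1a");
    }

    public static void main(String[] args) {
        System.out.println(replaceLeadingDigits("<1396-tt5m>25K</1396-tt5m>"));
    }
}
```

Digits later in the name (the 5 in tt5m) and digits in the element content (25K) are not preceded by < or </, so they are left untouched.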
I'm using a regex to find a pattern.
I need to find all matches like this:
input :"word1_word2_word3_..."
result: "word1_word2","word2_word3", "word4_word5" ..
It can be done using (?=) positive lookahead.
Regex: (?=(?:_|^)([^_]+_[^_]+))
Java code:
String text = "word1_word2_word3_word4_word5_word6_word7";
String regex = "(?=(?:_|^)([^_]+_[^_]+))";
Matcher matcher = Pattern.compile(regex).matcher(text);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
Output:
word1_word2
word2_word3
word3_word4
...
You can do it without regex, using split:
String input = "word1_word2_word3_word4";
String[] words = input.split("_");
List<String> outputs = new LinkedList<>();
for (int i = 0; i < words.length - 1; i++) {
String first = words[i];
String second = words[i + 1];
outputs.add(first + "_" + second);
}
for (String output : outputs) {
System.out.println(output);
}
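The lookahead answer above works because a lookahead consumes no characters: each match is empty, so the matcher moves on one position at a time and the capturing group can overlap the previous result. A small sketch that collects the pairs into a list instead of printing them; the class and method names (OverlappingPairs, pairs) are illustrative only:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlappingPairs {
    // Returns every adjacent pair "wordN_wordN+1" from an underscore-separated string.
    static List<String> pairs(String text) {
        // The whole pattern is a lookahead, so each match has zero length
        // and the pair itself is read from capturing group 1.
        Matcher matcher = Pattern.compile("(?=(?:_|^)([^_]+_[^_]+))").matcher(text);
        List<String> result = new ArrayList<>();
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(pairs("word1_word2_word3_word4"));
    }
}
```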
Actually this is a very simple question, I tried a lot but I am unable to get the exact solution. I have a string like:
String mystring = "one<1234567>,two<98765432>,three<878897656>";
Here I want the data which is inside "<" and ">". Can anyone help me with this?
I would use regex
String str = "one<1234567>,two<98765432>,three<878897656>";
Matcher m = Pattern.compile("<(.+?)>").matcher(str);
while(m.find()) {
String v = m.group(1);
}
Try
String mystring = "one<1234567>,two<98765432>,three<878897656>";
String[] result = mystring.split(",");
for (String s : result) {
s = s.substring(s.indexOf("<")+1);
s = s.substring(0, s.indexOf(">"));
System.out.println(s);
}
Print result :
1234567
98765432
878897656
You can use a regex like <(.*?)> :
String mystring = "one<1234567>,two<98765432>,three<878897656>";
Pattern pattern = Pattern.compile("<(.*?)>");
Matcher matcher = pattern.matcher(mystring);
while (matcher.find())
{
System.out.println(matcher.group(1));
}
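The regex answers above print each value as it is found. If the values are needed later, a sketch along the following lines collects them into a list. The names (AngleBracketExtractor, extract) are hypothetical, and the character class [^<>] is a small variation on the lazy .*? used above, chosen so a match cannot run across another bracket:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AngleBracketExtractor {
    // Returns the text found between each "<" and the next ">".
    static List<String> extract(String input) {
        Matcher matcher = Pattern.compile("<([^<>]*)>").matcher(input);
        List<String> values = new ArrayList<>();
        while (matcher.find()) {
            values.add(matcher.group(1)); // group 1 excludes the brackets
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(extract("one<1234567>,two<98765432>,three<878897656>"));
    }
}
```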
Try this:
String mystring = "one<1234567>,two<98765432>,three<878897656>";
String[] a = mystring.split(",");
for (int i = 0; i < a.length; i++) {
    // start one past '<' so the bracket itself is not included
    String substr = a[i].substring(a[i].indexOf("<") + 1, a[i].indexOf(">"));
    System.out.println(substr);
}
Try this if the value inside the brackets is always numeric and everything outside it is lowercase letters or the brackets themselves (< and >):
String[] strings = mystring.replaceAll("[a-z<>]", "").split(",");
for (String string : strings)
{
    System.out.println(string);
}
I found another solution using the StringTokenizer class.
You can use it like this:
StringTokenizer tokens = new StringTokenizer(KEY_SUBFOLDERNAME, ".");
String first_string = tokens.nextToken();
File_Ext = tokens.nextToken();
System.out.println("First_string : "+first_string);
System.out.println("File_Ext : "+File_Ext);
I want to extract a part of a URL that sits in the middle of it, using a regex in Java.
This is what I tried. The main problem in detecting regex+java is that it sits in the middle of the last part of the URL, and I have no idea how to ignore the characters after it; my regex only ignores what comes before it:
String regex = "https://www\\.google\\.com/(search)?q=([^/]+)/";
String url = "https://www.google.com/search?q=regex+java&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a";
Pattern pattern = Pattern.compile (regex);
Matcher matcher = pattern.matcher (url);
if (matcher.matches ())
{
int n = matcher.groupCount ();
for (int i = 0; i <= n; ++i)
System.out.println (matcher.group (i));
}
}
The result should be regex+java or even regex java, but my code didn't work.
Try:
String regex = "https://www\\.google\\.com/search\\?q=([^&]+).*";
String url = "https://www.google.com/search?q=regex+java&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a";
Pattern pattern = Pattern.compile (regex);
Matcher matcher = pattern.matcher (url);
if (matcher.matches ())
{
int n = matcher.groupCount ();
for (int i = 0; i <= n; ++i)
System.out.println (matcher.group (i));
}
The result is:
https://www.google.com/search?q=regex+java&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a
regex+java
EDIT
Replacing all pluses before printing:
for (int i = 0; i <= n; ++i) {
String str = matcher.group (i).replaceAll("\\+", " ");
System.out.println (str);
}
String regex = "https://www\\.google\\.com/?(search)\\?q=([^&]+)?";
String url = "https://www.google.com/search?q=regex+java&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(url);
while (matcher.find()) {
    System.out.println(matcher.group(2)); // group 2 holds the q value: regex+java
}
This should do the job.
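The original attempt fails partly because matches() only succeeds when the pattern covers the entire input, whereas find() looks for the pattern anywhere in it. A minimal sketch that uses find() and also decodes the value, so that + becomes a space; the class and method names (QueryValue, extractQuery) are invented for this example, and it assumes the parameter is always called q:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QueryValue {
    // Returns the decoded value of the "q" parameter, or null if it is absent.
    static String extractQuery(String url) {
        // "q=" must follow '?' or '&'; the value ends at the next '&' or '#'.
        Matcher matcher = Pattern.compile("[?&]q=([^&#]*)").matcher(url);
        if (!matcher.find()) {
            return null;
        }
        try {
            return URLDecoder.decode(matcher.group(1), "UTF-8"); // turns '+' into ' '
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(extractQuery(
            "https://www.google.com/search?q=regex+java&ie=utf-8&oe=utf-8"));
    }
}
```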
Matcher only offers two replace methods:
replaceFirst() and replaceAll()
I want to limit the replacement to 3 times. How can I do that?
Example:
String content="aaaaaaaaaa";
The result I want is: "bbbaaaaaaa"
My code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class T1 {
public static void main(String[] args) {
String content="aaaaaaaaaa";
Pattern pattern = Pattern.compile("a");
Matcher m=pattern.matcher(content);
if(m.find()){
String result=m.replaceFirst("b");
System.out.println(result);
}
}
}
thanks :)
On appendReplacement/Tail
You'd have to use appendReplacement and appendTail explicitly. Unfortunately, before Java 9 you have to use StringBuffer to do this. Here's a snippet (see also on ideone.com):
String content="aaaaaaaaaa";
Pattern pattern = Pattern.compile("a");
Matcher m = pattern.matcher(content);
StringBuffer sb = new StringBuffer();
final int N = 3;
for (int i = 0; i < N; i++) {
if (m.find()) {
m.appendReplacement(sb, "b");
} else {
break;
}
}
m.appendTail(sb);
System.out.println(sb); // bbbaaaaaaa
See also
StringBuilder and StringBuffer in Java
StringBuffer is synchronized and therefore slower than StringBuilder
BugID 5066679: Matcher should make more use of Appendable
If granted, this request for enhancement would allow Matcher to append to any Appendable
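For reference, Java 9 added overloads of appendReplacement and appendTail that accept a StringBuilder. A sketch of the same loop as a reusable method, assuming Java 9 or later; the names LimitedReplace and replaceFirstN are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LimitedReplace {
    // Replaces at most n matches of regex in input, scanning left to right.
    static String replaceFirstN(String input, String regex, String replacement, int n) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        StringBuilder sb = new StringBuilder(); // StringBuilder overloads need Java 9+
        for (int i = 0; i < n && matcher.find(); i++) {
            matcher.appendReplacement(sb, replacement);
        }
        matcher.appendTail(sb); // copies whatever follows the last replaced match
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceFirstN("aaaaaaaaaa", "a", "b", 3));
    }
}
```

As with replaceFirst and replaceAll, the replacement string is interpreted for $ and \, so literal text should go through Matcher.quoteReplacement first.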
Another example: N times uppercase replacement
Here's another example that shows how appendReplacement/Tail can give you more control over replacement than replaceFirst/replaceAll:
// replaces up to N times with uppercase of matched text
static String replaceUppercase(int N, Matcher m) {
StringBuffer sb = new StringBuffer();
for (int i = 0; i < N; i++) {
if (m.find()) {
m.appendReplacement(
sb,
Matcher.quoteReplacement(m.group().toUpperCase())
);
} else {
break;
}
}
m.appendTail(sb);
return sb.toString();
}
Then we can have (see also on ideone.com):
Pattern p = Pattern.compile("<[^>]*>");
Matcher m = p.matcher("<a> b c <ddd> e <ff> g <$$$> i <jjj>");
System.out.println(replaceUppercase(4, m));
// <A> b c <DDD> e <FF> g <$$$> i <jjj>
// 1 2 3 4
The pattern <[^>]*> is just a simple example pattern that matches "<tags like this>".
Note that Matcher.quoteReplacement is necessary in this particular case, or else appending "<$$$>" as replacement would trigger IllegalArgumentException about an illegal group reference (because $ unescaped in replacement string is a backreference sigil).
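To see the difference in isolation, here is a small sketch that runs the same replacement with and without quoting. The class and method names (QuoteDemo, replaceLiteral, failsUnquoted) are made up for the demonstration:

```java
import java.util.regex.Matcher;

class QuoteDemo {
    // Inserts the replacement text literally, '$' and '\' included.
    static String replaceLiteral(String input, String regex, String literal) {
        return input.replaceAll(regex, Matcher.quoteReplacement(literal));
    }

    // Reports whether the unquoted replacement is rejected by the regex engine.
    static boolean failsUnquoted(String input, String regex, String replacement) {
        try {
            input.replaceAll(regex, replacement);
            return false;
        } catch (IllegalArgumentException | IndexOutOfBoundsException e) {
            return true; // '$' was read as the start of a group reference
        }
    }

    public static void main(String[] args) {
        System.out.println(replaceLiteral("cost: X", "X", "<$$$>"));
        System.out.println(failsUnquoted("cost: X", "X", "<$$$>"));
    }
}
```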
On replaceFirst and replaceAll
Attached is the java.util.regex.Matcher code for replaceFirst and replaceAll (version 1.64 06/04/07). Note that it's done using essentially the same appendReplacement/Tail logic:
// Excerpt from @(#)Matcher.java 1.64 06/04/07
public String replaceFirst(String replacement) {
if (replacement == null)
throw new NullPointerException("replacement");
StringBuffer sb = new StringBuffer();
reset(); // !!!!
if (find())
appendReplacement(sb, replacement);
appendTail(sb);
return sb.toString();
}
public String replaceAll(String replacement) {
reset(); // !!!!
boolean result = find();
if (result) {
StringBuffer sb = new StringBuffer();
do {
appendReplacement(sb, replacement);
result = find();
} while (result);
appendTail(sb);
return sb.toString();
}
return text.toString();
}
Note that the Matcher is reset() prior to any replaceFirst/All. Thus, simply calling replaceFirst 3 times would always get you the same result (see also on ideone.com):
String content="aaaaaaaaaa";
Pattern pattern = Pattern.compile("a");
Matcher m = pattern.matcher(content);
String result;
result = m.replaceFirst("b"); // once!
result = m.replaceFirst("b"); // twice!
result = m.replaceFirst("b"); // one more for "good" measure!
System.out.println(result);
// baaaaaaaaa
// i.e. THIS DOES NOT WORK!!!
See also
java.util.regex.Matcher source code, OpenJDK version
I think you can use StringUtils from Apache Commons Lang:
org.apache.commons.lang3.StringUtils.replace(content,"a","b",3);