Extract a String from within a String using a Regular Expression


I have a very large String containing within it some markers like:
{codecitation class="brush: java; gutter: true;" width="700px"}
I need to collect all the markers contained in the long String. The difficulty is that the markers all contain different parameter values; the only thing they have in common is the initial part:
{codecitation class="brush: [VARIABLE PART] }
Do you have any suggestions for collecting all the markers in Java using a regular expression?

Use pattern matching to find the markers as below. I hope this will help.
String xmlString = "{codecitation class=\"brush: java; gutter: true;\" width=\"700px\"}efasf{codecitation class=\"brush: java; gutter: true;\" width=\"700px\"}";
Pattern pattern = Pattern.compile("(\\{codecitation)([0-9 a-z A-Z \":;=]{0,})(\\})");
Matcher matcher = pattern.matcher(xmlString);
while (matcher.find()) {
    System.out.println(matcher.group());
}

I guess you are particularly interested in the brush: java; and gutter: true; parts.
Maybe this snippet helps:
package test;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class CodecitationParserTest {
    public static void main(String[] args) {
        String testString = "{codecitation class=\"brush: java; gutter: true;\" width=\"700px\"}";
        // Group 1 is the content of the class attribute
        Pattern codecitationPattern = Pattern.compile("\\{codecitation class=[\"]([^\"]*)[\"][^}]*\\}");
        Matcher matcher = codecitationPattern.matcher(testString);
        // Group 1 is the key, group 2 the value, group 3 the rest of the attribute string
        Pattern attributePattern = Pattern.compile("\\s*([^:]*): ([^;]*);(.*)$");
        Matcher attributeMatcher;
        while (matcher.find()) {
            System.out.println(matcher.group(1));
            attributeMatcher = attributePattern.matcher(matcher.group(1));
            while (attributeMatcher.find()) {
                System.out.println(attributeMatcher.group(1) + "->" + attributeMatcher.group(2));
                // Continue with the remainder after the pair just extracted
                attributeMatcher = attributePattern.matcher(attributeMatcher.group(3));
            }
        }
    }
}
The codecitationPattern extracts the content of the class attribute of a codecitation element. The attributePattern extracts the first key and value and the rest, so you can apply it recursively.
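As a variation on the two answers above, here is a minimal sketch that first collects every marker and then parses the class attribute with a pattern that matches one "key: value;" pair per find() call, instead of re-applying a pattern to the remainder. The class and method names (CodecitationMarkers, findMarkers, parseAttributes) are hypothetical, and the sketch assumes the class attribute always comes first and is double-quoted, as in the question.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative, not from the answers above.
class CodecitationMarkers {
    // A whole marker; group 1 is the content of the class attribute.
    private static final Pattern MARKER =
            Pattern.compile("\\{codecitation class=\"([^\"]*)\"[^}]*\\}");
    // One "key: value;" pair inside the class attribute.
    private static final Pattern ATTRIBUTE =
            Pattern.compile("\\s*([^:;]+):\\s*([^;]*);");

    // Returns every complete marker found in the text, in order of appearance.
    static List<String> findMarkers(String text) {
        List<String> markers = new ArrayList<>();
        Matcher m = MARKER.matcher(text);
        while (m.find()) {
            markers.add(m.group());
        }
        return markers;
    }

    // Parses the class attribute of a single marker into key/value pairs.
    static Map<String, String> parseAttributes(String marker) {
        Map<String, String> attributes = new LinkedHashMap<>();
        Matcher m = MARKER.matcher(marker);
        if (m.find()) {
            Matcher a = ATTRIBUTE.matcher(m.group(1));
            while (a.find()) {
                attributes.put(a.group(1).trim(), a.group(2).trim());
            }
        }
        return attributes;
    }

    public static void main(String[] args) {
        String text = "intro {codecitation class=\"brush: java; gutter: true;\" width=\"700px\"} outro";
        for (String marker : findMarkers(text)) {
            System.out.println(marker + " -> " + parseAttributes(marker));
        }
    }
}
```

Using [^}]* for the tail of the marker means any further attributes are accepted without having to list the characters they may contain.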

Related

Java replaceAll but the specified regex

Can't get my head around this for quite some time already. I have this piece of code:
getStringFromDom(doc).replaceAll("contract=\"\\d*\"|name=\"\\p{L}*\"", "");
Basically I need it to work literally the opposite way - to replace everything BUT the specified regex. I've been trying to do it with the negative lookahead to no avail.

For your particular task, I think
getStringFromDom(doc).replaceAll(".*?(contract=\"\\d*\"|name=\"\\p{L}*\").*", "$1");
should do what you need.

You want to remove everything that does not match the pattern, which is the same as keeping only the pattern matches. Use the regex to find the matches, then collect them in a StringBuilder (yourPattern and yourInput are placeholders for your own regex and text):
Matcher m = Pattern.compile(yourPattern).matcher(yourInput);
StringBuilder sb = new StringBuilder();
while (m.find()) {
    sb.append(m.group()).append('\n');
}
String result = sb.toString();
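A self-contained sketch of this filtering approach, using the regex from the question. The class name KeepOnlyMatches, the method name, and the sample input are assumptions made up for illustration; the separator between matches is a newline, as in the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class; only the regex comes from the question.
class KeepOnlyMatches {
    private static final Pattern KEEP =
            Pattern.compile("contract=\"\\d*\"|name=\"\\p{L}*\"");

    // Returns all matches of KEEP, one per line, and drops everything else.
    static String keepOnlyMatches(String input) {
        Matcher m = KEEP.matcher(input);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            if (sb.length() > 0) {
                sb.append('\n');
            }
            sb.append(m.group());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up input for the demonstration
        String input = "<doc contract=\"42\" id=\"7\" name=\"foo\">";
        System.out.println(keepOnlyMatches(input));
    }
}
```

Unlike a single replaceAll with a capturing group, this loop keeps every occurrence, not just one.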

I also think that removing what you are not looking for is a double negative. Concentrate on what you are looking for and use pattern matching for that. This example searches your document for any name attributes:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
    public static void main(String[] args) {
        String input = "<AnotherDoc accNum=\"1111\" docDate=\"2017-09-26\" docNum=\"2222\" name=\"foo\"> <anotherTag>some date</anotherTag>";
        // The value is any run of characters other than "
        Pattern pattern = Pattern.compile("name=\"[^\"]*\"");
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            System.out.println(matcher.group());
        }
    }
}
This prints:
name="foo"

I am trying to extract text using regex but it is not working

I am trying to extract text using a regex, but it is not working, although my regex works fine in online regex validators.
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class HelloWorld {
    public static void main(String[] args) {
        String PATTERN1 = "F\\{([\\w\\s&]*)\\}";
        String PATTERN2 = "{([\\w\\s&]*)\\}";
        String src = "F{403}#{Title1}";
        List<String> fvalues = Arrays.asList(src.split("#"));
        System.out.println(fieldExtract(fvalues.get(0), PATTERN1));
        System.out.println(fieldExtract(fvalues.get(1), PATTERN2));
    }
    private static String fieldExtract(String src, String ptrn) {
        System.out.println(src);
        System.out.println(ptrn);
        Pattern pattern = Pattern.compile(ptrn);
        Matcher matcher = pattern.matcher(src);
        return matcher.group(1);
    }
}

Why not use a single pattern to get both values?
Pattern regex = Pattern.compile("F\\{([\\d\\s&]*)\\}#\\{([\\s\\w&]*)\\}");
This way the number will be in group 1 and the title in group 2.
Also, if you are going to compile the regex (which can help performance), at least make the Pattern object static so that it doesn't get recompiled each time you call the function, which would defeat the point of pre-compiling it.
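A minimal sketch of that suggestion, with the combined pattern held in a static final field so it is compiled once. The class name FieldParser, the method name parse, and the choice to return null for non-matching input are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the combined pattern from the answer above.
class FieldParser {
    // Compiled once when the class is loaded and reused for every call.
    private static final Pattern FIELD =
            Pattern.compile("F\\{([\\d\\s&]*)\\}#\\{([\\s\\w&]*)\\}");

    // Returns {number, title}, or null when the input has a different shape.
    static String[] parse(String src) {
        Matcher m = FIELD.matcher(src);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = parse("F{403}#{Title1}");
        System.out.println(parts[0] + " / " + parts[1]);
    }
}
```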

First problem:
String PATTERN2 = "\\{([\\w\\s&]*)\\}"; // quote '{'
Second problem:
Matcher matcher = pattern.matcher(src);
if( matcher.matches() ){
return matcher.group(1);
} else ...
The Matcher must be asked to plough the field, otherwise you can't harvest the results.
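Putting both fixes together, the question's fieldExtract could look roughly like the sketch below. The class name FieldExtractFixed and the decision to return null when nothing matches are assumptions; find() is used here, and matches() would work equally well for these inputs because each part is matched in full.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical corrected version of the code from the question.
class FieldExtractFixed {
    // Returns the text captured by group 1, or null if the pattern does not match.
    static String fieldExtract(String src, String ptrn) {
        Matcher matcher = Pattern.compile(ptrn).matcher(src);
        // group(1) may only be called after a successful find() or matches()
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String pattern1 = "F\\{([\\w\\s&]*)\\}";
        String pattern2 = "\\{([\\w\\s&]*)\\}"; // the opening brace is escaped
        String[] parts = "F{403}#{Title1}".split("#");
        System.out.println(fieldExtract(parts[0], pattern1));
        System.out.println(fieldExtract(parts[1], pattern2));
    }
}
```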

pattern matching with regular expression in java

I need to write a program that matches a pattern against a line. The pattern may be a regular expression or a plain string.
Example:
If the pattern is "tiger", then only a line that contains just "tiger" should match.
If the pattern is "^t", then lines that start with "t" should match.
I have done this with the Pattern and Matcher classes.
The problem is that when I use Matcher.find(), all regular expressions match, but if I give a full pattern it does not match.
If I use matches(), only complete patterns match, not regular expressions.
My code:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class MatchesLooking
{
    private static final String REGEX = "^f";
    private static final String INPUT = "fooooooooooooooooo";
    private static Pattern pattern;
    private static Matcher matcher;
    public static void main(String[] args)
    {
        // Initialize
        pattern = Pattern.compile(REGEX);
        matcher = pattern.matcher(INPUT);
        System.out.println("Current REGEX is: " + REGEX);
        System.out.println("Current INPUT is: " + INPUT);
        System.out.println("find(): " + matcher.find());
        System.out.println("matches(): " + matcher.matches());
    }
}

With matches(), a regex of ^t only matches when the whole string consists of a single t.
You need to include the rest of the string as well for it to match. You can do so by appending .*, which means zero or more of any character (except line terminators, by default).
"^t.*"
Also, the ^ (and likewise $) is redundant when using matches(), because matches() always tests the entire input.
I hope that helps; I'm not entirely clear on what you're struggling with. Feel free to clarify.
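To make the difference concrete, here is a small sketch that compares find() and matches() on the input from the question. The class and method names (MatchModes, findsAnywhere, matchesWhole) are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the method names are illustrative.
class MatchModes {
    // True if the regex matches some part of the input.
    static boolean findsAnywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // True only if the regex matches the input from start to end.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String input = "fooooooooooooooooo";
        System.out.println(findsAnywhere("^f", input));   // true: the input starts with f
        System.out.println(matchesWhole("^f", input));    // false: the regex covers only one character
        System.out.println(matchesWhole("^f.*", input));  // true: .* covers the rest
    }
}
```

If a plain word such as "tiger" should match only lines that consist of exactly that word, while "^t" should match any line starting with t, the two cases need different calls (matches() for the former, find() for the latter), so the program has to decide which kind of pattern it was given.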

This is how a Matcher is typically used to iterate over all matches:
while (matcher.find()) {
    System.out.println(matcher.group());
}
If you're sure there can be only one match in the input, you could also use:
System.out.println("find(): " + matcher.find());
System.out.println("group(): " + matcher.group()); // only valid if find() returned true

Matching a string to a regex from html input

I'm having a little trouble figuring out what to do.
Basically, using Java, I'm trying to:
Read in the HTML from a website.
Find the content after a certain string, in this case
title="
Store that in a String.
The first and last steps are simple for me, but I'm having no luck with the middle one (and never have had with regex).
I believe this is the beginning of what I need:
String regex = "(?<=title=\")\\S+";
Pattern name = Pattern.compile(regex);
After that I have no clue. Any help?

import java.util.regex.Matcher;
import java.util.regex.Pattern;
String EXAMPLE_TEST = "......";
Pattern pattern = Pattern.compile("(?<=title=\")(\\S+)");
Matcher matcher = pattern.matcher(EXAMPLE_TEST);
while (matcher.find()) {
    System.out.println(matcher.group());
}
Note: Consider using the regex pattern (?<=title=\")([^\"]*) instead, which stops at the closing quote.

List<String> resultList = new ArrayList<String>();
// Group 1 captures the attribute value; [^\"]* stops at the closing quote
Pattern p = Pattern.compile("title=\"([^\"]*)\"");
Matcher m = p.matcher("title=\"test\"");
while (m.find()) {
    resultList.add(m.group(1)); // the value only, without title=" and the closing quote
}
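A self-contained sketch of the capture-group approach for the title="..." case. The class name TitleExtractor, the method name, and the sample HTML are assumptions; the sketch also assumes attribute values are double-quoted and contain no escaped quotes, which real-world HTML does not guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; assumes double-quoted title attributes.
class TitleExtractor {
    // Group 1 is everything between title=" and the next double quote.
    private static final Pattern TITLE = Pattern.compile("title=\"([^\"]*)\"");

    // Returns the values of all title attributes, in order of appearance.
    static List<String> extractTitles(String html) {
        List<String> titles = new ArrayList<>();
        Matcher m = TITLE.matcher(html);
        while (m.find()) {
            titles.add(m.group(1));
        }
        return titles;
    }

    public static void main(String[] args) {
        // Made-up HTML fragment for the demonstration
        String html = "<a title=\"First link\" href=\"#\">x</a><img title=\"Second\">";
        System.out.println(extractTitles(html));
    }
}
```

Compared with \\S+, the [^\"]* class allows spaces inside the value and does not swallow the closing quote.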

Java RegExp can't get the result after evaluating pattern

Hi, I have been trying to learn regular expressions using Java. I am still at the beginning, and I wanted to start with a little program that is given a string and outputs it split into syllables. This is what I have so far:
String mama = "mama";
Pattern vcv = Pattern.compile("([aeiou][bcdfghjklmnpqrstvwxyz][aeiou])");
Matcher matcher = vcv.matcher(mama);
if(matcher){
    // the result of this should be ma - ma
}
What I am trying to do is create a pattern that checks the letters of the given word and, if it finds a vowel/consonant/vowel sequence, adds a "-" like this: v-cv. How can I achieve this?

In the following example I match the first vowel and use a positive lookahead for the next consonant-vowel group. This way I can split again if I have a vcvcv group.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
    public static void main(String[] args) {
        new Test().run();
    }
    private void run() {
        String mama = "mama";
        // The lookahead checks for consonant + vowel without consuming them
        Pattern vcv = Pattern.compile("([aeiou])(?=[bcdfghjklmnpqrstvwxyz][aeiou])");
        Matcher matcher = vcv.matcher(mama);
        System.out.println(matcher.replaceAll("$1-"));
        String mamama = "mamama";
        matcher = vcv.matcher(mamama);
        System.out.println(matcher.replaceAll("$1-"));
    }
}
Output:
ma-ma
ma-ma-ma

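Try:
String result = mama.replaceAll("([aeiou])([....][aeiou])", "$1-$2");
Here [....] stands for the consonant class. String.replaceAll takes a regular expression, and Java refers to captured groups in the replacement as $1 and $2 (not \1 and \2).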
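The two replaceAll approaches in this thread behave differently on words with overlapping vowel-consonant-vowel groups. The sketch below compares them; the class and method names are hypothetical, and the consonant class is the one from the question.

```java
// Hypothetical comparison of the consuming pattern and the lookahead pattern.
class SyllableSplit {
    // Consumes vowel + consonant + vowel, so the second vowel cannot start the next match.
    static String splitConsuming(String word) {
        return word.replaceAll("([aeiou])([bcdfghjklmnpqrstvwxyz][aeiou])", "$1-$2");
    }

    // Consumes only the first vowel; the lookahead leaves the rest available for the next match.
    static String splitLookahead(String word) {
        return word.replaceAll("([aeiou])(?=[bcdfghjklmnpqrstvwxyz][aeiou])", "$1-");
    }

    public static void main(String[] args) {
        System.out.println(splitConsuming("mamama")); // ma-mama
        System.out.println(splitLookahead("mamama")); // ma-ma-ma
    }
}
```

For a short word like "mama" both variants give ma-ma; the difference only shows once a vowel belongs to two adjacent groups.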

With matches(), your pattern only matches if the String starts with a vowel. If you want to match a substring, ignoring the beginning, use
Pattern vcv = Pattern.compile(".*([aeiou][bcdfghjklmnpqrstvwxyz][aeiou])");
If you would like to ignore the end too:
Pattern vcv = Pattern.compile(".*([aeiou][bcdfghjklmnpqrstvwxyz][aeiou]).*");
