Using regular expression with Java (some specific characters) - java

I have this example:
String str = "HellMCo I fiCZMnd thBVMis site intZereVCsting";
String tags = "BCMVZ";
I need a regular expression that finds every combination of the characters in tags. As you can see, str contains four such variations. I don't know much about regular expressions.
I started testing with this pattern:
(\d{,1}[BCMVZ])
PS: I'm testing at http://regexpal.com/, but my pattern doesn't work there.
So my real question is: how can I detect any combination of the characters from another string?

Maybe try something like:
[BCMVZ]+
It finds any combination of the tag characters B, C, M, V and Z, that is, one or more of them in a row.
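As a rough illustration of the character-class approach, the sketch below collects every run of tag characters with Matcher.find(). The class name TagRuns and the method findRuns are made up for this example, and it assumes tags contains only plain letters, so the string can be dropped into a character class without escaping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagRuns {
    // Returns every maximal run of characters from `tags` that occurs in `input`.
    // Assumption: `tags` holds only plain letters, so no escaping is needed
    // when it is placed inside the [...] character class.
    public static List<String> findRuns(String input, String tags) {
        Pattern pattern = Pattern.compile("[" + tags + "]+");
        Matcher matcher = pattern.matcher(input);
        List<String> runs = new ArrayList<>();
        while (matcher.find()) {
            runs.add(matcher.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        String str = "HellMCo I fiCZMnd thBVMis site intZereVCsting";
        String tags = "BCMVZ";
        System.out.println(findRuns(str, tags)); // prints [MC, CZM, BVM, Z, VC]
    }
}
```

Note that the lone Z counts as a run as well. If only runs of two or more tag characters are wanted, replacing + with {2,} should leave the four multi-character runs.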

Related

Replacing substrings in String

I am 16 and trying to learn Java. My uncle gave me a paper with things to do in Java. One of them is to write and execute a program that accepts an extended message as a string, such as
Each time she saw the painting, she was happy
and replace the word she with the word he.
Each time he saw the painting, he was happy.
This part is simple, but he wants me to be able to take any form of she and replace it with he (she to he, She to He, she? to he?, she. to he., she' to he' and so on). Can someone help me write a program to accomplish this?
I have this
import java.util.Scanner;

public class Main {
    public static void main(String[] args) {
        Scanner keyboard = new Scanner(System.in);
        System.out.println("Write Sentence");
        String original = keyboard.nextLine();
        String changeWord = "he";
        // Replaces every occurrence of "she", even inside other words
        String modified = original.replaceAll("she", changeWord);
        System.out.println(modified);
    }
}
If this isn't the right site to find answers like this, can you redirect me to a site that answers such questions?
The best way to do this is with regular expressions (regex). Regexes allow you to match patterns or classes of words, so you can deal with general cases. Consider the cases you have already listed:
(she to he, She to He, she? to he?, she. to he., she' to he' and so on)
What is common between these cases? Can you think of some general rule(s) that would apply to all such transformations?
But also consider some cases you haven't listed: for example, as you've written it now, your code will change the word "ashes" to "ahes" because "ashes" contains "she". A properly written regex allows you to avoid this.
Before delving into regex, try and express, in plain English, a rule or set of rules for what you want to replace and what it should be replaced with.
Then, learn some regex and attempt to apply those rules.
Lastly, try to write some tests (e.g. using JUnit) for various cases so you can see which cases your code handles and which it doesn't.
Once you have done this, if something still doesn't work, feel free to post a new question here showing us your code and explaining what doesn't work. We'll be happy to help.
I would recommend the following regular expressions. It seems you have to search for and replace the uppercase She and the lowercase she separately:
String modified = original
        .replaceAll("(she)(\\W)", "he$2")
        .replaceAll("(She)(\\W)", "He$2");
Explanation:
The pattern (she) matches the characters she and stores them as the first captured group.
The pattern (\\W) matches one non-word character, i.e. anything other than a letter, digit or underscore (e.g. ' or .), and stores it as the second captured group.
Both of these patterns must match consecutive parts of the input string for replaceAll to replace anything.
"he$2" puts he into the result, followed by the second captured group (in our case a single character).
This means that the regular expression will match She'll and replace it with He'll, but it will not match Sherlock, because there She is followed by the alphabetic character r. Note that a she at the very end of the input is not replaced either, because no character follows it for \\W to match.
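An alternative that may be worth trying is the word-boundary anchor \b, which matches a position rather than a character, so it also works at the start and end of the input and leaves words such as ashes alone. The sketch below is only one way to do it; the class name PronounSwap and the method swap are invented for the example.

```java
import java.util.regex.Pattern;

public class PronounSwap {
    // \b matches the position between a word character and a non-word
    // character (or the start/end of the input); it consumes nothing.
    private static final Pattern LOWER = Pattern.compile("\\bshe\\b");
    private static final Pattern UPPER = Pattern.compile("\\bShe\\b");

    // Replaces the standalone words "she" and "She" with "he" and "He".
    public static String swap(String text) {
        String result = LOWER.matcher(text).replaceAll("he");
        return UPPER.matcher(result).replaceAll("He");
    }

    public static void main(String[] args) {
        // prints: Each time he saw the painting, he was happy
        System.out.println(swap("Each time she saw the painting, she was happy"));
        // prints: He'll sift the ashes, said Sherlock to he
        System.out.println(swap("She'll sift the ashes, said Sherlock to she"));
    }
}
```

Because \b does not consume the following character, no capture group or $2 is needed in the replacement.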

Using regular expressions in Java, how do I say any 4 letters, a space, and then 4 digits

What I want is a class code like ACCT 4838.
I tried
String REGEX = "[a-zA-Z][a-zA-Z][a-zA-Z][a-zA-Z][\\s][\\d][\\d][\\d][\\d]";
String REGEX = "[a-zA-Z][a-zA-Z][a-zA-Z][a-zA-Z]\\s\\d\\d\\d\\d";
I apologize if this gets flagged. I have been looking around for a while and I can't quite peg what I'm doing wrong. It should be a quick one for someone.
You can use a regex like this:
(?i)^[a-z]{4} \d{4}$ // With inline insensitive flag
^[A-Za-z]{4} \d{4}$ // without inline flag
Remember to escape backslashes in Java string literals, e.g. ^[A-Za-z]{4} \\d{4}$
The code below works. In a Java string literal a single \ before s or d gives a compile error, so each backslash must be doubled. I was stupidly feeding in the wrong string in addition to not having the proper code.
String REGEX = "^[a-zA-Z][a-zA-Z][a-zA-Z][a-zA-Z]\\s\\d\\d\\d\\d";
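For comparison, here is a small sketch that uses the quantifier form of the pattern from the answer above. The class name ClassCode and the method isClassCode are placeholders chosen for the example, not anything from the original posts.

```java
import java.util.regex.Pattern;

public class ClassCode {
    // Four ASCII letters, one space, four digits; backslashes are doubled
    // because this is a Java string literal.
    private static final Pattern CODE = Pattern.compile("^[A-Za-z]{4} \\d{4}$");

    // Returns true when the whole string looks like a class code such as "ACCT 4838".
    public static boolean isClassCode(String s) {
        return CODE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isClassCode("ACCT 4838")); // prints true
        System.out.println(isClassCode("ACC 4838"));  // prints false
        System.out.println(isClassCode("ACCT4838"));  // prints false
    }
}
```

Since matches() already requires the entire input to match, the ^ and $ anchors are redundant here; they are kept so the same pattern also behaves as expected with find().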

Java get pattern of a String

Is it possible to detect the pattern of a String and store it in a variable? For example, if I have the String test1234 and highlight 1234, I expect something like \d{4}.
It would require finding a regular expression that both your highlighted substring and your desired replacement match, and such an expression is in no way unique. For example, "1234" matches .{4} or \d{4} or even .+, which does not even fix the length. So, even if you could generate a regular expression from a string, it might turn out to be the string itself or something else you didn't want. Maybe you should rethink the desired outcome of your program and try to come up with a different way of solving the issue at hand.
Hope that helped. Good luck!
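To make the ambiguity concrete, the following sketch checks one sample string against a handful of candidate patterns and reports which of them match it in full. The class name AmbiguousPatterns, the method matching and the candidate list are all chosen purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class AmbiguousPatterns {
    // Returns the candidate regexes that match the entire sample string.
    public static List<String> matching(String sample, String... candidates) {
        List<String> result = new ArrayList<>();
        for (String candidate : candidates) {
            if (Pattern.matches(candidate, sample)) {
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [\d{4}, .{4}, .+, 1234]
        System.out.println(matching("1234", "\\d{4}", ".{4}", ".+", "1234", "[a-z]+"));
    }
}
```

Four of the five candidates accept "1234", so a generator would need some extra rule (for instance, always prefer the most specific character class) to pick one of them.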

Simple Java regular expression matching fails

Before y'all jump on me for posting something similar to previous questions asked, yes, there seem to be a number of regex related questions but nothing which seems to help me, or at least that I can see.
I am trying to parse strings in Java using Pattern and Matcher and am really having no joy. My regular expression seems to match my input string when I use a few of the online regular expression testing websites, but Java simply does not match my expression.
My input string is:
"Big apple" title="Little Apple" type="Container" url="http://malcolm.com/testing"
The regular expression I am using to match is ".*" title="(.*)" type="Container" url="(.*)"
Essentially I want to pull out the text within the second and the fourth set of quotes. There will always be 4 sets of quotes with text within and around.
I am coding as follows:
The variable XMLSubstring contains the string above (including the quotes), exactly as stated, even when I print it out.
Pattern p = Pattern.compile(".* title=\"(.*)\" type=\"Container\" url=\"(.*)\"");
Matcher m = p.matcher(XMLSubstring);
It doesn't appear to be rocket science I'm attempting but I'm pulling my hair out trying to debug the bloody thing.
Is there something wrong with my regex pattern?
Is there something wrong with the code I am using?
Am I simply a moron and should stop coding with immediate effect?
EDIT & UPDATE: I have found the problem. My string had a space at the end of it which was breaking the parser! How silly, and I think based on that, I need to accept the third suggestion of mine and give up programming. Thanks all for your assistance.
Try this:
String str = "\"Big apple\" title=\"Little Apple\" type=\"Container\" url=\"http://malcolm.com/testing\"";
// In a Java literal, \\\" becomes the regex \" which matches a plain quote
Pattern p = Pattern.compile(".* title=\\\".*\\\" type=\\\"Container\\\" url=\\\".*\\\"");
Matcher m = p.matcher(str);
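The trailing-space problem from the update is what one would expect if matches() was called, since matches() only succeeds when the whole input fits the pattern; that is an assumption, as the question does not show which method was used. A sketch that uses find() with capture groups avoids the issue. The class name AttributeExtract and the method extract are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AttributeExtract {
    // [^"]* cannot run past a closing quote, unlike the greedy .*
    private static final Pattern ATTRS = Pattern.compile(
            "title=\"([^\"]*)\" type=\"Container\" url=\"([^\"]*)\"");

    // Returns {title, url}, or null when the attributes are not found.
    public static String[] extract(String input) {
        Matcher m = ATTRS.matcher(input);
        if (!m.find()) { // find() searches anywhere in the input
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        // Note the trailing space after the last quote.
        String xml = "\"Big apple\" title=\"Little Apple\" type=\"Container\" "
                + "url=\"http://malcolm.com/testing\" ";
        String[] parts = extract(xml);
        System.out.println(parts[0]); // prints Little Apple
        System.out.println(parts[1]); // prints http://malcolm.com/testing
    }
}
```

Calling trim() on the input before matches() would be another way to deal with the stray space.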

Java regex to retain specific closing tags

I'm trying to write a regex to remove all but a handful of closing xml tags.
The code seems simple enough:
String stringToParse = "<body><xml>some stuff</xml></body>";
Pattern pattern = Pattern.compile("</[^(a|em|li)]*?>");
Matcher matcher = pattern.matcher(stringToParse);
stringToParse = matcher.replaceAll("");
However, when this runs, it skips the "xml" closing tag. It seems to skip any tag where there is a matching character in the compiled group (a|em|li), i.e. if I remove the "l" from "li", it works.
I would expect this to return the following string: "<body><xml>some stuff" (I am doing additional parsing to remove the opening tags but keeping it simple for the example).
You probably shouldn't use regex for this task, but let's see what happens...
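Your problem is that you are using a negated character class, and inside character classes you can't write complex expressions, only characters. [^(a|em|li)] therefore means "any one character other than (, a, |, e, m, l, i or )", which is why the l in xml stops the match. You could try a negative lookahead instead: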
"</(?!a|em|li).*?>"
But this won't handle a number of cases correctly:
Comments containing things that look like tags.
Tags as strings in attributes.
Tags that start with a, em, or li but are actually other tags.
Capital letters.
etc...
You can probably fix these problems, but you need to consider whether or not it is worth it, or if it would be better to look for a solution based on a proper HTML parser.
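As one example of such a fix, the sketch below tightens the lookahead so that it only protects tags whose name is exactly a, em or li, by requiring the closing > inside the lookahead. It is still a regex sketch rather than a parser, so comments, attributes, capital letters and whitespace such as </a > are not handled. The class name ClosingTagFilter and the method strip are invented for the example.

```java
import java.util.regex.Pattern;

public class ClosingTagFilter {
    // (?!(?:a|em|li)>) rejects only the exact tags </a>, </em> and </li>;
    // [^>]*> then consumes any other closing tag up to its '>'.
    private static final Pattern OTHER_CLOSING_TAGS =
            Pattern.compile("</(?!(?:a|em|li)>)[^>]*>");

    // Removes every closing tag except </a>, </em> and </li>.
    public static String strip(String html) {
        return OTHER_CLOSING_TAGS.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // prints <body><xml>some stuff
        System.out.println(strip("<body><xml>some stuff</xml></body>"));
        // prints <ul><li><a>x</a><abbr>y</li>
        System.out.println(strip("<ul><li><a>x</a><abbr>y</abbr></li></ul>"));
    }
}
```

With the simpler (?!a|em|li) lookahead, the </abbr> tag in the second input would have been kept, because its name merely starts with a.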
I would really use a proper parser for this (e.g. JTidy). You can't reliably parse XML/HTML using regular expressions, as it is not a regular language and edge cases abound. I would rather use the XML parsing available in the standard JDK (JAXP) or a suitable third-party library (see above) and configure your output accordingly.
See this answer for more passionate info re. parsing XML/HTML via regexps.
You cannot use an alternation inside a character class. A character class always matches a single character.
You likely want to use a negative lookahead or lookbehind instead:
"</(?!a|em|li).*?>"
