Issue with Java string search pattern ( contains / matches)


I have one string which contains a couple of attribute values. When I check whether the string contains specific attribute values using a simple regex, the matches function always returns false.
I need the following behavior:
If the string contains "import", then isExportSet should be set to true.
If the string contains "path" : true, then isPathSet should be set to true.
I tried the code shown below, but it did not work for me:
public class DriverClass {
    public static void main(String[] args) {
        String str = "\"import\" : \"static\",\"path\" : true";
        boolean isExportSet = str.matches("\\*+export\\*+");
        boolean isPathSet = str.matches("\\*+multipath\\s+:\\s+true");
        System.out.println("Export " + isExportSet);
        System.out.println("Path " + isPathSet);
    }
}

Please let me know whether the following code fulfills the problem definition.
import java.util.HashMap;
import java.util.Map;
public class Main {
    static String str = "\"import\" : \"static\",\"path\" : true";
    public static void main(String[] args) {
        test(str);
    }
    static void test(String str) {
        Map<String, String> map = new HashMap<String, String>();
        // Split on ':' or ',' so that keys and values alternate in parts[]
        String[] parts = str.split("(:|,)");
        for (int i = 0; i < parts.length - 1; i += 2) {
            map.put(getUnquotedStr(parts[i]), getUnquotedStr(parts[i + 1]));
        }
        System.out.println(map.size() + " entries: " + map); // 2 entries: {path=true, import=static}
        // "import" only needs to be present; "path" must map to "true"
        boolean isExportSet = map.containsKey("import");
        boolean isPathSet = "true".equals(map.get("path"));
        System.out.println(isExportSet + " - " + isPathSet);
    }
    // Removes all double quotes and surrounding whitespace
    private static String getUnquotedStr(String str) {
        return str.replaceAll("\"", "").trim();
    }
}
This prints the following to the console:
2 entries: {path=true, import=static}
true - true

You can simply use str.contains("valueToSearch").
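A minimal sketch of that suggestion applied to the string from the question. The class name ContainsCheck and the helpers hasImport/hasPathTrue are made up for illustration; contains() is a plain substring search, so the only escaping needed is the \" inside the Java string literal.

```java
public class ContainsCheck {
    // Plain substring check: String.contains() does not interpret regex syntax
    static boolean hasImport(String s) {
        return s.contains("\"import\"");
    }

    // Exact substring, including the single spaces around the colon
    static boolean hasPathTrue(String s) {
        return s.contains("\"path\" : true");
    }

    public static void main(String[] args) {
        String str = "\"import\" : \"static\",\"path\" : true";
        System.out.println("Export " + hasImport(str)); // Export true
        System.out.println("Path " + hasPathTrue(str));   // Path true
    }
}
```

Because this is an exact match, "path":true written without spaces would not be found; if the spacing can vary, a regex is the better tool.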

You can use these regexes (written as Java string literals):
"\"import\""
"\"(path|multipath)\""
And please never combine a * with another quantifier; that leads to errors.
And since you want to match the " characters literally, you have to include them in your expression.
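A sketch of how those patterns could be applied with Matcher.find(), which looks for the pattern anywhere in the string. The \\s*:\\s*true suffix (optional spaces around the colon) is an assumption added to express the "path" : true requirement; the class and method names are illustrative.

```java
import java.util.regex.Pattern;

public class FindCheck {
    // The double quotes are part of the pattern, so they appear escaped in the Java literal
    private static final Pattern IMPORT = Pattern.compile("\"import\"");
    // Alternation accepts "path" or "multipath"; \\s* allows optional spaces around the colon
    private static final Pattern PATH_TRUE = Pattern.compile("\"(path|multipath)\"\\s*:\\s*true");

    static boolean isExportSet(String s) {
        return IMPORT.matcher(s).find(); // find() succeeds on any matching substring
    }

    static boolean isPathSet(String s) {
        return PATH_TRUE.matcher(s).find();
    }

    public static void main(String[] args) {
        String str = "\"import\" : \"static\",\"path\" : true";
        System.out.println("Export " + isExportSet(str)); // Export true
        System.out.println("Path " + isPathSet(str));     // Path true
    }
}
```

Compiling the patterns once as static fields avoids recompiling them on every call, which String.matches() would do.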

Testing the string for containing \"import\" just means checking whether it contains "import", including the double quotes. The \ is only an escape character that lets Java put double quotes inside a string literal without ending the string; it is not part of the string itself. You will, however, need to escape those quotes the same way when you write the regex as a Java string literal.
Keep in mind that matches() only returns true if the regex matches the entire string, so for "import" the call becomes str.matches(".*\"import\".*"). The same applies to the "path" string.
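A small comparison makes the whole-string behavior of matches() concrete: a bare "\"import\"" only matches a string that is exactly "import" with quotes, while wrapping it in .* lets the rest of the string be consumed. The class and method names are illustrative.

```java
public class MatchesDemo {
    // matches() must consume the whole input, so this only accepts exactly "import" (with quotes)
    static boolean bareMatch(String s) {
        return s.matches("\"import\"");
    }

    // .* on both sides absorbs the surrounding text; note that . does not match line breaks by default
    static boolean wrappedMatch(String s) {
        return s.matches(".*\"import\".*");
    }

    public static void main(String[] args) {
        String str = "\"import\" : \"static\",\"path\" : true";
        System.out.println(bareMatch(str) + " " + wrappedMatch(str)); // false true
    }
}
```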
I found this a handy tool for checking regexes: Free Formatter

Related

Any way to prevent that last char of a string from replacing in java

Let's say that I want to add ? after each letter in a string.
String letters = "A#B#C#D"; //Split by #
String splitLetters[]=letters.split("#");
for(String ltr: splitLetters)
System.out.println(ltr+"?");
The output will be:
A? B? C? D?
What I want is to prevent the last letter from getting the change.
I want only the letters before it to be changed.
Note:
Replacing # with ? directly (e.g. ...replace("#","?")) won't work in my case. The code above is only an example.

You're almost there! Just use an indexed for loop and check whether the current letter is the last one.
String letters = "A#B#C#D"; // Split by #
String splitLetters[] = letters.split("#");
for (int i = 0; i < splitLetters.length; i++) {
    System.out.print(splitLetters[i]);
    // Only append ? when another letter follows
    if (i + 1 < splitLetters.length) {
        System.out.println('?');
    }
}

There are so many ways to do it (as already described in the existing answers). The following solution is based on your own solution with the required change:
public class Main {
    public static void main(String[] args) {
        String letters = "A#B#C#D"; // Split by #
        String splitLetters[] = letters.split("#");
        boolean firstStrPrinted = false;
        for (String ltr : splitLetters) {
            if (firstStrPrinted) {
                System.out.print("?" + ltr);
            } else {
                System.out.print(ltr);
                firstStrPrinted = true;
            }
        }
    }
}
Output:
A?B?C?D
Here, the boolean firstStrPrinted tracks whether the first string has been printed. The first string is printed without a leading ? and the flag is then set to true; every later string is printed with ? in front of it.
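The same "separator between elements, never after the last one" behavior is what String.join (Java 8+) does. A minimal sketch; whether it fits depends on the real code the question alludes to, and the class and method names are illustrative.

```java
public class JoinDemo {
    // String.join puts "?" only between elements, so nothing follows the last one
    static String addMarks(String letters) {
        String[] splitLetters = letters.split("#");
        return String.join("?", splitLetters);
    }

    public static void main(String[] args) {
        System.out.println(addMarks("A#B#C#D")); // A?B?C?D
    }
}
```

If each element needs a more complex transformation than appending a separator, java.util.StringJoiner offers the same guarantee while letting you add elements one at a time.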

Nothing is actually getting changed in your example, so it's difficult to figure out what you want to do.
If it's just that you don't want to print out the question mark after the last substring, then:
int k;
for (k = 0; k < splitLetters.length - 1; k++)
    System.out.println(splitLetters[k] + "?");
System.out.println(splitLetters[k]);
You can apply similar reasoning to your actual code.

Let's say that I want to add ? after each letter in a string.
How about this?
This works by replacing each word character that is followed by a # sign (or the end of the string) with that same character followed by ? and a space. It uses a back reference ($1) to keep the captured character.
String[] strs = { "A#B#C#D", "ABC#BBB#CCC#DDDD" };
for (String text : strs) {
    String rep = text.replaceAll("(\\w)#|$", "$1? ");
    System.out.println(text + " -> " + rep);
}
Prints
A#B#C#D -> A? B? C? D?
ABC#BBB#CCC#DDDD -> ABC? BBB? CCC? DDDD?
If this does not meet your requirements, please provide more specific guidelines.

How I can use InCombiningDiacriticalMarks ignoring one case

I'm writing code to remove all diacritics from a String.
For example: áÁéÉíÍóÓúÚäÄëËïÏöÖüÜñÑ
I'm using the Unicode property InCombiningDiacriticalMarks, but I want to skip the replacement for ñ and Ñ.
Currently I protect these two characters before the replacement with:
s = s.replace('ñ', '\001');
s = s.replace('Ñ', '\002');
Is it possible to use InCombiningDiacriticalMarks while ignoring the diacritic of ñ and Ñ?
This is my code:
// requires: import java.text.Normalizer;
public static String stripAccents(String s)
{
    /* Save ñ */
    s = s.replace('ñ', '\001');
    s = s.replace('Ñ', '\002');
    s = Normalizer.normalize(s, Normalizer.Form.NFD);
    s = s.replaceAll("[\\p{InCombiningDiacriticalMarks}]", "");
    /* Add ñ back to s */
    s = s.replace('\001', 'ñ');
    s = s.replace('\002', 'Ñ');
    return s;
}
It works fine, but I want to know whether it's possible to optimize this code.

It depends what you mean by "optimize". It's tough to reduce the number of lines of code from what you have written, but since you are processing the string six times there's scope to improve performance by processing the input string only once, character by character:
public class App {
    // See SO answer https://stackoverflow.com/a/10831704/2985643 by virgo47
    private static final String tab00c0
            = "AAAAAAACEEEEIIII"
            + "DNOOOOO\u00d7\u00d8UUUUYI\u00df"
            + "aaaaaaaceeeeiiii"
            + "\u00f0nooooo\u00f7\u00f8uuuuy\u00fey"
            + "AaAaAaCcCcCcCcDd"
            + "DdEeEeEeEeEeGgGg"
            + "GgGgHhHhIiIiIiIi"
            + "IiJjJjKkkLlLlLlL"
            + "lLlNnNnNnnNnOoOo"
            + "OoOoRrRrRrSsSsSs"
            + "SsTtTtTtUuUuUuUu"
            + "UuUuWwYyYZzZzZzF";
    public static void main(String[] args) {
        var input = "AaBbCcáÁéÉíÍóÓúÚäÄëËïÏöÖüÜñÑçÇ";
        var output = removeDiacritic(input);
        System.out.println("input = " + input);
        System.out.println("output = " + output);
    }
    public static String removeDiacritic(String input) {
        var output = new StringBuilder(input.length());
        for (var c : input.toCharArray()) {
            if (isModifiable(c)) {
                c = tab00c0.charAt(c - '\u00c0');
            }
            output.append(c);
        }
        return output.toString();
    }
    // Returns true if the supplied char is a candidate for diacritic removal.
    static boolean isModifiable(char c) {
        boolean modifiable;
        if (c < '\u00c0' || c > '\u017f') {
            modifiable = false;
        } else {
            modifiable = switch (c) {
                case 'ñ', 'Ñ' ->
                    false;
                default ->
                    true;
            };
        }
        return modifiable;
    }
}
This is the output from running the code:
input = AaBbCcáÁéÉíÍóÓúÚäÄëËïÏöÖüÜñÑçÇ
output = AaBbCcaAeEiIoOuUaAeEiIoOuUñÑcC
Characters without diacritics in the input string are not modified. Otherwise the diacritic is removed (e.g. Ç to C), except in the cases of ñ and Ñ.
Notes:
The code does not use the Normalizer class or InCombiningDiacriticalMarks at all. Instead it processes each character in the input string only once, removing its accent if appropriate. The conventional approach for removing diacritics (as used in the OP) does not support selective removal as far as I know.
The code is based on an answer by user virgo47, but enhanced to support the selective removal of accents. See virgo47's answer for details of mapping an accented character to its unaccented counterpart.
This solution only works for Latin-1/Latin-2, but could be enhanced to support other mappings.
Your solution is very short and easy to understand, but it feels brittle, and for large input I suspect that it would be significantly slower than an approach that only processed each character once.

Ave Maria Purisima,
You can create a pattern that excludes the combining tilde (U+0303) from the set of diacritical marks:
// requires: import java.text.Normalizer; import java.util.regex.Pattern;
private static final Pattern STRIP_ACCENTS_PATTERN = Pattern.compile("[\\p{InCombiningDiacriticalMarks}&&[^\u0303]]+");
public static String stripAccents(String input) {
    if (input == null) {
        return null;
    }
    final StringBuilder decomposed = new StringBuilder(Normalizer.normalize(input, Normalizer.Form.NFD));
    return STRIP_ACCENTS_PATTERN.matcher(decomposed).replaceAll("");
}
Hope it helps
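One detail worth checking with the pattern approach: after NFD decomposition, ñ is stored as n followed by the combining tilde U+0303, and stripping the other marks does not put the pair back together, so the result would not compare equal to a string containing the precomposed ñ. A sketch (class and method names are illustrative) that recomposes the result with NFC:

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

public class StripAccentsDemo {
    // Every combining mark except U+0303 (combining tilde)
    private static final Pattern STRIP_ACCENTS_PATTERN =
            Pattern.compile("[\\p{InCombiningDiacriticalMarks}&&[^\\u0303]]+");

    static String stripAccents(String input) {
        if (input == null) {
            return null;
        }
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        String stripped = STRIP_ACCENTS_PATTERN.matcher(decomposed).replaceAll("");
        // Recompose so that "n" + U+0303 becomes the single char ñ again
        return Normalizer.normalize(stripped, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String out = stripAccents("\u00e1\u00c9\u00f1\u00d1\u00fc"); // áÉñÑü
        System.out.println(out.equals("aE\u00f1\u00d1u"));          // true
        System.out.println(out.length());                            // 5 (7 without the NFC step)
    }
}
```

Note that keeping U+0303 also keeps the tilde on other letters such as ã and õ, which may or may not be what you want.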

simple mathematical expression parsing

I'm trying to write an equals override. I think I have written it correctly, but the problem is parsing the expression. I have an ArrayList<String> that takes inputs from the keyboard and then evaluates the result. I can compare it with another ArrayList<String> variable, but how can I compare an ArrayList<String> to a String? For example,
String expr = "(5 + 3) * 12 / 3";
ArrayList<String> userInput = new ArrayList<>();
userInput.add("(");
userInput.add("5");
userInput.add(" ");
userInput.add("+");
userInput.add(" ");
userInput.add("3");
.
.
userInput.add("3");
userInput.add(")");
Then I convert userInput to a String and compare it using equals.
As you can see, this is far too long when I want to write a test.
I have tried split, but it also splits multi-digit numbers, e.g. 12 into 1 and 2:
public fooConstructor(String str)
{
    // ArrayList<String> holdAllInputs; is a private member of the class
    holdAllInputs = new ArrayList<>();
    String arr[] = str.split("");
    for (String s : arr) {
        holdAllInputs.add(s);
    }
}
As you would expect, it doesn't give the right result. How can it be fixed? Or can someone help me write a regular expression that parses it properly?
As output I get:
(,5, ,+, ,3,), ,*, ,1,2, ,/, ,3
instead of
(,5, ,+, ,3,), ,*, ,12, ,/, ,3

The regular expression that helps you here is
"(?<=[-+*/()])|(?=[-+*/()])"
and, of course, you need to remove the unwanted spaces first.
Here we go,
String expr = "(5 + 3) * 12 / 3";
.
. // Your inputs
.
String arr[] = expr.replaceAll("\\s+", "").split("(?<=[-+*/()])|(?=[-+*/()])");
for (String s : arr)
{
System.out.println("Element : " + s);
}
Please see my experiment: http://rextester.com/YOEQ4863
Hope it helps.
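A self-contained version of that split, as a sketch (class and method names are illustrative). It relies on the Java 8+ behavior that a zero-width match at the very start of the input does not produce a leading empty string, and it does not handle unary minus such as -5.

```java
import java.util.Arrays;

public class Tokenizer {
    // Remove all whitespace, then split before and after every operator or parenthesis
    static String[] tokenize(String expr) {
        return expr.replaceAll("\\s+", "").split("(?<=[-+*/()])|(?=[-+*/()])");
    }

    public static void main(String[] args) {
        String[] tokens = tokenize("(5 + 3) * 12 / 3");
        System.out.println(Arrays.toString(tokens)); // [(, 5, +, 3, ), *, 12, /, 3]
    }
}
```

Because the split points are zero-width lookarounds, the digits of 12 stay together: neither side of the boundary between 1 and 2 is an operator.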

Instead of splitting the input into tokens, for which you don't have a regex, it would be better to go the other way and join the strings in the List:
StringBuilder sb = new StringBuilder();
for (String s : userInput)
{
    sb.append(s);
}
Then use sb.toString() later for the comparison. I would advise against String concatenation with the + operator (details here).
Another approach would be to use one of the StringUtils.join methods in Apache Commons Lang.
import org.apache.commons.lang3.StringUtils;
String result = StringUtils.join(list, "");
If you are fortunate enough to be using Java 8, it's even easier: just use String.join
String result = String.join("", list);
More details on this approach available here
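Applied to the question, the comparison could look like this sketch; the token list mirrors the userInput example (including the single-space entries), and the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class JoinCompare {
    // Concatenate the tokens with no separator, then compare contents with equals (not ==)
    static boolean sameExpression(List<String> tokens, String expr) {
        return String.join("", tokens).equals(expr);
    }

    public static void main(String[] args) {
        String expr = "(5 + 3) * 12 / 3";
        List<String> userInput = new ArrayList<>(Arrays.asList(
                "(", "5", " ", "+", " ", "3", ")", " ", "*", " ", "12", " ", "/", " ", "3"));
        System.out.println(sameExpression(userInput, expr)); // true
    }
}
```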

This joins all the inputs into one string, which can then be compared against the expression to see if it is equal:
String x = "";
for (int i = 0; i < holdAllInputs.size(); i++) {
    x = x + holdAllInputs.get(i);
}
// Compare contents with equals(); == only checks whether both refer to the same object
if (expr.equals(x)) {
    // do something if equal
} else {
    // do something if not equal
}

Java codingbat help - withoutString

I'm using codingbat.com to get some java practice in. One of the String problems, 'withoutString' is as follows:
Given two strings, base and remove, return a version of the base string where all instances of the remove string have been removed (not case sensitive).
You may assume that the remove string is length 1 or more. Remove only non-overlapping instances, so with "xxx" removing "xx" leaves "x".
This problem can be found at: http://codingbat.com/prob/p192570
As you can see from the Dropbox-linked screenshot below, all of the runs pass except for three and a final one called "other tests." The thing is, even though they are marked as incorrect, my output exactly matches the expected output for the correct answer.
Here's a screenshot of my output:
And here's the code I'm using:
public String withoutString(String base, String remove) {
    String result = "";
    int i = 0;
    for (; i < base.length() - remove.length();) {
        if (!(base.substring(i, i + remove.length()).equalsIgnoreCase(remove))) {
            result = result + base.substring(i, i + 1);
            i++;
        }
        else {
            i = i + remove.length();
        }
        if (result.startsWith(" ")) result = result.substring(1);
        if (result.endsWith(" ") && base.substring(i, i + 1).equals(" ")) result = result.substring(0, result.length() - 1);
    }
    if (base.length() - i <= remove.length() && !(base.substring(i).equalsIgnoreCase(remove))) {
        result = result + base.substring(i);
    }
    return result;
}

Your solution IS failing, AND there is a display bug in CodingBat.
The correct output should be:
withoutString("This is a FISH", "IS") -> "Th  a FH"
Yours is:
withoutString("This is a FISH", "IS") -> "Th a FH"
Yours fails because it removes spaces. In addition, CodingBat does not display the expected and actual output strings correctly, because HTML collapses consecutive spaces.
This recursive solution passes all tests:
public String withoutString(String base, String remove) {
    int remIdx = base.toLowerCase().indexOf(remove.toLowerCase());
    if (remIdx == -1)
        return base;
    return base.substring(0, remIdx) +
           withoutString(base.substring(remIdx + remove.length()), remove);
}
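Here is an iterative solution. It has more code than the recursive solution but avoids the recursive calls. Each search resumes at the position of the last removal, so characters that only become adjacent because of a removal are not matched again; this keeps it consistent with the recursive version (e.g. "aabb" without "ab" gives "ab", not "").
public String withoutString(String base, String remove) {
    int remIdx = 0;
    int remLen = remove.length();
    remove = remove.toLowerCase();
    while (true) {
        // Search only from the last removal point onwards
        remIdx = base.toLowerCase().indexOf(remove, remIdx);
        if (remIdx == -1)
            break;
        base = base.substring(0, remIdx) + base.substring(remIdx + remLen);
    }
    return base;
}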

I just ran your code in an IDE. It compiles correctly and matches all tests shown on codingbat. There must be some bug with codingbat's test cases.
If you are curious, this problem can be solved with a single line of code:
public String withoutString(String base, String remove) {
return base.replaceAll("(?i)" + remove, ""); //String#replaceAll(String, String) with case insensitive regex.
}
Regex explanation:
The first argument taken by String#replaceAll(String, String) is what is known as a Regular Expression or "regex" for short.
Regex is a powerful tool to perform pattern matching within Strings. In this case, the regular expression being used is (assuming that remove is equal to IS):
(?i)IS
This particular expression has two parts: (?i) and IS.
IS matches the string "IS" exactly, nothing more, nothing less.
(?i) is simply a flag to tell the regex engine to ignore case.
With (?i)IS, all of: IS, Is, iS and is will be matched.
In addition, this is (almost) equivalent to the regular expressions (IS|Is|iS|is), (I|i)(S|s) and [Ii][Ss].
EDIT
It turns out that your output is not correct, so the failures are expected. See dansalmo's answer.
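One caveat with the one-liner: remove is inserted into the regex as-is, so characters such as ., * or ( inside it are treated as regex syntax (removing "." would delete every character). A hardened variant, as a sketch, wraps it with Pattern.quote so it is matched literally while (?i) still applies; the class name is illustrative. Without the UNICODE_CASE flag, (?i) folds case for US-ASCII only.

```java
import java.util.regex.Pattern;

public class WithoutString {
    // (?i) = case-insensitive; Pattern.quote turns any metacharacters in remove into literals
    public static String withoutString(String base, String remove) {
        return base.replaceAll("(?i)" + Pattern.quote(remove), "");
    }

    public static void main(String[] args) {
        System.out.println(withoutString("This is a FISH", "IS")); // Th  a FH (two spaces after Th)
        System.out.println(withoutString("a.b.c", "."));           // abc
    }
}
```

replaceAll scans left to right and does not revisit removed text, so overlapping instances are handled as the problem requires: "xxx" without "xx" leaves "x".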

public String withoutString(String base, String remove) {
String temp = base.replaceAll(remove, "");
String temp2 = temp.replaceAll(remove.toLowerCase(), "");
return temp2.replaceAll(remove.toUpperCase(), "");
}

Please find my solution below:
public String withoutString(String base, String remove) {
    final int rLen = remove.length();
    final int bLen = base.length();
    String op = "";
    for (int i = 0; i < bLen;)
    {
        if (!(i + rLen > bLen) && base.substring(i, i + rLen).equalsIgnoreCase(remove))
        {
            i += rLen;
            continue;
        }
        op += base.substring(i, i + 1);
        i++;
    }
    return op;
}
Sometimes things go really weird on CodingBat; this is just one of them.

I am adding to a previous solution, but using a StringBuilder for better practice. Most credit goes to Anirudh.
public String withoutString(String base, String remove) {
    // Constant holding remove.length()
    final int rLen = remove.length();
    // Constant holding base.length()
    final int bLen = base.length();
    // Create an empty StringBuilder for the result
    StringBuilder op = new StringBuilder();
    // Walk through base; i is advanced inside the loop body
    for (int i = 0; i < bLen;)
    {
        // If remove still fits in the rest of base
        // and the substring of base at i equals remove (ignoring case)...
        if (!(i + rLen > bLen) && base.substring(i, i + rLen).equalsIgnoreCase(remove))
        {
            // ...skip over it without adding it to the result.
            i += rLen;
            continue;
        }
        // Otherwise, add the character at i to the StringBuilder...
        op.append(base.charAt(i));
        // ...and move on by one.
        i++;
    }
    // Return the resulting string.
    return op.toString();
}

Taylor's solution is the most efficient one; however, I have another, more naive solution that also works.
public String withoutString(String base, String remove) {
    String returnString = base;
    while (returnString.toLowerCase().indexOf(remove.toLowerCase()) != -1) {
        int start = returnString.toLowerCase().indexOf(remove.toLowerCase());
        int end = remove.length();
        returnString = returnString.substring(0, start) + returnString.substring(start + end);
    }
    return returnString;
}

@Daemon
Your code works. Thanks for the regex explanation. Although dansalmo pointed out that CodingBat displays the intended output incorrectly, I threw in some extra lines to your code to (unnecessarily) collapse the double spaces:
public String withoutString(String base, String remove) {
    String result = base.replaceAll("(?i)" + remove, "");
    for (int i = 0; i < result.length() - 1;) {
        // Collapse a pair of spaces into one and re-check the same position
        if (result.substring(i, i + 2).equals("  ")) {
            result = result.replace(result.substring(i, i + 2), " ");
        }
        else i++;
    }
    if (result.startsWith(" ")) result = result.substring(1);
    return result;
}

public String withoutString(String base, String remove){
return base.replace(remove,"");
}

String.split() Not Acting on Semicolon or Space Delimiters

This may be a simple question, but I have been Googling for over an hour and haven't found an answer yet.
I'm trying to simply use the String.split() method with a small Android application to split an input string. The input string will be something along the lines of: "Launch ip:192.168.1.101;port:5900". I'm doing this in two iterations to ensure that all of the required parameters are there. I'm first trying to do a split on spaces and semicolons to get the individual tokens sorted out. Next, I'm trying to split on colons in order to strip off the identification tags of each piece of information.
So, for example, I would expect the first round of split to give me the following data from the above example string:
(1) Launch
(2) ip:192.168.1.101
(3) port:5900
Then the second round would give me the following:
(1) 192.168.1.101
(2) 5900
However, the following code that I wrote doesn't give me what's expected:
private String[] splitString(String inputString)
{
    String[] parsedString;
    String[] orderedString = new String[SOSLauncherConstants.SOCKET_INPUT_STRING_PARSE_VALUE];
    parsedString = inputString.trim().split("; ");
    Log.i("info", "The parsed data is as follows for the initially parsed string of size " + parsedString.length + ": ");
    for (int i = 0; i < parsedString.length; ++i)
    {
        Log.i("info", parsedString[i]);
    }
    for (int i = 0; i < parsedString.length; ++i)
    {
        if (parsedString[i].toLowerCase().contains(SOSLauncherConstants.PARSED_LAUNCH_COMMAND_VALUE))
        {
            orderedString[SOSLauncherConstants.PARSED_COMMAND_WORD] = parsedString[i];
        }
        if (parsedString[i].toLowerCase().contains("ip"))
        {
            orderedString[SOSLauncherConstants.PARSED_IP_VALUE] = parsedString[i].split(":")[1];
        }
        else if (parsedString[i].toLowerCase().contains("port"))
        {
            orderedString[SOSLauncherConstants.PARSED_PORT_VALUE] = parsedString[i].split(":")[1];
        }
        else if (parsedString[i].toLowerCase().contains("username"))
        {
            orderedString[SOSLauncherConstants.PARSED_USERNAME_VALUE] = parsedString[i].split(":")[1];
        }
        else if (parsedString[i].toLowerCase().contains("password"))
        {
            orderedString[SOSLauncherConstants.PARSED_PASSWORD_VALUE] = parsedString[i].split(":")[1];
        }
        else if (parsedString[i].toLowerCase().contains("color"))
        {
            orderedString[SOSLauncherConstants.PARSED_COLOR_VALUE] = parsedString[i].split(":")[1];
        }
    }
    Log.i("info", "The parsed data is as follows for the second parsed string of size " + orderedString.length + ": ");
    for (int i = 0; i < orderedString.length; ++i)
    {
        Log.i("info", orderedString[i]);
    }
    return orderedString;
}
For a result, I'm getting the following:
The parsed data is as follows for the parsed string of size 1:
launch ip:192.168.1.106;port:5900
The parsed data is as follows for the second parsed string of size 6:
launch ip:192.168.1.106;port:5900
192.168.1.106;port
And then, of course, it crashes because the for loop runs into a null string.
Side Note:
The following snippet is from the constants class that defines all of the string indexes --
public static final int SOCKET_INPUT_STRING_PARSE_VALUE = 6;
public static final int PARSED_COMMAND_WORD = 0;
public static final String PARSED_LAUNCH_COMMAND_VALUE = "launch";
public static final int PARSED_IP_VALUE = 1;
public static final int PARSED_PORT_VALUE = 2;
public static final int PARSED_USERNAME_VALUE = 3;
public static final int PARSED_PASSWORD_VALUE = 4;
public static final int PARSED_COLOR_VALUE = 5;
I looked into needing a possible escape (by inserting a \\ before the semicolon) on the semicolon delimiter, and even tried using it, but that didn't work. The odd part is that neither the space nor the semicolon function as a delimiter, yet the colon works on the second time around. Does anybody have any ideas what would cause this?
Thanks for your time!
EDIT: I should also add that I'm receiving the string over a WiFi socket connection. I don't think this should make a difference, but I'd like you to have all of the information that you need.

String.split(String) takes a regex. Use "[; ]", e.g.:
"foo;bar baz".split("[; ]")
will return an array containing "foo", "bar" and "baz".
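For the launch string in the question, a sketch of the two passes: "[; ]+" (with a +, so a run of delimiters counts as one) produces the tokens, and a second split with a limit of 2 separates each tag from its value at the first colon only. The class and method names are illustrative.

```java
import java.util.Arrays;

public class LaunchParser {
    // First pass: split on any run of ';' and/or ' '
    static String[] tokens(String input) {
        return input.trim().split("[; ]+");
    }

    // Second pass: drop the "tag:" prefix; limit 2 keeps any later ':' inside the value
    static String tagValue(String token) {
        String[] kv = token.split(":", 2);
        return kv.length == 2 ? kv[1] : null;
    }

    public static void main(String[] args) {
        String[] parts = tokens("Launch ip:192.168.1.101;port:5900");
        System.out.println(Arrays.toString(parts)); // [Launch, ip:192.168.1.101, port:5900]
        for (int i = 1; i < parts.length; i++) {
            System.out.println(tagValue(parts[i])); // 192.168.1.101, then 5900
        }
    }
}
```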
If you need groups of spaces to work as a single delimiter, you can use something like:
"foo;bar baz".split("(;| +)")

I believe String.split() treats its argument as a regular expression, so "; " matches only that exact two-character sequence, not each character individually. That is, split("; ") would not split "a;b c" at all, but would split "a; b".
You may have better luck with Guava's Splitter, which is meant to be slightly less unpredictable than java.lang.String.split.
I would write something like
Iterable<String> splits = Splitter.on(CharMatcher.anyOf("; ")).split(string);
but Splitter also provides fluent-style customization such as trimResults() or omitEmptyStrings().

Is there a reason why you are using String.split() but not regular expressions? This is a perfect candidate for regexes, especially if the string format is consistent.
I'm not sure whether your format is fixed; if it is, the following regex should break it down for you (I'm sure someone can come up with an even more elegant regex). If you have several command strings that follow, you can use a more flexible regex and loop over all the groups:
Pattern p = Pattern.compile("([\\w]*)[ ;](([\\w]*):([^ ;]*))*");
Matcher m = p.matcher(inputString);
String command, id, value;
if (m.find()) {
    command = m.group(1);
    do {
        id = m.group(3);
        value = m.group(4);
        // use id and value here
    } while (m.find());
}
A great place to test regexes online is http://www.regexplanet.com/simple/index.html. It lets you experiment with the regex without having to compile and launch your app every time, if you just want to get the regex right.
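If you go the regex route, a simpler pattern that only captures tag:value pairs, applied with repeated find() calls, may be easier to maintain than one pattern for the whole command. This swaps in a different regex from the one above; the class, method and map names are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PairParser {
    // A word, a colon, then everything up to the next ';' or space
    private static final Pattern PAIR = Pattern.compile("(\\w+):([^ ;]+)");

    static Map<String, String> pairs(String input) {
        Map<String, String> result = new LinkedHashMap<>(); // keeps input order
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            result.put(m.group(1).toLowerCase(), m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(pairs("Launch ip:192.168.1.101;port:5900"));
        // {ip=192.168.1.101, port=5900}
    }
}
```

The command word ("Launch") has no colon, so it is not captured here; it could be taken separately, for example as the first token of a split on "[; ]+".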
