Find whether a string matches another string - java

I'd like to parse a string in order to see if it matches the entire string or a substring.
I tried this:
String [] array = {"Example","hi","EXAMPLE","example","eXamPLe"};
String word;
...
if ( array[j].toUpperCase().contains(word) || array[j].toLowerCase().contains(word) )
System.out.print(word + " ");
But my problem is:
When the user enters the word "Example" and my array contains "Example", it doesn't print it; it only prints "EXAMPLE" and "example". That's because my program converts array[j] to uppercase or lowercase before comparing, so it won't match words that mix upper and lower case, like "Example".
So in this case, if the user enters "Examp", I want it to print:
Example EXAMPLE example eXamPLe

You can convert both the input string and the candidates to uppercase before calling contains().
if ( array[j].toUpperCase().contains( word.toUpperCase() ) ) {
System.out.print(array[j] + " "); // print the matching candidate, not the search word
}

If you are just looking for full matches, use equalsIgnoreCase.
When partial match is needed you might need a trie or something similar.

Compare to the same case of word.
if ( array[j].toUpperCase().contains(word.toUpperCase()))
{
}

You are looking for this:
if (array[j].toUpperCase().contains(word.toUpperCase())) {
System.out.print(array[j]+ " ");
}
This will print:
Example EXAMPLE example eXamPLe
As you wanted!
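For anyone who wants the approach above as a reusable piece, here is a minimal sketch. The class and method names (ContainsIgnoreCaseDemo, containsIgnoreCase, matches) are made up for illustration, and passing Locale.ROOT is my own assumption to keep the case conversion independent of the default locale; the answers above call toUpperCase() without a locale.

```java
import java.util.Locale;

class ContainsIgnoreCaseDemo {
    // Hypothetical helper: true if haystack contains needle, ignoring case.
    // Both sides are lower-cased with a fixed locale before comparing.
    static boolean containsIgnoreCase(String haystack, String needle) {
        return haystack.toLowerCase(Locale.ROOT)
                .contains(needle.toLowerCase(Locale.ROOT));
    }

    // Joins every candidate that contains the word, in its original spelling.
    static String matches(String[] candidates, String word) {
        StringBuilder out = new StringBuilder();
        for (String candidate : candidates) {
            if (containsIgnoreCase(candidate, word)) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(candidate);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] array = {"Example", "hi", "EXAMPLE", "example", "eXamPLe"};
        System.out.println(matches(array, "Examp")); // Example EXAMPLE example eXamPLe
    }
}
```

The key point is the same as in the answers: normalize both the candidate and the search word, then print the candidate rather than the search word.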

Well, I think that other than using .equalsIgnoreCase you might also be interested in Matcher (Java regex). The links are self-explanatory: http://www.vogella.com/articles/JavaRegularExpressions/article.html
http://docs.oracle.com/javase/6/docs/api/java/util/regex/Matcher.html
Pattern/Matcher group() to obtain substring in Java?
String contains - ignore case

Submit this as your assignment. You'll definitely come off as unique.
String word = "Exam";
String [] array = {"Example","hi","EXAMPLE","example","eXamPLe"};
for (String str : array)
if (str.matches("(?i)"+word+".*"))
System.out.print(str + " "); // prints: Example EXAMPLE example eXamPLe
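One caveat with the regex snippet above: it only matches when the word is a prefix, and any regex metacharacters in the word are interpreted as regex. A possible variant, sketched here with hypothetical names (QuotedRegexSearchDemo, findContaining), quotes the word with Pattern.quote and uses find() so that it matches anywhere in the candidate:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class QuotedRegexSearchDemo {
    // Returns the candidates that contain the word anywhere, ignoring case.
    // Pattern.quote makes the word match literally, even if it contains
    // characters such as '.' or '*'.
    static List<String> findContaining(String[] candidates, String word) {
        Pattern pattern = Pattern.compile(Pattern.quote(word), Pattern.CASE_INSENSITIVE);
        List<String> hits = new ArrayList<>();
        for (String candidate : candidates) {
            if (pattern.matcher(candidate).find()) {
                hits.add(candidate);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] array = {"Example", "hi", "EXAMPLE", "example", "eXamPLe"};
        System.out.println(findContaining(array, "Exam")); // [Example, EXAMPLE, example, eXamPLe]
    }
}
```

Compiling the pattern once outside the loop also avoids recompiling it for every candidate, which str.matches() does on each call.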

Related

Regular expression for string with apostrophes

I'm trying to build a regex which will filter all non-alphabetical characters from a string, and if the string contains single quotes then I want to keep them as an exception to the rule.
So for example when I enter
car's34
as a result I want to get
car's
when I enter
*&* Lisa's car 0)*
I want to get
Lisa's
at the moment I use this:
string.replaceAll("[^A-Za-z]", "")
however, it gives me only letters and removes the desired single quotes.
This will also remove apostrophes that are not "part of words":
string = string.replaceAll("[^A-Za-z' ]+|(?<=^|\\W)'|'(?=\\W|$)", "")
.replaceAll(" +", " ").trim();
This first simply adds an apostrophe to the list of chars you want to keep, but uses look arounds to find apostrophes not within words, so
I'm a ' 123 & 'test'
would become
I'm a test
Note how the solitary apostrophe was removed, as well as the apostrophes wrapping test, but I'm was preserved.
The subsequent replaceAll() is to replace multiple spaces with a single space, which will result if there's a solitary apostrophe in the input. A further call to trim() was added in case it occurs at the end of the input.
Here's a test:
String string = "I'm a ' 123 & 'test'";
string = string.replaceAll("[^A-Za-z' ]+|(?<=^|\\W)'|'(?=\\W|$)", "").replaceAll(" +", " ").trim();
System.out.println(string);
Output:
I'm a test
Isn't this working ?
[^A-Za-z']
The obvious solution would be:
string.replaceAll("[^A-Za-z']", "")
I suspect you want something more.
You can try the regular expression:
[^\p{L}' ]
\p{L} denote the category of Unicode letters.
On the other hand, you should keep the compiled Pattern in a constant to avoid recompiling the expression every time, something like this:
private static final Pattern REGEX_PATTERN =
Pattern.compile("[^\\p{L}' ]");
public static void main(String[] args) {
String input = "*&* Lisa's car 0)*";
System.out.println(
REGEX_PATTERN.matcher(input).replaceAll("")
); // prints " Lisa's car "
}
@Bohemian has a good idea, but word boundaries are called for instead of lookaround:
string.replaceAll("([^A-Za-z']|\\B'|'\\B)+", " ");
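To see what the non-word-boundary version does on the inputs from this thread, here is a small sketch. The class and method names are hypothetical, and the trailing trim() is my own addition (the one-liner above leaves a leading or trailing space when the input starts or ends with removed characters).

```java
class ApostropheBoundaryDemo {
    // Replaces every run of unwanted characters with a single space.
    // An apostrophe is treated as unwanted when \B (non-word-boundary)
    // holds on either side of it, i.e. it is not attached to a word
    // character on that side.
    static String clean(String s) {
        return s.replaceAll("([^A-Za-z']|\\B'|'\\B)+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(clean("I'm a ' 123 & 'test'")); // I'm a test
        System.out.println(clean("car's34"));              // car's
        System.out.println(clean("*&* Lisa's car 0)*"));   // Lisa's car
    }
}
```

Because each run is replaced with one space, no second replaceAll(" +", " ") pass is needed here.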

Java Regex : String Formatting

After running this
Names.replaceAll("^(\\w)\\w+", "$1.")
I have a String Like
Names = F.DA, ABC, EFG
I want a String format like
F.DA, A.BC & E.FG
How do I do that ?
Update :
If I had a name Like
Robert Filip, Robert Morris, Cirstian Jed
I want like
R.Filip, R.Morris & C.Jed
I would also appreciate a pointer to a good resource on Java regex.
You need to re-assign the result back to Names: since Strings are immutable, the replaceAll method does not replace in place but returns a new String:
names = names.replaceAll(", (?=[^,]*$)", " & ")
Following should work for you:
String names = "Robert Filip, Robert Morris, Cirstian Jed, S.Smith";
String repl = names.replaceAll("((?:^|[^A-Z.])[A-Z])[a-z]*\\s(?=[A-Z])", "$1.")
.replaceAll(", (?=[^,]*$)", " & ");
System.out.println(repl); //=> R.Filip, R.Morris, C.Jed & S.Smith
Explanation:
The 1st replaceAll call matches either the start of the string or a character that is neither an uppercase letter nor a dot, followed by a capital letter (together captured as group #1), then 0 or more lowercase letters, then a whitespace character that must be followed by a capital letter. It replaces the match with group #1 followed by a dot ($1.).
The 2nd replaceAll call matches a comma and space that are not followed by another comma (that is, the last comma) and replaces them with the literal string " & ".
Try this
String names = "Amal.PM , Rakesh.KR , Ajith.N";
names = names.replaceAll(" , (?=[^,]*$)", " & ");
System.out.println("New String : "+names);

Parse and remove special characters in java regex

So we were looking at some of the other regex posts and we are having trouble removing a special case in one instance; the special character is in the beginning of the word.
We have the following line in our code:
String k = s.replaceAll("([a-z]+)[()?:!.,;]*", "$1");
where s is a singular word. For example, when parsing the sentence "(hi hi hi)" by tokenizing it, and then performing the replaceAll function on each token, we get an output of:
(hi
hi
hi
What are we missing in our regex?
You can use an easier approach - replace the characters that you do not want with spaces:
String k = s.replaceAll("[()?:!.,;]+", " ");
Position matters, so you would also need to match the excluded characters before the capturing group:
String k = s.replaceAll("[()?:!.,;]*([a-z]+)[()?:!.,;]*", "$1");
Your replace only removed the "special chars" after the [a-z]+; that's why the ( before hi is left there.
If you know s is a single word
you could either:
String k = s.replaceAll("\\W*(\\w+)\\W*", "$1");
or
String k = s.replaceAll("\\W*", "");
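Putting the tokenizing step and the cleanup step together, a rough sketch could look like the following. The names (TokenCleanupDemo, cleanTokens) are hypothetical, and using \p{Punct} anchored at both ends of the token is my own choice rather than the exact character list from the question; it strips leading and trailing punctuation but leaves punctuation inside a word alone.

```java
import java.util.ArrayList;
import java.util.List;

class TokenCleanupDemo {
    // Splits on whitespace, then strips punctuation from both ends of each token.
    static List<String> cleanTokens(String sentence) {
        List<String> cleaned = new ArrayList<>();
        for (String token : sentence.trim().split("\\s+")) {
            String word = token.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
            if (!word.isEmpty()) {
                cleaned.add(word); // skip tokens that were punctuation only
            }
        }
        return cleaned;
    }

    public static void main(String[] args) {
        System.out.println(cleanTokens("(hi hi hi)")); // [hi, hi, hi]
    }
}
```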
This can be simpler. Try this:
String oldString = "Hi There ##$ What is %#your name?##$##$ 0123$$";
System.out.println(oldString.replaceAll("[\\p{Punct}\\s\\d]+", " "));
Output (with a trailing space):
Hi There What is your name
Digits are removed as well, because \d is in the character class.
.replaceAll("[\\p{Punct}\\s\\d]+", " ");
replaces every run of punctuation, whitespace, and digits with a single space; \p{Punct} covers almost all the special characters.

Match only first and last character of a string

I had a look at other stackoverflow questions and couldn't find one that asked the same question, so here it is:
How do you match the first and last characters of a string (can be multi-line or empty).
So for example:
String = "this is a simple sentence"
Note that the string includes the beginning and ending quotation marks.
How do I match the first and last characters where the string begins and ends with a quotation mark (")?
I tried:
^"|$" and \A"\Z"
but these do not produce the desired result.
Thanks for your help in advance :)
Is this what you are looking for?
String input = "\"this is a simple sentence\"";
String result = input.replaceFirst("(?s)^\"(.*)\"$", " $1 ");
This will replace the first and last character of the input string with spaces if it starts and ends with ". It will also work across multiple lines since the DOTALL flag is specified by (?s).
The regex that matches the whole input is ".*". In Java, it looks like this:
String regex = "\".*\"";
System.out.println("\"this is a simple sentence\"".matches(regex)); // true
System.out.println("this is a simple sentence".matches(regex)); // false
System.out.println("this is a simple sentence\"".matches(regex)); // false
If you want to remove the quotes, use this:
String input = "\"this is a simple sentence\"";
input = input.replaceAll("(^\"|\"$)", ""); // this is a simple sentence (without any quotes)
If you want this to work over multiple lines, use this:
String input = "\"this is a simple sentence\"\n\"and another sentence\"";
System.out.println(input + "\n");
input = input.replaceAll("(?m)(^\"|\"$)", "");
System.out.println(input);
which produces output:
"this is a simple sentence"
"and another sentence"
this is a simple sentence
and another sentence
Explanation of regex (?m)(^"|"$):
(?m) means "Caret and dollar match after and before newlines for the remainder of the regular expression"
(^"|"$) means ^" OR "$, which means "start of line then a double quote" OR "double quote then end of line"
Why not use the simple logic of getting the first and last characters based on charAt method of String? Place a few checks for empty/incomplete strings and you should be done.
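A minimal sketch of that charAt idea, with made-up names (QuoteCheckDemo, isQuoted, stripQuotes). Requiring a length of at least 2 is my assumption, so that a lone " does not count as both the opening and the closing quote:

```java
class QuoteCheckDemo {
    // True if the string starts and ends with a double quote.
    // Null, empty, and single-character strings are rejected up front.
    static boolean isQuoted(String s) {
        return s != null
                && s.length() >= 2
                && s.charAt(0) == '"'
                && s.charAt(s.length() - 1) == '"';
    }

    // Removes the surrounding quotes if present; otherwise returns s unchanged.
    static String stripQuotes(String s) {
        return isQuoted(s) ? s.substring(1, s.length() - 1) : s;
    }

    public static void main(String[] args) {
        System.out.println(isQuoted("\"this is a simple sentence\""));    // true
        System.out.println(stripQuotes("\"this is a simple sentence\"")); // this is a simple sentence
        System.out.println(isQuoted("this is a simple sentence\""));      // false
    }
}
```

Since no regex is involved, multi-line input needs no special flag.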
String regexp = "(?s)\".*\"";
String data = "\"This is some\n\ndata\"";
Matcher m = Pattern.compile(regexp).matcher(data);
if (m.find()) {
System.out.println("Match starts at " + m.start() + " and ends at " + m.end());
}

What is the best way to extract the first word from a string in Java?

Trying to write a short method so that I can parse a string and extract the first word. I have been looking for the best way to do this.
I assume I would use str.split(","); however, I would like to grab just the first word from a string, save that in one variable, and put the rest of the tokens in another variable.
Is there a concise way of doing this?
The second parameter of the split method is optional; if specified, it limits the result to at most N pieces, so the delimiter is applied at most N - 1 times.
For example:
String mystring = "the quick brown fox";
String arr[] = mystring.split(" ", 2);
String firstWord = arr[0]; //the
String theRest = arr[1]; //quick brown fox
Alternatively you could use the substring method of String.
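You should be doing this
String input = "hello world, this is a line of text";
int i = input.indexOf(' '); // -1 when there is no space
String word = input.substring(0, i); // "hello"
String rest = input.substring(i + 1); // skip the space itself
This avoids regex entirely, so it is likely the cheapest option, but it throws StringIndexOutOfBoundsException when the input contains no space, so check i first.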
To simplify the above:
text.substring(0, text.indexOf(' '));
Here is a ready function:
private String getFirstWord(String text) {
int index = text.indexOf(' ');
if (index > -1) { // Check if there is more than one word.
return text.substring(0, index).trim(); // Extract first word.
} else {
return text; // Text is the first word itself.
}
}
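Since the question asks for both the first word and the rest, here is one possible extension of the function above. It is only a sketch: the names (FirstWordDemo, splitFirstWord) are hypothetical, and trimming the input and returning an empty "rest" when there is only one word are my own assumptions about the desired behaviour.

```java
class FirstWordDemo {
    // Returns a two-element array: {firstWord, rest}.
    // rest is "" when the text contains a single word.
    static String[] splitFirstWord(String text) {
        String trimmed = text.trim();
        int index = trimmed.indexOf(' ');
        if (index < 0) {
            return new String[] {trimmed, ""}; // no space: the whole text is the first word
        }
        String first = trimmed.substring(0, index);
        String rest = trimmed.substring(index + 1).trim(); // drop the separating space(s)
        return new String[] {first, rest};
    }

    public static void main(String[] args) {
        String[] parts = splitFirstWord("the quick brown fox");
        System.out.println(parts[0]); // the
        System.out.println(parts[1]); // quick brown fox
    }
}
```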
The simple one I use is
str.contains(" ") ? str.split(" ")[0] : str
where str is your string. So, if
str is empty, it is returned as is.
str has one word, it is returned as is.
str has multiple words, the first word is extracted and returned.
Hope this is helpful.
import org.apache.commons.lang3.StringUtils;
...
StringUtils.substringBefore("Grigory Kislin", " ")
You can use String.split with a limit of 2.
String s = "Hello World, I'm the rest.";
String[] result = s.split(" ", 2);
String first = result[0];
String rest = result[1];
System.out.println("First: " + first);
System.out.println("Rest: " + rest);
// prints =>
// First: Hello
// Rest: World, I'm the rest.
API docs for: split
For those who are searching for Kotlin:
var delimiter = " "
var mFullname = "Mahendra Rajdhami"
var greetingName = mFullname.substringBefore(delimiter)
like this:
final String str = "This is a long sentence";
final String[] arr = str.split(" ", 2);
System.out.println(Arrays.toString(arr));
arr[0] is the first word, arr[1] is the rest
You could use a Scanner
http://download.oracle.com/javase/1.5.0/docs/api/java/util/Scanner.html
The scanner can also use delimiters
other than whitespace. This example
reads several items in from a string:
String input = "1 fish 2 fish red fish blue fish";
Scanner s = new Scanner(input).useDelimiter("\\s*fish\\s*");
System.out.println(s.nextInt());
System.out.println(s.nextInt());
System.out.println(s.next());
System.out.println(s.next());
s.close();
prints the following output:
1
2
red
blue
None of these answers appears to define what the OP might mean by a "word". As others have already said, a "word boundary" may be a comma, and certainly can't be counted on to be a space, or even "white space" (i.e. also tabs, newlines, etc.)
At the simplest, I'd say the word has to consist of any Unicode letters, and any digits. Even this may not be right: a String may not qualify as a word if it contains numbers, or starts with a number. Furthermore, what about hyphens, or apostrophes, of which there are presumably several variants in the whole of Unicode? All sorts of discussions of this kind and many others will apply not just to English but to all other languages, including non-human language, scientific notation, etc. It's a big topic.
But a start might be this (NB written in Groovy):
String givenString = "one two9 thr0ee four"
// String givenString = "oňňÜÐæne;:tŵo9===tĥr0eè? four!"
// String givenString = "mouse"
// String givenString = "&&^^^%"
String[] substrings = givenString.split( '[^\\p{L}^\\d]+' )
println "substrings |$substrings|"
println "first word |${substrings[0]}|"
This works OK for the first, second and third givenStrings. For "&&^^^%" it says that the first "word" is a zero-length string, and the second is "^^^". Actually a leading zero-length token is String.split's way of saying "your given String starts not with a token but a delimiter".
NB in regex \p{L} means "any Unicode letter". The parameter of String.split is of course what defines the "delimiter pattern"... i.e. a clump of characters which separates tokens.
NB2 Performance issues are irrelevant for a discussion like this, and almost certainly for all contexts.
NB3 My first port of call was Apache Commons' StringUtils package. They are likely to have the most effective and best engineered solutions for this sort of thing. But nothing jumped out... https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html ... although something of use may be lurking there.
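For readers who want the same idea in plain Java rather than Groovy, here is a rough sketch. It searches for the first run of "word" characters instead of splitting on delimiters, which sidesteps the leading zero-length token. The definition of a word as Unicode letters plus decimal digits (\p{L} and \p{Nd}) is only the working assumption from this answer, and the names (UnicodeFirstWordDemo, firstWord) are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeFirstWordDemo {
    // A "word" here is a run of Unicode letters and decimal digits (an assumption).
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{Nd}]+");

    // Returns the first word, or "" when the text contains no word at all.
    static String firstWord(String text) {
        Matcher matcher = WORD.matcher(text);
        return matcher.find() ? matcher.group() : "";
    }

    public static void main(String[] args) {
        System.out.println(firstWord("one two9 thr0ee four")); // one
        System.out.println(firstWord(";:one==two"));           // one
        System.out.println("|" + firstWord("&&^^^%") + "|");   // ||
    }
}
```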
You could also use http://download.oracle.com/javase/6/docs/api/java/util/StringTokenizer.html
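A minimal StringTokenizer sketch, in case it helps. The class and method names are made up, and it relies on the tokenizer's default delimiters (space, tab, newline, carriage return, form feed), so a comma stays attached to the word.

```java
import java.util.StringTokenizer;

class TokenizerFirstWordDemo {
    // Returns the first whitespace-delimited token, or "" if there is none.
    static String firstWord(String text) {
        StringTokenizer tokenizer = new StringTokenizer(text);
        return tokenizer.hasMoreTokens() ? tokenizer.nextToken() : "";
    }

    public static void main(String[] args) {
        System.out.println(firstWord("hello world, this is a line of text")); // hello
    }
}
```

Unlike split(" "), leading whitespace does not produce an empty first token here.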
I know this question has been answered already, but I have another solution (For those still searching for answers) which can fit on one line:
It uses the split functionality but only gives you the 1st entity.
String test = "123_456";
String value = test.split("_")[0];
System.out.println(value);
The output will show:
123
The easiest way I found is this (note that this snippet is Dart, not Java):
void main() {
String input = "hello world, this is a line of text";
print(input.split(" ").first);
}
Output: hello
Assuming Delimiter is a blank space here:
Before Java 8:
private String getFirstWord(String sentence){
String delimiter = " "; //Blank space is delimiter here
String[] words = sentence.split(delimiter);
return words[0];
}
Java 8 and later:
private String getFirstWord(String sentence){
String delimiter = " "; //Blank space is delimiter here
return Arrays.stream(sentence.split(delimiter))
.findFirst()
.orElse("No word found");
}
You can also do it like this:
String anotherPalindrome = "Niagara. O roar again!";
String roar = anotherPalindrome.substring(11, 15); // "roar"
