Getting the first word in a String after a quotation mark " (Java)

I have the following String:
String sentence = "this is my sentence \"course of math\" of this year";
I need to get the first word after a double quote (").
In my example, I would get the word: course.

That's really simple. Try this:
/"(\w+)/
You can then get the expected word from capturing group 1 ($1):
" matches the character " literally
(\w+) is capturing group 1
\w+ matches one or more word characters [a-zA-Z0-9_]
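A minimal sketch of the /"(\w+)/ approach in Java, assuming you want the word itself rather than a rewritten string: Matcher.find() locates the first " that is immediately followed by a word character, and group(1) holds the word. The class and method names are only illustrative, and like the regex above it does not skip spaces after the quote.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstQuotedWord {
    // Same pattern as /"(\w+)/: a literal " followed by one or more word chars in group 1
    private static final Pattern AFTER_QUOTE = Pattern.compile("\"(\\w+)");

    // Returns the word right after the first " that is followed by a word char, or null if none
    static String firstWordAfterQuote(String s) {
        Matcher m = AFTER_QUOTE.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String sentence = "this is my sentence \"course of math\" of this year";
        System.out.println(firstWordAfterQuote(sentence)); // course
    }
}
```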

An alternative replaceAll approach:
String sentence = "this is my sentence \"course of math\" of this year";
System.out.println(sentence.replaceAll("(?s)[^\"]*\"(\\w+).*", "$1"));
// Or - if there can be a space after the first quote:
sentence = "this is my sentence \" course of math\" of this year";
System.out.println(sentence.replaceAll("(?s)[^\"]*\"\\s*(\\w+).*", "$1"));
It returns course because the pattern matches any characters up to the first " (with [^"]*), then the quote itself, then captures one or more alphanumeric or underscore characters (with (\w+)), and finally matches everything up to the end of the string (with .*, which also crosses line breaks thanks to (?s)). The whole match is then replaced with just the contents of Group 1. If the pattern does not match at all, replaceAll returns the original string unchanged.
Just in case someone wonders if a non-regex solution is also possible, here is one that does not support spaces between the first " and the word:
String sentence = "this is my sentence \"course of math\" of this year";
String[] MyStrings = sentence.split(" "); // Split with a space
String res = "";
for(int i=0; i < MyStrings.length; i++) // Iterate over the split parts
{
if(MyStrings[i].startsWith("\"")) // Check if the split chunk starts with "
{
res = MyStrings[i].substring(1); // Get a substring from Index 1
break; // Stop the iteration, yield the value found first
}
}
System.out.println(res);
And here is another one that supports spaces between the first " and the next word:
String sentence = "this is my sentence \" course of math\" of this year";
String[] MyStrings = sentence.split("\"");
String res = MyStrings.length == 1 ? MyStrings[0] : // If no split took place, use the whole string
MyStrings[1].trim().indexOf(" ") > -1 ? // If the trimmed second element still contains a space
MyStrings[1].trim().substring(0, MyStrings[1].trim().indexOf(" ")) : // take the text up to that space
MyStrings[1].trim(); // Else, take the whole (trimmed) second element
System.out.println(res);
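For comparison, here is a sketch of a non-regex variant that scans with indexOf and charAt instead of split, so it tolerates spaces after the quote and does not depend on words being separated by single spaces. The helper name is hypothetical. Note that Character.isLetterOrDigit accepts any Unicode letter or digit, which is slightly broader than \w in Java's default mode.

```java
public class QuotedWordNoRegex {
    // Hypothetical helper: returns the first run of letters, digits or '_' after the
    // first double quote, skipping any whitespace in between; returns "" if there is none.
    static String firstWordAfterQuote(String s) {
        int i = s.indexOf('"');
        if (i < 0) {
            return ""; // no quote at all
        }
        i++; // move past the quote
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) {
            i++; // skip spaces, as in "\" course"
        }
        int start = i;
        while (i < s.length() && (Character.isLetterOrDigit(s.charAt(i)) || s.charAt(i) == '_')) {
            i++; // extend the word
        }
        return s.substring(start, i);
    }

    public static void main(String[] args) {
        System.out.println(firstWordAfterQuote("this is my sentence \"course of math\" of this year"));  // course
        System.out.println(firstWordAfterQuote("this is my sentence \" course of math\" of this year")); // course
    }
}
```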

Related

Append a char at the end of each word using regex

I am looking for a regex (Java) which will append a '~' character after the end of each word.
My requirement is:
Append ~ at the end of each word
If a word contains any special character, do not append '~'.
If there are multiple consecutive whitespace characters, they should be trimmed to a single space.
Please have a look at my example below:
Input: Hello World How* A1e Y?u
Output: Hello~ World~ How* A1e~ Y?u
With help from the forum I could achieve the rest, but I am not able to achieve #2.
My code snippet:
Pattern pattern = Pattern.compile("([^\\s][a-zA-Z0-9])(\\s|$)");
String result = pattern.matcher(searchTerm).replaceAll("$1~$2");
How can I skip the append operation if a word contains any special character?
Please suggest.
I suggest using
searchTerm = searchTerm.replaceAll("(?<!\\S)\\w++(?!\\S)", "$0~").replaceAll("\\s{2,}", " ").trim();
Details
(?<!\S) - a negative lookbehind making sure there is either a whitespace or start of string right before the current location
\w++ - 1 or more word chars
(?!\S) - a negative lookahead making sure there is either a whitespace or the end of the string right after the current location.
The $0 is the whole match value.
The .replaceAll("\\s{2,}", " ") (for regular spaces, just replace \\s with a space) part "shrinks" any two or more whitespace characters to a single space, and .trim() part trims the result from whitespace on both ends.
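To see the approach above end to end, here is a small self-contained class wrapping the one-liner (the class and method names are hypothetical). Keep in mind that \w also counts _ as a word character, so a token like my_word would still get a ~.

```java
public class AppendTilde {
    // Appends ~ to every whitespace-delimited token made only of word chars,
    // then collapses runs of 2+ whitespace chars and trims both ends.
    static String appendTilde(String searchTerm) {
        return searchTerm
                .replaceAll("(?<!\\S)\\w++(?!\\S)", "$0~")
                .replaceAll("\\s{2,}", " ")
                .trim();
    }

    public static void main(String[] args) {
        System.out.println(appendTilde("Hello   World How* A1e Y?u")); // Hello~ World~ How* A1e~ Y?u
    }
}
```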
This might help:
import java.util.regex.Pattern;
public class TildeDemo {
public static void main(String[] args) {
String input = "Hello World How* A1e Y?u word";
// Collapse whitespace and pad both ends so every word is surrounded by spaces
String extraSpaceInput = String.format(" %s ", input.replaceAll("\\s+", " "));
// Wanted output: Hello~ World~ How* A1e~ Y?u word~
// Lookarounds check the surrounding spaces without consuming them, so adjacent words can both match
Pattern pattern = Pattern.compile("(?<=\\s)([a-zA-Z0-9]+)(?=\\s)");
String output = pattern.matcher(extraSpaceInput).replaceAll("$1~");
String cleanedUpOutput = output.replaceAll("\\s+", " ").trim();
// My output: "Hello~ World~ How* A1e~ Y?u word~"
System.out.println("My output: \"" + cleanedUpOutput + "\"");
}
}

How do I get the first and last initial/character from a name string

I am trying to extract the first character of both the first and last name from a string but have had no luck so far (I tried searching multiple solutions online, no luck either).
So for example, if this is the string:
name = A A BCD EFG
I want to display just A and E (say, AE).
I am trying to figure out if there's an easier way to do this. The regex I am using does get me the first character, but it doesn't work for the last character: it returns AEFG, i.e. the whole last name, instead.
Is there a way to go about this? Here's my code:
String lastName = "";
String firstName= "";
if(name.split("\\w+").length>1){
lastName = name.substring(name.lastIndexOf(" ")+1,1);
firstName = name.substring(0,1);
}
else{
firstName = name;
}
String completeInitials = firstName + " " + lastName;
You can use this regex to capture the first letter of the first name in group 1 and the first letter of the last name in group 2, then replace the whole match with $1$2 to get the desired string:
^\s*([a-zA-Z]).*\s+([a-zA-Z])\S+$
Explanation:
^ - Start of string
\s* - Matches optional whitespace(s)
([a-zA-Z]) - Matches the first letter of firstname and captures it in group1
.*\s+ - Here .* matches any text greedily and \s+ ensures it matches the last whitespace just before last name
([a-zA-Z]) - Captures the first letter of the last name in group 2
\S+$ - Matches remaining part of lastname and end of input
Java code:
String s = "A A BCD EFG";
System.out.println(s.replaceAll("^\\s*([a-zA-Z]).*\\s+([a-zA-Z])\\S+$", "$1$2"));
Prints:
AE
Edit: To make sure the result is upper case when the name is written in lower case, PCRE-based tools let you put \U before the replacement (\U$1$2). Java's replacement syntax has no \U, so call .toUpperCase() on the result instead:
String s = "a A BCD eFG";
System.out.println(s.replaceAll("^\\s*([a-zA-Z]).*\\s+([a-zA-Z])\\S+$", "$1$2").toUpperCase());
Prints upper case even though parts of the input are lower case:
AE
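The \S+ at the end of the regex above requires the last name to be at least two characters long, so an input such as a A BCD e would come back unchanged. Here is a sketch of a variant that uses \S* instead, under the assumption that one-letter last names should be accepted (the class and method names are illustrative):

```java
public class Initials {
    // Same regex as the answer except \S* (not \S+) at the end, so a one-letter last name also matches
    private static final String INITIALS = "^\\s*([a-zA-Z]).*\\s+([a-zA-Z])\\S*$";

    static String initials(String name) {
        return name.replaceAll(INITIALS, "$1$2").toUpperCase();
    }

    public static void main(String[] args) {
        System.out.println(initials("A A BCD EFG")); // AE
        System.out.println(initials("a A BCD e"));   // AE
    }
}
```

Like the original, a single-word name does not match at all, so replaceAll returns it unchanged (here upper-cased).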
What about this: split the name String into parts. Then take the first char of the first and the last part.
This solution prints both initials (first name and last name) in case both are present and just one initial (name) if only a first name or a last name is given.
private void printInitials(String name) {
String[] nameParts = name.trim().split("\\s+"); // tolerate leading/trailing and repeated spaces
String firstName = nameParts[0];
char firstNameChar = firstName.charAt(0);
if (nameParts.length > 1) {
System.out.println("First character of first name: " + firstNameChar);
String lastName = nameParts[nameParts.length - 1];
char lastNameChar = lastName.charAt(0);
System.out.println("First character of last name: " + lastNameChar);
}
else {
System.out.println("First character name: " + firstNameChar);
}
}
So, in case of A A BCD EFG, it prints:
First character of first name: A
First character of last name: E
And in case of "AABCDEFG", it prints:
First character name: A
With Kotlin:
"Sample Name".replace("^\\s*([a-zA-Z]).*\\s+([a-zA-Z])\\S+$".toRegex(), "$1$2").uppercase()
Or use this extension function, e.g. "Sample" -> "SA", "S" -> "S", "Sample Name" -> "SN", and null or "" -> "?":
fun String?.toTwoChar(): String {
return when {
isNullOrEmpty() -> {
"?"
}
contains(" ") -> {
replace("^\\s*([a-zA-Z]).*\\s+([a-zA-Z])\\S+$".toRegex(), "$1$2").uppercase()
}
length > 1 -> {
substring(0, 2).uppercase()
}
else -> {
substring(0, 1).uppercase()
}
}
}
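In JavaScript: split the string on ' ', keep the first character of the first and last words, and join them (map returns undefined for the middle words, which join('') turns into empty strings):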
string.split(' ').map((element,index,array)=>{
if(index == 0 || index==array.length-1 ){
return element[0]
}
}).join('')

REGEX : How to add list as condition to satisfy regex in java

I want to check whether the next sentence starts with a word that is already listed in wordList.
WordList = {Hello, Namshte, Hi, Hey ...... around 500 records- dummy list here }
Ex. This is 1st Sentence. This is 2nd place. Hello,This is my 3rd Sentence.
Requirement -> "." + " " (space) + "<word from List>"
For a simple regex I can write "\.\s[A-Z]" to match a sentence that starts with a capital letter.
But I want to detect sentence which starts with word from List.
REGEX -> \.\s[???] -> ? how to add List here
How do I select the 3rd sentence properly?
Use this regular expression, given as a Java string:
"\\.\\s(?i:Hello|Namshte|Hi|Hey)\\b"
Explanation:
\\. Match period
\\s Match a whitespace
(?i: ) Non-capturing group that matches case-insensitively.
Hello|Namshte|Hi|Hey Match one of the words.
\\b Match word boundary to prevent match on word like Hijack.
To select the entire 3rd sentence, meaning up to and including the next period, use this:
"\\.\\s((?i:Hello|Namshte|Hi|Hey)\\b[^.]+\\.)"
The capturing group is the sentence.
Update: code example:
String[] wordList = { "Hello", "Namshte", "Hi", "Hey", ...... 500 words };
StringBuilder buf = new StringBuilder();
for (String word : wordList) {
if (buf.length() != 0)
buf.append('|');
buf.append(Pattern.quote(word));
}
Pattern regex = Pattern.compile("\\.\\s((?i:" + buf + ")\\b[^.]+\\.)");
String text = "This is 1st Sentence. This is 2nd place. Hello,This is my 3rd Sentence." +
" This is 4th place. Namshte, at 5.";
Matcher m = regex.matcher(text);
while (m.find())
System.out.println("Found: " + m.group(1));
Output
Found: Hello,This is my 3rd Sentence.
Found: Namshte, at 5.
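A self-contained version of the code above, with a hypothetical four-word list standing in for the real 500 entries (List.of needs Java 9 or later). It collects the captured sentences instead of printing them, which makes it easier to test:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SentenceStartsWithWord {
    // Builds \.\s((?i:w1|w2|...)\b[^.]+\.) with every entry quoted, so '.' or '?' in a word is literal
    static Pattern buildPattern(List<String> wordList) {
        StringBuilder buf = new StringBuilder();
        for (String word : wordList) {
            if (buf.length() != 0) {
                buf.append('|');
            }
            buf.append(Pattern.quote(word));
        }
        return Pattern.compile("\\.\\s((?i:" + buf + ")\\b[^.]+\\.)");
    }

    // Returns every sentence (group 1) that follows ". " and starts with a listed word
    static List<String> findSentences(String text, List<String> wordList) {
        List<String> found = new ArrayList<>();
        Matcher m = buildPattern(wordList).matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> words = List.of("Hello", "Namshte", "Hi", "Hey"); // hypothetical sample list
        String text = "This is 1st Sentence. This is 2nd place. Hello,This is my 3rd Sentence."
                + " This is 4th place. Namshte, at 5.";
        findSentences(text, words).forEach(s -> System.out.println("Found: " + s));
    }
}
```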
A simpler alternative is a plain loop over the word list. Note that contains() finds the word anywhere in the sentence, not only at its start:
for(int i=0; i<array.length; i++){
if(sentence.contains(array[i])){
//print out the sentence
}
}
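If you prefer the loop idea but need the word to be at the start of a sentence, one possible sketch splits the text after each period followed by whitespace and compares each sentence's leading run of letters against a lower-cased HashSet of the list. Splitting on periods is a simplifying assumption (abbreviations such as "e.g. " would break it), and unlike the regex it also checks the very first sentence. Names and the sample list are illustrative:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class SentenceLoop {
    static List<String> sentencesStartingWith(String text, List<String> wordList) {
        // Lower-cased set gives constant-time, case-insensitive lookups even for ~500 words
        Set<String> words = new HashSet<>();
        for (String w : wordList) {
            words.add(w.toLowerCase(Locale.ROOT));
        }
        List<String> result = new ArrayList<>();
        // Split after every period that is followed by whitespace (simplifying assumption)
        for (String sentence : text.split("(?<=\\.)\\s+")) {
            int end = 0;
            while (end < sentence.length() && Character.isLetter(sentence.charAt(end))) {
                end++; // first word = leading letters, e.g. "Hello" from "Hello,This"
            }
            if (words.contains(sentence.substring(0, end).toLowerCase(Locale.ROOT))) {
                result.add(sentence);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> wordList = List.of("Hello", "Namshte", "Hi", "Hey"); // hypothetical sample list
        String text = "This is 1st Sentence. This is 2nd place. Hello,This is my 3rd Sentence."
                + " This is 4th place. Namshte, at 5.";
        for (String s : sentencesStartingWith(text, wordList)) {
            System.out.println("Found: " + s);
        }
    }
}
```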

Getting next two words from a given word in string with words containing non alphanumeric characters as well

I have a String as below:
String str = "This is something Total Toys (RED) 300,000.00 (49,999.00) This is something";
Input from user would be a keyword String viz. Total Toys (RED)
I can get the index of the keyword using str.indexOf(keyword);
I can also get the start of the next word by adding length of keyword String to above index.
However, how can I get the next two tokens after the keyword in the given String, which are the values I want?
if(str.contains(keyWord)){
String Value1 = // what should come here such that value1 is 300,000.00 which is first token after keyword string?
String Value2 = // what should come here such that value2 is (49,999.00) which is second token after keyword string?
}
Context: I read a PDF using PDFBox. The keyword above is the header in the first column of a table in the PDF, and the next two tokens I want to read are the values in the next two columns of the same row.
You can use regular expressions to do this. This will work for all instances of the keyword that are followed by two tokens. If the keyword is not followed by two tokens, it won't match; however, this is easily adaptable, so please state if you want to match in cases where 0 or 1 tokens follow the keyword.
String regex = "(?i)%s\\s+([\\S]+)\\s+([\\S]+)";
Matcher m = Pattern.compile(String.format(regex, Pattern.quote(keyword))).matcher(str);
while (m.find())
{
System.out.println(m.group(1));
System.out.println(m.group(2));
}
In your example, %s in the regex is replaced by the quoted keyword Total Toys (RED), so the loop prints:
300,000.00
(49,999.00)
(?i) means case-insensitive
\\s means whitespace
\\S means non-whitespace
[...] is a character class
+ means 1 or more
(...) is a capturing group
EDIT: If you want to use a keyword containing characters that are special in regular expressions, you need Pattern.quote(). For example, ( and ) are special characters in a regex, so a keyword containing them would produce an incorrect pattern. Pattern.quote() wraps the keyword in \Q...\E, so every character in it, including ( and ), is matched literally.
If you want three groups, use this:
String regex = "%s\\s+([\\S]+)\\s+([\\S]+)(?:\\s+([\\S]+))?";
Note: if only two tokens follow the keyword, group(3) will be null.
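A runnable sketch of the regex answer applied to the question's actual keyword, Total Toys (RED), returning the two tokens instead of printing them. It uses \S+ in place of the equivalent [\S]+; the class and method names and the null-when-missing convention are assumptions:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokensAfterKeyword {
    // Returns the two whitespace-delimited tokens after the first occurrence of keyword
    // (matched literally and case-insensitively), or null if two tokens do not follow it.
    static String[] nextTwoTokens(String str, String keyword) {
        String regex = String.format("(?i)%s\\s+(\\S+)\\s+(\\S+)", Pattern.quote(keyword));
        Matcher m = Pattern.compile(regex).matcher(str);
        return m.find() ? new String[] {m.group(1), m.group(2)} : null;
    }

    public static void main(String[] args) {
        String str = "This is something Total Toys (RED) 300,000.00 (49,999.00) This is something";
        String[] values = nextTwoTokens(str, "Total Toys (RED)");
        if (values != null) {
            System.out.println("Value1 = " + values[0]); // 300,000.00
            System.out.println("Value2 = " + values[1]); // (49,999.00)
        }
    }
}
```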
Something like this:
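int idx = str.indexOf(keyWord);
String Value1 = null, Value2 = null; // stay null if the keyword or the tokens are missing
if (idx >= 0) {
String remainingPart = str.substring(idx + keyWord.length());
StringTokenizer st = new StringTokenizer(remainingPart); // java.util.StringTokenizer, splits on whitespace
if (st.hasMoreTokens()) {
Value1 = st.nextToken();
}
if (st.hasMoreTokens()) {
Value2 = st.nextToken();
}
}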
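Try this (Pattern.quote() keeps characters such as ( and ) in the keyword from being treated as regex syntax):
String str = "This is something Total Toys 300,000.00 49,999.00 This is something";
if (str.contains(keyWord)) {
String splitLine = str.split(Pattern.quote(keyWord))[1];
String[] tokens = splitLine.split(" "); // tokens[0] is the empty string before the first space
String Value1 = tokens[1];
String Value2 = tokens[2];
}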
Here is something that works given what you have provided:
public static void main(String[] args)
{
String search = "Total Toys";
String str = "This is something Total Toys 300,000.00 49,999.00 This is something";
int index = str.indexOf(search);
index += search.length();
String[] tokens = str.substring(index, str.length()).trim().split(" ");
String val1 = tokens[0];
String val2 = tokens[1];
System.out.println("Val1: " + val1 + ", Val2: " + val2);
}
Output:
Val1: 300,000.00, Val2: 49,999.00

Match only first and last character of a string

I had a look at other stackoverflow questions and couldn't find one that asked the same question, so here it is:
How do you match the first and last characters of a string (can be multi-line or empty).
So for example:
String = "this is a simple sentence"
Note that the string includes the beginning and ending quotation marks.
How do I match the first and last characters when the string begins and ends with a quotation mark (")?
I tried:
^"|$" and \A"\Z"
but these do not produce the desired result.
Thanks for your help in advance :)
Is this what you are looking for?
String input = "\"this is a simple sentence\"";
String result = input.replaceFirst("(?s)^\"(.*)\"$", " $1 ");
This will replace the first and last character of the input string with spaces if it starts and ends with ". It will also work across multiple lines since the DOTALL flag is specified by (?s).
The regex that matches the whole input is ".*". In Java (where matches() succeeds only if the entire string matches), it looks like this:
String regex = "\".*\"";
System.out.println("\"this is a simple sentence\"".matches(regex)); // true
System.out.println("this is a simple sentence".matches(regex)); // false
System.out.println("this is a simple sentence\"".matches(regex)); // false
If you want to remove the quotes, use this:
String input = "\"this is a simple sentence\"";
input = input.replaceAll("(^\"|\"$)", ""); // this is a simple sentence (without any quotes)
If you want this to work over multiple lines, use this:
String input = "\"this is a simple sentence\"\n\"and another sentence\"";
System.out.println(input + "\n");
input = input.replaceAll("(?m)(^\"|\"$)", "");
System.out.println(input);
which produces output:
"this is a simple sentence"
"and another sentence"
this is a simple sentence
and another sentence
Explanation of regex (?m)(^"|"$):
(?m) means "Caret and dollar match after and before newlines for the remainder of the regular expression"
(^"|"$) means ^" OR "$, which means "start of line then a double quote" OR "double quote then end of line"
Why not use the simple logic of getting the first and last characters with the charAt method of String? Add a few checks for empty or too-short strings and you should be done.
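A sketch of the charAt() approach suggested above; the class and method names are hypothetical. The length check handles the empty string and a lone " so charAt never throws, and because no regex is involved, line breaks inside the string need no special flag:

```java
public class QuoteEnds {
    // True if s has at least two characters and both its first and last are a double quote
    static boolean isQuoted(String s) {
        return s != null
                && s.length() >= 2
                && s.charAt(0) == '"'
                && s.charAt(s.length() - 1) == '"';
    }

    // Removes the surrounding quotes if present, otherwise returns s unchanged
    static String stripQuotes(String s) {
        return isQuoted(s) ? s.substring(1, s.length() - 1) : s;
    }

    public static void main(String[] args) {
        String input = "\"this is a simple sentence\"";
        System.out.println(isQuoted(input));    // true
        System.out.println(stripQuotes(input)); // this is a simple sentence
    }
}
```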
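Alternatively, with a regex: find() locates the quoted span (the (?s) flag lets it cross line breaks) and reports where it is:
String regexp = "(?s)\".*\"";
String data = "\"This is some\n\ndata\"";
Matcher m = Pattern.compile(regexp).matcher(data);
if (m.find()) {
// m.end() is exclusive: the closing quote itself is at index m.end() - 1
System.out.println("Match starts at " + m.start() + " and ends at " + m.end());
}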
