How to extract substrings from a string in Java

I am not very confident in Java, so I need some help extracting multiple substrings from a string. The string is given below.
I have a text file with possibly thousands of similar POS-tagged lines, and I need to extract the original text from them. I tried using a tokenizer but didn't really get the result I wanted. I also tried Pattern and Matcher, but I am having problems with the regex.
String s = "I_PRP recently_RB purchased_VBD this_DT camera_NN";
I want to get the output: I recently purchased this camera
I use this regex:
[\/](.*?)\s\b
But it's not working. Please help me.

Try this:
String s = "I_PRP recently_RB purchased_VBD this_DT camera_NN";
s = s.replaceAll("_\\w+(?=(\\s|$))", "");
System.out.println(s);
This prints:
I recently purchased this camera

It seems that you are attaching a tag to each word to indicate its type (e.g. noun, verb or pronoun). If this suffix will always consist of capital letters, it is safer to use the following regex in your replaceAll:
s = s.replaceAll("_[A-Z]+(?=(\\s|$))", "");
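As a minimal sketch, the capital-letters variant can be wrapped in a helper and applied line by line. The class name, the method name and the second sample line are made up for illustration; in practice the lines would come from the text file.

```java
import java.util.Arrays;
import java.util.List;

class PosTagStripper {
    // Removes a trailing "_TAG" (capital letters only) from every token.
    // The lookahead makes sure the tag ends at whitespace or end of line.
    static String stripTags(String line) {
        return line.replaceAll("_[A-Z]+(?=\\s|$)", "");
    }

    public static void main(String[] args) {
        // Hypothetical sample lines standing in for the file contents.
        List<String> lines = Arrays.asList(
                "I_PRP recently_RB purchased_VBD this_DT camera_NN",
                "It_PRP works_VBZ well_RB");
        for (String line : lines) {
            System.out.println(stripTags(line));
        }
    }
}
```

Because only capital letters are accepted after the underscore, a token such as snake_case is left untouched.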

Related

Most efficient way to get the substring after a specific other substring

If I have a string that looks something like this:
String text = "id=2009,name=Susie,city=Berlin,phone=0723178,birthday=1991-12-07";
I only want to have the info name and phone. I know how to parse the entire String, but in my specific case it is important to only get those two "fields".
So what is the best/most efficient way to have my search method do the following:
search for the substring "name=" and return the substring after it ("Susie") until it reaches the next comma
My approach would have been to:
get the last index of "name=" first
use this index then as the new start for my parsing method
Any other suggestions on how this could be done more efficiently and with more concise code? Thank you for any input.
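The index-based approach described above could be sketched as follows. FieldLookup and valueOf are hypothetical names, and the sketch assumes fields are separated by commas with no escaping.

```java
class FieldLookup {
    // Returns the text after "key=" up to the next comma, or null if the key is absent.
    static String valueOf(String text, String key) {
        String marker = key + "=";
        int start = text.indexOf(marker);
        // Skip hits inside a longer key, e.g. "surname=" when looking for "name=".
        while (start > 0 && text.charAt(start - 1) != ',') {
            start = text.indexOf(marker, start + 1);
        }
        if (start < 0) {
            return null;
        }
        start += marker.length();
        int end = text.indexOf(',', start);
        return end < 0 ? text.substring(start) : text.substring(start, end);
    }

    public static void main(String[] args) {
        String text = "id=2009,name=Susie,city=Berlin,phone=0723178,birthday=1991-12-07";
        System.out.println(valueOf(text, "name"));  // Susie
        System.out.println(valueOf(text, "phone")); // 0723178
    }
}
```

This scans the string at most once per key and allocates only the returned substring.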

You can use the following regex to capture the value after phone or name, and then read the first group from the match object:
(?:phone|name)=([^,]+)
If the string might contain a key that merely ends in phone or name (surname, for example), a more robust version requires the start of the string or a comma before the key:
(?:^|,)(?:phone|name)=([^,]+)
Read more about regular expressions at http://www.regular-expressions.info/
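Here is a sketch of how the anchored pattern might be used from Java. It differs from the regex above in one respect: the key is captured as well (group 1), so the result can be stored in a map. The class and method names are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamePhoneRegex {
    // Group 1 is the key, group 2 the value up to the next comma.
    private static final Pattern FIELD =
            Pattern.compile("(?:^|,)(phone|name)=([^,]+)");

    static Map<String, String> extract(String text) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(text);
        while (m.find()) {
            result.put(m.group(1), m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "id=2009,name=Susie,city=Berlin,phone=0723178,birthday=1991-12-07";
        System.out.println(extract(text)); // {name=Susie, phone=0723178}
    }
}
```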

Regex might be more efficient, but for readability, I <3 Guava
String text = "id=2009,name=Susie,city=Berlin,phone=0723178,birthday=1991-12-07";
final Map<String, String> infoMap = Splitter.on(",")
.omitEmptyStrings()
.trimResults()
.withKeyValueSeparator("=")
.split(text);
System.out.println(infoMap.get("name"));
System.out.println(infoMap.get("birthday"));

Selenium with Java: use split() with multiple delimiters

I have a string with multiple delimiters, i.e. ,':|£. I want to extract only the number from the string, along with the currency symbol. I tried many possible ways but was unsuccessful. Could someone help me with this.
The entire string is given below. I want to extract only the currency, like £340,346
chartInfoValues(event,'Investment Activity Graph','','Year:|2014|Current:|£340,346|Recommended:|£340,346','aa709fd2','220','80')

I would also recommend looking at the java.util.StringTokenizer class.
For example, splitting "my name is khan" on whitespace:
StringTokenizer st = new StringTokenizer("my name is khan", " ");
while (st.hasMoreTokens()) {
    System.out.println(st.nextToken());
}
Hope this helps.

So what you want is the regex to use for String.split()? If so, this works:
(£[0-9]*),[0-9]*
A slightly tidier approach:
£(\d*,\d*)
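A pattern like this matches the amount itself, so it is easier to use with Matcher.find() than with split(). The sketch below uses a slightly stricter variant of the pattern (at least one digit, optional comma groups), which is my own assumption. The pound sign is written as a Unicode escape so the example does not depend on the source file encoding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CurrencyExtractor {
    // Pound sign followed by digits, optionally grouped by commas (as in 340,346).
    private static final Pattern AMOUNT = Pattern.compile("\u00A3\\d+(?:,\\d+)*");

    static List<String> amounts(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = AMOUNT.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "chartInfoValues(event,'Investment Activity Graph','',"
                + "'Year:|2014|Current:|\u00A3340,346|Recommended:|\u00A3340,346',"
                + "'aa709fd2','220','80')";
        // Prints both amounts found in the string, each with its currency symbol.
        System.out.println(amounts(input));
    }
}
```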

Using regular expression with Java (some specific characters)

I have this example:
String str = "HellMCo I fiCZMnd thBVMis site intZereVCsting";
String tags = "BCMVZ";
I need a regular expression that helps me to find every combination of tags. As you can see in str we find four variations. I don't know too much about regular expressions.
I'm starting to test with this pattern:
(\d{,1}[BCMVZ])
P.S. I'm testing at http://regexpal.com/ but my pattern doesn't work there.
So my real question is, how can I detect any variation of any character from another string?

Maybe try something like:
[BCMVZ]+
It finds any run of one or more of the characters B, C, M, V and Z.
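As a rough sketch, the character class can be built from the tags string and applied with Matcher.find(). This assumes tags contains only letters (no regex metacharacters such as ] or ^). The minLength parameter is an addition of mine: with a minimum of 1 the single Z in "intZereVCsting" is reported too, while a minimum of 2 yields the four combinations mentioned in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagRuns {
    // Finds every run of at least minLength characters taken from tags.
    // Assumes tags holds plain letters only, so it is safe inside [...].
    static List<String> find(String str, String tags, int minLength) {
        Pattern p = Pattern.compile("[" + tags + "]{" + minLength + ",}");
        Matcher m = p.matcher(str);
        List<String> runs = new ArrayList<>();
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        String str = "HellMCo I fiCZMnd thBVMis site intZereVCsting";
        String tags = "BCMVZ";
        System.out.println(find(str, tags, 1)); // [MC, CZM, BVM, Z, VC]
        System.out.println(find(str, tags, 2)); // [MC, CZM, BVM, VC]
    }
}
```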

Encoding URL strings with regular expression

I'm trying to replace several different characters with different values. For example, if I have: #love hate then what I would like to get back is %23love%20hate
Is it something to do with groups? I tried to understand how to use groups but I really didn't understand them.

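You can try to do this:
String encodedstring = URLEncoder.encode("#love hate", "UTF-8");
This encodes # as %23. Note that URLEncoder produces form encoding, so the space becomes + rather than %20 (the result is %23love+hate); if you need %20, call replace("+", "%20") on the result. To reverse it, do this:
String loveHate = URLDecoder.decode(encodedstring, "UTF-8");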
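A minimal sketch of the URLEncoder route, assuming the goal is exactly %23love%20hate. URLEncoder writes a space as +, so the sketch swaps that for %20 afterwards; a literal + in the input has already become %2B by then, so the swap is safe. The class and method names are made up.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class PercentEncodeDemo {
    static String encode(String raw) {
        try {
            // Form-encode, then turn the "+" used for spaces into "%20".
            return URLEncoder.encode(raw, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String encoded = encode("#love hate");
        System.out.println(encoded);         // %23love%20hate
        System.out.println(decode(encoded)); // #love hate
    }
}
```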

You don't need RegEx to replace single characters. RegEx is overkill for such purposes. You can simply use the plain replace method of the String class, once for each character that you want to replace.
String output = input.replace("#", "%23");
output = output.replace(" ", "%20");
How many such characters do you want to replace?

If you are trying to encode a URL as UTF-8 or some other encoding, using existing classes will be much easier, e.g. URIUtil from the commons-httpclient project:
URIUtil.encodeWithinQuery(input, "UTF-8");

No, you will need multiple replaces. Another option is to use a group to find the next occurrence of one of several strings, inspect which string it is and replace it appropriately, perhaps using a map.

I think what you want to achieve is URL encoding rather than pure replacement.
See the answers in this SO thread, especially the one with 7 votes, which may be the most interesting for you:
HTTP URL Address Encoding in Java

As Mat said, the best way to solve this problem is with URLEncoder. However, if you insist on using regex, then see the sample code in the documentation for java.util.regex.Matcher.appendReplacement:
Pattern p = Pattern.compile("cat");
Matcher m = p.matcher("one cat two cats in the yard");
StringBuffer sb = new StringBuffer();
while (m.find()) {
m.appendReplacement(sb, "dog");
}
m.appendTail(sb);
System.out.println(sb.toString());
Within the loop, you can use m.group() to see what substring matched and then do a custom substitution based on that. This technique can be used for replacing ${variables} by looking them up in a map, etc.
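The map lookup described above might look like the following sketch, applied to the # and space example from the question. The class name and the contents of the map are assumptions; Matcher.quoteReplacement guards against $ or \ in a replacement value.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MapReplace {
    static String encode(String input) {
        // Hypothetical table of characters and their replacements.
        Map<String, String> replacements = new HashMap<>();
        replacements.put("#", "%23");
        replacements.put(" ", "%20");

        Matcher m = Pattern.compile("[# ]").matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // m.group() is the character that matched; look up its replacement.
            String replacement = replacements.get(m.group());
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("#love hate")); // %23love%20hate
    }
}
```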

How to match the word exactly with regex?

I might be asking this question incorrectly but what I would like to do is the following:
Given a large String, which could be many hundreds of lines long, match and replace a word exactly, making sure the replacement does not match any part of another word.
For example :
Strings to Find = Mac Apple Microsoft Matt Damon I.B.M. Hursley
Replacement Strings = MacO AppleO MicrosoftO MattDamonP I.B.M.O HursleyL
Input String (with some of the escape characters included for clarity) =
"A file to test if it finds different\r\n
bits and bobs like Mac, Apple and Microsoft.\n
I.B.M. in Hursley does sum cool stuff!Wow look it's "Matt Damon"\r\n
Testing something whichwillerrorMac"\n
OUTPUT
"A file to test if it finds different
bits and bobs like MacO, AppleO and MicrosoftO.
I.B.M.O in HursleyL does sum cool stuff!Wow look it's "Matt DamonP"
Testing something whichwillerrorMac"
I have tried using a regex with word boundaries, although this picks up 'whichwillerrorMac' on the last line.
I have also tried using the StringTokenizer class and various delimiters to try and replace words, but some of the words I am trying to replace contains these delimiters.
Is there a regex that would solve this problem?

Replacing \b(Mac|Apple)\b with $1O will not touch whichwillerrorMac, because there is no word boundary before Mac in that word. It will match whichwill-Mac, though, since the hyphen creates a boundary.
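A sketch of how this could be extended to the full list of terms from the question. It deliberately swaps \b for the lookarounds (?<!\w) and (?!\w): a term such as I.B.M. ends in a non-word character, so \b would not match after it when a space follows. The class name and the way the map is filled are assumptions, and each term is passed through Pattern.quote so the dots are taken literally.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExactWordReplacer {
    private static final Map<String, String> TERMS = new LinkedHashMap<>();
    static {
        TERMS.put("Mac", "MacO");
        TERMS.put("Apple", "AppleO");
        TERMS.put("Microsoft", "MicrosoftO");
        TERMS.put("Matt Damon", "Matt DamonP");
        TERMS.put("I.B.M.", "I.B.M.O");
        TERMS.put("Hursley", "HursleyL");
    }

    static String replace(String input) {
        // Build (?<!\w)(term1|term2|...)(?!\w) with every term quoted.
        StringBuilder alternation = new StringBuilder();
        for (String term : TERMS.keySet()) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(term));
        }
        Pattern p = Pattern.compile("(?<!\\w)(" + alternation + ")(?!\\w)");
        Matcher m = p.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, Matcher.quoteReplacement(TERMS.get(m.group(1))));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replace("bits and bobs like Mac, Apple and Microsoft."));
        System.out.println(replace("I.B.M. in Hursley does sum cool stuff!"));
        System.out.println(replace("Testing something whichwillerrorMac"));
    }
}
```

The last line is printed unchanged, because the Mac in whichwillerrorMac is preceded by a word character.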
