Remove all leading zeros from the number parts of a string - java

I am trying to remove all the leading zeros from the number parts of a string. I came up with the code below. It works for the first example, but when I add a '0' at the beginning of the string it no longer gives the correct output. Does anybody know how to achieve this? Thanks in advance.
input: (2016)abc00701def00019z -> output: (2016)abc701def19z -> result: correct
input: 0(2016)abc00701def00019z -> output: (2016)abc71def19z -> result: wrong -> expected output: (2016)abc701def19z
EDIT: The string can contain characters outside the English alphabet.
String localReference = "(2016)abc00701def00019z";
String localReference1 = localReference.replaceAll("[^0-9]+", " ");
List<String> lists = Arrays.asList(localReference1.trim().split(" "));
System.out.println(lists.toString());
String[] replacedString = new String[5];
String[] searchedString = new String[5];
int counter = 0;
for (String list : lists) {
    String s = CharMatcher.is('0').trimLeadingFrom(list);
    replacedString[counter] = s;
    searchedString[counter++] = list;
    System.out.println(String.format("Search: %s, replace: %s", list, s));
}
System.out.println(StringUtils.replaceEach(localReference, searchedString, replacedString));

str.replaceAll("(^|[^0-9])0+", "$1");
This removes any row of zeroes after non-digit characters and at the beginning of the string.
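As a quick check, here is a minimal sketch that runs the pattern above against both inputs from the question. The class and method names are made up for illustration. The third call shows a caveat that may or may not matter for your data: a number that consists only of zeros is removed entirely.

```java
public class LeadingZeros {
    // Same pattern as in the answer above: a run of zeros that sits either at
    // the start of the string or directly after a non-digit character.
    static String stripLeadingZeros(String s) {
        return s.replaceAll("(^|[^0-9])0+", "$1");
    }

    public static void main(String[] args) {
        System.out.println(stripLeadingZeros("(2016)abc00701def00019z"));  // (2016)abc701def19z
        System.out.println(stripLeadingZeros("0(2016)abc00701def00019z")); // (2016)abc701def19z
        // Caveat: an all-zero number disappears completely.
        System.out.println(stripLeadingZeros("abc000def"));                // abcdef
    }
}
```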

I did this with regular expressions and got the expected result for both test cases you gave. In the code below, $1 and $2 refer to the parts matched by the parenthesized groups in the preceding regex. Note that [a-zA-Z] only covers English letters.
Please find the code below:
public class Demo {
    public static void main(String[] args) {
        String str = "0(2016)abc00701def00019z";
        /* The line below removes all 0's that come after any a-z or A-Z and are followed by a digit from 1-9. */
        str = str.replaceAll("([a-zA-Z]+)0+([1-9]+)", "$1$2");
        // The line below only removes the 0's at the start of the string
        str = str.replaceAll("^0+", "");
        System.out.println(str);
    }
}

Java has \P{Alpha}+, which matches any run of non-alphabetic characters. You can find each such run and remove its leading zeros:
String stringToSearch = "0(2016)abc00701def00019z";
Pattern p1 = Pattern.compile("\\P{Alpha}+");
Matcher m = p1.matcher(stringToSearch);
StringBuffer sb = new StringBuffer();
while (m.find()) {
    // quoteReplacement keeps a literal '$' or '\' in the match from being read as a group reference
    m.appendReplacement(sb, Matcher.quoteReplacement(m.group().replaceAll("\\b0+", "")));
}
m.appendTail(sb);
System.out.println(sb.toString());
output:
(2016)abc701def19z

Related

Split a string in java into two parts

I want to split a string based on a substring, and get the first part. Example below.
Input:
body/div[2]/div[3]/div/div[1]/div/div[2]/div[2]/ul/li[12]/div/div/div/div[2]/div[2]
Output: split at [12]
body/div[2]/div[3]/div/div[1]/div/div[2]/div[2]/ul/li[12]
I wrote this code:
String path1 = "body/div[2]/div[3]/div/div[1]/div/div[2]/div[2]/ul/li[12]/div/div/div/div[2]/div[2]";
String result;
if (path1.contains("[12]")) {
    System.out.println("yes");
    result = path1.split("[12]")[0];
    System.out.println(result);
}
but I got this result:
body/div[
String result = path1.substring(0, path1.indexOf("li[12]") + 6);
The split method accepts regular expressions. The regular expression [12] is a character class that matches a single character, either 1 or 2, so the string is split at every 1 and every 2. A better solution is to search for the occurrence of [12] directly:
int indexOf12 = path1.indexOf("[12]");
if (indexOf12 != -1) {
    System.out.println("yes");
    String result = path1.substring(0, indexOf12 + 4);
    System.out.println(result);
}
The [ character is interpreted as a special regex character so you should escape it by adding \\
So replace
result = path1.split("[12]")[0];
By
result = path1.split("\\[12]")[0];
Output:
yes
body/div[2]/div[3]/div/div[1]/div/div[2]/div[2]/ul/li
To keep [12] in the result, add 6 (the length of "li[12]") to the index:
String result = path1.substring(0, path1.indexOf("li[12]")+6);
This will solve your problem. The point is that split expects a regex, not just a plain string.
String path1 = "body/div[2]/div[3]/div/div[1]/div/div[2]/div[2]/ul/li[12]/div/div/div/div[2]/div[2]";
String result;
if (path1.contains("[12]")) {
    System.out.println("yes");
    result = path1.split("\\[12\\]")[0];
    System.out.println(result + "[12]");
}
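If you would rather not escape the brackets by hand, Pattern.quote can build the literal pattern for you. The following is only a sketch under that assumption; the class name and the helper upToAndIncluding are invented for this example.

```java
import java.util.regex.Pattern;

public class SplitOnLiteral {
    // Returns everything up to and including the first occurrence of marker,
    // or the whole input when the marker does not occur.
    static String upToAndIncluding(String input, String marker) {
        // Pattern.quote turns the marker into a literal pattern, so '[' and ']'
        // lose their special regex meaning. The limit 2 stops after the first split.
        String[] parts = input.split(Pattern.quote(marker), 2);
        return parts.length == 2 ? parts[0] + marker : input;
    }

    public static void main(String[] args) {
        String path1 = "body/div[2]/div[3]/div/div[1]/div/div[2]/div[2]/ul/li[12]/div/div/div/div[2]/div[2]";
        System.out.println(upToAndIncluding(path1, "[12]"));
    }
}
```

With the path from the question this should print the part ending in li[12].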
Here's an example of RegEx specific approach:
Matcher m = Pattern.compile("(.*\\[12\\])")
.matcher("body/div[2]/div[3]/div/div[1]/div/div[2]/div[2]/ul/li[12]/div/div/div/div[2]/div[2]");
Output
body/div[2]/div[3]/div/div[1]/div/div[2]/div[2]/ul/li[12]
Code
import java.util.regex.*;
import java.util.*;
public class HelloWorld {
    public static void main(String[] args) {
        List<String> allMatches = new ArrayList<String>();
        Matcher m = Pattern.compile("(.*\\[12\\])")
                .matcher("body/div[2]/div[3]/div/div[1]/div/div[2]/div[2]/ul/li[12]/div/div/div/div[2]/div[2]");
        while (m.find())
            allMatches.add(m.group(1));
        for (String match : allMatches)
            System.out.println(match);
    }
}

Capitalization of the words in string

How can I avoid a StringIndexOutOfBoundsException when the string starts with a space (" ") or contains several consecutive spaces?
What I actually need is to capitalize the first letter of each word in the string.
My code looks like:
public static void main(String[] args) throws IOException {
    BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
    String s = reader.readLine();
    String[] array = s.split(" ");
    for (String word : array) {
        word = word.substring(0, 1).toUpperCase() + word.substring(1); //seems that here's no way to avoid extra spaces
        System.out.print(word + " ");
    }
}
Tests:
Input: "test test test"
Output: "Test Test Test"
Input: " test test test"
Output:
StringIndexOutOfBoundsException
Expected: " Test Test Test"
I'm a Java newbie and any help is much appreciated. Thanks!
split breaks the string at every place where the delimiter is found. So if you split on a space and a space is placed at the start of the string, as in
" foo".split(" ")
the resulting array contains two elements: the empty string "" and "foo"
["", "foo"]
When you then call "".substring(0,1) or "".substring(1), you are using index 1, which lies outside that string.
So before you modify a string based on indexes, check that it is safe by testing the string's length: make sure the word you are trying to modify has a length greater than 0, or use something more descriptive like if(!word.isEmpty()).
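A minimal sketch of that check, applied to the code from the question. The class and method names are invented here, and the choice to preserve every original space (by passing -1 as the split limit) is my assumption about what the expected output implies.

```java
public class CapitalizeWords {
    static String capitalizeWords(String s) {
        StringBuilder out = new StringBuilder();
        // The limit -1 keeps trailing empty strings, so trailing spaces survive too.
        String[] words = s.split(" ", -1);
        for (int i = 0; i < words.length; i++) {
            String word = words[i];
            // Empty "words" come from leading, trailing or repeated spaces: skip them.
            if (!word.isEmpty()) {
                out.append(Character.toUpperCase(word.charAt(0))).append(word.substring(1));
            }
            // Put back the space that split removed between elements.
            if (i < words.length - 1) {
                out.append(' ');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("\"" + capitalizeWords(" test test test") + "\""); // " Test Test Test"
        System.out.println("\"" + capitalizeWords("test  test") + "\"");      // "Test  Test"
    }
}
```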
A slight modification of the answer to "Capitalize first word of a sentence in a string with multiple sentences":
public static void main(String[] args) throws IOException {
    BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
    String s = reader.readLine();
    int pos = 0;
    boolean capitalize = true;
    StringBuilder sb = new StringBuilder(s);
    while (pos < sb.length()) {
        if (sb.charAt(pos) == ' ') {
            capitalize = true;
        } else if (capitalize && !Character.isWhitespace(sb.charAt(pos))) {
            sb.setCharAt(pos, Character.toUpperCase(sb.charAt(pos)));
            capitalize = false;
        }
        pos++;
    }
    System.out.println(sb.toString());
}
I would avoid using split and go with StringBuilder instead.
Instead of splitting the string, simply iterate over the characters of the original string and replace a character with its uppercase form when it is the first character of the string or when its predecessor is a space.
Use a regex in your split to split on any run of whitespace. Note that a leading space still produces an empty first element, so trim the input first or check for empty words:
String[] words = s.split("\\s+");
Easier would be to use existing libraries: WordUtils.capitalize(str) (from apache commons-lang).
To fix your current code however, a possible solution would be to use a regex for words (\\w) and a combination of StringBuffer/StringBuilder setCharAt and Character.toUpperCase:
public static void main(String[] args) {
    String test = "test test test";
    StringBuffer sb = new StringBuffer(test);
    Pattern p = Pattern.compile("\\s+\\w"); // Matches 1 or more whitespace characters followed by 1 word character
    Matcher m = p.matcher(sb);
    // Since the sentence doesn't always start with a space, we have to replace the first letter manually
    sb.setCharAt(0, Character.toUpperCase(sb.charAt(0)));
    while (m.find()) {
        sb.setCharAt(m.end() - 1, Character.toUpperCase(sb.charAt(m.end() - 1)));
    }
    System.out.println(sb.toString());
}
Output:
Test Test Test
Capitalize whole words in String using native Java streams
It is a really elegant solution and doesn't require third-party libraries:
String s = "HELLO, capitalized worlD! i am here! ";
CharSequence wordDelimeter = " ";
String res = Arrays.asList(s.split(wordDelimeter.toString())).stream()
        .filter(st -> !st.isEmpty())
        .map(st -> st.toLowerCase())
        .map(st -> st.substring(0, 1).toUpperCase().concat(st.substring(1)))
        .collect(Collectors.joining(wordDelimeter.toString()));
System.out.println(s);
System.out.println(res);
The output is
HELLO, capitalized worlD! i am here!
Hello, Capitalized World! I Am Here!

Split string without losing split character

I want to split a string in Java like the one below. The normal split function splits the string but drops the split characters:
String str = "123{456]789[012*";
I want to split the string at the {, [, ] and * characters without losing them. I want results like this:
part 1 = 123{
part 2 = 456]
part 3 = 789[
part 4 = 012*
The normal split function splits like this:
part 1 = 123
part 2 = 456
part 3 = 789
part 4 = 012
Is it possible?
You can use zero-width lookahead/behind expressions to define a regular expression that matches the zero-length string between one of your target characters and anything that is not one of your target characters:
(?<=[{\[\]*])(?=[^{\[\]*])
Pass this expression to String.split:
String[] parts = "123{456]789[012*".split("(?<=[{\\[\\]*])(?=[^{\\[\\]*])");
If you have a block of consecutive delimiter characters this will split once at the end of the whole block, i.e. the string "123{456][789[012*" would split into four blocks "123{", "456][", "789[", "012*". If you used just the first part (the look-behind)
(?<=[{\[\]*])
then you would get five parts "123{", "456]", "[", "789[", "012*"
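To see the difference between the two expressions side by side, here is a small sketch. The class and method names are made up for the example; the regexes are the two discussed above.

```java
import java.util.Arrays;

public class KeepDelimiters {
    // Look-behind plus look-ahead: split after a delimiter only when the
    // next character is not a delimiter, so a block of delimiters stays together.
    static String[] splitAfterBlock(String s) {
        return s.split("(?<=[{\\[\\]*])(?=[^{\\[\\]*])");
    }

    // Look-behind only: split after every single delimiter.
    static String[] splitAfterEach(String s) {
        return s.split("(?<=[{\\[\\]*])");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitAfterBlock("123{456]789[012*")));  // [123{, 456], 789[, 012*]
        System.out.println(Arrays.toString(splitAfterBlock("123{456][789[012*"))); // [123{, 456][, 789[, 012*]
        System.out.println(Arrays.toString(splitAfterEach("123{456][789[012*")));  // [123{, 456], [, 789[, 012*]
    }
}
```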
Using a positive lookbehind:
(?<={|\[|\]|\*)
String str = "123{456]789[012*";
String parts[] = str.split("(?<=\\{|\\[|\\]|\\*)");
System.out.println(Arrays.toString(parts));
Output:
[123{, 456], 789[, 012*]
I think you're looking for something like
String str = "123{456]789[012*";
String[] parts = new String[] {
    str.substring(0, 4), str.substring(4, 8), str.substring(8, 12),
    str.substring(12)
};
System.out.println(Arrays.toString(parts));
Output is
[123{, 456], 789[, 012*]
You can use a Pattern and a Matcher to find each splitting character and the index just after it.
public static List<String> split(String string, String splitRegex) {
    List<String> result = new ArrayList<String>();
    Pattern p = Pattern.compile(splitRegex);
    Matcher m = p.matcher(string);
    int index = 0;
    while (index < string.length()) {
        if (m.find()) {
            // m.end() is the index just after the delimiter, so the delimiter stays in the part
            int splitIndex = m.end();
            result.add(string.substring(index, splitIndex));
            index = splitIndex;
        } else {
            // no more delimiters: add the rest and stop, otherwise the loop never ends
            result.add(string.substring(index));
            break;
        }
    }
    return result;
}
Example code:
public static void main(String[] args) {
    System.out.println(split("123{456]789[012*", "\\{|\\]|\\[|\\*"));
}
Output:
[123{, 456], 789[, 012*]

Replacing regex with the same amount of "." as its length

See this for my current attempt: http://regexr.com?374vg
I have a regex that captures what I want it to capture. The problem is that String.replaceAll("regex", ".") replaces each whole match with just one ., which is fine if the match is at the end of the line, but otherwise it doesn't work.
How can I replace every character of the match with a dot, so that I get as many . symbols as the match is long?
Here's a one line solution:
str = str.replaceAll("(?<=COG-\\d{0,99})\\d", ".").replaceAll("COG-(?=\\.+)", "....");
Here's some test code:
String str = "foo bar COG-2134 baz";
str = str.replaceAll("(?<=COG-\\d{0,99})\\d", ".").replaceAll("COG-(?=\\.+)", "....");
System.out.println(str);
Output:
foo bar ........ baz
This is not directly possible with a single String#replaceAll call and a fixed replacement string. You can use Pattern.compile(regexp) and iterate over the matches like so:
StringBuilder result = new StringBuilder();
Pattern pattern = Pattern.compile(regexp);
Matcher matcher = pattern.matcher(inputString);
int previous = 0;
while (matcher.find()) {
    result.append(inputString.substring(previous, matcher.start()));
    result.append(buildStringWithDots(matcher.end() - matcher.start()));
    previous = matcher.end();
}
result.append(inputString.substring(previous, inputString.length()));
To use this you have to define buildStringWithDots(int length) to build a String containing length dots.
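One possible way to fill in that helper and wrap the loop into a runnable program is sketched below. The class name, the method maskMatches and the sample input are assumptions for illustration; only buildStringWithDots is named in the answer.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotMask {
    // The helper the answer asks you to define: a string of `length` dots.
    static String buildStringWithDots(int length) {
        char[] dots = new char[length];
        Arrays.fill(dots, '.');
        return new String(dots);
    }

    // Copies the text between matches unchanged and replaces each match
    // with as many dots as the match is long.
    static String maskMatches(String inputString, String regexp) {
        StringBuilder result = new StringBuilder();
        Matcher matcher = Pattern.compile(regexp).matcher(inputString);
        int previous = 0;
        while (matcher.find()) {
            result.append(inputString, previous, matcher.start());
            result.append(buildStringWithDots(matcher.end() - matcher.start()));
            previous = matcher.end();
        }
        result.append(inputString, previous, inputString.length());
        return result.toString();
    }

    public static void main(String[] args) {
        // Two matches of different lengths (8 and 9 characters).
        System.out.println(maskMatches("foo COG-2134 bar COG-18613", "COG-\\d+"));
        // prints: foo ........ bar .........
    }
}
```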
Consider this code. Note that it sizes the replacement from the first match only, so it is correct only when every match has the same length:
Pattern p = Pattern.compile("COG-([0-9]+)");
Matcher mt = p.matcher("Fixed. Added ''Show annualized values' chackbox in EF Comp Report. Also fixed the problem with the missing dots for the positions and the problem, described in COG-18613");
if (mt.find()) {
    char[] array = new char[mt.group().length()];
    Arrays.fill(array, '.');
    System.out.println(" <=> " + mt.replaceAll(new String(array)));
}
OUTPUT:
Fixed. Added ''Show annualized values' chackbox in EF Comp Report. Also fixed the problem with the missing dots for the positions and the problem, described in .........
Personally, I'd simplify your life and just do something like this (for starters). I'll let you finish.
public class Test {
    public static void main(String[] args) {
        String cog = "COG-19708";
        for (int i = cog.indexOf("COG-"); i < cog.length(); i++) {
            System.out.println(cog.substring(i, i + 1));
            // build new string
        }
    }
}
Can you put your regex in a group and then replace each match with a string that has the same length as the matched group? Something like:
regex = (_what_i_want_to_match)
String().replaceAll(regex, create string that has that many '.' as length of $1)
?
note: $1 is what you matched in your search
see also: http://www.regular-expressions.info/brackets.html
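On Java 9 or later, the pseudocode above can be written almost literally, because Matcher.replaceAll also accepts a function that computes the replacement for each match. This is a sketch under that version assumption; the class name, method name and the COG-\d+ pattern are chosen for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class DotMaskLambda {
    static String maskMatches(String input, String regex) {
        // Matcher.replaceAll(Function<MatchResult, String>) exists since Java 9.
        // The function is called once per match and returns its replacement.
        return Pattern.compile(regex).matcher(input).replaceAll(match -> {
            char[] dots = new char[match.group().length()];
            Arrays.fill(dots, '.');
            return new String(dots);
        });
    }

    public static void main(String[] args) {
        System.out.println(maskMatches("described in COG-18613", "COG-\\d+"));
        // prints: described in .........
    }
}
```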

Extract the noun words & original sentence from POS Tag

I want to extract the nouns from a sentence and also get the original sentence back from its POS-tagged form.
//Extract the words before _NNP and _NN from the tagged text below, and also recover the original sentence from the POS tags.
Original Sentence: Hi. How are you? This is Mike.
POSTag: Hi._NNP How_WRB are_VBP you?_JJ This_DT is_VBZ Mike._NN
I tried something like this
String txt = "Hi._NNP How_WRB are_VBP you?_JJ This_DT is_VBZ Mike._NN";
String re1 = "((?:[a-z][a-z0-9_]*))"; // Variable Name 1
String re2 = ".*?"; // Non-greedy match on filler
String re3 = "(_)"; // Any Single Character 1
String re4 = "(NNP)"; // Word 1
Pattern p = Pattern.compile(re1 + re2 + re3 + re4, Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher m = p.matcher(txt);
if (m.find()) {
    String var1 = m.group(1);
    System.out.print(var1.toString());
}
output: Hi
But I need a list of all the nouns in the sentence.
To extract the nouns, you can do this:
public static String[] extractNouns(String sentenceWithTags) {
    // Split the String into an array of Strings at every tag that starts with "_NN"
    // followed by zero, one or two more letters (like "_NNP", "_NNPS", or "_NNS")
    String[] nouns = sentenceWithTags.split("_NN\\w?\\w?\\b");
    // Keep only the last word (which is the noun) of every String in the array
    for (int index = 0; index < nouns.length; index++) {
        nouns[index] = nouns[index].substring(nouns[index].lastIndexOf(" ") + 1)
                // Remove all non-word characters from the extracted nouns
                .replaceAll("[^\\p{L}\\p{Nd}]", "");
    }
    return nouns;
}
To extract the original sentence, you can do this:
public static String extractOriginal(String sentenceWithTags) {
    return sentenceWithTags.replaceAll("_([A-Z]*)\\b", "");
}
Proof that it works:
public static void main(String[] args) {
    String sentence = "Hi._NNP How_WRB are_VBP you?_JJ This_DT is_VBZ Mike._NN";
    System.out.println(java.util.Arrays.toString(extractNouns(sentence)));
    System.out.println(extractOriginal(sentence));
}
Output:
[Hi, Mike]
Hi. How are you? This is Mike.
Note: for the regex that removed all non-word characters (like punctuation) from the extracted nouns, I used this Stack Overflow question/answer.
Use while (m.find()) instead of if (m.find()) to iterate over all the matches.
Moreover, your regex can be simplified considerably:
If you don't need to capture the data, just don't use parentheses (usually).
You're using ((?:...)), which is quite strange: a non-capturing group nested directly inside a capturing group makes no sense.
I'm not sure the .*? part does what you expect. If you want to match a dot, use [.] instead.
Thus, try ([a-z][a-z0-9_]*)[.]_NNP instead.
Or even use a positive lookahead: [a-z][a-z0-9_]*(?=[.]_NNP). Use m.group() to access the matched text.
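A minimal sketch of the while (m.find()) loop follows. The pattern is my own assumption rather than the one suggested above: it accepts optional sentence punctuation before the tag and any tag that starts with _NN, so that both Hi._NNP and Mike._NN are found. The class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NounFinder {
    // Assumed token shape (not from the original post): a word, optional
    // sentence punctuation, then a tag starting with _NN (_NN, _NNS, _NNP, _NNPS).
    private static final Pattern NOUN =
            Pattern.compile("(\\p{L}[\\p{L}\\p{Nd}]*)[.?!,;:]*_NN[A-Z]{0,2}\\b");

    static List<String> findNouns(String taggedSentence) {
        List<String> nouns = new ArrayList<>();
        Matcher m = NOUN.matcher(taggedSentence);
        // while, not if: every call to find() advances to the next match
        while (m.find()) {
            nouns.add(m.group(1)); // group 1 is the word without punctuation or tag
        }
        return nouns;
    }

    public static void main(String[] args) {
        String txt = "Hi._NNP How_WRB are_VBP you?_JJ This_DT is_VBZ Mike._NN";
        System.out.println(findNouns(txt)); // [Hi, Mike]
    }
}
```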
This one should work
import java.util.ArrayList;
public class Test {
    public static final String NOUN_REGEX = "[a-zA-Z]*_NN\\w?\\w?\\b";
    public static ArrayList<String> extractNounsByRegex(String sentenceWithTags) {
        ArrayList<String> nouns = new ArrayList<String>();
        String[] words = sentenceWithTags.split("\\s+");
        for (int i = 0; i < words.length; i++) {
            if (words[i].matches(NOUN_REGEX)) {
                System.out.println(" Matched ");
                //remove the suffix _NN* and retain [a-zA-Z]*
                nouns.add(words[i].replaceAll("_NN\\w?\\w?\\b", ""));
            }
        }
        return nouns;
    }
    public static String extractOriginal(String word) {
        return word.replaceAll("_NN\\w?\\w?\\b", "");
    }
    public static void main(String[] args) {
        // String sentence = "Hi._NNP How_WRB are_VBP you?_JJ This_DT is_VBZ Mike._NN";
        String sentence = "Eiffel_NNP tower_NN is_VBZ in_IN paris_NN Hi_NNP How_WRB are_VBP you_PRP This_DT is_VBZ Mike_NNP Barrack_NNP Obama_NNP is_VBZ a_DT president_NN this_VBZ";
        System.out.println(extractNounsByRegex(sentence).toString());
        System.out.println(sentence);
    }
}
