Java String Regex Divide - Always the Same Pattern - java

I never understood how to properly write a regex to divide my Strings.
I have Strings of this type: String example = "on[?a, ?b, ?c]";
Sometimes I have this: String example2 = "not clear[?c]";
For the first Example I would like to divide into this:
[on, a, b, c]
or
String name = "on";
String [] vars = [a,b,c];
And for the second example I would like to divide into this type:
[not clear, c]
or
String name = "not clear";
String [] vars = [c];
Thanks alot in advance guys ;)

If you know the character set of your identifiers, you can simply do a split on all of the text that isn't in that set. For example, if your identifiers only consist of word characters ([a-zA-Z_0-9]) you can use:
String[] parts = "on[?a, ?b, ?c]".split("[\\W]+");
String name = parts[0];
String[] vars = Arrays.copyOfRange(parts, 1, parts.length);
If your identifiers only have A-Z (upper and lower) you could replace \\W above with ^A-Za-z.
I feel that this is more elegant than using a complex regular expression.
Edit: I realize that this will have issues with your second example "not clear". If you have no option of using something like an underscore instead of a space there, you could do one split on [? (or substring) to get the "name", and another split on the remainder, like so:
String s = "not clear[?a, ?b, ?c]";
String[] parts = s.split("\\[\\?"); //need the '?' so we don't get an extra empty array element in the next split
String name = parts[0];
String[] vars = parts[1].split("[\\W]+");
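If a regex is not required at all, the same name/vars split can be sketched with indexOf and substring. This is only a sketch under the assumption that every input contains exactly one '[' ... ']' pair; the class and method names (NameAndVars, name, vars) are made up for illustration.

```java
import java.util.Arrays;

class NameAndVars {
    // Everything before the first '[' is treated as the name.
    static String name(String s) {
        return s.substring(0, s.indexOf('[')).trim();
    }

    // The comma-separated entries between '[' and ']', with a leading '?' removed.
    static String[] vars(String s) {
        String inner = s.substring(s.indexOf('[') + 1, s.lastIndexOf(']')).trim();
        if (inner.isEmpty()) {
            return new String[0];
        }
        String[] vars = inner.split("\\s*,\\s*");
        for (int i = 0; i < vars.length; i++) {
            if (vars[i].startsWith("?")) {
                vars[i] = vars[i].substring(1);
            }
        }
        return vars;
    }

    public static void main(String[] args) {
        String example2 = "not clear[?c]";
        System.out.println(name(example2));                   // not clear
        System.out.println(Arrays.toString(vars(example2)));  // [c]
    }
}
```

Because the name is taken verbatim up to the bracket, spaces inside it (as in "not clear") need no special handling.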

This comes close, but the problem is the third remembered group is actually repeated so it only captures the last match.
(.*?)\[(?:\s*(?:\?(.*?)(?:\s*,\s*\?(.*?))*)\s*)?]
For example, the first one you list, on[?a, ?b, ?c], would give group 1 as on, group 2 as a, and group 3 as c. If you are using Perl, you could use the g flag to apply a regex to a line multiple times, like this:
my @tokens;
while ( $line =~ /\s*(.*?)\s*[[,\]]/g ) {
    push( @tokens, $1 );
}
Note: I did not actually test the Perl code; it is just off the top of my head. It should give you the idea, though.
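Since the question is about Java, the same "apply the regex repeatedly" idea can be sketched with Matcher.find(), which sidesteps the repeated-group problem by matching one variable at a time. This assumes each variable is a '?' followed by word characters; the class name RepeatedGroupDemo is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // One variable per match: a literal '?' followed by word characters (an assumption).
    private static final Pattern VAR = Pattern.compile("\\?(\\w+)");

    static List<String> vars(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = VAR.matcher(s);
        // Each call to find() moves on to the next match, so every variable is captured.
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(vars("on[?a, ?b, ?c]")); // [a, b, c]
    }
}
```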

String[] parts = example.split("[^\\w ]");
List<String> x = new ArrayList<String>();
for (int i = 0; i < parts.length; i++) {
    if (!"".equals(parts[i]) && !" ".equals(parts[i])) {
        x.add(parts[i]);
    }
}
This will work as long as you don't have more than one space separating your non-space characters. There's probably a cleverer way of filtering out the empty and " " strings.
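One possible way to do that filtering, sketched with streams (Java 8 or later): trim every piece and drop whatever ends up empty. The class and method names here are invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class NonBlankTokens {
    static List<String> tokens(String example) {
        return Arrays.stream(example.split("[^\\w ]"))
                .map(String::trim)            // " " becomes ""
                .filter(s -> !s.isEmpty())    // drop the empty pieces
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(tokens("on[?a, ?b, ?c]")); // [on, a, b, c]
        System.out.println(tokens("not clear[?c]"));  // [not clear, c]
    }
}
```

Trimming before filtering also copes with runs of several spaces between the delimiters, though spaces inside a name are kept as they are.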

Related

How to return only first n number of words in a sentence Java

Say i have a simple sentence as below.
For example, this is what have:
A simple sentence consists of only one clause. A compound sentence
consists of two or more independent clauses. A complex sentence has at
least one independent clause plus at least one dependent clause. A set
of words with no independent clause may be an incomplete sentence,
also called a sentence fragment.
I want only first 10 words in the sentence above.
I'm trying to produce the following string:
A simple sentence consists of only one clause. A compound
I tried this:
bigString.split(" " ,10).toString()
But it returns the same bigString wrapped with [] array.
Thanks in advance.
Assume bigString : String equals your text. First thing you want to do is split the string in single words.
String[] words = bigString.split(" ");
How many words would you like to extract?
int n = 10;
Put the words together:
String newString = "";
for (int i = 0; i < n && i < words.length; i++) { newString = newString + (i > 0 ? " " : "") + words[i]; }
System.out.println(newString);
Hope this is what you needed.
If you want to know more about regular expressions (i.e. to tell java where to split), see here: How to split a string in Java
If you use the split method with a limit (yours is 10), it won't just give you the first 10 parts and stop; it gives you the first 9 parts, and the 10th element of the array contains the rest of the input String. Printing the array's contents therefore reproduces the whole input String. Note that calling toString() on the array itself does not print its contents at all; use Arrays.toString(...) or String.join(...) for that. What you can do to achieve what you initially wanted is:
String[] myArray = bigString.split(" ", 11);
myArray[10] = ""; // set the remainder to an empty String (assumes the input has more than 10 words)
String result = String.join(" ", myArray).trim(); // join the words instead of calling toString() on the array
This will help you
String[] strings = Arrays.stream(bigString.split(" "))
    .limit(10)
    .toArray(String[]::new);
Here is exactly what you want:
String[] result = new String[10];
// regex \s matches a whitespace character: [ \t\n\x0B\f\r]
String[] raw = bigString.split("\\s", 11);
// the last entry of raw (index 10) holds the rest of the sentence, so only the first 10 entries are copied.
System.arraycopy(raw, 0, result , 0, 10);
System.out.println(Arrays.toString(result));
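The answers above return arrays, while the question asks for a single String. A minimal sketch that joins the first n words back together follows; FirstWords and firstWords are hypothetical names, and a "word" is assumed to be anything separated by whitespace.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class FirstWords {
    // Returns the first n whitespace-separated words, joined by single spaces.
    // If the text has fewer than n words, all of them are returned.
    static String firstWords(String text, int n) {
        return Arrays.stream(text.trim().split("\\s+"))
                .limit(n)
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String bigString = "A simple sentence consists of only one clause. "
                + "A compound sentence consists of two or more independent clauses.";
        System.out.println(firstWords(bigString, 10));
        // A simple sentence consists of only one clause. A compound
    }
}
```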

Split String in java by specified pattern

How to split this String in java such that I'll get the text occurring between the braces in a String array?
GivenString = "(1,2,3,4,#) (a,s,3,4,5) (22,324,#$%) (123,3def,f34rf,4fe) (32)"
String [] array = GivenString.split("");
Output must be:
array[0] = "1,2,3,4,#"
array[1] = "a,s,3,4,5"
array[2] = "22,324,#$%"
array[3] = "123,3def,f34rf,4fe"
array[4] = "32"
You can try to use:
Matcher mtc = Pattern.compile("\\((.*?)\\)").matcher(yourString);
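The Matcher on its own does not produce the array yet; the groups still have to be collected with find(). A short sketch of that loop, with made-up class and method names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParenGroups {
    private static final Pattern GROUP = Pattern.compile("\\((.*?)\\)");

    static List<String> groups(String s) {
        List<String> out = new ArrayList<>();
        Matcher mtc = GROUP.matcher(s);
        while (mtc.find()) {
            out.add(mtc.group(1)); // the text between '(' and ')'
        }
        return out;
    }

    public static void main(String[] args) {
        String given = "(1,2,3,4,#) (a,s,3,4,5) (22,324,#$%) (123,3def,f34rf,4fe) (32)";
        String[] array = groups(given).toArray(new String[0]);
        for (String item : array) {
            System.out.println(item);
        }
    }
}
```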
The best solution is the answer by Rahul Tripathi, but your question said "How to split", so if you must use split() (e.g. this is an assignment), then this regex will do:
^\s*\(|\)\s*\(|\)\s*$
It says:
Match the open-parenthesis at the beginning
Match close-parenthesis followed by open-parenthesis
Match the close-parenthesis at the end
All 3 allowing whitespace.
As a Java regex, that would mean:
str.split("^\\s*\\(|\\)\\s*\\(|\\)\\s*$")
See regex101 for demo.
The problem with using split() is that the leading open-parenthesis causes a split before the first value, resulting in an empty value at the beginning:
array[0] = ""
array[1] = "1,2,3,4,#"
array[2] = "a,s,3,4,5"
array[3] = "22,324,#$%"
array[4] = "123,3def,f34rf,4fe"
array[5] = "32"
That is why Rahul's answer is better, because it won't see such an empty value.
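If split() really is required, the empty first element can be dropped afterwards. The following is a sketch of that cleanup, not part of the original answer; SplitParenGroups is a hypothetical name.

```java
import java.util.Arrays;

class SplitParenGroups {
    static String[] groups(String str) {
        String[] parts = str.split("^\\s*\\(|\\)\\s*\\(|\\)\\s*$");
        // The match at index 0 leaves an empty first element; skip it if present.
        if (parts.length > 0 && parts[0].isEmpty()) {
            return Arrays.copyOfRange(parts, 1, parts.length);
        }
        return parts;
    }

    public static void main(String[] args) {
        String given = "(1,2,3,4,#) (a,s,3,4,5) (22,324,#$%) (123,3def,f34rf,4fe) (32)";
        System.out.println(Arrays.toString(groups(given)));
    }
}
```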
Usually, you would want to use the split() function, as it is the easiest way to split a string into an array of substrings when the string is broken up by a key char.
The main problem is that you need the information in between two chars. The easiest way to solve this is to go through the string and get rid of every instance of '('. This leaves the string looking like
String = "1,2,3,4,#) a,s,3,4,5) 22,324,#$%) 123,3def,f34rf,4fe) 32)"
And this is perfect, as you can split on the char ')' and not worry about the other bracket interfering with the split. I suggest using replace(target, replacement), which replaces every instance of the first parameter with the second parameter (we can use "" as the second parameter to delete it).
Here is some example code that may work :
String a = "(1,2,3,4,#) (a,s,3,4,5) (22,324,#$%) (123,3def,f34rf,4fe) (32)";
a = a.replace("(", "");
// a is now equal to 1,2,3,4,#) a,s,3,4,5) 22,324,#$%) 123,3def,f34rf,4fe) 32)
String[] parts = a.split("\\)");
System.out.println(parts[0]); // this will print 1,2,3,4,#
I haven't tested it completely. Every part after the first keeps the space that followed the previous ')', so you may want to call trim() on each part.
You can then loop through parts[] and it should have all of the required parts for you!

Java split with certain pattern

String abc ="abc_123,low,101.111.111.111,100.254.132.156,abc,1";
String[] ab = abc.split("(\\d+),[a-z]");
System.out.println(ab[0]);
Expected Output:
abc_123
low
101.111.111.111,100.254.132.156
abc
1
The problem is i am not able to find appropriate regex for this pattern.
I would suggest to not solve all problems with one regular expression.
It seems that your initial string contains values that are separated by ",". So split those values with ",".
Then iterate the output of that process; and "join" those elements that are IP addresses (as it seems that this is what you are looking for).
And just for the sake of it: keep in mind that IP addresses are actually pretty complicated; a pattern "to match em all" can be found here
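The split-then-join idea can be sketched as follows. The IP check is deliberately loose (four dot-separated groups of 1 to 3 digits) and is an assumption made for the example, not a full IP validator; the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class IpRunSplitter {
    // Loose check: four groups of 1-3 digits separated by dots.
    private static final Pattern IPV4_LIKE = Pattern.compile("\\d{1,3}(\\.\\d{1,3}){3}");

    static List<String> splitKeepingIpRuns(String csv) {
        List<String> out = new ArrayList<>();
        StringBuilder run = new StringBuilder(); // consecutive IP-like tokens
        for (String token : csv.split(",")) {
            if (IPV4_LIKE.matcher(token).matches()) {
                if (run.length() > 0) {
                    run.append(',');
                }
                run.append(token);
            } else {
                if (run.length() > 0) {      // a run of IPs just ended
                    out.add(run.toString());
                    run.setLength(0);
                }
                out.add(token);
            }
        }
        if (run.length() > 0) {              // input ended inside a run
            out.add(run.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        String abc = "abc_123,low,101.111.111.111,100.254.132.156,abc,1";
        for (String part : splitKeepingIpRuns(abc)) {
            System.out.println(part);
        }
    }
}
```

Collecting into a list keeps the elements in their original order, so the joined IP addresses stay between "low" and "abc".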
You could use lookahead and lookbehind to check, if 3 digits and a . at the correct place are preceding or following the ,:
String[] ab = abc.split("(?<!\\.\\d{3}),|,(?!\\d{3}\\.)");
String[] ab = abc.split(",");
System.out.println(ab[0]);
System.out.println(ab[1]);
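int i = 2;
// note the doubled backslashes: "\." is not a valid escape in a Java string literal
while (i < ab.length && ab[i].matches("[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}")) {
    if (i > 2) System.out.print(",");
    System.out.print(ab[i++]);
}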
System.out.println();
System.out.println(ab[i++]);
System.out.println(ab[i++]);
First split the string into an array on ",", then apply a regex to check whether each element is in the desired format. If it is, concatenate those elements, separated by ",".
String abc ="abc_123,low,101.111.111.111,100.254.132.156,abc,1";//or something else.
String[] split = abc.split(",");
String concat="";
for (String data : split) {
    boolean matched = data.matches("[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}");
    if (matched) {
        concat = concat + "," + data;
    } else {
        System.out.println(data);
    }
}
if (concat.length() > 0) {
    System.out.println(concat.substring(1));
}

Best way to select certain parts of data in a string that changes in size

I'm looking for a good method of parsing certain parts of information in a string that changes in size.
For example, the string might be
"id:1234 alert:a-b up:12.3 down:12.3"
and I need to pick out the value for id, alert, up and down so my initial thought was substring but then I thought that length of the string can change in size for example
"id:123456 alert:a-b-c-d up:12.345 down:12.345"
So using substring each time to look at say characters 3 to 7 may not work each time because it would not capture all of the data needed.
What would be a smart way of selecting each value that is needed? Hopefully I've explained this well as I normally tend to confuse people with my bad explanations. I am programming in Java.
You could simply use String.split(), first to tokenize the whitespace and then to tokenize on your key/value separator (colon in this case):
String line = "id:1234 alert:a-b up:12.3 down:12.3";
// first split the line by whitespace
String[] keyValues = line.split("\\s+");
for (String keyValueString : keyValues) {
String[] keyValue = keyValueString.split(":");
// TODO might want to check for bad data, that we have 2 values
System.out.println(String.format("Key: %-10s Value: %-10s", keyValue[0], keyValue[1]));
}
Result:
Key: id Value: 1234
Key: alert Value: a-b
Key: up Value: 12.3
Key: down Value: 12.3
A basic solution based on regular expressions might look like this:
String input = "id:1234 alert:a-b up:12.3 down:12.3";
Matcher matcher = Pattern.compile("(\\S+):(\\S+)").matcher(input);
while (matcher.find()) {
System.out.println(matcher.group(1) + " = " + matcher.group(2));
}
This assumes you are looking for one or more non-whitespace characters, then a colon, then one or more non-whitespace characters.
Output:
id = 1234
alert = a-b
up = 12.3
down = 12.3
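To pick out individual values by key rather than just printing them, the pairs can be collected into a map. A minimal sketch, assuming keys never contain a colon and pairs are separated by whitespace; KeyValueLine and parse are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class KeyValueLine {
    static Map<String, String> parse(String line) {
        Map<String, String> values = new LinkedHashMap<>(); // keeps the input order
        for (String pair : line.trim().split("\\s+")) {
            String[] keyValue = pair.split(":", 2); // limit 2: the value may contain ':'
            if (keyValue.length == 2) {             // skip tokens without a colon
                values.put(keyValue[0], keyValue[1]);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, String> values = parse("id:123456 alert:a-b-c-d up:12.345 down:12.345");
        System.out.println(values.get("id"));    // 123456
        System.out.println(values.get("alert")); // a-b-c-d
    }
}
```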
You can use the method .split() from the String class.
Check this out:
String line = "id:1234 alert:a-b up:12.3 down:12.3";
String[] splittedLine = line.split(" ");
for (int i = 0; i < splittedLine.length; i++) {
    System.out.println(splittedLine[i]);
}
What you are doing here is splitting your string line at every space character.
This is the result:
id:1234
alert:a-b
up:12.3
down:12.3

(Java) Substrings & Reading data from two files using hashmap

If I had a .txt file called animals that had fishfroggoat etc. in it, and another file called owners that had something like:
fish:jane
frog:mark
goat:joe
how could I go about pairing the pets to their owners? I'm fairly sure a HashMap would be good here, but I'm stuck. I put the animal text into a string, but I don't know how to break it up into 4 characters properly.
Any help would be great.
Sorry I didn't add any code, but thanks to you guys' help (especially Ted Hopps) I worked it out and, more importantly, understood it. :-)
There are various approaches. The most direct is to split it using the substring method:
String animals = "fishfroggoat";
String fish = animals.substring(0, 4);
String frog = animals.substring(4, 8);
String goat = animals.substring(8); // or (8, 12)
If you have an arbitrarily long list of 4-character animals, you can do this:
String animals = "fishfroggoatbear";
int n = animals.length() / 4;
String[] animalArray = new String[n];
for (int i = 0; i < n; ++i) {
animalArray[i] = animals.substring(4*i, 4*i + 4);
}
You can split the pet/owner strings using split:
String rawData = "fish:jane";
String[] data = rawData.split(":");
String pet = data[0];
String owner = data[1];
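Putting the two pieces together with a HashMap might look like the sketch below. It assumes the owner lines have already been read from the file into a list (for example with Files.readAllLines) and that every animal name is exactly 4 characters long; PetOwners and its method names are invented for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PetOwners {
    // Builds a pet -> owner map from lines such as "fish:jane".
    static Map<String, String> ownersByPet(List<String> ownerLines) {
        Map<String, String> owners = new HashMap<>();
        for (String line : ownerLines) {
            String[] data = line.split(":", 2);
            if (data.length == 2) { // ignore lines without a colon
                owners.put(data[0].trim(), data[1].trim());
            }
        }
        return owners;
    }

    // Cuts a string into consecutive pieces of the given width; a shorter tail is ignored.
    static String[] fixedWidthChunks(String s, int width) {
        int n = s.length() / width;
        String[] chunks = new String[n];
        for (int i = 0; i < n; i++) {
            chunks[i] = s.substring(width * i, width * i + width);
        }
        return chunks;
    }

    public static void main(String[] args) {
        Map<String, String> owners =
                ownersByPet(Arrays.asList("fish:jane", "frog:mark", "goat:joe"));
        for (String pet : fixedWidthChunks("fishfroggoat", 4)) {
            System.out.println(pet + " -> " + owners.get(pet));
        }
    }
}
```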
Use String.split as shown below.
String msg = "fish:jane";
String[] parts = msg.split(":");
This produces an array of the parts that were separated by ":".
This is how you split a string into 4-character chunks in just one line:
String[] animals = input.split("(?<=\\G....)");
This may seem like black magic, so I'll try to demystify it. Welcome to the dark art of regular expressions...
The String.split() method splits the string on every match to the specified regex. So let's look at the regex:
(?<=\\G....)
The construct (?<=regex) is a "positive look behind" for the regex, meaning that the characters preceding the point in the input between characters (because a look behind is zero-width) must match the regex.
The regex \G (coded as \\G as a Java String constant) means "end of previous match", but it also initially matches the start of input.
The regex .... matches any 4 characters.
Thus, when expressed in English, the regex (?<=\\G....) means "after every 4 characters".
If anyone is interested, removing \G and splitting on (?<=....) causes it to split after every character from the 4th onward, because it then just means "preceded by 4 characters". You need the \G to require 4 new characters since the previous split.
Here's some test code:
public static void main(String[] args) throws Exception {
String input = "fishfroggoatbear";
String[] animals = input.split("(?<=\\G....)");
System.out.println(Arrays.toString(animals));
}
Output:
[fish, frog, goat, bear]
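To see the effect of \G, the two regexes can be run side by side. This is a small sketch with made-up class and method names, assuming a standard desktop or server JVM.

```java
import java.util.Arrays;

class ChunkRegexDemo {
    // Splits after every 4 characters counted from the end of the previous match.
    static String[] withG(String input) {
        return input.split("(?<=\\G....)");
    }

    // Splits at every position that merely has 4 characters before it.
    static String[] withoutG(String input) {
        return input.split("(?<=....)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(withG("fishfrog")));    // [fish, frog]
        System.out.println(Arrays.toString(withoutG("fishfrog"))); // [fish, f, r, o, g]
    }
}
```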
