Java - Regex split decimal, minus, and math operation - java

I need to split the expression so that h[0] holds the first number ("-12.0"), h[1] the operation symbol (+) and h[2] the second number (-15.3), but I don't know how to make it work.
String a = "12.0+-15.3";
String[] h = a.split("(?<=\\d)(?=\\D)|(?<=\\D)(?=\\d)");
Could somebody help me?

You may use a regex to match * or / anywhere in the string, and - and + only when they follow a digit. In a math expression, a binary + or - comes right after a word character, so, basically, you may check for a word boundary on the left: [/*]|\b[-+].
See the regex demo.
Then just split and keep the matches:
public static final Pattern regex = Pattern.compile("[/*]|\\b[-+]");
public static List<String> split(String s, Pattern pattern) {
Matcher m = pattern.matcher(s);
List<String> ret = new ArrayList<String>();
int start = 0;
while (m.find()) {
ret.add(s.substring(start, m.start()));
ret.add(m.group());
start = m.end();
}
if (start < s.length()) {
ret.add(s.substring(start));
}
return ret;
}
Usage example:
String s = "12.0+-15.3*-45.7/+67.9";
List<String> res = split(s, regex);
System.out.println(res);
// => [12.0, +, -15.3, *, -45.7, /, +67.9]
See the Java demo
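If the input is always a single operation with two operands, as in the question, a capturing-group match is another option. The following is only a sketch under that assumption; the class name BinaryExpressionSplitter and the pattern are illustrative and not part of the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BinaryExpressionSplitter {
    // Assumed shape: optionally signed decimal, one operator, optionally signed decimal.
    private static final Pattern EXPR = Pattern.compile(
            "([-+]?\\d+(?:\\.\\d+)?)([-+*/])([-+]?\\d+(?:\\.\\d+)?)");

    // Returns {first number, operator, second number} or throws if the input has another shape.
    static String[] split(String expression) {
        Matcher m = EXPR.matcher(expression);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a two-operand expression: " + expression);
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] h = split("12.0+-15.3");
        System.out.println(h[0] + " | " + h[1] + " | " + h[2]); // 12.0 | + | -15.3
    }
}
```

Because matches() must consume the whole input, anything other than exactly two operands is rejected rather than silently truncated.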

Related

How to find first character after second dot java

Do you have any ideas how could I get first character after second dot of the string.
String str1 = "test.1231.asdasd.cccc.2.a.2";
String str2 = "aaa.1.22224.sadsada";
In first case I should get a and in second 2.
I thought about splitting the string on dots and extracting the first character of the third element. But that seems too complicated, and I think there is a better way.
How about a regex for this?
Pattern p = Pattern.compile(".+?\\..+?\\.(\\w)");
Matcher m = p.matcher(str1);
if (m.find()) {
System.out.println(m.group(1));
}
The regex says: find anything one or more times in a non-greedy fashion (.+?), which must be followed by a dot (\\.), then again anything one or more times in a non-greedy fashion (.+?) followed by a dot (\\.). After this has matched, capture the first word character in the first group ((\\w)).
Usually a regex will do an excellent job here. Still, if you are looking for something more customizable, consider the following implementation:
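A variant of the same idea, sketched with negated character classes ([^.]*) instead of non-greedy quantifiers, so each segment simply stops at the next dot. The class and method names here are hypothetical. Unlike the \\w version, it returns whatever character follows the second dot and also accepts empty segments.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AfterSecondDot {
    // Two dot-free segments, each followed by a dot, then capture one character.
    private static final Pattern P = Pattern.compile("^[^.]*\\.[^.]*\\.(.)");

    // Returns the character after the second dot, or null if there is none.
    static Character charAfterSecondDot(String s) {
        Matcher m = P.matcher(s);
        return m.find() ? m.group(1).charAt(0) : null;
    }

    public static void main(String[] args) {
        System.out.println(charAfterSecondDot("test.1231.asdasd.cccc.2.a.2")); // a
        System.out.println(charAfterSecondDot("aaa.1.22224.sadsada"));         // 2
        System.out.println(charAfterSecondDot("test"));                        // null
    }
}
```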
private static int positionOf(String source, String target, int match) {
if (match < 1) {
return -1;
}
// Start one target length before index 0 so the first search begins at index 0.
int result = -target.length();
do {
result = source.indexOf(target, result + target.length());
} while (--match > 0 && result >= 0);
return result;
}
and then the test is done with:
String str1 = "test..1231.asdasd.cccc..2.a.2.";
System.out.println(positionOf(str1, ".", 3)); // prints 10
System.out.println(positionOf(str1, "c", 4)); // prints 21
System.out.println(positionOf(str1, "c", 5)); // prints -1
System.out.println(positionOf(str1, "..", 2)); // prints 22
// Keep in mind that the first symbol after the match is at position 22 + target.length(),
// and that there might be no element at that index in the char array.
Without using a pattern, you can use the substring and charAt methods of the String class to achieve this:
// You can return String instead of char
public static char returnSecondChar(String strParam) {
String tmpSubString = "";
// First check if . exists in the string.
if (strParam.indexOf('.') != -1) {
// If yes, then extract substring starting from .+1
tmpSubString = strParam.substring(strParam.indexOf('.') + 1);
System.out.println(tmpSubString);
// Check if second '.' exists
if (tmpSubString.indexOf('.') != -1) {
// If it exists, get the char at index of . + 1
return tmpSubString.charAt(tmpSubString.indexOf('.') + 1);
}
}
// If the string does not contain two '.' characters, return '-'. You can return anything here.
return '-';
}
You could do it by splitting the String like this:
public static void main(String[] args) {
String str1 = "test.1231.asdasd.cccc.2.a.2";
String str2 = "aaa.1.22224.sadsada";
System.out.println(getCharAfterSecondDot(str1));
System.out.println(getCharAfterSecondDot(str2));
}
public static char getCharAfterSecondDot(String s) {
String[] split = s.split("\\.");
// TODO check if there are values in the array!
return split[2].charAt(0);
}
I don't think it is too complicated, but using a directly matching regex is a very good (maybe better) solution anyway.
Please note that the input String might contain fewer than two dots, which would have to be handled (see the TODO comment in the code).
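One way the TODO could be handled is sketched below. The class name SafeSplitLookup and the choice of Optional as the return type are assumptions made for illustration, not part of the answer above. Passing a limit of 3 to split keeps everything after the second dot in the third element, so the check reduces to the array length and an empty third part.

```java
import java.util.Optional;

final class SafeSplitLookup {
    // Returns the character after the second dot, or an empty Optional if there is none.
    static Optional<Character> charAfterSecondDot(String s) {
        // With limit 3 the third element holds the whole remainder after the second dot.
        String[] parts = s.split("\\.", 3);
        if (parts.length < 3 || parts[2].isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(parts[2].charAt(0));
    }

    public static void main(String[] args) {
        System.out.println(charAfterSecondDot("test.1231.asdasd.cccc.2.a.2")); // Optional[a]
        System.out.println(charAfterSecondDot("aaa.1.22224.sadsada"));         // Optional[2]
        System.out.println(charAfterSecondDot("only.one"));                    // Optional.empty
    }
}
```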
You can use Java Stream API since Java 8:
String string = "test.1231.asdasd.cccc.2.a.2";
Arrays.stream(string.split("\\.")) // Split by dot
.skip(2).limit(1) // Skip 2 initial parts and limit to one
.map(i -> i.substring(0, 1)) // Map to the first character
.findFirst().ifPresent(System.out::println); // Get first and print if exists
However, I recommend sticking with a regex, which is a safer and more direct way to do this.
Here is the regex you need (demo available at Regex101):
.*?\..*?\.(.).*
Don't forget to escape the special characters with a double backslash (\\) in the Java string.
String[] array = new String[3];
array[0] = "test.1231.asdasd.cccc.2.a.2";
array[1] = "aaa.1.22224.sadsada";
array[2] = "test";
Pattern p = Pattern.compile(".*?\\..*?\\.(.).*");
for (int i=0; i<array.length; i++) {
Matcher m = p.matcher(array[i]);
if (m.find()) {
System.out.println(m.group(1));
}
}
This code prints two results, each on its own line: a and 2. Nothing is printed for the third String because it does not match.
A plain solution using String.indexOf:
public static Character getCharAfterSecondDot(String s) {
int indexOfFirstDot = s.indexOf('.');
if (!isValidIndex(indexOfFirstDot, s)) {
return null;
}
int indexOfSecondDot = s.indexOf('.', indexOfFirstDot + 1);
return isValidIndex(indexOfSecondDot, s) ?
s.charAt(indexOfSecondDot + 1) :
null;
}
protected static boolean isValidIndex(int index, String s) {
return index != -1 && index < s.length() - 1;
}
Using indexOf(int ch) and indexOf(int ch, int fromIndex) examines each character at most once, even in the worst case.
And a second version implementing the same logic using indexOf with Optional:
public static Character getCharAfterSecondDot(String s) {
return Optional.of(s.indexOf('.'))
.filter(i -> isValidIndex(i, s))
.map(i -> s.indexOf('.', i + 1))
.filter(i -> isValidIndex(i, s))
.map(i -> s.charAt(i + 1))
.orElse(null);
}
Just another approach, not a one-liner code but simple.
public class Test{
public static void main (String[] args){
for(String str:new String[]{"test.1231.asdasd.cccc.2.a.2","aaa.1.22224.sadsada"}){
int n = 0;
for(char c : str.toCharArray()){
if(2 == n){
System.out.printf("found char: %c%n",c);
break;
}
if('.' == c){
n ++;
}
}
}
}
}
found char: a
found char: 2

Finding longest regex match in Java?

I have this:
import java.util.regex.*;
String regex = "(?<m1>(hello|universe))|(?<m2>(hello world))";
String s = "hello world";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(s);
while(matcher.find()) {
MatchResult matchResult = matcher.toMatchResult();
String substring = s.substring(matchResult.start(), matchResult.end());
System.out.println(substring);
}
The above only prints hello whereas I want it to print hello world.
One way to fix this is to re-order the groups in String regex = "(?<m2>(hello world))|(?<m1>(hello|universe))" but I don't have control over the regex I get in my case...
So what is the best way to find the longest match? An obvious way would be to check all possible substrings of s as mentioned here (Efficiently finding all overlapping matches for a regular expression) by length and pick the first but that is O(n^2). Can we do better?
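For reference, the brute-force baseline mentioned in the question could be sketched as follows. It tries every substring from longest to shortest and returns the first one the whole pattern matches. The class and method names are hypothetical. There are O(n^2) candidate regions, and each matches() call adds its own cost on top of that.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BruteForceLongestMatch {
    // Returns the longest substring of s that the regex matches entirely, or null.
    static String longest(String regex, String s) {
        Matcher m = Pattern.compile(regex).matcher(s);
        for (int len = s.length(); len > 0; len--) {
            for (int start = 0; start + len <= s.length(); start++) {
                // Restrict the matcher to [start, start + len) and require a full match.
                m.region(start, start + len);
                if (m.matches()) {
                    return s.substring(start, start + len);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String regex = "(?<m1>(hello|universe))|(?<m2>(hello world))";
        System.out.println(longest(regex, "hello world")); // hello world
    }
}
```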
Here is a way of doing it using matcher regions, but with a single loop over the string index:
public static String findLongestMatch(String regex, String s) {
Pattern pattern = Pattern.compile("(" + regex + ")$");
Matcher matcher = pattern.matcher(s);
String longest = null;
int longestLength = -1;
for (int i = s.length(); i > longestLength; i--) {
matcher.region(0, i);
if (matcher.find() && longestLength < matcher.end() - matcher.start()) {
longest = matcher.group();
longestLength = longest.length();
}
}
return longest;
}
I'm forcing the pattern to match until the region's end, and then I move the region's end from the rightmost string index towards the left. For each region's end tried, Java will match the leftmost starting substring that finishes at that region's end, i.e. the longest substring that ends at that place. Finally, it's just a matter of keeping track of the longest match found so far.
As a matter of optimization, and since I start from the longer regions towards the shorter ones, I stop the loop as soon as all regions that would come after are already shorter than the length of longest substring already found.
An advantage of this approach is that it can deal with arbitrary regular expressions and no specific pattern structure is required:
findLongestMatch("(?<m1>(hello|universe))|(?<m2>(hello world))", "hello world")
==> "hello world"
findLongestMatch("hello( universe)?", "hello world")
==> "hello"
findLongestMatch("hello( world)?", "hello world")
==> "hello world"
findLongestMatch("\\w+|\\d+", "12345 abc")
==> "12345"
If you are dealing with just this specific pattern:
There are one or more named groups on the highest level, connected by |.
The regex for each group is wrapped in superfluous parentheses.
Inside those parentheses are one or more literals connected by |.
Literals never contain |, ( or ).
Then it is possible to write a solution by extracting the literals, sorting them by their length and then returning the first match:
private static final Pattern g = Pattern.compile("\\(\\?\\<[^>]+\\>\\(([^)]+)\\)\\)");
public static final String findLongestMatch(String s, Pattern p) {
Matcher m = g.matcher(p.pattern());
List<String> literals = new ArrayList<>();
while (m.find())
Collections.addAll(literals, m.group(1).split("\\|"));
Collections.sort(literals, new Comparator<String>() {
public int compare(String a, String b) {
return Integer.compare(b.length(), a.length());
}
});
for (Iterator<String> itr = literals.iterator(); itr.hasNext();) {
String literal = itr.next();
if (s.indexOf(literal) >= 0)
return literal;
}
return null;
}
Test:
System.out.println(findLongestMatch(
"hello world",
Pattern.compile("(?<m1>(hello|universe))|(?<m2>(hello world))")
));
// output: hello world
System.out.println(findLongestMatch(
"hello universe",
Pattern.compile("(?<m1>(hello|universe))|(?<m2>(hello world))")
));
// output: universe
Just add $ (end of string) before the or-separator |.
The regex then checks whether the string ends at that point. If it does, that alternative matches and is returned; otherwise that part of the regex is skipped.
The code below gives what you want:
import java.util.regex.*;
public class RegTest{
public static void main(String[] arg){
String regex = "(?<m1>(hello|universe))$|(?<m2>(hello world))";
String s = "hello world";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(s);
while(matcher.find()) {
MatchResult matchResult = matcher.toMatchResult();
String substring = s.substring(matchResult.start(), matchResult.end());
System.out.println(substring);
}
}
}
Likewise, the code below will skip hello and hello world, and match hello world there.
Note the use of $ there:
import java.util.regex.*;
public class RegTest{
public static void main(String[] arg){
String regex = "(?<m1>(hello|universe))$|(?<m2>(hello world))$|(?<m3>(hello world there))";
String s = "hello world there";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(s);
while(matcher.find()) {
MatchResult matchResult = matcher.toMatchResult();
String substring = s.substring(matchResult.start(), matchResult.end());
System.out.println(substring);
}
}
}
If the structure of the regex is always the same, this should work:
String regex = "(?<m1>(hello|universe))|(?<m2>(hello world))";
String s = "hello world";
//split the regex into the different groups
String[] allParts = regex.split("\\|\\(\\?\\<");
for (int i=1; i<allParts.length; i++) {
allParts[i] = "(?<" + allParts[i];
}
//find the longest string
int longestSize = -1;
String longestString = null;
for (int i=0; i<allParts.length; i++) {
Pattern pattern = Pattern.compile(allParts[i]);
Matcher matcher = pattern.matcher(s);
while(matcher.find()) {
MatchResult matchResult = matcher.toMatchResult();
String substring = s.substring(matchResult.start(), matchResult.end());
if (substring.length() > longestSize) {
longestSize = substring.length();
longestString = substring;
}
}
}
System.out.println("Longest: " + longestString);

Need a regexp to extract a Sub String of a String

I have a string, The string looks like :
abc/axs/abc/def/gh/ij/kl/mn/src/main/resources/xx.xml
I want to get the content after n occurrences and before m occurrences of the character /.
For instance, from the string above, I want:
mn/src/main
Please suggest some solution for this.
The regex you require is this:
(?:.*?\/){7}(.*?)(.*)(?:\/.*?){2}$
A generic regex:
(?:.*?\/){n}(.*?)(.*)(?:\/.*?){m}$
Replace n and m with your own counts (7 and 2 in the example above) and you will get your result.
Demo here:
http://regex101.com/r/bW2yF3
Use split().
String path = "abc/axs/abc/def/gh/ij/kl/mn/src/main/resources/xx.xml";
String [] tokens = path.split("/");
Now just print it:
for (int i = n; i < m; i++){
System.out.print(tokens[i] + (i != m - 1 ? "/" : ""));
}
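The split approach can also be wrapped in a small helper that joins the selected tokens instead of printing them. This is only a sketch: the names PathSlice and slice are made up here, and n and m are interpreted as in the loop above (n inclusive, m exclusive, zero-based token indexes).

```java
import java.util.Arrays;

final class PathSlice {
    // Returns tokens n (inclusive) to m (exclusive) of the '/'-separated path, re-joined with '/'.
    static String slice(String path, int n, int m) {
        String[] tokens = path.split("/");
        if (n < 0 || n > m || m > tokens.length) {
            throw new IllegalArgumentException("Invalid range " + n + ".." + m);
        }
        return String.join("/", Arrays.copyOfRange(tokens, n, m));
    }

    public static void main(String[] args) {
        String path = "abc/axs/abc/def/gh/ij/kl/mn/src/main/resources/xx.xml";
        System.out.println(slice(path, 7, 10)); // mn/src/main
    }
}
```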
If you must use regex:
String s = "abc/axs/abc/def/gh/ij/kl/mn/src/main/resources/xx.xml";
int n = 7;
int m = 10;
Pattern p = Pattern.compile("(?:[^/]*/){" + n + "}((?:[^/]*/){" + (m - n - 1) + "}[^/]*)/.*");
Matcher matcher = p.matcher(s);
if (matcher.matches()) {
System.out.println(matcher.group(1));
}

Java string -> two numbers

The operation I'm hoping to perform is to go from:
String "32.63578..."
to:
float 32.63
long 578... //where '...' is rest of string
Something like the following in Python:
split = str.find('.')+2
float = str[:split]
long = str[split:]
I'm new to Java, so I began by trying to look up equivalents, however it seems like a more convoluted solution than perhaps a regex would be? Unless there's more similar functions to python than splitting into a char array, and repeatedly iterating over?
Use indexOf and substring methods:
String str = "32.63578";
int i = str.indexOf(".") + 3;
String part1 = str.substring(0, i); // "32.63"
String part2 = str.substring(i); // "578"
float num1 = Float.parseFloat(part1); // 32.63
long num2 = Long.parseLong(part2); // 578
Regular expression alternative:
String str = "32.63578";
String[] parts = str.split("(?<=\\.\\d{2})");
System.out.println(parts[0]); // "32.63"
System.out.println(parts[1]); // "578"
About the regular expression used:
(?<=\.\d{2})
It's a positive lookbehind (?<=...). It matches at the position that is preceded by a . and 2 digits, without consuming any characters.
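The lookbehind split can be parameterized by the number of decimals to keep. The sketch below is one possible shape. DecimalSplit and splitAfterDecimals are hypothetical names, and the limit of 2 is added so that the input is cut at most once.

```java
final class DecimalSplit {
    // Splits once at the position preceded by a dot and `digits` digits.
    static String[] splitAfterDecimals(String s, int digits) {
        return s.split("(?<=\\.\\d{" + digits + "})", 2);
    }

    public static void main(String[] args) {
        String[] parts = splitAfterDecimals("32.63578", 2);
        float num1 = Float.parseFloat(parts[0]); // parsed from "32.63"
        long num2 = Long.parseLong(parts[1]);    // parsed from "578"
        System.out.println(parts[0] + " and " + parts[1]); // 32.63 and 578
        System.out.println(num2);                          // 578
    }
}
```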
You can use the split method in String if you want to cleanly break the two parts.
However, if you want to have trailing decimals like in your example, you'll probably want to do something like this:
String str = "32.63578...";
// Initialized so the variables can be used after the loop even if no '.' is found.
String substr1 = null, substr2 = null;
for (int i = 0; i < str.length(); i++)
{
if (str.charAt(i) == '.')
{
substr1 = str.substring(0, i + 3);
substr2 = str.substring(i + 3, str.length());
break;
}
}
//convert substr1 and substr2 here
String s ="32.63578";
// The dot is escaped so that it matches only a literal decimal point.
Pattern pattern = Pattern.compile("(?<Start>\\d{1,10}\\.\\d{1,2})(?<End>\\d{1,10})");
Matcher match = pattern.matcher(s);
if (match.find()) {
String start = match.group("Start");
String ending = match.group("End");
System.out.println(start);
System.out.println(ending);
}

trim all 'spaces' from String

I am parsing a PDF and getting a lot of Strings with \t, \r, \n, \s... They appear on both ends of the String and in no particular order. So I can have, for example:
"\t\s\t\nSome important data I need surrounded by useless data \r\t\s\s\r\t\t"
Is there an efficient way to trim these Strings?
What I have so far, which isn't good enough because I also want to keep some symbols:
public static String trimToLetters(String sourceString) {
int beginIndex = 0;
int endIndex = sourceString.length() - 1;
Pattern p = Pattern.compile("[A-Z_a-z\\;\\.\\(\\)\\*\\?\\:\\\"\\']");
Matcher matcher = p.matcher(sourceString);
if (matcher.find()) {
if (matcher.start() >= 0) {
beginIndex = matcher.start();
StringBuilder sb = new StringBuilder(sourceString);
String sourceReverse = sb.reverse().toString();
matcher = p.matcher(sourceReverse);
if (matcher.find()) {
endIndex = sourceString.length() - matcher.start();
}
}
}
return sourceString.substring(beginIndex, endIndex);
}
The trim method of String should be able to remove all whitespace from both ends of the string:
trim: Returns a copy of the string, with leading and trailing whitespace omitted.
P.S. \s is not a valid escape sequence in Java string literals before Java 15 (which added it as an escape for a single space); in a regex, write it as "\\s".
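A short sketch comparing trim with a regex-based alternative. trim only strips characters up to U+0020, so text extracted from a PDF that contains, for example, non-breaking spaces (U+00A0) may need a wider character class. The helper trimAll and its pattern are assumptions for illustration, not part of the answer above.

```java
final class WhitespaceTrim {
    // Strips leading and trailing regex whitespace (\s) plus Unicode separators (\p{Z}).
    static String trimAll(String s) {
        return s.replaceAll("^[\\s\\p{Z}]+|[\\s\\p{Z}]+$", "");
    }

    public static void main(String[] args) {
        String raw = "\t \t\nSome important data I need surrounded by useless data \r\t  \r\t\t";
        // Plain trim is enough for tabs, spaces, \r and \n.
        System.out.println("[" + raw.trim() + "]");
        // With non-breaking spaces on the ends, trim leaves them in place; trimAll removes them.
        System.out.println("[" + trimAll("\u00A0" + raw + "\u00A0") + "]");
    }
}
```

Both lines print the sentence wrapped in brackets with no surrounding whitespace.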
