How can I avoid splitting integers?


I'm trying to add spaces around numbers, but as a result some numbers get split and others are sometimes lost.
Code:
String line = "321HELLO how do you do? $ah213 -20d1001x";
line = line.replaceAll("([^d]?)([\\d\\.]+)([^d]?)", "$1 $2 $3");
System.out.println(line);
result:
3 21 HELLO how do "you" do? $ah 213 - 2 0 d1 001 x
Rules:
No matter how big an integer is, don't split it into parts.
$ + number ($123) or $ + letters + number ($abc123): don't add a space before or after the number.
Letter + number: separate them.
Wanted result:
321 HELLO how do "you" do? $ah213 -20 d 1001 x

One small mistake in your regex: [^d] should be [^\\d], otherwise you're checking for the character d rather than the character class \d.
But it still inserts too many spaces, and I don't really see a way to avoid that with your current regex.
Something that works:
String line = "321HELLO how do you do? $ah213 -20d1001x";
line = line.replaceAll("(?<=[-\\d.])(?=[^\\s-\\d.])|(?<!\\$[a-z]{0,1000})(?<=[^\\s-\\d.])(?=[-\\d.])", " ");
System.out.println(line);
prints:
321 HELLO how do you do? $ah213 -20 d 1001 x
Explanation:
[-\\d.] is what I presume you classify as "part of a number". Note that a lone . or - will also be treated as a number, which may not be desired. You don't need to escape . inside [].
(?<=...) is positive look-behind, meaning the previous characters match the pattern.
(?=...) is positive look-ahead, meaning the next characters match the pattern.
(?<!...) is negative look-behind, meaning the previous characters don't match the pattern.
So basically, at every switching point between number and non-number characters, a space is inserted (unless whitespace is already there). The negative look-behind prevents a space from being inserted when the position is preceded by a $ followed by 0 to 1000 lowercase letters (* can't be used in a look-behind, hence the explicit bound), which keeps $123 and $ah123 intact.
Java regex reference.
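As a quick check of the rules, here is a minimal sketch that wraps the regex from this answer in a helper and runs it on the sample line plus two extra inputs. The class name NumberSpacer and the method space are made up for illustration; only the regex itself comes from the answer.

```java
class NumberSpacer {
    // Regex copied from the answer: a zero-width match at every boundary
    // between "number" characters (-, digits, .) and other non-whitespace
    // characters, except right after "$" plus lowercase letters.
    private static final String BOUNDARY =
            "(?<=[-\\d.])(?=[^\\s-\\d.])"
            + "|(?<!\\$[a-z]{0,1000})(?<=[^\\s-\\d.])(?=[-\\d.])";

    // Hypothetical helper: inserts a space at each boundary found.
    static String space(String line) {
        return line.replaceAll(BOUNDARY, " ");
    }

    public static void main(String[] args) {
        System.out.println(space("321HELLO how do you do? $ah213 -20d1001x"));
        // 321 HELLO how do you do? $ah213 -20 d 1001 x
        System.out.println(space("$123"));   // $123 (left untouched)
        System.out.println(space("abc123")); // abc 123
    }
}
```

Because the look-behind only lists [a-z], uppercase letters after $ are not covered; widening the class would be a small change if that matters.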
Additional note:
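It turns out you don't really need the (?<=...) look-behinds for the sample input; the boundary character can be matched and captured normally. Be aware that the negative look-behind is then checked before the captured character, so a $ directly followed by a digit (as in $123) does get a space inserted with this variant: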
replaceAll("([-\\d.])(?=[^\\s-\\d.])|(?<!\\$[a-z]{0,1000})([^\\s-\\d.])(?=[-\\d.])", "$1$2 ")

Related

Java 8 regex: a capturing group in a pattern doesn't match, yet the whole pattern does match

This is my first question. Nice to e-meet everyone.
I have created the following regex pattern in Java 8 (this is just a simplified example of what I actually have in my code - for the sake of clarity):
(?<!a)([0-9])\,([0-9])(?!a)|(?<!b)([0-9]) ([0-9])(?!b)|(?<!c)([0-9])([0-9])(?!c)
so in general it consists of three alternatives:
1st one matches two single digits separated with a comma, for example:
1,1
2,0
4,5
2nd one matches two single digits separated with a space, for example:
1 1
2 0
4 5
3rd one matches two single digits in a row, for example:
11
20
45
Each alternative uses lookarounds and their content has to be slightly different for each one of them - that's why I couldn't just put everything together like that:
([0-9])[, ]?([0-9])
Each of the matched digits is enclosed in a capturing group and now I have a second line to 'call out' these captured numbers like this:
(?<!n)($1 $2|$3 $4|$5 $6)(?!n)
So at the end I need to match a text that would have the same digits separated with single space and not surrounded by 'n'. So if any of the examples shown above would be matched by the pattern from the 1st line, the 2nd line pattern should match these:
1 1
2 0
4 5
11 11
22 00
44 55
And not any of these:
n1 1
2,0
45
asd asd asd
The problem is the following: it returns a match even if I do not have these captured digits in the tested text, but I do have space in it... So here I do not get match and that is correct:
aaaaaaaaa
bbbbbbbbb
aasdfasdf
but here I get a match on the following things (most apparently because there is a space/spaces):
abc abc
q w r t y
as df
Is it normal that, when the capturing groups capture nothing in the 1st line, the literal part of the 2nd-line pattern (a single space) still matches, so the whole pattern returns a match, as if an uncaptured group were treated as a zero-length string? Thanks in advance for any comments.

Your regex matches whitespace because the resulting pattern for the 1,1 string is (?<!n)(1 1| | )(?!n), and it can match a space that is neither preceded nor followed by n.
When a replacement backreference refers to a group that did not participate in the match, .replaceAll/.replaceFirst substitute an empty string for it (whereas group() returns null for such a group after .find() / .matches()), and thus you still get the blank alternatives in the resulting pattern.
You may leverage this functionality AND the fact that each alternative has exactly two capturing groups by concatenating replacement backreferences in the string replacement pattern, getting rid of the alternations altogether:
SEARCH: (?<!a)([0-9]),([0-9])(?!a)|(?<!b)([0-9]) ([0-9])(?!b)|(?<!c)([0-9])([0-9])(?!c)
REPLACE: (?<!n)$1$3$5 $2$4$6(?!n)
Note how the backreferences are concatenated: all backreferences to odd groups come first, then all backreferences to even groups are placed in a no-alternative pattern.
See the regex demo.
Note that even if the number of groups is different across the alternatives you may just add "fake" empty groups to each of them, and this approach will still work.
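To make the empty-group behaviour concrete, the sketch below builds the second pattern with String.replaceAll, once with the alternation template from the question and once with concatenated backreferences, then tries each derived pattern on text that contains no digits. The class and method names (DerivedPatternDemo, derive) are invented for this illustration, and it assumes the second pattern is produced via replaceAll as described above.

```java
import java.util.regex.Pattern;

class DerivedPatternDemo {
    // Search pattern from the question: three alternatives, two groups each.
    static final String SEARCH =
            "(?<!a)([0-9]),([0-9])(?!a)"
            + "|(?<!b)([0-9]) ([0-9])(?!b)"
            + "|(?<!c)([0-9])([0-9])(?!c)";

    // Hypothetical helper: expands the template using the groups matched in input.
    static String derive(String input, String template) {
        return input.replaceAll(SEARCH, template);
    }

    public static void main(String[] args) {
        String withAlternation = derive("1,1", "(?<!n)($1 $2|$3 $4|$5 $6)(?!n)");
        String concatenated = derive("1,1", "(?<!n)$1$3$5 $2$4$6(?!n)");

        System.out.println(withAlternation); // (?<!n)(1 1| | )(?!n)
        System.out.println(concatenated);    // (?<!n)1 1(?!n)

        // The blank alternatives match the lone space; the concatenated form does not.
        System.out.println(Pattern.compile(withAlternation).matcher("abc abc").find()); // true
        System.out.println(Pattern.compile(concatenated).matcher("abc abc").find());    // false
    }
}
```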

Regex not always working with angle brackets

While writing a Brainfuck translator in Java, I need to split a string according to the following rules: each of the characters [ ] , . and each sequence of the characters + - < > should be followed by a newline. Here's the input string:
..-<[-]>..[[<<[+[-<-->>+,>-.++]-,>,<[.],][<.,<-]+[-,<->,-]<<[>->-.<-[.<++,>++,].-]]]
And my code:
s = s.replaceAll("(\\+|-|<|>)+", "$0\n")
.replaceAll("\\.|\\,|\\[|\\]", "$0\n");
And the result (SO won't allow this here): https://pastebin.com/ZaT8d5ve
What was expected: https://pastebin.com/gNxcgTSP
It seems that places where angle brackets meet plus/minus signs are handled incorrectly, while angle brackets next to square brackets and dots/commas are fine. I can't figure out what's wrong with my solution.

Your output does exactly what you described: any sequence of + - < > is followed by \n, so -< becomes -<\n, not -\n<\n.
If I understand you correctly, you want each sequence of the same character (one of + - < >) to be followed by \n. If that is the case, then instead of
s.replaceAll("(\\+|-|<|>)+", "$0\n")
you can use
s.replaceAll("(\\+|-|<|>)\\1*", "$0\n")
\1 is backreference to match from group 1 (here (\\+|-|<|>)), so it matches one of those characters and its optional following repetitions.

You seem to think that
(\\+|-|<|>)+
would match only sequences of identical characters like ++ whereas it also matches any sequence of these characters like -<-->>.
You also don't need two regexes in sequence. The following should do:
s = s.replaceAll("([+<>-])\\1*|[,.\\[\\]]", "$0\n");
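The difference between the two quantifiers is easy to see on a short fragment. This is a small sketch with made-up names (BrainfuckSplitter, greedyRuns, sameCharRuns); the two regexes are the question's first replaceAll and the combined regex from the answer just above.

```java
class BrainfuckSplitter {
    // Question's first step: any mix of + - < > counts as one run.
    static String greedyRuns(String s) {
        return s.replaceAll("(\\+|-|<|>)+", "$0\n");
    }

    // Combined regex from the answer: a run is one character plus its
    // repetitions (\1*); each of , . [ ] is handled on its own.
    static String sameCharRuns(String s) {
        return s.replaceAll("([+<>-])\\1*|[,.\\[\\]]", "$0\n");
    }

    public static void main(String[] args) {
        System.out.print(greedyRuns("-<-->>"));   // one line: -<-->>
        System.out.print(sameCharRuns("-<-->>")); // four lines: -, <, --, >>
        System.out.print(sameCharRuns("..-<[-]"));
    }
}
```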

Removing Certain Characters inside a String, Java

My problem is that I want to remove a character from some parts of a String, but I do not know how to restrict the removal.
Example:
A computer is a general purpose device that can be\n
programmed to carry out a finite set of\n
millions to billions of times more capable.\n
\n
In this era mechanical analog computers were used\n
for military applications.\n
1.1 Limited-function early computers\n
1.2 First general-purpose computers\n
1.3 Stored-program architecture\n
1.4 Semiconductors and\n
This example is the content of my string. What I want is to remove the \n of lines 1 and 2 above, but not the \n from line 5 onwards. How do I remove those \n without removing the others? My goal is to turn the string into paragraphs without a \n after each line: in the example, the first 3 lines form a paragraph and the later lines are in bullet form. In other words, I do not want to remove the \n after bulleted items.
The real contents of the string is dynamic.
I have tried String.replaceAll("\n", " "), but clearly that does not work because it removes every \n. I have thought of using a regex to detect alphanumeric characters, but it would also remove some letters after the \n.

Try using this regex: -
str = str.replaceAll("(.+)(?<!\\.)\n(?!\\d)", "$1 ");
System.out.println(str);
This replaces a \n only if it is not preceded by a dot (the end of a paragraph) and not followed by a digit (the start of a bullet point). For example, the \n after your first bullet point is followed by 1.2, so it is not replaced.
The (.+) at the start ensures that you are not replacing the \n of a blank line.
This will work for the string you have shown.
Explanation: -
(.+) -> A capture group, capturing anything, occurring at least once.
(?<!\\.) -> This is a negative look-behind. The \n that follows it matches only if the character just before it is not a dot (.).
E.g. you don't want to replace the \n after the line: millions to billions of times more capable.\n
(?!\\d) -> This is a negative look-ahead. The \n before it matches only if it is not followed by a digit (\\d).
E.g. in your bulleted points, computers\n is followed by 1.2, where 1 is a digit, so you don't want to replace that \n.
$1 in the replacement stands for the 1st captured group, (.+). Since you only want to replace the "\n", the rest of the match is put back unchanged via $1 and the "\n" is replaced by a space.
Note that look-ahead and look-behind are zero-width and do not capture anything, so they do not count as groups.
For More Details, follow these links: -
http://docs.oracle.com/javase/tutorial/essential/regex/
http://docs.oracle.com/javase/tutorial/essential/regex/quant.html
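Here is a minimal sketch that applies the regex from this answer to a shortened version of the sample text. The class name ParagraphJoiner and the method join are hypothetical, and the input lines are abbreviated stand-ins for the question's text.

```java
class ParagraphJoiner {
    // Regex from the answer: replace a newline with a space unless the line
    // is empty, ends with a dot, or the next line starts with a digit.
    static String join(String text) {
        return text.replaceAll("(.+)(?<!\\.)\n(?!\\d)", "$1 ");
    }

    public static void main(String[] args) {
        String text = "can be\n"
                + "programmed\n"
                + "more capable.\n"
                + "\n"
                + "were used\n"
                + "for military applications.\n"
                + "1.1 Early computers\n"
                + "1.2 First computers";
        System.out.println(join(text));
    }
}
```

One thing this sketch makes visible: a line that does not end with a dot and sits directly before a blank line, or at the very end of the string, also loses its newline, so real input may need an extra condition.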

I suspect your requirement is just to remove the \n of lines 1 and 2.
What you can do is as follows:
split your string into segments,
String[] array = yourString.split("\n");
then concatenate the segments, adding "\n" between them except after lines 1 and 2 (array indices 0 and 1), where a space is used instead:
array[0] + " " + array[1] + " " + array[2] + "\n" + array[3] + "\n" + array[4] // ... and so forth

regex to match a recurring pattern

I am trying to write a regex for java that will match the following string:
number,number,number (it could be this simple, or it could have a variable number of numbers; the numbers are separated by commas and there will not be any whitespace)
Here was my attempt:
[[0-9],[0-9]]+
but it seems to match anything with a number in it.

You could try something along the lines of ([0-9]+,)*[0-9]+
This will match:
Only one number, e.g.: 7
Two numbers, e.g.: 7,52
Three numbers, e.g.: 7,52,999
etc.
This will not match:
Things with spaces, e.g.: 7, 52
A list ending with a comma, e.g.: 7,52,
Many other things out of the scope of this problem.
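The cases listed above can be checked with String.matches, which requires the whole string to match. This is a small sketch with invented names (NumberListCheck, isNumberList, originalAttempt). It also shows why the question's attempt accepts too much: in Java, [[0-9],[0-9]] is a single character class meaning "a digit or a comma".

```java
class NumberListCheck {
    // Regex from the answer: zero or more "number," groups, then a final number.
    static boolean isNumberList(String s) {
        return s.matches("([0-9]+,)*[0-9]+");
    }

    // The question's attempt: one or more characters that are digits or commas.
    static boolean originalAttempt(String s) {
        return s.matches("[[0-9],[0-9]]+");
    }

    public static void main(String[] args) {
        System.out.println(isNumberList("7"));        // true
        System.out.println(isNumberList("7,52,999")); // true
        System.out.println(isNumberList("7, 52"));    // false
        System.out.println(isNumberList("7,52,"));    // false
        System.out.println(originalAttempt(",,,"));   // true, although it contains no number
    }
}
```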

I think this would work
\d+,(\d+,)+
Note that, as you asked, this only matches numbers that are each followed by a comma, and it requires at least two of them.

I guess you are starting with a String. Why don't you just use String.split(",") ?

^ means the start of a string and $ means the end. If you don't use those, you could match something in the middle (b would match inside "abc").
The + works on the element before it. b is an element, [0-9] is an element, and so are groups (things wrapped in parentheses).
So, the regex you want matches:
The start of the string ^
a number [0-9]+
one or more commas, each followed by a number (,[0-9]+)+
the end of the string $
or, ^[0-9]+(,[0-9]+)+$

Try the regex [\d,]*, written as the Java string "[\\d,]*", e.g.:
Pattern p4 = Pattern.compile("[\\d,]*");
Matcher m4 = p4.matcher("12,1212,1212ad,v");
System.out.println(m4.find()); //prints true
System.out.println(m4.group());//prints 12,1212,1212
If you want to match at least one comma (,) and two numbers, e.g. 12,1212, then you may want to use the regex (\d+,)+\d+, written as the Java string "(\\d+,)+\\d+". This regex matches a region consisting of one or more groups of a number (at least one digit) followed by a comma, and then a final number of at least one digit.

Dangling Meta Character and Regular expression Pattern for the String

I want to create a pattern for the following string format. I have come up with the pattern below, but I am stuck because I am not able to match with it properly. The details are below.
Example String: JAS 5F W 123 or BWER34 23 C 23
Above String has the following rules to be followed.
The last part can be a 2- or 3-digit number only (123, 023 or 23).
Before that, only a single letter is allowed, case insensitive (W or c).
Before that, only 2 digits, or one digit followed by the letter "f" or "F", is allowed.
The start of the string can be an alphanumeric string of any length.
All the parts are separated by a space.
I came up with the following pattern, but when I run my Java program it fails with a "dangling meta character" error.
"*\\s([0-9][fF]|[1-9][0-9])\\s([a-zA-Z])\\s(\\d\\d|\\d\\d\\d)$"
Please help me in creating the correct pattern for the above String

First of all, you use a quantifier but don't quantify anything: remove the first * or add something before it. This is what causes the "dangling meta character" message.
Second, \\d\\d|\\d\\d\\d could be rewritten as \\d{2,3} (two or three digits).
Finally, you can make the expression case insensitive by adding a (?i) prefix, which allows you to write it as follows:
"(?i).*\\s(\\df|[1-9]\\d)\\s([a-z])\\s(\\d{2,3})$"
Note that I assume you want to match anything before the first space-separated part, and thus I added a dot before the asterisk: .*. If you use Pattern and Matcher.find() directly (i.e. not String#matches()), you don't even need that.
Regarding your rule "Before that only 2 digits or one digit and a character only "f" or "F" is allowed":
would that allow 05 as well (those are two digits)? If so, you could rewrite that part as \\df|\\d{2}
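The sketch below reproduces the error and checks the corrected pattern against the two example strings from the question. The names DanglingMetaDemo, compiles and isValid are made up for the illustration; the patterns are the question's original and this answer's fix.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class DanglingMetaDemo {
    // Pattern from the question: starts with * that has nothing to repeat.
    static final String BROKEN =
            "*\\s([0-9][fF]|[1-9][0-9])\\s([a-zA-Z])\\s(\\d\\d|\\d\\d\\d)$";
    // Corrected pattern from the answer.
    static final String FIXED =
            "(?i).*\\s(\\df|[1-9]\\d)\\s([a-z])\\s(\\d{2,3})$";

    // Hypothetical helper: true if the regex compiles without a syntax error.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // For BROKEN the message reports a dangling meta character '*'.
            return false;
        }
    }

    static boolean isValid(String s) {
        return s.matches(FIXED);
    }

    public static void main(String[] args) {
        System.out.println(compiles(BROKEN));          // false
        System.out.println(isValid("JAS 5F W 123"));   // true
        System.out.println(isValid("BWER34 23 C 23")); // true
        System.out.println(isValid("JAS 5G W 123"));   // false
    }
}
```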
