Regular expression to split text on space and newline? [duplicate] - java

This question already has answers here:
Regex match empty lines
(2 answers)
Closed 4 years ago.
I have a String that looks as follows:
This is line number 1.
[space][space][space][space]\n
[space]\n
This is line number 2.
where every [space] represents a blank space and \n represents a new line.
What I would like to do is to split this string into two strings, one that has "This is line number 1." and the other that contains "This is line number 2." In other words split the string on every two empty lines regardless of whether they contain spaces or not.
What I tried to do:
System.out.println(myString.split("^[ ]{0,}\\n")[0]);
But the above prints the whole string.
UPDATE
Other things I have tried that also print the whole string and don't seem to work:
System.out.println(myString.split("(^[ ]{0,}\\n){2,}")[0]);
These all print the whole string as well. Any ideas?

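Simply enable the multiline flag in your pattern, like this:
myString.split("(?m)^[ ]{0,}\n")
The (?m) at the start is an embedded flag expression. It turns on multiline mode, so ^ matches at the start of every line instead of only at the start of the input, and you don't need Pattern.compile with Pattern.MULTILINE.
This should work, but each blank line is matched separately, so two consecutive blank lines leave an empty string between the two pieces, and the first piece keeps its trailing newline.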
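If exactly two clean pieces are wanted, one option is to treat the whole run of blank lines, together with the newline that ends the preceding text line, as a single delimiter. The following is a minimal sketch under the assumption that the input uses \n line endings; the class and method names (BlankLineSplit, splitOnBlankLines) are made up for illustration, and the pattern is a variation on the one above, not the answerer's exact expression.

```java
import java.util.Arrays;

final class BlankLineSplit {
    // A newline followed by one or more lines that contain only spaces.
    // Consuming the newline that ends the preceding text line keeps it out of the result.
    private static final String DELIMITER = "\\n(?:[ ]*\\n)+";

    // Hypothetical helper: splits text wherever one or more blank lines occur.
    static String[] splitOnBlankLines(String text) {
        return text.split(DELIMITER);
    }

    public static void main(String[] args) {
        String myString = "This is line number 1.\n    \n \nThis is line number 2.";
        System.out.println(Arrays.toString(splitOnBlankLines(myString)));
    }
}
```

A single newline between two text lines is not followed by a blank line, so it does not count as a delimiter and those lines stay together.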

I'm just briefly looking through things on a break at work so perhaps I haven't read the question thoroughly enough, but have you tried:
System.out.println(parsedText.split("^[ ]{0,}\n\n")[0]);
Seems like you are not completely skipping two lines in your code. Once again might be wrong but worth a shot!

Related

Regular expression construct for arbitrary string or number of characters in java [duplicate]

This question already has answers here:
Regex to split a CSV
(18 answers)
Closed 1 year ago.
I am trying to split data from a .csv file, however, some of the fields/columns also contain commas in between just like this
ABCKS,"ASK,ED","SDR,ED",2022-07-11,8011.0
cvbgb,"hfhvnf,rgr","dthd,chdf",2022-07-11,111.9
ABCKS,"ASK,ED","SDR,ED",2022-07-11,8011.0
hence, the .split(",") string method would create additional fields into the data.
I have tried
if (aLine.contains("\"|,|\"")){
String newString = aLine.replaceAll("\"|,|\"","|_|").replaceAll("\"", "");
aList = Arrays.asList(newString.split(",", -1));
}
It does not seem to work.
As Tim said above, a proper CSV reader would probably be better, but a simple solution could be to split on ,(?![A-Za-z]+"), which selects every comma that is not followed by letters and then a quotation mark. This satisfies your sample data, but it can easily break on edge cases, for example a quoted field with digits or spaces after the comma.
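As a rough sketch of how that expression could be used with String.split, assuming every quoted field looks like the sample data (letters only after the inner comma), something like the following may work. The class and method names (QuotedCommaSplit, splitLine) are hypothetical, and the surrounding quotes are left in the fields.

```java
import java.util.Arrays;
import java.util.List;

final class QuotedCommaSplit {
    // A comma that is NOT followed by letters and then a closing quote.
    private static final String SEPARATOR = ",(?![A-Za-z]+\")";

    // Hypothetical helper: splits one CSV line; limit -1 keeps trailing empty fields,
    // as in the question's own attempt.
    static List<String> splitLine(String line) {
        return Arrays.asList(line.split(SEPARATOR, -1));
    }

    public static void main(String[] args) {
        String aLine = "ABCKS,\"ASK,ED\",\"SDR,ED\",2022-07-11,8011.0";
        for (String field : splitLine(aLine)) {
            System.out.println(field);
        }
    }
}
```

For anything beyond data shaped like this sample, a real CSV parser remains the safer choice.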

Remove White Spaces between Specific Substring in a String [duplicate]

This question already has answers here:
Which is the best library for XML parsing in java [closed]
(7 answers)
Closed 5 years ago.
What I want is for all the spaces inside the <abc> tag to be removed, while keeping the spaces inside the <efg> tag.
<abc>this is between abc</abc><efg>this is between efg</efg>
<efg>this is between efg</efg><abc>this is between abc</abc>
I want this output:
<abc>thisisbetweenabc</abc><efg>this is between efg</efg>
<efg>this is between efg</efg><abc>thisisbetweenabc</abc>
I tried string = string.replaceAll("<abc> </abc>", ""); but it's not working for me.
Brief
I urge you to use an XML parser!!! Anyway, if it's a limited, known set of HTML, you can use the following regex (as per my original comment).
Note: This solution only works on a limited, known set of HTML. If your input differs from what you posted in your question, it is likely this solution will not work. See Pshemo's comment below your question.
Note 2: The OP changed the format of the input, thus my original answer will no longer work. See original input below. (Exactly why I put a limited, known set of HTML). In the Code section I've added a second regex that works on the OP's newly added input.
Code
(?:^(<abc>)|\G(?!^))(\S+)[ \t]*
Replace with $1$2
With the new input format, the following regex can be used:
(?:^(<abc>)|\G(?!^))([^\s<]+)[ \t]*
Results
Input
<abc>this is between abc</abc>
<efg>this is between efg</efg>
<abc>this is between abc</abc>
<efg>this is between efg</efg>
Output
<abc>thisisbetweenabc</abc>
<efg>this is between efg</efg>
<abc>thisisbetweenabc</abc>
<efg>this is between efg</efg>
Explanation
(?:^(<abc>)|\G(?!^)) Match either of the following
  ^(<abc>) Match the following
    ^ Assert position at the start of the line
    (<abc>) Capture <abc> literally into capture group 1
  \G(?!^) Assert position at the end of the previous match
(\S+) Capture any non-whitespace character one or more times into capture group 2
[ \t]* Match space or tab characters any number of times
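For reference, here is a hedged sketch of how the first regex might be applied in Java to the original input, where every tag sits on its own line. It assumes Pattern.MULTILINE is wanted so that ^ matches at each line start; the class and method names (AbcSpaceStripper, strip) are illustrative only.

```java
import java.util.regex.Pattern;

final class AbcSpaceStripper {
    // MULTILINE makes ^ match at the start of every line, not only at the start of the input.
    private static final Pattern ABC_WORDS =
            Pattern.compile("(?:^(<abc>)|\\G(?!^))(\\S+)[ \\t]*", Pattern.MULTILINE);

    // Hypothetical helper: removes spaces and tabs between words on lines that start with <abc>.
    static String strip(String text) {
        // Group 1 is unset for continuation matches (the \G branch) and expands to nothing.
        return ABC_WORDS.matcher(text).replaceAll("$1$2");
    }

    public static void main(String[] args) {
        String input = "<abc>this is between abc</abc>\n<efg>this is between efg</efg>";
        System.out.println(strip(input));
    }
}
```

Because the first branch is anchored with ^, this sketch only touches lines that begin with <abc>.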
Simply do this:
String xml = "<abc>this is between abc</abc><efg>this is between efg</efg>"; // your overall string
int start = xml.indexOf("<abc>");
int end = xml.indexOf("</abc>", start) + "</abc>".length();
// Java's substring takes (beginIndex, endIndex), not (start, length)
String abcOnly = xml.substring(start, end).replace(" ", "");
String result = xml.substring(0, start) + abcOnly + xml.substring(end);
This is untested and only handles the first <abc> element, so you may have to tweak the indexes, but you should be able to get what you need from it.
Disclaimer: Using an XML parser is a far better way to handle this than manipulating strings, but I'll assume you have your reasons, so I'll answer the question you asked instead of telling you to go get an XML parser, lol. Good luck.
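The indexOf/substring idea above can be extended to every <abc> element in the text, wherever it appears on the line. This is a minimal sketch that assumes the tags are never nested and carry no attributes; the names TagSpaceRemover and removeSpacesInside are invented for the example.

```java
final class TagSpaceRemover {
    // Hypothetical helper: removes spaces inside every <tag>...</tag> pair and leaves
    // everything else untouched. Assumes no nesting and no attributes on the tag.
    static String removeSpacesInside(String text, String tag) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (true) {
            int start = text.indexOf(open, pos);
            int end = (start < 0) ? -1 : text.indexOf(close, start + open.length());
            if (start < 0 || end < 0) {
                out.append(text, pos, text.length()); // no further complete element: copy the rest
                break;
            }
            int contentStart = start + open.length();
            out.append(text, pos, contentStart);      // text before the element plus the opening tag
            out.append(text.substring(contentStart, end).replace(" ", ""));
            out.append(close);
            pos = end + close.length();
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "<efg>this is between efg</efg><abc>this is between abc</abc>";
        System.out.println(removeSpacesInside(line, "abc"));
    }
}
```

Plain string searching keeps the example short, but it inherits all the usual limits of handling markup without a parser.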

How to use Split in report's expression [duplicate]

This question already has answers here:
How do I split a string in Java?
(39 answers)
Closed 4 years ago.
I am currently attempting to split a large string field into 3 smaller fields. The string is delimited by a "/". Example string:
0123/ABCD1234/EFGH909883432212
At the moment I have managed to pull the middle section out using the following expression inside a variable:
$F{String}.split("/" ,5)[1].trim()
To be perfectly honest I am not sure how it works as I do not know what the 5 and 1 are for (which is probably what I need to know to get the other two sections)
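Calling split creates an array holding the substrings that are delimited by "/" in the original string. The [1] is a zero-based index into that array, so it selects the second substring, and trim removes any leading and trailing whitespace.
The number 5 is the optional limit parameter.
It is an integer that caps the number of elements in the resulting array. With a positive limit n, the delimiter is applied at most n - 1 times, and the last element holds the rest of the string, including any further delimiters.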
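To make the index and the limit concrete, here is a small plain-Java sketch outside the report, using the example string from the question. The class and method names (SlashFields, field) are hypothetical; inside the report expression, the same calls would be made on $F{String}.

```java
import java.util.Arrays;

final class SlashFields {
    // Hypothetical helper: returns the trimmed section at the given zero-based index.
    static String field(String value, int index) {
        return value.split("/", 5)[index].trim();
    }

    public static void main(String[] args) {
        String value = "0123/ABCD1234/EFGH909883432212";
        System.out.println(field(value, 0)); // first section
        System.out.println(field(value, 1)); // middle section
        System.out.println(field(value, 2)); // last section
        // With limit 2 the string is split only once; the remainder stays in the last element.
        System.out.println(Arrays.toString(value.split("/", 2)));
    }
}
```

Since the example string has only two delimiters, a limit of 5 never comes into play, and indexes 0, 1, and 2 give the three sections.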

Java regex plus at the beginning is optional

I would like to write a Java regex where the plus at the beginning is optional.
I tried this, but it is not working correctly:
[+]+[0-9]{3,}
Both +123 and 123 should be valid.
What am I doing wrong?
As Hamza commented below, use [+]?[0-9]{3,}. A question mark means zero or one of the preceding element, so here it allows one + or none before the digits. Your [+]+ requires at least one +, which is why 123 was rejected. Note that {3,} means three or more digits.
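A quick way to check the corrected pattern is to run it against a few inputs with matches(), which requires the whole string to match. This is only a sketch; the class and method names (OptionalPlus, isValid) are made up for the example.

```java
import java.util.regex.Pattern;

final class OptionalPlus {
    // Optional leading plus, then three or more digits.
    private static final Pattern NUMBER = Pattern.compile("[+]?[0-9]{3,}");

    // Hypothetical helper: true when the entire input matches the pattern.
    static boolean isValid(String input) {
        return NUMBER.matcher(input).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"+123", "123", "++123", "12"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```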

Regular Expression to Match Number of Lines and Characters per Line

I'm trying to make sure that a string contains between 0 and 3 lines, and that for a given line that is present that it contains 0 to 100 characters. It would need to be a valid expression for JavaScript and Java. Like many people doing RegEx I'm copying from various spots on the Internet.
Working backwards I think ^.{0,100}$ gets me the "line contains 0 to 100 characters", but trying to group that as (^.{0,100}$){0,3} doesn't work.
The new line character is probably part of my problem, so I ended up with something like .{0,100}(?:\n.{0,100}){0,2} trying to say "a line of 0 to 100 characters optionally followed by 0 to 2 instances of a new line and 0 to 100 more characters", but that also failed.
Up until now I got those expressions from other people. Using an online test tool I finally monkeyed this together: ^.{0,100}(?:(?:\r\n|[\r\n]).{0,100}){0,2}$ which appears to work.
So, my question is, am I missing any pitfalls in ^.{0,100}(?:(?:\r\n|[\r\n]).{0,100}){0,2}$ given what I'm after? Furthermore, even if that does work is it the best expression to use?
I think what you have will work fine. You can make the line break part a little more compact if you want, and you don't need ^ and $ if you are using matches():
String regex = ".{0,100}(?:[\r\n]+.{0,100}){0,2}";
EDIT
After some more thought, I realized the newline suggestion above will match 4 (or more) lines as long as a couple of them are empty, because [\r\n]+ swallows consecutive line breaks. So, we are back to your suggested example. Oh well, at least the start and end anchors can be omitted.
String regex = ".{0,100}(?:(?:\r\n|[\r\n]).{0,100}){0,2}";
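A short sketch for trying the final expression in Java with matches(), which anchors the pattern to the whole input. The class and method names (LineLimitCheck, fits) are hypothetical, and the sample inputs are invented for illustration.

```java
import java.util.regex.Pattern;

final class LineLimitCheck {
    // Up to 100 characters, optionally followed by at most two more
    // "line break + up to 100 characters" groups. By default '.' does not
    // match line terminators, so each .{0,100} stays within one line.
    private static final Pattern LIMIT =
            Pattern.compile(".{0,100}(?:(?:\\r\\n|[\\r\\n]).{0,100}){0,2}");

    // Hypothetical helper: true when the text has at most 3 lines of at most 100 characters each.
    static boolean fits(String text) {
        return LIMIT.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(fits("one\ntwo\nthree"));
        System.out.println(fits("one\r\ntwo"));
        System.out.println(fits("one\ntwo\nthree\nfour"));
    }
}
```

Note that a trailing line break after the third line counts as starting a fourth, empty line under this pattern.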
I'm not very good at regular expressions but would this work?
^.{0,100}\n?(.{0,100}\n)?.{0,100}?$
Again, I'm still new to regular expressions, so if there is an error (which is likely), please tell me.
