Regex display with arrays - java

So I have a regex question. When running this code
if (str1.trim().contains(search2)){
String str3 = str1;
str3 = str3.replaceAll("[^-?0-9]+", " ");
System.out.println("location: " + Arrays.asList(str3.trim().split(" ")));
System.out.println(" ");
}
it produces
location: [290, -70]
Is it possible to drop the brackets and the comma, so that "[x, x]" is printed as "x x", with just the values inside quotes?
location: "290 -70"?
I'm kinda new to regex so I tried some things like .replace("[", " "); but it did not work.
EDIT ----
Here's my entire code.
public static void main (String [] args) throws IOException {
BufferedReader in = new BufferedReader (new FileReader ("/Users/Dannybwee/Documents/workspace/csc199/src/csc199/test.txt"));
String str;
List<String> finallist = new ArrayList<String>();
while ((str = in.readLine()) != null){
finallist.add(str);
}
String search = "node";
String search2 = "position";
for (String str1: finallist) {
if (str1.trim().contains(search)){
System.out.print("{ key " + str1+ ",\n" +
"name: " + str1 + ",\n" +
"Truth: 'Tainted'," + "\n" +
"False: 'NotTainted, \n");
}
if (str1.trim().contains(search2)){
String str3 = str1;
str3 = str3.replaceAll("[^-?0-9]+", " ");
System.out.println("location: " + Arrays.asList(str3.trim().split(" ")));
System.out.println("}");
}
}
}
What I'm trying to do is take a text file and change the formatting of the text. I thought it would be easiest to read the file and scan for what needs to change. For instance, all I want is to change the brackets in the output above to braces.
So basically I want it to output location: "290 -70" instead of location: [290, -70], that is, without the comma and brackets.

I'm splitting because the line is positions = (number number); What I'm trying to do is just extract the number from that index
Then if you split, you get ["(number", "number)"].
You want to remove the round brackets, not the square ones. And you have already done that: [^-?0-9]+ matches every run of characters other than 0-9, - and ?, and replaceAll replaces each such run with a space.
You don't need to split anything.
if (str1.trim().contains(search2)){
str1 = str1.replaceAll("[^-?0-9]+", " ").trim(); // trim() drops the space left at each end
System.out.println("location: \"" + str1 + "\"");
System.out.println("}");
}
You could also forget the regex entirely and use str1.substring(1, str1.length() - 1)
By the way, if you are trying to produce JSON, it isn't valid. The keys need to be quoted
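As a rough sketch of the approach in this answer: the numbers can be pulled out and quoted without splitting. The sample line `position = (290 -70);` and the class and method names are made up for illustration, based on the line format described in the question. The `?` from the original character class is left out here, since inside `[...]` it only stands for a literal question mark.

```java
final class LocationFormatter {
    // Replaces every run of characters other than '-' and digits with one space,
    // then trims the leading and trailing space that this leaves behind.
    static String location(String line) {
        String numbers = line.replaceAll("[^-0-9]+", " ").trim();
        return "location: \"" + numbers + "\"";
    }

    public static void main(String[] args) {
        // Hypothetical input line in the "position = (number number);" format
        System.out.println(location("position = (290 -70);")); // location: "290 -70"
    }
}
```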

You can match a literal bracket by escaping it with the backslash "escape character": \[. The same applies to the other characters that have a special meaning in a regex:
\\ , \. , \( ... etc
It is important to note that in a Java string literal the escape character itself must be escaped, so every backslash in the regex has to be written as two backslashes:
\\[, \\\\, \\., \\( ... etc
You can implement this into your existing code, or you could make your life a little easier by using a pattern matcher.
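Pattern p = Pattern.compile("\\D+?(-?\\d++)\\D+?(-?\\d++)\\D*");
Matcher m = p.matcher(STRING);
// group() throws IllegalStateException unless a match has been attempted first
if (m.matches()) {
String results = "location: " + m.group(1) + " " + m.group(2);
}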
\\D+? consumes non-digit characters reluctantly, so a '-' directly in front of the digits is spared for the capture group.
(-?\\d++) captures an optional '-' followed, possessively, by as many digits in a row as it can find; the captured text is what m.group(n) returns. Since the '-' was spared earlier, it is part of this capture whenever it is present.
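Here is a minimal, self-contained sketch of the matcher approach. The pattern is the one from this answer; the class name, the method name and the sample input line are assumptions for illustration. Note that the pattern needs at least one non-digit character in front of the first number.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PositionParser {
    // Two optionally negative integers, separated by non-digit text
    private static final Pattern TWO_NUMBERS =
            Pattern.compile("\\D+?(-?\\d++)\\D+?(-?\\d++)\\D*");

    static Optional<String> parse(String line) {
        Matcher m = TWO_NUMBERS.matcher(line);
        // matches() must succeed before group(n) may be called
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(m.group(1) + " " + m.group(2));
    }

    public static void main(String[] args) {
        // Hypothetical input line; prints: location: "290 -70"
        parse("position = (290 -70);")
                .ifPresent(s -> System.out.println("location: \"" + s + "\""));
    }
}
```

Returning an Optional is only one way to handle lines that do not contain two numbers; skipping such lines in the loop would work as well.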

Related

How to remove single line transitions from a string, but leave one for double ones?

I read the contents of a file into a string.
String testing = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8).replaceAll("\\r\\n{1}", "");
What regular expression, or something similar, can be used to remove single line breaks but leave one for double ones?
That is for example:
the elements in the file go like this:
A
1
b
;
d
When I do:
String testing = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8).replaceAll("\\r\\n", "");
List<Character> alphabetList = testing.chars().mapToObj(i -> (char)i).collect(Collectors.toList());
I get that the list contains such elements:
A
\\r
\\n
\\r
\\n
1
\\r
\\n
b
And I need the single ones to be removed, and the double ones replaced with " ".
Try this.
public static void main(String[] args) {
String str =
"afaf\r\n"
+ "\r\n"
+ "af1\r\n"
+ "\r\n"
+ "\r\n"
+ "23131";
System.out.println(str.replaceAll("\\r\\n((\\r\\n)+)", "$1"));
}
output (each run of line breaks is shortened by one, so a blank line remains before 23131):
afaf
af1

23131
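If the goal is read literally (drop a single line break entirely, keep exactly one line break for a run of two or more), one possible sketch is a replacement function that looks at the length of each run. This is an alternative to the regex above, not the answer's method; it assumes Java 9 or later for `Matcher.replaceAll(Function)` and Windows-style `\r\n` line breaks, and the class and method names are made up.

```java
import java.util.regex.Pattern;

final class LineBreaks {
    // One or more consecutive CRLF pairs
    private static final Pattern BREAK_RUN = Pattern.compile("(?:\\r\\n)+");

    // A run consisting of one CRLF is dropped; a longer run becomes one CRLF.
    static String collapse(String text) {
        return BREAK_RUN.matcher(text)
                .replaceAll(run -> run.group().length() == 2 ? "" : "\r\n");
    }

    public static void main(String[] args) {
        String result = collapse("A\r\n1\r\n\r\nb\r\n;");
        // Show the remaining line breaks visibly instead of printing them raw
        System.out.println(result.replace("\r\n", "<CRLF>")); // A1<CRLF>b;
    }
}
```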

Replace different combinations of whitespaces, tabs and carriage returns with a single white space

I would like to replace different combinations of whitespace, tabs and carriage returns with a single white space.
So far I have a solution that works:
String stringValue="";
stringValue = stringValue.replaceAll(";", ",");
stringValue = stringValue.replaceAll("\\\\n+", " ");
stringValue = stringValue.replaceAll("\\\\r+", " ");
stringValue = stringValue.replaceAll("\\\\t+", " ");
stringValue = stringValue.replaceAll(" +", " ");
Input: test\n\t\r123 ;123
Output:test123,123
Is there a prettier solution to this?
The \s class matches whitespace characters. Thus:
stringValue = stringValue.replaceAll("\\s+", " ");
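A small sketch of what `\\s+` does and does not cover (the class and method names are made up for illustration): real whitespace characters are collapsed, while the two-character text of a backslash followed by `n`, `r` or `t` is ordinary text to `\s` and stays untouched.

```java
final class WhitespaceNormalizer {
    // Collapses every run of whitespace characters into a single space
    static String collapseWhitespace(String s) {
        return s.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        // Real newline, tab, carriage return and spaces
        System.out.println(collapseWhitespace("test\n\t\r123  ;123")); // test 123 ;123
        // Literal backslash-letter text: only the run of spaces collapses
        System.out.println(collapseWhitespace("test\\n\\t\\r123  ;123")); // test\n\t\r123 ;123
    }
}
```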
To substitute whitespace escape strings per the question, the four regexes can be combined as follows:
"(?:\\\\[nrt])+| +"

regex seems to be off for special characters (e.g. +-.,!##$%^&*;)

I am using regex to print out a string, adding a new line after a character limit. I don't want to split a word when it hits the limit (the word should start on the next line), unless a single run of characters is longer than the limit, in which case I just continue the rest of it on the next line. However, when I hit special characters (e.g. +-.,!##$%^&*;), as you'll see when I test my code below, it adds an additional character to the limit for some reason. Why is this?
My function is:
public static String limiter(String str, int lim) {
str = str.trim().replaceAll(" +", " ");
str = str.replaceAll("\n +", "\n");
Matcher mtr = Pattern.compile("(.{1," + lim + "}(\\W|$))|(.{0," + lim + "})").matcher(str);
String newStr = "";
int ctr = 0;
while (mtr.find()) {
if (ctr == 0) {
newStr += (mtr.group());
ctr++;
} else {
newStr += ("\n") + (mtr.group());
}
}
return newStr ;
}
So my input is:
String str = " The 123456789 456789 +-.,!##$%^&*();\\/|<>\"\' fox jumpeded over the uf\n 2 3456 green fence ";
With a character line limit of 7.
It outputs:
456789 +
-.,!##$%
^&*();\/
|<>"
When the correct output should be:
456789
+-.,!##
$%^&*()
;\/|<>"
My code is linked to an online compiler you can run here:
https://ideone.com/9gckP1
You need to replace the (\W|$) with \b as your intention is to match whole words (and \b provides this functionality). Also, since you do not need trailing whitespace on newly created lines, you need to also use \s*.
So, use
Matcher mtr = Pattern.compile("(?U)(.{1," + lim + "}\\b\\s*)|(.{0," + lim + "})").matcher(str);
See demo
Note that (?U) is used here to "fix" the word boundary behavior to keep it in sync with \w (so that diacritics were not considered word characters).
In your pattern, \\W is part of the first capturing group. It is adding this one (non-word) character to the .{1,limit} pattern.
Try with: "(.{1," + lim + "})(\\W|$)|(.{0," + lim + "})"
(I can't currently use your regex online compiler)
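The extra character described in these answers can be reproduced on a short piece of the input. This is only a sketch with a made-up helper, `firstMatch`: the first pattern is the first alternative of the question's regex, the second is the first alternative of the `\b`-based fix.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LimitDemo {
    // Returns the first match of the regex in the input, or null if there is none
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "456789 +-.,";
        int lim = 7;
        // \W consumes one character on top of the 7 matched by the dot: 8 in total
        System.out.println("[" + firstMatch(".{1," + lim + "}(\\W|$)", input) + "]"); // [456789 +]
        // \b is zero-width, so nothing beyond the limit is consumed except trailing whitespace
        System.out.println("[" + firstMatch("(?U).{1," + lim + "}\\b\\s*", input) + "]"); // [456789 ]
    }
}
```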

What does regex "\\p{Z}" mean?

I am working with some code in Java that has a statement like
String tempAttribute = ((String) attributes.get(i)).replaceAll("\\p{Z}","")
I am not used to regex, so what is the meaning of it? (If you could provide a website to learn the basics of regex that would be wonderful) I've seen that for a string like
ept as y it gets transformed into eptasy, but this doesn't seem right. I believe the guy who wrote this wanted to trim leading and trailing spaces maybe.
It removes all the whitespace (replaces all whitespace matches with empty strings).
A wonderful regex tutorial is available at regular-expressions.info.
A citation from this site:
\p{Z} or \p{Separator}: any kind of whitespace or invisible separator.
The OP stated that the code fragment was in Java. To comment on the statement:
\p{Z} or \p{Separator}: any kind of whitespace or invisible separator.
the sample code below shows that this does not apply in Java.
public static void main(String[] args) {
// some normal white space characters
String str = "word1 \t \n \f \r " + '\u000B' + " word2";
// various regex patterns meant to remove ALL white spaces
String s = str.replaceAll("\\s", "");
String p = str.replaceAll("\\p{Space}", "");
String b = str.replaceAll("\\p{Blank}", "");
String z = str.replaceAll("\\p{Z}", "");
// \\s removed all white spaces
System.out.println("s [" + s + "]\n");
// \\p{Space} removed all white spaces
System.out.println("p [" + p + "]\n");
// \\p{Blank} removed only \t and spaces not \n\f\r
System.out.println("b [" + b + "]\n");
// \\p{Z} removed only spaces not \t\n\f\r
System.out.println("z [" + z + "]\n");
// NOTE: \p{Separator} throws a PatternSyntaxException
try {
String t = str.replaceAll("\\p{Separator}","");
System.out.println("t [" + t + "]\n"); // N/A
} catch ( Exception e ) {
System.out.println("throws " + e.getClass().getName() +
" with message\n" + e.getMessage());
}
} // public static void main
The output for this is:
s [word1word2]
p [word1word2]
b [word1
word2]
z [word1
word2]
throws java.util.regex.PatternSyntaxException with message
Unknown character property name {Separator} near index 12
\p{Separator}
^
This shows that in Java \\p{Z} removes only spaces and not "any kind of whitespace or invisible separator".
These results also show that in Java \\p{Separator} throws a PatternSyntaxException.
First of all, \p means you are going to match a class, that is, a collection of characters rather than a single one. For reference, this is the Javadoc of the Pattern class: https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
Unicode scripts, blocks, categories and binary properties are written with the \p and \P constructs as in Perl. \p{prop} matches if the input has the property prop, while \P{prop} does not match if the input has that property.
And then Z is the name of a class (collection, set) of characters. In this case, it is the abbreviation of Separator. Separator contains 3 subclasses: Space_Separator (Zs), Line_Separator (Zl) and Paragraph_Separator (Zp).
To see which characters those classes contain, refer to the Unicode Character Database or
Unicode Character Categories
More document: http://www.unicode.org/reports/tr18/#General_Category_Property
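A short sketch of the difference between `\p{Z}` and `\s` in Java, using one character from each side. The class and method names are made up; the behaviour of `\s` shown here is the default one, without the UNICODE_CHARACTER_CLASS flag.

```java
final class SeparatorDemo {
    static String stripZ(String s) { return s.replaceAll("\\p{Z}", ""); }
    static String stripS(String s) { return s.replaceAll("\\s", ""); }

    public static void main(String[] args) {
        String nbsp = "a\u00A0b"; // U+00A0 NO-BREAK SPACE, general category Zs
        String tab = "a\tb";      // U+0009 is a control character (Cc), not a separator
        System.out.println(stripZ(nbsp).length()); // 2: removed by \p{Z}
        System.out.println(stripS(nbsp).length()); // 3: kept by \s
        System.out.println(stripZ(tab).length());  // 3: kept by \p{Z}
        System.out.println(stripS(tab).length());  // 2: removed by \s
    }
}
```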

String.split() at a meta character +

I'm making a simple program that will deal with equations from a String input of the equation
When I run it, however, I get an exception because I am trying to replace "+" with " +" so I can split the string at the spaces. How should I go about using
the String replaceAll method to replace these special characters? Below is my code
Exception in thread "main" java.util.regex.PatternSyntaxException: Dangling meta character '+' near index 0
+
^
public static void parse(String x){
String z = "x^2+2=2x-1";
String[] lrside = z.split("=",4);
System.out.println("Left side: " + lrside[0] + " / Right Side: " + lrside[1]);
String rightside = lrside[0];
String leftside = lrside[1];
rightside.replaceAll("-", " -");
rightside.replaceAll("+", " +");
leftside.replaceAll("-", " -"); leftside.replaceAll("+", " +");
List<String> rightt = Arrays.asList(rightside.split(" "));
List<String> leftt = Arrays.asList(leftside.split(" "));
System.out.println(leftt);
System.out.println(rightt);
replaceAll accepts a regular expression as its first argument.
+ is a special character which denotes a quantifier meaning one or more occurrences. Therefore it should be escaped to specify the literal character +:
rightside = rightside.replaceAll("\\+", " +");
(Strings are immutable, so it is necessary to assign the result of replaceAll back to the variable.)
An alternative to this is to use a character class which removes the metacharacter status:
rightside = rightside.replaceAll("[+]", " +");
The simplest solution though would be to use the replace method which uses non-regex String literals:
rightside = rightside.replace("+", " +");
I had a similar problem with regex = "?". It happens for all special characters that have some meaning in a regex. So you need to put "\\" in front of the special character in your regex.
rightside = rightside.replaceAll("\\+", " +");
String#replaceAll expects a regex as input, and + on its own is not a valid pattern; \\+ is. rightside = rightside.replaceAll("\\+", " +");
The reason behind this is that some characters are reserved in regex. So when you split on them using the Java split() method, you have to escape them.
For example, if you want to split by + or * or dot (.), then you have to write split("\\+") or split("\\*") or split("\\.") according to your need.
The reason behind my long explanation on regex is -
YOU MAY FACE IT in OTHER PLACES TOO.
For example, the same issue will occur if you use the replaceAll or replaceFirst methods of Java, because they also work based on regex (replace, by contrast, takes literal text).
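Putting the answers together, here is a sketch of how the two sides of the equation could be split into signed terms. The class and method names are made up for illustration, and `Pattern.quote` is shown as one more way to treat arbitrary text as a literal pattern.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class EquationSplitter {
    // Puts a space before every '+' and '-' and splits there, keeping the signs.
    // String.replace takes literal text, so '+' needs no escaping here.
    static List<String> terms(String side) {
        String spaced = side.replace("+", " +").replace("-", " -").trim();
        return Arrays.asList(spaced.split(" "));
    }

    public static void main(String[] args) {
        String[] sides = "x^2+2=2x-1".split("=");
        System.out.println(terms(sides[0])); // [x^2, +2]
        System.out.println(terms(sides[1])); // [2x, -1]
        // Pattern.quote turns any text into a literal pattern for split or replaceAll
        System.out.println(Arrays.toString("a+b".split(Pattern.quote("+")))); // [a, b]
    }
}
```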
