Removing backslashes in the numbers sequence - java

What regular expression extracts the digit sequence from an input string that also contains backslashes and other non-digit characters? For example, from:
"12\34a56ss7890"
I need to get:
1234567890

Assuming you have this in a String, you could do something like:
string = string.replaceAll("\\D", "");
This removes every non-digit character from your String.
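For completeness, here is a minimal runnable sketch of the call above. The class and method names (DigitsOnly, digitsOnly) are made up for illustration, and it assumes the input really contains a backslash character, which is written as \\ in a Java literal:

```java
class DigitsOnly {
    // Removes every character that is not a digit (\D in regex syntax).
    static String digitsOnly(String input) {
        return input.replaceAll("\\D", "");
    }

    public static void main(String[] args) {
        // "\\" in source is one real backslash in the String.
        String input = "12\\34a56ss7890";
        System.out.println(digitsOnly(input)); // prints 1234567890
    }
}
```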

str.replaceAll("[^\\d]", "");
Footnote: I'm not a Java developer, but the regex itself should be correct. (In a Java string literal the backslash has to be doubled, hence \\d.)

Sorry for adding another answer, but this won't fit in a comment.
I think this is because of the \34. If I call System.out.print("12\34a56ss7890"); I get the output 12a56ss7890. This is because the Java compiler treats \34 inside a string literal as an escape sequence, so it becomes a single non-printable character instead of a backslash followed by 34. You can fix this by first calling this method on your InputStream:
private InputStreamReader replaceBackSlashes() throws Exception {
    FileInputStream fis = new FileInputStream(new File("PATH TO A FILE"));
    Scanner in = new Scanner(fis, "UTF-8");
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while (in.hasNextLine()) {
        // "\\" is the Java literal for a single backslash
        String nextLine = in.nextLine().replace("\\", "");
        out.write(nextLine.getBytes("UTF-8"));
        out.write("\n".getBytes("UTF-8"));
    }
    in.close();
    return new InputStreamReader(new ByteArrayInputStream(out.toByteArray()), "UTF-8");
}
BTW: Sorry for my edit, but there was a small mistake in the code.
After calling this method, convert the stream to a String and then call this on the String:
string = string.replaceAll("\\D", "");
This should hopefully work now :)

String num;
String str =" 12\34a56ss7890";
str= str.replace("\34", "34");
String regex = "[\\d]+";
Matcher matcher = Pattern.compile( regex ).matcher( str);
while (matcher.find( ))
{
num = matcher.group();
System.out.print(num);
}
Replace \34 by 34 and match the rest using a regular expression.

Use a regular expression.
String num;
String str =" 12\34a56ss7890";
str= str.replace("\34", "34");
String regex = "[\\d]+";//match only digits.
Matcher matcher = Pattern.compile( regex ).matcher( str);
while (matcher.find( ))
{
num = matcher.group();
System.out.print(num);
}

The following example:
String a ="1\2sas";
String b ="1\\2sas";
System.out.println(a.replaceAll("[a-zA-Z\\\\]",""));
System.out.println(b.replaceAll("[a-zA-Z\\\\]",""));
gives output:
1X
12
where X is not an X but a little rectangle: the symbol shown when the text control does not know how to draw a character, a so-called non-printable character.
This is because in String a the "\2" part is interpreted as a single escape sequence, the character "\u0002", similar to "\n" or "\t". You can see this in a debugger (I tried it in NetBeans).
Since the first argument of the replaceAll method is passed to [Pattern.compile](http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#replaceAll(java.lang.String, java.lang.String)), a backslash in it needs to be escaped twice (once for the string literal and once for the regex), as opposed to a plain String literal like b.
So if the String looks like "12\34a56ss7890" on screen, it must have been written in the source like this:
System.out.println("12\\34a56ss7890");
which is the case handled by the second example (String b).
However, if the literal is given as "12\34a56ss7890", then I think you can't handle it with a single regexp, because a backslash followed by a digit gets interpreted as an octal escape, i.e. a control character such as \u0000 - \u0007, so the best I can think of is a very ugly solution:
str.replaceAll("\u0000","0").replaceAll("\u0001","1") ... .replaceAll("\u0009","9").replaceAll("[^\\d]", "")
The first ten replacements (\u0000-\u0009) might be rewritten as a for loop to make it look more elegant.
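The loop mentioned above could look roughly like this. It is only a sketch: the names are hypothetical, it stops at 7 because Java's octal escapes only use the digits 0-7, and it assumes the input contains no genuine control characters in that range (a real NUL or bell character would be turned into a digit too). It does not handle multi-digit escapes such as \34.

```java
class ControlCharDigits {
    static String recoverSingleDigits(String s) {
        // Map the control characters 0..7 back to the digits '0'..'7'.
        for (int i = 0; i <= 7; i++) {
            s = s.replace((char) i, (char) ('0' + i));
        }
        // Then drop everything that is still not a digit.
        return s.replaceAll("[^\\d]", "");
    }

    public static void main(String[] args) {
        String a = "1\2sas"; // the compiler turns \2 into the character with code 2
        System.out.println(recoverSingleDigits(a)); // prints 12
    }
}
```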
+1 for an EXCELLENT question :)
EDIT:
Actually, if a backslash is followed by more than one octal digit, they are all interpreted as a single character: up to three digits after the backslash are consumed (three only when the first digit is 0-3), and any digit after that is treated as an ordinary digit.
Therefore, my solution is not generally correct, but could be extended to be. I would recommend Robin's solution below as it is far more efficient.

The sequence \34 in the string literal "12\34a56ss7890" is an octal escape for a single character, so you could use:
str.replaceAll("\034", "34").replaceAll("\\D", "")
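To see what the compiler actually does with the literal, a small check like the following may help. The class and method names are just for illustration; the numbers in the comments follow from \34 being octal for decimal 28. Note that this only works if the character with code 28 cannot appear in the input for any other reason.

```java
class OctalEscapeDemo {
    // Puts back the digits that were folded into the octal escape, then strips non-digits.
    static String recoverDigits(String s) {
        return s.replace("\034", "34").replaceAll("\\D", "");
    }

    public static void main(String[] args) {
        String literal = "12\34a56ss7890";
        System.out.println(literal.length());               // prints 12
        System.out.println((int) literal.charAt(2));        // prints 28
        System.out.println(literal.replaceAll("\\D", ""));  // prints 12567890
        System.out.println(recoverDigits(literal));         // prints 1234567890
    }
}
```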

Related

Java - Remove only the first backslash

Small Java question regarding how to remove only the first backslash please.
I have a string which looks like this:
String s = "\\u6df1\\u5733";
Please note, there are two backslashes, and multiple occurrences.
Hence, when this is displayed, the visual result is:
\深\圳
I would like to just remove any extra backslashes, having a result like this:
深圳
So far, I have tried this:
String s = "\\u6df1\\u5733";
String ss = s.replaceAll("\\", "");
But it is still not working.
What is the correct solution please in order to get 深圳 from "\\u6df1\\u5733" please?
Thank you
Try this.
String s = "\\u6df1\\u5733";
Pattern UNICODE_ESCAPE = Pattern.compile("\\\\u[0-9a-f]+", Pattern.CASE_INSENSITIVE);
String ss = UNICODE_ESCAPE.matcher(s).results()
.map(x -> new String(Character.toChars(Integer.parseInt(x.group().substring(2), 16))))
.collect(Collectors.joining());
System.out.println(ss);
UNICODE_ESCAPE.matcher(s).results() returns a stream of MatchResult (available since Java 9).
x.group().substring(2) extracts hexadecimal part "xxxx" from "\\uxxxx".
Integer.parseInt(..., 16) converts it to an integer value that is a code point.
Character.toChars() converts it to an array of char.
new String(...) converts it to a String, and .collect(Collectors.joining()) concatenates all of them.
output:
深圳
Going by this output:
\深\圳
you actually have two unicode characters each preceded by one backslash.
In a Java string literal, that would look like this:
String s = "\\\u6df1\\\u5733";
If you want to remove the backslashes (\\) and leave the unicode character codes (e.g. \u6df1), then you just need replace.
String ss = s.replace("\\", "");
replaceAll won't work for this as written, because it treats its first argument as a regular expression, and "\\" (a single backslash) is not a valid one.
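A small sketch of the difference, using the literal from this answer (class and method names are invented for the example). The regex equivalent of a single literal backslash needs four backslashes in Java source:

```java
import java.util.regex.PatternSyntaxException;

class BackslashRemoval {
    // Literal replacement: "\\" is one backslash, no regex involved.
    static String strip(String s) {
        return s.replace("\\", "");
    }

    // Regex replacement: the two-backslash pattern (written "\\\\") matches one backslash.
    static String stripWithRegex(String s) {
        return s.replaceAll("\\\\", "");
    }

    // Returns true if a lone backslash is rejected as a regex.
    static boolean loneBackslashIsRejected() {
        try {
            "a\\b".replaceAll("\\", "");
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String s = "\\\u6df1\\\u5733"; // backslash, U+6DF1, backslash, U+5733
        System.out.println(strip(s).equals(stripWithRegex(s))); // prints true
        System.out.println(strip(s).length());                  // prints 2
        System.out.println(loneBackslashIsRejected());          // prints true
    }
}
```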

How to clean a file, replacing unwanted seperators, operators, string literals

I'm working on a concordance problem where I must: "Clean the file. For this, remove all string literals (anything enclosed
in double quotes, the second of which is not preceded by an odd number
of backslashes), remove all // comments, remove all separator characters
(look these up), and operators (look these up). Do not worry about ".class literals" (we will assume they will not appear in the input file)."
I think I know how the replaceAll() method works, but I don't know what's going to be in the file. For starters, how would I go about removing all string literals? Is there a way to replace everything within two double quotes? I.E. String someString = "I want to remove this from a file plz help me, thx";
I've currently put each line of text within an ArrayList of Strings.
Here's what I've got: http://pastebin.com/N84QdLqz
I think I've come up with a solution for your string literal regex. Something like:
inputLine.replaceAll("\"([^\\\\\"]*(\\\\\")*)*([\\\\]{2})*(\\\\\")*[^\"]*\"", "");
should do the trick. The regex is actually significantly more readable if you print it to the console, after Java has processed the string-literal escapes. So if you call System.out.println() with that String, you'll get:
"([^\\"]*(\\")*)*([\\]{2})*(\\")*[^"]*"
I'll break down the original regex to explain. First there's:
"\"([^\\\\\"]*(\\\\\")*)*
This says to match a quote character (") followed by 0 or more patterns of characters that are neither backslashes (\) nor quote characters (") which are followed by 0 or more escaped quotes (\"). As you can see, since \ is typically used as an escape character in Java, any regexes using them become pretty verbose.
([\\\\]{2})*
This says to next match 0 or more sets of 2 (i.e. even-numbered amounts) of backslashes.
(\\\\\")*
This says to match a single backslash followed by a quote character, and to find 0 or more of those together.
[^\"]*\"
This says to match anything that is not a quote character, 0 or more times, followed by a quote character.
I tested my regex with an example similar to what you were asking for:
string literals (anything enclosed in double quotes, the second of which is not preceded by an odd number of backslashes)
Emphasis mine. So by this statement, if the first quote in a literal has a backslash in front of it, it doesn't matter.
String s = "This is "a test\" + "So is this"
Applying the regex with replaceAll and a replacement of \"\", you'll get:
String s = ""a test\""So is this"
which should be correct. You can completely remove the matching literal's quotes, if you want, by calling replaceAll with a replacement of "":
String s = a test\So is this"
Alternately, using this regex on something much less contrived to cause headaches:
String s = "This is \"a test\\" + "So is this"
will return:
String s = +
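As an alternative that is not from the answer above, a commonly used shorter pattern treats a string literal as an opening quote, then any run of either non-quote, non-backslash characters or backslash-escaped characters, then a closing quote. The sketch below assumes literals do not span lines and ignores char literals such as '"'; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class StringLiteralStripper {
    // Regex: "(?:[^"\\]|\\.)*"
    private static final Pattern STRING_LITERAL =
            Pattern.compile("\"(?:[^\"\\\\]|\\\\.)*\"");

    static String removeStringLiterals(String line) {
        return STRING_LITERAL.matcher(line).replaceAll("");
    }

    public static void main(String[] args) {
        // The line being cleaned is: String s = "This is \"a test\\" + "So is this";
        String line = "String s = \"This is \\\"a test\\\\\" + \"So is this\";";
        System.out.println(removeStringLiterals(line)); // prints: String s =  + ;
    }
}
```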
You can do something like this:
private static final String REGEX = "(\"[\\w|\\s]*\")";
private static Pattern P;
private static Matcher M;
public static void main(String args[]){
P = Pattern.compile(REGEX);
//.... your code here ....
}
public static ArrayList<String> readStringsFromFile(String fileName) throws FileNotFoundException
{
Scanner scanner = null;
scanner = new Scanner(new File(fileName));
ArrayList<String> list = new ArrayList<>();
String str = new String();
try
{
while(scanner.hasNext())
{
str = scanner.nextLine();
str = cleanLine(str);//clean the line after read
list.add(str);
}
}
catch (InputMismatchException ex)
{
}
return list;
}
public static String cleanLine(String line) {
int index;
//remove comment lines
index = line.indexOf("//");
if (index != -1) {
line = line.substring(0, index);
}
//remove everything within two double quotes
M = P.matcher(line);
String tmp = "";
while(M.find()) {
tmp = line.substring(0,M.start());
tmp += line.substring(M.end());
line = tmp;
M = P.matcher(line);
}
return line;
}

Java how can remove everything between two substring in a string

I want to remove any substring(s) in a string that begins with 'galery' and ends with 'jssdk));'
For instance, consider the following string:
Galery something something.... jssdk));
I need an algorithm that removes 'something something....' and returns 'Galery jssdk));'
This is what I've done, but it does not work.
newsValues[1].replaceAll("Galery.*?jssdK));", "");
Could probably be improved, I've done it fast:
public static String replaceMatching(String input, String lowerBound, String upperBound){
Pattern p = Pattern.compile(".*?"+lowerBound+"(.*?)"+upperBound+".*?");
Matcher m = p.matcher(input);
String textToRemove = "";
while(m.find()){
textToRemove = m.group(1);
}
return input.replace(textToRemove, "");
}
UPDATE Thx for accepting the answer, but here is a smaller reviewed version:
public static String replaceMatching2(String input, String lowerBound, String upperBound){
String result = input.replaceAll("(.*?"+lowerBound + ")" + "(.*?)" + "(" + upperBound + ".*)", "$1$3");
return result;
}
The idea is pretty simple: split the String into 3 groups, and replace the whole match with the first and third groups, dropping the second one.
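One caveat with building the regex by concatenation: bounds such as jssdK)); contain regex metacharacters. A hedged variant of the same idea, with Pattern.quote around the bounds and Matcher.quoteReplacement around the replacement, might look like this (removeBetween and the class name are made up for the example):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BetweenRemover {
    // Removes whatever sits between each lowerBound ... upperBound pair, keeping the bounds.
    static String removeBetween(String input, String lowerBound, String upperBound) {
        // Pattern.quote makes the bounds match literally, even if they contain ')' or '.'.
        String regex = Pattern.quote(lowerBound) + ".*?" + Pattern.quote(upperBound);
        // quoteReplacement protects '$' and '\' in the bounds from being interpreted.
        String replacement = Matcher.quoteReplacement(lowerBound + upperBound);
        return Pattern.compile(regex, Pattern.DOTALL).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String s = "GaleryABCDEFGjssdK));";
        System.out.println(removeBetween(s, "Galery", "jssdK));")); // prints GaleryjssdK));
    }
}
```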
You are almost there, but that will remove the entire string. If you want to remove anything between Galery and jssdK));, you will have to do something like so:
String newStr = newsValues[1].replaceAll("(Galery)(.*?)(jssdK\\)\\);)","$1$3");
This will put the strings into groups and will then use these groups to replace the entire string. Note that in regex syntax, the ) is a special character so it needs to be escaped.
String str = "GaleryABCDEFGjssdK));";
String newStr = str.replaceAll("(Galery)(.*?)(jssdK\\)\\);)","$1$3");
System.out.println(newStr);
This yields: GaleryjssdK));
I know that the solution presented by @amit is simpler; however, I thought it would be a good idea to show you a useful way in which you can use the replaceAll method.
The simplest solution is to replace the match with just the "edges", effectively "removing" everything between them (see the note below).
newsValues[1] = newsValues[1].replaceAll("Galery.*?jssdK\\)\\);", "GaleryjssdK));");
Note: I put "removing" in quotes because it is not exactly removing anything. Remember that strings are immutable, so this creates a new object without the "removed" part.
newsValues[1] = newsValues[1].substring(0, 6) + newsValues[1].substring(newsValues[1].length() - 5, newsValues[1].length());
This basically concatenates the "Galery" and the "jssdk", ignoring everything else. More importantly, you can simply assign newsValues[1] = "Galeryjssdk"

Regular Expression - Java

For the string value "ABCD_12" (including quotes), I would like to extract only the content and exclude the double quotes, i.e. ABCD_12. My code is:
private static void checkRegex()
{
final Pattern stringPattern = Pattern.compile("\"([a-zA-Z_0-9])+\"");
Matcher findMatches = stringPattern.matcher("\"ABC_12\"");
if (findMatches.matches())
System.out.println("Match found" + findMatches.group(0));
}
Now I have tried doing findMatches.group(1);, but that only returns the last character in the string (I did not understand why !).
How can I extract only the content leaving out the double quotes?
Try this regex:
Pattern.compile("\"([a-zA-Z_0-9]+)\"");
OR
Pattern.compile("\"([^\"]+)\"");
The problem in your code is a misplaced +, which sits outside the right parenthesis. That makes the capturing group capture only one character at a time (the + repeats the group), which is why you end up with only the last character.
A nice simple (read: non-regex) way to do this is:
String myString = "\"ABC_12\"";
String myFilteredString = myString.replace("\"", "");
System.out.println(myFilteredString);
gets you
ABC_12
You should change your pattern to this:
final Pattern stringPattern = Pattern.compile("\"([a-zA-Z_0-9]+)\"");
Note that the + sign was moved inside the group, since you want the character repetition to be part of the group. In the code you posted, what you were actually searching for was a repetition of the group, which consisted of a single occurrence of a single character in [a-zA-Z_0-9].
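The difference is easy to check. In this sketch (names are illustrative only), the first pattern repeats a one-character group, so group(1) keeps only the last repetition; the second repeats inside the group:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupPlacement {
    // Returns group(1) if the whole input matches the regex, otherwise null.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "\"ABC_12\"";
        System.out.println(firstGroup("\"([a-zA-Z_0-9])+\"", input)); // prints 2
        System.out.println(firstGroup("\"([a-zA-Z_0-9]+)\"", input)); // prints ABC_12
    }
}
```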
If your pattern is strictly any text in between double quotes, then you may be better off using substring:
String str = "\"ABC_12\"";
System.out.println(str.substring(1, str.lastIndexOf('\"')));
Assuming it is a bit more complex (double quotes inside a larger string), you can use the split() method of the Pattern class with \" as your regex. This splits the string around each double quote so you can easily extract the content you want:
Pattern p = Pattern.compile("\"");
// Split input with the pattern
String[] result = p.split(str);
for (int i = 0; i < result.length; i++) {
    System.out.println(result[i]);
}
http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html#split%28java.lang.CharSequence%29

How to find and replace a substring?

For example, I have a string in which I must find and replace multiple substrings, all of which start with #, contain 6 symbols, end with ' and should not contain ) ... what do you think would be the best way of achieving that?
Thanks!
Edit:
Just one more thing I forgot: to make the replacement, I need that substring, i.e. it gets replaced by a string generated from the substring being replaced.
yourNewText=yourOldText.replaceAll("#[^)]{6}'", "");
Or programmatically:
Matcher matcher = Pattern.compile("#[^)]{6}'").matcher(yourOldText);
StringBuffer sb = new StringBuffer();
while (matcher.find()) {
    // implement your custom logic here; matcher.group() is the matched String
    matcher.appendReplacement(sb, someReplacement(matcher.group()));
}
matcher.appendTail(sb);
String yourNewString = sb.toString();
Assuming you just know the substrings are formatted like you explained above, but not exactly which 6 characters, try the following:
String result = input.replaceAll("#[^\\)]{6}'", "replacement"); //pattern to replace is #+6 characters not being ) + '
You must use replaceAll with the right regular expression:
myString.replaceAll("#[^)]{6}'", "something")
If you need to replace with an extract of the matched string, use a capturing group, like this:
myString.replaceAll("#([^)]{6})'", "blah $1 blah")
The $1 in the second String refers to the first parenthesized group in the first String.
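If the replacement has to be computed in code from the captured part, an appendReplacement loop can be wrapped up as below. This is a sketch under assumptions: the upper-casing in brackets is a placeholder for whatever transformation is really needed, and the class and method names are invented.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashTokenReplacer {
    // '#', then six characters that are not ')', then a single quote.
    private static final Pattern TOKEN = Pattern.compile("#([^)]{6})'");

    static String replaceTokens(String text) {
        Matcher m = TOKEN.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Placeholder logic: derive the replacement from the captured six characters.
            String replacement = "[" + m.group(1).toUpperCase() + "]";
            // quoteReplacement keeps any '$' or backslash in the result from being interpreted.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceTokens("x #abcdef' y #12345)' z"));
        // prints: x [ABCDEF] y #12345)' z
    }
}
```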
this might not be the best way to do it but...
youstring = youstring.replace("#something'", "new stringx");
youstring = youstring.replace("#something2'", "new stringy");
youstring = youstring.replace("#something3'", "new stringz");
//edited after reading comments, thanks
