Say if I have the following code
String sum = "(5+5)/2*6";
char[] bodmasChars = {'+','-','*','/','.','(',')'};
Is there a way to check whether the string contains any of the elements in my char[]?
A regex
String sum = "(5+5)/2*6";
if (sum.matches("(?s).*[-\\+\\*/\\.()].*")) { ...
(?s) lets . also match newlines.
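[...] is a character class: a set of possible characters or character ranges. The backslashes are redundant here: inside a character class +, * and . are already literal, so [-+*/.()] works too.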
You can do it with a regex, or simply with contains:
String s = "(5+5)/2*6";
if (s.contains("+") || s.contains("-") || s.contains("*") /* || ... and so on for the remaining characters */) {
    System.out.println("yes");
}
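For the char[] from the question, a plain loop over the array also works without a regex. A minimal sketch; the class and method names (ContainsAnyDemo, containsAny) are made up for illustration:

```java
class ContainsAnyDemo {
    // Returns true if text contains at least one of the given characters.
    static boolean containsAny(String text, char[] chars) {
        for (char c : chars) {
            if (text.indexOf(c) >= 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        char[] bodmasChars = {'+', '-', '*', '/', '.', '(', ')'};
        System.out.println(containsAny("(5+5)/2*6", bodmasChars)); // true
        System.out.println(containsAny("12345", bodmasChars));     // false
    }
}
```

Unlike the regex version, this needs no escaping, because indexOf(char) treats every character literally.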
It's basically about getting the string value between two characters. SO has many questions related to this, like:
How to get a part of a string in java?
How to get a string between two characters?
Extract string between two strings in java
and more.
But I found it quite confusing when dealing with multiple dots in the string and getting the value between two particular dots.
I have got the package name as :
au.com.newline.myact
I need to get the value between "com." and the next "dot(.)". In this case "newline". I tried
Pattern pattern = Pattern.compile("com.(.*).");
Matcher matcher = pattern.matcher(beforeTask);
while (matcher.find()) {
int ct = matcher.group();
I also tried substring and indexOf, but couldn't get the intended answer. Because package names in Android vary in the number of dots and characters, I cannot use a fixed index. Please suggest any idea.
As you probably know (based on the .* part of your regex), the dot . is a special character in regular expressions that represents any character (except line separators). So to make a dot match only a literal dot you need to escape it: either place \ before it, or put it inside a character class, [.].
Also, to get only the part matched by the parentheses (.*) you need to select it with the proper group index, which in your case is 1.
So try with
String beforeTask = "au.com.newline.myact";
Pattern pattern = Pattern.compile("com[.](.*)[.]");
Matcher matcher = pattern.matcher(beforeTask);
while (matcher.find()) {
String ct = matcher.group(1);//remember that regex finds Strings, not int
System.out.println(ct);
}
Output: newline
If you want only the single element before the next . you need to change the greedy behaviour of the * quantifier in .* to reluctant by adding ? after it, like
Pattern pattern = Pattern.compile("com[.](.*?)[.]");
// the ? after .* makes the quantifier reluctant
Another approach is to accept only non-dot characters instead of .*. They can be represented by a negated character class: [^.]*
Pattern pattern = Pattern.compile("com[.]([^.]*)[.]");
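The difference between the three patterns only shows up when more dots follow. A small sketch comparing them on a longer package name; the helper firstGroup and the class name are hypothetical, not part of the original answer:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns group 1 of the first match, or null if the regex does not match.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String name = "au.com.newline.myact.modelact";
        // Greedy: .* runs up to the last dot in the input
        System.out.println(firstGroup("com[.](.*)[.]", name));    // newline.myact
        // Reluctant: .*? stops at the first dot it can
        System.out.println(firstGroup("com[.](.*?)[.]", name));   // newline
        // Negated class: [^.]* cannot cross a dot at all
        System.out.println(firstGroup("com[.]([^.]*)[.]", name)); // newline
    }
}
```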
If you don't want to use regex you can simply use indexOf method to locate positions of com. and next . after it. Then you can simply substring what you want.
String beforeTask = "au.com.newline.myact.modelact";
int start = beforeTask.indexOf("com.") + 4; // +4 since we also want to skip 'com.' part
int end = beforeTask.indexOf(".", start); //find next `.` after start index
String result = beforeTask.substring(start, end);
System.out.println(result);
You can use reflection to get the name of any class. For example:
If I have a class Runner in com.some.package I can run
Runner.class.getName() // string is "com.some.package.Runner"
to get the fully qualified name of the class, which contains the package name. (Note that toString() would prefix the name with "class ".)
To get what follows 'com' you can use Runner.class.getName().split("\\.") and then iterate over the returned array with a boolean flag.
All you have to do is split the string on "." and then iterate through the parts until you find one that equals "com". The next string in the array will be what you want.
So your code would look something like:
String[] parts = packageName.split("\\.");
int i = 0;
for (String part : parts) {
    if (part.equals("com")) {
        break;
    }
    ++i;
}
String result = parts[i + 1]; // assumes "com" is present and is not the last part
private String getStringAfterComDot(String packageName) {
    String[] strArr = packageName.split("\\.");
    // stop one short of the end so strArr[i + 1] is always in bounds
    for (int i = 0; i < strArr.length - 1; i++) {
        if (strArr[i].equals("com"))
            return strArr[i + 1];
    }
    return "";
}
I have done heaps of projects involving website scraping, and I usually
just write my own function/utils to get the job done. Regex can be
overkill if you only want to extract a substring from a given string
like the one you have. Below is the function I normally use for this
kind of task.
private String GetValueFromText(String sText, String sBefore, String sAfter)
{
String sRetValue = "";
int nPos = sText.indexOf(sBefore);
if ( nPos > -1 )
{
int nLast = sText.indexOf(sAfter,nPos+sBefore.length()+1);
if ( nLast > -1)
{
sRetValue = sText.substring(nPos+sBefore.length(),nLast);
}
}
return sRetValue;
}
To use it just do the following:
String sValue = GetValueFromText("au.com.newline.myact", ".com.", ".");
How can I check whether a string contains only digits and letters, i.e. is alphanumeric?
Assuming you want to check for ASCII alphanumeric characters, try this:
"^[a-zA-Z0-9]*$". Pass this regex to String.matches(regex); it returns true if the string is alphanumeric (or empty) and false otherwise.
public boolean isAlphaNumeric(String s){
String pattern= "^[a-zA-Z0-9]*$";
return s.matches(pattern);
}
If it will help, read this for more details about regex: http://www.vogella.com/articles/JavaRegularExpressions/article.html
In order to be unicode compatible:
^[\pL\pN]+$
where
\pL stands for any letter
\pN stands for any number
It's 2016 or later and things have progressed. This matches Unicode alphanumeric strings:
^[\\p{IsAlphabetic}\\p{IsDigit}]+$
See the reference (section "Classes for Unicode scripts, blocks, categories and binary properties"). There's also this answer that I found helpful.
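To make the difference between the ASCII-only and the Unicode-aware patterns concrete, here is a small sketch. The class and method names are made up for illustration, and the accented letter is written as the escape \u00e9 so the example does not depend on the source file encoding:

```java
class UnicodeAlnumDemo {
    // ASCII letters and digits only
    static boolean isAsciiAlnum(String s) {
        return s.matches("^[a-zA-Z0-9]+$");
    }

    // Any Unicode alphabetic character or digit
    static boolean isUnicodeAlnum(String s) {
        return s.matches("^[\\p{IsAlphabetic}\\p{IsDigit}]+$");
    }

    public static void main(String[] args) {
        String accented = "caf\u00e9123"; // "cafe123" with an accented e
        System.out.println(isAsciiAlnum(accented));    // false
        System.out.println(isUnicodeAlnum(accented));  // true
        System.out.println(isUnicodeAlnum("abc 123")); // false, contains a space
    }
}
```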
See the documentation of Pattern.
Assuming US-ASCII alphabet (a-z, A-Z), you could use \p{Alnum}.
A regex to check that a line contains only such characters is "^[\\p{Alnum}]*$".
That also matches empty string. To exclude empty string: "^[\\p{Alnum}]+$".
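Use character classes. Note that java.util.regex does not support the POSIX bracket form [[:alnum:]]; the Java spelling is \p{Alnum}:
^\p{Alnum}*$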
Pattern pattern = Pattern.compile("^[a-zA-Z0-9]*$");
Matcher matcher = pattern.matcher("Teststring123");
if(matcher.matches()) {
// yay! alphanumeric!
}
Try [0-9a-zA-Z]+ to allow only letters and digits, with at least one character.
It may need modification, so test it here:
http://www.regexplanet.com/advanced/java/index.html
Pattern pattern = Pattern.compile("^[0-9a-zA-Z]+$");
Matcher matcher = pattern.matcher(phoneNumber);
if (matcher.matches()) {
}
To consider all Unicode letters and digits, Character.isLetterOrDigit can be used. In Java 8, this can be combined with String#codePoints and IntStream#allMatch.
boolean alphanumeric = str.codePoints().allMatch(Character::isLetterOrDigit);
To include [a-zA-Z0-9_], you can use \w.
So myString.matches("\\w*"). (.matches must match the entire string so ^\\w*$ is not needed. .find can match a substring)
https://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html
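The difference between matches (whole string) and find (any substring) can be seen in a short sketch; the class and method names below are hypothetical:

```java
import java.util.regex.Pattern;

class WordCharDemo {
    // True only if every character is a word character (or the string is empty).
    static boolean wholeStringIsWord(String s) {
        return s.matches("\\w*");
    }

    // True if at least one word character occurs anywhere in the string.
    static boolean containsWordChars(String s) {
        return Pattern.compile("\\w+").matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeStringIsWord("abc_123")); // true
        System.out.println(wholeStringIsWord("abc-123")); // false, '-' is not a word character
        System.out.println(containsWordChars("abc-123")); // true, "abc" is found
    }
}
```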
If you want to include foreign language letters as well, you can try:
String string = "hippopotamus";
if (string.matches("^[\\p{L}0-9']+$")){
// string is alphanumeric, do something here...
}
Or, if you want to allow one specific special character but no others, for example # (a space can be added to the class the same way), you can try:
String string = "#somehashtag";
if(string.matches("^[\\p{L}0-9'#]+$")){
// string is alphanumeric plus #, do something here...
}
A regex for strictly mixed alphanumeric strings (the string must contain both letters and digits; digits only or letters only are rejected)
For example:
special char (not allowed)
123 (not allowed)
asdf (not allowed)
1235asdf (allowed)
String name="^(?=.*[a-zA-Z])(?=.*\\d)[a-zA-Z\\d]+$";
To check if a String is alphanumeric, you can use a method that goes through every character in the string and checks if it is alphanumeric.
public static boolean isAlphaNumeric(String s){
for(int i = 0; i < s.length(); i++){
char c = s.charAt(i);
if(!Character.isDigit(c) && !Character.isLetter(c))
return false;
}
return true;
}
I have no idea how to remove invalid characters from a string in Java. I'm trying to remove all the characters that are not numbers, letters, or ( ) [ ] . How can I do this?
Thanks
String foo = "this is a thing with & in it";
foo = foo.replaceAll("[^A-Za-z0-9()\\[\\]]", "");
Javadocs are your friend. Regular expressions are also your friend.
Edit:
That being said, this is only for the Latin alphabet; you can adjust accordingly. \\w can be used in place of a-zA-Z0-9 to denote a "word" character if that works for your case, though it also includes _.
Using Guava, and almost certainly more efficient (and more readable) than regexes:
CharMatcher desired = CharMatcher.JAVA_DIGIT
.or(CharMatcher.JAVA_LETTER)
.or(CharMatcher.anyOf("()[]"))
.precomputed(); // optional, may improve performance, YMMV
return desired.retainFrom(string);
Try this:
String s = "123abc&^%[]()";
s = s.replaceAll("[^A-Za-z0-9()\\[\\]]", "");
System.out.println(s);
The above will remove characters "&^%" in the sample string, leaving in s only "123abc[]()".
public static void main(String[] args) {
    String c = "hjdg$h&jk8^i0ssh6+/?:().,+-#";
    System.out.println(c);
    // Any character outside this whitelist is removed
    Pattern pt = Pattern.compile("[^a-zA-Z0-9/?:().,'+/-]");
    c = pt.matcher(c).replaceAll("");
    System.out.println(c);
}
Use this code (replace takes literal text, so the brackets need no escaping):
String s = "Test[]";
s = s.replace("[", "");
s = s.replace("]", "");
myString = myString.replaceAll("[^\\w\\[\\]\\(\\)]", "");
The replaceAll method takes a regex as its first parameter and replaces all matches in the string. This regex matches every character that is not a digit, letter or underscore (\\w) and is not one of the brackets you need (\\[\\]\\(\\)).
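Because Java strings are immutable, the result of replaceAll has to be used or assigned. A minimal sketch wrapping the call in a helper (keepAllowed is a made-up name):

```java
class KeepAllowedDemo {
    // Removes every character that is not a word character, [ ] or ( ).
    static String keepAllowed(String input) {
        return input.replaceAll("[^\\w\\[\\]\\(\\)]", "");
    }

    public static void main(String[] args) {
        String cleaned = keepAllowed("a_b[c](d) & e!");
        System.out.println(cleaned); // a_b[c](d)e
    }
}
```

Note that the underscore survives, since \w includes it.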
You can remove special character sequences from your String/URL or from any request parameters you get from the user side:
public static String removeSpecialCharacters(String inputString){
final String[] metaCharacters = {"../","\\..","\\~","~/","~"};
String outputString="";
for (int i = 0 ; i < metaCharacters.length ; i++){
if(inputString.contains(metaCharacters[i])){
outputString = inputString.replace(metaCharacters[i],"");
inputString = outputString;
}else{
outputString = inputString;
}
}
return outputString;
}
You can specify the range of characters to keep/remove based on the order of characters in the ASCII table. The regex can use actual characters or character hex codes:
// Example - remove characters outside of the range of "space to tilde".
// 1) using characters
someString.replaceAll("[^ -~]", "");
// 2) using hex codes for "space" and "tilde"
someString.replaceAll("[^\\u0020-\\u007E]", "");
What's the best and easiest way to check if a string only contains the following characters:
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_
I'd like an example along the lines of this pseudo-code:
//If String contains other characters
else
//if string contains only those letters
Please and thanks :)
if (string.matches("^[a-zA-Z0-9_]+$")) {
// contains only listed chars
} else {
// contains other chars
}
For that particular class of String use the regular expression "\w+".
Pattern p = Pattern.compile("\\w+");
Matcher m = p.matcher(str);
if (m.matches()) {
    // contains only those characters
} else {
    // contains other characters
}
Note that I use a Pattern object to compile the regex once so that it never has to be compiled again, which is nice if you do this check a lot or in a loop. As per the Java docs...
If a pattern is to be used multiple
times, compiling it once and reusing
it will be more efficient than
invoking this method each time.
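One common way to follow that advice is to keep the compiled Pattern in a static final field and reuse it for every check. A sketch under that assumption; the class name WordChecker and the method isWord are hypothetical:

```java
import java.util.regex.Pattern;

class WordChecker {
    // Compiled once when the class is loaded, reused for every call.
    private static final Pattern WORD = Pattern.compile("\\w+");

    static boolean isWord(String s) {
        return WORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"abc_123", "abc 123", ""};
        for (String input : inputs) {
            System.out.println("\"" + input + "\" -> " + isWord(input));
        }
    }
}
```

Pattern instances are immutable and safe to share; only the Matcher created per call holds state.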
My turn:
static final Pattern bad = Pattern.compile("\\W|^$");
//...
if (bad.matcher(suspect).find()) {
// String contains other characters
} else {
// string contains only those letters
}
The above searches for a single non-matching character, or for an empty string.
And according to JavaDoc for Pattern:
\w A word character: [a-zA-Z_0-9]
\W A non-word character: [^\w]
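A quick sketch of how the \W|^$ pattern behaves on a few inputs, including the empty string; the class and method names are made up for illustration:

```java
import java.util.regex.Pattern;

class BadInputDemo {
    // Matches any single non-word character, or an entirely empty string.
    private static final Pattern BAD = Pattern.compile("\\W|^$");

    static boolean isBad(String s) {
        return BAD.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isBad("user_01")); // false
        System.out.println(isBad("user 01")); // true, the space is a non-word character
        System.out.println(isBad(""));        // true, the ^$ alternative matches
    }
}
```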
Is there a nice way to extract tokens that start with a pre-defined string and end with a pre-defined string?
For example, let's say the starting string is "[" and the ending string is "]". If I have the following string:
"hello[world]this[[is]me"
The output should be:
token[0] = "world"
token[1] = "[is"
(Note: the second token has a 'start' string in it)
I think you can use the Apache Commons Lang feature that exists in StringUtils:
substringsBetween(java.lang.String str,
java.lang.String open,
java.lang.String close)
The API docs say it:
Searches a String for substrings
delimited by a start and end tag,
returning all matching substrings in
an array.
The Commons Lang substringsBetween API can be found here:
http://commons.apache.org/lang/apidocs/org/apache/commons/lang/StringUtils.html#substringsBetween(java.lang.String,%20java.lang.String,%20java.lang.String)
Here is the way I would go to avoid dependency on commons lang.
public static String escapeRegexp(String regexp){
String specChars = "\\$.*+?|()[]{}^";
String result = regexp;
for (int i=0;i<specChars.length();i++){
Character curChar = specChars.charAt(i);
result = result.replaceAll(
"\\"+curChar,
"\\\\" + (i<2?"\\":"") + curChar); // \ and $ must have special treatment
}
return result;
}
public static List<String> findGroup(String content, String pattern, int group) {
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(content);
List<String> result = new ArrayList<String>();
while (m.find()) {
result.add(m.group(group));
}
return result;
}
public static List<String> tokenize(String content, String firstToken, String lastToken){
String regexp = lastToken.length()>1
?escapeRegexp(firstToken) + "(.*?)"+ escapeRegexp(lastToken)
:escapeRegexp(firstToken) + "([^"+lastToken+"]*)"+ escapeRegexp(lastToken);
return findGroup(content, regexp, 1);
}
Use it like this :
String content = "hello[world]this[[is]me";
List<String> tokens = tokenize(content,"[","]");
StringTokenizer? Set the search string to "[]" and the "include tokens" flag to false and I think you're set.
A normal StringTokenizer won't work for this requirement; you would have to tweak it or write your own.
There's one way you can do this. It isn't particularly pretty. It involves going through the string character by character. When you reach a "[", you start putting the characters into a new token. When you reach a "]", you stop. This is best done with a resizable data structure such as a List rather than an array, since arrays have a fixed length.
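The character-by-character scan described above could be sketched roughly as follows. This assumes single-character delimiters "[" and "]" and silently drops a token that is never closed; the names BracketScanner and scan are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

class BracketScanner {
    static List<String> scan(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = null; // null means "not inside a token"
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (current == null) {
                if (c == '[') {
                    current = new StringBuilder(); // start a new token
                }
            } else if (c == ']') {
                tokens.add(current.toString());    // close the token
                current = null;
            } else {
                current.append(c);                 // includes a nested '['
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scan("hello[world]this[[is]me")); // [world, [is]
    }
}
```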
Another possible solution is to use regexes with String's split method. The only problem I have is coming up with a regex that would split the way you want. What I can come up with is (]string of characters[) XOR (string of characters[) XOR (]string of characters). Each set of parentheses denotes a different regex. You should evaluate them in this order so you don't accidentally remove anything you want. I'm not familiar with regexes in Java, so I used "string of characters" to denote that there are characters in between the brackets.
Try a regular expression like:
(.*?\[(.*?)\])
The second capture should contain all of the information between the set of []. This will however not work properly if the string contains nested [].
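In Java, that expression can be applied with a Matcher.find loop, reading the second capture group on each match. A minimal sketch; the class and method names are made up, and the backslashes are doubled because the regex sits inside a Java string literal:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketRegexDemo {
    static List<String> extract(String text) {
        Matcher m = Pattern.compile("(.*?\\[(.*?)\\])").matcher(text);
        List<String> tokens = new ArrayList<>();
        while (m.find()) {
            tokens.add(m.group(2)); // the text between [ and ]
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(extract("hello[world]this[[is]me")); // [world, [is]
    }
}
```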
StringTokenizer won't cut it for the specified behavior. You'll need your own method. Something like:
public List<String> extractTokens(String txt, String str, String end) {
    int so = 0, eo; // so: search offset, eo: offset of the end delimiter
    List<String> lst = new ArrayList<String>();
    while (so < txt.length() && (so = txt.indexOf(str, so)) != -1) {
        so += str.length();
        if (so < txt.length() && (eo = txt.indexOf(end, so)) != -1) {
            lst.add(txt.substring(so, eo));
            so = eo + end.length();
        }
    }
    return lst;
}
The regular expression \\[[\\[\\w]+\\] gives us
[world] and
[[is]