I am passing a String value in a URL
eg: http://localhost:8080/webservice/useradmin/a%bghijlk123/0978+gh
The String "ab%ghijlk123/0978+gh" breaks the URL.
What options are available to overcome this?
Is encoding the string the only option? The code change must be minimal. Can any server-side configuration achieve this?
Any suggestions would be appreciated.
Is encoding the string the only option?
It is the only correct option.
Use URLEncoder.encode("ab%ghijlk123/0978+gh", "UTF-8"),
which will give you ab%25ghijlk123%2F0978%2Bgh, for a full URL of:
http://localhost:8080/webservice/useradmin/ab%25ghijlk123%2F0978%2Bgh
The URL http://localhost:8080/webservice/useradmin/a%bghijlk123/0978+gh is invalid.
The URL specification (RFC3986) says that path segments (the values separated by a /) may only consist of:
ALPHA: "a"-"z", "A"-"Z"
DIGIT: "0"-"9"
Special chars: - . _ ~ ! $ & ' ( ) * + , ; = : @
pct-encoded: "%" HEXDIG HEXDIG
Characters that are disallowed because they have other meanings are: / (path separator), ? (start of query), # (start of fragment), and % (start of a 2-digit hex-encoded character).
As you can see, the % sign is only allowed as a percent-encoded character, so %bg makes the URL invalid.
If the part after the useradmin/ is supposed to be the value ab%ghijlk123/0978+gh, then it must be encoded as shown above.
If the server rejects that as "400:Bad request", then the server is in error.
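As a minimal sketch of the encoding step described above (the class and method names here are made up for illustration, not part of any library): URLEncoder is designed for form data, so it writes a space as "+". The helper swaps that for "%20" so the result is also safe inside a path segment. A literal "+" in the input is unaffected, because it has already become %2B by then.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper class, for illustration only.
class PathSegmentEncodingSketch {

    // Percent-encodes a single path segment.
    static String encodeSegment(String raw) {
        try {
            // URLEncoder emits "+" for spaces; "%20" is the path-safe form.
            return URLEncoder.encode(raw, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    // Reverses the encoding, as a server-side framework would.
    static String decodeSegment(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String base = "http://localhost:8080/webservice/useradmin/";
        String encoded = encodeSegment("ab%ghijlk123/0978+gh");
        System.out.println(base + encoded);
        System.out.println(decodeSegment(encoded));
    }
}
```

Only the value is encoded, never the whole URL; encoding the full URL would also escape the slashes that separate the real path segments.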
I am trying to replace a particular word, say password, with ******* in a string that contains characters such as $ and \n, in Groovy.
I cannot escape them with \ because I have no control over the data I receive, and the final output must keep the $ exactly as it was.
I tried str.replaceAll("password","**")
gives:
illegal string body character after dollar sign;
solution: either escape a literal dollar sign "\$5" or bracket the value expression "${5}" @ line 2, column 8.
afdmas$
def str="""hello how
you$
password
doing"""
expected o/p :
hello how
you$
**
doing
It's not clear why you don't have access to the input data because you use string literal in the example and the error you get is from this example as well. In this case you can use Groovy single-quoted strings but they are without interpolation. Or if interpolation is necessary then you can use slashy or dollar slashy strings which additionally don't require backslash escaping:
def str='''"hello how
you$ password doing'''
def str=/"hello how
you$ password doing/
def str=$/"hello how
you$ password doing/$
*Formatting is the same as in OP
I'm working on GUI validation...
Please see the problem below...
How can I validate an email with a specific format? It needs at least one digit before the @, one digit after it, and at least two letters after the dot.
String EmailFormat = "m@m.co";
Pattern patternEmail = Pattern.compile("\\d{1,}@\\d{1,}.\\d{2,}");
Matcher matcherName = patternEmail.matcher(StudentEmail);
Don't write your own validator. Email has been around for decades and there are many standard libraries which work, address parts of the standard you may not know about, and are well tested by many other developers.
Apache Commons Email Validator is a good example. Even if you use a standard validator, you need to be aware of the limitations and gotchas in validating an email address. The javadocs for Commons EmailValidator state, "This implementation is not guaranteed to catch all possible errors in an email address. For example, an address like nobody@noplace.somedog will pass validator, even though there is no TLD "somedog"". So you can use a good email validator to determine whether an address is well formed, but you will have to do extra work to guarantee that the domain exists, accepts email, and accepts email for that address.
If you require good addresses you will need a secondary mechanism. A confirmation email is a good mechanism. You send a link to the given address and the user must visit that link to verify that email can be sent to that address.
This is a regex pattern for emails:
String pt = "^[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})$";
You can try it like this:
List<String> email = Arrays.asList("xyzl@gmail.com", "@", "sxd");
Predicate<String> validMail = (n) -> n.matches(pt);
email.stream().filter(validMail).forEach((n) -> System.out.println(n));
Here is a description of the pattern; change it according to your needs.
^ #start of the line
[_A-Za-z0-9-\\+]+ # must start with string in the bracket [ ], must contains one or more (+)
( # start of group #1
\\.[_A-Za-z0-9-]+ # follow by a dot "." and string in the bracket [ ], must contains one or more (+)
)* # end of group #1, this group is optional (*)
@ # must contain a "@" symbol
[A-Za-z0-9-]+ # follow by string in the bracket [ ], must contains one or more (+)
( # start of group #2 - first level TLD checking
\\.[A-Za-z0-9]+ # follow by a dot "." and string in the bracket [ ], must contains one or more (+)
)* # end of group #2, this group is optional (*)
( # start of group #3 - second level TLD checking
\\.[A-Za-z]{2,} # follow by a dot "." and string in the bracket [ ], with minimum length of 2
) # end of group #3
$ #end of the line
Split the email into two parts using @ as the delimiter:
String email = "some@email.com";
String[] parts = email.split("@"); // parts = [ "some", "email.com" ]
Validate each part separately, using multiple checks if necessary:
// validate username: String.matches() must match the whole string, so wrap the digit in .*
String username = parts[0];
if (username.matches(".*\\d.*")) {
    // ok
}
// validate domain: contains a digit and ends with a dot plus 2-4 lowercase letters
String domain = parts[1];
if (domain.matches(".*\\d.*") && domain.matches(".*\\.[a-z]{2,4}")) {
    // ok
}
Note that this is a very poor email validator and it shouldn't be used standalone.
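The split-and-check idea can be put together as a small runnable sketch. It assumes the rules stated in the question (at least one digit before the delimiter, one digit after it, and at least two letters after the last dot); the class and method names are made up for illustration, and the same caveat applies: this is a format check, not a real email validator.

```java
// Hypothetical helper class, for illustration only.
class SimpleEmailCheckSketch {

    static boolean looksValid(String email) {
        // A limit of -1 keeps trailing empty strings, so "@" yields two empty parts.
        String[] parts = email.split("@", -1);
        if (parts.length != 2) {
            return false; // no delimiter, or more than one
        }
        String username = parts[0];
        String domain = parts[1];

        // String.matches() tests the whole string, hence the surrounding .*
        boolean userHasDigit = username.matches(".*\\d.*");
        boolean domainHasDigit = domain.matches(".*\\d.*");
        boolean domainEndsWithLetters = domain.matches(".*\\.[A-Za-z]{2,}");

        return userHasDigit && domainHasDigit && domainEndsWithLetters;
    }

    public static void main(String[] args) {
        for (String candidate : new String[] {"m1@m2.co", "m@m.co", "a1@b2.c", "@", "sxd"}) {
            System.out.println(candidate + " -> " + looksValid(candidate));
        }
    }
}
```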
I want to send an email containing this text:
Destination : 6W - ATLANTA WEST!##$%^*!gemini!##$%^*!jfds!##$%^*!,Trailer Number : 000564,,Drop empty trailer at Plant Numbe :546,Pick up trailer at Plant Number :45, Bill Date : 25-Jan-2013,Bill Time - Eastern Time : 1,Trip Number :456,MBOL :546,Carrier :Covenant!##$%^*!test#shaw.com!##$%^*!transport#shaw.com!##$%^*!test#transport.com!##$%^*!antoalphi#gmail.com,Destination : 6W - ATLANTA WEST!##$%^*!gemini!##$%^*!jfds!##$%^*!,Customer Name : 567,Cusomer Delivery Address : 657567657,General Comments :657,Warehouse Comments : 65,Carrier Comments : ,Appointment Date :25-Jan-2013,Appointment Time : 1am,Rail Only :Standard,Total Weight : 45645
I decode it with mailContent = URLDecoder.decode(Body, "UTF-8");
but it throws this exception: URLDecoder: Illegal hex characters in escape (%) pattern - For input string: "^*"
Could anyone help me solve this? I get it while sending the mail.
Best Regards
You are trying to URL decode something that wasn't URL encoded in the first place. What's wrong with the body as it is? In other words, what happens if you just use:
mailContent = Body
(In URL encoding, the % character is used with two hexadecimal digits to encode characters that might cause problems, for example / would be encoded as %2F, as its ASCII code is 47 (decimal) or 2F (hex). In your body, % is followed by two characters that are not hexadecimal digits - that's how I can tell it hasn't been URL encoded, and why the decoder is erroring.)
Simply stop calling URLDecoder.decode() and you will stop getting the error! The string value you are passing to it is not URL encoded.
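To see why the decoder complains, here is a small sketch (the class and method names are made up for illustration) that reports whether a string can be URL-decoded at all. A "%" followed by two non-hex characters makes URLDecoder.decode throw IllegalArgumentException, which is the error from the question.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper class, for illustration only.
class DecodeFailureSketch {

    // Returns true when URLDecoder rejects the text as malformed.
    static boolean failsToDecode(String text) {
        try {
            URLDecoder.decode(text, "UTF-8");
            return false;
        } catch (IllegalArgumentException e) {
            // Raised for "%" not followed by two hexadecimal digits.
            return true;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "%^*" is not a valid escape, so decoding fails.
        System.out.println(failsToDecode("Carrier :Covenant!@#$%^*!test"));
        // "%2F" is a valid escape for "/", so decoding succeeds.
        System.out.println(failsToDecode("a%2Fb"));
    }
}
```

This is only a diagnostic. Text that was never URL-encoded should not be passed through the decoder even when it happens to decode cleanly, because "+" and any valid-looking "%xx" sequences would be silently altered.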
There are various forms of MIME encoding that you might want to consider if you are sending an email with content that would not normally be allowed in an email message without encoding. These references might be handy:
What is allowed in SMTP: http://www.apps.ietf.org/rfc/rfc788.html
Basic MIME encoding: http://www.apps.ietf.org/rfc/rfc1341.html
Java MIME support: http://docs.oracle.com/javaee/1.4/api/javax/mail/internet/MimeUtility.html
For example, you might try:
String sendable = MimeUtility.encodeText(body, "UTF-8", "B"); // "B" selects Base64; encodeText accepts only "B" or "Q"
I'm matching URLs against a regular expression, testing if they reflect a "shutdown" command.
Here's a URL that performs a shutdown:
/exec?debug=true&command=shutdown&f=0
Here's another, legitimate but confusing URL that performs shutdown:
/exec?commando=yes&zcommand=34&command=shutdown&p
Now, I must ensure there's only one command=... parameter and it is command=shutdown. Alternatively, I can live with ensuring the first command=... parameter is command=shutdown.
Here's my test for the requested regular expression:
/exec?version=0.4&command=shutdown&out=JSON&zcommand=1
Should match
/exec?version=0.4&command=startup&out=JSON&zcommand=1&commando=shutdown
Should fail to match
/exec?command=shutdown&out=JSON
Should match
/exec?version=0.4&command=admin&out=JSON&zcommand=1&command=shutdown
Should fail to match
Here's my baseline - a regular expression that passes the above tests - all but the last one:
^/exec?(.*\&)*command=shutdown(\&.*)*$
The problem is with the occurrence of more than one command=..., where the first one is not shutdown.
I tried using lookbehind:
^/exec?(.*\&)*(?<!(\&|\?)command=.*)command=shutdown(\&.*)*$
But I'm getting:
Look-behind group does not have an obvious maximum length near index 31
I even tried atomic grouping. To no avail. I can't make the following expression NOT match:
/exec?version=0.4&command=admin&out=JSON&zcommand=1&command=shutdown
Can anyone help with a regular expression that passes all the tests?
Clarifications
I see I owe you some context.
My task is to configure a Filter that guards the entrance of all our system’s servlets, and verifies there’s an open HTTP session (in other words: that a successful Login has occurred). The filter also allows configuring which URLs do not require login.
Some exceptions are easy: /login does not need login. Calls to localhost do not need login.
But sometimes it gets complicated. Like the shutdown command that cannot require login while other commands can and should (the strange reason for that is out of the scope of my question).
Since it’s a security matter, I can’t allow users to merely append &command=shutdown to a URL and bypass the filter.
So I really need a regular expression, or otherwise I’ll need to redefine the configuration specs.
You would need to do it in multiple steps:
(1) Find match of ^(?=\/exec\?).*?(?<=[?&])command=([^&]+)
(2) Check if match is shutdown
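The two steps above could look like this in Java. This is a sketch only; the class and method names are made up for illustration, and the pattern is the one from step (1) with Java string escaping.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class FirstCommandSketch {

    // Step 1: capture the value of the first "command" parameter of an /exec URL.
    private static final Pattern FIRST_COMMAND =
            Pattern.compile("^(?=/exec\\?).*?(?<=[?&])command=([^&]+)");

    // Step 2: compare the captured value with "shutdown".
    static boolean firstCommandIsShutdown(String url) {
        Matcher m = FIRST_COMMAND.matcher(url);
        return m.find() && "shutdown".equals(m.group(1));
    }

    public static void main(String[] args) {
        System.out.println(firstCommandIsShutdown(
                "/exec?version=0.4&command=shutdown&out=JSON&zcommand=1"));
        System.out.println(firstCommandIsShutdown(
                "/exec?version=0.4&command=admin&out=JSON&zcommand=1&command=shutdown"));
    }
}
```

One caveat: because the value is matched with [^&]+, a parameter with an empty value (command=&command=shutdown) is skipped, and the next one is treated as the first. For a security check that case probably needs separate handling.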
Ok. I thank you all for your great answers! I tried some of the suggestions, struggled with others, and all in all I have to agree that even if the right regex exists, it looks terrible, non maintainable, and can serve well as a nasty university exercise, but not in a real system configuration.
I also realize that since a Filter is involved here, and the Filter already parses its own URI, it is absolutely ridiculous to glue back all the URI parts into a string and match it against a regular expression. What was I thinking??
I'll therefore redesign the Filter and its configuration.
Thanks a lot, people! I appreciate the help :)
Noam Rotem.
P.S. - why was I getting a userXXXX nick? Very strange...
This tested (and fully commented) regex solution meets all your requirements:
import java.util.regex.*;
public class TEST {
    public static void main(String[] args) {
        Pattern re = Pattern.compile(
            " # Match URI having command=shutdown query variable value. \n" +
            " ^ # Anchor to start of string. \n" +
            " (?:[^:/?\\#\\s]+:)? # URI scheme (Optional). \n" +
            " (?://[^/?\\#\\s]*)? # URI authority (Optional). \n" +
            " [^?\\#\\s]* # URI path. \n" +
            " \\? # Literal start of URI query. \n" +
            " # Match var=value pairs preceding 'command=xxx'. \n" +
            " (?: # Zero or more 'var=values' \n" +
            " (?!command=) # only if not-'command=xxx'. \n" +
            " [^&\\#\\s]* # Next var=value. \n" +
            " & # var=value separator. \n" +
            " )* # Zero or more 'var=values' \n" +
            " command=shutdown # variable and value to match. \n" +
            " # Match var=value pairs following 'command=shutdown'. \n" +
            " (?: # Zero or more 'var=values' \n" +
            " & # var=value separator. \n" +
            " (?!command=) # only if not-'command=xxx'. \n" +
            " [^&\\#\\s]* # Next var=value. \n" +
            " )* # Zero or more 'var=values' \n" +
            " (?:\\#\\S*)? # URI fragment (Optional). \n" +
            " $ # Anchor to end of string.",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE | Pattern.COMMENTS);
        String s = "/exec?version=0.4&command=shutdown&out=JSON&zcommand=1";
        // Should match
        // String s = "/exec?version=0.4&command=startup&out=JSON&zcommand=1&commando=shutdown";
        // Should fail to match
        // String s = "/exec?command=shutdown&out=JSON";
        // Should match
        // String s = "/exec?version=0.4&command=admin&out=JSON&zcommand=1&command=shutdown";
        // Should fail to match
        Matcher m = re.matcher(s);
        if (m.find()) {
            // Successful match
            System.out.print("Match found.\n");
        } else {
            // Match attempt failed
            System.out.print("No match found.\n");
        }
    }
}
The above regex matches any RFC3986 valid URI having any scheme, authority, path, query or fragment components, but it must have one (and only one) query "command" variable whose value must be exactly, but case insensitively: "shutdown".
A carefully crafted complex regex is perfectly fine (and maintainable) to use when written with proper indentation and commented steps (like shown above). (For more information on using regex to validate a URI, see my article: Regular Expression URI Validation)
If you can live with just accepting the first match, you could use \\Wcommand=([^&]+) and fetch the first group.
Otherwise, you could call Matcher.find twice to test for subsequent matches and use the first match. Why do you want to do this with a single complex regex?
I am not a Java coder, but try this one (works in Perl) >>
^(?=\/exec\?)(?:[^&]+(?<![?&]command)=[^&]+&)*(?<=[?&])command=shutdown(?:&|$)
To match the first occurrence of command=shutdown use this:
Pattern.compile("^((?!command=).)+command=shutdown.*$");
The results will look like this:
"/exec?version=0.4&command=shutdown&out=JSON&zcommand=1" => true
"/exec?command=shutdown&out=JSON" => true
"/exec?version=0.4&command=startup&out=JSON&zcommand=1&commando=shutdown" => false
"/exec?commando=yes&zcommand=34&command=shutdown&p" => false
If you want to match strings that ONLY contain one 'command=' use this:
Pattern.compile("^((?!command=).)+command=shutdown((?!command=).)+$");
Please note that regular expressions were not designed for negated ("not") conditions like these, so performance might not be the best.
This may well be possible with a single regular expression, but it will be so complex as to be unreadable, and thus unmaintainable, because the intent of the logic will be lost. Even if it is "documented", it will still be much less obvious to someone who just knows Java.
A much better approach would be to use a URI object to parse the entire thing, domain and all, pull off the query parameters, and then write a simple loop that walks through them and decides, based on your business logic, what is a shutdown and what isn't. Then it will be simple, self-documenting, and probably more efficient (not that that should be a concern).
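A minimal sketch of that loop, using java.net.URI from the standard library. The class and method names are made up for illustration, and the business rule is assumed to be the strict one from the question: exactly one command parameter, and its value is shutdown.

```java
import java.net.URI;

// Hypothetical helper class, for illustration only.
class ShutdownQueryCheckSketch {

    static boolean isShutdownOnly(String url) {
        // getRawQuery() returns the query without decoding percent-escapes.
        String query = URI.create(url).getRawQuery();
        if (query == null) {
            return false; // no query string at all
        }
        int commandCount = 0;
        boolean isShutdown = false;
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            if (name.equals("command")) {
                commandCount++;
                isShutdown = value.equals("shutdown");
            }
        }
        // Exactly one "command" parameter, and it must be "shutdown".
        return commandCount == 1 && isShutdown;
    }

    public static void main(String[] args) {
        System.out.println(isShutdownOnly("/exec?command=shutdown&out=JSON"));
        System.out.println(isShutdownOnly(
                "/exec?version=0.4&command=admin&out=JSON&zcommand=1&command=shutdown"));
    }
}
```

The sketch compares raw, undecoded names and values, so a percent-encoded spelling of "command" would not be counted. Inside a servlet Filter, the container has already parsed and decoded the parameters, so request.getParameterValues("command") is likely the simpler starting point.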
Try this:
Pattern p = Pattern.compile(
"^/exec\\?(?:(?:(?!\\1)command=shutdown()|(?!command=)\\w+(?:=[^&]+)?)(?:&|$))+$\\1");
Or a little more readably:
^/exec\?
(?:
(?:
(?!\1)command=shutdown()
|
(?!command=)\w+(?:=[^&]+)?
)
(?:&|$)
)+$
\1
The main body of the regex is an alternation that matches either a shutdown command or a parameter whose name is not command. If it does match a shutdown command, the empty group in that branch "captures" an empty string. It doesn't need to consume anything, because we're only using it as a checkbox, confirming en passant that one of the parameters was a shutdown command.
The negative lookahead - (?!\1) - prevents it from matching two or more shutdown commands. I don't know if that's really necessary, but it's a good opportunity to demonstrate (1) how to negate a "back-assertion", and (2) that a backreference can appear before the group it refers to in certain circumstances (what's known as a forward reference).
When the whole URL has been consumed, the backreference (\1) acts like a zero-width assertion. If one of the parameters was command=shutdown, the backreference will succeed. Otherwise it will fail even though it's only trying to match an empty string, because the group it refers to didn't participate in the match.
But I have to concur with the other responders: when your regexes get this complicated, you should be thinking seriously about switching to a different approach.
EDIT: It works for me. Here's the demo.
I used Inet6Address.getByName("2001:db8:0:0:0:0:2:1").toString() to compress an IPv6 address, and the output is 2001:db8:0:0:0:0:2:1, but I need 2001:db8::2:1. Basically, the compressed output should follow the RFC 5952 standard, that is:
Shorten as Much as Possible: For example, 2001:db8:0:0:0:0:2:1 must be shortened to 2001:db8::2:1. Likewise, 2001:db8::0:1 is not acceptable, because the symbol "::" could have been used to produce a shorter representation 2001:db8::1.
Handling One 16-Bit 0 Field: The symbol "::" MUST NOT be used to shorten just one 16-bit 0 field. For example, the representation 2001:db8:0:1:1:1:1:1 is correct, but 2001:db8::1:1:1:1:1 is not correct.
Choice in Placement of "::": When there is an alternative choice in the placement of a "::", the longest run of consecutive 16-bit 0 fields MUST be shortened (i.e., the sequence with three consecutive zero fields is shortened in 2001:0:0:1:0:0:0:1). When the length of the consecutive 16-bit 0 fields are equal (i.e., 2001:db8:0:0:1:0:0:1), the first sequence of zero bits MUST be shortened. For example, 2001:db8::1:0:0:1 is correct representation.
I have also checked another post on Stack Overflow, but it did not cover these conditions (for example, the choice in placement of ::).
Is there any Java library to handle this? Could anyone please help me?
Thanks in advance.
How about this?
String resultString = subjectString.replaceAll("((?::0\\b){2,}):?(?!\\S*\\b\\1:0\\b)(\\S*)", "::$2").replaceFirst("^0::","::");
Explanation without Java double-backslash hell:
( # Match and capture in backreference 1:
(?: # Match this group:
:0 # :0
\b # word boundary
){2,} # twice or more
) # End of capturing group 1
:? # Match a : if present (not at the end of the address)
(?! # Now assert that we can't match the following here:
\S* # Any non-space character sequence
\b # word boundary
\1 # the previous match
:0 # followed by another :0
\b # word boundary
) # End of lookahead. This ensures that there is not a longer
# sequence of ":0"s in this address.
(\S*) # Capture the rest of the address in backreference 2.
# This is necessary to jump over any sequences of ":0"s
# that are of the same length as the first one.
Input:
2001:db8:0:0:0:0:2:1
2001:db8:0:1:1:1:1:1
2001:0:0:1:0:0:0:1
2001:db8:0:0:1:0:0:1
2001:db8:0:0:1:0:0:0
Output:
2001:db8::2:1
2001:db8:0:1:1:1:1:1
2001:0:0:1::1
2001:db8::1:0:0:1
2001:db8:0:0:1::
(I hope the last example is correct - or is there another rule if the address ends in 0?)
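For comparison, the three RFC 5952 rules quoted in the question can also be applied with a plain loop instead of a regex. The sketch below is an illustration under stated assumptions, not a library API: the class and method names are made up, and the input must be the full 8-group form (no "::", no embedded IPv4 part).

```java
// Hypothetical helper class, for illustration only.
class Rfc5952Sketch {

    static String compress(String fullAddress) {
        String[] parts = fullAddress.split(":");
        if (parts.length != 8) {
            throw new IllegalArgumentException("expected 8 groups: " + fullAddress);
        }
        int[] groups = new int[8];
        for (int i = 0; i < 8; i++) {
            groups[i] = Integer.parseInt(parts[i], 16); // also drops leading zeros
        }

        // Find the longest run of zero groups; the first run wins on ties.
        int bestStart = -1;
        int bestLen = 0;
        for (int i = 0; i < 8; ) {
            if (groups[i] != 0) {
                i++;
                continue;
            }
            int j = i;
            while (j < 8 && groups[j] == 0) {
                j++;
            }
            if (j - i > bestLen) {
                bestStart = i;
                bestLen = j - i;
            }
            i = j;
        }
        if (bestLen < 2) {
            bestStart = -1; // "::" must not replace a single zero group
        }

        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 8; i++) {
            if (i == bestStart) {
                sb.append("::");
                i += bestLen - 1; // skip the rest of the compressed run
                continue;
            }
            if (sb.length() > 0 && sb.charAt(sb.length() - 1) != ':') {
                sb.append(':');
            }
            sb.append(Integer.toHexString(groups[i])); // lowercase hex
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(compress("2001:db8:0:0:0:0:2:1"));
        System.out.println(compress("2001:db8:0:0:1:0:0:0"));
    }
}
```

Because the run search works on parsed numbers, groups written as 0000 or 0 are treated alike, and a run at the start or end of the address needs no special case.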
I recently ran into the same problem and would like to (very slightly) improve on Tim's answer.
The following regular expression offers two advantages:
((?:(?:^|:)0+\\b){2,}):?(?!\\S*\\b\\1:0+\\b)(\\S*)
Firstly, it incorporates the change to match multiple zeroes. Secondly, it also correctly matches addresses where the longest chain of zeroes is at the beginning of the address (such as 0:0:0:0:0:0:0:1).
Guava's InetAddresses class has toAddrString() which formats according to RFC 5952.
java-ipv6 is almost what you want. As of version 0.10 it does not check for the longest run of zeroes to shorten with :: - for instance 0:0:1:: is shortened to ::1:0:0:0:0:0. It is a very decent library for the handling of IPv6 addresses, though, and this problem should be fixed with version 0.11, such that the library is RFC 5952 compliant.
The open-source IPAddress Java library can do as described. It provides numerous ways of producing strings for IPv4 and/or IPv6, including the canonical string, which for IPv6 matches RFC 5952. Disclaimer: I am the project manager of that library.
Using the examples you list, sample code is:
IPAddress addr = new IPAddressString("2001:db8:0:0:0:0:2:1").getAddress();
System.out.println(addr.toCanonicalString());
// 2001:db8::2:1
addr = new IPAddressString("2001:db8:0:1:1:1:1:1").getAddress();
System.out.println(addr.toCanonicalString());
// 2001:db8:0:1:1:1:1:1
addr = new IPAddressString("2001:0:0:1:0:0:0:1").getAddress();
System.out.println(addr.toCanonicalString());
// 2001:0:0:1::1
addr = new IPAddressString("2001:db8:0:0:1:0:0:1").getAddress();
System.out.println(addr.toCanonicalString());
//2001:db8::1:0:0:1
After performing some tests, I think the following captures all the different IPv6 scenarios:
"((?:(?::0|0:0?)\\b){2,}):?(?!\\S*\\b\\1:0\\b)(\\S*)" -> "::$2"
Not very elegant, but this is my proposal (based on chrixm's work):
public static String shortIpv6Form(String fullIP) {
    fullIP = fullIP.replaceAll("^0{1,3}", "");
    fullIP = fullIP.replaceAll("(:0{1,3})", ":");
    fullIP = fullIP.replaceAll("(0{4}:)", "0:");
    // now we have the full form without unnecessary zeros
    // Ex:
    // 0000:1200:0000:0000:0000:0000:0000:0000 -> 0:1200:0:0:0:0:0:0
    // 0000:0000:0000:1200:0000:0000:0000:8351 -> 0:0:0:1200:0:0:0:8351
    // 0000:125f:0000:94dd:e53f:0000:61a9:0000 -> 0:125f:0:94dd:e53f:0:61a9:0
    // 0000:005f:0000:94dd:0000:cfe7:0000:8351 -> 0:5f:0:94dd:0:cfe7:0:8351
    // compress to short notation
    fullIP = fullIP.replaceAll("((?:(?:^|:)0+\\b){2,}):?(?!\\S*\\b\\1:0+\\b)(\\S*)", "::$2");
    return fullIP;
}
results:
7469:125f:8eb6:94dd:e53f:cfe7:61a9:8351 ->
7469:125f:8eb6:94dd:e53f:cfe7:61a9:8351
7469:125f:0000:0000:e53f:cfe7:0000:0000 -> 7469:125f::e53f:cfe7:0:0
7469:125f:0000:0000:000f:c000:0000:0000 -> 7469:125f::f:c000:0:0
7469:125f:0000:0000:000f:c000:0000:0000 -> 7469:125f::f:c000:0:0
7469:0000:0000:94dd:0000:0000:0000:8351 -> 7469:0:0:94dd::8351
0469:125f:8eb6:94dd:0000:cfe7:61a9:8351 ->
469:125f:8eb6:94dd:0:cfe7:61a9:8351
0069:125f:8eb6:94dd:0000:cfe7:61a9:8351 ->
69:125f:8eb6:94dd:0:cfe7:61a9:8351
0009:125f:8eb6:94dd:0000:cfe7:61a9:8351 ->
9:125f:8eb6:94dd:0:cfe7:61a9:8351
0000:0000:8eb6:94dd:e53f:0007:6009:8350 ->
::8eb6:94dd:e53f:7:6009:8350 0000:0000:8eb6:94dd:e53f:0007:6009:8300
-> ::8eb6:94dd:e53f:7:6009:8300 0000:0000:8eb6:94dd:e53f:0007:6009:8000 ->
::8eb6:94dd:e53f:7:6009:8000 7469:0000:0000:0000:e53f:0000:0000:8300
-> 7469::e53f:0:0:8300 7009:100f:8eb6:94dd:e000:cfe7:6009:8351 -> 7009:100f:8eb6:94dd:e000:cfe7:6009:8351
7469:100f:8006:900d:e53f:cfe7:61a9:8351 ->
7469:100f:8006:900d:e53f:cfe7:61a9:8351
7000:1200:8e00:94dd:e53f:cfe7:0000:0001 ->
7000:1200:8e00:94dd:e53f:cfe7:0:1
0000:0000:0000:0000:0000:0000:0000:0000 -> ::
0000:0000:0000:94dd:0000:0000:0000:0000 -> 0:0:0:94dd::
0000:1200:0000:0000:0000:0000:0000:0000 -> 0:1200::
0000:0000:0000:1200:0000:0000:0000:8351 -> ::1200:0:0:0:8351
0000:125f:0000:94dd:e53f:0000:61a9:0000 ->
0:125f:0:94dd:e53f:0:61a9:0 7469:0000:8eb6:0000:e53f:0000:61a9:0000
-> 7469:0:8eb6:0:e53f:0:61a9:0 0000:125f:0000:94dd:0000:cfe7:0000:8351 ->
0:125f:0:94dd:0:cfe7:0:8351 0000:025f:0000:94dd:0000:cfe7:0000:8351
-> 0:25f:0:94dd:0:cfe7:0:8351 0000:005f:0000:94dd:0000:cfe7:0000:8351 -> 0:5f:0:94dd:0:cfe7:0:8351
0000:000f:0000:94dd:0000:cfe7:0000:8351 -> 0:f:0:94dd:0:cfe7:0:8351
0000:0000:0000:0000:0000:0000:0000:0001 -> ::1