URI and double slashes - java

java.net.URI.create("localhost:8080/foo") // Works
java.net.URI.create("127.0.0.1:8080/foo") // Throws exception
java.net.URI.create("//127.0.0.1:8080/foo") // Works
Is the double slash required when the host is an IP address? I glanced through the RFC for URI (https://www.rfc-editor.org/rfc/rfc3986) but could not find anything pertaining to this.

java.net.URI.create uses the syntax described in RFC 2396.
java.net.URI.create("localhost:8080/foo")
This doesn't throw an exception, but the URI is parsed in a way you probably don't expect. Its scheme (not host!) is set to localhost, and 8080/foo is not port + path but the scheme-specific part. So this doesn't really work.
java.net.URI.create("//localhost:8080/foo")
parses the URI without a scheme, as a net_path grammar element (see RFC 2396 for details).
Here's the relevant grammar excerpt from RFC 2396:
URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
// This is how 'localhost:8080/foo' is parsed:
absoluteURI = scheme ":" ( hier_part | opaque_part )
relativeURI = ( net_path | abs_path | rel_path ) [ "?" query ]
...
// This is how '//127.0.0.1:8080/foo' is parsed:
net_path = "//" authority [ abs_path ]
...
// Scheme must start with a letter,
// hence 'localhost' is parsed as a scheme, but '127' isn't:
scheme = alpha *( alpha | digit | "+" | "-" | "." )
One proper way would be:
java.net.URI.create("http://localhost:8080/foo")
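The parsing behaviour described above can be checked directly by printing the components that java.net.URI reports for each input. This is a minimal sketch; the class and method names (UriParsingDemo, describe, isRejected) are made up for illustration.

```java
import java.net.URI;

class UriParsingDemo {
    // Summarizes how java.net.URI split the input into components.
    static String describe(String input) {
        URI uri = URI.create(input);
        return "scheme=" + uri.getScheme()
                + " ssp=" + uri.getSchemeSpecificPart()
                + " host=" + uri.getHost()
                + " port=" + uri.getPort()
                + " path=" + uri.getPath()
                + " opaque=" + uri.isOpaque();
    }

    // Returns true when URI.create rejects the input.
    // URI.create wraps the underlying URISyntaxException in an IllegalArgumentException.
    static boolean isRejected(String input) {
        try {
            URI.create(input);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // "localhost" ends up as the scheme and "8080/foo" as an opaque scheme-specific part.
        System.out.println(describe("localhost:8080/foo"));
        // No scheme; host, port and path come from the net_path form.
        System.out.println(describe("//127.0.0.1:8080/foo"));
        // Full form with an explicit scheme.
        System.out.println(describe("http://localhost:8080/foo"));
        // "127" cannot start a scheme, so parsing fails.
        System.out.println(isRejected("127.0.0.1:8080/foo"));
    }
}
```

For the first input the host is null and the port is -1, which is why the call only appears to work.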


Remove values from Java map and replace with comma using Regex

I have a regular expression that will target everything to the right of ":" in a map and to the left of a ","
I am using this RegExp:
:(.*?)(,|})
Like this:
$PROP_TYPES.replaceAll(":(.*?)(,|})", ",").concat("}")
This works well for the most part, except when the expression runs into objects that are on the right side of the ":"
For example, on a list of prop type definitions in TypeScript:
"open": "boolean",
"onDidDismiss": "() => void",
"message": "string",
"showAddReceiptModal": "boolean",
"showModal": "(value: (((prevState: boolean) => boolean) | boolean)) => void",
"vehicleTabRef": "React.MutableRefObject<null>",
"onReceiptSubmit": "(data) => void",
"vehicle": "any",
"showModal1": "boolean",
"showModal2": "(value: (((prevState: boolean) => boolean) | boolean)) => void",
"selectedReceipt": "{ __typename: "Receipt" | undefined; id: string | undefined; purchaseDate: string | undefined; sellerName: string | undefined; sellerAddress: string | undefined; sellerCity: string | undefined; sellerState: string | undefined; sellerZipCode: string | undefined; totalAmount: number | undefined; gallonsPurchased: number | undefined; image?: string | null | undefined | undefined; userID?: string | null | undefined | undefined; vehicleID: string | undefined; vehicle?: Vehicle | null | undefined | undefined; taxRefund?: number | null | undefined | undefined; createdAt: string | undefined; updatedAt: string | undefined }",
"onClick": "() => void",
"onClick1": "() => void",
"dataEditing": "boolean",
"toastMessage": "(val) => void",
"editing": "(bool) => void",
"showToast": "(bool) => void",
"showModal3": "boolean",
"showModal4": "(bool) => void",
"dateFilter": "{ startDate: string; endDate: string }",
When I run replaceAll with this regex, and the replacement is a comma I am left with this:
open,
onDidDismiss,
message,
showAddReceiptModal,
showModal,
vehicleTabRef,
onReceiptSubmit,
vehicle,
showModal1,
showModal2,
selectedReceipt,
,
onClick,
onClick1,
dataEditing,
toastMessage,
editing,
showToast,
showModal3,
showModal4,
dateFilter,
,
Notice the lines that contain only a comma; these should not be there. The expression should target everything to the right of the ":" and replace it all with a single comma.
This code is written in VTL for use with the WebStorm IDE to create quick TypeScript templates, but Java string manipulation works as well.
What regex can I use to remove everything from the ":" up to the ending comma, and leave just that comma?
You might use:
^\s*\"([^\"]+)\":.*(?=,)
Explanation
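^ Start of the line (this needs multiline mode, e.g. Pattern.MULTILINE or an inline (?m) in Java; without it ^ matches only at the start of the whole string)
\s* Match optional whitespace chars (or use \h* in Java so that newlines are not matched)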
\"([^\"]+)\" Capture in group 1 all between double quotes
:.* Match : and then the rest of the line
(?=,) Positive lookahead, assert a comma to the right
In the replacement, use capture group 1 ($1 in Java).
The result after the replacement:
open,
onDidDismiss,
message,
showAddReceiptModal,
showModal,
vehicleTabRef,
onReceiptSubmit,
vehicle,
showModal1,
showModal2,
selectedReceipt,
onClick,
onClick1,
dataEditing,
toastMessage,
editing,
showToast,
showModal3,
showModal4,
dateFilter,
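In plain Java, the suggested pattern could be applied roughly as follows. This is a sketch under a few assumptions: the prop types arrive as one multi-line string with one quoted entry per line, Pattern.MULTILINE is enabled so that ^ matches at each line start, and \h* is used instead of \s* so that the match never spans newlines. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class PropNameExtractor {
    // MULTILINE lets ^ match at the start of every line; \h* skips horizontal whitespace only.
    private static final Pattern PROP_LINE =
            Pattern.compile("^\\h*\"([^\"]+)\":.*(?=,)", Pattern.MULTILINE);

    // Replaces each "name": "type" pair with just the name; the trailing comma is
    // left in place because the lookahead does not consume it.
    static String keepNames(String propTypes) {
        return PROP_LINE.matcher(propTypes).replaceAll("$1");
    }

    public static void main(String[] args) {
        String propTypes = String.join("\n",
                "\"open\": \"boolean\",",
                "\"showModal\": \"(value: (((prevState: boolean) => boolean) | boolean)) => void\",",
                "\"dateFilter\": \"{ startDate: string; endDate: string }\",");
        System.out.println(keepNames(propTypes));
    }
}
```

Because .* is greedy and the lookahead requires a comma, the match always extends to the last comma on the line, so colons, braces, and commas inside the type text do not produce stray commas.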

ANTLR: Parse a date within a quote string

I have a problem figuring out how to parse a date in my grammar.
The thing is that it shares its definition with a String, but according to the Antlr 4 documentation, it should follow the precedence by looking at the order of declaration.
Here is my grammar:
grammar formula;
/* entry point */
parse: expr EOF;
expr
: value # argumentArithmeticExpr
| l=expr operator=('*'|'/'|'%') r=expr # multdivArithmeticExpr // TODO: test the % operator
| l=expr operator=('+'|'-') r=expr # addsubtArithmeticExpr
| '-' expr # minusArithmeticExpr
| FUNCTION_NAME '(' (expr ( ',' expr )* ) ? ')'# functionExpr
| '(' expr ')' # parensArithmeticExpr
;
value
: number
| variable
| date
| string
| bool;
/* Atomes */
bool
: BOOL
;
variable
: '[' (~(']') | ' ')* ']'
;
date
: DQUOTE date_format DQUOTE
| QUOTE date_format QUOTE
;
date_format
: year=INT '-' month=INT '-' day=INT (hour=INT ':' minutes=INT ':' seconds=INT)?
;
string
: STRING_LITERAL
;
number
: ('+'|'-')? NUMERIC_LITERAL
;
/* lexemes de base */
QUOTE : '\'';
DQUOTE : '"';
MINUS : '-';
COLON : ':';
DOT : '.';
PIPE : '|';
BOOL : T R U E | F A L S E;
FUNCTION_NAME: IDENTIFIER ;
IDENTIFIER
: [a-zA-Z_] [a-zA-Z_0-9]* // TODO: do we more chars in this set?
;
NUMERIC_LITERAL
: DIGIT+ ( '.' DIGIT* )? ( E [-+]? DIGIT+ )? // ex: 0.05e3
| '.' DIGIT+ ( E [-+]? DIGIT+ )? // ex: .05e3
;
INT: DIGIT+;
STRING_LITERAL
: '\'' ( ~'\'' | '\'\'' )* '\''
| '"' ( ~'"' | '""' )* '"'
;
WS: [ \t\n]+ -> skip;
UNEXPECTED_CHAR: . ;
fragment DIGIT: [0-9];
fragment A:('a'|'A');
fragment B:('b'|'B');
fragment C:('c'|'C');
fragment D:('d'|'D');
fragment E:('e'|'E');
fragment F:('f'|'F');
fragment G:('g'|'G');
fragment H:('h'|'H');
fragment I:('i'|'I');
fragment J:('j'|'J');
fragment K:('k'|'K');
fragment L:('l'|'L');
fragment M:('m'|'M');
fragment N:('n'|'N');
fragment O:('o'|'O');
fragment P:('p'|'P');
fragment Q:('q'|'Q');
fragment R:('r'|'R');
fragment S:('s'|'S');
fragment T:('t'|'T');
fragment U:('u'|'U');
fragment V:('v'|'V');
fragment W:('w'|'W');
fragment X:('x'|'X');
fragment Y:('y'|'Y');
fragment Z:('z'|'Z');
The important part here is this:
value
: number
| variable
| date
| string
| bool;
date
: DQUOTE date_format DQUOTE
| QUOTE date_format QUOTE
;
date_format
: year=INT '-' month=INT '-' day=INT (hour=INT ':' minutes=INT ':' seconds=INT)?
;
My grammar expects these things:
"a quoted string" -> gives a string
"2015-03 TOTOTo" -> gives a string because the date format doesn't match.
"2015-03-15" -> gives a date because it matches DQUOTE INT '-' INT '-' INT DQUOTE
And I tried to make sure that the parser attempts to match a date before it attempts to match a string: value: ...| date | string| ....
But when I use the grun utility (and my unit tests...), I can see that it categorizes the date as a string, as if it never bothered to check the date format.
Can you tell me why it is so?
I suspect there's a catch with the order in which I declare my grammar rules, but I tried some permutations and didn't get anything.
The problem stems from the failure to understand that the lexer runs to completion before any of the parser rules are effectively considered.
That means, the STRING_LITERAL lexer rule will consume all strings, dates included, and output just STRING_LITERAL tokens. The date and related parser subrules are never even considered by the parser.
Perhaps the minimal solution is to modify the STRING_LITERAL lexer rule to
STRING_LITERAL
: { notDateString() }?
( QUOTE .*? QUOTE
| DQUOTE .*? DQUOTE
)
;
The notDateString predicate requires native code to perform the essential disambiguation between date formats and other strings.
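As an illustration of the kind of check such native code might perform, here is a standalone Java sketch that decides whether the text of a quoted literal looks like a date. It is not ANTLR API: the class DateLiteralCheck and both methods are hypothetical, and unlike the no-argument predicate in the rule above they take the literal text as a parameter, so wiring this into a lexer (or applying it to STRING_LITERAL tokens after lexing) is left open.

```java
import java.util.regex.Pattern;

class DateLiteralCheck {
    // A quoted year-month-day with an optional hour:minutes:seconds part.
    // The backreference \1 requires the closing quote to match the opening one.
    private static final Pattern DATE_LITERAL = Pattern.compile(
            "(['\"])\\d+-\\d+-\\d+(?:\\s+\\d+:\\d+:\\d+)?\\1");

    // True when the whole literal, quotes included, has the date shape.
    static boolean isDateLiteral(String literalText) {
        return DATE_LITERAL.matcher(literalText).matches();
    }

    // Mirrors the name used in the lexer rule; simply the negation.
    static boolean notDateString(String literalText) {
        return !isDateLiteral(literalText);
    }

    public static void main(String[] args) {
        System.out.println(isDateLiteral("\"2015-03-15\""));       // date shape
        System.out.println(isDateLiteral("\"2015-03 TOTOTo\""));   // plain string
        System.out.println(isDateLiteral("\"a quoted string\""));  // plain string
    }
}
```

The sketch only checks the shape of the literal; it does not validate that the month or day values are in range.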
Another alternative is to promote the STRING_LITERAL rule entirely to the parser. Doable, but a bit messy depending on whether there is a need to preserve whitespace within 'real' strings.
BTW, you may wish to add a token stream dump to your standard series of unit tests.

Regex doesn't work in Java

I expect the data that my server receives to follow a certain pattern. The following 2 lines are examples of what I expect.
¬14AAAA3170008#¶
%AAAA3170010082¶
So, to check whether the data is fine, I wrote the following regexes:
(?<pacote>\\A¬\\d{2}[a-fA-F0-9]{4}\\d{7}.{2})
(?<pacote>\\A[$%][a-fA-F0-9]{4}\\d{10}.)
They work fine on regex101.com, but Java's Pattern and Matcher don't handle this regex as I expected. Here is my Java code:
String data= "¬14AAAA3170008#¶%AAAA3170010082¶";
Pattern patternData = Pattern.compile( "(?<pacote>\\A¬\\d{2}[a-fA-F0-9]{4}\\d{7}.{2})", Pattern.UNICODE_CASE );
Matcher matcherData = patternData.matcher( data );
if( matcherData .matches() ){
System.out.println( "KNOW. DATA[" + matcherData .group( "pacote" ) + "]");
}else{
System.out.println( "UNKNOW" );
}
It didn't work as expected. Could someone help me figure out what mistake I'm making?
You're using Matcher#matches, which matches the whole input.
However, the Pattern you're using only covers the first record format, while your whole input contains the two cases concatenated.
On top of that, the \\A boundary matcher anchors the pattern to the start of the input, so it can only ever match the record at the very beginning.
You can use the following pattern to generalize and match the two:
String test = "¬14AAAA3170008#¶%AAAA3170010082¶";
Pattern p = Pattern.compile(
// (?<pacote> ... )   named group definition
// ( ... )+           actual pattern, repeated multiple times (2 here)
// (¬\d{2}|[$%])      ¬ + 2 digits, or $ or %
// [a-fA-F0-9]{4}     4 hex digits
// \d{7,10}           7 to 10 digits
// .{1,2}             any 1 or 2 characters
"(?<pacote>((¬\\d{2}|[$%])[a-fA-F0-9]{4}\\d{7,10}.{1,2})+)"
);
Matcher m = p.matcher(test);
if (m.find()) {
System.out.println(m.group("pacote"));
}
Output
¬14AAAA3170008#¶%AAAA3170010082¶
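The difference between Matcher#matches and Matcher#find described above can be seen in isolation with the original first-record pattern. This is a small sketch; the class and method names are made up, and the ¬ and ¶ characters are written as \u00AC and \u00B6 escapes only to keep the source file ASCII.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchesVsFind {
    // Pattern for the first record format only, as in the question.
    static final Pattern FIRST_RECORD =
            Pattern.compile("(?<pacote>\\A\u00AC\\d{2}[a-fA-F0-9]{4}\\d{7}.{2})");

    // matches() succeeds only if the pattern covers the entire input.
    static boolean matchesWhole(String data) {
        return FIRST_RECORD.matcher(data).matches();
    }

    // find() succeeds if the pattern occurs somewhere in the input;
    // with \A that can only be at the very start.
    static String findFirst(String data) {
        Matcher m = FIRST_RECORD.matcher(data);
        return m.find() ? m.group("pacote") : null;
    }

    public static void main(String[] args) {
        String data = "\u00AC14AAAA3170008#\u00B6%AAAA3170010082\u00B6";
        System.out.println(matchesWhole(data)); // false: the second record is left over
        System.out.println(findFirst(data));    // the first record only
    }
}
```

Matcher#lookingAt is a third option: it anchors at the start like matches() but does not require the whole input to be consumed.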

Regular Expression in Java for UMASK

I need a Java regular expression to get the following two values:
the value of the UMASK parameter in file /etc/default/security should be set to 077. [Current value: 022] [AA.1.9.3]
the value of UMASK should be set to 077 in /etc/skel/.profile [AA.1.9.3]
I need to get the file name from the input string, as well as the current value if existing.
I wrote the regex .* (.*?/.*?) (?:\\[Current value\\: (\\d+)\\])?.* which matches both strings and captures the file name, but does NOT capture the current value.
Then I tried another regex: .* (.*?/.*?) (?:\\[Current value\\: (\\d+)\\])? .* which differs from the first one only by a space before the last .*. It matches string 1 and captures both the file name and the current value, but it does NOT match string 2.
How can I correct these regular expressions to obtain the values described above?
If I understand your requirements correctly (file name and current octal permissions value), you can use the following Pattern:
String input =
"Value for parameter UMASK in file /etc/default/security should be set to 077. " +
"[Current value: 022] [AA.1.9.3] - " +
"Value of UMASK should be set to 077 in /etc/skel/.profile [AA.1.9.3]";
// "file "            literal marker before the path
// (.+?)              group 1: the file path
// " "                space after the path
// .+?                any characters
// \[Current value:   escaped square bracket and "Current value: " marker
// (\d+)              group 2: digits for the value
// \]                 closing bracket
Pattern p = Pattern.compile("file (.+?) .+?\\[Current value: (\\d+)\\]");
Matcher m = p.matcher(input);
// iterates, but will find only once in this instance (which is desirable)
while (m.find()) {
System.out.printf("File: %s%nCurrent value: %s%n", m.group(1), m.group(2));
}
Output
File: /etc/default/security
Current value: 022
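If the second message (which has no "file " marker and no current value) also needs to be handled, one possible variant is to process each message separately and make the "Current value" part optional. The sketch below assumes that the file path is the first whitespace-free token starting with "/" in the message; the class name, the method, and the pattern are illustrative and not part of the pattern above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UmaskFindingParser {
    // Group 1: first token starting with '/', taken as the file path (assumes no spaces in paths).
    // Group 2: digits inside an optional "[Current value: ...]" block; null when the block is absent.
    private static final Pattern FINDING =
            Pattern.compile("(/\\S+)(?:.*?\\[Current value: (\\d+)\\])?");

    // Returns {path, currentValueOrNull}, or null when the message contains no path.
    static String[] parse(String message) {
        Matcher m = FINDING.matcher(message);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String[] messages = {
            "the value of the UMASK parameter in file /etc/default/security should be set to 077. "
                + "[Current value: 022] [AA.1.9.3]",
            "the value of UMASK should be set to 077 in /etc/skel/.profile [AA.1.9.3]"
        };
        for (String message : messages) {
            String[] result = parse(message);
            System.out.printf("File: %s%nCurrent value: %s%n", result[0], result[1]);
        }
    }
}
```

Callers have to handle a null second element, which is how the sketch signals that no current value was reported.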

Java - How to parse irc message to human readable

I am creating an IRC client in Java. It works fine, but the messages from the server are a bit "messed up",
for example :User1!webirc#1.9.com PRIVMSG #channel :test. So I want to know how to parse an IRC message into a human-readable form. Here is a regex that I found for IRC messages: ^(:(\\S+) )?(\\S+)( (?!:)(.+?))?( :(.+))?$
The IRC Protocol is documented here: https://www.rfc-editor.org/rfc/rfc2812
2.3.1 Message format in Augmented BNF
The protocol messages must be extracted from the contiguous stream of
octets. The current solution is to designate two characters, CR and
LF, as message separators. Empty messages are silently ignored,
which permits use of the sequence CR-LF between messages without
extra problems.
The extracted message is parsed into the components <prefix>,
<command> and list of parameters (<params>).
The Augmented BNF representation for this is:
message = [ ":" prefix SPACE ] command [ params ] crlf
prefix = servername / ( nickname [ [ "!" user ] "@" host ] )
command = 1*letter / 3digit
params = *14( SPACE middle ) [ SPACE ":" trailing ]
=/ 14( SPACE middle ) [ SPACE [ ":" ] trailing ]
nospcrlfcl = %x01-09 / %x0B-0C / %x0E-1F / %x21-39 / %x3B-FF
; any octet except NUL, CR, LF, " " and ":"
middle = nospcrlfcl *( ":" / nospcrlfcl )
trailing = *( ":" / " " / nospcrlfcl )
SPACE = %x20 ; space character
crlf = %x0D %x0A ; "carriage return" "linefeed"
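Putting the grammar and the regex from the question together, a message could be turned into something readable along these lines. This is a rough sketch, not a complete IRC parser: it assumes the CR-LF terminator has already been stripped, it only gives PRIVMSG a special layout, and the class name, the output format, and the sample line (written with the "@" host separator from RFC 2812) are all made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IrcMessageFormatter {
    // Regex quoted in the question: group 2 = prefix, 3 = command,
    // 5 = middle parameters, 7 = trailing parameter.
    private static final Pattern MESSAGE =
            Pattern.compile("^(:(\\S+) )?(\\S+)( (?!:)(.+?))?( :(.+))?$");

    static String toReadable(String rawLine) {
        Matcher m = MESSAGE.matcher(rawLine);
        if (!m.matches()) {
            return rawLine; // leave lines the regex cannot parse untouched
        }
        String prefix = m.group(2);
        String command = m.group(3);
        String params = m.group(5);
        String trailing = m.group(7);
        // The nickname is the part of the prefix before "!", if there is one.
        String sender = prefix == null ? "(no prefix)" : prefix.split("!", 2)[0];
        if ("PRIVMSG".equals(command) && params != null && trailing != null) {
            return "[" + params + "] <" + sender + "> " + trailing;
        }
        // Fallback for every other command: sender, command, then the raw parameters.
        return sender + " " + command
                + (params == null ? "" : " " + params)
                + (trailing == null ? "" : " :" + trailing);
    }

    public static void main(String[] args) {
        System.out.println(toReadable(":User1!webirc@1.9.com PRIVMSG #channel :test"));
    }
}
```

Numeric replies and commands such as JOIN or PING fall through to the generic branch; a real client would probably map each of them to its own display format.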
