Java - How to parse irc message to human readable

I am creating an IRC client in Java. It works fine, but the messages from the server are a bit "messed up", for example:
:User1!webirc@1.9.com PRIVMSG #channel :test
How can I parse an IRC message into something human readable? Here is a regex for IRC messages that I found:
^(:(\\S+) )?(\\S+)( (?!:)(.+?))?( :(.+))?$

The IRC Protocol is documented here: https://www.rfc-editor.org/rfc/rfc2812
2.3.1 Message format in Augmented BNF
The protocol messages must be extracted from the contiguous stream of
octets. The current solution is to designate two characters, CR and
LF, as message separators. Empty messages are silently ignored,
which permits use of the sequence CR-LF between messages without
extra problems.
The extracted message is parsed into the components <prefix>,
<command> and list of parameters (<params>).
The Augmented BNF representation for this is:
message    =  [ ":" prefix SPACE ] command [ params ] crlf
prefix     =  servername / ( nickname [ [ "!" user ] "@" host ] )
command    =  1*letter / 3digit
params     =  *14( SPACE middle ) [ SPACE ":" trailing ]
           =/ 14( SPACE middle ) [ SPACE [ ":" ] trailing ]
nospcrlfcl =  %x01-09 / %x0B-0C / %x0E-1F / %x21-39 / %x3B-FF
                ; any octet except NUL, CR, LF, " " and ":"
middle     =  nospcrlfcl *( ":" / nospcrlfcl )
trailing   =  *( ":" / " " / nospcrlfcl )
SPACE      =  %x20        ; space character
crlf       =  %x0D %x0A   ; "carriage return" "linefeed"
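As a rough illustration of how the regex from the question lines up with this grammar, here is a minimal sketch. It assumes each line has already had its CR-LF removed (as BufferedReader.readLine does), and the class name IrcLineParser, the method toReadable and the "[channel] <nick> text" display format are made up for this example. Group 2 is the prefix, group 3 the command, group 5 the middle parameters and group 7 the trailing parameter.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any IRC library.
class IrcLineParser {
    // The regex from the question: prefix, command, middle params, trailing.
    private static final Pattern IRC_LINE =
            Pattern.compile("^(:(\\S+) )?(\\S+)( (?!:)(.+?))?( :(.+))?$");

    // Turns a raw PRIVMSG line into "[target] <nick> text".
    // Any other line is returned unchanged.
    static String toReadable(String line) {
        Matcher m = IRC_LINE.matcher(line);
        if (!m.matches()) {
            return line;
        }
        String prefix = m.group(2);    // e.g. User1!webirc@1.9.com, may be null
        String command = m.group(3);   // e.g. PRIVMSG
        String middle = m.group(5);    // e.g. #channel, may be null
        String trailing = m.group(7);  // e.g. test, may be null

        // The nickname is the part of the prefix before "!".
        String nick = prefix == null ? "" : prefix.split("!", 2)[0];
        if ("PRIVMSG".equals(command) && middle != null && trailing != null) {
            return "[" + middle + "] <" + nick + "> " + trailing;
        }
        return line;
    }

    public static void main(String[] args) {
        // prints: [#channel] <User1> test
        System.out.println(toReadable(":User1!webirc@1.9.com PRIVMSG #channel :test"));
    }
}
```

Other commands (JOIN, PART, numeric replies and so on) would need their own formatting branches; this sketch only covers PRIVMSG.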

Related

URI and double slashes

java.net.URI.create("localhost:8080/foo") // Works
java.net.URI.create("127.0.0.1:8080/foo") // Throws exception
java.net.URI.create("//127.0.0.1:8080/foo") // Works
Is the double slash required when the host is an IP address? I glanced through the RFC for URIs (https://www.rfc-editor.org/rfc/rfc3986) but could not find anything pertaining to this.

java.net.URI.create uses the syntax described in RFC 2396.
java.net.URI.create("localhost:8080/foo")
This doesn't produce an exception, but the URI is parsed in a way you probably don't expect: its scheme (not host!) is set to localhost, and 8080/foo is not port + path but the scheme-specific part. So this doesn't really work.
java.net.URI.create("//localhost:8080/foo")
parses the URI without a scheme, as a net_path grammar element (see RFC 2396 for details).
Here's the relevant grammar excerpt from RFC 2396:
URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
// This is how 'localhost:8080/foo' is parsed:
absoluteURI = scheme ":" ( hier_part | opaque_part )
relativeURI = ( net_path | abs_path | rel_path ) [ "?" query ]
...
// This is how '//127.0.0.1:8080/foo' is parsed:
net_path = "//" authority [ abs_path ]
...
// Scheme must start with a letter,
// hence 'localhost' is parsed as a scheme, but '127' isn't:
scheme = alpha *( alpha | digit | "+" | "-" | "." )
One proper way would be:
java.net.URI.create("http://localhost:8080/foo")
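To see the difference between the variants, one can print the components that java.net.URI extracts from each string. This is a small sketch; the class name UriParts and the helper describe are invented for the example, and only standard java.net.URI getters are used.

```java
import java.net.URI;

// Hypothetical demo class for inspecting how java.net.URI splits a string.
class UriParts {
    // Returns the parsed components, or "rejected" if URI.create throws.
    static String describe(String text) {
        try {
            URI u = URI.create(text);
            return "scheme=" + u.getScheme() + " host=" + u.getHost()
                    + " port=" + u.getPort() + " path=" + u.getPath();
        } catch (IllegalArgumentException e) {
            // URI.create wraps the URISyntaxException in IllegalArgumentException.
            return "rejected";
        }
    }

    public static void main(String[] args) {
        // "localhost" ends up as the scheme; host and path are undefined.
        System.out.println(describe("localhost:8080/foo"));
        // A scheme cannot start with a digit, so this one is rejected.
        System.out.println(describe("127.0.0.1:8080/foo"));
        // Parsed as a net_path: host and port are filled in, scheme is null.
        System.out.println(describe("//127.0.0.1:8080/foo"));
        // The usual form with an explicit scheme.
        System.out.println(describe("http://localhost:8080/foo"));
    }
}
```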

How add quotes in a JSON string, using Java, when the value is a date

I'm having difficulty with a scenario where I need to read a JSON object in Java that has no double quotes around either the keys or the values, like the example below:
"{id: 267107086801, productCode: 02-671070868, lastUpdate: 2018-07-15, lastUpdateTimestamp: 2018-07-15 01:49:58, user: {pf: {document: 123456789, name: Luis Fernando}, address: {street: Rua Pref. Josu00e9 Alves Lima,number:37}, payment: [{sequential: 1, id: CREDIT_CARD, value: 188, installments: 9}]}"
I was able to add the double quotes in the fields using the code below, with replaceAll and the Gson library:
String jsonString = gson.toJson(obj);
jsonString = jsonString.replaceAll("([\\w]+)[ ]*:", "\"$1\":"); // to quote before : value
jsonString = jsonString.replaceAll(":[ ]*([\\w@\\.]+)", ": \"$1\""); // to quote after : value; add special characters as needed to the character class in the regex
jsonString = jsonString.replaceAll(":[ ]*\"([\\d]+)\"", ": $1"); // to un-quote decimal value
jsonString = jsonString.replaceAll("\"true\"", "true"); // to un-quote boolean
jsonString = jsonString.replaceAll("\"false\"", "false"); // to un-quote boolean
However, fields with dates are being broken down erroneously, for example:
"{"id" : 267107086801,"productCode" : 02-671070868,"lastUpdate" : 2018-07-15,"lastUpdateTimestamp" : 2018-07-15 "01" : 49 : 58,"user" :{"pf":{"document" : 123456789, "name" : "Luis" Fernando},"address" :{"street" : "Rua"Pref.Josu00e9AlvesLima,"number" : 37},"payment" : [{"sequential" : 1,"id" : "CREDIT_CARD","value" : 188,"installments" : 9}]}"
Also, strings with spaces are wrong as well. How could I correct this logic? What am I doing wrong? Thanks in advance.

String incorrectJson = "{id: 267107086801, productCode: 02-671070868,"
+ " lastUpdate: 2018-07-15, lastUpdateTimestamp: 2018-07-15 01:49:58,"
+ " user: {pf: {document: 123456789, name: Luis Fernando},"
+ " address: {street: Rua Pref. Josu00e9 Alves Lima,number:37},"
+ " payment: [{sequential: 1, id: CREDIT_CARD, value: 188, installments: 9}]}";
String correctJson = incorrectJson.replaceAll("(?<=: ?)(?![ \\{\\[])(.+?)(?=,|})", "\"$1\"");
System.out.println(correctJson);
Output:
{id: "267107086801", productCode: "02-671070868", lastUpdate:
"2018-07-15", lastUpdateTimestamp: "2018-07-15 01:49:58", user: {pf:
{document: "123456789", name: "Luis Fernando"}, address: {street: "Rua
Pref. Josu00e9 Alves Lima",number:"37"}, payment: [{sequential: "1",
id: "CREDIT_CARD", value: "188", installments: "9"}]}
One downside of non-trivial regular expressions is that they can be hard to read. The one I use here matches each literal value (but not values that are objects or arrays). I am using colons, commas and curly braces to guide the matching, so I don't need to care what is inside each string value; it may be any characters (except a comma or a right curly brace). The parts mean:
(?<=: ?): there's a colon and optionally a blank before the value (lookbehind)
(?![ \\{\\[]): the value does not start with a blank, curly brace or square bracket (negative lookahead; blank because we don't want a blank between the colon and the value to be taken as part of the value)
(.+?): the value consists of at least one character, as few as possible (reluctant quantifier; otherwise the regex would try to take the rest of the string)
(?=,|}): after the value comes either a comma or a right curly brace (positive lookahead).
I am not well versed in JSON, but standard JSON does require the names to be in double quotes as well (some lenient parsers accept them without). To quote the names too:
String correctJson = incorrectJson.replaceAll(
"(?<=\\{|, ?)([a-zA-Z]+?): ?(?![ \\{\\[])(.+?)(?=,|})", "\"$1\": \"$2\"");
{"id": "267107086801", "productCode": "02-671070868", "lastUpdate":
"2018-07-15", "lastUpdateTimestamp": "2018-07-15 01:49:58", user: {pf:
{"document": "123456789", "name": "Luis Fernando"}, address:
{"street": "Rua Pref. Josu00e9 Alves Lima","number": "37"}, payment:
[{"sequential": "1", "id": "CREDIT_CARD", "value": "188",
"installments": "9"}]}
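In the output above, names whose value is an object or an array (user, pf, address, payment) are still unquoted, because the regex only matches names followed by a literal value. One possible follow-up, sketched here as an assumption rather than as part of the original answer, is a second replaceAll that quotes any name directly followed by { or [. The class name JsonKeyQuoter, the method quoteAll and the short sample input are made up for the example.

```java
// Hypothetical helper combining the answer's regex with one extra pass.
class JsonKeyQuoter {
    static String quoteAll(String input) {
        // Pass 1: the replacement from the answer above
        // (quotes names that have a literal value, and the value itself).
        String pass1 = input.replaceAll(
                "(?<=\\{|, ?)([a-zA-Z]+?): ?(?![ \\{\\[])(.+?)(?=,|})",
                "\"$1\": \"$2\"");
        // Pass 2 (assumption, not from the answer): quote names whose
        // value starts with { or [.
        return pass1.replaceAll("([a-zA-Z]+): ?(?=[\\{\\[])", "\"$1\": ");
    }

    public static void main(String[] args) {
        String in = "{id: 1, user: {name: Luis Fernando}, tags: [{n: 2}]}";
        // prints: {"id": "1", "user": {"name": "Luis Fernando"}, "tags": [{"n": "2"}]}
        System.out.println(quoteAll(in));
    }
}
```

Like the original regex, this turns every literal into a string and will misbehave if a value itself contains a comma, a right curly brace, or text that looks like a name followed by a colon and an opening brace.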

The following code also takes care of single quotes present in the JSON string, as well as keys containing digits:
jsonString = jsonString.replaceAll(" :", ":"); // to strip space after key
jsonString = jsonString.replaceAll(": ,", ":,");
jsonString = jsonString.replaceAll("(?<=: ?)(?![ \\{\\[])(.+?)(?=,|})", "\"$1\"");
jsonString = jsonString.replaceAll("(?<=\\{|, ?)([a-zA-Z0-9]+?)(?=:)", "\"$1\"");
jsonString = jsonString.replaceAll("\"true\"", "true"); // to un-quote boolean
jsonString = jsonString.replaceAll("\"false\"", "false"); // to un-quote boolean
jsonString = jsonString.replaceAll("\"null\"", "null"); // to un-quote null
jsonString = jsonString.replaceAll(":\",", ":\"\" ,"); // to remove unnecessary double quotes
jsonString = jsonString.replaceAll("true\"", "true"); // to un-quote boolean
jsonString = jsonString.replaceAll("'\",", "',"); // to handle single quote within json string
jsonString = jsonString.replaceAll("'},", "'}\","); // to put double quote after string ending with single quote

Remove the double quotes in JSON String created using GSON

I got the result below in Spark after using the Gson library.
[
"{"A":"1","A-Description":"Eastern "}",
"{"B":"2","B-Description":"Western "}",
"{"C":"3","C-Description":"Northern "}",
"{"D":"4","D-Description":"Southern"}"
]
I want to remove the double quotes from the start and end of each JSON string.
The final result should look like this:
[
{"A":"1","A-Description":"Eastern "},
{"B":"2","B-Description":"Western "},
{"C":"3","C-Description":"Northern "},
{"D":"4","D-Description":"Southern"}
]
I have solved the issue as follows:
val jsonString = str.replaceAll("\\\\", "").replaceAll("\"(.+)\"", "$1")
where str is one of the strings above.
Please suggest a more efficient way if one is available.

Only recognizing one token in antlr4 grammar

I want my grammar to recognize the following expression &COL[0]. I have built the following grammar:
array:
ARRAY_NAME L_RIGHT_PAR (ARRAY_DIGIT|STRING) R_RIGHT_PAR;
ARRAY_DIGIT:DIGIT+;
ARRAY_NAME: '&''COL';
STRING : QUOT ('\\"' | ~'"')* QUOT
;
L_RIGHT_PAR : '[' ;
R_RIGHT_PAR : ']' ;
fragment
DIGIT : '0'..'9' ;
This gives the error:
mismatched input '[1]' expecting '['
It only works if I write &COL[ 0] with spaces between the [ and ]

I changed the grammar a bit to make it complete enough to run. The text &COL[0] lexes fine with this amended grammar.
grammar test1; // different name for my test rig
test1: ARRAY_NAME L_RIGHT_PAR (ARRAY_DIGIT|STRING) R_RIGHT_PAR;
ARRAY_DIGIT:DIGIT+;
ARRAY_NAME: '&''COL';
STRING : QUOT ('\\"' | ~'"')* QUOT
;
QUOT: '"'; // assumed this
L_RIGHT_PAR : '[' ;
R_RIGHT_PAR : ']' ;
fragment
DIGIT : '0'..'9' ;
WS : [ \t\r\n] -> skip; // added whitespace just so I could add \r\n
Here's the tokenized output:
[@0,0:3='&COL',<ARRAY_NAME>,1:0]
[@1,4:4='[',<'['>,1:4]
[@2,5:5='0',<ARRAY_DIGIT>,1:5]
[@3,6:6=']',<']'>,1:6]
[@4,9:8='<EOF>',<EOF>,2:0]
This answers the question you asked, though I'm still not sure about your definition of STRING. In any case, &COL[0] parses fine now.

Parse without ignoring whitespaces - Java

I have the following string input (from a netstat -a command):
Proto RefCnt Flags       Type       State         I-Node   Path
unix  2      [ ]         DGRAM                    11453    /run/systemd/shutdownd
unix  2      [ ]         DGRAM                    7644     /run/systemd/notify
unix  2      [ ]         DGRAM                    7646     /run/systemd/cgroups-agent
unix  5      [ ]         DGRAM                    7657     /run/systemd/journal/socket
unix  14     [ ]         DGRAM                    7659     /dev/log
unix  3      [ ]         STREAM     CONNECTED     16620
unix  3      [ ]         STREAM     CONNECTED     16621
I'm attempting to parse the string above like this:
// lines is an array representing each line above
for (int i = 0; i < lines.length; i++) {
String[] tokens = lines[i].split("\\s+");
}
I want tokens to be an array of 7 entries, one per column [Proto, RefCnt, Flags, Type, State, I-Node, Path]. Instead, I'm getting an array that splits the brackets under Flags into two tokens and drops the empty State:
["unix", "2", "[", "]", "DGRAM", "11453", "/run/systemd/shutdownd"]
instead of
["unix", "2", "[]", "DGRAM", "", "11453", "/run/systemd/shutdownd"]
How can I fix my regex to produce the correct output?

You need to set the minimum length of a whitespace run in your regular expression to 2. Try splitting like this:
String[] tokens = lines[i].split("\\s{2,16}+");
Or, as @revo suggests, use lookarounds, like this:
String[] tokens = lines[i].split("(?<!\\[)\\s{2,16}+(?!\\])");
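The split above depends on how long each run of spaces happens to be. A different technique, sketched here as an alternative and not as part of the answer, is to treat the output as fixed-width columns: take the offset at which each header word starts and cut every row at those offsets. This assumes every value starts exactly under its header and never overflows into the next column. The class and method names are hypothetical, and the sample lines are built with String.format only to keep the alignment unambiguous.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: fixed-width column splitting driven by the header line.
class FixedWidthColumns {
    // Offsets at which each header word begins.
    static int[] columnStarts(String header) {
        List<Integer> starts = new ArrayList<>();
        for (int i = 0; i < header.length(); i++) {
            boolean startsWord = header.charAt(i) != ' '
                    && (i == 0 || header.charAt(i - 1) == ' ');
            if (startsWord) {
                starts.add(i);
            }
        }
        return starts.stream().mapToInt(Integer::intValue).toArray();
    }

    // Cuts the row at the header offsets; missing cells become "".
    static String[] splitRow(String row, int[] starts) {
        String[] cells = new String[starts.length];
        for (int c = 0; c < starts.length; c++) {
            int from = Math.min(starts[c], row.length());
            int to = c + 1 < starts.length
                    ? Math.min(starts[c + 1], row.length())
                    : row.length();
            cells[c] = row.substring(from, to).trim();
        }
        return cells;
    }

    public static void main(String[] args) {
        // Column widths are an assumption chosen to mimic netstat's layout.
        String fmt = "%-6s%-7s%-12s%-11s%-14s%-9s%s";
        String header = String.format(fmt,
                "Proto", "RefCnt", "Flags", "Type", "State", "I-Node", "Path");
        String row = String.format(fmt,
                "unix", "2", "[ ]", "DGRAM", "", "11453", "/run/systemd/shutdownd");
        // prints: [unix, 2, [ ], DGRAM, , 11453, /run/systemd/shutdownd]
        System.out.println(Arrays.toString(splitRow(row, columnStarts(header))));
    }
}
```

With real input, the header would be the first line of the netstat section and each following line would be passed to splitRow.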
