I have a regular expression that should match everything to the right of a ":" in a map, up to the next ",".
I am using this RegExp:
:(.*?)(,|})
Like this:
$PROP_TYPES.replaceAll(":(.*?)(,|})", ",").concat("}")
This works well for the most part, except when the value to the right of the ":" is an object type.
For example, on a list of prop type definitions in TypeScript:
"open": "boolean",
"onDidDismiss": "() => void",
"message": "string",
"showAddReceiptModal": "boolean",
"showModal": "(value: (((prevState: boolean) => boolean) | boolean)) => void",
"vehicleTabRef": "React.MutableRefObject<null>",
"onReceiptSubmit": "(data) => void",
"vehicle": "any",
"showModal1": "boolean",
"showModal2": "(value: (((prevState: boolean) => boolean) | boolean)) => void",
"selectedReceipt": "{ __typename: "Receipt" | undefined; id: string | undefined; purchaseDate: string | undefined; sellerName: string | undefined; sellerAddress: string | undefined; sellerCity: string | undefined; sellerState: string | undefined; sellerZipCode: string | undefined; totalAmount: number | undefined; gallonsPurchased: number | undefined; image?: string | null | undefined | undefined; userID?: string | null | undefined | undefined; vehicleID: string | undefined; vehicle?: Vehicle | null | undefined | undefined; taxRefund?: number | null | undefined | undefined; createdAt: string | undefined; updatedAt: string | undefined }",
"onClick": "() => void",
"onClick1": "() => void",
"dataEditing": "boolean",
"toastMessage": "(val) => void",
"editing": "(bool) => void",
"showToast": "(bool) => void",
"showModal3": "boolean",
"showModal4": "(bool) => void",
"dateFilter": "{ startDate: string; endDate: string }",
When I run replaceAll with this regex and a comma as the replacement, I am left with this:
open,
onDidDismiss,
message,
showAddReceiptModal,
showModal,
vehicleTabRef,
onReceiptSubmit,
vehicle,
showModal1,
showModal2,
selectedReceipt,
,
onClick,
onClick1,
dataEditing,
toastMessage,
editing,
showToast,
showModal3,
showModal4,
dateFilter,
,
Notice the lines that contain only a comma; these should not be there. The expression should match everything to the right of the ":" and replace it all with a single comma.
This code is written in VTL (Velocity Template Language) for use in the WebStorm IDE to create quick TypeScript templates, but Java string manipulation works as well.
What regex can I use to remove everything from the : up to the ending comma, and leave just that comma?
You might use:
^\s*\"([^\"]+)\":.*(?=,)
Explanation
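^ Start of the line (enable multiline mode, for example with (?m) in Java, so that ^ matches at every line rather than only at the start of the string)
\s* Match optional whitespace chars (or use \h* in Java to not match newlines)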
\"([^\"]+)\" Capture in group 1 all between double quotes
:.* Match : and then the rest of the line
(?=,) Positive lookahead, assert a comma to the right
In the replacement use capture group 1.
The result after the replacement:
open,
onDidDismiss,
message,
showAddReceiptModal,
showModal,
vehicleTabRef,
onReceiptSubmit,
vehicle,
showModal1,
showModal2,
selectedReceipt,
onClick,
onClick1,
dataEditing,
toastMessage,
editing,
showToast,
showModal3,
showModal4,
dateFilter,
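The replacement above can be tried in plain Java with a sketch like the following. The class and method names (PropNameStripper, stripTypes) are made up for illustration, and the (?m) flag and \h* are assumptions added so that the pattern works line by line on a multi-line string.

```java
import java.util.regex.Pattern;

class PropNameStripper {
    // (?m) lets ^ match at the start of every line, not only at the start of the input.
    // \h* skips horizontal whitespace only, so a match never spans a line break.
    private static final Pattern PROP_LINE =
            Pattern.compile("(?m)^\\h*\"([^\"]+)\":.*(?=,)");

    // Hypothetical helper: keeps the key (group 1) and the trailing comma, drops the type.
    static String stripTypes(String propTypes) {
        return PROP_LINE.matcher(propTypes).replaceAll("$1");
    }

    public static void main(String[] args) {
        String input = String.join("\n",
                "\"open\": \"boolean\",",
                "\"showModal\": \"(value: (((prevState: boolean) => boolean) | boolean)) => void\",",
                "\"dateFilter\": \"{ startDate: string; endDate: string }\",");
        System.out.println(stripTypes(input));
    }
}
```

Because .* is greedy and the lookahead only requires a comma to follow, each match runs up to the last comma on its line, so object types containing ":" or "}" no longer cut the match short.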
I'm trying to parse Metrics data into a formatted String so that there is a header and each record below starts from a new line. Initially I wanted to get something close to a table formatting like this:
Id | Name | Rate | Value
1L | Name1 | 1 | value_1
2L | Name2 | 2 | value_2
3L | Name3 | 3 | value_3
But my current implementation results in the following Error:
java.util.MissingFormatArgumentException: Format specifier '%-70s'
What should I change in my code to get it formatted correctly?
import spark.implicits._
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
case class BaseMetric(val id: Long,
val name: String,
val rate: String,
val value: String,
val count: Long,
val isValid: Boolean
) {
def makeCustomMetric: String = Seq(id, name, rate, value).mkString("\t")
}
val metric1 = new BaseMetric(1L, "Name1", "1", "value_1", 10L, true)
val metric2 = new BaseMetric(2L, "Name2", "2", "value_2", 20L, false)
val metric3 = new BaseMetric(3L, "Name3", "3", "value_3", 30L, true)
val metrics = Seq(metric1, metric2, metric3)
def formatMetrics(metrics: Seq[BaseMetric]): String = {
val pattern = "%-50s | %-70s | %-55s | %-65s | %f"
val formattedMetrics: String = pattern.format(metrics.map(_.makeCustomMetric))
.mkString("Id | Name | Rate | Value\n", "\n", "\nId | Name | Rate | Value")
formattedMetrics
}
val metricsString = formatMetrics(metrics)
The specific error occurs because you pass a single Seq[String] to format, which expects Any*. You therefore pass only one argument instead of five, and the error says that no argument was found for your second format specifier.
You want to apply the pattern to every metric, not all the metrics to the pattern.
The paddings in the format string are also too big for what you want to achieve.
val pattern = "%-2s | %-5s | %-4s | %-6s"
metrics.map(m => pattern.format(m.makeCustomMetric: _*))
.mkString("Id | Name | Rate | Value\n", "\n", "\nId | Name | Rate | Value")
The : _* tells the compiler that you want to pass a sequence as a variable-length argument list.
makeCustomMetric should then return the Seq itself, instead of a string.
def makeCustomMetric: Seq[Any] = Seq(id, name, rate, value)
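Scala's format goes through the same java.util.Formatter machinery as String.format, so the difference between passing the collection as one argument and spreading it can be reproduced in plain Java. This is only an illustrative sketch; the class and method names are made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.MissingFormatArgumentException;

class FormatArgsDemo {
    static final String PATTERN = "%-2s | %-5s | %-4s | %-6s";

    // The whole list is ONE argument: only the first specifier receives a value.
    static String formatAsSingleArgument(List<Object> row) {
        return String.format(PATTERN, row);
    }

    // toArray() yields an Object[], which is spread over the varargs parameter,
    // comparable to `: _*` in Scala.
    static String formatSpread(List<Object> row) {
        return String.format(PATTERN, row.toArray());
    }

    public static void main(String[] args) {
        List<Object> row = Arrays.asList(1L, "Name1", "1", "value_1");
        System.out.println(formatSpread(row));
        try {
            formatAsSingleArgument(row);
        } catch (MissingFormatArgumentException e) {
            // No argument is left for the second specifier.
            System.out.println("Failed: " + e.getMessage());
        }
    }
}
```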
Scala string interpolation is the optimized way to concat/format strings.
Reference: https://docs.scala-lang.org/overviews/core/string-interpolation.html
s"id: $id ,name: $name ,rate: $rate ,value: $value ,count: $count, isValid: $isValid"
I have a problem getting all the operator tokens in a rule. For example, if my input is (states = failed) and (states1 = nominal) or (states2 = nominal), I want to get "and"/"or". I already have a grammar that can parse my input, but words like 'and' and 'or' are keywords in my grammar, so they show up in the parse tree but do not match a rule of their own.
I want to do this with the listener approach, but I don't know how to get these tokens.
My lexer file:
lexer grammar TransitionLexer;
BOOLEAN: 'true' | 'false';
IF: 'if';
THEN: 'then';
ELSE: 'else';
NAME: (ALPHA | CHINESE | '_')(ALPHA | CHINESE | '_'|DIGIT)*;
ALPHA: [a-zA-Z];
CHINESE: [\u4e00-\u9fa5];
NUMBER: INT | REAL;
INT: DIGIT+
|'(-'DIGIT+')';
REAL: DIGIT+ ('.' DIGIT+)?
| '(-' DIGIT+ ('.' DIGIT+)? ')';
fragment DIGIT: [0-9];
OPCOMPARE: '='|'>='|'<='|'!='|'>'|'<';
WS: [ \t\n\r]+ ->skip;
SL_COMMENT: '/*' .*? '*/' ->skip;
My grammar file:
grammar TransitionCondition;
import TransitionLexer;
@parser::header {
import java.util.*;
}
@parser::members {
private List<String> keywords = new ArrayList<String>();
public boolean isKeyWord(){
return keywords.contains(getCurrentToken().getText());
}
public List<String> getKeywords(){
return keywords;
}
}
condition : stat+ EOF;
stat : expr;
expr: pair (('and' | 'or') pair)*
| '(' pair ')';
pair: '(' var OPCOMPARE value ')' # keyValuePair
| booleanExpr # booleanPair
| BOOLEAN # plainBooleanPair
;
var: localStates # localVar
| globalStates # globalVar
| connector # connectorVar
;
localStates: NAME;
globalStates: 'Top' ('.' brick)+ '.' NAME;
connector: brick '.' NAME;
value: {isKeyWord()}? userDefinedValue
|basicValue
;
userDefinedValue: NAME;
basicValue: arithmeticExpr | booleanExpr;
booleanExpr: booleanExpr op=('and' | 'or') booleanExpr
| BOOLEAN
| relationExpr
| 'not' booleanExpr
| '(' booleanExpr ')'
;
relationExpr: arithmeticExpr
| arithmeticExpr OPCOMPARE arithmeticExpr
;
arithmeticExpr: arithmeticExpr op=('*'|'/') arithmeticExpr
| arithmeticExpr op=('+'|'-') arithmeticExpr
| 'min' '(' arithmeticExpr (',' arithmeticExpr)* ')'
| 'max' '(' arithmeticExpr (',' arithmeticExpr)* ')'
| globalStates
| connector
| localStates
| NUMBER
| '(' arithmeticExpr ')'
;
brick: NAME;
My Input file t.expr with content: (states = failed) and (states1 = nominal) or (states2 = nominal)
I get the tree in Command line using 'grun'.
If you label the alternatives of your parser rule expr and collect the operator tokens in a list:
expr
: pair (operators+=('and' | 'or') pair)* #logicalExpr
| '(' pair ')' #parensExpr
;
your (generated) listener interface will contain these methods:
void enterLogicalExpr(TransitionConditionParser.LogicalExprContext ctx);
void enterParensExpr(TransitionConditionParser.ParensExprContext ctx);
Inside enterLogicalExpr you can find the and/or tokens in the java.util.List<Token> field of the context: ctx.operators.
I have a problem figuring out how to parse a date in my grammar.
The thing is that a date shares its definition with a string, but according to the ANTLR 4 documentation, precedence should follow the order of declaration.
Here is my grammar:
grammar formula;
/* entry point */
parse: expr EOF;
expr
: value # argumentArithmeticExpr
| l=expr operator=('*'|'/'|'%') r=expr # multdivArithmeticExpr // TODO: test the % operator
| l=expr operator=('+'|'-') r=expr # addsubtArithmeticExpr
| '-' expr # minusArithmeticExpr
| FUNCTION_NAME '(' (expr ( ',' expr )* ) ? ')'# functionExpr
| '(' expr ')' # parensArithmeticExpr
;
value
: number
| variable
| date
| string
| bool;
/* Atomes */
bool
: BOOL
;
variable
: '[' (~(']') | ' ')* ']'
;
date
: DQUOTE date_format DQUOTE
| QUOTE date_format QUOTE
;
date_format
: year=INT '-' month=INT '-' day=INT (hour=INT ':' minutes=INT ':' seconds=INT)?
;
string
: STRING_LITERAL
;
number
: ('+'|'-')? NUMERIC_LITERAL
;
/* lexemes de base */
QUOTE : '\'';
DQUOTE : '"';
MINUS : '-';
COLON : ':';
DOT : '.';
PIPE : '|';
BOOL : T R U E | F A L S E;
FUNCTION_NAME: IDENTIFIER ;
IDENTIFIER
: [a-zA-Z_] [a-zA-Z_0-9]* // TODO: do we more chars in this set?
;
NUMERIC_LITERAL
: DIGIT+ ( '.' DIGIT* )? ( E [-+]? DIGIT+ )? // ex: 0.05e3
| '.' DIGIT+ ( E [-+]? DIGIT+ )? // ex: .05e3
;
INT: DIGIT+;
STRING_LITERAL
: '\'' ( ~'\'' | '\'\'' )* '\''
| '"' ( ~'"' | '""' )* '"'
;
WS: [ \t\n]+ -> skip;
UNEXPECTED_CHAR: . ;
fragment DIGIT: [0-9];
fragment A:('a'|'A');
fragment B:('b'|'B');
fragment C:('c'|'C');
fragment D:('d'|'D');
fragment E:('e'|'E');
fragment F:('f'|'F');
fragment G:('g'|'G');
fragment H:('h'|'H');
fragment I:('i'|'I');
fragment J:('j'|'J');
fragment K:('k'|'K');
fragment L:('l'|'L');
fragment M:('m'|'M');
fragment N:('n'|'N');
fragment O:('o'|'O');
fragment P:('p'|'P');
fragment Q:('q'|'Q');
fragment R:('r'|'R');
fragment S:('s'|'S');
fragment T:('t'|'T');
fragment U:('u'|'U');
fragment V:('v'|'V');
fragment W:('w'|'W');
fragment X:('x'|'X');
fragment Y:('y'|'Y');
fragment Z:('z'|'Z');
The important part here is this:
value
: number
| variable
| date
| string
| bool;
date
: DQUOTE date_format DQUOTE
| QUOTE date_format QUOTE
;
date_format
: year=INT '-' month=INT '-' day=INT (hour=INT ':' minutes=INT ':' seconds=INT)?
;
My grammar expects these things:
"a quoted string" -> gives a string
"2015-03 TOTOTo" -> gives a string because the date format doesn't match.
"2015-03-15" -> gives a date because it matches DQUOTE INT '-' INT '-' INT DQUOTE
I tried to make sure that the parser attempts to match a date before trying to match a string: value: ...| date | string| ....
But when I use the grun utility (and my unit tests), I can see that it categorizes the date as a string, as if it never bothered to check the date format.
Can you tell me why it is so?
I suspect there's a catch with the order in which I declare my grammar rules, but I tried some permutations and didn't get anything.
The problem stems from the fact that the lexer tokenizes the input on its own, before any parser rule is considered.
That means, the STRING_LITERAL lexer rule will consume all strings, dates included, and output just STRING_LITERAL tokens. The date and related parser subrules are never even considered by the parser.
Perhaps the minimal solution is to modify the STRING_LITERAL lexer rule to
STRING_LITERAL
: { notDateString() }?
( QUOTE .*? QUOTE
| DQUOTE .*? DQUOTE
)
;
The notDateString predicate requires native (target-language) code to perform the essential disambiguation between date formats and other strings.
Another alternative is to promote the STRING_LITERAL rule entirely to the parser. Doable, but a bit messy depending on whether there is a need to preserve whitespace within 'real' strings.
BTW, you may wish to add a token stream dump to your standard series of unit tests.
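The disambiguation itself can be sketched in plain Java. The helper below is hypothetical and is not the notDateString predicate: a real lexer predicate would have to inspect the upcoming characters of the input stream, whereas this sketch classifies the text of an already matched quoted literal (for instance from a listener or visitor). It assumes the date shape from the question, INT-INT-INT with an optional INT:INT:INT part separated by one space.

```java
import java.util.regex.Pattern;

class QuotedLiteralClassifier {
    // ([\"'])  opening quote, captured so that \1 demands the same closing quote
    // \d+-\d+-\d+  the INT '-' INT '-' INT part
    // (?: \d+:\d+:\d+)?  optional time part after a single space
    private static final Pattern DATE_LITERAL =
            Pattern.compile("([\"'])\\d+-\\d+-\\d+(?: \\d+:\\d+:\\d+)?\\1");

    // Hypothetical helper: true when the whole quoted literal has the date shape.
    static boolean looksLikeDate(String literal) {
        return DATE_LITERAL.matcher(literal).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeDate("\"2015-03-15\""));
        System.out.println(looksLikeDate("\"2015-03 TOTOTo\""));
        System.out.println(looksLikeDate("\"a quoted string\""));
    }
}
```

With such a check, the grammar can keep a single STRING_LITERAL token and decide between date and string after lexing.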
I need a Java regular expression to get the following two values:
the value of the UMASK parameter in file /etc/default/security should be set to 077. [Current value: 022] [AA.1.9.3]
the value of UMASK should be set to 077 in /etc/skel/.profile [AA.1.9.3]
I need to get the file name from the input string, as well as the current value if existing.
I wrote the regex .* (.*?/.*?) (?:\\[Current value\\: (\\d+)\\])?.* which matches both strings and captures the file name, but does NOT capture the current value.
Then I tried another regex: .* (.*?/.*?) (?:\\[Current value\\: (\\d+)\\])? .* which differs from the first one only by a space before the last .*. It matches string 1 and captures both the file name and the current value, but it does NOT match string 2.
How can I correct these regular expressions to obtain the values described above?
If I understand your requirements correctly (file name and current octal permissions value), you can use the following Pattern:
String input =
"Value for parameter UMASK in file /etc/default/security should be set to 077. " +
"[Current value: 022] [AA.1.9.3] - " +
"Value of UMASK should be set to 077 in /etc/skel/.profile [AA.1.9.3]";
// "file "            literal marker before the path
// (.+?)              group 1: the file path (reluctant, stops at the next space)
// " "                space after the path
// .+?                any characters, reluctantly
// \\[                escaped opening square bracket
// "Current value: "  literal marker
// (\\d+)             group 2: digits for the value
// \\]                escaped closing square bracket
Pattern p = Pattern.compile("file (.+?) .+?\\[Current value: (\\d+)\\]");
Matcher m = p.matcher(input);
// iterates, but will find only once in this instance (which is desirable)
while (m.find()) {
System.out.printf("File: %s%nCurrent value: %s%n", m.group(1), m.group(2));
}
Output
File: /etc/default/security
Current value: 022
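If each message is processed on its own and the current value may be absent, an optional group is one way to cover both cases. The sketch below is an alternative to the pattern above, not the same pattern: it assumes the file path is the first whitespace-free token starting with "/", and the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UmaskFindingParser {
    // group 1: first token starting with '/', taken as the file path (assumption)
    // group 2: digits inside "[Current value: ...]", or null when that part is absent
    private static final Pattern FINDING =
            Pattern.compile("(/\\S+)(?:.*?\\[Current value: (\\d+)\\])?");

    // Returns {path, currentValueOrNull}, or null when no path is found.
    static String[] parse(String line) {
        Matcher m = FINDING.matcher(line);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String[] lines = {
            "the value of the UMASK parameter in file /etc/default/security should be set to 077. [Current value: 022] [AA.1.9.3]",
            "the value of UMASK should be set to 077 in /etc/skel/.profile [AA.1.9.3]"
        };
        for (String line : lines) {
            String[] r = parse(line);
            System.out.printf("File: %s, current value: %s%n", r[0], r[1]);
        }
    }
}
```

The optional group is placed after the path, so the match still succeeds when no "[Current value: ...]" follows, and group 2 is simply null in that case.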
I want to know about GUIDs in Java. Can a GUID contain white space or not?
I am using the following code:
import java.util.UUID;
UUID uuid = UUID.randomUUID();
String randomUUIDString = uuid.toString();
Can randomUUIDString contain white space? If so, how can I avoid it?
No. From the documentation of UUID.toString():
The UUID string representation is as described by this BNF:
UUID = <time_low> "-" <time_mid> "-"
<time_high_and_version> "-"
<variant_and_sequence> "-"
<node>
time_low = 4*<hexOctet>
time_mid = 2*<hexOctet>
time_high_and_version = 2*<hexOctet>
variant_and_sequence = 2*<hexOctet>
node = 6*<hexOctet>
hexOctet = <hexDigit><hexDigit>
hexDigit =
"0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
| "a" | "b" | "c" | "d" | "e" | "f"
| "A" | "B" | "C" | "D" | "E" | "F"
As you can see, every character in the returned string will be a character from A-F, a-f, 0-9 or '-'.
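This can also be checked empirically. The following is a small sketch with made-up helper names that tests a generated UUID string against the 8-4-4-4-12 hexadecimal layout described by the BNF and scans it for whitespace.

```java
import java.util.UUID;

class UuidFormatCheck {
    // 8-4-4-4-12 hex digits separated by '-', as in the BNF above.
    static boolean isCanonical(String s) {
        return s.matches(
            "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}");
    }

    // True when any character of the string is whitespace.
    static boolean hasWhitespace(String s) {
        return s.chars().anyMatch(Character::isWhitespace);
    }

    public static void main(String[] args) {
        String randomUUIDString = UUID.randomUUID().toString();
        System.out.println(randomUUIDString);
        System.out.println("canonical: " + isCanonical(randomUUIDString));
        System.out.println("whitespace: " + hasWhitespace(randomUUIDString));
    }
}
```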