TCL String Manipulation with curly braces

I'm modifying an application backout script in JACL. The script searches the server's JVM arguments for an argument string that we want to remove. New to the application this release cycle is a JVM argument of the form ${variable_name}. My old code
set ixReplace [lsearch -exact $jvm_args "string_to_search"]
set jvm_args [lreplace $jvm_args $ixReplace $ixReplace]
now returns an extra set of {} like this
-Xjit:disableOSR -Xgc:preferredHeapBase=0x100000000 -Xmnx1152m -Xmns512m
-Xgcpolicy:gencon -agentlib:getClasses -noverify {${variable_name}}
I've found multiple articles about how to remove the extra {}, but I cannot seem to assign the result to the variable that I use to set the new JVM arguments.
My ultimate goal is to have the correct string in a variable called jvm_args so that I can use it to update the JVM arguments like this:
set attr {}
lappend attr [list genericJvmArguments $jvm_args]
$AdminConfig modify $server_jvm_id $attr
Any help or suggestions would be greatly appreciated.

Tcl's adding those braces because you've got a well-formed Tcl list after the lreplace operation, and not any old string. The braces stop the ${variable_name} from ever being interpreted as a variable substitution; the $ is a Tcl metasyntax character. (Square brackets would also attract quoting, as would a few other characters too.)
However, you're wanting to feed the result into a context that doesn't expect a Tcl list, but rather probably a simple space-separated string. The simplest approach is to just use join at the point where you stop thinking of having a Tcl list of words and start thinking of having a general string, probably like this:
lappend attr [list genericJvmArguments [join $jvm_args]]
It won't cope well if you've got spaces embedded in the string, or a few other cases, but without knowing the exact criteria for what makes a word a word in the source material or how to quote things in the system that is taking them, this is the best you're likely to get. (It is at least cheap to do this much…)

Related

How to write a regular expression that matches the whole function #Prompt(…), whatever is written inside (), even if it contains another ()

For example, I want to replace any prompt function in an SQL query.
I have used this expression:
Query = Query.replaceAll("#prompt\\s*\\(.*?\\)", "(1)");
This expression works in this example
#Prompt('Bill Cycle','A','MIGRATION\BC',,,)
and the output is (1)
but it does not work on this example
#Prompt('Groups','A','[Lookup] Price Group (2)\Nested Group',,,)
where the output is (1) \Nested Group',,,) which is not valid

Sadly, as pointed out by Joe C in a comment, what you are trying to do cannot be done with a regular expression for arbitrarily nested parentheses. The reason is that regular expressions are not capable of "counting". You need a stack machine for that, or a parser for a context-free language.
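To illustrate the "counting" that a regular expression cannot do, here is a minimal sketch in Java that finds the closing parenthesis by tracking nesting depth. The class and method names (PromptStripper, replacePrompts, findClosingParen) are hypothetical, and the sketch assumes that single quotes delimit string literals whose parentheses should be ignored; adjust it if your query syntax quotes things differently.

```java
final class PromptStripper {

    // Replaces every "#prompt ( ... )" call (matched case-insensitively) with
    // the given replacement, honoring nested parentheses.
    static String replacePrompts(String query, String replacement) {
        final String marker = "#prompt";
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < query.length()) {
            if (query.regionMatches(true, i, marker, 0, marker.length())) {
                int j = i + marker.length();
                // Allow optional whitespace between the name and the '('.
                while (j < query.length() && Character.isWhitespace(query.charAt(j))) {
                    j++;
                }
                if (j < query.length() && query.charAt(j) == '(') {
                    int end = findClosingParen(query, j);
                    if (end >= 0) {
                        out.append(replacement);
                        i = end + 1;
                        continue;
                    }
                }
            }
            out.append(query.charAt(i));
            i++;
        }
        return out.toString();
    }

    // Returns the index of the ')' that closes the '(' at openIndex,
    // or -1 if the parentheses are unbalanced.
    // Assumption: parentheses inside single-quoted text do not count.
    static int findClosingParen(String s, int openIndex) {
        int depth = 0;
        boolean inQuote = false;
        for (int k = openIndex; k < s.length(); k++) {
            char c = s.charAt(k);
            if (c == '\'') {
                inQuote = !inQuote;
            } else if (!inQuote && c == '(') {
                depth++;
            } else if (!inQuote && c == ')') {
                depth--;
                if (depth == 0) {
                    return k;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String query = "WHERE grp = #Prompt('Groups','A','[Lookup] Price Group (2)\\Nested Group',,,) AND x = 1";
        System.out.println(replacePrompts(query, "(1)")); // prints: WHERE grp = (1) AND x = 1
    }
}
```

The depth counter plays the role of the stack here: since only one kind of bracket is tracked, an integer is enough. An unbalanced prompt is left untouched rather than partially replaced.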
However, you also suggest that the 'prompted' content is always inside single quotes. I assume the standard Java regexp library below. Other regexp libraries might need translation...
"#Prompt\\('[^']*'(\\s*,\\s*(('[^']*')|([^',)]*)))*\\)"
So, you are searching within the prompt for blocks of single-quoted text. The search assumes that any parentheses inside the prompt are enclosed in single quotes.
Verify at https://regex101.com/r/nByy0Y/1 (I made a couple of fixes). Note that regex101.com takes the pattern as a raw regex, not as a Java string literal, so a doubled backslash there means a literal backslash. Use a single backslash before each parenthesis there so that it matches a literal parenthesis.
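As a usage sketch, the pattern can be dropped into the same kind of replaceAll call as in the question. The class name QuotedPromptRegex is hypothetical, the \s escapes are doubled so the string compiles as Java source, and the pattern is case-sensitive as written.

```java
import java.util.regex.Pattern;

final class QuotedPromptRegex {

    // Matches #Prompt('...', <params>) where every parameter after the first is
    // either single-quoted text or unquoted text without quotes, commas or ')'.
    static final Pattern PROMPT = Pattern.compile(
            "#Prompt\\('[^']*'(\\s*,\\s*(('[^']*')|([^',)]*)))*\\)");

    static String replacePrompts(String query) {
        return PROMPT.matcher(query).replaceAll("(1)");
    }

    public static void main(String[] args) {
        String query = "#Prompt('Groups','A','[Lookup] Price Group (2)\\Nested Group',,,)";
        System.out.println(replacePrompts(query)); // prints: (1)
    }
}
```

Parentheses outside single quotes still end the match early, so this only helps when the quoting assumption holds.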

Because you are using the lazy quantifier '?', the match stops at the first ')'. Removing it lets the match run greedily to the last ')', like this:
#prompt\(.*\)
But if the query can contain more parens after the prompt in question, this will swallow them too.
Assuming any additional parens will always be inside quotes, you can do this:
#prompt\((?:'[^']*'|[^'(),]*)(?:,(?:'[^']*'|[^'(),]*))*\)
Here each comma-separated parameter is either an item wrapped in single quotes or unquoted text without parens, which should capture all of the single-quoted elements, empty params, and unquoted text params.

How to get rid of files with names bin$

Using the jOOQ generator through the Gradle plugin, I now get, alongside the POJO and table classes with normal names, heaps of files whose names start with bin$.
They are not necessary: only yesterday the generator did not produce these files, and everything works fine with or without them. But I don't want the project littered with dozens of superfluous files.

Since version 10, Oracle moves dropped tables to the recycle bin, where they get names starting with BIN$. So jOOQ simply generates classes for the dropped tables. This can be prevented in two ways: stop using the recycle bin in Oracle, or filter the tables for which the jOOQ generator creates classes. To disable and empty the recycle bin in Oracle:
ALTER SYSTEM SET RECYCLEBIN = OFF DEFERRED;
purge dba_recyclebin;
or to change the generator setting (the example is for Gradle)
generator {
    ...
    database {
        ...
        excludes = '(?i:BIN\\$.*)'
    }
}
Edit: Finally, after several attempts (by Lukas) and checks (by me), Lukas found the correct form for excludes. The doubled backslash is needed because Groovy still processes backslash escapes in single-quoted strings (it only skips ${...} interpolation there), so \\$ reaches jOOQ as \$.

jOOQ's <excludes/> setting is a Java regular expression. You have to properly form it like this:
excludes = '(?i:BIN\\$.*)'
Explanation:
Use (?i:...) for case-insensitivity. Just in case. Pun intended.
Use \\ before the $ sign because the $ means "end of line" in regular expressions. You want to escape that. And because Groovy/Gradle parses (as in "look for escape sequences") your string, you need to escape the backslash too, for it to reach the Java Pattern.compile() call
Use .* to indicate that after the $, you want to match any number of characters. . = any character and * = any number of repetitions
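To see the effect of the escaping outside the generator, here is a small Java sketch that compiles the same pattern with java.util.regex.Pattern. The class name RecycleBinFilter and the sample table names are made up for illustration, and the sketch assumes the pattern has to match the whole name; how jOOQ applies the pattern internally is not shown here.

```java
import java.util.regex.Pattern;

final class RecycleBinFilter {

    // The Java string literal "\\$" is the regex \$ (a literal dollar sign).
    static final Pattern EXCLUDES = Pattern.compile("(?i:BIN\\$.*)");

    // Without the backslash, $ is an end-of-input anchor, not a dollar sign.
    static final Pattern UNESCAPED = Pattern.compile("(?i:BIN$.*)");

    // Assumption: the whole name must match the pattern.
    static boolean isExcluded(String tableName) {
        return EXCLUDES.matcher(tableName).matches();
    }

    public static void main(String[] args) {
        System.out.println(isExcluded("BIN$abc123==$0"));                      // prints: true
        System.out.println(isExcluded("bin$abc123==$0"));                      // prints: true
        System.out.println(isExcluded("CUSTOMER"));                            // prints: false
        System.out.println(UNESCAPED.matcher("BIN$abc123==$0").matches());     // prints: false
    }
}
```

The last line shows why the escape matters: with a bare $, the pattern can only match the three characters BIN on their own.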

What are the differences between the variations of JEXL?

Does anyone know the best place where I can go to see the differences between the variations of JEXL? I've noted the following so far.
Expression
This only allows for a single command to be executed and the result from that is returned. If you try to use multiple commands it ignores everything after the first semi-colon and just returns the result from the first command.
Script
This allows you to put multiple commands in the expression and you can use variable assignments, loops, calculations, etc. The result from the last command is returned from the script.
Unified
This is ideal for text. To get a calculation you use the EL-like syntax as in ${someVariable}. The expression that goes between the brackets behaves like a script, not an expression. You can use semi-colons to execute multiple commands and the result from the last command is returned from the script.

Regex to find variables and ignore methods

I'm trying to write a regex that finds all variables (and only variables, ignoring methods completely) in a given piece of JavaScript code. The actual code (the one which executes regex) is written in Java.
For now, I've got something like this:
Matcher matcher = Pattern.compile(".*?([a-z]+\\w*?).*?").matcher(string);
while (matcher.find()) {
    System.out.println(matcher.group(1));
}
So, when value of "string" is variable*func()*20
printout is:
variable
func
Which is not what I want. The simple negation of ( won't do, because it makes regex catch unnecessary characters or cuts them off, but still functions are captured. For now, I have the following code:
Matcher matcher = Pattern.compile(".*?(([a-z]+\\w*)(\\(?)).*?").matcher(formula);
while (matcher.find()) {
    if (matcher.group(3).isEmpty()) {
        System.out.println(matcher.group(2));
    }
}
It works, the printout is correct, but I don't like the additional check. Any ideas? Please?
EDIT (2011-04-12):
Thank you for all the answers. There were questions about why I would need something like that. And you are right: for bigger, more complicated scripts, the only sane solution would be to parse them. In my case, however, this would be excessive. The scraps of JS I'm working on are intended to be simple formulas, something like (a+b)/2. No comments, string literals, arrays, etc. Only variables and (probably) some built-in functions. I need the list of variables to check whether they can be initialized at this point (and whether they are initialized at all). I realize that all of it can be done manually with RPN as well (which would be safer), but these formulas are going to be wrapped in a bigger script and evaluated in a web browser, so it's more convenient this way.
This may be a bit dirty, but it's assumed that whoever writes these formulas (probably me, most of the time) knows what they are doing and is able to check whether the formulas work correctly.
Anyone who finds this question and wants to do something similar should know the risks and difficulties. I do, at least I hope so ;)

Taking all the sound advice about how regex is not the best tool for the job into consideration is important. But you might get away with a quick and dirty regex if your rule is simple enough (and you are aware of the limitations of that rule):
Pattern regex = Pattern.compile(
"\\b # word boundary\n" +
"[A-Za-z]# 1 ASCII letter\n" +
"\\w* # 0+ alnums\n" +
"\\b # word boundary\n" +
"(?! # Lookahead assertion: Make sure there is no...\n" +
" \\s* # optional whitespace\n" +
" \\( # opening parenthesis\n" +
") # ...at this position in the string",
Pattern.COMMENTS);
This matches an identifier as long as it's not followed by a parenthesis. Of course, now you need group(0) instead of group(1). And of course this matches lots of other stuff (inside strings, comments, etc.)...
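For reference, here is a compact runnable sketch of the same lookahead idea, with the pattern written on one line instead of in COMMENTS mode. The class and method names (VariableFinder, findVariables) are hypothetical; the sample input is the one from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class VariableFinder {

    // An identifier (ASCII letter, then word characters) that is NOT followed
    // by optional whitespace and an opening parenthesis.
    private static final Pattern IDENT_NOT_CALL =
            Pattern.compile("\\b[A-Za-z]\\w*\\b(?!\\s*\\()");

    static List<String> findVariables(String formula) {
        List<String> names = new ArrayList<>();
        Matcher matcher = IDENT_NOT_CALL.matcher(formula);
        while (matcher.find()) {
            names.add(matcher.group()); // group() is group(0), the whole match
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(findVariables("variable*func()*20")); // prints: [variable]
        System.out.println(findVariables("(a+b)/2"));            // prints: [a, b]
    }
}
```

The trailing \b matters: without it, the engine could backtrack from func to fun, which is not followed by a parenthesis, and report that as a variable.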

If you are rethinking the use of regex and wondering what else you could do, consider using an AST to access your source programmatically. The Eclipse Java AST, for instance, can build a syntax tree for Java source; I guess you could do something similar for JavaScript.

A regex won't cut it in this case because Java isn't a regular language. Your best bet is to get a parser that understands Java syntax and build on that. Luckily, ANTLR has a Java 1.6 grammar (and a 1.5 grammar).
For your rather limited use case you could probably extend the variable assignment rules fairly easily and get the info you need. There's a bit of a learning curve, but this will probably be your best bet for a quick and accurate solution.

It's pretty well established that regex cannot be reliably used to parse structured input. See here for the famous response: RegEx match open tags except XHTML self-contained tags
As any given sequence of characters may or may not change meaning depending on previous or subsequent sequences of characters, you cannot reliably identify a syntactic element without both lexing and parsing the input text. Regex can be used for the former (breaking an input stream into tokens), but cannot be used reliably for the latter (assigning meaning to tokens depending on their position in the stream).

Need some ideas on how to accomplish this in Java (parsing strings)

Sorry I couldn't think of a better title, but thanks for reading!
My ultimate goal is to read a .java file, parse it, pull out every identifier, and store them all in a list. The two preconditions are that there are no comments in the file and that all identifiers are composed of letters only.
Right now I can read the file, parse it by spaces, and store everything in a list. If anything in the list is a java reserved word, it is removed. Also, I remove any loose symbols that are not attached to anything (brackets and arithmetic symbols).
Now I am left with a bunch of weird strings, but at least they have no spaces in them. I know I am going to have to re-parse everything with a . delimiter in order to pull out identifiers like System.out.print, but what about strings like this example:
Logger.getLogger(MyHash.class.getName()).log(Level.SEVERE,
After re-parsing by . I will be left with more crazy strings like:
getLogger(MyHash
getName())
log(Level
SEVERE,
How am I going to be able to pull out all the identifiers while leaving out all the trash? Just keep re-parsing by every symbol that could exist in java code? That seems rather lame and time consuming. I am not even sure if it would work completely. So, can you suggest a better way of doing this?

There are several approaches you can use other than hacking together your own parser:
Use an existing Java parser.
Use BCEL to read the bytecode, which includes all fields and variables.
Hack into the compiler or run-time, using annotation processing or mirrors. I'm not sure you can find all identifiers this way, but you can certainly find fields and parameters.

I wouldn't separate the entire file at once according to whitespace. Instead, I would scan the file letter-by-letter, saving every character in a buffer until I'm sure an identifier has been reached.
In pseudo-code:
clean buffer
for each letter l in file:
    if l is '
        toggle "character mode"
    if l is "
        toggle "string mode"
    if l is a letter AND "character mode" is off AND "string mode" is off
        add l to end of buffer
    else
        if buffer is NOT a keyword or a literal
            add buffer to list of identifiers
        clean buffer
Notice that some lines here hide further complexity. For example, to check whether the buffer is a literal you need to check for all of true, false, and null.
In addition, there are more bugs in the pseudo-code: it will also identify things like the e and L parts of literals (e in floating-point literals, L in long literals) as identifiers. I suggest adding additional "modes" to take care of them, but it's a bit tricky.
There are also a few more things to handle if you want it to be accurate. For example, you have to make sure you work with Unicode. I would strongly recommend studying the lexical structure of the language so you won't miss anything.
EDIT:
This solution can easily be extended to deal with identifiers with numbers, as well as with comments.
Small bug above - you need to handle \" differently than ", same with \' and '.
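The pseudo-code above can be sketched in Java roughly as follows. This is a minimal illustration under the question's preconditions (no comments, identifiers made of letters only), not a real lexer: the class name IdentifierScanner is hypothetical, empty buffers are skipped, backslash escapes inside quotes are handled as noted in the edit, and the e/L parts of numeric literals are still misreported.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class IdentifierScanner {

    // Java keywords plus the literals true, false and null.
    private static final Set<String> RESERVED = new HashSet<>(Arrays.asList(
            "abstract", "assert", "boolean", "break", "byte", "case", "catch", "char",
            "class", "const", "continue", "default", "do", "double", "else", "enum",
            "extends", "final", "finally", "float", "for", "goto", "if", "implements",
            "import", "instanceof", "int", "interface", "long", "native", "new",
            "package", "private", "protected", "public", "return", "short", "static",
            "strictfp", "super", "switch", "synchronized", "this", "throw", "throws",
            "transient", "try", "void", "volatile", "while", "true", "false", "null"));

    static List<String> scan(String source) {
        List<String> identifiers = new ArrayList<>();
        StringBuilder buffer = new StringBuilder();
        boolean inChar = false;   // "character mode"
        boolean inString = false; // "string mode"
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if ((inChar || inString) && c == '\\') {
                i++; // skip the escaped character, e.g. \" or \'
                continue;
            }
            if (c == '\'' && !inString) {
                inChar = !inChar;
            } else if (c == '"' && !inChar) {
                inString = !inString;
            }
            if (Character.isLetter(c) && !inChar && !inString) {
                buffer.append(c);
            } else {
                flush(buffer, identifiers);
            }
        }
        flush(buffer, identifiers); // the source may end in the middle of a word
        return identifiers;
    }

    // Adds the buffered word to the list unless it is empty or reserved,
    // then clears the buffer.
    private static void flush(StringBuilder buffer, List<String> identifiers) {
        if (buffer.length() > 0) {
            String word = buffer.toString();
            if (!RESERVED.contains(word)) {
                identifiers.add(word);
            }
            buffer.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(scan("Logger.getLogger(MyHash.class.getName()).log(Level.SEVERE,"));
        // prints: [Logger, getLogger, MyHash, getName, log, Level, SEVERE]
    }
}
```

Repeated identifiers appear once per occurrence, as in the pseudo-code; collect them into a set instead if only distinct names are needed.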

Wow, ok. Parsing is hard -- really hard -- to do right. Rolling your own java parser is going to be incredibly difficult to do right. You'll find there are a lot of edge cases you're just not prepared for. To really do it right, and handle all the edge cases, you'll need to write a real parser. A real parser is composed of a number of things:
A lexical analyzer to break the input up into logical chunks
A grammar to determine how to interpret the aforementioned chunks
The actual "parser" which is generated from the grammar using a tool like ANTLR
A symbol table to store identifiers in
An abstract syntax tree to represent the code you've parsed
Once you have all that, you can have a real parser. Of course you could skip the abstract syntax tree, but you need pretty much everything else. That leaves you with writing about 1/3 of a compiler. If you truly want to complete this project yourself, you should see if you can find an example for ANTLR which contains a preexisting java grammar definition. That'll get you most of the way there, and then you'll need to use ANTLR to fill in your symbol table.
Alternately, you could go with the clever solutions suggested by Little Bobby Tables (awesome name, btw Bobby).
