Parse Python Configuration File for Java Programs


I have not found anything of this sort on Google and would like to know if there is a quicker way of doing the following:
I need to parse build scripts for Java programs which are written in Python. More specifically, I want to parse the dictionaries which are hard-coded into these build scripts.
For example, these scripts contain entries like:
config = {}
config["Project"] = \
{
"Name" : "ProjName",
"Version" : "v2",
"MinimumPreviousVersion" : "v1",
}
def actualCode ():
# Some code that actually compiles the relevant files
(The actual compiling is done via a call to another program, this script just sets the required options which I want to extract).
For example, I want to extract, "Name"="ProjName" and so on.
I am aware of the ConfigParser library which is part of Python, but that was designed for .ini files and hence has problems (throws exception and crashes) with actual python code which may appear in the build scripts which I am talking about. So using this library would mean that I would first have to read the file in and remove lines of the file which ConfigParser would object to.
Is there a quicker way than reading the config file in as a normal file and parsing it by hand? I am looking for libraries that can do this, and I don't mind much which language they are written in.

I was trying to solve a similar problem. I converted the dictionary into a JSON object so that I could query its keys in the simplest way possible. This solution worked for multi-level key/value pairs for me.
Here is the algorithm.
Locate config["key_name"] in the string or file using a regular expression. Use the following regular expression (written as a Java string literal):
config(.*?)\\[(.*?)\\]
Get the data within the curly brackets into a string. Use some stack-based code, since there can be nested brackets of type {} or [] in complex dictionaries.
Replace round brackets "()", if any, with square brackets "[]", and each backslash "\" with a space " ", as follows:
expression = expression.replace('(', '[')
        .replace(')', ']')
        .replace('\\', ' ');
JSONObject json = (JSONObject) parser.parse(expression);
Here is your JSON object. You can use it the way you want.
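The first two steps above (locating config["key"] and collecting the balanced {...} body) can be sketched with the standard library alone. This is a minimal sketch, not the answerer's code: the class and method names are hypothetical, it only reads flat "key" : "value" pairs, and it assumes no braces appear inside string values. It reads the pairs with a regular expression rather than a JSON parser because the Python dictionary in the question ends with a trailing comma, which a strict JSON parser may reject.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
public class ConfigDictExtractor {
    // Matches the start of an assignment such as: config["Project"] =
    private static final Pattern HEADER =
            Pattern.compile("config\\s*\\[\\s*\"([^\"]+)\"\\s*\\]\\s*=");
    // Matches flat "key" : "value" pairs only (nested values are not handled).
    private static final Pattern PAIR =
            Pattern.compile("\"([^\"]+)\"\\s*:\\s*\"([^\"]*)\"");

    /** Returns the text between the first '{' at or after 'from' and its matching '}'. */
    static String balancedBody(String src, int from) {
        int open = src.indexOf('{', from);
        if (open < 0) return "";
        int depth = 0; // a counter is enough here; a stack is needed once [] and {} are mixed
        for (int i = open; i < src.length(); i++) {
            char c = src.charAt(i);
            if (c == '{') depth++;
            else if (c == '}' && --depth == 0) return src.substring(open + 1, i);
        }
        return ""; // unbalanced input
    }

    /** Maps each config["..."] name to the key/value pairs found in its dictionary. */
    public static Map<String, Map<String, String>> extract(String script) {
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        Matcher header = HEADER.matcher(script);
        while (header.find()) {
            Map<String, String> entries = new LinkedHashMap<>();
            Matcher pair = PAIR.matcher(balancedBody(script, header.end()));
            while (pair.find()) entries.put(pair.group(1), pair.group(2));
            result.put(header.group(1), entries);
        }
        return result;
    }
}
```

Calling ConfigDictExtractor.extract(scriptText).get("Project").get("Name") on the build script from the question would then give ProjName.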

Try Parboiled. It is written in Java and you write your grammars in... Java too.
Use the stack to store elements etc; its parser class is generic and you can get the final result out of it.

I know this is an old question, but I have found an incredibly useful config parser library for Java here.
It provides a simple function getValue("sectionName", "optionName") that allows you to get the value of an option inside a section.
[sectionName]
optionName = optionValue

Related

Java Regex for Finding a Pattern and Getting Value in It?

I am working on a plugin. I will parse HTML files. I have a naming convention like that:
<!--$include="a.html" -->
or
<!--$include="a.html"-->
is similar
According to this pattern(similar to server side includes) I want to search an HTML file.
Question is that:
Find that pattern and get value (a.html at my example, it is variable)
It should be like:
while(!notFinishedWholeFile){
fileName = findPatternFunc(htmlFile)
replaceFunc(fileName,something)
}
PS: I don't know whether it is better to use a regex in Java or to implement this differently (for example with .indexOf()). If regex performs well enough in this situation, I want to use it.
Any ideas?

You mean like this?
<!--\$include=\"(?<htmlName>[a-z-_]*)\.html\"\s?-->

Read a file into a string then
str = str.replaceAll("(?<=<!--\\$include=\")[^\"]+(?=\" ?-->)", something);
will replace the filenames with the string something, then the string can be written back to the file.
(Note: this replaces any text inside the double quotes, not just valid filenames.)
If you only want to replace filenames with the html extension, swap the [^\"]+ for [^.]+\\.html.
Using regex for this task is fine performance wise, but see e.g.
How to use regular expressions to parse HTML in Java? and Java Regex performance etc.
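If the goal is to read each file name and substitute the whole directive, as in the loop from the question, a capturing group with Matcher.find() may be a better fit than lookarounds. The following is a minimal sketch under that assumption; the class name, method names and the resolver function are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
public class IncludeDirectives {
    // Group 1 captures the file name; optional whitespace is allowed before -->.
    private static final Pattern INCLUDE =
            Pattern.compile("<!--\\$include=\"([^\"]+)\"\\s*-->");

    /** Lists the file names of all include directives, in document order. */
    public static List<String> findIncludes(String html) {
        List<String> names = new ArrayList<>();
        Matcher m = INCLUDE.matcher(html);
        while (m.find()) names.add(m.group(1));
        return names;
    }

    /** Replaces each whole directive with whatever the resolver returns for its file name. */
    public static String expand(String html, Function<String, String> resolver) {
        Matcher m = INCLUDE.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps '$' and '\' in the replacement text literal.
            m.appendReplacement(out, Matcher.quoteReplacement(resolver.apply(m.group(1))));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Compiling the pattern once and reusing it avoids recompiling the regex on every call, which is usually the main performance concern with this approach.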

I have used that pattern:
"<!--\\$include=\"(.+)(.)(html|htm)\"-->"

Is there a tool to take a block of text and turn it into Java StringBuffer code?

I have several blocks of text that I need to be able to paste inline in my code for some unit tests. It would make the code difficult to read if they were externalized, so is there some web tool where I can paste in my text and it will generate the code for a StringBuffer that preserves its formatting? Or even a String, I'm not that picky at this point.
It seems like a code generator like this must exist somewhere on the web. I tried to Google one, but I have yet to come up with a set of search terms that doesn't fill my results with Java examples and documentation.
I suppose I could write one myself, but I'm in a bit of a time crunch and would rather not duplicate effort.

If I understood it correctly, any text editor which supports regexps should make it an easy task. For instance Notepad++ - just replace ^(.+)$ with "\1"+, then copy the result to the code, remove the last + and add String s = to the beginning :)
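The same transformation can also be done by a few lines of Java, which additionally escapes quotes and backslashes (something the plain regex replace does not do). This is a minimal sketch; the class and method names are hypothetical, and non-ASCII characters are passed through unchanged.

```java
// Hypothetical helper, for illustration only.
public class StringLiteralGenerator {
    /** Escapes backslashes, double quotes and tabs so the line is a valid Java literal body. */
    static String escape(String line) {
        return line.replace("\\", "\\\\").replace("\"", "\\\"").replace("\t", "\\t");
    }

    /** Turns a block of text into Java source that declares an equivalent String. */
    public static String toJavaSource(String varName, String text) {
        String[] lines = text.split("\r?\n", -1);
        StringBuilder sb = new StringBuilder("String " + varName + " =\n");
        for (int i = 0; i < lines.length; i++) {
            boolean last = i == lines.length - 1;
            sb.append("    \"").append(escape(lines[i]));
            // Every line except the last keeps its line break as an explicit \n.
            sb.append(last ? "\";" : "\\n\" +").append('\n');
        }
        return sb.toString();
    }
}
```

Printing StringLiteralGenerator.toJavaSource("expected", pastedText) gives a declaration that can be copied straight into a test.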

If you want to externalize then, use a properties file or something like that to read the text.
If you are looking for a simple tool to break up your text into concatenated strings, most modern IDEs will help you do it automatically. Here's how.
Paste the block of text into the IDE.
Surround it in double quotes and assign it to a String variable. (This step may not be required.)
Press Enter wherever you want to wrap the text onto the next line; the IDE will automatically close the literal with a double quote, open a new one on the next line, and join the two with +.
The compiler folds a concatenation of literals such as "addas" + "addasfdas" into a single String constant at compile time, so it costs nothing at run time; only concatenation involving non-constant values is performed at run time.

The SQuirreL SQL client has a function called "convert to StringBuffer", and it works nicely.

ANTLR - Embedding Java code, evaluate before or after?

I'm writing a simple scripting language on top of Java/JVM, where you can also embed Java code using the {} brackets. The problem is, how do I parse this in the grammar? I have two options:
Allow everything to be in it, such as: [a-z|A-Z|0-9|_|$], and go on
Get an extra java grammar and use that grammar to parse that small code (is it actually possible and efficient?)
Option 2] is basically a double-check, since the Java code is also checked when it is evaluated. Now my last question is -- is there a way to dynamically execute java code, also with objects which have been created at runtime?
Thanks,
William van Doorn

1] Allow everything to be in it, such as: [a-z|A-Z|0-9|_|$], and go on
You can't just do that: you'll have to account for opening and closing brackets.
2] Get an extra java grammar and use that grammar to parse that small code (is it actually possible and efficient?)
Yes that's possible. But I suggest you first get something working, and then worry about efficiency (is that really an issue here?).
... is there a way to dynamically execute java code, also with objects which have been created at runtime?
Yes, since Java 6, there's a way to compile source files dynamically. See the JavaCompiler API.
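To illustrate the first point about opening and closing brackets: whatever the lexer rule looks like, it has to track nesting and ignore braces inside literals. Below is a minimal hand-written sketch of that idea in plain Java, not an ANTLR rule; the class and method names are hypothetical, and comments inside the embedded code are deliberately not handled.

```java
// Hypothetical helper, for illustration only.
public class EmbeddedBlockScanner {
    /**
     * Returns the index of the '}' matching the '{' at 'open', or -1 if none.
     * Braces inside string and character literals are ignored; comments are not handled.
     */
    public static int findMatchingBrace(String src, int open) {
        int depth = 0;
        for (int i = open; i < src.length(); i++) {
            char c = src.charAt(i);
            if (c == '"' || c == '\'') {
                char quote = c;
                // Skip to the closing quote, stepping over backslash escapes.
                for (i++; i < src.length() && src.charAt(i) != quote; i++) {
                    if (src.charAt(i) == '\\') i++;
                }
            } else if (c == '{') {
                depth++;
            } else if (c == '}' && --depth == 0) {
                return i;
            }
        }
        return -1;
    }
}
```

The text between the two indices can then be handed to a Java grammar or to the compiler API as a unit.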

I propose enclosing your Java code in a character like '`', which is not used in Java code and is rarely present in literals.
JavaCode: '`' ( EscapeSequence | ~('\\'|'`') )* '`'
;
Use the java.g grammar provided with the ANTLR examples to get the definition of EscapeSequence and so on.
The only catch is that you need to ask programmers to use the character code of '`' whenever it is required inside a literal.

Regarding Java Split Command CSV File Parsing

I have a CSV file in the format below. I get an issue if either of the following CSV lines is read by the program:
"D",abc"def,"","0429"292"0","11","IJ80","Feb10_1.txt-2","FILE RECORD","05/02/2010","04/03/2010","","1","-91","",""
"D","abc"def","","04292920","11","IJ80","Feb10_1.txt-2","FILE RECORD","05/02/2010","04/03/2010","","1","-91","",""
The split command below is used to ignore commas inside double quotes. I got it from an earlier post, whose title is pasted below the code.
String items[] = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)",15);
System.out.println("items.length"+items.length);
(Earlier post: Regarding Java Split Command Parsing Csv File)
items.length is printed as 14 instead of 15. The abc"def value is not recognized as an individual field; it is incorrectly stored as
"D",abc"def in items[0]. I want it to be stored in the following way:
items[0] should be "D" and items[1] should be abc"def
The same issue happens when the value is "abc"def". I want it to be stored as
items[0] should be "D" and items[1] should be "abc"def"
The split command works perfectly if the double quote is doubled inside the quoted field (for example D,"abc""def",1).
How can I resolve this issue?

I think you would be much better off writing a parser for the CSV files rather than trying to use a regular expression. Once you start dealing with CSV files that have carriage returns within fields, the regex will probably fall apart. It wouldn't take much code to write a simple while loop that goes through all the characters and splits up the data. It is a lot easier to deal with "non-standard"* CSV files such as yours when you have a parser rather than a regex.
*I say non-standard because there isn't really an official standard for CSV, and when you're dealing with CSV files from many different systems, you see lots of weird things, like the abc"def field as shown above.
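A character-by-character loop of the kind described above might look like the following. This is a minimal sketch under assumed rules, not a general CSV parser: a quote opens a quoted field only at the start of a field, a quote closes the field only when a comma or the end of the line follows, a doubled quote inside a quoted field yields one quote, and any other quote is kept as data. The class name is hypothetical, and unlike the split() call in the question it strips the enclosing quotes from each field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
public class LenientCsv {
    public static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean quoted = false;       // inside a quoted field
        boolean atFieldStart = true;  // no character of the current field seen yet
        int n = line.length();
        for (int i = 0; i < n; i++) {
            char c = line.charAt(i);
            if (quoted) {
                if (c != '"') {
                    cur.append(c);                    // includes commas inside quotes
                } else if (i + 1 == n || line.charAt(i + 1) == ',') {
                    quoted = false;                   // closing quote
                } else if (line.charAt(i + 1) == '"') {
                    cur.append('"');                  // doubled quote -> one quote
                    i++;
                } else {
                    cur.append('"');                  // stray quote kept as data
                }
            } else if (c == ',') {
                fields.add(cur.toString());
                cur.setLength(0);
                atFieldStart = true;
            } else if (c == '"' && atFieldStart) {
                quoted = true;
                atFieldStart = false;
            } else {
                cur.append(c);                        // unquoted data, stray quotes included
                atFieldStart = false;
            }
        }
        fields.add(cur.toString());
        return fields;
    }
}
```

Under these rules both sample lines from the question split into 15 fields, with the second field read as abc"def in each case.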

opencsv is a great simple and light weight CSV parser for Java. It will easily handle your data.

If possible, changing your CSV format would make the solution very simple.
See the following for an overview of Delimiter Separated Values, a common format on Unix-based systems:
http://www.faqs.org/docs/artu/ch05s02.html#id2901882

opencsv is a very simple and capable API for CSV parsing. If the file is not in a proper format, it can be preprocessed with the Linux sed command before it is processed in Java: convert the column delimiter (here the "," sequence) into a pipe or another unique delimiter, so that opencsv can easily tell the column delimiter apart from quotes and commas inside a field value. Use the power of Linux with your Java code.

Is there a semi-automated way to perform string extraction for i18n?

We have a Java project which contains a large number of English-language strings for user prompts, error messages and so forth. We want to extract all the translatable strings into a properties file so that they can be translated later.
For example, we would want to replace:
Foo.java
String msg = "Hello, " + name + "! Today is " + dayOfWeek;
with:
Foo.java
String msg = Language.getString("foo.hello", name, dayOfWeek);
language.properties
foo.hello = Hello, {0}! Today is {1}
I understand that doing this in a completely automated way is pretty much impossible, as not every string should be translated. However, we were wondering if there was a semi-automated way that removes some of the laboriousness.
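For reference, the target of the refactoring could be as small as the following. This is a minimal sketch of what the Language.getString helper from the example might look like, assuming it is backed by java.util.Properties and java.text.MessageFormat; the question does not show its implementation, and the properties text is inlined here only to keep the sketch self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.text.MessageFormat;
import java.util.Properties;

// Assumed implementation of the helper named in the question.
public class Language {
    private static final Properties MESSAGES = new Properties();

    static {
        try {
            // A real project would load language.properties from the classpath instead.
            MESSAGES.load(new StringReader("foo.hello = Hello, {0}! Today is {1}\n"));
        } catch (IOException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Looks up the template for 'key' and fills in {0}, {1}, ... from 'args'. */
    public static String getString(String key, Object... args) {
        return MessageFormat.format(MESSAGES.getProperty(key), args);
    }
}
```

One thing to keep in mind when extracting strings into this format is that MessageFormat treats a single apostrophe as a quoting character, so a literal apostrophe in a template has to be written twice.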

What you want is a tool that replaces every expression involving string concatenations with a library call, with the obvious special case of expressions involving just a single literal string.
A program transformation system in which you can express your desired patterns can do this.
Such a system accepts rules in the form of:
lhs_pattern -> rhs_pattern if condition ;
where patterns are code fragments with syntax-category constraints on the pattern variables. This causes the tool to look for syntax matching the lhs_pattern and, if found, replace it by the rhs_pattern, where the pattern matching is over language structures rather than text. So it works regardless of code formatting, indentation, comments, etc.
Sketching a few rules (and oversimplifying to keep this short)
following the style of your example:
domain Java;
nationalize_literal(s1:literal_string):
" \s1 " -> "Language.getString1(\s1 )";
nationalize_single_concatenation(s1:literal_string,s2:term):
" \s1 + \s2 " -> "Language.getString1(\s1) + \s2";
nationalize_double_concatenation(s1:literal_string,s2:term,s3:literal_string):
" \s1 + \s2 + \s3 " ->
"Language.getString3(\generate_template1\(\s1 + "{1}" +\s3\, s2);"
if IsNotLiteral(s2);
The patterns are themselves enclosed in "..."; these aren't Java string literals, but rather a way of saying to the multi-computer-lingual pattern matching engine
that the stuff inside the "..." is (domain) Java code. Meta-stuff is marked with \,
e.g., metavariables \s1, \s2, \s3 and the embedded pattern call \generate with ( and ) to denote its meta-parameter list :-}
Note the use of the syntax category constraints on the metavariables s1 and s3 to ensure matching only of string literals. Whatever the metavariables match in the left-hand-side pattern is substituted into the right-hand side.
The sub-pattern generate_template is a procedure that at transformation time (i.e., when the rule fires) evaluates its known-to-be-constant first argument into the template string you suggested, inserts it into your library, and returns a library string index.
Note that the first argument to the generate pattern in this example is composed entirely of concatenated literal strings.
Obviously, somebody will have to hand-process the templated strings that end up in the library to produce the foreign language equivalents.
You're right that this may over-templatize the code, because some strings shouldn't be placed in the nationalized string library. To the extent that you can write programmatic checks for those cases, they can be included as conditions in the rules to prevent them from triggering. (With a little bit of effort, you could place the untransformed text into a comment, making individual transformations easier to undo later.)
Realistically, I'd guess you have to code about 100 rules like this to cover the combinatorics and special cases of interest. The payoff is that your code gets automatically enhanced. If done right, you could apply this transformation to your code repeatedly as it goes through multiple releases; it would leave previously nationalized expressions alone and just revise the new ones inserted by the happy-go-lucky programmers.
A system which can do this is the DMS Software Reengineering Toolkit. DMS can parse/pattern match/transform/prettyprint many languages, including Java and C#.

Eclipse will externalize every individual string and does not automatically build substitution like you are looking for. If you have a very consistent convention of how you build your strings you could write a perl script to do some intelligent replacement on .java files. But this script will get quite complex if you want to handle
String msg = new String("Hello");
String msg2 = "Hello2";
String msg3 = new StringBuffer().append("Hello3").toString();
String msg4 = "Hello" + 4;
etc.
I think there are some paid tools that can help with this. I remember evaluating one, but I don't recall its name. I also don't remember if it could handle variable substitution in external strings. I'll try to find the info and edit this post with the details.
EDIT:
The tool was Globalyzer by Lingoport. The website says it supports string externalization, but not specifically how. Not sure if it supports variable substitution. There is a free trial version so you could try it out and see.

Globalyzer has extensive capabilities to detect, manage and externalize strings, and it speeds up the work dramatically compared with looking at strings and externalizing them one by one. You can filter the strings as well as see them in context, and then externalize them either one by one or in batches. It works for a wide variety of programming languages and resource types, of course including Java. Plus, Globalyzer finds much more than embedded strings for your internationalization projects. You can read more at http://lingoport.com/globalyzer and there are links there to sign up for a demo account. Globalyzer was first built for performing big internationalization service projects, and over the years it has grown into a full-scale enterprise tool for making sure development gets and stays internationalized.

As well as Eclipse's string externalizer, which generates properties files, Eclipse has a warning for non-externalized strings, which is helpful for finding files that you haven't internationalized.
String msg = "Hello " + name;
gives the warning "Non-externalized string literal; it should be followed by //$NON-NLS-<n>$". For strings that truly do belong in the code you can add an annotation (@SuppressWarnings("nls")) or you can add a comment:
String msg = "Hello " + name; //$NON-NLS-1$
This is very helpful for converting a project to proper internationalization.

I think eclipse has some option to externalize all strings into a property file.

You can use the "Externalize Strings" feature in Eclipse.
Open your Java file in the editor and then click on "Externalize Strings" in the "Source" main menu. It generates a properties file for you with all the strings you checked in the selection.
Hope this helps.

Since everyone is weighing in with an IDE, I guess I'd better stand up for NetBeans :)
Tools-->Internationalization-->Internationalization Wizard
Very handy.

IntelliJ IDEA is another tool which has this feature.
Here's a link to the demo
