How to remove particular string from .asp file, using Java? - java

I'm currently writing something that validates our VBScript files. As a first step I want to remove every line of code that is a comment. I expected to be able to do this by looking for "'" (the comment symbol in VBScript) and '\n'. However, when I print the content of the file to the screen, the line breaks do not appear. Does this mean the original VBScript file contains no newline characters? If so, how can I remove the comments?

First read the whole file into a string.
Then use a regex, or simply substring, to remove the extra syntax.

How are you parsing the file? Are you also taking the '\r' into consideration when removing the comments? Or maybe you are accidentally removing all newline characters.
I would create some state flags to tell the parser when I was in a comment or not.
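A minimal sketch of the state-flag idea, assuming the input is plain VBScript (no surrounding HTML or <% %> tags) and that only apostrophe comments matter (Rem comments are not handled). The class and method names are hypothetical. It splits on \R so that "\r\n", "\n" and "\r" line endings are all recognised, and uses one flag to remember whether the current character is inside a string literal:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: strips apostrophe comments from VBScript source text. */
final class VbCommentStripper {

    /** Cuts the line at the first apostrophe that is not inside a "..." literal. */
    static String stripLine(String line) {
        boolean inString = false; // state flag: are we inside a string literal?
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // a doubled "" (escaped quote) toggles twice, so the state is unchanged
                inString = !inString;
            } else if (c == '\'' && !inString) {
                return line.substring(0, i);
            }
        }
        return line;
    }

    /** Removes comments from every line; comment-only lines are dropped entirely. */
    static String strip(String source) {
        // \R matches \r\n, \n and \r (Java 8+)
        String[] lines = source.split("\\R", -1);
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            String code = stripLine(line);
            // drop lines that held only a comment, but keep genuinely blank lines
            if (code.trim().isEmpty() && !line.trim().isEmpty()) {
                continue;
            }
            kept.add(code);
        }
        return String.join("\n", kept);
    }
}
```

If the printed output shows no line breaks, it is worth checking whether the file uses "\r" alone or whether the reading code discards the line terminators, since BufferedReader.readLine() returns each line without them.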

Related

Search "if condition" in .java file

I want to find all if conditions in a .java file.
I am using a BufferedReader to read the file and a Pattern to search for the conditions.
My program finds every if, but when my file looks like this:
// if{}
I get a wrong result.
I want to get only valid if conditions (both if{} and if {}, with a space between if and {), without the ones inside comments.
What should the regex look like?
Full code: http://pastebin.com/55RMfwg2
^(?!\\/\\/) *if *\\{(.|\n)*}
This regex looks for an if that is not preceded by // and allows optional spaces after it.
It also matches the closing brace } and allows newline characters between the braces.
It accepts spaces before the if as well.
If multi-line comments /* */ should also be skipped, then, as other people wrote, it is easier to strip the comments from the file first.
Many websites can help you work out the exact regex; I recommend RegExr.
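An alternative to a single regex is to cut each line at its first // and only then look for the keyword. The sketch below is a rough illustration, not a parser: IfFinder and findIfLines are hypothetical names, it accepts either ( or { after if, it ignores /* */ comments, and it would wrongly cut a line at a // that sits inside a string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: reports the 1-based line numbers that contain an if outside a // comment. */
final class IfFinder {

    // the word "if", optional whitespace, then an opening parenthesis or brace
    private static final Pattern IF_KEYWORD = Pattern.compile("\\bif\\s*[({]");

    static List<Integer> findIfLines(String source) {
        List<Integer> hits = new ArrayList<>();
        String[] lines = source.split("\\R", -1);
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            // assumption: any "//" starts a comment (string literals are not considered)
            int comment = line.indexOf("//");
            String code = comment >= 0 ? line.substring(0, comment) : line;
            if (IF_KEYWORD.matcher(code).find()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }
}
```

Working line by line also fits the BufferedReader approach from the question, because each readLine() result can be checked the same way.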

How to prevent to write characters like into a string?

I have a problem with extracting text from scientific articles.
I use PDFBox to extract text from PDF files. The problem is not in the extraction process itself but in some special math notation: when I write the extracted text into an XML file, a special character that was not extracted correctly causes trouble. Instead of ,  or other similar HTML codes are inserted into the XML file and ruin the whole file. How can I fix this issue?
The HTML codes I mean look like these and, at the moment, number 218 is the trouble. But I guess different math notations will be replaced by different HTML codes and cause the same problem later.
I have already tried the following string cleanings, but they didn't help:
nextWord=nextWord.replaceAll("[-+.^:,]", "");
nextWord=nextWord.replaceAll("\\s+", "");
nextWord=nextWord.replaceAll("[^\\x00-\\x7F]", "");
You could add a pre-check before writing each line to the file, to make sure the text contains no ambiguous characters. The pattern below contains the basic characters found in a typical textbook. You can add or remove characters to suit your content.
public boolean isValidCharacters(String word) {
    String pattern = "^[a-zA-Z0-9~##$^*()_+={}|\\,.?: -]*$";
    return word.matches(pattern);
}
You can write something yourself with a regex or, if you have other String manipulations to do, Apache StringUtils is really great. It has isAlpha() and isNumeric() methods that are easy to use.
https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html
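If the trouble comes from characters that XML 1.0 does not allow at all (an assumption, since the exact characters are not visible in the question), another option is to drop only those instead of whitelisting ASCII. The sketch below filters code points against the Char production of the XML 1.0 specification; XmlTextCleaner and its methods are hypothetical names.

```java
/** Hypothetical helper: removes code points that XML 1.0 does not permit in a document. */
final class XmlTextCleaner {

    /** True if the code point is allowed by the XML 1.0 Char production. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Copies the text, skipping every code point that fails isXmlChar. */
    static String stripInvalidXmlChars(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints()
                .filter(XmlTextCleaner::isXmlChar)
                .forEach(out::appendCodePoint);
        return out.toString();
    }
}
```

This keeps legitimate non-ASCII letters and math symbols. Characters such as & and < are valid but still need escaping, which is normally left to the XML writer rather than done by hand.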

Removing comments from code character by character [Java]

I need to remove comments from code, but in this case I'll have to do it without using
System.out.println(sourceCode.replaceAll("//.*|/\\*((.|\\n)(?!=*/))+\\*/", ""));
The program needs to check the code character by character to look for "/" and then proceed to check if the next character is "/" or "*".
I'm looking for a good way to read through the code and check characters letter by letter
This is a classic exercise for people learning Java. I would suggest a simple approach, since the exercise is meant to help you practice your coding skills.
Read the Java source file into your program character by character.
Look for the start of a comment. There are two cases: /* and //.
Open a string buffer and write the characters you read into it.
If you find /*, do not write it to the buffer. Keep moving to the next character until you find */.
Repeat until the end of the file is reached.
To remove single-line comments, follow the same algorithm but skip until you reach a newline character.
If you need help reading a file character by character, refer to the Java documentation.
When the end of the file is reached, write the string buffer back to the file.
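The steps above can be sketched roughly as follows, working on a String that already holds the file content (reading and writing the file are left out). CommentStripper is a hypothetical name. One addition that the steps do not mention: string and character literals are copied verbatim, so that a // inside "http://..." is not mistaken for a comment. Text blocks (""") are not handled.

```java
/** Hypothetical helper: removes // and block comments from Java source, one character at a time. */
final class CommentStripper {

    static String strip(String src) {
        StringBuilder out = new StringBuilder(src.length());
        int n = src.length();
        int i = 0;
        while (i < n) {
            char c = src.charAt(i);
            char next = (i + 1 < n) ? src.charAt(i + 1) : '\0';
            if (c == '/' && next == '/') {
                // line comment: skip up to, but not including, the line break
                while (i < n && src.charAt(i) != '\n' && src.charAt(i) != '\r') {
                    i++;
                }
            } else if (c == '/' && next == '*') {
                // block comment: skip until the closing */ (or the end of input)
                i += 2;
                while (i + 1 < n && !(src.charAt(i) == '*' && src.charAt(i + 1) == '/')) {
                    i++;
                }
                i = Math.min(i + 2, n);
            } else if (c == '"' || c == '\'') {
                // string or char literal: copy verbatim, honouring backslash escapes
                char quote = c;
                out.append(c);
                i++;
                while (i < n && src.charAt(i) != quote) {
                    if (src.charAt(i) == '\\' && i + 1 < n) {
                        out.append(src.charAt(i));
                        i++;
                    }
                    out.append(src.charAt(i));
                    i++;
                }
                if (i < n) {
                    out.append(quote);
                    i++;
                }
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }
}
```

The line break after a // comment is kept on purpose, so the line numbers of the remaining code do not shift.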

How do I use Super CSV on a delimited file that does not use a quote character?

I'm using Super CSV to parse a pipe ("|") separated file. The file does not use "text qualifiers", or what Super CSV calls a quote character. The problem is that Super CSV requires a quote character. I don't see a way to skip this, or provide a null character. Currently I'm passing some wacky unicode character that hopefully never appears in the input file.
Is there a way to have Super CSV parse a file without using a quote character?
I'm guessing that you don't have control of how the file to parse is written, and that it will never contain embedded pipe characters in the data?
The solutions I can see are:
Use a character that will never appear in your file (as you've suggested). This is a little dodgy, but will work.
Supply your own Tokenizer when you construct your Reader (you can copy the Super CSV implementation and just remove the quoting functionality).
Send us a feature request and we'll consider adding it. It may be simply a case of adding another preference which disables quoting when parsing.
I'll have a think about this, and see if I can think of the best way to achieve this.
Use the delimiter character as the quote character. E.g.:
CsvPreference cp = new CsvPreference('|'/*quote char*/,'|'/*delimiter char*/, "\n");

Parsing of data structure in a plain text file

How would you parse a structure similar to this in Java?
\\Header (name)\\\
1JohnRide 2MarySwanson
1 password1
2 password2
\\\1 block of data name\\\
1.ABCD
2.FEGH
3.ZEY
\\\2-nd block of data name\\\
1. 123232aDDF dkfjd ksksd
2. dfdfsf dkfjd
....
etc
Suppose it comes from a text buffer (a plain file).
Each line of text is terminated by "\n". Spaces separate the words.
The structure is more or less defined. There may be some ambiguity, though: the
number of fields in each line may differ, some blocks of data may be
missing, and the number of lines in each block may vary as well.
The question is how to do this most effectively.
The first solution that comes to mind is to use regular expressions.
But are there other solutions? Problem-oriented ones? Maybe a Java library that already does this?
Check out UTAH: https://github.com/sonalake/utah-parser
It's a tool that's pretty good at parsing this kind of semi-structured text.
Since no one has recommended a library, my suggestion would be to use regular expressions.
From what you have posted it looks like the data is delimited by whitespace. One idea is to use a Scanner or a StringTokenizer to get one token at a time. You can then check the first char of a token to see if it is a digit (in which case the part of the token after the digit(s) will be the data, if there is any).
This sounds like a homework problem so I'm going to try to answer it in such a way to help guide you (not give the final solution).
First, you need to consider each object of data you're reading. Is it a number then a text field? A number then 3 text fields? Variable numbers and text fields?
After that you need to determine what you're going to use to delimit each field and each object. For example, in many files you'll see something like a semi-colon between the fields and a new line for the end of the object. From what you said it sounds like yours is different.
If an object can go across multiple lines you'll need to bear that in mind (don't stop partway through an object).
Hopefully that helps. If you research this and you're still having problems post the code you've got so far and some sample data and I'll help you to solve your problems (I'll teach you to fish....not give you fish :-) ).
If the fields are fixed length, you could use a DataInputStream to read your file. Or, since your format is line-based, you could use a BufferedReader to read lines and write yourself a state machine which knows what kind of line to expect next, given what it's already seen. Once you have each line as a string, then you just need to split the data appropriately.
E.g., the password can be gotten from your password line like this:
final int pos = line.indexOf(' ');
String passwd = line.substring(pos+1, line.length());
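The BufferedReader plus state machine idea could look roughly like this. It is a sketch under assumptions that the question does not confirm: any line starting with a backslash is treated as a block header, the header name is whatever remains after removing the backslashes, and every other non-empty line belongs to the most recent header. BlockParser is a hypothetical name.

```java
import java.io.BufferedReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper: groups the lines of the file under the header that precedes them. */
final class BlockParser {

    static Map<String, List<String>> parse(BufferedReader reader) {
        // LinkedHashMap keeps the blocks in the order they appear in the file
        Map<String, List<String>> blocks = new LinkedHashMap<>();
        List<String> current = null; // state: the block currently being filled
        for (String line : reader.lines().collect(Collectors.toList())) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            if (trimmed.startsWith("\\")) {
                // header line: switch state to a new (or repeated) block
                String name = trimmed.replace("\\", "").trim();
                current = blocks.computeIfAbsent(name, k -> new ArrayList<>());
            } else if (current != null) {
                current.add(trimmed);
            }
            // lines that appear before the first header are ignored in this sketch
        }
        return blocks;
    }
}
```

Each collected line can then be split further, for example with indexOf(' ') as in the password example above, once you know which block it belongs to.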
