Java Buffered Reader detecting patterns in phrases


I want my program to be able to read in a file of java code and be able to identify the different methods. Is this possible to do with a buffered reader or should I be doing something different? Since methods can return any type (String/void/int/etc) and can be of many different types of modifier (private/public etc) I don't see how I can identify them easily.
public returnType methodName(String s){
How can I get my program to read that in and automatically detect that it is of the same format as:
private Set<String> nextstates(int newInt)

You can use regular expressions to search the file for method definitions. Read the file line by line, using a BufferedReader for example, and test every line against the regex. One possible regex is the one suggested in the following post by Georgios Gousios.
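As a rough sketch of that approach (this is not the regex from the post mentioned above, which is not reproduced here), the following reads lines with a BufferedReader and tests each one against a simplified pattern. The class name MethodFinder, the helper names, and the pattern itself are assumptions made for illustration. The pattern only handles signatures written on a single line and will miss nested generics such as Map<String, List<String>>.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MethodFinder {
    // Hypothetical, simplified pattern: optional modifiers, a return type
    // (optionally generic or an array), a method name, and a parameter list.
    private static final Pattern METHOD = Pattern.compile(
        // modifiers; possessive (*+) so a modifier is never re-read as a type
        "^\\s*(?:(?:public|protected|private|static|final|abstract|synchronized|native)\\s+)*+"
        // skip statements such as "return foo(x);" or "new Foo(x)"
        + "(?!(?:return|new|else|throw)\\b)"
        + "([\\w.]+(?:<[^>]*>)?(?:\\[\\])*)\\s+"  // group 1: return type
        + "(\\w+)\\s*"                             // group 2: method name
        + "\\(([^)]*)\\)");                        // group 3: parameters

    /** Returns the method name if the line looks like a declaration, else null. */
    static String methodNameOf(String line) {
        Matcher m = METHOD.matcher(line);
        return m.find() ? m.group(2) : null;
    }

    /** Collects the names of all methods declared on single lines. */
    static List<String> findMethodNames(BufferedReader reader) throws IOException {
        List<String> names = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            String name = methodNameOf(line);
            if (name != null) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        String source = "public returnType methodName(String s){\n"
                      + "private Set<String> nextstates(int newInt)\n"
                      + "int x = 5;\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(source))) {
            System.out.println(findMethodNames(reader)); // prints [methodName, nextstates]
        }
    }
}
```

For anything beyond quick line matching, a real Java parser is likely to be more reliable than a regex, since signatures can span lines and contain annotations or comments.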

Related

How do I use Super CSV on a delimited file that does not use a quote character?

I'm using Super CSV to parse a pipe ("|") separated file. The file does not use "text qualifiers", or what Super CSV calls a quote character. The problem is that Super CSV requires a quote character. I don't see a way to skip this, or provide a null character. Currently I'm passing some wacky unicode character that hopefully never appears in the input file.
Is there a way to have Super CSV parse a file without using a quote character?

I'm guessing that you don't have control of how the file to parse is written, and that it will never contain embedded pipe characters in the data?
The solutions I can see are:
Use a character that will never appear in your file (as you've suggested). This is a little dodgy, but will work.
Supply your own Tokenizer when you construct your Reader (you can copy the Super CSV implementation and just remove the quoting functionality).
Send us a feature request and we'll consider adding it. It may be simply a case of adding another preference which disables quoting when parsing.
I'll have a think about this, and see if I can think of the best way to achieve this.
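If the assumption above holds and the data never contains embedded pipe characters, a line can also be split without any quoting logic at all. The sketch below is plain Java, not Super CSV, and the class and method names are made up for illustration.

```java
import java.util.Arrays;

public class PipeSplit {
    /** Splits one line on '|' with no notion of a quote character. */
    static String[] splitPipes(String line) {
        // The pipe is a regex metacharacter, so it must be escaped.
        // A limit of -1 keeps trailing empty columns instead of dropping them.
        return line.split("\\|", -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitPipes("a|b||")));
    }
}
```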

Use the delimiter character as the quote character. E.g.:
CsvPreference cp = new CsvPreference('|'/*quote char*/,'|'/*delimiter char*/, "\n");

Detect when file can't be created because of bad characters in name

Can anyone tell me how to cope with illegal file names in java? When I run the following on Windows:
File badname = new File("C:\\Temp\\a:b");
System.out.println(badname.getAbsolutePath()+" length="+badname.length());
FileWriter w = new FileWriter(badname);
w.write("hello world");
w.close();
System.out.println(badname.getAbsolutePath()+" length="+badname.length());
The output shows that the file has been created and has the expected length, but in C:\Temp all I can see is a file called "a" with 0 length. Where is java putting the file?
What I'm looking for is a reliable way to throw an error when the file can't be created. I can't use exists() or length() - what other options are there?

In that particular example, the data is being written to a named stream. You can see the data you've written from the command line as follows:
more < .\a:b
For information about valid file names, look here.
To answer your specific question: exists() should be sufficient. Even in this case, after all, the data is being written to the designated location - it just wasn't where you expected it to be! If you think this case will cause problems for your users, check for the presence of a colon in the file name.

I would suggest looking at Regular Expressions. They allow you to break apart a string and see if certain characteristics apply. The other method that would work is splitting the String into a char[], and then processing each point to see what's in it, and if it's legal... but I think RegEx would work much better.

You should take a look at Regular Expressions and create a pattern which will match any illegal character, something like this:
String fileName = "...";
Pattern pattern = Pattern.compile("[:;!?]");
Matcher matcher = pattern.matcher(fileName);
if (matcher.find())
{
//Do something when the file name has an illegal character.
}
Note: I have not tested this code, but it should be enough to get you on the right track. The pattern matches any string that contains a ':', ';', '!' or '?'. Feel free to add or remove characters as you see fit.

You can use File.renameTo(File dest);.

Get the file name first:
String fileName = fullPath.substring(fullPath.lastIndexOf('\\') + 1);
Create an array of all special chars not allowed in file names.
For each char in the array, check whether fileName contains it. I guess Java has a pre-built API for this.
Check this.
Note: This solution assumes that parent directory exists
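The steps above can be sketched as follows. The set of rejected characters is an assumption based on the characters Windows commonly disallows in file names (< > : " / \ | ? *), and the class and method names are hypothetical. Adjust the set for the file systems you need to support.

```java
import java.util.regex.Pattern;

public class FileNameCheck {
    // Assumed set of characters not allowed in a Windows file name: < > : " / \ | ? *
    private static final Pattern ILLEGAL = Pattern.compile("[<>:\"/\\\\|?*]");

    /** Returns the last path segment, splitting on either separator. */
    static String fileNameOf(String fullPath) {
        int cut = Math.max(fullPath.lastIndexOf('\\'), fullPath.lastIndexOf('/'));
        return fullPath.substring(cut + 1);
    }

    /** True if the file name part of the path contains a rejected character. */
    static boolean hasIllegalChar(String fullPath) {
        return ILLEGAL.matcher(fileNameOf(fullPath)).find();
    }

    public static void main(String[] args) {
        System.out.println(hasIllegalChar("C:\\Temp\\a:b"));    // prints true
        System.out.println(hasIllegalChar("C:\\Temp\\ab.txt")); // prints false
    }
}
```

Checking the name before opening a FileWriter lets the program throw its own error instead of silently writing to a named stream.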

What is simplest way to read a file into String? [duplicate]

This question already has answers here:
How do I create a Java string from the contents of a file?
(35 answers)
Closed 7 years ago.
I am trying to read a simple text file into a String. Of course there is the usual way of getting the input stream and iterating with readLine() and reading contents into String.
Having done this hundreds of times in the past, I wondered how I can do this in the fewest lines of code. Isn't there something in Java like String fileContents = XXX.readFile(myFile/*File*/), or anything that looks as simple as this?
I know there are libraries like Apache Commons IO which provide such simplifications, and I could even write a simple Util class to do this. But what I wonder is: this is such a frequent operation that everyone needs it, so why doesn't Java provide such a simple function? Isn't there really a single method somewhere to read a file into a String with some default or specified encoding?

Yes, you can do this in one line (though for robust IOException handling you wouldn't want to).
String content = new Scanner(new File("filename")).useDelimiter("\\Z").next();
System.out.println(content);
This uses a java.util.Scanner, telling it to delimit the input with \Z, which is the end of the string anchor. This ultimately makes the input have one actual token, which is the entire file, so it can be read with one call to next().
There is a constructor that takes a File and a String charSetName (among many other overloads). Both constructors may throw FileNotFoundException, but like all Scanner methods, no IOException can be thrown beyond these constructors.
You can query the Scanner itself through the ioException() method if an IOException occurred or not. You may also want to explicitly close() the Scanner after you read the content, so perhaps storing the Scanner reference in a local variable is best.
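A slightly longer version that follows this advice might look like the sketch below. It keeps the Scanner in a local variable, closes it with try-with-resources, and checks ioException() afterwards. The class name and the use of a temporary file in main are assumptions made so the example can run on its own.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Scanner;

public class ScannerReadAll {
    /** Reads the whole file as one token, decoding it as UTF-8. */
    static String readAll(File file) throws IOException {
        try (Scanner scanner = new Scanner(file, "UTF-8")) {
            scanner.useDelimiter("\\Z");
            // hasNext() is false for an empty file, where next() would throw.
            String content = scanner.hasNext() ? scanner.next() : "";
            // Scanner swallows read errors; surface one if it happened.
            IOException failure = scanner.ioException();
            if (failure != null) {
                throw failure;
            }
            return content;
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("scanner-demo", ".txt");
        file.deleteOnExit();
        Files.write(file.toPath(), "line one\nline two".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(file));
    }
}
```

Note that \Z matches before a final line terminator, so a trailing newline at the end of the file is not part of the returned string.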
See also
Java Tutorials - I/O Essentials - Scanning and formatting
Related questions
Validating input using java.util.Scanner - has many examples of more typical usage
Third-party library options
For completeness, these are some really good options if you have these very reputable and highly useful third party libraries:
Guava
com.google.common.io.Files contains many useful methods. The pertinent ones here are:
String toString(File, Charset)
Using the given character set, reads all characters from a file into a String
List<String> readLines(File, Charset)
... reads all of the lines from a file into a List<String>, one entry per line
Apache Commons/IO
org.apache.commons.io.IOUtils also offers similar functionality:
String toString(InputStream, String encoding)
Using the specified character encoding, gets the contents of an InputStream as a String
List readLines(InputStream, String encoding)
... as a (raw) List of String, one entry per line
Related questions
Most useful free third party Java libraries (deleted)?

From Java 7 (API Description) onwards you can do:
new String(Files.readAllBytes(Paths.get(filePath)), StandardCharsets.UTF_8);
Where filePath is a String representing the file you want to load.

You can use apache commons IO..
FileInputStream fisTargetFile = new FileInputStream(new File("test.txt"));
String targetFileStr = IOUtils.toString(fisTargetFile, "UTF-8");

This should work for you:
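import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
public class ReadFileExample {
public static void main(String[] args) throws IOException {
// Reads the whole file; the bytes are decoded with the default charset.
String content = new String(Files.readAllBytes(Paths.get("abc.java")));
System.out.println(content);
}
}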

Using Apache Commons IO.
import org.apache.commons.io.FileUtils;
//...
String contents = FileUtils.readFileToString(new File("/path/to/the/file"), "UTF-8");
You can see the javadoc for the method for details.

Don't write your own util class to do this - I would recommend using Guava, which is full of all kinds of goodness. In this case you'd want either the Files class (if you're really just reading a file) or CharStreams for more general purpose reading. It has methods to read the data into a list of strings (readLines) or totally (toString).
It has similar useful methods for binary data too. And then there's the rest of the library...
I agree it's annoying that there's nothing similar in the standard libraries. Heck, just being able to supply a CharSet to a FileReader would make life a little simpler...

Another alternative approach is:
How do I create a Java string from the contents of a file?
Another option is to use the utilities provided by open source libraries:
http://commons.apache.org/io/api-1.4/index.html?org/apache/commons/io/IOUtils.html
Why doesn't Java provide such a common utility API?
a) to keep the APIs generic so that encoding, buffering etc is handled by the programmer.
b) make programmers do some work and write/share opensource util libraries :D ;-)

Sadly, no.
I agree that such a frequent operation should have an easier implementation than copying the input line by line in a loop, but you'll have to either write a helper method or use an external library.

I discovered that the accepted answer actually doesn't always work, because \\Z may occur in the file. Another problem is that if you don't have the correct charset a whole bunch of unexpected things may happen which may cause the scanner to read only a part of the file.
The solution is to use a delimiter which you are certain will never occur in the file. However, this is theoretically impossible. What we CAN do, is use a delimiter that has such a small chance to occur in the file that it is negligible: such a delimiter is a UUID, which is natively supported in Java.
String content = new Scanner(file, "UTF-8")
.useDelimiter(UUID.randomUUID().toString()).next();

Importing from Text File Java question

I decided to create a currency converter in Java, and have it so that it would pull the conversion values out of a text file (to allow for easy editability since these values are constantly changing). I did manage to do it by using the Scanner class and putting all the values into an ArrayList.
Now I'm wondering if there is a way to add comments to the text file for the user to read, which Scanner will ignore. "//" doesn't seem to work.
Thanks

The best way would be to read the file line by line using java.io.BufferedReader and check each line for comments using String#startsWith(), searching for "//".
But have you considered using a properties file and managing it with the java.util.Properties API? That way you benefit from a ready-made specification and API, and you can use # to start a comment line. Also see the tutorial at sun.com.
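A minimal sketch of the properties-file idea follows. The currency codes and rates are made-up placeholder values, and the class and method names are hypothetical. In a real program the text would come from a FileReader rather than a String.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class RatesFromProperties {
    /** Parses properties-format text; lines starting with # are comments. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try (StringReader reader = new StringReader(text)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Looks up one conversion rate by its currency code. */
    static double rate(Properties props, String code) {
        return Double.parseDouble(props.getProperty(code));
    }

    public static void main(String[] args) {
        String text = "# Rates relative to USD (placeholder values)\n"
                    + "EUR=0.5\n"
                    + "# Updated by hand\n"
                    + "GBP=0.25\n";
        Properties rates = parse(text);
        System.out.println(rate(rates, "EUR")); // prints 0.5
    }
}
```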

Scanner won't ignore anything; you will have to remove the comments from your data after you have read it in.

Yea, while ((currentLine = bufferedReader.readLine()) != null) is possibly the easiest, then perform your necessary tests. currentLine.split(regex) is also very handy for converting a line into an array of values using a delimiter.
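That loop could be sketched as below, assuming a hypothetical file layout of one whitespace-separated "CODE rate" pair per line, with "//" starting a comment line. The class and method names are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class CommentSkippingReader {
    /** Returns the values on a line, or null for blank and "//" comment lines. */
    static String[] parseLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("//")) {
            return null;
        }
        return trimmed.split("\\s+");
    }

    /** Reads every data row, skipping comments and blank lines. */
    static List<String[]> readRows(BufferedReader bufferedReader) throws IOException {
        List<String[]> rows = new ArrayList<>();
        String currentLine;
        while ((currentLine = bufferedReader.readLine()) != null) {
            String[] values = parseLine(currentLine);
            if (values != null) {
                rows.add(values);
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        String data = "// placeholder rates\nEUR 0.5\n\nGBP 0.25\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(data))) {
            System.out.println(readRows(reader).size()); // prints 2
        }
    }
}
```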

With Java nio, you could do something like this. Assuming you want to ignore lines that start with "//" and end up with an ArrayList.
List<String> dataList;
Path path = FileSystems.getDefault().getPath(".", "data.txt");
dataList = Files.lines(path)
.filter(line -> !(line.startsWith("//")))
.collect(Collectors.toCollection(ArrayList::new));

What's the best way to have stringTokenizer split up a line of text into predefined variables

I'm not sure if the title is very clear, but basically what I have to do is read a line of text from a file and split it up into 8 different string variables. Each line will have the same 8 chunks in the same order (title, author, price, etc). So for each line of text, I want to end up with 8 strings.
The first problem is that the last two fields in the line may or may not be present, so I need to do something with stringTokenizer.hasMoreTokens, otherwise it will die messily when fields 7 and 8 are not present.
I would ideally like to do it in one while or for loop, but I'm not sure how to tell that loop what the order of the fields is going to be so it can fill all 8 (or 6) strings correctly. Please tell me there's a better way than using 8 nested if statements!
EDIT: The String.split solution seems definitely part of it, so I will use that instead of stringTokenizer. However, I'm still not sure what the best way of feeding the individual strings into the constructor. Would the best way be to have the class expecting an array, and then just do something like this in the constructor:
isbn = line[1];
title = line[2];

The best way is to not use a StringTokenizer at all, but use String's split method. It returns an array of Strings, and you can get the length from that.
For each line in your file you can do the following:
String[] tokens = line.split("#");
tokens will now have 6 to 8 Strings. Use tokens.length (a field, not a method) to find out how many, then create your object from the array.
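One way to create the object from the array is sketched below. The class name BookRecord, the "#" delimiter taken from the line above, and the field order (title first, author second) are assumptions, since the question does not show a sample line. Missing optional fields are stored as empty strings so the rest of the program never has to check the length again.

```java
import java.util.Arrays;

public class BookRecord {
    static final int FIELD_COUNT = 8; // six required fields plus two optional ones
    static final int REQUIRED = 6;

    // Always length 8; optional fields that were absent are "".
    private final String[] fields;

    BookRecord(String[] tokens) {
        if (tokens.length < REQUIRED || tokens.length > FIELD_COUNT) {
            throw new IllegalArgumentException(
                "expected 6 to 8 fields but got " + tokens.length);
        }
        fields = Arrays.copyOf(tokens, FIELD_COUNT);
        for (int i = tokens.length; i < FIELD_COUNT; i++) {
            fields[i] = "";
        }
    }

    /** Builds a record from one line of the file. */
    static BookRecord parse(String line) {
        return new BookRecord(line.split("#"));
    }

    String title()  { return fields[0]; } // assumed position
    String author() { return fields[1]; } // assumed position
    String field(int index) { return fields[index]; }

    public static void main(String[] args) {
        BookRecord book = parse("Some Title#Some Author#9.99#a#b#c");
        System.out.println(book.title() + " / " + book.field(7).isEmpty());
    }
}
```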

Regular expression is the way. You can convert your incoming String into an array of String using the split method
http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#split(java.lang.String)

Would a regular expression with capture groups work for you? You can certainly make parts of the expression optional.
An example line of data or three might be helpful.

Is this a CSV or similar file by any chance? If so, there are libraries to help you, for example Apache Commons CSV (link to alternatives on their page too). It will get you a String[] for each line in the file. Just check the array size to know what optional fields are present.
