I'm trying to create a restaurant system that will create food items and such from a menu.
I'll be learning JDBC soon and I'm sure that would help, but for now I think the simplest way is to create my menu in Notepad.
What's the best way to lay out and read from a Notepad file like a menu?
Please try to speak clearly; I'm not exactly sure of all the terminology.
This one looks promising, but I've no idea what's going on.
/////////////////////////////////////////////////////////////////////
I'm still stuck with this.
I've decided to make a separate method for reading the file.
I've tried every example I can think of. Could someone just show me an example of how to define a file's classpath?
If I type menu.txt it just doesn't work.
Have a look at Sun's Java Tutorial
The easiest option is to simply use the Apache Commons IO JAR and import the org.apache.commons.io.FileUtils class. There are many possibilities when using this class, but the most obvious would be as follows:
List<String> lines = FileUtils.readLines(new File("untitled.txt"));
It's that easy.
"Don't reinvent the wheel."
Can I ask what sort of content/data you will be reading from this file as there may be other (even simpler) possibilities?
e.g.
Properties
foo="bar"
String Tokens
foo,bar,fu,baz
Let me know if you require more details with any of the processes I've mentioned.
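For instance, a minimal sketch of the Properties route (the file name and key are hypothetical):
import java.io.FileInputStream;
import java.util.Properties;

Properties menu = new Properties();
menu.load(new FileInputStream("menu.properties")); // hypothetical file containing e.g. pizza=8.50
String price = menu.getProperty("pizza");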
http://java.sun.com/j2se/1.5.0/docs/api/java/io/BufferedReader.html#readLine()
"Notepad" files are just text files, so you just read it in with a a Reader instance. Since Notepad supports windows Unicode, you may need to specify a charset of "UTF-16LE".
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

String filename = "myfile.txt";
// try-with-resources closes the reader automatically, even if an exception is thrown
try (BufferedReader reader = new BufferedReader(new FileReader(filename))) {
    String line;
    // as long as there are lines in the file, print them
    while ((line = reader.readLine()) != null) {
        System.out.println(line);
    }
} catch (IOException e) {
    e.printStackTrace();
}
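If the file was saved as Unicode from Notepad, here is a variant of the same setup that specifies the charset mentioned above (a sketch; a plain FileReader cannot take an encoding on older Java versions):
import java.io.FileInputStream;
import java.io.InputStreamReader;

BufferedReader reader = new BufferedReader(
        new InputStreamReader(new FileInputStream(filename), "UTF-16LE"));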
The basic idea is that you create a reader object
BufferedReader fr = new BufferedReader(new FileReader("file.txt"));
and then go over the file line by line, parsing each line and saving the pieces to some internal data storage (an array, a HashMap).
The while loop in the example you have does just this. The readLine() method will take care of the line endings for you and will return null when there are no more lines to be read. What you need to do inside the while loop is parse each line and separate the different bits of data (course name, price, etc.) from each other.
EDIT: To parse the lines you would do something like the following. What goes inside the while loop depends on how you format the menu files. The code below assumes that each line of the menu file contains the price and the name of the course (in that order), separated by a comma:
12.95$,Penne ala Arabiata
8.15$,Fish Soup
Notice that you can't use a comma inside the price if you do this. You can of course use a semicolon as the separator between the data fields instead of a comma. The number of data fields is also up to you.
String line = "";
// read lines from file
while ((line = fr.readLine()) != null) {
// parse each line
tokens = line.split(",");
String price = tokens[0];
String courseName = tokens[1];
// extract all other information
}
In your final code you'll want to save the data fields into some structure instead of just extracting them from the file. Another thing to note is that the price is a String, NOT a number, because of the dollar sign. Should you wish to do any calculations with the prices, you'll of course need to convert it to a number with parseFloat() or parseDouble().
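For instance, a minimal sketch of that conversion for a price like "12.95$" (the format shown above):
String price = "12.95$";
// strip the trailing dollar sign before parsing
double amount = Double.parseDouble(price.replace("$", ""));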
And of course, if you do use the CSV (comma-separated values) format, it's better to go for a CSV library to do the parsing instead of writing the parser yourself.
http://opencsv.sourceforge.net/
I was supposed to write a method that reads a DNA sequence in order to test some string matching algorithms on it.
I took some existing code I use to read text files (I don't really know any other way):
try {
    FileReader fr = new FileReader(file);
    BufferedReader br = new BufferedReader(fr);
    while ((line = br.readLine()) != null) {
        seq += line;
    }
    br.close();
}
catch (FileNotFoundException e) { e.printStackTrace(); }
catch (IOException e) { e.printStackTrace(); }
This seems to work just fine for small text files with ~3000 characters, but it takes forever (I cancelled it after 10 minutes) for files containing more than 45 million characters.
Is there a more efficient way of doing this?
One thing I notice is that you are doing seq += line. seq is probably a String? If so, remember that Strings are immutable, so what you are actually doing is creating a new String every time you append a line. Use a StringBuilder instead. Also, if possible, don't build the whole string first and then process it; that way you traverse the data twice. Ideally you want to process as you read, but I don't know your situation.
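A minimal sketch of the same loop using StringBuilder (assuming the same file variable as in the question):
StringBuilder seq = new StringBuilder();
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
    String line;
    while ((line = br.readLine()) != null) {
        seq.append(line); // appends in place; no new String per iteration
    }
} catch (IOException e) {
    e.printStackTrace();
}
String sequence = seq.toString();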
The main element slowing your progress is the "concatenation" of the String seq and line when you call seq += line. I use quotes around "concatenation" because in Java, Strings cannot be modified once they are created (i.e. they are immutable, as user1598503 mentioned). Initially this is not an issue, as the Strings are small; however, once the Strings become very long, e.g. hundreds of thousands of characters, memory must be reallocated for each new String, which takes quite a bit of time. StringBuilder will let you do these concatenations in place, meaning you will not be creating a new object every single time.
Your problem is not that the reading takes too much time, but that the concatenating takes too much time. Just to verify this I ran your code (it didn't finish) and then simply commented out the line seq += line, and it ran in under a second. You could try using seq = seq.concat(line), since it has been reported to be quite a bit faster most of the time, but I tried that too and it didn't finish within 1-2 minutes (for a 9.6 MB input file). My solution would be to store your lines in an ArrayList (or a container of your choice); the ArrayList version finished in about 2-3 seconds with the same input file (so the content of your while loop would be list.add(line);). If you really, really want to store your entire file in a single string, you could do something like this (using the Scanner class):
String content = new Scanner(new File("input")).useDelimiter("\\Z").next();
This works in a matter of seconds as well. I should mention that "\Z" is the end-of-file delimiter, which is why it reads the whole thing in one swoop.
I read a text file containing a list of words with their tags and put them as ArrayLists inside a wrapping ArrayList (an ArrayList of ArrayLists).
[[1, that, that, that, DT, DT], [2, table, table, table, NN, NN]]
Now I want to write them to a text file in the same format, as follows:
1 that that that DT DT
2 table table table NN NN
Each of the above rows is an ArrayList with six columns.
The following code returns a file with Ԁ inside.
public void setPPOSOfWordInDevelopmentList(ArrayList<ArrayList> trainingList) {
    try {
        FileOutputStream streamFile = new FileOutputStream("developmentFile.txt");
        ObjectOutputStream streamFileWriter = new ObjectOutputStream(streamFile);
        for (ArrayList word : developmentWordsList) {
            String inputWord = (String) word.get(1);
            extractTag(inputWord, trainingList);
            String extractedPPOSofWord = (String) findMaxTag().get(1);
            word.set(5, extractedPPOSofWord);
        }
        streamFileWriter.close();
        System.out.println(developmentWordsList);
    }
    catch (Exception e) {
        System.out.println("Something went wrong, check the code");
    }
}
This code is coupled with some others, so it is not easy to change the format of the objects returned by the functions.
If you want to write a simple text file, it would be better to use a BufferedWriter. For your content, you can format it in a StringBuffer or a StringBuilder if it is long. In this post (linked below), I replied to a question related to the kind of formatting you're trying to do, but you will need to adapt it to your format and to the logic of using a wrapping array.
Export array values to csv file java
I think the loop, or "enhanced for" statement, should be used something like this:
StringBuilder sb = new StringBuilder();
for (ArrayList<String> innerArray : wrapperArray) {
    for (String word : innerArray) {
        sb.append(word).append(' '); // adapt this to your required format
    }
    sb.append('\n'); // one row per inner list
}
// here at the end, save the content of your StringBuilder or StringBuffer using the BufferedWriter
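A minimal sketch of that final save (reusing the file name from the question):
BufferedWriter writer = new BufferedWriter(new FileWriter("developmentFile.txt"));
writer.write(sb.toString());
writer.close();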
Hope you can get an idea on how to achieve this. Best regards :)
What you want sounds eerily like a standard CSV file. This stackoverflow thread will set you straight on how to parse that sort of content. I would strongly recommend that you refactor along the lines of a CSV file instead of using the ObjectInput/OutputStreams. It'll be easier to maintain and you'll be able to use tools like Excel and OpenOffice Calc to view your files when debugging.
If you are certain you want a custom file format, you can use formatted printing and add padding accordingly. It's pretty easy:
for (ArrayList<String> list : trainingList) {
    // writeToStream is a stand-in for however you write one line to your output
    writeToStream(
        String.format("%s %-5s %-5s %-5s %-5s %-5s",
            list.get(0), list.get(1), list.get(2),
            list.get(3), list.get(4), list.get(5)));
}
This should work as long as your strings aren't longer than five characters. Just keep in mind that blank characters are bad delimiters, and you will face alignment problems if you view the output in anything other than a monospaced font.
I need to split a text file into various fields. I can control the way in which the values are divided; however, since there are occasionally commas within the values, I can't use plain CSV. What is the best way to go about importing the file? Would TAB be a better delimiter?
The issue lies within Lippincott, Williams & Wilkins. That's all one field.
Example data
History of Education Quarterly,1748-5959,na,Wiley-Blackwell,
History of Political Economy,1527-1919,0018-2702,Duke University Press,
History of Political Economy - Annual Supplement,na,missing,
History Teacher,0018-2745,0018-2745,The Society for History Education,
History Today,na,0018-2753,History Today Limited,
Home Healthcare Nurse,na,0884-741X,Lippincott, Williams & Wilkins,
Hospitality Law,na,0889-5414,LRP Publications,
Hudson Review,na,0018-702X,Hudson Review Incorporated,
Humanist - DC,na,0018-7399,American Humanist Associatioin,
Idealistic Studies,na,0894-5373,F&W Media,
Instead of hard-coding a delimiter, why not make it a configurable parameter? Then, if the input should ever change, you can easily adapt without having to rewrite anything.
If that's not an option, TAB or | seem like reasonable choices without knowing more about the input.
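A tiny sketch of the configurable-parameter approach (line and delimiter are assumed inputs):
import java.util.regex.Pattern;

public static String[] splitRecord(String line, String delimiter) {
    // Pattern.quote treats the delimiter literally, so "|" or "." won't be misread as regex syntax
    return line.split(Pattern.quote(delimiter));
}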
Whichever separator you choose, it will break the day one of your data values contains it. So why not embrace a CSV library that makes sure the separators are escaped when required, and reads them back effortlessly as well?
Here's how you would do it with OpenCSV
String[] values = {"one", "two,three", "four , five"};

CSVWriter writer = new CSVWriter(new FileWriter("yourfile.csv"));
writer.writeNext(values);
writer.close();

CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
String[] entries;
// reading just the first line
if ((entries = reader.readNext()) != null) {
    System.out.println(entries[0] + ", " + entries[1] + ", " + entries[2]);
}
reader.close();
You can actually use any custom separator with OpenCSV, like a tab ('\t'), if you want:
CSVWriter writer = new CSVWriter(new FileWriter("yourfile.csv"), '\t');
But using CSV makes your data files compatible with many other tools as well. So it entirely depends on your compatibility requirements for your data files, if any.
This is my first post here. I'm excited to finally take part.
I'm working on a project where I'm parsing obscure file types. I need to be able to parse Word documents (which I've already done), .sbs, .day, .cmp, and more. All of these types can be opened and displayed simply with Notepad.
Since I'm so new to this stuff, is there a way I can use some generic library (or two) to open all of these up? And if so what library would it be?
What's a best practice in this sort of circumstance?
Thanks!
You could use the Apache Commons IO library. The FileUtils class has several methods that receive the file path and optionally the file encoding.
If you just want to read a text file and save it to a String variable:
java.io.File file = new java.io.File("C:\\dir\\file.cmp");
String allWordAndLines = org.apache.commons.io.FileUtils.readFileToString(file);
If you want each line separately, stored in a collection:
java.util.List<String> lines = org.apache.commons.io.FileUtils.readLines(file);
for (String line : lines) {
    // do something with the line
}
To specify the encoding, you need to add another parameter:
org.apache.commons.io.FileUtils.readFileToString(file, "UTF-8");
org.apache.commons.io.FileUtils.readLines(file, "Cp1252");
Java includes several classes for reading files; see more at http://docs.oracle.com/javase/tutorial/essential/io/index.html
I hope this helps if you are only looking to have your text file available in memory.
I am reading about 600 text files and then parsing each file individually, adding all the terms to a map so I can know the frequency of each word across the 600 files (about 400 MB in total).
My parser function includes the following steps (in order):
find the text between two tags, which is the relevant text to read in each file
lowercase all the text
split the string on multiple delimiters (String.split)
create an ArrayList with hyphenated words like "aaa-aa", add those to the split result, and remove "aaa" and "aa" from the String[] (I did this because I wanted "-" to be a delimiter, but I also wanted "aaa-aa" to stay one word, not "aaa" and "aa")
put the String[] into a HashMap of (word, frequency)
print everything
It is taking about 8 minutes and 48 seconds on a dual-core 2.2 GHz machine with 2 GB of RAM. I would like advice on how to speed this process up. Should I expect it to be this slow? And if possible, how can I know (in NetBeans) which functions are taking the most time to execute?
Unique words found: 398752.
CODE:
File file = new File(dir);
String[] files = file.list();
for (int i = 0; i < files.length; i++) {
    BufferedReader br = new BufferedReader(
            new InputStreamReader(
                    new BufferedInputStream(
                            new FileInputStream(dir + files[i])), encoding));
    try {
        String line;
        while ((line = br.readLine()) != null) {
            parsedString = parseString(line); // parse the string
            m = stringToMap(parsedString, m);
        }
    } finally {
        br.close();
    }
}
EDIT: Check this:
(profiler memory screenshot omitted)
I don't know what to conclude.
EDIT: 80% of the time is spent in this function:
public String[] parseString(String sentence) {
    // separators: ,:;'"\/<>()[]*~^ºª+&%$ etc.
    String[] parts = sentence.toLowerCase().split("[,\\s\\-:\\?\\!\\«\\»\\'\\´\\`\\\"\\.\\\\\\/()<>*º;+&ª%\\[\\]~^]");
    // save the hyphenated words, aaa-bbb, as Map<aaa, bbb>
    Map<String, String> o = new HashMap<String, String>();
    Pattern pattern = Pattern.compile("(?<![A-Za-zÁÉÍÓÚÀÃÂÊÎÔÛáéíóúàãâêîôû-])[A-Za-zÁÉÍÓÚÀÃÂÊÎÔÛáéíóúàãâêîôû]+-[A-Za-zÁÉÍÓÚÀÃÂÊÎÔÛáéíóúàãâêîôû]+(?![A-Za-z-])");
    Matcher matcher = pattern.matcher(sentence);
    // find all matches like "aaa-bb" or "bbb-cc" and put them in the map, so these words
    // can later be added to the original map and the single parts ("aaa", "aa") discounted
    while (matcher.find()) {
        String[] tempo = matcher.group().split("-");
        o.put(tempo[0], tempo[1]);
    }
    //System.out.println("words: " + o);
    ArrayList<String> temp = new ArrayList<String>();
    temp.addAll(Arrays.asList(parts));
    for (Map.Entry<String, String> entry : o.entrySet()) {
        String key = entry.getKey();
        String value = entry.getValue();
        temp.add(key + "-" + value);
        if (temp.indexOf(key) != -1) {
            temp.remove(temp.indexOf(key));
        }
        if (temp.indexOf(value) != -1) {
            temp.remove(temp.indexOf(value));
        }
    }
    String[] strArray = new String[temp.size()];
    temp.toArray(strArray);
    return strArray;
}
600 files, each file about 0.5MB
EDIT 3: The pattern is no longer compiled each time a line is read. (updated profiler screenshots omitted)
Be sure to increase your heap size, if you haven't already, using -Xmx. For this app, the impact may be striking.
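For example, a 1 GB heap (the class name here is hypothetical):
java -Xmx1024m WordFrequencyCounter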
The parts of your code that are likely to have the largest performance impact are the ones that are executed the most - which are the parts you haven't shown.
Update after memory screenshot
Look at all those Pattern$6 objects in the screenshot. I think you're recompiling the pattern a lot - maybe for every line. That would take a lot of time.
Update 2 - after code added to question.
Yup - two patterns are compiled on every line: the explicit one, and also the "-" in the split (much cheaper, of course). I wish split() hadn't been added to String without an overload taking a compiled pattern as an argument. I see some other things that could be improved, but nothing else like the big compile. Just compile the pattern once, outside this function, maybe as a static class member.
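A minimal sketch of that change (the field names are my own, and the character classes are abbreviated here for readability - keep the full ones from the question):
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// compiled once when the class loads, not on every call to parseString
private static final Pattern HYPHENATED = Pattern.compile("(?<![A-Za-z-])[A-Za-z]+-[A-Za-z]+(?![A-Za-z-])");
private static final Pattern SEPARATORS = Pattern.compile("[,\\s\\-:?!'\".\\\\/()<>*;+&%\\[\\]~^]");

public String[] parseString(String sentence) {
    String[] parts = SEPARATORS.split(sentence.toLowerCase()); // Pattern.split avoids String.split's recompile
    Matcher matcher = HYPHENATED.matcher(sentence);
    // ... rest of the method unchanged
}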
Try to use a single regex with a group that matches each word within the tags - then one regex could be applied to your entire input, and there would be no separate "split" stage.
Otherwise your approach seems reasonable, although I don't understand what you mean by "get the String[] ..." - I thought you were using an ArrayList. In any event, try to minimize the creation of objects, for both construction cost and garbage-collection cost.
Is it just the parsing that's taking so long, or is it the file reading as well?
For the file reading, you can probably speed that up by reading the files on multiple threads. But the first step is to figure out whether it's the reading or the parsing that takes all the time, so you can address the right issue.
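A rough sketch of the multithreaded reading (names are illustrative; the shared word-count map would then need to be thread-safe, e.g. a ConcurrentHashMap):
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

ExecutorService pool = Executors.newFixedThreadPool(
        Runtime.getRuntime().availableProcessors()); // one worker per core
for (final String name : files) {
    pool.submit(new Runnable() {
        public void run() {
            // read and parse a single file here, updating the thread-safe map
        }
    });
}
pool.shutdown(); // accept no new tasks; workers finish the queued files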
Run the code through the NetBeans profiler and find out where it is taking the most time (right-click on the project and select Profile; make sure you profile time, not memory).
Nothing in the code that you have shown us is an obvious source of performance problems. The problem is likely to be something to do with the way that you are parsing the lines or extracting the words and putting them into the map. If you want more advice you need to post the code for those methods, and the code that declares / initializes the map.
My general advice would be to profile the application and see where the bottlenecks are, and use that information to figure out what needs to be optimized.
@Ed Staub's advice is also sound. Running an application with a heap that is too small can cause serious performance problems.
If you aren't already doing it, use BufferedInputStream and BufferedReader to read the files. Double-buffering like that is measurably better than using BufferedInputStream or BufferedReader alone. E.g.:
BufferedReader rdr = new BufferedReader(
new InputStreamReader(
new BufferedInputStream(
new FileInputStream(aFile)
)
/* add an encoding arg here (e.g., ', "UTF-8"') if appropriate */
)
);
If you post relevant parts of your code, there'd be a chance we could comment on how to improve the processing.
EDIT:
Based on your edit, here are a couple of suggestions:
Compile the pattern once and save it as a static variable, rather than compiling every time you call parseString.
Store the values of temp.indexOf(key) and temp.indexOf(value) when you first call them and then use the stored values instead of calling indexOf a second time.
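For instance (variable names taken from the loop in the question):
int keyIndex = temp.indexOf(key);     // linear scan, done once
if (keyIndex != -1) {
    temp.remove(keyIndex);
}
int valueIndex = temp.indexOf(value); // done once as well
if (valueIndex != -1) {
    temp.remove(valueIndex);
}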
It looks like it's spending most of its time in regular expressions. I would first try writing the code without using a regular expression, and then use multiple threads if the process still appears to be CPU-bound.
For the counter, I would look at using TObjectIntHashMap to reduce the overhead of the counting. I would use only one map, rather than creating an array of strings and counts that is then used to build another map; that could be a significant waste of time.
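A minimal sketch of the counting with Trove (assumes the Trove JAR is on the classpath; the package prefix and exact methods vary between Trove versions):
// gnu.trove.map.hash.TObjectIntHashMap in Trove 3
TObjectIntHashMap<String> counts = new TObjectIntHashMap<String>();
for (String word : parsedString) {
    counts.adjustOrPutValue(word, 1, 1); // add 1 if present, else insert with count 1
}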
Precompile the pattern instead of compiling it every time through that method, and get rid of the double buffering: use new BufferedReader(new FileReader(...)).