I'm creating a personal movie database and I want to populate a combo box with movie titles from IMDB. IMDB releases this information in text files, so I'm trying to populate the combo box from those. I've got it working, but since the text file is very large (almost 80,000 rows, with a title on every row) it takes way too long to load.
This might be the wrong way to go about it. Does someone know how to solve this, or what I should do instead?
The code for reading the file and returning the String[] for the combo box:
public String[] getMoviesFromFile() throws IOException {
    List<String> strings = new ArrayList<>(); // was a field in the original code
    BufferedReader input = new BufferedReader(new FileReader(filePath));
    try {
        String line = null;
        while ((line = input.readLine()) != null) {
            strings.add(line);
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        input.close();
    }
    return strings.toArray(new String[0]);
}
The problem you're having is that you're blocking the Event Dispatching Thread, which will make your application come to a grinding halt while the file is being read. You should never perform time-consuming or blocking actions in the EDT.
You need to offload the reading to a background thread and load the list there, then re-sync the values back to the EDT (you should never create or modify any UI element outside of the EDT).
Have a look at Concurrency in Swing. In your case, I'd recommend taking a look at SwingWorker as it's designed to meet your actual requirements.
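A minimal sketch of that approach, reusing the file-reading loop from the question (the MovieLoader name and the JComboBox field are made up for the example):
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.DefaultComboBoxModel;
import javax.swing.JComboBox;
import javax.swing.SwingWorker;

// Reads the titles on a background thread; done() runs back on the EDT.
class MovieLoader extends SwingWorker<List<String>, Void> {
    private final String filePath;
    private final JComboBox<String> comboBox;

    MovieLoader(String filePath, JComboBox<String> comboBox) {
        this.filePath = filePath;
        this.comboBox = comboBox;
    }

    @Override
    protected List<String> doInBackground() throws IOException {
        List<String> titles = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new FileReader(filePath))) {
            String line;
            while ((line = in.readLine()) != null) {
                titles.add(line);
            }
        }
        return titles;
    }

    @Override
    protected void done() {
        try {
            // Safe to touch the combo box here: done() executes on the EDT
            comboBox.setModel(new DefaultComboBoxModel<>(get().toArray(new String[0])));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
You'd start it with new MovieLoader(filePath, movieBox).execute(); the UI stays responsive while the 80,000 lines are read.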
File I/O may simply be too slow for your needs, so I might suggest loading the text file into a SQL-style database, which may give faster results.
I'd suggest looking at HyperSQL or H2, which are both pure-Java SQL databases designed to be small and lightweight. Both can run in embedded, single-user mode, meaning you don't need to install a fully-fledged SQL server in order to use them.
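If you go that route, a minimal embedded-H2 sketch could look like the following (the jdbc:h2:./movies URL, table name, and query are all assumptions, and you need the H2 jar on the classpath):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class MovieDb {
    public static void main(String[] args) throws Exception {
        // Embedded mode: the database lives in ./movies.mv.db, no server process needed
        try (Connection conn = DriverManager.getConnection("jdbc:h2:./movies")) {
            try (Statement st = conn.createStatement()) {
                st.execute("CREATE TABLE IF NOT EXISTS movie (title VARCHAR(255))");
            }
            // Fetch only the titles matching what the user has typed so far,
            // instead of loading all 80,000 rows into the combo box at once
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT title FROM movie WHERE title LIKE ? LIMIT 50")) {
                ps.setString(1, "Star%");
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString("title"));
                    }
                }
            }
        }
    }
}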
I'm relatively new to Java. I want to receive input from the user and pass this data to an external application.
That application processes the data and produces output, which I then want to retrieve from my Java code.
I have attempted this, but I don't have the slightest idea how to start.
Nothing on the internet seems to answer this question. If you have any ideas, or know of any functions that could be useful, please help.
Since I'm starting from ground zero, any help is appreciated.
Thanks so much.
To communicate with an external application you first need to define the means of communication. For example:
Will the application read its input from a file?
If so, you need to agree on a file format, and possibly learn about serialization.
Will the application read its input from your standard output (like a command-line pipe)?
If so, you need to write it with System.out.print().
Will the application get the data over HTTP?
Then you need to learn about REST and/or RPC architectures.
Assuming it will be a command-line application, you could use something like this:
public class App
{
    public static void main(String... args)
    {
        // Implement your business logic here; don't just echo whatever
        // the user passes as command-line arguments.
        for (String arg : args)
        {
            System.out.print(arg);
        }
    }
}
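For the second half of the question, retrieving the external application's output from Java, ProcessBuilder can launch the program and read what it prints. This is only a sketch; "external-app" and its argument are placeholders for the real command:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class ExternalAppCaller {
    public static void main(String[] args) throws IOException, InterruptedException {
        // "external-app" is a placeholder; replace it with the real command
        ProcessBuilder pb = new ProcessBuilder("external-app", "some-input");
        pb.redirectErrorStream(true); // merge stderr into stdout
        Process process = pb.start();

        // Read everything the external application writes to its standard output
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println("output: " + line);
            }
        }
        int exitCode = process.waitFor();
        System.out.println("exit code: " + exitCode);
    }
}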
There's a lot going on here, so I'll suggest an example for each part of this question, assume everything is written in Java, and suggest an iterative design/development approach.
receive input from the user:: getting arguments from the command line can work, but I think most users want to use familiar tools like Excel to input large amounts of data. Have them export files to .csv, or look into reading Excel files directly with Apache POI. The latter is not for beginners, but not terrible to figure out or find examples for. The former should be easy to figure out if you look into reading files and splitting them line by line on the delimiter. Here's an example of that:
try (BufferedReader reader = new BufferedReader(new FileReader(new File("user_input.csv")))) {
    String currentLine = reader.readLine();
    while (currentLine != null) {
        String[] splitLine = currentLine.split(","); // choose delimiter here
        // process cells as needed
        // write output somewhere so the other program can read it later
        currentLine = reader.readLine();
    }
} catch (IOException ex) {
    System.out.println(ex.getMessage()); // maybe write to an error log
    System.exit(1);
}
"input" data to other app::you can use pipes if you're at the command line. but I'd recommend you write to a file and have the other app read it. here's an expansion of the previous code snippet showing how to write to a file as that might be more practical and easier to log/archive/debug.
try (BufferedReader reader = new BufferedReader(new FileReader(new File("user_input.csv")));
     BufferedWriter writer = new BufferedWriter(new FileWriter(new File("process_me.csv")))) {
    String currentLine = reader.readLine();
    while (currentLine != null) {
        String[] splitLine = currentLine.split(","); // choose delimiter here
        String processedStuff = String.join(",", splitLine); // process cells as needed
        writer.write(processedStuff);
        writer.newLine();
        currentLine = reader.readLine();
    }
} catch (IOException ex) {
    System.err.println(ex.getMessage());
    System.exit(1);
}
Then retrieving output:: this can just be reading another file with another Java program. This way you're communicating between programs using the file system. You must agree upon file formats and directories, though, and you'll be limited to having both programs on the same server.
To make this work at scale, you could use web services, assuming the other program you're making requests to is a web service or has one wrapped around it. You can send your file and receive some response using URLConnection. This is where things get much more complex, but now everything in your new program is just one Java program, and the other code can live on another server.
Building the app first with those "intermediate" files between the user-input code, the external code, and the final code will help you focus on perfecting the business logic; then you can worry about communication over the network.
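As a rough sketch of the URLConnection idea (the URL, content type, and file names are placeholders; a real service will define its own protocol):
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class UploadExample {
    public static void main(String[] args) throws IOException {
        // Placeholder endpoint; replace with the real service URL
        URL url = new URL("http://example.com/process");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/csv");

        // Send the intermediate file as the request body
        try (OutputStream out = conn.getOutputStream()) {
            Files.copy(Paths.get("process_me.csv"), out);
        }

        // Read the other program's response into a local file
        try (InputStream in = conn.getInputStream()) {
            Files.copy(in, Paths.get("response.csv"), StandardCopyOption.REPLACE_EXISTING);
        }
        conn.disconnect();
    }
}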
I've been looking for a solution for quite a while now, but I'm still struggling with concurrency and parallelization.
Background: there's an ETL process where I get a big CSV (sometimes over a million rows). In production there will be live updates, too. I want to spell-check each row. For that I use an adapted LanguageTool. The check method (with my customization inside) takes quite a while, and I want to speed it up.
One aspect is of course the method itself, but I also want to simply check multiple rows at a time. The order of the rows is not important. The result is the corrected text and it should be written to a new csv file for further processing.
I found that ExecutorService might be a reasonable choice, but since I'm not that familiar with it, some help would be appreciated.
That's how I use it so far in the ETL process:
private static SpellChecker spellChecker;

static {
    SpellChecker tmp = null;
    try {
        tmp = new SpellChecker(...);
    } catch (Exception e) {
        e.printStackTrace();
    }
    spellChecker = tmp;
}

public static String spellCheck(String input) {
    String output = input.replace("</li>", ".");
    output = searchAVC.removeHtml(output);
    try {
        output = spellChecker.correctText(output);
    } catch (IOException e) {
        e.printStackTrace();
    }
    return output;
}
My spellChecker here comes from a custom library, and I create a static instance of it (because instantiation of LanguageTool takes some time).
I want to parallelize the execution of spellCheck.
I've already read stuff like this:
https://www.airpair.com/java/posts/parallel-processing-of-io-based-data-with-java-streams
What is the easiest way to parallelize a task in java?
Write to text file from multiple threads?
I don't really know how to combine all this information. What do I have to consider when reading the file? When writing it? When processing the rows?
Create a Reader class whose responsibility is reading from the input file.
Create a Writer class whose responsibility is writing to the output file.
Create a Processor class whose responsibility is the processing itself.
Now create a partitioner whose responsibility is to read the input chunk by chunk and dispatch each batch of rows: the Reader hands a batch to the Processor, and the processed batch is sent on to the Writer.
To run it, create a thread pool so the batches execute in a multi-threaded environment; a sketch of this pipeline follows below.
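A minimal sketch of that pipeline with an ExecutorService might look like this. It assumes your spellCheck method is safe to call from multiple threads (a shared LanguageTool instance may not be thread-safe, so you may need one per thread) and that output row order doesn't matter, as you said; the file names and batch size are arbitrary:
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSpellCheck {

    static final int BATCH_SIZE = 1000;

    // Stand-in for the asker's existing static spellCheck(String) method.
    static String spellCheck(String row) { return row; }

    static Callable<List<String>> batchTask(final List<String> rows) {
        return () -> {
            List<String> out = new ArrayList<>(rows.size());
            for (String row : rows) {
                out.add(spellCheck(row));
            }
            return out;
        };
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        try (BufferedReader reader = Files.newBufferedReader(Paths.get("input.csv"));
             BufferedWriter writer = Files.newBufferedWriter(Paths.get("output.csv"))) {

            // The single main thread reads batches and submits them to the pool.
            List<Future<List<String>>> pending = new ArrayList<>();
            List<String> batch = new ArrayList<>(BATCH_SIZE);
            String line;
            while ((line = reader.readLine()) != null) {
                batch.add(line);
                if (batch.size() == BATCH_SIZE) {
                    pending.add(pool.submit(batchTask(batch)));
                    batch = new ArrayList<>(BATCH_SIZE);
                }
            }
            if (!batch.isEmpty()) {
                pending.add(pool.submit(batchTask(batch)));
            }

            // Only this thread writes, so no synchronization is needed on the file.
            // (For very large inputs you would drain futures as you go instead of
            // collecting them all, to bound memory use.)
            for (Future<List<String>> f : pending) {
                for (String corrected : f.get()) {
                    writer.write(corrected);
                    writer.newLine();
                }
            }
        } finally {
            pool.shutdown();
        }
    }
}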
There are many examples on the internet showing how to use StandardOpenOption.DELETE_ON_CLOSE, such as this:
Files.write(myTempFile, ..., StandardOpenOption.DELETE_ON_CLOSE);
Other examples similarly use Files.newOutputStream(..., StandardOpenOption.DELETE_ON_CLOSE).
I suspect all of these examples are probably flawed. The purpose of writing a file is that you're going to read it back at some point; otherwise, why bother writing it? But wouldn't DELETE_ON_CLOSE cause the file to be deleted before you have a chance to read it?
If you create a work file (to work with large amounts of data that are too large to keep in memory) then wouldn't you use RandomAccessFile instead, which allows both read and write access? However, RandomAccessFile doesn't give you the option to specify DELETE_ON_CLOSE, as far as I can see.
So can someone show me how DELETE_ON_CLOSE is actually useful?
First of all, I agree with you that in the Files.write(myTempFile, ..., StandardOpenOption.DELETE_ON_CLOSE) example the use of DELETE_ON_CLOSE is meaningless. After a (not so intense) search through the internet, the only example I could find showing that usage was the one you might have got it from (http://softwarecave.org/2014/02/05/create-temporary-files-and-directories-using-java-nio2/).
This option is not intended to be used with Files.write(...) only. The API makes this quite clear:
This option is primarily intended for use with work files that are used solely by a single instance of the Java virtual machine. This option is not recommended for use when opening files that are open concurrently by other entities.
Sorry I can't give you a meaningful short example, but think of such a file like a swap file/partition used by an operating system: the current JVM needs to temporarily store data on disc, and after shutdown the data are of no use anymore. As a practical example, it is similar to a JEE application server which might decide to serialize some entities to disc to free up memory.
edit: Maybe the following (oversimplified) code can be taken as an example to demonstrate the principle. (So please, nobody start a discussion about how this "data management" could be done differently, how using a fixed temporary filename is bad, and so on.)
in the try-with-resources block you need, for some reason, to externalize data (the reasons are not the subject of the discussion)
you have random read/write access to this externalized data
this externalized data is of use only inside the try-with-resources block
with the StandardOpenOption.DELETE_ON_CLOSE option you don't need to handle the deletion after use yourself; the JVM will take care of it (the limitations and edge cases are described in the API)
static final int RECORD_LENGTH = 20;
static final String RECORD_FORMAT = "%-" + RECORD_LENGTH + "s";

// add exception handling, left out only for the example
public static void main(String[] args) throws Exception {
    EnumSet<StandardOpenOption> options = EnumSet.of(
            StandardOpenOption.CREATE,
            StandardOpenOption.WRITE,
            StandardOpenOption.READ,
            StandardOpenOption.DELETE_ON_CLOSE
    );
    Path file = Paths.get("/tmp/external_data.tmp");
    try (SeekableByteChannel sbc = Files.newByteChannel(file, options)) {
        // during your business processing the below two cases might happen
        // several times in random order

        // example of a huge data structure to externalize
        String[] sampleData = {"some", "huge", "datastructure"};
        for (int i = 0; i < sampleData.length; i++) {
            byte[] buffer = String.format(RECORD_FORMAT, sampleData[i])
                    .getBytes();
            ByteBuffer byteBuffer = ByteBuffer.wrap(buffer);
            sbc.position(i * RECORD_LENGTH);
            sbc.write(byteBuffer);
        }

        // example of processing which needs the externalized data
        Random random = new Random();
        byte[] buffer = new byte[RECORD_LENGTH];
        ByteBuffer byteBuffer = ByteBuffer.wrap(buffer);
        for (int i = 0; i < 10; i++) {
            sbc.position(RECORD_LENGTH * random.nextInt(sampleData.length));
            sbc.read(byteBuffer);
            byteBuffer.flip();
            System.out.printf("loop: %d %s%n", i, new String(buffer));
        }
    }
}
DELETE_ON_CLOSE is intended for temporary work files.
If you need to perform some operation whose data must be temporarily stored in a file, but you don't need the file outside of the current execution, DELETE_ON_CLOSE is a good solution for that.
An example is when you need to store information that can't be kept in memory, for example because it is too large.
Another example is when you need to store the information temporarily, will only need it again at a later point, and don't want to occupy memory for it in the meantime.
Imagine also a situation in which a process needs a lot of time to complete. You store information in a file and only use it later (perhaps many minutes or hours after). This guarantees that memory is not used for that information while you don't need it.
DELETE_ON_CLOSE tries to delete the file when you explicitly close it by calling close(), or when the JVM shuts down if it was not closed manually before. A compact sketch of the pattern follows below.
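This sketch uses Files.createTempFile for a unique name rather than a fixed one; the record content is just filler:
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TempWorkFile {
    public static void main(String[] args) throws IOException {
        // createTempFile picks a unique name in the default temp directory
        Path work = Files.createTempFile("work", ".tmp");
        try (SeekableByteChannel ch = Files.newByteChannel(work,
                StandardOpenOption.WRITE,
                StandardOpenOption.READ,
                StandardOpenOption.DELETE_ON_CLOSE)) {
            ch.write(ByteBuffer.wrap("too big for memory".getBytes()));
            ch.position(0); // rewind and read the data back
            ByteBuffer buf = ByteBuffer.allocate(32);
            ch.read(buf);
            System.out.println(new String(buf.array(), 0, buf.position()));
        }
        // The channel is closed here, and the file was deleted along with it
        System.out.println(Files.exists(work)); // false
    }
}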
Here are two possible ways it can be used:
1. When calling Files.newByteChannel
This method returns a SeekableByteChannel suitable for both reading and writing, in which the current position can be modified.
Seems quite useful for situations where some data needs to be stored out of memory for read/write access and doesn't need to be persisted after the application closes.
2. Write to a file, read back, delete:
An example using an arbitrary text file:
Path p = Paths.get("C:\\test", "foo.txt");
System.out.println(Files.exists(p));
try {
    Files.createFile(p);
    System.out.println(Files.exists(p));
    try (BufferedWriter out = Files.newBufferedWriter(p, Charset.defaultCharset(),
            StandardOpenOption.DELETE_ON_CLOSE)) {
        out.append("Hello, World!");
        out.flush();
        try (BufferedReader in = Files.newBufferedReader(p, Charset.defaultCharset())) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
} catch (IOException ex) {
    ex.printStackTrace();
}
System.out.println(Files.exists(p));
This outputs (as expected):
false
true
Hello, World!
false
This example is obviously trivial, but I imagine there are plenty of situations where such an approach may come in handy.
However, I still believe the old File.deleteOnExit method may be preferable, as with it you won't need to keep the output stream open for the duration of any read operations on the file.
I have been trying a hundred different methods to solve my problem, but for some reason they simply won't work.
I'm trying to make a quick-and-dirty way for my application to be persistent. It has a lot of objects it needs to save when it is destroyed, so I thought I would put the objects into an ArrayList and then write the ArrayList to a file using an ObjectOutputStream.
public void onStop() {
    super.onStop();
    Log.d("Event", "Stopped");

    FileOutputStream fos = null;
    ObjectOutputStream oos = null;
    try {
        fos = openFileOutput("Flights", MODE_WORLD_WRITEABLE);
        oos = new ObjectOutputStream(fos);
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }

    ArrayList<Flight> alFlightList = new ArrayList<Flight>();
    Iterator it = flightMap.entrySet().iterator();
    while (it.hasNext()) {
        Map.Entry pairs = (Map.Entry) it.next();
        alFlightList.add((Flight) pairs.getValue());
    }

    try {
        oos.writeObject(alFlightList);
        oos.close();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        Log.d("Info", "File created!"); // note: logged even if the write failed
    }
}
I have a similar routine for reading it back, but it complains that there is no file to read from.
I know using files for persistence is not best practice, but as previously mentioned this was supposed to be a quick-and-dirty solution. (Although with the time I have spent on it now, I might as well have built a database. ._.)
Thanks!
From the documentation on Saving Persistent State,
There are generally two kinds of persistent state that an activity will deal with: shared document-like data (typically stored in a SQLite database using a content provider) and internal state such as user preferences.

For content provider data, we suggest that activities use an "edit in place" user model. That is, any edits a user makes are effectively made immediately without requiring an additional confirmation step. Supporting this model is generally a simple matter of following two rules:

When creating a new document, the backing database entry or file for it is created immediately. For example, if the user chooses to write a new e-mail, a new entry for that e-mail is created as soon as they start entering data, so that if they go to any other activity after that point this e-mail will now appear in the list of drafts.

When an activity's onPause() method is called, it should commit to the backing content provider or file any changes the user has made. This ensures that those changes will be seen by any other activity that is about to run. You will probably want to commit your data even more aggressively at key times during your activity's lifecycle: for example before starting a new activity, before finishing your own activity, when the user switches between input fields, etc.
So if you want to do it "correctly", I would save the data in onPause()... and I'd probably save the state using an SQLite database of some sort. You should also perform file I/O on a separate thread using an AsyncTask, as this sort of thing could otherwise block the UI thread and trigger an "Application Not Responding" crash.
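On the AsyncTask point, a hedged sketch of what that could look like inside the Activity (saveInBackground and the "FLIGHTS" file name are made up for the example, and imports are omitted as in the other snippets here):
// Called from onPause(); the write happens off the UI thread.
private void saveInBackground(final ArrayList<Flight> flights) {
    new AsyncTask<Void, Void, Void>() {
        @Override
        protected Void doInBackground(Void... params) {
            try (ObjectOutputStream oos = new ObjectOutputStream(
                    openFileOutput("FLIGHTS", Context.MODE_PRIVATE))) {
                oos.writeObject(flights);
            } catch (IOException e) {
                e.printStackTrace();
            }
            return null;
        }
    }.execute();
}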
If you want a quick and dirty way to do it (i.e. if you are not releasing this application on the Android market), then I am betting that the problem is that you are trying to perform the file I/O in onDestroy, which is not guaranteed to be called. This is another reason to perform the file reads/writes in onPause.
The last thing I would suggest is reading through the documentation on internal/external storage. It could be that you aren't writing to the correct directory because you don't have the file permissions to do so. You should perform the file I/O like so:
String FILENAME = "FLIGHTS";
FileOutputStream fos = openFileOutput(FILENAME, Context.MODE_PRIVATE);
fos.write(...);
fos.close();
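And since the read side "complains about there not being any file", here is a matching read sketch against the same private internal storage (assuming Flight implements Serializable; the "FLIGHTS" name mirrors the snippet above):
try (ObjectInputStream ois = new ObjectInputStream(openFileInput("FLIGHTS"))) {
    @SuppressWarnings("unchecked")
    ArrayList<Flight> flights = (ArrayList<Flight>) ois.readObject();
    // rebuild flightMap from flights here
} catch (FileNotFoundException e) {
    // first run: nothing has been saved yet
} catch (IOException | ClassNotFoundException e) {
    e.printStackTrace();
}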
Try replacing "Flights" with a full path such as /sdcard/Flights; otherwise the file may get created somewhere other than where you expect. :)
I am not sure if it will work.
Why don't you use a database, and load all the settings back from the database when the app is created or restarted?
You can also use the onPause() or onStop() methods to store all the data in the database.
I'm trying to read in a large (700GB) file and incrementally process it, but the network I'm working on will occasionally go down, cutting off access to the file. This throws a java.io.IOException telling me that "The specified network name is no longer available". Is there a way that I can catch this exception, wait for, say, fifteen minutes, and then retry the read, or is the Reader object fried once access to the file is lost?
If the Reader is rendered useless once the connection is lost, is there a way that I can rewrite this so as to let me "save my place" and then begin my read from there, without having to read and discard all the data before it? Even just munching through data without processing it takes a long time when there's 500GB of it to get through.
Currently, the code looks something like this (edited for brevity):
class Processor {
    BufferedReader br;

    Processor(String fname) throws FileNotFoundException {
        br = new BufferedReader(new FileReader(fname)); // was the literal "fname" in the original
    }

    void process() {
        try {
            String line;
            while ((line = br.readLine()) != null) {
                // ...code for processing the line goes here...
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Thank you for your time.
You can keep track of how much you have read in a variable. For example, here I keep track in a variable called read, and buff is a char[] (so it's really a count of characters rather than bytes). I'm not sure this is possible using the readLine method.
read += br.read(buff);
Then if you need to restart, you can skip that many characters on a fresh reader:
br.skip(read);
Then you can keep processing away. Good luck.
I doubt that the underlying file handle will still be usable after this error, but you would have to try it. More probably you will have to reopen the file and skip to where you had got up to, as in the sketch below.
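Combining both answers, a minimal sketch of the reopen-and-skip approach, assuming '\n' line endings so each line consumes line.length() + 1 characters (Reader.skip still has to read through the underlying data, but without the per-line processing cost):
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

class ResumableProcessor {
    private final String fname;
    private long charsDone = 0; // characters successfully processed so far

    ResumableProcessor(String fname) {
        this.fname = fname;
    }

    void process() throws InterruptedException {
        while (true) {
            try (BufferedReader br = new BufferedReader(new FileReader(fname))) {
                br.skip(charsDone); // resume where the last attempt stopped
                String line;
                while ((line = br.readLine()) != null) {
                    // ...code for processing the line goes here...
                    charsDone += line.length() + 1; // +1 for the '\n'
                }
                return; // reached end of file
            } catch (IOException e) {
                System.err.println("Network unavailable, retrying in 15 min: " + e.getMessage());
                Thread.sleep(15 * 60 * 1000L);
            }
        }
    }
}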