Encog - How to load training data for Neural Network - java

The only NeuralDataSet objects I've seen in action are XOR examples, which are just two small data arrays. I haven't been able to figure out anything from the documentation on MLDataSet.
It seems like everything must be loaded at once. However, I would like to loop through the training data until I reach EOF and then count that as 1 epoch. In every example I've seen, all the data is loaded into one 2D array from the beginning. How can I get around this?
I've read this question, and the answers didn't really help me. And besides that, I haven't found a similar question asked on here.

This is possible: you can either use an existing data set implementation that supports streaming, or implement your own on top of whatever source you have. Check out the MLDataSet interface (BasicMLDataSet is its in-memory implementation) and the SQLNeuralDataSet code as an example. You will have to implement a codec if you have a specific format. For CSV there is an implementation already, though I haven't checked whether it is memory based.
Remember when doing this that your data will be streamed in full for each epoch, and in my experience that is a much bigger bottleneck than the actual computation of the network.
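To make the "one pass until EOF equals one epoch" idea concrete, here is a minimal sketch that does not use Encog at all. The class StreamingCsvEpochs, the PairHandler interface and the column layout (inputs first, ideal outputs last) are assumptions made up for illustration; in a real project the same loop would sit behind whatever data set abstraction the trainer expects.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.function.Supplier;

/** Hypothetical helper (not part of Encog): streams one CSV row at a time. */
class StreamingCsvEpochs {

    /** Receives one training pair at a time. */
    interface PairHandler {
        void accept(double[] input, double[] ideal);
    }

    /**
     * Runs one pass (one epoch) over the source, line by line until EOF.
     * Assumes the first inputCount columns are inputs and the rest are ideal outputs.
     * Only the current row is held in memory. Returns the number of rows seen.
     */
    static long runEpoch(Supplier<Reader> source, int inputCount, PairHandler handler) {
        long rows = 0;
        try (BufferedReader reader = new BufferedReader(source.get())) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                String[] cols = line.split(",");
                double[] values = new double[cols.length];
                for (int i = 0; i < cols.length; i++) {
                    values[i] = Double.parseDouble(cols[i].trim());
                }
                handler.accept(Arrays.copyOfRange(values, 0, inputCount),
                        Arrays.copyOfRange(values, inputCount, values.length));
                rows++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        String xor = "0,0,0\n0,1,1\n1,0,1\n1,1,0\n";
        for (int epoch = 1; epoch <= 3; epoch++) {
            // The supplier reopens the source, so every epoch starts from the top.
            long rows = runEpoch(() -> new StringReader(xor), 2,
                    (input, ideal) -> { /* feed one pair to the trainer here */ });
            System.out.println("epoch " + epoch + ": " + rows + " rows");
        }
    }
}
```

For a file on disk the supplier would open a fresh reader on that file each time, which is exactly the per-epoch streaming cost mentioned above.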

Related

Multi channel audio within processing

I’m trying to build a sketch that shows me levels of audio coming into a system. I want to be able to handle more than 2 channels, so I know that I need more than the processing.sound library can currently provide. My searching has led me to javax.sound.sampled.*, but that is as far as my searching and playing has got me.
Does anyone know how to query the system for how many lines are coming in and to get the amplitude of audio on each line?
This is kind of a composite question.
For the number of lines, see Accessing Audio System Resources in the Java tutorials. There is sample code there for inspecting what lines are present. If some of the terms are confusing, most are defined in the tutorial immediately preceding this one.
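As a rough sketch of that kind of inspection, the following lists the capture lines each mixer reports. The class name InputLineLister is made up here; which mixers and lines show up, and how many channels each one offers, depends entirely on the machine's drivers, so the output may well be empty on a system without audio hardware.

```java
import java.util.ArrayList;
import java.util.List;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.Line;
import javax.sound.sampled.Mixer;
import javax.sound.sampled.TargetDataLine;

/** Hypothetical helper: lists the capture (TargetDataLine) lines of every mixer. */
class InputLineLister {

    static List<String> describeInputLines() {
        List<String> result = new ArrayList<>();
        for (Mixer.Info mixerInfo : AudioSystem.getMixerInfo()) {
            Mixer mixer = AudioSystem.getMixer(mixerInfo);
            // Target lines are the ones an application reads captured audio from.
            for (Line.Info lineInfo : mixer.getTargetLineInfo()) {
                if (TargetDataLine.class.isAssignableFrom(lineInfo.getLineClass())) {
                    result.add(mixerInfo.getName() + " -> " + lineInfo);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String description : describeInputLines()) {
            System.out.println(description);
        }
    }
}
```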
To see what is on the line, check Capturing Audio.
To get levels, you will probably want some sort of rolling measure of the signal, usually root-mean-square (RMS). The "controls" (sometimes) provided at a higher level are kind of iffy for a variety of reasons.
In order to calculate those levels, though, you will have to convert the byte data to PCM sample values. The tutorial Using Files and Format Converters has example code that shows the point where the conversion would take place. In the first real example given, under the heading "Reading Sound Files", take note of the place where the comment sits that reads
// Here, do something useful with the audio data that's
// now in the audioBytes array...
I recall there are already StackOverflow questions that show the commands needed to convert bytes to PCM.
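A minimal sketch of that conversion plus the RMS step could look like the following. It assumes the line delivers 16-bit signed little-endian PCM with interleaved channels; other formats (8-bit, big-endian, 24-bit) need different byte handling. The class and method names are invented for this example.

```java
/** Hypothetical helper: per-channel RMS levels from raw PCM bytes. */
class PcmLevels {

    /**
     * Computes the RMS level of each channel, normalized to the range 0..1.
     * Assumes 16-bit signed little-endian samples with interleaved channels.
     * Only the first 'length' bytes of the buffer are used.
     */
    static double[] rmsPerChannel(byte[] audioBytes, int length, int channels) {
        int frameSize = 2 * channels;          // 2 bytes per sample
        int frames = length / frameSize;       // ignore any trailing partial frame
        double[] sumSquares = new double[channels];
        for (int f = 0; f < frames; f++) {
            for (int c = 0; c < channels; c++) {
                int i = f * frameSize + c * 2;
                // High byte keeps its sign, low byte is treated as unsigned.
                int sample = (audioBytes[i + 1] << 8) | (audioBytes[i] & 0xFF);
                double normalized = sample / 32768.0;
                sumSquares[c] += normalized * normalized;
            }
        }
        double[] rms = new double[channels];
        for (int c = 0; c < channels; c++) {
            rms[c] = frames == 0 ? 0.0 : Math.sqrt(sumSquares[c] / frames);
        }
        return rms;
    }
}
```

Calling this on each buffer read from the line gives one level per channel per buffer; smoothing those values over a few buffers gives the rolling behaviour.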

What is the most efficient way to pass data (list of pairs of [Integer + Double]) between two Google App Engine instances?

What is the most efficient way to pass data (list of pairs of [Integer, Double]) between two Google App Engine instances ?
Currently I use Java binary serialization. Frontend servlet receives data from the client in JSON format. I convert it to byte[] using ObjectOutput.writeObject and then send it to backend servlet via HTTP POST. It's not in production yet.
Should I just pass client's JSON as it is to backend? It seems more logical. But it's bigger in size.
Or should I use Google Protocol Buffers as stated in this benchmark article ?
Thank you!!!
My suggestion would be to try out the 3 options for yourself. For a data structure this simple, the effort needed to try the alternatives should be relatively small. (The chances of someone here having tried these 3 alternatives on your specific use-case are pretty small, so any direct answers are likely to be mostly best-guesses.)
But before you spend time on this, ask yourself if you can justify it. Is there a real performance problem here, or is it just conjecture that there is likely to be a problem? Can you quantify it? Is it worth expending effort on it?
And if we are making guesses, I'd think that you'll get the best performance by using a simple DataOutputStream / DataInputStream pair. Write alternating int and double values extracted from the input list, and at the other end read the values back and reconstruct the equivalent list of pairs. (And start by sending the list length to make the reconstruction more efficient.)
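A sketch of that DataOutputStream / DataInputStream approach follows. The class name PairCodec and the choice of Map.Entry to represent a pair are assumptions for illustration; the wire format is simply the list length followed by alternating int and double values.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical codec: length prefix, then alternating int and double values. */
class PairCodec {

    static byte[] encode(List<Map.Entry<Integer, Double>> pairs) {
        // 4 bytes for the count, then 4 + 8 bytes per pair.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream(4 + pairs.size() * 12);
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(pairs.size());
            for (Map.Entry<Integer, Double> pair : pairs) {
                out.writeInt(pair.getKey());
                out.writeDouble(pair.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static List<Map.Entry<Integer, Double>> decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            // Knowing the count up front lets the list be sized once.
            List<Map.Entry<Integer, Double>> pairs = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                int key = in.readInt();
                double value = in.readDouble();
                pairs.add(new AbstractMap.SimpleEntry<Integer, Double>(key, value));
            }
            return pairs;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The resulting byte[] can be used as the HTTP POST body. Whether it actually beats JSON or Protocol Buffers for this workload is something only a measurement will show.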

Transferring large arrays from server to client in GWT

I'm attempting to transfer a large two dimensional array (17955 X 3) from my server to the client using Asynchronous RPC calls. This is taking a very long period of time which is especially bad because the data is needed in order to initialize the application. I've read that using a JSON object might be faster, but I'm not sure how to do the conversion in Java as I'm pretty new to the language and GWT, and I don't know if the speed difference is significant. I also read somewhere that I can zip the data, but I only read that in a forum and I'm not sure if it's actually possible as I couldn't find information for it elsewhere. Is there any way to transfer large amounts of data from server to client? Thanks for your time.
Read this article on adding JSON capabilities to GWT. In regards to compression this article explains gzipping with GWT.
Also the size of your array is still very large even with the compression you may achieve with gzipping, which will vary depending on how much data is repeated in your array. You may want to consider logically breaking up the array in multiple RPC calls if at all possible.
I would recommend revisiting your design if your application needs such a large amount of data to initialize.
As others pointed out, you should reconsider your design, because even if you manage to solve the data transfer speed issue you will likely find other issues waiting for you:
Processing a large amount of data in the browser can be slow.
A lot of data means a lot of used-up memory.
What you can think about is:
Partitioning the data:
How is your user going to cope with a lot of data? Your user will probably need some kind of user interface aid to work with such a large amount of data. If you are going to use paging, tabs or other means to partition the data for the user's consumption, why not load the data on demand? For example, you can load a single page of records if you are using a paging grid, or you can load a single tab's worth of records if you are going to use tabs. Similarly, if you are going to allow filtering on the records, you can set a default filter after the load to keep the data to a minimum.
Summarizing the data:
You can also summarize the data on the server if you are not going to show each row to the user. For example, you can initially show a summary for each group of records and let the user drill down into a specific group.
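The partitioning idea can be sketched on the server side as plain Java, independent of GWT. The class ArrayPager, the page size and the double element type are assumptions (the question does not say what the array holds); in a GWT application a method like page would sit behind the RPC service so the client asks for one page at a time.

```java
import java.util.Arrays;

/** Hypothetical server-side helper: hands out a large 2D array one page at a time. */
class ArrayPager {

    /** Number of pages needed to cover all rows. */
    static int pageCount(int rowCount, int pageSize) {
        return (rowCount + pageSize - 1) / pageSize;
    }

    /** Returns the rows of the given zero-based page, or an empty array past the end. */
    static double[][] page(double[][] rows, int pageIndex, int pageSize) {
        int from = Math.min(pageIndex * pageSize, rows.length);
        int to = Math.min(from + pageSize, rows.length);
        return Arrays.copyOfRange(rows, from, to);
    }

    public static void main(String[] args) {
        double[][] rows = new double[17955][3];
        int pageSize = 500; // arbitrary choice for illustration
        System.out.println("pages: " + pageCount(rows.length, pageSize));
        System.out.println("rows on last page: "
                + page(rows, pageCount(rows.length, pageSize) - 1, pageSize).length);
    }
}
```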

Parsing IBM 3270 data in java

I was wondering if anyone had experience retrieving data with the 3270 protocol. My understanding so far is:
Connection
I need to connect to an SNA server using telnet, issue a command and then some data will be returned. I'm not sure how this connection is made since I've read that a standard telnet connection won't work. I've also read that IBM have a library to help but not got as far as finding out any more about it.
Parsing
I had assumed that the data being returned would be a string of 1920 characters, since the 3278 screen was 80x24 chars. I would simply need to parse these chars into the appropriate fields. The more I read about the 3270 protocol, the less this seems to be the case. I read in the documentation provided with a trial of the Jagacy 3270 Java library that attributes were marked in the protocol with the char 'A' before the attribute, and my understanding is that there are more chars denoting other factors such as whether fields are editable.
I'm reasonably sure my thinking has been too simplistic. Take an example like a screen containing a list of items - pressing a special key on one of the 24 visible rows drills down into more detailed information regarding that row.
Also it's been suggested to me that print commands can be issued. This has some positive implications - if the format of the string returned is not 1920 since it contains these characters such as 'A' denoting how users interact with the terminal, printing would eradicate these. Also it would stop having to page through lots of data. The flip side is I wouldn't know how to retrieve the data from the print command back to Java.
So..
I currently don't have access to the SNA server but have some screen shots of what the terminal will look like once I get a connection and was therefore going to start work on parsing. With so many assumptions and not a lot of idea on what the data will look like I feel really stumped. Does anyone have any knowledge of these systems that might help me back on track?
You've picked a ripper of a problem there. 3270 is a very complex protocol indeed. I wouldn't bother about trying to implement it, it's a fool's errand, and I'm speaking from painful personal experience. Try to find a TN3270 (Telnet 3270) client API.
This might not specifically answer your question, but...
If you are using Rational Developer for z/OS, your java code should be able to use the integrated HATS product to deal with the 3270 stream. It might not fit your project, but I thought I would mention it if all you are trying to do is some simple screen scraping, it makes things very easy.

Java: Most efficient way to store/retrieve workout information from a file?

I'm working on a Java project for class that stores workout information in a flat file. Each file will have the information for one exercise (BenchPress.data) that holds the time (milliseconds since epoch), weight and repetitions.
Example:
1258355921365:245:12
1258355921365:245:10
1258355921365:245:8
What's the most efficient way to store and retrieve this data? It will be graphed and searched through to limit exercises to specific dates or date ranges.
One idea I had was to write the most recent information to the top of the file instead of appending at the end. This way, when I start reading from the top, I'll have the most recent information, which will match most of the searches (assumption).
There's no guarantee on the order of the dates, though. A user could enter exercises for today and then go in and enter last week's exercises, for whatever reason. Should I take the hit upon saving to order all of the information by date?
Should I go a completely different direction? I know a database would be ideal, but this is a group project and managing a database installation and data synch amongst us all would not be ideal. The others have no experience with databases and it'll make grading difficult.
So thanks for any advice or suggestions.
-John
Don't overcomplicate things. Unless you are dealing with millions of records, you can just read the whole thing into memory and sort it any way you like. And always add records at the end; this way you are less likely to damage your file.
For simple projects, using an embedded database like JavaDB / Apache Derby may be a good idea. Configuration for the DB is absolutely minimal, and in your case you may need a maximum of just 2 tables (User and Workout). Exporting data to a file is also fairly simple for sync between team members.
As yu_sha pointed out though, unless you expect to have a large dataset (on a PC, more than about 50,000 records), you can just use the file and read everything into memory.
Read in every line via BufferedReader and parse with StringTokenizer. Looking at the data, I'd likely store an array of fields in a List that can be iterated and sorted according to your preference.
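Roughly, that could look like the sketch below. It assumes every field in the time:weight:reps format is a whole number, as in the sample lines, and it silently skips lines that do not have exactly three fields. The class name WorkoutFileReader is invented here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.StringTokenizer;

/** Hypothetical reader for lines of the form time:weight:reps. */
class WorkoutFileReader {

    /** Returns one long[] {time, weight, reps} per valid line, sorted by time. */
    static List<long[]> readEntries(Reader source) {
        List<long[]> entries = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                StringTokenizer tokens = new StringTokenizer(line, ":");
                if (tokens.countTokens() != 3) {
                    continue; // skip blank or malformed lines
                }
                long time = Long.parseLong(tokens.nextToken().trim());
                long weight = Long.parseLong(tokens.nextToken().trim());
                long reps = Long.parseLong(tokens.nextToken().trim());
                entries.add(new long[] {time, weight, reps});
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Entries may have been appended out of date order, so sort after loading.
        entries.sort(Comparator.comparingLong((long[] entry) -> entry[0]));
        return entries;
    }
}
```

Passing a FileReader for BenchPress.data as the source reads the real file; the sort is stable, so sets sharing a timestamp keep their file order.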
If you must store the file in this format, you're likely best off just reading the entire thing into memory at startup and storing it in a TreeMap or some other sorted, searchable map. Then you can use TreeMap's convenience methods such as ceilingKey or the similar floorKey to find matches near certain dates/times.
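Here is a small sketch of the TreeMap idea. Because the sample data has several sets with the same timestamp, the map stores a list of sets per key rather than a single set. The class WorkoutIndex and its method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical in-memory index of workout sets, keyed by timestamp. */
class WorkoutIndex {

    // Each value is a list of {weight, reps}, since one timestamp can hold several sets.
    private final TreeMap<Long, List<int[]>> byTime = new TreeMap<>();

    void add(long timeMillis, int weight, int reps) {
        byTime.computeIfAbsent(timeMillis, t -> new ArrayList<>()).add(new int[] {weight, reps});
    }

    /** Number of sets recorded between the two timestamps, both inclusive (from <= to). */
    int countBetween(long fromMillis, long toMillis) {
        int count = 0;
        for (List<int[]> sets : byTime.subMap(fromMillis, true, toMillis, true).values()) {
            count += sets.size();
        }
        return count;
    }

    /** Earliest recorded timestamp at or after the given time, or null if none. */
    Long firstAtOrAfter(long timeMillis) {
        return byTime.ceilingKey(timeMillis);
    }

    /** Latest recorded timestamp at or before the given time, or null if none. */
    Long lastAtOrBefore(long timeMillis) {
        return byTime.floorKey(timeMillis);
    }
}
```

Insertion order no longer matters with this structure, so entries for last week can be added after today's without re-sorting the file.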
Use flatworm, a Java library for parsing and creating flat files. Describe the format with a simple XML definition file, and there you go.
