I am working with the GTFS format and have a concern. Consider a slightly complex network in which a user wants to travel from a source 'A' to a destination 'B', and no direct route exists from A to B, but B can be reached from A through a stop C. I can't see any way in GTFS to find out that such a route exists (A->C->B in our example). Am I missing something, or is there no way to do this other than implementing our own algorithm? Or has some third party already implemented such an algorithm in Java (I believe someone has ;) )?
Thanks in advance.
Cheers
Sriram.
PS: As I am unable to create new tags, I have used the transport and java tags (not gtfs or something similar).
GTFS does not help in identifying paths across different routes; this has to be computed on the fly.
As Sriram's answer from 4.5 years ago notes, GTFS does not attempt to represent all possible paths through a transit network. OpenTripPlanner is an open-source project that tries to solve this problem in the general case: http://www.opentripplanner.org/. OTP uses GTFS as its primary schedule input.
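To illustrate what computing a path "on the fly" can look like, here is a minimal sketch in Java. It assumes the ordered stop list of each route has already been loaded into memory (for example from stop_times.txt); the class and method names (TransferSearch, addRoute, findPath) are made up for this example and are not part of GTFS or OpenTripPlanner. It ignores timetables, service calendars and walking transfers, all of which a real trip planner has to handle.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TransferSearch {
    // stop -> stops that can be reached from it without changing vehicle
    private final Map<String, Set<String>> reachable = new HashMap<>();

    /** Registers one route as its ordered list of stops (hypothetical input, e.g. derived from stop_times.txt). */
    void addRoute(List<String> orderedStops) {
        for (int i = 0; i < orderedStops.size(); i++) {
            Set<String> later = reachable.computeIfAbsent(orderedStops.get(i), k -> new LinkedHashSet<>());
            // A rider boarding at stop i can get off at any later stop of the same route.
            later.addAll(orderedStops.subList(i + 1, orderedStops.size()));
        }
    }

    /** Breadth-first search: returns the boarding/transfer/alighting stops with the fewest legs, or an empty list. */
    List<String> findPath(String from, String to) {
        Map<String, String> cameFrom = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        cameFrom.put(from, null);
        queue.add(from);
        while (!queue.isEmpty()) {
            String stop = queue.poll();
            if (stop.equals(to)) {
                List<String> path = new ArrayList<>();
                for (String s = stop; s != null; s = cameFrom.get(s)) {
                    path.add(s);
                }
                Collections.reverse(path);
                return path;
            }
            for (String next : reachable.getOrDefault(stop, Collections.emptySet())) {
                if (!cameFrom.containsKey(next)) {
                    cameFrom.put(next, stop);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }
}
```

With one route A, X, C and another route C, Y, B registered, findPath("A", "B") yields the stops A, C, B, meaning a change of vehicle at C.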
I want to create an algorithm that searches job descriptions for given words (like Java, Angular, Docker, etc). My algorithm works, but it is rather naive. For example, it cannot detect the word Java if it is contained in another word (such as JavaEE). When I check for substrings, I have the problem that, for example, Java is recognized in the word JavaScript, which I want to avoid. I could of course make an explicit case distinction here, but I'm more looking for a general solution.
Are there any particular techniques or approaches that try to solve this problem?
Unfortunately, I don't have the amount of data necessary for data-driven approaches like machine learning.
Train a simple word2vec language model on all of your job description text. Then use your own logic to find the keywords. When you find a match that is not exact, consult your list of similar words.
For example, if you are searching for Java but also find JavaScript, use your word vectors to check whether there is any similarity between them (in other words, whether they have ever been used in a similar context). Java and JavaEE have probably been used in the same sentence before, but Java and JavaScript, or Angular and Angularentwicklung, have not.
It may seem a bit like over-engineering, but it's not :).
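The similarity check itself is usually a cosine similarity between two word vectors. The sketch below assumes you have already obtained the vectors as double arrays from whatever word2vec implementation you trained; the class name VectorSimilarity is made up for this example, and any threshold you compare the result against is something you would have to tune on your own data.

```java
final class VectorSimilarity {
    /** Cosine similarity of two equally long vectors; returns 0.0 if either vector is all zeros. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

A value close to 1 means the two words occur in similar contexts; a value near 0 means they do not.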
I spent some time researching my problem, and I found that identifying certain words, even if they don't match 1:1, is not a trivial problem. You could solve the problem by listing synonyms for the words you are looking for, or you could build a rule-based named entity recognition service. But that is both error-prone and maintenance-intensive.
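For reference, the synonym-list variant can be sketched roughly as follows. The text is split into whole tokens, and each token is looked up in a hand-maintained alias table, so JavaEE can be mapped to Java while JavaScript is simply not listed. The class name KeywordMatcher and the alias entries are illustrative assumptions, and the maintenance burden mentioned above is exactly the upkeep of that table.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class KeywordMatcher {
    // lower-cased token -> canonical keyword (hand-maintained)
    private final Map<String, String> aliases = new HashMap<>();

    /** Registers a keyword together with the variants that should count as a hit for it. */
    void addKeyword(String canonical, String... variants) {
        aliases.put(canonical.toLowerCase(Locale.ROOT), canonical);
        for (String variant : variants) {
            aliases.put(variant.toLowerCase(Locale.ROOT), canonical);
        }
    }

    /** Returns the canonical keywords whose name or variant appears as a whole token in the text. */
    Set<String> find(String text) {
        Set<String> found = new TreeSet<>();
        // Split on anything that is not a letter or digit, so "JavaScript" stays one token.
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            String canonical = aliases.get(token.toLowerCase(Locale.ROOT));
            if (canonical != null) {
                found.add(canonical);
            }
        }
        return found;
    }
}
```

After addKeyword("Java", "JavaEE"), a text containing JavaEE and JavaScript reports Java once, because only the JavaEE token is in the table.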
Probably the best way to solve my problem is to build a named entity recognition service using machine learning. I am currently watching a video series that looks very promising for the given problem. --> https://www.youtube.com/playlist?list=PL2VXyKi-KpYs1bSnT8bfMFyGS-wMcjesM
I will comment on this answer when I am done with my work to give feedback to those who are facing the same problem.
I'm not asking for anyone to build me an app.
I just need some tips on getting started.
So what I wanted to do:
be able to map some routes/directions, similar to what Google Maps already has regarding the local transit in a city.
Why? First, because Google's database is a bit outdated. Second, because I want to create a local database with the routes and the stations. Unfortunately, I can't really do that using Google Maps, and I think Leaflet could help me with this much better. This would be a web app where someone with an account could add/edit/delete the routes.
create an Android app that:
a) shows the routes and allows a user to find the closest path to get from point A to B using only the routes I have in my database, sorted by tram/bus etc.
b) allows the user to mark a location and say something like "bus no 37 was here at hour:minute:second" - this would appear for anyone else using the app, similar to what another app lets you do for police cars and traffic jams
c) extra: allow users to input some data so that my app could also give predictions; for example, someone inputs that it took 10m50s to get from point X to point Y on route Z. That stays in a database, and then someone else inputs data for the same path... I would create an algorithm that could predict where a bus would be now if someone marked it at station 'bla' 5 minutes ago. I know, I know, this might be pretty hard and pretty inaccurate, and I should consider the time of day, but it would just be something small, as an extra. It would also be cool if this could be recorded automatically: the user sets the route he's on, starts "recording", then stops it when he gets off the vehicle, and the times and locations are taken into account automatically.
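As a rough illustration of the simplest form of the idea in (c), the sketch below just averages the reported travel times per route segment. The class name SegmentTimes and its methods are made up for this example; it deliberately ignores time of day, outliers and storage in a database.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalDouble;

final class SegmentTimes {
    // "route|from|to" -> { sum of reported seconds, number of reports }
    private final Map<String, long[]> totals = new HashMap<>();

    /** Records one user report, e.g. 650 seconds for 10m50s. */
    void report(String route, String from, String to, long seconds) {
        long[] total = totals.computeIfAbsent(key(route, from, to), k -> new long[2]);
        total[0] += seconds;
        total[1]++;
    }

    /** Average reported travel time in seconds, or empty if nobody has reported this segment yet. */
    OptionalDouble averageSeconds(String route, String from, String to) {
        long[] total = totals.get(key(route, from, to));
        if (total == null) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of((double) total[0] / total[1]);
    }

    private static String key(String route, String from, String to) {
        return route + "|" + from + "|" + to;
    }
}
```

Two reports of 650 and 710 seconds for the same segment give an average of 680 seconds, which could then be compared with the time elapsed since the last sighting.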
Hope you understand what I have in my mind.
Thing is, what would you recommend?
I know Java, Spring MVC and a bit of Android. JavaScript, HTML and CSS won't be a problem. I need to combine these. If I use Leaflet, as far as I can tell, I won't really be able to use it in an Android app; I would have to create a web app. At the same time, Google Maps doesn't really let me do what I want for my "personal" database. I can't even create decent custom routes by adding waypoints, because parts of the tramway line aren't on streets with car access. Also, do you think this is easier/better to do as an Android app or as a web app? I'm kinda new to Android.
I hope this isn't an unsuitable thing to ask on stackoverflow.
I'm open to any ideas.
allows a user to find the closest path to get from point A to B using only the routes I have in my database, sorted by tram/bus etc.
Routing is hard. Multi-modal routing (tram+bus+car+walking+cycle) even more so. See pgRouting and Valhalla. If you're going to do anything with public transport, then you'll have to deal with GTFS too.
Also look into OpenTripPlanner, as there are several groups developing similar platforms.
I hope this isn't an unsuitable thing to ask on stackoverflow.
I'm afraid it kinda is - see https://stackoverflow.com/help/on-topic, point 4.
The NeuralDataSet objects that I've seen in action haven't been anything but XOR which is just two small data arrays... I haven't been able to figure out anything from the documentation on MLDataSet.
It seems like everything must be loaded at once. However, I would like to loop through the training data until I reach EOF and count that as one epoch. In everything I've seen, all the data must be loaded into one 2D array from the beginning. How can I get around this?
I've read this question, and the answers didn't really help me. And besides that, I haven't found a similar question asked on here.
This is possible: you can either use an existing data set implementation that supports streaming, or implement your own on top of whatever source you have. Check out the BasicMLDataSet interface and the SQLNeuralDataSet code as an example. You will have to implement a codec if you have a specific format. For CSV there is already an implementation, though I haven't checked whether it is memory based.
Remember when doing this that your data will be streamed in full for each epoch, and in my experience that is a much bigger bottleneck than the actual computation of the network.
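Independent of any particular library, the "read until EOF, count that as one epoch" loop can be sketched as below. This is plain Java and not the Encog API: StreamingEpochs, run and the trainOnRow callback are hypothetical names, and the rows are assumed to be comma-separated numbers. The source is re-opened at the start of every epoch, which is where the streaming cost mentioned above comes from.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Consumer;
import java.util.function.Supplier;

final class StreamingEpochs {
    /**
     * Runs the given number of epochs. Each epoch opens a fresh reader and reads it to EOF,
     * handing one parsed row at a time to trainOnRow. Returns the row count of the last epoch.
     */
    static int run(Supplier<BufferedReader> source, int epochs, Consumer<double[]> trainOnRow) {
        int rows = 0;
        for (int epoch = 0; epoch < epochs; epoch++) {
            rows = 0;
            try (BufferedReader in = source.get()) {
                String line;
                while ((line = in.readLine()) != null) {
                    if (line.trim().isEmpty()) {
                        continue; // skip blank lines
                    }
                    String[] cells = line.split(",");
                    double[] row = new double[cells.length];
                    for (int i = 0; i < cells.length; i++) {
                        row[i] = Double.parseDouble(cells[i].trim());
                    }
                    trainOnRow.accept(row); // only this row is held in memory
                    rows++;
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return rows;
    }
}
```

In real use the supplier would open the training file; the callback would feed the row to whatever incremental training step your network supports.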
I'm interested in AI, and two days ago I came across an interesting recent development in this area called ES-HyperNEAT. First there was NEAT, then HyperNEAT, then ES-HyperNEAT.
Here are some links on the topic:
http://eplex.cs.ucf.edu/hyperNEATpage/
http://eplex.cs.ucf.edu/ESHyperNEAT/
So I've downloaded the Java version of AHNI, but there is no tutorial. I guess the developers took for granted that it's easy to use, but I don't know how to implement a solution to the following problem. It doesn't seem very hard, but could someone show me how to get started?
The input looks like this:
Date , A , B , C , D
2013-07-26,18.94,19.06,18.50,18.63
2013-07-25,18.85,19.26,18.55,19.04
2013-07-24,19.32,19.40,18.47,18.99
2013-07-23,20.15,20.30,19.16,19.22 <-- Predict it ? [ Output ]
2013-07-22,20.09,20.23,19.80,20.03 <-- Start Date
2013-07-19,20.08,20.48,19.76,20.02
2013-07-18,19.88,20.68,19.64,20.12
2013-07-17,19.98,20.07,19.69,19.83
2013-07-16,20.38,20.49,19.51,19.92
......
2013-07-02,18.19,18.20,17.32,17.69
2013-07-01,18.38,18.96,17.95,18.15 <-- End Date
The program should read the above data from the Start Date, counting back n days to the End Date, and train on that data; the correct output will always be the next day's D value. How can this be implemented with ES-HyperNEAT?
Specifically:
[1] Which classes do I call to start the process?
[2] How do I tell it which fields in the input file to gather data from? In this case it can ignore the Date field and gather data from A, B, C, D [not normalized to 0,1].
[3] How do I tell it that the correct result is the next day's D value?
[4] How do I specify that the program should start from line x at the Start Date and read data through line y at the End Date?
Is there something like: myProgram.start(FilePath,Delimiter,Field2,Field3,..,Line_X,Line_Y,...)?
The readme.txt (which you can see at https://github.com/OliverColeman/ahni) contains some info about getting started with your own experiments, specifically see the DEVELOPMENT AND CREATING NEW EXPERIMENTS section. There is currently no code specific to performing time-series prediction in AHNI, so you would have to extend one of the base fitness function classes (see the readme). Your code would need to do the things you ask about (points 2-4), but you could create a fairly generic time-series prediction class which can be configured via the .properties file to specify the things in points 2-4. If you do do this then feel free to contribute it and we'll add it to the AHNI software on github :).
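To make points 2-4 concrete, here is a sketch of the data preparation only. It is plain Java and does not use any AHNI classes: NextDaySamples, Sample and build are hypothetical names, and wiring the samples into an AHNI fitness function is left out. It assumes rows of the form date,A,B,C,D ordered newest first as in the question, with no header line and no trailing annotations.

```java
import java.util.ArrayList;
import java.util.List;

final class NextDaySamples {
    /** One training sample: A, B, C, D of one day and the D value of the following day. */
    static final class Sample {
        final double[] inputs;
        final double target;

        Sample(double[] inputs, double target) {
            this.inputs = inputs;
            this.target = target;
        }
    }

    /**
     * Builds samples from lines startLine..endLine (0-based, inclusive).
     * Because the file is newest first, the "next day" of line i is line i - 1,
     * so startLine must be at least 1.
     */
    static List<Sample> build(List<String> lines, int startLine, int endLine) {
        if (startLine < 1 || endLine < startLine || endLine >= lines.size()) {
            throw new IllegalArgumentException("invalid line range");
        }
        List<Sample> samples = new ArrayList<>();
        for (int i = startLine; i <= endLine; i++) {
            double[] today = parse(lines.get(i));
            double[] nextDay = parse(lines.get(i - 1));
            samples.add(new Sample(today, nextDay[3])); // index 3 is column D
        }
        return samples;
    }

    /** Parses "date,A,B,C,D" into {A, B, C, D}, ignoring the date column. */
    private static double[] parse(String line) {
        String[] cells = line.split(",");
        double[] values = new double[4];
        for (int i = 0; i < 4; i++) {
            values[i] = Double.parseDouble(cells[i + 1].trim());
        }
        return values;
    }
}
```

For the sample data, the row for 2013-07-22 becomes the inputs {20.09, 20.23, 19.80, 20.03} with target 19.22, the D value of 2013-07-23. The values are left unnormalized, as in the question.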
AHNI is intended as a research platform to support my own research (and hopefully others along the way), rather than an "easy to use, throw generic machine learning problem X at it" kind of software package (depending on your definition of "easy to use"). I try to keep the code clean, well-organised and the API well-documented so that others may use it, but creating a full-blown tutorial (and functionality) for the many possible use-cases is beyond the scope of the project (though of course I'd gladly include tutorials written by others).
Before going further I recommend considering the below:
When googling around for previous research on using HyperNEAT for time-series prediction I came across a question I asked several years ago that is similar to yours that I had completely forgotten about (I was surprised to see my name attached to the question! :)) http://tech.groups.yahoo.com/group/neat/message/5470 The reply to this question is good food for thought on the matter. Additionally:
(ES-)HyperNEAT is designed to exploit geometric regularities (patterns, correlations) in the input or output (see http://eplex.cs.ucf.edu/papers/gauci_nc10.pdf), so one question that might be worth exploring is whether the data contains regularities that can be represented geometrically (in my question I suggested plotting some window of the time-series on a 2D plane, which the 2D input layer of the network "sees", similar to the approach used in http://eplex.cs.ucf.edu/papers/verbancsics_gecco10.pdf). However, it sounds like NEAT, using a recurrent network, might be just as good if not better than HyperNEAT for this kind of problem.
I was wondering if anyone had experience retrieving data with the 3270 protocol. My understanding so far is:
Connection
I need to connect to an SNA server using telnet and issue a command, and then some data will be returned. I'm not sure how this connection is made, since I've read that a standard telnet connection won't work. I've also read that IBM has a library to help, but I haven't got as far as finding out any more about it.
Parsing
I had assumed that the data being returned would be a string of 1920 characters, since the 3278 screen was 80x24 chars, and that I would simply need to parse these chars into the appropriate fields. The more I read about the 3270 protocol, the less this seems to be the case: the documentation provided with a trial of the Jagacy 3270 Java library says that attributes are marked in the protocol with the char 'A' before the attribute, and my understanding is that there are more chars denoting other factors, such as whether fields are editable.
I'm reasonably sure my thinking has been too simplistic. Take an example like a screen containing a list of items - pressing a special key on one of the 24 visible rows drills down into more detailed information regarding that row.
It's also been suggested to me that print commands can be issued. This has some positive implications: if the string returned is not 1920 characters long because it contains characters such as 'A' denoting how users interact with the terminal, printing would eliminate these. It would also avoid having to page through lots of data. The flip side is that I wouldn't know how to get the data from the print command back into Java.
So..
I currently don't have access to the SNA server, but I have some screenshots of what the terminal will look like once I get a connection, so I was going to start work on parsing. With so many assumptions and not much idea of what the data will look like, I feel really stumped. Does anyone have any knowledge of these systems that might help me get back on track?
You've picked a ripper of a problem there. 3270 is a very complex protocol indeed. I wouldn't bother about trying to implement it, it's a fool's errand, and I'm speaking from painful personal experience. Try to find a TN3270 (Telnet 3270) client API.
This might not specifically answer your question, but...
If you are using Rational Developer for z/OS, your Java code should be able to use the integrated HATS product to deal with the 3270 stream. It might not fit your project, but I thought I would mention it: if all you are trying to do is some simple screen scraping, it makes things very easy.
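If a TN3270 client library or screen-scraping tool hands you the rendered screen as plain text, the parsing step from the question reduces to cutting fields out of an 80x24 buffer by position. The sketch below assumes exactly that: a 1920-character string with attribute bytes already rendered as blanks, which is not what the raw 3270 data stream looks like. ScreenText is a made-up class, not part of any of the products mentioned, and the field positions would come from your screenshots.

```java
final class ScreenText {
    static final int ROWS = 24;
    static final int COLS = 80;

    private final String buffer;

    /** Wraps a rendered screen of exactly 24 x 80 = 1920 characters. */
    ScreenText(String buffer) {
        if (buffer.length() != ROWS * COLS) {
            throw new IllegalArgumentException("expected " + (ROWS * COLS) + " characters");
        }
        this.buffer = buffer;
    }

    /** Returns one full row; rows are 1-based as on the terminal. */
    String row(int row) {
        int start = (row - 1) * COLS;
        return buffer.substring(start, start + COLS);
    }

    /** Returns the trimmed text of a field given its 1-based row and column and its length. */
    String field(int row, int col, int length) {
        if (col < 1 || col - 1 + length > COLS) {
            throw new IllegalArgumentException("field does not fit in one row");
        }
        int start = (row - 1) * COLS + (col - 1);
        return buffer.substring(start, start + length).trim();
    }
}
```

This lets you start on the field extraction with text mocked up from the screenshots before the server connection is available.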