Finding the right Architectural pattern for flight search engine [closed] - java

I am working on a college project in Java. One of the first tasks we were given is to choose an architectural pattern for our project (MVC, Repository, layers, etc.) and present it visually.
I have found many examples of architectural patterns online, but I can't find anything that matches the idea of the project exactly.
I also couldn't find an architectural pattern example for a similar project (a flight search engine system).
I'd appreciate any help finding the right architectural pattern for the system we're creating in our project. Details about the system are below:
Main functions: sign up, log in, search, place an order, and export reports for a travel agent or for the agency as a whole.
Only a certified travel agent, or a travel agent from a travel agency, can sign up for the system and use it. Passengers cannot use the system directly.
The agent can run a search. Search results are pulled from a static JSON file (it is not a complex system, so nothing comes from a real-time database; we just shuffle the file every two hours or so).
The search has different filters, including destination, origin country, number of passengers, one-way or round trip, and other non-mandatory fields.
The results are listed from best to worst (by price and shortest path). The algorithm to calculate the price is fairly simple and is based on the airline type (charter or scheduled flight), day of the week, season, holidays, etc.
If the customer (passenger) is interested in a flight, the travel agent can order it on their behalf. An email with the order details is sent to the customer. The seats available on that ordered flight are reduced accordingly and updated in the file we allocated for that specific airline.
In addition, an export option lets the agent view all of the orders they have made, both all-time and for specific dates. Cancellation is also possible.
That's it about the project.
I'd appreciate any help!
Thanks!

You should consider changing the term "architectural pattern" to "architectural style". Then keep in mind that an architecture is usually a set of multiple architectural styles composed together into a system.
In other words, you should choose several architectural styles, not a single one, when designing a system. From your description, I would use an MVC approach for the web-facing parts: login, sign-up, and placing an order, using models, views, and controllers. I assume you will read in detail about what a model, a view, and a controller are.
I would also use a layered ports-and-adapters (onion architecture) style for better decoupling of the code. Use adapters for interaction with external systems such as the database, and think in terms of a domain model with domain entities, aggregates, and repositories.
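To make that concrete, here is a minimal Java sketch (Java 16+ for records) of how those layers could fit together for the search feature. Every name in it (Flight, FlightRepository, JsonFileFlightRepository, FlightSearchService, FlightSearchController) is invented for illustration, and the JSON parsing is left as a placeholder:

import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;

// Domain layer: an entity and a repository "port" (interface).
record Flight(String origin, String destination, double price, int seatsAvailable) {}

interface FlightRepository {
    List<Flight> findByRoute(String origin, String destination);
}

// Infrastructure layer: an adapter that reads the static JSON file.
class JsonFileFlightRepository implements FlightRepository {
    private final Path jsonFile;

    JsonFileFlightRepository(Path jsonFile) { this.jsonFile = jsonFile; }

    @Override
    public List<Flight> findByRoute(String origin, String destination) {
        // Parse the JSON file here (e.g. with Jackson) and filter by route.
        return List.of(); // placeholder
    }
}

// Application layer: the service holding the pricing/sorting rules.
class FlightSearchService {
    private final FlightRepository flights; // depends on the port, not the adapter

    FlightSearchService(FlightRepository flights) { this.flights = flights; }

    List<Flight> search(String origin, String destination) {
        return flights.findByRoute(origin, destination).stream()
                .sorted(Comparator.comparingDouble(Flight::price))
                .toList();
    }
}

// Presentation layer: the "controller" in MVC terms, called by the view/UI.
class FlightSearchController {
    private final FlightSearchService service;

    FlightSearchController(FlightSearchService service) { this.service = service; }

    List<Flight> handleSearch(String origin, String destination) {
        return service.search(origin, destination);
    }
}

The view only talks to the controller, and only the adapter knows that flights currently come from a shuffled JSON file, so swapping in a real database later only means writing a new adapter.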
Good luck!


REST API Design for Complex Operations [closed]

Queries:
I'm working on building a REST API to support searching/filtering of transactions. Below are the two requirements the API is expected to support:
Retrieve a list of transactions by an array of [transactionid] (minimum of 1)
Retrieve a list of transactions by ((transactionType=Sale OR transactionType=Refund) AND storeid=XXXX)
I am looking to design this as a POST request, treating search as a resource, something like the design below. I am puzzled by the second requirement, which involves complex querying with "AND"/"OR" operations. Any input would be deeply appreciated. The API is expected to support searching transactions on varied attribute combinations.
Design for Requirement 1
POST /sales-transactions/search
{
  "searchrequestid": "xxxxxx",
  "transactionids": [1, 2, 3, 4, ..]
}
If "retrieve" is an essentially read only operation, then you should be prioritizing a design that allows GET, rather than POST.
Think "search form on a web page"; you enter a bunch of information into input controls, and when you submit the form the browser creates a request like
GET /sales-transactions/search?searchrequestid=xxxxxx&transactionIds=1,2,3,4...
Query parameters can be thought of as substitutions; the machines don't care which parameters are used for AND and which are used for OR.
select * from transactions where A = :x and B = :y or C = :z
GET /sales-transactions/search?A=:x&B=:y&C=:z
Because the machines don't care, you have the freedom to choose spellings that make things easier for some of your people. So you could instead, for example, try something like
GET /sales-transactions/AandBorC?A=:x&B=:y&C=:z
It's more common to look to your domain experts' language for the name of the report, and use that:
GET /some-fancy-domain-name?A=:x&B=:y&C=:z
When we start having to support arbitrary queries through the web interface, and those queries become complicated enough that we start running into constraints like URI lengths, the fall back position is to use POST with the query described in the message body.
And that's "fine"; you give up caching, and safety, and idempotent semantics; but it can happen that the business value of the adhoc queries wins the tradeoff.

Advice required on choosing SQL or NoSQL framework for searching / persisting [closed]

We are trying to build the backend for a job portal, for which we are building Android and iPhone clients.
Here are the basic fields which need to be persisted/searchable.
User metadata and preferences need to be stored:
category to which the user belongs (single value)
skills of the user (multi-value)
user location as text and as latlng
Job data and its searchable fields:
job category (single value)
job skills (multi-value)
job location as text as well as latlng
Some of the basic use cases:
When a job is about to be posted, we should be able to get a list of candidates near the job location based on job category/skills and latlng.
When a job is posted, it has to be matched against the actual candidates, and their meta information has to be persisted in another table/schema.
When a new user onboards, get suitable jobs for the candidate and store them in another table.
This data will be served to the Android/iPhone clients and a web dashboard in real time.
I need your suggestions for choosing the framework, considering HA, scalability, reliability, and cost.
You might want to use both MySQL and Solr for different purposes. For persisting the data, it is better to use MySQL or a similar database because it gives you the ACID properties. You should index your job and user data into Solr/Lucene, which can serve real-time search on your platform and provide suggestions for an auto-completion feature. Solr also provides geo-location search, which could be used to match users and jobs, and you can always build a recommendation feature on top of that. SolrCloud can be configured for HA and scalability.
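As a hedged illustration of the geo part using the SolrJ client, here is a sketch; the core name (candidates), the field names (latlng, category, skills, id), and the coordinates are hypothetical and must match your actual schema:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class CandidateGeoSearch {
    public static void main(String[] args) throws Exception {
        // Hypothetical core/collection URL.
        HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/candidates").build();

        SolrQuery query = new SolrQuery("category:software AND skills:java");
        // Candidates within 25 km of the job's location, nearest first.
        query.addFilterQuery("{!geofilt sfield=latlng pt=12.9716,77.5946 d=25}");
        query.addSort("geodist(latlng,12.9716,77.5946)", SolrQuery.ORDER.asc);
        query.setRows(20);

        QueryResponse response = solr.query(query);
        for (SolrDocument doc : response.getResults()) {
            System.out.println(doc.getFieldValue("id") + " -> " + doc.getFieldValue("skills"));
        }
        solr.close();
    }
}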
For searching give Solr/Lucene a try. It is extremely scalable, battle-tested and mature.
They are different tools with different advantages and disadvantages. Both are used on a massive scale. For small projects the business answer is probably "Do what you know, because it will save you developer-hours."
Just be sure it doesn't lock you into a situation where it's hard to make a change you want down the road. See, e.g., http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/

Develop algorithm for determining data availability [closed]

We are using a Change Data Capture tool to migrate source data to a target database in near real-time.
The challenge is to identify as accurately as possible the data migration latency that exists between
source and target. The latency reporting capabilities of the tool are not to our satisfaction and so I
have been tasked with developing a process that will better monitor this specific metric.
There are two main reasons why we need to know this:
1: Provide our users with an accurate data availability matrix to support report scheduling. For example,
How much time should pass after midnight before scheduling a daily reconciliation report for the
previous day given that we want this information as soon as possible?
2: Identify situations when the data mirroring process is running slower than usual (or has even stopped).
This will trigger an email to our support team to investigate.
I am looking for some general ideas about how best to go about this seemingly simple task.
My preferred approach is a dedicated heartbeat or health-check table.
At the source, the table has an identity column (SQL Server) or a value from a sequence (Oracle) as the main identifier; a fixed task-name string; a fixed server string (if not already identified by the task name); and the current time.
Have a script/job on the source insert a record every minute (or every 2 or 10 minutes).
In the CDC engine (if there is one), add a column with the time the change event was processed.
At the target, add a final column defaulting to the current time at insert.
A single target table can accommodate multiple sources/tasks.
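For the source side, a minimal Java/JDBC sketch of such a heartbeat job; the table name (cdc_heartbeat), its columns, the connection details, and the task/server strings are illustrative only:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class HeartbeatWriter {
    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Insert one heartbeat row per minute; the identity/sequence value and the
        // target-side insert timestamp are filled in by the databases themselves.
        scheduler.scheduleAtFixedRate(() -> {
            String sql = "INSERT INTO cdc_heartbeat (task_name, server_name, source_time) "
                       + "VALUES (?, ?, CURRENT_TIMESTAMP)";
            try (Connection conn = DriverManager.getConnection(
                        "jdbc:sqlserver://source-host;databaseName=ops", "user", "password");
                 PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setString(1, "orders-replication"); // fixed task name
                ps.setString(2, "source-db-01");       // fixed server string
                ps.executeUpdate();
            } catch (Exception e) {
                e.printStackTrace(); // a real job would alert on repeated failures
            }
        }, 0, 1, TimeUnit.MINUTES);
    }
}

Latency for a task is then just the difference between source_time and the timestamp added at the target for its most recent heartbeat row.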
The regular blips will allow one to see at a glance whether changes are coming through, and whether the application is generating changes or not.
A straightforward report can show the current latency, as well as the latency over time.
It is nice to be able to compare 'this Monday' with 'last Monday' to see if things are similar, better, or worse.
Cheers, Hein.

Creating a database from various web pages? [closed]

Is there a way, using Java or Python, that I can somehow gather a ton of information from a ton of different colleges on a website such as College Board?
I want to know how to do things like this but I've never really programmed outside of default libraries. I have no idea how to start my approach.
Example:
I input a large list of colleges from a list that looks somewhat like this:
https://bigfuture.collegeboard.org/print-college-search-results
The code then finds the page for each college such as
https://bigfuture.collegeboard.org/college-university-search/alaska-bible-college?searchType=college&q=AlaskaBibleCollege
and then gathers information from the page such as tuition, size, etc.
and then stores it in a class that I can use for analysis and stuff
Is something like this even possible? I remember seeing a similar program in The Social Network. How would I go about this?
So, short answer, yes. It's perfectly possible, but you need to learn a bunch of stuff first:
1) The basics of the DOM (HTML) so you can parse the page
2) The general idea of how servers and databases work (and how to interface with them in Python, which is what I use, or Java)
3) Sort of a subsection of 2: learn how to retrieve HTML documents from a server so you can then parse them
Then, once you do that, this is the procedure a program would have to go through:
1) You need to come up with a list of pages that you want to search. If you want to search an entire website, you need to narrow that down somehow. You can easily limit your program to just search certain types of forums, which all have the same format on College Board. You'll also want to add a part of the program that builds a list of web pages your program finds links to. For instance, if College Board has a page with a bunch of links to different pages with statistics, you'll want your program to scan that page to find the links to the pages with those statistics.
2) You need to find the ID, location, or some identifying marker of the HTML tag that contains the information you want. If you want to get REALLY FANCY (and I mean REALLY fancy) you can try to use some algorithms to parse the text and try to get information (maybe trying to parse admission statistics and stuff from the text on the forums)
3) You then need to store that information in a database that you then index and create an interface to search (if you want this whole thing to be online, I suggest the Python framework Django for making it a web application). For the database type, it would make sense to use SQLite 3.
So yes, it's perfectly possible, but here's the bad news:
1) As someone already commented, you'll need to figure out step 2 for each individual web page format you do. (By web page format, I mean different pages with different layouts. The stack overflow homepage is different from this page, but all of the question pages follow the same format)
2) Not only will you need to repeat step 2 for each new website, but if the website does a redesign, you'll have to redo it again as well.
3) By the time you finish the program you may have easily gathered the info on your own.
Alternative and Less Cool Option
Instead of going through all the trouble of searching the web page for specific information, you can just search the web page and extract all its text, then try and find keywords within the text relating to colleges.
BUT WAIT, THERE'S SOMETHING THAT DOES THIS ALREADY! It's called Google :). That's basically how Google works, so... yeah.
Of course there is "a way". But there is no easy way.
You need to write a bunch of code that extracts the stuff you are interested in from the HTML. Then you need to write code to turn that information into a form that matches your database schema ... and do the database updates.
There are tools that help with parts of the problem; e.g. web crawler frameworks for fetching the pages, jsoup for parsing HTML, JavaScript engines if the pages are "dynamic", etc. But I'm not aware of anything that does the whole job.
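For example, a small jsoup sketch for fetching one of the college pages and pulling a couple of values out of it; the CSS selectors here are invented and would have to be matched to the real markup:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class CollegePageScraper {
    public static void main(String[] args) throws Exception {
        String url = "https://bigfuture.collegeboard.org/college-university-search/alaska-bible-college";
        Document doc = Jsoup.connect(url)
                .userAgent("Mozilla/5.0") // some sites reject requests without a user agent
                .get();

        // Hypothetical selectors; inspect the real page to find the right ones.
        Element tuition = doc.selectFirst(".tuition-value");
        Element size = doc.selectFirst(".student-body-size");

        System.out.println("Tuition: " + (tuition != null ? tuition.text() : "not found"));
        System.out.println("Size: " + (size != null ? size.text() : "not found"));
    }
}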
What you're asking about here is called scraping and in general
it's quite tricky to do right. You have to worry about a bunch of
things:
The data is formatted for display, not programmatic consumption.
It may be messy, inconsistent, or incomplete.
There may be dynamic content, which means you might have to run a
JavaScript VM or something just to get the final state of the page.
The format could change, often.
So I'd say the first thing you should do is see if you can access the
data some other way before you resort to scraping. If you poke around
in the source for those pages, you might find a webservice feeding data
to the display layer in XML or JSON. That would be a much better place
to start.
OK everyone, thanks for the help. Here's how I ended up doing it. It took me a little while, but thankfully College Board uses very simple addresses.
Basically there are 3972 colleges and each has a unique, text only page with an address that goes like:
https://bigfuture.collegeboard.org/print-college-profile?id=9
with id ranging from 1 to 3972.
Using a library called HtmlUnit, I was able to access all of these pages, convert them into strings, and then gather the info using indexOf.
It's still going to take about 16 hours to process all of them, but I've got a hundred or so saved.
Maybe I lucked out with the print page but I got what I needed and thanks for the help!
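Roughly, that loop could look like the sketch below with HtmlUnit (assuming a reasonably recent version of the library; the "Tuition" marker string is only an example of the indexOf approach):

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class CollegeProfileDownloader {
    public static void main(String[] args) throws Exception {
        try (WebClient webClient = new WebClient()) {
            webClient.getOptions().setJavaScriptEnabled(false); // the print pages are static
            webClient.getOptions().setCssEnabled(false);

            for (int id = 1; id <= 3972; id++) {
                HtmlPage page = webClient.getPage(
                        "https://bigfuture.collegeboard.org/print-college-profile?id=" + id);
                String text = page.asText();

                // Example marker only; the real extraction depends on the page wording.
                int idx = text.indexOf("Tuition");
                if (idx >= 0) {
                    System.out.println(id + ": " + text.substring(idx, Math.min(idx + 80, text.length())));
                }
            }
        }
    }
}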

Are there free realtime financial data feeds since the demise of OpenQuant? [closed]

Now that the oligopoly of market data providers has successfully killed OpenQuant, is there any alternative left to proprietary and expensive subscriptions for realtime market data?
Ideally I would like to be able to monitor tick by tick securities from the NYSE, NASDAQ and AMEX (about 6000 symbols).
Most vendors put a limit of 500 symbols watchable at the same time, which is unacceptable to me, even if one can imagine a rotation among the 500 symbols, i.e. making windows of 5 seconds of effective observation out of each minute for every symbol.
Currently I'm doing this with a Java thread pool calling Google Finance, but this is unsatisfactory for several reasons: one is that Google doesn't return the volume traded, but the main one is that Google promptly kills bots attempting to take advantage of this service ;-)
Any hint much appreciated,
Cheers
I think you'll find all you need to know by looking at this question: source of historical stock data
I don't know of any free data feeds other than Yahoo!, but it doesn't offer tick-by-tick data; it only offers 1-minute intervals with a 15-minute delay. If you want to use an already existing tool to download the historical data, then I would recommend EclipseTrader. It only saves the Open, Close, High, Low, and Volume.
You can write your own data scraper with very little effort. I've written an article on downloading real-time data from Yahoo on my blog, but it's in C#. If you're familiar with C#, you'll be able to translate it into Java pretty quickly. If you write your own data scraper, you can get pretty much ANYTHING that Yahoo! shows on their web site: Bid, Ask, Dividend Share, Earnings Share, Day's High, Day's Low, etc, etc, etc.
If you don't know C# then don't worry, it's REALLY simple: Yahoo allows you to download CSV files with quotes just by modifying a URL. You can find out everything about the URL and the tags that are used on yahoo here: http://www.gummy-stuff.org/Yahoo-data.htm
Here are the basic steps you need to follow:
Construct a URL for the symbol or multiple symbols of your choice.
Add the tags which you're interested in downloading (Open, Close, Volume, Beta, 52 week high, etc, etc.).
Create a URLConnection with the URL you just constructed.
Use a BufferedReader to read the CSV file that is returned from the connection stream (see the sketch after the format notes below).
Your CSV will have the following format:
Each row is a different symbol.
Each column is a different tag.
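Putting those steps together in Java might look like the sketch below. Note that this old Yahoo CSV endpoint has since been retired, so treat it purely as an illustration of the URL + URLConnection + BufferedReader flow; the f= tag letters are assumed here (s symbol, o open, p previous close, v volume) per the gummy-stuff reference above:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class YahooCsvQuoteDownloader {
    public static void main(String[] args) throws Exception {
        // s = symbols to fetch, f = requested tags.
        String url = "http://download.finance.yahoo.com/d/quotes.csv?s=MSFT+GOOG&f=sopv";

        URLConnection connection = new URL(url).openConnection();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) { // one row per symbol
                String[] columns = line.split(",");      // one column per tag
                System.out.println(String.join(" | ", columns));
            }
        }
    }
}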
Open a TD Ameritrade account and you will have free access to the ThinkOrSwim real-time trading and quotes platform. Live trading is real time and paper trading is delayed 15 minutes. I forget what the minimum required to open a TD Ameritrade account is, but you can go to TDAMeritrade.com or thinkorswim.com to check them out.
Intrinio has a bunch of feeds with free and paid tiers. Essentially you only have to pay for what you need as opposed to the bigger data suppliers. Intrinio focuses on data quality and caters to developers as well, so I think it'd be a great option for you.
full disclosure - I work at Intrinio as a developer
There's a handy function in Google Sheets (ImportHTML) which I've been using for a while to reasonable effect.
For example -
=Index(ImportHTML("http://www.bloomberg.com/markets/commodities/futures/metals/","table",1),5,3) returns the EUR Gold spot price.
It works with Yahoo too, so =Index(ImportHTML("http://finance.yahoo.com/q?s=DX-Y.NYB","table",0),2,2) returns the DXY.
The data updates with some small delay but it's usable.
