How to deal with multiple REST clients running on one server - Java

Here is my situation:
I wrote two REST clients in Java that run on my server. They are packaged as runnable JAR files, and I run them on a schedule. Each run syncs around 3 MB of data.
Recently, I needed to write two more clients to sync from other resources. Until now, I have had no software architecture experience with building an efficient client system.
My problem is that my server, which runs Microsoft Windows Server 2003 R2, is underpowered. The hardware is as follows:
CPU: Intel Xeon E5649 2.53 GHz
RAM: 2 GB
I use a MySQL database, and it manages only around 20 SQL writes per second, which is very slow. Right now, one synchronization takes 2 hours. I cannot imagine four REST clients running on Monday.
I need help with how to run four clients on a low-capacity server. However, please don't try to convince me to switch to a more powerful server :)
I have been thinking about this for a long time. Could I build a single client that syncs data from the different resources? Or should I build four clients that run on different schedules? The other problem is that more resources will be added in the future. I don't know how to build a system that scales well because of my lack of knowledge.
If you could give me some advice to push my learning a little bit, it would be much appreciated. Thank you.
Further information:
The goal is to fetch data from different RESTful servers, each with its own API and GET query rules, and insert the data into a database. So basically the application's job is to put data into MySQL via RESTful calls.
In addition, that is all I focus on. I do not need to consider what happens to the data after it is inserted. The structure is simply:
Get Restful call and get JSON format result
parsing JSON
insert into DB
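The three steps above can be sketched as one small interface per resource plus a single runner that works through the resources one at a time. This is only a rough sketch: the names SyncSource and SyncRunner are hypothetical, the real fetch/parse/store bodies (Jersey call, JSON parsing, JDBC insert) are left to each implementation, and it assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** One remote resource and its three steps (hypothetical interface). */
interface SyncSource<T> {
    String name();
    String fetch() throws Exception;              // step 1: REST call, returns the raw JSON
    List<T> parse(String json) throws Exception;  // step 2: JSON text to row objects
    void store(List<T> rows) throws Exception;    // step 3: insert the rows into the DB
}

class SyncRunner {
    private final List<SyncSource<?>> sources = new ArrayList<>();

    void register(SyncSource<?> source) {
        sources.add(source);
    }

    /** Runs fetch, parse, store for one source and returns the number of rows stored. */
    static <T> int runOne(SyncSource<T> source) throws Exception {
        List<T> rows = source.parse(source.fetch());
        source.store(rows);
        return rows.size();
    }

    /** Runs every source one after another; returns the names of the sources that failed. */
    List<String> runAll() {
        List<String> failed = new ArrayList<>();
        for (SyncSource<?> source : sources) {
            try {
                runOne(source);
            } catch (Exception e) {
                failed.add(source.name()); // one broken API must not stop the others
            }
        }
        return failed;
    }

    /** Repeats runAll() on a single worker thread, so only one sync is active at a time. */
    ScheduledExecutorService start(long delayHours) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(this::runAll, 0, delayHours, TimeUnit.HOURS);
        return scheduler;
    }
}
```

With this shape, a new resource would mean one more SyncSource implementation and one more register call, and the single worker thread keeps the syncs from competing for the 2 GB of RAM. Whether one process or four separately scheduled JARs works better on this server is something to measure.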
I used the Jersey (JAX-RS) API to make the RESTful calls and JDBC to manage the database, but I am working on the next version, which will use Hibernate for the insert, delete, update, and search functions.
The application does not use any application server. I package the program as a runnable JAR file and schedule it to run on Windows Server 2003.
I am a recent graduate and this is my first REST client. Through a lot of research and study I have learned more about it, but I have no practical experience. I just want to make it work better. I appreciate any suggestions.

Use a profiler to gain insight into the actual memory and CPU usage of your application. You can then decide how to improve it using the appropriate means (e.g. multithreading, caching, compression).
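If the profiler shows that most of the time is spent waiting on MySQL rather than on the REST calls, batching the inserts inside one transaction is one of the cheaper things to try. Below is a minimal sketch using plain JDBC. The class name BatchInserter, the table synced_item and its two columns are hypothetical, and each row is assumed to be a two-element String array.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class BatchInserter {
    /**
     * Inserts all rows in one transaction, sending them to the server in batches.
     * Table and column names are placeholders for the real schema.
     */
    static int insertAll(Connection conn, List<String[]> rows, int batchSize) throws SQLException {
        String sql = "INSERT INTO synced_item (external_id, payload) VALUES (?, ?)";
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // one commit at the end instead of one per row
        int pending = 0;
        int total = 0;
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.addBatch();
                pending++;
                total++;
                if (pending == batchSize) {
                    ps.executeBatch();
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch(); // flush the last partial batch
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback(); // leave the table as it was before this run
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
        return total;
    }
}
```

With MySQL Connector/J, the rewriteBatchedStatements=true connection property is usually what lets such batches go out as multi-row inserts. How much this helps on a given server is something the profiler has to confirm.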

Related

Java - Network application- Real-Time

What are the recommended strategies for building a Java application that will run on the "desktop", not in a browser? Characteristics of the application would be:
1. Multiple application instances would be running on different machines
2. Applications must communicate in real time (if one user makes changes, the data in the other applications must be refreshed)
Do you want to create a networking application, maybe based on sockets and so on? I implemented that scenario some time ago and I am working on something similar for my job. It is not complex at all. I will answer according to the two points that concern you.
Multiple application instances would be run on different machines.
If you are going to install an instance of the application on people's desktops, I'd suggest being very careful with paths. Do not hard-code any path, since resources will be loaded dynamically.
Check carefully what network architecture your application will be installed in. Maybe it is just a LAN, or maybe it will work in a big network with access through a VPN, etc. Check what the scenario is.
Once you have made sure your application works fine on different machines without any path or resource-loading conflicts, you can export your JAR, generating it with Maven, Ant, etc.
Also, if you want to go further, you can create an installer with any installer-creation tool, plus a launcher (a .bat batch file or an .exe for Windows, a .sh script for Linux distributions). But these are only suggestions for the installation stage.
On the other hand, if you want to run the application as a Java desktop application but launch it from a URL, you can take a look at JNLP.
Applications must communicate in real time (if one user makes changes, the others will be able to see them)
If you want to do that, you will certainly need a server to provide and store the information. The server can be a physical machine set up in the office or a remote one.
You have two options here:
Use Java networking: create an application that works as a server that provides and saves the information (it should be built for concurrency, since many people will perform transactions or queries against it). Look at how to create a basic client-server application using sockets to understand better how it works; after that you should not have problems adding the complexity your environment demands.
Or you can simply develop a REST-based Java application and make your client application connect to the machine (or machines, if you plan to implement load balancing) and consume those REST services. You can take a look at the Jersey libraries to implement this scenario. Make sure to add security to these web services and make the server accessible only from the network in which your application instances will work.
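To give an idea of the first option, here is a minimal sketch of a socket server that relays every line it receives to all connected clients, which is the core of "one user changes something, the others get refreshed". The class name BroadcastServer and the line-per-message protocol are assumptions for illustration. A real application would add message framing, authentication, and persistence.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class BroadcastServer {
    private final ServerSocket serverSocket;
    private final List<PrintWriter> clients = new CopyOnWriteArrayList<>();

    BroadcastServer(int port) throws IOException {
        serverSocket = new ServerSocket(port); // port 0 lets the OS pick a free port
    }

    int port() {
        return serverSocket.getLocalPort();
    }

    /** Accepts clients on a background thread and starts one handler thread per client. */
    void start() {
        Thread acceptor = new Thread(() -> {
            try {
                while (true) {
                    Socket socket = serverSocket.accept();
                    Thread handler = new Thread(() -> handle(socket));
                    handler.setDaemon(true);
                    handler.start();
                }
            } catch (IOException e) {
                // the server socket was closed: stop accepting
            }
        });
        acceptor.setDaemon(true);
        acceptor.start();
    }

    /** Registers the client, then forwards each line it sends to every registered client. */
    private void handle(Socket socket) {
        PrintWriter out = null;
        try (Socket s = socket) {
            out = new PrintWriter(
                    new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
            clients.add(out);
            String line;
            while ((line = in.readLine()) != null) {
                for (PrintWriter client : clients) {
                    client.println(line); // the sender also gets its own message back
                }
            }
        } catch (IOException e) {
            // the client disconnected
        } finally {
            if (out != null) {
                clients.remove(out);
            }
        }
    }

    void stop() throws IOException {
        serverSocket.close();
    }
}
```

Each desktop instance would open a Socket to this server, send a line when the user changes something, and refresh its view whenever a line arrives. One thread per client is fine for an office-sized deployment; it is not meant to scale to thousands of connections.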
Well, that's what I can tell you about the scenario you are trying to implement, based on what I've done and what I'm doing now.
If you need further information, you can reply in the comments, and I will be glad to help you.
Regards and happy coding :)
You want to look into using sockets, TCP or UDP, and also figure out whether you want a central authoritative server (what if two users change the same thing in different ways, whose data is saved?).
Read this article from Oracle/Java: Java Custom Networking

iOS remote MySQL database, technology recommendation

There is a web application, journalism related, that uses MySQL databases and presents a web-based interface to users.
I want to build an iOS app that provides a mobile interface as well. The UI is pretty easy and I have experience with that.
The problem is with the database, which I have no experience with.
I will be learning about databases and probably take the Coursera course on it. I am not asking you to teach me that. I just wanna know which technologies I should invest my time in over the next couple months.
My understanding so far is that the app should not talk to the database directly,
but rather there should be something on the server talking to the database on behalf of the app.
This is the question and the part I want to understand clearly, so correct me if I am wrong.
Will I have to write some sort of Unix program that runs on the server, talks to the DB, and then communicates back to the app? How? Using a web view? Using Unix sockets to talk to the app? SSH? Which one is cool with Apple?
My preference for writing something like that on the server would be: Python (have experience), Java (have experience), and maybe Ruby (no experience). I'd prefer to avoid scripting languages.
Are they OK? Which one is best suited? Also, does this middle dude have to be on the same server as the database, or can it be another machine on the internet? (I'd prefer this, so I can put it on my own VPS and not have to mess with the server machine.)
This is similar to another question from tonight, but you're coming at it from a different angle.
In general terms, an iOS application that needs to be able to run in offline mode will need to have its own database. This means creating Core Data models to store all of the data required by the application. Internally this is stored in a SQLite database.
If you want to make an application that's online-only, it's somewhat easier since you won't need to worry about the Core Data part and can instead focus on building your service API. If you're familiar with Python then your best bet is Django to provide that layer. You'll need to implement a number of endpoints that can receive requests, translate that into the appropriate database calls, then render the result in a machine readable format.
Scripting languages are what power most back-ends even for massive scale systems. In most cases the database will be the bottleneck and not the language used to interface with it. Even Twitter stuck with Ruby until they hit tens of millions of active users, so unless you're at that level, don't worry about it.
For most applications, using HTTP as your transport mechanism and JSON as your encoding method is the way to go. It's very simple to construct, easy to consume, and fairly easy to read. There are probably a number of ways you might go about reading and writing this, but that's another question.
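To make the HTTP plus JSON idea concrete, here is a minimal sketch of one read-only endpoint. It is written in Java with the JDK's built-in com.sun.net.httpserver package purely to show the shape; the same endpoint in Django or any other framework looks much alike. The class name ArticleApi, the /articles path, and the hard-coded JSON are all made up for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

class ArticleApi {
    /** Starts an HTTP server with a single JSON endpoint at /articles. */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/articles", exchange -> {
            if (!"GET".equals(exchange.getRequestMethod())) {
                exchange.sendResponseHeaders(405, -1); // method not allowed, no body
                exchange.close();
                return;
            }
            // A real service would query the database here and encode the rows
            // with a JSON library; the hard-coded string stands in for that.
            byte[] body = "[{\"id\":1,\"title\":\"Hello\"}]".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

The iOS app would then issue an ordinary HTTP GET to this URL and decode the JSON it gets back; it never sees the database or its credentials.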
For small-scale applications where the number of users is measured in the hundreds then you can host the application and database on the same server. Even a modest VPS with 512MB of memory might do the job, though for heavier loads you might want to invest in a 1GB instance. It really depends on how often people are accessing your application and what the peak loads are like.

multi mySQL driven web site

I have a database-driven web site that needs more than one MySQL server to handle the expected demand.
I also need to implement a backup system (of some type) to keep the data safe.
I'm using Java, but that's not critical.
What options are available to me from the projects out there?
I'm thinking of somehow daisy-chaining the MySQL servers, so that when one is busy the request goes to the next, with data being written to all of them. I know they can measure time used, so they must be able to measure when they are in use.
You might want to look into clustering.
http://www.mysql.com/products/cluster/
How about deploying a Cluster in the cloud?
http://www.mysqlconf.com/mysql2009/public/schedule/detail/6912

How is AWS for Data mining for school project?

I have to do a class project for data mining subject. My topic will be mining stackoverflow's data for trending topics.
So, I have downloaded the data from here, but the data set is so huge (posts.xml is 3 GB in size) that I cannot process it on my machine.
So, what do you suggest, is going for AWS for data processing a good option or not worth it?
I have no prior experience on AWS, so how can AWS help me with my school project? How would you have gone about it?
UPDATE 1
So, my data processing will be in 3 stages:
Convert XML (from so.com dump) to .ARFF (for weka jar),
Mine the data using algos in weka,
Convert the output to GraphML format which will be read by prefuse library for visualization.
So, where does AWS fit in here? I suppose there are two features in AWS which can help me:
EC2 and
Elastic MapReduce,
but I am not sure how MapReduce works or how I can use it in my project. Can I?
You can consider EC2 (the part of AWS you would be using for doing the actual computations) as nothing more than a way to rent computers programmatically or through a simple web interface. If you need a lot of machines and you intend to use them for a short period of time, then AWS is probably good for you. However, there's no magic bullet. You will still have to pick the right software to install on them, load the data either in EBS volumes or S3 and all the other boring details.
Also be advised that EC2 instances and storage are relatively expensive. Be prepared to pay 5-10x more than you would pay if you actually owned the machine/disks and used it for say 3 years.
Regarding your problem, I sincerely doubt that a modern computer cannot process a 3 gigabyte XML file. In fact, I just indexed all of Stack Overflow's posts.xml in Solr on my workstation and it all went swimmingly. Are you using a SAX-like parser? If not, that will help you more than all the cloud services combined.
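For reference, a SAX parser hands you one element at a time and never builds the whole document in memory, so the file size matters far less. Below is a minimal sketch with the JDK's built-in SAX parser that counts tag occurrences. It assumes that each post is a row element and that its Tags attribute looks like <java><xml>; check this against the dump you actually downloaded. The class name TagCounter is made up.

```java
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class TagCounter extends DefaultHandler {
    // Assumed format of the Tags attribute after entity decoding: <java><xml>
    private static final Pattern TAG = Pattern.compile("<([^>]+)>");

    final Map<String, Integer> counts = new HashMap<>();

    /** Called once per opening element; only row elements with a Tags attribute are counted. */
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if (!"row".equals(qName)) {
            return;
        }
        String tags = attrs.getValue("Tags");
        if (tags == null) {
            return; // this row has no Tags attribute
        }
        Matcher m = TAG.matcher(tags);
        while (m.find()) {
            counts.merge(m.group(1), 1, Integer::sum);
        }
    }

    /** Streams the XML through the handler and returns tag name to occurrence count. */
    static Map<String, Integer> count(InputStream xml) throws Exception {
        TagCounter handler = new TagCounter();
        SAXParserFactory.newInstance().newSAXParser().parse(xml, handler);
        return handler.counts;
    }
}
```

Calling TagCounter.count(new java.io.FileInputStream("posts.xml")) would walk the whole file while keeping only the count map in memory. The same handler could write ARFF lines instead of counting.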
Sounds like an interesting project or at least a great excuse to get in touch with new technology -- I wish there had been stuff like that when I went to school.
In most cases AWS offers you a bare-bones server, so the obvious question is, have you decided how you want to process your data? E.g. -- do you just want to run a shell script on the .xml files or do you want to use Hadoop, etc.?
The beauty of AWS is that you can get all the capacity you need -- on demand. E.g., in your case you probably don't need multiple instances just one beefy instance. And you don't have to pay for a root server for an entire month or even a week if you need the server only for a few hours.
If you let us know a little bit more on how you want to process the data, maybe we can help further.

Is it easier to scrape data for a gae app in dev and upload it to prod or should you scrape in prod?

I have to run a scraping task to collect data for my App Engine (Java) app.
I'm not sure which is best - scrape data in development mode and upload it to prod or scrape it while the app is running in production.
Does it make a difference?
Are there any difficulties with bringing large quantities of data from one environment to the other (dev->prod or prod->dev)?
The dev server itself probably isn't a great scraping tool; it's single-threaded and (at least for python; the java implementation might be drastically different) the datastore is fairly horrible when storing large amounts of data.
However, depending on what you're scraping, the production servers might not be well-suited to the task; if the sites can take longer than 10 seconds to respond to a request, the urlfetch API will timeout. If you can be sure that this won't be a problem, it's probably more convenient to do the scraping in production and write directly to the datastore.
If not, it might make sense to do the scraping with a standalone tool and then put the data into the production datastore either with a RESTful web service or the remote API.
EDIT: The production servers can now set a 10 minute timeout on urlfetches initiated from taskqueue or cron jobs, so these objections might not apply anymore.
I find that spiders running in production often time out. Your solution of using the dev server is a good one, but also consider implementing each fetch through taskqueue.
See this question on how to configure the remote API for Java to use the Python bulk data loader. You can also write a custom loader.
