How to create big data project? [closed] - java

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
Could anyone guide me on creating a big data project in Java, and on which tools and technologies are needed to develop it?
Hadoop
MongoDB
NoSQL
Of the technologies mentioned above, which one is used to develop big data applications in Java?

This question is very broad; big data "projects" can range from writing MapReduce jobs, to using frameworks like Spring XD to automate the import of data into your environment, to using libraries like GraphX (graph processing) and MLlib (machine learning) to analyze the data you have. The first step towards starting your project would be to figure out what you or your organization want to accomplish.
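To give a feel for what a MapReduce job does, here is a minimal plain-Java sketch of the two phases for a word count. It does not use Hadoop at all; the class and method names are made up for illustration, and a real job would be written against the Hadoop API and run on a cluster.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the map and reduce phases; no Hadoop involved.
// All names here are hypothetical.
class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: sum the counts emitted for each distinct word.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Driver: map every line, then reduce all emitted pairs together.
    static Map<String, Integer> count(List<String> lines) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        for (String line : lines) {
            all.addAll(map(line));
        }
        return reduce(all);
    }

    public static void main(String[] args) {
        // prints {big=2, data=2, in=1, java=1, tools=1}
        System.out.println(count(List.of("big data in Java", "Big data tools")));
    }
}
```

In a real framework the map calls run in parallel on different machines and the pairs are grouped by key before reducing; the sketch only shows the shape of the computation.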
Since your question seems, at least in part, to be asking which technologies to familiarize yourself with, I would suggest getting a Cloudera or Hortonworks VM to stand up and play with an environment. Those VMs come with a complete suite of big data tools for you to work with and develop for, and are the easiest way to start figuring out what you can do. Once you have a better idea of your organization's goals or your own interests, more specific internet searches will lead you to books, tools, and tutorials for what you want to do.

Related

Need design guidance to develop java spring hibernate web application [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I want to create a Java web application using Spring and Hibernate. I know it's not difficult to implement the functionality, but I need some help/guidance from the architecture perspective. Can anyone suggest the best design, covering interfaces, design patterns, etc.?
I also need to know which versions of Spring and Hibernate I should use.
The best way to start implementing a web application using the technologies you mentioned is to follow one of the many tutorials you can find using Google.
Another good option is to find a skeleton application that someone has created and shared on a source code hosting service like GitHub or Bitbucket (check the licenses too). You can check out the code and have an initial working example that you can work on and expand.
If something does not work during these attempts, then please come back here, search if your question is already asked by someone else, and if not place your question with specific code snippets and error messages you may get.
If everything works well and you need advice on ways to improve performance, your architecture, or your software patterns, then come back here with a specific question as well; in some cases you will find Code Review more suitable for this kind of question.
Good luck!

Integration of different parts of code [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I am working on a small project on metadata extraction from documents and have run into a dilemma. I have some Java libraries that work well for document handling and information retrieval, like Apache Tika and POI, some more tools in other languages, like Ruby (pdf-extract), and a bash script that fetches data from a RESTful API using wget.
AFAIK, code reuse is a good thing, right? But if it's not possible (natively, I mean) to reuse all this code, what approach should be taken?
Using Java to run terminal-commands is a solution but I don't think it is good programming practice.
Integrating multiple technologies is something that is very common in real world applications. In order for it to scale properly, you probably want to use some methodology to keep things consistent. To me, the weakest part is probably fetching using wget, but that's my opinion.
In order to integrate and for everything to scale nicely you may want to look at some message passing protocols and have some sort of handling of queues where individual workers run in different programming languages and environments. Look at:
https://www.amqp.org/ (message passing standard)
https://www.rabbitmq.com/ (Java, .NET, Ruby, Python, PHP, JavaScript...
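As a rough illustration of the queue-and-workers idea, here is a sketch where an in-process `BlockingQueue` stands in for the message broker. This is an assumption made to keep the example self-contained: with RabbitMQ the producer and the workers would be separate processes (possibly in Ruby or bash) talking to the broker through a client library. The class name, the stop marker, and the "extracted:" placeholder are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// In-process stand-in for a message broker: one producer, one worker.
class QueueWorkerSketch {

    // Hypothetical marker message telling the worker to stop.
    static final String STOP = "__STOP__";

    // Worker loop: take document names off the queue until STOP arrives.
    static List<String> runWorker(BlockingQueue<String> queue) {
        List<String> processed = new ArrayList<>();
        try {
            while (true) {
                String message = queue.take();
                if (STOP.equals(message)) {
                    return processed;
                }
                // Placeholder for real work such as metadata extraction.
                processed.add("extracted:" + message);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return processed;
        }
    }

    // Producer side: start a worker thread, publish the documents, wait for it.
    static List<String> process(List<String> documents) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> results = new ArrayList<>();
        Thread worker = new Thread(() -> results.addAll(runWorker(queue)));
        worker.start();
        try {
            for (String document : documents) {
                queue.put(document);
            }
            queue.put(STOP);
            worker.join(); // results are safe to read after join()
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        // prints [extracted:a.pdf, extracted:b.docx]
        System.out.println(process(List.of("a.pdf", "b.docx")));
    }
}
```

The point of the design is that the producer only knows about the queue, not about who consumes from it, so a worker can be swapped for one written in another language without touching the rest.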

Java: SQL and Statistics/Machine Learning [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I have a question for you concerning Java. I am basically a Java user and did most of my work with it. However, in the machine learning classes I took in college, we mostly used Python with the scikit-learn and NumPy packages.
Now I want to do a project where I crawl data from the web, store it in SQL databases, and then do machine learning on this data. Maybe some of you have experience with those things and can share some of it? I mean, of course it is possible to do these things with Java, but maybe you have had some particular experiences that suggest why I should use something else, or what to consider?
I am happy for all your thoughts :-)
Have a great weekend!
It turns out that programming language and database implementation are secondary problems. Think first about the machine learning you want to do. Review the existing packages (in any language) and pick one according to how well it fits the needs of the business problem you are trying to solve. Then work with whatever language is most convenient for that package. You will probably find that no single language is suitable for all parts of the problem; you will end up gluing together Java, Python, R, shell scripts, etc, to make a complete solution, and there's nothing wrong with that. Consider that your job is problem solving instead of programming in a specific language and go from there.

How to maintain database of words in a java dictionary application(other than files) [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I am developing a Java dictionary application.
I used JDBC to store the words, but I realized that it won't work on other machines, since I am making a desktop application.
Please suggest a way to store the words other than files.
I don't think there is an alternative to using files if you want to persist your data - it will have to be stored somewhere!
What about the following:
- Use an in-memory database that can be persisted to file (H2, for example)
- Use XML persistence - there are plenty of libraries to help.
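For the XML option, a minimal sketch using only the JDK could look like the following. It assumes a flat word-to-definition mapping, which is a simplification of a real dictionary, and uses `java.util.Properties` with its built-in `storeToXML` / `loadFromXML` methods; the class name and the file name are made up. A dedicated XML library would be a better fit for richer entries.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Stores word -> definition pairs in an XML file next to the application.
class XmlDictionarySketch {

    // Write all entries to the given file as XML.
    static void save(Properties words, Path file) {
        try (OutputStream out = Files.newOutputStream(file)) {
            words.storeToXML(out, "dictionary entries");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read the entries back from the XML file.
    static Properties load(Path file) {
        Properties words = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            words.loadFromXML(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        // Hypothetical file name; any writable location works.
        Path file = Paths.get(System.getProperty("java.io.tmpdir"), "words-sketch.xml");
        Properties words = new Properties();
        words.setProperty("ephemeral", "lasting for a very short time");
        save(words, file);
        System.out.println(load(file).getProperty("ephemeral"));
    }
}
```

The file is plain XML, so it can be copied to another machine along with the application.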
My suggestion: http://www.manning.com/ingersoll/ :)
And after getting familiar with it, try Solr.
https://lucene.apache.org/solr/
Btw: Please make your question more specific. As this is your first post, I upvoted the question. But think of all the guys around here, who have no knowledge about your problem. Everyone wants to help, but you have to provide more information.
Why not use Embedded Derby?
It's platform independent, uses standard JDBC, and writes files which are accessible across any platform.
http://db.apache.org/derby/papers/DerbyTut/embedded_intro.html
There's no reason why you cannot use JDBC in a standalone java application incidentally. Think of JDBC as a kind of socket you can plug into, but what's behind that socket is entirely implementation specific, and usually can be defined by configuration.
JCR is another good example of this, but this kind of engineering technique is plentiful in the Java universe.

how to determine the number of users that can be supported by my server for a specific app [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I am creating a Java web app that is meant for use by Facebook users.
The web app is planned to be hosted on Amazon EC2. I want to find out how many users one server can support, so that I have a better idea of the costs involved.
Can you tell me how to determine this for a Java + GWT web app?
Since software products are complicated and contain many layers, theoretical estimation of application performance is hard and probably even impossible. Any layer can be the bottleneck, and it is hard to predict which one.
So the only way is to benchmark your specific application experimentally, deployed in your specific environment.
There are a lot of tools, both commercial and free. I'd start with some benchmarking using JMeter. It is an open source, easily extensible, and very popular product for performance testing.
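To show what such a benchmark measures, here is a small hand-rolled sketch that fires a fixed number of simulated requests from a fixed number of concurrent "users" and records the latency of each one. It is not a replacement for JMeter, and everything in it is an assumption for illustration: the class name is made up, and the `Runnable` passed in is a placeholder where a real test would issue an HTTP call to the deployed app.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Tiny load-test harness: N requests spread over a pool of concurrent users.
class LoadSketch {

    // Runs `requests` calls of `request` on `concurrentUsers` threads and
    // returns the latency of each call in nanoseconds.
    static List<Long> run(int concurrentUsers, int requests, Runnable request) {
        ExecutorService pool = Executors.newFixedThreadPool(concurrentUsers);
        try {
            // Wrap the request so each call reports how long it took.
            Callable<Long> timed = () -> {
                long start = System.nanoTime();
                request.run();
                return System.nanoTime() - start;
            };
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                futures.add(pool.submit(timed));
            }
            List<Long> latencies = new ArrayList<>();
            for (Future<Long> future : futures) {
                latencies.add(future.get());
            }
            return latencies;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Placeholder request: sleeps 5 ms instead of calling a real server.
        Runnable fakeRequest = () -> {
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        List<Long> latencies = run(10, 100, fakeRequest);
        double averageMs = latencies.stream().mapToLong(Long::longValue).average().orElse(0) / 1_000_000.0;
        System.out.println("requests: " + latencies.size() + ", average latency (ms): " + averageMs);
    }
}
```

Repeating the run while increasing the number of concurrent users, and watching where latency starts to climb, gives a first estimate of how many users one server can handle.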
