Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 6 years ago.
I have 200 folders with up to 20 files in each folder. In total, the dataset is 2 GB. I tried parsing everything at once, putting each line into a list and sorting it, but I run out of memory.
What approach could I use to sort the multiple files into one single file?
File-based merge-sort:
1. Sort the content of each file.
2. Merge-sort the 20 files of each folder to get one sorted file per folder.
3. Merge-sort the 200 folder files to get the final result.
If you don't want to do a 200-way merge sort, you can split #3 into multiple merge-sorts and then merge-sort the results of those, to as many levels as needed.
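The merge steps above can be sketched in Java roughly as follows. This is only a sketch under assumptions that the question does not confirm: one record per line, plain String ordering, UTF-8 text, and each individual file small enough to sort in memory. The class and method names (SortedFileMerger, sortFile, merge) are made up for illustration. During the merge only one line per input file is held in memory at a time.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper, not taken from the original answer.
final class SortedFileMerger {

    /** Sorts one file in place; assumes a single file fits in memory. */
    static void sortFile(Path file) throws IOException {
        List<String> lines = new ArrayList<>(Files.readAllLines(file, StandardCharsets.UTF_8));
        Collections.sort(lines);
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    /** One open input together with the line currently at its head. */
    private static final class Source implements Comparable<Source> {
        final BufferedReader reader;
        String head;

        Source(BufferedReader reader) throws IOException {
            this.reader = reader;
            this.head = reader.readLine();
        }

        /** Moves to the next line; returns false once the input is exhausted. */
        boolean advance() throws IOException {
            head = reader.readLine();
            return head != null;
        }

        @Override
        public int compareTo(Source other) {
            return head.compareTo(other.head);
        }
    }

    /** K-way merge of already sorted files; output must not be one of the inputs. */
    static void merge(List<Path> sortedInputs, Path output) throws IOException {
        PriorityQueue<Source> heap = new PriorityQueue<>();
        List<BufferedReader> readers = new ArrayList<>();
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (Path input : sortedInputs) {
                BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8);
                readers.add(reader);
                Source source = new Source(reader);
                if (source.head != null) {
                    heap.add(source);          // skip empty files
                }
            }
            while (!heap.isEmpty()) {
                Source smallest = heap.poll(); // input whose head line sorts first
                out.write(smallest.head);
                out.newLine();
                if (smallest.advance()) {
                    heap.add(smallest);        // re-insert with its next line
                }
            }
        } finally {
            for (BufferedReader reader : readers) {
                reader.close();
            }
        }
    }
}
```

With this shape, merge would be called once per folder on that folder's sorted files, and then once more (or in several rounds, if too many files would be open at once) on the per-folder results.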
Which sorting algorithm are you using? I think the problem lies with the algorithm, and you should look for a more efficient one. I believe that for large inputs merge sort is the best choice (albeit with a few modifications for data of that size).
Here is a very similar question, take a look at the top two answers. They should help you solve the problem.
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 3 months ago.
For my project I have to build a Huffman tree, but my lecturer has said that I cannot use priority queues to build it.
I don't know how to implement that.
Are there any other ways to create a Huffman tree without using priority queues?
The example of a Huffman tree that I have uses priority queues (the images are not reproduced here).
There's a trick that is often used to build Huffman trees in practice:
1. Create a list of your symbols with their probabilities, and sort it in ascending order.
2. Create an initially empty list for combined symbols. This list will remain sorted as we work.
3. While there is more than one symbol in total across the two lists:
   - Remove the two smallest symbols, each taken from the beginning of whichever list currently has the smaller first element.
   - Combine them and add the result to the end of the combined list. Because the new symbol's probability is at least as high as that of any combined symbol created before, this list remains sorted.
After the initial sorting, the smallest probability symbol will always be the first one of one of the two lists, so no priority queues or searching is required to find it.
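The procedure above can be sketched in Java with two plain queues. This is a minimal sketch, not the lecturer's expected solution: it assumes integer frequencies instead of probabilities, and the names TwoQueueHuffman, Node, build and collectCodes are made up for illustration. Only a one-off sort and two ArrayDeque instances are used; no priority queue is involved.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the two-list technique.
final class TwoQueueHuffman {

    static final class Node {
        final char symbol;      // meaningful only for leaves
        final long weight;
        final Node left;
        final Node right;

        Node(char symbol, long weight, Node left, Node right) {
            this.symbol = symbol;
            this.weight = weight;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null;
        }
    }

    /** Builds the tree from symbol frequencies (assumed to be non-negative counts). */
    static Node build(Map<Character, Long> frequencies) {
        if (frequencies.isEmpty()) {
            throw new IllegalArgumentException("no symbols");
        }
        // Step 1: the leaves, sorted by ascending weight (ties broken by symbol).
        List<Node> sorted = new ArrayList<>();
        for (Map.Entry<Character, Long> e : frequencies.entrySet()) {
            sorted.add(new Node(e.getKey(), e.getValue(), null, null));
        }
        sorted.sort(Comparator.comparingLong((Node n) -> n.weight)
                .thenComparingInt(n -> n.symbol));
        ArrayDeque<Node> leaves = new ArrayDeque<>(sorted);

        // Step 2: combined nodes; stays sorted because weights are appended in non-decreasing order.
        ArrayDeque<Node> combined = new ArrayDeque<>();

        // Step 3: repeatedly merge the two smallest nodes.
        while (leaves.size() + combined.size() > 1) {
            Node a = takeSmallest(leaves, combined);
            Node b = takeSmallest(leaves, combined);
            combined.addLast(new Node('\0', a.weight + b.weight, a, b));
        }
        return leaves.isEmpty() ? combined.removeFirst() : leaves.removeFirst();
    }

    /** The overall smallest node is always at the front of one of the two queues. */
    private static Node takeSmallest(ArrayDeque<Node> leaves, ArrayDeque<Node> combined) {
        if (combined.isEmpty()) {
            return leaves.removeFirst();
        }
        if (leaves.isEmpty()) {
            return combined.removeFirst();
        }
        return leaves.peekFirst().weight <= combined.peekFirst().weight
                ? leaves.removeFirst()
                : combined.removeFirst();
    }

    /** Writes the bit string of every leaf into out; a single-symbol tree gets the code "0". */
    static void collectCodes(Node node, String prefix, Map<Character, String> out) {
        if (node.isLeaf()) {
            out.put(node.symbol, prefix.isEmpty() ? "0" : prefix);
            return;
        }
        collectCodes(node.left, prefix + "0", out);
        collectCodes(node.right, prefix + "1", out);
    }
}
```

After the initial sort the loop runs in linear time, since every step only looks at the front of the two queues.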
This technique is quite clever, and your lecturer would not expect you to think of it yourself, so it was probably taught or referenced in class.
Closed 5 years ago.
I'm writing a Java program in Eclipse, and when I use recursion I get this error:
Exception in thread "main" java.lang.StackOverflowError
I know that people have probably already asked this question, but every response I've found has been to remove recursion. For what I'm trying to do, recursion is a necessity. There's no other option. I know that the recursion limit can be modified in Enthought Canopy (for Python) like this:
import sys
sys.setrecursionlimit(10000)
Is there a way to do this for java in Eclipse? Again, removing recursion is not an option.
UPDATE: I figured out the problem (which was an infinite loop), and the code works now.
Take a look at this: What is the maximum depth of the java call stack?
While it is not exactly a duplicate, it also answers your question by explaining how the limit can be changed.
Note that Eclipse itself has nothing to do with the limit; it is a restriction of the Java runtime, and it can be raised by allocating more space to the thread stack.
As always with such questions one should note that your code is likely to be inefficient, wrong or maybe has a non-recursive alternative. However you said that you are not interested in such solutions, so I just leave it here as a side note.
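For reference, a minimal sketch of how the stack limit can be raised. The stack size can be set for the whole JVM with the -Xss option (for example -Xss64m, which in Eclipse goes into the VM arguments of the run configuration), or for a single thread through the Thread constructor that accepts a stack size. The class and method names below (DeepRecursion, depth, runWithStack) are made up for illustration, and the Javadoc describes the stack size argument as a hint that some platforms may ignore.

```java
// Hypothetical example: runs a deep recursion on a thread with a larger stack.
final class DeepRecursion {

    /** Deliberately recursive: needs one stack frame per level. */
    static int depth(int n) {
        return n == 0 ? 0 : 1 + depth(n - 1);
    }

    /** Runs depth(n) on a new thread that requests stackBytes of stack space. */
    static int runWithStack(long stackBytes, int n) throws InterruptedException {
        int[] result = new int[1];
        Throwable[] failure = new Throwable[1];
        Thread worker = new Thread(null, () -> result[0] = depth(n), "deep-recursion", stackBytes);
        // A StackOverflowError in the worker would otherwise only be printed, not reported.
        worker.setUncaughtExceptionHandler((thread, error) -> failure[0] = error);
        worker.start();
        worker.join();
        if (failure[0] != null) {
            throw new IllegalStateException("recursion failed", failure[0]);
        }
        return result[0];
    }

    public static void main(String[] args) throws InterruptedException {
        // Requests a 512 MB stack; how deep this really allows depends on the JVM and platform.
        System.out.println(runWithStack(512L * 1024 * 1024, 1_000_000));
    }
}
```

If the recursion still overflows with a much larger stack, that usually points to a missing base case rather than a limit that is too low.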
Closed 7 years ago.
I want to change my DOM tree by removing all nodes except a few.
For example, if I want my new DOM to contain one of the leaf nodes of the old one, everything needs to be deleted except that leaf and every one of its ancestors. Basically, I have a list of nodes at some depth that need to be kept, and everything else should be removed.
Iterating over every level to remove nodes takes too much time. I also tried approaching this with "ancestor-or-self" in XPath, but that doesn't help me delete nodes.
XSLT is designed for this job; it can be called from Java, and it can operate on DOM trees. Basically the rules you outline in your question translate directly into template rules in XSLT, but to give you examples I would need a more precise specification.
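For comparison, and not as a replacement for the XSLT suggestion, here is a plain DOM sketch in Java that prunes the tree in a single pass. It rests on assumptions the question leaves open: the nodes to keep are passed in as a list of DOM nodes from the same document, and the whole subtree below each kept node is preserved. The names DomPruner and prune are made up for illustration.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;
import org.w3c.dom.Node;

// Hypothetical helper: keeps the given nodes and their ancestors, removes everything else.
final class DomPruner {

    /** Removes every node below root that is neither kept, inside a kept node, nor an ancestor of one. */
    static void prune(Node root, List<? extends Node> keep) {
        Set<Node> kept = identitySet();
        Set<Node> ancestors = identitySet();
        for (Node k : keep) {
            kept.add(k);
            // Walk upwards; stop early once we reach an ancestor that is already recorded.
            for (Node p = k.getParentNode(); p != null && ancestors.add(p); p = p.getParentNode()) {
                // the work happens in the loop condition
            }
        }
        pruneChildren(root, kept, ancestors);
    }

    private static void pruneChildren(Node parent, Set<Node> kept, Set<Node> ancestors) {
        Node child = parent.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling();      // read before a possible removal
            if (kept.contains(child)) {
                // Assumption: a kept node keeps its whole subtree, so do not descend.
            } else if (ancestors.contains(child)) {
                pruneChildren(child, kept, ancestors);
            } else {
                parent.removeChild(child);           // drops the node with everything below it
            }
            child = next;
        }
    }

    /** Set based on object identity, so it does not depend on how Node implements equals. */
    private static Set<Node> identitySet() {
        return Collections.newSetFromMap(new IdentityHashMap<>());
    }
}
```

Each node is visited at most once and unwanted subtrees are dropped without descending into them, which avoids the repeated level-by-level iteration described in the question.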
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
For practice I want to make my own lists and maps (like ArrayList, HashMap, HashSet etc.).
My goal is to have it as small and flexible as possible while still maintaining good performance. (long road...)
I have some questions:
1) Unlike Sun, I don't have to take backwards compatibility into account.
So the first thing I wonder is whether there is any good reason to keep both add and put.
Why not just one?
If I renamed put to add, would this cause problems, complexity or confusion down the road?
2) Are there any languages known to have really good data structures? (For example, they could be smart enough to avoid a concurrency exception.)
3) Lastly, more a request than a question: if you have any tips or ideas about how things could be done differently, please post them.
There are no duplicated methods: a Collection has an add method that returns a boolean, while a Map has a put method that returns the map's value type (the value previously associated with the key, or null if there was none).
There are plenty of examples of data structures. The point is: what do you need your data structure to do best? Avoid concurrency problems? Sort? Be fast? Store data securely?
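A small illustration of the difference in return values, using only java.util classes. The class name AddVersusPut and the method returnValues are made up for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical demo class: collects what add and put hand back to the caller.
final class AddVersusPut {

    static List<Object> returnValues() {
        List<Object> results = new ArrayList<>();

        Set<String> set = new HashSet<>();
        results.add(set.add("x"));      // true: the set changed
        results.add(set.add("x"));      // false: "x" was already present

        Map<String, Integer> map = new HashMap<>();
        results.add(map.put("x", 1));   // null: no previous value for "x"
        results.add(map.put("x", 2));   // 1: the value that was replaced

        return results;
    }

    public static void main(String[] args) {
        System.out.println(returnValues());   // prints [true, false, null, 1]
    }
}
```

So the two methods answer different questions: add reports whether the collection changed, and put reports what was stored under the key before.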
The examples you need are directly in Java source code:
SOURCES
List
ArrayList
HashMap
and so on....
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
This is a somewhat unusual question for developers, but I want to post it here in the hope of getting a proper answer.
Here is a simple example:
I wrote a Java function that calculates the distance between two geographic points. The function is no more than 50 lines of code. I then downloaded source code from IBM that does the same thing, but when I opened it I saw that it looks very complicated and is almost a thousand lines long.
What kind of people write such source code? Are they just very good programmers? Should I use their source code or my own?
I have noticed this kind of thing many times, and from time to time I start to wonder whether it is just me who does not know how to program properly.
Do you have the same feeling when you browse through other people's source code?
The code you found, does it do the exact same calculation? Perhaps it takes into account some edge cases you didn't think of, or uses an algorithm that has better numerical stability, lower asymptotic complexity, or is written to take advantage of branch prediction or CPU caches. Or it could be just over-engineered.
Remember the saying: "For every complex problem there is a solution that is simple, elegant, and wrong." If you are dealing with numerical software, even the most basic problems like adding a bunch of numbers can turn out to be surprisingly complex.
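To make the point about adding numbers concrete, here is a small sketch that compares a plain loop with compensated (Kahan) summation. It is only one illustration of a numerically more careful algorithm, not a claim about what the IBM code does; the names Summation, naive and kahan are made up for this example.

```java
// Hypothetical illustration: the "obvious" sum versus a compensated sum.
final class Summation {

    /** Straightforward left-to-right sum; small terms can be lost next to a large running total. */
    static double naive(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum;
    }

    /** Kahan summation: carries the rounding error of each addition into the next one. */
    static double kahan(double[] values) {
        double sum = 0.0;
        double compensation = 0.0;               // running estimate of the lost low-order bits
        for (double v : values) {
            double corrected = v - compensation;
            double next = sum + corrected;       // low-order bits of corrected may be lost here
            compensation = (next - sum) - corrected;
            sum = next;
        }
        return sum;
    }

    /** Test input: one large value followed by count ones. */
    static double[] largeThenOnes(int count) {
        double[] values = new double[count + 1];
        values[0] = 1e16;
        java.util.Arrays.fill(values, 1, values.length, 1.0);
        return values;
    }
}
```

For largeThenOnes(1000), the plain loop returns exactly 1e16, because every individual 1.0 is rounded away, while the compensated version returns 1e16 + 1000. The simple version is shorter and easier to read, and it is also the one that silently gives the wrong answer here.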