I have to do an exercise for a parallel computing course.
The task is to use N parallel processes to remove all occurrences of the letters "RTY" from a string.
Normally I'd do it with
String strAfter = str1.replaceAll("[RTY]", "");
But how to make it in parallel?
Split, work, merge.
Split in the main thread, storing the pieces in a List (a Set won't do, since you need indexed access)
Create N worker threads.
Have each worker thread pick() a string from the list at a shared index inside a synchronized block, increment the index, and process the entry
When the index reaches the list size, glue everything back together. You may want to use a StringBuilder and append() instead of concatenating Strings. A minimal sketch follows below.
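Here is a rough sketch of this scheme (all names are illustrative, not from any library; the workers write their results into one slot per chunk, so the original order is preserved):

import java.util.List;

public class ParallelStrip {
    private final List<String> parts;   // chunks produced by the split
    private final String[] results;     // processed chunks, by position
    private int next = 0;               // shared index into 'parts'

    ParallelStrip(List<String> parts) {
        this.parts = parts;
        this.results = new String[parts.size()];
    }

    // Each worker picks the next unprocessed chunk; -1 means "all taken".
    private synchronized int pick() {
        return next < parts.size() ? next++ : -1;
    }

    String run(int nThreads) throws InterruptedException {
        Thread[] workers = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            workers[t] = new Thread(() -> {
                int i;
                while ((i = pick()) != -1) {
                    results[i] = parts.get(i).replaceAll("[RTY]", "");
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        // Glue everything back together in the original order.
        StringBuilder sb = new StringBuilder();
        for (String r : results) sb.append(r);
        return sb.toString();
    }
}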
Split the String into N parts, then make each process work on one chunk of the String. The splitting mechanism should be intelligent enough to handle boundary values. You need to communicate one chunk of the String to the corresponding process using the Send() and Recv() methods for processing, and in the end the updated chunks should be communicated back in the same manner. Here you can find the Javadocs: http://mpj-express.org/docs/javadocs/index.html
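For what it's worth, here is an untested sketch of that layout based on the mpiJava-style API that MPJ Express documents (Rank(), Size(), Send() and Recv() with MPI.OBJECT; it assumes at least two processes, and the chunk arithmetic is deliberately naive):

import mpi.MPI;

public class StripRty {
    public static void main(String[] args) throws Exception {
        MPI.Init(args);
        int rank = MPI.COMM_WORLD.Rank();
        int size = MPI.COMM_WORLD.Size();

        if (rank == 0) {
            String input = "QWERTYUIOP";   // the full string
            int chunk = (input.length() + size - 2) / (size - 1);
            // Scatter: one chunk per worker (ranks 1..size-1).
            for (int dest = 1; dest < size; dest++) {
                int from = Math.min((dest - 1) * chunk, input.length());
                int to = Math.min(from + chunk, input.length());
                String[] buf = { input.substring(from, to) };
                MPI.COMM_WORLD.Send(buf, 0, 1, MPI.OBJECT, dest, 0);
            }
            // Gather: collect the processed chunks back, in rank order.
            StringBuilder out = new StringBuilder();
            for (int src = 1; src < size; src++) {
                String[] buf = new String[1];
                MPI.COMM_WORLD.Recv(buf, 0, 1, MPI.OBJECT, src, 0);
                out.append(buf[0]);
            }
            System.out.println(out);
        } else {
            String[] buf = new String[1];
            MPI.COMM_WORLD.Recv(buf, 0, 1, MPI.OBJECT, 0, 0);
            buf[0] = buf[0].replaceAll("[RTY]", "");
            MPI.COMM_WORLD.Send(buf, 0, 1, MPI.OBJECT, 0, 0);
        }
        MPI.Finalize();
    }
}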
My guess is you need to find a way to do this without using single-threaded functions on the entire string. What about breaking the string into N parts, letting each of your N parallel processes run the replace function on its part, and concatenating the results after all the threads have finished? Something like the sketch below.
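A compact sketch of that idea with Java 8 parallel streams (the chunking is naive, which is safe here because [RTY] matches single characters, so no match can straddle a chunk boundary; Collectors.joining preserves the encounter order even in parallel):

import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class StripChunks {
    static String strip(String s, int n) {
        int chunk = (s.length() + n - 1) / n;   // ceiling division
        return IntStream.range(0, n).parallel()
                .mapToObj(i -> s.substring(Math.min(i * chunk, s.length()),
                                           Math.min((i + 1) * chunk, s.length())))
                .map(part -> part.replaceAll("[RTY]", ""))
                .collect(Collectors.joining());
    }
}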
Edit
IMHO: I think it is not a duplicate, because the two questions are trying to solve the problem in different ways and, especially, because they call for totally different technical skills (and, finally, because I asked myself both of these questions).
Question
How to aggregate items from an ordered stream, preferably in an intermediate operation?
Context
Following my other question: Java8 stream lines and aggregate with action on terminal line
I've got a very large file of the form:
MASTER_REF1
SUBREF1
SUBREF2
SUBREF3
MASTER_REF2
MASTER_REF3
SUBREF1
...
Where SUBREF (if any) is applicable to MASTER_REF and both are complex objects (you can imagine it somewhat like JSON).
At first I tried to group the lines with an operation returning null while aggregating and returning a value once a group of lines was complete (a "group" of lines ends when line.charAt(0) != ' ').
This code is hard to read and requires a .filter(Objects::nonNull).
I think one could achieve this using a .collect(groupingBy(...)) or a .reduce(...), but those are terminal operations, which is:
not required in my case: lines are ordered and should be grouped by their position, and the groups of lines are to be transformed afterwards (map + filter + ... + forEach);
not a good idea either: I'm talking about a huge data file, way bigger than the total amount of RAM + swap ... a terminal operation would saturate the available resources (as said, by design I only need to keep a group in memory while it is being transformed).
As I already noted in the answer to the previous question, it's possible to use third-party libraries which provide partial-reduction operations. One such library is StreamEx, which I develop myself.
In the StreamEx library a partial-reduction operation is an intermediate stream operation which combines several input elements while some condition is met. Usually the condition is specified via a BiPredicate applied to pairs of adjacent stream elements, which returns true when the elements should be combined together. The simplest way to combine elements is to collect them into a List via the StreamEx.groupRuns() method, like this:
Stream<List<String>> records = StreamEx.of(Files.lines(path))
        .groupRuns((line1, line2) -> !line2.startsWith("MASTER"));
Here we start a new record when the second of two adjacent lines starts with "MASTER" (as in your example). Otherwise we continue the previous record.
Note that such a stream is still lazy. In sequential processing at most one intermediate List<String> is created at a time. Parallel processing is also supported, though turning the Files.lines stream into parallel mode rarely improves performance (at least prior to Java 9).
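To give an idea of how the grouped records can be consumed lazily afterwards, here is a sketch (the filter and mapping steps are hypothetical, just to show the shape of the pipeline; only one group is materialized at a time):

import one.util.streamex.StreamEx;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Groups {
    public static void main(String[] args) throws IOException {
        try (StreamEx<String> lines = StreamEx.of(Files.lines(Paths.get("data.txt")))) {
            lines.groupRuns((l1, l2) -> !l2.startsWith("MASTER"))
                 // each List<String> is one MASTER_REF with its SUBREF lines
                 .filter(rec -> rec.size() > 1)  // e.g. keep only masters with sub-refs
                 .map(rec -> rec.get(0) + " (" + (rec.size() - 1) + " subrefs)")
                 .forEach(System.out::println);
        }
    }
}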
I have to process 450 unique strings about 500 million times. Each string has a unique integer identifier. There are two options for me to use.
1. I can append the identifier to the string, and on arrival of the string I can split the string to get the identifier and use it.
2. I can store the 450 strings in a HashMap<String, Integer>, and on arrival of the string I can query the HashMap to get the identifier.
Can someone suggest which option will be more efficient in terms of processing?
It all depends on the sizes of the strings, etc.
You can do all sorts of things.
You can use a binary search to get the index in a list, and at that index is the identifier.
You can hash just the first 2 characters rather than the entire string; that would likely be faster than the binary search, assuming the strings have an OK distribution.
You can use the first character, or the first two characters if they're unique, as a "perfect index" into a 256- or 65536-entry array that points to the identifier.
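For instance, if the first two characters of each of the 450 strings happen to be unique, the whole lookup collapses to one array read. A sketch under that assumption (the table wastes space, but 64K ints is nothing):

import java.util.Map;

class PerfectIndex {
    private final int[] table = new int[256 * 256];  // 0 means "unknown"

    // Build once from the known strings; assumes ids are all > 0 and
    // the first two chars of each string are unique and in the 0-255 range.
    PerfectIndex(Map<String, Integer> ids) {
        ids.forEach((s, id) ->
            table[(s.charAt(0) & 0xFF) << 8 | (s.charAt(1) & 0xFF)] = id);
    }

    // Per arrival: one shift, one OR, one array read.
    int idOf(String s) {
        return table[(s.charAt(0) & 0xFF) << 8 | (s.charAt(1) & 0xFF)];
    }
}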
Also, if your identifier is numeric, it's better to pre-calculate that, rather than convert it on the fly all the time. Text -> Binary is actually rather expensive (Binary -> Text is worse). So it's probably nice to avoid that if possible.
But it behooves you to work the problem. A million of anything at 1 ms each is roughly 17 minutes of processing. At 500 m, every microsecond wasted adds up to 8+ minutes of extra processing. You may well not care, but it just demonstrates that at these scales "every little bit helps".
So, don't take our word for it: test different things to find what gives you the best result for your work set, and then go with that. Also consider excessive object creation, and avoid it. Normally, I don't give it a second thought. Object creation is fast, but a nano-second is a nano-second.
If you're working in Java, and you don't REALLY need Unicode (i.e. you're working with single characters in the 0-255 range), I wouldn't use Strings at all; I'd work with raw bytes. Strings are based on Java characters, which are UTF-16. Java Readers convert UTF-8 into UTF-16 every. single. time. 500 million times. Yup! Another few microseconds each. 8 microseconds adds over an hour to your processing.
So, again, look in all the corners.
Or don't: write it the easy way, fire it up, run it over the weekend, and be done with it.
If each String has a unique identifier, then retrieval is O(1) only with a HashMap.
I wouldn't suggest the first method, because you would be splitting a string 450*500m times, unless your access pattern is the same string 500m times in a row before moving on to the next. As Will said, appending a numeric id to strings and then parsing it back might seem straightforward but is not recommended.
So if your data is static (just the 450 strings), put them in a HashMap and experiment with it. Good luck.
Use a HashMap<String, Integer>. Splitting a string to get the identifier is an expensive operation because it involves creating new Strings.
I don't think anyone is going to be able to give you a convincing "right" answer, especially since you haven't provided all of the background / properties of the computation. (For example, the average length of the strings could make a lot of difference.)
So I think your best bet would be to write a benchmark ... using the actual strings that you are going to be processing.
I'd also look for a way to extract and test the "unique integer identifier" that doesn't entail splitting the string.
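A bare-bones harness in that spirit (a proper benchmark should use JMH to defeat JIT and dead-code effects; the sample strings here are made up):

import java.util.HashMap;
import java.util.Map;

public class IdBenchmark {
    static final int ROUNDS = 5_000_000;   // scale towards 500 m for real runs

    public static void main(String[] args) {
        Map<String, Integer> ids = new HashMap<>();
        String[] bare = new String[450];
        String[] tagged = new String[450];
        for (int i = 0; i < 450; i++) {
            bare[i] = "string-" + i;        // hypothetical sample data
            ids.put(bare[i], i);
            tagged[i] = i + "|" + bare[i];  // option 1: id prepended
        }

        long sum = 0, t0 = System.nanoTime();
        for (int n = 0; n < ROUNDS; n++) {
            String msg = tagged[n % 450];   // option 1: split on arrival
            sum += Integer.parseInt(msg.substring(0, msg.indexOf('|')));
        }
        System.out.printf("split:  %.1f ms (%d)%n", (System.nanoTime() - t0) / 1e6, sum);

        sum = 0;
        t0 = System.nanoTime();
        for (int n = 0; n < ROUNDS; n++) {
            sum += ids.get(bare[n % 450]);  // option 2: HashMap lookup
        }
        System.out.printf("lookup: %.1f ms (%d)%n", (System.nanoTime() - t0) / 1e6, sum);
    }
}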
Splitting the string should work faster if you write your code well enough. In fact if you already have the int-id, I see no reason to send only the string and maintain a mapping.
Looking the string up in a HashMap would require hashing the incoming string every time. So you are basically comparing the performance of the hashing function vs. the code you write to append the identifier (prepending might be a bit trickier) on the sending end and to parse it on the receiving end.
OTOH, only 450 strings aren't a big deal, and if you're into it, writing your own hashing algorithm/function would actually be the most elegant and performant option.
I have a requirement to split an input string into output string(s) (with some order) by applying a set of regexes to the input string. I'm thinking of implementing this functionality with a cluster of Akka actors, in such a way that I scatter the regexes and the input string, and then gather the resulting strings. However, I would like to know how to keep track of the order while collecting the processed strings back. I am not certain that "Scatter-Gather" is the correct approach; if there is another that is better suited, please suggest it.
I guess you will have to provide hints to the gatherer on how to assemble the string in order. You don't mention how the order is established: whether the initial split can define the order or whether the regex processing will define it.
In both cases, you need to keep track of 3 things:
- the initial source,
- the order of each individual piece
- the total amount of pieces
You could use a message like:
case class StringSegment(id:String, total:Int, seqNr:Int, payload:String)
The scatterer produces StringSegments based on the input:
def scatter(s:String):List[StringSegment] ...
scatter(input).foreach(msg => processingActor ! msg)
and the gatherer will assemble them together, using the seqNr to know the order and total to know when all the pieces are present.
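The gatherer's bookkeeping is then mechanical. A framework-agnostic sketch of just that logic in Java (inside an actor no locking is needed, since messages are handled one at a time):

import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class Gatherer {
    // per input string: seqNr -> payload, kept sorted by seqNr
    private final Map<String, TreeMap<Integer, String>> pending = new HashMap<>();

    // Returns the assembled string once all 'total' segments of 'id'
    // have arrived, or null while segments are still missing.
    String offer(String id, int total, int seqNr, String payload) {
        TreeMap<Integer, String> parts =
                pending.computeIfAbsent(id, k -> new TreeMap<>());
        parts.put(seqNr, payload);
        if (parts.size() < total) return null;
        pending.remove(id);
        StringBuilder sb = new StringBuilder();
        parts.values().forEach(sb::append);  // TreeMap iterates in seqNr order
        return sb.toString();
    }
}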
I'm writing a routine that takes a string and formats it as quoted-printable, and it's got to be as fast as possible. My first attempt copied characters from one StringBuffer to another, encoding and line-wrapping along the way. Then I thought it might be quicker to just modify the original StringBuffer rather than copy all that data, which is mostly identical. Turns out the inserts are far worse than copying: the second version (with the StringBuffer inserts) was 8 times slower, which makes sense, as it must be moving a lot of memory.
What I was hoping for was some kind of gap-buffer data structure, so the inserts wouldn't involve physically moving all the characters in the rest of the StringBuffer.
So any suggestions about the fastest way to rip through a string inserting characters every once in a while?
Suggestions to use the standard mimeutils library are not helpful, because I'm also dot-escaping the string so it can be dumped out to an SMTP server in one shot.
At the end, your gap data structure would have to be transformed into a String, which would require assembling all the chunks into a single array, e.g. by appending them to a StringBuilder.
So using a StringBuilder directly will be faster. I don't think you'll find a faster technique than that. Make sure to initialize the StringBuilder with a large enough size to avoid copies of the whole buffer once the capacity is exhausted.
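A sketch of that single-pass, presized approach (deliberately simplified quoted-printable: it encodes spaces too, ignores the dot-escaping from the question, and assumes 8-bit-safe input; the 3x presize covers the worst case where every char becomes =XX):

class QP {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    static String encode(String in) {
        // Presize so the builder never reallocates its internal array.
        StringBuilder out = new StringBuilder(in.length() * 3 + 16);
        int lineLen = 0;
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (lineLen >= 72) {            // soft break before the 76-char limit
                out.append("=\r\n");
                lineLen = 0;
            }
            if (c >= 33 && c <= 126 && c != '=') {
                out.append(c);              // printable ASCII passes through
                lineLen++;
            } else {                        // everything else becomes =XX
                out.append('=').append(HEX[(c >> 4) & 0xF]).append(HEX[c & 0xF]);
                lineLen += 3;
            }
        }
        return out.toString();
    }
}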
So, taking the advice of some of the other answers here, I've been writing many versions of this function, seeing which goes quickest; for future reference, in case anybody can gain from what I found:
1) The slowest: StringBuffer.append(), but we knew that.
2) Almost twice as fast: StringBuilder.append(). Locks are very expensive, it seems.
3) Another 20% faster is... copying from one char[] to another.
4) And finally, coming in three times faster than even that... a JNI call to the exact same code compiled in C that copies from one char array to another.
You may consider #4 cheating, but cheaters win. It is by far the fastest way to go.
There is a risk of the GetCharArrayElements call causing the java char array to be copied so it can be handed to the C program, but I can't tell if that's happening, and it's still wicked fast compared to any java implementation.
I think a good balance between speed and coding grace would be using Matcher.appendReplacement. Formulate a regex that will catch all insertion points. In a loop you use find, analyze Matcher.group() to see what exactly has matched, and use your program logic to decide what to give to appendReplacement.
In any case, it is important not to copy the text over char by char. You must copy in the largest chunks possible.
The Matcher API is quite unfortunately bound to the StringBuffer but, as you found, that only steals the final 5% from you.
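A sketch of that pattern (the regex and replacements are only illustrative: '=' must be escaped in quoted-printable and a leading dot must be doubled for SMTP; appendReplacement bulk-copies the unmatched stretches for you):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QpRewrite {
    // One alternation that catches every insertion point at once.
    private static final Pattern HOTSPOTS =
            Pattern.compile("=|^\\.", Pattern.MULTILINE);

    static String rewrite(String in) {
        Matcher m = HOTSPOTS.matcher(in);
        StringBuffer out = new StringBuffer(in.length() + 64);
        while (m.find()) {
            String rep = m.group().equals("=") ? "=3D" : "..";
            m.appendReplacement(out, Matcher.quoteReplacement(rep));
        }
        m.appendTail(out);  // copies the remainder in one chunk
        return out.toString();
    }
}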
I have the following problem:
For example, I have 10 lists, each one with links to some other lists. I would like to write code to search for elements in these lists. I've already written this algorithm, but sequentially: it starts the search in the first list; then, if the search fails, it sends messages to search the lists linked to the first one; at the end, the algorithm shows the results: the number of lists visited and whether the element was found or not.
Now I want to transform it into a parallel algorithm, or at least a concurrent one using multiple threads:
to use threads for searching;
to start a search in the 10 lists at the same time.
As long as you don't change anything, you can consider your search read-only; in that case, you probably don't need synchronization. If you want a fast search, don't use threads directly but use Runnables and look for the appropriate executor classes. If you do work directly with threads, make sure that you don't exceed the number of processors.
Before going further, read into multi-threading. I would mention "Java Concurrency in Practice" as a (rather safe) recommendation. It's too easy to get wrong.
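As a concrete starting point, a sketch using an executor instead of raw threads (the list-of-lists layout and the contains() search are placeholders for your own structures):

import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;

class ParallelLookup {
    // Searches all lists concurrently; read-only, so no locking is needed.
    static <T> boolean contains(List<List<T>> lists, T target)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        try {
            List<Future<Boolean>> results = pool.invokeAll(
                    lists.stream()
                         .map(l -> (Callable<Boolean>) () -> l.contains(target))
                         .collect(Collectors.toList()));
            for (Future<Boolean> f : results) {
                if (f.get()) return true;
            }
            return false;
        } finally {
            pool.shutdown();
        }
    }
}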
I am not sure from your problem statement if there is a sequential dependency between the searches in different lists, namely whether search results from the first list should take priority over those from the second list or not.
Assuming there's no such dependency, so that a search result from any list is fine, this is expressed in a very concise way as a speculative search in Ateji PX:
Object parallelSearch(Collection<List> lists)
{
    // start of parallel block
    [
        // create one parallel branch for each of the lists
        || (List list: lists) {
            // start searching lists in parallel
            Object result = search(list);
            // the first branch that finds a result returns it,
            // thereby stopping all remaining parallel branches
            if (result != null) return result;
        }
    ]
}
The term "speculative" means that you run a number of searches in parallel although you know that you will use the result of only one of them. Then as soon as you have found a result in one of the searches, you want to stop all the remaining searches.
If you run this code on a 4-core machine, the Ateji PX runtime will schedule 4 parallel branches at a time in order to make the best use of the available parallel hardware.
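In plain Java, the closest approximation to this speculative behaviour is ExecutorService.invokeAny, which returns the result of the first task to complete normally and cancels the rest. The trick (and an assumption of this sketch) is that branches finding nothing must throw, so they don't count as completed:

import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

class SpeculativeSearch {
    static <T> T searchAny(List<List<T>> lists, T target)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        try {
            // First branch to find the target wins; the others are cancelled.
            // If no branch finds it, invokeAny throws ExecutionException.
            return pool.invokeAny(lists.stream()
                    .map(l -> (Callable<T>) () -> {
                        if (l.contains(target)) return target;
                        throw new Exception("not found in this list");
                    })
                    .collect(Collectors.toList()));
        } finally {
            pool.shutdown();
        }
    }
}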