Parameters in ReduceByKey in Spark


While coding in Java with Spark, I ran into a problem with the parameters of reduceByKey. I know what reduceByKey means and how it works, but the code below differs a little from the basic Spark examples (e.g. the word count example).
As you can see, reduceByKey is called with two arguments: new KruskalReducer(numPoints) and numSubGraphs. numSubGraphs is an integer value and KruskalReducer is a Java class.
mstToBeMergedResult = mstToBeMerged.mapToPair(new SetPartitionIdFunction(K)).reduceByKey(
new KruskalReducer(numPoints), numSubGraphs);
I don't understand why an integer variable is passed to reduceByKey. I tried to relate the two parameters to the concept of reduceByKey but could not work it out.
I attached the java class for your information.
public static final class KruskalReducer implements Function2<Iterable<Edge>, Iterable<Edge>, Iterable<Edge>>{
private static final long serialVersionUID = 1L;
private transient UnionFind uf = null;
private final int numPoints;
public KruskalReducer(int numPoints) {
this.numPoints = numPoints;
}
// merge sort
@Override
public Iterable<Edge> call(Iterable<Edge> leftEdges, Iterable<Edge> rightEdges) throws Exception{
uf = new UnionFind(numPoints);
List<Edge> edges = Lists.newArrayList();
Iterator<Edge> leftEdgesIterator = leftEdges.iterator();
Iterator<Edge> rightEdgesIterator = rightEdges.iterator();
Edge leftEdge = leftEdgesIterator.next();
Edge rightEdge = rightEdgesIterator.next();
Edge minEdge;
boolean isLeft;
Iterator<Edge> minEdgeIterator;
final int numEdges = numPoints - 1;
do {
if (leftEdge.getWeight() < rightEdge.getWeight()) {
minEdgeIterator = leftEdgesIterator;
minEdge = leftEdge;
isLeft = true;
} else {
minEdgeIterator = rightEdgesIterator;
minEdge = rightEdge;
isLeft = false;
}
if (uf.unify(minEdge.getLeft(), minEdge.getRight())) {
edges.add(minEdge);
}
minEdge = minEdgeIterator.hasNext() ? minEdgeIterator.next() : null;
if (isLeft) {
leftEdge = minEdge;
} else {
rightEdge = minEdge;
}
}while (minEdge != null && edges.size() < numEdges);
minEdge = isLeft ? rightEdge : leftEdge;
minEdgeIterator = isLeft ? rightEdgesIterator : leftEdgesIterator;
while (edges.size() < numEdges && minEdgeIterator.hasNext()) {
if (uf.unify(minEdge.getLeft(), minEdge.getRight())) {
edges.add(minEdge);
}
minEdge = minEdgeIterator.next();
}
return edges;
}
}
Additionally, the full related code is shown below. (You can skip it if it gets confusing.)
JavaPairRDD<Integer, Iterable<Edge>> mstToBeMerged = partitions.combineByKey(new CreateCombiner(),
new Merger(), new KruskalReducer(numPoints));
JavaPairRDD<Integer, Iterable<Edge>> mstToBeMergedResult = null;
while (numSubGraphs > 1){
numSubGraphs = (numSubGraphs + (K - 1)) / K;
mstToBeMergedResult = mstToBeMerged.mapToPair(new SetPartitionIdFunction(K)).reduceByKey(
new KruskalReducer(numPoints), numSubGraphs);
mstToBeMerged = mstToBeMergedResult;
displayResults(mstToBeMerged);
}
private static class CreateCombiner implements Function<Edge, Iterable<Edge>>{
private static final long serialVersionUID = 1L;
@Override
public Iterable<Edge> call(Edge edge) throws Exception {
List<Edge> edgeList = Lists.newArrayListWithCapacity(1);
edgeList.add(edge);
return edgeList;
}
}
private static class Merger implements Function2<Iterable<Edge>, Edge, Iterable<Edge>>{
private static final long serialVersionUID = 1L;
@Override
public Iterable<Edge> call(Iterable<Edge> list, Edge edge) throws Exception {
List<Edge> mergeList = Lists.newArrayList(list);
mergeList.add(edge);
return mergeList;
}
}

I don't understand why an integer variable is passed to
reduceByKey. I tried to relate the two parameters to the concept of
reduceByKey but could not work it out.
If I'm reading the right overload:
def reduceByKey(func: JFunction2[V, V, V], numPartitions: Int): JavaPairRDD[K, V] =
fromRDD(rdd.reduceByKey(func, numPartitions))
Then the number you're passing is the number of partitions of the resulting RDD. Because reduceByKey is a shuffle boundary operation, the data gets repartitioned, and passing that number lets you control how many partitions are allocated.
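To make the role of the second argument concrete, here is a minimal plain-Java sketch (no Spark dependency) of what a shuffle into numPartitions output partitions does conceptually. Each key is routed to partition floorMod(key.hashCode(), numPartitions), which is how hash partitioning assigns keys, and values sharing a key are combined with the reduce function. The names ReduceByKeySketch and its reduceByKey method are made up for illustration and are not Spark API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

final class ReduceByKeySketch {
    /**
     * Conceptual stand-in for reduceByKey(func, numPartitions): returns one map per
     * output partition, each holding the reduced value for the keys routed to it.
     */
    static <K, V> List<Map<K, V>> reduceByKey(List<Map.Entry<K, V>> pairs,
                                              BinaryOperator<V> func,
                                              int numPartitions) {
        List<Map<K, V>> partitions = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            partitions.add(new LinkedHashMap<>());
        }
        for (Map.Entry<K, V> pair : pairs) {
            // Hash partitioning: non-negative modulus of the key's hash code.
            int target = Math.floorMod(pair.getKey().hashCode(), numPartitions);
            // Combine with any value already seen for this key.
            partitions.get(target).merge(pair.getKey(), pair.getValue(), func);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, Integer>> pairs = List.of(
                Map.entry(0, 5), Map.entry(1, 7), Map.entry(2, 1),
                Map.entry(0, 3), Map.entry(3, 4), Map.entry(2, 9));
        // Two output partitions: even keys land in partition 0, odd keys in partition 1.
        List<Map<Integer, Integer>> result = reduceByKey(pairs, Integer::sum, 2);
        for (int p = 0; p < result.size(); p++) {
            System.out.println("partition " + p + ": " + result.get(p));
        }
    }
}
```

In the question's loop, numSubGraphs shrinks on every iteration, so each round of merging asks for correspondingly fewer output partitions.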

Related

Is there any way to reduce the complexity of the following Java code?

I need to reduce the complexity of the following Java method to a level Sonar accepts. Right now Sonar reports it as an issue.
I need some expert help to do this.
public List<X> Y(final DateTime treatmentDiscontinueTime,
final List< P> validPrescribedPrescriptions)
{
final List<X> doseWrapperList = new ArrayList<>();
final int noOfPrescriptions = validPrescribedPrescriptions.size();
for (int prescriptionIndex = 0; prescriptionIndex < noOfPrescriptions; prescriptionIndex++)
{
final BasePrescribedPrescription basePrescribedPrescription = validPrescribedPrescriptions.get(prescriptionIndex);
final String firstDoseText = basePrescribedPrescription.getFirstText();
final String secondDoseText = basePrescribedPrescription.getSecondText();
final boolean accordingToSchedule = A.ACCORDING.equals(firstDoseText);
final boolean specificPrescription = A.SP.equals(firstDoseText);
final boolean specificVbTypePrescription = A.SPVB.equals(firstDoseText);
List<D> doseDetails = new ArrayList<>(basePrescribedPrescription.getDoseDetails());
final DateTime changedDosageEndDate =
getChangedDoseEndDate(basePrescribedPrescription.getActualTerminateDate(), treatmentDiscontinueTime);
final int noOfDosages = doseDetails.size();
for (int doseIndex = 0; doseIndex < noOfDosages; doseIndex++)
{
final D doseDetail = doseDetails.get(doseIndex);
if ((doseDetail.getStart().getStartDate() != null) && (changedDosageEndDate != null) &&
doseDetail.getStart().getStartDate().isAfter(changedDosageEndDate))
{
continue;
}
String previewDoseText;
if (accordingToSchedule)
{
previewDoseText = X;
}
else if (specificPrescription)
{
previewDoseText = Y;
}
else if (specificVbTypePrescription)
{
previewDoseText = Z;
}
else if (noOfDosages == 2)
{
previewDoseText = ((doseIndex == 0) ? secondDoseText : firstDoseText);
}
else
{
previewDoseText = firstDoseText;
}
final boolean isUnplanned = isuNplaned();
if (!isUnplanned)
{
doseStart = getStartDate();
doseEnd = getEndDate();
}
doseWrapperList.add(new DoseInfoLiteDTOWrapper(previewDoseText, doseStart, doseEnd, doseDetail));
}
}
return doseWrapperList;
}
I need some expert help to resolve this Sonar issue. I have thought about different ways to extract code fragments and break this method down into smaller parts, but still couldn't find a proper way to do it.

It's not difficult to clean up, I think:
Use a simple for loop
Create more small methods, each doing one small, clear thing for the loop
For the if/else block: use simple statements
Hint: Study TDD to write code that is as clean as possible
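As a rough sketch of the "small methods" advice, the if/else chain and the date guard from the inner loop could each move into their own method. Everything here is hypothetical: the constants stand in for A.ACCORDING, A.SP and A.SPVB, the returned strings stand in for the elided X, Y and Z values, and plain Long timestamps stand in for the DateTime objects of the question.

```java
final class DoseTextResolver {
    // Hypothetical stand-ins for the A.ACCORDING / A.SP / A.SPVB markers in the question.
    static final String ACCORDING = "ACCORDING";
    static final String SP = "SP";
    static final String SPVB = "SPVB";

    /** Picks the preview text for one dose; replaces the if/else chain in the inner loop. */
    static String previewDoseText(String firstText, String secondText,
                                  int noOfDosages, int doseIndex) {
        if (ACCORDING.equals(firstText)) {
            return "according-to-schedule text"; // placeholder for X in the question
        }
        if (SP.equals(firstText)) {
            return "specific-prescription text"; // placeholder for Y
        }
        if (SPVB.equals(firstText)) {
            return "specific-vb-type text"; // placeholder for Z
        }
        // Two-dose prescriptions show the second text for the first dose, as in the original.
        if (noOfDosages == 2 && doseIndex == 0) {
            return secondText;
        }
        return firstText;
    }

    /** The date guard from the inner loop, extracted so the loop body tests one condition. */
    static boolean startsAfter(Long doseStartMillis, Long endMillis) {
        return doseStartMillis != null && endMillis != null && doseStartMillis > endMillis;
    }

    public static void main(String[] args) {
        System.out.println(previewDoseText("morning", "evening", 2, 0));
        System.out.println(startsAfter(2000L, 1000L));
    }
}
```

With these two helpers the inner loop shrinks to a guard, one call and the add, which is usually enough to bring the complexity count of the outer method down.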

GC overhead limit exceeded while training OpenNLP's NameFinderME

I want to get probability score for the extracted names using NameFinderME, but using the provided model gives very bad probabilities using the probs function.
For example, "Scott F. Fitzgerald" gets a score around 0.5 (averaging log probabilities, and taking an exponent), while "North Japan" and "Executive Vice President, Corporate Relations and Chief Philanthropy Officer" both get a score higher than 0.9...
I have more than 2 million first names and another 2 million last names (with their frequency counts), and I want to synthetically create a huge dataset from the outer product of first names X middle names (using the first names pool) X last names.
The problem is, I don't even get through all the last names once (even when discarding frequency counts and using each name only once) before I get a GC overhead limit exceeded exception...
I'm implementing an ObjectStream and passing it to the train function:
public class OpenNLPNameStream implements ObjectStream<NameSample> {
private List<Map<String, Object>> firstNames = null;
private List<Map<String, Object>> lastNames = null;
private int firstNameIdx = 0;
private int firstNameCountIdx = 0;
private int middleNameIdx = 0;
private int middleNameCountIdx = 0;
private int lastNameIdx = 0;
private int lastNameCountIdx = 0;
private int firstNameMaxCount = 0;
private int middleNameMaxCount = 0;
private int lastNameMaxCount = 0;
private int firstNameKBSize = 0;
private int lastNameKBSize = 0;
Span span[] = new Span[1];
String fullName[] = new String[3];
String partialName[] = new String[2];
private void increaseFirstNameCountIdx()
{
firstNameCountIdx++;
if (firstNameCountIdx == firstNameMaxCount) {
firstNameIdx++;
if (firstNameIdx == firstNameKBSize)
return; //no need to update anything - this is the end of the run...
firstNameMaxCount = getFirstNameMaxCount(firstNameIdx);
firstNameCountIdx = 0;
}
}
private void increaseMiddleNameCountIdx()
{
middleNameCountIdx++;
if (middleNameCountIdx == middleNameMaxCount) {
middleNameIdx++;
if (middleNameIdx == firstNameKBSize) {
resetMiddleNameIdx();
increaseFirstNameCountIdx();
} else {
middleNameMaxCount = getMiddleNameMaxCount(middleNameIdx);
middleNameCountIdx = 0;
}
}
}
private void increaseLastNameCountIdx()
{
lastNameCountIdx++;
if (lastNameCountIdx == lastNameMaxCount) {
lastNameIdx++;
if (lastNameIdx == lastNameKBSize) {
resetLastNameIdx();
increaseMiddleNameCountIdx();
}
else {
lastNameMaxCount = getLastNameMaxCount(lastNameIdx);
lastNameCountIdx = 0;
}
}
}
private void resetLastNameIdx()
{
lastNameIdx = 0;
lastNameMaxCount = getLastNameMaxCount(0);
lastNameCountIdx = 0;
}
private void resetMiddleNameIdx()
{
middleNameIdx = 0;
middleNameMaxCount = getMiddleNameMaxCount(0);
middleNameCountIdx = 0;
}
private int getFirstNameMaxCount(int i)
{
return 1; //compromised on using just
//String occurences = (String) firstNames.get(i).get("occurences");
//return Integer.parseInt(occurences);
}
private int getMiddleNameMaxCount(int i)
{
return 3; //compromised on using just
//String occurences = (String) firstNames.get(i).get("occurences");
//return Integer.parseInt(occurences);
}
private int getLastNameMaxCount(int i)
{
return 1;
//String occurences = (String) lastNames.get(i).get("occurences");
//return Integer.parseInt(occurences);
}
@Override
public NameSample read() throws IOException {
if (firstNames == null) {
firstNames = CSVFileTools.readFileFromInputStream("namep_first_name_idf.csv", new ClassPathResource("namep_first_name_idf.csv").getInputStream());
firstNameKBSize = firstNames.size();
firstNameMaxCount = getFirstNameMaxCount(0);
middleNameMaxCount = getMiddleNameMaxCount(0);
}
if (lastNames == null) {
lastNames = CSVFileTools.readFileFromInputStream("namep_last_name_idf.csv",new ClassPathResource("namep_last_name_idf.csv").getInputStream());
lastNameKBSize = lastNames.size();
lastNameMaxCount = getLastNameMaxCount(0);
}
increaseLastNameCountIdx();
if (firstNameIdx == firstNameKBSize)
return null; //we've finished iterating over all permutations!
String [] sentence;
if (firstNameCountIdx < firstNameMaxCount / 3)
{
span[0] = new Span(0,2,"Name");
sentence = partialName;
sentence[0] = (String)firstNames.get(firstNameIdx).get("first_name");
sentence[1] = (String)lastNames.get(lastNameIdx).get("last_name");
}
else
{
span[0] = new Span(0,3,"name");
sentence = fullName;
sentence[0] = (String)firstNames.get(firstNameIdx).get("first_name");
sentence[2] = (String)lastNames.get(lastNameIdx).get("last_name");
if (firstNameCountIdx < 2*firstNameMaxCount/3) {
sentence[1] = (String)firstNames.get(middleNameIdx).get("first_name");
}
else {
sentence[1] = ((String)firstNames.get(middleNameIdx).get("first_name")).substring(0,1) + ".";
}
}
return new NameSample(sentence,span,true);
}
@Override
public void reset() throws IOException, UnsupportedOperationException {
firstNameIdx = 0;
firstNameCountIdx = 0;
middleNameIdx = 0;
middleNameCountIdx = 0;
lastNameIdx = 0;
lastNameCountIdx = 0;
firstNameMaxCount = 0;
middleNameMaxCount = 0;
lastNameMaxCount = 0;
}
@Override
public void close() throws IOException {
reset();
firstNames = null;
lastNames = null;
}
}
And
TokenNameFinderModel model = NameFinderME.train("en","person",new OpenNLPNameStream(),TrainingParameters.defaultParams(),new TokenNameFinderFactory());
model.serialize(new FileOutputStream("trainedNames.bin",false));
I get the following error after a few minutes of running:
java.lang.OutOfMemoryError: GC overhead limit exceeded
at opennlp.tools.util.featuregen.WindowFeatureGenerator.createFeatures(WindowFeatureGenerator.java:112)
at opennlp.tools.util.featuregen.AggregatedFeatureGenerator.createFeatures(AggregatedFeatureGenerator.java:79)
at opennlp.tools.util.featuregen.CachedFeatureGenerator.createFeatures(CachedFeatureGenerator.java:69)
at opennlp.tools.namefind.DefaultNameContextGenerator.getContext(DefaultNameContextGenerator.java:118)
at opennlp.tools.namefind.DefaultNameContextGenerator.getContext(DefaultNameContextGenerator.java:37)
at opennlp.tools.namefind.NameFinderEventStream.generateEvents(NameFinderEventStream.java:113)
at opennlp.tools.namefind.NameFinderEventStream.createEvents(NameFinderEventStream.java:137)
at opennlp.tools.namefind.NameFinderEventStream.createEvents(NameFinderEventStream.java:36)
at opennlp.tools.util.AbstractEventStream.read(AbstractEventStream.java:62)
at opennlp.tools.util.AbstractEventStream.read(AbstractEventStream.java:27)
at opennlp.tools.util.AbstractObjectStream.read(AbstractObjectStream.java:32)
at opennlp.tools.ml.model.HashSumEventStream.read(HashSumEventStream.java:46)
at opennlp.tools.ml.model.HashSumEventStream.read(HashSumEventStream.java:29)
at opennlp.tools.ml.model.TwoPassDataIndexer.computeEventCounts(TwoPassDataIndexer.java:130)
at opennlp.tools.ml.model.TwoPassDataIndexer.<init>(TwoPassDataIndexer.java:83)
at opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:74)
at opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:91)
at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:337)
Edit: After increasing the memory of the JVM to 8GB, I still don't get past the first 2 million last names, but now the Exception is:
java.lang.OutOfMemoryError: Java heap space
at java.util.HashMap.resize(HashMap.java:703)
at java.util.HashMap.putVal(HashMap.java:662)
at java.util.HashMap.put(HashMap.java:611)
at opennlp.tools.ml.model.AbstractDataIndexer.update(AbstractDataIndexer.java:141)
at opennlp.tools.ml.model.TwoPassDataIndexer.computeEventCounts(TwoPassDataIndexer.java:134)
at opennlp.tools.ml.model.TwoPassDataIndexer.<init>(TwoPassDataIndexer.java:83)
at opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:74)
at opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:91)
at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:337)
It seems the problem stems from the fact that I'm creating a new NameSample along with new Spans and Strings on every read call... But I can't reuse Spans or NameSamples, since they're immutable.
Should I just write my own language model? Is there a better Java library for this sort of thing (I'm only interested in getting the probability that the extracted text is actually a name)? Are there parameters I should tweak for the model I'm training?
Any advice would be appreciated.
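For reference, the enumeration described above (every first name X middle name X last name, with the middle names drawn from the first-name pool) can be written as a lazy iterator that keeps only a single counter as state. This is a sketch under assumptions: NameTripleIterator is a made-up name, the lists are plain strings rather than the CSV rows of the question, and frequency counts are ignored. It only covers the enumeration. It does not by itself reduce what the trainer keeps in memory, and the second stack trace shows the heap filling up inside the data indexer's HashMap rather than in the stream.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Lazily enumerates first x middle x last, with the last name varying fastest. */
final class NameTripleIterator implements Iterator<String[]> {
    private final List<String> firstNames;
    private final List<String> lastNames;
    private final long total;
    private long next = 0;

    NameTripleIterator(List<String> firstNames, List<String> lastNames) {
        this.firstNames = firstNames;
        this.lastNames = lastNames;
        // Middle names are drawn from the first-name pool, as in the question.
        this.total = (long) firstNames.size() * firstNames.size() * lastNames.size();
    }

    @Override
    public boolean hasNext() {
        return next < total;
    }

    @Override
    public String[] next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        // Decode the running counter like an odometer: last, then middle, then first.
        long n = next++;
        int last = (int) (n % lastNames.size());
        n /= lastNames.size();
        int middle = (int) (n % firstNames.size());
        int first = (int) (n / firstNames.size());
        return new String[] {
            firstNames.get(first), firstNames.get(middle), lastNames.get(last)
        };
    }

    public static void main(String[] args) {
        Iterator<String[]> it =
                new NameTripleIterator(List.of("Ann", "Bob"), List.of("Lee", "Kim", "Roy"));
        while (it.hasNext()) {
            System.out.println(String.join(" ", it.next()));
        }
    }
}
```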

Androidplot: Dynamic plot with specific scan rate

I am using Androidplot to show a pulse (essentially a relatively short sequence of points) that repeats n times per minute, with a flat value the rest of the time. There is an erase bar at the start that removes the 50 oldest points. What I can't figure out is how to make the graph update at a specific interval (the delay in run()) so that the series scans at 25mm/sec.
private class PulseXYSeries implements XYSeries {
private ArrayList<Integer> values;
private String title;
public PulseXYSeries(String title, int size) {
values = new ArrayList<Integer>(size);
for(int i = 0; i < size;i++) {
values.add(null);
}
this.title = title;
}
@Override
public String getTitle() {
return title;
}
public void remove(int idx) {
values.set(idx, null);
}
public void setY(int val, int idx) {
values.set(idx, val);
}
@Override
public Number getX(int idx) {
return idx;
}
@Override
public Number getY(int idx) {
if(idx >= values.size())
return null;
return values.get(idx);
}
@Override
public int size() {
return values.size();
}
}
private class MonitorDataSource implements Runnable {
private final int SAMPLE_SIZE = 1000;
private boolean keepRunning = false;
private List<Integer> queue;
private int flat;
private Thread rd;
MonitorDataSource(View rootView) {
queue = getSelectedPointData(rootView);
flat = queue.get(0);
rd = new Thread(/** runnable that calls dynamicPlot.redraw() at 30Hz **/);
rd.start();
}
public void stopThread() {
keepRunning = false;
rd.interrupt();
}
public void run() {
try {
Log.i(TAG,"Running pulse thread");
keepRunning = true;
int i=0;
boolean pulsing = true;
long lastPulse = SystemClock.elapsedRealtime();
long pulseDelay = 1000*60/mHeartRatePicker.getValue();
int position = 0;
// we need to scan at 25mm/sec
long delay = 10;
DisplayMetrics dp = getResources().getDisplayMetrics();
float plotWidth = dynamicPlot.getGraphWidget().getWidgetDimensions().canvasRect.width();
float plotWidthMm = plotWidth / dp.xdpi * 25.4f;
float widthPerTickInMm = plotWidthMm/(float)SAMPLE_SIZE;
Log.i(TAG,"Width per tick: "+widthPerTickInMm+" plot width px="+plotWidth+" in mm="+plotWidthMm+" xdpi="+dp.xdpi+" xdpmm="+(dp.xdpi*(1.0f/25.4f)));
long currTime,loopStart = SystemClock.elapsedRealtimeNanos();
while (keepRunning) {
// plot 3 points at a time
for (int j = 0; j < 3; j++) {
if(pulsing) {
mMovingWaveSeries.setY(queue.get(i),position);
if(++i == queue.size()-1) {
pulsing = false;
i=0;
}
} else {
mMovingWaveSeries.setY(flat,position);
currTime = SystemClock.elapsedRealtime();
if(currTime - lastPulse >= pulseDelay) {
pulsing = true;
lastPulse = currTime;
}
}
mMovingWaveSeries.remove(((position + 50) % SAMPLE_SIZE));
position = (position+1) % SAMPLE_SIZE;
if(position +1 >= SAMPLE_SIZE) {
float diff = (SystemClock.elapsedRealtimeNanos() - loopStart )/ 1000000000f;
loopStart = SystemClock.elapsedRealtimeNanos();
Log.i(TAG,"Looped through "+plotWidthMm+"mm in "+diff+"s = "+ (plotWidthMm/diff) +"mm/s");
}
}
Thread.sleep(delay);
}
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}

What seems to be lacking in your code is an instantaneous measurement of the current scan rate, in mm. You can use this value to adjust the scale of your plot's domain to get the desired effect. This is done via XYPlot.setDomainBoundaries(...). Domain scale and sample frequency (seemingly represented by "delay" in your code) can be adjusted to compensate for each other, so if you need to maintain a particular domain scale then modulate your sampling frequency accordingly. If done properly, rendering frequency should not matter at all and can be allowed to float...in fact modulating refresh rate to compensate for sample rate will usually result in buffer overrun/underrun issues.
UPDATE (response to below comment)
Appears that you're actually throttling the datasource (sample rate), not the plot (refresh rate), which is fine. The first thing you'll need to do is determine the loop frequency required to achieve 25mm/sec based on widthPerTickInMm and the number of points you are drawing in each loop:
Frequency(Hz) = 25 / (widthPerTickInMm * pointsPerLoop)
Use this value to modulate your datasource update loop. Here's an example of how you can dynamically modulate an arbitrary loop at a given frequency:
float hz = 5; // modulate at 5hz
long budgetMs = (long) ((1 / hz) * 1000f); // time allowed per iteration
while (true) {
    long loopStartMs = System.currentTimeMillis();
    // ... do this iteration's work here ...
    // calculate how long the work took and sleep for the rest of the budget:
    long loopDurationMs = System.currentTimeMillis() - loopStartMs;
    long sleepTime = budgetMs - loopDurationMs;
    if (sleepTime > 0) {
        try {
            Thread.sleep(sleepTime);
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
}
Just a warning - I've not tried compiling or running the code but the concept is there. (This only works if your potential loop frequency is > desired frequency...probably obvious but just in case)
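To put numbers on the frequency formula, here is a small sketch that derives the loop frequency and the per-iteration time budget from the plot width. The class and method names are made up, and the 100mm plot width is an assumed example value; SAMPLE_SIZE = 1000 and 3 points per loop are taken from the question's code.

```java
final class ScanRateSketch {
    /** Frequency(Hz) = scanMmPerSec / (widthPerTickInMm * pointsPerLoop). */
    static double loopFrequencyHz(double scanMmPerSec, double widthPerTickInMm, int pointsPerLoop) {
        return scanMmPerSec / (widthPerTickInMm * pointsPerLoop);
    }

    /** Time budget of one loop iteration, in nanoseconds, for the given frequency. */
    static long loopPeriodNanos(double hz) {
        return Math.round(1_000_000_000.0 / hz);
    }

    public static void main(String[] args) {
        double plotWidthMm = 100.0; // assumed example width
        int sampleSize = 1000;      // SAMPLE_SIZE in the question
        int pointsPerLoop = 3;      // the inner for loop runs 3 times
        double widthPerTickInMm = plotWidthMm / sampleSize;

        double hz = loopFrequencyHz(25.0, widthPerTickInMm, pointsPerLoop);
        long periodNanos = loopPeriodNanos(hz);
        System.out.println("loop frequency: " + hz + " Hz");
        System.out.println("budget per loop: " + periodNanos / 1_000_000.0 + " ms");
    }
}
```

With these example values the loop has to run at roughly 83 Hz, which is a budget of about 12 ms per iteration instead of the fixed 10 ms delay.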

Hashtable key within integer interval

I don't know if this is possible, but I'm trying to make a Hashtable<Interval, WhateverObject>, where Interval is a class with two integer/long values, a start and an end, and I wanted to do something like this:
Hashtable<Interval, WhateverObject> test = new Hashtable<Interval, WhateverObject>();
test.put(new Interval(100, 200), new WhateverObject());
test.get(new Interval(150, 150)); // returns the WhateverObject I created above because 150 is between 100 and 200
test.get(new Interval(250, 250)); // doesn't find a value because no key contains 250 in its interval
So basically what I want is that any key within the range of an Interval object gives the corresponding WhateverObject. I know I have to override equals() and hashCode() in the Interval class; the main problem, I think, is to somehow make all the values between 100 and 200 (in this specific example) give the same hash.
Any ideas if this is possible?
Thanks

No need to reinvent the wheel, use a NavigableMap. Example Code:
final NavigableMap<Integer, String> map = new TreeMap<Integer, String>();
map.put(0, "Cry Baby");
map.put(6, "School Time");
map.put(16, "Got a car yet?");
map.put(21, "Tequila anyone?");
map.put(45, "Time to buy a corvette");
System.out.println(map.floorEntry(3).getValue());
System.out.println(map.floorEntry(10).getValue());
System.out.println(map.floorEntry(18).getValue());
Output:
Cry Baby
School Time
Got a car yet?
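Note that floorEntry alone returns the nearest lower key even when the lookup value lies past the end of that interval (the 250 case from the question). A minimal sketch that also checks the end could look like this. IntervalLookup is a made-up name, and the sketch assumes non-overlapping intervals with inclusive ends and non-null values.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

final class IntervalLookup<V> {
    // Keyed by interval start; each entry remembers its inclusive end and its value.
    private final NavigableMap<Long, Map.Entry<Long, V>> byStart = new TreeMap<>();

    void put(long start, long end, V value) {
        byStart.put(start, Map.entry(end, value));
    }

    /** Returns the value whose interval contains point, or null if none does. */
    V get(long point) {
        Map.Entry<Long, Map.Entry<Long, V>> floor = byStart.floorEntry(point);
        if (floor == null || point > floor.getValue().getKey()) {
            return null; // before the first interval, or in a gap between intervals
        }
        return floor.getValue().getValue();
    }

    public static void main(String[] args) {
        IntervalLookup<String> test = new IntervalLookup<>();
        test.put(100, 200, "whatever");
        System.out.println(test.get(150)); // inside [100, 200]
        System.out.println(test.get(250)); // outside every interval
    }
}
```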

You could use an IntervalTree. Here's one I made earlier.
public class IntervalTree<T extends IntervalTree.Interval> {
// My intervals.
private final List<T> intervals;
// My center value. All my intervals contain this center.
private final long center;
// My interval range.
private final long lBound;
private final long uBound;
// My left tree. All intervals that end below my center.
private final IntervalTree<T> left;
// My right tree. All intervals that start above my center.
private final IntervalTree<T> right;
public IntervalTree(List<T> intervals) {
if (intervals == null) {
throw new NullPointerException();
}
// Initially, my root contains all intervals.
this.intervals = intervals;
// Find my center.
center = findCenter();
/*
* Builds lefts out of all intervals that end below my center.
* Builds rights out of all intervals that start above my center.
* What remains contains all the intervals that contain my center.
*/
// Lefts contains all intervals that end below my center point.
final List<T> lefts = new ArrayList<T>();
// Rights contains all intervals that start above my center point.
final List<T> rights = new ArrayList<T>();
long uB = Long.MIN_VALUE;
long lB = Long.MAX_VALUE;
for (T i : intervals) {
long start = i.getStart();
long end = i.getEnd();
if (end < center) {
lefts.add(i);
} else if (start > center) {
rights.add(i);
} else {
// One of mine.
lB = Math.min(lB, start);
uB = Math.max(uB, end);
}
}
// Remove all those not mine.
intervals.removeAll(lefts);
intervals.removeAll(rights);
uBound = uB;
lBound = lB;
// Build the subtrees.
left = lefts.size() > 0 ? new IntervalTree<T>(lefts) : null;
right = rights.size() > 0 ? new IntervalTree<T>(rights) : null;
// Build my ascending and descending arrays.
/** @todo Build my ascending and descending arrays. */
}
/*
* Returns a list of all intervals containing the point.
*/
List<T> query(long point) {
// Check my range.
if (point >= lBound) {
if (point <= uBound) {
// In my range but remember, there may also be contributors from left or right.
List<T> found = new ArrayList<T>();
// Gather all intersecting ones.
// Could be made faster (perhaps) by holding two sorted lists by start and end.
for (T i : intervals) {
if (i.getStart() <= point && point <= i.getEnd()) {
found.add(i);
}
}
// Gather others.
if (point < center && left != null) {
found.addAll(left.query(point));
}
if (point > center && right != null) {
found.addAll(right.query(point));
}
return found;
} else {
// To right.
return right != null ? right.query(point) : Collections.<T>emptyList();
}
} else {
// To left.
return left != null ? left.query(point) : Collections.<T>emptyList();
}
}
private long findCenter() {
//return average();
return median();
}
protected long median() {
// Choose the median of all centers. Could choose just ends etc or anything.
long[] points = new long[intervals.size()];
int x = 0;
for (T i : intervals) {
// Take the mid point.
points[x++] = (i.getStart() + i.getEnd()) / 2;
}
Arrays.sort(points);
return points[points.length / 2];
}
/*
* What an interval looks like.
*/
public interface Interval {
public long getStart();
public long getEnd();
}
/*
* A simple implementation of an interval.
*/
public static class SimpleInterval implements Interval {
private final long start;
private final long end;
public SimpleInterval(long start, long end) {
this.start = start;
this.end = end;
}
public long getStart() {
return start;
}
public long getEnd() {
return end;
}
@Override
public String toString() {
return "{" + start + "," + end + "}";
}
}
}

A naive HashTable is the wrong solution here. Overriding the equals() method doesn't do you any good because the HashTable compares a key entry by the hash code first, NOT the equals() method. The equals() method is only checked AFTER the hash code is matched.
It's easy to make a hash function on your interval object, but it's much more difficult to make one that would yield the same hashcode for all possible intervals that would be within another interval. Overriding the get() method (such as here https://stackoverflow.com/a/11189075/1261844) for a HashTable completely negates the advantages of a HashTable, which is very fast lookup times. At the point where you are scanning through each member of a HashTable, then you know you are using the HashTable incorrectly.
I'd say that Using java map for range searches and https://stackoverflow.com/a/11189080/1261844 are better solutions, but a HashTable is simply not the way to go about this.

I think implementing a specialized get-method would be much easier.
The new method can be part of a map-wrapper-class.
The key-class: (interval is [lower;upper[ )
public class Interval {
private int upper;
private int lower;
public Interval(int upper, int lower) {
this.upper = upper;
this.lower = lower;
}
public boolean contains(int i) {
return i < upper && i >= lower;
}
@Override
public boolean equals(Object obj) {
if (obj == null) {
return false;
}
if (getClass() != obj.getClass()) {
return false;
}
final Interval other = (Interval) obj;
if (this.upper != other.upper) {
return false;
}
if (this.lower != other.lower) {
return false;
}
return true;
}
@Override
public int hashCode() {
int hash = 5;
hash = 61 * hash + this.upper;
hash = 61 * hash + this.lower;
return hash;
}
}
The Map-class:
public class IntervalMap<T> extends HashMap<Interval, T> {
public T get(int key) {
for (Interval iv : keySet()) {
if (iv.contains(key)) {
return super.get(iv);
}
}
return null;
}
}
This is just an example and can surely be optimized, and there are a few flaws as well:
For example, if intervals overlap, there is no guarantee which interval will be used for the lookup, and nothing guarantees that intervals do not overlap!

OldCurmudgeon's solution works perfectly for me, but is very slow to initialise (took 20 mins for 70k entries).
If you know your incoming list of items is already ordered (ascending) and has only non overlapping intervals, you can make it initialise in milliseconds by adding and using the following constructor:
public IntervalTree(List<T> intervals, boolean constructorFlagToIndicateOrderedNonOverlappingIntervals) {
if (intervals == null) throw new NullPointerException();
int centerPoint = intervals.size() / 2;
T centerInterval = intervals.get(centerPoint);
this.intervals = new ArrayList<T>();
this.intervals.add(centerInterval);
this.uBound = centerInterval.getEnd();
this.lBound = centerInterval.getStart();
this.center = (this.uBound + this.lBound) / 2;
List<T> toTheLeft = centerPoint < 1 ? Collections.<T>emptyList() : intervals.subList(0, centerPoint);
this.left = toTheLeft.isEmpty() ? null : new IntervalTree<T>(toTheLeft, true);
List<T> toTheRight = centerPoint >= intervals.size() ? Collections.<T>emptyList() : intervals.subList(centerPoint+1, intervals.size());
this.right = toTheRight.isEmpty() ? null : new IntervalTree<T>(toTheRight, true);
}

This depends on your hashCode implementation. You may have two objects with the same hashCode value.
Please use Eclipse to generate a hashCode method for your class (no point in re-inventing the wheel).

For Hashtable or HashMap to work as expected, an equal hash code is not enough: the equals method must also return true. What you are asking for is that Interval(x, y).equals(Interval(m, n)) holds for every m, n within x, y. Since this would have to hold for any overlapping live instance of Interval, the class would need to keep track of all of them and implement the very lookup you are trying to achieve.
So in short the answer is no.
The Google Guava library offers RangeSet and RangeMap for this (available since Guava 14): guava RangeSet
For reasonably small ranges, an easy approach would be to specialize HashMap by putting and getting the individual values of the intervals.
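The "individual values" idea from the last sentence can be sketched as follows: every integer inside an interval becomes its own key, so lookups stay plain hash lookups at the cost of memory proportional to the interval length. ExpandedIntervalMap is a made-up name, and the sketch assumes inclusive bounds. If intervals overlap, the one added last wins.

```java
import java.util.HashMap;
import java.util.Map;

final class ExpandedIntervalMap<V> {
    private final Map<Integer, V> byPoint = new HashMap<>();

    /** Stores value under every integer in [start, end]. */
    void put(int start, int end, V value) {
        // A long loop variable avoids overflow when end is Integer.MAX_VALUE.
        for (long i = start; i <= end; i++) {
            byPoint.put((int) i, value);
        }
    }

    /** Plain hash lookup; returns null when no stored interval covers point. */
    V get(int point) {
        return byPoint.get(point);
    }

    public static void main(String[] args) {
        ExpandedIntervalMap<String> test = new ExpandedIntervalMap<>();
        test.put(100, 200, "whatever");
        System.out.println(test.get(150)); // covered by [100, 200]
        System.out.println(test.get(250)); // not covered
    }
}
```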

Qt: how do I highlight duplicated items in QListWidget? (qtjambi)

I need to implement a mechanism for highlighting duplicated values. Values are edited through a delegate depending on the value type (string: line edit; long and big decimal: spin boxes). Currently, I have implemented this feature with the help of an additional class which stores all values and their counts in two "parallel" lists. After adding a new value I increase its count (or decrease it when a repeated value is removed), but this solution seems too bulky. Do you have any other ideas for highlighting in the setModelData(...) method of QItemDelegate?
/**
* Stores a delegates' existing values
*/
private final class DelegateValuesStorage {
private final List<Object> values = new ArrayList<Object>();
private final List<Integer> counts = new ArrayList<Integer>();
....
//Add value or increase a count if exists
public void add(final Object value) {
if(values.contains(value)) {
final int valueIndex = values.indexOf(value);
final int oldCount = counts.get(valueIndex);
counts.remove(valueIndex);
counts.add(valueIndex, oldCount + 1);
} else {
values.add(value);
counts.add(1);
}
}
....
//Decrease a count or remove value if it doesn't exist anymore
public void decreaseCount(final Object value) {
if(value == null) {
return;
}
final int index = values.indexOf(value);
if(index >= 0) {
final int oldCount = counts.get(index);
if(oldCount >= 2) {
counts.remove(index);
counts.add(index, oldCount - 1);
} else {
values.remove(index);
counts.remove(index);
}
}
}
}
/**
* Delegate
*/
private class ConcreteDelegate extends QItemDelegate {
private final DelegateValuesStorage values = new DelegateValuesStorage();
...
@Override
public void setModelData(final QWidget editor, final QAbstractItemModel model, final QModelIndex index) {
if(editor instanceof ValEditor) { // ValEditor is an abstraction of line edit and spin box over values' data types
final Object value = ((ValEditor) editor).getValue();
model.setData(index, value, Qt.ItemDataRole.UserRole);
final String newData = (value == null) ? "" : String.valueOf(value);
values.add(newData);
final String oldData = (String) model.data(index, Qt.ItemDataRole.DisplayRole);
values.decreaseCount(oldData);
model.setData(index, newData, Qt.ItemDataRole.DisplayRole);
model.setData(index, new QColor(0, 0, 0), Qt.ItemDataRole.ForegroundRole);
redrawItems(model); // runs through values and colors them red if count >= 2; or black if count == 1
} else {
super.setModelData(editor, model, index);
}
}
}

I usually use maps for those kinds of tasks:
private final class DelegateValuesStorage {
private final Map<Object, Integer> values = new HashMap<Object, Integer>();
....
//Add value or increase a count if exists
public void add(final Object value) {
Integer count = values.get(value);
if (count == null) {
values.put(value, 1);
} else {
values.put(value, count + 1);
}
}
....
//Decrease a count or remove value if it doesn't exist anymore
public void decreaseCount(final Object value) {
if(value == null) {
return;
}
Integer count = values.get(value);
if (count == null) {
// decreasing a "new" value - could be an error too
return;
}
if (count <= 1) {
// remove the value from the map
values.remove(value);
} else {
values.put(value, count - 1);
}
}
}
Highlighting now should be enabled if
values.get(value) > 1
is true.
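One caveat with that check: values.get(value) returns null for a value that was never added, and unboxing null for the comparison throws a NullPointerException. On Java 8 or later the same counting can be written more compactly with Map.merge and Map.computeIfPresent, with a null-safe duplicate test. This is a sketch. ValueCounts and isDuplicated are made-up names, not part of the code above.

```java
import java.util.HashMap;
import java.util.Map;

final class ValueCounts {
    private final Map<Object, Integer> counts = new HashMap<>();

    /** Adds a value or increases its count if it already exists. */
    void add(Object value) {
        counts.merge(value, 1, Integer::sum);
    }

    /** Decreases the count, removing the value once the count would drop to zero. */
    void decreaseCount(Object value) {
        if (value == null) {
            return;
        }
        // Returning null from the remapping function removes the entry.
        counts.computeIfPresent(value, (k, c) -> c <= 1 ? null : c - 1);
    }

    /** True if the value is currently stored more than once; safe for unknown values. */
    boolean isDuplicated(Object value) {
        return counts.getOrDefault(value, 0) > 1;
    }

    public static void main(String[] args) {
        ValueCounts values = new ValueCounts();
        values.add("42");
        values.add("42");
        System.out.println(values.isDuplicated("42"));
        System.out.println(values.isDuplicated("never added"));
    }
}
```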
