I am attempting to implement the following Basic Sliding Window algorithm in Java. I get the basic idea of it, but I am a bit confused by some of the wording, specifically the sentence in bold:
A sliding window of fixed width w is moved across the file,
and at every position k in the file, the fingerprint of
its content is computed. Let k be a chunk boundary
(i.e., Fk mod n = 0). Instead of taking the hash of the
entire chunk, we choose the numerically smallest fingerprint
of a sliding window within this chunk. Then we compute a hash
of this randomly chosen window within the chunk. Intuitively,
this approach would permit small edits within the chunks to
have less impact on the similarity computation. This method
produces a variable length document signature, where the
number of fingerprints in the signature is proportional to
the document length.
Please see my code/results below. Am I understanding the basic idea of the algorithm? As per the text in bold, what does it mean to "choose the numerically smallest fingerprint of a sliding window within its chunk"? I am currently just hashing the entire chunk.
code:
import java.util.ArrayList;
import java.util.List;
public class BSW {
/**
* @param args
*/
public static void main(String[] args) {
int w = 15; // fixed width of sliding window
char[] chars = "Once upon a time there lived in a certain village a little
country girl, the prettiest creature who was ever seen. Her mother was
excessively fond of her; and her grandmother doted on her still more. This
good woman had a little red riding hood made for her. It suited the girl so
extremely well that everybody called her Little Red Riding Hood."
.toCharArray();
List<String> fingerprints = new ArrayList<String>();
for (int i = 0; i < chars.length; i = i + w) {
StringBuffer sb = new StringBuffer();
if (i + w < chars.length) {
sb.append(chars, i, w);
System.out.println(i + ". " + sb.toString());
} else {
sb.append(chars, i, chars.length - i);
System.out.println(i + ". " + sb.toString());
}
fingerprints.add(hash(sb));
}
}
private static String hash(StringBuffer sb) {
// Implement hash (MD5)
return sb.toString();
}
}
results:
0. Once upon a tim
15. e there lived i
30. n a certain vil
45. lage a little c
60. ountry girl, th
75. e prettiest cre
90. ature who was e
105. ver seen. Her m
120. other was exces
135. sively fond of
150. her; and her gr
165. andmother doted
180. on her still m
195. ore. This good
210. woman had a lit
225. tle red riding
240. hood made for h
255. er. It suited t
270. he girl so extr
285. emely well that
300. everybody call
315. ed her Little R
330. ed Riding Hood.
That is not a sliding window. All you have done is break up the input into disjoint chunks. An example of a sliding window would be
Once upon a time
upon a time there
a time there lived
etc.
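To make the contrast concrete, here is a minimal sketch of a character-level sliding window that advances one position at a time; String.hashCode() stands in for a real fingerprint function (e.g. a Rabin fingerprint), so the class and names are illustrative only:
import java.util.ArrayList;
import java.util.List;

public class SlidingWindowSketch {
    public static void main(String[] args) {
        int w = 15; // window width
        String text = "Once upon a time there lived in a certain village a little country girl.";
        List<Integer> fingerprints = new ArrayList<>();
        // Slide the window one character at a time, fingerprinting every position.
        for (int i = 0; i + w <= text.length(); i++) {
            String window = text.substring(i, i + w);
            fingerprints.add(window.hashCode()); // placeholder fingerprint
        }
        System.out.println(fingerprints.size() + " windows fingerprinted");
    }
}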
The simple answer is NO, as far as I understand it (I studied the sliding window algorithm years ago, so I remember the principles but not all the details; correct me if you have a more insightful understanding).
As the name 'Sliding Window' suggests, your window should slide rather than jump. As your quote says,
at every position k in the file, the fingerprint of its content is computed
That is to say, the window slides one character at a time.
To my knowledge, the concepts of chunks and windows should be distinguished, as should fingerprints and hashes, even though they can coincide. Since it is too expensive to compute a full hash as the fingerprint, I think a Rabin fingerprint is a more suitable choice. A chunk is a large block of text in the document, and a window highlights a small portion within a chunk.
IIRC, the sliding window algorithm works like this:
The text file is chunked first;
For each chunk, you slide the window (a 15-char block in your running example) one position at a time and compute a fingerprint for each window of text;
You now have the fingerprints of the chunk, whose count is proportional to the length of the chunk (see the sketch below).
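To illustrate the sentence the OP quoted ("choose the numerically smallest fingerprint of a sliding window within this chunk"), here is a minimal sketch that slides a window across a single chunk and keeps the smallest window fingerprint; String.hashCode() is only a placeholder for a real rolling fingerprint, and all names are illustrative:
// Minimal sketch: pick the numerically smallest window fingerprint within one chunk.
public class MinWindowFingerprint {
    static long fingerprint(String s) {
        return s.hashCode() & 0xffffffffL; // placeholder, non-negative
    }

    static long smallestWindowFingerprint(String chunk, int w) {
        long min = Long.MAX_VALUE;
        for (int i = 0; i + w <= chunk.length(); i++) {
            long fp = fingerprint(chunk.substring(i, i + w));
            if (fp < min) {
                min = fp; // keep the numerically smallest window fingerprint
            }
        }
        return min; // this value (or a hash of the winning window) represents the chunk
    }

    public static void main(String[] args) {
        String chunk = "Once upon a time there lived in a certain village";
        System.out.println(smallestWindowFingerprint(chunk, 15));
    }
}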
The next question is how you use the fingerprints to compute the similarity between different documents, which is beyond my knowledge. Could you please give us a pointer to the article you referred to in the OP? In exchange, I recommend this paper, which introduces a variant of the sliding window algorithm to compute document similarity:
Winnowing: local algorithms for document fingerprinting
Another application you can refer to is rsync, a data synchronisation tool with block-level (corresponding to your chunks) deduplication. See this short article for how it works.
package com.perturbation;
import java.util.ArrayList;
import java.util.List;
public class BSW {
/**
* @param args
*/
public static void main(String[] args) {
int w = 2; // fixed width of sliding window
char[] chars = "umang shukla"
.toCharArray();
List<String> fingerprints = new ArrayList<String>();
for (int i = 0; i < chars.length; i++) {
StringBuffer sb = new StringBuffer();
if (i + w < chars.length) {
sb.append(chars, i, w);
System.out.println(i + ". " + sb.toString());
} else {
sb.append(chars, i, chars.length - i);
System.out.println(i + ". " + sb.toString());
}
fingerprints.add(hash(sb));
}
}
private static String hash(StringBuffer sb) {
// Implement hash (MD5)
return sb.toString();
}
}
This program may help you. Please try to make it more efficient.
Related
I am creating javafx.Text objects (maintained in an instance of LinkedList) and placing them on a javafx.Group (i.e. sourceGroup.getChildren().add(text)). Each Text instance holds only one letter (not an entire word).
I have a click event that returns the x and y coordinates of the click. I want a cursor to appear in front of the clicked letter. This needs to be done in constant time, so I can't just iterate over my LinkedList and examine each Text's x and y values.
There are certain restrictions on the libraries I can use: essentially only JavaFX and java.util classes.
I was reading that HashMap lookups essentially take place in constant time. My idea to drop the cursor is to:
1) While adding Text to the LinkedList instance, update four HashMaps: one for the upper X value, one for the lower X value, and the same for the Y values.
2) When it comes time to drop a cursor, grab the x and y coordinates of the mouse click and perform a series of intersections (this part I'm not sure how to do yet) which should return the Text or subset of Texts that fall within the X range and the Y range.
My Question:
Is there a better/more efficient way to do this? Am I being terribly inefficient with this idea?
Just add a click listener to each text item, and, when the mouse is clicked on the text, reposition the cursor based upon the text bounds in parent.
It's your homework, so you may or may not wish to look at the following code...
import javafx.application.Application;
import javafx.geometry.*;
import javafx.scene.Scene;
import javafx.scene.control.ScrollPane;
import javafx.scene.layout.FlowPane;
import javafx.scene.layout.Pane;
import javafx.scene.shape.Line;
import javafx.scene.text.Text;
import javafx.stage.Stage;
import java.util.stream.Collectors;
public class SillySong extends Application {
private static final String lyrics =
"Mares eat oats and does eat oats and little lambs eat ivy. ";
private static final int CURSOR_HEIGHT = 16;
private static final int INSET = 2;
private static final int N_LYRIC_REPEATS = 10;
private Line cursor = new Line(INSET, INSET, INSET, INSET + CURSOR_HEIGHT);
@Override
public void start(Stage stage) {
FlowPane textPane = new FlowPane();
for (int i = 0; i < N_LYRIC_REPEATS; i++) {
lyrics.codePoints()
.mapToObj(this::createTextNode)
.collect(Collectors.toCollection(textPane::getChildren));
}
textPane.setPadding(new Insets(INSET));
Pane layout = new Pane(textPane, cursor) {
@Override
protected void layoutChildren() {
super.layoutChildren();
layoutInArea(textPane, 0, 0, getWidth(), getHeight(), 0, new Insets(0), HPos.LEFT, VPos.TOP);
}
};
ScrollPane scrollPane = new ScrollPane(layout);
scrollPane.setFitToWidth(true);
stage.setScene(new Scene(scrollPane, 200, 150));
stage.show();
}
private Text createTextNode(int c) {
Text text = new Text(new String(Character.toChars(c)));
text.setOnMouseClicked(event -> {
Bounds bounds = text.getBoundsInParent();
cursor.setStartX(bounds.getMinX());
cursor.setStartY(bounds.getMinY());
cursor.setEndX(bounds.getMinX());
cursor.setEndY(bounds.getMinY() + CURSOR_HEIGHT);
});
return text;
}
public static void main(String[] args) {
launch(args);
}
}
This was just a basic sample, if you want to study something more full featured, look at the source of RichTextFX.
Truly, new TextField() is simpler :-)
So, what's really going on in the sample above? Where did all your fancy hash tables for click support go? How is JavaFX determining you clicked on a given text node? Is it using some kind of tricky algorithm for spatial indexing such as a quadtree or a kdtree?
Nah, it is just doing a straight depth first search of the scene graph tree and returning the first node it finds that intersects the click point, taking care to loop through children in reverse order so that the last added child to a parent group receives click processing priority over earlier children if the two children overlap.
For a parent node (Parent.java source):
@Deprecated
@Override protected void impl_pickNodeLocal(PickRay pickRay, PickResultChooser result) {
double boundsDistance = impl_intersectsBounds(pickRay);
if (!Double.isNaN(boundsDistance)) {
for (int i = children.size()-1; i >= 0; i--) {
children.get(i).impl_pickNode(pickRay, result);
if (result.isClosed()) {
return;
}
}
if (isPickOnBounds()) {
result.offer(this, boundsDistance, PickResultChooser.computePoint(pickRay, boundsDistance));
}
}
}
For a leaf node (Node.java):
@Deprecated
protected void impl_pickNodeLocal(PickRay localPickRay, PickResultChooser result) {
impl_intersects(localPickRay, result);
}
So you don't need to implement your own geometry processing and pick handling with a complicated supporting algorithm, as JavaFX already provides an appropriate structure (the scene graph) and is fully capable of processing click handling events for it.
Addressing additional questions or concerns
I know that searching trees is fast and efficient, but it isn't constant time right?
Searching trees is not necessarily fast or efficient. Search speed depends upon the depth and width of the tree and whether the tree is ordered, allowing a binary search. The scene graph is not a [binary search tree](https://en.wikipedia.org/wiki/Binary_search_tree) or a red-black tree or a b-tree or any other kind of tree that is optimized for search. The hit testing algorithm that JavaFX uses, as can be seen above, is an in-order traversal of the tree, which is linear in time: O(n).
If you wanted, you could subclass Parent, Region, Pane, or Group and implement your own search algorithm for picking by overriding functions such as impl_pickNodeLocal. For example, if you constrain your field to a fixed-width font, calculating which letter a click hits is a trivial function that can be done in constant time via a simple mathematical equation and no additional data structures (see the sketch below).
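As a rough illustration of that constant-time calculation (charWidth, lineHeight and charsPerLine are hypothetical layout constants, not values taken from the code above):
// Hypothetical constant-time hit test for a fixed-width font laid out left to right,
// top to bottom. charWidth, lineHeight and charsPerLine are assumed layout constants.
class FixedWidthHitTest {
    static int letterIndexAt(double clickX, double clickY,
                             double charWidth, double lineHeight, int charsPerLine) {
        int column = (int) (clickX / charWidth);   // which character column was clicked
        int row = (int) (clickY / lineHeight);     // which line was clicked
        return row * charsPerLine + column;        // index into the list of Text nodes
    }
}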
Starting to get really off-topic aside
But even if you can implement a custom hit processing algorithm, you need to consider whether you really should. Obviously the default hit testing algorithm in JavaFX is sufficient for most applications, and further optimization of it for the general use case is currently deemed unnecessary. If there existed some well-known golden algorithm or data structure that greatly improved its quality, and there was sufficient demand for it, the hit-testing algorithm would have been further optimized already. So it is probably best to use the default implementation unless you are experiencing performance issues and you have a specific use case (such as a mapping application) where an alternate algorithm or data structure (such as an r-tree) can be used to effect a performance boost. Even then, you would want to benchmark the various approaches on different sizes and types of data sets to validate their performance.
You can see evidence of the optimization approach I described above in the multiply algorithm for BigInteger in the JDK. You might think that multiplication is a constant-time operation, but it's not for large numbers, because the digits of the number are spread across multiple bytes and all of them must be processed to perform the multiplication. There are various algorithms for processing the bytes for multiplication, but the choice of the "most efficient" one depends upon the properties of the numbers themselves (e.g. their size). For smaller numbers, a straight loop for long multiplication is the most efficient; for larger numbers, the Karatsuba algorithm is used; and for larger numbers again, the Toom-Cook algorithm is used. The thresholds for choosing how large a number must be to switch algorithms would have been chosen via analysis and benchmarking. Also, if the number is being multiplied by itself (it is being squared), a more efficient algorithm can be used (an example of a special edge case that is specifically optimized for).
/**
* Returns a BigInteger whose value is {@code (this * val)}.
*
* @implNote An implementation may offer better algorithmic
* performance when {@code val == this}.
*
* @param val value to be multiplied by this BigInteger.
* @return {@code this * val}
*/
public BigInteger multiply(BigInteger val) {
if (val.signum == 0 || signum == 0)
return ZERO;
int xlen = mag.length;
if (val == this && xlen > MULTIPLY_SQUARE_THRESHOLD) {
return square();
}
int ylen = val.mag.length;
if ((xlen < KARATSUBA_THRESHOLD) || (ylen < KARATSUBA_THRESHOLD)) {
int resultSign = signum == val.signum ? 1 : -1;
if (val.mag.length == 1) {
return multiplyByInt(mag,val.mag[0], resultSign);
}
if (mag.length == 1) {
return multiplyByInt(val.mag,mag[0], resultSign);
}
int[] result = multiplyToLen(mag, xlen,
val.mag, ylen, null);
result = trustedStripLeadingZeroInts(result);
return new BigInteger(result, resultSign);
} else {
if ((xlen < TOOM_COOK_THRESHOLD) && (ylen < TOOM_COOK_THRESHOLD)) {
return multiplyKaratsuba(this, val);
} else {
return multiplyToomCook3(this, val);
}
}
}
I'm having a horrible time coming up with a good question Title... sorry/please edit if your brain is less shot than mine.
I am having some issues handling my game's maps client side. My game is tile based, using 32x32 pixel tiles. My first game map was 1750 x 1750 tiles. I had a bunch of layers client side, but managed to cut it down to 2 (ground and buildings). I was previously loading the entire map's layers into memory (short arrays). When I jumped to 2200 x 2200 tiles I noticed an older PC having some out-of-memory issues (1GB+). I wish there was a data type between byte and short (I am aiming for ~1000 different tiles). My game supports multiple resolutions, so the player's visible space may show 23 x 17 tiles at an 800x600 resolution, all the way up to 45 x 29 tiles at 1440x1024 (and higher) resolutions. I use Tiled to draw my maps and output the 2 layers into separate text files using a format similar to the following (0, 0, 2, 0, 3, 6, 0, 74, 2...) all on one line.
With the help of many SO questions and some research I came up with a map chunking strategy. Using the player's current coordinates as the center point, I load enough tiles for 5 times the size of the visible map (the largest would be 45*5, 29*5 = 225 x 145 tiles). The player is always drawn in the center and the ground moves beneath him/her (when you walk east the ground moves west). The minimap is drawn showing one screen away in all directions, making it three times the size of the visible map. Please see the below (very scaled down) visual representation, which explains it better than I likely did.
My issue is this: when the player moves 1/5th of the chunk size (in tiles) away from the original center point (chunkX/Y), I call for the game to rescan the file. The new scan uses the player's current coordinates as its center point. Currently the rechunking takes about 0.5 s on my PC (which is pretty high spec), so the map does not update for 1-2 tile moves.
To combat this I tried to run the file scanning in a new thread (before hitting the 1/5th point) into a temporary array buffer. Once it was done scanning, I would copy the buffer into the real array and call repaint(). Occasionally I saw some skipping issues with this, which was no big deal, but worse, I sometimes saw it drawing a random part of the map for 1-2 frames. Code sample below:
private void checkIfWithinAndPossiblyReloadChunkMap(){
if (Math.abs(MyClient.characterX - MyClient.chunkX) + 10 > (MyClient.chunkWidth / 5)){ //arbitrary number away (10)
Runnable myRunnable = new Runnable(){
public void run(){
logger.info("FillMapChunkBuffer started.");
short chunkXBuffer = MyClient.characterX;
short chunkYBuffer = MyClient.characterY;
int topLeftChunkIndex = MyClient.characterX - (MyClient.chunkWidth / 2) + ((MyClient.characterY - (MyClient.chunkHeight / 2)) * MyClient.mapWidth); //get top left coordinate of chunk
int topRightChunkIndex = topLeftChunkIndex + MyClient.chunkWidth - 1; //top right coordinate of chunk
int[] leftChunkSides = new int[MyClient.chunkHeight];
int[] rightChunkSides = new int[MyClient.chunkHeight];
for (int i = 0; i < MyClient.chunkHeight; i++){ //figure out the left and right index points for the chunk
leftChunkSides[i] = topLeftChunkIndex + (MyClient.mapWidth * i);
rightChunkSides[i] = topRightChunkIndex + (MyClient.mapWidth * i);
}
MyClient.groundLayerBuffer = MyClient.FillGroundBuffer(leftChunkSides, rightChunkSides);
MyClient.buildingLayerBuffer = MyClient.FillBuildingBuffer(leftChunkSides, rightChunkSides);
MyClient.groundLayer = MyClient.groundLayerBuffer;
MyClient.buildingLayer = MyClient.buildingLayerBuffer;
MyClient.chunkX = chunkXBuffer;
MyClient.chunkY = chunkYBuffer;
MyClient.gamePanel.repaint();
logger.info("FillMapChunkBuffer done.");
}
};
Thread thread = new Thread(myRunnable);
thread.start();
} else if (Math.abs(MyClient.characterY - MyClient.chunkY) + 10 > (MyClient.chunkHeight / 5)){ //arbitrary number away (10)
//same code as above for Y
}
}
public static short[] FillGroundBuffer(int[] leftChunkSides, int[] rightChunkSides){
try {
return scanMapFile("res/images/tiles/MyFirstMap-ground-p.json", leftChunkSides, rightChunkSides);
} catch (FileNotFoundException e) {
logger.fatal("ReadMapFile(ground)", e);
JOptionPane.showMessageDialog(theDesktop, getStringChecked("message_file_locks") + "\n\n" + e.getMessage(), getStringChecked("message_error"), JOptionPane.ERROR_MESSAGE);
System.exit(1);
}
return null;
}
private static short[] scanMapFile(String path, int[] leftChunkSides, int[] rightChunkSides) throws FileNotFoundException {
Scanner scanner = new Scanner(new File(path));
scanner.useDelimiter(", ");
int topLeftChunkIndex = leftChunkSides[0];
int bottomRightChunkIndex = rightChunkSides[rightChunkSides.length - 1];
short[] tmpMap = new short[chunkWidth * chunkHeight];
int count = 0;
int arrayIndex = 0;
while(scanner.hasNext()){
if (count >= topLeftChunkIndex && count <= bottomRightChunkIndex){ //within or outside (east and west) of map chunk
if (count == bottomRightChunkIndex){ //last entry
tmpMap[arrayIndex] = scanner.nextShort();
break;
} else { //not last entry
if (isInsideMapChunk(count, leftChunkSides, rightChunkSides)){
tmpMap[arrayIndex] = scanner.nextShort();
arrayIndex++;
} else {
scanner.nextShort();
}
}
} else {
scanner.nextShort();
}
count++;
}
scanner.close();
return tmpMap;
}
I really am at my wit's end with this. I want to be able to move past this GUI work and on to real game mechanics. Any help would be tremendously appreciated. Sorry for the long post, but trust me, a lot of thought and sleepless nights have gone into this. I need the SO experts' ideas. Thanks so much!!
p.s. I came up with some potential optimization ideas (but not sure these would solve some of the issue):
split the map files into multiple lines so I can call scanner.nextLine() 1 time, rather than scanner.next() 2200 times
come up with a formula that given the 4 corners of the map chunk will know if a given coordinate lies within it. this would allow me to call scanner.nextLine() when at the farthest point on chunk for a given line. this would require the multiline map file approach above.
throw away only 1/5th of the chunk, shift the array, and load the next 1/5th of the chunk
Make sure scanning the file has finished before starting a new scan.
Currently you'll start scanning again (possibly every frame) while your centre is too far away from the previous scan centre. To fix this, remember that you are already scanning before you start another scan, and extend your far-away condition accordingly:
// MyClient.worker represents the currently running worker thread (if any)
if(far away condition && MyClient.worker == null) {
Runnable myRunnable = new Runnable() {
public void run(){
logger.info("FillMapChunkBuffer started.");
try {
short chunkXBuffer = MyClient.nextChunkX;
short chunkYBuffer = MyClient.nextChunkY;
int topLeftChunkIndex = MyClient.characterX - (MyClient.chunkWidth / 2) + ((MyClient.characterY - (MyClient.chunkHeight / 2)) * MyClient.mapWidth); //get top left coordinate of chunk
int topRightChunkIndex = topLeftChunkIndex + MyClient.chunkWidth - 1; //top right coordinate of chunk
int[] leftChunkSides = new int[MyClient.chunkHeight];
int[] rightChunkSides = new int[MyClient.chunkHeight];
for (int i = 0; i < MyClient.chunkHeight; i++){ //figure out the left and right index points for the chunk
leftChunkSides[i] = topLeftChunkIndex + (MyClient.mapWidth * i);
rightChunkSides[i] = topRightChunkIndex + (MyClient.mapWidth * i);
}
// no reason for them to be a member of MyClient
short[] groundLayerBuffer = MyClient.FillGroundBuffer(leftChunkSides, rightChunkSides);
short[] buildingLayerBuffer = MyClient.FillBuildingBuffer(leftChunkSides, rightChunkSides);
MyClient.groundLayer = groundLayerBuffer;
MyClient.buildingLayer = buildingLayerBuffer;
MyClient.chunkX = chunkXBuffer;
MyClient.chunkY = chunkYBuffer;
MyClient.gamePanel.repaint();
logger.info("FillMapChunkBuffer done.");
} finally {
// in any case clear the worker thread
MyClient.worker = null;
}
}
};
// remember that we're currently scanning by remembering the worker directly
MyClient.worker = new Thread(myRunnable);
// start worker
MyClient.worker.start();
}
Preventing a rescan before the previous rescan has completed presents another challenge: what to do if you walk diagonally, i.e. you reach the situation where you meet the far-away condition in x, start scanning, and during that scan you meet the far-away condition in y. Since you choose the next scan centre according to your current position, this problem should not arise as long as you have a large enough chunk size.
Remembering the worker directly comes with a bonus: what do you do if you need to teleport the player/camera while you are scanning? You can now simply terminate the worker thread and start scanning at the new location: you'll have to check a termination flag manually in MyClient.FillGroundBuffer and MyClient.FillBuildingBuffer, reject the (partially computed) result in the Runnable, and skip resetting MyClient.worker in case of an abort.
If you need to stream more data from the file system in your game, think about implementing a streaming service (extend the idea of the worker to one that processes arbitrary file-parsing jobs; a sketch follows below). You should also check whether your hard drive can read from multiple files concurrently faster than it can read a single stream from a single file.
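A minimal sketch of that streaming-service idea, assuming each parsing job can be expressed as a Runnable (the class and method names here are illustrative, not taken from your code):
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Single background thread that processes file-parsing jobs in submission order.
public class StreamingService {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    // Submit an arbitrary parsing job; returns immediately.
    public void submit(Runnable parseJob) {
        worker.submit(parseJob);
    }

    public void shutdown() {
        worker.shutdownNow(); // also interrupts a running job, e.g. on teleport
    }
}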
Turning to a binary file format is an option, but won't save much in terms of file size. And since Scanner already uses an internal buffer to do its parsing (parsing integers from a buffer is faster than filling a buffer from a file), you should first focus on getting your worker running optimally.
Try to speed up reading by using a binary file instead of a CSV file.
Use DataInputStream and readShort() for that (this will also cut down the size of the map).
You can also use 32x32-tile chunks and save them into several files,
so you don't have to reload tiles that are already loaded.
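For example, a minimal sketch of reading tile IDs back with DataInputStream, assuming the map was written as a sequence of shorts (the file path and count are illustrative):
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class BinaryChunkReader {
    // Reads 'count' tile IDs written as shorts (e.g. by DataOutputStream.writeShort).
    static short[] readTiles(String path, int count) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(path)))) {
            short[] tiles = new short[count];
            for (int i = 0; i < count; i++) {
                tiles[i] = in.readShort();
            }
            return tiles;
        }
    }
}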
I am trying to build an OCR by calculating the correlation coefficient between characters extracted from an image and every character I have pre-stored in a database. My implementation is based on Java, and the pre-stored characters are loaded into an ArrayList when the application starts, i.e.
ArrayList<byte []> storedCharacters, extractedCharacters;
storedCharacters = load_all_characters_from_database();
extractedCharacters = extract_characters_from_image();
// Calculate the coefficient between every extracted character
// and every character in database.
double maxCorr = -1;
for(byte [] extractedCharacter : extractedCharacters)
for(byte [] storedCharacter : storedCharacters)
{
double corr = findCorrelation(extractedCharacter, storedCharacter);
if (corr > maxCorr)
maxCorr = corr;
}
...
...
public double findCorrelation(byte [] extractedCharacter, byte [] storedCharacter)
{
double mag1 = 0, mag2 = 0, corr = 0;
for(int i=0; i < extractedCharacter.length; i++)
{
mag1 += extractedCharacter[i] * extractedCharacter[i];
mag2 += storedCharacter[i] * storedCharacter[i];
corr += extractedCharacter[i] * storedCharacter[i];
} // for
corr /= Math.sqrt(mag1*mag2);
return corr;
}
The number of extractedCharacters is around 100-150 per image, but the database has 15,600 stored binary characters. Checking the correlation coefficient between every extracted character and every stored character has an impact on performance, as it needs around 15-20 seconds per image on an Intel i5 CPU.
Is there a way to improve the speed of this program, or can you suggest another way of building this that brings similar results? (The results produced by comparing every character with such a large dataset are quite good.)
Thank you in advance
UPDATE 1
public static void run() {
ArrayList<byte []> storedCharacters, extractedCharacters;
storedCharacters = load_all_characters_from_database();
extractedCharacters = extract_characters_from_image();
// Calculate the coefficient between every extracted character
// and every character in database.
computeNorms(storedCharacters, extractedCharacters);
double maxCorr = -1;
for (int extCharNo = 0; extCharNo < extractedCharacters.size(); extCharNo++)
for (int strCharIndex = 0; strCharIndex < storedCharacters.size(); strCharIndex++)
{
double corr = findCorrelation(extractedCharacters.get(extCharNo), storedCharacters.get(strCharIndex), strCharIndex, extCharNo);
if (corr > maxCorr)
maxCorr = corr;
}
}
private static double[] storedNorms;
private static double[] extractedNorms;
// Correlation between two binary images
public static double findCorrelation(byte[] arr1, byte[] arr2, int strCharIndex, int extCharNo){
final int dotProduct = dotProduct(arr1, arr2);
final double corr = dotProduct * storedNorms[strCharIndex] * extractedNorms[extCharNo];
return corr;
}
public static void computeNorms(ArrayList<byte[]> storedCharacters, ArrayList<byte[]> extractedCharacters) {
storedNorms = computeInvNorms(storedCharacters);
extractedNorms = computeInvNorms(extractedCharacters);
}
private static double[] computeInvNorms(List<byte []> a) {
final double[] result = new double[a.size()];
for (int i=0; i < result.length; ++i)
result[i] = 1 / Math.sqrt(dotProduct(a.get(i), a.get(i)));
return result;
}
private static int dotProduct(byte[] arr1, byte[] arr2) {
int dotProduct = 0;
for(int i = 0; i< arr1.length; i++)
dotProduct += arr1[i] * arr2[i];
return dotProduct;
}
Nowadays it's hard to find a CPU with only a single core (even in mobile devices). As the tasks here are nicely separable, you can parallelise them across cores with only a few lines of code, so I'd go for it, though the gain is limited.
In case you really mean cross-correlation, then a transform like DFT or DCT could help. They surely do for big images, but with yours at 12x16, I'm not sure.
Maybe you mean just a dot product? And maybe you should tell us?
Note that you actually don't need to compute the correlation most of the time; usually all you need is to find out whether it's bigger than a threshold:
corr = findCorrelation(extractedCharacter, storedCharacter)
..... more code to check if this is the best match ......
This may lead to some optimizations or not, depending on what the images look like.
Note also that a simple low-level optimization can give you nearly a factor of 4, as in this question of mine. Maybe you really should tell us what you're doing?
UPDATE 1
I guess that due to the computation of three products in the loop, there's enough instruction level parallelism, so a manual loop unrolling like in my above question is not necessary.
However, I see that those three products get computed some 100 * 15600 times, while only one of them depends on both extractedCharacter and storedCharacter. So you can compute
100 + 15600 + 100 * 15600
dot products instead of
3 * 100 * 15600
This way you may get a factor of three pretty easily.
Or not. After this step there's a single sum computed in the relevant step and the problem linked above applies. And so does its solution (unrolling manually).
Factor 5.2
While byte[] is nicely compact, the computation involves extending the bytes to ints, which costs some time, as my benchmark shows. Converting the byte[]s to int[]s before computing all the correlations saves time. Even better, this conversion can be done once, beforehand, for storedCharacters.
Manual loop unrolling twice helps but unrolling more doesn't.
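A minimal sketch of those two ideas (converting the stored set to int[] once up front, plus a dot product unrolled twice; the class name is illustrative and the arrays are assumed to have even, equal length):
import java.util.List;

class DotProductOptimizations {
    // Convert each stored character once, up front, so the hot loop works on int[].
    static int[][] toIntArrays(List<byte[]> chars) {
        int[][] out = new int[chars.size()][];
        for (int i = 0; i < chars.size(); i++) {
            byte[] b = chars.get(i);
            int[] v = new int[b.length];
            for (int j = 0; j < b.length; j++) v[j] = b[j];
            out[i] = v;
        }
        return out;
    }

    // Dot product unrolled twice; assumes an even length for brevity.
    static int dotProduct(int[] a, int[] b) {
        int s0 = 0, s1 = 0;
        for (int i = 0; i < a.length; i += 2) {
            s0 += a[i] * b[i];
            s1 += a[i + 1] * b[i + 1];
        }
        return s0 + s1;
    }
}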
In my project I want to be able to at least inform the user which string the note they need to play is on. I can get the note and its octave, but as I've discovered, that note and octave can appear in multiple places on a guitar fretboard.
So my question is: Is there anyway to map a midi note to a guitar string?
Here's code that takes the MIDI note value and returns the position on the guitar fretboard closest to the nut end of the instrument. Fret zero is an open string.
static class Fingering {
int string;
int fret;
public String toString() {
return "String : " + stringNames[string] + ", fret : " + fret;
}
}
static String[] stringNames = new String[] {"Low E", "A", "D", "G", "B", "High E"};
/** Open-string pitch of each guitar string in standard tuning, as a MIDI-style note value, from low E to high E */
static int[] strings = new int[]{64, 69, 74, 79, 83, 88};
public static Fingering getIdealFingering(int note) {
if (note < strings[0])
throw new RuntimeException("Note " + note + " is not playable on a guitar in standard tuning.");
Fingering result = new Fingering();
int idealString = 0;
for (int x = 1; x < strings.length; x++) {
if (note < strings[x])
break;
idealString = x;
}
result.string = idealString;
result.fret = note - strings[idealString];
return result;
}
public static void main(String[] args) {
System.out.println(getIdealFingering(64)); // Low E
System.out.println(getIdealFingering(66)); // F#
System.out.println(getIdealFingering(72)); // C on A string
System.out.println(getIdealFingering(76)); // E on D string
System.out.println(getIdealFingering(88)); // guitar's high e string, open
System.out.println(getIdealFingering(100)); // high E, 12th fret
System.out.println(getIdealFingering(103)); // high G
}
Result:
String : Low E, fret : 0
String : Low E, fret : 2
String : A, fret : 3
String : D, fret : 2
String : High E, fret : 0
String : High E, fret : 12
String : High E, fret : 15
Yes, with simple logic you can do this. I would consider using a HashMap of <Note, MidiNote>, where Note is your class that holds both the relative note and octave and has decent equals and hashCode methods, and MidiNote is your class representing a MIDI note (a sketch is below).
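For illustration, a minimal sketch of such a key class (here the MIDI note is represented by a plain Integer rather than a dedicated MidiNote class, and the pitch-class/octave convention is an assumption):
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key class: a pitch class (0-11) plus an octave.
final class Note {
    final int pitchClass; // 0 = C, 1 = C#, ... 11 = B
    final int octave;

    Note(int pitchClass, int octave) {
        this.pitchClass = pitchClass;
        this.octave = octave;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Note)) return false;
        Note n = (Note) o;
        return pitchClass == n.pitchClass && octave == n.octave;
    }

    @Override
    public int hashCode() {
        return Objects.hash(pitchClass, octave);
    }
}

class NoteLookup {
    // Map each Note to its MIDI note number (MIDI 0 taken as C-1 in this convention).
    static Map<Note, Integer> buildMap() {
        Map<Note, Integer> map = new HashMap<>();
        for (int midi = 0; midi < 128; midi++) {
            map.put(new Note(midi % 12, midi / 12 - 1), midi);
        }
        return map;
    }
}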
Think of MIDI as defining piano keys: codes and keys are one-to-one. This is unlike a guitar or violin, where the same tone can be played in multiple places.
If you want to represent the greater freedom you have on a guitar in some data format, you'll have to find or invent a different format. MIDI won't encode what you want.
However, there's an indirect way you might go about this, and it has to do with developing heuristics as to where to play a note given a sliding window of notes that came before. A given note may be easier on one string or another depending on what you've just played, and you can calculate that given a model of the hand and where fingers will have been. Based on this, you can convert MIDI to guitar in a way that makes the MIDI easiest to play. If you have a piece of guitar music that follows these rules already, then you can encode it in MIDI and then decode it later.
But perhaps your question is more basic. Yes, you can map a MIDI note to a guitar. The naive method is to make a mapping of each note playable on the guitar, and you decide between equivalent alternatives by picking the one closest to the nut. This would be an easy one-to-one mapping but wouldn't necessarily be the easiest to play.
If you REALLY want to do it right, you'll do a careful analysis of the music to decide the optimal hand position and where the hand position should change, and then you'd associate MIDI notes with frets and strings based on what's easiest to reach from that hand position. The optimal solution is probably NP-complete or worse, so you'd probably want to develop an approximate solution based on some rules about how often and how far you can change hand position.
I was wondering if I could get some advice on increasing the overall efficiency of a program that implements a genetic algorithm. Yes, this is an assignment question, but I have already completed the assignment on my own and am simply looking for a way to make it perform better.
Problem Description
At the moment my program reads a given chain made of two types of constituents, h or p (for example: hphpphhphpphphhpphph). For each H and P it generates a random move (Up, Down, Left, Right) and adds the move to an ArrayList contained in the "Chromosome" object. At the start the program generates 19 moves for each of 10,000 Chromosomes.
SecureRandom sec = new SecureRandom();
byte[] sbuf = sec.generateSeed(8);
ByteBuffer bb = ByteBuffer.wrap(sbuf);
Random numberGen = new Random(bb.getLong());
int numberMoves = chromosoneData.length();
moveList = new ArrayList(numberMoves);
for (int a = 0; a < numberMoves; a++) {
int randomMove = numberGen.nextInt(4);
char typeChro = chromosoneData.charAt(a);
if (randomMove == 0) {
moveList.add(Move.Down);
} else if (randomMove == 1) {
moveList.add(Move.Up);
} else if (randomMove == 2) {
moveList.add(Move.Left);
} else if (randomMove == 3) {
moveList.add(Move.Right);
}
}
After this comes the selection of chromosomes from the population to cross over. My crossover function selects the first chromosome at random from the fittest 20% of the population and the other at random from outside of the top 20%. The chosen chromosomes are then crossed and a mutation function is called. I believe the area in which I am taking the biggest hit is calculating the fitness of each Chromosome. Currently my fitness function creates a 2D array to act as a grid, places the moves in order from the move list generated by the function shown above, and then loops through the array to do the fitness calculation (i.e. if an H is found at location [2,1], it checks whether [1,1], [3,1], [2,0], or [2,2] also holds an H, and increments the count of bonds for each one found).
After the calculation is complete, the least fit chromosome is removed from my population, the new one is added, and then the array list of chromosomes is sorted. Rinse and repeat until the target solution is found.
If you want to see more of my code to prove I actually did the work before asking for help, just let me know (I don't want to post too much so other students can't just copy-paste my stuff).
As suggested in the comments, I have run the profiler on my application (I have never used it before; I'm only a first-year CS student) and my initial guess about where I am having issues was somewhat incorrect. From what the profiler is telling me, the big hotspots are:
When comparing the new chromosome to the others in the population to determine its position. I am doing this by implementing Comparable:
public int compareTo(Chromosome other) {
if(this.fitness >= other.fitness)
return 1;
if(this.fitness ==other.fitness )
return 0;
else
return -1;
}
The other area of issue described is in my actual evolution function, consuming about 40% of the CPU time. A code sample from that method is below:
double topPercentile = highestValue;
topPercentile = topPercentile * .20;
topPercentile = Math.ceil(topPercentile);
int randomOne = numberGen.nextInt((int) topPercentile);
//Lower bound for random two so it comes from outside of top 20%
int randomTwo = numberGen.nextInt(highestValue - (int) topPercentile);
randomTwo = randomTwo + 25;
//System.out.println("Selecting First: " + randomOne + " Selecting Second: " + randomTwo);
Chromosome firstChrom = (Chromosome) populationList.get(randomOne);
Chromosome secondChrom = (Chromosome) populationList.get(randomTwo);
//System.out.println("Selected 2 Chromosones Crossing Over");
Chromosome resultantChromosome = firstChrom.crossOver(secondChrom);
populationList.add(resultantChromosome);
Collections.sort(populationList);
populationList.remove(highestValue);
Chromosome bestResult = (Chromosome) populationList.get(0);
The other main performance hit is the initial population seeding, which is performed by the first code sample in the post.
I believe the area in which I am taking the biggest hit is calculating the fitness of each Chromosome
If you are not sure then I assume you have not run a profiler on the program yet.
If you want to improve the performance, profiling is the first thing you should do.
Instead of repeatedly sorting your population, use a collection that maintains its contents in sorted order (e.g. a TreeSet); a sketch follows below.
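A minimal sketch of that idea (Chromosome here is a stand-in with just a fitness field; the identity tie-breaker is there because a TreeSet would otherwise treat two distinct chromosomes with equal fitness as duplicates):
import java.util.Comparator;
import java.util.TreeSet;

// Keeps the population sorted by fitness at all times, so no Collections.sort() per insert.
class SortedPopulation {
    static class Chromosome {
        double fitness; // assumed to be computed once and stored
    }

    private final TreeSet<Chromosome> population = new TreeSet<>(
            Comparator.comparingDouble((Chromosome c) -> c.fitness)
                      .thenComparingInt(System::identityHashCode));

    void add(Chromosome c) {
        population.add(c);            // O(log n) insertion into sorted position
    }

    Chromosome best() {
        return population.first();    // or last(), depending on whether low or high fitness is better
    }

    Chromosome removeWorst() {
        return population.pollLast(); // opposite end from best()
    }
}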
If your fitness measure is consistent across generations (i.e. not dependent on other members of the population), then I hope you are at least storing it in the Chromosome object, so you only calculate it once for each member of the population. With that in place, you'd only be calculating fitness on the newly generated chromosome each iteration. Without more information on how fitness is calculated, it's difficult to offer any optimisations in that area.
Your random number generator seed doesn't need to be cryptographically strong.
Random numberGen = new Random();
A minor speedup when seeding your population is to remove all the testing and branching:
static Move[] moves = {Move.Down, Move.Up, Move.Left, Move.Right};
...
moveList.add(moves[randomMove]);