My problem comes down to this:
I need to sort files in a specific order (the file names start with numbers). Later I want to store them externally, where the system they are on sorts them alphabetically, and I have no influence on that process. So how could I rename them so that they stay in the correct order?
// this comparator sorts them by alphabetical order
Comparator<Mp3File> compAlphabetical = (x,y) -> getMp3TitleFromFilename(x).compareTo(getMp3TitleFromFilename(y));
//and this one does by number
//the inputs look like this "582 Some File Name" so they have to be edited with some regex before using them for sorting
Comparator<Mp3File> compNumeric = new Comparator<Mp3File>() {
@Override
public int compare(Mp3File o1, Mp3File o2) {
Integer i1 = Integer.parseInt(getMp3TitleFromFilename(o1).substring(0,3).replaceAll("[^0-9]", ""));
Integer i2 = Integer.parseInt(getMp3TitleFromFilename(o2).substring(0,3).replaceAll("[^0-9]", ""));
return i1.compareTo(i2);
}
};
What I want to achieve is a method which gets the list with the correct sorting (2nd) comparator and renames the files so they would maintain their order, even if I would run the first Comparator on the list.
Right now the sorting by alphabet puts out partly correct orders. It looks like this:
1 One File Name
10 Another File Name
100 A Good File Name
101 An Even Better File Name
102 Another File
103 A really good File Name
But this isn't really what I want so I thought about putting some letters at the beginning like this:
AAA One File
AAB Another File
AAC And
AAD So
AAE On
But I can't figure out how to properly convert those numbers to chars and how to make that work in Java. Maybe one of you has an idea how to figure this out? Thanks in advance!
Your code is currently ordering the names in lexicographical order, which means they are treated as strings and compared character by character, position by position (e.g. "2" is greater than "1999999" because '2' is greater than '1').
You've probably seen this problem, and its solution, if you've ever looked at a folder containing episodes of a show (e.g. S3E08). You'll notice they prepend a '0' to the episode number so that the lexicographical sort doesn't mess up what we would expect to be the correct order.
This is what I suggest you do. Here is an example of what the file names would look like:
001 One File Name
010 Another File Name
100 A Good File Name
101 An Even Better File Name
102 Another File
103 A really good File Name
The algorithm to do this is fairly simple, so I'll leave that up to you! Feel free to post back if you have any questions on implementation
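For anyone who wants a starting point, here is a minimal sketch of the zero-padding idea, assuming every name starts with its number. The class and method names (ZeroPadNames, padAll, leadingDigits) are made up for illustration and are not from the question's code. It only computes the new names; the rename on disk could then be done with something like java.nio.file.Files.move.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: pads leading numbers so plain string sorting matches numeric order.
final class ZeroPadNames {

    // Returns the names with their leading numbers left-padded with zeros to a common width.
    static List<String> padAll(List<String> names) {
        int width = 1;
        for (String name : names) {
            width = Math.max(width, leadingDigits(name).length());
        }
        List<String> padded = new ArrayList<>();
        for (String name : names) {
            String digits = leadingDigits(name);
            if (digits.isEmpty()) {
                padded.add(name); // no leading number: leave the name unchanged
                continue;
            }
            String rest = name.substring(digits.length());
            // Assumes the number fits into a long.
            padded.add(String.format("%0" + width + "d", Long.parseLong(digits)) + rest);
        }
        return padded;
    }

    // Returns the run of ASCII digits at the start of the name (possibly empty).
    static String leadingDigits(String name) {
        int end = 0;
        while (end < name.length() && name.charAt(end) >= '0' && name.charAt(end) <= '9') {
            end++;
        }
        return name.substring(0, end);
    }
}
```

The width is taken from the longest number in the list, so the same code keeps working if the collection later grows past 999 files.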
I have a certain String (from a Radio talkshow), which is an anagram with a length of 15.
What I want to do is to build all permutations efficiently and check them against a dictionary.
This way I want to find out the original word of the anagram.
I already wrote an algorithm, which works by merging one letter after the other into the already known permutations.
It works, but it is too slow. No result is ever shown with 15 characters (no wonder, with 15! possibilities).
So my question is, how to do that faster?
For every word in your dictionary/array/set, sort the letters of the word and store the result in a separate dictionary/map, something like this (in pseudocode):
Set<String> originalDictionary = {"word", "string"};
Map<String, String> sortedMap = {
"dorw" => "word", "ginrst" => "string"
};
For your input string, sort the letters again and check whether the result is a key in that map.
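The pseudocode above could be turned into Java roughly like this. It is only a sketch: the class name AnagramIndex and its methods are invented for the example, and the map stores a list per key because several dictionary words can share the same sorted letters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical index: maps the sorted letters of each word to the words that use them.
final class AnagramIndex {
    private final Map<String, List<String>> bySortedLetters = new HashMap<>();

    AnagramIndex(Collection<String> dictionary) {
        for (String word : dictionary) {
            bySortedLetters
                    .computeIfAbsent(sortLetters(word), key -> new ArrayList<>())
                    .add(word);
        }
    }

    // Returns all dictionary words that are anagrams of the given string.
    List<String> lookup(String scrambled) {
        return bySortedLetters.getOrDefault(sortLetters(scrambled), Collections.emptyList());
    }

    // "word" -> "dorw"
    static String sortLetters(String s) {
        char[] chars = s.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }
}
```

Building the index costs one sort per dictionary word, and each lookup afterwards is a single sort plus a hash lookup, so the 15! permutations are never generated.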
I don't know how large your dictionary is, but assume it has much less than 15! entries. So I would go through all dictionary entries and find the ones with length 15. Checking if they are permutations of your original string should be easy.
Suppose I have a couple of objects' values stored in a text file. In the beginning, some fields will share the same value; for instance, all students will have age 0 by default. Now if I want to edit the age of one student using the conventional file handling approach, I'll end up changing all the other students who have age 0 while writing my data to the temporary file. So I was hoping there is a better way to make changes to a file using file handling in Java. Just to give an example of the issue at hand, consider the following text file:
Edsger
Dijkstra
123
72 years
Ruth
Dijkstra
12345
29 years
The line indicates a space between the age and the name. Now, my task is to construct a program where a user can change any detail, such as the first name, surname, roll_number or age. As you can see from the example above, two people can share some data. In this case, it is the surname. However, the roll number (123, 12345) will always be unique. The problem comes when you have to change shared data. Suppose the user wants to edit the surname. Then I would create a temporary file to hold this data, and later I would read it back with some conditions applied. So the code might look like this:
Note: This data is stored at a known location "abc.txt".
BufferedReader br=new BufferedReader(new FileReader("abc.txt"));
BufferedWriter bw=new BufferedWriter(new FileWriter("Temp.txt"));
String a=br.readLine();
while(a!=null)
{ bw.write(a);
bw.newLine();
a=br.readLine();
}
br.close();
bw.close();
BufferedReader br1=new BufferedReader(new FileReader("Temp.txt"));
BufferedWriter bw1=new BufferedWriter(new FileWriter("abc.txt"));//write the edited copy back to the original file
String b=br1.readLine();
while(b!=null)
{
if(b.equals(requested_surname))
bw1.write(w);//w is the String that holds the altered surname as desired by the User, for the sake of an example say it is Euler
else
bw1.write(b);
bw1.newLine();
b=br1.readLine();
}
bw1.close();
br1.close();
new File("Temp.txt").delete();//remove the temporary copy
As a result the Original text file "abc.txt" will show something like this:-
Edsger
Euler
123
72 years
Ruth
Euler
12345
29 years
Now this will be a bungling problem as I intend to change only Ruth's surname! I know that this is slightly different from what I initially asked, but I think that if I could target the line below "Ruth", I can make the desired changes.
Please Help...
There are several approaches to do this.
You could store the data in a csv file, 1 line for each object:
123,Edsger,Dijkstra,72
12345,Ruth,Dijkstra,29
4567,Ruth,Euler,27
Then, on program start-up read all the objects in memory (in a structure) for easy access. On program exit or save, write everything back to the file (assuming the number of objects isn't really big - i.e. not millions).
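A rough sketch of that load/edit/save cycle, assuming the column order roll,first name,surname,age shown above and no commas inside the fields. StudentCsv and its methods are hypothetical names, not an existing API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: keeps all records in memory, keyed by the unique roll number.
final class StudentCsv {

    // Reads "roll,first,surname,age" lines into a map that preserves file order.
    static Map<String, String[]> load(Path file) throws IOException {
        Map<String, String[]> students = new LinkedHashMap<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split(",", -1);
            students.put(fields[0], fields);
        }
        return students;
    }

    // Changes the surname of exactly one student, identified by roll number.
    static void setSurname(Map<String, String[]> students, String roll, String surname) {
        String[] fields = students.get(roll);
        if (fields == null) {
            throw new IllegalArgumentException("unknown roll number: " + roll);
        }
        fields[2] = surname;
    }

    // Writes every record back, one line per student.
    static void save(Path file, Map<String, String[]> students) throws IOException {
        List<String> lines = new ArrayList<>();
        for (String[] fields : students.values()) {
            lines.add(String.join(",", fields));
        }
        Files.write(file, lines, StandardCharsets.UTF_8);
    }
}
```

Because the edit is addressed by roll number, changing Ruth's surname leaves Edsger's record untouched even though both start out as Dijkstra.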
Another way is to store every field of the object as a fixed width value:
123 Edsger Dijkstra 72
12345 Ruth Dijkstra 29
4567 Ruth Euler 27
That way changes to the data can easily be written 'in place' in the file. You only have to make sure the fields don't exceed the maximum size. The number fields could even be in binary format if needed.
With fixed width, or an exact size for each object, it is easy (and faster) to look up a certain object or roll number: since the size of the objects is known, file seek can be used to jump directly to the beginning of each object - no parsing is needed.
Note: the objects don't need to be on a separate line in this case (I've done it for clarity) - but if they are, newlines (could be \r or \r\n or \n) will have to be added to the size of the objects.
Of course, searches for a certain person/object should always be done on a unique ID, in this case roll number, never on the name.
A file can be thought of as a character array.
char[] file = ... // file (on disk)
char[] newData = ... // data to be written
int pos = ... // the position in the file to write to
for (int i = 0; i < newData.length; i++) {
file[pos+i] = newData[i];
}
In particular, you can make use of seek() on a RandomAccessFile.
Check this out as well:
http://docs.oracle.com/javase/tutorial/essential/io/rafs.html
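To make the fixed-width idea concrete, here is a small sketch using java.io.RandomAccessFile. The field widths (8/16/16/4), the trailing '\n' per record, the ASCII-only assumption and the class name FixedWidthRecords are all choices made for this example, not something given in the question.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical fixed-width layout: roll(8) first(16) surname(16) age(4) + '\n'.
final class FixedWidthRecords {
    static final int ROLL = 8, FIRST = 16, SURNAME = 16, AGE = 4;
    static final int RECORD = ROLL + FIRST + SURNAME + AGE + 1; // +1 for the newline

    // Right-pads a value with spaces; rejects values that would overflow the field.
    static String pad(String value, int width) {
        if (value.length() > width) {
            throw new IllegalArgumentException("too long for field: " + value);
        }
        StringBuilder sb = new StringBuilder(value);
        while (sb.length() < width) {
            sb.append(' ');
        }
        return sb.toString();
    }

    static String record(String roll, String first, String surname, String age) {
        return pad(roll, ROLL) + pad(first, FIRST) + pad(surname, SURNAME) + pad(age, AGE) + "\n";
    }

    // Scans only the roll-number field of each record; returns the record index or -1.
    static int indexOf(RandomAccessFile file, String roll) throws IOException {
        byte[] buffer = new byte[ROLL];
        long count = file.length() / RECORD;
        for (int i = 0; i < count; i++) {
            file.seek((long) i * RECORD);
            file.readFully(buffer);
            if (new String(buffer, StandardCharsets.US_ASCII).trim().equals(roll)) {
                return i;
            }
        }
        return -1;
    }

    // Overwrites the surname field of one record in place (assumes ASCII, one byte per char).
    static void setSurname(RandomAccessFile file, int index, String surname) throws IOException {
        file.seek((long) index * RECORD + ROLL + FIRST);
        file.write(pad(surname, SURNAME).getBytes(StandardCharsets.US_ASCII));
    }
}
```

Only the 16 bytes of one surname field are rewritten, so no temporary file is needed and the other records are never touched.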
I need to implement a spell checker in Java. As an example, for the string "sch aproblm iseasili solved" my output should be "such a problem is easily solved". The maximum length of the string to correct is 64. As you can see, the string can have spaces inserted in the wrong places or missing altogether, and it can contain misspelled words. I need a little help finding an efficient algorithm that comes up with the corrected string. I am currently deleting all spaces in the string and inserting spaces in every possible position. For the word "hot" (the same applies to a sentence) I generate the following candidate strings, which are afterwards corrected word by word using Levenshtein distance: h o t; h ot; ho t; hot. That is 2^(string.length() - 1) candidate strings, so a string of length 64 produces 2^63 of them, which is far too many. Afterwards I need to process them one by one and select the best one by the following criteria, in order:
- total edit distance (the smallest one wins)
- if several strings have the same edit distance, I have to choose the one with the fewest words
- if several strings have the same number of words, I need to choose the one whose words have the highest total frequency (I have a dictionary of the 8000 most frequent words along with their frequencies)
- and finally, if several strings have the same total frequency, I have to take the lexicographically smallest one.
So basically I generate all possible strings (inserting spaces in all possible positions of the original string), then one by one I calculate their total edit distance, number of words, etc., choose the best one and output the corrected string. I want to know if there is a more efficient way of doing this, one that does not have to generate all possible combinations of strings.
EDIT: So I thought I should take another approach. Here is what I have in mind: I take the first letter of my string and extract from the dictionary all the words that begin with that letter. After that I process all of them and extract from my string all possible first words. To stay with my previous example, for the word "hot" generating all possible combinations gave 4 results, but with my new algorithm I obtain only 2, "hot" and "ho", so it's already an improvement. I still need a little help creating a recursive or DP (dynamic programming) algorithm for this. I need a way to store all possible strings for the first word, then for each of those all possible strings for the second word, and so on, and finally to concatenate all possibilities and add them to an array or something. There will still be a lot of combinations for large strings, but not as many as having to do ALL of them. Can someone help me with pseudocode or something, as this is not my strong suit?
EDIT2: Here is the code where I generate all the possible first words from my string: http://pastebin.com/d5AtZcth. I need to somehow extend this to do the same for the rest, combine each first word with each second word and so on, and store all these concatenations in an array or something.
A few tips for you:
try correcting just small parts of the string, not everything at once.
90% of errors (IIRC) are within edit distance 1 of the source.
you can use a phonetic index to match words against words that sound alike.
you can assume most typos are QWERTY errors (j=>k, h=>g), and try to check them first.
A few more ideas can be found in this nice article:
http://norvig.com/spell-correct.html
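As an illustration of the edit-distance-1 tip, here is a sketch in Java in the spirit of the linked article: generate every string that is one deletion, transposition, replacement or insertion away from the input and keep the ones found in the dictionary. The class name EditDistanceOne, the lowercase a-z alphabet and the tiny dictionary in the usage note are assumptions for the example.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: dictionary words within one edit of a (possibly misspelled) word.
final class EditDistanceOne {
    static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz"; // assumes lowercase English input

    // Returns, in sorted order, every dictionary word reachable with at most one edit.
    static Set<String> candidates(String word, Set<String> dictionary) {
        Set<String> found = new TreeSet<>();
        for (int i = 0; i <= word.length(); i++) {
            String left = word.substring(0, i);
            String right = word.substring(i);
            if (!right.isEmpty()) {
                keep(left + right.substring(1), dictionary, found); // deletion
            }
            if (right.length() > 1) {
                // transposition of two adjacent characters
                keep(left + right.charAt(1) + right.charAt(0) + right.substring(2), dictionary, found);
            }
            for (char c : ALPHABET.toCharArray()) {
                if (!right.isEmpty()) {
                    keep(left + c + right.substring(1), dictionary, found); // replacement
                }
                keep(left + c + right, dictionary, found); // insertion
            }
        }
        return found;
    }

    private static void keep(String candidate, Set<String> dictionary, Set<String> found) {
        if (dictionary.contains(candidate)) {
            found.add(candidate);
        }
    }
}
```

With a dictionary containing "such", "problem" and "easily", the inputs "sch", "problm" and "easili" each yield exactly one candidate. Since a replacement can put the same letter back, a word that is already in the dictionary is returned as its own candidate.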
I want to add words to an open-source Java word splitting program for Khmer (a language that does not have spaces between words). The developers have not worked on it in a long time, and I haven't been able to contact them for details (http://sourceforge.net/projects/khmer/files/Khmer%20Word%20Breaking/Khmer%20Word%20Breaking%20program%20V1.0/). Supposedly the list was created from a Khmer dictionary, and I would like to re-create the file to include more words.
Can anyone identify what format the word dictionary is in (I believe it is some type of Trie)? Here are the first few lines:
0ឳមអគណជយឍឫហកដពទឱលថឦឡញឩខនឧផប។ឋវឭឈឃឥឌឰឪសងចភធឯតឆរ
1ទ
0ក
1
1ីែមគួណជយ៍ៀហកទុលេញ៉ឺនំឹៃូឈឃោាឿសងចិ្ធើតៅរ
1គនសងរ
0ទ
0ា
0យ
0ព
0ន
1
1រ
0ា
0ស
0ី
1
And does anyone know how I would go about making a new one (I have a large wordlist, but I am not sure how to get it into this format).
Thanks!
After a quick look through the code, I have a theory.
Create a SearchTree, which extends TreeItem. For each word in your dictionary, call addWord from TreeItem. When the iteration is done, call export on the SearchTree. Use the new file as the word input file.
Additionally, there may be an undocumented parameter for khwrdbrk.jar, --create, that will read the words for the new tree from standard input.
Again, just a theory, but let me know what happens if you test it out.
I've been given a task that I find a little confusing. Here is the question statement:
The following program should read a file and store all its tokens in a member variable.
Your task is to write a single method that returns the number of items in tokenMap, the average length (as double value) of the elements in tokenMap, and the number of tokens starting with character "a".
Here the tokenMap is an object of type HashMap<String, Integer>;
I do have some idea about HashMap, but what I want to know is whether the key for the HashMap should be a single character or the whole word that I store in tokenMap.
Also, how can I compute the average length?
Looks like you have to use the entire word as the key.
The average length of tokens can be computed by summing the lengths of each token and dividing by the number of tokens.
In Java, you can find the number of tokens in the HashMap by tokenMap.size().
You can write loops that visit each member of the map like this:
for(String t: tokenMap.keySet()){
//t is a token (a key of the map)
}
and if you look up String in the Java API docs you will see that it is easy to find the length of a String.
To compute the average length of the items in a hash map, you'll have to iterate over them all and count the length and calculate the average.
As for your other question about what to use for a key, how are we supposed to know? A hashmap can use practically any* value for a key.
*The value must be hashable, which is defined differently for different languages.
Reading the question closely, it seems that you have to read a file, extract each word and use it as the key value, and store the length of each key as the integer:
an example line
leads to a HashMap like this
an : 2
example : 7
line : 4
After you've built your map (made of keys mapping to entries, or seemingly elements in the question), you'll need to run some statistics over it to find
the number of keys (look at HashMap)
the average length of all keys (again, simple enough)
the number beginning with "a" (just look at the String)
Then make a value object containing these values and return it from the method that does the statistics.
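A minimal sketch of such a value object and the method that fills it, assuming tokenMap is the HashMap<String, Integer> from the task and that "length" refers to the keys. The class name TokenStats and its field names are made up for the example.

```java
import java.util.Map;

// Hypothetical value object holding the three statistics the task asks for.
final class TokenStats {
    final int count;            // number of items in the map
    final double averageLength; // average key length
    final int startingWithA;    // number of keys starting with "a"

    TokenStats(int count, double averageLength, int startingWithA) {
        this.count = count;
        this.averageLength = averageLength;
        this.startingWithA = startingWithA;
    }

    // Computes all three values in a single pass over the keys.
    static TokenStats of(Map<String, Integer> tokenMap) {
        int totalLength = 0;
        int startingWithA = 0;
        for (String token : tokenMap.keySet()) {
            totalLength += token.length();
            if (token.startsWith("a")) {
                startingWithA++;
            }
        }
        int count = tokenMap.size();
        // Avoid dividing by zero for an empty map.
        double average = count == 0 ? 0.0 : (double) totalLength / count;
        return new TokenStats(count, average, startingWithA);
    }
}
```

For the map built from "an example line", this gives a count of 3, an average length of 13/3 and one token starting with "a".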
I know I've given more information than you require, but someone else may benefit from a little extra help.
Guys, there is some confusion. I'm not asking for a solution. I'm just confused about one thing.
For the time being, I'm going to use String as the key type.
The only confusion I have is this: once I read the file line by line, should I split it into words or into single characters? In other words, should the key be a single-character string or a String holding a whole word?
If you go through the question statement, what do you suggest? That's all I'm asking.
should i split it based upon words or
based upon each character
The requirement is to make tokens, so you should split them based on words. Each word becomes a unique String key. It would make sense for the value to be the count of each token.
If the file you are reading has these three lines:
int alpha;
int beta;
float delta;
Then you should have something like
<"int", 2>
<";", 3>
<"alpha", 1>
<"beta", 1>
<"float", 1>
<"delta", 1>
(The semicolon may or may not be considered a token.)
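Your average length would be (3 + 1 + 5 + 4 + 5 + 5) / 6 ≈ 3.83 over the six distinct keys, or (3x2 + 1x3 + 5 + 4 + 5 + 5) / 9 ≈ 3.11 if you weight each token by its count.
Your number of tokens starting with "a" would be 1 ("alpha").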
Look elsewhere on this forum for keySet and you should be good to go.
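For completeness, this is roughly how the counting map from the example could be built. The tokenizing rule (a run of word characters, or a single punctuation character such as ";") is an assumption chosen to reproduce the example above, and TokenCounter is a made-up name.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical tokenizer: counts how often each token occurs in the given lines.
final class TokenCounter {
    // Assumption: a token is a run of word characters or one non-space symbol.
    private static final Pattern TOKEN = Pattern.compile("\\w+|[^\\w\\s]");

    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> tokenMap = new HashMap<>();
        for (String line : lines) {
            Matcher matcher = TOKEN.matcher(line);
            while (matcher.find()) {
                // Insert 1 for a new token, otherwise add 1 to the existing count.
                tokenMap.merge(matcher.group(), 1, Integer::sum);
            }
        }
        return tokenMap;
    }
}
```

If the semicolon should not count as a token, dropping the second alternative from the pattern is enough.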