ID3v2 tagging in Java

I'm looking to get the metadata from various MP3s and store it as strings. This is easy with ID3v1, as it's simply the last 128 bytes of the file, but from ID3v2 onwards the metadata is stored in variable-sized frames, and I'm currently unable to extract it. Any help much appreciated, thanks.
Current code is :
Path fromC = Paths.get("C:\\whatever.mp3");
byte[] data = Files.readAllBytes(fromC);

// read the first 768 bytes, which should cover the start of the ID3v2 tag
String[] bytesInString = new String[768];
for (int i = 0; i < 768; i++) {
    bytesInString[i] = Character.toString((char) data[i]);
}

StringBuilder byteResult = new StringBuilder();
for (int i = 0; i < bytesInString.length; i++) {
    byteResult.append(bytesInString[i]);
}
String allMeta = byteResult.toString();
System.out.println(allMeta);
The aim is simply to save the metadata as a string for use elsewhere; for now it just prints it out as a check.
This, however, gives results such as 'ID3 OTRCK 2TALB FeelTIT2 Here We GoPRIV ) WM/MediaClassSecondaryID PRIV 'WM/MediaClassPrimaryID ￑# ̄¬KニᄀHᄂ PRIV ' - which shows that it can partly read the metadata, but the output is not yet readable or useful. Could anyone explain, or give sample code showing, how to extract just the title, artist, etc., rather than the raw frame information?
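For illustration, here is a minimal sketch of walking ID3v2.3 text frames by hand, based on the frame layout described above: a 10-byte tag header ("ID3", version, flags, 4-byte synchsafe size) followed by frames, each with a 4-character ID, a 4-byte size and 2 flag bytes. It is a simplification that ignores unsynchronisation, extended headers, UTF-16 text and ID3v2.4's synchsafe frame sizes, and the file path is the same placeholder as above; a tagging library such as Jaudiotagger or mp3agic would handle those cases.
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Id3v2Sketch {
    public static void main(String[] args) throws Exception {
        byte[] data = Files.readAllBytes(Paths.get("C:\\whatever.mp3")); // placeholder path

        // ID3v2 header: "ID3", 2 version bytes, 1 flags byte, then a 4-byte synchsafe size
        if (data.length < 10 || data[0] != 'I' || data[1] != 'D' || data[2] != '3') {
            System.out.println("No ID3v2 tag found");
            return;
        }
        int tagSize = (data[6] & 0x7F) << 21 | (data[7] & 0x7F) << 14
                    | (data[8] & 0x7F) << 7  | (data[9] & 0x7F);

        int pos = 10;
        while (pos + 10 <= 10 + tagSize && pos + 10 <= data.length) {
            String frameId = new String(data, pos, 4, StandardCharsets.ISO_8859_1);
            if (frameId.trim().isEmpty()) {
                break; // hit the padding at the end of the tag
            }
            // ID3v2.3 frame sizes are plain big-endian ints (ID3v2.4 uses synchsafe ints here too)
            int frameSize = (data[pos + 4] & 0xFF) << 24 | (data[pos + 5] & 0xFF) << 16
                          | (data[pos + 6] & 0xFF) << 8  | (data[pos + 7] & 0xFF);
            int dataStart = pos + 10;
            if (frameSize <= 0 || dataStart + frameSize > data.length) {
                break; // malformed frame; stop rather than run off the end
            }
            // Text frames (TIT2 = title, TPE1 = artist, TALB = album, ...) start with an
            // encoding byte; this simplification decodes everything as ISO-8859-1 (encoding 0).
            if (frameId.startsWith("T") && frameSize > 1) {
                String text = new String(data, dataStart + 1, frameSize - 1, StandardCharsets.ISO_8859_1);
                System.out.println(frameId + " = " + text.trim());
            }
            pos = dataStart + frameSize;
        }
    }
}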

How to use kerning pairs extracted from a TTF file to correctly show glyphs as Path2D in Java?

This question is about recovering glyph font information in Java and it is related to a question posted here. For more details please check the question and answers.
It was suggested there to use Apache FOP library to recover the kerning pairs directly from the Truetype file since Java doesn't supply this information. I then ported the library to Windows and recovered the kerning pairs using this code:
TTFFile file = null;
File ttf = new File("C:\\Windows\\Fonts\\calibri.ttf");
try {
    file = TTFFile.open(ttf);
} catch (IOException e) {
    e.printStackTrace();
}
Map<Integer, Map<Integer, Integer>> kerning = file.getKerning();
Finally, the library works but the kerning pairs returned don't work with the glyphs retrieved in a Path2D.Float using the function below and the code fragment shown right after:
// 'font' is a class field (defined below); 'points' receives the segment coordinates
void vectorize(Path2D.Float path, String s) {
    float[] points = new float[6];
    FontRenderContext frc = new FontRenderContext(null, true, true);
    GlyphVector gv = font.createGlyphVector(frc, s);
    Shape glyph = gv.getGlyphOutline(0);
    PathIterator pIter = glyph.getPathIterator(null);
    while (!pIter.isDone()) {
        switch (pIter.currentSegment(points)) {
            case PathIterator.SEG_MOVETO:
                path.moveTo(points[0], points[1]);
                break;
            case PathIterator.SEG_LINETO:
                path.lineTo(points[0], points[1]);
                break;
            case PathIterator.SEG_QUADTO:
                path.quadTo(points[0], points[1], points[2], points[3]);
                break;
            case PathIterator.SEG_CUBICTO:
                path.curveTo(points[0], points[1], points[2], points[3],
                             points[4], points[5]);
                break;
            case PathIterator.SEG_CLOSE:
                path.closePath();
        }
        pIter.next();
    }
}
The glyph lengths are retrieved into the array lens:
Font font = new Font("Calibri", Font.PLAIN, 1000);
double interchar = 1000. * 0.075;
int size = '}' - ' ' + 1;
Path2D.Float[] glyphs = new Path2D.Float[size];
double[] lens = new double[size];
String chars[] = new String[size];
int i; char c;
char[] s = { '0' };
for (i = 0, c = ' '; c <= '}'; c++, i++) { s[0] = c; chars[i] = new String(s); }
for (i = 0; i < size; i++) {
vectorize(glyphs[i] = new Path2D.Float(), chars[i]); // function shown above
lens[i] = glyphs[i].getBounds2D().getWidth() + interchar;
}
Just to be clear, I display the glyphs using fill from Graphics2D, and I translate using the lengths above added to the kerning displacements returned by the Apache FOP library, as suggested, but the result is horrible. The font size is the standard 1000, as suggested in that discussion, and interchar comes out to 75. All this seems correct, yet my manual kerning pairs look far better than the kerning pairs from the TTF file.
Is anyone knowledgeable enough in this library or in TrueType fonts to tell how these kerning pairs are supposed to be used?
Is it necessary to access the glyphs directly from the TTF file instead of using the Java font management shown above? If yes, how?
Problem solved!
Recall that to open the file and obtain the kerning pairs one needs this code, using the Apache FOP library:
TTFFile file = null;
File ttf = new File("C:\\Windows\\Fonts\\calibri.ttf");
try {
    file = TTFFile.open(ttf);
} catch (IOException e) {
    e.printStackTrace();
}
Map<Integer, Map<Integer, Integer>> kerning = file.getKerning();
The following piece of code to vectorize the glyphs is correct now:
Font font = new Font("Calibri", Font.PLAIN, 2048);
int size = '}' - ' ' + 1;
Path2D.Float[] glyphs = new Path2D.Float[size];
//double[] lens = new double[size];
String chars[] = new String[size];
int i; char c;
char[] s = { '0' };
for (i = 0, c = ' '; c <= '}'; c++, i++) { s[0] = c; chars[i] = new String(s); }
for (i = 0; i < size; i++) {
vectorize(glyphs[i] = new Path2D.Float(), chars[i]);
//lens[i] = glyphs[i].getBounds2D().getWidth();
}
Notice that the font size is now 2048, which is the unitsPerEm value for this particular font. This value comes from the 'head' table of the font file, as explained here.
Notice that the widths cannot be taken from the array lens and the code commented out above; they have to be read from the file. Using int width = getCharWidthRaw(prev) from Apache FOP, where prev is the previous character, width is the raw advance width of the character as stored in the file. This value has to be added to the kerning value obtained from the map kerning.
The map is used this way: kerning.get(prev) returns another map containing the characters and the kerning values to be added. If the character to be shown next is found in this map, the corresponding value is added to width. If it is not found, or if null is returned, there is no kerning value for this pair.
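A minimal sketch of that lookup, assuming (as described above) that getCharWidthRaw is called on the TTFFile instance opened earlier and that the kerning map is keyed by character codes; the method name advance is just illustrative:
// Advance from the previous character to the next one, in font units (unitsPerEm = 2048 here).
int advance(TTFFile file, Map<Integer, Map<Integer, Integer>> kerning, char prev, char next) {
    int width = file.getCharWidthRaw(prev);      // raw advance width as stored in the font
    Map<Integer, Integer> pairs = kerning.get((int) prev);
    if (pairs != null) {
        Integer k = pairs.get((int) next);       // kerning adjustment, usually negative
        if (k != null) {
            width += k;
        }
    }
    return width;
}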
Here is a sample text to show that the kerning now works.
GNU Classpath contains an example, gnu.classpath.examples.awt.HintingDemo.java, that may help with this problem. The example lets you visualize glyphs: it reads the font and interprets the hinting language embedded in it. You can choose to render with or without hints (hinted glyphs are good at small font sizes but not recommended at large sizes). If you are not familiar with TrueType hints, this demo will show you that they align the paths to integer boundaries. The program isn't very fancy, but it has all the tools needed to read the glyphs and interpret the hints, with the advantage of visualizing the results.
You don't need the whole package to compile and run this demo. If you are using Eclipse it is easy to create a project for it. First create the package gnu.classpath.examples.awt and import HintingDemo.java into it. Then import all its dependencies, file by file or a whole package at a time. For example, you can import the whole package gnu.java.awt.font and delete OpenTypeFontPeer.java (the demo doesn't need it, and it causes an error if you leave it in).
This gives a standalone way to read and display glyphs directly from the font file. Interestingly, it doesn't use any kerning information; that has to be added with the Apache FOP library. If reading the file twice is a problem you will need a workaround, either digging into GNU Classpath to get the same information or getting Apache FOP to talk to GNU Classpath. At this time I cannot say how difficult that is. I am using these only as tools to copy the information and use it elsewhere, not as a way to actually read font files in a production program. Fonts are very compact, but they are not the most efficient way to display text, especially when the font language has to be interpreted, as with Type 1 and TrueType fonts. Getting rid of this interpretation looks like a good idea if you want high quality and speed.

Extracting frequency from wav file

I am trying to extract the frequency from a wav file, but it looks like something is going wrong.
First I extract the bytes from the file, then apply an FFT to them, and finally compute the magnitudes.
It seems I am doing something wrong, as the output is not close to the real value.
Below is the code.
try {
    File log = new File("files/log.txt");
    if (!log.exists()) log.createNewFile();
    PrintStream ps = new PrintStream(log);
    File f = new File("files/5000.wav");
    FileInputStream fis = new FileInputStream(f);
    int length = (int) f.length();
    length = (int) nearestPow2(length);
    double[] ibr = new double[length]; //== real
    double[] ibi = new double[length]; //== imaginary
    int i = 0;
    int l = 0;
    //fis.skip(44);
    byte[] b = new byte[1024];
    while ((l = fis.read(b)) != -1) {
        try {
            for (int j = 0; j < 1024; j++) {
                ibr[i] = b[j];
                ibi[i] = 0;
                i++;
            }
        } catch (Exception e) {}
    }
    double[] ftb = FFTBase.fft(ibr, ibi, true);
    double[] mag = new double[ftb.length / 2];
    double mxMag = 0;
    long avgMg = 0;
    int reqIndex = 512; //== no need to go till end
    for (i = 1; i < ibi.length; i++) {
        ibr[i] = ftb[i * 2];
        ibi[i] = ftb[i * 2 + 1];
        mag[i] = Math.sqrt(ibr[i] * ibr[i] + ibi[i] * ibi[i]);
        avgMg += mag[i];
        if (mag[i] > mxMag) mxMag = mag[i];
        ps.println(mag[i]);
    }
    avgMg = avgMg / ibi.length;
    ps.println("MAx====" + mxMag);
    ps.println("Average====" + avgMg);
} catch (Exception e) {
    e.printStackTrace();
}
When I run this code for a 5 kHz file, these are the values I am getting.
https://pastebin.com/R3V0QU4G
This is not the complete output, but it is representative.
Thanks
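As an aside, the code above treats each byte as one sample, but a typical WAV data chunk holds 16-bit little-endian PCM; a rough sketch of decoding such samples, assuming a canonical 44-byte header and mono 16-bit audio (a real parser would walk the RIFF chunks), might look like this:
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Arrays;

// Reads mono 16-bit little-endian PCM samples from a canonical-layout WAV file.
static double[] readSamples(String path, int maxSamples) throws IOException {
    try (DataInputStream in = new DataInputStream(new FileInputStream(path))) {
        in.skipBytes(44);                            // skip RIFF/fmt header (canonical layout)
        double[] samples = new double[maxSamples];
        int n = 0;
        while (n < maxSamples) {
            int lo = in.read();
            int hi = in.read();
            if (lo < 0 || hi < 0) break;             // end of file
            short sample = (short) ((hi << 8) | lo); // little-endian: low byte first
            samples[n++] = sample / 32768.0;         // normalise to [-1, 1)
        }
        return Arrays.copyOf(samples, n);
    }
}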
Extracting a frequency, or a "pitch", is unfortunately hardly possible by only doing an FFT and searching for the "loudest" frequency or something like that, at least if you are trying to extract it from a musical signal.
There are also different kinds of tones. A large portion of musical instruments (e.g. a guitar or our voice) create harmonic sounds, which consist of several frequencies that follow a certain pattern.
But there are also tones that have only one peak / frequency (e.g. whistling).
Additionally, you usually have to deal with noise in the signal that is not tonal at all. This could be background noise, or it could be produced by the instrument itself. Guitars, for instance, have a very large noise portion during the attack phase.
You can use different approaches, meaning different algorithms, to find the pitch of these signals, depending on their type.
If we stay in the frequency domain (FFT) and assume we want to analyze a harmonic sound, there is for example the two-way mismatch algorithm, which uses statistical pattern matching to find harmonics and to guess the fundamental frequency, i.e. the frequency that is perceived as the tone by our ears.
An example implementation can be found here: https://github.com/ausmauricio/audio_dsp. This repo is part of a complete course on audio signal processing on Coursera, so maybe it is helpful.
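For a simple single-tone test file like the 5 kHz example above, though, a plain peak search over the magnitude spectrum is often enough; the key relation is that bin k of an N-point FFT corresponds to the frequency k * sampleRate / N. A rough sketch (names are illustrative; mag holds the magnitudes for bins 0..N/2 and sampleRate comes from the WAV header):
// Find the loudest bin and convert its index to a frequency in Hz.
// mag[k] corresponds to frequency k * sampleRate / fftSize.
static double peakFrequency(double[] mag, int fftSize, double sampleRate) {
    int peak = 1;                              // skip bin 0 (the DC offset)
    for (int k = 2; k < mag.length; k++) {
        if (mag[k] > mag[peak]) {
            peak = k;
        }
    }
    return peak * sampleRate / fftSize;
}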

finding the value pair that has the highest affinity in Java?

Hi, I am currently working on an algorithm problem set.
Given the log below in a file.txt file:
yahoo,ap42
google,ap42
twitter,thl76
google,aa314
google,aa314
google,thl76
twitter,aa314
twitter,ap42
yahoo,aa314
A web server logs page views in a log file. The log file consists of one line per page view. A page view consists of page id and a user id, separated by a comma. The affinity of a pair of pages is the number of distinct users who viewed both pages. For example in the quoted log file, the affinity of yahoo and google is 2 (because ap42 viewed both and aa314 viewed both).
My requirement is to create an algorithm which will return the pair of pages with highest affinity.
Currently I have written the code below; however, it is not returning the pair of pages with the highest affinity. Any suggestions on how to modify the code to make it work? Thanks:
Scanner in = new Scanner(new File("./file.txt"));
ArrayList<String[]> logList = new ArrayList<String[]>();
while (in.hasNextLine()) {
    logList.add(in.nextLine().split(","));
}
String currentPage;
String currentUser;
int highestCount = 0;
for (int i = 0; i < logList.size() - 1; i++) {
    int affinityCount = 0;
    currentPage = logList.get(i)[0];
    currentUser = logList.get(i)[1];
    for (int j = logList.size() - 1; j > 0; j--) {
        if (i != j) {
            if (!currentPage.equals(logList.get(j)[0])
                    && currentUser.equals(logList.get(j)[1])) {
                affinityCount++;
                System.out.println("currentPage: " + currentPage + " currentUser: " + currentUser);
                System.out.println("logList.get(j)[0]: " + logList.get(j)[0] + " logList.get(j)[1]): " + logList.get(j)[1]);
                System.out.println(affinityCount);
            }
        }
    }
}
I am going to write the algorithm here; you can convert it into code.
Traverse the file and create a HashMap of user -> pages viewed.
After this traversal you will have the pages viewed by each user.
Now traverse this data set. For each user, take the list of pages he viewed and form all possible pairs of pages. Put each pair into a max heap with its value set to 1; if the pair already exists, increment its value.
Make sure you treat yahoo,google the same as google,yahoo while comparing.
At the end of this, the element at the top of the heap is your output.
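A rough sketch of that algorithm in Java, using a HashMap of pair counts plus a final scan instead of a heap (the file name and the pair normalisation follow the question; other names are illustrative):
import java.io.File;
import java.util.*;

public class AffinitySketch {
    public static void main(String[] args) throws Exception {
        // Step 1: user -> set of distinct pages viewed
        Map<String, Set<String>> pagesByUser = new HashMap<>();
        try (Scanner in = new Scanner(new File("./file.txt"))) {
            while (in.hasNextLine()) {
                String[] parts = in.nextLine().split(",");
                pagesByUser.computeIfAbsent(parts[1], u -> new HashSet<>()).add(parts[0]);
            }
        }

        // Step 2: for every user, count each unordered pair of pages he viewed
        Map<String, Integer> affinity = new HashMap<>();
        for (Set<String> pages : pagesByUser.values()) {
            List<String> list = new ArrayList<>(pages);
            for (int a = 0; a < list.size(); a++) {
                for (int b = a + 1; b < list.size(); b++) {
                    String p1 = list.get(a), p2 = list.get(b);
                    // order the pair so yahoo,google and google,yahoo share one key
                    String key = p1.compareTo(p2) < 0 ? p1 + "," + p2 : p2 + "," + p1;
                    affinity.merge(key, 1, Integer::sum);
                }
            }
        }

        // Step 3: pick the pair with the highest count
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : affinity.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        System.out.println(best + " -> " + bestCount); // google,twitter -> 3 for the sample log
    }
}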

count distinct values in big long array (performance issue)

I have this:
long hnds[] = new long[133784560]; // 133 million
Then I quickly fill the array (in a couple of ms) and then I want to know the number of unique (i.e. distinct) values. I don't even need this in real time; I just need to try out a couple of variations and see how many unique values each gives.
I tried e.g. this:
import org.apache.commons.lang3.ArrayUtils;
....
HashSet<Long> length = new HashSet<Long>(Arrays.asList(ArrayUtils.toObject(hnds)));
System.out.println("size: " + length.size());
and after waiting for half an hour it gives a heap space error (I have -Xmx4000m).
I also tried initializing Long[] hnds instead of long[] hnds, but then the initial filling of the array takes forever. I also tried using a Set from the beginning when adding the values, but that takes forever too. Is there any way to count the distinct values of a long[] array without waiting forever? I'd write it to a file if I had to, just some way.
My best suggestion would be to use a library like fastutil (http://fastutil.di.unimi.it/) and then use the custom unboxed hash set:
import it.unimi.dsi.fastutil.longs.LongOpenHashSet;
System.out.println(new LongOpenHashSet(hnds).size());
(Also, by the way, if you can accept approximate answers, there are much more efficient algorithms you can try; see e.g. this paper for details.)
Just sort it and count.
int sz = 133784560;
Random randy = new Random();
long[] longs = new long[sz];
for (int i = 0; i < sz; i++) { longs[i] = randy.nextInt(10000000); }
Arrays.sort(longs);

long lastSeen = longs[0];
long count = 1; // the first element is always one distinct value
for (int i = 1; i < sz; i++) {
    if (longs[i] != lastSeen) count++;
    lastSeen = longs[i];
}
System.out.println("distinct: " + count);
Takes about 15 seconds on my laptop.
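For comparison, a stream-based version is a one-liner, though it may well be slower and heavier on memory than the sort-and-count loop above, since LongStream.distinct() typically boxes the values internally:
// hnds is the long[] from the question
long distinct = java.util.Arrays.stream(hnds).distinct().count();
System.out.println("distinct: " + distinct);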

How to train data correctly using libsvm?

I want to use an SVM (support vector machine) in my program, but I cannot get the correct result.
I want to know how the data must be prepared for training the SVM.
What I am doing:
Suppose we have 5 documents (the numbers are just an example); 3 of them are in the first category and the other 2 are in the second category. I merge the documents within each category (meaning the 3 documents in the first category are merged into one document), and after that I build a train array like this:
double[][] train = new double[cat1.getDocument().getAttributes().size() + cat2.getDocument().getAttributes().size()][];
and I will fill the array like this:
int i = 0;
Iterator<String> iteraitor = cat1.getDocument().getAttributes().keySet().iterator();
Iterator<String> iteraitor2 = cat2.getDocument().getAttributes().keySet().iterator();
while (i < train.length) {
    if (i < cat2.getDocument().getAttributes().size()) {
        while (iteraitor2.hasNext()) {
            String key = (String) iteraitor2.next();
            Long value = cat2.getDocument().getAttributes().get(key);
            double[] vals = { 0, value };
            train[i] = vals;
            i++;
            System.out.println(vals[0] + "," + vals[1]);
        }
    } else {
        while (iteraitor.hasNext()) {
            String key = (String) iteraitor.next();
            Long value = cat1.getDocument().getAttributes().get(key);
            double[] vals = { 1, value };
            train[i] = vals;
            i++;
            System.out.println(vals[0] + "," + vals[1]);
        }
        i++;
    }
}
Then I continue like this to get the model:
svm_problem prob = new svm_problem();
int dataCount = train.length;
prob.y = new double[dataCount];
prob.l = dataCount;
prob.x = new svm_node[dataCount][];

for (int k = 0; k < dataCount; k++) {
    double[] features = train[k];
    prob.x[k] = new svm_node[features.length - 1];
    for (int j = 1; j < features.length; j++) {
        svm_node node = new svm_node();
        node.index = j;
        node.value = features[j];
        prob.x[k][j - 1] = node;
    }
    prob.y[k] = features[0];
}

svm_parameter param = new svm_parameter();
param.probability = 1;
param.gamma = 0.5;
param.nu = 0.5;
param.C = 1;
param.svm_type = svm_parameter.C_SVC;
param.kernel_type = svm_parameter.LINEAR;
param.cache_size = 20000;
param.eps = 0.001;

svm_model model = svm.svm_train(prob, param);
Is this approach correct? If not, please help me get it right.
These two answers are correct: answer one, answer two.
Even without examining the code one can find conceptual errors:
think that we have 5 documents, 3 of them in the first category and the other 2 in the second category; I merge the categories together (meaning the 3 docs in the first category are merged into one document), and after that I made a train array like this
So:
Training on 5 documents won't give any reasonable results with any machine learning model... these are statistical models, and there are no reasonable statistics to be had from 5 points in R^n, where n ~ 10,000.
You do not merge anything. Such an approach can work for Naive Bayes, which does not really treat documents as a "whole" but rather as probabilistic dependencies between features and classes. In an SVM each document should be a separate point in R^n space, where n can be the number of distinct words (for a bag-of-words / set-of-words representation).
A problem might be that you do not terminate each set of features in a training example with an index of -1, which you should do according to the README...
I.e., if you have one example with two features, I think you should do:
Index[0]: 0
Value[0]: 22
Index[1]: 1
Value[1]: 53
Index[2]: -1
Good luck!
Using SVMs to classify text is a common task. You can check out research papers by Joachims [1] regarding SVM text classification.
Basically you have to:
Tokenize your documents
Remove stopwords
Apply stemming technique
Apply feature selection technique (see [2])
Transform your documents using the features selected in step 4 (a simple option is binary: 0 if the feature is absent, 1 if it is present; or other measures like TFC) - see the sketch below
Train your SVM and be happy :)
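As a rough sketch of steps 5 and 6 (not the poster's code; it assumes each document has already been turned into a binary feature vector over a fixed vocabulary, and the parameter values follow the question's settings rather than being tuned):
import java.util.ArrayList;
import java.util.List;
import libsvm.*;

// docs[i][j] is 1.0 if feature j occurs in document i, 0.0 otherwise (binary bag of words);
// labels[i] is the class of document i (e.g. 0.0 or 1.0)
static svm_model trainTextSvm(double[][] docs, double[] labels) {
    svm_problem prob = new svm_problem();
    prob.l = docs.length;
    prob.y = labels;
    prob.x = new svm_node[docs.length][];

    for (int i = 0; i < docs.length; i++) {
        // keep only the non-zero features as sparse nodes
        List<svm_node> nodes = new ArrayList<>();
        for (int j = 0; j < docs[i].length; j++) {
            if (docs[i][j] != 0.0) {
                svm_node node = new svm_node();
                node.index = j + 1;   // libsvm feature indices start at 1
                node.value = docs[i][j];
                nodes.add(node);
            }
        }
        prob.x[i] = nodes.toArray(new svm_node[0]);
    }

    svm_parameter param = new svm_parameter();
    param.svm_type = svm_parameter.C_SVC;
    param.kernel_type = svm_parameter.LINEAR; // linear kernels usually work well for text
    param.C = 1;
    param.eps = 0.001;
    param.cache_size = 100;

    return svm.svm_train(prob, param);
}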
[1] T. Joachims: Text Categorization with Support Vector Machines: Learning with Many Relevant Features; Springer: Heidelberg, Germany, 1998, doi:10.1007/BFb0026683.
[2] Y. Yang, J. O. Pedersen: A Comparative Study on Feature Selection in Text Categorization. International Conference on Machine Learning, 1997, 412-420.
