PDFBox how to get the upper-left coordinates of an Image? - java

I'm using the following script to get the image positions within a page. How can I convert them to pixel coordinates of the upper-left corner? I want to create a Rectangle based on the position and size of the image and compare it to another Rectangle.
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.pdfbox.examples.util;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
import org.apache.pdfbox.util.Matrix;
import org.apache.pdfbox.contentstream.operator.DrawObject;
import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.contentstream.PDFStreamEngine;
import java.io.File;
import java.io.IOException;
import java.util.List;
import org.apache.pdfbox.contentstream.operator.state.Concatenate;
import org.apache.pdfbox.contentstream.operator.state.Restore;
import org.apache.pdfbox.contentstream.operator.state.Save;
import org.apache.pdfbox.contentstream.operator.state.SetGraphicsStateParameters;
import org.apache.pdfbox.contentstream.operator.state.SetMatrix;
/**
* This is an example on how to get the x/y coordinates of image locations.
*
* @author Ben Litchfield
*/
public class PrintImageLocations extends PDFStreamEngine
{
/**
* Default constructor.
*
* @throws IOException If there is an error loading text stripper properties.
*/
public PrintImageLocations() throws IOException
{
addOperator(new Concatenate());
addOperator(new DrawObject());
addOperator(new SetGraphicsStateParameters());
addOperator(new Save());
addOperator(new Restore());
addOperator(new SetMatrix());
}
/**
* This will print the document's data.
*
* @param args The command line arguments.
*
* @throws IOException If there is an error parsing the document.
*/
public static void main( String[] args ) throws IOException
{
if( args.length != 1 )
{
usage();
}
else
{
PDDocument document = null;
try
{
document = PDDocument.load( new File(args[0]) );
PrintImageLocations printer = new PrintImageLocations();
int pageNum = 0;
for( PDPage page : document.getPages() )
{
pageNum++;
System.out.println( "Processing page: " + pageNum );
printer.processPage(page);
}
}
finally
{
if( document != null )
{
document.close();
}
}
}
}
/**
* This is used to handle an operation.
*
* @param operator The operation to perform.
* @param operands The list of arguments.
*
* @throws IOException If there is an error processing the operation.
*/
@Override
protected void processOperator( Operator operator, List<COSBase> operands) throws IOException
{
String operation = operator.getName();
if( "Do".equals(operation) )
{
COSName objectName = (COSName) operands.get( 0 );
PDXObject xobject = getResources().getXObject( objectName );
if( xobject instanceof PDImageXObject)
{
PDImageXObject image = (PDImageXObject)xobject;
int imageWidth = image.getWidth();
int imageHeight = image.getHeight();
System.out.println("*******************************************************************");
System.out.println("Found image [" + objectName.getName() + "]");
Matrix ctmNew = getGraphicsState().getCurrentTransformationMatrix();
float imageXScale = ctmNew.getScalingFactorX();
float imageYScale = ctmNew.getScalingFactorY();
// position in user space units. 1 unit = 1/72 inch at 72 dpi
System.out.println("position in PDF = " + ctmNew.getTranslateX() + ", " + ctmNew.getTranslateY() + " in user space units");
// raw size in pixels
System.out.println("raw image size = " + imageWidth + ", " + imageHeight + " in pixels");
// displayed size in user space units
System.out.println("displayed size = " + imageXScale + ", " + imageYScale + " in user space units");
// displayed size in inches at 72 dpi rendering
imageXScale /= 72;
imageYScale /= 72;
System.out.println("displayed size = " + imageXScale + ", " + imageYScale + " in inches at 72 dpi rendering");
// displayed size in millimeters at 72 dpi rendering
imageXScale *= 25.4;
imageYScale *= 25.4;
System.out.println("displayed size = " + imageXScale + ", " + imageYScale + " in millimeters at 72 dpi rendering");
System.out.println();
}
else if(xobject instanceof PDFormXObject)
{
PDFormXObject form = (PDFormXObject)xobject;
showForm(form);
}
}
else
{
super.processOperator( operator, operands);
}
}
/**
* This will print the usage for this document.
*/
private static void usage()
{
System.err.println( "Usage: java " + PrintImageLocations.class.getName() + " <input-pdf>" );
}
}
Source: https://svn.apache.org/repos/asf/pdfbox/trunk/examples/src/main/java/org/apache/pdfbox/examples/util/PrintImageLocations.java
Possible formula to get the coordinates of the upper-left corner:
X1 = ctmNew.getTranslateX() * toPixelFactor
Y1 = pageHeight - ctmNew.getTranslateY() * toPixelFactor - imageHeight
I can get pageHeight with page.getMediaBox().getHeight(), but the problem is calculating toPixelFactor, because ctmNew.getTranslateY() is given in user space units, not in pixels.
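A minimal sketch of the conversion, under stated assumptions: the CTM has no rotation or skew, the crop box starts at (0, 0), and the page uses the default user unit of 1/72 inch, so toPixelFactor = dpi / 72. Under those assumptions ctmNew.getTranslateX()/getTranslateY() give the image's lower-left corner and getScalingFactorX()/getScalingFactorY() give its displayed width and height in user space units. Two details differ from the formula above: the page height must be scaled as well, and the flip must subtract the displayed height (in user space), not the raw imageHeight from image.getHeight(), which is the bitmap's own pixel count. The class and method names are hypothetical; if the crop box does not start at (0, 0), subtract its lower-left x/y from tx/ty first.

```java
import java.awt.geom.Rectangle2D;

/** Hypothetical helper: PDF user space (y up) to rendered-image pixels (y down). */
final class ImageRectConverter {

    private ImageRectConverter() {
    }

    /**
     * Assumes no rotation/skew in the CTM, crop box at (0, 0), and 1 user unit = 1/72 inch.
     *
     * @param tx         ctm.getTranslateX(): lower-left x of the image in user space
     * @param ty         ctm.getTranslateY(): lower-left y of the image in user space
     * @param width      ctm.getScalingFactorX(): displayed width in user space
     * @param height     ctm.getScalingFactorY(): displayed height in user space
     * @param pageHeight page height in user space, e.g. page.getCropBox().getHeight()
     * @param dpi        resolution the page is (or would be) rendered at
     * @return rectangle whose x/y is the upper-left corner in pixels
     */
    static Rectangle2D.Double toPixelRect(double tx, double ty, double width, double height,
                                          double pageHeight, double dpi) {
        double toPixelFactor = dpi / 72.0;               // user space units -> pixels
        double upperLeftY = pageHeight - (ty + height);  // top edge, measured down from page top
        return new Rectangle2D.Double(
                tx * toPixelFactor,
                upperLeftY * toPixelFactor,
                width * toPixelFactor,
                height * toPixelFactor);
    }

    public static void main(String[] args) {
        // A 144 x 72 unit image with its lower-left corner at (72, 600) on a 792 unit high page
        Rectangle2D.Double r = toPixelRect(72, 600, 144, 72, 792, 144);
        // At 144 dpi: upper-left corner (144, 240), size 288 x 144 pixels
        System.out.println(r);
    }
}
```

The result is an ordinary Rectangle2D, so comparing it with the other rectangle can use intersects(...) or contains(...).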

Related

Negative X or Y obtained from PdfBox while extracting text position

I am trying to extract all text in a pdf along with their coordinates.
I am using Apache PDFBox 2.0.8 and following the sample program DrawPrintTextLocations .
It mostly seems to work, but for certain PDFs I get negative values for the x and y coordinates of the bounding boxes. Refer to this PDF file for an example.
My app assumes that x goes from left to right and y goes from top to bottom, as in the rendered page image, so these negative values throw my computations off.
Below is the relevant piece of code.
import org.apache.fontbox.util.BoundingBox;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType3Font;
import org.apache.pdfbox.pdmodel.interactive.pagenavigation.PDThreadBead;
import org.apache.pdfbox.rendering.PDFRenderer;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;
import javax.imageio.ImageIO;
import java.awt.*;
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;
import java.awt.image.BufferedImage;
import java.io.*;
import java.util.List;
/**
* This is an example on how to get some x/y coordinates of text and to show them in a rendered
* image.
*
* @author Ben Litchfield
* @author Tilman Hausherr
*/
public class DrawPrintTextLocations extends PDFTextStripper {
private AffineTransform flipAT;
private AffineTransform rotateAT;
private AffineTransform transAT;
private final float DPI = 200.0f;
private final double PT2PX = DPI / 72.0;
private final AffineTransform dpiAT = AffineTransform.getScaleInstance(PT2PX, PT2PX);
private final String filename;
static final int SCALE = 1;
private Graphics2D g2d;
private final PDDocument document;
/**
* Instantiate a new PDFTextStripper object.
*
* @param document
* @param filename
* @throws IOException If there is an error loading the properties.
*/
public DrawPrintTextLocations(PDDocument document, String filename) throws IOException {
this.document = document;
this.filename = filename;
}
/**
* This will print the document's data.
*
* @param args The command line arguments.
* @throws IOException If there is an error parsing the document.
*/
public static void main(String[] args) throws IOException {
String pdfLoc = "/debug/pdfbox/p2_VS008PI.pdf";
if (args.length == 1) {
pdfLoc = args[0];
}
try (PDDocument document = PDDocument.load(new File(pdfLoc))) {
DrawPrintTextLocations stripper = new DrawPrintTextLocations(document, pdfLoc);
stripper.setSortByPosition(true);
for (int page = 0; page < document.getNumberOfPages(); ++page) {
stripper.stripPage(page);
}
}
}
private void stripPage(int page) throws IOException {
PDFRenderer pdfRenderer = new PDFRenderer(document);
BufferedImage image = pdfRenderer.renderImageWithDPI(page, DPI);
PDPage pdPage = document.getPage(page);
PDRectangle cropBox = pdPage.getCropBox();
// flip y-axis
flipAT = new AffineTransform();
flipAT.translate(0, pdPage.getBBox().getHeight());
flipAT.scale(1, -1);
// page may be rotated
rotateAT = new AffineTransform();
int rotation = pdPage.getRotation();
if (rotation != 0) {
PDRectangle mediaBox = pdPage.getMediaBox();
switch (rotation) {
case 90:
rotateAT.translate(mediaBox.getHeight(), 0);
break;
case 270:
rotateAT.translate(0, mediaBox.getWidth());
break;
case 180:
rotateAT.translate(mediaBox.getWidth(), mediaBox.getHeight());
break;
default:
break;
}
rotateAT.rotate(Math.toRadians(rotation));
}
// cropbox
transAT = AffineTransform.getTranslateInstance(-cropBox.getLowerLeftX(), cropBox.getLowerLeftY());
g2d = image.createGraphics();
g2d.setStroke(new BasicStroke(0.1f));
g2d.scale(SCALE, SCALE);
setStartPage(page + 1);
setEndPage(page + 1);
Writer dummy = new OutputStreamWriter(new ByteArrayOutputStream());
writeText(document, dummy);
g2d.dispose();
String imageFilename = filename;
int pt = imageFilename.lastIndexOf('.');
imageFilename = imageFilename.substring(0, pt) + "-marked-" + (page + 1) + ".png";
ImageIO.write(image, "png", new File(imageFilename));
}
/**
* Override the default functionality of PDFTextStripper.
*/
@Override
protected void writeString(String string, List<TextPosition> textPositions) throws IOException {
for (TextPosition text : textPositions) {
AffineTransform at = text.getTextMatrix().createAffineTransform();
PDFont font = text.getFont();
BoundingBox bbox = font.getBoundingBox();
float xadvance = font.getWidth(text.getCharacterCodes()[0]); // todo: should iterate all chars
Rectangle2D.Float rect1 = new Rectangle2D.Float(0, bbox.getLowerLeftY(), xadvance, bbox.getHeight());
if (font instanceof PDType3Font) {
at.concatenate(font.getFontMatrix().createAffineTransform());
} else {
at.scale(1 / 1000f, 1 / 1000f);
}
Shape s1 = at.createTransformedShape(rect1);
s1 = flipAT.createTransformedShape(s1);
s1 = rotateAT.createTransformedShape(s1);
s1 = dpiAT.createTransformedShape(s1);
g2d.setColor(Color.blue);
g2d.draw(s1);
Rectangle bounds = s1.getBounds();
if (bounds.getX() < 0 || bounds.getY() < 0) {
// THIS is where things go wrong
// I need these coordinates to be positive
System.out.println(bounds.toString());
System.out.println(rect1.toString());
}
}
}
}
And here is a snippet of the output from the first page of the PDF above.
SECTION 10 – INSURANCE & OTHER FINANCIAL RESOURCES
java.awt.Rectangle[x=-3237,y=40,width=19,height=43]
java.awt.Rectangle[x=-3216,y=40,width=20,height=43]
java.awt.Rectangle[x=-3194,y=40,width=23,height=43]
java.awt.Rectangle[x=-3170,y=40,width=22,height=43]
Characters with negative coordinates lie outside the crop box (as do characters with coordinates larger than the crop box width or height). Think of the crop box as a cutout from something bigger. To see the whole thing, run this code
pdPage.setCropBox(pdPage.getMediaBox());
for each page of your PDF and then save and view it.
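If you only want the glyphs that are actually visible, one option (a sketch, not part of the answer above) is to drop every glyph whose pixel bounds do not intersect the rendered page image, since PDFRenderer renders only the crop box. This assumes the glyph shapes have already been transformed into the same pixel space as the image returned by renderImageWithDPI; the class and method names are hypothetical. The sample rectangle is taken from the output in the question.

```java
import java.awt.Rectangle;
import java.awt.Shape;

/** Hypothetical filter for glyph outlines that fall outside the rendered (cropped) page. */
final class CropBoxFilter {

    private CropBoxFilter() {
    }

    /**
     * Returns true if any part of the glyph shape lies on the rendered page image.
     * Assumes the shape is already in the image's pixel coordinates.
     */
    static boolean isVisible(Shape glyph, int imageWidth, int imageHeight) {
        Rectangle page = new Rectangle(0, 0, imageWidth, imageHeight);
        return glyph.intersects(page);
    }

    public static void main(String[] args) {
        int w = 1700, h = 2200; // e.g. an 8.5 x 11 inch page rendered at 200 dpi
        // A rectangle from the question's output: entirely left of the crop box
        System.out.println(isVisible(new Rectangle(-3237, 40, 19, 43), w, h));
        // A glyph inside the crop box
        System.out.println(isVisible(new Rectangle(100, 40, 19, 43), w, h));
    }
}
```

In writeString this check would go right after s1 has been transformed, skipping glyphs for which it returns false.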
Regarding your comment:
Following your advice of setting the crop box to the media box actually changed the whole on-screen appearance of the PDF; now I get 3 pages collated as one.
This suggests that physically, this is a folded sheet that has 3 pages on each side. The online PDF displays this as 6 pages for easy viewing on a computer.

JCUDA cuda file not compiling

http://www.jcuda.org/tutorial/TutorialIndex.html
package jcudavectoradd;
/**
*
* @author Sanjula
*/
/*
* JCuda - Java bindings for NVIDIA CUDA driver and runtime API
* http://www.jcuda.org
*
* Copyright 2011 Marco Hutter - http://www.jcuda.org
*/
import static jcuda.driver.JCudaDriver.*;
import java.io.*;
import jcuda.*;
import jcuda.driver.*;
/**
* This is a sample class demonstrating how to use the JCuda driver
* bindings to load and execute a CUDA vector addition kernel.
* The sample reads a CUDA file, compiles it to a PTX file
* using NVCC, loads the PTX file as a module and executes
* the kernel function. <br />
*/
public class JCudaVectorAdd
{
/**
* Entry point of this sample
*
* @param args Not used
* @throws IOException If an IO error occurs
*/
public static void main(String args[]) throws IOException
{
// Enable exceptions and omit all subsequent error checks
JCudaDriver.setExceptionsEnabled(true);
// Create the PTX file by calling the NVCC
String ptxFileName = preparePtxFile("JCudaVectorAddKernel.cu");
//String ptxFileName = "JCudaVectorAddKernel.ptx";
// Initialize the driver and create a context for the first device.
cuInit(0);
CUdevice device = new CUdevice();
cuDeviceGet(device, 0);
CUcontext context = new CUcontext();
cuCtxCreate(context, 0, device);
// Load the ptx file.
CUmodule module = new CUmodule();
cuModuleLoad(module, ptxFileName);
// Obtain a function pointer to the "add" function.
CUfunction function = new CUfunction();
cuModuleGetFunction(function, module, "add");
int numElements = 100000;
// Allocate and fill the host input data
float hostInputA[] = new float[numElements];
float hostInputB[] = new float[numElements];
for(int i = 0; i < numElements; i++)
{
hostInputA[i] = (float)i;
hostInputB[i] = (float)i;
}
// Allocate the device input data, and copy the
// host input data to the device
CUdeviceptr deviceInputA = new CUdeviceptr();
cuMemAlloc(deviceInputA, numElements * Sizeof.FLOAT);
cuMemcpyHtoD(deviceInputA, Pointer.to(hostInputA),
numElements * Sizeof.FLOAT);
CUdeviceptr deviceInputB = new CUdeviceptr();
cuMemAlloc(deviceInputB, numElements * Sizeof.FLOAT);
cuMemcpyHtoD(deviceInputB, Pointer.to(hostInputB),
numElements * Sizeof.FLOAT);
// Allocate device output memory
CUdeviceptr deviceOutput = new CUdeviceptr();
cuMemAlloc(deviceOutput, numElements * Sizeof.FLOAT);
// Set up the kernel parameters: A pointer to an array
// of pointers which point to the actual values.
Pointer kernelParameters = Pointer.to(
Pointer.to(new int[]{numElements}),
Pointer.to(deviceInputA),
Pointer.to(deviceInputB),
Pointer.to(deviceOutput)
);
// Call the kernel function.
int blockSizeX = 256;
int gridSizeX = (int)Math.ceil((double)numElements / blockSizeX);
cuLaunchKernel(function,
gridSizeX, 1, 1, // Grid dimension
blockSizeX, 1, 1, // Block dimension
0, null, // Shared memory size and stream
kernelParameters, null // Kernel- and extra parameters
);
cuCtxSynchronize();
// Allocate host output memory and copy the device output
// to the host.
float hostOutput[] = new float[numElements];
cuMemcpyDtoH(Pointer.to(hostOutput), deviceOutput,
numElements * Sizeof.FLOAT);
// Verify the result
boolean passed = true;
for(int i = 0; i < numElements; i++)
{
float expected = i+i;
if (Math.abs(hostOutput[i] - expected) > 1e-5)
{
System.out.println(
"At index "+i+ " found "+hostOutput[i]+
" but expected "+expected);
passed = false;
break;
}
}
System.out.println("Test "+(passed?"PASSED":"FAILED"));
// Clean up.
cuMemFree(deviceInputA);
cuMemFree(deviceInputB);
cuMemFree(deviceOutput);
}
/**
* The extension of the given file name is replaced with "ptx".
* If the file with the resulting name does not exist, it is
* compiled from the given file using NVCC. The name of the
* PTX file is returned.
*
* @param cuFileName The name of the .CU file
* @return The name of the PTX file
* @throws IOException If an I/O error occurs
*/
private static String preparePtxFile(String cuFileName) throws IOException
{
int endIndex = cuFileName.lastIndexOf('.');
if (endIndex == -1)
{
endIndex = cuFileName.length()-1;
}
String ptxFileName = cuFileName.substring(0, endIndex+1)+"ptx";
File ptxFile = new File(ptxFileName);
if (ptxFile.exists())
{
return ptxFileName;
}
File cuFile = new File(cuFileName);
if (!cuFile.exists())
{
throw new IOException("Input file not found: "+cuFileName);
}
String modelString = "-m"+System.getProperty("sun.arch.data.model");
String command =
"nvcc " + modelString + " -ptx "+
cuFile.getPath()+" -o "+ptxFileName;
System.out.println("Executing\n"+command);
Process process = Runtime.getRuntime().exec(command);
String errorMessage =
new String(toByteArray(process.getErrorStream()));
String outputMessage =
new String(toByteArray(process.getInputStream()));
int exitValue = 0;
try
{
exitValue = process.waitFor();
}
catch (InterruptedException e)
{
Thread.currentThread().interrupt();
throw new IOException(
"Interrupted while waiting for nvcc output", e);
}
if (exitValue != 0)
{
System.out.println("nvcc process exitValue "+exitValue);
System.out.println("errorMessage:\n"+errorMessage);
System.out.println("outputMessage:\n"+outputMessage);
throw new IOException(
"Could not create .ptx file: "+errorMessage);
}
System.out.println("Finished creating PTX file");
return ptxFileName;
}
/**
* Fully reads the given InputStream and returns it as a byte array
*
* @param inputStream The input stream to read
* @return The byte array containing the data from the input stream
* @throws IOException If an I/O error occurs
*/
private static byte[] toByteArray(InputStream inputStream)
throws IOException
{
ByteArrayOutputStream baos = new ByteArrayOutputStream();
byte buffer[] = new byte[8192];
while (true)
{
int read = inputStream.read(buffer);
if (read == -1)
{
break;
}
baos.write(buffer, 0, read);
}
return baos.toByteArray();
}
}
extern "C"
__global__ void add(int n, float *a, float *b, float *sum)
{
int i = blockIdx.x * blockDim.x + threadIdx.x;
if (i<n)
{
sum[i] = a[i] + b[i];
}
}
When I compile this code I get this error. I am using NetBeans 8.2 and I have installed CUDA. It works perfectly in Visual Studio 2015, but it does not work from Java.
I added the directory containing Visual Studio's cl.exe to the environment variables:
C:\Program Files\Microsoft Visual Studio 10.0\VC\bin
Go to My Computer -> Properties -> Advanced System Settings -> Environment Variables. There, look for "PATH" in the list and add the path above (or wherever your cl.exe is located).
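To confirm that the nvcc process started from Java can actually see cl.exe, you can scan the PATH the JVM inherited. This is a sketch using only the JDK; the class name is hypothetical. Keep in mind that a process started before you edited PATH (NetBeans, and any JVM it launched) keeps the old value, so it has to be restarted after the change.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical helper: locate an executable in a PATH-style string. */
final class PathLookup {

    private PathLookup() {
    }

    /** Returns the first PATH entry that contains a regular file named exeName. */
    static Optional<Path> find(String exeName, String pathValue) {
        if (pathValue == null) {
            return Optional.empty();
        }
        for (String entry : pathValue.split(Pattern.quote(File.pathSeparator))) {
            String dir = entry.trim().replace("\"", ""); // Windows entries may be quoted
            if (dir.isEmpty()) {
                continue;
            }
            try {
                Path candidate = Paths.get(dir, exeName);
                if (Files.isRegularFile(candidate)) {
                    return Optional.of(candidate);
                }
            } catch (InvalidPathException e) {
                // skip malformed entries instead of failing the whole lookup
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The PATH this JVM inherited; Runtime.exec without an env argument passes it on to nvcc
        String path = System.getenv("PATH");
        System.out.println("cl.exe:   " + find("cl.exe", path).map(Path::toString).orElse("not found"));
        System.out.println("nvcc.exe: " + find("nvcc.exe", path).map(Path::toString).orElse("not found"));
    }
}
```

If cl.exe shows up as "not found" when run from inside NetBeans but is found from a fresh command prompt, the IDE is still using a stale environment.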

pdfBox add different lines to pdf

I'm looking into generating a PDF document. At the moment I'm trying out different approaches. I want to get more than one line into a PDF document. Using a HelloWorld code example I came up with ...
package org.apache.pdfbox.examples.pdmodel;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
/**
* Creates a "Hello World" PDF using the built-in Helvetica font.
*
* The example is taken from the PDF file format specification.
*/
public final class HelloWorld
{
private HelloWorld()
{
}
public static void main(String[] args) throws IOException
{
String filename = "line.pdf";
String message = "line";
PDDocument doc = new PDDocument();
try
{
PDPage page = new PDPage();
doc.addPage(page);
PDFont font = PDType1Font.HELVETICA_BOLD;
PDPageContentStream contents = new PDPageContentStream(doc, page);
contents.beginText();
contents.setFont(font, 12);
// Loop to create 25 lines of text
for (int i = 0; i < 25; i++) {
int ty = 700 + i * 15;
contents.newLineAtOffset(100, ty);
//contents.newLineAtOffset(125, ty);
//contents.showText(Integer.toString(i));
contents.showText(message + " " + Integer.toString(i));
System.out.println(message + " " + Integer.toString(i));
}
contents.endText();
contents.close();
doc.save(filename);
}
finally
{
doc.close();
System.out.println("HelloWorld finished after 'doc.close()'.");
}
}
}
But looking at my resulting document I only see "line 0" once, and no other lines. What am I doing wrong?
Your issue is that you think PDPageContentStream.newLineAtOffset uses absolute coordinates. It does not; it uses coordinates relative to the start of the current line, cf. the JavaDocs:
/**
* The Td operator.
* Move to the start of the next line, offset from the start of the current line by (tx, ty).
*
* @param tx The x translation.
* @param ty The y translation.
* @throws IOException If there is an error writing to the stream.
* @throws IllegalStateException If the method was not allowed to be called at this time.
*/
public void newLineAtOffset(float tx, float ty) throws IOException
So your additional lines are way off the visible page area.
Thus, you might want to do something like this:
...
contents.beginText();
contents.setFont(font, 12);
contents.newLineAtOffset(100, 700);
// Loop to create 25 lines of text
for (int i = 0; i < 25; i++) {
contents.showText(message + " " + Integer.toString(i));
System.out.println(message + " " + Integer.toString(i));
contents.newLineAtOffset(0, -15);
}
contents.endText();
...
Here you start at 100, 700 and move down for each line by 15.
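The difference between the two loops is easiest to see by adding up the offsets by hand: each newLineAtOffset(tx, ty) (the Td operator) moves the line start relative to the start of the current line, and beginText() resets the line start to the origin. The sketch below is plain Java arithmetic, not PDFBox code; it tracks only the y coordinate, and the class and method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical illustration of how successive Td offsets accumulate within one BT/ET block. */
final class TdOffsets {

    private TdOffsets() {
    }

    /** Returns the absolute y of each line start after applying the given ty offsets in order. */
    static float[] lineStartsY(float[] tyOffsets) {
        float[] result = new float[tyOffsets.length];
        float y = 0f; // beginText() starts the line matrix at the origin
        for (int i = 0; i < tyOffsets.length; i++) {
            y += tyOffsets[i]; // Td is relative to the start of the current line
            result[i] = y;
        }
        return result;
    }

    public static void main(String[] args) {
        // Question's loop, lines 0..2: newLineAtOffset(100, 700 + i * 15)
        // prints [700.0, 1415.0, 2145.0]; everything after line 0 is above a 792 pt Letter page
        System.out.println(Arrays.toString(lineStartsY(new float[] {700, 715, 730})));
        // Answer's loop: newLineAtOffset(100, 700) once, then (0, -15) before each further line
        // prints [700.0, 685.0, 670.0]
        System.out.println(Arrays.toString(lineStartsY(new float[] {700, -15, -15})));
    }
}
```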
In addition to mkl's answer, you could also create a new text object (beginText()/endText()) for each line. Because each beginText() starts again from the origin, this lets you use absolute coordinates.
...
contents.setFont(font, 12);
// Loop to create 25 lines of text
for (int i = 0; i < 25; i++) {
int ty = 700 - i * 15; // absolute y, moving down 15 pt per line
contents.beginText();
contents.newLineAtOffset(100, ty);
contents.showText(message + " " + Integer.toString(i));
System.out.println(message + " " + Integer.toString(i));
contents.endText();
}
...
Whether you need this or not depends on your use case.
For example I wanted to write some text right aligned. In that case it was easier to use absolute position, so I created a helper method like this:
public static void showTextRightAligned(PDPageContentStream contentStream, PDType1Font font, int fontsize, float rightX, float topY, String text) throws IOException
{
float textWidth = fontsize * font.getStringWidth(text) / 1000;
float leftX = rightX - textWidth;
contentStream.beginText();
contentStream.setFont(font, fontsize); // use the same font the width was measured with
contentStream.newLineAtOffset(leftX, topY);
contentStream.showText(text);
contentStream.endText();
}
You could do something like this:
contentStream.beginText();
contentStream.newLineAtOffset(20,750);
//This starts the cursor near the top left of the page
contentStream.setFont(PDType1Font.TIMES_ROMAN,8);
for (String readList : resultList) {
contentStream.showText(readList);
contentStream.newLineAtOffset(0,-12);
//This moves the cursor down by 12 pt on every iteration of the loop
}
contentStream.endText();

Rectify Exception in JSpeex Encoding

I am getting a NullPointerException. I have looked at previous posts about NullPointerException, but I am unable to solve my problem. I have also provided a screenshot of the exception.
Below are my JSpeex code and SpeexEncoder code.
JSpeex Code
import java.io.BufferedReader;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import org.xiph.speex.AudioFileWriter;
import org.xiph.speex.OggSpeexWriter;
import org.xiph.speex.PcmWaveWriter;
import org.xiph.speex.RawWriter;
import org.xiph.speex.SpeexEncoder;
/**
* Java Speex Command Line Encoder.
*
* Currently this code has been updated to be compatible with release 1.0.3.
*
* @author Marc Gimpel, Wimba S.A. (mgimpel@horizonwimba.com)
* @version $Revision: 1.5 $
*/
public class JSpeexEnc
{
/** Version of the Speex Encoder */
public static final String VERSION = "Java Speex Command Line Encoder v0.9.7 ($Revision: 1.5 $)";
/** Copyright display String */
public static final String COPYRIGHT = "Copyright (C) 2002-2004 Wimba S.A.";
/** Print level for messages : Print debug information */
public static final int DEBUG = 0;
/** Print level for messages : Print basic information */
public static final int INFO = 1;
/** Print level for messages : Print only warnings and errors */
public static final int WARN = 2;
/** Print level for messages : Print only errors */
public static final int ERROR = 3;
/** Print level for messages */
protected int printlevel = INFO;
/** File format for input or output audio file: Raw */
public static final int FILE_FORMAT_RAW = 0;
/** File format for input or output audio file: Ogg */
public static final int FILE_FORMAT_OGG = 1;
/** File format for input or output audio file: Wave */
public static final int FILE_FORMAT_WAVE = 2;
/** Defines File format for input audio file (Raw, Ogg or Wave). */
protected int srcFormat = FILE_FORMAT_OGG;
/** Defines File format for output audio file (Raw or Wave). */
protected int destFormat = FILE_FORMAT_WAVE;
/** Defines the encoder mode (0=NB, 1=WB and 2=UWB). */
protected int mode = -1;
/** Defines the encoder quality setting (integer from 0 to 10). */
protected int quality = 8;
/** Defines the encoders algorithmic complexity. */
protected int complexity = 3;
/** Defines the number of frames per speex packet. */
protected int nframes = 1;
/** Defines the desired bitrate for the encoded audio. */
protected int bitrate = -1;
/** Defines the sampling rate of the audio input. */
protected int sampleRate = -1;
/** Defines the number of channels of the audio input (1=mono, 2=stereo). */
protected int channels = 1;
/** Defines the encoder VBR quality setting (float from 0 to 10). */
protected float vbr_quality = -1;
/** Defines whether or not to use VBR (Variable Bit Rate). */
protected boolean vbr = false;
/** Defines whether or not to use VAD (Voice Activity Detection). */
protected boolean vad = false;
/** Defines whether or not to use DTX (Discontinuous Transmission). */
protected boolean dtx = false;
/** The audio input file */
protected String srcFile;
/** The audio output file */
protected String destFile;
/**
* Builds a plain JSpeex Encoder with default values.
*/
/**
* Command line entrance:
* <pre>
* Usage: JSpeexEnc [options] input_file output_file
* </pre>
* @param args Command line parameters.
*/
public static void main(final String[] args) throws IOException
{
JSpeexEnc encoder = new JSpeexEnc();
if (encoder.parseArgs(args)) {
encoder.encode("frf1.wav", "frf1_encoded.raw");
try (BufferedReader br = new BufferedReader(new FileReader("C:\\Users\\Administrator\\workspace\\JSpeex.java\\src\\frf1.wav")))
{
String sCurrentLine;
while ((sCurrentLine = br.readLine()) != null) {
System.out.println(sCurrentLine);
}
} catch (FileNotFoundException e) {
}
}
}
/**
* Parse the command line arguments.
* @param args Command line parameters.
* @return true if the parsed arguments are sufficient to run the encoder.
*/
public boolean parseArgs(final String[] args)
{
// make sure we have command args
if (args.length < 2) {
if (args.length==1 && (args[0].equalsIgnoreCase("-v") || args[0].equalsIgnoreCase("--version"))) {
version();
return false;
}
usage();
return false;
}
// Determine input, output and file formats
srcFile = args[args.length-2];
destFile = args[args.length-1];
if (srcFile.toLowerCase().endsWith(".wav"))
{
srcFormat = FILE_FORMAT_WAVE;
}
else {
srcFormat = FILE_FORMAT_RAW;
}
if (destFile.toLowerCase().endsWith(".spx")) {
destFormat = FILE_FORMAT_OGG;
}
else if (destFile.toLowerCase().endsWith(".wav")) {
destFormat = FILE_FORMAT_WAVE;
}
else {
destFormat = FILE_FORMAT_RAW;
}
// Determine encoder options
for (int i=0; i<args.length-2; i++) {
if (args[i].equalsIgnoreCase("-h") || args[i].equalsIgnoreCase("--help")) {
usage();
return false;
}
else if (args[i].equalsIgnoreCase("-v") || args[i].equalsIgnoreCase("--version")) {
version();
return false;
}
else if (args[i].equalsIgnoreCase("--verbose")) {
printlevel = DEBUG;
}
else if (args[i].equalsIgnoreCase("--quiet")) {
printlevel = WARN;
}
else if (args[i].equalsIgnoreCase("-n") ||
args[i].equalsIgnoreCase("-nb") ||
args[i].equalsIgnoreCase("--narrowband")) {
mode = 0;
}
else if (args[i].equalsIgnoreCase("-w") ||
args[i].equalsIgnoreCase("-wb") ||
args[i].equalsIgnoreCase("--wideband")) {
mode = 1;
}
else if (args[i].equalsIgnoreCase("-u") ||
args[i].equalsIgnoreCase("-uwb") ||
args[i].equalsIgnoreCase("--ultra-wideband")) {
mode = 2;
}
else if (args[i].equalsIgnoreCase("-q") || args[i].equalsIgnoreCase("--quality")) {
try {
vbr_quality = Float.parseFloat(args[++i]);
quality = (int) vbr_quality;
}
catch (NumberFormatException e) {
usage();
return false;
}
}
else if (args[i].equalsIgnoreCase("--complexity")) {
try {
complexity = Integer.parseInt(args[++i]);
}
catch (NumberFormatException e) {
usage();
return false;
}
}
else if (args[i].equalsIgnoreCase("--nframes")) {
try {
nframes = Integer.parseInt(args[++i]);
}
catch (NumberFormatException e) {
usage();
return false;
}
}
else if (args[i].equalsIgnoreCase("--vbr")) {
vbr = true;
}
else if (args[i].equalsIgnoreCase("--vad")) {
vad = true;
}
else if (args[i].equalsIgnoreCase("--dtx")) {
dtx = true;
}
else if (args[i].equalsIgnoreCase("--rate")) {
try {
sampleRate = Integer.parseInt(args[++i]);
}
catch (NumberFormatException e) {
usage();
return false;
}
}
else if (args[i].equalsIgnoreCase("--stereo")) {
channels = 2;
}
else {
usage();
return false;
}
}
return true;
}
/**
* Prints the usage guidelines.
*/
public static void usage()
{
version();
System.out.println("");
System.out.println("Usage: JSpeexEnc [options] input_file output_file");
System.out.println("Where:");
System.out.println(" input_file can be:" );
System.out.println(" filename.wav a PCM wav file");
System.out.println(" filename.* a raw PCM file (any extension other than .wav)");
System.out.println(" output_file can be:");
System.out.println(" filename.spx an Ogg Speex file");
System.out.println(" filename.wav a Wave Speex file (beta!!!)");
System.out.println(" filename.* a raw Speex file");
System.out.println("Options: -h, --help This help");
System.out.println(" -v, --version Version information");
System.out.println(" --verbose Print detailed information");
System.out.println(" --quiet Print minimal information");
System.out.println(" -n, -nb Consider input as Narrowband (8kHz)");
System.out.println(" -w, -wb Consider input as Wideband (16kHz)");
System.out.println(" -u, -uwb Consider input as Ultra-Wideband (32kHz)");
System.out.println(" --quality n Encoding quality (0-10) default 8");
System.out.println(" --complexity n Encoding complexity (0-10) default 3");
System.out.println(" --nframes n Number of frames per Ogg packet, default 1");
System.out.println(" --vbr Enable variable bit-rate (VBR)");
System.out.println(" --vad Enable voice activity detection (VAD)");
System.out.println(" --dtx Enable file based discontinuous transmission (DTX)");
System.out.println(" if the input file is raw PCM (not a Wave file)");
System.out.println(" --rate n Sampling rate for raw input");
System.out.println(" --stereo Consider input as stereo");
System.out.println("More information is available from: http://jspeex.sourceforge.net/");
System.out.println("This code is a Java port of the Speex codec: http://www.speex.org/");
}
/**
* Prints the version.
*/
public static void version()
{
System.out.println(VERSION);
System.out.println("using " + SpeexEncoder.VERSION);
System.out.println(COPYRIGHT);
}
/**
* Encodes a PCM file to Speex.
*/
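public void encode() throws IOException
{
System.out.println("Value of Destination File is:= "+destFile);
encode(srcFile, destFile); // delegate to the two-argument overload instead of calling itself forever
System.out.println("Value of Destination File is:= " +srcFile +destFile);
}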
/**
* Encodes a PCM file to Speex.
* @param string
* @param string2
* @exception IOException
*/
public void encode(final String string, final String string2)throws IOException
{
byte[] temp = new byte[2560]; // stereo UWB requires one to read 2560b
final int HEADERSIZE = 8;
final String RIFF = "RIFF";
final String WAVE = "WAVE";
final String FORMAT = "fmt ";
final String DATA = "data";
final int WAVE_FORMAT_PCM = 0x0001;
// Display info
if (printlevel <= INFO) {
version();
}
if (printlevel <= DEBUG) {
System.out.println("");
}
if (printlevel <= DEBUG) {
System.out.println("Input File: " + string);
}
try
(DataInputStream dis = new DataInputStream(new FileInputStream(string)))
{
if (srcFormat == FILE_FORMAT_WAVE) {
// read the WAVE header
dis.readFully(temp, 0, HEADERSIZE+4);
// make sure its a WAVE header
if (!RIFF.equals(new String(temp, 0, 4)) ||
!WAVE.equals(new String(temp, 8, 4)))
{
System.err.println("Not a WAVE file");
return;
}
// Read other header chunks
dis.readFully(temp, 0, HEADERSIZE);
String chunk = new String(temp, 0, 4);
int size = readInt(temp, 4);
while (!chunk.equals(DATA)) {
dis.readFully(temp, 0, size);
if (chunk.equals(FORMAT)) {
/*
typedef struct waveformat_extended_tag {
WORD wFormatTag; // format type
WORD nChannels; // number of channels (i.e. mono, stereo...)
DWORD nSamplesPerSec; // sample rate
DWORD nAvgBytesPerSec; // for buffer estimation
WORD nBlockAlign; // block size of data
WORD wBitsPerSample; // Number of bits per sample of mono data
WORD cbSize; // The count in bytes of the extra size
} WAVEFORMATEX;
*/
if (readShort(temp, 0) != WAVE_FORMAT_PCM) {
System.err.println("Not a PCM file");
return;
}
channels = readShort(temp, 2);
sampleRate = readInt(temp, 4);
if (readShort(temp, 14) != 16) {
System.err.println("Not a 16 bit file " + readShort(temp, 14));
return;
}
// Display audio info
if (printlevel <= DEBUG) {
System.out.println("File Format: PCM wave");
System.out.println("Sample Rate: " + sampleRate);
System.out.println("Channels: " + channels);
}
}
dis.readFully(temp, 0, HEADERSIZE);
chunk = new String(temp, 0, 4);
size = readInt(temp, 4);
}
if (printlevel <= DEBUG) {
System.out.println("Data size: " + size);
}
}
else {
if (sampleRate < 0) {
switch (mode) {
case 0:
sampleRate = 8000;
break;
case 1:
sampleRate = 16000;
break;
case 2:
sampleRate = 32000;
break;
default:
sampleRate = 8000;
break;
}
}
// Display audio info
if (printlevel <= DEBUG) {
System.out.println("File format: Raw audio");
System.out.println("Sample rate: " + sampleRate);
System.out.println("Channels: " + channels);
System.out.println("Data size: " + new java.io.File(string).length());
}
}
// Set the mode if it has not yet been determined
if (mode < 0) {
if (sampleRate < 100) { // Sample Rate has probably been given in kHz
sampleRate *= 1000;
}
if (sampleRate < 12000) {
mode = 0; // Narrowband
} else if (sampleRate < 24000) {
mode = 1; // Wideband
} else {
mode = 2; // Ultra-wideband
}
}
// Construct a new encoder
SpeexEncoder speexEncoder = new SpeexEncoder();
SpeexEncoder speexEncoder1 = speexEncoder;
if (complexity > 0) {
speexEncoder1.setComplexity(complexity);
}
if (bitrate > 0) {
speexEncoder1.setBitRate(bitrate);
// speexEncoder1.getEncoder().setBitRate(bitrate);
}
if (vbr) {
// speexEncoder1.getEncoder().setVbr(vbr);
if (vbr_quality > 0) {
speexEncoder1.setVbrQuality(vbr_quality);
}
}
if (vad) {
speexEncoder1.setVad(vad);
}
if (dtx) {
speexEncoder1.setDtx(dtx);
}
// Display info
if (printlevel <= DEBUG) {
System.out.println("");
System.out.println("Output File: " + string2);
System.out.println("File format: Ogg Speex");
System.out.println("Encoder mode: " + (mode==0 ? "Narrowband" : (mode==1 ? "Wideband" : "UltraWideband")));
System.out.println("Quality: " + (vbr ? vbr_quality : quality));
System.out.println("Complexity: " + complexity);
System.out.println("Frames per packet: " + nframes);
System.out.println("Variable bitrate: " + vbr);
System.out.println("Voice activity detection: " + vad);
System.out.println("Discontinuous Transmission: " + dtx);
}
// Open the file writer
AudioFileWriter writer;
if (destFormat == FILE_FORMAT_OGG) {
writer = new OggSpeexWriter(mode, sampleRate, channels, nframes, vbr);
}
else if (destFormat == FILE_FORMAT_WAVE) {
nframes = PcmWaveWriter.WAVE_FRAME_SIZES[mode-1][channels-1][quality];
writer = new PcmWaveWriter(mode, quality, sampleRate, channels, nframes, vbr);
}
else {
writer = new RawWriter();
}
writer.open(string2);
writer.writeHeader("Encoded with: " + VERSION);
int pcmPacketSize = 2 * channels * speexEncoder.getFrameSize();
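try {
// read until EOF: readFully throws EOFException when the input runs out
while (true) {
dis.readFully(temp, 0, nframes*pcmPacketSize);
for (int i=0; i<nframes; i++)
speexEncoder.processData(temp, i*pcmPacketSize, pcmPacketSize);
int encsize = speexEncoder.getProcessedData(temp, 0);
if (encsize > 0) {
writer.writePacket(temp, 0, encsize);
}
}
}
catch (java.io.EOFException e) {
// end of input reached
}
writer.close();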
}
}
/**
* Converts Little Endian (Windows) bytes to an int (Java uses Big Endian).
* @param data the data to read.
* @param offset the offset from which to start reading.
* @return the integer value of the reassembled bytes.
*/
protected static int readInt(final byte[] data, final int offset)
{
return (data[offset] & 0xff) |
((data[offset+1] & 0xff) << 8) |
((data[offset+2] & 0xff) << 16) |
(data[offset+3] << 24); // no 0xff on the last one to keep the sign
}
/**
* Converts Little Endian (Windows) bytes to a short (Java uses Big Endian).
* @param data the data to read.
* @param offset the offset from which to start reading.
* @return the integer value of the reassembled bytes.
*/
protected static int readShort(final byte[] data, final int offset)
{
return (data[offset] & 0xff) |
(data[offset+1] << 8); // no 0xff on the last one to keep the sign
}
}
SpeexEncoder code
package org.xiph.speex;
/**
* Main Speex Encoder class.
* This class encodes the given PCM 16bit samples into Speex packets.
*
* @author Marc Gimpel, Wimba S.A. (mgimpel@horizonwimba.com)
* @version $Revision: 1.6 $
*/
public class SpeexEncoder
{
/**
* Version of the Speex Encoder
*/
public static final String VERSION = "Java Speex Encoder v0.9.7 ($Revision: 1.6 $)";
private Encoder encoder;
private Bits bits;
private float[] rawData;
private int sampleRate;
private int channels;
private int frameSize;
/**
* Constructor
*/
public SpeexEncoder()
{
bits = new Bits();
}
/**
* Initialisation
* @param mode the mode of the encoder (0=NB, 1=WB, 2=UWB).
* @param quality the quality setting of the encoder (between 0 and 10).
* @param sampleRate the number of samples per second.
* @param channels the number of audio channels (1=mono, 2=stereo, ...).
* @return true if initialisation successful.
*/
public boolean init(final int mode,
final int quality,
final int sampleRate,
final int channels)
{
switch (mode) {
case 0:
encoder = new NbEncoder();
((NbEncoder)encoder).nbinit();
break;
//Wideband
case 1:
encoder = new SbEncoder();
((SbEncoder)encoder).wbinit();
break;
case 2:
encoder = new SbEncoder();
((SbEncoder)encoder).uwbinit();
break;
//*/
default:
return false;
}
/* initialize the speex encoder */
encoder.setQuality(quality);
/* set encoder format and properties */
this.frameSize = encoder.getFrameSize();
this.sampleRate = sampleRate;
this.channels = channels;
rawData = new float[channels*frameSize];
bits.init();
return true;
}
/**
* Returns the Encoder being used (Narrowband, Wideband or Ultrawideband).
* @return the Encoder being used (Narrowband, Wideband or Ultrawideband).
*/
public Encoder getEncoder()
{
return encoder;
}
/**
* Returns the sample rate.
* @return the sample rate.
*/
public int getSampleRate()
{
return sampleRate;
}
/**
* Returns the number of channels.
* @return the number of channels.
*/
public int getChannels()
{
return channels;
}
/**
* Returns the size of a frame.
* @return the size of a frame.
*/
public int getFrameSize()
{
return frameSize;
}
/**
* Pulls the encoded data out into a byte array at the given offset
* and returns the number of bytes of encoded data just read.
* @param data the array that receives the encoded bytes
* @param offset the position in data at which to start writing
* @return the number of bytes of encoded data just read.
*/
public int getProcessedData(final byte[] data, final int offset)
{
int size = bits.getBufferSize();
System.arraycopy(bits.getBuffer(), 0, data, offset, size);
bits.init();
return size;
}
/**
* Returns the number of bytes of encoded data ready to be read.
* @return the number of bytes of encoded data ready to be read.
*/
public int getProcessedDataByteSize()
{
return bits.getBufferSize();
}
/**
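* This is where the actual encoding takes place.
* @param data 16-bit little-endian PCM bytes
* @param offset the offset of the first byte to encode
* @param len the number of bytes to encode (two bytes per sample)
* @return true if successful.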
*/
public boolean processData(final byte[] data,
final int offset,
final int len)
{
// convert raw bytes into float samples
mapPcm16bitLittleEndian2Float(data, offset, rawData, len, len);
// encode the bitstream
return processData(rawData, len/2);
}
/**
* Encode an array of shorts.
* @param data
* @param offset
* @param numShorts
* @return true if successful.
*/
public boolean processData(final short[] data,
final int offset,
final int numShorts)
{
int numSamplesRequired = channels * frameSize;
if (numShorts != numSamplesRequired) {
throw new IllegalArgumentException("SpeexEncoder requires " + numSamplesRequired + " samples to process a Frame, not " + numShorts);
}
// convert shorts into float samples,
for (int i=0; i<numShorts; i++) {
rawData[i] = (float) data[offset + i ];
}
// encode the bitstream
return processData(rawData, numShorts);
}
/**
* Encode an array of floats.
* @param data
* @param numSamples
* @return true if successful.
*/
public boolean processData(final float[] data, final int numSamples)
{
int numSamplesRequired = channels * frameSize;
if (numSamples != numSamplesRequired) {
throw new IllegalArgumentException("SpeexEncoder requires " + numSamplesRequired + " samples to process a Frame, not " + numSamples );
}
// encode the bitstream
if (channels==2) {
Stereo.encode(bits, data, frameSize);
}
encoder.encode(bits, data);
//System.out.println("THA VALUE OF BITS IS:" + bits);
return true;
}
/**
* Converts a 16 bit linear PCM stream (in the form of a byte array)
* into a floating point PCM stream (in the form of a float array).
* Here are some important details about the encoding:
* <ul>
* <li> Java uses big endian for shorts and ints, and Windows uses little endian.
* Therefore, shorts and ints must be read as sequences of bytes and
* combined with shifting operations.
* </ul>
* @param pcm16bitBytes - byte array of linear 16-bit PCM formatted audio.
* @param offsetInput - offset of the first byte to read.
* @param samples - float array to receive the 16-bit linear audio samples.
* @param offsetOutput - index in samples at which to start writing.
* @param length - number of samples (not bytes) to convert.
*/
void mapPcm16bitLittleEndian2Float(final byte[] pcm16bitBytes,
final int offsetInput,
float[] samples,
final int offsetOutput,
final int length)
{
if (pcm16bitBytes.length - offsetInput < 2 * length) {
throw new IllegalArgumentException("Insufficient Samples to convert to floats");
}
System.out.println("the value is:" +samples);
if (samples.length - offsetOutput < length) {
throw new IllegalArgumentException("Insufficient float buffer to convert the samples");
}
for (int i = 0; i < length; i++) {
samples[offsetOutput+i] = (float)((pcm16bitBytes[offsetInput+2*i] & 0xff) | (pcm16bitBytes[offsetInput+2*i+1] << 8)); // no & 0xff at the end to keep the sign
}
}
public void setComplexity(int complexity) {
// TODO Auto-generated method stub
}
public void setVbrQuality(float vbr_quality) {
// TODO Auto-generated method stub
}
public void setDtx(boolean dtx) {
// TODO Auto-generated method stub
}
public void setBitRate(int bitrate) {
// TODO Auto-generated method stub
}
public void setVad(boolean vad) {
// TODO Auto-generated method stub
}
}
Exceptions:
Exception in thread "main" java.lang.NullPointerException
at org.xiph.speex.SpeexEncoder.mapPcm16bitLittleEndian2Float(SpeexEncoder.java:290)
at org.xiph.speex.SpeexEncoder.processData(SpeexEncoder.java:216)
at JSpeexEnc.encode(JSpeexEnc.java:541)
at JSpeexEnc.main(JSpeexEnc.java:170)
Check the sizes of the arrays that you're passing into mapPcm16bitLittleEndian2Float, and the number of iterations that your loops are doing, bearing in mind that in Java, arrays start from zero.
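To make that concrete for the posted code (a reading of the code as shown, not a tested fix): the `NullPointerException` inside `mapPcm16bitLittleEndian2Float` is consistent with `samples`, i.e. `rawData`, being `null`, since `pcm16bitBytes` is the non-null `temp` buffer. `rawData` is only allocated in `SpeexEncoder.init(...)`, and `encode(String, String)` creates the `SpeexEncoder` without ever calling `init`, so `getFrameSize()` is also still 0. Even with an `init(mode, quality, sampleRate, channels)` call added, `processData(byte[], int, int)` passes `len` (a byte count) as both `offsetOutput` and `length`, so one of the two size checks throws an `IllegalArgumentException`. For 16-bit PCM the output offset would normally be 0 and the sample count `len / 2`. The standalone sketch below repeats the conversion with those arguments; the class name `Pcm16Demo` and the 160-sample mono frame are assumptions for illustration.

```java
public class Pcm16Demo {

    /**
     * Converts 16-bit little-endian PCM bytes into float samples.
     * length is a number of samples, so 2 * length bytes are read.
     */
    static void mapPcm16bitLittleEndian2Float(byte[] pcm16bitBytes, int offsetInput,
            float[] samples, int offsetOutput, int length) {
        if (pcm16bitBytes.length - offsetInput < 2 * length) {
            throw new IllegalArgumentException("Insufficient Samples to convert to floats");
        }
        if (samples.length - offsetOutput < length) {
            throw new IllegalArgumentException("Insufficient float buffer to convert the samples");
        }
        for (int i = 0; i < length; i++) {
            // low byte masked to stay unsigned; high byte keeps its sign
            samples[offsetOutput + i] = (float) ((pcm16bitBytes[offsetInput + 2 * i] & 0xff)
                    | (pcm16bitBytes[offsetInput + 2 * i + 1] << 8));
        }
    }

    public static void main(String[] args) {
        int frameSize = 160;              // assumption: one mono narrowband frame
        int len = 2 * frameSize;          // bytes of 16-bit PCM for that frame
        byte[] pcm = new byte[len];
        pcm[0] = 0x01;                    // sample 0 = 0x0001 = 1
        pcm[2] = (byte) 0xff;             // sample 1 = 0xffff = -1
        pcm[3] = (byte) 0xff;
        // the buffer init() would allocate: channels * frameSize floats
        float[] rawData = new float[frameSize];
        // output offset 0 and len / 2 samples, rather than (len, len)
        mapPcm16bitLittleEndian2Float(pcm, 0, rawData, 0, len / 2);
        System.out.println(rawData[0] + " " + rawData[1]);   // prints "1.0 -1.0"
    }
}
```

Calling it with `(pcm, 0, rawData, len, len)` instead, as the question's `processData` does, fails the size check for the same buffer sizes.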

Reading an IDX file type in Java

I have built a image classifier in Java that I would like to test against the images provided here: http://yann.lecun.com/exdb/mnist/
Unfortunately, if you download the train-images-idx3-ubyte.gz or any of the other 3 files, they are all of file type: .idx1-ubyte
First Question:
I was wondering if anyone can give me instructions on how to convert the .idx1-ubyte files into bitmap (.bmp) files?
Second Question:
Or just how I can read these files in general?
Information about the IDX file format:
The IDX file format is a simple format for vectors and multidimensional matrices of various numerical types.
The basic format is:
magic number
size in dimension 0
size in dimension 1
size in dimension 2
.....
size in dimension N
data
The magic number is an integer (MSB first). The first 2 bytes are always 0.
The third byte codes the type of the data:
0x08: unsigned byte
0x09: signed byte
0x0B: short (2 bytes)
0x0C: int (4 bytes)
0x0D: float (4 bytes)
0x0E: double (8 bytes)
The 4-th byte codes the number of dimensions of the vector/matrix: 1 for vectors, 2 for matrices....
The sizes in each dimension are 4-byte integers (MSB first, high endian, like in most non-Intel processors).
The data is stored like in a C array, i.e. the index in the last dimension changes the fastest.
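A minimal sketch of reading such a header in Java, assuming the file has already been decompressed and is available as an `InputStream`. `DataInputStream.readInt()` reads 4 bytes MSB first, which is the byte order the format requires. The names `IdxHeader`, `read` and `elementSize` are hypothetical:

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: parses the header of an IDX file. */
public class IdxHeader {
    final int typeCode;       // third byte of the magic number, e.g. 0x08 = unsigned byte
    final int[] dimensions;   // size in each dimension, outermost first

    IdxHeader(int typeCode, int[] dimensions) {
        this.typeCode = typeCode;
        this.dimensions = dimensions;
    }

    /** Reads the magic number and the dimension sizes; the stream is left at the first data byte. */
    static IdxHeader read(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);  // does not buffer ahead
        int magic = data.readInt();                      // big-endian, as IDX specifies
        if ((magic >>> 16) != 0) {
            throw new IOException("First two bytes of the magic number must be 0");
        }
        int typeCode = (magic >>> 8) & 0xff;             // third byte: data type
        int numDims = magic & 0xff;                      // fourth byte: number of dimensions
        int[] dims = new int[numDims];
        for (int i = 0; i < numDims; i++) {
            dims[i] = data.readInt();
        }
        return new IdxHeader(typeCode, dims);
    }

    /** Size in bytes of one element, following the type table above. */
    static int elementSize(int typeCode) {
        switch (typeCode) {
            case 0x08: case 0x09: return 1;
            case 0x0B: return 2;
            case 0x0C: case 0x0D: return 4;
            case 0x0E: return 8;
            default: throw new IllegalArgumentException("Unknown IDX type 0x" + Integer.toHexString(typeCode));
        }
    }
}
```

For the MNIST image file the magic number is `0x00000803` (type `0x08`, 3 dimensions: image count, rows, columns); for the label file it is `0x00000801`.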
Pretty straightforward; as WPrecht said, "The URL describes the format you have to decode". This is my ImageSet exporter for the IDX files. It's not very clean, but it does what it has to do.
import java.awt.image.BufferedImage;
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.imageio.ImageIO;

public class IdxReader {
    public static void main(String[] args) {
        String inputImagePath = "CBIR_Project/imagesRaw/MNIST/train-images-idx3-ubyte";
        String inputLabelPath = "CBIR_Project/imagesRaw/MNIST/train-labels-idx1-ubyte";
        String outputPath = "CBIR_Project/images/MNIST_Database_ARGB/";
        // Number of images written so far for each digit 0-9 (used in the file names)
        int[] hashMap = new int[10];
        // Buffered streams, because the loops below call read() once per byte
        try (InputStream inImage = new BufferedInputStream(new FileInputStream(inputImagePath));
             InputStream inLabel = new BufferedInputStream(new FileInputStream(inputLabelPath))) {
            // All header fields are 4-byte big-endian (MSB first) integers
            int magicNumberImages = (inImage.read() << 24) | (inImage.read() << 16) | (inImage.read() << 8) | (inImage.read());
            int numberOfImages = (inImage.read() << 24) | (inImage.read() << 16) | (inImage.read() << 8) | (inImage.read());
            int numberOfRows = (inImage.read() << 24) | (inImage.read() << 16) | (inImage.read() << 8) | (inImage.read());
            int numberOfColumns = (inImage.read() << 24) | (inImage.read() << 16) | (inImage.read() << 8) | (inImage.read());
            int magicNumberLabels = (inLabel.read() << 24) | (inLabel.read() << 16) | (inLabel.read() << 8) | (inLabel.read());
            int numberOfLabels = (inLabel.read() << 24) | (inLabel.read() << 16) | (inLabel.read() << 8) | (inLabel.read());
            BufferedImage image = new BufferedImage(numberOfColumns, numberOfRows, BufferedImage.TYPE_INT_ARGB);
            int numberOfPixels = numberOfRows * numberOfColumns;
            int[] imgPixels = new int[numberOfPixels];
            new File(outputPath).mkdirs(); // create the output directory if it does not exist yet
            for (int i = 0; i < numberOfImages; i++) {
                if (i % 100 == 0) {
                    System.out.println("Number of images extracted: " + i);
                }
                for (int p = 0; p < numberOfPixels; p++) {
                    // MNIST uses 0 for the background; invert so the digit is dark on white
                    int gray = 255 - inImage.read();
                    imgPixels[p] = 0xFF000000 | (gray << 16) | (gray << 8) | gray;
                }
                image.setRGB(0, 0, numberOfColumns, numberOfRows, imgPixels, 0, numberOfColumns);
                int label = inLabel.read();
                hashMap[label]++;
                File outputfile = new File(outputPath + label + "_0" + hashMap[label] + ".png");
                ImageIO.write(image, "png", outputfile);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
I created some classes for reading the MNIST handwritten digits data set with Java. The classes can read the files after they have been decompressed (unzipped) from the files that are available at the download site. Classes that allow reading the original (compressed) files are part of a small MnistReader project.
The following classes are standalone (meaning that they do not have dependencies on third-party libraries) and are essentially in the public domain, meaning that they can just be copied into your own projects (attributions would be appreciated, but are not required):
The MnistDecompressedReader class:
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.util.Objects;
import java.util.function.Consumer;
/**
* A class for reading the MNIST data set from the <b>decompressed</b>
* (unzipped) files that are published at
* <a href="http://yann.lecun.com/exdb/mnist/">
* http://yann.lecun.com/exdb/mnist/</a>.
*/
public class MnistDecompressedReader
{
/**
* Default constructor
*/
public MnistDecompressedReader()
{
// Default constructor
}
/**
* Read the MNIST training data from the given directory. The data is
* assumed to be located in files with their default names,
* <b>decompressed</b> from the original files:
* <code>train-images.idx3-ubyte</code> and
* <code>train-labels.idx1-ubyte</code>.
*
* @param inputDirectoryPath The input directory
* @param consumer The consumer that will receive the resulting
* {@link MnistEntry} instances
* @throws IOException If an IO error occurs
*/
public void readDecompressedTraining(Path inputDirectoryPath,
Consumer<? super MnistEntry> consumer) throws IOException
{
String trainImagesFileName = "train-images.idx3-ubyte";
String trainLabelsFileName = "train-labels.idx1-ubyte";
Path imagesFilePath = inputDirectoryPath.resolve(trainImagesFileName);
Path labelsFilePath = inputDirectoryPath.resolve(trainLabelsFileName);
readDecompressed(imagesFilePath, labelsFilePath, consumer);
}
/**
* Read the MNIST testing data from the given directory. The data is
* assumed to be located in files with their default names,
* <b>decompressed</b> from the original files:
* <code>t10k-images.idx3-ubyte</code> and
* <code>t10k-labels.idx1-ubyte</code>.
*
* @param inputDirectoryPath The input directory
* @param consumer The consumer that will receive the resulting
* {@link MnistEntry} instances
* @throws IOException If an IO error occurs
*/
public void readDecompressedTesting(Path inputDirectoryPath,
Consumer<? super MnistEntry> consumer) throws IOException
{
String testImagesFileName = "t10k-images.idx3-ubyte";
String testLabelsFileName = "t10k-labels.idx1-ubyte";
Path imagesFilePath = inputDirectoryPath.resolve(testImagesFileName);
Path labelsFilePath = inputDirectoryPath.resolve(testLabelsFileName);
readDecompressed(imagesFilePath, labelsFilePath, consumer);
}
/**
* Read the MNIST data from the specified (decompressed) files.
*
* @param imagesFilePath The path of the images file
* @param labelsFilePath The path of the labels file
* @param consumer The consumer that will receive the resulting
* {@link MnistEntry} instances
* @throws IOException If an IO error occurs
*/
public void readDecompressed(Path imagesFilePath, Path labelsFilePath,
Consumer<? super MnistEntry> consumer) throws IOException
{
try (InputStream decompressedImagesInputStream =
new FileInputStream(imagesFilePath.toFile());
InputStream decompressedLabelsInputStream =
new FileInputStream(labelsFilePath.toFile()))
{
readDecompressed(
decompressedImagesInputStream,
decompressedLabelsInputStream,
consumer);
}
}
/**
* Read the MNIST data from the given (decompressed) input streams.
* The caller is responsible for closing the given streams.
*
* @param decompressedImagesInputStream The decompressed input stream
* containing the image data
* @param decompressedLabelsInputStream The decompressed input stream
* containing the label data
* @param consumer The consumer that will receive the resulting
* {@link MnistEntry} instances
* @throws IOException If an IO error occurs
*/
public void readDecompressed(
InputStream decompressedImagesInputStream,
InputStream decompressedLabelsInputStream,
Consumer<? super MnistEntry> consumer) throws IOException
{
Objects.requireNonNull(consumer, "The consumer may not be null");
DataInputStream imagesDataInputStream =
new DataInputStream(decompressedImagesInputStream);
DataInputStream labelsDataInputStream =
new DataInputStream(decompressedLabelsInputStream);
int magicImages = imagesDataInputStream.readInt();
if (magicImages != 0x803)
{
throw new IOException("Expected magic header of 0x803 "
+ "for images, but found " + magicImages);
}
int magicLabels = labelsDataInputStream.readInt();
if (magicLabels != 0x801)
{
throw new IOException("Expected magic header of 0x801 "
+ "for labels, but found " + magicLabels);
}
int numberOfImages = imagesDataInputStream.readInt();
int numberOfLabels = labelsDataInputStream.readInt();
if (numberOfImages != numberOfLabels)
{
throw new IOException("Found " + numberOfImages
+ " images but " + numberOfLabels + " labels");
}
int numRows = imagesDataInputStream.readInt();
int numCols = imagesDataInputStream.readInt();
for (int n = 0; n < numberOfImages; n++)
{
byte label = labelsDataInputStream.readByte();
byte imageData[] = new byte[numRows * numCols];
read(imagesDataInputStream, imageData);
MnistEntry mnistEntry = new MnistEntry(
n, label, numRows, numCols, imageData);
consumer.accept(mnistEntry);
}
}
/**
* Read bytes from the given input stream, filling the given array
*
* @param inputStream The input stream
* @param data The array to be filled
* @throws IOException If the input stream does not contain enough bytes
* to fill the array, or any other IO error occurs
*/
private static void read(InputStream inputStream, byte data[])
throws IOException
{
int offset = 0;
while (true)
{
int read = inputStream.read(
data, offset, data.length - offset);
if (read < 0)
{
break;
}
offset += read;
if (offset == data.length)
{
return;
}
}
throw new IOException("Tried to read " + data.length
+ " bytes, but only found " + offset);
}
}
The MnistEntry class:
import java.awt.image.BufferedImage;
import java.awt.image.DataBuffer;
import java.awt.image.DataBufferByte;
/**
* An entry of the MNIST data set. Instances of this class will be passed
* to the consumer that is given to the {@link MnistCompressedReader} and
* {@link MnistDecompressedReader} reading methods.
*/
public class MnistEntry
{
/**
* The index of the entry
*/
private final int index;
/**
* The class label of the entry
*/
private final byte label;
/**
* The number of rows of the image data
*/
private final int numRows;
/**
* The number of columns of the image data
*/
private final int numCols;
/**
* The image data
*/
private final byte[] imageData;
/**
* Default constructor
*
* @param index The index
* @param label The label
* @param numRows The number of rows
* @param numCols The number of columns
* @param imageData The image data
*/
MnistEntry(int index, byte label, int numRows, int numCols,
byte[] imageData)
{
this.index = index;
this.label = label;
this.numRows = numRows;
this.numCols = numCols;
this.imageData = imageData;
}
/**
* Returns the index of the entry
*
* @return The index
*/
public int getIndex()
{
return index;
}
/**
* Returns the class label of the entry. This is a value in [0,9],
* indicating which digit is shown in the entry
*
* @return The class label
*/
public byte getLabel()
{
return label;
}
/**
* Returns the number of rows of the image data.
* This will usually be 28.
*
* @return The number of rows
*/
public int getNumRows()
{
return numRows;
}
/**
* Returns the number of columns of the image data.
* This will usually be 28.
*
* @return The number of columns
*/
public int getNumCols()
{
return numCols;
}
/**
* Returns a <i>reference</i> to the image data. This will be an array
* of length <code>numRows * numCols</code>, containing values
* in [0,255] indicating the brightness of the pixels. Java bytes are
* signed, so use {@code value & 0xFF} to obtain the unsigned value.
*
* @return The image data
*/
public byte[] getImageData()
{
return imageData;
}
/**
* Creates a new buffered image from the image data that is stored
* in this entry.
*
* @return The image
*/
public BufferedImage createImage()
{
BufferedImage image = new BufferedImage(getNumCols(),
getNumRows(), BufferedImage.TYPE_BYTE_GRAY);
DataBuffer dataBuffer = image.getRaster().getDataBuffer();
DataBufferByte dataBufferByte = (DataBufferByte) dataBuffer;
byte data[] = dataBufferByte.getData();
System.arraycopy(getImageData(), 0, data, 0, data.length);
return image;
}
@Override
public String toString()
{
String indexString = String.format("%05d", index);
return "MnistEntry["
+ "index=" + indexString + ","
+ "label=" + label + "]";
}
}
The reader can be used to read the uncompressed files. The result will be MnistEntry instances that are passed to a consumer:
MnistDecompressedReader mnistReader = new MnistDecompressedReader();
mnistReader.readDecompressedTraining(Paths.get("./data"), mnistEntry ->
{
System.out.println("Read entry " + mnistEntry);
BufferedImage image = mnistEntry.createImage();
...
});
The MnistReader project contains several examples of how these classes may be used to read the compressed- or uncompressed data, or to generate PNG images from the MNIST entries.
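If you would rather skip the manual unzip step, one option (a sketch, not necessarily how the MnistReader project does it) is to wrap each downloaded `.gz` file in `java.util.zip.GZIPInputStream`. The decompressed stream starts directly with the IDX header, so the same big-endian reads apply, and a freshly opened pair of such streams could also be passed to `readDecompressed(InputStream, InputStream, Consumer)` above, which leaves closing them to the caller. The class `MnistGzPeek`, its methods, and the `./data/...` path are assumptions:

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

public class MnistGzPeek {

    /** Opens a .gz file and returns a stream of the decompressed bytes. */
    static InputStream openGz(Path gzFile) throws IOException {
        return new GZIPInputStream(new FileInputStream(gzFile.toFile()));
    }

    /** Reads the magic number and the item count (both big-endian ints) from an IDX stream. */
    static int[] readMagicAndCount(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int magic = data.readInt();
        int count = data.readInt();
        return new int[] { magic, count };
    }

    public static void main(String[] args) throws IOException {
        // assumption: the downloaded, still-compressed file is in ./data
        Path images = Paths.get("./data/train-images-idx3-ubyte.gz");
        try (InputStream in = openGz(images)) {
            int[] header = readMagicAndCount(in);
            System.out.printf("magic=0x%x, images=%d%n", header[0], header[1]);
        }
    }
}
```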
The URL describes the format you have to decode, and they mention it's non-standard, so the obvious Google search doesn't turn up any code of use. However, it's very straightforward: a header followed by 28x28 pixel matrices of 0-255 grayscale values.
Once you have the data read out (remember to pay attention to endianness), creating BMP files is straightforward.
I recommend the following article to you:
How to make bmp image from pixel byte array in java
Their question is about color, but their code already works for grayscale, which is what you need anyway; you should be able to get something going from that snippet.
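For the BMP part, here is a minimal sketch (not the code from the linked article) that copies one image's `rows * cols` bytes into a `TYPE_BYTE_GRAY` `BufferedImage` and lets the built-in `"bmp"` writer of `ImageIO` encode it. The class and method names are hypothetical. The optional inversion (`255 - value`, the same trick the `IdxReader` answer uses) gives dark digits on a white background, since MNIST uses 0 for the background:

```java
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import javax.imageio.ImageIO;

public class MnistBmp {

    /** Builds a grayscale image from rows*cols unsigned pixel bytes (row-major, as in IDX). */
    static BufferedImage toImage(byte[] pixels, int rows, int cols, boolean invert) {
        BufferedImage image = new BufferedImage(cols, rows, BufferedImage.TYPE_BYTE_GRAY);
        WritableRaster raster = image.getRaster();
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                int value = pixels[r * cols + c] & 0xff;   // bytes are signed in Java
                raster.setSample(c, r, 0, invert ? 255 - value : value);
            }
        }
        return image;
    }

    /** Encodes the image as BMP; returns false if no suitable writer was found. */
    static boolean writeBmp(BufferedImage image, OutputStream out) throws IOException {
        return ImageIO.write(image, "bmp", out);
    }

    public static void main(String[] args) throws IOException {
        byte[] pixels = new byte[28 * 28];   // placeholder for one image's data
        BufferedImage image = toImage(pixels, 28, 28, true);
        boolean ok = ImageIO.write(image, "bmp", new File("digit.bmp"));
        System.out.println("written: " + ok);
    }
}
```

A gray image is used here rather than `TYPE_INT_ARGB` because, as far as I know, the JDK's BMP writer does not support an alpha channel, in which case `ImageIO.write` returns `false` instead of writing a file.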
