AWS Transcribe: file to text returns nonsense - java

This is a follow-on question to AWS Transcribe S3 .wav file to text. I use a stream to read a .wav file and send its contents to AWS.
Instead of the correct transcript, I get back nonsense such as a bunch of "Yeah." statements. It looks like AWS can't interpret the byte stream correctly, but I'm not sure what's wrong. Does the file need to be encoded somehow, i.e., can I not send the raw .wav bytes straight from the file? Or do I need to tell the service that the input is in .wav format?
What's wrong here? The input file is a valid .wav voice file that sounds intelligible when I listen to it.
Here is my java code:
package com.amazonaws.transcribe;
import org.reactivestreams.Publisher;
import org.reactivestreams.Subscriber;
import org.reactivestreams.Subscription;
import software.amazon.awssdk.core.SdkBytes;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.transcribestreaming.TranscribeStreamingAsyncClient;
import software.amazon.awssdk.services.transcribestreaming.model.*;
import javax.sound.sampled.*;
import java.io.*;
import java.net.URISyntaxException;
import java.nio.ByteBuffer;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;
public class TranscribeFileFromStream {
private static final Region REGION = Region.US_EAST_1;
private static TranscribeStreamingAsyncClient client;
public static void main(String args[]) throws URISyntaxException, ExecutionException, InterruptedException, LineUnavailableException {
System.out.println(System.getProperty("java.version"));
client = TranscribeStreamingAsyncClient.builder()
.region(REGION)
.build();
try {
CompletableFuture<Void> result = client.startStreamTranscription(getRequest(16000),
new AudioStreamPublisher(getStreamFromFile()),
getResponseHandler());
result.get();
} finally {
if (client != null) {
client.close();
}
}
}
private static InputStream getStreamFromFile() {
try {
File inputFile = new File(System.getProperty("user.home"), "work/transcribe/src/main/resources/story/media/Story3.m4a.wav"); // java.io.File does not expand "~"
InputStream audioStream = new FileInputStream(inputFile);
return audioStream;
} catch (FileNotFoundException e) {
throw new RuntimeException(e);
}
}
private static StartStreamTranscriptionRequest getRequest(Integer mediaSampleRateHertz) {
return StartStreamTranscriptionRequest.builder()
.languageCode(LanguageCode.EN_US)
.mediaEncoding(MediaEncoding.PCM)
.mediaSampleRateHertz(mediaSampleRateHertz)
.build();
}
private static StartStreamTranscriptionResponseHandler getResponseHandler() {
return StartStreamTranscriptionResponseHandler.builder()
.onResponse(r -> {
System.out.println("Received Initial response");
})
.onError(e -> {
System.out.println(e.getMessage());
StringWriter sw = new StringWriter();
e.printStackTrace(new PrintWriter(sw));
System.out.println("Error Occurred: " + sw.toString());
})
.onComplete(() -> {
System.out.println("=== All records stream successfully ===");
})
.subscriber(event -> {
List<Result> results = ((TranscriptEvent) event).transcript().results();
if (results.size() > 0) {
if (!results.get(0).alternatives().get(0).transcript().isEmpty()) {
System.out.println(results.get(0).alternatives().get(0).transcript());
} else {
System.out.println("Empty result");
}
} else {
System.out.println("No results");
}
})
.build();
}
private static class AudioStreamPublisher implements Publisher<AudioStream> {
private final InputStream inputStream;
private static Subscription currentSubscription;
private AudioStreamPublisher(InputStream inputStream) {
this.inputStream = inputStream;
}
@Override
public void subscribe(Subscriber<? super AudioStream> s) {
if (this.currentSubscription == null) {
this.currentSubscription = new SubscriptionImpl(s, inputStream);
} else {
this.currentSubscription.cancel();
this.currentSubscription = new SubscriptionImpl(s, inputStream);
}
s.onSubscribe(currentSubscription);
}
}
public static class SubscriptionImpl implements Subscription {
private static final int CHUNK_SIZE_IN_BYTES = 1024 * 1;
private final Subscriber<? super AudioStream> subscriber;
private final InputStream inputStream;
private ExecutorService executor = Executors.newFixedThreadPool(1);
private AtomicLong demand = new AtomicLong(0);
SubscriptionImpl(Subscriber<? super AudioStream> s, InputStream inputStream) {
this.subscriber = s;
this.inputStream = inputStream;
}
@Override
public void request(long n) {
if (n <= 0) {
subscriber.onError(new IllegalArgumentException("Demand must be positive"));
}
demand.getAndAdd(n);
executor.submit(() -> {
try {
do {
ByteBuffer audioBuffer = getNextEvent();
if (audioBuffer.remaining() > 0) {
AudioEvent audioEvent = audioEventFromBuffer(audioBuffer);
subscriber.onNext(audioEvent);
} else {
subscriber.onComplete();
break;
}
} while (demand.decrementAndGet() > 0);
} catch (Exception e) {
subscriber.onError(e);
}
});
}
@Override
public void cancel() {
executor.shutdown();
}
private ByteBuffer getNextEvent() {
ByteBuffer audioBuffer = null;
byte[] audioBytes = new byte[CHUNK_SIZE_IN_BYTES];
int len = 0;
try {
len = inputStream.read(audioBytes);
if (len <= 0) {
audioBuffer = ByteBuffer.allocate(0);
} else {
audioBuffer = ByteBuffer.wrap(audioBytes, 0, len);
}
} catch (IOException e) {
throw new UncheckedIOException(e);
}
return audioBuffer;
}
private AudioEvent audioEventFromBuffer(ByteBuffer bb) {
return AudioEvent.builder()
.audioChunk(SdkBytes.fromByteBuffer(bb))
.build();
}
}
}
Here's my program output:
Received Initial response
No results
No results
Yeah.
No results
Yeah.
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
Yeah.
No results
No results
Oh,
No results
Oh,
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
Oh,
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results
No results

The audio file had a sample rate of 44.1 kHz, but the request declared 16,000 Hz (getRequest(16000)). After the file was converted to 16 kHz, it worked:
https://drive.google.com/file/d/1mYVbNlYK3SpGT4NbFRYGn86177eTCqhd/view?usp=sharing
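Why a wrong rate yields nonsense rather than an error: the service receives bare samples, and the declared rate is the only thing telling it how many of them make up one second. The arithmetic below is a small illustrative sketch (the class and method names are hypothetical); it shows that 44.1 kHz samples labelled as 16 kHz are effectively heard about 2.76 times slower and lower-pitched.

```java
class SampleRateMismatch {
    /** Seconds of audio the service perceives per real second of recording. */
    static double stretchFactor(int actualHz, int declaredHz) {
        return (double) actualHz / declaredHz;
    }

    /** Bytes per second of 16-bit PCM: rate * channels * 2 bytes per sample. */
    static int bytesPerSecond(int sampleRateHz, int channels) {
        return sampleRateHz * channels * 2;
    }

    public static void main(String[] args) {
        // 44.1 kHz samples declared as 16 kHz are interpreted roughly 2.76x slower.
        System.out.printf("stretch factor: %.2f%n", stretchFactor(44_100, 16_000));
        // One real second of 44.1 kHz mono audio is 88,200 bytes, but with a
        // declared rate of 16 kHz the service treats 32,000 bytes as one second.
        System.out.println(bytesPerSecond(44_100, 1) + " vs " + bytesPerSecond(16_000, 1));
    }
}
```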

As smac2020 pointed out, the sample rate was wrong. Debugging incorrect metadata values passed to AWS is tricky because AWS returns no error; you just get back an incorrect transcription. So the lesson here is: make sure you know what the right values are. Some of them can be detected automatically.
If you're on a Mac, the mediainfo tool is quite useful:
brew install mediainfo
So is ffmpeg:
brew install ffmpeg
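If you would rather convert inside the Java program than with an external tool, the JDK's own sound API may be enough. The sketch below assumes 16-bit signed little-endian PCM output is what you want and relies on the sample-rate converter bundled with OpenJDK, reached through AudioSystem.getAudioInputStream(AudioFormat, AudioInputStream). Because the available converters depend on the installed providers, it asks AudioSystem.isConversionSupported first. Resample and toPcm16 are hypothetical names.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

class Resample {
    /**
     * Converts a PCM stream to 16-bit signed little-endian at targetRateHz,
     * keeping the channel count. Throws if the running JDK has no converter.
     */
    static AudioInputStream toPcm16(AudioInputStream source, float targetRateHz) {
        AudioFormat src = source.getFormat();
        AudioFormat target = new AudioFormat(
                AudioFormat.Encoding.PCM_SIGNED,
                targetRateHz,
                16,                    // bits per sample
                src.getChannels(),
                src.getChannels() * 2, // frame size in bytes (2 bytes per sample)
                targetRateHz,          // for PCM the frame rate equals the sample rate
                false);                // little-endian
        if (!AudioSystem.isConversionSupported(target, src)) {
            throw new IllegalArgumentException("No converter from " + src + " to " + target);
        }
        return AudioSystem.getAudioInputStream(target, source);
    }
}
```

Because an AudioInputStream is an InputStream, the converted stream could be handed straight to the AudioStreamPublisher, with mediaSampleRateHertz set to the target rate, instead of writing a converted copy to disk.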
Here is an updated example where I automatically detect the sample rate using javax.sound.sampled.AudioFormat. Ideally, the AWS SDK would do this for you and throw an exception if the media file falls outside the parameters that can be transcribed. Note: I had to convert my original file to a 16,000 Hz sample rate using the tool at nch.com.au/switch/index.html. It would be great (hint, hint) if the SDK could also change the sample rate, etc., so that files can be adapted to fit the input parameters.
package com.amazonaws.transcribe;
import org.reactivestreams.Publisher;
import org.reactivestreams.Subscriber;
import org.reactivestreams.Subscription;
import software.amazon.awssdk.core.SdkBytes;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.transcribestreaming.TranscribeStreamingAsyncClient;
import software.amazon.awssdk.services.transcribestreaming.model.*;
import javax.sound.sampled.*;
import java.io.*;
import java.nio.ByteBuffer;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;
import static javax.sound.sampled.AudioFormat.Encoding.*;
public class TranscribeFileFromStream {
private static final Region REGION = Region.US_EAST_1;
private static TranscribeStreamingAsyncClient client;
public static void main(String args[]) throws Exception {
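// The SDK v2 system-property credentials provider reads aws.accessKeyId / aws.secretAccessKey;
// AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are the environment-variable names. Placeholder values.
System.setProperty("aws.accessKeyId", "myId");
System.setProperty("aws.secretAccessKey", "myKey");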
System.out.println(System.getProperty("java.version"));
// BasicConfigurator.configure();
client = TranscribeStreamingAsyncClient.builder()
.region(REGION)
.build();
try {
File inputFile = new File("/home/me/work/transcribe/src/main/resources/test-file.wav");
CompletableFuture<Void> result = client.startStreamTranscription(
getRequest(inputFile),
new AudioStreamPublisher(getStreamFromFile(inputFile)),
getResponseHandler());
result.get();
} finally {
if (client != null) {
client.close();
}
}
}
private static InputStream getStreamFromFile(File inputFile) {
try {
return new FileInputStream(inputFile);
} catch (FileNotFoundException e) {
throw new RuntimeException(e);
}
}
private static StartStreamTranscriptionRequest getRequest(File inputFile) throws IOException, UnsupportedAudioFileException {
//TODO: I read the file twice in this example. Can this be more performant?
AudioInputStream audioInputStream = AudioSystem.getAudioInputStream(inputFile);
AudioFormat audioFormat = audioInputStream.getFormat();
return StartStreamTranscriptionRequest.builder()
.languageCode(LanguageCode.EN_US)
//.mediaEncoding(MediaEncoding.PCM)
.mediaEncoding(getAwsMediaEncoding(audioFormat))
.mediaSampleRateHertz(getAwsSampleRate(audioFormat))
.build();
}
private static MediaEncoding getAwsMediaEncoding(AudioFormat audioFormat) {
// Transcribe streaming's PCM encoding means signed 16-bit little-endian samples,
// so other PCM variants (8-bit unsigned, big-endian, 24-bit, ...) are rejected
// here instead of being sent under the wrong label.
if (PCM_SIGNED.equals(audioFormat.getEncoding())
&& audioFormat.getSampleSizeInBits() == 16
&& !audioFormat.isBigEndian()) {
return MediaEncoding.PCM;
}
// ALAW and ULAW: I have no idea how these map to AWS media encodings, so they are rejected too.
throw new IllegalArgumentException("Not a supported media encoding: " + audioFormat);
}
private static Integer getAwsSampleRate(AudioFormat audioFormat) {
return Math.round(audioFormat.getSampleRate());
}
private static StartStreamTranscriptionResponseHandler getResponseHandler() {
return StartStreamTranscriptionResponseHandler.builder()
.onResponse(r -> {
System.out.println("Received Initial response");
})
.onError(e -> {
System.out.println(e.getMessage());
StringWriter sw = new StringWriter();
e.printStackTrace(new PrintWriter(sw));
System.out.println("Error Occurred: " + sw.toString());
})
.onComplete(() -> {
System.out.println("=== All records stream successfully ===");
})
.subscriber(event -> {
List<Result> results = ((TranscriptEvent) event).transcript().results();
if (results.size() > 0) {
if (!results.get(0).alternatives().get(0).transcript().isEmpty()) {
System.out.println(results.get(0).alternatives().get(0).transcript());
} else {
System.out.println("Empty result");
}
} else {
System.out.println("No results");
}
})
.build();
}
private static class AudioStreamPublisher implements Publisher<AudioStream> {
private final InputStream inputStream;
private static Subscription currentSubscription;
private AudioStreamPublisher(InputStream inputStream) {
this.inputStream = inputStream;
}
@Override
public void subscribe(Subscriber<? super AudioStream> s) {
if (this.currentSubscription == null) {
this.currentSubscription = new SubscriptionImpl(s, inputStream);
} else {
this.currentSubscription.cancel();
this.currentSubscription = new SubscriptionImpl(s, inputStream);
}
s.onSubscribe(currentSubscription);
}
}
public static class SubscriptionImpl implements Subscription {
private static final int CHUNK_SIZE_IN_BYTES = 1024 * 1;
private final Subscriber<? super AudioStream> subscriber;
private final InputStream inputStream;
private ExecutorService executor = Executors.newFixedThreadPool(1);
private AtomicLong demand = new AtomicLong(0);
SubscriptionImpl(Subscriber<? super AudioStream> s, InputStream inputStream) {
this.subscriber = s;
this.inputStream = inputStream;
}
@Override
public void request(long n) {
if (n <= 0) {
subscriber.onError(new IllegalArgumentException("Demand must be positive"));
}
demand.getAndAdd(n);
executor.submit(() -> {
try {
do {
ByteBuffer audioBuffer = getNextEvent();
if (audioBuffer.remaining() > 0) {
AudioEvent audioEvent = audioEventFromBuffer(audioBuffer);
subscriber.onNext(audioEvent);
} else {
subscriber.onComplete();
break;
}
} while (demand.decrementAndGet() > 0);
} catch (Exception e) {
subscriber.onError(e);
}
});
}
@Override
public void cancel() {
executor.shutdown();
}
private ByteBuffer getNextEvent() {
ByteBuffer audioBuffer = null;
byte[] audioBytes = new byte[CHUNK_SIZE_IN_BYTES];
int len = 0;
try {
len = inputStream.read(audioBytes);
if (len <= 0) {
audioBuffer = ByteBuffer.allocate(0);
} else {
audioBuffer = ByteBuffer.wrap(audioBytes, 0, len);
}
} catch (IOException e) {
throw new UncheckedIOException(e);
}
return audioBuffer;
}
private AudioEvent audioEventFromBuffer(ByteBuffer bb) {
return AudioEvent.builder()
.audioChunk(SdkBytes.fromByteBuffer(bb))
.build();
}
}
}
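The TODO in getRequest asks whether the file has to be read twice. One option, sketched here under the assumption that the input is a WAV file, is to open it once with AudioSystem.getAudioInputStream, take the format from that stream, and hand the same stream to the publisher. A side effect is that the WAV header is no longer sent: the AudioInputStream yields only sample bytes, whereas a plain FileInputStream sends the header bytes as if they were audio. The format check reflects AWS's description of streaming PCM as signed 16-bit little-endian. WavSource is a hypothetical name.

```java
import java.io.BufferedInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

/** Format details needed for the request, plus a stream that yields only sample bytes. */
final class WavSource implements Closeable {
    final int sampleRateHz;
    final int channels;
    final AudioInputStream pcm; // positioned just past the WAV header

    private WavSource(int sampleRateHz, int channels, AudioInputStream pcm) {
        this.sampleRateHz = sampleRateHz;
        this.channels = channels;
        this.pcm = pcm;
    }

    /** Parses the header once and accepts only 16-bit signed little-endian PCM. */
    static WavSource open(InputStream in) throws IOException, UnsupportedAudioFileException {
        // getAudioInputStream needs mark/reset support to sniff the header.
        AudioInputStream ais = AudioSystem.getAudioInputStream(new BufferedInputStream(in));
        AudioFormat f = ais.getFormat();
        boolean pcm16le = AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())
                && f.getSampleSizeInBits() == 16
                && !f.isBigEndian();
        if (!pcm16le) {
            ais.close();
            throw new IllegalArgumentException("Expected 16-bit signed little-endian PCM, got " + f);
        }
        return new WavSource(Math.round(f.getSampleRate()), f.getChannels(), ais);
    }

    @Override
    public void close() throws IOException {
        pcm.close();
    }
}
```

In main, that could look like WavSource src = WavSource.open(new FileInputStream(inputFile)), then .mediaSampleRateHertz(src.sampleRateHz) in the request and new AudioStreamPublisher(src.pcm) for the publisher.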

Related

How to use parallelism to speed up URL GET requests?

The problem I am working on requires me to send requests to a website and check whether a specific password is correct. It is somewhat similar to a CTF problem, but I use brute force to generate the correct password key, since the site gives feedback on whether a specific key is on the right track. For a key to be considered "almost-valid," it must be a substring of the correct key.
I have implemented this naively, but the intended solution uses simple parallelism to speed up the process. How would I accomplish this in Java?
import java.net.*;
import java.io.*;
import java.nio.charset.StandardCharsets;
public class Main {
static boolean check(StringBuilder s) throws IOException{
String test = "https://example.com?pass=";
String q = URLEncoder.encode(s.toString(), StandardCharsets.UTF_8);
URL site = new URL(test+q);
URLConnection yc = site.openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()));
String inputLine;
while ((inputLine = in.readLine()) != null) {
//System.out.println(inputLine);
if (inputLine.contains("Incorrect password!")) {
return false;
}
if (inputLine.contains("the correct password!")) {
System.out.println(s);
System.exit(0);
}
}
return true;
}
static void gen(StringBuilder s) throws IOException {
if (!check(s)) {
return;
}
for (int i = 33; i<127; i++) {
int len = s.length();
gen(s.append((char) i));
s.setLength(len);
}
}
public static void main(String[] args) throws IOException, InterruptedException {
gen(new StringBuilder("start"));
}
}
EDIT: I have attempted to implement RecursiveAction & ForkJoinPool, but the code seems just as slow as the naive implementation. Am I implementing the parallelism incorrectly?
import java.nio.charset.StandardCharsets;
import java.util.concurrent.*;
import java.util.*;
import java.io.*;
import java.net.*;
public class cracked4 {
static class Validator extends RecursiveAction{
public String password;
public Validator(String p) {
password = p;
}
@Override
protected void compute(){
try {
if (!valid(password)) return;
System.out.println(password);
ArrayList<Validator> futures = new ArrayList<>();
for (int i = 33; i<127; i++) {
futures.add(new Validator(password + (char) i));
}
for (Validator t: futures) {
ForkJoinTask.invokeAll(t);
}
} catch (IOException e) {
e.printStackTrace();
}
}
public boolean valid(String s) throws IOException {
String test = "https://example.com?pass=" + URLEncoder.encode(s, StandardCharsets.UTF_8);
URL site = new URL(test);
URLConnection yc = site.openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()));
String inputLine;
while ((inputLine = in.readLine()) != null) {
if (inputLine.contains("Incorrect password!")) {
return false;
}
if (inputLine.contains("the correct password!")) {
System.out.println(s);
System.exit(0);
}
}
return true;
}
}
public static void main(String[] args) {
ForkJoinPool forkJoinPool = new ForkJoinPool();
forkJoinPool.invoke(new Validator("cararra"));
}
}
Furthermore, do I need to declare a particular serialVersionUID? I researched it, but I could not find a specific answer.
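On the serialVersionUID question: ForkJoinTask implements java.io.Serializable, so javac with -Xlint:serial (and many IDEs) warn when a subclass such as Validator declares no serialVersionUID. A minimal sketch, assuming that warning is what prompted the question; the value is arbitrary because a ForkJoinPool never serializes its tasks, and the field only matters if you serialize instances yourself. SerialDemo is a hypothetical wrapper class.

```java
import java.io.Serializable;
import java.util.concurrent.RecursiveAction;

class SerialDemo {
    static class Validator extends RecursiveAction {
        // Silences the "serializable class has no serialVersionUID" warning.
        // Any constant works; it is consulted only if an instance is serialized.
        private static final long serialVersionUID = 1L;

        final String password;

        Validator(String password) {
            this.password = password;
        }

        @Override
        protected void compute() {
            // The real work (HTTP check, creating and invoking subtasks) would go here.
        }
    }

    /** True because every ForkJoinTask subclass inherits Serializable. */
    static boolean isSerializable(Class<?> c) {
        return Serializable.class.isAssignableFrom(c);
    }
}
```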
Alright, I researched more about parallelism, and I decided on using ForkJoin / RecursiveAction. Using the parallelism allowed me to reduce my code execution time from 200 seconds to roughly 43 seconds (on my computer).
import java.util.*;
import java.util.concurrent.*;
import java.net.http.*;
import java.net.*;
import java.nio.charset.StandardCharsets;
import java.io.*;
public class cracked4 {
static class Validator extends RecursiveAction {
public StringBuilder password;
public Validator(StringBuilder p) {
password = p;
}
public static boolean valid(StringBuilder s) throws IOException, InterruptedException {
HttpClient client = HttpClient.newHttpClient();
String test = "https://example.com?pass=" + URLEncoder.encode(s.toString(), StandardCharsets.UTF_8);
HttpRequest request = HttpRequest.newBuilder().uri(URI.create(test)).GET().build();
HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
String text = response.body();
if (text.contains("Incorrect password!")) {
return false;
}
if (text.contains("the correct password!")) {
System.out.println(s);
System.exit(0);
}
return true;
}
@Override
protected void compute() {
try {
if (!valid(password)) return;
ArrayList<Validator> c = new ArrayList<>();
for (int i = 33; i < 127; i++) {
StringBuilder t = new StringBuilder(password).append((char) i);
c.add(new Validator(t));
}
ForkJoinTask.invokeAll(c);
}
catch (IOException | InterruptedException ignored) {
}
}
}
public static void main(String[] args) {
ForkJoinPool forkJoinPool = new ForkJoinPool();
forkJoinPool.invoke(new Validator(new StringBuilder("start")));
}
}
When I wrote this code initially, I used .fork(), but that did not help at all; the code ran just as slowly as the sequential version. Collecting all of the Validator tasks into a list and passing it to a single ForkJoinTask.invokeAll() call made the difference: the run time dropped from about 200 seconds to about 43 seconds, more than a 4x speedup.
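A further observation, offered as a sketch rather than a correction: new ForkJoinPool() sizes itself to the number of available processors, while these tasks spend most of their time waiting for HTTP responses. (In the EDIT version, ForkJoinTask.invokeAll(t) was called once per child, which runs each child to completion before the next one starts; handing the whole list to one call is what lets the children overlap.) A swapped-in alternative that makes the thread count an explicit choice is a breadth-first search on a fixed thread pool, checking every candidate of one level concurrently with ExecutorService.invokeAll. LevelSearch is hypothetical, and isOnTrack stands in for a wrapper around the HTTP check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

class LevelSearch {
    /** Appends each printable ASCII character (33..126) to every prefix and keeps accepted children. */
    static List<String> nextLevel(List<String> prefixes, Predicate<String> isOnTrack,
                                  ExecutorService pool) throws InterruptedException, ExecutionException {
        List<String> children = new ArrayList<>();
        for (String p : prefixes) {
            for (char c = 33; c < 127; c++) {
                children.add(p + c);
            }
        }
        List<Callable<Boolean>> checks = new ArrayList<>();
        for (String child : children) {
            checks.add(() -> isOnTrack.test(child));
        }
        // invokeAll submits every check at once and returns when all have finished;
        // the futures come back in the same order as the tasks.
        List<Future<Boolean>> results = pool.invokeAll(checks);
        List<String> next = new ArrayList<>();
        for (int i = 0; i < children.size(); i++) {
            if (results.get(i).get()) {
                next.add(children.get(i));
            }
        }
        return next;
    }

    /** Extends level by level until no child is on track; returns the last non-empty level. */
    static List<String> search(String start, Predicate<String> isOnTrack, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<String> level = List.of(start);
            while (true) {
                List<String> next = nextLevel(level, isOnTrack, pool);
                if (next.isEmpty()) {
                    return level;
                }
                level = next;
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Each level has 94 candidates per surviving prefix, so a pool of a few dozen threads keeps many requests in flight; the right size depends on what the server tolerates.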
Use RecursiveTask and ForkJoinPool for the parallelism. The code below is not final; adapt it to your needs.
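import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
public class Test {
public static void main(String[] args) {
// Tune this: the tasks mostly wait on the network, not the CPU.
int parallelism = Runtime.getRuntime().availableProcessors();
ForkJoinPool forkJoinPool = new ForkJoinPool(parallelism);
StringBuilder longest = forkJoinPool.invoke(new PasswordKeyValidatorTask(new StringBuilder("start")));
System.out.println(longest);
}
public static class PasswordKeyValidatorTask extends RecursiveTask<StringBuilder> {
private static final long serialVersionUID = 3113310524830439302L;
private final StringBuilder password;
public PasswordKeyValidatorTask(StringBuilder password) {
this.password = password;
}
@Override
protected StringBuilder compute() {
try {
// Dead end: the site says this key is not on the right track.
if (!valid(password)) {
return null;
}
List<PasswordKeyValidatorTask> subtasks = new ArrayList<>();
for (int i = 33; i < 127; i++) {
// Copy before appending so each subtask gets its own candidate key.
subtasks.add(new PasswordKeyValidatorTask(new StringBuilder(password).append((char) i)));
}
// Fork every subtask first, then join, so the requests can overlap.
subtasks.forEach(t -> t.fork());
// First longer valid key found, or this key if none of its extensions is valid.
return subtasks.stream()
.map(t -> t.join())
.filter(Objects::nonNull)
.findFirst()
.orElse(password);
} catch (IOException e) {
throw new UncheckedIOException(e);
}
}
boolean valid(StringBuilder s) throws IOException {
String test = "https://samplewebsite.com?pass=";
String q = URLEncoder.encode(s.toString(), StandardCharsets.UTF_8);
URL site = new URL(test + q); // use the URL-encoded value
try (BufferedReader in = new BufferedReader(new InputStreamReader(site.openConnection().getInputStream()))) {
String inputLine;
while ((inputLine = in.readLine()) != null) {
if (inputLine.contains("Incorrect password!")) {
return false;
}
if (inputLine.contains("the correct password!")) {
System.out.println(s);
}
}
}
return true;
}
}
}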

Why does Java file write consume CPU?

I am writing data to a file from a separate thread that takes items off a queue, but the process consumes around 25% of the CPU, as this test main shows.
Is there something I can do to resolve this issue?
Perhaps I should be using flush() somewhere?
In the test, the main method starts the queue thread and then sends generated data to it. The queue thread writes the data to a BufferedWriter, which writes it to the file.
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.logging.Level;
import java.util.logging.Logger;
import uk.co.moonsit.utils.timing.Time;
public class OutputFloatQueueReceiver extends Thread {
private static final Logger LOG = Logger.getLogger(OutputFloatQueueReceiver.class.getName());
private ConcurrentLinkedQueue<List<Float>> queue = null;
private boolean running = true;
private final BufferedWriter outputWriter;
private int ctr = 0;
private final int LIMIT = 1000;
public OutputFloatQueueReceiver(String outputFile, String header, ConcurrentLinkedQueue<List<Float>> q) throws IOException {
queue = q;
File f = new File(outputFile);
FileWriter fstream = null;
if (!f.exists()) {
try {
f.getParentFile().mkdirs();
if (!f.createNewFile()) {
throw new IOException("Exception when trying to create file " + f.getAbsolutePath());
}
fstream = new FileWriter(outputFile, false);
} catch (IOException ex) {
//Logger.getLogger(ControlHierarchy.class.getName()).log(Level.SEVERE, null, ex);
throw new IOException("Exception when trying to create file " + f.getAbsolutePath());
}
}
fstream = new FileWriter(outputFile, true);
outputWriter = new BufferedWriter(fstream);
outputWriter.append(header);
}
public synchronized void setRunning(boolean running) {
this.running = running;
}
@Override
public void run() {
while (running) {
while (queue.peek() != null) {
if (ctr++ % LIMIT == 0) {
LOG.log(Level.INFO, "Output Queue size = {0} '{'ctr={1}'}'", new Object[]{queue.size(), ctr});
}
List<Float> list = queue.poll();
if (list == null) {
continue;
}
try {
StringBuilder sbline = new StringBuilder();
Time t = new Time(list.get(0));
sbline.append(t.HMSS()).append(",");
for (Float f : list) {
sbline.append(f).append(",");
}
sbline.append("\n");
outputWriter.write(sbline.toString());
} catch (IOException ex) {
LOG.info(ex.toString());
break;
}
}
}
if (outputWriter != null) {
try {
outputWriter.close();
LOG.info("Closed outputWriter");
} catch (IOException ex) {
Logger.getLogger(OutputFloatQueueReceiver.class.getName()).log(Level.SEVERE, null, ex);
}
}
}
public static void main(String[] args) {
try {
String outputFile = "c:\\tmp\\qtest.csv";
File f = new File(outputFile);
f.delete();
StringBuilder header = new StringBuilder();
header.append("1,2,3,4,5,6,7,8,9");
header.append("\n");
ConcurrentLinkedQueue<List<Float>> outputQueue = null;
OutputFloatQueueReceiver outputQueueReceiver = null;
outputQueue = new ConcurrentLinkedQueue<>();
outputQueueReceiver = new OutputFloatQueueReceiver(outputFile, header.toString(), outputQueue);
outputQueueReceiver.start();
for (int i = 1; i < 100000; i++) {
List<Float> list = new ArrayList<>();
//list.set(0, (float) i); // causes exception
list.add((float) i);
for (int j = 1; j < 10; j++) {
list.add((float) j);
}
outputQueue.add(list);
}
try {
Thread.sleep(5000);
} catch (InterruptedException ex) {
Logger.getLogger(OutputFloatQueueReceiver.class.getName()).log(Level.SEVERE, null, ex);
}
outputQueueReceiver.setRunning(false);
} catch (IOException ex) {
Logger.getLogger(OutputFloatQueueReceiver.class.getName()).log(Level.SEVERE, null, ex);
}
}
}
This code is the reason why your code is using so much CPU:
while (running) {
while (queue.peek() != null) {
// logging
List<Float> list = queue.poll();
if (list == null) {
continue;
}
// do stuff with list
}
}
Basically, your code is busy-waiting: whenever the queue is empty, the outer while (running) loop keeps calling peek() in a tight loop until an entry becomes available, which keeps a CPU core busy.
You should replace your queue class with a BlockingQueue, and simply use take() ... like this:
while (running) {
List<Float> list = queue.take();
// do stuff with list
}
The take() call blocks indefinitely, returning only once an element is available, and returns that element as its result. If blocking indefinitely is a problem, you could either use poll(...) with a timeout, or arrange for some other thread to interrupt the thread that is blocked.
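A fuller sketch of that approach, with two assumptions: the consumer writes to any Writer (so it can be tried without a file), and instead of the running flag it stops on a sentinel list, a "poison pill" that the producer enqueues last. The sentinel also sidesteps a visibility problem in the original: running is written in a synchronized method but read without synchronization in run(), so the writer thread is not guaranteed to see the change. BlockingQueueWriter and POISON are hypothetical names.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;

class BlockingQueueWriter implements Runnable {
    /** Unique sentinel ("poison pill"), compared by identity, so its contents never matter. */
    static final List<Float> POISON = new ArrayList<>();

    private final BlockingQueue<List<Float>> queue;
    private final Writer target;

    BlockingQueueWriter(BlockingQueue<List<Float>> queue, Writer target) {
        this.queue = queue;
        this.target = target;
    }

    @Override
    public void run() {
        try (BufferedWriter out = new BufferedWriter(target)) {
            while (true) {
                // take() parks the thread while the queue is empty, so no CPU is burned waiting.
                List<Float> list = queue.take();
                if (list == POISON) {
                    break; // producer is done; everything queued before the pill was written
                }
                StringBuilder line = new StringBuilder();
                for (Float f : list) {
                    line.append(f).append(',');
                }
                out.write(line.append('\n').toString());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status for the caller
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The producer side then becomes queue.add(list) for each row and queue.add(BlockingQueueWriter.POISON) once at the end, with a LinkedBlockingQueue in place of the ConcurrentLinkedQueue.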

unable to fetch process time using sigar

import java.io.IOException;
import org.hyperic.sigar.*;
public class SigarDemo {
public static void main(String args[]) throws SigarException, IOException
{
final Sigar sigar = new Sigar();
final long[] processes = sigar.getProcList();
ProcTime pt=new ProcTime();
for (final long processId : processes) {
ProcUtil.getDescription(sigar, processId);
pt=sigar.getProcTime(processId);
System.out.println("---"+pt.getStartTime());
}
}
}
I am trying to fetch the process time of each process using Sigar, but I get this error:
Exception in thread "main" java.lang.ExceptionInInitializerError
at taskmanager.SigarDemo.main(SigarDemo.java:22)
Caused by: java.security.AccessControlException: access denied ("java.util.PropertyPermission" "sigar.nativeLogging" "read")
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:457)
at java.security.AccessController.checkPermission(AccessController.java:884)
at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
at java.lang.SecurityManager.checkPropertyAccess(SecurityManager.java:1294)
at java.lang.System.getProperty(System.java:714)
at org.hyperic.sigar.Sigar.<clinit>(Sigar.java:78)
I have tried a policy file that grants all permissions, but I still get the error. I am using NetBeans 8.0, and I have already set these VM options:
-Djava.security.manager -Djava.security.policy=src/dir1/dir2/important.policy
I used this code to get the process times:
public static void main(String args[]) {
try {
final Sigar sigar = new Sigar();
final long[] processes = sigar.getProcList();
ProcTime pt = new ProcTime();
for (final long processId : processes) {
try {
ProcUtil.getDescription(sigar, processId);
pt = sigar.getProcTime(String.valueOf(processId));
System.out.println("---" + pt.getStartTime());
} catch (SigarException e) {
System.out.println("can't accessible...");
}
}
} catch (SigarException ex) {
ex.printStackTrace();
}
}
You don't need to specify a security policy file in the VM arguments to get the process times. Be aware, though, that getProcTime() will not return process times for some process IDs; for those it throws SigarPermissionDeniedException.
You will still get the process time for other processes without any problem.
I got this idea from a sample demo file in the bindings\java\examples folder. I posted it below with a slight modification. You can compile and run it to see the result (it also includes the process time).
import org.hyperic.sigar.Sigar;
import org.hyperic.sigar.SigarProxy;
import org.hyperic.sigar.SigarException;
import org.hyperic.sigar.ProcCredName;
import org.hyperic.sigar.ProcMem;
import org.hyperic.sigar.ProcTime;
import org.hyperic.sigar.ProcState;
import org.hyperic.sigar.ProcUtil;
import org.hyperic.sigar.cmd.Shell;
import org.hyperic.sigar.cmd.SigarCommandBase;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.text.SimpleDateFormat;
import java.util.Date;
/**
* Show process status.
*/
public class Ps extends SigarCommandBase {
public Ps(Shell shell) {
super(shell);
}
public Ps() {
super();
}
protected boolean validateArgs(String[] args) {
return true;
}
public String getSyntaxArgs() {
return "[pid|query]";
}
public String getUsageShort() {
return "Show process status";
}
public boolean isPidCompleter() {
return true;
}
public void output(String[] args) throws SigarException {
long[] pids;
if (args.length == 0) {
pids = this.proxy.getProcList();
}
else {
pids = this.shell.findPids(args);
}
for (int i=0; i<pids.length; i++) {
long pid = pids[i];
try {
output(pid);
} catch (SigarException e) {
this.err.println("Exception getting process info for " +
pid + ": " + e.getMessage());
}
}
}
public static String join(List info) {
StringBuffer buf = new StringBuffer();
Iterator i = info.iterator();
boolean hasNext = i.hasNext();
while (hasNext) {
buf.append((String)i.next());
hasNext = i.hasNext();
if (hasNext)
buf.append("\t");
}
return buf.toString();
}
public static List getInfo(SigarProxy sigar, long pid)
throws SigarException {
ProcState state = sigar.getProcState(pid);
ProcTime time = null;
String unknown = "???";
List info = new ArrayList();
info.add(String.valueOf(pid));
try {
ProcCredName cred = sigar.getProcCredName(pid);
info.add(cred.getUser());
} catch (SigarException e) {
info.add(unknown);
}
try {
time = sigar.getProcTime(pid);
info.add(getStartTime(time.getStartTime()));
System.out.println("this line has executed..!!!");
} catch (SigarException e) {
info.add(unknown);
}
try {
ProcMem mem = sigar.getProcMem(pid);
info.add(Sigar.formatSize(mem.getSize()));
info.add(Sigar.formatSize(mem.getRss()));
info.add(Sigar.formatSize(mem.getShare()));
} catch (SigarException e) {
info.add(unknown);
}
info.add(String.valueOf(state.getState()));
if (time != null) {
info.add(getCpuTime(time));
}
else {
info.add(unknown);
}
String name = ProcUtil.getDescription(sigar, pid);
info.add(name);
return info;
}
public void output(long pid) throws SigarException {
println(join(getInfo(this.proxy, pid)));
}
public static String getCpuTime(long total) {
long t = total / 1000;
return t/60 + ":" + t%60;
}
public static String getCpuTime(ProcTime time) {
return getCpuTime(time.getTotal());
}
private static String getStartTime(long time) {
if (time == 0) {
return "00:00";
}
long timeNow = System.currentTimeMillis();
String fmt = "MMMd";
if ((timeNow - time) < ((60*60*24) * 1000)) {
fmt = "HH:mm";
}
return new SimpleDateFormat(fmt).format(new Date(time));
}
public static void main(String[] args) throws Exception {
new Ps().processCommand(args);
}
}

Reading information from a nonblocking SocketChannel

I'm trying to read from a nonblocking socket so that my program doesn't get stuck at some point. Does anyone know why read() always returns zero? Could it be a problem with the ByteBuffer? The problem occurs in the read method, where the length is always zero.
package com.viewt.eyebird.communication;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;
import java.util.LinkedList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import com.viewt.eyebird.PhoneInformation;
import com.viewt.eyebird.TrackingServiceData;
import com.viewt.eyebird.commands.*;
import android.os.Handler;
import android.util.Log;
final public class ServerCommunication {
protected final int socketTimeout;
protected final TrackingServiceData commandData;
protected final Handler handler;
protected final ServerCommunicationChecker clientChecker;
protected final LinkedList<Serialize> queue = new LinkedList<Serialize>();
protected final ByteBuffer socketBuffer = ByteBuffer.allocate(1024);
protected final StringBuilder readBuffer = new StringBuilder();
protected static final Pattern commandPattern = Pattern.compile(">[^<]+<");
protected static final ServerCommand availableCommands[] = { new Panic(),
new ChangeServer(), new GetServer(), new Restart(),
new PasswordCleanup() };
protected InetSocketAddress inetSocketAddress;
protected SocketChannel sChannel;
public ServerCommunication(Handler handler, String host, int port,
int timeAlive, int socketTimeout,
PhoneInformation phoneInformation, TrackingServiceData commandData) {
this.commandData = commandData;
this.handler = handler;
this.socketTimeout = socketTimeout;
try {
connect(host, port);
} catch (CommunicationException e) {
Log.getStackTraceString(e);
}
clientChecker = new ServerCommunicationChecker(handler, this,
timeAlive, new AliveResponse(phoneInformation));
handler.postDelayed(clientChecker, timeAlive);
}
public void connect() throws CommunicationException {
try {
sChannel = SocketChannel.open();
sChannel.configureBlocking(false);
sChannel.socket().setSoTimeout(socketTimeout);
sChannel.connect(inetSocketAddress);
} catch (IOException e) {
throw new CommunicationException(e);
}
}
public boolean isConnectionPending() {
return sChannel.isConnectionPending();
}
public boolean finishConnection() throws CommunicationException {
try {
return sChannel.finishConnect();
} catch (IOException e) {
throw new CommunicationException(e);
}
}
public void connect(String host, int port) throws CommunicationException {
inetSocketAddress = new InetSocketAddress(host, port);
connect();
}
public void send(Serialize serialize) throws CommunicationException {
try {
sChannel.write(ByteBuffer
.wrap(String.valueOf(serialize).getBytes()));
} catch (IOException e) {
throw new CommunicationException(e);
}
}
public void sendOrQueue(Serialize serialize) {
try {
send(serialize);
} catch (Exception e) {
queue(serialize);
}
}
public void queue(Serialize serialize) {
queue.add(serialize);
}
@Override
protected void finalize() throws Throwable {
handler.removeCallbacks(clientChecker);
super.finalize();
}
public void sync() throws CommunicationException {
int queueSize = queue.size();
for (int i = 0; i < queueSize; i++) {
send(queue.getFirst());
queue.removeFirst();
}
}
public void read() throws CommunicationException {
int length, readed = 0;
try {
while ((length = sChannel.read(socketBuffer)) > 0)
for (readed = 0; readed < length; readed++)
readBuffer.append(socketBuffer.get());
} catch (IOException e) {
throw new CommunicationException(e);
} finally {
socketBuffer.flip();
}
Matcher matcher = commandPattern.matcher(readBuffer);
int lastCommand;
if ((lastCommand = readBuffer.lastIndexOf("<")) != -1)
readBuffer.delete(0, lastCommand);
while (matcher.find()) {
for (ServerCommand command : availableCommands) {
try {
command.command(matcher.group(), commandData);
break;
} catch (CommandBadFormatException e) {
continue;
}
}
}
if (length == -1)
throw new CommunicationException("Server closed");
}
}
You are using a non-blocking channel, which means read() does not wait until data is available: it returns 0 immediately if nothing has arrived yet.
read() only copies whatever is already in the operating system's receive buffer; it never waits for the network.
You probably read immediately after sending some data, and when you get 0 bytes you stop reading. You haven't given the network any time to return data.
Instead, read in a loop: if you get no data, sleep briefly with Thread.sleep(time) (about 100-300 ms), then retry.
Stop the loop once you have waited too long (count the sleeps, and reset the count whenever you receive data), or once all the data has been read.
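The retry loop described above might look like the following minimal sketch, assuming the caller wants every byte until the server closes the connection or stays silent for too long. `PollingReader`, `readWithRetry`, `maxIdleSleeps` and `sleepMillis` are hypothetical names. The sketch also follows the usual `ByteBuffer` cycle of read, `flip()`, drain, `clear()`. That matters for the question's code too: its `read()` calls `flip()` in `finally` but never `clear()`, so after a call that reads nothing the buffer's limit drops to 0 and every later `sChannel.read(socketBuffer)` returns 0 because the buffer has no room left. Separately, `readBuffer.append(socketBuffer.get())` resolves to `StringBuilder.append(int)`, so it appends each byte's decimal value rather than a character.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

// Hypothetical helper: polls a non-blocking channel, sleeping between empty reads.
class PollingReader {
    static byte[] readWithRetry(SocketChannel channel, int maxIdleSleeps, long sleepMillis)
            throws IOException, InterruptedException {
        ByteBuffer buffer = ByteBuffer.allocate(1024);
        ByteArrayOutputStream received = new ByteArrayOutputStream();
        int idleSleeps = 0;
        while (true) {
            int n = channel.read(buffer); // never blocks; 0 just means "nothing yet"
            if (n == -1) {
                break; // the peer closed the connection
            }
            if (n > 0) {
                buffer.flip(); // limit = bytes just read, position = 0
                received.write(buffer.array(), 0, buffer.limit());
                buffer.clear(); // make the whole buffer writable again
                idleSleeps = 0; // got data, so reset the idle counter
            } else if (++idleSleeps > maxIdleSleeps) {
                break; // waited too long without any data
            } else {
                Thread.sleep(sleepMillis);
            }
        }
        return received.toByteArray();
    }
}
```

With 20 sleeps of 50 ms, for example, the call gives up after roughly one second of silence. A `java.nio.channels.Selector` is the standard alternative to sleeping: it wakes the thread only when the channel actually has data to read.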

What is the best way to write to a file in a parallel thread in Java?

I have a program that performs lots of calculations and reports them to a file frequently. I know that frequent write operations can slow a program down a lot, so to avoid it I'd like to have a second thread dedicated to the writing operations.
Right now I'm doing it with this class I wrote (the impatient can skip to the end of the question):
public class ParallelWriter implements Runnable {
private File file;
private BlockingQueue<Item> q;
private int indentation;
public ParallelWriter( File f ){
file = f;
q = new LinkedBlockingQueue<Item>();
indentation = 0;
}
public ParallelWriter append( CharSequence str ){
try {
CharSeqItem item = new CharSeqItem();
item.content = str;
item.type = ItemType.CHARSEQ;
q.put(item);
return this;
} catch (InterruptedException ex) {
throw new RuntimeException( ex );
}
}
public ParallelWriter newLine(){
try {
Item item = new Item();
item.type = ItemType.NEWLINE;
q.put(item);
return this;
} catch (InterruptedException ex) {
throw new RuntimeException( ex );
}
}
public void setIndent(int indentation) {
try{
IndentCommand item = new IndentCommand();
item.type = ItemType.INDENT;
item.indent = indentation;
q.put(item);
} catch (InterruptedException ex) {
throw new RuntimeException( ex );
}
}
public void end(){
try {
Item item = new Item();
item.type = ItemType.POISON;
q.put(item);
} catch (InterruptedException ex) {
throw new RuntimeException( ex );
}
}
public void run() {
BufferedWriter out = null;
Item item = null;
try{
out = new BufferedWriter( new FileWriter( file ) );
while( (item = q.take()).type != ItemType.POISON ){
switch( item.type ){
case NEWLINE:
out.newLine();
for( int i = 0; i < indentation; i++ )
out.append(" ");
break;
case INDENT:
indentation = ((IndentCommand)item).indent;
break;
case CHARSEQ:
out.append( ((CharSeqItem)item).content );
}
}
} catch (InterruptedException ex){
throw new RuntimeException( ex );
} catch (IOException ex) {
throw new RuntimeException( ex );
} finally {
if( out != null ) try {
out.close();
} catch (IOException ex) {
throw new RuntimeException( ex );
}
}
}
private enum ItemType {
CHARSEQ, NEWLINE, INDENT, POISON;
}
private static class Item {
ItemType type;
}
private static class CharSeqItem extends Item {
CharSequence content;
}
private static class IndentCommand extends Item {
int indent;
}
}
And then I use it by doing:
ParallelWriter w = new ParallelWriter( myFile );
new Thread(w).start();
/// Lots of
w.append(" things ").newLine();
w.setIndent(2);
w.newLine().append(" more things ");
/// and finally
w.end();
While this works perfectly well, I'm wondering:
Is there a better way to accomplish this?
Your basic approach looks fine. I would structure the code as follows:
import java.io.BufferedWriter;
import java.io.File;
import java.io.IOException;
import java.io.Writer;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
public interface FileWriter {
FileWriter append(CharSequence seq);
FileWriter indent(int indent);
void close();
}
class AsyncFileWriter implements FileWriter, Runnable {
private final File file;
private final Writer out;
private final BlockingQueue<Item> queue = new LinkedBlockingQueue<Item>();
private volatile boolean started = false;
private volatile boolean stopped = false;
public AsyncFileWriter(File file) throws IOException {
this.file = file;
this.out = new BufferedWriter(new java.io.FileWriter(file));
}
public FileWriter append(CharSequence seq) {
if (!started) {
throw new IllegalStateException("open() call expected before append()");
}
try {
queue.put(new CharSeqItem(seq));
} catch (InterruptedException e) {
// Restore the interrupt flag instead of swallowing it.
Thread.currentThread().interrupt();
}
return this;
}
public FileWriter indent(int indent) {
if (!started) {
throw new IllegalStateException("open() call expected before indent()");
}
try {
queue.put(new IndentItem(indent));
} catch (InterruptedException e) {
// Restore the interrupt flag instead of swallowing it.
Thread.currentThread().interrupt();
}
return this;
}
public void open() {
this.started = true;
new Thread(this).start();
}
public void run() {
while (!stopped) {
try {
// 100 ms timeout: the loop still notices close() while idle, without spinning.
Item item = queue.poll(100, TimeUnit.MILLISECONDS);
if (item != null) {
try {
item.write(out);
} catch (IOException logme) {
}
}
} catch (InterruptedException e) {
}
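}
// Drain items queued before close() so they are not silently dropped.
Item pending;
while ((pending = queue.poll()) != null) {
try {
pending.write(out);
} catch (IOException logme) {
}
}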
try {
out.close();
} catch (IOException ignore) {
}
}
public void close() {
this.stopped = true;
}
private static interface Item {
void write(Writer out) throws IOException;
}
private static class CharSeqItem implements Item {
private final CharSequence sequence;
public CharSeqItem(CharSequence sequence) {
this.sequence = sequence;
}
public void write(Writer out) throws IOException {
out.append(sequence);
}
}
private static class IndentItem implements Item {
private final int indent;
public IndentItem(int indent) {
this.indent = indent;
}
public void write(Writer out) throws IOException {
for (int i = 0; i < indent; i++) {
out.append(" ");
}
}
}
}
If you do not want to write in a separate thread (maybe in a test?), you can have an implementation of FileWriter which calls append on the Writer in the caller thread.
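A minimal sketch of such a caller-thread implementation, assuming it wraps any `java.io.Writer` (a `BufferedWriter` around a file in production, a `StringWriter` in a test). `SyncFileWriter` is a hypothetical name, and the `FileWriter` interface from the answer is repeated so the example compiles on its own.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;

// Same shape as the FileWriter interface above, redeclared so this compiles alone.
interface FileWriter {
    FileWriter append(CharSequence seq);
    FileWriter indent(int indent);
    void close();
}

// Hypothetical synchronous variant: every call writes in the caller's thread.
class SyncFileWriter implements FileWriter {
    private final Writer out;

    SyncFileWriter(Writer out) {
        this.out = out;
    }

    public FileWriter append(CharSequence seq) {
        try {
            out.append(seq);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return this;
    }

    public FileWriter indent(int indent) {
        // Mirrors AsyncFileWriter's IndentItem: one space per indent level, written now.
        for (int i = 0; i < indent; i++) {
            append(" ");
        }
        return this;
    }

    public void close() {
        try {
            out.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Apart from AsyncFileWriter's extra open() call, code that only uses `append`, `indent` and `close` can switch between the two implementations unchanged.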
One good way to exchange data with a single consumer thread is to use an Exchanger.
You could use a StringBuilder or ByteBuffer as the buffer you exchange with the background thread. The latency can be around 1 microsecond, which is lower than with a BlockingQueue, and no objects are created per exchange.
This is the example from the Exchanger Javadoc, which I think is worth repeating here:
class FillAndEmpty {
Exchanger<DataBuffer> exchanger = new Exchanger<DataBuffer>();
DataBuffer initialEmptyBuffer = ... a made-up type
DataBuffer initialFullBuffer = ...
class FillingLoop implements Runnable {
public void run() {
DataBuffer currentBuffer = initialEmptyBuffer;
try {
while (currentBuffer != null) {
addToBuffer(currentBuffer);
if (currentBuffer.isFull())
currentBuffer = exchanger.exchange(currentBuffer);
}
} catch (InterruptedException ex) { ... handle ... }
}
}
class EmptyingLoop implements Runnable {
public void run() {
DataBuffer currentBuffer = initialFullBuffer;
try {
while (currentBuffer != null) {
takeFromBuffer(currentBuffer);
if (currentBuffer.isEmpty())
currentBuffer = exchanger.exchange(currentBuffer);
}
} catch (InterruptedException ex) { ... handle ...}
}
}
void start() {
new Thread(new FillingLoop()).start();
new Thread(new EmptyingLoop()).start();
}
}
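The pseudocode above leaves `DataBuffer` and the fill/drain steps open. Here is a runnable sketch of the same pattern applied to text output, assuming a single producer thread and `StringBuilder` buffers as suggested above; `ExchangerWriter` and `flushSize` are hypothetical names. Exchanging `null` serves as the stop signal here (`Exchanger` accepts `null`), and write errors are recorded rather than ending the consumer thread, because the producer would otherwise block forever in `exchange()`.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.concurrent.Exchanger;

// Hypothetical writer: one producer fills a StringBuilder, a background thread
// writes it out, and the two swap buffers through an Exchanger.
class ExchangerWriter {
    private final Exchanger<StringBuilder> exchanger = new Exchanger<>();
    private final Writer out;
    private final int flushSize;
    private final Thread consumer;
    private StringBuilder current = new StringBuilder(); // owned by the producer
    private IOException error; // first write failure; read only after join()

    ExchangerWriter(Writer out, int flushSize) {
        this.out = out;
        this.flushSize = flushSize;
        this.consumer = new Thread(this::drainLoop, "exchanger-writer");
        this.consumer.start();
    }

    // Must be called from a single producer thread only.
    ExchangerWriter append(CharSequence s) throws InterruptedException {
        current.append(s);
        if (current.length() >= flushSize) {
            // Hand over the full buffer and receive an empty one back.
            current = exchanger.exchange(current);
        }
        return this;
    }

    private void drainLoop() {
        StringBuilder spare = new StringBuilder();
        try {
            StringBuilder full;
            // A null from the producer means "no more data".
            while ((full = exchanger.exchange(spare)) != null) {
                if (error == null) {
                    try {
                        out.append(full);
                    } catch (IOException e) {
                        error = e; // keep exchanging so the producer never blocks forever
                    }
                }
                full.setLength(0); // empty it before handing it back
                spare = full;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    void close() throws IOException, InterruptedException {
        if (current.length() > 0) {
            current = exchanger.exchange(current); // flush the partly filled buffer
        }
        exchanger.exchange(null); // tell the consumer to stop
        consumer.join(); // makes the consumer's writes and `error` visible here
        if (error != null) {
            throw error;
        }
        out.flush();
    }
}
```

Only two `StringBuilder`s ever exist: one being filled by the producer and one being written (or waiting) on the consumer side.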
Using a LinkedBlockingQueue is a pretty good idea. Not sure I like some of the style of the code... but the principle seems sound.
I would maybe give the LinkedBlockingQueue a capacity sized to some fraction of your available memory, say 10,000 items. That way, if writing is too slow, your worker threads block instead of adding more work until the heap is exhausted.
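A sketch of that suggestion, assuming text items and a hypothetical `BoundedWriteQueue` wrapper. With a fixed capacity, `put()` blocks the computing thread while the queue is full, and `offer()` with a timeout gives up instead. The capacity counts items, not bytes, so it only approximates a memory budget.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical wrapper around a bounded LinkedBlockingQueue of pending writes.
class BoundedWriteQueue {
    // Example figure from the suggestion above; tune it to your item size.
    static final int DEFAULT_CAPACITY = 10_000;

    private final BlockingQueue<CharSequence> queue;

    BoundedWriteQueue(int capacity) {
        queue = new LinkedBlockingQueue<>(capacity); // default would be Integer.MAX_VALUE
    }

    // Blocks the calling (computing) thread while the queue is full: backpressure.
    void put(CharSequence s) throws InterruptedException {
        queue.put(s);
    }

    // Waits at most timeoutMillis, then returns false if the queue is still full.
    boolean tryPut(CharSequence s, long timeoutMillis) throws InterruptedException {
        return queue.offer(s, timeoutMillis, TimeUnit.MILLISECONDS);
    }

    // Called by the writer thread; blocks until an item is available.
    CharSequence take() throws InterruptedException {
        return queue.take();
    }

    int remainingCapacity() {
        return queue.remainingCapacity();
    }
}
```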
"I know that frequent write operations can slow a program down a lot"
Probably not as much as you think, provided you use buffering.
