I know that similar questions have been asked before, but not exactly what I'm asking. To begin with, let me explain my purpose. I'm trying to write a kind of "remote shell" that takes characters from the console (System.in) one character at a time, sends them to a remote session on another machine, writes them there, and gathers any characters that session outputs so my shell can display them back to the user.
So, the issue is that System.in, no matter what I do, doesn't really support a "raw" mode: no type of reader can read just one character at a time; everything waits until a terminator character is entered, typically a newline.
Things I have tried: using Scanner, using a BufferedReader, taking FileDescriptor.in and creating a FileInputStream from it, using a FileChannel and reading into a ByteBuffer that is one byte long, etc. In all cases, it seems, System.in only makes characters available to the Java application after a terminator character has been entered by the user. I'm convinced there is no "Java" way to do this, so the question is: does anyone have some native code, wrapped in a Java library, to do this? It's hard to find such a thing just searching GitHub.
BTW, for the remote console I'm using the pty4j package. I've seen sample projects that connect to that code using other languages, for example JavaScript running in a browser to create a web-based shell. Other languages let you do a simple "get_char" on standard in.
Some examples of the code I've tried:
Scanner scanner = new Scanner(System.in);
FileDescriptor fd = FileDescriptor.in;
FileInputStream fis = new FileInputStream(fd);
FileChannel fc = fis.getChannel();
while (process.isAlive()) {
    System.out.println(scanner.next());
    // ByteBuffer bb = ByteBuffer.allocate(1);
    // int c = fc.read(bb);
    // int c = fis.read();
    // System.err.println("Read " + c);
    // if (c == 1) {
    //     os.write(bb.get());
    // }
}
You can see that I've tried various methods to read the input: scanner.next(), fc.read(byteBuffer), fileInputStream.read(), etc. All attempts "wait" until a terminator character is entered.
Additionally, I have tried the "useDelimiter" and "next(pattern)" methods on the Scanner too. That's still not working.
Any pointers or help would be much appreciated.
Below is an example of reading one character at a time until end of stream is reached. On Linux, you type Control-D to signal the end of input. On Windows, you type Control-Z (followed by Enter) to signal the end of input.
import java.io.*;

class Test {
    public static void main(String[] args) throws IOException {
        int c = 0;
        // read() returns one byte at a time, or -1 at end of stream
        while ((c = System.in.read()) != -1) {
            System.out.println((char) c);
        }
    }
}
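Note that even with this loop, a terminal in its default mode still line-buffers, so bytes only arrive after Enter. For true raw (unbuffered) input the terminal itself has to be switched out of line-buffered mode, which plain System.in cannot do; a library that wraps the native terminal calls is needed. A minimal sketch using JLine 3 (one such library, not the only option; this assumes the org.jline:jline dependency is on the classpath):

import org.jline.terminal.Terminal;
import org.jline.terminal.TerminalBuilder;

class RawRead {
    public static void main(String[] args) throws Exception {
        // Build a Terminal bound to the real console and switch it to raw mode,
        // so reads return as soon as a single key is pressed.
        try (Terminal terminal = TerminalBuilder.builder().system(true).build()) {
            terminal.enterRawMode();
            int c;
            while ((c = terminal.reader().read()) != -1) {
                if (c == 'q') break; // illustrative exit key
                System.out.println("Got: " + (char) c);
            }
        }
    }
}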
Related
I am trying to download a web page with all its resources. First I download the HTML, and to be sure the file stays formatted I use the function below.
There is an issue: I find "10" in the final file, and I discovered that 10 is the decimal code of LF (the line feed character). This breaks my JavaScript functions.
Example of the final result :
<!DOCTYPE html>10<html lang="fr">10 <head>10 <meta http-equiv="content-type" content="text/html; charset=UTF-8" />10
Can someone help me find the real issue?
public static String scanfile(File file) {
    StringBuilder sb = new StringBuilder();
    try {
        BufferedReader bufferedReader = new BufferedReader(new FileReader(file));
        while (true) {
            String readLine = bufferedReader.readLine();
            if (readLine != null) {
                sb.append(readLine);
                sb.append(System.lineSeparator());
                Log.i(TAG, sb.toString());
            } else {
                bufferedReader.close();
                return sb.toString();
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
        return null;
    }
}
There are multiple problems with your code.
Charset error
BufferedReader bufferedReader = new BufferedReader(new FileReader(file));
This will fail in subtle ways.
Files (and, for that matter, data given to you by webservers) come in bytes: a stream of numbers, each number being between 0 and 255.
So, if you are a webserver and you want to send the character ö, what byte(s) do you send?
The answer is complicated. The mapping that explains how some character is rendered in byte(s)-form is called a character set encoding (shortened to 'charset').
Anytime bytes are turned into characters or vice versa, there is always a charset involved. Always.
So, you're reading a file (that'd be bytes), and turning it into a Reader (which is chars). Thus, charset is involved.
Which charset? The API of new FileReader(path) explains which one: "The system default". You do not want that.
Thus, this code is broken. You want one of two things:
Option 1 - write the data as is
When doing the job of querying the webserver for the data and relaying this information onto disk, you'd want to just store the bytes (after all, webserver gives bytes, and disks store bytes, that's easy), but the webserver also sends the encoding, in a header, and you need to save this separately. Because to read that 'sack of bytes', you need to know the charset to turn it into characters.
How would you do this? Well, up to you. You could for example decree that the data file starts with the name of a charset encoding (as sent via that header), then a 0 byte, and then the data, unmodified. I think you should go with option 2, however
Option 2
Another, better option for text-based documents (which HTML is), is this: When reading the data, convert it to characters, using the encoding as that header tells you. Then, to save it to disk, turn the chars back to bytes, using UTF-8, which is a great encoding and an industry standard. That way, when reading, you just know it's UTF-8, period.
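A minimal sketch of that round trip (the helper name downloadBody and the header-derived charsetFromHeader variable are illustrative, not from the original post):

// Decode the downloaded bytes with the charset the webserver declared,
// then store the text on disk as UTF-8 (Files.writeString defaults to UTF-8).
byte[] raw = downloadBody();                                 // hypothetical helper
Charset serverCharset = Charset.forName(charsetFromHeader);  // e.g. "ISO-8859-1"
String text = new String(raw, serverCharset);
Files.writeString(Paths.get("page.html"), text);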
To read a UTF-8 text file, you do:
Files.newBufferedReader(Paths.get(file));
The reason this works is that the Files API, unlike most other APIs (and unlike FileReader, which you should never ever use), defaults to UTF_8 and not to the platform default. If you want, you can make it more readable:
Files.newBufferedReader(Paths.get(file), StandardCharsets.UTF_8);
same thing - but now in the code it is clear what's happening.
Broken exception handling
} catch (IOException e) {
e.printStackTrace();
return null;
}
This is not okay - if you catch an exception, either [A] throw something else, or [B] handle the problem. And 'log it and keep going' is definitely not 'handling' it. Your strategy of exception handling results in 1 error resulting in a thousand things going wrong with a thousand stack traces, and all of them except the first are undesired and irrelevant, hence why this is horrible code and you should never write it this way.
The easy solution is to just put throws IOException on your scanFile method. The method inherently interacts with files, it SHOULD be throwing that. Note that your psv main(String[] args) method can, and usually should, be declared to throws Exception.
It also makes your code simpler and shorter, yay!
Resource Management failure
A FileReader is a resource. You MUST close it, no matter what happens. You are not doing that: if .readLine() throws an exception, your code jumps to the catch handler and bufferedReader.close() is never executed.
The solution is to use the ARM (Automatic Resource Management, also known as try-with-resources) construct:
try (var br = Files.newBufferedReader(Paths.get(file), StandardCharsets.UTF_8)) {
// code goes here
}
This construct ensures that close() is invoked, regardless of how the 'code goes here' block exits. Even if it 'exits' via an exception or a return statement.
The problem
Your 'read a file and print it' code is, apart from the above three items, mostly fine. The problem is that the HTML file on disk is corrupted; the error lies in your code that reads the data from the web server and saves it to disk. You did not paste that code.
Specifically, System.lineSeparator() returns the actual separator string, not its numeric value. Thus, assuming the code you pasted really is the code you are running, if you are seeing a literal '10' show up, the HTML file on disk already has it in there. It's not the read code.
Closing thoughts
More generally the job of 'just print a file on disk with a known encoding' can be done in far fewer lines of code:
public static String scanFile(String path) throws IOException {
    return Files.readString(Paths.get(path));
}
You should just use the above code instead. It's simple, short, doesn't have any bugs, cannot leak resources, has proper exception handling, and will use UTF-8.
Actually, there was no problem in this function; I was mistakenly adding the "10" in another function in my code.
I came across some strange behavior with reading files in Java 8 and I'm wondering if someone can make sense of it.
Scenario:
Reading a malformed text file. By malformed I mean that it contains bytes that do not map to any Unicode code points.
The code i use to create such a file is as follows:
byte[] text = new byte[1];
char k = (char) -60;
text[0] = (byte) k;
FileUtils.writeByteArrayToFile(new File("/tmp/malformed.log"), text);
This code produces a file that contains exactly one byte, which is not part of the ASCII table (nor the extended one).
Attempting to cat this file produces the following output:
�
Which is the Unicode replacement character. This makes sense because UTF-8 needs more than one byte to decode a non-ASCII character (here the single byte 0xC4 is the lead byte of a two-byte sequence), but we only have one. This is the behavior I expect from my Java code as well.
Pasting some common code:
private void read(Reader reader) throws IOException {
    CharBuffer buffer = CharBuffer.allocate(8910);
    buffer.flip();
    // move existing data to the front of the buffer
    buffer.compact();
    // pull in as much data as we can from the socket
    int charsRead = reader.read(buffer);
    // flip so the data can be consumed
    buffer.flip();
    ByteBuffer encode = Charset.forName("UTF-8").encode(buffer);
    byte[] body = new byte[encode.remaining()];
    encode.get(body);
    System.out.println(new String(body));
}
Here is my first approach using nio:
FileInputStream inputStream = new FileInputStream(new File("/tmp/malformed.log"));
read(Channels.newReader(inputStream.getChannel(), "UTF-8"));
This produces the following exception:
java.nio.charset.MalformedInputException: Input length = 1
    at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
    at java.io.Reader.read(Reader.java:100)
Which is not what I expected, but also kind of makes sense, because this is actually a corrupt and illegal file, and the exception is basically telling us it expected more bytes to be read.
And my second one (using regular java.io):
FileInputStream inputStream = new FileInputStream(new File("/tmp/malformed.log"));
read(new InputStreamReader(inputStream, "UTF-8"));
This does not fail and produces the exact same output as cat did:
�
Which also makes sense.
So my questions are:
What is the expected behavior from a Java Application in this scenario?
Why is there a difference between using Channels.newReader (which returns a StreamDecoder) and simply using the regular InputStreamReader? Am I doing something wrong with how I read?
Any clarifications would be much appreciated.
Thanks :)
The difference in behaviour actually goes right down to the StreamDecoder and Charset classes. The InputStreamReader gets a CharsetDecoder from StreamDecoder.forInputStreamReader(..), which does replacement on error:
StreamDecoder(InputStream in, Object lock, Charset cs) {
    this(in, lock,
         cs.newDecoder()
           .onMalformedInput(CodingErrorAction.REPLACE)
           .onUnmappableCharacter(CodingErrorAction.REPLACE));
}
while Channels.newReader(..) creates the decoder with the default settings (i.e. report instead of replace, which results in an exception further up):
public static Reader newReader(ReadableByteChannel ch, String csName) {
    checkNotNull(csName, "csName");
    return newReader(ch, Charset.forName(csName).newDecoder(), -1);
}
So they work differently, but there's no indication anywhere in the documentation about the difference. This is badly documented, but I suppose they chose the stricter default because you'd rather get an exception than have your data silently corrupted.
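If you want to pick the behavior explicitly rather than rely on either default, you can configure the CharsetDecoder yourself and wrap it in a Reader; a minimal sketch (the choice of REPLACE here is illustrative, use REPORT to get the exception instead):

import java.io.*;
import java.nio.charset.*;

// Build a decoder whose error actions are explicit, then hand it to a Reader.
CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
        .onMalformedInput(CodingErrorAction.REPLACE)
        .onUnmappableCharacter(CodingErrorAction.REPLACE);
Reader reader = new InputStreamReader(new FileInputStream("/tmp/malformed.log"), decoder);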
Be careful when dealing with character encodings!
I've looked around for answers to this (I'm sure they're out there), and I'm not sure it's possible.
So, I got a HUGE file that contains the word "för". I'm using RandomAccessFile because I know where it is (kind of) and can therefore use the seek() function to get there.
To know that I've found it I have a String "för" in my program that I check for equality. Here's the problem, I ran the debugger and when I get to "för" what I get to compare is "för".
So my program terminates without finding any "för".
This is the code I use to get a word:
private static String getWord(RandomAccessFile file) throws IOException {
    StringBuilder stb = new StringBuilder();
    String word;
    char c;
    c = (char) file.read();
    int end;
    do {
        stb.append(c);
        end = file.read();
        if (end == -1)
            return "-1";
        c = (char) end;
    } while (c != ' ');
    word = stb.toString().trim(); // trim() returns a new string; the result must be assigned
    return word;
}
So basically I return all the characters from the current point in the file up to the first ' ' character. So basically I get the word, but since (char) file.read() reads a single byte (I think), the UTF-8 'ö' becomes the two characters 'Ã' and '¶'?
One reason for this guess is that if I open my file with encoding UTF-8 it's "för" but if I open the file with ISO-8859-15 in the same place we now have exactly what my getWord method returns: "för"
So my question:
When I'm sitting with a "för" and a "för", is there any way to fix this? Like saying "read "för" as if it was an UTF-8 string" to get "för"?
If you have to use a RandomAccessFile you should read the content into a byte[] first and then convert the complete array to a String; something along the lines of:
byte[] buffer = new byte[whatever];
file.read(buffer);
String result = new String(buffer,"UTF-8");
This is only to give you a general impression of what to do; you'll have to add some length handling etc.
This will not work correctly if you start reading in the middle of a UTF-8 sequence, but so will any other method.
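A slightly fuller sketch with the length handling mentioned above (the offset/length parameters and the method name are illustrative):

// Read exactly 'len' bytes starting at 'offset' and decode them as UTF-8.
// readFully throws EOFException if the file ends before 'len' bytes are read.
static String readUtf8(RandomAccessFile file, long offset, int len) throws IOException {
    byte[] buffer = new byte[len];
    file.seek(offset);
    file.readFully(buffer);
    return new String(buffer, StandardCharsets.UTF_8);
}

The same caveat applies: if 'offset' lands in the middle of a multi-byte UTF-8 sequence, the first character will decode incorrectly.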
You are using RandomAccessFile.read(). This reads single bytes. UTF-8 sometimes uses several bytes for one character.
Different methods to read UTF-8 from a RandomAccessFile are discussed here: Java: reading strings from a random access file with buffered input
If you don't necessarily need a RandomAccessFile, you should definitely switch to reading characters instead of bytes.
If possible, I would suggest Scanner.next() which searches for the next word by default.
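For example, a minimal sketch (assuming the file is UTF-8 encoded; the file name is illustrative): a Scanner constructed with an explicit charset reads whitespace-delimited words and handles multi-byte characters correctly.

import java.io.File;
import java.util.Scanner;

Scanner sc = new Scanner(new File("huge.txt"), "UTF-8");
while (sc.hasNext()) {
    if (sc.next().equals("för")) {
        // the word arrives correctly decoded, so this comparison works
        break;
    }
}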
You can also reverse the mangling after the fact: take the bytes of the mis-decoded string in ISO-8859-15, then decode those bytes again as UTF-8.

import java.nio.charset.Charset;

String encodedString = new String(originalString.getBytes(Charset.forName("ISO-8859-15")),
                                  Charset.forName("UTF-8"));
At work, we have 5 RFID readers attached to a PC running Linux. The readers are all recognized as keyboards and send what they read from the chip as a sequence of key-input events. To be able to tell which reader sent which sequence, I do a raw read over /dev/input/XX and get their input that way.
The problem with this is that the keyboard events generated by the RFID readers are still "in" stdin, and when I try to read from System.in via Scanner (the input should come from a normal keyboard this time), I first get the "pending" input from the readers (which consists of 10 hexadecimal digits and a newline (\n)).
Now, the question is: how can I flush all this "pending" input from stdin and then read what I really want from the keyboard?
I tried:
System.in.skip(System.in.available());
But seek is not allowed on stdin (skip throws an IOException).
for (int i = 0; i < System.in.available(); i++) {
    System.in.read();
}
But available() doesn't estimate enough (still stuff in stdin afterwards).
Scanner scanner = new Scanner(System.in);
while (scanner.hasNextLine()) {
    scanner.nextLine();
}
System.out.println("Clean!");
But hasNextLine() never becomes false (the print never executes).
BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
String line;
while ((line = in.readLine()) != null);
System.out.println("Clean!");
Same as above.
Anyone with any more ideas?
Based on @Joni's advice, I put this together:
Scanner scanner = new Scanner(System.in);
int choice = 0;
while (scanner.hasNext()) {
    if (scanner.hasNextInt()) {
        choice = scanner.nextInt();
        break;
    } else {
        scanner.next(); // just discard this, not interested...
    }
}
This discards the data that is already "pending" in stdin and waits until valid data is entered. Valid, in this context, meaning a decimal integer.
This worked for me:
System.in.read(new byte[System.in.available()]);
A related one.
I read a double, then needed to read a string.
Below worked correctly:
double d = scanner.nextDouble();
scanner.nextLine(); // consumes the \n left after the number (int, double, etc.)
String s = scanner.nextLine();
There is no built-in portable way to flush the data in an input stream. If you know that the pending data ends with \n why don't you read until you find it?
Devices usually send data using a well defined protocol which you can use to parse data segments.
If I'm right, discard data that isn't properly formatted for the protocol. This allows you to filter out the data you aren't interested in.
As I'm not familiar with the RFID scanner you're using I can't be of more help, but this is what I suggest.
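A minimal sketch of the read-until-\n idea (assuming, as described above, that each pending reader line ends with a newline; the method name is illustrative):

// Discard buffered bytes up to and including the next '\n'.
// Call this while System.in.available() > 0 to drop every pending line.
static void discardPendingLine(InputStream in) throws IOException {
    int c;
    do {
        c = in.read();
    } while (c != -1 && c != '\n');
}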
You could do this with multiple threads.
Your real application reads from a PipedInputStream that is connected to a PipedOutputStream.
You need to have one thread reading from System.in continuously. As long as the real application is not interested in the data coming from System.in (indicated by a boolean flag), this thread discards everything that it reads. But when the real application sets the flag to indicate that it is interested in the data coming from System.in, this thread sends all the data that it reads to the PipedOutputStream.
Your real application turns on the flag to indicate that it is interested in the data, and clears the flag when it is no longer interested.
This way, the data from System.in is always automatically flushed/cleared. A sketch follows below.
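A minimal sketch of that arrangement (all names here are illustrative, not from the original post):

import java.io.*;
import java.util.concurrent.atomic.AtomicBoolean;

class StdinGate {
    final AtomicBoolean interested = new AtomicBoolean(false);
    private final PipedOutputStream out = new PipedOutputStream();
    final PipedInputStream appIn; // the real application reads from this

    StdinGate() throws IOException {
        appIn = new PipedInputStream(out);
        Thread pump = new Thread(() -> {
            try {
                int c;
                while ((c = System.in.read()) != -1) {
                    if (interested.get()) {
                        out.write(c); // forward to the application
                        out.flush();
                    }
                    // otherwise the byte is silently discarded
                }
            } catch (IOException ignored) {
                // stream closed; let the pump thread end
            }
        });
        pump.setDaemon(true);
        pump.start();
    }
}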
The best practice (that I've found) when dealing with terminals (aka the console) is to deal with I/O a line at a time. So the ideal thing to do is get the entire line of user input as a string, and then parse it as you see fit. Anything else is not only implementation-specific, but also prone to blocking.
Scanner sc = new Scanner(System.in);
String line = "";
while (true) {
    System.out.println("Enter something...");
    try {
        line = sc.nextLine();
        // parse the `line` string as you see fit here...
        break;
    } catch (Exception e) {}
}
I include the while & try/catch blocks so that the prompt will loop infinitely on invalid input.
You can try (Java 12+, where InputStream gained skipNBytes):
System.in.skipNBytes(System.in.available());
Hello, I am currently working with sockets and input/output streams. I have a strange problem with the loop I use to send bytes: for some reason it gets stuck trying to read from the input stream at the point where it is supposed to stop. Does anyone have any idea what's wrong?
int bit;
final byte[] request = new byte[1024];

if (response instanceof InputStream)
{
    while ((bit = response.read(request)) > 0) { // <-- stuck here
        incoming.getOutputStream().write(request, 0, bit);
        incoming.getOutputStream().flush();
    }
}
incoming.close();
InputStream.read blocks until input data is available, end of file is detected, or an exception is thrown.
You don't catch the exception, and don't check for EOF.
What I've done in the past to keep each side open is to add a termination character to the end of each message, one that you wouldn't expect to see in the message itself. If you are building the messages yourself, you could use a character such as ';' or maybe a double pipe '||'. Then just check for that character on the receiving end. It's a workaround, not a solution; it was necessary in my case but may not be for you.
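A minimal sketch of that framing (the choice of ';' as terminator and the method name are illustrative):

import java.io.*;

// Read one message, stopping at the terminator instead of waiting for EOF,
// so the connection can stay open for the next message.
static String readMessage(InputStream in) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    int c;
    while ((c = in.read()) != -1) {
        if (c == ';') break; // terminator reached; message complete
        buf.write(c);
    }
    return buf.toString("UTF-8");
}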