How to detect if a windows named pipe has been closed? - java

I've got a third party program which puts data into a windows named pipe.
I access the pipe with
String pipename = "\\\\.\\pipe\\the_pipe";
RandomAccessFile pipe = new RandomAccessFile(pipename, "r");
DataInputStream input = new DataInputStream(Channels.newInputStream(pipe.getChannel()));
So sometimes someone gets the 'perfect' idea to close the third party program before my small data converting tool, which of course closes the pipe. When that happens, my tool writes the last message it received into the result file a million times and happily fills every HDD to the last byte within hours, because I am not able to check whether the pipe has been closed.
Things I've tried:
// checking file descriptor and file channel
if(!(pipe.getFD().valid() && pipe.getChannel().isOpen())) {
// quit
}
But neither check triggers when the pipe is closed.
Is there another way to access named pipes where such information can be obtained?
Or have I overlooked something?

When that happens, my tool writes down the last message received million times into the resultfile
Only if your tool ignores EOFExceptions and -1 return values from read().
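For example, a read loop along these lines (a sketch built on the same RandomAccessFile setup as in the question; convert() stands in for your converter logic) terminates cleanly once the writing side closes the pipe:
RandomAccessFile pipe = new RandomAccessFile("\\\\.\\pipe\\the_pipe", "r");
DataInputStream input = new DataInputStream(Channels.newInputStream(pipe.getChannel()));
byte[] buffer = new byte[4096];
try {
    int n;
    // read() returns -1 once the writing side has closed the pipe
    while ((n = input.read(buffer)) != -1) {
        convert(buffer, n);   // hypothetical hand-off to the converter
    }
} catch (EOFException e) {
    // readFully()/readInt() style calls report the closed pipe this way instead
} finally {
    input.close();
}
// reaching this point means the pipe is gone: stop writing to the result file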

I would recommend looking at JNA to access the named pipe directly. Yes, you can detect the pipe closure if you use JNA.
I have done so in a project called NuProcess available on github.com. Particularly, look at the com.zaxxer.nuprocess.windows package. Look at the WindowsProcess.createPipes() method for setting up pipes (the code creates both ends, you only need one).
On the read side, NuProcess is using Windows IOCompletionPorts for asynchronous I/O (in ProcessCompletions.java), which may be overkill for what you need (or not). But once you get your feet wet setting up the pipes, you should be able to figure it out from there reading the Microsoft API docs.
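For reference, a minimal JNA sketch of the blocking (non-IOCompletionPort) variant could look like the following. This assumes the jna-platform artifact; the exact ReadFile signature may vary between JNA versions, so treat it as a starting point rather than working code:
import com.sun.jna.platform.win32.Kernel32;
import com.sun.jna.platform.win32.WinBase;
import com.sun.jna.platform.win32.WinError;
import com.sun.jna.platform.win32.WinNT;
import com.sun.jna.ptr.IntByReference;

WinNT.HANDLE pipe = Kernel32.INSTANCE.CreateFile(
        "\\\\.\\pipe\\the_pipe",
        WinNT.GENERIC_READ,
        0,                  // no sharing
        null,               // default security attributes
        WinNT.OPEN_EXISTING,
        0,
        null);
if (WinBase.INVALID_HANDLE_VALUE.equals(pipe)) {
    throw new IllegalStateException("CreateFile failed: " + Kernel32.INSTANCE.GetLastError());
}
byte[] buffer = new byte[4096];
IntByReference bytesRead = new IntByReference();
boolean broken = false;
while (!broken) {
    if (Kernel32.INSTANCE.ReadFile(pipe, buffer, buffer.length, bytesRead, null)) {
        // bytesRead.getValue() bytes of buffer are valid here
    } else if (Kernel32.INSTANCE.GetLastError() == WinError.ERROR_BROKEN_PIPE) {
        broken = true;      // the writing side has closed the pipe
    } else {
        throw new IllegalStateException("ReadFile failed: " + Kernel32.INSTANCE.GetLastError());
    }
}
Kernel32.INSTANCE.CloseHandle(pipe);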

Related

How is End of File actually detected in Java?

I never thought about it.
But if you read a file you can, for example, use this code.
FileReader fileReader = new FileReader("c:\\data\\input-text.txt");
int data = fileReader.read();
while (data != -1) {
    // process data here
    data = fileReader.read();
}
fileReader.close();
But how is it actually recognised that the file has ended? Is it because the operating system knows the size of the file, or is there a special character? I think Java will call some C/C++ function of the operating system and this function will return -1, so Java knows the end of file has been reached. But how does the operating system know that the end of the file has been reached? Which special character is used for this?
How is End of File actually detected in Java?
Java doesn't detect it. The operating system does.
The meaning of end-of-file depends on the nature of the "file" that you are reading.
If the file is a regular file in a file system, then the operating system knows or can find out what the actual file size is. It is part of the file's metadata.
If the file is a Socket stream, then end-of-file means that all available data has been consumed, and the OS knows that there cannot be any more. Typically, the socket has been closed or half closed.
If the file is a Pipe, then end-of-file means that the other end of the Pipe has closed it, and there will be no more data.
If the file is a Linux/UNIX device file, then the precise end-of-file meaning will be device dependent. For example, if the device is a "tty" device on Linux/UNIX, it could mean:
the modem attached to the serial line has dropped out
the tty was in "cooked" mode and received the character that denotes EOF
and possibly other things.
It is common for a command shell to provide a way to signal an "end of file". Depending on the implementation, it may implement this itself, or it may be implemented at the device driver level. In either case, Java is not involved in the recognition.
I think Java will call some C/C++ function of the operating system and this function will return -1, so Java knows the end of file has been reached.
On Linux / UNIX / MacOS, the Java runtime calls the native read(fd, buffer, count) library function. That returns 0 when the fd is at the end-of-file position, and the Java I/O layer translates this into the -1 you see from read().
I see the chances of the most popular file systems like ext and NTFS using a delimiter/special character to mark the end of data as very slim. This is because files often have to store binary information rather than just text data, and if the delimiter appeared within that data it could easily confuse the OS. In Linux, the VFS (Virtual Filesystem Layer) offloads these details to the implementations themselves, and most of them construct a unique inode (a sort of metadata record) for every file resident in the filesystem. Inodes tend to hold information on the blocks where the data is stored and also the exact size of the file, among other things. Detecting EOF becomes trivial when you have those.

How file manipulations perform during power outage

Linux machine, Java standalone application
I am having the following situation:
I have:
a file write (which creates the destination file and writes some content to it) followed by a file move.
I also have a power outage problem, which instantly cuts off the power of the computer during these operations.
As a result, the file is created and it is moved as well, but its content is empty.
The question is: what under the hood can be causing this exact outcome? Given the timing, maybe the hard drive is disabled before the processor and RAM during the cut-out, but in that case how is it possible that the file is created and then moved, while the write before the move does not succeed?
I tried catching and logging the exception and debug information, but the problem is that the power outage disables the logging abilities (I/O) as well.
try {
FileUtils.writeStringToFile(file, JsonUtils.toJson(object));
} finally {
if (file.exists()) {
FileUtils.moveFileToDirectory(file, new File(path), true);
}
}
Linux file systems don't necessarily write things to disk immediately, or in exactly the order that you wrote them. That includes both file content and file / directory metadata.
So if you get a power failure at the wrong time, you may find that the file data and metadata is inconsistent.
Normally this doesn't matter. (If the power fails and you don't have a UPS, the applications go away without getting a chance to finish what they were doing.)
However, if it does matter, you can force the file to "sync" before you move it:
FileOutputStream fos = ...
// write to file
fos.getFD().sync();
fos.close();
// now move it
You need to read the javadoc for sync() carefully to understand what the method actually does.
You also need to read the javadoc for the method you are using to move the file regarding atomicity.
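Putting both points together, here is a sketch (paths and content are placeholders, not the code from the question) that syncs the data and then asks for an atomic move, which throws if the file system cannot provide one:
import java.io.FileOutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

String json = "{}";                        // placeholder for the serialized object
Path tmp = Paths.get("/data/out.json.tmp");
Path target = Paths.get("/data/done/out.json");

try (FileOutputStream fos = new FileOutputStream(tmp.toFile())) {
    fos.write(json.getBytes(StandardCharsets.UTF_8));
    fos.getFD().sync();                    // force file content to disk before the move
}
Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);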

Will positioned read or seek() from HDFS file load and ignore whole content of the file?

I want to read sub-content of a big file starting from some offset/position.
For example I have a file of 1M lines and I want to read 50 lines starting from 100th. (line no: 101 to 150 - both inclusive)
I think I should be using PositionalReadable.
https://issues.apache.org/jira/browse/HADOOP-519
I see that FSInputStream.readFully actually uses seek() method of Seekable.
When I check the underlying implementation of seek() I see that it uses BlockReader.skip()
Wouldn't blockReader.skip() read all the data up to that position in order to skip the bytes? The question is: would HDFS load the first 100 lines as well just to get to the 101st line?
How can I make the position point to any desired offset in the file, like the 10000th line, without loading the rest of the content? Something like what S3 offers with header offsets.
Here is a similar question I found: How to read files with an offset from Hadoop using Java, but it suggests using seek(), and the comments argue that seek() is an expensive operation that should be used sparingly. Which I guess is correct, because seek seems to read all the data in order to skip to the position.
The short answer is that seek() may or may not actually read the data skipped by skip(n), depending on which BlockReader is in use.
As you said, seek() internally calls BlockReader.skip(). BlockReader is an interface type and is created via BlockReaderFactory(). The BlockReader implementation that gets created is either BlockReaderRemote or BlockReaderLocal. (Strictly speaking, ExternalBlockReader is also possible, but it is excluded here because it is a special case.)
BlockReaderRemote is used when a client reads data from a remote DataNode over the network (RPC over TCP). In this case, if you analyze the skip() method code, you can see that readNextPacket is called repeatedly until the n bytes to be skipped have been read. That is, it actually reads the data that is being skipped.
BlockReaderLocal is used when the client is on the same machine as the DataNode where the block is stored. In this case the client can read the block file directly, and skip() simply changes dataPos so that the next read operation starts at the new offset.
+Additional information (2023.01.19)
The above applies to both Hadoop 3.x.x and 2.x.x, but the path and name of the implementations changed in version 2.8.0 due to a change in the project structure.
< 2.8.0
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java
>= 2.8.0
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/impl/BlockReaderLocal.java
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/impl/BlockReaderRemote.java
Related Jira issues
https://issues.apache.org/jira/browse/HDFS-8057
https://issues.apache.org/jira/browse/HDFS-8925
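In practice, if you just need to position the stream at a known byte offset and read from there, the public client API is enough; a sketch (the path and offset are placeholders, and note that seek() works on byte offsets, so to start at line 101 you still have to know or index where that line begins):
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

long byteOffset = 1_000_000L;              // assumed: byte offset where you want to start
FileSystem fs = FileSystem.get(new Configuration());
try (FSDataInputStream in = fs.open(new Path("/data/big-file.txt"))) {
    in.seek(byteOffset);                   // position the stream before reading
    BufferedReader reader = new BufferedReader(
            new InputStreamReader(in, StandardCharsets.UTF_8));
    reader.readLine();                     // likely a partial line after an arbitrary seek; discard it
    String line;
    while ((line = reader.readLine()) != null) {
        // process the lines you are interested in, stop when you have enough
    }
}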
I recommend you look at the SequenceFile format; maybe it will suit your needs.
We use seek() to read from an arbitrary place in a file.
https://hadoop.apache.org/docs/r2.8.2/hadoop-project-dist/hadoop-common/api/org/apache/hadoop/io/SequenceFile.Reader.html#seek(long)
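A minimal sketch of that (the key/value types and the offset are assumptions; seek() only accepts positions previously returned by getPosition(), while sync() jumps to the next record boundary after an arbitrary offset):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

Configuration conf = new Configuration();
try (SequenceFile.Reader reader = new SequenceFile.Reader(
        conf, SequenceFile.Reader.file(new Path("/data/records.seq")))) {
    reader.sync(1_000_000L);               // skip to the first sync point after this offset
    LongWritable key = new LongWritable(); // assumed key type
    Text value = new Text();               // assumed value type
    while (reader.next(key, value)) {
        // process key/value, stop when you have read enough records
    }
}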

Entering data into an InputStream

I know that InputStreams are for reading and OutputStreams are for writing... but if I have an application that passes all data from an InputStream to the remote side and pushes all data received from that remote side to the OutputStream, and I need to send dynamic data to that remote side... how would I enter it into the InputStream? I can do this easily from the Java console, since anything typed in is put into System.in and sent to the remote side, and anything coming back is processed through System.out, but obviously I cannot use a Java console in production. How would I emulate this functionality, e.g. create a button that sends "command X\r" as if it were typed into the Java console?
Note: for background, I'm using JSch to SSH into a Cisco ASA, where I have set Channel.setInputStream(System.in) and Channel.setOutputStream(System.out) to communicate through the console successfully.
I am not familiar with JSch, and I suspect you have this backwards. First, according to their example, you should actually be executing commands with Channel.setCommand(). Then you can use Channel.getInputStream() to obtain a stream that you can read the remote response from.
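A minimal sketch of that approach (assuming an already connected JSch Session called session; the command string is just an example):
import com.jcraft.jsch.ChannelExec;
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;

ChannelExec channel = (ChannelExec) session.openChannel("exec");
channel.setCommand("command X");
InputStream in = channel.getInputStream();   // obtain the stream before connect()
channel.connect();

BufferedReader reader = new BufferedReader(new InputStreamReader(in));
String line;
while ((line = reader.readLine()) != null) {
    System.out.println(line);                // the remote side's response
}
channel.disconnect();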
That aside, a cursory glance at the documentation seems to suggest that you should use the channel's existing streams and read to / write from them, e.g.:
OutputStream out = channel.getOutputStream();
String str = "command X\r";
out.write(str.getBytes("us-ascii"));
This would make more sense and is much easier to deal with on your end.
However, as to the general question regarding InputStreams: You can use any InputStream as a source for data. It just so happens that System.in is one that comes from standard input (which is essentially a file).
If you want to use data constructed on the fly, you could use a ByteArrayInputStream, e.g.:
String str = "command X\r";
InputStream in = new ByteArrayInputStream(str.getBytes("us-ascii"));
// do stuff with in
You can use any character encoding you want if us-ascii is not appropriate.
But, again, I suspect you are doing this slightly backwards.
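That said, if you really do need an InputStream you can push data into on the fly (say, from a button handler), a PipedInputStream/PipedOutputStream pair is one option; a sketch (keep the writer and reader on different threads, since piped streams block when the pipe buffer is full or empty):
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

PipedOutputStream commandSource = new PipedOutputStream();
PipedInputStream channelInput = new PipedInputStream(commandSource);

// hand channelInput to whatever consumes the stream,
// e.g. channel.setInputStream(channelInput)

// later, from the button handler (a different thread):
commandSource.write("command X\r".getBytes("us-ascii"));
commandSource.flush();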

Appropriate time to open stream during decompression of a file to be passed to a worker pool

This may be more of a theoretical question. I have a scenario wherein there is a compressed file (~2 GB) that gets decompressed into a larger file (~22GB). This process takes roughly 20 minutes, which in turn means I am wasting about 19 minutes and 59 seconds every time this process gets run. My question is the following: Is it possible to open up a stream from the file that is being decompressed and pass the information to a separate program that will manipulate the data? Essentially every line in the file is a record, but I have been unable to find a technique to discover when a line has been fully decoded during decompression. General algorithms or Java libraries are of value.
You can use java.util.zip's GZIPInputStream to read the gzip file sequentially. Then you can implement your own buffering and extract lines, or use BufferedReader with the readLine method.
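For example (a sketch, assuming a gzip-compressed text file; the file name and the hand-off method are placeholders), lines become available as soon as their compressed blocks have been read, so the worker pool can start long before decompression finishes:
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.util.zip.GZIPInputStream;

try (BufferedReader reader = new BufferedReader(new InputStreamReader(
        new GZIPInputStream(new FileInputStream("records.gz"))))) {
    String line;
    while ((line = reader.readLine()) != null) {
        handRecordToWorker(line);   // hypothetical hand-off to the worker pool
    }
}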
Yes, this is easy. In unix, you might do
bzcat compressedfile.bz2 | mainprogram
Then your mainprogram can read the decompressed stream on standard input. Similar command-line programs exist for zip and gzip.
If the main program needs to read from a file instead of standard input, use a named pipe.
If you're on Windows, there may or may not be similar tools.
