I'm writing scripts that will run in parallel and will get their input data from the same file. These scripts will open the input file, read the first line, store it for further treatment and finally erase this read line from the input file.
Now the problem is that multiple scripts accessing the file can lead to the situation where two scripts access the input file simultaneously and read the same line, which produces the unacceptable result of the line being processed twice.
One solution is to write a lock file (.lock_input) before accessing the input file and erase it when releasing the input file, but this is not appealing in my case because NFS sometimes slows down network communication randomly and may not provide reliable locking.
Another solution is to use a process as the lock instead of writing a file: the first script to access the input file launches a process called lock_input, and the other scripts run ps -elf | grep lock_input. If it is present in the process list, they wait. This may be faster than writing to NFS, but it is still not a perfect solution ...
So my question is: Is there any bash command (or other script interpreter) or a service I can use that will behave like semaphore or mutex locks used for synchronization in thread programming?
Thank you.
Small rough example:
Let's say we have input_file as following:
Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
Sunday
Treatment script : TrScript.sh
#!/bin/bash
# Pops lines off input_file one at a time. Nothing here is locked, so
# parallel copies can read the same first line (the race described above).
NbLines=$(wc -l < input_file)
while [ "$NbLines" -ne 0 ]
do
    FirstLine=$(head -n 1 input_file)
    echo "Hello World today is $FirstLine"
    # Keep everything except the first line.
    tail -n +2 input_file > tmp
    mv tmp input_file
    NbLines=$(wc -l < input_file)
done
Main script:
#! /bin/bash
./TrScript.sh &
./TrScript.sh &
./TrScript.sh &
wait
The result should be:
Hello World today is Monday
Hello World today is Tuesday
Hello World today is Wednesday
Hello World today is Thursday
Hello World today is Friday
Hello World today is Saturday
Hello World today is Sunday
Use
line=`flock $lockfile -c "(gawk 'NR==1' < $infile ; gawk 'NR>1' < $infile > $infile.tmp ; mv $infile.tmp $infile)"`
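to access the file you want to read from. This still relies on file locks, though.
gawk 'NR==1' < ... prints the first line of the input; gawk 'NR>1' writes the remaining lines to a temporary file, which mv then puts in place of the original.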
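Here is a minimal sketch of the same idea with the lock held on a file descriptor, assuming flock(1) from util-linux is installed. The names pop_line and process_all and the default ".lock" suffix are made up for illustration, and whether flock is reliable on a given NFS mount is something to check separately.

```shell
#!/bin/sh
# pop_line FILE [LOCK]: print the first line of FILE and delete it from FILE,
# all while holding an exclusive flock(1) lock. Returns 1 once FILE is empty.
# The function name and the default ".lock" suffix are illustrative assumptions.
pop_line() {
    infile=$1
    lock=${2:-$infile.lock}
    (
        flock -x 9 || exit 1            # wait until this process owns the lock
        [ -s "$infile" ] || exit 1      # no lines left to hand out
        head -n 1 "$infile"
        tail -n +2 "$infile" > "$infile.tmp" && mv "$infile.tmp" "$infile"
    ) 9>"$lock"
}

# Worker loop: keeps popping lines until the file is empty.
process_all() {
    while line=$(pop_line "$1"); do
        echo "Hello World today is $line"
    done
}
```

TrScript.sh could then be reduced to a call such as process_all input_file. The lock is released when the subshell exits and file descriptor 9 is closed, so a worker that dies does not leave a stale lock behind.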
I have always liked the lockfile program (see the lockfile man page) from the procmail set of tools (it should be available on most systems, though it might not be installed by default).
It was designed to lock mail spool files, which are (were?) commonly mounted via NFS, so it does work properly over NFS (as much as anything can).
Also, as long as you are making the assumption that all your ‘workers’ are on the same machine (by assuming you can check for PIDs, which may not work properly when PIDs eventually wrap), you could put your lock file in some other, local, directory (e.g. /tmp) while processing files hosted on an NFS server. As long as all the workers use the same lock file location (and a one-to-one mapping of lockfile filenames to locked pathnames), it will work fine.
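A rough sketch of that arrangement, assuming procmail's lockfile is installed and that /tmp is local to the machine. The helper names lock_path_for and with_lockfile, and the retry settings, are assumptions chosen for illustration.

```shell
#!/bin/sh
# Map a data file path to a lock file in a local directory (assumed: /tmp).
# Slashes become underscores; two unusual paths could map to the same lock,
# which is still safe but makes them wait on each other.
lock_path_for() {
    printf '/tmp/%s.lock\n' "$(printf '%s' "$1" | tr '/' '_')"
}

# with_lockfile FILE COMMAND...: run COMMAND while holding FILE's lock.
with_lockfile() {
    target=$1
    shift
    lock=$(lock_path_for "$target")
    # -1: retry every second; -r 30: give up after 30 retries.
    lockfile -1 -r 30 "$lock" || return 1
    "$@"
    status=$?
    rm -f "$lock"                       # releasing the lock means deleting the file
    return "$status"
}
```

For example, with_lockfile /mnt/nfs/input_file head -n 1 /mnt/nfs/input_file (a hypothetical path) reads the first line while other workers using the same helper wait.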
Using the FLOM (Free LOck Manager) tool, your main script can become as simple as:
#!/bin/bash
flom -- ./TrScript.sh &
flom -- ./TrScript.sh &
flom -- ./TrScript.sh &
wait
if you are running the script inside a single host and something like:
flom -A 224.0.0.1 -- ./TrScript.sh &
if you want to distribute your script on many hosts. Some usage examples are available at this URL: http://sourceforge.net/p/flom/wiki/FLOM%20by%20examples/
I've a JavaFX application where I've a list of a bunch of script files. Once the application loads, it reads the list and checks which ones are running.
To do that I use a ProcessHandle, as mentioned in various examples here on StackOverflow and other guides/tutorials on the internet.
The problem is, it never finds any of them. Therefore I programmatically started one, which I know for a fact will be running, via Process process = new ProcessBuilder("/path/to/file/my_script.sh").start(); - and it won't find this one either.
Contents of my_script.sh:
#!/bin/bash
echo "Wait for 5 seconds"
sleep 5
echo "Completed"
Java code:
// List of PIDs which correspond to the processes shown after "INFO COMMAND:"
System.out.println("ALL PROCESSES: " + ProcessHandle.allProcesses().toList());
Optional<ProcessHandle> scriptProcessHandle = ProcessHandle.allProcesses().filter(processHandle -> {
System.out.println("INFO COMMAND: " + processHandle.info().command());
Optional<String> processOptional = processHandle.info().command();
return processOptional.isPresent() && processOptional.get().equals("my_script.sh");
}).findFirst();
System.out.println("Script process handle is present: " + scriptProcessHandle.isPresent());
if (scriptProcessHandle.isPresent()) { // Always false
// Do stuff
}
Thanks to the good old fashioned System.out.println(), I noticed that I get this in my output console every time:
ALL PROCESSES: [1, 2, 28, 85, 128, 6944, 21174, 29029, 29071]
INFO COMMAND: Optional[/usr/bin/bwrap]
INFO COMMAND: Optional[/usr/bin/bash]
INFO COMMAND: Optional[/app/idea-IC/jbr/bin/java]
INFO COMMAND: Optional[/app/idea-IC/bin/fsnotifier]
INFO COMMAND: Optional[/home/username/.jdks/openjdk-17.0.2/bin/java]
INFO COMMAND: Optional[/usr/bin/bash]
INFO COMMAND: Optional[/home/username/.jdks/openjdk-17.0.2/bin/java]
INFO COMMAND: Optional[/home/username/.jdks/openjdk-17.0.2/bin/java]
INFO COMMAND: Optional[/usr/bin/bash]
Script process handle is present: false
The first line in the Javadoc of ProcessHandle.allProcesses() reads:
Returns a snapshot of all processes visible to the current process.
So how come I can't see the rest of the operating system's processes?
I'm looking for a non-os-dependent solution, if possible. Why? For better portability and hopefully less maintenance in the future.
Notes:
A popular solution for GNU/Linux seems to be to check the proc entries, but I don't know if that would work for at least the majority of the most popular distributions - if it doesn't, adding support for them in a different way, would create more testing and maintenance workload.
I'm aware of ps, windir, tasklist.exe possible solutions (worst comes to worst).
I found the JavaSysMon library but it seems dead and unfortunately:
CPU speed on Linux only reports correct values for Intel CPUs
Edit 1:
I'm on Pop_OS! and installed IntelliJ via the PopShop as flatpak.
In order to start it as root as suggested by mr mcwolf, I went to /home/username/.local/share/flatpak/app/com.jetbrains.IntelliJ-IDEA-Community/x86_64/stable/active/export/bin and found com.jetbrains.IntelliJ-IDEA-Community file.
When I run sudo ./com.jetbrains.IntelliJ-IDEA-Community or sudo /usr/bin/flatpak run --branch=stable --arch=x86_64 com.jetbrains.IntelliJ-IDEA-Community in my terminal, I get error: app/com.jetbrains.IntelliJ-IDEA-Community/x86_64/stable not installed
So I opened the file and ran its contents:
exec /usr/bin/flatpak run --branch=stable --arch=x86_64 com.jetbrains.IntelliJ-IDEA-Community "$@"
This opens IntelliJ, but not as root, so instead I ran:
exec sudo /usr/bin/flatpak run --branch=stable --arch=x86_64 com.jetbrains.IntelliJ-IDEA-Community "$@"
Which prompts for a password and when I write it in, the terminal crashes.
Edit 1.1:
(╯°□°)╯︵ ┻━┻ "flatpak run" is not intended to be ran with sudo
Edit 2:
As mr mcwolf said, I downloaded the IntelliJ from the official website, extracted it and ran the idea.sh as root.
Now a lot more processes are shown. 1/3 of them show up as INFO COMMAND: Optional.empty.
scriptProcessHandle.isPresent() is still unfortunately returning false. I searched through them and my_script.sh is nowhere to be found. I also tried processOptional.isPresent() && processOptional.get().equals("/absolute/path/to/my_script.sh") but I still get false on isPresent() and it's not in the list of shown processes.
Though the last sentence might be a different problem. I'll do more digging.
Edit 3:
Combining .commandLine() and .contains() (instead of .equals()) solves the problem mentioned in "Edit 2".
Optional<ProcessHandle> scriptProcessHandle = ProcessHandle.allProcesses().filter(processHandle -> {
System.out.println("INFO COMMAND LINE: " + processHandle.info().commandLine());
Optional<String> processOptional = processHandle.info().commandLine();
return processOptional.isPresent() && processOptional.get().contains("/absolute/path/to/my_script.sh");
}).findFirst();
System.out.println("Script process handle is present: " + scriptProcessHandle.isPresent());
if (scriptProcessHandle.isPresent()) { // Returns true
// Do stuff
}
.commandLine() also shows script arguments, so that must be kept in mind.
I want to read data stored locally by the Apple Calendar app on my Mac (12.1 Monterey).
The data is stored in subdirectories of ~/Library/Calendars/ with one subdirectory per calendar.
The problem: When I try to get a list of files from there, Java returns null:
String userHomeDir = System.getProperty("user.home");
File calendarRoot = new File(userHomeDir + "/Library/Calendars/");
File[] calendars = calendarRoot.listFiles();
System.out.println("Number of files: " + calendars.length); // NPE thrown here
File permissions are as follows:
~/Library: drwx------+ (owner: my user)
~/Library/Calendars: drwxr-xr-x@ (owner: my user)
Listing files in Library works fine.
How can I access that folder?
Short answer
Give it up. Apple has made it next to impossible to elegantly get a Java app to read calendar data.
Long answer
For some versions now (since Catalina?), the directory ~/Library/Calendars/ and all subdirectories (and the files therein) have been protected by macOS using extended attributes, namely com.apple.quarantine.
It used to be possible to grant applications the specific right to access calendar data using System Settings - Security and Privacy - Privacy - Calendar. However, the manual "+" button is gone now.
What I will do is use some zsh script to export the desired calendar events to another directory and remove the com.apple.quarantine attribute from there, too.
This is not elegant and leaves the Java world, but for my case, having a Java command line application being started from a designated shell script, it works rather nicely.
Here's what I came up with:
#!/bin/zsh
calendars="/Users/yourUserName/Library/Calendars"
target="/Users/yourUserName/some/other/directory/Calendar_Export"
cd ${calendars}
calsource=""
for f in *.calendar
do
linesFound=`grep -c '<string>Your Calendar Name</string>' ${f}/Info.plist`
if [[ ${linesFound} -eq 1 ]]
then
echo "The relevant calendar resides at " ${f}", copying all events"
calsource=${calendars}/${f}/Events
fi
done
if [[ ${calsource} != "" ]]
then
rm ${target}/*
cp ${calsource}/* ${target}/
xattr -d com.apple.quarantine ${target}/*
fi
I use Google Closure Compiler to compile JavaScript automatically from PHP (it has to be done in PHP; however, there are no security limitations on the Windows machine). I wrote a simple PHP script that starts a process, passes the .js content to stdin and receives the recompiled .js via stdout. It works fine; the problem is that compiling, for example, 40 .js files takes almost 2 minutes even on a powerful machine. The major delay comes from Java starting a new instance of the .jar app for every script. Is there any way to modify the script below so that the process is created only once and .js content is sent and received multiple times before the process ends?
function compileJScript($s) {
$process = proc_open('java.exe -jar compiler.jar', array(
0 => array("pipe", "r"), 1 => array("pipe", "w")), $pipes);
if (is_resource($process)) {
fwrite($pipes[0], $s);
fclose($pipes[0]);
$output = stream_get_contents($pipes[1]);
fclose($pipes[1]);
if (proc_close($process) == 0) // If fails, keep $s intact
$s = $output;
}
return $s;
}
I can see several options, but I don't know whether they are possible or how to do them:
Create the process once and recreate only the pipes for every file
Force Java to keep the JIT-compiled .jar in memory for much faster re-execution
If PHP can't do it, use a bridge (another .exe file that starts fast every time, forwards stdin/stdout and redirects them to a running compiler, if something like this even exists)
This is really a matter of coordination between the two processes.
Here is a quick 10-minute script I wrote (just for fun) that launches a JVM and sends it an integer value, which Java parses and returns incremented, and which PHP then sends back ad infinitum.
PHP.php
<?php
echo 'Compiling..', PHP_EOL;
system('javac Java.java');
echo 'Starting JVM..', PHP_EOL;
$pipes = null;
$process = proc_open('java Java', [0 => ['pipe', 'r'],
1 => ['pipe', 'w']], $pipes);
if (!is_resource($process)) {
exit('ERR: Cannot create java process');
}
list($javaIn, $javaOut) = $pipes;
$i = 1;
while (true) {
fwrite($javaIn, $i); // <-- send the number
fwrite($javaIn, PHP_EOL);
fflush($javaIn);
$reply = fgets($javaOut); // <-- blocking read (fgetss() was removed in PHP 8.0)
$i = intval($reply);
echo $i, PHP_EOL;
sleep(1); // <-- wait 1 second
}
Java.java
import java.util.Scanner;
class Java {
public static void main(String[] args) {
Scanner s = new Scanner(System.in);
while (s.hasNextInt()) { // <-- blocking read
int i = s.nextInt();
System.out.print(i + 1); // <-- send it back
System.out.print('\n');
System.out.flush();
}
}
}
To run the script simply put those files in the same folder and do
$ php PHP.php
you should start seeing the numbers being printed like:
1
2
3
.
.
.
Note that while those numbers are printed by PHP, they are actually generated by Java
I don't think #1 from your list is possible because compiler.jar would need to have native support for keeping the process alive, which it doesn't (and if you consider that a compression algorithm needs the entire input before it can start processing data, it makes sense that the process doesn't stay alive).
According to the question "Anyway to Boost java JVM Startup Speed?", some people have been able to reduce their JVM startup times with Nailgun:
Nailgun is a client, protocol, and server for running Java programs
from the command line without incurring the JVM startup overhead.
Programs run in the server (which is implemented in Java), and are
triggered by the client (written in C), which handles all I/O.
I'm trying to write a Groovy script that wraps another command and am having trouble with the stdout/stderr order. My script is below:
#!/usr/bin/env groovy
synchronized def output = ""
def process = "qrsh ${args.join(' ')}".execute()
def outTh = Thread.start {
process.in.eachLine {
output += it
System.out.println "out: $it"
}
}
def errTh = Thread.start {
process.err.eachLine {
output += it
System.err.println "err: $it"
}
}
outTh.join()
errTh.join()
process.waitFor()
System.exit(process.exitValue())
My problem is that the output doesn't appear on the terminal in the correct order. Below is the wrapper's output.
[<cwd>] wrap.groovy -cwd -V -now n -b y -verbose ant target
waiting for interactive job to be scheduled ...
Your interactive job 2831303 has been successfully scheduled.
Establishing builtin session to host <host> ...
Buildfile: build.xml
BUILD FAILED
Target "target" does not exist in the project "null".
Total time: 0 seconds
Your job 2831303 ("wrap.groovy") has been submitted
Below is the unwrapped command output.
[<cwd>] qrsh -cwd -V -now n -b y -verbose ant target
Your job 2831304 ("ant") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 2831303 has been successfully scheduled.
Establishing builtin session to host host ...
Buildfile: build.xml
BUILD FAILED
Target "target" does not exist in the project "null".
Total time: 0 seconds
Why does the "Your job has been submitted" message appear as the first line in one case and the last line in another? I'm guessing it's related to Java libraries, not Groovy.
This is because of buffering. The threads which read stdout and stderr will not process the output the moment it is written by the child process. Instead, both streams are buffered, so your process won't see anything unless the child flushes the streams.
When the data is on the way, which thread gets the CPU first? There is no way to tell. Even if the data for stderr arrives a few milliseconds before stdout, if the stdout thread has the CPU right now, it will get its data first.
What you could do is use Java NIO (channels) and a single thread and first process all output from stderr but that still wouldn't guarantee that the order is preserved. Because of the buffering between child and parent process, you could get 4KB of text from one stream before you see a single byte of the other.
Java can merge the two streams for you: ProcessBuilder.redirectErrorStream(true) sends the child's stderr into the same pipe as its stdout, so the parent reads a single stream. On Unix, you could also run the command with sh -c 'cmd 2>&1'. That would redirect stderr to stdout. In the parent process, you could then just read stdout and ignore stderr.
The same works for OS X (since it's Unix based). On Windows, you could install Perl or a similar tool to run the process; that allows you to mess with the file descriptors.
PS: Pray that args never contains spaces. String.execute() is a really bad way to run a process; use java.lang.ProcessBuilder instead.
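The effect of the 2>&1 redirection suggested in this answer can be checked in a plain shell. This is only a sketch: child is a hypothetical stand-in for the real command, and it writes each line immediately. A program that buffers its own stdout internally can still emit lines out of order.

```shell
#!/bin/sh
# A stand-in child (hypothetical) that alternates between stdout and stderr.
child() {
    echo "out 1"
    echo "err 1" >&2
    echo "out 2"
    echo "err 2" >&2
}

# Merge stderr into stdout at the source: both now share one pipe,
# so the reader sees the lines in the order they were written.
merged=$(child 2>&1)
printf '%s\n' "$merged"
```

With two separate pipes the reader has no way to tell which line was written first. With one shared pipe the order is fixed when the child writes.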
Try calling System.out.flush() after your println. If I am right, the messages are appearing in different orders because System.out is being buffered.
I have a bat file with the following contents:
set logfile= D:\log.txt
java com.stuff.MyClass %1 %2 %3 >> %logfile%
when I run the bat file though, I get the following:
C:\>set logfile= D:\log.txt
C:\>java com.stuff.MyClass <val of %1> <val of %2> <val of %3> 1>>D:\log.txt
The parameter is incorrect.
I'm almost positive the "The parameter is incorrect." is due to the extraneous 1 in there. I also think this might have something with the encoding of the .bat file, but I can't quite figure out what is causing it. Anyone ever run into this before or know what might be causing it and how to fix it?
Edit
And the lesson, as always, is to check if it's plugged in first before you go asking for help. The bat file, in version control, uses D:\log.txt because it is intended to be run from the server, which has a D drive. When testing my changes and running locally, on my computer which doesn't have a D drive, I failed to make the change to use C:\log.txt, which is what caused the error. Sorry for wasting your time, thanks for the help, try to resist the urge to downvote me too much.
I doubt that that's the problem - I expect the command processor to deal with that part for you.
Here's evidence of it working for me:
Test.java:
public class Test
{
public static void main(String args[]) throws Exception
{
System.out.println(args.length);
for (String arg : args)
{
System.out.println(arg);
}
}
}
test.bat:
set logfile= c:\users\jon\test\test.log
java Test %1 %2 %3 >> %logfile%
On the command line:
c:\Users\Jon\Test> [User input] test.bat first second third
c:\Users\Jon\Test>set logfile= c:\users\jon\test\test.log
c:\Users\Jon\Test>java Test first second third 1>>c:\users\jon\test\test.log
c:\Users\Jon\Test> [User input] type test.log
3
first
second
third
The 1 is not extraneous: it is inserted by cmd.exe and means stdout (instead of ">>", you can also write "1>>"; contrast this with redirecting stderr: "2>>"). So the problem must be with your parameters.
This may seem like a stupid question, but is there an existing D: drive in the context that the bat file runs in?
Once I had a case where a bat file was used as the command line of a task within the Task Manager, but the Run As user was set to a local user on the box, giving no access to network drives.
Interpolated for your case, if the D: drive were a network drive, running the bat file as, say, the local administrator account on that machine instead of a domain user account would likely fail to have access to D:.