Finding substring from a string using regex java - java

I have a String:
String s = "msqlsum81pv 0 0 25 25 25 2 -sn D:\\workdir\\PV 81\\config\\sum81pv.pwf -C 5000";
I want to get the path (in this case D:\\workdir\\PV 81\\config\\sum81pv.pwf) from this string. This path is an argument of a command option -sn or -n, so this path always appears after these options.
The path may or may not contain whitespaces, which needs to be handled.
public class TestClass {
    public static void main(String[] args) {
        String path;
        String s = "msqlsum81pv 0 0 25 25 25 2 -sn D:\\workdir\\PV 81\\config\\sum81pv.pwf -C 5000";
        path = s.replaceAll(".*(-sn|-n) \"?([^ ]*)?", "$2");
        System.out.println("Path: " + path);
    }
}
Current output: Path: D:\workdir\PV 81\config\sum81pv.pwf -C 5000
Expected output: Path: D:\workdir\PV 81\config\sum81pv.pwf
The answers below work fine for the earlier case.
But what regex would find the password file in the cases below?
String s1 = msqllab91 0 0 1 50 50 60 /mti/root/bin/msqlora -n "tmp/my.pwf" -s
String s2 = msqllab92 0 0 1 50 50 60 /mti/root/bin/msqlora -s -n /mti/root/my.pwf
String s3 = msqllab93 0 0 1 50 50 60 msqlora -s -n "/mti/root/my.pwf" -C 10000
String s4 = msqllab94 0 0 1 50 50 60 msqlora.exe -sn /mti/root/my.pwf
String s5 = msqllab95 0 0 1 50 50 60 msqlora.exe -sn "/mti/root"/my.pwf
String s6 = msqllab96 0 0 1 50 50 60 msqlora.exe -sn"/mti/root"/my.pwf
String s7 = msqllab97 0 0 1 50 50 60 "/mti/root/bin/msqlora" -s -n /mti/root/my.pwf -s
String s8 = msqllab98 0 0 1 50 50 60 /mti/root/bin/msqlora -s
String s9 = msqllab99 0 0 1 50 50 60 /mti/root/bin/msqlora -s -n /mti/root/my.NOTpwf -s -n /mti/root/my.pwf
String s10 = msqllab90 0 0 1 50 50 60 /mti/root/bin/msqlora -sn /mti/root/my.NOTpwf -sn /mti/root/my.pwf
String s11 = msqllab901 0 0 1 50 50 60 /mti/root/bin/msqlora
String s12 = msqllab902 0 0 1 50 50 60 /mti/root/msqlora-n NOTmy.pwf
String s13 = msqllab903 0 0 1 50 50 60 /mti/root/msqlora-n.exe NOTmy.pwf
I need a regex that returns the *.pwf path if the option is -sn, -n, -s, -s -n, or without -s or -n.
The path has the .pwf file extension only, not NOTpwf or any other extension, and the code should work for all cases except the last two, because they are invalid commands.
Note: I already asked this type of question but didn't get anything working as per my requirement. (How to get specific substring with option vale using java)

You can use:
path = s.replaceFirst(".*\\s-s?n\\s*(.+?)(?:\\s-.*|$)", "$1");
//=> D:\workdir\PV 81\config\sum81pv.pwf

Try this:
String s = "msqlsum81pv 0 0 25 25 25 2 -sn D:\\workdir\\PV 81\\config\\sum81pv.pwf -C 5000";
int l = s.indexOf("-sn");
int l1 = s.indexOf("-C");
// Skip "-sn " (4 chars) and stop before the space that precedes "-C".
System.out.println(s.substring(l + 4, l1 - 1));

You can also use : [A-Z]:.*\.\w+

Rather than using complex regexps for replacing, I'd rather suggest a simpler one for matching:
String s = "msqlsum81pv 0 0 25 25 25 2 -sn D:\\workdir\\PV 81\\config\\sum81pv.pwf -C 5000";
Pattern pattern = Pattern.compile("\\s-s?n\\s*(.*?)\\s*-C\\s+\\d+$");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
    System.out.println(matcher.group(1));
}
// => D:\workdir\PV 81\config\sum81pv.pwf
If the -C <NUMBER> is optional at the end, wrap with an optional group -> (?:\\s*-C\\s+\\d+)?$.
Pattern details:
\\s - a whitespace
-s?n - a -sn or -n (as s? matches an optional s)
\\s* - 0+ whitespaces
(.*?) - Group 1, lazily matching any 0+ chars other than a newline, as few as possible
\\s* - 0+ whitespaces
-C - a literal -C
\\s+ - 1+ whitespaces
\\d+ - 1 or more digits
$ - end of string.
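As a rough sketch of the optional -C variant mentioned above, the pattern can be wrapped in a small helper. The class name PathOptionExtractor and the method extractPath are made up for this illustration, and the sketch only targets the single-command case from the first part of the question, not the longer s1..s13 list.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class PathOptionExtractor {
    // Same pattern as above, with the trailing "-C <NUMBER>" made optional.
    private static final Pattern PATH_AFTER_OPTION =
            Pattern.compile("\\s-s?n\\s*(.*?)(?:\\s*-C\\s+\\d+)?$");

    // Returns the text after -sn / -n, or an empty Optional if neither option is present.
    static Optional<String> extractPath(String commandLine) {
        Matcher matcher = PATH_AFTER_OPTION.matcher(commandLine);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String withC = "msqlsum81pv 0 0 25 25 25 2 -sn D:\\workdir\\PV 81\\config\\sum81pv.pwf -C 5000";
        String withoutC = "msqlsum81pv 0 0 25 25 25 2 -n D:\\workdir\\PV 81\\config\\sum81pv.pwf";
        System.out.println(extractPath(withC).orElse("<none>"));
        System.out.println(extractPath(withoutC).orElse("<none>"));
    }
}
```

Returning an Optional rather than the input string avoids the replaceAll pitfall of silently handing back the whole command when nothing matches.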

Related

Regex for csv file separated by spaces and optional quotes [duplicate]

I have a csv file that is in this format:
22/09/2011 15:15:11 "AT45 - Km 2 +300 Foo " "PL - 0460" 70 096 123456_110922_151511_000001M.jpg 123456 "DBx 4U02" 428008 100 95 "AB123CD"
22/09/2011 15:15:16 "AT45 - Km 2 +300 Foo " "PL - 0460" 70 087 123456_110922_151516_000002M.jpg 123456 "DBx 4U02" 428008 100 95 "EF456GH"
22/09/2011 15:16:30 "AT45 - Km 2 +300 Foo " "PL - 0460" 70 079 123456_110922_151630_000005M.jpg 123456 "DBx 4U02" 428008 200 96 "LM789NP"
And I need a regex to split each value correctly, for example the first line would be:
22/09/2011
15:15:11
"AT45 - Km 2 +300 Foo "
"PL - 0460"
70 096 123456_110922_151511_000001M.jpg
123456
"DBx 4U02"
428008
100
95
"AB123CD"
I have found this regex: ([^,"]+|"([^"]|)*"), but it doesn't do the job quite well.
Can somebody give me a good hint?

Tasks like this are better handled with a CSV parser. One of them is opencsv (http://opencsv.sourceforge.net/), which lets you specify your own separator (and many other things).
String csv =
"22/09/2011 15:15:11 \"AT45 - Km 2 +300 Foo \" \"PL - 0460\" 70 096 123456_110922_151511_000001M.jpg 123456 \"DBx 4U02\" 428008 100 95 \"AB123CD\"\n" +
"22/09/2011 15:15:16 \"AT45 - Km 2 +300 Foo \" \"PL - 0460\" 70 087 123456_110922_151516_000002M.jpg 123456 \"DBx 4U02\" 428008 100 95 \"EF456GH\"\n" +
"22/09/2011 15:16:30 \"AT45 - Km 2 +300 Foo \" \"PL - 0460\" 70 079 123456_110922_151630_000005M.jpg 123456 \"DBx 4U02\" 428008 200 96 \"LM789NP\"";
CSVParser parser = new CSVParserBuilder().withSeparator(' ').build();
CSVReader reader = new CSVReaderBuilder(new StringReader(csv))
        .withCSVParser(parser)
        .build();
for (String[] row : reader) {
    for (String str : row) {
        System.out.println(str);
    }
    System.out.println("----");
}
Output (at least its beginning):
22/09/2011
15:15:11
AT45 - Km 2 +300 Foo
PL - 0460
70
096
123456_110922_151511_000001M.jpg
123456
DBx 4U02
428008
100
95
AB123CD
----
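If adding a CSV library is not an option, a plain regex tokenizer is one possible fallback. The sketch below is an assumption-laden illustration, not the opencsv approach: the class name QuotedFieldSplitter is invented, quoted fields keep their surrounding quotes, and escaped quotes inside a field are not supported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class QuotedFieldSplitter {
    // A field is either a double-quoted run (quotes kept) or a run of non-whitespace characters.
    private static final Pattern FIELD = Pattern.compile("\"[^\"]*\"|\\S+");

    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        Matcher matcher = FIELD.matcher(line);
        while (matcher.find()) {
            fields.add(matcher.group());
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "22/09/2011 15:15:11 \"AT45 - Km 2 +300 Foo \" \"PL - 0460\" 70 096 "
                + "123456_110922_151511_000001M.jpg 123456 \"DBx 4U02\" 428008 100 95 \"AB123CD\"";
        for (String field : split(line)) {
            System.out.println(field);
        }
    }
}
```

The quoted alternative comes first in the pattern so that a field starting with a quote is consumed up to its closing quote instead of being cut at the first space.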

Echo binary data in php so that Java code

I want to read binary data from a file and send it to a remote Java app.
Following an example I found, I can read it like this (part of my code):
else
{
    $fp = fopen("binary file","rb");
    $vector="";
    while (!feof($fp)) {
        // Read the file in chunks of 16 bytes
        $data = fread($fp,16);
        $arr = unpack("C*",$data);
        foreach ($arr as $key => $value) {
            $vector.=" ".$value;
        }
        $vector.="\n";
    }
}
I send some headers
header("Content-Type: multipart/related; boundary=bounary----I don't know if boundary value is private".$eol);
header("MIME-Version: 1.0".$eol);
header("Connection: Keep-Alive".$eol);
header("Accept-Encoding: gzip, deflate".$eol);
header("Host: host".$eol.$eol);
header("Content-Type: multipart/related; boundary=bounary----I don't know if boundary value is private".$eol);
header("Content-Type: multipart/related; boundary=bounary----I don't know if boundary value is private".$eol);
Then I print it like this:
echo "--".$BOUNDARY.$eol;
echo "Content-Type: application/octet-stream".$eol;
echo "Content-Length: ".strlen($vector).$eol;
echo "Content-Transfer-Encoding: binary".$eol;
echo $eol.$vector.$eol;
echo "--".$BOUNDARY."--".$eol;
I test it in Advanced Rest Client Application and see binary data:
0 0 0 72 0 54 0 55 0 97 0 56 0 51 0 49
0 101 0 56 0 45 0 53 0 102 0 48 0 56 0 45
0 52 0 100 0 49 0 99 0 45 0 97 0 57 0 57
0 52 0 45 0 101 0 101 0 53 0 97 0 51 0 52
0 49 0 52 0 50 0 54 0 57 0 51 0 0 0 1
0 0 0 0 4 0 0 1 0 0 0 0 1 0 0 0...
But the Java developer says that he receives an empty string instead of binary data. How can I echo this binary data properly? What could cause this problem?
Update: We've found that no matter what Content-Length header I set, his app receives the header Content-Length: 475.
However, in Advanced Rest Client I see my own Content-Length value. That could be the cause of the problem. Can PHP be responsible for it somehow?

Perhaps the two $eol's in this line are causing the subsequent headers not to be sent:
header("Host: host".$eol.$eol);
Try changing it to one; in fact, I doubt you need the EOL character at all in the string you pass to header().

Transferring data structure from R to Java

I have an R script that does some computation. The last step of the computation is a kernel density estimate: http://www.inside-r.org/packages/cran/kerdiest/docs/kde
I now, in R, need to convert the result of calling kde into a string, or save it into a file, such that I can read and "unmarshal" it from a Java program.
What is the best format to use for the exchange and what R and Java libraries can read / write that format?
The structure is not ridiculously complex, but also not trivial:
> str(tmp)
List of 8
$ x : num [1:1398, 1:3] 1.035 0.902 0.679 0.826 1.243 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:3] "Rb ppm" "Sb ppm" "Cr ppm"
$ eval.points:'data.frame': 1398 obs. of 3 variables:
..$ Rb ppm: num [1:1398] 1.035 0.902 0.679 0.826 1.243 ...
..$ Sb ppm: num [1:1398] -2.58 -2.6 -2.48 -2.44 -2.53 ...
..$ Cr ppm: num [1:1398] 4.56 4.44 4.3 4.26 4.49 ...
$ estimate : Named num [1:1398] 0.1572 0.0897 0.0311 0.0434 0.099 ...
..- attr(*, "names")= chr [1:1398] "1" "2" "3" "4" ...
$ H : num [1:3, 1:3] 0.02395 0.00927 -0.014 0.00927 0.06868 ...
$ gridded : logi FALSE
$ binned : logi FALSE
$ names : chr [1:3] "Rb ppm" "Sb ppm" "Cr ppm"
$ w : num [1:1398] 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "class")= chr "kde"
RJSONIO seems to do the job. It seems quite verbose, however.

Comparing lines in a file

I am trying to compare File 1 and File 2.
File 1:
7.3 0.28 0.36 12.7 0.04 38 140 0.998 3.3 0.79 9.6 6 1
7.4 0.33 0.26 15.6 0.049 67 210 0.99907 3.06 0.68 9.5 5 1
7.3 0.25 0.39 6.4 0.034 8 84 0.9942 3.18 0.46 11.5 5 1
6.9 0.38 0.25 9.8 0.04 28 191 0.9971 3.28 0.61 9.2 5 1
5.1 0.11 0.32 1.6 0.028 12 90 0.99008 3.57 0.52 12.2 6 1
File 2:
5.1 0.11 0.32 1.6 0.028 12 90 0.99008 3.57 0.52 12.2 6 -1
7.3 0.25 0.39 6.4 0.034 8 84 0.9942 3.18 0.46 11.5 5 1
6.9 0.38 0.25 9.8 0.04 28 191 0.9971 3.28 0.61 9.2 5 -1
7.4 0.33 0.26 15.6 0.049 67 210 0.99907 3.06 0.68 9.5 5 -1
7.3 0.28 0.36 12.7 0.04 38 140 0.998 3.3 0.79 9.6 6 1
In both files, the last element in each line is the class label.
I want to check whether the class labels are equal,
i.e. compare the class label of
line1:7.3 0.28 0.36 12.7 0.04 38 140 0.998 3.3 0.79 9.6 6 1
with
line2:7.3 0.28 0.36 12.7 0.04 38 140 0.998 3.3 0.79 9.6 6 1
This matches.
Compare
line1:7.4 0.33 0.26 15.6 0.049 67 210 0.99907 3.06 0.68 9.5 5 1
with
line2:7.4 0.33 0.26 15.6 0.049 67 210 0.99907 3.06 0.68 9.5 5 -1
This does not match.
Update:
What I did is:
String line1;
String line2;
int notequalcnt = 0;
while((line1 = bfpart.readLine())!=null){
    found = false;
    while((line2 = bfin.readLine())!=null){
        if(line1.equals(line2)){
            found = true;
            break;
        }
        else{
            System.out.println("not equal");
            notequalcnt++;
        }
    }
}
But I am getting every line reported as not equal.
Am I doing anything wrong?

After the first pass of the outer loop, the inner loop has consumed all of file 2, so readLine() returns null from then on and the inner loop never runs again. Create the reader for file 2 inside the outer while loop instead. Use this code:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class CompareFile {
    public static void main(String args[]) throws IOException {
        String line1;
        String line2;
        boolean found;
        int notequalcnt = 0;
        BufferedReader bfpart = new BufferedReader(new FileReader("file1.txt"));
        while ((line1 = bfpart.readLine()) != null) {
            found = false;
            // Reopen file 2 for every line of file 1 so the inner loop starts from the top.
            BufferedReader bfin = new BufferedReader(new FileReader("file2.txt"));
            while ((line2 = bfin.readLine()) != null) {
                System.out.println("line1" + line1);
                System.out.println("line2" + line2);
                if (line1.equals(line2)) {
                    System.out.println("equal");
                    found = true;
                    break;
                } else {
                    System.out.println("not equal");
                }
            }
            bfin.close();
            // Count the lines of file 1 that have no identical line in file 2.
            if (!found)
                notequalcnt++;
        }
        bfpart.close();
    }
}

You're comparing every line from file 1 with every line from file 2, and you are printing "not equal" every time any one of them doesn't match.
If file 2 has 6 lines, and you are looking for a given line from file 1 (say it's also in file 2), then 5 of the lines from file 2 won't match, and "not equal" will be output 5 times.
Your current implementation says "if any lines in file 2 don't match, it's not a match", but what you really mean is "if any lines in file 2 do match, it is a match". So your logic (pseudocode) should be more like this:
for each line in file 1 {
    found = false
    reset file 2 to beginning
    for each line in file 2
        if line 1 equals line 2
            found = true, break.
    if found
        "found!"
    else
        "not found!"
}
Also you describe this as comparing "nth line of file 1 with nth line of file 2", but that's not actually what your implementation does. Your implementation is actually comparing the first line of file 1 with every line of file 2 then stopping, because you've already consumed every line of file 2 in that inner loop.
Your code has a lot of problems, and you probably need to sit back and work out your logic on paper first.

If the goal is to find the matching lines, read each file's lines into an ArrayList and compare the values.
Scanner s = new Scanner(new File("file1.txt"));
ArrayList<String> file1_list = new ArrayList<String>();
// Read whole lines; next() would return single whitespace-separated tokens.
while (s.hasNextLine()) {
    file1_list.add(s.nextLine());
}
s.close();
s = new Scanner(new File("file2.txt"));
ArrayList<String> file2_list = new ArrayList<String>();
while (s.hasNextLine()) {
    file2_list.add(s.nextLine());
}
s.close();
for (String line1 : file1_list) {
    if (file2_list.contains(line1)) {
        // found the line
    } else {
        // did not find the line
    }
}

Check Apache Commons IO FileUtils to compare files.
http://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/FileUtils.html
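
For the goal stated in the question (checking whether the class label of a row is the same in both files, regardless of row order), one possible approach is to index file 2 by its feature columns and look up each row of file 1. This is only a sketch under assumptions: the class name LabelComparer is invented, every line is expected to contain at least one space, and feature columns are compared as text (so 0.04 and 0.040 would count as different rows).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only.
class LabelComparer {
    // Splits "f1 f2 ... fn label" into { features, label } at the last space.
    static String[] splitLabel(String line) {
        String trimmed = line.trim();
        int cut = trimmed.lastIndexOf(' ');
        if (cut < 0) {
            throw new IllegalArgumentException("No class label found in: " + line);
        }
        return new String[] { trimmed.substring(0, cut).trim(), trimmed.substring(cut + 1) };
    }

    // Counts rows of file 1 whose label differs from (or is missing in) file 2.
    static int countLabelMismatches(List<String> file1, List<String> file2) {
        Map<String, String> labelByFeatures = new HashMap<>();
        for (String line : file2) {
            String[] parts = splitLabel(line);
            labelByFeatures.put(parts[0], parts[1]);
        }
        int mismatches = 0;
        for (String line : file1) {
            String[] parts = splitLabel(line);
            String otherLabel = labelByFeatures.get(parts[0]);
            if (otherLabel == null || !otherLabel.equals(parts[1])) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        List<String> file1 = List.of(
                "7.3 0.28 0.36 12.7 0.04 38 140 0.998 3.3 0.79 9.6 6 1",
                "7.4 0.33 0.26 15.6 0.049 67 210 0.99907 3.06 0.68 9.5 5 1");
        List<String> file2 = List.of(
                "7.4 0.33 0.26 15.6 0.049 67 210 0.99907 3.06 0.68 9.5 5 -1",
                "7.3 0.28 0.36 12.7 0.04 38 140 0.998 3.3 0.79 9.6 6 1");
        System.out.println("Mismatching labels: " + countLabelMismatches(file1, file2));
    }
}
```

With real files, the two lists could come from java.nio.file.Files.readAllLines. The map lookup reads each file once instead of rescanning file 2 for every line of file 1.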

Shuffle multiple files in same order

Setup:
I have 50 files, each with 25000 lines.
To-do:
I need to shuffle all of them "in the same order".
E.g.:
If before shuffle:
File 1 File 2 File 3
A A A
B B B
C C C
then after shuffle I should get:
File 1 File 2 File 3
B B B
C C C
A A A
i.e. corresponding rows in files should be shuffled in same order.
Also, the shuffle should be deterministic, i.e. if I give File A as input, it should always produce same shuffled output.
I can write a Java program to do it, or probably a script too. Something like: shuffle the numbers between 1 and 25000 and store them in a file, say shuffle_order. Then process one file at a time and reorder its rows according to shuffle_order. But is there a better/quicker way to do this?
Please let me know if more info is needed.

The following uses only basic bash commands. The principle is:
generate a random order (numbers)
order all files in this order
The code:
#!/bin/bash
case "$#" in
    0) echo "Usage: $0 files....." ; exit 1;;
esac
ORDER="./.rand.$$"
trap "rm -f $ORDER;exit" 1 2
count=$(grep -c '^' "$1")
let odcount=$(($count * 4))
paste -d" " <(od -A n -N $odcount -t u4 /dev/urandom | grep -o '[0-9]*') <(seq -w $count) |\
    sort -k1n | cut -d " " -f2 > $ORDER
#if your system has the "shuf" command you can replace the above 3 lines with a simple
#seq -w $count | shuf > $ORDER
for file in "$@"
do
    paste -d' ' $ORDER $file | sort -k1n | cut -d' ' -f2- > "$file.rand"
done
echo "the order is in the file $ORDER" # remove this line
#rm -f $ORDER # and uncomment this
# if dont need preserve the order
paste -d " " *.rand #remove this line - it is only for showing test result
from the input files:
A B C
--------
a1 a2 a3
b1 b2 b3
c1 c2 c3
d1 d2 d3
e1 e2 e3
f1 f2 f3
g1 g2 g3
h1 h2 h3
i1 i2 i3
j1 j2 j3
it will create A.rand B.rand C.rand with content such as:
g1 g2 g3
e1 e2 e3
b1 b2 b3
c1 c2 c3
f1 f2 f3
j1 j2 j3
d1 d2 d3
h1 h2 h3
i1 i2 i3
a1 a2 a3
Real testing - generating 50 files with 25k lines:
line="Consequatur qui et qui. Mollitia expedita aut excepturi modi. Enim nihil et laboriosam sit a tenetur."
for n in $(seq -w 50)
do
seq -f "$line %g" 25000 >file.$n
done
running the script
bash sorter.sh file.??
result on my notebook
real 1m13.404s
user 0m56.127s
sys 0m5.143s

Probably very inefficient but try below:
#!/bin/bash
arr=( $(for i in {1..25000}; do
    echo "$i"
done | shuf) )
for file in files*; do
    index=0
    new=$(while read line; do
        echo "${arr[$index]} $line"
        (( index++ ))
    done < "$file" | sort -h | sed 's/^[0-9]\+ //')
    echo "$new" > "$file"
done

I propose to shuffle them with a Python script. By setting the same seed for every shuffle, you obtain the same final order in every file.
import argparse
import logging
import os
import random
from tqdm import tqdm

logging.getLogger().setLevel(logging.INFO)


def main(args):
    assert os.path.isfile(args.input_file), (
        f"filename {args.input_file} does not exist"
    )
    logging.info("Reading input file...")
    with open(args.input_file) as fi:
        data = fi.readlines()
    logging.info("Generating indexes")
    indexes = list(range(len(data)))
    logging.info("Shuffling...")
    # The same seed and the same number of lines give the same permutation.
    random.seed(args.seed)
    random.shuffle(indexes)
    logging.info(f"Writing results, in place? {args.in_place}")
    if not args.in_place:
        name, ext = os.path.splitext(args.input_file)
        new_filename = name + "_shuffled" + ext
        args.input_file = new_filename
    with open(args.input_file, "w") as fo:
        for index in tqdm(indexes, desc="Writing to output file..."):
            fo.write(data[index])
        fo.flush()
        os.fsync(fo.fileno())
    logging.info("Done!")


if __name__ == '__main__':
    parser = argparse.ArgumentParser("Shuffle file by lines.")
    parser.add_argument('--input_file', type=str, required=True, help="Input file to be shuffled")
    parser.add_argument('--in_place', action="store_true", help="Whether to shuffle file in-place.")
    parser.add_argument('--seed', type=int, required=True, help="Seed with which the file will be shuffled.")
    args = parser.parse_args()
    main(args)
You can run this script once per file, with the same seed each time:
python shuffle.py --input_file File1 --seed 123
python shuffle.py --input_file File2 --seed 123
python shuffle.py --input_file File3 --seed 123
And all the files will be shuffled in the same way.
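
Since the question mentions writing a Java program, the same seeded-permutation idea could be sketched in Java roughly as follows. The class name SameOrderShuffle and its methods are invented for this illustration; it assumes all files have the same number of lines and fit in memory, and that the same Java runtime is used so that a given seed reproduces the same order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper for illustration only.
class SameOrderShuffle {
    // Builds one shuffled list of row indexes; a fixed seed makes it repeatable.
    static List<Integer> permutation(int size, long seed) {
        List<Integer> order = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            order.add(i);
        }
        Collections.shuffle(order, new Random(seed));
        return order;
    }

    // Returns the lines rearranged so that output row i is input row order.get(i).
    static List<String> reorder(List<String> lines, List<Integer> order) {
        List<String> shuffled = new ArrayList<>(lines.size());
        for (int index : order) {
            shuffled.add(lines.get(index));
        }
        return shuffled;
    }

    public static void main(String[] args) {
        List<Integer> order = permutation(3, 123L);
        // The same permutation is applied to every file, so rows stay aligned.
        System.out.println(reorder(List.of("A1", "B1", "C1"), order));
        System.out.println(reorder(List.of("A2", "B2", "C2"), order));
    }
}
```

For real files, the lines could be read with java.nio.file.Files.readAllLines and written back with Files.write, computing the permutation once and reusing it for all 50 files.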
