How to get a specific substring with an option value using Java

I have a string, and from this string I want to get the password file path, which is identified by an option (-sn).
String s = "msqlsum81pv 0 0 25 25 25 2 -sn D:\\workdir\\PV_81\\config\\sum81pv.pwf -C 5000";
The line above is a configuration line, which can use either -sn or -n.
Please suggest how to get D:\workdir\PV_81\config\sum81pv.pwf from the string above; the path may also be a quoted string.
Below is my code, which checks only the -sn option, but I want to check for either -sn or -n.
String pwf = null;
if ( s.matches( "^\\s*msql.*$" ) )
{
    StringTokenizer st = new StringTokenizer( s, " " );
    while ( st.hasMoreTokens() )
    {
        if ( st.nextToken().equals( "-sn" ) && st.hasMoreTokens() )
        {
            pwf = st.nextToken();
        }
    }
}
I want to use the StreamTokenizer class instead of StringTokenizer to get D:\workdir\PV_81\config\sum81pv.pwf, because this path may contain spaces.
String s = "msqlsum81pv 0 0 25 25 25 2 -sn D:\\workdir\\PV_81\\config\\sum81pv.pwf -C 5000";
if ( s.matches( "^\\s*msql.*$" ) )
{
    StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(s));
    while (tokenizer.nextToken() != StreamTokenizer.TT_EOF)
    {
        System.out.println(tokenizer.sval);
    }
}
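A StreamTokenizer can be made to work here, but only after resetting its syntax table. In its default configuration it only starts words at letters, treats / as a comment character, parses numbers, and applies escape processing (such as \n and \t) inside quoted strings, so the path comes back as many small tokens and sval is null for the non-word ones. The sketch below is a minimal example assuming the path is unquoted and contains no spaces; the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class PwfStreamTokenizer {

    // Returns the word that follows -sn or -n, or null if neither option is present.
    public static String findPwf(String line) {
        StreamTokenizer tok = new StreamTokenizer(new StringReader(line));
        tok.resetSyntax();            // no comment, number or quote handling
        tok.wordChars(33, 255);       // every visible character (including \ : . -) is part of a word
        tok.whitespaceChars(0, ' ');  // control characters and space separate words
        boolean optionSeen = false;
        try {
            // With this syntax table every token is a word until end of input
            while (tok.nextToken() == StreamTokenizer.TT_WORD) {
                if (optionSeen) {
                    return tok.sval;  // the word right after the option
                }
                optionSeen = tok.sval.equals("-sn") || tok.sval.equals("-n");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // StringReader does not actually throw
        }
        return null;
    }

    public static void main(String[] args) {
        String s = "msqlsum81pv 0 0 25 25 25 2 -sn D:\\workdir\\PV_81\\config\\sum81pv.pwf -C 5000";
        System.out.println(findPwf(s)); // D:\workdir\PV_81\config\sum81pv.pwf
    }
}
```

Quote handling is deliberately left out: with quoteChar('"') the tokenizer would drop or convert the backslashes in a quoted Windows path.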

You should use a regular expression to detect that option in a more general way. If you want a quick fix, you can use the OR operator in your if statement, but every time a new option appears the condition will grow, which is a bad idea.
String pwf = null;
if ( s.matches( "^\\s*msql.*$" ) )
{
    StringTokenizer st = new StringTokenizer( s, " " );
    while ( st.hasMoreTokens() )
    {
        String token = st.nextToken();
        if ( ( token.equals( "-sn" ) || token.equals( "-n" ) ) && st.hasMoreTokens() )
        {
            pwf = st.nextToken();
        }
    }
}
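Following the suggestion to use a regular expression, here is a minimal sketch (class and method names are hypothetical). The pattern -s?n covers both -sn and -n, so a new variant means changing the pattern rather than adding another || clause. The value is taken either from a double-quoted string (group 1, which may contain spaces) or from the next run of non-whitespace characters (group 2).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PwfRegex {

    // "-s?n" matches both -sn and -n; the option must be preceded by whitespace
    // or the start of the line so that it is not matched inside another word.
    private static final Pattern PWF_OPTION =
            Pattern.compile("(?:^|\\s)-s?n\\s+(?:\"([^\"]*)\"|(\\S+))");

    public static String findPwf(String line) {
        Matcher m = PWF_OPTION.matcher(line);
        if (!m.find()) {
            return null;                 // neither -sn nor -n present
        }
        return m.group(1) != null ? m.group(1) : m.group(2);
    }

    public static void main(String[] args) {
        System.out.println(findPwf("msqlsum81pv 0 0 25 25 25 2 -sn D:\\workdir\\PV_81\\config\\sum81pv.pwf -C 5000"));
        System.out.println(findPwf("msqlsum81pv 0 0 25 25 25 2 -n \"D:\\work dir\\sum81pv.pwf\" -C 5000"));
    }
}
```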

As pointed out in this answer, you could use any good command-line argument parser, such as:
Commons CLI
http://commons.apache.org/cli/
Java Gems
http://code.google.com/p/javagems/
JArgs
http://jargs.sourceforge.net/
GetOpt
http://www.urbanophile.com/arenn/hacking/download.html
More Q&A on command line arguments: this and this.
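Argument parsers such as Commons CLI work on a String[] like the one passed to main, so a single configuration line first has to be split into tokens, keeping quoted paths together. The splitter below is a hand-written sketch, not part of any of the libraries listed; it only understands double quotes (no escape characters) and leaves backslashes untouched. The resulting array could be handed to a parser, or scanned directly as the hypothetical findPwf method does.

```java
import java.util.ArrayList;
import java.util.List;

public class CommandLineSplitter {

    // Splits on whitespace, treating text between double quotes as part of a
    // single token (the quotes themselves are dropped). Backslashes are kept as-is.
    public static String[] split(String line) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        boolean hasToken = false;        // lets "" produce an empty token
        for (char c : line.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
                hasToken = true;
            } else if (Character.isWhitespace(c) && !inQuotes) {
                if (hasToken) {
                    tokens.add(current.toString());
                    current.setLength(0);
                    hasToken = false;
                }
            } else {
                current.append(c);
                hasToken = true;
            }
        }
        if (hasToken) {
            tokens.add(current.toString());
        }
        return tokens.toArray(new String[0]);
    }

    // Returns the argument after -sn or -n, or null if there is none.
    public static String findPwf(String[] args) {
        for (int i = 0; i + 1 < args.length; i++) {
            if (args[i].equals("-sn") || args[i].equals("-n")) {
                return args[i + 1];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String line = "msqlsum81pv 0 0 25 25 25 2 -n \"D:\\work dir\\sum81pv.pwf\" -C 5000";
        System.out.println(findPwf(split(line))); // D:\work dir\sum81pv.pwf
    }
}
```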

Use a regular expression, as in this example:
public static void main(String[] args) {
    System.out.println(findString("msqlsum81pv 0 0 25 25 25 2 -sn D:\\workdir\\PV_81\\config\\sum81pv.pwf -C 5000"));
    System.out.println(findString("msqlsum81pv 0 0 25 25 25 2 -n D:\\workdir\\PV_81\\config\\sum81pv.pwf -C 5000"));
    System.out.println(findString("msqlsum81pv 0 0 25 25 25 2 -sn \"D:\\workdir\\PV_81\\config\\sum81pv.pwf\" -C 5000"));
    System.out.println(findString("msqlsum81pv 0 0 25 25 25 2 -n \"D:\\workdir\\PV_81\\config\\sum81pv.pwf\" -C 5000"));
}

private static String findString(String inputCommand) {
    String path;
    if (inputCommand.matches(".*(-sn|-n) \".*")) {
        // Quoted path: take everything up to the closing quote
        path = inputCommand.replaceAll(".*(-sn|-n) \"?([^\"]*)?.*", "$2");
    } else {
        // Unquoted path: take everything up to the next space
        path = inputCommand.replaceAll(".*(-sn|-n) \"?([^ ]*)?.*", "$2");
    }
    return path;
}
Output:
D:\workdir\PV_81\config\sum81pv.pwf
D:\workdir\PV_81\config\sum81pv.pwf
D:\workdir\PV_81\config\sum81pv.pwf
D:\workdir\PV_81\config\sum81pv.pwf
Edit: note that you might need to modify this if the path could contain whitespace. In that case you might want to match up to the next option (-C), or always quote the whole path and match up to the closing ".
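A sketch of the "match up to the next option" idea for unquoted paths containing spaces. It assumes every option starts with - followed by a letter, and that no directory name contains such a sequence after a space (a path like D:\my -x dir would be cut short). Quoted values are still taken up to the closing quote. The names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PwfUntilNextOption {

    // After -sn or -n, take either a quoted value (group 1) or, lazily, everything
    // up to the next " -<letter>" option or the end of the line (group 2).
    private static final Pattern PWF =
            Pattern.compile("(?:^|\\s)-s?n\\s+(?:\"([^\"]*)\"|(.+?))(?=\\s+-[A-Za-z]|\\s*$)");

    public static String findPwf(String line) {
        Matcher m = PWF.matcher(line);
        if (!m.find()) {
            return null;
        }
        return m.group(1) != null ? m.group(1) : m.group(2);
    }

    public static void main(String[] args) {
        System.out.println(findPwf("msqlsum81pv 0 0 25 25 25 2 -sn D:\\work dir\\sum81pv.pwf -C 5000"));
    }
}
```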

Definitely use a regular expression; my answer is below:
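public static String extractString(final String input) {
    String ret = null;
    // Two characters plus "\" (e.g. D:\), lazily up to the first dot, then non-space,
    // non-quote characters; a closing quote is left out of group 1
    final Pattern pattern = Pattern.compile("(..\\\\.*?\\.[^\\s\"]*)\"?\\p{Space}");
    final Matcher matcher = pattern.matcher(input);
    if (matcher.find()) {
        ret = matcher.group(1);
    }
    return ret;
}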
Basically, I search from the first '\' (together with the two characters before it, such as D:) to the first space after the dot and extract this substring, using the capture group to leave out the quote mark if there is one.
Therefore it doesn't matter where this substring is in the command string.

Related

How can I read all the bytes in a text file including new lines?

I'm trying to write a simple class to detect the line terminator of a text file.
The idea is quite simple: count the occurrences of these three patterns (\n, \r and \r\n) at the beginning of the file and return the pattern with the highest count.
What I haven't managed to figure out in two days is how to read the LF character (\n) into a variable (I'm on OS X). Everything I've tried so far seems to prevent LF from being read. I know that's typical behavior for Reader classes, but I also got the same problem using DataInputStream.
import java.io.File
import java.io.InputStream
import java.io.FileInputStream
import java.io.BufferedInputStream
import scala.io.Codec

object EolDetection {
  def detect(file: File)(implicit codec: Codec): String = {
    detect(new FileInputStream(file))
  }

  def detect(is: InputStream)(implicit codec: Codec): String = {
    detect(new BufferedInputStream(is))
  }

  def detect(bs: BufferedInputStream): String = {
    var LFcnt = 0; var CRLFcnt = 0; var CRcnt = 0; var wasCR = false
    try {
      var ascii = bs.read()
      while (ascii > -1 && (LFcnt + CRLFcnt + CRcnt) < 11) {
        ascii match {
          // a CR directly after another CR means the first one was a lone CR
          case 13 => if (wasCR) CRcnt += 1; wasCR = true; debug("A")
          case 10 => if (wasCR) CRLFcnt += 1 else LFcnt += 1; wasCR = false; debug("B")
          case _  => if (wasCR) CRcnt += 1; wasCR = false; debug("C")
        }
        ascii = bs.read()
      }
      debug(s"\nLF=$LFcnt CRLF=$CRLFcnt CR=$CRcnt\n")
      var sep = "\n"
      if (LFcnt > CRLFcnt && LFcnt > CRcnt) sep = "\n"
      if (CRLFcnt > LFcnt && CRLFcnt > CRcnt) sep = "\r\n"
      if (CRcnt > CRLFcnt && CRcnt > LFcnt) sep = "\r"
      sep
    } finally {
      bs.close()
    }
  }

  def debug(msg: String): Unit = print(msg)
}
Any suggestions? If you know a project with such a feature that I could add as a dependency, please let me know.
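In Java, InputStream.read() returns every byte as-is, including 13 (CR) and 10 (LF); it is line-oriented methods such as BufferedReader.readLine() that strip terminators. Below is a minimal Java sketch of the counting idea that reads bytes directly, assuming an ASCII-compatible encoding (UTF-8 or ISO-8859-1, not UTF-16). A CR is only classified once the following byte is known, so a CR at the very end of the input counts as a lone CR. The class name and the maxSamples parameter are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class EolDetector {

    // Counts up to maxSamples line endings and returns the most frequent one;
    // ties and input without line endings fall back to "\n".
    public static String detect(InputStream in, int maxSamples) {
        int lf = 0, crlf = 0, cr = 0;
        boolean prevWasCR = false;
        try {
            while (lf + crlf + cr < maxSamples) {
                int b = in.read();
                if (b == -1) {
                    if (prevWasCR) cr++;         // CR as the very last byte
                    break;
                }
                if (b == '\n') {
                    if (prevWasCR) crlf++; else lf++;
                    prevWasCR = false;
                } else {
                    if (prevWasCR) cr++;         // previous CR was not followed by LF
                    prevWasCR = (b == '\r');
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (crlf > lf && crlf > cr) return "\r\n";
        if (cr > lf && cr > crlf) return "\r";
        return "\n";
    }

    public static void main(String[] args) {
        byte[] data = "a\r\nb\r\nc\n".getBytes(StandardCharsets.US_ASCII);
        String eol = detect(new ByteArrayInputStream(data), 10);
        System.out.println(eol.equals("\r\n")); // true
    }
}
```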

How to split a Java string at backslash and underscore

I have this string: config_wdiCore_20_2.xls
I want to split it to get this output:
module: wdiCore
version: 20_2
My Java code:
String XLS_PPT_FILE = "D:\\xxx\\Excel\\yyyy\\config_wdiCore_20_2.xls";
String[] path = XLS_PPT_FILE.split("\\\\");
String namePath = path[path.length-1];
System.out.println(namePath);
Output:
config_wdiCore_20_2.xls
How can I split this output to get the module and the version?
UPDATE:
namePath.split("_")
Output:
namePath.split("_", 3)
Split on an underscore that is not preceded by a digit, or on a dot, and take the value at the desired index.
(?<!\d)_|\.
Alternatively, you can use a positive lookbehind instead of the negative lookbehind:
(?<=\D)_|\.
output array:
[0] > config
[1] > wdiCore
[2] > 20_2
[3] > xls
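A small runnable version of the lookbehind split, assuming the version always starts with a digit and the module name never ends with one (with wdiCore2_20_2 the underscore after the 2 would not be split). The helper name splitName is made up.

```java
import java.util.Arrays;

public class SplitConfigName {

    // Splits on "_" that is not preceded by a digit, or on ".", so "20_2" stays together.
    public static String[] splitName(String name) {
        return name.split("(?<!\\d)_|\\.");
    }

    public static void main(String[] args) {
        String[] parts = splitName("config_wdiCore_20_2.xls");
        System.out.println(Arrays.toString(parts)); // [config, wdiCore, 20_2, xls]
        System.out.println("module: " + parts[1]);  // module: wdiCore
        System.out.println("version: " + parts[2]); // version: 20_2
    }
}
```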
Or get the desired values from capture groups 1 and 2:
([^_]*)_(\d+(_\d+)?)\.
Sample code:
String str = "config_wdiCore_20_2.xls";
Pattern p = Pattern.compile("([^_]*)_(\\d+(_\\d+)?)\\.");
Matcher m = p.matcher(str);
if (m.find()) {
    String module = m.group(1);   // wdiCore
    String version = m.group(2);  // 20_2
}
You could use String.lastIndexOf(int) to find the last backslash. Then strip off the ".xls" extension, and finally split on _ with a limit of 3 (so the version stays one String). Something like:
public static void main(String[] args) {
    String XLS_PPT_FILE = "D:\\xxx\\Excel\\yyyy\\config_wdiCore_20_2.xls";
    int pos = XLS_PPT_FILE.lastIndexOf('\\');
    String baseName = (pos > -1) ? XLS_PPT_FILE.substring(pos + 1)
            : XLS_PPT_FILE;
    pos = baseName.indexOf(".xls");
    if (pos > -1) {
        baseName = baseName.substring(0, pos);
    }
    String[] parts = baseName.split("_", 3);
    System.out.printf("module: %s%nversion: %s%n", parts[1], parts[2]);
}
Output is (as requested)
module: wdiCore
version: 20_2
namePath.split("_", 3)
Split the namePath string on the first two _ characters (returning an array of 3 strings).

Matching content within a large text using regex in Java

Problem: I need to match content within a large text (a Wikipedia dump consisting of XML pages) in Java.
Content required: Infobox
Regex used: "\\{\\{Infobox(.*?)\\}\\}"
Issue: the pattern above matches the first occurrence of }} within the infobox, and if I remove the ? character from the regex, it matches the last occurrence. I want to extract just the infobox, so }} should match the end of the infobox.
Example infobox:
{{infobox RPG
|title= Amber Diceless Roleplaying Game
|image= [[Image:Amber DRPG.jpg|200px]]
|caption= Cover of the main ''Amber DRPG'' rulebook (art by [[Stephen Hickman]])
|designer= [[Erick Wujcik]]
|publisher= [[Phage Press]]<br>[[Guardians of Order]]
|date= 1991
|genre= [[Fantasy]]
|system= Custom (direct comparison of statistics without dice)
|footnotes=
}}
Code snippet:
String regex = "\\{\\{Infobox(.*?)\\}\\}";
Pattern p1 = Pattern.compile(regex, Pattern.DOTALL);
Matcher m1 = p1.matcher(xmlPage.getText());
String workgroup = "";
while(m1.find()){
workgroup = m1.group();
}
The solution depends on the nesting depth of {{ .. }} blocks inside the infobox block. If the inner blocks don't nest, that is, there are {{ ... }} blocks but NOT {{ .. {{ .. }} .. }} blocks, then you can try the regex: infobox([^\\{]*(\\{\\{[^\\}]*\\}\\})*.*?)\\}\\}
I tested this on the string: "A {{ start {{infobox abc {{ efg }} hij }}end }} B" and was able to match " abc {{ efg }} hij "
If the {{ .. }} blocks nest more deeply, a regex won't help, because Java's regex engine has no way to count how many {{ are still open. To handle that, you need to count the opening {{ and closing }} sequences and extract the string accordingly, which means you would be better off reading the text one character at a time and processing it.
Explanation of regex:
We start with infobox and then open the capture group. We then look for a string of characters that are NOT {.
Following that, we look for zero or more "groups" of the form {{ .. }} (but with no nested blocks inside them). Nesting is not allowed here because [^\\}] only allows non-} characters inside the block.
Finally, we accept the characters just before the closing }}.
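A sketch of the counting approach described above: find the start of the infobox, then walk the text position by position, incrementing a depth counter on {{ and decrementing it on }}, until the depth returns to zero. It assumes braces only appear as {{ and }} pairs (triple braces such as {{{param}}} would be miscounted) and that the page text is already in memory; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InfoboxExtractor {

    // Start of an infobox: "{{", optional whitespace, then "Infobox" or "infobox".
    private static final Pattern START = Pattern.compile("\\{\\{\\s*[Ii]nfobox");

    // Returns the first infobox including any nested {{ ... }} templates,
    // or null if there is none or it is never closed.
    public static String extractInfobox(String text) {
        Matcher m = START.matcher(text);
        if (!m.find()) {
            return null;
        }
        int start = m.start();
        int depth = 0;
        int i = start;
        while (i < text.length()) {
            if (text.startsWith("{{", i)) {
                depth++;
                i += 2;
            } else if (text.startsWith("}}", i)) {
                depth--;
                i += 2;
                if (depth == 0) {
                    return text.substring(start, i);  // includes the closing }}
                }
            } else {
                i++;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String page = "Intro {{Infobox RPG\n|title= Amber {{small|DRPG}}\n}}\nMore text {{cite}}";
        System.out.println(extractInfobox(page));
    }
}
```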
You should try this regex:
String regex = "\\{\\{[Ii]nfobox([^\\}].*\\n+)*\\}\\}";
or
Pattern pattern = Pattern.compile("\\{\\{[Ii]nfobox([^\\}].*\\n+)*\\}\\}");
Explanation: the regex above looks for:
1. \\{\\{ - matches the opening {{
2. [Ii]nfobox - matches Infobox or infobox
3. ([^\\}].*\\n+)* - matches the body of the infobox: any number of lines that do not start with }
   3.a. [^\\}] - matches any character except }
   3.b. .* - matches any characters up to the end of the line
   3.c. \\n+ - matches one or more newlines
4. \\}\\} - matches the closing }}
If your xmlPage.getText() returns content similar to this:
{{infobox ... }}{{infobox .... {{ nested stuff }} }}{{infobox ... }}
where you have both multiple infoboxes on the same level and nested content (and the nesting level can be anything), then you can't use a regex to parse the content. Why? Because the structure behaves like HTML or XML, so it is not a regular language. You can find many answers on the topic "regex and HTML" that explain this problem well. For example:
Why it's not possible to use regex to parse HTML/XML: a formal explanation in layman's terms
But if you can guarantee that there are never multiple infoboxes on the same level, only nested ones, then you can parse the document with the greedy pattern, i.e. by removing the '?'.
// Requires java.util.* and Apache Commons Lang's StringUtils (for StringUtils.isEmpty)
public static void extractValuesTest(String[] args) {
String payloadformatstr= "selected card is |api:card_number| with |api:title|";
String receivedInputString= "siddiselected card is 1234567 with dbs card";
int firstIndex = payloadformatstr.indexOf("|");
List<String> slotSplits= extarctString(payloadformatstr, "\\|(.*?)\\|");
String[] mainSplits = payloadformatstr.split("\\|(.*?)\\|");
int mainsplitLength = mainSplits.length;
int slotNumber=0;
Map<String,String> parsedValues = new HashMap<>();
String replaceString="";
int receivedstringLength = receivedInputString.length();
for (String slot : slotSplits) {
String[] slotArray = slot.split(":");
int processLength = slotArray !=null ? slotArray.length : 0;
String slotType = null;
String slotKey = null;
if(processLength == 2){
slotType = slotArray[0];
slotKey = slotArray[1];
}
String slotBefore= (firstIndex != 0 && slotNumber < mainsplitLength) ? mainSplits[slotNumber]:null;
String slotAfter= (firstIndex != 0 && slotNumber+1 < mainsplitLength) ? mainSplits[slotNumber+1]:null;
int startIndex = StringUtils.isEmpty(slotBefore) ? 0:receivedInputString.indexOf(slotBefore)+slotBefore.length();
int endIndex = StringUtils.isEmpty(slotAfter) ? receivedstringLength: receivedInputString.indexOf(slotAfter);
String extractedValue = (endIndex != receivedstringLength) ? receivedInputString.substring(startIndex, endIndex):
receivedInputString.substring(startIndex);
System.out.println("Extracted value is "+extractedValue);
parsedValues.put(slotKey, extractedValue);
replaceString+=slotBefore+(extractedValue != null ? extractedValue:"");
slotNumber++;
}
System.out.println(replaceString);
System.out.println(parsedValues);
}
public static void replaceTheslotsWithValues(String payloadformatstr,String receivedInputString,String slotPattern,String statPatternOfSlot) {
payloadformatstr= "selected card is |api:card_number| with |api:title|.";
receivedInputString= "selected card is 1234567 with dbs card.";
slotPattern="\\|(.*?)\\|";
statPatternOfSlot="|";
int firstIndex = payloadformatstr.indexOf(statPatternOfSlot);
List<String> slotSplits= extarctString(payloadformatstr, slotPattern);
String[] mainSplits = payloadformatstr.split(slotPattern);
int mainsplitLength = mainSplits.length;
int slotNumber=0;
Map<String,String> parsedValues = new HashMap<>();
String replaceString="";
for (String slot : slotSplits) {
String[] slotArray = slot.split(":");
int processLength = slotArray !=null ? slotArray.length : 0;
String slotType = null;
String slotKey = null;
if(processLength == 2){
slotType = slotArray[0];
slotKey = slotArray[1];
}
String slotBefore= (firstIndex != 0 && slotNumber < mainsplitLength) ? mainSplits[slotNumber]:"";
String slotAfter= (firstIndex != 0 && slotNumber+1 < mainsplitLength) ? mainSplits[slotNumber+1]:"";
int startIndex = receivedInputString.indexOf(slotBefore)+slotBefore.length();
int endIndex = receivedInputString.indexOf(slotAfter);
String extractedValue = receivedInputString.substring(startIndex, endIndex);
System.out.println("Extracted value is "+extractedValue);
parsedValues.put(slotKey, extractedValue);
replaceString+=slotBefore+(extractedValue != null ? extractedValue:"");
slotNumber++;
}
System.out.println(replaceString);
System.out.println(parsedValues);
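}

// Assumed implementation of the extarctString helper called above (it is not shown in
// the original code; the name is kept as spelled in the calls). Needs java.util.regex.
// Returns capture group 1 of every match of regex in input.
public static List<String> extarctString(String input, String regex) {
    List<String> matches = new ArrayList<>();
    Matcher matcher = Pattern.compile(regex).matcher(input);
    while (matcher.find()) {
        matches.add(matcher.group(1));
    }
    return matches;
}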

Cutting a column from a file in Java

I've searched but can't find an answer to my question.
I've saved the output of the Linux command ls -l to a file; its content is:
drwxr-xr-x 2 usr usr 4096 Jan 20 17:49 file1
drwxrwxr-x 4 usr usr 4096 Jan 20 18:00 file2
drwx------ 2 usr usr 4096 Feb 3 08:48 catalog1
I want to keep, for example, only the eighth column (the time) and cut off the rest. What should I do? I'm a complete beginner with Java and programming.
You can use a regular expression to match the timestamp (assuming that no other field, such as a file name, contains a time-like value). Something like:
// Populate this with the output of the ls -l command
String input = "";
// Create a regular expression pattern to match.
Pattern pattern = Pattern.compile("\\d{2}:\\d{2}");
// Create a matcher for this pattern on the input string.
Matcher matcher = pattern.matcher(input);
// Try to find instances of the given regular expression in the input string.
while (matcher.find()){
System.out.println(matcher.group());
}
To retrieve an arbitrary column, you can either write a regular expression for the column you want, or split each row on whitespace and select by index. For example, to get all of the file sizes:
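String input = ""; // populate this with the output of the ls -l command
String[] inputLines = input.split("\n");
for (String inputLine : inputLines) {
    // ls -l pads columns with several spaces, so split on runs of whitespace
    String[] columns = inputLine.trim().split("\\s+");
    if (columns.length > 4) { // skips short lines such as "total 12"
        System.out.println(columns[4]); // Where 4 indicates the filesize column
    }
}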
You can also use StringTokenizer to extract exactly the information you are looking for. Try the following code:
String value = "drwxr-xr-x 2 usr usr 4096 Jan 20 17:49 file1\n" +
        "drwxrwxr-x 4 usr usr 4096 Jan 20 18:00 file2\n" +
        "drwx------ 2 usr usr 4096 Feb 3 08:48 catalog1";
StringBuilder sBuffer = new StringBuilder();
StringTokenizer sTokenizer = new StringTokenizer(value, "\n");
while (sTokenizer.hasMoreTokens())
{
    String sValue = sTokenizer.nextToken();
    StringTokenizer sToken = new StringTokenizer(sValue, " ");
    int counter = 0;
    while (sToken.hasMoreTokens())
    {
        String token = sToken.nextToken();
        counter++;
        if (counter == 8) // 8 is the column that you want to keep
        {
            sBuffer.append(token).append("\n");
            break;
        }
    }
}
System.out.println(sBuffer.toString());
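Since the question is about a saved file, here is a sketch that reads it with Files.readAllLines and keeps one column per line, splitting on runs of whitespace because ls -l pads columns with several spaces. The file name ls.txt is a placeholder, and index 7 is the eighth column (the time) counted from zero. Note that ls -l normally shows a year instead of a time for files that were not modified recently, so that column is not always a time.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class LsColumn {

    // Returns the column at the zero-based index from each line. Lines with too
    // few columns, such as the "total" line of ls -l, are skipped.
    public static List<String> column(List<String> lines, int index) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length > index) {
                result.add(cols[index]);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // "ls.txt" is a placeholder for the file holding the saved ls -l output
        Path file = Paths.get(args.length > 0 ? args[0] : "ls.txt");
        for (String time : column(Files.readAllLines(file), 7)) {
            System.out.println(time);
        }
    }
}
```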

How to extract this using regex

I need to extract part of a host name.
Example:
www.google.com
maps.google.com
maps.maps.google.com
I need to extract google.com from these.
How can I do this in Java?
Split on . and pick the last two parts.
String s = "maps.google.com";
String[] arr = s.split("\\.");
//should check the size of arr here
System.out.println(arr[arr.length-2] + '.' + arr[arr.length-1]);
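The answer above notes that the array size should be checked; here is a sketch with that check (the method name lastTwoLabels is hypothetical). Keep in mind that the last two labels are only the registrable domain for suffixes like .com; for a name such as www.example.co.uk this would return co.uk.

```java
public class LastTwoLabels {

    // Returns the last two dot-separated labels of a host name, e.g.
    // "maps.maps.google.com" -> "google.com". Names with fewer than two
    // labels are returned unchanged.
    public static String lastTwoLabels(String host) {
        String[] labels = host.split("\\.");
        if (labels.length < 2) {
            return host;
        }
        return labels[labels.length - 2] + "." + labels[labels.length - 1];
    }

    public static void main(String[] args) {
        for (String h : new String[] {"www.google.com", "maps.google.com", "maps.maps.google.com"}) {
            System.out.println(lastTwoLabels(h)); // google.com
        }
    }
}
```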
Assuming you want the last two labels of the host name (the second-level domain plus the top-level domain, google.com here), you could try this:
Pattern pat = Pattern.compile( ".*\\.([^.]+\\.[^.]+)" ) ;
Matcher mat = pat.matcher( "maps.google.com" ) ;
if( mat.find() ) {
System.out.println( mat.group( 1 ) ) ;
}
If it's the other way round and you want everything except the last two parts of the domain (in your example: www, maps, and maps.maps), change the first line to:
Pattern pat = Pattern.compile( "(.*)\\.[^.]+\\.[^.]+" ) ;
Extracting a known substring from a string doesn't make much sense ;) Why would you do
String result = address.replaceAll("^.*(google\\.com)$", "$1");
when this is equivalent:
String result = "google.com";
If you need a test, try:
boolean isGoogle = address.endsWith(".google.com");
If you need the other part of a Google address, this may help:
String googleSubDomain = address.replaceAll("\\.google\\.com$", "");
(hint - the first line of code is a solution for your problem!)
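String str = "www.google.com";
// Find the second-to-last dot and take everything after it. With only one dot
// (or none), lastIndexOf returns -1 and the whole string is kept, so no exception occurs.
System.out.println(str.substring(str.lastIndexOf(".", str.lastIndexOf(".") - 1) + 1));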
