I'm running the following code to get a list of a given user's groups from the command line:
private ArrayList<String> accessGroups = new ArrayList<String>();
public void setAccessGroups(String userName) {
try {
Runtime rt = Runtime.getRuntime();
Process pr = rt.exec("/* code to get users */");
BufferedReader input = new BufferedReader(new InputStreamReader(pr.getInputStream()));
String line = null;
// This code needs some work
while ((line = input.readLine()) != null){
System.out.println("#" + line);
String[] temp;
temp = line.split("\\s+");
if(line.contains("GRPNAME-")) {
for(int i = 0; i < temp.length; i++){
accessGroups.add(temp[i]);
}
}
}
// For debugging purposes, to delete
System.out.println(accessGroups);
} catch (IOException e) {
e.printStackTrace();
}
}
The code to get users returns a result containing the following:
#Local Group Memberships *localgroup1 *localgroup2
#Global Group memberships *group1 *group2
# *group3 *group4
# *GRPNAME-1 *GRPNAME-2
The code is designed to extract anything beginning with GRPNAME-. This works fine, except that when I print the ArrayList I get:
[, *GRPNAME-1, *GRPNAME-2]
There's an empty string ("") at the start of the list. Is there a simple way to alter the regex, or another approach I could try, so that it is never added in the first place?
The expected output is:
[*GRPNAME-1, *GRPNAME-2]
Edit: answered, edited output to reflect changes in code.
Instead of tokenizing with split, as in this snippet:
line.split("\\s+");
use a pattern that matches \S+ and add each match to your collection. For example:
// Class level
private static final Pattern TOKEN = Pattern.compile("\\S+");
// Instance level
{
Matcher tokens = TOKEN.matcher(line);
while (tokens.find())
accessGroups.add(tokens.group());
}
Simple answer in the end: the line starts with whitespace, so split returns an empty first element. In place of:
temp = line.split("\\s+");
use:
temp = line.trim().split("\\s+");
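To see where the empty string comes from, here is a minimal sketch comparing the three approaches discussed above. The class name GroupTokens and the sample line are made up for illustration; the real input comes from the command output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original code.
final class GroupTokens {
    private static final Pattern TOKEN = Pattern.compile("\\S+");

    // Plain split: leading whitespace yields an empty first element.
    static List<String> splitOnly(String line) {
        return Arrays.asList(line.split("\\s+"));
    }

    // Trimming first removes the leading whitespace, so no empty element.
    static List<String> trimThenSplit(String line) {
        return Arrays.asList(line.trim().split("\\s+"));
    }

    // Matching the tokens themselves never produces an empty string.
    static List<String> matchTokens(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "   *GRPNAME-1 *GRPNAME-2";
        System.out.println(splitOnly(line));     // [, *GRPNAME-1, *GRPNAME-2]
        System.out.println(trimThenSplit(line)); // [*GRPNAME-1, *GRPNAME-2]
        System.out.println(matchTokens(line));   // [*GRPNAME-1, *GRPNAME-2]
    }
}
```

The matcher version also copes with an empty line, where trim().split(...) would still return a single empty string.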
I am trying to build an array of processes running on my machine; to do so I have been trying to use the following two commands:
tasklist /fo csv /nh # For a CSV output
tasklist /nh # For a non-CSV output
The issue that I am having is that I can not properly parse the output.
First Scenario
I have a line like:
"wininit.exe","584","Services","0","5,248 K"
I have attempted to parse this using "".split(","); however, this fails on the process memory usage: the comma in the number field results in an extra field.
Second Scenario
With the non-CSV output, I have a line like:
wininit.exe 584 Services 0 5,248 K
I am attempting to parse this using "".split("\\s+"); however, this fails on a process like System Idle Process, or any other process with a space in the executable name.
How can I parse either of these outputs so that the same split index always contains the same data column?
When parsing a string, always prefer the strictest format available: in this case, CSV. You can then process each line with a regular expression containing five groups:
private final Pattern pattern = Pattern
.compile("\"([^\"]*)\",\"([^\"]*)\",\"([^\"]*)\",\"([^\"]*)\",\"([^\"]*)\"");
private void parseLine(String line) {
Matcher matcher = pattern.matcher(line);
if (!matcher.find()) {
throw new IllegalArgumentException("invalid format");
}
String name = matcher.group(1);
int pid = Integer.parseInt(matcher.group(2));
String sessionName = matcher.group(3);
String sessionId = matcher.group(4);
String memUsage = matcher.group(5);
System.out.println(name + ":" + pid + ":" + memUsage);
}
You could use the StringTokenizer class instead of split. Use " as the delimiter and ask for the delimiters to be returned as tokens; the quotes then tell you where each field begins and ends. For instance,
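// Class level
enum State { NONE, BEGIN, END }
// Inside the parsing method; fields collects the columns of one line
List<String> fields = new ArrayList<>();
StringTokenizer st = new StringTokenizer(input, "\"", true);
State state = State.NONE;
while (st.hasMoreTokens()) {
    String t = st.nextToken();
    switch (state) {
    case NONE:
        if ("\"".equals(t)) {
            state = State.BEGIN; // opening quote
        }
        // otherwise t is the , between fields: skip it
        break;
    case BEGIN:
        if ("\"".equals(t)) {
            fields.add(""); // closing quote came straight away: empty field
            state = State.NONE;
        } else {
            fields.add(t); // store t in the entry it corresponds to
            state = State.END;
        }
        break;
    case END:
        state = State.NONE; // closing quote
        break;
    }
}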
Each token will be stored within its respective data set and you can then process that information for each Process.
Tried this and seems to work.
public void parse(){
try {
Runtime runtime = Runtime.getRuntime();
Process proc = runtime.exec("tasklist -fo csv /nh");
BufferedReader stdInput = new BufferedReader(new
InputStreamReader(proc.getInputStream()));
String line = "";
while ((line = stdInput.readLine()) != null) {
System.out.println();
for (String column: line.split("\"")){
if (!column.equals(",")&& !column.equals("")){
System.out.print("["+column+"]");
}
}
}
}catch (Exception e){
e.printStackTrace();
}
}
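The quote-based approaches above can be wrapped so that every column lands at a fixed index. The following is a sketch only: the class name TasklistCsv and its methods are hypothetical, it assumes that no field contains an embedded double quote, and it assumes the memory column looks like "5,248 K".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, assuming the five quoted columns shown in the question.
final class TasklistCsv {
    // One quoted field; group 1 is the text between the quotes.
    private static final Pattern FIELD = Pattern.compile("\"([^\"]*)\"");

    // Returns the quoted fields of one CSV line, in order.
    static List<String> fields(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    // Strips everything except digits, so "5,248 K" becomes 5248.
    static long memoryKb(String memUsage) {
        return Long.parseLong(memUsage.replaceAll("[^0-9]", ""));
    }

    public static void main(String[] args) {
        String line = "\"System Idle Process\",\"0\",\"Services\",\"0\",\"8 K\"";
        List<String> f = fields(line);
        System.out.println(f.get(0) + " pid=" + f.get(1) + " memKb=" + memoryKb(f.get(4)));
    }
}
```

Because commas inside the quotes are never treated as separators, the name is always at index 0 and the memory usage at index 4. memoryKb throws a NumberFormatException if the column contains no digits at all, so callers may want to guard against that.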
I want to count the lines of the file and then, in a second pass, take every single line and manipulate it. There is no compilation error, but execution never enters the second while ((line = br.readLine()) != null) loop.
Is there a different way to get the lines (movies) of the file and store them in an array?
BufferedReader br = null;
try { // try to read the file
br = new BufferedReader(new FileReader("movies.txt"));
String line;
int numberOfMovies = 0;
while ((line = br.readLine()) != null) {
numberOfMovies++;
}
Movie[] movies = new Movie[numberOfMovies]; // store in a Movie
// array every movie of
// the file
String title = "";
int id = 0;
int likes = 0;
int icounter = 0; // count to create new movie for each line
while ((line = br.readLine()) != null) {
line = line.trim();
line = line.replaceAll("/t", "");
line = line.toLowerCase();
String[] tokens = line.split(" "); // store every token in a
// string array
id = Integer.parseInt(tokens[0]);
likes = Integer.parseInt(tokens[tokens.length]);
for (int i = 1; i < tokens.length; i++) {
title = title + " " + tokens[i];
}
movies[icounter] = new Movie(id, title, likes);
icounter++;
}
} catch (IOException e) {
e.printStackTrace();
}
Simplest way would be to close br and open it again before the second pass.
try { // try to read the file
    br = new BufferedReader(new FileReader("movies.txt"));
    String line;
    int numberOfMovies = 0;
    while (br.readLine() != null) { // first pass: only count the lines
        numberOfMovies++;
    }
    br.close(); // this reader is exhausted now
    Movie[] movies = new Movie[numberOfMovies]; // one slot per line of the file
    int icounter = 0; // index of the next movie to create
    br = new BufferedReader(new FileReader("movies.txt")); // start again from the first line
    while ((line = br.readLine()) != null) {
        line = line.trim();
        line = line.replaceAll("/t", ""); // note: "/t" is a slash followed by t; use "\t" if a tab was meant
        line = line.toLowerCase();
        String[] tokens = line.split(" "); // store every token in a string array
        int id = Integer.parseInt(tokens[0]);
        // the last token is the like count; tokens[tokens.length] would be out of bounds
        int likes = Integer.parseInt(tokens[tokens.length - 1]);
        String title = ""; // reset for every movie
        for (int i = 1; i < tokens.length - 1; i++) {
            title = title + " " + tokens[i];
        }
        movies[icounter] = new Movie(id, title, likes);
        icounter++;
    }
    br.close();
} catch (IOException e) { e.printStackTrace(); }
I close the first reader and create a new one because a BufferedReader cannot be read twice: once readLine() has returned null, it keeps returning null. The first loop only counts, so it does not need to keep the line it reads.
There are two things here:
InputStreams and Readers are one-shot structures: once you've read them to the end, you either need to explicitly rewind them (if they support rewinding), or you need to close them (always close your streams and readers!) and open a new one.
However in this case the two passes are completely unnecessary, just use a dynamically growing structure to collect your Movie objects instead of arrays: an ArrayList for example.
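As a rough illustration of the single-pass idea, the sketch below collects the movies into an ArrayList while reading. MovieLoader and its nested Movie class are hypothetical stand-ins, and the line layout (id first, like count last, title in between) is an assumption taken from the parsing code in the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; Movie stands in for the class from the question.
final class MovieLoader {
    static final class Movie {
        final int id;
        final String title;
        final int likes;

        Movie(int id, String title, int likes) {
            this.id = id;
            this.title = title;
            this.likes = likes;
        }
    }

    // Assumed layout: "<id> <title words...> <likes>".
    static Movie parse(String line) {
        String[] tokens = line.trim().toLowerCase().split("\\s+");
        if (tokens.length < 2) {
            throw new IllegalArgumentException("expected: id title likes");
        }
        int id = Integer.parseInt(tokens[0]);
        int likes = Integer.parseInt(tokens[tokens.length - 1]);
        String title = String.join(" ", Arrays.copyOfRange(tokens, 1, tokens.length - 1));
        return new Movie(id, title, likes);
    }

    // One pass: the list grows as needed, so no counting pass is required.
    static List<Movie> load(Reader source) {
        List<Movie> movies = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (!line.trim().isEmpty()) { // skip blank lines
                    movies.add(parse(line));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return movies;
    }

    public static void main(String[] args) {
        List<Movie> movies = load(new StringReader("1 The Matrix 10\n2 Up 5\n"));
        System.out.println(movies.size() + " movies, first title: " + movies.get(0).title);
    }
}
```

With a real file, pass new FileReader("movies.txt") to load; movies.size() then replaces the separate counting pass.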
Firstly, there is no need to read the file twice.
Secondly, why not use the java.nio.file.Files class to read your file?
It has a method readAllLines(Path path, Charset cs) that gives you back a List<String>.
Then, if you want to know how many lines there are, just call the size() method on the list, and you can use the same list to construct the Movie objects.
List<Movie> movieList = new ArrayList<>();
for (String line : Files.readAllLines(Paths.get("movies.txt"), Charset.defaultCharset())) {
// Construct your Movie object from each individual line and add to the list of Movies
movieList.add(new Movie(id, title, likes));
}
Using the Files class also reduces your boilerplate code: it closes the file once it has finished reading, so you do not need a finally block to close anything.
If you use the same Reader, everything is already read once you reach the second loop.
Close the first Reader, then create another one to read a second time.
You run through the file with the BufferedReader until readLine() returns null. At that point the reader is at the end of the file (br itself is not null, it is just exhausted), so the second while ((line = br.readLine()) != null) never enters its body: its very first readLine() already returns null.
Try getting a new BufferedReader, something like this:
...
int id = 0;
int likes = 0;
int icounter = 0;
br = new BufferedReader(new FileReader("movies.txt")); // Re-initialize br to point
// at the first line again
while ((line = br.readLine()) != null)
...
EDIT:
Close the first reader before creating the new one.
This is a combination of a couple of other answers already on this post, but this is how I would go about rewriting your code to populate a List. It solves two problems at once: 1) it removes the need to read the file twice, and 2) it removes the boilerplate around BufferedReader, using Java 8 streams to make the initialization of your List as concise as possible:
private static class Movie {
private Movie(int id, String title, int likes) {
//TODO: set your instance state here
}
}
private static Movie movieFromFileLine(String line) {
line = line.trim();
line = line.replaceAll("/t", "");
line = line.toLowerCase();
String[] tokens = line.split(" "); // store every token in a string array
String title = "";
int id = Integer.parseInt(tokens[0]);
// the last token is the like count; tokens[tokens.length] would be out of bounds
int likes = Integer.parseInt(tokens[tokens.length - 1]);
for (int i = 1; i < tokens.length - 1; i++) { // everything between id and likes
title = title + " " + tokens[i];
}
return new Movie(id, title, likes);
}
public static void main(String[] args) throws IOException {
List<Movie> movies = Files.readAllLines(Paths.get("movies.txt"), Charset.defaultCharset()).stream().map
(App::movieFromFileLine).collect(Collectors.toList());
//TODO: Make some magic with your list of Movies
}
For cases where you absolutely need to read a source (file, URL, or other) twice, then you need to be aware that it is quite possible for the contents to change between the first and second readings and be prepared to handle those differences.
If you can make a reasonable assumption that the content of the source will fit in to memory and your code fully expects to work on multiple instances of Readers/InputStreams, you may first consider using an appropriate IOUtils.copy method from commons-io to read the contents of the source and copy it to a ByteArrayOutputStream to create a byte[] that can be re-read over and over again.
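The buffering idea can be sketched with the JDK alone. Note that this swaps in a plain read loop where the paragraph above suggests IOUtils.copy from commons-io; the class name RereadableSource and its methods are hypothetical, and the sketch assumes the content fits comfortably in memory.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: read a source once, then re-read the bytes as often as needed.
final class RereadableSource {
    // Drains the stream into a byte array (a JDK-only stand-in for IOUtils.copy).
    static byte[] readFully(InputStream in) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Each call opens a fresh reader over the same bytes, so it can run repeatedly.
    static int countLines(byte[] content) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(content), StandardCharsets.UTF_8))) {
            int count = 0;
            while (reader.readLine() != null) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] snapshot = readFully(new ByteArrayInputStream(
                "1 Up 5\n2 Heat 7\n".getBytes(StandardCharsets.UTF_8)));
        System.out.println(countLines(snapshot)); // first pass
        System.out.println(countLines(snapshot)); // second pass sees the same content
    }
}
```

Because both passes work on the same snapshot, the content cannot change between the first and the second reading.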
I want to find names in a collection of text documents from a huge list of about 1 million names. I'm making a Pattern from the names of the list first:
BufferedReader TSVFile = new BufferedReader(new FileReader("names.tsv"));
String dataRow = TSVFile.readLine();
dataRow = TSVFile.readLine();// skip first line (header)
String combined = "";
while (dataRow != null) {
String[] dataArray = dataRow.split("\t");
String name = dataArray[1];
combined += name.replace("\"", "") + "|";
dataRow = TSVFile.readLine(); // Read next line of data.
}
TSVFile.close();
Pattern all = Pattern.compile(combined);
After doing so I got a PatternSyntaxException, because some names contain a '+' or other regex metacharacters. I tried solving this by ignoring those few names:
if (name.contains("\"")) {
// ignore this name
}
This didn't work properly, and it is also messy: you have to escape every special character by hand and rerun the program many times.
Then I tried using the quote method:
Pattern all = Pattern.compile(Pattern.quote(combined));
However, now I no longer find any matches in the text documents, even when I also use quote on them. How can I solve this issue?
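I agree with the comment of @dragon66: you should not quote the pipe "|". Quoting the whole combined string turns every | into a literal character, so the pattern only matches the entire list as one string. Instead, apply Pattern.quote() to each name, as in the code below: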
BufferedReader TSVFile = new BufferedReader(new FileReader("names.tsv"));
String dataRow = TSVFile.readLine();
dataRow = TSVFile.readLine();// skip first line (header)
String combined = "";
while (dataRow != null) {
String[] dataArray = dataRow.split("\t");
String name = dataArray[1];
combined += Pattern.quote(name.replace("\"", "")) + "|"; //line changed
dataRow = TSVFile.readLine(); // Read next line of data.
}
TSVFile.close();
Pattern all = Pattern.compile(combined);
I also suggest checking whether your problem domain needs optimization: replacing String combined = ""; with a StringBuilder (which is mutable) avoids creating an unnecessary new String on every iteration of the loop.
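A minimal sketch of both suggestions together, assuming the names have already been read into a list: each name is quoted individually and joined with an unquoted |. The class NamePattern and its methods are hypothetical; StringJoiner is used here because it also avoids the trailing "|", which would otherwise add an empty alternative that matches at every position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original code.
final class NamePattern {
    // Quotes every name on its own and joins the results with an unquoted "|".
    static Pattern build(List<String> names) {
        StringJoiner alternation = new StringJoiner("|");
        for (String name : names) {
            String cleaned = name.replace("\"", "");
            if (!cleaned.isEmpty()) {
                alternation.add(Pattern.quote(cleaned));
            }
        }
        return Pattern.compile(alternation.toString());
    }

    // Collects every match of the pattern in the text, in order of appearance.
    static List<String> findAll(Pattern pattern, String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        Pattern all = build(Arrays.asList("C++", "A.B", "\"Ann\""));
        System.out.println(findAll(all, "I met Ann and learned C++ with AxB")); // [Ann, C++]
    }
}
```

"AxB" is not reported because the quoted "A.B" only matches a literal dot.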
guilhermerama presented the bugfix to your code.
I will add some performance improvements. As I pointed out, Java's regex library does not scale well to this many alternatives and is even slower when used for searching.
One can do better with multi-string search algorithms, for example by using StringsAndChars String Search:
//setting up a test file
Iterable<String> lines = createLines();
Files.write(Paths.get("names.tsv"), lines , CREATE, WRITE, TRUNCATE_EXISTING);
// read the pattern from the file
BufferedReader TSVFile = new BufferedReader(new FileReader("names.tsv"));
Set<String> combined = new LinkedHashSet<>();
String dataRow = TSVFile.readLine();
dataRow = TSVFile.readLine();// skip first line (header)
while (dataRow != null) {
String[] dataArray = dataRow.split("\t");
String name = dataArray[1];
combined.add(name);
dataRow = TSVFile.readLine(); // Read next line of data.
}
TSVFile.close();
// search the pattern in a small text
StringSearchAlgorithm stringSearch = new AhoCorasick(new ArrayList<>(combined));
StringFinder finder = stringSearch.createFinder(new StringCharProvider("test " + name(38) + "\n or " + name(799) + " : " + name(99999), 0));
System.out.println(finder.findAll());
The result will be
[5:10(00038), 15:20(00799), 23:28(99999)]
The search (finder.findAll()) does take (on my computer) < 1 millisecond. Doing the same with java.util.regex took around 20 milliseconds.
You may tune this performance by using other algorithms provided by RexLex.
The setup needs the following code:
private static Iterable<String> createLines() {
List<String> list = new ArrayList<>();
for (int i = 0; i < 100000; i++) {
list.add(i + "\t" + name(i));
}
return list;
}
private static String name(int i) {
String s = String.valueOf(i);
while (s.length() < 5) {
s = '0' + s;
}
return s;
}
My program needs to read from a multi-line .ini file. I've got it to the point where it reads every line that starts with a # and prints it, but I only want to record the value after the = sign. Here's what the file should look like:
#music=true
#Volume=100
#Full-Screen=false
#Update=true
This is what I want it to print:
true
100
false
true
This is the code I'm currently using:
@SuppressWarnings("resource")
public void getSettings() {
try {
BufferedReader br = new BufferedReader(new FileReader(new File("FileIO Plug-Ins/Game/game.ini")));
String input = "";
String output = "";
while ((input = br.readLine()) != null) {
String temp = input.trim();
temp = temp.replaceAll("#", "");
temp = temp.replaceAll("[*=]", "");
output += temp + "\n";
}
System.out.println(output);
}catch (IOException ex) {}
}
I'm not sure whether replaceAll("[*=]", ""); truly means anything at all or whether it's just searching for those characters. Any help is appreciated!
Try the following:
if (temp.startsWith("#")){
String[] splitted = temp.split("=");
output += splitted[1] + "\n";
}
Explanation:
To process only the lines that start with the desired character, use the String#startsWith method. To extract the value, String#split splits the text around the separator you pass as the argument. In your case, the text before the = character ends up at position 0 of the array, and the text you want to print at position 1.
Also note that if your file contains many lines starting with #, it is wiser not to concatenate strings with +=, but to use a StringBuilder / StringBuffer to build the output.
Hope it helps.
It is better to append to a StringBuffer, as shown below, than to use += with a String. I have also declared the variables outside the loop rather than inside it, which is the best practice as far as I know.
StringBuffer outputBuffer = new StringBuffer();
String[] fields;
String temp;
while((input = br.readLine()) != null)
{
temp = input.trim();
if(temp.startsWith("#"))
{
fields = temp.split("=");
outputBuffer.append(fields[1] + "\n");
}
}
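Both answers index the array at position 1, which throws an ArrayIndexOutOfBoundsException for a line that has no = sign. The sketch below is one way to guard against that; the class IniValues and its method are hypothetical. As for the question about replaceAll("[*=]", ""): in a regex, [*=] is a character class that matches a single literal * or =, so that call only deletes those characters and leaves the key in place.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original code.
final class IniValues {
    // Returns the text after the first '=' of every line that starts with '#'.
    static List<String> values(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (!line.startsWith("#")) {
                continue; // not a settings line
            }
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue; // no value on this line, skip it instead of failing
            }
            out.add(line.substring(eq + 1).trim());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "#music=true", "#Volume=100", "#Full-Screen=false", "#Update=true");
        for (String value : values(lines)) {
            System.out.println(value);
        }
    }
}
```

Using indexOf and substring instead of split also keeps any further = characters inside the value.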
I'm trying to import a CSV file into an ArrayList using StringTokenizer:
public class Test
{
public static void main(String [] args)
{
List<ImportedXls> datalist = new ArrayList<ImportedXls>();
try
{
FileReader fr = new FileReader("c:\\temp.csv");
BufferedReader br = new BufferedReader(fr);
String stringRead = br.readLine();
while( stringRead != null )
{
StringTokenizer st = new StringTokenizer(stringRead, ",");
String docNumber = st.nextToken( );
String note = st.nextToken( ); /** PROBLEM */
String index = st.nextToken( ); /** PROBLEM */
ImportedXls temp = new ImportedXls(docNumber, note, index);
datalist.add(temp);
// read the next line
stringRead = br.readLine();
}
br.close( );
}
catch(IOException ioe){...}
for (ImportedXls item : datalist) {
System.out.println(item.getDocNumber());
}
}
}
I don't understand how nextToken works: if I initialize all three variables (docNumber, note and index) with nextToken(), it fails with:
Exception in thread "main" java.util.NoSuchElementException
at java.util.StringTokenizer.nextToken(Unknown Source)
at _test.Test.main(Test.java:32)
If I keep docNumber only, it works. Could you help me?
It seems that some of the rows of your input file have fewer than 3 comma-separated fields. You should always check whether the tokenizer has more tokens (StringTokenizer.hasMoreTokens), unless you are 100% sure your input is correct.
Parsing CSV files correctly is not a trivial task. Why not use a library that does it very well, such as http://opencsv.sourceforge.net/ ?
Seems like your code is getting to a line that the Tokenizer is only breaking up into 1 part instead of 3. Is it possible to have lines with missing data? If so, you need to handle this.
Most probably your input file doesn't contain another element delimited by , in at least one line. Please show us your input - if possible the line that fails.
However, you don't need to use StringTokenizer. Using String#split() might be easier:
...
while( stringRead != null )
{
String[] elements = stringRead.split(",");
if(elements.length < 3) {
throw new RuntimeException("line too short"); //handle missing entries
}
String docNumber = elements[0];
String note = elements[1];
String index = elements[2];
ImportedXls temp = new ImportedXls(docNumber, note, index);
datalist.add(temp);
// read the next line
stringRead = br.readLine();
}
...
You should be able to check your tokens using the hasMoreTokens() method. If this returns false, then it's possible that the line you've read does not contain anything (i.e., an empty string).
It would be better, though, to use the String.split() method: the StringTokenizer documentation describes it as a legacy class that is retained for compatibility, and its use is discouraged in new code.
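One likely cause of the missing tokens is an empty column: StringTokenizer skips empty fields entirely, while split keeps them (except trailing ones, unless a negative limit is passed). The sketch below shows the difference; the class CsvFieldCount and the sample lines are made up for illustration.

```java
import java.util.StringTokenizer;

// Hypothetical helper, not part of the original code.
final class CsvFieldCount {
    // StringTokenizer treats consecutive delimiters as one, so empty fields vanish.
    static int tokenizerCount(String line) {
        return new StringTokenizer(line, ",").countTokens();
    }

    // A negative limit makes split keep trailing empty fields as well.
    static String[] splitKeepEmpty(String line) {
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        System.out.println(tokenizerCount("123,,5"));            // 2
        System.out.println("123,,5".split(",").length);          // 3
        System.out.println("123,note,".split(",").length);       // 2
        System.out.println(splitKeepEmpty("123,note,").length);  // 3
    }
}
```

With split(",", -1) every line that has two commas yields exactly three elements, so elements[0] to elements[2] can be read safely after a single length check.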