I have a text file with lines like these:
1st line - 20-01-01 Abs Def est (xabcd)
2nd line - 290-01-01 Abs Def est ghj gfhj (xabcd fgjh fgjh)
3rd line - 20-1-1 Absfghfgjhgj (xabcd ghj 5676gyj)
I want to end up with 3 different String arrays:
[0]20-01-01 [1]290-01-01 [2] 20-1-1
[0]Abs Def est [1]Abs Def est ghj gfhj [2] Absfghfgjhgj
[0]xabcd [1]xabcd fgjh fgjh [2] xabcd ghj 5676gyj
Using String[] array1 = myLine.split(" ") I only get the piece 20-01-01, but I also want to keep the other 2 Strings.
EDIT: I want to do this using regular expressions (the text file is large).
Please help; I have searched but did not find anything.
Thanks.
This is my code:
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.Comparator;
import java.util.Date;
import java.util.Set;
import java.util.TreeSet;
public class Holiday implements Comparable<Date>{
Date date;
String name;
public Holiday(Date date, String name){
this.date=date;
this.name=name;
}
public static void main(String[] args) throws IOException {
FileInputStream fis = new FileInputStream(new File("c:/holidays.txt"));
InputStreamReader isr = new InputStreamReader(fis, "windows-1251");
BufferedReader br = new BufferedReader(isr);
TreeSet<Holiday> tr=new TreeSet<>();
System.out.println(br.readLine());
String myLine = null;
while ( (myLine = br.readLine()) != null)
{
String[] array1 = myLine.split(" "); //OR use this
//String array1 = myLine.split(" ")[0];//befor " " read 1-st string
//String array2 = myLine.split("")[1];
//Holiday h=new Holiday(array1, name)
//String array1 = myLine.split(" ");
// check to make sure you have valid data
// String[] array2 = array1[1].split(" ");
System.out.println(array1[0]);
}
}
@Override
public int compareTo(Date o) {
// TODO Auto-generated method stub
return 0;
}
}
Pattern p = Pattern.compile("(.*?) (.*?) (\\(.*\\))");
Matcher m = p.matcher("20-01-01 Abs Def est (abcd)");
if (!m.matches()) throw new Exception("Invalid string");
String s1 = m.group(1); // 20-01-01
String s2 = m.group(2); // Abs Def est
String s3 = m.group(3); // (abcd)
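Applied to the three sample lines from the question, the same idea can fill three lists. This is only a minimal sketch, not the answerer's code: the class name `HolidayLineParser`, the method name `parse`, and the slightly tightened pattern (first field without spaces, last field captured without its parentheses) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HolidayLineParser {
    // Group 1: first field (no spaces), group 2: free text,
    // group 3: whatever sits inside the trailing parentheses.
    private static final Pattern LINE = Pattern.compile("(\\S+) (.*?) \\((.*)\\)");

    /** Returns {first, middle, bracketed} for one line, or null if the line does not match. */
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] lines = {
            "20-01-01 Abs Def est (xabcd)",
            "290-01-01 Abs Def est ghj gfhj (xabcd fgjh fgjh)",
            "20-1-1 Absfghfgjhgj (xabcd ghj 5676gyj)"
        };
        List<String> first = new ArrayList<>();
        List<String> middle = new ArrayList<>();
        List<String> bracketed = new ArrayList<>();
        for (String line : lines) {
            String[] parts = parse(line);
            if (parts == null) {
                continue; // skip lines that do not have the expected shape
            }
            first.add(parts[0]);
            middle.add(parts[1]);
            bracketed.add(parts[2]);
        }
        System.out.println(first);     // [20-01-01, 290-01-01, 20-1-1]
        System.out.println(middle);    // [Abs Def est, Abs Def est ghj gfhj, Absfghfgjhgj]
        System.out.println(bracketed); // [xabcd, xabcd fgjh fgjh, xabcd ghj 5676gyj]
    }
}
```

For a large file the lines would come from the BufferedReader loop in the question instead of the hard-coded array; the compiled pattern is reused for every line.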
Use a StringTokenizer, which by default splits on whitespace (space, tab, newline, carriage return, form feed).
You seem to be splitting on whitespace. Each element of the resulting string array contains one whitespace-separated substring, and you can piece these back together later via string concatenation.
For instance,
array1[0] would be 20-01-01
array1[1] would be Abs
array1[2] would be Def
so on and so forth.
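Piecing the tokens back together could look roughly like this. It is a sketch under the assumption that the third part always starts with the first token beginning with "("; the names `SplitAndRejoin` and `threeParts` are made up for the example.

```java
import java.util.Arrays;

class SplitAndRejoin {
    /** Splits on single spaces, then glues the tokens back into {first, middle, bracketed}. */
    static String[] threeParts(String line) {
        String[] tokens = line.split(" ");
        // Find the first token that opens the bracketed part.
        int open = 1;
        while (open < tokens.length && !tokens[open].startsWith("(")) {
            open++;
        }
        String middle = String.join(" ", Arrays.copyOfRange(tokens, 1, open));
        String bracketed = String.join(" ", Arrays.copyOfRange(tokens, open, tokens.length));
        return new String[] { tokens[0], middle, bracketed };
    }

    public static void main(String[] args) {
        String[] parts = threeParts("290-01-01 Abs Def est ghj gfhj (xabcd fgjh fgjh)");
        System.out.println(parts[0]); // 290-01-01
        System.out.println(parts[1]); // Abs Def est ghj gfhj
        System.out.println(parts[2]); // (xabcd fgjh fgjh)
    }
}
```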
Another option is to use Java regular expressions, but that is mainly useful if your input file has consistent formatting and there are a lot of lines to process. Regular expressions are very powerful, but they require some experience.
Match the required text with a regular expression.
The regexp below requires a date of the form dd-dd-dd, exactly 3 words in the middle, and 1 word in the brackets. The sample string has four words in the middle, so it does not match and nothing is printed.
String txt = "20-01-01 Abs Def est hhh (abcd)";
Pattern p = Pattern.compile("(\\d\\d-\\d\\d-\\d\\d) (\\w+ \\w+ \\w+) ([(](\\w)+[)])");
Matcher matcher = p.matcher(txt);
if (matcher.find()) {
String s1 = matcher.group(1);
String s2 = matcher.group(2);
String s3 = matcher.group(3);
System.out.println(s1);
System.out.println(s2);
System.out.println(s3);
}
However, if you need more flexibility, you may want to use the code provided by Lence Java.
Small question regarding a Java job to extract information out of lines from a file please.
Setup, I have a file, in which one line looks like this:
bla,bla42bla()bla=bla+blablaprimaryKey="(ZAPDBHV7120D41A,USA,blablablablablabla
The file contains many such lines.
In each line there are two pieces of information I am interested in: the primaryKey and the country.
In my example, ZAPDBHV7120D41A and USA
Each line contains the primaryKey exactly once and the country exactly once, separated by a comma. The pair can appear anywhere in the line (start, middle, end, etc.).
The primary key is a combination of capital letters [A, B, C, ... Y, Z] and digits [0, 1, 2, ... 9]. It has no particular predefined length.
The primary key is always found in the form primaryKey="({primaryKey},{country},
That is, the actual primaryKey comes after the string primaryKey, equals sign, double quote, open parenthesis, and before a comma, the three-letter country, and another comma.
I would like to write a program that extracts all the primary keys and all the countries from the file.
Input:
bla,bla42bla()bla=bla+blablaprimaryKey="(ZAPDBHV7120D41A,USA,blablablablablabla
bla++blabla()bla=bla+blablaprimaryKey="(AA45555DBMW711DD4100,ARG,bla
[...]
Result:
The primaryKey is ZAPDBHV7120D41A
The country is USA
The primaryKey is AA45555DBMW711DD4100
The country is ARG
Therefore, I tried following:
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Pattern;
public class RegexExtract {
public static void main(String[] args) throws Exception {
final String csvFile = "my_file.txt";
try (final BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
String line;
while ((line = br.readLine()) != null) {
Pattern.matches("", line); // extract primaryKey and country based on regex
String primaryKey = ""; // extract the primary from above
String country = ""; // extract the country from above
System.out.println("The primaryKey is " + primaryKey);
System.out.println("The country is " + country);
}
}
}
}
But I am having a hard time constructing the regular expression needed to match and extract.
May I ask what is the correct code in order to extract from the line based on above information?
Thank you
Explanations after the code.
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexExtract {
public static void main(String[] args) {
Path path = Paths.get("my_file.txt");
try (BufferedReader br = Files.newBufferedReader(path)) {
Pattern pattern = Pattern.compile("primaryKey=\"\\(([A-Z0-9]+),([A-Z]+)");
String line = br.readLine();
while (line != null) {
Matcher matcher = pattern.matcher(line);
if (matcher.find()) {
String primaryKey = matcher.group(1);
String country = matcher.group(2);
System.out.println("The primaryKey is " + primaryKey);
System.out.println("The country is " + country);
}
line = br.readLine();
}
}
catch (IOException xIo) {
xIo.printStackTrace();
}
}
}
Running the above code produces the following output (using the two sample lines in your question).
The primaryKey is ZAPDBHV7120D41A
The country is USA
The primaryKey is AA45555DBMW711DD4100
The country is ARG
The regular expression looks for the following [literal] string
primaryKey="(
The double quote is escaped since it is within a string literal.
The opening parenthesis is escaped because it is a metacharacter and the double backslash is required since Java does not recognize \( in a string literal.
Then the regular expression groups together the string of consecutive capital letters and digits that follow the previous literal up to (but not including) the comma.
Then there is a second group of capital letters up to the next comma.
Refer to the Regular Expressions lesson in Oracle's Java tutorials.
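The same expression can also be written with named groups, which some readers find easier to follow than group numbers. This is a sketch rather than part of the answer above: the group names `key` and `country`, the class name `PrimaryKeyExtractor`, and the stricter `{3}` plus trailing comma for the country (taken from the question's description) are assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrimaryKeyExtractor {
    // primaryKey="(  then the key, a comma, a three-letter country, and a comma.
    private static final Pattern PATTERN =
            Pattern.compile("primaryKey=\"\\((?<key>[A-Z0-9]+),(?<country>[A-Z]{3}),");

    /** Returns {primaryKey, country}, or null if the line contains no match. */
    static String[] extract(String line) {
        Matcher matcher = PATTERN.matcher(line);
        if (!matcher.find()) {
            return null;
        }
        return new String[] { matcher.group("key"), matcher.group("country") };
    }

    public static void main(String[] args) {
        String line = "bla++blabla()bla=bla+blablaprimaryKey=\"(AA45555DBMW711DD4100,ARG,bla";
        String[] result = extract(line);
        System.out.println("The primaryKey is " + result[0]); // AA45555DBMW711DD4100
        System.out.println("The country is " + result[1]);    // ARG
    }
}
```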
I have a value(String) like "BLD00000001BLD00000002 BLD00000003, BLD00000004".
I tried the regex """^BLD\d{8}"""
but it didn't work.
I want to return results like ('BLD00000001','BLD00000002','BLD00000003', ...)
var regex = Regex("""[\{\}\[\]\/?.,;:|\) *~`!^\-_+<>@\#$%&\\\=\(\'\"]""")
val cvrtBldIds = bldIds.split(regex)
if (cvrtBldIds.joinToString(separator="").length % 11 != 0) {
throw BadRequestException("MSG000343", listOf("Building Id", "BLD[8 digits]"))
} else {
val res = cvrtBldIds
.filter{it.startsWith("BLD")} // keep only the entries that start with BLD
.joinToString(separator = "','") // put ',' between the ids
bldIds = res
var sb = StringBuffer()
sb.append("'")
sb.append(bldIds)
sb.append("'")
input.bldId = sb.toString()
}
Do it as follows:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
public static void main(String[] args) {
String str = "BLD00000001BLD00000002 BLD00000003, BLD00000004";
List<String> list = new ArrayList<String>();
Pattern pattern = Pattern.compile("BLD\\d{8}");
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
list.add(matcher.group());
}
System.out.println(list);
}
}
Output:
[BLD00000001, BLD00000002, BLD00000003, BLD00000004]
Notes:
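BLD\\d{8} matches the literal text BLD followed by exactly 8 digits. Unlike ^BLD\d{8}, it is not anchored to the start of the string, so find() can locate every occurrence.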
Java regex tutorial: https://docs.oracle.com/javase/tutorial/essential/regex/
Seems you want to split on a space, or a comma-space combo, or between a digit and the text BLD. The following regex can do that:
,?\s|(?<=\d)(?=BLD)
See regex101 for demo.
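In Java, that split expression could be used as follows. A small sketch: the class name `BldSplit` and the helper names `splitIds` and `quoteAll` are invented here, and the quoting step only mirrors the output format the question asks for.

```java
import java.util.Arrays;
import java.util.List;

class BldSplit {
    /** Splits on whitespace (optionally preceded by a comma) or between a digit and "BLD". */
    static List<String> splitIds(String value) {
        return Arrays.asList(value.split(",?\\s|(?<=\\d)(?=BLD)"));
    }

    /** Joins the ids as 'ID1','ID2',... */
    static String quoteAll(List<String> ids) {
        return "'" + String.join("','", ids) + "'";
    }

    public static void main(String[] args) {
        List<String> ids = splitIds("BLD00000001BLD00000002 BLD00000003, BLD00000004");
        System.out.println(ids);
        // [BLD00000001, BLD00000002, BLD00000003, BLD00000004]
        System.out.println(quoteAll(ids));
        // 'BLD00000001','BLD00000002','BLD00000003','BLD00000004'
    }
}
```

Note that splitting does not validate the pieces: anything between the separators is returned as is, whereas matching BLD\d{8} only returns well-formed ids.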
Here is how you can extract BLD\d{8} pattern matches in Kotlin using .findAll():
val text = """"BLD00000001BLD00000002 BLD00000003, BLD00000004"."""
val matcher = """BLD\d{8}""".toRegex()
println(matcher.findAll(text).map{it.value}.toList() )
// => [BLD00000001, BLD00000002, BLD00000003, BLD00000004]
See the Kotlin demo
I am using a FileReader to read a CSV file. The second column holds an RGB value such as rgb(255,255,255), but the columns themselves are also separated by commas. If I split on the comma delimiter, I get pieces like "rgb(255,". How do I read the whole RGB value? My code is pasted below. Thanks!
FileReader reader = new FileReader(todoTaskFile);
BufferedReader in = new BufferedReader(reader);
int columnIndex = 1;
String line;
while ((line = in.readLine()) != null) {
if (line.trim().length() != 0) {
String[] dataFields = line.split(",");
//System.out.println(dataFields[0]+dataFields[1]);
if (!taskCount.containsKey(dataFields[columnIndex])) {
taskCount.put(dataFields[columnIndex], 1);
} else {
int oldCount = taskCount.get(dataFields[columnIndex]);
taskCount.put(dataFields[columnIndex],oldCount + 1);
}
}
I would strongly suggest not to use custom methods to parse CSV input. There are special libraries that do it for you.
@Ashraful Islam posted a good way to parse the value from a "cell" (I reused it), but getting this "cell" raw value must be done in a different way. This sketch shows how to do it using the Apache Commons CSV library (org.apache.commons.csv).
package csvparsing;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVRecord;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class GetRGBFromCSV {
public static void main(String[] args) throws IOException {
Reader in = new FileReader(GetRGBFromCSV.class.getClassLoader().getResource("sample.csv").getFile());
Iterable<CSVRecord> records = CSVFormat.DEFAULT.withFirstRecordAsHeader().parse(in); // remove ".withFirstRecordAsHeader()" if the csv file has no header row
for (CSVRecord record : records) {
String color = record.get("Color"); // use ".get(1)" to get value from second column if there's no header in csv file
System.out.println(color);
Pattern RGB_PATTERN = Pattern.compile("rgb\\((\\d{1,3}),(\\d{1,3}),(\\d{1,3})\\)", Pattern.CASE_INSENSITIVE);
Matcher m = RGB_PATTERN.matcher(color);
if (m.find()) {
Integer red = Integer.parseInt(m.group(1));
Integer green = Integer.parseInt(m.group(2));
Integer blue = Integer.parseInt(m.group(3));
System.out.println(red + " " + green + " " + blue);
}
}
}
}
This is valid CSV input that would probably make regex-based solutions behave unexpectedly:
Name,Color
"something","rgb(100,200,10)"
"something else","rgb(10,20,30)"
"not the value rgb(1,2,3) you are interested in","rgb(10,20,30)"
There are lots of cases you might forget to take into account when you write your own parser: quoted and unquoted strings, delimiters within quotes, escaped quotes within quotes, different delimiters (, or ;), multiple columns, etc. A third-party CSV parser takes care of those things for you. You shouldn't reinvent the wheel.
String line = "rgb(25,255,255)";
line = line.replace(")", "");
line = line.replace("rgb(", "");
String[] vals = line.split(","); // {"25", "255", "255"}
Parse each value in vals with Integer.parseInt and then you can use them.
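Put together, the replace-and-split approach might look like the sketch below. It assumes the input is exactly one well-formed rgb(...) value; the names `RgbSplit` and `parseRgb` are hypothetical.

```java
import java.util.Arrays;

class RgbSplit {
    /** Strips "rgb(" and ")" and parses the three comma-separated numbers. */
    static int[] parseRgb(String cell) {
        String inner = cell.trim().replace("rgb(", "").replace(")", "");
        String[] vals = inner.split(",");
        int[] rgb = new int[vals.length];
        for (int i = 0; i < vals.length; i++) {
            // Integer.parseInt throws NumberFormatException on malformed input.
            rgb[i] = Integer.parseInt(vals[i].trim());
        }
        return rgb;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseRgb("rgb(25,255,255)"))); // [25, 255, 255]
    }
}
```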
Here is how you can do this :
Pattern RGB_PATTERN = Pattern.compile("rgb\\((\\d{1,3}),(\\d{1,3}),(\\d{1,3})\\)");
String line = "rgb(25,255,255)";
Matcher m = RGB_PATTERN.matcher(line);
if (m.find()) {
System.out.println(m.group(1));
System.out.println(m.group(2));
System.out.println(m.group(3));
}
Here:
\\d{1,3} => matches 1 to 3 digits
(\\d{1,3}) => matches 1 to 3 digits and captures the match as a group
Since ( and ) are metacharacters, we have to escape them to match literal parentheses.
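To count the colour column as in the question, the pattern can be run against the whole line instead of splitting on commas first. This is a sketch with made-up sample lines; `RgbColumnCounter` and `count` are hypothetical names. It takes the first rgb(...) found on each line, so it is open to exactly the quoted-text pitfall that the CSV-library answer describes.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RgbColumnCounter {
    private static final Pattern RGB =
            Pattern.compile("rgb\\(\\d{1,3},\\d{1,3},\\d{1,3}\\)");

    /** Counts how often each rgb(...) value occurs, using the first match on every line. */
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = RGB.matcher(line);
            if (m.find()) {
                counts.merge(m.group(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Invented sample data in the shape described in the question.
        List<String> lines = Arrays.asList(
                "task1,rgb(255,255,255),done",
                "task2,rgb(0,0,0),open",
                "task3,rgb(255,255,255),open");
        System.out.println(count(lines)); // {rgb(255,255,255)=2, rgb(0,0,0)=1}
    }
}
```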
I'm writing a duplicate remover for BibTeX. The books are listed in this form:
@Book{abramowitz+stegun,
author = "Milton Abramowitz and Irene A. Stegun",
title = "Handbook of Mathematical Functions with
Formulas, Graphs, and Mathematical Tables",
publisher = "Dover",
year = 1964,
address = "New York",
edition = "ninth Dover printing, tenth GPO printing"
}^
What I have done is to read the data from external txt file, and Tokenize them after each book.
/*
* To change this license header, choose License Headers in Project Properties.
* To change this template file, choose Tools | Templates
* and open the template in the editor.
*/
package gotowy;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.StringTokenizer;
/**
*
* @author Adam
*/
public class DuplicateFinder {
void deleteDuplicates(File filename) throws IOException{
BufferedReader reader = new BufferedReader(new FileReader(filename));
String textLine = reader.readLine();
String dodaj = "";
do {
//System.out.println(textLine);
textLine = reader.readLine();
dodaj = dodaj + textLine;
} while(textLine != null);
reader.close();
String books;
books = dodaj;
System.out.println(books);
String delimiter = "^";
StringTokenizer st = new StringTokenizer(books,delimiter);
int liczbaTokenow = st.countTokens();
System.out.println(liczbaTokenow);
System.out.println(st);
books.substring(books.indexOf("title") + 3 , books.length());
// while (st.hasMoreTokens()) {
// System.out.println(st.nextToken()+"xDDDDDDDDDDDD");
//}
}
}
Now I need help getting the substring that follows every "title" keyword (in each token!) and comparing the titles. Any ideas?
Thx in advance! :)
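// textLine is one line of the file; this assumes the whole title sits on that line between two double quotes
if (textLine.trim().startsWith("title")) {
String s = textLine.substring(textLine.indexOf("\"") + 1);
s = s.substring(0, s.indexOf("\""));
System.out.println(s);
}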
I assume that the title will always be within the quotation marks. The problem will then be the same as here, except that the parenthesis should be replaced by quotation marks. So it will be something like:
public String getTitle(String s){
s = s.substring(s.indexOf("\"") + 1);
s = s.substring(0, s.indexOf("\""));
return s;}
Your while-loop would then be:
while (st.hasMoreTokens()) {
String item = st.nextToken();
String substringWithTitle = item.substring(item.indexOf("title"));
String title = getTitle(substringWithTitle);}
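For the comparison step, one possible sketch is to collect the titles in a Set and report every title that was already seen. It swaps the indexOf extraction for a regular expression and treats titles as equal after collapsing whitespace; both choices, along with the names `TitleDuplicates`, `titleOf` and `duplicateTitles`, are assumptions rather than anything from the question.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.StringTokenizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TitleDuplicates {
    // \b keeps "booktitle" from matching; [^"]* also spans line breaks.
    private static final Pattern TITLE = Pattern.compile("\\btitle\\s*=\\s*\"([^\"]*)\"");

    /** Returns the title of one entry with whitespace collapsed, or null if none is found. */
    static String titleOf(String entry) {
        Matcher m = TITLE.matcher(entry);
        return m.find() ? m.group(1).trim().replaceAll("\\s+", " ") : null;
    }

    /** Returns every title that occurs again after its first appearance. */
    static List<String> duplicateTitles(String books) {
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(books, "^");
        while (st.hasMoreTokens()) {
            String title = titleOf(st.nextToken());
            if (title != null && !seen.add(title)) { // add() returns false if already present
                duplicates.add(title);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        String books = "@Book{a, title = \"Handbook of\n   Functions\", year = 1964}^"
                + "@Book{b, title = \"Other Book\"}^"
                + "@Book{c, title = \"Handbook of Functions\"}^";
        System.out.println(duplicateTitles(books)); // [Handbook of Functions]
    }
}
```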
I have a String:
StartTime-2014-01-14 12:05:00-StartTime
The requirement is to replace the timestamp with current timestamp.
I tried the below code which is not giving me the expected output:
String st = "StartTime-2014-01-14 12:05:00-StartTime";
String replace = "StartTime-2014-01-14 13:05:00-StartTime";
Pattern COMPILED_PATTERN = Pattern.compile(st, Pattern.CASE_INSENSITIVE);
Matcher matcher = COMPILED_PATTERN.matcher(DvbseContent);
String f = matcher.replaceAll(replace);
Expected Output is:
StartTime-<Current_Time_stamp>-StartTime
Or instead of Regex, you can just use indexOf and lastIndexOf:
String f = "StartTime-2014-01-14 12:05:00-StartTime";
String timestamp = new java.text.SimpleDateFormat("yyyy-MM-dd HH:mm:ss") // HH = 24-hour clock, as in the input
.format(new java.util.Date());
String newString = f.substring(0, f.indexOf("-") + 1)
+ timestamp
+ f.substring(f.lastIndexOf("-"));
Output:
StartTime-2014-02-10 12:52:47-StartTime
You could match it like this:
(StartTime-).*?(-StartTime)
and replace it with this (or similar):
"$1" + current_time_stamp + "$2"
Example Java Code:
import java.util.*;
import java.util.Date;
import java.lang.*;
import java.io.*;
import java.util.regex.*;
class Ideone
{
public static void main (String[] args) throws java.lang.Exception
{
java.util.Date timestamp = new java.util.Date();
String search = "StartTime-2014-01-14 12:05:00-StartTime";
String regex = "(StartTime-).*?(-StartTime)";
String replacement = "$1"+ timestamp + "$2";
String result = search.replaceAll(regex, replacement);
System.out.println(result);
}
}
Output:
StartTime-Fri Feb 14 08:53:57 GMT 2014-StartTime
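The output above uses Date's default toString() format. If the replacement should keep the yyyy-MM-dd HH:mm:ss shape of the original string, a sketch along these lines may help. It uses java.time, which is not part of the answer above, and the names `StartTimeReplacer` and `replaceTimestamp` are invented for the example.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.regex.Matcher;

class StartTimeReplacer {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Replaces whatever sits between "StartTime-" and "-StartTime" with the given time. */
    static String replaceTimestamp(String content, LocalDateTime now) {
        // quoteReplacement guards against '$' or '\' in the inserted text.
        String replacement = "$1" + Matcher.quoteReplacement(now.format(FORMAT)) + "$2";
        return content.replaceAll("(StartTime-).*?(-StartTime)", replacement);
    }

    public static void main(String[] args) {
        String search = "StartTime-2014-01-14 12:05:00-StartTime";
        System.out.println(replaceTimestamp(search, LocalDateTime.now()));
    }
}
```

Passing the time in as a parameter keeps the method easy to test with a fixed value.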