The following is a sample line from a CSV file I'm trying to parse using regex or string.split(","), and I want to extract the year (in the example below, 2013). The problem is that the year's column index is not always 17; it can also be 19. I am thinking of looping through each string in the string.split(",") array and matching it against the pattern "2XXX".
9344949,HW488429,10/09/2013 05:00:00 AM,039XX W MONROE
ST,0610,BURGLARY,FORCIBLE
ENTRY,RESIDENCE,false,false,1122,011,28,26,05,1149955,1899326,**2013**,10/16/2013
12:39:00 AM,41.87966141386545,-87.72485045045373,"(41.87966141386545,
-87.72485045045373)"
This can break down each line in the CSV file
Pattern.compile("^([^,]+,){2}\\d{2}/\\d{2}/(\\d{4})([^,]+,){3}([^,]+)");
But I need some help matching each string against 2XXX. I tried this:
Pattern patt = Pattern.compile("^2\d{3}"); however, Eclipse reports an error on this.
Firstly, I strongly advise you to try a CSV parser, as Boris suggested.
If you must do it your way, you could do something like this:
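String year = null; // stays null if no field looks like a year
String str = ""; // replace with your CSV line
String[] strArr = str.split(",");
for (String s : strArr)
{
    // matches() must cover the whole field, so only a 4-digit value starting with 2 qualifies
    if (s.trim().matches("2\\d{3}"))
    {
        year = s.trim();
        break;
    }
}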
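For reference, here is the same loop wrapped in a small method, as a minimal sketch rather than a definitive implementation. The class name YearFinder and the method name findYear are made up for illustration, and the sample line is the one from the question with the line breaks and the trailing quoted coordinates removed. Two caveats: a plain split(",") does not understand quoted fields such as the "(41.87..., -87.72...)" column, and any other four-digit field starting with 2 that comes before the year would be returned instead.

```java
import java.util.regex.Pattern;

class YearFinder {
    // A whole field that is exactly four digits and starts with 2, e.g. 2013.
    private static final Pattern YEAR = Pattern.compile("2\\d{3}");

    // Returns the first field that looks like a year, or null if none does.
    static String findYear(String csvLine) {
        for (String field : csvLine.split(",")) {
            String trimmed = field.trim();
            if (YEAR.matcher(trimmed).matches()) {
                return trimmed;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String line = "9344949,HW488429,10/09/2013 05:00:00 AM,039XX W MONROE ST,0610,BURGLARY,"
                + "FORCIBLE ENTRY,RESIDENCE,false,false,1122,011,28,26,05,1149955,1899326,2013,"
                + "10/16/2013 12:39:00 AM";
        System.out.println(findYear(line)); // prints 2013
    }
}
```

Precompiling the Pattern once avoids recompiling the regex for every field, which String.matches would otherwise do.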
Related
I am getting a piece of JSON text from a url connection and saving it to a string currently as such:
...//setting up url and connection
BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
String str = in.readLine();
When I print str, I correctly find the data {"build":{"version_component":"1.0.111"}}
Now I want to extract the 111 from str, but I am having some trouble.
I tried
String afterLastDot = inputLine.substring(inputLine.lastIndexOf(".") + 1);
but I end up with 111"}}
I need a generic solution, so that if I have String str = {"build":{"version_component":"1.0.111111111"}}; it still works and extracts 111111111 (i.e., I don't want to hard-code extracting the last three digits after the dot).
If you cannot use a JSON parser, then you can use this regex-based extraction:
String lastNum = str.replaceAll("^.*\\.(\\d+).*", "$1");
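^.* is a greedy match that consumes everything up to the last dot that is followed by digits. \\.(\\d+) then matches that dot and captures the one or more digits after it in group #1, and the trailing .* consumes the rest of the string, so replacing the whole match with $1 leaves only the digits. If the pattern does not match at all, replaceAll returns the string unchanged.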
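The same greedy-match idea can also be written with Pattern and Matcher, which makes the "no match" case explicit instead of silently returning the input. This is only a sketch; the class name LastNumber and the method name lastNumber are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastNumber {
    // Greedy .* backtracks from the end until it finds a dot followed by digits,
    // so group 1 holds the digits after the last such dot.
    private static final Pattern LAST_COMPONENT = Pattern.compile(".*\\.(\\d+)");

    // Returns the digits after the last "dot + digits" occurrence, or null if there is none.
    static String lastNumber(String s) {
        Matcher m = LAST_COMPONENT.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String str = "{\"build\":{\"version_component\":\"1.0.111\"}}";
        System.out.println(lastNumber(str)); // prints 111
    }
}
```

Because \\d+ is not limited to three digits, the same call returns 111111111 for a version string of 1.0.111111111.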
Find the start and the end indexes of the String you need and substring(start, end) :
// String str = "{"build":{"version_component":"1.0.111"}};" cannot compile without escaping
String str = "{\"build\":{\"version_component\":\"1.0.111\"}}";
int start = str.lastIndexOf(".")+1;
int end = str.lastIndexOf("\"");
String substring = str.substring(start,end);
Just use a JSON API:
JSONObject obj = new JSONObject(str);
String versionComponent= obj.getJSONObject("build").getString("version_component");
Then just split and take the last element
versionComponent.split("\\.")[2];
You can try the following code:
...
int index = inputLine.lastIndexOf(".")+1 ;
String afterLastDot = inputLine.substring(index, index+3);
With regular expressions (regex), you can solve your problem like this:
Pattern pattern = Pattern.compile("111") ;
Matcher matcher = pattern.matcher(str) ;
while(matcher.find()){
System.out.println(matcher.start()+" "+matcher.end());
System.out.println(str.substring(matcher.start(), matcher.end()));
}
I'm trying to convert a file into a String and then remove the non-numeric characters from the converted file name, but when I do the replacement the dot of the file extension is removed too. For example, 2014.05-06.txt should become 20140506.txt, but what I get is 20140506txt. I want to keep the .txt, .log or any other extension.
String strDatefiles = Arrays.toString(saDateFiles).replaceAll("[\\W]", "");
Edited:
String[] saDateFiles = fileList.list();
String strDatefiles = Arrays.toString(saDateFiles.substring(0, saDateFiles.lastIndexOf("."))).replaceAll("[\\W]", "");
This saDateFiles.lastIndexOf(".") call gives an error. Should I replace it with a length?
Edited2:
String[] saDateFiles = fileList.list();
String strDatefiles = Arrays.toString(saDateFiles).substring(0, Arrays.toString(saDateFiles).lastIndexOf(".")).replaceAll("[\\W]","");
System.out.println(strDatefiles);
Output: 20140502txt20140904 (I have 2 files inside)
I would take the index of the last . in the String, and then manipulate the two substrings. For example,
String saDateFiles = "2014.05-06.txt";
int lastDot = saDateFiles.lastIndexOf('.');
String strDatefiles = saDateFiles.substring(0, lastDot).replaceAll("\\D", "")
.concat(saDateFiles.substring(lastDot));
System.out.println(strDatefiles);
Outputs (as requested)
20140506.txt
As you noticed, the above was for one file name. To do it for an array of file names, you could use a for-each loop and the above code like
String[] saDateFilesArr = fileList.list();
for (String saDateFiles : saDateFilesArr) {
int lastDot = saDateFiles.lastIndexOf('.');
String strDatefiles = saDateFiles.substring(0, lastDot)
.replaceAll("\\D", "").concat(saDateFiles.substring(lastDot));
System.out.println(strDatefiles);
}
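One thing the loop above does not cover is a file name with no dot at all: lastIndexOf('.') then returns -1 and substring(0, -1) throws a StringIndexOutOfBoundsException. A minimal sketch of a guarded version follows; the class and method names are hypothetical, and stripping the digits out of an extension-less name is just one possible choice.

```java
class DigitsOnlyName {
    // Removes every non-digit before the last dot and keeps the extension as-is.
    // A name without any dot has no extension, so the whole name is cleaned.
    static String digitsOnly(String fileName) {
        int lastDot = fileName.lastIndexOf('.');
        if (lastDot < 0) {
            return fileName.replaceAll("\\D", "");
        }
        return fileName.substring(0, lastDot).replaceAll("\\D", "")
                + fileName.substring(lastDot);
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("2014.05-06.txt")); // prints 20140506.txt
        System.out.println(digitsOnly("2014-09-04"));     // prints 20140904
    }
}
```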
Apply your replace function only to the part of the file name before the last ".". You can extract this part with:
fileName.substring(0, fileName.lastIndexOf("."));
Then append the extension again. Note that fileName is a single String, not the String[] returned by list():
String strDatefiles = fileName.substring(0, fileName.lastIndexOf(".")).replaceAll("[\\W]", "") + fileName.substring(fileName.lastIndexOf("."));
I need to read a file up to a certain comma. For example:
String s = "hii,lol,wow,and,finally"
I need the output to be hii,lol,wow,and
I don't want the last comma or the characters that follow it.
My code, however, returns the string after the last comma.
Example: the output I am getting from my code is: finally
Below is my code.
Please guide me.
File file =new File("C:/Users/xyz.txt");
FileInputStream inputStream = new FileInputStream(file);
String filke = IOUtils.toString(inputStream);
String[] pieces = filke.split("(?=,)");
String answer = Arrays.stream(pieces).skip(pieces.length - 1).collect(Collectors.joining());
String www=answer.substring(1);
System.out.format("Answer = \"%s\"%n", www);
You don't necessarily need to use regex for this. Just get the index of the last ',' and get the substring from 0 to that index:
String answer = "hii,lol,wow,and,finally";
String www = answer.substring(0, answer.lastIndexOf(','));
System.out.println(www); // prints hii,lol,wow,and
String in Java has a method called lastIndexOf(String str). That might come in handy for you.
Say your input is String s = "hii,lol,wow,and,finally";
You can do a String operation like:
String s = "hii,lol,wow,and,finally";
s = s.substring(0, s.lastIndexOf(","));
This gives you the output: hii,lol,wow,and
If you want to use a Java 8 stream for this, you could try filter:
String answer = Arrays.stream(pieces).filter(p -> !Objects.equals(p, pieces[pieces.length-1])).collect(Collectors.joining());
this will print Answer = "hii,lol,wow,and"
To stick strictly to regex, you can use Pattern.compile and Matcher:
Pattern pattern = Pattern.compile("\\w+(?=,)");
Matcher matcher = pattern.matcher(filke);
while (matcher.find()) {
    // the lookahead (?=,) leaves the comma out of the match, so it is appended here;
    // group() is used because the pattern has no capturing group 1
    System.out.println(matcher.group() + ",");
}
Will match hii, lol, wow, and,
See the regex example here https://regex101.com/r/1iZDjg/1
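If a single regex call is preferred over a find() loop, one option is to delete the last comma together with everything after it. This is a sketch under the assumption that the input is the whole text read from the file; the class name DropLastField is made up.

```java
class DropLastField {
    // ",[^,]*$" matches the last comma and everything after it up to the end of
    // the input, so removing that match keeps all earlier fields.
    static String dropLastField(String s) {
        return s.replaceFirst(",[^,]*$", "");
    }

    public static void main(String[] args) {
        System.out.println(dropLastField("hii,lol,wow,and,finally")); // prints hii,lol,wow,and
    }
}
```

A string without any comma is returned unchanged, because the pattern then finds nothing to replace.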
I'm parsing plain text and trying to convert it into an object.
The text looks like this (and I can't change the format):
"N001";"2014-08-12-07.11.37.352000";" ";"some#email.com ";4847 ;"street";"NAME SURNAME ";26 ;"CALIFORNIA ";21
and the object to convert it to:
String index;
String timestamp;
String mail;
Integer zipCode;
...
I've tried with:
StringTokenizer st1 = new StringTokenizer("\"N001\";\"2014-08-12-07.11.37.352000\";\" \";\"some#email.com \";4847 ;\"street\";\"NAME SURNAME \";26 ;\"CALIFORNIA \";21");
while(st1.hasMoreTokens()) {
System.out.println(st1.nextToken(";").replaceAll("\"",""));
}
The output is correct. I was thinking of keeping a counter and hard-coding a switch statement inside the loop that sets each field depending on the counter, but the problem is that I have 40 fields...
Any ideas?
Thanks a lot!
String line = "\"N001\";\"2014-08-12-07.11.37.352000\";\" \";\"some#email.com \";4847 ;\"street\";\"NAME SURNAME \";26 ;\"CALIFORNIA \";21";
StringTokenizer st1 = new StringTokenizer(line, ";");
while(st1.hasMoreTokens()) {
System.out.println(st1.nextToken().replaceAll("\"",""));
}
Or you can use the split method and directly get an array of values using the delimiter ;
String[] values = line.split(";");
Then iterate through the array and convert each value to the type you need.
Regardless of the way you are parsing the file, you somehow need to define the mapping of column-to-field (and how to parse the text).
If this is a CSV file, you could use a library like super-csv. All you need to do is write a mapping definition.
I would first split your input String based on the semi-colon separator, then clean up the values.
For instance:
String input = "\"N001\";\"2014-08-12-07.11.37.352000\";\" " +
"\";\"some#email.com " +
"\";4847 ;\"street\";\"NAME " +
"SURNAME \";26 ;\"CALIFORNIA " +
"\";21 ";
// raw split
String[] split = input.split(";");
System.out.printf("Raw: %n%s%n", Arrays.toString(split));
// cleaning up whitespace and double quotes
ArrayList<String> cleanValues = new ArrayList<String>();
for (String s: split) {
String clean = s.replaceAll("[\\s\"]", "");
if (!clean.isEmpty()) {
cleanValues.add(clean);
}
}
System.out.printf("Clean: %n%s%n", cleanValues);
Output
Raw:
["N001", "2014-08-12-07.11.37.352000", " ", "some#email.com ", 4847 , "street", "NAME SURNAME ", 26 , "CALIFORNIA ", 21 ]
Clean:
[N001, 2014-08-12-07.11.37.352000, some#email.com, 4847, street, NAMESURNAME, 26, CALIFORNIA, 21]
Note
In order to map the values to your variables you will need to know their index in advance, and it will have to be consistent.
Then you can use the get(int i) method to retrieve them from your List - e.g. cleanValues.get(2) will get you the e-mail, etc.
Note (2)
If you do not know the indices in advance or they may vary, then you are in trouble.
You can of course try to get those indices by using regular expressions but I suspect you might end up complicating your life quite a bit.
You can use Java reflection to automate the process.
Iterate over the fields:
Field[] fields = dummyRow.getClass().getFields(); // getFields() returns only public fields; use getDeclaredFields() for non-public ones
and set your values:
SomeClass object = construct.newInstance();
field.set(object, value);
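A fuller sketch of the reflection idea might look like the following. Everything here is illustrative: the names RowMapper, Row and COLUMNS are hypothetical, the field names come from the question, and the column-to-field assignment (in particular that 4847 is the zip code and that the blank third column is unused) is an assumption. The column order is spelled out in an array because getDeclaredFields() does not guarantee any particular order.

```java
import java.lang.reflect.Field;

class RowMapper {
    // Hypothetical target class using the field names from the question.
    static class Row {
        String index;
        String timestamp;
        String mail;
        Integer zipCode;
    }

    // Field name for each column position; null marks a column that is skipped.
    // This assignment is an assumption about the sample line.
    static final String[] COLUMNS = {"index", "timestamp", null, "mail", "zipCode"};

    static Row map(String line) {
        String[] values = line.split(";");
        Row row = new Row();
        try {
            for (int i = 0; i < COLUMNS.length && i < values.length; i++) {
                if (COLUMNS[i] == null) {
                    continue; // unmapped column
                }
                // Strip the surrounding quotes and the padding spaces.
                String text = values[i].replace("\"", "").trim();
                Field field = Row.class.getDeclaredField(COLUMNS[i]);
                field.setAccessible(true);
                if (field.getType() == Integer.class) {
                    field.set(row, text.isEmpty() ? null : Integer.valueOf(text));
                } else {
                    field.set(row, text);
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Column mapping does not match Row", e);
        }
        return row;
    }

    public static void main(String[] args) {
        Row row = map("\"N001\";\"2014-08-12-07.11.37.352000\";\" \";\"some#email.com \";4847 ;\"street\"");
        System.out.println(row.index + " " + row.mail + " " + row.zipCode);
    }
}
```

With 40 fields, only the COLUMNS array grows; the loop stays the same, which is the main appeal over a switch on a counter.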
I have read a whole XML file into a single string using a class. The output is:
String result=<?xml version="1.0"?><catalog><book id="bk101"><part1><date>Fri Apr 05 11:46:46 IST 2013</date><author>Gambardella, Matthew</author><title>XML Developer's Guide</title><genre>Computer</genre><price>44.95</price> <publish_date>2000-10-01</publish_date></part1></book></catalog>
Now I want to replace the date value, so first I want to extract the date from the string and then replace it with a new value. I have the following code:
Date date=new Date();
String str=result.substring(result.indexOf("<date>"));
It returns the whole string from the date tag to the end of the document.
How can I extract the date tag and replace it?
This here gets the contents of the tags using regex... but as for replacing it - I'll get back to you.
String result = "<?xml version=\"1.0\"?><catalog><book id=\"bk101\"><part1><date>Fri Apr 05 11:46:46 IST 2013</date><author>Gambardella, Matthew</author><title>XML Developer's Guide</title><genre>Computer</genre><price>44.95</price> <publish_date>2000-10-01</publish_date></part1></book></catalog>";
String pattern = ".*(?i)(<date.*?>)(.+?)(</date>).*";
System.out.println(result.replaceAll(pattern, "$2"));
Cheers
String str=result.substring(result.indexOf("<date>") ,result.indexOf("</date>")+"</date>".length());
String#substring(int beginIndex)
Returns a new string that is a substring of this string. The substring
begins with the character at the specified index and extends to the
end of this string.
String#substring(int beginIndex,int endIndex)
Returns a new string that is a substring of this string. The substring
begins at the specified beginIndex and extends to the character at
index endIndex - 1. Thus the length of the substring is
endIndex-beginIndex.
Edit: Oh, you wanted it in java. This is the C# solution =)
You can solve this by replacing the entire date including the tag.
You have two dates in your XML so to be sure that you will not replace both of them you can do it like this.
int index1 = result.IndexOf("<date>");
int index2 = result.IndexOf("</date>") - index1 + "</date>".Length;
var stringToReplace = result.Substring(index1, index2);
var newResult = result.Replace(stringToReplace, "<date>" + "The Date that you want to insert" + "</date>");
Just the value:
String str = result.substring(result.indexOf("<date>") + "<date>".length(),
result.indexOf("</date>"));
Including the tags:
String str = result.substring(result.indexOf("<date>"),
result.indexOf("</date>") + "</date>".length());
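Since the question also asks how to replace the value, here is a minimal Java sketch that reuses the same two indexes. The class name DateReplacer and the method name replaceDate are hypothetical, and this is plain string handling under the assumption that the document contains one simple <date>...</date> element; an XML parser would be more robust for anything less regular.

```java
class DateReplacer {
    // Replaces the text between the first <date> and the following </date>.
    // Returns the input unchanged if either tag is missing.
    static String replaceDate(String xml, String newValue) {
        String open = "<date>";
        String close = "</date>";
        int start = xml.indexOf(open);
        int end = (start < 0) ? -1 : xml.indexOf(close, start);
        if (start < 0 || end < 0) {
            return xml;
        }
        return xml.substring(0, start + open.length()) + newValue + xml.substring(end);
    }

    public static void main(String[] args) {
        String xml = "<part1><date>Fri Apr 05 11:46:46 IST 2013</date>"
                + "<publish_date>2000-10-01</publish_date></part1>";
        System.out.println(replaceDate(xml, "NEW DATE"));
    }
}
```

The <publish_date> element is left alone because the search string "<date>" requires the opening bracket directly before "date".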