Need to extract data from CSV file

My file contains the data below; every field is a string.
Input
"abcd","12345","success,1234,out",,"hai"
The output should be like below
Column 1: "abcd"
Column 2: "12345"
Column 3: "success,1234,out"
Column 4: null
Column 5: "hai"
We need to use a comma as the delimiter; the null value comes without double quotes.
Could you please help me find a regular expression to parse this data?

You could try a tool like CSVReader from OpenCSV: https://sourceforge.net/projects/opencsv/
You can even configure a CSVParser (used by the reader) to output null under several conditions. From the documentation:
/**
* Denotes what field contents will cause the parser to return null: EMPTY_SEPARATORS, EMPTY_QUOTES, BOTH, NEITHER (default)
*/
public static final CSVReaderNullFieldIndicator DEFAULT_NULL_FIELD_INDICATOR = NEITHER;

You can use this Regular Expression
"([^"]*)"
DEMO: https://regex101.com/r/WpgU9W/1
Match 1
Group 1. 1-5 `abcd`
Match 2
Group 1. 8-13 `12345`
Match 3
Group 1. 16-32 `success,1234,out`
Match 4
Group 1. 36-39 `hai`

Using the regex ("[^"]+")|(?<=,)(,) you can match either quoted strings ("[^"]+"), which should be treated as is, or commas preceded by commas, which denote null field values. All you need to do now is iterate through the matches, check which of the two capture groups is defined, and output accordingly:
String input = "\"abcd\",\"12345\",\"success,1234,out\",,\"hai\"";
Pattern pattern = Pattern.compile("(\"[^\"]+\")|(?<=,)(,)");
Matcher matcher = pattern.matcher(input);
int col = 1;
while (matcher.find()) {
if (matcher.group(1) != null) {
System.out.println("Column " + col + ": " + matcher.group(1));
col++;
} else if (matcher.group(2) != null) {
System.out.println("Column " + col + ": null");
col++;
}
}
Demo: https://ideone.com/QmCzPE
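As an alternative to the regex, the same line can be split with a small hand-written scanner. This is only a sketch under stated assumptions: the class name CsvLineParser and its parse method are hypothetical, quotes are assumed never to be escaped inside a field, and the returned values have their surrounding quotes removed (unlike the output shown in the question). An unquoted empty field becomes null.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
class CsvLineParser {

    /**
     * Splits one CSV line on commas that are outside double quotes.
     * A field with no quotes and no characters becomes null;
     * a quoted field keeps its content (without the quotes).
     * Assumption: quotes are never escaped inside a field.
     */
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;   // between an opening and a closing quote?
        boolean wasQuoted = false;  // did the current field contain quotes at all?

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
                wasQuoted = true;
            } else if (c == ',' && !inQuotes) {
                fields.add(toField(current, wasQuoted));
                current.setLength(0);
                wasQuoted = false;
            } else {
                current.append(c);
            }
        }
        fields.add(toField(current, wasQuoted)); // the last field has no trailing comma
        return fields;
    }

    private static String toField(StringBuilder sb, boolean wasQuoted) {
        return (sb.length() == 0 && !wasQuoted) ? null : sb.toString();
    }

    public static void main(String[] args) {
        String input = "\"abcd\",\"12345\",\"success,1234,out\",,\"hai\"";
        List<String> fields = parse(input);
        for (int i = 0; i < fields.size(); i++) {
            System.out.println("Column " + (i + 1) + ": " + fields.get(i));
        }
    }
}
```

Unlike the lookbehind-based regex, this scanner also reports an empty field at the very start or end of the line.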

Step #1:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String regex = "(,,)";
final String string = "\"abcd\",\"12345\",\"success,1234,out\",,\"hai\"\n"
+ "\"abcd\",\"12345\",\"success,1234,out\",\"null\",\"hai\"";
final String subst = ",\"null\",";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
// The substituted value will be contained in the result variable
final String result = matcher.replaceAll(subst);
System.out.println("Substitution result: " + result);
Original Text:
"abcd","12345","success,1234,out",,"hai"
Transformation: (with null)
"abcd","12345","success,1234,out","null","hai"
Step #2: (use REGEXP)
"([^"]*)"
Result:
abcd
12345
success,1234,out
null
hai
Credits:
Emmanuel Guiton [https://stackoverflow.com/users/7226842/emmanuel-guiton] REGEXP

You can also use the replaceAll function:
final String input = "\"abcd\",\"12345\",\"success,1234,out\",,\"hai\"";
System.out.println(input);
String[] strings = input
.replaceAll(",,", ",\"\",")
.replaceAll(",,", ",\"\",") // if you have more than one null in succession
.replaceAll("\",\"", "\";\"")
.replaceAll("\"\"", "")
.split(";");
for (String string : strings) {
String output = string;
if (output.isEmpty()) {
output = null;
}
System.out.println(output);
}

Related

How to get value of optional parameters from a url with regex in java

I have some URIs from which I want to extract parameters if they exist, and I came up with this code. Can someone help me fix the regex?
cityId and countryId work as expected, but I can't get only the numbers after '-a-'.
Regex
// "/city/berlin-a-10284?cityId=123456&countryId=4545"
// "/city/berlin-a-10284"
// "/city/berlin-a-10284?cityId=123456"
// "/city/berlin-a-10284?countryId=4545"
private String ValueExtractor(String url, String searchWord) {
String regex = "(?<=" + searchWord + ").*?(?=&|$)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(url);
return matcher.find() ? matcher.group() : "";
}
String productId = "";
String cityId = "";
String countryId = "";
if (url.contains("-a-")) {
productId = ValueExtractor(url, "-a-");
}
if (url.contains("cityId")) {
cityId = ValueExtractor(url, "cityId=");
}
if (url.contains("countryId")) {
countryId = ValueExtractor(url, "countryId=");
}
Expected results:
"/city/berlin-a-10284?cityId=123456&countryId=4545"
productId:10284
cityId: 123456
countryId: 4545
"/city/berlin-a-10284"
productId:10284
"/city/berlin-a-10284?cityId=123456"
productId:10284
cityId: 123456
"/city/berlin-a-10284?countryId=4545"
productId:10284
countryId: 4545

Can't get only numbers after word '-a-'
You can use the regex, (?<=-a-)\d+(?=[?&]|$) to retrieve this number.
(?<=-a-)\d+ specifies one or more digits preceded by -a-.
(?=[?&]|$) specifies positive lookahead for ?, or & or end of line.
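A minimal sketch of that regex applied to the four URLs from the question. The class name ProductIdExtractor and the method productId are hypothetical names chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class ProductIdExtractor {

    // One or more digits preceded by "-a-" and followed by '?', '&' or end of input.
    private static final Pattern PRODUCT_ID = Pattern.compile("(?<=-a-)\\d+(?=[?&]|$)");

    /** Returns the digits after "-a-", or an empty string when there are none. */
    static String productId(String url) {
        Matcher matcher = PRODUCT_ID.matcher(url);
        return matcher.find() ? matcher.group() : "";
    }

    public static void main(String[] args) {
        String[] urls = {
            "/city/berlin-a-10284?cityId=123456&countryId=4545",
            "/city/berlin-a-10284",
            "/city/berlin-a-10284?cityId=123456",
            "/city/berlin-a-10284?countryId=4545"
        };
        for (String url : urls) {
            System.out.println(url + " -> productId: " + productId(url));
        }
    }
}
```

Compiling the pattern once into a static field avoids recompiling it on every call, which the original ValueExtractor does.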

Using regex for doing string operation

I have a string
String s = "my name is ${name}. My roll no is ${rollno} ";
I want to do string operations to update the name and rollno using a method.
public void name(String name, String roll)
{
String updated = s.replace("${name}", name).replace("${rollno}", roll);
}
Can we achieve the same by some other means, such as using a regex to replace what follows the first "$", and similarly for the others?

You can use either Matcher#appendReplacement or Matcher#replaceAll (with Java 9+):
A more generic version:
String s="my name is ${name}. My roll no is ${rollno} ";
Matcher m = Pattern.compile("\\$\\{([^{}]+)\\}").matcher(s);
Map<String,String> replacements = new HashMap<>();
replacements.put("name","John");
replacements.put("rollno","123");
StringBuffer replacedLine = new StringBuffer();
while (m.find()) {
// Fall back to the original placeholder text when no replacement is known
String value = replacements.getOrDefault(m.group(1), m.group());
// quoteReplacement stops '$' and '\' in the value from being read as group references
m.appendReplacement(replacedLine, Matcher.quoteReplacement(value));
}
m.appendTail(replacedLine);
System.out.println(replacedLine.toString());
// => my name is John. My roll no is 123
Java 9+ solution:
Matcher m2 = Pattern.compile("\\$\\{([^{}]+)\\}").matcher(s);
String result = m2.replaceAll(x ->
Matcher.quoteReplacement(replacements.getOrDefault(x.group(1), x.group())));
System.out.println( result );
// => my name is John. My roll no is 123
The regex is \$\{([^{}]+)\}:
\$\{ - a ${ char sequence
([^{}]+) - Group 1 (m.group(1)): any one or more chars other than { and }
\} - a } char.

Length of String within tags in java

We need to find the length of the tag names within the tags in Java:
{Student}{Subject}{Marks}100{/Marks}{/Subject}{/Student}
So the length of the Student tag is 7, that of the Subject tag is 7, and that of Marks is 5.
I am trying to split the tags and then find the length of each string within a tag.
But the code I am trying gives me only the first tag name and not the others.
Can you please help me with this?
I am very new to Java. Please let me know if this is a very silly question.
Code part:
System.out.println(
getParenthesesContent("{Student}{Subject}{Marks}100{/Marks}{/Subject}{/Student}"));
public static String getParenthesesContent(String str) {
return str.substring(str.indexOf('{')+1,str.indexOf('}'));
}

You can use a Pattern with the regex \\{([a-zA-Z]*)\\}:
String text = "{Student}{Subject}{Marks}100{/Marks}{/Subject}{/Student}";
Matcher matcher = Pattern.compile("\\{([a-zA-Z]*)\\}").matcher(text);
while (matcher.find()) {
System.out.println(
String.format(
"tag name = %s, Length = %d ",
matcher.group(1),
matcher.group(1).length()
)
);
}
Outputs
tag name = Student, Length = 7
tag name = Subject, Length = 7
tag name = Marks, Length = 5

You might want to try another regex:
String s = "{Abc}{Defg}100{Hij}100{/Klmopr}{/Stuvw}"; // just a sample String
Pattern p = Pattern.compile("\\{\\W*(\\w++)\\W*\\}");
Matcher m = p.matcher(s);
while(m.find()) {
System.out.println(m.group(1) + ", length: " + m.group(1).length());
}
Output you get:
Abc, length: 3
Defg, length: 4
Hij, length: 3
Klmopr, length: 6
Stuvw, length: 5
If you need to use charAt() to walk over the input String, you might want to consider using something like this (I made some explanations in the comments to the code):
String s = "{Student}{Subject}{Marks}100{/Marks}{/Subject}{/Student}";
ArrayList<String> tags = new ArrayList<>();
for(int i = 0; i < s.length(); i++) {
StringBuilder sb = new StringBuilder(); // Use StringBuilder and its append() method to build Strings (it's more efficient than "+=")
if(s.charAt(i) == '{') { // If start of tag is found...
while(i < s.length() && !(Character.isLetter(s.charAt(i)))) { // Skip characters that are not letters (without running past the end)
i++;
}
while(i < s.length() && Character.isLetter(s.charAt(i))) { // Append the letters that are found
sb.append(s.charAt(i));
i++;
}
if(!(tags.contains(sb.toString()))) { // Add final String to ArrayList only if it not contained here yet
tags.add(sb.toString());
}
}
}
for(String tag : tags) { // Printing Strings contained in ArrayList and their length
System.out.println(tag + ", length: " + tag.length());
}
Output you get:
Student, length: 7
Subject, length: 7
Marks, length: 5

Yes, use a regular expression: find the pattern and apply it.

Java - Regex to split mathematical expression for operator excluding operator which comes under brackets

I need to split the string below using a regex, but my attempt also splits the data inside the brackets.
Input
T(i-1).XX_1 + XY_8 + T(i-1).YY_2 * ZY_14
Expected Output
T(i-1).XX_1 , XY_8 , T(i-1).YY_2 , ZY_14
It should not split data that is inside "(" and ")".
I tried the code below, but it also splits the data inside "(" and ")":
String[] result = expr.split("[+*/]");
Any pointers on how to fix this?
I am new to regex.
Input
(T(i-1).XX_1 + XY_8) + T(i-1).YY_2 * (ZY_14 + ZY_14)
Output
T(i-1).XX_1 , XY_8 , T(i-1).YY_2 , ZY_14 , ZY_14
If it is T(i-1), it needs to be ignored (not split).
For the expression below it is not working:
XY_98 + XY_99 +XY_100
String lineExprVal = lineExpr.replaceAll("\\s+","");
String[] result = lineExprVal.split("[+*/-] (?!(^))");

You can split everything outside your parentheses like this:
String str = "T(i-1).XX_1 + XY_8 + T(i-1).YY_2 * ZY_14";
String result[] = str.split("[+*/-] (?!(^))");
// [+*/-] is the list of your delimiters
System.out.println(Arrays.toString(result));
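This will print:
[T(i-1).XX_1 , XY_8 , T(i-1).YY_2 , ZY_14]
The idea is to split only on delimiters that are outside the parentheses. Note that this pattern matches an operator followed by a space, so the - in (i-1) is left alone only because no space follows it.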
EDIT
In your second case you have to use this regex:
String str = "(T(i-1).XX_1+XY_8)+T(i-1).YY_2*(ZY_14+ZY_14)";
String result[] = str.split("[+*+\\/-](?![^()]*(?:\\([^()]*\\))?\\))");
System.out.println(Arrays.toString(result));
This will give you:
[(T(i-1).XX_1+XY_8), T(i-1).YY_2, (ZY_14+ZY_14)]
 ^----Group1------^  ^--Group2-^  ^---Group3--^
This solution was inspired by the post "Regex to match only comma's but not inside multiple parentheses".
Hope this can help you.
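The same idea, splitting only on operators that sit outside parentheses, can also be sketched without a regex by tracking the nesting depth. This is a swapped-in technique, not the answer's regex; the class name TopLevelSplitter and the method splitOperands are hypothetical. It does not depend on spaces around the operators, and it assumes there are no unary operators (a leading minus would yield an empty first operand).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
class TopLevelSplitter {

    /** Splits on + - * / only where the parenthesis depth is zero. */
    static List<String> splitOperands(String expr) {
        List<String> operands = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0; // number of currently open parentheses

        for (char c : expr.toCharArray()) {
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            }
            boolean isOperator = c == '+' || c == '-' || c == '*' || c == '/';
            if (isOperator && depth == 0) {
                // A top-level operator ends the current operand
                operands.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        operands.add(current.toString().trim()); // the last operand
        return operands;
    }

    public static void main(String[] args) {
        System.out.println(splitOperands("T(i-1).XX_1 + XY_8 + T(i-1).YY_2 * ZY_14"));
        System.out.println(splitOperands("XY_98 + XY_99 +XY_100"));
    }
}
```

For an input wrapped in outer parentheses, such as (T(i-1).XX_1 + XY_8) + T(i-1).YY_2, each parenthesized group is kept whole as one operand.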

Splitting your second mathematical expression is really hard, if not impossible, so use a Pattern and Matcher instead. For your expression, you need this regex:
(\w+\([\w-*+\/]+\).\w+)|((?:(\w+\(.*?\))))|(\w+)
To get the result, you need to loop through the matches:
public static void main(String[] args) {
String input = "(T(i-1).XX_1 + XY_8) + X + T(i-1).YY_2 * (ZY_14 + ZY_14) + T(i-1)";
Pattern pattern = Pattern.compile("(\\w+\\([\\w-*+\\/]+\\).\\w+)|((?:(\\w+\\(.*?\\))))|(\\w+)");
Matcher matcher = pattern.matcher(input);
List<String> reslt = new ArrayList<>();
while (matcher.find()) { // loop through the matches
if (matcher.group(1) != null) {
reslt.add(matcher.group(1));
}
// In your case you have to skip these two groups
// if (matcher.group(2) != null) {
// reslt.add(matcher.group(2));
// }
// if (matcher.group(3) != null) {
// reslt.add(matcher.group(3));
// }
if (matcher.group(4) != null) {
reslt.add(matcher.group(4));
}
}
reslt.forEach(System.out::println);
}
This will give you:
T(i-1).XX_1
XY_8
X
T(i-1).YY_2
ZY_14
ZY_14

complex regular expression in Java

I have a rather complex (to me it seems rather complex) problem that I'm using regular expressions in Java for:
I can get any text string that must be of the format:
M:<some text>:D:<either a url or string>:C:<some more text>:Q:<a number>
I started with a regular expression for extracting the text between the M:/:D:/:C:/:Q: as:
String pattern2 = "(M:|:D:|:C:|:Q:.*?)([a-zA-Z_\\.0-9]+)";
And that works fine if the <either a url or string> is just an alphanumeric string. But it all falls apart when the embedded string is a url of the format:
tcp://someurl.something:port
Can anyone help me adjust the above regex to extract the text after :D: as either a URL or an alphanumeric string?
Here's an example:
public static void main(String[] args) {
String name = "M:myString1:D:tcp://someurl.com:8989:C:myString2:Q:1";
boolean matchFound = false;
ArrayList<String> values = new ArrayList<>();
String pattern2 = "(M:|:D:|:C:|:Q:.*?)([a-zA-Z_\\.0-9]+)";
Matcher m3 = Pattern.compile(pattern2).matcher(name);
while (m3.find()) {
matchFound = true;
String m = m3.group(2);
System.out.println("regex found match: " + m);
values.add(m);
}
}
In the above example, my results would be:
myString1
tcp://someurl.com:8989
myString2
1
And note that the strings can be of variable length and alphanumeric, but some other characters are allowed (such as the :// and . characters of a URL).

You mention that the format is constant:
M:<some text>:D:<either a url or string>:C:<some more text>:Q:<a number>
Capture groups can do this for you with the pattern:
"M:(.*):D:(.*):C:(.*):Q:(.*)"
Or you can do a String.split() with a pattern of "M:|:D:|:C:|:Q:". However, the split will return an empty element at the first index. Everything else will follow.
public static void main(String[] args) throws Exception {
System.out.println("Regex: ");
String data = "M:<some text>:D:tcp://someurl.something:port:C:<some more text>:Q:<a number>";
Matcher matcher = Pattern.compile("M:(.*):D:(.*):C:(.*):Q:(.*)").matcher(data);
if (matcher.matches()) {
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println(matcher.group(i));
}
}
System.out.println();
System.out.println("String.split(): ");
String[] pieces = data.split("M:|:D:|:C:|:Q:");
for (String piece : pieces) {
System.out.println(piece);
}
}
Results:
Regex:
<some text>
tcp://someurl.something:port
<some more text>
<a number>
String.split():
<some text>
tcp://someurl.something:port
<some more text>
<a number>

To extract the URL/text part you don't need the regular expression. Use
int startPos = input.indexOf(":D:")+":D:".length();
int endPos = input.indexOf(":C:", startPos);
String urlOrText = input.substring(startPos, endPos);
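A runnable sketch of the indexOf approach, using the sample input from the question. The class name UrlOrTextExtractor and the method extract are hypothetical, and returning null when a marker is missing is an assumption added here, since substring would otherwise throw.

```java
// Hypothetical helper for illustration only.
class UrlOrTextExtractor {

    /** Returns the text between ":D:" and the next ":C:", or null if either marker is missing. */
    static String extract(String input) {
        int marker = input.indexOf(":D:");
        if (marker < 0) {
            return null; // no ":D:" section at all
        }
        int startPos = marker + ":D:".length();
        int endPos = input.indexOf(":C:", startPos);
        if (endPos < 0) {
            return null; // ":D:" is not followed by ":C:"
        }
        return input.substring(startPos, endPos);
    }

    public static void main(String[] args) {
        String input = "M:myString1:D:tcp://someurl.com:8989:C:myString2:Q:1";
        System.out.println(extract(input)); // prints tcp://someurl.com:8989
    }
}
```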

Assuming you need to do some validation along with the parsing:
break the regex into different parts like this:
String m_regex = "[\\w.]+"; // in Java, a . inside [] is just a plain dot
String url_regex = "."; // placeholder: there are plenty of URL regexes online, pick your favorite
String d_regex = "(?:" + url_regex + "|\\p{Alnum}+)"; // a URL or a sequence of alphanumeric characters
String c_regex = "[\\w.]+"; // I'm assuming you want this to be a bit more restrictive; not sure
String q_regex = "\\d+"; // what sort of number exactly? Assuming any string of digits here
String regex = "M:(?<M>" + m_regex + "):"
+ "D:(?<D>" + d_regex + "):"
+ "C:(?<C>" + c_regex + "):"
+ "Q:(?<Q>" + q_regex + ")";
Pattern p = Pattern.compile(regex);
Might be a good idea to keep the pattern as a static field somewhere and compile it in a static block so that the temporary regex strings don't overcrowd some class with basically useless fields.
Then you can retrieve each part by its name:
Matcher m = p.matcher( input );
if (m.matches()) {
String m_part = m.group( "M" );
...
String q_part = m.group( "Q" );
}
You can go even a step further by making a RegexGroup interface whose implementing objects each represent one part of the regex, with a name and the actual regex. You definitely lose simplicity, though, which makes the code harder to understand at a quick glance. (I wouldn't do this; I'm just pointing out that it's possible and has its own benefits.)
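A compact sketch of the named-group approach with the pattern compiled once into a static final field. The class name MessageParser is hypothetical, and URL_REGEX is a deliberately loose stand-in (scheme://host:port) assumed here only so the example runs; substitute a stricter URL pattern as needed. Each group has a distinct name (M, D, C, Q), because Java rejects a pattern that defines the same group name twice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class MessageParser {

    // Assumption: a loose URL shape of scheme://host:port, not a full URL grammar.
    private static final String URL_REGEX = "\\w+://[\\w.-]+:\\d+";

    // Compiled once; each part of the message gets its own named group.
    private static final Pattern MESSAGE = Pattern.compile(
            "M:(?<M>[\\w.]+):"
          + "D:(?<D>(?:" + URL_REGEX + "|\\p{Alnum}+)):"
          + "C:(?<C>[\\w.]+):"
          + "Q:(?<Q>\\d+)");

    /** Returns the four parts in M, D, C, Q order, or null when the input does not match. */
    static String[] parse(String input) {
        Matcher m = MESSAGE.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group("M"), m.group("D"), m.group("C"), m.group("Q") };
    }

    public static void main(String[] args) {
        String[] parts = parse("M:myString1:D:tcp://someurl.com:8989:C:myString2:Q:1");
        for (String part : parts) {
            System.out.println(part);
        }
    }
}
```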
