Scanning Two Different Data Types in Java

I have a data file that is a list of names, followed by "*****", and then continues with integers. How do I scan the names, stop at the asterisks, and then scan the integers?

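This question might help: Splitting up data file in Java Scanner
Use the Scanner.useDelimiter() method with "*****" as the delimiter. The argument is a regular expression and * is a metacharacter, so the asterisks must be escaped (a bare "*****" throws a PatternSyntaxException), for example:
sc.useDelimiter("\\*{5}"); // or sc.useDelimiter(Pattern.quote("*****"));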
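A minimal sketch of the useDelimiter idea, assuming the names are one per line and the integers are separated by whitespace (the file format isn't shown, so both are assumptions). It reads everything before the marker as one token, then switches the delimiter so that the asterisks and whitespace are skipped before each integer. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class DelimiterSwitch {
    // Reads names up to the "*****" marker, then the integers after it.
    // Assumption: one name per line; integers separated by whitespace.
    static void read(Scanner sc, List<String> names, List<Integer> numbers) {
        sc.useDelimiter("\\*{5}"); // escaped: a bare "*****" is not a valid regex
        if (sc.hasNext()) {
            // The first token is the whole block before the marker
            for (String line : sc.next().split("\\r?\\n")) {
                if (!line.trim().isEmpty()) {
                    names.add(line.trim());
                }
            }
        }
        // Switch delimiters mid-stream: Scanner skips input matching the
        // delimiter before each token, so the marker and whitespace are ignored
        sc.useDelimiter("[\\s*]+");
        while (sc.hasNextInt()) {
            numbers.add(sc.nextInt());
        }
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        read(new Scanner("Alice\nBob Smith\n*****\n1 2\n3\n"), names, numbers);
        System.out.println(names);   // [Alice, Bob Smith]
        System.out.println(numbers); // [1, 2, 3]
    }
}
```

For a real file, the Scanner would be built from a File instead of a String literal.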
Alternatively:
Read the whole input into a single string.
Split it with String.split(), again escaping the asterisks: split("\\*{5}").
The resulting array will contain the names at index 0 and the integers at index 1.
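The split-based alternative can be sketched like this, again assuming one name per line and whitespace-separated integers. The limit argument of 2 keeps the result at two elements even if "*****" appears again later. The class and helper names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

public class NamesThenNumbers {
    // Splits at the first "*****": index 0 = names, index 1 = integers.
    static String[] splitSections(String input) {
        String[] parts = input.split("\\*{5}", 2); // '*' must be escaped in a regex
        if (parts.length < 2) {
            throw new IllegalArgumentException("no ***** separator found");
        }
        return parts;
    }

    // Assumption: one name per line; blank lines are ignored.
    static List<String> names(String section) {
        List<String> names = new ArrayList<>();
        for (String line : section.split("\\r?\\n")) {
            if (!line.trim().isEmpty()) {
                names.add(line.trim());
            }
        }
        return names;
    }

    // Assumption: integers are separated by any whitespace.
    static List<Integer> numbers(String section) {
        List<Integer> numbers = new ArrayList<>();
        for (String token : section.trim().split("\\s+")) {
            if (!token.isEmpty()) { // an empty section yields one empty token
                numbers.add(Integer.parseInt(token));
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        // For a real file, read its whole text first (e.g. with java.nio.file.Files)
        String[] parts = splitSections("Alice\nBob Smith\n*****\n1 2\n3\n");
        System.out.println(names(parts[0]));   // [Alice, Bob Smith]
        System.out.println(numbers(parts[1])); // [1, 2, 3]
    }
}
```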

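The code below should work for you. The character class [*\\s]+ treats any run of asterisks and whitespace as a single delimiter, and each branch must consume its token; otherwise hasNext() keeps returning true and the loop never ends:
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
// input: your data, for example the file contents as a String
Scanner scanner = new Scanner(input).useDelimiter("[*\\s]+");
List<String> names = new ArrayList<>();
List<Integer> numbers = new ArrayList<>();
while (scanner.hasNext()) {
    if (scanner.hasNextInt()) {
        numbers.add(scanner.nextInt()); // For Integer
    } else {
        names.add(scanner.next()); // For String
    }
}
Note that a name containing spaces is read as several tokens, and a "name" made only of digits would be taken as an integer.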

Although this seems tedious, it avoids having to check whether split() returned anything, and the risk of an out-of-bounds index.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String x = "abc*****12354";
final Pattern p = Pattern.compile("[A-Z]*[a-z]*\\*{5}");
final Matcher m = p.matcher(x);
while (m.find()) {
    System.out.println(m.group());
}
final Pattern p1 = Pattern.compile("\\*{5}[0-9]*");
final Matcher m1 = p1.matcher(x);
while (m1.find()) {
    System.out.println(m1.group());
}
The first match minus its trailing five asterisks and the second match minus its leading five asterisks (both removable with substring()) give the requested fields. Note that [A-Z]*[a-z]* only matches a single word such as "abc" or "Abc", so it needs widening for a real list of names.

Related

How to get a String with Java regular expression in brackets within brackets

How can I get the strings inside the brackets? See the code below.
String str = "C1<C2, C3<T1>>.C4<T2>.C5";
I need to get C1<C2, C3<T1>>, C4<T2>, and C5.
Here is what I tried:
Pattern pat = Pattern.compile("(\\w+(<[^>]+>)?)(.\\w+(<[^>]+>)?)*");
Matcher mat = pat.matcher(str);
but the result was
C1<C2, C3<T1>
There are two problems that I see with your code:
1. It seems like you are only printing the first match instead of looping through the results. Use while (mat.find()) to iterate through the matches.
2. Simplify your pattern to \\w+(<[^>]+>+)? to get C1<C2, C3<T1>>, C4<T2>, and C5.
RegEx pattern explained:
\w+ = one or more word characters (letters, digits, or underscore)
()? = zero or one occurrence of what is in the parentheses
< = match the < character
[^>]+ = one or more characters other than >
>+ = one or more > characters (an alternative would be >{1,2} if you want to allow only one or two > characters)
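Your resulting code should look like the following:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class GenericParts {
    public static void main(String[] args) {
        String str = "C1<C2, C3<T1>>.C4<T2>.C5";
        Pattern pat = Pattern.compile("\\w+(<[^>]+>+)?");
        Matcher mat = pat.matcher(str);
        while (mat.find()) { // visit every match, not just the first
            System.out.println(mat.group());
        }
    }
}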
If you just want a list of the parts, though, a much simpler way is to use split() instead of a regex: split the string on ., save the pieces in an array, and then iterate through the array as needed. This works as long as no dots appear inside the angle brackets, as in the sample.
That would be accomplished with the following:
String[] parts = str.split("\\.");
Just split on dots:
String[] parts = str.split("\\.");
This does what you want using the sample input in the question.
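If the type arguments can themselves contain dots (for example a hypothetical input like A<java.lang.String>.B), a plain split("\\.") would cut through them. A small sketch of one way around that, not taken from the answers above: walk the string, track the angle-bracket depth, and split only on dots at depth 0.

```java
import java.util.ArrayList;
import java.util.List;

public class TopLevelSplit {
    // Splits on '.' only when not nested inside <...>,
    // so "A<java.lang.String>.B" gives "A<java.lang.String>" and "B".
    static List<String> splitTopLevel(String s) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0; // current nesting level of angle brackets
        for (char c : s.toCharArray()) {
            if (c == '<') {
                depth++;
            } else if (c == '>') {
                depth--;
            }
            if (c == '.' && depth == 0) {
                parts.add(current.toString()); // a top-level dot ends a part
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString()); // the part after the last top-level dot
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitTopLevel("C1<C2, C3<T1>>.C4<T2>.C5"));
        // prints [C1<C2, C3<T1>>, C4<T2>, C5]
    }
}
```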

Java String: How to get part of a package name in Android?

It's basically about getting the string value between two characters. SO has many questions related to this, such as:
How to get a part of a string in java?
How to get a string between two characters?
Extract string between two strings in java
and more.
But I found it quite confusing when the string contains multiple dots and I need the value between two particular dots.
I have the package name:
au.com.newline.myact
I need to get the value between "com." and the next dot (.), in this case "newline". I tried
Pattern pattern = Pattern.compile("com.(.*).");
Matcher matcher = pattern.matcher(beforeTask);
while (matcher.find()) {
int ct = matcher.group();
I also tried substring() and indexOf(), but couldn't get the intended result. Because Android package names vary in their number of dots and characters, I cannot use a fixed index. Any ideas?
As you probably know (based on the .* part of your regex), the dot . is a special character in regular expressions that represents any character (except line separators). To make it match only a literal dot, you need to escape it: put \ before it (written "\\." in a Java string literal) or place it inside the character class [.].
Also, to get only the part captured by the parentheses (.*), you need to select it with the proper group index, which in your case is 1.
So try:
String beforeTask = "au.com.newline.myact";
Pattern pattern = Pattern.compile("com[.](.*)[.]");
Matcher matcher = pattern.matcher(beforeTask);
while (matcher.find()) {
String ct = matcher.group(1);//remember that regex finds Strings, not int
System.out.println(ct);
}
Output: newline
If more dots can follow and you want only the element up to the next ., change the greedy * quantifier in .* to reluctant by adding ? after it:
Pattern pattern = Pattern.compile("com[.](.*?)[.]");
// the ? after .* makes the quantifier reluctant
Another approach is to accept only non-dot characters instead of .*, using the negated character class [^.]*:
Pattern pattern = Pattern.compile("com[.]([^.]*)[.]");
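To see the difference between the three patterns side by side, here is a small sketch run against the longer package name used in the indexOf example below. The helper name firstGroup is made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ComSegment {
    // Returns group 1 of the first match of regex in input, or null if there is none.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String pkg = "au.com.newline.myact.modelact";
        // Greedy: .* runs on and backtracks to the LAST dot, capturing "newline.myact"
        System.out.println(firstGroup("com[.](.*)[.]", pkg));
        // Reluctant: .*? stops at the FIRST dot after "com.", capturing "newline"
        System.out.println(firstGroup("com[.](.*?)[.]", pkg));
        // Negated class: [^.]* cannot cross a dot, also capturing "newline"
        System.out.println(firstGroup("com[.]([^.]*)[.]", pkg));
    }
}
```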
If you don't want to use regex, you can use the indexOf method to locate the positions of com. and the next . after it, then take the substring between them.
String beforeTask = "au.com.newline.myact.modelact";
int comIndex = beforeTask.indexOf("com.");
int start = comIndex + 4; // +4 since we also want to skip the 'com.' part
int end = beforeTask.indexOf(".", start); // find the next `.` after the start index
if (comIndex >= 0 && end >= 0) { // both "com." and a following "." must exist
    String result = beforeTask.substring(start, end);
    System.out.println(result); // newline
}
You can use reflection to get the name of any class. For example, if I have a class Runner in com.some.package, I can run
Runner.class.getName() // string is "com.some.package.Runner"
to get the full name of the class, which happens to have the package name inside. (Runner.class.toString() would return "class com.some.package.Runner" instead.)
To get the part after "com", you can use Runner.class.getName().split("\\.") (split() takes a regex, so the dot must be escaped) and then iterate over the returned array with a boolean flag.
All you have to do is split the string on "." and then iterate through the parts until you find one that equals "com". The next string in the array will be what you want.
So your code would look something like:
String[] parts = packageName.split("\\.");
int i = 0;
for (String part : parts) {
    if (part.equals("com")) {
        break;
    }
    ++i;
}
// i is now the index of "com" (or parts.length if it was not found)
String result = (i + 1 < parts.length) ? parts[i + 1] : "";
private String getStringAfterComDot(String packageName) {
    String[] strArr = packageName.split("\\.");
    for (int i = 0; i < strArr.length - 1; i++) { // stop early so strArr[i + 1] exists
        if (strArr[i].equals("com")) {
            return strArr[i + 1];
        }
    }
    return "";
}
I have done heaps of website-scraping projects before, and I ended up creating my own functions/utils to get the job done. Regex can be overkill if you just want to extract a substring from a given string like the one you have. Below is the function I normally use for this kind of task.
private String GetValueFromText(String sText, String sBefore, String sAfter)
{
String sRetValue = "";
int nPos = sText.indexOf(sBefore);
if ( nPos > -1 )
{
int nLast = sText.indexOf(sAfter, nPos + sBefore.length()); // search from right after sBefore
if ( nLast > -1)
{
sRetValue = sText.substring(nPos+sBefore.length(),nLast);
}
}
return sRetValue;
}
To use it just do the following:
String sValue = GetValueFromText("au.com.newline.myact", ".com.", ".");

Replacing digits separated with commas using String.replace("","");

I have a string that looks like the following:
Turns 13,000,000 years old
Now I want to convert the number to English words. I have a function ready for that, but I am having trouble detecting the original number (13,000,000 in this case) because it is separated by commas.
Currently I am using the following regex to detect a number in a string:
stats = stats.replace((".*\\d.*"), (NumberToWords.start(Integer.valueOf(notification_data_greet))));
But this doesn't seem to work. Any suggestions?
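You need to extract the number using a regex that allows for the commas. (Note that String.replace() treats its first argument as literal text, not as a regex; replaceAll() is the regex version.) The most robust regex I can think of right now is
\d{1,3}(,?\d{3})*
which matches any unsigned integer, both with correctly placed commas and without commas (and odd combinations thereof such as 100,000000). One caveat: when searching a longer text with find(), a comma-free number of more than three digits can be split into pieces (1234 is found as 123 and 4); surrounding the pattern with word boundaries, \b\d{1,3}(,?\d{3})*\b, avoids that.
Then remove all commas from the match and parse as usual: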
Pattern p = Pattern.compile("\\d{1,3}(,?\\d{3})*"); // You can store this as static final
Matcher m = p.matcher(input);
while (m.find()) { // Go through all matches
String num = m.group().replace(",", "");
int n = Integer.parseInt(num);
// Do stuff with the number n
}
Working example:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
public static void main(String[] args) {
String input = "1,300,000,000";
Pattern p = Pattern.compile("\\d{1,3}(,?\\d{3})*"); // You can store this as static final
Matcher m = p.matcher(input);
while (m.find()) { // Go through all matches
String num = m.group().replace(",", "");
System.out.println(num);
int n = Integer.parseInt(num);
System.out.println(n);
}
}
}
Gives output
1300000000
1300000000
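The answers above extract the numbers, while the question wants to replace them inside the sentence. A sketch of that step with Matcher.appendReplacement() and appendTail(): the toWords parameter stands in for the asker's own NumberToWords helper, whose exact signature isn't shown, so here it is an assumed LongFunction<String>. The pattern adds \b word boundaries so that find() cannot split a comma-free number such as 1234.

```java
import java.util.function.LongFunction;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceNumbers {
    // Digits with optional thousands commas, anchored at word boundaries
    private static final Pattern NUMBER = Pattern.compile("\\b\\d{1,3}(,?\\d{3})*\\b");

    // Replaces every number in text with toWords.apply(value).
    // toWords is a placeholder for your own converter (hypothetical here).
    static String replaceNumbers(String text, LongFunction<String> toWords) {
        Matcher m = NUMBER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            long value = Long.parseLong(m.group().replace(",", "")); // strip commas, then parse
            // quoteReplacement stops '$' or '\' in the words being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(toWords.apply(value)));
        }
        m.appendTail(out); // copy the text after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        // A placeholder converter, for illustration only
        System.out.println(replaceNumbers("Turns 13,000,000 years old", n -> "[" + n + "]"));
        // prints Turns [13000000] years old
    }
}
```

Using long rather than int leaves room for values above Integer.MAX_VALUE; numbers with more than 18 or so digits would still overflow Long.parseLong.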
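Try this regex:
[0-9][0-9]?[0-9]?(,?[0-9][0-9][0-9])*
This matches numbers that use a comma as the thousands separator (the comma before each group of three digits is optional). So it will match
10,000,000
as a single number, but for
10,1,1,1
it matches only the leading 10.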
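You can do it with the help of NumberFormat (the superclass of DecimalFormat) instead of a regular expression. Pass a locale whose grouping separator is a comma, such as Locale.US, and note that parse() throws the checked java.text.ParseException:
NumberFormat format = NumberFormat.getInstance(Locale.US);
System.out.println(format.parse("10,000,000")); // prints 10000000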
Try the regex below to match comma-separated numbers:
\d{1,3}(,\d{3})+
Make the last part optional (* instead of +) to also match numbers that aren't separated by commas; note that this only covers comma-free numbers of up to three digits, such as 999:
\d{1,3}(,\d{3})*

Regex - extract indefinite number of hits

The method getPolygonPoints() (see below) receives a String name as its parameter, which looks something like this:
points={{-100,100},{-120,60},{-80,60},{-100,100},{-100,100}}
The first number stands for the x coordinate, the second for the y coordinate. For example, the first point is
x=-100
y=100
The second point is
x=-120
y=60
and so on.
Now I want to extract the points from the String and put them in an ArrayList, which should look like this at the end:
[-100, 100, -120, 60, -80, 60, -100, 100, -100, 100]
The catch is that the number of points in the given String varies and is not always the same.
I have written the following code:
private ArrayList<Integer> getPolygonPoints(String name) {
// the regular expression
String regGroup = "[-]?[\\d]{1,3}";
// compile the regular expression into a pattern
Pattern regex = Pattern.compile("\\{(" + regGroup + ")");
// the mather
Matcher matcher;
ArrayList<Integer> points = new ArrayList<Integer>();
// matcher that will match the given input against the pattern
matcher = regex.matcher(name);
int i = 1;
while(matcher.find()) {
System.out.println(Integer.parseInt(matcher.group(i)));
i++;
}
return points;
}
The first x coordinate is extracted correctly, but then an IndexOutOfBoundsException is thrown. I think that happens because group 2 is not defined.
I think I first have to count the points and then iterate that many times, adding the int values to the ArrayList with a simple add() inside the loop. But I don't know how to do this. Maybe I don't understand the regex part at this point, especially how the groups work.
Please help!
String points = "{{-100,100},{-120,60},{-80,60},{-100,100},{-100,100}}";
String[] strs = points.replaceAll("(\\{|\\})", "").split(",");
ArrayList<Integer> list = new ArrayList<Integer>(strs.length);
for (String s : strs)
{
list.add(Integer.valueOf(s));
}
The part you don't seem to understand about the regex API is that capture group numbers "reset" with every call to find(). To put it another way: the number of a capture group is its position in the pattern, not in the input string.
You're also going about this the wrong way. You should match the whole construct you're looking for, in this case the {x,y} pairs. I'm assuming you don't want to validate the format of the whole string, so we can ignore the outside brackets and comma:
Pattern p = Pattern.compile("\\{(-?\\d+),(-?\\d+)\\}");
Matcher m = p.matcher(name);
while (m.find()) {
String x = m.group(1);
String y = m.group(2);
// parse and add to list
}
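Filling in the "parse and add to list" step gives a complete method; a sketch, with a class name chosen for illustration and the return type set to List as suggested further down. Because group numbers refer to positions in the pattern, group(1) and group(2) are valid on every successful find().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PolygonPoints {
    // One match per {x,y} pair: group 1 is x and group 2 is y on every find()
    private static final Pattern PAIR = Pattern.compile("\\{(-?\\d+),(-?\\d+)\\}");

    static List<Integer> getPolygonPoints(String name) {
        List<Integer> points = new ArrayList<>();
        Matcher m = PAIR.matcher(name);
        while (m.find()) {
            points.add(Integer.parseInt(m.group(1))); // x
            points.add(Integer.parseInt(m.group(2))); // y
        }
        return points;
    }

    public static void main(String[] args) {
        String name = "points={{-100,100},{-120,60},{-80,60},{-100,100},{-100,100}}";
        System.out.println(getPolygonPoints(name));
        // prints [-100, 100, -120, 60, -80, 60, -100, 100, -100, 100]
    }
}
```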
Alternatively, since you don't care about which coordinate is X and which is Y, you can even do:
Matcher m = Pattern.compile("-?\\d+").matcher(name);
while (m.find()) {
String xOrY = m.group();
// parse etc.
}
Now, if you want to validate the input as well, I'd say that's a separate concern. I wouldn't necessarily try to do it in the same step as the parsing, to keep the regex readable. (It might be possible in this case, but if you don't need it, why bother in the first place?)
You can also try this regex:
((-?\d+)\s*,\s*(-?\d+))
It will give you three groups, numbered by the position of their opening parentheses:
Group 1 : x,y
Group 2 : x
Group 3 : y
Use whichever one you need.
How about doing it in just one line:
List<String> list = Arrays.asList(name.replaceAll("(^\\w+=\\{+)|(\\}+$)", "").split("\\}?,\\{?"));
Your whole method would then be (using java.util.stream.Collectors to convert the strings to Integers):
private List<Integer> getPolygonPoints(String name) {
    return Arrays.stream(name.replaceAll("(^\\w+=\\{+)|(\\}+$)", "").split("\\}?,\\{?"))
            .map(Integer::valueOf)
            .collect(Collectors.toList());
}
This works by first stripping off the leading text with its opening braces and the trailing closing braces, then splitting on commas optionally preceded by } and followed by {.
BTW, you really should return the abstract type List, not the concrete implementation ArrayList.

Scanner through a line with whitespace and comma

I am new to Java and looking for some help with Java's Scanner class. Below is the problem.
I have a text file with multiple lines, and each line has multiple pairs of digits, where each pair is written as digit,digit (for example 3,3 6,4 7,9). The pairs are separated from each other by whitespace. Below is an example from the text file.
1 2,3 3,2 4,5
2 1,3 4,2 6,13
3 1,2 4,2 5,5
What I want is to retrieve each digit separately, so that I can create an array of linked lists out of it. Below is what I have achieved so far.
Scanner sc = new Scanner(new File("a.txt"));
Scanner lineSc;
String line;
Integer vertix = 0;
Integer length = 0;
sc.useDelimiter("\\n"); // For line feeds
while (sc.hasNextLine()) {
line = sc.nextLine();
lineSc = new Scanner(line);
lineSc.useDelimiter("\\s"); // For Whitespace
// What should i do here. How should i scan through considering the whitespace and comma
}
Thanks
Consider using a regular expression, and data that doesn't conform to your expectation will be easily identified and dealt with.
CharSequence inputStr = "2 1,3 4,2 6,13";
String patternStr = "(\\d+),(\\d+)"; // a number, a comma, another number
// Compile and use the regular expression
Pattern pattern = Pattern.compile(patternStr);
Matcher matcher = pattern.matcher(inputStr);
while (matcher.find()) {
    // Group 0 is the whole "a,b" pair; groups 1 and 2 are its two numbers
    for (int i = 1; i <= matcher.groupCount(); i++) {
        String groupStr = matcher.group(i);
        System.out.println(groupStr);
    }
}
Group one and group two correspond to the first and second number in each pair, respectively. The leading number on each line (2 here) is not part of a pair, so this pattern does not match it.
1. Use the nextLine() method of Scanner to get each entire line of text from the file.
2. Then use the BreakIterator class with its static method getCharacterInstance() to step through the individual characters, skipping commas and spaces (for example by checking Character.isDigit()). Note that this splits a multi-digit value such as 13 into 1 and 3, so it only suits data where every value really is a single digit.
3. BreakIterator also gives you flexible methods to separate out sentences, words, and so on.
For more details see this:
http://docs.oracle.com/javase/6/docs/api/java/text/BreakIterator.html
Use the StringTokenizer class. http://docs.oracle.com/javase/1.4.2/docs/api/java/util/StringTokenizer.html
//this is in the while loop
//read each line
String line=sc.nextLine();
//create StringTokenizer, parsing with space and comma
StringTokenizer st1 = new StringTokenizer(line," ,");
Each call to nextToken() then returns the next digit as a string. To read all the digits in the line:
while(st1.hasMoreTokens())
{
String temp=st1.nextToken();
//now if you want it as an integer
int digit=Integer.parseInt(temp);
//now you have the digit! insert it into the linkedlist or wherever you want
}
Hope this helps!
Use split(regex); it's simpler:
while (sc.hasNextLine()) {
    // split on any run of whitespace and/or commas
    final String[] line = sc.nextLine().trim().split("[\\s,]+");
    for (String token : line) {
        int num = Integer.parseInt(token);
        // Do your job with num
    }
}
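The Scanner loop from the question can also handle the commas directly: give the per-line Scanner a delimiter that accepts whitespace and commas, then read with nextInt(). A sketch that keeps every number of a line, in order, in its own LinkedList; it returns a List of LinkedLists because Java does not allow creating generic arrays such as LinkedList<Integer>[] directly. How the leading number and the pairs are interpreted is left to you.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Scanner;

public class PairScanner {
    // Reads lines such as "2 1,3 4,2 6,13" and returns, per line, a LinkedList
    // holding the leading number followed by every number from the pairs.
    static List<LinkedList<Integer>> read(Scanner sc) {
        List<LinkedList<Integer>> result = new ArrayList<>();
        while (sc.hasNextLine()) {
            // Any run of whitespace and/or commas counts as one delimiter,
            // so "4,2" yields 4 and 2 just like "4 2" would
            Scanner lineSc = new Scanner(sc.nextLine()).useDelimiter("[\\s,]+");
            LinkedList<Integer> numbers = new LinkedList<>();
            while (lineSc.hasNextInt()) {
                numbers.add(lineSc.nextInt());
            }
            lineSc.close();
            result.add(numbers);
        }
        return result;
    }

    public static void main(String[] args) {
        // In the question the data comes from new Scanner(new File("a.txt"))
        Scanner sc = new Scanner("1 2,3 3,2 4,5\n2 1,3 4,2 6,13\n3 1,2 4,2 5,5\n");
        for (LinkedList<Integer> line : read(sc)) {
            System.out.println(line); // first line prints [1, 2, 3, 3, 2, 4, 5]
        }
    }
}
```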
