extracting a particular field from url - java

I want to extract a particular field from the URL of a Facebook page. I am not able to extract it because the link format is not fixed. For example, given the input below, it should produce the desired output:
1) https://www.facebook.com/pages/Ice-cream/109301862430120?rf=102173023157556
Output: 109301862430120
What about this type of link
Can anyone help me?

So in short, you want the name after the last / and (if there is one) before the ? mark.
You can do it using the URI and File classes:
String data = "https://www.facebook.com/pages/Anti-Christian-sentiment/149675731889496?ref=br_tf";
System.out.println(new File(new URI(data).getRawPath()).getName());
Output: 149675731889496
If you need to use a regex, you can use
([^/?]+)(\\?|$)
and just read the content of group 1 (the one in the first pair of parentheses).
If you don't want to use groups and want the regex to match only the digit part (without including ? in the match), you can use a look-around mechanism such as look-ahead (?=...). The regex would look like
[^/?]+(?=\\?|$)
Code example:
String data = "https://www.facebook.com/pages/Anti-Christian-sentiment/149675731889496?ref=br_tf";
Pattern p = Pattern.compile("([^/?]+)(\\?|$)");
Matcher m = p.matcher(data);
if (m.find()) {
    System.out.println(m.group(1));
}
Output:
149675731889496
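The look-ahead variant described above can be sketched the same way. This is only an illustration: the class and method names are made up, and it assumes the wanted segment is directly followed by ? or the end of the string. A URL with a trailing slash before the query would not give the ID.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LastPathSegment {
    // Match a run of characters that are neither '/' nor '?', then only
    // assert (without consuming) that a '?' or the end of input follows.
    private static final Pattern SEGMENT = Pattern.compile("[^/?]+(?=\\?|$)");

    // Returns the first such segment, or null if there is none.
    static String extract(String url) {
        Matcher m = SEGMENT.matcher(url);
        return m.find() ? m.group() : null; // no capturing group needed
    }

    public static void main(String[] args) {
        System.out.println(extract(
            "https://www.facebook.com/pages/Ice-cream/109301862430120?rf=102173023157556"));
    }
}
```

Because the look-ahead consumes nothing, m.group() is already the bare ID and there is no need to ask for group 1.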

Related

Regex pattern error on API 21 (Android 5) and below

On Android 5 and below, my regex pattern throws an error at runtime:
java.util.regex.PatternSyntaxException: Syntax error in regexp pattern near index 4:
(?<g1>(http|ftp)(s)?://)?(?<g2>[\w-:#])+(?<TLD>\.[\w\-]+)+(:\d+)?((|\?)([\w\-._~:/?#\[\]#!$&'()*+,;=.%])*)*
Here is code sample:
val urlRegex = "(?<g1>(http|ftp)(s)?://)?(?<g2>[\\w-:#])+(?<TLD>\\.[\\w\\-]+)+(:\\d+)?((|\\?)([\\w\\-._~:/?#\\[\\]#!$&'()*+,;=.%])*)*"
val sampleUrl = "https://www.google.com"
val urlMatchers = Pattern.compile(urlRegex).matcher(sampleUrl)
assert(urlMatchers.find())
This pattern works fine on all API levels above 21.

It seems the earlier versions do not support named groups. As per this source, the named groups were introduced in Kotlin 1.2. Remove them if you do not need those submatches and only use the regex for validation.
Your regex is very inefficient as it contains a lot of nested quantified groups. See a "cleaner" version of it below.
Also, it seems you want to check if there is a regex match inside your input string. Use Regex#containsMatchIn():
val urlRegex = "(?:(?:http|ftp)s?://)?[\\w:#.-]+\\.[\\w-]+(?::\\d+)?\\??[\\w.~:/?#\\[\\]#!$&'()*+,;=.%-]*"
val sampleUrl = "https://www.google.com"
val urlMatchers = Regex(urlRegex).containsMatchIn(sampleUrl)
println(urlMatchers) // => true
See the Kotlin demo and the regex demo.
If you need to check the whole string match use matches:
Regex(urlRegex).matches(sampleUrl)
See another Kotlin demo.
Note that to define a regex, you need to use the Regex class constructor.
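For readers who call java.util.regex directly rather than Kotlin's Regex, the difference between a partial match and a whole-string match could be sketched as below. The class and method names are hypothetical; the pattern is the cleaned-up one from this answer, which uses only non-capturing groups and no named groups.

```java
import java.util.regex.Pattern;

final class UrlCheck {
    // Same expression as the Kotlin answer: no named groups anywhere.
    private static final Pattern URL = Pattern.compile(
        "(?:(?:http|ftp)s?://)?[\\w:#.-]+\\.[\\w-]+(?::\\d+)?\\??"
        + "[\\w.~:/?#\\[\\]#!$&'()*+,;=.%-]*");

    // True if a URL-like substring occurs anywhere in the input (like containsMatchIn).
    static boolean containsUrl(String input) {
        return URL.matcher(input).find();
    }

    // True only if the entire input is URL-like (like Regex.matches).
    static boolean isUrl(String input) {
        return URL.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(containsUrl("https://www.google.com"));
        System.out.println(isUrl("visit https://www.google.com now"));
    }
}
```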

Trying to find apt regex expression for my query

Pattern p1 = Pattern.compile("(?:^|)'([^']*?)'(?:$|)");
Matcher m = p1.matcher(input);
//Matcher m = p2.matcher(testcases);
while (m.find()) {
    output += (m.group().replace("\'", "").trim() + "/");
}
Input
/content/folder[#name='folder query 2']/folder[#name='Share file Zone']/folder[#name='steve']/folder[#name="steve's Personal Folder"]/folder[#name='Backup']/folder[#name='20150317']/folder[#name='.Archive']
output should be -
folder query 2,
Share file zone,
steve,
steve's Personal folder,
Backup,
20150317,
.Archive
For some reason my regex seems to match only text in single quotes, so it handles neither steve's nor the double-quoted name. I am trying to format the query, so what I need is only the folder names, whether they are in single or double quotes, without being tripped up by the apostrophe.

Use the following regex: (['"])(.*?)\1
It matches the opening quote (single or double), capturing that character as capture #1, captures the text as capture #2, and ends with the same kind of quote used at the beginning, by matching the capture #1.
Remember to escape " and \ when writing as Java string literal.
Test
String input = "/content/folder[#name='folder query 2']/folder[#name='Share file Zone']/folder[#name='steve']/folder[#name=\"steve's Personal Folder\"]/folder[#name='Backup']/folder[#name='20150317']/folder[#name='.Archive']";
for (Matcher m = Pattern.compile("(['\"])(.*?)\\1").matcher(input); m.find(); )
    System.out.println(m.group(2));
Output
folder query 2
Share file Zone
steve
steve's Personal Folder
Backup
20150317
.Archive

You can capture everything between [#name=['\"] and ['\"]]. Your regex should look like this: \\[#name=['\"](.*?)['\"]]
Pattern p1 = Pattern.compile("\\[#name=['\"](.*?)['\"]]");
Matcher m = p1.matcher(input);
while (m.find()) {
    System.out.println(m.group(1));
}
Output
folder query 2
Share file Zone
steve
steve's Personal Folder
Backup
20150317
.Archive

search and replace string in java using pattern

Given the string
Content ID [9283745997] Content ID [9283005997] There can be text in between Content ID [9283745953] Content ID [9283741197] Content ID [928374500] There can be valid text here which should not be removed.
I want to remove every occurrence of the text Content ID followed by a number in square brackets, such as [9283745997]; any digits can appear between the brackets. The resulting string should be
There can be text in between There can be valid text here which should not be removed.
Could anyone please provide a regex that captures this recurring text, given that the numbers within the square brackets differ each time?
I appreciate your help!
My solution to this was:
Pattern p = Pattern.compile("(Content ID \\[\\d*\\] )");
Matcher m = p.matcher(str);
StringBuffer sb = new StringBuffer();
while (m.find()) {
    m.appendReplacement(sb, "");
}
m.appendTail(sb);
System.out.println(sb);

So basically you are trying to remove each occurrence of Content ID [one or more digits].
To do this you can use the replaceAll("regex", "replacement") method of the String class. As the replacement you can use the empty String "".
The only remaining problem is which regex to use.
To match Content ID, just write it literally as "Content ID ".
To match [ or ], add \ before each of them, because they are regex metacharacters and need to be escaped (in a Java string literal you write \ as "\\").
To represent one digit (a character in the range 0-9), regex uses \d (again, in Java you write \ as "\\", which gives "\\d").
To say "one or more of the previously described element", add + after that element. For example, to match one or more letters a, write a+.
Now you should be able to create the correct regex. If you have any questions, feel free to ask them in the comments.
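Putting those pieces together could look like the sketch below. The class and method names are made up for illustration, and the trailing optional space is an extra borrowed from another answer in this thread so that no double spaces remain.

```java
final class ContentIdRemover {
    // Assembled from the parts listed above:
    //   "Content ID "  the literal text
    //   "\\["          an escaped opening bracket
    //   "\\d+"         one or more digits
    //   "\\]"          an escaped closing bracket
    //   " ?"           an optional trailing space (an addition, see lead-in)
    private static final String REGEX = "Content ID " + "\\[" + "\\d+" + "\\]" + " ?";

    // Removes every "Content ID [digits]" marker from the text.
    static String strip(String text) {
        return text.replaceAll(REGEX, "");
    }

    public static void main(String[] args) {
        String str = "Content ID [9283745997] Content ID [9283005997] There can be text in between "
                + "Content ID [9283745953] Content ID [9283741197] Content ID [928374500] "
                + "There can be valid text here which should not be removed.";
        System.out.println(strip(str));
    }
}
```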

Try this one:
(Content ID \[[0-9]+\])
You can test it here: http://regexpal.com/

I would use the regex
Content ID \[\d+\] ?
Implement it like this:
str.replaceAll("Content ID \\[\\d+\\] ?", "");
You can find an explanation and demonstration here: http://regex101.com/r/qD5rJ6

how to get all names and date of births from a specific file using java

Hi, below is my text file:
welcome to java training
program
Name rtrti*&*
John
address india say^%$7
Date of Birth
11/12/1989
I have 100 files like the one above. The text was extracted from image files, so it is not in order. From this I need to get the names and dates of birth. Can you please suggest how to do this? I am new to this task.
Required output
John
11/12/1989
I have tried
Pattern p = Pattern.compile("Name");
Matcher matcher = p.matcher(content);
matcher.find();
But I have no idea how to get the line after the matched pattern. I cannot read this file line by line because I need to store the entire text in a single string.

I'll give a few hints that will get you on track. Without more details regarding the expected input, it will be difficult to give you a solid solution. First, I trust that you are already familiar with the Pattern and Matcher javadocs. You will need to understand the Groups and capturing section. Finally, you can utilize DOTALL mode which will allow the . character to match newlines.
To get you started, the following should work to find the name:
Pattern p = Pattern.compile(
    "(?s)" +   // DOTALL
    ".*" +     // Match anything (to consume everything before 'Name')
    "Name" +   // Match the literal 'Name'
    ".*?" +    // Reluctantly grab everything until...
    "\n" +     // Newline is reached
    "\\s*" +   // Consume leading whitespace
    "(\\S+)"   // Capture at least one non-whitespace character
);
Matcher m = p.matcher(content);
if (m.find()) {
    String name = m.group(1); // The first capturing group contains "John"
}
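The same idea might be extended to the date of birth. The sketch below is a variation on the pattern above, not a definitive solution. It assumes the labels are spelled exactly "Name" and "Date of Birth", that the name is the first token on the line after the label, and that dates look like dd/mm/yyyy. The class and method names are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NameAndBirthDate {
    // "Name", the rest of that line, a newline, optional whitespace,
    // then the first run of non-whitespace characters.
    private static final Pattern NAME =
        Pattern.compile("Name[^\\n]*\\n\\s*(\\S+)");

    // "Date of Birth", then (reluctantly, across lines thanks to DOTALL)
    // anything up to the first token shaped like dd/mm/yyyy.
    private static final Pattern BIRTH_DATE =
        Pattern.compile("(?s)Date of Birth.*?(\\d{1,2}/\\d{1,2}/\\d{4})");

    // Returns group 1 of the first match, or null when nothing matches.
    static String firstGroup(Pattern pattern, String content) {
        Matcher m = pattern.matcher(content);
        return m.find() ? m.group(1) : null;
    }

    static String name(String content) {
        return firstGroup(NAME, content);
    }

    static String birthDate(String content) {
        return firstGroup(BIRTH_DATE, content);
    }

    public static void main(String[] args) {
        String content = "welcome to java training\nprogram\nName rtrti*&*\nJohn\n"
                + "address india say^%$7\nDate of Birth\n11/12/1989\n";
        System.out.println(name(content));
        System.out.println(birthDate(content));
    }
}
```

Since the text comes from image extraction, noisy labels are likely, so real files may need looser patterns than these.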

match a string of characters between tags:

I have the following strings:
<PAUL SAINT-KARL 1997-05-07>
<BOB DEAN 2001-05-07>
<GUY JEDDY 2007-05-07>
I want a java regex that would match this type of pattern "name and date" and then extract the name and date separately.
I am able to match them separately with the following Java regexes:
1) (\d{4}-\d{2}-\d{2})>
2) <([ A-Z&#;0-9-]*+)
What I'm looking for is one regex that would identify the full text pattern as provided, and then extract the subsections, such as the actual name, and the date.
I'm looking to use Matcher.group() to retrieve the complete match from the target string.
Thanks

Try this:
"<([ A-Z&#;0-9-]*?) (\\d{4}-\\d{2}-\\d{2})>"
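I changed the *+ to *? to make the * match lazily. The possessive *+ never gives characters back, and since the character class also matches spaces, digits, and hyphens, it would swallow the date and leave nothing for the second group to match.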
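A minimal sketch of using this regex with Matcher might look as follows. The class and method names are made up; group() returns the complete match, group(1) the name, and group(2) the date.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagParser {
    private static final Pattern TAG =
        Pattern.compile("<([ A-Z&#;0-9-]*?) (\\d{4}-\\d{2}-\\d{2})>");

    // Returns { whole match, name, date } for the first tag found, or null.
    static String[] parse(String text) {
        Matcher m = TAG.matcher(text);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(), m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = parse("<PAUL SAINT-KARL 1997-05-07>");
        System.out.println(parts[0]); // complete match
        System.out.println(parts[1]); // name
        System.out.println(parts[2]); // date
    }
}
```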
