InnerHTML for Java - java

I would like JavaScript style innerHTML in Java. For instance, I want to get 'TRUE' from the string below:
String control = "<div class='myclass'>TRUE</div>";
But my pattern seems to be off as find() returns false. Ideas anyone?
Pattern pattern = Pattern.compile(">(.*?)<");
Matcher matcher = pattern.matcher(control);
if(matcher.find()) {
result = matcher.group(1);
}

Get rid of the question mark (this makes the quantifier greedy):
public static void main(String[] args) {
String control = "<div class='myclass'>TRUE</div>";
Pattern pattern = Pattern.compile(">(.*)<");
Matcher matcher = pattern.matcher(control);
String result = null;
if(matcher.find()) {
result = matcher.group(1);
}
System.out.print(result);
}
By the way, it would be better to learn how to use Java's DOM objects and XPath classes.
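As a rough illustration of the DOM/XPath route mentioned above, here is a minimal sketch using only the JDK's built-in XML classes. The class name InnerText and the helper innerText are made up for this example, and it assumes the markup is well-formed XML, which real-world HTML frequently is not.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class InnerText {
    // Hypothetical helper: returns the text content of the node selected by xpath.
    // Assumes the markup is well-formed XML; malformed input raises an exception.
    static String innerText(String markup, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            // evaluate(String, Object) returns the string value of the selected node
            return XPathFactory.newInstance().newXPath().evaluate(xpath, doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse markup", e);
        }
    }

    public static void main(String[] args) {
        String control = "<div class='myclass'>TRUE</div>";
        System.out.println(innerText(control, "/div[@class='myclass']"));
    }
}
```

For HTML that is not valid XML, a dedicated HTML parser is likely the safer choice.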

Either use jQuery or, if you really insist on doing it in Java, try using jsoup to strip out the HTML and return only the safe content.

Related

Parsing a String using Java regex

I have a Java string in the format below.
externalCustomerID: { \"custToken\": \"xyz\" }
I want to extract the value xyz from the above string.
Can anyone suggest a regex for that in Java?
Check this one:
Pattern pattern = Pattern.compile("(\\w+: \\{ \"\\w+\": \")(\\w+)");
Matcher matcher = pattern.matcher("externalCustomerID: { \"custToken\": \"xyz\" }");
if (matcher.find()) {
System.out.println(matcher.group(2));
}

How can I extract substring from the given url using regex in Android Studio

I'm trying to extract CANseIqFMnf from the URL https://www.instagram.com/p/CANseIqFMnf/ using regex in Android Studio. Please help me find a regex that works in Android Studio.
Here is the code for my method:
String url = "https://www.instagram.com/p/CANseIqFMnf/";
String REGEX = "/p\//";
Pattern pattern = Pattern.compile(REGEX);
Matcher matcher = pattern.matcher(url);
boolean match = matcher.matches();
if (match){
Log.e("success", "start = " + matcher.start() + " end = " + matcher.end() );
}else{
Log.e("failed", "failed");
}
But it gives me failed in return!
Method 1
You just need the replace method of String; there is no need to compile a pattern and complicate things. (replace treats its argument as literal text, whereas replaceAll interprets it as a regex, in which an unescaped . matches any character.)
String input = "https://www.instagram.com/p/CANseIqFMnf/";
String output = input.replace("https://www.instagram.com/p/", "").replace("/", "");
Log.v(TAG, output);
Note that the first replace removes the URL prefix and the second replace removes any remaining slashes /
Method 2
Pattern pattern = Pattern.compile("https://www.instagram.com/p/(.*?)/");
Matcher matcher = pattern.matcher("https://www.instagram.com/p/CANseIqFMnf/");
while(matcher.find()) {
System.out.println(matcher.group(1));
}
Note that when matcher.find() returns true, group(1) holds the text captured by the first parenthesized group, here (.*?), and group(0) holds the entire match, which in your case is the entire URL.
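The difference between matches() and find(), and between group(0) and group(1), can be sketched as follows. This is only an illustration: the class name ShortcodeDemo, the helper shortcode, and the pattern /p/([^/]+) are assumptions made for this example, not part of the original answers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ShortcodeDemo {
    // Hypothetical helper: returns the path segment after "/p/", or null if there is none.
    static String shortcode(String url) {
        Matcher m = Pattern.compile("/p/([^/]+)").matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String url = "https://www.instagram.com/p/CANseIqFMnf/";
        Matcher m = Pattern.compile("/p/([^/]+)").matcher(url);

        // matches() succeeds only if the pattern covers the WHOLE input
        System.out.println(m.matches()); // false

        // find() looks for the pattern anywhere in the input
        m.reset();
        if (m.find()) {
            System.out.println(m.group(0)); // /p/CANseIqFMnf (the entire match)
            System.out.println(m.group(1)); // CANseIqFMnf (the first capture group)
        }
    }
}
```

This is also why the code in the question reports failure: matches() requires the pattern to match the complete URL, not just a part of it.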
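An alternative without regex is to use the java.nio.file.Paths API. Note that this treats the URL as a file system path, so it is platform-dependent: it works on Unix-like systems (including Android), but on Windows Paths.get throws an InvalidPathException because of the ':' character.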
public class Url {
public static void main(String[] args) {
String url = "https://www.instagram.com/p/CANseIqFMnf/";
String name = java.nio.file.Paths.get(url).getFileName().toString();
System.out.println(name);
}
}
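A variant of the same idea that does not go through the file system API is to parse the URL with java.net.URI and take the last non-empty path segment. This is a sketch only; the class name LastSegment and the helper lastSegment are invented for the example, and it assumes a hierarchical URL such as an http(s) address.

```java
import java.net.URI;

public class LastSegment {
    // Hypothetical helper: returns the last non-empty segment of the URL's path,
    // or an empty string if the path has no segments.
    static String lastSegment(String url) {
        String path = URI.create(url).getPath(); // e.g. "/p/CANseIqFMnf/"
        String[] parts = path.split("/");        // split drops trailing empty strings
        return parts.length == 0 ? "" : parts[parts.length - 1];
    }

    public static void main(String[] args) {
        System.out.println(lastSegment("https://www.instagram.com/p/CANseIqFMnf/"));
    }
}
```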

How to remove text between <script></script> tags

I want to remove the content between <script></script> tags. I'm manually checking for the pattern and iterating using a while loop. But I'm getting a StringIndexOutOfBoundsException at this line:
String script = source.substring(startIndex,endIndex-startIndex);
Below is the complete method:
public static String getHtmlWithoutScript(String source) {
String START_PATTERN = "<script>";
String END_PATTERN = " </script>";
while (source.contains(START_PATTERN)) {
int startIndex=source.lastIndexOf(START_PATTERN);
int endIndex=source.indexOf(END_PATTERN,startIndex);
String script=source.substring(startIndex,endIndex);
source.replace(script,"");
}
return source;
}
Am I doing anything wrong here? I'm also getting endIndex=-1. Can anyone help me identify why my code is breaking?
String text = "<script>This is dummy text to remove </script> dont remove this";
StringBuilder sb = new StringBuilder(text);
String startTag = "<script>";
String endTag = "</script>";
//removing the text between script
sb.replace(text.indexOf(startTag) + startTag.length(), text.indexOf(endTag), "");
System.out.println(sb.toString());
If you want to remove the script tags too, add the following line:
String result = sb.toString().replace(startTag, "").replace(endTag, "");
UPDATE:
If you don't want to use StringBuilder, you can do this:
String text = "<script>This is dummy text to remove </script> dont remove this";
String startTag = "<script>";
String endTag = "</script>";
//removing the text between script
String textToRemove = text.substring(text.indexOf(startTag) + startTag.length(), text.indexOf(endTag));
text = text.replace(textToRemove, "");
System.out.println(text);
You can use a regex to remove the script tag content:
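public String removeScriptContent(String html) {
    if (html != null) {
        // Greedy (.*) captures everything between the first <script> and the last </script>
        String re = "<script>(.*)</script>";
        Pattern pattern = Pattern.compile(re);
        Matcher matcher = pattern.matcher(html);
        if (matcher.find()) {
            return html.replace(matcher.group(1), "");
        }
    }
    // No script tag found (or null input): return the input unchanged
    return html;
}
You have to add these two imports: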
import java.util.regex.Matcher;
import java.util.regex.Pattern;
I know I'm probably late to the party, but I would like to give you a regex solution (really tested).
What you have to note here is that regex quantifiers are greedy by default. So a pattern such as <script>(.*)</script> matches from the first <script> up to the last </script> on the line (or in the whole input, if the dot is allowed to match line breaks), swallowing everything in between, including any text that sits between two separate script blocks.
To match each script block on its own, use a "lazy" quantifier instead.
Search with lazy matching:
<script>(.*?)<\/script>
With that, each match stops at the first closing tag and you get accurate results.
You can read more about lazy and greedy quantifiers in this answer.
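To make the difference concrete, here is a small sketch that removes whole script blocks with a greedy and with a lazy quantifier. The class and method names are made up for the example, and the (?s) flag (so that . also matches line breaks) is an addition not present in the pattern above.

```java
public class ScriptStripper {
    // Greedy: .* runs to the LAST </script>, so text between two blocks is lost too.
    static String stripGreedy(String html) {
        return html.replaceAll("(?s)<script>.*</script>", "");
    }

    // Lazy: .*? stops at the FIRST </script>, so each block is removed separately.
    static String stripLazy(String html) {
        return html.replaceAll("(?s)<script>.*?</script>", "");
    }

    public static void main(String[] args) {
        String html = "<script>a()</script>keep<script>b()</script>";
        System.out.println("[" + stripGreedy(html) + "]"); // []
        System.out.println("[" + stripLazy(html) + "]");   // [keep]
    }
}
```

Note that this only handles plain <script> tags without attributes; for arbitrary HTML, an HTML parser is the more robust option.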
This worked for me:
private static String removeScriptTags(String message) {
String scriptRegex = "<(/)?[ ]*script[^>]*>";
Pattern pattern2 = Pattern.compile(scriptRegex);
if(message != null) {
Matcher matcher2 = pattern2.matcher(message);
StringBuffer str = new StringBuffer(message.length());
while(matcher2.find()) {
matcher2.appendReplacement(str, Matcher.quoteReplacement(" "));
}
matcher2.appendTail(str);
message = str.toString();
}
return message;
}
Credit goes to nealvs: https://nealvs.wordpress.com/2010/06/01/removing-tags-from-a-string-in-java/

Extract json data from given string

I have a string like this:
a.b.c.d.e =
{"altImages":2,"available":1,"availableColorCount":3};
Now I only need to fetch:
{"altImages":2,"available":1,"availableColorCount":3}
What regex should I use to extract that part from the given string? Please help.
My try:
(?smi)a.b.c.d\\(.*\"e\"=(.*?)\\}\\);.*
But it is not working.
Try this (in Java, the literal braces have to be escaped):
.+\s*=\s*(\{(?:.+:.+,?)+\})(?=;)
You can use something like:
.*?\n(.*);
Here is the version with named groups:
String text = "a.b.c.d.e = \n{\"altImages\":2,\"available\":1,\"availableColorCount\":3};";
Pattern pattern = Pattern.compile(".*?\n(?<JSON>.*);");
Matcher matcher = pattern.matcher(text);
if (matcher.matches()) {
System.out.println(matcher.group("JSON"));
}

Extracting a pattern from String

I have a random string from which I need to match a certain pattern and parse it out.
My string:
{"sid":"zw9cmv1pzybexi","parentId":null,"time":1373271966311,"color":"#e94d57","userId":"255863","st":"comment","type":"section","cType":"parent"},{},null,null,null,null,{"sid":"zwldv1lx4f7ovx","parentId":"zw9cmv1pzybexi","time":1373347545798,"color":"#774697","userId":"5216907","st":"comment","type":"section","cType":"child"},{},null,null,null,null,null,{"sid":"zw76w68c91mhbs","parentId":"zw9cmv1pzybexi","time":1373356224065,"color":"#774697","userId":"5216907","st":"comment","type":"section","cType":"child"},
From the above I want to parse out (using regex) all the values of the userId attribute. Can anyone help me with how to do this? It is a random string and not JSON. Can you provide a regex solution for this?
Is that a random string? It looks like JSON to me, and if it is, I would recommend a JSON parser in preference to a regexp. The right thing to do when faced with a particular language/grammar is to use the corresponding parser, rather than a (potentially) fragile regexp.
To get the user Ids, you can use this pattern:
String input = "{\"sid\":\"zw9cmv1pzybexi\",\"parentId\":null,\"time\":1373271966311,\"color\":\"#e94d57\",\"userId\":\"255863\",\"st\":\"comment\",\"type\":\"section\",\"cType\":\"parent\"},{},null,null,null,null,{\"sid\":\"zwldv1lx4f7ovx\",\"parentId\":\"zw9cmv1pzybexi\",\"time\":1373347545798,\"color\":\"#774697\",\"userId\":\"5216907\",\"st\":\"comment\",\"type\":\"section\",\"cType\":\"child\"},{},null,null,null,null,null,{\"sid\":\"zw76w68c91mhbs\",\"parentId\":\"zw9cmv1pzybexi\",\"time\":1373356224065,\"color\":\"#774697\",\"userId\":\"5216907\",\"st\":\"comment\",\"type\":\"section\",\"cType\":\"child\"},";
Pattern p = Pattern.compile("\"userId\":\"(.*?)\"");
Matcher m = p.matcher(input);
while (m.find()) {
System.out.println(m.group(1));
}
which outputs:
255863
5216907
5216907
If you want the full string "userId":"xxxx", you can use m.group(); instead of m.group(1);.
Use a JSON parser instead of regex; your code will be much more readable and maintainable:
http://json.org/java/
https://code.google.com/p/json-simple/
As others have already told you, it looks like a JSON string, but if you really want to parse this string on your own, you could use this piece of code:
final Pattern pattern = Pattern.compile("\"userId\":\"(\\d+)\"");
final Matcher matcher = pattern.matcher(line);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
The matcher will match every "userId":"12345" pattern. matcher.group(1) returns the captured userId, 12345 in this case (matcher.group() without a parameter returns the entire match, i.e. "userId":"12345").
Here's the regex-code you're asking for ..
//assign subject
String subject = "{\"sid\":\"zw9cmv1pzybexi\",\"parentId\":null,\"time\":1373271966311,\"color\":\"#e94d57\",\"userId\":\"255863\",\"st\":\"comment\",\"type\":\"section\",\"cType\":\"parent\"},{},null,null,null,null,{\"sid\":\"zwldv1lx4f7ovx\",\"parentId\":\"zw9cmv1pzybexi\",\"time\":1373347545798,\"color\":\"#774697\",\"userId\":\"5216907\",\"st\":\"comment\",\"type\":\"section\",\"cType\":\"child\"},{},null,null,null,null,null,{\"sid\":\"zw76w68c91mhbs\",\"parentId\":\"zw9cmv1pzybexi\",\"time\":1373356224065,\"color\":\"#774697\",\"userId\":\"5216907\",\"st\":\"comment\",\"type\":\"section\",\"cType\":\"child\"},";
//specify pattern and matcher
Pattern pat = Pattern.compile( "userId\":\"(\\d+)", Pattern.CASE_INSENSITIVE|Pattern.DOTALL );
Matcher mat = pat.matcher( subject );
//browse all
while ( mat.find() )
{
System.out.println( "result [" + mat.group( 1 ) + "]" );
}
But OF COURSE I'd suggest solving this with a JSON parser like
http://json.org/java/
Greetings
Christopher
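It's in JSON format, so you should use a JSON parser. The string as posted is a comma-separated list of values rather than a complete document, so wrap it in brackets first (this assumes the org.json classes):
// Wrap the list in [ ] to form a JSON array; depending on the parser,
// the trailing comma in the original string may need to be trimmed first.
JSONArray array = new JSONArray("[" + yourString + "]");
for (int i = 0; i < array.length(); i++) {
    JSONObject jo = array.optJSONObject(i); // null for the literal null entries
    if (jo != null && jo.has("userId")) {   // skips the empty {} objects
        String userId = jo.getString("userId");
        System.out.println(userId);
    }
}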
EDIT : Regex pattern
"userId"[ :]+((?=\[)\[[^]]*\]|(?=\{)\{[^\}]*\}|\"[^"]*\")
Result :
"userId" : "Some user ID (numeric or letters)"
