Get the <img> src value from a string in Java

I'm parsing data from a JSON file. One of the values looks like this:
String content = "<p><img class=\"alignleft size-full wp-image-56999\" alt=\"abdullah\" src=\"http://www.some.com/wp-content/uploads/2013/12/imageName.jpg\" width=\"348\" height=\"239\" />Text</p>"
        + "<p>Text</p> <p>Text</p><p>The post Some Text appeared first on Some Website</p>";
Now I want to split this string into two pieces. First, I want to get this URL from the src attribute
http://www.some.com/wp-content/uploads/2013/12/imageName.jpg
and store it in a variable. Second, I want to remove the last line ("The post ... appeared first on ...") and store the remaining text in another variable.
So, the questions are:
Is it possible to do that?
If so, how can I achieve it?

In Java
Get a Document object (SAXReader comes from the dom4j library):
Document originalDoc = new SAXReader().read(new StringReader("<div>data</div>"));
Then you can parse it (see this tutorial):
http://www.mkyong.com/java/how-to-read-xml-file-in-java-dom-parser/
In JavaScript
To get the attribute:
var url = document.getElementsByTagName('img')[0].getAttribute('src');
If you have a string and want a DOM object, use jQuery:
var stringValue = '<div>data</div>';
var myObject = $(stringValue);

Use String.substring(firstIndex, lastIndex) to get the link from the src attribute.
Learn to use an HTML parser like jsoup; it will be useful in the near future.
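The substring approach mentioned above could be sketched roughly as follows. This is only a minimal illustration using indexOf and substring; the class and method names (ImgSrcSubstring, firstSrc) are made up for the example, and it assumes the attribute is written exactly as src="..." with double quotes. An HTML parser is more robust for anything less regular.

```java
// Hypothetical helper, not part of any library.
class ImgSrcSubstring {

    /** Returns the value of the first src="..." attribute, or null if none is found. */
    static String firstSrc(String html) {
        String marker = "src=\"";
        int start = html.indexOf(marker);
        if (start < 0) {
            return null; // no src attribute in this exact form
        }
        start += marker.length();
        int end = html.indexOf('"', start); // closing quote of the attribute value
        if (end < 0) {
            return null; // unterminated attribute
        }
        return html.substring(start, end);
    }

    public static void main(String[] args) {
        String content = "<p><img alt=\"abdullah\" "
                + "src=\"http://www.some.com/wp-content/uploads/2013/12/imageName.jpg\" "
                + "width=\"348\" height=\"239\" />Text</p>";
        System.out.println(firstSrc(content));
    }
}
```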

If it's a well-structured string, you can parse it with any DOM parser and extract the data from it.
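As a rough sketch of the DOM-parser route, the JDK's built-in XML parser can handle the string from the question, provided the markup is well-formed XML. The assumptions here: the fragment is wrapped in a made-up <root> element so it has a single root, it contains no HTML-only entities such as &nbsp;, and the "appeared first on" line is always the last <p>. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
class PostContentParser {

    /** Parses the fragment after wrapping it in a <root> element (an assumption of this sketch). */
    static Document parse(String fragment) {
        try {
            String xml = "<root>" + fragment + "</root>";
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Fragment is not well-formed XML", e);
        }
    }

    /** Returns the src attribute of the first <img>, or null if there is no <img>. */
    static String firstImageSrc(Document doc) {
        NodeList images = doc.getElementsByTagName("img");
        if (images.getLength() == 0) {
            return null;
        }
        return ((Element) images.item(0)).getAttribute("src");
    }

    /** Joins the text of every <p> except the last one, one paragraph per line. */
    static String textWithoutLastParagraph(Document doc) {
        NodeList paragraphs = doc.getElementsByTagName("p");
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < paragraphs.getLength() - 1; i++) {
            if (text.length() > 0) {
                text.append('\n');
            }
            text.append(paragraphs.item(i).getTextContent().trim());
        }
        return text.toString();
    }

    public static void main(String[] args) {
        Document doc = parse("<p><img alt=\"abdullah\" "
                + "src=\"http://www.some.com/wp-content/uploads/2013/12/imageName.jpg\" />Text</p>"
                + "<p>Text</p><p>The post Some Text appeared first on Some Website</p>");
        System.out.println(firstImageSrc(doc));
        System.out.println(textWithoutLastParagraph(doc));
    }
}
```

For real-world HTML that is not valid XML, a lenient parser such as jsoup is likely the safer choice.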

Related

Extract JSON as String from jsp

I am working on parsing a web page (view-source:https://massive.ucsd.edu/ProteoSAFe/datasets.jsp). I want to parse the .jsp output and extract the JSON object from it.
I am using jsoup to fetch the data:
Document doc = Jsoup.connect("https://massive.ucsd.edu/ProteoSAFe/datasets.jsp").maxBodySize(0).get();
Then I use a Java Pattern to extract the JSON as a string:
Pattern p = Pattern.compile(String.format("\"%s\":\\s*(.*),", "dataset","\"%s\":\\s*(.*),", "datasetNum","\"%s\":\\s*(.*),", "title","\"%s\":\\s*(.*),", "user","\"%s\":\\s*(.*),", "site","\"%s\":\\s*(.*),", "flowname","\"%s\":\\s*(.*),", "createdMillis","\"%s\":\\s*(.*),", "created","\"%s\":\\s*(.*),", "fileCount","\"%s\":\\s*(.*),", "fileSizeKB","\"%s\":\\s*(.*),", "psms","\"%s\":\\s*(.*),", "peptides","\"%s\":\\s*(.*),", "variants","\"%s\":\\s*(.*),", "proteins","\"%s\":\\s*(.*),", "species","\"%s\":\\s*(.*),", "instrument","\"%s\":\\s*(.*),", "modification","\"%s\":\\s*(.*),", "pi","\"%s\":\\s*(.*),", "complete","\"%s\":\\s*(.*),", "status","\"%s\":\\s*(.*),", "private","\"%s\":\\s*(.*),", "hash","\"%s\":\\s*(.*),", "px","\"%s\":\\s*(.*),", "task","\"%s\":\\s*(.*),", "id"));
Matcher m = p.matcher(script.html());
While doing so I get an error: the last line is not parsed correctly.
The match is cut off at the end, so I get the error
'A JSONObject text must end with '}' at character 577'.
Can anyone suggest a better way to parse this page and get the data?
Parsing HTML with regex is generally a bad idea, but this works for me:
Pattern.compile("(?s)var datasets = (\\[.*?\\]);")
(Tested via Python, since that's all I have available.)
Note that the match is a JSON array, so parse it as a JSONArray, not a JSONObject.
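Here is a small Java sketch of how that pattern might be applied. The sample script text in main is invented for illustration and is not the real page content; the class and method names are hypothetical. The (?s) flag lets '.' match line breaks, and the reluctant .*? stops at the first "];", so this would cut the data short if "];" ever appeared inside a string value.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class DatasetsExtractor {

    // Same pattern as in the answer above: capture the array assigned to "var datasets".
    private static final Pattern DATASETS =
            Pattern.compile("(?s)var datasets = (\\[.*?\\]);");

    /** Returns the text of the JSON array, or null if the script has no such assignment. */
    static String extractArray(String script) {
        Matcher m = DATASETS.matcher(script);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Made-up stand-in for the contents of the page's <script> element.
        String script = "var other = 1;\n"
                + "var datasets = [{\"title\":\"first\"},\n"
                + "{\"title\":\"second\"}];\n"
                + "var more = 2;";
        System.out.println(extractArray(script));
    }
}
```

The returned string can then be handed to an array parser (for example org.json's JSONArray constructor) rather than to JSONObject.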

Send a tag in a url

First, sorry for my English; it's not my native language.
So, I am working on an application in JSP and in one of my forms I have a field "comments". When I submit this form, the value of this field is sent to my servlet by an ajax request.
var request = 'mainServlet?command=SendRequest';
request += ('&comments=' + $('#comments').val());
But when there is a "<" or ">" in the field, $('#comments').val() translates them into "&lt;" or "&gt;". For example, <test> is converted to &lt;test&gt;
And when I want to recover the value in my servlet, I do:
String comments = request.getParameter("comments");
But the URL looks like: mainServlet?command=SendRequest&comments=&lt;test&gt;
So request.getParameter("comments") returns an empty string.
I thought that I could replace strings like &lt; with my own code and then replace them again in my servlet, but is there a simpler way to do this?
Thanks.
Edit: Afterwards, I reuse the comments in another JSP.
I believe what you need is the encodeURIComponent function. It converts any string into a format that you can use inside a URI.
Just remember to decode it on the receiving end; I believe the URLDecoder class can do this for you.
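To illustrate the Java side, here is a minimal round-trip sketch with URLEncoder and URLDecoder, assuming UTF-8 on both ends. The class and method names are hypothetical. The point is that once "<", ">" and "&" are percent-encoded, the "&" can no longer be mistaken for a parameter separator. Servlet containers normally decode parameter values before getParameter returns them, so an explicit decode is typically only needed when you handle a raw encoded string yourself.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper, not part of any library.
class CommentCodec {

    /** Percent-encodes a value so it can be placed in a query string (UTF-8 assumed). */
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Reverses percent-encoding, for example on a value produced by encodeURIComponent. */
    static String decode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String comment = "<test> & more";
        String encoded = encode(comment);
        System.out.println(encoded);
        System.out.println(decode(encoded));
    }
}
```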

How do you parse the contents of a .odt file into a string in Java?

Preferably using the ODFDOM API. I would like to have the entire file's contents in a string, if possible. If not, how would you search the file for a specific substring?
Thanks in advance.
You will need to load the .odt document and then get the content root. From there, get the text content, which returns a string that you can search with the usual String methods. For example:
TextDocument document = TextDocument.loadDocument("test.odt");
String texts = document.getContentRoot().getTextContent();

java xml parsing between tags

What I'm trying to do is parse XML in Java, and I only want a snippet of the text from each tag. For example:
XML example:
<data>\nSome Text :\n\MY Spectre around me night and day. Some More: Like a wild beast
guards my way.</data>
<data>\nSome Text :\n\Cruelty has a human heart. Some More: And Jealousy a human face
</data>
So far I have this:
NodeList ageList = firstItemElement.getElementsByTagName("data");
Element ageElement =(Element)ageList.item(0);
NodeList textAgeList = ageElement.getChildNodes();
out.write("Data : " + ((Node)textAgeList.item(0)).getNodeValue().trim());
I'm trying to get just the "Some More: ..." part; I don't want the whole tag content.
I'm also trying to get rid of all the \n sequences.
If you're not restricted to the standard DOM API, you could try to use jOOX, which wraps standard DOM. Your example would then translate to:
// Use jOOX's jQuery-like API to find elements and their text content
for (String string : $(firstItemElement).find("data").texts()) {
    // Use standard String methods to replace content
    System.out.println(string.replace("\\n", ""));
}
I would take all of the element text and use regular expressions to capture the relevant parts.
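A possible sketch of that regex idea, using only the JDK: remove the literal \n sequences, collapse real line breaks, then capture whatever follows "Some More:". It assumes the marker text is exactly "Some More:" and appears once per element, as in the sample data; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class SomeMoreExtractor {

    // Captures everything after the "Some More:" marker.
    private static final Pattern SOME_MORE = Pattern.compile("Some More:\\s*(.*)");

    /** Returns the text after "Some More:", or null if the marker is missing. */
    static String extract(String elementText) {
        String cleaned = elementText
                .replace("\\n", "")       // drop literal backslash-n sequences
                .replaceAll("\\s+", " ")  // collapse real line breaks and runs of whitespace
                .trim();
        Matcher m = SOME_MORE.matcher(cleaned);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        // Text content of the first <data> element from the question.
        String text = "\\nSome Text :\\n\\MY Spectre around me night and day. "
                + "Some More: Like a wild beast\nguards my way.";
        System.out.println(extract(text));
    }
}
```

The input would be the string already obtained from getNodeValue() or getTextContent() on each data element.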

How to return the original string in Java after using jquery.serialize()

I have a problem after serializing a form with jQuery.
Why does some text stay encoded even after it has been received in Java (in a servlet)?
For example, if a field contains the text &, it arrives as %26 in Java.
I serialize the form and submit it to Java like this:
function ajaxSubmit(frmN){
    var serForm = $(frmN).serialize();
    $.ajax({
        type:'POST',
        url:'inser',
        data:{actionName : "insertField", formField : serForm},
        success: function(request){
            $("#reqContainer").html(request);
        }
    });
}
Is there a way to decode these values in Java?
I guess I first need to split on & and then on = to get the list of fields and their values, and only then decode each value.
I'd appreciate any help.
I have read some articles that use JSON, but I don't have time to study it.
If there is an alternative way to submit all the form values via Ajax with jQuery and still get the original values in Java, please let me know.
Did you try the jQuery Form plugin? I remember it has an ajaxSubmit utility method.
http://malsup.com/jquery/form/
Otherwise you could just use URLDecoder in Java. It turns percent-encoded sequences such as %26 back into the original characters.
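The split-then-decode idea from the question, combined with URLDecoder, might look like the sketch below. It assumes that the servlet receives the serialized form as one still-encoded string (for instance via request.getParameter("formField"), presumably because the serialized text is encoded a second time when sent as a value) and that the encoding is UTF-8. The class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of any library.
class SerializedFormParser {

    /** Splits "a=1&b=2" style input into decoded name/value pairs, keeping field order. */
    static Map<String, String> parse(String serialized) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (serialized == null || serialized.isEmpty()) {
            return fields;
        }
        for (String pair : serialized.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            // Decode only after splitting, so an encoded & or = inside a value stays intact.
            // If a field name repeats, the last value wins in this sketch.
            fields.put(decode(name), decode(value));
        }
        return fields;
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> fields = parse("name=Tom+%26+Jerry&note=a%3Db");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            System.out.println(field.getKey() + " -> " + field.getValue());
        }
    }
}
```

An alternative that avoids manual parsing is to pass the serialized string directly as the data option, so that each form field arrives as its own request parameter.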
