I am working on parsing a web page, view-source:https://massive.ucsd.edu/ProteoSAFe/datasets.jsp. I want to parse the .jsp page and extract the JSON object from it.
I am using Jsoup to fetch the data:
Document doc = Jsoup.connect("https://massive.ucsd.edu/ProteoSAFe/datasets.jsp").maxBodySize(0).get();
Then I use a Java Pattern to extract the JSON as a string:
Pattern p = Pattern.compile(String.format("\"%s\":\\s*(.*),", "dataset","\"%s\":\\s*(.*),", "datasetNum","\"%s\":\\s*(.*),", "title","\"%s\":\\s*(.*),", "user","\"%s\":\\s*(.*),", "site","\"%s\":\\s*(.*),", "flowname","\"%s\":\\s*(.*),", "createdMillis","\"%s\":\\s*(.*),", "created","\"%s\":\\s*(.*),", "fileCount","\"%s\":\\s*(.*),", "fileSizeKB","\"%s\":\\s*(.*),", "psms","\"%s\":\\s*(.*),", "peptides","\"%s\":\\s*(.*),", "variants","\"%s\":\\s*(.*),", "proteins","\"%s\":\\s*(.*),", "species","\"%s\":\\s*(.*),", "instrument","\"%s\":\\s*(.*),", "modification","\"%s\":\\s*(.*),", "pi","\"%s\":\\s*(.*),", "complete","\"%s\":\\s*(.*),", "status","\"%s\":\\s*(.*),", "private","\"%s\":\\s*(.*),", "hash","\"%s\":\\s*(.*),", "px","\"%s\":\\s*(.*),", "task","\"%s\":\\s*(.*),", "id"));
Matcher m = p.matcher(script.html());
While doing so I get an error. The last line is not parsed correctly.
The match is cut off at the end, so I get the error
'A JSONObject text must end with '}' at character 577'.
Can anyone suggest a better way to parse this page to get the data?
While it seems like a bad idea to parse any HTML with regex, this works for me: Pattern.compile("(?s)var datasets = (\\[.*?\\]);")
(Tested via Python, since that's all I have available).
And that returns a JSONArray, not a JSONObject.
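A minimal sketch of how that pattern might be applied in Java. The class name, method name and sample HTML are made up for illustration; the real input would be the page source (for example the string Jsoup returns), and the pattern assumes no string value inside the array contains "];".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DatasetsExtractor {
    // Same pattern as in the answer above: (?s) lets '.' match newlines,
    // and the lazy .*? stops at the first "];" after "var datasets = ".
    private static final Pattern DATASETS =
            Pattern.compile("(?s)var datasets = (\\[.*?\\]);");

    /** Returns the JSON array text, or null if the assignment is not found. */
    static String extractDatasetsJson(String html) {
        Matcher m = DATASETS.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the real page source.
        String html = "<script>\nvar datasets = "
                + "[{\"dataset\":\"MSV000000001\",\"title\":\"Example\"}];\n</script>";
        String json = extractDatasetsJson(html);
        System.out.println(json);
        // With org.json on the classpath, this text could then be parsed with
        // new JSONArray(json) rather than new JSONObject(json).
    }
}
```

Capturing only the bracketed group keeps the trailing semicolon and the rest of the script out of the JSON text, which is what the original per-field pattern could not guarantee.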
First, sorry for my English; it's not my native language.
So, I am working on an application in JSP, and one of my forms has a "comments" field. When I submit this form, the value of this field is sent to my servlet by an Ajax request:
var request = 'mainServlet?command=SendRequest';
request += ('&comments=' + $('#comments').val());
But when there is a "<" or ">" in the field, $('#comments').val() translates them into "&lt;" or "&gt;". For example, <test> is converted to &lt;test&gt;
And when I want to recover the value in my servlet, I do:
String comments = request.getParameter("comments");
But the URL looks like: mainServlet?command=SendRequest&comments=&lt;test&gt;
So request.getParameter("comments"); returns an empty string.
I thought that I could replace strings like &lt; with my own code and then replace them back in my servlet, but is there a simpler way to do this?
Thanks.
Edit: Afterwards, I reuse the comments in another JSP.
I believe what you need is the encodeURIComponent function. It will convert any string into a format that you can use inside a URI.
Just remember to decode it on the receiving end; I believe the URLDecoder class can do this for you.
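To illustrate the decoding side, here is a small sketch of a round trip using java.net.URLEncoder and java.net.URLDecoder. The class and helper names are made up for this example. Note that servlet containers normally decode parameter values inside request.getParameter already, so explicit decoding is likely only needed when you work with the raw query string yourself.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class CommentsEncodingDemo {
    /** Percent-encodes a value so it is safe inside a query string. */
    static String encode(String raw) {
        try {
            return URLEncoder.encode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** Reverses the percent-encoding (and turns '+' back into a space). */
    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String encoded = encode("<test>");
        System.out.println(encoded);          // %3Ctest%3E
        System.out.println(decode(encoded));  // <test>
    }
}
```

Once the value is encoded on the client, characters such as "&" and ";" no longer split the query string, so the servlet receives the whole comment as one parameter.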
Preferably using the ODFDOM API. I would like to have the entire file's contents in a string, if possible. If not, how would you search the file for a specific substring?
Thanks in advance.
You will need to load the ODT document and then get the content root. From there, get the text content, which returns a string. That should give you an idea of how to search within the string. For example:
TextDocument document = TextDocument.loadDocument("test.odt");
String texts = document.getContentRoot().getTextContent();
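Once the text is available as a string, searching it needs nothing beyond the standard String methods. A minimal sketch follows; the class and method names are hypothetical, and the sample string stands in for the value of texts from the snippet above.

```java
import java.util.ArrayList;
import java.util.List;

class TextSearch {
    /** Returns the start index of every occurrence of needle in text. */
    static List<Integer> findAll(String text, String needle) {
        List<Integer> positions = new ArrayList<>();
        if (needle.isEmpty()) {
            return positions; // nothing sensible to search for
        }
        int index = text.indexOf(needle);
        while (index >= 0) {
            positions.add(index);
            // Continue one character further so overlapping matches are found too.
            index = text.indexOf(needle, index + 1);
        }
        return positions;
    }

    public static void main(String[] args) {
        // Placeholder for the string extracted from the document.
        String texts = "one two one";
        System.out.println(texts.contains("two"));  // true
        System.out.println(findAll(texts, "one"));  // [0, 8]
    }
}
```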
What I'm trying to do is parse XML in Java, and I only want a snippet of text from each tag.
XML example:
<data>\nSome Text :\n\MY Spectre around me night and day. Some More: Like a wild beast
guards my way.</data>
<data>\nSome Text :\n\Cruelty has a human heart. Some More: And Jealousy a human face
</data>
So far I have this:
NodeList ageList = firstItemElement.getElementsByTagName("data");
Element ageElement =(Element)ageList.item(0);
NodeList textAgeList = ageElement.getChildNodes();
out.write("Data : " + ((Node)textAgeList.item(0)).getNodeValue().trim());
I'm trying to get just the "Some More:....." part; I don't want the whole tag.
I'm also trying to get rid of all the \n.
If you're not restricted to the standard DOM API, you could try to use jOOX, which wraps standard DOM. Your example would then translate to:
// Use jOOX's jquery-like API to find elements and their text content
for (String string : $(firstItemElement).find("data").texts()) {
    // Use standard String methods to replace content
    System.out.println(string.replace("\\n", ""));
}
I would take all of the element text and use regular expressions to capture the relevant parts.
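A rough sketch of that regular-expression approach, assuming the wanted part always follows the literal marker "Some More:" and that the \n in the data are literal backslash-n sequences, as in the example. The class and method names are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SomeMoreExtractor {
    // Captures everything after the "Some More:" marker.
    private static final Pattern SOME_MORE = Pattern.compile("Some More:\\s*(.*)");

    /** Returns the text after "Some More:", or an empty string if the marker is missing. */
    static String extractSomeMore(String elementText) {
        String cleaned = elementText
                .replace("\\n", " ")     // literal backslash-n sequences
                .replaceAll("\\s+", " ") // real line breaks and repeated spaces
                .trim();
        Matcher m = SOME_MORE.matcher(cleaned);
        return m.find() ? m.group(1).trim() : "";
    }

    public static void main(String[] args) {
        // Text content of the second <data> element from the question.
        String text = "\\nSome Text :\\n\\Cruelty has a human heart. "
                + "Some More: And Jealousy a human face\n";
        System.out.println(extractSomeMore(text)); // And Jealousy a human face
    }
}
```

The input would be the element text obtained from getNodeValue() or from jOOX's texts(), whichever of the approaches above is used.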
I have a problem after serializing a form with jQuery.
Why does some text retain the HTML entities even after it is loaded in Java (servlet)?
For example, if I have the text &, it arrives as %26 in Java.
I serialize and submit the form to Java using this:
function ajaxSubmit(frmN){
    var serForm = $(frmN).serialize();
    $.ajax({
        type:'POST',
        url:'inser',
        data:{actionName : "insertField", formField : serForm},
        success: function(request){
            $("#reqContainer").html(request);
        }
    });
}
Is there a way to deserialize the HTML entities in Java?
I guess I first need to split on & and then split on =
to get the list of fields and their values, and after that the
deserialization can begin.
I'd appreciate any help.
I read some articles about using JSON, but I don't have time to study it.
If there is an alternative way of submitting all the form values via
Ajax with jQuery that gives me the original
values in Java, please let me know.
Did you try the jQuery Form plugin? I remember it has an ajaxSubmit utility method.
http://malsup.com/jquery/form/
Otherwise you could just use URLDecoder in Java. It will convert the percent-encoded characters (such as %26) back to the original text.
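A minimal sketch of the URLDecoder route, following the split-on-& then split-on-= idea from the question. It assumes the serialized string arrives as the value of the formField parameter; the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

class SerializedFormParser {
    /** Splits "a=1&b=x%26y" into decoded name/value pairs, keeping field order. */
    static Map<String, String> parse(String serialized) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (serialized == null || serialized.isEmpty()) {
            return fields;
        }
        for (String pair : serialized.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            // Decode only after splitting, so an encoded & or = inside a value stays intact.
            // A repeated field name overwrites the earlier value in this simple sketch.
            fields.put(decode(name), decode(value));
        }
        return fields;
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // In the servlet this would be request.getParameter("formField").
        String serForm = "name=Tom+%26+Jerry&note=a%3Db";
        System.out.println(parse(serForm)); // {name=Tom & Jerry, note=a=b}
    }
}
```

Decoding after the split matters: decoding the whole string first would turn %26 back into & and break the field boundaries.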