Extract JSON as String from jsp - java

I am parsing the website view-source:https://massive.ucsd.edu/ProteoSAFe/datasets.jsp. I want to parse the .jsp page and extract the JSON object from it.
I am using Jsoup to fetch the page:
Document doc = Jsoup.connect("https://massive.ucsd.edu/ProteoSAFe/datasets.jsp").maxBodySize(0).get();
Then I use a Java regex Pattern to extract the JSON as a string:
Pattern p = Pattern.compile(String.format("\"%s\":\\s*(.*),", "dataset","\"%s\":\\s*(.*),", "datasetNum","\"%s\":\\s*(.*),", "title","\"%s\":\\s*(.*),", "user","\"%s\":\\s*(.*),", "site","\"%s\":\\s*(.*),", "flowname","\"%s\":\\s*(.*),", "createdMillis","\"%s\":\\s*(.*),", "created","\"%s\":\\s*(.*),", "fileCount","\"%s\":\\s*(.*),", "fileSizeKB","\"%s\":\\s*(.*),", "psms","\"%s\":\\s*(.*),", "peptides","\"%s\":\\s*(.*),", "variants","\"%s\":\\s*(.*),", "proteins","\"%s\":\\s*(.*),", "species","\"%s\":\\s*(.*),", "instrument","\"%s\":\\s*(.*),", "modification","\"%s\":\\s*(.*),", "pi","\"%s\":\\s*(.*),", "complete","\"%s\":\\s*(.*),", "status","\"%s\":\\s*(.*),", "private","\"%s\":\\s*(.*),", "hash","\"%s\":\\s*(.*),", "px","\"%s\":\\s*(.*),", "task","\"%s\":\\s*(.*),", "id"));
Matcher m = p.matcher(script.html());
When I do this, I get an error: the last entry is not parsed correctly. The match is cut off at the end, so I get the error
'A JSONObject text must end with '}' at character 577'.
Can anyone suggest a better way to parse this page to get the data?

Parsing HTML with regex is generally a bad idea, but this works for me:
Pattern.compile("(?s)var datasets = (\\[.*?\\]);")
(I tested it in Python, since that's all I have available.)
Note that this returns a JSONArray, not a JSONObject.
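Here is a minimal Java sketch of the answer's regex. Note that in the question's String.format call the format string has a single %s, so only "dataset" is substituted and the remaining arguments are ignored; the greedy (.*), part then likely stops at an arbitrary comma, which would explain the truncated JSON. The class name and the sample script text below are hypothetical, and the reluctant .*? assumes no "];" appears inside the array itself.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DatasetsExtractor {
    // (?s) lets '.' match newlines; the reluctant .*? stops at the first "];".
    private static final Pattern DATASETS =
            Pattern.compile("(?s)var datasets = (\\[.*?\\]);");

    // Returns the JSON array text assigned to "var datasets", or null if absent.
    static String extractDatasets(String scriptText) {
        Matcher m = DATASETS.matcher(scriptText);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical script content; the real page's script may differ.
        String script = "var foo = 1;\n"
                + "var datasets = [\n"
                + "  {\"dataset\":\"MSV000000001\",\"title\":\"a\"},\n"
                + "  {\"dataset\":\"MSV000000002\",\"title\":\"b\"}\n"
                + "];\n"
                + "var bar = 2;";
        System.out.println(extractDatasets(script));
    }
}
```

With jsoup you could loop over doc.select("script") and pass each element's html() to extractDatasets, then hand the result to a JSON library's array type (for example org.json's JSONArray) rather than JSONObject.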

Related

How to maintain the format of a string Java

I'm parsing a JSON string and printing its values as Java strings, using the following code:
JSONParser parser=new JSONParser();
Object obj = parser.parse(output);
JSONObject jsonObject = (JSONObject) obj;
String stdout= (String) jsonObject.get("Stdout");
String stderr= (String) jsonObject.get("Stderr");
out.print(stdout);
out.print(stderr);
This is my JSON string:
{"Stdout":"/mycode.c: In function 'main':\n/mycode.c:8:5: error: expected ';' before 'return'\n return 0;\r\n ^\nsh: 1: ./myapp: not found\n","Stderr":"exit status 127"}
When I use System.out.print(stdout) and System.out.print(stderr), I get the output in the desired format in the console.
But I want it on my web page, so I use out.print(stdout) instead. There I don't get the desired format; the output is shown on a single line.
Any ideas how to fix this?
Your web page is HTML, so the \n and \r\n characters in your output aren't treated as line breaks.
You could replace each line break with a <br> tag to force new lines.
Or put your whole message in a <pre> element, which renders it as preformatted text with whitespace and line breaks preserved. This is probably the simpler option:
out.print("<PRE>" + stdout + "</PRE>");
Note that the browser still interprets markup inside <pre>, so if the output can contain characters such as < or & (compiler messages often do), HTML-escape it before printing.
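A minimal sketch of both options with escaping applied first. The class and method names are hypothetical; a library helper such as Apache Commons Text's StringEscapeUtils.escapeHtml4 could replace the hand-written escaper.

```java
final class PreOutput {
    // Minimal HTML escaper for text placed inside an element.
    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Option 1: escape, then wrap in <pre> so the browser keeps the line breaks.
    static String toPre(String text) {
        return "<pre>" + escapeHtml(text) + "</pre>";
    }

    // Option 2: escape, then turn each \r\n, \r or \n into a <br> tag.
    static String toBr(String text) {
        return escapeHtml(text).replaceAll("\r\n|\r|\n", "<br>");
    }
}
```

In the JSP this would be used as out.print(PreOutput.toPre(stdout)); escaping before inserting the tags matters, otherwise the <br> tags themselves would be escaped.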

get <img> value from a string in java

I'm parsing data from a JSON file, and I have data like this:
String Content = <p><img class="alignleft size-full wp-image-56999" alt="abdullah" src="http://www.some.com/wp-content/uploads/2013/12/imageName.jpg" width="348" height="239" />Text</p>
<p>Text</p> <p>Text</p><p>The post Some Text appeared first on Some Webiste</p>
I want to split this string into two pieces. First, I want to get the URL from the src attribute:
http://www.some.com/wp-content/uploads/2013/12/imageName.jpg
and store it in a variable. Second, I want to remove the last paragraph ("The post ... appeared first on ...") and store the remaining text in another variable.
So, the questions are:
Is it possible to get that?
If possible, how can I achieve that?
In Java:
Get a Document object (this uses dom4j's SAXReader):
Document originalDoc = new SAXReader().read(new StringReader("<div>data</div>"));
Then you can traverse it (see this tutorial):
http://www.mkyong.com/java/how-to-read-xml-file-in-java-dom-parser/
In JavaScript, to get the attribute:
var url = document.getElementsByTagName('img')[0].getAttribute('src');
If you have a string and want a DOM object, you can use jQuery:
var stringValue = '<div>data</div>';
var myObject= $(stringValue);
Use String.indexOf to locate the src attribute and String.substring(beginIndex, endIndex) to extract the link.
Learn to use an HTML parser like jsoup; it will be useful in the near future.
If it's a well-structured string, you can parse it with any DOM parser and extract the data from it.
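A minimal sketch of the indexOf/substring approach. The class and method names are hypothetical, and it assumes a double-quoted src attribute; it would also match attributes such as data-src, which is one reason an HTML parser is more robust.

```java
final class SrcExtractor {
    // Returns the value of the first src="..." attribute, or null if absent.
    static String firstSrc(String html) {
        String marker = "src=\"";
        int start = html.indexOf(marker);
        if (start < 0) {
            return null;
        }
        start += marker.length();            // skip past src="
        int end = html.indexOf('"', start);  // closing quote of the value
        if (end < 0) {
            return null;
        }
        return html.substring(start, end);
    }
}
```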
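As a sketch of the DOM-parser route, this uses the JDK's built-in javax.xml.parsers rather than dom4j or jsoup, so it needs no extra dependency. It assumes the content is well-formed XML (self-closed <img ... />, no bare & or HTML-only entities such as &nbsp;) and that the last <p> is the "The post ... appeared first on ..." footer; the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ContentSplitter {
    // Wrapping the fragment in <div> gives the document a single root element
    // even when it has several <p> siblings.
    static Document parse(String fragment) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader("<div>" + fragment + "</div>")));
        } catch (Exception e) {
            throw new IllegalArgumentException("Fragment is not well-formed XML", e);
        }
    }

    // Returns the src attribute of the first <img>, or null if there is none.
    static String imageSrc(Document doc) {
        NodeList imgs = doc.getElementsByTagName("img");
        return imgs.getLength() == 0 ? null : ((Element) imgs.item(0)).getAttribute("src");
    }

    // Removes the last <p> from the document (mutating it) and returns
    // the concatenated text of what remains.
    static String textWithoutLastParagraph(Document doc) {
        NodeList paragraphs = doc.getElementsByTagName("p");
        if (paragraphs.getLength() > 0) {
            Node last = paragraphs.item(paragraphs.getLength() - 1);
            last.getParentNode().removeChild(last);
        }
        return doc.getDocumentElement().getTextContent();
    }
}
```

Call imageSrc before textWithoutLastParagraph, since the latter modifies the document. For arbitrary real-world HTML, jsoup's lenient parser is the safer choice.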

Jackson not escaping quotes in JSON

I'm trying to put JSON into a JavaScript file from Java, but when I write the JSON to a string, the string isn't valid JSON for JavaScript; some escapes are missing. (This happens inside a string value in the JSON that I formatted as faux JSON.)
For example, this would be valid JSON in my JavaScript file:
{
"message":
"the following books failed: [{\"book\": \"The Horse and his Boy\",\"author\": \"C.S. Lewis\"}, {\"book\": \"The Left Hand of Darkness\",\"author\": \"Ursula K. le Guin\"}, ]"
}
Here's what I get, though, where the double quotes aren't escaped:
{
"message":
"The following books failed: [{"book": "The Horse and his Boy","author": "C.S. Lewis"}, {"book": "The Left Hand of Darkness","author": "Ursula K. le Guin"}, ]"
}
I get the second result when I do this:
new ObjectMapper().writer().writeValueAsString(booksMessage);
But when I write it directly to a file with jackson, I get the first, good result:
new ObjectMapper().writer().writeValue(fileToWriteTo, booksMessage);
So why does Jackson escape differently when writing to a file, and how do I get it to escape the same way when writing to a string?
The writeValue() methods of the ObjectWriter class encode (escape) the input text as JSON.
You don't need to write to a file. An alternative approach for getting the same string could be:
StringWriter sw = new StringWriter();
new ObjectMapper().writer().writeValue(sw, booksMessage);
String result = sw.toString();
I added
booksJson = Pattern.compile("\\\\").matcher(booksJson).replaceAll("\\\\\\\\");
which escapes all the escape characters. That way, when I write it to the file and the escapes are removed, I still have the escapes I need. So it turns out my real question was how to write to a file without Java removing the escapes.
I'm very late to the party, but I faced a similar problem and realized it was not a problem with Jackson or my data. It was Java. I was reading from a JSON file and then trying to write it into a template HTML file.
I had a line in my original JSON like yours, something like:
{"field" : "This field contains what looks like another JSON field: {\"abc\": \"value\"}"}
When I substituted the above into the template string, the backslashes before the quotes around abc and value disappeared. I noticed that the IDE's contextual help for String.replaceAll mentioned Matcher.quoteReplacement. I went from this:
template = template.replaceAll("%template%", jsonDataString);
to this:
Pattern pattern = Pattern.compile("%template%");
Matcher matcher = pattern.matcher(template);
template = matcher.replaceAll(Matcher.quoteReplacement(jsonDataString));
Problem solved.
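A small self-contained sketch of why the backslashes vanish: in the replacement argument of replaceAll, a backslash escapes the next character (and $ refers to a group), so \" becomes a bare ". Matcher.quoteReplacement makes the replacement literal. The class, method names and template are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TemplateFill {
    private static final Pattern PLACEHOLDER = Pattern.compile("%template%");

    // Broken: '\' and '$' in json are interpreted by replaceAll.
    static String naive(String template, String json) {
        return template.replaceAll("%template%", json);
    }

    // Fixed: quoteReplacement escapes '\' and '$' so json is inserted literally.
    static String quoted(String template, String json) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        return matcher.replaceAll(Matcher.quoteReplacement(json));
    }
}
```

When the placeholder is a fixed string, String.replace(CharSequence, CharSequence) is a simpler alternative, since it treats both arguments literally.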
Matcher.quoteReplacement

Combination of Specific special character causes Error

When I send data from a TextEdit as JSON and the data contains the combination "; the app fails every time.
Specifically, if I enter any username but the password "; the resulting JSON looks like this:
{"UserName":"qa#1.com","Password":"\";"}
I have searched a lot; from what I understand, the resulting JSON data violates the syntax, which causes a default exception to be thrown. I tried to get rid of the special symbols with the URLEncoder.encode() method, but now the problem is decoding.
Any help at any step would be greatly appreciated.
Logcat:
I/SW_HttpClient(448): sending post: {"UserName":"qa#1.com","Password":"\";"}
I/SW_HttpClient(448): HTTPResponse received in [2326ms]
I/SW_HttpClient(448): stream returned: <!DOCTYPE html PUBLIC ---- AN HTML PAGE.... A DEFAULT HANDLER>
Try the following code:
String EMPLOYEE_SERVICE_URI = Utils.authenticate+"?UserName="+uid+"&Email="+eid+"&Password="+URLEncoder.encode(pwd,"UTF-8");
The JSON you provided in the Question is valid.
The JSON spec requires double quotes in strings to be escaped with a backslash. Read the syntax graphs here - http://www.json.org/.
If something is throwing an exception while parsing that JSON, then either the parser is buggy or the exception means something else.
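To illustrate that the backslash is exactly what the JSON rules require, here is a minimal hand-written escaper (a hypothetical helper; in practice Gson or org.json does this for you). Building the request body with it reproduces the JSON from the question.

```java
final class JsonEscape {
    // Returns s as a quoted JSON string literal: '"' and '\' are
    // backslash-escaped, and control characters use escape sequences.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        String body = "{\"UserName\":" + quote("qa#1.com")
                + ",\"Password\":" + quote("\";") + "}";
        System.out.println(body); // {"UserName":"qa#1.com","Password":"\";"}
    }
}
```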
I have searched a lot, what I could understand is the resultant JSON data violates the syntax
Your understanding is incorrect.
I tried to get rid of special symbol by using URLEncoder.encode() method.
That is a mistake, and is only going to make matters worse:
The backslash SHOULD be there.
The server or whatever that processes the JSON will NOT be expecting random escaping from a completely different standard.
But now the problem is in decoding.
Exactly.
The JSON you provided can be parsed with the Gson library using the code below:
private String sampledata = "{\"UserName\":\"qa#1.com\",\"Password\":\"\\\";\"}";
Gson g = new Gson();
g.fromJson(sampledata, sample.class);
public class sample {
public String UserName;
public String Password;
}
For decoding the text, I found the solution:
URLDecoder.decode(String, String);

Java : How to assign Json formated String to Java String?

I have a big JSON string that I receive as a request from the UI; it is converted to a String and parsed.
I want to simulate a similar environment for local testing, so I captured the JSON.
Currently I am manually adding a backslash (\) before every double quote in this big JSON string.
For example, I got this JSON:
{"age":29,"messages":["msg 1","msg 2","msg 3"],"name":"Preethi"}
and converted it into:
String str = "{\"age\":29,\"messages\":[\"msg 1\",\"msg 2\",\"msg 3\"],\"name\":\"Preethi\"}";
Is there any other way to achieve this?
On the client-side, do a search and regex "replace all" of double-quotes into single quotes on the desired form field before actually sending the request.
Actually, Java doesn't have verbatim string literals.
If you want a Java-like (and JVM-based) language that does, however, you might want to look at Groovy, which has various forms of string literal.
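Since Java 15, text blocks come close for this use case: double quotes inside a text block need no backslashes, although escape sequences are still processed, so they are not fully verbatim. A minimal sketch using the JSON from the question (the class and field names are hypothetical):

```java
final class TextBlockJson {
    // Java 15+ text block: the inner double quotes need no escaping.
    // The closing delimiter sits on the content line, so no trailing newline is added.
    static final String STR = """
            {"age":29,"messages":["msg 1","msg 2","msg 3"],"name":"Preethi"}""";
}
```

For very large captured payloads, keeping the JSON in a test resource file and reading it at runtime avoids editing the Java source at all.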
There is a built-in method to convert a JSONObject to a string. Why not use that?
JSONObject json = new JSONObject();
json.toString();
