How to copy certain text from a website with Java - java

I have a website that is in plain text. The website is in a format like this:
{"code1":"Text I want copied","code2":"Second text I want to copy"}
Every time the website refreshes though, the texts I want copied change in length. I am curious how I could retrieve the text starting after ' :" ' and before ' ", ', using Java. I want the same thing to happen with the second text as well. I also would like to remove the html tags. Help will be greatly appreciated.

Using the org.json library, you could parse the JSON like:
import org.json.JSONObject;

String myJSONString = "{\"code1\":\"Text I want copied\",\"code2\":\"Second text I want to copy\"}";
JSONObject object = new JSONObject(myJSONString);
// Look the values up by key: JSONObject does not guarantee key order,
// so indexing into JSONObject.getNames(object) is unreliable.
String firstText = object.getString("code1");
String secondText = object.getString("code2");
To fetch and parse the web page itself, you can use the jsoup library.
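If adding a library is not an option, the "text after :" and before "," idea from the question can be sketched with a regular expression from the standard library. This is only a rough sketch: the class and method names are made up for illustration, and it assumes a flat object whose keys and values are strings without escaped quotes. A real JSON parser is more robust.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class FlatJsonValues {
    // Matches "key":"value" pairs; assumes neither part contains an escaped quote.
    private static final Pattern PAIR =
            Pattern.compile("\"([^\"]*)\"\\s*:\\s*\"([^\"]*)\"");

    // Returns the pairs in the order they appear in the text.
    static Map<String, String> extract(String json) {
        Map<String, String> values = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(json);
        while (m.find()) {
            values.put(m.group(1), m.group(2));
        }
        return values;
    }

    public static void main(String[] args) {
        String json = "{\"code1\":\"Text I want copied\",\"code2\":\"Second text I want to copy\"}";
        extract(json).forEach((key, value) -> System.out.println(key + " -> " + value));
    }
}
```

Because the values are captured by the pattern rather than by fixed offsets, it does not matter that their length changes on every refresh.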

I want Bold text for my java coded Telegram bot, how?

I want to send bold text via a bot.
As a regular user you would type two asterisks before and after the message, but this doesn't work for the bot. I have searched for a solution here, but most bots are developed in PHP or Python.
String a = emoji+"**dump alert**\n";
String b = "Date and time: ";
String c = month+" "+date.format(format1)+" / "+date.format(format2)+"\n";
String d = "Exchange: "+exc;
return a+b+c+d;
To use HTML tags in the message text, you need to turn on HTML parsing.
You can do that by setting this flag:
message.enableHtml(true);
After that, you can make text bold like this:
String text = "<b>Bold text</b>";
If you use Markdown instead, use only one asterisk on each side: *bold*
It should work with html tags. So instead of
String a = emoji+"**dump alert**\n";
try using this
String a = emoji+"<b>dump alert</b>\n";
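Once HTML parsing is enabled, any <, > or & in the dynamic parts of the message (the exchange name, for example) has to be escaped, because Telegram would otherwise try to read it as markup. A minimal sketch of that, where the class and method names and the sample value are invented for illustration:

```java
// Hypothetical helper for building HTML-formatted bot messages.
class TelegramHtml {
    // Escapes the three characters that HTML parse mode treats specially.
    // The ampersand is replaced first so that the other entities are not double-escaped.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Wraps escaped text in a bold tag.
    static String bold(String text) {
        return "<b>" + escape(text) + "</b>";
    }

    public static void main(String[] args) {
        String exchange = "A<B & Co"; // made-up value containing special characters
        String message = bold("dump alert") + "\nExchange: " + escape(exchange);
        System.out.println(message);
    }
}
```

The resulting string can then be passed as the message text with HTML parsing turned on.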

Crawling & parsing results of querying google-like search engine

I have to write a parser in Java (my first HTML parser, by the way). For now I'm using the jsoup library, and I think it is a very good solution for my problem.
The main goal is to get some information from Google Scholar (h-index, number of publications, years of scientific career). I know how to parse a results page that lists 10 people, like this one:
http://scholar.google.pl/citations?mauthors=Cracow+University+of+Economics&hl=pl&view_op=search_authors
// The attribute selector needs its closing bracket.
for (Element element : htmlDoc.select("a[href*=/citations?user]")) {
    if (element.hasText()) {
        String findUrl = element.absUrl("href");
        pagesToVisit.add(findUrl);
    }
}
But I need to find information about all of the scientists from the requested university. How can I do that? I was thinking about getting the URL from the button that leads to the next 10 results, like this:
Elements elem = htmlDoc.getElementsByClass("gs_btnPR");
String nextUrl = elem.attr("onclick");
But I get a URL like this:
citations?view_op\x3dsearch_authors\x26hl\x3dpl\x26oe\x3dLatin2\x26mauthors\x3dAGH+University+of+Science+and+Technology\x26after_author\x3dslQKAC78__8J\x26astart\x3d10
Do I have to translate the \x sequences and add that URL to my "toVisit" list? Or is there a better way inside the jsoup library, or maybe in another library? Please let me know! I don't have any other idea how to parse something like this...
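A sequence of the form \xNN is a hexadecimal-encoded ASCII character. For instance, \x3d is = and \x26 is &. These values can be converted using Integer.parseInt with the radix set to 16; pass only the two hex digits, without the \x prefix, otherwise a NumberFormatException is thrown.
char c = (char) Integer.parseInt("3d", 16);
System.out.println(c);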
If you need to decode these values without a 3rd party library, you can do so using regular expressions. For example, using the String supplied in your question:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String st = "citations?view_op\\x3dsearch_authors\\x26hl\\x3dpl\\x26oe\\x3dLatin2\\x26mauthors\\x3dAGH+University+of+Science+and+Technology\\x26after_author\\x3dslQKAC78__8J\\x26astart\\x3d10";
System.out.println("Before Decoding: " + st);
Pattern p = Pattern.compile("\\\\x([0-9A-Fa-f]{2})");
Matcher m = p.matcher(st);
while (m.find()) {
    // Convert the two hex digits into the character they encode.
    String c = Character.toString((char) Integer.parseInt(m.group(1), 16));
    // Replace every occurrence of this escape, treating both arguments literally.
    st = st.replace(m.group(0), c);
    m = p.matcher(st); // restart matching, because st has changed
}
System.out.println("After Decoding: " + st);
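The same decoding can also be done in a single pass over the string, without rebuilding the matcher after each replacement. The following is a sketch using Matcher.appendReplacement; the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that decodes \xNN escapes in one pass.
class HexEscapeDecoder {
    // A literal backslash, an 'x', and exactly two hex digits.
    private static final Pattern HEX_ESCAPE = Pattern.compile("\\\\x([0-9A-Fa-f]{2})");

    static String decode(String input) {
        Matcher m = HEX_ESCAPE.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String decoded = String.valueOf((char) Integer.parseInt(m.group(1), 16));
            // quoteReplacement keeps decoded '$' or '\' from being read as replacement syntax.
            m.appendReplacement(out, Matcher.quoteReplacement(decoded));
        }
        m.appendTail(out); // copy whatever follows the last escape
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("citations?view_op\\x3dsearch_authors\\x26hl\\x3dpl"));
    }
}
```

StringBuffer is used here because the StringBuilder overloads of appendReplacement and appendTail only exist from Java 9 onward.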
You currently get a URL like this using your code:
citations?view_op\x3dsearch_authors\x26hl\x3dpl\x26oe\x3dLatin2\x26mauthors\x3dAGH+University+of+Science+and+Technology\x26after_author\x3dQPQwAJz___8J\x26astart\x3d10
You have to extract the after_author value (QPQwAJz___8J here) using a regex, and use it to construct the URL for the next page of search results, which looks like this:
scholar.google.pl/citations?view_op=search_authors&hl=pl&mauthors=Cracow+University+of+Economics&after_author=QPQwAJz___8J
You can then get that next page from this URL and parse using Jsoup, and repeat for getting all the next remaining pages.
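The extraction step might look roughly like the sketch below. The class and method names are invented, the http:// scheme is an assumption, and the mauthors value is expected to be URL-encoded already (as in the search URL from the question). It also assumes the token itself contains no backslash or quote character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for building the "next 10 results" URL.
class NextPageUrl {
    // Captures what follows after_author\x3d, up to the next backslash or quote.
    private static final Pattern AFTER_AUTHOR =
            Pattern.compile("after_author\\\\x3d([^\\\\'\"]+)");

    // Returns the after_author token, or null if the onclick text has none.
    static String afterAuthor(String onclick) {
        Matcher m = AFTER_AUTHOR.matcher(onclick);
        return m.find() ? m.group(1) : null;
    }

    // mauthors must already be URL-encoded, e.g. "Cracow+University+of+Economics".
    static String build(String mauthors, String token) {
        return "http://scholar.google.pl/citations?view_op=search_authors&hl=pl"
                + "&mauthors=" + mauthors
                + "&after_author=" + token;
    }

    public static void main(String[] args) {
        String onclick = "citations?view_op\\x3dsearch_authors\\x26hl\\x3dpl"
                + "\\x26after_author\\x3dQPQwAJz___8J\\x26astart\\x3d10";
        System.out.println(build("Cracow+University+of+Economics", afterAuthor(onclick)));
    }
}
```

A crawler loop would call afterAuthor on each fetched page and stop once it returns null, which under these assumptions means there is no further "next" button.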
Will put together some example code later.

How to maintain the format of a string Java

So I'm parsing a JSON string to a java string and printing it. I'm using the following method to do that.
JSONParser parser=new JSONParser();
Object obj = parser.parse(output);
JSONObject jsonObject = (JSONObject) obj;
String stdout= (String) jsonObject.get("Stdout");
String stderr= (String) jsonObject.get("Stderr");
out.print(stdout);
out.print(stderr);
This is my JSON string:
{"Stdout":"/mycode.c: In function 'main':\n/mycode.c:8:5: error: expected ';' before 'return'\n return 0;\r\n ^\nsh: 1: ./myapp: not found\n","Stderr":"exit status 127"}
When I use System.out.print(stdout) and System.out.print(stderr), I get the output in the console in the format I want, with each message on its own line.
But now I want it on my web page, so I call out.print(stdout) instead. There I don't get the desired format; the whole output is shown on a single line.
Any ideas how to fix this?
Your web page is HTML, so the \r\n and \n sequences aren't treated as line breaks.
You could replace each line break with a <br> tag to force new lines.
Or put your whole message in a <pre> tag, which preserves the whitespace and line breaks of the original text. This is probably the simpler option, because it does not depend on which line-break characters the output uses:
out.print("<pre>" + stdout + "</pre>");
Content inside <pre> is still parsed as HTML, so escape <, > and & in the output before printing it.
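One way to combine the <pre> approach with HTML escaping (needed because the browser still parses the content of <pre> as markup) is sketched below. The class and method names are made up for illustration, and only the characters <, > and & are handled.

```java
// Hypothetical helper for printing program output into an HTML page.
class PreFormatter {
    // Replaces the characters that would otherwise be read as HTML markup.
    static String escapeHtml(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            switch (ch) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default:  sb.append(ch);
            }
        }
        return sb.toString();
    }

    // Wraps the escaped text so that line breaks and spacing are kept.
    static String toPre(String text) {
        return "<pre>" + escapeHtml(text) + "</pre>";
    }

    public static void main(String[] args) {
        String stdout = "/mycode.c:8:5: error: expected ';' before 'return'\n return 0;\r\n ^\n";
        System.out.println(toPre(stdout));
    }
}
```

In the servlet or JSP from the question, this would be used as out.print(PreFormatter.toPre(stdout)).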

get <img> value from a string in java

I'm parsing data from a JSON file, and I have data like this:
String Content = <p><img class="alignleft size-full wp-image-56999" alt="abdullah" src="http://www.some.com/wp-content/uploads/2013/12/imageName.jpg" width="348" height="239" />Text</p>
<p>Text</p> <p>Text</p><p>The post Some Text appeared first on Some Webiste</p>
Now, I want to divide this string in two pieces. I want to get this URL from src.
http://www.some.com/wp-content/uploads/2013/12/imageName.jpg
and store it in a variable. I also want to remove the last line ("The post ... appeared first on ...") and store the remaining text in another variable.
So, the questions are:
Is it possible to get that?
If possible, how can I achieve that ?
In Java
Get a Document object:
Document originalDoc = new SAXReader().read(new StringReader("<div>data</div>"));
Then you can parse it (read this tutorial):
http://www.mkyong.com/java/how-to-read-xml-file-in-java-dom-parser/
In JavaScript
To get the attribute:
var url = document.getElementsByTagName('img')[0].getAttribute('src');
If you have a string and want a document object, use jQuery:
var stringValue = '<div>data</div>';
var myObject = $(stringValue);
Use String.substring(firstIndex, lastIndex) to get the link from the src attribute.
Learn to use an HTML parser like jsoup; it will be useful in the near future.
If it's a well-structured string, you can parse it using any DOM parser and extract data from it...
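For a quick experiment without extra libraries, both parts of the question can be sketched with regular expressions. This is only a rough illustration with invented names: it assumes the src attribute is double-quoted and that the footer paragraph always has the "The post ... appeared first on ..." shape. Regexes are fragile on real-world HTML, so an HTML parser remains the more robust choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a replacement for a real HTML parser.
class PostContent {
    // First double-quoted src attribute inside an <img ...> tag.
    private static final Pattern IMG_SRC =
            Pattern.compile("<img\\b[^>]*?\\bsrc\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns the image URL, or null if no <img src="..."> is found.
    static String firstImageSrc(String html) {
        Matcher m = IMG_SRC.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    // Drops the trailing "The post ... appeared first on ..." paragraph, then strips tags.
    static String textWithoutFooter(String html) {
        String body = html.replaceAll("(?s)<p>The post .*? appeared first on .*?</p>\\s*$", "");
        return body.replaceAll("<[^>]+>", " ").replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String content = "<p><img class=\"alignleft\" alt=\"abdullah\" "
                + "src=\"http://www.some.com/wp-content/uploads/2013/12/imageName.jpg\" />Text</p>\n"
                + "<p>Text</p> <p>Text</p><p>The post Some Text appeared first on Some Webiste</p>";
        System.out.println(firstImageSrc(content));
        System.out.println(textWithoutFooter(content));
    }
}
```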

Java : How to assign Json formated String to Java String?

I have a big JSON string that I will receive as a request from the UI; it is converted to a String and parsed.
I want to simulate a similar environment for testing locally, so I captured the JSON.
Currently I am manually adding "\" before every double quote in this big JSON string.
Is there any other way to achieve this?
For example i got this json
{"age":29,"messages":["msg 1","msg 2","msg 3"],"name":"Preethi"}
and converted that into
String str = "{\"age\":\"29\",\"messages\":[\"msg 1\",\"msg 2\",\"msg 3\"],\"name\":\"mkyong\"}";
Is there any other way to achieve this?
On the client side, do a regex "replace all" of double quotes into single quotes on the desired form field before actually sending the request.
Before Java 15, Java had no verbatim string literals; from Java 15 on, a text block (delimited by """) can contain double quotes without escaping.
If you want a Java-like, JVM-based language with more flexible string literals, you might want to look at Groovy, which has several forms of string literal.
There is a built-in method to convert a JSONObject to a string. Why don't you use that?
JSONObject json = new JSONObject();
json.toString();
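If the goal is only to avoid typing the backslashes by hand, the escaping itself can be automated. The sketch below is a hypothetical helper (not part of any JSON library) that turns captured JSON text into a Java string literal that can be pasted into a test.

```java
// Hypothetical helper that prints raw text as a Java string literal.
class JavaLiteral {
    // Escapes backslashes, double quotes and common control characters,
    // then wraps the result in double quotes.
    static String toLiteral(String raw) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < raw.length(); i++) {
            char ch = raw.charAt(i);
            switch (ch) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:   sb.append(ch);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        // In practice the raw text would be read from a file or the clipboard.
        String captured = "{\"age\":29,\"name\":\"Preethi\"}";
        System.out.println("String str = " + toLiteral(captured) + ";");
    }
}
```

Another option for local testing is to keep the captured JSON in a separate file and read it at test time, which avoids escaping entirely.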
