Unable to display subscript text in textView - java

I'm quite new to Android and I'm trying to display some chemical formulas in a TextView which is contained in a custom ListView.
I'm fetching all the data by parsing XML, then I wish to display the formula, such as C₉H₈O₄.
But only the subscript digits 1 to 4 are displayed.
I'm converting from "normal" to "subscript" in this way
str = str.replaceAll("0", "\u2080");
str = str.replaceAll("1", "\u2081");
str = str.replaceAll("2", "\u2082");
str = str.replaceAll("3", "\u2083");
str = str.replaceAll("4", "\u2084");
str = str.replaceAll("5", "\u2085");
str = str.replaceAll("6", "\u2086");
str = str.replaceAll("7", "\u2087");
str = str.replaceAll("8", "\u2088");
str = str.replaceAll("9", "\u2089");
str contains the formula fetched from the xml file.
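The ten replaceAll calls above could probably be collapsed into one pass over the string. This is only a sketch: the class and method names are invented for illustration, and it relies on the subscript digits being contiguous in Unicode (U+2080 to U+2089). Whether the glyphs are actually drawn still depends on the font used by the view.

```java
final class SubscriptDigits {
    /** Replaces every ASCII digit with its Unicode subscript counterpart (U+2080..U+2089). */
    static String toSubscript(String formula) {
        StringBuilder out = new StringBuilder(formula.length());
        for (int i = 0; i < formula.length(); i++) {
            char ch = formula.charAt(i);
            if (ch >= '0' && ch <= '9') {
                // Subscript digits are contiguous, so an offset from U+2080 is enough.
                out.append((char) ('\u2080' + (ch - '0')));
            } else {
                out.append(ch);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toSubscript("C9H8O4"));
    }
}
```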
The strange behavior is that I can see in the Logcat the formula as it should be.
I also tried custom fonts, but nothing changed.
Here are two results:
the first is with normal font, the second with a custom one
https://www.dropbox.com/s/jyk64p700up14db/cella.jpg
https://www.dropbox.com/s/ab9h1b45j2hrods/Schermata%2003-2456370%20alle%2022.05.45.png
Online, I have seen solutions that suggest using something like
setText(Html.fromHtml("X<sub>2</sub>"));
but I really don't know how to use it in my case.
Any suggestion?
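One way the Html.fromHtml idea could be applied to a parsed formula is to wrap each run of digits in sub tags before handing the string to the view. The sketch below covers only the plain-Java string step; the class and method names are hypothetical, and the Android call appears only in a comment because it needs a real TextView.

```java
final class SubMarkup {
    /** Wraps every run of digits in <sub>...</sub> so an HTML renderer can lower it. */
    static String toSubMarkup(String formula) {
        return formula.replaceAll("(\\d+)", "<sub>$1</sub>");
    }

    public static void main(String[] args) {
        String markup = toSubMarkup("C9H8O4");
        System.out.println(markup); // C<sub>9</sub>H<sub>8</sub>O<sub>4</sub>
        // On Android the result would then be passed on, for example:
        // textView.setText(Html.fromHtml(markup));  // textView is a placeholder name
    }
}
```

This keeps the original digits in the string, so it does not depend on the font containing the Unicode subscript glyphs.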

It will not be easy to solve that problem with Html.fromHtml("X<sub>2</sub>").
You need a library that can help you achieve that, for example
JEuclid, a complete MathML rendering solution: http://jeuclid.sourceforge.net/
Look at its examples and you'll find a way to resolve your issue.
Other alternatives for rendering math expressions with TeX:
http://jmathtex.sourceforge.net/
http://sourceforge.net/projects/snuggletex/
http://forge.scilab.org/index.php/p/jlatexmath/

Finally I solved the problem: it was a font issue.
I just used Calibri and it works!

Related

Java Regexp to match domain of url

I would like to use Java regex to match a domain of a url, for example,
for www.table.google.com, I would like to get 'google' out of the url, namely, the second last word in this URL string.
Any help will be appreciated !!!
It really depends on the complexity of your inputs...
Here is a pretty simple regex:
.+\\.(.+)\\..+
It captures the text between the last two dots (each literal dot is written as \\.).
And here are some examples for that pattern: https://regex101.com/r/L52oz6/1.
As you can see, it works for simple inputs but not for complex urls.
But why reinvent the wheel? There are plenty of really good libraries that correctly parse any complex URL. Still, for simple inputs a small regex is easily built. If this does not solve the problem for your inputs, let me know and I will adjust the regex pattern.
Note that you can also just use simple splitting like:
String[] elements = input.split("\\.");
String secondToLastElement = elements[elements.length - 2];
But don't forget the index-bound checking.
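The bound check mentioned above might look like the following sketch. The class and method names are made up for illustration, and returning null for inputs with fewer than two parts is just one possible convention.

```java
final class SplitDomain {
    /** Returns the second-to-last dot-separated part, or null if there are fewer than two parts. */
    static String secondToLast(String host) {
        String[] elements = host.split("\\.");
        if (elements.length < 2) {
            return null; // nothing before the last part, so there is no answer
        }
        return elements[elements.length - 2];
    }

    public static void main(String[] args) {
        System.out.println(secondToLast("www.table.google.com")); // google
        System.out.println(secondToLast("localhost"));            // null
    }
}
```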
Or, if you are looking for a very quick solution, walk through the input starting from the last position. Work your way backwards until you find the first dot, then continue until you find the second dot. Then extract the part between them with input.substring(secondDotIndex + 1, firstDotIndex);.
There is already a method for exactly that purpose, namely String#lastIndexOf (see the documentation).
Take a look at this code snippet:
String input = ...
int indexLastDot = input.lastIndexOf('.');
// Start the backwards search just before the last dot, otherwise the same dot is found again.
int indexSecondToLastDot = input.lastIndexOf('.', indexLastDot - 1);
// substring takes (beginIndex, endIndex); + 1 skips the dot itself.
String secondToLastWord = input.substring(indexSecondToLastDot + 1, indexLastDot);
Don't forget bound checking: lastIndexOf returns -1 when the input contains no dot.
The advantage of this approach is that it is really fast: it scans the string from the end and creates only the resulting substring, instead of building an array of all parts.
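A self-contained version of the lastIndexOf idea, with the search start and the substring bounds written out, could look like this. The names are hypothetical, and returning null when there is no dot at all is an assumption about the desired behaviour.

```java
final class LastDots {
    /** Returns the text between the last two dots, or null if the input has no dot. */
    static String secondToLast(String input) {
        int lastDot = input.lastIndexOf('.');
        if (lastDot < 0) {
            return null; // no dot, so no "second-to-last" part
        }
        // Search backwards from just before the last dot; yields -1 if there is only one dot.
        int previousDot = input.lastIndexOf('.', lastDot - 1);
        // With previousDot == -1 this starts at index 0, i.e. at the beginning of the input.
        return input.substring(previousDot + 1, lastDot);
    }

    public static void main(String[] args) {
        System.out.println(secondToLast("www.table.google.com")); // google
    }
}
```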
My attempt:
(?<scheme>https?:\/\/)?(?<subdomain>\S*?)(?<domainword>[^.\s]+)(?<tld>\.[a-z]+|\.[a-z]{2,3}\.[a-z]{2,3})(?=\/|$)
Demo. Works correctly for:
http://www.foo.stackoverflow.com
http://www.stackoverflow.com
http://www.stackoverflow.com/
http://stackoverflow.com
https://www.stackoverflow.com
www.stackoverflow.com
stackoverflow.com
http://www.stackoverflow.com
http://www.stackoverflow.co.uk
foo.www.stackoverflow.com
foo.www.stackoverflow.co.uk
foo.www.stackoverflow.co.uk/a/b/c
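In Java, the pattern above could be used roughly as follows. This is a sketch: the class and method names are invented, the backslashes are doubled for the Java string literal, and the escaped slashes are written as plain / because Java does not need them escaped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DomainWord {
    // Same pattern as above, written as a Java string literal.
    private static final Pattern URL = Pattern.compile(
            "(?<scheme>https?://)?(?<subdomain>\\S*?)(?<domainword>[^.\\s]+)"
            + "(?<tld>\\.[a-z]+|\\.[a-z]{2,3}\\.[a-z]{2,3})(?=/|$)");

    /** Returns the named group "domainword", or null if the input does not match. */
    static String domainWord(String url) {
        Matcher m = URL.matcher(url);
        return m.find() ? m.group("domainword") : null;
    }

    public static void main(String[] args) {
        System.out.println(domainWord("www.table.google.com"));           // google
        System.out.println(domainWord("http://www.stackoverflow.co.uk")); // stackoverflow
    }
}
```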
private static final Pattern URL_MATCH_GET_SECOND_AND_LAST =
Pattern.compile("www\\.(.*)\\.google\\.(.*)", Pattern.CASE_INSENSITIVE);
String sURL = "www.table.google.com";
// One matcher is enough; a second matcher and a second find() are redundant.
Matcher matchURL = URL_MATCH_GET_SECOND_AND_LAST.matcher(sURL);
if (matchURL.find()) {
String sFirst = matchURL.group(1); // "table"
String sSecond = matchURL.group(2); // "com"
}

Crawling & parsing results of querying google-like search engine

I have to write a parser in Java (my first HTML parser, by the way). For now I'm using the jsoup library and I think it is a very good solution for my problem.
The main goal is to get some information from Google Scholar (h-index, number of publications, years of scientific career). I know how to parse an HTML page with 10 people, like this:
http://scholar.google.pl/citations?mauthors=Cracow+University+of+Economics&hl=pl&view_op=search_authors
for( Element element : htmlDoc.select("a[href*=/citations?user]") ){
if( element.hasText() ) {
String findUrl = element.absUrl("href");
pagesToVisit.add(findUrl);
}
}
BUT I need to find information about all the scientists from the requested university. How can I do that? I was thinking about getting the URL from the button that leads to the next 10 results, like this:
Elements elem = htmlDoc.getElementsByClass("gs_btnPR");
String nextUrl = elem.attr("onclick");
But I get a URL like this:
citations?view_op\x3dsearch_authors\x26hl\x3dpl\x26oe\x3dLatin2\x26mauthors\x3dAGH+University+of+Science+and+Technology\x26after_author\x3dslQKAC78__8J\x26astart\x3d10
Do I have to translate the \x sequences and add that page to my "toVisit" pages? Or is there a better way in the jsoup library, or maybe in another library? Please let me know! I don't have any other idea how to parse something like this...
I have to translate \x signs and add that site to my "toVisit" sites...I don't have any other idea, how to parse something like this...
\xNN is a hexadecimal-encoded ASCII character. For instance \x3d is =, and \x26 is &. These values can be converted using Integer.parseInt with the radix set to 16.
char c = (char)Integer.parseInt("3d", 16); // pass only the hex digits, without the \x prefix
System.out.println(c);
If you need to decode these values without a 3rd party library, you can do so using regular expressions. For example, using the String supplied in your question:
String st = "citations?view_op\\x3dsearch_authors\\x26hl\\x3dpl\\x26oe\\x3dLatin2\\x26mauthors\\x3dAGH+University+of+Science+and+Technology\\x26after_author\\x3dslQKAC78__8J\\x26astart\\x3d10";
System.out.println("Before Decoding: " + st);
Pattern p = Pattern.compile("\\\\x([0-9A-Fa-f]{2})");
Matcher m = p.matcher(st);
while ( m.find() ){
String c = Character.toString((char)Integer.parseInt(m.group(1), 16));
st = st.replaceAll("\\" + m.group(0), c);
m = p.matcher(st);//optional, but added for clarity as st has changed
}
System.out.println("After Decoding: " + st);
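As an alternative, the same decoding can probably be done in a single pass with Matcher.appendReplacement, which avoids re-scanning the string and also copes with decoded characters such as $ or \ that have a special meaning in replacement strings. The class and method names below are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HexEscapes {
    private static final Pattern HEX = Pattern.compile("\\\\x([0-9A-Fa-f]{2})");

    /** Replaces every \xNN sequence with the character whose code is the hex value NN. */
    static String decode(String s) {
        Matcher m = HEX.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps '$' and '\' from being treated as replacement syntax.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("citations?view_op\\x3dsearch_authors\\x26hl\\x3dpl"));
        // citations?view_op=search_authors&hl=pl
    }
}
```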
You currently get a URL like this using your code:
citations?view_op\x3dsearch_authors\x26hl\x3dpl\x26oe\x3dLatin2\x26mauthors\x3dAGH+University+of+Science+and+Technology\x26after_author\x3dQPQwAJz___8J\x26astart\x3d10
You have to extract the after_author value (QPQwAJz___8J here) using a regex, and use it to construct the URL for getting the next page of search results, which looks like this:
scholar.google.pl/citations?view_op=search_authors&hl=pl&mauthors=Cracow+University+of+Economics&after_author=QPQwAJz___8J
You can then get that next page from this URL and parse using Jsoup, and repeat for getting all the next remaining pages.
Will put together some example code later.
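A rough sketch of the extraction step described above, without the jsoup part: it pulls the after_author token out of the raw onclick value and builds the next-page URL in the shape shown earlier. The class and method names are invented, the assumption that the token consists only of letters, digits, _ and - is a guess based on the two examples in this thread, and whether Google Scholar also needs the astart parameter is not confirmed here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NextPage {
    // Matches the literal text after_author\x3d followed by the token (assumed character set).
    private static final Pattern AFTER_AUTHOR =
            Pattern.compile("after_author\\\\x3d([A-Za-z0-9_-]+)");

    /** Returns the after_author token from the raw onclick value, or null if absent. */
    static String afterAuthor(String onclick) {
        Matcher m = AFTER_AUTHOR.matcher(onclick);
        return m.find() ? m.group(1) : null;
    }

    /** Builds the next-page URL; encodedAuthors is the already URL-encoded mauthors value. */
    static String nextUrl(String encodedAuthors, String onclick) {
        String token = afterAuthor(onclick);
        if (token == null) {
            return null; // no further page advertised
        }
        return "http://scholar.google.pl/citations?view_op=search_authors&hl=pl"
                + "&mauthors=" + encodedAuthors
                + "&after_author=" + token;
    }

    public static void main(String[] args) {
        String onclick = "citations?view_op\\x3dsearch_authors\\x26hl\\x3dpl"
                + "\\x26after_author\\x3dQPQwAJz___8J\\x26astart\\x3d10";
        System.out.println(nextUrl("Cracow+University+of+Economics", onclick));
    }
}
```

The resulting URL can then be added to the list of pages to visit and fetched with jsoup like any other page, until no next-page button is found.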

JSON to CSV with Java using CDL: possible to replace comma-separated by semicolon-separated values?

Everything is in the title :)
I'm using org.json.CDL to convert a JSONArray into CSV data, but it renders a string with ',' as the separator.
I'd like to know if it's possible to use ';' instead.
Here is a simple example of what I'm doing:
public String exportAsCsv() throws Exception {
return CDL.toString(
new JSONArray(
mapper.writeValueAsString(extractAccounts()))
);
}
Thanks in advance for any advice on that question.
Edit: No replacement solution of course, as this could have impact for large data, and of course the library used enable me to specify the field separator.
Edit2: In the end, extracting the data as a JSONArray (and String...) was not a very good solution, especially for large data files.
So I made the following changes:
use a Java CSV library (for example: http://www.csvreader.com/java_csv_samples.php)
refactor the code to stream data from the JSON input source to the CSV output
This is nicer for processing large data. If you have comments, do not hesitate.
String output = "Hello,This,is,separated,by,a,comma";
// Simply call the replace method.
output = output.replace(',',';');
I found this in the String documentation.
Example
String value = "Hello,tthis,is,a,string";
value = value.replace(',', ';');
System.out.println(value);
// Outputs: Hello;tthis;is;a;string
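One caveat with a plain replace is that it also changes commas that are part of a value. If the rows are available as lists of strings, writing each line with the wanted separator avoids that. The sketch below is not part of org.json.CDL; the class and method names are invented, and the quoting rule (quote a field that contains the separator, a double quote or a line break, and double any embedded quotes) is the common CSV convention, assumed here.

```java
import java.util.Arrays;
import java.util.List;

final class SemicolonCsv {
    /** Joins the fields into one CSV line using the given separator. */
    static String toCsvLine(List<String> fields, char separator) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                line.append(separator);
            }
            line.append(quoteIfNeeded(fields.get(i), separator));
        }
        return line.toString();
    }

    /** Quotes a field only when it contains the separator, a quote or a line break. */
    static String quoteIfNeeded(String field, char separator) {
        boolean needsQuotes = field.indexOf(separator) >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        return '"' + field.replace("\"", "\"\"") + '"';
    }

    public static void main(String[] args) {
        System.out.println(toCsvLine(Arrays.asList("Smith, John", "a;b", "x"), ';'));
        // Smith, John;"a;b";x
    }
}
```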

Java Western + Arabic String concatenation issues

I'm having trouble in concatenating pieces of text mixing Western and Arabic chars.
I've a list of tokens like this:
-LRB-
دریای
مازندران
-RRB-
,
I use the following procedure to concatenate this list of tokens:
String str = "";
for (String tok : tokens) {
str += tok + " ";
}
This is the output of my procedure:
-LRB- دریای مازندران -RRB- ,
As can be seen, the position of the Arabic words is inverted.
How can I solve this (maybe suggesting to Java to ignore the information about text direction)?
EDIT
Actually, it seems that my problem was a false problem.
Now I've a new one. I need to wrap each word inside a string like this (word *) so that my output will be like this:
(word1 *)(word2 *)(word3 *)...
The procedure that I use is the following:
String str = "";
for (String tok : tokens) {
str += "(" + tok + " *)";
}
However, the result that I got is this:
(-LRB- *)(دریای *)(مازندران *)(-RRB- *)(, *)
instead of:
(-LRB- *)(دریای)(* مازندران *)(-RRB- *)(, *)
** EDIT2 **
Actually, I've discovered that my problem is not a problem. I wrote my string on a file and I opened it with nano (in the console). And it was correctly concatenated.
So the problem was due to the Eclipse console (and also gedit) which --let's say-- incorrectly rendered the string.
Anyway, thanks for your help!
The output is correct, and if you are presenting this text to an Arabic-speaking user you should not override the directionality of the text. Arabic is written from right to left. When you concatenate two Arabic strings together, the first will appear to the right of the second. This is controlled by the BiDi algorithm, the details of which are covered in http://www.unicode.org/reports/tr9/.
First, I would suggest using StringBuilder instead of raw String concatenation. You will make your garbage collector a lot happier. Second, without seeing the input or how your StringTokenizer is set up, I would venture a guess that you are having problems tokenizing the string properly.
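The StringBuilder suggestion could be sketched like this. The class and method names are made up, and String.join (Java 8+) is used for the space-separated case; note that it does not add the trailing space the original loop produces. The characters are stored in logical order either way, so how the result looks still depends on the console or editor that renders it.

```java
import java.util.Arrays;
import java.util.List;

final class TokenJoin {
    /** Joins the tokens with single spaces (no trailing space). */
    static String joinWithSpaces(List<String> tokens) {
        return String.join(" ", tokens);
    }

    /** Wraps each token as "(token *)" and concatenates the results. */
    static String wrapEach(List<String> tokens) {
        StringBuilder sb = new StringBuilder();
        for (String tok : tokens) {
            sb.append('(').append(tok).append(" *)");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("-LRB-", "دریای", "مازندران", "-RRB-", ",");
        System.out.println(joinWithSpaces(tokens));
        System.out.println(wrapEach(tokens));
    }
}
```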

How to change the width and height of an html file using java

I want to change width="xyz", where xyz can be any value, to width="300". I researched regular expressions and this is the syntax I am using:
String holder = "width=\"340\"";
String replacer="width=\"[0-9]*\"";
theWeb.replaceAll(replacer,holder);
where theWeb is the string. But nothing gets replaced. Any help would be appreciated.
Your regex is correct. One thing you might be forgetting is that Java strings are immutable: no String method changes the current string, it only returns a new string with the appropriate transformation. Try this instead:
String replacement = "width=\"340\"";
String regex = "width=\"[0-9]*\"";
String newWeb = theWeb.replaceAll(regex, replacement); // newWeb holds new text
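A small runnable version of the same idea, to show that the original string stays untouched and only the returned value carries the change. The class and method names and the sample HTML are invented for illustration.

```java
final class WidthRewrite {
    /** Returns a copy of html in which every width="digits" attribute is set to the given width. */
    static String setWidth(String html, int width) {
        return html.replaceAll("width=\"[0-9]*\"", "width=\"" + width + "\"");
    }

    public static void main(String[] args) {
        String theWeb = "<img width=\"120\" height=\"80\">";
        String newWeb = setWidth(theWeb, 340);
        System.out.println(theWeb); // unchanged: <img width="120" height="80">
        System.out.println(newWeb); // <img width="340" height="80">
    }
}
```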
It is better to use jsoup for manipulating and extracting data from HTML.
See this link for more details:
http://jsoup.org/
