Changing HTML to PlainText

I'm trying to convert HTML to plain text in Java, but I'm running into an issue. I'm trying to read the number in the padding-left style and turn it into tabs, but it doesn't work.
For example:
<p style="padding-left:40px;">Hello</p> should become Hello with a tab in front of it.
Here is my code so far (every 40px becomes one tab):
private static String setNonHTML(String txt)
{
    System.out.println(txt.substring(txt.indexOf("<p style=\"padding-left:") + 23, txt.indexOf("px\"><b>")));
    //return "";
    return txt
        .replaceAll("<br>","\n")
        .replaceAll(txt.substring(txt.indexOf("<p style=\"padding-left:"), txt.indexOf("px\"><b>") + 7)
            ,"\n" + repeat("\t",Integer.parseInt(txt.substring(txt.indexOf("<p style=\"padding-left:") + 23, txt.indexOf("px\"><b>")))/40))
        .replaceAll(txt.substring(txt.indexOf("<p style=\"padding-left:"), txt.indexOf("px\">") + 4)
            ,"\n" + repeat("\t",Integer.parseInt(txt.substring(txt.indexOf("<p style=\"padding-left:") + 23, txt.indexOf("px\">")))/40))
        .replaceAll("(?s)<[^>]*>(\\s*<[^>]*>)*", "\n");
}

I cleaned up some of your code to show you what is happening:
private static String setNonHTML(String txt)
{
    System.out.println(txt.substring(txt.indexOf("<p style=\"padding-left:") + 23, txt.indexOf("px\"><b>")));
    //return "";
    // grab the padding tag indexes
    int beforePaddingIndex = txt.indexOf("<p style=\"padding-left:");
    int afterPaddingIndex = txt.indexOf("px\"><b>");
    // replace all breaks with new lines
    txt = txt.replaceAll("<br>", "\n");
    // replace every copy of the first padded tag (e.g. <p style="padding-left:40px"><b>) with \n plus tabs
    txt = txt.replaceAll(txt.substring(beforePaddingIndex, afterPaddingIndex + 7), "\n" + repeat("\t", Integer.parseInt(txt.substring(beforePaddingIndex + 23, afterPaddingIndex)) / 40));
    // Recompute the indexes, as your inline calls do. The replace above removed that tag,
    // so these searches can now return -1 and the substring calls below can throw.
    beforePaddingIndex = txt.indexOf("<p style=\"padding-left:");
    afterPaddingIndex = txt.indexOf("px\"><b>");
    int afterPaddingBeforeBoldIndex = txt.indexOf("px\">");
    // replace a substring of the same tag a second time? This should find nothing
    txt = txt.replaceAll(txt.substring(beforePaddingIndex, afterPaddingBeforeBoldIndex + 4), "\n" + repeat("\t", Integer.parseInt(txt.substring(beforePaddingIndex + 23, afterPaddingBeforeBoldIndex)) / 40));
    txt = txt.replaceAll("(?s)<[^>]*>(\\s*<[^>]*>)*", "\n");
    return txt;
}
As you can see, after the first replaceAll there is a second replaceAll that operates on virtually the same indexes. In your version those indexes are computed inline after the first replaceAll has already run, so I recompute them here to replicate that behavior. Splitting code into descriptively named variables and sections is good practice and is enormously helpful when debugging complicated code. I don't know what output your program gives you, so I can't be sure this solves your issue, but it does look like a bug, and I believe it gives you a good start.
As for how to fix this, you may want to look into an off-the-shelf solution such as HtmlCleaner: http://htmlcleaner.sourceforge.net/javause.php
It lets you traverse and modify HTML programmatically, read attributes such as the padding-left style, and extract the content between tags.
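If pulling in a library is not an option, a regex-based sketch along these lines may be enough for simple, well-formed input. It is neither the asker's code nor HtmlCleaner's API: it assumes the padding always appears exactly as style="padding-left:NNpx" (with or without a trailing semicolon, as in the question's example), it uses Java 11's String.repeat instead of the asker's repeat helper, and the class name HtmlToPlainText is hypothetical. Regexes are not a general HTML parser, so nested or unusual markup still calls for a real parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HtmlToPlainText {
    // Opening <p> tag with style="padding-left:NNpx" (trailing ';' optional).
    // Assumption: padding is always in px and is the only style declared.
    private static final Pattern PADDED_P = Pattern.compile(
            "<p\\s+style=\"padding-left:\\s*(\\d+)px;?\">", Pattern.CASE_INSENSITIVE);

    static String setNonHTML(String txt) {
        // <br>, <br/> and <br /> become line breaks
        txt = txt.replaceAll("(?i)<br\\s*/?>", "\n");

        // Replace every padded <p> with a newline plus one tab per 40px
        Matcher m = PADDED_P.matcher(txt);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            int tabs = Integer.parseInt(m.group(1)) / 40;
            m.appendReplacement(sb, Matcher.quoteReplacement("\n" + "\t".repeat(tabs)));
        }
        m.appendTail(sb);

        // Drop all remaining tags (</p>, <b>, </b>, ...)
        return sb.toString().replaceAll("<[^>]*>", "");
    }
}
```

Using a Matcher loop instead of replaceAll with a computed substring means every padded paragraph gets its own tab count, rather than only the first tag found by indexOf.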

Related

Dealing with long Strings & TextView behavior

I'm developing an Android app that gets objects from a server and shows them in a simple list.
I'm trying to figure out how to deal with long object titles:
Every title populates a designated multi-line TextView.
If a title is longer than 16 characters, it breaks my intended UI.
There are two scenarios I need to handle:
1) If the title is longer than 16 characters and contains more than one word, I need to split the words onto separate lines. (I tried .split("") and .trim(), but I don't want to use another view, just break the line within the same one, and using ("") seems unreliable to me.)
2) If the title is longer than 16 characters and contains only one long word, I only need to change the font size.
Any ideas for a good, reliable solution?
Thanks a lot in advance.

Use a SpannableString so that a single TextView can hold both parts.
For the title:
SpannableString titleSpan = new SpannableString("title String");
titleSpan.setSpan(new RelativeSizeSpan(1.3f), 0, titleSpan.length(), Spanned.SPAN_EXCLUSIVE_EXCLUSIVE);
For the message:
SpannableString messageSpan = new SpannableString("Message String");
messageSpan.setSpan(new RelativeSizeSpan(1.0f), 0, messageSpan.length(), Spanned.SPAN_EXCLUSIVE_EXCLUSIVE);
Set them in the TextView:
tvTermsPolicyHeading.setText(TextUtils.concat(titleSpan, messageSpan));

Code like the following should do what you need:
String title; // your title
// find the length of your title
int length = title.length();
if (length > 16) {
    String[] titles = title.split("\\s+");
    int size = titles.length;
    if (size < 2) {
        yourTextview.setText(title);
        // reduce the text size of your TextView here
    } else {
        // put each word on its own line (no trailing line break)
        StringBuilder newTitle = new StringBuilder();
        for (int i = 0; i < titles.length; i++) {
            if (i > 0) newTitle.append("\n");
            newTitle.append(titles[i]);
        }
        yourTextview.setText(newTitle.toString());
    }
} else {
    yourTextview.setText(title);
}
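One way to make the rule reliable is to keep the decision in plain Java, separate from the TextView, so it can be unit-tested. This is a minimal sketch: the class and method names are hypothetical and the 16-character limit is taken from the question. The caller would still pass formatTitle(title) to setText() and call setTextSize() when needsSmallerFont(title) is true. On Android, String.join needs API level 26; TextUtils.join("\n", words) is an alternative on older versions.

```java
final class TitleFormatter {
    static final int MAX_CHARS = 16; // limit taken from the question

    // Puts each word on its own line when the title is too long and has
    // several words; otherwise returns the trimmed title unchanged.
    static String formatTitle(String title) {
        String trimmed = title.trim();
        if (trimmed.length() <= MAX_CHARS) {
            return trimmed;
        }
        String[] words = trimmed.split("\\s+");
        return words.length > 1 ? String.join("\n", words) : trimmed;
    }

    // True for a single word longer than MAX_CHARS: the case where the font
    // size should be reduced instead of breaking the line.
    static boolean needsSmallerFont(String title) {
        String trimmed = title.trim();
        return trimmed.length() > MAX_CHARS && trimmed.split("\\s+").length == 1;
    }
}
```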

You can split the title and then join the words with "\n" if there is more than one word.
For a single long word, see this question:
Auto-fit TextView for Android

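Try this:
if (title.length() > 16) {
    if (title.split(" ").length > 1) {
        // break at the last space within the first 16 characters
        int end = title.substring(0, 16).lastIndexOf(" ");
        if (end < 0) end = title.indexOf(" "); // first word is longer than 16 characters
        titleTextView.setText(title.substring(0, end) + "\n" + title.substring(end + 1));
    } else {
        titleTextView.setText(title);
        titleTextView.setTextSize(yourTextSize);
    }
} else {
    titleTextView.setText(title);
}
This code should work for your case.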

Java: Issue when replacing Strings on loop

I'm building a small app in Java that automatically translates boolean queries.
This is the code that checks whether the query string contains a certain word and, if so, replaces it with the translated value:
int howmanytimes = originalValues.size();
for (int y = 0; y < howmanytimes; y++) {
    String originalWord = originalValues.get(y);
    System.out.println("original Word = " + originalWord);
    if (toReplace.contains(" " + originalWord.toLowerCase() + " ")
            || toCheck.contains('"' + originalWord.toLowerCase() + '"')) {
        toReplace = toReplace.replace(originalWord, translatedValues.get(y).toLowerCase());
        System.out.println("replaced " + originalWord + " with " + translatedValues.get(y).toLowerCase());
    }
    System.out.println("to Replace inside loop " + toReplace);
}
The problem occurs when a query contains, for example, '(mykeyword OR "blue mykeyword")' and the translations differ: mykeyword translates to elpalavra, and "blue mykeyword" translates to "elpalavra azul". In this case the result string is '(elpalavra OR "blue elpalavra")' when it should be '(elpalavra OR "elpalavra azul")'. I understand that the first iteration replaces every occurrence of the keyword, so by the next iteration the string no longer contains the original phrase to translate.
How can I fix this?
Thank you

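You can sort originalValues by length in descending order (reordering translatedValues the same way, so each translation stays paired with its original) and then loop through them.
This way "blue mykeyword" is replaced first, and "mykeyword" only afterwards.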

The purpose of the "toCheck" variable isn't explained, and in any case the way it is used looks odd (to me, at least).
Setting that aside, one way to meet your requirements (based only on what you specified) is this:
Sort originalValues so that the entries with more words come first. Entries with the same number of words should be ordered from longest to shortest.
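A minimal sketch of that ordering, assuming originalValues and translatedValues are parallel lists of equal size. It sorts a list of indexes rather than the two lists themselves, so each original stays paired with its translation. The class and method names are hypothetical, and the sketch leaves out the asker's lowercasing and the space/quote checks around each word.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class QueryTranslator {
    // Replaces each original phrase with its translation, handling phrases
    // with more words first and, among equal word counts, longer ones first,
    // so "blue mykeyword" is translated before "mykeyword" can break it up.
    static String translate(String query, List<String> originals, List<String> translations) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < originals.size(); i++) {
            order.add(i);
        }
        order.sort(Comparator
                .comparingInt((Integer i) -> originals.get(i).trim().split("\\s+").length)
                .thenComparingInt(i -> originals.get(i).length())
                .reversed()); // descending on both keys

        String result = query;
        for (int i : order) {
            result = result.replace(originals.get(i), translations.get(i));
        }
        return result;
    }
}
```

Plain String.replace still substitutes inside longer words and can re-translate text produced by an earlier replacement, so a single-pass approach (for example one regex alternation built from the sorted phrases) may be worth considering if that matters for your data.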

making a String from a loop(Arraylist) and several individual signs to mysql commandline

I might have overlooked some factors influencing the process, but that is why I'm seeking help here. This is my first post, and I have read the guidelines on asking a good question so that I get the best possible answer. I hope you will understand (otherwise, please leave a comment with further questions).
I have created an ArrayList:
ArrayList<String> liste = new ArrayList<String>();
I gather several names, quantities, and dates:
if (shepherd > 0) { // only record non-zero quantities
    System.out.println(shepherd);
    String s = "('shepherd'," + "'" + shepherd + "'," + "'" + ft.format(date) + "'" + ")";
    liste.add(s);
}
I have defined shepherd as follows:
double shepherd = 0;
Next, I wish to add these entries to my MySql database.
I construct a query, and print it out so that I can verify that it is of the correct format:
System.out.println("INSERT INTO kennel VALUES");
for (int i = 0; i < liste.size(); i++) {
    System.out.println(liste.get(i));
    if (i != liste.size() - 1) {
        System.out.println(",");
    }
}
This shows the correct command, with the proper syntax, but it's only output to the console at this point.
I have to send this through JSch or Ganymed, most likely as a String. So I'm wondering how I can take all the different parts (the doubles, the strings, the loop) and build a String identical to the line printed in the console.
I expect it would look something like this:
String command = (mysql -e "use kennel;insert into department3 values ('shepherd','1','2013-03-04');";
I believe I'm having trouble with the " and ( and ' characters.
I hope I've made the problem clear. Thank you in advance. Sincerely

Your string needs to be enclosed in double quotation marks. Because those would clash with the quotation marks inside your string, you need to escape the inner ones by placing a backslash in front of each. :) The leading ( from your attempt isn't needed either; sent to a shell, an unclosed ( would most likely cause a syntax error.
String command = "mysql -e \"use kennel;insert into department3 values ('shepherd','1','2013-03-04');\"";
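To build that command from the list instead of typing it by hand, the tuples collected in liste can be joined with commas, which produces the same text as the printing loop. This is a minimal sketch: the class and method names are hypothetical, and the table name department3 is taken from the question's example. It does no escaping, so it is only safe for trusted values; if a direct database connection is possible, a JDBC PreparedStatement avoids quoting problems entirely.

```java
import java.util.List;

final class KennelCommand {
    // Builds: mysql -e "use kennel;insert into department3 values (...),(...);"
    // from tuples such as ('shepherd','1','2013-03-04') collected in the list.
    // No escaping is done: values must not contain quotes or shell characters.
    static String buildCommand(List<String> rows) {
        String sql = "use kennel;insert into department3 values "
                + String.join(",", rows) + ";";
        // The escaped \" puts literal double quotes around the SQL for the shell
        return "mysql -e \"" + sql + "\"";
    }
}
```

The resulting string can then be passed to whatever exec call your SSH library provides.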

Pulling just certain address from href tag in string always ends in .jpg

I have a string that always looks like this:
<a href="...sitenum=XXX">Site Info</a>
...where sitenum=XXX can be any 3-, 4-, or 5-digit number. I am trying to get just the sitenum from this string.
I figured this would give me the correct information for 3 digits:
String src = de.substring(de.lastIndexOf("sitenum=") + 3);
However, that just skips the 'sit' of 'sitenum=' and returns everything after it, including ">Site Info
I would like it to stop after the digits, at the " that follows them.
Am I using lastIndexOf incorrectly?
EDIT: The answer worked for one URL, but not another:
http://www.wcc.nrcs.usda.gov/cgibin/wygraph-multi.pl?state=NV&wateryear=current&stationidname=19K07S
I am trying to pull 'state' from this URL, but instead of the state value I get the end of the word 'state' itself. Here is the code:
String state = de.substring(de.lastIndexOf("state=") + 2,
        de.indexOf("&", de.lastIndexOf("state=")));
The state is always a 2-letter code or the number 0. When I run this on my string I get, for example:
ate=0
I am confused about how this works.
EDIT 2: Ah, I get it! The 2 needs to be 6, because that is the length of "state=", the text I'm searching for, so the value starts 6 characters after the index lastIndexOf returns.

Use this (8 is the length of "sitenum="):
String src = de.substring(de.lastIndexOf("sitenum=") + 8, de.indexOf("\"", de.lastIndexOf("sitenum=")));

Not the best way of doing it, but check whether it is what you need:
String src = de.substring(de.lastIndexOf("sitenum=") + "sitenum=".length(), de.indexOf(">Site Info") - 1);

You are trying to get the text after "sitenum=", but lastIndexOf("sitenum=") returns the index where "sitenum=" starts, not where the text you want begins.
Try:
int startIndex = de.lastIndexOf("sitenum=") + "sitenum=".length();
int endIndex = de.lastIndexOf("\">Site Info</a>");
String src = de.substring(startIndex, endIndex);
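The lesson from both edits generalizes: skip key.length() characters rather than a hand-counted offset, then stop at the first delimiter. A small helper along these lines handles both the sitenum and the state cases. The class and method names are hypothetical; like the answers, it uses lastIndexOf, and the caller chooses the delimiter characters (for example & and ").

```java
final class ParamExtractor {
    // Returns the text after the last occurrence of key (e.g. "sitenum=" or
    // "state=") up to, but not including, the first character found in
    // stopChars, or up to the end of the string. Returns null if key is absent.
    static String valueAfter(String s, String key, String stopChars) {
        int keyIndex = s.lastIndexOf(key);
        if (keyIndex < 0) {
            return null;
        }
        int start = keyIndex + key.length(); // skip the whole key, not a hand-counted 2 or 3
        int end = start;
        while (end < s.length() && stopChars.indexOf(s.charAt(end)) < 0) {
            end++;
        }
        return s.substring(start, end);
    }
}
```

For the URL in the edit, valueAfter(url, "state=", "&\"") should give NV, and for the anchor tag valueAfter(de, "sitenum=", "&\"") should give just the digits.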

Java text splitting algorithm

I have a large string (of text).
I need to split it into a few pieces (according to a maximum character limit), run some operations on each of them independently, and merge the results at the end.
A pretty simple task.
I'm just looking for an algorithm that splits text naturally, so it doesn't cut it into fixed-size substrings and doesn't split words in half.
For example (* marks the 100th character, with the limit set to 100):
....split me aro*und here...
the first fragment should contain: ...split me
the second fragment should be: around here...
I'm working in Java, by the way.

The Wikipedia article on word wrapping discusses this. It also links to an algorithm by Knuth.
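The simplest approach that article describes is the greedy one: put as many whole words on each line as fit, then start a new line. A minimal sketch, with hypothetical class and method names, that splits on any run of whitespace. Unlike Knuth's algorithm it does not try to balance line lengths, and a single word longer than the limit gets a line of its own (so that line exceeds the limit).

```java
import java.util.ArrayList;
import java.util.List;

final class GreedyWrap {
    // Greedy word wrap: fill each line with whole words up to maxLength.
    static List<String> wrap(String text, int maxLength) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // text was empty or all whitespace
            }
            // Start a new line if adding " word" would pass the limit
            if (line.length() > 0 && line.length() + 1 + word.length() > maxLength) {
                lines.add(line.toString());
                line.setLength(0);
            }
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(word);
        }
        if (line.length() > 0) {
            lines.add(line.toString());
        }
        return lines;
    }
}
```

With the question's example and a limit of 14, wrap("....split me around here...", 14) returns [....split me, around here...]; each fragment can then be processed independently and the results joined again.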

You could use lastIndexOf(String str, int fromIndex):
import java.util.ArrayList;
import java.util.List;
public static List<String> splitByText(String text, String sep, int maxLength) {
    List<String> ret = new ArrayList<String>();
    int start = 0;
    while (start + maxLength < text.length()) {
        // last separator at or before the limit
        int index = text.lastIndexOf(sep, start + maxLength);
        if (index < start)
            throw new IllegalArgumentException("Unable to break into strings of " +
                    "no more than " + maxLength);
        ret.add(text.substring(start, index));
        start = index + sep.length();
    }
    ret.add(text.substring(start));
    return ret;
}
Then
System.out.println(splitByText("....split me around here...", " ", 14));
prints
[....split me, around here...]

Jakarta Commons Lang's WordUtils.wrap() comes close:
It only breaks on spaces.
It doesn't return a list, but you can choose a "line separator" that is unlikely to occur in the text and then split on that.

If you're using Swing for your chat, you can handle it like this:
//textarea is JTextArea instance
textarea.setLineWrap(true);
textarea.setWrapStyleWord(true);

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.
