Replace every word with tag

JavaScript or Java solution needed
The solution can use either Java or JavaScript. I have the HTML in a string, so I can manipulate it in Java before loading it, or with JavaScript afterwards.
Problem
I need to wrap each word in a tag. For example:
<html> ... >
Hello every one, cheers
< ... </html>
should be changed to
<html> ... >
<word>Hello</word> <word>every</word> <word>one</word>, <word>cheers</word>
< ... </html>
Why?
This will let me use JavaScript to select/highlight a word. The only way I have found is the function highlightElementAtPoint, shown in the JavaScript hint: it finds the element at a given x,y coordinate and highlights it. If every word is its own element, highlighting a single word becomes possible.
The idea is to use this approach to detect highlighted text in an Android WebView, even if it means using a roundabout highlighting method. There are likely many other applications for this.
JavaScript hint
I am using the following code to highlight a word; however, it highlights the whole text belonging to a tag. Once each word is its own tag, this will work to some extent. If there is an alternative that lets me highlight the word at a given position, that would also be a solution.
function highlightElementAtPoint(xOrdinate, yOrdinate) {
    // Find the topmost element at the given viewport coordinates.
    var theElement = document.elementFromPoint(xOrdinate, yOrdinate);
    selectedElement = theElement; // kept in a global variable
    theElement.style.backgroundColor = "yellow";
    // Work out the element's index among all elements with the same tag name.
    var theName = theElement.nodeName;
    var theArray = document.getElementsByTagName(theName);
    var theIndex = -1;
    for (var i = 0; i < theArray.length; i++) {
        if (theArray[i] == theElement) {
            theIndex = i;
        }
    }
    // Report the selected content back to the Android side.
    window.androidselection.selected(theElement.innerHTML);
    return theName + " " + theIndex;
}

Try something like:
yourStringHere = yourStringHere.replace(" ", "</word> <word>");
yourStringHere = yourStringHere.replace("<html></word>", "<html>"); // remove the first closing word tag
This should work, though you may have to adjust it for your markup.

var tags = document.body.innerText.match(/\w+/g);
for (var i = 0; i < tags.length; i++) {
    tags[i] = '<word>' + tags[i] + '</word>';
}
Or, as @ThomasK said:
var tags = document.body.innerText;
tags = '<word>' + tags + '</word>';
tags = tags.replace(/\s/g, '</word><word>');
But keep in mind that in JavaScript .replace(" ", foo) replaces only the first space. To replace every run of whitespace, use .replace(/\s+/g, foo).
And as @ajax333221 said, the second way will include commas, dots and other symbols inside the tags, so the first solution is better.
JSFiddle example: http://jsfiddle.net/c6ftq/4/

inputStr = inputStr.replaceAll("(?<!</?)\\w++(?!\\s*>)","<word>$0</word>");
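To see what the regular expression above does, here is a minimal sketch in Java. The class name WordTagger and the sample input are made up for illustration. It assumes the markup contains only simple tags without attributes; with something like <div class="x"> the pattern would also wrap parts of the tag itself, so an HTML parser is the safer choice for real pages.

```java
final class WordTagger {
    // The pattern from the answer above: a run of word characters that is
    // not directly preceded by "<" or "</" and not followed by optional
    // whitespace and ">". The possessive "++" stops the engine from
    // matching only part of a tag name.
    private static final String WORD = "(?<!</?)\\w++(?!\\s*>)";

    static String wrapWords(String html) {
        return html.replaceAll(WORD, "<word>$0</word>");
    }

    public static void main(String[] args) {
        // Hypothetical sample input with attribute-free tags only.
        System.out.println(wrapWords("<p>Hello every one, cheers</p>"));
    }
}
```

Punctuation such as the comma stays outside the word tags because \w does not match it.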

You can try the following code:
import java.util.StringTokenizer;

public class MyTag {
    static String startWordTag = "<Word>";
    static String endWordTag = "</Word>";
    static String space = " ";
    static String myText = "Hello how are you ";

    public static void main(String[] args) {
        // Split the text on spaces and wrap each token in a word tag.
        StringTokenizer st = new StringTokenizer(myText, " ");
        StringBuilder sb = new StringBuilder();
        while (st.hasMoreTokens()) {
            sb.append(startWordTag);
            sb.append(st.nextToken());
            sb.append(endWordTag);
            sb.append(space);
        }
        System.out.println("Result:" + sb.toString());
    }
}

Related

String Manipulation in java 1.6

The string can look like any of the examples below. I am using Java 1.6.
String example = "<number>;<name-value>;<name-value>";
String abc = "+17005554141;qwq=1234;ddd=ewew;otg=383";
String abc = "+17005554141;qwq=123454";
String abc = "+17005554141";
I want to remove qwq=1234 from the string if it is present. The key qwq is fixed, but its value can vary, e.g. 1234 or 12345.
Expected result:
String abc = "+17005554141;ddd=ewew;otg=383";
String abc = "+17005554141"; // removed ;qwq=123454
String abc = "+17005554141";
I tried
abc = abc.replaceAll(";qwq=.*;", "");
but it does not work.

I came up with qwq=\d*;? and it works when another field follows. It matches zero or more digits after qwq=, and the ; is optional because your examples show it is not always present after the number. Note that when qwq is the last field, the ; before it is left in place.
I know the question is not about JavaScript, but here is an example where you can see the regex working:
const regex = /qwq=\d*;?/g;
var items = ["+17005554141;qwq=123454",
             "+17005554141",
             "+17005554141;qwq=1234;ddd=ewew;otg=383"];
for (let i = 0; i < items.length; i++) {
    console.log("Item before replace: " + items[i]);
    console.log("Item after replace: " + items[i].replace(regex, "") + "\n\n");
}

You can use a regex to remove that kind of substring, like this:
String example = "+17005554141;qwq=1234;ddd=ewew;otg=383";
System.out.println("Before: " + example);
System.out.println("After: " + example.replaceAll("qwq=\\d+;?", ""));
This gives the following output:
Before: +17005554141;qwq=1234;ddd=ewew;otg=383
After: +17005554141;ddd=ewew;otg=383

.* matches any characters, not just digits, and it is greedy, so it runs on to the last ; in the string. Use a pattern that matches only digits (\\d+ means one or more digits):
abc = abc.replaceAll(";qwq=\\d+", "");
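As a quick check of the digits-only pattern against the three inputs from the question, here is a small Java sketch. The class and method names (QwqRemover, removeQwq) are made up for illustration, and it assumes the qwq field is never the first field, so it always has a ; in front of it.

```java
final class QwqRemover {
    // Removes ";qwq=<digits>" wherever it occurs. Because the leading ';'
    // is part of the match, no stray separator is left behind.
    static String removeQwq(String input) {
        return input.replaceAll(";qwq=\\d+", "");
    }

    public static void main(String[] args) {
        String[] samples = {
            "+17005554141;qwq=1234;ddd=ewew;otg=383",
            "+17005554141;qwq=123454",
            "+17005554141"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + removeQwq(s));
        }
    }
}
```

Matching the separator in front of the field rather than behind it is what makes the "last field" case work.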

Please try:
abc = abc.replaceAll("qwq=[0-9]*;", "");

If convenience is not a priority, you can do this with plain String operations (indexOf, replace and substring). This is probably the most old-fashioned way to do it:
private static String replaceQWQ(String target) {
    int start = target.indexOf("qwq=");
    if (start != -1) {
        int end = target.indexOf(';', start);
        if (end != -1) {
            // qwq is followed by another field: cut "qwq=...;" out
            target = target.replace(target.substring(start, end + 1), "");
        } else {
            // qwq is the last field: also drop the ';' before it
            target = target.substring(0, start - 1);
        }
    }
    return target;
}
Small test:
String abc = "+17005554141;qwq=1234;ddd=ewew;otg=383";
String def = "+17005554141;qwq=1234";
System.out.println(replaceQWQ(abc));
System.out.println(replaceQWQ(def));
outputs:
+17005554141;ddd=ewew;otg=383
+17005554141

Another one:
abc = abc.replaceAll(";qwq=[^;]*;", ";");

You can use capturing groups with the replaceAll method.
Here is an example:
abc = abc.replaceAll("(.*;)(qwq=\\d*;)(.*)", "$1$3");
You can find more about groups at: http://www.vogella.com/tutorials/JavaRegularExpressions/article.html
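A different technique from the regular expressions above is to split the string on ; and rebuild it without the unwanted field. The following is only a sketch: the names FieldRemover and removeField are hypothetical, it sticks to constructs available in Java 1.6, and it assumes fields are always separated by a single ; with no empty fields.

```java
final class FieldRemover {
    // Rebuilds the ';'-separated string, skipping every "key=value" field
    // whose key equals the given one.
    static String removeField(String input, String key) {
        StringBuilder out = new StringBuilder();
        for (String part : input.split(";")) {
            if (part.startsWith(key + "=")) {
                continue; // drop the unwanted field
            }
            if (out.length() > 0) {
                out.append(';');
            }
            out.append(part);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeField("+17005554141;qwq=1234;ddd=ewew;otg=383", "qwq"));
        System.out.println(removeField("+17005554141;qwq=123454", "qwq"));
    }
}
```

This handles the field in the middle and at the end in the same way, and the value does not have to be numeric.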

Replace the words in String without using String replace

Is there a way to replace words in a string without using String.replace?
As you can see, the code below is hard-coded. Is there a way to do it dynamically? I have heard that some library can do this, but I am not sure.
Could anyone suggest a solution? Thank you.
for (int i = 0; i < results.size(); ++i) {
    // To remove the unwanted words in the query
    test = results.toString();
    String testresults = test.replace("numFound=2,start=0,docs=[","");
    testresults = testresults.replace("numFound=1,start=0,docs=[","");
    testresults = testresults.replace("{","");
    testresults = testresults.replace("SolrDocument","");
    testresults = testresults.replace("numFound=4,start=0,docs=[","");
    testresults = testresults.replace("SolrDocument{", "");
    testresults = testresults.replace("content=[", "");
    testresults = testresults.replace("id=", "");
    testresults = testresults.replace("]}]}", "");
    testresults = testresults.replace("]}", "");
    testresults = testresults.replace("}", "");
}

In this case you will need to learn regular expressions and use the built-in String.replaceAll() method to capture all the unwanted words.
For example:
test = test.replaceAll("SolrDocument|id=|content=\\[", "");
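To make the numFound part dynamic as well, the alternation can be extended with \d+ in place of the hard-coded counts. This is a sketch only: the class name SolrTextCleaner and the sample string are invented to resemble the fragments in the question, not taken from real Solr output.

```java
final class SolrTextCleaner {
    // One alternation covers every unwanted fragment. \\d+ accepts any
    // numFound/start value, so the counts no longer have to be hard-coded.
    private static final String UNWANTED =
            "numFound=\\d+,start=\\d+,docs=\\[|SolrDocument|content=\\[|id=|[{}\\]]";

    static String clean(String raw) {
        return raw.replaceAll(UNWANTED, "");
    }

    public static void main(String[] args) {
        // Made-up input shaped like the strings in the question.
        String raw = "{numFound=2,start=0,docs=[SolrDocument{id=1, content=[hello]}, "
                + "SolrDocument{id=2, content=[world]}]}";
        System.out.println(clean(raw)); // prints "1, hello, 2, world"
    }
}
```

If the data comes from a Solr client library, reading the fields from the result objects directly is likely cleaner than post-processing toString() output.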

Simply create a custom replace method that uses String.replace() internally:
public static String customReplace(String inputString, String replaceWith, String... stringsToReplace) {
    // Nothing to do for an empty input or an empty list of targets.
    if (inputString.isEmpty() || stringsToReplace.length == 0) { return inputString; }
    // Replace every occurrence of each unwanted string in turn.
    for (String unwanted : stringsToReplace) {
        inputString = inputString.replace(unwanted, replaceWith);
    }
    return inputString;
}
Because stringsToReplace is a varargs parameter, you can pass as many strings to replace as you like as separate, comma-separated arguments. They will all be replaced with what you supply for the replaceWith parameter.
Here is an example of how it can be used:
String test = "This is a string which contains numFound=2,start=0,docs=[ crap and it may also "
        + "have numFound=1,start=0,docs=[ junk in it along with open curly bracket { and "
        + "the SolrDocument word which might also have ]}]} other crap in there too.";
String testResult = customReplace(test, "", "numFound=2,start=0,docs=[ ", "numFound=1,start=0,docs=[ ",
        "{ ", "SolrDocument ", "]}]} ");
System.out.println(testResult);
You can also pass a single String Array which contains all your unwanted strings within its elements and pass that array to the stringsToReplace parameter, for example:
String test = "This is a string which contains numFound=2,start=0,docs=[ crap and it may also "
+ "have numFound=1,start=0,docs=[ junk in it along with open curly bracket { and "
+ "the SolrDocument word which might also have ]}]} other crap in there too.";
String[] unwantedStrings = {"numFound=2,start=0,docs=[ ", "numFound=1,start=0,docs=[ ",
"{ ", "SolrDocument ", "]}]} "};
String testResult = customReplace(test, "", unwantedStrings);
System.out.println(testResult);

Need help to form a regex in java

I want to write a regex and count its occurrences in a page's source using Java. The value I am trying to find is given in the program below.
There might be one or more spaces between tags. I have not been able to write a regex for this value. Can someone please help?
My program, which checks the regex, is given below:
String regx=""<img height=""1"" width=""1"" style=""border-style:none;"" alt="""" src=""//api.adsymptotic.com/api/s/trackconversion?_pid=12170&_psign=3841da8d95cc1dbcf27a696f27ccab0b&_aid=1376&_lbl=RT_LampsPlus_Retargeting_Pixel""/>";
WebDriver driver = new FirefoxDriver();
driver.navigate().to("abc.xom");
int count = 0, found = 0;
source = driver.getPageSource();
// Collapse runs of whitespace so the pattern only has to allow single spaces.
source = source.replaceAll("\\s+", " ").trim();
pattern = Pattern.compile(regx);
matcher = pattern.matcher(source);
while (matcher.find()) {
    count++;
    found = 1;
}
if (found == 0) {
    System.out.println("Maximiser not found");
    pixelData[rowNumber][2] = String.valueOf(count);
    pixelData[rowNumber][3] = "Fail";
} else {
    System.out.println("Maximiser is found" + count);
    pixelData[rowNumber][2] = String.valueOf(count);
    pixelData[rowNumber][3] = "Pass";
}
count = 0; found = 0;

Hard to tell without the original text and expected result, but your Pattern clearly won't compile as is.
You should single-escape double quotes (\") and double-escape regex special characters (e.g. \\?) for your code and your Pattern to compile.
Something along the lines of:
String regx="<img height=\"1\" width=\"1\" style=\"border-style:none;\" " +
"alt=\"\" src=\"//api.adsymptotic.com/api/s/trackconversion" +
"\\?_pid=12170&_psign=3841da8d95cc1dbcf27a696f27ccab0b" +
"&_aid=1376&_lbl=RT_LampsPlus_Retargeting_Pixel\"/>";
Also consider scraping markup with an appropriate framework (e.g. JSoup for HTML) instead of a regex.
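If the value to search for is a fixed string, one way to avoid escaping each special character by hand is Pattern.quote, which turns a literal into a pattern that matches it exactly. The sketch below assumes that; the class name LiteralCounter and the short sample tag are hypothetical, and the literal itself is expected to contain only single spaces because the source is whitespace-normalized first, as in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LiteralCounter {
    // Counts how often the literal text occurs in the page source after
    // collapsing every run of whitespace in the source to a single space.
    static int countLiteral(String source, String literal) {
        String normalized = source.replaceAll("\\s+", " ").trim();
        Matcher matcher = Pattern.compile(Pattern.quote(literal)).matcher(normalized);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical, shortened tracking tag; '?' needs no manual escaping.
        String tag = "<img src=\"//x.test/t?_pid=1&_aid=2\"/>";
        String page = "<p>a</p>\n  " + tag + "   \n" + tag;
        System.out.println(countLiteral(page, tag));
    }
}
```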

how to remove anchor tag and make it text

String k = <html>
<a target="_blank" href="http://www.taxmann.com/directtaxlaws/fileopencontainer.aspx?Page=CIRNO&amp;id=1999033000019320&path=/Notifications/DirectTaxLaws/HTMLFiles/S.O.193(E)30031999.htm&amp;aa=">number S.O.I93(E), dated the 30th March, 1999
</html>
I get this HTML in a String and I want to remove the anchor tag so that the link target is removed as well.
I just want to display it as text, not as a link.
How can I do this? I have tried many things without success. I am building an Android app, and the problem appears in a WebView.

Use jsoup and Jsoup.parse().

You can use the following example (I don't remember where I found it, but it works), which uses a replace method to modify the string before showing it:
k = replace(k, "<a target=\"_blank\" href=", "");

String replace(String _text, String _searchStr, String _replacementStr) {
    // Buffer that collects the result
    StringBuilder sb = new StringBuilder();
    // Find the first occurrence of the search string
    int searchStringPos = _text.indexOf(_searchStr);
    int startPos = 0;
    int searchStringLength = _searchStr.length();
    // Copy the text before each occurrence, then the replacement
    while (searchStringPos != -1) {
        sb.append(_text.substring(startPos, searchStringPos)).append(_replacementStr);
        startPos = searchStringPos + searchStringLength;
        searchStringPos = _text.indexOf(_searchStr, startPos);
    }
    // Append whatever follows the last occurrence
    sb.append(_text.substring(startPos, _text.length()));
    return sb.toString();
}
To replace the whole opening tag with an empty string:
k = replace ( k, "<a target=\"_blank\" href=\"http://www.taxmann.com/directtaxlaws/fileopencontainer.aspx?Page=CIRNO&id=1999033000019320&path=/Notifications/DirectTaxLaws/HTMLFiles/S.O.193(E)30031999.htm&aa=\">", "");
No escape is needed for slash.
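When the exact href is not known in advance, a pattern can remove any opening or closing anchor tag while keeping the text between them. The following is a minimal sketch under the assumption that no attribute value contains a > character; the class name AnchorStripper and the sample link are made up. For arbitrary HTML a parser is more robust than a regex.

```java
final class AnchorStripper {
    // Matches an opening <a ...> tag or a closing </a> tag, ignoring case.
    // \\b keeps tags such as <abbr> from being matched.
    private static final String ANCHOR_TAG = "(?i)</?a\\b[^>]*>";

    static String stripAnchors(String html) {
        return html.replaceAll(ANCHOR_TAG, "");
    }

    public static void main(String[] args) {
        // Hypothetical link; only the tags are removed, the text is kept.
        String k = "<html><a target=\"_blank\" href=\"http://example.com/page?x=1&amp;y=2\">"
                + "dated the 30th March, 1999</a></html>";
        System.out.println(stripAnchors(k));
    }
}
```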

Extracting contents from HTML represented as a String

I have a large HTML document in a String variable and I want to get the contents of a div. I cannot rely on regular expressions because it can have nested divs. So, suppose I have the following String:
String test = "<div><div id=\"mainContent\">foo bar<div>good best better</div> <div>test test</div></div><div>foo bar</div></div>";
How can I get this with a simple Java program?
<div id="mainContent">foo bar<div>good best better</div> <div>test test</div></div>
My approach so far is something like this (it may be horrible; I am still trying to fix it):
public static void main(String[] args) {
    int count = 1;
    int fl = 0;
    String s = "<div><div id=\"mainContent\">foo bar<div>good best better</div> <div>test test</div></div><div>foo bar</div></div>";
    String tmp = s;
    int len = s.length();
    for (int i = 0; i < len; i++) {
        int st = s.indexOf("div>");
        if (st > -1) {
            char c = s.charAt(st - 1);
            if (c == '/') {
                count--;
            } else {
                count++;
            }
            s = s.substring(st + 4);
            System.out.println(s);
            i = i + st;
            System.out.println(c + " -- " + st + " -- " + count + " -- " + i);
            if (count == 0) {
                fl = i;
                break;
            }
        }
    }
    System.out.println("final ind - " + fl);
    s = tmp.substring(0, fl + 4);
    System.out.println("final String - " + s);
}

I would recommend using JSoup to parse the HTML and find what you are looking for.
It fulfills the simple requirement for sure. You can do what you want in just a couple of lines of code!
jsoup is a Java library for working with real-world HTML. It provides
a very convenient API for extracting and manipulating data, using the
best of DOM, CSS, and jquery-like methods.
jsoup implements the WHATWG HTML5 specification, and parses HTML to
the same DOM as modern browsers do.
scrape and parse HTML from a URL, file, or string
find and extract data, using DOM traversal or CSS selectors
jsoup is designed to deal with all varieties of HTML found in the
wild; from pristine and validating, to invalid tag-soup; jsoup will
create a sensible parse tree.
Using the selector syntax makes finding and extracting data extremely simple.
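import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public static void main(final String[] args) {
    final String s = "<div><div id=\"mainContent\">foo bar<div>good best better</div> <div>test test</div></div><div>foo bar</div></div>";
    // Parse the string into a DOM and select the element by its id.
    final Document d = Jsoup.parse(s);
    final Elements e = d.select("#mainContent");
    System.out.println(e.get(0));
}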
outputs
<div id="mainContent">
foo bar
<div>
good best better
</div>
<div>
test test
</div>
</div>
Doesn't get much more simple than that!

I'm afraid the answer is: you don't. At least not with a "simple" program...
But there is hope: you can use an HTML parser library (like NekoHTML or HTMLParser, although the latter project seems to be dead) to parse the string and retrieve the part you need.
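For input as regular as the string in the question, the depth-counting idea from the question can be made to work without a library. The sketch below is not a general HTML parser: the names DivExtractor and extractDiv are hypothetical, and it assumes well-formed, lowercase div tags with no "<div" text inside comments, scripts, attribute values, or other tag names.

```java
final class DivExtractor {
    // Returns the outer HTML of the first div that starts with openTag,
    // or null if it is missing or its tags are not balanced.
    static String extractDiv(String html, String openTag) {
        int start = html.indexOf(openTag);
        if (start < 0) {
            return null;
        }
        int depth = 0;
        int pos = start;
        while (pos < html.length()) {
            int nextOpen = html.indexOf("<div", pos);
            int nextClose = html.indexOf("</div>", pos);
            if (nextClose < 0) {
                return null; // ran out of closing tags: unbalanced input
            }
            if (nextOpen >= 0 && nextOpen < nextClose) {
                depth++;                      // entering a (nested) div
                pos = nextOpen + "<div".length();
            } else {
                depth--;                      // leaving a div
                pos = nextClose + "</div>".length();
                if (depth == 0) {
                    return html.substring(start, pos);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String s = "<div><div id=\"mainContent\">foo bar<div>good best better</div> "
                + "<div>test test</div></div><div>foo bar</div></div>";
        System.out.println(extractDiv(s, "<div id=\"mainContent\""));
    }
}
```

Anything less regular than this (unclosed tags, mixed case, divs inside scripts) is where a real parser earns its keep.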
