I have the following XML SAX handler:
private class GetXML_Handler extends DefaultHandler {
    int x = 0;
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        Log.i("DataHandler", "Start of XML element");
        int y = 0;
        if (qName.equals("polygon")) {
            locations.add(x, new ArrayList<location>());
            String coordinates = attributes.getValue("coordinates");
            String[] parts = coordinates.split(",");
            System.out.println("Cyklus zacaty");
            locations.get(x).add(y, new location(Double.parseDouble(parts[0]), Double.parseDouble(parts[1])));
            for (int i = 2; i <= parts.length; i = i + 2) {
                y++;
                double Latitude = Double.parseDouble(parts[i].substring(2));
                double Longitude = Double.parseDouble(parts[i + 1]);
                locations.get(x).add(y, new location(Latitude, Longitude));
            }
            System.out.println("cyklus skonceny");
            x++;
        }
    }
}
However, "cyklus zacaty" (Slovak for "cycle started") never gets printed. It prints fine if I move it before String[] parts = coordinates.split(","). Those strings are really big (around 350 GPS coordinates), so is it possible that Java simply can't handle it and stops (with no exception)?
Also, my "Start of XML element" message gets printed only 5 times (up to the first coordinates), but if I remove this split call it prints 28 times (the number of my XML elements). I am sure that my XML handler works correctly (it's just something about those Strings).
The structure of the XML is like this:
<?xml version="1.0" encoding="UTF-8"?>
<oblasti>
<oblast>
<nazovOblasti>VT</nazovOblasti>
<polygon>
<coordinates>
132456,4658789,0 56487,4864684
</coordinates>
</polygon>
....
Any suggestions?
Thanks in advance.
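The Attributes parameter does not contain the child elements of the node, only the attributes assigned to the node (if any). Hence, attributes.getValue("coordinates") is not doing what you expect: <polygon> has no coordinates attribute, so getValue returns null and coordinates.split(",") throws a NullPointerException. That exception propagates out of startElement and stops the parse, which is why neither "Cyklus zacaty" nor the remaining "Start of XML element" messages are printed. The text inside <coordinates> is delivered through the characters() callback instead.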
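A minimal sketch of the usual fix, assuming the XML structure shown in the question: note when a <coordinates> element starts, collect its text in characters(), and parse it in endElement(). The class name PolygonHandler and the parse helper are hypothetical, double[] pairs stand in for the question's location class, and the coordinate format (whitespace-separated tuples such as 132456,4658789,0, of which only the first two numbers are kept) is an assumption read off the sample.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class PolygonHandler extends DefaultHandler {
    // One list of {first, second} number pairs per <polygon>
    final List<List<double[]>> polygons = new ArrayList<>();
    // Non-null only while the parser is inside a <coordinates> element
    private StringBuilder text;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equals("polygon")) {
            polygons.add(new ArrayList<>());
        } else if (qName.equals("coordinates")) {
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // One text node may arrive in several chunks, so accumulate them
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (!qName.equals("coordinates")) {
            return;
        }
        if (text != null && !polygons.isEmpty()) {
            List<double[]> current = polygons.get(polygons.size() - 1);
            // Assumed format: whitespace-separated tuples like "a,b" or "a,b,altitude"
            for (String tuple : text.toString().trim().split("\\s+")) {
                if (tuple.isEmpty()) continue;
                String[] v = tuple.split(",");
                current.add(new double[] {Double.parseDouble(v[0]), Double.parseDouble(v[1])});
            }
        }
        text = null;  // stop collecting text until the next <coordinates>
    }

    // Convenience wrapper: parse an XML string and return the collected polygons
    static List<List<double[]>> parse(String xml) {
        try {
            PolygonHandler handler = new PolygonHandler();
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.polygons;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

The text is parsed in endElement() rather than in characters() because the parser is free to split one text node across several characters() calls.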
I think your problem is not with the split method, but with this line:
locations.get(x).add(y, new location(Double.parseDouble(parts[0]), Double.parseDouble(parts[1])));
and the reason is that the string contains whitespace, which stays in the pieces after the split, so they can't be parsed.
Try this just after the split and before anything else:
for (int i = 0; i < parts.length; i++) {
    parts[i] = parts[i].trim();
}
When an exception occurs, it sometimes stops the thread before the last line is completely executed, especially when that line writes to the console. It has driven me mad too many times. This may be the reason your println output doesn't show.
I need to truncate an HTML string that was already sanitized by my app before being stored in the DB and contains only links, images and formatting tags. But when it is presented to users, it needs to be truncated to give an overview of the content.
So I need to abbreviate HTML strings in Java such that
<img src="http://d2qxdzx5iw7vis.cloudfront.net/34775606.jpg" />
<br/><a href="http://d2qxdzx5iw7vis.cloudfront.net/34775606.jpg" />
when truncated does not return something like this
<img src="http://d2qxdzx5iw7vis.cloudfront.net/34775606.jpg" />
<br/><a href="htt
but instead returns
<img src="http://d2qxdzx5iw7vis.cloudfront.net/34775606.jpg" />
<br/>
Your requirements are a bit vague, even after reading all the comments. Given your example and explanations, I assume your requirements are the following:
The input is a string consisting of (x)html tags. Your example doesn't contain this, but I assume the input can contain text between the tags.
In the context of your problem, we do not care about nesting. So the input is really only text intermingled with tags, where opening, closing and self-closing tags are all considered equivalent.
Tags can contain quoted values.
You want to truncate your string such that the string is not truncated in the middle of a tag. So in the truncated string every '<' character must have a corresponding '>' character.
I'll give you two solutions: a simple one, which may not be correct depending on what exactly the input looks like, and a more complex one, which is correct.
First solution
For the first solution, we first find the last '>' character before the truncate size (this corresponds to the last tag which was completely closed). After this character may come text which does not belong to any tag, so we then search for the first '<' character after the last closed tag. In code:
public static String truncate1(String input, int size)
{
    if (input.length() <= size) return input;
    // last '>' that still lies inside the kept prefix [0, size)
    int pos = input.lastIndexOf('>', size - 1);
    // first '<' after that '>', i.e. the start of the next (possibly cut) tag
    int pos2 = input.indexOf('<', pos);
    if (pos2 < 0 || pos2 >= size) {
        return input.substring(0, size);
    }
    else {
        return input.substring(0, pos2);
    }
}
Of course this solution does not consider the quoted value strings: the '<' and '>' characters might occur inside a string, in which case they should be ignored. I mention the solution anyway because you say your input is sanitized, so possibly you can ensure that the quoted strings never contain '<' and '>' characters.
Second solution
To consider the quoted strings, we cannot rely on standard Java classes anymore, but we have to scan the input ourselves and remember if we are currently inside a tag and inside a string or not. If we encounter a '<' character outside of a string, we remember its position, so that when we reach the truncate point we know the position of the last opened tag. If that tag wasn't closed, we truncate before the beginning of that tag. In code:
public static String truncate2(String input, int size)
{
    if (input.length() < size) return input;
    int lastTagStart = 0;
    boolean inString = false;
    boolean inTag = false;
    for (int pos = 0; pos < size; pos++) {
        switch (input.charAt(pos)) {
            case '<':
                if (!inString && !inTag) {
                    lastTagStart = pos;
                    inTag = true;
                }
                break;
            case '>':
                if (!inString) inTag = false;
                break;
            case '\"':
                if (inTag) inString = !inString;
                break;
        }
    }
    if (!inTag) lastTagStart = size;
    return input.substring(0, lastTagStart);
}
A robust way of doing it is to use the hotsax code, which parses HTML and lets you interface with the parser using the traditional low-level SAX XML API. (Note: it is not an XML parser; it parses poorly formed HTML and only chooses to let you interface with it using a standard XML API.)
Here on GitHub I have created a quick-and-dirty working example project, which has a main class that parses your truncated example string:
XMLReader parser = XMLReaderFactory.createXMLReader("hotsax.html.sax.SaxParser");
final StringBuilder builder = new StringBuilder();
ContentHandler handler = new DoNothingContentHandler() {
    StringBuilder wholeTag = new StringBuilder();
    boolean hasText = false;
    boolean hasElements = false;
    String lastStart = "";
    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        String text = (new String(ch, start, length)).trim();
        wholeTag.append(text);
        hasText = true;
    }
    @Override
    public void endElement(String namespaceURI, String localName,
            String qName) throws SAXException {
        if (!hasText && !hasElements && lastStart.equals(localName)) {
            builder.append("<" + localName + "/>");
        } else {
            wholeTag.append("</" + localName + ">");
            builder.append(wholeTag.toString());
        }
        wholeTag = new StringBuilder();
        hasText = false;
        hasElements = false;
    }
    @Override
    public void startElement(String namespaceURI, String localName,
            String qName, Attributes atts) throws SAXException {
        wholeTag.append("<" + localName);
        for (int i = 0; i < atts.getLength(); i++) {
            wholeTag.append(" " + atts.getQName(i) + "='" + atts.getValue(i) + "'");
            hasElements = true;
        }
        wholeTag.append(">");
        lastStart = localName;
        hasText = false;
    }
};
parser.setContentHandler(handler);
//parser.parse(new InputSource(new StringReader("<div>this is the <em>end</em> my <br> friend some link")));
parser.parse(new InputSource(new StringReader("<img src=\"http://d2qxdzx5iw7vis.cloudfront.net/34775606.jpg\" />\n<br/><a href=\"htt")));
System.out.println(builder.toString());
It outputs:
<img src='http://d2qxdzx5iw7vis.cloudfront.net/34775606.jpg'></img><br/>
It is adding an </img> tag, but that's harmless for HTML, and it would be possible to tweak the code so that the output matches the input exactly if you felt that was necessary.
Hotsax is actually code generated with the yacc/flex compiler tools from the HtmlParser.y and StyleLexer.flex files, which define the low-level grammar of HTML. So you benefit from the work of the person who created that grammar; all you need to do is write some fairly trivial code and test cases to reassemble the parsed fragments as shown above. That's much better than trying to write your own regular expressions, or worse, a hand-coded string scanner to interpret the string, as that is very fragile.
Now that I understand what you want, here is the simplest solution I could come up with.
Just work backwards from the end of your substring until you find '>'. This is the end mark of the last tag, so in the majority of cases you can be sure that you only have complete tags.
But what if the > is inside text?
Well, to be sure about this, keep searching backwards until you find < and make sure it is part of a tag (do you know the tag names, for instance? Since you only have links, images and formatting, you can easily check this). If you find another > before finding a < that starts a tag, that > is the new end of your string.
Easy to do, correct and should work for you.
If you are not certain whether strings or attributes can contain < or >, you need to check for occurrences of " and =" to find out whether you are inside a string or not (remember that you can cut off attribute values). But I think this is overengineering. I have never found an attribute with < or > in it, and within text they are usually escaped as &lt; and &gt;.
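The backward scan described above could look roughly like this. It is a sketch, not the answerer's code: the class name and the KNOWN_TAGS whitelist are hypothetical, and, as described, the result always ends right after a complete tag, so any text that follows the last tag inside the limit is dropped.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class BackwardTagTruncator {
    // Hypothetical whitelist: the tags the sanitizer is assumed to allow
    private static final Set<String> KNOWN_TAGS =
            new HashSet<>(Arrays.asList("a", "img", "br", "b", "i", "em", "strong", "p"));

    static String truncate(String html, int size) {
        if (html.length() <= size) return html;
        String head = html.substring(0, size);
        // Candidate end of the last complete tag
        int end = head.lastIndexOf('>');
        while (end >= 0) {
            int lt = head.lastIndexOf('<', end);      // nearest '<' to the left
            int gt = head.lastIndexOf('>', end - 1);  // nearest earlier '>'
            if (lt > gt && startsKnownTag(head, lt)) {
                return head.substring(0, end + 1);    // cut right after that tag
            }
            // This '>' was not the end of a known tag (e.g. plain text): try the previous one
            end = gt;
        }
        return "";  // no complete tag within the limit
    }

    // True if the '<' at index lt begins a whitelisted opening or closing tag
    private static boolean startsKnownTag(String s, int lt) {
        int i = lt + 1;
        if (i < s.length() && s.charAt(i) == '/') i++;  // closing tag such as </b>
        int nameStart = i;
        while (i < s.length() && Character.isLetterOrDigit(s.charAt(i))) i++;
        return KNOWN_TAGS.contains(s.substring(nameStart, i).toLowerCase());
    }
}
```

For example, with a limit of 14, "<b>x</b> 3 > 2 and more" is cut to "<b>x</b>": the '>' in "3 > 2" is skipped because another '>' appears before any '<' that starts a known tag.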
I don't know the context of the problem the OP needs to solve, but I am not sure if it makes a lot of sense to truncate html code by the length of its source code instead of the length of its visual representation (which can become arbitrarily complex, of course).
Maybe a combined solution could be useful, so that you don't penalize HTML code with a lot of markup or long links, but still set a clear total limit that cannot be exceeded. As others have already written, using a dedicated HTML parser like JSoup allows processing HTML that is not well-formed or is even invalid.
The solution is loosely based on JSoup's Cleaner. It traverses the parsed DOM tree of the source code and tries to recreate a destination tree, continuously checking whether a limit has been reached.
import org.jsoup.nodes.*;
import org.jsoup.parser.*;
import org.jsoup.select.*;
String html = "<img src=\"http://d2qxdzx5iw7vis.cloudfront.net/34775606.jpg\" />" +
        "<br/><a href=\"http://d2qxdzx5iw7vis.cloudfront.net/34775606.jpg\" />";
//String html = "<b>foo</b>bar<p class=\"baz\">Some <img />Long Text</p><a href='#'>hello</a>";
Document srcDoc = Parser.parseBodyFragment(html, "");
srcDoc.outputSettings().prettyPrint(false);
Document dstDoc = Document.createShell(srcDoc.baseUri());
dstDoc.outputSettings().prettyPrint(false);
Element dst = dstDoc.body();
NodeVisitor v = new NodeVisitor() {
    private static final int MAX_HTML_LEN = 85;
    private static final int MAX_TEXT_LEN = 40;
    Element cur = dst;
    boolean stop = false;
    int resTextLength = 0;
    @Override
    public void head(Node node, int depth) {
        // ignore "body" element
        if (depth > 0) {
            if (node instanceof Element) {
                Element curElement = (Element) node;
                cur = cur.appendElement(curElement.tagName());
                cur.attributes().addAll(curElement.attributes());
                String resHtml = dst.html();
                if (resHtml.length() > MAX_HTML_LEN) {
                    cur.remove();
                    throw new IllegalStateException("html too long");
                }
            } else if (node instanceof TextNode) {
                String curText = ((TextNode) node).getWholeText();
                String resHtml = dst.html();
                if (curText.length() + resHtml.length() > MAX_HTML_LEN) {
                    cur.appendText(curText.substring(0, MAX_HTML_LEN - resHtml.length()));
                    throw new IllegalStateException("html too long");
                } else if (curText.length() + resTextLength > MAX_TEXT_LEN) {
                    cur.appendText(curText.substring(0, MAX_TEXT_LEN - resTextLength));
                    throw new IllegalStateException("text too long");
                } else {
                    resTextLength += curText.length();
                    cur.appendText(curText);
                }
            }
        }
    }
    @Override
    public void tail(Node node, int depth) {
        if (depth > 0 && node instanceof Element) {
            cur = cur.parent();
        }
    }
};
try {
    NodeTraversor t = new NodeTraversor(v);
    t.traverse(srcDoc.body());
} catch (IllegalStateException ex) {
    System.out.println(ex.getMessage());
}
System.out.println(" in='" + srcDoc.body().html() + "'");
System.out.println("out='" + dst.html() + "'");
For the given example with max length of 85, the result is:
html too long
in='<img src="http://d2qxdzx5iw7vis.cloudfront.net/34775606.jpg"><br>'
out='<img src="http://d2qxdzx5iw7vis.cloudfront.net/34775606.jpg"><br>'
It also correctly truncates within nested elements; for a maximum html length of 16 the result is:
html too long
in='<i>f<b>oo</b>b</i>ar'
out='<i>f<b>o</b></i>'
For a maximum text length of 2, the result would be:
text too long
in='<b>foo</b>bar'
out='<b>fo</b>'
You can achieve this with the HTML parser library jsoup, which you can download from the jsoup website.
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
public class HTMLParser
{
    public static void main(String[] args)
    {
        String html = "<img src=\"http://d2qxdzx5iw7vis.cloudfront.net/34775606.jpg\" /><br/><a href=\"http://d2qxdzx5iw7vis.cloudfront.net/34775606.jpg\" /><img src=\"http://d2qxdzx5iw7vis.cloudfront.net/34775606.jpg\" /><br/><a href=\"http://d2qxdzx5iw7vis.cloudfront.net/34775606.jpg\" />";
        Document doc = Jsoup.parse(html);
        doc.select("a").remove();
        System.out.println(doc.body().children());
    }
}
Whatever you want to do, there are two libraries I tend to use: jSoup and HtmlParser. Please check them out. Also, I barely see XHTML in the wild anymore; nowadays it's more about HTML5 (whose XML syntax is rarely used).
[Update]
I mention JSoup and HtmlParser because they are fault tolerant in the way a browser is. Please check whether they suit you; they are very good at dealing with malformed and damaged HTML text. Create a DOM out of your HTML and write it back to a string, and you should get rid of the damaged tags. You can also filter the DOM yourself and remove even more content if you have to.
PS: I guess the XML decade is finally (and gladly) over. Today JSON is going to be overused.
A third potential solution I would consider is not to work with strings in the first place.
If I remember correctly, there are DOM tree representations that work closely with the underlying string representation, so they are character-exact. I wrote one myself, and I think jSoup has such a mode. Since there are a lot of parsers out there, you should be able to find one that does.
With such a parser you can easily see which tag runs from which string position to which. Those parsers actually keep the document as a single String and only store range information, such as start and stop positions within the document, which avoids duplicating that information for nested nodes.
Therefore you can find the outermost node for a given position, know exactly where it starts and ends, and easily decide whether this tag (including all its children) can be presented within your snippet. That way you can print complete text nodes and the like without the risk of presenting only partial tag information or headline text.
If you do not find a parser that suits you, you can ask me for advice.
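The range-based idea can be approximated without a full DOM parser. The sketch below is a hand-rolled stand-in, not jSoup or any real parser: it scans the string once, tracks the nesting depth, remembers the end offset of every complete top-level node, and cuts at the last such boundary that fits. It assumes well-formed, sanitized input, no '>' inside attribute values, and a hypothetical list of void tags (br, img, hr); the class name is illustrative.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class RangeTruncator {
    // Assumption: void elements that never get a closing tag in the sanitized input
    private static final Set<String> VOID_TAGS = new HashSet<>(Arrays.asList("br", "img", "hr"));

    /** Longest prefix of at most size chars that ends on a top-level node boundary. */
    static String truncate(String html, int size) {
        if (html.length() <= size) return html;
        int depth = 0;         // current nesting depth
        int lastBoundary = 0;  // end offset of the last complete top-level node
        int pos = 0;
        while (pos < size) {
            if (html.charAt(pos) != '<') {
                // Text run up to the next '<' (or the end of the input)
                int next = html.indexOf('<', pos);
                int textEnd = next < 0 ? html.length() : next;
                if (depth == 0) {
                    // Top-level text may be cut at any character
                    lastBoundary = Math.min(textEnd, size);
                }
                pos = textEnd;
                continue;
            }
            int gt = html.indexOf('>', pos);  // assumes no '>' inside attribute values
            if (gt < 0 || gt >= size) break;  // this tag does not fit completely
            String inner = html.substring(pos + 1, gt).trim();
            if (inner.startsWith("/")) {
                depth--;                       // closing tag
            } else if (!inner.endsWith("/") && !VOID_TAGS.contains(tagName(inner))) {
                depth++;                       // opening tag of a non-void element
            }
            pos = gt + 1;
            if (depth <= 0) {
                depth = 0;                     // tolerate stray closing tags
                lastBoundary = pos;            // a top-level node just ended here
            }
        }
        return html.substring(0, lastBoundary);
    }

    // Lower-cased tag name, e.g. "a" for: a href="x"
    private static String tagName(String inner) {
        int i = 0;
        while (i < inner.length() && Character.isLetterOrDigit(inner.charAt(i))) i++;
        return inner.substring(0, i).toLowerCase();
    }
}
```

Unlike a pure character scan, a nested element is kept or dropped as a whole: with a limit of 20, "<b>foo</b><i>bar <b>baz</b></i>" becomes "<b>foo</b>" rather than a prefix with an unclosed <i>.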
for (int j = 1; j < fileArray.size(); j++) {
    if (str.contains(fileArray.get(end + j))) {
    }
}
(assume end is some number such as 30).
The goal of this part is, given a window length of 30 and a fileArray size > 30, to check whether there's anything after index 30 that matches whatever is inside the window.
Example: "i like to eat piesss aaaabbbbpiesssbbbb"
Starting from the beginning of the string, I add the first 17 characters to an ArrayList called window. Then I check the rest of the string, starting right after the window, to see if there's anything that matches. The space doesn't match, so it is added to the output. Keep checking, and then you see that "piesss" matches. Then I replace the second "piesss" with wherever the first "piesss" occurs.
So right now I'm using fileArray.get(end+j) to check whether there's anything that matches within my string (str), except this doesn't really work. Is there a way I could fix this code segment?
The replacement part of your question is still unclear, as is the reason for using an ArrayList. I've written some code that does a 5-character window search for a match after splitting the string you provided. Note how with the 30 and 17 values you gave, nothing is ever matched (see the commented-out code). However, with tweaked values some matches can be found.
public static void main(String[] args) {
    //          1         2         3
    //012345678901234567890123456789012345678 <- shows the index
    String test = "i like to eat piesss aaaabbbbpiesssbbbb";
    // int first = 17;
    // int end = 30;
    int first = 20;
    int end = 37;
    String firstHalf = test.substring(0, first);
    String secondHalf = test.substring(first, end);
    int matchSize = 5;
    // "<=" so that the last full window of secondHalf is also checked
    for (int i = 0; i + matchSize <= secondHalf.length(); i++)
    {
        String window = secondHalf.substring(i, i + matchSize);
        if (firstHalf.contains(window))
        {
            System.out.println(window);
        }
    }
    System.out.println("Done searching.");
}
Displays:
piess
iesss
Done searching.
If this isn't what you meant PLEASE edit your question to make your needs clear.
I'm very new to Java and trying to learn it.
I created an array and would like to access individual components of the array.
The first issue I am having: how do I print the array as a batch (the whole array at once), as shown below? For example, I added a line break to the last value, MyValue4, so that the printed output looks like this; surely there has to be a better way to do this?
MyValue1
MyValue2
MyValue3
MyValue4
MyValue1
MyValue2
MyValue3
MyValue4
The next thing I need to do is manipulate or replace a value with something else (for example, MyValue with MyValx) when the repeat variable is at a certain number or value.
So when the repeat variable reaches 3, change my value to something else, and then change it back when it reaches 6.
I am familiar with the Replace method, I am just not sure how to put this all together.
I am having trouble with changing just parts of the array with the while and for loop in the mix.
My Code:
public static String[] MyArray() {
    String MyValues[] = { "MyValue1", "MyValue2", "MyValue3", "MyValue4\n" };
    return MyValues;
}
public static void main(String[] args) {
    int repeat = 0;
    while (repeat < 7) {
        for (String lines : MyArray()) {
            System.out.println(lines);
        }
        repeat = repeat + 1;
        if (repeat == 7) {
            break;
        }
    }
}
Maybe use a for loop to make it shorter:
for (int i = 0; i < 7; i++) {
    for (String lines : MyArray()) {
        // Use a different value while i is 3, 4 or 5; it changes back at 6
        if (i >= 3 && i < 6) {
            lines = "MyValx";
        }
        System.out.println(lines);
    }
    System.out.println(); // blank line after each batch, so no `\n` is needed in the data
}
And by the way, variable names should start with a lowercase letter, and the values should not end with a newline (\n). So use:
String myValues[] = {"MyValue1", "MyValue2", "MyValue3", "MyValue4"};
instead of:
String MyValues[] = { "MyValue1", "MyValue2", "MyValue3", "MyValue4\n" };
and add System.out.println(); after each inner loop instead.
To change a single value, assign to it directly:
MyValues[n] = "value";
where n is the position in the array.
You may consider using System.out.println() without any argument for printing an empty line instead of inserting new-line characters in your data.
You already know the for-each loop, but consider a count-controlled loop, such as
String[] values = MyArray();
for (int i = 0; i < values.length; i++) {
    ...
}
There you can use i both for accessing your array and for deciding on further actions.
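Putting a count-controlled loop together with the requirement from the question might look like the sketch below. It assumes that "reaches 3 ... change back when it reaches 6" means passes 3, 4 and 5 use the replaced value, and that the replacement is String.replace("MyValue", "MyValx"); the names RepeatDemo, MY_VALUES and pass are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

class RepeatDemo {
    // Illustrative data; note there is no "\n" inside the values
    static final String[] MY_VALUES = {"MyValue1", "MyValue2", "MyValue3", "MyValue4"};

    /** Values printed in the given pass; passes 3, 4 and 5 use "MyValx" instead of "MyValue". */
    static List<String> pass(int repeat) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < MY_VALUES.length; i++) {
            String value = MY_VALUES[i];
            if (repeat >= 3 && repeat < 6) {
                value = value.replace("MyValue", "MyValx");
            }
            lines.add(value);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (int repeat = 0; repeat < 7; repeat++) {
            for (String line : pass(repeat)) {
                System.out.println(line);
            }
            System.out.println();  // blank line between batches
        }
    }
}
```

Keeping the original array untouched and deriving each pass from it means the values "change back" automatically once the condition stops holding.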
Replacing array items based on a number in a string might be a bit trickier. A regular expression will definitely do the job, if you are familiar with them. If not, I recommend learning them, because they will surely be useful in future situations.
A simpler approach might be using
int a = Integer.parseInt("123"); // returns 123 as integer
but that only works on strings that contain nothing but a number (positive or negative). It won't work with "abc123"; that throws a NumberFormatException.
These are some ideas you might try out and experiment with. Also, use the documentation extensively. ;-)
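If the number is embedded in a string like "MyValue3", the regular-expression route mentioned above can pull it out before calling Integer.parseInt. A minimal sketch, assuming the number is the run of digits at the end of the string and fits in an int (the class name TrailingNumber is hypothetical):

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrailingNumber {
    // One or more digits at the very end of the string
    private static final Pattern TRAILING_DIGITS = Pattern.compile("(\\d+)$");

    /** The number at the end of s (3 for "MyValue3"), or empty if s does not end in digits. */
    static OptionalInt of(String s) {
        Matcher m = TRAILING_DIGITS.matcher(s);
        // Assumes the digits fit in an int; otherwise parseInt throws NumberFormatException
        return m.find() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
    }
}
```

Returning an OptionalInt instead of throwing lets the caller simply skip values that carry no number.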
This may be a simple question, but I have been Googling for over an hour and haven't found an answer yet.
I'm trying to simply use the String.split() method with a small Android application to split an input string. The input string will be something along the lines of: "Launch ip:192.168.1.101;port:5900". I'm doing this in two iterations to ensure that all of the required parameters are there. I'm first trying to do a split on spaces and semicolons to get the individual tokens sorted out. Next, I'm trying to split on colons in order to strip off the identification tags of each piece of information.
So, for example, I would expect the first round of split to give me the following data from the above example string:
(1) Launch
(2) ip:192.168.1.101
(3) port:5900
Then the second round would give me the following:
(1) 192.168.1.101
(2) 5900
However, the following code that I wrote doesn't give me what's expected:
private String[] splitString(String inputString)
{
    String[] parsedString;
    String[] orderedString = new String[SOSLauncherConstants.SOCKET_INPUT_STRING_PARSE_VALUE];
    parsedString = inputString.trim().split("; ");
    Log.i("info", "The parsed data is as follows for the initially parsed string of size " + parsedString.length + ": ");
    for (int i = 0; i < parsedString.length; ++i)
    {
        Log.i("info", parsedString[i]);
    }
    for (int i = 0; i < parsedString.length; ++i)
    {
        if (parsedString[i].toLowerCase().contains(SOSLauncherConstants.PARSED_LAUNCH_COMMAND_VALUE))
        {
            orderedString[SOSLauncherConstants.PARSED_COMMAND_WORD] = parsedString[i];
        }
        if (parsedString[i].toLowerCase().contains("ip"))
        {
            orderedString[SOSLauncherConstants.PARSED_IP_VALUE] = parsedString[i].split(":")[1];
        }
        else if (parsedString[i].toLowerCase().contains("port"))
        {
            orderedString[SOSLauncherConstants.PARSED_PORT_VALUE] = parsedString[i].split(":")[1];
        }
        else if (parsedString[i].toLowerCase().contains("username"))
        {
            orderedString[SOSLauncherConstants.PARSED_USERNAME_VALUE] = parsedString[i].split(":")[1];
        }
        else if (parsedString[i].toLowerCase().contains("password"))
        {
            orderedString[SOSLauncherConstants.PARSED_PASSWORD_VALUE] = parsedString[i].split(":")[1];
        }
        else if (parsedString[i].toLowerCase().contains("color"))
        {
            orderedString[SOSLauncherConstants.PARSED_COLOR_VALUE] = parsedString[i].split(":")[1];
        }
    }
    Log.i("info", "The parsed data is as follows for the second parsed string of size " + orderedString.length + ": ");
    for (int i = 0; i < orderedString.length; ++i)
    {
        Log.i("info", orderedString[i]);
    }
    return orderedString;
}
For a result, I'm getting the following:
The parsed data is as follows for the parsed string of size 1:
launch ip:192.168.1.106;port:5900
The parsed data is as follows for the second parsed string of size 6:
launch ip:192.168.1.106;port:5900
192.168.1.106;port
And then, of course, it crashes because the for loop runs into a null string.
Side Note:
The following snippet is from the constants class that defines all of the string indexes --
public static final int SOCKET_INPUT_STRING_PARSE_VALUE = 6;
public static final int PARSED_COMMAND_WORD = 0;
public static final String PARSED_LAUNCH_COMMAND_VALUE = "launch";
public static final int PARSED_IP_VALUE = 1;
public static final int PARSED_PORT_VALUE = 2;
public static final int PARSED_USERNAME_VALUE = 3;
public static final int PARSED_PASSWORD_VALUE = 4;
public static final int PARSED_COLOR_VALUE = 5;
I looked into needing a possible escape (by inserting a \\ before the semicolon) on the semicolon delimiter, and even tried using it, but that didn't work. The odd part is that neither the space nor the semicolon function as a delimiter, yet the colon works on the second time around. Does anybody have any ideas what would cause this?
Thanks for your time!
EDIT: I should also add that I'm receiving the string over a WiFi socket connection. I don't think this should make a difference, but I'd like you to have all of the information that you need.
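String.split(String) takes a regex, so "; " only matches a semicolon immediately followed by a space, which never occurs in your input. Use the character class "[; ]" instead, e.g.: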
"foo;bar baz".split("[; ]")
will return an array containing "foo", "bar" and "baz".
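Applied to the question's input, both rounds of splitting might look like the sketch below. The character class "[; ]" makes either a space or a semicolon a delimiter, and split(":", 2) splits only on the first colon. The class name and the map key "command" for the launch word are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LaunchCommandParser {
    /** Parses e.g. "Launch ip:192.168.1.101;port:5900" into lower-cased keys and their values. */
    static Map<String, String> parse(String input) {
        Map<String, String> result = new LinkedHashMap<>();
        // Round 1: "[; ]" is a character class, so a space OR a semicolon splits
        for (String token : input.trim().split("[; ]")) {
            if (token.isEmpty()) continue;  // two delimiters in a row yield an empty token
            // Round 2: split "key:value" on the first colon only
            String[] kv = token.split(":", 2);
            if (kv.length == 2) {
                result.put(kv[0].toLowerCase(), kv[1]);
            } else {
                result.put("command", token.toLowerCase());  // e.g. "launch"
            }
        }
        return result;
    }
}
```

Collecting the pairs in a map by key avoids depending on the order in which ip, port and the other fields arrive.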
If you need groups of spaces to work as a single delimiter, you can use something like:
"foo;bar baz".split("(;| +)")
I believe String.split() splits on the whole sequence of characters you specify (it is treated as a single regex), not on each character individually. That is, split("; ") would not split "a;b c" at all, but would split "a; b".
You may have better luck with Guava's Splitter, which is meant to be slightly less unpredictable than java.lang.String.split.
I would write something like
Iterable<String> splits = Splitter.on(CharMatcher.anyOf("; ")).split(string);
but Splitter also provides fluent-style customization like "trim results" or "skip over empty strings."
Is there a reason why you are using String.split() rather than regular expressions directly? This is a perfect candidate for regexes, especially if the string format is consistent.
I'm not sure whether your format is fixed; if it is, the following regex should break it down for you (I am sure someone can come up with an even more elegant regex). If you have several command strings that follow, you can use a more flexible regex and loop over all the groups:
Pattern p = Pattern.compile("([\\w]*)[ ;](([\\w]*):([^ ;]*))*");
Matcher m = p.matcher(inputString);
if (m.find()) {
    command = m.group(1);
    do {
        id = m.group(3);
        value = m.group(4);
        // process id and value here
    } while (m.find());
}
A great place to test regexes online is http://www.regexplanet.com/simple/index.html. It allows you to play with the regex without having to compile and launch your app every time, if you just want to get the regex right.
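As a variation, a simpler pattern that describes just one key:value pair can be applied repeatedly with Matcher.find(), which resumes after the previous match. This is a sketch with a swapped-in regex, not the one above; the class and method names are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LaunchRegexParser {
    // One "key:value" pair; the value runs up to the next space or semicolon
    private static final Pattern PAIR = Pattern.compile("(\\w+):([^ ;]+)");

    static Map<String, String> pairs(String input) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        // Each find() continues after the end of the previous match
        while (m.find()) {
            result.put(m.group(1).toLowerCase(), m.group(2));
        }
        return result;
    }
}
```

The command word ("Launch") contains no colon, so this pattern skips it; it would have to be read separately, for example as the first whitespace-separated token.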
I have a file with data in the form timestamp, coordinate, coordinate, separated by spaces, as here:
14:25:01.215 370.0 333.0
I need to loop through and add only the coordinates to an array. The data from the file is read in and put into a String[] called info, via split(" "). I have two problems: I think the end of the file has an extra " " which I need to get rid of properly, and I would also like confirmation of, or suggestions for, my loop, since at the moment I am getting sporadic out-of-bounds exceptions. My loop is as follows:
String[] info;
info = dataHolder.split(" ");
ArrayList<String> coOrds1 = new ArrayList<String>();
for (int counter = 0; counter < info.length; counter = counter + 3)
{
    coOrds1.add(info[counter + 1]);
    coOrds1.add(info[counter + 2]);
}
Help and suggestions appreciated.
The text file is here, but the class receives the data in a UDP packet from another class, so I am unsure whether this potentially adds a " " at the end or not.
There are various classes/methods in Google's Guava library that could help with this task, in particular Splitter.omitEmptyStrings() which will discard any trailing space at the end of the file:
String input = Files.toString(file, Charsets.US_ASCII);
Iterable<String> fields =
        Splitter.on(" ")
                .omitEmptyStrings()
                .split(input);
List<Coord> coords = Lists.newArrayList();
for (List<String> group : Iterables.partition(fields, 3)) {
    String t = group.get(0);
    double x = Double.parseDouble(group.get(1));
    double y = Double.parseDouble(group.get(2));
    coords.add(new Coord(t, x, y));
}
The problem will occur whenever info.length is not a multiple of 3 (for example, when a stray space produces an empty token), because you are testing counter < info.length but then using counter + 1 and counter + 2. Try changing the loop condition to:
for (int counter = 0; counter + 2 < info.length; counter = counter+3)
There is no need for external libraries.
You could just call dataHolder = dataHolder.trim(); which removes any whitespace from the beginning and end of your string (trim() returns a new string rather than changing dataHolder in place). Then, using dataHolder.split("\\s+"), which splits on runs of whitespace, you will receive an array consisting only of your data and with the appropriate size.
This saves you checking at each iteration whether counter + 2 is still within the bounds of the array. While that check is a valid solution, it could introduce further problems in the future because of its "check-to-validate" nature (you might simply forget to handle one of the cases), whereas trimming the string beforehand makes the input valid by construction, so there are no special cases to handle.
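A minimal sketch of that approach, with the bounds-checked loop from the previous answer kept as a safety net for an incomplete final record. The class and method names are hypothetical, and it assumes every record has the form "timestamp x y":

```java
import java.util.ArrayList;
import java.util.List;

class CoordinateExtractor {
    /** Returns x, y, x, y, ... from "timestamp x y timestamp x y ..." input. */
    static List<String> coordinates(String dataHolder) {
        // trim() returns a new String; "\\s+" treats any run of spaces or newlines as one separator
        String[] info = dataHolder.trim().split("\\s+");
        List<String> coOrds = new ArrayList<>();
        // counter + 2 < info.length still guards against an incomplete final record
        for (int counter = 0; counter + 2 < info.length; counter += 3) {
            coOrds.add(info[counter + 1]);
            coOrds.add(info[counter + 2]);
        }
        return coOrds;
    }
}
```

Splitting on "\\s+" rather than " " also copes with records separated by line breaks, which a single-space split would glue onto the neighbouring values.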