Remove last n lines (sentences) in a String in Java

I am looking for an efficient way to remove the last n lines from a String. Efficient as in fast, and also not creating too many objects. I would therefore like to stay away from split(), especially because my strings can at times be a few hundred or even a thousand lines long.
For instance, I am getting a string as such:
This is a sample code line 1.
This is a sample code line 2.
Warm Regards,
SomeUser.
The last 3 lines (an empty line, "Warm Regards,", and "SomeUser.") are what I am trying to get rid of. Note that the content (including the last 3 lines) isn't fixed.
I am thinking of counting the lines first using this solution here: https://stackoverflow.com/a/18816371/1353174 and then using another similar loop to reach the position of line (lines - n) and taking a substring up to that position.
However, I am posting the problem here to find out whether there are other, perhaps more efficient, ways to achieve this. External library-based solutions (like Apache Commons StringUtils) are also welcome.

You can use String.lastIndexOf to find the third-to-last occurrence of the '\n' character and then use String.substring to get the result.
public static void main(String[] args) {
    String s = "This is a sample code line 1.\n" +
            "This is a sample code line 2.\n" +
            "\n" +
            "Warm Regards,\n" +
            "SomeUser.";
    int truncateIndex = s.length();
    for (int i = 0; i < 3; i++) {
        System.out.println(truncateIndex);
        truncateIndex = s.lastIndexOf('\n', truncateIndex - 1);
    }
    System.out.println(s.substring(0, truncateIndex));
    System.out.println("--");
}
This code snippet intentionally ignores corner cases, such as when the input string has fewer than three lines, to keep the code simple and readable.
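For completeness, here is a minimal sketch of the same lastIndexOf approach with the corner cases handled. The class name LastLines and the method name removeLastLines are made up for this illustration, and returning an empty string when the input has n or fewer lines is an assumption about the desired behaviour.

```java
final class LastLines {
    /**
     * Returns s without its last n lines, scanning backwards with lastIndexOf.
     * Assumption: if s has n or fewer lines, the result is the empty string.
     */
    static String removeLastLines(String s, int n) {
        if (n <= 0) {
            return s;
        }
        int end = s.length();
        for (int i = 0; i < n; i++) {
            // Find the line break that precedes the current cut position.
            end = s.lastIndexOf('\n', end - 1);
            if (end < 0) {
                return ""; // fewer than n line breaks: nothing is left
            }
        }
        // Also drop a '\r' left over from a Windows-style "\r\n" break.
        if (end > 0 && s.charAt(end - 1) == '\r') {
            end--;
        }
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        String s = "line 1\nline 2\n\nWarm Regards,\nSomeUser.";
        System.out.println(removeLastLines(s, 3));
    }
}
```

Only one new String is created (by substring), no matter how many lines the input has. Note that a trailing line break counts as starting one more (empty) line.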

public static final String SAMPLE_TEXT = "This is a sample code line 1.\nThis is a sample code line 2.\r\n\nWarm Regards,\r\nSomeUser.";
public static void main (String[] args) throws java.lang.Exception {
String[] lines = SAMPLE_TEXT.split("\\r?\\n"); // also matches Windows line breaks (\r\n)
for (int i = 0; i < lines.length - 3; i++) { // lines.length - 3 to discard the last 3 lines
System.out.println(lines[i]);
}
}
Here's a runnable example:
http://ideone.com/nwaMcD

@scala.annotation.tailrec
def rmLines(in: String, nlines: Int): String =
if (nlines == 0) {
in
} else {
val lastBreakIndex = in.lastIndexOf('\n')
if (lastBreakIndex == -1) {
in
} else {
rmLines(in.substring(0, lastBreakIndex), nlines - 1)
}
}

Use regular expressions to do it : http://docs.oracle.com/javase/tutorial/essential/regex/
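The answer above does not give a pattern, so the following is only one possible sketch. It matches n repetitions of "line break plus the rest of that line" anchored at the end of the input and removes them. The names RegexLastLines and removeLastLines are made up for this illustration, and this will likely be slower than a plain lastIndexOf loop on long strings.

```java
import java.util.regex.Pattern;

final class RegexLastLines {
    /** Removes the last n lines of s; returns s unchanged if it has fewer than n line breaks. */
    static String removeLastLines(String s, int n) {
        if (n <= 0) {
            return s;
        }
        // n repetitions of "line break followed by the rest of that line",
        // anchored at the very end of the input with \z.
        Pattern tail = Pattern.compile("(?:\\r?\\n[^\\n]*){" + n + "}\\z");
        return tail.matcher(s).replaceFirst("");
    }

    public static void main(String[] args) {
        String s = "line 1\nline 2\n\nWarm Regards,\nSomeUser.";
        System.out.println(removeLastLines(s, 3));
    }
}
```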

Related

How to tokenize a string using indexOf and substring methods

So I have to tokenize a string, and I can only use these 2 methods to tokenize
I have the base, but I don't know what to put in,
My friend did it, but I forgot how it looked, it went something like this
I remember he split it using the length of a tab
public class Tester
{
private static StringBuffer sb = new StringBuffer ("The cat in the hat");
public static void main(String[] args)
{
for(int i = 0; i < sb.length() ; i++)
{
int tempIndex = sb.indexOf(" ", 0);
sb.substring(0,tempIndex);
if(tempIndex > 0)
{
System.out.println(sb.substring(0,tempIndex));
sb.delete(0, sb.length());
}
}
}
}
String.indexOf(int ch) returns the index of a character; StringBuffer only offers indexOf(String), so sb.indexOf(" ") gives you the index of the first space. You can use that in conjunction with substring(): because the end index is exclusive, sb.substring(0, sb.indexOf(" ")) will get you your first token.
This seems like a homework problem, so I don't want to give you the full answer, but you probably can work it out. Comment if you need more help.
If you are familiar with the while loop construct, take a look at my pseudocode, which should be within the constraints of your problem:
String text = "texty text text"
while(textHasASpace){
print text up to space
set text to equal all text AFTER the space
}
print ??
Using your two allowed methods the above is convertible line by line to what you are after.
Hope it helps.
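The pseudocode above translates fairly directly into Java. This is only a sketch: the class name IndexOfTokenizer and the method name tokenize are invented for the example, and the List is used purely to collect the results (the splitting itself relies on indexOf and substring).

```java
import java.util.ArrayList;
import java.util.List;

final class IndexOfTokenizer {
    /** Splits text on spaces using only indexOf and substring to find the tokens. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int space = text.indexOf(' ');
        while (space >= 0) {
            if (space > 0) {
                tokens.add(text.substring(0, space)); // text up to the space
            }
            text = text.substring(space + 1);         // everything after the space
            space = text.indexOf(' ');
        }
        if (text.length() > 0) {
            tokens.add(text); // the last token has no space after it
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("The cat in the hat")) {
            System.out.println(token);
        }
    }
}
```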

How can I extract specific terms from string lines in Java?

I have a serious problem with extracting terms from each line of a string. To be more specific, I have a file that is named as CSV but is not actually in CSV format (it saves all terms into line[0] only).
So, here are just a few example lines among thousands:
(split() doesn't work!)
test.csv
"31451 CID005319044   15939353   C8H14O3S2    beta-lipoic acid   C1C[S#](=O)S[C##H]1CCCCC(=O)O "
"12232 COD05374044 23439353  C924O3S2    saponin   CCCC(=O)O "
"9048   CTD042032 23241  C3HO4O3S2 Berberine  [C##H]1CCCCC(=O)O "
I want to extract only "beta-lipoic acid", "saponin" and "Berberine", which are located in the 5th position.
You can see there are big spaces between terms, so that's why I said 5th position.
In this case, how can I extract the term located in the 5th position of each line?
One more thing: the amount of whitespace between the six terms is not always equal. It could be one, two, three, four, or five spaces, or something like that.
Because the amount of whitespace varies, I cannot use the .split() function.
For example, in the first line I would get "beta-lipoic" instead of "beta-lipoic acid".
Here is a solution for your problem using String.split and indexOf:
import java.util.ArrayList;
public class StringSplit {
public static void main(String[] args) {
String[] seperatedStr = null;
int fourthStrIndex = 0;
String modifiedStr = null, finalStr = null;
ArrayList<String> strList = new ArrayList<String>();
strList.add("31451 CID005319044   15939353   C8H14O3S2 beta-lipoic acid C1C[S#](=O)S[C##H]1CCCCC(=O)O ");
strList.add("12232 COD05374044 23439353 C924O3S2 saponin CCCC(=O)O ");
strList.add("9048 CTD042032 23241 C3HO4O3S2 Berberine [C##H]1CCCCC(=O)O ");
for (String item: strList) {
seperatedStr = item.split("\\s+");
fourthStrIndex = item.indexOf(seperatedStr[3]) + seperatedStr[3].length();
modifiedStr = item.substring(fourthStrIndex, item.length());
finalStr = modifiedStr.substring(0, modifiedStr.indexOf(seperatedStr[seperatedStr.length - 1]));
System.out.println(finalStr.trim());
}
}
}
Output:
beta-lipoic acid
saponin
Berberine
Option 1: Use String.split and split on runs of two or more whitespace characters, as in the code below:
String s[] = str.split("\\s\\s+");
for (String string : s) {
System.out.println(string);
}
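Option 2: Implement your own split logic by walking through all the characters. Sample code below (it is only meant to give an idea; I did not test it).
public static List<String> getData(String str) {
    List<String> list = new ArrayList<>();
    StringBuilder s = new StringBuilder();
    int count = 0; // consecutive spaces seen since the last non-space character
    for (char c : str.toCharArray()) {
        if (c == ' ') {
            count++;
            continue;
        }
        if (count > 1 && s.length() > 0) {
            list.add(s.toString()); // two or more spaces end the current term
            s.setLength(0);
        } else if (count == 1 && s.length() > 0) {
            s.append(' '); // a single space belongs to the current term
        }
        count = 0;
        s.append(c);
    }
    if (s.length() > 0) {
        list.add(s.toString()); // flush the last term
    }
    return list;
}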
This would be a relatively easy fix if it weren't for beta-lipoic acid...
Assuming that only spaces/tabs/other whitespace separate terms, you could split on whitespace.
Pattern whitespace = Pattern.compile("\\s+");
String[] terms = whitespace.split(line); // Not 100% sure of syntax here...
// Your desired term should be index 4 of the terms array
While this would work for the majority of your terms, this would also result in you losing the "acid" in "beta-lipoic acid"...
Another hacky solution would be to add in a check for the 6th spot in the array produced by the above code and see if it matches English letters. If so, you can be reasonably confident that the 6th spot is actually part of the same term as the 5th spot, so you can then concatenate those together. This falls apart pretty quickly though if you have terms with >= 3 words. So something like
Pattern possibleEnglishWord = Pattern.compile("[a-zA-Z]+"); // Can add dashes and such as needed
if (terms.length > 5 && possibleEnglishWord.matcher(terms[5]).matches()) {
    // return terms[4] + " " + terms[5], or something like that
}
Another thing you can try is to replace all groups of spaces with a single space, and then remove everything that isn't made up of just english letters/dashes
line = whitespace.matcher(line).replaceAll(" ");
// Matches any whitespace-delimited token containing a character that is not a letter or a dash
Pattern notEnglishWord = Pattern.compile("\\S*[^a-zA-Z\\s-]\\S*");
line = notEnglishWord.matcher(line).replaceAll("").trim();
Then hopefully the only thing that is left would be the term you're looking for.
Hopefully this helps, but I do admit it's rather convoluted. One of the issues is that it appears that non-term words may have only one space between them, which would fool Option 1 as presented by Hirak... If that weren't the case that option should work.
Oh by the way, if you do end up doing this, put the Pattern declarations outside of any loops. They only need to be created once.
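Another way to sidestep the "beta-lipoic acid" problem is to anchor on both ends of the line with a single regex. The sketch below assumes that the first four fields and the last field never contain whitespace, so whatever lies between them is the term. The class name FifthTerm and the method name extract are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FifthTerm {
    // Compiled once, outside any loop. Assumes the first four fields and the
    // last field contain no whitespace; the text between them is the term.
    private static final Pattern LINE =
            Pattern.compile("^\\s*(?:\\S+\\s+){4}(.*?)\\s+\\S+\\s*$");

    /** Returns the fifth term, or null if the line does not match the expected layout. */
    static String extract(String line) {
        Matcher m = LINE.matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "31451 CID005319044   15939353   C8H14O3S2    beta-lipoic acid   C1C[S#](=O)S[C##H]1CCCCC(=O)O ";
        System.out.println(extract(line));
    }
}
```

Because the term is captured lazily up to the last whitespace-delimited field, it does not matter how many words the term has or how many spaces separate the fields.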

How do I access and manipulate individual values in a String Array when using a while and for loop

Very new to Java: Trying to learn it.
I created an Array and would like to access individual components of the array.
The first issue I am having is: how do I print the array as a batch, i.e. the whole array at once, as shown below? For example, on the last value MyValue4 I added a line break so that when the values are printed, the output looks like this. There has to be a better way to do this?
MyValue1
MyValue2
MyValue3
MyValue4
MyValue1
MyValue2
MyValue3
MyValue4
The next thing I need to do is, manipulate or replace a value with something else, example: MyValue with MyValx, when the repeat variable is at a certain number or value.
So when the repeat variable reaches 3 change my value to something else and then change back when it reaches 6.
I am familiar with the Replace method, I am just not sure how to put this all together.
I am having trouble with changing just parts of the array with the while and for loop in the mix.
My Code:
public static String[] MyArray() {
String MyValues[] = { "MyValue1", "MyValue2", "MyValue3", "MyValue4\n" };
return MyValues;
}
public static void main(String[] args) {
int repeat = 0;
while (repeat < 7) {
for (String lines : MyArray()) {
System.out.println(lines);
}
repeat = repeat + 1;
if (repeat == 7) {
break;
}
}
}
A for loop would be shorter:
for (int i = 0; i < 7; i++) {
    for (String lines : MyArray()) {
        // Change the value depending on the counter.
        if (i >= 3 && i < 6) {
            lines = lines.replace("MyValue", "MyValx");
        }
        System.out.println(lines);
    }
    System.out.println(); // blank line after each batch
}
And BTW, variable names should start in lower case, and the values should not end with a line break (\n). So use:
String myValues[] = {"MyValue1", "MyValue2", "MyValue3", "MyValue4"};
instead of:
String MyValues[] = { "MyValue1", "MyValue2", "MyValue3", "MyValue4\n" };
and add System.out.println(); after each run of the inner loop instead. To change an individual value, assign to its index:
MyValues[n] = "value";
where n is the position in the array.
You may consider using System.out.println() without any argument for printing an empty line instead of inserting new-line characters in your data.
You already know the for-each loop, but consider a count-controlled loop, such as
for (int i = 0; i < lines.length; i++) {
...
}
There you can use i for accessing your array as well as for deciding for further actions.
Replacing array items based on a number in a string might be a bit trickier. A regular expression will definitely do the job, if you are familiar with them. If not, I can recommend learning them, because they will surely be useful in future situations.
A simpler approach might be using
int a = Integer.parseInt("123"); // returns 123 as integer
but that only works on strings that contain nothing but a number (positive or negative). It won't work with abc123; that will throw a NumberFormatException.
These are some ideas you might try out and experiment with. Also use the documentation extensively. ;-)
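Putting these suggestions together, a minimal sketch could look like the following. The class name RepeatDemo and the helper valuesForPass are invented for this illustration, and the exact window (replace from pass 3 up to, but not including, pass 6) is an assumption based on the question.

```java
final class RepeatDemo {
    /** Returns the values to print for the given pass; passes 3 to 5 use the replaced text. */
    static String[] valuesForPass(String[] values, int pass) {
        String[] out = new String[values.length];
        for (int i = 0; i < values.length; i++) {
            // Replace only while the counter is between 3 (inclusive) and 6 (exclusive).
            out[i] = (pass >= 3 && pass < 6)
                    ? values[i].replace("MyValue", "MyValx")
                    : values[i];
        }
        return out;
    }

    public static void main(String[] args) {
        String[] myValues = {"MyValue1", "MyValue2", "MyValue3", "MyValue4"};
        for (int repeat = 0; repeat < 7; repeat++) {
            for (String line : valuesForPass(myValues, repeat)) {
                System.out.println(line);
            }
            System.out.println(); // blank line between batches, no "\n" in the data
        }
    }
}
```

The original array is never modified, so the values "change back" automatically once the counter leaves the window.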

String.split() Not Acting on Semicolon or Space Delimiters

This may be a simple question, but I have been Googling for over an hour and haven't found an answer yet.
I'm trying to simply use the String.split() method with a small Android application to split an input string. The input string will be something along the lines of: "Launch ip:192.168.1.101;port:5900". I'm doing this in two iterations to ensure that all of the required parameters are there. I'm first trying to do a split on spaces and semicolons to get the individual tokens sorted out. Next, I'm trying to split on colons in order to strip off the identification tags of each piece of information.
So, for example, I would expect the first round of split to give me the following data from the above example string:
(1) Launch
(2) ip:192.168.1.101
(3) port:5900
Then the second round would give me the following:
(1) 192.168.1.101
(2) 5900
However, the following code that I wrote doesn't give me what's expected:
private String[] splitString(String inputString)
{
String[] parsedString;
String[] orderedString = new String[SOSLauncherConstants.SOCKET_INPUT_STRING_PARSE_VALUE];
parsedString = inputString.trim().split("; ");
Log.i("info", "The parsed data is as follows for the initially parsed string of size " + parsedString.length + ": ");
for (int i = 0; i < parsedString.length; ++i)
{
Log.i("info", parsedString[i]);
}
for (int i = 0; i < parsedString.length; ++i )
{
if (parsedString[i].toLowerCase().contains(SOSLauncherConstants.PARSED_LAUNCH_COMMAND_VALUE))
{
orderedString[SOSLauncherConstants.PARSED_COMMAND_WORD] = parsedString[i];
}
if (parsedString[i].toLowerCase().contains("ip"))
{
orderedString[SOSLauncherConstants.PARSED_IP_VALUE] = parsedString[i].split(":")[1];
}
else if (parsedString[i].toLowerCase().contains("port"))
{
orderedString[SOSLauncherConstants.PARSED_PORT_VALUE] = parsedString[i].split(":")[1];
}
else if (parsedString[i].toLowerCase().contains("username"))
{
orderedString[SOSLauncherConstants.PARSED_USERNAME_VALUE] = parsedString[i].split(":")[1];
}
else if (parsedString[i].toLowerCase().contains("password"))
{
orderedString[SOSLauncherConstants.PARSED_PASSWORD_VALUE] = parsedString[i].split(":")[1];
}
else if (parsedString[i].toLowerCase().contains("color"))
{
orderedString[SOSLauncherConstants.PARSED_COLOR_VALUE] = parsedString[i].split(":")[1];
}
}
Log.i("info", "The parsed data is as follows for the second parsed string of size " + orderedString.length + ": ");
for (int i = 0; i < orderedString.length; ++i)
{
Log.i("info", orderedString[i]);
}
return orderedString;
}
For a result, I'm getting the following:
The parsed data is as follows for the parsed string of size 1:
launch ip:192.168.1.106;port:5900
The parsed data is as follows for the second parsed string of size 6:
launch ip:192.168.1.106;port:5900
192.168.1.106;port
And then, of course, it crashes because the for loop runs into a null string.
Side Note:
The following snippet is from the constants class that defines all of the string indexes --
public static final int SOCKET_INPUT_STRING_PARSE_VALUE = 6;
public static final int PARSED_COMMAND_WORD = 0;
public static final String PARSED_LAUNCH_COMMAND_VALUE = "launch";
public static final int PARSED_IP_VALUE = 1;
public static final int PARSED_PORT_VALUE = 2;
public static final int PARSED_USERNAME_VALUE = 3;
public static final int PARSED_PASSWORD_VALUE = 4;
public static final int PARSED_COLOR_VALUE = 5;
I looked into needing a possible escape (by inserting a \\ before the semicolon) on the semicolon delimiter, and even tried using it, but that didn't work. The odd part is that neither the space nor the semicolon function as a delimiter, yet the colon works on the second time around. Does anybody have any ideas what would cause this?
Thanks for your time!
EDIT: I should also add that I'm receiving the string over a WiFi socket connection. I don't think this should make a difference, but I'd like you to have all of the information that you need.
String.split(String) takes a regex. Use "[; ]". eg:
"foo;bar baz".split("[; ]")
will return an array containing "foo", "bar" and "baz".
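Applied to the input from the question, the two rounds of splitting could be sketched like this. The class name LaunchParser, the method name parse, and the choice to store the bare command word under the key "command" are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LaunchParser {
    /** Parses "Launch ip:...;port:..." into key/value pairs. */
    static Map<String, String> parse(String input) {
        Map<String, String> result = new LinkedHashMap<>();
        // First round: the character class splits on either a semicolon or a space.
        for (String token : input.trim().split("[; ]")) {
            if (token.isEmpty()) {
                continue; // produced by two delimiters in a row
            }
            // Second round: split "key:value" at the first colon only.
            String[] pair = token.split(":", 2);
            if (pair.length == 2) {
                result.put(pair[0].toLowerCase(), pair[1]);
            } else {
                result.put("command", token); // hypothetical key for the bare command word
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("Launch ip:192.168.1.101;port:5900"));
    }
}
```

Checking pair.length before reading pair[1] avoids the crash on tokens that have no colon.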
If you need groups of spaces to work as a single delimiter, you can use something like:
"foo;bar baz".split("(;| +)")
String.split() treats its argument as a regular expression and matches it as a whole, not as a set of individual delimiter characters. That is, split("; ") only splits where a semicolon is immediately followed by a space, so it would not split "a;b c" at all, but would split "a; b".
You may have better luck with Guava's Splitter, which is meant to be slightly less unpredictable than java.lang.String.split.
I would write something like
Iterable<String> splits = Splitter.on(CharMatcher.anyOf("; ")).split(string);
but Splitter also provides fluent-style customization like "trim results" or "skip over empty strings."
Is there a reason why you are using String.split(), but not using Regular Expressions? This is a perfect candidate for regex'es, esp if the string format is consistent.
I'm not sure if your format is fixed, but if it is, then the following regexes should break it down for you (I am sure that someone can come up with something more elegant). The first one picks up the command word, and the second one is applied in a loop to pick up each id:value pair:
Pattern commandPattern = Pattern.compile("^\\s*(\\w+)"); // leading command word
Pattern pairPattern = Pattern.compile("(\\w+):([^ ;]*)"); // id:value pairs
Matcher m = commandPattern.matcher(inputString);
if (m.find()) {
    command = m.group(1);
}
m = pairPattern.matcher(inputString);
while (m.find()) {
    id = m.group(1);
    value = m.group(2);
}
A great place to test out regexes online is http://www.regexplanet.com/simple/index.html. It allows you to play with the regex without having to compile and launch your app every time if you just want to get the regex correct.

Java: How To Grab Each nth Lines From a String

I'm wondering how I could grab every nth line from a String, say every 100th, with the lines in the String being separated by a '\n'.
This is probably a simple thing to do but I really can't think of how to do it, so does anybody have a solution?
Thanks much,
Alex.
UPDATE:
Sorry I didn't explain my question very well.
Basically, imagine there's a 350 line file. I want to grab the start and end of each 100 line chunk. Pretending each line is 10 characters long, I'd finish with 2 separate arrays (containing start and end indexes) like this:
(Lines 0-100) 0-1000
(Lines 100-200) 1000-2000
(Lines 200-300) 2000-3000
(Lines 300-350) 3000-3500
So then if I wanted to mess around with say the second set of 100 lines (100-200) I have the regions for them.
You can split the string into an array using split() and then just get the indexes you want, like so:
String[] strings = myString.split("\n");
int nth = 100;
for(int i = nth - 1; i < strings.length; i += nth) { // the nth line is at index nth - 1
System.out.println(strings[i]);
}
String newLine = System.getProperty("line.separator");
String lines[] = text.split(newLine);
Where text is string with your whole text.
Now to get nth line, do e.g.:
System.out.println(lines[nth - 1]); // Minus one, because arrays in Java are zero-indexed
One approach is to create a StringReader from the string, wrap it in a BufferedReader and use that to read lines. Alternatively, you could just split on \n to get the lines, of course...
String[] allLines = text.split("\n");
List<String> selectedLines = new ArrayList<String>();
for (int i = 0; i < allLines.length; i += 100)
{
selectedLines.add(allLines[i]);
}
This is simpler code than using a BufferedReader, but it does mean having the complete split string in memory (as well as the original, at least temporarily, of course). It's also less flexible in terms of being adapted to reading lines from other sources such as a file. But if it's all you need, it's pretty straightforward :)
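For comparison, a sketch of the BufferedReader variant mentioned above might look like this. The class name EveryNthLine and the method name everyNth are made up for the example. Like the split version, it selects lines 0, n, 2n and so on (zero-based).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class EveryNthLine {
    /** Returns lines 0, n, 2n, ... of text (zero-based), reading one line at a time. */
    static List<String> everyNth(String text, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> selected = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            for (int i = 0; (line = reader.readLine()) != null; i++) {
                if (i % n == 0) {
                    selected.add(line);
                }
            }
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new UncheckedIOException(e);
        }
        return selected;
    }

    public static void main(String[] args) {
        System.out.println(everyNth("a\nb\nc\nd\ne", 2));
    }
}
```

Swapping the StringReader for a FileReader is all it takes to read from a file instead.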
EDIT: If the start indexes are needed too, it becomes slightly more complicated... but not too bad. You probably want to encapsulate the "start and line" in a single class, but for the sake of brevity:
String[] allLines = text.split("\n");
List<String> selectedLines = new ArrayList<String>();
List<Integer> selectedIndexes = new ArrayList<Integer>();
int index = 0;
for (int i = 0; i < allLines.length; i++)
{
if (i % 100 == 0)
{
selectedLines.add(allLines[i]);
selectedIndexes.add(index);
}
index += allLines[i].length() + 1; // Add 1 for the trailing "\n"
}
Of course given the start index and the line, you can get the end index just by adding the line length :)
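If only the start and end offsets of each 100-line chunk are needed, as in the update to the question, they can also be computed with indexOf alone, without splitting the string. This is a sketch under that reading of the question. The class name LineChunks and the method name chunkRanges are invented, and treating the end offset as exclusive (and including the chunk's final '\n') is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class LineChunks {
    /**
     * Returns {start, end} character offsets for each block of chunkSize lines.
     * start is inclusive; end is exclusive and includes the block's final '\n' if present.
     */
    static List<int[]> chunkRanges(String text, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<int[]> ranges = new ArrayList<>();
        int start = 0;        // offset where the current chunk begins
        int pos = 0;          // offset where the next line begins
        int linesInChunk = 0;
        while (pos < text.length()) {
            int newline = text.indexOf('\n', pos);
            pos = (newline < 0) ? text.length() : newline + 1; // move past this line
            linesInChunk++;
            if (linesInChunk == chunkSize || pos == text.length()) {
                ranges.add(new int[] {start, pos});
                start = pos;
                linesInChunk = 0;
            }
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (int[] range : chunkRanges("a\nb\nc\nd\ne\n", 2)) {
            System.out.println(range[0] + "-" + range[1]);
        }
    }
}
```

A chunk can then be retrieved with text.substring(range[0], range[1]).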
