String manipulation coding advice - java

I have a string that looks like this.
String line = "50464,\"STRONACHLACHAR, PIER BUILDING AND PIER INCLUDING REVETMENT WALLS AND RAILINGS\",C,04/05/2006,STIRLING,BUCHANAN";
If I want to split the string into 6 parts to put in an array with
String[] result = line.split(",");
I would have a problem, since STRONACHLACHAR, PIER BUILDING AND PIER INCLUDING REVETMENT WALLS AND RAILINGS should be a single element because it is enclosed in double quotes, but it also contains a comma, so it would be split there too. I also don't want the quotes in the result.
My idea: look for the position of the first ", create a substring starting at that position + 1, then look for the next " in that substring and cut there.
Now I have two strings, one with everything before the first " and one with everything after the second ", plus the quoted value STRONACHLACHAR, PIER BUILDING AND PIER INCLUDING REVETMENT WALLS AND RAILINGS. I can then remove the comma from that value using replace(",", "") (maybe saving the position of the comma to put it back once the string has been split on commas, but that aside). The next step would be to concatenate everything again so I get:
50464,STRONACHLACHAR PIER BUILDING AND PIER INCLUDING REVETMENT WALLS AND RAILINGS,C,04/05/2006,STIRLING,BUCHANAN
which can be successfully split on commas, so I end up with an array of 6 elements I can work with.
In code it would look like this. I left out the part that puts the comma back.
String line = "50464,\"STRONACHLACHAR, PIER BUILDING AND PIER INCLUDING REVETMENT WALLS AND RAILINGS\",C,04/05/2006,STIRLING,BUCHANAN";
String end2 = line; // fall back to the unchanged line when it contains no quotes
if (line.contains("\"")) {
int pos = line.indexOf("\""); // opening quote
String firstPart = line.substring(0, pos); // everything before the opening quote, including the trailing comma
String temp = line.substring(pos + 1);
int pos2 = temp.indexOf("\""); // closing quote, relative to temp
String secondPart = temp.substring(pos2 + 1);
String temp2 = temp.substring(0, pos2); // the quoted value without its quotes
String temp3 = temp2.replace(",", "");
String end = firstPart.concat(temp3);
end2 = end.concat(secondPart);
}
String[] output = end2.split(",");
for (int i = 0; i < output.length; i++) {
System.out.println(output[i] + " ");
}
What I am wondering is whether this is good programming practice, or whether I am overcomplicating it. It's a 1500-line file and every line would have to be checked, and there might still be other irregularities that need to be dealt with.
By the way, the goal is that every line ends up in an array of exactly 6 elements, no more, no less.
What should I keep in mind when processing a file with lines like this one?

Your data appears to be well-formed comma-separated value (CSV) lines. Instead of tying yourself in knots, I suggest reusing a library such as http://opencsv.sourceforge.net
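To show what a CSV library takes care of for you, here is a minimal hand-rolled sketch of quote-aware splitting. It is not opencsv's API; the class name QuotedCsvSplitter and the method splitLine are made up for illustration, and the sketch assumes fields never span several lines. For real files a maintained library remains the safer choice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of opencsv or the JDK.
class QuotedCsvSplitter {

    // Splits one CSV line on commas, treating commas inside double quotes as data.
    // A doubled quote ("") inside a quoted field is read as one literal quote.
    static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // escaped quote inside a quoted field
                    i++;
                } else {
                    inQuotes = !inQuotes; // opening or closing quote, not part of the value
                }
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString()); // field boundary
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        String line = "50464,\"STRONACHLACHAR, PIER BUILDING AND PIER INCLUDING REVETMENT WALLS AND RAILINGS\",C,04/05/2006,STIRLING,BUCHANAN";
        for (String field : splitLine(line)) {
            System.out.println(field);
        }
    }
}
```

With this approach the comma inside the quoted name is kept, so there is no need to remove it and put it back later.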

Related

Java: Checking each space in a String

I'm sure this is fairly simple, however I've tried googling the question but can't find an answer that fits my problem.
I'm playing around with string manipulation and one of the things I'm trying to do is get the first letter of each word. (And then place them all into a string)
I'm having trouble with registering each 'space' so that my If statement will be triggered. Here's what I have so far.
while (scanText.hasNext()) {
boolean isSpace = false;
if (scanText.hasNext(" ")) {isSpace = true;}
String s = scanText.next();
if (isSpace) {firstLetters += s + " ";}
}
Also, if there is a much better way to do this then please let me know
You can also split the original text by spaces, and collect the words.
String input = " Hello world aaa ";
String[] split = input.trim().split("\\s+"); // all types of whitespace; " +" to pick spaces only
// operate on "split" array containing words now: [Hello, world, aaa]
However, using regexps here might be overkill.
Assuming that scanText is a Scanner object, you could set its delimiter as shown in the documentation:
Scanner s = new Scanner(input).useDelimiter("\\s+"); //regex for spaces
https://docs.oracle.com/javase/7/docs/api/java/util/Scanner.html
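Putting the split suggestion together with the original goal (one string made of the first letter of each word), a minimal sketch could look like this. The class name FirstLetters and the method firstLetters are illustrative names, not taken from the question.

```java
// Hypothetical helper illustrating the split-based approach.
class FirstLetters {

    // Returns the first character of every whitespace-separated word, concatenated.
    static String firstLetters(String text) {
        StringBuilder letters = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) { // an empty input yields one empty "word"
                letters.append(word.charAt(0));
            }
        }
        return letters.toString();
    }

    public static void main(String[] args) {
        System.out.println(firstLetters(" Hello world aaa "));
    }
}
```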

java string matching

All I am doing in my project is taking two values (read from two different Excel files) and checking how similar they are. I tried using the Pattern and Matcher classes, which work perfectly fine when both words are essentially the same (as in organisation and organisation/s). But my data also contains pairs like employee and employment, where I just need "employ" as the common string between the two, and in that case Pattern and Matcher fail. I have been stuck on this for a week. I have about 700 rows in the first Excel file and about 9000 in the other. I read each cell value into the program and store the two values in two separate variables. Next, I tried using 4 for loops to match word by word and character by character, to find only those characters that match between the two. I have pasted the code for the for loop implementation below. The four nested loops are driving me nuts! Any help in completing this would be greatly appreciated.
String str1 = "Cover for employees of the company";
String str2 = "Employment Agencies ";
String str,strfinal;
String[] count1 = str1.split("\\s+");
String[] count2 = str2.split("\\s+");
char[] count11 = str1.toCharArray();
char[] count22 = str2.toCharArray();
for(int i=0;i<count1.length;i++)
{
for(int j=0;j<count2.length;j++)
{
for(int m=0;m<count1[i].length();m++)
{
for(int n=0;n<count2[j].length();n++)
{
if(count11[m]==count22[n])
{
// please look at the logic that I am looking for to implement
}
}
}
}
}
Expected output: employ
One more idea that I am trying to implement (in order to make my program more efficient) is this:
cover is compared with employment. The first character does not match, so move on to the next word in the second string. Once all words in the second string have been checked, go to the next word in the first string and compare it with all the words in the second string.
So this is what I am looking for right now. Any help will be greatly appreciated.
Thanks!
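One way to read the comparison described above is as a search for the longest common prefix over all word pairs, ignoring case. The following is only a sketch under that assumption; the names CommonPrefixFinder, commonPrefix and longestCommonWordPrefix are invented for illustration. Two loops over the words plus one prefix helper replace the four nested loops.

```java
import java.util.Locale;

// Hypothetical helper: longest prefix shared by any word of s1 and any word of s2.
class CommonPrefixFinder {

    // Returns the leading characters that a and b have in common.
    static String commonPrefix(String a, String b) {
        int limit = Math.min(a.length(), b.length());
        int i = 0;
        while (i < limit && a.charAt(i) == b.charAt(i)) {
            i++; // stops at the first mismatch, so "cover" vs "employment" ends immediately
        }
        return a.substring(0, i);
    }

    // Compares every word of s1 with every word of s2 and keeps the longest shared prefix.
    static String longestCommonWordPrefix(String s1, String s2) {
        String[] words1 = s1.trim().toLowerCase(Locale.ROOT).split("\\s+");
        String[] words2 = s2.trim().toLowerCase(Locale.ROOT).split("\\s+");
        String best = "";
        for (String w1 : words1) {
            for (String w2 : words2) {
                String prefix = commonPrefix(w1, w2);
                if (prefix.length() > best.length()) {
                    best = prefix;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String str1 = "Cover for employees of the company";
        String str2 = "Employment Agencies ";
        System.out.println(longestCommonWordPrefix(str1, str2)); // prints "employ"
    }
}
```

If the shared part may also sit in the middle of a word, a longest-common-substring search would be needed instead; this sketch only handles shared word beginnings.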

Issue in Combining splitted String

I have extracted text from the "web 2.0 wikipedia" article and split it into "sentences". After that, I want to create "Strings", each containing 5 sentences.
When extracted, the text looks like below, in EditText
Below is my code
finalText = textField.getText().toString();
String[] textArrayWithFullStop = finalText.split("\\. ");
String colelctionOfFiveSentences = "";
List<String>textCollection = new ArrayList<String>();
for(int i=0;i<textArrayWithFullStop.length;i++)
{
colelctionOfFiveSentences = colelctionOfFiveSentences + textArrayWithFullStop[i];
if( (i%5==0) )
{
textCollection.add(colelctionOfFiveSentences);
colelctionOfFiveSentences = "";
}
}
But when I use a Toast to display the text, here is what it gives:
Toast.makeText(Talk.this, textCollection.get(0), Toast.LENGTH_LONG).show();
As you can see, this is only one sentence! But I expected it to have 5 sentences!
The other thing is that the second string starts from somewhere else. Here is how I displayed it in a Toast:
Toast.makeText(Talk.this, textCollection.get(1), Toast.LENGTH_LONG).show();
This makes no sense to me! How can I properly split the text into sentences and create Strings containing 5 sentences each?
The problem is that for the first sentence, 0 % 5 = 0, so it is being added to the array list immediately. You should use another counter instead of mod.
finalText = textField.getText().toString();
String[] textArrayWithFullStop = finalText.split("\\. ");
String colelctionOfFiveSentences = "";
int sentenceAdded = 0;
List<String>textCollection = new ArrayList<String>();
for(int i=0;i<textArrayWithFullStop.length;i++)
{
colelctionOfFiveSentences += textArrayWithFullStop[i] + ". ";
sentenceAdded++;
if(sentenceAdded == 5)
{
textCollection.add(colelctionOfFiveSentences);
colelctionOfFiveSentences = "";
sentenceAdded = 0;
}
}
Add ". " back to textArrayWithFullStop[i], since split() removes it:
colelctionOfFiveSentences = colelctionOfFiveSentences + textArrayWithFullStop[i]+". ";
I believe that if you modify the mod line to this:
if(i%5==4)
you will have what you need.
You probably realize this, but there are other reasons why someone might use a ". ", that doesn't actually end a sentence, for instance
I spoke to John and he said... "I went to the store.
Then I went to the Tennis courts.",
and I don't believe he was telling the truth because
1. Why would someone go to play tennis after going to the store and
2. John has no legs!
I had to ask, am I going to let him get away with these lies?
That's two sentences that don't end with a period and would mislead your code into thinking it's 5 sentences broken up at entirely the wrong places, so this approach is really fraught with problems. However, as an exercise in splitting strings, I guess it's as good as any other.
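As a solution to the side problem (splitting into sentences), I would suggest starting with this regexp, which also swallows an optional bracketed numeric marker such as [1] right after the period:
string.split("\\.(\\[[0-9\\[\\]]+\\])? ")
And for the main problem, maybe you could use Arrays.copyOfRange().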
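As a sketch of the Arrays.copyOfRange() idea, the sentences can be cut into slices of five and each slice joined back into one String. The names SentenceGrouper and group are illustrative, and String.join assumes Java 8 or later (on older Android API levels a StringBuilder loop would be needed instead).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: groups sentences into Strings of `size` sentences each.
class SentenceGrouper {

    static List<String> group(String text, int size) {
        String[] sentences = text.split("\\. ");
        List<String> groups = new ArrayList<>();
        for (int start = 0; start < sentences.length; start += size) {
            // The last slice may be shorter than `size`.
            int end = Math.min(start + size, sentences.length);
            String[] slice = Arrays.copyOfRange(sentences, start, end);
            // split() removed the ". " separators, so put them back between sentences.
            groups.add(String.join(". ", slice));
        }
        return groups;
    }

    public static void main(String[] args) {
        for (String g : group("A. B. C. D. E. F. G.", 5)) {
            System.out.println(g);
        }
    }
}
```

Unlike the counter-based loop, this version also keeps a trailing group of fewer than five sentences.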

closest thing to NSScanner in Java

I'm moving some code from Objective-C to Java. The project is an XML/HTML parser. In Objective-C I pretty much only use the scanUpToString("mystring"); method.
I looked at the Java Scanner class, but it breaks everything into tokens. I don't want that. I just want to be able to scan up to occurrences of substrings and keep track of the scanners current location in the overall string.
Any help would be great thanks!
EDIT
to be more specific. I don't want Scanner to tokenize.
String test = "<title balh> blah <title> blah>";
Scanner feedScanner = new Scanner(test);
String title = "<title";
String a = feedScanner.next(title);
String b = feedScanner.next(title);
In the above code I'd like feedScanner.next(title); to scan up to the end of the next occurrence of "<title"
What actually happens is that the first time feedScanner.next is called it works, since the default delimiter is whitespace; the second time it is called, however, it fails (for my purposes).
You can achieve this with the String class (java.lang.String).
First get the index of the first occurrence of your substring.
int firstOccurrence = string.indexOf(substring);
Then walk through the rest of the string, asking for the next occurrence from a given index each time:
int nextIndex = string.indexOf(substring, fromIndex);
If you want to save the values, box them as Integer and add them to an ArrayList.
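The steps above can be sketched as a small loop that collects every occurrence. OccurrenceFinder and indexesOf are illustrative names; only String.indexOf(String, int) comes from the JDK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper collecting the start index of every occurrence of a substring.
class OccurrenceFinder {

    static List<Integer> indexesOf(String text, String target) {
        List<Integer> indexes = new ArrayList<>();
        if (target.isEmpty()) {
            return indexes; // avoid looping forever on an empty search string
        }
        int from = 0;
        while (true) {
            int found = text.indexOf(target, from);
            if (found < 0) {
                break; // no further occurrence
            }
            indexes.add(found);
            from = found + target.length(); // continue after this occurrence
        }
        return indexes;
    }

    public static void main(String[] args) {
        System.out.println(indexesOf("<title balh> blah <title> blah>", "<title")); // prints [0, 18]
    }
}
```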
This really is easier if you just use String's methods directly:
String test = "<title balh> blah <title> blah>";
String target = "<title";
int index = 0;
index = test.indexOf( target, index ) + target.length();
// Index is now 6 (the space between "<title" and "balh")
index = test.indexOf( target, index ) + target.length();
// Index is now at the ">" in "<title> blah"
Depending on what you actually want to do besides walking through the string, different approaches might be better or worse. E.g. if you want to get the " balh> blah " string between the two "<title" occurrences, a Scanner is convenient:
String test = "<title balh> blah <title> blah>";
Scanner scan = new Scanner(test);
scan.useDelimiter("<title");
String stuff = scan.next(); // gets " balh> blah "
Maybe String.split is something for you?
String s = "The almighty String is mystring is your String is our mystring-object - isn't it?";
String[] parts = s.split("mystring");
Result:
Array("The almighty String is ", " is your String is our ", "-object - isn't it?")
You know that "mystring" must have stood between consecutive parts. I'm not sure about the start and the end, so maybe you also need s.startsWith("mystring") / s.endsWith("mystring").
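For code that leans heavily on scanUpToString, it may be convenient to wrap indexOf in a tiny cursor class so the current location is tracked in one place. This is only a sketch loosely modeled on that usage, not a port of NSScanner; SubstringScanner, scanUpTo, skip and position are invented names.

```java
// Hypothetical minimal scanner that keeps a cursor into the source string.
class SubstringScanner {
    private final String source;
    private int position = 0;

    SubstringScanner(String source) {
        this.source = source;
    }

    // Returns the text from the cursor up to (not including) the next occurrence
    // of target and leaves the cursor at the start of that occurrence.
    // If target does not occur again, returns the rest and moves the cursor to the end.
    String scanUpTo(String target) {
        int found = source.indexOf(target, position);
        int end = (found < 0) ? source.length() : found;
        String scanned = source.substring(position, end);
        position = end;
        return scanned;
    }

    // Moves the cursor past target if the source continues with it; reports whether it did.
    boolean skip(String target) {
        if (source.startsWith(target, position)) {
            position += target.length();
            return true;
        }
        return false;
    }

    int position() {
        return position;
    }

    public static void main(String[] args) {
        SubstringScanner scanner = new SubstringScanner("<title balh> blah <title> blah>");
        scanner.skip("<title");
        System.out.println(scanner.scanUpTo("<title")); // prints " balh> blah "
        System.out.println(scanner.position());         // prints 18
    }
}
```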

Java: How To Grab Each nth Lines From a String

I'm wondering how I could grab every nth line from a String, say every 100th, with the lines in the String being separated by a '\n'.
This is probably a simple thing to do but I really can't think of how to do it, so does anybody have a solution?
Thanks much,
Alex.
UPDATE:
Sorry I didn't explain my question very well.
Basically, imagine there's a 350-line file. I want to grab the start and end of each 100-line chunk. Pretending each line is 10 characters long, I'd finish with 2 separate arrays (containing start and end indexes) like this:
(Lines 0-100) 0-1000
(Lines 100-200) 1000-2000
(Lines 200-300) 2000-3000
(Lines 300-350) 3000-3500
So then if I wanted to mess around with say the second set of 100 lines (100-200) I have the regions for them.
You can split the string into an array using split() and then just get the indexes you want, like so:
String[] strings = myString.split("\n");
int nth = 100;
for(int i = nth; i < strings.length; i += nth) {
System.out.println(strings[i]);
}
String newLine = System.getProperty("line.separator");
String lines[] = text.split(newLine);
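Here text is the string holding your whole text. Note that line.separator is "\r\n" on Windows, so if the lines are known to be separated by '\n', split on "\n" instead.
Now to get the nth line, do e.g.: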
System.out.println(lines[nth - 1]); // Minus one, because arrays in Java are zero-indexed
One approach is to create a StringReader from the string, wrap it in a BufferedReader and use that to read lines. Alternatively, you could just split on \n to get the lines, of course...
String[] allLines = text.split("\n");
List<String> selectedLines = new ArrayList<String>();
for (int i = 0; i < allLines.length; i += 100)
{
selectedLines.add(allLines[i]);
}
This is simpler code than using a BufferedReader, but it does mean having the complete split string in memory (as well as the original, at least temporarily, of course). It's also less flexible in terms of being adapted to reading lines from other sources such as a file. But if it's all you need, it's pretty straightforward :)
EDIT: If the start indexes are needed too, it becomes slightly more complicated... but not too bad. You probably want to encapsulate the "start and line" in a single class, but for the sake of brevity:
String[] allLines = text.split("\n");
List<String> selectedLines = new ArrayList<String>();
List<Integer> selectedIndexes = new ArrayList<Integer>();
int index = 0;
for (int i = 0; i < allLines.length; i++)
{
if (i % 100 == 0)
{
selectedLines.add(allLines[i]);
selectedIndexes.add(index);
}
index += allLines[i].length() + 1; // Add 1 for the trailing "\n"
}
Of course given the start index and the line, you can get the end index just by adding the line length :)
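For the chunk regions described in the question's update (start and end offsets of each block of 100 lines), a sketch that walks the string once with indexOf, without splitting it, could look like this. LineChunkIndexer and chunkRanges are illustrative names, and the end offset is treated as exclusive and includes the chunk's final '\n'.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper computing {start, end} character offsets per chunk of lines.
class LineChunkIndexer {

    static List<int[]> chunkRanges(String text, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<int[]> ranges = new ArrayList<>();
        int lineCount = 0;
        int chunkStart = 0;
        int pos = 0;
        while (pos < text.length()) {
            int newline = text.indexOf('\n', pos);
            // Offset just past this line (past its '\n', or end of text for the last line).
            int next = (newline < 0) ? text.length() : newline + 1;
            lineCount++;
            if (lineCount % chunkSize == 0) {
                ranges.add(new int[] {chunkStart, next}); // a full chunk is complete
                chunkStart = next;
            }
            pos = next;
        }
        if (chunkStart < text.length()) {
            ranges.add(new int[] {chunkStart, text.length()}); // trailing partial chunk
        }
        return ranges;
    }

    public static void main(String[] args) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < 350; i++) {
            text.append("123456789\n"); // 10 characters per line, as in the question
        }
        for (int[] range : chunkRanges(text.toString(), 100)) {
            System.out.println(range[0] + "-" + range[1]);
        }
    }
}
```

For the 350-line example this yields the four regions listed in the question, and text.substring(start, end) returns the lines of one chunk.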
