Java char array copying yielding weird results

I wanted to optimize my code, so instead of copying my entire char array for each letter of the alphabet, I opted to do the copying beforehand and then just write chars into the copy.
E.g.:
copy "lord" (i=0)
modify the first letter (aord, bord, cord &c)
copy "lord" (i=1)
modify the second letter (lard, lbrd, lcrd &c)
&c
for (int i = 0; i < wordLength; i++) {
    Word moddedWord = new Word(Arrays.copyOf(temp.word.content, wordLength));
    for (int c = 0; c < alphabetLength; c++) {
        if (alphabet[c] != temp.word.content[i]) {
            // Word moddedWord = new Word(Arrays.copyOf(temp.word.content, wordLength));
            moddedWord.content[i] = alphabet[c];
            Word res = WordList.Contains(moddedWord);
            if (res != null && WordList.MarkAsUsedIfUnused(res)) {
                WordRec wr = new WordRec(res, temp);
                q.Put(wr);
            }
        }
    }
}
However, after this small change my program no longer works, although it did when I used the commented-out line for copying. I've debugged this for hours on end now and can't find anything that changes this. I've tried various forms of copying, and I've tried storing the "original" word as a String and converting it to a char array whenever I need to copy it; nothing seems to work. Oh, by the way, "Word" is just a wrapper for char[] (Word.content is a char[] field).

You can't avoid copying if you want to store each modification of the word. Here:
new WordRec(res, temp);
you create a word record based on the mutable instance of the word, and then you keep changing that one instance. You'd need to copy the word inside this constructor. So the best you can achieve is copying a bit later, and possibly a bit less often thanks to the "ifology" (the surrounding conditions) within which it happens.
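A minimal sketch of the copy-in-the-constructor idea, assuming Word simply wraps a char[] as the question describes. WordRec here is a hypothetical, simplified version that takes only the word (the question's WordRec also takes the parent record). It contrasts a record that copies the array with a plain reference that keeps seeing later mutations.

```java
import java.util.Arrays;

public class DefensiveCopyDemo {
    // Minimal stand-in for the question's Word: a wrapper around a char[].
    static final class Word {
        final char[] content;
        Word(char[] content) { this.content = content; }
        @Override public String toString() { return new String(content); }
    }

    // Hypothetical record that keeps its own copy of the word it is given,
    // so later writes to the caller's array cannot change it.
    static final class WordRec {
        final Word word;
        WordRec(Word word) {
            this.word = new Word(Arrays.copyOf(word.content, word.content.length));
        }
    }

    public static void main(String[] args) {
        Word modded = new Word("lord".toCharArray());
        modded.content[0] = 'c';                 // buffer now holds "cord"
        WordRec kept = new WordRec(modded);      // stores a copy of "cord"
        Word aliased = modded;                   // stores the same instance
        modded.content[0] = 'w';                 // reuse the buffer: "word"
        System.out.println(kept.word);           // cord
        System.out.println(aliased);             // word
    }
}
```

The copy still happens, but only for words that are actually stored, which is the "copying a bit later" trade-off the answer describes.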
Now, if you really want to improve performance, then rework the WordList to be a WordSet and have O(1) lookup time with the Contains method.
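One way to get that O(1) (average) lookup, sketched with java.util.HashSet rather than a custom WordSet class. The key assumption is that Word overrides equals and hashCode to compare array contents; a bare char[] uses identity-based equals/hashCode, so without these overrides the set would never find a freshly built word.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class WordSetDemo {
    // A char[] wrapper usable as a HashSet element: equality and hashing
    // are based on the array contents, not on array identity.
    static final class Word {
        final char[] content;
        Word(char[] content) { this.content = content; }

        @Override public boolean equals(Object o) {
            return o instanceof Word && Arrays.equals(content, ((Word) o).content);
        }
        @Override public int hashCode() { return Arrays.hashCode(content); }
    }

    public static void main(String[] args) {
        Set<Word> words = new HashSet<>();
        for (String s : new String[] {"lord", "cord", "word"}) {
            words.add(new Word(s.toCharArray()));
        }
        // Hash lookup instead of scanning a list element by element.
        System.out.println(words.contains(new Word("cord".toCharArray()))); // true
        System.out.println(words.contains(new Word("lard".toCharArray()))); // false
    }
}
```

Do not mutate a Word after adding it to the set: its hash code would change and the set could no longer find it. If Contains must return the stored instance rather than a boolean, a HashMap<Word, Word> mapping each word to itself would do that.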
A final note: please respect the Java naming conventions. Methods start with a lowercase letter.

Related

Why is the second code more efficient than the first one?

I am confused by two pieces of code: why is the second one below more efficient than the first?
Both of them just reverse a String, but the first is slower than the second and I am not able to understand why.
The first code is:
String reverse1(String s) {
    String answer = "";
    for(int j = s.length() - 1; j >= 0; j--) {
        answer += s.charAt(j);
    }
    return answer;
}
The second code is:
String reverse2(String s) {
    char answer[] = new char[s.length()];
    for(int j = s.length() - 1; j >= 0; j--) {
        answer[s.length() - j - 1] = s.charAt(j);
    }
    return new String(answer);
}
I don't understand how the second version is more efficient than the first; I'd appreciate any insight on this.
The first code declares
String answer;
Strings are immutable. Therefore, every append operation reallocates the entire string, copies it, then copies in the new character.
The second code declares
char answer[];
Arrays are mutable, so each iteration copies only a single character. The final string is created once, not once per iteration of the loop.
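To put rough numbers on that difference, here is a count of character copies for a string of length n, assuming each += builds a brand-new String of length k at step k by copying the k - 1 old characters plus the new one (and ignoring any extra copying done by intermediate builders):

```latex
\underbrace{\sum_{k=1}^{n} k}_{\text{reverse1: chars copied}} = \frac{n(n+1)}{2} \in O(n^2)
\qquad\text{vs.}\qquad
\underbrace{n}_{\text{reverse2: array writes}} + \underbrace{n}_{\text{final new String}} = 2n \in O(n)
```

For n = 1000 that is 500,500 character copies versus about 2,000.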
Your question is perhaps difficult to answer exactly, in part because the answer would depend on the actual implementation of the first version. This, in turn, would depend on what version of Java you are using, and what the compiler decided to do.
Assuming that the compiler keeps the first version verbatim as you wrote it, then yes, the first version would be less efficient, because it requires allocating a new string for each step in the reversal process. The second version, by contrast, just maintains a single array of characters.
However, if the compiler were smart enough to use a single StringBuilder, the answer would change. (In practice, javac does not hoist one StringBuilder out of the loop: each += still produces a new String.) Consider the following first version:
String reverse1(String s) {
    StringBuilder answer = new StringBuilder();
    for (int j = s.length() - 1; j >= 0; j--)
        answer.append(s.charAt(j));
    return answer.toString();
}
Under the hood, StringBuilder is backed by an internal array (a char[] before Java 9, a byte[] since Java 9's compact strings). So calling StringBuilder#append is similar to the second version: it just adds new characters to the end of the buffer, growing it only when it runs out of room.
So, if your first version executes using plain String concatenation, it is less efficient than the second version, but with a StringBuilder it might be on par with the second version.
String is immutable. Whenever you do answer += s.charAt(j); it creates a new object. Try printing GC logs using -XX:+PrintGCDetails (on Java 9 and later, -Xlog:gc* replaces it) and see whether the latency is caused by minor GCs.
A String object is immutable, and every time you perform an append operation you create another object, allocating space and so on, so it's quite inefficient when you need to concatenate many strings.
Your char array method fits your specific need well, but if you need more general string concatenation support, you could consider StringBuilder.
In this code you are creating a new String object in each loop iteration, because String is an immutable class:
String reverse1(String s) {
    String answer = "";
    for (int j = s.length() - 1; j >= 0; j--)
        answer += s.charAt(j);
    return answer;
}
In this code you have already allocated memory for the char array, so your code creates only a single String, on the last line. That makes it more efficient:
String reverse2(String s) {
    char answer[] = new char[s.length()];
    for (int j = s.length() - 1; j >= 0; j--)
        answer[s.length() - j - 1] = s.charAt(j);
    return new String(answer);
}
String is immutable, so with answer += s.charAt(j); you are creating a new instance of String in each loop iteration, which makes your code slow.
Instead of String, you are advised to use StringBuilder in a single-threaded context, for both performance and readability (it might be a little slower than a fixed-size char array, but it is more readable):
String reverse1(String s) {
    StringBuilder answer = new StringBuilder("");
    for (int j = s.length() - 1; j >= 0; j--)
        answer.append(s.charAt(j));
    return answer.toString();
}
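For completeness, StringBuilder also has a built-in reverse() method, so the loop is not strictly needed. Per the StringBuilder documentation, reverse() treats surrogate pairs (for example most emoji) as single characters, whereas the char-by-char loops above would swap the two halves of such a pair. The name reverse3 is just for illustration.

```java
public class ReverseDemo {
    // Delegates to StringBuilder.reverse(), which keeps surrogate pairs intact.
    static String reverse3(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(reverse3("lord")); // drol
    }
}
```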
The JVM treats strings as immutable. Hence, every time you append to the existing string, you are actually creating a new string! This means that a new string object has to be created on the heap for every loop iteration. Creating an object and maintaining its lifecycle has overhead. Add to that the garbage collection of the discarded strings (the string created in the previous iteration is no longer referenced after the next one, so it becomes eligible for garbage collection).
You should consider using a StringBuilder. I ran some tests and the time taken by the StringBuilder code was close to that of the fixed-length array.
There are some nuances to how the JVM treats strings. For example, string interning lets the JVM reuse a single object for multiple strings with the same content; it applies to string literals, compile-time constants and explicit intern() calls, not to strings built at runtime like the ones in this loop. You might want to look into that.
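A small demonstration of that distinction, relying only on standard String behavior: the literal "drol" is interned, while the same content built at runtime by concatenation is a separate object until intern() is called. reverseByConcat is a hypothetical helper written for this illustration.

```java
public class InternDemo {
    // Builds the reversed string at runtime, one concatenation per character.
    static String reverseByConcat(String s) {
        String built = "";
        for (char c : s.toCharArray()) {
            built = c + built;                   // each step creates a new String
        }
        return built;
    }

    public static void main(String[] args) {
        String literal = "drol";                 // compile-time constant: interned
        String built = reverseByConcat("lord");  // runtime result: not interned
        System.out.println(built.equals(literal));     // true: same content
        System.out.println(built == literal);          // false: different objects
        System.out.println(built.intern() == literal); // true: intern() returns the pooled literal
    }
}
```

So interning does not save the intermediate allocations in the += loop; it only lets you canonicalize a finished string.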

java split a string and put into an array position

So I am using String.split because I need to take certain parts of a string and then print the first part. The part size may vary, so I can't use substring or a math formula. I am attempting to store everything I need in the string array and then selectively print what I need based on the position; this much I can control. However, I am not sure what to do, because I know that when I do a split, it takes the two parts and stores them in the array. But there is one case where I need that value in the array untouched. I'm afraid that if I do
format[0] = rename
it will overwrite that value and mess up the entire array. My question is: how do I assign a position to this value when I don't know what the positions of the others will be? Do I need to preemptively assign it a value, or give it the last possible position in the array? I have attached a segment of the code that deals with my question. The only thing I can add is that this is inside a bigger loop and rename's value changes every iteration. Don't pay too much attention to the comments; those are more reminders for me of what to do than descriptions of what the code is supposed to do. Any pointers, tips or help are greatly appreciated.
String format[];
rename = workbook.getSheet(sheet).getCell(column,row).getContents();
for(int i = 0; i < rename.length(); i++) {
    //may need to add[i] so it has somewhere to go and store
    if(rename.charAt(i) == '/') {
        format = rename.split("/");
    }
    else if(rename.charAt(i) == '.') {
        if(rename.charAt(0) == 0) {
            //just put that value in the array
            format = rename;
        } else {
            //round it to the tenths place and then put it into the array
            format = rename.split("\\.");
        }
    } else if(rename.charAt(i) == '%') {
        //space between number and percentage
        format = rename.split(" ");
    }
}
Whenever you assign to a variable or array element, its previous value gets overwritten. For example,
format[0] = rename
will overwrite the first element of this array of Strings.
In your example, the 'format' array is being overwritten with each iteration of the for loop. After the loop has been completed 'format' will contain only the values for the most recent split.
I would suggest looking into using an ArrayList. It is much easier to manage than a traditional array: you can simply iterate through the split values and append them at the end.
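A minimal sketch of that suggestion, under several assumptions: the spreadsheet cells are replaced by a plain String[] (the question reads them via workbook.getSheet(...)), the checks run once per value instead of once per character, and "keep untouched" is taken to mean values that start with the digit '0' (the question's charAt(0) == 0 compares against the NUL character, so '0' is a guess). Because every piece is appended to one list, later splits never overwrite earlier results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitCollector {
    // Collects the pieces of every value into one growing list.
    static List<String> collect(String[] values) {
        List<String> parts = new ArrayList<>();
        for (String rename : values) {
            if (rename.contains("/")) {
                parts.addAll(Arrays.asList(rename.split("/")));
            } else if (rename.startsWith("0")) {
                parts.add(rename);                       // keep this value untouched
            } else if (rename.contains(".")) {
                parts.addAll(Arrays.asList(rename.split("\\.")));
            } else if (rename.contains("%")) {
                parts.addAll(Arrays.asList(rename.split(" ")));
            } else {
                parts.add(rename);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] values = {"3/4", "0.25", "12.5", "50 %"};
        System.out.println(collect(values)); // [3, 4, 0.25, 12, 5, 50, %]
    }
}
```

If you later need a plain array, parts.toArray(new String[0]) converts the list once at the end.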

Best way to modify an existing string? StringBuilder or convert to char array and back to string?

I'm learning Java and am wondering what's the best way to modify strings here (both for performance and to learn the preferred method in Java). Assume you're looping through a string and checking each character/performing some action on that index in the string.
Do I use the StringBuilder class, or convert the string into a char array, make my modifications, and then convert the char array back to a string?
Example for StringBuilder:
StringBuilder newString = new StringBuilder(oldString);
for (int i = 0; i < oldString.length() ; i++) {
    newString.setCharAt(i, 'X');
}
Example for char array conversion:
char[] newStringArray = oldString.toCharArray();
for (int i = 0; i < oldString.length() ; i++) {
    newStringArray[i] = 'X';
}
String newString = String.valueOf(newStringArray);
What are the pros/cons to each different way?
I take it that StringBuilder is going to be more efficient, since converting to a char array makes copies of the array each time you update an index.
I say do whatever is most readable/maintainable until you know that String "modification" is slowing you down. To me, this is the most readable:
String s = "foo";
s += "bar";
s += "baz";
If that's too slow, I'd use a StringBuilder. You may want to compare it to StringBuffer: if performance matters and synchronization does not, StringBuilder should be faster; if synchronization is needed, use StringBuffer.
Also, it's important to know that these strings are not actually being modified: in Java, Strings are immutable, so each += creates a new String.
This is all context specific. If you optimize this code and it doesn't make a noticeable difference (and this is usually the case), then you just thought longer than you had to and you probably made your code more difficult to understand. Optimize when you need to, not because you can. And before you do that, make sure the code you're optimizing is the cause of your performance issue.
What are the pros/cons to each different way? I take it that StringBuilder is going to be more efficient since converting to a char array makes copies of the array each time you update an index.
As written, the code in your second example will create just two arrays: one when you call toCharArray(), and another when you call String.valueOf(), because the new String copies the characters into its own internal array (a char[] before Java 9, a byte[] since compact strings arrived in Java 9). The element manipulations you are performing should not trigger any object allocations. No copies of the array are made when you read or write an element.
If you are going to be doing any sort of String manipulation, the recommended practice is to use a StringBuilder. If you are writing very performance-sensitive code, and your transformation does not alter the length of the string, then it might be worthwhile to manipulate the array directly. But since you are learning Java as a new language, I am going to guess that you are not working in high frequency trading or any other environment where latency is critical. Therefore, you are probably better off using a StringBuilder.
If you are performing any transformations that might yield a string of a different length than the original, you should almost certainly use a StringBuilder; it will resize its internal buffer as necessary.
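A small illustration of a length-changing transformation; escaping & as &amp; is a hypothetical example chosen only because the output can be longer than the input. With a fixed-size char[] you would have to compute the final length up front, while StringBuilder simply grows its buffer when an append does not fit.

```java
public class EscapeDemo {
    // Output length can differ from input length; StringBuilder resizes as needed.
    static String escapeAmpersands(String input) {
        StringBuilder sb = new StringBuilder(input.length()); // initial capacity only
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '&') {
                sb.append("&amp;");              // one char in, five chars out
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeAmpersands("fish & chips")); // fish &amp; chips
    }
}
```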
On a related note, if you are doing simple string concatenation (e.g., s = "a" + someObject + "c"), the compiler will transform those operations into a chain of StringBuilder.append() calls (before Java 9; since Java 9, javac instead emits an invokedynamic call to StringConcatFactory, which is similarly optimized), so you are free to use whichever you find more aesthetically pleasing. I personally prefer the + operator. However, if you are building up a string across multiple statements, you should create a single StringBuilder.
For example:
public String toString() {
    return "{field1 =" + this.field1 +
           ", field2 =" + this.field2 +
           ...
           ", field50 =" + this.field50 + "}";
}
Here, we have a single, long expression involving many concatenations. You don't need to worry about hand-optimizing this, because the compiler will use a single StringBuilder and just call append() on it repeatedly.
String s = ...;
if (someCondition) {
    s += someValue;
}
s += additionalValue;
return s;
Here, you'll end up with two StringBuilders being created under the covers, but unless this is an extremely hot code path in a latency-critical application, it's really not worth fretting about. Given similar code with many more separate concatenations, it might be worth optimizing. The same goes if you know the strings might be very large. But don't just guess: measure! Demonstrate that there's a performance problem before you try to fix it. (Note: this is just a general rule for "micro optimizations"; there's rarely a downside to explicitly using a StringBuilder. But don't assume it will make a measurable difference: if you're concerned about it, you should actually measure.)
String s = "";
for (final Object item : items) {
    s += item + "\n";
}
Here, we're performing a separate concatenation operation on each loop iteration, which means a new StringBuilder will be allocated on each pass. In this case, it's probably worth using a single StringBuilder, since you may not know how large the collection will be. I would consider this an exception to the "prove there's a performance problem before optimizing" rule: if the operation has the potential to explode in complexity based on input, err on the side of caution.
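A sketch of the single-StringBuilder version of that loop; joinLines is a hypothetical name. One builder is created before the loop and every item is appended to it, so there is no hidden builder or intermediate String per iteration.

```java
import java.util.Arrays;
import java.util.List;

public class JoinDemo {
    // One StringBuilder for the whole loop instead of one per += statement.
    static String joinLines(List<?> items) {
        StringBuilder sb = new StringBuilder();
        for (Object item : items) {
            sb.append(item).append('\n');        // same output as s += item + "\n"
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(joinLines(Arrays.asList("apple", "banana")));
    }
}
```

If the items are already strings and no trailing newline is wanted, String.join("\n", items) (Java 8+) does the same job in one call.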
Which option will perform the best is not an easy question.
I did a benchmark using Caliper:
Method            Runtime (ns)
array                       88
builder                    126
builderTillEnd              76
concat                    3435
Benchmarked methods:
public static String array(String input)
{
    char[] result = input.toCharArray(); // COPYING
    for (int i = 0; i < input.length(); i++)
    {
        result[i] = 'X';
    }
    return String.valueOf(result); // COPYING
}
public static String builder(String input)
{
    StringBuilder result = new StringBuilder(input); // COPYING
    for (int i = 0; i < input.length(); i++)
    {
        result.setCharAt(i, 'X');
    }
    return result.toString(); // COPYING
}
public static StringBuilder builderTillEnd(String input)
{
    StringBuilder result = new StringBuilder(input); // COPYING
    for (int i = 0; i < input.length(); i++)
    {
        result.setCharAt(i, 'X');
    }
    return result;
}
public static String concat(String input)
{
    String result = "";
    for (int i = 0; i < input.length(); i++)
    {
        result += 'X'; // terrible COPYING, COPYING, COPYING... same as:
        // result = new StringBuilder(result).append('X').toString();
    }
    return result;
}
Remarks
If we want to modify a String, we have to make at least one copy of the input String, because Strings in Java are immutable.
java.lang.StringBuilder extends java.lang.AbstractStringBuilder. StringBuilder.setCharAt() is inherited from AbstractStringBuilder and, in Java 8, looks like this:
public void setCharAt(int index, char ch) {
    if ((index < 0) || (index >= count))
        throw new StringIndexOutOfBoundsException(index);
    value[index] = ch;
}
In Java 8, AbstractStringBuilder internally uses a plain char array, char value[] (since Java 9 it is a byte[] plus an encoding flag). So result[i] = 'X' is very similar to result.setCharAt(i, 'X'); however, the latter calls a virtual method (which the JVM will probably inline) and performs an explicit bounds check, so it will be slightly slower.
Conclusions
If you can operate on StringBuilder until the end (you don't need String back) - do it. It's the preferred way and also the fastest. Simply the best.
If you want a String at the end and this is the bottleneck of your program, then you might consider using a char array. In this benchmark the char array version took about 30% less time than the StringBuilder version (88 ns vs 126 ns). Be sure to properly measure the execution time of your program before and after optimizing, because there is no guarantee you will see the same difference.
Never concatenate Strings in a loop with + or +=, unless you really know what you are doing. Usually it's better to use an explicit StringBuilder and append().
I'd prefer to use StringBuilder class where original string is modified.
For String manipulation, I like the StringUtils class. You'll need the Apache Commons Lang dependency to use it.

Determining if a given string of words has words greater than 5 letters long

So, I'm in need of help on my homework assignment. Here's the question:
Write a static method, getBigWords, that gets a String parameter and returns an array whose elements are the words in the parameter that contain more than 5 letters. (A word is defined as a contiguous sequence of letters.) So, given a String like "There are 87,000,000 people in Canada", getBigWords would return an array of two elements, "people" and "Canada".
What I have so far:
public static getBigWords(String sentence)
{
    String[] a = new String;
    String[] split = sentence.split("\\s");
    for(int i = 0; i < split.length; i++)
    {
        if(split[i].length => 5)
        {
            a.add(split[i]);
        }
    }
    return a;
}
I don't want an answer, just a means to guide me in the right direction. I'm a novice at programming, so it's difficult for me to figure out what exactly I'm doing wrong.
EDIT:
I've now modified my method to:
public static String[] getBigWords(String sentence)
{
    ArrayList<String> result = new ArrayList<String>();
    String[] split = sentence.split("\\s+");
    for(int i = 0; i < split.length; i++)
    {
        if(split[i].length() > 5)
        {
            if(split[i].matches("[a-zA-Z]+"))
            {
                result.add(split[i]);
            }
        }
    }
    return result.toArray(new String[0]);
}
It prints out the results I want, but the online software I use to turn in the assignment still says I'm doing something wrong. More specifically, it states:
Edith de Stance states:
⇒     You might want to use: +=
⇒     You might want to use: ==
⇒     You might want to use: +
I'm not really sure what that means...
The main problem is that you can't have an array that makes itself bigger as you add elements.
You have 2 options:
ArrayList (basically a variable-length array).
Make an array guaranteed to be bigger.
Also, some notes:
The definition of an array needs to look like:
int size = ...; // V- note the square brackets here
String[] a = new String[size];
Arrays don't have an add method, you need to keep track of the index yourself.
You're currently only splitting on spaces, so 87,000,000 will also match. You could validate the string manually to ensure it consists of only letters.
It's >=, not =>.
I believe the function needs to return an array:
public static String[] getBigWords(String sentence)
It actually needs to return something:
return result.toArray(new String[0]);
rather than
return null;
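A sketch of option 2 from the list above ("make an array guaranteed to be bigger"), combined with the manual index tracking the notes mention. It allocates one slot per token, fills slots while counting, then trims the result with java.util.Arrays.copyOf. Splitting on whitespace and the [a-zA-Z]+ check are carried over from the question's edited version.

```java
import java.util.Arrays;

public class BigWords {
    // Allocate an array that is big enough (one slot per token), fill it while
    // tracking the next free index, then trim it to the number of words found.
    public static String[] getBigWords(String sentence) {
        String[] tokens = sentence.split("\\s+");
        String[] result = new String[tokens.length];
        int count = 0;                                   // next free slot in result
        for (String token : tokens) {
            if (token.length() > 5 && token.matches("[a-zA-Z]+")) {
                result[count++] = token;
            }
        }
        return Arrays.copyOf(result, count);            // drop the unused slots
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
                getBigWords("There are 87,000,000 people in Canada")));
        // prints [people, Canada]
    }
}
```

Note that tokens with attached punctuation, such as "Canada.", are rejected by this whitespace-based approach.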
The "You might want to use" suggestions indicate that you might have to process each word character by character.
First, try printing out all the elements of your split array. Remember, you only want to look at words, so check whether that is the case by printing each element of the split array inside your for loop. (I suspect you will get a false positive at the moment.)
Also, you need to revisit your books on arrays in Java. You cannot dynamically add elements to an array, so you will need a different data structure to be able to use an add() method. An ArrayList of Strings would help you here.
Split your string on whitespace; that returns an array. You can then check the length of each word by iterating over that array.
You can split the string this way: myString.split("\\s+");
Try this...
public static String[] getBigWords(String sentence)
{
    java.util.ArrayList<String> result = new java.util.ArrayList<String>();
    String[] split = sentence.split("\\s+");
    for(int i = 0; i < split.length; i++)
    {
        if(split[i].length() > 5 && split[i].matches("[a-zA-Z]+"))
        {
            result.add(split[i]);
        }
        if (split[i].matches("[a-zA-Z]+,"))
        {
            // Drop the trailing comma before measuring, so "There," (5 letters) is excluded
            String temp = "";
            for(int j = 0; j < split[i].length(); j++)
            {
                if(split[i].charAt(j) != ',')
                {
                    temp += split[i].charAt(j);
                }
            }
            if (temp.length() > 5)
            {
                result.add(temp);
            }
        }
    }
    return result.toArray(new String[0]);
}
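What you have done is correct, but you can't use an add method on an array. You should assign instead, like a[position] = split[i];. If you want to ignore numbers, you need an extra check; note that Float has no isNumber() method, so you could, for example, test each character with Character.isLetter().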
Your logic is valid, but you have some syntax issues. If you are not using an IDE like Eclipse that shows you syntax errors, try commenting out lines to pinpoint which ones are syntactically incorrect. I want to also tell you that once an array is created its length cannot change. Hopefully that sets you off in the right directions.
Apart from the syntax errors, the String array declaration should look like new String[n],
and arrays have no add method, so you should use something like
a[i] = split[i];
You need to add another condition alongside the length condition to check that the given word consists only of letters. This can be done in 2 ways:
the first is to use the Character.isLetter() method, and the second is to create a regular expression
that checks the string has only letters. Look up regular expressions and use a Matcher (both classes are in java.util.regex), like below:
Pattern pattern = Pattern.compile("[a-zA-Z]+");
Matcher matcher = pattern.matcher(split[i]);
boolean onlyLetters = matcher.matches();
Finally, use another counter (let's say j = 0) to store output values, and increment it each time you store a string in the array:
a[j++] = split[i];
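Since the assignment defines a word as a contiguous sequence of letters, another option is to let a Matcher find those letter runs directly with Matcher.find() instead of splitting on whitespace first. This is a sketch that swaps in a different technique from the answers above; it treats only ASCII letters as letters, and it also handles words with attached punctuation such as "Canada.".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BigWordsRegex {
    // A "word" is a maximal run of ASCII letters, per the assignment text.
    private static final Pattern WORD = Pattern.compile("[a-zA-Z]+");

    public static String[] getBigWords(String sentence) {
        List<String> result = new ArrayList<>();
        Matcher matcher = WORD.matcher(sentence);
        while (matcher.find()) {                 // scan left to right for letter runs
            String word = matcher.group();
            if (word.length() > 5) {
                result.add(word);
            }
        }
        return result.toArray(new String[0]);
    }
}
```

For "There are 87,000,000 people in Canada" this returns "people" and "Canada"; digits and commas simply never match the pattern.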
I would use a string tokenizer (the java.util.StringTokenizer class).
Iterate through each entry and if the string length is more than 4 (or whatever you need) add to the array you are returning.
You said no code, so... (This is like 5 lines of code)

startsWith(String) method and arrays

I have to take a string and convert the string to piglatin. There are three rules to piglatin, one of them being:
if the english word starts with a vowel return the english word + "yay" for the piglatin version.
So i tried doing this honestly expecting to get an error because the startsWith() method takes a string for parameters and not an array.
public String pigLatinize(String p){
    if(pigLatRules(p) == 0){
        return p + "yay";
    }
}
public int pigLatRules(String r){
    String vowel[] = {"a","e","i","o","u","A","E","I","O","U"};
    if(r.startsWith(vowel)){
        return 0;
    }
}
But if I can't use an array, I'd have to do something like this:
if(r.startsWith("a")||r.startsWith("A")....);
return 0;
and test for every single vowel (not including y), which would take up a very large amount of space, and I personally think it would look rather messy.
As I write this, I'm thinking of somehow testing it through iteration.
String vowel[] = new String[10];
for(i = 0; i<vowel[]; i++){
    if(r.startsWith(vowel[i]){
        return 0;
    }
I don't know if that attempt at iteration even makes sense though.
Your code:
String vowel[] = new String[10];
for(i = 0; i<vowel[]; i++){
    if(r.startsWith(vowel[i]){
        return 0;
    }
}
Is actually really close to a solution that should work (assuming you actually put some values in the array).
What values do you need to put in it? As you mentioned, you can populate the array with all the possible values for vowels, those of course being:
String[] vowel={"a","A","e","E","i","I","o","O","u","U"};
Now that you have this, you would want to loop (as you worked out) over the array and do your check:
public int pigLatRules(String r){
    final String[] vowels={"a","A","e","E","i","I","o","O","u","U"};
    for(int i = 0; i< vowels.length; i++){
        if(r.startsWith(vowels[i])){
            return 0;
        }
    }
    return 1;
}
There are some improvements you can make to this, though. Some are best practice, some are just a matter of choice, and some are about performance.
As for best practice: you are currently returning an int from this function. It would be better to make it return a boolean (I recommend looking them up if you have not encountered them).
As for choice: you say you do not like having an array with both the uppercase and lowercase vowels in it. Here is a little bit of information. Strings have lots of methods (http://docs.oracle.com/javase/1.4.2/docs/api/java/lang/String.html); one of them is toLowerCase(), which, as you can guess, lowercases a whole string. If you apply it to the word you pass in to your function, you cut the number of checks you need to do in half.
There is a lot more you can get into, but this is just a little bit.
Put all those characters in a HashSet and then just perform a lookup to see if the character is valid or not and return 0 accordingly.
Please go through some examples of HashSet insertion and lookup. It should be straightforward.
Hope this helps.
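A minimal sketch of the HashSet approach, returning a boolean rather than the question's 0/1 int; startsWithVowel is a hypothetical name. The vowels are loaded into a Set<Character> once, and each call does a single lookup on the first character.

```java
import java.util.HashSet;
import java.util.Set;

public class VowelSetDemo {
    // Vowels stored once; HashSet.contains is an average O(1) lookup.
    private static final Set<Character> VOWELS = new HashSet<>();
    static {
        for (char c : "aeiouAEIOU".toCharArray()) {
            VOWELS.add(c);                       // autoboxed to Character
        }
    }

    static boolean startsWithVowel(String word) {
        return !word.isEmpty() && VOWELS.contains(word.charAt(0));
    }

    public static void main(String[] args) {
        System.out.println(startsWithVowel("apple")); // true
        System.out.println(startsWithVowel("lord"));  // false
    }
}
```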
Put all the vowels in a string, grab the first char in the word you are testing and just see if your char is in the string of all vowels.
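A sketch of this string-of-vowels idea, also folding in the toLowerCase suggestion from the earlier answer (applied to the first character via Character.toLowerCase) so only "aeiou" needs listing. pigLatinize here implements only the vowel rule quoted in the question; the placeholder return for other words is an assumption, since the other two rules are not shown.

```java
public class VowelStringDemo {
    // Checks the first character against a string of vowels; lowercasing it
    // first means only the lowercase vowels need to be listed.
    static boolean startsWithVowel(String word) {
        return !word.isEmpty()
                && "aeiou".indexOf(Character.toLowerCase(word.charAt(0))) >= 0;
    }

    // Only the vowel rule from the question; the other two rules are not shown there.
    static String pigLatinize(String word) {
        if (startsWithVowel(word)) {
            return word + "yay";
        }
        return word; // placeholder for the remaining rules
    }

    public static void main(String[] args) {
        System.out.println(pigLatinize("apple")); // appleyay
        System.out.println(pigLatinize("Egg"));   // Eggyay
    }
}
```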
