simplifying code via refactoring - java

Is there a refactoring tool, for either C or Java, that can simplify this type of redundant code? I believe this is called data propagation.
This is essentially what an optimizing compiler would do.
public int foo() {
    int a = 3;
    int b = 4;
    int c = a + b;
    int d = c;
    System.out.println(c);
    return c;
}
into
public int foo() {
    int c = 7;
    System.out.println(c);
    return c;
}

I don't think it's a good idea.
Take, for example, the following code:
long hours = 5;
long timeInMillis = hours * 60 * 60 * 1000;
That's much cleaner and more understandable than just:
long timeInMillis = 18000000;
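(As an aside, if readability is the goal, the standard library can express this even more directly. The following is only an illustrative sketch using java.util.concurrent.TimeUnit, not code from the original question:)
import java.util.concurrent.TimeUnit;

long hours = 5;
// TimeUnit does the unit conversion and makes the intent explicit
long timeInMillis = TimeUnit.HOURS.toMillis(hours); // 18,000,000 ms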

I can offer a solution for C. My solution uses the two tools that I described in another answer here (in reverse order).
Here is your program, translated to C:
int foo() {
    int a = 3;
    int b = 4;
    int c = a + b;
    int d = c;
    printf("%d", c);
    return c;
}
Step 1: Constant propagation
$ frama-c -semantic-const-folding t.c -lib-entry -main foo
...
/* Generated by Frama-C */
/*# behavior generated:
assigns \at(\result,Post) \from \nothing; */
extern int ( /* missing proto */ printf)() ;
int foo(void)
{
  int a ;
  int b ;
  int c ;
  int d ;
  a = 3;
  b = 4;
  c = 7;
  d = 7;
  printf("%d",7);
  return (c);
}
Step 2: Slicing
$ frama-c -slice-calls printf -slice-return foo -slice-print tt.c -lib-entry -main foo
...
/* Generated by Frama-C */
extern int printf() ;
int foo(void)
{
  int c ;
  c = 7;
  printf("%d",7);
  return (c);
}

Yes, the best refactoring tool I've seen people using is their brain.
The brain seems a remarkably good tool for logically organising code for consumption by other brains. It can also be used to enhance the code with comments, where appropriate, and impart additional meaning with layout and naming.
Compilers are good for optimising the code for consumption by an underlying layer closer to the transistors that make up the processor. One of the benefits of a higher-generation programming language is that it doesn't read like something a machine made.
Apologies if this seems a little glib and unhelpful. I have certainly used various tools, but I don't recall any tool that handled "data propagation."

Eclipse (and I'm sure NetBeans and IntelliJ) has almost all these refactorings available. I'll give the specifics with Eclipse. Start with:
public int foo() {
    int a = 3;
    int b = 4;
    int c = a + b;
    int d = c;
    System.out.println(c);
    return c;
}
First, d will be flagged with a warning that you have an unread local variable. Press <CTRL>+1 on that line and select "Remove d and all assignments". Then you have:
public int foo() {
    int a = 3;
    int b = 4;
    int c = a + b;
    System.out.println(c);
    return c;
}
Next, highlight the a in int c = a + b; and type <CTRL>+<ALT>+I to inline a. Repeat with b and you will have:
public int foo() {
    int c = 3 + 4;
    System.out.println(c);
    return c;
}
Now you're almost there. I don't know of a refactoring to convert 3 + 4 into 7. It seems like it would be easy for someone to implement, but it is probably not a common use case, since, as others have pointed out, depending on the domain 3 + 4 can be more expressive than 7. You could go further and inline c, giving you:
public int foo() {
    System.out.println(3 + 4);
    return 3 + 4;
}
But it is impossible to know whether this is an improvement or a step backwards without knowing the 'real' problem with the original code.

The semantic information of the code may get lost, and possible dependencies might break. In short: only the programmer knows which variables are important or may become important, since only the programmer knows the context of the code. I'm afraid you'll have to do the refactoring yourself.

Yes, IntelliJ offers this functionality in its Community Edition. Now, to address a more serious issue: I am pretty sure you are mixing up compilation with refactoring. When you compile something, you (essentially) take a language higher than machine code and convert it into machine code. What you want is to remove redundant declarations inside the high-level source file (.c, .java, etc.). It is quite possible that the compiler has already optimized the less-than-great code into what you propose; there are tools available to see what it is doing. In terms of refactoring, less is typically better, but do not sacrifice maintainability for fewer lines of code.
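If you want to check what the Java compiler itself does, one option is to compile a small class and look at the bytecode with javap -c. The class below is only an illustration made up for this purpose (the names are mine, not from the question): javac typically folds constant expressions such as A + B at compile time, while the a + b case with plain locals is left for the JIT to optimize at run time.
public class FoldingDemo {
    static final int A = 3; // compile-time constants
    static final int B = 4;

    static int folded() {
        return A + B; // a constant expression: javac can fold this to 7
    }

    static int notFolded() {
        int a = 3; // plain (non-final) locals are not constant expressions,
        int b = 4; // so any folding here is left to the JIT
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(folded() + " " + notFolded());
    }
}
Compile with javac FoldingDemo.java and inspect with javap -c FoldingDemo to compare the two methods.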

One possible approach is to put it into a symbolic math program (like Mathematica or Maple) and have it do the simplification for you. It will do so regardless of whether the values are constants or not.
The drawback is that you need to convert the code to a different language. (Though it could be mostly copy and paste if the syntax is similar.) Furthermore, it could be dangerous if you expect certain integer types to overflow at a specific size. Symbolic math programs don't care and will optimize it according to the "math". Same thing goes for floating-point round-off errors.
In your example, if you enter this into Mathematica:
a = 3;
b = 4;
c = a + b;
d = c;
c
Mathematica will output:
7
Of course you can't just copy and paste because it's a different language and different syntax, but it's the best thing I have in mind for your question. I myself use Mathematica to simplify expressions and other math before I throw it into C/C++.
For a more complicated example involving unknowns:
Original C Code:
int a = 3 + x*x;
int b = 4 + y*y;
int c = a + b - 7 + 2*x*y;
int d = c;
Enter this into Mathematica (which is still mostly copy+paste):
a = 3 + x*x;
b = 4 + y*y;
c = a + b - 7 + 2*x*y;
d = c;
FullSimplify[c]
Output:
(x + y)^2
Which transforms back into the following C code:
d = (x + y);
d = d * d;
This is obviously much simpler than the original code. In general, symbolic programs will handle even non-trivial expressions and will do just as well as (or better than) any compiler's internal optimizer.
The final drawback is that symbolic math programs like Mathematica or Maple aren't free and are fairly expensive. SAGE is an open-source program, but I hear it is not as good as either Mathematica or Maple.

If you're talking about C, you could look at the compiled, optimized assembly code. Then you could refactor your C code to the same structure as the optimized assembly. Like Alfredo said, though, that could lead to more ambiguous code.

Why not compile the code using an optimizing compiler and then decompile it? This is just a thought; I have not tried it out.

Related

Code duplication vs new variable creation [closed]

Which of the two options below is considered good programming practice?
This is a case of two options:
duplicates code, but does not create new variables / objects
if (a > b) {
    calcSum(a, b) + calcDiff(a, b);
} else {
    calcSum(b, a) + calcDiff(b, a);
}
vs
does not duplicate code, but creates new variables / objects
int big;
int small;
if (a > b) {
    big = a;
    small = b;
} else {
    big = b;
    small = a;
}
calcSum(big, small) + calcDiff(big, small);
Despite what others here have said, I think the second option is preferred. Duplication of your business logic makes it hard to test and hard to maintain, so you should look for ways to avoid duplicating it.
Another way to do that would be to make it a helper method (calculateSumPlusDifference(int max, int min)) and then use it:
if (a > b) {
    calculateSumPlusDifference(a, b);
} else {
    calculateSumPlusDifference(b, a);
}
This way you still get compact logic around a and b, but you avoid duplicating your business logic.
I would say the first option is preferable, as it is both more compact and doesn't create new variables (although that is not really a big concern; they would probably be optimized out anyway).
Also, you can use the Math library's min and max methods to do it all on one line if you so desire.
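For illustration, that one-liner could look like the line inside the sketch below (assuming calcSum and calcDiff behave as in the question; the wrapper method is only there to keep the snippet self-contained):
// Illustrative sketch only; calcSum and calcDiff stand in for the question's methods.
static int calcSum(int x, int y)  { return x + y; }
static int calcDiff(int x, int y) { return x - y; }

static int sumPlusDiff(int a, int b) {
    // Math.max/Math.min pick the larger and smaller value, so no if/else and no extra variables
    return calcSum(Math.max(a, b), Math.min(a, b)) + calcDiff(Math.max(a, b), Math.min(a, b));
}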
They both work. However, you really want to decide what's more important: readability or efficiency. For instance, some people suggested using Math.max; sure, that works, but it arguably adds extra method calls (though in practice the JIT usually inlines Math.max), and it is more readable. So we come back to the same question: readability or efficiency.
Weigh your options and see what is maintainable, readable, and efficient. In the end they both obviously work; use your own judgement based on the project.
TL;DR: if that line of code is going to need to run 100+ times per millisecond, go with the second approach; if it's only run once in practically linear code, go with readability.
I feel like these answers are all missing the point of the question: is it better to duplicate code or to spend memory creating new variables to avoid that duplication?
Well, we can actually put that to the test, can't we.
I wrote a class to run both of your examples with random values of a and b (between 1 and 10), 100,000,000 times each. The results are actually very revealing:
//trial 1
Without Creating Variables: 2.220987762 seconds
vs With Creating Variables: 2.218305816 seconds
//trial 2
Without Creating Variables: 2.215427479 seconds
vs With Creating Variables: 2.220663639 seconds
//trial 3
Without Creating Variables: 2.345803733 seconds
vs With Creating Variables: 2.347936366 seconds
Basically, there is practically no difference between the two options speed-wise—even when running them one hundred million times.
So, with that established, the option that represents best practice becomes whichever one makes the code more readable and understandable, which is definitely the second.
Here's the test class, if you'd like to check for yourself:
import java.util.Random;

public class Test {
    public static void main(String[] args) {
        int testLoops = 100000000;
        Random rand = new Random();

        // first test
        long startTime1 = System.nanoTime();
        for (int i = 0; i < testLoops; i++) {
            int a = rand.nextInt(10) + 1;
            int b = rand.nextInt(10) + 1;
            int answer;
            if (a > b) {
                answer = calcSum(a, b) + calcDiff(a, b);
            } else {
                answer = calcSum(b, a) + calcDiff(b, a);
            }
        }
        double seconds1 = (double) (System.nanoTime() - startTime1) / 1000000000.0;
        System.out.println("Without Creating Variables: " + seconds1 + " seconds");

        // second test
        long startTime2 = System.nanoTime();
        for (int i = 0; i < testLoops; i++) {
            int big;
            int small;
            int a = rand.nextInt(10) + 1;
            int b = rand.nextInt(10) + 1;
            int answer;
            if (a > b) {
                big = a;
                small = b;
            } else {
                big = b;
                small = a;
            }
            answer = calcSum(big, small) + calcDiff(big, small);
        }
        double seconds2 = (double) (System.nanoTime() - startTime2) / 1000000000.0;
        System.out.println(" With Creating Variables: " + seconds2 + " seconds");
    }

    public static int calcSum(int a, int b) {
        return a + b;
    }

    public static int calcDiff(int a, int b) {
        return a - b;
    }
}
In the example that you provided, the difference is negligible, in terms of speed as well as readability.
It was obviously "pseudo"-code, otherwise I'd propose a method like
int calcBoth(int a, int b) {
    return calcSum(a, b) + calcDiff(a, b);
}
// Call:
if (a > b) calcBoth(a, b);
else calcBoth(b, a);
Maybe something like this is applicable in your case as well.
The functional programmer purist in me might suggest:
public int calc(int a, int b) {
    return (a >= b)
        ? calcSum(a, b) + calcDiff(a, b)
        : calc(b, a);
}
At the cost of one recursive call you have eliminated all temporary variables and succinctly stated your intent.
An efficient compiler may even recognize the tail call and avoid the extra frame (although the recursion here is at most one level deep anyway).
I prefer the second because it's easier to see your intentions, but I would have just rewritten calcDiff so that the order of the parameters doesn't matter.
public int calcDiff(int a, int b) {
    if (a > b)
        return a - b;
    else
        return b - a;
}
Presumably the order doesn't matter in calcSum.
public int calcSum(int a, int b) {
    return a + b;
}
Then all you need is calcSum(a, b) + calcDiff(a, b).
But as a more general rule I'm not afraid of creating new variables if it improves readability. It doesn't usually make much difference to speed.

Code hunt game 02.05 [closed]

The idea is to make the code as elegant as you can. Theme of the section: loops.
Task: return the sum of the squares of the numbers from 1 to (n-1).
Example: 6 -> 55 (which is 1^2 + 2^2 + 3^2 + 4^2 + 5^2).
I chose Java as language and wrote this code:
public class Program {
    public static int Puzzle(int n) {
        int r = 0;
        for (--n; n >= 0; r += n * n--);
        return r;
    }
}
but the compiler says that my code is not elegant enough. Can you help?
Link: CodeHunt
You don't need a loop to sum that series. See http://en.wikipedia.org/wiki/Square_pyramidal_number
public class Program {
    public static int Puzzle(int n) {
        int x = n - 1;
        // sum of the first n-1 squares: x(x+1)(2x+1)/6, e.g. n = 6 gives 5*6*11/6 = 55
        return x * (x + 1) * (2 * x + 1) / 6;
    }
}
Elegance may be a combination of simplicity and accuracy. The biggest issue with your method is that it isn't simple; it may produce the correct result, but it's needlessly complicated with unusual iteration and the fact that you're running a for-loop for its side effects.
Why not go with the more direct approach instead?
public static int Puzzle(int n) {
    int sum = 0;
    for (int i = 1; i < n; i++) { // squares of 1 .. n-1
        sum += Math.pow(i, 2);
    }
    return sum;
}
Several very inelegant aspects:
Inline increments/decrements: they make the code very confusing, because most people are not experts in when the variable will actually decrement; what will, for instance, happen with m = (n--)*(--n)? (The answer is m = n*(n-2); n -= 2; see the sketch after this list.) For some small, simple expressions inline decrements make code more readable, but in nearly all cases there is no performance gain, as the compiler is smart enough to convert readable code into code with inline increments/decrements itself.
Loops with no body: most people simply get confused and think the next instruction is part of the body. Most IDEs even advise always using braces and writing something in the body.
Manipulation of parameters: this is confusing and makes code less extensible. Say you want to extend your code with some part below and you copy-paste it; since the parameter no longer has its original value, the pasted code will work differently. IDEs mostly advise making at least a copy. Nearly every compiler can optimize this away if it turns out the parameter is not used any further.
Decrementing in a for loop: although this sometimes yields a small performance improvement, most programmers are used to for loops that increment.
Non-descriptive variable names (something a compiler cannot detect): it is recommended that you name your variables appropriately, e.g. sum instead of r. The Java compiler sees names simply as identifiers, so at runtime there is no difference; it is, however, more readable for other people and for yourself when you revisit your code months later.
These are all very bad ways to write an algorithm. Most books strongly suggest that unless you really need to squeeze the absolute maximum out of your CPU, you had better write nice, well-structured and readable code. And if that really is the case, there are more efficient languages than Java.
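To make the inline-decrement point in the list above concrete, here is a tiny check of the m = (n--)*(--n) example (an illustration, not code from the question):
public class DecrementDemo {
    public static void main(String[] args) {
        int n = 10;
        int m = (n--) * (--n); // n-- yields 10 (n becomes 9), then --n makes n 8 and yields 8
        System.out.println(m + " " + n); // prints "80 8", i.e. m = 10*(10-2) and n = 10-2
    }
}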
As a better version, I recommend the following code:
public class Program {
    public static int Puzzle(int n) {
        int sum = 0;
        for (int i = 1; i < n; i++) {
            sum += i * i;
        }
        return sum;
    }
}
Furthermore you don't need a for-loop to calculate this (as pointed out here):
public class Program {
    public static int Puzzle(int n) {
        return n * (n - 1) * (2 * n - 1) / 6;
    }
}
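As a quick illustrative check of the closed form against the question's example (6 -> 55):
public class Check {
    public static void main(String[] args) {
        // 1 + 4 + 9 + 16 + 25 = 55
        System.out.println(Program.Puzzle(6)); // prints 55
    }
}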

Need explanation of two different approaches to find a factorial

I am a beginner. I already learned C, but now Java seems difficult to me. In C my approach was simple, but when I looked at the book's programs for a simple task such as factorial, it gives very complex programs like the one below:
class Factorial {
    // this is a recursive method
    int fact(int n) {
        int result;
        if (n == 1) return 1;
        result = fact(n - 1) * n;
        return result;
    }
}

class Recursion {
    public static void main(String args[]) {
        Factorial f = new Factorial();
        System.out.println("Factorial of 3 is " + f.fact(3));
        System.out.println("Factorial of 4 is " + f.fact(4));
        System.out.println("Factorial of 5 is " + f.fact(5));
    }
}
Instead, when I made my own program (given below), keeping it simple, it also worked and was easy. Can anyone tell me what the difference is between the two?
public class Simplefacto {
    public static void main(String[] args) {
        int n = 7;
        int result = 1;
        for (int i = 1; i <= n; i++) {
            result = result * i;
        }
        System.out.println("The factorial of 7 is " + result);
    }
}
Also, can anyone tell me what Java EE and Java SE are?
The first approach is recursion, which is not always fast and easy (and usually leads to a StackOverflowError if you are not careful). The second approach is a normal for loop. Interestingly, both approaches are valid even in C.
I think you should not compare Java programs with C programs. Both languages were designed for different reasons.
There are two main differences between those programs:
Program 1 uses recursion
Program 2 uses the imperative approach
Program 1 uses a class where all the program logic is encapsulated
Program 2 has all the logic "like the good old C programs" in one method
The first method is recursive. This means that the method calls itself, and the idea behind this is that recursion (when used appropriately) can yield extremely clean code, much like your factorial method. Formatted correctly, it should look more like:
private int factorial(int n) {
    if (n == 1) return n;
    return factorial(n - 1) * n;
}
So that's a factorial calculator in two lines, which is extremely clean and short. The problem is that you can run into trouble for large values of n, namely the infamous StackOverflowError.
The second method is what is known as iterative. Iterative methods usually involve some form of loop and are the alternative to recursion. The advantage is that they make for quite readable and easy-to-follow code, even if it is somewhat more verbose and lengthy. This code is more robust and won't fall over for large values of n, although the result silently overflows once n! > Integer.MAX_VALUE.
In the first case, you are adding behavior that can be reused from multiple places or from main(), while in the second case you are putting the code inline, where it is not reusable. The other difference is recursion vs. iteration: fact() is based on recursion, while the inline code in main() achieves the same thing using iteration.
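To make that last point concrete, the iterative version can be made just as reusable by extracting it into its own method; this is only a sketch of that idea, not code from the question:
public class IterativeFactorial {
    // the same loop as in Simplefacto, extracted into a reusable method
    static int fact(int n) {
        int result = 1;
        for (int i = 2; i <= n; i++) {
            result = result * i;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("The factorial of 7 is " + fact(7)); // 5040
    }
}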

Simple assignment coming up with wrong value

private static void convert(int x) {
    // assume we've passed in x=640.
    final int y = (x + 64 + 127) & (~127);
    // as expected, y = 768
    final int c = y;
    // c is now 320?!
}
Are there any sane explanations for why the above code would produce the values above? This method is called from JNI. The x that is passed in is originally a C++ int type that is static_cast to a jint like so: static_cast<jint>(x);
In the debugger, with the breakpoint set on the y assignment, I see x=640. Stepping one line, I see y=768. Stepping another line and c=320. Using the debugger, I can set the variable c = y and it will correctly assign it 768.
This code is single threaded and runs many times per second and the same result is always observed.
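For reference, the same expression evaluated in a plain Java program (outside JNI) gives the expected 768; the standalone sketch below (not the original JNI setup) just re-checks the arithmetic:
public class ConvertCheck {
    public static void main(String[] args) {
        int x = 640;
        int y = (x + 64 + 127) & (~127); // 831 & ~127 rounds down to a multiple of 128
        System.out.println(y); // prints 768
    }
}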
Update from comments below
This problem has now disappeared entirely after a day of debugging it. I'd blame it on cosmic rays if it didn't happen reproducibly for an entire day. Oddest thing I've seen in a very long time.
I'll leave this question open for a while in case someone has some insight on what could possibly cause this.
Step 1: make sure it compiles correctly; see the comments under your post.
If needed, it works with this code:
C# Code:
private void callConvert(object sender, EventArgs e)
{
    string myString = Convert.ToString(convert123(640));
    textBox1.Text = myString;
}

private static int convert123(int x) {
    // assume we've passed in x=640.
    int y = (x + 64 + 127) & (~127);
    // as expected, y = 768
    int c = y;
    // c is now 320?!
    return (c);
}
but it's C# code.
And a tip for you: NEVER name your function after something that is already standard in the language or framework.
convert is used in most languages
(e.g. System.Convert).
Have you set c to 320 recently? If so, it may have been stored in some memory and the compiler may have reassigned it to what it thought it was and not what it should be. I am, in part, guessing though.
It looks like a problem with the byte size of temporary variables when the program is optimized for memory usage; the debugger may not be reliable. If the temporary ~127 is stored in a byte, you might reach the scenario you observed. It all depends on what ~127 is stored in at run time.

Java: set value and check condition at same time

(In the process of writing my original question, I answered it, but the information might be useful to others, and I thought of a new question)
For instance:
int x;
if (x = 5) { ... }
Creates an error:
Type mismatch: cannot convert from int to boolean. (Because assignment doesn't return a boolean value.)
However,
int x;
if ((x = 5) == 5) {
    System.out.println("hi!");
}
will print out "hi!"
And similarly,
String myString = "";
if ((myString = "cheese").equals("cheese")) {
System.out.println(myString);
}
prints out "cheese"
Sadly,
if ((int x = 5) > 2) { ... }
does not work with an in-line declaration. How come? Can I get around this?
Sadly,
I suspect that most Java developers would heartily disagree with that sentiment ...
if ((int x = 5) > 2) { ... }
does not work with an in-line declaration. How come?
It does not work because a declaration is not an expression in Java, and therefore cannot appear where an expression is required.
Why did the Java designers not allow this? I suspect that it is a combination of the following:
Java's syntactic origins are C and C++, and you cannot do this in C or C++ either,
this would make the Java grammar more complicated and the syntax harder to understand,
this would make it easier to write obscure / cryptic programs in Java, which goes against the design goals, and
it is unnecessary, since you can trivially do the same thing in simpler ways. For instance, your example can be rewritten to make the declaration of x a separate statement.
Can I get around this?
Not without declaring x in a preceding statement; see above.
(For what it is worth, most Java developers avoid using assignments as expressions. You rarely see code like this:
int x = ...;
...
if ((x = computation()) > 2) {
    ...
}
Java culture is to favour clear / simple code over clever hacks aimed at expressing something in the smallest number of lines of code.)
Your x only exists within the scope of the assignment, so it's already gone by the time you get to > 2. What is the point of this anyway? Are you trying to write deliberately unreadable code?
Your best way to get around this is to declare x in a scope that will remain valid throughout the if statement. Seriously though, I fail to understand what you're doing here. Why are you creating a variable that is supposed to disappear again immediately?
if ((int x = 5) > 2) { ... }
Yes, this will not compile, because you can't declare variables inside the condition section of an if clause.
The > test will work fine, as long as you declare the int outside of the if condition. Perhaps you are simplifying your condition for the sake of brevity, but there is no reason to put your declaration in the condition.
Can I get around this?
Yes, declare your var outside the condition.
Because you didn't declare the int separately as you did in the == test.
jcomeau#intrepid:/tmp$ cat /tmp/test.java
class test {
    public static void main(String[] args) {
        int x;
        if ((x = 5) > 2) System.out.println("OK");
    }
}
In Java, for() allows initialization code, but if() doesn't.
You can't declare a variable in the condition section. For example,
for (int i = 0; i < 9; i++) { ... }
is a completely valid statement. Notice we declare the variable in the for initializer, not in the condition clause. Now look at this:
for (int i = 0; (int j = 0) < 9; i++) { ... } // don't try to make logical sense out of it
This is not allowed.
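For what it's worth, assigning inside a condition (with the variable declared beforehand) does show up in idiomatic Java in one common place: the read loop. A small illustrative sketch follows; the file name is just a placeholder:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ReadLoop {
    public static void main(String[] args) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader("input.txt"))) {
            String line; // declared before the loop, assigned inside the condition
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}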
