Welcome to Hydrargentium's: We Blog!

Thursday, January 24, 2008

Geek lesson: refactoring

And now, just to prove that I really do know something about computers, here's a little lesson on refactoring.

First, some background. Programming a computer is nothing more than providing a set of instructions for the computer to perform. Granted, the instructions are, in almost all cases, extremely detailed -- to the point that a large majority of people don't have the patience and/or critical and logical thinking skills to write them successfully. Don't get me wrong, though. I know plenty of very intelligent people who could never program computers for a living, and I've known a number of professional programmers whom I do not consider especially intelligent. It's not just smarts that make a programmer; it's a particular kind of smarts.

Now, when programmers build up sets of instructions, which taken all together are called a program, they frequently (for various reasons, including simple organization) group chunks of related instructions into separate units, or pieces of a program. Programmers have various names for these pieces: procedures, functions, subroutines, methods, macros. For the purpose of this lesson, we'll call them functions.

So, imagine you've got a function, a set of related instructions, that finds all of the information (name, address, etc.) for a single, specific person. Programmers like to name functions, usually with more than one word in the name to describe what the instructions do. If the specific person's name were Brenda, then the function that finds Brenda's information might be called "findBrenda". Also, in many programming languages, function names are denoted with a pair of parentheses after the words, like this: findBrenda(). I'm used to that myself, so I'll continue to use this notation for all of the functions I name here.

Later on, as a programmer, you may find you need a set of instructions that finds information for David. To make life easy, instead of recreating all of the instructions included in findBrenda(), you would simply copy those instructions, and then alter the copy slightly so that it looks for David's information instead of Brenda's. This modified copy of the findBrenda() instructions would be put into a new function, called findDavid().
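To make the copy-and-modify habit concrete, here is a minimal Python sketch. The PEOPLE list, its fields, and the addresses are invented for illustration, since nothing above says where the information actually lives. The function names follow the ones used in this post rather than Python's usual snake_case.

```python
# Hypothetical data store, made up for this example.
PEOPLE = [
    {"name": "Brenda", "address": "12 Elm Street"},
    {"name": "David", "address": "34 Oak Avenue"},
]


def findBrenda():
    # Walk the records until the one for Brenda turns up.
    for record in PEOPLE:
        if record["name"] == "Brenda":
            return record
    return None


def findDavid():
    # Copied from findBrenda(); only the name was changed.
    for record in PEOPLE:
        if record["name"] == "David":
            return record
    return None


print(findBrenda())
print(findDavid())
```

The two functions are identical except for one word, which is exactly what makes the copy so quick, and so tempting.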

As your program gets more complicated, you find you have to make more of these finder functions. So, you copy and modify the instructions again and again, making findAndrew(), and findStacey(), and findPierre(). No worries, you think, since it only takes a few minutes to do the copy and change.

Then, things change in the overall program, so that the information for the various people has to be found in a different way. This means that you have to modify findBrenda(), so that it looks for the information using the new method. A quick review of the program, though, shows that you're also going to have to make the same changes to all the other finder functions. Still, that's not too bad, since you can just copy the findBrenda() instructions again, and make the small changes needed to find David instead. And then you have to do the same thing for Andrew, Stacey and Pierre.

Man, this is beginning to seem like a lot of work -- especially when your boss tells you to add finder functions for another ten people. And then you overhear a conversation in the elevator about how the way information for people is found is going to change again next month. All of a sudden, you'll be like, "OMG! This is craziness, all this copying and recopying. There must be a better way!"

Well, it turns out there is: refactoring. In computer terms, refactoring means reorganizing a program's instructions without changing what the program does. The kind of refactoring we need here is the process by which common sets of instructions are "factored out" into more generic functions. In other words, we can take the instructions that we copied from findBrenda(), and put them in a less specific function, modifying them so that they will work with whatever name they are given.

So, for our example, we would take out the information-finding instructions, and put them in a new function, called findPerson(). This new function is a little more special, since it doesn't do much on its own. In fact, it can't really do anything until it knows exactly which person it's supposed to find information for.

But what do we do with this special function? Well, we can now change the other functions (findBrenda(), et al.) so that, instead of performing their own set of instructions for finding information, they invoke the instructions contained in findPerson(). Part of this invocation requires providing the findPerson() instructions with the name of the person to find. Thus, findBrenda() would invoke findPerson(), and give it the name "Brenda". Similarly, findDavid() would use findPerson(), specifying "David", and the other finder functions would all be changed to rely on findPerson() as well, each providing its own specific name.
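Here is roughly what the refactored version could look like in Python. It's a sketch under assumptions: the PEOPLE list and its fields are invented for illustration, and the function names follow this post rather than Python's usual snake_case.

```python
# Hypothetical data store, made up for this example.
PEOPLE = [
    {"name": "Brenda", "address": "12 Elm Street"},
    {"name": "David", "address": "34 Oak Avenue"},
]


def findPerson(name):
    # The generic function: it holds the finding instructions,
    # but does nothing useful until it is given a name.
    for record in PEOPLE:
        if record["name"] == name:
            return record
    return None


def findBrenda():
    # No finding instructions of its own; it hands the name to findPerson().
    return findPerson("Brenda")


def findDavid():
    return findPerson("David")


print(findBrenda())
print(findDavid())
```

Anything that already invokes findBrenda() or findDavid() gets the same answer as before, so the rest of the program doesn't have to know that anything was rearranged.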

How, you might ask, does this save us any work? We still need to change each of the specific finder functions to invoke the findPerson() instructions. In fact, to save time, we likely did it for findBrenda(), and then copied the solution to the other functions, changing the name of the person specified for each copy. That's essentially the same as what we were doing before we refactored, right?

That's a completely true assessment. We will potentially save a tiny bit of time, since the single instruction to invoke findPerson() is smaller than the full set of finding instructions, which means fewer keystrokes (or less mouse movement) to select the instructions being copied. Still, the difference is negligible -- even if we end up doing fifty more finder functions for fifty other people. The real savings come later. We know that a change in the finder instructions is coming. In fact, this will be the second time they've changed, and you know it's likely that they will have to change again in the future. Every time the instructions need to change, you will save the time it would have taken to copy and modify the new set of instructions for each finder function.

Why is this? Remember how you changed all the finder functions to simply invoke findPerson()? Well, from now on, whenever you have to change the instructions for finding a person's information, you will only ever have to change findPerson(). The other, more specific finder functions won't need to be changed, because they don't use those instructions directly. Instead, they simply call on findPerson() to do those instructions for them.
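As a sketch of such a change, suppose the information moves from a plain list that has to be searched into a dictionary keyed by lower-cased name. That storage change is purely an assumption for illustration, as are PEOPLE_BY_NAME and its contents. Only findPerson() is rewritten; the specific finders stay exactly as they were.

```python
# Hypothetical new storage: records keyed by lower-cased name.
PEOPLE_BY_NAME = {
    "brenda": {"name": "Brenda", "address": "12 Elm Street"},
    "david": {"name": "David", "address": "34 Oak Avenue"},
}


def findPerson(name):
    # The only function that had to change:
    # a dictionary lookup replaces the old search through a list.
    return PEOPLE_BY_NAME.get(name.lower())


def findBrenda():
    # Untouched by the change.
    return findPerson("Brenda")


def findDavid():
    # Untouched by the change.
    return findPerson("David")


print(findBrenda())
print(findDavid())
```

With a hundred specific finders instead of two, the edit would be the same size: one function body.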

Can you see it now? The old way, every change to the way information is found required changes to every finder function. Thus, if you had one hundred finder functions, you would change the first one, and then copy and modify the instructions to use for every other function. That's one change of instructions, and ninety-nine copy-and-modify steps.

Once you've refactored, every change to the way information is found requires a change only to the findPerson() function. That's it. Thus, if you had one hundred specific finder functions, you would change findPerson(), and then not be required to make changes to any of the other functions! That's one change of instructions, and nothing else.

And that's it. That's refactoring, in as simple terms as I can put it. (Well, not really. I could have gone down a much more concrete route, describing stereo components, or assembly lines, or even bakeries. But I don't think you, my clever readers, needed the elementary school edition.)

Incidentally, the XP folks I mentioned two posts ago have a mantra they follow religiously. (Pardon the redundant phrasing.)

"Refactor, refactor, refactor."



