The "steps" Cary was mentioning are the big thing. You want to preserve the original sound wave's "shape" as closely as possible. Without dithering, you're slicing it up into steps and just dropping the bits you don't end up using, with no adjustment to the ones that remain. The rounding error then follows the signal, and any detail quieter than the last remaining bit simply disappears. With dithering, a tiny amount of noise is added before the extra bits are dropped. That breaks up the "steps", so on average the result traces the original slope much more smoothly, at the cost of a very low, steady hiss.
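That's a very simplistic explanation, but it gets the basic idea across. This holds true for 24bit->16bit conversions, and it matters for 48k->44.1k conversions as well. The sample-rate converter calculates new sample values in between the old ones, and when those are written out at a fixed bit depth they have to be rounded (and ideally dithered) too.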
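To put a number on the "steps" idea, here is a minimal Python sketch (standard library only) that quantizes a very quiet sine to 16 bits. The function names and the choice of TPDF (triangular) dither are assumptions for illustration, not taken from any particular plugin. Without dither, a signal smaller than half of one step rounds to zero and vanishes. With dither, each individual sample is noisier, but averaged over many passes the output follows the original curve.

```python
import math
import random


def quantize(x, bits, dither=False, rng=random):
    """Round a float sample in [-1.0, 1.0) to a signed integer of `bits` bits.

    With dither=True, TPDF (triangular) noise spanning +/-1 LSB is added
    before rounding. TPDF is one common choice, assumed here for illustration.
    """
    scale = 2 ** (bits - 1)
    v = x * scale
    if dither:
        # The sum of two uniform values in [-0.5, 0.5] has a triangular PDF over [-1, 1] LSB.
        v += rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
    return max(-scale, min(scale - 1, round(v)))  # round, then clip to the legal range


def quiet_sine_demo(bits=16, amplitude_lsb=0.4, n=64, trials=2000, seed=1):
    """Quantize a sine whose peak is only 0.4 LSB, with and without dither.

    Returns (target, plain, dithered_avg), all in LSB units. The dithered
    values are averaged over many trials to show what survives "on average".
    """
    rng = random.Random(seed)
    scale = 2 ** (bits - 1)
    target, plain, dithered_avg = [], [], []
    for i in range(n):
        x = amplitude_lsb * math.sin(2 * math.pi * i / n) / scale
        target.append(x * scale)
        plain.append(quantize(x, bits))
        total = sum(quantize(x, bits, dither=True, rng=rng) for _ in range(trials))
        dithered_avg.append(total / trials)
    return target, plain, dithered_avg


if __name__ == "__main__":
    target, plain, dithered = quiet_sine_demo()
    print("undithered values:", sorted(set(plain)))  # every sample rounds to 0
    worst = max(abs(t - d) for t, d in zip(target, dithered))
    print(f"dithered average stays within {worst:.3f} LSB of the original")
```

The averaging over trials is only there to make the effect visible in numbers; in a real recording your ear does a similar kind of averaging over time.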
It can also be compared to colors ... take a JPEG photo with a wide range of colors, convert it to a PNG or GIF with fewer colors (like an 8-bit/256-color GIF or PNG), and you will see the difference. In most cases these applications will dither; in other words, they scatter pixels of the available colors so that, seen from a normal viewing distance, each area averages out as close as possible to the original color. Sometimes it does a great job, sometimes it doesn't, depending on what you are converting. Smooth gradients are the toughest case here; without good dithering you end up with banding.
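The image case can be sketched in one dimension: a 256-pixel grey ramp reduced to just four grey levels. This uses a simplified, 1-D form of error diffusion (Floyd-Steinberg spreads the error over several neighbouring pixels in 2-D). The function names, the 4-level palette and the 16-pixel "viewing distance" window are assumptions for illustration.

```python
LEVELS = [0, 85, 170, 255]  # a hypothetical 4-level grey palette


def closest(value, levels):
    """Return the palette level nearest to `value`."""
    return min(levels, key=lambda lvl: abs(lvl - value))


def no_dither(pixels, levels=LEVELS):
    """Snap every pixel to its nearest level: simple, but produces bands."""
    return [closest(p, levels) for p in pixels]


def error_diffusion_1d(pixels, levels=LEVELS):
    """Carry each pixel's rounding error forward into the next pixel."""
    out, err = [], 0.0
    for p in pixels:
        wanted = p + err
        got = closest(wanted, levels)
        out.append(got)
        err = wanted - got  # whatever we failed to show is added to the neighbour
    return out


def local_average_error(original, converted, width=16):
    """Mean gap between block averages, a rough stand-in for viewing from a distance."""
    gaps = []
    for i in range(0, len(original) - width + 1, width):
        a = sum(original[i:i + width]) / width
        b = sum(converted[i:i + width]) / width
        gaps.append(abs(a - b))
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    ramp = list(range(256))  # a smooth gradient, one grey value per pixel
    banded = no_dither(ramp)
    dithered = error_diffusion_1d(ramp)
    print("no dither:      ", local_average_error(ramp, banded))
    print("error diffusion:", local_average_error(ramp, dithered))
```

Both outputs use only the four available greys. The difference is that the error-diffused version averages out close to the ramp over each small patch, while the nearest-level version shows flat bands with jumps between them.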
The same holds true when dithering audio, but audio is _much_ more complicated than simple X/Y pixel data in an image (I won't get into harmonics, distortion, etc. in this post). Some setups do a decent job of it, some don't.
My basic rule of thumb on this stuff ... the less dithering the better. Dither once in the process, at the last step of mastering (with a very good plugin), to keep the loss to a minimum. Some people (rare cases) would argue that recording in 16-bit/44.1k is best, since then you never have to dither anything ... but I maintain that the advantages of recording in 24-bit are well worth that single dither at the end of the process.
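Why "the less dithering the better"? Each dithered requantization adds its own small, independent error. A minimal sketch, assuming TPDF dither and hypothetical stages that do nothing except write out a dithered 16-bit result (real processing in between would only add more error), shows the RMS error growing roughly as 0.5 x sqrt(k) LSB for k passes. Four passes therefore roughly double the error of a single pass, which is about 6 dB more noise.

```python
import math
import random


def dithered_round(v, rng):
    """Round a value in 16-bit LSB units after adding TPDF dither (+/-1 LSB)."""
    return round(v + rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5))


def rms_error_lsb(passes, n=20000, seed=7):
    """RMS error in LSBs when a high-resolution signal is dithered to 16 bits
    `passes` times (e.g. bouncing a 16-bit file between hypothetical stages).

    passes=1 models "dither once at the end"; no other processing is simulated.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.uniform(-1000.0, 1000.0)  # high-resolution float sample, in 16-bit LSBs
        v = x
        for _ in range(passes):
            v = dithered_round(v, rng)
        total += (v - x) ** 2
    return math.sqrt(total / n)


if __name__ == "__main__":
    for k in (1, 2, 4):
        print(f"{k} dither pass(es): RMS error ~ {rms_error_lsb(k):.2f} LSB")
```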
Mark