CorpWatch has taken a valiant stab at entity resolution/name disambiguation with corporate data from the Securities and Exchange Commission. Their project is open source so you might learn quite a bit from them about structuring a database for entity resolution.
You might also try the folks at Sunlight Labs, who I believe are working on these issues with various nonprofits such as the National Institute on Money In State Politics and OpenSecrets. I believe both of the latter groups have applied these techniques to campaign finance data at the state and federal levels, and I wouldn't be surprised if they used them to help develop the recently announced TransparencyData.com.
On a technical level, there are numerous approaches to this problem depending on the nature of the data. Most likely you'll need to cobble together several of them to develop an accuracy ranking, which you can then use to separate "good" data from data that will need human review. Below are a few resources that might help along the way:
Finally, you might find some useful nuggets in Programming Collective Intelligence, which uses Python as the language of choice for source examples.
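To make the "cobble together several approaches" idea a bit more concrete, here is a minimal sketch in Python using only the standard library. It blends a character-level similarity (`difflib.SequenceMatcher`) with a token-overlap (Jaccard) score, then sorts each pair of names into "match", "no match", or "review". The function names, the 50/50 weighting, and the 0.85 / 0.40 thresholds are all my own assumptions for illustration, not anything taken from CorpWatch or the other projects mentioned; you would need to tune them against your own data.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def char_similarity(a, b):
    """Character-level similarity in [0, 1] from difflib."""
    return SequenceMatcher(None, a, b).ratio()


def token_similarity(a, b):
    """Jaccard overlap of the whitespace-separated tokens, in [0, 1]."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def match_score(a, b, char_weight=0.5):
    """Weighted blend of the two measures (the weight is an arbitrary assumption)."""
    na, nb = normalize(a), normalize(b)
    return char_weight * char_similarity(na, nb) + (1 - char_weight) * token_similarity(na, nb)


def triage(a, b, accept=0.85, reject=0.40):
    """Label a pair as 'match', 'no match', or 'review' (thresholds are assumptions)."""
    score = match_score(a, b)
    if score >= accept:
        return "match", score
    if score <= reject:
        return "no match", score
    return "review", score


if __name__ == "__main__":
    pairs = [
        ("Acme Corp.", "ACME CORP"),
        ("Acme Corporation", "Acme Corp"),
        ("Acme Corp", "Zyx Ltd"),
    ]
    for left, right in pairs:
        label, score = triage(left, right)
        print(f"{left!r} vs {right!r}: {label} ({score:.2f})")
```

Anything that lands in the "review" bucket is what you would hand to a human. In practice you would probably add more signals (addresses, identifiers, phonetic codes) to the blend rather than rely on name strings alone.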
EDIT:
On the address standardization front, Google's geocoding is your best bet if you've already shelled out the $10K or so for a license (which lets you brush aside the daily limits on the service). But if you need a stand-alone library, you might want to check out the Ruby port of the Perl module you mentioned:
http://github.com/geocommons/geocoder
It appears the original developer behind the Perl module migrated over to Ruby. It ain't Python, but if it gets the job done....
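If you can't run every record through a geocoder, a crude normalization pass in Python can at least make address strings comparable before you match on them. This is only a sketch under my own assumptions: the `ABBREVIATIONS` table and `normalize_address` function are hypothetical and deliberately tiny, and this is in no way a substitute for a real geocoder or the library above.

```python
import re

# Hypothetical, deliberately small abbreviation table; a real one would be far larger.
ABBREVIATIONS = {
    "street": "st",
    "avenue": "ave",
    "boulevard": "blvd",
    "road": "rd",
    "suite": "ste",
    "north": "n",
    "south": "s",
    "east": "e",
    "west": "w",
}


def normalize_address(address):
    """Lowercase, drop punctuation, and shorten common words to one canonical form."""
    tokens = re.sub(r"[^a-z0-9\s]", " ", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


if __name__ == "__main__":
    print(normalize_address("123 North Main Street, Suite 4"))
    print(normalize_address("123 N. Main St., Ste 4"))
```

Both sample addresses reduce to the same string, so an exact or fuzzy comparison afterwards has a fair chance of linking them.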
Great question, Joe. I did some retagging; hope that's OK!