Concise Adblock Filter Set Explained
Adblock is the single most useful Firefox plugin available today. Just like watching sitcoms with automatic commercial-skip, adblock's banner ad suppression system elicits a smug sense of satisfaction even after browsing through your 10,000th ad-free web page. However, a huge barrier to adoption seems to be the lack of a default filter set: when you first install adblock, nothing happens.
The main issue is that adblock knows nothing about the content included in a webpage; it is just a generic regex-based filter system, so it is only as effective as the filters you provide. There are plenty of pre-made lists available, but they tend to be overly aggressive in what they suppress, resulting in occasional broken pages and pages that dead-end because adblock has removed the "Next" button. The most dangerous public set seems to be the EasyList, which has a 360+ item block list. Evidence that its creators know of its greedy nature is their inclusion of a 20+ item whitelist to manually compensate for what was initially blocked. Even more unstable is the EasyElement list, which searches through the DOM to remove suspected elements directly from the main document -- a list of 570+ substrings to search for.
Instead of using such a large, reactive list of simple and site-specific string matches that tries to suppress 100% of ads, I posit that you only need 2 adblock filters to eliminate 70-80% of ads while remaining confident that legitimate content isn't being flagged as a false positive. By getting into the heads of HTML writers, we can pick out the most common patterns used to include ads and write regexes that suppress them.
/(\b|_)ad(x|s?)(\b|_)/
This regex looks for any element that contains the string 'ad', 'ads', or 'adx' bounded on each side by a word boundary or an underscore, because the vast majority of web sites partition their ads into a single directory or serve them through a single script. The boundary check is crucial to this filter because just searching for the characters 'ad' would match far too much. The underscore alternative is needed because regexes treat '_' as a word character, so \b alone would not match inside 'ad_server'. With the restriction in place, adblock will suppress elements that contain strings like 'ads.server.com' or 'www.server.com/ads/' or 'server.com/ad_server.php', but not 'adobe.com' or 'server.com/adjustment'.
/ad.*\d+[xX]\d+/
This regex exploits the common habit among ad designers of putting the image dimensions in the element name, e.g., "server.com/newads.php?location=top&size=468x80". As with the previous rule, we don't block every element that has dimensions; we qualify the match by also requiring the string 'ad' earlier in the name.
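The two filters can be checked against the example strings from the text. The following is only a rough sketch in Python rather than adblock itself: the helper name is_blocked is made up for illustration, and it assumes the patterns behave the same under Python's re module as they do in the browser, which holds for the \b, \d and alternation syntax used here.

```python
import re

# The two core filters from the text, written as Python regexes.
AD_WORD = re.compile(r"(\b|_)ad(x|s?)(\b|_)")
AD_SIZE = re.compile(r"ad.*\d+[xX]\d+")

def is_blocked(url):
    """Hypothetical helper: True if either filter matches anywhere in url."""
    return bool(AD_WORD.search(url) or AD_SIZE.search(url))

samples = [
    "ads.server.com",
    "www.server.com/ads/",
    "server.com/ad_server.php",
    "adobe.com",
    "server.com/adjustment",
    "server.com/newads.php?location=top&size=468x80",
]

for url in samples:
    print("block" if is_blocked(url) else "allow", url)
```

Note that 'newads.php' is caught only by the second filter: the 'ads' in it is not at a word boundary, so the first filter skips it. The flip side is that the second filter has no boundary check at all, so a name like 'server.com/uploads/photo_800x600.jpg' would also match through the 'ad' in 'uploads'.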
At this point, your browsing experience will be significantly improved, but you can bump up your block rate to about 80-90% with a few more simple substring matches. There are many well-known ad providers that exist solely to deliver ads, so we can consolidate those into composite filter rules:
/a(2\.yimg|dserv|dvert|tdmt|twola)/
This rule collects all the ad serving systems that start with 'a': Yahoo, Atlas, AOLTimeWarner, and generic ad serving systems.
/b(anners|logads)/
falkag.net
These pick up anything labeled with 'banner', the 'blogads' network, or Falk AdSolutions.
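As a rough way to see which of these provider rules fires for a given address, the sketch below runs them in order over a few URLs. The URLs, the labels, and the function name first_match are all invented for illustration; only the three patterns come from the text, and 'falkag.net' is treated as a literal substring.

```python
import re

# Provider filters from the text; the labels are made up for this sketch.
PROVIDER_FILTERS = {
    "a-providers": re.compile(r"a(2\.yimg|dserv|dvert|tdmt|twola)"),
    "banners/blogads": re.compile(r"b(anners|logads)"),
    "falk": re.compile(re.escape("falkag.net")),  # literal substring match
}

def first_match(url):
    """Return the label of the first filter that matches url, or None."""
    for label, pattern in PROVIDER_FILTERS.items():
        if pattern.search(url):
            return label
    return None

# Illustrative URLs only; the paths are not real ad locations.
for url in [
    "us.a2.yimg.com/a/top.gif",
    "switch.atdmt.com/view",
    "example.com/advertising/x.js",
    "example.com/banners/top.gif",
    "cache.blogads.com/x",
    "a.as-us.falkag.net/dat",
    "example.com/article/42",
]:
    print(first_match(url), url)
```

Because these are plain substring-style matches with no boundary check, the 'dvert' alternative also catches longer words such as 'advertising', which is the intent of the generic part of the rule.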
Realistically, reducing the ad load by 90% should be more than sufficient for anyone. Chasing that last 10% -- and whitelisting the collateral damage -- will always be a losing battle. Your time is better used reading the content that is on the page you requested in the first place.