The query optimizer in SQL Server is very powerful and smart – provided you give it enough information and clues to do its job. One recent discovery caught me by surprise – the fact that SQL Server can skip performing joins altogether in certain scenarios.
I came upon this while reviewing the definition of views in Dynamics CRM, most of which look like this:
from [AccountBase] left join [CustomerAddressBase] XXaddress1 on ([AccountBase].[AccountId] = XXaddress1.ParentId and XXaddress1.AddressNumber = 1) left join [CustomerAddressBase] XXaddress2 on ([AccountBase].[AccountId] = XXaddress2.ParentId and XXaddress2.AddressNumber = 2) left join [AccountBase] [account_master_account] on ([AccountBase].[MasterId] = [account_master_account].[AccountId]) left join [LeadBase] [account_originating_lead] on ([AccountBase].[OriginatingLeadId] = [account_originating_lead].[LeadId]) left join [AccountBase] [account_parent_account] on ([AccountBase].[ParentAccountId] = [account_parent_account].[AccountId]) left join [ContactBase] [account_primary_contact] on ([AccountBase].[PrimaryContactId] = [account_primary_contact].[ContactId]) left join [EquipmentBase] [equipment_accounts] on ([AccountBase].[PreferredEquipmentId] = [equipment_accounts].[EquipmentId]) left join ... left join ...
and so forth. Surprisingly, when viewing the execution plan, entire tables were skipped if no columns from those tables were referenced. Now, how is that possible? The result of a join affects the cardinality of the result set; therefore, the criteria for skipping a join cannot be just not selecting columns from a certain table. The key, it turns out, is that the Query Optimizer must be able to guarantee the a one-to-one relationship between the two join tables.
Not surprisingly, having a foreign key relationship between the join tables also satisfies SQL Server’s criteria of not performing the join. In fact, further optimizations can be made when there’s a trusted foreign key relationship. Playing with the AdventureworksDW database, inner joins can be skipped:
Of course, this is because SQL Server checks the existence of value against the target table when the record is inserted; effectively, this is improving query performance by transferring the workload to the insert/update/delete phase.
Curiously, if I select the join column from the joined table, SQL still thinks it has to perform the join, even though there is no difference between FIS.OrderDateKey and dd.DateKey in this scenario. I suppose the Query Optimizer isn’t omniscient.
The best use case for this technique is probably improving performance of large views that join to a lot of tables; this is often the case when abstracting complicated schemas from end users. For joins to complicated subqueries or CTEs expressions in the view, I’ve also successfully used this technique by creating a materialized view with a unique index, thereby guaranteeing to the query optimizer that the result set will be a one-to-one relationship.