In one of my projects, actually Proactimas project, I’m using Entity Framework (EF) on top of SQL Azure and it works pretty well; data access is relatively fast, dead easy most of the time and the cost (in $) is OK. However I have discovered how sensitive EF is to how you write your queries. It appears to be especially true if you have several one-to-many relationships.
Just a quick introduction to Proactima Compliance; Any user has access to one or more companies, each company consists of one or more units. Units can be things like ‘Head Office’, IT-department, a ship etc. For every unit there can be one or more evaluations and each evaluation is based on a set of rules that the company measure compliance against. An example of a ruleset is the Petroleum Activities Act in Norway.
In Proactima Compliance the most frequently accessed page had an action that took about 1000ms to complete. This page allows the user to evaluate a rule and then choose ‘Save and Next’; so there are two actions involved:
- Save current rule and find next based on previous rules’ sequence
- Load next rule and display it to the user
The second action completes in 200-400ms, but the first takes from 800-1200 ms to complete. The main reason for the long processing time is what MiniProfiler calls Duplicate Readers. Basically, because of the way I had written my queries it was executing a reader querying for each rule in a ruleset! It looked like this:
It’s running 103 (!) queries just for that one action, that’s terrible! The actual query looked like this:
var rules = from r in evaluationRule.Evaluation.Rules where r.Rule.Sequence > currentSequence orderby r.Rule.Sequence select r;
The variable ‘evaluationRule’ was already fetched from the database and so I was just navigating to its Evaluation (remember how an evaluation is based on a ruleset which in turn consists of individual rules) and then the ruleset for that evaluation. Then I set up the where clause to make sure we can only get a rule with a sequence higher than the current rule. Finally I do an order by and select, from this I will only choose the first rule thus getting the next rule based on sequence. Possible bug here; sequences could potentially be same, but I’m 100% in control of the sequences so that’s really not an issue.
I won’t show the rendered query from this, it’s a nightmare, but the end result is correct and dead slow… So I had to fix this! I tried to manipulate the query in a try-fail-retry pattern; no success at all! I figured after awhile that I was addressing the issue from the wrong end; I was just thinking of the code. Thinking back on previous issues I’ve had with generated Sql (from other ORMs) I remembered (finally!) an old lesson; start by writing the query manually in the best possible way you can and then try to express that same query using the ORM! So that’s what I did! I fired up Sql Server Management Studio Express 2012 Deluxe Gold Plated Edition, or whatever it’s called and thought really hard about what I really needed. What I ended up with was this:
DECLARE @currentSequence int = 1; DECLARE @currentEvaluation int = 1; select top 1 er.ID, er.RuleID, r.Sequence, er.EvaluationID from EvaluationRules er, Rules r where er.RuleID = r.ID and er.EvaluationID = @currentEvaluation and r.Sequence > @currentSequence order by r.Sequence;
Not that hard to write at all. Now all I needed was to convert it to EF code:
var rules = (from er in Database.EvaluationRules join r in Database.Rules on er.RuleID equals r.ID where er.EvaluationID == evaluationId && r.Sequence > currentSequence orderby r.Sequence select er) .Take(1);
It’s definitely a bit more than my original code based query, but as you can see it looks a whole lot like the manual sql query. And how does MiniProfiler now report on my page:
What an amazing change! 900ms to 222ms! And 100+ queries to just 7! That’s performance optimizations the user notices!
But why did it all go so terribly wrong in the first place? I’m not going to over analyze the two queries or discuss how EF generates the Sql; somebody much smarter than me can probably do that (or has!). But what I learned from all this is that my old skills as Sql developer isn’t lost even if I am using an ORM AND the way you express a query can matter a whole lot when it comes to performance!