Eager Load in Rails
Database is considered an expensive resource, and while developing our application we should be particular about the number of DB queries made, in general fewer the DB queries executed, the better the performance.
One way is to use eager loading
which help us to decrease the number of DB queries performed, by preloading the associated content from database.
Lets assume we have two models Car
and Owner
with one-to-many relationship
between them - i.e Owner
has many Cars
An instance variable @cars
is assigned a bunch of cars in the controller and used in views as follows:
<% @cars.each do |car| %>
<%= car.mod_name %>
<%= car.owner.name %>
<% end %>
DB queries executed in log for the above request.
SELECT "cars".* FROM "cars" ORDER BY "cars"."created_at" ASC
SELECT "owners".* FROM "owners" WHERE "owners"."id" = ? LIMIT 1 [["id", 1]]
SELECT "owners".* FROM "owners" WHERE "owners"."id" = ? LIMIT 1 [["id", 2]]
SELECT "owners".* FROM "owners" WHERE "owners"."id" = ? LIMIT 1 [["id", 3]]
First query is executed to fetch all the Cars
related data, and then again one DB query is executed each time, whenever car.owner.name
was called in our view, which required the associated Owner
data, making a total of N+1 query where N is the number of cars.
This is a common performance antipattern and it is termed infamously as N+1 query problem.
Eager Loading to Rescue!
In Rails ActiveRecord, we have different approaches to fetch data from an association of models. Rails provides us with four methods, namely preload()
, eager_load()
, includes()
and joins()
which we can use judiciously to tackle the N+1 query problem.
preload()
-- Car.order(:created_at).preload(:owner)
SELECT "cars".* FROM "cars" ORDER BY "cars"."created_at" ASC
SELECT "owners".* FROM "owners" WHERE "owners"."id" IN (1, 2, 3)
It makes exact two DB queries where the first query is used to fetch the Car
related data and second query is used to fetch all the associated Owner
data, using the ids from the first query.
We have reduced the number of queries successfully from 4(3 + 1)
to 2
query. It may not seem to be drastic improvement in this case, but it make picture more clear if suppose you had 5000
cars listed in your DB, then in case of N+1 query problem, (5000 + 1)
query will be executed instead of 2
query using preload()
.
The only disadvantage using preload is that we cannot have any data filters on Owners
table. It will throw us an exception, as two separate queries are executed in preload()
and we haven’t joined our Car
and Owner
table anywhere.
-- Car.order(:created_at).preload(:owner).where(owners: {name: 'Jack'})
SELECT "cars".* FROM "cars" WHERE "owners"."name" = ? ORDER BY "cars"."created_at" ASC [["name", "Jack"]]
ActionView::Template::Error (SQLite3::SQLException: no such column: owners.name: SELECT "cars".* FROM "cars" WHERE "owners"."name" = ? ORDER BY "cars"."created_at" ASC):
eager_load()
It makes a single DB query and uses LEFT OUTER JOIN
to club together the two tables.
-- Car.order(:created_at).eager_load(:owner)
SELECT "cars"."id" AS t0_r0, "cars"."mod_name" AS t0_r1,
"cars"."owner_id" AS t0_r2, "cars"."created_at" AS t0_r3,
"cars"."updated_at" AS t0_r4, "owners"."id" AS t1_r0,
"owners"."name" AS t1_r1, "owners"."created_at" AS t1_r2,
"owners"."updated_at" AS t1_r3
FROM "cars" LEFT OUTER JOIN "owners" ON "owners"."id" = "cars"."owner_id" ORDER BY "cars"."created_at" ASC
It has an upper hand over preload()
in the terms when we have to add data filter or order(WHERE
and ORDER BY
clause), on the second table, or Owner
in this case.
includes()
Sometimes it may become tough to decide which one to opt for between preload()
and eager_load()
for better performance. Rails has includes()
for the very same purpose, and does this task for you. It decides on itself which approach to take so that you can sit back and relax.
-- Car.order(:created_at).includes(:owner)
SELECT "cars".* FROM "cars" ORDER BY "cars"."created_at" ASC
SELECT "owners".* FROM "owners" WHERE "owners"."id" IN (1, 2, 3)
includes()
smartly switches to making a single DB query using LEFT OUTER JOIN
, when ordering or filtering is required on the associated data.
-- Car.order(:created_at).includes(:owner).where(owners: {name: 'Jack'})
SELECT "cars"."id" AS t0_r0, "cars"."mod_name" AS t0_r1,
"cars"."owner_id" AS t0_r2, "cars"."created_at" AS t0_r3,
"cars"."updated_at" AS t0_r4, "owners"."id" AS t1_r0,
"owners"."name" AS t1_r1, "owners"."created_at" AS t1_r2,
"owners"."updated_at" AS t1_r3
FROM "cars" LEFT OUTER JOIN "owners" ON "owners"."id" = "cars"."owner_id" WHERE "owners"."name" = ?
ORDER BY "cars"."created_at" ASC [["name", "Jack"]]
joins()
joins()
uses INNER JOIN
to fetch the associated data from tables.
-- Car.order(:created_at).joins(:owner)
SELECT "cars".* FROM "cars" INNER JOIN "owners" ON "owners"."id" = "cars"."owner_id" ORDER BY "cars"."created_at" ASC
SELECT "owners".* FROM "owners" WHERE "owners"."id" = ? LIMIT 1 [["id", 1]]
SELECT "owners".* FROM "owners" WHERE "owners"."id" = ? LIMIT 1 [["id", 2]]
SELECT "owners".* FROM "owners" WHERE "owners"."id" = ? LIMIT 1 [["id", 3]]
It seems like we are encountering again our N+1 query problem, and this happens because by default joins()
does not include the owners
column in result, and you have to specify explicitly all columns required using a select()
clause on joins()
.
-- Car.order(:created_at).joins(:owner).select('cars.*, owners.name as owner_name')
SELECT cars.*, owners.name as owner_name FROM "cars" INNER JOIN "owners" ON "owners"."id" = "cars"."owner_id" ORDER BY "
cars"."created_at" ASC
You just have to change car.owner.name
to car.owner_name
in your view as now owner_name
is an attribute on Car
model.
<% @cars.each do |car| %>
<%= car.mod_name %>
<%= car.owner_name %>
<% end %>