`Keyword.get` Considered Harmful

2024-03-05
elixir

Recently, I made a bug while working on an Elixir project: I passed an option with a wrong name to a function, and because of the way the tests were set up they still passed and reported the result I expected. So it slipped into production.

That made me think: how could I prevent such a bug from happening in the future?

Normally, in Elixir code we use the following pattern to fetch optional keyword arguments in a function:

defmodule A do
  def my_fun(x, y, opts \\ []) do
    option1 = Keyword.get(opts, :option1, false)
    option2 = Keyword.get(opts, :option2, [])

    IO.inspect({option1, option2}, label: :options)
    :ok
  end
end

This code expects to get none, some, or all of the following optional keyword arguments: option1, option2. However, in case any of these options are misspelled (say option_one), it will silently ignore the passed option. And if you happen to be so unlucky for your tests not to catch this, you’ll make a bug similar to mine.

The way to avoid this is easy: we need to assert in some way that options that we are not expecting are not passed into the function. Thankfully, in the same Keyword module of the standard library we already have a function that can help us : Keyword.pop/3.

Let’s rewrite our function:

defmodule AStrict do
  def my_fun(x, y, opts \\ []) do
    {option1, opts} = Keyword.pop(opts, :option1, false)
    {option2, opts} = Keyword.pop(opts, :option2, [])

    if !Enum.empty?(opts) do
      raise ArgumentError, "unknown options passed to `my_fun`: #{inspect(opts)}"
    end

    :ok
  end
end

Or even:

defmodule AShort do
  def my_fun(x, y, opts \\ []) do
    {option1, opts} = Keyword.pop(opts, :option1, false)
    {option2, opts} = Keyword.pop(opts, :option2, [])
    [] = opts

    :ok
  end
end

Now we can test what happens if we pass the following options list to those functions:

iex> opts = [option2: [:foo, :bar], option_one: true]

# this is wrong - `option1` is misspelled as `option_one` and no error is reported
# `option1` gets a wrong (the default) value as the result
iex> A.my_fun(1, 2, opts)
options: {false, [:foo, :bar]}
:ok

# good! the bug is caught and we get a nice error message and the correct exception too
iex> AStrict.my_fun(1, 2, opts)
** (ArgumentError) unknown options passed to `my_fun`: [option_one: true]
    iex:32: AStrict.my_fun/3
    iex:27: (file)

# also good - the error message is not as clear, but it's easy enough to debug
# and we sava a couple of lines
iex> AShort.my_fun(1, 2, opts)
** (MatchError) no match of right hand side value: [option_one: true]
    iex:29: AShort.my_fun/3
    iex:27: (file)

I vastly prefer AStrict of course, but for one-off scripts and such AShort is totally appropriate as well.

A question one may ask is: isn’t that slow? Keyword.get/3 only needs to walk the list and will terminate the moment it finds the needed option. Keyword.pop/3 will need to do that and it will need to build a new list!

This doesn’t matter in practice. Option keyword lists are not supposed to be big in any case - if you plan to pass a larger data structure where you need to perform random key-based access to the elements, Map is a better choice and you can easily replace opts \\ [] with opts \\ %{} and Keyword.get/3 + Keyword.pop/3 with Map.get/3 + Map.pop/3 in the examples above.

But let’s measure the performance impact anyway. It’s always important to make sure that the code that you are benchmarking is inside a module: both iex and Livebook will compile modules, but they will only interpret top-level expressions, which will mess up your benchmark. Here we go:

defmodule Bench do
  @opts [x: true, y: 2, z: 3.154] ++
    for id <- 1..7, do: {:"option#{id}", "value#{id}"}

  def get_opts, do: @opts

  def keyword_get_opts(opts) do
    x = Keyword.get(opts, :x, false)
    y = Keyword.get(opts, :y, 0)
    z = Keyword.get(opts, :z, 0.0)

    values = Enum.reduce(1..7, [], fn id, vs ->
      [Keyword.get(opts, :"option#{id}", "") | vs]
    end)

    :crypto.hash(:sha, :erlang.term_to_binary{x, y, z, values})
  end

  def keyword_pop_opts(opts) do
    {x, opts} = Keyword.pop(opts, :x, false)
    {y, opts} = Keyword.pop(opts, :y, 0)
    {z, opts} = Keyword.pop(opts, :z, 0.0)

    {values, _opts} = Enum.reduce(1..7, {[], opts}, fn id, {vs, opts} ->
      {value, opts} = Keyword.pop(opts, :"option#{id}", "")
      {[value | vs], opts}
    end)

    :crypto.hash(:sha, :erlang.term_to_binary{x, y, z, values})
  end

  def get(), do: :timer.tc(Bench, :keyword_get_opts, [@opts]) |> elem(0)
  def pop(), do: :timer.tc(Bench, :keyword_pop_opts, [@opts]) |> elem(0)
end

Hashing the options has two purposes. We can check that both our functions return the same result:

iex> opts = Bench.get_opts()
iex> Bench.keyword_get_opts(opts)
<<207, 54, 241, 240, 61, 169, 33, 189, 133, 84, 38, 213, 190, 16, 92, 175, 252,
  226, 69, 16>>
iex> Bench.keyword_pop_opts(opts)
<<207, 54, 241, 240, 61, 169, 33, 189, 133, 84, 38, 213, 190, 16, 92, 175, 252,
  226, 69, 16>>
iex> Bench.keyword_get_opts(opts) == Bench.keyword_pop_opts(opts)
true

and we ensure that the compiler won’t optimize away the fetching of the options - it’s needed to produce the result of the function!

Now we can run the (very unscientific but enough to show the point, your results may vary) benchmark:

iex> Bench.get()
15
iex> Bench.get()
18
iex> Bench.get()
15

iex> Bench.pop()
21
iex> Bench.pop()
17
iex> Bench.pop()
16

Those results are in microseconds. For realistic workloads with short option keyword lists the performance effect is negligible.

To summarize:

  • Using Keyword.pop/3 instead of Keyword.get/3 for extracting optional keyword arguments in functions prevents a whole class of bugs with misspelled options.
  • The performance effect is so minor, it’s negligible in most real-world scenarios.
  • The code length & readability are more or less the same, even when options are processed with Enum functions.

Happy coding!

P.S. This post got mentioned on r/elixir and the user arjjov suggested using Keyword.validate/2 to solve this issue. This function also allows applying default values when certain options are missing. I didn’t know that this function exists when I wrote the post - it was added in 1.13, and though I usually check changelog for new Elixir versions, that one slipped under the radar.

Even though that function seems to be designed to solve this exact problem, it’s sadly not very user-friendly to use in practice. The thing is - it still returns a keyword list, so you need to call it and write the code to extract those values. You will need to essentially provide the option names twice for each option, which introduces additional logic where you can make mistakes.

Let’s rewrite our previous example using the raising version Keyword.validate!/2:

defmodule AValidate do
  def my_fun(x, y, opts \\ []) do
    opts = Keyword.validate!(opts, [option1: false, option2: []])
    option1 = Keyword.get(opts, :option1)
    option2 = Keyword.get(opts, :option2)

    :ok
  end
end

And let’s call it:

iex> opts = [option2: [:foo, :bar], option_one: true]
iex> AValidate.my_fun(1, 2, opts)
** (ArgumentError) unknown keys [:option_one] in [option2: [:foo, :bar], option_one: true], the allowed keys are: [:option1, :option2]
    (elixir 1.14.2) lib/keyword.ex:355: Keyword.validate!/2
    iex:3: AValidate.my_fun/3
    iex:7: (file)

So, the behavior is very similar to our AStrict:

iex> AStrict.my_fun(1, 2, opts)
** (ArgumentError) unknown options passed to `my_fun`: [option_one: true]
    iex:8: AStrict.my_fun/3
    iex:7: (file)

However, it’s easy to make a mistake like this now:

defmodule AValidateWrong do
  def my_fun(x, y, opts \\ []) do
    opts = Keyword.validate!(opts, [option1: false, option2: []])
    option1 = Keyword.get(opts, :option1)
    # note the misspelled option name
    option2 = Keyword.get(opts, :opton2)

    IO.inspect({option1, option2})
    :ok
  end
end

Let’s call it:

iex> opts = [option2: [:foo, :bar], option_one: true]
iex> AValidateWrong.my_fun(1, 2, opts)
** (ArgumentError) unknown keys [:option_one] in [option2: [:foo, :bar], option_one: true], the allowed keys are: [:option1, :option2]
    (elixir 1.14.2) lib/keyword.ex:355: Keyword.validate!/2
    iex:9: AValidateWrong.my_fun/3
    iex:8: (file)

# and now with the correct options, but still with an incorrect result
# because of the misspelling in `Keyword.get/2` call, we get a wrong value for the `option2`
iex> AValidateWrong.my_fun(1, 2, [option2: [:foo, :bar], option1: true])
{true, nil}
:ok

In general, the more you need to repeat yourself, the more likely you are to make a mistake.

Let’s also check how quick the (correct) Keyword.validate!/2 solution will be. We’ll amend our Bench module like this:

defmodule Bench do
  # see Bench implementation above
  ...

  def keyword_validate_opts(opts) do
    opts = (Keyword.validate!(opts, [x: false, y: 0, z: 0.0] ++
          (for id <- 1..7, do: {:"option#{id}", ""})))

    x = Keyword.get(opts, :x)
    y = Keyword.get(opts, :y)
    z = Keyword.get(opts, :z)

    values = Enum.reduce(1..7, [], fn id, vs ->
      [Keyword.get(opts, :"option#{id}") | vs]
    end)

    :crypto.hash(:sha, :erlang.term_to_binary{x, y, z, values})
  end

  def validate(), do: :timer.tc(Bench, :keyword_validate_opts, [@opts]) |> elem(0)
end

Let’s verify that the behavior is the same:

iex> opts = Bench.get_opts()
iex> Bench.keyword_pop_opts(opts)
<<207, 54, 241, 240, 61, 169, 33, 189, 133, 84, 38, 213, 190, 16, 92, 175, 252,
  226, 69, 16>>
iex> Bench.keyword_validate_opts(opts)
<<207, 54, 241, 240, 61, 169, 33, 189, 133, 84, 38, 213, 190, 16, 92, 175, 252,
  226, 69, 16>>
iex> Bench.keyword_validate_opts(opts) == Bench.keyword_pop_opts(opts)
true

Now we can check how fast this will go (again, very unscientific, but oh well):

iex> Bench.validate()
18
iex> Bench.validate()
18
iex> Bench.validate()
17

Yet again, the performance impact is irrelevant.

So, what would I use? I would still stick with my Keyword.pop/3 solution. Though it does have a downside of you needing to carry that updated state of opts around, I think it’s less probability for me to make a mistake in that (and you will be warned in case of unused or non-existing variable by the compiler) than writing option names (atoms that are not checked by the compiler) twice as the Keyword.validate/2 + Keyword.get/2 solution will require. In the end, the decision seems to be mostly stylistic.

If you enjoyed this content, you can sponsor me on Github to produce more videos / educational blog posts.

And if you're looking for consulting services, feel free to contact me .