Friday, April 19, 2013

Python Expressions: Merging Dictionaries

In Python, it's common to need to merge two or more dictionaries.

The first idiom uses the dict constructor. It has its limitations: it always works as long as the keys are all strings, but non-string keys fail in Python 3.2 and later, and also in alternate Python implementations. The idiom itself is frowned upon.

d1 = dict(a=1, b=2, c=3)
d2 = dict(c=4, d=5, e=6)

# merging 2 dicts with the dict constructor
merged_dict = dict(d1, **d2)

# merging n dicts with the dict constructor
from functools import reduce  # built in on Python 2, must be imported on Python 3
merged_dict = reduce(lambda a, b: dict(a, **b), (d1, d2))
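
To see the limitation concretely, here is a minimal sketch, assuming Python 3. The helper name `merge_with_constructor` and the sample dicts are made up for illustration. The `**` unpacking turns the second dict's keys into keyword arguments, so any non-string key there is rejected:

```python
def merge_with_constructor(a, b):
    """Merge two dicts with the dict constructor idiom; b wins on conflicts."""
    return dict(a, **b)

# String keys work, and values from the second dict take precedence.
string_result = merge_with_constructor({"a": 1, "c": 3}, {"c": 4, "d": 5})

# Non-string keys in the second dict raise TypeError on Python 3,
# because ** requires every key to be usable as a keyword (a string).
try:
    merge_with_constructor({1: "one"}, {2: "two"})
    non_string_failed = False
except TypeError:
    non_string_failed = True
```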

There are also other choices which will work with any type of key. Unfortunately they require a tad more code.

# merging n dicts with a generator expression
# (.items() works on both Python 2 and 3; iteritems() is Python 2 only)
merged_dict = dict(item for d in (d1, d2) for item in d.items())

# merging n dicts with a dict comprehension
merged_dict = {k: v for d in (d1, d2) for k, v in d.items()}
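
As a rough sketch of how the comprehension copes with keys of any hashable type, the hypothetical helper `merge_dicts` below wraps it in a function. It uses `.items()` rather than `iteritems()` so that it runs on both Python 2.7 and Python 3; that swap is an assumption made here for portability.

```python
def merge_dicts(*dicts):
    """Return a new dict built from all inputs; later dicts win on conflicts."""
    return {k: v for d in dicts for k, v in d.items()}

# Keys are an int, a tuple and None: none of them would survive dict(a, **b).
mixed1 = {1: "int key", ("x", "y"): "tuple key"}
mixed2 = {1: "overridden", None: "None key"}

merged_mixed = merge_dicts(mixed1, mixed2)
```

The inputs are left untouched, and calling `merge_dicts()` with no arguments simply yields an empty dict.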


UPDATE:

I left out dict.update because it returns None, so it is only useful as a statement, not inside an expression. It also modifies a dictionary in place, which may not be what I want. Compare:

def fn1(d1, d2):
    d3 = d1.copy()
    d3.update(d2)
    return d3

# vs

def fn2(d1, d2):
    return dict(d1, **d2)

# or

def fn3(d1, d2):
    return {k: v for d in (d1, d2) for k, v in d.items()}

I prefer the expressions.
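
The practical difference shows up when the merge has to sit inside another expression. A small sketch, with the names `defaults`, `overrides`, `broken` and `merge` made up for illustration: `dict.update` returns `None`, so it cannot be chained, while the expression forms can go anywhere a value is expected.

```python
defaults = {"host": "localhost", "port": 80}
overrides = {"port": 8080}

# dict.update mutates its receiver and returns None, so chaining it
# does not produce a merged dict.
broken = defaults.copy().update(overrides)

# An expression can be used anywhere a value is expected, for example
# in a lambda, and it leaves both inputs unchanged.
merge = lambda a, b: {k: v for d in (a, b) for k, v in d.items()}
config = merge(defaults, overrides)
```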

10 comments:

  1. Isn't d1.update(d2) the way to go?

    1. I've updated my blog entry to answer your question. Essentially, the reason is that dict.update is only usable in a statement, not an expression.

  2. I think that there is a much more readable alternative, that I suspect you will find is widespread in Python code bases where a dictionary merge is needed:

    merged_dict = d1.copy()
    merged_dict.update(d2)

    Sometimes someone gets concerned with symmetry, and writes this instead:

    merged_dict = {}
    merged_dict.update(d1)
    merged_dict.update(d2)

    1. I don't find any of the expressions that hard to read. I do find the fact that dict.update can't be used in an expression very irritating. That is why I focus on idioms which are expressions. I've updated my blog with a slightly longer explanation.

  3. Meh. Code golf. So what if it can't be used in an expression. Why would it need to?

    1. I think it is a bit naive to reduce the tradeoffs between statements and expressions to "code golf". Even assuming you don't know about denotational semantics, the differences are far-reaching for day-to-day programming. The key is understanding that statements are implicitly tied to state and/or side effects. This impacts threaded programming, because all sorts of bad things happen when multiple threads mutate the same object. It can also affect generators and memoryviews, as those structures are always tied to an underlying data structure. It can affect your ability to do distributed programming like map/reduce, because ordering often matters when you're mutating a structure or when you have side effects. For all these reasons and more, it matters whether you can express logic in an expression rather than a statement.

  4. Use this:
    https://gist.github.com/pysquared/1927707
    then go:
    d3 = d1.copy().update(d2)

    Heehee!

  5. It does not work recursively with stuff like
    initial = dict(a=1, b=dict(c=1, d=2))
    diff = dict(a=2, b=dict(a=4, d=3))
    where you would like:
    final = dict(a=2, b=dict(c=1, a=4, d=3))

    However I think I know how to do it :)

    1. I found a solution:

      https://gist.github.com/jul/5427054
      Basically it is adding a merge method to dict that propagates...

  6. The reduce construction can be switched to use update:

    reduce(lambda a,b: a.update(b) or a, [d1,d2], {})

    This also allows it to merge any number of dicts.
