Sunday, June 15, 2014

Singletons Reconsidered

TL;DR

Don't make it a global, use it only for stateful resources, and don't use one at all if language limitations or ability prevent a proper implementation. Add management controls to the interface so that you can control the behavior of the Singleton in cases like testing, debugging, or resetting.

Introduction

By now everyone knows all the standard arguments against Singletons, so let's reconsider them.

Testability

The typical complaint is that Singletons are global, and that makes them hard both to test and to use in tests.  In most languages we can address those issues directly.

  1. Don't make the Singleton global, make it scoped to the Singleton class or module.
  2. Support management controls like a reset or clear method.
There is no reason to make a Singleton global. You should be able to import the class that will return the Singleton. Ideally the Singleton is truly instantiated on the first constructor call; any later constructor call simply returns the already constructed object.  For all usages it becomes just another constructor call that happens to return the same object.

The Singleton should persist state, which does make it harder to test. However, if you add management controls then the Singleton is not a problem for testing.  If you add a reset or destroy method, the Singleton class is completely testable.

Hidden Dependencies

If it's no longer a global, that means you have an explicit import or include.  Its inclusion is no longer assumed, and as a result you know whether a given module actually uses the Singleton because it has the import.  The dependencies are no longer hidden; they are explicit and clear.

Violates the Single Responsibility Principle

No it doesn't.  At its core SRP refers to cohesion and coupling.  Two things that aren't cohesive should not be coupled together, because changes in one should not impact the other.  However, if they are in the same class you have coupled them together, so when either responsibility changes the entire class has to change as well.  This is fragile, tight coupling.  It has nothing to do with an object being a Singleton unless the object is somehow exporting its ability to be a Singleton (as a metaclass, mixin, or template class might).  That a class is a Singleton is a property of the class; the behavior is not primary, i.e. the intent of the class is not to provide Singleton behavior to other objects.  That behavior is secondary and therefore not exposed to the caller.  Since the class's behavior as a Singleton is wholly encapsulated and not exposed, SRP remains intact.

Doesn't Work Right in Language X

Yeah, well, that's self-explanatory.  Don't use language X, or if you must use language X then don't use Singletons.

Threading

Now that is a real argument.  Yes, Singletons can suck in a threaded application unless the Singleton has semaphores or mutexes to create the appropriate critical sections.  Yes, it's hard to get right, and you may not know you didn't get it right until that weird bug happens in production. HOWEVER, that is an ongoing risk of threaded programming regardless of Singleton usage.  Singletons might make it a little more likely you screw it up, but not in some novel way.

This risk is also completely mitigated in the case of a read only Singleton, such as a Config object.
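For the stateful case, the usual mitigation is to guard the lazy construction with a mutex.  A minimal Python sketch (the class and attribute names here are mine, purely illustrative):

```python
import threading

class Config(object):
    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        # Double-checked locking: the lock turns construction into a
        # critical section so two threads cannot both create an instance.
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = object.__new__(cls)
        return cls._instance
```

Once constructed, a read-only object like this Config can be shared freely across threads.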

Singletons Done Right IMO

Okay, so I'm not a hotshot programmer.  I consider myself decent, bordering on good.  With all those caveats upfront, here is how I do Singletons.

Override instantiation to always return the same instance, or to return the same instance for a given set of constructor arguments used as a unique key.

Make the actual instantiation of the Singleton lazy, so it does the right thing whether it is creating the object for the first time under the covers or simply returning the object that already exists.

Always provide an explicit reset or destroy for the Singleton to facilitate testing.
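Putting those three points together, a minimal sketch in Python (the names are illustrative, not from any library):

```python
class Singleton(object):
    _instance = None

    def __new__(cls, *args, **kwargs):
        # Lazy instantiation: the real object is created on the first
        # constructor call; every later call returns the same object.
        # Note: __init__ still runs on every call, so keep it idempotent.
        if cls._instance is None:
            cls._instance = object.__new__(cls)
        return cls._instance

    @classmethod
    def reset(cls):
        # Management control: drop the instance so tests start clean.
        cls._instance = None
```

Callers just write Singleton() like any other constructor call; tests call Singleton.reset() between cases.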


Sunday, April 20, 2014

Creating a local email archive with: offlineimap and procmail

I synchronize my imap folders to maildir on my local laptop often, so I can both have access to my email without a network and use my preferred search and email clients.  To facilitate how I use email, I keep a local archive which is created and filtered by procmail.

Here is an approximation of my crontab (cron doesn't start a shell, so I put most of the commands in a script):

% crontab -l
HOME=/home/myhome
MAIL=$HOME/maildir
PROCMAILD=$HOME/.procmail.d
0-59/5 9-18 * * * $HOME/bin/syncemail 


Here is the syncemail script:

#!/bin/sh

offlineimap 2>&1 | logger -t offlineimap

for i in `find $MAIL/Disney -type f -newer $PROCMAILD/log `; do
  cat "$i" | procmail
done


and here are the relevant portions of my .procmailrc:

PMDIR=$HOME/.procmail.d
VERBOSE=off
MAILDIR=$HOME/maildir
DEFAULT=$MAILDIR/mbox
LOGFILE=$PMDIR/log
LOGABSTRACT=all
ARCHIVEBY=`date +%Y-%m`
ARCHIVE=$MAILDIR/archives/$ARCHIVEBY
MKARCHIVE=`test -d ${ARCHIVE} || mkdir -p ${ARCHIVE}`

# Prevent duplicates
:0Wh: $PMDIR/msgid.lock
| /usr/bin/formail -D 100000 $PMDIR/msgid.cache

:0c
${ARCHIVE}/


§

Sunday, March 23, 2014

REST: POST vs PUT for Resource Creation

Questions often come up about whether to use PUT or POST for creating resources in REST APIs.

I've found both are appropriate in different situations.

PUT

PUT is best used when the client is providing the resource id.
PUT https://.../v1/resource/<id>
Per spec, PUT is for storing the enclosed entity "under the supplied Request-URI".  This makes it the ideal HTTP method for creating, or "storing", a resource.  Only when all the requirements for PUT can't be met should POST be considered.  The perfect example is when the client cannot provide the resource id.


POST

POST is best used when the client doesn't know the resource id a priori.
POST https://.../v1/resource
POST shouldn't be the first choice for resource creation because it's really more of a catch-all method.
"The actual function performed by the POST method is determined by the server and is usually dependent on the Request-URI."
It doesn't require that anything be created or made available for later.
"A successful POST does not require that the entity be created as a resource on the origin server or made accessible for future reference. That is, the action performed by the POST method might not result in a resource that can be identified by a URI."
 § 

Wednesday, February 12, 2014

Python: Aggregating Multiple Context Managers

If you make use of context managers you'll eventually run into a situation where you're nesting a number of them in a single with statement.  It can be somewhat unwieldy from a readability point of view to put everything on one line:

with contextmanager1, contextmanager2, contextmanager3, contextmanager4:
    pass


and while you can break it up on multiple lines:

with contextmanager1, \
           contextmanager2, \
           contextmanager3, \
           contextmanager4:
    pass


sometimes that still isn't very readable.  This is more of a problem if you're using the same set of context managers in a number of places.  Ideally you should be able to put the context managers in a variable and use it in however many with statements need them:

handlers = (contextmanager1, contextmanager2, contextmanager3, contextmanager4)
with handlers:
    pass


Of course this doesn't work because handlers is a tuple, not a context manager, so the with statement will throw an exception.  What you can do is create a context manager that aggregates other context managers:

from contextlib import contextmanager
import sys

@contextmanager
def aggregate(handlers):
    for handler in handlers:
        handler.__enter__()

    exc_info = (None, None, None)
    try:
        yield
    except Exception:
        # Capture the exception so it can be handed to each __exit__.
        exc_info = sys.exc_info()

    # exc_info gets passed to each subsequent handler.__exit__
    # unless one of them suppresses the exception by returning True
    for handler in reversed(handlers):
        if handler.__exit__(*exc_info):
            exc_info = (None, None, None)
    if exc_info[1] is not None:
        raise exc_info[1]

So now you can aggregate all the context managers into one and use that one in the with statement:

handlers = (contextmanager1, contextmanager2, contextmanager3, contextmanager4)
with aggregate(handlers):
    pass


You can build up the list of context managers however you want and use aggregate when using them in a with statement.
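For what it's worth, since Python 3.3 the standard library ships contextlib.ExitStack, which does this aggregation for you (and also unwinds correctly if one of the __enter__ calls fails partway).  A quick sketch with a toy tracking context manager of my own invention:

```python
from contextlib import ExitStack, contextmanager

events = []

@contextmanager
def tracked(name):
    # Toy context manager that records enter/exit order.
    events.append(name + ':enter')
    try:
        yield
    finally:
        events.append(name + ':exit')

with ExitStack() as stack:
    for cm in (tracked('a'), tracked('b')):
        stack.enter_context(cm)
    events.append('body')

# events is now ['a:enter', 'b:enter', 'body', 'b:exit', 'a:exit']
```

Like aggregate, ExitStack enters the managers in order and exits them in reverse.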

§ 

Friday, January 17, 2014

Python Metaprogramming: A Brief Decorator Explanation

A brief explanation on how to think about Python decorators.  Given the following decorator definition:


def decorator(fn):
    def replacement(*a, **kw):
        ...
    return replacement

This usage of the decorator

@decorator
def fn():
    return

is functionally equivalent to

def fn():
    return
fn = decorator(fn)

Note that fn is not being executed.  Instead decorator is being passed the callable object fn, and is in turn returning a callable object replacement which is then bound to the name fn.  Whether or not the original callable ever gets called is up to decorator and the replacement callable.

Another thing to consider, which often trips people up, is the timing of the decorator's execution: it runs when the module is loaded, not when fn is called.  If you want a particular piece of logic to execute during fn's call, then that logic needs to be placed in the replacement callable, not in the decorator.
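A small sketch to make the timing concrete (the decorator and function names here are made up):

```python
import functools

calls = []

def logged(fn):
    # This body runs once, at module load, when @logged is applied.
    calls.append('decorating ' + fn.__name__)

    @functools.wraps(fn)
    def replacement(*args, **kwargs):
        # This body runs on every call to the decorated function.
        calls.append('calling ' + fn.__name__)
        return fn(*args, **kwargs)
    return replacement

@logged
def add(a, b):
    return a + b

# At this point calls == ['decorating add']; nothing has been called yet.
add(1, 2)  # now calls == ['decorating add', 'calling add']
```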

So now that everything is clear it's obvious that


@decorator
@make_decorator(args)
def fn():
    return


Is really just

def fn():
    return
fn = decorator(make_decorator(args)(fn))


Which means the first decorator in a stack is the last to be evaluated.
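A concrete illustration of that ordering, with two throwaway decorators:

```python
def shout(fn):
    def replacement(*args, **kwargs):
        return fn(*args, **kwargs).upper()
    return replacement

def exclaim(fn):
    def replacement(*args, **kwargs):
        return fn(*args, **kwargs) + '!'
    return replacement

@shout
@exclaim
def greet():
    return 'hi'

# greet = shout(exclaim(greet)): exclaim wraps the original first and
# shout is applied last, so greet() returns 'HI!'
```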

§

Wednesday, December 25, 2013

Development Server: Automatic Reload on Code Change

There are actually many ways of automatically reloading code when it is modified.  Some are platform/language specific and some are not, although they all depend on certain common behaviors.  This is the one I'm using to develop my Python/Gunicorn application, and it isn't specific to Python or Gunicorn; however, it does require that you have inotify-tools installed, that your server can run in the foreground, and that it reloads the project when it receives a SIGHUP signal.

wrapper:

#!/bin/sh

SERVER=$1
WATCHED_DIR=.
$SERVER &
RUNNING_PID=$!

trap 'kill -TERM $RUNNING_PID 2> /dev/null; exit' INT  # POSIX sh spells it INT, not SIGINT

while /bin/true; do
 echo "Starting '$SERVER'..."
 inotifywait -q --exclude '.*\.py[co]$' \
           -e modify -e close_write -e move \
           -e create -e delete \
           -r $WATCHED_DIR
 kill -HUP $RUNNING_PID
 echo "Hupping '$SERVER'..."
done



You can then call it like this:

wrapper 'gunicorn project:main'

This will watch the current directory you're in ('.'), and any time a modification, creation, deletion, or move occurs on any file in that directory, inotify will issue the notification and stop waiting.  kill then sends a SIGHUP to the server, forcing it to reload the project, and inotifywait goes back to waiting on the next filesystem event.

 

Variants/Alternatives

 

There are a few variations on this theme which are fairly simple.

  • If you must kill and then restart the process in order to reload you can move the execution of $SERVER & into the while loop.  You should also change the HUP to a TERM in this case to make sure the process is terminated.
  • If you need to reload when anything changes in multiple directories you can just append the full list to inotifywait or generalize the wrapper and take the directories to watch as an argument.
  • If you want to or have to use something other than inotify-tools, you can.  The same approach should work with any of the file system event notification frameworks, as long as they have a tool that waits on events or allow you to write a script that waits on events.

§
 

Friday, April 19, 2013

Python str with custom truth values

I've run into multiple instances where I get a string from some external service like a config file or a database.  There are values that I want to treat as true and others as false for the purposes of logic in my code.  For example let's say I have a status string with a few possible values, some false and some true.
  1. "Yes", "on", "true"
  2. "No", "off", "false"
I can map the values to True or False in Python.  Now I just need a string class whose truth values I can configure.

class BooleanString(str):
    def __nonzero__(self):  # Python 2 protocol; in Python 3 the name is __bool__
        return self.lower() in ('on', 'yes', 'true') 

bool(BooleanString("YES")) # returns True
bool(BooleanString("some other value")) # returns False


The thing is, I don't want to hard-code the truth values in the class definition.  I want to pass them in when creating the derived class.


def mkboolstr(truth):
    def __nonzero__(self):  # Python 2 protocol; in Python 3 the name is __bool__
        return self.lower() in truth
    return type('BooleanString', (str, ), dict(__nonzero__=__nonzero__))

BooleanString = mkboolstr(('on', 'yes', 'true'))

bool(BooleanString("TRUE")) # returns True
bool(BooleanString("some other value")) # returns False


And now we have a function which creates a BooleanString class using a parameterized collection of strings as the truth values.


§

Python Expressions: Merging Dictionaries

It's common in Python programming to need to merge 2 or more dictionaries together. 

The first idiom uses the dict constructor.  This idiom has its limitations, but it works fine as long as the keys are all strings. With non-string keys it fails in Python 3.2 and later, as well as in alternate Python implementations, and the idiom itself is frowned upon.

d1 = dict(a=1, b=2, c=3)
d2 = dict(c=4, d=5, e=6)

# merging 2 dicts with the dict constructor
merged_dict = dict(d1, **d2)

# merging n dicts with the dict constructor
merged_dict = reduce(lambda a, b: dict(a, **b), (d1, d2))

There are also other choices which will work with any type of key. Unfortunately they require a tad more code.

# merging n dicts with a generator expression
# (on Python 3, use d.items() in place of d.iteritems())
merged_dict = dict(i for d in (d1, d2) for i in d.iteritems())

# merging n dicts with a dict comprehension
merged_dict = {k: v for d in (d1, d2) for k, v in d.iteritems()}
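In later Pythons the same merge is a single expression with no key-type caveats; PEP 448 unpacking arrived in 3.5 and the | merge operator in 3.9:

```python
d1 = dict(a=1, b=2, c=3)
d2 = dict(c=4, d=5, e=6)

# PEP 448 unpacking, Python 3.5+; works with non-string keys too
merged_dict = {**d1, **d2}

# PEP 584 merge operator, Python 3.9+
merged_dict = d1 | d2
```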


UPDATE:

I have left out dict.update because it is only usable in a statement, not an expression. It also modifies a dictionary in place, which may not be what I want. Compare:

def fn1(d1, d2):
   d3 = d1.copy()
   d3.update(d2)
   return d3

# vs

return dict(d1, **d2)

# or

return {k:v for d in (d1, d2) for k, v in d.iteritems()}

I prefer the expressions.
§

Thursday, April 18, 2013

Python Expressions: Dict Slicing

As anyone who comes from Perl can tell you, hash slicing is useful.  Python dicts do not natively support slicing; one issue is that slicing in Python is limited to defined ranges rather than an ad-hoc collection of keys.

Take heart! There is an expression that has the same effect in Python as Perl's hash slicing:

from operator import itemgetter

d = dict(a=1, b=2, c=3, d=4)
h, i, j, k = itemgetter('a', 'c', 'a', 'd')(d)
# h, i, j, k => 1, 3, 1, 4

Alternate solutions are usually more complicated expressions.
d = dict(a=1, b=2, c=3, d=4)

# conceptually more complicated expressions
h, i, j, k = (d[i] for i in ('a', 'c', 'a', 'd'))
h, i, j, k = map(d.get, ('a', 'c', 'a', 'd'))  # d.get returns None for missing keys instead of raising KeyError


itemgetter provides a simpler expression for extracting multiple ad-hoc values from a dictionary.
§