Style Matters

Grokking Pythonic

Overview


  • A case for style
  • The Zen
  • PEP8
  • Anti-patterns and code smells

A Case for Style


As long as it works, why does it matter what it looks like?

What does work mean?

  • The inputs produce the expected outputs?
  • The code is understandable?
  • The code is easy to change/debug?

Idiomatic Python

Pythonic: a coding style that leverages Python’s features to make readable and beautiful software.

Pythonic code:

  • Embodies Python’s guiding principles (The Zen of Python, PEP 20)
  • Follows community norms (PEP 8, PEP 257, black, etc.)
  • Makes good use of Python’s features and libraries
  • Is easy to change

History of The Zen

Tim Peters wrote the Zen of Python, which was officially adopted via a Python Enhancement Proposal (PEP 20) in 2004.

import this

Tim also invented Timsort, a popular sorting algorithm used in many modern languages.

Side Note: What’s a PEP?


The Python programming language evolves through Python Enhancement Proposals, which provide a mechanism for public discussion and peer-review of language changes and new features.

Surprisingly, not everyone on the internet agrees, and sometimes the discussions can get a bit heated and deviate from civility. See the Walrus Operator.

The Zen of Python


1. Beautiful

Beautiful is better than ugly.

Beautiful code:

Although beauty is subjective, this Stack Overflow post does a good job of explaining the main attributes of beautiful code, which include:

  • Clarity and Transparency

  • Elegance

  • Efficiency

  • Aesthetics

It can take some time to develop a taste for beautiful Python. Study professional code and keep programming.

The Zen of Python


2. Explicit

Explicit is better than implicit.

Explicit


Explicit code means the abstractions are consistent and clear. It doesn’t mean everything is spelled out (you don’t have to understand bytecode, assembly, or semiconductor physics for your code to be explicit).
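
As a small illustration, the sketch below calls the same function two ways. The function scale and its parameter names are hypothetical and exist only for this example.

```python
import math


def scale(value, factor, offset):
    """Return value * factor + offset (hypothetical helper)."""
    return value * factor + offset


# Implicit: the reader has to remember the argument order.
implicit_result = scale(2, 3, 1)

# Explicit: keyword arguments say what each number means, and
# math.sqrt makes it clear where sqrt comes from.
explicit_result = scale(value=2, factor=3, offset=1)
root = math.sqrt(16)
```

Both calls compute the same thing; only the second one can be read without looking up the signature.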

The Zen of Python


3. Simple

Simple is better than complex. Complex is better than complicated.

Simple


Simple means the most direct approach (fewest lines of code, most readable) is preferred. This means avoiding complex features when possible (e.g., dictionaries and numpy arrays are better than custom classes for simple cases). However, sometimes the correct behavior is complex, so complex code is unavoidable. In that case, it should be kept as simple as possible (for the user).

The Zen of Python


4. Flat

Flat is better than nested.

Flat


Highly nested structures are complex, and they often require recursion to effectively navigate. This should be avoided when possible. In scientific computing, numpy operations are strongly preferred over nested for loops, both for efficiency and readability.
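
One possible reading of this for plain data structures is sketched below. The configuration keys are made up for the example, and a dotted key is only one of several ways to flatten a hierarchy.

```python
# Nested: reaching a value means walking several levels.
nested_config = {"plot": {"axis": {"x": {"label": "time"}}}}
label_nested = nested_config["plot"]["axis"]["x"]["label"]

# Flat: a single level, with the path spelled out in the key.
flat_config = {"plot.axis.x.label": "time"}
label_flat = flat_config["plot.axis.x.label"]
```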

The Zen of Python


5. Sparse

Sparse is better than dense.

Concise


Sparsity means fewer things. Fewer classes, fewer functions, fewer parameters.

Note

“Perfection is achieved, not when there is nothing to add, but when there is nothing left to take away.”

-Antoine de Saint Exupéry

The Zen of Python


6. Readable

Readability counts.

Readable


Readability means code is optimized not just for a computer to understand, but for a developer to understand. Documentation is a first-class concern.

Any fool can write code that a computer can understand. Good programmers write code that humans can understand.

– Martin Fowler

The Zen of Python


7. Consistent

Special cases aren’t special enough to break the rules.

Consistent


Consistent means abstractions are clear and not full of special exceptions. Software shouldn’t be like the English language.

The Zen of Python


8. Pragmatic

Although practicality beats purity.

Pragmatic


Pragmatic means prioritizing functionality and usability over more esoteric concerns and unlikely edge cases.

The Zen of Python


9. Correct

Errors should never pass silently. Unless explicitly silenced.

Correct


Errors should be raised unless the way to deal with them is obvious.
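
A minimal sketch of the difference, using two hypothetical port parsers: the first hides bad input behind a default, the second lets the error surface with some context.

```python
def parse_port_silent(text):
    """Anti-pattern: swallow the error and return a made-up default."""
    try:
        return int(text)
    except ValueError:
        return 0  # the caller never learns the input was bad


def parse_port(text):
    """Let the error surface, with context about what went wrong."""
    try:
        return int(text)
    except ValueError as error:
        raise ValueError(f"invalid port number: {text!r}") from error
```

With parse_port_silent a typo in a config file turns into port 0 somewhere far from the cause; with parse_port the program stops at the bad value.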

The Zen of Python


10. Unsurprising

In the face of ambiguity, refuse the temptation to guess.

Unsurprising


Guessing what a user may want when there are several viable choices is rarely a good idea.
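
Dates are a classic case. In the sketch below, the hypothetical helper parse_date refuses to guess and makes the caller state the format.

```python
from datetime import date, datetime


def parse_date(text, fmt):
    """Parse a date; the caller must state the format explicitly."""
    return datetime.strptime(text, fmt).date()


# "01/02/2023" could be 1 February or 2 January. Rather than guess,
# the caller says which one is meant.
day_first = parse_date("01/02/2023", "%d/%m/%Y")
month_first = parse_date("01/02/2023", "%m/%d/%Y")
```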

The Zen of Python


11. Intuitive

There should be one – and preferably only one – obvious way to do it. Although that way may not be obvious at first unless you’re Dutch.

Intuitive


Intuitive software seldom requires referencing the documentation to do new things; the abstractions, object names, and APIs are such that it is obvious how things should be done.

Intuitive


Python’s creator, Guido van Rossum, is Dutch, hence the inside joke in the PEP.

As the old adage goes:

If you ain’t Dutch, you ain’t much


The Zen of Python


12. Operational

Now is better than never.

Operational


Usually it is better to have something that works for 90% of the cases now than a perfect program sometime in the distant future (which often means never).

The Zen of Python


13. Flexible

Although never is often better than *right* now.

Flexible


On the flip side, implementations that are done too hastily can be hard to change, especially when they must be maintained forever. Like an angry email to your boss, some code is better left unwritten.

The Zen of Python


14. Explainable

If the implementation is hard to explain, it’s a bad idea. If the implementation is easy to explain, it may be a good idea.

Explainable


Working through a hard programming problem? Grab a friend or a rubber duck and explain your implementation. Often simpler approaches will surface.

The Zen of Python


15. Organized

Namespaces are one honking great idea – let’s do more of those!

Organized


Coming from MATLAB, I found having to import Python modules annoying. Why is it np.pi rather than just pi? Then I created a variable named pi in a MATLAB finite element code which, of course, overwrote the expected value. It took me several hours to debug, but afterward I understood why namespaces are a “honking” good idea.
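
The same situation in Python is harmless. The sketch below uses the standard library math module rather than numpy to stay self-contained.

```python
import math

pi = 3  # a local variable that happens to be called pi

# The module's constant is untouched because it lives in the
# math namespace, not the local one.
circle_area = math.pi * 2.0 ** 2
```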

Python Style (PEP 8)


  • Python style guide, adopted in 2001
  • Guidelines for writing Pythonic code
  • Originally just for Python’s standard library, adopted more broadly later

Background


PEP8 is Python’s style guide. While originally only applicable to the standard library, it is now widely considered good practice for (nearly) all Python projects.

Here we explore a few of the more important aspects of PEP8, but you can find the whole thing in PEP 8 itself.

PEP 8: Line Lengths


  • Limit line lengths to 79 characters
  • Limit docstring and comment lines to 72 characters

Note

Many modern formatting tools relax this limit (e.g., black defaults to 88 characters).

PEP 8: Imports

  • Each import should be a single line
  • Unless multiple objects are imported from the same module
no
import numpy as np, scipy, matplotlib.pyplot as plt, sklearn


yes
import matplotlib.pyplot as plt
import numpy as np
import scipy
import sklearn
from pandas import Series, DataFrame

PEP 8: Imports

  • Imports should be grouped by
    • module level dunders (__future__)
    • standard library
    • external packages
    • internal packages
  • Imports should be organized alphabetically
  • Avoid wildcards

PEP 8: Imports


from __future__ import annotations

import pathlib
from collections import defaultdict

import matplotlib.pyplot as plt
import numpy as np

from mylibrary.util import combobulate

PEP 8: Naming

  • i (single lowercase letter) - used for incremental variables.
  • PascalCase - used for class definitions
  • snake_case - used for functions, methods, modules, variables …
  • UPPER_SNAKE_CASE - used for constants
  • _private_variable - _ indicates non-public object
  • __mangled - leading __ performs name mangling in classes
  • In most cases, dunders (e.g., __variable__) should be avoided
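
To make the last few conventions concrete, here is a small sketch. The class Snake and its attributes are invented for the example.

```python
class Snake:
    SPECIES = "python"  # constant: UPPER_SNAKE_CASE

    def __init__(self):
        self._length = 3  # non-public by convention only
        self.__secret = "hiss"  # stored as _Snake__secret


snake = Snake()

# Name mangling renames the attribute; it does not make it private.
mangled_value = snake._Snake__secret
has_plain_name = hasattr(snake, "__secret")
```

The mangled attribute is still reachable, so the double underscore is a tool for avoiding name clashes in subclasses rather than an access control mechanism.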

PEP 8: Naming


wrong
myVariable = 42  # camel case

MyVariable = 42  # pascal case

my-variable = 42  # kebab case (not even valid Python)


correct
my_variable = 42  # snake case

PEP 8: Try/Except


try/except clauses should:

  • Use specific exception types (where possible)
  • Limit the logic in the try/except scope

PEP 8: Try/Except

wrong
try:
    cool_function(my_inputs)
except: 
    pass


correct
try:
    cool_function(my_inputs)
except (ValueError, SpecificError): 
    pass

PEP 8: Try/Except

wrong
try:
    value = my_dict[key]
    return handle_value(value)
except KeyError:
    return key_not_found(key)


correct
try:
    value = my_dict[key]
except KeyError:
    return key_not_found(key)
else:
    return handle_value(value)

PEP 8: Membership

Negative membership checks


wrong
not "a" in my_dict


correct
"a" not in my_dict

PEP 8: Boolification

Bool checks


# Wrongest
if my_var is True:
    ...
# Wronger
if my_var == True:
    ...
# Correct
if my_var:
    ...
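
The three checks are not just stylistically different. The sketch below (describe is a hypothetical function) shows that a truthy value need not be equal or identical to True.

```python
def describe(items):
    """Return a message using truthiness instead of comparing to True."""
    if items:  # empty containers, 0, "" and None are all falsy
        return "has items"
    return "empty"


non_empty = [1, 2]

# A non-empty list is truthy, but it is neither equal nor identical
# to True, so the two "wrong" comparisons take the other branch.
equals_true = non_empty == True  # noqa: E712
is_true = non_empty is True
```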

PEP 8: Spacing


wrong
spam( ham[ 1 ], { eggs : 2 } )


correct
spam(ham[1], {eggs: 2})

PEP 8: Spacing


wrong
bar = (0, )


correct
foo = (0,)

PEP 8: Spacing


wrong
x             = 1
y             = 2
long_variable = 3


correct
x = 1
y = 2
long_variable = 3

PEP 8: Type Checks


wrong
type(my_dict) == dict


ok
isinstance(my_dict, dict)


better
from collections.abc import Mapping
isinstance(my_dict, Mapping)
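
A short sketch of why the three checks are ranked this way, using two standard library mapping types as test subjects:

```python
from collections import OrderedDict
from collections.abc import Mapping
from types import MappingProxyType

ordered = OrderedDict(a=1)  # a dict subclass
read_only = MappingProxyType({"a": 1})  # a mapping that is not a dict

# Exact type comparison rejects even dict subclasses.
exact_type_match = type(ordered) == dict

# isinstance accepts subclasses, but still only dicts.
ordered_is_dict = isinstance(ordered, dict)
proxy_is_dict = isinstance(read_only, dict)

# The abstract base class accepts anything that behaves like a mapping.
ordered_is_mapping = isinstance(ordered, Mapping)
proxy_is_mapping = isinstance(read_only, Mapping)
```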

Understanding Check


myConstant = 10

class snake_class:
    '''A special class for snakes.'''
    __snake_type__="python"
    _mylonglist = [1,2,3]
    
    def ShedSkin(self ):
        return self._mylonglist[0 : 2]

Understanding Check


MY_CONSTANT = 10

class SnakeClass:
    """A special class for snakes."""
    _snake_type = "python"
    _my_long_tuple = (1, 2, 3)
    
    def shed_skin(self):
        return self._my_long_tuple[0:2]

PEP 8: When to Ignore It?

A style guide is about consistency.

When to ignore guidelines:

  • Makes the code less readable (for someone used to the guideline)
  • Existing code follows a different style (or clean it up)

Style: Automation


  • Style should be enforced by each project
  • Nitpicks are better taken from a bot
  • Consistency is important!
  • Use automatic linting tools

Style: Automation Tools


  • black, autopep8, pyflakes, isort, flake8 …
  • All tools can be bundled into pre-commit
  • recommendation: Try shed
  • ruff is also becoming popular

Shed Installation

Shed can be installed with pip:

pip install shed

Then run it while in your repo with

shed

Code Smells and Anti-patterns


Style/design issues in code that “works”, but isn’t pythonic (elegant, efficient, readable, …)

Code Smell: Complexity


def my_func(arg_1):
    """An Example over-indented function."""
    if isinstance(arg_1, int):
        if arg_1 > 0:
            if arg_1 < 20:
                for n in range(arg_1):
                    ...

Code Smell: Complexity


def my_func(arg_1):
    """Improvements with syntactic sugar."""
    if isinstance(arg_1, int) and 0 < arg_1 < 20:
        for n in range(arg_1):
            ...

Code Smell: Complexity


def _is_valid_arg(arg_1):
    """Return True if arg_1 is valid for use in my_func."""
    return isinstance(arg_1, int) and 0 < arg_1 < 20


def my_func(arg_1):
    """Improvement with new functions."""
    if _is_valid_arg(arg_1):
        for n in range(arg_1):
            ...

Code Smell: Complexity


How to measure?

  • lines per function/method/class/module
  • number of indents (more rigorous: cyclomatic complexity)
  • number of symbols defined
  • others?

Code Smell: Complexity


Complexity deodorant

  • Divide functionality into multiple functions
  • Create abstractions (e.g., use/create classes)
  • Syntactic sugar (make complexity more readable)
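
Another common deodorant, not listed above, is the guard clause: handle the invalid cases first and return or raise early. The sketch below is a hypothetical variant of my_func; note that it raises on bad input, which the earlier versions do not.

```python
def count_steps(arg_1):
    """Hypothetical variant of my_func that uses guard clauses."""
    if not isinstance(arg_1, int):
        raise TypeError("arg_1 must be an int")
    if not 0 < arg_1 < 20:
        raise ValueError("arg_1 must be between 1 and 19")
    # The main logic now sits at a single indent level.
    return sum(range(arg_1))
```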

Code Smell: Readability


What does this do?

def func(a, b=1, c=0):
    d = np.mean(a, c, keepdims=True)
    e = (a - d) ** b
    f = a.shape[c]
    return np.sum(e, c) / f

Code Smell: Readability


Add meaningful names and docstring

def get_stats_moment(array, moment=1, axis=0):
    """Calculate the statistical moment."""
    mean = np.mean(array, axis=axis, keepdims=True)
    demean_raised = (array - mean) ** moment
    sample_count = array.shape[axis]
    demean_sum = np.sum(demean_raised, axis=axis) 
    return demean_sum / sample_count

Code Smell: Readability


Even Better, use scipy!

from scipy.stats import moment

Code Smell: Readability


Incomprehensible Comprehensions

nested = [1, 2, 3, [2, 3], {1, 2, 3}]

[y for z in [x if isinstance(x, list) else[x] \
for x in nested] for y in z]

Code Smell: Readability


Break into different lines

[
    y for z in 
    [
        x if isinstance(x, list) 
        else [x] 
        for x in nested
    ] 
    for y in z
]

Code Smell: Readability


Wrap in a function

def unwrap_list(nested):
    """unwrap nested heterogeneous lists"""
    out = [
        y for z in
        [
            x if isinstance(x, list)
            else [x]
            for x in nested
        ]
        for y in z
    ]
    return out

Code Smell: Readability


If still unreadable, remove comprehension.

def unwrap_list(nested):
    """unwrap nested heterogeneous lists"""
    out = []
    for element in nested:
        if isinstance(element, list):
            for sub_element in element:
                out.append(sub_element)
        else:
            out.append(element)
    return out

Code Smell: Signatures


Signature: inputs and outputs of a callable (can include names and types)

Common problems:

  • Too many arguments
  • Variable output types

Code Smell: Too Many Inputs


Too many input parameters (probably need different functions)

def do_stats(
    ar, 
    mean=True, 
    median=True, 
    std=True, 
    kurtosis=True,
    ...
):
    ...
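
One way to split this up is sketched below, with one small function per statistic. The function names are hypothetical, and the standard library statistics module stands in for numpy so the example is self-contained.

```python
import statistics


def calc_mean(values):
    """Return the arithmetic mean."""
    return statistics.fmean(values)


def calc_median(values):
    """Return the median."""
    return statistics.median(values)


def calc_std(values):
    """Return the population standard deviation."""
    return statistics.pstdev(values)


data = [1.0, 2.0, 3.0, 4.0]

# The caller picks the statistics it needs; no boolean flags required.
summary = {"mean": calc_mean(data), "median": calc_median(data)}
```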

Code Smell: Too Many Inputs

Note

The largest signature I know of is pd.read_csv which, as of pandas 1.5.1, supports 51 arguments. Although this is not ideal, it is one of the most used functions in the Python data ecosystem. Also, all but one of its parameters are optional, and keyword-only arguments are used.

Code Smell: Multiple Return Types


Output changes based on parameters

import numpy as np

def std(ar, axis=0, return_mean=True):
    """Calc std of array"""
    mean = np.mean(ar, axis=axis, keepdims=True)
    std_ar = np.sqrt(np.sum((ar - mean) ** 2, axis=axis) / ar.shape[axis])
    if return_mean:
        return std_ar, mean
    else:
        return std_ar
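
A possible fix is to always return the same type. The sketch below uses a hypothetical SpreadResult named tuple and the standard library statistics module instead of numpy; it illustrates the idea and is not a drop-in replacement for the function above.

```python
import statistics
from typing import NamedTuple


class SpreadResult(NamedTuple):
    """Both values are always returned, under stable names."""

    std: float
    mean: float


def calc_spread(values):
    """Return the population std and the mean of a sequence."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values, mu=mean)
    return SpreadResult(std=std, mean=mean)


result = calc_spread([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

Callers that only want the standard deviation use result.std, and the return type never depends on a flag.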

Code Smell: Multiple Return Types


None returned rather than raising an Error

import numpy as np

def std(ar, axis=0):
    """Calc std of array"""
    mean = np.mean(ar, axis=axis, keepdims=True)
    std_ar = np.sqrt(np.sum((ar - mean) ** 2, axis=axis) / ar.shape[axis])
    if np.any(np.isnan(std_ar)):
        return None
    return std_ar
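
A sketch of the alternative: raise instead of returning None, so the return type stays a float. It works on plain Python sequences with the standard library; calc_std is a hypothetical name.

```python
import math
import statistics


def calc_std(values):
    """Return the population standard deviation or raise ValueError."""
    if not values:
        raise ValueError("cannot compute std of an empty sequence")
    if any(math.isnan(value) for value in values):
        raise ValueError("input contains NaN")
    return statistics.pstdev(values)
```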

Code Smell: Ignorance

Not using language/std lib features:

  • iteration
  • zip
  • enumerate
  • dict.get
  • dict.items
  • collections
  • list comprehensions

Code Smell: Ignorance


Iteration: Bad
some_list = [1, 2, 3]

for i in range(len(some_list)):
    val = some_list[i]


Iteration: Good
some_list = [1, 2, 3]

for val in some_list:
    ...

Code Smell: Ignorance


Dual Iteration: Bad
list_1 = [1, 2, 3]
list_2 = [4, 5, 6]

list_len = min([len(list_1), len(list_2)])

for i in range(list_len):
    val_1 = list_1[i]
    val_2 = list_2[i]

Code Smell: Ignorance


Dual Iteration: Good
list_1 = [1, 2, 3]
list_2 = [4, 5, 6]

for val_1, val_2 in zip(list_1, list_2):
    ...
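
One thing to keep in mind: like the min() version, zip stops at the shorter input. If the leftover items matter, itertools.zip_longest is one option, as in this small sketch.

```python
from itertools import zip_longest

list_1 = [1, 2, 3]
list_2 = [4, 5]

# zip silently drops the unmatched 3.
truncated = list(zip(list_1, list_2))

# zip_longest keeps it and pads the shorter input with fillvalue.
padded = list(zip_longest(list_1, list_2, fillvalue=None))
```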

Code Smell: Ignorance

Enumeration: Bad
some_list = [1, 2, 3]

count = 0
for val in some_list:
    count += 1


Enumeration: Good
some_list = [1, 2, 3]

for count, val in enumerate(some_list):
    ...

Code Smell: Ignorance

Dubious Keys: Bad
some_dict = {1: 1, 2: 2}

if 3 in some_dict:
    val = some_dict[3]
else:
    val = 3


Dubious Keys: Good
some_dict = {1: 1, 2: 2}

val = some_dict.get(3, 3)

Code Smell: Ignorance


Key Value Iteration: Bad
some_dict = {1: 1, 2: 2}

for key in some_dict:
    value = some_dict[key]


Key Value Iteration: Good
some_dict = {1: 1, 2: 2}

for key, value in some_dict.items():
    ...

Code Smell: Ignorance


Nested Dict: Bad
key_dict = {}  # a dict of lists for storing outputs

# iterate key/value of data dict
for key, data in some_data.items():
    if key in key_dict:
        key_dict[key].append(data)
    else:
        key_dict[key] = [data]

Code Smell: Ignorance


Nested Dict: Good
from collections import defaultdict

key_dict = defaultdict(list)  # a dict of lists for storing outputs

# iterate key/value of data dict
for key, data in some_data.items():
    key_dict[key].append(data)

Code Smell: Ignorance


Queueing: Bad
queue = [1, 2, 3]  # a poor man's queue

# pop last value off queue
last_value = queue.pop(-1)

# append to queue end
queue.append(4)

Code Smell: Ignorance

Queueing: Good
from collections import deque
queue = deque([1, 2, 3])  # a proper queue

# pop last value off queue
last_value = queue.pop()

# append to queue end
queue.append(4)

# pop first value
first_value = queue.popleft()

# append to queue start
queue.appendleft(-1)
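
The payoff shows up at the front of the queue: a deque pops from the left in constant time, while a list has to shift every remaining element. A deque can also be bounded, as in this sketch of a "last three readings" buffer (the readings are made up).

```python
from collections import deque

recent = deque(maxlen=3)  # keeps only the three newest items

for reading in [1, 2, 3, 4, 5]:
    recent.append(reading)  # once full, the oldest item is dropped

# First in, first out: take the oldest remaining reading.
oldest = recent.popleft()
remaining = list(recent)
```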

Code Smell: Ignorance


Filtering: Bad
out = []  # get all even numbers < 100
for a in range(100):
    if a % 2 == 0:
        out.append(a)    


Filtering: Good
out = [a for a in range(100) if a % 2 == 0]

Summary


  • Style matters
    • “Was this written by a scientist?”
  • Remember the Zen
  • Follow guidelines
  • Code should smell nice
  • Consistency