Style Matters

Grokking Pythonic

Overview

A case for style
The Zen
PEP8
Anti-patterns and code smells

A Case for Style

As long as it works, why does it matter what it looks like?

What does work mean?

The inputs produce the expected outputs?
The code is understandable?
The code is easy to change/debug?

A Case for Style

Idiomatic Python

Pythonic: a coding style that leverages Python’s features to make readable and beautiful software.

Pythonic code:

Embodies Python’s guiding principles (The Zen of Python, PEP 20)
Follows community norms (PEP 8, PEP 257, black, etc.)
Makes good use of Python’s features and libraries
Is easy to change

History of The Zen

Tim Peters wrote the Zen of Python, which was officially adopted via a Python Enhancement Proposal (PEP 20) in 2004.

import this

Tim also invented Timsort, a popular sorting algorithm used in many modern languages.

Side Note: What’s a PEP?

The Python programming language evolves through Python Enhancement Proposals, which provide a mechanism for public discussion and peer-review of language changes and new features.

Surprisingly, not everyone on the internet agrees, and sometimes the discussions can get a bit heated and deviate from civility. See the Walrus Operator.

The Zen of Python

1. Beautiful

Beautiful is better than ugly.

Beautiful code:

Although beauty is subjective, this Stack Overflow post does a good job of explaining the main attributes of beautiful code, which include:

Clarity and Transparency
Elegance
Efficiency
Aesthetics

It can take some time to develop a taste for beautiful Python. Study professional code and keep programming.

The Zen of Python

2. Explicit

Explicit is better than implicit.

Explicit

Explicit code means the abstractions are consistent and clear. It doesn’t mean everything is spelled out (you don’t have to understand bytecode, assembly, or semiconductor physics for your code to be explicit).

The Zen of Python

3. Simple

Simple is better than complex. Complex is better than complicated.

Simple

Simple means the most direct approach (fewest lines of code, most readable) is preferred. This means avoiding complex features when possible (e.g., dictionaries and numpy arrays are better than custom classes for simple cases). However, sometimes the correct behavior is complex, so complex code is unavoidable. In this case, it should be kept as simple as possible (for the user).

The Zen of Python

4. Flat

Flat is better than nested.

Flat

Highly nested structures are complex, and they often require recursion to effectively navigate. This should be avoided when possible. In scientific computing, numpy operations are strongly preferred over nested for loops, both for efficiency and readability.

The Zen of Python

5. Sparse

Sparse is better than dense.

Concise

Sparsity means fewer things. Fewer classes, fewer functions, fewer parameters.

Note

“Perfection is achieved, not when there is nothing to add, but when there is nothing left to take away.”

– Antoine de Saint-Exupéry

The Zen of Python

6. Readable

Readability counts.

Readable

Readability means code is optimized not just for a computer to understand, but for a developer to understand. Documentation is a first-class concern.

“Any fool can write code that a computer can understand. Good programmers write code that humans can understand.”

– Martin Fowler

The Zen of Python

7. Consistent

Special cases aren’t special enough to break the rules.

Consistent

Consistent means abstractions are clear and not full of special exceptions. Software shouldn’t be like the English language.

The Zen of Python

8. Pragmatic

Although practicality beats purity.

Pragmatic

Pragmatic means prioritizing functionality and usability over more esoteric concerns and unlikely edge cases.

The Zen of Python

9. Correct

Errors should never pass silently. Unless explicitly silenced.

Correct

Errors should be raised unless the way to deal with them is obvious.
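A minimal sketch of the difference between letting an error surface and explicitly silencing one. The function names (parse_port, remove_if_present) are hypothetical and exist only for illustration.

```python
import contextlib
import os


def parse_port(text):
    """Convert text to a port number, raising if it is not valid."""
    port = int(text)  # raises ValueError for non-numeric input
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port


def remove_if_present(path):
    """Delete a file; a missing file is explicitly silenced."""
    # Only FileNotFoundError is suppressed; other errors still propagate.
    with contextlib.suppress(FileNotFoundError):
        os.remove(path)
```

In the first function there is no obvious way to recover, so the error is raised. In the second, a missing file already is the desired end state, so silencing that one exception type is a deliberate choice.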

The Zen of Python

10. Unsurprising

In the face of ambiguity, refuse the temptation to guess.

Unsurprising

Guessing what a user may want when there are several viable choices is rarely a good idea.
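As an illustration, a date such as "01/02/2023" is ambiguous. The sketch below uses a hypothetical parse_date helper that asks the caller for the format rather than guessing.

```python
from datetime import datetime


def parse_date(text, fmt):
    """Parse a date string using an explicit format.

    "01/02/2023" could mean January 2nd or February 1st, so the caller
    must say which; the function does not guess.
    """
    return datetime.strptime(text, fmt).date()


day_first = parse_date("01/02/2023", "%d/%m/%Y")
month_first = parse_date("01/02/2023", "%m/%d/%Y")
```

The same input yields two different dates depending on the format, which is why a silent default would be a surprise waiting to happen.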

The Zen of Python

11. Intuitive

There should be one – and preferably only one – obvious way to do it. Although that way may not be obvious at first unless you’re Dutch.

Intuitive

Intuitive software is best illustrated when doing new things seldom requires referencing the documentation; the abstractions, object names, and APIs are such that it is obvious how certain things should be done.

Intuitive

Python’s creator, Guido van Rossum, is Dutch, hence the inside joke in the PEP.

As the old adage goes:

If you ain’t Dutch, you ain’t much

The Zen of Python

12. Operational

Now is better than never.

Operational

Usually it is better to have something that works for 90% of the cases now than a perfect program sometime in the distant future (which often means never).

The Zen of Python

13. Flexible

Although never is often better than *right* now.

Flexible

On the flip side, implementations that are too hastily done can be hard to change, especially when they must be maintained forever. Like an angry email to your boss, some code is better left unwritten.

The Zen of Python

14. Explainable

If the implementation is hard to explain, it’s a bad idea. If the implementation is easy to explain, it may be a good idea.

Explainable

Working through a hard programming problem? Grab a friend or a rubber duck and explain your implementation. Often simpler approaches will surface.

The Zen of Python

15. Organized

Namespaces are one honking great idea – let’s do more of those!

Organized

Coming from MATLAB, I found having to import Python modules annoying. Why is it np.pi rather than just pi? Then I created a variable named pi in a MATLAB finite element code which, of course, overwrote the expected value. It took me several hours to debug, but afterward I understood why namespaces are a “honking” good idea.
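A small sketch of the same situation in Python, using the standard library math module instead of numpy so it runs anywhere. The circle_area function is hypothetical.

```python
import math

pi = 3  # a local name that happens to be called pi


def circle_area(radius):
    """Area of a circle, using the constant from the math namespace."""
    return math.pi * radius**2


area = circle_area(1)
```

Because the constant lives in the math namespace, the unrelated local variable named pi cannot overwrite it.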

Python Style (PEP 8)

Python style-guide adopted in 2001
A set of guidelines for writing pythonic code
Originally just for the python source code, adopted more broadly later

Background

PEP 8 is Python’s style guide. While originally only applicable to the standard library, it is now widely considered good practice for (nearly) all Python projects.

Here we explore a few of the more important aspects of PEP8, but you can find the whole thing here.

PEP 8: Line Lengths

Limit line lengths to 79 characters
Limit docstring and comment lines to 72 characters

Note

Many modern formatting tools relax this limit; black, for example, defaults to 88 characters.

PEP 8: Imports

Each import should be a single line
Unless multiple objects are imported from the same module

no

import numpy as np, scipy, matplotlib.pyplot as plt, sklearn

yes

import matplotlib.pyplot as plt
import numpy as np
import scipy
import sklearn
from pandas import Series, DataFrame

PEP 8: Imports

Imports should be grouped by
- __future__ imports (these must come before any other import)
- standard library
- external packages
- internal packages

Imports should be organized alphabetically

Avoid wildcards

PEP 8: Imports

from __future__ import annotations

import pathlib
from collections import defaultdict

import matplotlib.pyplot as plt
import numpy as np

from mylibrary.util import combobulate

PEP 8: Naming

i (single lowercase letter) - used for loop counters and index variables
PascalCase - used for class definitions
snake_case - used for functions, methods, modules, variables …
UPPER_SNAKE_CASE - used for constants
_private_variable - _ indicates non-public object
__mangled - leading __ performs name mangling in classes
In most cases, dunders (e.g., __variable__) should be avoided
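The conventions above can be sketched in one small class. Account and its attributes are hypothetical names chosen only to show how the leading underscores behave.

```python
class Account:
    """Hypothetical class used to illustrate naming conventions."""

    MAX_WITHDRAWAL = 500  # constant: UPPER_SNAKE_CASE

    def __init__(self, balance):
        self._balance = balance  # non-public by convention only
        self.__pin = "0000"  # stored as _Account__pin (name mangling)


account = Account(100)
# Names that actually exist on the instance and contain "pin".
mangled_names = [name for name in vars(account) if "pin" in name]
```

Here mangled_names contains only "_Account__pin". A single leading underscore is purely a convention, while a double leading underscore is rewritten by the interpreter, mainly to avoid name clashes in subclasses.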

PEP 8: Naming

wrong

myVariable = 42  # camel case

MyVariable = 42  # pascal case

my-variable = 42  # kebab case (also a syntax error)

correct

my_variable = 42  # snake case

PEP 8: Try/Except

try/except clauses should:

Use specific exception types (where possible)
Limit the logic in the try/except scope

PEP 8: Try/Except

wrong

try:
    cool_function(my_inputs)
except: 
    pass

correct

try:
    cool_function(my_inputs)
except (ValueError, SpecificError): 
    pass

PEP 8: Try/Except

wrong

try:
    value = my_dict[key]
    return handle_value(value)
except KeyError:
    return key_not_found(key)

correct

try:
    value = my_dict[key]
except KeyError:
    return key_not_found(key)
else:
    return handle_value(value)

PEP 8: Membership

Negative membership checks

wrong

not "a" in my_dict

correct

"a" not in my_dict

PEP 8: Boolification

Bool checks

# Wrongest
if my_var is True:
    ...
# Wronger
if my_var == True:
    ...
# Correct
if my_var:
    ...
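A short sketch of why the three checks are not interchangeable. The describe helper is hypothetical; it simply reports how each style of check treats a value.

```python
def describe(value):
    """Report how the three styles of boolean check treat a value."""
    return {
        "is True": value is True,
        "== True": value == True,  # noqa: E712 (shown only for comparison)
        "truthy": bool(value),
    }


# Keys are the repr of each value, e.g. "True", "1", "[0]", "''".
results = {repr(v): describe(v) for v in (True, 1, [0], "")}
```

The integer 1 passes the equality check but fails the identity check, and the non-empty list [0] passes only the plain truthiness check. A bare `if my_var:` is the one that matches what readers usually expect.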

PEP 8: Spacing

wrong

spam( ham[ 1 ], { eggs : 2 } )

correct

spam(ham[1], {eggs: 2})

PEP 8: Spacing

wrong

bar = (0, )

correct

foo = (0,)

PEP 8: Spacing

wrong

x             = 1
y             = 2
long_variable = 3

correct

x = 1
y = 2
long_variable = 3

PEP 8: Type Checks

wrong

type(my_dict) == dict

ok

isinstance(my_dict, dict)

better

from collections.abc import Mapping
isinstance(my_dict, Mapping)
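To see what each check accepts, the sketch below runs all three against a few standard library mapping types. The candidates dictionary is just illustrative test data.

```python
from collections import OrderedDict, defaultdict
from collections.abc import Mapping
from types import MappingProxyType

candidates = {
    "dict": {"a": 1},
    "OrderedDict": OrderedDict(a=1),
    "defaultdict": defaultdict(list),
    "MappingProxyType": MappingProxyType({"a": 1}),
}

# For each object: (exact type match, dict subclass, any mapping)
checks = {
    name: (
        type(obj) == dict,  # noqa: E721 (the discouraged check)
        isinstance(obj, dict),
        isinstance(obj, Mapping),
    )
    for name, obj in candidates.items()
}
```

The exact type comparison rejects every subclass, isinstance with dict accepts subclasses but rejects the read-only MappingProxyType, and the Mapping check accepts anything that behaves like a dictionary.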

Understanding Check

myConstant = 10

class snake_class:
    '''A special class for snakes.'''
    __snake_type__="python"
    _mylonglist = [1,2,3]
    
    def ShedSkin(self ):
        return self._mylonglist[0 : 2]

Understanding Check

MY_CONSTANT = 10

class SnakeClass:
    """A special class for snakes."""
    _snake_type = "python"
    _my_long_tuple = (1, 2, 3)
    
    def shed_skin(self):
        return self._my_long_tuple[0:2]

PEP 8: When to Ignore It?

A style guide is about consistency.

When to ignore guidelines:

Following it makes the code less readable (even for someone used to the guideline)
Existing code follows a different style (stay consistent with it, or clean it up)

Style: Automation

Style should be enforced by each project
Nitpicks are better taken from a bot
Consistency is important!
Use automatic linting tools

Style: Automation Tools

black, autopep8, pyflakes, isort, flake8 …
All tools can be bundled into pre-commit
recommendation: Try shed
ruff is also becoming popular

Shed Installation

Shed can be installed with pip:

pip install shed

Then run it while in your repo with

shed

Code Smells and Anti-patterns

Style/design issues in code that “works”, but isn’t pythonic (elegant, efficient, readable, …)

Code Smell: Complexity

def my_func(arg_1):
    """An Example over-indented function."""
    if isinstance(arg_1, int):
        if arg_1 > 0:
            if arg_1 < 20:
                for n in range(arg_1):
                    ...

Code Smell: Complexity

def my_func(arg_1):
    """Improvements with syntactic sugar."""
    if isinstance(arg_1, int) and 0 < arg_1 < 20:
        for n in range(arg_1):
            ...

Code Smell: Complexity

def _is_valid_arg(arg_1):
    """Return True if arg_1 is valid for use in my_func."""
    return isinstance(arg_1, int) and 0 < arg_1 < 20


def my_func(arg_1):
    """Improvement with new functions."""
    if _is_valid_arg(arg_1):
        for n in range(arg_1):
            ...

Code Smell: Complexity

How to measure?

lines per function/method/class/module
number of indents (more rigorous: cyclomatic complexity)
number of symbols defined
others?

Code Smell: Complexity

Complexity deodorant

Divide functionality into multiple functions
Create abstractions (e.g., use/create classes)
Syntactic sugar (make complexity more readable)

Code Smell: Readability

What does this do?

def func(a, b=1, c=0):
    d = np.mean(a, c, keepdims=True)
    e = (a - d) ** b
    f = a.shape[c]
    return np.sum(e, c) / f

Code Smell: Readability

Add meaningful names and docstring

import numpy as np


def get_stats_moment(array, moment=1, axis=0):
    """Calculate the central statistical moment along an axis."""
    mean = np.mean(array, axis=axis, keepdims=True)
    demean_raised = (array - mean) ** moment
    sample_count = array.shape[axis]
    demean_sum = np.sum(demean_raised, axis=axis)
    return demean_sum / sample_count

Code Smell: Readability

Even Better, use scipy!

from scipy.stats import moment

Code Smell: Readability

Incomprehensible Comprehensions

nested = [1, 2, 3, [2, 3], {1, 2, 3}]

[y for z in [x if isinstance(x, list) else[x] \
for x in nested] for y in z]

Code Smell: Readability

Break into different lines

[
    y for z in 
    [
        x if isinstance(x, list) 
        else [x] 
        for x in nested
    ] 
    for y in z
]

Code Smell: Readability

Wrap in a function

def unwrap_list(nested):
    """Unwrap nested heterogeneous lists."""
    out = [
        y for z in
        [
            x if isinstance(x, list)
            else [x]
            for x in nested
        ]
        for y in z
    ]
    return out

Code Smell: Readability

If still unreadable, remove comprehension.

def unwrap_list(nested):
    """Unwrap nested heterogeneous lists."""
    out = []
    for element in nested:
        if isinstance(element, list):
            for sub_element in element:
                out.append(sub_element)
        else:
            out.append(element)
    return out

Code Smell: Signatures

Signature: inputs and outputs of a callable (can include names and types)

Common problems:

Too many arguments
Variable output types

Code Smell: Too Many Inputs

Too many input parameters (probably need different functions)

def do_stats(
    ar, 
    mean=True, 
    median=True, 
    std=True, 
    kurtosis=True,
    ...
):
    ...

Code Smell: Too Many Inputs

Note

The largest signature I know of is pd.read_csv which, as of pandas 1.5.1, supports 51 arguments. Although this is not ideal, it is one of the most used functions in the Python data ecosystem. Also, all but one of its parameters are optional, and keyword-only arguments are used.
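Keyword-only arguments are declared by placing a bare * in the signature. The sketch below uses a hypothetical load_table function (not the pandas API) to show the effect.

```python
def load_table(path, *, delimiter=",", skip_rows=0, encoding="utf-8"):
    """Hypothetical loader; everything after * must be passed by name."""
    return {
        "path": path,
        "delimiter": delimiter,
        "skip_rows": skip_rows,
        "encoding": encoding,
    }


options = load_table("data.csv", skip_rows=2)
```

Calling load_table("data.csv", ";") raises a TypeError, so callers cannot rely on argument order. That keeps long signatures readable at the call site and lets parameters be added or reordered later without breaking existing code.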

Code Smell: Multiple Return Types

Output changes based on parameters

import numpy as np

def std(ar, axis=0, return_mean=True):
    """Calc std of array"""
    mean = np.mean(ar, axis=axis, keepdims=True)
    std_ar = np.sqrt(np.sum((ar - mean) ** 2, axis=axis) / ar.shape[axis])
    if return_mean:
        return std_ar, mean
    else:
        return std_ar

Code Smell: Multiple Return Types

None returned rather than raising an Error

import numpy as np

def std(ar, axis=0):
    """Calc std of array"""
    mean = np.mean(ar, axis=axis, keepdims=True)
    std_ar = np.sqrt(np.sum((ar - mean) ** 2, axis=axis) / ar.shape[axis])
    if np.any(np.isnan(std_ar)):
        return None
    return std_ar
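One possible way to avoid both smells is to always return the same type and to raise on invalid input. The sketch below is an assumption-laden illustration using only the standard library; Spread and spread are hypothetical names, and it works on a flat sequence rather than a numpy array.

```python
import math
from typing import NamedTuple


class Spread(NamedTuple):
    """Result type that is the same for every call."""

    mean: float
    std: float


def spread(values):
    """Return the mean and population standard deviation of values."""
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    if any(math.isnan(v) for v in values):
        raise ValueError("values must not contain NaN")
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return Spread(mean=mean, std=math.sqrt(variance))


result = spread([2, 4, 4, 4, 5, 5, 7, 9])
```

Callers who only need the standard deviation read result.std, so no flag is needed to change the shape of the output, and bad input fails loudly instead of returning None.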

Code Smell: Ignorance

Not using language/std lib features:

iteration
zip
enumerate
dict.get
dict.items
collections
list comprehensions

Code Smell: Ignorance

Iteration: Bad

some_list = [1, 2, 3]

for i in range(len(some_list)):
    val = some_list[i]

Iteration: Good

some_list = [1, 2, 3]

for val in some_list:
    ...

Code Smell: Ignorance

Dual Iteration: Bad

list_1 = [1, 2, 3]
list_2 = [4, 5, 6]

list_len = min([len(list_1), len(list_2)])

for i in range(list_len):
    val_1 = list_1[i]
    val_2 = list_2[i]

Code Smell: Ignorance

Dual Iteration: Good

list_1 = [1, 2, 3]
list_2 = [4, 5, 6]

for val_1, val_2 in zip(list_1, list_2):
    ...
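One caveat worth knowing: zip stops at the shortest input without complaint. When the lengths may differ, itertools.zip_longest from the standard library keeps the extra items. The lists below are illustrative data only.

```python
from itertools import zip_longest

list_1 = [1, 2, 3]
list_2 = [4, 5]

# zip silently drops the unmatched 3.
truncated = list(zip(list_1, list_2))

# zip_longest pads the shorter input with fillvalue instead.
padded = list(zip_longest(list_1, list_2, fillvalue=None))
```

Here truncated is [(1, 4), (2, 5)] and padded is [(1, 4), (2, 5), (3, None)].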

Code Smell: Ignorance

Enumeration: Bad

some_list = [1, 2, 3]

count = 0
for val in some_list:
    count += 1

Enumeration: Good

some_list = [1, 2, 3]

for count, val in enumerate(some_list):
    ...
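By default enumerate counts from zero. If a one-based count is wanted, as in a numbered list, the start parameter avoids adding one by hand. The list contents below are illustrative.

```python
some_list = ["a", "b", "c"]

# start=1 makes the counter one-based without manual arithmetic.
numbered = [f"{count}. {val}" for count, val in enumerate(some_list, start=1)]
```

This produces ["1. a", "2. b", "3. c"].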

Code Smell: Ignorance

Dubious Keys: Bad

some_dict = {1: 1, 2: 2}

if 3 in some_dict:
    val = some_dict[3]
else:
    val = 3

Dubious Keys: Good

some_dict = {1: 1, 2: 2}

val = some_dict.get(3, 3)

Code Smell: Ignorance

Key Value Iteration: Bad

some_dict = {1: 1, 2: 2}

for key in some_dict:
    value = some_dict[key]

Key Value Iteration: Good

some_dict = {1: 1, 2: 2}

for key, value in some_dict.items():
    ...

Code Smell: Ignorance

Nested Dict: Bad

key_dict = {}  # a dict of lists for storing outputs

# iterate key/value of data dict
for key, data in some_data.items():
    if key in key_dict:
        key_dict[key].append(data)
    else:
        key_dict[key] = [data]

Code Smell: Ignorance

Nested Dict: Good

from collections import defaultdict

key_dict = defaultdict(list)  # a dict of lists for storing outputs

# iterate key/value of data dict
for key, data in some_data.items():
    key_dict[key].append(data)

Code Smell: Ignorance

Queueing: Bad

queue = [1, 2, 3]  # a poor man's queue

# pop last value off queue
last_value = queue.pop(-1)

# append to queue end
queue.append(4)

Code Smell: Ignorance

Queueing: Good

from collections import deque
queue = deque([1, 2, 3])  # a proper queue

# pop last value off queue
last_value = queue.pop()

# append to queue end
queue.append(4)

# pop first value
first_value = queue.popleft()

# append to the front of the queue
queue.appendleft(-1)

Code Smell: Ignorance

Filtering: Bad

out = []  # get all even numbers < 100
for a in range(100):
    if a % 2 == 0:
        out.append(a)

Filtering: Good

out = [a for a in range(100) if a % 2 == 0]

Summary

Style matters
- “Was this written by a scientist?”
Remember the Zen
Follow guidelines
Code should smell nice
Consistency