Note

This page was generated from a jupyter notebook.

A super-brief intro to Python and NumPy

Python is:

  • interpreted (high level)

  • readable

  • concise

  • cross-platform

  • dynamically typed

  • object oriented

  • automatically memory-managed

Almost all of the below is explained much more fully at various places online. For a nice entry level tutorial set, try the Software Carpentry intros: http://swcarpentry.github.io/python-novice-inflammation/

The main Python documentation is also an extremely readable source of knowledge. Just Google!

PROGRAM FILES AND INTERACTIVE ENVIRONMENTS

Put Python code in .py text files (AKA “scripts”). Run these from a shell, as:

python myscript.py

OR

Use one of Python’s interactive environments (e.g., IPython):

ipython

In an interactive environment:

  • Run code line-by-line, preserving variables

  • Run your scripts, using the magic command %run (and preserve the variables)

This Jupyter notebook is an interactive environment.

MODULES

Python has some built-in functions, but everything else comes in a library module.

Modules are imported, then the functions they hold run with a dot syntax:

[1]:
import math  # comments with a hash

x = math.cos(2.0 * math.pi)
print(x)  # print is a built-in function
1.0

OR import the functions and properties individually:

[2]:
from numpy import cos, pi  # numpy, numeric python, also has these functions

x = cos(2.0 * pi)
print(x)
1.0
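
A third common pattern is to import a module under a short alias. The sketch below assumes NumPy is installed and uses the conventional alias np; it also shows that the NumPy version of cos accepts a whole array, while math.cos takes a single number.

```python
import math

import numpy as np  # "np" is the conventional alias for NumPy

# math.cos works on one number at a time.
scalar = math.cos(0.0)

# np.cos works element-wise on a whole array.
cosines = np.cos(np.array([0.0, np.pi]))

print(scalar)
print(cosines)
```

The alias keeps the dot syntax short while still making it clear which module each function comes from.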

Get help in an interactive shell by adding a leading or trailing ? to a name (e.g., ?pi or pi?). Quit the help pager with q.

[3]:
?pi

TYPES

Python distinguishes:

  • integer (int), e.g., 3

  • float (float), e.g., 1.0 or 2.5

  • boolean (bool), e.g., True

  • complex (complex), e.g., 3.2 + 2.4j (Python writes the imaginary unit as j, not i)

  • string (str), e.g., 'Hello world!'

You may also encounter NumPy types, like numpy.float64

[4]:
type(pi)
[4]:
float

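Typecasting (switching between types on the fly) is automatic where possible; for example, adding an int to a float gives a float. Each conversion costs time, though, so avoid repeated conversions in code that needs to be efficient.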
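
As a minimal sketch of the difference between automatic and explicit conversion (the variable names are illustrative only):

```python
# Implicit promotion: mixing an int and a float gives a float.
total = 3 + 1.5
print(type(total).__name__, total)  # float 4.5

# Explicit conversion uses the type name as a function.
as_int = int(2.9)        # truncates toward zero, giving 2
as_float = float("1e3")  # parses a string, giving 1000.0
as_str = str(42)         # gives the string '42'
as_bool = bool(0)        # zero is "falsy", giving False
print(as_int, as_float, as_str, as_bool)  # 2 1000.0 42 False
```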

Python’s inbuilt data structures are:

  • tuples, with parentheses: immutable (you can’t change values once they exist)

  • lists, with square brackets: mutable (you can change values once they exist)

  • sets, as set(): unordered collection with no duplicates

  • dictionaries, with curly brackets: associated pairs of key:value

Note that all these data structures let you happily mix data types… But the cost is that Python runs more slowly than, e.g., C++.

[5]:
mytuple = (0, 1, 2, 3)
print(
    f"You can index: {mytuple[1]}"
)  # this is an f-string: the expression inside {} is evaluated and inserted
You can index: 1

Tuples are immutable, meaning that you cannot reassign values to them. Python will give TypeError if you try to do so. We can test this using a try...except block: if a TypeError occurs in the assignment statement below, we should see the printed message:

[6]:
try:
    mytuple[0] = 100
except TypeError:
    print("A TypeError occurred")
A TypeError occurred

… and indeed we do.

[7]:
mylist = [0, 1, 2, 3]
print("This time reassignment works:")
mylist[0] = "I can store a string!"
print(mylist)
This time reassignment works:
['I can store a string!', 1, 2, 3]
[8]:
myset = {0, 1, 2, 3}
print(myset)
{0, 1, 2, 3}
[9]:
myset.add("string!")  # you can use both ' and " to declare a string
print(f"Adding is easy: {myset}")
Adding is easy: {0, 1, 2, 3, 'string!'}
[10]:
myset.add(0)
print(f"But remember, duplicates don't count! {myset}")
But remember, duplicates don't count! {0, 1, 2, 3, 'string!'}

Any hashable (in practice, immutable) object can be a key, and anything at all can be a value:

[11]:
mydict = {"firstkey": 1, "secondkey": 2, 3: "three"}
print(mydict)
{'firstkey': 1, 'secondkey': 2, 3: 'three'}
[12]:
print(mydict["firstkey"])
print(mydict["secondkey"])
print(mydict[3])
1
2
three
[13]:
print(f"Get the keys (in insertion order, as of Python 3.7): {mydict.keys()}")
print(f"Get the values: {mydict.values()}")
Get the keys (in insertion order, as of Python 3.7): dict_keys(['firstkey', 'secondkey', 3])
Get the values: dict_values([1, 2, 'three'])
[14]:
try:
    print("The next line should generate a KeyError...")
    print(f" {mydict[2]}")
except KeyError:
    print("...and indeed it did.")
The next line should generate a KeyError...
...and indeed it did.
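
If a missing key is an expected case rather than an error, a try...except block is not the only option. The sketch below rebuilds the same dictionary and uses the in operator and the .get() method; it also illustrates that keys must be hashable.

```python
mydict = {"firstkey": 1, "secondkey": 2, 3: "three"}

# `in` tests membership among the keys (not the values).
has_two = 2 in mydict

# .get() returns a default instead of raising KeyError.
value = mydict.get(2, "no such key")
print(has_two, value)  # False no such key

# Keys must be hashable: a list cannot be a key, but a tuple can.
try:
    mydict[[1, 2]] = "list key"
except TypeError:
    print("lists are unhashable, so they cannot be keys")
mydict[(1, 2)] = "tuple key"
```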

INDEXING

  • Indexing starts from 0

  • Index with square brackets [start : stop : step]

  • “stop” is exclusive of the named index

  • Colon alone means “all of these” or “to the start/end”

[15]:
x = list(range(10))
print(f"x[3] gives {x[3]}")
print(f"x[1:5:2] gives {x[1:5:2]}")
x[3] gives 3
x[1:5:2] gives [1, 3]
[16]:
print(f"x[8:] gives {x[8:]}")
print(f"x[:7] gives {x[:7]}")
print(f"x[::3] gives {x[::3]}")
x[8:] gives [8, 9]
x[:7] gives [0, 1, 2, 3, 4, 5, 6]
x[::3] gives [0, 3, 6, 9]
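
Indices and steps can also be negative, which counts from the end or walks backwards. A short sketch using the same ten-element list:

```python
x = list(range(10))

last = x[-1]                 # 9: negative indices count from the end
last_three = x[-3:]          # [7, 8, 9]
reversed_x = x[::-1]         # the whole list, backwards
backwards_by_two = x[8:2:-2] # [8, 6, 4]: "stop" is still exclusive

print(last, last_three, backwards_by_two)
```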

PYTHON IS LIKE, BUT ISN’T, MATLAB

  • This is a power:

[17]:
x = 10.0**2  # …or…
import numpy as np

x = np.square(10.0)  # NEVER 10.^2 (in Python, ^ is bitwise XOR, not a power)

Likewise, it’s also useful to know about the floor division (//) and remainder (%) operators. Floor division rounds down toward negative infinity, which for positive operands is the same as truncation:

[18]:
print(f"truncate: {(13 // 4)}")
print(f"remainder: {(13 % 4)}")
truncate: 3
remainder: 1
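
With a negative operand, // and % behave differently from simple truncation, because // rounds toward negative infinity. A brief sketch of the difference:

```python
import math

floor_quotient = -13 // 4             # -4: rounded down, not toward zero
remainder = -13 % 4                   # 3: takes the sign of the divisor
truncated = math.trunc(-13 / 4)       # -3: true truncation toward zero
quotient_and_rest = divmod(13, 4)     # (3, 1): both results at once

print(floor_quotient, remainder, truncated, quotient_and_rest)
```

In every case (a // b) * b + (a % b) recovers a.
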
  • End indices are NOT inclusive

[19]:
len(range(0, 100))  # in Matlab this would be 101
[19]:
100
[20]:
[
    x for x in range(5)
]  # this is called "list comprehension", and is a readable way to make a list
[20]:
[0, 1, 2, 3, 4]
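
A comprehension can also transform and filter its items, and the same syntax builds dictionaries. A minimal sketch (names are illustrative):

```python
# Transform each item.
squares = [n**2 for n in range(5)]

# Keep only the items that pass a test.
evens = [n for n in range(10) if n % 2 == 0]

# The same idea with curly brackets and key: value builds a dictionary.
square_of = {n: n**2 for n in range(3)}

print(squares)
print(evens)
print(square_of)
```
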
  • Assignment in Python never copies an object; it just attaches another name to the same object. In other words, if you set two names equal and then later modify the object in place through one of them, the change is visible through the other one too:

[21]:
x = [0] * 3
y = [1, 2, 3]
print(f"x starts as {x}")
print(f"y starts as {y}")
x starts as [0, 0, 0]
y starts as [1, 2, 3]
[22]:
x = y
print(f"After setting equal, x is {x}")
After setting equal, x is [1, 2, 3]
[23]:
y[1] = 100
print(f"After modifying y, x is {x}")
After modifying y, x is [1, 100, 3]
[24]:
# one way to stop this automatic behaviour is by forcing a copy with [:]
x = y[:]
print(x)
print(y)
[1, 100, 3]
[1, 100, 3]
[25]:
y[1] = 1000000
print(f"After forcing a copy, x is still {x} but y is now {y}")
After forcing a copy, x is still [1, 100, 3] but y is now [1, 1000000, 3]
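
The is operator reports whether two names refer to the same object, which makes this behaviour easy to check. The sketch below also shows a caveat: [:] makes a shallow copy, so nested lists are still shared unless you use copy.deepcopy from the standard library.

```python
import copy

y = [1, 2, 3]
x = y
same_object = x is y           # True: two names, one list

x = y[:]
still_same = x is y            # False: x is now a separate list
equal_values = x == y          # True: the contents still match

nested = [[0, 0], [1, 1]]
shallow = nested[:]            # new outer list, same inner lists
deep = copy.deepcopy(nested)   # inner lists are copied too
nested[0][0] = 99

print(shallow[0][0])  # 99: the inner list is shared
print(deep[0][0])     # 0: the deep copy is unaffected
```
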
  • In Matlab, assigning a value to a variable triggers output unless you suppress it with a semi-colon at the end of the line; this isn’t necessary in Python:

[26]:
x = range(10)  # …see?
  • Python doesn’t use brackets to delineate code blocks. It uses indentation with a fixed number of spaces (normally 4). This also applies to for loops, while loops, if statements, try/except statements, class declarations, function declarations, etc.

[27]:
def myfunction(arg1, arg2, **kwds):
    # **kwds is a special (optional) dictionary input type,
    # that you can use as an input "wildcard"
    try:
        print_this = kwds["printme"]
    except KeyError:
        x = arg1 * arg2
        return x  # ...no brackets needed; each nested block adds 4 more spaces
    else:
        print(print_this)
[28]:
print("first time:")
myfunction(3.0, 4.0)
first time:
[28]:
12.0
[29]:
print("second time…")
myfunction(5, 6, printme="Printed this time!")
second time…
Printed this time!
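
When the optional inputs are known in advance, named keyword arguments with default values are often simpler than a **kwds wildcard. The function below is a hypothetical example, not part of any library:

```python
def scaled_product(arg1, arg2, scale=1.0, verbose=False):
    """Multiply two numbers, then apply an optional scale factor."""
    result = arg1 * arg2 * scale
    if verbose:
        print(f"{arg1} * {arg2} * {scale} = {result}")
    return result


p1 = scaled_product(3.0, 4.0)             # defaults used: 12.0
p2 = scaled_product(3.0, 4.0, scale=0.5)  # override one default: 6.0
p3 = scaled_product(arg2=2, arg1=5)       # keywords can come in any order: 10.0
print(p1, p2, p3)
```
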
  • Python’s most widely used plotting interface is closely modelled on Matlab’s, and lives in the library matplotlib.pyplot:

[30]:
%matplotlib inline
# that command tells this notebook to put plots into the notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(10)  # like range(), but produces a numpy array
y = np.random.rand(10)  # ten random floats, 0->1
plt.plot(x, y, "*--")
plt.xlabel("xaxis")
plt.ylabel("yaxis")
plt.title("my plot!")
[30]:
Text(0.5, 1.0, 'my plot!')
[Figure: plot of y against x drawn with star markers and a dashed line, titled "my plot!", with axes labelled "xaxis" and "yaxis".]

NumPy and Landlab

Landlab makes extensive use of the NumPy (Numeric Python) library. NumPy greatly speeds up operations on array data (vectors, matrices, and higher-dimensional arrays) compared with standard Python loops. Here we look at some of its key features, its differences from pure Python, and some NumPy best practice.

[31]:
import numpy as np

Initialize NumPy arrays from standard Python iterables (lists, tuples):

[32]:
myarray = np.array([0, 1, 3, 6, 18])

…or with one of the many standard array creation methods in NumPy. Particularly useful ones are:

[33]:
a = np.zeros(10, dtype=int)
print(f"a: {a}")
a: [0 0 0 0 0 0 0 0 0 0]
[34]:
b = np.ones(5, dtype=bool)
print(f"b: {b}")
b: [ True  True  True  True  True]
[35]:
c = np.random.rand(10)
print(f"c: {c}")
c: [0.21341013 0.1471262  0.91117598 0.4982964  0.48572055 0.09033031
 0.71109171 0.78369747 0.0019031  0.85486636]
[36]:
d = np.arange(5.0)
print(f"d: {d}")
d: [0. 1. 2. 3. 4.]
[37]:
e = np.empty((3, 3), dtype=float)
e.fill(100.0)
print(f"e: {e}")
e: [[100. 100. 100.]
 [100. 100. 100.]
 [100. 100. 100.]]
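
A few other creation functions are worth knowing. The sketch below uses np.linspace, np.full, np.zeros_like, and reshape; np.full gives the same result as the empty-then-fill pattern above in a single call.

```python
import numpy as np

g = np.linspace(0.0, 1.0, 5)    # 5 evenly spaced values, both ends included
h = np.full((3, 3), 100.0)      # 3x3 array filled with 100.0
k = np.zeros_like(h)            # zeros with the same shape and dtype as h
m = np.arange(6).reshape(2, 3)  # 0..5 arranged as 2 rows by 3 columns

print(f"g: {g}")
print(f"h shape: {h.shape}, k dtype: {k.dtype}")
print(f"m: {m}")
```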

Arrays also have some built-in methods and properties. We see ‘fill’ above, but also noteworthy are:

[38]:
print(f"e has shape: {(e.shape)}")
print(f"e has size: {e.size} ")  # preferred to len() when working with arrays
c.max(), c.min(), c.mean(), c.sum()
e has shape: (3, 3)
e has size: 9
[38]:
(np.float64(0.9111759825035648),
 np.float64(0.0019031034722111206),
 np.float64(0.4697618216568964),
 np.float64(4.697618216568964))
[39]:
f = c.copy()
print(f"flatten: {e.flatten()}")
flatten: [100. 100. 100. 100. 100. 100. 100. 100. 100.]

Slicing works as in pure Python, with the addition that several dimensions can be indexed at once, separated by commas:

[40]:
print(d[2:])
[2. 3. 4.]
[41]:
e[1, 1] = 5.0
print(e)
[[100. 100. 100.]
 [100.   5. 100.]
 [100. 100. 100.]]
[42]:
print(e[1:, 1:])
[[  5. 100.]
 [100. 100.]]
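
One difference from lists is worth a short sketch: slicing a list makes a new list, but slicing a NumPy array with a basic slice gives a view onto the same data. Use .copy() when you need an independent array.

```python
import numpy as np

pylist = [0, 1, 2, 3, 4]
list_slice = pylist[1:3]  # a new list
list_slice[0] = 99        # pylist is unchanged

arr = np.arange(5)
view = arr[1:3]           # a view: shares memory with arr
view[0] = 99              # arr[1] is now 99 as well

independent = arr[1:3].copy()
independent[1] = -1       # arr is unchanged by this

print(pylist)  # [0, 1, 2, 3, 4]
print(arr)     # [ 0 99  2  3  4]
print(np.shares_memory(arr, view), np.shares_memory(arr, independent))
```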

Note that element-wise logical operations on NumPy arrays require NumPy-native functions, rather than the pure Python keywords and, or, and not:

[43]:
bool1 = np.array([True, True, False, False])
bool2 = np.array([True, False, True, False])
print(f"AND: {np.logical_and(bool1, bool2)}")
print(f"OR: {np.logical_or(bool1, bool2)}")
print(f"NOT: {np.logical_not(bool1)}")
AND: [ True False False False]
OR: [ True  True  True False]
NOT: [False False  True  True]
[44]:
print(f"ANY: {np.any(bool1)}")
print(f"ALL: {np.all(bool1)}")
ANY: True
ALL: False
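
For boolean arrays, the operators &, | and ~ are shorthand for the same element-wise operations. The sketch below also shows why the plain keywords fail, and why comparisons need parentheses when combined with & or |.

```python
import numpy as np

bool1 = np.array([True, True, False, False])
bool2 = np.array([True, False, True, False])

and_result = bool1 & bool2  # same as np.logical_and
or_result = bool1 | bool2   # same as np.logical_or
not_result = ~bool1         # same as np.logical_not

# & binds more tightly than > and <, so wrap each comparison in parentheses.
values = np.array([0.1, 0.6, 0.4, 0.9])
in_range = (values > 0.3) & (values < 0.8)

# The keyword `and` needs a single True/False, which an array cannot supply.
try:
    bool1 and bool2
except ValueError:
    print("`and` is ambiguous for arrays with more than one element")
```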

Now, let’s demonstrate the speed of NumPy over pure Python:

[45]:
f_list = range(1000)
f_array = np.arange(1000, dtype=int)


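def addone_py(list_in):
    for i in list_in:
        i += 1  # rebinds the loop variable only; list_in itself is not changed


def addone_np(array_in):
    array_in += 1  # in-place and vectorized: every element is incremented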
[46]:
print("time for list:")
%timeit addone_py(f_list)  # a magic command for timing things
time for list:
29 μs ± 3.24 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
[47]:
print("time for array:")
%timeit addone_np(f_array)
time for array:
735 ns ± 44.6 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
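
The %timeit magic only exists in IPython and Jupyter. In a plain .py script, the standard-library timeit module does a similar job. The sketch below is an assumption-laden variant of the comparison above: both hypothetical functions return a new incremented sequence, so they do equivalent work. No timings are quoted because they depend on the machine.

```python
import timeit

import numpy as np

values_list = list(range(1000))
values_array = np.arange(1000)


def addone_list(list_in):
    # Pure Python: build a new list one element at a time.
    return [v + 1 for v in list_in]


def addone_array(array_in):
    # NumPy: one vectorized operation over the whole array.
    return array_in + 1


# Total time in seconds for 1000 calls of each function.
t_list = timeit.timeit(lambda: addone_list(values_list), number=1000)
t_array = timeit.timeit(lambda: addone_array(values_array), number=1000)
print(f"list: {t_list:.4f} s, array: {t_array:.4f} s (1000 calls each)")
```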

In particular, avoid looping to do a logical test; index with a boolean array instead:

[48]:
# NOT THIS:
myoutput_slow = np.zeros(10, dtype=float)
for i in range(len(c)):  # c is our random number array
    if c[i] > 0.5:
        myoutput_slow[i] = c[i]

# DO THIS INSTEAD:
myoutput_fast = np.zeros(10, dtype=float)
greater_than_half = c > 0.5
myoutput_fast[greater_than_half] = c[greater_than_half]

print(np.all(np.equal(myoutput_slow, myoutput_fast)))
True
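
np.where offers a compact alternative to the masked assignment above: it picks from one of two sources element by element. The small array here is made-up illustration data, not the random array c from the earlier cells.

```python
import numpy as np

sample = np.array([0.2, 0.7, 0.5, 0.9])

# Keep values above 0.5, use 0.0 everywhere else.
kept = np.where(sample > 0.5, sample, 0.0)

# Related helpers: how many elements pass the test, and where they are.
n_above = np.count_nonzero(sample > 0.5)
positions = np.flatnonzero(sample > 0.5)

print(kept)       # [0.  0.7 0.  0.9]
print(n_above)    # 2
print(positions)  # [1 3]
```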

The online NumPy documentation is very readable, and is highly recommended for finding out more about the available NumPy functions and methods.

