Welcome to pypercolate!¶
pypercolate is a scientific Python package that implements the Newman-Ziff algorithm for Monte Carlo simulation of percolation on graphs.
December 12, 2015 (Version 0.4.6)
Andreas Sorge <pypercolate@asorge.de>
- Max Planck Institute for Dynamics & Self-Organization, Göttingen, Germany
- Organization for Research on Complex Adaptive Systems, Göttingen, Germany
Citation¶
Please cite as: Andreas Sorge. (2015). pypercolate 0.4.6. Zenodo.
Documentation license¶
This documentation is licensed under a Creative Commons Attribution 4.0 International License.
About pypercolate¶
License¶
pypercolate is free software; you can redistribute it and/or modify it under the terms of the ISC License:
Copyright (c) 2015, Andreas Sorge <pypercolate@asorge.de>
Permission to use, copy, modify, and/or distribute this software for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies.
THE SOFTWARE IS PROVIDED “AS IS” AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
Changelog¶
0.4.6 2015-12-12
- [Bug]: Add Python 3.5 compatibility
- [Bug]: Fix NumPy 1.10 casting error
0.4.5 2015-12-12
- [Bug]: Fix Readthedocs notebook conversion
0.4.1 2015-08-24
- [Bug]: Fix citation in documentation
0.4.0 2015-08-14
- [Feature]: Implement percolate.hpc module
0.3.1 2015-08-14
- [Feature]: Restructure package development
- [Feature]: Configure continuous deployment to PyPI with Travis
v0.3.0 2015-08-09
- [Feature]: Configure Travis, Coveralls, Landscape continuous integration
v0.2.5 2015-05-13
- [Bug]: Fix changelog
v0.2.3 2015-04-30
- [Bug]: Fix setup.cfg classifiers
v0.2.2 2015-04-30
- [Bug]: Change author email
v0.2.1 2015-04-29
- [Bug]: Update Zenodo DOI
v0.2.0 2015-04-25
- [Feature]: Basic functionality
Setup¶
Requirements¶
pypercolate runs under Python 2.7 and Python 3.4 (and later). It requires the following Python packages:
future
numpy
scipy
networkx
simoa
Installation from PyPI¶
To install the most recent stable version from the Python Package Index (PyPI), including the dependencies, run
$ pip install percolate
Installation from source¶
To install the latest version from the source repository, run
$ pip install git+https://github.com/andsor/pypercolate.git
pypercolate Tutorial¶
Bond percolation on linear chains and square lattices¶
This tutorial demonstrates how to use the pypercolate package. As examples, we present bond percolation with spanning cluster detection on a linear chain and a square grid of variable size, respectively.
Preamble¶
# configure plotting
%config InlineBackend.rc = {'figure.dpi': 300, 'savefig.dpi': 300, \
'figure.figsize': (10, 5), 'font.size': 12, \
'figure.facecolor': (1, 1, 1, 0)}
%matplotlib inline
# import packages
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import networkx as nx
import percolate
from pprint import pprint
# configure plotting colors
colors = [
'#0e5a94',
'#eb008a',
'#37b349',
'#f29333',
'#00aabb',
'#b31e8d',
'#f8ca12',
'#7a2d00',
]
mpl.rcParams['axes.color_cycle'] = colors
Evolution of one sample state (realization)¶
First, we get to know the basic unit of computation in the Newman-Ziff
algorithm: a single sample state (realization). The sample_states
generator function evolves a single realization by successively adding
bonds, incrementing the occupation number \(n\) by \(1\) in each
step.
Linear chain¶
# Generate linear chain graph with auxiliary nodes for spanning cluster detection
chain = percolate.spanning_1d_chain(length=10)
# Evolve sample state and plot it at the same time
edges = list()
fig, axes = plt.subplots(figsize=(8.0, 2.0), ncols=5, nrows=2, squeeze=True)
axes = axes.ravel()
for i, sample_state in enumerate(percolate.sample_states(chain)):
if 'edge' in sample_state:
edge = sample_state['edge']
edges.append(edge)
nx.draw(
chain,
ax=axes[i],
edgelist=edges,
width=4,
pos={node: (node, 0) for node in chain.nodes_iter()},
node_size=10,
)
axes[i].set_title('n = {}'.format(i))
pprint(sample_state)
plt.tight_layout(0)
plt.show()
{'M': 9,
'N': 10,
'has_spanning_cluster': False,
'max_cluster_size': 1,
'moments': array([ 9., 9., 9., 9., 9.]),
'n': 0}
{'M': 9,
'N': 10,
'edge': (6, 7),
'has_spanning_cluster': False,
'max_cluster_size': 2,
'moments': array([ 8., 8., 8., 8., 8.]),
'n': 1}
{'M': 9,
'N': 10,
'edge': (3, 4),
'has_spanning_cluster': False,
'max_cluster_size': 2,
'moments': array([ 7., 8., 10., 14., 22.]),
'n': 2}
{'M': 9,
'N': 10,
'edge': (7, 8),
'has_spanning_cluster': False,
'max_cluster_size': 3,
'moments': array([ 6., 7., 9., 13., 21.]),
'n': 3}
{'M': 9,
'N': 10,
'edge': (5, 6),
'has_spanning_cluster': False,
'max_cluster_size': 4,
'moments': array([ 5., 6., 8., 12., 20.]),
'n': 4}
{'M': 9,
'N': 10,
'edge': (9, 10),
'has_spanning_cluster': False,
'max_cluster_size': 4,
'moments': array([ 4., 6., 10., 18., 34.]),
'n': 5}
{'M': 9,
'N': 10,
'edge': (8, 9),
'has_spanning_cluster': False,
'max_cluster_size': 6,
'moments': array([ 3., 4., 6., 10., 18.]),
'n': 6}
{'M': 9,
'N': 10,
'edge': (1, 2),
'has_spanning_cluster': False,
'max_cluster_size': 6,
'moments': array([ 2., 4., 8., 16., 32.]),
'n': 7}
{'M': 9,
'N': 10,
'edge': (2, 3),
'has_spanning_cluster': False,
'max_cluster_size': 6,
'moments': array([ 1., 4., 16., 64., 256.]),
'n': 8}
{'M': 9,
'N': 10,
'edge': (4, 5),
'has_spanning_cluster': True,
'max_cluster_size': 10,
'moments': array([ 0., 0., 0., 0., 0.]),
'n': 9}

Figure: Evolution of a single realization of bond percolation on the linear chain with 10 nodes. The terminal nodes are the auxiliary nodes for spanning cluster detection.
Square grid¶
# Generate square grid graph with auxiliary nodes for spanning cluster detection
grid = percolate.spanning_2d_grid(3)
# Evolve sample state and plot it at the same time
edges = list()
fig, axes = plt.subplots(figsize=(8.0, 4.0), ncols=4, nrows=3, squeeze=True)
axes = axes.ravel()
for i, sample_state in enumerate(percolate.sample_states(grid)):
if 'edge' in sample_state:
edge = sample_state['edge']
edges.append(edge)
nx.draw(
grid,
ax=axes[i - 1],
edgelist=edges,
width=4,
pos={node: node for node in grid.nodes_iter()},
node_size=100,
)
axes[i - 1].set_title('n = {}'.format(i))
pprint(sample_state)
plt.tight_layout(0)
plt.show()
{'M': 12,
'N': 9,
'has_spanning_cluster': False,
'max_cluster_size': 1,
'moments': array([ 8., 8., 8., 8., 8.]),
'n': 0}
{'M': 12,
'N': 9,
'edge': ((2, 0), (3, 0)),
'has_spanning_cluster': False,
'max_cluster_size': 2,
'moments': array([ 7., 7., 7., 7., 7.]),
'n': 1}
{'M': 12,
'N': 9,
'edge': ((1, 2), (1, 1)),
'has_spanning_cluster': False,
'max_cluster_size': 2,
'moments': array([ 6., 7., 9., 13., 21.]),
'n': 2}
{'M': 12,
'N': 9,
'edge': ((3, 2), (3, 1)),
'has_spanning_cluster': False,
'max_cluster_size': 2,
'moments': array([ 5., 7., 11., 19., 35.]),
'n': 3}
{'M': 12,
'N': 9,
'edge': ((2, 0), (1, 0)),
'has_spanning_cluster': True,
'max_cluster_size': 3,
'moments': array([ 4., 6., 10., 18., 34.]),
'n': 4}
{'M': 12,
'N': 9,
'edge': ((1, 2), (2, 2)),
'has_spanning_cluster': True,
'max_cluster_size': 3,
'moments': array([ 3., 6., 14., 36., 98.]),
'n': 5}
{'M': 12,
'N': 9,
'edge': ((3, 0), (3, 1)),
'has_spanning_cluster': True,
'max_cluster_size': 5,
'moments': array([ 2., 4., 10., 28., 82.]),
'n': 6}
{'M': 12,
'N': 9,
'edge': ((2, 2), (2, 1)),
'has_spanning_cluster': True,
'max_cluster_size': 5,
'moments': array([ 1., 4., 16., 64., 256.]),
'n': 7}
{'M': 12,
'N': 9,
'edge': ((3, 2), (2, 2)),
'has_spanning_cluster': True,
'max_cluster_size': 9,
'moments': array([ 0., 0., 0., 0., 0.]),
'n': 8}
{'M': 12,
'N': 9,
'edge': ((3, 1), (2, 1)),
'has_spanning_cluster': True,
'max_cluster_size': 9,
'moments': array([ 0., 0., 0., 0., 0.]),
'n': 9}
{'M': 12,
'N': 9,
'edge': ((1, 1), (2, 1)),
'has_spanning_cluster': True,
'max_cluster_size': 9,
'moments': array([ 0., 0., 0., 0., 0.]),
'n': 10}
{'M': 12,
'N': 9,
'edge': ((2, 0), (2, 1)),
'has_spanning_cluster': True,
'max_cluster_size': 9,
'moments': array([ 0., 0., 0., 0., 0.]),
'n': 11}
{'M': 12,
'N': 9,
'edge': ((1, 0), (1, 1)),
'has_spanning_cluster': True,
'max_cluster_size': 9,
'moments': array([ 0., 0., 0., 0., 0.]),
'n': 12}

Figure: Evolution of a single realization of bond percolation on the 3x3 square grid. The left-hand and right-hand outermost nodes are the auxiliary nodes for spanning cluster detection.
Single run statistics¶
Now, we want to compare cluster statistics for several sample states (realizations) evolved in parallel. We also want to compare these statistics for several system sizes \(L\).
# number of parallel runs (sample states to evolve)
runs = 4
Linear chain¶
# system sizes
chain_ls = [10, 100, 1000, 10000]
# generate the linear chain graphs with spanning cluster detection
# for all system sizes
chain_graphs = [ percolate.spanning_1d_chain(l) for l in chain_ls ]
# compute the single-run cluster statistics for all sample states
# and system sizes
chain_single_runs = [
[ percolate.single_run_arrays(graph=chain_graph) for _ in range(runs) ]
for chain_graph in chain_graphs
]
# plot
fig, axes = plt.subplots(
nrows=len(chain_ls), ncols=5, squeeze=True, figsize=(8.0, 6.0)
)
for l_index, l in enumerate(chain_ls):
for single_run in chain_single_runs[l_index]:
axes[l_index, 0].plot(
single_run['has_spanning_cluster'], lw=4, alpha=0.7, rasterized=True
)
axes[l_index, 1].plot(
single_run['max_cluster_size'], lw=4, alpha=0.7, rasterized=True
)
for k in range(3):
axes[l_index, k + 2].plot(
single_run['moments'][k], lw=4, alpha=0.7, rasterized=True
)
axes[l_index, 0].set_ylabel(r'L={}'.format(l))
for ax in axes[l_index, :]:
num_edges = chain_single_runs[l_index][0]['M']
ax.set_xlim(xmax=1.05 * num_edges)
ax.set_xticks([0, l / 2, l - 1])
ax.set_yticks(np.linspace(0, ax.get_ylim()[1], num=3))
axes[l_index, 0].set_yticks([0, 1])
axes[0, 0].set_title(r'spanning?')
axes[0, 1].set_title(r'largest cluster')
for k in range(3):
axes[0, k + 2].set_title(r'$M_{}$'.format(k))
for ax in axes[-1, :]:
ax.set_xlabel(r'$n$')
plt.tight_layout(0)
plt.show()

Figure: Cluster statistics for single realizations of bond percolation on the linear chain with \(L\) nodes, evolved according to the Newman-Ziff algorithm over all occupation numbers \(n = 0 \ldots L - 1\).
# clear memory
del chain_single_runs
Square grid¶
# system sizes
grid_ls = [3, 10, 32, 100, 316]
# generate the square grid graphs with spanning cluster detection
# for all system sizes
grid_graphs = [ percolate.spanning_2d_grid(l) for l in grid_ls ]
# compute the single-run cluster statistics for all sample states
# and system sizes
grid_single_runs = [
[ percolate.single_run_arrays(graph=grid_graph) for _ in range(runs) ]
for grid_graph in grid_graphs
]
# plot
fig, axes = plt.subplots(
nrows=len(grid_ls), ncols=5, squeeze=True, figsize=(8.0, 6.0)
)
for l_index, l in enumerate(grid_ls):
for single_run in grid_single_runs[l_index]:
axes[l_index, 0].plot(
single_run['has_spanning_cluster'], lw=4, alpha=0.7, rasterized=True
)
axes[l_index, 1].plot(
single_run['max_cluster_size'], lw=4, alpha=0.7, rasterized=True
)
for k in range(3):
axes[l_index, k + 2].plot(
single_run['moments'][k], lw=4, alpha=0.7, rasterized=True
)
axes[l_index, 0].set_ylabel(r'L={}'.format(l))
for ax in axes[l_index, :]:
num_edges = grid_single_runs[l_index][0]['M']
ax.set_xlim(xmax=num_edges)
ax.set_xticks(np.linspace(0, num_edges, num=3))
ax.set_xticklabels(['0', '', num_edges])
ax.set_yticks(np.linspace(0, ax.get_ylim()[1], num=3))
axes[l_index, 0].set_yticks([0, 1])
axes[0, 0].set_title(r'spanning?')
axes[0, 1].set_title(r'largest cluster')
for k in range(3):
axes[0, k + 2].set_title(r'$M_{}$'.format(k))
for ax in axes[-1, :]:
ax.set_xlabel(r'$n$')
plt.tight_layout(0)
plt.show()

Figure: Cluster statistics for single realizations of bond percolation on the \(L \times L\) square grid, evolved according to the Newman-Ziff algorithm over all occupation numbers \(n = 0 \ldots 2 L (L - 1)\).
# clear memory
del grid_single_runs
Microcanonical ensemble averages¶
Next we explore how pypercolate enables us to aggregate cluster statistics across a number of runs (realizations), evolved over all occupation numbers. For each occupation number, this yields the microcanonical average.
Linear chain¶
# number of runs
chain_runs = 40
# system sizes
chain_ls = [10, 100, 1000, 10000]
# generate the linear chain graphs with spanning cluster detection
# for all system sizes
chain_graphs = [ percolate.spanning_1d_chain(l) for l in chain_ls ]
# compute the microcanonical averages for all system sizes
chain_microcanonical_averages = [
percolate.microcanonical_averages(
graph=chain_graph, runs=chain_runs
)
for chain_graph in chain_graphs
]
# combine microcanonical averages into one array
chain_microcanonical_averages_arrays = [
percolate.microcanonical_averages_arrays(avg)
for avg in chain_microcanonical_averages
]
# plot
fig, axes = plt.subplots(
nrows=len(chain_ls), ncols=3, squeeze=True, figsize=(8.0, 6.0)
)
for l_index, l in enumerate(chain_ls):
avg_arrays = chain_microcanonical_averages_arrays[l_index]
line, = axes[l_index, 0].plot(
np.arange(avg_arrays['M'] + 1),
avg_arrays['spanning_cluster'],
rasterized=True,
)
axes[l_index, 0].fill_between(
np.arange(avg_arrays['M'] + 1),
avg_arrays['spanning_cluster_ci'].T[1],
avg_arrays['spanning_cluster_ci'].T[0],
facecolor=line.get_color(),
alpha=0.5,
rasterized=True,
)
line, = axes[l_index, 1].plot(
np.arange(avg_arrays['M'] + 1),
avg_arrays['max_cluster_size'],
rasterized=True,
)
axes[l_index, 1].fill_between(
np.arange(avg_arrays['M'] + 1),
avg_arrays['max_cluster_size_ci'].T[1],
avg_arrays['max_cluster_size_ci'].T[0],
facecolor=line.get_color(),
alpha=0.5,
rasterized=True,
)
axes[l_index, 2].plot(
np.arange(avg_arrays['M'] + 1),
avg_arrays['moments'][2],
rasterized=True,
)
axes[l_index, 2].fill_between(
np.arange(avg_arrays['M'] + 1),
avg_arrays['moments_ci'][2].T[1],
avg_arrays['moments_ci'][2].T[0],
facecolor=line.get_color(),
alpha=0.5,
rasterized=True,
)
axes[l_index, 0].set_ylabel(r'L={}'.format(l))
axes[l_index, 1].set_ylim(ymax=1.0)
axes[l_index, 2].set_ylim(ymin=0.0)
for ax in axes[l_index, :]:
num_edges = avg_arrays['M']
ax.set_xlim(xmax=1.05 * num_edges)
ax.set_xticks([0, l / 2, num_edges])
ax.set_yticks(np.linspace(0, ax.get_ylim()[1], num=3))
axes[0, 0].set_title(r'percolation probability')
axes[0, 1].set_title(r'percolation strength')
axes[0, 2].set_title(r'$\langle M_2 \rangle$')
for ax in axes[-1, :]:
ax.set_xlabel(r'$n$')
plt.tight_layout(0)
plt.show()

Figure: Microcanonical averages of cluster statistics of bond percolation on the linear chain with \(L\) nodes, over \(40\) runs. The samples of the microcanonical ensembles have been evolved by the Newman-Ziff algorithm over all occupation numbers \(n = 0 \ldots L - 1\).
Square grid¶
# number of runs
grid_runs = 40
# system sizes
grid_ls = [3, 10, 32, 100, 316]
# generate the square grid graphs with spanning cluster detection
# for all system sizes
grid_graphs = [ percolate.spanning_2d_grid(l) for l in grid_ls ]
# compute the microcanonical averages for all system sizes
grid_microcanonical_averages = [
percolate.microcanonical_averages(
graph=grid_graph, runs=grid_runs
)
for grid_graph in grid_graphs
]
# combine microcanonical averages into one array
grid_microcanonical_averages_arrays = [
percolate.microcanonical_averages_arrays(avg)
for avg in grid_microcanonical_averages
]
# plot
fig, axes = plt.subplots(
nrows=len(grid_ls), ncols=3, squeeze=True, figsize=(8.0, 6.0)
)
for l_index, l in enumerate(grid_ls):
avg_arrays = grid_microcanonical_averages_arrays[l_index]
line, = axes[l_index, 0].plot(
np.arange(avg_arrays['M'] + 1),
avg_arrays['spanning_cluster'],
rasterized=True,
)
axes[l_index, 0].fill_between(
np.arange(avg_arrays['M'] + 1),
avg_arrays['spanning_cluster_ci'].T[1],
avg_arrays['spanning_cluster_ci'].T[0],
facecolor=line.get_color(),
alpha=0.5,
rasterized=True,
)
line, = axes[l_index, 1].plot(
np.arange(avg_arrays['M'] + 1),
avg_arrays['max_cluster_size'],
rasterized=True,
)
axes[l_index, 1].fill_between(
np.arange(avg_arrays['M'] + 1),
avg_arrays['max_cluster_size_ci'].T[1],
avg_arrays['max_cluster_size_ci'].T[0],
facecolor=line.get_color(),
alpha=0.5,
rasterized=True,
)
axes[l_index, 2].plot(
np.arange(avg_arrays['M'] + 1),
avg_arrays['moments'][2],
rasterized=True,
)
axes[l_index, 2].fill_between(
np.arange(avg_arrays['M'] + 1),
avg_arrays['moments_ci'][2].T[1],
avg_arrays['moments_ci'][2].T[0],
facecolor=line.get_color(),
alpha=0.5,
rasterized=True,
)
axes[l_index, 0].set_ylabel(r'L={}'.format(l))
axes[l_index, 1].set_ylim(ymax=1.0)
axes[l_index, 2].set_ylim(ymin=0.0)
for ax in axes[l_index, :]:
num_edges = avg_arrays['M']
ax.set_xlim(xmax=num_edges)
ax.set_xticks(np.linspace(0, num_edges, num=3))
ax.set_yticks(np.linspace(0, ax.get_ylim()[1], num=3))
axes[0, 0].set_title(r'percolation probability')
axes[0, 1].set_title(r'percolation strength')
axes[0, 2].set_title(r'$\langle M_2 \rangle$')
for ax in axes[-1, :]:
ax.set_xlabel(r'$n$')
plt.tight_layout(0)
plt.show()

Figure: Microcanonical averages of cluster statistics of bond percolation on the \(L \times L\) square grid, over \(40\) runs. The samples of the microcanonical ensembles have been evolved by the Newman-Ziff algorithm over all occupation numbers \(n = 0 \ldots 2 L (L - 1)\).
Canonical ensemble averages¶
Having computed the microcanonical averages for all occupation numbers \(n\), the last step is to transform them to canonical averages for the desired values of the occupation probability \(p\).
Linear chain¶
# occupation probabilities
chain_ps_arrays = [ np.linspace(1.0 - x, 1.0, num=100) for x in [1.0, 0.1, 0.01] ]
# compute canonical averages from microcanonical averages
# for all occupation probabilities and system sizes
chain_stats = [
[
percolate.canonical_averages(ps, avg_arrays)
for avg_arrays in chain_microcanonical_averages_arrays
]
for ps in chain_ps_arrays
]
# plot
fig, axes = plt.subplots(
nrows=len(chain_ps_arrays), ncols=4, squeeze=True, figsize=(8.0, 4.5)
)
for ps_index, ps in enumerate(chain_ps_arrays):
for l_index, l in enumerate(chain_ls):
my_stats = chain_stats[ps_index][l_index]
line, = axes[ps_index, 0].plot(
ps,
my_stats['spanning_cluster'],
rasterized=True,
label=r'{}'.format(l),
)
axes[ps_index, 0].fill_between(
ps,
my_stats['spanning_cluster_ci'].T[1],
my_stats['spanning_cluster_ci'].T[0],
facecolor=line.get_color(),
alpha=0.5,
rasterized=True,
)
line, = axes[ps_index, 1].plot(
ps,
my_stats['max_cluster_size'],
rasterized=True,
label=r'L={}'.format(l),
)
axes[ps_index, 1].fill_between(
ps,
my_stats['max_cluster_size_ci'].T[1],
my_stats['max_cluster_size_ci'].T[0],
facecolor=line.get_color(),
alpha=0.5,
rasterized=True,
)
axes[ps_index, 2].plot(
ps,
my_stats['moments'][2],
rasterized=True,
label=r'L={}'.format(l),
)
axes[ps_index, 2].fill_between(
ps,
my_stats['moments_ci'][2].T[1],
my_stats['moments_ci'][2].T[0],
facecolor=line.get_color(),
alpha=0.5,
rasterized=True,
)
axes[ps_index, 3].semilogy(
ps,
my_stats['moments'][2],
rasterized=True,
)
axes[ps_index, 3].fill_between(
ps,
np.where(
my_stats['moments_ci'][2].T[1] > 0.0,
my_stats['moments_ci'][2].T[1],
0.01
),
np.where(
my_stats['moments_ci'][2].T[0] > 0.0,
my_stats['moments_ci'][2].T[0],
0.01
),
facecolor=line.get_color(),
alpha=0.5,
rasterized=True,
)
axes[ps_index, 0].set_ylim(ymax=1.0)
axes[ps_index, 1].set_ylim(ymax=1.0)
axes[ps_index, 2].set_ylim(ymin=0.0)
axes[ps_index, 3].set_ylim(ymin=0.5)
for ax in axes[ps_index, :]:
ax.set_xlim(xmin=ps.min(), xmax=ps.max() + (ps.max() - ps.min()) * 0.05)
ax.set_xticks(np.linspace(ps.min(), ps.max(), num=3))
for ax in axes[ps_index, :-1]:
ax.set_yticks(np.linspace(0, ax.get_ylim()[1], num=3))
axes[0, 0].set_title(r'perc. probability')
axes[0, 1].set_title(r'perc. strength')
axes[0, 2].set_title(r'$\langle M_2 \rangle$')
axes[0, 3].set_title(r'$\langle M_2 \rangle$')
for ax in axes[-1, :]:
ax.set_xlabel(r'$p$')
axes[0, 2].legend(frameon=False, loc='center left')
plt.tight_layout(0)
plt.show()

Figure: Canonical averages of cluster statistics of bond percolation on the linear chain with \(L\) nodes, over \(40\) runs.
Square grid¶
# occupation probabilities
grid_ps_arrays = [ np.linspace(0.5 - x, 0.5 + x, num=100) for x in [0.5, 0.05] ]
# compute canonical averages from microcanonical averages
# for all occupation probabilities and system sizes
grid_stats = [
[
percolate.canonical_averages(ps, avg_arrays)
for avg_arrays in grid_microcanonical_averages_arrays
]
for ps in grid_ps_arrays
]
# plot
fig, axes = plt.subplots(
nrows=len(grid_ps_arrays), ncols=4, squeeze=True, figsize=(8.0, 4.5)
)
for ps_index, ps in enumerate(grid_ps_arrays):
for l_index, l in enumerate(grid_ls):
my_stats = grid_stats[ps_index][l_index]
line, = axes[ps_index, 0].plot(
ps,
my_stats['spanning_cluster'],
rasterized=True,
label=r'L={}'.format(l),
)
axes[ps_index, 0].fill_between(
ps,
my_stats['spanning_cluster_ci'].T[1],
my_stats['spanning_cluster_ci'].T[0],
facecolor=line.get_color(),
alpha=0.5,
rasterized=True,
)
line, = axes[ps_index, 1].plot(
ps,
my_stats['max_cluster_size'],
rasterized=True,
label=r'L={}'.format(l),
)
axes[ps_index, 1].fill_between(
ps,
my_stats['max_cluster_size_ci'].T[1],
my_stats['max_cluster_size_ci'].T[0],
facecolor=line.get_color(),
alpha=0.5,
rasterized=True,
)
axes[ps_index, 2].plot(
ps,
my_stats['moments'][2],
rasterized=True,
label=r'L={}'.format(l),
)
axes[ps_index, 2].fill_between(
ps,
my_stats['moments_ci'][2].T[1],
my_stats['moments_ci'][2].T[0],
facecolor=line.get_color(),
alpha=0.5,
rasterized=True,
)
axes[ps_index, 3].semilogy(
ps,
my_stats['moments'][2],
rasterized=True,
)
axes[ps_index, 3].fill_between(
ps,
np.where(
my_stats['moments_ci'][2].T[1] > 0.0,
my_stats['moments_ci'][2].T[1],
0.01
),
np.where(
my_stats['moments_ci'][2].T[0] > 0.0,
my_stats['moments_ci'][2].T[0],
0.01
),
facecolor=line.get_color(),
alpha=0.5,
rasterized=True,
)
axes[ps_index, 0].set_ylim(ymax=1.0)
axes[ps_index, 1].set_ylim(ymax=1.0)
axes[ps_index, 2].set_ylim(ymin=0.0)
axes[ps_index, 3].set_ylim(ymin=0.5)
for ax in axes[ps_index, :]:
ax.set_xlim(xmin=ps.min(), xmax=ps.max())
ax.set_xticks(np.linspace(ps.min(), ps.max(), num=3))
for ax in axes[ps_index, :-1]:
ax.set_yticks(np.linspace(0, ax.get_ylim()[1], num=3))
axes[0, 0].set_title(r'perc. probability')
axes[0, 1].set_title(r'perc. strength')
axes[0, 2].set_title(r'$\langle M_2 \rangle$')
axes[0, 3].set_title(r'$\langle M_2 \rangle$')
for ax in axes[-1, :]:
ax.set_xlabel(r'$p$')
axes[0, 2].legend(frameon=False, loc='best')
plt.tight_layout(0)
plt.show()

Figure: Canonical averages of cluster statistics of bond percolation on the \(L \times L\) square grid, over \(40\) runs.
Quickstart High-Performance Computing with pypercolate¶
The percolate.hpc module is an implementation of the Newman–Ziff algorithm for High-Performance Computing (HPC).
Here we demonstrate how to compute the raw data for a finite-size scaling study of bond percolation on a square grid.
For high-performance computing, we utilize the task-based parallelization framework Jug.
Our simulation design is as follows:
For each system size L:
- Prepare the graph of the current system size L
- Prepare the graph for pypercolate and precompute auxiliary information
- For each value p of the occupation probability:
  - Precompute the convolution coefficients f
- Map-Reduce T tasks:
  - Map-Reduce R runs:
    - Seed the random number generator
    - For each occupation number n:
      - Compute the microcanonical cluster statistics
    - For each occupation probability p:
      - Convolve the microcanonical statistics into the canonical statistics
- Finalize task results
- Write results to disk
A jugfile is a Python script to which Jug transparently adds parallelization magic.
Imports¶
First, we import the packages needed.
# Python 2/3 compatibility
from __future__ import unicode_literals
# http://python-future.org/compatible_idioms.html#zip-izip
from builtins import map, zip
from future.utils import iteritems
from functools import reduce
import itertools
import numpy as np
import scipy.stats
from jug import Task, TaskGenerator, barrier
from jug.utils import CustomHash
import jug.mapreduce
import percolate
import percolate.hpc
Define input¶
Here we assign the input parameters of the simulation study.
# script parameters
SYSTEM_DIMENSIONS = [8, 16, 32]
RUNS_PER_TASK = 100
NUMBER_OF_TASKS = 100
NUMBER_OF_RUNS = NUMBER_OF_TASKS * RUNS_PER_TASK
PS = np.linspace(0.4, 0.6, num=40)
SPANNING_CLUSTER = True
UINT32_MAX = 4294967296
SEED = 201508061904 % UINT32_MAX
ALPHA = 2 * scipy.stats.norm.cdf(-1.0) # 1 sigma
The SPANNING_CLUSTER variable defines whether the script should detect spanning clusters.
The UINT32_MAX integer is the (exclusive) upper bound for the seeds.
The SEED seeds the central random number generator, which in turn seeds the per-task random number generators.
Jug task to prepare the percolation graph and precompute the convolution factors¶
The following jug task creates the square grid graph of \(L \times L\) nodes, where \(L\) is the system size.
@TaskGenerator
def prepare_percolation_graph(dimension):
graph = percolate.spanning_2d_grid(length=dimension)
return percolate.percolate.percolation_graph(
graph=graph, spanning_cluster=SPANNING_CLUSTER
)
Next, the jugfile defines a task to precompute the convolution factors for a given occupation probability \(p\):
@TaskGenerator
def compute_convolution_factors_for_single_p(perc_graph_result, p):
return percolate.percolate._binomial_pmf(
n=perc_graph_result['num_edges'],
p=p,
)
An elementary run (sub-task)¶
Every task samples the bond percolation setting with a number of runs \(N_{\text{runs}}\). The following function implements a single run as described in the user guide. Note here that we carefully circumvent Jug's automatic loading of task results: instead of piping the convolution factor tasks through directly, we add a layer of indirection and supply an iterator over these tasks. This way, Jug does not automatically load the task results, which would unnecessarily increase memory consumption and limit the system size. The inner loop convolves the microcanonical statistics for one occupation probability after the other, carefully loading and unloading the respective convolution factor task results.
def bond_run(perc_graph_result, seed, ps, convolution_factors_tasks):
"""
Perform a single run (realization) over all microstates and return the
canonical cluster statistics
"""
microcanonical_statistics = percolate.hpc.bond_microcanonical_statistics(
seed=seed, **perc_graph_result
)
# initialize statistics array
canonical_statistics = np.empty(
ps.size,
dtype=percolate.hpc.canonical_statistics_dtype(
spanning_cluster=SPANNING_CLUSTER,
)
)
# loop over all p's and convolve canonical statistics
# http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html#modifying-array-values
for row, convolution_factors_task in zip(
np.nditer(canonical_statistics, op_flags=['writeonly']),
convolution_factors_tasks,
):
# load task result
# http://jug.readthedocs.org/en/latest/api.html#jug.Task.load
assert not convolution_factors_task.is_loaded()
convolution_factors_task.load()
# fetch task result
my_convolution_factors = convolution_factors_task.result
# convolve to canonical statistics
row[...] = percolate.hpc.bond_canonical_statistics(
microcanonical_statistics=microcanonical_statistics,
convolution_factors=my_convolution_factors,
spanning_cluster=SPANNING_CLUSTER,
)
# explicitly unload task to save memory
# http://jug.readthedocs.org/en/latest/api.html#jug.Task.unload
convolution_factors_task.unload()
# initialize canonical averages for reduce
ret = percolate.hpc.bond_initialize_canonical_averages(
canonical_statistics=canonical_statistics,
spanning_cluster=SPANNING_CLUSTER,
)
return ret
A simple task¶
Here is the definition of the actual task, each instance of which conducts a number of runs and reduces their results:
@TaskGenerator
def bond_task(
perc_graph_result, seeds, ps, convolution_factors_tasks_iterator
):
"""
Perform a number of runs
The number of runs is the number of seeds
convolution_factors_tasks_iterator needs to be an iterator
We shield the convolution factors tasks from jug value/result mechanism
by supplying an iterator to the list of tasks for lazy evaluation
http://github.com/luispedro/jug/blob/43f0d80a78f418fd3aa2b8705eaf7c4a5175fff7/jug/task.py#L100
http://github.com/luispedro/jug/blob/43f0d80a78f418fd3aa2b8705eaf7c4a5175fff7/jug/task.py#L455
"""
# restore the list of convolution factors tasks
convolution_factors_tasks = list(convolution_factors_tasks_iterator)
return reduce(
percolate.hpc.bond_reduce,
map(
bond_run,
itertools.repeat(perc_graph_result),
seeds,
itertools.repeat(ps),
itertools.repeat(convolution_factors_tasks),
)
)
Write to disk¶
Another task is to write the finalized results for a system size to disk. We provide an example implementation here that writes to an HDF5 file:
@TaskGenerator
def write_to_disk(dimension, canonical_averages):
import h5py
# Read/write if exists, create otherwise
f = h5py.File('percolate_hpc.hdf5', mode='a')
key = '{}'.format(dimension)
# dataset should not exist yet!
if key in f:
raise RuntimeError
f.create_dataset(
name=key,
data=canonical_averages,
)
f.close()
Hashing the iterator¶
Since we supply an iterator over the tasks instead of the tasks themselves, Jug cannot hash it, so for technical reasons we need to supply a dummy hash function:
def dummy_hash(x):
"""
Supplies a constant dummy hash
"""
return 'dummy_hash'.encode('utf-8')
The control flow¶
Now, we are in a position to implement the control flow:
convolution_factors = {}
perc_graph_results = {}
reduced_canonical_averages = {}
final_canonical_averages = {}
# initialize random number generator
rng = np.random.RandomState(seed=SEED)
# in general:
# avoid premature optimization: design simulation for large system sizes
# (small system sizes take much less time anyway)
for dimension in SYSTEM_DIMENSIONS:
# building the graph takes ca. 10s per 10^6 nodes
'''
>>> import timeit
>>> timeit.timeit(
... stmt='percolate.spanning_2d_grid(1000)',
... setup='import percolate',
... number=1
... )
>>>
'''
# So that is why we store every graph on disk (as a task)
perc_graph_results[dimension] = prepare_percolation_graph(dimension)
# computing the binomial PMF takes ca. 1s per 10^6 number of states *
# number of p's
'''
>>> import timeit
>>> s = """\
... [
... percolate.percolate._binomial_pmf(n=1e4, p=p)
... for p in 100 * [0.5]
... ]
>>> timeit.timeit(stmt=s, setup='import percolate', number=1)
'''
# so store all binomial PMF (for every dimension, p) on disk
convolution_factors[dimension] = [
compute_convolution_factors_for_single_p(
perc_graph_result=perc_graph_results[dimension], p=p,
)
for p in PS
]
# finish all tasks up to here first before proceeding
# http://jug.readthedocs.org/en/latest/barrier.html#barriers
barrier()
seeds = rng.randint(UINT32_MAX, size=NUMBER_OF_RUNS)
# now process the actual tasks
bond_tasks = list()
for my_seeds in np.split(seeds, NUMBER_OF_TASKS):
# pass an iterator to list of convolution factors tasks to circumvent
# jug automatic fetching results of each task (iterator is not resolved
# and hence we achieve lazy evaluation of tasks here)
convolution_iterator = iter(
convolution_factors[dimension]
)
# iter cannot be pickled / hashed, so supply a dummy hash
convolution_iterator_hashed = CustomHash(
convolution_iterator, dummy_hash,
)
bond_tasks.append(
bond_task(
perc_graph_result=perc_graph_results[dimension],
seeds=my_seeds,
ps=PS,
convolution_factors_tasks_iterator=convolution_iterator_hashed,
)
)
# reduce
reduced_canonical_averages[dimension] = jug.mapreduce.reduce(
reducer=percolate.hpc.bond_reduce,
inputs=bond_tasks,
reduce_step=100,
)
# finalize canonical averages
final_canonical_averages[dimension] = Task(
percolate.hpc.finalize_canonical_averages,
number_of_nodes=perc_graph_results[dimension]['num_nodes'],
ps=PS,
canonical_averages=reduced_canonical_averages[dimension],
alpha=ALPHA,
)
# write to disk
write_to_disk(
dimension=dimension,
canonical_averages=final_canonical_averages[dimension],
)
Note the use of Jug's barrier command.
It ensures that Jug finishes the computation of the percolation graph and all convolution factors before proceeding with the actual bond tasks.
Normally, Jug auto-detects such dependencies and transparently makes sure that all previous tasks a particular task depends on have been run.
Here, however, we need to call barrier explicitly, as we circumvent this very mechanism.
As explained earlier, this is to avoid loading the convolution factors for all occupation probabilities into memory at the same time.
Executing the tasks¶
Finally, running
$ jug execute jugfile.py
executes the tasks. This command can be run multiple times, in parallel: for example, on a cluster. Jug automagically handles processing the tasks and storing intermediate results across worker nodes via the shared file system (NFS).
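Progress can be monitored with Jug's status subcommand ($ jug status jugfile.py). Once all tasks have finished, the finalized canonical averages can be read back from the HDF5 file written by the write_to_disk task; the following minimal sketch assumes the file name percolate_hpc.hdf5 and the system size 32 used in this jugfile.
import h5py

# read back the finalized canonical averages for system size L = 32
# (file name and dataset key as used by the write_to_disk task above)
with h5py.File('percolate_hpc.hdf5', mode='r') as f:
    canonical_averages = f['32'][...]

print(canonical_averages.shape)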
The complete jugfile¶
# encoding: utf-8
"""
A sample jugfile to use with Jug
jug execute jugfile
"""
# Python 2/3 compatibility
from __future__ import unicode_literals
# http://python-future.org/compatible_idioms.html#zip-izip
from builtins import map, zip
from future.utils import iteritems
from functools import reduce
import itertools
import numpy as np
import scipy.stats
from jug import Task, TaskGenerator, barrier
from jug.utils import CustomHash
import jug.mapreduce
import percolate
import percolate.hpc
# script parameters
SYSTEM_DIMENSIONS = [8, 16, 32]
RUNS_PER_TASK = 100
NUMBER_OF_TASKS = 100
NUMBER_OF_RUNS = NUMBER_OF_TASKS * RUNS_PER_TASK
PS = np.linspace(0.4, 0.6, num=40)
SPANNING_CLUSTER = True
UINT32_MAX = 4294967296
SEED = 201508061904 % UINT32_MAX
ALPHA = 2 * scipy.stats.norm.cdf(-1.0) # 1 sigma
@TaskGenerator
def prepare_percolation_graph(dimension):
graph = percolate.spanning_2d_grid(length=dimension)
return percolate.percolate.percolation_graph(
graph=graph, spanning_cluster=SPANNING_CLUSTER
)
@TaskGenerator
def compute_convolution_factors_for_single_p(perc_graph_result, p):
return percolate.percolate._binomial_pmf(
n=perc_graph_result['num_edges'],
p=p,
)
def bond_run(perc_graph_result, seed, ps, convolution_factors_tasks):
"""
Perform a single run (realization) over all microstates and return the
canonical cluster statistics
"""
microcanonical_statistics = percolate.hpc.bond_microcanonical_statistics(
seed=seed, **perc_graph_result
)
# initialize statistics array
canonical_statistics = np.empty(
ps.size,
dtype=percolate.hpc.canonical_statistics_dtype(
spanning_cluster=SPANNING_CLUSTER,
)
)
# loop over all p's and convolve canonical statistics
# http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html#modifying-array-values
for row, convolution_factors_task in zip(
np.nditer(canonical_statistics, op_flags=['writeonly']),
convolution_factors_tasks,
):
# load task result
# http://jug.readthedocs.org/en/latest/api.html#jug.Task.load
assert not convolution_factors_task.is_loaded()
convolution_factors_task.load()
# fetch task result
my_convolution_factors = convolution_factors_task.result
# convolve to canonical statistics
row[...] = percolate.hpc.bond_canonical_statistics(
microcanonical_statistics=microcanonical_statistics,
convolution_factors=my_convolution_factors,
spanning_cluster=SPANNING_CLUSTER,
)
# explicitly unload task to save memory
# http://jug.readthedocs.org/en/latest/api.html#jug.Task.unload
convolution_factors_task.unload()
# initialize canonical averages for reduce
ret = percolate.hpc.bond_initialize_canonical_averages(
canonical_statistics=canonical_statistics,
spanning_cluster=SPANNING_CLUSTER,
)
return ret
@TaskGenerator
def bond_task(
perc_graph_result, seeds, ps, convolution_factors_tasks_iterator
):
"""
Perform a number of runs
The number of runs is the number of seeds
convolution_factors_tasks_iterator needs to be an iterator
We shield the convolution factors tasks from jug value/result mechanism
by supplying an iterator to the list of tasks for lazy evaluation
http://github.com/luispedro/jug/blob/43f0d80a78f418fd3aa2b8705eaf7c4a5175fff7/jug/task.py#L100
http://github.com/luispedro/jug/blob/43f0d80a78f418fd3aa2b8705eaf7c4a5175fff7/jug/task.py#L455
"""
# restore the list of convolution factors tasks
convolution_factors_tasks = list(convolution_factors_tasks_iterator)
return reduce(
percolate.hpc.bond_reduce,
map(
bond_run,
itertools.repeat(perc_graph_result),
seeds,
itertools.repeat(ps),
itertools.repeat(convolution_factors_tasks),
)
)
@TaskGenerator
def write_to_disk(dimension, canonical_averages):
import h5py
# Read/write if exists, create otherwise
f = h5py.File('percolate_hpc.hdf5', mode='a')
key = '{}'.format(dimension)
# dataset should not exist yet!
if key in f:
raise RuntimeError
f.create_dataset(
name=key,
data=canonical_averages,
)
f.close()
def dummy_hash(x):
"""
Supplies a constant dummy hash
"""
return 'dummy_hash'.encode('utf-8')
convolution_factors = {}
perc_graph_results = {}
reduced_canonical_averages = {}
final_canonical_averages = {}
# initialize random number generator
rng = np.random.RandomState(seed=SEED)
# in general:
# avoid premature optimization: design simulation for large system sizes
# (small system sizes take much less time anyway)
for dimension in SYSTEM_DIMENSIONS:
# building the graph takes ca. 10s per 10^6 nodes
'''
>>> import timeit
>>> timeit.timeit(
... stmt='percolate.spanning_2d_grid(1000)',
... setup='import percolate',
... number=1
... )
>>>
'''
# So that is why we store every graph on disk (as a task)
perc_graph_results[dimension] = prepare_percolation_graph(dimension)
# computing the binomial PMF takes ca. 1s per 10^6 number of states *
# number of p's
'''
>>> import timeit
>>> s = """\
... [
... percolate.percolate._binomial_pmf(n=1e4, p=p)
... for p in 100 * [0.5]
... ]
>>> timeit.timeit(stmt=s, setup='import percolate', number=1)
'''
# so store all binomial PMF (for every dimension, p) on disk
convolution_factors[dimension] = [
compute_convolution_factors_for_single_p(
perc_graph_result=perc_graph_results[dimension], p=p,
)
for p in PS
]
# finish all tasks up to here first before proceeding
# http://jug.readthedocs.org/en/latest/barrier.html#barriers
barrier()
seeds = rng.randint(UINT32_MAX, size=NUMBER_OF_RUNS)
# now process the actual tasks
bond_tasks = list()
for my_seeds in np.split(seeds, NUMBER_OF_TASKS):
# pass an iterator to list of convolution factors tasks to circumvent
# jug automatic fetching results of each task (iterator is not resolved
# and hence we achieve lazy evaluation of tasks here)
convolution_iterator = iter(
convolution_factors[dimension]
)
# iter cannot be pickled / hashed, so supply a dummy hash
convolution_iterator_hashed = CustomHash(
convolution_iterator, dummy_hash,
)
bond_tasks.append(
bond_task(
perc_graph_result=perc_graph_results[dimension],
seeds=my_seeds,
ps=PS,
convolution_factors_tasks_iterator=convolution_iterator_hashed,
)
)
# reduce
reduced_canonical_averages[dimension] = jug.mapreduce.reduce(
reducer=percolate.hpc.bond_reduce,
inputs=bond_tasks,
reduce_step=100,
)
# finalize canonical averages
final_canonical_averages[dimension] = Task(
percolate.hpc.finalize_canonical_averages,
number_of_nodes=perc_graph_results[dimension]['num_nodes'],
ps=PS,
canonical_averages=reduced_canonical_averages[dimension],
alpha=ALPHA,
)
# write to disk
write_to_disk(
dimension=dimension,
canonical_averages=final_canonical_averages[dimension],
)
User Guide¶
Percolation Theory¶
Introduction¶
Percolation theory characterizes how global connectivity emerges in a system of a large number of objects. These objects connect according to some local rule constrained by an underlying topology. Thus, given the topology and the local rule, percolation theory yields the global, emergent behavior [HEG14]. Early occurrences of percolation theory in the literature include the classic works by Flory and Stockmayer on polymerization and the sol-gel transition [Flo41][Sto43]. However, it is only later that a theory of percolation starts to condense [BH57].
Definition ([HEG14], p. 2): We say a system is at percolation, or percolates, if sufficiently many objects are connected locally such that a global connection emerges. This global connection is a continuous “chain” or “cluster” of locally connected objects, which is unbounded in size (in infinite systems), or of the order of the system size (in finite systems).
Typically, percolation also refers to a stochastic process of increasing connectivity and eventual emergence of the giant cluster. In an infinite system, this emergence in an ensemble of system configurations constitutes a phase transition. In fact, percolation is a phase transition paradigm [SA94].
The central quantity in percolation settings is the cluster size distribution \(n_s\), which we will introduce shortly. The setting of percolation is a graph. A typical setting is a regular lattice of sites connected to their nearest neighbors. In site percolation, the sites are successively occupied. In bond percolation, it is the bonds that are successively added to form a giant cluster of connected sites.
In the following, we introduce the concepts and notation mainly according to Stauffer’s and Aharony’s classical textbook [SA94].
The cluster size distribution¶
In the regular lattice setting, a cluster is a maximal set of occupied sites which are pairwise joined by paths on the lattice that only traverse occupied sites. In general, a cluster is a component of the graph. The size \(s\) of a cluster is the number of nodes in the component. Note that infinite graphs allow for infinite cluster sizes.
The occupation of sites, or the cluster sizes, typically depend on a (global) system parameter. For example, the paradigmatic percolation model is that of each site independently being occupied with some probability \(p\) (Bernoulli percolation). All the following statistics only require the general percolation setting of a graph. Let \(\varrho\) denote the system parameter.
Definition: For any given cluster size \(s\), let the cluster number \(n_s(\varrho, L)\) be the density of clusters of size \(s\) with respect to the system size \(L\).
In other words, in a system of \(L\) sites, the cluster number \(n_s(\varrho, L)\) is the number \(N_s(\varrho, L)\) of clusters of size \(s\) divided by the total number \(L\) of sites,
\[ n_s(\varrho, L) = \frac{N_s(\varrho, L)}{L}. \]
This definition also applies to systems of infinite size as
\[ n_s(\varrho) = \lim_{L \to \infty} n_s(\varrho, L). \]
The cluster size distribution \(n_s\) is the fundamental quantity in percolation theory.
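As a concrete illustration (not part of pypercolate), the cluster numbers of a single configuration can be read off from the connected components of the graph induced by the occupied sites or bonds. The following sketch assumes a hypothetical networkx graph g whose edges are the occupied bonds on a chain of 10 sites:
from collections import Counter
import networkx as nx

# hypothetical configuration: occupied bonds on a chain of L = 10 sites
g = nx.Graph([(0, 1), (1, 2), (4, 5), (7, 8), (8, 9)])
g.add_nodes_from(range(10))
L = g.number_of_nodes()

# cluster sizes s are the sizes of the connected components
sizes = [len(component) for component in nx.connected_components(g)]

# cluster numbers n_s = N_s / L
n_s = {s: count / float(L) for s, count in Counter(sizes).items()}
print(n_s)  # {1: 0.2, 2: 0.1, 3: 0.2} (ordering may differ)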
Percolation threshold and characteristic cluster size¶
Typically, in an infinite system the largest cluster grows with increasing parameter \(\varrho\), and at some critical value \(\varrho_c\), an infinite cluster appears. This number \(\varrho_c\) is the percolation threshold. At and above \(\varrho_c\), there is an infinite cluster, and the system is said to percolate.
The probability that a system of size \(L\) percolates at parameter \(\varrho\), i.e. has a cluster of the order of the system size, is \(\Pi(\varrho,L)\). In the infinite system, we have
\[ \Pi(\varrho) = \lim_{L \to \infty} \Pi(\varrho, L) = \begin{cases} 0 & \varrho < \varrho_c, \\ 1 & \varrho > \varrho_c. \end{cases} \]
The percolation strength \(P(\varrho, L)\) is the fraction of sites belonging to the infinite cluster. In the infinite system, the limit strength \(P(\varrho) = \lim_{L \to \infty} P(\varrho, L)\) is the typical order parameter of the percolation transition.
The cluster size distribution typically is of the form
\[ n_s(\varrho) \sim s^{-\tau} \, \mathrm{e}^{-s / s_\xi} \]
for large \(s\) and with some characteristic cluster size \(s_\xi\). At the percolation transition, the characteristic cluster size \(s_\xi\) diverges as a power law,
\[ s_\xi \sim |\varrho - \varrho_c|^{-1/\sigma}, \]
with the critical exponent \(\sigma\).
In general, clusters of size \(s < s_\xi \sim |\varrho - \varrho_c|^{-1 / \sigma}\) dominate the moments of the cluster size distribution. These clusters effectively follow a power-law distribution \(n_s(\varrho) \sim s^{-\tau}\), just as all clusters do at the critical point, where \(n_s(\varrho_c) \sim s^{-\tau}\). For \(s \gg s_\xi\), the distribution is cut off exponentially. Thus, clusters in this regime do not exhibit “critical” behavior.
Average cluster size¶
For any given site, the probability that it is part of a cluster of size \(s\) is \(s n_s\). The occupation probability \(p(\varrho, L)\) is the probability that any given site is part of a finite cluster, in a system of size \(L\) (may be infinite) at parameter \(\varrho\),
\[ p(\varrho, L) = \sum_{s=1}^\infty s \, n_s(\varrho, L), \]
which is the first moment of the cluster size distribution.
Hence, for any given site of any given finite cluster, the probability \(w_s(\varrho, L)\) that the cluster is of size \(s\) is
\[ w_s(\varrho, L) = \frac{s \, n_s(\varrho, L)}{\sum_{s'=1}^\infty s' \, n_{s'}(\varrho, L)} = \frac{s \, n_s(\varrho, L)}{p(\varrho, L)}, \]
with \(\sum_{s=1}^\infty w_s(\varrho, L) = 1\).
For any given site of any given finite cluster, the average size \(S(\varrho, L)\) of the cluster is
\[ S(\varrho, L) = \sum_{s=1}^\infty s \, w_s(\varrho, L) = \frac{\sum_{s=1}^\infty s^2 \, n_s(\varrho, L)}{\sum_{s=1}^\infty s \, n_s(\varrho, L)}, \]
which is the second moment divided by the first moment of the cluster size distribution. Note that this average is different from the average of the (finite) cluster sizes in the system. The average cluster size \(S(\varrho, L)\) is defined with respect to a site, and thus, it is an intensive quantity [SA94].
Note that for infinite systems (\(L\to\infty\)), these statistics exclude the infinite cluster. At the critical point, the average cluster size \(S(\varrho_c)\) nevertheless diverges as
\[ S(\varrho) \sim |\varrho - \varrho_c|^{-\gamma} \]
with the critical exponent \(\gamma\). As \(S\) is the second moment of the cluster size distribution (up to a factor), it is a measure of fluctuations in the system. Thus, divergence of \(S\) actually defines the percolation phase transition.
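For illustration, the moments and the average cluster size \(S\) defined above follow directly from the list of finite cluster sizes of a configuration; a minimal sketch with hypothetical numbers:
# cluster sizes of one hypothetical configuration (excluding any infinite/spanning cluster)
sizes = [3, 3, 2, 1, 1]
L = 10  # total number of sites

# raw moments of the cluster size distribution: M_k = sum_s s**k * n_s
M1 = sum(s ** 1 for s in sizes) / float(L)  # first moment (occupation probability p)
M2 = sum(s ** 2 for s in sizes) / float(L)  # second moment

# average cluster size S = M2 / M1 (intensive, defined per site)
S = M2 / M1
print(S)  # 2.4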
Correlation length¶
The correlation function \(g(\mathbf{r})\) is the probability that a site at position \(\mathbf{r}\) from an occupied site in a finite cluster belongs to the same cluster.
Typically, for large \(r \equiv |\mathbf{r}|\), there is an exponential cutoff, i.e. \(g(\mathbf{r}) \sim e^{-r/\xi}\), at the correlation length \(\xi\).
Definition: The correlation length \(\xi\) is defined as
\[ \xi^2 = \frac{\sum_{\mathbf{r}} |\mathbf{r}|^2 \, g(\mathbf{r})}{\sum_{\mathbf{r}} g(\mathbf{r})}. \]
For a cluster of size \(s\), its squared radius of gyration \(R_s^2\) is defined as the average square distance to the cluster center of mass [SA94]. It turns out that \(2 R_s^2\) is the average square distance between two sites of the same (finite) cluster. Averaging over \(2R_s^2\) yields the squared correlation length [SA94],
\[ \xi^2 = \frac{\sum_s 2 R_s^2 \, s^2 n_s}{\sum_s s^2 n_s}, \]
since \(s^2 n_s\) is the weight of clusters of size \(s\). Hence, the correlation length is the radius of the clusters that dominate the second moment of the cluster size distribution, or, the fluctuations.
The divergence of quantities at the critical point involves sums over all cluster sizes \(s\). The cutoff of the cluster number \(n_s\) at \(s_\xi \sim |\varrho - \varrho_c|^{-1/\sigma}\) marks the cluster sizes \(s \approx s_\xi\) that contribute the most to the sums and hence, to the divergence. This also holds for the correlation length \(\xi\), which is the radius of those clusters of sizes \(s \approx s_\xi\). As such, this is the one and only length scale which characterizes the behavior of an infinite system in the critical region [SA94].
The correlation length \(\xi\) defines the relevant length scale. As \(\xi\) diverges for \(\varrho \to \varrho_c\), a length scale is absent at the percolation transition \(\varrho = \varrho_c\). This lack of a relevant length scale is a typical example of scale invariance. It implies that the system appears to be self-similar on length scales smaller than \(\xi\). As \(\xi\) becomes infinite at \(\varrho_c\), the whole system becomes self-similar. The lack of a relevant length scale also implies that power laws describe the relevant quantities in the critical region. In particular, the correlation length itself diverges as a power law,
\[ \xi \sim |\varrho - \varrho_c|^{-\nu}. \]
The form of this divergence is the same in all systems, which is called universal behavior. The critical exponent \(\nu\) depends only on general features of the topology and the local rule, giving rise to universality classes of systems with the same critical exponents.
Scaling relations¶
The scaling theory of percolation clusters relates the critical exponents of the percolation transition to the cluster size distribution [Sta79]. As the critical point lacks any length scale, the cluster sizes also need to follow a power law,
\[ n_s(\varrho_c) \sim s^{-\tau}, \]
with the Fisher exponent \(\tau\) [Fis67]. The scaling assumption is that the ratio \(n_s(\varrho) / n_s(\varrho_c)\) is a function of the ratio \(s / s_\xi(\varrho)\) only [Sta79],
\[ \frac{n_s(\varrho)}{n_s(\varrho_c)} = f\!\left( \frac{s}{s_\xi(\varrho)} \right). \]
Since, in the critical region, the characteristic cluster size diverges as \(s_\xi \sim |\varrho - \varrho_c|^{-1/\sigma}\), we have \(s / s_\xi(\varrho) \sim |(\varrho - \varrho_c) s^\sigma |^{1/\sigma}\), and hence
\[ n_s(\varrho) \sim s^{-\tau} f\!\left( (\varrho - \varrho_c) \, s^\sigma \right) \]
with some scaling function \(f\) which rapidly decays to zero, \(f(x) \to 0\) for \(|x| > 1\) (\(s > s_\xi\)) [SA94].
It remains to determine the scaling relationship between the cluster radius \(R_s\) and the cluster size \(s\) in the critical region. With \(s \sim R_s^D\) for some (possibly fractal) cluster dimension \(D\), we have [SA94]
\[ \xi \sim s_\xi^{1/D} \sim |\varrho - \varrho_c|^{-1/(\sigma D)}, \qquad D = \frac{1}{\sigma \nu}. \]
The cutoff cluster size \(s_\xi\) was the crossover size separating critical behavior (\(n_s \sim s^{-\tau}\)) from non-critical behavior (\(n_s \to 0\) exponentially fast). Now, the correlation length \(\xi \sim s_\xi^{1/D} = s_\xi^{\sigma \nu}\) is the crossover length separating the critical and non-critical regimes.
The following scaling law relates the system dimensionality \(d\) and the fractal dimensionality \(D = \frac{1}{\sigma \nu}\) of the infinite cluster to the exponents of the cluster size distribution [HEG14]:
\[ \tau = \frac{d}{D} + 1. \]
Consider the \(k\)-th raw moment of the cluster size distribution,
\[ M_k(\varrho) = \sum_{s=1}^\infty s^k \, n_s(\varrho), \]
which scales as
\[ M_k(\varrho) \sim |\varrho - \varrho_c|^{(\tau - 1 - k)/\sigma} \]
in the critical region.
In this region, above the percolation threshold (\(\varrho > \varrho_c\)), the percolation strength behaves as [SA94]
\[ P(\varrho) \sim (\varrho - \varrho_c)^\beta, \]
defining the critical exponent \(\beta\) as
\[ \beta = \frac{\tau - 2}{\sigma}. \]
As the second raw moment \(M_2(\varrho) \sim |\varrho - \varrho_c|^{(\tau - 3)/\sigma}\), we have
\[ \gamma = \frac{3 - \tau}{\sigma}, \]
or, equivalently,
\[ \tau = 3 - \sigma \gamma. \]
These are the scaling relations between the critical exponents, which all derive from the exponents \(\tau\) and \(\sigma\) of the cluster size distribution.
The Newman–Ziff Algorithm¶
Introduction¶
The scope of the Newman–Ziff algorithm is the simulation and statistical analysis of percolation on graphs [NZ01]. On regular lattices with \(N\) sites, the algorithm takes time \(\mathcal{O}(N)\), which is an enormous improvement compared to the \(\mathcal{O}(N^2)\) complexity of conventional percolation simulation algorithms.
In site percolation problems, each node of the graph is either “occupied” or “empty”. The occupied nodes induce a subgraph. When the largest component of this subgraph spans the whole original graph, the system is said to percolate. The notion of such a spanning cluster is straightforward in regular lattices: it either extends from one boundary to the opposite boundary, or wraps around the whole system if there are no boundaries (e.g. periodic boundary conditions).
In bond percolation problems, each edge of the graph is either “occupied” or “empty”. The occupied edges induce a subgraph. (A variant is to consider an edge either to exist or not, and consider the resulting graph.) As in site percolation, the system is said to percolate if the largest component (cluster) of this subgraph extends across the whole original graph.
In Bernoulli percolation problems, be it site or bond percolation, each site or bond is independently occupied with a probability \(p\). This is a paradigm of percolation theory. As the number of configurations grows exponentially with system size, one resorts to Monte Carlo simulation: that is, sampling the space of all configurations and computing the statistics from these samples. In order to numerically trace the percolation transition with finite-size scaling analysis, one needs to simulate several realizations (samples, runs) over a range of increasing system sizes and sufficiently many values of the order parameter \(p\) in the critical region. This entails simulating independent runs for almost arbitrarily close values of \(p\) in high-resolution studies. While the computation time grows linearly with resolution, accuracy does not. The intrinsic variance of the statistics at two adjacent values of \(p\), due to the independence of runs, dominates the difference caused by the differing values of \(p\).
The identification of clusters in any given configuration conventionally takes \(\mathcal{O}(N)\) time, which amounts to the overall \(\mathcal{O}(N^2)\) time requirement.
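For reference, a naive Monte Carlo sample of Bernoulli bond percolation looks like the following sketch (hypothetical parameters): each configuration requires identifying the clusters from scratch, which is what makes a sweep over many values of \(p\) cost \(\mathcal{O}(N^2)\) in total.
import networkx as nx
import numpy as np

p = 0.5                                   # hypothetical occupation probability
graph = nx.grid_2d_graph(32, 32)          # example lattice
rng = np.random.RandomState(42)

# one independent configuration: occupy each bond with probability p
occupied_edges = [edge for edge in graph.edges() if rng.uniform() < p]
configuration = nx.Graph(occupied_edges)

# identify the clusters from scratch for this single configuration
largest_cluster = max(nx.connected_components(configuration), key=len)
print(len(largest_cluster))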
From microcanonical to canonical statistics¶
In their 2001 paper, Newman & Ziff point out the following [NZ01]. The order parameter \(p\) is originally defined microscopically as the occupation probability. Consequently, it is also the macroscopic average occupation ratio. This is a weighted average over all configurations (microstates). Their weight is of course determined by the microscopic occupation probability \(p\). Each such microstate has a fixed occupation number \(n\). All configurations at a fixed \(n\) constitute the microcanonical ensemble at that number. Weighting the statistic of each microcanonical ensemble and averaging yields the canonical average of that statistic. Newman & Ziff refer to this thermodynamic analogy, where \(p\) plays the role of temperature, and the occupation number \(n\) the role of the energy of a microstate.
First, we sample all microcanonical ensembles (for each occupation number \(n\)), and compute the cluster statistic of interest. Then, we convolve the resulting microcanonical averages with their weights in the canonical ensemble of a given value of \(p\). This yields the canonical average of the statistic at that value of \(p\). This procedure enables us to compute the cluster statistics at an arbitrary resolution of the order parameter \(p\) from the precomputed microcanonical averages.
For bond percolation, the probability that \(n\) out of a total of \(M\) bonds are occupied at occupation probability \(p\) is the binomial probability mass function ([NZ01], Equation 1):
\[ B(M, n; p) = \binom{M}{n} p^n (1 - p)^{M - n}. \]
A microcanonical statistic \(\{Q_m\}\), measured at each occupation number \(m\), transforms into the canonical average \(Q(p)\) for occupation probability \(p\) according to the convolution ([NZ01], Equation 2):
\[ Q(p) = \sum_{m=0}^{M} \binom{M}{m} p^m (1 - p)^{M - m} \, Q_m. \]
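As an illustration of this convolution (a sketch, not the pypercolate implementation), with a hypothetical array Q_n of microcanonical averages indexed by the occupation number \(n = 0, \ldots, M\):
import numpy as np
import scipy.stats

M = 24                                   # total number of bonds (hypothetical)
Q_n = np.linspace(0.0, 1.0, num=M + 1)   # hypothetical microcanonical averages

def canonical_average(p, Q_n):
    """Convolve the microcanonical averages with the binomial weights for one p."""
    n = np.arange(Q_n.size)
    weights = scipy.stats.binom.pmf(n, Q_n.size - 1, p)
    return np.sum(weights * Q_n)

print(canonical_average(0.5, Q_n))       # canonical average at p = 0.5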
Common Random Numbers and incremental cluster detection¶
At the core of the Newman-Ziff algorithm is the incremental evolution of a set of sample states (realizations) by successively adding bonds to an initially empty lattice. Specifically, a sample state with \(n + 1\) occupied bonds derives from a sample state with \(n\) occupied bonds and one additional bond. Instead of identifying the clusters for \(n + 1\) from scratch, it takes only little effort to derive the clusters from the previous configuration with \(n\) occupied bonds. As a result, the overall time complexity of the algorithm reduces to \(\mathcal{O}(N)\).
A convenient side effect is that this method effectively employs the variance reduction technique of common random numbers for runs across all occupation numbers \(n\). Different realizations, or runs, still remain independent, which is all we need. At the same time, fluctuations caused solely by using different realizations when changing the occupation number \(n\), or, ultimately, the occupation probability \(p\), are now absent from the statistics.
Starting each run from a lattice with all \(M\) bonds empty, we successively add bonds in a random but predetermined order (a random permutation of the bonds). Each new bond either joins two sites from different clusters, merging these two clusters, or it joins two sites of the same cluster. A weighted union/find algorithm with path compression keeps track of the cluster merging (see [NZ01] and references therein).
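The incremental bookkeeping can be sketched with the union-find data structure that ships with networkx (a simplified illustration of the idea only; spanning-cluster detection and the full cluster statistics are omitted):
import random
import networkx as nx
from networkx.utils.union_find import UnionFind

graph = nx.grid_2d_graph(10, 10)      # example lattice
bonds = list(graph.edges())
random.shuffle(bonds)                 # predetermined random permutation of all M bonds

ds = UnionFind(graph.nodes())         # initially, every site is its own cluster
max_cluster_size = 1
for n, (u, v) in enumerate(bonds, start=1):
    # occupying bond (u, v) merges the clusters of its end nodes (if distinct)
    ds.union(u, v)
    size = ds.weights[ds[u]]          # size of the cluster now containing u and v
    max_cluster_size = max(max_cluster_size, size)
    # after n occupied bonds, max_cluster_size is the current largest cluster size

print(max_cluster_size)               # equals the number of sites once all bonds are added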
In order to detect a percolating, spanning cluster, we introduce auxiliary nodes to the graph, see [NZ01], Section II.D and Figure 6.
A Python Implementation of the Newman–Ziff algorithm¶
The pypercolate Python package implements the Newman–Ziff algorithm for bond percolation on graphs.
The elementary unit of computation is the sample state:
This is one particular realization with a given number of edges—one member of
the microcanonical ensemble.
As Newman & Ziff suggest [NZ01], the package evolves a sample
state by successively adding edges, in a random but predetermined order.
This is implemented as a generator function
percolate.percolate.sample_states()
to iterate over.
Each step of the iteration adds one edge.
A collection of sample states (realizations) evolved in parallel form a
microcanonical ensemble at each iteration step.
A microcanonical ensemble is hence a collection of different sample states
(realizations) but with the same number of edges (occupation number).
The percolate.percolate.microcanonical_averages()
generator function
evolves a microcanonical ensemble.
At each step, it calculates the cluster statistics over all realizations in the
ensemble.
The percolate.percolate.microcanonical_averages_arrays()
helper function
collects these statistics over all iteration steps into single numpy arrays.
Finally, the percolate.percolate.canonical_averages()
function calculates
the statistics of the canonical ensemble from the microcanonical ensembles.
The percolate.percolate.sample_states()
generator handles cluster joining
using the networkx.utils.union_find.UnionFind data structure.
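Put together, a compact sketch of this pipeline on a small square lattice (the lattice size, number of runs, and occupation probabilities are arbitrary example values) reads:

import percolate

# substrate graph with auxiliary nodes for spanning cluster detection
graph = percolate.spanning_2d_grid(16)

# evolve a microcanonical ensemble of 40 runs and collect its statistics
microcanonical = percolate.microcanonical_averages(graph, runs=40)
arrays = percolate.microcanonical_averages_arrays(microcanonical)

# convolve into canonical averages at the occupation probabilities of interest
ps = [0.45, 0.50, 0.55]
canonical = percolate.canonical_averages(ps, arrays)
print(canonical['spanning_cluster'])  # percolation probability at each p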
A Python Implementation of the Newman–Ziff Algorithm for High-Performance Computing (HPC)¶
The percolate.hpc module implements the Newman–Ziff algorithm for bond percolation on graphs, tailored towards high-performance computing (HPC). This guide introduces the design of the hpc module to the simulationist. It assumes that the reader is familiar with the pypercolate package and high-performance computing in general.
Performance considerations¶
The hpc module is designed to allow for percolation studies on medium-sized and large graphs (with approximately \(10^3\) to \(10^7\) nodes), averaged over an arbitrarily large number of runs. Currently, it is limited by memory consumption, in particular by the networkx graph: a networkx graph consumes approximately \(1\) GiB per \(10^6\) nodes. An efficient implementation would only store the list of edges in memory. Let us assume that the number of edges is of the same order as the number of nodes \(N\). For up to approximately \(10^9\) nodes, each node label requires an unsigned integer of \(32\) bits, or \(4\) bytes. Hence, each edge consumes \(2 \times 4\) bytes of memory, and at \(N\) edges, the list of edges requires on the order of \(10 N\) bytes of memory. A more efficient implementation hence would require about 2 orders of magnitude less memory. The graph-tool Python package provides a more efficient implementation through the native C++ Boost Graph Library [Pei14]. Future improvements of the percolate.hpc module could support graph-tool graphs. However, graph-tool requires compilation and hence careful setup, while networkx is trivial to install in any Python environment.
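To make the estimate concrete, a quick back-of-the-envelope calculation with the numbers assumed above (not measured values):

# Back-of-the-envelope memory estimate (assumptions as in the text above)
N = 10 ** 6                              # number of nodes, and roughly as many edges
edge_list_bytes = N * 2 * 4              # two uint32 node labels per edge, ~10 N with overhead
networkx_bytes = 1 * 2 ** 30             # ~1 GiB per 10**6 nodes for a networkx graph
print(networkx_bytes / edge_list_bytes)  # roughly two orders of magnitude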
Template for a HPC bond percolation finite-size scaling study¶
percolate.hpc supports the following control flow for a bond percolation finite-size scaling study. The input parameters of such a study are:
- a list of finite system sizes \(L_i\)
- the number of runs per task \(n_{\text{runs}}\)
- the number of tasks \(N_{\text{tasks}}\)
- the total number of runs \(N_{\text{runs}} = N_{\text{tasks}} \cdot n_{\text{runs}}\)
- the realizations \(\omega_j\) (one per run)
- the occupation probabilities \(p_k\) at which to compute the canonical averages for each system size \(L_i\)
- the confidence level \(1 - \alpha\) for the confidence intervals
Each run is determined by a particular realization \(\omega_j\) of a permutation of bonds to add one by one. Typically, this realization \(\omega_j\) is the seed of a pseudo-random number generator (RNG) which reproducibly produces the permutation. The usual considerations of parallel stochastic experiments apply here. For example, the seeds and streams of random numbers of distinct runs need to be independent. For this study, we assume that the astronomically large period of the Mersenne–Twister RNG sufficiently avoids any overlap between the random number streams of different runs.
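As pypercolate only consumes the seed that is passed in per run, one simple (purely illustrative) way to generate distinct, reproducible seeds is to draw them from a separately seeded master RNG:

import numpy as np

# Draw one seed per run from a seeded master RNG (master seed chosen arbitrarily)
master_rng = np.random.RandomState(20151212)
num_runs = 100
run_seeds = master_rng.randint(0, 2 ** 31 - 1, size=num_runs)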
Each finite system size provides for its own independent sub-study. Each sub-study is structured as follows:
- Prepare the NetworkX graph of the current system size \(L_i\) (user-defined).
- Prepare the graph and auxiliary information for the pypercolate routines with percolate.percolate.percolation_graph().
- For every value \(p_k\) of the occupation probability at the current system size, precompute the convolution coefficients \(f_n\) with percolate.percolate._binomial_pmf() for all occupation numbers \(n = 0, \ldots, M_i\) (where \(M_i\) is the number of edges at system size \(L_i\)).
- Execute \(N_{\text{tasks}}\) tasks (the task and run structure is sketched below).
- Reduce the task results with percolate.hpc.bond_reduce().
- Finalize the task results with percolate.hpc.finalize_canonical_averages().
- Write the finalized results to disk.
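A hedged sketch of such a sub-study for one system size follows. The function names sub_study and single_run are placeholders, and the assumption that percolation_graph() returns a dict with num_nodes and num_edges entries is inferred from how bond_microcanonical_statistics() is called elsewhere in this documentation; single_run itself is sketched further below.

from functools import reduce

import percolate
import percolate.hpc

def sub_study(length, ps, seeds, alpha=0.05):
    """Illustrative sub-study for one system size (a single task shown)."""
    # 1.-2. substrate graph and internal percolation graph
    graph = percolate.spanning_2d_grid(length)
    perc_graph = percolate.percolate.percolation_graph(
        graph=graph, spanning_cluster=True,
    )

    # 3. precompute the convolution coefficients f_n for every p_k
    convolution_factors = {
        p: percolate.percolate._binomial_pmf(n=perc_graph['num_edges'], p=p)
        for p in ps
    }

    # 4.-5. one task: map the run function over its seeds and reduce pairwise;
    #        in a real study, several such tasks run in parallel and their
    #        results are reduced again with bond_reduce()
    task_result = reduce(
        percolate.hpc.bond_reduce,
        (single_run(seed, perc_graph, ps, convolution_factors) for seed in seeds),
    )

    # 6. finalize: means, sample standard deviations, confidence intervals
    return percolate.hpc.finalize_canonical_averages(
        number_of_nodes=perc_graph['num_nodes'],
        ps=ps,
        canonical_averages=task_result,
        alpha=alpha,
    )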
Each task employs the Map–Reduce paradigm to perform \(n_{\text{runs}}\) independent runs. Basically, a task maps a run function onto its seeds \(\omega_j\), reduces the outputs with percolate.hpc.bond_reduce(), and returns the result.
The output of a task is a NumPy structured array with one row per occupation probability \(p_k\) and the following fields:
- the number of runs \(N_{\text{runs}}\),
- the mean and the sum of squared differences from the mean for
  - the percolation probability,
  - the maximum cluster size (absolute number of nodes),
  - the raw moments (absolute number of nodes).
The bond_reduce() function wraps the simoa.stats.online_variance() function, which in turn implements the single-pass pairwise algorithm by Chan et al. for calculating the variance. This algorithm stores and updates the sum of squared differences from the mean. Dividing this sum by the sample size minus \(1\) yields the sample variance.
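For illustration, the pairwise update that such an online-variance reducer performs (a generic sketch of the Chan et al. update, not the simoa implementation) is:

def combine(n_a, mean_a, m2_a, n_b, mean_b, m2_b):
    """Merge two partial samples given their counts, means, and sums of
    squared differences from the mean (Chan et al. pairwise update)."""
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return n, mean, m2  # sample variance of the merged sample: m2 / (n - 1)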
The finalize_canonical_averages() function takes the reduced output of all tasks of a system size. It passes the means through and calculates the sample standard deviation from the sum of squared differences. It further computes the standard (normal) confidence intervals at the given confidence level \(1 - \alpha\).
The output is a NumPy structured array with one row for each occupation probability \(p_k\) and the following fields:
- the total number of runs \(N_{\text{runs}}\),
- the occupation probability \(p_k\),
- the confidence level \(1 - \alpha\),
- the mean, the sample standard deviation and the \(1 - \alpha\) confidence interval of
  - the percolation probability,
  - the percolation strength (the maximum cluster size normalized by the total number of nodes),
  - the raw moments (normalized by the total number of nodes).
Each run consists of the following steps (see the sketch below):
- Accumulate the microcanonical cluster statistics \(Q_n\) across all occupation numbers \(n\) with percolate.hpc.bond_microcanonical_statistics().
- For each occupation probability \(p_k\), convolve the microcanonical cluster statistics \(Q_n\) into the canonical statistics \(Q(p_k)\) with the precomputed convolution coefficients \(f_n\) and percolate.hpc.bond_canonical_statistics().
- Prepare the canonical statistics of this single run for reduction with the other runs with percolate.hpc.bond_initialize_canonical_averages().
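Under the same assumptions as above, the hypothetical single_run() helper could be sketched as follows (one bond_canonical_statistics() call per occupation probability, concatenated into a single structured array for the reducer):

import numpy as np
import percolate.hpc

def single_run(seed, perc_graph, ps, convolution_factors):
    """Illustrative single run: microcanonical statistics, convolution, preparation."""
    # accumulate the microcanonical cluster statistics Q_n over all n
    micro = percolate.hpc.bond_microcanonical_statistics(seed=seed, **perc_graph)

    # convolve into canonical statistics Q(p_k) for each occupation probability
    canonical = np.concatenate([
        percolate.hpc.bond_canonical_statistics(
            microcanonical_statistics=micro,
            convolution_factors=convolution_factors[p],
        )
        for p in ps
    ])

    # prepare the single-run statistics for reduction with other runs
    return percolate.hpc.bond_initialize_canonical_averages(canonical)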
Amendment to the original Newman–Ziff algorithm¶
Newman & Ziff point out that the linear operations of averaging and convolving commute. As the convolution is the more intricate operation, they advise averaging over the runs first (microcanonical averages) and then applying the convolution to the microcanonical averages to arrive at the canonical averages. The design choice here is to perform the convolution first. The reason is that the number of occupation probabilities is typically of the order of \(10^1\) to \(10^2\), irrespective of the system size. In contrast, the number of occupation numbers is of the order of the system size, thus reaching \(10^6\) and beyond. In order to restrain memory consumption, the convolution immediately reduces the result of each run to a negligible amount of memory; intermediary run results thus need not be kept in memory. On the other hand, we could also reduce the microcanonical run results on the fly, which would equally spare us from keeping intermediary results. However, the calculation of the variance and confidence intervals is a non-linear operation, and hence, in general, the order of performing this operation and averaging matters. As the variances and confidence intervals are due to ensemble averaging, they should be computed only from the canonical cluster statistics of each single run. This is also how they would be computed if one were to sample the canonical ensemble directly, by creating different realizations of the bond percolation setting at every occupation probability.
Implicit NumPy Multithreading¶
We also note another design choice here that relates to high-performance computing on a shared cluster. NumPy is built against sophisticated linear algebra libraries (BLAS and derivatives), and these libraries might use multithreading for certain operations. For example, the numpy.dot function might transparently run on several cores/threads for 2-dimensional matrices. While this is a desirable feature when running NumPy on a single workstation, it is generally undesired in a shared cluster computing environment. As the queueing system handles the allocation of tasks to the worker nodes, it expects each task to consume only one core; using multiple cores impedes other tasks running concurrently on the same worker node. Hence, we deliberately avoid the dot product and stick to elementary NumPy operations. For example, the following two statements are equivalent:
np.dot(a, b) # might use multithreading
np.sum(a * b) # does not use multithreading
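Where the elementary-operations route is not an option, the BLAS thread count can usually also be capped through environment variables set before NumPy is imported; which variable applies depends on the BLAS build (the names below are the common OpenBLAS/OpenMP ones):

import os

# Cap implicit BLAS multithreading before NumPy is imported (build-dependent)
os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['OMP_NUM_THREADS'] = '1'

import numpy as np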
pypercolate HPC performance metrics¶
In this Section, we compute basic performance metrics of the percolate.hpc module.
Namely, we time the execution and measure the memory consumption of graph creation with spanning_2d_grid(), of computing the convolution coefficients with _binomial_pmf(), and of performing a single run with bond_microcanonical_statistics().
Preamble¶
# configure plotting
%config InlineBackend.rc = {'figure.dpi': 300, 'savefig.dpi': 300, \
'figure.figsize': (6, 3), 'font.size': 12, \
'figure.facecolor': (1, 1, 1, 0)}
%matplotlib inline
import timeit
import matplotlib.pyplot as plt
import memory_profiler
import numpy as np
import percolate
plt.style.use('ggplot')
System sizes¶
We determine performance metrics for the following system sizes.
dimensions = 2 ** np.arange(3, 11)
np.save('dimensions.npy', dimensions)
print(dimensions)
[ 8 16 32 64 128 256 512 1024]
System information¶
This Section details the hardware, the operating system, and the Python environment that collects the performance metrics.
# $ pip install py-cpuinfo
from cpuinfo import cpuinfo
{
key: value
for key, value in cpuinfo.get_cpu_info().items()
if key in ['arch', 'brand', 'count', 'hz_advertised', 'l2_cache_size', 'vendor_id']
}
{'arch': 'X86_64',
'brand': 'Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz',
'count': 4,
'hz_advertised': '2.7000 GHz',
'l2_cache_size': '4096 KB',
'vendor_id': 'GenuineIntel'}
import psutil
print("Total memory: {:.1f} GiB".format(psutil.virtual_memory().total / 1024 ** 3))
Total memory: 7.7 GiB
import os
import platform
import sys
import pip
import pprint
print(platform.platform())
'Linux-3.13.0-24-generic-x86_64-with-LinuxMint-17-qiana'
print(
platform.python_version(),
platform.python_implementation(),
platform.python_compiler(),
)
3.4.0 CPython GCC 4.8.2
pprint.pprint({
package.key: package.version
for package in pip.get_installed_distributions()
if package.key in [
'cython', 'future', 'ipython', 'matplotlib', 'memory-profiler',
'networkx', 'numpy', 'percolate', 'pip', 'scipy', 'simoa',
]
})
{'cython': '0.22.1',
'future': '0.15.0',
'ipython': '3.2.1',
'matplotlib': '1.4.3',
'memory-profiler': '0.33',
'networkx': '1.10',
'numpy': '1.9.2',
'percolate': '0.3.0.post0.dev10+gce92429.dirty',
'pip': '7.1.0',
'scipy': '0.16.0',
'simoa': '0.4.1'}
np.show_config()
openblas_info:
    library_dirs = ['/usr/lib']
    language = f77
    libraries = ['openblas']
blas_mkl_info:
  NOT AVAILABLE
mkl_info:
  NOT AVAILABLE
openblas_lapack_info:
  NOT AVAILABLE
lapack_opt_info:
    include_dirs = ['/usr/include/atlas']
    library_dirs = ['/usr/lib/atlas-base/atlas', '/usr/lib/atlas-base']
    language = f77
    libraries = ['lapack', 'f77blas', 'cblas', 'atlas']
    define_macros = [('ATLAS_INFO', '"\"3.10.1\""')]
lapack_mkl_info:
  NOT AVAILABLE
atlas_3_10_threads_info:
  NOT AVAILABLE
blas_opt_info:
    library_dirs = ['/usr/lib']
    language = f77
    libraries = ['openblas']
atlas_info:
    include_dirs = ['/usr/include/atlas']
    library_dirs = ['/usr/lib/atlas-base/atlas', '/usr/lib/atlas-base']
    language = f77
    libraries = ['lapack', 'f77blas', 'cblas', 'atlas']
    define_macros = [('ATLAS_INFO', '"\"3.10.1\""')]
atlas_threads_info:
  NOT AVAILABLE
atlas_3_10_info:
  NOT AVAILABLE
Scripts¶
For each performance metric, we define and run a script in an independent process. Each script reads the system sizes from disk and writes the performance metrics back to disk. We use independent scripts and processes here as especially memory usage measurements seem to be volatile if executed in the same process, IPython session, or notebook.
%%python
import memory_profiler
import numpy as np
import percolate
def get_graph_memory(dimension):
mem_before = memory_profiler._get_memory(-1)
graph = percolate.spanning_2d_grid(dimension)
mem_after = memory_profiler._get_memory(-1)
return mem_after - mem_before
dimensions = np.load('dimensions.npy')
graph_mem = np.fromiter(
(
get_graph_memory(dimension)
for dimension in dimensions
),
dtype=np.float,
)
np.save('graph_mem.npy', graph_mem)
%%python
import memory_profiler
import numpy as np
import percolate
def get_convolution_memory(dimension):
mem_before = memory_profiler._get_memory(-1)
convolution_factors = percolate.percolate._binomial_pmf(
n=2 * dimension * (dimension - 1), p=0.5,
)
mem_after = memory_profiler._get_memory(-1)
return mem_after - mem_before
dimensions = np.load('dimensions.npy')
convolution_mem = np.fromiter(
(
get_convolution_memory(dimension)
for dimension in dimensions
),
dtype=np.float,
)
np.save('convolution_mem.npy', convolution_mem)
%%python
import timeit
import numpy as np
import percolate
dimensions = np.load('dimensions.npy')
graph_times = np.fromiter(
(
timeit.timeit(
stmt='percolate.spanning_2d_grid({})'.format(dimension),
setup='import percolate',
number=1,
)
for dimension in dimensions
),
dtype=np.float,
)
np.save('graph_times.npy', graph_times)
%%python
import timeit
import numpy as np
import percolate
dimensions = np.load('dimensions.npy')
convolution_stmt = """\
[
percolate.percolate._binomial_pmf(n={}, p=p)
for p in np.linspace(0.4, 0.6, num=1)
]
"""
convolution_times = np.fromiter(
(
timeit.timeit(
stmt=convolution_stmt.format(2 * dimension * (dimension - 1)),
setup='import percolate; import numpy as np',
number=1,
)
for dimension in dimensions
),
dtype=np.float,
)
np.save('convolution_times.npy', convolution_times)
%%python
import timeit
import numpy as np
dimensions = np.load('dimensions.npy')
run_stmt = """\
percolate.hpc.bond_microcanonical_statistics(
seed=42, **perc_graph
)
"""
run_setup = """\
import percolate
import percolate.hpc
perc_graph = percolate.percolate.percolation_graph(
graph=percolate.spanning_2d_grid({}),
spanning_cluster=True,
)
"""
run_times = np.fromiter(
(
timeit.timeit(
stmt=run_stmt,
setup=run_setup.format(dimension),
number=1,
)
for dimension in dimensions
),
dtype=np.float,
)
np.save('run_times.npy', run_times)
/home/sorge/repos/sci/pypercolate/percolate/hpc.py:302: RuntimeWarning: invalid value encountered in power
ret['moments'] += ret['max_cluster_size'] ** np.arange(5)
/home/sorge/repos/sci/pypercolate/percolate/hpc.py:286: RuntimeWarning: invalid value encountered in power
ret['moments'] -= weights[i] ** np.arange(5)
/home/sorge/repos/sci/pypercolate/percolate/hpc.py:286: RuntimeWarning: invalid value encountered in subtract
ret['moments'] -= weights[i] ** np.arange(5)
/home/sorge/repos/sci/pypercolate/percolate/hpc.py:297: RuntimeWarning: invalid value encountered in power
ret['moments'] += weight ** np.arange(5)
/home/sorge/repos/sci/pypercolate/percolate/hpc.py:297: RuntimeWarning: invalid value encountered in add
ret['moments'] += weight ** np.arange(5)
%%python
import memory_profiler
import numpy as np
import percolate
import percolate.hpc
def get_run_memory(dimension):
perc_graph = percolate.percolate.percolation_graph(
graph=percolate.spanning_2d_grid(dimension),
spanning_cluster=True,
)
mem_before = memory_profiler._get_memory(-1)
return (
max(memory_profiler.memory_usage(
(
percolate.hpc.bond_microcanonical_statistics,
[],
dict(seed=42, **perc_graph)
),
interval=.01,
)) - mem_before
)
dimensions = np.load('dimensions.npy')
run_mem = np.fromiter(
(
get_run_memory(dimension)
for dimension in dimensions
),
dtype=np.float,
)
np.save('run_mem.npy', run_mem)
/home/sorge/repos/sci/pypercolate/percolate/hpc.py:302: RuntimeWarning: invalid value encountered in power
ret['moments'] += ret['max_cluster_size'] ** np.arange(5)
/home/sorge/repos/sci/pypercolate/percolate/hpc.py:286: RuntimeWarning: invalid value encountered in power
ret['moments'] -= weights[i] ** np.arange(5)
/home/sorge/repos/sci/pypercolate/percolate/hpc.py:286: RuntimeWarning: invalid value encountered in subtract
ret['moments'] -= weights[i] ** np.arange(5)
/home/sorge/repos/sci/pypercolate/percolate/hpc.py:297: RuntimeWarning: invalid value encountered in power
ret['moments'] += weight ** np.arange(5)
/home/sorge/repos/sci/pypercolate/percolate/hpc.py:297: RuntimeWarning: invalid value encountered in add
ret['moments'] += weight ** np.arange(5)
graph_mem = np.load('graph_mem.npy')
convolution_mem = np.load('convolution_mem.npy')
graph_times = np.load('graph_times.npy')
convolution_times = np.load('convolution_times.npy')
run_times = np.load('run_times.npy')
run_mem = np.load('run_mem.npy')
Performance plots¶
plt.loglog(dimensions ** 2, graph_times, 'x')
plt.xlabel(r'number of nodes')
plt.ylabel(r'execution time (s)')
plt.title(r'percolate.spanning\_2d\_grid')
plt.show()
plt.loglog(dimensions ** 2, graph_mem / 2 ** 10, 'x')
plt.xlabel(r'number of nodes')
plt.ylabel('memory consumption (GiB)')
plt.title('percolate.spanning\_2d\_grid')
plt.show()
plt.loglog(dimensions ** 2, convolution_times, 'x')
plt.xlabel(r'number of nodes')
plt.ylabel(r'execution time per $p$ (s)')
plt.title('percolate.percolate.\_binomial\_pmf')
plt.show()
plt.loglog(dimensions ** 2, convolution_mem / 2 ** 10, 'x')
plt.xlabel(r'number of nodes')
plt.ylabel(r'memory consumption per $p$ (GiB)')
plt.title('percolate.percolate.\_binomial\_pmf')
plt.show()
plt.loglog(dimensions ** 2, run_times, 'x')
plt.xlabel(r'number of nodes')
plt.ylabel(r'execution time (s)')
plt.title('percolate.hpc.bond\_microcanonical\_statistics')
plt.show()
plt.loglog(dimensions ** 2, run_mem / 2 ** 10, 'x')
plt.xlabel(r'number of nodes')
plt.ylabel(r'memory consumption (GiB)')
plt.title('percolate.hpc.bond\_microcanonical\_statistics')
plt.show()
Developer Guide¶
- Repository: github.com/andsor/pypercolate
- Bibliography: www.citeulike.org/group/19226
Development environment¶
Use tox to prepare virtual environments for development.
To set up a Python 2.7 environment in .devenv27
, run:
$ tox -e devenv27
To set up a Python 3.4 environment in .devenv34
, run:
$ tox -e devenv34
Add requirements for the development environments to the requirements-dev.txt file.
Packaging¶
This package uses setuptools.
Run
$ python setup.py sdist
or
$ python setup.py bdist
or
$ python setup.py bdist_wheel
to build a source, binary or wheel distribution.
Complete Git Integration¶
The package is maintained in a git repository.
The setuptools script setup.py
uses the information of git tags to infer the
version of the project with the help of setuptools_scm.
To use this feature you need to tag with the format MAJOR.MINOR[.PATCH]
, e.g. 0.0.1
or 0.1
.
Run
$ python setup.py --version
to retrieve the current PEP440-compliant version.
This version will be used when building a package and is also accessible
through percolate.__version__
.
Unleash the power of Git by using its pre-commit hooks.
Make sure pre-commit is installed, e.g. pip install pre-commit
, then just
run pre-commit install
.
Sphinx Documentation¶
This project follows the NumPy documentation style.
Build the documentation with:
$ tox -e docs
Add further options separated from tox options by a double dash --
:
$ tox -e docs -- --help
Add requirements for building the documentation to the requirements-doc.txt file.
Continuous documentation building¶
For continuously building the documentation during development, run:
$ tox -e cdocs
Unittest & Coverage¶
Run
$ tox -e py27
or:
$ tox -e py34
to run all unittests defined in the subfolder test
with the help of tox
and py.test.
The py.test plugin pytest-cov is used to automatically generate a coverage report.
Continuous testing¶
For continuous testing in a Python 2.7 environment, run:
$ tox -e c27
For continuous testing in a Python 3.4 environment, run:
$ tox -e c34
Requirements Management¶
Add requirements to the requirements.txt file which
will be automatically used by setup.py
.
Bibliography¶
A CiteULike group manages the bibliography.
To download the bibliography, run
$ doit download_bib
Continuous Integration¶
pypercolate uses Travis to run the tests on each commit. Travis also reports the test coverage to Coveralls. It further deploys each tagged commit as a release to the Python Package Index (PyPI).
Landscape.io continuously measures “Code Health”.
ReadTheDocs builds and hosts this documentation.
percolate package¶
Submodules¶
percolate.hpc module¶
Low-level routines to implement the Newman-Ziff algorithm for HPC
See also
percolate
- The high-level module
-
percolate.hpc.
bond_canonical_statistics
(microcanonical_statistics, convolution_factors, **kwargs)[source]¶ canonical cluster statistics for a single run and a single probability
Parameters: - microcanonical_statistics (ndarray) – Return value of bond_microcanonical_statistics
- convolution_factors (1-D array_like) – The coefficients of the convolution for the given probability p and for each occupation number n.
Returns: - ret (ndarray of size
1
) – Structured array with dtype as returned by canonical_statistics_dtype - ret[‘percolation_probability’] (ndarray of float) –
The “percolation probability” of this run at the value of
p
. Only exists if the microcanonical_statistics argument has the has_spanning_cluster
field. - ret[‘max_cluster_size’] (ndarray of int) – Weighted size of the largest cluster (absolute number of sites)
- ret[‘moments’] (1-D
numpy.ndarray
of float) – Array of size 5. The k-th entry is the weighted k-th raw moment of the (absolute) cluster size distribution, with k ranging from 0 to 4.
-
percolate.hpc.
bond_initialize_canonical_averages
(canonical_statistics, **kwargs)[source]¶ Initialize the canonical averages from a single-run cluster statistics
Parameters: canonical_statistics (1-D structured ndarray) – Typically contains the canonical statistics for a range of values of the occupation probability p
. The dtype is the result of canonical_statistics_dtype. Returns: - ret (structured ndarray) – The dtype is the result of canonical_averages_dtype.
- ret[‘number_of_runs’] (1-D ndarray of int) –
Equals
1
(initial run). - ret[‘percolation_probability_mean’] (1-D array of float) –
Equals
canonical_statistics['percolation_probability']
(ifpercolation_probability
is present) - ret[‘percolation_probability_m2’] (1-D array of float) –
Each entry is
0.0
- ret[‘max_cluster_size_mean’] (1-D array of float) –
Equals
canonical_statistics['max_cluster_size']
- ret[‘max_cluster_size_m2’] (1-D array of float) –
Each entry is
0.0
- ret[‘moments_mean’] (2-D array of float) –
Equals
canonical_statistics['moments']
- ret[‘moments_m2’] (2-D array of float) –
Each entry is
0.0
-
percolate.hpc.
bond_microcanonical_statistics
(perc_graph, num_nodes, num_edges, seed, spanning_cluster=True, auxiliary_node_attributes=None, auxiliary_edge_attributes=None, spanning_sides=None, **kwargs)[source]¶ Evolve a single run over all microstates (bond occupation numbers)
Return the cluster statistics for each microstate
Parameters: - perc_graph (networkx.Graph) – The substrate graph on which percolation is to take place
- num_nodes (int) – Number
N
of sites in the graph - num_edges (int) – Number
M
of bonds in the graph - seed ({None, int, array_like}) – Random seed initializing the pseudo-random number generator. Piped through to numpy.random.RandomState constructor.
- spanning_cluster (bool, optional) – Whether to detect a spanning cluster or not.
Defaults to
True
. - auxiliary_node_attributes (optional) – Value of
networkx.get_node_attributes(graph, 'span')
- auxiliary_edge_attributes (optional) – Value of
networkx.get_edge_attributes(graph, 'span')
- spanning_sides (list, optional) – List of keys (attribute values) of the two sides of the auxiliary
nodes.
Return value of
list(set(auxiliary_node_attributes.values()))
Returns: - ret (ndarray of size
num_edges + 1
) – Structured array with dtypedtype=[('has_spanning_cluster', 'bool'), ('max_cluster_size', 'uint32'), ('moments', 'uint64', 5)]
- ret[‘n’] (ndarray of int) – The number of bonds added at the particular iteration
- ret[‘edge’] (ndarray of int) –
The index of the edge added at the particular iteration.
Note that
ret['edge'][0]
is undefined! - ret[‘has_spanning_cluster’] (ndarray of bool) –
True
if there is a spanning cluster,False
otherwise. Only exists if spanning_cluster argument is set toTrue
. - ret[‘max_cluster_size’] (int) – Size of the largest cluster (absolute number of sites)
- ret[‘moments’] (2-D
numpy.ndarray
of int) – Array of shape (num_edges + 1, 5). The k-th entry is the k-th raw moment of the (absolute) cluster size distribution, with k ranging from 0 to 4.
See also
bond_sample_states()
,microcanonical_statistics_dtype()
,numpy.random.RandomState()
-
percolate.hpc.
bond_reduce
(row_a, row_b)[source]¶ Reduce the canonical averages over several runs
This is a “true” reducer. It is associative and commutative.
This is a wrapper around simoa.stats.online_variance.
Parameters: row_b (row_a,) – Output of this function, or initial input from bond_initialize_canonical_averages Returns: ret – Array is of dtype as returned by canonical_averages_dtype Return type: structured ndarray See also
bond_initialize_canonical_averages()
,canonical_averages_dtype()
,simoa.stats.online_variance()
-
percolate.hpc.
bond_sample_states
(perc_graph, num_nodes, num_edges, seed, spanning_cluster=True, auxiliary_node_attributes=None, auxiliary_edge_attributes=None, spanning_sides=None, **kwargs)[source]¶ Generate successive sample states of the bond percolation model
This is a generator function to successively add one edge at a time from the graph to the percolation model. At each iteration, it calculates and returns the cluster statistics. CAUTION: it returns a reference to the internal array, not a copy.
Parameters: - perc_graph (networkx.Graph) – The substrate graph on which percolation is to take place
- num_nodes (int) – Number
N
of sites in the graph - num_edges (int) – Number
M
of bonds in the graph - seed ({None, int, array_like}) – Random seed initializing the pseudo-random number generator. Piped through to numpy.random.RandomState constructor.
- spanning_cluster (bool, optional) – Whether to detect a spanning cluster or not.
Defaults to
True
. - auxiliary_node_attributes (optional) – Return value of
networkx.get_node_attributes(graph, 'span')
- auxiliary_edge_attributes (optional) – Return value of
networkx.get_edge_attributes(graph, 'span')
- spanning_sides (list, optional) – List of keys (attribute values) of the two sides of the auxiliary
nodes.
Return value of
list(set(auxiliary_node_attributes.values()))
Yields: - ret (ndarray) –
Structured array with dtype
dtype=[('has_spanning_cluster', 'bool'), ('max_cluster_size', 'uint32'), ('moments', 'int64', 5)]
- ret[‘n’] (ndarray of int) – The number of bonds added at the particular iteration
- ret[‘edge’] (ndarray of int) –
The index of the edge added at the particular iteration
Note that in the first step, when
ret['n'] == 0
, this value is undefined! - ret[‘has_spanning_cluster’] (ndarray of bool) –
True
if there is a spanning cluster,False
otherwise. Only exists if spanning_cluster argument is set toTrue
. - ret[‘max_cluster_size’] (int) – Size of the largest cluster (absolute number of sites)
- ret[‘moments’] (1-D
numpy.ndarray
of int) – Array of size 5. The k-th entry is the k-th raw moment of the (absolute) cluster size distribution, with k ranging from 0 to 4.
Raises: ValueError
– If spanning_cluster isTrue
, but graph does not contain any auxiliary nodes to detect spanning clusters.See also
numpy.random.RandomState()
,microcanonical_statistics_dtype()
Notes
Iterating through this generator is a single run of the Newman-Ziff algorithm. [12] The first iteration yields the trivial state with \(n = 0\) occupied bonds.
Spanning cluster
Raw moments of the cluster size distribution
References
[12] (1, 2) Newman, M. E. J. & Ziff, R. M. Fast monte carlo algorithm for site or bond percolation. Physical Review E 64, 016706+ (2001), doi:10.1103/physreve.64.016706. [13] Stauffer, D. & Aharony, A. Introduction to Percolation Theory (Taylor & Francis, London, 1994), second edn. [14] Binder, K. & Heermann, D. W. Monte Carlo Simulation in Statistical Physics (Springer, Berlin, Heidelberg, 2010), doi:10.1007/978-3-642-03163-2.
-
percolate.hpc.
canonical_averages_dtype
(spanning_cluster=True)[source]¶ The NumPy Structured Array type for canonical averages over several runs
Helper function
Parameters: spanning_cluster (bool, optional) – Whether to detect a spanning cluster or not. Defaults to True
.Returns: ret – A list of tuples of field names and data types to be used as dtype
argument in numpy ndarray constructorsReturn type: list of pairs of str See also
http://docs.scipy.org/doc/numpy/user/basics.rec.html,
canonical_statistics_dtype(), finalized_canonical_averages_dtype()
-
percolate.hpc.
canonical_statistics_dtype
(spanning_cluster=True)[source]¶ The NumPy Structured Array type for canonical statistics
Helper function
Parameters: spanning_cluster (bool, optional) – Whether to detect a spanning cluster or not. Defaults to True
.Returns: ret – A list of tuples of field names and data types to be used as dtype
argument in numpy ndarray constructorsReturn type: list of pairs of str See also
http://docs.scipy.org/doc/numpy/user/basics.rec.html,
microcanonical_statistics_dtype(), canonical_averages_dtype()
-
percolate.hpc.
finalize_canonical_averages
(number_of_nodes, ps, canonical_averages, alpha)[source]¶ Finalize canonical averages
-
percolate.hpc.
finalized_canonical_averages_dtype
(spanning_cluster=True)[source]¶ The NumPy Structured Array type for finalized canonical averages over several runs
Helper function
Parameters: spanning_cluster (bool, optional) – Whether to detect a spanning cluster or not. Defaults to True
.Returns: ret – A list of tuples of field names and data types to be used as dtype
argument in numpy ndarray constructorsReturn type: list of pairs of str
-
percolate.hpc.
microcanonical_statistics_dtype
(spanning_cluster=True)[source]¶ Return the numpy structured array data type for sample states
Helper function
Parameters: spanning_cluster (bool, optional) – Whether to detect a spanning cluster or not. Defaults to True
.Returns: ret – A list of tuples of field names and data types to be used as dtype
argument in numpy ndarray constructorsReturn type: list of pairs of str
percolate.percolate module¶
Low-level routines to implement the Newman-Ziff algorithm
See also
percolate
- The high-level module
-
percolate.percolate.
alpha_1sigma
¶ The alpha for the 1 sigma confidence level
-
percolate.percolate.
canonical_averages
(ps, microcanonical_averages_arrays)[source]¶ Compute the canonical cluster statistics from microcanonical statistics
This is according to Newman and Ziff, Equation (2). Note that we also simply average the bounds of the confidence intervals according to this formula.
Parameters: - ps (iterable of float) – Each entry is a probability for which to form the canonical ensemble and compute the weighted statistics from the microcanonical statistics
- microcanonical_averages_arrays – Typically the output of
microcanonical_averages_arrays()
Returns: - ret (dict) – Canonical ensemble cluster statistics
- ret[‘ps’] (iterable of float) – The parameter ps
- ret[‘N’] (int) – Total number of sites
- ret[‘M’] (int) – Total number of bonds
- ret[‘spanning_cluster’] (1-D
numpy.ndarray
of float) – The percolation probability: The normalized average number of runs that have a spanning cluster. - ret[‘spanning_cluster_ci’] (2-D
numpy.ndarray
of float, size 2) – The lower and upper bounds of the percolation probability. - ret[‘max_cluster_size’] (1-D
numpy.ndarray
of float) – The percolation strength: Average relative size of the largest cluster - ret[‘max_cluster_size_ci’] (2-D
numpy.ndarray
of float) – Lower and upper bounds of the normal confidence interval of the percolation strength. - ret[‘moments’] (2-D
numpy.ndarray
of float, shape (5, M + 1)) – Average raw moments of the (relative) cluster size distribution. - ret[‘moments_ci’] (3-D
numpy.ndarray
of float, shape (5, M + 1, 2)) – Lower and upper bounds of the normal confidence interval of the raw moments of the (relative) cluster size distribution.
-
percolate.percolate.
microcanonical_averages
(graph, runs=40, spanning_cluster=True, model='bond', alpha=alpha_1sigma, copy_result=True)[source]¶ Generate successive microcanonical percolation ensemble averages
This is a generator function to successively add one edge at a time from the graph to the percolation model for a number of independent runs in parallel. At each iteration, it calculates and returns the averaged cluster statistics.
Parameters: - graph (networkx.Graph) – The substrate graph on which percolation is to take place
- runs (int, optional) – Number of independent runs.
Defaults to
40
. - spanning_cluster (bool, optional) – Defaults to
True
. - model (str, optional) –
The percolation model (either
'bond'
or'site'
). Defaults to'bond'
.Note
Other models than
'bond'
are not supported yet. - alpha (float, optional) – Significance level.
Defaults to 1 sigma of the normal distribution.
1 - alpha
is the confidence level. - copy_result (bool, optional) – Whether to return a copy or a reference to the result dictionary.
Defaults to
True
.
Yields: - ret (dict) – Cluster statistics
- ret[‘n’] (int) – Number of occupied bonds
- ret[‘N’] (int) – Total number of sites
- ret[‘M’] (int) – Total number of bonds
- ret[‘spanning_cluster’] (float) –
The average number (Binomial proportion) of runs that have a spanning
cluster.
This is the Bayesian point estimate of the posterior mean, with a
uniform prior.
Only exists if spanning_cluster is set to
True
. - ret[‘spanning_cluster_ci’] (1-D
numpy.ndarray
of float, size 2) – The lower and upper bounds of the Binomial proportion confidence interval with uniform prior. Only exists if spanning_cluster is set toTrue
. - ret[‘max_cluster_size’] (float) – Average size of the largest cluster (absolute number of sites)
- ret[‘max_cluster_size_ci’] (1-D
numpy.ndarray
of float, size 2) – Lower and upper bounds of the normal confidence interval of the average size of the largest cluster (absolute number of sites) - ret[‘moments’] (1-D
numpy.ndarray
of float, size 5) – The k-th entry is the average k-th raw moment of the (absolute) cluster size distribution, with k ranging from 0 to 4. - ret[‘moments_ci’] (2-D
numpy.ndarray
of float, shape (5,2)) – ret['moments_ci'][k] are the lower and upper bounds of the normal confidence interval of the average k-th raw moment of the (absolute) cluster size distribution, with k ranging from 0 to 4.
Raises: ValueError
– If runs is not a positive integerValueError
– If alpha is not a float in the interval (0, 1)
See also
sample_states()
,percolate.percolate._microcanonical_average_spanning_cluster()
,percolate.percolate._microcanonical_average_max_cluster_size()
Notes
Iterating through this generator corresponds to several parallel runs of the Newman-Ziff algorithm. Each iteration yields a microcanonical percolation ensemble for the number \(n\) of occupied bonds. [9] The first iteration yields the trivial microcanonical percolation ensemble with \(n = 0\) occupied bonds.
Spanning cluster
See also
Raw moments of the cluster size distribution
See also
References
[9] Newman, M. E. J. & Ziff, R. M. Fast monte carlo algorithm for site or bond percolation. Physical Review E 64, 016706+ (2001), doi:10.1103/physreve.64.016706.
-
percolate.percolate.
microcanonical_averages_arrays
(microcanonical_averages)[source]¶ Compile microcanonical averages over all iteration steps into single arrays
Helper function to aggregate the microcanonical averages over all iteration steps into single arrays for further processing
Parameters: microcanonical_averages (iterable) – Typically, this is the microcanonical_averages()
generatorReturns: - ret (dict) – Aggregated cluster statistics
- ret[‘N’] (int) – Total number of sites
- ret[‘M’] (int) – Total number of bonds
- ret[‘spanning_cluster’] (1-D
numpy.ndarray
of float) – The percolation probability: The normalized average number of runs that have a spanning cluster. - ret[‘spanning_cluster_ci’] (2-D
numpy.ndarray
of float, size 2) – The lower and upper bounds of the percolation probability. - ret[‘max_cluster_size’] (1-D
numpy.ndarray
of float) – The percolation strength: Average relative size of the largest cluster - ret[‘max_cluster_size_ci’] (2-D
numpy.ndarray
of float) – Lower and upper bounds of the normal confidence interval of the percolation strength. - ret[‘moments’] (2-D
numpy.ndarray
of float, shape (5, M + 1)) – Average raw moments of the (relative) cluster size distribution. - ret[‘moments_ci’] (3-D
numpy.ndarray
of float, shape (5, M + 1, 2)) – Lower and upper bounds of the normal confidence interval of the raw moments of the (relative) cluster size distribution.
See also
-
percolate.percolate.
percolation_graph
(graph, spanning_cluster=True)[source]¶ Prepare the (internal) percolation graph from a given graph
Helper function to prepare the given graph for spanning cluster detection (if required). Basically it strips off the auxiliary nodes and edges again. It also returns fundamental graph quantities (number of nodes and edges).
Parameters: - graph –
- spanning_cluster –
Returns: ret
Return type:
-
percolate.percolate.
sample_states
(graph, spanning_cluster=True, model='bond', copy_result=True)[source]¶ Generate successive sample states of the percolation model
This is a generator function to successively add one edge at a time from the graph to the percolation model. At each iteration, it calculates and returns the cluster statistics.
Parameters: - graph (networkx.Graph) – The substrate graph on which percolation is to take place
- spanning_cluster (bool, optional) – Whether to detect a spanning cluster or not.
Defaults to
True
. - model (str, optional) –
The percolation model (either
'bond'
or'site'
). Defaults to'bond'
.Note
Other models than
'bond'
are not supported yet. - copy_result (bool, optional) – Whether to return a copy or a reference to the result dictionary.
Defaults to
True
.
Yields: - ret (dict) – Cluster statistics
- ret[‘n’] (int) – Number of occupied bonds
- ret[‘N’] (int) – Total number of sites
- ret[‘M’] (int) – Total number of bonds
- ret[‘has_spanning_cluster’] (bool) –
True
if there is a spanning cluster,False
otherwise. Only exists if spanning_cluster argument is set toTrue
. - ret[‘max_cluster_size’] (int) – Size of the largest cluster (absolute number of sites)
- ret[‘moments’] (1-D
numpy.ndarray
of int) – Array of size 5. The k-th entry is the k-th raw moment of the (absolute) cluster size distribution, with k ranging from 0 to 4.
Raises: ValueError
– If model does not equal'bond'
.ValueError
– If spanning_cluster isTrue
, but graph does not contain any auxiliary nodes to detect spanning clusters.
See also
microcanonical_averages()
- Evolves multiple sample states in parallel
Notes
Iterating through this generator is a single run of the Newman-Ziff algorithm. [2] The first iteration yields the trivial state with \(n = 0\) occupied bonds.
Spanning cluster
Raw moments of the cluster size distribution
References
[2] (1, 2) Newman, M. E. J. & Ziff, R. M. Fast monte carlo algorithm for site or bond percolation. Physical Review E 64, 016706+ (2001), doi:10.1103/physreve.64.016706. [3] Stauffer, D. & Aharony, A. Introduction to Percolation Theory (Taylor & Francis, London, 1994), second edn. [4] Binder, K. & Heermann, D. W. Monte Carlo Simulation in Statistical Physics (Springer, Berlin, Heidelberg, 2010), doi:10.1007/978-3-642-03163-2.
-
percolate.percolate.
single_run_arrays
(spanning_cluster=True, **kwargs)[source]¶ Generate statistics for a single run
This is a stand-alone helper function to evolve a single sample state (realization) and return the cluster statistics.
Parameters: - spanning_cluster (bool, optional) – Whether to detect a spanning cluster or not.
Defaults to
True
. - kwargs (keyword arguments) – Piped through to
sample_states()
Returns: - ret (dict) – Cluster statistics
- ret[‘N’] (int) – Total number of sites
- ret[‘M’] (int) – Total number of bonds
- ret[‘max_cluster_size’] (1-D
numpy.ndarray
of int, sizeret['M'] + 1
) – Array of the sizes of the largest cluster (absolute number of sites) at the respective occupation number. - ret[‘has_spanning_cluster’] (1-D
numpy.ndarray
of bool, sizeret['M'] + 1
) – Array of booleans for each occupation number. The respective entry isTrue
if there is a spanning cluster,False
otherwise. Only exists if spanning_cluster argument is set toTrue
. - ret[‘moments’] (2-D
numpy.ndarray
of int) – Array of shape (5, ret['M'] + 1). The (k, m)-th entry is the k-th raw moment of the (absolute) cluster size distribution, with k ranging from 0 to 4, at occupation number m.
See also
-
percolate.percolate.
spanning_1d_chain
(length)[source]¶ Generate a linear chain with auxiliary nodes for spanning cluster detection
Parameters: length (int) – Number of nodes in the chain, excluding the auxiliary nodes. Returns: A linear chain graph with auxiliary nodes for spanning cluster detection Return type: networkx.Graph See also
sample_states()
- spanning cluster detection
-
percolate.percolate.
spanning_2d_grid
(length)[source]¶ Generate a square lattice with auxiliary nodes for spanning detection
Parameters: length (int) – Number of nodes in one dimension, excluding the auxiliary nodes. Returns: A square lattice graph with auxiliary nodes for spanning cluster detection Return type: networkx.Graph See also
sample_states()
- spanning cluster detection
Module contents¶
Implements the Newman-Ziff algorithm for Monte Carlo simulation of percolation
This module implements the Newman-Ziff algorithm for Monte Carlo simulation of Bernoulli percolation on arbitrary graphs.
The percolate
module provides these high-level functions from the
percolate.percolate
module:
- percolate.sample_states(graph[, ...]) – Generate successive sample states of the percolation model
- percolate.single_run_arrays([spanning_cluster]) – Generate statistics for a single run
- percolate.microcanonical_averages(graph[, ...]) – Generate successive microcanonical percolation ensemble averages
- percolate.microcanonical_averages_arrays(...) – Compile microcanonical averages over all iteration steps into single arrays
- percolate.canonical_averages(ps, ...) – Compute the canonical cluster statistics from microcanonical statistics
- percolate.spanning_1d_chain(length) – Generate a linear chain with auxiliary nodes for spanning cluster detection
- percolate.spanning_2d_grid(length) – Generate a square lattice with auxiliary nodes for spanning detection
- percolate.statistics(graph, ps[, ...]) – Helper function to compute percolation statistics
See also
percolate.percolate
- low-level functions
Notes
Currently, the module only implements bond percolation. Spanning cluster detection is implemented, but wrapping detection is not.
The elementary unit of computation is the sample state:
This is one particular realization with a given number of edges—one member of
the microcanonical ensemble.
As Newman & Ziff suggest [1], the module evolves a sample state by
successively adding edges, in a random but predetermined order.
This is implemented as a generator function sample_states()
to iterate
over.
Each step of the iteration adds one edge.
A collection of sample states (realizations) evolved in parallel form a
microcanonical ensemble at each iteration step.
A microcanonical ensemble is hence a collection of different sample states
(realizations) but with the same number of edges (occupation number).
The microcanonical_averages()
generator function evolves a microcanonical
ensemble.
At each step, it calculates the cluster statistics over all realizations in the
ensemble.
The microcanonical_averages_arrays()
helper function collects these
statistics over all iteration steps into single numpy arrays.
Finally, the canonical_averages()
function calculates the statistics of
the canonical ensemble from the microcanonical ensembles.
References
[1] | Newman, M. E. J. & Ziff, R. M. Fast monte carlo algorithm for site or bond percolation. Physical Review E 64, 016706+ (2001), 10.1103/physreve.64.016706 |
Bibliography¶
[AC98] | Alan Agresti and Brent A. Coull. Approximate Is Better than “Exact” for Interval Estimation of Binomial Proportions. The American Statistician, 52(2):119–126, 1998. URL: http://dx.doi.org/10.2307/2685469, doi:10.2307/2685469. |
[BH10] | Kurt Binder and Dieter W. Heermann. Monte Carlo Simulation in Statistical Physics. Springer, Berlin, Heidelberg, 2010. URL: http://dx.doi.org/10.1007/978-3-642-03163-2, doi:10.1007/978-3-642-03163-2. |
[BH57] | S. R. Broadbent and J. M. Hammersley. Percolation processes. I. Crystals and mazes. Mathematical Proceedings of the Cambridge Philosophical Society, 53(03):629–641, July 1957. URL: http://dx.doi.org/10.1017/s0305004100032680, doi:10.1017/s0305004100032680. |
[Cam11] | Ewan Cameron. On the Estimation of Confidence Intervals for Binomial Population Proportions in Astronomy: The Simplicity and Superiority of the Bayesian Approach. Publications of the Astronomical Society of Australia, 28:128–139, 2011. URL: http://dx.doi.org/10.1071/as10046, doi:10.1071/as10046. |
[DCB01] | Anirban DasGupta, Tony T. Cai, and Lawrence D. Brown. Interval Estimation for a Binomial Proportion. Statistical Science, 16(2):101–133, 2001. URL: http://dx.doi.org/10.1214/ss/1009213286, doi:10.1214/ss/1009213286. |
[Fis67] | Michael E. Fisher. The theory of condensation and the critical point. Physics, 3(5):255+, 1967. |
[Flo41] | Paul J. Flory. Molecular Size Distribution in Three Dimensional Polymers. I. Gelation. Journal of the American Chemical Society, 63(11):3083–3090, 1941. URL: http://dx.doi.org/10.1021/ja01856a061, doi:10.1021/ja01856a061. |
[HEG14] | Allen Hunt, Robert Ewing, and Behzad Ghanbarian. Percolation Theory for Flow in Porous Media. volume 880 of Lecture Notes in Physics. Springer, Cham, Switzerland, third edition, 2014. URL: http://dx.doi.org/10.1007/978-3-319-03771-4, doi:10.1007/978-3-319-03771-4. |
[NZ01] | M. E. J. Newman and R. M. Ziff. Fast monte carlo algorithm for site or bond percolation. Physical Review E, 64(1):016706+, 2001. URL: http://dx.doi.org/10.1103/physreve.64.016706, arXiv:cond-mat/0101295, doi:10.1103/physreve.64.016706. |
[Pei14] | Tiago P. Peixoto. The graph-tool python library. figshare, 2014. URL: http://figshare.com/articles/graph_tool/1164194, doi:10.6084/m9.figshare.1164194. |
[Sta79] | D. Stauffer. Scaling theory of percolation clusters. Physics Reports, 54(1):1–74, 1979. URL: http://dx.doi.org/10.1016/0370-1573(79)90060-7, doi:10.1016/0370-1573(79)90060-7. |
[SA94] | Dietrich Stauffer and Amnon Aharony. Introduction to Percolation Theory. Taylor & Francis, London, second edition, 1994. URL: http://www.amazon.com/exec/obidos/redirect?tag=citeulike07-20&path=ASIN/0748402535. |
[Sto43] | Walter H. Stockmayer. Theory of Molecular Size Distribution and Gel Formation in Branched-Chain Polymers. The Journal of Chemical Physics, 11(2):45–55, 1943. URL: http://dx.doi.org/10.1063/1.1723803, doi:10.1063/1.1723803. |
[Was04] | Larry Wasserman. All of Statistics. Springer New York, 2004. URL: http://dx.doi.org/10.1007/978-0-387-21736-9, doi:10.1007/978-0-387-21736-9. |