More sophisticated data handling in PyNN
13th December 2011
In all versions of the PyNN API up to the current one, 0.7, recorded data, whether spikes or state variables such as the membrane potential, is handled as 2 x N NumPy arrays, where the second column contains the value (spike time, membrane potential) and the first column contains the ID of the neuron in which it was recorded. These arrays contain only the raw data, with none of the essential metadata for further analysis, such as the start and end time of the recording (needed for analysis of spike trains), the sampling interval, or the units, nor non-essential-but-useful metadata such as "which population did this data come from".
Also, in versions prior to 0.7, only spikes, membrane potential and synaptic conductances could be recorded, which was reflected in having specific methods (of the Population class) for each:
- record(), getSpikes() and printSpikes()
- record_v(), get_v() and print_v()
- record_gsyn(), get_gsyn() and print_gsyn()
In version 0.7, we added the ability to record any variable from a neuron, but there wasn't really any API support for it, just the semi-private method _record(variable) and the not-supposed-to-be-exposed population.recorders['variable'].get().
For version 0.8, we would like to make data handling both simpler and more powerful. First, the API will be simplified by reducing the number of Population methods to just record(variables), get_data() and write_data(). The old methods will be deprecated for 0.8 and will be removed in 0.9. Thus, we can replace
>>> p.record() # record spikes
>>> p.record_v() # record membrane potential
>>> p._record('foo') # record state variable `foo`
with
>>> p.record(['spikes', 'v', 'foo'])
and
>>> data_spikes = p.getSpikes() >>> data_v = p.get_v() >>> data_foo = p.recorders['foo'].get()
with
>>> data = p.get_data()
Since data now must contain the data for three different variables, we need something more than a single NumPy array. It would be possible to just return a tuple of three arrays, but since we also want to be able to include more metadata we propose to make use of the Neo package (http://neuralensemble.org/trac/neo), an attempt to develop a common Python data model for neurophysiology data (whether real or simulated), and return a Neo Block object. For more information on Neo, see the documentation at http://packages.python.org/neo. Here, it will suffice to note that a Block is the top-level container, and contains one or more Segments. Each Segment is a container for data sharing a common time basis, and can contain lists of AnalogSignal, AnalogSignalArray and SpikeTrain objects. These data objects inherit from NumPy array, and so can be treated in further processing (analysis, visualization, etc.) in exactly the same way as before, but in addition they carry metadata about units, sampling interval, etc.
Note that I am in general reluctant to add required dependencies to PyNN beyond the essential, like NumPy, because they risk adding to the difficulty of installation, and hence of putting people off before they've even started. Neo adds only one pure-Python dependency on top of NumPy - the Quantities package (http://pypi.python.org/pypi/quantities), and I feel that the small potential increase in installation difficulty is more than outweighed by the advantages that using Neo brings.
Let's have an example of using PyNN with the planned API version 0.8:
import pyNN.neuron as sim # can of course replace `neuron` with `nest`, `brian`, etc.
import matplotlib.pyplot as plt
import numpy as np
sim.setup(timestep=0.01)
p_in = sim.Population(10, sim.SpikeSourcePoisson, {'rate': 10.0}, label="input")
p_out = sim.Population(10, sim.EIF_cond_exp_isfa_ista, label="AdExp neurons")
random = sim.FixedProbabilityConnector(p_connect=0.5, weights=0.05)
connections = sim.Projection(p_in, p_out, method=random, target='excitatory')
p_in.record('spikes')
p_out.record('spikes') # record spikes from all neurons
p_out[0:2].record(['v', 'w', 'gsyn_exc']) # record other variables from first two neurons
sim.run(500.0)
spikes_in = p_in.get_data()
data_out = p_out.get_data()
fig_settings = {
'lines.linewidth': 0.5,
'axes.linewidth': 0.5,
'axes.labelsize': 'small',
'legend.fontsize': 'small',
'font.size': 8
}
plt.rcParams.update(fig_settings)
plt.figure(1, figsize=(6,8))
def plot_spiketrains(segment):
for spiketrain in segment.spiketrains:
y = np.ones_like(spiketrain) * spiketrain.annotations['source_id']
plt.plot(spiketrain, y, '.')
plt.ylabel(segment.name)
plt.setp(plt.gca().get_xticklabels(), visible=False)
def plot_signal(signal, index, colour='b'):
label = "Neuron %d" % signal.annotations['source_ids'][index]
plt.plot(signal.times, signal[:, index], colour, label=label)
plt.ylabel("%s (%s)" % (signal.name, signal.units._dimensionality.string))
plt.setp(plt.gca().get_xticklabels(), visible=False)
plt.legend()
n_panels = sum(a.shape[1] for a in data_out.segments[0].analogsignalarrays) + 2
plt.subplot(n_panels, 1, 1)
plot_spiketrains(spikes_in.segments[0])
plt.subplot(n_panels, 1, 2)
plot_spiketrains(data_out.segments[0])
panel = 3
for array in data_out.segments[0].analogsignalarrays:
for i in range(array.shape[1]):
plt.subplot(n_panels, 1, panel)
plot_signal(array, i, colour='bg'[panel%2])
panel += 1
plt.xlabel("time (%s)" % array.times.units._dimensionality.string)
plt.setp(plt.gca().get_xticklabels(), visible=True)
plt.savefig("neo_example.png")
Here we can see that a segment contains a list of spiketrains and a list of signal arrays recorded by PyNN. Each SpikeTrain and AnalogSignalArray object has metadata, such as units, variable name, which allows for automated axis labelling.
The adoption of Neo as an output representation also makes it easier to handle data when running multiple simulations with the same network, calling reset() between each run. Before, it was necessary to call getSpikes(), get_v() before every reset(), and take care of storing the resulting data. Now, each run just creates a new Neo Segment, and PyNN takes care of storing the data until it is needed. This is illustrated in the example below. [Note that if you still want to retrieve the data after every run you can do so: just call get_data(clear=True)]
import pyNN.nest as sim # can of course replace `nest` with `neuron`, `brian`, etc.
import matplotlib.pyplot as plt
from quantities import nA
sim.setup()
cell = sim.Population(1, sim.HH_cond_exp)
step_current = sim.DCSource(start=20.0, stop=80.0)
step_current.inject_into(cell)
cell.record('v')
for amp in (-0.2, -0.1, 0.0, 0.1, 0.2):
step_current.amplitude = amp
sim.run(100.0)
sim.reset(annotations={"amplitude": amp*nA})
data = cell.get_data()
sim.end()
for segment in data.segments:
vm = segment.analogsignalarrays[0]
plt.plot(vm.times, vm,
label=str(segment.annotations["amplitude"]))
plt.legend(loc="upper left")
plt.xlabel("Time (%s)" % vm.times.units._dimensionality)
plt.ylabel("Membrane potential (%s)" % vm.units._dimensionality)
plt.savefig("reset_example.png")
This change to data handling could potentially have a large impact on existing workflows. For this reason, I would really like to have people test it in real world scenarios before moving this functionality to trunk. For now, it is in a Subversion branch at https://neuralensemble.org/svn/PyNN/branches/neo_output. Please give it a try and let me know how it works for you. [Note: at the moment, it is only implemented for the neuron and nest backends, and does not yet work with MPI (gather functionality not implemented)].
Attachments
-
neo_example.png
(80.3 KB) - added by apdavison
5 months ago.
-
reset_example_nest.png
(52.4 KB) - added by apdavison
5 months ago.


