
YSPH Biostatistics Seminar: “Motif Expansion of Global Information Flow in Spike Train Data”

September 15, 2022
  • 00:00<v Robert>Good afternoon.</v>
  • 00:01In respect for everybody's time today,
  • 00:04let's go ahead and get started.
  • 00:07So today it is my pleasure to introduce,
  • 00:09Dr. Alexander Strang.
  • 00:12Dr. Strang earned his bachelor's in Mathematics and Physics,
  • 00:16as well as his PhD in Applied Mathematics
  • 00:19from Case Western Reserve University in Cleveland, Ohio.
  • 00:24Born in Ohio, so representing.
  • 00:27He studies variational inference problems,
  • 00:29noise propagation in biological networks,
  • 00:32self-organizing edge flows,
  • 00:34and functional form game theory
  • 00:36at the University of Chicago,
  • 00:38where he is a William H. Kruskal Instructor of Statistics,
  • 00:42and Applied Mathematics.
  • 00:43Today he is going to talk to us about motif expansion
  • 00:47of global information flow in spike train data.
  • 00:50Let's welcome our speaker.
  • 00:54<v ->Okay, thank you very much.</v>
  • 00:56Thank you first for the kind invite,
  • 00:59and for the opportunity to speak here in your seminar.
  • 01:03So I'd like to start with some acknowledgements.
  • 01:06This is very much a work in progress.
  • 01:09Part of what I'm going to be showing you today
  • 01:11is really the work of a master's student
  • 01:12that I've been working with this summer, that's Bowen.
  • 01:15And really I'd like to thank Bowen
  • 01:16for a lot of the simulation
  • 01:18and a lot of the TE calculation I'll show you later.
  • 01:21This project more generally was born
  • 01:22out of conversations with Brent Doiron
  • 01:24and Lek-Heng Lim here at Chicago.
  • 01:27Brent really was the inspiration for
  • 01:30starting to venture into computational neuroscience.
  • 01:33I'll really say that I am new to this world,
  • 01:35it's a world that's exciting to me,
  • 01:37but really is a world that I am actively exploring
  • 01:41and learning about.
  • 01:42So I look forward to conversations afterwards
  • 01:44to learn more here.
  • 01:46My background was much more inspired by Lek-Heng's work
  • 01:49in computational topology.
  • 01:52And some of what I'll be presenting today
  • 01:54is really inspired by conversations with him.
  • 01:58So let's start with some introduction and motivation.
  • 02:01The motivation personally for this talk,
  • 02:05so it goes back really to work that I started
  • 02:06when I was a graduate student,
  • 02:08I've had this sort of long standing interest
  • 02:10in the interplay between structure and dynamics,
  • 02:12in particular on networks.
  • 02:14I've worked on interesting questions like,
  • 02:16how does the structure of a network determine dynamics
  • 02:18of processes on that network?
  • 02:21And in turn, how do processes on that network
  • 02:24give rise to structure?
  • 02:30On the biological side,
  • 02:32today's talk I'm going to be focusing on
  • 02:34sort of applications of this question
  • 02:36within neural networks.
  • 02:38And I think that this sort of world of
  • 02:39computational neuroscience is really exciting
  • 02:41if you're interested in this interplay
  • 02:42between structure and dynamics
  • 02:44because neural networks encode, transmit
  • 02:46and process information via dynamical processes.
  • 02:49For example, the dynamical process
  • 02:53of a neural network is directed by the wiring patterns,
  • 02:56by the structure of that network.
  • 02:58And moreover, if you're talking
  • 03:00about some sort of learning process,
  • 03:01then those wiring patterns can change
  • 03:03and adapt during the learning process,
  • 03:05so they are themselves dynamic.
  • 03:08In this area I've been interested in questions like,
  • 03:10how is the flow of information governed
  • 03:12by the wiring pattern?
  • 03:14How do patterns of information flow
  • 03:16present themselves in data?
  • 03:17And can they be inferred from that data?
  • 03:19And what types of wiring patterns
  • 03:21might develop during learning?
  • 03:24Answering questions of this type requires
  • 03:26a couple of things.
  • 03:26Sort of very big picture, it requires a language
  • 03:29for describing structures and patterns.
  • 03:31It requires having a dynamical process,
  • 03:33some sort of model for the neural net,
  • 03:35and it requires a generating model
  • 03:38that generates initial structure
  • 03:40and links the structure to dynamics.
  • 03:42Alternatively, if we don't generate things using a model,
  • 03:45if we have some sort of observable or data,
  • 03:47then we can try to work in the other direction
  • 03:49and go from dynamics to structure.
  • 03:52Today during this talk,
  • 03:53I'm gonna be focusing really on this first piece,
  • 03:55a language for describing structures and patterns.
  • 03:57And on the second piece on sort of an observable
  • 04:00that I've been working on trying to compute to use,
  • 04:04to try to connect these three points together.
  • 04:08So to get started, a little bit of biology.
  • 04:10Really I was inspired in this project
  • 04:12by a paper from K.G. Mura.
  • 04:14Here he was looking at a coupled oscillator model,
  • 04:17this is a Kuramoto model for neural activity
  • 04:20where the firing dynamics interact with the wiring.
  • 04:22So the wiring that couples the oscillators
  • 04:25would adapt on a slower time scale
  • 04:29than the oscillators themselves.
  • 04:31And that adaptation could represent
  • 04:34different types of learning processes.
  • 04:36For example, the fire-together wire-together rules
  • 04:39or Hebbian learning,
  • 04:41you can look at causal learning rules,
  • 04:43or anti-Hebbian learning rules.
  • 04:45And this is just an example I've run of this system.
  • 04:48This system of ODEs is sort of interesting
  • 04:50because it can generate all sorts of different patterns.
  • 04:52You can see synchronized firing,
  • 04:54you can see traveling waves,
  • 04:55you can see chaos,
  • 04:57these occur at different sort of critical boundaries.
  • 04:59So you can see phase transitions
  • 05:01when you have large collections of these oscillators.
  • 05:04And depending on how they're coupled together,
  • 05:05it behaves differently.
  • 05:07In particular some of what's interesting here is
  • 05:09that starting from some random seed topology,
  • 05:13the dynamics play forward from that initial condition,
  • 05:16and that random seed topology produces some ensemble of
  • 05:19wiring patterns that are of themselves random.
  • 05:22You can think about the ensemble of wiring patterns
  • 05:24as being chaotic,
  • 05:25sort of realizations of some random initialization.
  • 05:29That said, you can also observe structures
  • 05:32within the systems of coupled oscillators.
  • 05:33So you could see large scale cyclic structures
  • 05:36representing organized causal firing patterns
  • 05:38in certain regimes.
  • 05:40So this is a nice example where graph structure
  • 05:42and learning parameters can determine dynamics,
  • 05:44and in turn where those dynamics can determine structure.
  • 05:48On the other side, you can also think
  • 05:49about a data-driven side instead of a model-driven side.
  • 05:53If we empirically observe sample trajectories
  • 05:56of some observables, for example, neuron recordings,
  • 05:58then we might hope to infer something
  • 05:59about the connectivity that generates them.
  • 06:01And so here instead of starting by posing a model,
  • 06:04and then simulating it and studying how it behaves,
  • 06:06we can instead study data or try to study structure in data.
  • 06:10Often that data comes in the form of covariance matrices
  • 06:12representing firing rates.
  • 06:14And these covariance matrices may be auto-covariance matrices
  • 06:17with some sort of time lag.
  • 06:19Here there are a couple of standard structural approaches,
  • 06:22so there is a motif expansion approach.
  • 06:25This was at least introduced by Brent Doiron's lab
  • 06:28with his student, Gay Walker.
  • 06:30Here the idea is that you define some graph motifs,
  • 06:34and then you can study the dynamics
  • 06:36in terms of those graph motifs.
  • 06:38For example, if you build a power series in those motifs,
  • 06:41then you can try to represent your covariance matrices
  • 06:44in terms of that power series.
  • 06:45And this is something we're gonna talk
  • 06:46about quite a bit today.
  • 06:47This is really part of why I was inspired by this work is,
  • 06:49I had been working separately on the idea of
  • 06:51looking at covariance matrices
  • 06:53in terms of these power series expansions.
  • 06:56This is also connected to topological data analysis,
  • 06:59and this is where the conversations with Lek-Heng
  • 07:01played a role in this work.
  • 07:03Topological data analysis aims to construct graphs,
  • 07:07representing dynamical sort of systems.
  • 07:08For example, you might look at the dynamical similarity
  • 07:11of firing patterns of certain neurons,
  • 07:13and then tries to study the topology of those graphs.
  • 07:18Again, this sort of leads to similar questions,
  • 07:20but we can be a little bit more precise here
  • 07:21for thinking in neuroscience.
  • 07:24We can say more precisely, for example,
  • 07:25how is information processing and transfer represented,
  • 07:29both in these covariance matrices and the structures
  • 07:32that we hope to extract from them.
  • 07:33In particular, can we try
  • 07:35and infer causality from firing patterns?
  • 07:39And this is fundamentally an information theoretic question.
  • 07:42Really we're asking, can we study
  • 07:43the directed exchange of information from trajectories?
  • 07:47Here one approach, I mean, in some sense
  • 07:49you can never tell causality without some underlying model,
  • 07:53without some underlying understanding of the mechanism.
  • 07:56So if all we can do is observe,
  • 07:58then we need to define what we mean by causality.
  • 08:01A reasonable sort of standard definition here
  • 08:03is Wiener Causality,
  • 08:04which says that two time series share a causal relation,
  • 08:06so we say X causes Y,
  • 08:08if the history of X informs the future of Y.
  • 08:12Note that here "cause" put in quotes,
  • 08:14really means forecasts.
  • 08:15That means that the past or the present of X
  • 08:18carries relevant information about the future of Y.
  • 08:22A natural measure of that information is transfer entropy.
  • 08:26Transfer entropy was introduced by Schreiber in 2000,
  • 08:30and it's the expected KL divergence
  • 08:32between the distribution of the future of Y
  • 08:35given the history of X
  • 08:38and the marginal distribution of the future of Y.
  • 08:41So essentially it's how much predictive information
  • 08:43does X carry about Y?
  • 08:46This is a nice quantity for a couple of reasons.
  • 08:48First, it's zero when two trajectories are independent.
  • 08:51Second, since it's just defined in terms of
  • 08:53these conditional distributions, it's model free.
  • 08:56So I put here no with a star because this,
  • 08:59the generative assumptions actually do matter
  • 09:01when you go to try and compute it,
  • 09:02but in principle it's defined independent of the model.
  • 09:05Again, unlike some other effective causality measures,
  • 09:07it doesn't require introducing some time lag to define.
  • 09:11It's a naturally directed quantity, right?
  • 09:13We can say that transfer entropy is defined in terms of
  • 09:20the future of Y conditioned on the past of X and Y.
  • 09:23And that quantity is directed because reversing X and Y,
  • 09:27it does not sort of symmetrically change this statement.
  • 09:30This is different than quantities
  • 09:31like mutual information or correlation
  • 09:32that are also often used
  • 09:34to try to measure effective connectivity in networks,
  • 09:37which are fundamentally symmetric quantities.
  • 09:41Transfer entropy is also less corrupted
  • 09:43by measurement noise, linear mixing of signals,
  • 09:46or shared coupling to external sources.
  • 09:50Lastly, and maybe most interestingly,
  • 09:52if we think in terms of correlations,
  • 09:54correlations are really moments,
  • 09:56correlations are really about covariances, right?
  • 09:57Second order moments.
  • 09:59Transfer entropies, these are about entropies,
  • 10:01these are sort of logs of distributions,
  • 10:04and so they depend on the full shape of these distributions,
  • 10:06which means that transfer entropy can capture coupling
  • 10:10that is maybe not apparent or not obvious,
  • 10:13just looking at a second order moment type analysis.
  • 10:17So transfer entropy has been applied pretty broadly.
  • 10:20It's been applied to spiking cortical networks
  • 10:22and calcium imaging,
  • 10:24to MEG data in motor tasks and auditory discrimination.
  • 10:29It's been applied to motion recognition,
  • 10:31precious metal prices
  • 10:32and multivariate time series forecasting,
  • 10:34and more recently to accelerate learning
  • 10:36in different artificial neural nets.
  • 10:38So you can look at feedforward architectures,
  • 10:40convolution architectures, even recurrent neural nets,
  • 10:42and transfer entropy has been used
  • 10:44to accelerate learning in those frameworks.
  • 10:49For this part of the talk,
  • 10:50I'd like to focus really on two questions.
  • 10:52First, how do we compute transfer entropy?
  • 10:55And then second, if we could compute transfer entropy
  • 10:58and build a graph out of that,
  • 11:00how would we study the structure of that graph?
  • 11:01Essentially, how is information flow structured?
  • 11:05We'll start with computing transfer entropy.
  • 11:09To compute transfer entropy,
  • 11:10we actually need to write down an equation.
  • 11:13So transfer entropy was originally introduced
  • 11:14for discrete time arbitrary order Markov processes.
  • 11:18So suppose we have two Markov processes: X and Y,
  • 11:21and we'll let X of N denote the state of
  • 11:23process X at time N,
  • 11:25and XNK where the K is in superscript to denote the sequence
  • 11:29starting from N minus K plus one going up to N.
  • 11:32So that's sort of the last K states
  • 11:35that the process X visited,
  • 11:37then the transfer entropy from Y to X,
  • 11:40denoted T, Y arrow to X,
  • 11:45is the entropy of the future of X conditioned on its past
  • 11:50minus the entropy of the future of X conditioned on its past
  • 11:54and the past of the trajectory Y.
  • 11:56So here you can think the transfer entropy
  • 11:57is essentially the reduction in entropy
  • 11:59of the future states of X
  • 12:00when incorporating the past of Y.
  • 12:03This means that computing transfer entropy
  • 12:05reduces to estimating essentially these entropies.
  • 12:07That means we need to be able to estimate
  • 12:08essentially the conditional distributions
  • 12:10inside of these parentheses.
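For reference, the definition just described can be written compactly; the notation follows the verbal description above, with X_n^(k) denoting the last k states of X, and this is a restatement rather than a copy of the slide:

```latex
T_{Y \to X} \;=\; H\!\left(X_{n+1} \mid X_n^{(k)}\right) \;-\; H\!\left(X_{n+1} \mid X_n^{(k)},\, Y_n^{(l)}\right),
\qquad X_n^{(k)} = \left(X_{n-k+1}, \ldots, X_n\right).
```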
  • 12:14That's easy for certain processes.
  • 12:15So for example, if X and Y are Gaussian processes,
  • 12:19then really what we're trying to compute
  • 12:20is conditional mutual information.
  • 12:22And there are nice equations
  • 12:23for conditional mutual information
  • 12:25when you have Gaussian random variables.
  • 12:26So if I have three Gaussian random variables: X, Y, Z,
  • 12:29possibly multivariate with joint covariance sigma,
  • 12:33then the conditional mutual information
  • 12:35between these variables,
  • 12:36so the mutual information between X and Y conditioned on Z
  • 12:39is just given by this ratio of log determinants
  • 12:42of those covariances.
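One standard way to write the Gaussian conditional mutual information being referred to, in terms of log determinants of the joint covariance blocks (here Sigma_XZ is the joint covariance of (X, Z), and so on; this is reconstructed from standard Gaussian identities rather than transcribed from the slide):

```latex
I(X; Y \mid Z) \;=\; \frac{1}{2}\,
\log \frac{\det \Sigma_{XZ}\,\det \Sigma_{YZ}}{\det \Sigma_{Z}\,\det \Sigma_{XYZ}}.
```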
  • 12:45In particular, a common test model used
  • 12:48in sort of the transfer entropy literature
  • 12:51are linear auto-regressive processes
  • 12:53because a linear auto-regressive process
  • 12:55when perturbed by Gaussian noise
  • 12:57produces a Gaussian process.
  • 12:58All of the different
  • 12:59joint marginal conditional distributions are all Gaussian,
  • 13:02which means that we can compute
  • 13:03these covariances analytically,
  • 13:05which then means that you can compute
  • 13:06the transfer entropy analytically.
  • 13:07So these linear auto-regressive processes
  • 13:09are nice test cases
  • 13:10because you can do everything analytically.
  • 13:12They're also somewhat disappointing or somewhat limiting
  • 13:15because in this linear auto-regressive case,
  • 13:17transfer entropy is the same as Granger causality.
  • 13:22And in this Gaussian case,
  • 13:24essentially what we've done is we've reduced
  • 13:26transfer entropy to a study of time-lagged correlations,
  • 13:29so this becomes the same
  • 13:30as sort of a correlation based analysis,
  • 13:32we can't incorporate information beyond the second moments,
  • 13:34if we restrict ourselves to Gaussian processes
  • 13:36or Gaussian approximations.
  • 13:39The other thing to note is this is strongly model-dependent
  • 13:41because this particular formula
  • 13:43for computing mutual information
  • 13:44depends on having Gaussian distributions.
  • 13:50In a more general setting or a more empirical setting,
  • 13:53you might observe some data.
  • 13:55You don't know if that data
  • 13:56comes from some particular process,
  • 13:58so you can't necessarily assume
  • 13:59that conditional distributions are Gaussian,
  • 14:01but we would still like to estimate transfer entropy,
  • 14:03which leads to the problem of estimating transfer entropy
  • 14:06given an observed time series.
  • 14:08We would like to do this again, sans some model assumption,
  • 14:10so we don't wanna assume Gaussianity.
  • 14:13This is sort of trivial,
  • 14:14again, I star that in discrete state spaces
  • 14:17because essentially it amounts to counting occurrences,
  • 14:20but it becomes difficult whenever the state spaces are large
  • 14:23and/or high dimensional as they often are.
  • 14:26This leads to a couple of different approaches.
  • 14:28So as a first example, let's consider spike train data.
  • 14:32So spike train data consists essentially of
  • 14:35binning the state of a neuron into either on or off.
  • 14:39So neurons, you can think either in the state zero or one,
  • 14:41and then a pairwise calculation for transfer entropy
  • 14:44only requires estimating a joint probability distribution
  • 14:48over four to the K plus L states,
  • 14:51where K is the length of the history of X that we remember,
  • 14:54and L is the length of the history of Y.
  • 14:57So if sort of the Markov process
  • 14:59generating the spike train data is not of high order,
  • 15:02does not have a long memory,
  • 15:04then these K and L can be small,
  • 15:06and this state space is fairly small,
  • 15:08so this falls into that first category
  • 15:10when we're looking at a discrete state space,
  • 15:12and it's not too difficult.
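As a concrete illustration of the counting approach for binary spike trains, here is a minimal plug-in estimator sketch in Python. It is not the toolkit implementation discussed below, and the history lengths and the toy driving example at the end are illustrative assumptions.

```python
import numpy as np
from collections import Counter
from math import log2

def plug_in_te(x, y, k=1, l=1):
    """Plug-in estimate of the transfer entropy T_{Y->X} (in bits) for binary
    spike trains x and y, using histories of length k (for x) and l (for y)."""
    n = len(x)
    m = max(k, l)
    joint = Counter()                                   # counts of (x_next, x_hist, y_hist)
    for t in range(m, n - 1):
        x_hist = tuple(x[t - k + 1:t + 1])
        y_hist = tuple(y[t - l + 1:t + 1])
        joint[(x[t + 1], x_hist, y_hist)] += 1
    total = sum(joint.values())
    # marginal counts needed for the two conditional probabilities
    c_hists = Counter(); c_next_xh = Counter(); c_xh = Counter()
    for (xn, xh, yh), c in joint.items():
        c_hists[(xh, yh)] += c
        c_next_xh[(xn, xh)] += c
        c_xh[xh] += c
    te = 0.0
    for (xn, xh, yh), c in joint.items():
        p_joint = c / total
        p_full = c / c_hists[(xh, yh)]                  # p(x_next | x_hist, y_hist)
        p_self = c_next_xh[(xn, xh)] / c_xh[xh]         # p(x_next | x_hist)
        te += p_joint * log2(p_full / p_self)
    return te

# toy check: x copies y's previous state (with some noise), so T_{Y->X} should far exceed T_{X->Y}
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 20000)
x = np.roll(y, 1) ^ (rng.random(20000) < 0.1)
print(plug_in_te(x, y), plug_in_te(y, x))
```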
  • 15:15In a more general setting,
  • 15:16if we don't try to bin the states
  • 15:18of the neurons to on or off,
  • 15:19for example, maybe we're looking at a firing rate model
  • 15:22where we wanna look at the firing rates of the neurons,
  • 15:24and that's a continuous random variable,
  • 15:27then we need some other types of estimators.
  • 15:29So the common estimator used here
  • 15:31is a kernel density estimator, a KSG estimator,
  • 15:34and this is designed for large continuous
  • 15:36or high dimensional state spaces,
  • 15:37e.g. sort of these firing rate models.
  • 15:40Typically the approach is to employ a Takens delay map,
  • 15:43which embeds your high dimensional data
  • 15:45in some sort of lower dimensional space
  • 15:48that tries to capture the intrinsic dimension
  • 15:50of the attractor that your dynamic process settles onto.
  • 15:55And then you try to estimate an unknown density
  • 15:57based on this delay map using a k-nearest
  • 15:59neighbor kernel density estimate.
  • 16:01The advantage of this sort of
  • 16:03k-nearest neighbor kernel density is
  • 16:05that it dynamically adapts the width of the kernel,
  • 16:07given your sample density.
  • 16:09And this has been implemented in some open source toolkits,
  • 16:11these are the toolkits that we've been working with.
  • 16:15So we've tested this in a couple of different models,
  • 16:18and really I'd say this work,
  • 16:19this is still very much work in progress,
  • 16:20this is work that Bowen was developing over the summer.
  • 16:23And so we developed a couple different models to test.
  • 16:26The first were these Linear Auto-Regressive Networks,
  • 16:29and we just used these to test
  • 16:31the accuracy of the estimators
  • 16:32because everything here is Gaussian,
  • 16:33so you can compute the necessary
  • 16:35transfer entropies analytically.
  • 16:37The next more interesting class of networks
  • 16:39are Threshold Linear Networks or TLNs.
  • 16:42These are firing rate models where your rate R
  • 16:44obeys this stochastic differential equation.
  • 16:47So the rate of change, dR of t, is:
  • 16:51you have sort of a leak term, negative R of t,
  • 16:53and then plus here, this is essentially a coupling.
  • 16:57All of this is inside here, the brackets with a plus,
  • 17:00this is like a ReLU function,
  • 17:02so this is just taking the positive part
  • 17:04of what's on the inside.
  • 17:05Here B is an activation threshold,
  • 17:08W is a wiring matrix,
  • 17:09and then R are those rates, again.
  • 17:11And then C here, that's essentially the covariance
  • 17:13for some noise term perturbing this process.
  • 17:17We use these TLNs to test the sensitivity
  • 17:19of our transfer entropy estimators
  • 17:21to common and private noise sources as you change C,
  • 17:24as well as sort of how well the transfer entropy network
  • 17:26agrees with the wiring matrix.
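To make the model concrete, here is a small Euler-Maruyama sketch in Python of the threshold-linear dynamics just described (a leak term plus a rectified coupling, with threshold b, wiring W, and noise covariance C). The integration scheme, parameter values, and the three-unit example are illustrative assumptions, not values from the talk.

```python
import numpy as np

def simulate_tln(W, b, C, r0, dt=1e-3, steps=50_000, seed=0):
    """Euler-Maruyama integration of a threshold-linear rate model:
        dR = (-R + [W R + b]_+) dt + dXi,   with Cov(dXi) = C dt."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(C + 1e-12 * np.eye(len(r0)))     # mixing matrix for the noise
    R = np.empty((steps, len(r0)))
    r = np.array(r0, dtype=float)
    for t in range(steps):
        drift = -r + np.maximum(W @ r + b, 0.0)             # leak term plus ReLU of the coupling
        r = r + drift * dt + L @ rng.normal(size=len(r)) * np.sqrt(dt)
        R[t] = r
    return R

# three units with a weak directed cycle; a diagonal C gives private noise,
# off-diagonal entries in C would model common noise sources
W = np.array([[0.0, 0.6, 0.0],
              [0.0, 0.0, 0.6],
              [0.6, 0.0, 0.0]])
rates = simulate_tln(W, b=np.ones(3), C=0.05 * np.eye(3), r0=np.zeros(3))
print(rates.mean(axis=0))
```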
  • 17:31A particular class of TLNs
  • 17:32that was really nice for these experiments
  • 17:35is what are called Combinatorial Threshold Linear Networks.
  • 17:37These are really pretty new,
  • 17:38these were introduced by Carina Curto's lab this year,
  • 17:42and really this was inspired by a talk
  • 17:45I'd seen her give at FACM in May.
  • 17:49These are threshold linear networks
  • 17:51where the weight matrix here, W,
  • 17:52representing the wiring of the neurons
  • 17:55is determined by a directed graph G.
  • 17:58So you start with some directed graph G,
  • 18:00that's what's shown here on the left.
  • 18:01This figure is adapted from Carina's paper,
  • 18:03this is a very nice paper
  • 18:04if you'd like to take a look at it.
  • 18:07And if I and J are not connected,
  • 18:10then the weight matrix is assigned one value;
  • 18:12and if they are connected, then it's assigned another value,
  • 18:14and the wiring is zero if I equals J.
  • 18:18These networks are nice
  • 18:20if we wanna test structural hypotheses
  • 18:22because it's very easy to predict from the input graph
  • 18:25how the output dynamics of the network should behave,
  • 18:28and there is really beautiful analysis
  • 18:30that Carina does in this paper to show
  • 18:32that you can produce all these different
  • 18:33interlocking patterns of limit cycles
  • 18:35and multistable states,
  • 18:36and chaos, and all these nice patterns,
  • 18:38and you can design them by picking these nice
  • 18:41sort of directed graphs.
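For concreteness, here is a sketch of how such a weight matrix can be built from a directed graph. The specific values assigned to edge versus non-edge entries (the eps and delta below) and the convention that G_adj[i, j] = 1 means j points to i are the parameterization recalled from Curto's papers; treat them as assumptions rather than a quotation of the talk.

```python
import numpy as np

def ctln_weights(G_adj, eps=0.25, delta=0.5):
    """Weight matrix of a combinatorial threshold-linear network:
    one value on edges of the directed graph, another on non-edges, zero on the diagonal.
    (Values follow the parameterization recalled from Curto et al.; treat as an assumption.)"""
    W = np.where(G_adj > 0, -1.0 + eps, -1.0 - delta)
    np.fill_diagonal(W, 0.0)
    return W

# 3-cycle 0 -> 1 -> 2 -> 0; rows index targets, columns index sources
G = np.zeros((3, 3))
for src, dst in [(0, 1), (1, 2), (2, 0)]:
    G[dst, src] = 1
print(ctln_weights(G))
```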
  • 18:44The last class of networks that we've built to test
  • 18:46are Leaky-Integrate and Fire Networks.
  • 18:48So here we're using a Leaky-Integrate and Fire model
  • 18:51where our wiring matrix, W, is drawn randomly.
  • 18:54It's block stochastic, which means
  • 18:57that it's (indistinct) between blocks.
  • 19:00And it's a balanced network,
  • 19:02so we have excitatory and inhibitory neurons
  • 19:04that talk to each other,
  • 19:06and maintain a sort of a balance in the dynamics here.
  • 19:09The hope is to pick a large enough scale network
  • 19:11that we see properly chaotic dynamics
  • 19:13using this Leaky-Integrate and Fire model.
  • 19:17These tests have yielded fairly mixed results,
  • 19:21so the simple tests behave sort of as expected.
  • 19:24So the estimators that are used are biased,
  • 19:27and the bias typically decays slower
  • 19:29than the variance in estimation,
  • 19:30which means that you do need fairly long trajectories
  • 19:32to try to properly estimate the transfer entropy.
  • 19:36That said, transfer entropy does correctly identify
  • 19:38causal relationships in simple graphs,
  • 19:40and transfer entropy matches the underlying structure
  • 19:44used in a Combinatorial Threshold Linear Network, or CTLN.
  • 19:49Unfortunately, these results did not carry over as cleanly
  • 19:52to the Leaky-Integrate and Fire models
  • 19:54or to sort of larger models.
  • 19:56So what I'm showing you on the right here,
  • 19:58this is a matrix where we've calculated
  • 20:00the pairwise transfer entropy between all neurons
  • 20:03in a 150 neuron balanced network.
  • 20:06This is shown in absolute value, on a log scale.
  • 20:09And the main thing I wanna highlight,
  • 20:11taking a look at this matrix,
  • 20:12is that it's very hard to see exactly what the structure is.
  • 20:15You see this banding,
  • 20:17that's because neurons tend to be highly predictive
  • 20:20if they fire a lot.
  • 20:21So there's a strong correlation
  • 20:22between the transfer entropy, between X and Y,
  • 20:25and just the activity level of X,
  • 20:29but it's hard to distinguish block-wise differences,
  • 20:31for example, between inhibitory neurons
  • 20:33and excitatory neurons, and that really takes plotting out.
  • 20:36So here this box in a whisker plot on the bottom,
  • 20:39this is showing you if we group entries of this matrix
  • 20:43by the type of connection,
  • 20:44so maybe excitatory to excitatory,
  • 20:46or inhibitory to excitatory, or so on,
  • 20:48that the distributions of realized transfer entropy
  • 20:50really are different,
  • 20:52but they're different in sort of subtle ways.
  • 20:54So in this sort of larger scale balanced network,
  • 20:58it's much less clear whether transfer entropy
  • 21:02can effectively be equated in some way
  • 21:05with the true connectivity or wiring.
  • 21:09In some ways, this is not a surprise
  • 21:10because the behavior of the balanced networks
  • 21:12is inherently balanced,
  • 21:13and (indistinct) inherently unstructured,
  • 21:16but there are ways in which these experiments
  • 21:18have sort of revealed confounding factors
  • 21:20that are conceptual factors
  • 21:22that make transfer entropy
  • 21:24not quite as ideal a measure,
  • 21:25or maybe not as ideal as it seemed
  • 21:28at the start of this talk.
  • 21:29So for example, suppose two trajectories:
  • 21:33X and Y are both strongly driven by a third trajectory, Z,
  • 21:36but X responds to Z first.
  • 21:39Well, then the present information about X
  • 21:40or the present state of X carries information
  • 21:42about the future of Y, so X is predictive of Y,
  • 21:45so X forecast Y.
  • 21:46So in the transfer entropy or Wiener causality setting,
  • 21:48we would say X causes Y,
  • 21:51even if X and Y are only both responding to Z.
  • 21:54So here in this example,
  • 21:56suppose you have a directed tree where information
  • 21:59or sort of dynamics propagate down the tree.
  • 22:02If you look at this node here, PJ and I,
  • 22:07PJ will react
  • 22:08to essentially information traveling down this tree
  • 22:12before I does, so PJ would be predictive for I.
  • 22:15So we would observe an effective connection
  • 22:19where PJ forecasts I,
  • 22:21which means that neurons that are not directly connected
  • 22:23may influence each other,
  • 22:24and that this transfer entropy
  • 22:26really you should think of in terms of forecasting,
  • 22:29not in terms of being a direct analog to the wiring matrix.
  • 22:33One way around this is to condition on the state
  • 22:35of the rest of the network
  • 22:37before you start doing some averaging.
  • 22:39This leads to some other notions of entropy.
  • 22:41So for example, causation entropy,
  • 22:42and this is sort of a promising direction,
  • 22:44but we haven't had time to explore it yet.
  • 22:47So that's the estimation side,
  • 22:49those are the tools for estimating transfer entropy.
  • 22:52Now let's switch gears
  • 22:53and talk about that second question I had introduced,
  • 22:55which is essentially, how do we analyze structure?
  • 22:57Suppose we could calculate a transfer entropy graph,
  • 23:00how would we extract structural information from that graph?
  • 23:04And here, I'm going to be introducing some tools
  • 23:06that I've worked on for awhile
  • 23:08for describing sort of random structures and graphs.
  • 23:11These are tied back to some work I'd really done
  • 23:15as a graduate student in conversations with Lek-Heng.
  • 23:18So we start in a really simple context,
  • 23:19which is the graph or network.
  • 23:21This could be directed or undirected,
  • 23:22however, we're gonna require that does not have self-loops,
  • 23:24then it's finite.
  • 23:26We'll let V here be the number of vertices
  • 23:28and E be the number of edges.
  • 23:30Then the object of study that we'll introduce
  • 23:33is something called an edge flow.
  • 23:34An edge flow is essentially a function
  • 23:35on the edges of the graph.
  • 23:37So this is a function that accepts pairs of endpoints
  • 23:40and returns a real number,
  • 23:42and this is an alternating function.
  • 23:43So if I had to take F of IJ,
  • 23:45that's negative F of JI
  • 23:47because you can think of F of IJ as being some flow,
  • 23:49like a flow of material between I and J,
  • 23:52hence this name, edge flow.
  • 23:54This is analogous to a vector field
  • 23:56because this is like the analogous construction
  • 23:58to a vector field in the graph,
  • 23:59and represents some sort of flow between nodes.
  • 24:02Edge flows are really sort of generic things,
  • 24:04so you can take this idea of an edge flow
  • 24:07and apply it in a lot of different areas
  • 24:09because really all you need is,
  • 24:10you just need to structure some alternating function
  • 24:12on the edges of the graph.
  • 24:13So I've sort of read papers
  • 24:16and worked in a bunch of these different areas,
  • 24:19particularly I've focused on applications of this
  • 24:21in game theory, in pairwise and social choice settings,
  • 24:25in biology and Markov chains.
  • 24:26And a lot of this project has been attempting
  • 24:28to take this experience working with edge flows in,
  • 24:31for example, say non-equilibrium thermodynamics
  • 24:34or looking at pairwise preference data,
  • 24:36and looking at a different application area
  • 24:38here to neuroscience.
  • 24:40Really, if you think about what the edge flow,
  • 24:42or a relevant edge flow, in neuroscience might be,
  • 24:43you might be asking about asymmetries in wiring patterns,
  • 24:46or differences in directed influence or causality,
  • 24:49or really you could think about these
  • 24:50transfer entropy quantities.
  • 24:51This is why I was excited about transfer entropy.
  • 24:53Transfer entropy is an inherently directed notion
  • 24:56of information flow,
  • 24:57so it's natural to think
  • 24:59that if you can calculate things like a transfer entropy,
  • 25:01then really what you're studying
  • 25:03is some sort of edge flow on a graph.
  • 25:06Edge flows often are subject to
  • 25:08sort of the same set of common questions.
  • 25:10So if I wanna analyze the structure of an edge flow,
  • 25:12there's some really big global questions
  • 25:14that I would often ask,
  • 25:15that get asked in all these different application areas.
  • 25:19One common question is,
  • 25:20well, does the flow originate somewhere and end somewhere?
  • 25:23Are there sources and sinks in the graph?
  • 25:25Another is, does it circulate?
  • 25:26And if it does circulate, on what scales and where?
  • 25:31If you have a network that's connected
  • 25:33to a whole exterior network,
  • 25:34for example, if you're looking at some small subsystem
  • 25:37that's embedded in a much larger system
  • 25:38as is almost always the case in neuroscience,
  • 25:41then you also need to think about,
  • 25:42what passes through the network?
  • 25:43So is there a flow or a current that moves
  • 25:46through the boundary of the network?
  • 25:47Is there information that flows through the network
  • 25:51that you're studying?
  • 25:52And in particular if we have these different types of flow,
  • 25:55if flow can originate and source and end in sinks,
  • 25:57if it can circulate, if it can pass through,
  • 25:59can we decompose the flow into pieces that do each of these,
  • 26:03and ask how much of the flow does one, two, or three?
  • 26:07Those questions lead to a decomposition.
  • 26:11So here we're going to start with this simple idea,
  • 26:13we're going to decompose an edge flow
  • 26:15by projecting it onto orthogonal subspaces
  • 26:17associated with some graph operators.
  • 26:20Generically if we consider two linear operators: A and B,
  • 26:24where the product A times B equals zero,
  • 26:27then the range of B must be contained
  • 26:29in the null space of A,
  • 26:31which means that I can express
  • 26:33essentially any set of real numbers.
  • 26:35So you can think of this as being the vector space
  • 26:38of possible edge flows as a direct sum of the range of B,
  • 26:43the range of A transpose
  • 26:45and the intersection of the null space of B transpose
  • 26:47in the null space of A.
  • 26:48This blue subspace, this is called the harmonic space,
  • 26:53and this is trivial
  • 26:56in many applications
  • 26:58if you choose A and B correctly.
  • 27:00So there's often settings where you can pick A and B,
  • 27:02so that these two null spaces have no intersection,
  • 27:06and then this decomposition boils down
  • 27:08to just separating a vector space
  • 27:10into the range of B and the range of A transpose.
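Written out, the generic decomposition just described is the following, with edge flows living in R^E and A, B as in the talk:

```latex
A B = 0 \;\Longrightarrow\; \operatorname{range}(B) \subseteq \operatorname{null}(A),
\qquad
\mathbb{R}^{E} = \operatorname{range}(B) \,\oplus\, \operatorname{range}(A^{\mathsf{T}}) \,\oplus\,
\bigl(\operatorname{null}(A) \cap \operatorname{null}(B^{\mathsf{T}})\bigr).
```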
  • 27:16In the graph setting,
  • 27:17our goal is essentially to pick these operators
  • 27:19to be meaningful things.
  • 27:20That is to pick graph operators,
  • 27:22so that these subspaces carry meaning
  • 27:26in the structural context.
  • 27:30So let's think a little bit about graph operators here,
  • 27:33so let's look at two different classes of operators.
  • 27:35So we can consider matrices that have E rows and N columns,
  • 27:40or matrices that have L rows and E columns where,
  • 27:44again, E is the number of edges in this graph.
  • 27:48If I have a matrix with E rows,
  • 27:50then each column of the matrix has as many entries
  • 27:53as there are edges in the graph,
  • 27:55so it can be thought of as itself an edge flow.
  • 27:57So you could think that this matrix is composed
  • 27:59of a set of columns where each column is some particular
  • 28:02sort of motivic flow or flow motif.
  • 28:05In contrast if I look at a matrix where I have E columns,
  • 28:09then each row of the matrix is a flow motif,
  • 28:11so products against M
  • 28:14evaluate inner products against specific flow motifs.
  • 28:18That means that in this context,
  • 28:20if I look at the range of this matrix,
  • 28:21this is really a linear combination
  • 28:23of a specific subset of flow motifs.
  • 28:25And in this context,
  • 28:26if I look at the null space of the matrix,
  • 28:28I'm looking at all edge flows orthogonal
  • 28:30to that set of flow motifs.
  • 28:32So here if I look at the range of a matrix with E rows,
  • 28:36that subspace is essentially modeling behavior
  • 28:39similar to the motifs.
  • 28:40So if I pick a set of motifs that flow out of a node
  • 28:44or flow into a node,
  • 28:45then this range is going to be a subspace of edge flows
  • 28:48that tend to originate in sources and end in sinks.
  • 28:51In contrast here, the null space of M,
  • 28:54that's all edge flows orthogonal to the flow motifs,
  • 28:57so it models behavior distinct from the motifs.
  • 28:59Essentially this space asks, what doesn't the flow do?
  • 29:02Whereas this space asks, what does the flow do?
  • 29:07Here is a simple, sort of very classical example.
  • 29:09And really this goes all the way back to,
  • 29:11you could think like Kirchhoff electric circuit theory.
  • 29:14We can define two operators.
  • 29:15Here G, this is essentially a gradient operator.
  • 29:18And if you've taken some graph theory,
  • 29:20you might know this as the edge incidence matrix.
  • 29:22This is a matrix which essentially records
  • 29:25the endpoints of an edge
  • 29:26and evaluates differences across it.
  • 29:29So, for example, if I look at this first row of G,
  • 29:33this corresponds to edge one in the graph,
  • 29:35and if I had a function defined on the nodes in the graph,
  • 29:39products with G would evaluate differences across this edge.
  • 29:43If you look at its columns,
  • 29:44each column here is a flow motif.
  • 29:46So, for example, this highlighted second column,
  • 29:49this is entries: one, negative one, zero, negative one.
  • 29:52If you carry those back to the edges,
  • 29:53that corresponds to this specific flow motif.
  • 29:56So here this gradient,
  • 29:58it's adjoint to essentially a divergence operator,
  • 30:00which means that the flow motifs are unit inflows
  • 30:03or unit outflows from specific nodes,
  • 30:05like what's shown here.
  • 30:07You can also introduce something like a curl operator.
  • 30:10The curl operator evaluates paths, sums around loops.
  • 30:13So this row here, for example, this is a flow motif
  • 30:16corresponding to the loop labeled A in this graph.
  • 30:20You could certainly imagine
  • 30:21other operators' built cutter, other motifs,
  • 30:23these operators are particularly nice
  • 30:25because they define principled subspaces.
  • 30:28So if we apply that generic decomposition,
  • 30:31then we could say that the space
  • 30:32of possible edge flows, R to the E,
  • 30:34can be decomposed into the range of the gradient operator,
  • 30:37the range of the curl transpose,
  • 30:39and the intersection of their null spaces
  • 30:42into this harmonic space.
  • 30:44This is nice because the range of the gradient, those are flows
  • 30:46that start and end somewhere.
  • 30:48Those are flows that are associated with
  • 30:50like motion down a potential.
  • 30:52So these if you're thinking physics,
  • 30:53you might say that these are sort of conservative,
  • 30:55these are like flows generated by a voltage
  • 30:57if you're looking at electric circuit.
  • 30:59These cyclic flows, well, these are the flows
  • 31:01in the range of the curl transpose,
  • 31:03and then this harmonic space,
  • 31:04those are flows that enter and leave the network
  • 31:06without either starting or ending
  • 31:09in a sink or a source, or circulating.
  • 31:11So you can think that really this decomposes
  • 31:13the space of edge flows into flows that start
  • 31:16and end somewhere inside the network.
  • 31:17Flows that circulate within the network,
  • 31:19and flows that do neither,
  • 31:20i.e. flows that enter and leave the network.
  • 31:22So this accomplishes that initial decomposition
  • 31:25I'd set out at the start.
  • 31:28Once we have this decomposition, then we can evaluate
  • 31:31the sizes of the components of decomposition to measure
  • 31:34how much of the flow starts and ends somewhere,
  • 31:38how much circulates and so on.
  • 31:39So we can introduce these generic measures
  • 31:41where, given some operator M,
  • 31:44we decompose the space of edge flows
  • 31:46into the range of M and the null space of M transpose,
  • 31:49which means we can project F onto these subspaces,
  • 31:52and then just evaluate the sizes of these components.
  • 31:55And that's a way of measuring
  • 31:57how much of the flow behaves like
  • 31:59the flow motifs contained in this operator,
  • 32:01and how much it doesn't.
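To make the projections concrete, here is a small self-contained sketch in Python. The toy graph, the hand-picked loop basis, and the random flow are illustrative choices, not examples from the talk.

```python
import numpy as np

# Toy graph: 4 nodes and 5 oriented edges.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
V, E = 4, len(edges)

# Gradient (edge-incidence) operator: (G phi)_e = phi_j - phi_i for edge e = (i, j).
G = np.zeros((E, V))
for e, (i, j) in enumerate(edges):
    G[e, i], G[e, j] = -1.0, 1.0

# Curl operator: one row per independent loop, entries +/-1 by edge orientation.
# Loops chosen by hand: 0->1->2->0 (edges 0, 1, and 4 reversed) and 0->2->3->0 (edges 4, 2, 3).
C = np.array([[1.0, 1.0, 0.0, 0.0, -1.0],
              [0.0, 0.0, 1.0, 1.0, 1.0]])
assert np.allclose(C @ G, 0.0)        # curl of a gradient vanishes, so the pieces are orthogonal

def project(M, f):
    """Orthogonal projection of f onto the column space of M."""
    coef, *_ = np.linalg.lstsq(M, f, rcond=None)
    return M @ coef

f = np.random.default_rng(1).normal(size=E)   # an arbitrary edge flow
f_grad = project(G, f)                        # starts and ends at sources/sinks
f_curl = project(C.T, f)                      # circulates around loops
f_harm = f - f_grad - f_curl                  # harmonic piece (essentially zero for this small graph)
print({name: round(float(np.linalg.norm(v) ** 2), 3)
       for name, v in [("gradient", f_grad), ("cyclic", f_curl), ("harmonic", f_harm)]})
```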
  • 32:04So, yeah, so that lets us answer this question,
  • 32:07and this is the tool that we're going to be using
  • 32:09sort of as our measurable.
  • 32:12Now that's totally easy to do,
  • 32:16if you're given a fixed edge flow and a fixed graph
  • 32:17because if you have fixed graph,
  • 32:18you can build your operators, you choose the motifs,
  • 32:20you have fixed edge flow, you just project the edge flow
  • 32:23onto the subspaces spanned by those operators,
  • 32:25and you're done.
  • 32:27However, there are many cases where it's worth thinking
  • 32:31about a distribution of edge flows,
  • 32:33and then expected structures given that distribution.
  • 32:37So here we're going to be considering random edge flows,
  • 32:39for example, in edge flow capital F,
  • 32:41here I'm using capital letters to denote random quantities
  • 32:43sampled from an edge flow distributions.
  • 32:45This is a distribution of possible edge flows.
  • 32:46And this is worth thinking about
  • 32:48because many generative models are stochastic.
  • 32:51They may involve some random seed,
  • 32:53or they may, for example, like that neural model
  • 32:55or a lot of these sort of neural models be chaotic.
  • 32:58So even if they are deterministic generative models,
  • 33:01the output data behaves
  • 33:03as it was sampled from the distribution.
  • 33:05On the empirical side, for example,
  • 33:07when we're estimating transfer entropy
  • 33:09or estimating some information flow,
  • 33:11then there's always some degree of measurement error
  • 33:13or uncertainty in that estimate,
  • 33:15which really means sort of from a Bayesian perspective,
  • 33:18we should be thinking that our estimator
  • 33:21is a point estimate drawn from some
  • 33:23posterior distribution of edge flows,
  • 33:24and then we're back in the setting where,
  • 33:25again, we need to talk about a distribution.
  • 33:28Lastly, this random edge flow setting is also
  • 33:31really important if we wanna compare to null hypotheses
  • 33:35because often if you want to compare
  • 33:37to some sort of null hypothesis,
  • 33:38it's helpful to have an ensemble of edge flows
  • 33:41to compare against, which means that we would like
  • 33:44to be able to talk about expected structure
  • 33:46under varying distributional assumptions.
  • 33:50If we can talk meaningfully about random edge flows,
  • 33:54then really what we can start doing is
  • 33:56we can start bridging the expected structure
  • 33:59back to the distribution.
  • 34:00So what we're looking for is a way of explaining
  • 34:03sort of generic expectations
  • 34:05of what the structure will look like
  • 34:07as we vary this distribution of edge flows.
  • 34:10You could think that a particular dynamical system
  • 34:13generates a wiring pattern,
  • 34:17that generates firing dynamics,
  • 34:19those firing dynamics determine
  • 34:21some sort of information flow graph.
  • 34:23And then that information flow graph
  • 34:25is really a sample from that generative model.
  • 34:28And we would like to be able to talk about,
  • 34:30what would we expect if we knew the distribution
  • 34:33of edge flows about the global structure?
  • 34:35That is, we'd like to bridge global structure
  • 34:37back to this distribution,
  • 34:39and then ideally you would bridge that distribution back
  • 34:41to the generative mechanism.
  • 34:42This is a project for a future work,
  • 34:45obviously this is fairly ambitious.
  • 34:47However, this first point is something that you can do
  • 34:51really in fairly explicit detail.
  • 34:53And that's what I'd like to spell out
  • 34:54with the end of this talk is
  • 34:55how do you bridge global structure
  • 34:58back to a distribution of edge flows?
  • 35:02So, yeah, so that's the main question,
  • 35:05how does the choice of distribution
  • 35:06influence the expected global flow structure?
  • 35:12So first, we start with the Lemma.
  • 35:15Suppose that we have a distribution of edge flows
  • 35:17with some expectation F bar and some covariance,
  • 35:20here I'm using double bar V to denote covariance.
  • 35:24We'll let S be a subspace
  • 35:29contained within the vector space of edge flows,
  • 35:31and we'll let P sub S be the orthogonal projector onto S.
  • 35:35Then for F sub S, the projection of F onto this subspace S,
  • 35:40the expectation of its norm squared
  • 35:43is the norm of the expected flow projected onto S squared.
  • 35:48So this first term is essentially
  • 35:53the measure evaluated at the expected sample.
  • 35:56And then plus a term that involves an inner product
  • 35:58between the projector onto the subspace,
  • 36:00and the covariance matrix for the edge flows.
  • 36:02Here this denotes the matrix inner product,
  • 36:04so this is just the sum over all IJ entries.
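Written out, the lemma as just stated reads as follows, with P_S the orthogonal projector onto S, F-bar the mean flow, and double-bar V the covariance:

```latex
\mathbb{E}\!\left[\lVert P_S F \rVert^2\right]
\;=\; \lVert P_S \bar{F} \rVert^2 \;+\; \langle P_S, \mathbb{V} \rangle,
\qquad
\langle A, B \rangle = \sum_{i,j} A_{ij} B_{ij} = \operatorname{tr}\!\left(A^{\mathsf{T}} B\right).
```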
  • 36:09What's nice about this formula
  • 36:10is at least in terms of expectation, it reduces the study
  • 36:16of the bridge between distribution
  • 36:18and network structure to a study of moments, right?
  • 36:22Because we've replaced the distributional problem here
  • 36:24with a linear algebra problem
  • 36:27that's posed in terms of this projector,
  • 36:29the projector onto the subspace S,
  • 36:31which is determined by the topology of the network,
  • 36:33and the variance in that edge flow
  • 36:36which is determined by your generative model.
  • 36:40Well, you might say, okay, well, (laughs) fine,
  • 36:42this is a matrix inner product, we can just stop here,
  • 36:44we could compute this projector,
  • 36:45we could sample a whole bunch of edge flows,
  • 36:47compute this covariance.
  • 36:48So you can do this matrix inner product,
  • 36:50but I sort of disagree
  • 36:51because I suspect that you can really do more
  • 36:55with this sort of inner product.
  • 36:57So I'd like to highlight some challenges
  • 37:00associated with this inner product.
  • 37:03So first, let's say, I asked you to design a distribution
  • 37:06with tunable global structure.
  • 37:07So for example, I said, I want you to pick
  • 37:09a generative model or design a distribution of edge flows
  • 37:12that when I sample edge flows from it,
  • 37:14their expected structures matched some expectation.
  • 37:18It's not obvious how to do that given this formula,
  • 37:22it's not obvious in particular
  • 37:23because these projectors,
  • 37:24like the projector on the subspace S typically depend
  • 37:27in fairly non-trivial ways on the graph topology.
  • 37:30So small changes in the graph topology
  • 37:32can completely change this projector.
  • 37:34In essence, it's hard to isolate topology from distribution.
  • 37:37You can think that this inner product,
  • 37:39if I think about it in terms of the IJ entries,
  • 37:43while easy to compute, it's not easy to interpret
  • 37:47because I and J are somewhat arbitrary indexing.
  • 37:49And obviously really the topology of the graph,
  • 37:51it's not encoded in the indexing,
  • 37:53that's encoded in the structure of these matrices.
  • 37:56So in some ways what we really need is a better basis
  • 37:59for computing this inner product.
  • 38:01In addition, computing this inner product
  • 38:03just may not be empirically feasible
  • 38:05because it might not be feasible
  • 38:07to estimate all these covariances.
  • 38:08There's lots of settings
  • 38:09where if you have a random edge flow,
  • 38:11it becomes very expensive to try to estimate
  • 38:13all the covariances in this matrix,
  • 38:16because this matrix has as many entries
  • 38:19as there are pairs of edges in the graph.
  • 38:22And typically that number of edges grows fairly quickly
  • 38:26in the number of nodes of the graph.
  • 38:27So in the worst case,
  • 38:29the size of these matrices
  • 38:31grows not as the square of the number of nodes of the graph,
  • 38:33but as the number of nodes of the graph to the fourth,
  • 38:35so this becomes very expensive very fast.
  • 38:37Again, we could try to address this problem
  • 38:41if we had a better basis for performing this inner product
  • 38:43because we might hope to be able to truncate
  • 38:46somewhere in that basis,
  • 38:47and use a lower dimensional representation.
  • 38:50So to build there, I'm gonna show you
  • 38:52a particular family of covariances.
  • 38:55We're going to start with a very simple generative model,
  • 38:58so let's suppose that each node of the graph
  • 39:00is assigned some set of attributes,
  • 39:02here a random vector X sampled from a...
  • 39:04So you can think of trait space,
  • 39:05a space of possible attributes,
  • 39:07and these are sampled i.i.d.
  • 39:09In addition, we'll assume
  • 39:10that there exists an alternating function F,
  • 39:13which accepts pairs of attributes and returns a real number.
  • 39:17So this is something that I can evaluate
  • 39:19on the endpoints of an edge,
  • 39:21and return an edge flow value.
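A minimal sketch of this trait-based generative model in Python; the trait distribution and the particular alternating function below are illustrative stand-ins, not the choices from the talk. Sampled flows have mean zero, and their empirical covariance is near zero exactly on pairs of edges that do not share an endpoint.

```python
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]      # a small toy graph
V = 4
rng = np.random.default_rng(2)

def sample_edge_flow():
    x = rng.normal(size=V)                             # i.i.d. node attributes ("traits")
    f = lambda xi, xj: xj - xi                         # an alternating function: f(xi, xj) = -f(xj, xi)
    return np.array([f(x[i], x[j]) for (i, j) in edges])

samples = np.array([sample_edge_flow() for _ in range(20000)])
print(np.round(np.cov(samples.T), 2))                  # correlations only on edges sharing an endpoint
```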
  • 39:24In this setting,
  • 39:26everything that I'd shown you before simplifies.
  • 39:29So if my edge flow F is drawn by first sampling
  • 39:33a set of attributes,
  • 39:34and then plugging those attributes
  • 39:35into functions on the edges, then the
  • 39:40mean edge flow is zero, so that F bar goes away,
  • 39:44and the covariance reduces to this form.
  • 39:46So you have a standard form where the covariance
  • 39:48in the edge flow is a function of two scalar quantities,
  • 39:52that's sigma squared and rho.
  • 39:53These are both statistics associated with this function
  • 39:56and the distribution of traits.
  • 39:59And then some matrices,
  • 40:00so we have an identity matrix,
  • 40:02and we have this gradient matrix showing up again.
  • 40:05This is really nice because when you plug it back in
  • 40:07to try to compute say the expected sizes of the components,
  • 40:13this matrix inner product
  • 40:15that I was complaining about before,
  • 40:17this whole matrix inner product simplifies.
  • 40:19So when you have a variance
  • 40:21that's in this nice, simple canonical form,
  • 40:23then the expected overall size of the edge flow,
  • 40:26that's just sigma squared,
  • 40:27the expected size projected onto that
  • 40:30sort of conservative subspace
  • 40:32that breaks into this combination
  • 40:35of the sigma squared and the rho.
  • 40:37Again, those are some simple statistics.
  • 40:39And then V, E, and L,
  • 40:41those are just sort of
  • 40:42essentially dimension counting on the network.
  • 40:43So this is the number of vertices, the number of edges,
  • 40:47and the number of loops,
  • 40:48where the number of loops is the number of edges
  • 40:49minus the number of vertices plus one.
  • 40:52And similarly, the expected cyclic size
  • 40:55or size of the cyclic component reduces to,
  • 40:57again, this sort of scalar factor
  • 40:59in terms of some simple statistics
  • 41:01and some dimension counting sort of
  • 41:03topology related quantities.
  • 41:07So this is very nice because this allows us
  • 41:11to really separate the role of topology
  • 41:13from the role of the generative model.
  • 41:14The generative model determines sigma and rho,
  • 41:17and topology determines these dimensions.
  • 41:22It turns out that the same thing is true,
  • 41:26even if you don't sample the edge flow
  • 41:29using this sort of trait approach,
  • 41:31but the graph is complete.
  • 41:33So if your graph is complete,
  • 41:34then no matter how you sample your edge flow,
  • 41:37for any edge flow distribution,
  • 41:38exactly the same formulas hold,
  • 41:40you just replace those simple statistics
  • 41:43with estimators for those statistics given your sample flow.
  • 41:47And this is sort of a striking result
  • 41:49because this says that this conclusion
  • 41:51that was linked to some specific generative model
  • 41:54with some very sort of specific assumptions, right,
  • 41:56we assumed it was i.i.d., extends to all complete graphs,
  • 41:59regardless of the actual distribution that we sampled from.
  • 42:05Up until this point,
  • 42:06this is kind of just an algebra miracle.
  • 42:09And one of the things I'd like to do
  • 42:10at the end of this talk is explain why this is true,
  • 42:13and show how to generalize these results.
  • 42:16So to build there,
  • 42:17let's emphasize some of the advantages of this.
  • 42:19So first, the advantages of this model,
  • 42:22it's mechanistically plausible in certain settings,
  • 42:24it cleanly separated the role of topology and distribution.
  • 42:28And these coefficients that had to do with the topology,
  • 42:30these are just dimensions,
  • 42:31these are non-negative quantities.
  • 42:34So it's easy to work out monotonic relationships
  • 42:36between expected structure
  • 42:38and simple statistics of the edge flow distribution.
  • 42:44The fact that you can do that enables more general analysis.
  • 42:47So what I'm showing you on the right here,
  • 42:48this is from a different application area.
  • 42:51This was an experiment where we trained
  • 42:53a set of agents to play a game using a genetic algorithm,
  • 42:58and then we looked at the expected sizes of sort of cyclic
  • 43:01and acyclic components in a tournament among those agents.
  • 43:05And you could actually predict these curves
  • 43:08using this sort of type of structural analysis
  • 43:10because it was possible to predict the dynamics
  • 43:13of the simple statistics, this sigma and this rho.
  • 43:17So this is a really powerful analytical tool,
  • 43:20but it is limited to this particular model.
  • 43:23In particular, it only models unstructured cycles.
  • 43:26So if you look at the cyclic component
  • 43:27generated by this model, it just looks like random noise
  • 43:30that's been projected onto the range of the curl transpose.
  • 43:34It's limited to correlations on adjacent edges,
  • 43:36so we only generate correlations on edges
  • 43:38that share an endpoint
  • 43:39because you could think that all of the original
  • 43:41random information comes from the endpoints.
  • 43:45And then it's in some ways not general enough,
  • 43:47so it lacks some expressivity.
  • 43:48We can't parameterize all possible expected structures
  • 43:51by picking a sigma and a rho.
  • 43:54And we lack some notion of sufficiency,
  • 43:56i.e. if the graph is not complete,
  • 43:58then this nice algebraic property
  • 44:01that it actually didn't matter what the distribution was,
  • 44:03this fails to hold.
  • 44:04So if the graph is not complete,
  • 44:06then projection onto the family of covariances
  • 44:09parameterized in this fashion
  • 44:11changes the expected global structure.
  • 44:15So we would like to address these limitations.
  • 44:17And so our goal for the next part of this talk
  • 44:19is to really generalize these results.
  • 44:21To generalize, we're going to
  • 44:23switch our perspective a little bit.
  • 44:25So I'll recall this formula
  • 44:27that if we generate our edge flow
  • 44:30by sampling quantities on the endpoints,
  • 44:32and then plugging them into functions on the edges,
  • 44:34then you necessarily get a covariance
  • 44:35that's in this two parameter family
  • 44:37where I have two scalar quantities
  • 44:39associated with the statistics of the edge flow.
  • 44:41That's the sigma and the rho.
  • 44:42And then I have some matrices that are associated
  • 44:44with the topology of the network
  • 44:45in the subspaces I'm projecting onto.
  • 44:48These are related to a different way
  • 44:51of looking at the graph.
  • 44:52So I can start with my original graph
  • 44:54and then I can convert it to an edge graph
  • 44:57where I have one node per edge in the graph,
  • 45:00and nodes are connected if they share an endpoint.
  • 45:04You can then assign essentially signs to these edges
  • 45:07based on whether the edge direction chosen
  • 45:11in the original graph is consistent
  • 45:12or inconsistent at the node that links the two edges.
  • 45:16So for example, edges one and two both point into this node,
  • 45:20so there's an edge that's linking
  • 45:21one and two in the edge graph with a positive sign.
  • 45:25This essentially tells you that the influence of
  • 45:29random information assigned on this node linking one and two
  • 45:33would positively correlate the sample edge flow
  • 45:36on edges one and two.
  • 45:38Then what this form
  • 45:41for covariance matrices says
  • 45:43is that we're looking at families of edge flows
  • 45:46that have correlations on edges sharing an endpoint.
  • 45:49So edges at distance one in this edge graph,
  • 45:51and non-adjacent edges are
  • 45:52entirely independent of each other.
  • 45:56Okay?
  • 45:58So that's essentially what
  • 45:59the trait performance model is doing,
  • 46:01is it's parameterizing a family of covariance matrices
  • 46:04where we're modeling correlations at distance one,
  • 46:06but not further in the edge graph.
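As a rough illustration of the construction just described (not code from the talk), here is a minimal numpy sketch: build the oriented incidence, or gradient, matrix G for a small made-up graph, form the signed edge-graph adjacency (G Gᵀ with its diagonal of 2s removed), and note that a linear family a·I + b·G Gᵀ only has nonzero off-diagonal entries for edges sharing an endpoint, i.e. correlations at distance one in the edge graph. The edge list and the coefficients a, b are illustrative assumptions.

```python
import numpy as np

# Illustrative edge list (tail, head); edges 0 and 1 both point into node 1,
# edge 2 points out of node 1, loosely mirroring the example in the talk.
edges = [(0, 1), (2, 1), (1, 3)]
n_nodes, n_edges = 4, len(edges)

# Oriented incidence ("gradient") matrix: rows are edges, columns are nodes.
G = np.zeros((n_edges, n_nodes))
for e, (tail, head) in enumerate(edges):
    G[e, head] = +1.0
    G[e, tail] = -1.0

# Signed edge-graph adjacency: off-diagonal entry is +1 when the two edge
# orientations agree at the shared node, -1 when they disagree, 0 otherwise.
A_edge = G @ G.T - 2.0 * np.eye(n_edges)
print(A_edge)

# A covariance of the form a*I + b*(G @ G.T) therefore only correlates edges
# at distance one in the edge graph; non-adjacent edges stay independent.
a, b = 1.0, 0.5
C_linear = a * np.eye(n_edges) + b * (G @ G.T)
print(C_linear)
```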
  • 46:08So then the natural thought
  • 46:09for how to generalize these results is to ask,
  • 46:11can we model longer distance correlations
  • 46:13through this graph?
  • 46:15To do so, let's think a little bit about
  • 46:17what this matrix that's showing up inside the covariance is.
  • 46:21So we have a gradient times a gradient transpose.
  • 46:24This is in effect a Laplacian for that edge graph.
  • 46:30And you can do this for other motifs.
  • 46:32If you think about different sort of motif constructions,
  • 46:35essentially if you take a product of M transpose times M,
  • 46:38that will generate something that looks like a Laplacian
  • 46:41or an adjacency matrix for a graph
  • 46:44where I'm assigning nodes to be motifs
  • 46:47and looking at the overlap of motifs.
  • 46:50And if I look at M times M transpose,
  • 46:52then I'm looking at the overlap of edges via shared motifs.
  • 46:55So these operators you can think
  • 46:56about as being Laplacians for some sort of graph
  • 46:59that's generated from the original graph motifs.
  • 47:04Like with any adjacency matrix,
  • 47:06powers of something like G G transpose minus 2I
  • 47:11will model connections along longer paths,
  • 47:14at longer distances, in these graphs
  • 47:16associated with motifs,
  • 47:17in this case with the edge graph.
  • 47:20So our thought is maybe,
  • 47:21well, we could extend this trait performance family
  • 47:23of covariance matrices by instead of only looking at
  • 47:27a linear combination of an identity matrix, and this matrix,
  • 47:31we could look at a power series.
  • 47:32So we could consider combining powers of this matrix.
  • 47:37And this will generate this family of matrices
  • 47:39that are parameterized by some set of coefficients-
  • 47:41<v Robert>Dr. Strang?</v>
  • 47:43<v ->Ah, yes?</v> <v ->I apologize (mumbles)</v>
  • 47:44I just wanna remind you
  • 47:46that we have a rather tight time limit,
  • 47:48approximately a couple of minutes.
  • 47:50<v ->Yes, of course.</v>
  • 47:52So here, the idea is to parameterize this family of matrices
  • 47:57by introducing a set of polynomials with coefficients alpha,
  • 48:00and then plugging into the polynomial,
  • 48:03the Laplacian, or the adjacency matrix,
  • 48:06that's generated
  • 48:08by the graph motifs we're interested in.
  • 48:11And that trait performance result,
  • 48:12that was really just looking at the first order case here,
  • 48:14that was looking at a linear polynomial
  • 48:17with these chosen coefficients.
  • 48:20This power series model is really nice analytically,
  • 48:24so if we start with some graph operator M,
  • 48:28and we consider the family of covariance matrices
  • 48:31generated by plugging M, M transpose
  • 48:34into some polynomial and power series,
  • 48:36then this family of matrices is contained
  • 48:39within the span of powers of M, M transpose.
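As a hedged sketch of the power-series family (assumptions: the operator is a gradient matrix G from some example graph, and the polynomial coefficients alpha are arbitrary placeholders), here is what that construction looks like in code. In practice the coefficients must be chosen so the result is positive semidefinite; an even power such as p(x) = x² always is.

```python
import numpy as np

def power_series_covariance(G, alpha):
    """Return sum_k alpha[k] * M^k with M = G @ G.T - 2*I.

    G is an oriented incidence (gradient) matrix, edges x nodes.
    The coefficients alpha must make the result positive semidefinite
    if it is to serve as a covariance matrix.
    """
    n_edges = G.shape[0]
    M = G @ G.T - 2.0 * np.eye(n_edges)
    C = np.zeros((n_edges, n_edges))
    Mk = np.eye(n_edges)                 # running power M^k, starting at M^0 = I
    for a_k in alpha:
        C += a_k * Mk
        Mk = Mk @ M
    return C

# The linear, trait-performance case is the degree-1 truncation:
# power_series_covariance(G, [alpha0, alpha1]) = alpha0*I + alpha1*M.
```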
  • 48:45You can talk about this family
  • 48:47sort of in terms of combinatorics.
  • 48:48So for example, if we use that gradient
  • 48:50times gradient transpose minus twice the identity,
  • 48:52then powers of this are essentially, again, path counting.
  • 48:55So this is counting paths of length N.
  • 48:58You can also look at things like the trace of these powers.
  • 49:00So if you look at the trace series,
  • 49:02that's the sequence where you look at the trace
  • 49:04of powers of these,
  • 49:06essentially these adjacency matrices.
  • 49:09This is doing some sort of loop count
  • 49:11where we're counting loops of different length.
  • 49:14And you could think that this trace series
  • 49:15in some sense is controlling amplification
  • 49:17of self-correlations within the sampled edge flow.
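A small sketch of the trace series just mentioned, under the same conventions as the snippet above (M = G Gᵀ − 2I for a gradient G; the truncation length is an arbitrary choice): traces of powers of this adjacency-like matrix count signed closed walks of each length in the edge graph.

```python
import numpy as np

def trace_series(M, max_power):
    """Traces of M^0, M^1, ..., M^max_power for an adjacency-like matrix M."""
    traces, Mk = [], np.eye(M.shape[0])
    for _ in range(max_power + 1):
        traces.append(np.trace(Mk))      # tr(M^k): signed closed walks of length k
        Mk = Mk @ M
    return np.array(traces)
```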
  • 49:22Depending on the generative model,
  • 49:23we might wanna use different operators
  • 49:25for generating this family.
  • 49:26So, for example, going back to that
  • 49:28synaptic plasticity model with coupled oscillators,
  • 49:31in that case, using the gradient to generate
  • 49:34the family of covariance matrices
  • 49:35isn't really the right structure
  • 49:37because the dynamics of the model
  • 49:39sort of have these natural cyclic connections.
  • 49:43So it's better to build the power series using the curl.
  • 49:46So depending on your model,
  • 49:47you can adapt this power series family
  • 49:49by plugging in a different graph operator.
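To make "plugging in a different graph operator" concrete, here is a hedged sketch of building the family from a curl-type operator instead of the gradient: its rows are indexed by triangles, with entries ±1 according to whether each edge's orientation agrees with a chosen orientation of the triangle. The edge list, triangle list, and sign conventions below are illustrative assumptions, not the talk's exact construction.

```python
import numpy as np

# One triangle (edges 0, 1, 2 run consistently around it) plus a dangling edge.
edges = [(0, 1), (1, 2), (2, 0), (1, 3)]
triangles = [(0, 1, 2)]                  # triples of edge indices

curl = np.zeros((len(triangles), len(edges)))
for t, tri in enumerate(triangles):
    for e in tri:
        curl[t, e] = 1.0                 # +1 here: these edges already circulate the same way

# Edges x edges operator linking edges that share a triangle; the power-series
# family would then use polynomials in this matrix instead of G G^T - 2I.
K = curl.T @ curl
print(K)
```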
  • 49:53Let's see now, what happens if we try to compute
  • 49:55the expected sizes of some components
  • 49:58using a power series of this form?
  • 50:00So if the variance-covariance matrix
  • 50:04for our edge flow is a power series in,
  • 50:06for example, the gradient, gradient transpose,
  • 50:08then the expected sizes of the measures
  • 50:12can all be expressed as
  • 50:13linear combinations of this trace series
  • 50:16and the coefficients of the original polynomial.
  • 50:19For example, the expected cyclic size of the flow
  • 50:21is just the polynomial evaluated at negative two
  • 50:24multiplied by the number of independent loops in the graph.
  • 50:26And this really generalizes that trait performance result
  • 50:29because the trait performance result is given
  • 50:31by restricting these polynomials to be linear.
  • 50:34Okay?
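As a quick numerical sanity check of that statement (an illustrative sketch, not code from the talk): take a triangle graph, which has exactly one independent loop, use the polynomial p(x) = x², which keeps the covariance positive semidefinite, and compare the expected cyclic size tr(P_cyc · p(G Gᵀ − 2I)) with p(−2) times the number of loops.

```python
import numpy as np

edges = [(0, 1), (1, 2), (2, 0)]         # a single 3-cycle: one independent loop
n_nodes, n_edges = 3, len(edges)
G = np.zeros((n_edges, n_nodes))
for e, (tail, head) in enumerate(edges):
    G[e, head], G[e, tail] = 1.0, -1.0

M = G @ G.T - 2.0 * np.eye(n_edges)
C = M @ M                                # covariance = p(M) with p(x) = x^2

P_grad = G @ np.linalg.pinv(G)           # projection onto gradient (acyclic) flows
P_cyc = np.eye(n_edges) - P_grad         # projection onto the cycle space

# E[||P_cyc f||^2] for a mean-zero flow f with covariance C.
print(np.trace(P_cyc @ C))               # approximately 4.0
print((-2.0) ** 2 * 1)                   # p(-2) times the number of loops: also 4.0
```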
  • 50:36This you can extend sort of to other bases,
  • 50:41but really what this accomplishes is
  • 50:43by generalizing trait performance,
  • 50:45we achieve the sort of generic properties
  • 50:50that it failed to have.
  • 50:52So in particular, if I have an edge flow subspace S
  • 50:56spanned by a set of flow motifs stored in some operator M,
  • 50:59then this power series family of covariances
  • 51:01associated with the Laplacian,
  • 51:03that is with M times M transpose, is both expressive,
  • 51:07in the sense that for any non-negative A and B,
  • 51:11I can pick some alpha and beta
  • 51:13so that the expected size of the projection of F
  • 51:16onto the subspace S is A,
  • 51:18and the expected size of the projection of F
  • 51:19onto the subspace orthogonal to S is B,
  • 51:23with a covariance in this power series family.
  • 51:27And it's sufficient in the sense
  • 51:29that for any edge flow distribution
  • 51:31with mean zero and covariance V,
  • 51:35if C is the matrix nearest to V in Frobenius norm
  • 51:38restricted to the power series family,
  • 51:40then these inner products computed in terms of C
  • 51:44are exactly the same as the inner products
  • 51:46computed in terms of V,
  • 51:47so they directly predict the structure,
  • 51:49which means that if I use this power series family,
  • 51:51discrepancies off of this family
  • 51:54don't change the expected structure.
  • 51:57Okay?
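Here is a hedged numerical illustration of that sufficiency property (assumptions: the motif operator is the gradient of a triangle graph, so S is the space of gradient flows, and the family is taken to be the span of powers of L = G Gᵀ; the random covariance V is arbitrary). Projecting V onto that span in Frobenius norm leaves tr(P_S V), and hence the expected structure, unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

edges = [(0, 1), (1, 2), (2, 0)]
n_nodes, n_edges = 3, len(edges)
G = np.zeros((n_edges, n_nodes))
for e, (tail, head) in enumerate(edges):
    G[e, head], G[e, tail] = 1.0, -1.0

L = G @ G.T
P_S = G @ np.linalg.pinv(G)              # projection onto S = range(G)

# Frobenius projection of an arbitrary covariance V onto span{I, L, L^2, ...}.
# For this tiny graph, powers up to n_edges - 1 are more than enough.
X = np.column_stack([np.linalg.matrix_power(L, k).ravel() for k in range(n_edges)])
A = rng.standard_normal((n_edges, n_edges))
V = A @ A.T                              # a generic covariance matrix
coeffs, *_ = np.linalg.lstsq(X, V.ravel(), rcond=None)
C = (X @ coeffs).reshape(n_edges, n_edges)

# The projected covariance predicts exactly the same expected structure.
print(np.trace(P_S @ V), np.trace(P_S @ C))   # the two numbers agree
```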
  • 51:57So I know I'm short on time here,
  • 51:59so I'd like to skip then just to the end of this talk.
  • 52:03There's further things you can do with this,
  • 52:04this is sort of really nice.
  • 52:06Mathematically you can build an approximation theory
  • 52:08out of this and study, for different random graph families,
  • 52:12how many terms in these power series you need.
  • 52:15And those terms define some nice,
  • 52:17sort of simple minimal set of statistics
  • 52:19to try to sort of estimate structure,
  • 52:22but I'd like to really just get to the end here
  • 52:25and emphasize the takeaways from this talk.
  • 52:28So the first half of this talk
  • 52:30was focused on information flow.
  • 52:32What we saw is that information flow is a non-trivial,
  • 52:35but well studied, estimation problem.
  • 52:37And this is something that at least on my side
  • 52:38sort of is a work in progress with students.
  • 52:41Here in some ways, the conclusion of that first half
  • 52:43would be that causation entropy
  • 52:45may be a more appropriate measure than TE
  • 52:47when trying to build these flow graphs
  • 52:49to apply these structural measures to.
  • 52:51Then on the structural side,
  • 52:53we can say that power series family,
  • 52:55this is a nice family of covariance matrices.
  • 52:57It has nice properties that are useful empirically
  • 52:59because they let us build global correlation structures
  • 53:02from a sequence of local correlations
  • 53:03from that power series.
  • 53:06If you plug this back into the expected measures,
  • 53:08you can recover monotonic relations,
  • 53:10like in that limited trait performance case.
  • 53:12And truncation of these power series
  • 53:14reduces the number of quantities
  • 53:16that you would actually need to measure.
  • 53:19Actually to a number of quantities
  • 53:20that can be quite small relative to the graph,
  • 53:22and that's where this approximation theory comes in.
  • 53:25One way, maybe, to summarize this entire approach
  • 53:28is that what we've done, by looking at these power series
  • 53:31built in terms of the graph operators,
  • 53:33is provide a way to study
  • 53:35inherently heterogeneous connections,
  • 53:38or covariances, or edge flow distributions
  • 53:41using a homogeneous correlation model
  • 53:43that's built sort of at multiple scales
  • 53:45by starting at the local scale, and then looking at powers.
  • 53:49In some ways this is a comment
  • 53:50that I ended a previous version of this talk with.
  • 53:53I still think that this structural analysis is in some ways
  • 53:56a hammer seeking a nail,
  • 53:57and that this information flow construction,
  • 53:59this is work in progress to try to build that nail.
  • 54:02So thank you all for your attention,
  • 54:04I'll turn it now over to questions.
  • 54:09<v Robert>(mumbles) really appreciate it.</v>
  • 54:14Unfortunately, for those of you on Zoom,
  • 54:16you're welcome to keep up the conversation,
  • 54:17so (mumbles) unfortunately have to clear the room.
  • 54:20So I do apologize (mumbles)
  • 54:25Dr. Steinman?
  • 54:27It might be interesting, yeah. (laughs)
  • 54:28(students laugh)
  • 54:30Dr. Strang? <v ->Oh, yes, yeah.</v>
  • 54:33<v Robert>Okay, do you mind if people...?</v>
  • 54:35Yeah, we have to clear the room,
  • 54:36do you mind if people email you if they have questions?
  • 54:40<v ->I'm sorry, I couldn't hear the end of the question.</v>
  • 54:42Do I mind if...?
  • 54:45<v Robert>We have to clear the room,</v>
  • 54:47do you mind if people email you if they have questions?
  • 54:49<v ->No, not at all.</v>
  • 54:50<v Robert>(mumbles) may continue the conversation,</v>
  • 54:52so I do apologize, they are literally
  • 54:54just stepping in the room right now.
  • 54:57<v ->Okay, no, yeah, that's totally fine.</v>
  • 54:59<v Robert>Thank you, thank you.</v>
  • 55:01And thanks again for a wonderful talk.
  • 55:03<v ->Thank you.</v>