
YSPH Biostatistics Seminar: “Motif Expansion of Global Information Flow in Spike Train Data”

September 15, 2022
  • 00:00<v Robert>Good afternoon.</v>
  • 00:01In respect for everybody's time today,
  • 00:04let's go ahead and get started.
  • 00:07So today it is my pleasure to introduce,
  • 00:09Dr. Alexander Strang.
  • 00:12Dr. Strang earned his bachelor's in Mathematics and Physics,
  • 00:16as well as his PhD in Applied Mathematics
  • 00:19from Case Western Reserve University in Cleveland, Ohio.
  • 00:24Born in Ohio, so representing.
  • 00:27He studies variational inference problems,
  • 00:29noise propagation in biological networks,
  • 00:32self-organizing edge flows,
  • 00:34and functional form game theory
  • 00:36at the University of Chicago,
  • 00:38where he is a William H. Kruskal Instructor of Statistics,
  • 00:42and Applied Mathematics.
  • 00:43Today he is going to talk to us about motif expansion
  • 00:47of global information flow in spike train data.
  • 00:50Let's welcome our speaker.
  • 00:54<v ->Okay, thank you very much.</v>
  • 00:56Thank you first for the kind invite,
  • 00:59and for the opportunity to speak here in your seminar.
  • 01:03So I'd like to start with some acknowledgements.
  • 01:06This is very much a work in progress.
  • 01:09Part of what I'm going to be showing you today
  • 01:11is really the work of a master's student
  • 01:12that I've been working with this summer, that's Bowen.
  • 01:15And really I'd like to thank Bowen
  • 01:16for a lot of the simulation
  • 01:18and a lot of the TE calculation I'll show you later.
  • 01:21This project more generally was born
  • 01:22out of conversations with Brent Doiron
  • 01:24and Lek-Heng Lim here at Chicago.
  • 01:27Brent really was the inspiration for
  • 01:30starting to venture into computational neuroscience.
  • 01:33I'll really say that I am new to this world,
  • 01:35it's a world that's exciting to me,
  • 01:37but really is a world that I am actively exploring
  • 01:41and learning about.
  • 01:42So I look forward to conversations afterwards
  • 01:44to learn more here.
  • 01:46My background was much more inspired by Lek-Heng's work
  • 01:49in computational topology.
  • 01:52And some of what I'll be presenting today
  • 01:54is really inspired by conversations with him.
  • 01:58So let's start with some introduction and motivation.
  • 02:01The motivation personally for this talk,
  • 02:05so it goes back really to work that I started
  • 02:06when I was a graduate student,
  • 02:08I've had this sort of long standing interest
  • 02:10in the interplay between structure and dynamics,
  • 02:12in particular on networks.
  • 02:14I've worked on interesting questions like,
  • 02:16how does the structure of a network determine dynamics
  • 02:18of processes on that network?
  • 02:21And in turn, how do processes on that network
  • 02:24give rise to structure?
  • 02:30On the biological side,
  • 02:32today's talk I'm going to be focusing on
  • 02:34sort of applications of this question
  • 02:36within neural networks.
  • 02:38And I think that this sort of world of
  • 02:39computational neuroscience is really exciting
  • 02:41if you're interested in this interplay
  • 02:42between structure and dynamics
  • 02:44because neural networks encode, transmit
  • 02:46and process information via dynamical processes.
  • 02:49For example, the dynamical process
  • 02:53of a neural network is directed by the wiring patterns,
  • 02:56by the structure of that network.
  • 02:58And moreover, if you're talking
  • 03:00about some sort of learning process,
  • 03:01then those wiring patterns can change
  • 03:03and adapt during the learning process,
  • 03:05so they are themselves dynamic.
  • 03:08In this area I've been interested in questions like,
  • 03:10how is the flow of information governed
  • 03:12by the wiring pattern?
  • 03:14How do patterns of information flow
  • 03:16present themselves in data?
  • 03:17And can they be inferred from that data?
  • 03:19And what types of wiring patterns
  • 03:21might develop during learning?
  • 03:24Answering questions of this type requires
  • 03:26a couple of things.
  • 03:26Sort of very big picture, it requires a language
  • 03:29for describing structures and patterns.
  • 03:31It requires having a dynamical process,
  • 03:33some sort of model for the neural net,
  • 03:35and it requires a generating model
  • 03:38that generates initial structure
  • 03:40and links the structure to dynamics.
  • 03:42Alternatively, if we don't generate things using a model,
  • 03:45if we have some sort of observable or data,
  • 03:47then we can try to work in the other direction
  • 03:49and go from dynamics to structure.
  • 03:52Today during this talk,
  • 03:53I'm gonna be focusing really on this first piece,
  • 03:55a language for describing structures and patterns.
  • 03:57And on the second piece on sort of an observable
  • 04:00that I've been working on trying to compute to use,
  • 04:04to try to connect these three points together.
  • 04:08So to get started, a little bit of biology.
  • 04:10Really I was inspired in this project
  • 04:12by a paper from K.G. Mura.
  • 04:14Here he was looking at a coupled oscillator model,
  • 04:17this is a Kuramoto model for neural activity
  • 04:20where the firing dynamics interact with the wiring.
  • 04:22So the wiring that couples the oscillators
  • 04:25would adapt on a slower time scale
  • 04:29than the oscillators themselves.
  • 04:31And that adaptation could represent
  • 04:34different types of learning processes.
  • 04:36For example, the fire-together wire-together rules
  • 04:39or Hebbian learning,
  • 04:41you can look at causal learning rules,
  • 04:43or anti-Hebbian learning rules.
  • 04:45And this is just an example I've run of this system.
  • 04:48This system of ODEs is sort of interesting
  • 04:50because it can generate all sorts of different patterns.
  • 04:52You can see synchronized firing,
  • 04:54you can see traveling waves,
  • 04:55you can see chaos,
  • 04:57these occur at different sort of critical boundaries.
  • 04:59So you can see phase transitions
  • 05:01when you have large collections of these oscillators.
  • 05:04And depending on how they're coupled together,
  • 05:05it behaves differently.
  • 05:07In particular some of what's interesting here is
  • 05:09that starting from some random seed topology,
  • 05:13the dynamics play forward from that initial condition,
  • 05:16and that random seed topology produces some ensemble of
  • 05:19wiring patterns that are of themselves random.
  • 05:22You can think about the ensemble of wiring patterns
  • 05:24as being chaotic,
  • 05:25sort of realizations of some random initialization.
  • 05:29That said, you can also observe structures
  • 05:32within the systems of coupled oscillators.
  • 05:33So you could see large scale cyclic structures
  • 05:36representing organized causal firing patterns
  • 05:38in certain regimes.
  • 05:40So this is a nice example where graph structure
  • 05:42and learning parameters can determine dynamics,
  • 05:44and in turn where those dynamics can determine structure.
  • 05:48On the other side, you can also think
  • 05:49about a data-driven side instead of a model-driven side.
  • 05:53If we empirically observe sample trajectories
  • 05:56of some observables, for example, neuron recordings,
  • 05:58then we might hope to infer something
  • 05:59about the connectivity that generates them.
  • 06:01And so here instead of starting by posing a model,
  • 06:04and then simulating it and studying how it behaves,
  • 06:06we can instead study data or try to study structure in data.
  • 06:10Often that data comes in the form of covariance matrices
  • 06:12representing firing rates.
  • 06:14And these covariance matrices may be auto-covariance matrices
  • 06:17with some sort of time lag.
  • 06:19Here there are a couple of standard structural approaches,
  • 06:22so there is a motif expansion approach.
  • 06:25This was at least introduced by Brent Doiron's lab
  • 06:28with his student, Gay Walker.
  • 06:30Here the idea is that you define some graph motifs,
  • 06:34and then you can study the dynamics
  • 06:36in terms of those graph motifs.
  • 06:38For example, if you build a power series in those motifs,
  • 06:41then you can try to represent your covariance matrices
  • 06:44in terms of that power series.
  • 06:45And this is something we're gonna talk
  • 06:46about quite a bit today.
  • 06:47This is really part of why I was inspired by this work is,
  • 06:49I had been working separately on the idea of
  • 06:51looking at covariance matrices
  • 06:53in terms of these power series expansions.
  • 06:56This is also connected to topological data analysis,
  • 06:59and this is where the conversations with Lek-Heng
  • 07:01played a role in this work.
  • 07:03Topological data analysis aims to construct graphs,
  • 07:07representing dynamical sort of systems.
  • 07:08For example, you might look at the dynamical similarity
  • 07:11of firing patterns of certain neurons,
  • 07:13and then tries to study the topology of those graphs.
  • 07:18Again, this sort of leads to similar questions,
  • 07:20but we can be a little bit more precise here
  • 07:21for thinking in neuroscience.
  • 07:24We can say more precisely, for example,
  • 07:25how is information processing and transfer represented,
  • 07:29both in these covariance matrices and the structures
  • 07:32that we hope to extract from them.
  • 07:33In particular, can we try
  • 07:35and infer causality from firing patterns?
  • 07:39And this is fundamentally an information theoretic question.
  • 07:42Really we're asking, can we study
  • 07:43the directed exchange of information from trajectories?
  • 07:47Here one approach, I mean, in some sense
  • 07:49you can never tell causality without some underlying model,
  • 07:53without some underlying understanding of the mechanism.
  • 07:56So if all we can do is observe,
  • 07:58then we need to define what we mean by causality.
  • 08:01A reasonable sort of standard definition here
  • 08:03is Wiener Causality,
  • 08:04which says that two time series share a causal relation,
  • 08:06so we say X causes Y,
  • 08:08if the history of X informs the future of Y.
  • 08:12Note that here "cause" put in quotes,
  • 08:14really means forecasts.
  • 08:15That means that the past or the present of X
  • 08:18carries relevant information about the future of Y.
  • 08:22A natural measure of that information is transfer entropy.
  • 08:26Transfer entropy was introduced by Schreiber in 2000,
  • 08:30and it's the expected KL divergence
  • 08:32between the distribution of the future of Y
  • 08:35given the history of X
  • 08:38and the marginal distribution of the future of Y.
  • 08:41So essentially it's how much predictive information
  • 08:43does X carry about Y?
  • 08:46This is a nice quantity for a couple of reasons.
  • 08:48First, it's zero when two trajectories are independent.
  • 08:51Second, since it's just defined in terms of
  • 08:53these conditional distributions, it's model free.
  • 08:56So I put here no with a star because this,
  • 08:59the generative assumptions actually do matter
  • 09:01when you go to try and compute it,
  • 09:02but in principle it's defined independent of the model.
  • 09:05Again, unlike some other effective causality measures,
  • 09:07it doesn't require introducing some time lag to define.
  • 09:11It's a naturally directed quantity, right?
  • 09:13We can say that transfer entropy is defined in terms of
  • 09:20the future of Y conditioned on the past of X and Y.
  • 09:23And that quantity is directed because reversing X and Y,
  • 09:27it does not sort of symmetrically change this statement.
  • 09:30This is different than quantities
  • 09:31like mutual information or correlation
  • 09:32that are also often used
  • 09:34to try to measure effective connectivity in networks,
  • 09:37which are fundamentally symmetric quantities.
  • 09:41Transfer entropy is also less corrupted
  • 09:43by measurement noise, linear mixing of signals,
  • 09:46or shared coupling to external sources.
  • 09:50Lastly, and maybe most interestingly,
  • 09:52if we think in terms of correlations,
  • 09:54correlations are really moments,
  • 09:56correlations are really about covariances, right?
  • 09:57Second order moments.
  • 09:59Transfer entropies, these are about entropies,
  • 10:01these are sort of logs of distributions,
  • 10:04and so they depend on the full shape of these distributions,
  • 10:06which means that transfer entropy can capture coupling
  • 10:10that is maybe not apparent or not obvious,
  • 10:13just looking at a second order moment type analysis.
  • 10:17So transfer entropy has been applied pretty broadly.
  • 10:20It's been applied to spiking cortical networks
  • 10:22and calcium imaging,
  • 10:24to MEG data in motor tasks and auditory discrimination.
  • 10:29It's been applied to motion recognition,
  • 10:31precious metal prices
  • 10:32and multivariate time series forecasting,
  • 10:34and more recently to accelerate learning
  • 10:36in different artificial neural nets.
  • 10:38So you can look at feedforward architectures,
  • 10:40convolution architectures, even recurrent neural nets,
  • 10:42and transfer entropy has been used
  • 10:44to accelerate learning in those frameworks.
  • 10:49For this part of the talk,
  • 10:50I'd like to focus really on two questions.
  • 10:52First, how do we compute transfer entropy?
  • 10:55And then second, if we could compute transfer entropy
  • 10:58and build a graph out of that,
  • 11:00how would we study the structure of that graph?
  • 11:01Essentially, how is information flow structured?
  • 11:05We'll start with computing transfer entropy.
  • 11:09To compute transfer entropy,
  • 11:10we actually need to write down an equation.
  • 11:13So transfer entropy was originally introduced
  • 11:14for discrete time arbitrary order Markov processes.
  • 11:18So suppose we have two Markov processes: X and Y,
  • 11:21and we'll let X of N denote the state of
  • 11:23process X at time N,
  • 11:25and XNK where the K is in superscript to denote the sequence
  • 11:29starting from N minus K plus one going up to N.
  • 11:32So that's sort of the last K states
  • 11:35that the process X visited,
  • 11:37then the transfer entropy from Y to X,
  • 11:40denoted T, Y arrow to X,
  • 11:45is the entropy of the future of X conditioned on its past
  • 11:50minus the entropy of the future of X conditioned on its past
  • 11:54and the past of the trajectory Y.
  • 11:56So here you can think the transfer entropy
  • 11:57is essentially the reduction in entropy
  • 11:59of the future states of X
  • 12:00when incorporating the past of Y.
  • 12:03This means that computing transfer entropy
  • 12:05reduces to estimating essentially these entropies.
  • 12:07That means we need to be able to estimate
  • 12:08essentially the conditional distributions
  • 12:10inside of these parentheses.
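For reference, the definition just described can be written compactly; the notation follows the verbal description above, with X_n^(k) denoting the last k states of X, and this is a restatement rather than a copy of the slide:

```latex
T_{Y \to X} \;=\; H\!\left(X_{n+1} \mid X_n^{(k)}\right) \;-\; H\!\left(X_{n+1} \mid X_n^{(k)},\, Y_n^{(l)}\right),
\qquad X_n^{(k)} = \left(X_{n-k+1}, \ldots, X_n\right).
```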
  • 12:14That's easy for certain processes.
  • 12:15So for example, if X and Y are Gaussian processes,
  • 12:19then really what we're trying to compute
  • 12:20is conditional mutual information.
  • 12:22And there are nice equations
  • 12:23for conditional mutual information
  • 12:25when you have Gaussian random variables.
  • 12:26So if I have three Gaussian random variables: X, Y, Z,
  • 12:29possibly multivariate with joint covariance sigma,
  • 12:33then the conditional mutual information
  • 12:35between these variables,
  • 12:36so the mutual information between X and Y conditioned on Z
  • 12:39is just given by this ratio of log determinants
  • 12:42of those covariances.
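One standard way to write the Gaussian conditional mutual information being referred to, in terms of log determinants of the joint covariance blocks (here Sigma_XZ is the joint covariance of (X, Z), and so on; this is reconstructed from standard Gaussian identities rather than transcribed from the slide):

```latex
I(X; Y \mid Z) \;=\; \frac{1}{2}\,
\log \frac{\det \Sigma_{XZ}\,\det \Sigma_{YZ}}{\det \Sigma_{Z}\,\det \Sigma_{XYZ}}.
```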
  • 12:45In particular, a common test model used
  • 12:48in sort of the transfer entropy literature
  • 12:51are linear auto-regressive processes
  • 12:53because a linear auto-regressive process
  • 12:55when perturbed by Gaussian noise
  • 12:57produces a Gaussian process.
  • 12:58All of the different
  • 12:59joint marginal conditional distributions are all Gaussian,
  • 13:02which means that we can compute
  • 13:03these covariances analytically,
  • 13:05which then means that you can compute
  • 13:06the transfer entropy analytically.
  • 13:07So these linear auto-regressive processes
  • 13:09are nice test cases
  • 13:10because you can do everything analytically.
  • 13:12They're also somewhat disappointing or somewhat limiting
  • 13:15because in this linear auto-regressive case,
  • 13:17transfer entropy is the same as Granger causality.
  • 13:22And in this Gaussian case,
  • 13:24essentially what we've done is we've reduced
  • 13:26transfer entropy to a study of time-lagged correlations,
  • 13:29so this becomes the same
  • 13:30as sort of a correlation based analysis,
  • 13:32we can't incorporate information beyond the second moments,
  • 13:34if we restrict ourselves to Gaussian processes
  • 13:36or Gaussian approximations.
  • 13:39The other thing to note is this is strongly model-dependent
  • 13:41because this particular formula
  • 13:43for computing mutual information
  • 13:44depends on having Gaussian distributions.
  • 13:50In a more general setting or a more empirical setting,
  • 13:53you might observe some data.
  • 13:55You don't know if that data
  • 13:56comes from some particular process,
  • 13:58so you can't necessarily assume
  • 13:59that conditional distributions are Gaussian,
  • 14:01but we would still like to estimate transfer entropy,
  • 14:03which leads to the problem of estimating transfer entropy
  • 14:06given an observed time series.
  • 14:08We would like to do this again, sans some model assumption,
  • 14:10so we don't wanna assume Gaussianity.
  • 14:13This is sort of trivial,
  • 14:14again, I star that in discrete state spaces
  • 14:17because essentially it amounts to counting occurrences,
  • 14:20but it becomes difficult whenever the state spaces are large
  • 14:23and/or high dimensional as they often are.
  • 14:26This leads to a couple of different approaches.
  • 14:28So as a first example, let's consider spike train data.
  • 14:32So spike train data consists essentially of
  • 14:35binning the state of a neuron into either on or off.
  • 14:39So neurons, you can think either in the state zero or one,
  • 14:41and then a pairwise calculation for transfer entropy
  • 14:44only requires estimating a joint probability distribution
  • 14:48over four to the K plus L states,
  • 14:51where K is the length of the history of X that we remember,
  • 14:54and L is the length of the history of Y.
  • 14:57So if sort of the Markov process
  • 14:59generating the spike train data is not of high order,
  • 15:02does not have a long memory,
  • 15:04then these K and L can be small,
  • 15:06and this state space is fairly small,
  • 15:08so this falls into that first category
  • 15:10when we're looking at a discrete state space,
  • 15:12and it's not too difficult.
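As a concrete illustration of the counting approach for binary spike trains, here is a minimal plug-in estimator sketch in Python. It is not the toolkit implementation discussed below, and the history lengths and the toy driving example at the end are illustrative assumptions.

```python
import numpy as np
from collections import Counter
from math import log2

def plug_in_te(x, y, k=1, l=1):
    """Plug-in estimate of the transfer entropy T_{Y->X} (in bits) for binary
    spike trains x and y, using histories of length k (for x) and l (for y)."""
    n = len(x)
    m = max(k, l)
    joint = Counter()                                   # counts of (x_next, x_hist, y_hist)
    for t in range(m, n - 1):
        x_hist = tuple(x[t - k + 1:t + 1])
        y_hist = tuple(y[t - l + 1:t + 1])
        joint[(x[t + 1], x_hist, y_hist)] += 1
    total = sum(joint.values())
    # marginal counts needed for the two conditional probabilities
    c_hists = Counter(); c_next_xh = Counter(); c_xh = Counter()
    for (xn, xh, yh), c in joint.items():
        c_hists[(xh, yh)] += c
        c_next_xh[(xn, xh)] += c
        c_xh[xh] += c
    te = 0.0
    for (xn, xh, yh), c in joint.items():
        p_joint = c / total
        p_full = c / c_hists[(xh, yh)]                  # p(x_next | x_hist, y_hist)
        p_self = c_next_xh[(xn, xh)] / c_xh[xh]         # p(x_next | x_hist)
        te += p_joint * log2(p_full / p_self)
    return te

# toy check: x copies y's previous state (with some noise), so T_{Y->X} should far exceed T_{X->Y}
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 20000)
x = np.roll(y, 1) ^ (rng.random(20000) < 0.1)
print(plug_in_te(x, y), plug_in_te(y, x))
```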
  • 15:15In a more general setting,
  • 15:16if we don't try to bin the states
  • 15:18of the neurons to on or off,
  • 15:19for example, maybe we're looking at a firing rate model
  • 15:22where we wanna look at the firing rates of the neurons,
  • 15:24and that's a continuous random variable,
  • 15:27then we need some other types of estimators.
  • 15:29So the common estimator used here
  • 15:31is a kernel density estimator, a KSG estimator,
  • 15:34and this is designed for large continuous
  • 15:36or high dimensional state spaces,
  • 15:37e.g. sort of these firing rate models.
  • 15:40Typically the approach is to employ a Takens delay map,
  • 15:43which embeds your high dimensional data
  • 15:45in some sort of lower dimensional space
  • 15:48that tries to capture the intrinsic dimension
  • 15:50of the attractor that your dynamic process settles onto.
  • 15:55And then you try to estimate an unknown density
  • 15:57based on this delay map using a k-nearest
  • 15:59neighbor kernel density estimate.
  • 16:01The advantage of this sort of
  • 16:03k-nearest neighbor kernel density is
  • 16:05that it dynamically adapts the width of the kernel,
  • 16:07given your sample density.
  • 16:09And this has been implemented in some open source toolkits,
  • 16:11these are the toolkits that we've been working with.
  • 16:15So we've tested this in a couple of different models,
  • 16:18and really I'd say this work,
  • 16:19this is still very much work in progress,
  • 16:20this is work that Bowen was developing over the summer.
  • 16:23And so we developed a couple different models to test.
  • 16:26The first were these Linear Auto-Regressive Networks,
  • 16:29and we just used these to test
  • 16:31the accuracy of the estimators
  • 16:32because everything here is Gaussian,
  • 16:33so you can compute the necessary
  • 16:35transfer entropies analytically.
  • 16:37The next more interesting class of networks
  • 16:39are Threshold Linear Networks or TLNs.
  • 16:42These are firing rate models where your rate R
  • 16:44obeys this stochastic differential equation.
  • 16:47So the rate of change, dR of t, is:
  • 16:51you have sort of a leak term, negative R of t,
  • 16:53and then plus here, this is essentially a coupling.
  • 16:57All of this is inside here, the brackets with a plus,
  • 17:00this is like a ReLU function,
  • 17:02so this is just taking the positive part
  • 17:04of what's on the inside.
  • 17:05Here B is an activation threshold,
  • 17:08W is a wiring matrix,
  • 17:09and then R are those rates, again.
  • 17:11And then C here, that's essentially the covariance
  • 17:13for some noise term perturbing this process.
  • 17:17We use these TLNs to test the sensitivity
  • 17:19of our transfer entropy estimators
  • 17:21to common and private noise sources as you change C,
  • 17:24as well as sort of how well the transfer entropy network
  • 17:26agrees with the wiring matrix.
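To make the model concrete, here is a small Euler-Maruyama sketch in Python of the threshold-linear dynamics just described (a leak term plus a rectified coupling, with threshold b, wiring W, and noise covariance C). The integration scheme, parameter values, and the three-unit example are illustrative assumptions, not values from the talk.

```python
import numpy as np

def simulate_tln(W, b, C, r0, dt=1e-3, steps=50_000, seed=0):
    """Euler-Maruyama integration of a threshold-linear rate model:
        dR = (-R + [W R + b]_+) dt + dXi,   with Cov(dXi) = C dt."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(C + 1e-12 * np.eye(len(r0)))     # mixing matrix for the noise
    R = np.empty((steps, len(r0)))
    r = np.array(r0, dtype=float)
    for t in range(steps):
        drift = -r + np.maximum(W @ r + b, 0.0)             # leak term plus ReLU of the coupling
        r = r + drift * dt + L @ rng.normal(size=len(r)) * np.sqrt(dt)
        R[t] = r
    return R

# three units with a weak directed cycle; a diagonal C gives private noise,
# off-diagonal entries in C would model common noise sources
W = np.array([[0.0, 0.6, 0.0],
              [0.0, 0.0, 0.6],
              [0.6, 0.0, 0.0]])
rates = simulate_tln(W, b=np.ones(3), C=0.05 * np.eye(3), r0=np.zeros(3))
print(rates.mean(axis=0))
```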
  • 17:31A particular class of TLNs
  • 17:32that was really nice for these experiments
  • 17:35is what are called Combinatorial Threshold Linear Networks.
  • 17:37These are really pretty new,
  • 17:38these were introduced by Carina Curto's lab this year,
  • 17:42and really this was inspired by a talk
  • 17:45I'd seen her give at FACM in May.
  • 17:49These are threshold linear networks
  • 17:51where the weight matrix here, W,
  • 17:52representing the wiring of the neurons
  • 17:55is determined by a directed graph G.
  • 17:58So you start with some directed graph G,
  • 18:00that's what's shown here on the left.
  • 18:01This figure is adapted from Carina's paper,
  • 18:03this is a very nice paper
  • 18:04if you'd like to take a look at it.
  • 18:07And if I and J are not connected,
  • 18:10then the weight matrix is assigned one value;
  • 18:12and if they are connected, then it's assigned another value,
  • 18:14and the wiring is zero if I equals J.
  • 18:18These networks are nice
  • 18:20if we wanna test structural hypotheses
  • 18:22because it's very easy to predict from the input graph
  • 18:25how the output dynamics of the network should behave,
  • 18:28and there is really beautiful analysis
  • 18:30that Carina does in this paper to show
  • 18:32that you can produce all these different
  • 18:33interlocking patterns of limit cycles
  • 18:35and multistable states,
  • 18:36and chaos, and all these nice patterns,
  • 18:38and you can design them by picking these nice
  • 18:41sort of directed graphs.
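For concreteness, here is a sketch of how such a weight matrix can be built from a directed graph. The specific values assigned to edge versus non-edge entries (the eps and delta below) and the convention that G_adj[i, j] = 1 means j points to i are the parameterization recalled from Curto's papers; treat them as assumptions rather than a quotation of the talk.

```python
import numpy as np

def ctln_weights(G_adj, eps=0.25, delta=0.5):
    """Weight matrix of a combinatorial threshold-linear network:
    one value on edges of the directed graph, another on non-edges, zero on the diagonal.
    (Values follow the parameterization recalled from Curto et al.; treat as an assumption.)"""
    W = np.where(G_adj > 0, -1.0 + eps, -1.0 - delta)
    np.fill_diagonal(W, 0.0)
    return W

# 3-cycle 0 -> 1 -> 2 -> 0; rows index targets, columns index sources
G = np.zeros((3, 3))
for src, dst in [(0, 1), (1, 2), (2, 0)]:
    G[dst, src] = 1
print(ctln_weights(G))
```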
  • 18:44The last class of networks that we've built to test
  • 18:46are Leaky-Integrate and Fire Networks.
  • 18:48So here we're using a Leaky-Integrate and Fire model
  • 18:51where our wiring matrix, W, is drawn randomly.
  • 18:54It's block stochastic, which means
  • 18:57that it's (indistinct) between blocks.
  • 19:00And it's a balanced network,
  • 19:02so we have excitatory and inhibitory neurons
  • 19:04that talk to each other,
  • 19:06and maintain a sort of a balance in the dynamics here.
  • 19:09The hope is to pick a large enough scale network
  • 19:11that we see properly chaotic dynamics
  • 19:13using this Leaky-Integrate and Fire model.
  • 19:17These tests have yielded fairly mixed results,
  • 19:21so the simple tests behave sort of as expected.
  • 19:24So the estimators that are used are biased,
  • 19:27and the bias typically decays slower
  • 19:29than the variance in estimation,
  • 19:30which means that you do need fairly long trajectories
  • 19:32to try to properly estimate the transfer entropy.
  • 19:36That said, transfer entropy does correctly identify
  • 19:38causal relationships in simple graphs,
  • 19:40and transfer entropy matches the underlying structure
  • 19:44used in a Combinatorial Threshold Linear Network, or CTLN.
  • 19:49Unfortunately, these results did not carry over as cleanly
  • 19:52to the Leaky-Integrate and Fire models
  • 19:54or to sort of larger models.
  • 19:56So what I'm showing you on the right here,
  • 19:58this is a matrix where we've calculated
  • 20:00the pairwise transfer entropy between all neurons
  • 20:03in a 150 neuron balanced network.
  • 20:06This is shown in absolute value, on a log scale.
  • 20:09And the main thing I wanna highlight,
  • 20:11taking a look at this matrix,
  • 20:12is that it's very hard to see exactly what the structure is.
  • 20:15You see this banding,
  • 20:17that's because neurons tend to be highly predictive
  • 20:20if they fire a lot.
  • 20:21So there's a strong correlation
  • 20:22between the transfer entropy, between X and Y,
  • 20:25and just the activity level of X,
  • 20:29but it's hard to distinguish block-wise differences,
  • 20:31for example, between inhibitory neurons
  • 20:33and excitatory neurons, and that really takes plotting out.
  • 20:36So here this box in a whisker plot on the bottom,
  • 20:39this is showing you if we group entries of this matrix
  • 20:43by the type of connection,
  • 20:44so maybe excitatory to excitatory,
  • 20:46or inhibitory to excitatory, or so on,
  • 20:48that the distributions of realized transfer entropy
  • 20:50really are different,
  • 20:52but they're different in sort of subtle ways.
  • 20:54So in this sort of larger scale balanced network,
  • 20:58it's much less clear whether transfer entropy
  • 21:02can effectively be equated in some way
  • 21:05with the true connectivity or wiring.
  • 21:09In some ways, this is not a surprise
  • 21:10because the behavior of the balanced networks
  • 21:12is inherently balanced,
  • 21:13and (indistinct) inherently unstructured,
  • 21:16but there are ways in which these experiments
  • 21:18have sort of revealed confounding factors
  • 21:20that are conceptual factors
  • 21:22that make transfer entropy
  • 21:24not quite as ideal a measure,
  • 21:25or maybe not as ideal as it seemed
  • 21:28at the start of this talk.
  • 21:29So for example, suppose two trajectories:
  • 21:33X and Y are both strongly driven by a third trajectory, Z,
  • 21:36but X responds to Z first.
  • 21:39Well, then the present information about X
  • 21:40or the present state of X carries information
  • 21:42about the future of Y, so X is predictive of Y,
  • 21:45so X forecast Y.
  • 21:46So in the transfer entropy or Wiener causality setting,
  • 21:48we would say X causes Y,
  • 21:51even if X and Y are only both responding to Z.
  • 21:54So here in this example,
  • 21:56suppose you have a directed tree where information
  • 21:59or sort of dynamics propagate down the tree.
  • 22:02If you look at this node here, PJ and I,
  • 22:07PJ will react
  • 22:08to essentially information traveling down this tree
  • 22:12before I does, so PJ would be predictive for I.
  • 22:15So we would observe an effective connection
  • 22:19where PJ forecasts I,
  • 22:21which means that neurons that are not directly connected
  • 22:23may influence each other,
  • 22:24and that this transfer entropy
  • 22:26really you should think of in terms of forecasting,
  • 22:29not in terms of being a direct analog to the wiring matrix.
  • 22:33One way around this is to condition on the state
  • 22:35of the rest of the network
  • 22:37before you start doing some averaging.
  • 22:39This leads to some other notions of entropy.
  • 22:41So for example, causation entropy,
  • 22:42and this is sort of a promising direction,
  • 22:44but we haven't had time to explore it yet.
  • 22:47So that's the estimation side,
  • 22:49those are the tools for estimating transfer entropy.
  • 22:52Now let's switch gears
  • 22:53and talk about that second question I had introduced,
  • 22:55which is essentially, how do we analyze structure?
  • 22:57Suppose we could calculate a transfer entropy graph,
  • 23:00how would we extract structural information from that graph?
  • 23:04And here, I'm going to be introducing some tools
  • 23:06that I've worked on for awhile
  • 23:08for describing sort of random structures and graphs.
  • 23:11These are tied back to some work I'd really done
  • 23:15as a graduate student in conversations with Lek-Heng.
  • 23:18So we start in a really simple context,
  • 23:19which is the graph or network.
  • 23:21This could be directed or undirected,
  • 23:22however, we're gonna require that does not have self-loops,
  • 23:24then it's finite.
  • 23:26We'll let V here be the number of vertices
  • 23:28and E be the number of edges.
  • 23:30Then the object of study that we'll introduce
  • 23:33is something called an edge flow.
  • 23:34An edge flow is essentially a function
  • 23:35on the edges of the graph.
  • 23:37So this is a function that accepts pairs of endpoints
  • 23:40and returns a real number,
  • 23:42and this is an alternating function.
  • 23:43So if I had to take F of IJ,
  • 23:45that's negative F of JI
  • 23:47because you can think of F of IJ as being some flow,
  • 23:49like a flow of material between I and J,
  • 23:52hence this name, edge flow.
  • 23:54This is analogous to a vector field
  • 23:56because this is like the analogous construction
  • 23:58to a vector field in the graph,
  • 23:59and represents some sort of flow between nodes.
  • 24:02Edge flows are really sort of generic things,
  • 24:04so you can take this idea of an edge flow
  • 24:07and apply it in a lot of different areas
  • 24:09because really all you need is,
  • 24:10you just need to structure some alternating function
  • 24:12on the edges of the graph.
  • 24:13So I've sort of read papers
  • 24:16and worked in a bunch of these different areas,
  • 24:19particularly I've focused on applications of this
  • 24:21in game theory, in pairwise and social choice settings,
  • 24:25in biology and Markov chains.
  • 24:26And a lot of this project has been attempting
  • 24:28to take this experience working with edge flows in,
  • 24:31for example, say non-equilibrium thermodynamics
  • 24:34or looking at pairwise preference data,
  • 24:36and looking at a different application area
  • 24:38here to neuroscience.
  • 24:40Really, if you think about what the edge flow,
  • 24:42or a relevant edge flow, in neuroscience might be,
  • 24:43you might be asking about asymmetries in wiring patterns,
  • 24:46or differences in directed influence or causality,
  • 24:49or really you could think about these
  • 24:50transfer entropy quantities.
  • 24:51This is why I was excited about transfer entropy.
  • 24:53Transfer entropy is an inherently directed notion
  • 24:56of information flow,
  • 24:57so it's natural to think
  • 24:59that if you can calculate things like a transfer entropy,
  • 25:01then really what you're studying
  • 25:03is some sort of edge flow on a graph.
  • 25:06Edge flows often are subject to
  • 25:08sort of the same set of common questions.
  • 25:10So if I wanna analyze the structure of an edge flow,
  • 25:12there's some really big global questions
  • 25:14that I would often ask,
  • 25:15that get asked in all these different application areas.
  • 25:19One common question is,
  • 25:20well, does the flow originate somewhere and end somewhere?
  • 25:23Are there sources and sinks in the graph?
  • 25:25Another is, does it circulate?
  • 25:26And if it does circulate, on what scales and where?
  • 25:31If you have a network that's connected
  • 25:33to a whole exterior network,
  • 25:34for example, if you're looking at some small subsystem
  • 25:37that's embedded in a much larger system
  • 25:38as is almost always the case in neuroscience,
  • 25:41then you also need to think about,
  • 25:42what passes through the network?
  • 25:43So is there a flow or a current that moves
  • 25:46through the boundary of the network?
  • 25:47Is there information that flows through the network
  • 25:51that you're studying?
  • 25:52And in particular if we have these different types of flow,
  • 25:55if flow can originate and source and end in sinks,
  • 25:57if it can circulate, if it can pass through,
  • 25:59can we decompose the flow into pieces that do each of these,
  • 26:03and ask how much of the flow does one, two, or three?
  • 26:07Those questions lead to a decomposition.
  • 26:11So here we're going to start with this simple idea,
  • 26:13we're going to decompose an edge flow
  • 26:15by projecting it onto orthogonal subspaces
  • 26:17associated with some graph operators.
  • 26:20Generically if we consider two linear operators: A and B,
  • 26:24where the product A times B equals zero,
  • 26:27then the range of B must be contained
  • 26:29in the null space of A,
  • 26:31which means that I can express
  • 26:33essentially any set of real numbers.
  • 26:35So you can think of this as being the vector space
  • 26:38of possible edge flows as a direct sum of the range of B,
  • 26:43the range of A transpose
  • 26:45and the intersection of the null space of B transpose
  • 26:47in the null space of A.
  • 26:48This blue subspace, this is called the harmonic space,
  • 26:53and this is trivial
  • 26:56in many applications
  • 26:58if you choose A and B correctly.
  • 27:00So there's often settings where you can pick A and B,
  • 27:02so that these two null spaces have no intersection,
  • 27:06and then this decomposition boils down
  • 27:08to just separating a vector space
  • 27:10into the range of B and the range of A transpose.
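Written out, the generic decomposition just described is the following, with edge flows living in R^E and A, B as in the talk:

```latex
A B = 0 \;\Longrightarrow\; \operatorname{range}(B) \subseteq \operatorname{null}(A),
\qquad
\mathbb{R}^{E} = \operatorname{range}(B) \,\oplus\, \operatorname{range}(A^{\mathsf{T}}) \,\oplus\,
\bigl(\operatorname{null}(A) \cap \operatorname{null}(B^{\mathsf{T}})\bigr).
```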
  • 27:16In the graph setting,
  • 27:17our goal is essentially to pick these operators
  • 27:19to be meaningful things.
  • 27:20That is to pick graph operators,
  • 27:22so that these subspaces carry meaning
  • 27:26in the structural context.
  • 27:30So let's think a little bit about graph operators here,
  • 27:33so let's look at two different classes of operators.
  • 27:35So we can consider matrices that have E rows and N columns,
  • 27:40or matrices that have L rows and E columns where,
  • 27:44again, E is the number of edges in this graph.
  • 27:48If I have a matrix with E rows,
  • 27:50then each column of the matrix has as many entries
  • 27:53as there are edges in the graph,
  • 27:55so it can be thought of as itself an edge flow.
  • 27:57So you could think that this matrix is composed
  • 27:59of a set of columns where each column is some particular
  • 28:02sort of motivic flow or flow motif.
  • 28:05In contrast if I look at a matrix where I have E columns,
  • 28:09then each row of the matrix is a flow motif,
  • 28:11so products against M
  • 28:14evaluate inner products against specific flow motifs.
  • 28:18That means that in this context,
  • 28:20if I look at the range of this matrix,
  • 28:21this is really a linear combination
  • 28:23of a specific subset of flow motifs.
  • 28:25And in this context,
  • 28:26if I look at the null space of the matrix,
  • 28:28I'm looking at all edge flows orthogonal
  • 28:30to that set of flow motifs.
  • 28:32So here if I look at the range of a matrix with E rows,
  • 28:36that subspace is essentially modeling behavior
  • 28:39similar to the motifs.
  • 28:40So if I pick a set of motifs that flow out of a node
  • 28:44or flow into a node,
  • 28:45then this range is going to be a subspace of edge flows
  • 28:48that tend to originate in sources and end in sinks.
  • 28:51In contrast here, the null space of M,
  • 28:54that's all edge flows orthogonal to the flow motifs,
  • 28:57so it models behavior distinct from the motifs.
  • 28:59Essentially this space asks, what doesn't the flow do?
  • 29:02Whereas this space asks, what does the flow do?
  • 29:07Here is a simple, sort of very classical example.
  • 29:09And really this goes all the way back to,
  • 29:11you could think like Kirchhoff electric circuit theory.
  • 29:14We can define two operators.
  • 29:15Here G, this is essentially a gradient operator.
  • 29:18And if you've taken some graph theory,
  • 29:20you might know this as the edge incidence matrix.
  • 29:22This is a matrix which essentially records
  • 29:25the endpoints of an edge
  • 29:26and evaluates differences across it.
  • 29:29So, for example, if I look at this first row of G,
  • 29:33this corresponds to edge one in the graph,
  • 29:35and if I had a function defined on the nodes in the graph,
  • 29:39products with G would evaluate differences across this edge.
  • 29:43If you look at its columns,
  • 29:44each column here is a flow motif.
  • 29:46So, for example, this highlighted second column,
  • 29:49this is entries: one, negative one, zero, negative one.
  • 29:52If you carry those back to the edges,
  • 29:53that corresponds to this specific flow motif.
  • 29:56So here this gradient,
  • 29:58it's adjoint to essentially a divergence operator,
  • 30:00which means that the flow motifs are unit inflows
  • 30:03or unit outflows from specific nodes,
  • 30:05like what's shown here.
  • 30:07You can also introduce something like a curl operator.
  • 30:10The curl operator evaluates paths, sums around loops.
  • 30:13So this row here, for example, this is a flow motif
  • 30:16corresponding to the loop labeled A in this graph.
  • 30:20You could certainly imagine
  • 30:21other operators' built cutter, other motifs,
  • 30:23these operators are particularly nice
  • 30:25because they define principled subspaces.
  • 30:28So if we apply that generic decomposition,
  • 30:31then we could say that the space
  • 30:32of possible edge flows, R to the E,
  • 30:34can be decomposed into the range of the gradient operator,
  • 30:37the range of the curl transpose,
  • 30:39and the intersection of their null spaces
  • 30:42into this harmonic space.
  • 30:44This is nice because the range of the gradient, those are flows
  • 30:46that start and end somewhere.
  • 30:48Those are flows that are associated with
  • 30:50like motion down a potential.
  • 30:52So these if you're thinking physics,
  • 30:53you might say that these are sort of conservative,
  • 30:55these are like flows generated by a voltage
  • 30:57if you're looking at electric circuit.
  • 30:59These cyclic flows, well, these are the flows
  • 31:01in the range of the curl transpose,
  • 31:03and then this harmonic space,
  • 31:04those are flows that enter and leave the network
  • 31:06without either starting or ending
  • 31:09in a sink or a source, or circulating.
  • 31:11So you can think that really this decomposes
  • 31:13the space of edge flows into flows that start
  • 31:16and end somewhere inside the network.
  • 31:17Flows that circulate within the network,
  • 31:19and flows that do neither,
  • 31:20i.e. flows that enter and leave the network.
  • 31:22So this accomplishes that initial decomposition
  • 31:25I'd set out at the start.
  • 31:28Once we have this decomposition, then we can evaluate
  • 31:31the sizes of the components of decomposition to measure
  • 31:34how much of the flow starts and ends somewhere,
  • 31:38how much circulates and so on.
  • 31:39So we can introduce these generic measures
  • 31:41where, given some operator M,
  • 31:44we decompose the space of edge flows
  • 31:46into the range of M and the null space of M transpose,
  • 31:49which means we can project F onto these subspaces,
  • 31:52and then just evaluate the sizes of these components.
  • 31:55And that's a way of measuring
  • 31:57how much of the flow behaves like
  • 31:59the flow motifs contained in this operator,
  • 32:01and how much it doesn't.
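To make the projections concrete, here is a small self-contained sketch in Python. The toy graph, the hand-picked loop basis, and the random flow are illustrative choices, not examples from the talk.

```python
import numpy as np

# Toy graph: 4 nodes and 5 oriented edges.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
V, E = 4, len(edges)

# Gradient (edge-incidence) operator: (G phi)_e = phi_j - phi_i for edge e = (i, j).
G = np.zeros((E, V))
for e, (i, j) in enumerate(edges):
    G[e, i], G[e, j] = -1.0, 1.0

# Curl operator: one row per independent loop, entries +/-1 by edge orientation.
# Loops chosen by hand: 0->1->2->0 (edges 0, 1, and 4 reversed) and 0->2->3->0 (edges 4, 2, 3).
C = np.array([[1.0, 1.0, 0.0, 0.0, -1.0],
              [0.0, 0.0, 1.0, 1.0, 1.0]])
assert np.allclose(C @ G, 0.0)        # curl of a gradient vanishes, so the pieces are orthogonal

def project(M, f):
    """Orthogonal projection of f onto the column space of M."""
    coef, *_ = np.linalg.lstsq(M, f, rcond=None)
    return M @ coef

f = np.random.default_rng(1).normal(size=E)   # an arbitrary edge flow
f_grad = project(G, f)                        # starts and ends at sources/sinks
f_curl = project(C.T, f)                      # circulates around loops
f_harm = f - f_grad - f_curl                  # harmonic piece (essentially zero for this small graph)
print({name: round(float(np.linalg.norm(v) ** 2), 3)
       for name, v in [("gradient", f_grad), ("cyclic", f_curl), ("harmonic", f_harm)]})
```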
  • 32:04So, yeah, so that lets us answer this question,
  • 32:07and this is the tool that we're going to be using
  • 32:09sort of as our measurable.
  • 32:12Now that's totally easy to do,
  • 32:16if you're given a fixed edge flow and a fixed graph
  • 32:17because if you have fixed graph,
  • 32:18you can build your operators, you choose the motifs,
  • 32:20you have fixed edge flow, you just project the edge flow
  • 32:23onto the subspaces spanned by those operators,
  • 32:25and you're done.
  • 32:27However, there are many cases where it's worth thinking
  • 32:31about a distribution of edge flows,
  • 32:33and then expected structures given that distribution.
  • 32:37So here we're going to be considering random edge flows,
  • 32:39for example, in edge flow capital F,
  • 32:41here I'm using capital letters to denote random quantities
  • 32:43sampled from an edge flow distributions.
  • 32:45This is a distribution of possible edge flows.
  • 32:46And this is worth thinking about
  • 32:48because many generative models are stochastic.
  • 32:51They may involve some random seed,
  • 32:53or they may, for example, like that neural model
  • 32:55or a lot of these sort of neural models be chaotic.
  • 32:58So even if they are deterministic generative models,
  • 33:01the output data behaves
  • 33:03as it was sampled from the distribution.
  • 33:05On the empirical side, for example,
  • 33:07when we're estimating transfer entropy
  • 33:09or estimating some information flow,
  • 33:11then there's always some degree of measurement error
  • 33:13or uncertainty in that estimate,
  • 33:15which really means sort of from a Bayesian perspective,
  • 33:18we should be thinking that our estimator
  • 33:21is a point estimate drawn from some
  • 33:23posterior distribution of edge flows,
  • 33:24and then we're back in the setting where,
  • 33:25again, we need to talk about a distribution.
  • 33:28Lastly, this random edge flow setting is also
  • 33:31really important if we wanna compare to null hypotheses
  • 33:35because often if you want to compare
  • 33:37to some sort of null hypothesis,
  • 33:38it's helpful to have an ensemble of edge flows
  • 33:41to compare against, which means that we would like
  • 33:44to be able to talk about expected structure
  • 33:46under varying distributional assumptions.
  • 33:50If we can talk meaningfully about random edge flows,
  • 33:54then really what we can start doing is
  • 33:56we can start bridging the expected structure
  • 33:59back to the distribution.
  • 34:00So what we're looking for is a way of explaining
  • 34:03sort of generic expectations
  • 34:05of what the structure will look like
  • 34:07as we vary this distribution of edge flows.
  • 34:10You could think that a particular dynamical system
  • 34:13generates a wiring pattern,
  • 34:17that generates firing dynamics,
  • 34:19those firing dynamics determine
  • 34:21some sort of information flow graph.
  • 34:23And then that information flow graph
  • 34:25is really a sample from that generative model.
  • 34:28And we would like to be able to talk about,
  • 34:30what would we expect if we knew the distribution
  • 34:33of edge flows about the global structure?
  • 34:35That is, we'd like to bridge global structure
  • 34:37back to this distribution,
  • 34:39and then ideally you would bridge that distribution back
  • 34:41to the generative mechanism.
  • 34:42This is a project for a future work,
  • 34:45obviously this is fairly ambitious.
  • 34:47However, this first point is something that you can do
  • 34:51really in fairly explicit detail.
  • 34:53And that's what I'd like to spell out
  • 34:54with the end of this talk is
  • 34:55how do you bridge global structure
  • 34:58back to a distribution of edge flows?
  • 35:02So, yeah, so that's the main question,
  • 35:05how does the choice of distribution
  • 35:06influence the expected global flow structure?
  • 35:12So first, we start with the Lemma.
  • 35:15Suppose that we have a distribution of edge flows
  • 35:17with some expectation F bar and some covariance,
  • 35:20here I'm using double bar V to denote covariance.
  • 35:24We'll let S be a subspace
  • 35:29contained within the vector space of edge flows,
  • 35:31and we'll let P sub S be the orthogonal projector onto S.
  • 35:35Then for F sub S, the projection of F onto this subspace S,
  • 35:40the expectation of its norm squared
  • 35:43is the norm of the expected flow projected onto S squared.
  • 35:48So this first term is essentially
  • 35:53the measure evaluated at the expected sample.
  • 35:56And then plus a term that involves an inner product
  • 35:58between the projector onto the subspace,
  • 36:00and the covariance matrix for the edge flows.
  • 36:02Here this denotes the matrix inner product,
  • 36:04so this is just the sum over all IJ entries.
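Written out, the lemma as just stated reads as follows, with P_S the orthogonal projector onto S, F-bar the mean flow, and double-bar V the covariance:

```latex
\mathbb{E}\!\left[\lVert P_S F \rVert^2\right]
\;=\; \lVert P_S \bar{F} \rVert^2 \;+\; \langle P_S, \mathbb{V} \rangle,
\qquad
\langle A, B \rangle = \sum_{i,j} A_{ij} B_{ij} = \operatorname{tr}\!\left(A^{\mathsf{T}} B\right).
```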
  • 36:09What's nice about this formula
  • 36:10is at least in terms of expectation, it reduces the study
  • 36:16of the bridge between distribution
  • 36:18and network structure to a study of moments, right?
  • 36:22Because we've replaced the distributional problem here
  • 36:24with a linear algebra problem
  • 36:27that's posed in terms of this projector,
  • 36:29the projector onto the subspace S,
  • 36:31which is determined by the topology of the network,
  • 36:33and the variance in that edge flow
  • 36:36which is determined by your generative model.
  • 36:40Well, you might say, okay, well, (laughs) fine,
  • 36:42this is a matrix inner product, we can just stop here,
  • 36:44we could compute this projector,
  • 36:45we could sample a whole bunch of edge flows,
  • 36:47compute this covariance.
  • 36:48So you can do this matrix inner product,
  • 36:50but I sort of disagree
  • 36:51because I suspect that you can really do more
  • 36:55with this sort of inner product.
  • 36:57So I'd like to highlight some challenges
  • 37:00associated with this inner product.
  • 37:03So first, let's say, I asked you to design a distribution
  • 37:06with tunable global structure.
  • 37:07So for example, I said, I want you to pick
  • 37:09a generative model or design a distribution of edge flows
  • 37:12that when I sample edge flows from it,
  • 37:14their expected structures matched some expectation.
  • 37:18It's not obvious how to do that given this formula,
  • 37:22it's not obvious in particular
  • 37:23because these projectors,
  • 37:24like the projector on the subspace S typically depend
  • 37:27in fairly non-trivial ways on the graph topology.
  • 37:30So small changes in the graph topology
  • 37:32can completely change this projector.
  • 37:34In essence, it's hard to isolate topology from distribution.
  • 37:37You can think that this inner product,
  • 37:39if I think about it in terms of the IJ entries,
  • 37:43while easy to compute, it's not easy to interpret
  • 37:47because I and J are somewhat arbitrary indexing.
  • 37:49And obviously really the topology of the graph,
  • 37:51it's not encoded in the indexing,
  • 37:53that's encoded in the structure of these matrices.
  • 37:56So in some ways what we really need is a better basis
  • 37:59for computing this inner product.
  • 38:01In addition, computing this inner product
  • 38:03just may not be empirically feasible
  • 38:05because it might not be feasible
  • 38:07to estimate all these covariances.
  • 38:08There's lots of settings
  • 38:09where if you have a random edge flow,
  • 38:11it becomes very expensive to try to estimate
  • 38:13all the covariances in this matrix,
  • 38:16because this matrix has as many entries
  • 38:19as there are pairs of edges in the graph.
  • 38:22And typically that number of edges grows fairly quickly
  • 38:26in the number of nodes of the graph.
  • 38:27So in the worst case,
  • 38:29the size of these matrices
  • 38:31grows not as the square of the number of nodes of the graph,
  • 38:33but as the number of nodes of the graph to the fourth,
  • 38:35so this becomes very expensive very fast.
  • 38:37Again, we could try to address this problem
  • 38:41if we had a better basis for performing this inner product
  • 38:43because we might hope to be able to truncate
  • 38:46somewhere in that basis,
  • 38:47and use a lower dimensional representation.
  • 38:50So to build there, I'm gonna show you
  • 38:52a particular family of covariances.
  • 38:55We're going to start with a very simple generative model,
  • 38:58so let's suppose that each node of the graph
  • 39:00is assigned some set of attributes,
  • 39:02here a random vector X sampled from a...
  • 39:04So you can think of trait space,
  • 39:05a space of possible attributes,
  • 39:07and these are sampled i.i.d.
  • 39:09In addition, we'll assume
  • 39:10that there exists an alternating function F,
  • 39:13which accepts pairs of attributes and returns a real number.
  • 39:17So this is something that I can evaluate
  • 39:19on the endpoints of an edge,
  • 39:21and return an edge flow value.
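A minimal sketch of this trait-based generative model in Python; the trait distribution and the particular alternating function below are illustrative stand-ins, not the choices from the talk. Sampled flows have mean zero, and their empirical covariance is near zero exactly on pairs of edges that do not share an endpoint.

```python
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]      # a small toy graph
V = 4
rng = np.random.default_rng(2)

def sample_edge_flow():
    x = rng.normal(size=V)                             # i.i.d. node attributes ("traits")
    f = lambda xi, xj: xj - xi                         # an alternating function: f(xi, xj) = -f(xj, xi)
    return np.array([f(x[i], x[j]) for (i, j) in edges])

samples = np.array([sample_edge_flow() for _ in range(20000)])
print(np.round(np.cov(samples.T), 2))                  # correlations only on edges sharing an endpoint
```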
  • 39:24In this setting,
  • 39:26everything that I'd shown you before simplifies.
  • 39:29So if my edge flow F is drawn by first sampling
  • 39:33a set of attributes,
  • 39:34and then plugging those attributes
  • 39:35into functions on the edges, then the
  • 39:40mean edge flow is zero, so that F bar goes away,
  • 39:44and the covariance reduces to this form.
  • 39:46So you have a standard form where the covariance
  • 39:48in the edge flow is a function of two scalar quantities,
  • 39:52that's sigma squared and rho.
  • 39:53These are both statistics associated with this function
  • 39:56and the distribution of traits.
  • 39:59And then some matrices,
  • 40:00so we have an identity matrix,
  • 40:02and we have this gradient matrix showing up again.
  • 40:05This is really nice because when you plug it back in
  • 40:07to try to compute say the expected sizes of the components,
  • 40:13this matrix inner product
  • 40:15that I was complaining about before,
  • 40:17this whole matrix inner product simplifies.
  • 40:19So when you have a variance
  • 40:21that's in this nice, simple canonical form,
  • 40:23then the expected overall size of the edge flow,
  • 40:26that's just sigma squared,
  • 40:27the expected size projected onto that
  • 40:30sort of conservative subspace
  • 40:32that breaks into this combination
  • 40:35of the sigma squared and the rho.
  • 40:37Again, those are some simple statistics.
  • 40:39And then V, E, and L,
  • 40:41those are just sort of
  • 40:42essentially dimension counting on the network.
  • 40:43So this is the number of vertices, the number of edges,
  • 40:47and the number of loops,
  • 40:48where the number of loops is the number of edges
  • 40:49minus the number of vertices plus one.
  • 40:52And similarly, the expected cyclic size
  • 40:55or size of the cyclic component reduces to,
  • 40:57again, this sort of scalar factor
  • 40:59in terms of some simple statistics
  • 41:01and some dimension counting sort of
  • 41:03topology related quantities.
  • 41:07So this is very nice because this allows us
  • 41:11to really separate the role of topology
  • 41:13from the role of the generative model.
  • 41:14The generative model determines sigma and rho,
  • 41:17and topology determines these dimensions.
  • 41:22It turns out that the same thing is true,
  • 41:26even if you don't sample the edge flow
  • 41:29using this sort of trait approach,
  • 41:31but the graph is complete.
  • 41:33So if your graph is complete,
  • 41:34then no matter how you sample your edge flow,
  • 41:37for any edge flow distribution,
  • 41:38exactly the same formulas hold,
  • 41:40you just replace those simple statistics
  • 41:43with estimators for those statistics given your sample flow.
  • 41:47And this is sort of a striking result
  • 41:49because this says that this conclusion
  • 41:51that was linked to some specific generative model
  • 41:54with some very sort of specific assumptions, right,
  • 41:56we assumed it was i.i.d., extends to all complete graphs,
  • 41:59regardless of the actual distribution that we sampled from.
  • 42:05Up until this point,
  • 42:06this is kind of just an algebra miracle.
  • 42:09And one of the things I'd like to do
  • 42:10at the end of this talk is explain why this is true,
  • 42:13and show how to generalize these results.
  • 42:16So to build there,
  • 42:17let's emphasize some of the advantages of this.
  • 42:19So first, the advantages of this model,
  • 42:22it's mechanistically plausible in certain settings,
  • 42:24it cleanly separated the role of topology and distribution.
  • 42:28And these coefficients that had to do with the topology,
  • 42:30these are just dimensions,
  • 42:31these are non-negative quantities.
  • 42:34So it's easy to work out monotonic relationships
  • 42:36between expected structure
  • 42:38and simple statistics of the edge flow distribution.
  • 42:44The fact that you can do that enables more general analysis.
  • 42:47So what I'm showing you on the right here,
  • 42:48this is from a different application area.
  • 42:51This was an experiment where we trained
  • 42:53a set of agents to play a game using a genetic algorithm,
  • 42:58and then we looked at the expected sizes of sort of cyclic
  • 43:01and acyclic components in a tournament among those agents.
  • 43:05And you could actually predict these curves
  • 43:08using this sort of type of structural analysis
  • 43:10because it was possible to predict the dynamics
  • 43:13of the simple statistics, this sigma and this rho.
  • 43:17So this is a really powerful analytical tool,
  • 43:20but it is limited to this particular model.
  • 43:23In particular, it only models unstructured cycles.
  • 43:26So if you look at the cyclic component
  • 43:27generated by this model, it just looks like random noise
  • 43:30that's been projected onto the range of the curl transpose.
  • 43:34It's limited to correlations on adjacent edges,
  • 43:36so we only generate correlations on edges
  • 43:38that share an endpoint
  • 43:39because you could think that all of the original
  • 43:41random information comes from the endpoints.
  • 43:45And then it's in some ways not general enough,
  • 43:47so it lacks some expressivity.
  • 43:48We can't parameterize all possible expected structures
  • 43:51by picking a sigma and a rho.
  • 43:54And we lack some notion of sufficiency,
  • 43:56i.e. if the graph is not complete,
  • 43:58then this nice algebraic property
  • 44:01that it actually didn't matter what the distribution was,
  • 44:03this fails to hold.
  • 44:04So if the graph is not complete,
  • 44:06then projection onto the family of covariances
  • 44:09parameterized in this fashion
  • 44:11changes the expected global structure.
  • 44:15So we would like to address these limitations.
  • 44:17And so our goal for the next part of this talk
  • 44:19is to really generalize these results.
  • 44:21To generalize, we're going to
  • 44:23switch our perspective a little bit.
  • 44:25So I'll recall this formula
  • 44:27that if we generate our edge flow
  • 44:30by sampling quantities on the endpoints,
  • 44:32and then plugging them into functions on the edges,
  • 44:34then you necessarily get a covariance
  • 44:35that's in this two parameter family
  • 44:37where I have two scalar quantities
  • 44:39associated with the statistics of the edge flow.
  • 44:41That's the sigma and the rho.
  • 44:42And then I have some matrices that are associated
  • 44:44with the topology of the network
  • 44:45in the subspaces I'm projecting onto.
  • 44:48These are related to a different way
  • 44:51of looking at the graph.
  • 44:52So I can start with my original graph
  • 44:54and then I can convert it to an edge graph
  • 44:57where I have one node per edge in the graph,
  • 45:00and nodes are connected if they share an endpoint.
  • 45:04You can then assign essentially signs to these edges
  • 45:07based on whether the edge direction chosen
  • 45:11in the original graph is consistent
  • 45:12or inconsistent at the node that links the two edges.
  • 45:16So for example, edges one and two both point into this node,
  • 45:20so there's an edge that's linking
  • 45:21one and two in the edge graph with a positive sign.
  • 45:25This essentially tells you that the influence of
  • 45:29random information assigned on this node linking one and two
  • 45:33would positively correlate the sample edge flow
  • 45:36on edges one and two.
  • 45:38Then what this form
  • 45:41for covariance matrices says
  • 45:43is that we're looking at families of edge flows
  • 45:46that have correlations on edges sharing an endpoint.
  • 45:49So edges at distance one in this edge graph,
  • 45:51and non-adjacent edges are
  • 45:52entirely independent of each other.
  • 45:56Okay?
  • 45:58So that's essentially what
  • 45:59the trait performance model is doing,
  • 46:01is it's parameterizing a family of covariance matrices
  • 46:04where we're modeling correlations at distance one,
  • 46:06but not further in the edge graph.
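As a rough illustration of the construction just described (not code from the talk), here is a minimal numpy sketch: build the oriented incidence, or gradient, matrix G for a small made-up graph, form the signed edge-graph adjacency (G Gᵀ with its diagonal of 2s removed), and note that a linear family a·I + b·G Gᵀ only has nonzero off-diagonal entries for edges sharing an endpoint, i.e. correlations at distance one in the edge graph. The edge list and the coefficients a, b are illustrative assumptions.

```python
import numpy as np

# Illustrative edge list (tail, head); edges 0 and 1 both point into node 1,
# edge 2 points out of node 1, loosely mirroring the example in the talk.
edges = [(0, 1), (2, 1), (1, 3)]
n_nodes, n_edges = 4, len(edges)

# Oriented incidence ("gradient") matrix: rows are edges, columns are nodes.
G = np.zeros((n_edges, n_nodes))
for e, (tail, head) in enumerate(edges):
    G[e, head] = +1.0
    G[e, tail] = -1.0

# Signed edge-graph adjacency: off-diagonal entry is +1 when the two edge
# orientations agree at the shared node, -1 when they disagree, 0 otherwise.
A_edge = G @ G.T - 2.0 * np.eye(n_edges)
print(A_edge)

# A covariance of the form a*I + b*(G @ G.T) therefore only correlates edges
# at distance one in the edge graph; non-adjacent edges stay independent.
a, b = 1.0, 0.5
C_linear = a * np.eye(n_edges) + b * (G @ G.T)
print(C_linear)
```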
  • 46:08So then the natural thought
  • 46:09for how to generalize these results is to ask,
  • 46:11can we model longer distance correlations
  • 46:13through this graph?
  • 46:15To do so, let's think a little bit about
  • 46:17what this matrix that's showing up inside the covariance is.
  • 46:21So we have a gradient times a gradient transpose.
  • 46:24This is in effect a Laplacian for that edge graph.
  • 46:30And you can do this for other motifs.
  • 46:32If you think about different sort of motif constructions,
  • 46:35essentially if you take a product of M transpose times M,
  • 46:38that will generate something that looks like a Laplacian
  • 46:41or an adjacency matrix for a graph
  • 46:44where I'm assigning nodes to be motifs
  • 46:47and looking at the overlap of motifs.
  • 46:50And if I look at M times M transpose,
  • 46:52then I'm looking at the overlap of edges via shared motifs.
  • 46:55So these operators you can think
  • 46:56about as being Laplacians for some sort of graph
  • 46:59that's generated from the original graph motifs.
  • 47:04Like with any adjacency matrix,
  • 47:06powers of something like G G transpose minus 2I
  • 47:11will model connections along longer paths,
  • 47:14at longer distances, in these graphs
  • 47:16associated with motifs,
  • 47:17in this case with the edge graph.
  • 47:20So our thought is maybe,
  • 47:21well, we could extend this trait performance family
  • 47:23of covariance matrices by instead of only looking at
  • 47:27a linear combination of an identity matrix, and this matrix,
  • 47:31we could look at a power series.
  • 47:32So we could consider combining powers of this matrix.
  • 47:37And this will generate this family of matrices
  • 47:39that are parameterized by some set of coefficients-
  • 47:41<v Robert>Dr. Strang?</v>
  • 47:43<v ->Ah, yes?</v> <v ->I apologize (mumbles)</v>
  • 47:44I just wanna remind you
  • 47:46that we have a rather tight time limit,
  • 47:48approximately a couple of minutes.
  • 47:50<v ->Yes, of course.</v>
  • 47:52So here, the idea is to parameterize this family of matrices
  • 47:57by introducing a set of polynomials with coefficients alpha,
  • 48:00and then plugging into the polynomial,
  • 48:03the Laplacian, or the adjacency matrix,
  • 48:06that's generated
  • 48:08by the graph motifs we're interested in.
  • 48:11And that trait performance result,
  • 48:12that was really just looking at the first order case here,
  • 48:14that was looking at a linear polynomial
  • 48:17with these chosen coefficients.
  • 48:20This power series model is really nice analytically,
  • 48:24so if we start with some graph operator M,
  • 48:28and we consider the family of covariance matrices
  • 48:31generated by plugging M, M transpose
  • 48:34into some polynomial and power series,
  • 48:36then this family of matrices is contained
  • 48:39within the span of powers of M, M transpose.
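As a hedged sketch of the power-series family (assumptions: the operator is a gradient matrix G from some example graph, and the polynomial coefficients alpha are arbitrary placeholders), here is what that construction looks like in code. In practice the coefficients must be chosen so the result is positive semidefinite; an even power such as p(x) = x² always is.

```python
import numpy as np

def power_series_covariance(G, alpha):
    """Return sum_k alpha[k] * M^k with M = G @ G.T - 2*I.

    G is an oriented incidence (gradient) matrix, edges x nodes.
    The coefficients alpha must make the result positive semidefinite
    if it is to serve as a covariance matrix.
    """
    n_edges = G.shape[0]
    M = G @ G.T - 2.0 * np.eye(n_edges)
    C = np.zeros((n_edges, n_edges))
    Mk = np.eye(n_edges)                 # running power M^k, starting at M^0 = I
    for a_k in alpha:
        C += a_k * Mk
        Mk = Mk @ M
    return C

# The linear, trait-performance case is the degree-1 truncation:
# power_series_covariance(G, [alpha0, alpha1]) = alpha0*I + alpha1*M.
```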
  • 48:45You can talk about this family
  • 48:47sort of in terms of combinatorics.
  • 48:48So for example, if we use that gradient
  • 48:50times gradient transpose minus twice the identity,
  • 48:52then powers of this are essentially, again, path counting.
  • 48:55So this is counting paths of length N.
  • 48:58You can also look at things like the trace of these powers.
  • 49:00So if you look at the trace series,
  • 49:02that's the sequence where you look at the trace
  • 49:04of powers of these,
  • 49:06essentially these adjacency matrices.
  • 49:09This is doing some sort of loop count
  • 49:11where we're counting loops of different length.
  • 49:14And you could think that this trace series
  • 49:15in some sense is controlling amplification
  • 49:17of self-correlations within the sampled edge flow.
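A small sketch of the trace series just mentioned, under the same conventions as the snippet above (M = G Gᵀ − 2I for a gradient G; the truncation length is an arbitrary choice): traces of powers of this adjacency-like matrix count signed closed walks of each length in the edge graph.

```python
import numpy as np

def trace_series(M, max_power):
    """Traces of M^0, M^1, ..., M^max_power for an adjacency-like matrix M."""
    traces, Mk = [], np.eye(M.shape[0])
    for _ in range(max_power + 1):
        traces.append(np.trace(Mk))      # tr(M^k): signed closed walks of length k
        Mk = Mk @ M
    return np.array(traces)
```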
  • 49:22Depending on the generative model,
  • 49:23we might wanna use different operators
  • 49:25for generating this family.
  • 49:26So, for example, going back to that
  • 49:28synaptic plasticity model with coupled oscillators,
  • 49:31in that case, using the gradient to generate
  • 49:34the family of covariance matrices
  • 49:35isn't really the right structure
  • 49:37because the dynamics of the model
  • 49:39sort of have these natural cyclic connections.
  • 49:43So it's better to build the power series using the curl.
  • 49:46So depending on your model,
  • 49:47you can adapt this power series family
  • 49:49by plugging in a different graph operator.
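To make "plugging in a different graph operator" concrete, here is a hedged sketch of building the family from a curl-type operator instead of the gradient: its rows are indexed by triangles, with entries ±1 according to whether each edge's orientation agrees with a chosen orientation of the triangle. The edge list, triangle list, and sign conventions below are illustrative assumptions, not the talk's exact construction.

```python
import numpy as np

# One triangle (edges 0, 1, 2 run consistently around it) plus a dangling edge.
edges = [(0, 1), (1, 2), (2, 0), (1, 3)]
triangles = [(0, 1, 2)]                  # triples of edge indices

curl = np.zeros((len(triangles), len(edges)))
for t, tri in enumerate(triangles):
    for e in tri:
        curl[t, e] = 1.0                 # +1 here: these edges already circulate the same way

# Edges x edges operator linking edges that share a triangle; the power-series
# family would then use polynomials in this matrix instead of G G^T - 2I.
K = curl.T @ curl
print(K)
```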
  • 49:53Let's see now, what happens if we try to compute
  • 49:55the expected sizes of some components
  • 49:58using a power series of this form?
  • 50:00So if the variance-covariance matrix
  • 50:04for our edge flow is a power series in,
  • 50:06for example, the gradient, gradient transpose,
  • 50:08then the expected sizes of the measures
  • 50:12can all be expressed as
  • 50:13linear combinations of this trace series
  • 50:16and the coefficients of the original polynomial.
  • 50:19For example, the expected cyclic size of the flow
  • 50:21is just the polynomial evaluated at negative two
  • 50:24multiplied by the number of independent loops in the graph.
  • 50:26And this really generalizes that trait performance result
  • 50:29because the trait performance result is given
  • 50:31by restricting these polynomials to be linear.
  • 50:34Okay?
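As a quick numerical sanity check of that statement (an illustrative sketch, not code from the talk): take a triangle graph, which has exactly one independent loop, use the polynomial p(x) = x², which keeps the covariance positive semidefinite, and compare the expected cyclic size tr(P_cyc · p(G Gᵀ − 2I)) with p(−2) times the number of loops.

```python
import numpy as np

edges = [(0, 1), (1, 2), (2, 0)]         # a single 3-cycle: one independent loop
n_nodes, n_edges = 3, len(edges)
G = np.zeros((n_edges, n_nodes))
for e, (tail, head) in enumerate(edges):
    G[e, head], G[e, tail] = 1.0, -1.0

M = G @ G.T - 2.0 * np.eye(n_edges)
C = M @ M                                # covariance = p(M) with p(x) = x^2

P_grad = G @ np.linalg.pinv(G)           # projection onto gradient (acyclic) flows
P_cyc = np.eye(n_edges) - P_grad         # projection onto the cycle space

# E[||P_cyc f||^2] for a mean-zero flow f with covariance C.
print(np.trace(P_cyc @ C))               # approximately 4.0
print((-2.0) ** 2 * 1)                   # p(-2) times the number of loops: also 4.0
```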
  • 50:36This you can extend sort of to other bases,
  • 50:41but really what this accomplishes is
  • 50:43by generalizing trait performance,
  • 50:45we achieve the sort of generic properties
  • 50:50that it failed to have.
  • 50:52So in particular, if I have an edge flow subspace S
  • 50:56spanned by a set of flow motifs stored in some operator M,
  • 50:59then this power series family of covariances
  • 51:01associated with the Laplacian,
  • 51:03that is with M times M transpose, is both expressive,
  • 51:07in the sense that for any non-negative A and B,
  • 51:11I can pick some alpha and beta
  • 51:13so that the expected size of the projection of F
  • 51:16onto the subspace S is A,
  • 51:18and the expected size of the projection of F
  • 51:19onto the subspace orthogonal to S is B,
  • 51:23with a covariance in this power series family.
  • 51:27And it's sufficient in the sense
  • 51:29that for any edge flow distribution
  • 51:31with mean zero and covariance V,
  • 51:35if C is the matrix nearest to V in Frobenius norm
  • 51:38restricted to the power series family,
  • 51:40then these inner products computed in terms of C
  • 51:44are exactly the same as the inner products
  • 51:46computed in terms of V,
  • 51:47so they directly predict the structure,
  • 51:49which means that if I use this power series family,
  • 51:51discrepancies off of this family
  • 51:54don't change the expected structure.
  • 51:57Okay?
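Here is a hedged numerical illustration of that sufficiency property (assumptions: the motif operator is the gradient of a triangle graph, so S is the space of gradient flows, and the family is taken to be the span of powers of L = G Gᵀ; the random covariance V is arbitrary). Projecting V onto that span in Frobenius norm leaves tr(P_S V), and hence the expected structure, unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

edges = [(0, 1), (1, 2), (2, 0)]
n_nodes, n_edges = 3, len(edges)
G = np.zeros((n_edges, n_nodes))
for e, (tail, head) in enumerate(edges):
    G[e, head], G[e, tail] = 1.0, -1.0

L = G @ G.T
P_S = G @ np.linalg.pinv(G)              # projection onto S = range(G)

# Frobenius projection of an arbitrary covariance V onto span{I, L, L^2, ...}.
# For this tiny graph, powers up to n_edges - 1 are more than enough.
X = np.column_stack([np.linalg.matrix_power(L, k).ravel() for k in range(n_edges)])
A = rng.standard_normal((n_edges, n_edges))
V = A @ A.T                              # a generic covariance matrix
coeffs, *_ = np.linalg.lstsq(X, V.ravel(), rcond=None)
C = (X @ coeffs).reshape(n_edges, n_edges)

# The projected covariance predicts exactly the same expected structure.
print(np.trace(P_S @ V), np.trace(P_S @ C))   # the two numbers agree
```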
  • 51:57So I know I'm short on time here,
  • 51:59so I'd like to skip then just to the end of this talk.
  • 52:03There's further things you can do with this,
  • 52:04this is sort of really nice.
  • 52:06Mathematically you can build an approximation theory
  • 52:08out of this and study, for different random graph families,
  • 52:12how many terms in these power series you need.
  • 52:15And those terms define some nice,
  • 52:17sort of simple minimal set of statistics
  • 52:19to try to sort of estimate structure,
  • 52:22but I'd like to really just get to the end here
  • 52:25and emphasize the takeaways from this talk.
  • 52:28So the first half of this talk
  • 52:30was focused on information flow.
  • 52:32What we saw is that information flow is a non-trivial,
  • 52:35but well studied, estimation problem.
  • 52:37And this is something that at least on my side
  • 52:38sort of is a work in progress with students.
  • 52:41Here in some ways, the conclusion of that first half
  • 52:43would be that causation entropy
  • 52:45may be a more appropriate measure than TE
  • 52:47when trying to build these flow graphs
  • 52:49to apply these structural measures to.
  • 52:51Then on the structural side,
  • 52:53we can say that power series family,
  • 52:55this is a nice family of covariance matrices.
  • 52:57It has nice properties that are useful empirically
  • 52:59because they let us build global correlation structures
  • 53:02from a sequence of local correlations
  • 53:03from that power series.
  • 53:06If you plug this back into the expected measures,
  • 53:08you can recover monotonic relations,
  • 53:10like in that limited trait performance case.
  • 53:12And truncation of these power series
  • 53:14reduces the number of quantities
  • 53:16that you would actually need to measure.
  • 53:19Actually to a number of quantities
  • 53:20that can be quite small relative to the graph,
  • 53:22and that's where this approximation theory comes in.
  • 53:25One way, maybe, to summarize this entire approach
  • 53:28is that what we've done, by looking at these power series
  • 53:31built in terms of the graph operators,
  • 53:33is provide a way to study
  • 53:35inherently heterogeneous connections,
  • 53:38or covariances, or edge flow distributions
  • 53:41using a homogeneous correlation model
  • 53:43that's built sort of at multiple scales
  • 53:45by starting at the local scale, and then looking at powers.
  • 53:49In some ways this is a comment
  • 53:50that I ended a previous version of this talk with.
  • 53:53I still think that this structural analysis is in some ways
  • 53:56a hammer seeking a nail,
  • 53:57and that this information flow construction,
  • 53:59this is work in progress to try to build that nail.
  • 54:02So thank you all for your attention,
  • 54:04I'll turn it now over to questions.
  • 54:09<v Robert>(mumbles) really appreciate it.</v>
  • 54:14Unfortunately, for those of you on Zoom,
  • 54:16you're welcome to keep up the conversation,
  • 54:17so (mumbles) unfortunately have to clear the room.
  • 54:20So I do apologize (mumbles)
  • 54:25Dr. Steinman?
  • 54:27It might be interesting, yeah. (laughs)
  • 54:28(students laugh)
  • 54:30Dr. Strang? <v ->Oh, yes, yeah.</v>
  • 54:33<v Robert>Okay, do you mind if people...?</v>
  • 54:35Yeah, we have to clear the room,
  • 54:36do you mind if people email you if they have questions?
  • 54:40<v ->I'm sorry, I couldn't hear the end of the question.</v>
  • 54:42Do I mind if...?
  • 54:45<v Robert>We have to clear the room,</v>
  • 54:47do you mind if people email you if they have questions?
  • 54:49<v ->No, not at all.</v>
  • 54:50<v Robert>(mumbles) may continue the conversation,</v>
  • 54:52so I do apologize, they are literally
  • 54:54just stepping in the room right now.
  • 54:57<v ->Okay, no, yeah, that's totally fine.</v>
  • 54:59<v Robert>Thank you, thank you.</v>
  • 55:01And thanks again for a wonderful talk.
  • 55:03<v ->Thank you.</v>