Skip to Main Content

YSPH Biostatistics Seminar: “Robust Mendelian Randomization in the Presence of Many Weak Instruments and Widespread Horizontal Pleiotropy”

September 21, 2022

Speaker: Ting Ye, PhD, Assistant Professor, Department of Biostatistics, University of Washington

September 20, 2022

ID
8096

Transcript

  • 00:00<v Instructor>Good afternoon.</v>
  • 00:01In respect for everybody's time today,
  • 00:04let's go ahead and get started.
  • 00:07So today, it is my pleasure to introduce
  • 00:09Dr. Alexander Strang.
  • 00:12Dr. Strang earned his bachelor's in mathematics, in physics,
  • 00:16as well as his PhD in applied mathematics
  • 00:19from Case Western Reserve University in Cleveland, Ohio.
  • 00:24Born in Ohio, so representing.
  • 00:27He studies variational inference problems,
  • 00:29noise propagation in biological networks,
  • 00:32self organizing edge flows,
  • 00:34and functional form game theory
  • 00:36at the University of Chicago,
  • 00:38where he is a William H. Kruskal Instructor
  • 00:40of physics and applied mathematics.
  • 00:43Today, he's going to talk to us about motivic expansion
  • 00:47of global information flow in spike train data.
  • 00:50Let's welcome our speaker.
  • 00:54<v ->Okay, thank you very much.</v>
  • 00:56Thank you, first, for the kind invite,
  • 00:59and for the opportunity to speak here in your seminar.
  • 01:03So, I'd like to start with some acknowledgements.
  • 01:06This is very much work in progress.
  • 01:09Part of what I'm going to be showing you today
  • 01:11is really the work of a Master's student
  • 01:12that I've been working with this summer, that's Bowen,
  • 01:15and really, I'd like to thank Bowen
  • 01:16for a lot of the simulation,
  • 01:18and a lot of the TE calculation I'll show you later.
  • 01:21This project, more generally, was born out of conversations
  • 01:23with Brent Doiron and Lek-Heng Lim here at Chicago.
  • 01:28Brent really was the inspiration
  • 01:29for starting to venture into computational neuroscience.
  • 01:33I really say that I am new to this world,
  • 01:35this world is exciting to me, but really it's a world
  • 01:38that I am actively exploring and learning about.
  • 01:42So I look forward to conversations afterwards
  • 01:44to learn more here.
  • 01:46My background was much more inspired
  • 01:48by Lek-Heng's work in computational technology,
  • 01:52and some of what I'll be presenting today
  • 01:54is really inspired by conversations with him.
  • 01:58So, let's start with some introduction and motivation.
  • 02:01The motivation, personally, for this talk.
  • 02:05So it goes back, really, to work that I started
  • 02:06when I was a graduate student.
  • 02:08I've had this long standing interest in the interplay
  • 02:11between structure and dynamics, in particular in networks.
  • 02:14I've been interested in questions like
  • 02:16how does the structure of a network determine
  • 02:17dynamics of processes on that network.
  • 02:21And, in turn, how do processes on that network
  • 02:24give rise to structure?
  • 02:26On the biological side...
  • 02:30On the biological side, in today's talk,
  • 02:32I'm going to be focusing on applications of this question
  • 02:36within neural networks.
  • 02:38And I think that this world of computational neuroscience
  • 02:40is really exciting if you're interested in this interplay
  • 02:42between structure and dynamics,
  • 02:44because neural networks encode, transmit, and process
  • 02:47information via dynamical processes.
  • 02:49For example, the process, the dynamical process
  • 02:53of a neural network is directed by the wiring patterns,
  • 02:56by the structure of that network, and moreover,
  • 02:58if you're talking about some sort of learning process,
  • 03:01then those wiring patterns can change and adapt
  • 03:04during the learning process, so they are themselves dynamic.
  • 03:08In this area, I've been interested in questions like,
  • 03:10how is the flow of information governed
  • 03:12by the wiring pattern,
  • 03:14how do patterns of information flow present themselves
  • 03:17in data, and can they be inferred from that data,
  • 03:19and what types of wiring patterns
  • 03:21might develop during learning.
  • 03:24Answering questions of this type
  • 03:25requires a couple of things.
  • 03:26So the very, very big picture requires a language
  • 03:29for describing structures and patterns,
  • 03:31it requires having a dynamical process,
  • 03:33some sort of model for the neural net,
  • 03:35and it requires a generating model
  • 03:38that generates initial structure,
  • 03:40and links structure to dynamics.
  • 03:42Alternatively, if we don't generate things using a model,
  • 03:45if we have some sort of observable or data,
  • 03:47then we can try to work in the other direction
  • 03:49and go from dynamics to structure.
  • 03:52Today, during this talk, I'm going to be focusing really
  • 03:54on this first piece,
  • 03:55on a language for describing structures and patterns,
  • 03:57and on the second piece,
  • 03:59on an observable that I've been working on
  • 04:01trying to compute to use to try to connect
  • 04:05these three points together.
  • 04:08So, to get started, a little bit of biology.
  • 04:10Really, I was inspired in this project by a paper
  • 04:13from Keiji Miura.
  • 04:14He was looking at a coupled oscillator model,
  • 04:17this was a Kuramoto model for neural activity
  • 04:20where the firing dynamics interact with the wiring.
  • 04:22So the wiring that couples the oscillators
  • 04:26would adapt on a slower timescale
  • 04:29than the oscillators themselves,
  • 04:31and that adaptation could represent
  • 04:34different types of learning processes.
  • 04:36For example, the fire-together wire-together rules,
  • 04:40so Hebbian learning,
  • 04:41you could look at causal learning rules,
  • 04:43or anti-Hebbian learning rules.
  • 04:45This is just an example of one, of the system.
  • 04:48This system of (indistinct) is sort of interesting
  • 04:50because it can generate all sorts of different patterns.
  • 04:52You can see synchronized firing,
  • 04:54you can see traveling waves,
  • 04:55you can see chaos.
  • 04:57These occur at different critical boundaries.
  • 04:59So you can see phase transmissions
  • 05:01when you have large collections of these oscillators.
  • 05:04And depending on how they're coupled together,
  • 05:05it behaves differently.
  • 05:07In particular, what's interesting here is that
  • 05:10starting from some random seed topology,
  • 05:13the dynamics play forward from that initial condition,
  • 05:16and that random seed topology
  • 05:18produces an ensemble of wiring patterns
  • 05:20that are themselves random.
  • 05:22And we can think of that ensemble of wiring patterns
  • 05:24as being chaotic realizations of some random initialization.
  • 05:29That said, you can also observe structures
  • 05:32within the systems of coupled oscillators.
  • 05:33So you can see large scale cyclic structures
  • 05:36representing organized causal firing patterns
  • 05:38in certain regimes.
  • 05:40So this is a nice example
  • 05:41where graph structure and learning parameters
  • 05:43can determine dynamics, and in turn,
  • 05:44where those dynamics can determine structure.
  • 05:48On the other side, you can also think about
  • 05:49a data-driven side instead of a model-driven side.
  • 05:53If we empirically observe sample trajectories
  • 05:56of some observables, for example, neuron recordings,
  • 05:58then we might hope to infer something
  • 05:59about the connectivity that generates them.
  • 06:01And so here, instead of starting by posing a model,
  • 06:04and then simulating it and setting up how it behaves,
  • 06:06we can instead study data,
  • 06:07or try to study structure in data.
  • 06:10Often, that data comes in the form of covariance matrices
  • 06:12representing firing rates.
  • 06:14And these covariance matrices
  • 06:15may be auto covariance matrices with some sort of time-lag.
  • 06:19Here, there are a couple of standard structural approaches.
  • 06:22So there's a motivic expansion approach.
  • 06:25This was at least introduced by Brent Doiron's lab,
  • 06:28with his student, Gabe Ocker.
  • 06:30Here, the idea is that you define some graph motifs,
  • 06:34and then you can study the dynamics
  • 06:36in terms of those graph motifs.
  • 06:38For example, if you build a power series in those motifs,
  • 06:41then you can try to represent your covariance matrices
  • 06:44in terms of that power series.
  • 06:45And this is something we're gonna talk about
  • 06:46quite a bit today.
  • 06:47This really, part of why I was inspired by this work is
  • 06:49I had been working separately
  • 06:51on the idea of looking at covariance matrices
  • 06:53in terms of these power series expansions.
  • 06:56This is also connected to topological data analysis,
  • 06:59and this is where the conversations with Lek-Heng
  • 07:01played a role in this work.
  • 07:03Topological data analysis aims to construct graphs
  • 07:07representing dynamical systems.
  • 07:08For example, you might look at the dynamical similarity
  • 07:11of firing patterns of certain neurons,
  • 07:13and then try to study the topology of those graphs.
  • 07:18Again, this leads to similar questions,
  • 07:20but we could be a little bit more precise here
  • 07:21for thinking in neuroscience.
  • 07:24We can say more precisely, for example,
  • 07:25how is information processing and transfer represented,
  • 07:29both in these covariance matrices
  • 07:31and the structures that we hope to extract from them?
  • 07:33In particular, can we try and infer causality
  • 07:36from firing patterns?
  • 07:39And this is fundamentally an information theoretic question.
  • 07:42And really, we're asking,
  • 07:43can we study the directed exchange of information
  • 07:45from trajectories?
  • 07:47Here, one approach, I mean, in some sense,
  • 07:49you can never tell causality without some underlying model,
  • 07:53without some underlying understanding and mechanism,
  • 07:56so if all we can do is observe,
  • 07:58then we need to define what we mean by causality.
  • 08:01A reasonable standard definition here is Wiener causality,
  • 08:04which says that two time series share a causal relation,
  • 08:06so we say x causes y,
  • 08:08if the history of x informs the future of y.
  • 08:12And note that here, cause, I put in quotes,
  • 08:14really means forecasts.
  • 08:16It means that the past, or the present of x,
  • 08:18carries relevant information about the future of y.
  • 08:22A natural measure of that information is transfer entropy.
  • 08:26Transfer entropy was introduced by Schrieber in 2000,
  • 08:30and is the expected KL divergence
  • 08:32between the distribution of the future of y
  • 08:35given the history of x,
  • 08:38and the marginal distribution of the future of y.
  • 08:41So essentially, it's how much predictive information
  • 08:43does x carry about y.
  • 08:46This is a nice quantity for a couple of reasons.
  • 08:48First, it's zero when two trajectories are independent.
  • 08:51Second, since it's just defining
  • 08:53some of these conditional distributions, it's model free,
  • 08:56so I put here no with a star,
  • 08:58because generative assumptions actually do matter
  • 09:01when you go to try and compute it, but in principle,
  • 09:02it's defined independent of the model.
  • 09:05Again, unlike some other effective causality measures,
  • 09:07it doesn't require introducing some time-lag to define.
  • 09:11It's a naturally directed quantity.
  • 09:13We can say that the future of y
  • 09:15conditioned on the past of x...
  • 09:17That transfer entropy is defined in terms of the future of y
  • 09:20conditioned on the past of x and y.
  • 09:23And that quantity is directed, because reversing x and y
  • 09:27does not symmetrically change the statement.
  • 09:30This is different than quantities like mutual information
  • 09:32or correlation, that are also often used
  • 09:34to try and measure effective connectivity in networks
  • 09:37which are fundamentally symmetric quantities.
  • 09:41Transfer entropy is also less corrupted
  • 09:43by measurement noise, linear mixing of signals,
  • 09:46or shared coupling to external sources.
  • 09:50Lastly, and maybe most interestingly,
  • 09:52if we think in terms of correlations,
  • 09:54correlations are really moments,
  • 09:56correlations are really about covariances, right,
  • 09:57second order moments.
  • 09:59Transfer entropies, these are about entropies,
  • 10:01these are logs of distributions,
  • 10:04and so they depend on the full shape of these distributions.
  • 10:06Which means that transfer entropy can capture coupling
  • 10:10that is maybe not apparent, or not obvious
  • 10:13just looking at second order moment type analysis.
  • 10:17So transfer entropy has been applied pretty broadly.
  • 10:20It's been applied to spiking cortical networks
  • 10:22and calcium imaging,
  • 10:24to MEG data in motor tasks and auditory discrimination,
  • 10:29it's been applied to emotion recognition,
  • 10:31precious metal prices
  • 10:32and multivariate time series forecasting,
  • 10:34and more recently, to accelerate learning
  • 10:36in different artificial neural nets.
  • 10:38So you can look at feedforward architectures,
  • 10:40convolutional architectures, even recurrent neural nets.
  • 10:42And transfer entropy has been used to accelerate learning
  • 10:45in those frameworks.
  • 10:49For this part of the talk,
  • 10:50I'd like to focus really on two questions.
  • 10:52First, how do we compute transfer entropy,
  • 10:55and then second, if we could compute transfer entropy,
  • 10:58and build a graph out of that,
  • 11:00how would we study the structure of that graph?
  • 11:01Essentially, how is information flow structured?
  • 11:05We'll start with computing transfer entropy.
  • 11:09To compute transfer entropy,
  • 11:10we actually need to write down an equation.
  • 11:13So transfer entropy was originally introduced
  • 11:14for discrete time arbitrary order Markov processes.
  • 11:18So suppose we have two Markov processes X and Y.
  • 11:20And we'll let Xn denote the state of process X at time n,
  • 11:25and Xnk, where the k is in superscript,
  • 11:27to denote the sequence starting from n minus k plus 1
  • 11:31going up to n.
  • 11:32So that's the last k states that the process X visited.
  • 11:37Then, the transfer entropy from Y to X,
  • 11:40they're denoted T, Y over to X,
  • 11:44is the entropy of the future of X, conditioned on its past,
  • 11:50minus the entropy of the future of X conditioned on its past
  • 11:54and the past of the trajectory Y.
  • 11:56So here, you can think the transfer entropy
  • 11:57is essentially the reduction in entropy
  • 11:59of the future states of X when incorporating the past of Y.
  • 12:03This means that computing transfer entropy
  • 12:05reduces to estimating essentially these entropies.
  • 12:07That means we need to estimate essentially
  • 12:09the conditional distributions inside of these parentheses.
  • 12:14That's easy for certain processes, so for example,
  • 12:16if X and Y are Gaussian processes,
  • 12:19then really what we're trying to compute
  • 12:20is conditional mutual information,
  • 12:22and there are nice equations
  • 12:23for conditional mutual information
  • 12:25when you have Gaussian random variables.
  • 12:26So if I have three Gaussian random variables, X, Y, Z,
  • 12:29possibly multivariate, with joint covariant sigma,
  • 12:33then the conditional mutual information
  • 12:35between these variables, so the mutual information
  • 12:37between X and Y conditioned on Z,
  • 12:39is just given by this ratio of log determinants
  • 12:42of those convariances.
  • 12:45In particular, a common test model used
  • 12:48in the transfer entropy literature
  • 12:51are linear auto-regressive processes,
  • 12:53because a linear auto-regressive process,
  • 12:55when perturbed by Gaussian noise,
  • 12:57produces a Gaussian process.
  • 12:58All of the different joint marginal
  • 13:00conditional distributions are all Gaussian,
  • 13:02which means that we can compute
  • 13:03these covariances analytically, which then means
  • 13:06that you can compute the transfer entropy analytically.
  • 13:07So these linear auto-regressive processes
  • 13:09are nice test cases,
  • 13:10'cause you can do everything analytically.
  • 13:12They're also somewhat disappointing, or somewhat limiting,
  • 13:15because in this linear auto-regressive case,
  • 13:17transfer entropy is the same as Granger causality.
  • 13:22And in this Gaussian case, essentially what we've done
  • 13:25is we've reduced transfer entropy
  • 13:27to a study of time-lagged correlations.
  • 13:29So this becomes the same as a correlation based analysis.
  • 13:32We can't incorporate information beyond the second moments
  • 13:34if we restrict ourselves to Gaussian processes,
  • 13:36or Gaussian approximations.
  • 13:39The other thing to note is this is strongly model dependent,
  • 13:41because this particular formula
  • 13:43for computing mutual information
  • 13:44depends on having Gaussian distributions.
  • 13:50In a more general setting, or a more empirical setting,
  • 13:53you might observe some data.
  • 13:55You don't know if that data
  • 13:56comes from some particular process,
  • 13:58and you can't necessarily assume
  • 13:59the conditional distribution is Gaussian.
  • 14:01But we would still like to estimate transfer entropy,
  • 14:03which leads to the problem of estimating transfer entropy
  • 14:06given an observed time series.
  • 14:08We would like to do this, again, sans model assumptions,
  • 14:11so we don't want to assume Gaussianity.
  • 14:13This is sort of trivial, again, I star that,
  • 14:16in discrete state spaces,
  • 14:17because essentially it amounts to counting occurrences.
  • 14:20But it becomes difficult whenever the state spaces are large
  • 14:23and/or high dimensional, as they often are.
  • 14:26This leads to a couple of different approaches.
  • 14:28So, as a first example, let's consider spike train data.
  • 14:32So spike train data consists, essentially,
  • 14:34of binning the state of a neuron into either on or off.
  • 14:39So neurons, you can think either in a state zero or one.
  • 14:41And then a pairwise calculation for transfer entropy
  • 14:44only requires estimating a joint probability distribution
  • 14:48over 4 to the k plus l states, where k plus l,
  • 14:51k is the history of x that we remember,
  • 14:54and l is the history of y.
  • 14:56So if the Markov process generating the spike train data
  • 15:01is not of high order, does not have a long memory,
  • 15:04then these k and l can be small,
  • 15:06and this state space is fairly small,
  • 15:08so this falls into that first category,
  • 15:10when we're looking at a discrete state space
  • 15:12and it's not too difficult.
  • 15:15In a more general setting, if we don't try to bin the states
  • 15:18of the neurons to on or off,
  • 15:19for example, maybe we're looking at a firing rate model,
  • 15:22where we want to look at the firing rates of the neurons,
  • 15:24and that's a continuous random variable,
  • 15:27then we need some other types of estimators.
  • 15:29So the common estimator used here
  • 15:31is a kernel density estimator, or KSG estimator.
  • 15:34And this is designed for large, continuous,
  • 15:36or high dimensional state spaces,
  • 15:37e.g., these firing rate models.
  • 15:40Typically, the approach is to employ a Takens delay map,
  • 15:43which embeds your high dimensional data
  • 15:45in some sort of lower dimensional space,
  • 15:48that tries to capture the intrinsic dimension
  • 15:50of the attractor that your dynamic process settles onto.
  • 15:55And then you try to estimate an unknown density
  • 15:57based on this delay map using a k-nearest neighbor
  • 16:00kernel density estimate.
  • 16:01The advantage of this sort of k-nearest neighbor
  • 16:04kernel density is it dynamically adapts
  • 16:06the width of the kernel given your sample density.
  • 16:09And this has been implemented in some open source toolkits.
  • 16:11These are the toolkits that we've been working with.
  • 16:15So we've tested this on a couple of different models.
  • 16:18And really, I'd say this work,
  • 16:19this is still very much work in progress,
  • 16:20this is work that Bowen was developing over the summer.
  • 16:23And so we developed a couple of different models to test.
  • 16:26The first were these linear auto-regressive networks,
  • 16:29and we just used these
  • 16:30to test the accuracy of the estimators,
  • 16:32because everything here is Gaussian, so you can compute
  • 16:34the necessary transfer entropies analytically.
  • 16:37The next, more interesting class of networks
  • 16:39are threshold linear networks, or TLNs.
  • 16:42These are a firing rate model, where your rate, r,
  • 16:44obeys this stochastic differential equation.
  • 16:47So the rate of change in the rate, dr(t), is...
  • 16:51So you have sort of a leaf term, -r(t), and then plus,
  • 16:55here, this is essentially a coupling,
  • 16:57all of this is inside here, the brackets with a plus,
  • 17:00this is like a (indistinct) function,
  • 17:02so this is just taking the positive part
  • 17:04of what's on the inside.
  • 17:05Here, b is an activation threshold,
  • 17:08W is a wiring matrix, and then r are those rates again.
  • 17:11And then C here, that's essentially covariants
  • 17:13for some noise term perturbing this process.
  • 17:17We use these TLNs to test the sensitivity
  • 17:19of our transfer entropy estimators
  • 17:21to common and private noise sources as you change C,
  • 17:24as well as how well the transfer entropy network agrees
  • 17:27with the wiring matrix.
  • 17:31A particular class of TLNs that were really nice
  • 17:33for these experiments are called
  • 17:35combinatorial threshold linear networks.
  • 17:37These are really pretty new,
  • 17:38these were introduced by Carina Curto's lab this year.
  • 17:42And really, this was inspired by a talk I'd seen her give
  • 17:47at FACM in May.
  • 17:49These are threshold linear networks
  • 17:51where the weight matrix here, W,
  • 17:52representing the wiring of the neurons,
  • 17:55is determined by a directed graph G.
  • 17:58So you start with some directed graph G,
  • 18:00that's what's shown here on the left.
  • 18:01This figure is adapted from Carina's paper,
  • 18:03this is a very nice paper
  • 18:04if you'd like to take a look at it.
  • 18:07And if i and j are not connected,
  • 18:10then the weight matrix is assigned one value,
  • 18:12and if they are connected, then it's assigned another value.
  • 18:14And the wiring is zero if i equals j.
  • 18:18These networks are nice if we want to test
  • 18:20structural hypotheses, because it's very easy to predict
  • 18:24from the input graph how the output dynamics
  • 18:27of the network should behave.
  • 18:28They're a really beautiful analysis
  • 18:30that Carina does in this paper to show
  • 18:32that you can produce all these different
  • 18:33interlocking patterns of limit cycles,
  • 18:35and multi-step states, and chaos,
  • 18:37and all these nice patterns,
  • 18:38and you can design them
  • 18:39by picking these nice directed graphs.
  • 18:44The last class of networks that we've built to test
  • 18:46are leaky-integrate and fire networks.
  • 18:48So here, we're using a leaky integrate and fire model,
  • 18:51where our wiring matrix W is drawn randomly,
  • 18:54it's block-stochastic,
  • 18:57which means that it's Erdos-Renyi between blocks.
  • 19:00And it's a balanced network,
  • 19:02so we have excitatory and inhibitory neurons
  • 19:04that talk to each other and maintain a balance
  • 19:08in the dynamics here.
  • 19:09The hope is to pick a large enough scale network
  • 19:11that we see properly chaotic dynamics
  • 19:13using this leaky integrate and fire model.
  • 19:17These tests have yielded fairly mixed results.
  • 19:21So the simple tests behave as expected.
  • 19:24So the estimators that are used are biased,
  • 19:27and the bias typically decays slower
  • 19:29than the variance estimation,
  • 19:30which means that you do need fairly long trajectories
  • 19:32to try to properly estimate the transfer entropy.
  • 19:36That said, transfer entropy does correctly identify
  • 19:38causal relationships in simple graphs,
  • 19:40and transfer entropy matches the underlying structure
  • 19:44used in combinatorial threshold linear networks, so CTLN.
  • 19:49Unfortunately, these results did not carry over as cleanly
  • 19:52to the leaky integrate and fire models,
  • 19:54or to larger models.
  • 19:56So what I'm showing you on the right here,
  • 19:58this is a matrix where we've calculated
  • 20:00the pairwise transfer entropy
  • 20:02between all neurons in a 150 neuron balanced network.
  • 20:06This is shown absolute, this is shown in the log scale.
  • 20:09And the main thing I want to highlight, first,
  • 20:11taking a look at this matrix,
  • 20:12it's very hard to see exactly what the structure is.
  • 20:15You see this banding?
  • 20:17That's because neurons tend to be highly predictive
  • 20:20if they fire a lot.
  • 20:21So there's a strong correlation
  • 20:22between the transfer entropy between x and y,
  • 20:25and just the activity level of x.
  • 20:29But it's hard to distinguish blockwise differences,
  • 20:31for example, between inhibitory neurons, excitatory neurons,
  • 20:34and that really takes plotting out,
  • 20:36so here, this box and whisker plot on the bottom,
  • 20:39this is showing you if we group entries of this matrix
  • 20:43by type of connection.
  • 20:44So maybe excitatory to excitatory,
  • 20:45or inhibitor to excitatory, or so on,
  • 20:48that the distribution of realized transfer entropy
  • 20:50is really different,
  • 20:52but they're different in sort of subtle ways.
  • 20:54So in this larger scale balanced network,
  • 20:58it's much less clear whether transfer entropy
  • 21:02effectively is equated in some way
  • 21:05with the true connectivity or wiring.
  • 21:09In some ways, this is not a surprise,
  • 21:10because the behavior of the balanced networks
  • 21:12is inherently balanced,
  • 21:13and Erdos-Renyi is inherently in the structure.
  • 21:16But there are ways in which these experiments have revealed
  • 21:19confounding factors that are conceptual factors
  • 21:22that make transfer entropies not an ideal measure,
  • 21:25or maybe not as ideal as it seems,
  • 21:28given the start of this talk.
  • 21:29So for example, suppose two trajectories X and Y
  • 21:33are both strongly driven by a third trajectory Z,
  • 21:36but X responds to Z first.
  • 21:39Well, then the present information about X,
  • 21:40or the present state of X,
  • 21:41carries information about the future of Y,
  • 21:43so X is predictive of Y.
  • 21:45So X forecasts Y, so in the transfer entropy
  • 21:47or Wiener causality setting, we would say X causes Y,
  • 21:51even if X and Y are only both responding to Z.
  • 21:54So here, in this example, suppose you have a directed tree
  • 21:58where information or dynamics propagate down the tree.
  • 22:02If you look at this node here, Pj and i,
  • 22:07Pj will react to essentially information
  • 22:11traveling down this tree before i does,
  • 22:13so Pj would be predictive for i,
  • 22:15so we would observe an effective connection,
  • 22:19where Pj forecasts i.
  • 22:21Which means that neurons that are not directly connected
  • 22:23may influence each other, and that this transfer entropy,
  • 22:26really, you should think of in terms of forecasting,
  • 22:29not in terms of being a direct analog to the wiring matrix.
  • 22:33One way around this is to condition
  • 22:35on the state of the rest of the network
  • 22:37before you start doing some averaging.
  • 22:39This leads to some other notions of entropy,
  • 22:41so, for example, causation entropy,
  • 22:42and this is sort of a promising direction,
  • 22:44but it's not a direction we've had time to explore yet.
  • 22:47So that's the estimation side.
  • 22:49Those are the tools for estimating transfer entropy.
  • 22:52Now, let's switch gears
  • 22:53and talk about that second question I introduced,
  • 22:55which is essentially, how do we analyze structure.
  • 22:57Suppose we could calculate a transfer entropy graph.
  • 23:00How would we extract structural information from that graph?
  • 23:04And here, I'm going to be introducing some tools
  • 23:06that I've worked on for a while
  • 23:08for describing random structures and graphs.
  • 23:11These are tied back to some work I've really done
  • 23:15as a graduate student, and conversations with Lek-Heng.
  • 23:18So we start in a really simple context,
  • 23:19we just have a graph or network.
  • 23:21This could be directed or undirected,
  • 23:23and we're gonna require that it does not have self-loops,
  • 23:24and that it's finite.
  • 23:26We'll let V here be the number of vertices,
  • 23:28and E be the number of edges.
  • 23:30Then the object of study that we'll introduce
  • 23:33is something called an edge flow.
  • 23:34An edge flow is essentially a function
  • 23:35on the edges of the graph.
  • 23:37So this is a function that accepts pairs of endpoints
  • 23:40and returns a real number.
  • 23:42And this is an alternating function,
  • 23:43so if I take f(i, j), that's negative f(j, i),
  • 23:47because you can think of f(i, j) as being some flow,
  • 23:49like a flow of material between i and j,
  • 23:52hence this name, edge flow.
  • 23:54This is analogous to a vector field,
  • 23:56because this is analogous to the structure of a vector field
  • 23:58on the graph,
  • 23:59and represents some sort of flow between nodes.
  • 24:02Edge flows are really sort of generic things.
  • 24:04So you can take this idea of an edge flow
  • 24:07and apply it in a lot of different areas,
  • 24:09because really all you need
  • 24:10is you just need the structure of some alternating function
  • 24:12on the edges of a graph.
  • 24:13So I've read papers,
  • 24:16and worked in a bunch of these different areas.
  • 24:19Particularly, I've focused on applications of this
  • 24:21in game theory, in pairwise and social choice settings,
  • 24:25in biology and Markov chains.
  • 24:26And a lot of this project has been attempting
  • 24:28to take this experience working with edge flows in,
  • 24:31for example, say, non-equilibrium thermodynamics,
  • 24:34or looking at pairwise preference data,
  • 24:36and looking at a different application area
  • 24:38here to neuroscience.
  • 24:40Really, you can you think about the edge flow,
  • 24:42or relevant edge flow in neuroscience,
  • 24:43you might be asking about asymmetries in wiring patterns,
  • 24:46or differences in directed influence or causality,
  • 24:49or, really, you can think about
  • 24:50these transfer entropy quantities.
  • 24:51This is why I was excited about transfer entropy.
  • 24:53Transfer entropy is inherently directed notion
  • 24:56of information flow, so it's natural to think that
  • 24:59if you can calculate things like the transfer entropy,
  • 25:01then really, what you're studying is some sort of edge flow
  • 25:04on a graph.
  • 25:06Edge flows often are subject to the same common questions.
  • 25:10So if I want to analyze the structure of an edge flow,
  • 25:12there's some really big global questions
  • 25:14that I would often ask,
  • 25:15that get asked in all these different application areas.
  • 25:19One common question is,
  • 25:20well, does the flow originate somewhere and end somewhere?
  • 25:23Are there sources and sinks in the graph?
  • 25:25Another is, does it circulate?
  • 25:26And if it does circulate, on what scales, and where?
  • 25:31If you have a network that's connected
  • 25:33to a whole exterior network, for example,
  • 25:35if you're looking at some small subsystem
  • 25:37that's embedded in a much larger system,
  • 25:38as is almost always the case in neuroscience,
  • 25:41then you also need to think about
  • 25:42what passes through the network.
  • 25:43So, is there a flow or current
  • 25:45that moves through the boundary of the network,
  • 25:47and is there information that flows through
  • 25:50the network that you're studying?
  • 25:52And in particular, if we have these different types of flow,
  • 25:55if flow can originate in source and end in sinks,
  • 25:57if it can circulate, if it can pass through,
  • 25:59can we decompose the flow into pieces that do each of these
  • 26:03and ask how much of the flow does 1, 2, or 3?
  • 26:07Those questions lead to a decomposition.
  • 26:11So here, we're going to start with a simple idea.
  • 26:13We're going to decompose an edge flow
  • 26:15by projecting it onto orthogonal subspaces
  • 26:17associated with some graph operators.
  • 26:20Generically, if we consider two linear operators,
  • 26:23A and B, where the product A times B equals zero,
  • 26:27then the range of B must be contained
  • 26:29in the null space of A,
  • 26:31which means that I can express
  • 26:33essentially any set of real numbers,
  • 26:35so you can think of this as being
  • 26:36the vector space of possible edge flows,
  • 26:39as a direct sum of the range of B,
  • 26:43the range of A transpose,
  • 26:45and the intersection of the null space of B transpose
  • 26:47and the null space of A.
  • 26:48This blue subspace, this is called the harmonic space,
  • 26:53and this is trivial in many applications
  • 26:58if you choose A and B correctly.
  • 27:00So there's often settings where you can pick A and B
  • 27:02so that these two null spaces have no intersection,
  • 27:06and then this decomposition boils down
  • 27:08to just separating a vector space into the range of B
  • 27:13and the range of A transpose.
  • 27:16In the graph setting, our goal is essentially
  • 27:18to pick these operators to be meaningful things,
  • 27:20that is, to pick graph operators
  • 27:22so that these subspaces carry a meaningful,
  • 27:26or carry meaning in the structural context.
  • 27:30So let's think a little bit about graph operators here.
  • 27:33So, let's look at two different classes of operators.
  • 27:35So we can consider matrices that have E rows and n columns,
  • 27:40or matrices that have l rows and E columns,
  • 27:43where again, E is the number of edges in this graph.
  • 27:48If I have a matrix with E rows,
  • 27:50then each column with a matrix has as many entries
  • 27:53as there are edges in the graph,
  • 27:55so it can be thought of as itself an edge flow.
  • 27:57So you can think that this matrix
  • 27:59is composed of a set of columns,
  • 28:00where each column is some particular motivic flow,
  • 28:03or flow motif.
  • 28:05In contrast, if I look at a matrix where I have E columns,
  • 28:09then each row of the matrix is a flow motif,
  • 28:11so products against M evaluate inner products
  • 28:16against specific flow motifs.
  • 28:18That means in this context,
  • 28:20if I look at the range of this matrix,
  • 28:21this is really a linear combination
  • 28:23of a specific subset of flow motifs,
  • 28:25and in this context,
  • 28:26if I look at the null space of the matrix,
  • 28:28I'm looking at all edge flows orthogonal
  • 28:30to that set of flow motifs.
  • 28:32So here, if I look at the range of a matrix with E rows,
  • 28:36that subspace is essentially modeling behavior
  • 28:39similar to the motifs, so if I pick a set of motifs
  • 28:42that flow out of a node, or flow into a node,
  • 28:45then this range is going to be a subspace of edge flows
  • 28:48that tend to originate in sources and end in sinks.
  • 28:51In contrast, here, the null space of M,
  • 28:54that's all edge flows orthogonal to the flow motifs,
  • 28:57so it models behavior distinct from the motifs.
  • 28:59Essentially, this space asks what doesn't the flow do,
  • 29:02whereas this space asks what does the flow do.
  • 29:07Here is a simple, very classical example.
  • 29:09And this goes all the way back to, you can think,
  • 29:11like Kirchhoff electric circuit theory.
  • 29:14We can define two operators.
  • 29:15Here, G, this is essentially a gradient operator.
  • 29:18And if you've taken some graph theory, you might know this
  • 29:20as the edge incidence matrix.
  • 29:22This is the matrix which essentially records
  • 29:25the endpoints of an edge,
  • 29:26and evaluates differences across it.
  • 29:29So for example, if I look at this first row of G,
  • 29:33this corresponds to edge I in the graph,
  • 29:35and if I had a function defined on the nodes in the graph,
  • 29:39products with G would evaluate differences across this edge.
  • 29:43If you look at its columns,
  • 29:44each column here is a flow motif.
  • 29:46So for example, this highlighted second column,
  • 29:49this is entries 1, -1, 0, -1,
  • 29:52if you carry those back to the edges,
  • 29:53that corresponds to this specific flow motif.
  • 29:56So here, this gradient, it's adjoint,
  • 29:58so essentially a divergence operator,
  • 30:00which means that the flow motifs are unit in flows
  • 30:03or unit out flows for specific nodes,
  • 30:05like what's shown here.
  • 30:07You can also introduce something like a curl operator.
  • 30:10The curl operator evaluates path sums around loops.
  • 30:13So this row here, for example, this is a flow motif
  • 30:16corresponding to the loop labeled A in this graph.
  • 30:20You could certainly imagine other operators
  • 30:22build other motifs.
  • 30:23These operators are particularly nice,
  • 30:25because they define principled subspaces.
  • 30:28So if we apply that generic decomposition,
  • 30:31then we could say that the space of possible edge flows, RE,
  • 30:34can be decomposed into the range of the gradient operator,
  • 30:37the range of the curl transpose,
  • 30:39and the intersection of their null spaces,
  • 30:42into this harmonic space.
  • 30:44This is nice, because the range of the gradient,
  • 30:46that's flows that start and end somewhere,
  • 30:48those are flows that are associated
  • 30:49with motion (indistinct) potential.
  • 30:52So these, if you're thinking physics,
  • 30:53you might say that these are conservative,
  • 30:55these are flows generated by voltage
  • 30:57if you're looking at an electric circuit.
  • 30:59These cyclic flows, while these are the flows and range
  • 31:01of the curl transpose, and then this harmonic space,
  • 31:04those are flows that enter and leave the network
  • 31:06without either starting or ending at a sink or a source,
  • 31:10or circulating.
  • 31:11So you can think that really,
  • 31:12this decomposes the space of edge flows
  • 31:14into flows that start and end somewhere inside the network,
  • 31:17flows that circulate within the network,
  • 31:19and flows that do neither,
  • 31:20i.e. flows that enter and leave the network.
  • 31:22So this accomplishes that initial decomposition
  • 31:25I'd set out at the start.
  • 31:28Once we have this decomposition,
  • 31:29then we can evaluate the sizes of the components
  • 31:33of the decomposition to measure how much of the flow
  • 31:36starts and ends somewhere, how much circulates, and so on.
  • 31:39So, we can introduce these generic measures,
  • 31:41where given some operator M,
  • 31:44we decompose the space of edge flows
  • 31:46into the range of M and the null space of M transpose,
  • 31:49which means we can project f onto these subspaces,
  • 31:52and then evaluate the sizes of these components,
  • 31:55and that's a way of measuring how much of the flow
  • 31:58behaves like the flow motifs contained in this operator,
  • 32:01and how much doesn't.
  • 32:04So, yeah.
  • 32:05So that lets us answer this question,
  • 32:07and this is the tool that we're going to be using
  • 32:09as our measurable.
  • 32:12Now, that's totally easy to do,
  • 32:16if you're given a fixed edge flow and a fixed graph.
  • 32:17If you have a fixed graph, you can build your operators,
  • 32:19you choose the motifs, you have fixed edge flow,
  • 32:22you just project the edge flow onto the subspaces,
  • 32:24span by those operators, and you're done.
  • 32:27However, there are many cases where
  • 32:30it's worth thinking about a distribution of edge flows,
  • 32:33and then expected structures given that distribution.
  • 32:37So here, we're going to be considering random edge flows,
  • 32:39for example, an edge flow of capital F.
  • 32:41Here, I'm using capital letters to denote random quantities
  • 32:43sampled from an edge flow distribution.
  • 32:45So this is the distribution of possible edge flows.
  • 32:47And this is worth thinking about
  • 32:48because many generative models are stochastic.
  • 32:51They may involve some random seed,
  • 32:53or they may, for example, like that neural model,
  • 32:55or a lot of these sort of neural models, be chaotic,
  • 32:58so even if they are deterministic generative models,
  • 33:01the output data behaves as though it's been sampled
  • 33:03from a distribution.
  • 33:05On the empirical side, for example,
  • 33:07when we're estimating transfer entropy,
  • 33:09or estimating some information flow,
  • 33:11then there's always some degree of measurement error,
  • 33:13or uncertainty in the estimate,
  • 33:15which really means that from a Bayesian perspective,
  • 33:18we should be thinking that our estimator
  • 33:21is a point estimate drawn from some posterior distribution
  • 33:24of edge flows, and that we're back in the setting where,
  • 33:25again, we need to talk about a distribution.
  • 33:28Lastly, this random edge flow setting is also
  • 33:31really important if we want to compare the null hypotheses.
  • 33:35Because often, if you want to compare
  • 33:37to some sort of null hypothesis,
  • 33:38it's helpful to have an ensemble of edge flows
  • 33:41to compare against,
  • 33:43which means that we would like to be able to talk about
  • 33:44expected structure under varying distributional assumptions.
  • 33:50If we can talk meaningfully about random edge flows,
  • 33:54then really what we can start doing
  • 33:56is we can start bridging the expected structure
  • 33:59back to the distribution.
  • 34:00So what we're looking for
  • 34:01is a way of explaining generic expectations
  • 34:05of what the structure will look like
  • 34:07as we vary this distribution of edge flows.
  • 34:10You could think that a particular dynamical system
  • 34:13generates a wiring pattern, that generates firing dynamics,
  • 34:19those firing dynamics determine
  • 34:21some sort of information flow graph,
  • 34:23and then that information flow graph
  • 34:25is really a sample from that generative model,
  • 34:28and we would like to be able to talk about
  • 34:30what would we expect
  • 34:32if we knew the distribution of edge flows
  • 34:34about the global structure.
  • 34:35That is, we'd like to bridge global structure
  • 34:37back to this distribution.
  • 34:39And then, ideally, you'd bridge that distribution
  • 34:41back to the generative mechanism.
  • 34:42And this is a project for future work.
  • 34:45Obviously, this is fairly ambitious.
  • 34:47However, this first point is something you can do
  • 34:51really in fairly explicit detail,
  • 34:53and that's what I would like to spell out
  • 34:54with the end of this talk,
  • 34:55is how do you bridge global structure
  • 34:58back to a distribution of edge flows.
  • 35:02So yeah.
  • 35:03So that's our main question.
  • 35:05How does the choice of distribution
  • 35:06influence the expected global flow structure?
  • 35:12So first, let's start with a lemma.
  • 35:15Suppose that we have a distribution of edge flows
  • 35:17with some expectation f bar, and some covariance,
  • 35:20here I'm using double bar V to denote covariance.
  • 35:24We'll let S contained in the set of...
  • 35:27S will be a subspace contained within
  • 35:29the vector space of edge flows,
  • 35:31and we'll let PS be the orthogonal projector onto S.
  • 35:35Then FS, that's the projection of F onto this subspace S,
  • 35:40the expectation of its norm squared
  • 35:43is the norm of the expected flow projected onto S squared,
  • 35:48so this is essentially the expectation of the sample
  • 35:53is the measure evaluated with the expected sample.
  • 35:56And then plus a term that involves an inner product
  • 35:58between the projector on the subspace
  • 36:00and the covariance matrix for the edge flows.
  • 36:02Here, this denotes the matrix inner product,
  • 36:04so is just the sum over all ij entries.
  • 36:09What's nice about this formula is,
  • 36:10at least in terms of expectation,
  • 36:13it reduces this study of the bridge
  • 36:17between distribution and network structure
  • 36:20to a study of moments, right?
  • 36:22Because we've replaced a distributional problem here
  • 36:24with a linear algebra problem
  • 36:27that's posed in terms of this projector,
  • 36:29the projector out of the subspace S,
  • 36:31which is determined by the topology of the network.
  • 36:33And the variance in that edge flow,
  • 36:36which is determined by your generative model.
  • 36:40Well, you might say, "Okay, well, fine,
  • 36:42this is a matrix inner product, we can just stop here.
  • 36:44We could compute this projector.
  • 36:45We could sample a whole bunch of edge flows
  • 36:47to compute this covariance.
  • 36:48So you can do this matrix inner product."
  • 36:50But I'm sort of greedy, because I suspect
  • 36:54that you can really do more with this inner product.
  • 36:57So I'd like to highlight some challenges
  • 37:00associated with this inner product.
  • 37:03So first, let's say I asked you to design a distribution
  • 37:06with tuneable global structure.
  • 37:07So for example, I said I want you to
  • 37:09pick a generative model,
  • 37:10or design a distribution of edge flows,
  • 37:12that when I sample edge flows from it,
  • 37:14their expected structures match some expectation.
  • 37:18It's not obvious how to do that given this formula.
  • 37:22It's not obvious in particular, because these projectors,
  • 37:24like the projector onto subspace S,
  • 37:26typically depend in fairly non-trivial ways
  • 37:29on the graph topology.
  • 37:30So small changes in the graph topology
  • 37:32can completely change this projector.
  • 37:34In essence, it's hard to isolate topology from distribution.
  • 37:37You could think that this inner product,
  • 37:39if I think about it in terms of the ij entries,
  • 37:43while easy to compute, is not easy to interpret,
  • 37:47because i and j are somewhat arbitrary indexing.
  • 37:49And obviously, really, the topology of the graph,
  • 37:51it's not encoded in the indexing,
  • 37:53it's encoded in the structure of these matrices.
  • 37:56So in some ways, what we really need
  • 37:57is a better basis for computing this inner product.
  • 38:01In addition, computing this inner product
  • 38:03just may not be empirically feasible,
  • 38:05because it might not be feasible
  • 38:07to estimate all these covariances.
  • 38:08There's lots of settings where,
  • 38:09if you have a random edge flow,
  • 38:11it becomes very expensive to try to estimate
  • 38:13all the covariances in this graph, or sorry,
  • 38:15in this matrix, because this matrix has as many entries
  • 38:19as there are pairs of edges in the graph.
  • 38:22And typically, that number of edges grows fairly quickly
  • 38:26in the number of nodes in the graph.
  • 38:27So in the worst case, the size of these matrices
  • 38:31goes not to the square of the number of nodes in the graph,
  • 38:33but the number of nodes in the graph to the fourth,
  • 38:35so this becomes very expensive very fast.
  • 38:37Again, we could try to address this problem
  • 38:41if we had a better basis for performing this inner product,
  • 38:43because we might hope to be able to truncate
  • 38:46somewhere in that basis,
  • 38:47and use a lower dimensional representation.
  • 38:50So, to build there,
  • 38:52I'm going to show you a particular family of covariances.
  • 38:55We're going to start with a very simple generative model.
  • 38:58So let's suppose that each node of the graph
  • 39:00is assigned some set of attributes,
  • 39:02here, a random vector X, sampled from a...
  • 39:04So you can think of trait space,
  • 39:05a space of possible attributes.
  • 39:07And these are sampled i.i.d.
  • 39:09In addition, we'll assume there exists
  • 39:11an alternating function f,
  • 39:13which accepts pairs of attributes,
  • 39:15and returns a real number.
  • 39:17So this is something that I can evaluate on the endpoints
  • 39:20of an edge, and return an edge flow value.
  • 39:24In this setting,
  • 39:26everything that I've shown you before simplifies.
  • 39:29So if my edge flow F is drawn
  • 39:32by first sampling a set of attributes,
  • 39:34and then plugging those attributes into functions
  • 39:36on the edges, then the mean edge flow is zero,
  • 39:42so that f bar goes away,
  • 39:44and the covariance reduces to this form.
  • 39:46So you get a standard form,
  • 39:47where the covariance and the edge flow
  • 39:49is a function of two scalar quantities,
  • 39:52that's sigma squared and rho,
  • 39:53these are both statistics associated with this function
  • 39:56and the distribution of traits.
  • 39:59And then some matrices, so we have an identity matrix,
  • 40:02and we have this gradient matrix showing up again.
  • 40:05This is really nice, because when you plug it back in,
  • 40:07we try to compute, say,
  • 40:08the expected sizes of the components,
  • 40:13this matrix inner product
  • 40:15that I was complaining about before,
  • 40:17this whole matrix inner product simplifies.
  • 40:19So when you have a variance
  • 40:21that's in this nice, simple, canonical form,
  • 40:23then the expected overall size of the edge flow,
  • 40:26that's just sigma squared, the expected size
  • 40:29projected onto that conservative subspace,
  • 40:32that breaks into this combination
  • 40:35of the sigma squared and the rho,
  • 40:37again, those are some simple statistics.
  • 40:39And then V, E, L, and E, those are just
  • 40:42essentially dimension counting on the network.
  • 40:44So this is the number of vertices, the number of edges,
  • 40:47and the number of loops, the number of loops,
  • 40:48that's the number of edges
  • 40:49minus the number of vertices plus one.
  • 40:52And similarly, the expected cyclic size,
  • 40:55or size of the cyclic component, reduces to,
  • 40:57again, this scalar factor in terms of the statistics,
  • 41:01and some dimension counting topology related quantities.
  • 41:08So this is very nice,
  • 41:09because this allows us to really separate
  • 41:12the role of topology from the role of the generative model.
  • 41:14The generative model determines sigma and rho,
  • 41:17and topology determines these dimensions.
  • 41:22It turns out that the same thing is true
  • 41:26even if you don't sample the edge flow
  • 41:29using this trait approach, but the graph is complete.
  • 41:33So if your graph is complete,
  • 41:34then no matter how you sample your edge flow,
  • 41:37for any edge flow distribution,
  • 41:38exactly the same formulas hold,
  • 41:40you just replace those simple statistics
  • 41:43with estimators for those statistics
  • 41:45given your sampled flow.
  • 41:47And this is sort of a striking result,
  • 41:49because this says that this conclusion
  • 41:51that was linked to some specific generative model
  • 41:54with some very specific assumptions, right,
  • 41:56we assumed it was i.i.d.,
  • 41:57extends to all complete graphs,
  • 41:59regardless of the actual distribution that we sampled from.
  • 42:05Up until this point,
  • 42:06this is kind of just an algebra miracle.
  • 42:09And one of the things I'd like to do at the end of this talk
  • 42:11is explain why this is true,
  • 42:13and show how to generalize these results.
  • 42:16So to build there,
  • 42:17let's emphasize some of the advantages of this.
  • 42:19So first, the advantages of the model,
  • 42:22it's mechanistically plausible in certain settings,
  • 42:24it cleanly separated the role of topology and distribution,
  • 42:28and these coefficients that had to do with topology,
  • 42:30these are just dimensions,
  • 42:31these are non negative quantities,
  • 42:34so it's easy to work out monotonic relationships
  • 42:36between expected structure and simple statistics
  • 42:40of the edge flow distribution.
  • 42:44The fact that you can do that enables more general analysis.
  • 42:47So I'm showing you on the right here,
  • 42:48this is from a different application area.
  • 42:51This was an experiment where we trained a set of agents
  • 42:55to play a game using a genetic algorithm,
  • 42:58and then we looked at the expected sizes
  • 43:00of cyclic and acyclic components
  • 43:02in a tournament among those agents.
  • 43:05And you can actually predict these curves
  • 43:08using this type of structure analysis,
  • 43:10because it was possible to predict the dynamics
  • 43:13of these simple statistics, this sigma and this rho.
  • 43:18So this is a really powerful analytical tool,
  • 43:20but it is limited to this particular model.
  • 43:23In particular, it only models unstructured cycles,
  • 43:26so if you look at the cyclic component
  • 43:27generated by this model,
  • 43:28it just looks like random noise that's been projected
  • 43:31onto the range of the current transpose.
  • 43:34It's limited to correlations on adjacent edges,
  • 43:36so we only generate correlations
  • 43:38on edges that share an endpoint, because you could think
  • 43:40that all of the original random information
  • 43:42comes from the endpoints.
  • 43:44And then, in some ways, it's not general enough.
  • 43:47So it lacks an expressivity.
  • 43:48We can't parameterize all possible expected structures
  • 43:51by picking sigma and rho.
  • 43:54And we lack some notion of sufficiency,
  • 43:56i.e. if the graph is not complete,
  • 43:58then this nice algebraic property,
  • 44:01that it actually didn't matter what the distribution was,
  • 44:03this fails to hold.
  • 44:04So if the graph is not complete,
  • 44:06then projection onto the family of covariances
  • 44:09parameterized in this fashion
  • 44:11changes the expected global structure.
  • 44:15So we would like to address these limitations.
  • 44:17And so our goal for the next part of this talk
  • 44:19is to really generalize these results.
  • 44:21To generalize,
  • 44:22we're going to switch our perspective a little bit.
  • 44:25So I'll recall this formula,
  • 44:27that if we generate our edge flow
  • 44:30by sampling quantities on the endpoints,
  • 44:32and then plugging them into functions on the edges,
  • 44:34then you necessarily get a covariance
  • 44:35that's in this two parameter family,
  • 44:37where I have two scalar quantities
  • 44:39associated with the statistics of the edge flow,
  • 44:41that's this sigma and this rho,
  • 44:42and then I have some matrices
  • 44:43that are associated with the topology of the network
  • 44:45in the subspaces I'm projecting onto.
  • 44:48These are related to a different way
  • 44:51of looking at the graph.
  • 44:52So I can start with my original graph,
  • 44:54and then I can convert it to an edge graph,
  • 44:57where I have one node per edge in the graph,
  • 45:00and nodes are connected if they share an endpoint.
  • 45:04You can then assign essentially signs to these edges
  • 45:07based on whether the edge direction
  • 45:10chosen in the original graph is consistent or inconsistent
  • 45:14at the node that links two edges.
  • 45:16So for example, edges 1 and 2 both point in to this node,
  • 45:20so there's an edge linking 1 and 2
  • 45:22in the edge graph with a positive sign.
  • 45:25This essentially tells you
  • 45:25that the influence of random information
  • 45:30assigned on this node linking 1 and 2
  • 45:33would positively correlate the sample edge flow
  • 45:36on edges 1 and 2.
  • 45:38Then, this form, what this form for covariance matrices says
  • 45:43is that we're looking at families of edge flows
  • 45:46that have correlations on edges sharing an endpoint,
  • 45:49so edges at distance one in this edge graph,
  • 45:51and non-adjacent edges
  • 45:52are entirely independent of each other.
  • 45:56Okay.
  • 45:58So that's essentially what the trait-performance model
  • 46:00is doing, is it's parameterizing
  • 46:02a family of covariance matrices,
  • 46:04where we're modeling correlations at distance one,
  • 46:06but not further in the edge graph.
  • 46:08So then the natural thought
  • 46:09for how to generalize these results is to ask,
  • 46:11"Can we model longer distance correlations to this graph?"
  • 46:15To do so, let's think a little bit
  • 46:17about what this matrix
  • 46:19that's showing up inside the covariances,
  • 46:21so we have a gradient times a gradient transpose.
  • 46:24This is in effect a Laplacian for that edge graph.
  • 46:30And you can do this for other motifs.
  • 46:32If you think about different motif constructions,
  • 46:35essentially if you take a product of M transpose times M,
  • 46:38that will generate something that looks like a Laplacian
  • 46:41or an adjacency matrix for a graph
  • 46:44where I'm assigning nodes to be motifs,
  • 46:47and looking at the overlap of motifs.
  • 46:50And if I look at M times M transpose,
  • 46:52and I'm looking at the overlap of edges via shared motifs.
  • 46:55So these operators you can think about as being Laplacians
  • 46:57for some sort of graph
  • 46:59that's generated from the original graph motifs.
  • 47:04Like any adjacency matrix,
  • 47:06powers of something like G, G transpose minus 2I,
  • 47:11that would model connections along longer paths,
  • 47:14along longer distances in these graphs
  • 47:16associated with motifs, in this case, with the edge graph.
  • 47:20So our thought is, maybe, well,
  • 47:21we could extend this trait performance
  • 47:23family of covariance matrices
  • 47:25by instead of only looking at
  • 47:27a linear combination of an identity matrix and this matrix,
  • 47:31we could look at a power series.
  • 47:32So we could consider combining powers of this matrix.
  • 47:37And this would generate this family of matrices
  • 47:39that are parameterized by some set of
  • 47:41coefficients (indistinct)... <v ->Dr. Strang.</v>
  • 47:43I apologize, I just wanted to remind you
  • 47:46that we have a rather tight time limit,
  • 47:48approximately a couple of minutes.
  • 47:50<v ->Yes, of course.</v>
  • 47:52So here, the idea is to parameterize this family of matrices
  • 47:57by introducing a set of polynomials with coefficients alpha,
  • 48:00and then plugging into the polynomial
  • 48:03the Laplacian that's generated by...
  • 48:06The adjacency matrix generated by the graph motifs
  • 48:09we're interested in.
  • 48:11And that trait performance result,
  • 48:12that was really just looking at the first order case here,
  • 48:14that was looking at a linear polynomial
  • 48:17with these chosen coefficients.
  • 48:20This power series model is really nice analytically.
  • 48:24So if we start with some graph operator M,
  • 48:28and we consider the family of covariance matrices
  • 48:31generated by plugging M, M transpose into some
  • 48:34polynomial and power series,
  • 48:36then this family of matrices
  • 48:39is contained within the span of powers of M, M transpose.
  • 48:45You can talk about this family in terms of combinatorics.
  • 48:48So, for example, if we use that gradient
  • 48:50times gradient transpose minus twice the identity,
  • 48:52then powers of this is essentially, again, path counting,
  • 48:55so this is counting paths of length n.
  • 48:58You can also look at things like the trace of these powers.
  • 49:00So if you look at the trace series,
  • 49:02that's the sequence where you look at the trace of powers
  • 49:05of these, essentially, these adjacency matrices.
  • 49:09This is doing some sort of loop count,
  • 49:11where we're counting loops of different length.
  • 49:14And you can think of this trace series, in some sense,
  • 49:15as controlling amplification of self-correlations
  • 49:19within the sampled edge flow.
  • 49:22Depending on the generative model,
  • 49:23we might want to use different operators
  • 49:25for generating these families.
  • 49:26So for example, going back to that synaptic plasticity model
  • 49:29with coupled oscillators, in this case, using the gradient
  • 49:33to generate the family of covariance matrices
  • 49:35is not really the right structure,
  • 49:37because the dynamics of the model
  • 49:40have these natural cyclic connections.
  • 49:43So it's better to build the power series using the curl.
  • 49:46So depending on your model,
  • 49:47you can adapt this power series family
  • 49:49by plugging in a different graph operator.
  • 49:53Let's see now what happens if we try to compute
  • 49:55the expected sizes of some components
  • 49:58using a power series of this form.
  • 50:00So, if the variance, or covariance matrix for edge flow
  • 50:04is a power series in, for example,
  • 50:06the gradient, gradient transpose,
  • 50:08then the expected sizes of the measures
  • 50:12can all be expressed as linear combinations
  • 50:14of this trace series
  • 50:16and the coefficients of the original polynomial.
  • 50:19For example, the expected cyclic size of the flow
  • 50:21is just the polynomial evaluated at negative two,
  • 50:24multiplied by the number of loops in the graph.
  • 50:26And this, this really generalizes
  • 50:28that trait performance result,
  • 50:29because the trait performance result
  • 50:30is given by restricting these polynomials to be linear.
  • 50:36This, you can extend to other bases.
  • 50:41But really, what this accomplishes
  • 50:43is by generalizing trait performance,
  • 50:45we achieve this generic properties that it failed to have.
  • 50:52So in particular, if I have an edge flow subspace S
  • 50:56spanned by the flow motifs stored in some operator M,
  • 50:59then this power series family of covariances
  • 51:01associated with the Laplacian, that is, M times M transpose,
  • 51:05is both expressive, in the sense that
  • 51:08for any non negative a and b,
  • 51:11I can pick some alpha and beta
  • 51:13so that the expected size
  • 51:15of the projection of F onto the subspaces a,
  • 51:18and the projected size of F on the subspace orthogonal to S
  • 51:22is b for any covariance in this power series family.
  • 51:27And it's sufficient in the sense that
  • 51:30for any edge flow distribution with mean zero,
  • 51:32and covariance V,
  • 51:35if C is the matrix nearest to V in Frobenius norm,
  • 51:38restricted to the power series family,
  • 51:40then these inner products computed in terms of C
  • 51:44are exactly the same as inner products
  • 51:46computed in terms of V,
  • 51:47so they directly predict the structure,
  • 51:49which means that if I use this power series family,
  • 51:51discrepancies off of this family
  • 51:54don't change the expected structure.
  • 51:57Okay.
  • 51:57So, I know I'm short on time here,
  • 51:59so I'd like to skip, then, just to the end of this talk.
  • 52:03There's further things you can do with this,
  • 52:04this is sort of really nice mathematically.
  • 52:07You can build an approximation theory out of this,
  • 52:10and study it for different random graph families,
  • 52:12how many terms in these power series you need.
  • 52:15And those terms define
  • 52:16some nicer simple minimal set of statistics,
  • 52:19to try to estimate structure.
  • 52:22But I'd like to really just get to the end here,
  • 52:25and emphasize the takeaways from this talk.
  • 52:28So the first half of this talk
  • 52:30was focused on information flow.
  • 52:32What we saw is that information flow is a non-trivial,
  • 52:35but well studied estimation problem.
  • 52:37And this is something that, at least on my side,
  • 52:38is a work in progress with students.
  • 52:41Here, the, in some ways,
  • 52:42the conclusion of that first half
  • 52:43would be that causation entropy
  • 52:45may be a more appropriate measure than TE
  • 52:47when trying to build these flow graphs
  • 52:49to apply these structural measures to.
  • 52:51Then, on the structural side, we can say that
  • 52:54power series families,
  • 52:55this is a nice family of covariance matrices.
  • 52:57It has nice properties that are useful empirically,
  • 52:59because they let us build global correlation structures
  • 53:02from a sequence of local correlations
  • 53:03from that power series.
  • 53:06If you plug this back into the expected measures,
  • 53:08you can recover monotonic relations,
  • 53:10like in that limited trait performance case.
  • 53:12And truncation of these power series
  • 53:14reduces the number of quantities
  • 53:16that you would actually need to measure.
  • 53:19Actually, to a number of quantities that can be quite small
  • 53:21relative to the graph,
  • 53:22and that's where this approximation theory comes in.
  • 53:25One way, maybe to sort of summarize this entire approach,
  • 53:28is what we've done is by looking at these power series
  • 53:31built in terms of the graph operators,
  • 53:33is it provides a way to study
  • 53:35inherently heterogeneous connections, or covariances,
  • 53:39or edge flows distributions,
  • 53:41using a homogeneous correlation model
  • 53:43that's built at multiple scales by starting with local scale
  • 53:46and then looking at powers.
  • 53:49In some ways, this is a common...
  • 53:50I ended a previous version of this talk with,
  • 53:53I still think that this structural analysis is,
  • 53:55in some ways, a hammer seeking a nail,
  • 53:57and that this information flow construction,
  • 53:59this is work in progress to try and build that nail.
  • 54:02So thank you all for your attention.
  • 54:04I'll turn it now over to questions.
  • 54:07<v Instructor>(indistinct)</v>
  • 54:09Thank you so much for your talk.
  • 54:11Really appreciate it.
  • 54:15For those of you on Zoom,
  • 54:16you're welcome to keep up the conversations,
  • 54:17but unfortunately we have to clear the room,
  • 54:20so I do apologize.
  • 54:21But, (indistinct).
  • 54:25Dr. Strang?
  • 54:26Am I muted?
  • 54:30Dr. Strang?
  • 54:32<v ->Oh, yes, yeah.</v>
  • 54:33<v Instructor>Okay, do you mind if people...</v>
  • 54:35We have to clear the room, do you mind if people
  • 54:37email you if they have questions?
  • 54:40<v ->I'm sorry, I couldn't hear the end of the question.</v>
  • 54:42Do I mind if...
  • 54:45<v Instructor>We have to clear the room,</v>
  • 54:47do you mind if people email you if they have questions,
  • 54:49and (indistinct)... <v ->No, no, not at all.</v>
  • 54:52<v Instructor>So I do apologize, they are literally</v>
  • 54:54(indistinct) the room right now.
  • 54:57<v ->Okay, no, yeah, that's totally fine.</v>
  • 54:59<v Instructor>Thank you.</v>
  • 55:01And thanks again for a wonderful talk.
  • 55:03<v ->Thank you.</v>