This thesis investigates the connection between the behaviour of discrete networks under signal propagation and a purely algebraic procedure of multilinear algebra, formulated in terms of the Bhattacharya--Mesner product~(BMP) of tensors. A \emph{tensor neural network} $N = N(G, \mathcal{A})$ is defined by an acyclic directed graph~$G$ together with a collection $\mathcal{A}$ of \emph{activation tensors}, one per node: the activation tensor $A_i \in (\C^n)^{\otimes(d_i+1)}$ of a node $a_i$ with in-degree $d_i$ encodes the node's response to each possible combination of input signals. The \emph{total tensor} of the network, of order~$q = |V(G)|$, is defined entry-wise as the product \[ N_{i_1, \dots, i_q} \;:=\; \prod_{j=1}^{q} (A_j)_{(i_{P_j},\, i_j)}, \] where $P_j$ denotes the multi-index of the parent states of node $a_j$. In order to express the total tensor as the BMP of the activation tensors, two auxiliary operations are introduced: the \emph{blow} (inflation) and the \emph{forget} (copy/projection), which increase the order of a tensor to make it compatible with the adjacency structure of the graph. The main result of the thesis (Theorem~4.2.2) states that, given any total ordering of the nodes compatible with the partial order induced by~$G$, the total tensor is recovered as \[ N \;=\; \circ\!\bigl(\tilde{T}_q,\, \tilde{T}_{q-1},\, \dots,\, \tilde{T}_1\bigr), \] where each $\tilde{T}_j$ is the activation tensor $A_j$ reshaped to order~$q$ via blow and forget, and the BMP is applied in \emph{reverse} node order, reflecting the non-commutativity of the product. Several consequences of the theorem are examined. We characterise the initial conditions under which two networks with different topologies share the same total tensor, with particular attention to the case of a totally disconnected network. For \emph{coherent} networks, those satisfying a global proportionality condition between the total output of each node and the total input received by its successors, we prove that the total tensor of the binary acyclic triangle network has rank one if and only if every activation tensor has rank one; without coherence, this equivalence fails. As an application, we construct tensor neural networks whose total tensor equals the matrix multiplication tensor $\langle a, b, c\rangle$. We compare the classical algorithm ($R(\langle 2,2,2\rangle) \leq 8$) with Strassen's algorithm ($R(\langle 2,2,2\rangle) \leq 7$), showing how increased topological complexity reduces computational complexity. For the $3\times 3$ case, we introduce \emph{MatMulNet}, a parametric neural model with learnable factors $H, K \in \mathbb{R}^{9 \times r}$, $F \in \mathbb{R}^{r \times 9}$, trained to fit $\langle 3,3,3\rangle$ for rank parameters $r \in \{18,\dots,24\}$. A sweep over seven independent runs per value of~$r$ yields a qualitative transition at $r = 23$. The results are consistent with the known bound $R(\langle 3,3,3\rangle) \leq 23$ of Laderman (1976) and with the theoretical constraints $19 \leq R(\langle 3,3,3\rangle)$ (Bläser, 2003) and $17 \leq \BR(\langle 3,3,3\rangle)$ (Conner-Harper-Landsberg, 2019).
Marziali, S. (2026). Product of Tensors and the Algebra of Networks.
Product of Tensors and the Algebra of Networks
Marziali, Sara
2026-06-11
Abstract
This thesis investigates the connection between the behaviour of discrete networks under signal propagation and a purely algebraic procedure of multilinear algebra, formulated in terms of the Bhattacharya--Mesner product~(BMP) of tensors. A \emph{tensor neural network} $N = N(G, \mathcal{A})$ is defined by an acyclic directed graph~$G$ together with a collection $\mathcal{A}$ of \emph{activation tensors}, one per node: the activation tensor $A_i \in (\C^n)^{\otimes(d_i+1)}$ of a node $a_i$ with in-degree $d_i$ encodes the node's response to each possible combination of input signals. The \emph{total tensor} of the network, of order~$q = |V(G)|$, is defined entry-wise as the product \[ N_{i_1, \dots, i_q} \;:=\; \prod_{j=1}^{q} (A_j)_{(i_{P_j},\, i_j)}, \] where $P_j$ denotes the multi-index of the parent states of node $a_j$. In order to express the total tensor as the BMP of the activation tensors, two auxiliary operations are introduced: the \emph{blow} (inflation) and the \emph{forget} (copy/projection), which increase the order of a tensor to make it compatible with the adjacency structure of the graph. The main result of the thesis (Theorem~4.2.2) states that, given any total ordering of the nodes compatible with the partial order induced by~$G$, the total tensor is recovered as \[ N \;=\; \circ\!\bigl(\tilde{T}_q,\, \tilde{T}_{q-1},\, \dots,\, \tilde{T}_1\bigr), \] where each $\tilde{T}_j$ is the activation tensor $A_j$ reshaped to order~$q$ via blow and forget, and the BMP is applied in \emph{reverse} node order, reflecting the non-commutativity of the product. Several consequences of the theorem are examined. We characterise the initial conditions under which two networks with different topologies share the same total tensor, with particular attention to the case of a totally disconnected network. For \emph{coherent} networks, those satisfying a global proportionality condition between the total output of each node and the total input received by its successors, we prove that the total tensor of the binary acyclic triangle network has rank one if and only if every activation tensor has rank one; without coherence, this equivalence fails. As an application, we construct tensor neural networks whose total tensor equals the matrix multiplication tensor $\langle a, b, c\rangle$. We compare the classical algorithm ($R(\langle 2,2,2\rangle) \leq 8$) with Strassen's algorithm ($R(\langle 2,2,2\rangle) \leq 7$), showing how increased topological complexity reduces computational complexity. For the $3\times 3$ case, we introduce \emph{MatMulNet}, a parametric neural model with learnable factors $H, K \in \mathbb{R}^{9 \times r}$, $F \in \mathbb{R}^{r \times 9}$, trained to fit $\langle 3,3,3\rangle$ for rank parameters $r \in \{18,\dots,24\}$. A sweep over seven independent runs per value of~$r$ yields a qualitative transition at $r = 23$. The results are consistent with the known bound $R(\langle 3,3,3\rangle) \leq 23$ of Laderman (1976) and with the theoretical constraints $19 \leq R(\langle 3,3,3\rangle)$ (Bläser, 2003) and $17 \leq \BR(\langle 3,3,3\rangle)$ (Conner-Harper-Landsberg, 2019).I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.
https://hdl.handle.net/11365/1319554
Attenzione
Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo
