The work presented in this thesis is essentially theoretical, but motivated by ecological applications. Ecological interaction networks represent the functioning of an ecosystem. Investigating the variability of interaction networks helps us understand how ecosystems are affected by external factors. This thesis proposes a methodology to analyze bipartite networks, applicable to ecological mutualistic networks. This methodology is based on U-statistics of row-column exchangeable matrices. Row-column exchangeable matrices are random matrices whose joint probability distribution is invariant under separate permutations of rows and columns. U-statistics are the class of statistics defined as the empirical mean of a function of a subset of observations, taken over all subsets of a given size. For a matrix, a U-statistic is the average of a function of a submatrix, taken over all submatrices of a given size. In network analysis, row-column exchangeable matrices are the adjacency matrices of bipartite node-exchangeable networks, and U-statistics can be used as estimators of quantities of interest. This thesis focuses on the asymptotic behavior of the U-statistics of row-column exchangeable matrices. In the first part, backward martingales are used to derive a limit theorem for U-statistics of row-column exchangeable matrices. In the second part, a Hoeffding-type decomposition is established for them, which extends the previous limit theorem. Inspired by this decomposition, an estimator of the asymptotic variance is also proposed, which leads to a general method for performing statistical inference tasks on exchangeable network models. The third part of the thesis extends the methodology to degenerate U-statistics, which have a faster rate of convergence. These statistical developments are applied to the analysis of bipartite networks, including mutualistic ecological networks.
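The two definitions given in words above can be written out as follows. This is only a sketch in generic notation: the symbols $Y$, $h$, $p$, $q$, $m$, $n$ and $U^h_{m,n}$ are assumptions chosen for illustration and may differ from the notation used in the thesis.

```latex
% Row-column exchangeability of an infinite random matrix Y = (Y_{ij}):
% for all finite permutations \sigma_1 (rows) and \sigma_2 (columns),
\[
  \bigl(Y_{\sigma_1(i)\,\sigma_2(j)}\bigr)_{i,j \ge 1}
  \;\overset{d}{=}\;
  \bigl(Y_{ij}\bigr)_{i,j \ge 1}.
\]

% U-statistic of the top-left m x n block, for a kernel h acting on
% p x q submatrices (h assumed symmetric in its rows and in its columns):
\[
  U^{h}_{m,n}
  = \binom{m}{p}^{-1} \binom{n}{q}^{-1}
    \sum_{\substack{\mathbf{i} \subset \{1,\dots,m\} \\ |\mathbf{i}| = p}}
    \;\sum_{\substack{\mathbf{j} \subset \{1,\dots,n\} \\ |\mathbf{j}| = q}}
    h\bigl(Y_{\mathbf{i},\mathbf{j}}\bigr),
\]
% where Y_{\mathbf{i},\mathbf{j}} is the submatrix of Y with rows \mathbf{i} and columns \mathbf{j}.
```

For example, with $p = q = 1$ and $h(y) = y$, the statistic reduces to the mean of the entries, that is, the edge density of a binary bipartite network.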
Many ecological questions concern the general structure of networks rather than the particular set of species present. This makes exchangeable random network models, whose adjacency matrices are row-column exchangeable, well suited to the analysis of these networks. U-statistics are used as estimators of quantities of interest such as degree heterogeneity, motif densities, or graphon metrics. Owing to the theoretical results and the methodology developed in this thesis, it is possible to obtain statistical guarantees on these estimators, for example in the form of confidence intervals. Some examples of exchangeable random network models and U-statistics are given, answering real ecological questions. Simulation studies are used to validate the use of this methodology for these examples.
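To make the idea of a U-statistic as a network estimator concrete, here is a minimal Python sketch. It is not the thesis's code: the simulation model (a product of hypothetical row and column weights `f` and `g`), the function names, and the choice of kernel are all assumptions made for illustration. The kernel acts on 1 x 2 submatrices and returns the product of two entries in the same row; under this illustrative model its expectation is `density**2` times the integral of `f**2`, which grows with the heterogeneity of the row weights.

```python
from itertools import combinations
from math import comb

import numpy as np


def simulate_product_network(m, n, density, rng):
    """Draw an m x n binary matrix from a simple row-column exchangeable model.

    Illustrative assumption: P(Y_ij = 1 | xi_i, eta_j) = density * f(xi_i) * g(eta_j),
    with i.i.d. uniform latent variables xi_i (rows) and eta_j (columns).
    """
    xi = rng.uniform(size=m)
    eta = rng.uniform(size=n)
    f = 0.5 + xi   # hypothetical row weight, integrates to 1 on [0, 1]
    g = 0.5 + eta  # hypothetical column weight, integrates to 1 on [0, 1]
    prob = np.clip(density * np.outer(f, g), 0.0, 1.0)
    return (rng.uniform(size=(m, n)) < prob).astype(int)


def u_statistic_bruteforce(y, kernel, p, q):
    """Average `kernel` over all p x q submatrices of y (feasible for small y only)."""
    m, n = y.shape
    total = 0.0
    for rows in combinations(range(m), p):
        for cols in combinations(range(n), q):
            total += kernel(y[np.ix_(rows, cols)])
    return total / (comb(m, p) * comb(n, q))


def row_pair_kernel(sub):
    """Kernel on a 1 x 2 submatrix: product of two entries of the same row."""
    return sub[0, 0] * sub[0, 1]


def u_statistic_row_pairs(y):
    """Same U-statistic as the brute force with `row_pair_kernel`, via matrix operations.

    Uses sum_{j1 < j2} y_ij1 * y_ij2 = ((sum_j y_ij)**2 - sum_j y_ij**2) / 2 for each row i.
    """
    m, n = y.shape
    row_sums = y.sum(axis=1)
    pair_sums = (row_sums**2 - (y**2).sum(axis=1)) / 2
    return pair_sums.sum() / (m * comb(n, 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = simulate_product_network(200, 300, density=0.2, rng=rng)
    print("row-pair U-statistic:", u_statistic_row_pairs(y))
    print("squared density (homogeneous reference):", y.mean() ** 2)
```

The brute-force version follows the definition directly and works for any kernel, but its cost grows combinatorially with the network size. The vectorized version is specific to this kernel and illustrates why writing a U-statistic with matrix operations matters in practice.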
Authors
- Bibliographic Reference: Tâm Le Minh. U-statistics of row-column exchangeable matrices : application to ecological network analysis. Statistics [math.ST]. Université Paris-Saclay, 2023. English. ⟨NNT : 2023UPASM027⟩. ⟨tel-04321993⟩
- HAL Collection: AgroParisTech; CNRS-INSMI - INstitut des Sciences Mathématiques et de leurs Interactions; STAR - Dépôt national des thèses électroniques; MIA-Paris; Université Paris-Saclay; Archive ouverte en agrobiosciences; Institut National de Recherche en Agriculture, Alimentation et Environnement; Graduate School Mathématiques; Graduate School Computer Science; Département MathNum; Réseau "Systèmes Agricoles et Eau"
- HAL Identifier: 4321993
- Institution: AgroParisTech; Université Paris-Saclay; Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement
- Laboratory: Mathématiques et Informatique Appliquées
- Published in: France
Table of Contents
- Acknowledgements
- Table of notations
- Preface
- Introduction
- Networks and models
- The Erdős-Rényi model and some variations
- Configuration models
- Exponential random graph models
- Latent space models
- Estimation for likelihood-based models
- Interaction networks in ecology
- Ecological interactions
- Network structure
- Network response to environmental changes
- Probabilistic approaches
- Null model analysis
- Principles of hypothesis testing
- Null models for ecological networks
- Exchangeable random network models
- Mathematical definition of exchangeability
- Ecological implications of exchangeability
- Exchangeable models
- U-statistics
- The basics
- Hoeffding decomposition
- Martingale properties
- Asymptotic behavior of U-statistics
- Dependent data
- Contributions
- General framework
- Results
- Outline
- U-statistics on bipartite exchangeable networks
- Introduction
- Main result
- Asymptotic framework
- Theorems
- The Aldous-Hoover theorem
- Proof of Theorem 2.2.5
- Proof of Theorem 2.2.7
- Proof of Theorem 2.2.8
- Proof of Theorem 2.2.9
- Applications
- The BEDD model
- Heterogeneity in the row weights of a network
- Network comparison
- Motif frequencies
- Discussion
- Properties of mN and nN
- Backward martingales
- Square-integrable backward martingale
- Asymptotic variance
- Conditional Lindeberg condition
- Hewitt-Savage theorem
- Identifiability of the BEDD model
- Proof of Theorem 2.3.2
- Proof of Theorem 2.3.4
- Derivation of variances
- Some U-statistics written with matrix operations
- Hoeffding-type decomposition for U-statistics on bipartite networks
- Introduction
- Hoeffding decomposition of a submatrix U-statistic
- Asymptotic normality of U-statistics
- Estimation of the asymptotic variance of a non-degenerated U-statistic
- Some useful notations and results
- Estimation of the conditional expectations
- Estimation of Vh
- Calculation of the estimator
- Extension to functions of U-statistics
- RCE models, kernel functions and network comparison
- Examples of RCE models
- Examples of kernel functions
- Network comparison
- Simulations
- Motif counts
- Graphon product distance
- Heterogeneity in the row weights of a network
- Illustrations
- Backward martingales
- Proofs of the results presented in Section 3.2
- Proofs of the results presented in Section 3.3
- Proofs of the results presented in Section 3.4
- Proofs of the results presented in Section 3.5
- Asymptotic distribution of degenerate U-statistics on bipartite networks
- A curse or a blessing?
- An example
- Incomplete U-statistics
- Another orthogonal decomposition for U-statistics
- Aldous-Hoover-Kallenberg representation of RCE matrices
- Graph subsets of AHK variables
- Decomposition of the probability space
- Decomposition of U-statistics
- Variance
- Principal part and support graphs
- Gaussian case
- Other limit distributions
- Other asymptotic frameworks
- Practical identification of limit distribution of U-statistics
- Degeneracy
- Order of degeneracy
- Example
- Application to the row degree homogeneity test
- Definition of the test statistic
- Simulation study
- Proofs for Section 4.2.3
- Proofs for Section 4.2.5
- Proofs for Section 4.2.6
- Proofs for Section 4.2.9
- Proof of Theorem 4.2.12
- Proof of Theorem 4.2.13
- Proof of Theorem 4.3.1
- Proofs for Section 4.5
- Perspectives
- Completion of the methodology
- Non-Gaussian degenerate limit theorem
- Plug-and-play variance estimators for degenerate U-statistics
- Improvements to the methodology
- Bootstrap methods
- Non-asymptotic results
- Berry-Esseen theorems
- Beyond RCE models
- Missing species
- Sparse networks
- Summary in French (Résumé)
- Context
- General framework
- Exchangeable bipartite network models
- Asymptotic framework
- U-statistics on bipartite networks
- Proposed methodology
- Results
- Characterization of BEDD models
- Hoeffding-type decompositions
- Limit theorems
- Variance estimators
- Outline of the thesis