Statistics
New submissions
New submissions for Thu, 25 Nov 21
 [1] arXiv:2111.12149 [pdf, other]

Title: Binned multinomial logistic regression for integrative cell type annotation
Subjects: Applications (stat.AP)
Categorizing individual cells into one of many known cell type categories, also known as cell type annotation, is a critical step in the analysis of single-cell genomics data. The current process of annotation is time-intensive and subjective, which has led to different studies describing cell types with labels of varying degrees of resolution. While supervised learning approaches have provided automated solutions to annotation, there remains a significant challenge in fitting a unified model for multiple datasets with inconsistent labels. In this article, we propose a new multinomial logistic regression estimator which can be used to model cell type probabilities by integrating multiple datasets with labels of varying resolution. To compute our estimator, we solve a nonconvex optimization problem using a blockwise proximal gradient descent algorithm. We show through simulation studies that our approach estimates cell type probabilities more accurately than competitors in a wide variety of scenarios. We apply our method to ten single-cell RNA-seq datasets and demonstrate its utility in predicting fine-resolution cell type labels on unlabeled data as well as refining cell type labels on data with existing coarse-resolution annotations. An R package implementing the method is available at https://github.com/keshavmotwani/IBMR, and the collection of datasets we analyze is available at https://github.com/keshavmotwani/AnnotatedPBMC.
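The binned-likelihood idea is simple to sketch: a coarse label contributes the log of the summed probabilities of the fine categories it covers, so datasets annotated at different resolutions can share one model. The toy below (plain Python, my own illustration; it is not the IBMR package, and it fits by crude finite-difference gradient descent rather than the authors' blockwise proximal algorithm) fits three fine classes when half the labels are the coarse bin {1, 2}:

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def binned_nll(W, data):
    # data: list of (x, label_set); a coarse label is a set of fine
    # classes, and its likelihood is the sum of those classes' probabilities
    total = 0.0
    for x, labels in data:
        z = [sum(w_j * x_j for w_j, x_j in zip(row, x)) for row in W]
        p = softmax(z)
        total -= math.log(sum(p[k] for k in labels))
    return total

def fit(data, n_classes, n_feat, steps=300, lr=0.5, h=1e-5):
    # coordinate-wise descent with numeric gradients (toy-sized problems only)
    W = [[0.0] * n_feat for _ in range(n_classes)]
    for _ in range(steps):
        for i in range(n_classes):
            for j in range(n_feat):
                W[i][j] += h
                up = binned_nll(W, data)
                W[i][j] -= 2 * h
                down = binned_nll(W, data)
                W[i][j] += h
                W[i][j] -= lr * (up - down) / (2 * h)
    return W

# fine labels {0} and a coarse label {1, 2} ("either class 1 or 2")
data = [([1.0, 0.0], {0}), ([1.2, 0.1], {0}),
        ([0.0, 1.0], {1, 2}), ([0.1, 1.3], {1, 2})]
W = fit(data, n_classes=3, n_feat=2)
```

After fitting, inputs resembling the fine-labeled examples get high probability on class 0, while the coarse examples push mass onto classes 1 and 2 jointly without forcing a choice between them.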
 [2] arXiv:2111.12157 [pdf, other]

Title: Bayesian Sample Size Prediction for Online Activity
Comments: 10 pages, 7 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
In many contexts it is useful to predict the number of individuals in some population who will initiate a particular activity during a given period: for example, the number of users who will install a software update, or the number of customers who will use a new feature on a website or participate in an A/B test. In practical settings, there is heterogeneity amongst individuals with regard to the distribution of time until they will initiate. For these reasons it is inappropriate to assume that the number of new individuals observed on successive days will be identically distributed. Given observations on the number of unique users participating in an initial period, we present a simple but novel Bayesian method for predicting the number of additional individuals who will participate during a subsequent period. We illustrate the performance of the method in predicting sample size in online experimentation.
 [3] arXiv:2111.12161 [pdf, other]

Title: Sensitivity Analysis of Individual Treatment Effects: A Robust Conformal Inference Approach
Subjects: Methodology (stat.ME)
We propose a model-free framework for sensitivity analysis of individual treatment effects (ITEs), building upon ideas from conformal inference. For any unit, our procedure reports the $\Gamma$-value, a number which quantifies the minimum strength of confounding needed to explain away the evidence for ITE. Our approach rests on the reliable predictive inference of counterfactuals and ITEs in situations where the training data is confounded. Under the marginal sensitivity model of Tan (2006), we characterize the shift between the distribution of the observations and that of the counterfactuals. We first develop a general method for predictive inference of test samples from a shifted distribution; we then leverage this to construct covariate-dependent prediction sets for counterfactuals. No matter the value of the shift, these prediction sets (resp. approximately) achieve marginal coverage if the propensity score is known exactly (resp. estimated). We also describe a distinct procedure that attains coverage conditional on the training data. In the latter case, we prove a sharpness result showing that for certain classes of prediction problems, the prediction intervals cannot possibly be tightened. We verify the validity and performance of the new methods via simulation studies and apply them to analyze real datasets.
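As a reference point, the basic split-conformal recipe this line of work builds on (without the sensitivity/confounding machinery of the paper) is short; a minimal sketch, with illustrative names of my own, using absolute-residual scores and the finite-sample-corrected quantile:

```python
import math
import random

def split_conformal(x_cal, y_cal, predict, alpha=0.1):
    # nonconformity scores on a held-out calibration set
    scores = sorted(abs(y - predict(x)) for x, y in zip(x_cal, y_cal))
    n = len(scores)
    # finite-sample-corrected quantile: ceil((n+1)(1-alpha))-th order statistic
    k = min(n - 1, math.ceil((n + 1) * (1 - alpha)) - 1)
    q = scores[k]
    return lambda x: (predict(x) - q, predict(x) + q)

random.seed(1)
predict = lambda x: 2.0 * x  # stand-in for a model fit on separate training data
cal = [(x, 2.0 * x + random.gauss(0, 1)) for x in (random.random() for _ in range(500))]
interval = split_conformal([c[0] for c in cal], [c[1] for c in cal], predict)

# empirical coverage on fresh test points should be about 1 - alpha
test = [(x, 2.0 * x + random.gauss(0, 1)) for x in (random.random() for _ in range(2000))]
cover = sum(lo <= y <= hi for (x, y) in test
            for (lo, hi) in [interval(x)]) / len(test)
```

The robust variants in the paper replace this exchangeability-based calibration with weighted quantiles that account for the shift between observed and counterfactual distributions.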
 [4] arXiv:2111.12163 [pdf, other]

Title: spOccupancy: An R package for single-species, multi-species, and integrated spatial occupancy models
Comments: 31 pages, 4 figures
Subjects: Applications (stat.AP)
Occupancy modeling is a common approach to assess spatial and temporal species distribution patterns, while explicitly accounting for measurement errors common in detection-nondetection data. Numerous extensions of the basic single-species occupancy model exist to address dynamics, multiple species or states, interactions, false positive errors, autocorrelation, and to integrate multiple data sources. However, development of specialized and computationally efficient software to fit spatial models to large data sets is scarce or absent. We introduce the spOccupancy R package, designed to fit single-species, multi-species, and integrated spatially explicit occupancy models. Using a Bayesian framework, we leverage P\'olya-Gamma data augmentation and Nearest Neighbor Gaussian Processes to ensure models are computationally efficient for potentially massive data sets. spOccupancy provides user-friendly functions for data simulation, model fitting, model validation (by posterior predictive checks), model comparison (using information criteria and k-fold cross-validation), and out-of-sample prediction. We illustrate the package's functionality via a vignette, simulated data analysis, and two bird case studies, in which we estimate occurrence of the Black-throated Green Warbler (Setophaga virens) across the eastern USA and species richness of a foliage-gleaning bird community in the Hubbard Brook Experimental Forest in New Hampshire, USA. The spOccupancy package provides a user-friendly approach to fit a variety of single- and multi-species occupancy models, making it straightforward to address detection biases and spatial autocorrelation in species distribution models even for large data sets.
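The core single-species occupancy likelihood marginalizes a latent presence indicator: a site is occupied with probability psi, and an occupied site yields a detection on each visit with probability p, so an all-zero detection history can come from either absence or missed detections. A minimal Python sketch with a crude grid-search MLE (my own illustration; spOccupancy itself is an R package using Bayesian machinery and spatial random effects):

```python
import math
import random

def occ_loglik(psi, p, site_counts, n_visits):
    # marginal likelihood: occupied component + (for y = 0) unoccupied component
    ll = 0.0
    for y in site_counts:
        occ = psi * (p ** y) * ((1 - p) ** (n_visits - y))
        ll += math.log(occ + (1 - psi) * (1.0 if y == 0 else 0.0))
    return ll

# simulate detection counts at 400 sites with 5 visits each
random.seed(0)
true_psi, true_p, J = 0.6, 0.4, 5
counts = []
for _ in range(400):
    z = random.random() < true_psi            # latent occupancy
    counts.append(sum(random.random() < true_p for _ in range(J)) if z else 0)

# grid-search maximum likelihood over (psi, p)
grid = [i / 50 for i in range(1, 50)]
psi_hat, p_hat = max(((a, b) for a in grid for b in grid),
                     key=lambda ab: occ_loglik(ab[0], ab[1], counts, J))
```

With enough repeat visits per site the two parameters are separately identifiable, which is exactly what lets occupancy models correct for detection bias.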
 [5] arXiv:2111.12201 [pdf, other]

Title: Parameter estimation and uncertainty quantification using information geometry
Comments: 50 pages (exc. references), 12 figures. Review
Subjects: Methodology (stat.ME); Applications (stat.AP)
In this work we (1) review likelihood-based inference for parameter estimation and the construction of confidence regions, and (2) explore the use of techniques from information geometry, including geodesic curves and Riemann scalar curvature, to supplement typical techniques for uncertainty quantification such as Bayesian methods, profile likelihood, asymptotic analysis and bootstrapping. These techniques from information geometry provide data-independent insights into uncertainty and identifiability, and can be used to inform data collection decisions. All code used in this work to implement the inference and information geometry techniques is available on GitHub.
 [6] arXiv:2111.12224 [pdf, other]

Title: Asymptotics for Markov chain mixture detection
Comments: To be published in Econometrics and Statistics
Subjects: Statistics Theory (math.ST)
Sufficient conditions are provided under which the log-likelihood ratio test statistic fails to have a limiting chi-squared distribution under the null hypothesis when testing between one and two components under a general two-component mixture model, but rather tends to infinity in probability. These conditions are verified when the component densities describe continuous-time, discrete-state-space Markov chains, and the results are illustrated via a parametric bootstrap simulation on an analysis of the migrations over time of a set of corporate bond ratings. The precise limiting distribution is derived in a simple case with two states, one of which is absorbing, which leads to a right-censored exponential scale mixture model. In that case, when centred by a function growing logarithmically in the sample size, the statistic has a limiting distribution of Gumbel extreme-value type rather than chi-squared.
 [7] arXiv:2111.12244 [pdf, other]

Title: A Unified Decision Framework for Phase I Dose-Finding Designs
Subjects: Methodology (stat.ME)
The purpose of a phase I dose-finding clinical trial is to investigate the toxicity profiles of various doses for a new drug and identify the maximum tolerated dose. Over the past three decades, various dose-finding designs have been proposed and discussed, including conventional model-based designs, new model-based designs using toxicity probability intervals, and rule-based designs. We present a simple decision framework that can generate several popular designs as special cases. We show that these designs share common elements under the framework, such as the same likelihood function, the use of loss functions, and the nature of the optimal decisions as Bayes rules. They differ mostly in the choice of the prior distributions. We present theoretical results on the decision framework and its link to specific and popular designs like mTPI, BOIN, and CRM. These results provide useful insights into the designs and their underlying assumptions, and convey information to help practitioners select an appropriate design.
 [8] arXiv:2111.12267 [pdf, other]

Title: The Practical Scope of the Central Limit Theorem
Comments: 47 pages, 17 figures
Subjects: Other Statistics (stat.OT); Statistics Theory (math.ST); Applications (stat.AP); Methodology (stat.ME)
The \textit{Central Limit Theorem (CLT)} is at the heart of a great deal of applied problem-solving in statistics and data science, but the theorem is silent on an important implementation issue: \textit{how much data do you need for the CLT to give accurate answers to practical questions?} Here we examine several approaches to addressing this issue, along the way reviewing the history of this problem over the last 290 years, and we illustrate the calculations with case studies from finite-population sampling and gambling. A variety of surprises emerge.
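The "how much data" question is easy to probe numerically. A minimal sketch (my own example, not one of the paper's case studies) compares the empirical CDF of standardized sample means of skewed exponential data against the normal CDF at two sample sizes; the maximum gap shrinks as $n$ grows:

```python
import math
import random

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def max_cdf_gap(n, reps, rng):
    # standardized means of Exp(1) samples: mean 1, sd 1/sqrt(n)
    zs = sorted((sum(rng.expovariate(1.0) for _ in range(n)) / n - 1.0) * math.sqrt(n)
                for _ in range(reps))
    # crude one-sided Kolmogorov-style discrepancy against N(0, 1)
    return max(abs((i + 1) / reps - norm_cdf(z)) for i, z in enumerate(zs))

rng = random.Random(0)
gap_small = max_cdf_gap(5, 4000, rng)    # normal approximation still visibly off
gap_large = max_cdf_gap(80, 4000, rng)   # much closer to the normal CDF
```

For heavily skewed parent distributions the gap at small $n$ can be large enough to matter for tail probabilities, which is precisely the practical point the abstract raises.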
 [9] arXiv:2111.12272 [pdf, other]

Title: Causal Analysis and Prediction of Human Mobility in the U.S. during the COVID-19 Pandemic
Subjects: Applications (stat.AP); Machine Learning (cs.LG)
Since the increasing outspread of COVID-19 in the U.S., which had the highest number of confirmed cases and deaths in the world as of September 2020, most states in the country have enforced travel restrictions, resulting in sharp reductions in mobility. However, the overall impact and long-term implications of this crisis for travel and mobility remain uncertain. To this end, this study develops an analytical framework that determines and analyzes the most dominant factors impacting human mobility and travel in the U.S. during this pandemic. In particular, the study uses Granger causality to determine the important predictors influencing daily vehicle miles traveled (VMT) and utilizes linear regularization algorithms, including Ridge and LASSO techniques, to model and predict mobility. State-level time-series data were obtained from various open-access sources for the period from March 1, 2020 through June 13, 2020, and the entire data set was divided into two parts for training and testing purposes. The variables selected by Granger causality were used to train three different reduced-order models by ordinary least squares regression, Ridge regression, and LASSO regression algorithms. Finally, the prediction accuracy of the developed models was examined on the test data. The results indicate that factors including the number of new COVID-19 cases, social distancing index, population staying at home, percent of out-of-county trips, trips to different destinations, socioeconomic status, percent of people working from home, and statewide closure, among others, were the most important factors influencing daily VMT. Also, among all the modeling techniques, Ridge regression provides the best performance with the least error, while LASSO regression also performed better than the ordinary least squares model.
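Ridge's advantage over OLS with correlated predictors (as with the correlated mobility covariates above) comes from the closed form $(X^\top X + \lambda I)^{-1} X^\top y$: the added diagonal stabilizes the near-singular system. A two-feature pure-Python sketch (my own toy, not the study's pipeline):

```python
import random

def ridge_2feat(X, y, lam):
    # closed-form ridge for exactly two features: solve (X'X + lam*I) w = X'y
    a = sum(x[0] * x[0] for x in X) + lam
    b = sum(x[0] * x[1] for x in X)
    d = sum(x[1] * x[1] for x in X) + lam
    g0 = sum(x[0] * yi for x, yi in zip(X, y))
    g1 = sum(x[1] * yi for x, yi in zip(X, y))
    det = a * d - b * b
    return [(d * g0 - b * g1) / det, (a * g1 - b * g0) / det]

random.seed(0)
# two highly collinear predictors
X = [[v, v + random.gauss(0, 0.01)] for v in (random.gauss(0, 1) for _ in range(200))]
y = [x[0] + x[1] + random.gauss(0, 0.1) for x in X]
w_ols = ridge_2feat(X, y, 0.0)     # lam = 0 recovers OLS, unstable here
w_ridge = ridge_2feat(X, y, 5.0)   # shrinks coefficients toward zero
```

Ridge shrinks the coefficient norm monotonically in the penalty, taming the unstable direction created by collinearity while barely touching the well-identified sum of coefficients.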
 [10] arXiv:2111.12283 [pdf, other]

Title: Co-exchangeable process modelling for uncertainty quantification in joint climate reconstruction
Comments: Submitted to the Journal of the American Statistical Association
Subjects: Applications (stat.AP)
Any experiment with climate models relies on a potentially large set of spatio-temporal boundary conditions. These can represent both the initial state of the system and/or forcings driving the model output throughout the experiment. Whilst these boundary conditions are typically fixed using available reconstructions in climate modelling studies, they are highly uncertain, that uncertainty is unquantified, and the effect on the output of the experiment can be considerable. We develop efficient quantification of these uncertainties that combines relevant data from multiple models and observations. Starting from the co-exchangeability model, we develop a co-exchangeable process model to capture multiple correlated spatio-temporal fields of variables. We demonstrate that further exchangeability judgements over the parameters within this representation lead to a Bayes linear analogy of a hierarchical model. We use the framework to provide a joint reconstruction of sea-surface temperature and sea-ice concentration boundary conditions at the last glacial maximum (19-23 ka) and use it to force an ensemble of ice-sheet simulations using the FAMOUS-Ice coupled atmosphere and ice-sheet model. We demonstrate that existing boundary conditions typically used in these experiments are implausible given our uncertainties, and demonstrate the impact of using more plausible boundary conditions on ice-sheet simulation.
 [11] arXiv:2111.12348 [pdf]

Title: Comparative Evaluation of Statistical Orbit Determination Algorithms for Short-Term Prediction of Geostationary and Geosynchronous Satellite Orbits in NavIC Constellation
Subjects: Applications (stat.AP)
NavIC is a newly established Indian regional Navigation Constellation with 3 satellites in geostationary Earth orbit (GEO) and 4 satellites in geosynchronous orbit (GSO). Satellite positions are essential in navigation for various positioning applications. In this paper, we propose a Bootstrap Particle Filter (BPF) approach to determine the satellite positions in the NavIC constellation for a short duration of 1 hr. The Bootstrap Particle Filter-based approach was found to be efficient, with meter-level prediction accuracy, as compared to other methods such as Least Squares (LS), the Extended Kalman Filter (EKF), the Unscented Kalman Filter (UKF) and the Ensemble Kalman Filter (EnKF). The residual analysis revealed that the BPF approach addressed the problem of nonlinearity in the dynamics model as well as the non-Gaussian nature of the state of the NavIC satellites.
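The bootstrap particle filter itself is a generic propagate-weight-resample loop. A minimal sketch on a toy one-dimensional state-space model (my own example; the actual NavIC application uses orbital dynamics, not this AR(1) toy):

```python
import math
import random

def bpf(ys, n_particles, rng):
    # bootstrap particle filter for x_t = 0.9 x_{t-1} + N(0, 0.5^2),
    # y_t = x_t + N(0, 0.5^2); returns posterior-mean estimates of x_t
    parts = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    ests = []
    for y in ys:
        parts = [0.9 * p + rng.gauss(0.0, 0.5) for p in parts]       # propagate
        w = [math.exp(-0.5 * ((y - p) / 0.5) ** 2) for p in parts]   # weight
        s = sum(w)
        ests.append(sum(wi * p for wi, p in zip(w, parts)) / s)
        parts = rng.choices(parts, weights=w, k=n_particles)         # resample
    return ests

rng = random.Random(42)
xs, x = [], 2.0
for _ in range(60):
    x = 0.9 * x + rng.gauss(0.0, 0.5)
    xs.append(x)
ys = [x + rng.gauss(0.0, 0.5) for x in xs]
ests = bpf(ys, 500, rng)
rmse = math.sqrt(sum((e - t) ** 2 for e, t in zip(ests, xs)) / len(xs))
```

Because the weighting step uses the full observation likelihood and the particles carry an arbitrary empirical distribution, the same loop handles nonlinear dynamics and non-Gaussian states, which is the property the abstract credits for the BPF's advantage over Kalman-type filters.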
 [12] arXiv:2111.12482 [pdf, other]

Title: One More Step Towards Reality: Cooperative Bandits with Imperfect Communication
Journal-ref: Conference on Neural Information Processing Systems, 2021
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
The cooperative bandit problem is increasingly becoming relevant due to its applications in large-scale decision-making. However, most research for this problem focuses exclusively on the setting with perfect communication, whereas in most real-world distributed settings, communication is often over stochastic networks, with arbitrary corruptions and delays. In this paper, we study cooperative bandit learning under three typical real-world communication scenarios, namely, (a) message-passing over stochastic time-varying networks, (b) instantaneous reward-sharing over a network with random delays, and (c) message-passing with adversarially corrupted rewards, including Byzantine communication. For each of these environments, we propose decentralized algorithms that achieve competitive performance, along with near-optimal guarantees on the incurred group regret. Furthermore, in the setting with perfect communication, we present an improved delayed-update algorithm that outperforms the existing state-of-the-art on various network topologies. Finally, we present tight network-dependent minimax lower bounds on the group regret. Our proposed algorithms are straightforward to implement and obtain competitive empirical performance.
 [13] arXiv:2111.12526 [pdf]

Title: Mining Meta-indicators of University Ranking: A Machine Learning Approach Based on SHAP
Comments: 4 pages, 1 figure
Subjects: Applications (stat.AP); Machine Learning (cs.LG); Machine Learning (stat.ML)
University evaluation and ranking is an extremely complex activity, and major universities struggle with the increasingly complex indicator systems of world university rankings. Can the meta-indicators behind the index system be found by simplifying this complexity? This research discovered three meta-indicators based on interpretable machine learning. The first is time: be friends with time, believe in the power of time, and accumulate historical deposits. The second is space: be friends with the city, and grow together through co-development. The third is relationships: be friends with alumni, and strive for more alumni donations, which have no ceiling.
 [14] arXiv:2111.12603 [pdf, ps, other]

Title: Strong Invariance Principles for Ergodic Markov Processes
Subjects: Statistics Theory (math.ST); Probability (math.PR); Computation (stat.CO)
Strong invariance principles describe the error term of a Brownian approximation of the partial sums of a stochastic process. While these strong approximation results have many applications, results for continuous-time settings have been limited. In this paper, we obtain strong invariance principles for a broad class of ergodic Markov processes. The main results rely on ergodicity requirements and an application of Nummelin splitting for continuous-time processes. Strong invariance principles provide a unified framework for analysing commonly used estimators of the asymptotic variance in settings with a dependence structure. We demonstrate how this can be used to analyse the batch means method for simulation output of Piecewise Deterministic Monte Carlo samplers. We also derive a fluctuation result for additive functionals of ergodic diffusions using our strong approximation results.
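The batch means estimator analysed above is short to state in code: split the run into batches, and scale the variance of the batch means by the batch size. A sketch on a toy AR(1) chain (my own example, not a PDMP sampler), where the true asymptotic variance is $1/(1-\rho)^2 = 4$ for $\rho = 0.5$ and unit innovation variance:

```python
import random

def batch_means_var(xs, n_batches):
    # estimate sigma^2 in sqrt(n) * (mean - mu) -> N(0, sigma^2)
    # from the variance of non-overlapping batch means
    m = len(xs) // n_batches                      # batch size
    means = [sum(xs[i * m:(i + 1) * m]) / m for i in range(n_batches)]
    grand = sum(means) / n_batches
    return m * sum((b - grand) ** 2 for b in means) / (n_batches - 1)

# AR(1) chain with rho = 0.5: true asymptotic variance is 1/(1 - 0.5)^2 = 4
rng = random.Random(7)
xs, x = [], 0.0
for _ in range(50000):
    x = 0.5 * x + rng.gauss(0.0, 1.0)
    xs.append(x)
est = batch_means_var(xs, 25)
```

The point of the strong invariance principles is to justify exactly this kind of estimator: once the partial sums are well approximated by a Brownian motion, batch means inherit the right scaling despite the chain's dependence.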
 [15] arXiv:2111.12604 [pdf, other]

Title: State-space deep Gaussian processes with applications
Authors: Zheng Zhao
Journal-ref: Doctoral dissertation, Aalto University, 2021
Subjects: Methodology (stat.ME); Signal Processing (eess.SP); Machine Learning (stat.ML)
This thesis is mainly concerned with state-space approaches for solving deep (temporal) Gaussian process (DGP) regression problems. More specifically, we represent DGPs as hierarchically composed systems of stochastic differential equations (SDEs), and we consequently solve the DGP regression problem by using state-space filtering and smoothing methods. The resulting state-space DGP (SS-DGP) models generate a rich class of priors compatible with modelling a number of irregular signals/functions. Moreover, due to their Markovian structure, SS-DGP regression problems can be solved efficiently by using Bayesian filtering and smoothing methods. The second contribution of this thesis is that we solve continuous-discrete Gaussian filtering and smoothing problems by using the Taylor moment expansion (TME) method. This induces a class of filters and smoothers that can be asymptotically exact in predicting the mean and covariance of solutions of SDEs. Moreover, the TME method and TME filters and smoothers are compatible with simulating SS-DGPs and solving their regression problems. Lastly, this thesis features a number of applications of state-space (deep) GPs. These applications mainly include: (i) estimation of unknown drift functions of SDEs from partially observed trajectories, and (ii) estimation of spectro-temporal features of signals.
 [16] arXiv:2111.12612 [pdf, other]

Title: Multiplier bootstrap for Bures-Wasserstein barycenters
Comments: 36 pages, 2 figures
Subjects: Statistics Theory (math.ST); Applications (stat.AP)
The Bures-Wasserstein barycenter is a popular and promising tool in the analysis of complex data such as graphs and images. In many applications the input data are random with an unknown distribution, and uncertainty quantification becomes a crucial issue. This paper offers an approach based on the multiplier bootstrap to quantify the error of approximating the true Bures-Wasserstein barycenter $Q_*$ by its empirical counterpart $Q_n$. The main results state the bootstrap validity under general assumptions on the data-generating distribution $P$ and specify the approximation rates for the case of sub-exponential $P$. The performance of the method is illustrated on synthetic data generated from the weighted stochastic block model.
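The multiplier bootstrap principle is easiest to see for a plain Euclidean mean (the paper treats the much harder Bures-Wasserstein barycenter): perturb each observation's centred contribution with an independent mean-one multiplier, and the resulting draws mimic the fluctuation of the empirical mean around the truth. A minimal sketch, my own toy example:

```python
import math
import random

def multiplier_bootstrap(xs, n_boot, rng):
    # approximate the law of (empirical mean - true mean) by reweighting
    # centred observations with independent mean-one Gaussian multipliers
    n = len(xs)
    xbar = sum(xs) / n
    out = []
    for _ in range(n_boot):
        w = [rng.gauss(1.0, 1.0) for _ in range(n)]
        out.append(sum(wi * (x - xbar) for wi, x in zip(w, xs)) / n)
    return out

rng = random.Random(3)
xs = [rng.gauss(5.0, 2.0) for _ in range(400)]      # true standard error = 2/20 = 0.1
draws = multiplier_bootstrap(xs, 2000, rng)
boot_sd = math.sqrt(sum(d * d for d in draws) / len(draws))
```

The appeal, in both the Euclidean and the Bures-Wasserstein setting, is that the multipliers are generated by the analyst, so the bootstrap distribution is computable from a single sample.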
 [17] arXiv:2111.12676 [pdf, other]

Title: Super-polynomial accuracy of one dimensional randomized nets using the median-of-means
Subjects: Computation (stat.CO); Numerical Analysis (math.NA); Statistics Theory (math.ST)
Let $f$ be analytic on $[0,1]$ with $|f^{(k)}(1/2)|\leq A\alpha^k k!$ for some constant $A$ and $\alpha<2$. We show that the median estimate of $\mu=\int_0^1 f(x)\,\mathrm{d}x$ under random linear scrambling with $n=2^m$ points converges at the rate $O(n^{-c\log(n)})$ for any $c< 3\log(2)/\pi^2\approx 0.21$. We also get a super-polynomial convergence rate for the sample median of $2k-1$ random linearly scrambled estimates, when $k=\Omega(m)$. When $f$ has a $p$'th derivative that satisfies a $\lambda$-H\"older condition, then the median-of-means has error $O(n^{-(p+\lambda)+\epsilon})$ for any $\epsilon>0$, if $k\to\infty$ as $m\to\infty$.
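The robustness that makes a median of independent randomized estimates attractive shows up already for a plain median-of-means: an extreme value contaminates only the one group containing it, and the median ignores that group. A deterministic toy sketch (my own illustration, not the paper's scrambled-net construction):

```python
def median_of_means(xs, k):
    # split into k groups of equal size, average each, return the median mean
    m = len(xs) // k
    means = sorted(sum(xs[i * m:(i + 1) * m]) / m for i in range(k))
    return means[k // 2]

xs = [1.0] * 29 + [1000.0]      # one wild value among thirty 1.0s
mom = median_of_means(xs, 5)    # groups of 6; only one group is contaminated
mean = sum(xs) / len(xs)        # the plain mean is dragged far from 1.0
```

In the paper the "groups" are independent randomly scrambled nets, and the median discards the rare scramblings whose integration error is atypically large, which is what yields the super-polynomial rate.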
Cross-lists for Thu, 25 Nov 21
 [18] arXiv:2111.12139 (cross-list from cs.LG) [pdf, other]

Title: ChebLieNet: Invariant Spectral Graph NNs Turned Equivariant by Riemannian Geometry on Lie Groups
Comments: submitted to NeurIPS'21, this https URL
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We introduce ChebLieNet, a group-equivariant method on (anisotropic) manifolds. Surfing on the success of graph- and group-based neural networks, we take advantage of recent developments in the geometric deep learning field to derive a new approach to exploit any anisotropies in data. Via discrete approximations of Lie groups, we develop a graph neural network made of anisotropic convolutional layers (Chebyshev convolutions), spatial pooling and unpooling layers, and global pooling layers. Group equivariance is achieved via equivariant and invariant operators on graphs with anisotropic left-invariant Riemannian distance-based affinities encoded on the edges. Thanks to its simple form, the Riemannian metric can model any anisotropies, both in the spatial and orientation domains. This control on the anisotropies of the Riemannian metrics allows one to balance equivariance (anisotropic metric) against invariance (isotropic metric) of the graph convolution layers. Hence, we open the doors to a better understanding of anisotropic properties. Furthermore, we empirically prove the existence of (data-dependent) sweet spots for anisotropic parameters on CIFAR-10. This crucial result is evidence of the benefit we could get by exploiting anisotropic properties in data. We also evaluate the scalability of this approach on STL-10 (image data) and ClimateNet (spherical data), showing its remarkable adaptability to diverse tasks.
 [19] arXiv:2111.12140 (cross-list from cs.LG) [pdf, ps, other]

Title: Filter Methods for Feature Selection in Supervised Machine Learning Applications -- Review and Benchmark
Comments: Source code of the analysis is available on request
Subjects: Machine Learning (cs.LG); Databases (cs.DB); Machine Learning (stat.ML)
The amount of data for machine learning (ML) applications is constantly growing. Not only the number of observations, but especially the number of measured variables (features), increases with ongoing digitization. Selecting the most appropriate features for predictive modeling is an important lever for the success of ML applications in business and research. Feature selection methods (FSM) that are independent of a certain ML algorithm, so-called filter methods, have been suggested in large numbers, but little guidance exists for researchers and quantitative modelers to choose appropriate approaches for typical ML problems. This review synthesizes the substantial literature on feature selection benchmarking and evaluates the performance of 58 methods in the widely used R environment. For concrete guidance, we consider four typical dataset scenarios that are challenging for ML models (noisy, redundant, and imbalanced data, and cases with more features than observations). Drawing on the experience of earlier benchmarks, which have considered far fewer FSMs, we compare the performance of the methods according to four criteria (predictive performance, number of relevant features selected, stability of the feature sets, and runtime). We found that methods relying on the random forest approach, the double input symmetrical relevance filter (DISR) and the joint impurity filter (JIM) were well-performing candidate methods for the given dataset scenarios.
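What makes a filter method "independent of a certain ML algorithm" is that it scores features against the target directly, with no model in the loop. The simplest example is a univariate correlation filter; a pure-Python sketch (my own illustration, far simpler than the 58 benchmarked methods):

```python
import math
import random

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def filter_rank(columns, y):
    # rank feature columns by |correlation with the target|, model-free
    scores = [abs(pearson(col, y)) for col in columns]
    return sorted(range(len(columns)), key=lambda j: -scores[j])

random.seed(0)
y = [random.gauss(0, 1) for _ in range(300)]
informative = [v + random.gauss(0, 0.3) for v in y]   # strongly related to target
noise = [random.gauss(0, 1) for _ in range(300)]      # irrelevant
ranking = filter_rank([noise, informative], y)        # informative column ranks first
```

Because the score is computed per feature, filters are cheap and reusable across models, but, as the review's scenarios highlight, univariate scores can miss redundancy and interactions that multivariate filters like DISR are designed to handle.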
 [20] arXiv:2111.12143 (cross-list from cs.LG) [pdf, other]

Title: Critical initialization of wide and deep neural networks through partial Jacobians: general theory and applications to LayerNorm
Comments: 28 pages, 8 figures
Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); High Energy Physics - Theory (hep-th); Machine Learning (stat.ML)
Deep neural networks are notorious for defying theoretical treatment. However, when the number of parameters in each layer tends to infinity, the network function is a Gaussian process (GP) and a quantitatively predictive description is possible. The Gaussian approximation allows one to formulate criteria for selecting hyperparameters, such as the variances of weights and biases, as well as the learning rate. These criteria rely on the notion of criticality defined for deep neural networks. In this work we describe a new way to diagnose (both theoretically and empirically) this criticality. To that end, we introduce partial Jacobians of a network, defined as derivatives of preactivations in layer $l$ with respect to preactivations in layer $l_0<l$. These quantities are particularly useful when the network architecture involves many different layers. We discuss various properties of the partial Jacobians, such as their scaling with depth and relation to the neural tangent kernel (NTK). We derive the recurrence relations for the partial Jacobians and utilize them to analyze criticality of deep MLP networks with (and without) LayerNorm. We find that the normalization layer changes the optimal values of hyperparameters and critical exponents. We argue that LayerNorm is more stable when applied to preactivations rather than activations, due to larger correlation depth.
 [21] arXiv:2111.12148 (cross-list from eess.SP) [pdf, other]

Title: Machine Learning Based Forward Solver: An Automatic Framework in gprMax
Comments: 6 pages, 6 figures
Subjects: Signal Processing (eess.SP); Geophysics (physics.geo-ph); Machine Learning (stat.ML)
General full-wave electromagnetic solvers, such as those utilizing the finite-difference time-domain (FDTD) method, are computationally demanding for simulating practical GPR problems. We explore the performance of a near-real-time forward modeling approach for GPR that is based on a machine learning (ML) architecture. To ease the process, we have developed a framework that is capable of generating these ML-based forward solvers automatically. The framework uses an innovative training method that combines a predictive dimensionality reduction technique and a large data set of modeled GPR responses from our FDTD simulation software, gprMax. The forward solver is parameterized for a specific GPR application, but the framework can be extended in a straightforward manner to different electromagnetic problems.
 [22] arXiv:2111.12151 (cross-list from cs.LG) [pdf, other]

Title: Best Arm Identification with Safety Constraints
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
The best arm identification problem in the multi-armed bandit setting is an excellent model of many real-world decision-making problems, yet it fails to capture the fact that in the real world, safety constraints often must be met while learning. In this work we study the question of best-arm identification in safety-critical settings, where the goal of the agent is to find the best safe option out of many, while exploring in a way that guarantees certain, initially unknown safety constraints are met. We first analyze this problem in the setting where the reward and safety constraint take a linear structure, and show nearly matching upper and lower bounds. We then analyze a much more general version of the problem where we only assume the reward and safety constraint can be modeled by monotonic functions, and propose an algorithm in this setting which is guaranteed to learn safely. We conclude with experimental results demonstrating the effectiveness of our approaches in scenarios such as safely identifying the best drug out of many in order to treat an illness.
 [23] arXiv:2111.12166 (cross-list from cs.IT) [pdf, other]

Title: Towards Empirical Sandwich Bounds on the Rate-Distortion Function
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)
The rate-distortion (RD) function, a key quantity in information theory, characterizes the fundamental limit of how much a data source can be compressed subject to a fidelity criterion by any compression algorithm. As researchers push for ever-improving compression performance, establishing the RD function of a given data source is not only of scientific interest, but also sheds light on the possible room for improving compression algorithms. Previous work on this problem relied on distributional assumptions on the data source (Gibson, 2017) or only applied to discrete data. By contrast, this paper makes the first attempt at an algorithm for sandwiching the RD function of a general (not necessarily discrete) source, requiring only i.i.d. data samples. We estimate RD sandwich bounds on Gaussian and high-dimensional banana-shaped sources, as well as GAN-generated images. Our RD upper bound on natural images indicates room for improving the performance of state-of-the-art image compression methods by 1 dB in PSNR at various bitrates.
 [24] arXiv:2111.12187 (cross-list from cs.LG) [pdf, other]

Title: Input Convex Gradient Networks
Comments: Accepted to NeurIPS 2021 Optimal Transport and Machine Learning Workshop this https URL
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
The gradients of convex functions are expressive models of non-trivial vector fields. For example, Brenier's theorem yields that the optimal transport map between any two measures on Euclidean space under the squared distance is realized as a convex gradient, which is a key insight used in recent generative flow models. In this paper, we study how to model convex gradients by integrating a Jacobian-vector product parameterized by a neural network, which we call the Input Convex Gradient Network (ICGN). We theoretically study ICGNs and compare them to taking the gradient of an Input-Convex Neural Network (ICNN), empirically demonstrating that a single-layer ICGN can fit a toy example better than a single-layer ICNN. Lastly, we explore extensions to deeper networks and connections to constructions from Riemannian geometry.
 [25] arXiv:2111.12193 (cross-list from cs.LG) [pdf, other]

Title: Multiset-Equivariant Set Prediction with Approximate Implicit Differentiation
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Most set prediction models in deep learning use set-equivariant operations, but they actually operate on multisets. We show that set-equivariant functions cannot represent certain functions on multisets, so we introduce the more appropriate notion of multiset-equivariance. We identify that the existing Deep Set Prediction Network (DSPN) can be multiset-equivariant without being hindered by set-equivariance, and improve it with approximate implicit differentiation, allowing for better optimization while being faster and saving memory. In a range of toy experiments, we show that the perspective of multiset-equivariance is beneficial and that our changes to DSPN achieve better results in most cases. On CLEVR object property prediction, we substantially improve over the state-of-the-art Slot Attention, from 8% to 77% in one of the strictest evaluation metrics, because of the benefits made possible by implicit differentiation.
 [26] arXiv:2111.12258 (cross-list from econ.EM) [pdf, other]

Title: On Recoding Ordered Treatments as Binary Indicators
Subjects: Econometrics (econ.EM); Methodology (stat.ME)
Researchers using instrumental variables to investigate the effects of ordered treatments (e.g., years of education, months of healthcare coverage) often recode treatment into a binary indicator for any exposure (e.g., any college, any healthcare coverage). The resulting estimand is difficult to interpret unless the instruments only shift compliers from no treatment to some positive quantity, and not from some treatment to more; i.e., there are extensive-margin compliers only (EMCO). When EMCO holds, recoded endogenous variables capture a weighted average of treatment effects across complier groups that can be partially unbundled into each group's treated and untreated means. Invoking EMCO along with the standard Local Average Treatment Effect assumptions is equivalent to assuming choices are determined by a simple two-factor selection model in which agents first decide whether to participate in treatment at all and then decide how much; the instruments must only impact relative utility in the first step. Although EMCO constrains unobserved counterfactual choices, it nonetheless places testable restrictions on the joint distribution of outcomes, treatments, and instruments.
 [27] arXiv:2111.12292 (cross-list from cs.CV) [pdf, other]

Title: Improved Fine-tuning by Leveraging Pre-training Data: Theory and Practice
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
As a dominant paradigm, fine-tuning a pre-trained model on the target data is widely used in many deep learning applications, especially for small data sets. However, recent studies have empirically shown that, in some vision tasks, training from scratch attains final performance no worse than this pre-training strategy once the number of training iterations is increased. In this work, we revisit this phenomenon from the perspective of generalization analysis, which is popular in learning theory. Our result reveals that the final prediction precision may have only a weak dependency on the pre-trained model, especially in the case of large training iterations. This observation inspires us to leverage pre-training data for fine-tuning, since this data is also available at fine-tuning time. Our generalization result for using pre-training data shows that the final performance on a target task can be improved when appropriate pre-training data is included in fine-tuning. With the insight of this theoretical finding, we propose a novel selection strategy to select a subset of the pre-training data to help improve generalization on the target task. Extensive experimental results for image classification tasks on 8 benchmark data sets verify the effectiveness of the proposed data-selection-based fine-tuning pipeline.
 [28] arXiv:2111.12295 (cross-list from cs.LG) [pdf, other]

Title: Animal Behavior Classification via Deep Learning on Embedded Systems
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP); Machine Learning (stat.ML)
We develop an end-to-end deep-neural-network-based algorithm for classifying animal behavior from accelerometry data on the embedded system of an artificial-intelligence-of-things (AIoT) device installed in a wearable collar tag. The proposed algorithm jointly performs feature extraction and classification using a set of infinite-impulse-response (IIR) and finite-impulse-response (FIR) filters together with a multilayer perceptron. The IIR and FIR filters can be viewed as specific types of recurrent and convolutional neural network layers, respectively. We evaluate the performance of the proposed algorithm on two real-world datasets collected from grazing cattle. The results show that the proposed algorithm offers good intra- and inter-dataset classification accuracy and outperforms its closest contenders, including two state-of-the-art convolutional-neural-network-based time-series classification algorithms that are significantly more complex. We implement the proposed algorithm on the embedded system of the collar tag's AIoT device to perform in-situ classification of animal behavior, achieving real-time inference from accelerometry data without straining the available computational, memory, or energy resources of the embedded system.
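The filter-bank analogy can be made concrete in a few lines: an FIR filter is a convolution (a conv layer) and a first-order IIR filter is a recurrence (a recurrent layer), with summary statistics of the filtered channels feeding a small classifier. This is an assumed simplification for illustration, not the authors' released implementation:

```python
import numpy as np

def fir(x, b):
    # FIR filtering = convolution with fixed coefficients (a 1-D conv layer).
    return np.convolve(x, b, mode="same")

def iir_first_order(x, a):
    # First-order IIR filter y[n] = (1 - a) * x[n] + a * y[n-1] (a recurrent layer).
    y = np.zeros_like(x)
    prev = 0.0
    for n, xn in enumerate(x):
        prev = (1.0 - a) * xn + a * prev
        y[n] = prev
    return y

rng = np.random.default_rng(0)
accel = rng.standard_normal(256)                  # stand-in accelerometry signal
smooth = iir_first_order(accel, a=0.9)            # low-pass IIR feature channel
edges = fir(accel, np.array([1.0, -1.0]))         # high-pass FIR feature channel
features = np.array([smooth.var(), edges.var()])  # summary features for an MLP head
assert features.shape == (2,) and np.all(features > 0)
```

The appeal for embedded hardware is that both filter types need only a handful of multiply-accumulates per sample.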
 [29] arXiv:2111.12399 (cross-list from cs.LG) [pdf, other]

Title: Dictionary-based Low-Rank Approximations and the Mixed Sparse Coding problem
Authors: Jeremy E. Cohen
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Constrained tensor and matrix factorization models make it possible to extract interpretable patterns from multiway data, so identifiability properties and efficient algorithms for constrained low-rank approximations are important research topics. This work considers low-rank approximations in which the columns of the factor matrices are sparse in a known, possibly overcomplete, basis, a model coined Dictionary-based Low-Rank Approximation (DLRA). While earlier contributions focused on finding factor columns inside a dictionary of candidate columns, i.e., one-sparse approximations, this work is the first to tackle DLRA with sparsity larger than one. I propose to focus on the sparse-coding subproblem, coined Mixed Sparse Coding (MSC), that emerges when solving DLRA with an alternating optimization strategy. Several algorithms based on sparse-coding heuristics (greedy methods, convex relaxations) are provided to solve MSC, and their performance is evaluated on simulated data. I then show how to adapt an efficient MSC solver based on the LASSO to compute dictionary-based matrix factorizations and canonical polyadic decompositions in the context of hyperspectral image processing and chemometrics. These experiments suggest that DLRA extends the modeling capabilities of low-rank approximations, helps reduce estimation variance, and enhances the identifiability and interpretability of estimated factors.
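The LASSO step at the core of such an alternating scheme can be sketched with plain proximal gradient descent (ISTA). This is a generic sparse-coding sketch under assumed toy dimensions, not the paper's exact MSC solver:

```python
import numpy as np

def ista(D, y, lam=0.05, iters=1000):
    # Solve min_c 0.5 * ||y - D c||^2 + lam * ||c||_1 by proximal gradient (ISTA).
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant of the gradient
    c = np.zeros(D.shape[1])
    for _ in range(iters):
        z = c - step * (D.T @ (D @ c - y))  # gradient step on the smooth part
        c = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # soft-thresholding
    return c

rng = np.random.default_rng(1)
D = rng.standard_normal((30, 50))  # overcomplete dictionary
c_true = np.zeros(50)
c_true[[3, 17]] = [2.0, -1.5]      # a 2-sparse code, i.e. sparsity larger than one
y = D @ c_true                     # observed factor column
c_hat = ista(D, y)
# The support of the recovered code matches the ground truth.
assert set(np.flatnonzero(np.abs(c_hat) > 0.5)) == {3, 17}
```

In the alternating scheme, a step like this would be applied to each factor column while the other factors are held fixed.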
 [30] arXiv:2111.12429 (cross-list from cs.LG) [pdf, other]

Title: tsflex: flexible time series processing & feature extraction
Comments: The first two authors contributed equally. Submitted to SoftwareX
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
Time series processing and feature extraction are crucial and time-intensive steps in conventional machine learning pipelines. Existing packages are limited in their real-world applicability, as they cannot cope with irregularly sampled and asynchronous data. We therefore present $\texttt{tsflex}$, a domain-independent, flexible, and sequence-first Python toolkit for processing & feature extraction that is capable of handling irregularly sampled sequences with unaligned measurements. The toolkit is sequence-first in that (1) sequence-based arguments are leveraged for strided-window feature extraction, and (2) the sequence index is maintained through all supported operations. $\texttt{tsflex}$ is flexible in that it natively supports (1) multivariate time series, (2) multiple window-stride configurations, and (3) processing and feature functions from other packages, while (4) making no assumptions about data sampling rate regularity and synchronization. Further functionality includes multiprocessing, in-depth execution-time logging, support for categorical and time-based data, chunking of sequences, and embedded serialization. $\texttt{tsflex}$ is designed to enable fast and memory-efficient time series processing & feature extraction; results indicate that it is more flexible than similar packages while outperforming them in both runtime and memory usage.
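The core strided-window operation can be illustrated generically with NumPy (a sketch of the concept only, not the tsflex API):

```python
import numpy as np

def window_features(ts, window, stride, funcs):
    # Slide a fixed-size window over the series with the given stride and apply
    # each feature function to every window (strided-window feature extraction).
    starts = range(0, len(ts) - window + 1, stride)
    return np.array([[f(ts[s:s + window]) for f in funcs] for s in starts])

ts = np.arange(10, dtype=float)
feats = window_features(ts, window=4, stride=2, funcs=[np.mean, np.std])
assert feats.shape == (4, 2)  # windows start at 0, 2, 4, 6
assert feats[0, 0] == 1.5     # mean of [0, 1, 2, 3]
```

Handling irregularly sampled data, as tsflex does, amounts to defining windows and strides in the units of the sequence index (e.g., timestamps) rather than in sample counts.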
 [31] arXiv:2111.12460 (cross-list from cs.CV) [pdf, other]

Title: ViCE: Self-Supervised Visual Concept Embeddings as Contextual and Pixel Appearance Invariant Semantic Representations
Authors: Robin Karlsson, Tomoki Hayashi, Keisuke Fujii, Alexander Carballo, Kento Ohtani, Kazuya Takeda
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
This work presents a self-supervised method for learning dense, semantically rich visual concept embeddings for images, inspired by methods for learning word embeddings in NLP. Our method improves on prior work by generating more expressive embeddings and by being applicable to high-resolution images. Viewing the generation of natural images as a stochastic process in which a set of latent visual concepts gives rise to observable pixel appearances, our method is formulated to learn the inverse mapping from pixels to concepts. It greatly improves the effectiveness of self-supervised learning for dense embedding maps by introducing superpixelization as a natural hierarchical step up from pixels to a small set of visually coherent regions. Additional contributions are regional contextual masking with non-uniform shapes matching visually coherent patches, and complexity-based view sampling inspired by masked language models. The enhanced expressiveness of our dense embeddings is demonstrated by significant improvements over the state of the art on the representation-quality benchmarks COCO (+12.94 mIoU, +87.6%) and Cityscapes (+16.52 mIoU, +134.2%). Results show favorable scaling and domain generalization properties not demonstrated by prior work.
 [32] arXiv:2111.12486 (cross-list from physics.ao-ph) [pdf, other]

Title: Enhanced monitoring of atmospheric methane from space with hierarchical Bayesian inference
Authors: Clayton Roberts, Oliver Shorttle, Kaisey Mandel, Matthew Jones, Rutger Ijzermans, Bill Hirst, Philip Jonathan
Comments: 20 pages, 6 figures. Under consideration at Nature Communications
Subjects: Atmospheric and Oceanic Physics (physics.ao-ph); Geophysics (physics.geo-ph); Applications (stat.AP)
Methane is a strong greenhouse gas, with a higher radiative forcing per unit mass and a shorter atmospheric lifetime than carbon dioxide. Remote sensing of methane in regions of industrial activity is a key step toward the accurate monitoring of emissions that drive climate change. Whilst the TROPOspheric Monitoring Instrument (TROPOMI) on board the Sentinel-5P satellite is capable of providing daily global measurements of methane columns, the data are often compromised by cloud cover. Here, we develop a statistical model which uses nitrogen dioxide concentration data from TROPOMI to accurately predict values of methane columns, expanding the average daily spatial coverage of observations of the Permian Basin from 16% to 88% in the year 2019. The addition of predicted methane abundances at locations where direct observations are not available will support inversion methods for estimating methane emission rates at shorter timescales than is currently possible.
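The idea of filling cloud-masked methane retrievals with predictions from a co-observed tracer can be illustrated with a conjugate Bayesian linear regression on synthetic numbers (the paper's model is hierarchical and far richer; every value below is made up):

```python
import numpy as np

rng = np.random.default_rng(2)
no2 = rng.uniform(0.0, 1.0, 200)                       # synthetic NO2 columns
ch4 = 1800.0 + 50.0 * no2 + rng.normal(0.0, 2.0, 200)  # synthetic CH4 columns (ppb)

# Conjugate Bayesian linear regression with a vague Gaussian prior on the weights.
X = np.column_stack([np.ones_like(no2), no2])
sigma2, tau2 = 4.0, 1e6                                   # noise and prior variances
cov = np.linalg.inv(X.T @ X / sigma2 + np.eye(2) / tau2)  # posterior covariance
mean = cov @ X.T @ ch4 / sigma2                           # posterior mean

# Predictive mean and variance at a cloud-masked location with NO2 = 0.5.
x_new = np.array([1.0, 0.5])
pred_mean = x_new @ mean
pred_var = sigma2 + x_new @ cov @ x_new
assert abs(pred_mean - 1825.0) < 2.0  # true synthetic value is 1800 + 50 * 0.5
assert pred_var > sigma2              # parameter uncertainty inflates the variance
```

The predictive variance is what makes such gap-filled values usable downstream: inversion methods can down-weight predicted columns relative to direct retrievals.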
 [33] arXiv:2111.12545 (cross-list from cs.LG) [pdf, other]

Title: Learning to Refit for Convex Learning Problems
Subjects: Machine Learning (cs.LG); Computation (stat.CO)
Machine learning (ML) models need to be frequently retrained on changing datasets in a wide variety of application scenarios, including data valuation and uncertainty quantification. To retrain models efficiently, linear approximation methods such as influence functions have been proposed to estimate the impact of data changes on model parameters, but these methods become inaccurate for large dataset changes. In this work, we focus on convex learning problems and propose a general framework that uses neural networks to learn to estimate the optimized model parameters for different training sets. We propose to enforce optimality conditions on the predicted model parameters and to maintain their utility through regularization techniques, both of which significantly improve generalization. Moreover, we rigorously characterize the expressive power of neural networks in approximating the optimizers of convex problems. Empirical results demonstrate the advantage of the proposed method over the state of the art in accurate and efficient model parameter estimation.
 [34] arXiv:2111.12550 (cross-list from cs.HC) [pdf, other]

Title: A Worker-Task Specialization Model for Crowdsourcing: Efficient Inference and Fundamental Limits
Subjects: Human-Computer Interaction (cs.HC); Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)
Crowdsourcing systems have emerged as effective platforms for labeling data at relatively low cost using non-expert workers. However, inferring correct labels from multiple noisy answers has remained a challenging problem, since the quality of answers varies widely across tasks and workers. Many previous works have assumed a simple model in which the ordering of workers by reliability is fixed across tasks, and have focused on estimating worker reliabilities in order to aggregate answers with different weights. We propose a highly general $d$-type worker-task specialization model in which the reliability of each worker can change depending on the type of a given task, where the number $d$ of types can scale with the number of tasks. In this model, we characterize the optimal sample complexity for correctly inferring labels at any given recovery accuracy, and we propose an inference algorithm achieving the order-wise optimal bound. Experiments on both synthetic and real-world datasets show that our algorithm outperforms existing algorithms developed under stricter model assumptions.
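Why type-dependent reliability matters can be shown with a small simulation (hypothetical numbers, not the paper's algorithm): when workers specialize by task type, weighting votes by per-type reliability beats a plain majority vote that implicitly assumes one fixed worker ordering across all tasks.

```python
import numpy as np

rng = np.random.default_rng(3)
n_workers, n_tasks = 5, 2000
types = rng.integers(0, 2, n_tasks)   # d = 2 task types
truth = rng.integers(0, 2, n_tasks)   # binary ground-truth labels
# reliability[w, t] = P(worker w answers a type-t task correctly): workers 0 and 2
# specialize in type 0, workers 1 and 3 in type 1, worker 4 is near-random.
reliability = np.array([[0.9, 0.6], [0.6, 0.9], [0.9, 0.6], [0.6, 0.9], [0.55, 0.55]])

correct = rng.random((n_workers, n_tasks)) < reliability[:, types]
answers = np.where(correct, truth, 1 - truth)

majority = (answers.mean(axis=0) > 0.5).astype(int)
# Oracle type-aware aggregation: weight each +/-1 vote by the log-odds of the
# worker's reliability on that task's type (Bayes-optimal for this model).
w = np.log(reliability[:, types] / (1.0 - reliability[:, types]))
weighted = ((w * (2 * answers - 1)).sum(axis=0) > 0).astype(int)

acc_majority = (majority == truth).mean()
acc_weighted = (weighted == truth).mean()
assert acc_weighted > acc_majority
```

The inference problem the paper addresses is the hard part elided here: estimating the task types and per-type reliabilities from the answers alone, without the oracle table.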
 [35] arXiv:2111.12577 (cross-list from cs.CV) [pdf, other]

Title: A Method for Evaluating the Capacity of Generative Adversarial Networks to Reproduce High-order Spatial Context
Comments: Submitted to IEEE TPAMI. An early version with partial results has been accepted for poster presentation at SPIE MI 2022
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
Generative adversarial networks (GANs) are a kind of deep generative model with the potential to revolutionize biomedical imaging, because GANs have a learned capacity to draw whole-image variates from a lower-dimensional representation of an unknown, high-dimensional distribution that fully describes the input training images. The overarching problem with GANs in clinical applications is that there are no adequate or automatic means of assessing the diagnostic quality of images generated by GANs. In this work, we demonstrate several tests of the statistical accuracy of images output by two popular GAN architectures. We designed several stochastic object models (SOMs) with distinct features that can be recovered after generation by a trained GAN. Several of these features are high-order, algorithmic pixel-arrangement rules which are not readily expressed in covariance matrices. We designed and validated statistical classifiers to detect the known arrangement rules, and then tested the rates at which the different GANs correctly reproduced the rules under a variety of training scenarios and degrees of feature-class similarity. We found that ensembles of generated images can appear visually accurate and attain low Fréchet Inception Distance (FID) scores while failing to exhibit the known spatial arrangements. Furthermore, GANs trained on a spectrum of distinct spatial orders did not respect the given prevalence of those orders in the training data. The main conclusion is that while low-order ensemble statistics are largely correct, there are numerous quantifiable errors per image that can plausibly affect subsequent use of the GAN-generated images.
 [36] arXiv:2111.12594 (cross-list from cs.CV) [pdf, other]

Title: Conditional Object-Centric Learning from Video
Authors: Thomas Kipf, Gamaleldin F. Elsayed, Aravindh Mahendran, Austin Stone, Sara Sabour, Georg Heigold, Rico Jonschkowski, Alexey Dosovitskiy, Klaus Greff
Comments: Project page at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
Object-centric representations are a promising path toward more systematic generalization, providing flexible abstractions upon which compositional world models can be built. Recent work on simple 2D and 3D datasets has shown that models with object-centric inductive biases can learn to segment and represent meaningful objects from the statistical structure of the data alone, without the need for any supervision. However, such fully unsupervised methods still fail to scale to diverse realistic data, despite the use of increasingly complex inductive biases such as priors on the size of objects or the 3D geometry of the scene. In this paper, we instead take a weakly supervised approach and focus on how (1) the temporal dynamics of video data, in the form of optical flow, and (2) conditioning the model on simple object location cues can be used to segment and track objects in significantly more realistic synthetic data. We introduce a sequential extension to Slot Attention, train it to predict optical flow for realistic-looking synthetic scenes, and show that conditioning the initial state of this model on a small set of hints, such as the center of mass of objects in the first frame, is sufficient to significantly improve instance segmentation. These benefits generalize beyond the training distribution to novel objects, novel backgrounds, and longer video sequences. We also find that such initial-state conditioning can be used during inference as a flexible interface to query the model for specific objects or parts of objects, which could pave the way for a range of weakly supervised approaches and allow more effective interaction with trained models.
 [37] arXiv:2111.12664 (cross-list from cs.CV) [pdf, other]

Title: MIO: Mutual Information Optimization using Self-Supervised Binary Contrastive Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Self-supervised contrastive learning is one of the domains that has progressed rapidly over the last few years. Most state-of-the-art self-supervised algorithms use large numbers of negative samples, momentum updates, specific architectural modifications, or extensive training to learn good representations. Such arrangements make the overall training process complex and difficult to analyze. In this paper, we propose a loss function for contrastive learning based on mutual information optimization, in which we model contrastive learning as a binary classification problem: predicting whether a pair is positive. This formulation not only lets us treat the problem mathematically but also helps us outperform existing algorithms. Unlike existing methods, which only maximize the mutual information in a positive pair, the proposed loss function optimizes the mutual information in both positive and negative pairs. We also derive expressions for the parameter gradients flowing into the projector and for the displacement of the feature vectors in feature space, giving mathematical insight into the working principle of contrastive learning. An additive $L_2$ regularizer is used to prevent the feature vectors from diverging and to improve performance. The proposed method outperforms state-of-the-art algorithms on benchmark datasets such as STL-10, CIFAR-10, and CIFAR-100. After only 250 epochs of pretraining, the proposed model achieves accuracies of 85.44%, 60.75%, and 56.81% on the CIFAR-10, STL-10, and CIFAR-100 datasets, respectively.
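The binary-classification view of contrastive learning can be sketched with a logistic loss on pair similarities. This is an illustration of the formulation with an assumed temperature and toy data, not the paper's exact objective:

```python
import numpy as np

def binary_contrastive_loss(z1, z2, labels, temp=0.5):
    # Treat each pair (z1[i], z2[i]) as one binary example: labels[i] = 1 for a
    # positive pair, 0 for a negative pair; classify pairs by cosine similarity.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = (z1 * z2).sum(axis=1) / temp  # temperature-scaled cosine similarity
    p = 1.0 / (1.0 + np.exp(-logits))      # P(pair is positive)
    eps = 1e-9                             # numerical guard for the log
    return -np.mean(labels * np.log(p + eps) + (1 - labels) * np.log(1 - p + eps))

rng = np.random.default_rng(4)
z = rng.standard_normal((8, 16))
good = binary_contrastive_loss(z, z, np.ones(8))   # aligned views, correct labels
bad = binary_contrastive_loss(z, z, np.zeros(8))   # same pairs labeled "negative"
assert good < bad  # agreement is rewarded on positives, penalized on negatives
```

Because negative pairs enter the loss symmetrically, the gradient pushes their similarity down rather than merely ignoring them, which matches the abstract's claim of optimizing information in both pair types.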
 [38] arXiv:2111.12683 (cross-list from physics.ao-ph) [pdf, other]

Title: Data-Based Models for Hurricane Evolution Prediction: A Deep Learning Approach
Subjects: Atmospheric and Oceanic Physics (physics.ao-ph); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Methodology (stat.ME)
Fast and accurate prediction of hurricane evolution from genesis onwards is needed to reduce loss of life and enhance community resilience. In this work, a novel model development methodology for predicting storm trajectory is proposed based on two classes of Recurrent Neural Networks (RNNs). The RNN models are trained on input features available in, or derived from, the HURDAT2 North Atlantic hurricane database maintained by the National Hurricane Center (NHC). The models use probabilities of storms passing through any location, computed from historical data. A detailed analysis of model forecasting error shows that Many-to-One prediction models are less accurate than Many-to-Many models owing to compounded error accumulation, with the exception of 6-hour predictions, for which the two types of model perform comparably. Application to 75 or more test storms in the North Atlantic basin shows that, for short-term forecasting up to 12 hours, the Many-to-Many RNN storm trajectory prediction models presented herein are significantly faster than the ensemble models used by the NHC, while producing errors of comparable magnitude.
Replacements for Thu, 25 Nov 21
 [39] arXiv:1904.12218 (replaced) [pdf, other]

Title: Graph Kernels: A Survey
Journal-ref: Journal of Artificial Intelligence Research (2021), Volume 72, Pages 943-1027
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [40] arXiv:2006.10679 (replaced) [pdf, other]

Title: REGroup: Rank-aggregating Ensemble of Generative Classifiers for Robust Predictions
Comments: WACV 2022. Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [41] arXiv:2007.06226 (replaced) [pdf, other]

Title: AMITE: A Novel Polynomial Expansion for Analyzing Neural Network Nonlinearities
Comments: 13 pages, 2 tables, 9 figures, LaTeX; minor grammar updates, equation numbering, and exposition clarification updates
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)
 [42] arXiv:2007.09738 (replaced) [pdf, other]

Title: Hypothesis tests for structured rank correlation matrices
Subjects: Methodology (stat.ME)
 [43] arXiv:2009.09525 (replaced) [pdf, other]

Title: Deep Autoencoders: From Understanding to Generalization Guarantees
Journal-ref: R. Cosentino, R. Balestriero, R. Baraniuk, B. Aazhang, 2nd Annual Conference on Mathematical and Scientific Machine Learning (2021)
Subjects: Machine Learning (cs.LG); Group Theory (math.GR); Machine Learning (stat.ML)
 [44] arXiv:2010.01184 (replaced) [pdf, other]

Title: Effective Sample Size, Dimensionality, and Generalization in Covariate Shift Adaptation
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Methodology (stat.ME)
 [45] arXiv:2010.15764 (replaced) [pdf, other]

Title: Domain adaptation under structural causal models
Comments: 80 pages, 22 figures, accepted in JMLR
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
 [46] arXiv:2011.09468 (replaced) [pdf, other]

Title: Gradient Starvation: A Learning Proclivity in Neural Networks
Authors: Mohammad Pezeshki, Sékou-Oumar Kaba, Yoshua Bengio, Aaron Courville, Doina Precup, Guillaume Lajoie
Comments: Proceedings of NeurIPS 2021
Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS); Machine Learning (stat.ML)
 [47] arXiv:2011.12873 (replaced) [pdf, other]

Title: Hybrid Confidence Intervals for Informative Uniform Asymptotic Inference After Model Selection
Authors: Adam McCloskey
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
 [48] arXiv:2101.01299 (replaced) [pdf, other]

Title: Bayesian Uncertainty Quantification for Low-rank Matrix Completion
Subjects: Methodology (stat.ME)
 [49] arXiv:2102.03906 (replaced) [pdf, ps, other]

Title: Causal versions of Maximum Entropy and Principle of Insufficient Reason
Authors: Dominik Janzing
Comments: 16 pages
Journal-ref: Journal of Causal Inference (2021)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [50] arXiv:2102.09159 (replaced) [pdf, other]

Title: Robust and Differentially Private Mean Estimation
Comments: 58 pages, 2 figures; both exponential-time and efficient algorithms no longer require a known bound on the true mean
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Information Theory (cs.IT); Machine Learning (stat.ML)
 [51] arXiv:2103.02457 (replaced) [pdf, other]

Title: Continuous scaled phase-type distributions
Subjects: Probability (math.PR); Statistics Theory (math.ST)
 [52] arXiv:2103.07088 (replaced) [pdf, other]

Title: Orthogonalized Kernel Debiased Machine Learning for Multimodal Data Analysis
Subjects: Methodology (stat.ME); Machine Learning (stat.ML)
 [53] arXiv:2104.09401 (replaced) [pdf, ps, other]

Title: Efficient multivariate inference in general factorial diagnostic studies
Subjects: Statistics Theory (math.ST)
 [54] arXiv:2105.00416 (replaced) [pdf, other]

Title: Selective Inference in Propensity Score Analysis
Comments: 32 pages, 2 figures, 5 tables
Subjects: Methodology (stat.ME)
 [55] arXiv:2105.09429 (replaced) [pdf, other]

Title: Point process simulation of generalised inverse Gaussian processes and estimation of the Jaeger integral
Subjects: Methodology (stat.ME); Signal Processing (eess.SP); Probability (math.PR)
 [56] arXiv:2106.00058 (replaced) [pdf, other]

Title: PUDLE: Implicit Acceleration of Dictionary Learning by Backpropagation
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
 [57] arXiv:2106.03969 (replaced) [pdf, other]

Title: Chow-Liu++: Optimal Prediction-Centric Learning of Tree Ising Models
Comments: 49 pages, 3 figures, to appear in FOCS'21
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Statistics Theory (math.ST)
 [58] arXiv:2108.10573 (replaced) [pdf, other]

Title: The staircase property: How hierarchical structure can guide deep learning
Comments: 60 pages, accepted to NeurIPS '21
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
 [59] arXiv:2109.02624 (replaced) [pdf, other]

Title: Functional additive models on manifolds of planar shapes and forms
Subjects: Methodology (stat.ME); Applications (stat.AP); Machine Learning (stat.ML)
 [60] arXiv:2109.08229 (replaced) [pdf, ps, other]

Title: Policy Choice and Best Arm Identification: Asymptotic Analysis of Exploration Sampling
Comments: Submitted to Econometrica
Subjects: Econometrics (econ.EM); Machine Learning (cs.LG); Methodology (stat.ME)
 [61] arXiv:2109.11939 (replaced) [pdf, other]

Title: Discovering PDEs from Multiple Experiments
Comments: Fourth Workshop on Machine Learning and the Physical Sciences (NeurIPS 2021)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Computational Physics (physics.comp-ph)
 [62] arXiv:2109.14501 (replaced) [pdf, other]

Title: Towards a theory of out-of-distribution learning
Authors: Ali Geisa, Ronak Mehta, Hayden S. Helm, Jayanta Dey, Eric Eaton, Jeffery Dick, Carey E. Priebe, Joshua T. Vogelstein
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
 [63] arXiv:2110.05428 (replaced) [pdf, other]

Title: Learning Temporally Causal Latent Processes from General Temporal Data
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
 [64] arXiv:2110.13081 (replaced) [pdf, ps, other]

Title: A Note on Consistency of the Bayes Estimator of the Density
Authors: A.G. Nogales
Comments: arXiv admin note: text overlap with arXiv:2008.00683
Subjects: Statistics Theory (math.ST)
 [65] arXiv:2111.04805 (replaced) [pdf, other]

Title: Solution to the Non-Monotonicity and Crossing Problems in Quantile Regression
Comments: 8 pages, 14 figures, IEEE conference format
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [66] arXiv:2111.05070 (replaced) [pdf, other]

Title: Almost Optimal Universal Lower Bound for Learning Causal DAGs with Atomic Interventions
Comments: Added a new upper bound
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Discrete Mathematics (cs.DM); Methodology (stat.ME); Machine Learning (stat.ML)
 [67] arXiv:2111.11655 (replaced) [pdf, other]

Title: Multi-task manifold learning for small sample size datasets
Comments: 22 pages, 15 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)