LCSL logo
MaLGa logo

Publication year:

  • Introducing Temporal Correlation in Rainfall and Wind Prediction From Underwater Noise
    A Trucco, A Barla, R Bozzano, S Pensieri, A Verri, David Solarna
    IEEE JOURNAL OF OCEANIC ENGINEERING
    Abstract: While in the past the prediction of wind and rainfall from underwater noise was performed using empirical equations fed with very few spectral bins and fitted to the data, it has recently been shown that regression performed using supervised machine learning techniques can benefit from the simultaneous use of all spectral bins, at the cost of increased complexity. However, both empirical equations and machine learning regressors perform the prediction using only the acoustic information collected at the time when one wants to know the wind speed or the rainfall intensity. At most, averages are made between spectra measured at subsequent times (spectral compounding) or between predictions obtained at subsequent times (prediction compounding). In this article, it is proposed to exploit the temporal correlation inherent in the phenomena being predicted, as has already been done in methods that forecast wind and rainfall from their values (and sometimes those of other meteorological quantities) in the recent past. A special architecture of recurrent neural networks, the long short- term memory, is used along with a data set composed of about 16 months of underwater noise measurements (acquired every 10 min, simultaneously with wind and rain measurements above the sea surface) to demonstrate that the introduction of temporal correlation brings significant advantages, improving the accuracy and reducing the problems met in the widely adopted memoryless prediction performed by random forest regression. Working with samples acquired at 10-min intervals, the best performance is obtained by including three noise spectra for wind prediction and six spectra for rainfall prediction.
  • Physics informed machine learning for wind speed prediction
    D Lagomarsino-Oneto, G Meanti, N Pagliana, A Verri, A Mazzino, L Rosasco, A Seminara
    Energy, Volume 268, 2023, 126628
    Abstract: The ability to predict wind is crucial for both energy production and weather forecasting. Mechanistic models that form the basis of traditional forecasting perform poorly near the ground. Here we take an alternative data-driven approach based on supervised learning. We analyze massive datasets of wind measured from anemometers located at 10 m height in 32 locations in central and north-west Italy. We train supervised learning algorithms using the past history of wind to predict its value at future horizons. Using data from single locations and horizons, we compare systematically several algorithms where we vary the input/output variables, the memory and the linear vs non-linear model. We then compare performance of the best algorithms across all locations and forecasting horizons. We find that the optimal design as well as its performance change with the location. We demonstrate that the presence of a diurnal cycle provides a rationale to understand this variation. We conclude with a systematic comparison with state of the art algorithms. When focusing on publicly available datasets, our algorithm improves performance of 0.3 m/s on average. In the aggregate, these comparisons show that, when the model is accurately designed, shallow algorithms are competitive with deep architectures.
  • Iterative regularization in classification via hinge loss diagonal descent
    V Apidopoulos, T Poggio, L Rosasco, S Villa
    ArXiv Preprint
    Abstract: Iterative regularization is a classic idea in regularization theory, that has recently become popular in machine learning. On the one hand, it allows to design efficient algorithms controlling at the same time numerical and statistical accuracy. On the other hand it allows to shed light on the learning curves observed while training neural networks. In this paper, we focus on iterative regularization in the context of classification. After contrasting this setting with that of regression and inverse problems, we develop an iterative regularization approach based on the use of the hinge loss function. More precisely we consider a diagonal approach for a family of algorithms for which we prove convergence as well as rates of convergence. Our approach compares favorably with other alternatives, as confirmed also in numerical simulations.
  • Convergence of an Asynchronous Block-Coordinate Forward-Backward Algorithm for Convex Composite Optimization
    C. Traoré, S. Salzo, S. Villa
    ArXiv preprint
    Abstract: In this paper, we study the convergence properties of a randomized block-coordinate descent algorithm for the minimization of a composite convex objective function, where the block-coordinates are updated asynchronously and randomly according to an arbitrary probability distribution. We prove that the iterates generated by the algorithm form a stochastic quasi-Fejér sequence and thus converge almost surely to a minimizer of the objective function. Moreover, we prove a general sublinear rate of convergence in expectation for the function values and a linear rate of convergence in expectation under an error bound condition of Tseng type.
  • ADHERENT: Learning Human-like Trajectory Generators for Whole-body Control of Humanoid Robots
    P.M. Viceconte, R. Camoriano, G. Romualdi, D. Ferigo, S. Dafarra, S. Traversaro, G. Oriolo, L. Rosasco, D. Pucci
    IEEE Robotics and Automation Letters
    Abstract: Human-like trajectory generation and footstep planning represent challenging problems in humanoid robotics. Recently, research in computer graphics investigated machine-learning methods for character animation based on training human-like models directly on motion capture data. Such methods proved effective in virtual environments, mainly focusing on trajectory visualization. This paper presents ADHERENT, a system architecture integrating machine-learning methods used in computer graphics with whole-body control methods employed in robotics to generate and stabilize human-like trajectories for humanoid robots. Leveraging human motion capture locomotion data, ADHERENT yields a general footstep planner, including forward, sideways, and backward walking trajectories that blend smoothly from one to another. Furthermore, at the joint configuration level, ADHERENT computes data-driven whole-body postural reference trajectories coherent with the generated footsteps, thus increasing the human likeness of the resulting robot motion. Extensive validations of the proposed architecture are presented with both simulations and real experiments on the iCub humanoid robot, thus demonstrating ADHERENT to be robust to varying step sizes and walking speeds.
  • Fast approximation of orthogonal matrices and application to PCA
    C Rusu, Rosasco L
    Signal Processing
    Abstract: Orthogonal projections are a standard technique of dimensionality reduction in machine learning applications. We study the problem of approximating orthogonal matrices so that their application is numerically fast and yet accurate. We find an approximation by solving an optimization problem over a set of structured matrices, that we call extended orthogonal Givens transformations, including Givens rotations as a special case. We propose an efficient greedy algorithm to solve such a problem and show that it strikes a balance between approximation accuracy and speed of computation. The approach is relevant to spectral methods and we illustrate its application to PCA.
  • An elementary analysis of ridge regression with random design
    J Mourtada, Rosasco L
    ArXiv Preprint
    Abstract: In this short note, we present an elementary analysis of the prediction error of ridge regression with random design. The proof is short and self-contained. In particular, it avoids matrix concentration or control of empirical processes, by using a simple combination of exchangeability arguments, matrix identities and operator convexity.
  • Multiclass learning with margin: exponential rates with no bias-variance trade-off
    S Vigogna, G Meanti, De Vito E, Rosasco L
    ArXiv Preprint
    Abstract: We study the behavior of error bounds for multiclass classification under suitable margin conditions. For a wide variety of methods we prove that the classification error under a hard-margin condition decreases exponentially fast without any bias-variance trade-off. Different convergence rates can be obtained in correspondence of different margin assumptions. With a self-contained and instructive analysis we are able to generalize known results from the binary to the multiclass setting.
  • Iterative regularization for low complexity regularizers
    C Molinari, M Massias, Rosasco L, Villa S
    ArXiv Preprint
    Abstract: Iterative regularization exploits the implicit bias of an optimization algorithm to regularize ill-posed problems. Constructing algorithms with such built-in regularization mechanisms is a classic challenge in inverse problems but also in modern machine learning, where it provides both a new perspective on algorithms analysis, and significant speed-ups compared to explicit regularization. In this work, we propose and study the first iterative regularization procedure able to handle biases described by non smooth and non strongly convex functionals, prominent in low-complexity regularization. Our approach is based on a primal-dual algorithm of which we analyze convergence and stability properties, even in the case where the original problem is unfeasible. The general results are illustrated considering the special case of sparse recovery with the penalty. Our theoretical results are complemented by experiments showing the computational benefits of our approach.
  • Nystr\" om Kernel Mean Embeddings
    A Chatalic, N Schreuder, A Rudi, Rosasco L
    ArXiv Preprint
    Abstract: Kernel mean embeddings are a powerful tool to represent probability distributions over arbitrary spaces as single points in a Hilbert space. Yet, the cost of computing and storing such embeddings prohibits their direct use in large-scale settings. We propose an efficient approximation procedure based on the Nystr\"om method, which exploits a small random subset of the dataset. Our main result is an upper bound on the approximation error of this procedure. It yields sufficient conditions on the subsample size to obtain the standard rate while reducing computational costs. We discuss applications of this result for the approximation of the maximum mean discrepancy and quadrature rules, and illustrate our theoretical findings with numerical experiments.
  • Scaling Gaussian Process Optimization by Evaluating a Few Unique Candidates Multiple Times
    D Calandriello, L Carratino, A Lazaric, M Valko, Rosasco L
    ArXiv Preprint
    Abstract: Computing a Gaussian process (GP) posterior has a computational cost cubical in the number of historical points. A reformulation of the same GP posterior highlights that this complexity mainly depends on how many \emph{unique} historical points are considered. This can have important implication in active learning settings, where the set of historical points is constructed sequentially by the learner. We show that sequential black-box optimization based on GPs (GP-Opt) can be made efficient by sticking to a candidate solution for multiple evaluation steps and switch only when necessary. Limiting the number of switches also limits the number of unique points in the history of the GP. Thus, the efficient GP reformulation can be used to exactly and cheaply compute the posteriors required to run the GP-Opt algorithms. This approach is especially useful in real-world applications of GP-Opt with high switch costs (e.g. switching chemicals in wet labs, data/model loading in hyperparameter optimization). As examples of this meta-approach, we modify two well-established GP-Opt algorithms, GP-UCB and GP-EI, to switch candidates as infrequently as possible adapting rules from batched GP-Opt. These versions preserve all the theoretical no-regret guarantees while improving practical aspects of the algorithms such as runtime, memory complexity, and the ability of batching candidates and evaluating them in parallel.
  • Grand-Canonical Optimal Transport
    Di Marino S, M Lewin, L Nenna
    ArXiv Preprint
    Abstract: We study a generalization of the multi-marginal optimal transport problem, which has no fixed number of marginals and is inspired of statistical mechanics. It consists in optimizing a linear combination of the costs for all the possible 's, while fixing a certain linear combination of the corresponding marginals.
  • Optimal Transport losses and Sinkhorn algorithm with general convex regularization
    Di Marino S, A Gerolin
    ArXiv Preprint
    Abstract: We introduce a new class of convex-regularized Optimal Transport losses, which generalizes the classical Entropy-regularization of Optimal Transport and Sinkhorn divergences, and propose a generalized Sinkhorn algorithm. Our framework unifies many regularizations and numerical methods previously appeared in the literature. We show the existence of the maximizer for the dual problem, complementary slackness conditions, providing a complete characterization of solutions for such class of variational problems. As a consequence, we study structural properties of these losses, including continuity, differentiability and provide explicit formulas for the its gradient. Finally, we provide theoretical guarantees of convergences and stability of the generalized Sinkhorn algorithm, even in the continuous setting. The techniques developed here are directly applicable also to study Wasserstein barycenters or, more generally, multi-marginal problems.
  • AI-based component management system for structured content creation, annotation, and publication.
    Barla A, Cuneo M, Nunzi SR, Paniati G, Vian A
    5th International Conference on Intelligent Human Systems Integration (IHSI 2022): Integrating People and Intelligent Systems
    Abstract: Nowadays, the ever changing and growing amount of information, regulations, and data requires large organizations to describe on the web increasingly complex and interdependent business processes and services, ideally creating user-profiled content that is clear and up to date. To successfully achieve this goal, as off-the-shelf solutions are missing, institutions have to embark in a digital transformation process fully endorsed by governance, led by a multidisciplinary team of experts, and strongly integrated with artificial intelligence (AI) tools. In this paper we describe how a content service platform, that integrates human processes and state-of-the-art AI services, was successfully employed in our institution (UniGe) to manage, and support a system of about 200 websites. Following a single-sourcing paradigm, its advent allowed for the decoupling of content and technology, preparing UniGe for the future needs of the semantic web.
  • Fast iterative regularization by reusing data
    C Vega, C Molinari, Rosasco L, Villa S
    ArXiv Preprint
    Abstract: Discrete inverse problems correspond to solving a system of equations in a stable way with respect to noise in the data. A typical approach to enforce uniqueness and select a meaningful solution is to introduce a regularizer. While for most applications the regularizer is convex, in many cases it is not smooth nor strongly convex. In this paper, we propose and study two new iterative regularization methods, based on a primal-dual algorithm, to solve inverse problems efficiently. Our analysis, in the noise free case, provides convergence rates for the Lagrangian and the feasibility gap. In the noisy case, it provides stability bounds and early-stopping rules with theoretical guarantees. The main novelty of our work is the exploitation of some a priori knowledge about the solution set, i.e. redundant information. More precisely we show that the linear systems can be used more than once along the iteration. Despite the simplicity of the idea, we show that this procedure brings surprising advantages in the numerical applications. We discuss various approaches to take advantage of redundant information, that are at the same time consistent with our assumptions and flexible in the implementation. Finally, we illustrate our theoretical findings with numerical simulations for robust sparse recovery and image reconstruction through total variation. We confirm the efficiency of the proposed procedures, comparing the results with state-of-the-art methods.
  • Learning Dynamical Systems via Koopman Operator Regression in Reproducing Kernel Hilbert Spaces
    V Kostic, P Novelli, A Maurer, C Ciliberto, Rosasco L, M Pontil
    ArXiv Preprint
    Abstract: We study a class of dynamical systems modelled as Markov chains that admit an invariant distribution via the corresponding transfer, or Koopman, operator. While data-driven algorithms to reconstruct such operators are well known, their relation- ship with statistical learning is largely unexplored. We formalize a framework to learn the Koopman operator from finite data trajectories of the dynamical system. We consider the restriction of this operator to a reproducing kernel Hilbert space and introduce a notion of risk, from which different estimators naturally arise. We link the risk with the estimation of the spectral decomposition of the Koopman operator. These observations motivate a reduced-rank operator regression (RRR) estimator. We derive learning bounds for the proposed estimator, holding both in i.i.d. and non i.i.d. settings, the latter in terms of mixing coefficients. Our results suggest RRR might be beneficial over other widely used estimators as confirmed in numerical experiments both for forecasting and mode decomposition.
  • AdaTask: Adaptive Multitask Online Learning
    Laforgue P, Della Vecchia A, Cesa-Bianchi N, Rosasco L
    ArXiv Preprint
    Abstract: We introduce and analyze AdaTask, a multitask online learning algorithm that adapts to the unknown structure of the tasks. When the tasks are stochastically activated, we show that the regret of AdaTask is better, by a factor that can be as large as , than the regret achieved by running independent algorithms, one for each task. AdaTask can be seen as a comparator-adaptive version of Follow-the-Regularized-Leader with a Mahalanobis norm potential. Through a variational formulation of this potential, our analysis reveals how AdaTask jointly learns the tasks and their structure. Experiments supporting our findings are presented.
  • Compounding Approaches for Wind Prediction From Underwater Noise by Supervised Learning
    Trucco A, Barla A, Bozzano R, Fava E, Pensieri S, Verri A, Solarna D
    IEEE Journal of Oceanic Engineering
    Abstract: Underwater noise analysis allows estimation of parameters of meteorological interest, difficult to monitor with in situ devices, especially in very harsh environments such as polar waters. Rainfall detection is a fundamental step of acoustical meteorology toward quantifying precipitation and, indirectly, wind. To date, this task has been conducted with some success by using a few frequency bins of the noise spectrum and combining their absolute values and slopes into some inequalities. Unfortunately, these algorithms do not perform well when applied to spectra obtained by averaging multiple noise recordings made over the course of an hour. Supervised, machine learning models allow the use of all the frequency bins in the spectrum, exploiting relationships that are difficult for a human observer to identify. Among the different models tested, a binary classifier based on random forest performed well with moderate computational load. Using a dataset consisting of over 18 000 hourly averaged spectra (approximately 25 months of in situ recordings) and comparing the results with measurements from a surface-mounted rain gauge, the proposed system detects precipitations greater than 1 mm/h with 90% probability, keeping the false alarm probability below 0.5%. This system has demonstrated remarkable robustness as performance is achieved without intentionally excluding any spectra corrupted by sounds produced by other sources, such as naval traffic and wind blowing over the sea surface.
  • Stochastic Zeroth order Descent with Structured Directions
    Rando M, Molinari C, Villa S, Rosasco L
    ArXiv Preprint
    Abstract: We introduce and analyze Structured Stochastic Zeroth order Descent (S-SZD), a finite difference approach which approximates a stochastic gradient on a set of l ≤ d orthogonal directions, where d is the dimension of the ambient space. These directions are randomly chosen, and may change at each step. For smooth convex functions we prove almost sure convergence of the iterates and a convergence rate on the function values of the form O(d/lk−c) for every c < 1/2, which is arbitrarily close to the one of Stochastic Gradient Descent (SGD) in terms of number of iterations. Our bound also shows the benefits of using l multiple directions instead of one. For non-convex functions satisfying the Polyak-Łojasiewicz condition, we establish the first convergence rates for stochastic zeroth order algorithms under such an assumption. We corroborate our theoretical findings in numerical simulations where assumptions are satisfied and on the real-world problem of hyper-parameter optimization, observing that S-SZD has very good practical performances.
  • Learn Fast, Segment Well: Fast Object Segmentation Learning on the iCub Robot
    F Ceola, E Maiettini, G Pasquale, G Meanti, L Rosasco
    ArXiv Preprint
    Abstract: The visual system of a robot has different requirements depending on the application: it may require high accuracy or reliability, be constrained by limited resources or need fast adaptation to dynamically changing environments. In this work, we focus on the instance segmentation task and provide a comprehensive study of different techniques that allow adapting an object segmentation model in presence of novel objects or different domains. We propose a pipeline for fast instance segmentation learning designed for robotic applications where data come in stream. It is based on an hybrid method leveraging on a pre-trained CNN for feature extraction and fast-to-train Kernel-based classifiers. We also propose a training protocol that allows to shorten the training time by performing feature extraction during the data acquisition. We benchmark the proposed pipeline on two robotics datasets and we deploy it on a real robot, i.e. the iCub humanoid. To this aim, we adapt our method to an incremental setting in which novel objects are learned on-line by the robot. The code to reproduce the experiments is publicly available on GitHub.
  • Approximate Bayesian Neural Operators: Uncertainty Quantification for Parametric PDEs
    E Magnani, N Krämer, R Eschenhagen, L Rosasco, P Hennig
    ArXiv Preprint
    Abstract: Neural operators are a type of deep architecture that learns to solve (i.e. learns the nonlinear solution operator of) partial differential equations (PDEs). The current state of the art for these models does not provide explicit uncertainty quantification. This is arguably even more of a problem for this kind of tasks than elsewhere in machine learning, because the dynamical systems typically described by PDEs often exhibit subtle, multiscale structure that makes errors hard to spot by humans. In this work, we first provide a mathematically detailed Bayesian formulation of the ''shallow'' (linear) version of neural operators in the formalism of Gaussian processes. We then extend this analytic treatment to general deep neural operators using approximate methods from Bayesian deep learning. We extend previous results on neural operators by providing them with uncertainty quantification. As a result, our approach is able to identify cases, and provide structured uncertainty estimates, where the neural operator fails to predict well.
  • Efficient Unsupervised Learning for Plankton Images
    PD Alfano, M Rando, M Letizia, F Odone, L Rosasco, VP Pastore
    ArXiv Preprint
    Abstract: Monitoring plankton populations in situ is fundamental to preserve the aquatic ecosystem. Plank- ton microorganisms are in fact susceptible of minor environmental perturbations, that can reflect into consequent morphological and dynamical modifications. Nowadays, the availability of advanced automatic or semi-automatic acquisition systems has been allowing the production of an increasingly large amount of plankton image data. The adoption of machine learning algorithms to classify such data may be affected by the significant cost of manual annotation, due to both the huge quantity of acquired data and the numerosity of plankton species. To address these challenges, we propose an efficient unsupervised learning pipeline to provide accurate classification of plankton microorganisms. We build a set of image descriptors exploiting a two-step procedure. First, a Variational Autoen- coder (VAE) is trained on features extracted by a pre-trained neural network. We then use the learnt latent space as image descriptor for clustering. We compare our method with state-of-the-art unsupervised approaches, where a set of pre-defined hand-crafted features is used for clustering of plankton images. The proposed pipeline outperforms the benchmark algorithms for all the plankton datasets included in our analysis, providing better image embedding properties.
  • From Handheld to Unconstrained Object Detection: a Weakly-supervised On-line Learning Approach
    E Maiettini, A Maracani, R Camoriano, G Pasquale, V Tikhanoff, L Rosasco, L Natale
    2022 31st IEEE International Conference on Robot and Human Interactive Communication (RO-MAN)
    Abstract: Deep Learning (DL) based methods for object detection achieve remarkable performance at the cost of computationally expensive training and extensive data labeling. Robots embodiment can be exploited to mitigate this burden by acquiring automatically annotated training data via a natural interaction with a human showing the object of interest, hand-held. However, learning solely from this data may introduce biases (the so-called domain shift), and prevents adaptation to novel tasks. While Weakly-supervised Learning offers a well-established set of techniques to cope with these problems in general-purpose Computer Vision, its adoption in challenging robotic domains is still at a preliminary stage. In this work, we target the scenario of a robot trained in a teacher-learner setting to detect handheld objects. The aim is to improve detection performance in different settings by letting the robot explore the environment with a limited human labeling budget. We compare several techniques for WSL in detection pipelines to reduce model re-training costs without compromising accuracy, proposing solutions which target the considered robotic scenario. We show that the robot can improve adaptation to novel domains, either by interacting with a human teacher (Active Learning) or with an autonomous supervision (Semi-supervised Learning). We integrate our strategies into an on-line detection method, achieving efficient model update capabilities with few labels. We experimentally benchmark our method on challenging robotic object detection tasks under domain shift.
  • Scalable Causal Discovery with Score Matching
    F Montagna, N Noceti, L Rosasco, K Zhang, F Locatello
    NeurIPS 2022 Workshop on Score-Based Methods
    Abstract: This paper demonstrates how to discover the whole causal graph from the second derivative of the log-likelihood in observational non-linear additive Gaussian noise models. Leveraging scalable machine learning approaches to approximate the score function , we extend the work of Rolland et al., 2022, that only recovers the topological order from the score and requires an expensive pruning step to discover the edges. Our analysis leads to DAS, a practical algorithm that reduces the complexity of the pruning by a factor proportional to the graph size. In practice, DAS achieves competitive accuracy with current state-of-the-art while being over an order of magnitude faster. Overall, our approach enables principled and scalable causal discovery, significantly lowering the compute bar.
  • Implicit regularization with strongly convex bias: stability and acceleration
    S Villa, S Matet, BC Vũ, L Rosasco
    Analysis and Applications, 1-27, 2022
    Abstract: Implicit regularization refers to the property of optimization algorithms to be biased towards a certain class of solutions. This property is relevant to understand the behavior of modern machine learning algorithms as well as to design efficient computational methods. While the case where the bias is given by a Euclidean norm is well understood, implicit regularization schemes for more general classes of biases are much less studied. In this work, we consider the case where the bias is given by a strongly convex functional, in the context of linear models, and data possibly corrupted by noise. In particular, we propose and analyze accelerated optimization methods and highlight a trade-off between convergence speed and stability. Theoretical findings are complemented by an empirical analysis on high-dimensional inverse problems in machine learning and signal processing, showing excellent results compared to the state of the art.
  • Implicit regularization with strongly convex bias: stability and acceleration
    S Villa, S Matet, BC Vũ, L Rosasco
    Analysis and Applications, 1-27, 2022
    Abstract: Implicit regularization refers to the property of optimization algorithms to be biased towards a certain class of solutions. This property is relevant to understand the behavior of modern machine learning algorithms as well as to design efficient computational methods. While the case where the bias is given by a Euclidean norm is well understood, implicit regularization schemes for more general classes of biases are much less studied. In this work, we consider the case where the bias is given by a strongly convex functional, in the context of linear models, and data possibly corrupted by noise. In particular, we propose and analyze accelerated optimization methods and highlight a trade-off between convergence speed and stability. Theoretical findings are complemented by an empirical analysis on high-dimensional inverse problems in machine learning and signal processing, showing excellent results compared to the state of the art.
  • Learning new physics efficiently with nonparametric methods
    M Letizia, G Losapio, M Rando, G Grosso, A Wulzer, M Pierini, M Zanetti, L Rosasco
    The European Physical Journal C 82 (10), 879
    Abstract: We present a machine learning approach for model-independent new physics searches. The corresponding algorithm is powered by recent large-scale implementations of kernel methods, nonparametric learning algorithms that can approximate any continuous function given enough data. Based on the original proposal by D’Agnolo and Wulzer (Phys Rev D 99(1):015014, 2019, arXiv:1806.02350 [hep-ph]), the model evaluates the compatibility between experimental data and a reference model, by implementing a hypothesis testing procedure based on the likelihood ratio. Model-independence is enforced by avoiding any prior assumption about the presence or shape of new physics components in the measurements. We show that our approach has dramatic advantages compared to neural network implementations in terms of training times and computational resources, while maintaining comparable performances. In particular, we conduct our tests on higher dimensional datasets, a step forward with respect to previous studies.
  • The World of Graph Neural Networks: From the Mystery of Generalization to Foundational Limitations
    V Apidopoulos, T Poggio, L Rosasco, S Villa
    ArXiv Preprint
    Abstract: Iterative regularization is a classic idea in regularization theory, that has recently become popular in machine learning. On the one hand, it allows to design efficient algorithms controlling at the same time numerical and statistical accuracy. On the other hand it allows to shed light on the learning curves observed while training neural networks. In this paper, we focus on iterative regularization in the context of classification. After contrasting this setting with that of regression and inverse problems, we develop an iterative regularization approach based on the use of the hinge loss function. More precisely we consider a diagonal approach for a family of algorithms for which we prove convergence as well as rates of convergence. Our approach compares favorably with other alternatives, as confirmed also in numerical simulations.
  • Efficient Hyperparameter Tuning for Large Scale Kernel Ridge Regression
    G. Meanti, L. Carratino, E. De Vito and L. Rosasco
    AISTATS
    Abstract:  Kernel methods provide a principled approach to nonparametric learning. While their basic implementations scale poorly to large problems, recent advances showed that approximate solvers can efficiently handle massive datasets. A shortcoming of these solutions is that hyperparameter tuning is not taken care of, and left for the user to perform. Hyperparameters are crucial in practice and the lack of automated tuning greatly hinders efficiency and usability. In this paper, we work to fill in this gap focusing on kernel ridge regression based on the Nyström approximation. After reviewing and contrasting a number of hyperparameter tuning strategies, we propose a complexity regularization criterion based on a data dependent penalty, and discuss its efficient optimization. Then, we proceed to a careful and extensive empirical evaluation highlighting strengths and weaknesses of the different tuning strategies. Our analysis shows the benefit of the proposed approach, that we hence incorporate in a library for large scale kernel methods to derive adaptively tuned solutions.
  • Attributed Graphettes-Based Preterm Infants Motion Analysis
    D Garbarino, M Moro, C Tacchino, P Moretti, M Casadio, F Odone, A Barla
    Complex Networks & Their Applications X
    Abstract: The study of preterm infants neuro-motor status can be performed by analyzing infants spontaneous movements. Nowadays, available automatic methods for assessing infants motion patterns are still limited. We present a novel pipeline for the characterization of infants spontaneous movements, which given RGB videos leverages on network analysis and NLP. First, we describe a body configuration for each frame considering landmark points on infants bodies as nodes of a network and connecting them depending on their proximity. Each configuration can be described by means of attributed graphettes. We identify each attributed graphette by a string, thus allowing to study videos as texts, ie sequences of strings. This allows us exploiting NLP methods as topic modelling to obtain interpretable representations. We analyze topics to describe both global and local differences in infants with normal and abnormal …
  • Fast object segmentation learning with kernel-based Methods for Robotics
    Ceola F., Maiettini E., Pasquale G., Rosasco L., Natale L.
    ICRA 2021
    Abstract: Object segmentation is a key component in the visual system of a robot that performs tasks like grasping and object manipulation, especially in presence of occlusions. Like many other computer vision tasks, the adoption of deep architectures has made available algorithms that perform this task with remarkable performance. However, adoption of such algorithms in robotics is hampered by the fact that training requires large amount of computing time and it cannot be performed on-line.In this work, we propose a novel architecture for object segmentation, that overcomes this problem and provides comparable performance in a fraction of the time required by the state-of-the-art methods. Our approach is based on a pre-trained Mask R-CNN, in which various layers have been replaced with a set of classifiers and regressors that are retrained for a new task. We employ an efficient Kernel-based method that allows for fast …
  • Constructing fast approximate eigenspaces with application to the fast graph Fourier transforms
    Rusu C., Rosasco L.
    IEEE Transactions on Signal Processing
    Abstract: We investigate numerically efficient approximations of eigenspaces associated with symmetric and general matrices. The eigenspaces are factored into a fixed number of fundamental components that can be efficiently manipulated which we consider to be extended orthogonal Givens or scaling and shear transformations. The number of these components controls the trade-off between approximation accuracy and the computational complexity of projecting on the eigenspaces. We write minimization problems for the single fundamental components and provide closed-form solutions. Then we propose algorithms that iteratively update all these components until convergence. We show results on random matrices and an application on the approximation of graph Fourier transforms for directed and undirected graphs.
  • Regularized ERM on random subspaces
    Della Vecchia A., Mourtada J., De Vito E., Rosasco L.
    AISTATS 2021
    Abstract: We study a natural extension of classical empirical risk minimization, where the hypothesis space is a random subspace of a given space. In particular, we consider possibly data dependent subspaces spanned by a random subset of the data, recovering as a special case Nystr {ö} m approaches for kernel methods. Considering random subspaces naturally leads to computational savings, but the question is whether the corresponding learning accuracy is degraded. These statistical-computational tradeoffs have been recently explored for the least squares loss and self-concordant loss functions, such as the logistic loss. Here, we work to ex-tend these results to convex Lipschitz loss functions, that might not be smooth, such as the hinge loss used in support vector ma-chines. This extension requires developing new proofs, that use different technical tools. Our main results show the existence of different settings, depending on how hard the learning problem is, for which computational efficiency can be improved with no loss in performance. Theoretical results are illustrated with simple numerical experiments.
  • Asymptotics of ridge (less) regression under general source condition
    Richards D., Mourtada J., Rosasco L
    AISTATS 2021
    Abstract: We analyze the prediction error of ridge regression in an asymptotic regime where the sample size and dimension go to infinity at a proportional rate. In particular, we consider the role played by the structure of the true regression parameter. We observe that the case of a general deterministic parameter can be reduced to the case of a random parameter from a structured prior. The latter assumption is a natural adaptation of classic smoothness assumptions in nonparametric regression, which are known as source conditions in the the context of regularization theory for inverse problems. Roughly speaking, we assume the large coefficients of the parameter are in correspondence to the principal components. In this setting a precise characterisation of the test error is obtained, depending on the inputs covariance and regression parameter structure. We illustrate this characterisation in a simplified setting to investigate the influence of the true parameter on optimal regularisation for overparameterized models. We show that interpolation (no regularisation) can be optimal even with bounded signal-to-noise ratio (SNR), provided that the parameter coefficients are larger on high-variance directions of the data, corresponding to a more regular function than posited by the regularization term. This contrasts with previous work considering ridge regression with isotropic prior, in which case interpolation is only optimal in the limit of infinite SNR.
  • Regularization: From Inverse Problems to Large-Scale Machine Learning
    De Vito E, Rosasco L, A Rudi
    Harmonic and Applied Analysis
    Abstract: We discuss regularization methods in machine learning with an emphasis on the interplay between statistical and computational aspects. We begin recalling a connection between inverse problem and machine learning. Then, we discuss how it allows to translate modeling principles into computational procedures, suggesting strategies to deal with large-scale learning problems.
  • Accelerated iterative regularization via dual diagonal descent
    Calatroni L., Garrigos G., Rosasco L., Villa S.
    SIAM Journal on Optimization
    Abstract: We propose and analyze an accelerated iterative dual diagonal descent algorithm for the solution of linear inverse problems with strongly convex regularization and general data-fit functions. We develop an inertial approach of which we analyze both convergence and stability properties. Using tools from inexact proximal calculus, we prove early stopping results with optimal convergence rates for additive data terms and further consider more general cases, such as the Kullback--Leibler divergence, for which different type of proximal point approximations hold.
  • On the Emergence of Whole-body Strategies from Humanoid Robot Push-recovery Learning
    Ferigo D., Camoriano R., Viceconte P.M., Calandriello D., Traversaro S., Rosasco L., Pucci D.
    IEEE Robotics and Automation Letters
    Abstract: Balancing and push-recovery are essential capabilities enabling humanoid robots to solve complex locomotion tasks. In this context, classical control systems tend to be based on simplified physical models and hard-coded strategies. Although successful in specific scenarios, this approach requires demanding tuning of parameters and switching logic between specifically-designed controllers for handling more general perturbations. We apply model-free Deep Reinforcement Learning for training a general and robust humanoid push-recovery policy in a simulation environment. Our method targets high-dimensional whole-body humanoid control and is validated on the iCub humanoid. Reward components incorporating expert knowledge on humanoid control enable fast learning of several robust behaviors by the same policy, spanning the entire body. We validate our method with extensive quantitative analyses in simulation, including out-of-sample tasks which demonstrate policy robustness and generalization, both key requirements towards real-world robot deployment.
  • Parallel random block-coordinate forward–backward algorithm: a unified convergence analysis
    Salzo S., Villa S.
    Mathematical Programming
    Abstract: We study the block-coordinate forward–backward algorithm in which the blocks are updated in a random and possibly parallel manner, according to arbitrary probabilities. The algorithm allows different stepsizes along the block-coordinates to fully exploit the smoothness properties of the objective function. In the convex case and in an infinite dimensional setting, we establish almost sure weak convergence of the iterates and the asymptotic rate o(1/n) for the mean of the function values. We derive linear rates under strong convexity and error bound conditions. Our analysis is based on an abstract convergence principle for stochastic descent algorithms which allows to extend and simplify existing results.
  • Structured Prediction for CRiSP Inverse Kinematics Learning with Misspecified Robot Models
    Marconi G.M., Camoriano R., Rosasco L., Ciliberto C.
    IEEE Robotics and Automation Letters
    Abstract: With the recent advances in machine learning, problems that traditionally would require accurate modeling to be solved analytically can now be successfully approached with data-driven strategies. Among these, computing the inverse kinematics of a redundant robot arm poses a significant challenge due to the non-linear structure of the robot, the hard joint constraints and the non-invertible kinematics map. Moreover, most learning algorithms consider a completely data-driven approach, while often useful information on the structure of the robot is available and should be positively exploited. In this work, we present a simple, yet effective, approach for learning the inverse kinematics. We introduce a structured prediction algorithm that combines a data-driven strategy with the model provided by a forward kinematics function – even when this function is misspecified – to accurately solve the problem. The proposed approach ensures that predicted joint configurations are well within the robot's constraints. We also provide statistical guarantees on the generalization properties of our estimator as well as an empirical evaluation of its performance on trajectory reconstruction tasks.
  • Proximal Gradient Methods for Machine Learning and Imaging
    S Salzo, Villa S
    Harmonic and Applied Analysis
    Abstract: Convex optimization plays a key role in data sciences. The objective of this work is to provide basic tools and methods at the core of modern nonlinear convex optimization. Starting from the gradient descent method we will focus on a comprehensive convergence analysis for the proximal gradient algorithm and its state-of-the art variants, including accelerated, stochastic and block-wise implementations, which are nowadays very popular techniques to solve machine learning and inverse problems.
  • Mean Nystr\" om Embeddings for Adaptive Compressive Learning
    A Chatalic, L Carratino, De Vito E, Rosasco L
    ArXiv Preprint
    Abstract: Compressive learning is an approach to efficient large scale learning based on sketching an entire dataset to a single mean embedding (the sketch), i.e. a vector of generalized moments. The learning task is then approximately solved as an inverse problem using an adapted parametric model. Previous works in this context have focused on sketches obtained by averaging random features, that while universal can be poorly adapted to the problem at hand. In this paper, we propose and study the idea of performing sketching based on data-dependent Nystr\"om approximation. From a theoretical perspective we prove that the excess risk can be controlled under a geometric assumption relating the parametric model used to learn from the sketch and the covariance operator associated to the task at hand. Empirically, we show for k-means clustering and Gaussian modeling that for a fixed sketch size, Nystr\"om sketches indeed outperform those built with random features.
  • Representation theorems for normed modules
    Di Marino S, D Lučić, E Pasqualetto
    ArXiv Preprint
    Abstract: In this paper we study the structure theory of normed modules, which have been introduced by Gigli. The aim is twofold: to extend von Neumann's theory of liftings to the framework of normed modules, thus providing a notion of precise representative of their elements; to prove that each separable normed module can be represented as the space of sections of a measurable Banach bundle. By combining our representation result with Gigli's differential structure, we eventually show that every metric measure space (whose Sobolev space is separable) is associated with a cotangent bundle in a canonical way.
  • First order expansion in the semiclassical limit of the Levy-Lieb functional
    M Colombo, Di Marino S, F Stra
    ArXiv Preprint
    Abstract: We prove the conjectured first order expansion of the Levy-Lieb functional in the semiclassical limit, arising from Density Functional Theory (DFT). This is accomplished by interpreting the problem as the singular perturbation of an Optimal Transport problem via a Dirichlet penalization.
  • A Stochastic Bregman Primal-Dual Splitting Algorithm for Composite Optimization
    A Silveti-Falls, C Molinari, J Fadili
    ArXiv Preprint
    Abstract: We study a stochastic first order primal-dual method for solving convex-concave saddle point problems over real reflexive Banach spaces using Bregman divergences and relative smoothness assumptions, in which we allow for stochastic error in the computation of gradient terms within the algorithm. We show ergodic convergence in expectation of the Lagrangian optimality gap with a rate of O(1/k) and that every almost sure weak cluster point of the ergodic sequence is a saddle point in expectation under mild assumptions. Under slightly stricter assumptions, we show almost sure weak convergence of the pointwise iterates to a saddle point. Under a relative strong convexity assumption on the objective functions and a total convexity assumption on the entropies of the Bregman divergences, we establish almost sure strong convergence of the pointwise iterates to a saddle point. Our framework is general and does not need strong convexity of the entropies inducing the Bregman divergences in the algorithm. Numerical applications are considered including entropically regularized Wasserstein barycenter problems and regularized inverse problems on the simplex.
  • Inexact and stochastic generalized conditional gradient with augmented Lagrangian and proximal step
    J Fadili, C Molinari, A Silveti-Falls
    Journal of Nonsmooth Analysis and Optimization
    Abstract: In this paper we propose and analyze inexact and stochastic versions of the CGALP algorithm developed in [25], which we denote ICGALP , that allow for errors in the computation of several important quantities. In particular this allows one to compute some gradients, proximal terms, and/or linear minimization oracles in an inexact fashion that facilitates the practical application of the algorithm to computationally intensive settings, e.g., in high (or possibly infinite) dimensional Hilbert spaces commonly found in machine learning problems. The algorithm is able to solve composite minimization problems involving the sum of three convex proper lower-semicontinuous functions subject to an affine constraint of the form Ax = b for some bounded linear operator A. Only one of the functions in the objective is assumed to be differentiable, the other two are assumed to have an accessible proximal operator and a linear minimization oracle. As main results, we show convergence of the Lagrangian values (so-called convergence in the Bregman sense) and asymptotic feasibility of the affine constraint as well as strong convergence of the sequence of dual variables to a solution of the dual problem, in an almost sure sense. Almost sure convergence rates are given for the Lagrangian values and the feasibility gap for the ergodic primal variables. Rates in expectation are given for the Lagrangian values and the feasibility gap subsequentially in the pointwise sense. Numerical experiments verifying the predicted rates of convergence are shown as well.
  • A Supervised Learning Approach for Rainfall Detection From Underwater Noise Analysis
    Trucco A., Bozzano R., Fava E., Pensieri S., Verri A., Barla A.
    IEEE Journal of Oceanic Engineering
    Abstract: Underwater noise analysis allows estimation of parameters of meteorological interest, difficult to monitor with in situ devices, especially in very harsh environments such as polar waters. Rainfall detection is a fundamental step of acoustical meteorology toward quantifying precipitation and, indirectly, wind. To date, this task has been conducted with some success by using a few frequency bins of the noise spectrum and combining their absolute values and slopes into some inequalities. Unfortunately, these algorithms do not perform well when applied to spectra obtained by averaging multiple noise recordings made over the course of an hour. Supervised, machine learning models allow the use of all the frequency bins in the spectrum, exploiting relationships that are difficult for a human observer to identify. Among the different models tested, a binary classifier based on random forest performed well with moderate computational load. Using a dataset consisting of over 18 000 hourly averaged spectra (approximately 25 months of in situ recordings) and comparing the results with measurements from a surface-mounted rain gauge, the proposed system detects precipitations greater than 1 mm/h with 90% probability, keeping the false alarm probability below 0.5%. This system has demonstrated remarkable robustness as performance is achieved without intentionally excluding any spectra corrupted by sounds produced by other sources, such as naval traffic and wind blowing over the sea surface.
  • Predicting Tennis Match Outcomes with Network Analysis and Machine Learning
    Bayram F., Garbarino D., Barla A.
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Abstract: Singles tennis is one of the most popular individual sports in the world. Many researchers have embarked on a wide range of approaches to model a tennis match, using probabilistic modeling, or applying machine learning models to predict the outcome of matches. In this paper, we propose a novel approach based on network analysis to infer a surface-specific and time-varying score for professional tennis players and use it in addition to players’ statistics of previous matches to represent tennis match data. Using the resulting features, we apply advanced machine learning paradigms such as Multi-Output Regression and Learning Using Privileged Information, and compare the results with standard machine learning approaches. The models are trained and tested on more than 83,000 men’s singles tennis matches between the years 1991 and 2020. Evaluating the results shows the proposed methods provide more accurate predictions of tennis match outcome than classical approaches and outperform the existing methods in the literature and the current state-of-the-art models in tennis.
  • Where Do We Stand in Regularization for Life Science Studies?
    V Tozzo, C Azencott, S Fiorini, E Fava, A Trucco, A Barla
    Journal of Computational Biology
    Abstract: More and more biologists and bioinformaticians turn to machine learning to analyze large amounts of data. In this context, it is crucial to understand which is the most suitable data analysis pipeline for achieving reliable results. This process may be challenging, due to a variety of factors, the most crucial ones being the data type and the general goal of the analysis (e.g., explorative or predictive). Life science data sets require further consideration as they often contain measures with a low signal-to-noise ratio, high-dimensional observations, and relatively few samples. In this complex setting, regularization, which can be defined as the introduction of additional information to solve an ill-posed problem, is the tool of choice to obtain robust models. Different regularization practices may be used depending both on characteristics of the data and of the question asked, and different choices may lead to different results. In …
  • Epigenetic Regulation in Myelodysplastic Syndrome as A Consequence of A MicroRNA Dysregulation
    D Avenoso, M Squillario, A Barla, A Verri, E Carminati, C Nurra, E Todisco, F Gigli, F Bertolini, G Gregato, C Poletti, M Miglino
    ResearchSquare preprint
    Abstract: The myelodysplastic syndromes are a group of diseases characterized by impairment in hemopoiesis and variable tendency to progress to acute myeloid leukemia. Recent data regarding the MDS pathogenesis highlight alterations in several compartments of DNA replication, cell cycle control, and gene expression machinery. As previously reported, miRNA families are involved in MDS pathogenesis. Also, different miRNAs have been characterized in AML and MDS and allow the identi cation of prognostic risk categories independent from the revised international prognostic scoring system (R-IPSS). Herein we report the results and the possible scenarios regarding pathogenesis and disease progression secondary to the evidence of considerable gene network derangement following miRNA evaluation in a cohort of patients.
  • Temporal Pattern Detection in Time-Varying Graphical Models
    Tomasi F., Tozzo V. , Barla A.
    25th International Conference on Pattern Recognition (ICPR)
    Abstract: Graphical models allow to describe the interplay among variables of a system through a compact representation, suitable when relations evolve over time. For example, in a biological setting, genes interact differently depending on external environmental or metabolic factors. To incorporate this dynamics a viable strategy is to estimate a sequence of temporally related graphs assuming similarity among samples in different time points. While adjacent time points may direct the analysis towards a robust estimate of the underlying graph, the resulting model will not incorporate long-term or recurrent temporal relationships. In this work we propose a dynamical network inference model that leverages on kernels to consider general temporal patterns (such as circadian rhythms or seasonality). We show how our approach may also be exploited when the recurrent patterns are unknown, by coupling the network inference with a clustering procedure that detects possibly non-consecutive similar networks. Such clusters are then used to build similarity kernels. The convexity of the functional is determined by whether we impose or infer the kernel. In the first case, the optimisation algorithm exploits efficiently proximity operators with closed-form solutions. In the other case, we resort to an alternating minimisation procedure which jointly learns the temporal kernel and the underlying network. Extensive analysis on synthetic data shows the efficacy of our models compared to state-of-the-art methods. Finally, we applied our approach on two realworld applications to show how considering long-term patterns is fundamental to have insights on the behaviour of a complex system.
  • Understanding neural networks with reproducing kernel Banach spaces
    Francesca Bartolucci, Ernesto De Vito, Lorenzo Rosasco, Stefano Vigogna
    arXiv preprint arXiv:2109.09710
    Abstract: Characterizing the function spaces corresponding to neural networks can provide a way to understand their properties. In this paper we discuss how the theory of reproducing kernel Banach spaces can be used to tackle this challenge. In particular, we prove a representer theorem for a wide class of reproducing kernel Banach spaces that admit a suitable integral representation and include one hidden layer neural networks of possibly infinite width. Further, we show that, for a suitable class of ReLU activation functions, the norm in the corresponding reproducing kernel Banach space can be characterized in terms of the inverse Radon transform of a bounded real measure, with norm given by the total variation norm of the measure. Our analysis simplifies and extends recent results in [34,29,30].
  • ParK: Sound and Efficient Kernel Ridge Regression by Feature Space Partitions
    Luigi Carratino, Stefano Vigogna, Daniele Calandriello, Lorenzo Rosasco
    NeurIPS 2021
    Abstract: We introduce ParK, a new large-scale solver for kernel ridge regression. Our approach combines partitioning with random projections and iterative optimization to reduce space and time complexity while provably maintaining the same statistical accuracy. In particular, constructing suitable partitions directly in the feature space rather than in the input space, we promote orthogonality between the local estimators, thus ensuring that key quantities such as local effective dimension and bias remain under control. We characterize the statistical-computational tradeoff of our model, and demonstrate the effectiveness of our method by numerical experiments on large-scale datasets.
  • Ada-BKB: Scalable Gaussian Process Optimization on Continuous Domain by Adaptive Discretization
    Marco Rando, Luigi Carratino, Silvia Villa, Lorenzo Rosasco
    arXiv preprint arXiv:2106.08598
    Abstract: Gaussian process optimization is a successful class of algorithms (e.g. GP-UCB) to optimize a black-box function through sequential evaluations. However, when the domain of the function is continuous, Gaussian process optimization has to either rely on a fixed discretization of the space, or solve a non-convex optimization subproblem at each evaluation. The first approach can negatively affect performance, while the second one puts a heavy computational burden on the algorithm. A third option, that only recently has been theoretically studied, is to adaptively discretize the function domain. Even though this approach avoids the extra non-convex optimization costs, the overall computational complexity is still prohibitive. An algorithm such as GP-UCB has a runtime of O(T^4), where T is the number of iterations. In this paper, we introduce Ada-BKB (Adaptive Budgeted Kernelized Bandit), a no-regret Gaussian process optimization algorithm for functions on continuous domains, that provably runs in O(T^2 d_ ext{eff}^2), where d_ ext{eff} is the effective dimension of the explored space, and which is typically much smaller than T. We corroborate our findings with experiments on synthetic non-convex functions and on the real-world problem of hyper-parameter optimization.
  • From inexact optimization to learning via gradient concentration
    Bernhard Stankewitz, Nicole Mücke, Lorenzo Rosasco
    arXiv preprint arXiv:2106.05397
    Abstract: Optimization was recently shown to control the inductive bias in a learning process, a property referred to as implicit, or iterative regularization. The estimator obtained iteratively minimizing the training error can generalise well with no need of further penalties or constraints. In this paper, we investigate this phenomenon in the context of linear models with smooth loss functions. In particular, we investigate and propose a proof technique combining ideas from inexact optimization and probability theory, specifically gradient concentration. The proof is easy to follow and allows to obtain sharp learning bounds. More generally, it highlights a way to develop optimization results into learning guarantees.
  • Optimal distributed control of linear parabolic equations by spectral decomposition
    Martin Lazar, Cesare Molinari
    Optimal control, applications and methods
    Abstract: We construct an algorithm for solving a constrained optimal control problem for a first‐order evolutionary system governed by a positive self‐adjoint operator. The problem consists in identifying distributed control that minimizes a given cost functional, which comprises a cost of the control and a trajectory regulation term, while steering the final state close to a given target. The approach explores the dual problem and it generalizes the Hilbert Uniqueness Method (HUM). The practical implementation of the algorithm is based on a spectral decomposition of the operator determining the dynamics of the system. Once this decomposition is available – which can be done offline and saved for future use – the optimal control problem is solved almost instantaneously. It is practically reduced to a scalar nonlinear equation for the optimal Lagrange multiplier. The efficiency of the algorithm is demonstrated through numerical examples corresponding to different types of control operators and penalization terms.
  • Anderson acceleration of coordinate descent
    Quentin Bertrand and Mathurin Massias
    AISTATS 2021
    Abstract: Acceleration of first order methods is mainly obtained via inertial techniques à la Nesterov, or via nonlinear extrapolation. The latter has known a recent surge of interest, with successful applications to gradient and proximal gradient techniques. On multiple Machine Learning problems, coordinate descent achieves performance significantly superior to full-gradient methods. Speeding up coordinate descent in practice is not easy: inertially accelerated versions of coordinate descent are theoretically accelerated, but might not always lead to practical speed-ups. We propose an accelerated version of coordinate descent using extrapolation, showing considerable speed up in practice, compared to inertial accelerated coordinate descent and extrapolated (proximal) gradient descent. Experiments on least squares, Lasso, elastic net and logistic regression validate the approach.
  • Iterative regularization for convex regularizers
    Cesare Molinari, Mathurin Massias, Lorenzo Rosasco and Silvia Villa
    AISTATS 2021
    Abstract:  We study iterative regularization for linear models, when the bias is convex but not necessarily strongly convex. We characterize the stability properties of a primal-dual gradient based approach, analyzing its convergence in the presence of worst case deterministic noise. As a main example, we specialize and illustrate the results for the problem of robust sparse recovery. Key to our analysis is a combination of ideas from regularization theory and optimization in the presence of errors. Theoretical results are complemented by experiments showing that state-of-the-art performances can be achieved with considerable computational speed-ups. .
  • A robust method for statistical testing of empirical power-law distributions
    Garbarino D., Tozzo V., Vian A., Barla A.
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Abstract: The World-Wide-Web is a complex system naturally represented by a directed network of documents (nodes) connected through hyperlinks (edges). In this work, we focus on one of the most relevant topological properties that characterize the network, i.e. being scale-free. A directed network is scale-free if its in-degree and out-degree distributions have an approximate and asymptotic power-law behavior. If we consider the Web as a whole, it presents empirical evidence of such property. On the other hand, when we restrict the study of the degree distributions to specific sub-categories of websites, there is no longer strong evidence for it. For this reason, many works questioned the almost universal ubiquity of the scale-free property. Moreover, existing statistical methods to test whether an empirical degree distribution follows a power law suffer from large sample sizes and/or noisy data.In this paper, we propose an extension of a state-of-the-art method that overcomes such problems by applying a Monte Carlo sub-sampling procedure on the graphs. We show on synthetic experiments that even small variations of true power-law distributed data causes the state-of-the-art method to reject the hypothesis, while the proposed method is more sound and stable under such variations.Lastly, we perform a study on 3 websites showing that indeed, depending on their category, some accept and some refuse the hypothesis of being power-law. We argue that our method could be used to better characterize topological properties deriving from different generative principles: central or peripheral.
  • A telescope GWAS analysis strategy, based on SNPs-genes-pathways ensamble and on multivariate algorithms, to characterize late onset Alzheimer’s disease
    Squillario M. et al.
    Scientific Reports
    Abstract: Genome–wide association studies (GWAS) have revealed a plethora of putative susceptibility genes for Alzheimer’s disease (AD), with the sole exception of APOE gene unequivocally validated in independent study. Considering that the etiology of complex diseases like AD could depend on functional multiple genes interaction network, here we proposed an alternative GWAS analysis strategy based on (i) multivariate methods and on a (ii) telescope approach, in order to guarantee the identification of correlated variables, and reveal their connections at three biological connected levels. Specifically as multivariate methods, we employed two machine learning algorithms and a genetic association test and we considered SNPs, Genes and Pathways features in the analysis of two public GWAS dataset (ADNI-1 and ADNI-2). For each dataset and for each feature we addressed two binary classifications tasks: cases vs. controls and the low vs. high risk of developing AD considering the allelic status of APOEe4. This complex strategy allowed the identification of SNPs, genes and pathways lists statistically robust and meaningful from the biological viewpoint. Among the results, we confirm the involvement of TOMM40 gene in AD and we propose GRM7 as a novel gene significantly associated with AD.
  • Classification of Epileptic Activity Through Temporal and Spatial Characterization of Intracranial Recordings
    D’Amario V., Arnulfo G., Nobili L., Barla A.
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Abstract: Focal epilepsy is a chronic condition characterized by hyper-activity and abnormal synchronization of a specific brain region. For pharmacoresistant patients, the surgical resection of the critical area is considered a valid clinical solution, therefore, an accurate localization is crucial to minimize neurological damage. In current clinical routine the characterization of the Epileptogenic Zone (EZ) is performed using invasive methods, such as Stereo-ElectroEncephaloGraphy (SEEG). Medical experts perform the tag of neural electrophysiological recordings by visually inspecting the acquired data, a highly time consuming and subjective procedure. Here we show the results of an automatic multi-modal classification method for the evaluation of critical areas in focal epileptic patients. The proposed method represents an attempt in the characterization of brain areas which integrates the anatomical information on neural tissue, inferred using Magnetic Resonance Imaging (MRI) in combination with spectral features extracted from SEEG recordings.
  • Gut Microbiota in T1DM-Onset Pediatric Patients: Machine-Learning Algorithms to Classify Microorganisms as Disease Linked
    Biassoni R., Di Marco E., Squillario M., Barla A., Piccolo G., Ugolotti E., Gatti C., Minuto N., Patti G., Maghnie M., d’Annunzio G.
    Journal of Clinical Endocrinology and Metabolism
    Abstract: The purpose of this work is to find the gut microbial fingerprinting of pediatric patients with type 1 diabetes.The microbiome of 31 children with type 1 diabetes at onset and of 25 healthy children was determined using multiple polymorphic regions of the 16S ribosomal RNA. We performed machine-learning analyses and metagenome functional analysis to identify significant taxa and their metabolic pathways content.Compared with healthy controls, patients showed a significantly higher relative abundance of the following most important taxa: Bacteroides stercoris, Bacteroides fragilis, Bacteroides intestinalis, Bifidobacterium bifidum, Gammaproteobacteria and its descendants, Holdemania, and Synergistetes and its descendants. On the contrary, the relative abundance of Bacteroides vulgatus, Deltaproteobacteria and its descendants, Parasutterella and the Lactobacillus, Turicibacter genera were significantly lower in patients with respect to healthy controls. The predicted metabolic pathway more associated with type 1 diabetes patients concerns “carbon metabolism,” sugar and iron metabolisms in particular. Among the clinical variables considered, standardized body mass index, anti-insulin autoantibodies, glycemia, hemoglobin A1c, Tanner stage, and age at onset emerged as most significant positively or negatively correlated with specific clusters of taxa.The relative abundance and supervised analyses confirmed the importance of B stercoris in type 1 diabetes patients at onset and showed a relevant role of Synergistetes and its descendants in patients with respect to healthy controls. In general the robustness and coherence of the showed results underline the relevance of studying the microbioma using multiple polymorphic regions, different types of analysis, and different approaches within each analysis.
  • Multi-parameters Model Selection for Network Inference
    Tozzo V., Barla A.
    Studies in Computational Intelligence
    Abstract: Network inference is the reverse-engineering problem of inferring graphs from data. With the always increasing availability of data, methods based on probability assumptions that infer multiple intertwined networks have been proposed in literature. These methods, while being extremely flexible, have the major drawback of presenting a high number of hyper-parameters that need to be tuned. The tuning of hyper-parameters, in unsupervised settings, can be performed through criteria based on likelihood or stability. Likelihood-based scores can be easily generalised to the multi hyper-parameters setting, but their computation is feasible only under certain probability assumptions. Differently, stability-based methods are of general application and, on single hyper-parameter, they have been proved to outperform likelihood-based scores. In this work we present a multi-parameters extension to stability-based methods that can be easily applied on complex models. We extensively compared this extension with likelihood-based scores on synthetic Gaussian data. Experiments show that our extension provides a better estimate of models of increasing complexity providing a valuable alternative of existing likelihood-based model selection methods.
  • 1290-P: Gut Microbiota in New-Onset Pediatric Patients with Type 1 Diabetes: Machine Learning Algorithms to Classify Microorganisms Disease-Linked
    G D’ANNUNZIO, R Biassoni, M Squillario, E Ugolotti, A Barla, G Piccolo, N Minuto, M Maghnie

    Abstract: Gut microbiota plays a role in human health and autoimmunity. Among environmental factor linked to type 1 diabetes (T1DM) pathogenesis, gut microbiota impairment seems to be involved. We evaluated gut microbial fingerprinting in 31 pediatric patients with new-onset T1DM and 25 healthy children using multiple polymorphic region of the 16S rRNA. We performed machine learning and metagenome functional analyses to identify significant taxa and metabolic pathways and correlate with clinical and metabolic parameters. Inclusion criteria were living in Northern Italy, born from Caucasian parents, singleton birth, personal history negative for acute/chronic gastrointestinal diseases and/or antibiotic or probiotics administration. Different Bacteroidetes species i.e., B.stercoris (q=1,473E-4), B.intestinalis (q=0.010), B.fragilis (q=0.0452) were significantly more frequent in patients as well as B.bifidum …
  • Missing Values in Multiple Joint Inference of Gaussian Graphical Models
    V Tozzo, D Garbarino, A Barla

    Abstract: Real-world phenomena are often not fully measured or completely observable, raising the so-called missing data problem. As a consequence, the need of developing ad-hoc techniques that cope with such issue arises in many inference contexts. In this paper, we focus on the inference of Gaussian Graphical Models (GGMs) from multiple input datasets having complex relationships (eg multi-class or temporal). We propose a method that generalises state-of-the-art approaches to the inference of both multi-class and temporal GGMs while naturally dealing with two types of missing data: partial and latent. Synthetic experiments show that our performance is better than state-of-the-art. In particular, we compared results with single network inference methods that suitably deal with missing data, and multiple joint network inference methods coupled with standard pre-processing techniques (eg imputing). When dealing with fully observed datasets our method analytically reduces to state-of-the-art approaches providing a good alternative as our implementation reaches convergence in shorter or comparable time. Finally, we show that properly addressing the missing data problem in a multi-class real-world example, allows us to discover interesting varying patterns.
  • Temporal pattern detection in time-varying graphical models
    Tomasi F., Tozzo V., Barla A.
    Proceedings - International Conference on Pattern Recognition
    Abstract: Graphical models allow to describe the interplay among variables of a system through a compact representation, suitable when relations evolve over time. For example, in a biological setting, genes interact differently depending on external environmental or metabolic factors. To incorporate this dynamics a viable strategy is to estimate a sequence of temporally related graphs assuming similarity among samples in different time points. While adjacent time points may direct the analysis towards a robust estimate of the underlying graph, the resulting model will not incorporate long-term or recurrent temporal relationships. In this work we propose a dynamical network inference model that leverages on kernels to consider general temporal patterns (such as circadian rhythms or seasonality). We show how our approach may also be exploited when the recurrent patterns are unknown, by coupling the network inference with a clustering procedure that detects possibly non-consecutive similar networks. Such clusters are then used to build similarity kernels. The convexity of the functional is determined by whether we impose or infer the kernel. In the first case, the optimisation algorithm exploits efficiently proximity operators with closed-form solutions. In the other case, we resort to an alternating minimisation procedure which jointly learns the temporal kernel and the underlying network. Extensive analysis on synthetic data shows the efficacy of our models compared to state-of-the-art methods. Finally, we applied our approach on two realworld applications to show how considering long-term patterns is fundamental to have insights on the behaviour of a complex system.
  • The hidden information in patient-reported outcomes and clinician-assessed outcomes: multiple sclerosis as a proof of concept of a machine learning approach
    Brichetto G., Monti Bragadin M., Fiorini S., Battaglia M.A., Konrad G., Ponzio M., Pedullà L., Verri A., Barla A., Tacchino A.
    Neurological Sciences
    Abstract: Machine learning (ML) applied to patient-reported (PROs) and clinical-assessed outcomes (CAOs) could favour a more predictive and personalized medicine. Our aim was to confirm the important role of applying ML to PROs and CAOs of people with relapsing-remitting (RR) and secondary progressive (SP) form of multiple sclerosis (MS), to promptly identifying information useful to predict disease progression. For our analysis, a dataset of 3398 evaluations from 810 persons with MS (PwMS) was adopted. Three steps were provided: course classification; extraction of the most relevant predictors at the next time point; prediction if the patient will experience the transition from RR to SP at the next time point. The Current Course Assignment (CCA) step correctly assigned the current MS course with an accuracy of about 86.0%. The MS course at the next time point can be predicted using the predictors selected in CCA. PROs/CAOs Evolution Prediction (PEP) followed by Future Course Assignment (FCA) was able to foresee the course at the next time point with an accuracy of 82.6%. Our results suggest that PROs and CAOs could help the clinician decision-making in their practice.
  • Transplantation induces profound changes in the transcriptional asset of hematopoietic stem cells: Identification of specific signatures using machine learning techniques
    Cilloni D., Petiti J., Campia V., Podestà M., Squillario M., Montserrat N., Bertaina A., Sabatini F., Carturan S., Berger M., Saglio F., Bandini G., Bonifazi F., Fagioli F., Moretta L., Saglio G., Verri A., Barla A., Locatelli F., Frassoni F.
    Journal of Clinical Medicine
    Abstract: During the phase of proliferation needed for hematopoietic reconstitution following transplantation, hematopoietic stem/progenitor cells (HSPC) must express genes involved in stem cell self-renewal. We investigated the expression of genes relevant for self-renewal and expansion of HSPC (operationally defined as CD34+ cells) in steady state and after transplantation. Specifically, we evaluated the expression of ninety-one genes that were analyzed by real-time PCR in CD34+ cells isolated from (i) 12 samples from umbilical cord blood (UCB); (ii) 15 samples from bone marrow healthy donors; (iii) 13 samples from bone marrow after umbilical cord blood transplant (UCBT); and (iv) 29 samples from patients after transplantation with adult hematopoietic cells. The results show that transplanted CD34+ cells from adult cells acquire an asset very different from transplanted CD34+ cells from cord blood. Multivariate machine learning analysis (MMLA) showed that four specific gene signatures can be obtained by comparing the four types of CD34+ cells. In several, but not all cases, transplanted HSPC from UCB overexpress reprogramming genes. However, these remarkable changes do not alter the commitment to hematopoietic lineage. Overall, these results reveal undisclosed aspects of transplantation biology.
  • Inexact and Stochastic Generalized Conditional Gradient with Augmented Lagrangian and Proximal Step
    Antonio Silveti-Falls, Cesare Molinari, Jalal Fadili
    arXiv preprint arXiv:2005.05158
    Abstract: In this paper we propose and analyze inexact and stochastic versions of the CGALP algorithm developed in the authors' previous paper, which we denote ICGALP, that allows for errors in the computation of several important quantities. In particular this allows one to compute some gradients, proximal terms, and/or linear minimization oracles in an inexact fashion that facilitates the practical application of the algorithm to computationally intensive settings, eg in high (or possibly infinite) dimensional Hilbert spaces commonly found in machine learning problems. The algorithm is able to solve composite minimization problems involving the sum of three convex proper lower-semicontinuous functions subject to an affine constraint of the form for some bounded linear operator . Only one of the functions in the objective is assumed to be differentiable, the other two are assumed to have an accessible prox operator and a linear minimization oracle. As main results, we show convergence of the Lagrangian to an optimum and asymptotic feasibility of the affine constraint as well as weak convergence of the dual variable to a solution of the dual problem, all in an almost sure sense. Almost sure convergence rates, both pointwise and ergodic, are given for the Lagrangian values and the feasibility gap. Numerical experiments verifying the predicted rates of convergence are shown as well.
  • Generalized conditional gradient with augmented lagrangian for composite minimization
    Antonio Silveti-Falls, Cesare Molinari, Jalal Fadili
    SIAM Journal on Optimization
    Abstract: In this paper we propose a splitting scheme which hybridizes generalized conditional gradient with a proximal step which we call CGALP algorithm, for minimizing the sum of three proper convex and lower-semicontinuous functions in real Hilbert spaces. The minimization is subject to an affine constraint, that allows in particular to deal with composite problems (sum of more than three functions) in a separate way by the usual product space technique. While classical conditional gradient methods require Lipschitz-continuity of the gradient of the differentiable part of the objective, CGALP needs only differentiability (on an appropriate subset), hence circumventing the intricate question of Lipschitz continuity of gradients. For the two remaining functions in the objective, we do not require any additional regularity assumption. The second function, possibly nonsmooth, is assumed simple, i.e., the associated proximal mapping is easily computable. For the third function, again nonsmooth, we just assume that its domain is weakly compact and that a linearly perturbed minimization oracle is accessible. In particular, this last function can be chosen to be the indicator of a nonempty bounded closed convex set, in order to deal with additional constraints. Finally, the affine constraint is addressed by the augmented Lagrangian approach. Our analysis is carried out for a wide choice of algorithm parameters satisfying so called "open loop" rules. As main results, under mild conditions, we show asymptotic feasibility with respect to the affine constraint, boundedness of the dual multipliers, and convergence of the Lagrangian values to the saddle-point optimal value. We also provide (subsequential) rates of convergence for both the feasibility gap and the Lagrangian values.
  • On Mixup Regularization
    Luigi Carratino, Mixupoustapha Cissé, Rodolphe Jenatton, Jean-Philippe Vert
    arXiv preprint
    Abstract: Mixup is a data augmentation technique that creates new examples as convex combinations of training points and labels. This simple technique has empirically shown to improve the accuracy of many state-of-the-art models in different settings and applications, but the reasons behind this empirical success remain poorly understood. In this paper we take a substantial step in explaining the theoretical foundations of Mixup, by clarifying its regularization effects. We show that Mixup can be interpreted as standard empirical risk minimization estimator subject to a combination of data transformation and random perturbation of the transformed data. We gain two core insights from this new interpretation. First, the data transformation suggests that, at test time, a model trained with Mixup should also be applied to transformed data, a one-line change in code that we show empirically to improve both accuracy and calibration of the prediction. Second, we show how the random perturbation of the new interpretation of Mixup induces multiple known regularization schemes, including label smoothing and reduction of the Lipschitz constant of the estimator. These schemes interact synergistically with each other, resulting in a self calibrated and effective regularization effect that prevents overfitting and overconfident predictions. We corroborate our theoretical analysis with experiments that support our conclusions.
  • Kernel methods through the roof: handling billions of points efficiently
    Giacomo Meanti, Luigi Carratino, Lorenzo Rosasco, Alessandro Rudi
    Advances in Neural Information Processing Systems (NeurIPS) 2020
    Abstract: Kernel methods provide an elegant and principled approach to nonparametric learning, but so far could hardly be used in large scale problems, since naïve implementations scale poorly with data size. Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections. Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware. Towards this end, we designed a preconditioned gradient solver for kernel methods exploiting both GPU acceleration and parallelization with multiple GPUs, implementing out-of-core variants of common linear algebra operations to guarantee optimal hardware utilization. Further, we optimize the numerical precision of different operations and maximize efficiency of matrix-vector multiplications. As a result we can experimentally show dramatic speedups on datasets with billions of points, while still guaranteeing state of the art performance. Additionally, we make our software available as an easy to use library.
  • Regularized ERM on random subspaces
    Andrea Della Vecchia, Jaouad Mourtada, Ernesto De Vito, Lorenzo Rosasco
    arXiv preprint
    Abstract: We study a natural extension of classical empirical risk minimization, where the hypothesis space is a random subspace of a given space. In particular, we consider possibly data dependent subspaces spanned by a random subset of the data. This approach naturally leads to computational savings, but the question is whether the corresponding learning accuracy is degraded. These statistical-computational tradeoffs have been recently explored for the least squares loss and self-concordant loss functions, such as the logistic loss. Here, we work to extend these results to convex Lipschitz loss functions, that might not be smooth, such as the hinge loss used in support vector machines. Our main results show the existence of different regimes, depending on how hard the learning problem is, for which computational efficiency can be improved with no loss in performance. Theoretical results are complemented with numerical experiments on large scale benchmark data sets.
  • Interpolation and Learning with Scale Dependent Kernels
    Nicolò Pagliana, Alessandro Rudi, Ernesto De Vito, Lorenzo Rosasco
    arxiv preprint
    Abstract: We study the learning properties of nonparametric ridge-less least squares. In particular, we consider the common case of estimators defined by scale dependent kernels, and focus on the role of the scale. These estimators interpolate the data and the scale can be shown to control their stability through the condition number. Our analysis shows that are different regimes depending on the interplay between the sample size, its dimensions, and the smoothness of the problem. Indeed, when the sample size is less than exponential in the data dimension, then the scale can be chosen so that the learning error decreases. As the sample size becomes larger, the overall error stop decreasing but interestingly the scale can be chosen in such a way that the variance due to noise remains bounded. Our analysis combines, probabilistic results with a number of analytic techniques from interpolation theory.
  • Construction and Monte Carlo estimation of wavelet frames generated by a reproducing kernel
    Ernesto De Vito, Zeljko Kereta, Valeriya Naumova, Lorenzo Rosasco, Stefano Vigogna
    Journal of Fourier Analysis and Applications 27(37), 2021
    Abstract: We introduce a novel construct of multiscale tight frames on general domains. The frame elements are obtained by spectral filtering of the integral operator associated with a reproducing kernel. Our construction extends classical wavelets as well as generalized wavelets on both continuous and discrete non-Euclidean structures such as Riemannian manifolds and weighted graphs. Moreover, it allows to study the relation between continuous and discrete frames in a random sampling regime, where discrete frames can be seen as Monte Carlo estimates of the continuous ones. Pairing spectral regularization with learning theory, we show that a sample frame tends to its population counterpart, and derive explicit finite-sample rates on spaces of Sobolev and Besov regularity. On the other hand, our results prove the stability of frames constructed on empirical data, in the sense that all stochastic discretizations have the same underlying limit regardless of the set of initial training samples.
  • Asymptotics of Ridge(less) Regression under General Source Condition
    Dominic Richards, Jaouad Mourtada and Lorenzo Rosasco
    arXiv preprint
    Abstract: We analyze the prediction performance of ridge and ridgeless regression when both the number and the dimension of the data go to infinity. In particular, we consider a general setting introducing prior assumptions characterizing ``easy'' and ``hard'' learning problems. In this setting, we show that ridgeless (zero regularisation) regression is optimal for easy problems with a high signal to noise. Furthermore, we show that additional descents in the ridgeless bias and variance learning curve can occur beyond the interpolating threshold, verifying recent empirical observations. More generally, we show how a variety of learning curves are possible depending on the problem at hand. From a technical point of view, characterising the influence of prior assumptions requires extending previous applications of random matrix theory to study ridge regression.
  • Hyperbolic Manifold Regression
    Gian Maria Marconi, Lorenzo Rosasco, Carlo Ciliberto
    23rd International Conference on Artificial Intelligence and Statistics Conference
    Abstract: Geometric representation learning has recently shown great promise in several machine learning settings, ranging from relational learning to language processing and generative models. In this work, we consider the problem of performing manifold-valued regression onto an hyperbolic space as an intermediate component for a number of relevant machine learning applications. In particular, by formulating the problem of predicting nodes of a tree as a manifold regression task in the hyperbolic space, we propose a novel perspective on two challenging tasks: 1) hierarchical classification via label embeddings and 2) taxonomy extension of hyperbolic representations. To address the regression problem we consider previous methods as well as proposing two novel approaches that are computationally more advantageous: a parametric deep learning model that is informed by the geodesics of the target space and a non-parametric kernel-method for which we also prove excess risk bounds. Our experiments show that the strategy of leveraging the hyperbolic geometry is promising. In particular, in the taxonomy expansion setting, we find that the hyperbolic-based estimators significantly outperform methods performing regression in the ambient Euclidean space.
  • Estimating multi-index models with response-conditional least squares
    Timo Klock, Alessandro Lanteri, Stefano Vigogna
    Electronic Journal of Statistics 15(1), 2021
    Abstract: The multi-index model is a simple yet powerful high-dimensional regression model which circumvents the curse of dimensionality assuming E[Y|X]=g(A⊤X) for some unknown index space A and link function g. In this paper we introduce a method for the estimation of the index space, and study the propagation error of an index space estimate in the regression of the link function. The proposed method approximates the index space by the span of linear regression slope coefficients computed over level sets of the data. Being based on ordinary least squares, our approach is easy to implement and computationally efficient. We prove a tight concentration bound that shows N−1/2 convergence, but also faithfully describes the dependence on the chosen partition of level sets, hence giving indications on the hyperparameter tuning. The estimator's competitiveness is confirmed by extensive comparisons with state-of-the-art methods, both on synthetic and real data sets. As a second contribution, we establish minimax optimal generalization bounds for k-nearest neighbors and piecewise polynomial regression when trained on samples projected onto an estimate of the index space, thus providing complete and provable estimation of the multi-index model.
  • Constructing fast approximate eigenspaces with application to the fast graph Fourier transforms
    Cris Rusu, Lorenzo Rosasco
    arXiv preprint
    Abstract: We investigate numerically efficient approximations of eigenspaces associated to symmetric and general matrices. The eigenspaces are factored into a fixed number of fundamental components that can be efficiently manipulated (we consider extended orthogonal Givens or scaling and shear transformations). The number of these components controls the trade-off between approximation accuracy and the computational complexity of projecting on the eigenspaces. We write minimization problems for the single fundamental components and provide closed-form solutions. Then we propose algorithms that iterative update all these components until convergence. We show results on random matrices and an application on the approximation of graph Fourier transforms for directed and undirected graphs.
  • A General Framework for Consistent Structured Prediction with Implicit Loss Embeddings
    C Ciliberto, L Rosasco, A Rudi
    arXiv preprint
    Abstract: We propose and analyze a novel theoretical and algorithmic framework for structured prediction. While so far the term has referred to discrete output spaces, here we consider more general settings, such as manifolds or spaces of probability measures. We define structured prediction as a problem where the output space lacks a vectorial structure. We identify and study a large class of loss functions that implicitly defines a suitable geometry on the problem. The latter is the key to develop an algorithmic framework amenable to a sharp statistical analysis and yielding efficient computations. When dealing with output spaces with infinite cardinality, a suitable implicit formulation of the estimator is shown to be crucial.
  • Faster Kriging: Facing High-Dimensional Simulators
    X Lu, A Rudi, E Borgonovo, L Rosasco
    Operations Research 68 (1), pp. 233-249
    Abstract: Kriging is one of the most widely used emulation methods in simulation. However, memory and time requirements potentially hinder its application to data sets generated by high-dimensional simulators. We borrow from the machine learning literature to propose a new algorithmic implementation of kriging that, while preserving prediction accuracy, notably reduces time and memory requirements. The theoretical and computational foundations of the algorithm are provided. The work then reports results of extensive numerical experiments to compare the performance of the proposed algorithm against current kriging implementations, on simulators of increasing dimensionality. Findings show notable savings in time and memory requirements that allow one to handle inputs across more that 10,000 dimensions.
  • Near-linear Time Gaussian Process Optimization with Adaptive Batching and Resparsification
    Daniele Calandriello, Luigi Carratino, Alessandro Lazaric, Michal Valko, Lorenzo Rosasco
    International Confererence on Machine Learning (ICML) 2020
    Abstract: Gaussian processes (GP) are one of the most successful frameworks to model uncertainty. However, GP optimization (e.g., GP-UCB) suffers from major scalability issues. Experimental time grows linearly with the number of evaluations, unless candidates are selected in batches (e.g., using GP-BUCB) and evaluated in parallel. Furthermore, computational cost is often prohibitive since algorithms such as GP-BUCB require a time at least quadratic in the number of dimensions and iterations to select each batch. In this paper, we introduce BBKB (Batch Budgeted Kernel Bandits), the first no-regret GP optimization algorithm that provably runs in near-linear time and selects candidates in batches. This is obtained with a new guarantee for the tracking of the posterior variances that allows BBKB to choose increasingly larger batches, improving over GP-BUCB. Moreover, we show that the same bound can be used to adaptively delay costly updates to the sparse GP approximation used by BBKB, achieving a near-constant per-step amortized cost. These findings are then confirmed in several experiments, where BBKB is much faster than state-of-the-art methods.
    @misc{calandriello20a,
     title={Near-linear Time Gaussian Process Optimization with Adaptive Batching and Resparsification},
     author={Calandriello, Daniele and Carratino, Luigi and Lazaric, Alessandro and Valko, Michal and Rosasco, Lorenzo},
     year={2020},
     journal={arXiv preprint}}
  • Adenine: A HPC-oriented tool for biological data exploration
    Fiorini S., Tomasi F., Squillario M., Barla A.
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Abstract: adenine is a machine learning framework designed for biological data exploration and visualization. Its goal is to help bioinformaticians achieving a first and quick overview of the main structures underlying their data. This software tool encompasses state-of-the-art techniques for missing values imputing, data preprocessing, dimensionality reduction and clustering. adenine has a scalable architecture which seamlessly work on single workstations as well as on high-performance computing facilities. adenine is capable of generating publication-ready plots along with quantitative descriptions of the results. In this paper we provide an example of exploratory analysis on a publicly available gene expression data set of colorectal cancer samples. The software and its documentation are available at https://github.com/slipguru/adenine under FreeBSD license.
  • Cancer mutational signatures identification with sparse dictionary learning
    Tozzo V., Barla A.
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Abstract: Somatic DNA mutations are a characteristic of cancerous cells, being usually key in the origin and development of cancer. In the last few years, somatic mutations have been studied in order to understand which processes or conditions may generate them, with the purpose of developing prevention and treatment strategies. In this work we propose a novel sparse regularised method that aims at extracting mutational signatures from somatic mutations. We developed a pipeline that extracts the dataset from raw data and performs the analysis returning the signatures and their relative usage frequencies. A thorough comparison between our method and the state of the art procedure reveals that our pipeline can be used alternatively without losing information and possibly gaining more interpretability and precision.
  • Different features of tumor-associated NK cells in patients with low-grade or high-grade peritoneal carcinomatosis
    Pesce S., Belgrano V., Greppi M., Carlomagno S., Squillario M., Barla A., Della Chiesa M., Di Domenico S., Mavilio D., Moretta L., Candiani S., Sivori S., De Cian F., Marcenaro E.
    Frontiers in Immunology
    Abstract: Peritoneal carcinomatosis (PC) is a rare disease defined as diffused implantation of neoplastic cells in the peritoneal cavity. This clinical picture occurs during the evolution of peritoneal tumors, and it is the main cause of morbidity and mortality of patients affected by these pathologies, though cytoreductive surgery with heated intra-peritoneal chemotherapy (CRS/HIPEC) is yielding promising results. In the present study, we evaluated whether the tumor microenvironment of low-grade and high-grade PC could affect the phenotypic and functional features and thus the anti-tumor potential of NK cells. We show that while in the peritoneal fluid (PF) of low-grade PC most CD56dim NK cells show a relatively immature phenotype (NKG2A+KIR–CD57–CD16dim), in the PF of high-grade PC NK cells are, in large majority, mature (CD56dimKIR+CD57+CD16bright). Furthermore, in low-grade PC, PF-NK cells are characterized by a sharp down-regulation of some activating receptors, primarily NKp30 and DNAM-1, while, in high-grade PC, PF-NK cells display a higher expression of the PD-1 inhibitory checkpoint. The compromised phenotype observed in low-grade PC patients corresponds to a functional impairment. On the other hand, in the high-grade PC patients PF-NK cells show much more important defects that only partially reflect the compromised phenotype detected. These data suggest that the PC microenvironment may contribute to tumor escape from immune surveillance by inducing different NK cell impaired features leading to altered anti-tumor activity. Notably, after CRS/HIPEC treatment, the altered NK cell phenotype of a patient with a low-grade disease and favorable prognosis was reverted to a normal one. Our present data offer a clue for the development of new immunotherapeutic strategies capable of restoring the NK-mediated anti-tumor responses in association with the CRS/HIPEC treatment to increase the effectiveness of the current therapy.
  • Discrete Changes in Glucose Metabolism Define Aging
    Ravera S., Podestà M., Sabatini F., Dagnino M., Cilloni D., Fiorini S., Barla A., Frassoni F.
    Scientific Reports
    Abstract: Aging is a physiological process in which multifactorial processes determine a progressive decline. Several alterations contribute to the aging process, including telomere shortening, oxidative stress, deregulated autophagy and epigenetic modifications. In some cases, these alterations are so linked with the aging process that it is possible predict the age of a person on the basis of the modification of one specific pathway, as proposed by Horwath and his aging clock based on DNA methylation. Because the energy metabolism changes are involved in the aging process, in this work, we propose a new aging clock based on the modifications of glucose catabolism. The biochemical analyses were performed on mononuclear cells isolated from peripheral blood, obtained from a healthy population with an age between 5 and 106 years. In particular, we have evaluated the oxidative phosphorylation function and efficiency, the ATP/AMP ratio, the lactate dehydrogenase activity and the malondialdehyde content. Further, based on these biochemical markers, we developed a machine learning-based mathematical model able to predict the age of an individual with a mean absolute error of approximately 9.7 years. This mathematical model represents a new non-invasive tool to evaluate and define the age of individuals and could be used to evaluate the effects of drugs or other treatments on the early aging or the rejuvenation.
  • Icing: Large-scale inference of immunoglobulin clonotypes
    Tomasi F., Squillario M., Verri A., Bagnara D., Barla A.
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Abstract: Immunoglobulin (IG) clonotype identification is a fundamental open question in modern immunology. An accurate description of the IG repertoire is crucial to understand the variety within the immune system of an individual, potentially shedding light on the pathogenetic process. Intrinsic IG heterogeneity makes clonotype inference an extremely challenging task, both from a computational and a biological point of view. Here we present icing, a framework that allows to reconstruct clonal families also in case of highly mutated sequences. icing has a modular structure, and it is designed to be used with large next generation sequencing (NGS) datasets, a technology which allows the characterisation of large-scale IG repertoires. We extensively validated the framework with clustering performance metrics on the results in a simulated case. icing is implemented in Python, and it is publicly available under FreeBSD licence at https://github.com/slipguru/icing.
  • Uncovering the genetic hereditable component of late onset Alzheimer’s disease through the analysis of GWA data
    M Squillario, F Tomasi, V Tozzo, A Barla, D Uberti

    Abstract: This is an open access work distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
  • Adenine: A HPC-Oriented Tool for Biological Data Exploration
    A Barla

    Abstract: Adenine is a machine learning framework designed for biological data exploration and visualization. Its goal is to help bioinformaticians achieving a first and quick overview of the main structures underlying their data. This software tool encompasses state-of-the-art techniques for missing values imputing, data preprocessing, dimensionality reduction and clustering. adenine has a scalable architecture which seamlessly work on single workstations as well as on high-performance computing facilities. adenine is capable of generating publication-ready plots along with quantitative descriptions of the results. In this paper we provide an example of exploratory analysis on a publicly available gene expression data set of colorectal cancer samples. The software and its documentation are available at https://github. com/slipguru/adenine under FreeBSD license.
  • Next Generation Sequencing and Microrna Assay in a Cohort of Patients Affected By Myelodysplastic Syndromes. an Analysis of Clinical and Genotypic Features
    D Avenoso, M Squillario, A Barla, A Verri, E Carminati, C Nurra, E Todisco, F Gigli, G Gregato, C Poletti, F Bertolini, G J Mufti, M Shah, M Miglino

    Abstract: The myelodysplastic syndromes (MDS) are a group of clonal haematopoietic stem cell disorders and they can be a consequence of genomic/chromosomal instability. The World Health Organization (WHO) 2008 classification divides the MDS in 4 types: refractory anaemia with excess of blast (RAEB), refractory anaemia (RA), refractory cytopenia with multilineage dysplasia (RCMD) and chronic myelomonocytic leukaemia (CMML). Various mechanisms contribute to the pathogenesis and prognosis of the disease and currently next generation sequencing (NGS) detects pathogenic gene mutations that can allocate patients to different risk classes. MicroRNAs (miRNAs) are small non-coding RNA molecules that regulate gene expression via post-transcriptional mechanisms and may have oncogenic properties or act as tumour suppressor and have an active role in the onset of myeloid …
  • Development of a smart post-hospitalization facility for older people by using domotics, robotics, and automated tele-monitoring
    C Patrone, A Cella, C Martini, S Pericu, R Femia, A Barla, C Porfirione, M Puntoni, N Veronese, F Odone, N Casiddu, G A Rollandi, A Verri, A Pilotto

    Abstract: Recent studies showed that about the 8% of beds are occupied by patients who experience a delayed hospital discharge (DHD). This is attributed to a delay in the arrangement of home-care assistance or in admission to long-term care facilities. Recently a lot of technologies have been developed to improve caring and monitoring of older people. The aim of this study is to design, implement and test a prototype of a technology based post-hospitalization facility for older people at risk of DHD by using domotics, robotics and wearable sensors for tele-monitoring. A sensorised posthospitalization facility has been built inside the hospital. Thirty-five healthy volunteers aged from 20 to 82 years were recruited. Clinical and functional assessment, ie motility index (MI), and human-robot interaction satisfaction were measured. A significant correlation was observed between automatic MI and the Gait Speed, the time sit-to-stand, and the Timed Up and Go test. Domotics, robotics and technology-based telemonitoring may represent a new way to assess patient’s autonomy and functional and clinical conditions in an ecological way, reproducing as much as possible a real life at home.
  • Predicting diabetes second-line therapy initiation in the Australian population via time span-guided neural attention network
    Fiorini S., Hajati F., Barla A., Girosi F.
    PLoS ONE
    Abstract: Introduction The first line of treatment for people with Diabetes mellitus is metformin. However, over the course of the disease metformin may fail to achieve appropriate glycemic control, and a second-line therapy may become necessary. In this paper we introduce Tangle, a time span-guided neural attention model that can accurately and timely predict the upcoming need for a second-line diabetes therapy from administrative data in the Australian adult population. The method is suitable for designing automatic therapy review recommendations for patients and their providers without the need to collect clinical measures. Data We analyzed seven years of de-identified records (2008-2014) of the 10% publicly available linked sample of Medicare Benefits Schedule (MBS) and Pharmaceutical Benefits Scheme (PBS) electronic databases of Australia. Methods By design, Tangle inherits the representational power of pre-trained word embedding, such as GloVe, to encode sequences of claims with the related MBS codes. Moreover, the proposed attention mechanism natively exploits the information hidden in the time span between two successive claims (measured in number of days). We compared the proposed method against state-of-the-art sequence classification methods. Results Tangle outperforms state-of-the-art recurrent neural networks, including attention-based models. In particular, when the proposed time span-guided attention strategy is coupled with pre-trained embedding methods, the model performance reaches an Area Under the ROC Curve of 90%, an improvement of almost 10 percentage points over an attentionless recurrent architecture. Implementation Tangle is implemented in Python using Keras and it is hosted on GitHub at https://github.com/samuelefiorini/tangle.
  • Secondary somatic mutations in g-protein-related pathways and mutation signatures in Uveal melanoma
    Piaggio F., Tozzo V., Bernardi C., Croce M., Puzone R., Viaggi S., Patrone S., Barla A., Coviello D., Jager M.J., Van Der Velden P.A., Zeschnigk M., Cangelosi D., Eva A., Pfeffer U., Amaro A.
    Cancers
    Abstract: Background: Uveal melanoma (UM), a rare cancer of the eye, is characterized by initiating mutations in the genes G-protein subunit alpha Q (GNAQ), G-protein subunit alpha 11 (GNA11), cysteinyl leukotriene receptor 2 (CYSLTR2), and phospholipase C beta 4 (PLCB4) and by metastasis-promoting mutations in the genes splicing factor 3B1 (SF3B1), serine and arginine rich splicing factor 2 (SRSF2), and BRCA1-associated protein 1 (BAP1). Here, we tested the hypothesis that additional mutations, though occurring in only a few cases (&ldquo;secondary drivers&rdquo;), might influence tumor development. Methods: We analyzed all the 4125 mutations detected in exome sequencing datasets, comprising a total of 139 Ums, and tested the enrichment of secondary drivers in Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways that also contained the initiating mutations. We searched for additional mutations in the putative secondary driver gene protein tyrosine kinase 2 beta (PTK2B) and we developed new mutational signatures that explain the mutational pattern observed in UM. Results: Secondary drivers were significantly enriched in KEGG pathways that also contained GNAQ and GNA11, such as the calcium-signaling pathway. Many of the secondary drivers were known cancer driver genes and were strongly associated with metastasis and survival. We identified additional mutations in PTK2B. Sparse dictionary learning allowed for the identification of mutational signatures specific for UM. Conclusions: A considerable part of rare mutations that occur in addition to known driver mutations are likely to affect tumor development and progression.
  • Visual computing methods for assessing the well-being of older people
    Martini C., Odone F., Noceti N., Chessa M., Barla A., Verri A., Cella A., Pilotto A., Rollandi G.A.
    Communications in Computer and Information Science
    Abstract: With the increasing share of elderly population worldwide, the necessity of assistive technologies to support clinicians in monitoring their health conditions is becoming more and more relevant. Recent medical literature has proposed the notion of frail elderly, which rapidly became a key element of clinical practices for the estimation of well-being in aging population. The evaluation of frailty is commonly based on self reported outcomes and occasional physicians evaluations, leading to possibly biased results. In this work we propose a data driven method to automatically evaluate two of the main aspects contributing to the frailty estimation, i.e. the motility of the subject and his cognitive status. The first one is evaluated using visual computing tools, while the latter relies on a virtual reality based system. We provide an extensive experimental assessment performed on two sets of data acquired in a sensorised protected discharge facility located in a local hospital. Our results are in good agreement with the assessment manually performed by physicians, nicely showing the potential capability of our approach to complement current protocols of evaluation.
  • Convergence rates of Forward–Douglas–Rachford splitting method
    Cesare Molinari, Jingwei Liang, Jalal Fadili
    Journal of optimization theory and applications
    Abstract: Over the past decades, operator splitting methods have become ubiquitous for non-smooth optimization owing to their simplicity and efficiency. In this paper, we consider the Forward–Douglas–Rachford splitting method and study both global and local convergence rates of this method. For the global rate, we establish a sublinear convergence rate in terms of a Bregman divergence suitably designed for the objective function. Moreover, when specializing to the Forward–Backward splitting, we prove a stronger convergence rate result for the objective function value. Then locally, based on the assumption that the non-smooth part of the optimization problem is partly smooth, we establish local linear convergence of the method. More precisely, we show that the sequence generated by Forward–Douglas–Rachford first (i) identifies a smooth manifold in a finite number of iteration and then (ii) enters a local linear convergence regime, which is for instance characterized in terms of the structure of the underlying active smooth manifold. To exemplify the usefulness of the obtained result, we consider several concrete numerical experiments arising from applicative fields including, for instance, signal/image processing, inverse problems and machine learning.
  • An improper estimator with optimal excess risk in misspecified density estimation and logistic regression
    Jaouad Mourtada, Stéphane Gaïffas
    arXiv preprint
    Abstract: We introduce a procedure for predictive conditional density estimation under logarithmic loss, which we call SMP (Sample Minmax Predictor). This estimator minimizes a new general excess risk bound for supervised statistical learning. On standard examples, this bound scales as d/n with d the model dimension and n the sample size, and critically remains valid under model misspecification. Being an improper (out-of-model) procedure, SMP improves over within-model estimators such as the maximum likelihood estimator, whose excess risk degrades under misspecification. Compared to approaches reducing to the sequential problem, our bounds remove suboptimal logn factors, addressing an open problem from Grünwald and Kotlowski for the considered models, and can handle unbounded classes. For the Gaussian linear model, the predictions and risk bound of SMP are governed by leverage scores of covariates, nearly matching the optimal risk in the well-specified case without conditions on the noise variance or approximation error of the linear model. For logistic regression, SMP provides a non-Bayesian approach to calibration of probabilistic predictions relying on virtual samples, and can be computed by solving two logistic regressions. It achieves a non-asymptotic excess risk of O((d+B2R2)/n), where R bounds the norm of features and B that of the comparison parameter; by contrast, no within-model estimator can achieve better rate than min(BR/n−−√,deBR/n) in general. This provides a computationally more efficient alternative to Bayesian approaches, which require approximate posterior sampling, thereby partly answering a question by Foster et al. (2018).
  • Exact minimax risk for linear least squares, and the lower tail of sample covariance matrices
    Jaouad Mourtada
    arXiv preprint
    Abstract: The first part of this paper is devoted to the decision-theoretic analysis of random-design linear prediction. It is known that, under boundedness constraints on the response (and thus on regression coefficients), the minimax excess risk scales, up to constants, as σ2d/n in dimension d with n samples and noise σ2. Here, we study the expected excess risk with respect to the full linear class. We show that the ordinary least squares estimator is exactly minimax optimal in the well-specified case for every distribution of covariates. Further, we express the minimax risk in terms of the distribution of statistical leverage scores of individual samples. We deduce a precise minimax lower bound of σ2d/(n−d+1) for general covariate distribution, which nearly matches the risk for Gaussian design. We then obtain nonasymptotic upper bounds on the minimax risk for covariates that satisfy a "small ball"-type regularity condition, which scale as (1+o(1))σ2d/n as d=o(n), both in the well-specified and misspecified cases. Our main technical contribution is the study of the lower tail of the smallest singular value of empirical covariance matrices around 0. We establish a lower bound on this lower tail, valid for any distribution in dimension d≥2, together with a matching upper bound under a necessary regularity condition. Our proof relies on the PAC-Bayesian technique for controlling empirical processes, and extends an analysis of Oliveira devoted to a different part of the lower tail. Equivalently, our upper bound shows that the operator norm of the inverse sample covariance matrix has bounded Lq norm up to q≍n, and our lower bound implies that this exponent is unimprovable. Finally, we show that the regularity condition naturally holds for independent coordinates.
  • Accelerated iterative regularization via dual diagonal descent
    L Calatroni, G Garrigos, L Rosasco, S Villa
    arXiv preprint
    Abstract: We propose and analyze an accelerated iterative dual diagonal descent algorithm for the solution of linear inverse problems with general regularization and data-fit functions. In particular, we develop an inertial approach of which we analyze both convergence and stability. Using tools from inexact proximal calculus, we prove early stopping results with optimal convergence rates for additive data-fit terms as well as more general cases, such as the Kullback-Leibler divergence, for which different type of proximal point approximations hold.
  • A Weakly Supervised Strategy for Learning Object Detection on a Humanoid Robot
    E Maiettini, G Pasquale, V Tikhanoff, L Rosasco, L Natale
    2019 IEEE-RAS 19th International Conference on Humanoid Robots
    Abstract: Research in Computer Vision and Deep Learning has recently proposed numerous effective techniques for detecting objects in an image. In general, these employ deep Convolutional Neural Networks trained end-to-end on large datasets annotated with object labels and 2D bounding boxes. These methods provide remarkable performance, but are particularly expensive in terms of training data and supervision. Hence, modern object detection algorithms are difficult to be deployed in robotic applications that require on-line learning. In this paper, we propose a weakly supervised strategy for training an object detector in this scenario. The main idea is to let the robot iteratively grow a training set by combining autonomously annotated examples, with others that are requested for human supervision. We evaluate our method on two experiments with data acquired from the iCub and R1 humanoid platforms, showing ...
  • On-line object detection: a robotics challenge
    Maiettini, Elisa and Pasquale, Giulia and Rosasco, Lorenzo and Natale, Lorenzo
    Autonomous Robots (2019)
    Abstract: Object detection is a fundamental ability for robots interacting within an environment. While stunningly effective, state-of-the-art deep learning methods require huge amounts of labeled images and hours of training which does not favour such scenarios. This work presents a novel pipeline resulting from integrating (Maiettini et al. in 2017 IEEE-RAS 17th international conference on humanoid robotics (Humanoids), 2017) and (Maiettini et al. in 2018 IEEE/RSJ international conference on intelligent robots and systems (IROS), 2018), which naturally trains a robot to detect novel objects in few seconds. Moreover, we report on an extended empirical evaluation of the learning method, justifying that the proposed hybrid architecture is key in leveraging powerful deep representations while maintaining fast training time of large scale Kernel methods. We validate our approach on the Pascal VOC benchmark (Everingham et al. in Int J Comput Vis 88(2): 303--338, 2010), and on a challenging robotic scenario (iCubWorld Transformations (Pasquale et al. in Rob Auton Syst 112:260--281, 2019). We address real world use-cases and show how to tune the method for different speed/accuracy trades-off. Lastly, we discuss limitations and directions for future development.
    @inproceedings{maiettini2019,
     title={On-line object detection: a robotics challenge},
     author={Maiettini, Elisa and Pasquale, Giulia and Rosasco, Lorenzo and Natale, Lorenzo},
     year={2019},
     journal={Autonomous Robots}}
  • Exact sampling of determinantal point processes with sublinear time preprocessing
    Derezinski, Michal and Calandriello, Daniele and and Valko, Michal
    Advances in Neural Information Processing Systems
    Abstract: We study the complexity of sampling from a distribution over all index subsets of the set {1, ..., n} with the probability of a subset S proportional to the determinant of the submatrix LS of some n x n positive semidefinite matrix L, where LS corresponds to the entries of L indexed by S. Known as a determinantal point process (DPP), this distribution is used in machine learning to induce diversity in subset selection. When sampling from DDPs, we often wish to sample multiple subsets S with small expected size k = E[|S|] << n from a very large matrix L, so it is important to minimize the preprocessing cost of the procedure (performed once) as well as the sampling cost (performed repeatedly). For this purpose we provide DPP-VFX, a new algorithm which, given access only to L, samples exactly from a determinantal point process while satisfying the following two properties: (1) its preprocessing cost is n poly(k), i.e., sublinear in the size of L, and (2) its sampling cost is poly(k), i.e., independent of the size of L. Prior to our results, state-of-the-art exact samplers required O(n^3) preprocessing time and sampling time linear in n or dependent on the spectral properties of L. We furthermore give a reduction which allows using our algorithm for exact sampling from cardinality constrained determinantal point processes with n poly(k) time preprocessing. Our implementation of DPP-VFX is provided at https://github.com/guilgautier/DPPy/.
    @inproceedings{calandriello2019dpp,
     title={Exact sampling of determinantal point processes with sublinear time preprocessing},
     author={Derezinski, Michal and Calandriello, Daniele and and Valko, Michal},
     year={2019},
     booktitle={Advances in Neural Information Processing Systems}}
  • Convergence of Stochastic Proximal Gradient Algorithm
    Lorenzo Rosasco, Silvia Villa, Bằng Công Vũ
    Applied Mathematics and Optimization (2019)
    Abstract: We study the extension of the proximal gradient algorithm where only a stochastic gradient estimate is available and a relaxation step is allowed. We establish convergence rates for function values in the convex case, as well as almost sure convergence and convergence rates for the iterates under further convexity assumptions. Our analysis avoid averaging the iterates and error summability assumptions which might not be satisfied in applications, e.g. in machine learning. Our proofing technique extends classical ideas from the analysis of deterministic proximal gradient algorithms.
    @Article{Rosasco2019,
     title={Convergence of Stochastic Proximal Gradient Algorithm},
     author={Rosasco, Lorenzo and Villa, Silvia and Vu, Bang Cong},
     journal={Applied Mathematics and Optimization},
     year={2019},
     month={Oct},
     day={15},
     issn={1432-0606},
     doi={10.1007/s00245-019-09617-7},
     url={https://doi.org/10.1007/s00245-019-09617-7}}
  • Learning to sequence multiple tasks with competing constraints
    Duan, Anqing and Camoriano, Raffaello and Ferigo, Diego and Yanlong Huang and Calandriello, Daniele and Rosasco, Lorenzo and Pucci, Daniele
    2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
    Abstract: Imitation learning offers a general framework where robots can efficiently acquire novel motor skills from demonstrations of a human teacher. While many promising achievements have been shown, the majority of them are only focused on single-stroke movements, without taking into account the problem of multi-tasks sequencing. Conceivably, sequencing different atomic tasks can further augment the robot's capabilities as well as avoid repetitive demonstrations. In this paper, we propose to address the issue of multi-tasks sequencing with emphasis on handling the so-called competing constraints, which emerge due to the existence of the concurrent constraints from Cartesian and joint trajectories. Specifically, we explore the null space of the robot from an information-theoretic perspective in order to maintain imitation fidelity during transition between consecutive tasks. The effectiveness of the proposed method is validated through simulated and real experiments on the iCub humanoid robot.
    @inproceedings{duan2019,
     title={Learning to sequence multiple tasks with competing constraints},
     author={Duan, Anqing and Camoriano, Raffaello and Ferigo, Diego and Yanlong Huang and Calandriello, Daniele and Rosasco, Lorenzo and Pucci, Daniele},
     booktitle={2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
     pages={},
     year={2019},
     organization={IEEE}}
  • Derivative-free online learning of inverse dynamics models
    Romeres, Diego and Zorzi, Mattia and Camoriano, Raffaello and Traversaro, Silvio and Chiuso, Alessandro
    IEEE Transactions on Control Systems Technology
    Abstract: This paper discusses online algorithms for inverse dynamics modelling in robotics. Several model classes including rigid body dynamics (RBD) models, data-driven models and semiparametric models (which are a combination of the previous two classes) are placed in a common framework. While model classes used in the literature typically exploit joint velocities and accelerations, which need to be approximated resorting to numerical differentiation schemes, in this paper a new `derivative-free' framework is proposed that does not require this preprocessing step. An extensive experimental study with real data from the right arm of the iCub robot is presented, comparing different model classes and estimation procedures, showing that the proposed `derivative-free' methods outperform existing methodologies.
    @article{romeres2019,
       title={Derivative-free online learning of inverse dynamics models},
     author={Romeres, Diego and Zorzi, Mattia and Camoriano, Raffaello and Traversaro, Silvio and Chiuso, Alessandro},
       journal={IEEE Transactions on Control Systems Technology},
       year={2019},
     publisher={IEEE}}
  • Genuine Personality Recognition from Highly Constrained Face Images
    Fabio Anselmi, Nicoletta Noceti, Lorenzo Rosasco, Robert Ward
    International Conference on Image Analysis and Processing (ICIAP 2019)
    Abstract: People are able to accurately estimate personality traits, merely on the basis of “passport”-style neutral faces and, thus, cues must exist that allow for such estimation. However, up to date, there has been little progress in identifying the form and location of these cues.
    @INPROCEEDINGS{anselmi2019,
     title = {Genuine Personality Recognition from Highly Constrained Face Images},
     booktitle = {Image Analysis and Processing -- ICIAP 2019},
     year = {2019},
     publisher = {Springer International Publishing}, url = {https://link.springer.com/chapter/10.1007/978-3-030-30642-7_38},
     author = {Fabio Anselmi, Nicoletta Noceti, Lorenzo Rosasco, Robert Ward},
  • Parallel Random Block-Coordinate Forward-Backward Algorithm: A Unified Convergence Analysis
    Saverio Salzo, Silvia Villa
    arXiv preprint
    Abstract: We study the block-coordinate forward-backward algorithm in which the blocks are updated in a random and possibly parallel manner, according to arbitrary probabilities. The algorithm allows different stepsizes along the block-coordinates to fully exploit the smoothness properties of the objective function. In the convex case and in an infinite dimensional setting, we establish almost sure weak convergence of the iterates and the asymptotic rate o(1/n) for the mean of the function values. We derive linear rates under strong convexity and error bound conditions. Our analysis is based on an abstract convergence principle for stochastic descent algorithms which allows to extend and simplify existing results.
    @INPROCEEDINGS{salzo2019,
     title = {Parallel Random Block-Coordinate Forward-Backward Algorithm: A Unified Convergence Analysis},
     journal = {arXiv preprint},
     year = {2019},
     url = {https://arxiv.org/abs/1906.07392},
     author = {Saverio Salzo, Silvia Villa},
  • Fast approximation of orthogonal matrices and application to PCA
    Cristian Rusu, Lorenzo Rosasco
    arXiv preprint
    Abstract: We study the problem of approximating orthogonal matrices so that their application is numerically fast and yet accurate. We find an approximation by solving an optimization problem over a set of structured matrices, that we call Givens transformations, including Givens rotations as a special case. We propose an efficient greedy algorithm to solve such a problem and show that it strikes a balance between approximation accuracy and speed of computation. The proposed approach is relevant in spectral methods and we illustrate its application to PCA.
    @INPROCEEDINGS{rusu2019,
     title = {Fast approximation of orthogonal matrices and application to PCA},
     journal = {arXiv preprint},
     year = {2019},
     url = {https://arxiv.org/abs/1907.08697},
     author = {Cristian Rusu, Lorenzo Rosasco},
  • Gain with no Pain: Efficient Kernel-PCA by Nystrom Sampling
    Nicholas Sterge, Bharath Sriperumbudur, Lorenzo Rosasco, Alessandro Rudi
    arXiv preprint
    Abstract: In this paper, we propose and study a Nyström based approach to efficient large scale kernel principal component analysis (PCA). The latter is a natural nonlinear extension of classical PCA based on considering a nonlinear feature map or the corresponding kernel. Like other kernel approaches, kernel PCA enjoys good mathematical and statistical properties but, numerically, it scales poorly with the sample size. Our analysis shows that Nyström sampling greatly improves computational efficiency without incurring any loss of statistical accuracy. While similar effects have been observed in supervised learning, this is the first such result for PCA. Our theoretical findings, which are also illustrated by numerical results, are based on a combination of analytic and concentration of measure techniques. Our study is more broadly motivated by the question of understanding the interplay between statistical and computational requirements for learning.
    @INPROCEEDINGS{sterge2019,
     title = {Gain with no Pain: Efficient Kernel-PCA by Nystrom Sampling},
     journal = {arXiv preprint},
     year = {2019},
     url = {https://arxiv.org/abs/1907.05226},
     author = {Nicholas Sterge, Bharath Sriperumbudur, Lorenzo Rosasco, Alessandro Rudi},
  • Multi-Scale Vector Quantization with Reconstruction Trees
    Enrico Cecini, Ernesto De Vito, Lorenzo Rosasco
    arXiv preprint
    Abstract: We propose and study a multi-scale approach to vector quantization. We develop an algorithm, dubbed reconstruction trees, inspired by decision trees. Here the objective is parsimonious reconstruction of unsupervised data, rather than classification. Contrasted to more standard vector quantization methods, such as K-means, the proposed approach leverages a family of given partitions, to quickly explore the data in a coarse to fine--multi-scale--fashion. Our main technical contribution is an analysis of the expected distortion achieved by the proposed algorithm, when the data are assumed to be sampled from a fixed unknown distribution. In this context, we derive both asymptotic and finite sample results under suitable regularity assumptions on the distribution. As a special case, we consider the setting where the data generating distribution is supported on a compact Riemannian sub-manifold. Tools from differential geometry and concentration of measure are useful in our analysis.
    @INPROCEEDINGS{Cecini2019,
     title = {Multi-Scale Vector Quantization with Reconstruction Trees},
     journal = {arXiv preprint},
     year = {2019},
     url = {https://arxiv.org/abs/1907.03875},
     author = {Enrico Cecini, Ernesto De Vito, Lorenzo Rosasco},
  • Implicit Regularization of Accelerated Methods in Hilbert Spaces
    Nicolò Pagliana, Lorenzo Rosasco
    Advances in Neural Information Processing Systems 33 (NeurIPS 2019)
    Abstract: We study learning properties of accelerated gradient descent methods for linear least-squares in Hilbert spaces. We analyze the implicit regularization properties of Nesterov acceleration and a variant of heavy-ball in terms of corresponding learning error bounds. Our results show that acceleration can provides faster bias decay than gradient descent, but also suffers of a more unstable behavior. As a result acceleration cannot be in general expected to improve learning accuracy with respect to gradient descent, but rather to achieve the same accuracy with reduced computations. Our theoretical results are validated by numerical simulations. Our analysis is based on studying suitable polynomials induced by the accelerated dynamics and combining spectral techniques with concentration inequalities.
    @INPROCEEDINGS{pagliana2019,
     title = {Implicit Regularization of Accelerated Methods in Hilbert Spaces},
     journal = {to appear in “Advances in Neural Information Processing Systems 33 (NeurIPS 2019)},
     year = {2019},
     url = {https://arxiv.org/abs/1905.13000},
     author = {Nicolò Pagliana, Lorenzo Rosasco},
  • Reproducing kernel Hilbert spaces on manifolds: Sobolev and Diffusion spaces
    Ernesto De Vito, Nicole Mücke, Lorenzo Rosasco
    arXiv preprint
    Abstract: We study reproducing kernel Hilbert spaces (RKHS) on a Riemannian manifold. In particular, we discuss under which condition Sobolev spaces are RKHS and characterize their reproducing kernels. Further, we introduce and discuss a class of smoother RKHS that we call diffusion spaces. We illustrate the general results with a number of detailed examples.
    @INPROCEEDINGS{devito2019,
     title = {Reproducing kernel Hilbert spaces on manifolds: Sobolev and Diffusion spaces},
     journal = {arXiv preprint},
     year = {2019},
     url = {https://arxiv.org/abs/1905.10913},
     author = {Ernesto De Vito, Nicole Mücke, Lorenzo Rosasco},
  • Monte Carlo wavelets: a randomized approach to frame discretization
    Zeljko Kereta, Stefano Vigogna, Valeriya Naumova, Lorenzo Rosasco, Ernesto De Vito
    13th International Conference on Sampling Theory and Applications
    Abstract: In this paper we propose and study a family of continuous wavelets on general domains, and a corresponding stochastic discretization that we call Monte Carlo wavelets. First, using tools from the theory of reproducing kernel Hilbert spaces and associated integral operators, we define a family of continuous wavelets using spectral calculus. Then, we propose a stochastic discretization based on Monte Carlo estimates of the integral operators. Using concentration of measure results, we establish the convergence of such a discretization and derive convergence rates under natural regularity assumptions.
    @INPROCEEDINGS{Kereta2019,
     title = {Monte Carlo wavelets: a randomized approach to frame discretization},
     journal = {13th International Conference on Sampling Theory and Applications},
     year = {2019},
     url = {https://arxiv.org/abs/1903.06594},
     author = {Zeljko Kereta, Stefano Vigogna, Valeriya Naumova, Lorenzo Rosasco, Ernesto De Vito},
  • Gaussian process optimization with adaptive sketching: Scalable and no regret
    Daniele Calandriello, Luigi Carratino, Alessandro Lazaric, Michal Valko, Lorenzo Rosasco
    Conference on Learning Theory (COLT 2019)
    Abstract: Gaussian processes (GP) are a well studied Bayesian approach for the optimization of black-box functions. Despite their effectiveness in simple problems, GP-based algorithms hardly scale to high-dimensional functions, as their per-iteration time and space cost is at least quadratic in the number of dimensions d and iterations t. Given a set of A alternatives to choose from, the overall runtime O(t^3A) is prohibitive. In this paper we introduce BKB (budgeted kernelized bandit), a new approximate GP algorithm for optimization under bandit feedback that achieves near-optimal regret (and hence near-optimal convergence rate) with near-constant per-iteration complexity and remarkably no assumption on the input space or covariance of the GP. We combine a kernelized linear bandit algorithm (GP-UCB) with randomized matrix sketching based on leverage score sampling, and we prove that randomly sampling inducing points based on their posterior variance gives an accurate low-rank approximation of the GP, preserving variance estimates and confidence intervals. As a consequence, BKB does not suffer from variance starvation, an important problem faced by many previous sparse GP approximations. Moreover, we show that our procedure selects at most Õ (d_eff) points, where d_eff is the effective dimension of the explored space, which is typically much smaller than both d and t. This greatly reduces the dimensionality of the problem, thus leading to a O(TAd_eff^2) runtime and O(Ad_eff) space complexity.
    @INPROCEEDINGS{Calandriello2019,
     title = {Gaussian process optimization with adaptive sketching: Scalable and no regret},
     journal = {COLT 2019},
     year = {2019},
     url = {https://arxiv.org/abs/1903.05594},
     author = {Daniele Calandriello, Luigi Carratino, Alessandro Lazaric, Michal Valko, Lorenzo Rosasco},
  • Theory III: Dynamics and Generalization in Deep Networks
    Andrzej Banburski, Qianli Liao, Brando Miranda, Lorenzo Rosasco, Bob Liang, Jack Hidary, Tomaso Poggio
    arXiv preprint
    Abstract: Classical generalization bounds for classification in the setting of separable data can be optimized by maximizing the margin of a deep network under the constraint of unit p-norm of the weight matrix at each layer. A possible approach for solving numerically this problem uses gradient algorithms on exponential-type loss functions, enforcing a unit constraint in the p-norm. In the limiting case of continuous gradient flow, we analyze the dynamical systems associated with three algorithms of this kind and their close relation for p=2 with existing weight normalization and batch normalization algorithms. An interesting observation is that unconstrained gradient descent has a similar dynamics with the same critical points and thus also optimizes the generalization bounds. Our approach extends some of the results of Srebro from linear networks to deep networks and provides a new perspective on the implicit bias of gradient descent. This elusive complexity control is likely to be responsible for generalization despite overparametrization in deep networks.
    @INPROCEEDINGS{Banburski2019,
     title = {Theory III: Dynamics and Generalization in Deep Networks},
     journal = {to appear on COLT 2019},
     year = {2019},
     url = {https://arxiv.org/abs/1903.04991},
     author = {Andrzej Banburski, Qianli Liao, Brando Miranda, Lorenzo Rosasco, Bob Liang, Jack Hidary, Tomaso Poggio},
  • Beating SGD Saturation with Tail-Averaging and Minibatching
    Nicole Mücke, Gergely Neu, Lorenzo Rosasco
    Advances in Neural Information Processing Systems 33
    Abstract: While stochastic gradient descent (SGD) is one of the major workhorses in machine learning, the learning properties of many practically used variants are poorly understood. In this paper, we consider least squares learning in a nonparametric setting and contribute to filling this gap by focusing on the effect and interplay of multiple passes, mini-batching and averaging, and in particular tail averaging. Our results show how these different variants of SGD can be combined to achieve optimal learning errors, hence providing practical insights. In particular, we show for the first time in the literature that tail averaging allows faster convergence rates than uniform averaging in the nonparametric setting. Finally, we show that a combination of tail-averaging and minibatching allows more aggressive step-size choices than using any one of said components.
    @INPROCEEDINGS{mucke2019,
     title = {Beating SGD Saturation with Tail-Averaging and Minibatching},
     journal = {arXiv preprint},
     year = {2019},
     url = {https://arxiv.org/abs/1902.08668},
     author = {Nicole Mücke, Gergely Neu, Lorenzo Rosasco},
  • A computational model for grid maps in neural populations
    Fabio Anselmi, Benedetta Franceschiello, Micah M Murray, Lorenzo Rosasco
    arXiv preprint
    Abstract: Grid cells in the entorhinal cortex, together with place, speed and border cells, are major contributors to the organization of spatial representations in the brain. In this contribution we introduce a novel theoretical and algorithmic framework able to explain the emergence of hexagonal grid-like response patterns from the statistics of the input stimuli. We show that this pattern is a result of minimal variance encoding of neurons. The novelty lies into the formulation of the encoding problem through the modern Frame Theory language, specifically that of equiangular Frames, providing new insights about the optimality of hexagonal grid receptive fields. The model proposed overcomes some crucial limitations of the current attractor and oscillatory models. It is based on the well-accepted and tested hypothesis of Hebbian learning, providing a simplified cortical-based framework that does not require the presence of theta velocity-driven oscillations (oscillatory model) or translational symmetries in the synaptic connections (attractor model). We moreover demonstrate that the proposed encoding mechanism naturally maps shifts, rotations and scaling of the stimuli onto the shape of grid cells' receptive fields, giving a straightforward explanation of the experimental evidence of grid cells remapping under transformations of environmental cues.
    @INPROCEEDINGS{anselmi2019,
     title = {A computational model for grid maps in neural populations},
     journal = {arXiv preprint},
     year = {2019},
     url = {https://arxiv.org/abs/1902.06553},
     author = {Fabio Anselmi, Benedetta Franceschiello, Micah M Murray, Lorenzo Rosasco},
  • Are we done with object recognition? The iCub robot’s perspective
    Giulia Pasquale, Carlo Ciliberto, Francesca Odone, Lorenzo Rosasco, Lorenzo Natale
    Robotics and Autonomous Systems 112
    Abstract: We report on an extensive study of the benefits and limitations of current deep learning approaches to object recognition in robot vision scenarios, introducing a novel dataset used for our investigation. To avoid the biases in currently available datasets, we consider a natural human–robot interaction setting to design a data-acquisition protocol for visual object recognition on the iCub humanoid robot. Analyzing the performance of off-the-shelf models trained off-line on large-scale image retrieval datasets, we show the necessity for knowledge transfer. We evaluate different ways in which this last step can be done, and identify the major bottlenecks affecting robotic scenarios. By studying both object categorization and identification problems, we highlight key differences between object recognition in robotics applications and in image retrieval tasks, for which the considered deep learning approaches have been originally designed. In a nutshell, our results confirm the remarkable improvements yield by deep learning in this setting, while pointing to specific open challenges that need be addressed for seamless deployment in robotics.
    @article{PASQUALE2019260,
     title = {Are we done with object recognition? The iCub robot’s perspective},
     journal = {Robotics and Autonomous Systems},
     volume = {112},
     pages = {260 - 281}, 
     year = {2019},
     issn = {0921-8890}, 
     doi = {https://doi.org/10.1016/j.robot.2018.11.001}, 
     url = {http://www.sciencedirect.com/science/article/pii/S0921889018300332},
     author = {Giulia Pasquale and Carlo Ciliberto and Francesca Odone and Lorenzo Rosasco and Lorenzo Natale},
     keywords = {Humanoid robotics, Robot vision, Visual object recognition, Machine learning, Deep learning, Transfer learning, Image dataset, Dataset collection, Representation invariance, iCub}, 
     abstract = {We report on an extensive study of the benefits and limitations of current deep learning approaches to object recognition in robot vision scenarios, introducing a novel dataset used for our investigation. To avoid the biases in currently available datasets, we consider a natural human–robot interaction setting to design a data-acquisition protocol for visual object recognition on the iCub humanoid robot. Analyzing the performance of off-the-shelf models trained off-line on large-scale image retrieval datasets, we show the necessity for knowledge transfer. We evaluate different ways in which this last step can be done, and identify the major bottlenecks affecting robotic scenarios. By studying both object categorization and identification problems, we highlight key differences between object recognition in robotics applications and in image retrieval tasks, for which the considered deep learning approaches have been originally designed. In a nutshell, our results confirm the remarkable improvements yield by deep learning in this setting, while pointing to specific open challenges that need be addressed for seamless deployment in robotics},
  • Symmetry-adapted representation learning
    Fabio Anselmi, Georgios Evangelopoulos, Lorenzo Rosasco, Tomaso Poggio
    Pattern Recognition 86
    Abstract: In this paper, we propose the use of data symmetries, in the sense of equivalences under signal transformations, as priors for learning symmetry-adapted data representations, i.e., representations that are equivariant to these transformations. We rely on a group-theoretic definition of equivariance and provide conditions for enforcing a learned representation, for example the weights in a neural network layer or the atoms in a dictionary, to have the structure of a group and specifically the group structure in the distribution of the input. By reducing the analysis of generic group symmetries to permutation symmetries, we devise a regularization scheme for representation learning algorithm, using an unlabeled training set. The proposed regularization is aimed to be a conceptual, theoretical and computational proof of concept for symmetry-adapted representation learning, where the learned data representations are equivariant or invariant to transformations, without explicit knowledge of the underlying symmetries in the data.
    @article{ANSELMI2019201,
     title = {Symmetry-adapted representation learning},
     journal = {Pattern Recognition},
     year = {2019},
     url = {http://www.sciencedirect.com/science/article/pii/S0031320318302620},
     abstract = {In this paper, we propose the use of data symmetries, in the sense of equivalences under signal transformations, as priors for learning symmetry-adapted data representations, i.e., representations that are equivariant to these transformations. We rely on a group-theoretic definition of equivariance and provide conditions for enforcing a learned representation, for example the weights in a neural network layer or the atoms in a dictionary, to have the structure of a group and specifically the group structure in the distribution of the input. By reducing the analysis of generic group symmetries to permutation symmetries, we devise a regularization scheme for representation learning algorithm, using an unlabeled training set. The proposed regularization is aimed to be a conceptual, theoretical and computational proof of concept for symmetry-adapted representation learning, where the learned data representations are equivariant or invariant to transformations, without explicit knowledge of the underlying symmetries in the data.},
     issn = {0031-3203},
     doi = {https://doi.org/10.1016/j.patcog.2018.07.025},
     author = {Fabio Anselmi and Georgios Evangelopoulos and Lorenzo Rosasco and Tomaso Poggio},
  • Thresholding gradient methods in Hilbert spaces: support identification and linear convergence
    Guillaume Garrigos, Lorenzo Rosasco, Silvia Villa
    ESAIM: Control, Optimisation and Calculus of Variations 26, 28
    Abstract: We study l1 regularized least squares optimization problem in a separable Hilbert space. We show that the iterative soft-thresholding algorithm (ISTA) converges linearly, without making any assumption on the linear operator into play or on the problem. The result is obtained combining two key concepts: the notion of extended support, a finite set containing the support, and the notion of conditioning over finite dimensional sets. We prove that ISTA identifies the solution extended support after a finite number of iterations, and we derive linear convergence from the conditioning property, which is always satisfied for l1 regularized least squares problems. Our analysis extends to the the entire class of thresholding gradient algorithms, for which we provide a conceptually new proof of strong convergence, as well as convergence rates.
    @article{Garrigos17,
      author  = {Guillaume Garrigos, Lorenzo Rosasco, Silvia Villa},
      title   = {Thresholding gradient methods in Hilbert spaces: support identification and linear convergence},
      journal = {to appear on ESAIM: Control, Optimisation and Calculus of Variations},
      year    = {2019},
      url     = {https://arxiv.org/abs/1712.00357}
    }
  • A visual computing approach for estimating the motility index in the frail elder
    Martini C., Noceti N., Chessa M., Barla A., Cella A., Rollandi G.A., Pilotto A., Verri A., Odone F.
    VISIGRAPP 2018 - Proceedings of the 13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications
    Abstract: Digital Library
  • Hey there's DALILA: A dictionary learning library
    Tozzo V., D'Amario V., Barla A.
    OpenAccess Series in Informatics
  • Data-driven continuous assessment of frailty in older people
    C Martini, A Barla, F Odone, A Verri, A Cella, GA Rollandi, A Pilotto

    Abstract: The process of aging affects an individual's potential in several dimensions, encompassing the physical, cognitive, psychological, economic, and social domains. The assessment of frailty in elderly patients is key to estimate their overall well-being and to predict mortality risk. In the clinical practice, frailty is usually estimated through medical tests and questionnaires performed sporadically. Continuous automatic assessment may help physicians in evaluating frailty by complementing their assessments with quantitative and non sporadic measurements. In this paper we present the state-of-the-art in frailty evaluation, we summarize recent research achievements that could lead to an improved assessment, and we illustrate a case study we are conducting in our institution. Finally, based on our experience and results, we comment on the open challenges of automatic assessment of frailty.
  • IMPARARE AD INSEGNARE IL PENSIERO COMPUTAZIONALE: UN’ESPERIENZA DI VERA ALTERNANZA SCUOLA-LAVORO PRESSO L’UNIVERSITA’DI GENOVA
    A Barla, B Catania, M Chessa, G Delzanno, G Guerrini, V Mascardi, N Noceti, F Odone

    Abstract: La legge 107/2015 nota come la Buona Scuola ha reso obbligatoria l’Alternanza scuola-lavoro in tutti gli indirizzi di studio della Scuola secondaria di II grado, per un minimo di 200 ore nel triennio dei Licei e di 400 ore negli Istituti Tecnici e Professionali. Il Corso di Studi in Informatica dell’Università di Genova, che proponeva da anni stage di orientamento alla scelta universitaria per studenti della scuola secondaria, si è trovato–come tutti gli altri Corsi di Studio in Italia–a dover gestire e possibilmente soddisfare le richieste delle scuole di riconoscere gli stage di orientamento come stage di Alternanza scuola-lavoro, per ottemperare a tale obbligo.
  • Multi-task multiple kernel learning reveals relevant frequency bands for critical areas localization in focal epilepsy
    V D’Amario, F Tomasi, V Tozzo, G Arnulfo, A Barla, L Nobili

    Abstract: The localization of epileptic zone in pharmacoresistant focal epileptic patients is a daunting task, typically performed by medical experts through visual inspection over highly sampled neural recordings. For a finer localization of the epileptogenic areas and a deeper understanding of the pathology both the identification of pathogenical biomarkers and the automatic characterization of epileptic signals are desirable. In this work we present a data integration learning method based on multi-level representation of stereo-electroencephalography recordings and multiple kernel learning. To the best of our knowledge, this is the first attempt to tackle both aspects simultaneously, as our approach is devised to classify critical vs. non-critical recordings while detecting the most discriminative frequency bands. The learning pipeline is applied to a data set of 18 patients for a total of 2347 neural recordings analyzed by medical experts. Without any prior knowledge assumption, the data-driven method reveals the most discriminative frequency bands for the localization of epileptic areas in the high-frequency spectrum (>= 80 Hz) while showing high performance metric scores (mean balanced accuracy of 0.89+-0.03). The promising results may represent a starting point for the automatic search of clinical biomarkers of epileptogenicity.
  • A 3-fold kernel approach for characterizing Late Onset Alzheimer’s Disease
    M Squillario, F Tomasi, V Tozzo, A Barla, D Uberti, Alzheimer’s Disease Neuroimaging Initiative
    ArXiv preprint
    Abstract: The purpose of this study is to identify a global and robust signature characterizing Alzheimer’s Disease (AD). Two public GWAS datasets were analyzed considering a 3-fold kernel approach, based on SNPs, Genes and Pathways analysis, and two binary classifications tasks were addressed: cases@controls and APOE4 task. In the SNP signature of the ADNI-1 and ADNI-2 datasets, chromosome 19 and 20 reached high classification accuracy. In addition, the functional characterization of ADNI-1 and ADNI-2 SNP signatures found enriched the same pathway (i.e., Neuroactive ligand-receptor interaction), with GRM7 gene in common with both. TOMM40 was confirmed linked to AD pathology by SNP, gene and pathway-based analyses in ADNI-1. Using this 3-fold kernel approach, a peculiar signature of SNPs, genes and pathways has been highlighted in both datasets. Based on these significant results, we retain such approach a valuable tool to elucidate the heritable susceptibility to AD but also to other similar complex diseases.
  • New miRNA signature heralds human NK cell subsets at different maturation steps: Involvement of miR-146a-5p in the regulation of KIR expression
    Pesce S., Squillario M., Greppi M., Loiacono F., Moretta L., Moretta A., Sivori S., Castagnola P., Barla A., Candiani S., Marcenaro E.
    Frontiers in Immunology
    Abstract: Natural killer cells are cytotoxic innate lymphoid cells that play an important role for early host defenses against infectious pathogens and surveillance against tumor. In humans, NK cells may be divided in various subsets on the basis of the relative CD56 expression and of the low-affinity FcγRIIIA CD16. In particular, the two main NK cell subsets are represented by the CD56bright/CD16−/dim and the CD56dim/CD16bright NK cells. Experimental evidences indicate that CD56bright and CD56dim NK cells represent different maturative stages of the NK cell developmental pathway. We identified multiple miRNAs differentially expressed in CD56bright/CD16− and CD56dim/CD16bright NK cells using both univariate and multivariate analyses. Among these, we found a few miRNAs with a consistent differential expression in the two NK cell subsets, and with an intermediate expression in the CD56bright/CD16dim NK cell subset, representing a transitional step of maturation of NK cells. These analyses allowed us to establish the existence of a miRNA signature able to efficiently discriminate the two main NK cell subsets regardless of their surface phenotype. In addition, by analyzing the putative targets of representative miRNAs we show that hsa-miR-146a-5p, may be involved in the regulation of killer Ig-like receptor (KIR) expression. These results contribute to a better understanding of the physiologic significance of miRNAs in the regulation of the development/function of human NK cells. Moreover, our results suggest that hsa-miR-146a-5p targeting, resulting in KIR down-regulation, may be exploited to generate/increment the effect of NK KIR-mismatching against HLA-class I+ tumor cells and thus improve the NK-mediated anti-tumor activity.
  • Constrained DMPs for Feasible Skill Learning on Humanoid Robots
    Duan, Anqing and Camoriano, Raffaello and Ferigo, Diego and Calandriello, Daniele and Rosasco, Lorenzo and Pucci, Daniele
    2018 IEEE-RAS 18th International Conference on Humanoid Robots (Humanoids)
    Abstract: In the context of humanoid skill learning, movement primitives have gained much attention because of their compact representation and convenient combination with a myriad of optimization approaches. Among them, a well-known scheme is to use Dynamic Movement Primitives (DMPs) with reinforcement learning (RL) algorithms. While various remarkable results have been reported, skill learning with physical constraints has not been sufficiently investigated. For example, when RL is employed to optimize the robot joint trajectories, the exploration noise could drive the resulting trajectory out of the joint limits. In this paper, we focus on robot skill learning characterized by joint limit avoidance, by introducing the novel Constrained Dynamic Movement Primitives (CDMPs). By controlling a set of transformed states (called exogenous states) instead of the original DMPs states, CDMPs are capable of maintaining the joint trajectories within the safety limits. We validate CDMPs on the humanoid robot iCub, showing the applicability of our approach.
    @inproceedings{duan2018,
     title={Constrained DMPs for Feasible Skill Learning on Humanoid Robots},
     author={Duan, Anqing and Camoriano, Raffaello and Ferigo, Diego and Calandriello, Daniele and Rosasco, Lorenzo and Pucci, Daniele},
       booktitle={2018 IEEE-RAS 18th International Conference on Humanoid Robots (Humanoids)},
     pages={1--6},
     year={2018},
     organization={IEEE}}
  • Statistical and computational trade-offs in kernel k-means
    Daniele Calandriello, Lorenzo Rosasco
    Advances in Neural Information Processing Systems 32 (NeurIPS 2018)
    Abstract: We investigate the efficiency of k-means in terms of both statistical and computational requirements. More precisely, we study a Nyström approach to kernel k-means. We analyze the statistical properties of the proposed method and show that it achieves the same accuracy of exact kernel k-means with only a fraction of computations. Indeed, we prove under basic assumptions that sampling sqrt(n) Nyström landmarks allows to greatly reduce computational costs without incurring in any loss of accuracy. To the best of our knowledge this is the first result of this kind for unsupervised learning.
  • On Fast Leverage Score Sampling and Optimal Learning
    Alessandro Rudi, Daniele Calandriello, Luigi Carratino, Lorenzo Rosasco
    Advances in Neural Information Processing Systems 32 (NeurIPS 2018)
    Abstract: Leverage score sampling provides an appealing way to perform approximate computations for large matrices. Indeed, it allows to derive faithful approximations with a complexity adapted to the problem at hand. Yet, performing leverage scores sampling is a challenge in its own right requiring further approximations. In this paper, we study the problem of leverage score sampling for positive definite matrices defined by a kernel. Our contribution is twofold. First we provide a novel algorithm for leverage score sampling and second, we exploit the proposed method in statistical learning by deriving a novel solver for kernel ridge regression. Our main technical contribution is showing that the proposed algorithms are currently the most efficient and accurate for these problems.
    @INPROCEEDINGS{Rudi2018onfast,
     title = {On Fast Leverage Score Sampling and Optimal Learning},
     journal = {arXiv preprint},
     year = {2018},
     url = {https://arxiv.org/abs/1810.13258},
     author = {Alessandro Rudi, Daniele Calandriello, Luigi Carratino, Lorenzo Rosasco},
  • Optimal rates for spectral algorithms with least-squares regression over Hilbert spaces
    Junhong Lin, Alessandro Rudi, Lorenzo Rosasco, Volkan Cevher
    Applied and Computational Harmonic Analysis
    Abstract: In this paper, we study regression problems over a separable Hilbert space with the square loss, covering non-parametric regression over a reproducing kernel Hilbert space. We investigate a class of spectral/regularized algorithms, including ridge regression, principal component regression, and gradient methods. We prove optimal, high-probability convergence results in terms of variants of norms for the studied algorithms, considering a capacity assumption on the hypothesis space and a general source condition on the target function. Consequently, we obtain almost sure convergence results with optimal rates. Our results improve and generalize previous results, filling a theoretical gap for the non-attainable cases.
    @article{LIN2018,
     title = {Optimal rates for spectral algorithms with least-squares regression over Hilbert spaces},
     journal = {Applied and Computational Harmonic Analysis},
     year = {2018},
     url = {http://www.sciencedirect.com/science/article/pii/S1063520318300174},
     abstract = {In this paper, we study regression problems over a separable Hilbert space with the square loss, covering non-parametric regression over a reproducing kernel Hilbert space. We investigate a class of spectral/regularized algorithms, including ridge regression, principal component regression, and gradient methods. We prove optimal, high-probability convergence results in terms of variants of norms for the studied algorithms, considering a capacity assumption on the hypothesis space and a general source condition on the target function. Consequently, we obtain almost sure convergence results with optimal rates. Our results improve and generalize previous results, filling a theoretical gap for the non-attainable cases.},
     issn = {1063-5203},
     doi = {https://doi.org/10.1016/j.acha.2018.09.009},
     author = {Junhong Lin, Alessandro Rudi, Lorenzo Rosasco, Volkan Cevher},
  • Modified Fejér sequences and applications
    Junhong Lin, Lorenzo Rosasco, Silvia Villa, Ding-Xuan Zhou
    Computational Optimization and Applications 71
    Abstract: In this note, we propose and study the notion of modified Fejér sequences. Within a Hilbert space setting, this property has been used to prove ergodic convergence of proximal incremental subgradient methods. Here we show that indeed it provides a unifying framework to prove convergence rates for objective function values of several optimization algorithms. In particular, our results apply to forward--backward splitting algorithm, incremental subgradient proximal algorithm, and the Douglas--Rachford splitting method including and generalizing known results.
    @Article{Lin2018,
     title = {Modified Fejér sequences and applications},
     journal = {Computational Optimization and Applications},
     year = {2018},
     month = {Sep},
     day = {01},
     url = {https://doi.org/10.1007/s10589-017-9962-1},
     abstract = {In this note, we propose and study the notion of modified Fejér sequences. Within a Hilbert space setting, this property has been used to prove ergodic convergence of proximal incremental subgradient methods. Here we show that indeed it provides a unifying framework to prove convergence rates for objective function values of several optimization algorithms. In particular, our results apply to forward--backward splitting algorithm, incremental subgradient proximal algorithm, and the Douglas--Rachford splitting method including and generalizing known results.},
     issn = {1573-2894},
     doi = {10.1007/s10589-017-9962-1},
     author = {Junhong Lin, Lorenzo Rosasco, Silvia Villa, Ding-Xuan Zhou},
  • Learning with SGD and Random Features
    Luigi Carratino, Alessandro Rudi, Lorenzo Rosasco
    Advances in Neural Information Processing Systems 32 (NeurIPS 2018)
    Abstract: Sketching and stochastic gradient methods are arguably the most common tech-niques to derive efficient large-scale learning algorithms. In this paper, we investigate their application in the context of nonparametric statistical learning. More precisely, we study the estimator defined by stochastic gradients with mini batches and ran-dom features. The latter can be seen as a form of nonlinear sketching and used to define approximate kernel methods. The estimator we consider is not explicitly penalized/constrained and regularization is implicit. Indeed, our study highlight how different parameters, such as the number of features, iterations, step-size and mini-batch size control the learning properties of the solutions. We do this by deriving optimal finite sample bounds, under standard assumptions. The obtained results are corroborated and illustrated by numerical experiments
    @INPROCEEDINGS{Carratino2018,
     title = {Learning with SGD and Random Features},
     journal = {arXiv preprint},
     year = {2018},
     url = {https://arxiv.org/abs/1807.06343},
     author = {Luigi Carratino, Alessandro Rudi, Lorenzo Rosasco},
  • Manifold Structured Prediction
    Alessandro Rudi, Carlo Ciliberto, Gian Maria Marconi, Lorenzo Rosasco
    Advances in Neural Information Processing Systems 32 (NeurIPS 2018)
    Abstract: Structured prediction provides a general framework to deal with supervised problems where the outputs have semantically rich structure. While classical approaches consider finite, albeit potentially huge, output spaces, in this paper we discuss how structured prediction can be extended to a continuous scenario. Specifically, we study a structured prediction approach to manifold valued regression. We characterize a class of problems for which the considered approach is statistically consistent and study how geometric optimization can be used to compute the corresponding estimator. Promising experimental results on both simulated and real data complete our study.
    @INPROCEEDINGS{Rudi2018,
     title = {Manifold Structured Prediction},
     journal = {arXiv preprint},
     year = {2018},
     url = {https://arxiv.org/abs/1806.09908},
     author = {Alessandro Rudi, Carlo Ciliberto, Gian Maria Marconi, Lorenzo Rosasco},
  • Dirichlet-based Gaussian Processes for Large-scale Calibrated Classification
    Dimitrios Milios, Raffaello Camoriano, Pietro Michiardi, Lorenzo Rosasco, Maurizio Filippone
    Advances in Neural Information Processing Systems 32 (NeurIPS 2018)
    Abstract: In this paper, we study the problem of deriving fast and accurate classification algorithms with uncertainty quantification. Gaussian process classification provides a principled approach, but the corresponding computational burden is hardly sustainable in large-scale problems and devising efficient alternatives is a challenge. In this work, we investigate if and how Gaussian process regression directly applied to the classification labels can be used to tackle this question. While in this case training time is remarkably faster, predictions need be calibrated for classification and uncertainty estimation. To this aim, we propose a novel approach based on interpreting the labels as the output of a Dirichlet distribution. Extensive experimental results show that the proposed approach provides essentially the same accuracy and uncertainty quantification of Gaussian process classification while requiring only a fraction of computational resources.
    @INPROCEEDINGS{Milios2018,
     title = {Dirichlet-based Gaussian Processes for Large-scale Calibrated Classification},
     journal = {arXiv preprint},
     year = {2018},
     url = {https://arxiv.org/abs/1805.10915},
     author = {Dimitrios Milios, Raffaello Camoriano, Pietro Michiardi, Lorenzo Rosasco, Maurizio Filippone},
  • Speeding-up Object Detection Training for Robotics with FALKON
    Elisa Maiettini, Giulia Pasquale, Lorenzo Rosasco, Lorenzo Natale
    2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2018)
    Abstract: Latest deep learning methods for object detection provided remarkable performance boost, but have limits when used in robotic applications. One of the most relevant issues is the long training time, which is due to the large size and unbalance of the associated training sets, characterized by few positive and tons of negative (ie background) examples. Proposed approaches, either based on end-to-end learning by back-propagation [22], or standard kernel methods trained with Hard Negatives Mining on top of deep features [8], proved to be effective, but prohibitively slow for on-line applications. In this paper we propose a novel pipeline for object detection that overcomes this problem and provides comparable performance, with a 60x training speedup. Our pipeline combines (i) the Region Proposal Network and the deep feature extractor from [22] to efficiently select candidate RoIs and encode them into powerful representations, with (ii) the recently proposed FALKON [23] algorithm, a novel kernel-based method that allows to quickly train on millions of points. We address the size and unbalance of training data by exploiting the stochastic subsampling intrinsic into the method, combined with a novel, fast, bootstrapping approach. We assess the effectiveness of the approach in a standard computer vision setting (PASCAL VOC 2007 [5]) and demonstrate its applicability to a real robotic scenario as represented by the iCubWorld Transformations [18] dataset.
    @INPROCEEDINGS{Maiettini2018,
     title = {Speeding-up Object Detection Training for Robotics with FALKON},
     journal = {2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2018)},
     year = {2018},
     url = {https://arxiv.org/abs/1803.08740},
     author = {Elisa Maiettini, Giulia Pasquale, Lorenzo Rosasco, Lorenzo Natale},
  • Sparse Multiple Kernel Learning: Support Identification via Mirror Stratifiability
    Guillaume Garrigos, Lorenzo Rosasco, Silvia Villa
    26th European Signal Processing Conference (EUSIPCO)
    Abstract: In statistical machine learning, kernel methods allow to consider infinite dimensional feature spaces with a computational cost that only depends on the number of observations. This is usually done by solving an optimization problem depending on a data fit term and a suitable regularizer. In this paper we consider feature maps which are the concatenation of a fixed, possibly large, set of simpler feature maps. The penalty is a sparsity inducing one, promoting solutions depending only on a small subset of the features. The group lasso problem is a special case of this more general setting. We show that one of the most popular optimization algorithms to solve the regularized objective function, the forward-backward splitting method, allows to perform feature selection in a stable manner. In particular, we prove that the set of relevant features is identified by the algorithm after a finite number of iterations if a suitable qualification condition holds. The main tools used in the proofs are the notions of stratification and mirror stratifiability.
    @article{Garrigos2018,
     title = {Sparse Multiple Kernel Learning: Support Identification via Mirror Stratifiability},
     journal = {arXiv preprint},
     year = {2018},
     url = {https://arxiv.org/abs/1803.00783},
     author = {Guillaume Garrigos, Lorenzo Rosasco, Silvia Villa},
  • Iterate averaging as regularization for stochastic gradient descent
    Gergely Neu, Lorenzo Rosasco
    Computational Learning Theory (COLT 2018)
    Abstract: We propose and analyze a variant of the classic Polyak-Ruppert averaging scheme, broadly used in stochastic gradient methods. Rather than a uniform average of the iterates, we consider a weighted average, with weights decaying in a geometric fashion. In the context of linear least squares regression, we show that this averaging scheme has a the same regularizing effect, and indeed is asymptotically equivalent, to ridge regression. In particular, we derive finite-sample bounds for the proposed approach that match the best known results for regularized stochastic gradient methods.
    @article{Neu2018,
     title = {Iterate averaging as regularization for stochastic gradient descent},
     journal = {arXiv preprint},
     year = {2018},
     url = {https://arxiv.org/abs/1802.08009},
     author = {Gergely Neu, Lorenzo Rosasco},
  • Generalization properties of doubly stochastic learning algorithms
    Junhong Lin, Lorenzo Rosasco
    Journal of Complexity
    Abstract: Doubly stochastic learning algorithms are scalable kernel methods that perform very well in practice. However, their generalization properties are not well understood and their analysis is challenging since the corresponding learning sequence may not be in the hypothesis space induced by the kernel. In this paper, we provide an in-depth theoretical analysis for different variants of doubly stochastic learning algorithms within the setting of nonparametric regression in a reproducing kernel Hilbert space and considering the square loss. Particularly, we derive convergence results on the generalization error for the studied algorithms either with or without an explicit penalty term. To the best of our knowledge, the derived results for the unregularized variants are the first of this kind, while the results for the regularized variants improve those in the literature. The novelties in our proof are a sample error bound that requires controlling the trace norm of a cumulative operator, and a refined analysis of bounding initial error.
    @article{LIN2018,
     title = {Generalization properties of doubly stochastic learning algorithms},
     journal = {Journal of Complexity},
     year = {2018},
     issn = {0885-064X},
     doi = {https://doi.org/10.1016/j.jco.2018.02.004},
     url = {http://www.sciencedirect.com/science/article/pii/S0885064X18300128},
     author = {Junhong Lin and Lorenzo Rosasco},
     keywords = {Kernel method, Doubly stochastic algorithm, Nonparametric regression}}
  • LocalNysation: A bottom up approach to efficient localized kernel regression
    Nicole Mucke
    arXiv preprint arXiv:1707.03220
    Abstract: We consider a localized approach in the well-established setting of reproducing kernel learning under random design. The input space X is partitioned into local disjoint subsets Xj (j=1,...,m) equipped with a local reproducing kernel Kj. It is then straightforward to define local KRR estimates. Our first main contribution is in showing that minimax optimal rates of convergence are preserved if the number m of partitions grows sufficiently slowly with the sample size, under locally different degrees on smoothness assumptions on the regression function. As a byproduct, we show that low smoothness on exceptional sets of small probability does not contribute, leading to a faster rate of convergence. Our second contribution lies in showing that the partitioning approach for KRR can be efficiently combined with local Nystr"om subsampling, improving computational cost twofold. If the number of locally subsampled inputs grows sufficiently fast with the sample size, minimax optimal rates of convergence are maintained.
    @Article{Mucke2018,
     author={Nicole Mucke},
     title={LocalNysation: A bottom up approach to efficient localized kernel regression},
     journal={arXiv preprint arXiv:1707.03220},
     year={2018},
     url={https://www.researchgate.net/profile/Nicole_Muecke/publication/318432070_LocalNysation_A_bottom_up_approach_to_efficient_localized_kernel_regression/links/5a8828a5458515b8af90c705/LocalNysation-A-bottom-up-approach-to-efficient-localized-kernel-regression.pdf}}
  • Iterative regularization via dual diagonal descent
    Guillaume Garrigos, Lorenzo Rosasco and Silvia Villa
    Journal of Mathematical Imaging and Vision 60
    Abstract: In the context of linear inverse problems, we propose and study a general iterative regularization method allowing to consider large classes of data-fit terms and regularizers. The algorithm we propose is based on a primal-dual diagonal descent method. Our analysis establishes convergence as well as stability results. Theoretical findings are complemented with numerical experiments showing state-of-the-art performances.
    @Article{Garrigos2018,
     author={Garrigos, Guillaume and Rosasco, Lorenzo and Villa, Silvia},
     title={Iterative Regularization via Dual Diagonal Descent},
     journal={Journal of Mathematical Imaging and Vision},
     year={2018},
     month={Feb},
     day={01},
     volume={60},
     number={2},
     pages={189--215},
     issn={1573-7683},
     doi={10.1007/s10851-017-0754-0},
     url={https://doi.org/10.1007/s10851-017-0754-0}}
  • Theory of deep learning III: the non-overfitting puzzle
    T Poggio, K Kawaguchi, Q Liao, B Miranda, L Rosasco, X Boix, J Hidary, HN Mhaskar
    CBMM memo 073
    Abstract: A main puzzle of deep networks revolves around the apparent absence of overfitting intended as robustness of the expected error against overparametrization, despite the large capacity demonstrated by zero training error on randomly labeled data. In this note, we show that the dynamics associated to gradient descent minimization of nonlinear networks is topologically equivalent, near the asymptotically stable minima of the empirical error, to a gradient system in a quadratic potential with a degenerate (for square loss) or almost degenerate (for logistic or crossentropy loss) Hessian. The proposition depends on the qualitative theory of dynamical systems and is supported by numerical results. The result extends to deep nonlinear networks two key properties of gradient descent for linear networks, that have been recently recognized (1) to provide a form of implicit regularization: 1. For classification, which is the main application of today’s deep networks, there is asymptotic convergence to the maximum margin solution by minimization of loss functions such as the logistic, the cross entropy and the exp-loss . The maximum margin solution guarantees good classification error for “low noise” datasets. Importantly, this property holds independently of the initial conditions. Because of this property, our proposition guarantees a maximum margin solution also for deep nonlinear networks. 2. Gradient descent enforces a form of implicit regularization controlled by the number of iterations, and asymptotically converges to the minimum norm solution for appropriate initial conditions of gradient descent. This implies that there is usually an optimum early stopping that avoids overfitting of the expected risk. This property, valid for the square loss and many other loss functions, is relevant especially for regression. In the case of deep nonlinear networks the solution however is not expected to be strictly minimum norm, unlike the linear case. The robustness to overparametrization has suggestive implications for the robustness of the architecture of deep convolutional networks with respect to the curse of dimensionality.
    @article{Poggio2018,
      author  = {T Poggio, K Kawaguchi, Q Liao, B Miranda, L Rosasco, X Boix, J Hidary, HN Mhaskar},
      title   = {Theory of deep learning III: the non-overfitting puzzle},
      journal = {CBMM memo 073},
      year    = {2018},
      url     = {https://cbmm.mit.edu/sites/default/files/publications/CBMM-Memo-073v3.pdf
    }
  • Data-driven strategies for robust forecast of continuous glucose monitoring time-series
    Fiorini S., Martini C., Malpassi D., Cordera R., Maggi D., Verri A., Barla A.
    Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS
    Abstract: Over the past decade, continuous glucose monitoring (CGM) has proven to be a very resourceful tool for diabetes management. To date, CGM devices are employed for both retrospective and online applications. Their use allows to better describe the patients' pathology as well as to achieve a better control of patients' level of glycemia. The analysis of CGM sensor data makes possible to observe a wide range of metrics, such as the glycemic variability during the day or the amount of time spent below or above certain glycemic thresholds. However, due to the high variability of the glycemic signals among sensors and individuals, CGM data analysis is a non-trivial task. Standard signal filtering solutions fall short when an appropriate model personalization is not applied. State-of-the-art data-driven strategies for online CGM forecasting rely upon the use of recursive filters. Each time a new sample is collected, such models need to adjust their parameters in order to predict the next glycemic level. In this paper we aim at demonstrating that the problem of online CGM forecasting can be successfully tackled by personalized machine learning models, that do not need to recursively update their parameters.
  • PALLADIO: A parallel framework for robust variable selection in high-dimensional data
    Barbieri M., Fiorini S., Tomasi F., Barla A.
    Proceedings of PyHPC 2016: 6th Workshop on Python for High-Performance and Scientific Computing - Held in conjunction with SC16: The International Conference for High Performance Computing, Networking, Storage and Analysis
    Abstract: The main goal of supervised data analytics is to model a target phenomenon given a limited amount of samples, each represented by an arbitrarily large number of variables. Especially when the number of variables is much larger than the number of available samples, variable selection is a key step as it allows to identify a possibly reduced subset of relevant variables describing the observed phenomenon. Obtaining interpretable and reliable results, in this highly indeterminate scenario, is often a non-trivial task. In this work we present PALLADIO, a framework designed for HPC cluster architectures, that is able to provide robust variable selection in high-dimensional problems. PALLADIO is developed in Python and it integrates CUDA kernels to decrease the computational time needed for several independent element-wise operations. The scalability of the proposed framework is assessed on synthetic data of different sizes, which represent realistic scenarios.
  • Theory of Deep Learning III: explaining the non-overfitting puzzle
    Tomaso Poggio, Kenji Kawaguchi, Qianli Liao, Brando Miranda, Lorenzo Rosasco, Xavier Boix, Jack Hidary, Hrushikesh Mhaskar
    arXiv preprint arXiv:1801.00173
    Abstract: A main puzzle of deep networks revolves around the absence of overfitting despite large overparametrization and despite the large capacity demonstrated by zero training error on randomly labeled data. In this note, we show that the dynamics associated to gradient descent minimization of nonlinear networks is topologically equivalent, near the asymptotically stable minima of the empirical error, to linear gradient system in a quadratic potential with a degenerate (for square loss) or almost degenerate (for logistic or crossentropy loss) Hessian. The proposition depends on the qualitative theory of dynamical systems and is supported by numerical results. Our main propositions extend to deep nonlinear networks two properties of gradient descent for linear networks, that have been recently established (1) to be key to their generalization properties: 1. Gradient descent enforces a form of implicit regularization controlled by the number of iterations, and asymptotically converges to the minimum norm solution for appropriate initial conditions of gradient descent. This implies that there is usually an optimum early stopping that avoids overfitting of the loss. This property, valid for the square loss and many other loss functions, is relevant especially for regression. 2. For classification, the asymptotic convergence to the minimum norm solution implies convergence to the maximum margin solution which guarantees good classification error for low noise datasets. This property holds for loss functions such as the logistic and cross-entropy loss independently of the initial conditions. The robustness to overparametrization has suggestive implications for the robustness of the architecture of deep convolutional networks with respect to the curse of dimensionality.
    @article{Poggio2017,
      author  = {Tomaso Poggio, Kenji Kawaguchi, Qianli Liao, Brando Miranda, Lorenzo Rosasco, Xavier Boix, Jack Hidary, Hrushikesh Mhaskar},
      title   = {Theory of deep learning III: explaining the non-overfitting puzzle},
      journal = {arXiv preprint arXiv:1801.00173},
      year    = {2017},
      url     = {https://arxiv.org/abs/1801.00173}
    }
  • FALKON: An Optimal Large Scale Kernel Method
    Alessandro Rudi, Luigi Carratino, and Lorenzo Rosasco
    Advances in Neural Information Processing Systems 30 (NeurIPS 2017)
    Abstract: Kernel methods provide a principled way to perform non linear, nonparametric learning. They rely on solid functional analytic foundations and enjoy optimal statistical properties. However, at least in their basic form, they have limited applicability in large scale scenarios because of stringent computational requirements in terms of time and especially memory. In this paper, we take a substantial step in scaling up kernel methods, proposing FALKON, a novel algorithm that allows to efficiently process millions of points. FALKON is derived combining several algorithmic principles, namely stochastic subsampling, iterative solvers and preconditioning. Our theoretical analysis shows that optimal statistical accuracy is achieved requiring essentially O(n) memory and O(n√n) time. Extensive experiments show that state of the art results on available large scale datasets can be achieved even on a single machine.
    @article{rudi17,
      author  = {Alessandro Rudi and Luigi Carratino and Lorenzo Rosasco},
      title   = {FALKON: An Optimal Large Scale Kernel Method},
      journal = {Advances in Neural Information Processing Systems 30},
      year    = {2017},
      url     = {https://arxiv.org/abs/1705.10958}
    }
  • Generalization Properties of Learning with Random Features
    Alessandro Rudi, Lorenzo Rosasco
    Advances in Neural Information Processing Systems 30 (NeurIPS 2017)
    Abstract: We study the generalization properties of regularized learning with random features in the statistical learning theory framework. We show that optimal learning errors can be achieved with a number of features smaller than the number of examples. As a byproduct, we also show that learning with random features can be seen as a form of regularization, rather than only a way to speed up computations.
    @article{rudi2017generalization,
      title={Generalization Properties of Learning with Random Features},
      author={Rudi, Alessandro and Camoriano, Raffaello and Rosasco, Lorenzo},
      journal={Advances in Neural Information Processing Systems 30},
      year ={2017}
    }
  • Interactive Data Collection for Deep Learning Object Detectors on Humanoid Robots
    Elisa Maiettini, Giulia Pasquale, Lorenzo Rosasco and Lorenzo Natale
    2017 IEEE-RAS 16th International Conference on Humanoid Robots (Humanoids)
    Abstract: Deep Learning (DL) methods are notoriously data hungry. Their adoption in robotics is challenging due to the cost associated with data acquisition and labeling. In this paper we focus on the problem of object detection, i.e. the simultaneous localization and recognition of objects in the scene, for which various DL architectures have been proposed in the literature. We propose to use an automatic annotation procedure, which leverages on human-robot interaction and depth-based segmentation, for the acquisition and labeling of training examples. We fine-tune the Faster R-CNN [37] network with these data acquired by the robot autonomously. We measure the performance on the same dataset and investigate the generalization abilities of the network on different settings and in absence of explicit segmentation, showing good detection performance. Experiments on the iCub humanoid robot [26]show that the proposed strategy is effective and can
    @INPROCEEDINGS{maiettini2017,
     author={E. Maiettini and G. Pasquale and L. Rosasco and L. Natale},
     booktitle={2017 IEEE-RAS 17th International Conference on Humanoid Robotics (Humanoids)},
     title={Interactive data collection for deep learning object detectors on humanoid robots},
     year={2017},
     pages={862-868},
     keywords={data acquisition;human-robot interaction;humanoid robots;image segmentation;learning (artificial intelligence);object detection;robot vision;DL methods;R-CNN network;automatic annotation procedure;data acquisition;data labeling;deep learning object detectors;deep object detection algorithms;depth-based segmentation;detection performance;explicit segmentation;human-robot interaction;humanoid robots;iCub humanoid robot;interactive data collection;Computer architecture;Feature extraction;Object detection;Robot kinematics;Training},
     doi={10.1109/HUMANOIDS.2017.8246973},
     month={Nov},}
  • Optimal Rates for Learning with Nystrom Stochastic Gradient Methods
    Junhong Lin, Lorenzo Rosasco
    arXiv preprint arXiv:1710.07797
    Abstract: In the setting of nonparametric regression, we propose and study a combination of stochastic gradient methods with Nystr"om subsampling, allowing multiple passes over the data and mini-batches. Generalization error bounds for the studied algorithm are provided. Particularly, optimal learning rates are derived considering different possible choices of the step-size, the mini-batch size, the number of iterations/passes, and the subsampling level. In comparison with state-of-the-art algorithms such as the classic stochastic gradient methods and kernel ridge regression with Nystr"om, the studied algorithm has advantages on the computational complexity, while achieving the same optimal learning rates. Moreover, our results indicate that using mini-batches can reduce the total computational cost while achieving the same optimal statistical results.
    @article{Lin2017,
      title={Optimal Rates for Learning with Nystrom Stochastic Gradient Methods},
      author={Junhong Lin, Lorenzo Rosasco},
      journal={arXiv preprint arXiv:1710.07797},
      year ={2017}
    }
  • Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review
    Tomaso Poggio, Hrushikesh Mhaskar, Lorenzo Rosasco, Brando Miranda, Qianli Liao
    Institute of Automation, Chinese Academy of Sciences
    Abstract: The paper reviews and extends an emerging body of theoretical results on deep learning including the conditions under which it can be exponentially better than shallow learning. A class of deep convolutional networks represent an important special case of these conditions, though weight sharing is not the main reason for their exponential advantage. Implications of a few key theorems are discussed, together with new results, open problems and conjectures.
    @article{@Article{Poggio2017,
     author={Poggio, Tomaso and Mhaskar, Hrushikesh and Rosasco, Lorenzo and Miranda, Brando and Liao, Qianli},
     title={Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review},
     journal={International Journal of Automation and Computing},
     year={2017},
     month={Oct},
     day={01},
     volume={4},
     number={5},
     pages={503--519},
     issn={1751-8520},
     doi={10.1007/s11633-017-1054-2},
     url={https://doi.org/10.1007/s11633-017-1054-2}}
  • Optimal Rates for Multi-pass Stochastic Gradient Methods
    Junhong Lin and Lorenzo Rosasco
    Journal of Machine Learning Research (JMLR)
    Abstract: We analyze the learning properties of the stochastic gradient method when multiple passes over the data and mini-batches are allowed. We study how regularization properties are controlled by the step-size, the number of passes and the mini-batch size. In particular, we consider the square loss and show that for a universal step-size choice, the number of passes acts as a regularization parameter, and optimal finite sample bounds can be achieved by early-stopping. Moreover, we show that larger step-sizes are allowed when considering mini-batches. Our analysis is based on a unifying approach, encompassing both batch and stochastic gradient methods as special cases. As a byproduct, we derive optimal convergence results for batch gradient methods (even in the non-attainable cases)
    @article{JMLR:v18:17-176, 
      author  = {Junhong Lin and Lorenzo Rosasco}, 
      title   = {Optimal Rates for Multi-pass Stochastic Gradient Methods},
      journal = {Journal of Machine Learning Research},
      year    = {2017},
      volume  = {18},
      number  = {97},
      pages   = {1-47},
      url     = {http://jmlr.org/papers/v18/17-176.html}
    }
  • Solving lp-norm regularization with tensor kernels
    Saverio Salzo and Johan K.A. Suykens and Lorenzo Rosasco
    arXiv:1707.05609, 2017
    Abstract: In this paper, we discuss how a suitable family of tensor kernels can be used to efficiently solve nonparametric extensions of lp regularized learning methods. Our main contribution is proposing a fast dual algorithm, and showing that it allows to solve the problem efficiently. Our results contrast recent findings suggesting kernel methods cannot be extended beyond Hilbert setting. Numerical experiments confirm the effectiveness of the method.
    @article{salzo2017b,
      author  = {Saverio Salzo and Johan K.A. Suykens and Lorenzo Rosasco},
      title   = {Solving lp-norm regularization with tensor kernels},
      journal = {},
      year    = {2017},
      volume  = {arXiv:1707.05609},
      pages   = {1-20},
      url     = {https://arxiv.org/abs/1707.05609}
    }
  • Convergence of the Forward-Backward algorithm: beyond the worst case with geometry
    Guillaume Garrigos, Lorenzo Rosasco and Silvia Villa
    arXiv:1703.09477
    Abstract: We provide a comprehensive study of the convergence of forward-backward algorithm under suitable geometric conditions leading to fast rates. We present several new results and collect in a unified view a variety of results scattered in the literature, often providing simplified proofs. Novel contributions include the analysis of infinite dimensional convex minimization problems, allowing the case where minimizers might not exist. Further, we analyze the relation between different geometric conditions, and discuss novel connections with a priori conditions in linear inverse problems, including source conditions, restricted isometry properties and partial smoothness.
    @ARTICLE{GarRosVil17b,
      AUTHOR  = {G. Garrigos, L. Rosasco and S. Villa},
      TITLE   = {Convergence of the Forward-Backward algorithm: beyond the worst case with geometry},
      JOURNAL = {Preprint on \href{https://arxiv.org/abs/1703.09477}{arXiv:1703.09477}},
      VOLUME  = {},
      ISSUE   = {},
      PAGES   = {},
      YEAR    = 2017,
      url     = {https://arxiv.org/abs/1703.09477}
    }
  • The variable metric forward-backward splitting algorithm under mild differentiability assumptions
    Saverio Salzo
    SIAM Journal on Optimization, 2017.
    Abstract: In this work we study the variable metric forward-backward splitting algorithm for convex minimization problems replacing the standard assumption of the Lipschitz continuity of the gradient with backtracking line search procedures.
    @article{salzo2017a,
      author  = {Saverio Salzo},
      title   = {The variable metric forward-backward splitting algorithm under mild},
      journal = {SIAM Journal on Optimization},
      volume  = {27}, 
      pages   = {2153-2181}, 
      year    = {2017},
      url     = {https://arxiv.org/abs/1605.00952}
    }
  • Consistent learning by composite proximal thresholding
    Patrick L. Combettes and Saverio Salzo and Silvia Villa
    Mathematical Programming, 2017.
    Abstract: In this work we propose a novel flexible composite regularization model, which makes it possible to incorporate various priors on the coefficients of the prediction function, including sparsity and hard constraints. Statistical and algorithmic properties are studied.
    @article{combettes2017a,
      author  = {Patrick L. Combettes and Saverio Salzo and Silvia Villa},
      title   = {Consistent learning by composite proximal thresholding},
      journal = {Mathematical Programming},
      year    = {2017},
      url     = {https://link.springer.com/article/10.1007/s10107-017-1133-8}
    }
  • Incremental Robot Learning of New Objects with Fixed Update Time
    Raffaello Camoriano*, Giulia Pasquale*, Carlo Ciliberto, Lorenzo Natale, Lorenzo Rosasco and Giorgio Metta
    Robotics and Automation (ICRA), 2017 IEEE International Conference on
    Abstract: We consider object recognition in the context of lifelong learning, where a robotic agent learns to discriminate between a growing number of object classes as it accumulates experience about the environment. We propose an incremental variant of the Regularized Least Squares for Classification (RLSC) algorithm, and exploit its structure to seamlessly add new classes to the learned model. The presented algorithm addresses the problem of having an unbalanced proportion of training examples per class, which occurs when new objects are presented to the system for the first time. We evaluate our algorithm on both a machine learning benchmark dataset and two challenging object recognition tasks in a robotic setting. Empirical evidence shows that our approach achieves comparable or higher classification performance than its batch counterpart when classes are unbalanced, while being significantly faster.
    @INPROCEEDINGS{camoriano17icra,
      author    = {Raffaello Camoriano and Giulia Pasquale and Carlo Ciliberto and Lorenzo Natale and Lorenzo Rosasco and Giorgio Metta},
      title     = {Incremental Robot Learning of New Objects with Fixed Update Time},
      booktitle = {2017 IEEE International Conference on Robotics and Automation (ICRA)}, ,
      year      = {2017},
      url       = {https://fr.arxiv.org/pdf/1605.05045.pdf}
    }
  • View-Tolerant Face Recognition and Hebbian Learning Imply Mirror-Symmetric Neural Tuning to Head Orientation
    J. Z. Leibo, Q. Liao, F Anselmi, W. A. Freiwald, T. Poggio
    Current Biology, 27,1-6 (2017)
    Abstract:  We develop a new computational model explaining the emergence of mirror symmetric receptive fields in monkey face-patch in monkey visual cortex.
    @article{anselmi2017,
      author  = {J.  Z. Leibo, Liao, Q., Anselmi, F., Freiwald, W. A., and Poggio, T.},
      title   = { View-Tolerant Face Recognition and Hebbian Learning Imply Mirror-Symmetric Neural Tuning to Head Orientation},
      journal = {Current Biology},
      year    = {2017},
      volume  = {27},
      pages   = {1-6},
      url     = {http://www.cell.com/current-biology/abstract/S0960-9822(16)31200-3}
    }
  • Symmetry regularization
    F. Anselmi, G. Evangelopoulos, L. Rosasco, T. Poggio
    CBMM Memo
    Abstract:  We derive a new regularization scheme to learn group symmetries in data.
    @article{anselmi2017,
      author  = {Anselmi, F, Evangelopoulos, G, Rosasco, L, Poggio, T},
      title   = { Symmetry regularization},
      journal = {CBMM Memos},
      year    = {2017},
      volume  = {},
      pages   = {},
      url     = {https://cbmm.mit.edu/publications/symmetry-regularization}
    }
  • Consistent Multitask Learning with Nonlinear Output Relations
    Carlo Ciliberto, Alessandro Rudi, Lorenzo Rosasco, Massimiliano Pontil
    arXiv:1705.08118
    Abstract: Key to multitask learning is exploiting relationships between different tasks to improve prediction performance. If the relations are linear, regularization approaches can be used successfully. However, in practice assuming the tasks to be linearly related might be restrictive, and allowing for nonlinear structures is a challenge. In this paper, we tackle this issue by casting the problem within the framework of structured prediction. Our main contribution is a novel algorithm for learning multiple tasks which are related by a system of nonlinear equations that their joint outputs need to satisfy. We show that the algorithm is consistent and can be efficiently implemented. Experimental results show the potential of the proposed method.
    @article{ciliberto17,
      author  = {Carlo Ciliberto and Alessandro Rudi and Lorenzo Rosasco and Massimiliano Pontil},
      title   = {Consistent Multitask Learning with Nonlinear Output Relations},
      journal = {CoRR},
      volume  = {abs/1705.08118},
      year    = {2017},
      biburl  = {http://dblp.uni-trier.de/rec/bib/journals/corr/CilibertoRRP17}
    }
  • Active Video Summarization: Customized Summaries via On-line Interaction with the User
    Ana Garcia del Molino, Xavier Boix, Joo-Hwee Lim, Ah-Hwee Tan
    AAAI Conference on Artificial Intelligence
    Abstract: To facilitate the browsing of long videos, automatic video summarization provides an excerpt that represents its content. In the case of egocentric and consumer videos, due to their personal nature, adapting the summary to specific user's preferences is desirable. Current approaches to customizable video summarization obtain the user's preferences prior to the summarization process. As a result, the user needs to manually modify the summary to further meet the preferences. In this paper, we introduce Active Video Summarization (AVS), an interactive approach to gather the user's preferences while creating the summary. AVS asks questions about the summary to update it on-line until the user is satisfied. To minimize the interaction, the best segment to inquire next is inferred from the previous feedback. We evaluate AVS in the commonly used UTEgo dataset. We also introduce a new dataset for customized video summarization (CSumm) recorded with a Google Glass. The results show that AVS achieves an excellent compromise between usability and quality. In 41% of the videos, AVS is considered the best over all tested baselines, including summaries manually generated. Also, when looking for specific events in the video, AVS provides an average level of satisfaction higher than those of all other baselines after only six questions to the user.
    @inproceedings{garcia17,
      author  = {Ana Garcia del Molino and Xavier Boix and Joo-Hwee Lim and Ah-Hwee Tan},
      title   = {Active Video Summarization: Customized Summaries via On-line Interaction with the User},
      booktitle = {AAAI Conference on Artificial Intelligence},
      year    = {2017}
    }
  • Iterative regularization via dual diagonal descent
    Guillaume Garrigos, Lorenzo Rosasco and Silvia Villa
    Journal of Mathematical Imaging and Vision
    Abstract: In the context of linear inverse problems, we propose and study a general iterative regularization method allowing to consider large classes of regularizers and data-fit terms. The algorithm we propose is based on a primal-dual diagonal descent method. Our analysis establishes convergence as well as stability results. Theoretical findings are complemented with numerical experiments showing state of the art performances.
    @ARTICLE{GarRosVil17a,
      author={G. Garrigos, L. Rosasco and S. Villa},
      title={Iterative regularization via Diagonal Dual Descent},
      journal={Journal of Mathematical Imaging and Vision, to appear},
      volume= {},
      issue = {},
      pages= {},
      YEAR = 2017,
      url = {https://doi.org/10.1007/s10851-017-0754-0}
    }
  • Regularized Learning Schemes in Feature Banach Spaces
    Patrick L. Combettes and Saverio Salzo and Silvia Villa
    Analysis and Applications (Online Ready), 2016.
    @article{salzo2016a,
      author  = {Patrick L. Combettes and Saverio Salzo and Silvia Villa},
      title   = {Regularized Learning Schemes in Feature {B}anach Spaces},
      journal = {Analysis and Applications { (Online Ready)}},
      year    = {2016},
      volume  = {},
      pages   = {1-46},
      url     = {http://www.worldscientific.com/doi/abs/10.1142/S0219530516500202}
    }
  • Online semi-parametric learning for inverse dynamics modeling
    Diego Romeres, Mattia Zorzi, Raffaello Camoriano and Alessandro Chiuso
    55th IEEE Conference on Decision and Control (CDC 2016)
    Abstract: This paper presents a semi-parametric algorithm for online learning of a robot inverse dynamics model. It combines the strength of the parametric and non-parametric modeling. The former exploits the rigid body dynamics equation, while the latter exploits a suitable kernel function. We provide an extensive comparison with other methods from the literature using real data from the iCub humanoid robot. In doing so we also compare two different techniques, namely cross validation and marginal likelihood optimization, for estimating the hyperparameters of the kernel function.
    @article{Romeres_etal2016,
      author={Diego Romeres, Mattia Zorzi, Raffaello Camoriano and Alessandro Chiuso},
      title={Online semi-parametric learning for inverse dynamics modeling},
      booktitle={55th IEEE Conference on Decision and Control (CDC 2016)},
      year={2016},
      address={Las Vegas, USA, December 12-14, 2016},
      abstract={This paper presents a semi-parametric algorithm for online learning of a robot inverse dynamics model. It combines the strength of the parametric and non-parametric modeling. The former exploits the rigid body dynamics equation, while the latter exploits a suitable kernel function. We provide an extensive comparison with other methods from the literature using real data from the iCub humanoid robot. In doing so we also compare two different techniques, namely cross validation and marginal likelihood optimization, for estimating the hyperparameters of the kernel function.},
      optlocation={Raffello Camoriano (raffaello.camoriano at iit dot it)},
      opturl={https://arxiv.org/pdf/1603.05412v2.pdf}
    }
  • Visual Cortex and Deep Networks. Learning Invariant Representations
    Fabio Anselmi, Tomaso Poggio
    Book MIT press 2016
    Abstract: The authors propose a theory based on the hypothesis that the main computational goal of the ventral stream is to compute neural representations of images that are invariant to transformations commonly encountered in the visual environment and are learned from unsupervised experience. They describe a general theoretical framework of a computational theory of invariance (with details and proofs offered in appendixes) and then review the application of the theory to the feedforward path of the ventral stream in the primate visual cortex.
    @book{PoggioAnselmiBook2016,
      author  = {Fabio Anselmi, Tomaso Poggio},
      title   = {Visual Cortex and Deep Networks. Learning Invariant Representations},
      year    = {2016},
      url     = {https://mitpress.mit.edu/books/visual-cortex-and-deep-networks}
    }
  • Visual Recognition for Humanoid Robots
    Sean Ryan Fanello, Carlo Ciliberto, Nicoletta Noceti and Francesca Odone
    Robotics and Autonomous Systems (In press)
    Abstract: Visual perception is a fundamental component for most robotics systems operating in human environments. Specifically, visual recognition is a prerequisite to a large variety of tasks such as tracking, manipulation, human-robot interaction. As a consequence, the lack of successful recognition often becomes a bottleneck for the application of robotics system to real-world situations. In this paper we aim at improving the robot visual perception capabilities in a natural, human-like fashion, with a very limited amount of constraints to the acquisition scenario. In particular our goal is to build and analyze a learning system that can rapidly be re-trained in order to incorporate new evidence if available. To this purpose, we review the state-of-the-art coding-pooling pipelines for visual recognition and propose two modifications which allow us to improve the quality of the representation, while maintaining real-time performances: a coding scheme, Best Code Entries (BCE), and a new pooling operator, Mid-Level Classification Weights (MLCW). The former focuses entirely on sparsity and improves the stability and computational efficiency of the coding phase, the latter increases the discriminability of the visual representation, and therefore the overall recognition accuracy of the system, by exploiting data supervision. The proposed pipeline is assessed from a qualitative perspective on a Human-Robot Interaction (HRI) application on the iCub platform. Quantitative evaluation of the proposed system is performed both on in-house robotics data-sets (iCubWorld) and on established computer vision benchmarks (Caltech-256, PASCAL VOC 2007). As a byproduct of this work, we provide for the robotics community an implementation of the proposed visual recognition pipeline which can be used as perceptual layer for more complex robotics applications.
    @article{fanello16,
      author  = {Sean Ryan Fanello and Carlo Ciliberto and Nicoletta Noceti and Francesca Odone},
      title   = {Visual Recognition for Humanoid Robots},
      journal = {Robotics and Autonomous Systems},
      year    = {2016}
    }
  • Object Identification from Few Examples by Improving the Invariance of a Deep Convolutional Neural Network
    G. Pasquale and C. Ciliberto and L. Rosasco and L. Natale
    2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
    Abstract: The development of reliable and robust visual recognition systems is a main challenge towards the deployment of autonomous robotic agents in unconstrained environments. Learning to recognize objects requires image representations that are discriminative to relevant information while being invariant to nuisances, such as scaling, rotations, light and background changes, and so forth. Deep Convolutional Neural Networks can learn such representations from large web-collected image datasets and a natural question is how these systems can be best adapted to the robotics context where little supervision is often available. In this work, we investigate different training strategies for deep architectures on a new dataset collected in a real-world robotic setting. In particular we show how deep networks can be tuned to improve invariance and discriminability properties and perform object identification tasks with minimal supervision.
    @inproceedings{pasquale2016iros,
      author = {G. Pasquale and C. Ciliberto and L. Rosasco and L. Natale},
      booktitle = {2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
      doi = {10.1109/IROS.2016.7759720},
      month = {Oct},
      pages = {4904-4911},
      title = {{Object Identification from Few Examples by Improving the Invariance of a Deep Convolutional Neural Network}},
      year = {2016},
      bdsk-Url-1 = {http://dx.doi.org/10.1109/IROS.2016.7759720},
      abstract = {The development of reliable and robust visual recognition systems is a main challenge towards the deployment of autonomous robotic agents in unconstrained environments. Learning to recognize objects requires image representations that are discriminative to relevant information while being invariant to nuisances, such as scaling, rotations, light and background changes, and so forth. Deep Convolutional Neural Networks can learn such representations from large web-collected image datasets and a natural question is how these systems can be best adapted to the robotics context where little supervision is often available. In this work, we investigate different training strategies for deep architectures on a new dataset collected in a real-world robotic setting. In particular we show how deep networks can be tuned to improve invariance and discriminability properties and perform object identification tasks with minimal supervision.}
    }
  • Combining sensory modalities and exploratory procedures to improve haptic object recognition in robotics
    B. Higy, C. Ciliberto, L. Rosasco and L. Natale
    2016 IEEE-RAS 16th International Conference on Humanoid Robots (Humanoids)
    Abstract: In this paper we tackle the problem of object recognition using haptic feedback from a robot holding and manipulating different objects. One of the main challenges in this setting is to understand the role of different sensory modalities (namely proprioception, object weight from F/T sensors and touch) and how to combine them to correctly discriminate different objects. We investigated these aspects by considering multiple sensory channels and different exploratory strategies to gather meaningful information regarding the object's physical properties. We propose a novel strategy to train a learning machine able to efficiently combine sensory modalities by first learning individual object features and then combine them in a single classifier. To evaluate our approach and compare it with previous methods we collected a dataset for haptic object recognition, comprising 11 objects that were held in the hands of the iCub robot while performing different exploration strategies. Results show that our strategy consistently outperforms previous approaches [17].
    @inproceedings{higy2016,
      author={B. Higy and C. Ciliberto and L. Rosasco and L. Natale},
      booktitle={2016 IEEE-RAS 16th International Conference on Humanoid Robots (Humanoids)},
      title={Combining sensory modalities and exploratory procedures to improve haptic object recognition in robotics},
      year={2016},
      pages={117-124},
      doi={10.1109/HUMANOIDS.2016.7803263},
      month={Nov}
    }
  • Active perception: Building objects' models using tactile exploration
    B. Higy, C. Ciliberto, L. Rosasco and L. Natale
    2016 IEEE-RAS 16th International Conference on Humanoid Robots (Humanoids)
    Abstract: In this paper we present an efficient active learning strategy applied to the problem of tactile exploration of an object's surface. The method uses Gaussian process (GPs) classification to efficiently sample the surface of the object in order to reconstruct its shape. The proposed method iteratively samples the surface of the object, while, simultaneously constructing a probabilistic model of the object's surface. The probabilities in the model are used to guide the exploration. At each iteration, the estimate of the object's shape is used to slice the object in equally spaced intervals along the height of the object. The sampled locations are then labelled according to the interval in which their height falls. In its simple form, the data are labelled as belonging to the object and not belonging to the object: object and no-object, respectively. A GP classifier is trained to learn the object/no-object decision boundary. The next location to be sampled is selected at the classification boundary, in this way, the exploration is biased towards more informative areas. Complex features of the object's surface is captured by increasing the number of intervals as the number of sampled locations is increased. We validated our approach on six objects of different shapes using the iCub humanoid robot. Our experiments show that the method outperforms random selection and previous work based on GP regression by sampling more points on and near-the-boundary of the object.
    @inproceedings{jamali2016,
      author={N. Jamali and C. Ciliberto and L. Rosasco and L. Natale},
      booktitle={2016 IEEE-RAS 16th International Conference on Humanoid Robots (Humanoids)},
      title={Active perception: Building objects' models using tactile exploration},
      year={2016},
      pages={179-185},
      doi={10.1109/HUMANOIDS.2016.7803275},
      month={Nov}
    }
  • A Consistent Regularization Approach for Structured Prediction
    Ciliberto, Carlo and Rosasco, Lorenzo and Rudi, Alessandro
    Advances in Neural Information Processing Systems 29 (NeurIPS 2016)
    Abstract: We propose and analyze a regularization approach for structured prediction problems. We characterize a large class of loss functions that allows to naturally embed structured outputs in a linear space. We exploit this fact to design learning algorithms using a surrogate loss approach and regularization techniques. We prove universal consistency and finite sample bounds characterizing the generalization properties of the proposed method. Experimental results are provided to demonstrate the practical usefulness of the proposed approach.
    @incollection{NeurIPS2016_6093,
      title = {A Consistent Regularization Approach for Structured Prediction},
      author = {Ciliberto, Carlo and Rosasco, Lorenzo and Rudi, Alessandro},
      booktitle = {Advances in Neural Information Processing Systems 29},
      editor = {D. D. Lee and M. Sugiyama and U. V. Luxburg and I. Guyon and R. Garnett},
      pages = {4412--4420},
      year = {2016},
      publisher = {Curran Associates, Inc.},
      url = {http://papers.NeurIPS.cc/paper/6093-a-consistent-regularization-approach-for-structured-prediction.pdf}
    }
  • Invariant Recognition Predicts Tuning of Neurons in Sensory Cortex
    Jim Mutch, Fabio Anselmi , Andrea Tacchetti, Lorenzo Rosasco, Joel Z. Leibo, Tomaso Poggio
    Computational and Cognitive Neuroscience of Vision Part of the series Cognitive Science and Technology, pp 84-104, 2016
    Abstract: Tuning properties of simple cells in cortical V1 can be described in terms of a ?universal shape? characterized quantitatively by parameter values which hold across different species . We show here that these properties are quantitatively predicted by the hypothesis that the goal of the ventral stream is to compute for each image a ?signature? vector which is invariant to geometric transformations.
    @book{Mutch16,
      author  = {Jim Mutch, Fabio Anselmi , Andrea Tacchetti, Lorenzo Rosasco, Joel Z. Leibo, Tomaso Poggio},
      title   = {Invariant Recognition Predicts Tuning of Neurons in Sensory Cortex},
      journal = {Computational and Cognitive Neuroscience of Vision Part of the series Cognitive Science and Technology },
      year    = {2016},
      volume  = {},
      number  = {},
      pages   = {84-104},
      url     ={http://ow.ly/UOiD304YWfb}
    }
  • On invariance and selectivity in representation learning
    Fabio Anselmi, Lorenzo Rosasco and Tomaso Poggio
    Information and Inference, Vol. 5 N 2, pg 134-158, 2016
    Abstract: We discuss data representation which can be learned automatically from data, are invariant to transformations, and at the same time selective, in the sense that two points have the same representation only if they are one the transformation of the other. The mathematical results here sharpen some of the key claims of i-theory -- a recent theory of feedforward processing in sensory cortex.
    @article{anselmi15,
      author  = {Fabio Anselmi, Lorenzo Rosasco and Tomaso Poggio},
      title   = {On invariance and selectivity in representation learning},
      journal = {Information and Inference},
      year    = {2016},
      volume  = {5},
      number  = {2},
      pages   = {134-158},
      url     = {http://imaiai.oxfordjournals.org/content/5/2/134.full.pdf+html?sid=932cd117-4307-409b-9fc0-3acd9be20af0}
    }
  • Optimal Learning for Multi-pass Stochastic Gradient Methods
    Junhong Lin and Lorenzo Rosasco
    Advances in Neural Information Processing Systems (NeurIPS)
    Abstract: We analyze the learning properties of the stochastic gradient method when multiple passes over the data and mini-batches are allowed. In particular, we consider the square loss and show that for a universal step-size choice, the number of passes acts as a regularization parameter, and optimal finite sample bounds can be achieved by early-stopping. Moreover, we show that larger step-sizes are allowed when considering mini-batches. Our analysis is based on a unifying approach, encompassing both batch and stochastic gradient methods as special cases.
    @article{lin2016optimal,
      title={Optimal Learning for Multi-pass Stochastic Gradient Methods},
      author={Junhong Lin and Lorenzo Rosasco},
      journal={Advances in Neural Information Processing Systems (NeurIPS)},
      year={2016},
      url={http://arxiv.org/abs/1605.08882}
    }
  • Generalization Properties and Implicit Regularization for Multiple Passes SGM
    Junhong Lin, Raffaello Camoriano and Lorenzo Rosasco
    Proceedings of The 33rd International Conference on Machine Learning (ICML), 2340-2348, 2016.
    Abstract: We study the generalization properties of stochastic gradient methods for learning with convex loss functions and linearly parameterized functions. We show that, in the absence of penalizations or constraints, the stability and approximation properties of the algorithm can be controlled by tuning either the step-size or the number of passes over the data. In this view, these parameters can be seen to control a form of implicit regularization. Numerical results complement the theoretical findings.
    @article{lin2016generalization,
      author={Junhong Lin and Raffaello Camoriano and Lorenzo Rosasco},
      title={Generalization Properties and Implicit Regularization for Multiple Passes SGM},
      journal={Proceedings of The 33rd International Conference on Machine Learning (ICML)},
      year={2016},
      pages={2340-2348},
      url={http://jmlr.org/proceedings/papers/v48/lina16.html}
    }
  • Iterative Regularization for Learning with Convex Loss Functions
    Junhong Lin, Lorenzo Rosasco and Ding-Xuan Zhou
    Journal of Machine Learning Research, 17(77): 1-38, 2016.
    Abstract: We consider the problem of supervised learning with convex loss functions and propose a new form of iterative regularization based on the subgradient method. Unlike other regularization approaches, in iterative regularization no constraint or penalization is considered, and generalization is achieved by (early) stopping an empirical iteration. We consider a nonparametric setting, in the framework of reproducing kernel Hilbert spaces, and prove consistency and finite sample bounds on the excess risk under general regularity conditions. Our study provides a new class of efficient regularized learning algorithms and gives insights on the interplay between statistics and optimization in machine learning.
    @article{lin2016iterative,
      author  = {Junhong Lin and Lorenzo Rosasco and Ding-Xuan Zhou},
      title   = {Iterative Regularization for Learning with Convex Loss Functions},
      journal = {Journal of Machine Learning Research},
      year    = {2016},
      volume  = {17(77)},
      pages   = {1-38},
      url     = {http://www.jmlr.org/papers/volume17/15-115/15-115.pdf}
    }
  • Regularized Learning Schemes in Feature Banach Spaces
    Patrick L. Combettes, Saverio Salzo, Silvia Villa
    Analysis and Applications 16
    Abstract: This paper proposes a unified framework for the investigation of constrained learning theory in reflexive Banach spaces of features via regularized empirical risk minimization. The focus is placed on Tikhonov-like regularization with totally convex functions. This broad class of regularizers provides a flexible model for various priors on the features, including in particular hard constraints and powers of Banach norms. In such context, the main results establish a new general form of the representer theorem and the consistency of the corresponding learning schemes under general conditions on the loss function, the geometry of the feature space, and the modulus of total convexity of the regularizer. In addition, the proposed analysis gives new insight into basic tools such as reproducing Banach spaces, feature maps, and universality. Even when specialized to Hilbert spaces, this framework yields new results that extend the state of the art.
    @article{comb16,
      author  = {Patrick L. Combettes, Saverio Salzo, Silvia Villa},
      title   = {Regularized Learning Schemes in Feature Banach Spaces},
      journal = {Analysis and Applications (To appear)},
      year    = {2016},
      volume  = {},
      number  = {},
      pages   = {1-46},
      url     = {http://dx.doi.org/10.1142/S0219530516500202}
    }
  • Holographic Embeddings of Knowledge Graphs
    Maximilian Nickel and Lorenzo Rosasco and Tomaso Poggio
    Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016
    Abstract: Learning embeddings of entities and relations is an efficient and versatile method to perform machine learning on relational data such as knowledge graphs. In this work, we propose holographic embeddings (HolE) to learn compositional vector space representations of entire knowledge graphs. The proposed method is related to holographic models of associative memory in that it employs circular correlation to create compositional representations. By using correlation as the compositional operator HolE can capture rich interactions but simultaneously remains efficient to compute, easy to train, and scalable to very large datasets. In extensive experiments we show that holographic embeddings are able to outperform state-of-the-art methods for link prediction in knowledge graphs and relational learning benchmark datasets.
    @article{nickel2016aaai,
      title={Holographic Embeddings of Knowledge Graphs},
      author={Maximilian Nickel and Lorenzo Rosasco and Tomaso Poggio},
      booktitle={Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence},
      year={2016}
    }
  • Stochastic forward-backward splitting for monotone inclusions
    Lorenzo Rosasco and Silvia Villa and Bằng Công Vũ
    Journal of Optimization Theory and Applications
    Abstract: We prove novel convergence results for a stochastic forward-backward splitting algorithm for solving monotone inclusions given by the sum of a maximal monotone operator and a single-valued maximal monotone cocoercive operator.
    @article{rosasco2014b,
      title={Stochastic forward-backward splitting for monotone inclusions},
      author={Lorenzo Rosasco and Silvia Villa and Bằng Công Vũ},
      journal={Journal of Optimization Theory and Applications},
      url={link.springer.com/content/pdf/10.1007%2Fs10957-016-0893-2.pdf},
      year={2016}
    }
  • Incremental Object Recognition in Robotics with Extension to New Classes in Constant Time
    Raffaello Camoriano, Giulia Pasquale, Carlo Ciliberto, Lorenzo Natale, Lorenzo Rosasco and Giorgio Metta
    ArXiv preprint arXiv:1605.05045
    Abstract: We consider object recognition in the context of lifelong learning, where a robotic agent learns to discriminate between a growing number of object classes as it accumulates experience about the environment. We propose an incremental variant of the Regularized Least Squares for Classification (RLSC) algorithm, and exploit its structure to seamlessly add new classes to the learned model. The presented algorithm addresses the problem of having unbalanced proportion of training examples per class, which occurs when new objects are presented to the system for the first time. We evaluate our algorithm on both a machine learning benchmark dataset and a challenging object recognition task in a robotic setting. Empirical evidence on both problems shows that our approach is significantly faster than its batch counterparts while achieving comparable or better classification performance when classes are unbalanced.
    @article{camoriano16incremental,
      author = {Raffaello Camoriano and Giulia Pasquale and Carlo Ciliberto and Lorenzo Natale and Lorenzo Rosasco and Giorgio Metta},
      title = {Incremental Object Recognition in Robotics with Extension to New Classes in Constant Time},
      journal = {ArXiv preprint arXiv:1605.05045},
      year={2016}
    }
  • Generalized support vector regression: duality and tensor-kernel representation
    Saverio Salzo and Johan A.K. Suykens
    ArXiv preprint arXiv:1603.05876
    Abstract: In this paper we study the variational problem associated to support vector regression in Banach function spaces. For that problem, we provide a new computational framework which relies on a tensor-kernel representation. We also present a large class of tensor-kernels to which our theory fully applies: power series tensor kernels.
    @article{salzo16a,
      author  = {Saverio Salzo and Johan A.K. Suykens},
      title   = {Generalized support vector regression: duality and tensor-kernel representation},
      journal = {arXiv},
      year    = {2016},
      volume  = {},
      pages   = {1-30},
      url     = {https://arxiv.org/abs/1603.05876}
    }
  • Generalization Properties of Learning with Random Features
    Alessandro Rudi, Raffaello Camoriano, Lorenzo Rosasco
    ArXiv preprint arXiv:1602.04474
    Abstract: We study the generalization properties of regularized learning with random features in the statistical learning theory framework. We show that optimal learning errors can be achieved with a number of features smaller than the number of examples. As a byproduct, we also show that learning with random features can be seen as a form of regularization, rather than only a way to speed up computations.
    @article{rudi2016generalization,
      title={Generalization Properties of Learning with Random Features},
      author={Rudi, Alessandro and Camoriano, Raffaello and Rosasco, Lorenzo},
      journal={arXiv preprint arXiv:1602.04474},
      year ={2016}
    }
  • Incremental Semiparametric Inverse Dynamics Learning
    Raffaello Camoriano, Silvio Traversaro, Lorenzo Rosasco, Giorgio Metta and Francesco Nori
    Robotics and Automation (ICRA), 2016 IEEE International Conference on
    Abstract: This paper presents a novel approach for incremental semiparametric inverse dynamics learning. In particular, we consider the mixture of two approaches: Parametric modeling based on rigid body dynamics equations and nonparametric modeling based on incremental kernel methods, with no prior information on the mechanical properties of the system. The result is an incremental semiparametric approach, leveraging the advantages of both the parametric and nonparametric models. We validate the proposed technique learning the dynamics of one arm of the iCub humanoid robot.
    @INPROCEEDINGS{camoriano16icra,
      author  = {Raffaello Camoriano and Silvio Traversaro and Lorenzo Rosasco and Giorgio Metta and Francesco Nori},
      title   = {Incremental Semiparametric Inverse Dynamics Learning},
      booktitle={2016 IEEE International Conference on Robotics and Automation (ICRA)}, ,
      year    = {2016}
    }
  • A first-order stochastic primal-dual algorithm with correction step
    Lorenzo Rosasco, Silvia Villa and Bằng Công Vũ
    Numerical Functional Analysis and Optimization
    Abstract: We investigate the convergence properties of a stochastic primal-dual splitting algorithm for solving structured monotone inclusions involving the sum of a cocoercive operator and a composite monotone operator.
    @article{rosasco2016,
      title={A first-order stochastic primal-dual algorithm with correction step},
      author={Lorenzo Rosasco and Silvia Villa and  Bằng Công Vũ},
      journal={arXiv:1602.07872,
      year={2016}
    }
  • Learning to Predict Sequences of Human Visual Fixations
    Ming Jiang, Xavier Boix, Gemma Roig, Juan Xu, Luc Van Gool and Qi Zhao
    Neural Networks and Learning Systems, IEEE Transactions on, vol.PP, no.99, pp.1-12
    Abstract: Most state-of-the-art visual attention models estimate the probability distribution of fixating the eyes in a location of the image, the so-called saliency maps. Yet, these models do not predict the temporal sequence of eye fixations, which may be valuable for better predicting the human eye fixations, as well as for understanding the role of the different cues during visual exploration. In this paper, we present a method for predicting the sequence of human eye fixations, which is learned from the recorded human eye-tracking data. We use least-squares policy iteration (LSPI) to learn a visual exploration policy that mimics the recorded eye-fixation examples. The model uses a different set of parameters for the different stages of visual exploration that capture the importance of the cues during the scanpath. In a series of experiments, we demonstrate the effectiveness of using LSPI for combining multiple cues at different stages of the scanpath. The learned parameters suggest that the low-level and high-level cues (semantics) are similarly important at the first eye fixation of the scanpath, and the contribution of high-level cues keeps increasing during the visual exploration. Results show that our approach obtains the state-of-the-art performances on two challenging data sets: 1) OSIE data set and 2) MIT data set.
    @article{TNNLS2016,
      author={Ming Jiang and Xavier Boix and Gemma Roig and Juan Xu and Luc Van Gool and Qi Zhao},
      journal={Neural Networks and Learning Systems, IEEE Transactions on},
      title={Learning to Predict Sequences of Human Visual Fixations},
      year={2016},
      volume={PP},
      number={99},
      pages={1-12},
      url     = {http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=7374716&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D7374716}
    }
  • NYTRO: When Subsampling Meets Early Stopping
    Raffaello Camoriano, Tomas Angles, Alessandro Rudi, Lorenzo Rosasco
    AISTATS 2016
    Abstract: Early stopping is a well known approach to reduce the time complexity for performing training and model selection of large scale learning machines. On the other hand, memory/space (rather than time) complexity is the main constraint in many applications, and randomized subsampling techniques have been proposed to tackle this issue. In this paper we ask whether early stopping and subsampling ideas can be combined in a fruitful way. We consider the question in a least squares regression setting and propose a form of randomized iterative regularization based on early stopping and subsampling. In this context, we analyze the statistical and computational properties of the proposed method. Theoretical results are complemented and validated by a thorough experimental analysis.
    @inproceedings{camoriano2016nytro,
      title={NYTRO: When Subsampling Meets Early Stopping},
      author={Camoriano, Raffaello and Angles, Tomas and Rudi, Alessandro and Rosasco, Lorenzo},
      booktitle={Proceedings of the 19th International Conference on Artificial Intelligence and Statistics},
      pages={1403--1411},
      year={2016}
    }
  • Enabling Depth-driven Visual Attention on the iCub Humanoid Robot: Instructions for Use and New Perspectives.
    Giulia Pasquale, Tanis Mar, Carlo Ciliberto, Lorenzo Rosasco and Lorenzo Natale
    Frontiers in Robotics and AI, section Humanoid Robotics, Vol. 3 Num. 35, 2016
    Abstract: Reliable depth perception eases and enables a large variety of attentional and interactive behaviors on humanoid robots. However, the use of depth in real scenarios is hindered by the difficulty of computing real-time and robust binocular disparity maps from moving stereo cameras. On the iCub humanoid robot we recently adopted the Efficient Large-scale Stereo (ELAS) Matching algorithm for computation of the disparity map. In this technical report we show that this algorithm allows reliable depth perception and experimental evidence that demonstrates that it can be used to solve challenging visual tasks in real-world, indoor settings. As a case study we consider the common situation where the robot is asked to focus the attention on one object close in the scene, showing how a simple but effective disparity-based segmentation solves the problem in this case. This example paves the way to a variety of other similar applications.
    @article{asquale15disparity,
      author={Pasquale, Giulia  and  Mar, Tanis  and  Ciliberto, Carlo  and  Rosasco, Lorenzo Andrea  and  Natale, Lorenzo},
      title={Enabling Depth-driven Visual Attention on the iCub Humanoid Robot: Instructions for Use and New Perspectives.},
      journal={Frontiers in Robotics and AI},
      volume={3},
      year={2016},
      number={35},
      url={http://www.frontiersin.org/humanoid_robotics/10.3389/frobt.2016.00035/abstract},
      doi={10.3389/frobt.2016.00035},
      issn={2296-9144} ,
      abstract={Reliable depth perception eases and enables a large variety of attentional and interactive behaviors on humanoid robots. However, the use of depth in real scenarios is hindered by the difficulty of computing real-time and robust binocular disparity maps from moving stereo cameras. On the iCub humanoid robot we recently adopted the Efficient Large-scale Stereo (ELAS) Matching algorithm for computation of the disparity map. In this technical report we show that this algorithm allows reliable depth perception and experimental evidence that demonstrates that it can be used to solve challenging visual tasks in real-world, indoor settings. As a case study we consider the common situation where the robot is asked to focus the attention on one object close in the scene, showing how a simple but effective disparity-based segmentation solves the problem in this case. This example paves the way to a variety of other similar applications.}
    }
  • A stochastic inertial forward-backward splitting algorithm for multivariate monotone inclusions
    Lorenzo Rosasco and Silvia Villa and Bằng Công Vũ
    Optimization: A Journal of Mathematical Programming and Operations Research
    Abstract: We propose an inertial forward-backward splitting algorithm to compute the zero of a sum of two monotone operators allowing for stochastic errors in the computation of the operators. More precisely, we establish almost sure convergence in real Hilbert spaces of the sequence of iterates to an optimal solution.
    @article{rosasco2015b,
      title={A stochastic inertial forward-backward splitting algorithm for multivariate monotone inclusions},
      author={Lorenzo Rosasco and Silvia Villa and  Bằng Công Vũ},
      booktitle={Optimization: A Journal of Mathematical Programming and Operations Research},
      year={2016}
    }
  • A Review of Relational Machine Learning for Knowledge Graphs
    Maximilian Nickel
    Proceedings of the IEEE, vol. 104, no. 1, pp. 11-33, Jan. 2016
    Abstract: Relational machine learning studies methods for the statistical analysis of relational, or graph-structured, data. In this paper, we provide a review of how such statistical models can be 'trained' on large knowledge graphs, and then used to predict new facts about the world (which is equivalent to predicting new edges in the graph). In particular, we discuss two fundamentally different kinds of statistical relational models, both of which can scale to massive datasets. The first is based on latent feature models such as tensor factorization and multiway neural networks. The second is based on mining observable patterns in the graph. We also show how to combine these latent and observable models to get improved modeling power at decreased computational cost. Finally, we discuss how such statistical models of graphs can be combined with text-based information extraction methods for automatically constructing knowledge graphs from the Web. To this end, we also discuss Google's Knowledge Vault project as an example of such combination.
    @ARTICLE{nickel2016ieee,
      author={Maximilian Nickel and Kevin Murphy and Volker Tresp and Evgeniy. Gabrilovich},
      journal={Proceedings of the IEEE},
      title={A Review of Relational Machine Learning for Knowledge Graphs},
      year={2016},
      volume={104},
      number={1},
      pages={11-33},
      keywords={data mining;graph theory;learning (artificial intelligence);statistical analysis;text analysis;Google knowledge vault project;automatic knowledge graph construction;computational cost;graph edges;graph-structured data;latent feature models;multiway neural networks;observable pattern mining;relational data;relational machine learning;statistical analysis;statistical relational models;tensor factorization;text-based information extraction methods;Big data;Biological system modeling;Computer graphs;Data mining;Knowledge based systems;Machine learning;Predictive models;Resource description framework;Statistical analysis;Graph-based models;knowledge extraction;knowledge graphs;latent feature models;statistical relational learning},
      doi={10.1109/JPROC.2015.2483592},
      ISSN={0018-9219},
      month={Jan}
    }
  • A machine learning pipeline for multiple sclerosis course detection from clinical scales and patient reported outcomes
    Fiorini S., Verri A., Tacchino A., Ponzio M., Brichetto G., Barla A.
    Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS
    Abstract: In this work we present a machine learning pipeline for the detection of multiple sclerosis course from a collection of inexpensive and non-invasive measures such as clinical scales and patient-reported outcomes. The proposed analysis is conducted on a dataset coming from a clinical study comprising 457 patients affected by multiple sclerosis. The 91 collected variables describe patients mobility, fatigue, cognitive performance, emotional status, bladder continence and quality of life. A preliminary data exploration phase suggests that the group of patients diagnosed as Relapsing-Remitting can be isolated from other clinical courses. Supervised learning algorithms are then applied to perform feature selection and course classification. Our results confirm that clinical scales and patient-reported outcomes can be used to classify Relapsing-Remitting patients.
  • Advances in dynamic modeling of colorectal cancer signalingnetwork regions, a path toward targeted therapies
    Tortolina L., Duffy D.J., Maffei M., Castagnino N., Carmody A.M., Kolch W., Kholodenko B.N., De Ambrosi C., Barla A., Biganzoli E.M., Nencioni A., Patrone F., Ballestrero A., Zoppoli G., Verri A., Parodi S.
    Oncotarget
  • Genome instability model of metastatic neuroblastoma tumorigenesis by a dictionary learning algorithm
    Masecchia S., Coco S., Barla A., Verri A., Tonini G.P.
    BMC Medical Genomics
    Abstract: Metastatic neuroblastoma (NB) occurs in pediatric patients as stage 4S or stage 4 and it is characterized by heterogeneous clinical behavior associated with diverse genotypes. Tumors of stage 4 contain several structural copy number aberrations (CNAs) rarely found in stage 4S. To date, the NB tumorigenesis is not still elucidated, although it is evident that genomic instability plays a critical role in the genesis of the tumor. Here we propose a mathematical approach to decipher genomic data and we provide a new model of NB metastatic tumorigenesis.
  • The Hour of Code has arrived in Genoa! [L'Ora del Codice è arrivata a Genova!]
    Ancona D., Barla A., Catania B., Delzanno G., Guerrini G., Mascardi V., Odone F., Ribaudo M.
    Mondo Digitale
  • Saliency Prediction with Active Semantic Segmentation
    Ming Jiang, Xavier Boix, Juan Xu, Gemma Roig, Luc Van Gool and Qi Zhao
    Proc. of British Machine Vision Conference, 2015.
    Abstract: Semantic-level features have been shown to provide a strong cue for predicting eye fixations. They are usually implemented by evaluating object classifiers everywhere in the image. As a result, extracting the semantic-level features may become a computational bottleneck that may limit the applicability of saliency prediction in real-time applications. In this paper, to reduce the computational cost at the semantic level, we introduce a saliency prediction model based on active semantic segmentation, where a set of new features are extracted during the progressive extraction of the semantic labeling. We recorded eye fixations on all the images of the popular MSRC-21 and VOC07 datasets. Experiments in this new dataset demonstrate that the semantic-level features extracted from active semantic segmentation improve the saliency prediction from low- and regional-level features, and it allows controlling the computational overhead of adding semantics to the saliency predictor.
    @inproceedings{BMVC_15,
      author  = {Ming Jiang and Xavier Boix and Juan Xu and Gemma Roig and Luc Van Gool and Qi Zhao},
      title   = {Saliency Prediction with Active Semantic Segmentation},
      journal = {Proc. of British Machine Vision Conference},
      year    = {2015},
      url     = {http://www.bmva.org/bmvc/2015/papers/paper015/index.html}
    }
  • Foveation-based Mechanisms Alleviate Adversarial Examples
    Yan Luo, Xavier Boix, Gemma Roig, Tomaso Poggio and Qi Zhao
    arXiv:1511.06292, 2015.
    Abstract: We show that adversarial examples, i.e., the visually imperceptible perturbations that result in Convolutional Neural Networks (CNNs) fail, can be alleviated with a mechanism based on foveations---applying the CNN in different image regions. To see this, first, we report results in ImageNet that lead to a revision of the hypothesis that adversarial perturbations are a consequence of CNNs acting as a linear classifier: CNNs act locally linearly to changes in the image regions with objects recognized by the CNN, and in other regions the CNN may act non-linearly. Then, we corroborate that when the neural responses are linear, applying the foveation mechanism to the adversarial example tends to significantly reduce the effect of the perturbation. This is because, hypothetically, the CNNs for ImageNet are robust to changes of scale and translation of the object produced by the foveation, but this property does not generalize to transformations of the perturbation. As a result, the accuracy after a foveation is almost the same as the accuracy of the CNN without the adversarial perturbation, even if the adversarial perturbation is calculated taking into account a foveation.
    @inproceedings{Foveation2015,
      author  = {Yan Luo and Xavier Boix and Gemma Roig and Tomaso Poggio and Qi Zhao},
      title   = {Foveation-based Mechanisms Alleviate Adversarial Examples},
      journal = {arXiv:1511.06292},
      year    = {2015},
      url     = {http://arxiv.org/abs/1511.06292}
    }
  • Modified Fejer sequences and applications
    Junhong Lin, Lorenzo Rosasco, Silvia Villa and Ding-Xuan Zhou
    arXiv:1510.04641
    Abstract: In this note, we propose and study the notion of modified Fejer sequences. Within a Hilbert space setting, we show that it provides a unifying framework to prove convergence rates for objective function values of several optimization algorithms. In particular, our results apply to forward-backward splitting algorithm, incremental subgradient proximal algorithm, and the Douglas-Rachford splitting method including and generalizing known results.
    @techreport{lin2015,
      title={Modified Fejer sequences and applications},
      author={Lin, Junhong and Rosasco, Lorenzo and Villa, Silvia and Zhou, Ding-Xuan},
      booktitle={arXiv preprint arXiv:1510.04641},
      year={2015}
    }
  • Deep Convolutional Networks are Hierarchical Kernel Machines
    Anselmi, Fabio and Rosasco, Lorenzo and Tan, Cheston and Poggio, Tomaso
    arXiv:1508.01084
    Abstract: In i-theory a typical layer of a hierarchical architecture consists of HW modules pooling the dot products of the inputs to the layer with the transformations of a few templates under a group. Such layers include as special cases the convolutional layers of Deep Convolutional Networks (DCNs) as well as the non-convolutional layers (when the group contains only the identity). Rectifying nonlinearities -- which are used by present-day DCNs -- are one of the several nonlinearities admitted by i-theory for the HW module. We discuss here the equivalence between group averages of linear combinations of rectifying nonlinearities and an associated kernel. This property implies that present-day DCNs can be exactly equivalent to a hierarchy of kernel machines with pooling and non-pooling layers. Finally, we describe a conjecture for theoretically understanding hierarchies of such modules. A main consequence of the conjecture is that hierarchies of trained HW modules minimize memory requirements while computing a selective and invariant representation.
    @inproceedings{anselmi2015deep,
      title={Deep Convolutional Networks are Hierarchical Kernel Machines},
      author={Anselmi, Fabio and Rosasco, Lorenzo and Tan, Cheston and Poggio, Tomaso},
      booktitle={arXiv preprint arXiv:1508.01084},
      year={2015}
    }
  • Multiscale Geometric Methods for Data Sets I: Multiscale SVD, Noise and Curvature
    Anna V. Little, Mauro Maggioni and Lorenzo Rosasco
    Applied and Computational Harmonic Analysis
    Abstract: Large data sets are often modeled as being noisy samples from probability distributions ?? in R D, with D large. It has been noticed that oftentimes the supportMof these probability distributions seems to be well-approximated by low-dimensional sets, perhaps even by manifolds. We shall consider sets that are locally well approximated by k-dimensional planes, with k ??? D, with k-dimensional manifolds isometrically embedded in R D being a special case. Samples from ?? are furthermore corrupted by D-dimensional noise. Certain tools from multiscale geometric measure theory and harmonic analysis seem well-suited to be adapted to the study of samples from such probability distributions, in order to yield quantitative geometric information about them. In this paper we introduce and study multiscale covariance matrices, i.e. covariances corresponding to the distribution restricted to a ball of radius r, with a fixed center and varying r, and under rather general geometric assumptions we study how their empirical, noisy counterparts behave. We prove that in the range of scales where these covariance matrices are most informative, the empirical, noisy covariances are close to their expected, noiseless counterparts. In fact, this is true as soon as the number of samples in the balls where the covariance matrices are computed is linear in the intrinsic dimension of M. As an application, we present an algorithm for estimating the intrinsic dimension of M.
    @inproceedings{little2015,
      title={Multiscale Geometric Methods for Data Sets I: Multiscale SVD, Noise and Curvature},
      author={Little, Anna V. and Maggioni, Mauro and Rosasco, Lorenzo},
      booktitle={Applied and Computational Harmonic Analysis},
      year={2015}
    }
  • Less is More: Nystr"om Computational Regularization
    Alessandro Rudi, Raffaello Camoriano, Lorenzo Rosasco
    NeurIPS 2015 (Oral)
    Abstract: We study Nystr"om type subsampling approaches to large scale kernel methods, and prove learning bounds in the statistical learning setting, where random sampling and high probability estimates are considered. In particular, we prove that these approaches can achieve optimal learning rates, provided the subsampling level is suitably chosen. These results suggest a simple incremental variant of Nystr"om Kernel Regularized Least Squares, where the subsampling level implements a form of computational regularization, in the sense that it controls at the same time regularization and computations. Extensive experimental analysis shows that the considered approach achieves state of the art performances on benchmark large scale datasets.
    @inproceedings{rudi2015less,
      title={Less is More: Nystr"om Computational Regularization},
      author={Rudi, Alessandro and Camoriano, Raffaello and Rosasco, Lorenzo},
      booktitle={Advances in Neural Information Processing Systems}, pages={1648--1656},
      year={2015}
    }
  • Learning with incremental iterative regularization
    Lorenzo Rosasco and Silvia Villa
    NeurIPS 2015
    Abstract: Within a statistical learning setting, we propose and study an iterative regularization algorithm for least squares defined by an incremental gradient method. In particular, we show that, if all other parameters are fixed a priori, the number of passes over the data (epochs) acts as a regularization parameter, and prove strong universal consistency, i.e. almost sure convergence of the risk, as well as sharp finite sample bounds for the iterates.
    @incollection{rosasco2015c,
      title={Learning with incremental iterative regularization},
      author={Lorenzo Rosasco and Silvia Villa},
      booktitle = {Advances in Neural Information Processing Systems (NeurIPS) 28},
      year={2015}
    }
  • Stochastic inertial primal-dual algorithms
    Lorenzo Rosasco, Silvia Villa and Bằng Công Vũ
    arXiv:1507.00852
    Abstract: We propose and study a novel stochastic inertial primal-dual approach to solve composite optimization problems. These latter problems arise naturally when learning with penalized regularization schemes. Our analysis provide convergence results in a general setting, that allows to analyze in a unified framework a variety of special cases of interest. Key in our analysis is considering the framework of splitting algorithm for solving a monotone inclusions in suitable product spaces and for a specific choice of preconditioning operators.
    @inproceedings{rosasco2015,
      title={ Stochastic inertial primal-dual algorithms},
      author={Rosasco, Lorenzo and Villa, Silvia and Vu, Bang Cong},
      booktitle={arXiv:1507.00852},
      year={2015}
    }
  • Discovering Discrete Subword Units with Binarized Autoencoders and Hidden-Markov-Model Encoders
    L. Badino, A. Mereta and L. Rosasco
    INTERSPEECH 2015 - 16th Annual Conf. of the International Speech Communication Association.
    Abstract: In this paper we address the problem of unsupervised learning of discrete subword units. Our approach is based on Deep Autoencoders (AEs), whose encoding node values are thresholded to subsequently generate a symbolic, i.e., 1-of-K (with K = No. of subwords), representation of each speech frame. We experiment with two variants of the standard AE which we have named Binarized Autoencoder and Hidden-Markov-Model Encoder. The first forces the binary encoding nodes to have a U-shaped distribution (with peaks at 0 and 1) while minimizing the reconstruction error. The latter jointly learns the symbolic encoding representation (i.e., subwords) and the prior and transition distribution probabilities of the learned subwords. The ABX evaluation of the Zero Resource Challenge - Track 1 shows that a deep AE with only 6 encoding nodes, which assigns to each frame a 1-of-K binary vector with K = 2^6, can outperform real-valued MFCC representations in the across-speaker setting. Binarized AEs can outperform standard AEs when using a larger number of encoding nodes, while HMM Encoders may allow more compact subword transcriptions without worsening the ABX performance.
    @inproceedings{badino2015,
      title={Discovering Discrete Subword Units with Binarized Autoencoders and Hidden-Markov-Model Encoders},
      author={Badino, L. and Mereta, A. and Rosasco, L.},
      booktitle={INTERSPEECH 2015 - 16th Annual Conf. of the International Speech Communication Association},
      address={Dresden, Germany},
      year={2015}
    }
  • Almost sure convergence of the forward-backward-forward splitting algorithm
    Bằng Công Vũ
    Optimization Letters 10
    Abstract: In this paper, we propose a stochastic forward-backward-forward splitting algorithm and prove its almost sure weak convergence in real separable Hilbert spaces. Applications to composite monotone inclusion and minimization problems are demonstrated.
    @article{vu2015,
      title={Almost sure convergence of the forward-backward-forward splitting algorithm},
      author={B. C. Vu},
      journal={Optimization Letters},
      year={2015}
    }
  • Teaching iCub to recognize objects using deep Convolutional Neural Networks
    Giulia Pasquale, Carlo Ciliberto, Francesca Odone, Lorenzo Rosasco and Lorenzo Natale
    Proceedings of the 4th Workshop on Machine Learning for Interactive Systems, 32nd International Conference on Machine Learning
    Abstract: Providing robots with accurate and robust visual recognition capabilities in the real-world today is a challenge which prevents the use of autonomous agents for concrete applications. Indeed, the majority of tasks, as manipulation and interaction with other agents, critically depends on the ability to visually recognize the entities involved in a scene. At the same time, computer vision systems based on deep Convolutional Neural Networks (CNNs) are marking a breakthrough in fields as largescale image classification and retrieval. In this work we investigate how latest results on deep learning can advance the visual recognition capabilities of a robotic platform (the iCub humanoid robot) in a real-world scenario. We benchmark the performance of the resulting system on a new dataset of images depicting 28 objects, named iCubWorld28, that we plan on releasing. As in the spirit of the iCubWorld dataset series, this has been collected in a framework reflecting the typical iCub???s daily visual experience. Moreover, in this release we provide four different acquisition sessions, to test incremental learning capabilities over multiple days. Our study addresses the question: how many objects can the iCub recognize today?
    @inproceedings{pasquale15,
      author  = {Giulia Pasquale and Carlo Ciliberto and Francesca Odone and Lorenzo Rosasco and Lorenzo Natale},
      title   = {Teaching iCub to recognize objects using deep Convolutional Neural Networks},
      journal = {Proceedings of the 4th Workshop on Machine Learning for Interactive Systems, 32nd International Conference on Machine Learning},
      year    = {2015},
      volume  = {43},
      pages   = {21--25},
      url     = {http://www.jmlr.org/proceedings/papers/v43/pasquale15}
    }
  • Real-world Object Recognition with Off-the-shelf Deep Conv Nets: How Many Objects can iCub Learn?
    Giulia Pasquale, Carlo Ciliberto, Francesca Odone, Lorenzo Rosasco and Lorenzo Natale
    ArXiv preprint arXiv:1504.03154
    Abstract: The ability to visually recognize objects is a fundamental skill for robotics systems. Indeed, a large variety of tasks involving manipulation, navigation or interaction with other agents, deeply depends on the accurate understanding of the visual scene. Yet, at the time being, robots are lacking good visual perceptual systems, which often become the main bottleneck preventing the use of autonomous agents for real-world applications. Lately in computer vision, systems that learn suitable visual representations and based on multi-layer deep convolutional networks are showing remarkable performance in tasks such as large-scale visual recognition and image retrieval. To this regard, it is natural to ask whether such remarkable performance would generalize also to the robotic setting. In this paper we investigate such possibility, while taking further steps in developing a computational vision system to be embedded on a robotic platform, the iCub humanoid robot. In particular, we release a new dataset that we use as a benchmark to address the question: how many objects can iCub recognize? Our study is developed in a learning framework which reflects the typical visual experience of a humanoid robot like the iCub. Experiments shed interesting insights on the strength and weaknesses of current computer vision approaches applied in real robotic settings.
    @article{pasquale2015icub,
      title={Real-world Object Recognition with Off-the-shelf Deep Conv Nets: How Many Objects can iCub Learn?},
      author={Pasquale, Giulia and Ciliberto, Carlo and Odone, Francesca and Rosasco, Lorenzo and Natale, Lorenzo},
      journal={arXiv preprint arXiv:1504.03154},
      year={2015}
    }
  • Invariant representations for action recognition in the visual system.
    Andrea Tacchetti, Leyla Isik and Tomaso Poggio
    Journal of Vision, Vol. 15 Num. 12 558-558, 2015.
    Abstract: The human brain can rapidly parse a constant stream of visual input. The majority of visual neuroscience studies, however, focus on responses to static, still-frame images. Here we use magnetoencephalography (MEG) decoding and a computational model to study invariant action recognition in videos. We created a well-controlled, naturalistic dataset to study action recognition across different views and actors. We find that, like objects, actions can also be read out from MEG data in under 200 ms (after the subject has viewed only 5 frames of video). Action can also be decoded across actor and viewpoint, showing that this early representation is invariant. Finally, we developed an extension of the HMAX model, inspired by Hubel and Wiesel???s findings of simple and complex cells in primary visual cortex as well as a recent computational theory of the feedforward invariant systems, which is traditionally used to perform size- and position-invariant object recognition in images, to recognize actions. We show that instantiations of this model class can also perform recognition in natural videos that are robust to non-affine transformations. Specifically, view-invariant action recognition and action invariant actor identification in the model can be achieved by pooling across views or actions, in the same manner and model layer as affine transformations (size and position) in traditional HMAX. Together these results provide a temporal map of the first few hundred milliseconds of human action recognition as well as a mechanistic explanation of the computations underlying invariant visual recognition. Meeting abstract presented at VSS 2015
    @article{tacchetti2015invariant,
      title={Invariant representations for action recognition in the visual system.},
      author={Tacchetti, Andrea and Isik, Leyla and Poggio, Tomaso},
      journal={Journal of vision},
      volume={15},
      number={12},
      pages={558--558},
      year={2015}
    }
  • Unsupervised learning of invariant representations
    Fabio Anselmi, Joel Leibo, Lorenzo Rosasco, Jim Mutch, Andrea Tacchetti and Tomaso Poggio
    Theoretical Computer Science. Vol. 63 112-121, 2016.
    Abstract: The present phase of Machine Learning is characterized by supervised learning algorithms relying on large sets of labeled examples (n??????). The next phase is likely to focus on algorithms capable of learning from very few labeled examples (n???1), like humans seem able to do. We propose an approach to this problem and describe the underlying theory, based on the unsupervised, automatic learning of a ???good??? representation for supervised learning, characterized by small sample complexity. We consider the case of visual object recognition, though the theory also applies to other domains like speech. The starting point is the conjecture, proved in specific cases, that image representations which are invariant to translation, scaling and other transformations can considerably reduce the sample complexity of learning. We prove that an invariant and selective signature can be computed for each image or image patch: the invariance can be exact in the case of group transformations and approximate under non-group transformations. A module performing filtering and pooling, like the simple and complex cells described by Hubel and Wiesel, can compute such signature. The theory offers novel unsupervised learning algorithms for ???deep??? architectures for image and speech recognition. We conjecture that the main computational goal of the ventral stream of visual cortex is to provide a hierarchical representation of new objects/images which is invariant to transformations, stable, and selective for recognition???and show how this representation may be continuously learned in an unsupervised way during development and visual experience.
    @article{anselmi2015unsupervised,
      title={Unsupervised learning of invariant representations},
      author={Anselmi, Fabio and Leibo, Joel Z and Rosasco, Lorenzo and Mutch, Jim and Tacchetti, Andrea and Poggio, Tomaso},
      journal={Theoretical Computer Science},
      year={2015},
      publisher={Elsevier}
    }
  • Convex Learning of Multiple Tasks and their Structure
    Carlo Ciliberto, Youssef Mroueh, Tomaso Poggio and Lorenzo Rosasco
    International Conference on Machine Learning (ICML) 2015, preprint arxiv:1504.03101
    Abstract: Reducing the amount of human supervision is a key problem in machine learning and a natural approach is that of exploiting the relations (structure) among different tasks. This is the idea at the core of multi-task learning. In this context a fundamental question is how to incorporate the tasks structure in the learning problem.We tackle this question by studying a general computational framework that allows to encode a-priori knowledge of the tasks structure in the form of a convex penalty; in this setting a variety of previously proposed methods can be recovered as special cases, including linear and non-linear approaches. Within this framework, we show that tasks and their structure can be efficiently learned considering a convex optimization problem that can be approached by means of block coordinate methods such as alternating minimization and for which we prove convergence to the global minimum.
    @article{ciliberto2015multitask,
      title={Convex Learning of Multiple Tasks and their Structure},
      author={Ciliberto, Carlo and Mroueh, Youssef and Poggio, Tomaso and Rosasco, Lorenzo},
      journal={arXiv preprint arXiv:1504.03101},
      year={2015}
    }
  • Learning Multiple Visual Tasks while Discovering their Structure
    Carlo Ciliberto and Lorenzo Rosasco and Silvia Villa
    IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) 2015, preprint arXiv:1504.03106v1
    Abstract: Multi-task learning is a natural approach for computer vision applications that require the simultaneous solution of several distinct but related problems, e.g. object detection, classification, tracking of multiple agents, or denoising, to name a few. The key idea is that exploring task relatedness (structure) can lead to improved performances. In this paper, we propose and study a novel sparse, non-parametric approach exploiting the theory of Reproducing Kernel Hilbert Spaces for vector-valued functions. We develop a suitable regularization framework which can be formulated as a convex optimization problem, and is provably solvable using an alternating minimization approach. Empirical tests show that the proposed method compares favorably to state of the art techniques and further allows to recover interpretable structures, a problem of interest in its own right.
    @inproceedings{ciliberto2015multivistask,
      title={Learning Multiple Visual Tasks while Discovering their Structure},
      author={Ciliberto, Carlo and Rosasco, Lorenzo and Villa, Silvia},
      booktitle={IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) 2015},
      year={2015}
    }
  • Discriminative Template Learning in Group-Convolutional Networks for Invariant Speech Representations
    Chiyuan Zhang, Stephen Voinea, Georgios Evangelopoulos, Lorenzo Rosasco and Tomaso Poggio
    INTERSPEECH 2015 - 16th Annual Conf. of the International Speech Communication Association.
    Abstract: In the framework of a theory for invariant sensory signal representations, a signature which is invariant and selective for speech sounds can be obtained through projections in template signals and pooling over their transformations under a group. For locally compact groups, e.g., translations, the theory explains the resilience of convolutional neural networks with filter weight sharing and max pooling across their local translations in frequency or time. In this paper we propose a discriminative approach for learning an optimum set of templates, under a family of transformations, namely frequency transpositions and perturbations of the vocal tract length, which are among the primary sources of speech variability. Implicitly, we generalize convolutional networks to transformations other than translations, and derive data-specific templates by training a deep network with convolution-pooling layers and densely connected layers. We demonstrate that such a representation, combining group-generalized convolutions, theoretical invariance guarantees and discriminative template selection, improves frame classification performance over standard translation-CNNs and DNNs on TIMIT and Wall Street Journal datasets.
    @inproceedings{zhang2015,
      author    = {Zhang, Chiyuan and Voinea, Stephen and Evangelopoulos, Georgios and Rosasco, Lorenzo and Poggio, Tomaso},
      title     = {Discriminative Template Learning in Group-Convolutional Networks for Invariant Speech Representations},
      booktitle = {INTERSPEECH 2015 - 16th Annual Conf. of the International Speech Communication Association},
      year      = {2015},
      pages     = {3229-3233},
      address   = {Dresden, Germany},
      url       = {http://www.isca-speech.org/archive/interspeech_2015/i15_3229.html}
    }
  • A machine learning pipeline for identification of discriminant pathways
    Barla A., Jurman G., Visintainer R., Squillario M., Filosi M., Riccadonna S., Furlanello C.
    Springer Handbook of Bio-/Neuroinformatics
    Abstract: Identifying the molecular pathways more prone to disruption during a pathological process is a key task in network medicine and, more generally, in systems biology. This chapter describes a pipeline that couples a machine learning solution for molecular profiling with a recent network comparison method. The pipeline can identify changes occurring between specific sub-modules of networks built in a case-control biomarker study, discriminating key groups of genes whose interactions are modified by an underlying condition. Different algorithms can be chosen to implement the workflow steps. Three applications on genome-wide data are presented regarding the susceptibility of children to air pollution, and early and late onset of Parkinsonʼs and Alzheimerʼs diseases.
  • Alternating proximal regularized dictionary learning
    Salzo S., Masecchia S., Verri A., Barla A.
    Neural Computation
    Abstract: We present an algorithm for dictionary learning that is based on the alternating proximal algorithm studied by Attouch, Bolte, Redont, and Soubeyran (2010), coupled with a reliable and efficient dual algorithm for computation of the related proximity operators. This algorithm is suitable for a general dictionary learning model composed of a Bregman-type data fit term that accounts for the goodness of the representation and several convex penalization terms on the coefficients and atoms, explaining the prior knowledge at hand. As Attouch et al. recently proved, an alternating proximal scheme ensures better convergence properties than the simpler alternating minimization. We take care of the issue of inexactness in the computation of the involved proximity operators, giving a sound stopping criterion for the dual inner algorithm, which keeps under control the related errors, unavoidable for such a complex penalty terms, providing ultimately an overall effective procedure. Thanks to the generality of the proposed framework, we give an application in the context of genome-wide data understanding, revising the model proposed by Nowak, Hastie, Pollack, and Tibshirani (2011). The aim is to extract latent features (atoms) and perform segmentation on array-based comparative genomic hybridization (aCGH) data. We improve several important aspects that increase the quality and interpretability of the results. We show the effectiveness of the proposed model with two experiments on synthetic data, which highlight the enhancements over the original model.
  • Identification of pathway signatures in Parkinson's disease with gene ontology and sparse regularization
    Squillario M., Zycinski G., Barla A., Verri A.
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Abstract: The purpose of this work is to compare Knowledge Driven Variable Selection (KDVS), a novel method for biomarkers, processes and functions identification with the most frequently used pipeline in the analysis of high–throughput data (Standard pipeline). While in the Standard pipeline the biological knowledge is used after the variable selection and classification phase, in KDVS it is used a priori to structure the data matrix. We analyze the same gene expression dataset using ℓ1ℓ2FS , a regularization method for variable selection and classification, choosing Gene Ontology (GO) as source of biological knowledge. We compare the lists identified by the pipelines with state–of–the–art benchmark lists of genes and GO terms known to be related with Parkinson’s disease (PD). The results indicate that KDVS performs significantly better than the Standard pipeline.
  • Quantitative echography in primary uveal melanoma treated by proton beam therapy
    Mosci C., Lanza F.B., Mosci S., Barla A.
    Canadian Journal of Ophthalmology
    Abstract: Objective To describe the dynamics of thickness and internal reflectivity after proton beam therapy (PBT) in uveal melanoma. Participants One hundred and ninety-eight consecutive patients with choroidal or ciliary body melanoma treated by PBT were retrospectively considered. Methods The post-PBT follow-up included ophthalmologic examination, retinography, and B and A modes of standardized echography every 6 months. A total of 1393 examinations were performed. We take into account 4 tumour categories according to the seventh TNM classification. Results Before PBT, tumour thickness ranged from 1.5 to 12.5 mm with a mean of 5.9 mm. Its decrease after radiotherapy was best fitted by the sum of a first-order exponential decay and a constant with a decay half-life of 15 months. Based on the fit, tumour thickness stabilized on a constant value representing, on average, 47% of the initial value. Mean internal reflectivity before PBT was 68%. The dynamics of the reflectivity were best fitted by an exponential and a constant, with rise half-life of 11 months, and stability value of 87%. Conclusions We found that ultrasonographic dynamics of uveal melanoma treated by PBT resembles a function composed of the sum of a constant and a first-order exponential, as previously noted in studies on brachytherapy. Interestingly, after PBT, because of its shorter half-life, internal reflectivity has a faster dynamic response than thickness in large tumours, suggesting that increase of internal reflectivity is a more sensitive indicator of early response to therapy in larger tumours. Résumé Objet Description de la dynamique de l’épaisseur et de la réflectivité interne après la thérapie du faisceau de proton (TFP) du mélanome uvéal. Nature Étude de cohorte rétrospective. Participants Cent quatre-vingt-dix-huit patients consécutifs atteints d’un mélanome choroïdien ou du corps ciliaire et ayant reçu une TFP ont été considérés. Méthodes Après la TFP, le suivi a compris l’examen ophtalmologique, la rétinographie et les modes A et B d’échographie standardisée aux 6 mois. En tout, 1 393 examens ont été effectués. Nous avons tenu compte de quatre catégories de tumeur selon la septième classification TNM. Résultats Avant la TFP, l’épaisseur des tumeurs variait entre 1,5 et 12,5 mm, avec une moyenne de 5,9 mm. Sa réduction après la radiothérapie a eu la meilleure concordance avec la somme d’une décroissance exponentielle de premier ordre et d’une constante ayant une demi-vie de décroissance de 15 mois. Selon les critères, l’épaisseur de la tumeur s’est stabilisée sur une valeur constante représentant en moyenne 47 % de la valeur initiale. La moyenne de réflectivité interne avant la TFP était de 68 %. La dynamique de la réflectivité concordait le mieux à une valeur exponentielle et constante, avec une hausse de demi-vie de 11 mois et une stabilité de 87 %. Conclusions Nous avons trouvé que la dynamique ultrasonographique du mélanome uvéal traité par la TFP ressemble à une fonction composée de la somme d’une constante et d’un exponentiel de premier ordre, tel que mentionné précédemment dans les études sur la brachythérapie. Fait intéressant, après la TFP, étant donné sa demi-vie plus courte, la réflectivité interne a une réponse dynamique plus rapide que l’épaisseur d’une grande tumeur, ce qui suggère que l’accroissement de la réflectivité interne est un indicateur plus sensible de la rapidité de la réaction à la thérapie des plus grandes tumeurs.
  • Regenerative medicine for the treatment of teno-desmic injuries of the equine. A series of 150 horses treated with platelet-derived growth factors
    Scala M., Lenarduzzi S., Spagnolo F., Trapasso M., Ottonello C., Muraglia A., Barla A., Strada P.
    In Vivo
    Abstract: BACKGROUND/AIM: The aim of the present study was to evaluate the safety and the clinical outcome of platelet-rich plasma for the treatment of teno-desmic injures in competition horses. PATIENTS AND METHODS: From January 2009 to December 2011, 150 sport horses suffering from teno-desmic injuries were treated with no-gelled platelet-concentrate. RESULTS: No horse showed any major adverse reaction as a result of the procedure. Full healing was obtained for 81% of the horses. Twelve percent had clinical improvement and only 7% a failure. Eight percent of cases of relapse were observed. No statistically significant correlation existed between clinical outcome and the area of the lesion. A statistically significant correlation existed between the clinical outcome and the age of the horse. CONCLUSION: Treatment with platelet-derived growth factors leads to the formation of a tendon with normal morphology and functionality, which translate in the resumption of the agonistic activity for the horses we treated.
  • The use of platelet-rich plasma gel in patients with mixed tumour undergoing superficial parotidectomy: A randomized study
    Scala M., Mereu P., Spagnolo F., Massa M., Barla A., Mosci S., Forno G., Ingenito A., Strada P.
    In Vivo
    Abstract: BACKGROUND/AIM: Salivary gland tumors are mostly benign tumors. Whether a more conservative surgical approach at greater risk of recurrence, or a more radical intervention with an increased risk of facial paralysis is warranted is still under discussion. Our study addresses the opportunity for improving surgical outcome by employing platelet-rich plasma (PRP) gel at the surgical site. PATIENTS AND METHODS: Twenty consecutive patients undergoing superficial parotidectomy were randomized and assigned to two groups, one with and one without PRP gel. Many parameters were evaluated after surgery and during follow-up, such as the duration of hospitalization, facial nerve deficit, onset of Frey's syndrome, relapse, cosmetic results, presence of keloid or scar depressions, behavior of several facial muscles. RESULTS: Our explorative analysis suggests a positive effect of PRP on surgical outcome in patients undergoing parotidectomy, whereas no negative effects were detected. CONCLUSION: This work suggests that administration of PRP in patients undergoing parotidectomy is beneficial.
  • Alternating Proximal Regularized Dictionary Learning
    Saverio Salzo and Salvatore Masecchia and Alessandro Verri and Annalisa Barla
    Neural Computation, 2014.
    @article{salzo2014a,
      author  = {Saverio Salzo and Salvatore Masecchia and Alessandro Verri and Annalisa Barla},
      title   = {Alternating Proximal Regularized Dictionary Learning},
      journal = {Neural Computation},
      year    = {2014},
      volume  = {26},
      number = {12},
      pages   = {2855-2895},
      url     = {http://www.mitpressjournals.org/doi/10.1162/NECO_a_00672}
    }
  • Regularized Dictionary Learning
    Annalisa Barla and Saverio Salzo and Alessandro Verri
    In Regularization, Optimization, Kernels, and Support Vector Machines. CRC Press, 2014.
    @incollection{salzo2014b,
      author  = {Annalisa Barla and Saverio Salzo and Alessandro Verri},
      booktitle={{R}egularization, {O}ptimization, {K}ernels, and {S}upport {V}ector {M}achines},
      title   = {Regularized Dictionary Learning},
      series = {Chapman  Hall/CRC Machine Learning Pattern Recognition},
      publisher = {CRC Press},
      year   = {2014},
      url     = {}
    }
  • Self-Adaptable Templates for Feature Coding
    Xavier Boix, Gemma Roig, Salomon Diether and Luc V. Gool
    NeurIPS 2014 - Advances in Neural Information Processing Systems
    Abstract: Hierarchical feed-forward networks have been successfully applied in object recognition. At each level of the hierarchy, features are extracted and encoded, followed by a pooling step. Within this processing pipeline, the common trend is to learn the feature coding templates, often referred as codebook entries, filters, or over-complete basis. Recently, an approach that apparently does not use templates has been shown to obtain very promising results. This is the second-order pooling (O2P). In this paper, we analyze O2P as a coding-pooling scheme. We find that at testing phase, O2P automatically adapts the feature coding templates to the input features, rather than using templates learned during the training phase. From this finding, we are able to bring common concepts of coding-pooling schemes to O2P, such as feature quantization. This allows for significant accuracy improvements of O2P in standard benchmarks of image classification, namely Caltech101 and VOC07.
    @inproceedings{boix2014self,
      title={Self-Adaptable Templates for Feature Coding},
      author={Boix, Xavier and Roig, Gemma and Diether, Salomon and Gool, Luc V},
      booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
      pages={864--872},
      year={2014}
    }
  • Consistent Learning by Composite Proximal Thresholding
    Patrick L. Combettes, Saverio Salzo, Silvia Villa
    arxiv 1504.04636v1
    Abstract: We investigate random design least-squares regression with prediction functions which are linear combination of elements of a possibly infinite-dimensional dictionary. We propose a new flexible composite regularization model, which makes it possible to apply various priors to the coefficients of the prediction function, including hard constraints. We show that the estimators obtained by minimizing the regularized empirical risk are consistent. Moreover, we design an error-tolerant composite proximal thresholding algorithm for computing the estimators. The convergence of this algorithm is established.
    @article{combettes2014consistency,
      title={Consistent Learning by Composite Proximal Thresholding},
      author={Combettes, Patrick L and Salzo, Saverio and Villa, Silvia},
      journal={arXiv preprint arXiv:1504.04636},
      year={2014}
    }
  • Learning with stochastic proximal gradient
    Silvia Villa, Lorenzo Rosasco and Bằng Công Vũ
    NeurIPS 2014 - Advances in Neural Information Processing Systems
    Abstract: We consider composite function minimization problems where only a stochastic estimate of the gradient of the smooth term is available, and in particular regularized online learning. We derive novel finite sample bounds for the natural extension of the classical proximal gradient algorithm. Our approach allows to avoid averaging, a feature which is critical when considering sparsity based methods. Moreover, our results match those obtained by the stochastic extension of accelerated methods, hence suggesting that there is no advantage considering these variants in a stochastic setting.
    @inproceedings{villa2014,
      title={ Learning with stochastic proximal gradient},
      author={Villa, Silvia and Rosasco, Lorenzo and Vu, Bang Cong},
      booktitle={Workshop on Optimization for for Machine Learning. At the conference on Advances in Neural Information Processing Systems (NeurIPS)},
      year={2014}
    }
  • Word-Level Invariant Representations from Acoustic Waveforms
    Stephen Voinea, Chiyuan Zhang, Georgios Evangelopoulos, Lorenzo Rosasco and Tomaso Poggio
    INTERSPEECH 2014 - 15th Annual Conf. of the International Speech Communication Association (student paper award).
    Abstract: Extracting discriminant, transformation-invariant features from raw audio signals remains a serious challenge for speech recognition. The issue of speaker variability is central to this problem, as changes in accent, dialect, gender, and age alter the sound waveform of speech units at multiple levels (phonemes, words, or phrases). Approaches for dealing with this variability have typically focused on analyzing the spectral properties of speech at the level of frames, on par with frame-level acoustic modeling usually applied to speech recognition systems. In this paper, we propose a framework for representing speech at the word level and extracting features from the acoustic, temporal domain, without the need for spectral encoding or preprocessing. Leveraging recent work on unsupervised learning of invariant sensory representations, we extract a signature for a word by first projecting its raw waveform onto a set of templates and their transformations, and then forming empirical estimates of the resulting one-dimensional distributions via histograms. The representation and relevant parameters are evaluated for word classification on a series of datasets with increasing speaker-mismatch difficulty, and the results are compared to those of an MFCC-based representation.
    @inproceedings{voineazhang2014,
      author    = {Voinea, Stephen and Zhang, Chiyuan and Evangelopoulos, Georgios and Rosasco, Lorenzo and Poggio, Tomaso},
      title     = {Word-Level Invariant Representations from Acoustic Waveforms},
      booktitle = {INTERSPEECH 2014 - 15th Annual Conf. of the International Speech Communication Association},
      year      = {2014},
      pages     = {2385-2389},
      address   = {Singapore},
      url       = {http://www.isca-speech.org/archive/interspeech_2014/i14_2385.html}
    }
  • Phone Classification by a Hierarchy of Invariant Representation Layers
    Chiyuan Zhang, Stephen Voinea, Georgios Evangelopoulos, Lorenzo Rosasco and Tomaso Poggio
    INTERSPEECH 2014 - 15th Annual Conf. of the International Speech Communication Association.
    Abstract: We propose a multi-layer feature extraction framework for speech, capable of providing invariant representations. A set of templates is generated by sampling the result of applying smooth, identity-preserving transformations (such as vocal tract length and tempo variations) to arbitrarily-selected speech signals. Templates are then stored as the weights of ???neurons???. We use a cascade of such computational modules to factor out different types of transformation variability in a hierarchy, and show that it improves phone classification over baseline features. In addition, we describe empirical comparisons of a) different transformations which may be responsible for the variability in speech signals and of b) different ways of assembling template sets for training. The proposed layered system is an effort towards explaining the performance of recent deep learning networks and the principles by which the human auditory cortex might reduce the sample complexity of learning in speech recognition. Our theory and experiments suggest that invariant representations are crucial in learning from complex, real-world data like natural speech. Our model is built on basic computational primitives of cortical neurons, thus making an argument about how representations might be learned in the human auditory cortex.
    @inproceedings{zhangvoinea2014,
      author    = {Zhang, Chiyuan and Voinea, Stephen and Evangelopoulos, Georgios and Rosasco, Lorenzo and Poggio, Tomaso},
      title     = {Phone Classification by a Hierarchy of Invariant Representation Layers},
      booktitle = {INTERSPEECH 2014 - 15th Annual Conf. of the International Speech Communication Association},
      year      = {2014},
      pages     = {2346-2350},
      address   = {Singapore},
      url       = {http://www.isca-speech.org/archive/interspeech_2014/i14_2346.html}
    }
  • Exploiting Global Force Torque Measurements for Local Compliance Estimation in Tactile Arrays
    Carlo Ciliberto, Luca Fiorio, Marco Maggiali, Lorenzo Natale, Lorenzo Rosasco, Giorgio Metta, Giulio Sandini and Francesco Nori
    IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2014
    Abstract: In this paper we tackle the problem of estimating the local compliance of tactile arrays exploiting global measurements from a single force and torque sensor. The proposed procedure exploits a transformation matrix (describing the relative position between the local tactile elements and the global force/torque measurements) to define a linear regression problem on the unknown local stiffness. Experiments have been conducted on the foot of the iCub robot, sensorized with a single force/torque sensor and a tactile array of 250 tactile elements (taxels) on the foot sole. Results show that a simple calibration procedure can be employed to estimate the stiffness parameters of virtual springs over a tactile array and to use these model to predict normal forces exerted on the array based only on the tactile feedback. Leveraging on previous works the proposed procedure does not necessarily need a-priori information on the transformation matrix of the taxels which can be directly estimated from available measurements.
    @inproceedings{ciliberto2014,
      title={Exploiting Global Force Torque Measurements for Local Compliance Estimation in Tactile Arrays},
      author={Ciliberto, Carlo and Fiorio, Luca and Maggiali, Marco and Natale, Lorenzo and Rosasco, Lorenzo and Metta, Giorgio and Sandini, Giulio and Nori, Francesco},
      booktitle={Intelligent Robots and Systems (IROS), 2014 IEEE/RSJ International Conference on},
      year={2014},
      organization={IEEE}
    }
  • Learning An Invariant Speech Representation
    Georgios Evangelopoulos, Stephen Voinea, Chiyuan Zhang, Lorenzo Rosasco and Tomaso Poggio
    CBMM Memo 022, arXiv:1406.3884
    Abstract: Recognition of speech, and in particular the ability to generalize and learn from small sets of labelled examples like humans do, depends on an appropriate representation of the acoustic input. We formulate the problem of finding robust speech features for supervised learning with small sample complexity as a problem of learning representations of the signal that are maximally invariant to intraclass transformations and deformations. We propose an extension of a theory for unsupervised learning of invariant visual representations to the auditory domain and empirically evaluate its validity for voiced speech sound classification. Our version of the theory requires the memory-based, unsupervised storage of acoustic templates -- such as specific phones or words -- together with all the transformations of each that normally occur. A quasi-invariant representation for a speech segment can be obtained by projecting it to each template orbit, i.e., the set of transformed signals, and computing the associated one-dimensional empirical probability distributions. The computations can be performed by modules of filtering and pooling, and extended to hierarchical architectures. In this paper, we apply a single-layer, multicomponent representation for phonemes and demonstrate improved accuracy and decreased sample complexity for vowel classification compared to standard spectral, cepstral and perceptual features.
    @article{evangelopoulos2014,
      author    = {Evangelopoulos, Georgios and Voinea, Stephen and Zhang, Chiyuan and Rosasco, Lorenzo and Poggio, Tomaso},
      title     = {Learning An Invariant Speech Representation},
      journal   = {CBMM Memo 022, arXiv:1406.3884},
      year      = {2014},
      url       = {http://arxiv.org/abs/1406.3884}
    }
  • Unsupervised Learning of Invariant Representations in Hierarchical Architectures
    Fabio Anselmi, Joel Z. Leibo, Lorenzo Rosasco, Jim Mutch, Andrea Tacchetti and Tomaso Poggio
    arxiv:1311.4158v5
    Abstract: The present phase of Machine Learning is characterized by supervised learning algorithms relying on large sets of labeled examples (n??????). The next phase is likely to focus on algorithms capable of learning from very few labeled examples (n???1), like humans seem able to do. We propose an approach to this problem and describe the underlying theory, based on the unsupervised, automatic learning of a ``good'' representation for supervised learning, characterized by small sample complexity (n). We consider the case of visual object recognition though the theory applies to other domains. The starting point is the conjecture, proved in specific cases, that image representations which are invariant to translations, scaling and other transformations can considerably reduce the sample complexity of learning. We prove that an invariant and unique (discriminative) signature can be computed for each image patch, I, in terms of empirical distributions of the dot-products between I and a set of templates stored during unsupervised learning. A module performing filtering and pooling, like the simple and complex cells described by Hubel and Wiesel, can compute such estimates. Hierarchical architectures consisting of this basic Hubel-Wiesel moduli inherit its properties of invariance, stability, and discriminability while capturing the compositional organization of the visual world in terms of wholes and parts. The theory extends existing deep learning convolutional architectures for image and speech recognition. It also suggests that the main computational goal of the ventral stream of visual cortex is to provide a hierarchical representation of new objects/images which is invariant to transformations, stable, and discriminative for recognition---and that this representation may be continuously learned in an unsupervised way during development and visual experience.
    @inproceedings{rosasco2014,
      title={Unsupervised Learning of Invariant Representations in Hierarchical Architectures},
      author={Anselmi, Fabio and Leibo, Joel Z. and Rosasco, Lorenzo and Mutch, Jim and Tacchetti, Andrea and Poggio, Tomaso},
      booktitle={arxiv:11311.4158v5},
      year={2014}
    }
  • A Stochastic forward-backward splitting method for solving monotone inclusions in Hilbert spaces
    Lorenzo Rosasco, Silvia Villa and Bang Cong V??
    arxiv:1403.7999v1
    Abstract: We prove novel convergence results for a stochastic forward-backward splitting algorithm for solving monotone inclusions given by the sum of a maximal monotone operator and a single-valued maximal monotone cocoercive operator. We derive convergence rates in expectation in the strongly monotone case, as well as almost sure convergence results under weaker assumptions.
    @inproceedings{rosasco2014,
      title={A Stochastic forward-backward splitting method for solving monotone inclusions in Hilbert spaces},
      author={Rosasco, Lorenzo and Villa, Silvia and Cong Vu, Bang},
      booktitle={arxiv:1403.7999v1},
      year={2014}
    }
  • Ask the image: supervised pooling to preserve feature locality
    Sean Ryan Fanello, Nicoletta Noceti, Carlo Ciliberto, Giorgio Metta and Francesca Odone
    Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)
    Abstract: In this paper we propose a weighted supervised pooling method for visual recognition systems. We combine a standard Spatial Pyramid Representation which is commonly adopted to encode spatial information, with an appropriate Feature Space Representation favoring semantic information in an appropriate feature space. For the latter, we propose a weighted pooling strategy exploiting data supervision to weigh each local descriptor coherently with its likelihood to belong to a given object class. The two representations are then combined adaptively with Multiple Kernel Learning. Experiments on common benchmarks (Caltech-256 and PASCAL VOC-2007) show that our image representation improves the current visual recognition pipeline and it is competitive with similar state-of-art pooling methods. We also evaluate our method on a real Human-Robot Interaction setting, where the pure Spatial Pyramid Representation does not provide sufficient discriminative power, obtaining a remarkable improvement.
    @inproceedings{fanello2014,
      title={Ask the image: supervised pooling to preserve feature locality},
      author={Fanello, Sean Ryan and Noceti, Nicoletta and Ciliberto, Carlo and Metta, Giorgio and Odone, Francesca},
      booktitle={IEEE Computer Vision and Pattern Recognition},
      year={2014}
    }
  • Convergence of Stochastic Proximal Gradient Algorithm
    Lorenzo Rosasco, Silvia Villa and Bang Cong V??
    arXiv:1403.5074
    Abstract: We prove novel convergence results for a stochastic proximal gradient algorithm suitable for solving a large class of convex optimization problems, where a convex objective function is given by the sum of a smooth and a non-smooth component. We derive convergence rates in expectation in the strongly convex case, as well as almost sure convergence results under weaker assumptions.
    @inproceedings{rosasco2014,
      title={Convergence of Stochastic Proximal Gradient Algorithm},
      author={Rosasco, Lorenzo and Villa, Silvia and Cong Vu, Bang},
      booktitle={arXiv:1403.5074},
      year={2014}
    }
  • A Deep Representation for Invariance and Music Classification
    Chiyuan Zhang, Georgios Evangelopoulos, Stephen Voinea, Lorenzo Rosasco and Tomaso Poggio
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014.
    Abstract: Representations in the auditory cortex may be learned similar to the visual pathway: a) multiple layers of aggregated and compositional processing and b) modules for building invariance to transformations, while maintaining selectivity to features of sound objects. In this paper we explore the use of such computational modules for extracting invariant and discriminative music representations. Building on a theory of networks for invariance in the cortical visual pathway, we propose a novel, mid-level representation for acoustical signals, using the distributional statistics of projections on a set of templates. Under the assumption that, by construction, this dictionary of templates is composed from certain similar classes, and samples the manifold of variance-inducing signal transformations (such as time/frequency shifting and scaling), the resulting signature is theoretically guaranteed to be unique, invariant to transformations and stable to deformations. Modules of projection and pooling/aggregation can constitute layers of deep networks, for learning composite audio representations. The paper presents preliminary results towards a theoretical and computational framework for unsupervised audio representation learning, empirically evaluated for music genre classification.
    @inproceedings{zhang2014,
      author    = {Zhang, Chiyuan and Evangelopoulos, Georgios and Voinea, Stephen and Rosasco, Lorenzo and Poggio, Tomaso},
      title     = {A Deep Representation for Invariance and Music Classification},
      booktitle = {2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
      year      = {2014},
      pages     = {6984-6988},
      month     = {{May},},
      address   = {Florence, Italy},
      doi       = {10.1109/ICASSP.2014.6854954}
    }
  • Robust Phase Retrieval and Super-Resolution from One Bit Coded Diffraction Patterns
    Youssef Mroueh
    arXiv:1402.2255
    Abstract: In this paper we study a realistic setup for phase retrieval, where the signal of interest is modulated or masked and then for each modulation or mask a diffraction pattern is collected, producing a coded diffraction pattern (CDP) [CLM13]. We are interested in the setup where the resolution of the collected CDP is limited by the Fraunhofer diffraction limit of the imaging system. We investigate a novel approach based on a geometric quantization scheme of phase-less linear measurements into (one-bit) coded diffraction patterns, and a corresponding recovery scheme. The key novelty in this approach consists in comparing pairs of coded diffractions patterns across frequencies: the one bit measurements obtained rely on the order statistics of the un-quantized measurements rather than their values . This results in a robust phase recovery, and unlike currently available methods, allows to efficiently perform phase recovery from measurements affected by severe (possibly unknown) non linear, rank preserving perturbations, such as distortions. Another important feature of this approach consists in the fact that it enables also super-resolution and blind-deconvolution, beyond the diffraction limit of a given imaging system.
    @inproceedings{mroueh2014phaseRetrival,
      title={Robust Phase Retrieval and Super-Resolution from One Bit Coded Diffraction Patterns},
      author={Mroueh, Youssef},
      booktitle={arXiv:1402.2255},
      year={2014}
    }
  • A dictionary learning based method for aCGH segmentation
    Masecchia S., Salzo S., Barla A., Verri A.
    ESANN 2013 proceedings, 21st European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning
    Abstract: The starting point of our work is to devise a model for segmentation of aCGH data. We propose an optimization method based on dictionary learning and regularization and we compare it with a stateof-the-art approach, presenting our experimental results on synthetic data.
  • Dictionary learning improves subtyping of breast cancer aCGH data
    Masecchia S., Barla A., Salzo S., Verri A.
    Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS
    Abstract: The advent of Comparative Genomic Hybridization (CGH) data led to the development of new mathematical models and computational methods to automatically infer chromosomal alterations. In this work we tackle a standard clustering problem exploiting the good representation properties of a novel method based on dictionary learning. The identified dictionary atoms, which show co-occuring shared alterations among samples, can be easily interpreted by domain experts. We compare a state-of-the-art approach with an original method on a breast cancer dataset.
  • Knowledge Driven Variable Selection (KDVS) - a new approach to enrichment analysis of gene signatures obtained from high-throughput data
    Zycinski G., Barla A., Squillario M., Sanavia T., Di Camillo B., Verri A.
    Source Code for Biology and Medicine
  • Molecular fingerprinting reflects different histotypes and brain region in low grade gliomas
    Mascelli S., Barla A., Raso A., Mosci S., Nozza P., Biassoni R., Morana G., Huber M., Mircean C., Fasulo D., Noy K., Wittemberg G., Pignatelli S., Piatelli G., Cama A., Garré M.L., Capra V., Verri A.
    BMC Cancer
    Abstract: Paediatric low-grade gliomas (LGGs) encompass a heterogeneous set of tumours of different histologies, site of lesion, age and gender distribution, growth potential, morphological features, tendency to progression and clinical course. Among LGGs, Pilocytic astrocytomas (PAs) are the most common central nervous system (CNS) tumours in children. They are typically well-circumscribed, classified as grade I by the World Health Organization (WHO), but recurrence or progressive disease occurs in about 10-20% of cases. Despite radiological and neuropathological features deemed as classic are acknowledged, PA may present a bewildering variety of microscopic features. Indeed, tumours containing both neoplastic ganglion and astrocytic cells occur at a lower frequency.
  • Parameter space exploration within dynamic simulations of signaling networks
    De Ambrosi C., Barla A., Tortolina L., Castagnino N., Pesenti R., Verri A., Ballestrero A., Patrone F., Parodi S.
    Mathematical Biosciences and Engineering
    Abstract: We started offering an introduction to very basic aspects of molecular biology, for the reader coming from computer sciences, information technology, mathematics. Similarly we offered a minimum of information about pathways and networks in graph theory, for a reader coming from the bio-medical sector. At the crossover about the two different types of expertise, we offered some definition about Systems Biology. The core of the article deals with a Molecular Interaction Map (MIM), a network of biochemical interactions involved in a small signaling-network sub-region relevant in breast cancer. We explored robustness/sensitivity to random perturbations. It turns out that our MIM is a non-isomorphic directed graph. For non physiological directions of propagation of the signal the network is quite resistant to perturbations. The opposite happens for biologically significant directions of signal propagation. In these cases we can have no signal attenuation, and even signal amplification. Signal propagation along a given pathway is highly unidirectional, with the exception of signal-feedbacks, that again have a specific biological role and significance. In conclusion, even a relatively small network like our present MIM reveals the preponderance of specific biological functions over unspecific isomorphic behaviors. This is perhaps the consequence of hundreds of millions of years of biological evolution.
  • Regenerative surgery for the definitive repair of chronic ulcers: A series of 34 cases treated with platelet-derived growth factors
    Spagnolo F., Trapasso M., Monteghirfo S., Barla A., Scala M.
    Plastic and Reconstructive Surgery
    Abstract: An abstract is unavailable.
  • Synchronization and Noise: A Mechanism for Regularization in Neural Systems
    Jake Bouvrie and Jean-Jacques Slotine
    arXiv:1312.1632
    Abstract: To learn and reason in the presence of uncertainty, the brain must be capable of imposing some form of regularization. Here we suggest, through theoretical and computational arguments, that the combination of noise with synchronization provides a plausible mechanism for regularization in the nervous system. The functional role of regularization is considered in a general context in which coupled computational systems receive inputs corrupted by correlated noise. Noise on the inputs is shown to impose regularization, and when synchronization upstream induces time-varying correlations across noise variables, the degree of regularization can be calibrated over time. The proposed mechanism is explored first in the context of a simple associative learning problem, and then in the context of a hierarchical sensory coding task. The resulting qualitative behavior coincides with experimental data from visual cortex.
    @inproceedings{bouvrie13,
      title={Synchronization and Noise: A Mechanism for Regularization in Neural Systems},
      author={Bouvrie, Jake and Slotine Jean-Jacques},
      booktitle={arXiv:1312.1632},
      year={2013}
    }
  • Quantization and Greed are Good: One bit Phase Retrieval, Robustness and Greedy Refinements
    Youssef Mroueh and Lorenzo Rosasco
    arXiv:1312.1830
    Abstract: In this paper, we study the problem of robust phase recovery. We investigate a novel approach based on extremely quantized (one-bit) phase-less measurements and a corresponding recovery scheme. The proposed approach has surprising robustness and stability properties and, unlike currently available methods, allows to efficiently perform phase recovery from measurements affected by severe (possibly unknown) non-linear perturbations, such as distortions (e.g. clipping). Beyond robustness, we show how our approach can be used within greedy approaches based on alternating minimization. In particular, we propose novel initialization schemes for the alternating minimization achieving favorable convergence properties with improved sample complexity.
    @inproceedings{mroueh2013phaseRetrival,
      title={Quantization and Greed are Good: One bit Phase Retrieval, Robustness and Greedy Refinements},
      author={Mroueh, Youssef and Rosasco, Lorenzo},
      booktitle={arXiv:1312.1830},
      year={2013}
    }
  • On the Sample Complexity of Subspace Learning
    Alessandro Rudi, Guille D. Canas and Lorenzo Rosasco
    In Advances in Neural Information Processing Systems (NeurIPS) 26. 2013.
    Abstract: A large number of algorithms in machine learning, from principal component analysis (PCA), and its non-linear (kernel) extensions, to more recent spectral embedding and support estimation methods, rely on estimating a linear subspace from samples. In this paper we introduce a general formulation of this problem and derive novel learning error estimates. Our results rely on natural assumptions on the spectral properties of the covariance operator associated to the data distribution, and hold for a wide class of metrics between subspaces. As special cases, we discuss sharp error estimates for the reconstruction properties of PCA and spectral support estimation. Key to our analysis is an operator theoretic approach that has broad applicability to spectral learning methods.
    @inproceedings{rudiSubspace2013,
      title={On the Sample Complexity of Subspace Learning},
      author={Rudi, Alessandro and D. Canas, Guille and Rosasco, Lorenzo},
      booktitle={Advances in Neural Information Processing Systems (NeurIPS) 26},
      year={2013}
    }
  • A General Convex Framework for Output Kernel Learning
    Carlo Ciliberto, Youssef Mroueh, Tomaso Poggio and Lorenzo Rosasco
    NeurIPS 2013 Workshop on Output Representation Learning, 2013
    Abstract: Multi-task learning exploits the relations among different tasks to reduce the amount of overall supervision required by the learning problem. A natural question is how to model the notion of task structure when it is known a priori and how to recover it from the data when it is not. Here we propose a general framework based on vector-valued Reproducing Kernels Hilbert Spaces that covers several previous methods as special cases. Within this setting, tasks and their structure can be efficiently learned considering a convex output kernel learning problem that can be provably solved by block coordinate descent. We propose a novel penalty to learn sparse tasks structures.
    @inproceedings{cilibertoMTL2013,
      title={A General Convex Framework for Output Kernel},
      author={Ciliberto, Carlo and Mroueh, Youssef and Poggio, Tomaso and Rosasco, Lorenzo},
      booktitle={NeurIPS 2013 Workshop on Output Representation Learning},
      year={2013}
    }
  • On the Impact of Learning Hierarchical Representations for Visual Recognition in Robotics
    Carlo Ciliberto, Sean Ryan Fanello, Matteo Santoro, Lorenzo Natale, Giorgio Metta, Lorenzo Rosasco
    IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2013.
    Abstract: Recent developments in learning sophisticated, hierarchical image representations have led to remarkable progress in the context of visual recognition. While these methods are becoming standard in modern computer vision systems, they are rarely adopted in robotics. The question arises of whether solutions, which have been primarily developed for image retrieval, can perform well in more dynamic and unstructured scenarios. In this paper we tackle this question performing an extensive evaluation of state of the art methods for visual recognition on a iCub robot. We consider the problem of classifying 15 different objects shown by a human demonstrator in a challenging Human-Robot Interaction scenario. The classification performance of hierarchical learning approaches are shown to outperform benchmark solutions based on local descriptors and template matching. Our results show that hierarchical learning systems are computationally efficient and can be used for real-time training and recognition of objects.
    @article{cilibertoimpact,
      title={On the Impact of Learning Hierarchical Representations for Visual Recognition in Robotics},
      author={Ciliberto, Carlo and Fanello, Sean Ryan and Santoro, Matteo and Natale, Lorenzo and Metta, Giorgio and Rosasco, Lorenzo}
      booktitle={Intelligent Robots and Systems (IROS), 2013 IEEE/RSJ International Conference on},
      year={2013},
      organization={IEEE}
    }
  • Nonparametric Sparsity and Regularization
    L. Rosasco, S. Villa, S. Mosci, M. Santoro and A. Verri
    Journal of Machine Learning Research, 14(Jul):1665???1714, 2013.
    Abstract: In this work we are interested in the problems of supervised learning and variable selection when the input-output dependence is described by a nonlinear function depending on a few variables. Our goal is to consider a sparse nonparametric model, hence avoiding linear or additive models. The key idea is to measure the importance of each variable in the model by making use of partial derivatives. Based on this intuition we propose a new notion of nonparametric sparsity and a corresponding least squares regularization scheme. Using concepts and results from the theory of reproducing kernel Hilbert spaces and proximal methods, we show that the proposed learning algorithm corresponds to a minimization problem which can be provably solved by an iterative procedure. The consistency properties of the obtained estimator are studied both in terms of prediction and selection performance. An extensive empirical analysis shows that the proposed method performs favorably with respect to the state-of-the-art methods.
    @article{JMLR:rosasco13a,
      author  = {Lorenzo Rosasco and Silvia Villa and Sofia Mosci and Matteo Santoro and Alessandro Verri},
      title   = {Nonparametric Sparsity and Regularization},
      journal = {Journal of Machine Learning Research},
      year    = {2013},
      volume  = {14},
      pages   = {1665-1714},
      url     = {http://jmlr.org/papers/v14/rosasco13a.html}
    }
  • Subspace Learning and Empirical Operator Estimation
    Alessandro Rudi, Guille D. Canas and Lorenzo Rosasco
    In Proceedings of the International Workshop on Advances in Regularization, Optimization, Kernel Methods and Support Vector Machines: theory and applications (ROKS) 2013.
    Abstract: This work deals with the problem of linear subspace estimation in a general, Hilbert space setting. We provide bounds that are considerably sharper than existing ones, under equal assumptions. These bounds are also competitive with bounds that are allowed to make strong, further assumptions (on the fourth order moments), even when we do not. Finally, we generalize these results to a family of metrics, allowing for a more general definition of performance.
    @inproceedings{rudiROKS2013,
        author={Alessandro Rudi and Guille D. Canas and Lorenzo Rosasco},
        title={Subspace Learning and Empirical Operator Estimation},
        journal={International Workshop on Advances in Regularization, Optimization, Kernel Methods and Support Vector Machines: theory and applications (ROKS 2013)},
        year={2013}
    }
  • iCub World: Friendly Robots Help Building Good Vision Data-Sets
    Sean Ryan Fanello, Carlo Ciliberto, Matteo Santoro, Lorenzo Natale, Giorgio Metta, Lorenzo Rosasco, Francesca Odone
    IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2013.
    Abstract: In this paper we present and start analyzing the iCub World data-set, an object recognition data-set, we acquired using a Human-Robot Interaction (HRI) scheme and the iCub humanoid robot platform. Our set up allows for rapid acquisition and annotation of data with corresponding ground truth. While more constrained in its scopes -- the iCub world is essentially a robotics research lab -- we demonstrate how the proposed data-set poses challenges to current recognition systems. The iCubWorld data-set is publicly available. The data-set can be downloaded from here.
    @article{fanello2013icub,
      title={iCub World: Friendly Robots Help Building Good Vision Data-Sets},
      author={Fanello, Sean Ryan and Ciliberto, Carlo and Santoro, Matteo and Natale, Lorenzo and Metta, Giorgio and Rosasco, Lorenzo and Odone, Francesca},
      journal={arXiv preprint arXiv:1306.3560},
      year={2013}
    }
  • GURLS: a Least Squares Library for Supervised Learning
    Andrea Tacchetti, Pavan K Mallapragada, Matteo Santoro, Lorenzo Rosasco
    Journal of Machine Learning Research, 14(Oct):3201???3205, 2013
    Abstract: We present GURLS, a least squares, modular, easy-to-extend software library for efficient supervised learning. GURLS is targeted to machine learning practitioners, as well as non-specialists. It offers a number state-of-the-art training strategies for medium and large-scale learning, and routines for efficient model selection. The library is particularly well suited for multi-output problems (multi-category/multi-label). GURLS is currently available in two independent implementations: Matlab and C++. It takes advantage of the favorable properties of regularized least squares algorithm to exploit advanced tools in linear algebra. Routines to handle computations with very large matrices by means of memory-mapped storage and distributed task execution are available. The package is distributed under the BSD licence and is available for download at https://github.com/CBCL/GURLS.
    @article{JMLR:tacchetti13a,
      author  = {Andrea Tacchetti and Pavan K. Mallapragada and Matteo Santoro and Lorenzo Rosasco},
      title   = {GURLS: A Least Squares Library for Supervised Learning},
      journal = {Journal of Machine Learning Research},
      year    = {2013},
      volume  = {14},
      pages   = {3201-3205},
      url     = {http://jmlr.org/papers/v14/tacchetti13a.html}
    }
  • On Learnability, Complexity and Stability
    Silvia Villa, Lorenzo Rosasco, Tomaso Poggio
    "Empirical Inference, Festschrift in Honor of Vladimir N. Vapnik". Editors: Sch??lkopf, Bernhard; Luo, Zhiyuan; Vovk, Vladimir. Springer-Verlag Berlin and Heidelberg GmbH, Chapter 7, page 59-70, 2013. Also Arxiv 1303.5976
    Abstract: We consider the fundamental question of learnability of a hypotheses class in the supervised learning setting and in the general learning setting introduced by Vladimir Vapnik. We survey classic results characterizing learnability in term of suitable notions of complexity, as well as more recent results that establish the connection between learnability and stability of a learning algorithm.
    @TechReport{LCS13,
    	 author = {Villa, Silvia and Rosasco, Lorenzo and Poggio, Tomaso},
    	 title = {On Learnability, Complexity and Stability},
    	 url={http://arxiv.org/abs/1303.5976},
    	 journal={Empirical Inference, Festschrift in Honor of Vladimir N. Vapnik},
    	 editor={Sch??lkopf, Bernhard; Luo, Zhiyuan; Vovk, Vladimir},
    	 booktitle={Springer-Verlag Berlin and Heidelberg GmbH},
    	 chapter={7},
    	 page={59--70},
    	year = {2013},
    }
  • Q-ary Compressive Sensing
    Youssef Mroueh and Lorenzo Rosasco
    Proceedings of the 10th International Conference on Sampling Theory and Applications, pp. 17-20
    Abstract: We introduce q-ary compressive sensing, an extension of 1-bit compressive sensing. We propose a novel sensing mechanism and a corresponding recovery procedure. The recovery properties of the proposed approach are analyzed both theoretically and empirically. Results in 1-bit compressive sensing are recovered as a special case. Our theoretical results suggest a tradeoff between the quantization parameter q, and the number of measurements m in the control of the error of the resulting recovery algorithm, as well its robustness to noise.
    @TechReport{QCS13,
    	author = {Mroueh, Youssef and Rosasco, Lorenzo},
    	title = {Q-ary Compressive Sensing},
    	url={http://arxiv.org/abs/1302.5168},
      journal = {Proceedings of the 10th International Conference on Sampling Theory and Applications},
        pages={17--20},
        year = {2013},
    }
  • Accelerated and inexact forward-backward algorithms
    S. Villa, S. Salzo, L. Baldassarre and A. Verri
    SIAM Journal on Optimization
    Abstract: We propose a convergence analysis of accelerated forward-backward splitting methods for composite function minimization, when the proximity operator is not available in closed form, and can only be computed up to a certain precision. We prove that the $1/k^2$ convergence rate for the function values can be achieved if the admissible errors are of a certain type and satisfy a sufficiently fast decay condition. Our analysis is based on the machinery of estimate sequences first introduced by Nesterov for the study of accelerated gradient descent algorithms. Furthermore, we give a global complexity analysis, taking into account the cost of computing admissible approximations of the proximal point. An experimental analysis is also presented.
    @misc{VilSal13,
    	author={Villa, S. and Salzo, S. and Baldassarre, L. and Verri, A.},
    	title={Accelerated and inexact forward-backward algorithms},
    	journal={SIAM J. Optim.},
    	fjournal={SIAM Journal on Optimization},
    	volume={23},
    	year={2013},
    	number={3},
    	pages={1607--1633}
    }
  • A machine learning pipeline for discriminant pathways identification
    Barla A., Jurman G., Visintainer R., Squillario M., Filosi M., Riccadonna S., Furlanello C.
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Abstract: Identifying the molecular pathways more prone to disruption during a pathological process is a key task in network medicine and, more in general, in systems biology. In this work we propose a pipeline that couples a machine learning solution for molecular profiling with a recent network comparison method. The pipeline can identify changes occurring between specific sub-modules of networks built in a case-control biomarker study, discriminating key groups of genes whose interactions are modified by an underlying condition. The proposal is independent from the classification algorithm used. Two applications on genomewide data are presented regarding children susceptibility to air pollution and early and late onset of Parkinson’s disease.
  • Comparison of clinical outcomes for patients with large choroidal melanoma after primary treatment with enucleation or proton beam radiotherapy
    Mosci C., Lanza F.B., Barla A., Mosci S., Hérault J., Anselmi L., Truini M.
    Ophthalmologica
    Abstract: Purpose:To evaluate survival and clinical outcome for patients with a large uveal melanoma treated by either enucleation or proton beam radiotherapy (PBRT). Procedures: This retrospective non-randomized study evaluated 132 consecutive patients with T3 and T4 choroidal melanoma classified according to TNM stage grouping. Results: Cumulative all-cause mortality, melanoma-related mortality and metastasis-free survival were not statistically different between the two groups (log-rank test, p = 0.56, p = 0.99 and p = 0.25, respectively). Eye retention of the tumours treated with PBRT at 5 years was 74% (SD 6.2%). In these patients at diagnosis, 73% of eyes had a best-corrected visual acuity (BCVA) of 0.1 or better. After 12 and 60 months, BCVA of 0.1 or better was observed in 47.5 and 32%, respectively. Conclusion and Message: Although enucleation is the most common primary treatment for large uveal melanomas, PBRT is an eye-preserving option that may be considered for some patients.
  • Discriminant functional gene groups identification with machine learning and prior knowledge
    Zycinski G., Squillario M., Barla A., Sanavia T., Verri A., Di Camillo B.
    ESANN 2012 proceedings, 20th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning
    Abstract: In computational biology, the analysis of high-throughput data poses several issues on the reliability, reproducibility and interpretability of the results. It has been suggested that one reason for these inconsistencies may be that in complex diseases, such as cancer, multiple genes belonging to one or more physiological pathways are associated with the outcomes. Thus, a possible approach to improve list interpretability is to integrate biological information from genomic databases in the learning process. Here we propose KDVS, a machine learning based pipeline that incorporates domain biological knowledge a priori to structure the data matrix before the feature selection and classification phases. The pipeline is completed by a final step of semantic clustering and visualization. The clustering phase provides further interpretability of the results, allowing the identification of their biological meaning. To prove the efficacy of this procedure we analyzed a public dataset on prostate cancer.
  • Dynamic simulations of pathways downstream of TGFβ, Wnt and EGF-family growth factors, in colorectal cancer, including mutations and treatments with onco-protein inhibitors
    Tortolina L., Castagnino N., De Ambrosi C., Barla A., Verri A., Zoppoli G., Bagnasco L., Piras D., Patrone F., Ballestrero A., Parodi S.
    SEMA SIMAI Springer Series
    Abstract: With reference to colorectal cancer, we have reconstructed a Molecular Interaction Map downstream of TGFβ, Wnt and EGF-family. Based on an extensive and systematic direct/indirect data extrapolation from several dozens of published experimental papers, and some data interpolation that could fit with the general behavior of this signaling-network region, we were able to obtain an operative mathematical simulation model. We could simulate normal conditions of the network, behavior in the presence of important colorectal cancer mutations, behavior in the presence of virtual drug inhibitors of different specifically altered onco-proteins affected by excess of function. The dynamic behavior of the simulation seems quite reasonable, in terms of what is known about the physiology and the pathology of this signaling-network region. Preliminary experimental verification experiments look encouraging.
  • Effect of size and heterogeneity of samples on biomarker discovery: Synthetic and real data assessment
    Di Camillo B., Sanavia T., Martini M., Jurman G., Sambo F., Barla A., Squillario M., Furlanello C., Toffolo G., Cobelli C.
    PLoS ONE
    Abstract: Motivation The identification of robust lists of molecular biomarkers related to a disease is a fundamental step for early diagnosis and treatment. However, methodologies for the discovery of biomarkers using microarray data often provide results with limited overlap. These differences are imputable to 1) dataset size (few subjects with respect to the number of features); 2) heterogeneity of the disease; 3) heterogeneity of experimental protocols and computational pipelines employed in the analysis. In this paper, we focus on the first two issues and assess, both on simulated (through an in silico regulation network model) and real clinical datasets, the consistency of candidate biomarkers provided by a number of different methods. Methods We extensively simulated the effect of heterogeneity characteristic of complex diseases on different sets of microarray data. Heterogeneity was reproduced by simulating both intrinsic variability of the population and the alteration of regulatory mechanisms. Population variability was simulated by modeling evolution of a pool of subjects; then, a subset of them underwent alterations in regulatory mechanisms so as to mimic the disease state. Results The simulated data allowed us to outline advantages and drawbacks of different methods across multiple studies and varying number of samples and to evaluate precision of feature selection on a benchmark with known biomarkers. Although comparable classification accuracy was reached by different methods, the use of external cross-validation loops is helpful in finding features with a higher degree of precision and stability. Application to real data confirmed these results.
  • Multi-output learning via spectral filtering
    Baldassarre L., Rosasco L., Barla A., Verri A.
    Machine Learning
    Abstract: In this paper we study a class of regularized kernel methods for multi-output learning which are based on filtering the spectrum of the kernel matrix. The considered methods include Tikhonov regularization as a special case, as well as interesting alternatives such as vector-valued extensions of L2 boosting and other iterative schemes. Computational properties are discussed for various examples of kernels for vector-valued functions and the benefits of iterative techniques are illustrated. Generalizing previous results for the scalar case, we show a finite sample bound for the excess risk of the obtained estimator, which allows to prove consistency both for regression and multi-category classification. Finally, we present some promising results of the proposed algorithms on artificial and real data.
  • Inexact and Accelerated Proximal Point Algorithms
    Saverio Salzo and Silvia Villa
    Journal of Convex Analysis, 2012.
    @article{salzo2012a,
      author  = {Saverio Salzo and Silvia Villa},
      title   = {Inexact and Accelerated Proximal Point Algorithms},
      journal = {Journal of Convex Analysis},
      year    = {2012},
      volume  = {19},
      number = {4},
      pages   = {1167-1192},
      url     = {http://www.heldermann.de/JCA/JCA19/JCA194/jca19062.htm}
    }
  • Convergence Analysis of a Proximal Gauss-Newton method
    Saverio Salzo and Silvia Villa
    Journal of Computational Optimization and Applications, 2012
    @article{salzo2012b,
      author  = {Saverio Salzo and Silvia Villa},
      title   = {Convergence Analysis of a Proximal {G}auss-{N}ewton method},
      journal = {Journal of Computational Optimization and Applications},
      year    = {2012},
      volume  = {53},
      number = {2},
      pages   = {557-589},
      url     = {https://link.springer.com/article/10.1007/s10589-012-9476-9}
    }
  • Practical Conditions for Well-behaved-ness of Anisotropic Voronoi Diagrams
    Guillermo D. Canas
    Arxiv 1202.0867
    Abstract: Recently, simple conditions for well-behaved-ness of anisotropic Voronoi diagrams have been proposed. While these conditions ensure well-behaved-ness of two types of practical anisotropic Voronoi diagrams, as well as the geodesic-distance one, in any dimension, they are both prohibitively expensive to evaluate, and not well-suited for typical problems in approximation or optimization. We propose simple conditions that can be efficiently evaluated, and are better suited to practical problems of approximation and optimization. The practical utility of this analysis is enhanced by the fact that orphan-free anisotropic Voronoi diagrams have embedded triangulations as duals.
    @TechReport{CanPract12,
    	author = {Canas, D. Guillermo},
    	title = {Practical Conditions for Well-behaved-ness of Anisotropic
    		 Voronoi Diagrams},	
    	url={http://arxiv.org/abs/1202.0867},
      	year = {2012},
    }
  • Multiscale Geometric Methods for Data Sets I:Multiscale SVD, Noise and Curvature
    Anna V. Little, Mauro Maggioni, Lorenzo Rosasco
    Technical Report, CSAIL-TR-2012-029 and CBCL-310
    Abstract: Large data sets are often modeled as being noisy samples from probability distributions in R^D, with D large. It has been noticed that oftentimes the support M of these probability distributions seems to be well-approximated by low-dimensional sets, perhaps even by manifolds. We shall consider sets that are locally well approximated by k-dimensional planes, with k<<D, with k-dimensional manifolds isometrically embedded in R^D being a special case. Samples from this distribution; are furthermore corrupted by D-dimensional noise. Certain tools from multiscale geometric measure theory and harmonic analysis seem well-suited to be adapted to the study of samples from such probability distributions, in order to yield quantitative geometric information about them. In this paper we introduce and study multiscale covariance matrices, i.e. covariances corresponding to the distribution restricted to a ball of radius r, with a fixed center and varying r, and under rather general geometric assumptions we study how their empirical, noisy counterparts behave. We prove that in the range of scales where these covariance matrices are most informative, the empirical, noisy covariances are close to their expected, noiseless counterparts. In fact, this is true as soon as the number of samples in the balls where the covariance matrices are computed is linear in the intrinsic dimension of M. As an application, we present an algorithm for estimating the intrinsic dimension of M.
    @TechReport{Multi12,
    	author = {Little, Anna V. and Maggioni, Mauro and Rosasco, Lorenzo},
    	title = {Multiscale Geometric Methods for Data Sets I:
    	Multiscale SVD, Noise and Curvature},	
    	url={http://dspace.mit.edu/handle/1721.1/72597},
      	institution =  {MIT-CSAIL-TR-2012-029/CBCL-310},
      	year = 	 {2012},
      	address = 	 {MIT, Cambridge, MA},
      	month = 	 {September},
    }
  • Multiclass Learning with Simplex Coding
    Y. Mroueh, T. Poggio, L. Rosasco and J.J. Slotine
    In Advances in Neural Information Processing Systems (NeurIPS) 25 (pp. 2798-2806). 2012.
    Abstract: In this paper we discuss a novel framework for multiclass learning, defined by a suitable coding/decoding strategy, namely the simplex coding, that allows to generalize to multiple classes a relaxation approach commonly used in binary classification. In this framework, a relaxation error analysis can be developed avoiding constraints on the considered hypotheses class. Moreover, we show that in this setting it is possible to derive the first provably consistent regularized method with training/tuning complexity which is independent to the number of classes. Tools from convex analysis are introduced that can be used beyond the scope of this paper.
    @inproceedings{mroueh2012multiclass,
      title={Multiclass Learning with Simplex Coding},
      author={Mroueh, Youssef and Poggio, Tomaso and Rosasco, Lorenzo and Slotine, Jean-Jacques},
      booktitle={Advances in Neural Information Processing Systems (NeurIPS) 25},
      pages={2798--2806},
      year={2012}
    }
  • Learning Probability Measures with respect to Optimal Transport Metrics
    Guillermo D. Canas, Lorenzo Rosasco
    In Advances in Neural Information Processing Systems (NeurIPS) 25 (pp. 2501-2509). 2012.
    Abstract: We study the problem of estimating, in the sense of optimal transport metrics, a measure which is assumed supported on a manifold embedded in a Hilbert space. By establishing a precise connection between optimal transport metrics, optimal quantization, and learning theory, we derive new probabilistic bounds for the performance of a classic algorithm in unsupervised learning (k-means), when used to produce a probability measure derived from the data. In the course of the analysis, we arrive at new lower bounds, as well as probabilistic upper bounds on the convergence rate of the empirical law of large numbers, which, unlike existing bounds, are applicable to a wide class of measures.
    @inproceedings{Wass2012,
      title={Learning Probability Measures with respect to Optimal Transport Metrics},
      author={Canas, Guillermo and Rosasco, Lorenzo},
      booktitle={Advances in Neural Information Processing Systems (NeurIPS) 25},
      pages={2501--2509},
      year={2012}
    }
  • Learning Manifolds with K-Means and K-Flats
    Guillermo D. Canas, Tomaso Poggio and Lorenzo Rosasco
    In Advances in Neural Information Processing Systems (NeurIPS) 25 (pp. 2474-2482). 2012
    Abstract: We study the problem of estimating a manifold from random samples. In particular, we consider piecewise constant and piecewise linear estimators induced by k-means and k-flats, and analyze their performance. We extend previous results for k-means in two separate directions. First, we provide new results for k-means reconstruction on manifolds and, secondly, we prove reconstruction bounds for higher-order approximation (k-flats), for which no known results were previously available. While the results for k-means are novel, some of the technical tools are well-established in the literature. In the case of k-flats, both the results and the mathematical tools are new.
    @inproceedings{Manifolds12,
      title={Learning Manifolds with K-Means and K-Flats},
      author={Canas, Guillermo and Poggio, Tomaso and Rosasco, Lorenzo},
      booktitle={Advances in Neural Information Processing Systems (NeurIPS) 25},
      pages={2474--2482},
      year={2012}
    }
  • Duals of Orphan-Free Anisotropic Voronoi Diagrams are Embedded Meshes
    Guillermo D. Canas, Steven J. Gortler
    28th ACM Symposium on Computational Geometry, 2012
    Abstract: Given an anisotropic Voronoi diagram, we address the fundamental question of when its dual is embedded. We show that, by requiring only that the primal be orphan-free (have connected Voronoi regions), its dual is always guaranteed to be an embedded triangulation. Further, the primal diagram and its dual have properties that parallel those of ordinary Voronoi diagrams: the primal's vertices, edges, and faces are connected, and the dual triangulation has a simple, closed boundary. Additionally, if the underlying metric has bounded anisotropy (ratio of eigenvalues), the dual is guaranteed to triangulate the convex hull of the sites. These results apply to the duals of anisotropic Voronoi diagrams of any set of sites, so long as their Voronoi diagram is orphan-free. By combining this general result with existing conditions for obtaining orphan-free anisotropic Voronoi diagrams, a simple and natural condition for a set of sites to form an embedded anisotropic Delaunay triangulation follows.
    @inproceedings{CanasGorlter_SOCG12,
       author = {Canas, Guillermo D. and Gortler, Steven J.},
       title = {Duals of orphan-free anisotropic {Voronoi} diagrams are embedded meshes},
       booktitle = {Proceedings of the 2012 Symposium on Computational Geometry},
       series = {SoCG \'12},
       year = {2012},
       isbn = {978-1-4503-1299-8},
       location = {Chapel Hill, North Carolina, USA},
       pages = {219--228},
       url = {http://doi.acm.org/10.1145/2261250.2261283},
       doi = {10.1145/2261250.2261283},
       acmid = {2261283},
       publisher = {ACM},
       address = {New York, NY, USA},
       numpages = {10},
       keywords = {Delaunay triangulations, anisotropic Voronoi diagrams, orphan-free},
    }
  • Learning sets with separating kernels
    E. De Vito, L. Rosasco, and A. Toigo
    Applied and Computational Harmonic Analysis 37
    Abstract: We consider the problem of learning a set from random samples. We show how relevant geometric and topological properties of a set can be studied analytically using concepts from the theory of reproducing kernel Hilbert spaces. A new kind of reproducing kernel, that we call separating kernel, plays a crucial role in our study and is analyzed in detail. We prove a new analytic characterization of the support of a distribution, that naturally leads to a family of provably consistent regularized learning algorithms and we discuss the stability of these methods with respect to random sampling. Numerical experiments show that the approach is competitive, and often better, than other state of the art techniques.
    @misc{Lesets,
        author = {De Vito, E. and  Rosasco, L.   and Toigo, A. },	
        title = {Learning Sets with Separating Kernels},	
        url={http://arxiv.org/abs/1204.3573},	
        note={submitted},	
        year={2012}		
    }
  • Kernels for Vector-Valued Functions: a Review.
    M. Alvarez, N. Lawrence and L. Rosasco
    Foundations and Trends in Machine Learning 4(3):195-266, 2012
    Abstract: Kernel methods are among the most popular techniques in machine learning. From a regularization perspective they play a central role in regularization theory as they provide a natural choice for the hypotheses space and the regularization functional through the notion of reproducing kernel Hilbert spaces. From a probabilistic perspective they are the key in the context of Gaussian processes, where the kernel function is known as the covariance function. Traditionally, kernel methods have been used in supervised learning problems with scalar outputs and indeed there has been a considerable amount of work devoted to designing and learning kernels. More recently there has been an increasing interest in methods that deal with multiple outputs, motivated partly by frameworks like multitask learning. In this paper, we review different methods to design or learn valid kernel functions for multiple outputs, paying particular attention to the connection between probabilistic and functional methods.
    @article{AlvLawRos12,
       author = {{\'A}lvarez, M. and Lawrence, N. and Rosasco, L.},
       title = {Kernels for Vector-Valued Functions: a Review},
       volume = {4},
       number = {3},
       year = {2012},
       journal = {Foundations and Trends in Machine Learning},
       pages = {195-266},
       url = {http://dx.doi.org/10.1561/2200000036},
       note = {see also http://arxiv.org/abs/1106.6251}
    }
  • Inexact and accelerated proximal point algorithms
    S. Salzo and S. Villa
    Journal of Convex Analysis 19(4), 2012
    Abstract: We present inexact accelerated proximal point algorithms for minimizing a proper lower semicon- tinuous and convex function. We carry on a convergence analysis under different types of errors in the evaluation of the proximity operator, and we provide corresponding convergence rates for the objective function values. The proof relies on a generalization of the strategy proposed in Guler (1992) for generating estimate sequences according to the definition of Nesterov, and is based on the concept of $\epsilon$-subdifferential. We show that the convergence rate of the exact accelerated algorithm 1/k^2 can be recovered by constraining the errors to be of a certain type.
    @article{SalVil12,
       author={Salzo, S. and Villa, S.},
       title={Inexact and accelerated proximal point algorithms}, 
       journal={Journal of Convex Analysis},
       year={2012},
       number={4},
       volume={19},
    }
  • Proximal methods for the latent group lasso penalty
    S. Villa, L. Rosasco, S. Mosci and A. Verri
    Computational Optimization and Applications 58
    Abstract: We consider a regularized least squares problem, with regularization by structured sparsity-inducing norms, which extend the usual $\ell_1$ and the group lasso penalty, by allowing the subsets to overlap. Such regularizations lead to nonsmooth problems that are difficult to optimize, and we propose in this paper a suitable version of an accelerated proximal method to solve them. We prove convergence of a nested procedure, obtained composing an accelerated proximal method with an inner algorithm for computing the proximity operator. By exploiting the geometrical properties of the penalty, we devise a new active set strategy, thanks to which the inner iteration is relatively fast, thus guaranteeing good computational performances of the overall algorithm. Our approach allows to deal with high dimensional problems without pre-processing for dimensionality reduction, leading to better computational and prediction performances with respect to the state-of-the art methods, as shown empirically both on toy and real data.
    @misc{glopridulong,
       author={Villa, S. and Rosasco, L. and Mosci, S. and Verri, A.},
       title={Proximal methods for the latent group lasso penalty},
       year={2012},
       url={http://arxiv.org/abs/1209.0368},
       note={Submitted},
    }
  • GURLS: a toolbox for large scale multiclass learning
    A. Tacchetti, P. Mallapragada, M. Santoro, and L. Rosasco
    NeurIPS 2011 workshop on parallel and large-scale machine learning. 2011.
    Abstract: We present GURLS, a toolbox for supervised learning based on the regularized least squares algorithm. The toolbox takes advantage of all the favorable properties of least squares and is tailored to deal in particular with multi-category/multi-label problems. One of the main advantages of GURLS is that it allows training and tuning a multi-category classifier at essentially the same cost of one single binary classifier. The toolbox provides a set of basic functionalities including different training strategies and routines to handle computations with very large matrices by means of both memory-mapped storage and distributed task execution. The system is modular and can serve as a basis for easily prototyping new algorithms. The toolbox is available for download, easy to set-up and use.
    @inproceedings{tac_mal_san_ros_2012,
       author = { Tacchetti, Andrea and Mallapragada, Pavan and Santoro, Matteo and Rosasco, Lorenzo },
       title = {Gurls: a toolbox for large scale multiclass learning},
       booktitle = {NeurIPS Workshop on Big Learning: Algorithms, Systems, and Tools for Learning at Scale},
       year = {2012},
       location = {Sierra Nevada, Spain},
       numpages = {6},
       url = {http://dspace.mit.edu/handle/1721.1/69034},
       keywords = {Matlab, Computational Learning, Regularized Least Squares, Large Scale, Multiclass problems, C++},
    }
  • Is there sparsity beyond additive models?
    S. Mosci, L. Rosasco, M. Santoro, A. Verri and S. Villa
    Proceedings of IFAC System Identification 16, 2012
    Abstract: In this work we are interested in the problems of supervised learning and variable selection when the input-output dependence is described by a nonlinear function depending on a few variables. Our goal is to devise a sparse nonparametric model, avoiding linear or additive models. The key intuition is to measure the importance of each variable in the model by making use of partial derivatives. Based on this idea we propose and study a new regularizer and a corresponding least squares regularization scheme. Using concepts and results from the theory of reproducing kernel Hilbert spaces and proximal methods, we show that the proposed learning algorithm induces a minimization problem which can be provably solved by an iterative procedure. The consistency properties of the obtained estimator are studied both in terms of prediction and selection performance.
    @inproceedings{IFAC11,
       author = {Mosci, S. and Rosasco, L. and Santoro, M. and Verri, A. and Villa, S.},
       title = {Is there sparsity beyond additive models?},
       booktitle = {IFAC 2012 System Identification},
       year = {2012},
       volume={16},
       address={Square - Brussels Meeting Centre, Belgium},
        url = {http://www.ifac-papersonline.net/Detailed/54755.html},
    }
  • Consistency of learning algorithms using Attouch-Wets convergence.
    S. Villa, L. Rosasco, S. Mosci and A. Verri
    Optimization 61(3):287-305, 2012
    Abstract: In this article, we show that the notion of Tikhonov well-posedness is suitable for studying supervised learning for a wide range of loss functions. We show that supervised learning can be studied from the perspective of variational systems, where one deals with the stability properties of a family of optimization problems. In particular, we prove that the problem of consistency is related to the Attouch-Wets convergence of a sequence of perturbed functionals. Our aim is understanding the potential benefits of applying variational convergence methods to learning theory.
    @article {VilRosMosVer12,
       AUTHOR = {Villa, S. and Rosasco, L. and Mosci, S. and Verri, A.}, 
       TITLE = {Consistency of learning algorithms using {A}ttouch-{W}ets convergence},
       JOURNAL = {Optimization},
       FJOURNAL = {Optimization. A Journal of Mathematical Programming and Operations Research},
       VOLUME = {61},
       YEAR = {2012},
       NUMBER = {3},
       PAGES = {287--305},
       ISSN = {0233-1934},
       DOI = {10.1080/02331934.2010.511671},
       URL = {http://dx.doi.org/10.1080/02331934.2010.511671},
    }
  • An extension of Mercer theorem to matrix-valued measurable kernels.
    E. De Vito, V. Umanit?? and S. Villa
    Applied and Computational Harmonic Analyisis 2012
    Abstract: We extend the classical Mercer theorem to reproducing kernel Hilbert spaces whose elements are functions from a measurable space X into Cn. Given a finite measure $\mu$ on X, we represent the reproducing kernel K as a convergent series in terms of the eigenfunctions of a suitable compact operator depending on K and $\mu$. Our result holds under the mild assumption that K is measurable and the associated Hilbert space is separable. Furthermore, we show that X has a natural second countable topology with respect to which the eigenfunctions are continuous and such that the series representing K uniformly converges to K on compact subsets of X ????? X, provided that the support of $\mu$ is X.
    @article{Mercer,
       author={De Vito, E. and Umanit\`a, V. and Villa, S.},
       title={An extension of Mercer theorem to matrix-valued measurable kernels},
       journal={Applied. Comput. Harmon. Anal.},
       year={2012},
       url={http://dx.doi.org/10.1016/j.acha2012.06001},
    }
  • Multi-Output Learning via Spectral Filtering.
    L. Baldassarre, L., A. Barla, L. Rosasco and A. Verri
    Machine learning 87(3), p.259-301, 2012
    Abstract: In this paper we study a class of regularized kernel methods for multi-output learning which are based on filtering the spectrum of the kernel matrix. The considered methods include Tikhonov regularization as a special case, as well as interesting alternatives such as vector-valued extensions of L2 boosting and other iterative schemes. Computational properties are discussed for various examples of kernels for vector-valued functions and the benefits of iterative techniques are illustrated. Generalizing previous results for the scalar case, we show a finite sample bound for the excess risk of the obtained estimator, which allows to prove consistency both for regression and multi-category classification. Finally, we present some promising results of the proposed algorithms on artificial and real data.
    @article{BalRosBarVer12,
       author = {Baldassarre, L. and Rosasco, L. and Barla, A. and Verri, A.},
       title = {Multi-output learning via spectral filtering},
       volume = {10},
       number = {3},
       year = {2012},
       journal = {Machine Learning},
       pages = {259-301},
       url = {http://dx.doi.org/10.1007/s10994-012-5282-y},
    }
  • A computational procedure for functional characterization of potential marker genes from molecular data: Alzheimer's as a case study
    Squillario M., Barla A.
    BMC Medical Genomics
    Abstract: A molecular characterization of Alzheimer's Disease (AD) is the key to the identification of altered gene sets that lead to AD progression. We rely on the assumption that candidate marker genes for a given disease belong to specific pathogenic pathways, and we aim at unveiling those pathways stable across tissues, treatments and measurement systems. In this context, we analyzed three heterogeneous datasets, two microarray gene expression sets and one protein abundance set, applying a recently proposed feature selection method based on regularization.
  • SVS: Data and knowledge integration in computational biology
    Zycinski G., Barla A., Verri A.
    Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS
    Abstract: In this paper we present a framework for structured variable selection (SVS). The main concept of the proposed schema is to take a step towards the integration of two different aspects of data mining: database and machine learning perspective. The framework is flexible enough to use not only microarray data, but other high-throughput data of choice (e.g. from mass spectrometry, microarray, next generation sequencing). Moreover, the feature selection phase incorporates prior biological knowledge in a modular way from various repositories and is ready to host different statistical learning techniques. We present a proof of concept of SVS, illustrating some implementation details and describing current results on high-throughput microarray data.
  • The computational magic of the ventral stream: Towards a theory
    T. Poggio, and J. Z. Leibo (sections with J. Mutch, and L. Rosasco)
    Nature Precedings, doi:10.1038/npre.2011.6117.1 July 16, 2011
    Abstract: I conjecture that the sample complexity of object recognition is mostly due to geometric image transformations and that a main goal of the ventral stream ??? V1, V2, V4 and IT ??? is to learn-and-discount image transformations. The most surprising implication of the theory emerging from these assumptions is that the computational goals and detailed properties of cells in the ventral stream follow from symmetry properties of the visual world through a process of unsupervised correlational learning. From the assumption of a hierarchy of areas with receptive fields of increasing size the theory predicts that the size of the receptive fields determines which transformations are learned during development and then factored out during normal processing; that the transformation represented in each area determines the tuning of the neurons in the aerea, independently of the statistics of natural images; and that class-specific transformations are learned and represented at the top of the ventral stream hierarchy. Some of the main predictions of this theory-in-fieri are: the type of transformation that are learned from visual experience depend on the size (measured in terms of wavelength) and thus on the area (layer in the models) ??? assuming that the aperture size increases with layers; the mix of transformations learned determine the properties of the receptive fields ??? oriented bars in V1+V2, radial and spiral patterns in V4 up to class specific tuning in AIT (eg face tuned cells); class-specific modules ??? such as faces, places and possibly body areas ??? should exist in IT to process images of object classes.
    @inproceedings{pog_lei_ros_2011,
       author = { Poggio, Tomaso and Leibo, Joel and Mutch, Jim and Rosasco, Lorenzo },
       title = {The computational magic of the ventral stream: Towards a theory},
       year = {2011},
       numpages = {25},
       url = {http://precedings.nature.com/documents/6117/version/2/files/npre20116117-2.pdf},
       doi = {10.1038/npre.2011.6117.1},
    }
  • Online Learning, Stability, and Stochastic Gradient Descent.
    T. Poggio, S. Voinea and L. Rosasco
    Arxiv 1105.4701v2
    Abstract: In batch learning, stability together with existence and uniqueness of the solution corresponds to well-posedness of Empirical Risk Minimization (ERM) methods; recently, it was proved that CV_loo stability is necessary and sufficient for generalization and consistency of ERM. In this note, we introduce CV_on stability, which plays a similar note in online learning. We show that stochastic gradient descent (SDG) with the usual hypotheses is CVon stable and we then discuss the implications of CV_on stability for convergence of SGD.
    @misc{PogVoiRos11,
        author = {Poggio, T. and Voinea, S. and    Rosasco, L. },
        title = {Online Learning, Stability, and Stochastic Gradient Descent},
        year = {2011},
        url= {http://arxiv.org/abs/1105.4701},
        month = {May},
    }
  • Regularization Predicts While Discovering Taxonomy
    Y. Mroueh, T. Poggio and L. Rosasco
    Proceedings CVPR, 2011
    Abstract: In this work we discuss a regularization framework to solve multi-category when the classes are described by an underlying class taxonomy. In particular we discuss how to learn the class taxonomy while learning a multi-category classifier.
    @inproceedings{mro_pog_ros_2011,
       author = { Mroueh, Youssef and Poggio, Tomaso and Rosasco, Lorenzo },
       title = {Regularization Predicts While Discovering Taxonomy},
       booktitle = {CVPR Workshop on Fine-Grained Visual Categorization},
       year = {2011},
       numpages = {5},
       url = {http://dspace.mit.edu/bitstream/handle/1721.1/63175/MIT-CSAIL-TR-2011-029.pdf?sequence=1},
       keywords = {computational learning, classification},
        note = {also available as MIT-CSAIL-TR-2011-029, CBCL-299},
    }
  • Multi-category and taxonomy learning: A regularization approach
    Y. Mroueh, T. Poggio, and L. Rosasco
    Proceedings CVPR, 2011
    Abstract:  In this work we discuss a regularization framework to solve multi-category classification when the classes are described by an underlying class taxonomy. In particular we discuss how to learn the class taxonomy while learning a multi-category classifer.
    @inproceedings{mro_pog_ros2_2011,
       author = { Mroueh, Youssef and Poggio, Tomaso and Rosasco, Lorenzo },
       title = {Multi-category and taxonomy learning: A regularization approach},
       booktitle = {NeurIPS Workshop on Challenges in Learning Hierarchical Models: Transfer Learning and Optimization},
       year = {2011},
       location = {Sierra Nevada, Spain},
       numpages = {7},
       url = {https://00cf0c27-a-62cb3a1a-s-sites.googlegroups.com/site/NeurIPS2011workshop/schedule/mroueh.pdf?attachauth=ANoY7cpxA1BfW_vG8FmeGj0LYguMGbuDo2ObWs6tWV2Z0UaIY8l2KhkJ8N5oPKWq4yCHbzuuzTo6whCWAugARG6oZe0DDun8Kg2_SbxPN-4u1fJzGxrYq_Z8CarJZjKbbR4wSo-OxpqvzWNjULcREC75URcjT28BG1DAxrwuwYh98UgVtntvcc9Ins_bMjLb2qvRki6rXvzweEdAxtHf4LNHll44i0Px141I7dy1g_KVUJpng94g6lU%3D\&attredirects=0},
       keywords = {computational learning, classification},
    }
  • Multi-class Learning: Simplex Coding and Relaxation Error
    Y. Mroueh, T. Poggio, L. Rosasco and J.J. Slotine
    Proocedings Oberwolfach Workshop on Mathematics of Machine Learning, 2011
    Abstract: We study multi-category classification in the framework of computational learning theory. We show how a relaxation approach, which is commonly used in binary classification, can be generalized to the multi-class setting. We propose a vector coding, namely the simplex coding, that allows to introduce a new notion of multi-class margin and cast multi-category classification into a vector valued regression problem. The analysis of the relaxation error be quantified and the binary case is recovered as a special case of our theory. From a computational point of view we can show that using the simplex coding we can design regularized learning algorithms for multi-category classification that can be trained at a complexity which is independent to the number of classes.
    @inproceedings{mro_pog_ros_slot_2011,
       author = { Mroueh, Youssef and Poggio, Tomaso and Rosasco, Lorenzo and Slotine, Jean-Jacques },
       title = {Multi-class Learning: Simplex Coding and Relaxation Error},
       booktitle = {Oberwolfach Workshop on Mathematics of Machine Learning},
       year = {2011},
       numpages = {3},
       url = {http://dspace.mit.edu/handle/1721.1/66085},
       address = {MIT, Cambridge, MA},
       institution = {MIT-CSAIL-TR-2011-043/CBCL-305},
       keywords = {computational learning, machine learning, convex relaxation},
    }
  • Multiscale Geometric Methods for Estimating Intrinsic Dimension
    A.V. Little, M. Maggioni and L. Rosasco
    Proceedings SampTA, 2011
    Abstract: We present a novel approach for estimating the intrinsic dimension of certain point clouds: we assume that the points are sampled from a manifold M of dimension k, with k much less than D, and corrupted by D-dimensional noise. When M is linear, one may analyze this situation by PCA: with no noise one would obtain a rank k matrix, and noise may be treated as a perturbation of the covariance matrix. WhenMis a nonlinear manifold, global PCA may dramatically overestimate the intrinsic dimension. We discuss a multiscale version of PCA and how one can extract estimators for the intrinsic dimension that are highly robust to noise, and we derive some of their finite-sample-size properties.
    @inproceedings{lit_mag_ros_2011,
       author = { Little, Anna and Maggioni, Mauro and Rosasco, Lorenzo },
       title = {Multiscale Geometric Methods for Estimating Intrinsic Dimension},
       booktitle = {Proceedings SampTA},
       address= {Singapore},
       year = {2011},
       numpages = {4},
       url = {http://sampta2011.ntu.edu.sg/SampTA2011Proceedings/papers/We2S07.5-P0197.pdf},
       keywords = {Dimension estimation, multiscale analysis, geometric measure theory, point cloud data},
    }
  • Some recent advances in multiscale geometric analysis of point clouds
    G. Chen, A.V. Little, M. Maggioni and L. Rosasco
    Wavelets and Multiscale Analysis:199-225, 2011
    Abstract: We discuss recent work based on multiscale geometric analysis for the study of large data sets that lie in high-dimensional spaces but have low-dimensional structure. We present three applications: the first one to the estimation of intrinsic dimension of sampled manifolds, the second one to the construction of multiscale dictionaries, called geometric wavelets, for the analysis of point clouds, and the third one to the inference of point clouds modeled as unions of multiple planes of varying dimension.
    @incollection{che_lit_mag_ros_2011,
       author = {Chen, G. and Little, A. V. and Maggioni, M. and Rosasco, L.},
        title = {Some Recent Advances in Multiscale Geometric Analysis of Point Clouds},
        booktitle = {Wavelets and Multiscale Analysis},
       series = {Applied and Numerical Harmonic Analysis},
       editor = {Cohen, Jonathan and Zayed, Ahmed I.},
        publisher = {Birkh"auser Boston},
       pages = {199-225},
        url = {http://dx.doi.org/10.1007/978-0-8176-8095-4$\_$10},
       year = {2011}
    }
  • Mathematical and computational foundations of learning theory
    M. Hein, G. Lugosi, L. Rosasco, and S. Smale
    Dagstuhl Seminar 11291
    Abstract: The main goal of the seminar ``Mathematical and Computational Foundations of Learning Theory'' was to bring together experts from computer science, mathematics and statistics to discuss the state of the art in machine learning broadly construed and identify and formulate the key challenges in learning which have to be addressed in the future. This Dagstuhl seminar was one of the first meetings to cover the full broad range of facets of modern learning theory. The meeting was very successful and all participants agreed that such a meeting should take place on a regular basis.
    @article{HeiLugRosSma12,
       author = {Hein, M. and Lugosi, G. and Rosasco, L. and Smale, S.},
       title = {{Mathematical and Computational Foundations of Learning Theory (Dagstuhl Seminar 11291)}},
        pages = {53--69}, 
       journal = {Dagstuhl Reports}, 
       ISSN = {2192-5283}, 
        year = {2011}, 
       volume = {1}, 
       number = {7},
       editor = {Matthias Hein and Gabor Lugosi and Lorenzo Rosasco and Steve Smale},
       publisher = {Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik}, 
       address = {Dagstuhl, Germany},
       URL = {http://drops.dagstuhl.de/opus/volltexte/2011/3309},
       URN = {urn:nbn:de:0030-drops-33093},
       doi = {http://dx.doi.org/10.4230/DagRep.1.7.53},
       annote = {Keywords: learning theory, machine learning, sparsity, high-dimensional geometry, manifold learning, online learning}, 
    }
  • Support estimation with spectral regularization
    E. De Vito, L. Rosasco and A. Toigo
    Proceedings of Sampling Theory and Applications (Sampta), 2011
    Abstract: We are interested into the problem of learning the support of a high-dimensional probability distribution from random samples. We consider a setting in which the probability distribution does not have a continuous density and provide a novel characterization of the support in terms of the integral operator defined by a suitable reproducing kernel. This new characterization suggests a way to estimate the support from random samples, using spectral techniques proposed to regularize ill-posed inverse problems. We introduce a new class of kernels, that we call completely regular, that allow to prove universal consistency of the proposed methods. Empirical performances are tested on benchmark data and compare favorably to state of the art results.
    @inproceedings{SRegSE, 
       author = {De Vito, E. and Rosasco, L. and Toigo, A.}, 
       title = {Spectral Regularization for Support Estimation},
       booktitle = {Proceedings of Sampling Theory and Applications (Sampta)},
       year = {2011},
       month = {May},
       address = {Singapore},
    }
  • A biology-driven approach identifies the hypoxia gene signature as a predictor of the outcome of neuroblastoma patients
    Fardin P., Barla A., Mosci S., Rosasco L., Verri A., Versteeg R., Caron H.N., Molenaar J.J., Øra I., Eva A., Puppo M., Varesio L.
    Molecular Cancer
    Abstract: Hypoxia is a condition of low oxygen tension occurring in the tumor microenvironment and it is related to poor prognosis in human cancer. To examine the relationship between hypoxia and neuroblastoma, we generated and tested an in vitro derived hypoxia gene signature for its ability to predict patients' outcome.
  • Identification of multiple hypoxia signatures in neuroblastoma cell lines by l1-l2 regularization and data reduction
    Fardin P., Cornero A., Barla A., Mosci S., Acquaviva M., Rosasco L., Gambini C., Verri A., Varesio L.
    Journal of Biomedicine and Biotechnology
  • Learning how to grasp objects
    Barla A., Baldassarre L., Noceti N., Odone F.
    Proceedings of the 18th European Symposium on Artificial Neural Networks - Computational Intelligence and Machine Learning, ESANN 2010
  • Vector field learning via spectral filtering
    Baldassarre L., Rosasco L., Barla A., Verri A.
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Abstract: In this paper we present and study a new class of regularized kernel methods for learning vector fields, which are based on filtering the spectrum of the kernel matrix. These methods include Tikhonov regularization as a special case, as well as interesting alternatives such as vector valued extensions of L2-Boosting. Our theoretical and experimental analysis shows that spectral filters that yield iterative algorithms, such as L2-Boosting, are much faster than Tikhonov regularization and attain the same prediction performances. Finite sample bounds for the different filters can be derived in a common framework and highlight different theoretical properties of the methods. The theory of vector valued reproducing kernel Hilbert space is a key tool in our study.
  • Learning Generic Invariances in Object Recognition: Translation and Scale.
    J.Z. Leibo, J. Mutch, L. Rosasco, S. Ullman, and T. Poggio
    MIT-CSAIL-TR-2010-062, CBCL-294
    Abstract: Invariance to various transformations is key to object recognition but existing definitions of invariance are somewhat confusing while discussions of invariance are often confused. In this report, we provide an operational definition of invariance by formally defining perceptual tasks as classification problems. The definition should be appropriate for physiology, psychophysics and computational modeling. For any specific object, invariance can be trivially ``learned'' by memorizing a sufficient number of example images of the transformed object. While our formal definition of invariance also covers such cases, this report focuses instead on invariance from very few images and mostly on invariances from one example. Image-plane invariances -- such as translation, rotation and scaling -- can be computed from a single image for any object. They are called generic since in principle they can be hardwired or learned (during development) for any object. In this perspective, we characterize the invariance range of a class of feedforward architectures for visual recognition that mimic the hierarchical organization of the ventral stream. We show that this class of models achieves essentially perfect translation and scaling invariance for novel images. In this architecture a new image is represented in terms of weights of "templates" (e.g. "centers" or "basis functions") at each level in the hierarchy. Such a representation inherits the invariance of each template, which is implemented through replication of the corresponding "simple" units across positions or scales and their "association" in a "complex" unit. We show simulations on real images that characterize the type and number of templates needed to support the invariant recognition of novel objects. We find that 1) the templates need not be visually similar to the target objects and that 2) a very small number of them is sufficient for good recognition. These somewhat surprising empirical results have intriguing implications for the learning of invariant recognition during the development of a biological organism, such as a human baby. In particular, we conjecture that invariance to translation and scale may be learned by the association -- through temporal contiguity -- of a small number of primal templates, that is patches extracted from the images of an object moving on the retina across positions and scales. The number of templates can later be augmented by bootstrapping mechanisms using the correspondence provided by the primal templates -- without the need of temporal contiguity.
    @TechReport{leibo,
      author = 	 {Leibo, J.Z. and Mutch, J. and Rosasco, L. and Ullman, S. and Poggio, T.},
      title = 	 {Learning Generic Invariances in Object Recognition: Translation and Scale},
      institution =  {MIT-CSAIL-TR-2010-061/CBCL-294},
      year = 	 {2010},
      address = 	 {MIT, Cambridge, MA},
      month = 	 {December},
    }
  • Neurons That Confuse Mirror-Symmetric Object Views.
    J. Mutch, J.Z. Leibo, S. Smale, L. Rosasco, and T. Poggio
    MIT-CSAIL-TR-2010-062, CBCL-295
    Abstract: Neurons in inferotemporal cortex that respond similarly to many pairs of mirror-symmetric images -- for example, 45 degree and -45 degree views of the same face -- have often been reported. The phenomenon seemed to be an interesting oddity. However, the same phenomenon has also emerged in simple hierarchical models of the ventral stream. Here we state a theorem characterizing sufficient conditions for this curious invariance to occur in a rather large class of hierarchical networks and demonstrate it with simulations.
    @techreport{MutLeiSmaRosPog10,
    	author={Mutch,J. and Leibo, J.L. and Smale, S. and Rosasco, L. and Poggio, T.},
    	title={Neurons That Confuse Mirror-Symmetric Object Views},
    	institution={MIT-CSAIL-TR-2010-062/CBCL-295}, 
    	address={MIT, Cambridge, MA}
    	month={December},
    	year={2010} 
    }
  • Spectral Regularization for Support Estimation.
    E. De Vito, L. Rosasco and A. Toigo
    Proceedings of Advances in Neural Information Processing Systems (NeurIPS) (NeurIPS), 2010
    Abstract: In this paper we consider the problem of learning from data the support of a probability distribution when the distribution does not have a density (with respect to some reference measure). We propose a new class of regularized spectral esti- mators based on a new notion of reproducing kernel Hilbert space, which we call ''completely regular''. Completely regular kernels allow to capture the relevant geometric and topological properties of an arbitrary probability space. In particular, they are the key ingredient to prove the universal consistency of the spectral estimators and in this respect they are the analogue of universal kernels for supervised problems. Numerical experiments show that spectral estimators compare favorably to state of the art machine learning algorithms for density support estimation.
    @inproceedings{SRegSE, 
       author = {De Vito, E. and Rosasco, L. and Toigo, A.}, 
       title = {Spectral Regularization for Support Estimation},
       booktitle = {Proceedings Advances in Neural Information Processing Systems (NeurIPS) 23},
       year = {2010},
       pages = {487-495},
       url = {http://books.NeurIPS.cc/papers/files/NeurIPS23/NeurIPS2010_0781.pdf},
    }
  • A primal-dual algorithm for group sparse regularization with overlapping groups.
    S. Mosci, S. Villa, L. Rosasco and A. Verri
    Proceedings Advances in Neural Information Processing Systems (NeurIPS), 2010
    Abstract: We deal with the problem of variable selection when variables must be selected group-wise, with possibly overlapping groups defined a priori. In particular we propose a new optimization procedure for solving the regularized algorithm pre- sented in L. Jacob, G. Obozinski, and J.-P. Vert (2009), where the group lasso penalty is generalized to overlapping groups of variables. While in L. Jacob, G. Obozinski, and J.-P. Vert (2009) the proposed implementation requires explicit replication of the variables belonging to more than one group, our iterative procedure is based on a combination of proximal methods in the primal space and projected Newton method in a reduced dual space, corresponding to the active groups. This procedure provides a scalable alternative with no need for data duplication, and allows to deal with high dimensional problems without pre-processing for dimensionality reduction. The computational advantages of our scheme with respect to state-of-the-art algorithms using data duplication are shown empirically with numerical simulations.
    @incollection{glopridu,
       title = {A Primal-Dual Algorithm for Group Sparse Regularization with Overlapping Groups},
       author = {Mosci, S. and Villa, S. and Verri, A. and Rosasco, L.},
       booktitle = {Advances in Neural Information Processing Systems (NeurIPS) 23},
       editor = {J. Lafferty and C. K. I. Williams and J. Shawe-Taylor and R.S. Zemel and A. Culotta},
       pages = {2604--2612},
       year = {2010},
    }
  • Solving structured sparsity regularization with proximal methods.
    S. Mosci, L. Rosasco, M. Santoro, A. Verri, and S. Villa
    Machine Learning and Knowledge Discovery in Databases, volume 6322 of Lecture Notes in Computer Science (Springer), 2010
    Abstract:  Proximal methods have recently been shown to provide ef- fective optimization procedures to solve the variational problems defining the l1 regularization algorithms. The goal of the paper is twofold. First we discuss how proximal methods can be applied to solve a large class of machine learning algorithms which can be seen as extensions of l1 regularization, namely structured sparsity regularization. For all these algorithms, it is possible to derive an optimization procedure which corresponds to an iterative projection algorithm. Second, we discuss the effect of a preconditioning of the optimization procedure achieved by adding a strictly convex functional to the objective function. Structured sparsity algorithms are usually based on minimizing a convex (not strictly convex) objective function and this might lead to undesired unstable behavior. We show that by perturbing the objective function by a small strictly convex term we often reduce substantially the number of required computations without affecting the prediction performance of the obtained solution.
    @inproceedings{prox_ECML,
       author = {Mosci, S. and Rosasco, L. and Santoro, M. and Verri, A. and Villa, S.},
       booktitle = {ECML/PKDD (2)},
       editor = {Balcázar, José L. and Bonchi, Francesco and Gionis, Aristides and Sebag, Mich\`ele},
       ee = {http://dx.doi.org/10.1007/978-3-642-15883-4$\_$27},
       pages = {418-433},
       publisher = {Springer},
       series = {Lecture Notes in Computer Science},
       title = {Solving Structured Sparsity Regularization with Proximal Methods.},
       volume = {6322},    year = {2010}, 
    }
  • A regularization approach to nonlinear variable selection.
    L. Rosasco, M. Santoro, S. Mosci, A. Verri, and S. Villa
    Proceedings of The Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS), volume 9 of JMLR, 2010
    Abstract: In this paper we consider a regularization approach to variable selection when the regression function depends nonlinearly on a few input variables. The proposed method is based on a regularized least square estimator penalizing large values of the partial derivatives. An efficient iterative procedure is proposed to solve the underlying variational problem, and its convergence is proved. The empirical properties of the obtained estimator are tested both for prediction and variable selection. The algorithm compares favorably to more standard ridge regression and l1 regularization schemes.
    @inproceedings{Reg_AISTATS,
       Author = {Rosasco, L. and Mosci, S. and S. Santoro, M. and Verri, A. and Villa, S.},
       Booktitle = {Proceedings of the 13 International Conference on Artificial Intelligence and Statistics},
       Title = {A Regularization Approach to Nonlinear Variable Selection},
       Year = {2010}}
    }
  • Vector Field Learning via Spectral Filtering.
    L. Baldassarre, L. Rosasco, A. Barla and A. Verri
    Machine Learning and Knowledge Discovery in Databases, volume 6322 of Lecture Notes in Computer Science (Springer), 2010
    Abstract: In this paper we present and study a new class of regularized kernel methods for learning vector fields, which are based on filtering the spectrum of the kernel matrix. These methods include Tikhonov regularization as a special case, as well as interesting alternatives such as vector valued extensions of L2-Boosting. Our theoretical and experimental analysis shows that spectral filters that yield iterative algorithms, such as L2-Boosting, are much faster than Tikhonov regularization and attain the same prediction performances. Finite sample bounds for the different filters can be derived in a common framework and highlight different theoretical properties of the methods. The theory of vector valued reproducing kernel Hilbert space is a key tool in our study.
    @inproceedings{shortMOL,
       author = {Baldassarre, L. and Rosasco, L. and Barla, A. and Verri, A.},
       title = {Vector Field Learning via Spectral Filtering},
       editor = {Balc\`azar, Jos\'e L. and Bonchi, Francesco and Gionis, Aristides and Sebag, Mich\`ele},    booktitle = {ECML/PKDD (1)},
       year = {2010},
       pages = {56-71},
       publisher = {Springer},
       series = {Lecture Notes in Computer Science},
       volume={6322},
       url = {http://dx.doi.org/10.1007/978-3-642-15880-3$\_$10}
    }
  • A fast algorithm for structured gene selection.
    S. Mosci, S. Villa, A. Verri and L. Rosasco
    Proceedings of Fourth International Workshop on Machine Learning in Systems Biology, Edinburgh (UK) 2010
    Abstract: We deal with the problem of gene selection when genes must be selected group-wise, where the groups, defined a priori and representing functional families, may overlap. We propose a new optimization procedure for solving the regularization problem proposed in Jacob et al. (2009), where the group lasso penalty is generalized to overlapping groups. While in Jacob et al. (2009) the proposed implementation requires replication of genes belonging to more than one group, our iterative procedure, provides a scalable alternative with no need for data duplication. This scalability property allows avoiding the otherwise necessary pre-processing for dimensionality reduction, which is at risk of discarding relevant biological information, and leads to improved prediction performances and higher selection stability.
    @InProceedings{geneselection,
       author = {Mosci, S. and Villa, S. and Verri, A. and Rosasco, L.},
       title = {A fast algorithm for structured gene selection},
       booktitle = {Fourth International Workshop on Machine Learning in Systems Biology},
       year = {2010},
       address={Edinburgh, UK}
    }