(CIFASIS Machine Learning and Applications) Post-print articles
Recent submissions
Open Access item: Clustering gene expression data with a penalized graph-based metric (BioMed Central, 2011-01-04). Bayá, Ariel E.; Granitto, Pablo M.
Background: The search for cluster structure in microarray datasets is a basic problem for the so-called "-omic sciences". A difficult problem in clustering is how to handle data with a manifold structure, i.e. data that does not form compact clouds of points but instead traces arbitrary shapes or paths embedded in a high-dimensional space, as may be the case for some gene expression datasets.
Results: In this work we introduce the Penalized k-Nearest-Neighbor-Graph (PKNNG) based metric, a new tool for evaluating distances in such cases. The new metric can be used in combination with most clustering algorithms. The PKNNG metric is based on a two-step procedure: first it constructs the k-Nearest-Neighbor-Graph of the dataset of interest using a low k-value, and then it adds edges with a highly penalized weight that connect the subgraphs produced by the first step. We discuss several possible schemes for connecting the different subgraphs, as well as several penalization functions. We show clustering results on several public gene expression datasets and on simulated artificial problems to evaluate the behavior of the new metric.
Conclusions: In all cases the PKNNG metric shows promising clustering results. Using the PKNNG metric can raise the performance of commonly used pairwise-distance-based clustering methods to the level of more advanced algorithms.
A great advantage of the new procedure is that researchers do not need to learn a new method: they can simply compute distances with the PKNNG metric and then, for example, use hierarchical clustering to produce an accurate and highly interpretable dendrogram of their high-dimensional data.

Open Access item: Time–Adaptive Support Vector Machines (Asociación Española de Inteligencia Artificial, 2008). Grinblat, Guillermo; Granitto, Pablo M.; Ceccatto, Alejandro
In this work we propose an adaptive classification method able both to learn and to follow the temporal evolution of a drifting concept. To that end we introduce a modified SVM classifier built from multiple hyperplanes, each valid only over a small temporal interval (window). In contrast to other strategies proposed in the literature, our method learns all hyperplanes globally, minimizing a cost function that evaluates the error committed by this family of local classifiers plus a measure associated with the VC dimension of the family. We also show how the idea of slowly changing classifiers can be applied to non-linear stationary concepts, with results similar to those obtained with standard SVMs using Gaussian kernels.

Open Access item: Discriminant models based on sensory evaluations: Single assessors versus panel average (Elsevier B.V., 2008-09). Granitto, Pablo M.; Biasioli, Franco; Endrizzi, Isabella; Gasperi, Flavia
Product classification based on sensory evaluations can play an important role in quality control or typicality assessment. Unfortunately, its real-world applications face difficulties related to the cost of a proper sensory approach. To partially overcome these issues, we propose to build discriminant models based on the evaluations of single assessors and develop an appropriate method to combine them. We compare this new strategy with the more traditional one based on the panel average.
As application examples we consider two datasets obtained from the sensory assessment of diverse cheese typologies from northern Italy by two different panels. We also apply diverse, innovative and noise-resistant discriminant methods (random forest, penalized discriminant analysis and discriminant partial least squares) to show that our new strategy based on modeling each individual assessor is efficient and that this result is independent of the classifier used. The main finding of our work is that, when noise-resistant multivariate methods are used, product discrimination based on combining independent models built for each assessor is never worse than discrimination based on the panel average, and that the error reduction is larger when the consonance between assessors is low. Experiments on the same datasets with added random uniform noise of different intensities support these findings. We also discuss a demonstrative experiment using a different set of attributes for each assessor. Overall, our results suggest that, if the goal is product classification, consonance among assessors, or even the use of the same vocabulary, does not seem necessary; the key factors are the discrimination capability and repeatability of each judge.
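The two-step PKNNG procedure described in the first item (Clustering gene expression data with a penalized graph-based metric) can be sketched as follows. The abstract does not fix the connection scheme or the penalization function, so both are assumptions here: every pair of k-NN components is linked through its closest pair of points, and each linking edge of length d gets the hypothetical weight d·exp(d/μ), where μ is the mean k-NN edge length. Taking shortest-path lengths over the resulting graph as the final distance is also an assumption of this sketch, not a statement of the authors' implementation.

```python
import heapq
import math
from itertools import combinations


def knn_graph(points, k):
    """Symmetric k-NN graph as an adjacency dict {i: {j: euclidean_distance}}."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        # Keep the k nearest other points (ties broken by index).
        nearest = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)[:k]
        for d, j in nearest:
            graph[i][j] = d
            graph[j][i] = d  # undirected: i-j is an edge if either point picks the other
    return graph


def connected_components(graph):
    """Return the connected components of the graph as lists of node indices."""
    seen, comps = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in graph[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps


def pknng_graph(points, k=3):
    """Step 1: low-k k-NN graph. Step 2: join its components with penalized edges."""
    graph = knn_graph(points, k)
    lengths = [w for u in graph for w in graph[u].values()]
    mu = sum(lengths) / len(lengths) or 1.0  # mean k-NN edge length (penalty scale)
    for comp_a, comp_b in combinations(connected_components(graph), 2):
        # Hypothetical scheme: link each pair of components via its closest pair of points.
        d, i, j = min((math.dist(points[i], points[j]), i, j) for i in comp_a for j in comp_b)
        penalized = d * math.exp(d / mu)  # hypothetical exponential penalty
        graph[i][j] = graph[j][i] = penalized
    return graph


def dijkstra(graph, source):
    """Shortest-path lengths from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def pknng_distances(points, k=3):
    """Full pairwise distance matrix: shortest paths on the penalized graph."""
    graph = pknng_graph(points, k)
    matrix = []
    for i in range(len(points)):
        dist = dijkstra(graph, i)
        matrix.append([dist[j] for j in range(len(points))])
    return matrix
```

For two well-separated groups of points, within-group distances stay close to their Euclidean path lengths while any path crossing the penalized edge becomes very long. The resulting matrix could then be condensed with SciPy's `scipy.spatial.distance.squareform` and passed to `scipy.cluster.hierarchy.linkage` to obtain the kind of dendrogram the abstract mentions.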
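The Time–Adaptive Support Vector Machines item describes one hyperplane per temporal window, all learned jointly by minimizing the local classifiers' error plus a capacity-related term. The sketch below is only a stand-in under stated assumptions: it uses the hinge loss as the error, an L2 norm per hyperplane, and a hypothetical smoothness penalty (gamma/2)·||w_{t+1} − w_t||² in place of the authors' VC-dimension-related measure, and it trains with plain full-batch subgradient descent rather than any solver from the paper.

```python
def train_time_adaptive_svm(windows, dim, C=1.0, lam=0.01, gamma=1.0, lr=0.01, epochs=500):
    """Jointly fit one linear hyperplane (w_t, b_t) per temporal window.

    windows: list of windows, each a list of (x, y) pairs with y in {-1, +1}.
    Minimizes, by full-batch subgradient descent,
        C * sum_t sum_i hinge(y_i * (w_t . x_i + b_t))
        + (lam / 2) * sum_t ||w_t||^2
        + (gamma / 2) * sum_t ||w_{t+1} - w_t||^2
    where the last term (an assumption) keeps consecutive hyperplanes close.
    """
    T = len(windows)
    W = [[0.0] * dim for _ in range(T)]
    b = [0.0] * T
    for _ in range(epochs):
        # Gradient of the L2 regularizer.
        gW = [[lam * w for w in W[t]] for t in range(T)]
        gb = [0.0] * T
        # Subgradient of the hinge loss; each window uses only its own hyperplane.
        for t, window in enumerate(windows):
            for x, y in window:
                margin = y * (sum(w * xi for w, xi in zip(W[t], x)) + b[t])
                if margin < 1:
                    for d in range(dim):
                        gW[t][d] -= C * y * x[d]
                    gb[t] -= C * y
        # Gradient of the smoothness term coupling neighbouring windows.
        for t in range(T - 1):
            for d in range(dim):
                diff = W[t + 1][d] - W[t][d]
                gW[t + 1][d] += gamma * diff
                gW[t][d] -= gamma * diff
        # Update all hyperplanes together, so they are learned globally.
        for t in range(T):
            for d in range(dim):
                W[t][d] -= lr * gW[t][d]
            b[t] -= lr * gb[t]
    return W, b


def predict(W, b, t, x):
    """Classify x with the hyperplane of window t."""
    score = sum(w * xi for w, xi in zip(W[t], x)) + b[t]
    return 1 if score >= 0 else -1
```

With a concept that rotates slightly between two windows, each window keeps its own hyperplane, while the coupling term lets the earlier hyperplane borrow some of the later one's orientation. Raising `gamma` pushes the family toward a single shared hyperplane; lowering it lets each window fit independently.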
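The single-assessor strategy in the sensory-evaluation item (one independent discriminant model per judge, then a combination of their outputs) can be illustrated with a minimal sketch. The abstract names random forest, penalized discriminant analysis and discriminant partial least squares as the classifiers, and it does not spell out the combination rule. Here a nearest-centroid classifier stands in for those methods and a majority vote is the assumed combination rule. `panel_average` builds the traditional baseline input.

```python
from collections import Counter


def fit_centroids(samples):
    """Nearest-centroid classifier; samples is a list of (attribute_vector, product_label)."""
    sums, counts = {}, Counter()
    for x, label in samples:
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] += 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_centroid(centroids, x):
    """Return the label whose centroid is closest (squared Euclidean distance) to x."""
    return min(centroids, key=lambda label: sum((c - v) ** 2 for c, v in zip(centroids[label], x)))


def fit_per_assessor(data):
    """One independent model per assessor; data maps assessor -> list of (vector, label)."""
    return {assessor: fit_centroids(samples) for assessor, samples in data.items()}


def predict_by_vote(models, evaluations):
    """Combine assessor-level predictions for one sample by majority vote.

    evaluations maps assessor -> that assessor's attribute vector for the sample.
    """
    votes = Counter(predict_centroid(models[a], x) for a, x in evaluations.items())
    return votes.most_common(1)[0][0]


def panel_average(evaluations):
    """Baseline input: the attribute-wise mean over all assessors for one sample."""
    vectors = list(evaluations.values())
    return [sum(column) / len(vectors) for column in zip(*vectors)]
```

Because each model only ever sees its own assessor's scores, judges who use the scale differently (low consonance) do not blur each other's class centroids, which is the situation where the abstract reports the largest gains. The panel-average baseline would instead train a single model on `panel_average(...)` vectors. With an odd number of assessors, two-class votes cannot tie.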