Package description

SOMbrero (‘Self Organizing Maps Bound to Realize Euclidean and Relational Outputs’) implements several variants of the stochastic Self-Organising Map (SOM) algorithm. It can handle both numeric and non-numeric data sets (contingency tables, graphs, or any ‘relational’ data described by a dissimilarity matrix).

See help(SOMbrero) for further details.

Numeric SOM

The numeric SOM is illustrated on the well-known iris data set. These data describe iris flowers with four numeric variables (Sepal.Length, Sepal.Width, Petal.Length and Petal.Width); a fifth variable, the flower species, is not used to train the SOM. This example is processed in the numeric SOM guide.
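As a minimal sketch of this case, the call below trains a numeric SOM on the four numeric columns of iris with `trainSOM()`. The 5 x 5 grid and the seed are illustrative choices made here, not values prescribed by the package, and the numeric SOM guide remains the reference for the full workflow.

```r
library(SOMbrero)

# Fix the seed: the algorithm is stochastic, so results vary between runs
set.seed(255)

# Train on the four numeric variables only; Species is left out of training
iris.som <- trainSOM(x.data = iris[, 1:4], dimension = c(5, 5),
                     type = "numeric", verbose = FALSE)

# Cross-tabulate the final clusters (neurons) against the held-out species
table(iris.som$clustering, iris$Species)
```

The `clustering` component gives, for each flower, the index of the neuron it is assigned to, so the cross-tabulation shows how well the map separates the three species.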

Contingency tables

The SOM algorithm provided by SOMbrero can also handle some non-numeric data. The first case is data described by a contingency table, which can be processed with the ‘korresp’ algorithm (see Cottrell et al., 2004, 2005). This case is illustrated on the presidentielles2002 data set, which contains the number of votes in the first round of the 2002 French presidential election for each French administrative department (rows) and each candidate (columns). This example is used in the korresp user guide.
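A contingency table is passed to the same `trainSOM()` function with `type = "korresp"`. The following is only a sketch: the 8 x 8 grid, the number of iterations and the seed are assumptions chosen for illustration, and the korresp user guide describes the recommended settings.

```r
library(SOMbrero)

# Load the contingency table (departments in rows, candidates in columns)
data(presidentielles2002)

set.seed(1407)  # arbitrary seed, for reproducibility only

# Train a korresp SOM on the contingency table
korresp.som <- trainSOM(x.data = presidentielles2002, dimension = c(8, 8),
                        type = "korresp", maxit = 500, verbose = FALSE)

# Number of items assigned to each neuron of the map
table(korresp.som$clustering)
```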

Dissimilarity matrices

Data described by a dissimilarity matrix can also be processed by SOMbrero, as described in Olteanu et al., 2015a. This case is illustrated on a data set extracted from the novel Les Misérables, written by the French author Victor Hugo and published in the 19th century. The data set provides a dissimilarity matrix between the characters of the novel, based on the length of the shortest paths in a network defined from the novel. This example is provided in the relational user guide.
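For dissimilarity data, the matrix itself is given to `trainSOM()` with `type = "relational"`. The sketch below assumes that `data(lesmis)` makes the dissimilarity matrix available as `dissim.lesmis`; the grid size and the seed are illustrative choices, and the relational user guide covers the complete analysis.

```r
library(SOMbrero)

# Load the Les Misérables data; dissim.lesmis is the character-by-character
# dissimilarity matrix
data(lesmis)

set.seed(622)  # arbitrary seed, for reproducibility only

# Train a relational SOM directly on the dissimilarity matrix
mis.som <- trainSOM(x.data = dissim.lesmis, dimension = c(5, 5),
                    type = "relational", verbose = FALSE)

# Number of characters assigned to each neuron of the map
table(mis.som$clustering)
```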

For those who have an R developer's soul and want to help improve this package, the following picture provides an overview of the current structure of the package:

Session information

This vignette was built in the following environment:

## R version 4.0.2 (2020-06-22)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] SOMbrero_1.3-1 markdown_1.1   igraph_1.2.5  
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.5           compiler_4.0.2       pillar_1.4.6        
##  [4] tools_4.0.2          digest_0.6.25        lubridate_1.7.9     
##  [7] checkmate_2.0.0      evaluate_0.14        memoise_1.1.0       
## [10] lifecycle_0.2.0      tibble_3.0.3         gtable_0.3.0        
## [13] png_0.1-7            pkgconfig_2.0.3      rlang_0.4.7         
## [16] rstudioapi_0.11      yaml_2.2.1           pkgdown_1.5.1       
## [19] xfun_0.16            interp_1.0-33        metR_0.7.0          
## [22] stringr_1.4.0        knitr_1.29           generics_0.0.2      
## [25] desc_1.2.0           fs_1.5.0             vctrs_0.3.2         
## [28] scatterplot3d_0.3-41 rprojroot_1.3-2      grid_4.0.2          
## [31] data.table_1.13.0    glue_1.4.1           R6_2.4.1            
## [34] rmarkdown_2.3        deldir_0.1-28        ggplot2_3.3.2       
## [37] magrittr_1.5         backports_1.1.8      scales_1.1.1        
## [40] htmltools_0.5.0      ellipsis_0.3.1       MASS_7.3-51.6       
## [43] ggwordcloud_0.5.0    assertthat_0.2.1     colorspace_1.4-1    
## [46] stringi_1.4.6        munsell_0.5.0        crayon_1.3.4