Learn Plant Science

Principal component analysis (PCA) in RStudio

Posted on March 18, 2023 (updated May 3, 2023) by Janith Piumal
Category: Statistics – Experimental Design & Data Analysis Using R
######### Get the data#################
data("iris")
str(iris)

Output: 
'data.frame':	150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

######### Build the training and test datasets #########
set.seed(111)
ind <- sample(2, nrow(iris),
              replace = TRUE,
              prob = c(0.8, 0.2))
training <- iris[ind==1,]
testing <- iris[ind==2,]
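
Because `sample()` assigns each row to a partition independently with probability 0.8 or 0.2, the split is only approximately 80/20. The sketch below checks the sizes that result and, as an alternative that is not part of the original post, draws an exact 80% training set. The names `n_train`, `idx`, `training_exact` and `testing_exact` are illustrative.

```r
data("iris")
set.seed(111)
ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.8, 0.2))
training <- iris[ind == 1, ]
testing  <- iris[ind == 2, ]

# Rows per partition and the realized training share
table(ind)
nrow(training) / nrow(iris)

# Species balance inside each partition
table(training$Species)
table(testing$Species)

# Alternative (assumption, not the post's method): an exact 80/20 split
n_train <- round(0.8 * nrow(iris))
idx <- sample(nrow(iris), size = n_train)
training_exact <- iris[idx, ]
testing_exact  <- iris[-idx, ]
c(train = nrow(training_exact), test = nrow(testing_exact))
```

The exact counts in the first split depend on the seed and on the R version, since the default sampling algorithm changed in R 3.6.0.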

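######### Scatter plots and correlations: check the correlation between variables #########
# Install psych only if it is not already available
if (!requireNamespace("psych", quietly = TRUE)) install.packages("psych")
library(psych)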
pairs.panels(training[,-5],
             gap = 1,
             bg = c("orange", "pink", "yellow")[training$Species],
             pch=22)

Output:

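The scatter-plot matrix shows that petal length and petal width, sepal length and petal length, and sepal length and petal width are highly correlated. Using such strongly correlated variables together as predictors leads to multicollinearity. PCA addresses this by replacing them with uncorrelated principal components.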
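
The correlations read off the plot can also be checked numerically. The following is a minimal sketch using base R only; the 0.8 cutoff for "highly correlated" is an arbitrary illustrative choice, and `cor_mat` and `high` are names introduced here.

```r
data("iris")
set.seed(111)
ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.8, 0.2))
training <- iris[ind == 1, ]

# Pearson correlation matrix of the four numeric variables
cor_mat <- cor(training[, -5])
round(cor_mat, 2)

# List each variable pair once (upper triangle) whose |r| exceeds the cutoff
high <- which(abs(cor_mat) > 0.8 & upper.tri(cor_mat), arr.ind = TRUE)
data.frame(var1 = rownames(cor_mat)[high[, "row"]],
           var2 = colnames(cor_mat)[high[, "col"]],
           r    = round(cor_mat[high], 2))
```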

#############Principal Component Analysis########################
pca <- prcomp(training[,-5],
             center = TRUE,
             scale. = TRUE)
attributes(pca)

Output:
$names
[1] "sdev"     "rotation" "center"   "scale"    "x"

$class
[1] "prcomp"

pca$center

Output:
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
         5.8          3.1          3.6          1.1

pca$scale

Output:
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
        0.82         0.46         1.79         0.76
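
With `center = TRUE` and `scale. = TRUE`, the `center` and `scale` components are the column means and standard deviations of the training data. A short sketch to confirm this, where `manual_center` and `manual_scale` are names introduced for illustration:

```r
data("iris")
set.seed(111)
ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.8, 0.2))
training <- iris[ind == 1, ]
pca <- prcomp(training[, -5], center = TRUE, scale. = TRUE)

# Column means and sample standard deviations computed by hand
manual_center <- colMeans(training[, -5])
manual_scale  <- apply(training[, -5], 2, sd)

# Both comparisons should report TRUE
all.equal(pca$center, manual_center)
all.equal(pca$scale, manual_scale)
```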

print(pca)

Output:
Standard deviations (1, .., p=4):
[1] 1.7173318 0.9403519 0.3843232 0.1371332

Rotation (n x k) = (4 x 4):
                    PC1         PC2        PC3        PC4
Sepal.Length  0.5147163 -0.39817685  0.7242679  0.2279438
Sepal.Width  -0.2926048 -0.91328503 -0.2557463 -0.1220110
Petal.Length  0.5772530 -0.02932037 -0.1755427 -0.7969342
Petal.Width   0.5623421 -0.08065952 -0.6158040  0.5459403
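
Each column of the rotation matrix holds the weights that turn the standardized variables into one principal component. As a sketch of how these loadings are used, the code below rebuilds the training scores by hand and then projects the held-out test rows with `predict()`, which applies the training center and scale before rotating. The names `Z`, `scores_manual` and `test_scores` are introduced here for illustration.

```r
data("iris")
set.seed(111)
ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.8, 0.2))
training <- iris[ind == 1, ]
testing  <- iris[ind == 2, ]
pca <- prcomp(training[, -5], center = TRUE, scale. = TRUE)

# Standardize with the training means and standard deviations, then rotate
Z <- scale(as.matrix(training[, -5]), center = pca$center, scale = pca$scale)
scores_manual <- Z %*% pca$rotation

# Should be TRUE: the manual scores match the ones stored in pca$x
all.equal(scores_manual, pca$x, check.attributes = FALSE)

# Project the test set onto the same components
test_scores <- predict(pca, newdata = testing[, -5])
head(test_scores)
```

Projecting the test set with the training parameters, rather than fitting a second PCA on it, keeps both sets in the same coordinate system.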

########### Summary of the PCA ###########
summary(pca)

Output:
Importance of components:
                          PC1    PC2     PC3    PC4
Standard deviation     1.7173 0.9404 0.38432 0.1371
Proportion of Variance 0.7373 0.2211 0.03693 0.0047
Cumulative Proportion  0.7373 0.9584 0.99530 1.0000
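
The "Proportion of Variance" row follows directly from the standard deviations: each component's variance is its standard deviation squared, divided by the total variance. With four standardized variables the total is 4, so for PC1 this gives 1.7173318² / 4 ≈ 0.7373. A minimal sketch of the same calculation, where `var_explained`, `prop_var` and `cum_var` are illustrative names and the 95% threshold is an arbitrary example:

```r
data("iris")
set.seed(111)
ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.8, 0.2))
training <- iris[ind == 1, ]
pca <- prcomp(training[, -5], center = TRUE, scale. = TRUE)

# Variance of each component (the eigenvalues of the correlation matrix)
var_explained <- pca$sdev^2

# Proportion and cumulative proportion of variance
prop_var <- var_explained / sum(var_explained)
cum_var  <- cumsum(prop_var)
round(rbind(prop_var, cum_var), 4)

# For standardized data the variances sum to the number of variables
sum(var_explained)

# Smallest number of components reaching at least 95% of the variance
which(cum_var >= 0.95)[1]
```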

###### Scatter plots: check the correlation between the principal components ######
pairs.panels(pca$x,
             gap = 0,
             bg = c("orange", "pink", "yellow")[training$Species],
             pch = 22)

Output:

The principal components are uncorrelated with one another (all pairwise correlations are zero), so using them in place of the original variables removes the multicollinearity issue.
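
The same check can be made numerically without a plot. In this sketch the correlation matrix of the scores should be the identity matrix up to floating-point error; `pc_cor` is a name introduced here.

```r
data("iris")
set.seed(111)
ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.8, 0.2))
training <- iris[ind == 1, ]
pca <- prcomp(training[, -5], center = TRUE, scale. = TRUE)

# Correlations between the principal component scores
pc_cor <- cor(pca$x)
round(pc_cor, 10)

# Largest absolute deviation from the identity matrix
max(abs(pc_cor - diag(ncol(pc_cor))))
```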

######### Explain the PCA using a biplot #########
# ggbiplot is installed from GitHub here; run install.packages("devtools") first if devtools is missing
library(devtools)
install_github("vqv/ggbiplot")
library(ggbiplot)
g <- ggbiplot(pca,
              obs.scale = 1,
              var.scale = 1,
              groups = training$Species,
              ellipse = TRUE,
              circle = TRUE,
              ellipse.prob = 0.68)
g <- g + scale_color_discrete(name = '')
g <- g + theme(legend.direction = 'horizontal',
               legend.position = 'top')
print(g)

Output:

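A biplot shows the observations (as principal component scores) and the original variables (as arrows) on the same axes, which helps in understanding what is happening in the data set.

  • PC1 is positively correlated with Petal Length, Petal Width, and Sepal Length, and negatively correlated with Sepal Width.
  • PC2 is strongly negatively correlated with Sepal Width and, more weakly, with Sepal Length (loadings of -0.91 and -0.40 in the rotation matrix).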
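
The correlations described in the two points above can be computed directly. For a PCA on standardized variables, the correlation between a variable and a component equals its loading multiplied by that component's standard deviation. The sketch below computes both versions; `var_pc_cor` and `scaled_loadings` are names introduced for illustration. The sign of each component is arbitrary, so a different R build may flip a whole column.

```r
data("iris")
set.seed(111)
ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.8, 0.2))
training <- iris[ind == 1, ]
pca <- prcomp(training[, -5], center = TRUE, scale. = TRUE)

# Correlation of each original variable with each component's scores
var_pc_cor <- cor(training[, -5], pca$x)
round(var_pc_cor[, 1:2], 2)

# Same quantity from the loadings: multiply each column by its sdev
scaled_loadings <- sweep(pca$rotation, 2, pca$sdev, "*")

# Should be TRUE
all.equal(var_pc_cor, scaled_loadings, check.attributes = FALSE)
```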

References

  • Principal component analysis (PCA) in R. (2021, May 7). R-bloggers. https://www.r-bloggers.com/2021/05/principal-component-analysis-pca-in-r/
