The getting started vignette explains how to apply the core functions of the imagefluency package to an image. However, often you want to analyze several images at once. The following tutorial provides a walkthrough of batch-analyzing images with the imagefluency package.

### Reading all images from a folder

Before we can start analyzing the images, we need to load them into R. In this example, we choose to read all images that are shipped with the imagefluency package. However, you can replace the file path with any file path on your computer.

Next, we read the file names of all images that match a specific pattern using list.files(). In our case, we want the file names of all files that have .jpg as their file extension, so we specify pattern = '*.jpg$'. The $ sign at the end ensures that we only find files that end with .jpg; a file like image1.jpg.png would therefore not be listed. Multiple patterns can also be specified (e.g., pattern = '*.jpg$|*.png$'). Note that we do not actually load the images into our R session here. Instead, using list.files() we just extract the images’ file paths for later use.
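As an aside, list.files() interprets pattern as a regular expression (the same engine that grepl() uses), so a slightly stricter pattern escapes the dot to match a literal '.' rather than any character. A quick toy check (the file names below are made up for illustration):

```r
# hypothetical file names for illustration
files <- c('image1.jpg', 'image1.jpg.png', 'photo.JPG')

# strict pattern: escaped dot, matches only names ending in '.jpg'
grepl('\\.jpg$', files)
# case-insensitive variant that also catches '.JPG'
grepl('\\.jpg$', files, ignore.case = TRUE)
```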

# path to images (here: example images from the imagefluency package)
#
# --NOTE: replace with your folder of interest, e.g.
#         mypath <- "C:/Users/NAME/Documents/Project/images/"
mypath <- system.file("example_images", package = "imagefluency")

# --NOTE: change file ending accordingly!
fileNames <- list.files(mypath, pattern = '*.jpg$', full.names = TRUE)

### Setting up the analysis

In the next step, we load the imagefluency package and create a data frame to store the results. In this tutorial, we’ll use a simple for loop to iterate over all images in the folder, get the image fluency scores, and write the results into the data frame. This is the simplest approach if you have many images and might run into memory issues when loading all images at once. Of course, it is also possible to run the analyses using lapply() or purrr::map() or their parallel counterparts (see Speeding up the analysis below).

# load imagefluency package
library(imagefluency)

# data frame to store results
results <- data.frame(filename = basename(fileNames),
                      contrast = NA, selfsimilarity = NA,
                      simplicity = NA, symmetry = NA,
                      typicality = NA)

### Looping over all images

Now it’s time to calculate the image fluency scores! To this end, we create a big loop that iterates over all image files. Within the loop, the image is first read into R. Next, the scores for contrast, self-similarity, simplicity, and symmetry are computed and stored in the results data frame we created before. We use tryCatch() in the loop so that the loop does not break if there are any errors with a specific image; instead, the result is simply NA if there is an error. Note that we only compute vertical symmetry in the code below. Optionally, you can also convert all images to grayscale to speed up the computation. Typicality is computed after the loop, because img_typicality() needs all images at once.

# big loop over all images
for (i in seq_along(fileNames)) {
  # print loop info
  cat('** Estimating scores for image', i, '**\n')
  # 1. read image into R
  img <- img_read(fileNames[i])
  # 2. convert image to grayscale to speed up computation (optional)
  # --NOTE: remove the comment sign in the following line to apply grayscale conversion
  # img <- rgb2gray(img)
  # 3. estimate image fluency scores (except typicality)
  #    and store the results
  results$contrast[i] <- tryCatch(img_contrast(img), error = function(e) NA)
  results$selfsimilarity[i] <- tryCatch(img_self_similarity(img), error = function(e) NA)
  results$simplicity[i] <- tryCatch(img_simplicity(img), error = function(e) NA)
  results$symmetry[i] <- tryCatch(img_symmetry(img, horizontal = FALSE), error = function(e) NA)
}

# compute typicality on all images at once and add it to the results
results$typicality <- as.vector(img_typicality(lapply(fileNames, img_read)))

### Saving the results

All done! We have now computed all image fluency scores. Let’s have a look at the results.

knitr::kable(results, digits = 3)
| filename         | contrast | selfsimilarity | simplicity | symmetry | typicality |
|------------------|---------:|---------------:|-----------:|---------:|-----------:|
| berries.jpg      |    0.287 |         -1.784 |      0.187 |    0.524 |      0.478 |
| bike.jpg         |    0.082 |         -0.159 |      0.399 |    0.419 |      0.234 |
| bridge.jpg       |    0.256 |         -1.112 |      0.234 |    0.480 |      0.437 |
| fireworks.jpg    |    0.175 |         -1.036 |      0.431 |    0.658 |     -0.130 |
| office.jpg       |    0.227 |         -1.217 |      0.188 |    0.171 |      0.347 |
| rails.jpg        |    0.364 |         -0.578 |      0.536 |    0.955 |      0.797 |
| romanesco.jpg    |    0.292 |         -0.032 |      0.363 |    0.825 |      0.281 |
| sky.jpg          |    0.081 |         -1.562 |      0.580 |    0.647 |      0.053 |
| trees.jpg        |    0.189 |         -0.551 |      0.105 |    0.191 |      0.281 |
| valley_green.jpg |    0.246 |         -0.507 |      0.252 |    0.803 |      0.633 |
| valley_white.jpg |    0.188 |         -0.590 |      0.191 |    0.601 |      0.579 |
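Before saving, you can of course also inspect the scores directly in R, for instance by ordering the images by one of the scores. A minimal sketch with a toy stand-in for the results data frame from above (hypothetical file names and numbers):

```r
# toy stand-in for the 'results' data frame
res <- data.frame(filename = c('a.jpg', 'b.jpg', 'c.jpg'),
                  symmetry = c(0.2, 0.9, 0.5))

# order images by symmetry, most symmetric first
res_sorted <- res[order(res$symmetry, decreasing = TRUE), ]
res_sorted$filename
```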

If everything worked, we can save the results e.g. into a .csv file.

# save the results into the image folder
write.csv(results, file = file.path(mypath, 'imagefluency_scores.csv'), row.names = FALSE)

## Speeding up the analysis

If you plan to include typicality in your image analysis alongside the other scores, there is no need for the big loop from before: computing typicality requires loading all images into memory anyway. Thus, we can do this step right at the beginning of our analysis and compute all image fluency scores from there.

### Reading all images into memory using lapply()

In the example below, I’ll use base R’s lapply() function to read all images. However, you can of course use purrr::map() just as well.

Using lapply() is very simple: We tell R to apply the img_read() function to every element in the fileNames object. The result is a list where every list element contains an image matrix. Thus, this object might be very big!

# read all images at once into memory
allImages <- lapply(fileNames, img_read)

The above code works perfectly fine if every image can be read without problems (which is usually the case). However, if we want to have a more robust version, we can also define a custom img_read() function that includes error handling with tryCatch.

# define custom image read function that catches errors and returns NA
img_read_no_error <- function(imgpath) {
  tryCatch(img_read(imgpath), error = function(e) NA)
}

# read images with custom function
allImages <- lapply(fileNames, img_read_no_error)
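With the error-handling wrapper in place, it can be useful to flag and drop failed reads before starting the analysis. A minimal sketch with toy data (a 'read' image represented by a matrix, and a failed read represented by NA, as the wrapper above would return; the variable names are hypothetical):

```r
# toy data: one 'read' image (a matrix) and one failed read (NA)
imgs  <- list(matrix(runif(4), nrow = 2), NA)
paths <- c('ok.jpg', 'broken.jpg')

# flag list elements that are a single NA (i.e., failed reads)
failed <- vapply(imgs, function(x) length(x) == 1L && all(is.na(x)), logical(1))

# drop failed images and their file names before the analysis
imgs  <- imgs[!failed]
paths <- paths[!failed]
```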

### Computing image fluency scores using lapply()

Once we have all images read into memory, we can compute the image fluency scores for all images. To facilitate the process for contrast, self-similarity, simplicity, and symmetry, we define a function that computes the scores for a given image, and then apply the function to all images.

# define function that computes all image fluency scores except typicality
img_fluency_scores <- function(img) {
  contr <- tryCatch(img_contrast(img), error = function(e) NA)
  selfsim <- tryCatch(img_self_similarity(img), error = function(e) NA)
  simpl <- tryCatch(img_simplicity(img), error = function(e) NA)
  sym <- tryCatch(img_symmetry(img, horizontal = FALSE), error = function(e) NA)
  return(c(contrast = contr,
           selfsimilarity = selfsim,
           simplicity = simpl,
           symmetry = unname(sym)))
}

# apply function to images
results <- lapply(allImages, img_fluency_scores)

Once that’s done, we can convert the results into a data frame and add the images’ file names.

# convert results to data frame and add file names
results <- do.call(rbind, results)
results <- data.frame(filename = basename(fileNames), results)
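As a small self-contained illustration of what do.call(rbind, ...) does with a list of named score vectors (toy numbers, not real scores):

```r
# each list element mimics the named vector returned by img_fluency_scores()
scores <- list(c(contrast = 0.1, symmetry = 0.5),
               c(contrast = 0.3, symmetry = 0.9))

# rbind stacks the vectors into a matrix; the names become column names
m  <- do.call(rbind, scores)
df <- data.frame(filename = c('a.jpg', 'b.jpg'), m)
```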

In the final step, we add the typicality scores as in the previous section and look at the results.

# compute and add typicality to results
results$typicality <- as.vector(img_typicality(allImages))

# print a nicely formatted results table
knitr::kable(results, digits = 3)

| filename         | contrast | selfsimilarity | simplicity | symmetry | typicality |
|------------------|---------:|---------------:|-----------:|---------:|-----------:|
| berries.jpg      |    0.287 |         -1.784 |      0.187 |    0.524 |      0.478 |
| bike.jpg         |    0.082 |         -0.159 |      0.399 |    0.419 |      0.234 |
| bridge.jpg       |    0.256 |         -1.112 |      0.234 |    0.480 |      0.437 |
| fireworks.jpg    |    0.175 |         -1.036 |      0.431 |    0.658 |     -0.130 |
| office.jpg       |    0.227 |         -1.217 |      0.188 |    0.171 |      0.347 |
| rails.jpg        |    0.364 |         -0.578 |      0.536 |    0.955 |      0.797 |
| romanesco.jpg    |    0.292 |         -0.032 |      0.363 |    0.825 |      0.281 |
| sky.jpg          |    0.081 |         -1.562 |      0.580 |    0.647 |      0.053 |
| trees.jpg        |    0.189 |         -0.551 |      0.105 |    0.191 |      0.281 |
| valley_green.jpg |    0.246 |         -0.507 |      0.252 |    0.803 |      0.633 |
| valley_white.jpg |    0.188 |         -0.590 |      0.192 |    0.601 |      0.579 |

## Parallel processing with multiple cores

The code above using lapply() should already be faster than the first approach of running a big loop and then reading in all images again to compute typicality. However, we can gain a much larger increase in speed by using parallelized versions of lapply() or purrr::map(), respectively. The following code example briefly demonstrates how to achieve this. Essentially, all you have to do is replace all instances of lapply() from before with parallel::mclapply() (multi-core lapply). In the example below, however, I’ll use pbmcapply::pbmclapply(), which adds a progress bar (pb) to the multi-core lapply.

# detect cores for multi-core processing
ncores <- parallel::detectCores()

# read images with multi-core lapply and progress bar
allImages <- pbmcapply::pbmclapply(fileNames, img_read_no_error, mc.cores = ncores)

# compute image fluency scores
results <- pbmcapply::pbmclapply(allImages, img_fluency_scores, mc.cores = ncores)

# convert results to data frame and add file names
results <- do.call(rbind, results)
results <- data.frame(filename = basename(fileNames), results)

# compute and add typicality to results
results$typicality <- as.vector(img_typicality(allImages))
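One caveat: parallel::mclapply() and pbmcapply::pbmclapply() rely on forking, which is not available on Windows (there, mc.cores must be 1). A portable alternative is a socket cluster with parallel::parLapply(). The sketch below uses a toy squaring function in place of img_fluency_scores(); for the real analysis you would additionally load imagefluency on the workers (e.g., via parallel::clusterEvalQ()) and export your helper functions with parallel::clusterExport():

```r
library(parallel)

# portable multi-core setup: socket cluster (works on Windows, too)
cl <- makeCluster(2)

# toy function standing in for img_fluency_scores()
res <- parLapply(cl, 1:4, function(x) x^2)

stopCluster(cl)
```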

Let’s do a quick comparison of the computing time when computing the four image fluency scores with and without multi-core processing. Note that with increasing file size or number of images, this speed gain will become larger. We ‘simulate’ this by reading in the images 10 times.

# read images 10 times
allImages <- lapply(rep(fileNames, 10), img_read_no_error)
cat('Number of images:', length(allImages), '\n')
> Number of images: 110
# compute image fluency scores using lapply() with progress bar
tictoc::tic("lapply")
results_lapply <- pbapply::pblapply(allImages, img_fluency_scores)
tictoc::toc(log=TRUE)
> lapply: 8.73 sec elapsed
# compute image fluency scores using multi-core lapply() with progress bar
tictoc::tic("pbmclapply")
results_pbmclapply <- pbmcapply::pbmclapply(allImages, img_fluency_scores,
                                            mc.cores = ncores)
tictoc::toc(log=TRUE)
> pbmclapply: 2.44 sec elapsed

We can see that the multi-core approach is more than three times faster (in this single test run) than the single-core approach! Let’s see how furrr::future_map() compares to this.

library(dplyr)
future::plan('multisession')

tictoc::tic('future_map')
results_future_map <- allImages %>%
  furrr::future_map(img_fluency_scores,
                    .progress = TRUE,
                    .options = furrr::furrr_options(seed = TRUE))
tictoc::toc(log=TRUE)
> future_map: 2.98 sec elapsed

Finally, we confirm that all results are the same.

all.equal(results_lapply, results_pbmclapply)
> [1] TRUE
all.equal(results_lapply, results_future_map)
> [1] TRUE