--- title: "Introduction to greenfeedr" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction to greenfeedr} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` When working with GreenFeed system you must: * Retrieve data from each GreenFeed unit * Check animal visitation * Filtering and process data for further analysis The greenfeedr makes these steps fast and easy: * It provides functions for downloading, reporting, and processing GreenFeed data This document introduces you to greenfeedr's set of tools, and shows you how to apply them to data frames. ```{r setup} # install.packages("greenfeedr") library(greenfeedr) ``` ## Download data with `get_gfdata()` The first step in working with GreenFeed is to retrieve your data from the system. This can be done manually by logging into the GreenFeed web interface [C-Lock Inc.](https://ext.c-lockinc.com/greenfeed.php) using your username and password. However, there is a simpler approach: that is via the API using `get_gfdata()` function. This function automates the data retrieval process, allowing you to specify user and password, units, and date range to download the data directly from C-Lock Inc. server. Note that if you have multiple studies simultaneously, you can define a list of studies as follows: ```{r download data} USER <- "your_username" PASS <- "your_password" studies <- list( list( Experiment = "Experiment_01", Unit = c(2, 3), StartDate = "2024-01-20", EndDate = Sys.Date(), save_dir = "/tempdir()/Experiment_01/" ), list( Experiment = "Experiment_02", Unit = c(212), StartDate = "2024-02-01", EndDate = Sys.Date(), save_dir = "/tempdir()/Experiment_02/" ) ) # Here you loop (using 'for') over all your studies applying get_gfdata() function # for (element in studies) { # get_gfdata(USER, PASS, element$Experiment, element$Unit, element$StartDate, element$EndDate, element$save_dir) # } ``` ## How it looks the GreenFeed data? The package provides daily and final datasets for you to explore the basic format of GreenFeed data. The data provided is actual data from a 32-cow study. ```{r data} # Open the daily data downloaded from C-Lock Inc. server daily_data <- readr::read_csv(system.file("extdata", "StudyName_GFdata.csv", package = "greenfeedr"), show_col_types = FALSE) # View the structure of the daily data str(daily_data) # View the first few rows of the daily data head(daily_data) # Open the finalized data received from C-Lock Inc. final_data <- readxl::read_excel(system.file("extdata", "StudyName_FinalReport.xlsx", package = "greenfeedr"), col_types = c("text", "text", "numeric", rep("date", 3), rep("numeric", 12), "text", rep("numeric", 6)) ) # View the structure of the daily data str(final_data) # View the first few rows of the daily data head(final_data) ``` ## Report data with `report_gfdata()` The next step in working with GreenFeed is to check animal visitation and gases production on animals and on a daily basis. The `report_gfdata()` function allows you to download the daily data and generates an easy-to-read report to check your GreenFeed on the farm. In addition, you can use `report_gfdata()` to generate a final report of your study by providing the final data received from C-Lock Inc. three to four weeks after the end of your study. This function is very useful when you have several studies and units running simultaneously to check that the study is going in the right direction. ```{r report data, message = FALSE, echo = FALSE, warning = FALSE} library(dplyr) library(ggplot2) file <- system.file("extdata", "StudyName_FinalReport.xlsx", package = "greenfeedr") start_date <- "2024-05-13" end_date <- "2024-05-25" input_type <- "final" plot_opt <- "All" rfid_file <- NULL df <- readxl::read_excel(file, col_types = c("text", "text", "numeric", rep("date", 3), rep("numeric", 12), "text", rep("numeric", 6))) names(df)[1:14] <- c( "RFID", "AnimalName", "FeederID", "StartTime", "EndTime", "GoodDataDuration", "HourOfDay", "CO2GramsPerDay", "CH4GramsPerDay", "O2GramsPerDay", "H2GramsPerDay", "H2SGramsPerDay", "AirflowLitersPerSec", "AirflowCf" ) # df contains finalized GreenFeed data df <- df %>% ## Remove leading zeros from RFID col to match with IDs dplyr::mutate( RFID = gsub("^0+", "", RFID), ## Extract hours, minutes, and seconds from GoodDataDuration GoodDataDuration = round( as.numeric(substr(GoodDataDuration, 12, 13)) * 60 + # Hours to minutes # as.numeric(substr(GoodDataDuration, 1, 2)) * 60 + # Hours to minutes as.numeric(substr(GoodDataDuration, 15, 16)) + # Minutes # as.numeric(substr(GoodDataDuration, 4, 5)) + as.numeric(substr(GoodDataDuration, 18, 19)) / 60, # Seconds to minutes # as.numeric(substr(GoodDataDuration, 7, 8)) / 60, 2 ) ) %>% ## Remove data with Airflow below the threshold (25 l/s) and data in the time range selected dplyr::filter( AirflowLitersPerSec >= 25, as.Date(StartTime) >= as.Date(start_date) & as.Date(StartTime) <= as.Date(end_date) ) cols_to_convert <- c("CH4GramsPerDay", "CO2GramsPerDay", "O2GramsPerDay", "H2GramsPerDay") df[cols_to_convert] <- lapply(df[cols_to_convert], as.numeric) # Plot 1: Total number of production records per day plot1 <- ggplot(as.data.frame(table(as.Date(df$StartTime))), aes(x = Var1, y = Freq)) + geom_col(color = "black") + labs( title = "Total Records Per Day", x = "", y = "Total Records" ) + geom_text(aes(label = Freq), vjust = -0.5, color = "black", size = 2.2, position = position_dodge(width = 0.9)) + theme_classic() + theme( plot.title = element_text(size = 11, face = "bold"), axis.text.x = element_text(angle = 45, hjust = 1.05, size = 8), axis.title.y = element_text(size = 10, face = "bold"), legend.position = "none" ) plot1 # Assuming RFIDfile is provided or not, set the grouping variable group_var <- if (!is.null(rfid_file) && is.data.frame(rfid_file) && nrow(rfid_file) > 0) "FarmName" else "RFID" farmname_order <- df %>% dplyr::mutate(day = as.Date(EndTime)) %>% dplyr::group_by(!!sym(group_var), day) %>% dplyr::summarise( n = n(), daily_CH4 = weighted.mean(CH4GramsPerDay, GoodDataDuration, na.rm = TRUE) ) %>% dplyr::group_by(!!sym(group_var)) %>% dplyr::summarise( n = sum(n), daily_CH4 = mean(daily_CH4, na.rm = TRUE) ) %>% dplyr::arrange(desc(daily_CH4)) %>% dplyr::pull(!!sym(group_var)) # Plot 1: Total number of records per animal plot1 <- df %>% dplyr::mutate(day = as.Date(EndTime)) %>% dplyr::group_by(!!sym(group_var), day) %>% dplyr::summarise( n = n(), daily_CH4 = weighted.mean(CH4GramsPerDay, GoodDataDuration, na.rm = TRUE) ) %>% dplyr::group_by(!!sym(group_var)) %>% dplyr::summarise( n = sum(n), daily_CH4 = mean(daily_CH4, na.rm = TRUE) ) %>% ggplot(aes(x = factor(!!sym(group_var), levels = farmname_order), y = n)) + geom_bar(stat = "identity", position = position_dodge()) + labs( title = "Total Records Per Animal", x = "", y = "Total Records" ) + theme_classic() + theme( plot.title = element_text(size = 11, face = "bold"), axis.text.x = element_text(angle = 45, hjust = 1.05, size = 8), axis.title.y = element_text(size = 10, face = "bold"), legend.position = "none" ) + geom_text(aes(label = n), vjust = -1, color = "black", position = position_dodge(width = 0.9), size = 2.2) # Plot distribution of records throughout the day plot2 <- df %>% dplyr::mutate(AMPM = case_when( HourOfDay >= 22 ~ "10PM-4AM", HourOfDay < 4 ~ "10PM-4AM", HourOfDay >= 4 & HourOfDay < 10 ~ "4AM-10AM", HourOfDay >= 10 & HourOfDay < 16 ~ "10AM-4PM", HourOfDay >= 16 & HourOfDay < 22 ~ "4PM-10PM", TRUE ~ NA_character_ )) %>% dplyr::group_by(!!sym(group_var), AMPM) %>% dplyr::summarise(n = n()) %>% ggplot(aes( x = factor(!!sym(group_var), levels = farmname_order), y = n, fill = factor(AMPM, levels = c("10PM-4AM", "4AM-10AM", "10AM-4PM", "4PM-10PM")) )) + geom_bar(stat = "identity", position = "fill") + labs( title = "Daily Records Distribution", x = "", y = "Percentage of total records", fill = "Time-Windows (24h)" ) + theme_classic() + theme( plot.title = element_text(size = 11, face = "bold"), axis.text.x = element_text(angle = 45, hjust = 1.05, size = 8), legend.position = "bottom", axis.title.y = element_text(size = 10, face = "bold") ) + scale_fill_brewer(palette = "BrBG") + scale_y_continuous(breaks = c(0, 0.25, 0.50, 0.75, 1), labels = c("0%", "25%", "50%", "75%", "100%"), expand = c(0, 0)) # Create the plots plot1 plot2 ``` ## Process data with `process_gfdata()` #### Using process_gfdata() Before proceeding with analysis, it's useful to summarize key aspects of the dataset: * Total number of records * Records per day * Days with records per week * Weeks with records To analyze the data effectively, we will use `process_gfdata()`, which requires three arguments: * **`param1`** is the number of records per day. * This parameter controls the minimum number of records that must be present for each day in the dataset to be considered valid. * **`param2`** is the number of days with records per week. * This parameter ensures that a minimum number of days within a week have valid records to be included in the analysis. * **`min_time`** is the minimum duration of a record. * This parameter specifies the minimum time threshold for each record to be considered valid. To determine the best-fitting parameters for our dataset, we will use the `eval_param()` function. ```{r parameters} # Define the parameter space for param1 (i), param2 (j), and min_time (k): i <- seq(1, 6) j <- seq(1, 7) k <- seq(2, 6) # Generate all combinations of i, j, and k param_combinations <- expand.grid(param1 = i, param2 = j, min_time = k) ``` In total, we have `r nrow(param_combinations)` different parameter combinations to evaluate. The next step is to evaluate the `process_gfdata()` function with the set of parameters we defined. The function can accept either a file path to the data files or the data as a data frame. ```{r Example, message = FALSE, results = 'hide'} finaldata <- readxl::read_excel(system.file("extdata", "StudyName_FinalReport.xlsx", package = "greenfeedr")) data <- eval_gfparam(data = finaldata, start_date = "2024-05-13", end_date = "2024-05-25" ) ``` After evaluating the function, the results will be placed into a data frame with the following structure: ```{r Results table, echo = FALSE, message = FALSE, results = 'asis'} cat(knitr::kable(data[1:10, ], format = "html", table.attr = "style='font-size: 12px;'")) ``` This will give users a sense of how the data is filtered based on the chosen parameters. A more conservative approach (i.e., stricter parameters) will typically result in fewer retained records and animals. ## Process pellet intakes and visits with `pellin()` and `viseat()` The greenfeedr includes additional functions to help you process daily entries and visits. To check animal visits, you must access the GreenFeed web interface and, in the [data](https://ext.c-lockinc.com/data.php) tab, select “Download Large Dataset” and define the time period for which you want to analyze data. In the folder that the system downloads to your computer, you will find a file 'feedtimes'. This is the file you will use as input for pellin and viseat. If you have more than one 'feedtimes' file because you are using multiple GreenFeed units in the same experiment then you just need to include them as a list of files. Note that you should include the result obtained from the 10-drops test. If units have different gram values, define 'gcup' as a vector with an element for each unit. ```{r pellin and viseat} file <- system.file("extdata", "feedtimes.csv", package = "greenfeedr") result <- pellin( unit = 1, gcup = 34, start_date = "2024-05-13", end_date = "2024-05-25", save_dir = tempdir(), file_path = file ) head(result) ``` ## Citation