
naniar::recode_shadow() throwing error: "idx must contain one integer for each level of f" #272

Open
antondutoit opened this issue Sep 21, 2020 · 8 comments · Fixed by #273

Comments

@antondutoit

antondutoit commented Sep 21, 2020

I have been using recode_shadow() to label missing values which are due to questionnaire skips as 'NA_skip'. The function runs for one column without error, but when run over the second and subsequent columns it persistently throws an error:

Error: `idx` must contain one integer for each level of `f`

I thought the error message referred to empty factor levels, so I created dummy rows with values of NA and NA_skip, but this did not fix the problem.

```r
library(tidyverse)
library(naniar)

df <- data.frame(Q1 = c("yes", "no", "no", NA), Q2 = c("a", NA, NA, NA), Q3 = c(1, NA, NA, 4)) %>%
    mutate(Q1 = factor(Q1)) %>%
    mutate(Q2 = factor(Q2))

df_sh <- bind_shadow(df)

df_sh_recode <- df_sh

# Q1 is a filter question: people who answer "no" should skip Q2 and Q3
df_sh_recode <- df_sh_recode %>%
    recode_shadow(Q2 = .where(Q1 %in% "no" ~ "skip"))

df_sh_recode <- df_sh_recode %>%
    recode_shadow(Q3 = .where(Q1 %in% "no" ~ "skip"))
#> Error: `idx` must contain one integer for each level of `f`

# there are empty factor levels ...
df_sh_recode$Q1_NA %>% table(., useNA = "always")
#> .
#>     !NA      NA NA_skip    <NA> 
#>       3       1       0       0

# ... so, as a kludge, add dummy rows 5 and 6 to fill the empty factor levels,
# in an attempt to avoid the `idx must contain one integer ...` error
df_sh_recode[5, ] <- NA
df_sh_recode[5, 4:6] <- "NA_skip"
df_sh_recode[6, ] <- NA
df_sh_recode[6, 4:6] <- "NA"

# and re-run recode_shadow() for Q3:
df_sh_recode <- df_sh_recode %>%
    recode_shadow(Q3 = .where(Q1 %in% "no" ~ "skip"))
#> Error: `idx` must contain one integer for each level of `f`
# still throws the same error
```
Created on 2020-09-21 by the reprex package (v0.3.0)
@njtierney
Owner

Thank you!

OK, so I've isolated the bug in the "recode-shadow-factor-bug" branch, and this currently works.

Can you try installing it and let me know whether it works for this example and in your larger use case?

You can install it using:

remotes::install_github("njtierney/naniar#273")

I've still got a few warnings to handle that crop up with this approach, so I will need to make a few small changes, but can you let me know if this works?

Cheers!

@antondutoit
Author

I've just run it with the code I've got so far -- all good! No errors and the data passes a quick eyeball test.

I'll code some more columns tonight or tomorrow and let you know how that goes (got to get out now before the light fails, the National Park is calling :-) )

@njtierney
Owner

OK that's glorious!

Enjoy the National Park!

@njtierney
Owner

I've fixed the warnings I was getting previously, so this bug should now be resolved. Thanks for the reprex, and happy travels on your missing data journey!

@antondutoit
Author

Thanks Nick. Just used recode_shadow again on a big block of missings, again with no problems.

@njtierney
Owner

Great news!

@szimmer

szimmer commented Jan 14, 2024

This is marked as closed, but I'm still encountering this issue. I just installed from CRAN, so I'm not using an older version.

library(naniar)
#> Warning: package 'naniar' was built under R version 4.3.2

df <- tibble::tribble(
  ~wind, ~temp, ~pressure,
  -99,    45, 3,
  68,    NA, 1,
  72,    25, -88,
  38,    24, -99
)

dfs <- bind_shadow(df)

# example from vignette
dfs %>%
  recode_shadow(temp = .where(wind == -99 ~ "bananas")) %>%
  recode_shadow(wind = .where(wind == -99 ~ "apples"))
#> # A tibble: 4 × 6
#>    wind  temp pressure wind_NA   temp_NA    pressure_NA
#>   <dbl> <dbl>    <dbl> <fct>     <fct>      <fct>      
#> 1   -99    45        3 NA_apples NA_bananas !NA        
#> 2    68    NA        1 !NA       NA         !NA        
#> 3    72    25      -88 !NA       !NA        !NA        
#> 4    38    24      -99 !NA       !NA        !NA

# Add some simple coding for pressure
dfs %>%
  recode_shadow(temp = .where(wind == -99 ~ "bananas")) %>%
  recode_shadow(wind = .where(wind == -99 ~ "apples")) %>%
  recode_shadow(pressure=.where(pressure == -88 ~ "oranges"))
#> # A tibble: 4 × 6
#>    wind  temp pressure wind_NA   temp_NA    pressure_NA
#>   <dbl> <dbl>    <dbl> <fct>     <fct>      <fct>      
#> 1   -99    45        3 NA_apples NA_bananas !NA        
#> 2    68    NA        1 !NA       NA         !NA        
#> 3    72    25      -88 !NA       !NA        NA_oranges 
#> 4    38    24      -99 !NA       !NA        !NA

# Add some complex coding for pressure
dfs %>%
  recode_shadow(temp = .where(wind == -99 ~ "bananas")) %>%
  recode_shadow(wind = .where(wind == -99 ~ "apples")) %>%
  recode_shadow(pressure=.where(pressure == -99 ~ "bananas", pressure == -88 ~ "oranges"))
#> Error in `mutate()`:
#> ℹ In argument: `wind_NA = (function (.var, suffix) ...`.
#> Caused by error in `lvls_reorder()`:
#> ! `idx` must contain one integer for each level of `f`
#> Backtrace:
#>      ▆
#>   1. ├─... %>% ...
#>   2. ├─naniar::recode_shadow(...)
#>   3. ├─naniar:::recode_shadow.data.frame(...)
#>   4. │ └─... %>% dplyr::mutate(!!!magic_shade_case_when)
#>   5. ├─dplyr::mutate(., !!!magic_shade_case_when)
#>   6. ├─naniar:::update_shadow(., unlist(suffix, use.names = FALSE))
#>   7. │ └─dplyr::mutate_if(...)
#>   8. │   ├─dplyr::mutate(.tbl, !!!funs)
#>   9. │   └─dplyr:::mutate.data.frame(.tbl, !!!funs)
#>  10. │     └─dplyr:::mutate_cols(.data, dplyr_quosures(...), by)
#>  11. │       ├─base::withCallingHandlers(...)
#>  12. │       └─dplyr:::mutate_col(dots[[i]], data, mask, new_columns)
#>  13. │         └─mask$eval_all_mutate(quo)
#>  14. │           └─dplyr (local) eval()
#>  15. └─naniar (local) `<fn>`(wind_NA, suffix = `<chr>`)
#>  16.   └─forcats::fct_relevel(new_var, levels(.var), new_level)
#>  17.     └─forcats::lvls_reorder(f, match(new_levels, old_levels))
#>  18.       └─cli::cli_abort("{.arg idx} must contain one integer for each level of {.arg f}")
#>  19.         └─rlang::abort(...)

# Just complex coding for pressure
dfs %>%
  recode_shadow(pressure=.where(pressure == -99 ~ "bananas", pressure == -88 ~ "oranges"))
#> # A tibble: 4 × 6
#>    wind  temp pressure wind_NA temp_NA pressure_NA
#>   <dbl> <dbl>    <dbl> <fct>   <fct>   <fct>      
#> 1   -99    45        3 !NA     !NA     !NA        
#> 2    68    NA        1 !NA     NA      !NA        
#> 3    72    25      -88 !NA     !NA     NA_oranges 
#> 4    38    24      -99 !NA     !NA     NA_bananas

Created on 2024-01-14 with reprex v2.0.2

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.1 (2023-06-16 ucrt)
#>  os       Windows 11 x64 (build 22621)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_United States.utf8
#>  ctype    English_United States.utf8
#>  tz       America/New_York
#>  date     2024-01-14
#>  pandoc   3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
#>  colorspace    2.1-0   2023-01-23 [1] CRAN (R 4.3.0)
#>  digest        0.6.31  2022-12-11 [1] CRAN (R 4.3.0)
#>  dplyr         1.1.3   2023-09-03 [1] CRAN (R 4.3.1)
#>  evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.0)
#>  fansi         1.0.4   2023-01-22 [1] CRAN (R 4.3.0)
#>  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
#>  forcats       1.0.0   2023-01-29 [1] CRAN (R 4.3.0)
#>  fs            1.6.2   2023-04-25 [1] CRAN (R 4.3.0)
#>  generics      0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
#>  ggplot2       3.4.2   2023-04-03 [1] CRAN (R 4.3.0)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
#>  gtable        0.3.3   2023-03-21 [1] CRAN (R 4.3.0)
#>  htmltools     0.5.5   2023-03-23 [1] CRAN (R 4.3.0)
#>  knitr         1.42    2023-01-25 [1] CRAN (R 4.3.0)
#>  lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
#>  munsell       0.5.0   2018-06-12 [1] CRAN (R 4.3.0)
#>  naniar      * 1.0.0   2023-02-02 [1] CRAN (R 4.3.2)
#>  pillar        1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
#>  purrr         1.0.1   2023-01-10 [1] CRAN (R 4.3.0)
#>  R.cache       0.16.0  2022-07-21 [1] CRAN (R 4.3.1)
#>  R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 4.3.0)
#>  R.oo          1.25.0  2022-06-12 [1] CRAN (R 4.3.0)
#>  R.utils       2.12.2  2022-11-11 [1] CRAN (R 4.3.1)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
#>  reprex        2.0.2   2022-08-17 [1] CRAN (R 4.3.0)
#>  rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
#>  rmarkdown     2.21    2023-03-26 [1] CRAN (R 4.3.0)
#>  rstudioapi    0.14    2022-08-22 [1] CRAN (R 4.3.0)
#>  scales        1.2.1   2022-08-20 [1] CRAN (R 4.3.0)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.1)
#>  styler        1.10.1  2023-06-05 [1] CRAN (R 4.3.1)
#>  tibble        3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
#>  tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
#>  utf8          1.2.3   2023-01-31 [1] CRAN (R 4.3.0)
#>  vctrs         0.6.2   2023-04-19 [1] CRAN (R 4.3.0)
#>  visdat        0.6.0   2023-02-02 [1] CRAN (R 4.3.2)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.3.0)
#>  xfun          0.39    2023-04-20 [1] CRAN (R 4.3.0)
#>  yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
#> 
#>  [1] C:/Users/steph/AppData/Local/R/win-library/4.3
#>  [2] C:/Program Files/R/R-4.3.1/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
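One possible reading of the backtrace above, not confirmed anywhere in this thread: the failing call is `forcats::fct_relevel(new_var, levels(.var), new_level)`, and in both reprexes the error appears only when a suffix that an earlier `recode_shadow()` call already added is used again (`"skip"` for Q2 and then Q3; `"bananas"` for temp and then pressure). The first reprex shows that such a level is added to every shadow column (Q1_NA gains `NA_skip`). If that level is already in `levels(.var)`, the same name reaches `fct_relevel()` twice, and `lvls_reorder()` then receives more indices than the factor has levels. The minimal sketch below reproduces only that forcats behavior in isolation; the `shadow` factor and its levels are illustrative stand-ins, not naniar's actual internals.

```r
library(forcats)

# Illustrative shadow column that already carries "NA_skip" from an earlier
# recode (hypothetical stand-in, not naniar's internal object).
shadow <- factor(
  c("!NA", "NA", "NA_skip"),
  levels = c("!NA", "NA", "NA_skip")
)

# Same shape as the call in the backtrace:
# fct_relevel(new_var, levels(.var), new_level), where new_level repeats a
# level already present in levels(.var), so "NA_skip" is requested twice.
dup_result <- tryCatch(
  fct_relevel(shadow, levels(shadow), "NA_skip"),
  error = function(e) conditionMessage(e)
)
print(dup_result)

# Removing the duplicate from the requested order lets fct_relevel() succeed.
ok <- fct_relevel(shadow, unique(c(levels(shadow), "NA_skip")))
print(levels(ok))
```

If this reading is right, reusing a suffix across `recode_shadow()` calls is the trigger, and deduplicating the requested level order is one way to avoid the error. Whether that matches the eventual fix in naniar is not known from this thread.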

@njtierney njtierney reopened this Jan 14, 2024
@njtierney
Owner

Thanks @szimmer - I'll take a look at this before the next release


3 participants