## Introduction

This article describes various methods of writing conditional formulas using R.

## Requirements

- An R variable, calculation or data set.
- The main condition operators are as follows:
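
As a quick reference, the sketch below exercises R's standard comparison and logical operators on a small made-up data set. The names `x` and `brand` are hypothetical and do not come from any Displayr data file.

```r
# Hypothetical example data (not from the article's data set)
x <- c(1, 3, 5, NA)
brand <- c("Coke", "Pepsi", "Fanta", "Coke")

x == 3                          # equal to
x != 3                          # not equal to
x > 3                           # greater than
x >= 3                          # greater than or equal to
x < 3                           # less than
x <= 3                          # less than or equal to
is.na(x)                        # TRUE where the value is missing
brand %in% c("Coke", "Pepsi")   # TRUE where the value matches any listed value
x >= 3 & brand == "Fanta"       # AND: both conditions must hold
x >= 3 | brand == "Coke"        # OR: at least one condition must hold
!(brand == "Coke")              # NOT: reverses a condition
```

Each expression returns one TRUE, FALSE or NA per record, which is what makes these operators the building blocks for the methods that follow.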

## Method

### 1. Mathematical operators and formulas

All the traditional mathematical operators (i.e., +, -, /, (, ), and *) work in R in the way that you would expect when performing math on variables and tables:

q2a_1 / (q2a_1 + q2b_1)

All the standard mathematical functions are also available in R. For example, to create the average of a set of variables, we can use the following:

rowMeans(cbind(q2a, q2b, q2c, q2d, q2e, q2f))

Note, instead of grouping the variables using *cbind*, we can also reference the variable set **Label** (enclosed in backticks):

rowMeans(`Q2 - No. of colas consumed`)
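
To try these formulas outside Displayr, the minimal sketch below uses made-up vectors in place of the survey variables. The values are invented for illustration, and only three of the six variables are included.

```r
# Hypothetical consumption counts for three respondents
q2a_1 <- c(2, 0, 5)
q2b_1 <- c(2, 4, 5)
q2c_1 <- c(2, 2, 5)

# Ordinary arithmetic is applied record by record
share <- q2a_1 / (q2a_1 + q2b_1)

# cbind() places the variables side by side; rowMeans() averages across each row
avg <- rowMeans(cbind(q2a_1, q2b_1, q2c_1))
```

Here `share` holds one proportion per respondent and `avg` holds one average per respondent.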

### Vector arithmetic

One of the great strengths of using R is that you can use *vector arithmetic*. Consider the expression `q2a_1 / sum(q2a_1)`. This tells R to divide the value of `q2a_1` by the sum of all the values that all observations take for this variable. That is, when computing the denominator, R sums the values of every observation in the variable. Other programs, such as SPSS, would instead treat this expression as meaning to divide `q2a_1` by itself. We can similarly standardize `q2a_1` to have a mean of 0 and a standard deviation of 1 using `(q2a_1 - mean(q2a_1)) / sd(q2a_1)`.

In these two examples, there are also pre-constructed specialist functions we can use: `q2a_1 / sum(q2a_1)` is equivalent to writing `prop.table(q2a_1)`, and `(q2a_1 - mean(q2a_1)) / sd(q2a_1)` is equivalent to `scale(q2a_1)`.
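
The equivalences can be checked on a small made-up vector. This is a minimal sketch: the values are invented, and `as.numeric` is used because `scale` returns a one-column matrix rather than a plain vector.

```r
# Hypothetical values standing in for the q2a_1 variable
q2a_1 <- c(2, 4, 6, 8)

# Each value divided by the total across all observations
share <- q2a_1 / sum(q2a_1)

# Standardized to a mean of 0 and a standard deviation of 1
z <- (q2a_1 - mean(q2a_1)) / sd(q2a_1)

# Compare with the specialist functions
same_share <- isTRUE(all.equal(share, prop.table(q2a_1)))
same_z <- isTRUE(all.equal(z, as.numeric(scale(q2a_1))))
```

Both `same_share` and `same_z` are TRUE for this vector, which contains no missing values.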

Note, most in-built R functions, such as `sd`, `mean`, `sum`, `rowMeans`, and `rowSums`, will return a missing value if the *vector* (variable in this case) passed to them contains any missing values. In most cases, the trick is to use `na.rm = TRUE`. For example:

(q2a_1 - mean(q2a_1, na.rm = TRUE)) / sd(q2a_1, na.rm = TRUE)

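Sadly, there is no shortage of exotic exceptions to this rule. For example, `prop.table` has no `na.rm` argument, so a single missing value makes every result missing, whereas `scale` automatically ignores missing values when computing the mean and standard deviation.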
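
The behavior with missing values can be seen on a made-up vector containing one NA. This is a sketch with invented values, not data from the example file.

```r
# Hypothetical variable with one missing value
q2a_1 <- c(2, 4, NA, 6)

mean(q2a_1)                  # NA, because one value is missing
mean(q2a_1, na.rm = TRUE)    # 4, missing value ignored

# prop.table() has no na.rm argument, so every result is NA
p <- prop.table(q2a_1)

# A manual alternative keeps the NA only where the input was missing
p_manual <- q2a_1 / sum(q2a_1, na.rm = TRUE)

# scale() ignores the NA when computing the mean and standard deviation
z <- as.numeric(scale(q2a_1))   # -1, 0, NA, 1
```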

### Variable sets as tables

If you hover over a variable set R reference (e.g., `Q2 - No. of colas consumed`) in a **Calculation** with your mouse, you can see that it previews the raw data in tabular format.

This example contains 12 variables showing the frequency of consumption for six different colas on two usage occasions. Just like a table, this preview will include NET or SUM rows/columns depending on the data format. This means that, by default, they will be included in your formula unless you right-click them in the corresponding table and select **Hide**, or else exclude them in your code.

If we were, for example, to look at the sum of the variables pertaining to each occasion (**Sum, 'out and about'** and **Sum, 'at home'**), this would be the last column in the preview: `[,"SUM, SUM"]`.

These automatically constructed variables can considerably reduce the amount of code required to perform calculations. For example, to compute Coca-Cola's share of category requirements, we can use the expression:

(q2a_1 + q2a_2) / `Q2 - No. of colas consumed`[,"SUM, SUM"]

This is perhaps more obvious when we review the data as an aggregated table to see the interlocked *SUM - SUM* cell:
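
Outside Displayr, the role of the *SUM - SUM* column can be sketched with an ordinary matrix. Everything here is an assumption made for illustration: the object name `colas`, the column names, the values, and the reduction to two brands. In Displayr the variable set supplies these columns automatically.

```r
# Hypothetical stand-in for a variable set: one named column per variable
colas <- cbind(
  "Coca-Cola, out and about" = c(2, 0, 3),
  "Coca-Cola, at home"       = c(1, 0, 3),
  "Pepsi, out and about"     = c(1, 4, 0),
  "Pepsi, at home"           = c(0, 4, 2)
)

# Append a total column, mimicking the automatically constructed "SUM, SUM"
colas <- cbind(colas, "SUM, SUM" = rowSums(colas))

# Coca-Cola's share of each respondent's total consumption
share <- (colas[, "Coca-Cola, out and about"] + colas[, "Coca-Cola, at home"]) /
  colas[, "SUM, SUM"]
```

Selecting a column by name in this way returns one value per respondent, so the division is carried out record by record.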

### 2. Boolean expression

A *Boolean expression* is an expression which evaluates to a logical value of *true* or *false*. Results that are *true* are returned as values of 1 and *false* as 0 (i.e., this is a way to construct a binary variable).

For example, we have two numeric variables, *v1* and *v2*:

- `v1 != v2` returns a 1 for observations where *v1* and *v2* differ and a value of 0 when they are the same.
- `is.na(v1)` returns a 1 for observations in *v1* that have the value of NA (missing), otherwise it returns a 0.
- `rowSums(cbind(v1, v2)) > 0` returns a 1 if the sum of *v1* and *v2* is greater than 0, otherwise it returns a 0.
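
These expressions can be tried on two made-up vectors. This is a sketch under two assumptions: the values are invented, and `as.numeric` is used to show the 1/0 coding explicitly, since plain R returns TRUE and FALSE. Note that `rowSums` expects a single matrix-like argument, so the variables are combined with `cbind` first.

```r
# Hypothetical numeric variables
v1 <- c(1, 0, NA, 2)
v2 <- c(1, 3, 0, -2)

# 1 where the two variables differ, NA where either is missing
differ <- as.numeric(v1 != v2)

# 1 where v1 is missing
missing_v1 <- as.numeric(is.na(v1))

# 1 where the row total is greater than 0
positive <- as.numeric(rowSums(cbind(v1, v2)) > 0)
```

A comparison involving a missing value returns NA rather than 0, so `is.na` is the only reliable way to test for missing data.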

### 3. The *if...else* method

Basic conditional statements can be written using an *if* then *else* structure. As an example, we have an object called *x* that stores a rating of 1-5. Here, we will assign a **1** if *x* is greater than or equal to **4**, assign a **2** if *x* equals **3**, otherwise assign a **3**:

if(x>=4) 1 else

if(x==3) 2 else

3

This is the same set of conditions using optional curly brackets and spacing:

if(x>=4){

1

} else if(x==3) {

2

} else {

3

}

Note, the above example will return a single result. If you wish to use this method within an R variable, you will need to return a vector or data frame with the same length (number of rows) as the number of records in your data set.

For example, if you have conditions connected to a combo box control, the below will display the data from the *BrandA* variable if *Brand 1* is selected, *BrandB* if *Brand 2* is selected, otherwise *BrandC*:

if(combo.box=="Brand 1") BrandA else

if(combo.box=="Brand 2") BrandB else

BrandC
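
The combo box pattern can be simulated outside Displayr by replacing the control and the variables with ordinary objects. This is a minimal sketch: the selection and the values of `BrandA`, `BrandB` and `BrandC` are made up.

```r
# Hypothetical stand-ins for a combo box selection and three variables
combo.box <- "Brand 2"
BrandA <- c(1, 0, 1)
BrandB <- c(0, 0, 1)
BrandC <- c(1, 1, 0)

# The condition is a single TRUE or FALSE, but each branch returns a full vector
selected <- if (combo.box == "Brand 1") BrandA else
  if (combo.box == "Brand 2") BrandB else
  BrandC
```

This works because the condition tests one value. `if` expects a single TRUE or FALSE, and recent versions of R raise an error when the condition is a longer vector, which is why recoding record by record relies on the vectorized methods described later in this article.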

### 4. The *switch* method

An alternative to *if...else* is the *switch* function. Using the earlier example, we could write the following to achieve this result:

switch(x,3,3,2,1,1)

In this code, the value of *x* acts as an index that tells *switch* which of the subsequent values to return. So if *x* equals 4, it will return 1, as this is the fourth of the five recode values.

Note, this returns a single value only.
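
Because *switch* handles one value at a time, one way to apply it to every record is to wrap it in `sapply`. This is a sketch of that idea rather than a method the article prescribes, and the ratings are made up.

```r
# A single rating: the fourth recode value is returned
x <- 4
single <- switch(x, 3, 3, 2, 1, 1)

# Hypothetical vector of ratings, recoded one element at a time
ratings <- c(5, 3, 1, 4)
recoded <- sapply(ratings, function(r) switch(r, 3, 3, 2, 1, 1))
```

This sketch assumes every rating is a whole number from 1 to 5. Missing or out-of-range values would need separate handling.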

### 5. The *subscripting* method

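A further conditional method which is useful for banding variables is to essentially apply filter conditions. Because each assignment overwrites values, test the conditions against the original *x* and write the results to a copy, *y*; otherwise, records already recoded to 1 or 2 would also match the final `x<=2` condition. Again using the same example, we can write the following:

y = x

y[x>=4] = 1

y[x==3] = 2

y[x<=2] = 3

y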

Note, this returns a value for each record in your *x* object.

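Alternatively, you can replace the values with labels so it returns a text output instead. Here the conditions are tested against the original *x* and the labels are written to a copy, *y*, so that the comparisons are always made on the numeric ratings:

y = x

y[x>=4] = "Yes"

y[x==3] = "Maybe"

y[x<=2] = "No"

y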
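
The banding approach can be run end to end on made-up ratings. In this sketch the names `banded` and `label` are hypothetical, and every condition is tested against the original `x` so that an earlier assignment cannot be caught by a later condition.

```r
# Hypothetical ratings on a 1-5 scale
x <- c(5, 3, 1, 4, 2)

# Numeric bands, written to a copy so x itself is left unchanged
banded <- x
banded[x >= 4] <- 1
banded[x == 3] <- 2
banded[x <= 2] <- 3

# Text labels, starting from an empty character vector
label <- rep(NA_character_, length(x))
label[x >= 4] <- "Yes"
label[x == 3] <- "Maybe"
label[x <= 2] <- "No"
```

Starting `label` as a vector of NA values means any record that matches none of the conditions stays missing instead of keeping its old value.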

Note, changing **Properties > Structure** to **Nominal** for R variables will let Displayr automatically set up the value labels when it converts to a categorical variable. Similarly, if your code returns a factor (i.e., has a value and label), you will not need to manually add labels via **Properties > DATA VALUES > Labels**.

### 6. The *ifelse* method

There is also a shortcut method called *ifelse* that lets you write a condition in a single line. In the below example, the formula will return a **Yes** if *x* is greater than 1, otherwise a **No**:

ifelse(x>1,"Yes","No")

Note, this returns a value for each record in your *x* object. You can also nest this to additionally return **Maybe** if *y* is greater than 1:

ifelse(x>1,"Yes", ifelse(y>1,"Maybe","No"))
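
Both forms can be tried on made-up vectors. This is a sketch with invented values. It also shows that a missing value in the condition produces a missing result.

```r
# Hypothetical variables
x <- c(2, 0, 1, NA)
y <- c(0, 3, 1, 5)

# One condition: "Yes" where x is greater than 1, otherwise "No"
simple <- ifelse(x > 1, "Yes", "No")

# Nested: the inner ifelse() is only used where the outer condition is FALSE
nested <- ifelse(x > 1, "Yes", ifelse(y > 1, "Maybe", "No"))
```

For the last record, `x` is missing, so both results are NA even though `y` is greater than 1.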

### 7. The *case_when* method

The *dplyr* R package offers the *case_when* function which is particularly useful for working with categorical data. Below is an example of how to recode an *Age* variable into groups:

dplyr::case_when(

Age == "18 to 24" ~ 1,

Age == "25 to 29" ~ 2,

Age %in% c("40 to 44", "45 to 49") ~ 3,

Age %in% c("50 to 54", "55 to 64", "65 or more") ~ 4,

TRUE ~ 0

)

Looking at the code above, note that:

- For a single category, we use the `==` operator.
- For multiple categories, we list them surrounded by `c()` and use the `%in%` operator.
- The values are assigned at the end of the line, after a `~`.
- The `TRUE ~ 0` is optional and R reads this as assign 0 to "everybody else". If records don't fall into any of these conditions and this line is omitted, the result will return NA.
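
The points above can be checked on a made-up *Age* vector. This is a sketch that assumes the *dplyr* package is installed. The values are invented, and the second call drops the `TRUE ~ 0` line to show the NA behavior.

```r
# Hypothetical age categories, including one missing value
Age <- c("18 to 24", "25 to 29", "45 to 49", "65 or more", "30 to 34", NA)

# With a catch-all: unmatched and missing records are assigned 0
age_group <- dplyr::case_when(
  Age == "18 to 24" ~ 1,
  Age == "25 to 29" ~ 2,
  Age %in% c("40 to 44", "45 to 49") ~ 3,
  Age %in% c("50 to 54", "55 to 64", "65 or more") ~ 4,
  TRUE ~ 0
)

# Without a catch-all: unmatched records are returned as NA
age_group_no_default <- dplyr::case_when(
  Age == "18 to 24" ~ 1,
  Age == "25 to 29" ~ 2
)
```

Conditions are evaluated in order and the first match wins, so the catch-all line belongs last.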

Let's now look at a more complex example that references multiple questions, *Age* and *d4* (living arrangements). Here, we wish to create a household structure variable by using the `&` operator:

dplyr::case_when(

# Young singles

Age %in% c("18 to 24", "25 to 29", "30 to 34") &

d4 %in% c("Living alone", "Sharing accommodation") ~ 1,

# Older singles

!Age %in% c("18 to 24", "25 to 29", "30 to 34") &

d4 %in% c("Living alone", "Sharing accommodation") ~ 2,

# Young couples

Age %in% c("18 to 24", "25 to 29", "30 to 34") &

d4 == "Living with partner only" ~ 3,

# Older couples

!Age %in% c("18 to 24", "25 to 29", "30 to 34") &

d4 == "Living with partner only" ~ 4,

# Young families

Age %in% c("18 to 24", "25 to 29", "30 to 34") &

d4 %in% c("Living with partner and children", "Living with children only") ~ 5,

# Older families

!Age %in% c("18 to 24", "25 to 29", "30 to 34") &

d4 %in% c("Living with partner and children", "Living with children only") ~ 6,

# Other

TRUE ~ 7

)

A much nicer way of computing a household structure variable is shown in the code below:

young = Age %in% c("18 to 24", "25 to 29", "30 to 34")

single = d4 %in% c("Living alone", "Sharing accommodation")

partner.only = d4 == "Living with partner only"

children = d4 %in% c("Living with partner and children", "Living with children only")

dplyr::case_when(

young & single ~ 1,

!young & single ~ 2,

young & partner.only ~ 3,

!young & partner.only ~ 4,

young & children ~ 5,

!young & children ~ 6,

!children & !partner.only & !single ~ 7

)

This approach initially creates four variables as inputs to the main variable of interest. These are so-called scratch variables: they're only accessible to this specific code, and not from any other object or code in Displayr. They exist for the sole purpose of computing household structure. The first four lines each compute a variable containing TRUE or FALSE for each row of data, and *case_when* then evaluates these using standard Boolean logic for each row of data.

Note, be careful when using *as.numeric* to convert categorical data into numeric data in order to avoid referencing value labels in your code. The values it assigns will not necessarily match the values that have been set in the raw data file. For example, if the data file contains values of 1 for **Male** and 2 for **Female**, but no respondent selected male, then the value of 1 would be assigned to Female.

In these cases, it's better to create a numeric copy of your variable to reference instead. You can do this by selecting your variable in the **Data Sets** tree, pressing **Duplicate** in the toolbar and then changing **Structure** to **Numeric** on the **Properties** tab of the **object inspector**.
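
The *as.numeric* pitfall can be reproduced in plain R with a factor that has lost a category. This is a sketch: the data are made up, and the lookup vector `codes` is a hypothetical base R workaround for use outside Displayr, not a replacement for the numeric copy described above.

```r
# Data file codes are 1 = Male and 2 = Female, but this sample has no males
gender <- factor(c("Female", "Female", "Female"))

# as.numeric() returns the position among the observed categories, not the file's code
wrong <- as.numeric(gender)

# Hypothetical lookup that maps each label to its intended value
codes <- c(Male = 1, Female = 2)
right <- unname(codes[as.character(gender)])
```

Here `wrong` is 1 for every record, while `right` keeps the value of 2 from the data file.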

## See Also

How to Work with R in Displayr

How to Perform Mathematical Calculations Using R
