Data Wrangling 2.0 with R

Now this week is data wrangling 2.0 with R, where we’ll be taking a step further into wrangling and look at data structuring where we will be using more techniques like joining, transposing to analyze the data better.

MSBA
DataScience
Data Analytics
R
Data Wrangling
Dplyr
Author

Mohit Shrestha

Published

July 29, 2019

We’ve been studying R for two weeks now, and so far we have studied the basics – analyzing the data with simple plotting- creating histogram, scatter plot.

And now this week is data wrangling 2.0 with R, where we’ll be taking a step further into wrangling and look at data structuring where we will be using more techniques like joining, transposing to analyze the data better.

First we looked at how to filter NAs, how to convert a character into a numeric data, convert date integer/character format into Date format using lubridate package, which is a R package that basically makes it easier to process any date related tasks.

In today’s blog, I’ll just make a note of some of these techniques that I thought would be most helpful.

JOINs statement in R are very similar to PROC SQL in SAS, but I found writing Join statement in R much easier and straight forward.

The most used technique of joining two datasets is using the function inner join which returns all observations in table_A and table_B, where the keys are equal, which is defined in the by statement. If we don’t specify the columns that we want to join by, by default, it will take the column names that match.

Inner Join

Inner Join

Full join is another useful technique where it selects all matching and non-matching rows, and brings the records from all of the tables. I think this will be very useful when we are trying to find things that don’t match up.

Full Join

Full Join

But what happens when we don’t have matching columns? We just use DPLYR and specify a vector in the by statement:

by = c(“x” = “x2”, “y” = “y2”)

Here, x will be coming from Table 1, and x2 will be coming from Table 2. They don’t have to be the same name, but they have to be the same type.

Common Join Gotchas!

https://dplyr.tidyverse.org/reference/join.html

The other thing we learned is how to append rows together and bind columns by using functions bind_rows and bind_cols. They are pretty similar in terms of what they do. Bind rows just appends the tables together, and bind columns just puts the two tables together without the join. It may look like it joins the tables, but it’s not matching the keys. Also, in bind column, it just looks at the first column of first table. So in the example below, bind column just takes the observation data in COL_A of Table A.

Bind Rows and Bind Cols

Bind Rows and Bind Cols

So, the next function that is similar to bind_rows is union_all. Union also bind rows, but it doesn’t do sort, so we’ll just end up with a record once Table A and Table B are put together.

Union all

Union all

There’s another function union which is similar to union_all, where it appends the records with a sort distinct and returns the unique set of rows from Table A and B

Union

Union

One of the other things that I think would be very helpful is SetDiff as it returns the rows that don’t match. There are times when we want to see what didn’t match, so in those cases SetDiff would be super helpful.

Setdiff

Setdiff

We also learnt about data tidying, which is similar to transpose but a lot more powerful, where we have

Gather

Gather

Spread

Spread

Reuse

Citation

BibTeX citation:
@misc{shrestha2019,
  author = {Mohit Shrestha},
  title = {Data {Wrangling} 2.0 with {R}},
  date = {2019-07-29},
  url = {https://www.mohitshrestha.com.np/posts/2019-07-29-data-wrangling-2-0-with-r},
  langid = {en}
}
For attribution, please cite this work as:
Mohit Shrestha. 2019. “Data Wrangling 2.0 with R.” July 29, 2019. https://www.mohitshrestha.com.np/posts/2019-07-29-data-wrangling-2-0-with-r.

Stay in touch

If you enjoyed this post, then don't miss out on any future posts by subscribing to my email newsletter.

Support my work with a coffee

Or if you’re interested in working with me, I’m open to freelance work. You can book an appointment with me on Calendly.

Share