StatQuest: Random Forests in R

February 26, 2018

Here’s a link to the source code on the StatQuest GitHub.

17 thoughts on “StatQuest: Random Forests in R”

Statistics Student

March 2, 2018 at 6:15 am

Your exported code seems to have some problems with HTML tags: characters such as <- and " are not coming through correctly.
Does the RStudio editor have a problem parsing single quotes?

- Josh
  
  March 2, 2018 at 12:47 pm
  
  Thanks for the heads up! There were all kinds of problems there. I think I’ve fixed it though. Let me know if it works for you (you might need to shift-reload the page).
  
  - Statistics Student
    
    March 2, 2018 at 1:38 pm
    
    Check the colnames(data) bit – syntax looks odd
    
    - Josh
      
      March 2, 2018 at 2:00 pm
      
      Got it! There were “<” (less-than) symbols in the comments that were messing things up.
      
Arik

May 1, 2018 at 5:07 pm

Hey, you are doing a great, great job explaining ML. I wanted to know: which is better for classification, Python or R?
Also, when we do PCA or Random Forest classification, we are in a sense clustering the samples. Do we need a separate algorithm for classification and for clustering?

- Josh
  
  May 1, 2018 at 5:26 pm
  
  I can’t say if Python is better for machine learning than R, or if R is better – the algorithms are the same. It really boils down to which language you would prefer to use or if there are other features of Python or R that you want to have access to.
  
  I’m not sure I understand the second question; however, I’ll take a guess. If you cluster the samples using Random Forests, you can still use Random Forests for classification – you can do both things with the same Random Forest.
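
  One way to do both with the randomForest package is to ask for the proximity matrix: the fitted forest is the classifier, and 1 minus the proximities gives distances that can be clustered or plotted with MDS. A minimal sketch, using R's built-in iris data purely as a stand-in (the data set and variable names are assumptions, not the code from the video):

  ```r
  library(randomForest)

  # iris is only a stand-in data set for illustration.
  set.seed(42)
  model <- randomForest(Species ~ ., data = iris, proximity = TRUE)

  # Classification: the out-of-bag confusion matrix from the same forest.
  print(model$confusion)

  # Clustering: convert proximities (how often two samples share a leaf)
  # into distances and run classical multidimensional scaling.
  distance_matrix <- as.dist(1 - model$proximity)
  mds <- cmdscale(distance_matrix, k = 2)
  plot(mds, col = as.integer(iris$Species), xlab = "MDS1", ylab = "MDS2")
  ```

  Samples that land close together in the MDS plot are ones the forest tends to treat alike, so the one model supports both uses.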
  
  - Arik
    
    May 1, 2018 at 6:50 pm
    
    Thanks for your kind reply. I look forward to your suggestions.
    
armchanon

May 7, 2018 at 6:24 am

Great video!!

This is very helpful, especially for beginners (like me) learning about Random Forests in RStudio.
As I am quite new to this field, could you please show me how to extract and visualize an individual tree?
I have been struggling with writing the code for a while. Thanks!!

- Josh
  
  May 7, 2018 at 12:38 pm
  
  I’m glad to hear the video is helpful! I’ll put that on the “to-do” list for future StatQuests…
  
  - armchanon
    
    May 8, 2018 at 6:39 pm
    
    It would be very nice if you could just write the commands in a comment, like you did above. I really need a picture of a sample tree for my presentation very soon.
    
    I tried searching many websites, but most of them were complicated for me because I did not know which parts of the commands I needed to change for my own data. I tried playing around, but I got errors most of the time.
    
    I tried functions like “cforest” and “getTree” but could not get the result I need. I came across the “reprtree” package as well. It looks promising, but I cannot install it; the error says it is not available for R versions 3.4.4 and 3.5.
    
    I am sorry if I am asking very simple questions, but I just started learning about RStudio and Random Forests on my own a couple of days ago. Anyway, I am really looking forward to your next video.
    
Josh

May 8, 2018 at 6:46 pm

Have you already copied and pasted the code above? It creates graphs that show the quality / effectiveness of the random forest. Do you need something other than those graphs? If so, could you please describe it more clearly?

armchanon

May 9, 2018 at 1:21 am

I tried all of the code and got the same results as you showed in the video.

In fact, I conducted a questionnaire with a specific group of manufacturing firms. Using Random Forests, I plan to use the answers (variables) from each firm to build a classifier, then use it to identify firms with similar characteristics in another data set.

I would like to extract one representative tree from the forest as a simple tree chart, so that I can show how I decide which firms in the other data set belong to this specific group of manufacturing firms.

I hope this helps explain my intention clearly. Many thanks!

- Josh
  
  May 9, 2018 at 1:07 pm
  
  To be honest, if all you want is a representative tree, your best bet might be to just draw one by hand. However, if you really want one from your random forest, I found 2 pages that describe the process:
  
  https://stats.stackexchange.com/questions/41443/how-to-actually-plot-a-sample-tree-from-randomforestgettree
  
  https://shiring.github.io/machine_learning/2017/03/16/rf_plot_ggraph
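  
  For the getTree() route, a minimal sketch follows, again with the built-in iris data as a stand-in for your own data (an assumption). It only extracts the first tree as a table; turning that table into a drawing is what the two linked pages cover.

  ```r
  library(randomForest)

  # iris is only a stand-in; swap in your own data frame and response.
  set.seed(42)
  model <- randomForest(Species ~ ., data = iris, ntree = 100)

  # One row per node: daughter nodes, split variable, split point,
  # status (-1 for terminal nodes) and the prediction at terminal nodes.
  first_tree <- getTree(model, k = 1, labelVar = TRUE)
  head(first_tree)
  ```

  Changing k picks a different tree; no single tree is guaranteed to be "representative" of the whole forest.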
  
armchanon

May 10, 2018 at 3:50 am

Thank you very much for your comment. I will follow your suggestion.

B201

October 25, 2019 at 12:13 pm

Hey!
Your videos have been very helpful to me in understanding classification approaches for a project I am doing.

My data has about 5,500 observations with 18 independent variables, and about 30% of the observations are in the success class. I have previously tried logistic regression on this data set. While it serves as a good profiler, giving me an idea of which independent variables are relevant, it is not a very good predictor.

Hence, I decided to use Random Forests. Here is a link to an image of the error rates. My error rate for the success class is actually increasing. Do you have any idea why this could be happening or what I could do?
https://ibb.co/HHy8Q3g

Kind Regards

Nerda

November 21, 2023 at 10:35 pm

Hi Josh, thanks for this video.

One question, though: you show how to impute missing data in the original dataset with rfImpute, but what about imputing data in the new dataset used to predict the outcome? In your Random Forests video #2 you indicate that a similar technique can be used to fill in missing values in a new dataset, but the R section does not provide the code for it. Since the new dataset does not have the outcome variable, I imagine rfImpute does not work.

Thanks for your help

- starmer
  
  November 22, 2023 at 12:47 am
  
  Just like in the video, when you have new data without knowing the outcome, you can impute for both outcomes.
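  
  A rough sketch of that idea in R, under several assumptions: iris stands in for your data, new_sample is a hypothetical new observation with one missing value, and rfImpute() is rerun on the training data plus one labeled copy of the new sample, rather than reusing the original forest's proximities exactly as in the video. Each candidate label gets its own imputation, and the label that the original forest votes for most often wins.

  ```r
  library(randomForest)

  # iris is a stand-in training set; new_sample is hypothetical.
  set.seed(42)
  train <- iris
  model <- randomForest(Species ~ ., data = train, ntree = 300)

  # A new observation with a missing Petal.Width and an unknown Species.
  new_sample <- iris[1, 1:4]
  new_sample$Petal.Width <- NA

  candidate_scores <- sapply(levels(train$Species), function(label) {
    # Pretend the new sample has this label and append it to the training data.
    guess <- cbind(new_sample,
                   Species = factor(label, levels = levels(train$Species)))
    combined <- rbind(train, guess)
    # rfImpute fills the NA using proximities from a forest fit to combined.
    imputed <- rfImpute(Species ~ ., data = combined, iter = 5, ntree = 300)
    filled <- imputed[nrow(imputed), names(new_sample)]
    # Fraction of the original forest's trees voting for this label.
    as.numeric(predict(model, filled, type = "prob")[, label])
  })

  best_label <- names(which.max(candidate_scores))
  print(candidate_scores)
  print(best_label)
  ```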
  
