StatQuest: Random Forests in R

Here’s a link to the source code on the StatQuest GitHub.

17 thoughts on “StatQuest: Random Forests in R”

  1. Your exported code seems to have some problems with HTML tags: <- "
    Does R Studio editor have a problem with parsing single quotes?

  2. Hey, you are doing a great job showing the way into ML. I wanted to know which one is better for classification: Python or R?
    On the other hand, when we do PCA or Random Forest classification, we are in some sense clustering the samples. Do we need to use another algorithm for classification or clustering?

    • I can’t say if Python is better for machine learning than R, or if R is better – the algorithms are the same. It really boils down to which language you would prefer to use or if there are other features of Python or R that you want to have access to.

      I’m not sure I understand the second question. However, I’ll take a guess. If you cluster the samples using Random Forests, you can still use Random Forests for classification; you can do both things with the same Random Forest.
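
      A minimal sketch of doing both with one forest, assuming the randomForest package is installed. The built-in iris data and the variable names below are placeholders for illustration only, not the data or names from the video. The same fitted model is used to predict classes and, through its proximity matrix, to draw an MDS plot that groups similar samples.

      ```r
      library(randomForest)

      set.seed(42)

      # iris is a stand-in dataset; proximity = TRUE keeps the proximity matrix
      model <- randomForest(Species ~ ., data = iris, proximity = TRUE)

      # Classification: predict the class of a few samples
      predicted <- predict(model, newdata = iris[c(1, 51, 101), ])

      # Clustering: turn proximities into distances, then run classical MDS
      distance.matrix <- as.dist(1 - model$proximity)
      mds <- cmdscale(distance.matrix, k = 2)

      # Samples that often land in the same leaves plot close together
      plot(mds, col = as.integer(iris$Species), xlab = "MDS1", ylab = "MDS2")
      ```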

  3. Great video!!

    This is very helpful, especially for beginners (like me) learning about Random Forests in RStudio.
    As I am quite new to this field, would you please teach me how to visualize the resulting tree?
    I have been struggling with writing the code and commands for a while. Thanks!!

      • It would be very nice if you could just write the command in this comment, like you did above. I need a picture of a sample tree for my presentation very soon.

        I tried searching many websites, but most of them are too complicated for me because I did not know which part of the command I need to change for my own data. I tried playing around and got errors most of the time.

        I tried commands like “cforest” and “getTree” but could not get the result that I need. I came across a command called “reprtree” as well. It looks promising, but I cannot install it; it says it is not available for Rstudio ver. 3.4.4 and 3.5.

        I am sorry if I am asking very simple questions, but I just started learning about Rstudio and Random Forests by myself a couple of days ago. Anyway, I really look forward to your next video.

  4. Have you already copied and pasted the code above? It creates several graphs and charts – both represent the quality / effectiveness of the random forest. Do you need something other than those graphs? If so, could you please describe it more clearly?

  5. I tried all of the code and got the same results as you showed in the video.

    In fact, I ran a questionnaire with a specific group of manufacturing firms. Using Random Forests, I plan to use the answers (variables) from each firm for classification, then use the result to identify firms with similar characteristics in another set of data.

    I would like to extract one representative tree from the forest as one simple visualized tree chart, so that I can show how I identify which firms in the other data set belong to this specific group of manufacturing firms.

    I hope this helps explain my intention clearly. Many thanks.
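
    A hedged sketch of pulling a single tree out of a forest with getTree() from the randomForest package. The iris data and variable names are placeholders, not the questionnaire data. Note that this returns a table of nodes rather than a picture, and that any one tree is only one of many voters in the forest, so treating tree 1 as "representative" is an assumption.

    ```r
    library(randomForest)

    set.seed(42)

    # iris is a stand-in dataset; a small forest is enough for illustration
    model <- randomForest(Species ~ ., data = iris, ntree = 100)

    # Extract tree number 1 as a data frame, one row per node;
    # labelVar = TRUE shows variable names and class labels instead of codes
    first.tree <- getTree(model, k = 1, labelVar = TRUE)

    # Columns include "split var", "split point", and "prediction"
    head(first.tree)
    ```

    Each row gives the variable and threshold used at a node and the rows of its two daughter nodes, which is enough to draw the tree by hand or with a plotting tool of your choice.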

  6. Hey!
    Your videos have been very helpful to me in understanding classification approaches for a project I am doing.

    The data I have is about 5500 observations with 18 independent variables. The success class makes up about 30% of the data. I have previously tried logistic regression on this data set. While it serves as a good profiler, giving me an idea of which independent variables are relevant, it is not very good as a predictor.

    Hence, I decided to use Random Forests. Please find a link to the error plot below. My error rate for the success class is actually increasing. Do you have any idea why this could be happening or what I could do?
    https://ibb.co/HHy8Q3g

    Kind Regards

  7. Hi Josh, thanks for this video.

    One question though: you show how to impute data in your original dataset rfimput, but what about imputing data in the new dataset used to predict your outcome? In your Random Forest video #2 you indicate that a similar technique can be used to predict variables in a new dataset, but in the R section you do not provide the code for it. Since the new dataset does not have the outcome variable, I imagine the command rfImpute does not work.

    Thanks for your help
