What is the purpose of using a decision tree?








8












I don't understand the purpose of a decision tree. The way I see it, it is a series of if-else statements. Why don't I just use if-else instead of a decision tree? Is it because it decreases the complexity of my code?

I am still spared from calculating entropy and information gain because there are prebuilt algorithms for them (like ID3) where I just plug in the rules, right?

Why do we use it with machine learning now? Is it because we no longer have to come up with the rules ourselves, as we did before? The machine learns from the training data and, based on the attributes, it can predict a result?

Does implementing ML in my code decrease overhead and make my code less complex, more effective, or faster?








  • 6
    It's not about the code, it's about the model.
    – Sycorax
    Mar 19 at 13:28

  • 6
    "Does implementing ML in my code decrease overhead more and it makes my code less complex, more effective, faster?" More effective, depending on what your code does, but otherwise no. ML doesn't exist to make your code less complex or more performant (it tends to have the opposite effect). ML exists to automate creation of algorithms based on sample data. Usually this isn't necessary because programmers can just write effective algorithms, but sometimes that's way too hard to do, which is where ML comes in.
    – DarthFennec
    Mar 19 at 17:54

  • Please do not cross-post. That is against SE policy for just this reason; it wastes a lot of people's time.
    – gung
    Mar 19 at 18:59

  • @DarthFennec Quotable!
    – Jim
    Mar 19 at 20:33

























machine-learning
















asked Mar 19 at 12:26









57913


















3 Answers


















21












"The way I see it is, it is a series of if-else. Why don't I just use if-else instead of using a decision tree?"

You are absolutely right: once built, a decision tree is nothing but a series of if-else statements. However, interpreting these statements as a tree is what lets us build the rules automatically. That is, given a set of input examples $(x_1, y_1), \dots, (x_N, y_N)$, what is the best set of rules that describes the value $y$ for a new input $x$? ID3 and similar algorithms create these rules for us. It is not really about the tree once built; it is about how we created it.

Apart from that, one hardly ever uses a single decision tree on its own, for precisely the reason you give: it is a fairly simplistic model that lacks expressiveness. However, it has one big advantage over other models: a single decision tree can be trained quite fast. That means we can devise algorithms that train many decision trees on big datasets (boosting methods such as AdaBoost and GradientBoosting). Such a collection of simplistic models (usually more than 500 of them, often called a forest) can then express much more complicated shapes.

You could also imagine it like this: given a 'nice' (i.e. continuous) but complicated function $f : [a,b] \to \mathbb{R}$, we could try to approximate it by a single straight line. If the function is complicated (like $\sin(x)$ or so), this produces a big error. However, we can combine lines: divide the interval $[a,b]$ into smaller parts $a = a_0 < a_1 < \dots < a_M = b$ and, on each subinterval $[a_i, a_{i+1}]$, approximate $f|_{[a_i, a_{i+1}]}$ (that is, $f$ restricted to this subinterval) by a line. By basic analysis, we can then approximate the function arbitrarily closely (i.e. make an arbitrarily small error) if we take enough lines. Hence, we have built a complicated but accurate model from very simple ones. That is exactly the idea that (for example) GradientBoosting uses: it builds a forest from very 'stupid' single decision trees.
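The line-segment analogy in this answer can be checked numerically. The following is only a sketch under my own assumptions: the helper names `piecewise_linear` and `max_error` are made up for illustration, the segments are of equal width, and the error is estimated on a finite grid rather than computed exactly.

```python
import math

def piecewise_linear(f, a, b, m):
    """Return a function that interpolates f linearly on m equal subintervals of [a, b]."""
    knots = [a + (b - a) * i / m for i in range(m + 1)]
    values = [f(t) for t in knots]

    def approx(x):
        # Find the subinterval [a_i, a_{i+1}] that contains x.
        i = min(int((x - a) / (b - a) * m), m - 1)
        # Position of x inside that subinterval, between 0 and 1.
        t = (x - knots[i]) / (knots[i + 1] - knots[i])
        return (1 - t) * values[i] + t * values[i + 1]

    return approx

def max_error(f, g, a, b, samples=2000):
    """Largest |f(x) - g(x)| over an evenly spaced grid of sample points."""
    grid = (a + (b - a) * k / samples for k in range(samples + 1))
    return max(abs(f(x) - g(x)) for x in grid)

a, b = 0.0, 2 * math.pi
errors = {
    m: max_error(math.sin, piecewise_linear(math.sin, a, b, m), a, b)
    for m in (1, 4, 16, 64)
}
for m, err in errors.items():
    print(f"{m:3d} segments: max error = {err:.5f}")
```

With a single segment the approximation of $\sin$ over a full period is essentially the zero line, so the error is large; it shrinks as more segments are used. Boosted trees combine piecewise-constant pieces rather than line segments, so this illustrates the analogy, not the boosting algorithm itself.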









  • 2
    The other big plus is being accessible for human inspection ("aaah, so that's why!").
    – dedObed
    Mar 19 at 21:46

  • 1
    yeah decision trees are great for explaining to people without stats background because they are very intuitive.
    – qwr
    Mar 19 at 21:57
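To make the two points above concrete (the rules are learned from examples, and the result can still be read by a person), here is a minimal ID3-style sketch. It is a simplification and not a reference implementation: it handles categorical attributes only, does no pruning, and the tiny weather dataset is invented purely for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy of the labels minus the weighted entropy after splitting on attr."""
    remainder = 0.0
    for value in set(r[attr] for r in rows):
        subset = [l for r, l in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

def id3(rows, labels, attrs):
    """Grow a tree greedily: always split on the attribute with the highest gain."""
    if len(set(labels)) == 1:          # pure node: stop and predict that label
        return labels[0]
    if not attrs:                      # nothing left to split on: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    tree = {"attr": best, "children": {}}
    for value in sorted(set(r[best] for r in rows)):
        keep = [i for i, r in enumerate(rows) if r[best] == value]
        tree["children"][value] = id3(
            [rows[i] for i in keep],
            [labels[i] for i in keep],
            [a for a in attrs if a != best],
        )
    return tree

def to_rules(tree, indent=""):
    """Render the learned tree as the if/elif cascade it is equivalent to."""
    if not isinstance(tree, dict):
        return [f"{indent}return {tree!r}"]
    lines = []
    for k, (value, child) in enumerate(tree["children"].items()):
        keyword = "if" if k == 0 else "elif"
        lines.append(f"{indent}{keyword} {tree['attr']} == {value!r}:")
        lines.extend(to_rules(child, indent + "    "))
    return lines

# Hypothetical training data: should we play outside?
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rain", "windy": False},
    {"outlook": "rain", "windy": True},
    {"outlook": "rain", "windy": True},
    {"outlook": "rain", "windy": False},
]
labels = ["yes", "yes", "yes", "no", "no", "yes"]

tree = id3(rows, labels, ["outlook", "windy"])
rules = to_rules(tree)
print("\n".join(rules))
```

Nobody wrote the printed if/elif cascade by hand: the algorithm chose which attribute to test first by comparing information gains. The output is nevertheless ordinary code-like logic that can be inspected line by line.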



















1












Just adding to @Fabian Werner's answer: do you remember doing Riemann sums in an intro to integration? That too was a set of evenly partitioned if statements, which you used to calculate the area under the function.

If you draw a 1D function and draw the partitions evenly, you will find that in areas where the function has little gradient, neighboring partitions can be merged without a great loss in accuracy. Equally, in areas with a high gradient, adding more partitions will significantly improve the approximation.

Any set of partitions will approximate the function, but some are clearly better than others.

Now, moving to CART models: we see data in the form of noisy points from this function, and we are asked to approximate the function. By adding too many partitions we can overfit and essentially end up with a nearest-neighbor type of model. To avoid this, we limit the number of partitions our model can use (usually in the form of a maximum depth and a minimum number of samples per split). So where should we place these splits? That is the question addressed by the splitting criteria. As a rule of thumb, areas with higher "complexity" should receive more splits, and that is what Gini, entropy, etc. endeavour to do.

Making a prediction is just evaluating if-else statements, but in the context of machine learning that is not where the power of the model comes from. The power comes from the model's ability to trade off overfitting against underfitting in a scalable manner; it can also be derived in a consistent probabilistic framework with theoretical guarantees in the limit of data. Finally, if we take a similarly abstracted view of ML models, we can say that neural networks, kernel methods, Monte Carlo approaches and many more are simply addition and multiplication. Unfortunately, that is not a very useful view of the literature.
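A rough sketch of the partitioning idea described above, for a 1D regression problem. Several things here are my own assumptions rather than part of the answer: because the target is continuous, the sketch uses the reduction in squared error as the splitting criterion instead of Gini or entropy; the function names, the `tanh` test function, the noise level and the limits `max_depth=3` and `min_samples_split=20` are all arbitrary illustrative choices.

```python
import math
import random

def sse(ys):
    """Sum of squared errors around the mean: the 'impurity' of one partition."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def best_split(points):
    """Try every threshold between consecutive x values; keep the lowest total SSE."""
    points = sorted(points)
    best = None
    for i in range(1, len(points)):
        if points[i - 1][0] == points[i][0]:
            continue  # no threshold can separate identical x values
        left = [y for _, y in points[:i]]
        right = [y for _, y in points[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, (points[i - 1][0] + points[i][0]) / 2)
    return best

def grow(points, max_depth, min_samples_split):
    """Greedy recursive splitting; returns the sorted list of chosen thresholds."""
    if max_depth == 0 or len(points) < min_samples_split:
        return []  # the limits that guard against overfitting
    split = best_split(points)
    if split is None:
        return []
    _, threshold = split
    left = [p for p in points if p[0] < threshold]
    right = [p for p in points if p[0] >= threshold]
    return sorted(
        grow(left, max_depth - 1, min_samples_split)
        + [threshold]
        + grow(right, max_depth - 1, min_samples_split)
    )

# Noisy samples of a function that is flat near the ends and steep around x = 0.5.
rng = random.Random(0)
points = [
    (i / 199, math.tanh(10 * (i / 199 - 0.5)) + rng.gauss(0, 0.05))
    for i in range(200)
]

thresholds = grow(points, max_depth=3, min_samples_split=20)
print([round(t, 3) for t in thresholds])
```

With a depth limit of 3 there can be at most 7 thresholds (8 partitions). One would expect them to cluster around the steep part of the curve, since splitting a nearly flat region barely reduces the squared error.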





















    0












A decision tree is a partitioning of the problem domain into subsets by means of conditions. It is usually implemented as cascaded if-then-else statements. You can see it as a term that describes complex decision logic.

Decision trees are neither more efficient nor more "supportive" of machine learning than logical tests. They are logical tests.

Also keep in mind that any algorithm is nothing more than a combination of arithmetic computations and tests, i.e. a (usually huge) decision tree.

For completeness, let us mention that in some contexts, such as machine learning, complex decision trees are built automatically by algorithms. But this doesn't change their nature.
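The equivalence claimed in this answer is easy to demonstrate. In the sketch below, the feature names, thresholds and class labels are all invented; the tuple layout `(feature, threshold, left, right)` is just one possible encoding, not a standard one.

```python
def classify_if_else(temperature, humidity):
    """Hand-written cascaded tests (thresholds are made up for illustration)."""
    if temperature < 15:
        return "cold"
    else:
        if humidity < 60:
            return "pleasant"
        else:
            return "muggy"

# The same logic as data: (feature, threshold, left_subtree, right_subtree),
# where a plain string is a leaf holding the predicted label.
TREE = ("temperature", 15, "cold", ("humidity", 60, "pleasant", "muggy"))

def classify_tree(node, sample):
    """Walk from the root to a leaf, taking the left branch when the test passes."""
    while isinstance(node, tuple):
        feature, threshold, left, right = node
        node = left if sample[feature] < threshold else right
    return node

# Both versions partition the (temperature, humidity) plane into the same three regions.
mismatches = [
    (t, h)
    for t in range(-10, 41)
    for h in range(0, 101)
    if classify_if_else(t, h) != classify_tree(TREE, {"temperature": t, "humidity": h})
]
print("mismatches:", len(mismatches))
```

The only practical difference is that the tuple form is data, so a learning algorithm can construct or modify it, whereas the if-else form has to be written by a programmer.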
























Answer with 21 votes: answered Mar 19 at 13:25 by Fabian Werner








      • 2




        $begingroup$
        The other big plus is being accessible for human inspection ("aaah, so that's why!").
        $endgroup$
        – dedObed
        Mar 19 at 21:46






      • 1




        $begingroup$
        yeah decision trees are great for explaining to people without stats background because they are very intuitive.
        $endgroup$
        – qwr
        Mar 19 at 21:57














      • 2




        $begingroup$
        The other big plus is being accessible for human inspection ("aaah, so that's why!").
        $endgroup$
        – dedObed
        Mar 19 at 21:46






      • 1




        $begingroup$
        yeah decision trees are great for explaining to people without stats background because they are very intuitive.
        $endgroup$
        – qwr
        Mar 19 at 21:57








      2




      2




      $begingroup$
      The other big plus is being accessible for human inspection ("aaah, so that's why!").
      $endgroup$
      – dedObed
      Mar 19 at 21:46




      $begingroup$
      The other big plus is being accessible for human inspection ("aaah, so that's why!").
      $endgroup$
      – dedObed
      Mar 19 at 21:46




      1




      1




      $begingroup$
      yeah decision trees are great for explaining to people without stats background because they are very intuitive.
      $endgroup$
      – qwr
      Mar 19 at 21:57




      $begingroup$
      yeah decision trees are great for explaining to people without stats background because they are very intuitive.
      $endgroup$
      – qwr
      Mar 19 at 21:57













      1












      $begingroup$

      Just adding to @Fabian Werner’s answer - do you remember doing Riemann Sums rule in an intro to integration? Well that too was a set of evenly partitioned if statements which you use to calculate the area under the function.



      If you draw a 1D function and draw the partitions evenly what you will find is that in areas where the function has little gradient, neighboring partitions can be merged together without a great loss in accuracy. Equally, in partitions with high gradient adding more partitions will significantly improve the approximation.



      Any set of partitions will approximate the function but some are clearly better than others.



      Now, moving to CART models - we see data in the form of noisy points from this function and we are asked to approximate the function. By adding too many partitions we can overfit and essentially perform a nearest neighbor type model. To avoid this we limit the number of partitions our model can use (usually in the form of max depth and min samples per split). So now where should we place these splits? That is the question addressed by the splitting criteria. Areas with higher “complexity” should receive more splits as a rule of thumb and that is what gini, entropy, etc. endeavour to do.



      Making predictions is just a matter of if-else statements, but in the context of machine learning that is not where the power of the model comes from. The power comes from the model's ability to trade off overfitting and underfitting in a scalable manner, and from the fact that it can be derived in a consistent probabilistic framework with theoretical guarantees in the limit of data. Finally, if we take a similarly abstracted view of ML models, we can say that neural networks, kernel methods, Monte Carlo approaches and many more are simply addition and multiplication. Unfortunately, that is not a very useful view of the literature.

















      $endgroup$


























          edited Mar 22 at 6:51

























          answered Mar 21 at 21:12









          j__
























              0












              $begingroup$

              A decision tree is a partitioning of the problem domain into subsets by means of conditions. It is usually implemented as cascaded if-then-else statements. You can see it as a term that describes a complex decision logic.
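
The cascaded if-then-else form can be sketched as follows. The features (`raining`, `temperature_c`), the thresholds, and the returned labels are invented purely for illustration; the point is only that every input falls into exactly one branch, so the conditions partition the input domain.

```python
def recommend(raining, temperature_c):
    """A hand-written decision tree: each path of conditions ends in one leaf."""
    if raining:
        if temperature_c < 5.0:
            return "stay in"
        return "take umbrella"
    if temperature_c < 15.0:
        return "wear jacket"
    return "wear t-shirt"

# Each call walks exactly one root-to-leaf path.
for raining, temp in [(True, 0.0), (True, 12.0), (False, 10.0), (False, 25.0)]:
    print(raining, temp, "->", recommend(raining, temp))
```

Whether the tests are written by hand like this or chosen by a learning algorithm, prediction works the same way.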



              Decision trees are neither more efficient nor more "supportive" of machine learning than logical tests. They are logical tests.



              Also keep in mind that any algorithm is nothing more than a combination of arithmetic computations and tests, i.e. a (usually huge) decision tree.





              For completeness, let us mention that in some contexts, such as machine learning, complex decision trees are built automatically, by algorithms. But this doesn't change their nature.















              $endgroup$




























                  answered Mar 19 at 13:54









                  Yves Daoust






























