Choosing k value in KNN classifier?
I'm working on a classification problem and decided to use a KNN classifier for it.
If k=131 gave me an AUC of 0.689 and k=71 gave me an AUC of 0.682, what should my ideal k be?
Does choosing a higher k mean using more computational resources? If so, can I go with k=71, or should I always use the k with the maximum score no matter what?
machine-learning k-nn
asked 7 hours ago by user214
So, are you calculating AUC using cross-validation? – pythinker, 7 hours ago
@pythinker Yes. – user214, 7 hours ago
2 Answers
Because KNN is a non-parametric method, the computational cost of a given k depends heavily on the size of the training data. If the training set is small, you can freely choose the k that gives the best AUC on the validation set. If the training set is large, choosing a large k can add considerable computational cost, which shows up as slow prediction on test data.
answered 6 hours ago by pythinker
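To see how much k actually affects prediction time on a given dataset, it may be simplest to measure it. The sketch below is a hypothetical illustration, not the answerer's code: it uses synthetic data from scikit-learn's `make_classification`, and the helper name `time_knn_prediction` and the dataset sizes are assumptions. With `algorithm="brute"`, every query is compared against all training points regardless of k, so the training-set size often matters more than k itself.

```python
import time

from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier


def time_knn_prediction(k_values, n_train=2000, n_test=200, n_features=20, seed=0):
    """Return {k: seconds spent predicting} on synthetic data (sizes are assumptions)."""
    X, y = make_classification(
        n_samples=n_train + n_test, n_features=n_features, random_state=seed
    )
    X_train, y_train = X[:n_train], y[:n_train]
    X_test = X[n_train:]

    timings = {}
    for k in k_values:
        # "Fitting" a brute-force KNN only stores the training data.
        model = KNeighborsClassifier(n_neighbors=k, algorithm="brute")
        model.fit(X_train, y_train)

        # Time only the prediction step, which is where the neighbour search happens.
        start = time.perf_counter()
        model.predict_proba(X_test)
        timings[k] = time.perf_counter() - start
    return timings


timings = time_knn_prediction([71, 131])
for k, seconds in timings.items():
    print(f"k={k}: {seconds:.4f} s")
```

Timings vary from run to run and between machines, so it is worth repeating the measurement on the real data before ruling out a k on cost alone.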
Does 100k rows and 8000 features qualify as big training data? Also, choosing a high k means we risk underfitting; how can I know that I'm not underfitting when choosing a high k? – user214, 6 hours ago
Yes, that's actually a big training dataset. To make sure you are neither underfitting nor overfitting, check your model's performance on the training and validation datasets together. If the training score is low, you are underfitting. If the training score is much higher than the validation score, you are overfitting. The best case is when the training and validation scores are close to each other. – pythinker, 6 hours ago
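The check described in the comment above can be sketched as follows. This is a minimal illustration on synthetic data, not the commenter's code; the helper name `train_val_auc`, the dataset, and the k values are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def train_val_auc(k_values, seed=0):
    """Return {k: (training AUC, validation AUC)} on a synthetic binary problem."""
    X, y = make_classification(
        n_samples=1500, n_features=20, n_informative=5, random_state=seed
    )
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )

    scores = {}
    for k in k_values:
        model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
        # Column 1 of predict_proba is the estimated probability of the positive class.
        train_auc = roc_auc_score(y_train, model.predict_proba(X_train)[:, 1])
        val_auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
        scores[k] = (train_auc, val_auc)
    return scores


scores = train_val_auc([1, 11, 71, 131])
for k, (train_auc, val_auc) in scores.items():
    print(f"k={k:>3}: train AUC={train_auc:.3f}  validation AUC={val_auc:.3f}")
```

On this synthetic data, k=1 makes every training point its own nearest neighbour, so the training AUC is perfect while the validation AUC is typically lower, which is the overfitting pattern. As k grows, the two scores tend to move closer together.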
I was taught that the best way is to compute the error for each k, plot the errors against k, and look for the "elbow" in the plot.
answered 7 hours ago by Stephen Ewing
So should I go with k=131? – user214, 7 hours ago
It really depends. The higher your k, the higher your chance of underfitting. So if you try every k from 2 to 200 and plot the error for each of them, use the k where the curve starts to flatten out. – Stephen Ewing, 7 hours ago
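A rough sketch of the idea above: compute a cross-validated score for a range of k values, then pick a k near where the curve flattens. Reading the elbow off a plot is a judgment call, so the code swaps in a simple rule instead, which is not the answerer's method: take the smallest k whose mean AUC is within a chosen tolerance of the best one. The synthetic data, the helper names `cv_auc_by_k` and `smallest_k_within_tolerance`, the grid of k values, and the tolerance of 0.01 are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier


def cv_auc_by_k(X, y, k_values, cv=5):
    """Return {k: mean cross-validated ROC AUC} for each candidate k."""
    return {
        k: cross_val_score(
            KNeighborsClassifier(n_neighbors=k), X, y, cv=cv, scoring="roc_auc"
        ).mean()
        for k in k_values
    }


def smallest_k_within_tolerance(mean_auc, tolerance=0.01):
    """Pick the smallest k whose score is within `tolerance` of the best score."""
    best = max(mean_auc.values())
    return min(k for k, auc in mean_auc.items() if auc >= best - tolerance)


# Synthetic stand-in for the real dataset (an assumption for illustration).
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, random_state=0
)
k_values = list(range(1, 202, 20))  # 1, 21, 41, ..., 201
mean_auc = cv_auc_by_k(X, y, k_values)
chosen_k = smallest_k_within_tolerance(mean_auc)

for k, auc in mean_auc.items():
    print(f"k={k:>3}: mean AUC={auc:.3f}")
print("chosen k:", chosen_k)
```

Applied to the two scores in the question, `smallest_k_within_tolerance({71: 0.682, 131: 0.689})` returns 71 at the default tolerance of 0.01 and 131 at a tolerance of 0.005, so the choice depends on how large a difference in AUC you consider meaningful. The `mean_auc` dictionary can also be plotted against k to inspect the curve by eye.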