Categorizing a variable turns it from insignificant to significant
I have a numeric variable which turns out not to be significant in a multivariate logistic regression model.
However, when I categorize it into groups, suddenly it becomes significant.
This is very counter-intuitive to me: when categorizing a variable, we give some information up.
How can this be?
regression logistic statistical-significance multivariate-analysis
asked Mar 19 at 5:58 by Omry Atia (edited Mar 19 at 9:53 by kjetil b halvorsen)
2 Answers
One possible explanation would be nonlinearities in the relationship between your outcome and the predictor.
Here is a little example. We use a predictor that is uniform on $[-1,1]$. The outcome, however, does not linearly depend on the predictor, but on the square of the predictor: TRUE is more likely for both $x\approx-1$ and $x\approx 1$, but less likely for $x\approx 0$. In this case, a linear model will come up insignificant, but cutting the predictor into intervals makes it significant.

    > # Simulate a predictor on [-1,1] and an outcome that depends on xx^2
    > set.seed(1)
    > nn <- 1e3
    > xx <- runif(nn,-1,1)
    > yy <- runif(nn)<1/(1+exp(-xx^2))
    >
    > library(lmtest)
    >
    > # Intercept-only model vs. model that is linear in xx
    > model_0 <- glm(yy~1,family="binomial")
    > model_1 <- glm(yy~xx,family="binomial")
    > lrtest(model_1,model_0)
    Likelihood ratio test

    Model 1: yy ~ xx
    Model 2: yy ~ 1
      #Df  LogLik Df  Chisq Pr(>Chisq)
    1   2 -676.72
    2   1 -677.22 -1 0.9914     0.3194
    >
    > # Same predictor, cut into three intervals
    > xx_cut <- cut(xx,c(-1,-0.3,0.3,1))
    > model_2 <- glm(yy~xx_cut,family="binomial")
    > lrtest(model_2,model_0)
    Likelihood ratio test

    Model 1: yy ~ xx_cut
    Model 2: yy ~ 1
      #Df  LogLik Df  Chisq Pr(>Chisq)
    1   3 -673.65
    2   1 -677.22 -2 7.1362    0.02821 *
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

However, this does not mean that discretizing the predictor is the best approach. (It almost never is.) Much better to model the nonlinearity using splines or similar.
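A minimal sketch of the spline alternative, on the same simulated data. It assumes a natural cubic spline from the `splines` package (which ships with R) and 3 degrees of freedom; both are illustrative choices, as is the name `model_s`. The likelihood ratio test uses base R's `anova()` so the sketch does not depend on `lmtest`.

```r
# Sketch: model the nonlinearity with a natural cubic spline instead of cutting.
library(splines)  # part of the standard R distribution

# Same simulated data as in the example above
set.seed(1)
nn <- 1e3
xx <- runif(nn, -1, 1)
yy <- runif(nn) < 1 / (1 + exp(-xx^2))

# Intercept-only model as the reference
model_0 <- glm(yy ~ 1, family = "binomial")

# Spline model; df = 3 is an assumption for illustration, not a recommendation
model_s <- glm(yy ~ ns(xx, df = 3), family = "binomial")

# Likelihood ratio (deviance) test of the spline model against the null model
lrt <- anova(model_0, model_s, test = "Chisq")
print(lrt)
```

Unlike the cut version, the spline does not require choosing interval boundaries, although the number of degrees of freedom still has to be picked.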
answered Mar 19 at 6:22 by Stephan Kolassa
Are there some examples where discretizing might be sensible? For example, if you have a specific threshold (e.g. age 18) at which a binary switch in outcomes occurs. Numeric age in the 18+ range might not be significant, but binary age >18 might be significant? – ajrwhite, Mar 19 at 18:40
@ajrwhite: it depends on the field. Anywhere that thresholds are codified in law, discretization might make sense. E.g., if you model voting behavior, it makes sense to check whether someone is actually eligible to vote at age 18. Similarly, in Germany, your vehicle tax depends on your engine displacement and jumps at 1700, 1800, 1900, ... ccm, so pretty much all cars have displacements of 1699, 1799, ... ccm (kind of self-discretizing). In the natural sciences like biology, medicine, psychology etc., I struggle to find an example where discretization makes sense. – Stephan Kolassa, Mar 20 at 6:03
One way this can happen is if the relationship is distinctly nonlinear. It's not possible to tell (given the lack of detail) whether this really explains what's going on.
You can check for yourself. First, you could do an added variable plot for the variable in its original numeric form, and you could also plot the fitted effects in the factor version of the model. If the explanation is right, both should show a distinctly nonlinear pattern.
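The second check, plotting the fitted effects in the factor version of the model, might be sketched as follows. This is only an illustration under stated assumptions: it uses the simulated data from the other answer as a stand-in for the real data, it has no other covariates, and the eight equal-width bins and the names `breaks`, `model_f` and `p_hat` are hypothetical choices.

```r
# Sketch: fitted effects of the categorized predictor, plotted against bin midpoints.
set.seed(1)
nn <- 1e3
xx <- runif(nn, -1, 1)
yy <- runif(nn) < 1 / (1 + exp(-xx^2))

# Factor version of the predictor (assumption: 8 equal-width bins)
breaks <- seq(-1, 1, by = 0.25)
xx_cut <- cut(xx, breaks, include.lowest = TRUE)
model_f <- glm(yy ~ xx_cut, family = "binomial")

# Fitted probability of TRUE for each bin
new_bins <- data.frame(xx_cut = factor(levels(xx_cut), levels = levels(xx_cut)))
p_hat <- predict(model_f, newdata = new_bins, type = "response")

# A roughly straight line would support a linear term;
# a U shape or other curvature points to nonlinearity.
mids <- head(breaks, -1) + diff(breaks) / 2
plot(mids, p_hat, type = "b",
     xlab = "xx (bin midpoint)", ylab = "fitted P(yy = TRUE)")
```

With other covariates in the model, the same idea applies, but the fitted effects would need to be computed with the remaining covariates held at fixed values.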
answered Mar 19 at 6:23 by Glen_b (edited Mar 19 at 14:58)