categorizing a variable turns it from insignificant to significant

I have a numeric variable that is not significant in a multivariable logistic regression model.
However, when I categorize it into groups, it suddenly becomes significant.
This is very counter-intuitive to me: when categorizing a variable, we give up some information.

How can this be?

edited Mar 19 at 9:53 by kjetil b halvorsen

asked Mar 19 at 5:58 by Omry Atia

regression logistic statistical-significance multivariate-analysis

2 Answers

One possible explanation would be nonlinearities in the relationship between your outcome and the predictor.

Here is a small example. We use a predictor that is uniform on $[-1,1]$. The outcome, however, depends not linearly on the predictor but on its square: TRUE is more likely for both $x\approx -1$ and $x\approx 1$, and less likely for $x\approx 0$. Because this U-shape is symmetric around zero, the best-fitting straight line on the logit scale is nearly flat, so a model with only a linear term comes out insignificant, while cutting the predictor into intervals makes it significant.

```
> set.seed(1)
> nn <- 1e3
> xx <- runif(nn,-1,1)
> yy <- runif(nn)<1/(1+exp(-xx^2))
>
> library(lmtest)
>
> model_0 <- glm(yy~1,family="binomial")
> model_1 <- glm(yy~xx,family="binomial")
> lrtest(model_1,model_0)
Likelihood ratio test

Model 1: yy ~ xx
Model 2: yy ~ 1
  #Df  LogLik Df  Chisq Pr(>Chisq)
1   2 -676.72
2   1 -677.22 -1 0.9914     0.3194
>
> xx_cut <- cut(xx,c(-1,-0.3,0.3,1))
> model_2 <- glm(yy~xx_cut,family="binomial")
> lrtest(model_2,model_0)
Likelihood ratio test

Model 1: yy ~ xx_cut
Model 2: yy ~ 1
  #Df  LogLik Df  Chisq Pr(>Chisq)
1   3 -673.65
2   1 -677.22 -2 7.1362    0.02821 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```
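As a quick arithmetic check on the output above (this is the standard likelihood-ratio construction, not something the transcript prints step by step): the test statistic is twice the gain in log-likelihood $\ell$ over the intercept-only model, compared with a $\chi^2$ distribution whose degrees of freedom equal the number of extra parameters. Using the displayed, rounded log-likelihoods:

$$
\begin{aligned}
\text{LR} &= 2\,\bigl(\ell_{\text{model}} - \ell_{\text{intercept only}}\bigr) \\
\text{linear:}\quad \text{LR} &= 2\,\bigl(-676.72 - (-677.22)\bigr) = 1.00 \approx 0.9914, \quad \text{df} = 2 - 1 = 1 \\
\text{binned:}\quad \text{LR} &= 2\,\bigl(-673.65 - (-677.22)\bigr) = 7.14 \approx 7.1362, \quad \text{df} = 3 - 1 = 2 \\
P\bigl(\chi^2_2 > 7.1362\bigr) &= e^{-7.1362/2} \approx 0.0282
\end{aligned}
$$

The small discrepancies come from rounding of the printed log-likelihoods. The binned model spends one more degree of freedom than the linear one, but it can follow the U-shape, whereas a single slope fitted to this symmetric pattern stays close to flat.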

However, this does not mean that discretizing the predictor is the best approach. (It almost never is.) It is much better to model the nonlinearity with splines or a similar flexible method.
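A minimal sketch of that alternative in base R, assuming we re-use the simulated data from the transcript (same seed and data-generating process). It swaps `lrtest()` for base R's `anova(..., test = "Chisq")`, which computes the same likelihood-ratio test for nested GLMs, and uses a natural cubic spline from the `splines` package that ships with R. The helper `lrt_p` is a hypothetical name, and `df = 3` is an arbitrary illustrative choice. The quadratic model is included only because it matches how these data were simulated; with real data the true shape is usually unknown.

```r
# Sketch only: fit the nonlinearity directly instead of binning the predictor.
# Re-simulates the data from the transcript above (same seed, same generating process).
library(splines)  # part of base R; provides ns() for natural cubic splines

set.seed(1)
nn <- 1e3
xx <- runif(nn, -1, 1)
yy <- runif(nn) < 1 / (1 + exp(-xx^2))  # P(TRUE) is U-shaped in xx

# Hypothetical helper: p-value of the likelihood-ratio test of `full` against `reduced`.
# For nested glm fits, anova(..., test = "Chisq") gives the same LR test as lmtest::lrtest().
lrt_p <- function(reduced, full) {
  tab <- anova(reduced, full, test = "Chisq")
  as.numeric(tab[2, "Pr(>Chi)"])
}

model_0    <- glm(yy ~ 1, family = binomial)               # intercept only
model_lin  <- glm(yy ~ xx, family = binomial)              # single linear slope
model_ns   <- glm(yy ~ ns(xx, df = 3), family = binomial)  # natural spline, 3 df (arbitrary)
model_quad <- glm(yy ~ xx + I(xx^2), family = binomial)    # quadratic; matches the simulation

p_values <- c(
  linear    = lrt_p(model_0, model_lin),
  spline    = lrt_p(model_0, model_ns),
  quadratic = lrt_p(model_0, model_quad)
)
print(signif(p_values, 3))
```

If the nonlinearity explanation is right, the spline and quadratic terms should pick up the effect that the single slope misses, without committing to arbitrary cut points.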

answered Mar 19 at 6:22 by Stephan Kolassa

Are there examples where discretizing might be sensible? For example, if there is a specific threshold (e.g., age 18) at which a binary switch in outcomes occurs, numeric age in the 18+ range might not be significant, but a binary age >18 indicator might be?
– ajrwhite
Mar 19 at 18:40

@ajrwhite: It depends on the field. Wherever thresholds are codified in law, discretization might make sense. For example, if you model voting behavior, it makes sense to check whether someone is actually eligible to vote at age 18. Similarly, in Germany, vehicle tax depends on engine displacement and jumps at 1700, 1800, 1900, ... ccm, so pretty much all cars have displacements of 1699, 1799, ... ccm (kind of self-discretizing). In the natural sciences, such as biology, medicine, and psychology, I struggle to find an example where discretization makes sense.
– Stephan Kolassa
Mar 20 at 6:03


One possibility is that the relationship is distinctly nonlinear. Given the lack of detail, it's not possible to tell whether this really explains what's going on.

You can check this yourself. First, you could make an added-variable plot for the variable in its original numeric form; you could also plot the fitted effects in the factor version of the model. If the explanation is right, both should show a distinctly nonlinear pattern.
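A minimal sketch of the second check (fitted effects of the factor version), assuming the simulated data and cut points from the first answer stand in for the asker's data. It does not cover the added-variable plot, and it has no other covariates; in the asker's multivariable model you would hold those fixed when computing fitted values. The name `bins` is purely illustrative.

```r
# Sketch only: fitted effects of the binned ("factor") version of the model.
# Re-simulates the data from the first answer; the asker's own data would replace this.
set.seed(1)
nn <- 1e3
xx <- runif(nn, -1, 1)
yy <- runif(nn) < 1 / (1 + exp(-xx^2))

xx_cut  <- cut(xx, c(-1, -0.3, 0.3, 1))  # same three bins as in the first answer
model_2 <- glm(yy ~ xx_cut, family = binomial)

# One row per bin; `bins` is a hypothetical name used only for this illustration.
bins <- data.frame(xx_cut = factor(levels(xx_cut), levels = levels(xx_cut)))
bins$fitted_prob <- as.numeric(predict(model_2, newdata = bins, type = "response"))
print(bins)

# A middle bin that is lower (or higher) than both outer bins suggests a nonlinear effect.
plot(seq_len(nrow(bins)), bins$fitted_prob, type = "b", xaxt = "n",
     xlab = "bin of xx", ylab = "fitted P(yy = TRUE)")
axis(1, at = seq_len(nrow(bins)), labels = levels(xx_cut))
```

With only the binned predictor in the model, these fitted probabilities are simply the observed proportions of TRUE in each bin.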

edited Mar 19 at 14:58

answered Mar 19 at 6:23 by Glen_b♦
