A continuous hinge function for statistical modeling

May 19, 2017

(This article was originally published at Statistical Modeling, Causal Inference, and Social Science, and syndicated at StatsBlogs.)

This comes up sometimes in my applied work: I want a continuous “hinge function,” something like the red curve above, connecting two straight lines in a smooth way.

Why not include the sharp corner (in this case, the function y=-0.5*x if x<0, and a second straight line if x>0)? Two reasons. First, computation: Hamiltonian Monte Carlo can trip on discontinuities, and the sharp hinge has a discontinuous derivative at the corner. Second, I want a smooth curve anyway, as I’d expect it to better describe reality. Indeed, the linear parts of the curve are themselves typically only approximations.

So, when I’m putting this together, I don’t want to take two lines and then stitch them together with some sort of quadratic or cubic, creating a piecewise function with three parts. I just want one simple formula that asymptotes to the lines, as in the above picture.

As I said, this problem comes up on occasion, and each time I struggle to remember what’s a good curve that looks like this. Then, after I work it out, I forget what I did by the next time the problem comes up. And, amazingly enough, googling *hinge function* gives no convenient solution.

So this time I decided to derive a hinge function from first principles, so we’ll have it forever.

Let’s start with the simplest example, a function y=0 for negative x, and y=x for positive x:

What’s a good curve for this? We can start with the sharp-cornered hinge. Its derivative is just a step function:

Ha! Now it’s clear what to do. We set the derivative to the inverse logit function, dy/dx = exp(x)/(1 + exp(x)):

And now we just integrate this to get the original function, the continuous hinge. Integrating exp(x)/(1 + exp(x)) dx is trivial; you just define u = exp(x), then it’s the integral of 1/(1+u) du, which is log(1+u), hence y = log (1 + exp(x)).
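As a quick sanity check on the integration step, here is a small R sketch that compares a central-difference derivative of log(1 + exp(x)) with the inverse logit. The names softplus and num_deriv are labels made up for this sketch, and the step size h is an arbitrary choice; plogis() is base R's inverse logit.

```r
# Continuous hinge in its simplest form: y = log(1 + exp(x)).
softplus <- function(x) log(1 + exp(x))

# Central-difference approximation to dy/dx (h is an arbitrary small step).
num_deriv <- function(f, x, h = 1e-5) (f(x + h) - f(x - h)) / (2 * h)

x <- seq(-5, 5, by = 0.5)
max_abs_err <- max(abs(num_deriv(softplus, x) - plogis(x)))
print(max_abs_err)  # should be tiny if the derivative really is the inverse logit
```

The constant of integration is taken to be zero here, so that y approaches 0 for very negative x.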

And that’s what I plotted in the second graph above, the one labeled, “Continuous hinge function: simplest example.”

More generally, we might want a continuous version of a hinge with corner at (x0, a), with slope b0 for x < x0 and slope b1 for x > x0. And there’s one more parameter, which is the distance scale of the inverse logit, in the dimensions of x. Label that distance scale as delta.

Our desired continuous hinge function then has the derivative,

dy/dx = b0 + (b1 - b0) * exp((x-x0)/delta)/(1 + exp((x-x0)/delta)).

Integrating this over x, and setting the constant so that the two asymptotes meet at the corner point (x0, a), yields the smooth curve,

y = a + b0*(x - x0) + (b1 - b0) * delta * log (1 + exp((x-x0)/delta))
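To see what the formula does at the extremes, here is a rough R check, using the parameter values from the first graph. Note that the corner (x0, a) is where the two asymptotes meet; the smooth curve itself passes a little off it at x = x0. The variable names left_gap, right_gap, and corner_gap are made up for this sketch, and the evaluation points -30 and 30 are arbitrary.

```r
# The continuous hinge, written out so this block runs on its own.
hinge <- function(x, x0, a, b0, b1, delta) {
  a + b0 * (x - x0) + (b1 - b0) * delta * log(1 + exp((x - x0) / delta))
}

x0 <- 2; a <- 1; b0 <- 0.1; b1 <- 0.5; delta <- 1

# Far to the left the curve should hug the line a + b0 * (x - x0).
left_gap <- hinge(-30, x0, a, b0, b1, delta) - (a + b0 * (-30 - x0))
# Far to the right it should hug the line a + b1 * (x - x0).
right_gap <- hinge(30, x0, a, b0, b1, delta) - (a + b1 * (30 - x0))
# At x = x0 the curve is offset from the corner by (b1 - b0) * delta * log(2).
corner_gap <- hinge(x0, x0, a, b0, b1, delta) - a

print(c(left_gap, right_gap, corner_gap))
```

The offset at the corner shrinks in proportion to delta, which is one way to think about how closely the smooth curve tracks the sharp hinge.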

The top curve above shows the continuous curve with the values x0=2, a=1, b0=0.1, b1=0.5, delta=1.

Playing around with delta keeps the asymptotes the same but compresses or spreads the curving part. For example, here’s what you get by setting delta=3:

In this example, delta=1 (displayed in the very first graph in this post) looks like a better choice if we really want something that looks like a hinge; on the other hand, there are settings where something smoother is desired; and it all depends on the scale of x. The above graphs would look much different if plotted from -100 to 100, for example. Anyway, the point here is that we can set delta, and now we understand how it works.
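One way to make the role of delta concrete: the slope is b0 plus (b1 - b0) times the inverse logit of (x - x0)/delta, so it has moved a fraction p of the way from b0 to b1 at a distance delta * logit(p) to the right of x0. The sketch below tabulates that distance; the function name transition_half_width and the threshold p = 0.95 are arbitrary choices for illustration.

```r
# Half-width of the curved region: distance from x0 at which the slope has
# covered a fraction p of the way from b0 to b1 (p = 0.95 is an arbitrary choice).
transition_half_width <- function(delta, p = 0.95) delta * qlogis(p)

deltas <- c(0.1, 1, 3)
half_widths <- transition_half_width(deltas)
print(data.frame(delta = deltas, half_width = half_widths))
```

With p = 0.95 the half-width is log(19) * delta, or roughly 2.9 * delta, so with delta=3 the bend spreads over most of a plot running from -10 to 10.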

Here’s the R function:

hinge <- function(x, x0, a, b0, b1, delta) {
  return(a + b0 * (x - x0) + (b1 - b0) * delta * log(1 + exp((x - x0) / delta)))
}

This is not actually the best way to compute things, as the exponential can easily send you into overflow, especially if you set delta to a small number, as you might very well do in order to approximate the sharp corner. Indeed, in the graphs above, I drew the dotted lines using hinge() with delta=0.1, as this was less effort than writing a separate if/else function for the sharp hinge, and these graphs are at a resolution where setting delta=0.1 is close enough to setting it to 0.
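For reference, the separate if/else function for the sharp hinge could look something like the following. This is only a sketch: the name sharp_hinge is made up here, and the grid of x values is arbitrary. It also measures how far hinge() with delta=0.1 strays from the sharp version.

```r
# Sharp-cornered hinge: slope b0 to the left of x0, slope b1 to the right,
# with the corner at (x0, a). The name sharp_hinge is made up for this sketch.
sharp_hinge <- function(x, x0, a, b0, b1) {
  a + ifelse(x < x0, b0, b1) * (x - x0)
}

# The smooth hinge from the post, repeated so the block runs on its own.
hinge <- function(x, x0, a, b0, b1, delta) {
  a + b0 * (x - x0) + (b1 - b0) * delta * log(1 + exp((x - x0) / delta))
}

x <- seq(-10, 10, by = 0.01)
max_gap <- max(abs(hinge(x, 2, 1, .1, .5, .1) - sharp_hinge(x, 2, 1, .1, .5)))
print(max_gap)  # largest gap is at the corner: (b1 - b0) * delta * log(2)
```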

What value of delta should be used in a real application? It depends on the context. You should resist the inclination to set delta to a tiny value such as 0.001 or even 0.1. Think of the curving connector piece not just as a computational compromise but as a part of your model, in that real-world functions typically do not have sharp corners.

I think we should implement a hinge() function in Stan that does this computation and associated autodiff in a stable and computationally efficient manner. In the meantime, if you do use the above function, make sure that the numbers being exponentiated aren't too extreme. If they are far from 0, do some rescaling in the computation to avoid instabilities.
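Here is one possible way to do that rescaling in R, offered as a sketch rather than a definitive implementation. It relies on the identity log(1 + exp(z)) = max(z, 0) + log(1 + exp(-|z|)), which only ever exponentiates non-positive numbers. The helper names log1p_exp and hinge_stable are chosen for this sketch and are not base R functions; log1p() and pmax() are base R.

```r
# log(1 + exp(z)) rewritten as max(z, 0) + log(1 + exp(-|z|)), so that exp()
# is only ever applied to non-positive numbers and cannot overflow.
log1p_exp <- function(z) pmax(z, 0) + log1p(exp(-abs(z)))

hinge_stable <- function(x, x0, a, b0, b1, delta) {
  a + b0 * (x - x0) + (b1 - b0) * delta * log1p_exp((x - x0) / delta)
}

# Naive version from the post, for comparison.
hinge <- function(x, x0, a, b0, b1, delta) {
  a + b0 * (x - x0) + (b1 - b0) * delta * log(1 + exp((x - x0) / delta))
}

naive_val  <- hinge(10, 2, 1, .1, .5, 0.001)         # exp(8000) overflows to Inf
stable_val <- hinge_stable(10, 2, 1, .1, .5, 0.001)  # stays on the line a + b1 * (x - x0)
print(c(naive = naive_val, stable = stable_val))
```

For moderate arguments the two versions agree to rounding error, so the stable form can be swapped in without changing the graphs.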

P.S. Below is the R code for the graphs. Yeah, I know it's ugly. That's part of the point, to show what I do and, I hope, motivate you to do better. I'm not the world's cleanest coder, and I'm not proud of that.

pdf("hinge_1.pdf", height=4, width=5.5)
par(mar=c(3,3,1,1), mgp=c(1.7,.5,0), tck=-.01)
curve(hinge(x, 2, 1, .1, .5, .1), from=-10, to=10, n=1e5, bty="l", xlab="x", ylab="y", xaxs="i", lty=2)
curve(hinge(x, 2, 1, .1, .5, 1), from=-10, to=10, n=1e5, col="red", add=TRUE)
mtext("A continuous hinge function")
text(-5, .7, "slope of 0.1", cex=.9)
text(7.2, 2.5, "slope of 0.5", cex=.9)
dev.off()

pdf("hinge_2.pdf", height=4, width=5.5)
par(mar=c(3,3,1,1), mgp=c(1.7,.5,0), tck=-.01)
curve(hinge(x, 0, 0, 0, 1, .1), from=-10, to=10, n=1e5, bty="l", xlab="x", ylab="y", xaxs="i", lty=2)
curve(hinge(x, 0, 0, 0, 1, 1), from=-10, to=10, n=1e5, col="red", add=TRUE)
mtext("Continuous hinge function:  simplest example")
text(-5, .5, "y = 0", cex=.9)
text(5, 6, "y = x", cex=.9)
dev.off()

pdf("hinge_3.pdf", height=4, width=5.5)
par(mar=c(3,3,1,1), mgp=c(1.7,.5,0), tck=-.01)
curve(ifelse(x<0, 0, 1), from=-10, to=10, n=1e5, bty="l", xlab="x", ylab="dy/dx", xaxs="i", lty=2)
mtext("The derivative of a hinge is a step")
dev.off()

pdf("hinge_4.pdf", height=4, width=5.5)
par(mar=c(3,3,1,1), mgp=c(1.7,.5,0), tck=-.01)
curve(ifelse(x<0, 0, 1), from=-10, to=10, n=1e5, bty="l", xlab="x", ylab="dy/dx", xaxs="i", lty=2)
curve(exp(x)/(1+exp(x)), from=-10, to=10, n=1e5, col="red", add=TRUE)
mtext("Inverse logit as a continuous step")
dev.off()

pdf("hinge_5.pdf", height=4, width=5.5)
par(mar=c(3,3,1,1), mgp=c(1.7,.5,0), tck=-.01)
curve(hinge(x, 2, 1, .1, .5, .1), from=-10, to=10, n=1e5, bty="l", xlab="x", ylab="y", xaxs="i", lty=2)
curve(hinge(x, 2, 1, .1, .5, 3), from=-10, to=10, n=1e5, col="red", add=TRUE)
mtext(expression(paste("Continuous hinge function with ", delta, " = 3")), line=-.2)
text(-3.5, .1, "slope of 0.1", cex=.9)
text(7.2, 2.5, "slope of 0.5", cex=.9)
dev.off()

There's also this annoying thing where I make the graphs and view them in pdf (that's fine), but then I have to duplicate them and save them as png files, then upload them one at a time onto the blog, then edit the html. There's gotta be a better approach. I could use Markdown, I suppose, but then I'd have to learn Markdown. Also, I don't really like how Markdown documents look. I'm used to blogstyle.

P.P.S. I don't kid myself that this function is new; I'm sure it's been rediscovered a zillion times. I just find it useful to have the functional form and the derivation here in one place.

