In the meantime, here's one last maths-heavy post (promise!)
Newton's Method for root finding (or strictly the Secant Method, as we're approximating the derivative numerically rather than analytically) works pretty well. It fits a straight line to the function, then solves for where that straight line crosses zero. Then we repeat, using the straight-line solution as a starting point to fit a new straight line.
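For reference, that straight-line-fitting-and-solving boils down to the standard Newton update (with f'(x) approximated numerically in our case):

x_new = x - f(x)/f'(x)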
We're about to leave behind anything that's taught in schools - so let's abandon any pretence that this isn't about calculus. I'll also abandon any claim that I actually know this stuff - my maths is pretty good, but I don't have a maths degree, so there may be some inaccuracies in here! This is stuff I picked up when fact-checking the previous post!
Newton-Raphson takes the derivative of a function to find its gradient at a point and fits a straight line, but what if we could do better? The function isn't a straight line - it's a curve, which means its gradient is changing, so the straight line only gives us an approximate answer. But what if we could include information about not just the gradient, but how the gradient is changing? Well, if we plotted a graph of the gradient, what would its gradient be? In other words, the second derivative!
A straight line has a constant gradient, so f'(x)=m, which is a constant. f''(x) is how the gradient changes, and (for a straight line) it doesn't, so f''(x)=0. But if we had a quadratic (say x*x), then the gradient does change - it's a curve! In fact in this case f'(x)=2x. That of course is a straight line with gradient 2, so f''(x)=2. Newton assumes that f''(x)=0, which is why it only approximates the curve rather than getting it right.
It turns out Newton's method is only the most basic approach to this problem, and there are a whole bunch of more advanced methods which incorporate higher-order derivatives and fit ever more complex curves. After Newton, the next simplest method is Halley's method. It's just like Newton-Raphson except that we use:
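x_new = x - (2*f(x)*f'(x)) / (2*f'(x)^2 - f(x)*f''(x))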
That's a bit more complex, but most of it is just an equation that we need to type in, in place of the Newton equation. As you can see it has f''(x) in the denominator, so it's accounting for the curvature.
We approximate f'(x) as:
f'(x) = (f(x+dx)-f(x))/dx
so we can approximate f''(x) as:
f''(x) = (f'(x+dx)-f'(x))/dx
However, using our equations so far, this would approximate f'(x+dx) using f(x+dx) and f(x+dx+dx), which is a bit far from the point we're actually interested in. But we could just as easily have used:
f'(x) = (f(x)-f(x-dx))/dx
(this is called backward differencing, rather than forward differencing), so we can use these two approximations - one backwards and one forwards - to get two gradient estimates close to x, and use those to find f''(x):
f''(x)=((f(x+dx)-f(x))/dx - (f(x)-f(x-dx))/dx )/dx
It's the forward difference, minus the backward difference, divided by the distance between them. Rearranging this gives:
f''(x)=(f(x+dx)+f(x-dx)-2f(x))/(dx^2)
While we're at it, averaging the forward and backward differences gives us a better (central) approximation to the first derivative too:
f'(x) = (f(x+dx)-f(x-dx))/(2dx)
In fact we can drop this into Newton's method on its own for slightly better results.
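If you want to convince yourself these formulas work, here's a quick sanity check in Python (just my own sketch, nothing to do with the block code below). For f(x)=x*x*x the exact answers at x=1 are f'(1)=3 and f''(1)=6:

def f(x):
    return x * x * x

x = 1.0
dx = 0.005

# central difference estimate of f'(x)
fp = (f(x + dx) - f(x - dx)) / (2 * dx)

# second difference estimate of f''(x)
fpp = (f(x + dx) + f(x - dx) - 2 * f(x)) / (dx * dx)

print(fp, fpp)  # roughly 3.000025 and 6.0 - very close to the exact values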
Now that we can find f''(x), and a better approximation to f'(x), we can drop these into the previous code, and off we go:
when start
.set x to 45
.set dx to 0.005
.repeat 10
..broadcast calcF and wait
..set fx to f
..say join join [x] ":" [fx]
..
..change x by dx
..broadcast calcF and wait
..set fpdx to f
..change x by -dx
..change x by -dx
..broadcast calcF and wait
..set fmdx to f
..change x by dx
..
..set m to (fpdx-fmdx)/(dx+dx)
..set d2 to (fpdx+fmdx-(fx+fx))/(dx*dx)
..
..set x to x-((2*fx*m)/(2*m*m-fx*d2))
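For anyone who finds the blocks hard to read, here's roughly the same loop as a Python sketch. It's my own translation rather than the original code, and the cubic is an assumption - it seems to match the numbers printed below, but any f(x) will do:

def f(x):
    return x**3 + 2 * x**2 - x - 1  # assumed stand-in for the "random cubic"

x = 45.0
dx = 0.005

for _ in range(10):
    fx = f(x)
    print(x, ":", fx)

    fpdx = f(x + dx)  # f(x+dx)
    fmdx = f(x - dx)  # f(x-dx)

    m = (fpdx - fmdx) / (dx + dx)            # central difference estimate of f'(x)
    d2 = (fpdx + fmdx - 2 * fx) / (dx * dx)  # second difference estimate of f''(x)

    # Halley's update
    x = x - (2 * fx * m) / (2 * m * m - fx * d2)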
Running the block code on our random cubic prints out:
45:95129
29.7889:28177.9
6.22419:311.385
3.00481:41.1829
1.425:4.52992
0.869719:0.30097
0.802179:0.00100017
0.801938:-1.19209e-07
0.801938:-1.19209e-07
Compare that to the results from Newton:
45:95129
29.5867:27619.5
19.4132:8049.59
12.7117:2363.52
8.29118:698.161
5.37315:206.495
3.45087:60.4607
2.20288:17.1924
1.43379:4.62523
1.00966:1.05843
0.835976:0.145962
0.803114:0.00487411
0.801939:7.15256e-06
0.801938:2.38419e-07
0.801938:-1.19209e-07
Including the information about the curvature gets us to the first answer a lot quicker! Using -1 and -2 as starting points to find the other roots produces:
-1:1
-0.599916:0.103806
-0.554994:8.11815e-05
-0.554958:1.19209e-07
-0.554958:-5.96046e-08
-0.554958:-5.96046e-08
-2:1
-2.23067:0.0828772
-2.24698:1.66893e-05
-2.24698:-9.53674e-07
-2.24698:7.15256e-07
And for Newton:
-1:1
-0.500083:-0.124813
-0.555568:0.00140119
-0.554958:-5.96046e-08
-0.554958:-5.96046e-08
-2:1
-2.33341:-0.481945
-2.2531:-0.0317504
-2.247:-0.00013113
-2.24698:7.15256e-07
-2.24698:-9.53674e-07
Both of these converge pretty quickly, so the results are a little inconclusive: for -2, Halley converges one step faster. For the root near -1 it looks like Newton gets there first... 4 steps rather than 5, but look more closely: on the first three iterations Halley is closer, and on the fourth it's actually printing the right answer, just with an error of about 1e-7 rather than the roughly 6e-8 it settles to on the final step.
Overall Halley definitely converges faster. However there are problems.
The first is numerical instability. In the Newton code I used a value of dx=0.001 and it worked great. In this code the second derivative requires dividing by dx squared, which would be 0.000001. That's small. In fact it didn't work - I had to increase dx to at least 0.005, otherwise the whole thing went crazy. Larger values of dx mean our derivative estimates are less accurate, and things start to go wrong.
The second is that we're relying on more maths. Both Newton and Halley rely on the functions we're testing being "well behaved", but there's just more to go wrong with that for Halley than for Newton. There's also more code to go wrong - we're approximating more things, and that means more errors can creep in. In both cases there are a lot of things that need to go right to produce versions of the code that work reliably (the code here is for experimenting with and isn't designed to handle when things go wrong), but again there's more to go wrong in the more complex code.
The real problem however is that Halley converges faster PER STEP. But look at the work we're doing per step: it evaluates the function three times rather than two, and then does a lot more maths on those values. Overall, although Halley uses fewer steps, it probably uses about the same, or more, CPU time. Computers are really good at doing lots of simple steps, so giving them the simpler code and letting them run it fast, with lots of iterations, is probably the better solution in the real world.
Even if this doesn't necessarily produce better results, it was still fun to try, and there are a lot of good things to be learnt about implementing numerical analysis in code. We now return to our normal programming.