885 lines
34 KiB
Go
885 lines
34 KiB
Go
// Copyright 2009 The Go Authors. All rights reserved.
|
|
// Use of this source code is governed by a BSD-style
|
|
// license that can be found in the LICENSE file.
|
|
|
|
/*
|
|
|
|
Multi-precision division. Here be dragons.
|
|
|
|
Given u and v, where u is n+m digits, and v is n digits (with no leading zeros),
|
|
the goal is to return quo, rem such that u = quo*v + rem, where 0 ≤ rem < v.
|
|
That is, quo = ⌊u/v⌋ where ⌊x⌋ denotes the floor (truncation to integer) of x,
|
|
and rem = u - quo·v.
|
|
|
|
|
|
Long Division
|
|
|
|
Division in a computer proceeds the same as long division in elementary school,
|
|
but computers are not as good as schoolchildren at following vague directions,
|
|
so we have to be much more precise about the actual steps and what can happen.
|
|
|
|
We work from most to least significant digit of the quotient, doing:
|
|
|
|
• Guess a digit q, the number of v to subtract from the current
|
|
section of u to zero out the topmost digit.
|
|
• Check the guess by multiplying q·v and comparing it against
|
|
the current section of u, adjusting the guess as needed.
|
|
• Subtract q·v from the current section of u.
|
|
• Add q to the corresponding section of the result quo.
|
|
|
|
When all digits have been processed, the final remainder is left in u
|
|
and returned as rem.
|
|
|
|
For example, here is a sketch of dividing 5 digits by 3 digits (n=3, m=2).
|
|
|
|
q₂ q₁ q₀
|
|
_________________
|
|
v₂ v₁ v₀ ) u₄ u₃ u₂ u₁ u₀
|
|
↓ ↓ ↓ | |
|
|
[u₄ u₃ u₂]| |
|
|
- [ q₂·v ]| |
|
|
----------- ↓ |
|
|
[ rem | u₁]|
|
|
- [ q₁·v ]|
|
|
----------- ↓
|
|
[ rem | u₀]
|
|
- [ q₀·v ]
|
|
------------
|
|
[ rem ]
|
|
|
|
Instead of creating new storage for the remainders and copying digits from u
|
|
as indicated by the arrows, we use u's storage directly as both the source
|
|
and destination of the subtractions, so that the remainders overwrite
|
|
successive overlapping sections of u as the division proceeds, using a slice
|
|
of u to identify the current section. This avoids all the copying as well as
|
|
shifting of remainders.
|
|
|
|
Division of u with n+m digits by v with n digits (in base B) can in general
|
|
produce at most m+1 digits, because:
|
|
|
|
• u < B^(n+m) [B^(n+m) has n+m+1 digits]
|
|
• v ≥ B^(n-1) [B^(n-1) is the smallest n-digit number]
|
|
• u/v < B^(n+m) / B^(n-1) [divide bounds for u, v]
|
|
• u/v < B^(m+1) [simplify]
|
|
|
|
The first step is special: it takes the top n digits of u and divides them by
|
|
the n digits of v, producing the first quotient digit and an n-digit remainder.
|
|
In the example, q₂ = ⌊u₄u₃u₂ / v⌋.
|
|
|
|
The first step divides n digits by n digits to ensure that it produces only a
|
|
single digit.
|
|
|
|
Each subsequent step appends the next digit from u to the remainder and divides
|
|
those n+1 digits by the n digits of v, producing another quotient digit and a
|
|
new n-digit remainder.
|
|
|
|
Subsequent steps divide n+1 digits by n digits, an operation that in general
|
|
might produce two digits. However, as used in the algorithm, that division is
|
|
guaranteed to produce only a single digit. The dividend is of the form
|
|
rem·B + d, where rem is a remainder from the previous step and d is a single
|
|
digit, so:
|
|
|
|
• rem ≤ v - 1 [rem is a remainder from dividing by v]
|
|
• rem·B ≤ v·B - B [multiply by B]
|
|
• d ≤ B - 1 [d is a single digit]
|
|
• rem·B + d ≤ v·B - 1 [add]
|
|
• rem·B + d < v·B [change ≤ to <]
|
|
• (rem·B + d)/v < B [divide by v]
|
|
|
|
|
|
Guess and Check
|
|
|
|
At each step we need to divide n+1 digits by n digits, but this is for the
|
|
implementation of division by n digits, so we can't just invoke a division
|
|
routine: we _are_ the division routine. Instead, we guess at the answer and
|
|
then check it using multiplication. If the guess is wrong, we correct it.
|
|
|
|
How can this guessing possibly be efficient? It turns out that the following
|
|
statement (let's call it the Good Guess Guarantee) is true.
|
|
|
|
If
|
|
|
|
• q = ⌊u/v⌋ where u is n+1 digits and v is n digits,
|
|
• q < B, and
|
|
• the topmost digit of v = vₙ₋₁ ≥ B/2,
|
|
|
|
then q̂ = ⌊uₙuₙ₋₁ / vₙ₋₁⌋ satisfies q ≤ q̂ ≤ q+2. (Proof below.)
|
|
|
|
That is, if we know the answer has only a single digit and we guess an answer
|
|
by ignoring the bottom n-1 digits of u and v, using a 2-by-1-digit division,
|
|
then that guess is at least as large as the correct answer. It is also not
|
|
too much larger: it is off by at most two from the correct answer.
|
|
|
|
Note that in the first step of the overall division, which is an n-by-n-digit
|
|
division, the 2-by-1 guess uses an implicit uₙ = 0.
|
|
|
|
Note that using a 2-by-1-digit division here does not mean calling ourselves
|
|
recursively. Instead, we use an efficient direct hardware implementation of
|
|
that operation.
|
|
|
|
Note that because q is u/v rounded down, q·v must not exceed u: u ≥ q·v.
|
|
If a guess q̂ is too big, it will not satisfy this test. Viewed a different way,
|
|
the remainder r̂ for a given q̂ is u - q̂·v, which must be positive. If it is
|
|
negative, then the guess q̂ is too big.
|
|
|
|
This gives us a way to compute q. First compute q̂ with 2-by-1-digit division.
|
|
Then, while u < q̂·v, decrement q̂; this loop executes at most twice, because
|
|
q̂ ≤ q+2.
|
|
|
|
|
|
Scaling Inputs
|
|
|
|
The Good Guess Guarantee requires that the top digit of v (vₙ₋₁) be at least B/2.
|
|
For example in base 10, ⌊172/19⌋ = 9, but ⌊18/1⌋ = 18: the guess is wildly off
|
|
because the first digit 1 is smaller than B/2 = 5.
|
|
|
|
We can ensure that v has a large top digit by multiplying both u and v by the
|
|
right amount. Continuing the example, if we multiply both 172 and 19 by 3, we
|
|
now have ⌊516/57⌋, the leading digit of v is now ≥ 5, and sure enough
|
|
⌊51/5⌋ = 10 is much closer to the correct answer 9. It would be easier here
|
|
to multiply by 4, because that can be done with a shift. Specifically, we can
|
|
always count the number of leading zeros i in the first digit of v and then
|
|
shift both u and v left by i bits.
|
|
|
|
Having scaled u and v, the value ⌊u/v⌋ is unchanged, but the remainder will
|
|
be scaled: 172 mod 19 is 1, but 516 mod 57 is 3. We have to divide the remainder
|
|
by the scaling factor (shifting right i bits) when we finish.
|
|
|
|
Note that these shifts happen before and after the entire division algorithm,
|
|
not at each step in the per-digit iteration.
|
|
|
|
Note the effect of scaling inputs on the size of the possible quotient.
|
|
In the scaled u/v, u can gain a digit from scaling; v never does, because we
|
|
pick the scaling factor to make v's top digit larger but without overflowing.
|
|
If u and v have n+m and n digits after scaling, then:
|
|
|
|
• u < B^(n+m) [B^(n+m) has n+m+1 digits]
|
|
• v ≥ B^n / 2 [vₙ₋₁ ≥ B/2, so vₙ₋₁·B^(n-1) ≥ B^n/2]
|
|
• u/v < B^(n+m) / (B^n / 2) [divide bounds for u, v]
|
|
• u/v < 2 B^m [simplify]
|
|
|
|
The quotient can still have m+1 significant digits, but if so the top digit
|
|
must be a 1. This provides a different way to handle the first digit of the
|
|
result: compare the top n digits of u against v and fill in either a 0 or a 1.
|
|
|
|
|
|
Refining Guesses
|
|
|
|
Before we check whether u < q̂·v, we can adjust our guess to change it from
|
|
q̂ = ⌊uₙuₙ₋₁ / vₙ₋₁⌋ into the refined guess ⌊uₙuₙ₋₁uₙ₋₂ / vₙ₋₁vₙ₋₂⌋.
|
|
Although not mentioned above, the Good Guess Guarantee also promises that this
|
|
3-by-2-digit division guess is more precise and at most one away from the real
|
|
answer q. The improvement from the 2-by-1 to the 3-by-2 guess can also be done
|
|
without n-digit math.
|
|
|
|
If we have a guess q̂ = ⌊uₙuₙ₋₁ / vₙ₋₁⌋ and we want to see if it also equal to
|
|
⌊uₙuₙ₋₁uₙ₋₂ / vₙ₋₁vₙ₋₂⌋, we can use the same check we would for the full division:
|
|
if uₙuₙ₋₁uₙ₋₂ < q̂·vₙ₋₁vₙ₋₂, then the guess is too large and should be reduced.
|
|
|
|
Checking uₙuₙ₋₁uₙ₋₂ < q̂·vₙ₋₁vₙ₋₂ is the same as uₙuₙ₋₁uₙ₋₂ - q̂·vₙ₋₁vₙ₋₂ < 0,
|
|
and
|
|
|
|
uₙuₙ₋₁uₙ₋₂ - q̂·vₙ₋₁vₙ₋₂ = (uₙuₙ₋₁·B + uₙ₋₂) - q̂·(vₙ₋₁·B + vₙ₋₂)
|
|
[splitting off the bottom digit]
|
|
= (uₙuₙ₋₁ - q̂·vₙ₋₁)·B + uₙ₋₂ - q̂·vₙ₋₂
|
|
[regrouping]
|
|
|
|
The expression (uₙuₙ₋₁ - q̂·vₙ₋₁) is the remainder of uₙuₙ₋₁ / vₙ₋₁.
|
|
If the initial guess returns both q̂ and its remainder r̂, then checking
|
|
whether uₙuₙ₋₁uₙ₋₂ < q̂·vₙ₋₁vₙ₋₂ is the same as checking r̂·B + uₙ₋₂ < q̂·vₙ₋₂.
|
|
|
|
If we find that r̂·B + uₙ₋₂ < q̂·vₙ₋₂, then we can adjust the guess by
|
|
decrementing q̂ and adding vₙ₋₁ to r̂. We repeat until r̂·B + uₙ₋₂ ≥ q̂·vₙ₋₂.
|
|
(As before, this fixup is only needed at most twice.)
|
|
|
|
Now that q̂ = ⌊uₙuₙ₋₁uₙ₋₂ / vₙ₋₁vₙ₋₂⌋, as mentioned above it is at most one
|
|
away from the correct q, and we've avoided doing any n-digit math.
|
|
(If we need the new remainder, it can be computed as r̂·B + uₙ₋₂ - q̂·vₙ₋₂.)
|
|
|
|
The final check u < q̂·v and the possible fixup must be done at full precision.
|
|
For random inputs, a fixup at this step is exceedingly rare: the 3-by-2 guess
|
|
is not often wrong at all. But still we must do the check. Note that since the
|
|
3-by-2 guess is off by at most 1, it can be convenient to perform the final
|
|
u < q̂·v as part of the computation of the remainder r = u - q̂·v. If the
|
|
subtraction underflows, decremeting q̂ and adding one v back to r is enough to
|
|
arrive at the final q, r.
|
|
|
|
That's the entirety of long division: scale the inputs, and then loop over
|
|
each output position, guessing, checking, and correcting the next output digit.
|
|
|
|
For a 2n-digit number divided by an n-digit number (the worst size-n case for
|
|
division complexity), this algorithm uses n+1 iterations, each of which must do
|
|
at least the 1-by-n-digit multiplication q̂·v. That's O(n) iterations of
|
|
O(n) time each, so O(n²) time overall.
|
|
|
|
|
|
Recursive Division
|
|
|
|
For very large inputs, it is possible to improve on the O(n²) algorithm.
|
|
Let's call a group of n/2 real digits a (very) “wide digit”. We can run the
|
|
standard long division algorithm explained above over the wide digits instead of
|
|
the actual digits. This will result in many fewer steps, but the math involved in
|
|
each step is more work.
|
|
|
|
Where basic long division uses a 2-by-1-digit division to guess the initial q̂,
|
|
the new algorithm must use a 2-by-1-wide-digit division, which is of course
|
|
really an n-by-n/2-digit division. That's OK: if we implement n-digit division
|
|
in terms of n/2-digit division, the recursion will terminate when the divisor
|
|
becomes small enough to handle with standard long division or even with the
|
|
2-by-1 hardware instruction.
|
|
|
|
For example, here is a sketch of dividing 10 digits by 4, proceeding with
|
|
wide digits corresponding to two regular digits. The first step, still special,
|
|
must leave off a (regular) digit, dividing 5 by 4 and producing a 4-digit
|
|
remainder less than v. The middle steps divide 6 digits by 4, guaranteed to
|
|
produce two output digits each (one wide digit) with 4-digit remainders.
|
|
The final step must use what it has: the 4-digit remainder plus one more,
|
|
5 digits to divide by 4.
|
|
|
|
q₆ q₅ q₄ q₃ q₂ q₁ q₀
|
|
_______________________________
|
|
v₃ v₂ v₁ v₀ ) u₉ u₈ u₇ u₆ u₅ u₄ u₃ u₂ u₁ u₀
|
|
↓ ↓ ↓ ↓ ↓ | | | | |
|
|
[u₉ u₈ u₇ u₆ u₅]| | | | |
|
|
- [ q₆q₅·v ]| | | | |
|
|
----------------- ↓ ↓ | | |
|
|
[ rem |u₄ u₃]| | |
|
|
- [ q₄q₃·v ]| | |
|
|
-------------------- ↓ ↓ |
|
|
[ rem |u₂ u₁]|
|
|
- [ q₂q₁·v ]|
|
|
-------------------- ↓
|
|
[ rem |u₀]
|
|
- [ q₀·v ]
|
|
------------------
|
|
[ rem ]
|
|
|
|
An alternative would be to look ahead to how well n/2 divides into n+m and
|
|
adjust the first step to use fewer digits as needed, making the first step
|
|
more special to make the last step not special at all. For example, using the
|
|
same input, we could choose to use only 4 digits in the first step, leaving
|
|
a full wide digit for the last step:
|
|
|
|
q₆ q₅ q₄ q₃ q₂ q₁ q₀
|
|
_______________________________
|
|
v₃ v₂ v₁ v₀ ) u₉ u₈ u₇ u₆ u₅ u₄ u₃ u₂ u₁ u₀
|
|
↓ ↓ ↓ ↓ | | | | | |
|
|
[u₉ u₈ u₇ u₆]| | | | | |
|
|
- [ q₆·v ]| | | | | |
|
|
-------------- ↓ ↓ | | | |
|
|
[ rem |u₅ u₄]| | | |
|
|
- [ q₅q₄·v ]| | | |
|
|
-------------------- ↓ ↓ | |
|
|
[ rem |u₃ u₂]| |
|
|
- [ q₃q₂·v ]| |
|
|
-------------------- ↓ ↓
|
|
[ rem |u₁ u₀]
|
|
- [ q₁q₀·v ]
|
|
---------------------
|
|
[ rem ]
|
|
|
|
Today, the code in divRecursiveStep works like the first example. Perhaps in
|
|
the future we will make it work like the alternative, to avoid a special case
|
|
in the final iteration.
|
|
|
|
Either way, each step is a 3-by-2-wide-digit division approximated first by
|
|
a 2-by-1-wide-digit division, just as we did for regular digits in long division.
|
|
Because the actual answer we want is a 3-by-2-wide-digit division, instead of
|
|
multiplying q̂·v directly during the fixup, we can use the quick refinement
|
|
from long division (an n/2-by-n/2 multiply) to correct q to its actual value
|
|
and also compute the remainder (as mentioned above), and then stop after that,
|
|
never doing a full n-by-n multiply.
|
|
|
|
Instead of using an n-by-n/2-digit division to produce n/2 digits, we can add
|
|
(not discard) one more real digit, doing an (n+1)-by-(n/2+1)-digit division that
|
|
produces n/2+1 digits. That single extra digit tightens the Good Guess Guarantee
|
|
to q ≤ q̂ ≤ q+1 and lets us drop long division's special treatment of the first
|
|
digit. These benefits are discussed more after the Good Guess Guarantee proof
|
|
below.
|
|
|
|
|
|
How Fast is Recursive Division?
|
|
|
|
For a 2n-by-n-digit division, this algorithm runs a 4-by-2 long division over
|
|
wide digits, producing two wide digits plus a possible leading regular digit 1,
|
|
which can be handled without a recursive call. That is, the algorithm uses two
|
|
full iterations, each using an n-by-n/2-digit division and an n/2-by-n/2-digit
|
|
multiplication, along with a few n-digit additions and subtractions. The standard
|
|
n-by-n-digit multiplication algorithm requires O(n²) time, making the overall
|
|
algorithm require time T(n) where
|
|
|
|
T(n) = 2T(n/2) + O(n) + O(n²)
|
|
|
|
which, by the Bentley-Haken-Saxe theorem, ends up reducing to T(n) = O(n²).
|
|
This is not an improvement over regular long division.
|
|
|
|
When the number of digits n becomes large enough, Karatsuba's algorithm for
|
|
multiplication can be used instead, which takes O(n^log₂3) = O(n^1.6) time.
|
|
(Karatsuba multiplication is implemented in func karatsuba in nat.go.)
|
|
That makes the overall recursive division algorithm take O(n^1.6) time as well,
|
|
which is an improvement, but again only for large enough numbers.
|
|
|
|
It is not critical to make sure that every recursion does only two recursive
|
|
calls. While in general the number of recursive calls can change the time
|
|
analysis, in this case doing three calls does not change the analysis:
|
|
|
|
T(n) = 3T(n/2) + O(n) + O(n^log₂3)
|
|
|
|
ends up being T(n) = O(n^log₂3). Because the Karatsuba multiplication taking
|
|
time O(n^log₂3) is itself doing 3 half-sized recursions, doing three for the
|
|
division does not hurt the asymptotic performance. Of course, it is likely
|
|
still faster in practice to do two.
|
|
|
|
|
|
Proof of the Good Guess Guarantee
|
|
|
|
Given numbers x, y, let us break them into the quotients and remainders when
|
|
divided by some scaling factor S, with the added constraints that the quotient
|
|
x/y and the high part of y are both less than some limit T, and that the high
|
|
part of y is at least half as big as T.
|
|
|
|
x₁ = ⌊x/S⌋ y₁ = ⌊y/S⌋
|
|
x₀ = x mod S y₀ = y mod S
|
|
|
|
x = x₁·S + x₀ 0 ≤ x₀ < S x/y < T
|
|
y = y₁·S + y₀ 0 ≤ y₀ < S T/2 ≤ y₁ < T
|
|
|
|
And consider the two truncated quotients:
|
|
|
|
q = ⌊x/y⌋
|
|
q̂ = ⌊x₁/y₁⌋
|
|
|
|
We will prove that q ≤ q̂ ≤ q+2.
|
|
|
|
The guarantee makes no real demands on the scaling factor S: it is simply the
|
|
magnitude of the digits cut from both x and y to produce x₁ and y₁.
|
|
The guarantee makes only limited demands on T: it must be large enough to hold
|
|
the quotient x/y, and y₁ must have roughly the same size.
|
|
|
|
To apply to the earlier discussion of 2-by-1 guesses in long division,
|
|
we would choose:
|
|
|
|
S = Bⁿ⁻¹
|
|
T = B
|
|
x = u
|
|
x₁ = uₙuₙ₋₁
|
|
x₀ = uₙ₋₂...u₀
|
|
y = v
|
|
y₁ = vₙ₋₁
|
|
y₀ = vₙ₋₂...u₀
|
|
|
|
These simpler variables avoid repeating those longer expressions in the proof.
|
|
|
|
Note also that, by definition, truncating division ⌊x/y⌋ satisfies
|
|
|
|
x/y - 1 < ⌊x/y⌋ ≤ x/y.
|
|
|
|
This fact will be used a few times in the proofs.
|
|
|
|
Proof that q ≤ q̂:
|
|
|
|
q̂·y₁ = ⌊x₁/y₁⌋·y₁ [by definition, q̂ = ⌊x₁/y₁⌋]
|
|
> (x₁/y₁ - 1)·y₁ [x₁/y₁ - 1 < ⌊x₁/y₁⌋]
|
|
= x₁ - y₁ [distribute y₁]
|
|
|
|
So q̂·y₁ > x₁ - y₁.
|
|
Since q̂·y₁ is an integer, q̂·y₁ ≥ x₁ - y₁ + 1.
|
|
|
|
q̂ - q = q̂ - ⌊x/y⌋ [by definition, q = ⌊x/y⌋]
|
|
≥ q̂ - x/y [⌊x/y⌋ < x/y]
|
|
= (1/y)·(q̂·y - x) [factor out 1/y]
|
|
≥ (1/y)·(q̂·y₁·S - x) [y = y₁·S + y₀ ≥ y₁·S]
|
|
≥ (1/y)·((x₁ - y₁ + 1)·S - x) [above: q̂·y₁ ≥ x₁ - y₁ + 1]
|
|
= (1/y)·(x₁·S - y₁·S + S - x) [distribute S]
|
|
= (1/y)·(S - x₀ - y₁·S) [-x = -x₁·S - x₀]
|
|
> -y₁·S / y [x₀ < S, so S - x₀ < 0; drop it]
|
|
≥ -1 [y₁·S ≤ y]
|
|
|
|
So q̂ - q > -1.
|
|
Since q̂ - q is an integer, q̂ - q ≥ 0, or equivalently q ≤ q̂.
|
|
|
|
Proof that q̂ ≤ q+2:
|
|
|
|
x₁/y₁ - x/y = x₁·S/y₁·S - x/y [multiply left term by S/S]
|
|
≤ x/y₁·S - x/y [x₁S ≤ x]
|
|
= (x/y)·(y/y₁·S - 1) [factor out x/y]
|
|
= (x/y)·((y - y₁·S)/y₁·S) [move -1 into y/y₁·S fraction]
|
|
= (x/y)·(y₀/y₁·S) [y - y₁·S = y₀]
|
|
= (x/y)·(1/y₁)·(y₀/S) [factor out 1/y₁]
|
|
< (x/y)·(1/y₁) [y₀ < S, so y₀/S < 1]
|
|
≤ (x/y)·(2/T) [y₁ ≥ T/2, so 1/y₁ ≤ 2/T]
|
|
< T·(2/T) [x/y < T]
|
|
= 2 [T·(2/T) = 2]
|
|
|
|
So x₁/y₁ - x/y < 2.
|
|
|
|
q̂ - q = ⌊x₁/y₁⌋ - q [by definition, q̂ = ⌊x₁/y₁⌋]
|
|
= ⌊x₁/y₁⌋ - ⌊x/y⌋ [by definition, q = ⌊x/y⌋]
|
|
≤ x₁/y₁ - ⌊x/y⌋ [⌊x₁/y₁⌋ ≤ x₁/y₁]
|
|
< x₁/y₁ - (x/y - 1) [⌊x/y⌋ > x/y - 1]
|
|
= (x₁/y₁ - x/y) + 1 [regrouping]
|
|
< 2 + 1 [above: x₁/y₁ - x/y < 2]
|
|
= 3
|
|
|
|
So q̂ - q < 3.
|
|
Since q̂ - q is an integer, q̂ - q ≤ 2.
|
|
|
|
Note that when x/y < T/2, the bounds tighten to x₁/y₁ - x/y < 1 and therefore
|
|
q̂ - q ≤ 1.
|
|
|
|
Note also that in the general case 2n-by-n division where we don't know that
|
|
x/y < T, we do know that x/y < 2T, yielding the bound q̂ - q ≤ 4. So we could
|
|
remove the special case first step of long division as long as we allow the
|
|
first fixup loop to run up to four times. (Using a simple comparison to decide
|
|
whether the first digit is 0 or 1 is still more efficient, though.)
|
|
|
|
Finally, note that when dividing three leading base-B digits by two (scaled),
|
|
we have T = B² and x/y < B = T/B, a much tighter bound than x/y < T.
|
|
This in turn yields the much tighter bound x₁/y₁ - x/y < 2/B. This means that
|
|
⌊x₁/y₁⌋ and ⌊x/y⌋ can only differ when x/y is less than 2/B greater than an
|
|
integer. For random x and y, the chance of this is 2/B, or, for large B,
|
|
approximately zero. This means that after we produce the 3-by-2 guess in the
|
|
long division algorithm, the fixup loop essentially never runs.
|
|
|
|
In the recursive algorithm, the extra digit in (2·⌊n/2⌋+1)-by-(⌊n/2⌋+1)-digit
|
|
division has exactly the same effect: the probability of needing a fixup is the
|
|
same 2/B. Even better, we can allow the general case x/y < 2T and the fixup
|
|
probability only grows to 4/B, still essentially zero.
|
|
|
|
|
|
References
|
|
|
|
There are no great references for implementing long division; thus this comment.
|
|
Here are some notes about what to expect from the obvious references.
|
|
|
|
Knuth Volume 2 (Seminumerical Algorithms) section 4.3.1 is the usual canonical
|
|
reference for long division, but that entire series is highly compressed, never
|
|
repeating a necessary fact and leaving important insights to the exercises.
|
|
For example, no rationale whatsoever is given for the calculation that extends
|
|
q̂ from a 2-by-1 to a 3-by-2 guess, nor why it reduces the error bound.
|
|
The proof that the calculation even has the desired effect is left to exercises.
|
|
The solutions to those exercises provided at the back of the book are entirely
|
|
calculations, still with no explanation as to what is going on or how you would
|
|
arrive at the idea of doing those exact calculations. Nowhere is it mentioned
|
|
that this test extends the 2-by-1 guess into a 3-by-2 guess. The proof of the
|
|
Good Guess Guarantee is only for the 2-by-1 guess and argues by contradiction,
|
|
making it difficult to understand how modifications like adding another digit
|
|
or adjusting the quotient range affects the overall bound.
|
|
|
|
All that said, Knuth remains the canonical reference. It is dense but packed
|
|
full of information and references, and the proofs are simpler than many other
|
|
presentations. The proofs above are reworkings of Knuth's to remove the
|
|
arguments by contradiction and add explanations or steps that Knuth omitted.
|
|
But beware of errors in older printings. Take the published errata with you.
|
|
|
|
Brinch Hansen's “Multiple-length Division Revisited: a Tour of the Minefield”
|
|
starts with a blunt critique of Knuth's presentation (among others) and then
|
|
presents a more detailed and easier to follow treatment of long division,
|
|
including an implementation in Pascal. But the algorithm and implementation
|
|
work entirely in terms of 3-by-2 division, which is much less useful on modern
|
|
hardware than an algorithm using 2-by-1 division. The proofs are a bit too
|
|
focused on digit counting and seem needlessly complex, especially compared to
|
|
the ones given above.
|
|
|
|
Burnikel and Ziegler's “Fast Recursive Division” introduced the key insight of
|
|
implementing division by an n-digit divisor using recursive calls to division
|
|
by an n/2-digit divisor, relying on Karatsuba multiplication to yield a
|
|
sub-quadratic run time. However, the presentation decisions are made almost
|
|
entirely for the purpose of simplifying the run-time analysis, rather than
|
|
simplifying the presentation. Instead of a single algorithm that loops over
|
|
quotient digits, the paper presents two mutually-recursive algorithms, for
|
|
2n-by-n and 3n-by-2n. The paper also does not present any general (n+m)-by-n
|
|
algorithm.
|
|
|
|
The proofs in the paper are remarkably complex, especially considering that
|
|
the algorithm is at its core just long division on wide digits, so that the
|
|
usual long division proofs apply essentially unaltered.
|
|
*/
|
|
|
|
package big
|
|
|
|
import "math/bits"
|
|
|
|
// div returns q, r such that q = ⌊u/v⌋ and r = u%v = u - q·v.
|
|
// It uses z and z2 as the storage for q and r.
|
|
func (z nat) div(z2, u, v nat) (q, r nat) {
|
|
if len(v) == 0 {
|
|
panic("division by zero")
|
|
}
|
|
|
|
if u.cmp(v) < 0 {
|
|
q = z[:0]
|
|
r = z2.set(u)
|
|
return
|
|
}
|
|
|
|
if len(v) == 1 {
|
|
// Short division: long optimized for a single-word divisor.
|
|
// In that case, the 2-by-1 guess is all we need at each step.
|
|
var r2 Word
|
|
q, r2 = z.divW(u, v[0])
|
|
r = z2.setWord(r2)
|
|
return
|
|
}
|
|
|
|
q, r = z.divLarge(z2, u, v)
|
|
return
|
|
}
|
|
|
|
// divW returns q, r such that q = ⌊x/y⌋ and r = x%y = x - q·y.
|
|
// It uses z as the storage for q.
|
|
// Note that y is a single digit (Word), not a big number.
|
|
func (z nat) divW(x nat, y Word) (q nat, r Word) {
|
|
m := len(x)
|
|
switch {
|
|
case y == 0:
|
|
panic("division by zero")
|
|
case y == 1:
|
|
q = z.set(x) // result is x
|
|
return
|
|
case m == 0:
|
|
q = z[:0] // result is 0
|
|
return
|
|
}
|
|
// m > 0
|
|
z = z.make(m)
|
|
r = divWVW(z, 0, x, y)
|
|
q = z.norm()
|
|
return
|
|
}
|
|
|
|
// modW returns x % d.
|
|
func (x nat) modW(d Word) (r Word) {
|
|
// TODO(agl): we don't actually need to store the q value.
|
|
var q nat
|
|
q = q.make(len(x))
|
|
return divWVW(q, 0, x, d)
|
|
}
|
|
|
|
// divWVW overwrites z with ⌊x/y⌋, returning the remainder r.
|
|
// The caller must ensure that len(z) = len(x).
|
|
func divWVW(z []Word, xn Word, x []Word, y Word) (r Word) {
|
|
r = xn
|
|
if len(x) == 1 {
|
|
qq, rr := bits.Div(uint(r), uint(x[0]), uint(y))
|
|
z[0] = Word(qq)
|
|
return Word(rr)
|
|
}
|
|
rec := reciprocalWord(y)
|
|
for i := len(z) - 1; i >= 0; i-- {
|
|
z[i], r = divWW(r, x[i], y, rec)
|
|
}
|
|
return r
|
|
}
|
|
|
|
// div returns q, r such that q = ⌊uIn/vIn⌋ and r = uIn%vIn = uIn - q·vIn.
|
|
// It uses z and u as the storage for q and r.
|
|
// The caller must ensure that len(vIn) ≥ 2 (use divW otherwise)
|
|
// and that len(uIn) ≥ len(vIn) (the answer is 0, uIn otherwise).
|
|
func (z nat) divLarge(u, uIn, vIn nat) (q, r nat) {
|
|
n := len(vIn)
|
|
m := len(uIn) - n
|
|
|
|
// Scale the inputs so vIn's top bit is 1 (see “Scaling Inputs” above).
|
|
// vIn is treated as a read-only input (it may be in use by another
|
|
// goroutine), so we must make a copy.
|
|
// uIn is copied to u.
|
|
shift := nlz(vIn[n-1])
|
|
vp := getNat(n)
|
|
v := *vp
|
|
shlVU(v, vIn, shift)
|
|
u = u.make(len(uIn) + 1)
|
|
u[len(uIn)] = shlVU(u[0:len(uIn)], uIn, shift)
|
|
|
|
// The caller should not pass aliased z and u, since those are
|
|
// the two different outputs, but correct just in case.
|
|
if alias(z, u) {
|
|
z = nil
|
|
}
|
|
q = z.make(m + 1)
|
|
|
|
// Use basic or recursive long division depending on size.
|
|
if n < divRecursiveThreshold {
|
|
q.divBasic(u, v)
|
|
} else {
|
|
q.divRecursive(u, v)
|
|
}
|
|
putNat(vp)
|
|
|
|
q = q.norm()
|
|
|
|
// Undo scaling of remainder.
|
|
shrVU(u, u, shift)
|
|
r = u.norm()
|
|
|
|
return q, r
|
|
}
|
|
|
|
// divBasic implements long division as described above.
|
|
// It overwrites q with ⌊u/v⌋ and overwrites u with the remainder r.
|
|
// q must be large enough to hold ⌊u/v⌋.
|
|
func (q nat) divBasic(u, v nat) {
|
|
n := len(v)
|
|
m := len(u) - n
|
|
|
|
qhatvp := getNat(n + 1)
|
|
qhatv := *qhatvp
|
|
|
|
// Set up for divWW below, precomputing reciprocal argument.
|
|
vn1 := v[n-1]
|
|
rec := reciprocalWord(vn1)
|
|
|
|
// Compute each digit of quotient.
|
|
for j := m; j >= 0; j-- {
|
|
// Compute the 2-by-1 guess q̂.
|
|
// The first iteration must invent a leading 0 for u.
|
|
qhat := Word(_M)
|
|
var ujn Word
|
|
if j+n < len(u) {
|
|
ujn = u[j+n]
|
|
}
|
|
|
|
// ujn ≤ vn1, or else q̂ would be more than one digit.
|
|
// For ujn == vn1, we set q̂ to the max digit M above.
|
|
// Otherwise, we compute the 2-by-1 guess.
|
|
if ujn != vn1 {
|
|
var rhat Word
|
|
qhat, rhat = divWW(ujn, u[j+n-1], vn1, rec)
|
|
|
|
// Refine q̂ to a 3-by-2 guess. See “Refining Guesses” above.
|
|
vn2 := v[n-2]
|
|
x1, x2 := mulWW(qhat, vn2)
|
|
ujn2 := u[j+n-2]
|
|
for greaterThan(x1, x2, rhat, ujn2) { // x1x2 > r̂ u[j+n-2]
|
|
qhat--
|
|
prevRhat := rhat
|
|
rhat += vn1
|
|
// If r̂ overflows, then
|
|
// r̂ u[j+n-2]v[n-1] is now definitely > x1 x2.
|
|
if rhat < prevRhat {
|
|
break
|
|
}
|
|
// TODO(rsc): No need for a full mulWW.
|
|
// x2 += vn2; if x2 overflows, x1++
|
|
x1, x2 = mulWW(qhat, vn2)
|
|
}
|
|
}
|
|
|
|
// Compute q̂·v.
|
|
qhatv[n] = mulAddVWW(qhatv[0:n], v, qhat, 0)
|
|
qhl := len(qhatv)
|
|
if j+qhl > len(u) && qhatv[n] == 0 {
|
|
qhl--
|
|
}
|
|
|
|
// Subtract q̂·v from the current section of u.
|
|
// If it underflows, q̂·v > u, which we fix up
|
|
// by decrementing q̂ and adding v back.
|
|
c := subVV(u[j:j+qhl], u[j:], qhatv)
|
|
if c != 0 {
|
|
c := addVV(u[j:j+n], u[j:], v)
|
|
// If n == qhl, the carry from subVV and the carry from addVV
|
|
// cancel out and don't affect u[j+n].
|
|
if n < qhl {
|
|
u[j+n] += c
|
|
}
|
|
qhat--
|
|
}
|
|
|
|
// Save quotient digit.
|
|
// Caller may know the top digit is zero and not leave room for it.
|
|
if j == m && m == len(q) && qhat == 0 {
|
|
continue
|
|
}
|
|
q[j] = qhat
|
|
}
|
|
|
|
putNat(qhatvp)
|
|
}
|
|
|
|
// greaterThan reports whether the two digit numbers x1 x2 > y1 y2.
|
|
// TODO(rsc): In contradiction to most of this file, x1 is the high
|
|
// digit and x2 is the low digit. This should be fixed.
|
|
func greaterThan(x1, x2, y1, y2 Word) bool {
|
|
return x1 > y1 || x1 == y1 && x2 > y2
|
|
}
|
|
|
|
// divRecursiveThreshold is the number of divisor digits
|
|
// at which point divRecursive is faster than divBasic.
|
|
const divRecursiveThreshold = 100
|
|
|
|
// divRecursive implements recursive division as described above.
|
|
// It overwrites z with ⌊u/v⌋ and overwrites u with the remainder r.
|
|
// z must be large enough to hold ⌊u/v⌋.
|
|
// This function is just for allocating and freeing temporaries
|
|
// around divRecursiveStep, the real implementation.
|
|
func (z nat) divRecursive(u, v nat) {
|
|
// Recursion depth is (much) less than 2 log₂(len(v)).
|
|
// Allocate a slice of temporaries to be reused across recursion,
|
|
// plus one extra temporary not live across the recursion.
|
|
recDepth := 2 * bits.Len(uint(len(v)))
|
|
tmp := getNat(3 * len(v))
|
|
temps := make([]*nat, recDepth)
|
|
|
|
z.clear()
|
|
z.divRecursiveStep(u, v, 0, tmp, temps)
|
|
|
|
// Free temporaries.
|
|
for _, n := range temps {
|
|
if n != nil {
|
|
putNat(n)
|
|
}
|
|
}
|
|
putNat(tmp)
|
|
}
|
|
|
|
// divRecursiveStep is the actual implementation of recursive division.
|
|
// It adds ⌊u/v⌋ to z and overwrites u with the remainder r.
|
|
// z must be large enough to hold ⌊u/v⌋.
|
|
// It uses temps[depth] (allocating if needed) as a temporary live across
|
|
// the recursive call. It also uses tmp, but not live across the recursion.
|
|
func (z nat) divRecursiveStep(u, v nat, depth int, tmp *nat, temps []*nat) {
|
|
// u is a subsection of the original and may have leading zeros.
|
|
// TODO(rsc): The v = v.norm() is useless and should be removed.
|
|
// We know (and require) that v's top digit is ≥ B/2.
|
|
u = u.norm()
|
|
v = v.norm()
|
|
if len(u) == 0 {
|
|
z.clear()
|
|
return
|
|
}
|
|
|
|
// Fall back to basic division if the problem is now small enough.
|
|
n := len(v)
|
|
if n < divRecursiveThreshold {
|
|
z.divBasic(u, v)
|
|
return
|
|
}
|
|
|
|
// Nothing to do if u is shorter than v (implies u < v).
|
|
m := len(u) - n
|
|
if m < 0 {
|
|
return
|
|
}
|
|
|
|
// We consider B digits in a row as a single wide digit.
|
|
// (See “Recursive Division” above.)
|
|
//
|
|
// TODO(rsc): rename B to Wide, to avoid confusion with _B,
|
|
// which is something entirely different.
|
|
// TODO(rsc): Look into whether using ⌈n/2⌉ is better than ⌊n/2⌋.
|
|
B := n / 2
|
|
|
|
// Allocate a nat for qhat below.
|
|
if temps[depth] == nil {
|
|
temps[depth] = getNat(n) // TODO(rsc): Can be just B+1.
|
|
} else {
|
|
*temps[depth] = temps[depth].make(B + 1)
|
|
}
|
|
|
|
// Compute each wide digit of the quotient.
|
|
//
|
|
// TODO(rsc): Change the loop to be
|
|
// for j := (m+B-1)/B*B; j > 0; j -= B {
|
|
// which will make the final step a regular step, letting us
|
|
// delete what amounts to an extra copy of the loop body below.
|
|
j := m
|
|
for j > B {
|
|
// Divide u[j-B:j+n] (3 wide digits) by v (2 wide digits).
|
|
// First make the 2-by-1-wide-digit guess using a recursive call.
|
|
// Then extend the guess to the full 3-by-2 (see “Refining Guesses”).
|
|
//
|
|
// For the 2-by-1-wide-digit guess, instead of doing 2B-by-B-digit,
|
|
// we use a (2B+1)-by-(B+1) digit, which handles the possibility that
|
|
// the result has an extra leading 1 digit as well as guaranteeing
|
|
// that the computed q̂ will be off by at most 1 instead of 2.
|
|
|
|
// s is the number of digits to drop from the 3B- and 2B-digit chunks.
|
|
// We drop B-1 to be left with 2B+1 and B+1.
|
|
s := (B - 1)
|
|
|
|
// uu is the up-to-3B-digit section of u we are working on.
|
|
uu := u[j-B:]
|
|
|
|
// Compute the 2-by-1 guess q̂, leaving r̂ in uu[s:B+n].
|
|
qhat := *temps[depth]
|
|
qhat.clear()
|
|
qhat.divRecursiveStep(uu[s:B+n], v[s:], depth+1, tmp, temps)
|
|
qhat = qhat.norm()
|
|
|
|
// Extend to a 3-by-2 quotient and remainder.
|
|
// Because divRecursiveStep overwrote the top part of uu with
|
|
// the remainder r̂, the full uu already contains the equivalent
|
|
// of r̂·B + uₙ₋₂ from the “Refining Guesses” discussion.
|
|
// Subtracting q̂·vₙ₋₂ from it will compute the full-length remainder.
|
|
// If that subtraction underflows, q̂·v > u, which we fix up
|
|
// by decrementing q̂ and adding v back, same as in long division.
|
|
|
|
// TODO(rsc): Instead of subtract and fix-up, this code is computing
|
|
// q̂·vₙ₋₂ and decrementing q̂ until that product is ≤ u.
|
|
// But we can do the subtraction directly, as in the comment above
|
|
// and in long division, because we know that q̂ is wrong by at most one.
|
|
qhatv := tmp.make(3 * n)
|
|
qhatv.clear()
|
|
qhatv = qhatv.mul(qhat, v[:s])
|
|
for i := 0; i < 2; i++ {
|
|
e := qhatv.cmp(uu.norm())
|
|
if e <= 0 {
|
|
break
|
|
}
|
|
subVW(qhat, qhat, 1)
|
|
c := subVV(qhatv[:s], qhatv[:s], v[:s])
|
|
if len(qhatv) > s {
|
|
subVW(qhatv[s:], qhatv[s:], c)
|
|
}
|
|
addAt(uu[s:], v[s:], 0)
|
|
}
|
|
if qhatv.cmp(uu.norm()) > 0 {
|
|
panic("impossible")
|
|
}
|
|
c := subVV(uu[:len(qhatv)], uu[:len(qhatv)], qhatv)
|
|
if c > 0 {
|
|
subVW(uu[len(qhatv):], uu[len(qhatv):], c)
|
|
}
|
|
addAt(z, qhat, j-B)
|
|
j -= B
|
|
}
|
|
|
|
// TODO(rsc): Rewrite loop as described above and delete all this code.
|
|
|
|
// Now u < (v<<B), compute lower bits in the same way.
|
|
// Choose shift = B-1 again.
|
|
s := B - 1
|
|
qhat := *temps[depth]
|
|
qhat.clear()
|
|
qhat.divRecursiveStep(u[s:].norm(), v[s:], depth+1, tmp, temps)
|
|
qhat = qhat.norm()
|
|
qhatv := tmp.make(3 * n)
|
|
qhatv.clear()
|
|
qhatv = qhatv.mul(qhat, v[:s])
|
|
// Set the correct remainder as before.
|
|
for i := 0; i < 2; i++ {
|
|
if e := qhatv.cmp(u.norm()); e > 0 {
|
|
subVW(qhat, qhat, 1)
|
|
c := subVV(qhatv[:s], qhatv[:s], v[:s])
|
|
if len(qhatv) > s {
|
|
subVW(qhatv[s:], qhatv[s:], c)
|
|
}
|
|
addAt(u[s:], v[s:], 0)
|
|
}
|
|
}
|
|
if qhatv.cmp(u.norm()) > 0 {
|
|
panic("impossible")
|
|
}
|
|
c := subVV(u[0:len(qhatv)], u[0:len(qhatv)], qhatv)
|
|
if c > 0 {
|
|
c = subVW(u[len(qhatv):], u[len(qhatv):], c)
|
|
}
|
|
if c > 0 {
|
|
panic("impossible")
|
|
}
|
|
|
|
// Done!
|
|
addAt(z, qhat.norm(), 0)
|
|
}
|