On a grid with width m and height n, how many triangles have lattice points as vertices but otherwise have no lattice points on their edges or in their interiors? (Such triangles are called ‘empty’.) Find the answer modulo $10^9+7$.
For example, for m=1 and n=2, the 12 empty triangles by shape and multiplicity
are given below: (Image from the kata description.)
Geometry part
According to Pick’s theorem, a triangle with lattice point vertices has area
$$A = i + \frac{e}{2} + \frac{1}{2}$$
where $i$ is the number of lattice points in the interior (not counting edges), and $e$ is the number of lattice points on the edges (not counting vertices). Therefore a triangle is empty iff its area is 1/2 and its vertices are lattice points. (For the sake of brevity we always work with lattice points from now on.)
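As a quick sanity check, the right triangle with vertices (0,0), (1,0) and (0,1) has $i=0$ and $e=0$, so Pick’s theorem gives
$$A = 0 + \frac{0}{2} + \frac{1}{2} = \frac{1}{2},$$
matching its actual area; it is indeed empty.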
Intuitively, empty triangles have a relatively ‘thin’ shape. The longer the edges get, the ‘thinner’ the triangle gets. An exact, useful bound on the ‘thickness’ of an empty triangle can be encapsulated in this lemma:
Lemma: For an empty triangle, two of its vertices
coincide with opposite corners of its bounding box.
Proof: Pick the leftmost point A and rightmost point B of a triangle. (If there is a tie, pick an
arbitrary one.)
A triangle in its bounding
box
Without loss of generality, we assume that B is higher than or level with A, and C is
above the line segment AB. Construct a
vertical line through C, intersecting
AB at D as shown. Let h=∣CD∣ and let m be the
horizontal distance between A and B. Then the area of △ABC is hm/2. This can be shown by constructing a
horizontal line through D intersecting
the bounding box on the side of A at
E, and on the side of B at F, then
△CDE = △CDA and △CDF = △CDB. Then:
$$\triangle ABC = \triangle CDA + \triangle CDB = \triangle CDE + \triangle CDF = \triangle CEF = hm/2$$
(where, when clear from context, △XYZ denotes the triangle’s area).
Therefore if △ABC is empty,
then hm=1. If C is within the rectangle with A and B as
corners (including boundaries), then it satisfies the desired
conclusion.
Otherwise C is above B, and since they are lattice points, C must be at least 1 unit higher than B, therefore h≥1. But we already have m≥1,
so it must be the case that h=m=1.
Then, since m=1 and C lies horizontally between A and B, one of CA and CB is vertical.
If CA is vertical, then |CA| = h = 1. We are now constrained in a 1×1 box, in which the only possible triangle shape, with edge lengths $(1, 1, \sqrt{2})$, indeed has two vertices as corners of its bounding box.
If CB is vertical, then A and C are corners of the bounding box of △ABC.
This concludes the proof of the lemma.
Now we only need to find, for each possible bounding box size, the number of empty triangles that fit exactly into a box of that size. There are four completely symmetric ways to fit an empty triangle into its bounding box, depending on which diagonal the triangle’s two ‘outer’ points sit on, and on which side of that diagonal the third ‘inner’ point lies. Here are the four cases illustrated, where in each case the shown diagonal is one edge of the triangle, and the other vertex is in the shaded region.
The four cases
Therefore we can just calculate one case and multiply the answer by four. Let’s put one case at the origin, with the diagonal AB running from (0, n) to (m, 0) and the third vertex C below it:
Solving for a certain size
Suppose the rectangle has width m and height n. Similar to what has been
shown while proving the lemma, for the area of △ABC to be 1/2, C must
lie on a line that is AB shifted down
by 1/m, and it must also be a lattice
point. The equation of line AB is:
$$\frac{x}{m}+\frac{y}{n}=1$$
And therefore that of the shifted line is:
$$\begin{aligned}\frac{x}{m}+\frac{y}{n}+\frac{1}{mn}&=1\\ nx+my+1&=mn\\ m(n-y)+n(-x)&=1\end{aligned}$$
Bézout’s identity tells us that there are integer solutions if and only if $\gcd(m,n)=1$. We are interested only in solutions in the region $0\le x<m$ and $0\le y<n$. Given one solution $(x=x_0,\ y=y_0)$, the general solution is $(x=x_0+km,\ y=y_0-kn)$ where $k$ is any integer. There is a unique solution $(x=x_0+\lfloor y_0/n\rfloor m,\ y=y_0\bmod n)$ that lies within the region. This can be shown first by noting that Euclidean division of integers gives a unique $0\le y<n$ such that $y_0=kn+y$ for some integer $k$. Then we verify that $x\ge 0$, which shows that the solution is inside the region we want ($x<m$ is immediate, since $nx=mn-my-1<mn$):
$$x=x_0+\left\lfloor\frac{y_0}{n}\right\rfloor m=\frac{m(n-y_0)-1}{n}+\left\lfloor\frac{y_0}{n}\right\rfloor m=m\left(1+\left\lfloor\frac{y_0}{n}\right\rfloor-\frac{y_0}{n}\right)-\frac{1}{n}\ge m\cdot\frac{1}{n}-\frac{1}{n}=\frac{m-1}{n}\ge 0$$
(In the middle inequality step, note that since $y_0$ is an integer, $y_0/n$ is some integer plus $k/n$, where $k$ is an integer such that $0\le k<n$.)
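To make the construction concrete, here is a small illustrative sketch (not taken from my actual solution) that, for a coprime m×n box, finds the unique lattice point C by solving $m(n-y)+n(-x)=1$ with the extended Euclidean algorithm and then normalizing as described above:

```cpp
#include <cstdio>
#include <utility>

// Extended Euclid: returns (a, b) with m*a + n*b == gcd(m, n).
std::pair<long long, long long> ext_gcd(long long m, long long n) {
    if (n == 0) return {1, 0};
    auto [a, b] = ext_gcd(n, m % n);
    return {b, a - (m / n) * b};
}

int main() {
    long long m = 3, n = 5;                  // any coprime box size
    auto [a, b] = ext_gcd(m, n);             // m*a + n*b == 1
    long long y0 = n - a, x0 = -b;           // from m*(n - y0) + n*(-x0) = 1
    // Normalize to the unique solution with 0 <= y < n (then 0 <= x < m follows).
    long long k = y0 >= 0 ? y0 / n : -((-y0 + n - 1) / n);  // floor(y0 / n)
    long long y = y0 - k * n, x = x0 + k * m;
    printf("C = (%lld, %lld), check: %lld == %lld\n", x, y, n * x + m * y + 1, m * n);
}
```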
Therefore an m×n bounding box
has four empty triangles if gcd(m,n)=1, and none otherwise.
We however need to consider all bounding box sizes from 1×1 up to m×n, instead of just one size. A size j×i rectangle fits into a size m×n rectangle (n+1−i)(m+1−j) times. So the answer
is:
$$4\sum_{i=1}^{n}\sum_{j=1}^{m}[\gcd(i,j)=1]\,(n+1-i)(m+1-j)$$
($[P]$ is the Iverson bracket, which is 1 when $P$ is true and 0 otherwise.)
(Note 1: I’m doing this weird transposition-of-coordinates thing (just once, here) because we don’t use coordinates consistently between geometry and, shall we say, discrete mathematics. In geometry, to refer to a point we put x before y, and positive y goes up; but to refer to an entry in a matrix we put the row number before the column number, and successive rows go down. Things are completely symmetric anyway, so the best way to deal with this inconsistency is to forget that you noticed it or that I told you about it.)
(Note 2: This function without the factor 4 can be found as A114999 on OEIS.)
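As a quick sanity check of the formula above, a direct O(nm) evaluation (plain brute force, nothing like the fast method developed below) reproduces the count of 12 for the 1×2 example:

```cpp
#include <cstdio>
#include <numeric>  // std::gcd

// Direct evaluation of 4 * sum_{i=1..n} sum_{j=1..m} [gcd(i,j)=1] * (n+1-i)*(m+1-j).
long long count_empty(long long n, long long m) {
    long long total = 0;
    for (long long i = 1; i <= n; i++)
        for (long long j = 1; j <= m; j++)
            if (std::gcd(i, j) == 1)
                total += (n + 1 - i) * (m + 1 - j);
    return 4 * total;
}

int main() {
    printf("%lld\n", count_empty(2, 1));  // prints 12, matching the m=1, n=2 example
}
```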
Current progress: We know that we need to find $4\times a(n,m)$ where
$$a(n,m)=\sum_{i=1}^{n}\sum_{j=1}^{m}[\gcd(i,j)=1]\,(n+1-i)(m+1-j)$$
Möbius inversion is an application of the inclusion-exclusion principle. The Möbius function is defined as follows:
$$\mu(n)=\begin{cases}(-1)^k&\text{if }n\text{ factorizes into }k\text{ distinct primes}\\0&\text{otherwise}\end{cases}$$
This function has this important property:
$$\sum_{d\mid n}\mu(d)=[n=1]$$
(Where $d\mid n$ means summing over each positive divisor $d$ of $n$, including 1 and $n$, of course not counted twice if $n$ is 1.)
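For instance, for $n=12$ the divisors are 1, 2, 3, 4, 6 and 12, and
$$\sum_{d\mid 12}\mu(d)=\mu(1)+\mu(2)+\mu(3)+\mu(4)+\mu(6)+\mu(12)=1-1-1+0+1+0=0=[12=1].$$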
We can use this equation to replace $[\gcd(i,j)=1]$ with $\sum_{d\mid\gcd(i,j)}\mu(d)$, which allows us to ‘unleash’ the properties of $\gcd(i,j)$, namely that the set of divisors of $\gcd(i,j)$ is the set of common divisors of $i$ and $j$ (which I write as summing over $d\mid i,\ d\mid j$ below), which in turn allows us to rewrite our summations:
$$\begin{aligned}
\cdots &= \sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{d\mid\gcd(i,j)}\mu(d)\,\bigl((n+1)(m+1)-(m+1)i-(n+1)j+ij\bigr)\\
&= \sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{d\mid i,\,d\mid j}\mu(d)\,\bigl((n+1)(m+1)-(m+1)i-(n+1)j+ij\bigr)\\
&= \sum_{d=1}^{\min\{n,m\}}\sum_{i=1}^{\lfloor n/d\rfloor}\sum_{j=1}^{\lfloor m/d\rfloor}\mu(d)\,\bigl((n+1)(m+1)-(m+1)di-(n+1)dj+d^2ij\bigr)
\end{aligned}$$
Note that in the last step I ‘switched variables’ $i\leftarrow i/d$ and $j\leftarrow j/d$: instead of looking for possible $d$ given $i$ and $j$, I look for possible $di$ and $dj$ given $d$, which is simpler because it does not involve a divisibility check.
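To double-check the rearrangement, here is a small brute-force comparison (purely illustrative; the naive mobius helper below is mine, not part of the solution) of the original double sum against the rearranged triple sum for small n and m:

```cpp
#include <cassert>
#include <cstdio>
#include <algorithm>
#include <numeric>

// Naive Mobius function, fine for tiny arguments.
long long mobius(long long n) {
    long long res = 1;
    for (long long p = 2; p * p <= n; p++)
        if (n % p == 0) {
            n /= p;
            if (n % p == 0) return 0;  // squared prime factor
            res = -res;
        }
    return n > 1 ? -res : res;
}

int main() {
    for (long long n = 1; n <= 8; n++)
        for (long long m = 1; m <= 8; m++) {
            long long direct = 0, rearranged = 0;
            // Original form with the Iverson bracket [gcd(i,j)=1].
            for (long long i = 1; i <= n; i++)
                for (long long j = 1; j <= m; j++)
                    if (std::gcd(i, j) == 1)
                        direct += (n + 1 - i) * (m + 1 - j);
            // Rearranged form: sum over d first, then over i <= n/d, j <= m/d.
            for (long long d = 1; d <= std::min(n, m); d++)
                for (long long i = 1; i <= n / d; i++)
                    for (long long j = 1; j <= m / d; j++)
                        rearranged += mobius(d) *
                            ((n + 1) * (m + 1) - (m + 1) * d * i - (n + 1) * d * j + d * d * i * j);
            assert(direct == rearranged);
        }
    puts("rearranged sum matches the direct sum");
}
```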
Let’s split that up into three parts for easier handling later on, grouping by the power of $d$:
$$\begin{aligned}
a_0(n,m) &= \sum_{d=1}^{\min\{n,m\}}\mu(d)\,(n+1)(m+1)\left\lfloor\frac{n}{d}\right\rfloor\left\lfloor\frac{m}{d}\right\rfloor\\
a_1(n,m) &= \sum_{d=1}^{\min\{n,m\}}\mu(d)\,d\left(-(m+1)\,\mathrm{ID}\!\left(\left\lfloor\frac{n}{d}\right\rfloor\right)\left\lfloor\frac{m}{d}\right\rfloor-(n+1)\left\lfloor\frac{n}{d}\right\rfloor\mathrm{ID}\!\left(\left\lfloor\frac{m}{d}\right\rfloor\right)\right)\\
a_2(n,m) &= \sum_{d=1}^{\min\{n,m\}}\mu(d)\,d^2\,\mathrm{ID}\!\left(\left\lfloor\frac{n}{d}\right\rfloor\right)\mathrm{ID}\!\left(\left\lfloor\frac{m}{d}\right\rfloor\right)
\end{aligned}$$
so that $a(n,m)=a_0(n,m)+a_1(n,m)+a_2(n,m)$. Here $\mathrm{ID}(n)$ is the sum of $\mathrm{id}(n)=n$, that is $\mathrm{ID}(n)=n(n+1)/2$.
The three parts have a common pattern:
$$s_{k,u}(n,m)=\sum_{d=1}^{\min\{n,m\}}\mu(d)\,d^k\,u\!\left(\left\lfloor\frac{n}{d}\right\rfloor,\left\lfloor\frac{m}{d}\right\rfloor\right)$$
where $u$ is trivially computable. If we figure out how to compute $s_{k,u}$, we’re set.
Intermission:
Number-theoretic summation trick
Now for something completely out of nowhere.
The Dirichlet convolution of two functions is defined as:
$$(f*g)(n)=\sum_{d\mid n}f(d)\,g\!\left(\frac{n}{d}\right)$$
Let $G(n)=\sum_{i=1}^{n}g(i)$, and similarly let capital-letter functions represent the sum of the corresponding lowercase-letter functions. An observation is that if we sum a Dirichlet convolution, we get:
(The sum over $cd=i$ means all positive integers $c$ and $d$ such that $cd=i$, and the sum over $cd\le n$ works similarly.)
$$\sum_{i=1}^{n}(f*g)(i)=\sum_{i=1}^{n}\sum_{d\mid i}f(d)\,g\!\left(\frac{i}{d}\right)=\sum_{i=1}^{n}\sum_{cd=i}f(c)\,g(d)=\sum_{cd\le n}f(c)\,g(d)=\sum_{c=1}^{n}f(c)\sum_{i=1}^{\lfloor n/c\rfloor}g(i)=\sum_{c=1}^{n}f(c)\,G\!\left(\left\lfloor\frac{n}{c}\right\rfloor\right)$$
Isolating the first term gives:
$$f(1)\,G(n)=\sum_{i=1}^{n}(f*g)(i)-\sum_{c=2}^{n}f(c)\,G\!\left(\left\lfloor\frac{n}{c}\right\rfloor\right)$$
If $f(1)\neq 0$, then:
$$G(n)=\frac{1}{f(1)}\left(\sum_{i=1}^{n}(f*g)(i)-\sum_{c=2}^{n}f(c)\,G\!\left(\left\lfloor\frac{n}{c}\right\rfloor\right)\right)$$
Assume for now that $f$ is chosen such that $F(n)$ and $\sum_{i=1}^{n}(f*g)(i)$ are efficiently computable, either directly via an $O(1)$ formula or indirectly via a ‘recursive’ application of this technique.
Note that $\lfloor n/c\rfloor$ can only take on a maximum of about $2\sqrt{n}$ values:
$n/1, n/2, \ldots, n/\sqrt{n}$ (the final term is approximate), for $c\le\sqrt{n}$;
about $\sqrt{n}, \sqrt{n}-1, \ldots, 1$, for $c>\sqrt{n}$.
The sum over $c$ from 1 to $n$ is now broken up into $O(\sqrt{n})$ segments where $\lfloor n/c\rfloor$ is equal, and $G(\lfloor n/c\rfloor)$ only needs to be calculated once per segment.
Given $x$, we want to solve for $j$ in $\lfloor n/j\rfloor=x$, which shall give us the form of each segment.
$$\begin{aligned}
x&=\left\lfloor\frac{n}{j}\right\rfloor\\
x&\le\frac{n}{j}<x+1\\
\frac{n}{x+1}&<j\le\frac{n}{x}\\
\left\lfloor\frac{n}{x+1}\right\rfloor+1&\le j\le\left\lfloor\frac{n}{x}\right\rfloor
\end{aligned}$$
Therefore, given that the first interval’s starting point is $l_1=2$, the end of this interval is $r_1=\lfloor n/\lfloor n/2\rfloor\rfloor$; then the next interval starts at $l_2=r_1+1$ and ends at $r_2=\lfloor n/\lfloor n/l_2\rfloor\rfloor$, and so on until $n$ is reached. In each interval $[l_i,r_i]$ the sum is $(F(r_i)-F(l_i-1))\,G(\lfloor n/l_i\rfloor)$.
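In code, enumerating these intervals looks roughly like the following sketch (generic; the visit callback stands in for whatever per-interval work is needed):

```cpp
#include <cstdio>

// Visit every maximal segment [l, r] of 2..n on which floor(n / c) is constant.
template <typename Visit>
void for_each_segment(long long n, Visit visit) {
    for (long long l = 2, r; l <= n; l = r + 1) {
        long long q = n / l;  // common value of floor(n / c) on this segment
        r = n / q;            // last c with floor(n / c) == q
        visit(l, r, q);
    }
}

int main() {
    // For n = 20 this prints [2,2]->10, [3,3]->6, [4,4]->5, [5,5]->4, [6,6]->3, [7,10]->2, [11,20]->1.
    for_each_segment(20, [](long long l, long long r, long long q) {
        printf("[%lld, %lld] -> %lld\n", l, r, q);
    });
}
```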
By recursively computing $G$ as shown, memoizing using a hash table, we can achieve $O(n^{3/4})$ time complexity and $O(n^{1/2})$ space complexity. If we use a sieve to pre-compute the first $O(n^{2/3})$ values of $G$, we can achieve $O(n^{2/3})$ time and space complexity.
The complexity proof: in the no-precomputation case every $G(\lfloor n/d\rfloor)$ is computed exactly once, each taking $O(\sqrt{\lfloor n/d\rfloor})$ time. As discussed earlier, $\lfloor n/d\rfloor$ takes on the values $1,2,\ldots,\sqrt{n}$ and $n/1,n/2,\ldots,n/\sqrt{n}$ (we are talking asymptotics here, so we’ll just ignore the details), therefore the total time is:
$$T(n)=\sum_{i=1}^{\sqrt{n}}O\!\left(\sqrt{i}+\sqrt{\frac{n}{i}}\right)=\sum_{i=1}^{\sqrt{n}}O\!\left(i^{1/2}+i^{-1/2}\sqrt{n}\right)=O\!\left((\sqrt{n})^{3/2}\right)+O\!\left((\sqrt{n})^{1/2}\right)\sqrt{n}=O(n^{3/4})$$
If we can find the first $O(n^k)$ (where $1/2\le k\le 3/4$) values of $g$ in $O(n^k)$ time, we can accelerate the process.
This is often possible, for example, for multiplicative functions, where Euler’s sieve can be used. For more detailed information check this note on Codeforces. The new execution time is
$$T'_k(n)=O\!\left(n^k\right)+O\!\left(n^{(2-k)/2}\right)$$
Asymptotically, $T'_k(n)$ is minimized when $k=(2-k)/2$, which is $k=2/3$. In practice, the number of values to precompute is a tunable parameter that will be set according to needs.
To summarize, this means that if h=f∗g, and H and F can be computed efficiently either directly
or ‘recursively’ by this technique, then G can be computed efficiently.
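As a concrete instance of this summary (a sketch only, not the code I submitted): take $g=\mu$ and $f=1$, so $h=f*g=[n=1]$; then $H(n)=1$ and $F(n)=n$ are trivial, and $G$ is the Mertens function $M(n)=\sum_{i\le n}\mu(i)$, which the recursion computes like this:

```cpp
#include <cstdio>
#include <unordered_map>

// Mertens function M(n) = sum_{i=1..n} mu(i), computed with the summation trick:
// choosing f = 1 gives f*g = [n=1], hence M(n) = 1 - sum_{c=2..n} M(floor(n/c)).
std::unordered_map<long long, long long> memo;

long long mertens(long long n) {
    if (n <= 1) return n;  // M(0) = 0, M(1) = 1
    auto it = memo.find(n);
    if (it != memo.end()) return it->second;
    long long res = 1;  // sum_{i<=n} (f*g)(i) = 1
    // Enumerate segments [l, r] on which floor(n / c) is constant; F(c) = c here,
    // so each segment contributes (r - l + 1) * M(floor(n / l)).
    for (long long l = 2, r; l <= n; l = r + 1) {
        long long q = n / l;
        r = n / q;
        res -= (r - l + 1) * mertens(q);
    }
    return memo[n] = res;
}

int main() {
    printf("%lld\n", mertens(100));  // M(100) = 1
}
```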
I could not find any authoritative reference on this method. The best
I could find is this answer on
math.SE.
The algorithm described above is often presented as a method to compute isolated values of G(n), but it actually computes G(⌊n/d⌋) for all possible values of ⌊n/d⌋. This would come in handy if, as alluded to earlier, F(n) were itself non-trivial but still amenable to this technique; we won’t need that here. We shall see how this helps us in a minute, but first we need to find f and h.
Back to business
In the definition of $s_{k,u}$, we extract the ‘difficult’ part $\mu(n)n^k$, and we can define another function along with it:
$$m_k(n)=\mu(n)\,n^k \qquad \mathrm{id}_k(n)=n^k$$
The Dirichlet convolution of the two is:
$$(m_k*\mathrm{id}_k)(n)=\sum_{d\mid n}\mu(d)\,d^k\left(\frac{n}{d}\right)^{k}=\sum_{d\mid n}\mu(d)\,n^k=\left(\sum_{d\mid n}\mu(d)\right)n^k=[n=1]\,n^k=[n=1]$$
That’s quite simple. This is no surprise, since $[n=1]$ is the identity of Dirichlet convolution, and $m_k$ is the Dirichlet inverse of $\mathrm{id}_k$. Refer to Wikipedia for details.
Can we use the earlier technique to compute sums of $m_k(n)=\mu(n)n^k$? The answer is yes, as the sums of the functions involved are either trivial or well-known:
$$\begin{aligned}
\sum_{i=1}^{n}[i=1]&=1\\
\sum_{i=1}^{n}1&=n\\
\sum_{i=1}^{n}i&=\frac{n(n+1)}{2}\\
\sum_{i=1}^{n}i^2&=\frac{n(n+1)(2n+1)}{6}
\end{aligned}$$
Therefore it is possible to compute $M_k(\lfloor n/d\rfloor)$ for all possible values of $\lfloor n/d\rfloor$ in $O(n^{3/4})$ time. (Remember that uppercase means sum.)
Can we use the faster $O(n^{2/3})$ method? The answer is yes, because $m_k$ is multiplicative. I’ll refer to the notes on Euler’s sieve mentioned earlier.
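For illustration, a minimal sketch of such a precomputation (assuming a prime modulus MOD = 10^9+7 as the problem requires; this is not my exact code): a linear (Euler) sieve yields μ(i) up to the chosen limit, and the prefix sums M_k follow directly.

```cpp
#include <cstdio>
#include <vector>

const long long MOD = 1000000007;

// Linear (Euler) sieve for mu(1..limit), then prefix sums of mu(i) * i^k mod MOD.
std::vector<long long> prefix_mk(int limit, int k) {
    std::vector<int> mu(limit + 1), primes;
    std::vector<bool> composite(limit + 1, false);
    mu[1] = 1;
    for (int i = 2; i <= limit; i++) {
        if (!composite[i]) { primes.push_back(i); mu[i] = -1; }
        for (int p : primes) {
            if ((long long)i * p > limit) break;
            composite[i * p] = true;
            if (i % p == 0) { mu[i * p] = 0; break; }  // squared prime factor
            mu[i * p] = -mu[i];
        }
    }
    std::vector<long long> M(limit + 1, 0);
    for (int i = 1; i <= limit; i++) {
        long long ik = 1;
        for (int t = 0; t < k; t++) ik = ik * i % MOD;  // i^k mod MOD
        M[i] = ((M[i - 1] + mu[i] * ik) % MOD + MOD) % MOD;
    }
    return M;
}

int main() {
    auto M0 = prefix_mk(100, 0);
    printf("%lld\n", M0[100]);  // Mertens M(100) = 1, reduced mod MOD
}
```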
As a reminder, we want to compute this:
$$s_{k,u}(n,m)=\sum_{d=1}^{\min\{n,m\}}m_k(d)\,u\!\left(\left\lfloor\frac{n}{d}\right\rfloor,\left\lfloor\frac{m}{d}\right\rfloor\right)$$
This is eerily familiar. In fact this is almost what we did in the intermission, but now $u$ takes two arguments.
How many values can the pair $(\lfloor n/d\rfloor,\lfloor m/d\rfloor)$ take on? As $d$ increases, $\lfloor n/d\rfloor$ decreases, taking about $2\sqrt{n}$ steps to do so, and similarly for $\lfloor m/d\rfloor$; therefore $(\lfloor n/d\rfloor,\lfloor m/d\rfloor)$ takes on a maximum of about $2(\sqrt{n}+\sqrt{m})$ values (not a simple sum of the two counts, because the steps can overlap). For example, for $n=10$ and $m=3$, the combined steps look like this (open the image in a new tab if it’s too small for you):
Steps of change compared
Apply the exact same strategy as before: for the first interval, the start is $l_1=1$ and the end is $r_1=\min\{\lfloor n/\lfloor n/l_1\rfloor\rfloor,\lfloor m/\lfloor m/l_1\rfloor\rfloor\}$; then for the second interval $l_2=r_1+1$ and $r_2=\min\{\lfloor n/\lfloor n/l_2\rfloor\rfloor,\lfloor m/\lfloor m/l_2\rfloor\rfloor\}$, and so on until $\min\{n,m\}$ is reached. The sum in each interval is then $(M_k(r_i)-M_k(l_i-1))\,u(\lfloor n/l_i\rfloor,\lfloor m/l_i\rfloor)$.
We can see that each $r_i$ and $l_i-1$ is either of the form $\lfloor n/d\rfloor$ or $\lfloor m/d\rfloor$ or 0. For the algorithm to work, $M_k(0)$ should be the natural choice of a sum of no numbers, 0. Therefore if we compute all $M_k(\lfloor n/d\rfloor)$ and $M_k(\lfloor m/d\rfloor)$ in $O(\max\{n,m\}^{2/3})$ time, we can use $O(\sqrt{n}+\sqrt{m})$ time to combine them into our answer $s_{k,u}(n,m)$ (provided that $u$ is available in $O(1)$, which is the case).
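Putting the pieces together, the combining step might look like the following sketch, where Mk stands for the prefix sums computed above and u is any O(1) two-argument function; this is schematic and uses naive helpers for the demonstration, not the literal kata code:

```cpp
#include <cassert>
#include <cstdio>
#include <algorithm>
#include <functional>

// s_{k,u}(n, m) = sum_{d=1..min(n,m)} mu(d) d^k u(n/d, m/d), evaluated over the
// O(sqrt(n) + sqrt(m)) segments where the pair (n/d, m/d) is constant.
// Mk(v) must return sum_{d=1..v} mu(d) d^k, with Mk(0) = 0.
long long s_ku(long long n, long long m,
               std::function<long long(long long)> Mk,
               std::function<long long(long long, long long)> u) {
    long long res = 0, lim = std::min(n, m);
    for (long long l = 1, r; l <= lim; l = r + 1) {
        r = std::min(n / (n / l), m / (m / l));  // both quotients stay constant up to r
        res += (Mk(r) - Mk(l - 1)) * u(n / l, m / l);
    }
    return res;
}

// Naive Mobius function for the demonstration only.
long long mobius(long long n) {
    long long res = 1;
    for (long long p = 2; p * p <= n; p++)
        if (n % p == 0) { n /= p; if (n % p == 0) return 0; res = -res; }
    return n > 1 ? -res : res;
}

int main() {
    long long n = 30, m = 17;
    auto M0 = [](long long v) { long long s = 0; for (long long d = 1; d <= v; d++) s += mobius(d); return s; };
    auto u  = [](long long a, long long b) { return a * b; };
    // Compare against the direct definition of s_{0,u}.
    long long direct = 0;
    for (long long d = 1; d <= std::min(n, m); d++) direct += mobius(d) * u(n / d, m / d);
    assert(s_ku(n, m, M0, u) == direct);
    printf("%lld\n", s_ku(n, m, M0, u));
}
```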
Applying this template to the three parts of the sum, which as a reminder are $a_0=s_{0,u_0}$, $a_1=s_{1,u_1}$ and $a_2=s_{2,u_2}$ with
$$u_0(x,y)=(n+1)(m+1)\,xy,\qquad u_1(x,y)=-(m+1)\,\mathrm{ID}(x)\,y-(n+1)\,x\,\mathrm{ID}(y),\qquad u_2(x,y)=\mathrm{ID}(x)\,\mathrm{ID}(y),$$
we can have our answer $4(a_0(n,m)+a_1(n,m)+a_2(n,m))$ in $O(\max\{n,m\}^{2/3})$ time and $O(\max\{n,m\}^{2/3})$ space.
Phew, that was a whopping three thousand words according to Typora.
Admittedly though it does count words in LaTeX code, which may or may
not represent the effort needed to read math notation.
A few more details
The problem requires modular arithmetic; otherwise word-sized integers won’t hold such large numbers. In my code, a supporting class mo is used for this purpose. I worried that it would hinder optimization, but casually playing around on https://godbolt.org shows that it generates exactly the same code as using plain functions without wrapping the number in a class.
Since this method requires calculating three things at once, I wrote this tiny function that ‘vectorizes’ a function over multiple std::array ‘vectors’. It comes into play quite a few times in my code and cleans it up quite a bit. It seems to optimize well, with the loop unrolled and the function properly inlined, at least when tested under recent Clang. Quite a bit of what I learned while solving this kata actually went into writing this.
```cpp
#include <array>
#include <cstddef>

// Apply f element-wise across N-element arrays, collecting the results into a new array.
template <typename Func, size_t N, typename... Args>
decltype(auto) arrf(Func f, std::array<Args, N>... args) {
    std::array<decltype(f(args[0]...)), N> res;
    for (size_t i = 0; i != res.size(); i++) {
        res[i] = f(args[i]...);
    }
    return res;
}
```
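For illustration, combined with the arrf definition above, a call site might look like this (the arrays and lambda here are made up for the example):

```cpp
#include <array>
#include <cstdio>

// ... arrf as defined above ...

int main() {
    std::array<long long, 3> xs{1, 2, 3}, ys{10, 20, 30};
    // Applies the lambda element-wise across both arrays: res == {11, 22, 33}.
    auto res = arrf([](long long a, long long b) { return a + b; }, xs, ys);
    for (auto v : res) printf("%lld ", v);
}
```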
I also took the opportunity to enjoy many post-C++11 features such as
initializer lists, std::optional, std::array,
std::unordered_map, parameter packs
typename... Args, and maybe some others I’ve missed. I did
not get to use these features when I used to do competitive programming
a few years ago because of the poor adoption of up-to-date compilers.
It’s a nice refresher doing these things at a slightly higher level of
abstraction while sacrificing little to no efficiency, in a modernized
language that is admittedly quite old but also more alive than ever.