In a previous discussion,
we looked at a variety of infallible tests for primality, but all of them were too slow to be viable for large numbers. In fact, all of the methods discussed there will fail miserably for even moderately large numbers, those with just a few dozen decimal digits. That does not even begin to push into the realm of large numbers. In turn, that forces us into a different realm of tests - tests which are usually and even almost always correct, but can sometimes incorrectly predict primality.
In this discussion, I will be trying to convince you that the Fermat test for primality can be a quite good test when the number is sufficiently large. Except of course, when it is really bad. Even so, the Fermat test for primality is both a useful tool as well as a necessary underpinning for several other better tests.
The Fermat test for primality relies on Fermat's little theorem, a perhaps under-appreciated tool. Even the name implies it is of little interest. I'm not taking about the famous last theorem, only proven in recent years, but his little theorem.
If you want to look over some nice ways to prove the little theorem, take a read in this link:
I will readily admit that long ago, when I learned about the little theorem in a nearly forgotten class, I thought it was interesting, but why would I care? Not until I learned more mathematics and saw Fermat’s little theorem appearing in different places did I begin to appreciate it. Fermat tells us that, IF P is a prime, AND w is co-prime with P (so the two are relatively prime, sharing no common factors except 1), then it must be true that
mod(w^(P-1),P) == 1
Try it out. Does it work? Be careful though as too large of an exponent will cause problems in double precision, and that is not difficult to do. As a test case that will not overwhelm doubles, note that 13 is prime, and 3 shares no common factors with 13, so we satisfy the requirements for Fermat's little theorem.
We can even verify that any co-prime of 13 will yield the same result.
mod((1:12).^12,13)
ans =
1 1 1 1 1 1 1 1 1 1 1 1
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>
Indeed it worked, suggesting what we knew all along, that 13 is prime. The little Fermat test for primality of the number P uses a converse form of Fermat's little theorem, thus given a co-prime number w known as the witness, is that if
mod(w^(P-1),P)==1
then we have evidence that P is indeed prime. This is not conclusive evidence, but still it is evidence. It is not conclusive because the converse of a true statement is not always true.
The analogy I like here is if we lived in a universe where all crows are black. (I'll ask you to pretend this is true. In fact, some crows have a mutation, making them essentially albino crows. For the purposes of this thought experiment, pretend this cannot happen.) Now, suppose I show you a picture of a bird which happens to be black. Do you know the bird to be a crow? Of course not, as the bird may be a raven, or a redwing blackbird (until you see the splash of red on the wing), a common grackle, a European starling in summer plumage, a condor, etc. But having seen black plumage, it is now more likely the bird is indeed a crow. I would call this a moderately weak evidentiary test for crow-ness.
Little Fermat may seem to be of little value when testing for primality for two reasons. First, computing the remainder would seem to be highly CPU intensive for large P. In the example above, I had only to compute 3^12=531441, which is not that large. But for numbers with many thousands or millions of digits, directly raising even a number as small as 2 to that power will overwhelm any computer. Secondly, if we do that computation, little Fermat does not conclusively prove P to be prime.
Our savior in one respect is the powermod tool. And that helps greatly, since we can compute the remainder in a reasonable time. A powermod call is quite fast even for huge powers. (I won't get into how powermod works here, since that alone is probably worth a discussion. I could though, if I see some interest because there are some very pretty variations of the powermod algorithm. I hope to show you one of them when I discuss the Fibonacci test for primality in a future post.) Trying the little Fermat test using powermod on a number with 1207 decimal digits, I’ll first do a time check.
P = 4000*sym(2)^3999 - 1;
timeit(@() powermod(2,P-1,P))
As you can see, powermod really is pretty fast. Compared to an isprime test on that number it would show a significant difference.
I have said before that little Fermat is not a conclusive test. In fact, a good trick is to perform a second little Fermat test, using a different witness. If the second test also indicates primality, then we have additional evidence that P is in fact prime.
This value for P is indeed prime, and little Fermat suggests it is, doubly suggestive in that test since I actually performed two parallel tests. Here however, we need to understand when it will fail, and how often it will fail.
If we perform a little Fermat test for primality, we will never see false negatives, that is, if any test with any witness ever indicates a number is composite, then it is certainly composite. (The contrapositive of a true statement is always true.) The alternate class of failure is the false positive, where little Fermat indicates a number is prime when it was actually composite.
If P is composite, and w co-prime with P, we call P a Fermat pseudo-prime for the witness w if we see a remainder of 1 when P was in fact composite. When that happens, the witness (w) is called a Fermat liar for the primality of P. (A list of some Fermat pseudo-primes where 2 is a Fermat liar can be found in sequence A001567 of the OEIS.)
In the case of 4000*2^3999-1, I claimed the number to be in fact prime, and it was identified so (as PROBABLY prime by little Fermat. Next, consider another number from that same family. I’ll perform three parallel tests on it, with witnesses 2, 3, and 5. This will suggest the value of doing parallel tests on a number to reduce the failure rate from little Fermat.
P2 = 1024*sym(2)^1023 - 1;
As you can see, P2 is co-prime with all of 2, 3 and 5, but 2 is a Fermat liar, whilst 3 and 5 are Fermat truth tellers, identifying P2 as certainly composite. So the little Fermat test can definitely fail for SOME witnesses, since we see a disagreement. However, an interesting fact about P2 above is it is also a Mersenne number with prime exponent, thus it can be written as 2^1033-1, where 1033 is prime. I can go into more detail about this case later, but we can show that 2 is always a Fermat liar for composite Mersenne numbers when the exponent is itself prime. I’ll try to leave more detail on this matter in a future discussion, or perhaps a comment.
Next, consider the composite integer 51=3*17. As the product of two primes, it is clearly not itself prime.
w = w0(gcd(w0,P51) == 1)
w =
2 4 5 7 8 10 11 13 14 16 19 20 22 23 25 26 28 29 31 32 35 37 38 40 41 43 44 46 47 49
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>
Note that I did not include 1 or 50 in that set, since 1 raised to any power is 1, and 50 is congruent to -1, mod 51. -1 raised to any even power is also always 1, and 51-1 is an even number. And so when we are working modulo 51, both 1 and 50 are not useful witnesses in terms of the little Fermat test.
w(R == 1)
ans =
2 4 5 7 8 10 11 13 14 16 19 20 22 23 25 26 28 29 31 32 35 37 38 40 41 43 44 46 47 49
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>
This teaches us that when 51 is tested for primality using the little Fermat test, there are 2 distinct witnesses w (16 and 35) we could have chosen which would have been Fermat liars, but all 28 other potential co-prime witnesses would have been truth tellers, accurately showing 51 to be composite. Proportionally, little Fermat will have been correct roughly 93% of the time, since only 2 of these 30 possible tests returned a false positive. (I’ll add that for any modulus P, if w<P is not co-prime with the modulus, then the computation mod(w^(P-1),P) will always return a non-unit result, and therefore we can theoretically use any integer w from the set 2:P-2 as a witness. However if w is co-prime with P then P is clearly not prime, and the entire problem becomes a little less interesting. As such, I will only consider co-prime witnesses for this discussion.) Regardless, that would make the little Fermat test for P=51 even more often correct, since it returns the correct result of composite for 46 out of the 48 possible witnesses 2:49. Does this mean Little Fermat is indeed the basis for a good test to rely on to learn if a number is prime? Well, yes. And no.
Little Fermat forms a very good test most of the time, but reliance is a strong word. This means we need to explore the little Fermat test in more depth, focusing on Fermat liars and the case of false positives. To offer some appreciation of the false positive rate, offline, I have tested all composite integers between 4 and 10000, for all their possible co-prime witnesses.
In that .mat file, I've saved three vectors, X, witnessCount, and liarCount. X is there just to use for plotting purposes and is NaN for all non-composite entries.
whos X witnessCount liarCount
Name Size Bytes Class Attributes
X 1x10000 80000 double
liarCount 1x10000 80000 double
witnessCount 1x10000 80000 double
The vector witnessCount is the number of valid witnesses for the corresponding number in X. Corresponding to that is the vector liarCount, which is the number of Fermat liars I found for each composite in X.
How many Fermat test useful witnesses are there for any integer X? This is just 2 less than the number of coprimes of X. The number of coprimes is given by the Euler totient function, commonly called phi(X). (I’ll be going into more depth on the totient in the next chapter of this series, because the Euler totient is a crucial part of understanding how all of this works.)
The witness count is phi(X)-2. Why subtract 2? 1 can never be a witness, but 1 is technically coprime to everything. The same applies to X-1 (which is congruent to -1 mod X.) As such, there are phi(X)-2 coprimes to consider. (I've posted a function called totient on the FEX, but it is easily computed if you know the factorization of X. Or for small numbers, you can just use GCD to identify all co-primes, and count them.)
From that plot, you can learn a few interesting things. (As a mathematician, this is what I love the most, thus to look at whay may be the simplest, most boring plot, and try to find something of value, something I had never thought of before.) For example, we know that when X is prime, then everything from the set 2:X-2 is a valid witness. So the upper boundary on that plot will be the line y==x. As well, there are a few numbers where the order of the set of witnesses will be close to the maximum possible. For example, 961=31*31, has 928 valid witnesses. That makes some sense, as 961 is the square of a prime (31), so we know 961 is divisible only by 31. Only multiples of 31 will not be coprime with 961.
But how about the lower boundary? The least number of valid witnesses will always come from highly composite numbers, because they will share common factors with almost everything. For example 30 = 2*3*5, or 210=2*3*5*7.
witnessCount([30 210 420])
ans =
6 46 94
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>
A good discussion about the lower bound for that plot can be found here:
What really matters to us though, is the fraction of the useful witnesses for a little Fermat test that yield a false positive.
plot(X,liarCount./witnessCount,'b.')
Cind = liarCount == witnessCount;
title('Fermat pseudo-prime strength')
A look at this plot shows seven circles in red, corresponding to X from the list {561, 1105, 1729, 2465, 2821, 6601, 8911} which are collectively known as Carmichael numbers. These are numbers where all witnesses return a false positive. Carmichael numbers are themselves fairly rare. You can find a list of them as sequence A002997 in the OEIS. And for those of you who have never wandered around the OEIS, please take this opportunity to do so now. The OEIS stands for Online Encyclopedia of Integer Sequences. It contains a wealth of interesting knowledge about integers and integer sequences.)
There are a few other interesting numbers we can find in that plot, like 91 and 703, where roughly 50% of the valid witnesses yield false positives. Of the complete set, which numbers did return at least a 25% false positive rate for primality? These numbers would be known as strong pseudo-primes for the little Fermat test, because they are pseudo-primes for at least 25% of the potential witnesses. These strong pseudo-primes have some interesting similarities to the Carmichael numbers. (My next post will go into more depth on Carmichael numbers and strong pseudo-primes. At the moment, I am merely interested in looking at the how often the little Fermat test fails overall.)
find(liarCount./witnessCount> 0.25)
ans =
15 45 65 91 105 133 231 341 481 561 645 703 1105 1541 1729 1891 2465 2701 2821 3201 4033 4371 5461 5565 6533 6601 7107 8321 8911
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>
You should notice the spacing between successive strong Fermat pseudo-primes is growing slowly, with a spacing of roughly 800 on average in the vicinity of 10000. If I step out beyond by a factor of 10, the next strong Fermat pseudo-primes after 1e5 are {101101, 104653, 107185, 109061, 111361, 114589, 115921 126217, 126673}, so that average spacing is definitely growing.
Given that set, now we can look at the prime factorizations of each of those strong pseudo-primes. Can we learn something about them?
arrayfun(@factor,find(liarCount./witnessCount> 0.25),'UniformOutput',false)
Perhaps the most glaring thing I see in that set of factors is almost all of those strong Fermat pseudo-primes are square free. That is, in that list, only 45=3*3*5 had any replicated factor at all. That property of being square free is something we will see is necessary to be a Carmichael number, but it also suggests that a simple roughness test applied in advance would have eliminated almost all of those strong pseudo-primes as obviously not prime, even at a very low level of roughness.
In fact, for most composite integers, most witnesses do indeed return a negative, indicating the number is not prime, and therefore composite. Little Fermat does not commonly tell falsehoods, even though it can do so.
semilogy(X,movmedian(liarCount./witnessCount,20,'omitnan'),'b-')
title('False positive fraction for composites')
We can learn from this last plot that as the number to be tested grows large, the median false positive rate for little Fermat, even for X as low as only 10000, is roughly 0.0003. (It continues to decrease for larger X too. In fact, I’ve read that when X is on the order of 2^256, the relative fraction of Fermat liars is on the order of 1 in 1e12, and it continues to decrease as X grows in magnitude. In my eyes, that seems pretty good for an imperfect test. Not perfect, but not bad when paired with roughness and perhaps a second little Fermat test using a different witness, and we will start to see tests which bear a higher degree of strength.)
I’ll stop at this point in this post because the post is getting lengthy. In my next post, I’d like to visit some questions about what are Carmichael numbers, about whether some witnesses are better than others, and if there are any numbers which lack any Fermat liars. However, in order to dive more deeply, I will need to explain how/why/when the little Fermat test works, and what causes Fermat liars. Stay tuned, because this starts to get interesting.