Given a string s and a number n, find the most frequently occurring n-gram in the string, where the n-grams can begin at any point in the string. This comes up in DNA analysis, where the 3-base reading frame for a codon can begin at any point in the sequence.
So for
s = 'AACTGAACG'
and
n = 3
we get the following n-grams (trigrams):
AAC, ACT, CTG, TGA, GAA, AAC, ACG
Since AAC appears twice, the answer, hifreq, is AAC. There will always be exactly one highest-frequency n-gram.
This problem was originally inspired by a MATLAB Newsgroup discussion.
Note that spaces must be ignored, or else test suites 3 and 5 fail.
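The problem statement and the note about spaces can be sketched roughly as follows. This is only one possible approach, not the reference solution: the function name `most_frequent_ngram` is hypothetical, whitespace stripping is assumed from the comment above rather than from the problem text, and the index arithmetic relies on implicit expansion (MATLAB R2016b or later).

```matlab
function hifreq = most_frequent_ngram(s, n)
% MOST_FREQUENT_NGRAM  Sketch: most frequent n-gram in s, any start offset.
% The name is hypothetical. Assumes numel(s) >= n after removing whitespace.

s = s(~isspace(s));                  % assumption: whitespace is ignored
k = numel(s) - n + 1;                % number of overlapping n-grams

% k-by-n index matrix: row i holds the positions i, i+1, ..., i+n-1
idx = (1:n) + (0:k-1)';

% reshape keeps one n-gram per row even when n == 1 or k == 1
grams = reshape(s(idx), k, n);

% Count how often each distinct row occurs
[u, ~, j] = unique(grams, 'rows');
counts = accumarray(j(:), 1);

% The problem guarantees a unique maximum, so ties are not handled here
[~, best] = max(counts);
hifreq = u(best, :);
end
```

With the example from the problem, `most_frequent_ngram('AACTGAACG', 3)` returns `'AAC'`.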
good use of 'hankel' function
cool solution
Sorry about this, but I got stuck and I want to learn how to do it. After looking at several solutions, I found my mistake and was able to create my own solution :)
What happens if the test suite changes in the future?
This solution is not correct in general: the way hankel is used here generates n-1 fake fragments.
Clever usage of the Hankel matrix. I don't automatically think of the Hankel for this application, but it really works well. Thanks - I've learned something
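For readers who have not seen the Hankel idea mentioned in these comments, here is a small illustration. The solutions being discussed are not shown on this page, so this is an assumption about how such fragments could arise, not a reconstruction of anyone's code. The two-argument form `hankel(c, r)` yields exactly the valid window indices, while the one-argument form pads with zeros and leaves n-1 incomplete columns.

```matlab
s = 'AACTGAACG';
n = 3;
L = numel(s);

% Two-argument form: first column 1:n, last row n:L.
% Column j holds the positions j, j+1, ..., j+n-1, so every column is valid.
idx = hankel(1:n, n:L);            % n-by-(L-n+1) index matrix
grams = s(idx)';                   % one n-gram per row, 7 rows here

% One-argument form: square matrix with zeros below the anti-diagonal.
padded = hankel(1:L);              % L-by-L
top = padded(1:n, :);              % first n rows, one candidate window per column

% Columns containing a zero run past the end of s and are not real n-grams
numFake = nnz(any(top == 0, 1));   % equals n-1
```

Here `grams(1, :)` is `'AAC'` and `numFake` is 2, which agrees with the n-1 count mentioned in the comment above.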
What's the point of a 'solution' like this? It passes the test suite, but in what way was it interesting for you to write it?