Consecutive numbers in a table

2 views (last 30 days)
Prathamesh Halagi
Prathamesh Halagi on 3 Feb 2021
Commented: Walter Roberson on 5 Feb 2021
I have exported a data form xls which is 4420x3. the first column consists of step numbers, I wanted to be able to get all consecutive numbers in the column one as well as the data followed in that row.
For ex: Col 1 ; Col 2 ; Col 3
1; 10; 5.6
2; 18; 6
3; 16; 8
10; 20; 7
11; 18; 6 , etc.
So it should save one file with all data beloning to (1, 2, 3) and another with (10,11). If column one is consecutive number then save the following data in one file. and do the similar for the rest consecutive numbers. So basically all consecutive numbers will be saved together with all the rows corresponding.

Answers (1)

Walter Roberson
Walter Roberson on 3 Feb 2021
mask = (diff(T.Col1) == 1).';
starts = strfind([false mask], [0 1])
stops = strfind([mask, false], [1 0])+1
Now starts(k):stops(k) is one run
Note that this version of the code assumes that there are no singles or that singles are not to be output.
  4 Comments
Prathamesh Halagi
Prathamesh Halagi on 5 Feb 2021
DO you think there is an option to use setdiff to understand the difference in the rows of one single column. If it is greater than 1 then save it to next file.?
Walter Roberson
Walter Roberson on 5 Feb 2021
No, setdiff() is not going to help you.
mask = (diff(T.Col1) == 1).';
starts = strfind([false mask], [0 1]);
stops = strfind([mask, false], [1 0])+1;
lastsingle = starts(1)-1;
for J = 1 : lastsingle
write out row J of the data separately to a file. ONLY that row.
end
for K = 1 : length(starts)
write out row starts(K):stops(K) of the data to a file as a group.
if K == length(starts)
lastsingle = number_of_rows_in_data
else
lastsingle = starts(K+1) - 1;
end
for J = stops(K)+1:lastsingle
write out row J of the data separately to a file. ONLY that row.
end
end
The algorithm here is to identify runs, which are stretches that start where the difference between rows becomes 1 and end when the difference stops becoming 1. Each run can be written as a group. Then between runs there might be any number of single values that are not part of a run, that have to be written to files individually (you never want to write two rows to the same file if they are not consecutive, so it follows you might end up writing single rows.)
Be sure to debug that the boundary conditions are correct.
Example:
2 22 1
5 55 2
11 11 3
12 12 3
13 13 3
15 15 4
17 17 5
18 18 5
19 19 5
23 23 6
The third number here reflects which file number to write to.
  • there may be singles before the first run of consecutive numbers
  • there may be singles between runs of consecutive numbers
  • there may be singles after the last run of consecutive numbers
I did ask about these possibilities, and you could have ruled them out and in so doing made the code easier, but instead you posted subsets of your data without refuting the possibilities of singles. Saying that your data looks like (small sample) is not sufficient to rule out other cases later in the data. I would recommend that in future if someone asks you whether your data has a particular characteristic, you answer directly Yes or No or I Don't Know, so that the person can give the code most appropriate for the situation.

Sign in to comment.

Categories

Find more on Variables in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!