count the change between a row and the next, in a column

I have a series of data in a column, and I want to count how many times the value changes between one row and the next (my column is a categorical column). I have put an example below:
Here, I have 4 changes. My timetable is 78397x5. How can I do this? Thanks in advance for the help.

Accepted Answer

Paolo on 28 Jun 2018
Edited: Paolo on 28 Jun 2018
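For a categorical array c whose category names end in a number (for example dbuid-20):
chararr = cellstr(c); % categorical -> cell array of character vectors
% Extract the trailing digits of each category name
num = cellfun(@(x) regexp(x,'(\d)*$','match','once'), chararr,'un',0);
% Count the rows whose number differs from the previous row
nchange = nnz(diff(str2double(num)));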
  7 Comments
Adam Danz on 29 Jun 2018
Edited: Adam Danz on 29 Jun 2018
Erica, what didn't work with my solution? Do you have access to grp2idx()?
I updated my solution to use your tab_night data as input rather than my fake data (the algorithm didn't change). I also added the solution to count the number of changes from _20 to _17. Note that this solution should work on any and all category types and does not depend on the naming of the categories.
For your data, there were 12468 total changes in category and there were 927 changes from _20 to _17 (which is a lot of trips to the bathroom).


More Answers (2)

Adam Danz on 28 Jun 2018
Edited: Adam Danz on 29 Jun 2018
One solution is to convert your categorical data to numeric group indices using grp2idx(), if you have the Statistics and Machine Learning Toolbox. You can then take the difference between consecutive elements of the numeric array and count how many times that difference is nonzero (i.e., how many times there is a change). Note that this solution will work with any category names. Using the data you attached, here's how to count the number of changes:
% Copy list of all categories
cats = tab_night{:,1};
% Convert to numerical values
n = grp2idx(cats);
% Alternatively, as @Guillaume pointed out: n = double(cats);
% Now take the difference between consecutive rows and count the nonzero entries
count = sum(diff(n)~=0);
If you want to know the index value where the change occurs, these are the row numbers:
find(diff(n)~=0)+1
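As a small worked illustration of why the +1 is needed, here is a sketch on a hypothetical six-row column (the category names 'a', 'b', 'c' are made up and are not from the attached data). diff(n) has one element fewer than n, and its k-th element compares rows k and k+1, so a nonzero entry at position k means the new category starts at row k+1.

```matlab
% Hypothetical 6-row categorical column (not the poster's data)
cats = categorical({'a'; 'a'; 'b'; 'b'; 'a'; 'c'});
% Category codes; uses double() so no toolbox is required
n = double(cats);               % [1; 1; 2; 2; 1; 3]
% Difference between each row and the previous one
d = diff(n);                    % [0; 1; 0; -1; 2]
% Number of changes
count = sum(d ~= 0);            % 3
% Rows at which a new category begins
changeRows = find(d ~= 0) + 1;  % [3; 5; 6]
```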
If you want to count the number of times the category changes from ..._20 to ..._17 here's how to do that:
% Count changes from dbuid-20 to dbuid-17
% First we need to find which values of 'n' correspond to the 2 categories requested
dbuid17 = n(find(cats == 'dbuid-17',1)); %=4
dbuid20 = n(find(cats == 'dbuid-20',1)); %=7
% Now we need to count the number of times n changes from 7 to 4.
% 4 minus 7 is -3, so if we take the difference of n we just need to count the number of '-3's
% that coincide with a change to category _17.
nDiff = diff(n);
changIdx = nDiff==(dbuid17-dbuid20) & n(2:end)==dbuid17;
nChanges = sum(changIdx);
Finally, to confirm that everything worked well, use the 'changIdx' to look at which categories were flagged as a change from _20 to _17:
cats(find(changIdx)) %Should all be dbuid-20
cats(find(changIdx)+1) %should all be dbuid-17
If you don't have access to the grp2idx function this won't work. Otherwise, it works for all category types.
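If grp2idx is not available, a possible alternative is to compare each element with its successor directly, since categorical arrays support == and ~=. The following is only a sketch on a made-up demo column; with the real data, cats would be tab_night{:,1} as above. The variable names prev, next, nTotal and n20to17 are illustrative.

```matlab
% Hypothetical demo column (not the poster's data)
cats = categorical({'dbuid-20'; 'dbuid-20'; 'dbuid-17'; ...
                    'dbuid-05'; 'dbuid-20'; 'dbuid-17'});
% Each row paired with the row that follows it
prev = cats(1:end-1);
next = cats(2:end);
% Total number of changes between consecutive rows
nTotal = nnz(prev ~= next);
% Number of changes specifically from dbuid-20 to dbuid-17
n20to17 = nnz(prev == 'dbuid-20' & next == 'dbuid-17');
```

Because the transition test names both categories explicitly, it does not rely on the numeric codes that the categories happen to receive.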
  4 Comments
Adam Danz on 29 Jun 2018
Edited: Adam Danz on 29 Jun 2018
Hi Erica, you've got a nice chunk of work to do on this and it's an interesting project so I hope you succeed. I'll break down the steps you need to do but there will inevitably be additional bumps along the way.
  1. You need to determine when someone is in room X. You should be able to use my code to get you started on that. For example, you can identify any time someone is in the room 'dbuid20' with the logical vector created by n==dbuid20. This will give you a sequence such as [0 0 0 1 1 1 1 1 0 0 0 0 ....] where the 1s indicate presence in the room.
  2. Using that logical vector, you need to figure out where each run of 1s starts and ends. For example, in [0 0 0 1 1 1 1 0 0 0 1 1 0 0] the runs of ones start at [4, 11] and end at [7, 12]. Hint: if the logical vector is named 'inRoomX', you can use the output of diff(inRoomX), which will have a 1 for entering the room and a -1 for leaving it. However, you may need to clean up the data in cases where there is only 1 sample within the room (i.e., the visit was <=3 seconds).
  3. What you ultimately want is two logical vectors: one that marks room entry and one that marks room exit. Once you have those, you can pull the corresponding time stamps from your data and calculate the amount of time in the room. The example below shows the idea with fake entry and exit vectors.
% Create fake enterRoom and exitRoom logical vectors. In each of them, true marks an entry or an exit.
% Each fake entry (rows 1, 1001, ...) comes before its matching exit (rows 400, 1400, ...).
enterRoom = false(size(tab_night.t));
enterRoom(1:1000:50001) = true;
exitRoom = false(size(tab_night.t));
exitRoom(400:1000:50400) = true;
% Get enter times
EnterTime = tab_night.t(enterRoom);
% Get exit times
ExitTime = tab_night.t(exitRoom);
% Subtract the two, assuming that each exit time comes after its enter time
% 'timeInRoom' should all be positive!! If not, entries and exits are mismatched.
timeInRoom = ExitTime - EnterTime;
% How much total time in room?
sum(timeInRoom)
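Steps 1 and 2 above could be sketched roughly as follows. This is only an illustration under assumptions: inRoomX is the made-up vector from step 2, t is a hypothetical time vector sampled every 3 seconds (the real time stamps would come from tab_night.t), and the names startIdx, endIdx, visitLen and keep are invented here. Padding the vector with zeros at both ends is one way to make sure a visit that touches the first or last row still gets both a start and an end.

```matlab
% Hypothetical presence vector from step 2 (true = in the room)
inRoomX = logical([0 0 0 1 1 1 1 0 0 0 1 1 0 0])';
% Hypothetical time stamps, assuming one sample every 3 seconds
t = seconds(3 * (0:numel(inRoomX)-1))';

% Pad with zeros so runs touching either end are detected too
d = diff([0; inRoomX; 0]);
startIdx = find(d == 1);        % first sample of each visit: [4; 11]
endIdx   = find(d == -1) - 1;   % last sample of each visit:  [7; 12]

% Optionally drop visits that contain only a single sample
visitLen = endIdx - startIdx + 1;
keep = visitLen > 1;
startIdx = startIdx(keep);
endIdx = endIdx(keep);

% Time between the first and last sample of each visit, and the total
timeInRoom = t(endIdx) - t(startIdx);
totalTime = sum(timeInRoom);
```

Logical entry and exit vectors, as described in step 3, could then be built by setting the elements at startIdx and endIdx to true.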



Guillaume on 29 Jun 2018
Edited: Guillaume on 29 Jun 2018
Hum, unless I'm very mistaken, wouldn't the simplest solution be to convert the array to double, then nnz(diff()) that?
%build demo array
c = categorical({'dbuid-20'; 'dbuid-17'});
c = c(repelem([1 2 1 2 1], [7 3 5 3 1]))
%result
ndiff = nnz(diff(double(c))) %all done!
No need for the Statistics Toolbox, and it works with all categorical arrays.
  1 Comment
Adam Danz on 29 Jun 2018
Yes, double(c) is simpler than grp2idx(c) and produces the same results. Good call.

