Most efficient way to separate numerically labeled objects that share borders?

Consider the following image mask:
IM = [0 0 0 1 1 1 1
0 0 1 1 1 1 1
0 2 2 2 1 1 1
2 2 2 2 2 1 0
2 2 2 2 2 0 0];
Objects 1 and 2 share a diagonal border. If I converted this image as shown above into a logical and passed it to regionprops, it would detect only 1 (merged) object, not 2.
I'm looking for the most computationally efficient way to separate these objects. The end result would look something like this:
want = [0 0 0 1 1 1 1
0 0 0 0 1 1 1
0 2 2 0 0 1 1
2 2 2 2 0 0 0
2 2 2 2 2 0 0];
...where regionprops(logical(want)) would result in 2 objects being detected.
The reason I'm asking is that my actual masks are 25000 x 25000 in size, with about 100,000 objects in each. Separating them would be more efficient in terms of both storage and downstream processing. Also, I'm aware that separating these objects will erode them a bit, and a secondary goal is to minimize the amount of erosion and preserve their overall structures, as some of the actual objects can be as little as 10 pixels wide (they are narrow cells).
So far I've thought of some ways to do this, including brute-forcing it by systematically checking each pixel and converting it to a 0 if one of its neighbors is anything other than a 0 or itself, e.g:
IM2 = padarray(IM, [1 1], 0, 'both'); % make some space
for i = 2:size(IM2, 1)-1
for j = 2:size(IM2, 2)-1
% get that pixel and its neighbors
w = IM2(i-1:i+1, j-1:j+1);
% if bordering another object, turn it to 0
if any(~ismember(w(:), [0 IM2(i, j)]))
IM2(i, j) = 0;
end
end
end
out = IM2(2:end-1, 2:end-1); % remove padding
While this method technically works, it's time-consuming for a large image (about 6 hours per 25000 x 25000) and I have quite a few of these images...
I've also tried various other methods, including isolating individual objects, doing morphological erosion on each, and then mapping the results back on the original image, but this is turned out to be far worse than the brute-force method in terms of both computation time and structural preservation.
Does anyone here have experience with efficiently separating these kinds of objects, which are numerically labeled and could be adjacent to each other, such that they can be converted and stored as logical images without losing too much of their morphological properties?

2 Comments

bwlabel does not solve this issue because when it converts IM above to logical, the distinction between objects 1 and 2 is lost. They need to be separated before bwlabel can be called.

Sign in to comment.

Answers (2)

---- Update on 2023-04-10 ----
Mostly figured this out. I just had to isolate each object in its bounding box instead of running imdilate across the entire image N times (code updated below).
load('sample.mat') % IM = sample 1000x1000 image
figure, imshow(label2rgb(IM, 'parula', 'k')); title('Adjacent objects');
IM2 = padarray(IM, [1 1], 0, 'both');
% reorder objects 1,2,3... (can comment this out if they are already in order)
IM2 = uint32(categorical(IM2)); IM2 = IM2 - 1;
b = false(size(IM2)); % allocate master boundary mask
R = regionprops(IM2);
for i = 1:size(R, 1)
% isolate each object in its boundingbox
bb = R(i).BoundingBox;
obj = IM2(floor(bb(2)):ceil(bb(2)+bb(4)), floor(bb(1)):ceil(bb(1)+bb(3)));
% remove other objects in this boundingbox
obj(obj ~= i) = 0;
% determine object perim in its boundingbox
obj = imdilate(obj, strel('disk', 1)) - obj;
% locate corresponding bbox in b and append to it
b(floor(bb(2)):ceil(bb(2)+bb(4)), floor(bb(1)):ceil(bb(1)+bb(3))) = ...
b(floor(bb(2)):ceil(bb(2)+bb(4)), floor(bb(1)):ceil(bb(1)+bb(3))) | obj;
end
out = IM2; out(b) = 0; % remove all boundaries
out = out(2:end-1, 2:end-1); % remove padding
figure, imshow(label2rgb(out, 'parula', 'k')); title('Separated objects');
This method is really fast. On the original 25000 x 25000 image with 120,000 objects, it finishes in ~90 seconds (as opposed to my original method, which took 40+ hours).
The only issue now is what is the best algorithm to preserve as much of each object as possible (or as Image Analyst pointed out in a comment, how to determine boundaries of adjacent objects). The above code uses imdilate to define a border around each object as an efficient approximation. This will erode away some objects if they're too small, though I currently don't see a way around sacrificing some morphology.
---- Original idea ----
After giving this some more thought, I've come up with a faster solution, which is to just calculate and remove the perimeters of each object. Method 1 does this via bwperim, while method 2 uses a dilated object minus the original.
% u = nonzeros(unique(IM(:)));
% b = false(size(IM)); % allocate boundary mask
% for i = 1:size(u, 1)
% % isolate each object
% obj = IM == u(i);
% % method 1: find object inner perimeter and append to b
% b(bwperim(obj)) = true;
% % method 2: find object outer perimeter and append to b
% % b(logical(imdilate(obj, strel('disk', 1)) - obj)) = true;
% end
% out = IM;
% out(b) = 0; % split objects by setting their boundaries to 0
Method 1 takes ~6% of the time of my initial brute-force method, but indescriminately erodes every object by a pixel, which might be an issue for smaller objects.
Method 2 is a little slower (~14%), but is better at preserving small objects.
You don't need bwlabel, like KSSV suggested, because your matrix is already labeled. You can simply use
oneBlob = ismember(IM, numberYouWant);
I don't see why bwperim() would not preserve small objects. I guess I'd need to see an example of that to prove it, like
mask = false(4,4);
mask(2, 2:3) = true
mask = 4×4 logical array
0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
perims = bwperim(mask) % See? All preserved
perims = 4×4 logical array
0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0

8 Comments

Sorry, I think my problem statement wasn't clear: I'm trying to separate objects such that when the image is converted to logical, the "separation" between objects is preserved.
Yes, my objects are already labeled. The issue is that sometimes, there are no zeros separating objects 1 and 2, such as in this example:
IM = repmat([1 1 1 2 2 2 0 0 3 3 3], 4, 1) % 3 objects in this 'image'
IM = 4×11
1 1 1 2 2 2 0 0 3 3 3 1 1 1 2 2 2 0 0 3 3 3 1 1 1 2 2 2 0 0 3 3 3 1 1 1 2 2 2 0 0 3 3 3
If I save this as a logical, I lose the information that there are 3 distinct objects.
IMR = regionprops(logical(IM));
size(IMR) % detects only 2 objects because 1 and 2 are fused (bad!)
ans = 1×2
2 1
What I want is to "physically" separate each object, even if it means that I'll lose a little bit of object integrity.
want = repmat([1 1 0 2 2 2 0 0 3 3 3], 4, 1)
want = 4×11
1 1 0 2 2 2 0 0 3 3 3 1 1 0 2 2 2 0 0 3 3 3 1 1 0 2 2 2 0 0 3 3 3 1 1 0 2 2 2 0 0 3 3 3
wantR = regionprops(logical(want));
size(wantR) % detects 3 objects (good!)
ans = 1×2
3 1
The issue is how do I efficiently go from IM --> want. It doesn't have to be precisely what's shown, but I need at least a solid border of zeros between objects 1 and 2, kind of like watershedding, in a way.
You're correct that bwperim by itself doesn't erode objects. But what I'm doing is using the result of bwperim as a mask, and assigning that mask as 0 in IM in order to try to separate object 1 from 2. This by definition will erode every object by about a pixel around its perimeter, as shown below:
u = nonzeros(unique(IM(:)));
b = false(size(IM)); % allocate boundary mask
for i = 1:size(u, 1)
% isolate each object
obj = IM == u(i);
% method 1: find object inner perimeter and append to b
b(bwperim(obj)) = true;
end
out = IM;
out(b) = 0 % split objects by setting their boundaries to 0
out = 4×11
0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 2 0 0 0 0 3 0 0 1 0 0 2 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0
You'll notice that all 3 objects are a lot smaller in out than in want. In particular, object 3 didn't need to be eroded at all, because it was already separated from 1 and 2 in the original image. But as I mentioned, this isn't a huge deal if the objects are sufficiently large. For small objects like what is shown above though, this method could significantly alter their properties. It is, however, the fastest method I've figured so far.
Hope that clarifies things a little. If it would help, I've also attached a small sample of the image I'm working with. You'll find that there are 448 (already labeled) objects, but converting the image directly to binary returns only 108, due to objects being fused together. The above bwperim method returns 444 objects, which is not ideal, but probably good enough.
load('sample.mat'); % sample IM containing 448 distinctly labeled objects
length(nonzeros(unique(IM(:))))
ans = 448
size(regionprops(logical(IM))) % direct conversion to logical merges objects!
ans = 1×2
108 1
u = nonzeros(unique(IM(:)));
b = false(size(IM)); % allocate boundary mask
for i = 1:size(u, 1)
% isolate each object
obj = IM == u(i);
% method 1: find object inner perimeter and append to b
b(bwperim(obj)) = true;
end
out = IM;
out(b) = 0; % split objects by setting their boundaries to 0
size(regionprops(logical(out)))
ans = 1×2
444 1
If two regions are adjacent regionprops will give measurements for each assuming you pass it the labeled image. However when you take the labeled image and cast it to logical, all the labels are lost and you have only 0 and 1 so the distinction between blob #1 and blob #2 is lost. So, DON'T DO THAT! Here is fixed code:
IM = repmat([1 1 1 2 2 2 0 0 3 3 3], 4, 1) % 3 objects in this 'image'
IM = 4×11
1 1 1 2 2 2 0 0 3 3 3 1 1 1 2 2 2 0 0 3 3 3 1 1 1 2 2 2 0 0 3 3 3 1 1 1 2 2 2 0 0 3 3 3
% Wrong way:
props = regionprops(logical(IM)); % DO NOT cast a labeled image to binary
size(props) % detects only 2 objects because 1 and 2 are fused (bad!)
ans = 1×2
2 1
% Correct way:
props = regionprops(IM);
size(props) % detects all 3 objects even though 1 and 2 are fused/
ans = 1×2
3 1
I know that I can just pass IM directly to regionprops. The reasons that I want to forceably split up objects are: (1) My images are on the order of 25000 x 25000 with 100,000 objects each. This makes it unwieldy to store and share (currently as uint32) compared to a binary, and (2) Some downstream processes from collaborators assume a logical input mask, and I don't want to rewrite their code.
If two regions are adjacent, one with a lower number and one with a higher number, which pixel boundary do you want to set to zero?
Long story short: It doesn't matter as long as the objects are separated such that regionprops still recognizes they're distinct objects after image is binarized.
Ideally, there would be minimal disruption (i.e. minimize the difference between regionprops(IM) and regionprops(logical(IM))), but I accept that there will be somewhat arbitrary definitions of boundaries, particularly when they're not neat lines in cardinal directions, and that there will be some losses in object morphology.
As a side note: I will take another crack at this later, but my current guess is that my bwperim and imdilate methods are slow because they're unnecessarily passing over an entire image N times, even though each object in isolation only takes up a tiny fraction of the actual image. Maybe using the bounding box of each object (instead of obj = IM == u(i);) would speed it up....
Also, now that I think of it, maybe a better question title would have been: "How can I minimize the difference between regionprops(IM) and regionprops(logical(IM))?"
@Image Analyst Turns out calculating boundaries in individual bounding boxes resulted in a decently fast solution, and I have updated my initial Answer (feel free to take a look). If you have any ideas on optimizing "boundary detection", I'm happy to discuss, but thank you for your time regardless. I've been a long time lurker and appreciate your answers in other image processing threads as well.
Like I said, if you cast your labeled matrix to logical, then all your labeled blobs coalesce into one giant blob and regionprops will only give you measurements on the giant blob, not individual blobs.
I really don't know why you think you need to split apart the blobs to call regionprops rather than just put in IM directly and get the measurements for all the blobs at once. Not really sure what you want to measure but if you want the area and perimeter, for all hundred or so blobs, you can simply do this:
props = regionprops(IM, 'table', 'Area', 'Perimeter')
No need to split them apart and measure each one at a time.
As mentioned before, I want to save my labeled matrix as logical for storage and downstream processing reasons. Please trust me that my reasons for doing this are sound. I wouldn't go out of my way to do this if there were a simpler solution.
I'm fully aware of how regionprops works, and the issue has nothing to do with regionprops. I have been doing image processing and data analysis in Matlab for nearly 10 years, and I am 100% confident that regionprops alone will not give me the result I'm looking for.
I don't care about the perimeter length, I care about the perimeter pixels. You asked if there are adjacent regions with different labels, whose pixels do I set to zero? There is your answer. Now I need to do that 100,000+ times per image, and quickly.
If you're still unconvinced about my concepts or methods, I strongly recommend that you check out my updated official Answer, and compare the before and after images. I just processed one of my original images and reduced the size from 1.64 GB to 10.3 MB, while keeping more than 95% of objects intact. All done in 1.5 minutes.
Once again, thank you for your time. Attempting to explain my issue to you helped me to come up with a solution on my own.

Sign in to comment.

Asked:

on 10 Apr 2023

Commented:

on 11 Apr 2023

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!