FILTER BIG DATA SET
Dear All,
I have survey data for six years, each year containing 26 variables and more than 2 million rows [2533835x28 table].
I would like to filter the entire dataset, keeping only the rows where the variable (c33) takes one of the values
 [1111    1112    1113    1114    1115    1116    1117    1118    1119    1121    1122    1123    1124    1131    1132    1133    1134    1135    1136    1137    1139    1140    1150    1161    1162    1169    1191    1193    1199    1210    1221    1222    1223    1224    1225    1229    1231    1232    1233    1239    1241    1242    1243    1249    1251    1252    1259    1261    1262    1269    1271    1272    1273    1279    1281    1282    1283    1284    1285    1286    1287    1291    1292    1293    1299    1301    1302    1309    1411    1412    1413    1420    1430    1441    1442    1450    1461    1462    1463    1491    1492    1493    1499    1500    1611    1612    1619    1620    1631    1632    1633    1639    1640    1700    2101    2102    2109    2201    2202    2203    2209    2301    2302    2303    2309    2401    2402    3111    3112    3113    3121    3122    3211    3212    3214    3215    3219    3221    3222    3223    3229 ].
Can anyone guide me on how to filter the data?
2 Comments
  KSSV on 30 Aug 2024
What do you mean by filter the data? You may use logical indexing with operators like ==, >, < etc.
Answers (2)
  Star Strider on 30 Aug 2024
Your question is a bit ambiguous.
If you want to match the elements of the data you posted to elements of your matrix, one option is to use the ismember function. The values all appear to be integers; if they are actually floating-point numbers instead, use ismembertol with a similar calling syntax.
Try something like this:
V =  [1111    1112    1113    1114    1115    1116    1117    1118    1119    1121    1122    1123    1124    1131    1132    1133    1134    1135    1136    1137    1139    1140    1150    1161    1162    1169    1191    1193    1199    1210    1221    1222    1223    1224    1225    1229    1231    1232    1233    1239    1241    1242    1243    1249    1251    1252    1259    1261    1262    1269    1271    1272    1273    1279    1281    1282    1283    1284    1285    1286    1287    1291    1292    1293    1299    1301    1302    1309    1411    1412    1413    1420    1430    1441    1442    1450    1461    1462    1463    1491    1492    1493    1499    1500    1611    1612    1619    1620    1631    1632    1633    1639    1640    1700    2101    2102    2109    2201    2202    2203    2209    2301    2302    2303    2309    2401    2402    3111    3112    3113    3121    3122    3211    3212    3214    3215    3219    3221    3222    3223    3229 ];
size(V)
A = array2table(randi([1000 3300], 10, 12))                         % Create Data (Matrix Of Random Integers)
Aa = table2array(A);
Lm = ismember(Aa, V)                                                % Logical Matrix Of Locations
[r,c] = find(Lm);                                                   % Return Numeric Indices
rc = [r c]                                                          % Row & Column Indices Of Matching Values
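
The approach above locates matches anywhere in a matrix. If the goal is instead to keep whole table rows whose c33 value is in the list, a minimal sketch could look like the following. The table T, its second variable, and the shortened code list are hypothetical stand-ins; only the variable name c33 comes from the question.

```matlab
% Hypothetical stand-in for the survey table (only the name c33 is from the question)
T = table([1111; 5000; 2101; 1000; 3229; 1112], (1:6)', ...
    'VariableNames', {'c33', 'other'});

codes = [1111 1112 1113 2101 3229];     % shortened list; use the full vector V in practice

keep = ismember(T.c33, codes);          % logical column, true where c33 is one of the codes
Tfiltered = T(keep, :);                 % keep the matching rows with all variables
```

If c33 holds floating-point values that are only approximately equal to the codes, ismembertol(T.c33, codes) could be substituted for the ismember call.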
  Subhajyoti on 30 Aug 2024 (Edited: Subhajyoti on 30 Aug 2024)
To filter your dataset in MATLAB based on specific entries in a particular variable, you can use logical indexing.
In the following code, I generate a dummy table and then filter it on numeric and string variables.
% Create a data table of size num_rows x 4
% Columns x1, x2, x3, x4 of data type string, double, double, logical
num_rows = 10000000;
t = table;
% random data
x1 = string(randi([1, 10], num_rows, 1));
x2 = randi([1, 10], num_rows, 1);
x3 = randi([1, 10], num_rows, 1);
x4 = randi([0, 1], num_rows, 1);
% assign data to table
t.x1 = x1;
t.x2 = x2;
t.x3 = x3;
t.x4 = logical(x4);
- Use logical indexing to filter data where 'x3' is less than 5:
    tic
    %---------------------------------------%
    filtered_data1 = t(t.x3 < 5, :);
    %---------------------------------------%
    elapsed1 = toc;   % read the timer once so the reported value matches the filtering step
    disp("Time taken to filter data using logical indexing: " + elapsed1 + " seconds")
- Use 'ismember' to keep rows whose 'x1' value is a member of a given array:
    tic
    %---------------------------------------%
    filterValues = ["1", "2", "3", "4", "5"];
    filtered_data2 = t(ismember(t.x1, filterValues), :);
    %---------------------------------------%
    elapsed2 = toc;   % read the timer once so the reported value matches the filtering step
    disp("Time taken to filter data using ismember: " + elapsed2 + " seconds")
- For complex numeric conditions, converting the variable to an array with the 'table2array' function can sometimes speed up operations:
    x3Array = table2array(t(:,'x3'));
    % Check if data-squared is less than 27
    mask = x3Array.^2 < 27;   % named 'mask' to avoid shadowing the built-in 'filter' function
    filtered_data5 = t(mask, :);
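
Since the question mentions six years of data, the same 'ismember' filter could be applied to each year in turn and the results stacked. This is only a sketch under assumptions: the cell array of yearly tables, the variable 'x', and the shortened code list are hypothetical, and the yearly tables are assumed to share the same variable names.

```matlab
% Hypothetical yearly tables; in practice these would be the six survey tables
yearTables = {table([1111; 9999; 2101], [1; 2; 3], 'VariableNames', {'c33', 'x'}), ...
              table([3229; 1000],       [4; 5],    'VariableNames', {'c33', 'x'})};
codes = [1111 2101 3229];                 % shortened code list

filtered = cell(size(yearTables));
for k = 1:numel(yearTables)
    Tk = yearTables{k};
    filtered{k} = Tk(ismember(Tk.c33, codes), :);   % keep matching rows of year k
end
allYears = vertcat(filtered{:});          % stack the filtered years into one table
```

Filtering each year before concatenating keeps the intermediate tables small, which may matter with more than 2 million rows per year.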
You may go through the following MathWorks documentation links to learn more about ‘table’ in MATLAB: 
- ‘table’: https://www.mathworks.com/help/matlab/ref/table.html
- Access Data in Tables: https://www.mathworks.com/help/matlab/matlab_prog/access-data-in-a-table.html
- Filtering Elements: https://www.mathworks.com/help/matlab/matlab_prog/find-array-elements-that-meet-a-condition.html
 I hope this helps.