Remove rows in an array containing a non-matching element

I have a datafile data.txt:
gene12 489 483 838
gene82 488 763 920
gene31 974 837 198
gene45 489 101 378
gene59 89 827 138
I have another data file, genelist.txt, that lists only the genes I'm interested in for my study:
gene45
gene59
gene61
I want to modify the first dataset by removing all rows where the gene isn't found in the second list, so that I end up with this array:
gene45 489 101 378
gene59 89 827 138
How do I go about doing this?

Accepted Answer

Guillaume on 11 Apr 2017
Probably the easiest:
geneswithdata = readtable('data.txt'); %load file as a table
geneswithdata.Properties.VariableNames{1} = 'genes'; %rename first column for clarity (optional).
%I would also rename all the other columns
genesonly = readtable('genelist.txt'); %load as a table
genesonly.Properties.VariableNames = {'genes'}; %rename columns. Common columns must have the same name
filteredgenes = innerjoin(genesonly, geneswithdata);
Done.
Using ismember, that last line could be done as:
found = ismember(geneswithdata.genes, genesonly.genes); %compare the gene columns only; the two tables have different variables
filteredgenes = geneswithdata(found, :);
Using intersect (rather than setdiff), it could be done as:
[~, tokeep] = intersect(geneswithdata.genes, genesonly.genes); %tokeep indexes rows of geneswithdata
filteredgenes = geneswithdata(tokeep, :);
  3 Comments
Guillaume on 12 Apr 2017
By default, readtable treats the first line as a header line and uses it to name the variables. To tell it not to do that:
readtable(___, 'ReadVariableNames', false)
readtable is extremely flexible. Look at its documentation to see all the options available.
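For illustration, here is a minimal sketch of the whole workflow with that option applied. It assumes the files are space-delimited exactly as shown in the question, so the 'Delimiter' setting and the temporary-folder file names are assumptions rather than part of the original answer. The sample files are written first only so that the sketch runs on its own.

```matlab
% Write the sample files from the question to a temporary folder
% (only so this sketch is self-contained; skip if your files already exist)
datafile = fullfile(tempdir, 'data.txt');
listfile = fullfile(tempdir, 'genelist.txt');
fid = fopen(datafile, 'w');
fprintf(fid, 'gene12 489 483 838\ngene82 488 763 920\ngene31 974 837 198\n');
fprintf(fid, 'gene45 489 101 378\ngene59 89 827 138\n');
fclose(fid);
fid = fopen(listfile, 'w');
fprintf(fid, 'gene45\ngene59\ngene61\n');
fclose(fid);

% Read both files without treating the first row as a header.
% The space delimiter is an assumption based on the sample data.
geneswithdata = readtable(datafile, 'ReadVariableNames', false, 'Delimiter', ' ');
genesonly = readtable(listfile, 'ReadVariableNames', false, 'Delimiter', ' ');

% Without a header row the columns are named Var1, Var2, ...;
% rename the key column in both tables
geneswithdata.Properties.VariableNames{1} = 'genes';
genesonly.Properties.VariableNames{1} = 'genes';

% Keep only the rows whose gene appears in the list
found = ismember(geneswithdata.genes, genesonly.genes);
filteredgenes = geneswithdata(found, :);
disp(filteredgenes)
```

gene61 is in the list but not in the data, so it simply does not appear in the result.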


More Answers (1)

Image Analyst on 11 Apr 2017
Look into ismember() or setdiff()
  1 Comment
astein on 11 Apr 2017
Edited: astein on 11 Apr 2017
I don't know how to use either for this purpose. Isn't setdiff() going to give me the genes they don't have in common? I want the genes they do have in common. ismember() gives me a logical array, but then I run into the same issue: how do I use that array to pull out only the rows that are "true"? I am also having difficulty manipulating the datasets, because I'm not sure which format to load the txt files into (structure, table, etc.).
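As a rough sketch of how the logical array from ismember() can be used, the snippet below builds the question's sample data in memory (the variable names are made up for illustration) and uses the logical vector as a row index. It also shows that setdiff() returns the opposite set, the genes that are not in the list.

```matlab
% Sample data from the question, built in memory (no files needed)
genes = {'gene12'; 'gene82'; 'gene31'; 'gene45'; 'gene59'};
values = [489 483 838; 488 763 920; 974 837 198; 489 101 378; 89 827 138];
wanted = {'gene45'; 'gene59'; 'gene61'};

% ismember returns one logical per element of genes: true if it is in wanted
keep = ismember(genes, wanted);

% Use the logical vector as a row index to keep only the matching rows
keptgenes = genes(keep);
keptvalues = values(keep, :);

% setdiff gives the opposite set: genes in the data that are not wanted
dropped = setdiff(genes, wanted);
```

The same indexing pattern works on a table: T(keep, :) keeps the rows of T where keep is true.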

