Combing information from 4 columns to create a new column
Show older comments

Hello,
I am SUPER new to Matlab (I have done an online fundamentals course but that is it). And I really would like to use Matlab to pre-process a spreadsheet before analysis (I could do this manually, but want to practice Matlab so thought I would give it a go). And I have hit a wall.
So, what I have is 4 columns that I need to merge together. The four columns all pertain to the same question, but contain the answer that a participant gave. For example, the question is which disorder do you have. If the individual has autism, ‘Autism’ will be stated in column 1 (column 2 is ADHD, column 3 is I don’t know ect). If they didn’t have that condition it will be blank (Matlab has assigned NaN). I have 600 rows of participants (so you can see if I were to do this manually it will take a very long time).
Ultimately, I want one column that has the disorder belonging to each participant.
Now, I thought that I could use an if function to do this. In words it would be, “For row 1, find the value not equal to NaN and insert into a new column – repeat for rows 1:600”. Now, the thing is, I don’t have any idea what functions I would use. I was thinking something like a if command but I’m not quite sure.
Please let me know if my explanation isn't clear.
Thank you soooo much!!
1 Comment
dpb
on 24 Mar 2018
Again, it is much easier to help if you attach actual data instead of pictures; we can't do anything with that...doesn't need to be more than representative sample as shown but does need to be either the actual text or a attach a file containing the data.
Answers (1)
Presuming it is in a spreadsheet as shown above, read the file as text; then the empty cells will remain empty instead of being NaN. Then it's easy-enough to pick out the non-empty cells with logical addressing--
I did make a small sample without the header lines--
>> [~,t,r]=xlsread('laurel.xlsx')
t =
6×4 cell array
'NaN' '' '' ''
'' '' '' 'No''
'' '' '' 'No''
'' '' 'Don't know' ''
'' '' '' ''
'' '' '' 'No'
r =
6×4 cell array
'NaN' [NaN] [ NaN] [NaN]
[NaN] [NaN] [ NaN] 'No''
[NaN] [NaN] [ NaN] 'No''
[NaN] [NaN] 'Don't know' [NaN]
[NaN] [NaN] [ NaN] [NaN]
[NaN] [NaN] [ NaN] 'No'
>>
Notice the difference between the two variables; t is the input as text and the r is the "raw" data in the spreadsheet which gets converted to cell array of the underlying type so has both a text and a numeric NaN both of which tend to be converted to numeric form whereas as a cell string the content isn't anything but a string...so we'll use it; just showed the other for comparison/pedgogical purposes here.
>> ix=~cellfun(@isempty,t) % which cells aren't empty???
ix =
6×4 logical array
1 0 0 0
0 0 0 1
0 0 0 1
0 0 1 0
0 0 0 0
0 0 0 1
>> t(ix) % pick those out
ans =
5×1 cell array
'NaN'
'Don't know'
'No''
'No''
'No'
>>
and Voila! what you're looking for. NB: a possible issue, though. There's one line that has nothing in any of the cells; that line ends up getting eliminated entirely. If that's a real case and there's an observation of that kind, you'll have to read in the ID column as well so that every record will be retained and so you get the results back keeping the information intact and associated with the correct individual.
2 Comments
OP's Answer follow-up moved to comment--dpb
Ah brilliant! So I have got as far as you have said that that works perfectly. But I do have that issue of the missing values, I need to make sure participant has their response (even if they jest left it blank). I am unsure of how to factor in that ID column.
I also realise that I have a few individuals who have two conditions - Do you think there would be a way adding two conditions together into this new column?
I will add a sample of my data shortly....
Thanks so much for the help!
dpb
on 25 Mar 2018
I'm sure you could; which two of the six would you combine, though?
I've no idea what analysis you want to do on the data but I'd think combining fields would compromise that ability as well as not make it nearly as easy to find those with a given condition or combination of conditions. In general, data fields should only contain one specific data coding, not combinatorial.
In fact, I'm not sure keeping the original but incorporating names for each column and converting to categorical variables and normalizing the responses to specific codings wouldn't be all you'd really want to do here before making any analyses.
If you were to do that, the issue of no responses for a respondent would go away; it's easy enough to deal with that if must, but looking at the data a little more makes me wonder if you wouldn't be better of with a different tack...
Categories
Find more on Univariate Discrete Distributions in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!