join
(Not Recommended) Merge dataset array observations
The dataset
data type is not recommended. To work with heterogeneous data,
use the MATLAB®
table
data type instead. See MATLAB
table
documentation for more information.
Syntax
C = join(A,B)
C = join(A,B,keys)
C = join(A,B,param1
,val1
,param2
,val2
,...)
[C,IB] = join(...)
C = join(A,B,'Type',TYPE,...)
C = join(A,B,'Type',TYPE,'MergeKeys',true,...)
[C,IA,IB] = join(A,B,'Type',TYPE,...)
Description
C = join(A,B)
creates a dataset array
C
by merging observations from the two dataset arrays
A
and B
. join
performs
the merge by first finding key variables, that is, pairs of dataset
variables, one in A
and one in B
, that share the
same name. Each observation in B
must contain a unique combination of
values in the key variables, and must contain all combinations of values that are
present in the keys from A
. join
then uses these
key variables to define a many-to-one correspondence between observations in
A
and those in B
. join
uses this correspondence to replicate the observations in B
and
combine them with the observations in A
to create
C
.
C = join(A,B,keys)
performs the merge using
the variables specified by keys
as the key variables in both
A
and B
. keys
is a positive
integer, a vector of positive integers, a character vector, a string array, a cell array
of character vectors, or a logical vector.
C
contains one observation for each observation in
A
. Variables in C
include all of the variables
from A
, as well as one variable corresponding to each variable in
B
(except for the keys from B
). If
A
and B
contain variables with identical
names, join
adds the suffix '_left'
and
'_right'
to the corresponding variables in
C
.
C = join(A,B,
specifies optional parameter name/value pairs to control how the dataset variables in
param1
,val1
,param2
,val2
,...)A
and B
are used in the merge. Parameters
are:
'Keys'
— Specifies the variables to use as keys in bothA
andB
.'LeftKeys'
— Specifies the variables to use as keys inA
.'RightKeys'
— Specifies the variables to use as keys inB
.
You may provide either the 'Keys'
parameter, or both the
'LeftKeys'
and 'RightKeys'
parameters. The
value for these parameters is a positive integer, a vector of positive integers, a
character vector, a string array, a cell array of character vectors, or a logical
vector. 'LeftKeys'
or 'RightKeys'
must both
specify the same number of key variables, and join
pairs the left and
right keys in the order specified.
'LeftVars'
— Specifies which variables fromA
to include inC
. By default,join
includes all variables fromA
.'RightVars'
— Specifies which variables fromB
to include inC
. By default,join
includes all variables fromB
except the key variables.
You can use 'LeftVars'
or 'RightVars'
to include
or exclude key variables as well as data variables. The value for these parameters is a
positive integer, a vector of positive integers, a character vector, a string array, a
cell array of character vectors, or a logical vector.
[C,IB] = join(...)
returns an index vector
IB
, where join
constructs C
by horizontally concatenating A(:,LeftVars)
and
B(IB,RightVars)
. join
can also perform more
complicated inner and outer join operations that allow a many-to-many correspondence
between A
and B
, and allow unmatched observations
in either A
or B
.
C = join(A,B,'Type',TYPE,...)
performs the
join operation specified by TYPE
. TYPE
is one of
'inner'
, 'leftouter'
,
'rightouter'
, 'fullouter'
, or
'outer'
(which is a synonym for 'fullouter'
).
For an inner join, C
only contains observations corresponding to a
combination of key values that occurred in both A
and
B
. For a left (or right) outer join, C
also
contains observations corresponding to keys in A
(or
B
) that did not match any in B
(or
A
). Variables in C
taken from
A
(or B
) contain null values in those
observations. A full outer join is equivalent to a left and right outer join.
C
contains variables corresponding to the key variables from both
A
and B
, and join
sorts the
observations in C
by the key values.
For inner and outer joins, C
contains variables corresponding to
the key variables from both A
and B
by default, as
well as all the remaining variables. join
sorts the observations in
the result C
by the key values.
C = join(A,B,'Type',TYPE,'MergeKeys',true,...)
includes a single
variable in C
for each key variable pair from A
and B
, rather than including two separate variables. For outer joins,
join
creates the single variable by merging the key values from
A
and B
, taking values from
A
where a corresponding observation exists in
A
, and from B
otherwise. Setting the
'MergeKeys'
parameter to true
overrides
inclusion or exclusion of any key variables specified via the
'LeftVars'
or 'RightVars'
parameter. Setting
the 'MergeKeys'
parameter to false
is equivalent
to not passing in the 'MergeKeys'
parameter.
[C,IA,IB] = join(A,B,'Type',TYPE,...)
returns
index vectors IA
and IB
indicating the
correspondence between observations in C
and those in
A
and B
. For an inner join,
join
constructs C
by horizontally
concatenating A(IA,LeftVars)
and B(IB,RightVars)
.
For an outer join, IA
or IB
may also contain
zeros, indicating the observations in C
that do not correspond to
observations in A
or B
, respectively.
Examples
Create a dataset array from Fisher's iris data:
load fisheriris NumObs = size(meas,1); NameObs = strcat({'Obs'},num2str((1:NumObs)','%-d')); iris = dataset({nominal(species),'species'},... {meas,'SL','SW','PL','PW'},... 'ObsNames',NameObs);
Create a separate dataset array with the diploid chromosome counts for each species of iris:
snames = nominal({'setosa';'versicolor';'virginica'}); CC = dataset({snames,'species'},{[38;108;70],'cc'}) CC = species cc setosa 38 versicolor 108 virginica 70
Broadcast the data in CC
to the rows of iris
using the key variable species
in each dataset:
iris2 = join(iris,CC); iris2([1 2 51 52 101 102],:) ans = species SL SW PL PW cc Obs1 setosa 5.1 3.5 1.4 0.2 38 Obs2 setosa 4.9 3 1.4 0.2 38 Obs51 versicolor 7 3.2 4.7 1.4 108 Obs52 versicolor 6.4 3.2 4.5 1.5 108 Obs101 virginica 6.3 3.3 6 2.5 70 Obs102 virginica 5.8 2.7 5.1 1.9 70
Create two datasets and join them using the 'MergeKeys'
flag:
% Create two data sets that both contain the key variable % 'Key1'. The two arrays contain observations with common % values of Key1, but each array also contains observations % with values of Key1 not present in the other. a = dataset({'a' 'b' 'c' 'e' 'h'}',[1 2 3 11 17]',... 'VarNames',{'Key1' 'Var1'}) b = dataset({'a' 'b' 'd' 'e'}',[4 5 6 7]',... 'VarNames',{'Key1' 'Var2'}) % Combine a and b with an outer join, which matches up % observations with common key values, but also retains % observations whose key values don't have a match. % Keep the key values as separate variables in the result. couter = join(a,b,'key','Key1','Type','outer') % Join a and b, merging the key values as a single variable % in the result. coutermerge = join(a,b,'key','Key1','Type','outer',... 'MergeKeys',true) % Join a and b, retaining only observations whose key % values match. cinner = join(a,b,'key','Key1','Type','inner',... 'MergeKeys',true) a = Key1 Var1 'a' 1 'b' 2 'c' 3 'e' 11 'h' 17 b = Key1 Var2 'a' 4 'b' 5 'd' 6 'e' 7 couter = Key1_left Var1 Key1_right Var2 'a' 1 'a' 4 'b' 2 'b' 5 'c' 3 '' NaN '' NaN 'd' 6 'e' 11 'e' 7 'h' 17 '' NaN coutermerge = Key1 Var1 Var2 'a' 1 4 'b' 2 5 'c' 3 NaN 'd' NaN 6 'e' 11 7 'h' 17 NaN cinner = Key1 Var1 Var2 'a' 1 4 'b' 2 5 'e' 11 7