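You can start by extracting the mel-frequency cepstral coefficients (MFCCs) for both audio inputs, sampled at a frequency of fs Hz. This gives you a matrix of size (a,b) containing the MFCCs for each audio signal, provided both signals have the same length; otherwise the number of frames will differ and you will need to trim the longer one.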
You can either keep the MFCCs as a 1-D vector of size (a*b,1) or as a 2-D matrix of size (a,b), depending on how you want to compute the correlation between them.
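As a rough sketch of these two steps, assuming the Audio Toolbox function mfcc is available: the noise signals, the 16 kHz rate, and the variable names below are placeholders for illustration, and with real recordings you would load the samples with audioread instead.

```matlab
% Sketch only: mfcc() requires Audio Toolbox. The noise signals below are
% placeholders standing in for two real recordings.
fs = 16000;                        % assumed sampling rate in Hz
x1 = randn(fs, 1);                 % placeholder signal 1 (one second, mono)
x2 = x1 + 0.1 * randn(fs, 1);      % placeholder signal 2 (noisy copy of x1)

c1 = mfcc(x1, fs);                 % frames-by-coefficients matrix
c2 = mfcc(x2, fs);

% Signals of different length give different frame counts,
% so trim both matrices to the shorter one.
n  = min(size(c1, 1), size(c2, 1));
c1 = c1(1:n, :);
c2 = c2(1:n, :);

% 1-D layout of size (a*b, 1); c1 and c2 are the 2-D layout of size (a, b).
v1 = c1(:);
v2 = c2(:);
```

The exact number of coefficient columns depends on the toolbox release and on the options passed to mfcc, so it is safer to read the size from the result than to hard-code it.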
Then you can proceed with computing the correlation using corr or corrcoef.
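A minimal sketch of the correlation step, using only corrcoef from base MATLAB. The matrices A and B are synthetic stand-ins for the two MFCC matrices, and the names r and rCol are chosen here for illustration. The first part treats each matrix as one long vector; the second computes one coefficient per MFCC column.

```matlab
% Synthetic stand-ins for two MFCC matrices of size (a, b).
rng(0);                            % fixed seed so the run is repeatable
A = randn(50, 13);
B = 0.8 * A + 0.2 * randn(50, 13); % B is strongly related to A

% Option 1: flatten to (a*b, 1) and get a single correlation coefficient.
R = corrcoef(A(:), B(:));          % 2-by-2 matrix
r = R(1, 2);                       % off-diagonal entry is the coefficient

% Option 2: keep the (a, b) layout and correlate column by column.
rCol = zeros(1, size(A, 2));
for k = 1:size(A, 2)
    Rk = corrcoef(A(:, k), B(:, k));
    rCol(k) = Rk(1, 2);
end
```

If you have the Statistics and Machine Learning Toolbox, corr(A, B) returns the full b-by-b matrix of pairwise column correlations, and its diagonal should match rCol.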
Please have a look at this thread if you want to compute the correlation between two matrices.