Remove specific data sequence when reading .bin file

How do I remove/skip the specific data sequence [1024 0 2240 -24500] when reading the .bin file in batch? thanks!
filename='rf.bin'; % filename='./rf.bin';
fid=fopen(filename,'r');
dataHeader=fread(fid,123,'uint8'); % skipping the header for .bin
NsperBatch = 1e3; % number of complex samples per batch
K=100; % average every K sets of values, K=100 in this case
magSpectrumMat=[];
while ~feof(fid)
    magSpectrum=0;
    for k=1:K
        data=fread(fid,NsperBatch*2,'int16','b');
        dataIQ=data(1:2:end)+1i*data(2:2:end);
        dataSpectrum=fftshift(fft(dataIQ));
        magSpectrum=magSpectrum+abs(dataSpectrum).^2;
    end
    magSpectrum = magSpectrum/K;
    magSpectrumMat = [magSpectrumMat magSpectrum];
end
magSpectrumMat_dB=pow2db(magSpectrumMat);
  3 Comments
Ivy Chen on 12 Oct 2017
Edited: Ivy Chen on 12 Oct 2017
Yes, it can occur anywhere. What you described above is the preferred approach: drop the [1024 0 2240 -24500] data sequence and read four more values into the block. That keeps the calculation and the matrix clean. Thanks for the help.
Walter Roberson on 12 Oct 2017
Is there any possibility that it could occur inside the 123 byte header? Is there any possibility it could start inside the 123 byte header but end outside the header?


Accepted Answer

Walter Roberson on 12 Oct 2017
Okay, here it is, with skips accounted for, and with automatic padding in case the data is the wrong size.
As you indicated the words to skip could occur "anywhere" after I asked about that, I assumed that the words to skip might even occur during that 123 byte header.
I assumed that if the last batch contained fewer than K sets, you wanted the mean of whatever was available in that partial batch rather than a division by K specifically.
I vectorized a lot of the computation.
I was not completely sure of the order of data you wanted to output. I think your existing code is putting the results for averaging into column vectors in a matrix; that is the output format I create here.
The below is not tested as I do not happen to have your data file.
NsperBatch = 1e3;
header_size = 123;
K=100; % Average every K set of values, K=100 in this case
pattern_to_skip = int16([1024 0 2240 -24500]); %magic sequence of words to ignore
filename = 'rf.bin'; % filename='./rf.bin';
pattern_to_skip = typecast( swapbytes(pattern_to_skip), 'uint8'); %big endian
PL = length(pattern_to_skip);
fid = fopen(filename,'r');
bytes = reshape( fread(fid, inf, '*uint8'), 1, []); %row vector
fclose(fid);
orig_num_bytes = length(bytes);
skiplocs = strfind(bytes, pattern_to_skip);
for idx = fliplr(skiplocs)
    bytes(idx:idx+PL-1) = []; %delete bytes, last match first so earlier indices stay valid
end
postskip_num_bytes = length(bytes);
fprintf('%d groups were skipped\n', (orig_num_bytes - postskip_num_bytes) / PL );
dataHeader = bytes(1:header_size);
bytes = bytes(header_size+1:end);
data_length = length(bytes);
if mod(length(bytes), 2) ~= 0
    fprintf('warning: data is odd number of bytes long, padding\n');
    bytes(end+1) = 0;
end
if mod(length(bytes), 4) ~= 0 %re-measure, the padding above may have changed the length
    fprintf('warning: data is odd number of words long, padding\n');
    bytes(end+1:end+2) = 0;
end
words = typecast(bytes, 'int16');
all_dataIQ = double( complex( words(1:2:end), words(2:2:end) ) );
num_dataIQ = length(all_dataIQ);
target_num_dataIQ = NsperBatch * ceil( num_dataIQ / NsperBatch);
if num_dataIQ ~= target_num_dataIQ
    fprintf('warning: complex data is not a multiple of %d samples long, padding\n', NsperBatch);
    all_dataIQ(target_num_dataIQ) = 0; %zero fill automatically
end
%do it all at once! fftshift along dimension 1 only, so each column is centered and the column order is preserved
magSpectra = abs(fftshift( fft( reshape(all_dataIQ, NsperBatch, []) ), 1 )).^2;
num_spectra = size(magSpectra, 2);
num_full_batches = floor(num_spectra / K);
num_leftover = num_spectra - K * num_full_batches;
num_batches = num_full_batches + (num_leftover ~= 0);
magSpectrumMat = zeros(NsperBatch, num_batches);
for batch_idx = 1 : num_full_batches
    bstart = (batch_idx - 1) * K + 1;
    bend = bstart + K - 1;
    magSpectrum = mean( magSpectra(:, bstart : bend ), 2 );
    magSpectrumMat(:, batch_idx) = magSpectrum;
end
if num_leftover ~= 0
    magSpectrum = mean( magSpectra(:, end-num_leftover+1 : end), 2 );
    magSpectrumMat(:, end) = magSpectrum;
end
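Because the code above is untested, the byte-level deletion step can be sanity-checked on a small synthetic buffer before running it on the real file. The following is only a sketch under stated assumptions: the payload values and the names payload, raw, keep, and cleaned are made up for illustration, the host is assumed to be little-endian, and the uint8 data is converted to char before calling strfind purely as a conservative choice.

```matlab
% Synthetic check of the "delete the magic sequence" step (illustrative data only).
pattern_words = int16([1024 0 2240 -24500]);                 % words to drop
pattern_bytes = typecast(swapbytes(pattern_words), 'uint8'); % big-endian byte order on a little-endian host
payload       = int16([1 2 3 4 5 6]);                        % hypothetical "real" samples
payload_bytes = typecast(swapbytes(payload), 'uint8');

% Interleave the pattern twice: once mid-stream, once at the very end.
raw = [payload_bytes(1:4) pattern_bytes payload_bytes(5:12) pattern_bytes];

% Locate every occurrence; char() maps uint8 0..255 one-to-one, so matches are exact.
locs = strfind(char(raw), char(pattern_bytes));

% Mark the matched bytes for removal, then keep the rest.
keep = true(size(raw));
for idx = locs
    keep(idx : idx + numel(pattern_bytes) - 1) = false;
end
cleaned = raw(keep);

% Convert back to int16 words, undoing the big-endian storage.
recovered = swapbytes(typecast(cleaned, 'int16'));
fprintf('%d groups were skipped\n', numel(locs));
```

If recovered equals payload, the deletion logic removed exactly the pattern bytes and nothing else.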
  8 Comments
Walter Roberson on 16 Oct 2017
Do not modify the bytes = line. Your header is defined by an odd number of bytes, and if you swap at the time you read them in, you would move byte 123 to the position of byte 124 and would then be ignoring the wrong byte. So you have to scan as bytes and delete the garbage as bytes (unless you are sure the garbage never occurs in the header), and once you have scrubbed the garbage you need to trim off the first 123 bytes of what is left.
Once you have trimmed off the header, there is a possibility that you need to byte swap: it depends on how the data was stored.
When you described the data values to remove, I assumed you had read through the data stream and had found those particular numeric values after reading as int16, with the implication that the bytes were in the other order (because the native order on whatever host you are using is little-endian.) But it is possible that you were told the sequence of bytes by someone else who assumed you were using big endian, in which case the byteswap would not be needed... Do you have a sample file known to have the sequence of bytes in it that you could process with byteswap or not on the match for deletion, to check to see which is happening in practice?
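One way to run that check is to count how often the sequence matches in each byte order. The sketch below uses a short stand-in buffer instead of the real file, and the names pat_native, pat_swapped, n_native, and n_swapped are hypothetical; with a real file, bytes would come from fread as in the answer.

```matlab
% Count matches of the magic sequence in both possible byte orders.
pattern_words = int16([1024 0 2240 -24500]);
pat_native  = typecast(pattern_words, 'uint8');            % host byte order
pat_swapped = typecast(swapbytes(pattern_words), 'uint8'); % opposite byte order

% Stand-in for the file contents (assumption). With the real file, use e.g.:
%   fid = fopen('rf.bin','r'); bytes = fread(fid, inf, '*uint8').'; fclose(fid);
bytes = [uint8([9 9 9]) pat_swapped uint8([7 7])];

% char() maps uint8 0..255 one-to-one, so strfind compares bytes exactly.
n_native  = numel(strfind(char(bytes), char(pat_native)));
n_swapped = numel(strfind(char(bytes), char(pat_swapped)));
fprintf('native-order matches: %d, swapped-order matches: %d\n', n_native, n_swapped);
```

Whichever count is nonzero on a file known to contain the sequence indicates the byte order to use when building the pattern to delete.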
If the data is written as big-endian then you would need to byteswap the int16, which you would do by changing
words = typecast(bytes, 'int16');
to
words = swapbytes( typecast(bytes, 'int16') );
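MATLAB's built-in function for reversing byte order is swapbytes. As a minimal illustration of the conversion, the sketch below decodes four bytes laid out as they would be in a big-endian file; the byte values are chosen for the example, and the host byte order is queried with computer so the swap only happens where it is needed.

```matlab
% Four bytes as stored big-endian: 0x0400 followed by 0xA04C.
bytes = uint8([4 0 160 76]);

% typecast interprets the bytes in the host's native order.
native_words = typecast(bytes, 'int16');

% Swap only if the host is little-endian ('L'); a big-endian host ('B') needs no swap.
[~, ~, host_endian] = computer;
if host_endian == 'L'
    be_words = swapbytes(native_words);
else
    be_words = native_words;
end
disp(be_words)
```

Here be_words holds the values 1024 and -24500, the first and last words of the sequence discussed in this thread.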
Ivy Chen on 16 Oct 2017
It is the last case that the IQ data is big endian. I will add the byteswap to complete the process. thanks again!

