MATLAB Answers

Problems in reading large matrix with unintended gaps between data

Poulomi Ganguli on 16 May 2021
Commented: Rik on 18 May 2021
Hello:
I think I attached the output file instead of the actual one, so I have revised my submission here. The problem is with the last column, DRNRF. There is a gap between the two pairs of digits: '02' '00', '03' '25', and so on. These two numerals should be merged while reading, giving '0200' and '0325' respectively. For the rest of the procedure I am following this post, which is working: https://de.mathworks.com/matlabcentral/answers/823485-problems-in-reading-large-matrix-with-large-empty-cells
Any help in this regard would be appreciated. Thanks!
  1 Comment
Mathieu NOE on 17 May 2021
Well, I don't see a gap, but a dot between the two fields. Or am I missing something?
INDEX YEAR MN DT ..MAX ..MIN AW ..R/F .EVP DRNRF
-------------------------------------------------
42045 1985 01 01 2.5 -3.9 002.4 02.00
42045 1985 01 02 6.5 -2.9 003.4 03.25
42045 1985 01 03 5.0 -3.9 007.6 02.35
42045 1985 01 04 6.0 -1.1 000.0 00.00
42045 1985 01 05 5.2 -1.1 009.2 08.15
42045 1985 01 06 1.8 -4.1 009.5 11.20
When I import your file, the last column appears as:
outdata(:,end)
ans =
2.0000
3.2500
2.3500
0
8.1500
11.2000
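To make the difference between the two layouts concrete, here is a minimal sketch of how the same digits parse with a dot versus with a gap. The two input strings are illustrative stand-ins for one DRNRF field, not lines taken from the attached file.

```matlab
% With a dot, the field is a single decimal number.
a = sscanf('02.00', '%g');   % one value
% With a gap, the same digits are read as two separate numbers.
b = sscanf('02 00', '%g');   % two values, returned as a column vector
fprintf('dot: %d value(s), gap: %d value(s)\n', numel(a), numel(b));
```

If the revised file really contains the gap, each data line would then yield one more numeric column than the listing above suggests.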


Answers (1)

Mathieu NOE on 18 May 2021
Hello again,
Here is my simple fix: read the last two columns as separate vectors and combine them with a little arithmetic.
Your DRNRF data then appears in the (updated) outdata(:,8):
[outdata,head] = readclm('Test_data.txt',9);
outdata(:,8) = outdata(:,8)*100+outdata(:,9);
outdata(:,9) = []; % clean up
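As a quick check of the arithmetic, the sketch below applies the same combination to hand-typed values. The vectors hh and mm are hypothetical stand-ins for columns 8 and 9 of outdata, filled with the digits from the sample rows above; the zero-padded text form is an optional extra, in case the '0200' style from the question is wanted.

```matlab
% Hypothetical stand-ins for outdata(:,8) and outdata(:,9)
hh = [2; 3; 2; 0; 8; 11];     % first two digits of DRNRF
mm = [0; 25; 35; 0; 15; 20];  % last two digits of DRNRF

% Same combination as in the fix above
drnrf = hh*100 + mm;

% Optional: zero-padded text, e.g. 200 becomes '0200'
labels = arrayfun(@(v) sprintf('%04d', v), drnrf, 'UniformOutput', false);
disp(labels.');
```

Note that the numeric result drops the leading zero (0200 is stored as 200), so the text form is only needed for display or export.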
I like the readclm subfunction; here it is (maybe the textscan aficionados will have a comment about my coding methods ...):
function [outdata,head] = readclm(filename,nclm,skip,formt)
% READCLM Reads numerical data from a text file into a matrix.
% Text file can begin with a header or comment block.
% [DATA,HEAD] = READCLM(FILENAME,NCLM,SKIP,FORMAT)
% Opens file FILENAME, skips first several lines specified
% by SKIP number or beginning with comment '%'.
% Then reads next several lines into a string matrix HEAD
% until the first line with numerical data is encountered
% (that is until first non-empty output of SSCANF).
% Then reads the rest of the file into a numerical matrix
% DATA in a format FORMAT with number of columns equal
% to number of columns of the text file or specified by
% number NCLM. If data does not match the size of the
% matrix DATA, it is padded with NaN at the end.
%
% READCLM(FILENAME) reads data from a text file FILENAME,
% skipping only commented lines. It determines number of
% columns by the length of the first data line and uses
% the floating point format '%g';
%
% READCLM uses FGETS to read the first lines and FSCANF
% for reading data.
% Kirill K. Pankratov, kirill@plume.mit.edu
% 03/12/94, 01/10/95.
% Defaults and parameters ..............................
formt_dflt = '%g'; % Default format for fscanf
addn = nan; % Number to fill the end if necessary
% Handle input ..........................................
if nargin<1, error(' File name is undefined'); end
if nargin<4, formt = formt_dflt; end
if nargin<3, skip = 0; end
if nargin<2, nclm = 0; end
if isempty(nclm), nclm = 0; end
if isempty(skip), skip = 0; end
% Open file ............................
[fid,msg] = fopen(filename);
if fid<0, error('%s',msg); end % A bare return would leave the outputs unassigned
% Find header and first data line ......................
is_head = 1;
jl = 0;
head = ' ';
while is_head % Add lines to header.....
    s = fgets(fid); % Get next line
    jl = jl+1;
    is_skip = jl<=skip | s(1)=='%'; % Skipped by count or commented out
    out1 = sscanf(s,formt); % Try to read this line
    % If unreadable by SSCANF or skip, add to header
    is_head = isempty(out1) | is_skip;
    if is_head & ~is_skip
        head = char(head,s(1:length(s)-1)); % char replaces the obsolete str2mat
    end
end
head = head(2:size(head,1),:);
% Determine number of columns if not specified
out1 = out1(:)';
l1 = length(out1);
if ~nclm, nclm = l1; end
% Read the rest of the file ..............................
if l1~=nclm % First line format is different from ncolumns
    outdata = fscanf(fid,formt);
    lout = length(outdata)+l1;
    ncu = ceil(lout/nclm);
    lz = nclm*ncu-lout;
    outdata = [out1'; outdata(:); ones(lz,1)*addn];
    outdata = reshape(outdata,nclm,ncu)';
else % Regular case
    outdata = fscanf(fid,formt,[nclm inf]);
    outdata = [out1; outdata']; % Add the first line
end
fclose(fid); % Close file ..........
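For comparison, here is a rough textscan sketch of the same idea. It is not a replacement for readclm, only an illustration. The two data lines are assumed to match the revised file (with the gap in DRNRF), which is not reproduced in this thread, and the text is parsed from a character vector so the snippet runs without the file.

```matlab
% Assumed sample of the data lines, with a gap inside the DRNRF field
txt = sprintf('%s\n', ...
    '42045 1985 01 01 2.5 -3.9 002.4 02 00', ...
    '42045 1985 01 02 6.5 -2.9 003.4 03 25');

% Nine numeric fields per line, collected into a single matrix
C = textscan(txt, repmat('%f', 1, 9), 'CollectOutput', true);
data = C{1};

% Merge the two halves of DRNRF, as in the fix above
drnrf = data(:,8)*100 + data(:,9);
disp(drnrf);
```

When reading from a file identifier instead, the two header lines could be skipped with the 'HeaderLines' option of textscan.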
  3 Comments
Rik on 18 May 2021
That does make sense. Sometimes FEX submissions are deleted.
To read text files I have written the readfile function (which you can get from the FEX or through the AddOn-manager). It works on every release that supports && and || (i.e. R13 (v6.5) and later). Apparently it was a good idea to have such a function (including the ability to read from a URL), as MathWorks implemented a similar function, readlines, in R2020b.
I prefer to split tasks into different functions, so I would read the entire file as text, then outside that function remove the lines with header or comments and use textscan or regexp or str2double, whatever makes sense.
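A minimal sketch of that split-into-steps approach is shown below, under a few assumptions: the text in raw stands in for the file contents (for a real file, fileread('Test_data.txt') would return the same kind of character vector), the DRNRF field contains the gap, and a data line is taken to be any line that starts with a digit. The variable names are illustrative only.

```matlab
% Step 1: get the whole file as text (here a hand-typed stand-in).
raw = sprintf('%s\n', ...
    'INDEX YEAR MN DT ..MAX ..MIN AW ..R/F .EVP DRNRF', ...
    '-------------------------------------------------', ...
    '42045 1985 01 01 2.5 -3.9 002.4 02 00', ...
    '42045 1985 01 02 6.5 -2.9 003.4 03 25');
txtLines = regexp(raw, '\r?\n', 'split');

% Step 2: drop header, separator, and empty lines.
firstDigit = regexp(txtLines, '^\s*\d', 'match', 'once');
dataLines = txtLines(~cellfun('isempty', firstDigit));

% Step 3: parse each remaining line into a row of numbers.
rows = cellfun(@(s) sscanf(s, '%f').', dataLines, 'UniformOutput', false);
data = vertcat(rows{:});

% Step 4: merge the two halves of DRNRF.
drnrf = data(:,8)*100 + data(:,9);
disp(drnrf);
```

Keeping the steps separate makes it easy to swap the parsing step for textscan or str2double without touching the file reading.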
