Converting unformatted text to formatted text
I asked this question before and neglected some info, so I want to start fresh to avoid confusion.
clear all;
close all
clc
projectdir = 'C:\Users\me\data.psr';
newdir = 'C:\Users\me\Desktop\Test1';
fid=fopen(projectdir,'r');
T=textscan(fid, '%s');
fclose(fid);
for i=8:107
a=T{1,1}{i,1};
b= a(30:48);
matrix(i).r = b(2);
matrix(i).c = b(5);
matrix(i).info = b(8:13);
end
A = zeros(9,9)
for j=8:107
A(matrix(j).r, matrix(j).c) = matrix(j).info;
end;
The error:
Assignment has more non-singleton rhs dimensions than non-singleton subscripts
Error in Untitled2 (line 23)
A(matrix(j).r, matrix(j).c) = matrix(j).info;
This answer by user Stephen Cobeldick might help, although it was written only to deal with the histogram. However, it gives an error when run.
str = fileread('temp.txt');
% identify digits:
rgx = '[A-Z]+\[(\d+)\]\[(\d+)\]:*(\d+)';
C = regexp(str,rgx,'tokens');
% convert digits to numeric:
M = cellfun(@str2double,vertcat(C{:}));
M(:,1:2) = 1+M(:,1:2);
% convert to linear indices:
out = nan(max(M(:,1)),max(M(:,2)));
idx = sub2ind(size(out),M(:,1),M(:,2));
% allocate values:
out(idx) = M(:,3)
Error using cellfun
Input #2 expected to be a cell array, was double instead.
Error in Untitled3 (line 12)
M = cellfun(@str2double,vertcat(C{:}));
7 Comments
Stephen23
on 25 Nov 2015
I hope that you get the help and information that you need, and have fun learning MATLAB! We do put a lot of effort in when people need it, so please come and ask more questions :)
Stephen23
on 26 Dec 2020
OP deleted comments which are still visible in Google Cache:
Accepted Answer
per isakson
on 24 Nov 2015
Edited: per isakson
on 28 Nov 2015
I have assumed that the sizes of the resulting arrays are known.
fid = fopen( 'c:\m\cssm\test4.txt' );
rows = textscan( fid, '%s', 'Delimiter', '\n' );
fclose( fid );
rows = rows{:};
str = 'RainflowCycleCounterHistogram'; % avoid magic number
len = length( str );
is_counter = strncmp( str, rows, len );
counter_rows = rows( is_counter );
%
str = 'RainflowCycleMeanBreakpoints';
len = length( str );
is_mean = strncmp( str, rows, len );
mean_rows = rows( is_mean );
%
str = 'RainflowCycleRangeBreakpoints';
len = length( str );
is_range = strncmp( str, rows, len );
range_rows = rows( is_range );
%
counter_matrix = nan( 10, 10 );
for jj = 1 : length( counter_rows )
%
cac = textscan( counter_rows{jj}, '%*s%d%d%f' ...
, 'Delimiter' , ' []:' ...
, 'MultipleDelimsAsOne', true );
%
counter_matrix( cac{1}+1, cac{2}+1 ) = cac{3}; % one based
end
mean_vector = nan( 1, 10 );
for jj = 1 : length( mean_rows )
%
cac = textscan( mean_rows{jj}, '%*s%d%f' ...
, 'Delimiter' , ' []:' ...
, 'MultipleDelimsAsOne', true );
%
mean_vector( 1, cac{1}+1 ) = cac{2}; % one based
end
range_vector = nan( 1, 10 );
for jj = 1 : length( range_rows )
%
cac = textscan( range_rows{jj}, '%*s%d%f' ...
, 'Delimiter' , ' []:' ...
, 'MultipleDelimsAsOne', true );
%
range_vector( 1, cac{1}+1 ) = cac{2}; % one based
end
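To see what the textscan call inside these loops does, here is a minimal sketch on a single line. The sample line is hypothetical, written in the Name[row][col]: value form that the code above assumes; it is not taken from the asker's file.

```matlab
% Hypothetical sample line (zero-based indices in brackets)
row_txt = 'RainflowCycleCounterHistogram[0][3]: 1000.0000000000';

% '%*s' skips the keyword, '%d%d' read the two indices, '%f' reads the value.
% Space, brackets and colon all count as delimiters, and runs of them are merged.
cac = textscan( row_txt, '%*s%d%d%f' ...
    , 'Delimiter'          , ' []:' ...
    , 'MultipleDelimsAsOne', true );

r = cac{1} + 1;   % zero-based row index shifted to MATLAB's one-based indexing
c = cac{2} + 1;   % same for the column index
counter_matrix = nan( 10, 10 );     % size assumed known, as in the code above
counter_matrix( r, c ) = cac{3};    % store the value at (row, column)
```

Entries that never appear in the file stay NaN, which makes missing rows easy to spot afterwards.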
 
or maybe better - no assumptions regarding sizes
fid = fopen( 'c:\m\cssm\test4.txt' );
rows = textscan( fid, '%s', 'Delimiter', '\n' );
fclose( fid );
rows = rows{:};
str = 'RainflowCycleCounterHistogram'; % avoid magic number
len = length( str );
is_counter = strncmp( str, rows, len );
counter_rows = rows( is_counter );
%
str = 'RainflowCycleMeanBreakpoints';
len = length( str );
is_mean = strncmp( str, rows, len );
mean_rows = rows( is_mean );
%
str = 'RainflowCycleRangeBreakpoints';
len = length( str );
is_range = strncmp( str, rows, len );
range_rows = rows( is_range );
%
CRS = permute( char( counter_rows ), [2,1] );
cac = textscan( CRS, '%*s%f%f%f' ...
, 'Delimiter' , '[]: '...
, 'MultipleDelimsAsOne' , true ...
, 'CollectOutput' , true );
num = cac{1};
%
sz1 = min( num(:,1:2), [], 1 );
sz2 = max( num(:,1:2), [], 1 );
sz = sz2-sz1+[1,1];
ix_linear = sub2ind( sz, num(:,1)+1, num(:,2)+1 ); % one based
counter_matrix( ix_linear ) = num(:,3);
counter_matrix = reshape( counter_matrix, sz );
MRS = permute( char( mean_rows ), [2,1] );
cac = textscan( MRS, '%*s%f%f' ...
, 'Delimiter' , '[]: '...
, 'MultipleDelimsAsOne' , true ...
, 'CollectOutput' , true );
num = cac{1};
%
mean_vector( num(:,1)+1 ) = num(:,2); % one based
RRS = permute( char( range_rows ), [2,1] );
cac = textscan( RRS, '%*s%f%f' ...
, 'Delimiter' , ' []:'...
, 'MultipleDelimsAsOne' , true ...
, 'CollectOutput' , true );
num = cac{1}; % without this line num would still hold the mean breakpoints
%
range_vector( num(:,1)+1 ) = num(:,2); % one based
hope they return identical results :-)
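One caveat with the size-free version: counter_matrix( ix_linear ) = ... grows the vector only up to the largest linear index that occurs, so the following reshape would fail if the last element of the matrix were missing from the file. A minimal sketch of a slightly more defensive variant, using made-up numbers rather than the asker's data, preallocates with NaN instead:

```matlab
% Hypothetical [row, col, value] triplets with zero-based indices;
% the last element (row 1, col 2 in zero-based terms) is deliberately missing
num = [ 0 0 1
        0 2 1
        1 1 1000 ];

sz  = max( num(:,1:2), [], 1 ) + 1;   % zero-based maxima -> one-based size
counter_matrix = nan( sz );           % missing entries stay NaN
ix_linear = sub2ind( sz, num(:,1)+1, num(:,2)+1 );   % one based
counter_matrix( ix_linear ) = num(:,3);
```

This assumes, as the code above does, that the indices in the file start at zero.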
 
and another iteration
Comments:
- A function is superior to a script. It doesn't mess with the base workspace. It's easier to debug and it's easier to call from a script or function.
- This function is readable. It's fairly straightforward to add new keywords and row formats.
- The switch case can be replaced by a feval construct. But why do that?
- The subfunctions, f1, f2 and f3, have large parts of their code in common. That asks for further refactoring.
- Allocating a separate sub-function to each type of row makes testing easier.
- If speed becomes a problem analyze the code with the profiler.
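As an illustration of the third comment, here is a rough sketch of dispatching through function handles instead of a switch. parse_scalar is a hypothetical stand-in for f3 and the two rows are invented; whether this reads better than the switch is a matter of taste.

```matlab
rows = { 'RainflowCycleReversalTolerance: 20'
         'PowerCylinderTemperature: 0' };       % invented sample rows

% Hypothetical stand-in for f3: pick the row starting with "keyword:" and read its value
parse_scalar = @( keyword, rows ) sscanf( ...
    rows{ strncmp( [keyword,':'], rows, length(keyword)+1 ) }, [keyword,': %f'] );

type_list = {
    parse_scalar, 'RainflowCycleReversalTolerance'
    parse_scalar, 'PowerCylinderTemperature'
    };

S = struct();
for jj = 1 : size( type_list, 1 )
    parser  = type_list{jj,1};          % the function handle replaces the switch
    keyword = type_list{jj,2};
    S.(keyword) = parser( keyword, rows );
end
```

Adding a new row format then means adding a handle to the table rather than a new case.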
>> S = cssm( 'c:\m\cssm\text4.txt' )
S =
RainflowCycleCounterHistogram: [10x10 double]
RainflowCycleMeanBreakpoints: [-111 100 300 330 360 380 390 400 410 420]
RainflowCycleRangeBreakpoints: [0 35 70 100 135 170 200 230 260 300]
RainflowCycleReversalTolerance: 20
PowerCylinderTemperature: 0
PowerCylinderTemperatureHistogram: [1x12 double]
PowerCylinderTemperatureHistogramBreakpoints: [0 150 175 200 220 250 300 320 350 370 400]
>>
where
function S = cssm( filespec )
fid = fopen( filespec );
rows = textscan( fid, '%s', 'Delimiter', '\n' );
fclose( fid );
rows = strtrim( rows{:} );
type_list = {
... format keyword
'f1', 'RainflowCycleCounterHistogram'
'f2', 'RainflowCycleMeanBreakpoints'
'f2', 'RainflowCycleRangeBreakpoints'
'f3', 'RainflowCycleReversalTolerance'
'f3', 'PowerCylinderTemperature'
'f2', 'PowerCylinderTemperatureHistogram'
'f2', 'PowerCylinderTemperatureHistogramBreakpoints'
};
for jj = 1 : size( type_list, 1 )
switch type_list{jj,1}
case 'f1'
S.(type_list{jj,2}) = f1( type_list{jj,2}, rows );
case 'f2'
S.(type_list{jj,2}) = f2( type_list{jj,2}, rows );
case 'f3'
S.(type_list{jj,2}) = f3( type_list{jj,2}, rows );
otherwise
error( 'The format, "%s", is not yet implemented', type_list{jj,1} )
end
end
end
function matrix = f1( keyword, rows )
ism = is_member( keyword, rows );
cur_rows = rows( ism );
%
str = permute( char( cur_rows ), [2,1] );
cac = textscan( str, '%*s%f%f%f' ...
, 'Delimiter' , '[]: '...
, 'MultipleDelimsAsOne' , true ...
, 'CollectOutput' , true );
num = cac{1};
%
sz1 = min( num(:,1:2), [], 1 );
sz2 = max( num(:,1:2), [], 1 );
sz = sz2-sz1+[1,1];
ix_linear = sub2ind( sz, num(:,1)+1, num(:,2)+1 ); % one based
matrix( ix_linear ) = num(:,3);
matrix = reshape( matrix, sz );
end
function matrix = f2( keyword, rows )
ism = is_member( keyword, rows );
cur_rows = rows( ism );
%
str = permute( char( cur_rows ), [2,1] );
cac = textscan( str, '%*s%f%f' ...
, 'Delimiter' , '[]: '...
, 'MultipleDelimsAsOne' , true ...
, 'CollectOutput' , true );
num = cac{1};
%
matrix( num(:,1)+1 ) = num(:,2); % one based
end
function matrix = f3( keyword, rows )
ism = is_member( keyword, rows );
cur_rows = rows( ism );
%
str = permute( char( cur_rows ), [2,1] );
cac = textscan( str, '%*s%f', 'Delimiter',':' );
matrix = cac{:};
end
function ism = is_member( keyword, rows )
% the keyword is followed by either ":" or "["
cac = regexp( rows, ['^',keyword,'(?=(:|\[))'], 'once' );
ism = not( cellfun( @isempty, cac ) );
end
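The lookahead in is_member matters because some keywords are prefixes of others. A small sketch with invented rows shows the difference from the strncmp test used in the earlier versions:

```matlab
% Invented rows; only the first one belongs to the keyword below
rows = { 'PowerCylinderTemperature: 0'
         'PowerCylinderTemperatureHistogram[0]: 5'
         'PowerCylinderTemperatureHistogramBreakpoints[0]: 0' };
keyword = 'PowerCylinderTemperature';

% A plain prefix comparison matches all three rows
by_prefix = strncmp( keyword, rows, length( keyword ) );

% Requiring ":" or "[" right after the keyword matches only the first row
cac = regexp( rows, ['^',keyword,'(?=(:|\[))'], 'once' );
by_regexp = not( cellfun( @isempty, cac ) );
```

Here by_prefix is true for every row, while by_regexp is true only for the first.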
12 Comments
dpb
on 25 Nov 2015
Edited: dpb
on 25 Nov 2015
What is the desired output again? I'd approach it a little more generically. I'm not sure precisely what you want to do with the end result, but I'll note that from your file one can do the following:
>> S=textread('test4.txt','%s','delimiter','\n','whitespace','','headerlines',3); % read into cell array of strings
>> tok=cellfun(@(x) tokens(x,'[]:'),S,'uniformoutput',0); % find tokens each line
>> whos tok
Name Size Bytes Class Attributes
tok 52x1 13660 cell
>> tok{1} % sample what looks like
ans =
RainflowCycleCounterHistogram
0
0
1.0000000000
>> ntok=cellfun(@(x) size(x,1),tok); % number in each row
>> [min(ntok) max(ntok)] % range overall in file
ans =
2 4
>> for n=min(ntok):max(ntok) % build specific format string
fmt=['%s' repmat('[%d]',1,n-2) ':%f']
end
fmt =
%s:%f
fmt =
%s[%d]:%f
fmt =
%s[%d][%d]:%f
>> [u,iu]=unique(cellfun(@(x) x(1,:),tok,'uniform',0),'stable') % what's in file and where???
u =
'RainflowCycleCounterHistogram'
'RainflowCycleMeanBreakpoints'
'RainflowCycleRangeBreakpoints'
'RainflowCycleReversalTolerance'
'PowerCylinderTemperature'
'PowerCylinderTemperatureHistogram'
'PowerCylinderTemperatureHistogramBreakpoints'
iu =
1
8
18
28
29
30
42
>>
From the above pieces one can write a general parser for each possible data line format as long as they follow the form of
String[Index1][Index2]: Value
where the number of indices can be 0, 1 or 2. The above will actually handle N-dimensional arrays; it's just that 2 is the largest seen to date.
With the above it's simple enough to write a routine that loops over the elements in the u array, builds the proper format string, and selects and parses the given lines without any specific testing for matching strings at all, unless and until a user asks for only a given one or set, at which time those can be returned from the general result.
But, you don't need to parse the individual lines at all; simply convert the fields within the token array for the ones of choice from the corollary tok array; ntok gives the info on how many elements there are corresponding to the fields.
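A minimal sketch of that conversion, assuming a token array shaped like the tok{1} sample shown above (rebuilt here by hand with char so the snippet runs on its own; the variable names are illustrative):

```matlab
% tok{1} from the session above, rebuilt as a padded char matrix
tok1 = char( 'RainflowCycleCounterHistogram', '0', '0', '1.0000000000' );

name = strtrim( tok1(1,:) );                      % first row is the keyword
vals = str2double( cellstr( tok1(2:end,:) ) );    % remaining rows: indices, then value
subs = num2cell( vals(1:end-1).' + 1 );           % zero-based indices -> one-based subscripts
out  = [];
out( subs{:} ) = vals(end);                       % works for any number of indices >= 1
```

Rows with no index at all (ntok equal to 2) would need to be handled separately, since there are no subscripts to expand.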
function tok = tokens(s,d)
% Simple string parser returns tokens in input string s
%
% T=TOKENS(S) returns the tokens in the string S delimited
% by "white space". Any leading white space characters are ignored.
%
% TOKENS(S,D) returns tokens delimited by one of the
% characters in D. Any leading delimiter characters are ignored.
% DPBozarth (Rev 1 1998)
% Get initial token and set up for rest
if nargin==1
[tok,r] = strtok(s);
while ~isempty(r)
[t,r] = strtok(r);
tok = strvcat(tok,t);
end
else
[tok,r] = strtok(s,d);
while ~isempty(r)
[t,r] = strtok(r,d);
tok = strvcat(tok,t);
end
end
Also, of course, regexp can return tokens if one's got the patience to figure out the proper expression needed...
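For what it's worth, one possible expression for the String[Index1][Index2]: Value form is sketched below. The sample line is hypothetical and the pattern is only one of several that would do; it has not been run against the asker's file.

```matlab
txt = 'RainflowCycleCounterHistogram[0][3]: 1000.0000000000';   % hypothetical line

% token 1: name, token 2: all bracketed indices (possibly none), token 3: value
t = regexp( txt, '^(\w+)((?:\[\d+\])*):\s*(\S+)$', 'tokens', 'once' );

name  = t{1};
idx   = str2double( regexp( t{2}, '\d+', 'match' ) ) + 1;   % one-based subscripts
value = str2double( t{3} );
```

Unlike a pattern restricted to [A-Z] and \d+, this allows mixed-case names, a space after the colon and decimal values.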
More Answers (1)
dpb
on 24 Nov 2015
>> fmt='%*s%f%f%f';
>> fid=fopen('test4.txt');
>> c=cell2mat(textscan(fid,fmt,'headerlines',3,'delimiter','[]:','collectoutput',1,'multipledelimsAsOne',1));
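>> sz=max(c(:,1:2),[],1)+1; % sz is not defined in the post; assumed here to be the largest zero-based indices plus one
>> v(sub2ind(sz,c(:,1)+1,c(:,2)+1))=c(:,3)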
v =
Columns 1 through 10
1 0 1 1000 0 0 0 1 0 0
Columns 11 through 20
0 0 0 1 0 0 0 0 0 0
>> fid=fclose(fid);
0 Comments