

txt2mat

version 6.60.3.0 (31.4 KB) by Andres
fast and versatile ascii data import capable of handling large text files

12 Downloads

Updated 17 Feb 2019


txt2mat is essentially a wrapper for sscanf, so it quickly converts ASCII files containing m-by-n numeric data and allows for header lines. When rows contain different numbers of data elements, it works line by line and is therefore somewhat slower.
You may let txt2mat carry out an automatic data layout analysis on comparatively 'simple' text files (header lines + decimal number data with common delimiters). With this analysis it can directly import most numeric .csv files, for instance.
Because txt2mat can perform string and regular expression replacements before the numeric conversion, it can cope with many irregularities within the data. This also lets it detect and handle commas as decimal characters (common German notation).
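The replace-then-convert idea can be tried with built-in functions alone. The following is only a minimal sketch under stated assumptions, not txt2mat's actual code: the text content and the ';' delimiter are made up for illustration, and txt2mat's own detection logic is more elaborate.

```matlab
% Sketch only: mimic "replace strings, then convert" with built-ins.
txt = sprintf('1,5;2,25\n3,0;4,75\n');   % hypothetical file content, ';' delimited
txt = strrep(txt, ',', '.');             % decimal comma -> decimal point
txt = strrep(txt, ';', ' ');             % delimiter -> whitespace
A = sscanf(txt, '%f', [2, Inf]).';       % sscanf fills column-wise, so transpose
disp(A)
```

The order of the two replacements matters here: the decimal commas have to be converted before any comma could be mistaken for a delimiter.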
You can filter lines by keywords, skip lines by line number, provide appropriate format strings (as for sscanf), or split up the import process for huge files if you encounter memory problems.
You may also use the above to simply read the manipulated text into a string or to put each line into a separate cell without the numeric conversion.
txt2mat should work on Matlab R2007a and newer versions.

Comments and suggestions welcome.

Andres

Cite As

Andres (2020). txt2mat (https://www.mathworks.com/matlabcentral/fileexchange/18430-txt2mat), MATLAB Central File Exchange. Retrieved .

Comments and Ratings (70)

Jose Miguel

Hi Andres, I would need some help regarding the possibility of txt2mat to read a few data files I'm managing now.
Could you please get in contact with me at my email address ?
Thanks a lot
Regards
Jose

T A

Right, I should've stated that the apparent bug in string replacement was the reason for my comment. Although, I do frequently take advantage of txt2mat's ability to more-or-less filter out a column of text in a data file.

Andres

Hi T A, thanks for the feedback. As stated in the description, the auto-detection is meant to work with files with (a variable number of) header lines followed by numeric data (with common delimiters) only, while your line contains mixed string and numeric data.
Still I have to check why it makes a difference if there is "blag" or "bgl" at the beginning of the line...

T A

I'm getting some odd behavior when I leave txt2mat to auto-detect string replacement. Here's my 1-line text file:
blag 988301 0 2678402.0000 0.0000000000 0.0000000000 0.9849370000 0.9849370000
Running txt2mat as follows:
A=txt2mat(filin,'numheaderlines',0)
gives A as a 1x7 matrix of NaNs, and the command line says it had trouble reading. Most notably, it shows:
* 1 string replacement(s) »bgl «
Somehow, it's turning "blag" into "bgl". I tried changing the "blag" string in the file to "crek", and txt2mat gives this:
* 1 string replacement(s) »ckr «
Furthermore, if I change "blag" to "bgl" in the file, then txt2mat runs successfully. I therefore infer the string replacement is the source of the problem, though I haven't had a chance to check the code.

Denis Anikiev

Andres

Hi T A, that's correct, thank you. Up to now ReadMode is only checked for being a character array, so a typo will lead to an error. I'll fix that.

T A

FYI, it looks like you don't test for valid values passed to ReadMode. This led to an error because minLbAwareness was not set. Great tool, though.

David Walwark

farhad abtahi

Hi dear all,

does any one have access to this file (File ID#18430, 23 Jan 2008)?

Vijay Anand

Changming Liu

Excellent work with large txt files! Thank you!

vivek GB

Thank you so much for this one!!!

Andres

Hi valere demeter,

thank you for reporting this, and sorry for my late answer (I somehow did not receive a note about your comment although an e-mail should be sent immediately).
The problem occurs if the number of header lines specified exceeds half the number of lines of the whole file. I have just submitted a bug fix version.

valere demeter

Hi Andres,

When specifying more than 16 header lines, I am getting "no numeric data found" when reading a file with the following format in every line:

0.0, 0.0,

The file has more than 17 lines and the command I am using is:

[A,ffn,nh,SR,hl,fpos] = txt2mat('dat.txt','NumHeaderLines',17,'ReplaceChar',', ');

If I let it determine the format it works ok.

Thanks,

nathan q

A really useful tool. Deals with errors in data files robustly. Thanks!

Andres

The issue that Taylor came across was caused by a matlab bug in R2015a and R2015b.

http://www.mathworks.com/matlabcentral/answers/249344-matlab-array-size-limit-error-why

It has been fixed with R2016a.

Taylor, thank you for your support to find the cause of the problem.

Marcelo Toledo

Andres

Hi Taylor,
there's no contact link on your profile page, so I reply here again.

Trying to reproduce the error you observed, I created a large text file with 1000000x40 values and I set the maximum array size to 1% of RAM.
Then I get the error, but it points to another line of code:
f8(end-cnt_trail_white+1:end) = spuint; % fill with spaces
(it is around line number 2800, my txt2mat.m has some more header comment lines).
In this case the error is unexpected, because I do not alter the size of a variable. I have reproduced this behaviour in a test script and I have contacted mathworks service with a bug report for this.

I'd like to know how the error may point to line 2702:
cntLb(lbCntr+1)= cntLb(lbCntr) + sum(f8(idxLo:idxHi)==uintLb);
as in your case. Do you use special parameter options with txt2mat, like the 'MemPar' argument?
So far, I can not imagine how your error might be produced, because the variable "cntLb" is initialized to its maximum expected length a few lines before, so it should not get larger either. This is either a matlab bug, too, or I have done something wrong there, so I'd like to find out.

If you do not have the chance to support me in finding a possible bug, no problem, just let me know please, and I will carry on on my own.

To contact me, you may check my profile page, or execute
x=-2:3;
disp(char(round([polyval([-0.32,0.43,1.75,-5.90,-0.95,116],x),...
polyval([-4.44,9.12,29.8,-33.6,-52.9, 98],x)])))

Andres

Hi Taylor,
thank you very much for reporting this! I will investigate it further. If you could provide me with further details, please contact me using the link on my profile page.

Taylor

Great tool, I've used it quite a bit. But I recently upgraded Matlab and, in R2015a, the program will fail on larger files if the "Limit the maximum array size to a percentage of RAM" option is checked under "Workspace" in the Preferences. The error pointed to line 2702:
cntLb(lbCntr+1)= cntLb(lbCntr) + sum(f8(idxLo:idxHi)==uintLb); %#ok<AGROW>
My impression is that this option in Matlab somehow interferes with the program's ability to determine how much memory to use, so it ends up requesting too much and Matlab blocks it. Turning off the option fixed the issue for me.

Eric Kappel

Brilliant in its function!
Excellent in its help/examples!

Tom Are

Marcelo Toledo

Andres

Hi Kaare, I had no success in contacting you via your author page, but thanks for your suggestion. And yes, a new version of txt2mat capable of skipping lines by line number will come up soon.

Kaare

Very good :) Very fast!
Would it be possible to also include a "skip every n'th line" argument? It's really all it needs for me to discard my own implementation and use this instead.

Andre

Great! Thanks a lot. Works perfect!

Liping LI

perfect~~

Jânio Anselmo

Perfect!!

Carl Fischer

Great. Thanks a lot.

Kyle

Brilliant work - saved me a day. Much better than Matlab itself.

Clemens

Great work! Your submission provides an effortless solution that does just what I was expecting in every case I've used it.

Amr Suleiman

Henk-Jan Ramaker

Superb file! Really great! The speed and flexibility are impressive. However, I'm experiencing some trouble. When I read this txt file:

Symbol,Date
XXH00,09/221999

it works great! But, when I read this txt file:

Symbol,Date
ESH00,09/221999

txt2mat cannot read any lines. I discovered, when the 2nd line starts with an 'E' or 'N', txt2mat somehow cannot match the format.

Is there a way around this or to fix this problem?

Thomas

I wasted 3 hours searching for a solution to my problem before I stumbled upon this beauty! Thanks so much for all the effort you put into this. Good explanation and examples to get you started. Part of my permanent function collection..

ESTEBAN

It works nice and smooth importing DEM (digital elevation models)...Thanks a lot

Juan Pedro

Great job

zampala ballarini

thanks a lot. Have been looking for this for a while.
great job

Andres

Hi arouabm Ben Mohamed,
I'm sorry, txt2mat does not support mixed string/numeric output in cells as e.g. textscan does. It would be quite hard (probably) to change this without sacrificing speed and R13 compatibility.
Btw, you can use some features of txt2mat (like line filtering) with read mode 'char' and use textscan afterwards (see example 3b).
Perhaps, if ease of use is more important than speed, you may find the fex submission "readtext" useful as well.

arouabm Ben Mohamed

Hi
thank you very much for this code it is great
but how can we make this function read a text file and convert it into a matrix containing words not characters
actually my file contains strings and numbers

Matthew

Fabulously useful code - thanks Andres...

I am particularly impressed with the Param_Array t2mOpts and its many flexible uses for reading complex data. I hope MATLAB considers standardizing your work.

Also BRAVO with your highly useful documentation!

Yvan Lengwiler

I haven't looked at the code yet, but I have used this submission to read in 1 GB of data. It just worked beautifully, very fast, and allowed me to filter out bogus lines. Very useful submission.

Akshay B

Thanks for your reply, Andres. Really, Mathworks should take your function and put it in their base functions - you have taken great efforts!!

Andres

Thanks for your feedback. Indeed it would be nice to have a handy column selection input argument instead of using 'ConvString' with something like '%f %*f %f %*f %f'.
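How such a format string selects columns can be checked with plain sscanf, which txt2mat wraps. This is a small sketch with made-up data; the '%*f' entries read a number and discard it.

```matlab
% Sketch only: plain sscanf with the format string quoted above.
txt = sprintf('1 2 3 4 5\n6 7 8 9 10\n');         % hypothetical 2-by-5 data
A = sscanf(txt, '%f %*f %f %*f %f', [3, Inf]).';  % keep columns 1, 3 and 5
disp(A)
```

The size argument is [3, Inf] because only three values per row survive the skipped fields.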
Regarding your memory issue, so far I can only refer to the 'NumericType' param and especially to example 6, but you probably found that in the doc already. txt2mat reads in the whole part of the file that is to be imported and initializes the output array. In an extreme case, a double array holding the data may consume about four times more memory than the file, unless some rows have missing values. I am thinking about making txt2mat more memory efficient - e.g. avoid slurping large files - without noticeably losing speed (but tbh I have hardly any spare time now).

Akshay B

An awesome and VERY helpful function!!! The param/value pairs are great - especially 'BadLineString' and 'RowRange'.

Small suggestion for future spins - if possible, having a 'ColumnRange' would be great.

Btw, I ran into an "Out of Memory" situation when processing a 355MB file, even though I have 3GB RAM and the 'memory' command reported that I have at least ~500MB for a single array (~1200MB for arrays). Any ideas how to fix this?

Thanks!!!

Andres

It would be most simple if you could uniquely identify every line of text by some string (or some few strings), as with 'tex' (or just 'x') in your example:

txt2mat('example.txt','BadLineString',{'tex'})

If that is not possible, you could read in all lines and then remove all NaN-only rows:

A = txt2mat('example.txt',0,'ReadMode','line')
A = A(any(isfinite(A),2),:)
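
The row filter in the second line can be tried without a file. In this sketch the matrix is written out by hand, assuming that the text lines of the example file come back as NaN-only rows:

```matlab
% Sketch only: A stands in for a line-mode result with NaN rows for text lines.
A = [NaN NaN NaN; 1 2 3; 4 5 6; 7 8 9; NaN NaN NaN; 10 11 12];
A = A(any(isfinite(A), 2), :);   % keep rows with at least one finite value
disp(A)
```

Rows that contain at least one finite number are kept, so partially filled data rows survive the filter.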

If things are more complex, you may also contact me via the 'Contact Author' link. Good luck!

AwedBy Matlab

(I believe a 5 star rating is in order for this function anyway :-) )

AwedBy Matlab

I tried using txt2mat to extract a matrix A from a file data.csv that has several lines of explicative text interspersed throughout the file, like this:

text 1
1 2 3
4 5 6
7 8 9
text 2
10 11 12

For the example above, txt2mat produced the matrix

10 11 12

rather than what I expected, which was

1 2 3
4 5 6
7 8 9
10 11 12

Can anyone help? Thanks!

Pavan

Superb function. Worked without a problem. Great stuff.
Thanks Andres

achus Pujante

Great function, you saved me a lot of time, thank you very much
Then I used your function to simplify my own case

S. P.

Andres

@ Leonard
Thanks for pointing to the version number issue with the MCR (and that part of code you mention should have been replaced anyhow...). I've just sent you an email regarding a possible solution; if it got lost, please use the 'Contact Author' link on my author page.
Regards
Andres

Leonard

txt2mat is excellent by being very straightforward in its implementation. Could be the de facto standard within Matlab. Thanks Andreas for this.

If you are considering compiling a standalone application and deploying it using the MCR, you may want to consider the following:
I did not encounter any errors using this .m file while using it as long as MATLAB was installed on my machine (ver 7.5, BTW).
When deploying my executable to another machine and using the MCR (ver 7.7), the command line indicated the following:

Too many objects requested. Most likely cause is missing [ ] around left hand side that has a comma separated list expansion.

Error in ==> txt2mat at 519

515 %% Definitions
516
517 % find out matlab version as a decimal, up to the second dot:
518 v = ver('matlab');
519 vs= v.Version;
520 vsDotPos = [strfind(vs,'.'), Inf, Inf];
521 vn= str2double(vs(1:min(numel(vs),vsDotPos(2)-1)));

The .m file halted execution of my program because it was looking for a version # for MATLAB, which was not installed on the target machine. I patched the code by determining the matlab ver and editing the txt2mat.m as follows:

518 %v = ver('matlab');
519 vs = '7.5'; %vs= v.Version;
520 vsDotPos = [strfind(vs,'.'), Inf, Inf];
521 vn= str2double(vs(1:min(numel(vs),vsDotPos(2)-1)));

Maybe it's possible to check for 'Matlab' on the target machine and/or the MCR and then handle this line appropriately.
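
One way such a check might look is sketched below. It is not the fix that went into txt2mat: the variable fallbackVersion and its value are assumptions for illustration, while isdeployed and ver are standard MATLAB functions. The version string is kept as text so that the strfind/str2double lines quoted above still work.

```matlab
% Sketch only: fall back to a fixed version string when no MATLAB
% installation is reported, e.g. inside a deployed application.
fallbackVersion = '7.5';     % assumption: version the code was compiled with
v = ver('matlab');           % may be empty when MATLAB is not installed
if isdeployed || isempty(v)
    vs = fallbackVersion;
else
    vs = v(1).Version;       % v(1) guards against multiple entries
end
% same parsing as in the quoted lines: version as a decimal, up to the second dot
vsDotPos = [strfind(vs, '.'), Inf, Inf];
vn = str2double(vs(1:min(numel(vs), vsDotPos(2)-1)));
disp(vn)
```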

Regards,
Len

Val Schmidt

This seems like a terrifically useful tool.

One feature request. In addition to being able to specify characters that, if found in a line, mark the line for omission, it would be nice to be able to instead skip everything by default and specify characters of lines that are to be included.

Suppose, for example, you want to parse a log file. You'll want to extract all the log entries of a particular type. Rather than having to specify every other type of entry for omission, it'd be nice to be able to specify just the ones you want.
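
Such a keep-only filter can be sketched with built-in functions. This is not a txt2mat feature as described here; the log content and the keyword 'ERR' are made up for illustration.

```matlab
% Sketch only: keep the lines containing a keyword, then convert them.
txt = sprintf('INFO 1 2\nERR 3 4\nWARN 5 6\nERR 7 8\n');   % hypothetical log
lines = regexp(strtrim(txt), '\n', 'split');               % one cell per line
keep = ~cellfun(@isempty, strfind(lines, 'ERR'));          % lines of the wanted type
kept = lines(keep);
joined = strrep(sprintf('%s\n', kept{:}), 'ERR', '');      % strip the keyword
A = sscanf(joined, '%f', [2, Inf]).';                      % two numbers per entry
disp(A)
```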

Thanks,
Val

Gabriel Vézina

great code and fast to execute

Bas

Needed to have a file import script that could handle a comma as the decimal separator. This worked instantly.

Jose Miguel Jauregui

A great job,
It works perfectly when it comes to loading very large text files, and the code is very fast. If you work with large text files, this is the code to use.

In case you need a modification Andres is always willing to provide help

Josemi

Fredrik

Andres

@ John McArthur
The header itself *is* accessible as a string, e.g.

>> [A,ffn,nh,SR,hl] = txt2mat('myfile.txt');
>> headerNumbers = sscanf(hl,'%*s%f,%*s%f')

headerNumbers =
1001
400
40
15

(generally it is hard to guess which information in the header is useful to the user)
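
The sscanf call can be reproduced without txt2mat by typing in the header lines from the question this reply answers. In this sketch hl is built by hand, so its exact layout (line breaks in particular) is an assumption about what txt2mat returns.

```matlab
% Sketch only: hl is typed in by hand instead of being returned by txt2mat.
hl = sprintf(['nTimeInc, 1001, TotalTime, 400\n', ...
              'ForcingFreq, 40, ForcingAmp, 15\n', ...
              'T, X, Y, Z, Theta, Zeta\n']);
% '%*s' skips one word (including its trailing comma), '%f' reads a number
headerNumbers = sscanf(hl, '%*s%f,%*s%f');
disp(headerNumbers)
```

The format is applied repeatedly until it no longer matches, which happens at the third header line because no number follows its first word.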

John McArthur

Wondering if there's a way to get the program to recognize and import data in the header line. For instance, if you have the test conditions as header lines, like this:

nTimeInc, 1001, TotalTime, 400
ForcingFreq, 40, ForcingAmp, 15
T, X, Y, Z, Theta, Zeta
0, 0, 0, 0, 0, 1.0
0.4, 1.2, 0, 0, 0.01, 1.2
....

So, the header files have some useful info that would be nice to have accessible and can be added to plots and analysis.

Any thoughts?
johnnyfisma@hotmail.com

ZY

A powerful code. I am using it (with some help from the author of the code) to read very complicated data files with headerlines throughout the file as well as data lines with different lengths (line folding).

And it is quite fast too.

Thanks Andres for your great work!

Zahra

Ralf

djr djr

rodrigo abarca del rio

Excellent work. It helped me read data in different formats and skip the header lines ... in just a second, after I had tried for more than 4 hours with different methods. We are close to Fortran now :-)
thanks so much a lot for your work.

Andres T.

@wu zhiyong
Please see my posting in the newsgroup.

wu zhiyong

Great work!
Thanks for your suggestion!
It helps me read the data from text files while ignoring the characters. But I fail to ignore the blank lines.
Do you have any suggestion?

K AM

Fantastic script. Worked right out of the box. I used it to bring in oddly shaped data files and it worked like a dream! Thanks Andres!

Wladimir Alonso

It worked beautifully when I needed to read solar data tables (http://rredc.nrel.gov/solar/old_data/nsrdb/1991-2005/list_by_state.html) that contained headers, text and numbers (in this case using txt2mat(name_of_the_file,1,43,'ConvString',['%d-%d-%d,%d:%f' repmat(',%f',1,38)]) as kindly suggested by the author of the function)
thanks Andres!

Florian H

Excellent! Finally a function that handles textfiles without the hassle of the builtin MATLAB ones.

I've combined it with
http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=15294
to get a really useful drag-and-drop import function for various text formats.

vincenzo ficco

MATLAB Release Compatibility
Created with R2014a
Compatible with any release
Platform Compatibility
Windows macOS Linux
