Parsing Through HTML Style .txt File

Adam Holland on 13 Jun 2024
Commented: Ganesh on 14 Jun 2024
I'm trying to read in a .txt file containing a series of messages that describe what is happening during a sporting event. The general format seems to be based on HTML, but the file is given as .txt. I'm able to open the file and save its contents to a variable using fscanf (I'm not sure this is the most appropriate way for this application). I'm having trouble wrapping my head around how to search for the blocks that I want to record for further analysis. For example, one of the things I'll be doing is looking for a block like the one below.
<FLAG>
<ID>e9f40b4b-decd-4b47-ae83-e6f9c972c570</ID>
<DT>01.07.2023 16:30:00.000</DT>
<FL>Green</FL>
<LV>Track</LV>
<FI />
</FLAG>
So in plain English, I want to find every occurrence of <FLAG> and record the time of day, which is denoted by <DT>. I'm pretty rusty with MATLAB and never did much reading from .txt files even when I was at my best, so I think once I get the hang of reading and searching through the file I'll be able to figure out the rest from there.

Answers (2)

Ganesh on 13 Jun 2024
Edited: Ganesh on 13 Jun 2024
Consider using the "readstruct()" function in MATLAB. It reads an XML file and parses it into a structure. Passing FileType="xml" tells it to treat the .txt file as XML. An example is below:
a = readstruct('Data.txt',FileType="xml")
a = struct with fields:
    FLAG: [1x2 struct]
% You can now access the fields like any normal MATLAB structure
a.FLAG(1)
ans = struct with fields:
    ID: "e9f40b4b-decd-4b47-ae83-e6f9c972c570"
    DT: 01.07.2023 16:30:00.000
    FL: "Green"
    LV: "Track"
    FI: ""
for i=1:length(a.FLAG)
    disp(num2str(i)+ " -- "+a.FLAG(i).ID)
end
1 -- e9f40b4b-decd-4b47-ae83-e6f9c972c570
2 -- e9f40b4b-decd-4b47-ae83-e6f9c972c571
You can read more about readstruct in the MATLAB documentation.
  3 Comments
Adam Holland on 13 Jun 2024
Thanks so much! I really like how doing it this way breaks the data down into a structure variable. For some reason it says that my file isn't in a valid XML format, but I'll keep this method in mind for future projects.
Ganesh on 14 Jun 2024
Please refer to the attached ".txt" file. Your file may be missing a single root tag that encloses all of the data.
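If the file really is a bare sequence of <FLAG> blocks with no enclosing element, one possible workaround is to wrap the text in a root tag before handing it to readstruct. The sketch below is only an illustration under stated assumptions: the root name ROOT and the temporary file are placeholders, the sample text stands in for the real file (its second block is made up), and it assumes the file has no <?xml ...?> declaration and no stray characters such as a bare & that would still make it invalid XML.

```matlab
% Sample text standing in for the file contents; in practice this would be
% raw = fileread('your_file.txt');  (the second block here is made up)
raw = strjoin({ ...
    '<FLAG>', ...
    '<DT>01.07.2023 16:30:00.000</DT>', ...
    '<FL>Green</FL>', ...
    '</FLAG>', ...
    '<FLAG>', ...
    '<DT>01.07.2023 16:45:00.000</DT>', ...
    '<FL>Yellow</FL>', ...
    '</FLAG>'}, newline);

% Wrap everything in one root element (the name ROOT is arbitrary).
wrapped = ['<ROOT>' newline raw newline '</ROOT>'];

% Write the wrapped text to a temporary .xml file and parse that instead.
tmpFile = [tempname '.xml'];
fid = fopen(tmpFile, 'w');
if fid == -1
    error('Could not create temporary file');
end
fprintf(fid, '%s', wrapped);
fclose(fid);

s = readstruct(tmpFile);   % file type is inferred from the .xml extension
delete(tmpFile);

% Each <FLAG> block becomes one element of s.FLAG.
for k = 1:numel(s.FLAG)
    disp(s.FLAG(k).DT)
end
```

If the file holds other kinds of blocks besides <FLAG>, they should show up as additional fields of s alongside FLAG.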


Hassaan on 13 Jun 2024
% Open the file
fileID = fopen('your_file.txt', 'r');
if fileID == -1
    error('Could not open file');
end
% Read the entire file as one character vector
fileContent = fscanf(fileID, '%c');
fclose(fileID);
% Find all occurrences of <FLAG> blocks
flagStart = strfind(fileContent, '<FLAG>');
flagEnd = strfind(fileContent, '</FLAG>');
for i = 1:numel(flagStart)
    % Extract the FLAG block
    flagBlock = fileContent(flagStart(i):flagEnd(i)+6); % +6 to include all of </FLAG>
    % Find and extract the time of day (DT)
    dtStart = strfind(flagBlock, '<DT>') + 4;
    dtEnd = strfind(flagBlock, '</DT>') - 1;
    timeOfDay = flagBlock(dtStart:dtEnd);
    % Display or save the time of day
    disp(['Time of day: ' timeOfDay]);
end
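As a possible variation on the loop above, the <DT> values could also be pulled out in one pass with regexp and converted to datetime for later analysis. This is a sketch rather than a tested drop-in: the sample text stands in for fileContent (its second block is made up), and the 'dd.MM.yyyy' input format assumes the dates are day-first, which the original post does not confirm.

```matlab
% Sample text standing in for fileContent as read from the file
% (the second block is made up for illustration).
fileContent = strjoin({ ...
    '<FLAG>', ...
    '<ID>e9f40b4b-decd-4b47-ae83-e6f9c972c570</ID>', ...
    '<DT>01.07.2023 16:30:00.000</DT>', ...
    '<FL>Green</FL>', ...
    '</FLAG>', ...
    '<FLAG>', ...
    '<DT>01.07.2023 16:45:00.000</DT>', ...
    '<FL>Yellow</FL>', ...
    '</FLAG>'}, newline);

% Capture the text between <DT> and </DT> inside each <FLAG> block.
% The lazy .*? stops at the nearest following tag; in MATLAB, '.' also
% matches newlines by default, so a block may span several lines.
tokens = regexp(fileContent, '<FLAG>.*?<DT>(.*?)</DT>.*?</FLAG>', 'tokens');

% Flatten the nested token cells into a cell array of character vectors.
dtText = [tokens{:}];

% Convert to datetime. Assumption: dates are day.month.year.
flagTimes = datetime(dtText, 'InputFormat', 'dd.MM.yyyy HH:mm:ss.SSS');
flagTimes.Format = 'dd.MM.yyyy HH:mm:ss.SSS';
disp(flagTimes)
```

Note that a <FLAG> block without a <DT> entry would let the pattern run into the next block, so the explicit loop is the safer choice if some blocks may be incomplete.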
  2 Comments
Adam Holland on 13 Jun 2024
Works perfectly. Thanks so much!
Hassaan on 13 Jun 2024
@Adam Holland You are welcome.


Release

R2021a
