How to convert HTML report to Excel

Hello,
We have a report in HTML that is generated by Polyspace, and we wish to convert it to Excel format. The problem we have faced is that the HTML page has some data in tables, and when we try to read it, none of the table data is accessible; we are able to read only the text data.
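%% Read in HTML file.
% uigetfile returns the file name and its folder separately; join them so
% the file is found even when it is not in the current folder.
[fileHTML, folderHTML] = uigetfile('*.html');
filenameHTML = fullfile(folderHTML, fileHTML);
txt = fileread(filenameHTML);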
%% Remove HTML tags, header text, and last section (pertaining to images).
txt = regexprep(txt,'<script.*?/script>','');
txt = regexprep(txt,'<style.*?/style>','');
txt = regexprep(txt,'<.*?>','');
txt = regexprep(txt,'.*#\n','');
txt = regexprep(txt,'--.*?\n','');
txt = regexprep(txt,'\n\n.*','');
%% Set up the delimiter and format specification to read columns of data as text.
delimiter = {' = '};
formatSpec = '%q%q%[^\n\r]';
%% Read columns of data according to the format.
dataArray = textscan(txt, formatSpec, 'Delimiter', delimiter);
% Optional arguments, currently disabled: 'TextType', 'char', 'ReturnOnError', false
raw = repmat({''},length(dataArray{1}),length(dataArray)-1); % preallocate before the loop
for col = 1:(length(dataArray)-1)
    raw(1:length(dataArray{col}),col) = dataArray{col};
end
%% Write data to Excel spreadsheet.
filenameSpreadsheet = 'Example.xlsx';
xlswrite(filenameSpreadsheet,raw)

3 Comments

Attach the text you're trying to parse; there's not much anybody can do without the data to see what's in the file.
You could start by attaching the resulting txt as a .mat file, presuming the code above strips only what is intended and the needed data really is left in the remainder.
If that isn't so, then we would need the original file...
The attached file is the original file from which I'm trying to extract data. Please find the attachment.
Well, the following comment at the bottom of the file pretty much tells the story, it would appear:
<!-- This template library is designed to work with the JavaScript TOC and autonumber
scripts included in this template. The Javascript autonumber script replaces the
autonumber elements used in this template with actual numbers when the report generated
from this template is loaded into a browser. The autonumber script implements the
autonumber behavior defined by the DOM AutoNumber class. -->
There isn't any table data in the file; it's all dynamic on the server to populate the browser view. Looks to me like the implementor would have to have provided a "Download" function or you would have to scrape the page to get the actual displayed data.
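For what it's worth, if the tables do turn out to be present as literal markup (for instance in a copy of the page saved from the browser after its scripts have run, which is an assumption and has not been checked against the attached file), a minimal sketch along these lines could pull the cells out before the tags are stripped. The sample string and the variable names are made up for illustration; only base MATLAB functions (regexp, regexprep, strtrim, cellfun) are used.

```matlab
% Sketch: extract <table> rows from raw HTML BEFORE stripping the tags.
% Assumes the table markup is literally present in the text (not verified
% for the Polyspace report). The sample below is invented for illustration.
html = ['<table><tr><th>Check</th><th>Result</th></tr>' ...
        '<tr><td>Overflow</td><td>Green</td></tr></table>'];

% Quick diagnostic: how many <table> tags does the text contain at all?
numTables = numel(regexp(html, '<table(?:\s[^>]*)?>', 'ignorecase'));

% One token per table row; "." also matches newlines by default in MATLAB.
rows = regexp(html, '<tr(?:\s[^>]*)?>(.*?)</tr>', 'tokens', 'ignorecase');

raw = {};                                  % cell array in the shape xlswrite expects
for r = 1:numel(rows)
    % One token per <td> or <th> cell in this row.
    cells = regexp(rows{r}{1}, '<t[dh](?:\s[^>]*)?>(.*?)</t[dh]>', ...
                   'tokens', 'ignorecase');
    for c = 1:numel(cells)
        % Drop nested tags and surrounding whitespace from the cell text.
        raw{r, c} = strtrim(regexprep(cells{c}{1}, '<[^>]*>', ''));
    end
end

% Cells never assigned (rows of unequal length) are []; make them empty text.
raw(cellfun(@isempty, raw)) = {''};

% xlswrite(filenameSpreadsheet, raw)       % same call as in the question
```

If numTables comes back as 0 for the real file, the data is not in the HTML and this approach will not help. Note also that the '<.*?>' replacement in the original code discards exactly the row and cell boundaries this sketch relies on, so it has to run on the unstripped text.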

Answers (0)

Release

R2016b

Asked:

on 23 Jul 2020

Commented:

dpb
on 24 Jul 2020
