How to code Categorical Variables in NARX neural network data input?
I am working on predicting electricity demand (load), and many of my inputs to the Neural Network Time Series (NARX) app are categorical variables: months (12 categories, spelled out January through December), days (seven categories, 1 - 7), and hours in each day (1 through 24). When I load my Excel data table and assign my variables as "Inputs", MATLAB cannot read or display my categorical variable "Months" because the values are spelled out January through December. Should I write a simple line of code such as the one below, or is there a different way to flag those variables as categorical for NARX neural networks? I prefer not to convert Months into 1-12 as Matlab will assume some scale (Month 12 is higher than Month 6, etc). Thank you in advance!
T.HE = categorical(T.HE);
T.MONTH = categorical(T.MONTH);
T.WEEKDAY = categorical(T.WEEKDAY);
3 Comments
Walter Roberson
on 3 Jan 2020
You will not be able to proceed with the MathWorks tools and will need to write your own. The MathWorks tools can only work with data that is all (orderable) numeric, or all categorical, or all cell arrays of character vectors.
Even if you were to switch to all categorical, you would have challenges: when you concatenate categorical arrays, the individual ranges lose their identity and a new categorical array is created that combines all of the categories, renumbering elements. The neural network would have no way of knowing that the second column could not simultaneously have Tuesday and March, for example.
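The category merging described above can be checked with a small sketch. The variable names and sample values here are made up for illustration and are not taken from the original data; the arrays are unordered (non-ordinal) categoricals.

```matlab
% Two separate unordered categorical columns (hypothetical sample values)
monthCol = categorical({'March'; 'June'});
dayCol   = categorical({'Tuesday'; 'Friday'});

% Horizontal concatenation builds ONE categorical array whose category
% list is the union of both inputs
combined = [monthCol, dayCol];
allCats  = categories(combined);   % contains months and weekdays together

% Nothing stops a month name from being stored in the "weekday" column,
% because 'March' is now a valid category for the whole array
combined(1, 2) = 'March';
```

After the concatenation, `combined` no longer records which categories belonged to which column, which is the loss of identity described in the comment.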
However, as I touched on in my Answer, I think you are making a mistake in trying to make the entries unordered. When you make them unordered you are saying that the second day of February has more predictive power for load on the second day of August than the first day of August has for the second day of August.
Accepted Answer
Walter Roberson
on 3 Jan 2020
T.MONTH_C = categorical(T.MONTH, {'January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December'}, 'ordinal', false);
T.HE_C = categorical(T.HOUR, 1:24, {'01:00', '02:00', '03:00', '04:00', ....... '24:00'}, 'ordinal', false);
T.WEEKDAY_C = categorical(T.WEEKDAY, 1:7, {'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'}, 'ordinal', false);
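As a minimal runnable sketch of the three conversions above: the small table below is hypothetical sample data, the hour column is assumed to be the numeric variable `T.HE` from the question, and the hour labels are generated with `compose` on the assumption that the elided labels follow the same 'HH:00' pattern.

```matlab
% Hypothetical three-row sample table using the question's variable names
T = table({'January'; 'June'; 'December'}, [1; 12; 24], [1; 3; 7], ...
    'VariableNames', {'MONTH', 'HE', 'WEEKDAY'});

monthNames = {'January', 'February', 'March', 'April', 'May', 'June', ...
    'July', 'August', 'September', 'October', 'November', 'December'};
dayNames   = {'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', ...
    'Saturday', 'Sunday'};
hourLabels = compose('%02d:00', 1:24);   % assumed pattern: '01:00' ... '24:00'

% Text months: the value set fixes the category order (January first)
T.MONTH_C   = categorical(T.MONTH, monthNames, 'Ordinal', false);
% Numeric codes: value set 1:24 / 1:7 mapped to readable category names
T.HE_C      = categorical(T.HE, 1:24, hourLabels, 'Ordinal', false);
T.WEEKDAY_C = categorical(T.WEEKDAY, 1:7, dayNames, 'Ordinal', false);
```

Supplying the value set explicitly keeps the categories in calendar order rather than alphabetical order, even though the arrays are declared non-ordinal.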
"I prefer not to convert Months into 1-12 as Matlab will assume some scale (Month 12 is higher than Month 6, etc)"
I do not know what part of the world you live in, but in the part of the world that I live in, the electrical demands of adjacent calendar months are strongly correlated. The relationship between the demands for January and February is much stronger than the relationship between the demands for January and June.
2 Comments
Walter Roberson
on 3 Jan 2020
If I understand correctly, you can pass in a categorical array. However, all entries in the array would draw from the same categorization so you would not be able to create one column of the array that was restricted to weekdays and another column that was restricted to month.
If you were to switch to the one-of-N 0/1 representation then you would be able to combine that with non-categorical columns.
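A minimal sketch of the one-of-N (0/1) representation, assuming the weekday and month are stored as numeric codes. The variable names and the temperature column are hypothetical and only show how indicator columns can sit next to an ordinary numeric input.

```matlab
% Hypothetical numeric codes for four observations
weekdayCode = [1; 3; 7; 3];        % 1-7
monthCode   = [1; 6; 12; 6];       % 1-12
temperature = [-5; 18; -10; 20];   % made-up non-categorical input

% One-of-N encoding: one 0/1 column per category.
% Comparing a column vector with a row vector relies on implicit
% expansion (R2016b or later).
weekdayOneHot = double(weekdayCode == 1:7);    % 4-by-7
monthOneHot   = double(monthCode == 1:12);     % 4-by-12

% Indicator columns combine freely with numeric columns
X = [weekdayOneHot, monthOneHot, temperature]; % 4-by-20
```

Each row has exactly one 1 in the weekday block and one 1 in the month block, so the two variables stay separate even though they share a single numeric matrix.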
However, as I have indicated above, I think that you are making a mistake.
There are several different kinds of electrical load. Loads such as cooking statistically peak around the same time every weekday, possibly with a different peak time on Sunday.
Loads such as laundry tend to be more cyclic, with an irregular period depending on family size and age (e.g. I tend to do laundry on Sunday, but families with small children might need to do laundry every day or two). You can try day-of-week predictions for this kind of load, but the correlation might not be so strong.
Then there is electricity for heating and cooling. The correlations for those are strong between adjacent calendar days: the weather tomorrow will not be all that different (on average) from the weather today. Very few places fluctuate randomly between +30C and -30C on a daily basis; instead, the +30 highs tend to cluster and the -30 lows tend to cluster.
In the part of the world that I live in, the highest electricity demands are in February because that is our coldest month and we have to heat a lot. There is also a notable July peak due to the need for cooling.
There are other parts of the world where the peak for the year is reliably local summer, because of the strong cooling requirements.
These weather-driven building heating and cooling requirements are by far the biggest predictors of electricity load in many places, and you would be making a mistake to convert all of your date information into unordered categoricals, because the seasonal hints are ordered.
Within stretches short enough to have much the same weather, you do get weekday-based and time-of-day cycles, with industrial use peaking during "working hours" for some industries (others work all night too), and non-heating residential use peaking at evening meal time (and again a little later for dishwasher use). So some cyclic analysis is good, but you need to know what you are analyzing.
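To illustrate what keeping the order means in practice, here is a small sketch with made-up sample values that stores months as an ordinal categorical, so calendar order is preserved and a numeric month index can be recovered.

```matlab
monthNames = {'January', 'February', 'March', 'April', 'May', 'June', ...
    'July', 'August', 'September', 'October', 'November', 'December'};

% Hypothetical sample months, declared ordinal (ordered)
m = categorical({'February'; 'August'; 'January'}, monthNames, ...
    'Ordinal', true);

% Ordinal categoricals support relational comparisons in category order
isEarlier = m(1) < m(2);     % February comes before August

% double() returns each element's position in the category list,
% i.e. a calendar month number usable as an ordered numeric input
monthIndex = double(m);
```

With an unordered categorical the `<` comparison is not defined, which is one way to see that the adjacency between neighbouring months has been discarded.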
More Answers (1)
SK
on 3 Jan 2020
4 Comments
Walter Roberson
on 4 Jan 2020
Yes, that makes sense. Version 2 corresponds to using unordered categories, and Version 1 corresponds to using ordered categories.