Why does the "parquetwrite" function yield an error when used to write a single-column table containing mixed primitive types in MATLAB R2023a?

I am trying to write a single column table of mixed primitive types wrapped in cells to a parquet file. Since everything in the table is still a "cell", my understanding is that this should work. My code is as follows:
cellTable = table({1, [1,2,3], "hello", ["hi", "bye"]}')
parquetwrite(filename, cellTable)
On running the above code, I get this error message:
Error using parquetwrite
T.Var1{3} is a string array. Based on T.Var1{1}, expected either a double array or a scalar <missing> value.
Is this behavior expected?

Accepted Answer

MathWorks Support Team on 16 Aug 2023
The "parquetwrite" function is working as intended.
It is not possible to write mixed primitive types in one table variable to a Parquet column, because Parquet columns are strongly typed. When a variable is a cell array, we map it to the Parquet LIST type, which is represented by two arrays in Parquet: the data array and an index array. The index array tells you how to partition the data array into rows.
For instance, if you have this cell array:
>> cellArray =
  3×1 cell array
    {[    1]}
    {[2 3 4]}
    {[  5 6]}
This gets mapped to a Parquet LIST column with this data array and index array:
>> data = [1 2 3 4 5 6]
>> index = [0 1 4 6] % note that Arrow uses 0-based indexing
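The data and index arrays above can be reproduced by hand in MATLAB. The following is only an illustrative sketch of the layout; it is not how "parquetwrite" is implemented internally, and the variable names are chosen to match the example.

```matlab
% Illustrative only: mimics the LIST layout described above.
cellArray = {1; [2 3 4]; [5 6]};

% Concatenate the elements of every row into one contiguous array.
% This only works cleanly because every cell holds the same type (double).
data = [cellArray{:}];

% 0-based offsets marking where each row starts and ends inside data.
rowLengths = cellfun(@numel, cellArray);
index = [0; cumsum(rowLengths)]';
```

Row k of the original cell array corresponds to data(index(k)+1 : index(k+1)) once the 0-based offsets are shifted to MATLAB's 1-based indexing.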
In other words, the data is stored in a contiguous array, so it cannot contain mixed primitive types. This is why we cannot write a cell array containing both doubles and strings to a Parquet column.
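One possible workaround, sketched here under assumptions rather than taken from the answer above, is to convert every cell to the same type before writing, for example by turning all values into string arrays. This assumes that a cell array whose cells are all string arrays is accepted as a LIST column by "parquetwrite" in your release, and that losing the numeric type is acceptable. The file name "mixed.parquet" is hypothetical.

```matlab
% Assumption: all cells converted to string arrays form a homogeneous
% LIST column. Numeric values are stored as text, e.g. 1 becomes "1".
mixed = {1; [1,2,3]; "hello"; ["hi", "bye"]};

% Convert each cell's contents to a string array of the same size.
asStrings = cellfun(@string, mixed, 'UniformOutput', false);

cellTable = table(asStrings);

% Hypothetical output file name.
parquetwrite("mixed.parquet", cellTable);
```

If the numeric values need to keep their type, an alternative is to split the data into two table variables, one holding only doubles and one holding only strings.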

Release

R2023a