Case insensitivity for keys in outerjoin and ismember?

10 views (last 30 days)
Is there any way to have "outerjoin" ignore the letter case of string "key" fields (or cell arrays of characters) in "outerjoin" (or any join, for that matter)? What about for string comparisons by "ismember"?
If not, I have to create dummy columns of my comparison fields, with all text coerced into either lower or upper case, for the purpose of joins and ismember. I'd like to avoid this extra noise in my code and my tables.

Accepted Answer

Eric Sofen
Eric Sofen on 4 Jan 2023
We've had a variety of requests for a "lenient" join, but there's also a variety of different interpretations of "lenient." :-) We're still trying to figure out the best way to support this sort of thing.
For the time being, adding variables to your table with standardized text is the best way to solve the problem, as you're already doing.
  1 Comment
FM
FM on 4 Jan 2023
Edited: FM on 4 Jan 2023
Thanks, Eric.
In SQL, it's easy to define lenient joins on-the-fly by using a WHERE clause instead of an ON clause to define the join condition between two tables, e.g., "WHERE SomeBooleanExpression". In most cases, "SomeBooleanExpression" is Table1.KeyField = Table2.KeyField, but it can be anything, e.g., Table1.Field1 > Table2.Field2 + Table1.Field3. This can result in many-to-many matches between Table1 and Table2.
We can use such flexibility for case-insensitive matches, e.g., UPPER(Table1.KeyField) = UPPER(Table2.KeyField), where UPPER is any function provided by the relational database software to convert text to upper case.
Afternote: It has been a long time since I fired up relational database software, but from browsing online, I can't rule out the possibility that the ON clause can be used with join conditions that are more complex than simple equivalence between the "key" fields of two tables. This can lead to many-to-many matches. It would probably take me half a day to figure out the buttonology to test this on Microsoft Access or MySQL, so I'll just add this as a caveat to my comment above.
Regardless of whether the ON or WHERE clauses are used, such many-to-many joins are conceptually equivalent to matching each record of one table to every record of the other table (a so-called Cartesian join), then filtering according to the WHERE/ON conditions. The Cartesian join increases the record count immensely to MxN, where M and N are the record counts of the two joined tables. SQL engines work hard, however, using the join and filtering conditions to avoid having to actually realize the full Cartesian join.
I can see why TMW might be thinking carefully about how to support lenient joins. If they go the route of SQL, the bookkeeping needs to avoid realizing a full Cartesian join in order to be practial. This kind of optimization is highly nontrivial.

Sign in to comment.

More Answers (0)

Products


Release

R2022a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!