Find Data in Table

Question

0 votes

I have some data that includes both numbers and text. The text is inconsistent in which column the information is in. Is it possible to sort this table in some way please?

Here is some example data. How for example would I plot the Speed & Time of Apples and of Oranges please?

ID   Speed   Time   Fruit 1   Fruit 2   Fruit 3
-----------------------------------------------
1    3       9      Apple     Jam       Cake
2    6       5      Cake      Orange    Jam
3    3       4      Jam       Cake      Apple
4    5       2      Jam       Orange    Cake
5    9       4      Jam       Jam       Orange
6    5       3      Pear      Jam       Cake
7    3       5      Cake      Cake      Cake
8    2       3      Jam       Cake      Apple
9    5       3      Jam       Orange    Jam
10   7       9      Jam       Jam       Apple

22 Comments

dpb on 4 May 2017

Edited: dpb on 4 May 2017

So far, you're just trying to look up data in an unnormalized database. The mushing-together of two different characteristics into a single field is the problem; 'JamApple' should be recoded in the database rather than trying to write special code to search all these different storage forms. Same thing in putting a given characteristic in different fields (columns); each field should be of a specific type and only of that type. The process of rearranging the structure of a database itself and then correcting entries in a database to follow those rules is known as "normalization".

"... how "CTRL F" works in Calc..."

I have absolutely no idea what that means??? If you're talking about a text search in some application, then that's the "solution" I spoke of before of storing everything as cellstr or string and doing character searches instead, but it's still going to be problematic if there's no consistency in the underlying database.

You'll have the same or similar problems no matter what programming language you use as long as the data are inconsistent.

OK, R2017a supports the string class; you can experiment and see how well string searching works in a table. I'm limited to R2012b so can't try it out...see the doc for it, for which string-searching functions support the new class so far, and for whether it can be used in a table.

ADDENDUM

Indeed, table does support strings, and there's even a new search function for them, contains, that looks like it would allow you to finesse your way past the dilemma by retaining the data as strings. I still wouldn't recommend this direction in general if this is going to grow into something that is really used and is going to have any significant amount of effort put into it going forward; it's just going to get more and more convoluted as it evolves if so. If it's just a toy app, that's something else again, certainly, but the idea of "worth doing, worth doing well" comes to mind even there, at least to me...
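A minimal sketch of that string-search route, assuming R2016b or later (for string and contains). The little table and its variable names are made up for illustration, including one mushed-together 'JamApple' entry of the kind mentioned above; this is not the poster's actual data.

```matlab
% Hypothetical sample: two text columns, one entry with two items fused together
Fruit1 = {'Apple'; 'Cake'; 'Jam'; 'Pear'};
Fruit2 = {'Jam'; 'Orange'; 'JamApple'; 'Jam'};
t = table((1:4)', Fruit1, Fruit2, 'VariableNames', {'ID','Fruit1','Fruit2'});

% Pull the text columns out as a 4x2 string array
s = string(t{:, {'Fruit1','Fruit2'}});

% contains does a substring match, so 'JamApple' is found as well as 'Apple'
ix = any(contains(s, 'Apple'), 2);   % true for rows with a hit in any column

t(ix, :)                             % rows 1 and 3
```

The substring match is what lets this get past inconsistent entries, but it is also the weakness: it will just as happily match 'Pineapple'.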

dpb on 4 May 2017

Edited: dpb on 4 May 2017

Well, if you have any control at all over this third party data it would behoove you to get it cleaned up on that end first. It's one thing to have all this jumbled-up stuff while there's still such a small amount you can visually look at each and every member and recognize where there are inconsistencies and heuristically code your way around them. When it gets large enough, that will no longer be practical at best and problematic or impossible at worst.

In that case, you could theoretically even be jeopardizing your degree by not solving the data storage issues up front; at a minimum I foresee making your task much more difficult and losing much sleep over detailed coding issues that have nothing whatsoever to do with the dissertation topic but are simply tedious and time-wasting exercises rather than research.

Perhaps I'm being overly cautious here, but I'd surely advise spending some serious time considering these data and what your research is going to require to be able to reliably extract from them to accomplish the objective in some depth, rather than just finding some "tricks" to work around the issues this little sample illustrates. Where there's one nit here, there are bound to be a bunch more you've not yet seen when you get the whole thing.

I'd suggest doing a search on the phrase "database normalization" and reading some of the articles to at least get a general idea of what is meant and some of the issues that occur when data are unnormalized.

You may not need to build a full relational database, but understanding what one is, and how having one avoids problems such as this one of having to write specialty search expressions because the data aren't symmetric, will help you work out the minimum set of such rules you'll have to impose in order to be able to retrieve the data needed.

In the above research, you'll probably also be led to "NoSQL databases", which are not relational; many of them are "key-value stores", i.e., data are stored with a key/lookup relation. With what you have here, if it is at all representative of the actual data, that may be more in the direction you want to consider in your storage scheme. MATLAB supports a map class (containers.Map) which allows for this kind of organization; you should probably look at it and compare the difference between it and the table.
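To make the key/lookup idea concrete, here is a rough sketch using containers.Map, with the first three rows of the sample data. The name idsByItem and the choice of "item name -> list of IDs" as the mapping are only assumptions for illustration; the real data may want a different key.

```matlab
% First three rows of the sample data
ID    = [1; 2; 3];
fruit = {'Apple','Jam','Cake'; 'Cake','Orange','Jam'; 'Jam','Cake','Apple'};

% Map from item name (char key) to the IDs of the rows containing it
idsByItem = containers.Map('KeyType', 'char', 'ValueType', 'any');

for r = 1:size(fruit, 1)
    items = unique(fruit(r, :));          % each distinct item in this row
    for k = 1:numel(items)
        key = items{k};
        if isKey(idsByItem, key)
            idsByItem(key) = [idsByItem(key) ID(r)];   % append this ID
        else
            idsByItem(key) = ID(r);                    % first occurrence
        end
    end
end

appleIDs = idsByItem('Apple')   % [1 3], no column search needed
```

Once the map is built, the lookup no longer cares which column an item was stored in, which is the whole point of the comparison with the table.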

I still think, however, that the basic data as you've shown it here needs some pretty heavy manipulation to make fields and values far more consistent and less overlapping or you're likely to run into real difficulties as you proceed.

Hope that helps outline some of the issues that you can look into before getting too deeply invested in any one technique that turns out to be a dead end.

dpb on 10 May 2017

Edited: dpb on 10 May 2017

any reduces the logical array to a logical (T/F) vector. Here, note the optional trailing argument (the ",2" in any(...,2)) that makes it work by row (2nd dimension) instead of the default, by column; hence the result is T for each row matching the condition, and that vector selects from the overall table those rows matching whatever the condition was.

Thus you get every row that has whatever you were looking for in any column of the searched area. The alternative here is all, which would require the match to occur in every column of a row to return T.
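A small illustration of the difference, using rows 1, 6 and 7 of the sample data and searching for 'Cake' (the variable names are just for this sketch):

```matlab
% Rows 1, 6 and 7 of the sample data
fruit = {'Apple','Jam','Cake'; 'Pear','Jam','Cake'; 'Cake','Cake','Cake'};

hit = strcmp(fruit, 'Cake');   % 3x3 logical array, one element per cell

anyRow = any(hit, 2);   % [1;1;1]  every row has Cake somewhere
allRow = all(hit, 2);   % [0;0;1]  only the last row is Cake in every column
anyCol = any(hit);      % [1 1 1]  default works down the columns instead
```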

Look up "logical addressing" in "Getting Started" documentation for further details and understanding of how important it is in Matlab in general and particularly for such activities as you're undertaking.

Hope the ability to rewrite the data into something more closely approximating a database is coming along so these convoluted searches can go away.

If you don't mind my asking, are these examples really representative of actual data and if so, just what is it they come from and what meaning/knowledge is to be derived therefrom? Looks like random word generator with restricted vocabulary so far... :)

Roger Cox on 18 May 2017

I'm interested in Condition Based Maintenance. This could increase the capacity factor achieved for a constant amount of maintenance work.

As you suggest, energy storage is another interesting area.

dpb on 18 May 2017

Interesting...spent the last 10 or so years of my career with CSI (Computational Systems, Inc.), for whom predictive maintenance is the core business...they were just sold to Emerson Electric when I left to return to the family farm; they are now pretty much folded into Emerson so don't show up as CSI externally any longer ... <Emerson-embeds-prediction-data> might lead to some useful white papers, etc., etc., etc., ... <machinery-health-management>

Last piece I worked on was the wireless accelerometer, an industry first at the time...


Answer 1

dpb on 3 May 2017

1 vote

Using categorical for the string columns has some advantages. I fixed up your listing above to be able to read as a table via

>> t=readtable('cox.dat');
>> whos t
  Name       Size            Bytes  Class    Attributes
  t         10x6              2848  table              
>> t.Properties
ans = 
             Description: ''
    VariableDescriptions: {}
           VariableUnits: {}
          DimensionNames: {'Row'  'Variable'}
                UserData: []
                RowNames: {}
           VariableNames: {'ID'  'Speed'  'Time'  'Fruit1'  'Fruit2'  'Fruit3'}
>>

With that it's easy enough to write

  >> ix=any(ismember(t{:,4:6},{'Apple';'Orange'}),2);  % find any row with apple or orange
  >> t(ix,:)                                           % display those found
  ans = 
      ID    Speed    Time    Fruit1    Fruit2    Fruit3
      __    _____    ____    ______    ______    ______
       1    3        9       Apple     Jam       Cake  
       2    6        5       Cake      Orange    Jam   
       3    3        4       Jam       Cake      Apple 
       4    5        2       Jam       Orange    Cake  
       5    9        4       Jam       Jam       Orange
       8    2        3       Jam       Cake      Apple 
       9    5        3       Jam       Orange    Jam   
      10    7        9       Jam       Jam       Apple 
  >>

Use the logical index vector to access any other variable of interest.
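For the plot asked about in the question, something along these lines should do. It rebuilds the sample table inline rather than reading 'cox.dat', and the names AP and OR are just convenient choices for the two index vectors.

```matlab
% Rebuild the sample data from the question
ID     = (1:10)';
Speed  = [3 6 3 5 9 5 3 2 5 7]';
Time   = [9 5 4 2 4 3 5 3 3 9]';
Fruit1 = {'Apple';'Cake';'Jam';'Jam';'Jam';'Pear';'Cake';'Jam';'Jam';'Jam'};
Fruit2 = {'Jam';'Orange';'Cake';'Orange';'Jam';'Jam';'Cake';'Cake';'Orange';'Jam'};
Fruit3 = {'Cake';'Jam';'Apple';'Cake';'Orange';'Cake';'Cake';'Apple';'Jam';'Apple'};
t = table(ID, Speed, Time, Fruit1, Fruit2, Fruit3);

% One logical index vector per item, searching all three text columns
fruit = t{:, {'Fruit1','Fruit2','Fruit3'}};   % 10x3 cell array of char vectors
AP = any(strcmp(fruit, 'Apple'),  2);         % rows containing Apple
OR = any(strcmp(fruit, 'Orange'), 2);         % rows containing Orange

% Speed vs Time for each group
plot(t.Speed(AP), t.Time(AP), 'o', t.Speed(OR), t.Time(OR), 'x')
xlabel('Speed'), ylabel('Time'), legend('Apple', 'Orange')
```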

It would be better going forward, of course, to normalize the database to put the three disparate properties in their own variable.
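One possible shape for that, sketched under the assumption that a "long" layout (one row per ID/item pair, kept apart from the measurements) suits the real data; the table names items and meas are hypothetical. It also uses categorical for the text, as suggested at the top.

```matlab
% Rebuild the sample data from the question
ID     = (1:10)';
Speed  = [3 6 3 5 9 5 3 2 5 7]';
Time   = [9 5 4 2 4 3 5 3 3 9]';
Fruit1 = {'Apple';'Cake';'Jam';'Jam';'Jam';'Pear';'Cake';'Jam';'Jam';'Jam'};
Fruit2 = {'Jam';'Orange';'Cake';'Orange';'Jam';'Jam';'Cake';'Cake';'Orange';'Jam'};
Fruit3 = {'Cake';'Jam';'Apple';'Cake';'Orange';'Cake';'Cake';'Apple';'Jam';'Apple'};
t = table(ID, Speed, Time, Fruit1, Fruit2, Fruit3);

% Measurements keyed by ID, one row per ID
meas = t(:, {'ID','Speed','Time'});

% Items in long form: one row per (ID, Item) pair, whichever column it came from.
% Repeats within a row (e.g. Jam twice) appear twice here.
items = table(repmat(t.ID, 3, 1), categorical([t.Fruit1; t.Fruit2; t.Fruit3]), ...
              'VariableNames', {'ID','Item'});

% A search is now a plain comparison on one variable
appleIDs = items.ID(items.Item == 'Apple');
apples   = meas(ismember(meas.ID, appleIDs), :)   % rows with ID 1, 3, 8, 10
```

With the items in a single variable there is nothing column-specific left to search, so the any(...,2) step disappears.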

5 Comments

Peter Perkins on 8 May 2017

When accessing one variable at a time, Obi Wan might also suggest using the dot:

plot(TT3.Speed(OR),TT3.Time(OR),'o',TT3.Speed(AP),TT3.Time(AP),'o')

But even with curlies, names are nice:

plot(TT3{OR,'Speed'},TT3{OR,'Time'},'o',TT3{AP,'Speed'},TT3{AP,'Time'},'o')

dpb on 9 May 2017

Ever so, may be it can... :)
