Explore1

Version 1.0.0 (1.69 MB) by ArchNW

Uses CSV-based data to generate a suite of data exploration statistics and graphs, and then automatically writes new data back to a CSV file.

Updated 26 Sep 2023

Explore1 imports CSV data, calculates a complete set of data summary statistics and basic hypothesis tests, and generates associated graphs. Optionally, Explore1 can write the generated result tables back to a CSV file.

Example 1: Output1 = explore1(test_Choice, input_Data)

Example 2: Output1 = explore1()

test_Choice – a selection from the explore1() function user input menu (see below)

input_Data – a CSV file name (see below)

Output1 – a MATLAB structure containing all tables generated during a run, each with a brief identifying description

Introduction

The Explore1 functions were created with the overall goal of allowing researchers to quickly begin to understand a given dataset. Once collected, data is often stored in one of a few basic file types; probably the most common are comma-separated values (CSV) files and Excel spreadsheets. Similarly, in scientific research there are a number of basic statistical procedures that one should complete to describe a dataset. These procedures (often referred to as summary statistics) form the basis for further investigation of the data. There are also a number of basic hypothesis tests that can be used to begin to understand the underlying structure of a dataset. Explore1 was created with these ideas in mind: it takes data stored in CSV files and calls a series of related functions that carry out these basic first steps of analysis. Its functions and scripts accomplish several tasks. These include:

1) Import data from a CSV (Excel) file.

2) Calculate a selection of basic summary and normality testing statistics.

3) Perform basic hypothesis tests.

4) Write and save resulting tables to a new Excel file.

To further the overall goal of reaching a firmer understanding of a dataset, Explore1 gives the user options to:

5) Generate graphs for associated statistics.

6) Calculate bootstrap confidence intervals.
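
The description does not say which bootstrap method Explore1 uses for its confidence intervals. As a rough illustration only, the Python sketch below (not Explore1's MATLAB code) implements a basic percentile bootstrap for the mean; the function name bootstrap_ci, the 2,000 resamples, and the fixed seed are all assumptions chosen for the example.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data).

    Assumption: percentile method; Explore1's actual bootstrap variant is not documented here.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same interval
    n = len(data)
    # Resample the data with replacement n_boot times, recomputing the statistic each time
    boot = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    # The alpha/2 and 1 - alpha/2 empirical quantiles of the resampled statistics form the interval
    lower = boot[int(alpha / 2 * n_boot)]
    upper = boot[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


sample = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1, 4.9, 6.2, 3.9]
low, high = bootstrap_ci(sample)
print(f"95% percentile bootstrap CI for the mean: ({low:.2f}, {high:.2f})")
```

Passing a different function as stat (for example statistics.median) gives an interval for that statistic instead of the mean.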

In general, Explore1 calculates summary statistics on a complete dataset imported from a CSV file. The data can be interval (measurement) data or counts of observations (counts can be generated automatically from a list of observations). All statistics are organized into tables. Additionally, Explore1 performs analysis (summary statistics and hypothesis testing) and generates associated graphs focused on one Grouping Variable and, optionally, one Splitting Variable. These are the variables that determine how the data is summarized and tested. For example, if “Yet Another Nominal Variable” from Example 1 (see Data Structure below) were selected as the Grouping Variable, summary statistics would be generated for Code1 and Code2, and hypothesis tests, including a t-test, would compare the two subdivisions.

For interval-based analysis, the calculations use the values of the selected interval variable (Interval Variable X) for all observations in each Code. For count-based analysis, the observations are first transformed into count data: the original dataset is automatically binned based on an organizing variable, summary statistics are generated for each Code’s counts, and hypothesis tests then compare the counts for each Code across bins.

Finally, if both Grouping and Splitting Variables have been identified, Explore1 combines them into a new variable whose subdivisions are all combinations of the two parent variables’ subdivisions, and the analysis is then carried out on this new hybrid variable. After calculating all statistics, Explore1 can write tables containing test results, along with associated figures, to an Excel file.
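
The Group-Split combination and count binning can be sketched roughly as follows. This is a hypothetical Python illustration, not Explore1's implementation: the record layout (grouping value, splitting value, organizing value), the equal-width bins, the bin_width parameter, and the "Code1|A" labeling scheme are all assumptions, since the text does not specify how Explore1 bins data or names the hybrid subdivisions.

```python
from collections import Counter
from itertools import product


def group_split_label(group, split):
    """Hypothetical label for one subdivision of the combined Group-Split variable."""
    return f"{group}|{split}"


def group_split_counts(records, bin_width):
    """Count observations per Group-Split subdivision in equal-width bins.

    records: list of (grouping_value, splitting_value, organizing_value) tuples.
    Assumption: bins are equal-width intervals of the organizing variable.
    """
    groups = sorted({g for g, _, _ in records})
    splits = sorted({s for _, s, _ in records})
    # One hybrid subdivision for every Grouping x Splitting combination
    labels = [group_split_label(g, s) for g, s in product(groups, splits)]
    bin_ids = [int(x // bin_width) for _, _, x in records]
    bins = list(range(min(bin_ids), max(bin_ids) + 1))  # include empty bins
    tally = Counter(
        (group_split_label(g, s), b) for (g, s, _), b in zip(records, bin_ids)
    )
    # Counter yields 0 for unseen keys, so empty cells appear as zero counts
    table = {label: [tally[(label, b)] for b in bins] for label in labels}
    return table, bins


observations = [
    ("Code1", "A", 0.4), ("Code1", "A", 0.7), ("Code1", "B", 1.7),
    ("Code2", "A", 2.2), ("Code2", "A", 0.9), ("Code1", "A", 2.8),
    ("Code2", "B", 1.1),
]
table, bins = group_split_counts(observations, bin_width=1.0)
for label, row in table.items():
    print(label, dict(zip(bins, row)))
```

Each hybrid subdivision gets a full row of counts (for example, Code1|A {0: 2, 1: 0, 2: 1}), so comparisons across bins always line up, even when a combination has no observations in some bin.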

**********************************************************************************************

Statistics

Explore1 calculates statistics and generates graphs for the overall dataset, for each selected interval measure, and/or for binned count data. Summary statistics are also calculated for each subdivision of the Grouping, Splitting, and Group-Split variables. Explore1 calculates the following statistics:

1) General Summary Statistics:

a. Number of observations

b. Sum

c. Mean

d. Standard Deviation

e. Minimum

f. 25th Percentile

g. Median

h. 75th Percentile

i. Maximum

j. Standard Error

k. Variance

l. Skewness

m. Standard Error of Skewness

n. Kurtosis

o. Standard Error of Kurtosis

p. Coefficient of Variation

2) Absolute Deviation

a. Mean Absolute Deviation

b. Median Absolute Deviation

3) Normality

a. Shapiro-Wilk W

b. Shapiro-Francia W’

c. Anderson-Darling

d. Kolmogorov-Smirnov (one sample)

e. Jarque-Bera

4) Hypothesis Testing

a. T-test

b. Permutation t-test

c. Mann-Whitney U

d. Two-Sample Kolmogorov-Smirnov

e. Fligner-Killeen

i. Conover Variation

ii. Donnelly-Kramer Variation

f. ANOVA

g. Tests for Equality of Variances:

i. Bartlett’s

ii. Levene Absolute

iii. Brown-Forsythe

iv. O’Brien

h. ANOVA Post-hoc

i. Tukey-Kramer

ii. Bonferroni

iii. Dunn-Sidak

iv. Scheffe

i. Kruskal-Wallis

j. Mann-Whitney Post-Hoc

5) Effect Size Measure

a. Cohen’s d

b. Glass’ Delta 1

c. Glass’ Delta 2

d. Hedges’ g

e. R-effect

f. Eta

g. Eta Squared

h. Omega Squared

i. Epsilon-Squared

6) Chi-Square

a. Contingency Table

i. Observed

ii. Expected

iii. Residual

iv. Standardized Residual

v. Adjusted Residual

b. Chi-Square

c. Likelihood Ratio

d. Fisher’s Exact Test

e. Cramér’s V

f. Phi

g. Contingency Coefficient

h. Nominal Measure of Association

i. Lambda Test

ii. Goodman and Kruskal tau

iii. Uncertainty Coefficient

7) Graphs

a. Histogram

b. Histogram with fit distributions shown:

i. Normal

ii. Kernel Density

iii. Poisson

c. Normal Distribution Probability Plot

d. Quantile-Quantile Plot

e. Probability Plot for Lognormal Distribution

f. Group KDE

g. Group Bar Graph with Error Bars

h. Boxplot
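
Several of the summary statistics above have competing definitions across packages (the testing notes below mention comparisons against SPSS, Stata, PAST, and R). The Python sketch below shows one common set of conventions, assumed rather than taken from Explore1: bias-adjusted skewness G1 and excess kurtosis G2 (Joanes & Gill, 1998) with the standard errors SPSS reports, the sample (n - 1) variance, inclusive-method quartiles, the coefficient of variation as SD/mean, and the median absolute deviation about the median with no scaling constant. The function name summary_stats is hypothetical.

```python
import math
import statistics


def summary_stats(x):
    """One common set of summary-statistic conventions (assumed, not Explore1's own).

    Requires at least 4 non-identical values (kurtosis and its SE divide by n - 3)
    and a nonzero mean (the coefficient of variation divides by the mean).
    """
    n = len(x)
    mean = statistics.mean(x)
    var = statistics.variance(x)  # sample variance, n - 1 denominator
    sd = math.sqrt(var)
    med = statistics.median(x)
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")  # definitions vary by package
    # Central moments with an n denominator
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    # Bias-adjusted skewness G1 and excess kurtosis G2 (Joanes & Gill 1998)
    skew = math.sqrt(n * (n - 1)) / (n - 2) * (m3 / m2 ** 1.5)
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * (m4 / m2 ** 2 - 3) + 6)
    # Standard errors of skewness and kurtosis in the form SPSS reports them
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return {
        "n": n, "sum": sum(x), "mean": mean, "sd": sd,
        "min": min(x), "q1": q1, "median": med, "q3": q3, "max": max(x),
        "se_mean": sd / math.sqrt(n), "variance": var,
        "skewness": skew, "se_skewness": se_skew,
        "kurtosis": kurt, "se_kurtosis": se_kurt,
        "cv": sd / mean,
        "mean_abs_dev": sum(abs(v - mean) for v in x) / n,
        "median_abs_dev": statistics.median(abs(v - med) for v in x),
    }


print(summary_stats([1, 2, 3, 4, 5]))
```

statistics.quantiles requires Python 3.8 or later; switching its method argument to "exclusive" changes the quartiles, which is one of the package-to-package differences the testing notes describe.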

Included Functions

amg2 – alternate multicompare graphs

basic_numeric_stats_v2 – summary statistics for interval based data

basic_sum_stats_v2 - summary statistics for count based data

cbs() – calls analysis and writing functions for complete dataset - counts

chi2 - calculates chi-square and related tables, figures, and statistics

chi2_2 – calculates chi-square and related tables, figures, and statistics

cnt_data_1_nomv_stats – stand-alone control script for count data with one nominal Grouping variable

cnt_data_2_nomvs_stats – stand-alone control script for count data with nominal Grouping and Splitting variables

Count_Statistics – stand-alone control script that takes observation data, bins it into count data, and calls summary and hypothesis testing functions

count_sum_stats_cbs – summary statistics called from cbs() function

countstats2 – main control function between explore1() and summary and hypothesis functions

explore1 – control function – select analysis variation and define input data name

FK_DK_Con – Fligner-Killeen test

group_sum_stats_v2 – summary statistics - counts

groupsplit_num_sum_v2 - summary statistics – interval

groupsplit_sum_stats_v2 - summary statistics – counts

hypoth_measures_n2_V2 – hypothesis testing – 2 subdivisions of Grouping Variable - interval

hypoth_measures_n3plus_V2 – hypothesis testing – more than 2 subdivisions – interval

hypoth_n2_V2 – hypothesis testing – 2 subdivisions of Grouping Variable – counts

hypoth_n3plus_V2 – hypothesis testing – more than 2 subdivisions – counts

ibs – calls analysis and writing functions for complete dataset - interval

inputsBoth – user input questions

Live_Counts – achieves same results as selecting “2” when prompted while using explore1()

Live_Measures – achieves same results as selecting “1” when prompted while using explore1()

Measure_Statistics – stand-alone control script – intervals - achieves same results as selecting “1” when prompted while using explore1()

measurestats2 – control function

swft – Shapiro-Wilk and Shapiro-Francia normality tests

t_perm_test – permutation test

xlgrphwrite2 – writes figures to an Excel file created by the xlwrite2 function

xlwrite2 – writes table output to an Excel file
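
The function list includes t_perm_test, but the description does not say which t statistic or how many permutations it uses. The Python sketch below is a hypothetical stand-in: a two-sided Monte Carlo permutation test that uses the Welch (unequal-variance) t statistic, randomly reassigns the pooled observations to the two groups, and reports an add-one p-value. The Welch statistic is an assumption here (Ruxton, 2006, in the bibliography, argues for its wider use), as are the names perm_t_test and welch_t and the default of 2,000 permutations.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch (unequal-variance) t statistic for two independent samples."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def perm_t_test(a, b, n_perm=2000, seed=1):
    """Two-sided Monte Carlo permutation test using the t statistic.

    Assumption: Welch t and random permutations; t_perm_test's actual choices are not documented here.
    """
    rng = random.Random(seed)
    observed = welch_t(a, b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign observations to the two groups
        t = welch_t(pooled[: len(a)], pooled[len(a):])
        if abs(t) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the Monte Carlo p-value strictly above zero
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value


group1 = [5.1, 4.9, 5.6, 5.8, 6.0, 5.5]
group2 = [4.2, 4.0, 4.5, 3.9, 4.4, 4.1]
t_obs, p = perm_t_test(group1, group2)
print(f"t = {t_obs:.3f}, permutation p = {p:.4f}")
```

Because the p-value is estimated from random shuffles, it varies slightly with the seed and the number of permutations; an exhaustive enumeration of all group assignments would remove that noise for small samples.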

Testing and Algorithm Selection

Throughout the programming process, test results were compared to results produced by several statistical software packages, including SPSS, Stata, PAST, and R. Statistical packages sometimes use slightly different algorithms to achieve the same basic ends, and in some of those cases the results differ slightly. Where differences arose, I generally selected the version of the test that appeared to be used across the most platforms. Failing this, a literature review was carried out.

Bibliography, Abridged

Most of the procedures calculated by Explore1 are well documented in basic to intermediate statistics textbooks. Presented below is an abbreviated list of references used in selecting and refining the algorithms used throughout the functions.

Ahmad, F., & Sherwani, R. A. K. (2015). Power Comparison of Various Normality Tests. Pak.j.stat.oper.res., 11(3), 331-345.

Anderson, M. J. (2001). Permutation Tests for Univariate or Multivariate Analysis of Variance and Regression. Canadian Journal of Fisheries and Aquatic Sciences, 58, 626-639.

Baxter, M. J., & Beardah, C. C. (1996). Beyond Histograms: Improved Approaches to Simple Data Display in Archaeology Using Kernel Density Estimates. Department of Mathematics, Statistics, and Operational Research, The Nottingham Trent University, Nottingham.

Bohn, L. L., & Wolfe, D. A. (1992). Nonparametric Two-Sample Procedures for Ranked-Set Samples Data. Journal of the American Statistical Association, 87(418), 552-561.

Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to Meta-Analysis. Chichester: John Wiley and Sons, Ltd.

Brown, M. B., & Forsythe, A. B. (1974). Robust Tests for the Equality of Variances. Journal of the American Statistical Association, 69(346), 364-367.

Cameron, A. C. (2004). Kurtosis. In M. S. Lewis-Beck, A. Bryman, & T. F. Liao (Eds.), Encyclopedia of Social Science Research Methods (pp. 544-545). Thousand Oaks: SAGE Publications, Inc.

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences, Second Edition. United States of America: Lawrence Erlbaum Associates.

Cohen, J. (1992a). A Power Primer. Psychological Bulletin, 112(1), 155-159.

Cohen, J. (1992b). Statistical Power Analysis. Current Directions in Psychological Science, 1(3), 98-101.

Conover, W. J., Johnson, M. E., & Johnson, C. D. (1981). A Comparative Study of Tests for Homogeneity of Variances, with Applications to the Outer Continental Shelf Bidding Data. Technometrics, 23(4), 351-361.

DeCarlo, L. T. (1997). On the Meaning and Use of Kurtosis. Psychological Methods, 2(3), 292-307.

Donnelly, S. M., & Kramer, A. (1999). Testing for Multiple Species in Fossil Samples: An Evaluation and Comparison of Tests for Equal Relative Variation. American Journal of Physical Anthropology, 108, 507-529.

Drennan, R. D. (2009). Statistics for Archaeologists: A Commonsense Approach, Second Edition. Dordrecht: Springer.

Fletcher, M., & Lock, G. R. (2005). Digging Numbers: Elementary Statistics for Archaeologists (2nd ed.). Oxford: Oxford University Committee for Archaeology; Oakville, CT.

Fritz, C. O., Morris, P. E., & Richler, J. J. (2012). Effect size estimates: current use, calculations, and interpretation. J Exp Psychol Gen, 141(1), 2-18. doi:10.1037/a0024338

Gan, F. F., Koehler, K. J., & Thompson, J. C. (1991). Probability Plots and Distribution Curves for Assessing the Fit of Probability Models. The American Statistician, 45(1), 14-21.

Joanes, D. N., & Gill, C. A. (1998). Comparing Measures of Sample Skewness and Kurtosis. Journal of the Royal Statistical Society. Series D (The Statistician), 47(1), 183-189.

Liebetrau, A. M. (2011). Measures of Association. SAGE Publications, Inc.

Nakagawa, S., & Cuthill, I. C. (2007). Effect size, confidence interval and statistical significance: a practical guide for biologists. Biol Rev Camb Philos Soc, 82(4), 591-605. doi:10.1111/j.1469-185X.2007.00027.x

Ramenofsky, A. F., & Steffen, A. (1998a). Units as Tools of Measurement. In A. F. Ramenofsky & A. Steffen (Eds.), Unit Issues in Archaeology (pp. 3-18). Salt Lake: The University of Utah Press.

Ramenofsky, A. F., & Steffen, A. (Eds.). (1998b). Unit Issues in Archaeology. Salt Lake: The University of Utah Press.

Rogan, J. C., & Keselman, H. J. (1977). Is the ANOVA F-Test Robust to Variance Heterogeneity When Sample Sizes are Equal?: An Investigation via a Coefficient of Variation. American Educational Research Journal, 14, 493-498.

Rosenthal, R., & Rubin, D. B. (2003). r equivalent: A simple effect size indicator. Psychol Methods, 8(4), 492-496. doi:10.1037/1082-989X.8.4.492

Royston, J. P. (1982a). Algorithm AS 181: The W Test for Normality. Applied Statistics, 31(2), 176-180.

Royston, J. P. (1982b). An Extension of Shapiro and Wilk's W Test for Normality to Large Samples. Journal of the Royal Statistical Society. Series C (Applied Statistics), 31(2), 115-124.

Royston, J. P. (1983). A Simple Method for Evaluating the Shapiro-Francia W' Test of Non-Normality. Journal of the Royal Statistical Society. Series D (The Statistician), 32(3), 297-300.

Royston, J. P. (1991). Tests for departure from normality. Stata Technical Bulletin, 2(July), 16-17.

Ruxton, G. (2006). The unequal variance t-test is an underused alternative to Student’s t-test and the Mann–Whitney U test. Behavioral Ecology, 688-690.

Shapiro, S. S., & Wilk, M. B. (1965). An Analysis of Variance Test for Normality (Complete Samples). Biometrika, 52(3/4), 591-611.

Shennan, S. (1997). Quantifying Archaeology. Iowa City: University of Iowa Press.

Stephens, M. A. (1970). Use of the Kolmogorov-Smirnov, Cramer-Von Mises and Related Statistics Without Extensive Tables. Journal of the Royal Statistical Society. Series B (Methodological), 32(1), 115-122.

Sullivan, A. P., III, Mink, P. B., II, & Uphus, P. M. (2007). Archaeological Survey Design, Units of Observation, and the Characterization of Regional Variability. American Antiquity, 72(2), 322-333.

Tomarken, A. J., & Serlin, R. C. (1986). Comparison of ANOVA Alternatives Under Variance Heterogeneity and Specific Noncentrality Structures. Psychological Bulletin, 99(1), 90-99.

Vargha, A., & Delaney, H. D. (1998). The Kruskal-Wallis Test and Stochastic Homogeneity. Journal of Educational and Behavioral Statistics, 23(2), 170-192.

Wilcox, R. R. (1992). Why Can Methods for Comparing Means Have Relatively Low Power, and What Can You Do to Correct the Problem? Current Directions in Psychological Science, 1(3), 101-105.

Wilk, M. B., & Gnanadesikan, R. (1968). Probability Plotting Methods for the Analysis of Data. Biometrika, 55(1), 1-17.

Yazici, B., & Yolacan, S. (2007). A Comparison of Various Tests of Normality. Journal of Statistical Computation and Simulation, 77(2), 175–183.

Zimmerman, D. W. (1987). Comparative Power of Student T Test and Mann-Whitney U Test for Unequal Sample Sizes and Variances. The Journal of Experimental Education, 55(3), 171-174.

Copyright William Gardner-O'Kearny 2023

Cite As

Gardner-O'Kearny, William (2023). Explore1 (https://www.mathworks.com/matlabcentral/fileexchange/<...>), MATLAB Central File Exchange. Retrieved September 26, 2023.

MATLAB Release Compatibility

Created with R2023b

Compatible with R2020a to R2023b

Platform Compatibility

Windows macOS Linux
