Statistics and Machine Learning Toolbox Example Data Sets
Statistics and Machine Learning Toolbox™ includes a variety of data sets with different file formats and sizes. These data sets are used in documentation examples to demonstrate software capabilities. This topic summarizes and describes some of the available data sets, but is not a comprehensive list.
Data Sets Available with Product Installation
This list describes data sets available when you install Statistics and Machine Learning Toolbox. The File Contents column displays the output of the whos command, which you can enter after you load the file into the workspace.
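As a minimal sketch of the workflow described above, the following commands load one of the installed data sets and then list the workspace variables. The choice of fisheriris.mat and carsmall.mat is only for illustration; any file in the table can be substituted.

```matlab
% Load a data set that ships with the toolbox into the workspace.
load fisheriris.mat

% List the variables now in the workspace (name, size, bytes, class).
% Note that whos also lists any variables that were already present.
whos

% Alternatively, inspect the contents of a MAT-file without loading it.
whos -file carsmall.mat
```

Running whos in a fresh workspace makes the output easier to compare with the File Contents column.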
Filename | Description | How to Load | File Contents |
---|---|---|---|
acetylene.mat | Chemical reaction data with correlated predictors | load acetylene.mat | Name Size Bytes Class Attributes: Description 16x105 3360 char; x1 16x1 128 double; x2 16x1 128 double; x3 16x1 128 double; y 16x1 128 double. The file includes a Description variable. |
carbig.mat | Measurements of cars from 1970–1982 | load carbig.mat | Name Size Bytes Class Attributes: Acceleration 406x1 3248 double; Cylinders 406x1 3248 double; Displacement 406x1 3248 double; Horsepower 406x1 3248 double; MPG 406x1 3248 double; Mfg 406x13 10556 char; Model 406x36 29232 char; Model_Year 406x1 3248 double; Origin 406x7 5684 char; Weight 406x1 3248 double; cyl4 406x5 4060 char; org 406x7 5684 char; when 406x5 4060 char |
carsmall.mat | Subset of carbig.mat | load carsmall.mat | Name Size Bytes Class Attributes: Acceleration 100x1 800 double; Cylinders 100x1 800 double; Displacement 100x1 800 double; Horsepower 100x1 800 double; MPG 100x1 800 double; Mfg 100x13 2600 char; Model 100x33 6600 char; Model_Year 100x1 800 double; Origin 100x7 1400 char; Weight 100x1 800 double |
census1994.mat | US Census Bureau demographic data from the UCI machine learning repository | load census1994.mat | Name Size Bytes Class Attributes: Description 20x74 2960 char; adultdata 32561x15 1872566 table; adulttest 16281x15 944466 table. The file includes a Description variable. |
cereal.mat | Breakfast cereal ingredients | load cereal.mat | Name Size Bytes Class Attributes: Calories 77x1 616 double; Carbo 77x1 616 double; Cups 77x1 616 double; Fat 77x1 616 double; Fiber 77x1 616 double; Mfg 77x1 154 char; Name 77x1 10288 cell; Potass 77x1 616 double; Protein 77x1 616 double; Shelf 77x1 616 double; Sodium 77x1 616 double; Sugars 77x1 616 double; Type 77x1 616 double; Variables 15x2 4134 cell; Vitamins 77x1 616 double; Weight 77x1 616 double |
cities.mat | Quality-of-life ratings for US metropolitan areas, given in [4] | load cities.mat | Name Size Bytes Class Attributes: categories 9x14 252 char; names 329x43 28294 char; ratings 329x9 23688 double |
discrim.mat | A version of cities.mat | load discrim.mat | Name Size Bytes Class Attributes: big 26x43 2236 char; categories 9x14 252 char; group 329x1 2632 double; idx 26x1 208 double; names 329x43 28294 char; ratings 329x9 23688 double |
examgrades.mat | Exam grades on a scale of 0–100 | load examgrades.mat | Name Size Bytes Class Attributes: grades 120x5 4800 double |
fisheriris.mat | Fisher's 1936 iris data | load fisheriris.mat | Name Size Bytes Class Attributes: meas 150x4 4800 double; species 150x1 18100 cell |
fisheriris.csv | Fisher's 1936 iris data | fisheriris = readtable("fisheriris.csv"); | Name Size Bytes Class Attributes: fisheriris 150x5 24805 table |
flu.mat | ILI (influenza-like illness) percentage estimated by Google Flu Trends for various regions of the US, and the CDC-weighted ILI percentage based on sentinel provider reports | load flu.mat | Name Size Bytes Class Attributes: Description 1x306 612 char; flu 52x11 14640 dataset. The file includes a Description variable. |
gas.mat | Gasoline prices in the state of Massachusetts in 1993 | load gas.mat | Name Size Bytes Class Attributes: price1 20x1 160 double; price2 20x1 160 double |
hald.mat | Heat of cement vs. mix of ingredients | load hald.mat | Name Size Bytes Class Attributes: Description 22x58 2552 char; hald 13x5 520 double; heat 13x1 104 double; ingredients 13x4 416 double. The file includes a Description variable. |
hogg.mat | Bacteria counts in different shipments of milk | load hogg.mat | Name Size Bytes Class Attributes: hogg 6x5 240 double; x1 6x1 48 double; x2 6x1 48 double; x3 6x1 48 double; x4 6x1 48 double; x5 6x1 48 double |
hospital.xls | Simulated hospital data | hospital = readtable("hospital.xls"); | Name Size Bytes Class Attributes: hospital 100x12 44579 table |
hospital.mat | Simulated hospital data | load hospital.mat | Name Size Bytes Class Attributes: Description 1x23 46 char; hospital 100x7 43784 dataset. The file includes a Description variable. |
imports-85.mat | 1985 Auto Imports Database from the UCI machine learning repository | load imports-85.mat | Name Size Bytes Class Attributes: Description 9x79 1422 char; X 205x26 42640 double. The file includes a Description variable. |
indomethacin.mat | Concentrations of the drug indomethacin in the bloodstream of 6 subjects over 8 hours | load indomethacin.mat | Name Size Bytes Class Attributes: Description 14x50 1400 char; concentration 66x1 528 double; subject 66x1 528 double; time 66x1 528 double. The file includes a Description variable. |
ionosphere.mat | Ionosphere data set from the UCI machine learning repository | load ionosphere.mat | Name Size Bytes Class Attributes: Description 5x79 790 char; X 351x34 95472 double; Y 351x1 37206 cell. The file includes a Description variable. |
kmeansdata.mat | Four-dimensional clustered data | load kmeansdata.mat | Name Size Bytes Class Attributes: X 560x4 17920 double |
lawdata.mat | Grade point averages and LSAT scores from 15 law schools | load lawdata.mat | Name Size Bytes Class Attributes: gpa 15x1 120 double; lsat 15x1 120 double |
mileage.mat | Mileage data for three car models from two factories | load mileage.mat | Name Size Bytes Class Attributes: mileage 6x3 144 double |
moore.mat | Biochemical oxygen demand on five predictors | load moore.mat | Name Size Bytes Class Attributes: moore 20x6 960 double |
morse.mat | Recognition of Morse code distinctions by non-coders | load morse.mat | Name Size Bytes Class Attributes: Y0 36x8 2304 double; dissimilarities 1x630 5040 double; morseChars 36x2 7824 cell |
parts.mat | Dimensional run-out on 36 circular parts | load parts.mat | Name Size Bytes Class Attributes: runout 36x4 1152 double |
polydata.mat | Sample data for polynomial fitting | load polydata.mat | Name Size Bytes Class Attributes: x 1x43 344 double; x1 1x101 808 double; y 1x43 344 double; y1 1x101 808 double |
popcorn.mat | Popcorn yield by popper type and brand | load popcorn.mat | Name Size Bytes Class Attributes: popcorn 6x3 144 double |
reaction.mat | Reaction kinetics for Hougen-Watson model | load reaction.mat | Name Size Bytes Class Attributes: beta 5x1 40 double; model 1x6 12 char; rate 13x1 104 double; reactants 13x3 312 double; xn 3x10 60 char; yn 1x13 26 char |
repeatedmeas.mat | Simulated repeated measures data | load repeatedmeas.mat | Name Size Bytes Class Attributes: between 30x12 6415 table; within 8x2 1863 table |
stockreturns.mat | Simulated stock returns | load stockreturns.mat | Name Size Bytes Class Attributes: stocks 100x10 8000 double |
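Some data sets in the table come in more than one file format, and a few MAT-files store dataset arrays rather than tables. The following sketch, which assumes the toolbox data files are on the MATLAB path, contrasts the two ways of loading the iris data and converts the hospital dataset array to a table. The variable names irisTbl and hospitalTbl are illustrative and do not appear in the data files.

```matlab
% MAT-file version: variables are loaded directly into the workspace.
load fisheriris.mat
size(meas)        % numeric measurements, listed as 150x4 in the table
class(species)    % species labels, listed as a cell array in the table

% CSV version: readtable returns everything in a single table.
irisTbl = readtable("fisheriris.csv");   % irisTbl is an illustrative name
size(irisTbl)     % listed as 150x5 in the table

% hospital.mat stores a dataset array; convert it if you prefer a table.
load hospital.mat
hospitalTbl = dataset2table(hospital);   % hospitalTbl is an illustrative name
class(hospitalTbl)
```

Loading a MAT-file overwrites any workspace variables with the same names, so assigning readtable output to a distinct name keeps the two versions side by side.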
Data Sets Available with Specific Examples
This list describes some of the data sets available when you open specific Statistics and Machine Learning Toolbox examples. The list is not comprehensive. The File Contents column displays the output of the whos command, which you can enter after you load the file into the workspace.
Filename | Description | How to Load | File Contents |
---|---|---|---|
arrhythmia.mat | Patient information and response variables that indicate the presence or absence of cardiac arrhythmia | openExample("arrhythmia.mat"); load arrhythmia.mat | Name Size Bytes Class Attributes: Description 8x69 1104 char; VarNames 1x279 41570 cell; X 452x279 1008864 double; Y 452x1 3616 double. The file includes a Description variable. |
batterysmall.mat | Sensor data (voltage, current, and temperature) and state of charge for a Li-ion battery; a subset of the data in [1] | openExample("batterysmall.mat"); load batterysmall.mat | Name Size Bytes Class Attributes: dataLarge 1x1 1886400 struct; testDataSmall 1319x6 65361 table; trainDataSmall 6773x6 327153 table |
CreditRating_Historical.dat | Financial ratios, industry sectors, and credit ratings for a list of corporate customers | openExample("CreditRating_Historical.dat"); creditrating = readtable("CreditRating_Historical.dat"); | Name Size Bytes Class Attributes: creditrating 3932x8 649029 table |
humanactivity.mat | Human activity recognition data for five activities: sitting, standing, walking, running, and dancing | openExample("humanactivity.mat"); load humanactivity.mat | Name Size Bytes Class Attributes: Description 29x1 5918 string; actid 24075x1 192600 double; actnames 1x5 592 cell; feat 24075x60 11556000 double; featlabels 60x1 8292 cell. The file includes a Description variable. |
nlpdata.mat | Natural language processing data extracted from the MathWorks® documentation | openExample("nlpdata.mat"); load nlpdata.mat | Name Size Bytes Class Attributes: Description 26x68 3536 char; X 31572x34023 36716304 double sparse; Y 31572x1 33094 categorical; corpus 31572x1 6149252 cell; dictionary 34023x1 4137912 cell. The file includes a Description variable. |
NYCHousing2015.mat | Information on the sales of properties in New York City in 2015 | openExample("NYCHousing2015.mat"); load NYCHousing2015.mat | Name Size Bytes Class Attributes: NYCHousing2015 91446x10 32103067 table |
ovariancancer.mat | Grouped observations on 4000 predictors for ovarian cancer, given in [2] and [3] | openExample("ovariancancer.mat"); load ovariancancer.mat | Name Size Bytes Class Attributes: grp 216x1 25056 cell; obs 216x4000 3456000 single |
spectra.mat | NIR spectra and octane numbers for 60 gasoline samples | openExample("spectra.mat"); load spectra.mat | Name Size Bytes Class Attributes: Description 11x72 1584 char; NIR 60x401 192480 double; octane 60x1 480 double; spectra 60x2 195660 dataset. The file includes a Description variable. |
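For the example-specific data sets, the How to Load column lists two steps: open the example, then load or read the file. A minimal sketch for arrhythmia.mat follows, using the openExample call exactly as it appears in the table. Whether that call resolves depends on your release and installed examples, so treat it as an assumption rather than a guaranteed invocation.

```matlab
% Open the example so that its supporting files are available, using the
% call listed in the How to Load column (assumed to resolve in your release).
openExample("arrhythmia.mat")

% Load the data and compare the sizes with the File Contents column.
load arrhythmia.mat
size(X)   % predictor matrix, listed as 452x279 in the table
size(Y)   % response vector, listed as 452x1 in the table
```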
References
[1] Kollmeyer, Phillip, Carlos Vidal, Mina Naguib, and Michael Skells. "LG 18650HG2 Li-ion Battery Data and Example Deep Neural Network xEV SOC Estimator Script." Mendeley 3 (March 2020). https://doi.org/10.17632/CP3473X7XV.3.
[2] Conrads, Thomas P., Vincent A. Fusaro, Sally Ross, Don Johann, Vinodh Rajapakse, Ben A. Hitt, Seth M. Steinberg, et al. "High-Resolution Serum Proteomic Features for Ovarian Cancer Detection." Endocrine-Related Cancer 11 (2004): 163–78.
[3] Petricoin, Emanuel F., Ali M. Ardekani, Ben A. Hitt, Peter J. Levine, Vincent A. Fusaro, Seth M. Steinberg, Gordon B. Mills, et al. "Use of Proteomic Patterns in Serum to Identify Ovarian Cancer." The Lancet 359, no. 9306 (February 2002): 572–77.
[4] Boyer, Richard, and David Savageau. Rand McNally Places Rated Almanac. Rand McNally & Company, 1985.