Compute by Group
Description
The Compute by Group task lets you interactively group data and compute summary statistics, perform transformations, or apply filters for each group. The task automatically generates MATLAB® code for your live script.
Using this task, you can:
Define groups of data in an array, table, or timetable.
Summarize, transform, or filter the data based on each grouping.
Output a new table or timetable with the results of the computation.
Open the Task
To add the Compute by Group task to a live script in the MATLAB Editor:
On the Live Editor tab, select Task > Compute by Group.
In a code block in the script, type a relevant keyword, such as group. Select Compute by Group from the suggested command completions.
Examples
Compute Statistics by Group
Summarize data by interactively grouping the data, specifying variables to operate on, and computing statistics using the Compute by Group task in the Live Editor.
Create a timetable using the sample file outages.csv. The file contains six columns of data representing electric utility outages. Convert the Region and Cause column-oriented variables to categorical arrays and display the timetable.
outages = readtimetable("outages.csv");
outages.Region = categorical(outages.Region);
outages.Cause = categorical(outages.Cause)
outages=1468×5 timetable
OutageTime Region Loss Customers RestorationTime Cause
____________________ _________ ______ __________ ____________________ _______________
01-Feb-2002 12:18:00 SouthWest 458.98 1.8202e+06 07-Feb-2002 16:50:00 winter storm
23-Jan-2003 00:49:00 SouthEast 530.14 2.1204e+05 NaT winter storm
07-Feb-2003 21:15:00 SouthEast 289.4 1.4294e+05 17-Feb-2003 08:14:00 winter storm
06-Apr-2004 05:44:00 West 434.81 3.4037e+05 06-Apr-2004 06:10:00 equipment fault
16-Mar-2002 06:18:00 MidWest 186.44 2.1275e+05 18-Mar-2002 23:23:00 severe storm
18-Jun-2003 02:49:00 West 0 0 18-Jun-2003 10:54:00 attack
20-Jun-2004 14:39:00 West 231.29 NaN 20-Jun-2004 19:16:00 equipment fault
06-Jun-2002 19:28:00 West 311.86 NaN 07-Jun-2002 00:51:00 equipment fault
16-Jul-2003 16:23:00 NorthEast 239.93 49434 17-Jul-2003 01:12:00 fire
27-Sep-2004 11:09:00 MidWest 286.72 66104 27-Sep-2004 16:37:00 equipment fault
05-Sep-2004 17:48:00 SouthEast 73.387 36073 05-Sep-2004 20:46:00 equipment fault
21-May-2004 21:45:00 West 159.99 NaN 22-May-2004 04:23:00 equipment fault
01-Sep-2002 18:22:00 SouthEast 95.917 36759 01-Sep-2002 19:12:00 severe storm
27-Sep-2003 07:32:00 SouthEast NaN 3.5517e+05 04-Oct-2003 07:02:00 severe storm
12-Nov-2003 06:12:00 West 254.09 9.2429e+05 17-Nov-2003 02:04:00 winter storm
18-Sep-2004 05:54:00 NorthEast 0 0 NaT equipment fault
⋮
Open the Compute by Group task in the Live Editor. To group the data by the five regions where the outages occurred, select outages as the input data and group by unique values of the Region variable. Then, compute on the Loss and Customers variables by selecting All numeric variables in the Compute on field.
The Compute by Group task can perform three different types of computations for groups. To summarize the outage data, set the computation type to Compute stats by group. Then, to compute the mean and maximum values for the numeric variables Loss and Customers, use the Computations per group field to select the Mean and Maximum methods.
The resulting table contains the group observation count, mean power loss, maximum power loss, mean number of affected customers, and maximum number of affected customers for the outages in each region.
outageStats=5×6 table
Region GroupCount mean_Loss max_Loss mean_Customers max_Customers
_________ __________ _________ ________ ______________ _____________
MidWest 142 1137.7 23141 2.4015e+05 3.972e+06
NorthEast 557 551.65 23418 1.4917e+05 5.9689e+06
SouthEast 389 495.35 8767.3 1.6776e+05 2.2249e+06
SouthWest 26 493.88 2796 2.6975e+05 1.8202e+06
West 354 433.37 16659 1.5201e+05 4.26e+06
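The code that the task generates may be organized differently, but the same summary can be sketched by hand with groupsummary. This sketch assumes that Loss and Customers are the only numeric variables in the timetable, as in the display above, and names them explicitly instead of selecting all numeric variables.

```matlab
% Load the sample data and convert the grouping variables to categorical
outages = readtimetable("outages.csv");
outages.Region = categorical(outages.Region);
outages.Cause = categorical(outages.Cause);

% Group by unique values of Region, then compute the mean and maximum
% of Loss and Customers within each group
outageStats = groupsummary(outages,"Region",["mean","max"],["Loss","Customers"]);
```

The built-in methods of groupsummary omit missing values, which is why the means are finite even though Loss and Customers contain NaN entries.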
Transform by Group
Improve the interpretability or appearance of data by interactively grouping data, specifying variables to operate on, and applying a transformation operation using the Compute by Group task in the Live Editor.
Create a timetable using the sample file outages.csv. The file contains six columns of data representing electric utility outages. Convert the Region and Cause column-oriented variables to categorical arrays and display the timetable.
outages = readtimetable("outages.csv");
outages.Region = categorical(outages.Region);
outages.Cause = categorical(outages.Cause)
outages=1468×5 timetable
OutageTime Region Loss Customers RestorationTime Cause
____________________ _________ ______ __________ ____________________ _______________
01-Feb-2002 12:18:00 SouthWest 458.98 1.8202e+06 07-Feb-2002 16:50:00 winter storm
23-Jan-2003 00:49:00 SouthEast 530.14 2.1204e+05 NaT winter storm
07-Feb-2003 21:15:00 SouthEast 289.4 1.4294e+05 17-Feb-2003 08:14:00 winter storm
06-Apr-2004 05:44:00 West 434.81 3.4037e+05 06-Apr-2004 06:10:00 equipment fault
16-Mar-2002 06:18:00 MidWest 186.44 2.1275e+05 18-Mar-2002 23:23:00 severe storm
18-Jun-2003 02:49:00 West 0 0 18-Jun-2003 10:54:00 attack
20-Jun-2004 14:39:00 West 231.29 NaN 20-Jun-2004 19:16:00 equipment fault
06-Jun-2002 19:28:00 West 311.86 NaN 07-Jun-2002 00:51:00 equipment fault
16-Jul-2003 16:23:00 NorthEast 239.93 49434 17-Jul-2003 01:12:00 fire
27-Sep-2004 11:09:00 MidWest 286.72 66104 27-Sep-2004 16:37:00 equipment fault
05-Sep-2004 17:48:00 SouthEast 73.387 36073 05-Sep-2004 20:46:00 equipment fault
21-May-2004 21:45:00 West 159.99 NaN 22-May-2004 04:23:00 equipment fault
01-Sep-2002 18:22:00 SouthEast 95.917 36759 01-Sep-2002 19:12:00 severe storm
27-Sep-2003 07:32:00 SouthEast NaN 3.5517e+05 04-Oct-2003 07:02:00 severe storm
12-Nov-2003 06:12:00 West 254.09 9.2429e+05 17-Nov-2003 02:04:00 winter storm
18-Sep-2004 05:54:00 NorthEast 0 0 NaT equipment fault
⋮
Open the Compute by Group task in the Live Editor. To group the data by the ten causes of the outages, select outages as the input data and group by unique values of the Cause variable. Then, set Compute on to the Loss variable.
The Compute by Group task can perform three different types of computations for groups. To transform the outage data, set the computation type to Transform by group. Then, to fill missing power loss values, set Computation per group to the Fill missing with group mean method.
The resulting timetable contains the outage data with missing power loss replaced with the mean power loss for outages with the same cause.
outageTransform=1468×5 timetable
OutageTime Region Loss Customers RestorationTime Cause
____________________ _________ ______ __________ ____________________ _______________
01-Feb-2002 12:18:00 SouthWest 458.98 1.8202e+06 07-Feb-2002 16:50:00 winter storm
23-Jan-2003 00:49:00 SouthEast 530.14 2.1204e+05 NaT winter storm
07-Feb-2003 21:15:00 SouthEast 289.4 1.4294e+05 17-Feb-2003 08:14:00 winter storm
06-Apr-2004 05:44:00 West 434.81 3.4037e+05 06-Apr-2004 06:10:00 equipment fault
16-Mar-2002 06:18:00 MidWest 186.44 2.1275e+05 18-Mar-2002 23:23:00 severe storm
18-Jun-2003 02:49:00 West 0 0 18-Jun-2003 10:54:00 attack
20-Jun-2004 14:39:00 West 231.29 NaN 20-Jun-2004 19:16:00 equipment fault
06-Jun-2002 19:28:00 West 311.86 NaN 07-Jun-2002 00:51:00 equipment fault
16-Jul-2003 16:23:00 NorthEast 239.93 49434 17-Jul-2003 01:12:00 fire
27-Sep-2004 11:09:00 MidWest 286.72 66104 27-Sep-2004 16:37:00 equipment fault
05-Sep-2004 17:48:00 SouthEast 73.387 36073 05-Sep-2004 20:46:00 equipment fault
21-May-2004 21:45:00 West 159.99 NaN 22-May-2004 04:23:00 equipment fault
01-Sep-2002 18:22:00 SouthEast 95.917 36759 01-Sep-2002 19:12:00 severe storm
27-Sep-2003 07:32:00 SouthEast 697.41 3.5517e+05 04-Oct-2003 07:02:00 severe storm
12-Nov-2003 06:12:00 West 254.09 9.2429e+05 17-Nov-2003 02:04:00 winter storm
18-Sep-2004 05:54:00 NorthEast 0 0 NaT equipment fault
⋮
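A hand-written equivalent of this transformation can be sketched with grouptransform and its "meanfill" method. The code that the task generates may differ in form; this sketch relies on the default behavior of grouptransform, which replaces the values of the computed variable in place.

```matlab
% Load the sample data and convert the grouping variables to categorical
outages = readtimetable("outages.csv");
outages.Region = categorical(outages.Region);
outages.Cause = categorical(outages.Cause);

% Group by unique values of Cause, then replace each missing Loss value
% with the mean Loss of the group it belongs to
outageTransform = grouptransform(outages,"Cause","meanfill","Loss");
```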
Filter by Group
Focus on specific information in a data set by interactively grouping data, specifying variables to operate on, and applying a group filter with Compute by Group.
Create a timetable using the sample file outages.csv. The file contains six columns of data representing electric utility outages. Convert the Region and Cause column-oriented variables to categorical arrays and display the timetable.
outages = readtimetable("outages.csv");
outages.Region = categorical(outages.Region);
outages.Cause = categorical(outages.Cause)
outages=1468×5 timetable
OutageTime Region Loss Customers RestorationTime Cause
____________________ _________ ______ __________ ____________________ _______________
01-Feb-2002 12:18:00 SouthWest 458.98 1.8202e+06 07-Feb-2002 16:50:00 winter storm
23-Jan-2003 00:49:00 SouthEast 530.14 2.1204e+05 NaT winter storm
07-Feb-2003 21:15:00 SouthEast 289.4 1.4294e+05 17-Feb-2003 08:14:00 winter storm
06-Apr-2004 05:44:00 West 434.81 3.4037e+05 06-Apr-2004 06:10:00 equipment fault
16-Mar-2002 06:18:00 MidWest 186.44 2.1275e+05 18-Mar-2002 23:23:00 severe storm
18-Jun-2003 02:49:00 West 0 0 18-Jun-2003 10:54:00 attack
20-Jun-2004 14:39:00 West 231.29 NaN 20-Jun-2004 19:16:00 equipment fault
06-Jun-2002 19:28:00 West 311.86 NaN 07-Jun-2002 00:51:00 equipment fault
16-Jul-2003 16:23:00 NorthEast 239.93 49434 17-Jul-2003 01:12:00 fire
27-Sep-2004 11:09:00 MidWest 286.72 66104 27-Sep-2004 16:37:00 equipment fault
05-Sep-2004 17:48:00 SouthEast 73.387 36073 05-Sep-2004 20:46:00 equipment fault
21-May-2004 21:45:00 West 159.99 NaN 22-May-2004 04:23:00 equipment fault
01-Sep-2002 18:22:00 SouthEast 95.917 36759 01-Sep-2002 19:12:00 severe storm
27-Sep-2003 07:32:00 SouthEast NaN 3.5517e+05 04-Oct-2003 07:02:00 severe storm
12-Nov-2003 06:12:00 West 254.09 9.2429e+05 17-Nov-2003 02:04:00 winter storm
18-Sep-2004 05:54:00 NorthEast 0 0 NaT equipment fault
⋮
Open the Compute by Group task in the Live Editor. To group the data by the year and region in which the outages occurred, use Group by to bin the OutageTime variable by year and group the Region variable by unique values. Then, compute on the power loss by selecting the Loss variable in the Compute on field.
The Compute by Group task can perform three different types of computations for groups. To filter the outage data, set the computation type to Filter by group. Then, set Computation per group to a new local function and customize the filter by writing a function that returns true for the outlier data to keep and false for the non-outlier data to filter out.
The resulting timetable contains only outlier outage data. With its default settings, isoutlier flags a power loss value as an outlier when it is more than three scaled median absolute deviations (MAD) from the median of the losses for the same year and region.
outageFilter=159×6 timetable
OutageTime Region Loss Customers RestorationTime Cause year_OutageTime
____________________ _________ ______ __________ ____________________ _______________ _______________
06-Apr-2004 05:44:00 West 434.81 3.4037e+05 06-Apr-2004 06:10:00 equipment fault 2004
06-Jun-2002 19:28:00 West 311.86 NaN 07-Jun-2002 00:51:00 equipment fault 2002
08-Mar-2005 16:37:00 SouthEast 1339.2 4.3003e+05 10-Mar-2005 20:42:00 winter storm 2005
02-Jul-2004 09:16:00 MidWest 15128 2.0104e+05 06-Jul-2004 14:11:00 thunder storm 2004
20-Apr-2002 16:46:00 MidWest 23141 NaN NaT unknown 2002
10-Dec-2002 10:45:00 MidWest 14493 3.0879e+06 11-Dec-2002 18:06:00 unknown 2002
18-May-2002 11:04:00 MidWest 1389.1 1.3447e+05 21-May-2002 01:22:00 unknown 2002
22-Sep-2003 00:53:00 MidWest 3995.8 6.7808e+05 23-Sep-2003 03:45:00 unknown 2003
05-Nov-2005 12:46:00 NorthEast 2966.1 NaN 06-Nov-2005 21:40:00 unknown 2005
17-Aug-2002 09:05:00 NorthEast 21673 NaN 19-Aug-2002 21:45:00 unknown 2002
16-Sep-2004 19:42:00 NorthEast 4718 NaN NaT unknown 2004
20-May-2002 10:57:00 NorthEast 9116.6 2.4983e+06 21-May-2002 15:22:00 unknown 2002
05-Sep-2003 20:15:00 SouthEast 1700.1 1.6393e+05 10-Sep-2003 19:59:00 thunder storm 2003
20-Sep-2004 12:37:00 SouthEast 8767.3 2.2249e+06 02-Oct-2004 06:00:00 severe storm 2004
14-Sep-2005 15:45:00 SouthEast 1839.2 3.4144e+05 NaT severe storm 2005
14-Sep-2003 16:09:00 NorthEast 2011.3 6.9368e+05 24-Sep-2003 07:44:00 severe storm 2003
⋮
function tf = myFilterFcn(x)
% x is the data in a group from one computation variable
% tf is true, false, or a logical column vector with the same height as x
tf = isoutlier(x);
end
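The filter can also be sketched by hand with groupfilter. This sketch is not the code that the task generates. It uses an anonymous function in place of the local function so that the block runs on its own, and it assumes that groupfilter accepts the row times of the timetable (OutageTime) as a grouping variable by name, as the year_OutageTime column in the output above suggests.

```matlab
% Load the sample data and convert the grouping variables to categorical
outages = readtimetable("outages.csv");
outages.Region = categorical(outages.Region);
outages.Cause = categorical(outages.Cause);

% Keep only the rows that isoutlier marks as true within each group
keepOutliers = @(x) isoutlier(x);

% Bin OutageTime by year, group Region by unique values ("none" means no
% binning), and filter each group based on its Loss values
outageFilter = groupfilter(outages,["OutageTime","Region"],{"year","none"}, ...
    keepOutliers,"Loss");
```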
Parameters
Input data
— Valid grouping data from workspace
vector | matrix | table | timetable
Specify groups by selecting valid workspace grouping variables from the Group by drop-down list. When the data is contained in a table or timetable, additionally select the table variables to group by. You can group by unique values or specify how to bin the data.
From the Compute on drop-down list, select the workspace data to compute on. When the data is contained in a table or timetable, select the table variables to compute on.
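The two grouping modes, unique values and bins, correspond to different call forms of the underlying grouping functions. The following sketch uses a small hypothetical table (the variable names Time, Site, and Value are illustrative and are not part of the task) and groupsummary to show both.

```matlab
% Hypothetical toy data, used only to illustrate the two grouping modes
Time = datetime([2020;2020;2021;2021],[1;6;3;9],1);
Site = categorical(["A";"B";"A";"A"]);
Value = [5;7;1;3];
T = table(Time,Site,Value);

% Group by unique values of Site and sum Value in each group
bySite = groupsummary(T,"Site","sum","Value");   % sum_Value is [9;7]

% Bin Time by calendar year, then sum Value in each bin
byYear = groupsummary(T,"Time","year","sum","Value");   % sum_Value is [12;4]
```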
Computation for groups
— Type of computation to perform
Compute stats by group | Transform by group | Filter by group
Select one of these computation options:
Computation Type | Description |
---|---|
Compute stats by group | A summary (or aggregate) of data, such as a mean or maximum. You can also supply a custom function by providing a local function name or a function handle. The function must return one entity per group whose first dimension has length 1. For more information, see groupsummary. |
Transform by group | Transform the data, for example, scale the data by the 2-norm or fill missing data. You can also supply a custom function by providing a local function name or a function handle. The function must return one entity whose first dimension has length 1 or has the same number of rows as the input data. For more information, see grouptransform. |
Filter by group | Filter members from each group by providing a local function or function handle that defines the filtering computation. The function must return a logical scalar or a logical column vector with the same number of rows as the data indicating which group members to select. If the function returns a logical scalar, then either all members of the group are filtered (when the value is false) or none are (when the value is true). If the function returns a logical vector, then members of groups are filtered when the corresponding element is false. Members are kept when the corresponding element is true. For more information, see groupfilter. |
For all computation types, you can click New to create a function in the live script that defines the computation. Clicking New inserts an example function that uses the appropriate syntax for the selected computation type. If you rename the example function, reselect the method from the drop-down list in the live task so that the task uses the new name.
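The return-shape requirements for custom functions can be illustrated with a small sketch. The table and the function handles here (rangeFcn, centerFcn, bigGroupFcn) are hypothetical and chosen only for illustration. The sketch calls groupsummary, grouptransform, and groupfilter directly instead of going through the task.

```matlab
% Hypothetical toy data: group "a" has two rows, group "b" has three
Group = ["a";"a";"b";"b";"b"];
Value = [1;3;10;20;60];
T = table(Group,Value);

% Stats: the function returns one row per group (here, the range)
rangeFcn = @(x) max(x) - min(x);
S = groupsummary(T,"Group",rangeFcn,"Value");    % S.fun1_Value is [2;50]

% Transform: the function returns one row per input row (centered values)
centerFcn = @(x) x - mean(x);
X = grouptransform(T,"Group",centerFcn,"Value"); % X.Value is [-1;1;-20;-10;30]

% Filter: a logical scalar keeps (true) or drops (false) a whole group
bigGroupFcn = @(x) numel(x) >= 3;
F = groupfilter(T,"Group",bigGroupFcn,"Value");  % only the rows of group "b" remain
```

groupsummary names the output of a custom function fun1_Value, while grouptransform by default overwrites Value in place.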
Version History
Introduced in R2021b
R2023a: Return the number of unique elements
Compute the number of distinct nonmissing elements in each group of data. Select Compute stats by group, and then specify the Number of unique values or Select all computation method.
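Outside the task, a comparable count can be sketched with the "numunique" method of groupsummary. This assumes a release in which groupsummary supports that method (R2023a or later, to the best of our knowledge). The table is hypothetical toy data.

```matlab
% Hypothetical toy data: group "a" holds the values 1, 1, 2 and
% group "b" holds one missing value and the value 5
Group = ["a";"a";"a";"b";"b"];
Value = [1;1;2;NaN;5];
T = table(Group,Value);

% Count the distinct nonmissing values of Value in each group
U = groupsummary(T,"Group","numunique","Value");
```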
R2022a: Live Editor task does not run automatically if inputs have more than 1 million rows
This Live Editor task does not run automatically if the inputs have more than 1 million rows. In previous releases, the task always ran automatically for inputs of any size. If the inputs have a large number of rows, then the code generated by this task can take a noticeable amount of time to run (more than a few seconds).
When a task does not run automatically, the Autorun indicator is disabled. You can either run the task manually when needed or choose to enable the task to run automatically.