mapreduce
Programming technique for analyzing data sets that do not fit in memory
Syntax
Description
outds = mapreduce(ds,mapfun,reducefun,mr) optionally specifies the run-time configuration settings for mapreduce. The mr input is the result of a call to the mapreducer function. Typically, this argument is used with Parallel Computing Toolbox™, MATLAB® Parallel Server™, or MATLAB Compiler™. For more information, see Speed Up and Deploy MapReduce Using Other Products.
outds = mapreduce(___,Name,Value) specifies additional options with one or more Name,Value pair arguments using any of the previous syntaxes. For example, you can specify 'OutputFolder' followed by a character vector specifying a path to the output folder.
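The two syntaxes above can be combined in one call. The following is a minimal sketch, not an official example: it assumes the airlinesmall.csv sample file that ships with MATLAB, and the function names maxDelayMapper and maxDelayReducer are hypothetical names chosen for illustration. Local functions in a script file require R2016b or later; in earlier releases, save each function in its own file.

```matlab
% Sketch: pass a mapreducer object and an 'OutputFolder' Name,Value pair.
mr = mapreducer(0);                  % 0 = run serially in the local MATLAB session

ds = tabularTextDatastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
ds.SelectedVariableNames = 'ArrDelay';

outFolder = tempname;                % unique scratch folder (an assumption of this sketch)
mkdir(outFolder);

outds = mapreduce(ds, @maxDelayMapper, @maxDelayReducer, mr, ...
    'OutputFolder', outFolder);
readall(outds)                       % display the final key-value pair

function maxDelayMapper(data, ~, intermKVStore)
    % data is a table holding one block read from the datastore.
    partMax = max(data.ArrDelay);    % max ignores NaN values by default
    add(intermKVStore, 'PartialMaxArrivalDelay', partMax);
end

function maxDelayReducer(~, intermValIter, outKVStore)
    % Combine the per-block maxima into a single overall maximum.
    maxVal = -Inf;
    while hasnext(intermValIter)
        maxVal = max(getnext(intermValIter), maxVal);
    end
    add(outKVStore, 'MaxArrivalDelay', maxVal);
end
```

Replacing mapreducer(0) with a mapreducer object created for a parallel pool or cluster should leave the rest of the call unchanged, provided the corresponding product is installed.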
Examples
Input Arguments
Output Arguments
Tips
Debugging your mapreduce algorithms to examine how key-value pairs move through the different phases is always useful. To examine the movement of data, set breakpoints in your map and reduce functions. The breakpoints stop execution of mapreduce, allowing you to examine the current status of relevant variables, like the KeyValueStore or ValueIterator. For more information, see Debug MapReduce Algorithms.
Some recommendations to optimize mapreduce performance on any platform are:
Minimize the number of calls to the map function. The easiest approach is to increase the value of the ReadSize property of the input datastore. The result is that mapreduce passes larger blocks of data to the map function, and the datastore depletes with fewer reads.
Decrease the amount of intermediate data sent between map and reduce functions. One approach is to use unique inside a map function to combine similar keys. See Compute Mean by Group Using MapReduce for an example of this technique.
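Both performance recommendations can be illustrated together. This is a hedged sketch rather than the method used in the referenced example: it assumes the airlinesmall.csv sample file, the ReadSize value of 50000 is an arbitrary illustrative choice, and countByCarrierMapper and sumReducer are hypothetical function names. The map function uses unique to emit one key-value pair per distinct carrier in each block instead of one pair per row.

```matlab
% Sketch: larger read blocks plus key combining inside the map function.
ds = tabularTextDatastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
ds.SelectedVariableNames = 'UniqueCarrier';
ds.ReadSize = 50000;                 % larger blocks mean fewer map function calls

outds = mapreduce(ds, @countByCarrierMapper, @sumReducer, mapreducer(0));
readall(outds)                       % one row per carrier with its flight count

function countByCarrierMapper(data, ~, intermKVStore)
    % Collapse repeated carriers within this block before emitting.
    [carriers, ~, idx] = unique(data.UniqueCarrier);
    counts = accumarray(idx, 1);     % number of rows per distinct carrier
    addmulti(intermKVStore, carriers, num2cell(counts));
end

function sumReducer(key, intermValIter, outKVStore)
    % Add up the per-block counts for one carrier.
    total = 0;
    while hasnext(intermValIter)
        total = total + getnext(intermValIter);
    end
    add(outKVStore, key, total);
end
```

To follow data through these functions as the debugging tip describes, set a breakpoint on the addmulti line and inspect carriers and counts when execution pauses.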
Extended Capabilities
Version History
Introduced in R2014b