MATLAB simplifies working with big data: it accesses and integrates with your existing big data storage and adapts to your data processing needs based on available resources.
With MATLAB, you can:
- Access big data from a range of storage systems, such as traditional file systems, cloud storage (AWS® S3, Azure® Blob), SQL and NoSQL databases, and data platforms
- Clean, analyze, and develop machine learning models on smaller sample data
- Scale up and apply the same code on big data without rewriting your algorithms
- Use processing power tailored to your needs, from your local machine to traditional HPC clusters, Spark™ clusters, and cloud data platforms
“High-performance computing with MATLAB enables us to process previously unanalyzed big data. We translate what we learn into an understanding of how human activities affect the health of ecosystems to inform responsible decisions about what humans do in the ocean and on land.”
Dr. Christopher Clark, Cornell University
Using MATLAB and Simulink for Big Data
Access Data
You can use MATLAB to read data from large collections of files, databases, data platforms, and cloud storage systems. Datastores in MATLAB let you access data that does not fit into the memory of a single computer or that is distributed across multiple files. These datastores support various file formats (CSV, Parquet, MDF, etc.) and storage systems (AWS S3, Azure Blob, HDFS, databases, data platforms). You can also create your own datastores for custom file formats.
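As a rough illustration of how a datastore reads data in chunks, the sketch below computes a mean over a CSV file without loading it all at once. It assumes the sample file airlinesmall.csv that ships with MATLAB; the chunk size and the choice of the ArrDelay column are arbitrary choices for this example.

```matlab
% Minimal sketch: incremental read from a datastore.
% Assumption: airlinesmall.csv (a MATLAB sample file) is on the path.
ds = tabularTextDatastore("airlinesmall.csv", "TreatAsMissing", "NA");
ds.SelectedVariableNames = {'ArrDelay'};  % import only the column we need
ds.ReadSize = 20000;                      % rows per chunk (arbitrary)

total = 0;
count = 0;
while hasdata(ds)
    chunk = read(ds);                     % one chunk as a table
    valid = ~isnan(chunk.ArrDelay);       % skip missing values
    total = total + sum(chunk.ArrDelay(valid));
    count = count + nnz(valid);
end
meanArrDelay = total / count;
```

The same loop applies when the datastore points at a folder of files or a remote location, since only one chunk is held in memory at a time.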
Learn More
- Images
- Parquet and Avro files
- Tabular text, CSV, and spreadsheets
- MDF files
- Databases (SQL, NoSQL)
- Databricks, Domino Data Lab, and Cloudera®
Explore, Clean, Transform, and Develop Predictive Models
With MATLAB, you can perform data analysis and data engineering on big data efficiently. MATLAB supports predicate pushdown for Parquet files, so you can filter big data at the source and import only the rows you need. Once the data is read, you can transform and combine data from different datastores for preprocessing and data engineering.
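The following sketch shows one way filtering at the source and a datastore transformation might look. It assumes a recent MATLAB release (rowfilter was introduced in R2022a) and the sample files outages.parquet and airlinesmall.csv that ship with MATLAB; the threshold of 500 and the selected variables are illustrative only.

```matlab
% Sketch 1: predicate pushdown when reading a Parquet file.
% Assumption: outages.parquet (a MATLAB sample file) is on the path.
rf = rowfilter("Loss");                   % symbolic filter over the Loss variable
bigLoss = rf.Loss > 500;                  % illustrative threshold
T = parquetread("outages.parquet", ...
    RowFilter=bigLoss, ...
    SelectedVariableNames=["Region" "OutageTime" "Loss"]);

% Sketch 2: apply a preprocessing step to every chunk a datastore returns.
% Assumption: airlinesmall.csv (a MATLAB sample file) is on the path.
ds = tabularTextDatastore("airlinesmall.csv", "TreatAsMissing", "NA");
ds.SelectedVariableNames = {'ArrDelay', 'DepDelay'};
cleanDs = transform(ds, @rmmissing);      % drop rows with missing values
firstRows = preview(cleanDs);
```

The combine function can be used in a similar way to read from two datastores side by side.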
MATLAB tall arrays use a lazy evaluation framework, which lets you run in-memory table and timetable-based code on big data without rewriting it. Tall arrays support hundreds of data manipulation, mathematical, statistical, and machine learning functions, which you can use for simple statistical analysis or for developing predictive models on big data.
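To make the lazy evaluation concrete, here is a minimal sketch that again assumes the airlinesmall.csv sample file shipped with MATLAB. Operations on the tall table are queued rather than executed, and the data is read only when gather is called.

```matlab
% Minimal sketch: deferred computation with tall arrays.
% Assumption: airlinesmall.csv (a MATLAB sample file) is on the path.
ds = tabularTextDatastore("airlinesmall.csv", "TreatAsMissing", "NA");
ds.SelectedVariableNames = {'ArrDelay', 'DepDelay'};

tt = tall(ds);                                  % tall table backed by the datastore
meanDelay = mean(tt.ArrDelay, "omitnan");       % deferred: nothing is read yet
maxDepDelay = max(tt.DepDelay);                 % also deferred

% gather triggers evaluation; requesting both results together
% lets MATLAB compute them in a single evaluation.
[meanDelay, maxDepDelay] = gather(meanDelay, maxDepDelay);
```

Apart from the calls to tall and gather, the analysis code is the same as it would be for an in-memory table.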
Learn More
- Tall Arrays
- Add two lines to your MATLAB code to make it work with Big Data (Blog)
- Transform and combine datastores
Integrate and Run on Your Big Data IT Infrastructure
MATLAB can help you process big data efficiently by integrating with your existing infrastructure. You can scale up and run your MATLAB code interactively using parallel processing, as well as in deployed production mode. You can deploy analytics royalty-free in streaming and batch applications. You can also run your MATLAB code and models with big data on cloud data platforms such as Databricks, Domino Data Lab, and Google® BigQuery.
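As a hedged sketch of scaling up interactively, the example below directs tall array evaluation to a parallel pool. It assumes Parallel Computing Toolbox is installed and uses the airlinesmall.csv sample file shipped with MATLAB; a cluster or Spark environment would need its own configuration, which is not shown here.

```matlab
% Sketch: run a tall array computation on a parallel pool.
% Assumption: Parallel Computing Toolbox is installed.
pool = parpool;                 % start a pool using the default profile
mapreducer(pool);               % evaluate tall arrays on the pool workers

ds = tabularTextDatastore("airlinesmall.csv", "TreatAsMissing", "NA");
ds.SelectedVariableNames = {'ArrDelay'};
tt = tall(ds);
maxArrDelay = gather(max(tt.ArrDelay));

delete(pool);                   % shut down the pool
mapreducer(0);                  % return to the local MATLAB session
```

The analysis lines are unchanged from the local case; only the execution environment set through mapreducer differs.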