Sunday, December 25, 2022

OCI Data Flow service and Jupyter notebooks

Oracle Cloud Infrastructure (OCI) Data Flow is a fully managed Apache Spark service that runs processing tasks on extremely large datasets without any infrastructure for you to deploy or manage. This enables rapid application delivery because developers can focus on app development, not infrastructure management.

OCI Data Flow handles infrastructure provisioning, network setup, and teardown when Spark jobs complete. Storage and security are also managed, so less work is required to create and operate Spark applications for big data analysis. With OCI Data Flow there are no clusters to install, patch, or upgrade, which saves time and operational cost. OCI Data Flow runs each Spark job on private, dedicated resources, eliminating the need for upfront capacity planning.

Web console-based, fully managed Jupyter notebooks: With Data Flow notebooks, you can develop big data analytics and data science applications in Python, Scala, and PySpark from fully managed Jupyter notebook sessions hosted by OCI Data Science. Through Jupyter kernels backed by OCI Data Flow, the code you write in a notebook cell executes with distributed Apache Spark processing on an OCI Data Flow cluster.
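As a sketch of what this looks like in practice, the cells below follow the pattern OCI documents for connecting a Data Science notebook session to Data Flow: the `ads` library for resource-principal authentication and the `dataflow.magics` Jupyter extension with its `%create_session` and `%%spark` magics. Treat the session parameters (shapes, Spark version) and the compartment OCID placeholder as illustrative assumptions, not a definitive configuration:

```python
# --- Notebook cell 1: authenticate and load the Data Flow magics ---
# (assumes an OCI Data Science notebook session with the pyspark
#  Data Flow conda environment installed)
import ads
ads.set_auth("resource_principal")   # authenticate as the notebook's resource principal
%load_ext dataflow.magics

# --- Notebook cell 2: provision a remote Data Flow Spark cluster ---
# Compartment OCID, shapes, and Spark version below are placeholder values.
%create_session -l python -c '{\
    "compartmentId": "<compartment_ocid>",\
    "displayName": "demo-session",\
    "sparkVersion": "3.2.1",\
    "driverShape": "VM.Standard.E4.Flex",\
    "executorShape": "VM.Standard.E4.Flex",\
    "numExecutors": 2}'

# --- Notebook cell 3: code here runs on the remote Data Flow cluster ---
%%spark
# `spark` is the managed SparkSession provided by the session
df = spark.range(1_000_000)
df.selectExpr("sum(id) as total").show()
```

Because the heavy lifting happens on the remote cluster, the notebook session itself can stay on a small shape; only the `%%spark` cells consume Data Flow resources.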

OCI Data Flow makes native use of Oracle Cloud's Identity and Access Management (IAM) system to control access to data and Data Flow resources, so data stays secure.
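Concretely, that control is expressed as ordinary IAM policy statements. A minimal sketch is shown below; the group name, compartment name, and bucket name are placeholders, and the `dataflow-family` aggregate resource-type should be checked against the current OCI policy reference:

```
ALLOW GROUP data-scientists TO MANAGE dataflow-family IN COMPARTMENT analytics
ALLOW GROUP data-scientists TO READ objects IN COMPARTMENT analytics
ALLOW SERVICE dataflow TO READ objects IN COMPARTMENT analytics WHERE target.bucket.name = 'dataflow-logs'
```

The first statement lets a user group create and run Data Flow applications, the second lets them read input data in Object Storage, and the third lets the Data Flow service itself read its log bucket.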

With OCI Data Flow, you pay only for the infrastructure resources that Spark jobs consume while they are running.
