APACHE Spark | Big Policy Canvas

Apache Spark™ is a unified analytics engine for large-scale data processing.

Type of content: Assets

Type of asset:

Application / Tool

Big data potential

Yes

Policy domains: Innovation, Science & Technology

Phase in the policy cycle:

Policy Design and Analysis

TRL

Open license availability

Yes

Website

https://spark.apache.org/

Serves:

Deeper understanding of IT potential and IT processes

Addresses:

SWOT Analysis for Apache Spark

Strengths (internal, helpful)
• Runs workloads faster: achieves high performance for both batch and streaming data, using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine.
• Applications can be written quickly in Java, Scala, Python, R, and SQL.
• Combines SQL, streaming, and complex analytics.
• Runs on Hadoop, Apache Mesos, Kubernetes, standalone, or in the cloud, and can access diverse data sources.
• Used at a wide range of organizations to process large datasets.

Weaknesses (internal, harmful)
• No file management: Apache Spark has no file management system of its own, so it relies on another platform such as Hadoop or a cloud-based storage platform. This is one of Spark's known issues.
• Limited real-time processing: Spark's streaming engines process data in micro-batches by default, so latency is near-real-time rather than per-event.
• Small-file problem: when Spark is used with Hadoop, HDFS is designed to store a limited number of large files rather than a large number of small files, so datasets made of many small files perform poorly.
• Back pressure: back pressure is the build-up of data at an input/output buffer that is full and cannot receive additional incoming data; no more data is transferred until the buffer has room again. Apache Spark does not handle back pressure implicitly; it has to be managed manually.
• Memory management: because Spark processes data in memory, memory usage has to be sized and tuned carefully.

Opportunities (external, helpful)
• Simplify the challenging, compute-intensive task of processing high volumes of data.
• Real-time data processing.
• Seamless integration of complex capabilities such as machine learning and graph algorithms.

Threats (external, harmful)
• Various competing technologies are overtaking Spark.
• In-memory processing is expensive when the goal is cost-efficient processing of big data.
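The DAG scheduler listed among Spark's strengths groups a job's transformations into stages: narrow transformations such as map or filter are pipelined within one stage, and a new stage begins wherever data must be shuffled across the cluster. The Python sketch below is a simplified illustration of that idea for a linear chain of steps only. The function split_into_stages and the (name, is_wide) step format are hypothetical and are not part of Spark's API; in real Spark the map side of a shuffle such as reduceByKey actually runs at the end of the preceding stage.

```python
# Simplified sketch: split a linear chain of transformations into stages,
# starting a new stage at every wide (shuffle) dependency.
# Hypothetical helper for illustration; this is not Spark's DAGScheduler code.

from typing import List, Tuple

# Each step is (operation name, is_wide); is_wide=True marks a step that
# needs a shuffle (e.g. reduceByKey), False a narrow step (e.g. map).
Step = Tuple[str, bool]


def split_into_stages(steps: List[Step]) -> List[List[str]]:
    stages: List[List[str]] = []
    current: List[str] = []
    for name, is_wide in steps:
        # A shuffle boundary closes the current stage; narrow steps
        # keep being pipelined into the same stage.
        if is_wide and current:
            stages.append(current)
            current = []
        current.append(name)
    if current:
        stages.append(current)
    return stages


word_count = [
    ("textFile", False),
    ("flatMap", False),
    ("map", False),
    ("reduceByKey", True),
    ("filter", False),
    ("map", False),
]

print(split_into_stages(word_count))
```

For this word-count-style chain the sketch yields two stages: one that reads and maps the input, and one that starts at the shuffle introduced by reduceByKey.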
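The small-file problem can be made concrete with some arithmetic on the metadata that the HDFS NameNode keeps in memory, which grows with the number of files and blocks rather than with the amount of data. The sketch below is illustrative only: it assumes the default 128 MB block size (dfs.blocksize) and counts exactly one namespace object per file and one per block, and the function namespace_objects is hypothetical.

```python
# Illustrative arithmetic: HDFS NameNode namespace objects for many small
# files versus the same data compacted into block-sized files.
# Assumptions: default 128 MB block size; one object per file plus one per
# block. Real per-object memory overhead varies.

import math

BLOCK_MB = 128  # assumed default dfs.blocksize (128 MB)


def namespace_objects(num_files: int, file_size_mb: float) -> int:
    """Hypothetical helper: count file + block objects the NameNode tracks."""
    # Even a tiny file occupies at least one block entry.
    blocks_per_file = max(1, math.ceil(file_size_mb / BLOCK_MB))
    return num_files * (1 + blocks_per_file)


total_mb = 10_000  # 10,000 MB of data in total

small = namespace_objects(10_000, 1)               # 10,000 files of 1 MB
compacted_files = math.ceil(total_mb / BLOCK_MB)   # 79 files of at most 128 MB
compacted = namespace_objects(compacted_files, BLOCK_MB)

print(small, compacted)
```

Under these assumptions, storing 10,000 MB as 10,000 files of 1 MB requires 20,000 namespace objects, while the same data in 79 block-sized files requires 158.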
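The back-pressure behaviour described under Weaknesses can be illustrated with a small discrete-time simulation of a bounded buffer. This is a conceptual sketch, not Spark code: the function simulate and its parameters (rate, capacity, drain) are hypothetical. Records that the buffer cannot accept are held back at the source rather than dropped, so once the buffer is full the source can only send as fast as the consumer drains. In Spark Streaming, rate-based back pressure has to be switched on explicitly through the spark.streaming.backpressure.enabled setting, which is off by default.

```python
# Conceptual sketch of back pressure with a bounded buffer (not a Spark API).
# Each tick the source offers `rate` new records; the buffer accepts only as
# many as it has free space for, and the rest stay queued at the source
# (held back, not dropped). The consumer then drains up to `drain` records.

from typing import List, Tuple


def simulate(ticks: int, rate: int, capacity: int, drain: int) -> Tuple[List[int], List[int]]:
    buffer = 0   # records currently waiting in the bounded buffer
    backlog = 0  # records held back at the source by back pressure
    buffer_levels: List[int] = []
    backlogs: List[int] = []
    for _ in range(ticks):
        pending = backlog + rate
        accepted = min(pending, capacity - buffer)  # only free space is usable
        buffer += accepted
        backlog = pending - accepted
        buffer -= min(buffer, drain)  # consumer processes what it can
        buffer_levels.append(buffer)
        backlogs.append(backlog)
    return buffer_levels, backlogs


levels, held_back = simulate(ticks=5, rate=10, capacity=15, drain=6)
print(levels)
print(held_back)
```

With a source rate of 10, a capacity of 15 and a drain of 6 records per tick, the buffer level after draining settles at 9 once the buffer fills, and the backlog held at the source grows by 4 records per tick ([0, 0, 3, 7, 11]).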

Open data - Download the Knowledge base

You are free to download the data of this Knowledge base.


All the data are licensed under Creative Commons CC BY 4.0.