Apache Spark™ is a unified analytics engine for large-scale data processing.

Type of content: Assets
Type of asset: Application / Tool
Big data potential
Policy domains: Innovation, Science & Technology
Phase in the policy cycle: Policy Design and Analysis
Open license availability
SWOT Analysis for
Helpful: Strengths, Opportunities. Harmful: Weaknesses, Threats.
Strengths
• Run workloads faster: Spark achieves high performance for both batch and streaming data, using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine.
• Write applications quickly in Java, Scala, Python, R, and SQL.
• Combine SQL, streaming, and complex analytics.
• Spark runs on Hadoop, Apache Mesos, Kubernetes, standalone, or in the cloud. It can access diverse data sources.
• Used at a wide range of organizations to process large datasets.
Weaknesses
• No file management system: Apache Spark does not have its own file management system, so it relies on another platform such as Hadoop or a cloud-based platform.
• No true real-time processing: Spark Streaming processes incoming data in micro-batches, which gives near-real-time rather than record-by-record processing.
• Small-file problem: when Spark is used with Hadoop, small files become a problem, because HDFS is designed for a limited number of large files rather than a large number of small files.
• Back pressure: back pressure is the build-up of data at an input/output buffer when the buffer is full and cannot accept additional incoming data. No data is transferred until the buffer has been emptied. Apache Spark does not handle back pressure implicitly; it has to be handled manually.
• Memory management
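
The small-file weakness above can be made concrete with rough arithmetic. The sketch below is plain Python, not Spark or Hadoop code. It assumes a 128 MiB HDFS block size and the commonly quoted rule of thumb of about 150 bytes of NameNode memory per file or block object; both figures and the function names are assumptions chosen for illustration, not values taken from this analysis.

```python
import math

BLOCK_SIZE = 128 * 1024**2   # assumed HDFS block size: 128 MiB
BYTES_PER_OBJECT = 150       # assumed rule of thumb: NameNode heap per file/block object


def namenode_objects(num_files: int, file_size: int) -> int:
    """Count the file and block objects the NameNode has to track."""
    blocks_per_file = max(1, math.ceil(file_size / BLOCK_SIZE))
    return num_files * (1 + blocks_per_file)  # one file object plus its blocks


def namenode_heap_bytes(num_files: int, file_size: int) -> int:
    """Estimate the NameNode heap used by the metadata of these files."""
    return namenode_objects(num_files, file_size) * BYTES_PER_OBJECT


# The same 1 TiB of data, stored two different ways.
total = 1024**4
small = namenode_heap_bytes(total // 1024**2, 1024**2)  # 1 MiB files
large = namenode_heap_bytes(total // 1024**3, 1024**3)  # 1 GiB files

print(f"1 MiB files: {small / 1024**2:.1f} MiB of NameNode heap")
print(f"1 GiB files: {large / 1024**2:.1f} MiB of NameNode heap")
```

Under these assumptions, storing the data as 1 MiB files costs over two hundred times more metadata memory than storing it as 1 GiB files, which is why small inputs are usually compacted into larger files before processing.
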
Opportunities
• Simplify the challenging and compute-intensive task of processing high volumes of data
• Process data in real time
• Seamlessly integrate complex capabilities such as machine learning and graph algorithms
Threats
• Various competing technologies are overtaking Spark
• In-memory processing is expensive, which is a drawback when cost-efficient processing of big data is required
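
The back pressure item among the weaknesses listed above describes a full buffer that stops accepting data until it is drained. A minimal sketch of that behaviour, in plain Python rather than Spark code, is shown below; the function `run_pipeline` and its parameters are hypothetical names used only for this illustration. A fast producer writes into a bounded queue, and `put` blocks whenever the slower consumer has not yet made room.

```python
import queue
import threading
import time


def run_pipeline(num_items: int, buffer_size: int) -> tuple[list[int], int]:
    """Push num_items through a bounded buffer.

    Returns the consumed items and the largest buffer depth the producer observed.
    """
    buffer: queue.Queue = queue.Queue(maxsize=buffer_size)
    consumed: list[int] = []
    peak = 0

    def producer() -> None:
        nonlocal peak
        for i in range(num_items):
            buffer.put(i)                  # blocks while the buffer is full
            peak = max(peak, buffer.qsize())
        buffer.put(None)                   # sentinel: end of stream

    def consumer() -> None:
        while True:
            item = buffer.get()
            if item is None:
                break
            time.sleep(0.001)              # simulate a slow downstream stage
            consumed.append(item)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed, peak


items, peak_depth = run_pipeline(num_items=50, buffer_size=4)
print(len(items), "items consumed; peak buffer depth", peak_depth)
```

Here the blocking `put` is the back-pressure mechanism: the producer is slowed to the consumer's pace and no data is lost. In Spark Streaming the comparable rate control has to be switched on explicitly (the `spark.streaming.backpressure.enabled` setting is, to the best of our knowledge, off by default), which matches the "handled manually" point in the list.
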

Open data - Download the Knowledge base

You are free to download the data of this Knowledge base.

All the data are licensed under Creative Commons CC BY 4.0.