Apache Spark™ is a unified analytics engine for large-scale data processing.
|SWOT Analysis for Apache Spark
|Strengths
• Run workloads faster: achieves high performance for both batch and streaming data, using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine.
• Write applications quickly in Java, Scala, Python, R, and SQL.
• Combine SQL, streaming, and complex analytics.
• Runs on Hadoop, Apache Mesos, Kubernetes, standalone, or in the cloud, and can access diverse data sources.
• Used at a wide range of organizations to process large datasets.
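The deployment flexibility above comes from `spark-submit`, which lets the same application target different cluster managers by changing only the master URL. A minimal sketch, assuming a hypothetical application script `my_app.py` and illustrative host names:

```shell
# Run locally using 4 worker threads
spark-submit --master "local[4]" my_app.py

# Run on a standalone Spark cluster
spark-submit --master spark://master-host:7077 my_app.py

# Run on YARN (Hadoop) or Kubernetes by swapping the master URL
spark-submit --master yarn my_app.py
spark-submit --master k8s://https://k8s-apiserver:6443 --deploy-mode cluster my_app.py
```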
|Weaknesses
• No file management: Apache Spark does not have its own file management system, so it must rely on another platform such as Hadoop (HDFS) or a cloud-based storage service; this is one of Spark's known limitations.
• No support for true real-time processing: Spark Streaming processes live data in micro-batches, so it delivers near-real-time rather than record-by-record processing.
• Problem with small files: when Spark is used with Hadoop, the small-file problem arises, because HDFS is designed to store a small number of large files rather than a large number of small files.
• Back pressure: back pressure is a build-up of data at an input/output buffer when the buffer is full and cannot accept additional incoming data; no data is transferred until the buffer is emptied. Apache Spark does not handle back pressure implicitly; it must be managed manually.
• Memory management: Spark's in-memory processing consumes large amounts of RAM and often requires manual tuning.
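The back-pressure idea above can be sketched with a bounded buffer. This is a minimal Python illustration (not Spark code; all names are illustrative): when the buffer is full, the producer blocks until the consumer drains items, throttling a fast source to the consumer's pace.

```python
import queue
import threading
import time

# A bounded queue models an input buffer: put() blocks whenever the
# buffer already holds `maxsize` items, which is back pressure applied
# to the producer.
buf = queue.Queue(maxsize=3)
consumed = []

def consumer():
    # A deliberately slow consumer: drains one item at a time.
    for _ in range(10):
        item = buf.get()
        time.sleep(0.01)
        consumed.append(item)
        buf.task_done()

t = threading.Thread(target=consumer)
t.start()

for i in range(10):
    # Blocks here whenever the buffer is full, pacing the producer.
    buf.put(i)

t.join()
print(consumed)  # all 10 items arrive, in order, despite the rate mismatch
```

Spark offers no such implicit throttling at its inputs, so ingestion rates must be tuned by hand to avoid overwhelming downstream stages.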
|Opportunities
• Simplify the challenging and compute-intensive task of processing high volumes of data.
• Real-time data processing.
• Seamless integration of complex capabilities such as machine learning and graph algorithms.
|Threats
• Various competing technologies (for example, stream-native engines such as Apache Flink) are challenging Spark's position.
• In-memory processing is expensive when the goal is cost-efficient big-data processing.