Spark Framework for Big Data Analysis on Pseudo Distributed Clusters White Paper
Many "big data" applications have been designed to track statistics about page views in real time, train machine learning models, and automatically detect anomalies. But these applications often require a different set of tools, such as MapReduce on Hadoop (MR), Hive, Hadoop Streaming, Weka, and Mahout, to create models and classifiers.
This white paper discusses how streaming data is processed across the layers of the Spark stack, such as Spark Streaming, Spark SQL, and the Spark machine learning library (MLlib).
In this paper, Syntelli data scientist Vishwas Subramanian discusses transforming a stream of live Twitter data into datasets, carrying out feature extraction, constructing a model, analyzing the data, improving the language classification, and finally applying the model back in real time on a pseudo-distributed system built on Linux containers (LXCs).
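The feature extraction step in that pipeline can be illustrated with a small sketch. This is not the white paper's code: it assumes tweets are turned into character-bigram counts and hashed into a fixed-size vector, a common approach to language classification that plays the same role as MLlib's HashingTF. The names `NUM_FEATURES`, `bigrams`, `featurize`, and `process_stream` are hypothetical, and plain Python stands in for Spark so the example runs without a cluster.

```python
import zlib
from collections import Counter

NUM_FEATURES = 1000  # hypothetical vector size, chosen only for illustration


def bigrams(text):
    """Return the overlapping character bigrams of the lowercased text."""
    lowered = text.lower()
    return [lowered[i:i + 2] for i in range(len(lowered) - 1)]


def featurize(text, num_features=NUM_FEATURES):
    """Map a tweet to a sparse {index: count} vector using the hashing trick."""
    counts = Counter()
    for gram in bigrams(text):
        # crc32 is deterministic across runs, unlike Python's built-in hash()
        index = zlib.crc32(gram.encode("utf-8")) % num_features
        counts[index] += 1
    return dict(counts)


def process_stream(tweets):
    """Yield (tweet, features) pairs one at a time, as a streaming job might."""
    for tweet in tweets:
        yield tweet, featurize(tweet)


if __name__ == "__main__":
    sample = ["Spark makes streaming simple", "Bonjour tout le monde"]
    for tweet, features in process_stream(sample):
        print(tweet, "->", len(features), "non-zero features")
```

In a Spark deployment, the per-tweet function would presumably be applied to each micro-batch of the stream, and the resulting vectors passed to a model trained with MLlib.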