
PySpark - Compute Average in Parallel

   
# Run with: bin/dse spark-submit compute-average.py

from pyspark import SparkContext, SparkConf

conf = SparkConf().setAppName("Compute Average in Parallel")
sc = SparkContext(conf=conf)

# ---------- in the pyspark shell, sc is already provided ---------- #
a = sc.parallelize([1, 3, 4, 5, 8, 9, 12, 45, 67, 88, 99])

# Note: reducing with a pairwise average such as lambda x, y: (x + y) / 2.0
# does NOT compute the mean -- averaging is not associative, so the result
# depends on how Spark groups elements across partitions. The correct
# parallel approach is to reduce to a sum, count the elements, then divide.

def add(x, y):
    return x + y

SUM = a.reduce(add)
print("SUM=" + str(SUM))

COUNT = a.count()
print("COUNT=" + str(COUNT))

AVERAGE = SUM / float(COUNT)
print("AVERAGE=" + str(AVERAGE))

Output

SUM=341
COUNT=11
AVERAGE=31.0
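
The two-pass approach above (one job for the sum, another for the count) is correct, but the same result can come out of a single pass. As a minimal sketch, assuming the same RDD a from the script above, RDD.aggregate can carry a running (sum, count) pair; both combining functions are associative, so Spark can apply them safely across partitions. The helper names seq_op and comb_op below are illustrative, not part of the PySpark API.

zero = (0.0, 0)  # (running sum, running count)

def seq_op(acc, value):
    # Fold one element into a partition-local accumulator.
    return (acc[0] + value, acc[1] + 1)

def comb_op(acc1, acc2):
    # Merge accumulators produced by different partitions.
    return (acc1[0] + acc2[0], acc1[1] + acc2[1])

total, count = a.aggregate(zero, seq_op, comb_op)
print("AVERAGE=" + str(total / count))

PySpark also exposes this directly as a.mean(), which computes the same statistic without the hand-written accumulator.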