3 days running average / capco
data = [Row(Date='2023-01-01', ProductID=100, QuantitySold=10),
Row(Date='2023-01-02', ProductID=100, QuantitySold=15),
Row(Date='2023-01-03', ProductID=100, QuantitySold=20),
Row(Date='2023-01-04', ProductID=100, QuantitySold=25),
Row(Date='2023-01-05', ProductID=100, QuantitySold=30),
Row(Date='2023-01-06', ProductID=100, QuantitySold=35),
Row(Date='2023-01-07', ProductID=100, QuantitySold=40),
Row(Date='2023-01-08', ProductID=100, QuantitySold=45)]
###########################################################################
I have the following sales data for a product, with the quantity sold on each date:
Date ProductID QuantitySold
2023-01-01 100 10
2023-01-02 100 15
2023-01-03 100 20
2023-01-04 100 25
2023-01-05 100 30
2023-01-06 100 35
2023-01-07 100 40
2023-01-08 100 45
I need to calculate the 3-day running average of the QuantitySold for this product.
use spark/scala/pyspark for calculation
Expected Output
Date ProductID QuantitySold RunningAvg3Days
2023-01-01 100 10 10.00
2023-01-02 100 15 12.50
2023-01-03 100 20 15.00
2023-01-04 100 25 20.00
2023-01-05 100 30 25.00
2023-01-06 100 35 30.00
2023-01-07 100 40 35.00
2023-01-08 100 45 40.00
Comments
Post a Comment