Problem Scenario 42 : You have been given a file (sparklO/sales.txt), with the content as given in below.
spark10/sales.txt
Department,Designation,costToCompany,State
Sales,Trainee,12000,UP
Sales,Lead,32000,AP
Sales,Lead,32000,LA
Sales,Lead,32000,TN
Sales,Lead,32000,AP
Sales,Lead,32000,TN
Sales,Lead,32000,LA
Sales,Lead,32000,LA
Marketing,Associate,18000,TN
Marketing,Associate,18000,TN
HR,Manager,58000,TN
And want to produce the output as a csv with group by Department,Designation,State with additional columns with sum(costToCompany) and TotalEmployeeCountt
Should get result like
Dept,Desg,state,empCount,totalCost
Sales,Lead,AP,2,64000
Sales.Lead.LA.3.96000
Sales,Lead,TN,2,64000
Submit