Problem Scenario 16 : You have been given following mysql database details as well as other info.
user=retail_dba
password=cloudera
database=retail_db
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Please accomplish below assignment.
1. Create a table in hive as below.
create table departments_hive(department_id int, department_name string);
2. Now import data from mysql table departments to this hive table. Please make sure that data should be visible using below hive command, select" from departments_hive
Problem Scenario 85 : In Continuation of previous question, please accomplish following activities.
1. Select all the columns from product table with output header as below. productID AS ID
code AS Code name AS Description price AS 'Unit Price'
2. Select code and name both separated by ' -' and header name should be Product Description'.
3. Select all distinct prices.
4. Select distinct price and name combination.
5. Select all price data sorted by both code and productID combination.
6. count number of products.
7. Count number of products for each code.
Problem Scenario 39 : You have been given two files
spark16/file1.txt
1,9,5
2,7,4
3,8,3
spark16/file2.txt
1,g,h
2,i,j
3,k,l
Load these two tiles as Spark RDD and join them to produce the below results
(l,((9,5),(g,h)))
(2, ((7,4), (i,j))) (3, ((8,3), (k,l)))
And write code snippet which will sum the second columns of above joined results (5+4+3).
Problem Scenario 90 : You have been given below two files
course.txt
id,course
1,Hadoop
2,Spark
3,HBase
fee.txt
id,fee
2,3900
3,4200
4,2900
Accomplish the following activities.
1. Select all the courses and their fees , whether fee is listed or not.
2. Select all the available fees and respective course. If course does not exists still list the fee
3. Select all the courses and their fees , whether fee is listed or not. However, ignore records having fee as null.
Problem Scenario 14 : You have been given following mysql database details as well as other info.
user=retail_dba
password=cloudera
database=retail_db
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Please accomplish following activities.
1. Create a csv file named updated_departments.csv with the following contents in local file system.
updated_departments.csv
2,fitness
3,footwear
12,fathematics
13,fcience
14,engineering
1000,management
2. Upload this csv file to hdfs filesystem,
3. Now export this data from hdfs to mysql retaildb.departments table. During upload make sure existing department will just updated and new departments needs to be inserted.
4. Now update updated_departments.csv file with below content.
2,Fitness
3,Footwear
12,Fathematics
13,Science
14,Engineering
1000,Management
2000,Quality Check
5. Now upload this file to hdfs.
6. Now export this data from hdfs to mysql retail_db.departments table. During upload make sure existing department will just updated and no new departments needs to be inserted.
Problem Scenario 43 : You have been given following code snippet.
val grouped = sc.parallelize(Seq(((1,"twoM), List((3,4), (5,6)))))
val flattened = grouped.flatMap {A =>
groupValues.map { value => B }
}
You need to generate following output.
Hence replace A and B
Array((1,two,3,4),(1,two,5,6))
Problem Scenario 36 : You have been given a file named spark8/data.csv (type,name).
data.csv
1,Lokesh
2,Bhupesh
2,Amit
2,Ratan
2,Dinesh
1,Pavan
1,Tejas
2,Sheela
1,Kumar
1,Venkat
1. Load this file from hdfs and save it back as (id, (all names of same type)) in results directory. However, make sure while saving it should be
Problem Scenario 73 : You have been given data in json format as below.
{"first_name":"Ankit", "last_name":"Jain"}
{"first_name":"Amir", "last_name":"Khan"}
{"first_name":"Rajesh", "last_name":"Khanna"}
{"first_name":"Priynka", "last_name":"Chopra"}
{"first_name":"Kareena", "last_name":"Kapoor"}
{"first_name":"Lokesh", "last_name":"Yadav"}
Do the following activity
1. create employee.json file locally.
2. Load this file on hdfs
3. Register this data as a temp table in Spark using Python.
4. Write select query and print this data.
5. Now save back this selected data in json format.