Great Medium article on Spark and SQL
- bdata3
- Jan 1, 2020
- 1 min read
Updated: Feb 20, 2020
You can use SQL, including window functions, on a Spark DataFrame.
A nice article on this can be found here:
For example, in a Databricks notebook (https://community.cloud.databricks.com/), but not only there:
file_location = "/FileStore/tables/social_delta.csv"
file_type = "csv"
# CSV options
infer_schema = "false"
first_row_is_header = "true"
delimiter = ","
# The applied options are for CSV files. For other file types, these will be ignored.
df = spark.read.format(file_type) \
    .option("inferSchema", infer_schema) \
    .option("header", first_row_is_header) \
    .option("sep", delimiter) \
    .load(file_location)
display(df)
# register the DataFrame as a temp view so it can be queried with SQL
temp_table_name = "social_delta_csv"
df.createOrReplaceTempView(temp_table_name)
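In Databricks you can also query the registered view straight from a SQL cell; a small usage example, assuming the same view name:
%sql
SELECT * FROM social_delta_csv LIMIT 3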
# running total per (url, service) ordered by ts, then difference it back;
# spark.sql is the modern entry point (sqlContext is deprecated), and the
# recomputed column is aliased delta2 so it does not duplicate the source delta column
df1 = spark.sql("""
    SELECT *,
           total - LAG(total, 1, 0) OVER (PARTITION BY url, service ORDER BY ts) AS delta2
    FROM (
        SELECT *,
               SUM(delta) OVER (PARTITION BY url, service ORDER BY ts) AS total
        FROM social_delta_csv
    ) t
""")
df1.show(3)
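
The same window logic can also be expressed with the DataFrame API instead of SQL. Here is a minimal sketch, assuming the CSV has the url, service, ts and delta columns referenced in the query above (delta2 is just an illustrative name for the recomputed column):

from pyspark.sql import functions as F
from pyspark.sql.window import Window

# window over each (url, service) group, ordered by timestamp
w = Window.partitionBy("url", "service").orderBy("ts")

df2 = (df
       # running total of the per-row delta column
       .withColumn("total", F.sum("delta").over(w))
       # difference consecutive running totals to recover the original delta
       .withColumn("delta2", F.col("total") - F.lag("total", 1, 0).over(w)))
df2.show(3)

Note that with infer_schema left at "false" every column is read as a string, so Spark casts delta implicitly inside the SUM; setting infer_schema = "true" gives you numeric columns up front.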

This and much more can be found in the article ...