Great Medium article on Spark and SQL
- bdata3
- Jan 1, 2020
- 1 min read
Updated: Feb 20, 2020
You can use SQL, including window functions, on a Spark DataFrame.
A nice article on this can be found here:
For example, in a Databricks notebook (https://community.cloud.databricks.com/), but not only there:
file_location = "/FileStore/tables/social_delta.csv"
file_type = "csv"
# CSV options
infer_schema = "false"
first_row_is_header = "true"
delimiter = ","
# The applied options are for CSV files. For other file types, these will be ignored.
df = spark.read.format(file_type) \
    .option("inferSchema", infer_schema) \
    .option("header", first_row_is_header) \
    .option("sep", delimiter) \
    .load(file_location)
display(df)
# register the DataFrame as a temp view so it can be queried with SQL
temp_table_name = "social_delta_csv"
df.createOrReplaceTempView(temp_table_name)
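In Databricks you can also query the registered view straight from a SQL cell; a small usage example, assuming the same view name:
%sql
SELECT * FROM social_delta_csv LIMIT 3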
# running total per (url, service) ordered by ts, then difference it back;
# spark.sql is the modern entry point (sqlContext is deprecated), and the
# recomputed column is aliased delta2 so it does not duplicate the source delta column
df1 = spark.sql("""
    SELECT *,
           total - LAG(total, 1, 0) OVER (PARTITION BY url, service ORDER BY ts) AS delta2
    FROM (
        SELECT *,
               SUM(delta) OVER (PARTITION BY url, service ORDER BY ts) AS total
        FROM social_delta_csv
    ) t
""")
df1.show(3)
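
The same window logic can also be expressed with the DataFrame API instead of SQL. Here is a minimal sketch, assuming the CSV has the url, service, ts and delta columns referenced in the query above (delta2 is just an illustrative name for the recomputed column):

from pyspark.sql import functions as F
from pyspark.sql.window import Window

# window over each (url, service) group, ordered by timestamp
w = Window.partitionBy("url", "service").orderBy("ts")

df2 = (df
       # running total of the per-row delta column
       .withColumn("total", F.sum("delta").over(w))
       # difference consecutive running totals to recover the original delta
       .withColumn("delta2", F.col("total") - F.lag("total", 1, 0).over(w)))
df2.show(3)

Note that with infer_schema left at "false" every column is read as a string, so Spark casts delta implicitly inside the SUM; setting infer_schema = "true" gives you numeric columns up front.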

This and much more can be found in the article ...