Python, Pandas, SQL , Merge and Group by
- bdata3
- Mar 17, 2020
- 1 min read
So you have
You have an expense report, for example, coming from aws CUR (https://aws.amazon.com/premiumsupport/knowledge-center/cost-usage-report/) but you want to group it per customer (not account and by month)
use the following python code it may help you:
from pyathena import connect
import pandas as pd
st='s3://c-res-stg/s1/'
conn = connect(aws_access_key_id=access,aws_secret_access_key=secret, s3_staging_dir=st,region_name='eu-west-1')
sql_q='''select linkedaccountid,d_month,sum(blendedcost) cost from c_cust_db.c_res_fin_d_p1 group by linkedaccountid,d_month'''
df=pd.read_sql_query(sql_q,conn)
cust_df=pd.read_excel('/Users/geva/Downloads/Customer_list.xlsx')
t=df.merge(cust_df,left_on='linkedaccountid',right_on='Account ID').groupby(['Customer Name','d_month']).sum()

Comments