Jupyter and Pandas tips and tricks
- bdata3
- Jan 23, 2022
- 1 min read
Updated: Feb 20, 2022
This is an ongoing post with tips and tricks for Jupyter and Pandas. I'm sure there are many more - send them to me and I'll add them (bdata@geva4u.com).
Copy cell output to clipboard - useful, for example, when you have built a SQL query string in pandas and want to run it in a SQL client:
import pandas as pd
# Out[35] is Jupyter's cached output of cell number 35
pd.DataFrame([Out[35]]).to_clipboard()
Count the values in a column and plot them as a bar chart:
df.value_counts('type').sort_values(ascending=False).plot.bar()

To run SQL on a DataFrame you can use pandasql - details at https://pypi.org/project/pandasql/. Then you will be able to query with SQL instead of pandas syntax.

To show all columns and rows in the data frame:
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
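If you only need the wider display for a single cell, `pd.option_context` applies the same settings temporarily and reverts them afterwards (a sketch; `df` here is just a throwaway wide DataFrame):

```python
import pandas as pd

df = pd.DataFrame({f'col{i}': range(3) for i in range(30)})

# Options apply only inside the with-block and revert on exit
with pd.option_context('display.max_rows', 500,
                       'display.max_columns', 500,
                       'display.width', 1000):
    print(df)
```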
Finding all non-numeric values in a column:
import numpy as np
tmp[~tmp['customer'].apply(np.isreal)]['customer'].unique()
Counting them and plotting on a graph:
tmp[~tmp['customer'].apply(np.isreal)].groupby('customer').agg({'customer':'count'}).plot.bar()

Automatically convert to str any object column that couldn't be saved to Parquet (the same approach can be used for other data-type conversions):
retry = True
while retry:
    try:
        # Keep only rows where 'customer' is numeric, then write to Parquet
        tmp[tmp['customer'].apply(np.isreal)].to_parquet(
            '/Users/geva/Documents/cust/out/t_tmp.parquet')
        retry = False
    except BaseException as err:
        # Parse the offending column name out of the error message,
        # cast that column to str, and retry the write
        t = str(err).split()
        col_name = t[t.index('column') + 1]
        tmp[col_name] = tmp[col_name].astype('str')
        print(col_name)
And when using awswrangler:
try:
    wr.s3.to_parquet(
        df=df,
        path=args.dest + args.table,
        dataset=True,
        mode=args.mode,
        database=args.database,
        table=args.table)
except BaseException as err:
    # Cast any object column that is not fully numeric to str, then retry
    for f, t in zip(df.columns, df.dtypes):
        if t == 'object' and not df[f].apply(np.isreal).all():
            df[f] = df[f].astype('str')
            logging.info(f'changed type of col ==> {f}')
    wr.s3.to_parquet(
        df=df,
        path=args.dest + args.table,
        dataset=True,
        mode=args.mode,
        database=args.database,
        table=args.table)