Jupyter and Pandas tips and tricks
- bdata3
- Jan 23, 2022
- 1 min read
Updated: Feb 20, 2022
This is an ongoing post with tips and tricks for Jupyter and Pandas. I'm sure there are many more - send them to me and I'll add them (bdata@geva4u.com).
Copy cell output to clipboard - useful, for example, when you have built a SQL query string in pandas and want to run it in a SQL client:
import pandas as pd
# Out[35] is Jupyter's cached output of cell number 35
pd.DataFrame([Out[35]]).to_clipboard()
Count the values in a column and plot them as a bar chart:
df.value_counts('type').sort_values(ascending=False).plot.bar()

To run SQL on a DataFrame you can use pandasql - details at https://pypi.org/project/pandasql/. Then you will be able to query with SQL instead of pandas syntax.

To show all columns and rows in the data frame:
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
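If you only need the wider display for a single cell, `pd.option_context` applies the same settings temporarily and reverts them afterwards (a sketch; `df` here is just a throwaway wide DataFrame):

```python
import pandas as pd

df = pd.DataFrame({f'col{i}': range(3) for i in range(30)})

# Options apply only inside the with-block and revert on exit
with pd.option_context('display.max_rows', 500,
                       'display.max_columns', 500,
                       'display.width', 1000):
    print(df)
```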
Finding all non-numeric values in a column:
import numpy as np
tmp[~tmp['customer'].apply(np.isreal)]['customer'].unique()
Counting them and plotting on a graph:
tmp[~tmp['customer'].apply(np.isreal)].groupby('customer').agg({'customer':'count'}).plot.bar()

Automatically convert to str any object column that couldn't be saved to Parquet (the same approach can be used for other data-type conversions):
retry = True
while retry:
    try:
        # Keep only rows where 'customer' is numeric, then write to Parquet
        tmp[tmp['customer'].apply(np.isreal)].to_parquet(
            '/Users/geva/Documents/cust/out/t_tmp.parquet')
        retry = False
    except BaseException as err:
        # Parse the offending column name out of the error message,
        # cast that column to str, and retry the write
        t = str(err).split()
        col_name = t[t.index('column') + 1]
        tmp[col_name] = tmp[col_name].astype('str')
        print(col_name)
And when using awswrangler:
try:
    wr.s3.to_parquet(
        df=df,
        path=args.dest + args.table,
        dataset=True,
        mode=args.mode,
        database=args.database,
        table=args.table)
except BaseException as err:
    # Cast any object column that is not fully numeric to str, then retry
    for f, t in zip(df.columns, df.dtypes):
        if t == 'object' and not df[f].apply(np.isreal).all():
            df[f] = df[f].astype('str')
            logging.info(f'changed type of col ==> {f}')
    wr.s3.to_parquet(
        df=df,
        path=args.dest + args.table,
        dataset=True,
        mode=args.mode,
        database=args.database,
        table=args.table)