Python smart-open package
- bdata3
- May 10, 2020
If you have a very large file on S3 or on the web, use smart_open. It also helps with local gzip files.
It is a Python 3 library for efficiently streaming very large files to and from storage such as S3, GCS, HDFS, WebHDFS, HTTP, HTTPS, SFTP, or the local filesystem. It supports transparent, on-the-fly (de-)compression for a variety of formats.
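A minimal sketch of the transparent (de-)compression, assuming smart_open is installed (pip install smart_open) and using a local temporary file as a stand-in for a large remote one (the path and the sample rows are hypothetical). Writing to a path ending in ".gz" compresses on the fly, the standard gzip module confirms the bytes on disk really are gzip, and reading back through smart_open decompresses line by line.

```python
import gzip
import os
import tempfile

from smart_open import open  # third-party: pip install smart_open

# Hypothetical local path standing in for a large file.
path = os.path.join(tempfile.mkdtemp(), "example.csv.gz")

# Write: the ".gz" suffix tells smart_open to gzip-compress on the fly.
with open(path, "w") as fout:
    fout.write("id,name\n")
    fout.write("1,alice\n")
    fout.write("2,bob\n")

# The file on disk is real gzip data: the stdlib gzip module can read it.
with gzip.open(path, "rt") as f:
    raw_text = f.read()

# Read: smart_open decompresses transparently and yields text lines.
with open(path) as fin:
    lines = [line.rstrip("\n") for line in fin]

print(lines)  # ['id,name', '1,alice', '2,bob']
```

The same calls should work with an s3:// or https:// URL in place of the local path, assuming the corresponding credentials and optional dependencies are available.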
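To print the file's header (its first line):
from smart_open import open

# smart_open's open() replaces the built-in open() and understands s3:// URLs.
# The .gz extension triggers transparent gzip decompression.
with open('s3://bucket/file.csv.gz') as t:
    # Read only the first line instead of iterating over the whole file.
    print(next(t), end='')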
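If you want the header split into column names rather than printed as a raw line, the file object returned by smart_open can be handed straight to the csv module. This is a sketch under assumptions: a local gzip file stands in for 's3://bucket/file.csv.gz' so it runs without AWS credentials, and the column names are made up.

```python
import csv
import gzip
import os
import tempfile

from smart_open import open  # third-party: pip install smart_open

# Hypothetical local stand-in for 's3://bucket/file.csv.gz'.
path = os.path.join(tempfile.mkdtemp(), "file.csv.gz")
with gzip.open(path, "wt", newline="") as f:
    f.write("id,name,city\n1,alice,Paris\n")

# newline="" is what the csv module recommends for file objects;
# only the first row is parsed, so the rest of the file is not read.
with open(path, newline="") as fin:
    columns = next(csv.reader(fin))

print(columns)  # ['id', 'name', 'city']
```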
