Iterating over S3

Posted on 2016-07-04 20:20 in Python

First, you need a key that has S3 list permission. If the bucket contains a lot of files, it is best to iterate over them with pagination.

import boto3

# Replace the <...> placeholders with your own values.
session = boto3.Session(aws_access_key_id='<s3_aws_key_id>',
                        aws_secret_access_key='<s3_aws_secret_key>',
                        region_name='us-east-1')
client = session.client('s3')
paginator = client.get_paginator('list_objects')
for result in paginator.paginate(Bucket='<s3_bucket>', Prefix='<s3_path_prefix>'):
    # 'Contents' is missing when a page has no matching objects
    for content in result.get('Contents', []):
        if content.get('Size') > 0:  # skip zero-byte keys
            print(content.get('Key'))
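The same loop can be wrapped in a small generator so the keys can be reused elsewhere instead of only printed. This is a minimal sketch, not part of boto3: the name `iter_keys` is hypothetical, and `client` is assumed to be an S3 client created with `session.client('s3')` as above.

```python
def iter_keys(client, bucket, prefix=''):
    """Yield the key of every non-empty object under `prefix`.

    `client` is assumed to be a boto3 S3 client; `iter_keys` itself is a
    hypothetical helper, not a boto3 function.
    """
    paginator = client.get_paginator('list_objects')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page has no 'Contents' entry when nothing matches the prefix.
        for obj in page.get('Contents', []):
            if obj['Size'] > 0:  # skip zero-byte keys
                yield obj['Key']
```

Usage would look like `for key in iter_keys(client, '<s3_bucket>', '<s3_path_prefix>'): print(key)`. Because it is a generator, pages are only requested as the loop advances.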

The default page size is 1000. You can change it with PageSize; a smaller value shortens the response time of each request.

paginator = client.get_paginator('list_objects')
page_iterator = paginator.paginate(Bucket='my-bucket',
                                   PaginationConfig={'PageSize': 100})
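
The snippet above only builds the page iterator; nothing is requested until it is consumed. The sketch below shows one way to consume it. The helper name `first_keys` and its default values are hypothetical, and `client` is again assumed to be a boto3 S3 client. `MaxItems` does not appear in the original example; it is another `PaginationConfig` key, added here as an assumption to cap the total number of keys returned.

```python
def first_keys(client, bucket, page_size=100, max_items=250):
    """Return at most `max_items` keys, requesting `page_size` keys per call.

    Hypothetical helper; `client` is assumed to be a boto3 S3 client.
    """
    paginator = client.get_paginator('list_objects')
    page_iterator = paginator.paginate(
        Bucket=bucket,
        PaginationConfig={
            'PageSize': page_size,  # keys requested per API call
            'MaxItems': max_items,  # overall cap across all pages
        },
    )
    keys = []
    for page in page_iterator:
        # 'Contents' is absent when a page holds no objects.
        keys.extend(obj['Key'] for obj in page.get('Contents', []))
    return keys
```

A smaller `page_size` means more round trips overall, so it trades total throughput for faster individual responses.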

References:
1. https://github.com/boto/boto3
2. https://boto3.readthedocs.io/en/latest/reference/services/s3.html#paginators
3. https://boto3.readthedocs.io/en/latest/guide/paginators.html