Saturday, April 20, 2019

Understanding Nested Lists and Dictionaries of JSON in Python and AWS CLI


After a lot of hair pulling and bouts of frustration, I finally grasped this nested list and dictionary thingie in the JSON output of AWS CLI commands such as describe-db-instances. If you run describe-db-instances for RDS or describe-instances for EC2, you get a huge pile of JSON mumbo-jumbo, all curly and square brackets studded with colons and commas. The output is heavily nested.


For example, if you run:

aws rds describe-db-instances

you get all the information, but it is heavily nested. If you only want to extract or iterate through, say, the VpcSecurityGroupId of every database instance, you have to traverse all that nesting: a dictionary whose keys have arrays as values, and those arrays contain more dictionaries, and so on.

After the above rant, let me try to ease the pain a bit by explaining this. For clarity, I have taken just the following chunk from the describe-db-instances output. Suppose the only thing you are interested in is the value of VpcSecurityGroupId from the following chunk:

# Trimmed response of: mydb = rds.describe_db_instances(DBInstanceIdentifier=onedb)
mydb = {'DBInstances':
          [
            {'VpcSecurityGroups': [{'VpcSecurityGroupId': 'sg-0ed48bab1d54e9554', 'Status': 'active'}]}
          ]
       }

The variable mydb is a dictionary with the key DBInstances. The value of DBInstances is an array (a list, in Python terms). The first item of that array is another dictionary, and one of its keys is VpcSecurityGroups. The value of VpcSecurityGroups is another array. The first item of this second array is again a dictionary. This last dictionary has a key VpcSecurityGroupId, and we want the value of this key.
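
The same walk can be followed one level at a time in Python. This is only a sketch on the trimmed sample data from above; the intermediate variable names (instances, first_instance, groups, first_group) are made up here for illustration.

```python
# Sample data copied from the trimmed response shown above.
mydb = {
    'DBInstances': [
        {'VpcSecurityGroups': [
            {'VpcSecurityGroupId': 'sg-0ed48bab1d54e9554', 'Status': 'active'}
        ]}
    ]
}

# Peel off one level per line; the comment gives the type at each level.
instances = mydb['DBInstances']                 # list
first_instance = instances[0]                   # dict
groups = first_instance['VpcSecurityGroups']    # list
first_group = groups[0]                         # dict
group_id = first_group['VpcSecurityGroupId']    # str

# Print the type found at each level to confirm the dict/list alternation.
for name, value in [('instances', instances),
                    ('first_instance', first_instance),
                    ('groups', groups),
                    ('first_group', first_group),
                    ('group_id', group_id)]:
    print(name, type(value).__name__)
```

Printing the type at each level is a handy habit when facing an unfamiliar response: a dict needs a key next, a list needs an index.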

If your head has stopped spinning, then read on and stop cursing me as I am going to demystify it now.

If you want to print that value, just use the following expression:

mydb['DBInstances'][0]['VpcSecurityGroups'][0]['VpcSecurityGroupId']
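
The chained lookup above can also be written as a small loop, which may be easier to read when the path gets long. This is a minimal sketch, not part of boto3: the helper name dig is hypothetical, and it assumes every step in the path exists (a missing key or index raises KeyError or IndexError, just as the chained form would).

```python
def dig(data, *path):
    """Follow a path of dictionary keys (str) and list indexes (int)."""
    current = data
    for step in path:
        # The same square-bracket lookup works for both dicts and lists.
        current = current[step]
    return current


# Sample data copied from the trimmed response shown earlier.
mydb = {
    'DBInstances': [
        {'VpcSecurityGroups': [
            {'VpcSecurityGroupId': 'sg-0ed48bab1d54e9554', 'Status': 'active'}
        ]}
    ]
}

group_id = dig(mydb, 'DBInstances', 0, 'VpcSecurityGroups', 0, 'VpcSecurityGroupId')
print(group_id)
```

Each string in the path is a dictionary key and each integer is a list index, in the same order as in the chained expression.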

So the secret is: if it's a dictionary, use the key name; if it's an array, use the index; and keep going. That's all there is to it. The full code to print this using Python, boto3 and click is as follows:

import boto3
import click

# The region is hard-coded; change it to match your own account.
rds = boto3.client('rds', region_name='ap-southeast-2')

@click.group()
def cli():
    "Gets RDS data"
    pass

@cli.command('list-database')
@click.argument('onedb')
def list_database(onedb):
    "List info about one database"
    mydb = rds.describe_db_instances(DBInstanceIdentifier=onedb)
    # print(mydb)  # uncomment to see the full response
    # Print the VpcSecurityGroupId of the instance's first security group
    print(mydb['DBInstances'][0]['VpcSecurityGroups'][0]['VpcSecurityGroupId'])
    # Print the name of the instance's first option group
    print(mydb['DBInstances'][0]['OptionGroupMemberships'][0]['OptionGroupName'])
    # Print the name of the instance's first parameter group
    print(mydb['DBInstances'][0]['DBParameterGroups'][0]['DBParameterGroupName'])

if __name__ == '__main__':
    cli()
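
The script above reads only the first item ([0]) of each list. To collect the VpcSecurityGroupId of every security group on every instance, the indexes can be swapped for loops. The following is a rough sketch that runs on hand-made sample data instead of a live AWS call: the function name all_security_group_ids and the second and third IDs are invented for illustration, and it assumes the whole response fits in a single page.

```python
def all_security_group_ids(response):
    """Collect every VpcSecurityGroupId from a describe_db_instances-style dict."""
    ids = []
    # .get(..., []) keeps the loops safe when a key is missing or a list is empty.
    for instance in response.get('DBInstances', []):
        for group in instance.get('VpcSecurityGroups', []):
            ids.append(group['VpcSecurityGroupId'])
    return ids


# Hand-made sample with the same shape as the real response.
# Only the first ID comes from the example above; the others are made up.
sample = {
    'DBInstances': [
        {'VpcSecurityGroups': [
            {'VpcSecurityGroupId': 'sg-0ed48bab1d54e9554', 'Status': 'active'}
        ]},
        {'VpcSecurityGroups': [
            {'VpcSecurityGroupId': 'sg-11111111111111111', 'Status': 'active'},
            {'VpcSecurityGroupId': 'sg-22222222222222222', 'Status': 'active'}
        ]},
        {'VpcSecurityGroups': []},
    ]
}

print(all_security_group_ids(sample))
```

Looping instead of indexing also avoids the IndexError that [0] raises when an instance has an empty list.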


I hope that helps. If you know an easier way, please do us a favor and let us know in the comments. Thanks.
