Working with JSON

In this hands-on module, we will learn how to work with the JSON data format. JSON (JavaScript Object Notation) is a powerful, flexible, and lightweight data format that we will see a lot throughout this course, especially when working with web apps and REST APIs.

After going through this module, students should be able to:

  • Identify and write valid JSON

  • Read JSON into an object in a Python3 script

  • Loop over and work with elements in a JSON object

  • Write JSON to file from a Python3 script

JSON Basics

Analogous to Python3 dictionaries, JSON is typically composed of key:value pairs, where each key is a double-quoted string. The universality of this data structure makes it ideal for exchanging information between programs written in different languages, including web apps. A simple, valid JSON object may look like this:

{
  "key1": "value1",
  "key2": "value2"
}
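To see the dictionary analogy concretely, here is a minimal sketch that parses the object above with json.loads() (the string-parsing counterpart of json.load(), which this module uses later to read files) and checks what Python type comes back.

```python
import json

# The simple key:value object from above, stored as a Python string
text = '''
{
  "key1": "value1",
  "key2": "value2"
}
'''

# json.loads() parses a JSON string; a JSON object becomes a Python dict
obj = json.loads(text)
print(type(obj))      # <class 'dict'>
print(obj['key1'])    # value1
```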

Although less common in this course, simple arrays of information (analogous to Python3 lists) are also valid JSON:

[
  "thing1", "thing2", "thing3"
]
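Likewise, a top-level JSON array parses into a Python list, which you can loop over directly. A short sketch:

```python
import json

# A top-level JSON array parses to a Python list
things = json.loads('["thing1", "thing2", "thing3"]')
print(type(things))   # <class 'list'>

# Elements come back in the same order they appear in the JSON
for thing in things:
    print(thing)
```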

JSON offers a lot of flexibility in the placement of whitespace and newline characters. JSON values can be strings, numbers, booleans (true or false), null, arrays, or nested objects, and these types can be mixed together to form complex data structures:

{
  "department": "COE",
  "number": 332,
  "name": "Software Engineering and Design",
  "inperson": true,
  "finalgroups": null,
  "instructors": ["Joe", "Charlie", "Joe"],
  "prerequisites": [
    {"course": "COE 322", "instructor": "Victor"},
    {"course": "SDS 322", "instructor": "Victor"}
  ]
}
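When Python's json library reads a mixed structure like this, each JSON type maps onto a Python type: strings to str, whole numbers to int, true/false to True/False, null to None, arrays to list, and objects to dict. The sketch below parses the course object above and prints the Python type of each top-level value, then reaches into a nested dict with chained lookups.

```python
import json

course_text = '''
{
  "department": "COE",
  "number": 332,
  "name": "Software Engineering and Design",
  "inperson": true,
  "finalgroups": null,
  "instructors": ["Joe", "Charlie", "Joe"],
  "prerequisites": [
    {"course": "COE 322", "instructor": "Victor"},
    {"course": "SDS 322", "instructor": "Victor"}
  ]
}
'''

course = json.loads(course_text)

# Print each key with the Python type its JSON value was converted to
for key, value in course.items():
    print(f'{key}: {type(value).__name__}')

# Nested values are reached by chaining [] lookups: list index, then dict key
print(course['prerequisites'][0]['course'])   # COE 322
```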

On the class server, navigate to your home directory and make a new folder for this module:

[local]$ ssh username@login-coe332.tacc.utexas.edu
(enter password)
(enter token)
[login-coe332]$ cd coe-332/
[login-coe332]$ mkdir working-with-json && cd working-with-json

Download this sample JSON file into that folder using the wget command, or open the URL below in a browser and copy and paste the contents into a file called Meteorite_Landings.json:

[login-coe332]$ wget https://raw.githubusercontent.com/TACC/coe-332-sp23/main/docs/unit02/sample-data/Meteorite_Landings.json

Note

The Meteorite Landing data is adapted from a data set provided by The Meteoritical Society here: https://data.nasa.gov/Space-Science/Meteorite-Landings/gh4g-9sfh

EXERCISE

Plug this file (or some of the above samples) into an online JSON validator (e.g. JSONLint). Try making manual changes to some of the entries to see what breaks the JSON format.
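If you would rather not paste data into a website, Python can do a similar check locally. In this minimal sketch, is_valid_json() is a hypothetical helper name, not part of the json library; it simply tries json.loads() and reports the json.JSONDecodeError raised for invalid input. For a whole file, running python3 -m json.tool Meteorite_Landings.json pretty-prints valid JSON or prints an error message for invalid JSON.

```python
import json

def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON; otherwise print why and return False."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # err.lineno and err.colno point at where parsing failed
        print(f'Invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}')
        return False
    return True

print(is_valid_json('{"key1": "value1"}'))    # True
print(is_valid_json("{'key1': 'value1'}"))    # single-quoted keys are not valid JSON
print(is_valid_json('{"key1": "value1",}'))   # a trailing comma is not valid JSON
```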

Read JSON into a Python3 Script

The json Python3 library is part of the Python3 Standard Library, meaning it can be imported without having to be installed with pip. Start editing a new Python3 script using your editor of choice:

[login-coe332]$ vim json_ex.py

Warning

Do not name your Python3 script “json.py”. If you import json when there is a script called “json.py” in the same folder, it will import that instead of the actual json library.
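If you suspect the wrong module is being picked up, you can check where the imported json came from. A minimal sketch: every imported module's __file__ attribute holds its path, which for the real library should end in json/__init__.py inside the Python installation rather than pointing at a json.py in your working folder.

```python
import json

# __file__ shows where the imported module lives. For the standard library it
# looks something like .../lib/python3.X/json/__init__.py, not ./json.py
print(json.__file__)
```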

The code you need to read the Meteorite_Landings.json file into a Python3 object is:

import json

with open('Meteorite_Landings.json', 'r') as f:
    ml_data = json.load(f)

Only three simple lines! We import the json module from the standard library. We use the safe with open... statement to open the file we downloaded in read-only mode ('r') as a filehandle called f; the file is closed automatically when the with block ends. Finally, we use the load() function of the json module to parse the contents of the JSON file into our new ml_data object.
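The json module has two reading functions: json.load() takes an open file handle, while json.loads() takes a string already in memory. The sketch below shows they produce the same result. Because the real file may not be present, it writes a tiny, made-up stand-in to a temporary directory first; the sample's contents (including storing the mass as a string) are assumptions for illustration, not the real data.

```python
import json
import os
import tempfile

# Hypothetical miniature stand-in for Meteorite_Landings.json; values are made up
sample = '{"meteorite_landings": [{"name": "Example", "mass (g)": "21"}]}'

path = os.path.join(tempfile.mkdtemp(), 'sample.json')
with open(path, 'w') as f:
    f.write(sample)

# json.load() reads from an open file handle...
with open(path, 'r') as f:
    from_file = json.load(f)

# ...while json.loads() parses a string
from_string = json.loads(sample)

print(from_file == from_string)   # True
```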

EXERCISE

Try out some of these calls to the type() function on the new ml_data object that you loaded. Also print() each of these as necessary to be sure you know what each is. Be able to explain the output of each call to type() and print().

import json

with open('Meteorite_Landings.json', 'r') as f:
    ml_data = json.load(f)

type(ml_data)
type(ml_data['meteorite_landings'])
type(ml_data['meteorite_landings'][0])
type(ml_data['meteorite_landings'][0]['name'])

print(ml_data)
print(ml_data['meteorite_landings'])
print(ml_data['meteorite_landings'][0])
print(ml_data['meteorite_landings'][0]['name'])

Tip

Consider doing this in the Python3 interpreter’s interactive mode instead of in a script.

Work with JSON Data

As we have seen, the JSON object we loaded contains meteorite landing data including names, ids, classes, masses, latitudes, and longitudes. Let’s write a few functions to help us explore the data.

First, write a function to calculate the average mass of all meteorites in the data set. Call that function and print the average mass to the screen.

import json

def compute_average_mass(a_list_of_dicts, a_key_string):
    total_mass = 0.
    for i in range(len(a_list_of_dicts)):
        total_mass += float(a_list_of_dicts[i][a_key_string])
    return (total_mass / len(a_list_of_dicts))

with open('Meteorite_Landings.json', 'r') as f:
    ml_data = json.load(f)

print(compute_average_mass(ml_data['meteorite_landings'], 'mass (g)'))
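The index-based loop above works, but Python usually iterates over the items directly. Here is a sketch of an equivalent, more idiomatic version using sum() over a generator expression. It also raises a clear error on an empty list instead of dividing by zero. The rows below are made up for illustration.

```python
def compute_average_mass(a_list_of_dicts: list, a_key_string: str) -> float:
    """Average of float(row[a_key_string]) over every dict in the list."""
    if not a_list_of_dicts:
        raise ValueError('cannot average an empty list')
    # sum() over a generator expression replaces the explicit index loop
    total_mass = sum(float(row[a_key_string]) for row in a_list_of_dicts)
    return total_mass / len(a_list_of_dicts)

# Made-up rows, for illustration only
rows = [{'mass (g)': '10'}, {'mass (g)': '20'}, {'mass (g)': '45'}]
print(compute_average_mass(rows, 'mass (g)'))   # 25.0
```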

Next, write a function to check where on the globe each meteorite landing site is located: whether it is in the Northern or Southern Hemisphere, and whether it is in the Eastern or Western Hemisphere.

import json

def compute_average_mass(a_list_of_dicts, a_key_string):
    total_mass = 0.
    for i in range(len(a_list_of_dicts)):
        total_mass += float(a_list_of_dicts[i][a_key_string])
    return (total_mass / len(a_list_of_dicts))

def check_hemisphere(latitude: float, longitude: float) -> str:    # type hints
    location = ''
    if (latitude > 0):
        location = 'Northern'
    else:
        location = 'Southern'
    if (longitude > 0):
        location = f'{location} & Eastern'
    else:
        location = f'{location} & Western'
    return(location)

with open('Meteorite_Landings.json', 'r') as f:
    ml_data = json.load(f)

print(compute_average_mass(ml_data['meteorite_landings'], 'mass (g)'))

for row in ml_data['meteorite_landings']:
    print(check_hemisphere(float(row['reclat']), float(row['reclong'])))
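A quick way to convince yourself that check_hemisphere() behaves as intended is to call it on a few coordinates whose hemispheres you already know. The sketch below repeats the function so it runs on its own; the coordinates are approximate (roughly Austin, Texas and Sydney, Australia). Note the edge case: because the checks use a strict > 0, a point exactly on the equator or the prime meridian is labeled Southern and Western.

```python
def check_hemisphere(latitude: float, longitude: float) -> str:
    location = ''
    if (latitude > 0):
        location = 'Northern'
    else:
        location = 'Southern'
    if (longitude > 0):
        location = f'{location} & Eastern'
    else:
        location = f'{location} & Western'
    return(location)

# Spot-check a few approximate, hand-picked coordinates
print(check_hemisphere(30.3, -97.7))    # Northern & Western
print(check_hemisphere(-33.9, 151.2))   # Southern & Eastern
# Exactly 0 falls into the else branches
print(check_hemisphere(0.0, 0.0))       # Southern & Western
```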

Note

Type hints in function definitions indicate what types are expected as input and output of a function, but no checking actually happens at runtime. Think of them as documentation or annotations.
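To see that type hints are only annotations, the sketch below stores them on the function (readable through __annotations__) and then calls the function with arguments that ignore the hints. Integers work without complaint. Strings are not rejected by the hints either; the call fails only when the > comparison inside the body runs.

```python
def check_hemisphere(latitude: float, longitude: float) -> str:
    location = 'Northern' if (latitude > 0) else 'Southern'
    location = f'{location} & Eastern' if (longitude > 0) else f'{location} & Western'
    return location

# The hints are recorded on the function object...
print(check_hemisphere.__annotations__)

# ...but not enforced: ints are accepted even though the hints say float
print(check_hemisphere(10, -5))   # Northern & Western

try:
    check_hemisphere('10', '-5')
except TypeError as err:
    # This error comes from comparing str > int in the body, not from the hints
    print(f'TypeError: {err}')
```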

Tip

Check out Python3 conditional expressions (Python's version of the ternary operator) to make your if/else conditionals shorter, though perhaps a little less intuitive to read.

def check_hemisphere(lat, lon):
    location = 'Northern' if (lat > 0) else 'Southern'
    location = f'{location} & Eastern' if (lon > 0) else f'{location} & Western'
    return(location)

EXERCISE

Write a third function to count how many meteorites of each ‘class’ there are in the list. The output should look something like:

type, number
H, 1
H4, 2
L6, 6
...etc
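One possible approach, sketched below, uses collections.Counter from the standard library to tally values. The function name count_classes() is hypothetical, and 'recclass' is an assumed key name for the meteorite class; check the actual key in Meteorite_Landings.json before using it. The rows are made up for illustration.

```python
from collections import Counter

def count_classes(a_list_of_dicts: list, a_key_string: str) -> Counter:
    """Tally how many dicts share each value of a_key_string."""
    return Counter(row[a_key_string] for row in a_list_of_dicts)

# Made-up rows; 'recclass' is an assumed key name
rows = [{'recclass': 'L6'}, {'recclass': 'H4'}, {'recclass': 'L6'}, {'recclass': 'H'}]

counts = count_classes(rows, 'recclass')
print('type, number')
# Sort by class name so the output order is predictable
for meteorite_class, number in sorted(counts.items()):
    print(f'{meteorite_class}, {number}')
```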

Write JSON to File

Finally, in a new script, we will create an object that we can write to a new JSON file.

import json

data = {}
data['class'] = 'COE332'
data['title'] = 'Software Engineering and Design'
data['subjects'] = []
data['subjects'].append( {'unit': 1, 'topic': ['linux', 'python3', 'git']} )
data['subjects'].append( {'unit': 2, 'topic': ['json', 'csv', 'xml', 'yaml']} )

with open('class.json', 'w') as out:
    json.dump(data, out, indent=2)

Notice that most of the code in the script above simply assembles a normal Python3 dictionary. The json.dump() function requires only two arguments: the object to be written and the filehandle to write it to. The indent=2 argument is optional, but it makes the output file look a little nicer and easier to read.
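To see exactly what indent changes without creating a file, you can use json.dumps(), which returns the JSON text as a string instead of writing it to a filehandle. The sketch below serializes a trimmed-down version of the data dictionary both ways.

```python
import json

data = {'class': 'COE332', 'subjects': [{'unit': 1}]}

# Without indent, everything lands on one line
print(json.dumps(data))
# {"class": "COE332", "subjects": [{"unit": 1}]}

# With indent=2, each nesting level is indented by two more spaces
print(json.dumps(data, indent=2))
```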

Inspect the output file and paste the contents into an online JSON validator.
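You can also check the output from Python itself: reading the file back with json.load() fails with json.JSONDecodeError if the JSON is malformed, and comparing the result to the original dictionary confirms nothing was lost. This sketch writes to a temporary directory so it does not overwrite your class.json.

```python
import json
import os
import tempfile

data = {'class': 'COE332', 'subjects': [{'unit': 1, 'topic': ['linux', 'python3', 'git']}]}

# Write to a throwaway directory so the check does not clobber class.json
path = os.path.join(tempfile.mkdtemp(), 'class.json')
with open(path, 'w') as out:
    json.dump(data, out, indent=2)

# Reading the file back doubles as a validity check
with open(path, 'r') as f:
    reloaded = json.load(f)

print(reloaded == data)   # True
```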

EXERCISE

Write a new Python3 script to read in Meteorite_Landings.json, convert the ids, masses, latitudes, and longitudes to floats, then save it as a new JSON file called Meteorite_Landings_updated.json. Compare them side by side to make sure you can see and understand the difference.
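One way to structure the conversion is sketched below. The helper convert_fields_to_float() is a hypothetical name; the keys 'mass (g)', 'reclat', and 'reclong' come from the code earlier in this module, while 'id' is assumed from the exercise wording. The row is made up, and the real data may contain missing or empty values that float() would reject, so you may need to handle those. To finish the exercise, pass the result to json.dump() with a filehandle opened on Meteorite_Landings_updated.json instead of printing it.

```python
import json

def convert_fields_to_float(a_list_of_dicts: list, keys: list) -> list:
    """Return new dicts in which each listed key's value is passed through float()."""
    converted = []
    for row in a_list_of_dicts:
        new_row = dict(row)            # shallow copy, so the input rows are untouched
        for key in keys:
            new_row[key] = float(new_row[key])
        converted.append(new_row)
    return converted

# Made-up row for illustration; 'id' is an assumed key name
rows = [{'name': 'Example', 'id': '1', 'mass (g)': '21', 'reclat': '12.5', 'reclong': '-3.25'}]
updated = convert_fields_to_float(rows, ['id', 'mass (g)', 'reclat', 'reclong'])

# Numbers are now written without quotes in the JSON output
print(json.dumps({'meteorite_landings': updated}, indent=2))
```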

Additional Resources