Mapping LECs from FCC Form 499 Data
Posted on Sun 15 August 2021 in jupyter-notebook
Recently I have been looking into the data sets available from the FCC web site. Since the beginning of the year, I have been more involved at work with telecom data and decided to dig in to see what I could find. There is a lot to explore.
One of the primary data sets available is the Form 499 database.
Some Background on Form 499
The FCC Form 499 Filer Database is an identification system for all interstate telecommunications carriers, all interconnected voice over IP (VoIP) providers, and certain other providers of interstate telecommunications in the US. More info on this data set can be found at the FCC Web Site.
In this post, I download the full Form 499 data set of about 17,000 records. Then I extract the most current filings as of April 1, 2021 and quantify the carriers identified as Local Exchange Carriers by state. The final result is an interactive choropleth map of this data using the Python mapping library Folium.
# Import the necessary libraries
import folium
import json
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import openpyxl
import requests
import pprint
pp = pprint.PrettyPrinter(indent=4)
I like to load data directly from a web site or API whenever possible so as to ensure my code can be easily reproduced. An interesting thing I learned when ingesting this file is that the xlrd library that Pandas uses to import Excel files no longer supports anything but .xls files. I have been using an environment with xlrd 1.2.0 with no problem. As of the xlrd 2.0 release, .xlsx files are not supported. Pandas provides a nice error message and points you to using the openpyxl library which is what I use here.
fcc499 = pd.read_excel('https://apps.fcc.gov/cgb/form499/499View.htm', engine='openpyxl')
# There are over 120 columns in this data set, so let's use the verbose=True option
# for the info method in order to see all of them.
fcc499.info(verbose=True)
I'm interested mainly in the state columns so that I can count up the number of LECs operating in each state. Along with these, I want to grab the first two columns, Filer_499_ID and LastFiling, which holds the filing date. I also want to include Legal_Name_of_Carrier and a key field called Principal_Comm_Type_1, which I will use to determine whether the carrier is a LEC.
Since I am selecting a set of non-contiguous columns, the easiest way to do this is to use numpy.r_ which is used to concatenate array slices along the row axis. Combining this with the .iloc indexer is a really nice way to easily define the columns desired in one step.
See the excellent blog site Practical Business Python and this post in particular for more info.
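As a quick illustration of how np.r_ turns a mix of slices and scalars into a single index array:

```python
import numpy as np

# np.r_ concatenates slices and scalars into one flat index array
idx = np.r_[0:2, 3, 5]
print(idx)  # → [0 1 3 5]
```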
Also, notice I have used the .copy() method to create a copy of the dataframe. This is important because otherwise you get a warning message.
For more about this issue see this excellent explanation.
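A minimal sketch of why .copy() matters here, using toy data rather than the Form 499 frame:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})

# Slicing can return a view of df; assigning into that view triggers the
# SettingWithCopyWarning. Taking an explicit copy makes it safe to modify.
subset = df[df['a'] > 1].copy()
subset['a'] = 0

print(df['a'].tolist())      # → [1, 2, 3]  (original untouched)
print(subset['a'].tolist())  # → [0, 0]
```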
fcc499_states = fcc499.iloc[:, np.r_[0:2, 3, 5, 60:119]].copy()
Let's change the LastFiling column to a datetime type in order to be able to limit the result to the most recent filing. I have also included an alternative method, commented out below the first one.
fcc499_states['LastFiling'] = pd.to_datetime(fcc499_states['LastFiling'])
#fcc499_states['LastFiling'] = fcc499_states['LastFiling'].astype('datetime64[ns]')
fcc499_states.head()
You can see from the value counts below that April 1, 2021 is the most recent and most represented filing date. Let's limit our analysis to just these records.
fcc499_states['LastFiling'].value_counts().sort_index(ascending=False)
fcc499_states = fcc499_states[fcc499_states['LastFiling'] == '2021-04-01']
We now have the data extracted that we want and in the proper format. It comprises just over 6,500 rows of data.
fcc499_states.info()
Now let's focus on the Principal_Comm_Type_1 field. This field is the category in which the carrier classified itself on the Form 499. Since we are interested in Local Exchange Carriers, we will select Incumbent LEC and CAP/CLEC. Incumbent Local Exchange Carriers (ILECs) and Competitive Local Exchange Carriers (CLECs) are categories that grew out of the breakup of Ma Bell and the 1996 Telecommunications Act. Up to that point, AT&T and the "Baby Bells" provided the bulk of local telephone service. The 1996 Act allowed other competitors to enter the market by buying access to the existing network. CAP stands for Competitive Access Provider and is essentially the same as a CLEC. For more history and background, see the nice description in the telecom glossary provided by Bandwidth.
fcc499_states['Principal_Comm_Type_1'].value_counts()
fcc499_states_lec = fcc499_states[(fcc499_states['Principal_Comm_Type_1'] == 'CAP/CLEC') |
                                  (fcc499_states['Principal_Comm_Type_1'] == 'Incumbent LEC')]
Selecting just these two types reduces the list to 1,999 rows.
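The same two-category filter can also be written with Series.isin, which scales better if more categories are added later; a sketch on toy data:

```python
import pandas as pd

# Toy stand-in for the Form 499 dataframe
df = pd.DataFrame({'Principal_Comm_Type_1':
                   ['CAP/CLEC', 'Incumbent LEC', 'Wireless', 'CAP/CLEC']})

# isin() replaces the chained | conditions with a single membership test
lec_types = ['CAP/CLEC', 'Incumbent LEC']
lecs = df[df['Principal_Comm_Type_1'].isin(lec_types)]
print(len(lecs))  # → 3
```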
fcc499_states_lec.shape
Now I want to sum up the number of carriers in each state to do some EDA and see how the data is distributed.
state_total = fcc499_states_lec.iloc[:,4:63].sum(axis=0)
state_total = pd.DataFrame(state_total)
state_total = state_total.reset_index()
state_total = state_total.rename(columns={'index': 'State', 0: 'LECs'})
state_total
In order to match the state names to the GeoJSON file I want to use for the map, we need to do a little more clean up - namely replace the underscores in the two word state names with a space.
# Replace _ in state names
for i, c in enumerate(state_total.State):
    state_total.loc[i, 'State'] = c.replace('_', ' ')
state_total
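The loop above could also be done in one vectorized step with the .str accessor; a sketch on a toy Series:

```python
import pandas as pd

# Toy stand-in for the State column
states = pd.Series(['New_York', 'Rhode_Island', 'Ohio'])

# str.replace applies the substitution to every element at once
cleaned = states.str.replace('_', ' ')
print(cleaned.tolist())  # → ['New York', 'Rhode Island', 'Ohio']
```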
Now let's get the GeoJSON file that we can use to define the states in the map. I found a nice one from Logan Powell on his GitHub. This one works better than the one given in the simple Leaflet tutorial since it includes not only the 50 states but also US territories and, most importantly as a resident of Washington, DC, the District of Columbia.
url = 'https://raw.githubusercontent.com/loganpowell/census-geojson/master/GeoJSON/500k/2020/state.json'
state_geo = requests.get(url).json()
After taking a look at the GeoJSON, we need to get the state names and their abbreviations. These are found in properties, after all of the coordinates that define the polygon boundary, under the keys NAME and STUSPS. Below is an example of how to access the data for the first record, which happens to be Guam.
print(state_geo['features'][0]['properties']['NAME'],
state_geo['features'][0]['properties']['STUSPS'])
Now I want to grab all the states and their abbreviations into a dictionary so that I can use this to map the abbreviations and the index location in the GeoJSON to my dataframe with the LEC counts.
Create the dictionary of abbreviations with the state name as the key and the abbreviation as the value. Since the names used in the LEC counts differ slightly from what is in the GeoJSON file, I need to make a few adjustments for the US Virgin Islands and the Northern Mariana Islands.
state_abbreviations = dict()
for i in range(len(state_geo['features'])):
    state_name = state_geo['features'][i]['properties']['NAME']
    state_abbrev = state_geo['features'][i]['properties']['STUSPS']
    state_abbreviations[state_name] = state_abbrev

# The Form 499 data uses slightly different names for these two territories
state_abbreviations['US Virgin Islands'] = state_abbreviations['United States Virgin Islands']
state_abbreviations['Northern Mariana Islands'] = state_abbreviations['Commonwealth of the Northern Mariana Islands']
state_abbreviations
Let's do the same for the index values with state name again as the key and the index as the value.
state_ids = dict()
for i in range(len(state_geo['features'])):
    state_name = state_geo['features'][i]['properties']['NAME']
    state_ids[state_name] = i

state_ids['US Virgin Islands'] = state_ids['United States Virgin Islands']
state_ids['Northern Mariana Islands'] = state_ids['Commonwealth of the Northern Mariana Islands']
state_ids
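Both lookup tables above can also be built with dict comprehensions; a sketch with a toy stand-in for state_geo['features']:

```python
# Toy stand-in for state_geo['features']
features = [{'properties': {'NAME': 'Guam', 'STUSPS': 'GU'}},
            {'properties': {'NAME': 'Ohio', 'STUSPS': 'OH'}}]

# Name -> abbreviation, and name -> index position in the feature list
state_abbreviations = {f['properties']['NAME']: f['properties']['STUSPS'] for f in features}
state_ids = {f['properties']['NAME']: i for i, f in enumerate(features)}

print(state_abbreviations)  # → {'Guam': 'GU', 'Ohio': 'OH'}
print(state_ids)            # → {'Guam': 0, 'Ohio': 1}
```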
Now I'll map the values to the states and also drop the few null values for territories not in the GeoJSON file, giving the dataframe with the state totals for LECs along with the abbreviations. I also want to use the GeoJSON index value as the index for the final version of the state_total dataframe.
state_total['Abbrev'] = state_total['State'].map(state_abbreviations)
state_total['ID'] = state_total['State'].map(state_ids)
state_total
state_total = state_total.dropna()
state_total = state_total.set_index('ID')
state_total.index = state_total.index.astype(int)
state_total
The easiest way to add tooltip data that shows the number of LECs when we hover over a particular state is to add it to the GeoJSON. Let's do this along with the state abbreviation. We will do this by adding it at the properties level.
for i in range(len(state_geo['features'])):
    try:
        state_geo['features'][i]['properties']['tooltip'] = (state_geo['features'][i]['properties']['NAME'] +
                                                             ' ' + str(state_total.loc[i, 'LECs']))
    except KeyError:
        # Territories dropped from state_total get an empty tooltip
        state_geo['features'][i]['properties']['tooltip'] = ''
Now when we look at the first record again for Guam we can see that the key "tooltip" has been added with the name and LEC total as the value.
state_geo['features'][0]['properties']
We are now ready to build the choropleth map, using the state abbreviation to join the GeoJSON (STUSPS) and the state_total dataframe (Abbrev). I have left a few options commented out, including different base layer tiles, an alternative fill color, and a threshold scale, to play around with and see what looks best.
m = folium.Map(location=[40, -95], zoom_start=4)
#folium.TileLayer('stamentoner').add_to(m)
#folium.TileLayer('stamenwatercolor').add_to(m)
#folium.TileLayer('cartodbpositron').add_to(m)
choropleth = folium.Choropleth(
    geo_data=state_geo,
    name='choropleth',
    data=state_total,
    columns=['Abbrev', 'LECs'],
    key_on='feature.properties.STUSPS',
    fill_color='YlGnBu',
    #fill_color='YlOrRd',
    fill_opacity=0.8,
    line_opacity=0.2,
    legend_name='Local Exchange Carriers (LECs) in Each State',
    #threshold_scale=[25, 50, 75, 100, 150, 200, 250],
    highlight=True
).add_to(m)
folium.LayerControl().add_to(m)
choropleth.geojson.add_child(
    folium.features.GeoJsonTooltip(['tooltip'], labels=False)
)
m