Mapping LECs from FCC Form 499 Data
Posted on Sun 15 August 2021 in jupyter-notebook
Recently I have been looking into the data sets available from the FCC web site. Since the beginning of the year, I have been more involved at work with telecom data and decided to dig in to see what I could find. There is a lot to explore.
One of the primary data sets available is the Form 499 database.
Some Background on Form 499
The FCC Form 499 Filer Database is an identification system for all interstate telecommunications carriers, all interconnected voice over IP (VoIP) providers, and certain other providers of interstate telecommunications in the US. More info on this data set can be found at the FCC Web Site.
In this post, I download the full Form 499 data set of about 17,000 records. Then I extract the most current filings as of April 1, 2021 and quantify the carriers identified as Local Exchange Carriers by state. The final result is an interactive choropleth map of this data using the Python mapping library Folium.
# Import the necessary libraries
import folium
import json
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import openpyxl
import requests
import pprint
pp = pprint.PrettyPrinter(indent=4)
I like to load data directly from a web site or API whenever possible so as to ensure my code can be easily reproduced. An interesting thing I learned when ingesting this file is that the xlrd library that Pandas uses to import Excel files no longer supports anything but .xls files. I have been using an environment with xlrd 1.2.0 with no problem. As of the xlrd 2.0 release, .xlsx files are not supported. Pandas provides a nice error message and points you to using the openpyxl library which is what I use here.
fcc499 = pd.read_excel('https://apps.fcc.gov/cgb/form499/499View.htm', engine='openpyxl')
# There are over 120 columns in this data set, so let's use the verbose=True option
# for the info method in order to see all of them.
fcc499.info(verbose=True)
I'm interested mainly in the state columns so that I can count up the number of LECs operating in each state. Along with these, I want to grab the first two columns, Filer_499_ID and LastFiling, which holds the filing date. I also want to include Legal_Name_of_Carrier and a key field called Principal_Comm_Type_1, which I will use to determine whether the carrier is a LEC.
Since I am selecting a set of non-contiguous columns, the easiest way to do this is to use numpy.r_ which is used to concatenate array slices along the row axis. Combining this with the .iloc indexer is a really nice way to easily define the columns desired in one step.
See the excellent blog site Practical Business Python and this post in particular for more info.
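As a quick illustration of how np.r_ turns a mix of slices and scalars into a single index array:

```python
import numpy as np

# np.r_ concatenates slices and scalars into one flat index array
idx = np.r_[0:2, 3, 5]
print(idx)  # → [0 1 3 5]
```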
Also, notice I have used the .copy() method to create a copy of the dataframe. This is important because otherwise you get a warning message.
For more about this issue see this excellent explanation.
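A minimal sketch of why .copy() matters here, using toy data rather than the Form 499 frame:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})

# Slicing can return a view of df; assigning into that view triggers the
# SettingWithCopyWarning. Taking an explicit copy makes it safe to modify.
subset = df[df['a'] > 1].copy()
subset['a'] = 0

print(df['a'].tolist())      # → [1, 2, 3]  (original untouched)
print(subset['a'].tolist())  # → [0, 0]
```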
fcc499_states = fcc499.iloc[:, np.r_[0:2, 3, 5, 60:119]].copy()
Let's change the LastFiling column to a datetime type in order to be able to limit the result to the most recent filing. I have also included an alternative method, commented out below the first one.
fcc499_states['LastFiling'] = pd.to_datetime(fcc499_states['LastFiling'])
#fcc499_states['LastFiling'] = fcc499_states['LastFiling'].astype('datetime64[ns]')
fcc499_states.head()
You can see from the value counts below that April 1, 2021 is the most recent and most represented filing date. Let's limit our analysis to just these records.
fcc499_states['LastFiling'].value_counts().sort_index(ascending=False)
fcc499_states = fcc499_states[fcc499_states['LastFiling'] == '2021-04-01']
We now have the data extracted that we want and in the proper format. It comprises just over 6,500 rows of data.
fcc499_states.info()
Now let's focus on the Principal_Comm_Type_1 field. This field is the category in which the carrier classified itself on the Form 499. Since we are interested in Local Exchange Carriers, we will select Incumbent LEC and CAP/CLEC. Incumbent Local Exchange Carriers (ILECs) and Competitive Local Exchange Carriers (CLECs) are categories that grew out of the breakup of Ma Bell and the 1996 Telecommunications Act. Up to that point, AT&T and the "Baby Bells" provided the bulk of local telephone service. The 1996 Act allowed other competitors to enter the market by buying access to the existing network. CAP stands for Competitive Access Provider and is essentially the same as a CLEC. For more history and background, see the nice description in the telecom glossary provided by Bandwidth.
fcc499_states['Principal_Comm_Type_1'].value_counts()
fcc499_states_lec = fcc499_states[(fcc499_states['Principal_Comm_Type_1'] == 'CAP/CLEC') |
                                  (fcc499_states['Principal_Comm_Type_1'] == 'Incumbent LEC')]
Selecting just these two types reduces the list to 1,999 rows.
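The same two-category filter can also be written with Series.isin, which scales better if more categories are added later; a sketch on toy data:

```python
import pandas as pd

# Toy stand-in for the Form 499 dataframe
df = pd.DataFrame({'Principal_Comm_Type_1':
                   ['CAP/CLEC', 'Incumbent LEC', 'Wireless', 'CAP/CLEC']})

# isin() replaces the chained | conditions with a single membership test
lec_types = ['CAP/CLEC', 'Incumbent LEC']
lecs = df[df['Principal_Comm_Type_1'].isin(lec_types)]
print(len(lecs))  # → 3
```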
fcc499_states_lec.shape
Now I want to sum up the number of carriers in each state to do some EDA and see how the data is distributed.
state_total = fcc499_states_lec.iloc[:,4:63].sum(axis=0)
state_total = pd.DataFrame(state_total)
state_total = state_total.reset_index()
state_total = state_total.rename(columns={'index': 'State', 0: 'LECs'})
state_total
In order to match the state names to the GeoJSON file I want to use for the map, we need to do a little more clean up - namely replace the underscores in the two word state names with a space.
# Replace _ in state names
for i, c in enumerate(state_total.State):
    state_total.loc[i, 'State'] = c.replace('_', ' ')
state_total
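The loop above could also be done in one vectorized step with the .str accessor; a sketch on a toy Series:

```python
import pandas as pd

# Toy stand-in for the State column
states = pd.Series(['New_York', 'Rhode_Island', 'Ohio'])

# str.replace applies the substitution to every element at once
cleaned = states.str.replace('_', ' ')
print(cleaned.tolist())  # → ['New York', 'Rhode Island', 'Ohio']
```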
Now let's get the GeoJSON file that we can use to define the states in the map. I found a nice one from Logan Powell on his GitHub. This one works better than the one given in the simple Leaflet tutorial since it includes not only the 50 states but also US territories and, most importantly as a resident of Washington, DC, the District of Columbia.
url = 'https://raw.githubusercontent.com/loganpowell/census-geojson/master/GeoJSON/500k/2020/state.json'
state_geo = requests.get(url).json()
After taking a look at the GeoJSON, we need to get the state names and their abbreviations. These are found in properties, after all of the coordinates that define the polygon boundary, under the keys NAME and STUSPS. Below is an example of how to access the data for the first record, which happens to be Guam.
print(state_geo['features'][0]['properties']['NAME'],
state_geo['features'][0]['properties']['STUSPS'])
Now I want to grab all the states and their abbreviations into a dictionary so that I can use this to map the abbreviations and the index location in the GeoJSON to my dataframe with the LEC counts.
Create the dictionary of abbreviations with the state name as the key and the abbreviation as the value. Since the names used in the LEC counts differ slightly from what is in the GeoJSON file, I need to make a few adjustments for the US Virgin Islands and the Northern Mariana Islands.
state_abbreviations = dict()
for i in range(len(state_geo['features'])):
    state_name = state_geo['features'][i]['properties']['NAME']
    state_abbrev = state_geo['features'][i]['properties']['STUSPS']
    state_abbreviations[state_name] = state_abbrev

# The Form 499 data uses slightly different names for these two territories
state_abbreviations['US Virgin Islands'] = state_abbreviations['United States Virgin Islands']
state_abbreviations['Northern Mariana Islands'] = state_abbreviations['Commonwealth of the Northern Mariana Islands']
state_abbreviations
Let's do the same for the index values with state name again as the key and the index as the value.
state_ids = dict()
for i in range(len(state_geo['features'])):
    state_name = state_geo['features'][i]['properties']['NAME']
    state_ids[state_name] = i

state_ids['US Virgin Islands'] = state_ids['United States Virgin Islands']
state_ids['Northern Mariana Islands'] = state_ids['Commonwealth of the Northern Mariana Islands']
state_ids
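Both lookup tables above can also be built with dict comprehensions; a sketch with a toy stand-in for state_geo['features']:

```python
# Toy stand-in for state_geo['features']
features = [{'properties': {'NAME': 'Guam', 'STUSPS': 'GU'}},
            {'properties': {'NAME': 'Ohio', 'STUSPS': 'OH'}}]

# Name -> abbreviation, and name -> index position in the feature list
state_abbreviations = {f['properties']['NAME']: f['properties']['STUSPS'] for f in features}
state_ids = {f['properties']['NAME']: i for i, f in enumerate(features)}

print(state_abbreviations)  # → {'Guam': 'GU', 'Ohio': 'OH'}
print(state_ids)            # → {'Guam': 0, 'Ohio': 1}
```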
Now I'll map the values to the states and also drop the few null values for territories not in the GeoJSON file, giving the dataframe with the state totals for LECs along with the abbreviations. I also want to use the GeoJSON index value as the index for the final version of the state_total dataframe.
state_total['Abbrev'] = state_total['State'].map(state_abbreviations)
state_total['ID'] = state_total['State'].map(state_ids)
state_total
state_total = state_total.dropna()
state_total = state_total.set_index('ID')
state_total.index = state_total.index.astype(int)
state_total
The easiest way to add tooltip data that shows the number of LECs when we hover over a particular state is to add it to the GeoJSON. Let's do this along with the state abbreviation. We will do this by adding it at the properties level.
for i in range(len(state_geo['features'])):
    try:
        state_geo['features'][i]['properties']['tooltip'] = (state_geo['features'][i]['properties']['NAME'] +
                                                             ' ' + str(state_total.loc[i, 'LECs']))
    except KeyError:
        # Territories dropped from state_total get an empty tooltip
        state_geo['features'][i]['properties']['tooltip'] = ''
Now when we look at the first record again for Guam we can see that the key "tooltip" has been added with the name and LEC total as the value.
state_geo['features'][0]['properties']
We are now ready to build the choropleth map, using the state abbreviation to join the GeoJSON (STUSPS) and the state_total dataframe (Abbrev). I have left a few options commented out, including different base layer tiles, an alternative fill color, and a threshold scale, to play around with and see what looks best.
m = folium.Map(location=[40, -95], zoom_start=4)
#folium.TileLayer('stamentoner').add_to(m)
#folium.TileLayer('stamenwatercolor').add_to(m)
#folium.TileLayer('cartodbpositron').add_to(m)
choropleth = folium.Choropleth(
    geo_data=state_geo,
    name='choropleth',
    data=state_total,
    columns=['Abbrev', 'LECs'],
    key_on='feature.properties.STUSPS',
    fill_color='YlGnBu',
    #fill_color='YlOrRd',
    fill_opacity=0.8,
    line_opacity=0.2,
    legend_name='Local Exchange Carriers (LECs) in Each State',
    #threshold_scale=[25, 50, 75, 100, 150, 200, 250],
    highlight=True
).add_to(m)
folium.LayerControl().add_to(m)
choropleth.geojson.add_child(
    folium.features.GeoJsonTooltip(['tooltip'], labels=False)
)
m