Mapping LECs from FCC Form 499 Data

Posted on Sun 15 August 2021 in jupyter-notebook

Recently I have been looking into the data sets available from the FCC web site. Since the beginning of the year, I have been more involved at work with telecom data and decided to dig in to see what I could find. There is a lot to explore.

One of the primary data sets available is the Form 499 database.

Some Background on Form 499

The FCC Form 499 Filer Database is an identification system for all interstate telecommunications carriers, all interconnected voice over IP (VoIP) providers and certain other providers of interstate telecommunications in the US. More info on this data set can be found at the FCC Web Site

In this post, I download the full Form 499 data set of about 17,000 records. Then I extract the most current filings as of April 1, 2021 and quantify the carriers identified as Local Exchange Carriers by state. The final result is an interactive choropleth map of this data using the Python mapping library Folium.

In [1]:
# Import the necessary libraries

import folium
import json
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import openpyxl
import requests

import pprint
pp = pprint.PrettyPrinter(indent=4)

I like to load data directly from a web site or API whenever possible to ensure my code can be easily reproduced. An interesting thing I learned while ingesting this file is that the xlrd library that Pandas uses to import Excel files no longer supports anything but the legacy .xls format. I had been using an environment with xlrd 1.2.0 with no problem, but as of the xlrd 2.0 release, .xlsx files are no longer supported. Pandas provides a nice error message that points you to the openpyxl library, which is what I use here.

In [2]:
fcc499 = pd.read_excel('https://apps.fcc.gov/cgb/form499/499View.htm', engine='openpyxl')
In [3]:
# There are over 120 columns in this data set, so let's use the verbose=True option
# of the info method in order to see all of them.

fcc499.info(verbose=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17114 entries, 0 to 17113
Data columns (total 122 columns):
 #    Column                        Dtype  
---   ------                        -----  
 0    Filer_499_ID                  int64  
 1    LastFiling                    object 
 2    USF_Contributor_1             object 
 3    Legal_Name_of_Carrier         object 
 4    Doing_Business_As             object 
 5    Principal_Comm_Type_1         object 
 6    Holding_Company               object 
 7    CORESID                       object 
 8    Management_Company            object 
 9    HQ_Address1                   object 
 10   HQ_Address2                   object 
 11   HQ_Address3                   object 
 12   HQ_City                       object 
 13   HQ_State                      object 
 14   HQ_Zip_Code                   object 
 15   Customer_Inquiries_Address1   object 
 16   Customer_Inquiries_Address2   object 
 17   Customer_Inquiries_Address3   object 
 18   Customer_Inquiries_City       object 
 19   Customer_Inquiries_State      object 
 20   Customer_Inquiries_Zip_Code   object 
 21   Customer_Inquiries_Telephone  object 
 22   Customer_Inquiries_Ext        float64
 23   Other_Trade_Name1             object 
 24   Other_Trade_Name2             object 
 25   Other_Trade_Name3             object 
 26   Other_Trade_Name4             object 
 27   Other_Trade_Name5             object 
 28   Other_Trade_Name6             object 
 29   Other_Trade_Name7             object 
 30   Other_Trade_Name8             object 
 31   Other_Trade_Name9             object 
 32   Other_Trade_Name10            object 
 33   Other_Trade_Name11            object 
 34   Other_Trade_Name12            object 
 35   Other_Trade_Name13            object 
 36   DC_Agent1                     object 
 37   DC_Agent2                     object 
 38   DC_Agent_Telephone            object 
 39   DC_Agent_Ext                  float64
 40   DC_Agent_Fax                  object 
 41   DC_Agent_EMail                object 
 42   DC_Agent_Address1             object 
 43   DC_Agent_Address2             object 
 44   DC_Agent_Address3             object 
 45   DC_Agent_City                 object 
 46   DC_Agent_State                object 
 47   DC_Agent_Zip                  object 
 48   Alternate_Agent1              object 
 49   Alternate_Agent2              object 
 50   Alternate_Agent_Telephone     object 
 51   Alternate_Agent_Ext           float64
 52   Alternate_Agent_Fax           object 
 53   Alternate_Agent_EMail         object 
 54   Alternate_Agent_Address1      object 
 55   Alternate_Agent_Address2      object 
 56   Alternate_Agent_Address3      object 
 57   Alternate_Agent_City          object 
 58   Alternate_Agent_State         object 
 59   Alternate_Agent_Zip_Code      object 
 60   Alabama                       bool   
 61   Alaska                        bool   
 62   American_Samoa                bool   
 63   Arizona                       bool   
 64   Arkansas                      bool   
 65   California                    bool   
 66   Colorado                      bool   
 67   Connecticut                   bool   
 68   Delaware                      bool   
 69   District_of_Columbia          bool   
 70   Florida                       bool   
 71   Georgia                       bool   
 72   Guam                          bool   
 73   Hawaii                        bool   
 74   Idaho                         bool   
 75   Illinois                      bool   
 76   Indiana                       bool   
 77   Iowa                          bool   
 78   Johnston_Atoll                bool   
 79   Kansas                        bool   
 80   Kentucky                      bool   
 81   Louisiana                     bool   
 82   Maine                         bool   
 83   Maryland                      bool   
 84   Massachusetts                 bool   
 85   Michigan                      bool   
 86   Midway_Atoll                  bool   
 87   Minnesota                     bool   
 88   Mississippi                   bool   
 89   Missouri                      bool   
 90   Montana                       bool   
 91   Nebraska                      bool   
 92   Nevada                        bool   
 93   New_Hampshire                 bool   
 94   New_Jersey                    bool   
 95   New_Mexico                    bool   
 96   New_York                      bool   
 97   North_Carolina                bool   
 98   North_Dakota                  bool   
 99   Northern_Mariana_Islands      bool   
 100  Ohio                          bool   
 101  Oklahoma                      bool   
 102  Oregon                        bool   
 103  Pennsylvania                  bool   
 104  Puerto_Rico                   bool   
 105  Rhode_Island                  bool   
 106  South_Carolina                bool   
 107  South_Dakota                  bool   
 108  Tennessee                     bool   
 109  Texas                         bool   
 110  Utah                          bool   
 111  US_Virgin_Islands             bool   
 112  Vermont                       bool   
 113  Virginia                      bool   
 114  Wake_Island                   bool   
 115  Washington                    bool   
 116  West_Virginia                 bool   
 117  Wisconsin                     bool   
 118  Wyoming                       bool   
 119  note1                         object 
 120  note2                         object 
 121  note3                         object 
dtypes: bool(59), float64(3), int64(1), object(59)
memory usage: 9.2+ MB

I'm interested mainly in the state columns so that I can count the number of LECs operating in each state. Along with these, I want to grab the first two columns, Filer_499_ID and LastFiling, which holds the filing date. I also want to include Legal_Name_of_Carrier and a key field called Principal_Comm_Type_1, which I will use to determine whether the carrier is a LEC.

Since I am selecting a set of non-contiguous columns, the easiest way to do this is to use numpy.r_ which is used to concatenate array slices along the row axis. Combining this with the .iloc indexer is a really nice way to easily define the columns desired in one step.

See the excellent blog site Practical Business Python and this post in particular for more info.

Also, notice I have used the .copy() method to create a copy of the dataframe. This is important because otherwise pandas may raise a SettingWithCopyWarning when you later modify the result.

For more about this issue see this excellent explanation.
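As a quick illustration, here is what np.r_ produces for a shortened version of the slice used in the next cell (the exact column ranges below are just for demonstration):

```python
import numpy as np

# np.r_ concatenates slices and scalars into a single index array
cols = np.r_[0:2, 3, 5, 60:63]
# cols.tolist() -> [0, 1, 3, 5, 60, 61, 62]
print(cols)
```

Passing an array like this to .iloc selects exactly those column positions in one step.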

In [4]:
fcc499_states = fcc499.iloc[:, np.r_[0:2, 3, 5, 60:119]].copy()

Let's change the LastFiling column to a datetime type in order to limit the result to the most recent filing. I have also included an alternative method, commented out below the first.

In [5]:
fcc499_states['LastFiling'] = pd.to_datetime(fcc499_states['LastFiling'])
#fcc499_states['LastFiling'] = fcc499_states['LastFiling'].astype('datetime64[ns]')
fcc499_states.head()
Out[5]:
Filer_499_ID LastFiling Legal_Name_of_Carrier Principal_Comm_Type_1 Alabama Alaska American_Samoa Arizona Arkansas California ... Texas Utah US_Virgin_Islands Vermont Virginia Wake_Island Washington West_Virginia Wisconsin Wyoming
0 801264 2021-04-01 Tri-County Telephone Company Incumbent LEC False False False False False False ... False False False False False False False False False False
1 801267 2021-04-01 Duo County Tel. Coop. Corp., Inc. Incumbent LEC False False False False False False ... False False False False False False False False False False
2 801270 2021-04-01 K & M Telephone Company, Inc Incumbent LEC False False False False False False ... False False False False False False False False False False
3 801276 2019-04-01 CN Johnson Holding Payphone Service Provider False False False False False False ... False True False False False False False False False False
4 801279 2021-04-01 Rock Port Tel. Co. Incumbent LEC False False False False False False ... False False False False False False False False False False

5 rows × 63 columns

You can see from the value counts below that April 1, 2021 is the most recent and most represented filing date. Let's limit our analysis to just these records.

In [6]:
fcc499_states['LastFiling'].value_counts().sort_index(ascending=False)
Out[6]:
2021-04-01    6582
2020-04-01     693
2019-04-01     631
2018-04-02     474
2017-04-03     475
2016-04-01     566
2015-04-01     518
2014-04-01     456
2013-04-01     376
2012-04-02     451
2011-04-01     384
2010-04-01     356
2009-04-01     491
2008-04-01     445
2007-04-02     366
2007-02-01       1
2006-04-03     444
2005-05-01       1
2005-04-01     323
2005-02-01       1
2004-08-01       1
2004-04-01     404
2003-05-01       1
2003-04-01     386
2002-08-01       5
2002-05-01       4
2002-04-01    1540
2001-08-01       1
2001-05-01       1
2001-04-02     592
2001-04-01       4
2000-09-01      15
2000-04-01      26
1999-09-01       6
1999-04-01      24
1998-09-01       6
1998-04-01       6
1997-09-01       7
Name: LastFiling, dtype: int64
In [7]:
fcc499_states = fcc499_states[fcc499_states['LastFiling'] == '2021-04-01']

We now have the data we want, in the proper format. It comprises just over 6,500 rows.
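Rather than hard-coding the date string in the filter above, you could filter on the maximum LastFiling value; a small self-contained sketch using made-up data:

```python
import pandas as pd

# Toy stand-in for the fcc499_states frame (dates are made up)
df = pd.DataFrame({
    'Filer_499_ID': [1, 2, 3],
    'LastFiling': pd.to_datetime(['2021-04-01', '2020-04-01', '2021-04-01']),
})

# Keep only rows with the most recent filing date, whatever it is
latest = df[df['LastFiling'] == df['LastFiling'].max()]
print(len(latest))  # 2
```

This keeps the notebook working without edits the next time the FCC publishes a new filing cycle.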

In [8]:
fcc499_states.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 6582 entries, 0 to 16112
Data columns (total 63 columns):
 #   Column                    Non-Null Count  Dtype         
---  ------                    --------------  -----         
 0   Filer_499_ID              6582 non-null   int64         
 1   LastFiling                6582 non-null   datetime64[ns]
 2   Legal_Name_of_Carrier     6582 non-null   object        
 3   Principal_Comm_Type_1     6582 non-null   object        
 4   Alabama                   6582 non-null   bool          
 5   Alaska                    6582 non-null   bool          
 6   American_Samoa            6582 non-null   bool          
 7   Arizona                   6582 non-null   bool          
 8   Arkansas                  6582 non-null   bool          
 9   California                6582 non-null   bool          
 10  Colorado                  6582 non-null   bool          
 11  Connecticut               6582 non-null   bool          
 12  Delaware                  6582 non-null   bool          
 13  District_of_Columbia      6582 non-null   bool          
 14  Florida                   6582 non-null   bool          
 15  Georgia                   6582 non-null   bool          
 16  Guam                      6582 non-null   bool          
 17  Hawaii                    6582 non-null   bool          
 18  Idaho                     6582 non-null   bool          
 19  Illinois                  6582 non-null   bool          
 20  Indiana                   6582 non-null   bool          
 21  Iowa                      6582 non-null   bool          
 22  Johnston_Atoll            6582 non-null   bool          
 23  Kansas                    6582 non-null   bool          
 24  Kentucky                  6582 non-null   bool          
 25  Louisiana                 6582 non-null   bool          
 26  Maine                     6582 non-null   bool          
 27  Maryland                  6582 non-null   bool          
 28  Massachusetts             6582 non-null   bool          
 29  Michigan                  6582 non-null   bool          
 30  Midway_Atoll              6582 non-null   bool          
 31  Minnesota                 6582 non-null   bool          
 32  Mississippi               6582 non-null   bool          
 33  Missouri                  6582 non-null   bool          
 34  Montana                   6582 non-null   bool          
 35  Nebraska                  6582 non-null   bool          
 36  Nevada                    6582 non-null   bool          
 37  New_Hampshire             6582 non-null   bool          
 38  New_Jersey                6582 non-null   bool          
 39  New_Mexico                6582 non-null   bool          
 40  New_York                  6582 non-null   bool          
 41  North_Carolina            6582 non-null   bool          
 42  North_Dakota              6582 non-null   bool          
 43  Northern_Mariana_Islands  6582 non-null   bool          
 44  Ohio                      6582 non-null   bool          
 45  Oklahoma                  6582 non-null   bool          
 46  Oregon                    6582 non-null   bool          
 47  Pennsylvania              6582 non-null   bool          
 48  Puerto_Rico               6582 non-null   bool          
 49  Rhode_Island              6582 non-null   bool          
 50  South_Carolina            6582 non-null   bool          
 51  South_Dakota              6582 non-null   bool          
 52  Tennessee                 6582 non-null   bool          
 53  Texas                     6582 non-null   bool          
 54  Utah                      6582 non-null   bool          
 55  US_Virgin_Islands         6582 non-null   bool          
 56  Vermont                   6582 non-null   bool          
 57  Virginia                  6582 non-null   bool          
 58  Wake_Island               6582 non-null   bool          
 59  Washington                6582 non-null   bool          
 60  West_Virginia             6582 non-null   bool          
 61  Wisconsin                 6582 non-null   bool          
 62  Wyoming                   6582 non-null   bool          
dtypes: bool(59), datetime64[ns](1), int64(1), object(2)
memory usage: 636.3+ KB

Now let's focus on the Principal_Comm_Type_1 field. This field is the category in which the carrier classified itself on the Form 499. Since we are interested in Local Exchange Carriers, we will select Incumbent LEC and CAP/CLEC. The terms Incumbent Local Exchange Carrier (ILEC) and Competitive Local Exchange Carrier (CLEC) come from the Telecommunications Act of 1996. Up to that point, AT&T and the "Baby Bells" created by the 1984 break up of Ma Bell provided the bulk of local telephone service; the 1996 Act allowed competitors to enter the market by buying access to the existing network. CAP stands for Competitive Access Provider and is essentially the same as a CLEC. For more history and background, see the nice description in the telecom glossary provided by Bandwidth.

In [9]:
fcc499_states['Principal_Comm_Type_1'].value_counts()
Out[9]:
Interconnected Voip               2093
Incumbent LEC                     1224
CAP/CLEC                           775
Toll Reseller                      480
Private Service Provider           363
Cellular/PCS/SMR                   260
Local Reseller                     235
Wireless Data                      204
Non-interconnected VoIP            162
Interexchange Carrier              134
SMR (dispatch)                     107
Other Local                         93
Other Toll                          91
Paging & Messaging                  81
Audio Bridge Service                66
Prepaid Card                        59
Payphone Service Provider           49
Satellite Service Provider          38
Other Mobile                        24
Operator Service Provider           22
Shared-Tenant Service Provider      22
Name: Principal_Comm_Type_1, dtype: int64
In [10]:
fcc499_states_lec = fcc499_states[(fcc499_states['Principal_Comm_Type_1'] == 'CAP/CLEC') | \
                                (fcc499_states['Principal_Comm_Type_1'] == 'Incumbent LEC')]

Selecting just these two types reduces the list to 1,999 rows.
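The same selection can be written more compactly with Series.isin; a sketch on toy data (the rows here are made up):

```python
import pandas as pd

# Toy stand-in for fcc499_states with a few made-up carrier types
df = pd.DataFrame({'Principal_Comm_Type_1': [
    'Incumbent LEC', 'CAP/CLEC', 'Toll Reseller', 'Incumbent LEC']})

# isin replaces the chained | conditions with a single membership test
lec = df[df['Principal_Comm_Type_1'].isin(['Incumbent LEC', 'CAP/CLEC'])]
print(len(lec))  # 3
```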

In [11]:
fcc499_states_lec.shape
Out[11]:
(1999, 63)

Now I want to sum up the number of carriers in each state to do some EDA and see how the data is distributed.

In [12]:
state_total = fcc499_states_lec.iloc[:,4:63].sum(axis=0)
state_total = pd.DataFrame(state_total)
state_total = state_total.reset_index()
state_total = state_total.rename(columns={'index': 'State', 0: 'LECs'})
state_total
Out[12]:
State LECs
0 Alabama 101
1 Alaska 47
2 American_Samoa 6
3 Arizona 93
4 Arkansas 89
5 California 135
6 Colorado 109
7 Connecticut 62
8 Delaware 51
9 District_of_Columbia 58
10 Florida 113
11 Georgia 153
12 Guam 8
13 Hawaii 42
14 Idaho 76
15 Illinois 162
16 Indiana 132
17 Iowa 252
18 Johnston_Atoll 4
19 Kansas 118
20 Kentucky 106
21 Louisiana 85
22 Maine 67
23 Maryland 83
24 Massachusetts 75
25 Michigan 127
26 Midway_Atoll 4
27 Minnesota 171
28 Mississippi 75
29 Missouri 121
30 Montana 73
31 Nebraska 105
32 Nevada 80
33 New_Hampshire 56
34 New_Jersey 88
35 New_Mexico 81
36 New_York 146
37 North_Carolina 111
38 North_Dakota 77
39 Northern_Mariana_Islands 6
40 Ohio 138
41 Oklahoma 113
42 Oregon 112
43 Pennsylvania 144
44 Puerto_Rico 28
45 Rhode_Island 43
46 South_Carolina 98
47 South_Dakota 77
48 Tennessee 113
49 Texas 179
50 Utah 75
51 US_Virgin_Islands 10
52 Vermont 52
53 Virginia 114
54 Wake_Island 7
55 Washington 103
56 West_Virginia 77
57 Wisconsin 157
58 Wyoming 57

In order to match the state names to the GeoJSON file I want to use for the map, we need a little more cleanup: replace the underscores in the multi-word state names with spaces.

In [13]:
# Replace underscores in state names with spaces

state_total['State'] = state_total['State'].str.replace('_', ' ')
state_total
Out[13]:
State LECs
0 Alabama 101
1 Alaska 47
2 American Samoa 6
3 Arizona 93
4 Arkansas 89
5 California 135
6 Colorado 109
7 Connecticut 62
8 Delaware 51
9 District of Columbia 58
10 Florida 113
11 Georgia 153
12 Guam 8
13 Hawaii 42
14 Idaho 76
15 Illinois 162
16 Indiana 132
17 Iowa 252
18 Johnston Atoll 4
19 Kansas 118
20 Kentucky 106
21 Louisiana 85
22 Maine 67
23 Maryland 83
24 Massachusetts 75
25 Michigan 127
26 Midway Atoll 4
27 Minnesota 171
28 Mississippi 75
29 Missouri 121
30 Montana 73
31 Nebraska 105
32 Nevada 80
33 New Hampshire 56
34 New Jersey 88
35 New Mexico 81
36 New York 146
37 North Carolina 111
38 North Dakota 77
39 Northern Mariana Islands 6
40 Ohio 138
41 Oklahoma 113
42 Oregon 112
43 Pennsylvania 144
44 Puerto Rico 28
45 Rhode Island 43
46 South Carolina 98
47 South Dakota 77
48 Tennessee 113
49 Texas 179
50 Utah 75
51 US Virgin Islands 10
52 Vermont 52
53 Virginia 114
54 Wake Island 7
55 Washington 103
56 West Virginia 77
57 Wisconsin 157
58 Wyoming 57

Now let's get a GeoJSON file that we can use to define the states in the map. I found a nice one from Logan Powell on his GitHub. This one works better than the one given in the simple Leaflet tutorial since it includes not only the 50 states but also the US territories and, most importantly for me as a resident of Washington, DC, the District of Columbia.

In [14]:
url = 'https://raw.githubusercontent.com/loganpowell/census-geojson/master/GeoJSON/500k/2020/state.json'
state_geo = requests.get(url).json()

After taking a look at the GeoJSON, we need to get the state names and their abbreviations. These are found in each feature's properties, after all of the geo codes that define the polygon boundary, under the keys NAME and STUSPS. Below is an example of how to access the data for the first record, which happens to be Guam.

In [15]:
print(state_geo['features'][0]['properties']['NAME'],
state_geo['features'][0]['properties']['STUSPS'])
Guam GU

Now I want to grab all the states and their abbreviations into a dictionary so that I can use this to map the abbreviations and the index location in the GeoJSON to my dataframe with the LEC counts.

Create the dictionary of abbreviations with the state name as the key and the abbreviation as the value. Since the names used in the LEC counts are slightly different from what is in the GeoJSON file, I need to make a few adjustments for the US Virgin Islands and the Northern Mariana Islands.

In [16]:
state_abbreviations = dict()
for i in range(len(state_geo['features'])):
    state_name = state_geo['features'][i]['properties']['NAME']
    state_abbrev = state_geo['features'][i]['properties']['STUSPS']
    state_abbreviations[state_name] = state_abbrev
state_abbreviations['US Virgin Islands'] = state_abbreviations['United States Virgin Islands']
state_abbreviations['Northern Mariana Islands'] = state_abbreviations['Commonwealth of the Northern Mariana Islands']
state_abbreviations
Out[16]:
{'Guam': 'GU',
 'Texas': 'TX',
 'Wisconsin': 'WI',
 'Rhode Island': 'RI',
 'New York': 'NY',
 'United States Virgin Islands': 'VI',
 'New Hampshire': 'NH',
 'Minnesota': 'MN',
 'Puerto Rico': 'PR',
 'Missouri': 'MO',
 'North Carolina': 'NC',
 'Michigan': 'MI',
 'Louisiana': 'LA',
 'Nebraska': 'NE',
 'California': 'CA',
 'Wyoming': 'WY',
 'South Carolina': 'SC',
 'Commonwealth of the Northern Mariana Islands': 'MP',
 'Kansas': 'KS',
 'Delaware': 'DE',
 'Alaska': 'AK',
 'New Jersey': 'NJ',
 'North Dakota': 'ND',
 'District of Columbia': 'DC',
 'Colorado': 'CO',
 'Virginia': 'VA',
 'Indiana': 'IN',
 'Nevada': 'NV',
 'New Mexico': 'NM',
 'Alabama': 'AL',
 'Tennessee': 'TN',
 'Kentucky': 'KY',
 'Oregon': 'OR',
 'Mississippi': 'MS',
 'Connecticut': 'CT',
 'Georgia': 'GA',
 'Utah': 'UT',
 'Idaho': 'ID',
 'Illinois': 'IL',
 'Iowa': 'IA',
 'Arizona': 'AZ',
 'American Samoa': 'AS',
 'Vermont': 'VT',
 'Montana': 'MT',
 'South Dakota': 'SD',
 'Pennsylvania': 'PA',
 'Oklahoma': 'OK',
 'Maryland': 'MD',
 'Maine': 'ME',
 'Hawaii': 'HI',
 'Ohio': 'OH',
 'West Virginia': 'WV',
 'Washington': 'WA',
 'Arkansas': 'AR',
 'Massachusetts': 'MA',
 'Florida': 'FL',
 'US Virgin Islands': 'VI',
 'Northern Mariana Islands': 'MP'}

Let's do the same for the index values with state name again as the key and the index as the value.

In [17]:
state_ids = dict()
for i in range(len(state_geo['features'])):
    state_name = state_geo['features'][i]['properties']['NAME']
    state_ids[state_name] = i
state_ids['US Virgin Islands'] = state_ids['United States Virgin Islands']
state_ids['Northern Mariana Islands'] = state_ids['Commonwealth of the Northern Mariana Islands']
state_ids
Out[17]:
{'Guam': 0,
 'Texas': 1,
 'Wisconsin': 2,
 'Rhode Island': 3,
 'New York': 4,
 'United States Virgin Islands': 5,
 'New Hampshire': 6,
 'Minnesota': 7,
 'Puerto Rico': 8,
 'Missouri': 9,
 'North Carolina': 10,
 'Michigan': 11,
 'Louisiana': 12,
 'Nebraska': 13,
 'California': 14,
 'Wyoming': 15,
 'South Carolina': 16,
 'Commonwealth of the Northern Mariana Islands': 17,
 'Kansas': 18,
 'Delaware': 19,
 'Alaska': 20,
 'New Jersey': 21,
 'North Dakota': 22,
 'District of Columbia': 23,
 'Colorado': 24,
 'Virginia': 25,
 'Indiana': 26,
 'Nevada': 27,
 'New Mexico': 28,
 'Alabama': 29,
 'Tennessee': 30,
 'Kentucky': 31,
 'Oregon': 32,
 'Mississippi': 33,
 'Connecticut': 34,
 'Georgia': 35,
 'Utah': 36,
 'Idaho': 37,
 'Illinois': 38,
 'Iowa': 39,
 'Arizona': 40,
 'American Samoa': 41,
 'Vermont': 42,
 'Montana': 43,
 'South Dakota': 44,
 'Pennsylvania': 45,
 'Oklahoma': 46,
 'Maryland': 47,
 'Maine': 48,
 'Hawaii': 49,
 'Ohio': 50,
 'West Virginia': 51,
 'Washington': 52,
 'Arkansas': 53,
 'Massachusetts': 54,
 'Florida': 55,
 'US Virgin Islands': 5,
 'Northern Mariana Islands': 17}

Now I'll map the values to the states and also drop the few null values for territories not in the GeoJSON file, giving a dataframe with the state totals for LECs along with the abbreviations. I also want to use the GeoJSON index value as the index for the final version of the state_total dataframe.

In [18]:
state_total['Abbrev'] = state_total['State'].map(state_abbreviations)
state_total['ID'] = state_total['State'].map(state_ids)
state_total
Out[18]:
State LECs Abbrev ID
0 Alabama 101 AL 29.0
1 Alaska 47 AK 20.0
2 American Samoa 6 AS 41.0
3 Arizona 93 AZ 40.0
4 Arkansas 89 AR 53.0
5 California 135 CA 14.0
6 Colorado 109 CO 24.0
7 Connecticut 62 CT 34.0
8 Delaware 51 DE 19.0
9 District of Columbia 58 DC 23.0
10 Florida 113 FL 55.0
11 Georgia 153 GA 35.0
12 Guam 8 GU 0.0
13 Hawaii 42 HI 49.0
14 Idaho 76 ID 37.0
15 Illinois 162 IL 38.0
16 Indiana 132 IN 26.0
17 Iowa 252 IA 39.0
18 Johnston Atoll 4 NaN NaN
19 Kansas 118 KS 18.0
20 Kentucky 106 KY 31.0
21 Louisiana 85 LA 12.0
22 Maine 67 ME 48.0
23 Maryland 83 MD 47.0
24 Massachusetts 75 MA 54.0
25 Michigan 127 MI 11.0
26 Midway Atoll 4 NaN NaN
27 Minnesota 171 MN 7.0
28 Mississippi 75 MS 33.0
29 Missouri 121 MO 9.0
30 Montana 73 MT 43.0
31 Nebraska 105 NE 13.0
32 Nevada 80 NV 27.0
33 New Hampshire 56 NH 6.0
34 New Jersey 88 NJ 21.0
35 New Mexico 81 NM 28.0
36 New York 146 NY 4.0
37 North Carolina 111 NC 10.0
38 North Dakota 77 ND 22.0
39 Northern Mariana Islands 6 MP 17.0
40 Ohio 138 OH 50.0
41 Oklahoma 113 OK 46.0
42 Oregon 112 OR 32.0
43 Pennsylvania 144 PA 45.0
44 Puerto Rico 28 PR 8.0
45 Rhode Island 43 RI 3.0
46 South Carolina 98 SC 16.0
47 South Dakota 77 SD 44.0
48 Tennessee 113 TN 30.0
49 Texas 179 TX 1.0
50 Utah 75 UT 36.0
51 US Virgin Islands 10 VI 5.0
52 Vermont 52 VT 42.0
53 Virginia 114 VA 25.0
54 Wake Island 7 NaN NaN
55 Washington 103 WA 52.0
56 West Virginia 77 WV 51.0
57 Wisconsin 157 WI 2.0
58 Wyoming 57 WY 15.0
In [19]:
state_total = state_total.dropna()
state_total = state_total.set_index('ID')
state_total.index = state_total.index.astype(int)
state_total
Out[19]:
State LECs Abbrev
ID
29 Alabama 101 AL
20 Alaska 47 AK
41 American Samoa 6 AS
40 Arizona 93 AZ
53 Arkansas 89 AR
14 California 135 CA
24 Colorado 109 CO
34 Connecticut 62 CT
19 Delaware 51 DE
23 District of Columbia 58 DC
55 Florida 113 FL
35 Georgia 153 GA
0 Guam 8 GU
49 Hawaii 42 HI
37 Idaho 76 ID
38 Illinois 162 IL
26 Indiana 132 IN
39 Iowa 252 IA
18 Kansas 118 KS
31 Kentucky 106 KY
12 Louisiana 85 LA
48 Maine 67 ME
47 Maryland 83 MD
54 Massachusetts 75 MA
11 Michigan 127 MI
7 Minnesota 171 MN
33 Mississippi 75 MS
9 Missouri 121 MO
43 Montana 73 MT
13 Nebraska 105 NE
27 Nevada 80 NV
6 New Hampshire 56 NH
21 New Jersey 88 NJ
28 New Mexico 81 NM
4 New York 146 NY
10 North Carolina 111 NC
22 North Dakota 77 ND
17 Northern Mariana Islands 6 MP
50 Ohio 138 OH
46 Oklahoma 113 OK
32 Oregon 112 OR
45 Pennsylvania 144 PA
8 Puerto Rico 28 PR
3 Rhode Island 43 RI
16 South Carolina 98 SC
44 South Dakota 77 SD
30 Tennessee 113 TN
1 Texas 179 TX
36 Utah 75 UT
5 US Virgin Islands 10 VI
42 Vermont 52 VT
25 Virginia 114 VA
52 Washington 103 WA
51 West Virginia 77 WV
2 Wisconsin 157 WI
15 Wyoming 57 WY

The easiest way to add tooltip data that shows the number of LECs when we hover over a particular state is to add it to the GeoJSON itself. We will do this by adding a new key, tooltip, at the properties level of each feature, holding the state name and its LEC count.

In [20]:
for i in range(len(state_geo['features'])):
    props = state_geo['features'][i]['properties']
    try:
        props['tooltip'] = props['NAME'] + ' ' + str(state_total.loc[i, 'LECs'])
    except KeyError:
        # Territories that were dropped from state_total get an empty tooltip
        props['tooltip'] = ''

Now when we look at the first record again for Guam we can see that the key "tooltip" has been added with the name and LEC total as the value.

In [21]:
state_geo['features'][0]['properties']
Out[21]:
{'STATEFP': '66',
 'STATENS': '01802705',
 'AFFGEOID': '0400000US66',
 'GEOID': '66',
 'STUSPS': 'GU',
 'NAME': 'Guam',
 'LSAD': '00',
 'ALAND': 543555847,
 'AWATER': 934337453,
 'tooltip': 'Guam 8'}

We are now ready to build the choropleth map, using the state abbreviation to join the GeoJSON (STUSPS) and the state_total dataframe (Abbrev). I have left a few options commented out, including different base layer tiles, an alternative fill color, and a threshold scale, to play around with and see what looks best.

In [22]:
m = folium.Map(location=[40, -95], zoom_start=4)
#folium.TileLayer('stamentoner').add_to(m)
#folium.TileLayer('stamenwatercolor').add_to(m)
#folium.TileLayer('cartodbpositron').add_to(m)

choropleth = folium.Choropleth(
    geo_data=state_geo,
    name='choropleth',
    data=state_total,
    columns=['Abbrev', 'LECs'],
    key_on='feature.properties.STUSPS',
    fill_color='YlGnBu',
    #fill_color='YlOrRd',
    fill_opacity=0.8,
    line_opacity=0.2,
    legend_name='Local Exchange Carriers (LECs) in Each State',
    #threshold_scale=[25, 50, 75, 100, 150, 200, 250],
    highlight=True
).add_to(m)

folium.LayerControl().add_to(m)

choropleth.geojson.add_child(
    folium.features.GeoJsonTooltip(['tooltip'], labels=False)
)

m
Out[22]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Hover over a state to see the total number of LECs in the tooltip.

Last, we can save the map as an HTML document to use outside of the Jupyter Notebook environment.

In [23]:
m.save('State_LEC_Count.html')