More VC Gender/Race Discussion

Let’s continue our discussion about Richard Kerby’s data on race/gender diversity in venture capital from Sunday. I did a touch more cleaning of the data — exact details of which are left as an exercise for the reader — leaving us with a nice tibble.

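library(tidyverse)  # provides read_rds() and the dplyr verbs used below
x <- read_rds(url("https://www.davidkane.info/files/blog_files/vc.rds"))
x$title <- as.factor(x$title)  # title has only three levels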

I have assigned this object to x, which is my preferred name for whatever the main object of analysis is within an R session. Others prefer to reserve x for vectors and insist that the main tibble/data frame should be named df. Last time I forgot to convert title, which has only three levels, to a factor, so I do that here. Anyway:

summary(x)
##      name               firm                 title     eng_degree       operator      
##  Length:1487        Length:1487        Associate:343   Mode :logical   Mode :logical  
##  Class :character   Class :character   Partner  :915   FALSE:1074      FALSE:877      
##  Mode  :character   Mode  :character   Principal:229   TRUE :403       TRUE :600      
##                                                        NA's :10        NA's :10       
##   undergrad             grad              gender         race     
##  Length:1487        Length:1487        Female: 274   Asian : 387  
##  Class :character   Class :character   Male  :1213   Black :  43  
##  Mode  :character   Mode  :character                 Latinx:  24  
##                                                      White :1033

I am not sure what the story is with the 10 missing indicators for being an operator or for having an engineering degree.

filter(x, is.na(operator))
## # A tibble: 10 x 9
##    name            firm         title  eng_degree operator undergrad grad    gender race 
##    <chr>           <chr>        <fct>  <lgl>      <lgl>    <chr>     <chr>   <fct>  <fct>
##  1 Anu Duggal      Female Foun… Partn… NA         NA       Vassar    London  Female Asian
##  2 Sutian Dong     Female Foun… Partn… NA         NA       NYU       <NA>    Female Asian
##  3 John Doerr      Kleiner Per… Partn… NA         NA       Rice      Rice    Male   White
##  4 David Singer    Maverick Ve… Partn… NA         NA       Yale      Stanfo… Male   White
##  5 Rob Ward        Meritech     Partn… NA         NA       Williams  MIT     Male   White
##  6 Jeff Del Presto Polaris      Assoc… NA         NA       UVA       UVA     Male   White
##  7 Clarey Zhu      TCV          Assoc… NA         NA       Yale      <NA>    Female Asian
##  8 Michael Chu     Tenaya Capi… Assoc… NA         NA       Berkeley  <NA>    Male   Asian
##  9 Bill Ericson    Wildcat Ven… Partn… NA         NA       Georgeto… Northw… Male   White
## 10 Patrick Chung   X Fund       Partn… NA         NA       Harvard   Harvard Male   Asian

But anyone who is missing one is also missing the other. The issue is present in the raw data as well, so I suspect that this is just an oversight on Kerby’s part.
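One way to confirm that the two indicators are always missing together is to cross-tabulate their missingness. The sketch below runs on a small made-up tibble (toy is hypothetical and stands in for x) so that it works on its own:

```r
library(dplyr)

# Hypothetical stand-in for x: two logical indicators with some NAs.
toy <- tibble(
  eng_degree = c(TRUE, FALSE, NA, NA, FALSE),
  operator   = c(FALSE, TRUE, NA, NA, TRUE)
)

# Cross-tabulate missingness. A row where eng_na and op_na disagree
# would mean one indicator is missing while the other is present.
na_pattern <- toy %>%
  count(eng_na = is.na(eng_degree), op_na = is.na(operator))

na_pattern

# TRUE when every row is missing both indicators or neither.
all(is.na(toy$eng_degree) == is.na(toy$operator))
```

Swapping toy for x should, if the claim above holds, give just two rows: 1,477 people with both indicators present and 10 with both missing.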

As we discussed last time, VCs listed as Associates are often quite junior (including current undergraduates). So, let’s leave them out of this analysis, since my goal is to look at race/gender across “powerful” people in VC.

Consider race:

x %>% 
  filter(title != "Associate") %>% 
  group_by(race) %>% 
  summarize(n = n()) %>%
  ungroup() %>% 
  mutate(percentage = round(100 * n/sum(n))) %>% 
  arrange(desc(n))
## # A tibble: 4 x 3
##   race       n percentage
##   <fct>  <int>      <dbl>
## 1 White    833         73
## 2 Asian    271         24
## 3 Black     24          2
## 4 Latinx    16          1
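Because each percentage in the table is rounded separately, combined shares are safer to compute from the raw counts. A minimal sketch in base R, using only the counts printed above (the vector name n and the other variable names are my own):

```r
# Counts of non-associate VCs by race, copied from the table above.
n <- c(White = 833, Asian = 271, Black = 24, Latinx = 16)

# Unrounded share of each group, in percent.
share <- 100 * n / sum(n)
round(share, 1)

# Combined Black/Latinx share, summed before any rounding.
black_latinx <- sum(share[c("Black", "Latinx")])
round(black_latinx, 1)
```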

About 3.5% of senior VCs are Black/Latinx (40 of 1,144); the rounded 2% and 1% above slightly understate the combined share. Now gender:

x %>% 
  filter(title != "Associate") %>% 
  group_by(gender) %>% 
  summarize(n = n()) %>%
  ungroup() %>% 
  mutate(percentage = round(100 * n/sum(n))) %>% 
  arrange(desc(n))
## # A tibble: 2 x 3
##   gender     n percentage
##   <fct>  <int>      <dbl>
## 1 Male     989         86
## 2 Female   155         14

Less than 15% are female. Consider race and gender together:

x %>% 
  filter(title != "Associate") %>% 
  group_by(gender, race) %>% 
  summarize(n = n()) %>%
  ungroup() %>% 
  mutate(percentage = round(100 * n/sum(n))) %>% 
  arrange(desc(n))
## # A tibble: 8 x 4
##   gender race       n percentage
##   <fct>  <fct>  <int>      <dbl>
## 1 Male   White    731         64
## 2 Male   Asian    228         20
## 3 Female White    102          9
## 4 Female Asian     43          4
## 5 Male   Black     20          2
## 6 Male   Latinx    10          1
## 7 Female Latinx     6          1
## 8 Female Black      4          0
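The three pipelines above differ only in their grouping variables, so they could be folded into one function. What follows is a sketch, not what the post actually ran: share_by is a hypothetical helper name, and toy is made-up data standing in for x.

```r
library(dplyr)

# Hypothetical helper: drop associates, count rows by the given
# grouping columns, and add each group's rounded percentage.
share_by <- function(data, ...) {
  data %>%
    filter(title != "Associate") %>%
    count(..., name = "n") %>%
    mutate(percentage = round(100 * n / sum(n))) %>%
    arrange(desc(n))
}

# Made-up data standing in for x.
toy <- tibble(
  title  = c("Partner", "Partner", "Principal", "Associate", "Partner"),
  gender = c("Male", "Male", "Female", "Female", "Male"),
  race   = c("White", "Asian", "White", "Black", "White")
)

by_gender <- share_by(toy, gender)
by_both   <- share_by(toy, gender, race)

by_gender
by_both
```

count() returns an ungrouped tibble, so the explicit ungroup() step is not needed in this version.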

There are only 10 (!) non-associate female Black/Latinx VCs. They are:

x %>% 
  filter(title != "Associate") %>% 
  filter(race %in% c("Black", "Latinx"), gender == "Female")
## # A tibble: 10 x 9
##    name              firm       title   eng_degree operator undergrad grad   gender race 
##    <chr>             <chr>      <fct>   <lgl>      <lgl>    <chr>     <chr>  <fct>  <fct>
##  1 Jenny Gao         Bessemer   Princi… FALSE      FALSE    Harvard   <NA>   Female Lati…
##  2 Terri Burns       Google Ve… Princi… TRUE       TRUE     NYU       <NA>   Female Black
##  3 Ulili Onovakpuri  Kapor Cap… Partner FALSE      FALSE    Berkeley  Duke   Female Black
##  4 Carolina Huaranca Kapor Cap… Princi… FALSE      TRUE     Penn      Corne… Female Lati…
##  5 Maia Sharpley     Learn Cap… Partner FALSE      FALSE    Trinity   Michi… Female Black
##  6 Renata Quintini   Lux Capit… Partner FALSE      FALSE    Stanford  Stanf… Female Lati…
##  7 Samara Hernandez  MATH Vent… Princi… FALSE      FALSE    Michigan  North… Female Lati…
##  8 Vanessa Larco     NEA        Partner TRUE       TRUE     Georgia … <NA>   Female Lati…
##  9 Shauntel Poulson  Reach Cap… Partner TRUE       TRUE     MIT       Stanf… Female Black
## 10 Miriam Rivera     Ulu Ventu… Partner FALSE      FALSE    Stanford  Stanf… Female Lati…

Some comments:

  • Is this data accurate? Jenny Gao does not present (to me?) as Latinx.
  • Many of these firms are smaller and (presumably?) less well-established:
x %>% 
  count(firm) %>% 
  filter(firm %in% c("Kapor Capital", "Learn Capital", "MATH Venture Partners", 
                     "Ulu Ventures", "Reach Capital")) %>% 
  arrange(desc(n))
## # A tibble: 5 x 2
##   firm                      n
##   <chr>                 <int>
## 1 Kapor Capital             6
## 2 Learn Capital             5
## 3 Reach Capital             5
## 4 MATH Venture Partners     4
## 5 Ulu Ventures              3
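Rather than typing firm names by hand, the list of firms could be derived from the same filter used to find the ten VCs. A sketch under that assumption, again on made-up data (toy, target_firms, and firm_sizes are hypothetical names):

```r
library(dplyr)

# Made-up data standing in for x; names and firms are hypothetical.
toy <- tibble(
  name   = c("A", "B", "C", "D", "E", "F"),
  firm   = c("Firm 1", "Firm 1", "Firm 2", "Firm 2", "Firm 2", "Firm 3"),
  title  = c("Partner", "Associate", "Partner", "Principal", "Associate", "Partner"),
  gender = c("Female", "Male", "Male", "Male", "Female", "Female"),
  race   = c("Black", "White", "White", "Asian", "Latinx", "White")
)

# Firms with at least one non-associate Black/Latinx woman.
target_firms <- toy %>%
  filter(title != "Associate",
         race %in% c("Black", "Latinx"), gender == "Female") %>%
  distinct(firm)

# Headcount in the data for each of those firms (all titles included).
firm_sizes <- toy %>%
  count(firm) %>%
  semi_join(target_firms, by = "firm") %>%
  arrange(desc(n))

firm_sizes
```

On the real data this would report headcounts for every firm in the list of ten, including those left out of the hand-typed vector (Bessemer and NEA, for example), which makes the size comparison easier to judge.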

Making some graphics with this data is a project for another day.
