We had a very open-ended invitation to explore one of several datasets on

http://totalgood.github.io/hackor/

We chose to explore the Raw Committe Transactions table.

The following describes more or less where our exploration ended up, rather than showing the many false starts and dead ends that we encountered.

library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(lubridate)
raw_trans  <- read.delim("/Users/jds/Desktop/hackor/raw_committee_transactions.csv", stringsAsFactors = F, sep = ",")

The raw dataset has 977337 rows and 42 columns.

#  Correct the data types and add a flag for negative amounts.
raw_trans <- raw_trans %>% 
  mutate(tran_id = as.character(tran_id),
         original_id = as.character(original_id),
         atteset_date = ymd(attest_date),
         review_date = ymd(review_date),
         due_date = ymd(due_date),
         neg_amt_flag = ifelse(amount < 0, "Y", "N")
         )

# have a look at the raw data:
head(raw_trans)
##   tran_id original_id  tran_date tran_status
## 1 1430945     1430945 2012-11-20    Original
## 2 1202420     1202420 2012-03-19    Original
## 3 1202445     1202445 2012-03-19    Original
## 4 1212902     1212902 2012-03-30    Original
## 5 1179133     1179133 2012-02-14    Original
## 6 1165767     1165767 2012-02-10    Original
##                                    filer
## 1 Citizen Action for Political Education
## 2 Citizen Action for Political Education
## 3 Citizen Action for Political Education
## 4 Citizen Action for Political Education
## 5 Citizen Action for Political Education
## 6       Committee to Elect Cheryl Hukill
##                                  contributor_payee          sub_type
## 1 Miscellaneous Cash Contributions $100 and under  Cash Contribution
## 2   Miscellaneous Cash Expenditures $100 and under  Cash Expenditure
## 3   Miscellaneous Cash Expenditures $100 and under  Cash Expenditure
## 4 Miscellaneous Cash Contributions $100 and under  Cash Contribution
## 5 Miscellaneous Cash Contributions $100 and under  Cash Contribution
## 6                                   Luther Horsley Cash Contribution
##     amount aggregate_amount contributor_payee_committee_id filer_id
## 1 16080.14                0                             NA       33
## 2   325.11                0                             NA       33
## 3   325.11                0                             NA       33
## 4     4.16                0                             NA       33
## 5  3680.05                0                             NA       33
## 6   250.00              250                             NA    12581
##     attest_by_name attest_date review_by_name review_date   due_date
## 1 Aurora Prestholt  2012-12-19         nanfer  2012-12-19 2012-12-20
## 2 Aurora Prestholt  2012-04-01         nanfer  2012-04-02 2012-04-10
## 3 Aurora Prestholt  2012-04-01         nanfer  2012-04-02 2012-04-10
## 4 Aurora Prestholt  2012-04-11         nanfer  2012-04-11 2012-04-10
## 5 Aurora Prestholt  2012-03-01         nanfer  2012-03-05 2012-03-15
## 6  Ronald R Hukill  2012-02-10         janflo  2012-03-13 2012-03-12
##   occptn_ltr_date pymt_sched_txt purp_desc intrst_rate check_nbr
## 1                                                             NA
## 2                                                             NA
## 3                                                             NA
## 4                                                             NA
## 5                                                             NA
## 6                                                             NA
##   tran_stsfd_ind    filed_by_name filed_date addr_book_agent_name
## 1              f Aurora Prestholt 2012-12-19                     
## 2              f Aurora Prestholt 2012-04-01                     
## 3              f Aurora Prestholt 2012-04-01                     
## 4              f Aurora Prestholt 2012-04-11                     
## 5              f Aurora Prestholt 2012-03-01                     
## 6              f  Ronald R Hukill 2012-03-12                     
##    book_type title_txt occptn_txt emp_name emp_city emp_state employ_ind
## 1                                                                      f
## 2                                                                      f
## 3                                                                      f
## 4                                                                      f
## 5                                                                      f
## 6 Individual               Farmer                                      f
##   self_employ_ind addr_line1 addr_line2    city state   zip zip_plus_four
## 1               f                                        NA            NA
## 2               f                                        NA            NA
## 3               f                                        NA            NA
## 4               f                                        NA            NA
## 5               f                                        NA            NA
## 6               t PO Box 209            Midland    OR 97634            NA
##    county purpose_codes exp_date atteset_date neg_amt_flag
## 1                             NA   2012-12-19            N
## 2                             NA   2012-04-01            N
## 3                             NA   2012-04-01            N
## 4                             NA   2012-04-11            N
## 5                             NA   2012-03-01            N
## 6 Klamath                     NA   2012-02-10            N
# have a look at negative transactions, because that sound peculiar:
filter(raw_trans, neg_amt_flag == "Y")  %>%  head()
##   tran_id original_id  tran_date tran_status
## 1 1440832     1440832 2012-12-31    Original
## 2 1435213     1435213 2012-12-19    Original
## 3 1418410     1418410 2012-11-16    Original
## 4  542546      542546 0029-01-14    Original
## 5 1388147     1388145 2012-10-22     Amended
## 6 1981143     1979608 2015-02-05     Amended
##                                        filer
## 1     Friends of Clackamas Community College
## 2                 Don Moore for Commissioner
## 3            Committee to Elect Regina Ayars
## 4                  Friends of Jason Atkinson
## 5                           SW Rail Vote PAC
## 6 Marion County Democratic Central Committee
##                                contributor_payee                sub_type
## 1                 Clackamas Federal Credit Union Cash Balance Adjustment
## 2                                                Cash Balance Adjustment
## 3                                                Cash Balance Adjustment
## 4 Miscellaneous Cash Expenditures $100 and under        Cash Expenditure
## 5                                                Cash Balance Adjustment
## 6                                    Sarah White Cash Balance Adjustment
##    amount aggregate_amount contributor_payee_committee_id filer_id
## 1 -198.75                0                             NA    11247
## 2 -259.40                0                             NA    15512
## 3   -4.41                0                             NA    15822
## 4  -24.48                0                             NA     2864
## 5   -0.30                0                             NA    15752
## 6  -10.00                0                             NA      332
##         attest_by_name attest_date review_by_name review_date   due_date
## 1         Dave W Bilby  2013-01-04         eliack  2013-01-04 2013-01-30
## 2 Constance j Hathaway  2012-12-26         stajac  2012-12-26 2013-01-18
## 3     Ruth E Coulthard  2012-11-16         nanfer  2012-11-16 2012-12-17
## 4       Jessica Graham  2009-03-04                       <NA> 2007-01-31
## 5        Lori L Piercy  2012-10-22         lydplu  2012-10-29 2012-10-29
## 6       Susan L Schwab  2015-02-10         eliack  2015-02-11 2015-03-09
##   occptn_ltr_date pymt_sched_txt
## 1                               
## 2                               
## 3                               
## 4                               
## 5                               
## 6                               
##                                                                    purp_desc
## 1 OreStar balance greater than Bank Balances prior to my tenure as Treasurer
## 2                                                           adj bank account
## 3                                                      reconcile to chk acct
## 4                                                                           
## 5                                                              To match bank
## 6                                                            Bank Adjustment
##   intrst_rate check_nbr tran_stsfd_ind        filed_by_name filed_date
## 1                    NA              f         Dave W Bilby 2013-01-04
## 2                    NA              f Constance j Hathaway 2012-12-26
## 3                    NA              f     Ruth E Coulthard 2012-11-16
## 4                    NA              f       Jessica Graham 2009-03-04
## 5                    NA              f        Lori L Piercy 2012-10-22
## 6                    NA              f       Susan L Schwab 2015-02-10
##   addr_book_agent_name       book_type title_txt occptn_txt emp_name
## 1                      Business Entity                              
## 2                                                                   
## 3                                                                   
## 4                                                                   
## 5                                                                   
## 6                           Individual                              
##   emp_city emp_state employ_ind self_employ_ind               addr_line1
## 1                             f               f 18600 SE McLoughlin Blvd
## 2                                                                       
## 3                                                                       
## 4                             f               f                         
## 5                                                                       
## 6                             t               f          458 21st St. NE
##   addr_line2      city state   zip zip_plus_four    county purpose_codes
## 1            Milwaukie    OR 97267            NA Clackamas              
## 2                               NA            NA                        
## 3                               NA            NA                        
## 4                               NA            NA                        
## 5                               NA            NA                        
## 6                Salem    OR 97301            NA    Marion              
##   exp_date atteset_date neg_amt_flag
## 1       NA   2013-01-04            Y
## 2       NA   2012-12-26            Y
## 3       NA   2012-11-16            Y
## 4       NA   2009-03-04            Y
## 5       NA   2012-10-22            Y
## 6       NA   2015-02-10            Y
# count by transaction status and neg_amt_flag:
raw_trans %>% group_by(tran_status, neg_amt_flag) %>% summarise(count = n())
## Source: local data frame [4 x 3]
## Groups: tran_status [?]
## 
##   tran_status neg_amt_flag  count
##         (chr)        (chr)  (int)
## 1     Amended            N  25825
## 2     Amended            Y     78
## 3    Original            N 951128
## 4    Original            Y    306
# Separate two quite different kinds of records:
am_trans <-  raw_trans %>% filter(tran_status == "Amended")
or_trans <-  raw_trans %>% filter(tran_status == "Original")

last_ammendment <- am_trans %>% group_by(original_id) %>% 
  filter(min_rank(desc(filed_date)) == 1) %>% ungroup()

check_am_trans <- am_trans  %>% filter(original_id == "1555159" | original_id == "1881696" |
                        tran_id == "1555159" | tran_id == "1881696") %>% arrange(original_id)
select(check_am_trans, tran_id, original_id, attest_date, filed_date)
##   tran_id original_id attest_date filed_date
## 1 1973965     1555159  2015-01-23 2015-01-23
## 2 1982492     1555159  2015-02-12 2015-02-12
## 3 1798655     1555159  2014-08-29 2014-08-29
## 4 1977239     1555159  2015-02-02 2015-02-02
## 5 2010109     1881696  2015-04-29 2015-04-29
## 6 1928117     1881696  2014-11-24 2014-11-24
## 7 2001078     1881696  2015-04-07 2015-04-07
check_last_ammendment <- last_ammendment %>% filter(original_id == "1555159" | original_id == "1881696" |
                        tran_id == "1555159" | tran_id == "1881696") %>% arrange(original_id)

select(check_last_ammendment, tran_id, original_id, attest_date, filed_date)
## Source: local data frame [2 x 4]
## 
##   tran_id original_id attest_date filed_date
##     (chr)       (chr)       (chr)      (chr)
## 1 1982492     1555159  2015-02-12 2015-02-12
## 2 2010109     1881696  2015-04-29 2015-04-29
# conculsion: indeedd the correct record was selected.

It’s still puzzling why “Ammended” records all point to an “original” but the original transaction ID doesn’t seem appear in the dataset.

Because some transactions have been ammended more than once, the filter on the “last transaction” reduces the total number of ammended transactions by 294 to 25609. Should the rows with the “last ammended transaction” be added back to the un-ammended transactions? Probably so.

subsequently_ammended <- am_trans %>% group_by(original_id) %>% 
  filter(min_rank(desc(filed_date)) != 1) %>% ungroup()

# check whether multiple ammendments are for negative amounts and count up the total:
subsequently_ammended %>% group_by(neg_amt_flag) %>%  summarise(total = sum(amount))
## Source: local data frame [1 x 2]
## 
##   neg_amt_flag    total
##          (chr)    (dbl)
## 1            N 680880.8
x <- subsequently_ammended %>%  summarise(total = sum(amount))

The original dataset has 294 extra records that should be eliminated as suggested in this exploration. Without filtering out the duplicated adjusted records, total amounts are overstated by $ 680,880.8.