We had a very open-ended invitation to explore one of several datasets on
http://totalgood.github.io/hackor/
We chose to explore the Raw Committe Transactions table.
The following describes more or less where our exploration ended up, rather than showing the many false starts and dead ends that we encountered.
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(lubridate)
raw_trans <- read.delim("/Users/jds/Desktop/hackor/raw_committee_transactions.csv", stringsAsFactors = F, sep = ",")
The raw dataset has 977337 rows and 42 columns.
# Correct the data types and add a flag for negative amounts.
raw_trans <- raw_trans %>%
mutate(tran_id = as.character(tran_id),
original_id = as.character(original_id),
atteset_date = ymd(attest_date),
review_date = ymd(review_date),
due_date = ymd(due_date),
neg_amt_flag = ifelse(amount < 0, "Y", "N")
)
# have a look at the raw data:
head(raw_trans)
## tran_id original_id tran_date tran_status
## 1 1430945 1430945 2012-11-20 Original
## 2 1202420 1202420 2012-03-19 Original
## 3 1202445 1202445 2012-03-19 Original
## 4 1212902 1212902 2012-03-30 Original
## 5 1179133 1179133 2012-02-14 Original
## 6 1165767 1165767 2012-02-10 Original
## filer
## 1 Citizen Action for Political Education
## 2 Citizen Action for Political Education
## 3 Citizen Action for Political Education
## 4 Citizen Action for Political Education
## 5 Citizen Action for Political Education
## 6 Committee to Elect Cheryl Hukill
## contributor_payee sub_type
## 1 Miscellaneous Cash Contributions $100 and under Cash Contribution
## 2 Miscellaneous Cash Expenditures $100 and under Cash Expenditure
## 3 Miscellaneous Cash Expenditures $100 and under Cash Expenditure
## 4 Miscellaneous Cash Contributions $100 and under Cash Contribution
## 5 Miscellaneous Cash Contributions $100 and under Cash Contribution
## 6 Luther Horsley Cash Contribution
## amount aggregate_amount contributor_payee_committee_id filer_id
## 1 16080.14 0 NA 33
## 2 325.11 0 NA 33
## 3 325.11 0 NA 33
## 4 4.16 0 NA 33
## 5 3680.05 0 NA 33
## 6 250.00 250 NA 12581
## attest_by_name attest_date review_by_name review_date due_date
## 1 Aurora Prestholt 2012-12-19 nanfer 2012-12-19 2012-12-20
## 2 Aurora Prestholt 2012-04-01 nanfer 2012-04-02 2012-04-10
## 3 Aurora Prestholt 2012-04-01 nanfer 2012-04-02 2012-04-10
## 4 Aurora Prestholt 2012-04-11 nanfer 2012-04-11 2012-04-10
## 5 Aurora Prestholt 2012-03-01 nanfer 2012-03-05 2012-03-15
## 6 Ronald R Hukill 2012-02-10 janflo 2012-03-13 2012-03-12
## occptn_ltr_date pymt_sched_txt purp_desc intrst_rate check_nbr
## 1 NA
## 2 NA
## 3 NA
## 4 NA
## 5 NA
## 6 NA
## tran_stsfd_ind filed_by_name filed_date addr_book_agent_name
## 1 f Aurora Prestholt 2012-12-19
## 2 f Aurora Prestholt 2012-04-01
## 3 f Aurora Prestholt 2012-04-01
## 4 f Aurora Prestholt 2012-04-11
## 5 f Aurora Prestholt 2012-03-01
## 6 f Ronald R Hukill 2012-03-12
## book_type title_txt occptn_txt emp_name emp_city emp_state employ_ind
## 1 f
## 2 f
## 3 f
## 4 f
## 5 f
## 6 Individual Farmer f
## self_employ_ind addr_line1 addr_line2 city state zip zip_plus_four
## 1 f NA NA
## 2 f NA NA
## 3 f NA NA
## 4 f NA NA
## 5 f NA NA
## 6 t PO Box 209 Midland OR 97634 NA
## county purpose_codes exp_date atteset_date neg_amt_flag
## 1 NA 2012-12-19 N
## 2 NA 2012-04-01 N
## 3 NA 2012-04-01 N
## 4 NA 2012-04-11 N
## 5 NA 2012-03-01 N
## 6 Klamath NA 2012-02-10 N
# have a look at negative transactions, because that sound peculiar:
filter(raw_trans, neg_amt_flag == "Y") %>% head()
## tran_id original_id tran_date tran_status
## 1 1440832 1440832 2012-12-31 Original
## 2 1435213 1435213 2012-12-19 Original
## 3 1418410 1418410 2012-11-16 Original
## 4 542546 542546 0029-01-14 Original
## 5 1388147 1388145 2012-10-22 Amended
## 6 1981143 1979608 2015-02-05 Amended
## filer
## 1 Friends of Clackamas Community College
## 2 Don Moore for Commissioner
## 3 Committee to Elect Regina Ayars
## 4 Friends of Jason Atkinson
## 5 SW Rail Vote PAC
## 6 Marion County Democratic Central Committee
## contributor_payee sub_type
## 1 Clackamas Federal Credit Union Cash Balance Adjustment
## 2 Cash Balance Adjustment
## 3 Cash Balance Adjustment
## 4 Miscellaneous Cash Expenditures $100 and under Cash Expenditure
## 5 Cash Balance Adjustment
## 6 Sarah White Cash Balance Adjustment
## amount aggregate_amount contributor_payee_committee_id filer_id
## 1 -198.75 0 NA 11247
## 2 -259.40 0 NA 15512
## 3 -4.41 0 NA 15822
## 4 -24.48 0 NA 2864
## 5 -0.30 0 NA 15752
## 6 -10.00 0 NA 332
## attest_by_name attest_date review_by_name review_date due_date
## 1 Dave W Bilby 2013-01-04 eliack 2013-01-04 2013-01-30
## 2 Constance j Hathaway 2012-12-26 stajac 2012-12-26 2013-01-18
## 3 Ruth E Coulthard 2012-11-16 nanfer 2012-11-16 2012-12-17
## 4 Jessica Graham 2009-03-04 <NA> 2007-01-31
## 5 Lori L Piercy 2012-10-22 lydplu 2012-10-29 2012-10-29
## 6 Susan L Schwab 2015-02-10 eliack 2015-02-11 2015-03-09
## occptn_ltr_date pymt_sched_txt
## 1
## 2
## 3
## 4
## 5
## 6
## purp_desc
## 1 OreStar balance greater than Bank Balances prior to my tenure as Treasurer
## 2 adj bank account
## 3 reconcile to chk acct
## 4
## 5 To match bank
## 6 Bank Adjustment
## intrst_rate check_nbr tran_stsfd_ind filed_by_name filed_date
## 1 NA f Dave W Bilby 2013-01-04
## 2 NA f Constance j Hathaway 2012-12-26
## 3 NA f Ruth E Coulthard 2012-11-16
## 4 NA f Jessica Graham 2009-03-04
## 5 NA f Lori L Piercy 2012-10-22
## 6 NA f Susan L Schwab 2015-02-10
## addr_book_agent_name book_type title_txt occptn_txt emp_name
## 1 Business Entity
## 2
## 3
## 4
## 5
## 6 Individual
## emp_city emp_state employ_ind self_employ_ind addr_line1
## 1 f f 18600 SE McLoughlin Blvd
## 2
## 3
## 4 f f
## 5
## 6 t f 458 21st St. NE
## addr_line2 city state zip zip_plus_four county purpose_codes
## 1 Milwaukie OR 97267 NA Clackamas
## 2 NA NA
## 3 NA NA
## 4 NA NA
## 5 NA NA
## 6 Salem OR 97301 NA Marion
## exp_date atteset_date neg_amt_flag
## 1 NA 2013-01-04 Y
## 2 NA 2012-12-26 Y
## 3 NA 2012-11-16 Y
## 4 NA 2009-03-04 Y
## 5 NA 2012-10-22 Y
## 6 NA 2015-02-10 Y
# count by transaction status and neg_amt_flag:
raw_trans %>% group_by(tran_status, neg_amt_flag) %>% summarise(count = n())
## Source: local data frame [4 x 3]
## Groups: tran_status [?]
##
## tran_status neg_amt_flag count
## (chr) (chr) (int)
## 1 Amended N 25825
## 2 Amended Y 78
## 3 Original N 951128
## 4 Original Y 306
# Separate two quite different kinds of records:
am_trans <- raw_trans %>% filter(tran_status == "Amended")
or_trans <- raw_trans %>% filter(tran_status == "Original")
last_ammendment <- am_trans %>% group_by(original_id) %>%
filter(min_rank(desc(filed_date)) == 1) %>% ungroup()
check_am_trans <- am_trans %>% filter(original_id == "1555159" | original_id == "1881696" |
tran_id == "1555159" | tran_id == "1881696") %>% arrange(original_id)
select(check_am_trans, tran_id, original_id, attest_date, filed_date)
## tran_id original_id attest_date filed_date
## 1 1973965 1555159 2015-01-23 2015-01-23
## 2 1982492 1555159 2015-02-12 2015-02-12
## 3 1798655 1555159 2014-08-29 2014-08-29
## 4 1977239 1555159 2015-02-02 2015-02-02
## 5 2010109 1881696 2015-04-29 2015-04-29
## 6 1928117 1881696 2014-11-24 2014-11-24
## 7 2001078 1881696 2015-04-07 2015-04-07
check_last_ammendment <- last_ammendment %>% filter(original_id == "1555159" | original_id == "1881696" |
tran_id == "1555159" | tran_id == "1881696") %>% arrange(original_id)
select(check_last_ammendment, tran_id, original_id, attest_date, filed_date)
## Source: local data frame [2 x 4]
##
## tran_id original_id attest_date filed_date
## (chr) (chr) (chr) (chr)
## 1 1982492 1555159 2015-02-12 2015-02-12
## 2 2010109 1881696 2015-04-29 2015-04-29
# conculsion: indeedd the correct record was selected.
It’s still puzzling why “Ammended” records all point to an “original” but the original transaction ID doesn’t seem appear in the dataset.
Because some transactions have been ammended more than once, the filter on the “last transaction” reduces the total number of ammended transactions by 294 to 25609. Should the rows with the “last ammended transaction” be added back to the un-ammended transactions? Probably so.
subsequently_ammended <- am_trans %>% group_by(original_id) %>%
filter(min_rank(desc(filed_date)) != 1) %>% ungroup()
# check whether multiple ammendments are for negative amounts and count up the total:
subsequently_ammended %>% group_by(neg_amt_flag) %>% summarise(total = sum(amount))
## Source: local data frame [1 x 2]
##
## neg_amt_flag total
## (chr) (dbl)
## 1 N 680880.8
x <- subsequently_ammended %>% summarise(total = sum(amount))
The original dataset has 294 extra records that should be eliminated as suggested in this exploration. Without filtering out the duplicated adjusted records, total amounts are overstated by $ 680,880.8.