by Dan Fellin
Have you used the Twitter API in the last six months?
The source data is tweets about Python.
Let's code!
That's all there is to it: we've gathered the 200K tweets.
The 200K cap was Twitter's doing; the search API only reaches back about a week, so that's essentially all the tweets for the past week.
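The gathering step can be sketched roughly like this. This is a toy loop, not the talk's actual code: `fetch_page` is a hypothetical stand-in for one real Twitter search call (e.g. via tweepy), returning a list of tweet dicts newest-first, or an empty list when history runs out.

```python
# Minimal sketch of paginated collection; fetch_page is hypothetical.
def gather_tweets(fetch_page, limit=200_000):
    tweets, max_id = [], None
    while len(tweets) < limit:
        page = fetch_page(max_id=max_id)  # one API call, a batch of tweet dicts
        if not page:
            break  # Twitter's search only reaches back about a week
        tweets.extend(page)
        max_id = page[-1]["id"] - 1  # continue below the oldest tweet seen
    return tweets[:limit]
```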
Any questions?
How many people have used pandas?
Now let's go to code...
df = get_df_few()  # builds a DataFrame via pd.json_normalize
df.to_csv('out.csv')
# look at csv
# look at json
# explain why json_normalize is cool
# list all columns
df.columns
# all columns are tab completed
# df.fav<TAB>
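A toy example of why `pd.json_normalize` is cool: nested dicts (like a tweet's `user` object) flatten into dotted column names automatically. The sample data here is made up, not from the talk's dataset.

```python
import pandas as pd

# Two fake tweet-like records with a nested "user" dict.
sample = [
    {"text": "love python", "favorite_count": 3,
     "user": {"screen_name": "alice", "followers_count": 42}},
    {"text": "pandas rocks", "favorite_count": 7,
     "user": {"screen_name": "bob", "followers_count": 7}},
]
flat = pd.json_normalize(sample)
print(flat.columns.tolist())  # nested keys become 'user.screen_name', etc.
```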
# append
df2 = get_df_many()  # builds a DataFrame via pd.json_normalize
df_all = pd.concat([df, df2], ignore_index=True)  # DataFrame.append was removed in pandas 2.0
# favorites
print(df.favorite_count.mean())
print(df.sort_values('favorite_count'))
# get all
print(df.groupby("favorite_count").size())
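A tiny illustration of that last groupby, on made-up numbers: it counts how many tweets received each favorite count.

```python
import pandas as pd

# Fake favorite counts: two tweets with 0 favorites, one with 1, three with 3.
demo = pd.DataFrame({"favorite_count": [0, 0, 1, 3, 3, 3]})
counts = demo.groupby("favorite_count").size()
print(counts)  # a Series indexed by favorite_count
```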
# use [] assignment: attribute assignment won't create a new column
df_all['created_at_ts'] = pd.to_datetime(df_all.created_at)
df_all['text_split'] = df_all.text.str.lower().str.split()
unique_words = set()
df_all.text_split.apply(unique_words.update)
len(unique_words) # should be ~20000
all_words = []
df_all.text_split.apply(all_words.extend)
len(all_words) # should be ~140000
from collections import Counter
c = Counter(all_words)
c.most_common(20)
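The set/list trick above is worth seeing on toy data: `apply` is used purely for its side effect, feeding each token list into `set.update` or `list.extend`. The two sample strings here are invented for illustration.

```python
from collections import Counter
import pandas as pd

# Two fake tweets, lowercased and tokenized.
texts = pd.Series(["I love Python", "Python loves pandas"]).str.lower().str.split()

unique_words = set()
texts.apply(unique_words.update)   # side effect: add each row's tokens to the set

all_words = []
texts.apply(all_words.extend)      # side effect: flatten all tokens into one list

print(len(unique_words), len(all_words))   # 5 6
print(Counter(all_words).most_common(1))   # [('python', 2)]
```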
# replace every char that is not a word char and not whitespace
# ALSO lowercase this time
s = df_all.text.str.lower().str.replace(r'[^\w\s]', ' ', regex=True)
s = s.str.split()
df_all['text_split'] = s
all_words = []
df_all.text_split.apply(all_words.extend)
c = Counter(all_words)
c.most_common(20)
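On a small made-up sample you can see what the cleanup buys: punctuation like `#python` or a URL no longer produces spurious "words" distinct from plain `python`.

```python
import pandas as pd

# One fake tweet full of punctuation.
s = pd.Series(["Python!!! #python @you http://x.co"])
tokens = s.str.lower().str.replace(r"[^\w\s]", " ", regex=True).str.split()
print(tokens[0])  # ['python', 'python', 'you', 'http', 'x', 'co']
```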
Now, choose your own adventure:
all_tweets.csv