Creating a GPT-2 Twitter Bot: The Complete Guide


Step One of 5: Scraping Accounts

First, create a list of the Twitter accounts you want to use as your dataset. I asked on Twitter for volunteers, but you can use any accounts you want (it's good courtesy to ask permission). You want quite a few tweets; otherwise, the AI will “overfit” and generate unusable garble. I would suggest at least 50k tweets in general, but your mileage may vary.

Install git and twint from the terminal:

conda install -c anaconda git
pip install --user --upgrade git+https://github.com/twintproject/twint.git@origin/master#egg=twint

Then save the following as a Python file:

import glob

import pandas as pd
import twint

# Put your list of accounts in here, following the template
usernames = ["usernameone", "usernametwo", "usernamethree"]


def scrape(user):
    """Scrape one account's tweets into <user>.csv."""
    c = twint.Config()
    c.Username = user
    c.Custom["tweet"] = ["tweet"]  # keep only the tweet text column
    c.Store_csv = True
    c.Output = f"{user}.csv"
    # To limit the number of tweets collected per account: c.Limit = 10
    print(user)
    twint.run.Search(c)


for user in usernames:
    scrape(user)

# Combine every per-account CSV in the current folder into one file.
# Skip alltweets.csv so a rerun does not merge the old output back in.
all_filenames = [f for f in glob.glob("*.csv") if f != "alltweets.csv"]
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames])
# Export to CSV
combined_csv.to_csv("alltweets.csv", index=True, encoding="utf-8-sig")

Finally, run the script from the terminal:

python [filename].py

Step Two of 5: Processing the Tweets

Once that is all done, the script will have generated a file called “alltweets.csv”, which contains all the tweets from all accounts in one clean CSV file.
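If you want to tidy the tweets a little before training, here is a minimal sketch of one way to do it. It is not part of my original pipeline: the function names, the output filename "tweets_clean.csv", and the choice to strip links and duplicates are all assumptions for illustration. It also assumes the CSV has a column named "tweet", which is what the scraper asks twint for.

```python
import csv
import re


def clean_tweets(tweets):
    """Strip links and extra whitespace, then drop empty and duplicate tweets."""
    seen = set()
    cleaned = []
    for tweet in tweets:
        text = re.sub(r"https?://\S+", "", tweet)   # remove links
        text = re.sub(r"\s+", " ", text).strip()    # collapse whitespace
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned


def rewrite_csv(src="alltweets.csv", dst="tweets_clean.csv", column="tweet"):
    """Read `column` from src and write the cleaned tweets as a one-column CSV.

    The filenames and column name are assumptions; adjust them to your files.
    """
    with open(src, newline="", encoding="utf-8-sig") as f:
        tweets = [row[column] for row in csv.DictReader(f) if row.get(column)]
    with open(dst, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([column])
        writer.writerows([t] for t in clean_tweets(tweets))
```

Calling rewrite_csv() in the folder that holds alltweets.csv would write the cleaned copy next to it, leaving the original file untouched.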

Step Three of 5: Training the AI

We will use Google Colab to make this process as accessible as possible. It is essentially a free virtual machine with excellent hardware that you can use for about 8 hours at a time. I recommend having at least 10GB free in your Google Drive for training. To kick it off, open the pre-built notebook generously created by Max Woolf, so the process is smooth sailing.

  • I would use the 355M model for tweets. It tends to make more coherent tweets. However, if you don't have a ton of tweets to work with, feel free to try the 124M model. Any of the larger models will not train well, and will generally make crappy tweets.
  • Experiment with your number of steps. I trained for 4000 steps using a dataset of around 3.5MB, but your mileage may vary. A good rule of thumb is to stop when the loss starts either going up consistently or not changing at all. This means the model has overfitted the data, and further training will start to corrupt it.
  • Make sure to generate samples pretty frequently during training so you can see how the bot is doing!
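The rule of thumb about the loss can be sketched in a few lines of Python. This is only an illustration of the idea, not something the notebook does for you: the function name, the window of 5 readings, and the 0.01 threshold are assumptions I picked for the example, and you would feed it loss values you copied from the training output.

```python
def should_stop(losses, window=5, min_improvement=0.01):
    """Return True when the recent losses have plateaued or are rising.

    Compares the average of the last `window` losses with the average of
    the `window` losses before them. If the average did not drop by at
    least `min_improvement`, training is no longer making progress.
    """
    if len(losses) < 2 * window:
        return False  # not enough history yet to judge
    previous = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    return previous - recent < min_improvement
```

Averaging over a window instead of comparing two single readings keeps one noisy loss value from triggering a stop too early.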

Step Four of 5: Generating Tweets

When you are satisfied with your results and have saved the checkpoint to Drive for later use, it's time to generate tweets! Scroll down near the end of the notebook, and you will be able to generate 10,000 tweets with the click of a button!
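Not every generated tweet will be usable, so it can help to filter the output before posting anything. Below is a minimal sketch, assuming you have the generated tweets as a list of lines. The function name and parameters are hypothetical; the 280-character default is Twitter's standard tweet length limit.

```python
def filter_tweets(generated, training=(), max_length=280):
    """Keep unique, non-empty tweets that fit the length limit and are
    not copied word for word from the training tweets."""
    known = {t.strip() for t in training}
    seen = set()
    kept = []
    for line in generated:
        tweet = line.strip()
        if not tweet or len(tweet) > max_length:
            continue  # empty or too long to post
        if tweet in known or tweet in seen:
            continue  # memorized from the dataset, or a repeat
        seen.add(tweet)
        kept.append(tweet)
    return kept
```

Passing the original tweets as `training` is a cheap check for a bot that is just repeating its dataset, which is one sign of overfitting.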

Step Five of 5: Automating the Bot's Tweets (optional)

Applying for Developer Access

To tweet automatically, you need developer access. You can apply for a developer account at https://developer.twitter.com/. You will have to fill out an application form (I’ll provide a copy/paste response to quicken the process), and you will likely have to wait about two days to be accepted. Apply for the developer account from the account you plan to tweet from.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# tweet.py: post each line of a text file as a tweet, one per hour

import sys
import time

import tweepy

argfile = str(sys.argv[1])

# Enter the corresponding information from your Twitter application:
CONSUMER_KEY = '1234abcd...'  # keep the quotes, replace this with your consumer key
CONSUMER_SECRET = '1234abcd...'  # keep the quotes, replace this with your consumer secret key
ACCESS_KEY = '1234abcd...'  # keep the quotes, replace this with your access token
ACCESS_SECRET = '1234abcd...'  # keep the quotes, replace this with your access token secret
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_KEY, ACCESS_SECRET)
api = tweepy.API(auth)

# Read the tweets, one per line
with open(argfile, 'r', encoding='utf-8') as tweet_file:
    lines = tweet_file.readlines()

for line in lines:
    line = line.strip()
    if not line:
        continue  # skip blank lines, which cannot be posted as tweets
    api.update_status(line)
    time.sleep(3600)  # Tweet every 60 minutes
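If you would rather not paste your keys straight into the script (for example, because you plan to share it), one option is to read them from environment variables. This is a sketch of that idea, not what my script above does; the function name is hypothetical, and the variable names simply mirror the constants in the script.

```python
import os

# Names mirror the constants in the script; they are a convention for this
# example, not something Twitter or tweepy requires.
REQUIRED = ("CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_KEY", "ACCESS_SECRET")


def load_credentials(env=os.environ):
    """Return the four credentials as a dict, or fail with a clear error."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED}
```

With this in place, the four constants could be filled from the returned dict, e.g. creds["CONSUMER_KEY"], after setting the variables in your terminal session.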

Now, the moment of truth!

Go to the terminal, and enter the following command:

python3 tweet.py [name of text file].txt

Conclusion

This was my first project getting into AI, and it has been incredibly informative and exciting. I hope you enjoy it as much as I did. This is only the tip of the iceberg, and I suggest you delve into other AI projects. I’ll be making these guides for every project I do to make sure others have all the resources to make AI fun and easy(ish). In the future, I will update this guide to explore further possibilities, like automatic AI responses to replies on Twitter and more!

Rio

A high school senior trying to make the AI process at least 10% less homicidal