Creating a GPT-2 Twitter Bot: The Complete Guide


As a beginner in AI coding and development, I was surprised by the lack of complete guides, so I decided to document my process to make the AI journey easier for others.

This project creates an AI Twitter bot, complete with automated scheduling and generation, and builds a dataset from previously written, human-made (ew) tweets. A basic understanding of Python is recommended but not necessary. Everything is set up so that you can run it on virtually any computer, so don't fret!

We will use/setup:

Anaconda

Google Colab

Twint

Very basic coding knowledge

GPT-2 Simple

Twitter Developer Account (optional)

Notepad++

Make sure you have Anaconda and Notepad++ installed before beginning this tutorial.

Step One of 5: Scraping Accounts

First, create a list of Twitter accounts you want to use as your dataset. I asked on Twitter for volunteers, but you can use any accounts you like (it's good courtesy to ask permission). You want quite a few tweets; otherwise, the AI will "overtrain" (overfit) and generate unusable garble. I suggest at least 50k tweets in general, but your mileage may vary.

Once you have a solid list, install Twint, a useful tool that will scrape all of the data.

Open the Anaconda Prompt and cd into the folder where you want to store your dataset. Then install Twint with the commands below; installing Twint any other way will produce errors later in the process.

First, install Git by copying and pasting the command below:

conda install -c anaconda git

Then install Twint with this command:

pip install --user --upgrade git+https://github.com/twintproject/twint.git@origin/master#egg=twint

Now we prepare the code to scrape an ungodly number of tweets. Twint can collect tweets in several ways (see the Twint documentation for more options); I decided to collect the usernames by hand.

Now that you have the usernames, copy the code below into a text document and save it as a .py file in a folder of your choice.

import glob

import pandas as pd
import twint

# Put your list of accounts in like the template
username = ["usernameone", "usernametwo", "usernamethree"]

def ai(user):
    c = twint.Config()
    c.Username = user
    c.Custom["tweet"] = ["tweet"]  # keep only the tweet text column
    c.Store_csv = True
    c.Output = f'{user}.csv'
    # To limit the number of tweets collected per account, uncomment:
    # c.Limit = 10
    print(user)
    twint.run.Search(c)

for users in username:
    ai(users)

extension = 'csv'
# Gather every per-account CSV, skipping a combined file left over from an earlier run
all_filenames = [i for i in glob.glob('*.{}'.format(extension)) if i != "alltweets.csv"]
# combine all files in the list
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames])
# export to csv
combined_csv.to_csv("alltweets.csv", index=True, encoding='utf-8-sig')

Run the file using the command

python [filename].py

It will start scraping all of the accounts. (If you encounter errors when running the code, download the file from Google Drive here.) This may take a while, so if you have any other text you want the AI to learn from, such as books or quotes, compile it now.

Step Two of 5: Processing the Tweets

Once scraping finishes, the script generates a file called “alltweets.csv” that contains the tweets from all accounts in a single, clean CSV file.
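Before cleaning, you may want to check whether you reached the roughly 50k tweets suggested in Step One. Below is a minimal sketch that uses Python's built-in csv module. It assumes the combined file has a column named "tweet", as the c.Custom setting in the scraping script suggests. The count_tweets helper and its 50,000 default are my own illustration and are not part of Twint.

```python
import csv


def count_tweets(csv_path, column="tweet", minimum=50_000):
    """Count non-empty tweets in the combined CSV.

    Returns (count, enough), where enough says whether count >= minimum.
    The column name and minimum are assumptions; adjust them to your file.
    """
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        count = sum(1 for row in csv.DictReader(f) if (row.get(column) or "").strip())
    return count, count >= minimum
```

For example, count_tweets("alltweets.csv") returns the total and a flag telling you whether to add more accounts.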

The next step requires Notepad++, so make sure you have it installed. To make this process as quick as possible, I’ve made a macro that you can import to process all your data in one click. You are free to process it yourself, but this shortcut is there if you want it.
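If you would rather process the data yourself, here is a rough Python alternative. It assumes you want one tweet per line, with links removed and line breaks collapsed into spaces. It does not necessarily reproduce the Cleantweets macro's exact rules, and clean_tweet and csv_to_lines are hypothetical helper names. The code also assumes the tweet text sits in a column named "tweet".

```python
import csv
import re

# Links Twint tends to leave in tweet text (an assumption; extend as needed)
URL_RE = re.compile(r"(?:https?://|pic\.twitter\.com/)\S+")


def clean_tweet(text):
    """Remove links and collapse all whitespace, including newlines, to single spaces."""
    return " ".join(URL_RE.sub("", text).split())


def csv_to_lines(csv_path, txt_path, column="tweet"):
    """Write one cleaned tweet per line to txt_path and return how many were written."""
    written = 0
    with open(csv_path, newline="", encoding="utf-8-sig") as src, \
         open(txt_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            tweet = clean_tweet(row.get(column) or "")
            if tweet:  # skip tweets that were empty or only a link
                dst.write(tweet + "\n")
                written += 1
    return written
```

For example, csv_to_lines("alltweets.csv", "alltweets.txt") produces a plain text file that you can upload to the notebook in the next step.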

Download the .XML file here. Then open Windows search, look up %appdata%, and open the Notepad++ folder. There you will find shortcuts.xml; replace that file with the XML you just downloaded (a more detailed guide is here).

Now open alltweets.csv in Notepad++. On the top toolbar, click Macro, then select “Cleantweets” and let the macro clean up the data! Be patient: it may say “not responding,” but it just needs time!

When it is done, it should look something like this:

Step Three of 5: Training the AI

We will use Google Colab to make this process as accessible as possible. It is essentially a free virtual machine with excellent hardware that you can use for 8 hours at a time. I recommend having at least 10GB free in your Google Drive for training. To kick it off, click here to use a pre-built notebook generously created by Max Woolf, which makes the process smooth sailing.

Continue with the instructions in the Colab notebook; however, I'll give you a few tips:

  • I would use the 355M model for tweets, since it tends to produce more coherent text. However, if you don't have a ton of tweets to work with, feel free to try the 124M model. The larger models will not train well and generally make crappy tweets.
  • Experiment with the number of steps. I trained for 4000 steps on a dataset of around 3.5MB, but your mileage may vary. A good rule of thumb is to stop training when the loss either starts going up consistently or stops changing at all. This usually means the model has overfitted the data, and further training will start to corrupt it.
  • Make sure to generate samples fairly frequently (the sample_every setting in the notebook) so you can see how the bot is doing!
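You can check the stopping rule of thumb above mechanically instead of eyeballing it. Below is a minimal sketch. It assumes the notebook prints progress lines shaped like "[10 | 28.51] loss=3.21 avg=3.25", which you can copy into a string. parse_losses, should_stop, and the window and tolerance values are hypothetical helpers and are not part of gpt-2-simple.

```python
import re

# Assumed shape of a training progress line: "[step | seconds] loss=X avg=Y"
LOG_LINE = re.compile(r"\[(\d+) \| [\d.]+\] loss=([\d.]+) avg=([\d.]+)")


def parse_losses(log_text):
    """Return (step, average loss) pairs found in pasted training output."""
    return [(int(m.group(1)), float(m.group(3))) for m in LOG_LINE.finditer(log_text)]


def should_stop(avg_losses, window=5, tolerance=0.01):
    """Rule of thumb: stop once the last `window` readings fail to beat the
    best earlier reading by more than `tolerance` (loss flat or rising)."""
    if len(avg_losses) <= window:
        return False  # not enough history to judge yet
    best_before = min(avg_losses[:-window])
    recent_best = min(avg_losses[-window:])
    return recent_best > best_before - tolerance
```

Feed should_stop the second element of each pair, for example should_stop([avg for _, avg in parse_losses(log)]). The window smooths out the normal step-to-step noise in the loss.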

Step Four of 5: Generating Tweets

When you are satisfied with your results and have saved the checkpoint to Google Drive for later use, it's time to generate tweets! Scroll down near the end, and you will be able to generate 10,000 tweets with the click of a button!

You can change the number of files, the number of tweets per file, and the temperature, which controls the “wackiness” of the tweets: lower values are more predictable and higher values more erratic. Click the play button and download all the files following the Colab notebook's instructions.
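To get a feel for what temperature does, here is a toy illustration of the general technique. The model's scores (logits) for candidate next tokens are divided by the temperature, and then a softmax turns them into probabilities. The three logits below are made up, and this is not gpt-2-simple's actual sampling code.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate next tokens
for t in (0.5, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

A low temperature concentrates probability on the top-scoring token, which gives safer text. A high temperature flattens the distribution, so unlikely (wackier) tokens get picked more often.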

And there you have it: hundreds of thousands of tweetable blurbs in .txt files, made straight from an AI! With these, you can set up an account and manually tweet to your heart's content.

If, however, you would like to automate the tweeting, follow the instructions down below.

Step Five of 5: Automating the Bot's Tweets (optional)

Applying for Developer Access

To tweet automatically, you need developer access, which you apply for at https://developer.twitter.com/. You will have to fill out an application form (I’ll provide copy/paste responses to speed up the process), and you will likely have to wait about two days to be accepted. Apply for the developer account from the account you plan to tweet from.

First, click “Apply” in the top right corner.

Then click “Apply for a developer account”.

Choose “Making a bot” and click Next.

Fill out your basic information.

Now for the fun part: filling out forms! Don’t worry, I have created copy-and-paste answers for you, so you don’t have to spend your precious time on annoying questions (credit to this article).

First, it will ask how your bot will interact with the API. You can answer with the following:

“I plan to build a small bot that will use GPT2, a Markov Chain, or some variation of an RNN to make simple tweets daily about a given topic. The purpose is to just share some of the fun generated text that these models make via the Twitter platform.”

The next question is “Will your app use Tweet, Retweet, like, follow, or Direct Message functionality?” You can respond with:

“The app will use the tweepy Python library to update the bot’s twitter status with a tweet at an unintrusive interval of time. The Python script will be run from a cron job on my computer and will only use the Tweet functionality.”

In my case, the rest of the questions were irrelevant and didn't need to be answered. However, if your bot uses any of those features, describe them in your own words.

And apply!

Create a folder called “aitweets” and cd into it from a terminal. Then install the tweepy library with this command:

pip3 install tweepy

Next, create a text file inside the folder, paste the code below into it, and save it as a Python file (.py). In this example, it is called tweet.py.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys
import time

import tweepy

argfile = str(sys.argv[1])  # path to the text file of tweets, one per line

# enter the corresponding information from your Twitter application:
CONSUMER_KEY = '1234abcd...'  # keep the quotes, replace this with your consumer key
CONSUMER_SECRET = '1234abcd...'  # keep the quotes, replace this with your consumer secret key
ACCESS_KEY = '1234abcd...'  # keep the quotes, replace this with your access token
ACCESS_SECRET = '1234abcd...'  # keep the quotes, replace this with your access token secret
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_KEY, ACCESS_SECRET)
api = tweepy.API(auth)

with open(argfile, 'r', encoding='utf-8') as filename:
    f = filename.readlines()

for line in f:
    line = line.strip()  # drop the trailing newline and surrounding spaces
    if not line:
        continue  # skip blank lines so no empty tweet is sent
    api.update_status(line)
    time.sleep(3600)  # Tweet every 60 minutes

Now that we have all the necessary tweets and the script is almost ready, it's time to go back to the developer application. If you've been accepted, you can continue; if not, you'll have to wait for a response from Twitter. Go back to the Twitter developer dashboard and click “Add App”.

Give the app your bot's name and continue. You will land on a page that looks something like this:

Go back to the Python script and copy the API key and API secret key into the CONSUMER_KEY and CONSUMER_SECRET slots. Next, click “App settings”, change “App permissions” from “Read-only” to “Read and write”, and save.

Then go to the “Keys and tokens” tab at the top center.

From there, click “Generate” next to “Access Token & Secret”.

As with the API keys, put them in your Python script, but this time in the ACCESS_KEY and ACCESS_SECRET slots instead of the consumer slots. It will look like so:
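If you'd rather not paste secrets directly into tweet.py, you can read the four credentials from environment variables instead. This is a minimal sketch under one assumption: the variables use the same names as the script's constants. load_keys is a hypothetical helper and is not part of tweepy.

```python
import os

# Assumed environment variable names, mirroring the constants in tweet.py
KEY_NAMES = ("CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_KEY", "ACCESS_SECRET")


def load_keys(environ=os.environ):
    """Return the four credentials, raising KeyError if any is missing or empty."""
    missing = [name for name in KEY_NAMES if not environ.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in KEY_NAMES}
```

In tweet.py you would then call keys = load_keys() and pass keys["CONSUMER_KEY"] and keys["CONSUMER_SECRET"] to tweepy.OAuthHandler, and the two access values to auth.set_access_token. This way the file can be shared without leaking your keys.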

Take those AI-generated text files and put them in the same directory as the Python file. You can look through the text to build a more filtered file, or just run wild and post them unsupervised. Note that you will have to remove the <|endoftext|> and <|startoftext|> tokens, blank lines, and ===== separator lines to avoid posting bad tweets.
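The cleanup described above can be scripted. Below is a minimal sketch. It assumes each generated sample sits on its own line, wrapped in the <|startoftext|> and <|endoftext|> tokens, and that samples are separated by lines made only of "=" characters. clean_generated and clean_file are hypothetical helpers. The 280-character check is rough, because Twitter weights some characters and links differently.

```python
import re

TOKEN_RE = re.compile(r"<\|(?:startoftext|endoftext)\|>")
MAX_TWEET_LENGTH = 280  # Twitter's limit; len() is only an approximation of its count


def clean_generated(text):
    """Split raw generated text into a list of tweet-ready lines."""
    tweets = []
    for raw_line in text.splitlines():
        line = TOKEN_RE.sub("", raw_line).strip()
        if not line:
            continue  # blank line, or a line that held only tokens
        if set(line) == {"="}:
            continue  # separator line such as =====
        if len(line) > MAX_TWEET_LENGTH:
            continue  # too long to post as a single tweet
        tweets.append(line)
    return tweets


def clean_file(src_path, dst_path):
    """Write the cleaned tweets one per line and return how many were kept."""
    with open(src_path, encoding="utf-8") as src:
        tweets = clean_generated(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("".join(t + "\n" for t in tweets))
    return len(tweets)
```

Run clean_file on each downloaded file, then pass the cleaned output to tweet.py.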

Now, the moment of truth!

Go to the terminal, and enter the following command:

python3 tweet.py [name of text file].txt

The bot should now tweet your selected text file line by line, every 60 minutes! Keep in mind that the script has to stay running on your computer (or a hosted server) to tweet; it won't tweet if you turn the computer off. I'm sure there are solutions you can look into if you're interested.
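One such solution is the cron approach mentioned in the developer application. In this setup, a scheduler runs a short script once an hour, and the script posts only the next unused line. Below is a minimal sketch. post_next_line, the ".pos" progress file, and the tweet_once.py file name are hypothetical. The post argument stands in for tweepy's api.update_status from tweet.py, and the sketch uses print as a dry run.

```python
import os
import sys


def post_next_line(tweets_path, state_path, post):
    """Post the next unposted line of tweets_path; return it, or None when done.

    The index of the next line is stored in state_path, so each run picks up
    where the previous one stopped, even after the computer restarts.
    """
    with open(tweets_path, encoding="utf-8") as f:
        tweets = [line.strip() for line in f if line.strip()]
    index = 0
    if os.path.exists(state_path):
        with open(state_path, encoding="utf-8") as f:
            index = int(f.read().strip() or 0)
    if index >= len(tweets):
        return None  # every line has been posted
    post(tweets[index])
    with open(state_path, "w", encoding="utf-8") as f:
        f.write(str(index + 1))
    return tweets[index]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Dry run: swap print for api.update_status once tweepy is set up
    post_next_line(sys.argv[1], sys.argv[1] + ".pos", print)
```

On Linux or macOS, a crontab entry such as 0 * * * * cd /path/to/aitweets && python3 tweet_once.py tweets.txt would run it at the top of every hour. The machine still has to be on at those times.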

Conclusion

This was my first project getting into AI, and it has been incredibly informative and exciting. I hope you enjoy it as much as I did. This is only the tip of the iceberg, and I suggest you delve into other AI projects. I’ll be writing guides like this for every project I do, so others have all the resources to make AI fun and easy(ish). In the future, I will update this guide to explore further possibilities, like automatic AI responses to replies on Twitter and more!
