# A Comprehensive Guide to Extract Tweets using Tweepy

Twitter is one of the most popular sources of data in this age of Artificial Intelligence. Today, data is key to almost everything. To extract data from this amazing platform, Twitter provides APIs. We can use the API endpoints provided by Twitter, but, in this blog, we will use the Tweepy library.

You can do so much with the Twitter API and Tweepy in Python that it is hard to cover everything in one blog. So, I will divide it into two parts. In this blog, we will cover five topics related to searching tweets. We will learn:

1. How to search tweets with Keywords
    
2. How to search tweets that mention a specific user or come from a specific user
    
3. How to find tweets with specific hashtags
    
4. How to combine all three options
    
5. How to do pagination, that is, how to fetch N tweets while tackling the limitations of the API
    

Before we start with the actual part of our blog, confirm that you have installed Tweepy on your system. If not, run the following command in your terminal.

```shell
pip install Tweepy
```

# How to Search Tweets with Specific Keywords

The Twitter API has some restrictions, so in this blog, I will show you how to get recent Tweets that contain specific keywords. For example, say we want tweets that contain either **bitcoin** or **python**. To get tweets with either or both of these keywords, we have to form a query: `bitcoin OR python`. One thing to note here is that **OR** must be in uppercase.

To fetch the tweets we can use the following code:

```python
from tweepy import OAuth1UserHandler, API

auth = OAuth1UserHandler(CONSUMER_KEY, CONSUMER_SECRET, ACCESS_KEY, ACCESS_SECRET)
api = API(auth)

KEYWORDS = "bitcoin OR python"

# Basic keyword search
tweets = api.search_tweets(KEYWORDS)
```

I have written another [blog](https://codingenigma.com/introduction-to-twitter-search-api) that will show you how to form the search queries and how you can get your Twitter API keys.

**So, how does this code work?** To fetch data using the Twitter API, we need to authenticate first. In version 1.1 of the API, we need to use OAuth for authentication, whereas in version 2 we can do most tasks with only a Bearer Token.

After the authentication, we need to form a query. In our case, we want tweets that contain either bitcoin or python. Then, we pass our query to the **search\_tweets()** method. And that's it!
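If the keywords come from a list rather than a hand-written string, the query can be assembled in code. The helper below is only a minimal sketch: `build_or_query` is a hypothetical name of mine, not part of Tweepy, and it assumes that multi-word phrases should be wrapped in double quotes so they are searched as exact phrases.

```python
def build_or_query(keywords):
    """Join keywords with the uppercase OR operator.

    Hypothetical helper (not part of Tweepy). Multi-word phrases are
    wrapped in double quotes so they are searched as exact phrases.
    """
    parts = []
    for keyword in keywords:
        keyword = keyword.strip()
        if not keyword:
            continue  # skip empty entries
        parts.append(f'"{keyword}"' if " " in keyword else keyword)
    return " OR ".join(parts)


KEYWORDS = build_or_query(["bitcoin", "python"])
print(KEYWORDS)  # bitcoin OR python
```

The resulting string can be passed to `api.search_tweets()` exactly like the hand-written `KEYWORDS` value.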

# How to Search Tweets with Full Text

To extract full tweets, we need to pass one more argument, `tweet_mode="extended"`, to our `search_tweets()` method. So, our new code to extract full tweets is:

```python
from tweepy import OAuth1UserHandler, API

auth = OAuth1UserHandler(CONSUMER_KEY, CONSUMER_SECRET, ACCESS_KEY, ACCESS_SECRET)
api = API(auth)

KEYWORDS = "bitcoin OR python"

# Basic keyword search with full text
tweets = api.search_tweets(KEYWORDS, tweet_mode="extended")
```
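With `tweet_mode="extended"`, the text is available in the `full_text` attribute instead of `text`. One caveat worth checking on your own data: for retweets, `full_text` may still be cut short, and the complete text sits on the nested `retweeted_status` object. The sketch below handles both cases. `get_full_text` is a hypothetical helper, and the objects at the bottom are stand-ins so the example runs without calling the API.

```python
from types import SimpleNamespace


def get_full_text(status):
    """Return the longest available text for a tweet-like object.

    Hypothetical helper: prefers the original tweet of a retweet, then
    `full_text` (extended mode), then falls back to `text`.
    """
    original = getattr(status, "retweeted_status", None)
    if original is not None:
        status = original
    return getattr(status, "full_text", None) or getattr(status, "text", "")


# Stand-in objects that mimic the attributes used above.
plain = SimpleNamespace(full_text="A long tweet about python and bitcoin")
retweet = SimpleNamespace(
    full_text="RT @someone: A long tweet about pyth",
    retweeted_status=SimpleNamespace(full_text="A long tweet about python"),
)

print(get_full_text(plain))
print(get_full_text(retweet))
```

Note that this choice drops the `RT @user:` prefix for retweets. If you need to know who retweeted, keep the outer object as well.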

# How to Get Only English-Language Tweets?

We just need to add one more argument, `lang`, to our **search\_tweets()** method.

```python
# Basic keyword search with full text and lang
tweets = api.search_tweets(KEYWORDS, lang="en", tweet_mode="extended")
```

What if you want recent tweets or the most popular tweets? Using Tweepy, we can get recent tweets, popular tweets or a mix of both. By default, Tweepy returns mixed Tweets.

```python
# Basic keyword search with full text and result type
tweets = api.search_tweets(KEYWORDS, result_type="recent", tweet_mode="extended")
tweets = api.search_tweets(KEYWORDS, result_type="popular", tweet_mode="extended")
tweets = api.search_tweets(KEYWORDS, result_type="mixed", tweet_mode="extended")
```

# How to get N numbers of Tweets?

Good news! You just have to pass another argument, **count**, to the `search_tweets()` method.

```python
# Basic keyword search with full text and count
tweets = api.search_tweets(KEYWORDS, count=100, tweet_mode="extended")
print("Total:", len(tweets))
```

But, there is a catch here.

![Get N number of Tweets using Tweepy](https://cdn.hashnode.com/res/hashnode/image/upload/v1711914648802/d87f57a1-b4e4-4d60-953b-e6d24b982939.png align="center")

You see, even though I asked for 100 Tweets, I sometimes got 90, sometimes 89, or even 85. The **count** argument acts as an upper limit for a single request rather than a guarantee, so I cannot be certain how many tweets one call will return. For now, if you rely on the count argument alone, I suggest asking for fewer than 85.

**Don't worry!** Tweepy provides a way to fetch N tweets, although how many you can get also depends on the quota and subscription level of your Twitter API access.
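The optional arguments covered so far (`lang`, `result_type`, `count` and `tweet_mode`) can be collected in one place and checked before the request is sent. This is just a sketch under my own assumptions: `build_search_kwargs` is a hypothetical helper, and the 1 to 100 range for `count` reflects the per-request limit discussed in this blog.

```python
VALID_RESULT_TYPES = {"mixed", "recent", "popular"}


def build_search_kwargs(lang=None, result_type="mixed", count=None, full_text=True):
    """Build keyword arguments for api.search_tweets() (hypothetical helper)."""
    if result_type not in VALID_RESULT_TYPES:
        raise ValueError(f"result_type must be one of {sorted(VALID_RESULT_TYPES)}")
    if count is not None and not 1 <= count <= 100:
        raise ValueError("count must be between 1 and 100")

    kwargs = {"result_type": result_type}
    if lang:
        kwargs["lang"] = lang
    if count is not None:
        kwargs["count"] = count
    if full_text:
        kwargs["tweet_mode"] = "extended"
    return kwargs


options = build_search_kwargs(lang="en", result_type="recent", count=50)
print(options)

# With an authenticated `api` object, as created earlier:
# tweets = api.search_tweets(KEYWORDS, **options)
```

Validating locally means a typo such as `result_type="latest"` fails immediately instead of costing a request from your quota.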

# How to Search Tweets with Specific Hashtags

Getting tweets based on hashtags works the same way as with keywords. Tweepy does not provide a specific function to retrieve Tweets based on hashtags. So, to get these Tweets, we again have to form a query, as we did for the keyword search.

Let's extract Tweets based on `#javascript` and `#backend`. Here, we want tweets with both hashtags. So, the query will be `#javascript AND #backend`.

```python
from tweepy import OAuth1UserHandler, API

auth = OAuth1UserHandler(CONSUMER_KEY, CONSUMER_SECRET, ACCESS_KEY, ACCESS_SECRET)
api = API(auth)

print("-" * 15)

QUERY = "#javascript AND #backend -filter:retweets"
tweets = api.search_tweets(QUERY, tweet_mode="extended", count=5, result_type="recent")

for i in tweets:
    print(i.full_text)
    print("-" * 15)
```

> **Note:** I am showing only 3 short Tweets here on purpose, because the others were too long and would only create confusion.
> 
> ![How to Search Tweets with Specific Hashtags](https://cdn.hashnode.com/res/hashnode/image/upload/v1711914663937/75d0d76a-1ccf-4ef9-bf87-096d5e2e2a2f.png align="center")

As you can see in the output, every tweet contains both the **javascript** and **backend** hashtags. Here, I have used `-filter:retweets` in the query to exclude annoying retweets from the results.
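Instead of checking the output by eye, the hashtags can be verified in code. The sketch below assumes that each returned tweet exposes `entities["hashtags"]` as a list of dictionaries with a `"text"` key holding the hashtag without the leading `#`. `has_all_hashtags` is a hypothetical helper, and the tweet object is a stand-in so the example runs offline.

```python
from types import SimpleNamespace


def has_all_hashtags(status, wanted):
    """Check that a tweet-like object carries every hashtag in `wanted`.

    Assumes `status.entities["hashtags"]` is a list of dicts with a
    "text" key. The comparison ignores case and a leading "#".
    """
    entities = getattr(status, "entities", None) or {}
    present = {tag["text"].lower() for tag in entities.get("hashtags", [])}
    required = {name.lstrip("#").lower() for name in wanted}
    return required <= present


# Stand-in for a tweet returned by api.search_tweets().
tweet = SimpleNamespace(
    entities={"hashtags": [{"text": "JavaScript"}, {"text": "backend"}]}
)

print(has_all_hashtags(tweet, ["#javascript", "#backend"]))
print(has_all_hashtags(tweet, ["#javascript", "#frontend"]))
```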

# How to Search Tweets containing Specific User

So far, we saw how to search tweets based on keywords and hashtags. Now, we will see how to extract Tweets that mention a specific user.

It is really simple. We form a query that mentions our desired user. For example, say we want to fetch tweets that mention **Elon Musk**. The query becomes `@elonmusk`. Here, **elonmusk** is Elon Musk's username.

```python
from tweepy import OAuth1UserHandler, API

auth = OAuth1UserHandler(CONSUMER_KEY, CONSUMER_SECRET, ACCESS_KEY, ACCESS_SECRET)
api = API(auth)

print("-" * 15)

QUERY = "@elonmusk -filter:retweets"
tweets = api.search_tweets(QUERY, tweet_mode="extended", result_type="recent")

for i in tweets:
    print(i.full_text)
    print("-" * 15)
```

> Again, I have shown only a few tweets here.

![How to Search Tweets containing Specific User](https://cdn.hashnode.com/res/hashnode/image/upload/v1711914681082/d370d336-7dfb-43b1-a08d-117f211994c3.png align="center")
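The `@username` query finds tweets that mention a user. To get tweets written by a user or sent to a user, the search operators `from:` and `to:` can be used in the same way. The helper below is a small sketch that builds all three variants. `user_query` and its `mode` argument are hypothetical names of mine, not something Tweepy provides.

```python
def user_query(username, mode="mentions", exclude_retweets=True):
    """Build a search query about one user (hypothetical helper).

    mode="mentions" -> tweets that mention the user (@username)
    mode="from"     -> tweets written by the user (from:username)
    mode="to"       -> tweets sent to the user (to:username)
    """
    prefixes = {"mentions": "@", "from": "from:", "to": "to:"}
    if mode not in prefixes:
        raise ValueError(f"mode must be one of {sorted(prefixes)}")

    query = prefixes[mode] + username.lstrip("@")
    if exclude_retweets:
        query += " -filter:retweets"
    return query


print(user_query("elonmusk"))                # @elonmusk -filter:retweets
print(user_query("@elonmusk", mode="from"))  # from:elonmusk -filter:retweets
```

Either string can be passed as `QUERY` to `api.search_tweets()` in the code above.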

## How to fetch Tweets based on keyword, hashtags and username?

So far, we saw the three most used queries to fetch tweets. To practice forming a query, let's now combine all three methods in a single go. We will try to fetch tweets that have the **tesla** keyword, the **cryptocurrency** hashtag and a mention of **elonmusk**.

The query we need to form is `tesla AND #cryptocurrency AND @elonmusk`.

```python
from tweepy import OAuth1UserHandler, API

auth = OAuth1UserHandler(CONSUMER_KEY, CONSUMER_SECRET, ACCESS_KEY, ACCESS_SECRET)
api = API(auth)

print("-" * 15)

QUERY = "tesla AND #cryptocurrency AND @elonmusk -filter:retweets"
tweets = api.search_tweets(QUERY, tweet_mode="extended", count=7, result_type="recent", lang='en')
for i in tweets:
    print(i.full_text)
    print("-" * 15)
```

![How to fetch Tweets based on keyword, hashtags and username?](https://cdn.hashnode.com/res/hashnode/image/upload/v1711914700564/025a3a06-8da4-4276-a471-4005f734e0e0.png align="center")
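When the keywords, hashtags and usernames come from separate lists, the combined query can be generated rather than typed. This is a minimal sketch that follows the same `AND` style used in this blog. `combine_query` is a hypothetical helper, and it adds the `#` and `@` prefixes only when they are missing.

```python
def combine_query(keywords=(), hashtags=(), mentions=(), exclude_retweets=True):
    """Combine keywords, hashtags and mentions into one query string.

    Hypothetical helper: every term must match, so terms are joined
    with AND. Empty entries are ignored.
    """
    terms = [k.strip() for k in keywords if k.strip()]
    terms += ["#" + h.strip().lstrip("#") for h in hashtags if h.strip()]
    terms += ["@" + m.strip().lstrip("@") for m in mentions if m.strip()]

    query = " AND ".join(terms)
    if exclude_retweets and query:
        query += " -filter:retweets"
    return query


QUERY = combine_query(
    keywords=["tesla"],
    hashtags=["cryptocurrency"],
    mentions=["elonmusk"],
)
print(QUERY)  # tesla AND #cryptocurrency AND @elonmusk -filter:retweets
```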

# How to Retrieve Specific Number of Tweets using Tweepy

Earlier, we saw how to pass the **count** argument to the `search_tweets()` method to get a specific number of tweets. But, as per the documentation, this method returns at most 100 tweets per call. So, **how do we fetch more tweets?**

To retrieve N tweets, we can use the `Cursor` class provided by Tweepy. The cursor works like pagination, which is exactly what we need to retrieve N tweets.

**So, how do we use it? 🤔**

To fetch 1000 tweets using the Cursor class, we need to pass mainly two arguments:

1. Method that we want to paginate
    
2. Query
    

To specify the number of tweets we want, we pass it to the `items()` method of the Cursor class. The **count** argument still sets the size of each page, which is capped at 100.

```python
from tweepy import OAuth1UserHandler, API, Cursor

auth = OAuth1UserHandler(CONSUMER_KEY, CONSUMER_SECRET, ACCESS_KEY, ACCESS_SECRET)
api = API(auth, wait_on_rate_limit=True)

all_tweets = []
# count is the page size (max 100); items(1000) stops after 1000 tweets
for tweet in Cursor(api.search_tweets, "#python", count=100).items(1000):
    all_tweets.append(tweet.text)

print(len(all_tweets))
```

Here, if you have noticed, I have passed one more argument to the **API()** class. The Twitter API has a rate limit/quota. If we don't want to handle the rate limit error manually, we can pass `wait_on_rate_limit=True` to **API()**.

Because of this, whenever we hit a rate limit, we will see the following message on our console.

> `Rate limit reached. Sleeping for: 802`

The 802 seconds is not a fixed duration; it is the time left until the rate limit resets. In general, we have to wait between 13 and 15 minutes.

![How to Retrieve Specific Number of Tweets using Tweepy](https://cdn.hashnode.com/res/hashnode/image/upload/v1711914718691/7c43c78c-1499-4b34-970f-611b835ad791.png align="center")
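To make the idea of pagination more concrete, here is a rough sketch of paging by hand with the `max_id` argument of `search_tweets()`: each request asks only for tweets older than the oldest one already collected. This is not Tweepy's internal implementation, and in practice `Cursor` is the simpler choice. `fetch_n_tweets` and `FakeAPI` are hypothetical names, and `FakeAPI` is an in-memory stand-in so the example runs without credentials.

```python
from types import SimpleNamespace


def fetch_n_tweets(api, query, limit, page_size=100):
    """Collect up to `limit` tweets by paging backwards with max_id."""
    collected = []
    max_id = None
    while len(collected) < limit:
        kwargs = {"q": query, "count": min(page_size, limit - len(collected))}
        if max_id is not None:
            kwargs["max_id"] = max_id
        page = api.search_tweets(**kwargs)
        if not page:
            break  # nothing older is available
        collected.extend(page)
        # Next page: only tweets strictly older than the oldest one seen.
        max_id = min(tweet.id for tweet in page) - 1
    return collected[:limit]


class FakeAPI:
    """In-memory stand-in for tweepy.API that returns the newest tweets first."""

    def __init__(self, total):
        self._tweets = [SimpleNamespace(id=i) for i in range(total, 0, -1)]

    def search_tweets(self, q, count=15, max_id=None):
        older = [t for t in self._tweets if max_id is None or t.id <= max_id]
        return older[:count]


tweets = fetch_n_tweets(FakeAPI(total=250), "#python", limit=230)
print(len(tweets))  # 230
```

With a real `api` object, the same function would be called as `fetch_n_tweets(api, "#python", limit=1000)`, and `wait_on_rate_limit=True` would still take care of the pauses.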

# Conclusion

You might have observed that the most difficult part of searching and fetching tweets is forming a proper query. To solve that problem, I have written a whole blog post explaining the [Twitter Search API](https://codingenigma.com/introduction-to-twitter-search-api). Once you go through that blog, you will be able to write your own search queries.

Let me know if you need any help or want to discuss something. Reach out to me on [Twitter](https://twitter.com/Sahil_Fruitwala) or [LinkedIn](https://bit.ly/3JbsPDm). Make sure to share any thoughts, questions, or concerns. I would love to see them.

Till the next time 👋
