Introduction to Twitter Search API

Introduction

Facebook, Instagram and Twitter are among the top social media platforms. According to BusinessofApps, Facebook, Instagram and Twitter have an estimated 2,700, 1,160 and 330 million monthly active users respectively. These numbers make social media one of the largest sources of data. Although Facebook and Instagram have more active users, Twitter remains the most popular platform among academic researchers and developers. The main reason is probably the availability of the data Twitter provides and the insights it offers.

As of 2018, according to Oberlo, 500 million tweets are sent each day, which equates to about 5,787 tweets per second. Twitter began as a text-only platform, but as it grew in popularity it also came to be used for sharing photos and videos. Apart from sharing their thoughts and ideas, many people use Twitter as a tutorial and community platform.

How to Create Twitter Developer Account

Before using the Twitter API, we need to set up a developer account and create a project to get API keys. If you do not have a Twitter account, create one, then go to the Twitter Developers Dashboard and sign in. You will see the dashboard as shown in the image below.

Click the Create Project button; it will take you to the screen shown below. Give your project a name. Any name will do, but it should be unique. I named mine MyBlogProject.

After that, select the reason for the project and write a short description of it. (I suspect this is not strictly necessary, but the form asks for it anyway.)

Finally, give the application a name. Again, any name will do, but it should be unique. Once you press complete, you will get your API keys as shown in the image.

You can copy these keys now or retrieve them later. Do not share your keys with anyone. If you now click Projects and Apps and then Overview in the left side panel, you will see all your projects and standalone apps. You can use a project app, a standalone app, or both. For this tutorial, we will use the app we created inside the project. As you can see, I have both an app inside the project and a standalone app.

Earlier there was only one version of the Twitter API, v1.1, but at the time of writing the new v2 was in the early access stage. So the project had access to both versions, whereas the standalone app had access to v1.1 only.

Exploration of Twitter API v1.1

Every search request in the v1.1 API is sent to api.twitter.com/1.1/search/tweets.json. We can add different query parameters as needed. In this tutorial, we use the Standard API, which places some restrictions on usage. As per the official documentation, the following query parameters are available:

  1. q

  2. geocode

  3. lang

  4. result_type

  5. count

  6. include_entities

  7. until

  8. since_id and max_id

  9. locale (only ja is currently effective, so we will avoid this one.)

  10. tweet_mode (this has not been documented in official docs.)

Of the parameters listed above, only q is required; all the others are optional.
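As a rough illustration of how these parameters reach the endpoint, the sketch below builds (but does not send) a GET request with Python's standard library. It assumes app-only authentication with a bearer token generated for the app in the developer dashboard; the helper name build_search_request and the YOUR_BEARER_TOKEN placeholder are made up for this example.

```python
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"


def build_search_request(params, bearer_token):
    """Return an unsent GET request for the v1.1 search endpoint."""
    query_string = urllib.parse.urlencode(params)
    request = urllib.request.Request(f"{SEARCH_URL}?{query_string}")
    # Assumption: app-only auth, so the token travels in the Authorization header.
    request.add_header("Authorization", f"Bearer {bearer_token}")
    return request


# "YOUR_BEARER_TOKEN" is a placeholder, not a real credential.
request = build_search_request({"q": "python", "count": 5}, "YOUR_BEARER_TOKEN")
print(request.full_url)
```

To actually send it, one would pass the request to urllib.request.urlopen and parse the JSON body of the response.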

1. Query Parameter: "q"

The q parameter is used to search for specific terms, hashtags and users. For example, we can use #python to get all tweets that contain #python, or "web development" as an exact phrase to retrieve all tweets that contain "web development". Moreover, by using @elonmusk, we can get tweets that mention the user Elon Musk (to get tweets sent by the account itself, use the from: filter covered later).

Note: The Standard API only returns data from the last 7 days, and if you do not set count it returns only 15 tweets.

  1. Example of fetching tweets with hashtag: python
    https://api.twitter.com/1.1/search/tweets.json?q=%23python

  2. Example of fetching tweets that contain: "web development"
    https://api.twitter.com/1.1/search/tweets.json?q="web development"

  3. Example of fetching tweets that mention the username: elonmusk
    https://api.twitter.com/1.1/search/tweets.json?q=@elonmusk

In the first example, we used %23 instead of the # symbol. Some characters cannot appear directly in a URI and must be percent-encoded; a literal # would be read as the start of the URL fragment. By default, the Twitter API returns truncated tweet text, so to get full tweets we have to pass the tweet_mode=extended query parameter.
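Rather than encoding symbols by hand, we can let a library do it. The following is a small sketch using Python's urllib.parse; the params dict is just an illustration of the first example combined with tweet_mode=extended.

```python
from urllib.parse import quote, urlencode

SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"

# quote() percent-encodes a single value; safe="" leaves no reserved character unencoded.
print(quote("#python", safe=""))            # %23python
print(quote('"web development"', safe=""))  # %22web%20development%22

# urlencode() encodes a whole parameter dict; quote_via=quote writes spaces as %20.
params = {"q": "#python", "tweet_mode": "extended"}
url = f"{SEARCH_URL}?{urlencode(params, quote_via=quote)}"
print(url)
```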

2. Query Parameter: "geocode"

Once we have decided what to search for (a hashtag, keyword or user), we can restrict the search to a specific location using the geocode parameter. This query parameter is optional, so use it as needed.

Now, if I want to search for the keyword python around Bangalore, I need to pass a latitude, longitude and radius. So I pass the query parameter geocode=12.97194,77.59369,1mi. Here 12.97194, 77.59369 and 1mi are the latitude, longitude and radius respectively, where 1mi means 1 mile. We can use km (kilometres) as well.

To get tweets that contain the keyword python and were posted within a one-mile radius of Bangalore, use the following URI: https://api.twitter.com/1.1/search/tweets.json?q=python&geocode=12.97194,77.59369,1mi
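The geocode value is easy to get wrong by hand, so a tiny helper can assemble it. This is only a sketch: geocode_param is a hypothetical name, and the range checks are ordinary latitude/longitude bounds, not something the API documentation is quoted on here.

```python
def geocode_param(latitude, longitude, radius, unit="mi"):
    """Format a geocode value as "latitude,longitude,radius" plus a unit suffix."""
    if unit not in ("mi", "km"):
        raise ValueError("unit must be 'mi' or 'km'")
    if not -90 <= latitude <= 90 or not -180 <= longitude <= 180:
        raise ValueError("latitude or longitude out of range")
    return f"{latitude},{longitude},{radius}{unit}"


print(geocode_param(12.97194, 77.59369, 1))        # 12.97194,77.59369,1mi
print(geocode_param(12.97194, 77.59369, 5, "km"))  # 12.97194,77.59369,5km
```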

3. Query Parameter: "lang"

We can restrict results to tweets in a specific language. For example, to get tweets that contain the keyword India and are written in Hindi, we can use the lang query parameter as shown below. https://api.twitter.com/1.1/search/tweets.json?q=India&lang=hi

The lang value is an ISO 639-1 language code; one can use the corresponding Wikipedia page to get a list of all languages and their respective codes.

4. Query Parameter: "result_type"

Sometimes we want the most recent tweets, and sometimes the most popular ones. In these situations, we can use the result_type query parameter. Like geocode, this query parameter is optional.

We can use mixed, recent or popular as the result_type, where mixed is the default value.

5. Query Parameter: "count"

The count parameter is used when we want a specific number of tweets. By default, we will get 15 tweets but in a single request we can get up to a maximum of 100 tweets.
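A minimal sketch of how result_type and count might be checked before a request is sent. The helper name result_params is hypothetical; the allowed values and the limit of 100 are the ones stated above.

```python
VALID_RESULT_TYPES = {"mixed", "recent", "popular"}
MAX_COUNT = 100  # per-request maximum stated above


def result_params(result_type="mixed", count=15):
    """Validate result_type and clamp count into the 1..100 range."""
    if result_type not in VALID_RESULT_TYPES:
        raise ValueError(f"result_type must be one of {sorted(VALID_RESULT_TYPES)}")
    count = max(1, min(int(count), MAX_COUNT))
    return {"result_type": result_type, "count": count}


print(result_params("recent", 250))  # {'result_type': 'recent', 'count': 100}
```

Clamping instead of raising an error is a design choice for this sketch; a stricter client might reject out-of-range counts outright.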

6. Query Parameter: "include_entities"

We can get some extra data by passing the include_entities query parameter as true.

The URI shown below returns the extra entities field: https://api.twitter.com/1.1/search/tweets.json?q=%22web%20development%22&include_entities=true&count=1

Entities Data:

"entities": {
        "hashtags": [
          {
            "text": "FREECOURSE",
            "indices": [
              0,
              11
            ]
          },
          {
            "text": "FREE",
            "indices": [
              88,
              93
            ]
          },
          {
            "text": "online",
            "indices": [
              94,
              101
            ]
          },
          {
            "text": "udemy",
            "indices": [
              102,
              108
            ]
          }
        ],
        "symbols": [],
        "user_mentions": [],
        "urls": [
          {
            "url": "https://t.co/YcPoOiGnNu",
            "expanded_url": "https://www.udemy.com/course/bootstrap-3-responsive-design-tutorial-fundamentals/?couponCode=DISCOVERYVIP",
            "display_url": "udemy.com/course/bootstr…",
            "indices": [
              64,
              87
            ]
          },
          {
            "url": "https://t.co/icOzd6jBUe",
            "expanded_url": "https://twitter.com/i/web/status/1376231256258719747",
            "display_url": "twitter.com/i/web/status/1…",
            "indices": [
              110,
              133
            ]
          }
        ]
      }

One thing to notice here is that I have used %22 instead of double quotes (") and %20 instead of white space.
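Once a tweet has been parsed from JSON, the entities field is an ordinary nested dictionary. The sketch below works on a trimmed copy of the data shown above; the helper names hashtags and expanded_urls are made up for this example.

```python
import json

# A trimmed copy of the entities object shown above.
tweet = json.loads("""
{
  "entities": {
    "hashtags": [
      {"text": "FREECOURSE", "indices": [0, 11]},
      {"text": "udemy", "indices": [102, 108]}
    ],
    "urls": [
      {"url": "https://t.co/icOzd6jBUe",
       "expanded_url": "https://twitter.com/i/web/status/1376231256258719747",
       "indices": [110, 133]}
    ]
  }
}
""")


def hashtags(tweet):
    """Return the hashtag texts, or an empty list if entities are absent."""
    return [tag["text"] for tag in tweet.get("entities", {}).get("hashtags", [])]


def expanded_urls(tweet):
    """Return the full URLs behind the shortened t.co links."""
    return [u["expanded_url"] for u in tweet.get("entities", {}).get("urls", [])]


print(hashtags(tweet))       # ['FREECOURSE', 'udemy']
print(expanded_urls(tweet))  # ['https://twitter.com/i/web/status/1376231256258719747']
```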

7. Query Parameter: "until"

To get all tweets created before a specific date, we can use the until query parameter. The date should be formatted as YYYY-MM-DD. Keep in mind that the search index has a 7-day limit (for the Standard API). In other words, no tweets will be found for a date older than one week.

When I was writing this blog it was 2021-03-28, so the oldest date I could request was 2021-03-22. If I request the date 2021-03-21, I get an empty array. The until parameter is used in the following format: https://api.twitter.com/1.1/search/tweets.json?q=python&until=2021-03-22
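The date arithmetic from this example can be sketched with Python's datetime module. The function oldest_until and its window_days parameter are hypothetical names, and the calculation simply mirrors the example above (today counts as day 7); the real cut-off may differ by time zone.

```python
from datetime import date, timedelta


def oldest_until(today, window_days=7):
    """Oldest `until` value that, per the example above, still returns tweets."""
    return (today - timedelta(days=window_days - 1)).isoformat()


print(oldest_until(date(2021, 3, 28)))  # 2021-03-22
```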

8. Query Parameter: "since_id" and "max_id"

To be honest, I could not find in the documentation how to get or generate since_id and max_id.

As per the documentation, if we use since_id, the API returns results with an ID greater than (that is, more recent than) the specified ID. On the other hand, max_id returns results with an ID less than (that is, older than) or equal to the specified ID.
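A common approach, offered here as an assumption rather than something the text above confirms, is to take both values from the id fields of tweets returned by an earlier request. The sketch below derives them from one page of results; next_page_params is a hypothetical helper, and the second ID in the sample page is made up.

```python
def next_page_params(statuses):
    """Derive max_id / since_id for follow-up requests from one page of tweets."""
    ids = [status["id"] for status in statuses]
    if not ids:
        return {}
    return {
        # max_id is inclusive, so subtract 1 to skip the oldest tweet already seen.
        "older": {"max_id": min(ids) - 1},
        # since_id is exclusive, so the newest ID seen can be passed as-is.
        "newer": {"since_id": max(ids)},
    }


# Sample page: the first ID comes from the entities example, the second is invented.
page = [{"id": 1376231256258719747}, {"id": 1376231256258719001}]
print(next_page_params(page)["older"])  # {'max_id': 1376231256258719000}
print(next_page_params(page)["newer"])  # {'since_id': 1376231256258719747}
```

Merging the "older" dict into the parameters of the next request would walk backwards through the results one page at a time.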

Boolean Syntax

We can use boolean operators and grouping to build more specific queries. The logical AND, OR and NOT (-) operators are available, and round parentheses group multiple keywords and filters.

If we want to search for tweets that contain both python and developer, we can write q=python%20developer. This searches for tweets containing both keywords. Note that python and developer do not need to appear next to each other.

If we want them together we can write q=%22python%20developer%22 or q="python developer". If we want tweets with either python or developer then we can write q=(python OR developer).

If we want to ignore tweets with certain keywords, we can use a hyphen (-). So, to get tweets with the keyword python or Django but without developer, we can write a query like q=(python OR Django) -developer.

Note: We can use multiple OR and negation in our query. To use multiple negations instead of using -(iPhone OR iMac OR MacBook), use the following: -iPhone -iMac -MacBook.

There can be some ambiguity when combining multiple operators. For example:

  • apple OR iPhone iPad would be evaluated as apple OR (iPhone iPad)

  • iPad iPhone OR android would be evaluated as (iPhone iPad) OR android

To eliminate uncertainty and ensure that your rules are evaluated as intended, group terms together with parentheses where appropriate. For example:

  • (apple OR iPhone) iPad

  • iPhone (iPad OR android)
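To keep grouping explicit, the query string can be assembled from small helpers that always add the parentheses. This is just a sketch; phrase, any_of and exclude are hypothetical names, and the result still needs to be percent-encoded before it goes into a URI.

```python
def phrase(text):
    """Wrap text in double quotes so it is matched as an exact phrase."""
    return f'"{text}"'


def any_of(*terms):
    """Join terms with OR inside parentheses, e.g. (python OR Django)."""
    return "(" + " OR ".join(terms) + ")"


def exclude(*terms):
    """Negate each term separately, e.g. -iPhone -iMac -MacBook."""
    return " ".join(f"-{term}" for term in terms)


print(" ".join([any_of("python", "Django"), exclude("developer")]))
# (python OR Django) -developer
print(exclude("iPhone", "iMac", "MacBook"))
# -iPhone -iMac -MacBook
print(" ".join([any_of("apple", "iPhone"), "iPad"]))
# (apple OR iPhone) iPad
print(phrase("python developer"))
# "python developer"
```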

More Filters

As per other online resources and the official documentation, we can also filter tweets by whether they are replies or retweets and by whether the account is verified.

No   Filter              Explanation
1    filter:retweets     Includes retweets
2    -filter:retweets    Excludes retweets
3    filter:replies      Includes replies
4    -filter:replies     Excludes replies
5    filter:verified     Includes tweets from verified accounts only
6    -filter:verified    Excludes tweets from verified accounts
7    exclude:retweets    Excludes retweets
8    exclude:replies     Excludes replies
9    since:YYYY-MM-DD    Fetches tweets since the given date
10   until:YYYY-MM-DD    Fetches tweets until the given date

To combine these filters with keywords, we use the logical AND and OR operators, which behave as their names suggest. A few examples make this clearer.

  1. Get tweets with the keyword python and exclude retweets: https://api.twitter.com/1.1/search/tweets.json?q=python AND -filter:retweets
    or
    https://api.twitter.com/1.1/search/tweets.json?q=python AND exclude:retweets

  2. Get tweets with the keyword python and exclude replies: https://api.twitter.com/1.1/search/tweets.json?q=python AND -filter:replies
    or
    https://api.twitter.com/1.1/search/tweets.json?q=python AND exclude:replies

  3. Get tweets with the keyword python and exclude retweets and replies: https://api.twitter.com/1.1/search/tweets.json?q=python AND -filter:retweets AND -filter:replies

  4. Get tweets with the keyword "apple iPad" and from a verified account: https://api.twitter.com/1.1/search/tweets.json?q="apple iPad" AND filter:verified
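The same queries can be put together in code. The sketch below appends filters to a base query and percent-encodes the result. Note the assumptions: with_filters is a hypothetical helper, and it joins terms with spaces, relying on the implicit AND described in the Boolean Syntax section instead of writing AND explicitly as the examples above do.

```python
from urllib.parse import quote

SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"


def with_filters(query, exclude_retweets=False, exclude_replies=False, verified_only=False):
    """Append the selected filters to a base query, separated by spaces."""
    parts = [query]
    if exclude_retweets:
        parts.append("-filter:retweets")
    if exclude_replies:
        parts.append("-filter:replies")
    if verified_only:
        parts.append("filter:verified")
    return " ".join(parts)


q = with_filters("python", exclude_retweets=True, exclude_replies=True)
print(q)                                      # python -filter:retweets -filter:replies
print(f"{SEARCH_URL}?q={quote(q, safe='')}")  # spaces become %20, colons become %3A
```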

Some more filters are:

No   Filter            Explanation
1    filter:links      Includes tweets with links
2    -filter:links     Excludes tweets with links
3    filter:images     Includes tweets with images
4    -filter:images    Excludes tweets with images
5    filter:videos     Includes tweets with videos
6    -filter:videos    Excludes tweets with videos
7    from:user         Brings back tweets from the named user
8    to:user           Brings back tweets sent to the named user
9    -has:hashtags     Includes tweets without hashtags
10   has:hashtags      Includes tweets with hashtags (for some reason it no longer works)
Note: For from: and to:, write the username without @.

There are many other filters, and it would not be practical to list them all here. You can find the rest in the official documentation.


One last example: Let's assume we want the 50 most recent tweets that contain the phrase "python developer" and the keyword Django but not flask. We want only tweets and replies (in other words, no retweets). We want full tweets in English (en) from around the Bangalore area, and we do not want the extra entities.

Solution:

https://api.twitter.com/1.1/search/tweets.json?q=("python developer" AND django) -flask AND exclude:retweets&result_type=recent&tweet_mode=extended&lang=en&count=50&geocode=12.97194,77.59369,10mi&include_entities=false
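For a query this long, building the URI from a dictionary is less error-prone than typing it out. The sketch below does that with urllib.parse. It includes result_type=recent and include_entities=false to cover the "most recent" and "no extra entities" requirements stated above; the params dict itself is only an illustration.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"

params = {
    "q": '("python developer" AND django) -flask AND exclude:retweets',
    "result_type": "recent",
    "tweet_mode": "extended",
    "lang": "en",
    "count": 50,
    "geocode": "12.97194,77.59369,10mi",
    "include_entities": "false",
}

# quote_via=quote percent-encodes spaces as %20 instead of "+".
url = f"{SEARCH_URL}?{urlencode(params, quote_via=quote)}"
print(url)

# Decoding the query string again should give back the original q value.
print(parse_qs(urlsplit(url).query)["q"][0] == params["q"])  # True
```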

Conclusion

I have tried to explain the main parameters and filters one can use to fetch tweets with the Twitter API. I wrote this blog to introduce developers to the standard Twitter API; there may be many other query parameters and filters that are not mentioned here.

References

I am grateful for all the resources listed below, which helped me in writing this blog.

  1. How to Master Twitter Search: Basic Boolean Operators and Filters

  2. Stackoverflow

  3. Search Tweets

  4. Rules and filtering: Premium v1.1