Upload CSV Data To MongoDB

As developers, we often export data from one system and need to load it into another. I ran into exactly this situation: I had exported data from Notion and wanted to import it into MongoDB Atlas.

To insert data into MongoDB, we will use the pymongo Python package. Install it with the following command:

 pip install pymongo==3.12.3

Note: I ran into issues with a newer version of pymongo, so I strongly suggest using version 3.12.3 if you want to follow along exactly.

I will also use the pandas library, which I find easier to work with than Python's built-in csv module. You can install pandas the same way we installed pymongo.

 pip install pandas

How to read CSV data in Python

As mentioned earlier, we will use the pandas library to read the CSV file. It takes only two lines of code.

import pandas as pd

df = pd.read_csv("CsvFile.csv")  # returns a DataFrame
print(df.head())  # df.head() returns the first 5 rows

After reading the CSV file, we need to convert it into a list of dicts. Why? MongoDB stores data as JSON-like documents, and pymongo accepts Python dicts as documents, so each row of the CSV becomes one dict that we can insert directly.

How to convert pandas dataframe into dict

pandas provides the DataFrame.to_dict() method, which can convert a DataFrame into a list of dictionaries. We have multiple rows, so we need one dictionary per row.

data_list = df.to_dict("records")  # one dict per row, keyed by column name

Here, the "records" argument is important: it is what makes to_dict() return a list with one dict per row. See the official pandas documentation for the other available orientations.
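
To make the shape of the result concrete, here is a minimal sketch that reads a small in-memory CSV instead of a file. The column names and values are made up for illustration; only read_csv() and to_dict("records") come from the steps above.

```python
import io

import pandas as pd

# Hypothetical CSV content, standing in for CsvFile.csv.
csv_text = "name,category\nFigma,design\nPostman,api\n"

df = pd.read_csv(io.StringIO(csv_text))
data_list = df.to_dict("records")

# Each row becomes one dict keyed by the column names.
print(data_list)
```

Each element of data_list corresponds to one document that will be inserted into MongoDB.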

How to insert data to MongoDB

As mentioned earlier, we will use pymongo to connect to MongoDB. You can do that using the following code:

from pymongo import MongoClient

URL = "mongodb://localhost:27017"  # placeholder: replace with your own connection string

with MongoClient(URL) as client:
    db = client.prod  # prod is the database name
    tools = db.tools  # tools is the collection name
    ...

Here we use a context manager (the with statement), so we don't need to close the connection manually once the operations are done. We need the connection URL of our database, whether it runs locally or is hosted somewhere.
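
Because the URL of a hosted database usually contains a username and password, one option is to read it from an environment variable. This is only a sketch: the variable name MONGODB_URL and the helper get_mongo_url() are my own choices for illustration, and the fallback is the standard address of a MongoDB server running locally on the default port.

```python
import os


def get_mongo_url(default="mongodb://localhost:27017"):
    """Return the connection URL from the MONGODB_URL environment variable.

    MONGODB_URL is a hypothetical name chosen for this example. If it is
    not set, fall back to a local server on MongoDB's default port.
    """
    return os.environ.get("MONGODB_URL", default)


URL = get_mongo_url()
```

The resulting URL can then be passed to MongoClient() exactly as before.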

Now that we have a collection, there are two ways to upload data into it: insert the rows one by one, or insert all of them at once. First, I will show how to insert all the data at once.

How to insert multiple rows at once in MongoDB

To insert multiple documents at once, we can use the insert_many() method.

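with MongoClient(URL, connect=False) as client:  # connect=False: connect on the first operation
    db = client.prod
    tools = db.tools
    result = tools.insert_many(data_list)  # data_list is the list of dicts built earlier
    print(len(result.inserted_ids), "documents inserted")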

If you don't need to do any other processing during insertion and have already prepared a list of dicts as we did earlier, this is the way to go.
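
If you want to report progress on a large file, or limit how much is sent in a single call, one variation is to call insert_many() on smaller slices of the list. The sketch below uses a hypothetical helper, insert_in_batches(), with a batch size of 1000 picked arbitrarily; it works with any object that has an insert_many() method, such as the tools collection.

```python
def insert_in_batches(collection, records, batch_size=1000):
    """Insert records in slices of batch_size and return how many were sent."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    sent = 0
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        collection.insert_many(batch)  # one call per batch
        sent += len(batch)
    return sent
```

Inside the with block, this would be called as insert_in_batches(tools, data_list). An empty list results in no calls at all.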

How to insert data row by row in MongoDB

To insert data row by row, we can use the insert_one() method. This is useful when each row needs some processing first. For example, here we add an id to every row.

with MongoClient(URL, connect=False) as client:
    db = client.prod
    tools = db.tools
    for index, data in enumerate(data_list):
        data["id"] = index  # add an id to the existing row data
        tools.insert_one(data)
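
The loop is also a natural place for other per-row cleanup. One common case, sketched below, is that pandas represents empty CSV cells as NaN, which you may prefer to store as null. prepare_record() is a hypothetical helper, not part of pandas or pymongo; it returns a new dict with the id added and NaN values replaced by None, and the sample rows are made up.

```python
import math


def prepare_record(index, record):
    """Return a copy of record with an id added and NaN replaced by None."""
    cleaned = {
        key: None if isinstance(value, float) and math.isnan(value) else value
        for key, value in record.items()
    }
    cleaned["id"] = index
    return cleaned


# Made-up rows, shaped like the output of df.to_dict("records").
rows = [{"name": "Figma", "price": float("nan")}, {"name": "Postman", "price": 12.0}]
prepared = [prepare_record(i, row) for i, row in enumerate(rows)]
print(prepared)
```

In the loop, tools.insert_one(prepare_record(index, data)) would then insert the cleaned copy and leave the original row untouched.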

Conclusion

There are many things we can do with Python and MongoDB, and this was just one example. If you want to learn about MongoDB operations other than insertion, let me know on Twitter or LinkedIn.