
Creating DataFrames in Pandas

This post illustrates how to create pandas DataFrames in 4 easy steps. Its purpose is to introduce you to the world of pandas and its DataFrames. Knowledge of this topic is essential for anyone who would like to venture into data science. The post assumes you understand the basics of Python data structures and Python libraries. I have tried to explain everything as I would to an absolute beginner while keeping it short and simple, so I hope you'll enjoy it.

Pandas is a Python library that data analysts and data scientists use for manipulating data. Data in pandas is usually stored in a DataFrame, a 2-dimensional labeled data structure with rows and columns. Different columns can hold different data types (such as string, int, or float). You can think of a DataFrame as being similar to an Excel table. Here is what an example DataFrame might look like.

Example DataFrame for student performance

The above is a DataFrame with college student performance information. It shows how each student is performing in different classes. Notice how different columns have different data types. The question now is: how do we create such a DataFrame? Follow these steps in your Python editor.

1. DataFrames are not built-in objects in Python, which means we cannot use them until we import them from the pandas library. Let's import DataFrame from the pandas library as shown below:

from pandas import DataFrame

2. To work comfortably with our data, we need to put it into a data structure that's easy to manipulate. Lucky for us, Python has the list data structure. Let's create lists of the data as illustrated below:

names = ['Akhilesh', 'Ruchi', 'Bhawna', 'Isha']
acc_marks = [97, 69, 19, 76]
eng_marks = [36, 85, 72, 68]
mat_marks = [47, 86, 41, 46]
eco_marks = [13, 51, 53, 11]
bus_marks = [34, 53, 40, 22]

3. The DataFrame constructor can take an ndarray (structured or homogeneous), an Iterable, a dict, or another DataFrame as the data to be converted into a DataFrame. For our example, we will use a dict. Let's create a dictionary that maps column names to column data as illustrated below:

student_data_dict = {"Name": names, "Accountancy": acc_marks, "English": eng_marks, "Maths": mat_marks, "Economics": eco_marks, "Business Studies": bus_marks}

4. The next step is to create a DataFrame from this dictionary by passing the dictionary as the first parameter (in our case, the only parameter) to the DataFrame constructor.

student_data_frame = DataFrame(student_data_dict)
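As a short aside on step 3: a dict is only one of the inputs the constructor accepts. The sketch below builds a small table from a list of rows instead, using the constructor's columns parameter to supply the column names. The names rows, rows_frame and dict_frame are made up for this illustration, and only two students and two subjects are used to keep it short.

```python
from pandas import DataFrame

# Each inner list is one row: a student's name followed by two marks.
rows = [
    ["Akhilesh", 97, 36],
    ["Ruchi", 69, 85],
]

# With row-oriented data, the column names are passed separately.
rows_frame = DataFrame(rows, columns=["Name", "Accountancy", "English"])

# The same table built the way this post does it: a dict of column lists.
dict_frame = DataFrame(
    {"Name": ["Akhilesh", "Ruchi"], "Accountancy": [97, 69], "English": [36, 85]}
)

print(rows_frame)
print(dict_frame)
```

Both calls describe the same table. The dict form is convenient when your data is already organised by column, as it is in step 2, while the list-of-rows form suits data that arrives one record at a time.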

If you run the command print(student_data_frame), it should show you the following output:

Student marks DataFrame output
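If you would like to double-check the result rather than only look at it, the four steps can be put together in one script with a few simple checks at the end. This is a minimal sketch: the helper names n_rows, n_cols, column_names and marks_are_integers are my own additions for illustration, and pd.api.types.is_integer_dtype is a pandas helper that reports whether a column holds integers.

```python
import pandas as pd
from pandas import DataFrame

# Step 2: the raw data as plain Python lists
names = ['Akhilesh', 'Ruchi', 'Bhawna', 'Isha']
acc_marks = [97, 69, 19, 76]
eng_marks = [36, 85, 72, 68]
mat_marks = [47, 86, 41, 46]
eco_marks = [13, 51, 53, 11]
bus_marks = [34, 53, 40, 22]

# Step 3: map each column name to its list of values
student_data_dict = {
    "Name": names,
    "Accountancy": acc_marks,
    "English": eng_marks,
    "Maths": mat_marks,
    "Economics": eco_marks,
    "Business Studies": bus_marks,
}

# Step 4: build the DataFrame
student_data_frame = DataFrame(student_data_dict)

# shape is a (rows, columns) tuple
n_rows, n_cols = student_data_frame.shape

# Column labels, in the order the dict keys were given
column_names = list(student_data_frame.columns)

# Every column after "Name" should hold integers
marks_are_integers = all(
    pd.api.types.is_integer_dtype(student_data_frame[col])
    for col in column_names[1:]
)

print(n_rows, n_cols)
print(student_data_frame.dtypes)
```

Printing dtypes is a quick way to see the point made earlier: the Name column holds text while the marks columns hold integers.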

This is how we create DataFrames from scratch using pandas. I recommend this Udacity course to all beginners. Feel free to comment and share this post 🙂
