Pandas: 5 methods you must know

Anjan Parajuli
3 min read · Apr 9, 2024

“Data is the new gold,” they say...

But raw data is never good enough. To turn it into gold, we have to process it and extract the knowledge within it, climbing toward the summit of real wisdom from data. Pandas is the first stride toward that summit.

Pandas, a Python library, offers a myriad of methods for data analysis. Let’s embark on this journey of analyzing and exploring data in the realm of data science. The methods I am about to describe are key to starting out on that journey. Let’s get on with it.

1. info()

info() prints a concise summary of a DataFrame: each column’s name, its data type, and its count of non-null values, along with memory usage. The first step in analyzing data is to scrutinize it, so info() helps you decide what kind of preprocessing the data needs. Let’s get our hands dirty:

Analyzing data types and count of null fields
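A minimal sketch of inspecting a dataset with info(). The DataFrame and its column names are made up for illustration; they are not from any real dataset.

```python
import pandas as pd

# Hypothetical toy data; None marks a missing value.
df = pd.DataFrame({
    "name": ["Asha", "Bikash", "Chandra", None],
    "age": [25, 31, None, 42],
    "city": ["Kathmandu", "Pokhara", "Lalitpur", "Biratnagar"],
})

# Prints column names, non-null counts, dtypes, and memory usage.
df.info()

# info() reports non-null counts; isna().sum() gives the null counts directly.
null_counts = df.isna().sum()
print(null_counts)
```

Because the "age" column holds a missing value, pandas stores it as a floating-point column rather than an integer one, which is exactly the kind of detail info() brings to light before preprocessing.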

2. describe()

Central tendency describes the typical value around which the data cluster, and dispersion measures how far the data spread from that central value. The describe() method provides these descriptive statistics in one call: by default it reports the count, mean, standard deviation, minimum, quartiles, and maximum of each numeric column.

Descriptive statistics using the describe method
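A short sketch of describe() on a small, made-up numeric DataFrame (the column names "score" and "hours" are assumptions for illustration):

```python
import pandas as pd

# Hypothetical numeric data.
df = pd.DataFrame({
    "score": [10, 20, 30, 40, 50],
    "hours": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Returns a DataFrame indexed by statistic name:
# count, mean, std, min, 25%, 50%, 75%, max.
stats = df.describe()
print(stats)

# Individual statistics can be looked up by row label and column name.
print(stats.loc["mean", "score"])
```

The mean and the median (the "50%" row) capture central tendency, while std and the quartiles describe dispersion.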

3. read_csv and to_csv

Real-world data, especially structured data, often arrives in CSV (comma-separated values) format. To import and read such files, or to create and save them, two functions are essential for exploratory data analysis with pandas.

a. read_csv

This function reads data in CSV format into a DataFrame, the two-dimensional tabular structure in pandas.

b. to_csv

This DataFrame method exports data from a pandas DataFrame to a CSV file on our local machine.

Exporting with index=False keeps the index out of the file. Otherwise the index is written as an extra unnamed column, which shows up as a redundant column when the data is imported again in the next step.
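The export-then-import round trip can be sketched as follows. The file name sample.csv and the toy data are hypothetical, and a temporary directory is used so nothing is left behind on disk.

```python
import os
import tempfile

import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")  # hypothetical file name

    # Export without the index, then read the file back.
    df.to_csv(path, index=False)
    restored = pd.read_csv(path)

    # For comparison: export with the default index=True.
    df.to_csv(path)
    with_index = pd.read_csv(path)

print(restored.columns.tolist())
print(with_index.columns.tolist())
```

With the default settings the second read picks up the old index as an extra column named "Unnamed: 0", which is the redundancy that index=False avoids.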

4. loc and iloc

While exploring data, fetching the exact data point you need can be a challenge, especially when the index has been modified. To handle this, pandas provides two useful indexers:

a. loc

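loc selects rows by index label. For example, df.loc[3] returns the row whose index label is 3, wherever that row sits.

If the index has been reordered so that label 3 is on the second row, df.loc[3] returns that second row.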
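A minimal sketch of label-based selection, assuming a made-up DataFrame whose index is deliberately out of order so that label 3 sits on the second row:

```python
import pandas as pd

# Hypothetical data with a non-sequential index.
df = pd.DataFrame(
    {"value": ["a", "b", "c", "d", "e"]},
    index=[10, 3, 7, 1, 5],
)

# loc looks up the row by its index label, not its position.
row = df.loc[3]
print(row["value"])  # b
```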

b. iloc

In contrast to loc, iloc selects rows by integer position and ignores the index labels.

Positions start at 0, so df.iloc[3] returns the fourth row, whatever its index label is.
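A matching sketch of position-based selection, using a made-up DataFrame with a deliberately out-of-order index (illustrative data only):

```python
import pandas as pd

# Hypothetical data with a non-sequential index.
df = pd.DataFrame(
    {"value": ["a", "b", "c", "d", "e"]},
    index=[10, 3, 7, 1, 5],
)

# iloc counts positions from 0, so position 3 is the fourth row.
row_by_position = df.iloc[3]
print(row_by_position["value"])  # d
print(row_by_position.name)      # 1 (the index label of that row)
```

On this data, df.loc[3] and df.iloc[3] return different rows, which is why it matters to pick the indexer that matches what you mean.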

5. crosstab

The crosstab function quickly builds a frequency table that shows how often each combination of values from two columns occurs.

Given a DataFrame with columns named “B” and “Category”, crosstab counts how many times each value of B occurs in each category. This can come in handy with real-world data.
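A small sketch of crosstab on columns named "B" and "Category". The values in both columns are invented for illustration.

```python
import pandas as pd

# Hypothetical data: six rows, two categorical columns.
df = pd.DataFrame({
    "B": ["x", "y", "x", "y", "x", "x"],
    "Category": ["cat1", "cat1", "cat2", "cat2", "cat1", "cat2"],
})
print(df)

# Rows are the unique values of B, columns are the unique categories,
# and each cell counts how often that pair occurs together.
table = pd.crosstab(df["B"], df["Category"])
print(table)
```

Here the value "x" appears twice in each category and "y" appears once in each, so the cells of the table add up to the six rows of the DataFrame.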

Whoooo!!!
That’s all for now. Grasping these five important methods is a key step toward exploratory data analysis with pandas and toward getting the real gold out of lumps of data.

Thank you. And see you again.👌
