Introduction to Pandas

Introduction

Pandas is a Python library for data manipulation and analysis. It is built on top of NumPy and is designed to make working with structured (tabular, multidimensional, potentially heterogeneous) and time series data both easy and intuitive.

Here are some frequently used Pandas functions and methods:

  • read_csv(): This function reads a CSV file into a Pandas DataFrame.
  • to_csv(): This method writes a Pandas DataFrame to a CSV file.
  • read_excel(): This function reads an Excel file into a Pandas DataFrame.
  • to_excel(): This method writes a Pandas DataFrame to an Excel file.
  • DataFrame(): This constructor creates a Pandas DataFrame from a variety of sources, including lists, dictionaries, and NumPy arrays.
  • loc[]: This indexer selects rows and columns from a DataFrame by label.
  • iloc[]: This indexer selects rows and columns from a DataFrame by integer position.
  • head(): This method returns the first n rows of a DataFrame (5 by default).
  • tail(): This method returns the last n rows of a DataFrame (5 by default).
  • describe(): This method summarizes the statistical properties of a DataFrame's columns (count, mean, standard deviation, minimum, quartiles, and maximum for numeric columns).
  • mean(): This method calculates the mean of each column of a DataFrame, or of a single column (Series).
  • median(): This method calculates the median of each column of a DataFrame, or of a single column.
  • mode(): This method returns the most frequent value or values of each column of a DataFrame, or of a single column.
  • plot(): This method draws a chart of a DataFrame or Series (using Matplotlib by default).
  • groupby(): This method groups the rows of a DataFrame by the values in one or more columns.
  • agg(): This method applies one or more aggregation functions, either to each group produced by groupby() or to each column of a DataFrame.
  • merge(): This function combines two DataFrames in a database-style join on one or more common columns.
  • join(): This method combines two DataFrames by matching rows against the index of the other DataFrame (optionally using a column of the calling DataFrame as the key).
  • concat(): This function concatenates two or more DataFrames along an axis (stacking rows by default).
  • apply(): This method applies a function to each column of a DataFrame (or to each row with axis=1).
  • query(): This method filters the rows of a DataFrame using a Boolean expression written as a string.
  • rename(): This method renames columns or index labels in a DataFrame.
  • fillna(): This method fills missing values (NaN) in a DataFrame with a given value.

These are just a few of the many functions and methods available in Pandas. For more information, you can refer to the Pandas documentation: https://pandas.pydata.org/pandas-docs/stable/index.html

Examples

Here are some examples of how to use the functions and methods listed above:

  • read_csv(): This function reads a CSV file into a Pandas DataFrame. For example, to read the file data.csv into a DataFrame, you would use the following code:
import pandas as pd

df = pd.read_csv('data.csv')
  • to_csv(): This method writes a Pandas DataFrame to a CSV file. For example, to write the DataFrame df to the file output.csv, you would use the following code (index=False stops the row index from being written as an extra first column):
df.to_csv('output.csv', index=False)
  • read_excel(): This function reads an Excel file into a Pandas DataFrame; reading .xlsx files requires an Excel engine such as the openpyxl package to be installed. For example, to read the file data.xlsx into a DataFrame, you would use the following code:
import pandas as pd

df = pd.read_excel('data.xlsx')
  • to_excel(): This method writes a Pandas DataFrame to an Excel file; writing .xlsx files requires an Excel engine such as the openpyxl package to be installed. For example, to write the DataFrame df to the file output.xlsx, you would use the following code:
df.to_excel('output.xlsx')
  • DataFrame(): This constructor creates a Pandas DataFrame from a variety of sources, including lists, dictionaries, and NumPy arrays. For example, to create a DataFrame from the list data, you would use the following code (a flat list like this becomes a single column labeled 0):
import pandas as pd

data = [1, 2, 3, 4, 5]

df = pd.DataFrame(data)
  • loc[]: This indexer selects rows and columns from a DataFrame by label. For example, to select the value in the row labeled 0 and the column labeled 'column_name' of the DataFrame df, you would use the following code:
df.loc[0, 'column_name']
  • iloc[]: This indexer selects rows and columns from a DataFrame by integer position, counting from 0. For example, to select the value in the first row and the second column of the DataFrame df, you would use the following code:
df.iloc[0, 1]
  • head(): This method returns the first n rows of a DataFrame (5 by default). For example, to return the first 5 rows of the DataFrame df, you would use the following code:
df.head(5)
  • tail(): This method returns the last n rows of a DataFrame (5 by default). For example, to return the last 5 rows of the DataFrame df, you would use the following code:
df.tail(5)
  • describe(): This method summarizes the statistical properties of a DataFrame: by default, the count, mean, standard deviation, minimum, quartiles, and maximum of each numeric column. For example, to get this summary for the DataFrame df, you would use the following code:
df.describe()
  • mean(): This method calculates the mean of each column of a DataFrame, or of a single column (Series). For example, to calculate the mean of the values in the 'column_name' column of the DataFrame df, you would use the following code:
df['column_name'].mean()
  • median(): This method calculates the median of each column of a DataFrame, or of a single column (Series). For example, to calculate the median of the values in the 'column_name' column of the DataFrame df, you would use the following code:
df['column_name'].median()
  • mode(): This method returns the most frequent value or values. Because several values can tie, the result is a Series rather than a single number. For example, to find the mode of the values in the 'column_name' column of the DataFrame df, you would use the following code:
df['column_name'].mode()
  • plot(): This method draws a chart of a DataFrame or Series, using Matplotlib by default (so Matplotlib must be installed). For example, to draw a line chart of the values in the 'column_name' column of the DataFrame df, you would use the following code:
df['column_name'].plot()
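  • groupby(): This method groups the rows of a DataFrame by the values in one or more columns and returns a GroupBy object, which is usually aggregated next. For example, to group the rows of the DataFrame df by the value in the 'column_name' column, you would use the following code (assigning to a new name so that df itself is not overwritten):
grouped = df.groupby('column_name')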
  • agg(): This method applies one or more aggregation functions, either to each group produced by groupby() or to each column of a DataFrame. For example, to calculate the mean of every other column within each group of the DataFrame df (this assumes those columns are numeric), you would use the following code:
result = df.groupby('column_name').agg('mean')
  • merge(): This function combines two DataFrames in a database-style join on one or more common columns; by default it keeps only the rows whose key appears in both (an inner join, changed with how=). For example, to merge the DataFrames df1 and df2 on the value in the 'column_name' column, you would use the following code:
merged = pd.merge(df1, df2, on='column_name')
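  • join(): This method matches rows against the index of the other DataFrame, not against one of its columns, and there is no pd.join() function. For example, to join df1 and df2 on the 'column_name' column, move that column into df2's index first (if the two DataFrames share any other column names, also pass lsuffix= and rsuffix=):
joined = df1.join(df2.set_index('column_name'), on='column_name')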
  • concat(): This function concatenates two or more DataFrames along an axis, stacking rows by default (axis=1 places them side by side). For example, to stack the rows of df1 and df2, you would use the following code:
pd.concat([df1, df2])
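  • apply(): This method applies a function along an axis of a DataFrame (to each column by default, or to each row with axis=1); on a single column (a Series) it applies the function to the values. For example, to calculate the square root of each value in the 'column_name' column of the DataFrame df, you would use the following code:
import numpy as np
df['column_name'] = df['column_name'].apply(np.sqrt)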
  • query(): This method filters the rows of a DataFrame using a Boolean expression written as a string. For example, to keep only the rows of the DataFrame df where the value in the 'column_name' column is greater than 10, you would use the following code:
df = df.query('column_name > 10')
  • rename(): This method renames columns (or index labels) and returns a new DataFrame. For example, to rename the 'column_name' column to 'new_column_name', you would use the following code:
df = df.rename(columns={'column_name': 'new_column_name'})
  • fillna(): This method fills missing values (NaN) in a DataFrame or Series. For example, to fill the missing values in the 'column_name' column only with the value 0, you would use the following code (df.fillna(0) would instead fill every column):
df['column_name'] = df['column_name'].fillna(0)
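The DataFrame() bullet above mentions lists, dictionaries, and NumPy arrays as sources. The following is a minimal sketch of each, using made-up column names a and b; the first three calls build the same table of values.

```python
import numpy as np
import pandas as pd

# From a dict: each key becomes a column name and each list becomes that column.
from_dict = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# From a list of rows: column names are supplied separately.
from_rows = pd.DataFrame([[1, 3], [2, 4]], columns=["a", "b"])

# From a 2-D NumPy array: one array row per DataFrame row.
from_array = pd.DataFrame(np.array([[1, 3], [2, 4]]), columns=["a", "b"])

# A flat list becomes a single column labeled 0 unless columns= is given.
from_flat = pd.DataFrame([1, 2, 3, 4, 5])

print(from_dict)
```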
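The read_csv() and to_csv() bullets above use files on disk. Both also work with file-like objects, so this sketch uses an in-memory io.StringIO buffer and a small made-up table in place of data.csv. It also shows why index=False is usually passed when writing: with the default index=True, the row labels come back as an extra column named 'Unnamed: 0'.

```python
import io

import pandas as pd

# Made-up CSV content standing in for a file such as data.csv.
csv_text = "name,score\nalice,90\nbob,85\n"

# read_csv accepts a file-like object as well as a path.
df = pd.read_csv(io.StringIO(csv_text))

# Called without a path, to_csv returns the CSV text as a string.
with_index_text = df.to_csv()               # row labels 0, 1 written as the first column
without_index_text = df.to_csv(index=False)

# Reading each string back shows the effect of the index argument.
with_index = pd.read_csv(io.StringIO(with_index_text))
without_index = pd.read_csv(io.StringIO(without_index_text))

print(list(with_index.columns))     # ['Unnamed: 0', 'name', 'score']
print(list(without_index.columns))  # ['name', 'score']
```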
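The loc[] and iloc[] bullets above give the same answer only when row labels match row positions, as with the default 0, 1, 2, ... index. This sketch uses a small made-up table with string row labels so the difference is visible. It also shows that slicing with loc includes the end label while iloc excludes the end position, and that loc accepts a Boolean mask.

```python
import pandas as pd

# Made-up data with string row labels, so labels and positions differ.
df = pd.DataFrame(
    {"city": ["Oslo", "Lima", "Pune"], "temp": [3, 19, 31]},
    index=["a", "b", "c"],
)

# loc uses labels: row "b", column "temp".
by_label = df.loc["b", "temp"]

# iloc uses zero-based positions: second row, second column (also row "b", "temp").
by_position = df.iloc[1, 1]

# loc slices include the end label; iloc slices exclude the end position.
loc_slice = df.loc["a":"b"]    # rows a and b
iloc_slice = df.iloc[0:1]      # row a only

# loc with a Boolean mask filters rows, here keeping only the "city" column.
warm = df.loc[df["temp"] > 10, "city"]

print(by_label, by_position)
print(warm.tolist())
```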
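The head(), tail(), describe(), mean(), median(), and mode() bullets above all assume an existing df. Here is a self-contained sketch with made-up data. Note that describe() covers only the numeric column by default, and that mode() returns a Series because two values tie for most frequent.

```python
import pandas as pd

# Made-up data: one numeric column and one text column.
df = pd.DataFrame({"score": [1, 2, 2, 3, 3, 9], "name": list("abcdef")})

first_two = df.head(2)   # first 2 rows
last_two = df.tail(2)    # last 2 rows

# By default describe() summarizes numeric columns only, so "name" is left out.
stats = df.describe()

# On a single column, mean() and median() return scalars.
mean_score = df["score"].mean()      # 20 / 6
median_score = df["score"].median()  # middle values 2 and 3 average to 2.5

# mode() returns a Series: 2 and 3 both appear twice.
modes = df["score"].mode()

print(stats)
print(modes.tolist())
```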
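The groupby() and agg() bullets above are usually used together. In this sketch, a made-up region column plays the role of 'column_name'. It shows that groupby() on its own returns a GroupBy object, and three common ways to call agg(): one function for every column, a dict mapping columns to functions, and named aggregation.

```python
import pandas as pd

# Made-up sales data; "region" plays the role of 'column_name'.
df = pd.DataFrame(
    {
        "region": ["north", "south", "north", "south", "east"],
        "units": [10, 4, 6, 8, 5],
        "price": [2.0, 3.0, 4.0, 1.0, 5.0],
    }
)

# groupby alone returns a DataFrameGroupBy object, not a new DataFrame.
grouped = df.groupby("region")

# One aggregation for every non-key column; one output row per group (sorted by key).
means = grouped.agg("mean")

# A dict picks a different aggregation for each column.
summary = grouped.agg({"units": "sum", "price": "max"})

# Named aggregation: output_name=(input_column, aggregation).
named = grouped.agg(total_units=("units", "sum"), n_rows=("units", "count"))

print(means)
print(named)
```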
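The merge(), join(), and concat() bullets above combine DataFrames in different ways. This sketch uses two small made-up tables that share a key column (standing in for 'column_name'). The key "c" exists only on the left and "d" only on the right, which makes the effect of each call visible.

```python
import pandas as pd

# Two made-up tables sharing a "key" column.
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["a", "b", "d"], "y": [10, 20, 40]})

# merge on a column; the default how="inner" keeps only keys present in both.
inner = pd.merge(left, right, on="key")

# how="left" keeps every left row; unmatched values become NaN.
left_merged = pd.merge(left, right, on="key", how="left")

# join matches against the other frame's index, so "key" is moved into right's index.
# join defaults to a left join, so "c" is kept with y = NaN.
joined = left.join(right.set_index("key"), on="key")

# concat stacks frames; ignore_index renumbers the rows 0..n-1.
stacked = pd.concat([left, left], ignore_index=True)

print(left_merged)
```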
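The fillna(), apply(), query(), and rename() bullets above are often chained in a small cleaning step. This is a minimal sketch with made-up data, where a column named value stands in for 'column_name'; it fills a missing value in one column only, takes square roots, filters rows, and renames a column.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value; "value" stands in for 'column_name'.
df = pd.DataFrame({"value": [4.0, np.nan, 25.0, 144.0], "label": ["w", "x", "y", "z"]})

# Fill missing values in one column only; other columns are untouched.
df["value"] = df["value"].fillna(0)

# Series.apply applies np.sqrt to the column's values; since np.sqrt is already
# vectorized, np.sqrt(df["value"]) gives the same result.
df["root"] = df["value"].apply(np.sqrt)

# query filters rows with a string expression over column names.
big = df.query("root > 3")

# rename returns a new DataFrame; big itself keeps its original column names.
renamed = big.rename(columns={"value": "new_column_name"})

print(renamed)
```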