What is a DataFrame?
A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure. In simple terms, it is like a table or spreadsheet that stores data in rows and columns. Each column has its own data type, such as integers, floats, or strings, so different columns can hold different kinds of data.
In Python, DataFrames are a powerful way to store and work with tabular data:
- Rows represent individual data points (observations or records).
- Columns represent variables (attributes or features) of those records.
DataFrames are part of the Pandas library, a popular tool for data analysis.
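As a rough illustration of the row/column picture above, the sketch below builds a tiny DataFrame and inspects its shape and per-column data types. The sample values and the Height column are made up for this example.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    "Name": ["Alice", "Bob"],      # text column
    "Age": [20, 21],               # integer column
    "Height": [1.65, 1.80],        # float column
})

# shape is a (rows, columns) tuple
n_rows, n_cols = df.shape
print(n_rows, n_cols)

# dtypes lists the data type of each column
print(df.dtypes)
```

Each column keeps its own type, which is what "heterogeneous" means here.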
CREATING DATAFRAMES
1. Reading Data into DataFrames:
Install Pandas from a terminal first (this is a shell command, not Python code):
pip install pandas
import pandas as pd
# Reading a CSV file into a DataFrame
df = pd.read_csv('data.csv')
# Displaying the first few rows of the DataFrame
print(df.head())
Output: The first five rows of data.csv are printed (head() returns five rows by default).
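The example above needs a data.csv file on disk. To try read_csv without one, a possible sketch is to pass an in-memory text buffer instead of a file path. The CSV content below is hypothetical and simply stands in for data.csv.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for data.csv
csv_text = """Name,Age,Score
Alice,20,85
Bob,21,90
Charlie,22,88
"""

# read_csv accepts a file path or any file-like object
df_csv = pd.read_csv(io.StringIO(csv_text))

# head(n) returns the first n rows; here only the first two
print(df_csv.head(2))
```

The first line of the text is used as the column names, just as it would be for a file.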
2. Creating a DataFrame from a Dictionary:
Consider an example of students and their scores. Each dictionary key becomes a column name, and each list holds that column's values.
import pandas as pd
# Data in the form of a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [20, 21, 22],
    'Score': [85, 90, 88]
}
# Creating a DataFrame
df = pd.DataFrame(data)
# Printing the DataFrame
print(df)
Output:
      Name  Age  Score
0    Alice   20     85
1      Bob   21     90
2  Charlie   22     88
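3. Creating a DataFrame from a List of Lists:
import pandas as pd
# Each inner list is one row
data = [
    ['Alice', 20, 85],
    ['Bob', 21, 90],
    ['Charlie', 22, 88]
]
# Column names are supplied separately, in the same order as the row values
columns = ['Name', 'Age', 'Score']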
df = pd.DataFrame(data, columns=columns)
print(df)
Output:
      Name  Age  Score
0    Alice   20     85
1      Bob   21     90
2  Charlie   22     88
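Axis in Pandas DataFrame:
In Pandas, axis is a parameter used in many functions to specify the direction along which an operation runs.
- Axis 0 (also written axis='index') runs down the rows. A reduction such as sum(axis=0) returns one result per column, and drop(..., axis=0) removes rows.
- Axis 1 (also written axis='columns') runs across the columns. A reduction such as sum(axis=1) returns one result per row, and drop(..., axis=1) removes columns.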
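A small sketch of how axis behaves in practice, using made-up subject scores. With axis=0 a sum moves down the rows and gives one total per column; with axis=1 it moves across the columns and gives one total per row. The Math and Physics columns are hypothetical.

```python
import pandas as pd

# Hypothetical scores, indexed by student name
scores = pd.DataFrame(
    {"Math": [85, 90, 88], "Physics": [70, 95, 80]},
    index=["Alice", "Bob", "Charlie"],
)

# axis=0: add up the values down each column (one total per subject)
col_totals = scores.sum(axis=0)

# axis=1: add up the values across each row (one total per student)
row_totals = scores.sum(axis=1)

# drop uses axis to decide whether the label names a row or a column
without_bob = scores.drop("Bob", axis=0)
without_physics = scores.drop("Physics", axis=1)

print(col_totals)
print(row_totals)
```

For drop, the axis names where the label lives: axis=0 looks for it among the row labels, axis=1 among the column names.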
Index in DataFrames:
In a DataFrame, the index is the set of labels assigned to the rows. If you do not specify one, Pandas assigns the integers 0, 1, 2, and so on. Index labels identify rows for access, selection, and manipulation of data; they are usually unique, although Pandas does not require this. The index serves as the row label and is used to align data in operations such as filtering, sorting, and joining DataFrames.
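To make the idea of row labels concrete, the sketch below reuses the student data from earlier, looks at the default integer index, and then uses the Name column as the index so that rows can be selected by name. This is one possible way to work with an index, not the only one.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [20, 21, 22],
    "Score": [85, 90, 88],
})

# Without an explicit index, rows are labelled 0, 1, 2, ...
default_labels = list(df.index)

# set_index returns a new DataFrame whose row labels come from a column
by_name = df.set_index("Name")

# .loc selects by label: row "Bob", column "Score"
bob_score = by_name.loc["Bob", "Score"]

print(default_labels)
print(by_name)
print(bob_score)
```

Note that set_index does not modify df; the original DataFrame keeps its integer index.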