Cheat Sheet for Pandas [transl.]

Record some commonly used pandas commands.

How to use pandas to read and write excel

Pandas read and write excel. In addition to CSV files, it is also common to use Excel worksheets to store data in the form of lists. Pandas defines two API functions to handle Excel files: read_excel() and to_excel(). The read_excel() function can read two types of files, Excel 2003 (.xls) and Excel 2007 (.xlsx). This function can read Excel because it integrates the xlrd module.

Read

  1. To read the data in the excel file and convert it into a DataFrame object, just use the read_excel() function, as shown below:
1
2
3
4
import pandas as pd

frame = pd.read_excel("data.xlsx")
print(frame)
  1. As seen above, when reading excel, the DataFrame object returned by default contains the data in the first worksheet. To read the data in the second worksheet, you need to specify the name of the worksheet or the number or index of the worksheet with the second parameter, as shown below:
1
2
3
4
5
6
7
import pandas as pd

frame = pd.read_excel("data.xlsx", "Sheet2")
# Or use the serial number to indicate sheet
frame = pd.read_excel("data.xlsx", 1)
# Note that the sorting of sheets starts from 0, where 1 corresponds to sheet2 above
print(frame)
  1. According to the different arrangements of the original data, sometimes it is not necessary to use the first row as the column name. If you read Excel, there will be no column name, you need to add the parameter: header=None
1
2
3
4
import pandas as pd

frame = pd.read_excel("data.xlsx", 0, header=None)
print(frame)

Write

The above operations are also applicable to Excel write operations, to convert the DataFrame object to Excel, as shown below:

1
2
3
4
5
6
7
8
9
10
import pandas as pd
import numpy as np

frame = pd.DataFrame(
np.random.random((4, 4)),
index=["exp1", "exp2", "exp3", "exp4"],
columns=["Jan", "Fab", "Mar", "Apr"],
)
print(frame)
frame.to_excel("data.xlsx")

A new Excel file containing the data will be generated in the working directory.

Pandas ranks read and write operations

  1. Read the specified single row, the data will be stored in the list
1
2
3
4
5
6
7
import pandas as pd

df = pd.read_excel("data.xlsx")
# This will directly read the first form of this Excel by default
data = df.ix[0].values
# 0 means the first line. The data read here does not include the header, be careful!
print(data)
  1. Read the specified multiple rows, the data will be stored in the nested list
1
2
3
4
5
6
7
import pandas as pd

df = pd.read_excel("data.xlsx")
# This will directly read the first form of this Excel by default
data = df.ix[[1, 2]].values
# If you want to read multiple lines, specify the number of lines in the nested list in ix[]
print(data)
  1. Read the specified rank
1
2
3
4
5
6
7
import pandas as pd

df = pd.read_excel("data.xlsx")
# This will directly read the first form of this Excel by default
data = df.ix[1, 2]
# Read the value of the first row and second column, there is no need for nested lists
print(data)
  1. Read the specified multi-row and multi-column value
1
2
3
4
5
6
7
import pandas as pd

df = pd.read_excel("data.xlsx")
# This will directly read the first form of this Excel by default
data = df.ix[[1, 2], ["red", "green"]].values
# Read the title and data column values ​​in the first row and the second row, where a nested list is required
print(data)
  1. Get the specified column of all rows
1
2
3
4
5
6
7
import pandas as pd

df = pd.read_excel("data.xlsx")
# This will directly read the first form of this Excel by default
data = df.ix[:, ["red", "green"]].values
# Read the title and data column values ​​in the first row and the second row, where a nested list is required
print(data)
  1. Get the line number and print out
1
2
3
4
5
import pandas as pd

df = pd.read_excel("data.xlsx")
# This will directly read the first form of this Excel by default
print(df.index.values)
  1. Get the column name and print out
1
2
3
4
5
import pandas as pd

df = pd.read_excel("data.xlsx")
# This will directly read the first form of this Excel by default
print(df.columns.values)

Pandas and Numpy data format conversion

  1. Convert dataframe to np.array
1
2
3
4
5
6
import pandas as pd

df = pd.read_excel("data.xlsx")
# df-->pandas dataframe data format
df = df.values
# df-->numpy array data format
  1. Convert np.array to dataframe
1
2
3
4
5
6
7
import pandas as pd
import numpy as np

df = np.random.rand(4, 4)
# df-->numpy array data format
df = pd.DataFrame(df)
# df-->pandas dataframe data format

Reference

  1. pandas documentation
  2. pandas select specific rows and columns