Pandas read and write excel. In addition to CSV files, it is also common to use Excel worksheets to store data in the form of lists. Pandas defines two API functions to handle Excel files: read_excel() and to_excel(). The read_excel() function can read two types of files, Excel 2003 (.xls) and Excel 2007 (.xlsx). This function can read Excel because it integrates the xlrd module.
Read
To read the data in the excel file and convert it into a DataFrame object, just use the read_excel() function, as shown below:
1 2 3 4
import pandas as pd
frame = pd.read_excel("data.xlsx") print(frame)
As seen above, when reading excel, the DataFrame object returned by default contains the data in the first worksheet. To read the data in the second worksheet, you need to specify the name of the worksheet or the number or index of the worksheet with the second parameter, as shown below:
1 2 3 4 5 6 7
import pandas as pd
frame = pd.read_excel("data.xlsx", "Sheet2") # Or use the serial number to indicate sheet frame = pd.read_excel("data.xlsx", 1) # Note that the sorting of sheets starts from 0, where 1 corresponds to sheet2 above print(frame)
According to the different arrangements of the original data, sometimes it is not necessary to use the first row as the column name. If you read Excel, there will be no column name, you need to add the parameter: header=None
A new Excel file containing the data will be generated in the working directory.
Pandas ranks read and write operations
Read the specified single row, the data will be stored in the list
1 2 3 4 5 6 7
import pandas as pd
df = pd.read_excel("data.xlsx") # This will directly read the first form of this Excel by default data = df.ix[0].values # 0 means the first line. The data read here does not include the header, be careful! print(data)
Read the specified multiple rows, the data will be stored in the nested list
1 2 3 4 5 6 7
import pandas as pd
df = pd.read_excel("data.xlsx") # This will directly read the first form of this Excel by default data = df.ix[[1, 2]].values # If you want to read multiple lines, specify the number of lines in the nested list in ix[] print(data)
Read the specified rank
1 2 3 4 5 6 7
import pandas as pd
df = pd.read_excel("data.xlsx") # This will directly read the first form of this Excel by default data = df.ix[1, 2] # Read the value of the first row and second column, there is no need for nested lists print(data)
Read the specified multi-row and multi-column value
1 2 3 4 5 6 7
import pandas as pd
df = pd.read_excel("data.xlsx") # This will directly read the first form of this Excel by default data = df.ix[[1, 2], ["red", "green"]].values # Read the title and data column values in the first row and the second row, where a nested list is required print(data)
Get the specified column of all rows
1 2 3 4 5 6 7
import pandas as pd
df = pd.read_excel("data.xlsx") # This will directly read the first form of this Excel by default data = df.ix[:, ["red", "green"]].values # Read the title and data column values in the first row and the second row, where a nested list is required print(data)
Get the line number and print out
1 2 3 4 5
import pandas as pd
df = pd.read_excel("data.xlsx") # This will directly read the first form of this Excel by default print(df.index.values)
Get the column name and print out
1 2 3 4 5
import pandas as pd
df = pd.read_excel("data.xlsx") # This will directly read the first form of this Excel by default print(df.columns.values)
Pandas and Numpy data format conversion
Convert dataframe to np.array
1 2 3 4 5 6
import pandas as pd
df = pd.read_excel("data.xlsx") # df-->pandas dataframe data format df = df.values # df-->numpy array data format
Convert np.array to dataframe
1 2 3 4 5 6 7
import pandas as pd import numpy as np
df = np.random.rand(4, 4) # df-->numpy array data format df = pd.DataFrame(df) # df-->pandas dataframe data format