Read CSV Files Using Pandas, With Examples
The CSV (Comma Separated Values) format is quite popular for storing data. A large number of datasets are available as CSV files, which can be used directly in spreadsheet software like Excel or loaded in programming languages like R or Python. Pandas dataframes are quite powerful for handling two-dimensional tabular data. In this tutorial, we'll look at how to read a CSV file as a pandas dataframe in Python.
How to read csv files in python using pandas?
The pandas read_csv() function is used to read a CSV file into a dataframe. It comes with a number of different parameters to customize how you'd like to read the file. The following is the general syntax for loading a CSV file to a dataframe:
import pandas as pd

df = pd.read_csv(path_to_file)
Here, path_to_file is the path to the CSV file you want to load. It can be any valid string path or a URL (see the examples below). It returns a pandas dataframe. Let's look at some of the different use-cases of the read_csv() function through examples:
Examples
Before we proceed, let's get a sample CSV file that we'll be using throughout this tutorial. We'll be using the Iris dataset, which you can download from Kaggle. The file has an Id column, four measurement columns (SepalLengthCm, SepalWidthCm, PetalLengthCm, PetalWidthCm) and a Species column.
1. Read CSV from its location on your machine
To read a CSV file stored locally on your machine, pass the path to the file to the read_csv() function. You can pass a relative path, that is, the path with respect to your current working directory, or you can pass an absolute path.
# read csv using relative path
import pandas as pd

df = pd.read_csv('Iris.csv')
print(df.head())
Output:
   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
3   4            4.6           3.1            1.5           0.2  Iris-setosa
4   5            5.0           3.6            1.4           0.2  Iris-setosa
In the above example, the CSV file Iris.csv is loaded from its location using a relative path. Here, the file is present in the current working directory. You can also read a CSV file from its absolute path. See the example below:
# read csv using absolute path
import pandas as pd

df = pd.read_csv(r"C:\Users\piyush\Downloads\Iris.csv")
print(df.head())
Output:
   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
3   4            4.6           3.1            1.5           0.2  Iris-setosa
4   5            5.0           3.6            1.4           0.2  Iris-setosa
Here, the same CSV file is read from its absolute path.
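If you are unsure which directory a relative path is resolved against, pathlib can show the absolute path it maps to. The following is a minimal sketch, not part of the original tutorial: the file name iris_sample.csv and its three columns are hypothetical stand-ins written to a temporary directory so the example runs without the Kaggle download.

```python
import tempfile
from pathlib import Path

import pandas as pd

# A relative path is resolved against the current working directory.
# resolve() shows the absolute path it maps to (the file need not exist).
print(Path("Iris.csv").resolve())

# Hypothetical stand-in file written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "iris_sample.csv"
    csv_path.write_text(
        "Id,SepalLengthCm,Species\n"
        "1,5.1,Iris-setosa\n"
        "2,4.9,Iris-setosa\n"
    )
    abs_path = csv_path.resolve()
    # read_csv accepts pathlib.Path objects as well as plain strings.
    df_path = pd.read_csv(abs_path)

print(abs_path.is_absolute())  # True
print(df_path.shape)           # (2, 3)
```

Using Path objects also avoids having to escape backslashes in Windows paths by hand.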
2. Read CSV from a URL
You can also read a CSV file from its URL. Pass the URL to the read_csv() function and it'll read the corresponding file to a dataframe. The Iris dataset can also be downloaded from the UCI Machine Learning Repository. Let's use their dataset download URL to read it as a dataframe.
import pandas as pd

df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data")
df.head()
Output:
   5.1  3.5  1.4  0.2  Iris-setosa
0  4.9  3.0  1.4  0.2  Iris-setosa
1  4.7  3.2  1.3  0.2  Iris-setosa
2  4.6  3.1  1.5  0.2  Iris-setosa
3  5.0  3.6  1.4  0.2  Iris-setosa
4  5.4  3.9  1.7  0.4  Iris-setosa
You can see that the read_csv() function is able to read a dataset from its URL. It is interesting to note that in this particular data source, we do not have headers. The read_csv() function infers the header by default and here uses the first row of the dataset as the header.
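The effect of header inference can be reproduced without a network connection. This is a small sketch that assumes an in-memory string (via io.StringIO) in place of the downloaded file; the three rows are taken from the output above.

```python
import io

import pandas as pd

# Three headerless rows in the same layout as iris.data.
raw_no_header = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)

# With the default settings, the first row is consumed as column names.
inferred = pd.read_csv(io.StringIO(raw_no_header))
print(list(inferred.columns))  # ['5.1', '3.5', '1.4', '0.2', 'Iris-setosa']
print(len(inferred))           # 2
```

One data row is silently lost to the header, so a quick check of the column names after loading is a cheap way to catch this.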
3. Read a CSV file without a header
In the above example, you saw that if the dataset does not have a header, the read_csv() function infers it by itself and uses the first row of the dataset as the header. You can change this behavior through the header parameter: pass None if your dataset does not have a header. You can also pass a custom list of integers (row numbers) as the header.
import pandas as pd

df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data", header=None)
df.head()
Output:
     0    1    2    3            4
0  5.1  3.5  1.4  0.2  Iris-setosa
1  4.9  3.0  1.4  0.2  Iris-setosa
2  4.7  3.2  1.3  0.2  Iris-setosa
3  4.6  3.1  1.5  0.2  Iris-setosa
4  5.0  3.6  1.4  0.2  Iris-setosa
In the above example, we pass header=None to the read_csv() function since the dataset did not have a header.
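The list-of-integers form of header is easiest to see on a file with more than one header row. The sketch below assumes in-memory data; the two-header-row layout (sepal/petal over length/width) is a hypothetical example, not a file used elsewhere in this tutorial.

```python
import io

import pandas as pd

# header=None: every row is data and columns are numbered from 0.
raw_rows = "5.1,3.5,1.4,0.2,Iris-setosa\n4.9,3.0,1.4,0.2,Iris-setosa\n"
no_header = pd.read_csv(io.StringIO(raw_rows), header=None)
print(list(no_header.columns))  # [0, 1, 2, 3, 4]
print(len(no_header))           # 2

# header=[0, 1]: rows 0 and 1 together form two-level column labels.
# Hypothetical layout with two header rows.
two_level = (
    "sepal,sepal,petal\n"
    "length,width,length\n"
    "5.1,3.5,1.4\n"
    "4.9,3.0,1.4\n"
)
multi = pd.read_csv(io.StringIO(two_level), header=[0, 1])
print(multi.columns.nlevels)
print(multi.shape)
```

With a list passed to header, the resulting columns are a MultiIndex, so individual columns are selected with tuples such as ("sepal", "length").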
4. Read a CSV file and give custom column names
You can give custom column names to your dataframe when reading a CSV file using the read_csv() function. Pass your custom column names as a list to the names parameter.
import pandas as pd

df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data",
                 names=['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'])
print(df.head())
Output:
   SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0            5.1           3.5            1.4           0.2  Iris-setosa
1            4.9           3.0            1.4           0.2  Iris-setosa
2            4.7           3.2            1.3           0.2  Iris-setosa
3            4.6           3.1            1.5           0.2  Iris-setosa
4            5.0           3.6            1.4           0.2  Iris-setosa
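The example above works because iris.data has no header row. If the file already has one, passing names alone keeps the old header as a data row; adding header=0 replaces it instead. The following is a sketch on in-memory data, with hypothetical replacement names chosen only for illustration.

```python
import io

import pandas as pd

# A small file that already has a header row.
raw_with_header = (
    "Id,SepalLengthCm,Species\n"
    "1,5.1,Iris-setosa\n"
    "2,4.9,Iris-setosa\n"
)
new_names = ["id", "sepal_length", "species"]  # hypothetical names

# names only: the original header line is read as the first data row.
as_data = pd.read_csv(io.StringIO(raw_with_header), names=new_names)
print(len(as_data))  # 3

# names plus header=0: the original header line is replaced.
renamed = pd.read_csv(io.StringIO(raw_with_header), header=0, names=new_names)
print(list(renamed.columns))  # ['id', 'sepal_length', 'species']
print(len(renamed))           # 2
```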
5. Read CSV with a column as index
You can also use a column as the row labels of the dataframe. Pass the column name to the index_col parameter. Let's go back to the Iris.csv file we downloaded from Kaggle. Here, we use the Id column as the dataframe index.
# read csv with a column as index
import pandas as pd

df = pd.read_csv('Iris.csv', index_col='Id')
print(df.head())
Output:
    SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
Id
1             5.1           3.5            1.4           0.2  Iris-setosa
2             4.9           3.0            1.4           0.2  Iris-setosa
3             4.7           3.2            1.3           0.2  Iris-setosa
4             4.6           3.1            1.5           0.2  Iris-setosa
5             5.0           3.6            1.4           0.2  Iris-setosa
In the above example, you can see that the Id column is used as the row index of the dataframe df. You can also pass multiple columns as a list to the index_col parameter to be used as the row index.
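Passing a list to index_col produces a multi-level row index. Here is a minimal sketch that assumes a small in-memory sample; the column order and the three rows are illustrative and not the full Kaggle file.

```python
import io

import pandas as pd

# Illustrative sample with two columns that together identify a row.
raw_species = (
    "Species,Id,SepalLengthCm\n"
    "Iris-setosa,1,5.1\n"
    "Iris-setosa,2,4.9\n"
    "Iris-versicolor,51,7.0\n"
)

# Both columns become levels of the row index.
indexed = pd.read_csv(io.StringIO(raw_species), index_col=["Species", "Id"])
print(list(indexed.index.names))  # ['Species', 'Id']

# Rows are then addressed with a (Species, Id) tuple.
print(indexed.loc[("Iris-versicolor", 51), "SepalLengthCm"])  # 7.0
```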
6. Read only a subset of columns of a CSV
You can also specify the subset of columns to read from the dataset. Pass the subset of columns you want as a list to the usecols parameter. For example, let's read all the columns from Iris.csv except Id.
# read only a subset of columns
import pandas as pd

df = pd.read_csv('Iris.csv',
                 usecols=['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'])
print(df.head())
Output:
   SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0            5.1           3.5            1.4           0.2  Iris-setosa
1            4.9           3.0            1.4           0.2  Iris-setosa
2            4.7           3.2            1.3           0.2  Iris-setosa
3            4.6           3.1            1.5           0.2  Iris-setosa
4            5.0           3.6            1.4           0.2  Iris-setosa
In the above example, the returned dataframe does not have an Id
column.
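When the goal is to exclude one column rather than list all the others, usecols also accepts a function that is called with each column name and returns True for the columns to keep. This is a sketch on in-memory data with a shortened, illustrative set of columns.

```python
import io

import pandas as pd

# Shortened sample with the same Id column as Iris.csv.
raw_subset = (
    "Id,SepalLengthCm,Species\n"
    "1,5.1,Iris-setosa\n"
    "2,4.9,Iris-setosa\n"
)

# Keep every column whose name is not 'Id'.
without_id = pd.read_csv(io.StringIO(raw_subset), usecols=lambda name: name != "Id")
print(list(without_id.columns))  # ['SepalLengthCm', 'Species']
```

The callable form keeps working if columns are later added to the file, whereas an explicit list would have to be updated.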
7. Read only the first n rows of a CSV
You can also specify the number of rows of a file to read using the nrows parameter of the read_csv() function. This is particularly useful when you want to read a small segment of a large file.
# read only the first 3 rows
import pandas as pd

df = pd.read_csv('Iris.csv', nrows=3)
print(df.head())
Output:
   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
In the above example, we read only the first three rows of the file Iris.csv.
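To see nrows acting on a larger input, the sketch below generates a 1000-row CSV in memory and reads only a short preview. The Id and Value columns are made up for this illustration.

```python
import io

import pandas as pd

# Build a 1000-row CSV in memory (hypothetical Id and Value columns).
body = "\n".join(f"{i},{i * 0.1:.1f}" for i in range(1, 1001))
raw_large = "Id,Value\n" + body + "\n"

# Only the header and the first five data rows are parsed.
preview = pd.read_csv(io.StringIO(raw_large), nrows=5)
print(len(preview))            # 5
print(preview["Id"].tolist())  # [1, 2, 3, 4, 5]
```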
These are just some of the things you can do when reading a CSV file to a dataframe. Pandas dataframes also provide a number of useful features to manipulate the data once the dataframe has been created.
With this, we come to the end of this tutorial. The code examples and results presented in this tutorial have been implemented in a Jupyter Notebook with a Python (version 3.8.3) kernel and pandas version 1.0.5.
Source: https://datascienceparichay.com/article/read-csv-files-using-pandas-with-examples/