Read Csv Not as a List With Dictionary Python

The CSV (Comma Separated Values) format is quite popular for storing data. A large number of datasets are available as CSV files, which can be used either directly in spreadsheet software like Excel or loaded in programming languages like R or Python. Pandas dataframes are quite powerful for handling two-dimensional tabular data. In this tutorial, we'll look at how to read a CSV file as a pandas dataframe in Python.

The pandas read_csv() function is used to read a CSV file into a dataframe. It comes with a number of different parameters to customize how you'd like to read the file. The following is the general syntax for loading a CSV file to a dataframe:

    import pandas as pd
    df = pd.read_csv(path_to_file)

Here, path_to_file is the path to the CSV file you want to load. It can be any valid string path or a URL (see the examples below). The function returns a pandas dataframe. Let's look at some of the different use cases of the read_csv() function through examples:

Before we proceed, let's get a sample CSV file that we'll be using throughout this tutorial. We'll be using the Iris dataset, which you can download from Kaggle. Here's a snapshot of how it looks when opened in Excel:

 Iris dataset snapshot in Excel

To read a CSV file stored locally on your machine, pass the path to the file to the read_csv() function. You can pass a relative path, that is, the path with respect to your current working directory, or you can pass an absolute path.

    # read csv using relative path
    import pandas as pd
    df = pd.read_csv('Iris.csv')
    print(df.head())

Output:

       Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
    0   1            5.1           3.5            1.4           0.2  Iris-setosa
    1   2            4.9           3.0            1.4           0.2  Iris-setosa
    2   3            4.7           3.2            1.3           0.2  Iris-setosa
    3   4            4.6           3.1            1.5           0.2  Iris-setosa
    4   5            5.0           3.6            1.4           0.2  Iris-setosa

In the above example, the CSV file Iris.csv is loaded from its location using a relative path. Here, the file is present in the current working directory. You can also read a CSV file from its absolute path. See the example below:

    # read csv using absolute path
    import pandas as pd
    df = pd.read_csv(r"C:\Users\piyush\Downloads\Iris.csv")
    print(df.head())

Output:

       Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
    0   1            5.1           3.5            1.4           0.2  Iris-setosa
    1   2            4.9           3.0            1.4           0.2  Iris-setosa
    2   3            4.7           3.2            1.3           0.2  Iris-setosa
    3   4            4.6           3.1            1.5           0.2  Iris-setosa
    4   5            5.0           3.6            1.4           0.2  Iris-setosa

Here, the same CSV file is read from its absolute path.
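To make the difference between the two kinds of path concrete, here is a minimal sketch. It assumes a small stand-in file named sample.csv (hypothetical, not the full Iris file) written to a temporary directory, and reads it once by absolute path and once by relative path after changing the working directory.

```python
import os
import tempfile
from pathlib import Path

import pandas as pd

# Create a tiny stand-in CSV (hypothetical sample data, not the full Iris file)
tmp_dir = Path(tempfile.mkdtemp())
csv_path = tmp_dir / "sample.csv"
csv_path.write_text(
    "Id,SepalLengthCm,Species\n"
    "1,5.1,Iris-setosa\n"
    "2,4.9,Iris-setosa\n"
)

# Absolute path: works regardless of the current working directory
df_abs = pd.read_csv(csv_path.resolve())

# Relative path: resolved against the current working directory
original_cwd = os.getcwd()
os.chdir(tmp_dir)
try:
    df_rel = pd.read_csv("sample.csv")
finally:
    os.chdir(original_cwd)  # restore the working directory

# Both calls load the same data
print(df_abs.equals(df_rel))
```

If a relative path raises FileNotFoundError, checking os.getcwd() is usually the quickest way to see which directory the path is being resolved against.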

You can also read a CSV file from its URL. Pass the URL to the read_csv() function and it'll read the corresponding file to a dataframe. The Iris dataset can also be downloaded from the UCI Machine Learning Repository. Let's use their dataset download URL to read it as a dataframe.

    import pandas as pd
    df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data")
    df.head()

Output:

       5.1  3.5  1.4  0.2  Iris-setosa
    0  4.9  3.0  1.4  0.2  Iris-setosa
    1  4.7  3.2  1.3  0.2  Iris-setosa
    2  4.6  3.1  1.5  0.2  Iris-setosa
    3  5.0  3.6  1.4  0.2  Iris-setosa
    4  5.4  3.9  1.7  0.4  Iris-setosa

You can see that the read_csv() function is able to read a dataset from its URL. It is interesting to note that in this particular data source, we do not have headers. The read_csv() function infers the header by default and here uses the first row of the dataset as the header.

In the above example, you saw that if the dataset does not have a header, the read_csv() function still infers one and uses the first row of the dataset as the header. You can change this behavior through the header parameter: pass None if your dataset does not have a header. You can also pass a list of integers as the header, in which case those rows together form multi-level column names.

    import pandas as pd
    df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data", header=None)
    df.head()

Output:

         0    1    2    3            4
    0  5.1  3.5  1.4  0.2  Iris-setosa
    1  4.9  3.0  1.4  0.2  Iris-setosa
    2  4.7  3.2  1.3  0.2  Iris-setosa
    3  4.6  3.1  1.5  0.2  Iris-setosa
    4  5.0  3.6  1.4  0.2  Iris-setosa

In the above example, we pass header=None to the read_csv() function since the dataset did not have a header.
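The effect of header=None can be checked without a network connection. The sketch below assumes a three-line, headerless excerpt held in an in-memory string (the variable raw is illustrative) and compares the default behavior with header=None.

```python
from io import StringIO

import pandas as pd

# Three headerless rows, standing in for the start of iris.data
raw = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)

# Default: the first data row is consumed as the header
df_default = pd.read_csv(StringIO(raw))

# header=None: every row is data and columns get integer labels
df_no_header = pd.read_csv(StringIO(raw), header=None)

print(df_default.shape)            # (2, 5)
print(df_no_header.shape)          # (3, 5)
print(list(df_no_header.columns))  # [0, 1, 2, 3, 4]
```

Comparing the two shapes is a quick way to notice that a row of data has been silently absorbed into the column names.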

You can give custom column names to your dataframe when reading a CSV file using the read_csv() function. Pass your custom column names as a list to the names parameter.

    import pandas as pd
    df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data",
                     names=['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'])
    print(df.head())

Output:

       SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
    0            5.1           3.5            1.4           0.2  Iris-setosa
    1            4.9           3.0            1.4           0.2  Iris-setosa
    2            4.7           3.2            1.3           0.2  Iris-setosa
    3            4.6           3.1            1.5           0.2  Iris-setosa
    4            5.0           3.6            1.4           0.2  Iris-setosa
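One related case is a file that already has a header row you want to replace. As a hedged sketch on made-up in-memory data (the short column names a, b and label are hypothetical), passing names alone keeps the old header as a data row, while adding header=0 replaces it.

```python
from io import StringIO

import pandas as pd

# Hypothetical file that already has a (poorly named) header row
raw = (
    "a,b,label\n"
    "5.1,3.5,Iris-setosa\n"
    "4.9,3.0,Iris-setosa\n"
)
new_names = ["SepalLengthCm", "SepalWidthCm", "Species"]

# names only: the original header line is kept as the first data row
df_kept = pd.read_csv(StringIO(raw), names=new_names)

# names with header=0: the original header line is replaced
df_replaced = pd.read_csv(StringIO(raw), names=new_names, header=0)

print(len(df_kept), len(df_replaced))  # 3 2
```

In the first dataframe the numeric columns are also read as strings, because the old header text sits in the same columns as the numbers.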

You can also use a column as the row labels of the dataframe. Pass the column name to the index_col parameter. Going back to the Iris.csv file we downloaded from Kaggle, here we use the Id column as the dataframe index.

    # read csv with a column as index
    import pandas as pd
    df = pd.read_csv('Iris.csv', index_col='Id')
    print(df.head())

Output:

        SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
    Id
    1             5.1           3.5            1.4           0.2  Iris-setosa
    2             4.9           3.0            1.4           0.2  Iris-setosa
    3             4.7           3.2            1.3           0.2  Iris-setosa
    4             4.6           3.1            1.5           0.2  Iris-setosa
    5             5.0           3.6            1.4           0.2  Iris-setosa

In the above example, you can see that the Id column is used as the row index of the dataframe df. You can also pass multiple columns as a list to the index_col parameter to be used as the row index.
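As a rough illustration of the multi-column case, the sketch below uses a small made-up table in which Id restarts for each species (an assumption for the example, not how Iris.csv is laid out) and indexes it by both Species and Id.

```python
from io import StringIO

import pandas as pd

# Hypothetical data: Id is only unique within each species
raw = (
    "Species,Id,SepalLengthCm\n"
    "Iris-setosa,1,5.1\n"
    "Iris-setosa,2,4.9\n"
    "Iris-virginica,1,6.3\n"
)

# A list of column names produces a two-level row index
df = pd.read_csv(StringIO(raw), index_col=["Species", "Id"])

print(df.index.nlevels)  # 2
# Look up a single value by its (Species, Id) pair
print(df.loc[("Iris-virginica", 1), "SepalLengthCm"])  # 6.3
```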

You can also specify the subset of columns to read from the dataset. Pass the subset of columns you want as a list to the usecols parameter. For example, let's read all the columns from Iris.csv except Id.

    # read csv with a subset of columns
    import pandas as pd
    df = pd.read_csv('Iris.csv', usecols=['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'])
    print(df.head())

Output:

       SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
    0            5.1           3.5            1.4           0.2  Iris-setosa
    1            4.9           3.0            1.4           0.2  Iris-setosa
    2            4.7           3.2            1.3           0.2  Iris-setosa
    3            4.6           3.1            1.5           0.2  Iris-setosa
    4            5.0           3.6            1.4           0.2  Iris-setosa

In the above example, the returned dataframe does not have an Id column.
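When the goal is to drop one column rather than list all the others, usecols also accepts a callable that is evaluated against each column name. The following sketch assumes a shortened in-memory version of the file with only four columns.

```python
from io import StringIO

import pandas as pd

# Shortened, hypothetical stand-in for Iris.csv
raw = (
    "Id,SepalLengthCm,SepalWidthCm,Species\n"
    "1,5.1,3.5,Iris-setosa\n"
    "2,4.9,3.0,Iris-setosa\n"
)

# Keep every column whose name is not 'Id'
df = pd.read_csv(StringIO(raw), usecols=lambda name: name != "Id")

print(list(df.columns))  # ['SepalLengthCm', 'SepalWidthCm', 'Species']
```

The callable form avoids having to update a long list of names if the file gains new columns later.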

You can also specify the number of rows of a file to read using the nrows parameter of the read_csv() function. This is particularly useful when you want to read a small segment of a large file.

    # read only the first 3 rows of the csv
    import pandas as pd
    df = pd.read_csv('Iris.csv', nrows=3)
    print(df.head())

Output:

       Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
    0   1            5.1           3.5            1.4           0.2  Iris-setosa
    1   2            4.9           3.0            1.4           0.2  Iris-setosa
    2   3            4.7           3.2            1.3           0.2  Iris-setosa

In the above example, we read only the first 3 rows of the file Iris.csv.
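If the segment you need is not at the top of the file, nrows can be combined with the skiprows parameter. This goes slightly beyond the example above, so treat it as a sketch on generated data (the Id and Value columns are made up): it reads the first three rows, then two rows from further down while keeping the header line.

```python
from io import StringIO

import pandas as pd

# Generate a header plus ten data rows with Id 1..10
lines = ["Id,Value"] + [f"{i},{i * 10}" for i in range(1, 11)]
raw = "\n".join(lines)

# First three data rows
first_three = pd.read_csv(StringIO(raw), nrows=3)

# Skip file lines 1-3 (the first three data rows; line 0 is the header),
# then read the next two rows
middle = pd.read_csv(StringIO(raw), skiprows=range(1, 4), nrows=2)

print(first_three["Id"].tolist())  # [1, 2, 3]
print(middle["Id"].tolist())       # [4, 5]
```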

These are just some of the things you can do when reading a CSV file to a dataframe. Pandas dataframes also provide a number of useful features to manipulate the data once the dataframe has been created.

With this, we come to the end of this tutorial. The code examples and results presented in this tutorial have been implemented in a Jupyter Notebook with a Python (version 3.8.3) kernel and pandas version 1.0.5.



Source: https://datascienceparichay.com/article/read-csv-files-using-pandas-with-examples/
