
How do I print specific rows and columns in Python?

I am having trouble printing certain values within my CSV.

I have a file that has 9 columns

Record_id   Month   Day   Year  Location_id   Animal_id  Sex   Length   Weight

and over 1000 rows.

I want to print the Month, Day, and Year columns for the rows where the year is equal to 2002.

Because I have a lot of data I decided to only work with the first 5 rows where year is equal to 2002.

This is my code:

data.df.iloc[0:5, 1:4]

With this I can print the first 5 rows and the 3 columns I want. However, I can't figure out how to filter the rows so that the year is 2002.

asked Apr 29, 2018 at 15:04

You can start by getting all the rows where Year is equal to 2002 with

filtered_data = df[df["Year"]==2002]

Then you can apply your code to the filtered DataFrame to get only the first five rows and the three selected columns with

filtered_data.iloc[0:5, 1:4]
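Putting the two steps together, a minimal sketch could look like the following. The DataFrame here is made-up stand-in data with the nine columns from the question, not the asker's actual file, and selecting the columns by name with .loc is shown only as an alternative to the positional 1:4 slice.

```python
import pandas as pd

# Made-up stand-in data with the nine columns from the question.
df = pd.DataFrame({
    "Record_id": [1, 2, 3, 4, 5, 6, 7, 8],
    "Month": [7, 7, 1, 1, 2, 3, 3, 4],
    "Day": [16, 16, 12, 12, 9, 13, 13, 17],
    "Year": [2001, 2001, 2002, 2002, 2002, 2002, 2002, 2002],
    "Location_id": [2, 3, 2, 7, 3, 1, 1, 6],
    "Animal_id": ["NL", "NL", "DM", "DM", "PF", "PE", "DM", "DM"],
    "Sex": ["M", "M", "F", "M", "M", "M", "F", "F"],
    "Length": [32, 33, 37, 36, 35, 14, 15, 37],
    "Weight": [40.0, 45.0, 41.0, 43.0, 8.0, 20.0, 44.0, 39.0],
})

# Keep only the rows where Year equals 2002.
filtered_data = df[df["Year"] == 2002]

# First five matching rows, columns at positions 1 to 3 (Month, Day, Year).
by_position = filtered_data.iloc[0:5, 1:4]

# Same selection by column name, which does not depend on column order.
by_name = filtered_data.loc[:, ["Month", "Day", "Year"]].head(5)

print(by_position)
```

Filtering before slicing matters here: applying iloc[0:5] first would keep the first five rows of the whole file, which may not contain any 2002 records.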

answered Apr 30, 2018 at 9:26

Last update on August 19 2022 21:51:42 (UTC/GMT +8 hours)

Pandas: DataFrame Exercise-6 with Solution

Write a Pandas program to select the specified columns and rows from a given DataFrame.
Select 'name' and 'score' columns in rows 1, 3, 5, 6 from the following data frame.

Sample DataFrame:
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

Sample Solution :

Python Code :

import pandas as pd
import numpy as np

exam_data  = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
        'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
        'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

df = pd.DataFrame(exam_data, index=labels)
print("Select specific columns and rows:")
# Rows at positions 1, 3, 5, 6 and columns at positions 0 and 1 ('name' and 'score')
print(df.iloc[[1, 3, 5, 6], [0, 1]])

Sample Output:

Select specific columns and rows:
      name  score
b     Dima    9.0
d    James    NaN
f  Michael   20.0
g  Matthew   14.5



In this blog post, I will show you how to select subsets of data in Pandas using [ ], .loc, .iloc, .at, and .iat. I will be using the wine quality dataset hosted on the UCI website. This dataset records 11 chemical properties (such as the concentrations of sugar, citric acid, alcohol, pH, etc.) of thousands of red and white wines from northern Portugal, as well as the quality of the wines, recorded on a scale from 0 to 10. We will only look at the data for red wine.

First, I import the Pandas library, and read the dataset into a DataFrame.
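A minimal sketch of this step is shown below. To keep it runnable without a download, it reads from an in-memory string that stands in for the file; the three data rows are illustrative only, laid out in the semicolon-separated format that the UCI red wine CSV uses. With the real file, you would pass its path or URL to read_csv instead of sample_csv.

```python
import io

import pandas as pd

# Stand-in for the downloaded file: illustrative rows only, in the
# semicolon-separated layout of the UCI red wine CSV.
sample_csv = io.StringIO(
    '"fixed acidity";"volatile acidity";"citric acid";"residual sugar";'
    '"chlorides";"free sulfur dioxide";"total sulfur dioxide";"density";'
    '"pH";"sulphates";"alcohol";"quality"\n'
    "7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5\n"
    "7.8;0.88;0;2.6;0.098;25;67;0.9968;3.2;0.68;9.8;5\n"
    "11.2;0.28;0.56;1.9;0.075;17;60;0.998;3.16;0.58;9.8;6\n"
)

# sep=";" is needed because the fields are not comma-separated.
wine_df = pd.read_csv(sample_csv, sep=";")

print(wine_df.head())
```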

Here are the first 5 rows of the DataFrame:

I rename the columns to make it easier for me to refer to the column names in later operations.

wine_df.columns = ['fixed_acidity', 'volatile_acidity', 'citric_acid', 
                   'residual_sugar', 'chlorides', 'free_sulfur_dioxide', 
                   'total_sulfur_dioxide','density','pH','sulphates', 
                   'alcohol', 'quality']

Different Ways to Select Columns

Selecting a single column

To select the first column 'fixed_acidity', you can pass the column name as a string to the indexing operator.

You can perform the same task using the dot operator.
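Both forms might look like this; the small wine_df below is a stand-in with illustrative values, not the full dataset.

```python
import pandas as pd

# Tiny stand-in for wine_df (values are illustrative).
wine_df = pd.DataFrame({
    "fixed_acidity": [7.4, 7.8, 11.2],
    "volatile_acidity": [0.70, 0.88, 0.28],
    "quality": [5, 5, 6],
})

# Indexing operator: a single string returns a Series.
by_bracket = wine_df["fixed_acidity"]

# Dot operator: works only when the name is a valid Python identifier
# and does not clash with a DataFrame attribute or method.
by_dot = wine_df.fixed_acidity

print(type(by_bracket))
```

The indexing operator is the safer default, since it also works for names that contain spaces or that collide with method names.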

Selecting multiple columns

To select multiple columns, you can pass a list of column names to the indexing operator.

wine_four = wine_df[['fixed_acidity', 'volatile_acidity','citric_acid', 'residual_sugar']]

Alternatively, you can assign all your columns to a list variable and pass that variable to the indexing operator.

cols = ['fixed_acidity', 'volatile_acidity','citric_acid', 'residual_sugar']
wine_list_four = wine_df[cols]

Selecting columns using "select_dtypes" and "filter" methods

To select columns using the select_dtypes method, you should first find out the number of columns of each data type.

In this example, there are 11 columns that are float and one column that is an integer. To select only the float columns, use wine_df.select_dtypes(include=['float']). The select_dtypes method takes a list of data types in its include parameter. The list values can be strings or Python objects.
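As a rough sketch of both steps, assuming a three-column stand-in for wine_df, dtypes.value_counts() is one way to count the columns per data type before calling select_dtypes:

```python
import pandas as pd

# Tiny stand-in for wine_df (values are illustrative).
wine_df = pd.DataFrame({
    "fixed_acidity": [7.4, 7.8, 11.2],
    "pH": [3.51, 3.20, 3.16],
    "quality": [5, 5, 6],
})

# Count how many columns there are of each dtype.
dtype_counts = wine_df.dtypes.value_counts()
print(dtype_counts)

# Keep only the floating-point columns.
float_df = wine_df.select_dtypes(include=["float"])
print(float_df.columns.tolist())
```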

You can also use the filter method to select columns based on the column names or index labels.
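A small sketch of the like parameter, using a stand-in wine_df with illustrative values:

```python
import pandas as pd

# Tiny stand-in for wine_df (values are illustrative).
wine_df = pd.DataFrame({
    "fixed_acidity": [7.4, 7.8],
    "volatile_acidity": [0.70, 0.88],
    "citric_acid": [0.00, 0.04],
    "pH": [3.51, 3.20],
    "quality": [5, 5],
})

# like="acid" keeps every column whose name contains the substring "acid".
acid_df = wine_df.filter(like="acid")
print(acid_df.columns.tolist())
```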

In the above example, the filter method returns the columns whose names contain the string 'acid'. The like parameter takes a string as input and returns every column whose name contains that string.

You can use regular expressions with the regex parameter in the filter method.
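A sketch of the regex parameter follows. The new names pH_1 and quality_2 are hypothetical, chosen only so that some column names contain a digit.

```python
import pandas as pd

# Tiny stand-in for wine_df (values are illustrative).
wine_df = pd.DataFrame({
    "fixed_acidity": [7.4, 7.8],
    "pH": [3.51, 3.20],
    "quality": [5, 5],
})

# Hypothetical new names that contain a digit.
renamed_df = wine_df.rename(columns={"pH": "pH_1", "quality": "quality_2"})

# regex=r"\d" keeps every column whose name contains at least one digit.
digit_df = renamed_df.filter(regex=r"\d")
print(digit_df.columns.tolist())
```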

Here, I first rename the pH and quality columns. Then, I pass the regex parameter to the filter method to find all the columns whose names contain a number.

Changing the Order of Columns

I would like to change the order of my columns.

wine_df.columns shows all the column names. I organize the names of my columns into three list variables, and concatenate all these variables to get the final column order.

I use Python's built-in set type to check that new_cols contains all the columns from the original.

Then, I pass the new_cols variable to the indexing operator and store the resulting DataFrame in a variable "wine_df_2" . Now, the wine_df_2 DataFrame has the columns in the order that I wanted.
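The reordering steps above can be sketched as follows. The stand-in wine_df has only six columns, and the grouping into three lists is a hypothetical choice made for illustration.

```python
import pandas as pd

# Tiny stand-in for wine_df (values are illustrative).
wine_df = pd.DataFrame({
    "fixed_acidity": [7.4, 7.8],
    "citric_acid": [0.00, 0.04],
    "free_sulfur_dioxide": [11.0, 15.0],
    "total_sulfur_dioxide": [34.0, 54.0],
    "alcohol": [9.4, 9.8],
    "quality": [5, 5],
})

# Hypothetical grouping of the column names into three lists.
acid_cols = ["fixed_acidity", "citric_acid"]
sulfur_cols = ["free_sulfur_dioxide", "total_sulfur_dioxide"]
other_cols = ["alcohol", "quality"]

# Concatenate the lists to get the final column order.
new_cols = other_cols + acid_cols + sulfur_cols

# A set comparison confirms that no column was dropped or misspelled.
assert set(wine_df.columns) == set(new_cols)

wine_df_2 = wine_df[new_cols]
print(wine_df_2.columns.tolist())
```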

Different Ways to Select Rows

Selecting rows using .iloc and loc

Now, let's see how to use .iloc and .loc for selecting rows from our DataFrame. To illustrate this better, I drop the rows with duplicate values in the 'density' column and set 'density' as the index of the wine_df DataFrame.

To select the third row in wine_df DataFrame, I pass number 2 to the .iloc indexer.

To do the same thing, I use the .loc indexer.
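A sketch of these steps on a six-row stand-in (illustrative values, with one repeated density) might look like this:

```python
import pandas as pd

# Tiny stand-in for wine_df; the density 0.9978 appears twice.
wine_df = pd.DataFrame({
    "fixed_acidity": [7.4, 7.8, 7.8, 11.2, 7.4, 7.9],
    "alcohol": [9.4, 9.8, 9.8, 9.8, 9.4, 9.4],
    "density": [0.9978, 0.9968, 0.9970, 0.9980, 0.9978, 0.9964],
})

# Drop rows that repeat an earlier density value, then index by density.
wine_df = wine_df.drop_duplicates(subset="density").set_index("density")

# Third row by integer position (positions start at 0).
third_by_position = wine_df.iloc[2]

# The same row by its index label.
third_by_label = wine_df.loc[0.9970]

print(third_by_position)
```

Removing the duplicates first keeps the index unique, so that .loc with a single label returns one row rather than several.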

To select rows with different index positions, I pass a list to the .iloc indexer.

I pass a list of density values to the .loc indexer to reproduce the above DataFrame.
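For instance, on a density-indexed stand-in with illustrative values, the two list-based selections could be written as:

```python
import pandas as pd

# Stand-in for the density-indexed wine_df (values are illustrative).
wine_df = pd.DataFrame(
    {
        "fixed_acidity": [7.4, 7.8, 7.8, 11.2, 7.9],
        "alcohol": [9.4, 9.8, 9.8, 9.8, 9.4],
    },
    index=pd.Index([0.9978, 0.9968, 0.9970, 0.9980, 0.9964], name="density"),
)

# Rows at integer positions 0, 2 and 4.
rows_by_position = wine_df.iloc[[0, 2, 4]]

# The same rows, requested by their density labels.
rows_by_label = wine_df.loc[[0.9978, 0.9970, 0.9964]]

print(rows_by_label)
```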

You can use slicing to select multiple rows. This is similar to slicing a list in Python.
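A positional slice might look like the sketch below, again on a density-indexed stand-in with illustrative values. As with Python lists, the end position is excluded.

```python
import pandas as pd

# Stand-in for the density-indexed wine_df (values are illustrative).
wine_df = pd.DataFrame(
    {"alcohol": [9.4, 9.8, 9.8, 9.8, 9.4, 10.0]},
    index=pd.Index(
        [0.9978, 0.9968, 0.9970, 0.9980, 0.9964, 0.9959], name="density"
    ),
)

# Positions 2, 3 and 4; position 5 is not included.
sliced = wine_df.iloc[2:5]
print(sliced)
```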

The above operation selects rows 2, 3 and 4.

You can perform the same thing using loc.
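A label-based slice could be sketched as follows, on a stand-in with illustrative values. Unlike .iloc, a .loc slice includes the end label. Because this index is not sorted, both labels must actually be present in the index.

```python
import pandas as pd

# Stand-in for the density-indexed wine_df (values are illustrative).
wine_df = pd.DataFrame(
    {"alcohol": [9.4, 9.8, 9.8, 9.8, 9.4, 10.0]},
    index=pd.Index(
        [0.9978, 0.9968, 0.9970, 0.9980, 0.9964, 0.9959], name="density"
    ),
)

# Every row from the label 0.9970 through the label 0.9959, inclusive.
sliced_by_label = wine_df.loc[0.9970:0.9959]
print(sliced_by_label)
```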

Here, I am selecting the rows between the indexes 0.9970 and 0.9959.

Selecting Rows and Columns Simultaneously

You have to pass selections for both rows and columns inside the .iloc and .loc indexers to select rows and columns simultaneously. The row and column selections may be scalar values, lists, slice objects or boolean arrays.

Select all the rows, and 4th, 5th and 7th column:

To replicate the above DataFrame, pass the column names as a list to the .loc indexer:
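Both selections might be written as below. The stand-in wine_df uses the twelve renamed columns with three illustrative rows and, as a simplifying assumption, a default integer index.

```python
import pandas as pd

cols = ['fixed_acidity', 'volatile_acidity', 'citric_acid',
        'residual_sugar', 'chlorides', 'free_sulfur_dioxide',
        'total_sulfur_dioxide', 'density', 'pH', 'sulphates',
        'alcohol', 'quality']

# Illustrative rows only.
rows = [
    [7.4, 0.70, 0.00, 1.9, 0.076, 11.0, 34.0, 0.9978, 3.51, 0.56, 9.4, 5],
    [7.8, 0.88, 0.00, 2.6, 0.098, 25.0, 67.0, 0.9968, 3.20, 0.68, 9.8, 5],
    [11.2, 0.28, 0.56, 1.9, 0.075, 17.0, 60.0, 0.9980, 3.16, 0.58, 9.8, 6],
]
wine_df = pd.DataFrame(rows, columns=cols)

# All rows; 4th, 5th and 7th columns are at positions 3, 4 and 6.
by_position = wine_df.iloc[:, [3, 4, 6]]

# The same columns requested by name.
by_label = wine_df.loc[:, ['residual_sugar', 'chlorides', 'total_sulfur_dioxide']]

print(by_label)
```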

Selecting disjointed rows and columns

To select a particular number of rows and columns, you can do the following using .iloc.

To select a particular number of rows and columns, you can do the following using .loc.

To select a single value from the DataFrame, you can do the following.

You can use slicing to select a particular column.
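The four selections described above can be sketched on a small stand-in (illustrative values, default integer index). The specific rows and columns picked here are arbitrary examples.

```python
import pandas as pd

# Tiny stand-in for wine_df (values are illustrative).
wine_df = pd.DataFrame({
    "fixed_acidity": [7.4, 7.8, 11.2],
    "volatile_acidity": [0.70, 0.88, 0.28],
    "citric_acid": [0.00, 0.04, 0.56],
    "residual_sugar": [1.9, 2.6, 1.9],
})

# Disjoint rows and columns by integer position.
subset_by_position = wine_df.iloc[[0, 2], [0, 3]]

# The same rows and columns by label.
subset_by_label = wine_df.loc[[0, 2], ["fixed_acidity", "residual_sugar"]]

# A single value: row position 1, column position 2.
single_value = wine_df.iloc[1, 2]

# A slice of rows from one column; the result is a Series.
column_slice = wine_df.iloc[0:2, 3]

print(subset_by_label)
```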

To select rows and columns simultaneously, you need to understand the role of the comma in the square brackets. The selection to the left of the comma always selects rows based on the row index, and the selection to the right of the comma always selects columns based on the column index.

If you want to select a set of rows and all the columns, you don't need to use a colon following a comma.

Selecting rows and columns using "get_loc" and "index" methods
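A sketch of this approach is shown below, on a density-indexed stand-in with illustrative values. Computing col_end as the position of 'volatile_acidity' plus 2 is one reading of the description; because an .iloc slice excludes its end position, this selects two columns.

```python
import pandas as pd

# Stand-in for the density-indexed wine_df (values are illustrative).
wine_df = pd.DataFrame(
    {
        "fixed_acidity": [7.4, 7.8, 7.8, 11.2, 7.9],
        "volatile_acidity": [0.70, 0.88, 0.76, 0.28, 0.60],
        "citric_acid": [0.00, 0.00, 0.04, 0.56, 0.06],
        "residual_sugar": [1.9, 2.6, 2.3, 1.9, 1.6],
    },
    index=pd.Index([0.9978, 0.9968, 0.9970, 0.9980, 0.9964], name="density"),
)

# Integer position of a column, looked up by name.
col_start = wine_df.columns.get_loc("volatile_acidity")
col_end = wine_df.columns.get_loc("volatile_acidity") + 2

# First 4 rows, columns from col_start up to (not including) col_end.
subset = wine_df.iloc[:4, col_start:col_end]

# get_loc also works on the row index: label in, integer position out.
row_position = wine_df.index.get_loc(0.9970)

print(subset)
```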

In the above example, I use the get_loc method to find the integer position of the column 'volatile_acidity' and assign it to the variable col_start. I use get_loc again to find the integer position of the column two positions after 'volatile_acidity' and assign it to the variable col_end. I then use the .iloc indexer to select the first 4 rows and the columns from col_start to col_end. If you pass an index label to the get_loc method, it returns its integer location.

You can perform a very similar operation using .loc. The following shows how to select the rows from 3 to 7, along with columns "volatile_acidity" to "chlorides".
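One way to write this, assuming "rows from 3 to 7" means the rows at integer positions 3 through 7, is to look up their labels first and then slice with .loc. The stand-in data below is illustrative.

```python
import pandas as pd

# Stand-in for the density-indexed wine_df (values are illustrative).
wine_df = pd.DataFrame(
    {
        "fixed_acidity": [7.4, 7.8, 7.8, 11.2, 7.9, 7.3, 7.5, 6.7],
        "volatile_acidity": [0.70, 0.88, 0.76, 0.28, 0.60, 0.65, 0.50, 0.58],
        "citric_acid": [0.00, 0.00, 0.04, 0.56, 0.06, 0.00, 0.36, 0.08],
        "residual_sugar": [1.9, 2.6, 2.3, 1.9, 1.6, 1.2, 6.1, 1.8],
        "chlorides": [0.076, 0.098, 0.092, 0.075, 0.069, 0.065, 0.071, 0.097],
        "alcohol": [9.4, 9.8, 9.8, 9.8, 9.4, 10.0, 10.5, 9.2],
    },
    index=pd.Index(
        [0.9978, 0.9968, 0.9970, 0.9980, 0.9964, 0.9946, 0.9959, 0.9943],
        name="density",
    ),
)

# Turn integer positions into index labels.
row_start = wine_df.index[3]
row_end = wine_df.index[7]

# Label slices include both ends, for rows and for columns.
subset = wine_df.loc[row_start:row_end, "volatile_acidity":"chlorides"]
print(subset)
```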

Subselection using .iat and at

The .iat and .at indexers are faster than .iloc and .loc for selecting a single element from a DataFrame.
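A short sketch of both indexers, using a stand-in wine_df with illustrative values and a default integer index:

```python
import pandas as pd

# Tiny stand-in for wine_df (values are illustrative).
wine_df = pd.DataFrame({
    "fixed_acidity": [7.4, 7.8, 11.2],
    "volatile_acidity": [0.70, 0.88, 0.28],
    "citric_acid": [0.00, 0.04, 0.56],
})

# .iat takes integer positions: row 1, column 2.
value_iat = wine_df.iat[1, 2]

# .at takes labels: row label 1, column label "citric_acid".
value_at = wine_df.at[1, "citric_acid"]

print(value_iat, value_at)
```

Both accept only scalar row and column arguments, which is what lets them skip the extra work .iloc and .loc do to handle lists and slices.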

I will be writing more tutorials on manipulating data using Pandas. Stay Tuned!


How do I display specific rows and columns in Python?

Method 1: Using iloc[ ]. Example: Suppose you have a pandas DataFrame and you want to select a specific row given its index.

iloc[ ] is used to select rows/columns by their integer positions.

loc[ ] is used to select rows/columns by their labels.

[ ] is used to select columns by their respective names.

How do you select only certain rows in Python?

You can use one of the following methods to select rows in a pandas DataFrame based on column values:

Method 1: Select Rows where Column is Equal to Specific Value: df.loc[df['col1'] == value]

Method 2: Select Rows where Column Value is in List of Values: df. ...

Method 3: Select Rows Based on Multiple Column Conditions: df. ...
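The three methods could be sketched as follows. The expressions for Methods 2 and 3 are truncated in the text above, so the isin call and the combined condition here are assumptions about what was intended, and the DataFrame and its values are made up.

```python
import pandas as pd

# Made-up data with the column names used above.
df = pd.DataFrame({"col1": [5, 6, 7, 5], "col2": [1, 2, 3, 4]})

# Method 1: rows where col1 equals a specific value.
equal_rows = df.loc[df["col1"] == 5]

# Method 2 (assumed form): rows where col1 is in a list of values.
in_list_rows = df.loc[df["col1"].isin([5, 6])]

# Method 3 (assumed form): rows matching several conditions at once.
# Each condition needs its own parentheses; & means "and", | means "or".
multi_rows = df.loc[(df["col1"] == 5) & (df["col2"] > 1)]

print(multi_rows)
```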

How do I select specific rows and columns from a DataFrame?

To select a single value from the DataFrame, you can do the following. You can use slicing to select a particular column. To select rows and columns simultaneously, you need to understand the use of comma in the square brackets.

How do I display a specific column in Python?

If you have a DataFrame and would like to access or select a few specific rows/columns from that DataFrame, you can use square brackets or other advanced methods such as loc and iloc.