How do you select the top 5 rows in Python?

There are 2 solutions:

1. sort_values and aggregate head:

df1 = df.sort_values('score',ascending = False).groupby('pidx').head(2)
print (df1)

    mainid pidx pidy  score
8        2    x    w     12
4        1    a    e      8
2        1    c    a      7
10       2    y    x      6
1        1    a    c      5
7        2    z    y      5
6        2    y    z      3
3        1    c    b      2
5        2    x    y      1
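The input DataFrame is not shown above, so here is a minimal, self-contained sketch of the same sort-then-head idea. The sample values are hypothetical; only the column names mirror the answer.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the answer above.
df = pd.DataFrame({
    'pidx':  ['a', 'a', 'a', 'c', 'c', 'x'],
    'pidy':  ['c', 'e', 'f', 'a', 'b', 'w'],
    'score': [5, 8, 1, 7, 2, 12],
})

# Sort every row by score (highest first), then keep the first 2 rows
# that appear for each pidx group. The original index labels are kept.
top2 = df.sort_values('score', ascending=False).groupby('pidx').head(2)
print(top2)
```

Because head keeps the sorted order, the result is ordered by score across all groups, not grouped by pidx.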

2. set_index and aggregate nlargest:

df = df.set_index(['mainid','pidy']).groupby('pidx')['score'].nlargest(2).reset_index() 
print (df)

  pidx  mainid pidy  score
0    a       1    e      8
1    a       1    c      5
2    c       1    a      7
3    c       1    b      2
4    x       2    w     12
5    x       2    y      1
6    y       2    x      6
7    y       2    z      3
8    z       2    y      5
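As a rough sketch of the second approach on hypothetical data: moving the other columns into the index lets them survive the per-group nlargest call on the score column, and reset_index turns them back into columns.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the answer above.
df = pd.DataFrame({
    'mainid': [1, 1, 1, 1, 1, 2],
    'pidx':   ['a', 'a', 'a', 'c', 'c', 'x'],
    'pidy':   ['c', 'e', 'f', 'a', 'b', 'w'],
    'score':  [5, 8, 1, 7, 2, 12],
})

# nlargest runs on the 'score' Series of each group; mainid and pidy
# travel along as index levels and are restored by reset_index().
top2 = (df.set_index(['mainid', 'pidy'])
          .groupby('pidx')['score']
          .nlargest(2)
          .reset_index())
print(top2)
```

Unlike the sort_values approach, the result here is ordered by group key first and by score within each group.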

Timings:

import numpy as np
import pandas as pd

np.random.seed(123)
N = 1000000

L1 = list('abcdefghijklmnopqrstu')
L2 = list('efghijklmnopqrstuvwxyz')
df = pd.DataFrame({'mainid':np.random.randint(1000, size=N),
                   'pidx': np.random.randint(10000, size=N),
                   'pidy': np.random.choice(L2, N),
                   'score':np.random.randint(1000, size=N)})
#print (df)

def epat(df):
    grouped = df.groupby('pidx')
    new_df = pd.DataFrame([], columns = df.columns)
    for key, values in grouped:
        # axis must be passed by keyword in recent pandas versions
        new_df = pd.concat([new_df, grouped.get_group(key).sort_values('score', ascending=True)[:2]], axis=0)
    return (new_df)

print (epat(df))

In [133]: %timeit (df.sort_values('score',ascending = False).groupby('pidx').head(2))
1 loop, best of 3: 309 ms per loop

In [134]: %timeit (df.set_index(['mainid','pidy']).groupby('pidx')['score'].nlargest(2).reset_index())
1 loop, best of 3: 7.11 s per loop

In [147]: %timeit (epat(df))
1 loop, best of 3: 22 s per loop
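The %timeit lines above only work inside IPython. As a hedged sketch, a similar comparison can be run as a plain script with the standard timeit module. The data size and helper names (sort_head, group_nlargest) are assumptions chosen to keep the run short; absolute timings will differ from the figures above.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(123)
N = 100_000  # assumption: smaller than the 1,000,000 rows used above

df = pd.DataFrame({
    'pidx': rng.integers(1000, size=N),
    'score': rng.integers(1000, size=N),
})

def sort_head(frame):
    # Solution 1: one global sort, then the first 2 rows per group.
    return frame.sort_values('score', ascending=False).groupby('pidx').head(2)

def group_nlargest(frame):
    # Solution 2: nlargest is evaluated separately for each group.
    return frame.groupby('pidx')['score'].nlargest(2)

t_sort = timeit.timeit(lambda: sort_head(df), number=3)
t_nlargest = timeit.timeit(lambda: group_nlargest(df), number=3)
print(f"sort_values + head: {t_sort:.3f} s for 3 runs")
print(f"groupby + nlargest: {t_nlargest:.3f} s for 3 runs")
```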

You can use df.head() to get the first N rows in Pandas DataFrame.

For example, if you need the first 4 rows, then use:

df.head(4)

Alternatively, you can specify a negative number within the brackets to get all the rows, excluding the last N rows.

For example, you can use the following syntax to get all the rows excluding the last 4 rows:

df.head(-4)

Complete example to get the first N rows in Pandas DataFrame

Step 1: Create a DataFrame

Let’s create a simple DataFrame with 10 rows:

import pandas as pd

data = {'Fruits': ['Banana','Blueberry','Apple','Cherry','Mango','Pineapple','Watermelon','Papaya','Pear','Coconut'],
        'Price': [2,1.5,3,2.5,3,4,5.5,3.5,1.5,2]
        }

df = pd.DataFrame(data, columns = ['Fruits', 'Price'])

print (df)

As you can see, there are 10 rows in the DataFrame:

       Fruits  Price
0      Banana    2.0
1   Blueberry    1.5
2       Apple    3.0
3      Cherry    2.5
4       Mango    3.0
5   Pineapple    4.0
6  Watermelon    5.5
7      Papaya    3.5
8        Pear    1.5
9     Coconut    2.0

Step 2: Get the first N Rows in Pandas DataFrame

You can use the following syntax to get the first 4 rows in the DataFrame (depending on your needs, you may specify a different number of rows inside the brackets):

df.head(4)

Here is the complete code to get the first 4 rows for our example:

import pandas as pd

data = {'Fruits': ['Banana','Blueberry','Apple','Cherry','Mango','Pineapple','Watermelon','Papaya','Pear','Coconut'],
        'Price': [2,1.5,3,2.5,3,4,5.5,3.5,1.5,2]
        }

df = pd.DataFrame(data, columns = ['Fruits', 'Price'])

get_rows = df.head(4)

print (get_rows)

You’ll now get the first 4 rows:

      Fruits  Price
0     Banana    2.0
1  Blueberry    1.5
2      Apple    3.0
3     Cherry    2.5

Step 3 (Optional): Get all the rows, excluding the last N rows

Let’s suppose that you’d like to get all the rows, excluding the last N rows.

For example, you can use the code below in order to get all the rows excluding the last 4 rows:

import pandas as pd

data = {'Fruits': ['Banana','Blueberry','Apple','Cherry','Mango','Pineapple','Watermelon','Papaya','Pear','Coconut'],
        'Price': [2,1.5,3,2.5,3,4,5.5,3.5,1.5,2]
        }

df = pd.DataFrame(data, columns = ['Fruits', 'Price'])

get_rows = df.head(-4)

print (get_rows)

You’ll now see all the rows excluding the last 4 rows:

      Fruits  Price
0     Banana    2.0
1  Blueberry    1.5
2      Apple    3.0
3     Cherry    2.5
4      Mango    3.0
5  Pineapple    4.0
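A short sketch of how head relates to positional slicing, using a shortened version of the fruit data: head(n) matches iloc[:n], and head(-n) matches iloc[:-n].

```python
import pandas as pd

df = pd.DataFrame({
    'Fruits': ['Banana', 'Blueberry', 'Apple', 'Cherry', 'Mango', 'Pineapple'],
    'Price': [2, 1.5, 3, 2.5, 3, 4],
})

first_two = df.head(2)          # same rows as df.iloc[:2]
all_but_last_two = df.head(-2)  # same rows as df.iloc[:-2]

print(first_two.equals(df.iloc[:2]))
print(all_but_last_two.equals(df.iloc[:-2]))
```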

You can check the Pandas Documentation to learn more about df.head().

How to Select Top N Rows with the Largest Values in a Column(s) in Pandas?

Sometimes, while doing data wrangling, we might need to get a quick look at the top rows with the largest or smallest values in a column. This kind of quick glance at the data can reveal interesting information in a dataframe. Pandas makes it easy to look at the top rows with either the largest or the smallest values in a column.

The Pandas library has a function called nlargest that makes it really easy to look at the rows with the largest values. Let us first load the Pandas library.

import pandas as pd

Let us use the gapminder data. Let us load the data from Carpentry’s GitHub page and look at the data corresponding to the year 2007 alone.

# Carpentry url containing data
data_url = 'http://bit.ly/2cLzoxH'
# Load the data from Carpentry url
gapminder = pd.read_csv(data_url)
# filter the data to contain just year=2007
gapminder_2007 = gapminder[gapminder.year==2007]

The nlargest function takes the number of rows we need as its first argument and the name of the column in which to look for the largest values as its second. According to the Pandas documentation, nlargest will:

Return the first n rows with the largest values in columns, in descending order. The columns that are not specified are returned as well, but not used for ordering.

Let us look at the top 3 rows of the dataframe with the largest population values using the column variable “pop”.

gapminder_2007.nlargest(3,'pop')

We get just three rows and see that China, India, and the US are the top 3 countries with the largest populations.

            country  year           pop  continent  lifeExp     gdpPercap
299           China  2007  1.318683e+09       Asia   72.961   4959.114854
707           India  2007  1.110396e+09       Asia   64.698   2452.210407
1619  United States  2007  3.011399e+08   Americas   78.242  42951.653090
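Since the gapminder example needs a network connection, here is a small offline sketch with hypothetical numbers (not gapminder data). It also shows that, when there are no ties, nlargest returns the same rows as sorting in descending order and taking the head.

```python
import pandas as pd

# Hypothetical values, for illustration only.
df = pd.DataFrame({
    'country': ['A', 'B', 'C', 'D', 'E'],
    'pop': [50, 300, 120, 80, 10],
})

top3 = df.nlargest(3, 'pop')
same = df.sort_values('pop', ascending=False).head(3)

print(top3)
print(top3.equals(same))
```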

The function nlargest also has an argument keep that allows one to deal with duplicate values. keep can take {‘first’, ‘last’, ‘all’}, where

first : prioritize the first occurrence(s)
last : prioritize the last occurrence(s)
all : keep every row that ties with the last selected value, even if that returns more than n rows
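The effect of keep only shows up when values tie at the cut-off. A minimal sketch with hypothetical data, where three rows share the second-largest score:

```python
import pandas as pd

# Hypothetical data: rows b, c and d tie on the second-largest score.
df = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd'],
    'score': [10, 7, 7, 7],
})

first = df.nlargest(2, 'score', keep='first')    # a, then the first tied row
last = df.nlargest(2, 'score', keep='last')      # a, then the last tied row
all_ties = df.nlargest(2, 'score', keep='all')   # a plus every tied row

print(first['name'].tolist())
print(last['name'].tolist())
print(len(all_ties))
```

Note that keep='all' can return more than the n rows requested.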

How to Get Top N Rows Based on Largest Values in Multiple Columns in Pandas?

In the above example we saw getting top rows ordered by values of a single column. Pandas nlargest function can take more than one variable to order the top rows.

We can give a list of variables as input to nlargest and get first n rows ordered by the list of columns in descending order.

# top n rows ordered by multiple columns
gapminder_2007.nlargest(3,['lifeExp','gdpPercap'])

Here we get the top 3 rows with the largest values in column “lifeExp”; “gdpPercap” is used only to break ties between rows with equal “lifeExp”.

             country  year          pop  continent  lifeExp    gdpPercap
803            Japan  2007  127467972.0       Asia   82.603  31656.06806
671  Hong Kong China  2007    6980412.0       Asia   82.208  39724.97867
695          Iceland  2007     301931.0     Europe   81.757  36180.78919
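In the gapminder output above no two countries share the same lifeExp, so the second column never comes into play. A small sketch with hypothetical data shows the tie-breaking behaviour:

```python
import pandas as pd

# Hypothetical data: rows b and c tie on lifeExp.
df = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd'],
    'lifeExp': [80.0, 82.0, 82.0, 79.0],
    'gdpPercap': [100.0, 200.0, 300.0, 400.0],
})

# lifeExp decides the order; gdpPercap only separates the tied rows.
top2 = df.nlargest(2, ['lifeExp', 'gdpPercap'])
by_sort = df.sort_values(['lifeExp', 'gdpPercap'], ascending=False).head(2)

print(top2)
```

Row d has the largest gdpPercap but is not selected, because lifeExp is compared first.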

How to Get Top N Rows Based on Smallest Values of a Column in Pandas?

Just as you guessed, Pandas has the function nsmallest to select the rows with the smallest values in one or more columns, in ascending order.

Let us see an example of using nsmallest on the gapminder data. Here is how to get the 3 countries with the smallest lifeExp.

gapminder_2007.nsmallest(3,'lifeExp')

         country  year         pop  continent  lifeExp    gdpPercap
1463   Swaziland  2007   1133066.0     Africa   39.613  4513.480643
1043  Mozambique  2007  19951656.0     Africa   42.082   823.685621
1691      Zambia  2007  11746035.0     Africa   42.384  1271.211593
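An offline sketch of nsmallest with hypothetical values (not gapminder data), confirming that the result comes back smallest first:

```python
import pandas as pd

# Hypothetical values, for illustration only.
df = pd.DataFrame({
    'country': ['A', 'B', 'C', 'D'],
    'lifeExp': [70.5, 39.6, 42.1, 81.0],
})

bottom2 = df.nsmallest(2, 'lifeExp')
print(bottom2)
print(bottom2['lifeExp'].is_monotonic_increasing)
```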

How do I show the first 5 rows in Python?

Use DataFrame.head(n) to get the first n rows of the DataFrame. It takes one optional argument, n (the number of rows you want from the start). By default n = 5, so it returns the first 5 rows if no value of n is passed to the method.

How do you get the first 5 rows in pandas?

You can use df.head() to get the first N rows in Pandas DataFrame. Alternatively, you can specify a negative number within the brackets to get all the rows, excluding the last N rows.

How do you display the first 3 rows in a DataFrame?

To get the first three rows of a DataFrame, set n to 3. Any of the following forms can be used:
Syntax: DataFrame.head(n)
Syntax: DataFrame.iloc[start_index:end_index+1]
Syntax: DataFrame.iloc[[m, n]] (selects the rows at positions m and n)
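The three forms can be sketched on a tiny hypothetical DataFrame. Note that iloc slices use a colon, and that the end position is exclusive.

```python
import pandas as pd

df = pd.DataFrame({'x': [10, 20, 30, 40, 50]})

by_head = df.head(3)           # first three rows
by_slice = df.iloc[0:3]        # positions 0, 1, 2 (end is exclusive)
by_list = df.iloc[[0, 1, 2]]   # explicit list of positions
picked = df.iloc[[0, 3]]       # a list can also pick non-adjacent rows

print(by_head.equals(by_slice) and by_slice.equals(by_list))
print(picked)
```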

How do you select the first 5 columns in Python?

Use head() on the transposed DataFrame to select the first N columns of a pandas DataFrame:

N = 5
# Select first 5 columns
first_n_columns = df.T.head(N).T
print("First 5 Columns Of Dataframe : ")
print(first_n_columns)
print('Type:')
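As an alternative sketch (not part of the snippet above), positional slicing with iloc selects the first N columns without transposing. On a DataFrame with mixed column types, transposing twice leaves the columns with object dtype, while iloc keeps the original dtypes. The sample data is hypothetical.

```python
import pandas as pd

# Hypothetical DataFrame with mixed column types.
df = pd.DataFrame({
    'a': [1, 2],
    'b': ['x', 'y'],
    'c': [1.5, 2.5],
    'd': [True, False],
})

N = 2
via_transpose = df.T.head(N).T   # the approach shown above
via_iloc = df.iloc[:, :N]        # all rows, first N columns

print(via_transpose.dtypes)
print(via_iloc.dtypes)
```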