How do you merge rows in Python?
This tutorial has a related video course created by the Real Python team. Watch it together with the written tutorial to deepen your understanding: Combining Data in pandas With concat() and merge()
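In this tutorial, you’ll learn how and when to combine your data in pandas with merge() for combining data on common columns or indices, .join() for combining data on a key column or an index, and concat() for combining DataFrames across rows or columns.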
You can follow along with the examples in this tutorial using the interactive Jupyter Notebook and data files.

pandas merge(): Combining Data on Common Columns or Indices

The first technique that you’ll learn is merge(). When you want to combine data objects based on one or more keys, similar to what you’d do in a relational database, merge() is the tool to reach for.

You can achieve both many-to-one and many-to-many joins with merge(). In a many-to-one join, one of your datasets will have many rows in the merge column that repeat the same values, while the merge column in the other dataset holds unique values. As you might have guessed, in a many-to-many join, both of your merge columns will have repeated values. These merges are more complex and result in the Cartesian product of the joined rows. This means that, after the merge, you’ll have every combination of rows that share the same value in the key column. You’ll see this in action in the examples below.

When you use merge(), you’ll provide two required arguments: the left DataFrame and the right DataFrame.
After that, you can provide a number of optional arguments to define how your datasets are merged, including how, on, left_on and right_on, left_index and right_index, and suffixes. These are some of the most important parameters to pass to merge().
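The difference between many-to-one and many-to-many joins can be sketched with a few toy rows. The DataFrames and column names below (stations, readings, notes) are hypothetical and only serve as an illustration.

```python
import pandas as pd

# Hypothetical toy data: "station" is the key column shared by all frames.
stations = pd.DataFrame({"station": ["A", "B"], "name": ["Airport", "Bay"]})
readings = pd.DataFrame({"station": ["A", "A", "B"], "temp": [15.0, 16.5, 12.0]})
notes = pd.DataFrame({"station": ["A", "A"], "note": ["calibrated", "moved"]})

# Many-to-one: each station appears once in `stations`, many times in `readings`.
many_to_one = pd.merge(readings, stations, on="station", how="inner")

# Many-to-many: the key repeats on both sides, so every matching pair of rows
# is combined (a Cartesian product within each key value).
many_to_many = pd.merge(readings, notes, on="station")

print(many_to_one.shape)   # (3, 3)
print(many_to_many.shape)  # (4, 3)
```

Station "A" has two readings and two notes, so the many-to-many merge yields four rows for it, while station "B" has no notes and drops out of the default inner join.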
How to Use merge()

Before getting into the details of how to use merge(), you should first understand the various forms of joins: inner, outer, left, and right.

You’ll learn about these different joins in detail below, but first take a look at this visual representation of them:

[Figure: Venn diagrams of the inner, outer, left, and right joins]

In this image, the two circles are your two datasets, and the labels point to which part or parts of the datasets you can expect to see. While this diagram doesn’t cover all the nuance, it can be a handy guide for visual learners.

If you have an SQL background, then you may recognize the merge operation names from the JOIN syntax. You can also see a visual explanation of the various joins in an SQL context on Coding Horror. Now take a look at the different joins in action.

Examples

Many pandas tutorials provide very simple DataFrames to illustrate the concepts that they are trying to explain. This approach can be confusing since you can’t relate the data to anything concrete. So, for this tutorial, you’ll use two real-world climate datasets as the DataFrames to be merged.
You can explore these datasets and follow along with the examples below using the interactive Jupyter Notebook and climate data CSVs. If you’d like to learn how to use Jupyter Notebooks, then check out Jupyter Notebook: An Introduction.

These two datasets are from the National Oceanic and Atmospheric Administration (NOAA) and were derived from the NOAA public data repository.

First, load the datasets into separate DataFrames.
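As a minimal sketch of the loading step, the snippet below reads two tiny in-memory CSVs with read_csv(). The variable names, column names, and values are assumptions made for illustration; with the real data files you would pass a file path to read_csv() instead of a StringIO object.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the two climate CSV files. The real files have
# many more rows and columns.
temp_csv = io.StringIO(
    "STATION,DATE,TEMP\nS1,20100101,10.5\nS1,20100102,11.0\nS2,20100101,8.2\n"
)
precip_csv = io.StringIO(
    "STATION,DATE,PRECIP\nS1,20100101,0.0\nS1,20100102,1.3\n"
)

climate_temp = pd.read_csv(temp_csv)
climate_precip = pd.read_csv(precip_csv)

print(climate_temp.head())   # first rows of the DataFrame
print(climate_temp.shape)    # (3, 3)
print(climate_precip.shape)  # (2, 3)
```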
In the code above, you used pandas’ read_csv() to load the CSV files into DataFrame objects.

Next, take a quick look at the dimensions of the two DataFrames with the shape attribute.
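Inner Join

In this example, you’ll use merge() with its default arguments, which results in an inner join. With the two datasets loaded into DataFrame objects, you can combine them with a plain merge() call.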
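A plain merge() call might look like the sketch below. The miniature DataFrames and their names (climate_temp, climate_precip, precip_one_station) are hypothetical stand-ins for the climate data, so the row counts differ from those discussed in the text.

```python
import pandas as pd

# Hypothetical miniature datasets sharing the STATION and DATE columns.
climate_temp = pd.DataFrame({
    "STATION": ["S1", "S1", "S2", "S2"],
    "DATE": [101, 102, 101, 102],
    "TEMP": [10.5, 11.0, 8.2, 8.9],
})
climate_precip = pd.DataFrame({
    "STATION": ["S1", "S1", "S3"],
    "DATE": [101, 102, 101],
    "PRECIP": [0.0, 1.3, 0.4],
})

# Take a slice of the precipitation data for a single station.
precip_one_station = climate_precip[climate_precip["STATION"] == "S1"]

# With no `on` and no `how`, merge() joins on all shared column names
# (STATION and DATE here) and performs an inner join.
inner_merged = pd.merge(precip_one_station, climate_temp)

print(inner_merged.shape)  # (2, 4)
```

Every row of the slice finds a match in the temperature data, so the inner join keeps both of its rows and discards the unmatched temperature rows.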
Here you’ve created a new DataFrame. If you check the shape attribute, how many rows do you expect to see in the merged result?
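If you guessed 365 rows, then you were correct! This is because merge() defaults to an inner join, and an inner join discards the rows that don’t have a match in the other DataFrame.

With the on parameter, you can specify which columns to join on.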
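The effect of the on parameter on column names can be sketched as follows. The frames and the FLAG column are hypothetical; the _x and _y suffixes are the pandas defaults.

```python
import pandas as pd

# Hypothetical frames that share STATION and DATE, plus a FLAG column that
# exists in both frames.
left = pd.DataFrame({"STATION": ["S1", "S1"], "DATE": [101, 102], "FLAG": ["a", "b"]})
right = pd.DataFrame({"STATION": ["S1", "S1"], "DATE": [101, 102], "FLAG": ["c", "d"]})

# Single key: DATE and FLAG overlap but are not keys, so both get suffixes.
by_station = pd.merge(left, right, on="STATION")

# Multiple keys: only FLAG is left over to be suffixed.
by_station_date = pd.merge(left, right, on=["STATION", "DATE"])

print(list(by_station.columns))
# ['STATION', 'DATE_x', 'FLAG_x', 'DATE_y', 'FLAG_y']
print(list(by_station_date.columns))
# ['STATION', 'DATE', 'FLAG_x', 'FLAG_y']
```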
You can specify a single key column with a string or multiple key columns with a list. This results in a DataFrame with 123,005 rows and 48 columns.

Why 48 columns instead of 47? Because you specified the key columns to join on, pandas doesn’t try to merge all mergeable columns. This can result in “duplicate” column names, which may or may not have different values.

“Duplicate” is in quotation marks because the column names will not be an exact match. By default, they are appended with _x and _y. You can also use the suffixes parameter to control what’s appended to the column names.

To prevent surprises, all the following examples will use the on parameter to specify the column or columns on which to join.

Outer Join

Here, you’ll specify an outer join with the how parameter. If a row doesn’t have a match in the other DataFrame based on the key column(s), then you won’t lose the row like you would with an inner join. Instead, the row will be in the merged DataFrame, with NaN values filled in where there’s no matching data.

This is best illustrated in an example.
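A small sketch of an outer join, using hypothetical frames (temp, precip) in which only one station appears on both sides:

```python
import pandas as pd

# Hypothetical frames: S1 is in both, S2 only in `temp`, S3 only in `precip`.
temp = pd.DataFrame({"STATION": ["S1", "S2"], "TEMP": [10.5, 8.2]})
precip = pd.DataFrame({"STATION": ["S1", "S3"], "PRECIP": [0.0, 0.4]})

# An outer join keeps every key from both sides and fills the gaps with NaN.
outer_merged = pd.merge(temp, precip, how="outer", on="STATION")

print(outer_merged)
print(outer_merged.shape)  # (3, 3)
```

The row for S2 has NaN in PRECIP and the row for S3 has NaN in TEMP, but neither row is lost.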
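If you remember from when you checked the dimensions of the original DataFrames, you can confirm that no rows are lost in an outer join, even when they don’t have a match in the other DataFrame.

Left Join

In this example, you’ll specify a left join, also known as a left outer join, with the how parameter. A left join keeps all rows from the left DataFrame, while rows in the right DataFrame without a match in the key column of the left DataFrame are discarded.

You can think of this as a half-outer, half-inner merge. The example below shows you this in action.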
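As a hedged sketch with hypothetical frames, the snippet below runs a left join twice, swapping which DataFrame is on the left, to show that the row count follows the left input.

```python
import pandas as pd

# Hypothetical frames: only S1 appears in both.
temp = pd.DataFrame({"STATION": ["S1", "S2", "S3"], "TEMP": [10.5, 8.2, 9.9]})
precip = pd.DataFrame({"STATION": ["S1", "S4"], "PRECIP": [0.0, 0.4]})

# All rows of the left frame survive; unmatched right rows are dropped.
left_merged = pd.merge(temp, precip, how="left", on="STATION")
left_merged_reversed = pd.merge(precip, temp, how="left", on="STATION")

print(left_merged.shape)           # (3, 3)
print(left_merged_reversed.shape)  # (2, 3)
```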
This results in a DataFrame with 365 rows, matching the number of rows in the left DataFrame.

Right Join

The right join, or right outer join, is the mirror-image version of the left join. With this join, all rows from the right DataFrame will be retained, while rows in the left DataFrame without a match in the key column of the right DataFrame will be discarded.

To demonstrate how right and left joins are mirror images of each other, in the example below you’ll recreate the result of the left join from above, only this time using a right join.
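The mirror-image relationship can be sketched with hypothetical frames: a left join of temp with precip should hold the same data as a right join with the inputs flipped.

```python
import pandas as pd

# Hypothetical frames: only S1 appears in both.
temp = pd.DataFrame({"STATION": ["S1", "S2", "S3"], "TEMP": [10.5, 8.2, 9.9]})
precip = pd.DataFrame({"STATION": ["S1", "S4"], "PRECIP": [0.0, 0.4]})

left_merged = pd.merge(temp, precip, how="left", on="STATION")
right_merged = pd.merge(precip, temp, how="right", on="STATION")

print(list(left_merged.columns))   # ['STATION', 'TEMP', 'PRECIP']
print(list(right_merged.columns))  # ['STATION', 'PRECIP', 'TEMP']

# Put the columns in the same order before comparing the two results.
same = left_merged.equals(right_merged[left_merged.columns])
print(same)
```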
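Here, you simply flipped the positions of the input DataFrames and specified a right join. When you inspect the result, you might notice that it holds the same data as the left join, with the columns in a different order.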
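The complexity of merge() makes it difficult to use without a good grasp of set logic and join operations. Now, you’ll look at .join(), a simplified version of merge().

pandas .join(): Combining Data on a Column or Index

While merge() is a module-level function, .join() is an instance method that lives on your DataFrame. Under the hood, .join() uses merge(), but it gives you a more concise way to join DataFrames, by default on their indices.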
With the indices visible, you can see a left join happening here. Now flip the previous example around and instead call .join() on the larger DataFrame.
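Notice that the DataFrame is larger, but data that doesn’t exist in the smaller DataFrame is filled in with NaN values.

How to Use .join()

By default, .join() will attempt to do a left join on indices. Like merge(), .join() has a few parameters that give you more flexibility in your joins, such as other, on, how, lsuffix, rsuffix, and sort.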
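The default behavior of .join() and a few of its parameters can be sketched as follows. The frames (small, large) and the suffix strings are hypothetical choices for illustration.

```python
import pandas as pd

# Hypothetical frames with a default integer index and an overlapping column.
small = pd.DataFrame({"STATION": ["S1", "S2"], "PRECIP": [0.0, 0.4]})
large = pd.DataFrame({"STATION": ["S1", "S2", "S3"], "TEMP": [10.5, 8.2, 9.9]})

# .join() aligns on the index and does a left join by default. Overlapping
# column names must be disambiguated with lsuffix and/or rsuffix.
joined = small.join(large, lsuffix="_left", rsuffix="_right")
print(joined.shape)  # (2, 4)

# Calling .join() on the larger frame keeps all of its rows. Columns coming
# from `small` are NaN where no index label matches.
joined_flipped = large.join(small, lsuffix="_left", rsuffix="_right")
print(joined_flipped.shape)  # (3, 4)

# To mimic a column-based inner merge, move the key into the index first.
via_join = small.set_index("STATION").join(large.set_index("STATION"), how="inner")
print(via_join.shape)  # (2, 2)
```

Setting the key column as the index is what connects the two tools: once the keys live in the index, .join() matches rows the same way merge() does on a column.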
Examples

In this section, you’ll see examples showing a few different use cases for .join(). Since you already saw a short .join() call, the first example recreates an earlier merge() call with .join().
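Because .join() works on indices, you need to set the key columns as the index before joining if you want to reproduce a merge() result. With this, the connection between merge() and .join() should be clearer.

Below you’ll see a .join() call that’s almost bare.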
This example should be reminiscent of what you saw in the introduction to .join() earlier. In this section, you’ve learned about .join() and its parameters and uses.

pandas concat(): Combining Data Across Rows or Columns

Concatenation is a bit different from the merging techniques that you saw above. With merging, you can expect the resulting dataset to have rows from the parent datasets mixed in together, often based on some commonality. Depending on the type of merge, you might also lose rows that don’t have matches in the other dataset.

With concatenation, your datasets are just stitched together along an axis, either the row axis or column axis. Visually, a concatenation with no parameters along rows would look like this:

[Figure: two DataFrames stacked along the row axis]

To implement this in code, you’ll use concat() and pass it a list of the DataFrames that you want to concatenate.
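What if you wanted to perform a concatenation along columns instead? First, take a look at a visual representation of this operation:

[Figure: two DataFrames placed side by side along the column axis]

To accomplish this, you’ll use a concat() call like the one for rows, but you’ll also pass the axis parameter with a value of 1 or "columns".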
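Both directions of concatenation can be sketched with two hypothetical frames, df1 and df2:

```python
import pandas as pd

# Hypothetical frames with identical columns and index labels.
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"A": [5, 6], "B": [7, 8]})

# Default axis=0: stack rows. The original index labels are kept.
stacked = pd.concat([df1, df2])

# axis=1 (or axis="columns"): place the frames side by side, aligned on index.
side_by_side = pd.concat([df1, df2], axis=1)

print(stacked.shape)       # (4, 2)
print(side_by_side.shape)  # (2, 4)
```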
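You’ll learn more about the parameters for concat() in the section below.

How to Use concat()

As you can see, concatenation is a simpler way to combine datasets. It’s often used to form a single, larger set to do additional operations on.

When you concatenate datasets, you can specify the axis along which you’ll concatenate. But what happens with the other axis? Nothing. By default, a concatenation results in a set union, where all data is preserved. You’ve seen this with merge() and .join() as an outer join, and you can specify this with the join parameter.

If you use this parameter, then the default is outer, but you also have the inner option, which performs an inner join, or set intersection. As with the other inner joins you saw earlier, some data loss can occur when you do an inner join with concat(). Only where the axis labels match will you preserve rows or columns.

Since you learned about the join parameter, here are some of the other parameters that concat() takes: objs, axis, ignore_index, and keys.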
This list isn’t exhaustive. You can find the complete, up-to-date list of parameters in the pandas documentation.

Examples

First, you’ll do a basic concatenation along the default axis using the DataFrames that you’ve been playing with throughout this tutorial.
This one is very simple by design. Here, you created a DataFrame that is a double of a small DataFrame that was made earlier. One thing to notice is that the indices repeat. If you want a fresh, 0-based index, then you can use the ignore_index parameter.
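A short sketch of the repeated indices and of ignore_index, using a hypothetical two-row DataFrame named small:

```python
import pandas as pd

# Hypothetical small frame that is concatenated with itself.
small = pd.DataFrame({"STATION": ["S1", "S2"], "PRECIP": [0.0, 0.4]})

# The index labels of each input are carried over, so they repeat.
double = pd.concat([small, small])
print(list(double.index))  # [0, 1, 0, 1]

# ignore_index=True discards the old labels and builds a fresh 0-based index.
reindexed = pd.concat([small, small], ignore_index=True)
print(list(reindexed.index))  # [0, 1, 2, 3]
```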
As noted before, if you concatenate along axis 0 (rows) but have labels in axis 1 (columns) that don’t match, then those columns will be added and filled in with NaN values. This results in an outer join.
With these two DataFrames, since you’re just concatenating along rows, very few columns have the same name. That means you’ll see a lot of columns with NaN values.

To instead drop columns that have any missing data, use the join parameter with the value "inner" to do an inner join.
Using the inner join, you’ll be left with only those columns that the original DataFrames have in common. You can also flip this by setting the axis parameter to 1, so that the inner join applies to the rows instead.
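The interaction of join and axis in concat() can be sketched with two hypothetical frames that share only the STATION column:

```python
import pandas as pd

# Hypothetical frames: one shared column, different lengths.
temp = pd.DataFrame({"STATION": ["S1", "S2", "S3"], "TEMP": [10.5, 8.2, 9.9]})
precip = pd.DataFrame({"STATION": ["S1", "S2"], "PRECIP": [0.0, 0.4]})

# Default join="outer": union of columns, NaN where a frame lacks a column.
outer_rows = pd.concat([temp, precip])
print(list(outer_rows.columns))  # ['STATION', 'TEMP', 'PRECIP']

# join="inner": keep only the columns present in both frames.
inner_rows = pd.concat([temp, precip], join="inner")
print(list(inner_rows.columns))  # ['STATION']

# axis=1 with join="inner": keep only the index labels present in both frames.
inner_cols = pd.concat([temp, precip], axis=1, join="inner")
print(inner_cols.shape)  # (2, 4)
```

In the last call, the number of rows equals that of the shorter frame because only index labels 0 and 1 exist in both inputs.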
Now you have only the rows that have data for all columns in both DataFrames. It’s no coincidence that the number of rows corresponds with that of the smaller DataFrame.

Another useful trick for concatenation is using the keys parameter to create hierarchical axis labels.
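A minimal sketch of the keys parameter, with hypothetical frames and the labels "temp" and "precip" chosen for illustration:

```python
import pandas as pd

# Hypothetical frames with the same column name.
temp = pd.DataFrame({"VALUE": [10.5, 8.2]})
precip = pd.DataFrame({"VALUE": [0.0, 0.4]})

# keys adds an outer index level that records which input each row came from.
with_keys = pd.concat([temp, precip], keys=["temp", "precip"])

print(with_keys.index.nlevels)  # 2
print(with_keys.loc["precip"])  # only the rows that came from `precip`
```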
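If you check on the original DataFrames, then you can verify whether the higher-level axis labels were added to the appropriate rows.

Conclusion

You’ve now learned the three most important techniques for combining data in pandas: merge() for combining data on common columns or indices, .join() for combining data on a key column or an index, and concat() for combining DataFrames across rows or columns.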
In addition to learning how to use these techniques, you also learned about set logic by experimenting with the different ways to join your datasets. Additionally, you learned about the most common parameters to each of the above techniques, and what arguments you can pass to customize their output.

You saw these techniques in action on a real dataset obtained from the NOAA, which showed you not only how to combine your data but also the benefits of doing so with pandas’ built-in techniques.

How do you merge two rows in Python?

The concat() function in pandas is used to append either columns or rows from one DataFrame to another. The concat() function does all the heavy lifting of performing concatenation operations along an axis while performing optional set logic (union or intersection) of the indexes (if any) on the other axes.
How do you merge columns and rows in Python?

You've now learned the three most important techniques for combining data in pandas: merge() for combining data on common columns or indices, .join() for combining data on a key column or an index, and concat() for combining DataFrames across rows or columns.

How do you merge in Python?

If a key combination does not appear in either the left or the right tables, the values in the joined table will be NA.
...
Merge Using 'how' Argument.

How do you group multiple rows in Python?

You can group DataFrame rows into a list by using the pandas.DataFrame.groupby() function on the column of interest, selecting the column you want as a list from the group, and then using Series.apply(list) to get the list for every group.
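The grouping steps described above can be sketched like this, with a hypothetical DataFrame and column names:

```python
import pandas as pd

# Hypothetical data: several temperature readings per station.
df = pd.DataFrame({"STATION": ["S1", "S1", "S2"], "TEMP": [10.5, 11.0, 8.2]})

# Group on the column of interest, select one column, and collect its values
# into a list for every group.
temps_by_station = df.groupby("STATION")["TEMP"].apply(list)

print(temps_by_station.to_dict())  # {'S1': [10.5, 11.0], 'S2': [8.2]}
```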
How do I combine DataFrame rows?

We can use the concat function in pandas to append either columns or rows from one DataFrame to another. Let's grab two subsets of our data to see how this works. When we concatenate DataFrames, we need to specify the axis. axis=0 tells pandas to stack the second DataFrame under the first one.
How do you merge two tables in Python?

The concat() function performs concatenation operations of multiple tables along one of the axes (row-wise or column-wise).