How do you make a scatter plot in Python?
Scatter Plot
A scatter plot is a diagram where each value in the data set is represented by a dot.
The Matplotlib module has a method for drawing scatter plots. It needs two arrays of the same length: one for the values on the x-axis and one for the values on the y-axis.
In the example, the x array holds the age of each car and the y array holds its speed.
Example
Use the scatter() method to draw a scatter plot diagram:

    import matplotlib.pyplot as plt

    x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
    y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

    plt.scatter(x, y)
    plt.show()

Scatter Plot Explained
The x-axis represents ages, and the y-axis represents speeds. What we can read from the diagram is that the two fastest cars were both 2 years old, and the slowest car was 12 years old.

Note: It seems that the newer the car, the faster it drives, but that could be a coincidence. After all, we only registered 13 cars.

Random Data Distributions
In machine learning, data sets can contain thousands or even millions of values. You might not have real-world data when you are testing an algorithm, so you might have to use randomly generated values. As we have learned in the previous chapter, the NumPy module can help us with that!

Let us create two arrays that are both filled with 1000 random numbers from a normal data distribution. The first array will have the mean set to 5.0 with a standard deviation of 1.0. The second array will have the mean set to 10.0 with a standard deviation of 2.0.

Example
A scatter plot with 1000 dots:

    import numpy
    import matplotlib.pyplot as plt

    x = numpy.random.normal(5.0, 1.0, 1000)
    y = numpy.random.normal(10.0, 2.0, 1000)

    plt.scatter(x, y)
    plt.show()

Scatter Plot Explained
We can see that the dots are concentrated around the value 5 on the x-axis and 10 on the y-axis. We can also see that the spread is wider on the y-axis than on the x-axis.

Using plt.scatter() to Visualize Data in Python
An important part of working with data is being able to visualize it. Python has several third-party modules you can use for data visualization. One of the most popular modules is Matplotlib and its submodule pyplot, often referred to using the alias plt. Below, you’ll walk through several examples that will show you how to use the plt.scatter() function effectively.

In this tutorial, you’ll learn how to create scatter plots with plt.scatter(), customize the markers, and represent more than two variables on a single plot.
To get the most out of this tutorial, you should be familiar with the fundamentals of Python programming and the basics of NumPy and its arrays.

Creating Scatter Plots
A scatter plot is a visual representation of how two variables relate to each other. You can use scatter plots to explore the relationship between two variables, for example by looking for any correlation between them.

In this section of the tutorial, you’ll become familiar with creating basic scatter plots using Matplotlib. In later sections, you’ll learn how to further customize your plots to represent more complex data using more than two dimensions.

Getting Started With plt.scatter()
Before you can start working with plt.scatter(), you need to install Matplotlib.

Now that you have Matplotlib installed, consider the following use case. A café sells six different types of bottled orange drinks. The owner wants to understand the relationship between the price of the drinks and how many of each one he sells, so he keeps track of how many of each drink he sells every day. You can visualize this relationship with a scatter plot of price against daily sales.
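A minimal sketch of the café plot might look like the following. The price and sales values are illustrative assumptions; only the $4.02 price point is mentioned in the text.

```python
import matplotlib.pyplot as plt

# Illustrative values (assumed): price of each drink and average units sold per day
price = [2.50, 1.23, 4.02, 3.25, 5.00, 4.40]
sales_per_day = [34, 62, 49, 22, 13, 19]

# One dot per drink: x is the price, y is the daily sales
plt.scatter(price, sales_per_day)
plt.show()
```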
To create this plot in a Python script, you import the pyplot submodule from Matplotlib using the alias plt, define the prices and the sales figures, and finally create the scatter plot by using plt.scatter() with the two variables you want to compare as input arguments. In a script, you also need to call plt.show() to display the figure.

When you’re using an interactive environment, such as a console or a Jupyter Notebook, you don’t need to call plt.show().

The resulting plot shows that, in general, the more expensive a drink is, the fewer items are sold. However, the drink that costs $4.02 is an outlier, which may show that it’s a particularly popular product. When using scatter plots in this way, close inspection can help you explore the relationship between variables. You can then carry out further analysis, whether it’s using linear regression or other techniques.

Comparing plt.scatter() and plt.plot()
You can also produce the scatter plot shown above using another function within matplotlib.pyplot: plt.plot(). You can achieve the same scatter plot as the one you obtained in the section above with a call to plt.plot() that passes the two variables and a marker.
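As a sketch, the equivalent plt.plot() call could look like this. The data values are the same illustrative assumptions as before, and "o" is one possible marker choice.

```python
import matplotlib.pyplot as plt

# Illustrative values (assumed)
price = [2.50, 1.23, 4.02, 3.25, 5.00, 4.40]
sales_per_day = [34, 62, 49, 22, 13, 19]

# The format string "o" draws circular markers without a connecting line
lines = plt.plot(price, sales_per_day, "o")
plt.show()
```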
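In this case, you had to include a marker such as "o" in the call, because plt.plot() draws a line graph by default. For the basic scatter plot you’re plotting in this example, using plt.plot() may be preferable because it’s typically faster, which you can check by timing both functions.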
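One way to compare the two functions is with the standard-library timeit module. This is a rough sketch: the data values and the number of repetitions are assumptions, and the absolute timings depend on the machine and the Matplotlib version.

```python
import timeit

import matplotlib.pyplot as plt

# Illustrative values (assumed)
price = [2.50, 1.23, 4.02, 3.25, 5.00, 4.40]
sales_per_day = [34, 62, 49, 22, 13, 19]

# Time 200 calls to each function on the same data
scatter_time = timeit.timeit(
    lambda: plt.scatter(price, sales_per_day), number=200
)
plot_time = timeit.timeit(
    lambda: plt.plot(price, sales_per_day, "o"), number=200
)
plt.close("all")  # discard the figure used for timing

print(f"plt.scatter(): {scatter_time:.3f} s")
print(f"plt.plot():    {plot_time:.3f} s")
```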
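The performance will vary on different computers, but when you time the two functions, you’ll typically find that plt.plot() is faster than plt.scatter(). If you can create scatter plots using plt.plot(), why use plt.scatter() at all? Because plt.scatter() lets you vary the size and color of individual markers, which the rest of this tutorial relies on.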
In the next section, you’ll start exploring more advanced uses of plt.scatter().

Customizing Markers in Scatter Plots
You can visualize more than two variables on a two-dimensional scatter plot by customizing the markers. There are four main features of the markers used in a scatter plot that you can customize with plt.scatter(): size, color, shape, and transparency. In this section of the tutorial, you’ll learn how to modify all these properties.

Changing the Size
Let’s return to the café owner you met earlier in this tutorial. The different orange drinks he sells come from different suppliers and have different profit margins. You can show this additional information in the scatter plot by adjusting the size of the marker. The profit margin is given as a percentage in this example.
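A sketch of the size customization, assuming illustrative prices, sales, and profit margins. The factor of 10 is an arbitrary scaling assumption that makes the size differences easier to see.

```python
import matplotlib.pyplot as plt
import numpy as np

# Illustrative values (assumed)
price = np.asarray([2.50, 1.23, 4.02, 3.25, 5.00, 4.40])
sales_per_day = np.asarray([34, 62, 49, 22, 13, 19])
profit_margin = np.asarray([20, 35, 40, 20, 27.5, 15])  # percent

# s sets the marker area (in points squared) for each data point
points = plt.scatter(x=price, y=sales_per_day, s=profit_margin * 10)
plt.show()
```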
You can notice a few changes from the first example. Instead of lists, you’re now using NumPy arrays. You can use any array-like data structure for the data, and NumPy arrays are commonly used in these types of applications since they enable element-wise operations that are performed efficiently. The NumPy module is a dependency of Matplotlib, which is why you don’t need to install it manually.

You’ve also used named parameters as input arguments in the function call. The parameters x and y hold the data, and the parameter s sets the marker size.

In the resulting scatter plot, the size of the marker indicates the profit margin for each product. The two orange drinks that sell most are also the ones that have the highest profit margin. This is good news for the café owner!

Changing the Color
Many of the customers of the café like to read the labels carefully, especially to find out the sugar content of the drinks they’re buying. The café owner wants to emphasize his selection of healthy foods in his next marketing campaign, so he categorizes the drinks based on their sugar content and uses a traffic light system to indicate low, medium, or high sugar content for the drinks. You can add color to the markers in the scatter plot to show the sugar content of each drink.
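A sketch of the traffic-light coloring, under the assumption that each category is an RGB tuple passed through the c parameter. The variable names, data values, and category assignments are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

# Traffic-light colors as RGB tuples (names are assumed for illustration)
low = (0, 1, 0)      # green
medium = (1, 1, 0)   # yellow
high = (1, 0, 0)     # red

# Illustrative values (assumed)
price = np.asarray([2.50, 1.23, 4.02, 3.25, 5.00, 4.40])
sales_per_day = np.asarray([34, 62, 49, 22, 13, 19])
profit_margin = np.asarray([20, 35, 40, 20, 27.5, 15])
sugar_content = [low, high, medium, medium, high, low]

# c takes one color per data point
points = plt.scatter(
    x=price, y=sales_per_day, s=profit_margin * 10, c=sugar_content
)
plt.show()
```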
You define variables for the three traffic-light colors, then a variable that classifies each drink by its sugar content, and you pass that classification to plt.scatter() through the c parameter to set the color of each marker.

The café owner has already decided to remove the most expensive drink from the menu as this doesn’t sell well and has a high sugar content. Should he also stop stocking the cheapest of the drinks to boost the health credentials of the business, even though it sells well and has a good profit margin?

Changing the Shape
The café owner has found this exercise very useful, and he wants to investigate another product. In addition to the orange drinks, you’ll now also plot similar data for the range of cereal bars available in the café.
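A sketch of plotting both products in one figure. All values and variable names are illustrative assumptions; the two plt.scatter() calls draw on the same axes.

```python
import matplotlib.pyplot as plt
import numpy as np

low, medium, high = (0, 1, 0), (1, 1, 0), (1, 0, 0)  # green, yellow, red

# Illustrative values (assumed) for the orange drinks
price_orange = np.asarray([2.50, 1.23, 4.02, 3.25, 5.00, 4.40])
sales_per_day_orange = np.asarray([34, 62, 49, 22, 13, 19])
profit_margin_orange = np.asarray([20, 35, 40, 20, 27.5, 15])
sugar_content_orange = [low, high, medium, medium, high, low]

# Illustrative values (assumed) for the cereal bars
price_cereal = np.asarray([1.50, 2.50, 1.15, 1.95])
sales_per_day_cereal = np.asarray([67, 34, 36, 12])
profit_margin_cereal = np.asarray([20, 42.5, 33.3, 18])
sugar_content_cereal = [low, high, medium, low]

# Both calls use the default marker, so the products look the same
plt.scatter(x=price_orange, y=sales_per_day_orange,
            s=profit_margin_orange * 10, c=sugar_content_orange)
plt.scatter(x=price_cereal, y=sales_per_day_cereal,
            s=profit_margin_cereal * 10, c=sugar_content_cereal)
plt.show()
```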
Here, you rename the variables to take into account that you now have data for two different products, and you then plot both scatter plots in a single figure. Unfortunately, in the resulting figure you can no longer tell which data points belong to the orange drinks and which to the cereal bars. To fix this, you can change the shape of the marker for one of the scatter plots.
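A sketch of distinguishing the two products by marker shape. The diamond marker "d" is an assumed choice, and the data values are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

low, medium, high = (0, 1, 0), (1, 1, 0), (1, 0, 0)  # green, yellow, red

# Illustrative values (assumed)
price_orange = np.asarray([2.50, 1.23, 4.02, 3.25, 5.00, 4.40])
sales_per_day_orange = np.asarray([34, 62, 49, 22, 13, 19])
profit_margin_orange = np.asarray([20, 35, 40, 20, 27.5, 15])
sugar_content_orange = [low, high, medium, medium, high, low]

price_cereal = np.asarray([1.50, 2.50, 1.15, 1.95])
sales_per_day_cereal = np.asarray([67, 34, 36, 12])
profit_margin_cereal = np.asarray([20, 42.5, 33.3, 18])
sugar_content_cereal = [low, high, medium, low]

# Orange drinks keep the default circular marker
plt.scatter(x=price_orange, y=sales_per_day_orange,
            s=profit_margin_orange * 10, c=sugar_content_orange)
# Cereal bars use a diamond marker so the two products can be told apart
plt.scatter(x=price_cereal, y=sales_per_day_cereal,
            s=profit_margin_cereal * 10, c=sugar_content_cereal,
            marker="d")
plt.show()
```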
You keep the default marker shape for the orange drink data. The default marker is "o", a circle. You can now distinguish the data points for the orange drinks from those for the cereal bars. But there is one problem with the last plot you created that you’ll explore in the next section.

Changing the Transparency
One of the data points for the orange drinks has disappeared. There should be six orange drinks, but only five round markers can be seen in the figure. One of the cereal bar data points is hiding an orange drink data point. You can fix this visualization problem by making the data points partially transparent using the alpha value.
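A sketch of the transparency fix. The alpha value of 0.5, the title, and the axis labels are assumptions, and the data values are illustrative. Note that one cereal bar shares its price and sales with an orange drink, which is what causes the overlap.

```python
import matplotlib.pyplot as plt
import numpy as np

low, medium, high = (0, 1, 0), (1, 1, 0), (1, 0, 0)  # green, yellow, red

# Illustrative values (assumed)
price_orange = np.asarray([2.50, 1.23, 4.02, 3.25, 5.00, 4.40])
sales_per_day_orange = np.asarray([34, 62, 49, 22, 13, 19])
profit_margin_orange = np.asarray([20, 35, 40, 20, 27.5, 15])
sugar_content_orange = [low, high, medium, medium, high, low]

price_cereal = np.asarray([1.50, 2.50, 1.15, 1.95])
sales_per_day_cereal = np.asarray([67, 34, 36, 12])
profit_margin_cereal = np.asarray([20, 42.5, 33.3, 18])
sugar_content_cereal = [low, high, medium, low]

# alpha ranges from 0 (fully transparent) to 1 (fully opaque)
orange_points = plt.scatter(
    x=price_orange, y=sales_per_day_orange,
    s=profit_margin_orange * 10, c=sugar_content_orange, alpha=0.5,
)
cereal_points = plt.scatter(
    x=price_cereal, y=sales_per_day_cereal,
    s=profit_margin_cereal * 10, c=sugar_content_cereal,
    marker="d", alpha=0.5,
)

plt.title("Sales vs Prices for Orange Drinks and Cereal Bars")
plt.xlabel("Price")
plt.ylabel("Average daily sales")
plt.show()
```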
You’ve set the alpha value for both sets of markers so that overlapping points remain visible. You’ve also added a title and other labels to the plot to complete the figure with more information about what’s being displayed.

Customizing the Colormap and Style
In the scatter plots you’ve created so far, you’ve used three colors to represent low, medium, or high sugar content for the drinks and cereal bars. You’ll now change this so that the color directly represents the actual sugar content of the items.

You first need to refactor the variables that hold the sugar content. These are now lists containing the percentage of the daily recommended amount of sugar in each item. The rest of the code remains the same, but you can now choose the colormap to use. This maps values to colors.
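A sketch of the colormap version. The sugar percentages, the "viridis" colormap, and the shared vmin/vmax limits are assumptions; the shared limits are added here so that one colorbar describes both sets of markers.

```python
import matplotlib.pyplot as plt
import numpy as np

# Illustrative values (assumed)
price_orange = np.asarray([2.50, 1.23, 4.02, 3.25, 5.00, 4.40])
sales_per_day_orange = np.asarray([34, 62, 49, 22, 13, 19])
profit_margin_orange = np.asarray([20, 35, 40, 20, 27.5, 15])
sugar_content_orange = [15, 35, 22, 27, 38, 14]  # % of daily amount

price_cereal = np.asarray([1.50, 2.50, 1.15, 1.95])
sales_per_day_cereal = np.asarray([67, 34, 36, 12])
profit_margin_cereal = np.asarray([20, 42.5, 33.3, 18])
sugar_content_cereal = [21, 49, 29, 30]  # % of daily amount

# Use one color scale for both products
vmin = min(sugar_content_orange + sugar_content_cereal)
vmax = max(sugar_content_orange + sugar_content_cereal)

orange_points = plt.scatter(
    x=price_orange, y=sales_per_day_orange, s=profit_margin_orange * 10,
    c=sugar_content_orange, cmap="viridis", vmin=vmin, vmax=vmax, alpha=0.5,
)
cereal_points = plt.scatter(
    x=price_cereal, y=sales_per_day_cereal, s=profit_margin_cereal * 10,
    c=sugar_content_cereal, cmap="viridis", vmin=vmin, vmax=vmax,
    marker="d", alpha=0.5,
)

# The colorbar acts as a legend for the marker colors
plt.colorbar(cereal_points, label="Sugar content (% of daily amount)")
plt.show()
```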
The color of the markers is now based on a continuous scale, and you’ve also displayed the colorbar that acts as a legend for the color of the markers.

All the plots you’ve plotted so far have been displayed in the native Matplotlib style. You can change this style by using one of several options. You can display the available styles by inspecting plt.style.available.
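For instance, the following short sketch prints the style names that the installed Matplotlib version provides. The exact list depends on the version.

```python
import matplotlib.pyplot as plt

# plt.style.available is a list of style names that plt.style.use() accepts
available_styles = plt.style.available
for style_name in available_styles:
    print(style_name)
```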
You can now change the plot style when using Matplotlib by calling plt.style.use() before calling plt.scatter().
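A sketch of switching to a Seaborn-like style. Recent Matplotlib releases name this style "seaborn-v0_8" while older ones use "seaborn", so the sketch picks whichever name is available. The data values are illustrative assumptions.

```python
import matplotlib.pyplot as plt

# Pick the Seaborn style name that this Matplotlib version provides
if "seaborn-v0_8" in plt.style.available:
    seaborn_style = "seaborn-v0_8"
else:
    seaborn_style = "seaborn"
plt.style.use(seaborn_style)

# Illustrative values (assumed)
price = [2.50, 1.23, 4.02, 3.25, 5.00, 4.40]
sales_per_day = [34, 62, 49, 22, 13, 19]

# Plots created after plt.style.use() pick up the new style
plt.scatter(price, sales_per_day)
plt.show()
```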
This changes the style to that of Seaborn, another third-party visualization package. Plotting the final scatter plot again after this call displays it in the Seaborn style. You can read more about customizing plots in Matplotlib, and there are also further tutorials on the Matplotlib documentation pages.

The ability to represent more than two variables makes plt.scatter() a versatile tool.

Exploring plt.scatter() Further
A commuter who’s keen on collecting data has collated the arrival times for buses at her local bus stop over a six-month period. The timetabled arrival times are at 15 minutes and 45 minutes past the hour, but she noticed that the true arrival times follow a normal distribution around these times. A plot of this distribution shows the relative likelihood of a bus arriving at each minute within an hour. This probability distribution can be represented using NumPy.
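A sketch of such a distribution as the sum of two Gaussian-shaped curves centered on 15 and 45 minutes. The standard deviations (5 and 7 minutes) and the 0.9 weight on the second peak are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

mean = 15, 45   # timetabled arrival times, minutes past the hour
sd = 5, 7       # assumed spread around each timetabled time

x = np.linspace(0, 59, 60)  # one value per minute within the hour
first_distribution = np.exp(-0.5 * ((x - mean[0]) / sd[0]) ** 2)
second_distribution = 0.9 * np.exp(-0.5 * ((x - mean[1]) / sd[1]) ** 2)

# Sum the two curves and scale so the most likely minute has value 1
y = first_distribution + second_distribution
y = y / y.max()

plt.plot(x, y)
plt.xlabel("Minutes past the hour")
plt.ylabel("Relative probability of bus arrivals")
plt.show()
```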
You’ve created two normal distributions centered on 15 and 45 minutes past the hour. You can now simulate bus arrival times using this distribution. To do this, you can create random times and random relative probabilities using the built-in random module.
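A sketch of the simulation using Python's built-in random module. The number of buses (40), the fixed seed, and the variable names are assumptions.

```python
import random

import matplotlib.pyplot as plt
import numpy as np

random.seed(0)  # fixed seed (assumed) so repeated runs give the same points

n_buses = 40
# Random arrival minute (0 to 59) and random relative likelihood (0 to 1)
bus_times = np.asarray([random.randint(0, 59) for _ in range(n_buses)])
bus_likelihood = np.asarray([random.random() for _ in range(n_buses)])

plt.scatter(x=bus_times, y=bus_likelihood)
plt.xlabel("Minutes past the hour")
plt.ylabel("Relative probability of bus arrivals")
plt.show()
```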
You’ve simulated a set of bus arrivals, which you can visualize with a scatter plot. Your plot will look different each time since the data you’re generating is random. However, not all of these points are likely to be close to the reality that the commuter observed from the data she gathered and analyzed. You can plot the distribution she obtained from the data together with the simulated bus arrivals.
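A sketch that overlays the simulated arrivals on the distribution curve. The distribution parameters, the number of buses, and the seed are assumptions.

```python
import random

import matplotlib.pyplot as plt
import numpy as np

random.seed(0)  # fixed seed (assumed) for repeatable output

# Distribution of arrival times (assumed spreads of 5 and 7 minutes)
x = np.linspace(0, 59, 60)
y = (np.exp(-0.5 * ((x - 15) / 5) ** 2)
     + 0.9 * np.exp(-0.5 * ((x - 45) / 7) ** 2))
y = y / y.max()

# Simulated arrivals
n_buses = 40
bus_times = np.asarray([random.randint(0, 59) for _ in range(n_buses)])
bus_likelihood = np.asarray([random.random() for _ in range(n_buses)])

plt.plot(x, y)                               # observed distribution
plt.scatter(x=bus_times, y=bus_likelihood)   # simulated arrivals
plt.xlabel("Minutes past the hour")
plt.ylabel("Relative probability of bus arrivals")
plt.show()
```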
To keep the simulation realistic, you need to make sure that the random bus arrivals match the data and the distribution obtained from those data. You can filter the randomly generated points by keeping only the ones that fall within the probability distribution. You can achieve this by creating a mask for the scatter plot.
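A sketch of the masking step with NumPy Boolean arrays. The names in_region and out_region, the colors, the "x" marker, and the distribution parameters are assumptions.

```python
import random

import matplotlib.pyplot as plt
import numpy as np

random.seed(0)  # fixed seed (assumed) for repeatable output

# Distribution of arrival times (assumed spreads of 5 and 7 minutes)
x = np.linspace(0, 59, 60)
y = (np.exp(-0.5 * ((x - 15) / 5) ** 2)
     + 0.9 * np.exp(-0.5 * ((x - 45) / 7) ** 2))
y = y / y.max()

n_buses = 40
bus_times = np.asarray([random.randint(0, 59) for _ in range(n_buses)])
bus_likelihood = np.asarray([random.random() for _ in range(n_buses)])

# Boolean masks: True where a point lies under the curve at its minute
in_region = bus_likelihood < y[bus_times]
out_region = ~in_region

plt.plot(x, y)
plt.scatter(bus_times[in_region], bus_likelihood[in_region],
            color="green")
plt.scatter(bus_times[out_region], bus_likelihood[out_region],
            color="red", marker="x")
plt.xlabel("Minutes past the hour")
plt.ylabel("Relative probability of bus arrivals")
plt.show()
```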
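The mask variables hold Boolean values that mark whether each point falls within the distribution. You’ve segmented the data points from the original scatter plot based on whether they fall within the distribution and used a different color and marker to identify the two sets of data.

Reviewing the Key Input Parameters
You’ve learned about the main input parameters to create scatter plots in the sections above. Here’s a brief summary of key points to remember about the main input parameters:

- x and y: the data for the two axes
- s: the marker size
- c: the marker color
- marker: the marker shape
- cmap: the colormap used when the color values are numbers
- alpha: the transparency of the markers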
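The following sketch uses these parameters together in a single call. The random data, the seed, and every parameter value are assumptions chosen only for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded so the sketch is repeatable
n_points = 30

points = plt.scatter(
    x=rng.uniform(0, 10, n_points),     # horizontal positions
    y=rng.uniform(0, 10, n_points),     # vertical positions
    s=rng.uniform(20, 200, n_points),   # marker areas in points squared
    c=rng.uniform(0, 1, n_points),      # numbers mapped to colors by cmap
    marker="d",                         # diamond markers
    cmap="viridis",                     # colormap for the values in c
    alpha=0.6,                          # partially transparent markers
)
plt.colorbar(points)
plt.show()
```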
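These are not the only input parameters available with plt.scatter().

Conclusion
You now know how to create and customize scatter plots using plt.scatter(). In this tutorial, you’ve learned how to create basic scatter plots, customize the size, color, shape, and transparency of the markers, choose a colormap and a plot style, and mask data points.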
You can get the most out of visualization by combining plt.scatter() with the other features of Matplotlib and NumPy.

How do I make a scatter plot in Python?
Use the scatter() method to draw a scatter plot diagram:

    import matplotlib.pyplot as plt
    x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
    y = [99,86,87,88,111,86,103,87,94,78,77,85,86]
    ...

A scatter plot with 1000 dots:

    import numpy
    import matplotlib.pyplot as plt
    ...

Which function will create scatter plot in Python?
Matplotlib has a built-in function to create scatter plots called scatter().
How do you plot a scatter plot between two variables in Python?
1. Set the figure size and adjust the padding between and around the subplots.
2. Create random xs and ys data points using NumPy.
3. Zip xs and ys and iterate over them together.
4. Make a scatter plot with each x and y value.
5. To display the figure, use the show() method.
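The steps above can be sketched as follows. The figure size, the number of points, and the seeded random generator are assumptions chosen for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Set the figure size and let Matplotlib adjust the padding automatically
plt.rcParams["figure.figsize"] = [7.5, 3.5]
plt.rcParams["figure.autolayout"] = True

# Create random xs and ys data points
rng = np.random.default_rng(seed=0)
xs = rng.random(20)
ys = rng.random(20)

# Zip the two arrays and plot one point per iteration
for x, y in zip(xs, ys):
    plt.scatter(x, y)

plt.show()
```

Calling plt.scatter(xs, ys) once would draw the same points in a single color and is usually the simpler choice.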