Guide: pandas vs Excel (Reddit)

If most of your stakeholders work within the MS Office environment and your datasets are small, then Excel is sufficient for data exploration and charting. VBA is useful for tasks like naming sheets, automating copying and pasting, creating and deleting GUI objects, and letting your stakeholders trigger tasks at the click of a button.

That being said, all of this can also be done in Python, and pandas becomes invaluable when you run into large datasets and need to perform matrix operations instead of VLOOKUPing against 1M rows for five minutes. Python is also more extensible for advanced statistics (think test statistics, multivariate regression, supervised and unsupervised learning, the list goes on).

I think that free VBA tutorials should be abundant online (try... Coursera?)

I don't have a review of the course itself.

However, I'm very much against buying a course to teach you such a specific area of a topic. If you know Python, pandas is all about finding a dataset, loading it into a variable, and picking away at it. You won't really learn anything that you wouldn't end up googling anyway. For pandas in particular, there are plenty of cheat sheets that cover most things. Between those and Google/Stack Exchange, you'll have everything you need.

If, on the other hand, you don't feel comfortable with Python beyond simple tasks, then I guess you can try to find tutorials on it.

However, there are many free and decent tutorials that give you everything you need.

Hello everyone,

I have been working as a data analyst for about 3 years and about 2 years ago I switched to Pandas due to how much potential it had to improve my life: I am very bad at cataloging and record keeping, so doing things in Pandas makes everything so traceable and reproducible for me. Plus Excel always becomes super slow for me when the rows surpass 80K.

This week I will give my team members a quick demonstration of Pandas's capabilities. A lot of my team members use Excel for most of their data analysis, so I'd like to show them some handy operations that can be done by Pandas in much shorter time, hence improving our efficiency.

Several operations I have in mind are:

  • The magic power of unstack() to pivot an index level into columns, usually following set_index() or groupby().

  • df.duplicated(), especially when I want to find the rows that have duplicate values in some columns: I can do df[df.duplicated(['A', 'B'], keep=False)]

  • I always have to write a pretty long formula in Excel to find values in column A that are, or are not, in column B. I love how I can just do df[df.colA.isin(df1.colB)] for the matches, or df[~df.colA.isin(df1.colB)] for the non-matches.

  • `merge` compared to VLOOKUP or INDEX/MATCH
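
For a demo, the last three items in the list above could be sketched roughly as follows. The data and the column names (`key`, `A`, `B`, `label`) are made up for illustration and are not from the original post.

```python
import pandas as pd

# Hypothetical toy data; column names are assumptions for illustration only.
df = pd.DataFrame({
    "key": [1, 2, 3, 4],
    "A": ["x", "x", "y", "z"],
    "B": [10, 10, 20, 30],
})
lookup = pd.DataFrame({"key": [1, 2, 5], "label": ["one", "two", "five"]})

# All rows whose (A, B) pair occurs more than once; keep=False marks every copy.
dupes = df[df.duplicated(["A", "B"], keep=False)]

# Rows of df whose key does not appear in lookup (the "not in column B" check).
missing = df[~df["key"].isin(lookup["key"])]

# Left join, roughly what VLOOKUP or INDEX/MATCH does; unmatched keys get NaN.
merged = df.merge(lookup, on="key", how="left")

print(dupes, missing, merged, sep="\n\n")
```

Unlike VLOOKUP, a left merge returns every matching row when the lookup table contains repeated keys, so it is worth checking the lookup table for duplicates first.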

That's about what I have in mind right now. Some other operations like drop_duplicates() or sum() can be done without too much effort in Excel. I don't think I need to talk about groupby() as we all use SQL on a regular basis.
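
The unstack() pattern from the first bullet might look something like this in a demo. It is only a sketch: the `sales` table and its columns are invented for the example.

```python
import pandas as pd

# Hypothetical long-format data, one row per sale.
sales = pd.DataFrame({
    "region": ["north", "north", "south", "south", "south"],
    "quarter": ["Q1", "Q2", "Q1", "Q1", "Q2"],
    "amount": [100, 150, 80, 20, 60],
})

# Group by two keys, then pivot the inner index level (quarter) into columns.
wide = sales.groupby(["region", "quarter"])["amount"].sum().unstack("quarter")

print(wide)
```

The result has one row per region and one column per quarter, which is close to what an Excel pivot table would show.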

Would anyone care to give me some additional examples please?

Thanks for your time in advance!

I know some people prefer to load the CSV into pandas and filter the data from there. I am currently going through a list of articles and trying to determine which of them are listicles based on their headlines: first filtering for headlines that contain a number, then checking whether the article is actually a list or just has a date or year in the headline. I would then tag those articles with a "list" value. Would it make more sense to do this in Python or Excel? Thanks
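
One possible way to do the first pass in pandas is sketched below. The headlines, the `type` column, and the rule "a standalone number that is not a four-digit year starting with 19 or 20" are all assumptions; the result only narrows down candidates and would still need a manual check of the article itself.

```python
import re
import pandas as pd

# Hypothetical headlines; in practice these would come from pd.read_csv(...).
articles = pd.DataFrame({
    "headline": [
        "10 Ways to Speed Up Your Spreadsheet",
        "What Changed in Analytics in 2021",
        "Why Reproducibility Matters",
        "The 7 Best Charts of 2020",
    ]
})

NUMBER = re.compile(r"\b\d+\b")      # standalone runs of digits
YEAR = re.compile(r"(?:19|20)\d{2}")  # assumed definition of "looks like a year"


def looks_like_listicle(headline: str) -> bool:
    """True if the headline has a standalone number that is not a year."""
    numbers = NUMBER.findall(headline)
    return any(not YEAR.fullmatch(n) for n in numbers)


articles["type"] = articles["headline"].apply(
    lambda h: "list" if looks_like_listicle(h) else "article"
)

print(articles)
```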

level 1

If you need repeatable processes, always prefer programming over Excel. I try to use Excel as little as possible unless it's for making a quick graph or something. Any type of data processing I do in pandas.

level 1

  1. Imagine you need to go back and explain to someone every single thing you did to transform the original data set into its current form, and you need to show them step by step exactly how it was done the first time. Could you do that if you only used Excel?

  2. This is a fantastic exercise to get some practice using Python and improve your skills. Why not give it a shot?

level 2

Technically you can with Excel’s Power Query.

level 1

One reason is if you have a very large dataset. Excel sheets have a hard limit on the number of rows and columns (1,048,576 rows by 16,384 columns in current versions).

Another good reason is when you want a reproducible workflow, where someone else can look at your script and see exactly how you filtered the data. The same is true when you need the same filters applied again as the dataset gets updated with new records.
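
As a rough illustration of that point, the filters can live in one function that is rerun whenever new records arrive. The function name `clean`, the columns `id` and `amount`, and the specific filter rules are hypothetical.

```python
import pandas as pd


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the same ordered, readable filter steps to any version of the data."""
    return (
        df.dropna(subset=["amount"])        # step 1: drop rows with no amount
          .query("amount > 0")              # step 2: keep positive amounts only
          .drop_duplicates(subset=["id"])   # step 3: keep the first row per id
          .reset_index(drop=True)
    )


# Hypothetical original data and a later update with one new record.
original = pd.DataFrame({"id": [1, 2, 2, 3], "amount": [5.0, -1.0, 7.0, None]})
updated = pd.concat(
    [original, pd.DataFrame({"id": [4], "amount": [9.0]})], ignore_index=True
)

print(clean(original))
print(clean(updated))
```

Anyone reading the script can see each step and its order, and rerunning it on the updated data applies exactly the same rules.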

level 1

Excel should only be used for looking at the data. Don't do any transformations with it really.