Do I need R if I know Python?

Posted by u/[deleted] 5 years ago

I'm a college student and I've learned quite a bit of Python. I'm not an expert, but I have gotten to the point that I can do basic DS projects in this language.

I know that the other main language used in this field is R. Should I now start learning R, or is there little benefit to doing so? Is there anything that can be done with R but not with Python? Will knowing both Python and R make me more employable?

level 1

I get asked this question a lot at work. Having worked a lot in both languages, my response is that you don't learn R like a normal language. You learn advanced statistics first, and then R feels natural. What I mean is that the normal constructs from Python don't map to R. From a computer science perspective, R is a pretty terrible language, and you can't write good production code in it. But it's wonderful from an applied math perspective. You can do fairly complex statistical data analysis in just a few lines and see formatted output meant for human eyes with little effort. If you like stats, pick up one of the books like "Learning Statistics with R" and work through it from a math-first perspective, as opposed to an algorithm-first perspective like you would with Python or other production languages. Start with formulating and testing null hypotheses and figuring out frequentist statistics and t-tests, then move on to Bayesian stats, linear regression, Q-Q plots, measures of correlation, autoencoders. There is tons of awesome stuff in there that's useful and fun to learn. If none of this sounds interesting, don't bother with R.
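To make the "math first" starting point concrete, here is a minimal sketch of a two-sample Welch t-test written with only the Python standard library. The function name `welch_t` and the sample numbers are made up for illustration and are not from the thread. In R the same test is roughly the one-liner `t.test(treated, control)`, and in Python you would normally reach for `scipy.stats.ttest_ind(a, b, equal_var=False)` rather than writing it by hand.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    n_a, n_b = len(sample_a), len(sample_b)

    # Squared standard error contributed by each group.
    se_a, se_b = var_a / n_a, var_b / n_b

    t_stat = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t_stat, df


# Hypothetical measurements, for illustration only.
control = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3]
treated = [6.2, 6.8, 5.9, 7.1, 6.5, 6.9]

t_stat, df = welch_t(treated, control)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

The sketch stops at the test statistic. Turning it into a p-value needs the t distribution, which is where a stats library (or R) earns its keep.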

level 2

Thank you so much, really well explained. Always wanted to learn stats and apply it using programming. Your comment has given me a good place to start off

level 2

Oh... My normal workflow is to do feature extraction and filtering at scale with Python and GNU parallel or Spark, then do exploratory analysis on sensibly sized and representative samples with R. However, more and more I find myself staying in Python as the stats libraries get better there.

level 2

5 yr. ago · Data Scientist | Non-profit

I kinda disagree with this. I'm taking an analytical modeling course in R, and because the models live in different packages with no coherence between them, I fight formatting issues and memory errors more than I actually get the benefits of "a few lines of code". Everything is done in a few lines, but my lines usually break if I assume the data doesn't have to be reformatted, whereas with Python any of the DS stack can be used as long as the data is in a numpy format.

level 1

Depends almost entirely on what you want to do. Be an ML engineer focusing on productionizing models - maybe not worth it. Be a high-level analyst who wants to use ggplot - probably more so.

As for my situation, I'm a DS at a fairly well-known company in the Bay and I survive solely with Python/Pandas and sometimes Spark.

level 2

I operationalize and prototype using R and H2O. I use R data.table because it handles bigger data more efficiently than pandas. The blocker for R is that computer science majors don't typically learn it, and they end up doing engineering tasks more than pure quants do, which constrains the production options.

level 1

I'm a statistician working with both languages on a daily basis. Many statistical methods are available only in R. The basic packages in R include a lot of variants that Python doesn't provide at the moment. In addition, a lot of cutting-edge methods are implemented in R only, and that's unlikely to change soon. It is really useful for a data scientist to be familiar with both.

level 2

Yes, this is especially true with things like time series regression, semiparametric estimation, etc.

I know both R and Python but almost always stick with R.

level 2

Exactly, I haven't seen a better combo than R + data.table + ggplot2 & co.

level 1

Also a student here. I would say to at least be familiar with R. In my opinion, it's a lot better for data exploration and visualization when you're building your own understanding of the data, as it is easier to code a plot and other things. Python would be better for modelling, as R runs on a single thread by default.

Overall, I would say R helps you learn your data a lot faster; other than that, I don't know if there is a difference.

level 2

What do you use in R that helps you learn your data faster?

level 1

I've been in the same boat. I took an intro to CS class in Python and also TA'd the class afterwards. However, for about a year and a half after that, I took many Stats/ML classes that were taught exclusively in R, and I got very comfortable with R. I kinda abandoned Python, especially for statistical analysis, data wrangling and basically all things data. R is a good language but not a great one. ggplot is amazing to look at, but I'm actually trying to go back to Python for data analysis. There are countless annoying bugs in R that I still cannot explain to this day. Working with a 3D array in dplyr is a lot slower than doing it in numpy. Overall, I wouldn't recommend it from my personal experience, since I'm just one of those who 'love to hate R'. I would recommend getting familiar with R syntax so at least you'll be able to read others' code on GitHub.

Remember the biggest difference: Python was created by computer scientists and R was created by statisticians.

level 1

I use R for mixed effects models and some data cleansing (for some things the tidyverse is better than pandas). In Python, there is poor support for mixed effects models, especially generalized linear mixed models.

level 2

Meh, just go full Bayes and use PyStan.

level 1

To answer your last question, absolutely. I learned R first but after using Python for my recent projects, I never use R anymore. It is good to know both but use whatever you are most comfortable using.

level 1

If you're in undergrad, then become proficient enough to be hired based on the skill. You're so young you have no idea where your career will take you. If you're a little older or set in your career, it's only necessary if you're more of a quantitative researcher / statistician.

level 1

ggplot and tidyverse (e.g., dplyr) are worth learning.

The example I give for first seeing the power of ggplot is when you want a graph faceted by something (country, language, category, ...).
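For anyone who hasn't met the term, faceting means splitting the data by a category and drawing the same small plot once per group. In ggplot2 that is a single added layer such as `facet_wrap(~ country)`. The sketch below only illustrates the idea in plain Python with text "bars"; the records and the helper names `facet` and `render_panel` are made up for illustration, and a real Python workflow would use a plotting library instead.

```python
from collections import defaultdict

# Made-up records: (country, year, value). Illustrative only.
records = [
    ("France", 2019, 3), ("France", 2020, 5),
    ("Japan", 2019, 4), ("Japan", 2020, 2),
]


def facet(rows, key_index=0):
    """Group rows into panels keyed by the value at key_index."""
    panels = defaultdict(list)
    for row in rows:
        panels[row[key_index]].append(row)
    return dict(panels)


def render_panel(title, rows):
    """Return a text 'plot' for one panel: one bar of '#' characters per row."""
    lines = [f"== {title} =="]
    for _, year, value in rows:
        lines.append(f"{year} | {'#' * value}")
    return "\n".join(lines)


# Same drawing code, applied once per category: that is all a facet is.
for country, rows in facet(records).items():
    print(render_panel(country, rows))
```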

level 1

5 yr. ago · Data Scientist | Non-profit

I think there is a place for R, and I'm sure there are some things R can do that Python can't. Those things are rare, though, and working in R is terrible if you are good at coding.

level 1

Worth knowing, but Python is rapidly eclipsing R, even in academic circles. Personally, I've opted to get a better understanding of popular Python packages like numpy, Keras, and Dash, instead of delving deeper into R. I also use PySpark extensively for working with datasets too large for pandas. It seems that PySpark has more to offer than SparkR.

level 2

I don't think it's true that Python is eclipsing R. Academics especially don't need or want to learn to program, and R has much, much better materials for learning to do X in academic field Y. Python and R seem to grow together, but new statistical packages are usually written in R first.

Can I use Python instead of R?

Python is the better general-purpose choice, since it can be used for many purposes and has better scalability, performance, and integration with other systems. However, if the purpose is data analysis and visualization, R is the better option.

How easy is R if you know Python?

Both Python and R are considered fairly easy languages to learn. Python was originally designed for software development. If you have previous experience with Java or C++, you may be able to pick up Python more naturally than R. If you have a background in statistics, on the other hand, R could be a bit easier.

Do I need both Python and R?

In general, you shouldn't be choosing between R and Python, but instead should be working towards having both in your toolbox. Investing your time into acquiring working knowledge of the two languages is worthwhile and practical for multiple reasons.

Can Python do everything R can?

While Python and R can basically both do any data science task you can think of, there are some areas where one language is stronger than the other. The majority of deep learning research is done in Python, so tools such as Keras and PyTorch have "Python-first" development.
