R vs. Python – Classes and objects

This entry is part 3 of 3 in the series R vs. Python

R and Python are two of the main programming languages used in data science. Sometimes it may be necessary to move from one to the other, for example because Python is the more general-purpose language while R is more specialized for data analysis. At PyCon DE 2022 I gave a talk titled "Rewriting your R analysis code in Python". This is part 3 of a series of blog posts about the topics I covered in that talk. Part 3 is all about classes and objects.

Classes and objects

R has multiple class systems: S3 classes, S4 classes and reference classes. Python has only one class system. In Python you can find out the class of an object using the type() function:

type(3)
## <class 'int'>

In R you do the same with the class() function:

class(3)
## [1] "numeric"
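R reports "numeric" for 3 because the literal 3 is stored as a double in R (3L would be of class "integer"), whereas Python distinguishes integer and floating-point literals. The short Python sketch below shows this and also isinstance(), the usual way to test an object's class in Python:

```python
# Python distinguishes integer and floating-point literals
print(type(3))    # <class 'int'>
print(type(3.0))  # <class 'float'>

# type() returns the class object itself, so it can be compared directly
print(type(3) is int)  # True

# isinstance() also accepts subclasses: bool is a subclass of int
print(isinstance(True, int))  # True
print(type(True) is int)      # False
```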

If you know objects and classes from other languages, you are probably most familiar with the concept behind R's reference classes: there, functions (methods) belong to an object and can modify that object. The class system in Python is similar to reference classes in R. Since S3 is the default class system in R, I will start by explaining S3 classes a bit, then move on to reference classes and Python classes, and finally compare them.

S3 classes

S3 classes are the default class system in R.

I am using the example from the book Advanced R.

You can create a new S3 object simply by creating a list and then setting its class:

bankAccount <- list(balance=100)
class(bankAccount) <- "Account"

You can define behavior for your class by writing a class-specific version of an existing generic function. For example, if you want print() to work on your object, you can define the function print.Account(); every time print() is called on an object of class "Account", print.Account() is used to actually print it:

print.Account <- function(obj){
  print(obj$balance)
}

print(bankAccount)
## [1] 100
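To make the lookup behind print.Account() more concrete, here is a simplified toy model of S3 dispatch written in Python: the generic tries a function named generic.class for each entry of the object's class vector and falls back to generic.default. The names registry, s3_method and use_method are hypothetical, and this mimics the idea of S3 dispatch, not R's actual implementation. The methods return the text to print so that the sketch is easy to test.

```python
# Toy model of S3 dispatch; registry, s3_method and use_method are hypothetical names
registry = {}


def s3_method(generic, cls):
    """Register fn as the method generic.cls (like defining print.Account in R)."""
    def register(fn):
        registry[f"{generic}.{cls}"] = fn
        return fn
    return register


def use_method(generic, obj, *args):
    """Try each class of obj in order, then fall back to generic.default."""
    for cls in obj["class"] + ["default"]:
        method = registry.get(f"{generic}.{cls}")
        if method is not None:
            return method(obj, *args)
    raise TypeError(f"no applicable method for '{generic}'")


@s3_method("print", "default")
def print_default(obj):
    return f"<list with class {obj['class']}>"


@s3_method("print", "Account")
def print_account(obj):
    return str(obj["balance"])


# an "S3 object": plain data plus a class vector
bank_account = {"balance": 100, "class": ["Account"]}
print(use_method("print", bank_account))                        # 100
print(use_method("print", {"balance": 5, "class": ["Other"]}))  # <list with class ['Other']>
```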

If you want a new function that works specifically on your object, first write a generic function such as withdraw() that calls UseMethod(). UseMethod() then makes sure that the version of withdraw() for your object's class is called:

withdraw <- function(obj, amount){
  UseMethod("withdraw", obj)
}

withdraw.Account <- function(obj,amount) {
  obj$balance <- obj$balance - amount
  return(obj)
}

bankAccount <- withdraw(bankAccount, 10)

print(bankAccount)
## [1] 90
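Python's standard library has a mechanism that resembles S3 generics: functools.singledispatch chooses an implementation based on the type of the first argument. Unlike S3, it dispatches on the actual Python class rather than on a class attribute. The sketch below assumes a frozen dataclass so that withdraw() returns a modified copy, mirroring how the R version returns a new object that has to be reassigned:

```python
from dataclasses import dataclass, replace
from functools import singledispatch


@dataclass(frozen=True)
class Account:
    balance: float


@singledispatch
def withdraw(obj, amount):
    # fallback for unregistered types, comparable to withdraw.default in R
    raise NotImplementedError(f"no withdraw method for {type(obj).__name__}")


@withdraw.register
def _(obj: Account, amount):
    # return a modified copy instead of changing obj, like the R version
    return replace(obj, balance=obj.balance - amount)


bank_account = Account(balance=100)
bank_account = withdraw(bank_account, 10)
print(bank_account.balance)  # 90
```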

S3 classes are very flexible: you can add new methods and new fields to them at any time, just like with any other list in R:

# Modify the print method to check whether the new owner field is present and print accordingly:
print.Account <- function(obj){
  if(is.null(obj$owner)){
    print(obj$balance)
  } else {
    print(glue::glue("Bank Account:\n Balance: {obj$balance}\n Owner: {obj$owner}"))
  }
}

print(bankAccount)
## [1] 90
# Add the new field:
bankAccount$owner <- "Me"

print(bankAccount)
## Bank Account:
## Balance: 90
## Owner: Me

Reference classes

A reference class is based on R's S4 class system, but each object additionally contains an environment that stores its fields. I won't go into S4 classes here, but if you encounter an S4 object, you will see @ instead of $ being used to access its fields (called slots).

You define the reference class explicitly:

R:

AccountRefClass <- setRefClass("AccountRefClass",
                       fields = list(balance = "numeric"),
                       methods = list(
                         withdraw = function(x){
                           "withdraw an amount x of money"
                           balance <<- balance - x
                         }
                       )
)

bankAccount <- AccountRefClass$new(balance = 100)
bankAccount$withdraw(10)

The first string in a method definition is the docstring, or documentation string. If you document your R code with roxygen2, this string is used as the documentation for the method.
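Inside withdraw(), the <<- operator assigns to the field stored in the object's environment instead of creating a new local variable. A rough Python analogue of this mechanism is a closure that rebinds a variable of its enclosing scope with nonlocal. This is only an illustration of the scoping idea; make_account and get_balance are hypothetical names:

```python
def make_account(balance):
    # balance lives in the enclosing scope, like a field in the object's environment
    def withdraw(x):
        """withdraw an amount x of money"""
        nonlocal balance  # like <<- : rebind the enclosing variable, not a new local one
        balance -= x

    def get_balance():
        return balance

    return withdraw, get_balance


withdraw, get_balance = make_account(100)
withdraw(10)
print(get_balance())  # 90
```

Without the nonlocal line, balance -= x would raise an UnboundLocalError, because Python would treat balance as a new local variable.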

A reference class already comes with its own print() behavior, provided by its show() method:

print(bankAccount)
## Reference class object of class "AccountRefClass"
## Field "balance":
## [1] 90
# is equivalent to

bankAccount$show()
## Reference class object of class "AccountRefClass"
## Field "balance":
## [1] 90
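Python objects also have a default printed form, but it only shows the class name and a memory address. The closest counterpart of show() is the __repr__() method, which print() falls back to when no __str__() is defined. A small sketch; ShowAccount is a hypothetical name:

```python
class Account:
    def __init__(self, balance):
        self.balance = balance


plain = Account(90)
# no __str__ or __repr__ defined: print() shows something like
# <__main__.Account object at 0x...> (the address differs on every run)
print(plain)


class ShowAccount(Account):
    def __repr__(self):
        # plays the role of show(): controls how the object is displayed
        return f'ShowAccount object with field "balance": {self.balance}'


shown = ShowAccount(90)
# print() calls str(), which falls back to __repr__ when no __str__ is defined
print(shown)  # ShowAccount object with field "balance": 90
```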

You can add new methods to the reference class, for example your own definition of show(), like this:

AccountRefClass$methods(
  show = function(){
    "Print out the balance of the account"
    print(balance)
  }
)
bankAccount <- AccountRefClass$new(balance = 100)
bankAccount$withdraw(10)
print(bankAccount)
## [1] 90

However, you cannot simply add a new field to the bankAccount object, as you could with an S3 object:

bankAccount$owner <- 'Me'
## Error in envRefSetField(x, what, refObjectClass(x), selfEnv, value): 'owner' is not a field in class "AccountRefClass"
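Python objects accept new attributes by default, but you can get behavior similar to this error with __slots__, which fixes the set of allowed attributes of a class. A minimal sketch; SlottedAccount is a hypothetical name:

```python
class SlottedAccount:
    # __slots__ fixes the allowed attributes, similar to the fields list of a reference class
    __slots__ = ("balance",)

    def __init__(self, balance):
        self.balance = balance


acc = SlottedAccount(100)
try:
    acc.owner = "Me"  # not listed in __slots__
    rejected = False
except AttributeError as err:
    rejected = True
    print("AttributeError:", err)
```

Note that a subclass which does not define __slots__ itself gets a per-instance attribute dictionary again, so the restriction only holds as long as every class in the hierarchy uses __slots__.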

Objects of reference classes are also references. This means that assigning an object to a new name just gives you two variable names that point to the same object:

bankAccount2 <- bankAccount
# this also changes bankAccount:
bankAccount2$withdraw(12)

print(bankAccount)
## [1] 78

To get an independent object, you need to copy it explicitly using the copy() method:

bankAccount2 <- bankAccount$copy()
# this does not change bankAccount:
bankAccount2$withdraw(8)

print(bankAccount)
## [1] 78
print(bankAccount2)
## [1] 70

You can also still use the S3 approach and define a method of an existing generic function, such as as.character(), for your reference class:

as.character.AccountRefClass <- function(obj){
  as.character(obj$balance)
}

print(as.character(bankAccount))
## [1] "78"

Reference classes come with a default constructor, but you can override what happens when a new object is created by defining an initialize() method.

Python classes

In Python, all classes are defined explicitly with the class keyword:

class Account:
    def __init__(self, balance: float):
        self.balance = balance
    def withdraw(self, x: float):
        """withdraw an amount x of money"""
        self.balance -= x
    def __str__(self):
        return str(self.balance)

bankAccount = Account(balance = 100)
bankAccount.withdraw(10)
print(bankAccount)
## 90

Python has special method names, the so-called dunder ("double underscore") methods such as __init__() and __str__(). They implement specific behavior: __init__() is the constructor function that is called when you create an object of the class, and __str__() returns a string representation of the object, similar to as.character.AccountRefClass() in the R reference class example. __str__() is also used by print(). So the special methods are a bit more clearly defined in Python than in R. As in R, you can define the documentation string at the beginning of a function.
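Like initialize() in an R reference class, __init__() can do more than store its arguments. Strictly speaking, the object has already been created (by __new__()) when __init__() runs, so __init__() initializes rather than constructs it. The default value and the validation rule below are assumptions chosen for illustration:

```python
class Account:
    def __init__(self, balance: float = 0.0):
        # runs right after the object has been created, comparable to initialize()
        if balance < 0:
            raise ValueError("balance must not be negative")
        self.balance = balance


print(Account().balance)     # 0.0
print(Account(100).balance)  # 100

try:
    Account(-5)
except ValueError as err:
    print(err)  # balance must not be negative
```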

You can access the docstring through the __doc__ attribute of the function:

print(bankAccount.withdraw.__doc__)
## withdraw an amount x of money
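For multi-line docstrings, the standard library function inspect.getdoc() returns the text with the source-code indentation removed, which is usually what you want to display (help() shows the docstring in a similar cleaned-up form). A small sketch with a hypothetical longer docstring:

```python
import inspect


class Account:
    def __init__(self, balance: float):
        self.balance = balance

    def withdraw(self, x: float):
        """Withdraw an amount x of money.

        The amount is subtracted from the balance.
        """
        self.balance -= x


# inspect.getdoc() strips the indentation of the source code from the docstring
print(inspect.getdoc(Account.withdraw))
```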

Like an S3 object, a Python object can be modified on the fly:

bankAccount.owner = "Me"
print(bankAccount.owner)
## Me
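This works because, by default, a Python object keeps its attributes in a per-object dictionary, which you can inspect with vars(). In that sense it behaves much like an S3 list. A minimal sketch, using a stripped-down Account class:

```python
class Account:
    def __init__(self, balance):
        self.balance = balance


bank_account = Account(90)
bank_account.owner = "Me"
# instance attributes are stored in a per-object dictionary, much like the elements of an S3 list
print(vars(bank_account))  # {'balance': 90, 'owner': 'Me'}

other = Account(50)
print(hasattr(other, "owner"))  # False: only bank_account got the new field
```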

You can also monkeypatch the methods of a class using setattr(), even after you have already created objects of that class:

def bankAccount__str__(self):
    return f"Bank Account:\n Balance: {self.balance}\n Owner: {self.owner}"

setattr(Account, "__str__", bankAccount__str__)

print(bankAccount)
## Bank Account:
##  Balance: 90
##  Owner: Me
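Patching the class rather than the single object matters for dunder methods: Python looks up special methods such as __str__() on the type, not on the instance, so assigning __str__ to one object does not change what str() or print() produce. A small sketch illustrating this:

```python
class Account:
    def __init__(self, balance):
        self.balance = balance

    def __str__(self):
        return str(self.balance)


acc = Account(90)
# patching the instance: str() ignores it, because special methods are looked up on the type
acc.__str__ = lambda: "patched on the instance"
print(str(acc))        # 90
print(acc.__str__())   # patched on the instance
# patching the class: affects all existing and future objects
setattr(Account, "__str__", lambda self: f"Balance: {self.balance}")
print(str(acc))        # Balance: 90
```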

Assigning an object to a new name behaves the same way as with R reference classes: the new name points to the same object in memory:

bankAccount2 = bankAccount
bankAccount.withdraw(12)
print(bankAccount2.balance, bankAccount.balance)
## 78 78
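You can check directly whether two names refer to the same object with the is operator, which compares object identity (equivalently, the values returned by id()). A minimal sketch with a stripped-down Account class:

```python
class Account:
    def __init__(self, balance):
        self.balance = balance


a = Account(90)
b = a                  # no copy: b is a second name for the same object
print(b is a)          # True
print(id(b) == id(a))  # True
b.balance -= 12
print(a.balance)       # 78
```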

But you can use the copy module to create a real copy of your object:

import copy
bankAccount2 = copy.deepcopy(bankAccount)
bankAccount.withdraw(8)
print(bankAccount2.balance, bankAccount.balance)
## 78 70
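The copy module also offers copy.copy(), a shallow copy: it creates a new object, but mutable attributes such as lists are still shared with the original. copy.deepcopy() copies those as well. The sketch below assumes a hypothetical transactions list on the account to show the difference:

```python
import copy


class Account:
    def __init__(self, balance):
        self.balance = balance
        self.transactions = []  # hypothetical mutable field

    def withdraw(self, x):
        self.balance -= x
        self.transactions.append(-x)


acc = Account(100)
shallow = copy.copy(acc)
deep = copy.deepcopy(acc)
acc.withdraw(10)
print(shallow.balance, shallow.transactions)  # 100 [-10]  (list is shared)
print(deep.balance, deep.transactions)        # 100 []     (fully independent)
```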

Discussion

Reference classes in R and classes in Python are very similar, but they are not the same: for example, you cannot add a new field to a reference class object on the fly, while you can with a Python object. Both support inheritance, meaning you can base your own class on an existing one. Python classes and objects are more flexible than R reference classes, but if you want that type of flexibility in R, you can always use S3 classes instead. Personally, I find reference classes much more convenient than S3 classes, because they feel more natural to me: they are more like classes in C++ and Python.
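As a brief illustration of inheritance on the Python side (in R, setRefClass() takes a contains argument for the same purpose), here is a sketch that derives a hypothetical SavingsAccount with an interest rate from the Account class used above. The subclass name, the rate attribute and add_interest() are assumptions for illustration:

```python
class Account:
    def __init__(self, balance: float):
        self.balance = balance

    def withdraw(self, x: float):
        """withdraw an amount x of money"""
        self.balance -= x

    def __str__(self):
        return str(self.balance)


class SavingsAccount(Account):
    # hypothetical subclass: adds an interest rate and a method that uses it
    def __init__(self, balance: float, rate: float):
        super().__init__(balance)  # reuse the parent's initialization
        self.rate = rate

    def add_interest(self):
        self.balance += self.balance * self.rate


savings = SavingsAccount(balance=100, rate=0.05)
savings.withdraw(10)    # inherited from Account
savings.add_interest()  # defined in SavingsAccount
print(savings)          # uses the inherited __str__
print(isinstance(savings, Account))  # True
```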

