R vs. Python – Pipes

This entry is part 2 of 3 in the series R vs. Python

R and Python are two of the main programming languages used in data science. Sometimes it is necessary to move from one language to the other, for example because Python is the more general-purpose language while R is more specialized in data analysis.
At PyConDE 2022, I gave a talk titled Rewriting your R analysis code in Python.
This is part 2 of a series of blog posts covering the topics from that talk.
Part 2 is all about pipes.

Summary

  • Part 1: Basic differences
  • Part 2: Pipes
  • Part 3: Classes
  • Part 4: Library Loading
  • Part 5: … vs. args,kwargs
  • Part 6: Non-standard evaluation
  • Part 7: Run R code from python and vice versa

Pipes

A pipe is something where you put something in at one end and it comes
out at the other end, with some processing happening along the way.
Take a water pipe, for example: cold water goes in and flows to a
heater, which makes it warm. Afterwards it flows to the dishwasher,
and you get clean dishes, but the water gets dirty.

The same happens when you feed data into a data processing pipe.
In R, the pipe operator %>% from the magrittr package, combined with the
helper functions from the dplyr package, is a well-known tool for building
data processing pipes. Since R version 4.1, there is also the native pipe
|>, which builds on the | symbol that Unix shells have long used for pipes.
The following example uses the Palmer Penguins example data, groups it by species, and computes a simple summary per species:

R: magrittr-Pipe

library(dplyr)
library(palmerpenguins)
penguins %>%
  group_by(species) %>%
  summarise(
    n = n(),
    bill_depth = mean(bill_depth_mm, na.rm=TRUE),
    body_mass = mean(body_mass_g, na.rm=TRUE)
  )
## # A tibble: 3 × 4
##   species       n bill_depth body_mass
##   <fct>     <int>      <dbl>     <dbl>
## 1 Adelie      152       18.3     3701.
## 2 Chinstrap    68       18.4     3733.
## 3 Gentoo      124       15.0     5076.

R: Native Pipe

Same example using the native pipe:

penguins |>
  group_by(species) |>
  summarise(
    n = n(),
    bill_depth = mean(bill_depth_mm, na.rm=TRUE),
    body_mass = mean(body_mass_g, na.rm=TRUE)
  )
## # A tibble: 3 × 4
##   species       n bill_depth body_mass
##   <fct>     <int>      <dbl>     <dbl>
## 1 Adelie      152       18.3     3701.
## 2 Chinstrap    68       18.4     3733.
## 3 Gentoo      124       15.0     5076.

Python

Python does not have the same piping mechanism. But it is object-oriented,
and you can call the functions that belong to an object (its methods) using the . operator. With a trailing \ you tell Python that the statement continues on the next line. So you can write something that looks similar to R pipes in Python by using a pandas data frame and .\ as a pipe-like operator:

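# Notebook shell command; in a terminal, run `pip3 install palmerpenguins` instead
!pip3 install palmerpenguins
from palmerpenguins import load_penguins
penguins = load_penguins()

# Each trailing ".\" continues the method chain on the next line
penguins.\
  groupby('species').\
  aggregate({
    'bill_depth_mm': ['count', 'mean'],
    'body_mass_g': ['mean']
  })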
##           bill_depth_mm              body_mass_g
##                   count       mean          mean
## species                                         
## Adelie              151  18.346358 -1.412451e+07
## Chinstrap            68  18.420588  3.733088e+03
## Gentoo              123  14.982114 -1.731338e+07
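
The backslash is not the only way to spread a method chain over several lines. The following is a minimal sketch that uses only built-in string methods instead of pandas, so it runs without extra packages: wrapping the whole chain in parentheses lets every step start with the dot, which reads a bit closer to an R pipe. The variable names and the sample string are made up for illustration.

```python
# Hypothetical sample input, not taken from the penguins data
raw = "  Adelie;Chinstrap;Gentoo  "

# Parentheses allow line breaks, so every step can start with "."
species = (
    raw
    .strip()        # remove surrounding whitespace
    .lower()        # convert to lower case
    .split(";")     # split into a list of names
)

print(species)
```

The same layout should also work for the pandas chain above, by opening a parenthesis before penguins and closing it after the aggregate call.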

The main advantage of the R pipe is that you can simply combine functions from different packages, as well as your own, into one data processing pipe.
With method chaining in Python, you can only call the methods that the object already provides. Adding more methods to the result objects requires much more work with classes and inheritance.
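
One way to get closer to the R behaviour is a small helper that passes a value through a sequence of plain functions. The following is a minimal sketch and not part of any package: pipe, drop_missing and mean are hypothetical names introduced here, and the list of numbers merely stands in for a data frame column. pandas also provides a DataFrame.pipe method that hands the data frame to a function you supply, which serves a similar purpose inside a method chain.

```python
from functools import reduce

def pipe(value, *functions):
    """Pass value through each function in turn, left to right."""
    return reduce(lambda acc, func: func(acc), functions, value)

def drop_missing(values):
    # Own helper: remove None entries (similar in spirit to na.rm=TRUE)
    return [v for v in values if v is not None]

def mean(values):
    # Own helper: arithmetic mean of a non-empty list
    return sum(values) / len(values)

# Hypothetical body masses with one missing value
body_mass = [3750, 3800, None, 3250]

result = pipe(body_mass, drop_missing, mean)
print(result)
```

Because the steps are ordinary functions, they can come from any module or be written on the spot, without touching the class of the data being processed.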

Python pipe Package

There is a Python package called pipe that allows piping data using the | symbol,
but it does not work on pandas data frames.
It can, however, be used to define functions that do things similar to what is known from Unix command lines, such as grep:

from pipe import Pipe
import re

@Pipe
def grep(iterable, pattern, flags=0):
  # Yield only the lines whose beginning matches the pattern
  for line in iterable:
    if re.match(pattern, line, flags=flags):
      yield line

lines = ["Hello", "hello", "World", "world"]
for line in lines | grep("H"):
  print(line)
## Hello
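
To see how a | syntax like this can work in Python at all, here is a minimal sketch built on operator overloading. It is an assumption-laden illustration, not the actual implementation of the pipe package: MiniPipe is a hypothetical class written for this post, and it relies only on Python's reflected-or hook __ror__.

```python
import re

class MiniPipe:
    """Hypothetical, simplified stand-in for a pipe decorator."""

    def __init__(self, function):
        self.function = function

    def __ror__(self, other):
        # Python falls back to this for `other | self` when `other`
        # does not implement `|` for this operand
        return self.function(other)

    def __call__(self, *args, **kwargs):
        # Bind the extra arguments now; the piped-in value arrives later
        return MiniPipe(lambda iterable: self.function(iterable, *args, **kwargs))

@MiniPipe
def grep(iterable, pattern, flags=0):
    # Yield only the lines whose beginning matches the pattern
    for line in iterable:
        if re.match(pattern, line, flags=flags):
            yield line

matches = list(["Hello", "hello", "World", "world"] | grep("H"))
print(matches)
```

The left-hand side here is a plain list, which has no | operator of its own, so Python hands the expression over to MiniPipe.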

I include the pipe package as an honorable mention, but since it works differently from
the pipes in R, I do not recommend it.
