Transformation in Trials

Navigating the Shift: From SAS to R in Clinical Trials with Sunil Gupta

Sam Parnell & Ivanna Rosendal Season 5 Episode 11

Why has SAS been the cornerstone of clinical trials for decades, and what is causing the shift to R now? Join us on "Transformation in Trials" as we explore this pivotal transition with Sunil Gupta, a seasoned programmer who has navigated both worlds. Sunil elaborates on SAS's long-standing dominance due to its robust programming capabilities and ease of use for FDA reviewers. However, the landscape is changing as R garners attention for its graphical prowess and collaborative potential. The conversation highlights the growing trend of new graduates versed in R and Python, which alleviates the shortage of SAS programmers and signifies a modernization wave in clinical trials, ultimately aiming to accelerate drug development.

Our discussion takes a deep dive into the collaborative spirit driving this transition, particularly through initiatives like Pharmaverse. This specialized extension of Tidyverse is designed to optimize clinical data workflows within the pharmaceutical industry. Sunil shares insights into how clinical programming is evolving, focusing on standardized data models and the unification of CDISC standards. The conversation underscores the importance of collaboration and resource sharing, allowing organizations to tackle complex challenges efficiently and improve patient outcomes while adapting to R's open-source environment and its growing acceptance in pharmaceutical submissions.

Transitioning from SAS to R is not just about adopting a new tool; it's about mastering new skills and embracing change. Sunil discusses the challenges he faced when learning R, from understanding its functional approach to navigating intricate syntax. He offers personal anecdotes that reflect his journey and the broader industry shift, emphasizing the importance of validation processes and resource optimization. As we conclude this episode, we express gratitude for the platform to discuss these transformative changes and invite listeners to engage with us on future topics, continuing to share success stories that inspire more organizations to embrace the power of R.


Speaker 1:

Welcome to another episode of Transformation in Trials. I'm your host, Ivanna Rosendal. In this podcast, we explore how clinical trials are currently transforming so we can identify trends that can be further accelerated. We want to ensure that no patient has to wait for treatment and that we get drugs to them as quickly as possible. Today, we're going to be focusing on the topic of submissions using R instead of SAS, and in the studio with me today I have Sunil Gupta, who is an experienced SAS programmer and now also an experienced R programmer. Sunil, I'm very happy to have you here. Setting the stage, could you tell us more about why SAS has been the default programming language for clinical trials and how R is now entering the stage?

Speaker 2:

Yeah, I think that's a very good question. I've been in the pharmaceutical industry for well over three decades, and SAS has been the leader. It's the standard for how to do clinical trials, and I think FDA reviewers predominantly use SAS. They're used to seeing SAS output, and so SAS was greatly embraced and used by pharmaceutical companies and CROs. It was developed there in Cary, North Carolina, and so it's a strong research and analysis type of company. And I think that R has been gaining a lot of traction, a lot of interest.

Speaker 2:

More recently, I first learned about R about four or five years ago, and I noticed that one of the features people would mention about R is the ability to generate graphs a lot more easily than in SAS.

Speaker 2:

Of course, SAS has got a lot of graphic capability, and there's a language in itself for programming the graphic customization. I first heard about R along those lines, that graphs can be easily produced, and now R has gained such tremendous traction and momentum in the industry that you can't ignore it. You have many things going on in R. As you go to the conferences, you see a lot of papers talking about it, and with the Pharmaverse packages that are there too, a lot of organizations are completely embracing it. So we're going through a big transition from what we've been doing traditionally for three decades into something new, new territory. But I think we're going in the right direction, because there's a tremendous amount of collaboration that we have not seen before, and R is really introducing that to us in this field.

Speaker 1:

That makes sense. What was it that made SAS so distinct within this space in the first place?

Speaker 2:

Yeah, SAS's strength really lies in the strong programming language you have. You have the things that you need to bring data into SAS, so you have data input, data management, data analysis, statistical modeling and then the graphs, and you have the components of macro programming to build applications.

You also have a dashboard, so you have modules that enable you to be extremely productive in SAS. And, of course, SAS has got the history with the mainframe and its versions, and so people got very used to using SAS in order to accomplish the goal. They also went into the vertical market of the life sciences, dedicated to having an environment that's more controlled, where we have versioning going on and documentation and so forth. They also got into Enterprise Guide. So they've invested tremendously, and SAS has always done that throughout the years: invest tremendously in research and development, and listen to customers and clients about exactly how they're using their products. And so they've been a leader, not just in the pharmaceutical industry but in entertainment, energy, utilities, manufacturing, all across the board, because they saw the need for analyzing massive amounts of data to make decisions, and also in the credit industry, with the transactions that are going on.

Speaker 1:

And how would one create graphs when the data has been processed in SAS?

Speaker 2:

Yeah, so of course there's ODS, which enables you to create various types of output files, but there is a language in itself. In that language there's a procedure, and within the procedure there's various syntax and various options. Previously there was a concept called annotating, where you have to construct a data set that basically has all of the parameters and the conditions to place the items that you want on the graph. It's kind of a separate process that you have to learn, so there's a bit of a learning curve there.

Speaker 2:

That was the initial part of it, and then the templates themselves kind of expanded on that, so it required a little bit more knowledge of the various types of options that are available. That's how you can really customize things in SAS. In R, they're using predefined templates. They know the types of analysis that users, data scientists, want to perform, and so they have these templates that you can add on to. So I think they're designed more towards the user, versus more of a programming language that you have in SAS.

Speaker 1:

And how are people usually trained in SAS? When starting in pharmaceutical companies, did they already come pre-trained or did they get training on the job?

Speaker 2:

Yeah, SAS was, and I think still is, but more so before, predominantly used in schools. In graduate schools, epidemiologists and statisticians would use SAS, so they're very comfortable, very familiar with SAS. But in the past five or six years, or maybe longer, I think there's been a shift, with graduates moving more towards R, and Python as well, where you see more people using R versus SAS in order to get similar types of things done. So you see more graduates who understand and use R, and R has really grown a lot in the data science field and many of the technology areas. SAS is still there, but you see more R graduates. And one of the challenges the pharmaceutical industry faces has been the shortage of SAS programmers. Now we have R programmers available from school, so that helps fill that need.

Speaker 1:

And we have lemons, we can make lemonade.

Speaker 2:

Exactly.

Speaker 1:

I'm actually curious. So R has many applications in data science also, but in your mind, what is the key difference between data science and clinical programming?

Speaker 2:

Yeah, good question. In my search and understanding of R, I found a lot of examples in data science, which is really great. But clinical programming, I would say, is more focused on what I consider data frame programming, SQL-type processing: not so much creating dummy data, but more along the lines of processing, selecting variables, filtering, doing group processing.

Speaker 2:

Basically, in our area of the pharmaceutical industry, what we have to do is create SDTMs, which are a standardized version of the raw data coming in. In some sense there are similarities, obviously, with data science, because you're still concerned about the quality of the data and you have specifications that you need to follow. But we're much more interested in data frames versus matrices, for example, so I think we focus more along those lines. And then there's the clinical trial process, the creation of the SDTMs. Many organizations follow best practices in how they do that, so they have systems in place to leverage metrics. Applying that in an R setting could pose some challenges initially, but I think organizations have gone to the extent of developing the Pharmaverse packages to help facilitate that.

Speaker 1:

I want to go back to one of the things you said earlier, which is that R fits better with the increased collaboration in our industry. Maybe you can comment more about what is this collaboration and what form does it take?

Speaker 2:

Yeah, no, I think it's tremendous. I've never seen anything like it. In the past, organizations would do volunteer work, like joining CDISC committees to review guidelines, and also PHUSE. But the collaboration that I'm talking about now with R is kind of like an evolution process. They've come to the realization that we're all in the same boat. We have similar challenges, we have similar goals. Why not work together?

Speaker 2:

When CDISC was first introduced, there was a major overhaul in the submission process, because previously, before CDISC, there would be so many variations for the same type of data and domain, which didn't make any sense. So when CDISC was introduced, you streamlined the process. It's kind of like a manufacturing facility. You completely streamline it. So then the domain is going to be exactly the domain, and FDA knows exactly what to do with it. And so when CDISC was introduced, it opened up everybody's eyes, saying, okay, this is a better way to do it. I think at this point what we're seeing is pharmaceutical companies, so you have sponsors, you have CROs, you have CDISC, you have FDA, realizing: oh, we've been doing this for three decades. Let's do something innovative, and let's use that technology to work together and solve things together, so that we can use maybe a package or a function in order to accomplish the same thing once it's already been tested and validated.

Speaker 2:

So if you think about it, then you're able to use the limited resources for more challenging tasks. Why do the routine things that we're very comfortable doing? We can focus more on the more challenging things. So there's a lot of momentum with this going on. And there hasn't been just one package. There are many packages that have already been developed in Pharmaverse, and once people see them, they'll be impressed. It does take time to develop these packages, and organizations are leveraging them, so it makes a lot of sense.

Speaker 1:

I love the analogy of CDISC streamlining data like a manufacturing facility. That's a really great image. I'm going to steal that.

Speaker 2:

Sure. Yeah, for me, when I teach CDISC I always like to give some analogy, because otherwise it's kind of an abstract concept. I try to put some meaning into it and then it can make sense to them.

Speaker 1:

Well, you've mentioned the Pharmaverse a couple of times. What is the Pharmaverse?

Speaker 2:

So for those who are in R programming, everybody knows about Tidyverse. Tidyverse, just like the name implies, is universal. There are so many components within Tidyverse that enable you to do many of the things that I talked about, such as bringing data in, doing data management, SQL processing, analysis, many of those things, and so R programmers definitely need to know and leverage Tidyverse. So then we take an extension of that. We kind of go into the vertical market.

Speaker 2:

So now what we have is the Pharmaverse, which is basically a version of the Tidyverse, but specifically for the pharmaceutical industry. Many of the packages in the Pharmaverse have been built on Tidyverse, but it is dedicated to the creation of SDTMs, to the creation of ADaMs, and to the creation of clinical study reports, and there's a whole pipeline of a workflow process that the Pharmaverse is looking into to fill that gap. So basically you have end-to-end processing, but the Pharmaverse is specifically for the pharmaceutical industry. It's a vertical area, but it works very similarly to the functions that you see in the Tidyverse. And so people who know about Tidyverse, are comfortable using piping and know how to program in R should be able to make the transition to using Pharmaverse easily.

Speaker 1:

That makes sense and that's another example of collaboration right.

Speaker 2:

Yep, that's exactly right.

Speaker 1:

Well, we just talked about the CDISC standards, and I would be curious to learn more. Are there any differences when applying the CDISC standards to data that is programmed in R?

Speaker 2:

Yeah, good question. Organizations have several options. It's not either/or. Different organizations have different interests, priorities, resources and vision. Those who are still heavily invested in SAS, for example, may want to continue with that, but they can still use R Shiny to complement their submission. Their submission may be in SAS, and they can have R Shiny to help engage the reviewer in the review process. So that's one way.

Speaker 2:

Another way is that you can have several components of the process. Take the creation of SDTMs: maybe you want to do that using the Oak package within Pharmaverse, or maybe you want to keep the SDTMs within SAS and use Admiral for the generation of ADaMs. Or there may be SDTM checks that you want to run from R. So there are several things that you can do. Some organizations may want to have a migration plan into using R, and maybe they'll have new studies that will be developed in R, and then maybe they'll do those completely with the Pharmaverse.

Speaker 2:

I think if you're already working on studies that are active, you want to be careful making a migration into R for those active studies. You definitely don't want to delay anything. Timelines are extremely important in our field. But I think that R could be integrated slowly. Maybe R could be introduced as an independent QC. Instead of doing double programming in SAS, bring in R as a QC. You definitely need to train and have that skill set in your teams, people who know R. But once you have that, many organizations may have macros or apply best practices in SAS, so they need to have that same type of mentality and SOP process in R, trying to apply best practices, so you have that compliance. And, of course, generating a high level of compliance in the compliance report is also very important, which can be done with R as well.

Speaker 1:

That makes sense. I'm wondering the increased usage of R does this impact the platforms that pharmaceutical companies use to analyze their data? Most scientific computing environments, at least previously, ran SAS exclusively.

Speaker 2:

Yeah, no, that's true. The thing with R is you have base R, then you have RStudio, which is the great user interface to R, and then there are versions of RStudio that you have on the cloud.

So there are various ways that the structure is there. And I think there's a little bit more administration with R, meaning there are various versions not only of R itself but of the packages that you have and the dependencies, everything that's there. So one of the important things with R, just like with SAS, is what we had to do before: installation, operational and performance qualification. Those still need to be applied when you're getting into R, maybe more so, because of the amount and type of updates and versioning, which is probably a little bit more than you may see in SAS, again mainly because of the packages. And so, yeah, there could be some different structure with the R programming computing environment, and fortunately there are leaders in our field to help organizations set that up, so they don't need to have the expertise in that field but can work with the IT group to set that environment up.

Speaker 1:

And the potentially larger administration. That's one difference between SAS and R. What other differences are there, with R being an open source language?

Speaker 2:

Yeah, good point. There are some differences. One is the way R handles missing data. There is an internal representation, and I encourage my clients and students to leverage that, so that for any missing data coming into R, you make it known to R that this is missing data, and there are functions to do that. Then you can leverage the functions that are specific to checking for missing data. So that's one thing. Another thing is rounding. There are some rounding differences in the way SAS and R behave.

And the whole thing about R is that it's basically a collection of macros, or functions rather, and its functions are very specific to the task that you want to do. When you talk about SAS, obviously there are SAS functions and everything. When you think about SAS macros, the equivalent in R is more along the lines of a user-defined function. So I think that's the kind of analogy that's there.
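To make the rounding difference mentioned above concrete, here is a minimal sketch that is not from the episode. It assumes the commonly documented behaviour that SAS's ROUND function sends exact halves away from zero, while R's round() sends exact halves to the even digit. Python is used only as a neutral, runnable illustration; the function names are hypothetical, and the decimal module is used so that values such as 2.5 are represented exactly.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

def round_half_away(value: str, places: int = 0) -> Decimal:
    """Ties go away from zero (the rule assumed here for SAS ROUND)."""
    quantum = Decimal(10) ** -places  # places=1 -> Decimal('0.1')
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)

def round_half_even(value: str, places: int = 0) -> Decimal:
    """Ties go to the even digit (the rule assumed here for R round)."""
    quantum = Decimal(10) ** -places
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_EVEN)

# Compare both rules on a few exact halves.
for v in ["0.5", "1.5", "2.5", "-2.5"]:
    print(v, round_half_away(v), round_half_even(v))
```

The two rules agree except on exact ties, which is why a value such as 2.5 can differ between a SAS output and an R output during QC. Real results in either language also depend on floating-point representation, which this sketch deliberately sidesteps.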

Speaker 2:

But what I've done, in the process of learning R, is always try to have a direct comparison to SAS. So, how can I do in R what I've been doing in SAS for three decades? One of the things I was very happy to see was SQL-type processing. The dplyr package enables me to continue doing that. Whether I want to select variables, filter, do some group processing or use summary stats, I can do all those things in R, like I've been doing in SAS. And while in SAS you have the data steps and the procedures, what you have in R is, of course, the various packages and the various functions, and then you have piping, which connects related functions together, based on a common data frame, and then you can use the same type of logic in order to generate some type of output results.
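The select, filter, group and summarize steps described above can be sketched as follows. This example is not from the episode: it is written in plain Python so it runs without any packages, whereas in R the same steps would typically be a dplyr pipeline. The records, the ADSL-style column names (USUBJID, ARM, AGE, SAFFL) and the function name are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical subject-level records (made-up values, ADSL-like column names).
adsl = [
    {"USUBJID": "01-001", "ARM": "Placebo", "AGE": 54, "SAFFL": "Y"},
    {"USUBJID": "01-002", "ARM": "Drug A", "AGE": 61, "SAFFL": "Y"},
    {"USUBJID": "01-003", "ARM": "Drug A", "AGE": 47, "SAFFL": "N"},
    {"USUBJID": "01-004", "ARM": "Placebo", "AGE": 68, "SAFFL": "Y"},
    {"USUBJID": "01-005", "ARM": "Drug A", "AGE": 59, "SAFFL": "Y"},
]

def summarize_age_by_arm(rows):
    """Filter to the safety population, then summarize AGE per ARM."""
    # filter: keep safety-population records only
    safety = [r for r in rows if r["SAFFL"] == "Y"]
    # select + group: collect only the AGE values under each ARM
    ages_by_arm = defaultdict(list)
    for r in safety:
        ages_by_arm[r["ARM"]].append(r["AGE"])
    # summarize: one output row per group
    return {
        arm: {"n": len(ages), "mean_age": mean(ages)}
        for arm, ages in sorted(ages_by_arm.items())
    }

summary = summarize_age_by_arm(adsl)
for arm, stats in summary.items():
    print(arm, stats)
```

Each step consumes the result of the previous one, which is the same idea as piping related functions over a common data frame.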

Speaker 1:

Really cool. Have we seen submissions made entirely in R yet?

Speaker 2:

Yeah, I think this was the year to have a proof of concept, and it started off, I think, with parallel SAS and R, and I think that has gone well. FDA has said that they do accept R. Phil at Posit has accumulated a list of submissions, and some of those submissions also include Shiny. So what we are seeing is a list of up to about 30 or so submissions so far, and I think it's going to continue to grow, because as people get more and more comfortable with it, they will be in line to have those submissions. So I think we'll be seeing more of those.

Speaker 1:

Really cool and exciting development, for sure. If we have existing SAS programmers in a pharmaceutical company, should they be learning R?

Speaker 2:

Yeah, I think so. I know it was definitely a challenge for me, because I'm so used to and comfortable with SAS. Anything that I want to do there, whether it's a transpose, I can do very easily. When you're going to a new language, you're kind of starting off from scratch. You're in a jungle and you have to find your way. You make mistakes.

Speaker 2:

One of the things, not necessarily just with R, is that when you're learning another language, it gives you a sense of, okay, there's more than one way of doing things. With this other approach, there may be some benefits. There are differences, and what are the benefits and what are some drawbacks? It opened up my eyes to see, okay, this is a different way of doing it and accomplishing the same type of things. So I think my skill set expanded, and my understanding of how data processing works got a little bit better. And not only that, it opened up some doors for me to do things that I hadn't done before. And so you have that sense of motivation, that you want to learn more, especially when I'm seeing so much activity in R.

Speaker 2:

My curiosity was, okay, why are so many people getting so involved with R? Initially I found R to have a bit of a steep learning curve, mainly because I was so used to doing things one way using SAS. But I realized that if I stay focused, there is an end goal, and there are some real benefits there. And I realized there are quite a few benefits in what you can do with R once you understand the syntax, because the syntax is a little bit different. One of the things that I found very powerful in R, which is not there in SAS, is the data frame options, where you can identify and select records or variables. They can be used quite extensively. So I found it to be a very, very powerful feature within R.

Speaker 1:

And what motivated you to learn R yourself?

Speaker 2:

Yeah, my students and my clients asked me if I offered a class in R, and at that time I didn't. The only classes I taught were CDISC and SAS. So I looked at the interest that was going on in the industry, and with the interest from my clients asking me about it, I did a deep dive. I put together a website, r-guru.com, basically to help me better understand R.

It's a wiki that I designed so that I was able to capture many of the things that I learned, and I learned things in bits and pieces. I categorized them and placed examples in there, and with the keyword searches and navigation, I continued to build and enhance it. It's based on the model that I had with sassavvy.com, which I built over 11 years ago as a consultant in the field, as a resource for me to not reinvent the wheel. I can become much more productive by accessing that. And so, with R-Guru, I was able to put together content so that I could better understand how R works, with all the details and links for more information. From that I was able to put together a curriculum, which is an eight-week online class that assumes people have no or some knowledge of R, and also teaches them how to create SDTMs and ADaMs using the Tidyverse.

Speaker 1:

And when people take your course, what are some of the things that they struggle with or stumble upon and then suddenly realize, ooh, this is how things work.

Speaker 2:

Yeah, I think one of the things, myself included, is just understanding the syntax. Take a look at, for example, the data frame options, that syntax. When you're bombarded with a lot of examples, you see the syntax and you see the outcome that it produces, but there's no clear explanation as to why the output is produced with that syntax. And so people get very confused. What I do is group similar types of syntax by some type of behavior, and I kind of reverse engineer. I take a look: okay, this is the outcome that's being produced. Why is it producing it this way? So then I take a look at the syntax and figure it out, because with a programming language, logic and rules convey everything, and so I'm able to explain concepts a lot more easily to them. But it does require some time and patience to thoroughly understand the syntax.

Speaker 2:

The other thing that I find with R is the heavy use of indices. I like to reference variable names. I think they're very explicit and easy to understand, read and maintain, versus indices. And people coming into my class get a bit frustrated because R is not so straightforward or easy to learn. While there are many examples, they just don't explain things in very basic concepts, and so I understand their pain, because I've gone through it myself, and I can explain those things. From that I'm able to compare and contrast and find differences, like, for example, mutate versus summarize and the behaviors and outcomes from each, and that further enhances my understanding and their understanding of how to master R, just like they have been mastering SAS for all these years.
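The mutate versus summarize contrast mentioned above comes down to row counts: a grouped mutate adds a column and keeps every input row, while summarize collapses each group to a single row. The sketch below is not from the episode. It mimics that behaviour in plain Python rather than calling dplyr, and the records, column names and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical analysis values per subject (made-up numbers).
rows = [
    {"USUBJID": "01-001", "AVAL": 5.0},
    {"USUBJID": "01-001", "AVAL": 7.0},
    {"USUBJID": "01-002", "AVAL": 4.0},
]

def group_means(records):
    """Mean AVAL per subject."""
    values = defaultdict(list)
    for r in records:
        values[r["USUBJID"]].append(r["AVAL"])
    return {subj: sum(v) / len(v) for subj, v in values.items()}

def mutate_like(records):
    """Add the subject mean as a new column; the row count is unchanged."""
    means = group_means(records)
    return [{**r, "MEAN_AVAL": means[r["USUBJID"]]} for r in records]

def summarize_like(records):
    """Collapse to one row per subject."""
    return [{"USUBJID": s, "MEAN_AVAL": m} for s, m in group_means(records).items()]

print(len(mutate_like(rows)), "rows after the mutate-style step")
print(len(summarize_like(rows)), "rows after the summarize-style step")
```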

Speaker 1:

Well, the best teachers are the students who once struggled themselves.

Speaker 2:

Yep, I agree. Yeah, one of the early learning things for me was when I created what I expected to be a data frame and got a matrix instead. So then I immediately looked into it: why is it creating a matrix? Then I realized the root of the problem. With cbind, if you don't have a data frame as one of the arguments there, it will create a matrix. But if you do have a data frame, then it creates a data frame. So from that point on, I actually used data.frame in order to create my data frame instead of cbind.

Speaker 1:

Yeah, that's a fun learning experience. What about the whole clinical programming space? How did you get into it in the first place?

Speaker 2:

Yeah, I've always had an interest in this area. Before pharmaceuticals, I was in the medical device field. My master's is in bioengineering and my bachelor's is in applied mathematics. And so, starting with the medical device field, I've always been in clinical trials, because I wanted to apply my programming and analytical skills to help produce better quality data for a better quality of life, and I think that it's a really good combination.

Speaker 2:

I was really lucky to be in this field working for top pharma companies and top CROs, and I'm very much a hands-on person, project manager and mentor, because I always like to help people get into this field and grow in it. I think people may be comfortable in a specific niche, but I always like to expand their horizons. Do they really know and understand PROC SQL, and are they leveraging metadata to the fullest? Do they really understand that? Because that's a really higher level of thinking. And so I'm excited to share what I've learned with others. People taking my class could be fresh graduates or seniors in the field who want to have that master concept of specific tasks, so I'm able to offer all these things.

Speaker 1:

That's awesome. Well, as we start rounding off, we always ask our guests the same question in the end, and that is: if we gave you the Transformation in Trials magic wand that has the ability to change one thing in our industry, what would you wish to change?

Speaker 2:

Yeah, I appreciate you asking me that. As I'm learning more and more about R, I think that, as organizations are making the transition into R, sharing success stories will be helpful: the benefits of what they're doing with R, but also the process, how they got to that point, what challenges they encountered, what things they questioned, what changes they made in the infrastructure, lessons learned, which packages they use, and how they go about the validation of the packages. I think there's a gap there that really hasn't been communicated. What we see is the packages that are there, and there's good documentation of the packages, and people can see, OK, this package will produce this result for me.

Speaker 2:

But I think we need a little bit more attention on the initial questions that everybody has, which I also had: the validation and how to actually go about the implementation, and more success stories to help educate others who are really thinking about it. I know I've talked with some organizations, and they decided that they're going to keep what they've been doing in SAS because they have limited resources, but they will use R for Shiny, which is a great example of leveraging limited resources and still making use of R in a way that adds value that was not there before. So I think success stories like that will help convey the message that R is there, and it's not a complete replacement of SAS unless you want it to be. It's about educating more people about the things that really can be done with R and the process of achieving that.

Speaker 1:

That's awesome. I think that can help more companies understand how to create value with R. Well, Sunil, this was a great conversation. If our listeners have follow-up questions or want to reach out to you, where can they find you?

Speaker 2:

Oh sure, yeah. You can see me on LinkedIn. I'm there, and I post a good bit on there. Feel free to connect with me. I'm happy to connect and help with any questions that you may have. And also, of course, r-guru.com is my website. The other one is sassavvy.com. So please feel free to visit those and reach out to me with any questions that you may have. And thank you for this opportunity to be part of your podcast. I'm a fan, I listen to your podcast. I think it really helps to communicate many of the things that are going on in our industry. So, thank you.

Speaker 1:

Thank you so much, Sunil. This was a pleasure. Thank you. Take care. You're listening to Transformation in Trials. If you have a suggestion for a guest for our show, reach out to Sam Parnell or Ivanna Rosendal on LinkedIn. You can find more episodes on Apple Podcasts, Spotify, Google Podcasts or in any other player. Remember to subscribe and get the episodes hot off the editor.
