Transformation in Trials

Harnessing the Power of Open-Source in Pharmacovigilance with Lionel Van Holle

Sam Parnell & Ivanna Rosendal Season 4 Episode 9


Ready to lift the lid on the untapped potential of open-source technology in clinical trials? Strap in for a fascinating journey with our guest Lionel Van Holle, the founder of Open Source PV. We'll unveil the transformative power of open-source tech in the life sciences industry, including its innate transparency, parallel development capabilities, and cost-effectiveness. Lionel shares insights into the diversity of SAS, R, and Python programmers involved in this field, revealing how the open-source revolution is just around the corner.

The collaboration between code and pharmacovigilance is changing the game, and we're excited to share this revelation with you. Tune in as we navigate the complexities of finding the right code packages and understand the nuances of search engine keywords. We also uncover the tremendous impact of the FDA's move towards open-source data and algorithms, highlighting the future of industry-wide collaborations. Concluding our discussion, we delve into the promise of machine learning and automation in medical writing, demonstrating how tech is poised to interpret data and create automated reports, potentially revolutionizing the way we see medical documentation. Join our enlightening conversation and discover the next frontier in life sciences.

Guest:
Lionel Van Holle



Speaker 1:

You're listening to Transformation in Trials. Welcome to Transformation in Trials. This is a podcast exploring all things transformational in clinical trials. Everything is off limits on the show and we will have guests from the whole spectrum of the clinical trials community, and we're your hosts, Ivanna and Sam. Welcome to another episode of Transformation in Trials. Today, in the studio with us, we have Lionel Van Holle, who is the founder of Open Source PV. Hi, Lionel.

Speaker 2:

Good morning.

Speaker 1:

Today we're going to dive into the topic of open source solutions for PV signal detection. And Lionel, setting the stage for our audience, can you tell us more about the opportunities of open source technology, such as R, in life sciences?

Speaker 2:

Yes. So I think first it's an opportunity in life science to get away from SAS, which has really been there for a long time as a kind of monolithic software in the industry. So it opens the gates to different languages, and each of them has different strengths and weaknesses. So it's good that life science can now pick each one up where it's most useful, first of all, and then we get the usual benefits of open source technology. First there is the transparency: we can see what code is behind it. We can also benefit from parallel development.

Speaker 2:

For example, when you have open source development, there is no software roadmap where they pick up the features which are the most requested by the different potential users. It can go in all directions, and that's very good. So you can benefit from what has been developed in the field of agriculture or in other areas outside life science, and it really helps with, I would say, cross-fertilization, like benefiting from other fields in terms of methods or what has been developed. Another advantage is the limited cost. The SAS licenses are very expensive, whereas open source technologies like R or Python come at no direct cost. There is maybe some training cost or maybe some validation cost, but no license cost. So those are the opportunities I see, and it's already a good start and a good motivation to dive into open source technology in life science.

Speaker 1:

I would be curious to hear your opinion on why we have been so dependent on SAS before. Why are we only looking at adopting open source technology now?

Speaker 2:

Yeah, I think it boils down to the fact that SAS has been kind of pre-approved by regulatory authorities. They don't really challenge it when you submit a dossier that has been generated by SAS. They don't challenge every procedure. They kind of trust SAS. So it's the easy way, and I think the pharma industry followed the easy way because there is a lot at stake at that moment. Why would you jeopardize, for example, 10 years of product development just by making a bet on an alternative programming language? But now there are more and more examples of collaboration between the FDA and some big pharma companies. So regulatory authorities are also sending signals that they are open to getting submissions in open source languages.

Speaker 1:

That is an interesting development. In your experience, does that also mean that life science companies are becoming more comfortable with using open source?

Speaker 2:

Yeah, I think now we have seen a company like... which is the biggest one? They even developed R packages. Roche, that's the name, exactly, Roche. So they were really at the start of this movement, especially in the clinical trial setting, even developing specific packages for generating ADaM and SDTM data sets, and they are collaborating with the FDA. So, yeah, I think there was also a news statement that the first dossier submission entirely made in R was done a few months ago or a year ago, so it's really moving forward.

Speaker 2:

It comes with a few challenges, like how do we validate potentially obscure libraries we can find in R when they are not on CRAN? But there is definitely openness, and the designs of clinical trials are becoming more and more complicated, so we can really benefit from what has been developed, whether it is in academia or in some other places. So yeah, there is definitely an opening and some success already.

Speaker 1:

I would also be curious to hear your thinking about the competencies required within a pharmaceutical company to make use of these open-source technologies, because historically we've had a lot of SAS programmers who are comfortable with SAS. So what do you need to be able to use open source?

Speaker 2:

I'm going to make myself a lot of enemies. But initially the SAS programmers were not programmers by training. They didn't follow an academic path in programming or IT. They were coming from different places. They were coming from biology, medicine and so on. So it was already a group of very diverse people who learned some programming very late, after their studies, and they were used for programming activities in the life science industry to start with. So that was my experience, like 15 years ago. We were coming from very different fields and very diverse profiles, and almost none of us were coming from IT or from informatics or something like that.

Speaker 2:

So SAS is already a bit incompatible with the programming landscape. It's a bit strange. So with R it's even easier, because it's a bit similar, but those who come with R or Python programming skills have, I would say, a better programming profile, because it's an actual language. SAS is a bit... it's a bit strange. It's really for almost a specific purpose.

Speaker 2:

So there is no difficulty to adapt, and the beauty of it is that you have a huge supply of potential R and Python programmers compared to SAS, where what was needed was to have people skilled in SAS. And these trainings were notoriously not given in academia, because it's a very expensive license and they didn't buy this license. So they prefer to teach R or Python to the students. So you have, by default, a bigger population of programmers coming out of schools and universities. So it's even easier for the pharma companies to get R profiles or Python profiles, compared to SAS profiles, where they needed to invest some time and money in educating them in the SAS language. So it's even easier, I would say, as a summary of my long elaboration.

Speaker 1:

Thank you, that makes sense. And, Lionel, we met at a conference where you were telling me and the other conference participants about this cool solution that you built for signal detection for pharmacovigilance. Can you tell us more about the solution that you built?

Speaker 2:

Yes. So the objective of Open Source PV is really to build a suite of tools in pharmacovigilance, a suite of tools using open source technology solely, as an alternative to what currently exists on the market. For the moment, pharma companies, say they have a good pipeline with 10 products, for example, with some of them in a post-marketing setting, have one choice if they want to perform quantitative signal detection. Either they buy some suite, which is very expensive, for good reason, because it's a niche, so they cannot really make it cheap: it requires investment to be built and there are only a few customers. Or they build it on their own, but again, maybe they don't have the time or the resources to do it internally. So what I want to do is really provide an alternative, which is building a suite of tools using open source technology. The effect is that it's going to be relatively cheap, because you don't have license costs, and you can benefit from parallel development. I can build, for example, features myself that are not really picked up elsewhere because they would not satisfy many users, or some others could build new features, and then we have some parallel development of new functionalities. That is kind of nice, because otherwise, in traditional software development, there are very, very few customers and the vendors really want to invest in the biggest and the most appealing features.

Speaker 2:

And for the moment there are some modules out there. So there's the spontaneous report database for vaccines, performing the steps of data management, so cleaning data, removing duplicates, mapping to the latest version of the MedDRA dictionary, and then on top of that you have some signal detection methodology: the traditional disproportionality, and also a method I developed 10 years ago, time-to-onset signal detection. And then you have a Shiny app on top of it, which allows the final users to dive into what has been flagged as quantitative signals and understand better if there is some potentially causal relationship. I have the same for FAERS. So VAERS for vaccines, FAERS for drugs.
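The data management steps mentioned here (cleaning, removing duplicates, mapping to the latest dictionary version) could be sketched roughly as follows. This is a minimal Python illustration, not the code of the tool discussed: the `Report` fields, the `TERM_UPDATES` mapping and its single entry are all hypothetical, and real duplicate detection in spontaneous report databases is usually fuzzier than the exact matching shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """One product-event line of a spontaneous report (hypothetical layout)."""
    case_id: str
    product: str
    event_term: str


# Hypothetical lookup from outdated event terms to current dictionary terms.
# The entry below is invented for illustration only.
TERM_UPDATES = {"pyrexia nos": "pyrexia"}


def clean(report):
    """Normalize whitespace and letter case so equal values compare equal."""
    return Report(
        case_id=report.case_id.strip(),
        product=report.product.strip().upper(),
        event_term=report.event_term.strip().lower(),
    )


def remap(report):
    """Replace an outdated event term with its current equivalent, if any."""
    term = TERM_UPDATES.get(report.event_term, report.event_term)
    return Report(report.case_id, report.product, term)


def prepare(reports):
    """Clean, remap, then drop exact duplicates, keeping first-seen order."""
    seen = set()
    result = []
    for raw in reports:
        record = remap(clean(raw))
        if record not in seen:
            seen.add(record)
            result.append(record)
    return result


raw_reports = [
    Report("C1", " vaccine a ", "Pyrexia NOS"),
    Report("C1", "VACCINE A", "pyrexia"),
    Report("C2", "Vaccine B", "Headache"),
]
prepared = prepare(raw_reports)
```

The order matters in this sketch: duplicates are only removed after cleaning and remapping, so two lines that differ only in spelling or dictionary version collapse into one.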

Speaker 2:

I also built a standalone signal detection system where the pharma companies can load their own data, so you can easily export the minimal data required for running disproportionality analysis. It's just the product field, the event field, the case field and a few other fields if you also want to benefit from visualizations. And they load it. It's a basic Shiny app: you upload this file, and then after a few seconds it's going to generate the disproportionality scores and some visualizations. A bit more minimalistic than the other, but it does the job and it doesn't require an IT project.
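To make the "product field, event field, case field" idea concrete, here is a minimal Python sketch of a disproportionality calculation on such a minimal export. It is an illustration under assumptions, not the standalone app's code: the function names and the sample rows are hypothetical, and the proportional reporting ratio (PRR) and reporting odds ratio (ROR) are used as two common disproportionality measures, since the transcript does not say which scores the app produces.

```python
from collections import namedtuple

# 2x2 contingency table for one product-event pair, counted in cases:
# a: product and event, b: product without event,
# c: event without product, d: neither.
Table = namedtuple("Table", "a b c d")


def two_by_two(rows, product, event):
    """Build the 2x2 table from (case_id, product, event) rows."""
    has_product, has_event, all_cases = set(), set(), set()
    for case_id, prod, ev in rows:
        all_cases.add(case_id)
        if prod == product:
            has_product.add(case_id)
        if ev == event:
            has_event.add(case_id)
    a = len(has_product & has_event)
    b = len(has_product - has_event)
    c = len(has_event - has_product)
    d = len(all_cases - has_product - has_event)
    return Table(a, b, c, d)


def prr(t):
    """Proportional reporting ratio; None when it is undefined."""
    if (t.a + t.b) == 0 or (t.c + t.d) == 0 or t.c == 0:
        return None
    return (t.a / (t.a + t.b)) / (t.c / (t.c + t.d))


def ror(t):
    """Reporting odds ratio; None when it is undefined."""
    if t.b == 0 or t.c == 0:
        return None
    return (t.a * t.d) / (t.b * t.c)


def is_flagged(t, prr_threshold=2.0, min_cases=3):
    """Illustrative screening rule; both thresholds are assumptions."""
    score = prr(t)
    return score is not None and score >= prr_threshold and t.a >= min_cases


# Hypothetical export: one row per case, product and event.
rows = [
    ("1", "X", "fever"), ("2", "X", "fever"), ("3", "X", "rash"),
    ("4", "Y", "fever"), ("5", "Y", "rash"), ("6", "Y", "rash"),
    ("7", "Y", "rash"), ("8", "Y", "headache"),
]
table = two_by_two(rows, product="X", event="fever")
```

In this toy data the pair is disproportionately reported but not flagged, because only two cases support it; a minimum case count is one simple guard against acting on chance findings.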

Speaker 2:

I love my IT colleagues, but the idea is also to be more agile. If you want to set it up quickly and quickly assess the benefits, maybe before diving into a full IT project, with full integration between the spontaneous report database and this new system, then you can quickly benefit and assess the value with this product. And I have a few others in the pipeline, too many for my free time. But I would also like to invest in visualization of clinical trial data for safety review. Some nice work has already been done, and I would like to make good use of it to complete the suite.

Speaker 1:

How has this been received by pharmaceutical companies? Have people been excited that they can do things themselves without having to do an IT project?

Speaker 2:

Yes. So far, for the standalone signal detection, I got one customer who was really happy about the fact that they could benefit from time-to-onset signal detection relatively easily, without diving into a huge project or trying to convince the software provider that it's going to be useful. Because, to be honest, this time-to-onset method works better for vaccines than for drugs. So it's not very interesting for the traditional software companies to invest in it, because they suspect that most of their customers will not be interested. So if they do it, they charge a lot for the request, because they see it as a feature that's going to be useful for only one customer, and generally they even consider it a derailer. They want to have a roadmap that's not going in all directions. So the customer was happy to have this ability to quickly see the results. But yeah, I feel a bit more like an artisan than an industrial when I'm doing this kind of project, like saying, yeah, I can carve something like that. It's a bit like an artisan sometimes, but it's good, I like it. And yeah, otherwise, one of the difficulties is also deploying these solutions.

Speaker 2:

So, for example, the standalone signal detection where they can upload their own data: I had to find this solution because there is generally some concern about data sharing. These data are very sensitive, and sharing them requires ISO or SOC 2 certifications, and it was very complicated for me to invest time in trying to get these certifications. And I found this nice trick where they upload their own data to the system. So I can provide the code standalone, and they can then deploy it internally on their own EC2 instance on Amazon. They control the environment, they upload, and there is no data sharing. The other two solutions work with open data. It's the FDA or the CDC releasing the data, so it's open, and we don't have these potential concerns. But for a pharma company there is the potential concern. So the upload-your-data-yourself solution is kind of a nice one, an in-between way to mitigate the risks, and, yeah, I like it.

Speaker 1:

And that's a smart solution for a problem that could become very complicated.

Speaker 2:

Yeah, and I see it coming more and more, like the Shiny apps where people can upload their own data. I see it more and more often, so it's not a solution unique to me. I think many people are converging on this way of doing things to mitigate the risk of data sharing.

Speaker 1:

How did you land in the pharmacovigilance space? What made you curious about this specific kind of data?

Speaker 2:

Oh yeah, I worked in that field for 10 years before starting Open Source PV. I discovered the field, when was it, in 2010 or something like that, and I built an internal solution for GSK Vaccines. At the time they stopped using Oracle products and I advocated that it could be done internally, because I was a SAS programmer and I was like, yeah, we can do the job of data management with SAS and we can do the visualization with Spotfire, so let's combine something together. And it was the solution used for four or five years at GSK Vaccines.

Speaker 2:

And then when I left, I was like, oh, I still want to be able to master this kind of tool, but let's do it in open source, because I could not really afford to pay for SAS and Spotfire licenses for a pet project. So I did it, and it was working, and it attracted some customers. And now, looking back, I think R can do more things than SAS, and Shiny can do more things than Spotfire. And it's the same environment. And when I see the development of new features in Shiny over the years compared to the development of new features in SAS, I guess I bet on the right horse.

Speaker 1:

Yeah, let's talk more about this parallel development, where multiple teams or people are developing different functionality on these Open Source platforms. Maybe very basic question, but how would one find the relevant type of code that one needs?

Speaker 2:

Oh, yeah, it's very difficult to find the right package. There is documentation, of course, but you still rely on your search engine to find it. And what are the good keywords to put in Google? In that sense ChatGPT, at least, is good, relying less on the keywords you are using. That's a nice improvement. I didn't try to use ChatGPT to look for a nice package. Also, there is this problem that what's feeding ChatGPT is not really recent data, so I guess you might miss a recent package released in 2023. But, yeah, it's very difficult.

Speaker 2:

Sometimes, me, I go to conferences. I went to a very nice user conference in Avignon, in France, this summer. It was really great, and actually sometimes you listen to people describing a package about ecology and you are like, yeah, but it could also do the job in pharmacovigilance. They don't advertise it for pharmacovigilance, of course, because maybe they don't even know that pharmacovigilance exists. But you can think, oh yeah, it can be repurposed, it can be used, and that's what I meant by cross-fertilization. You see, they face similar issues or build statistical models that answer a similar question, and these can be repurposed, or visualizations too.

Speaker 2:

It's really evolving very fast, and it's always a bit complicated to know how the question would be best answered, through quantification or through visualization. I always tend to prefer quantification, because it gives a definite answer and you can build actions on it when it reaches a threshold. Visualization still relies a lot on people actually looking at data, even if it's at an aggregate level, and it can very quickly become tedious with a dashboard, always reviewing the same thing. But using the right visualizations and finding the right quantification is something where you have to keep looking for the best mix and be ready to change depending on what has been developed somewhere else. So you keep looking.

Speaker 1:

Is there a lot of collaboration between pharmaceutical companies trying to build this space or understand which packages are best?

Speaker 2:

There are a lot of collaborations, but not yet in the pharmacovigilance space, unfortunately. You have it in real-world evidence. There is this very great initiative, OHDSI, so it's O-H-D-S-I, and that's the new name; before, it was called OMOP. And 10 years ago, or more than 10 years ago, they built a common data model for all the different data sources in real-world evidence, whether it is MarketScan or other claims databases, and then they built methods on top of it. And it was really the big pharma companies collaborating in building solutions, exchanging the code, making it open source. It was very great for real-world evidence. They touched a little bit on pharmacovigilance, because real-world evidence and pharmacovigilance sometimes use similar data sources. It was a great effort, very, very nice, very well built, no funding. It's crazy the amount of things they were able to achieve without funding, just with the time and work of the different stakeholders, having the same data, building the code and benefiting from it. It's really amazing. You can check it out. So it's ohdsi.org, I think. Maybe you can put a link to it. People will be interested, and they released most of the algorithms in the open. So that was a good collaboration.

Speaker 2:

Now they collaborate, as I told you in the beginning, on creating packages to create the ADaM and the SDTM data sets. There are great packages. I'm no longer in the clinical trial field, so I couldn't really tell you the names, but I can give you the references afterwards to put in the podcast notes if you want. So there are great collaborations. For pharmacovigilance, not so much, unfortunately. The biggest move forward I saw was the release of the most complicated disproportionality analysis method by two people from the FDA as a package. So it's openEBGM. The method is the Multi-item Gamma Poisson Shrinker and the estimate is the EBGM, so the empirical Bayes geometric mean. And they released this algorithm, which was before something that only Oracle had. Oracle hired Bill DuMouchel, who was the developer of the method. He developed it in 1999. And they hired him, and somehow they were the only software company able to release this method. This method is not especially better than the three other methods, but the FDA was using it, and pharma tends to go along with the FDA.
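For readers curious what an EBGM score is numerically, here is a minimal Python sketch of the published formula, not of the openEBGM package or of any vendor implementation. It assumes the usual model in which an observed count N is Poisson with mean lambda times the expected count E, and lambda follows a mixture of two gamma distributions. The five hyperparameters are taken as given here (in practice they are estimated from the whole database), and the values used in the example are arbitrary.

```python
import math


def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += (math.log(x) - 0.5 / x
               - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))
    return result


def log_marginal(n, alpha, beta, e):
    """Log probability of N = n under one gamma component (negative binomial)."""
    return (math.lgamma(alpha + n) - math.lgamma(alpha) - math.lgamma(n + 1)
            + alpha * math.log(beta / (beta + e))
            + n * math.log(e / (beta + e)))


def ebgm(n, e, alpha1, beta1, alpha2, beta2, p):
    """Empirical Bayes geometric mean: exp of the posterior mean of log(lambda)."""
    l1 = math.log(p) + log_marginal(n, alpha1, beta1, e)
    l2 = math.log(1.0 - p) + log_marginal(n, alpha2, beta2, e)
    m = max(l1, l2)  # subtract the max before exponentiating, for stability
    w1, w2 = math.exp(l1 - m), math.exp(l2 - m)
    q = w1 / (w1 + w2)  # posterior weight of the first component
    expected_log = (q * (digamma(alpha1 + n) - math.log(beta1 + e))
                    + (1.0 - q) * (digamma(alpha2 + n) - math.log(beta2 + e)))
    return math.exp(expected_log)


# Arbitrary, illustrative hyperparameters: 20 observed reports, 2 expected.
signal_score = ebgm(n=20, e=2.0, alpha1=0.2, beta1=0.1,
                    alpha2=2.0, beta2=4.0, p=1.0 / 3.0)
```

With these inputs the raw observed-to-expected ratio is 10, and the score comes out below that. This is the "shrinkage" in the method's name: the prior pulls the estimate toward what is typical in the database, which matters most when counts are small.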

Speaker 2:

So if the FDA is using it, it must be good. Let's use the same thing that they are using. And the FDA made the effort to build their own algorithm and release it in the open. It still puzzles me whether it's an official move from the FDA, but the two contributors of the package put their FDA email addresses. So I think it's a sign. Normally they should get the authorization from the FDA to publish. I suspect so, but it's not clear.

Speaker 2:

That's one of the big shifts, I would say, where they say: look, it's in R, we release it, and you can be software agnostic using this if you want. And they also release the data in public, at least for VAERS, which is kind of a big step, because for the last 15 years we faced so many questions about PII and what we can share. But for that whole time VAERS data were made public by the FDA, and they anonymized it and so on. They do some things to protect PII, but it didn't kill this initiative of open data. So the FDA provided open data for VAERS and they provided an open algorithm in R. It's not coming from the industry, but I think they somehow show us the way: the data could be open, the algorithm could be open, and you could do the work. And I built a solution on top of it. On top of VAERS, I used openEBGM. So the foundation is thanks to them.

Speaker 2:

Otherwise I would have had no data at that time. I would have had no data to build a solution on top of. So they show us the way. I just hope that in pharmacovigilance we are going to pick it up the same way they did for real-world evidence and clinical trial data. I think one of the reasons it's slower is that we have fewer programmers in this area. As simple as that. It's not a tradition to have programmers. It's really recent. Before, there were just users of BI tools and things like that, but no real programmers. Now it comes. Or I am an anomaly, I don't know.

Speaker 1:

But it is interesting that an area as data heavy as pharmacovigilance has traditionally not had specific programmers for this space.

Speaker 2:

No, they were using suite software and BI tools on top of it, but not really programmers. I never really experienced it. No, I used my programming skills more when I was in pharmacoepidemiology. Pharmacoepidemiology is a field between epidemiology and pharmacovigilance, and it's not clear in each organization whether pharmacoepidemiology should sit in pharmacovigilance or in epidemiology. It depends on the company. But that's where you have more chance to find programmers, because sometimes you have post-authorization safety studies or things like that, and it's basically a bit like the programmers in clinical trials: you have your primary outcome, you want to see the differences between groups, so you have similar activities, and then you start programming just to get the results of your study. But pure pharmacovigilance is the run of the algorithm and generic reports, which BI tools can do, and I guess there were no specific needs. But it changes over time, and now programming comes as a potential alternative to costly software, so some companies might invest more internally in programming profiles.

Speaker 1:

In your mind and based on your understanding of where the pharmacovigilance technology space is moving, how do you think the technology landscape could look in a couple of years from now?

Speaker 2:

Yeah, well, everyone is saying there's going to be a shift with AI and machine learning and so on. I think we witnessed some shift in terms of data entry. Machine learning and AI helped a little bit there. But for the real activities of signal detection and finding some causal relationship, we still rely heavily on physicians doing their job and applying their expertise. OK, we can show them where there are abnormal data, and robust abnormal data, not due to chance. That we can do to help them, and we can do it not only using a disproportionate number of reports but maybe also by trying to quantify other causality criteria and bringing that to them. But we still rely on their expertise, and I don't see it being replaced anytime soon by machine learning. That would require a huge amount of data and a huge amount of training, and we don't have that many safety signals to train an algorithm on. And we have to be ready for the new thing, which, almost by definition, is going to be different from what we witnessed in the past. When we were working on some pandemic vaccines at GSK, there was this safety issue with narcolepsy. Narcolepsy was nothing we could have expected from what we had learned in the 20 years before, 20 years of history. It was completely new in terms of safety issue or safety profile. So we have to be ready for something unexpected, and these machine learning methods learn from history. They don't learn to be ready for the unexpected, which is, by definition, the role of pharmacovigilance. So I sound very old school. Maybe I'm getting old. I'm getting old, that's a fact, but maybe I'm getting too old. But I think we can benefit from this new visualization [unclear].

Speaker 2:

What I see as a potential, though I don't have indications of the industry going in that direction, is the fact that most of the writing of the regulatory reports is still manual. And it kills me, because we now have the tools, like Quarto in R, to automate reporting in potentially very, very complex ways. So we are generating the tables, figures and listings, but we are still manually mapping them into a template. And basically what you have in R, in Quarto, is the ability to automate almost everything that can be automated and then leave the interpretation to the experts, feeding them an already pre-filled report. Most of the industry works with templates, but we still somehow have someone controlling the template, someone generating the tables, figures and listings, someone copy-pasting them in there, and someone doing the quality check that it's been copy-pasted at the right place by the right person at the right time, when all of that could be a programming activity which has been validated. But I don't really see evidence of the industry going in that direction.

Speaker 2:

But for me, when I saw R Markdown, I was like... I had been trying to do something artisanally with SAS and Word. It was awful. At the time, I was happy because I saw it typing automatically, but it was really amateur. But what Quarto is providing, for me, could be a game changer.

Speaker 2:

But I don't see the industry completely reshuffling the medical writing team and the programming team and just merging them, saying let's adapt to the technology. Now we have the tools, we have the data, we have the template. We can see it as a programming activity. We're going to generate it in one flow of activity, and then of course we need interpretation. It's very important: we don't just produce templates, we have to interpret the data. But instead of wasting people with PhDs on copy-pasting stuff, with silly instructions on where to copy, we could hand them the pre-filled template, and then they could start interpreting, drawing conclusions, writing discussion sections and really making the best out of the data and the template.

Speaker 2:

I would like the industry to go in that direction, but I'm not aware of many initiatives going in that direction, unfortunately, despite Quarto. But I'm not sure. Even at RStudio, I'm not sure they realize the potential of Quarto, because I was discussing it at this conference in Avignon, telling them: you see, the whole industry is working on templates and you have a nice way to fill them. It's just a template where you put some code that is going to generate the right statistics. You can even make the text conditional on the threshold you obtain, and then you can fill all of that automatically. They were like, oh yeah. I don't know if the commercial people are really selling that as a solution for the pharma industry, so maybe the pharma industry has to guess and see the potential. But it would be nice.
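The idea of a template with computed statistics and text that is conditional on a threshold can be illustrated with a small Python sketch. This is not Quarto syntax and not a regulatory template: it uses the standard library's `string.Template` as a stand-in, and the field names, wording and thresholds are all assumptions made for the example.

```python
from string import Template

# Hypothetical report fragment; real templates are far longer.
REPORT_TEMPLATE = Template(
    "Product: $product\n"
    "Event: $event\n"
    "Cases: $cases\n"
    "PRR: $prr\n"
    "Assessment: $assessment\n"
)


def assessment_text(prr, cases, prr_threshold=2.0, min_cases=3):
    """Choose the sentence to insert, depending on the computed values."""
    if prr >= prr_threshold and cases >= min_cases:
        return "Disproportionality threshold reached; medical review required."
    return "Disproportionality threshold not reached."


def render(product, event, cases, prr):
    """Fill the template from data so that nothing is copy-pasted by hand."""
    return REPORT_TEMPLATE.substitute(
        product=product,
        event=event,
        cases=cases,
        prr=f"{prr:.2f}",
        assessment=assessment_text(prr, cases),
    )


# Hypothetical inputs, for illustration only.
report = render("Vaccine A", "pyrexia", cases=12, prr=3.4)
```

The pre-filled text is only a starting point: the conditional sentence tells the expert where to look, and the interpretation stays with them.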

Speaker 2:

Some industry-wide initiative could say: let's investigate these templates that we all have. Can we have the same one across the industry, and then can we build the code? Because they have the same structure of data. It's ADaM and SDTM everywhere. Most of them have Argus spontaneous report data. So we have the same data, we have the same template, but each of us is building a different solution or is doing it manually. So I would invest in that instead of trying to find a use for AI, which is the wrong way to do it.

Speaker 1:

Normally you are anyway, but I like that. That is so specific, and something we could do today, both as individual companies and as an industry. And for me, sometimes, it is as if we forget that someone also invented the reports we have now at some point. They're not like physical laws of nature.

Speaker 2:

Yeah, and I guess they could be adapted if we tell the regulatory authorities look, if we change slightly, it can be automated. Would you agree? Maybe they are just open to the idea and it could make sense also, but yeah, that's definitely a big potential.

Speaker 1:

That's on the table there, and I have some googling to do after this conversation, for sure.

Speaker 2:

OK, yeah, Quarto. I don't know if you know it, but Quarto, from R, is really something to investigate. It really has a lot of potential, I think.

Speaker 1:

I have not encountered it before, so maybe some of it is also just sharing that perspective that look, here in the industry, we can actually do this already now and start working into it. Well, this ties beautifully into the next question I would like to ask. That's the question that we always ask our guests on the show. If we gave you a magic wand and with that wand you could change one thing in the life sciences industry, what would you wish to change?

Speaker 2:

Well, it's easy to answer. For me it's open data. I wish they would release clinical trial data in the open afterwards, so all the SDTM and ADaM data sets, and the same thing for the spontaneous report databases. They argue: no, it's crazy, we cannot do it, because, for example, for vaccines, it's going to feed the anti-vaccine movement. But look, the FDA and CDC release VAERS. How is it going to be different if the companies also release data on their own products? VAERS includes their products, it includes all products. So I think that could be a big change.

Speaker 2:

For example, to develop solutions we need data, and for clinical trial data it's a bit puzzling, because I don't find them. Maybe I looked poorly, but I looked several times on Google and I just found a few very small data sets, and I wanted to test some visualizations for safety purposes and things like that. And I cannot assess how robust they are against the different SDTM standards. Sometimes it's not always exactly the same. Sometimes there are some variations. So they don't really release in the open, and I think not releasing the data prevents building open source solutions.

Speaker 2:

If they were to release it at least, I don't know, after five years or three years or something like that, it would help. I know there is ClinicalTrials.gov, but I didn't really see the SDTM or the ADaM data there. You can find the clinical study report or things like that. But I mean, that would be good: we could combine them for different purposes. We talk a lot about transparency, the fact that the pharma industry wants to be more transparent. That would be a very big signal about transparency, saying, yeah, our data are in the open now. And more and more journals now require the data to be provided along with the study results.

Speaker 2:

So maybe that's also going to change the way, because more and more they require the data as an addendum, so that might build a bridge to releasing the data in the open in a more structured way, maybe centralized somewhere. That would definitely help, instead of finding them in every journal or every appendix. But that's what I would change: open data, and then the magic will happen. People will find data and they will build things on top of it. So that's what I would change.

Speaker 1:

That could cascade a whole load of changes, if that thing was to change.

Speaker 2:

Yeah, yeah. See, if I have one thing to change, let's try to change the one at the top that's going to cascade down.

Speaker 1:

Lionel, if our listeners want to reach out to you and ask further questions or learn more about what you do, where can they find you?

Speaker 2:

So there is my website, opensourcepv.com, and otherwise you can reach out by email. So it's lionel.vanholle, so my name with a dot, at opensourcepv.com, and then it would be my pleasure to follow up this conversation with you or others. Really, and thanks for the opportunity.

Speaker 1:

Thank you so much for coming. This was a super interesting conversation.

Speaker 2:

Yeah, I enjoyed it.

Speaker 1:

Thank you, thank you and thank you for listening to Transformation in Trials. If you have a suggestion for a guest for our show, reach out to Sam Parnell or Ivanna Rosendal on LinkedIn. You can find more episodes on Apple Podcasts, Spotify, Google Podcasts or in any other player. Remember to subscribe and get the episodes hot off the editor.
