Modern Chemistry Podcast

Sebastian Gross – The devil is in the data

Episode Summary

Sebastian Gross joins the show to discuss the trends in laboratory automation and all things data.

Episode Notes

Episode 18 of the Modern Chemistry podcast features Dr. Sebastian Gross. Sebastian is a consultant at Wega IT (https://www.wega-it.com/en/), where he supports clients using his extensive experience in biotechnology methods, bioprocess development, lab & assay automation, and kinetic modelling. Sebastian has strong experience with tools such as liquid handling stations, SiLA, Python, MATLAB, LabVIEW, SQL, and data modelling.

Prior to Wega, Sebastian was head of the high-throughput bioprocess development group at TUB (Technische Universität Berlin), where he also did his PhD.

 

Sebastian is contactable on social media, and you can find him on LinkedIn at https://www.linkedin.com/in/sebastian-hans/
You can also connect with Sebastian via the Wega website link above.


Our theme music is "Wholesome" by Kevin MacLeod (https://incompetech.com)

Music from https://filmmusic.io

License: CC BY (http://creativecommons.org/licenses/by/4.0/)

 

Connect with me (Paul) at https://www.linkedin.com/in/paulorange/

H.E.L. group can be found at www.helgroup.com online,
on LinkedIn at https://www.linkedin.com/company/hel-group/ 
on Twitter, we're @hel_group, https://twitter.com/hel_group
or search for us on Facebook

 

Episode Transcription

This is an automated transcript, prepared by rev.com.  It has not been checked for accuracy or spelling.

Paul Orange (00:07):

Hello, and welcome to the Modern Chemistry Podcast with your host, Paul Orange.

Paul Orange (00:14):

Hi there, and welcome to episode number 18 of the Modern Chemistry Podcast. So on the show this time, we have Sebastian Gross, and we really dig into the challenges of dealing with automated experiments, or really any kind of experiments that generate massive amounts of data. Sebastian is a consultant with a company called Wega IT, who are a life-science-focused IT provider. We had also arranged to be joined by Nicholas Cruz, a former colleague of Sebastian's, but unfortunately, although he tried to join us, Nicholas lost his voice on the day of the recording. So thanks, Nicholas, for trying to join us, and hopefully we'll get to talk to you at some point in the future.

Paul Orange (01:08):

So I spoke to Sebastian down the line from where I was in the UK, and he was based at home in Germany. We spent a lot of time talking about data, the importance of data hygiene and cleanliness, and we also talked about how automation plays into that whole process. So we jump straight into the discussion, where Sebastian and I are talking about his prior experience. I hope you can pick it up from there, and I'll be back right at the end to say goodbye.

Paul Orange (01:41):

Am I right in saying that your job title is now consultant for Wega after your time at TUB?

Sebastian Gross (01:51):

Yes. So I was head of the high-throughput bioprocess development group at TUB. I also did my PhD there in that group, related to bioprocess engineering, and in November '21 I was hired by Wega Informatik. Yeah, as a consultant, as a business analyst. The job is very diverse, so it's not straightforward; in each project you're doing something different, which is quite nice. And the point about Wega is that it's really focused on life science, on pharmaceutical companies. So it's not an IT consultancy for everything; it's really focused on IT solutions and IT architectures for pharmaceutical R&D, production, and GxP environments.

Paul Orange (02:49):

Okay. So from what I was looking at, I get the sense that it's about... I mean, ultimately I guess, about efficiency: taking those R&D projects through scale-up and production. Or is that not quite right? Is it broader than that?

Sebastian Gross (03:10):

From my research point of view or from the Wega point of view?

Paul Orange (03:17):

Let's talk about Wega first and then we can talk about the research side, because...

Sebastian Gross (03:23):

I guess the... No, Wega does not have... We do also have specific process optimization topics, but it's really about how the processes in a lab or in a research company can be mapped onto different software architectures: where you need a global database for your lead compounds, where you need an ELN, how to make the integrations that support these mappings. And taking over this whole validation topic is a huge effort for Wega and a huge business for us, because software validation and computer system validation is a big issue. Especially if you think about these... Yeah. Now everything is connected and everything is in a network. This also makes everything a bit more tricky, because you can easily be attacked from outside.

Paul Orange (04:27):

Sure.

Sebastian Gross (04:30):

And even isolated systems are no longer possible; everything needs to be connected to a LIMS or something like this. And the need for rapid updates is something that right now runs more or less against a strict validation approach, where you have a fixed system, a fixed version. How to deal with this is also a big topic at Wega, and we deal with it. On the other side, Wega also assists with lab automation, with parts of process automation, and how to do this. Wega is quite strong in promoting the SiLA standard, by helping to develop it and by bringing it to customers, because in the end the customer needs to decide what they want to do.

Sebastian Gross (05:15):

We see in a lot of projects that having something like SiLA globally, as one standard, would improve a lot of our projects with the customers.

Paul Orange (05:27):

Okay.

Sebastian Gross (05:29):

And also, the customers are always asking for standards, and that's why Wega tries to help develop, to co-create, this standard. And it's not Wega's own standard, true; it's just that a group of people in Wega are very engaged in this SiLA working group, especially Daniel Juchli, who is also the head of this working group and who is really driving this whole development around SiLA. But Wega is not SiLA; we are just putting a lot of effort in there.

Paul Orange (06:07):

Okay. And the intention is that the software systems from all sorts of different manufacturers comply with a certain code of operation? This is not something I'm familiar with.

Sebastian Gross (06:19):

No. One strength of Wega is that it's completely vendor-agnostic. So we don't care which vendors you want to use, for a LIMS for example; there are some big players and we have good connections to all of them. And yes, if you have a global database, if you have a LIMS, if you have an ELN, or if you have some special tools for antibody discovery or something like this, then you need a data concept: which system is the source of truth, whether you need to validate all these parts or not, whether you are in a GxP environment or not, because with some parts you are in a GxP environment and with others not, and also how the data gets from the LIMS to the ELN and back and forth. And also the integration, the direct mapping of the data, the data workflow design, the mapping to the processes in the companies, collecting the requirements from the vendors and the users, and nearly everything up to assisting with vendor selection, scheduling demonstrations, and managing the whole process. That's what the consulting company is doing.
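(As an illustration of the kind of LIMS-to-ELN data mapping described here, a minimal sketch in Python. This is not Wega's tooling or any vendor's API; every system name and field name is hypothetical.)

def lims_to_eln(sample: dict) -> dict:
    """Map a hypothetical LIMS sample record onto a hypothetical ELN entry."""
    return {
        "title": f"Sample {sample['sample_id']}",
        "project": sample["campaign"],           # 'campaign' in the LIMS maps to 'project' in the ELN
        "recorded_at": sample["timestamp_utc"],  # keep timestamps in one timezone end to end
        "results": [
            {"analyte": r["name"], "value": r["value"], "unit": r["unit"]}
            for r in sample["results"]           # units travel with every value
        ],
        "source": {"system": "LIMS", "id": sample["sample_id"]},  # link back to the source record
    }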

Paul Orange (07:43):

Yeah.

Sebastian Gross (07:45):

I guess it's not that the companies do not want to do this by themselves, but it's a temporary need, where they need a lot of human resources, a lot of manpower, over a certain time, maybe one or two years. And it's quite hard to find skilled people, because for successful projects you need this life science background, a rough understanding of what is going on in the lab, and on the other side you need IT understanding. So if someone is talking about data migration, and someone says [inaudible 00:08:20] we just do it on the fly, then from a biotechnology point of view I would say, "Hey, okay, the vendor knows what they're doing; they know how to do this." But from a testing point of view: "No, I know that this is a completely different kind of database, and this could be tricky, and we should talk about it." And this risk mitigation is something we can manage, because we have done a lot of projects in different environments and have experience there.

Paul Orange (08:54):

There's two things that I kind of think about. So one, I've interviewed people on the show before, and a very common thing that comes out is that you've got to have knowledge of more than just one discipline to be an effective scientist today, and I think that's what you were saying there. The other thing is, I think back to when I did my PhD, getting on for 30 years ago now, and I remember us getting our first Windows-based computer in the lab. But now, a lot of the people I interview or the customers I talk to are, in my mind, doing programming-level activities to support their research, because they need to do it either to make the systems work or even just to interpret their data. I mean, they've got to write analysis processes of some sort or another.

Sebastian Gross (09:47):

Yeah.

Paul Orange (09:49):

And there's just a huge shift that's... I don't know... That's happened in the industry over the past 20, 30 years. I mean, it's always been data heavy, but the amount of information, it's just huge, isn't it?

Sebastian Gross (10:05):

Yes. And the amount of information we are able to generate doing one experiment or doing one campaign is increasing over and over.

Paul Orange (10:18):

Yeah.

Sebastian Gross (10:19):

And maybe 20 years ago, you had an experiment, you had five data points of two different variables. If I'm talking maybe about the bioprocess industry or something like this, it's just a really rough example and it's not the truth, but then you have maybe something you could put on an A4 sheet, and then, okay, you can also make the calculations more or less by yourself.

Paul Orange (10:44):

Sure.

Sebastian Gross (10:45):

But if you now have thousands of data points, it's not even possible to just evaluate all these single data points. So you need computing capacity, and I'm not sure what the solution will be. There are, more or less, with [inaudible 00:11:04], very powerful bioinformatics and data science frameworks available that try to cover the standard operations at high quality. On the other side, research topics are always so diverse that there is no standard, and that's why all the people start developing their own stuff. The open source community in this field is also quite important and growing, but there is a gap in maintaining these open frameworks. And this is the reason why they are quickly outdated: even if they're very good, they are developed by one or two PhD students, and after the PhD they go to industry or somewhere else, and then they do not have the time, and they don't get money, for maintaining these open source frameworks.

Sebastian Gross (11:59):

And partly someone else takes over, partly a bigger community comes up, but more or less, they are dead as soon as the PhD is done, which is quite sad. But this is also the reason why people start developing their own stuff: because, okay, now a new Python framework for machine learning is available, but this one is still using the old one, and it's not so efficient, so you have to re-evaluate everything, and then you do it yourself. This is quite challenging, and I'm not sure what the solution would be. But one step could be that we bring this into academia, into the education of researchers, who should have at least one or two lectures on really good programming, maybe also on how to do unit testing and code review, because these are skills you may need. And good researchers have learned this on their own, but it's not really anchored in the education, I guess.

Sebastian Gross (13:11):

As far as I know, you can only volunteer to take some lectures on this, but it's not mandatory.

Paul Orange (13:17):

Right.

Sebastian Gross (13:19):

In most cases. And just as you have learned about the pathways and the cell, you should also learn how to write reasonably good code.
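(Since unit testing for researchers comes up here, a minimal sketch of what that can look like, using pytest. The analysis function and the numbers are invented for the example; the pattern is what matters: small, checkable functions with tests that encode what a correct answer looks like.)

import math
import pytest

def specific_growth_rate(x0: float, x1: float, dt_h: float) -> float:
    """Specific growth rate mu [1/h] from two biomass readings taken dt_h hours apart."""
    if x0 <= 0 or x1 <= 0 or dt_h <= 0:
        raise ValueError("biomass and time interval must be positive")
    return math.log(x1 / x0) / dt_h

def test_doubling_in_one_hour_gives_ln2():
    # A culture that doubles in one hour has mu = ln(2), about 0.693 per hour.
    assert abs(specific_growth_rate(1.0, 2.0, 1.0) - math.log(2)) < 1e-9

def test_rejects_nonpositive_inputs():
    with pytest.raises(ValueError):
        specific_growth_rate(0.0, 2.0, 1.0)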

Paul Orange (13:28):

Yeah. Yeah. So Sebastian, we seem to have dug well into the topics and I haven't introduced you to the audience yet, so maybe I should do that. So everybody, thanks for sticking with us. Today, I'm talking to Sebastian Gross, who's a consultant at Wega IT. And Sebastian, your early history, I guess, informs this. So before Wega, you were at the, I'm going to say it in English, Technical University of Berlin, looking at high-throughput experimentation for bioprocessing. And coming back to what you were just saying, high throughput, whether it's for chemical screening or in bioprocessing, can now generate huge amounts of information, or data that needs turning into information. When you were at TUB, what sort of volumes of data are we talking about that need to be looked at and handled? I mean, I think putting a scale on it helps.

Sebastian Gross (14:30):

Yeah. At the Technical University, we had our mini bioreactor facility. We had a 48 mini bioreactor system mounted in a [inaudible 00:14:40]; this was the first facility we had. And then the second facility was a bigger mini bioreactor, 100 milliliter volume, but only eight reactors in parallel. And we could generate around 3,000 data points per hour.

Paul Orange (14:59):

Okay.

Sebastian Gross (15:00):

Which for bioprocessing is really big, but it's not called big data anymore. But in bioprocessing, the kind of data is difficult; it's complex data. You have time series, which is different from maybe just counting the number of cars on a street. You have different kinds of metabolites you want to measure, and all of these need to be brought into relation somehow, and also mapped onto the process conditions; you have several different inputs which can influence the bioprocess.

Sebastian Gross (15:44):

And yes, coming back to this automation and data point, and I'm deliberately only talking about data, because as soon as you have something recorded, it's not yet information; information always needs context and interpretation. One challenge we faced at that time was that it was really hard to just monitor all these different bioreactors at the same time, or even to control them. With one or two single bioreactors running, a good scientist who operates the bioreactors can always run successful bioprocesses. This is very important in screening, where you maybe screen a microorganism for the first time, or you have new conditions for evaluation or for screening purposes, which always change the behavior of the bioreactor, of the bioprocess, over time.

Sebastian Gross (16:45):

And if that happens, you need to adjust it a bit, or you have to do the experiment completely again. And this is not possible if you have 48 or even 12 bioreactors in [inaudible 00:16:58]; even with eight, it's not possible anymore. And that's why we started to collect as much data as possible over time. That's why we had these two liquid handling stations coupled together, one for controlling the bioprocess, one for doing the [inaudible 00:17:13], collecting all this data, building all these data pipelines so that we can use the data as soon as possible. And then we developed models, computer models. In my PhD thesis, only mechanistic models, which have the advantage that you can put a lot of knowledge you already have about the process behavior, about the cell, into mathematical equations.

Sebastian Gross (17:43):

And with these mechanistic models, you are able to calibrate your model to a certain strain, to a certain process, with quite little data. Compare that to machine learning: if you have machine learning tools, deep learning algorithms, you need huge data sets over a long time to get a good predictive model. And we used these models to make predictions for the running processes, and based on these predictions we started controlling the process, changing the input parameters so that the process runs in a certain way.
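(To make this concrete, a minimal sketch of a mechanistic model calibrated with few data points. This is not the group's actual code; it uses a textbook Monod batch-growth model, and all measurements, initial conditions, and parameter values are invented for the illustration.)

import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import curve_fit

def monod_batch(t, y, mu_max, ks, yxs):
    # Mechanistic knowledge as equations: biomass X grows on substrate S with Monod kinetics.
    x, s = y
    mu = mu_max * s / (ks + s)
    return [mu * x, -mu * x / yxs]

def simulate_biomass(t, mu_max, ks, yxs):
    sol = solve_ivp(monod_batch, (t[0], t[-1]), [0.1, 10.0],  # X0 = 0.1 g/L, S0 = 10 g/L
                    args=(mu_max, ks, yxs), t_eval=t)
    return sol.y[0]

# Five (invented) biomass measurements are enough to calibrate three parameters.
t_obs = np.array([0.0, 2.0, 4.0, 6.0, 8.0])
x_obs = np.array([0.10, 0.35, 1.1, 3.0, 4.8])
popt, _ = curve_fit(simulate_biomass, t_obs, x_obs, p0=[0.5, 1.0, 0.5])
print("calibrated mu_max, Ks, Yxs:", popt)

Once calibrated, the same model can predict the rest of the run, and a controller can adjust the inputs based on those predictions, which is the model-based control idea described above.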

Sebastian Gross (18:32):

Yeah, that's what we did, what I did, at the Technical University together with my team, with my colleagues. And the whole idea came from Nico, from Nicholas Cruz, who was the head of this group and my supervisor during my PhD, and who also took the group over after I left in the summer. The [inaudible 00:18:54] is a good example of the different skills you need, from a bioprocess engineering point of view. You have learned what the machines look like, you have learned how a bioreactor, how a bioprocess works, you have learned about microbiology and stuff like this, but now you need, "Okay, I need to develop maybe a data model. I need to build data pipelines. I need to learn how to solve mechanistic models." Okay, this was all covered by [inaudible 00:19:27] in the mathematics background, but there was a lot of stuff for me to learn during this time.

Sebastian Gross (19:35):

And to sit at this point between bioprocess engineering, mathematics, and automation was really challenging, because there are only very few tools available that you can use as they are, out of the box. And the next step we took with this project was to try to exchange this whole mechanistic framework in the background for a machine learning tool, and for the controlling to make more detailed predictions with the same data, or even a bit more. At the beginning this was really hard, because we were so proud. We were so proud of how much data we could collect, so proud of our data. We really had a facility which had run several times, we had process data collected, but the way we collected the data, and even the amount of the data, was at the beginning not sufficient for machine learning approaches.

Sebastian Gross (20:34):

So really, as in the sketch: we were so proud of having packed all our data into a shiny box, we gave it to the machine learning guys, and they started crying, and we really needed to work with them to understand, "Okay, what is a complete data set?" We also talked several times about FAIR data. And even though we had thought, "Okay, the data we have collected is all together," it was not, in a way. And if you start from the other point of view, looking at the data, then you see, "Okay, this is missing, this is missing." Or, "Okay, here you have it in your [inaudible 00:21:17], this is somewhere in the experiment documentation. This is in the database." And you have to map all this information, and maybe account for what other information you did not capture in the past. It's really about learning what makes data FAIR, what makes data complete, from my point of view.

Paul Orange (21:40):

When you talk about the scale of the data, there's a lot within bioprocessing, but I know within a lot of other processes there's a continual move towards as much online measurement as possible. So rather than sampling, it's just constant. From what you're saying, I'm not quite sure whether that helps or hurts the process, or is it just more data that needs to be analyzed?

Sebastian Gross (22:04):

Yeah, I'm not sure how it actually is in industry, but for us in academia, generating more data is challenging on two points. On the one hand, it's work you have to do. If there is no online probe you can use, then maybe you need to do offline [inaudible 00:22:22] analytics. You need to have the devices or instruments there, you need consumables and things like this, and you need people who are doing this analytics. So it's just work. But on the other side, it's also a matter of money. You can buy a fancy new instrument, and, just as an example, you could try to have proteomics for each sample data set, but it's really expensive.

Paul Orange (22:50):

Yeah.

Sebastian Gross (22:51):

And you really need to know what you want to do with the data. I was really a fan of the way this work went with Nico. I was more the guy in the lab with the automation, building the facility, maybe also building the data pipeline, together with Nico, who is really a genius on the modeling part, on understanding how mechanistic models work and how they need to be designed. And it was always, "Sebastian, I need some more data here." And I could say, "Okay, Nico, I can generate more samples at this point. Or maybe I need some information on this pathway. Okay, I can easily add acetate as a measurement, but I cannot add something like pyruvate, as an example." And then to see the trade-off: okay, the effort for this additional data point is maybe huge, but the benefit on the modeling part is also very huge. So it's not easy. It's always challenging to generate more data, and that's why you first need to know what kind of information the data brings, what you can do with the data.

Sebastian Gross (23:56):

And I'm really a fan of thinking, "Okay, what is the requirement? What do I need?" and then deriving the data I need for this, because otherwise... Even so, it would always be nice to have all the data for everything always available, and I don't see this as an issue. If you have more data than you need, you can just put it somewhere. But putting it somewhere in the right way, so that it can be reused, is the next challenge. And it's also a matter of time: if the student or PhD student who generated the data is no longer available, okay, is the data still available?

Paul Orange (24:35):

Yeah.

Sebastian Gross (24:36):

Or is it lost? But I guess this is an academic issue; in industry, I hope, it's different. That's the point about additional data: it does not hurt to have it, it's just a matter of work and expense.

Paul Orange (24:51):

You were talking earlier about this initiative to sort of standardize some of the approaches, and you touched before on data completeness. It's something that I understand a little bit from some of my past roles, working with people who looked at big data, so I get that data completeness is super key. How do we start to address those challenges that you just outlined? You've got five different systems, they're all storing things in different ways, somebody's recorded something else in Excel, this person is off on maternity leave so we can't ask them where they put the disk drive, all those kinds of things. What are the approaches, or what do you think will be the solution? Where are we moving towards as an industry?

Sebastian Gross (25:37):

Yes. The dream would be that we have some kind of standard, but I guess this will not be the future, at least from a short- or mid-term point of view. I guess the solution will be linking and description. So yes, you have data, and you have maybe stored it in different ways. Some of it is paper-based, some was in Excel sheets or in an ELN, wherever. The key would be to make links between the different data, so that at least you have noted, "Here I also have something which is maybe of interest in the ELN; here I have something which is stored in the database," so that you can also follow up the history of the data. So, okay, this is the raw data, then you may need a raw data repository; this raw data comes from this machine; this machine has this maintenance interval. It's not possible to store everything in one data container.

Sebastian Gross (26:39):

But if you have the link from the result to the raw data, if you have the link from the raw data to the machine, then you can get from the machine to the maintenance intervals. And then you may see, okay, at some point from Wednesday to Thursday my results look completely strange. And then you can go back, back, back, back, back on your local PC, without talking with 10 different people, because the evaluation was done by the researcher, the results were collected by a technician in the lab, and the maintenance was done on a Monday when no one was in the lab. You can follow it back and learn from it. This also comes up when we are talking about data lakes, where there's always this cloudy picture where they put all the data together. And in most slides, I miss a bit the arrow back: what to do with this data in the cloud.

Sebastian Gross (27:38):

And this is not from myself, but something colleagues said: we have to take care that the data lake is not a data swamp. And right now, it more or less is. We're putting data there, but badly described. This was the same situation in our lab and in the [inaudible 00:27:54]: the data was there, it was badly described, and you could not use it for any other approach than the one we had thought about before. So the data needs to be described in the right way, and it does not matter how the description is done, whether it is in Excel or somewhere else; it needs to be described.

Sebastian Gross (28:09):

So: this is data, I have performed these analytics on it, I have made these evaluations on this machine. And you need to link them; that's the next point. By describing and linking, you can already get a huge impact on how FAIR your data is and how well it can be reused for the next data science approach, without the huge effort of a corporate data standard. Because if you want to roll out a data standard, even in a tiny company, this would impact nearly all the processes you already have, and this is something most companies won't do.
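(To make the linking idea concrete, a minimal sketch in Python. The record types, IDs, and payloads are invented for the illustration; the point is only that every record carries an explicit reference to its source, so a strange result can be traced back to the raw data, the machine, and its maintenance log without asking ten different people.)

from dataclasses import dataclass, field

@dataclass
class Record:
    id: str
    kind: str          # "result", "raw", "machine", or "maintenance"
    payload: dict
    links: dict = field(default_factory=dict)   # e.g. {"derived_from": "raw-42"}

# A tiny linked store: result -> raw data -> machine -> maintenance log.
store = {r.id: r for r in [
    Record("res-7",   "result",      {"titer_g_per_L": 4.8}, {"derived_from": "raw-42"}),
    Record("raw-42",  "raw",         {"file": "run42.csv"},  {"measured_on": "hplc-1"}),
    Record("hplc-1",  "machine",     {"model": "HPLC"},      {"maintained_in": "maint-3"}),
    Record("maint-3", "maintenance", {"last_service": "Monday"}),
]}

def trace(record_id: str) -> None:
    """Follow the links back from a result, printing each hop in the chain."""
    while record_id in store:
        rec = store[record_id]
        print(rec.kind, rec.id, rec.payload)
        if not rec.links:
            break
        record_id = next(iter(rec.links.values()))  # follow the single outgoing link

trace("res-7")  # result -> raw -> machine -> maintenance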

Paul Orange (28:52):

Right.

Sebastian Gross (28:53):

If they are not starting from scratch.

Paul Orange (28:56):

Okay. Okay. You've talked about describing the data, and I'd just like to drill into that a little bit. What does that mean? Or how does that manifest itself?

Sebastian Gross (29:10):

By easy stuff, by putting a unit to the data.

Paul Orange (29:13):

Okay.

Sebastian Gross (29:16):

It's kind of obvious, but it's not always there that you have a unit on your data. It does not necessarily have to be transformed to SI units, like grams or seconds, but if you have, okay, this is in kilograms or milligrams, then you can convert it to whatever you want. So this is, I guess, the first point. Then there are these big sets of metadata: who operated this, in which context, what was the original aim of this, which campaign does it belong to? It's easy to have a campaign on the one hand, and then you have all the samples collected there, but maybe you have the raw data somewhere else; then it would be, okay, this raw data belongs to this, and which standard protocol, which SOP, is related to these data points? So yeah, stuff like this.
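(As an illustration, a minimal sketch of what such a described data point could look like. Every field name and value here is hypothetical; the point is that the unit, operator, campaign, and SOP that Sebastian lists travel with the value.)

data_point = {
    "value": 4.8,
    "unit": "g/L",                    # the most basic piece of description
    "quantity": "biomass concentration",
    "operator": "lab-tech-02",        # who generated it, and in which context
    "campaign": "fedbatch-2022-03",   # which campaign it belongs to
    "sample_id": "S-0457",
    "sop": "SOP-BIO-017",             # which standard protocol produced it
    "instrument": "hplc-1",
    "recorded_at": "2022-03-09T14:32:00Z",
}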

Paul Orange (30:03):

Okay. So, some of it is really, really basic, but without it, you're going to be in trouble down the road if you're trying to, like you say, interpret data and the person who did the experiment isn't around.

Sebastian Gross (30:16):

Yes, exactly. And also try to think, or maybe ask a colleague: if you saw this data point, what would you expect to see with it? Because, again for my bioprocess example, if I need to evaluate my dissolved oxygen curve, yes, it's just oxygen, and I see, okay, this is the behavior of the bioreactor. But maybe someone else sees something completely different. And maybe it's also different if the probe has been calibrated to pressurized air or to pure oxygen. There's a value of 21, but what does it mean? 21 of what? Is it a percentage? Is it picomole, or whatever?

Sebastian Gross (31:04):

It sounds basic, but it's not there everywhere. This list is also not complete. And really think about what other applications could expect from my data, and not only look at your own data need or the process you are performing.
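(A tiny worked example of the ambiguity Sebastian describes, assuming air is roughly 21% oxygen: the same physical condition reads about 100 on a probe calibrated against air saturation and about 21 on one calibrated against pure oxygen, which is exactly why the calibration reference belongs in the metadata. The function and labels are invented for the sketch.)

O2_FRACTION_IN_AIR = 0.209  # approximate mole fraction of O2 in air

def to_percent_air_saturation(reading: float, calibrated_to: str) -> float:
    """Normalize a dissolved-oxygen reading to % air saturation, given its calibration reference."""
    if calibrated_to == "air":
        return reading                       # already on the air-saturation scale
    if calibrated_to == "pure_oxygen":
        return reading / O2_FRACTION_IN_AIR  # 21 on the pure-O2 scale is ~100% air saturation
    raise ValueError(f"unknown calibration reference: {calibrated_to}")

print(to_percent_air_saturation(21.0, "pure_oxygen"))  # ~100.5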

Paul Orange (31:23):

Okay.

Sebastian Gross (31:23):

... which we are performing. Let me state this again: it's really a challenge to put all the description, or metadata, on a data point while considering only the current use case. Normally we always have a method or analytical process behind it, and this is the use case the data are generated for, and then for the next use case. And...

Paul Orange (31:44):

At this point, unfortunately, our video conferencing software kicked us both out. So Sebastian and I reconnected and then continued the discussion. Yeah. So, sorry, you were saying it's about being able to then reuse the data for subsequent campaigns and things like that.

Sebastian Gross (32:04):

So, as I see it, and this was also the idea, partially, in this KIWI-biolab project: you're generating a lot of data, and you store it in a data lake, you put it somewhere central, to have it in place as soon as there is something you want to do with the data. It's not for screening purposes anymore. It's the dream that you just have the data, you always have it, and you put fancy machine learning, or artificial intelligence, however you want to call it, on top of this, and then some completely new added value comes out of it. And maybe in the future it will be like this. I haven't seen it yet, but to do this, to have the chance of added value from additional analytics on the data, the data needs to be somehow complete. And this is really challenging.

Sebastian Gross (32:53):

The description for the current use case is not a sufficient description for a machine learning use case; there you need a lot more data. In most cases, and this was also the case in our lab, the information is available. In the bad case, the information is in the heads of the people; in other cases, it's in some other digital format, but it needs to be collected. And if there is no link between the different digital entities, then you start searching. But with a link, you directly see in the evaluation, "Okay, I have this data, and there is additional information there, and there is this additional information again over there." And then you can start to collect it. This would be the second-best approach.

Sebastian Gross (33:38):

And the best approach is to get people to do this somewhere in a central place, but maybe this then requires a central data model. This will require a central data model, and having a central data model which covers everything from maintenance cycles to HPLC analytics to online measurements of a dissolved oxygen curve is really challenging, and it will never cover everything. And that's why the idea is really, "Okay, put the data somewhere central, and have it even in different data models. It's fine." As long as they're linked, as long as you can make the connection from the result to [inaudible 00:34:20], and really think maybe twice about what influenced my data.

Paul Orange (34:25):

Yeah.

Sebastian Gross (34:27):

And not only what is needed for my current use case.

Paul Orange (34:30):

I think one of the things that I'd like to explore a bit is this: you've spoken a lot about different aspects of the data, but the data has to be generated, and I think in this case we're really talking about an instrument of some sort generating data; it's not somebody making manual observations for the most part. So if we think about what matters if you're trying to design a lab or a facility, what are the things you need to consider with the type of setup that you put in place? We've covered central stores; you want to describe your data, hopefully in a format that allows you to correlate data between different systems where it's appropriate. But then you think, "Well, is there a workflow set up for the specific types of things I need to be recording?"

Paul Orange (35:20):

I mean, you've already talked about things that I wouldn't have considered, like maintenance cycles, and it makes me think about raw materials QA as well. But yeah, more on the physical setup of the lab space or the systems in there: what are the things that are important?

Sebastian Gross (35:38):

I see two trends, which are completely opposite right now. In both cases, you have more and more automation in the lab, and you bring the automation closer to the people. It's not that you have processes which run completely automated and processes which are driven completely by technicians or people in the lab. It's more, "Okay, you have parts which are done by hand, and you then hand over your samples to an automated facility." And these automated facilities right now are huge facilities in most cases, where you have these liquid handlers, and even a tiny liquid handler is a meter by a meter and two meters high, something like this. So even the smallest ones are quite heavy. And for this you need space, and space in a lab is something like gold dust. You don't have it.

Paul Orange (36:31):

Yeah.

Sebastian Gross (36:32):

I have never seen a lab that says, "Okay, we have space. No worry, just put two more machines in." So there is a trend to have tinier automation facilities, to make them more modular. For end-to-end automation, you need not only a liquid handling or pipetting arm; you also need maybe plate washers, an analyzer, analytical instruments, something like this. And that was also the case in our lab. Even at the technical university, we had a lot of space. We had the possibility to put four, no, five liquid handling stations in one lab, but we were only able to couple three of them together in a way that lets you hand a sample from one liquid handling station to the other directly. And that's why we needed this tiny car with a robotic arm on top.

Sebastian Gross (37:20):

You may know this instrument, the mobile lab assistant, from the project, but there are other types around which do the same, driving from one room to the next just to take samples from one liquid handling station to an analyzer. There are tiny cars from certain vendors driving around on the table, so that you do not need a track anymore from one instrument to the next. There are even elevators which go from one floor to the next to bring samples from A to B. It's quite nice, because then you can operate such facilities 24/7. You may have to take care a bit about consumables, and that you have enough supply materials for running these processes. But these big liquid handling, these big automation facilities need a lot of investment. And I know that if you have a technician who has trained for 30 years, he or she is always better than a liquid handling station, and even faster.

Sebastian Gross (38:20):

The difference is the technician comes in the morning and goes in the evening, and a liquid handling station can run 24/7. And maybe you do not have enough technicians with these skill sets and you cannot run them 24/7, or it's really, really expensive to do. And at this point, these huge automated facilities come into place. I am really looking forward to seeing what comes: these big facilities, where you may need extra rooms because they do not fit in a common lab space anymore, or whether the trend really goes to tiny automation islands, which are connected by robots or other automation units, or even by humans. It's also fine to have a process running eight hours on one instrument and then handed over manually to the next facility. These automation islands could be much easier, they could be cheaper, they could fit better into a current lab space, because only a few companies are able to just build a new building for a new lab. And I'm really interested in what comes.

Sebastian Gross (39:21):

So you have these automated cloud labs like Strateos, who just have everything in place, huge automated facilities, and you just sit at the computer to add what you want to do. You face other scheduling issues in that kind of setup, and it's really complex to do. And on the other side, you have instruments like the Opentrons, which is a liquid handler half a meter by half a meter, so a really, really tiny unit. Limited possibilities, but for, I would say, 80% of what you want to do in a lab, pipetting, adding buffers, preparing multi-well plates, that's fine. I'm really interested, I'm really looking forward to what comes, because I prefer this modular lab, this tinier approach, because I guess it's easier and works better for the humans to be in such a lab.

Sebastian Gross (40:08):

If you have then your robotic... call it something like this. I don't think these cobots will come so much into the lab, because you lose a lot of the possibilities, the power of automation, if you give a robot the same pipette a human uses; pipettes are tailored for humans, and I guess you would lose a lot of efficiency if you do the same for robots. I guess you can make much better stuff if you tailor the automated equipment for robots. Right now, even with a plate reader, you have a front face and maybe a panel there, but if you want to couple this to a liquid handler, then the panel always faces the liquid handler, while I want to operate it from the other side.

Paul Orange (40:46):

Yeah.

Sebastian Gross (40:46):

And as soon as the instruments are tailored more for automation and no longer for human use, then you will also get a big impact on automation and efficiency. Yeah, that's what I'm thinking about regarding space in the lab. It's really complicated. And if you are talking with people who are starting with automation, the first question is, "Okay, how much space do I need for this?" And even if you have the possibility to invest in a big liquid handling station, then you may not have the space, or you have to free something up, and then it's always [inaudible 00:41:23] "Okay, now this is my place. This is the place for my group. And here I have my working space," and now two PhD students have to clear their space for one liquid handling station, which is then 80% [inaudible 00:41:36].

Paul Orange (41:36):

It's a challenge, for sure. Well, Sebastian, this has been really interesting, a little bit different from the topics that we usually talk about on the show. But thinking about data, the thing that's in my mind is what you said right at the beginning about some of the contradictions within this space: these are validated processes, so things need to be nailed down and stable, but they also need to be secure, and that means that things have to change over time. And you've got these two conflicting things coming together. It's a big, hairy challenge basically, isn't it?

Sebastian Gross (42:07):

Yes. It's really challenging, because it's also a matter of training. In my case, I studied biotechnology, and I learned a lot, and I also volunteered to take some programming lectures. But then, okay, now we need a data pipeline, we need to operate a Linux server, and I have no clue about Linux administration. I'm a bio[inaudible 00:42:29], and it became clear quite early that, okay, the server is not doing what it should. Okay, I can try sudo update; the issue is not solved. And at some point, you don't have the knowledge anymore. And this comes out quite clearly in this research part, these overlapping parts between IT and research, and how to handle them. Or even if you generate more and more data, you may need a bigger server. It's not possible anymore to store everything on your local computer, so you want to have it connected, you want to have it central, and even the amount of space...

Sebastian Gross (43:03):

So now we are not talking about megabytes anymore; we are talking about terabytes, which come up monthly, if you look even at the microscopes, which generate huge pictures. And then you start operating a server, and you have no training in this, and then you're calling your IT: "Oh no, no, no. If you want to run a server, then this is corporate, and we have to do this and this and this." And then you say, "Okay, no, I have my server here and it's fine." It works. But sometimes it breaks, and then more or less everything breaks down, because you have to take care of backups, or maybe a recovery strategy, security issues, making updates. But you do not want to make updates, because right now everything is running. It's nice.

Paul Orange (43:46):

Yeah.

Sebastian Gross (43:47):

But this brings us into a very risky situation. I see this more from the academic point of view than in industry, because in industry they have their IT departments, who at least assist researchers with doing this. In the best case, they take over completely; then you just say, "I need a file service, with this much space," and one month later they say, "There it is." And especially for pharmaceutical companies, I have only a very limited view right now, but from what I have seen, they know that security is something they have to take care of. There's nothing you have to teach them anymore. And it's easier to have rapid update cycles in a non-GxP environment, because in a GxP environment, if you have updates, then you have a multi-vendor environment: you have Windows PCs, you have Linux PCs, you have drivers from an integration company, you have instruments from different vendors, and they all want to have updates.

Sebastian Gross (44:46):

And then you also have cloud-based software, where the cloud provider makes updates, and everyone is making updates all the time, which is good because it makes your system secure, and which is maybe bad because if one of these updates influences your process, then maybe your process breaks. And partially, you do not have control anymore over when updates are done, especially if you are working in a cloud or in a distributed system; then maybe someone makes updates and you, as a researcher, did not notice it. I hope IT will always notice it. And then you start your process in the morning, and after a few hours it breaks, and you just ask why. In a regulated environment, or for robust processes, I guess this is a challenge. To deal with this, there are concepts, and Wega is also working on these concepts.

Sebastian Gross (45:34):

I'm quite proud that we have these networks within Wega who are taking care of this: how to do automated testing to keep your system in a stable or optimal state, or to ensure that everything works properly through automated testing. So there's a lot of testing you can do. There are code reviews, there's unit testing, but this is all done by the vendors. And then there is a stage of testing which is more or less always with the company or the end user, when it comes to how the software packages work together: is the application interface still the same? If I push a button here and it calls 10 applications in the background, it's quite hard to test this, and doing this in an automated manner is something that will come very soon, or which is partly already there.
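(A minimal sketch, using only the Python standard library, of the kind of automated interface check an end user might run after an update window. The endpoint, sample ID, and expected fields are all hypothetical; the idea is simply to verify that the interface contract you rely on is still the same.)

import json
import urllib.request

EXPECTED_FIELDS = {"sample_id", "status", "results"}  # the interface contract we rely on

def check_instrument_api(base_url: str) -> bool:
    """Verify that a (hypothetical) instrument endpoint still returns the agreed fields."""
    with urllib.request.urlopen(f"{base_url}/samples/S-0457") as resp:
        payload = json.load(resp)
    missing = EXPECTED_FIELDS - payload.keys()
    if missing:
        print("Interface changed; missing fields:", sorted(missing))
        return False
    return True

# Run after every update window, e.g. from a scheduled job:
# assert check_instrument_api("http://lims.example.local")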

Sebastian Gross (46:23):

And how to bring this into a regulated environment will be a challenge for the next two or three years: that you can keep a regulated environment up to date on a weekly basis without redoing the whole validation process. Or you just say, "Okay, at some point these updates are not relevant," or at some point you say, "This update is not relevant for validation; just do it." But from my experience, especially in academia, at the technical university we killed all automatic updates, because they always break a running process. Maybe only once a month, maybe only once a year. But if you are the student or PhD student whose experiment fails on a Friday afternoon because of an automated update, and you come in on Monday morning, then you start crying. It's always a mess having to do it again.

Paul Orange (47:15):

Yeah.

Sebastian Gross (47:15):

Then again, if we are in an automated facility, if you have a bigger automated facility running and it's booked all the time, then yes, it's one thing to put in all the effort to do the experiment twice, but it's also always a question of, "Okay, when do I find a new slot for my experiment?" I have also already booked the analytics. So there is a need for robust processes, and there is a need for rapid update cycles, and rapid update cycles and robust multi-vendor processes do not always go hand in hand.

Paul Orange (47:49):

Well, Sebastian, we've kind of come to the end of the hour, and I'm very thankful for your time today. I know we were hoping that Nico was going to join us, but he was suffering from the tail end of COVID, it seems, and lost his voice this morning. So thanks very much for volunteering to do the interview today. It's been great; let's say it's been a different view on the things we normally talk about, but really important. The reason we do the experiments is to gain greater understanding, and if we can't interpret what we're seeing, we're no further forward, and this is really important. So it's fascinating; thanks very much.

Sebastian Gross (48:21):

It was a pleasure, Paul, thank you very much.

Paul Orange (48:23):

Okay. So I hope you enjoyed episode number 18. And although we were talking largely about Sebastian's experiences coming from the bioprocess side, I think you can see the parallels with any kind of scientific discipline or study which is generating large quantities of data. We're going to be taking a bit of a summer break now from the Modern Chemistry Podcast, so we will be back towards the end of the summer of 2022 with some great new guests. And as always, we're looking for suggestions for people to join us on the show. So if you are interested in taking part, or you think there's somebody we should have on the show, please do drop us a line; there are links in the show notes. Until then, have a great summer, take care, stay well, and speak to you next time on the Modern Chemistry Podcast.

Paul Orange (49:22):

Thanks for listening to the Modern Chemistry Podcast. Our theme music is provided by Kevin MacLeod under a Creative Commons license. And if you subscribe to the show, you'll have the next episode drop straight into your podcast feed.