1/23/25 (#26)
Bibliometrics - (1) from biblio- = book; metrics = measurements. Measurements of publications, authors, etc. (2) from bibliome = complete corpus of biological texts; metrics = measurements. Measurements of text corpora.
Agenda and Minutes
- TO:
- Today: Suggests dropping queuing theory for now and focusing on the h-index issue. DB suggests queuing theory could still make an interesting chapter later; we can decide then. Note that you can try different indexes, see what they say, critique them, and then try another one (a toy h-index computation is sketched below). We are *not* looking for the one perfect index, because it does not exist!
- Has identified a paper that models h-index stochastically. Would be a good starting point for a queuing model. We need to read that paper! He will send me a copy and we can start reading it during meetings. TO can help guide us as we read it.
- Hirsch's h-index: A stochastic model, Quentin L. Burrell (see TO directory)
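- For concreteness, here is a toy computation of the standard h-index from a list of per-paper citation counts. This is just the textbook definition as a sketch, not Burrell's stochastic model or any of the alternative indexes discussed above.

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

# Example: 6 papers with these citation counts -> h = 3
print(h_index([10, 8, 5, 2, 1, 0]))  # prints 3
```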
- What is this one? https://f1000research.com/articles/12-1321/v1
- An article referring to a method that accounts for placement in author lists: https://www.nature.com/articles/d41586-024-04006-9
- The article it refers to that actually has the method: Ioannidis, J. P. A. Preprint at bioRxiv https://doi.org/10.1101/2024.10.14.618366 (2024).
- Plans to do a review of many articles. Report briefly on each during our meetings and address questions we have.
- Consider "Ioannidis used data from the Scopus citation database to compile a list of top-cited researchers on the basis of a metric he calls a “composite citation indicator”, which takes into account the varying levels of contribution by a paper’s co-authors."
- See popular account https://www.nature.com/articles/d41586-024-04006-9 (partly paywalled). Cite it in your lit. review.
- See paper: Ioannidis, J. P. A. Preprint at bioRxiv https://doi.org/10.1101/2024.10.14.618366 (2024). Cite in lit. review.
- The following supposedly say how the credit is distributed across multiple authors. That method needs to be written up in the lit. review, and these references cited. Explain what they did about distributing credit across multiple authors, why it's not good enough, and how NM and you are doing it better. (A toy position-weighting sketch appears after these two references.)
- Ioannidis JPA, Baas J, Klavans R, Boyack KW. A standardized citation metrics author database annotated for scientific field. PLoS Biol. 2019;17(8):e3000384.
- Ioannidis JPA, Klavans R, Boyack KW. Multiple Citation Indicators and Their Composite across Scientific Disciplines. PLoS Biol. 2016;14(7):e1002501.
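- To make the credit-distribution idea concrete, here is a hypothetical position-weighted split of a paper's citation credit across its author list. This harmonic weighting is an illustration only, not the formula Ioannidis et al. actually use (their composite indicator is defined in the papers cited above), and it ignores conventions such as the senior last-author position.

```python
def positional_credit(n_authors, position):
    """Toy harmonic split: author at position k of n gets (1/k) / sum_{j=1..n} (1/j).
    Hypothetical scheme for illustration; NOT the Ioannidis composite indicator."""
    weights = [1.0 / k for k in range(1, n_authors + 1)]
    return weights[position - 1] / sum(weights)

# A 4-author paper with 100 citations: credit assigned to each author position
for pos in range(1, 5):
    print(f"author {pos}: {100 * positional_credit(4, pos):.1f} citations of credit")
# author 1: 48.0, author 2: 24.0, author 3: 16.0, author 4: 12.0
```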
- Consider the concept of the "Systematic Review" or "Systematic Literature Review" (SLR). Plenty of guidance on the web about it; just hunt for it. My advice-to-PhD-students writeup may have some more concrete links. See https://thehumanraceintospace.blogspot.com/p/how-to-do-research-and-get-phd-few-hints.html.
- Makes your review more publishable.
- Guides the reviewing process in a step by step, organized, systematic way that can help you figure out what to do.
- A first step is to decide on the scope of the review. You can change it once you start doing the review and see what the terrain is like.
- NM: turn thesis into article? AAS is ok, next step is to develop an outline. AIs can be helpful as a first step.
- Is looking for a job, so schedule and attendance at these meetings are not yet clear.
- IEEE is ok with using AI to write things, but you have to disclose it in the Acknowledgements section.
- Step 1: get an outline to guide you in making the article. Ask an AI! Claude.ai is able to take long docs like a thesis. Ask it to suggest a detailed outline for a scientific article based on the thesis. Ask ChatGPT too. Compare the outlines, decide, etc. (A minimal API sketch follows this item.)
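- If you want to script this step instead of pasting into the web UI, here is a minimal sketch assuming the Anthropic Python SDK, an API key in the environment, and a plain-text export of the thesis in thesis.txt (all assumptions; the model name is a placeholder for whatever is current). A very long thesis may still need to be split into chunks.

```python
import anthropic  # assumes: pip install anthropic, ANTHROPIC_API_KEY set in the environment

client = anthropic.Anthropic()
with open("thesis.txt") as f:          # hypothetical plain-text export of the thesis
    thesis_text = f.read()

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # placeholder model name
    max_tokens=2000,
    messages=[{
        "role": "user",
        "content": "Suggest a detailed outline for a scientific journal article "
                   "based on this thesis:\n\n" + thesis_text,
    }],
)
print(response.content[0].text)
```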
- Overall research could address various experiments, such as
- The interval-based method NM used, augmented by the add-to-100% constraint.
- The 100%/80%/60% approach. Compare with interval approach, etc.
- Use the mean and 2nd level citations. Etc.
- Queuing model direction; things to check potentially include the following (a toy simulation sketch appears after this list):
- Does publishing in one journal get an article more citations than publishing in another (cf. journal impact factors)?
- Is an article a server or an arrival? Try to model it and see what happens!
- Is a citation to an article a server or an arrival? Try to model it and see what happens! Etc.
- Try different modeling approaches and compare and contrast and see if any of them have potential for further investigation.
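- As a starting point for the modeling question above, here is a toy simulation that treats an article as a "server" receiving citation "arrivals" as a Poisson process whose rate decays with the article's age. All parameters are invented for illustration and would need to be fitted or replaced in a real queuing model.

```python
import numpy as np

def simulate_citations(years=10, rate0=5.0, decay=0.7, seed=1):
    """Toy model: Poisson(rate) citation arrivals per year, with the yearly rate
    decaying geometrically as the article ages. Parameters are made up."""
    rng = np.random.default_rng(seed)
    rates = rate0 * decay ** np.arange(years)
    return rng.poisson(rates)

per_year = simulate_citations()
print("citations per year:", per_year.tolist(), "| total:", int(per_year.sum()))
```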
- https://bibliometriks.blogspot.com/p/to.html for other chapter ideas.
- R&D (read & discuss):
- It was suggested for people to present summaries of papers during meetings.
- Meho, L.; Akl, E. Using Bibliometrics to Detect Unconventional Authorship Practices and Examine Their Impact on Global Research Metrics, 2019-2023. Preprints 2024, 2024070691, https://doi.org/10.20944/preprints202407.0691.v1. We are up to Table 3 and can start there next time.
- Adjourn.
- Transcript:
Fri, Jan 24, 2025
0:00 - D.B.
Yeah. Hang on. I got it.
1:04 - G.S.
Good afternoon, guys.
1:08 - Unidentified Speaker
Good afternoon.
1:10 - E.G.
Hello, everyone.
1:16 - D.B.
OK, let's see. So for today, G.S. is going to give a presentation, which is a rehearsal of his research, his dissertation presentation. And next time, we'll see if there's any comments on this guy V., who I believe spoke this morning. And also, next time, we'll get back to those master's students who are working on their master's projects writing books. And also, there was some thought that Dr. M.'s students would be able to explain some of these tools, perhaps next time. So I haven't heard, really, from anybody about any of these things recently. But we'll see. I think we're just going to go straight to G., to you. And you can do your presentation. And everyone, any critical comments or helpful comments, anything like that at the end or during it, that's sort of why we're doing this.
2:30 - G.S.
OK. Thank you. Let me share my screen. Yeah.
2:46 - D.B.
Can I share my screen? Yeah, you should be able to.
2:52 - G.S.
Okay. All right.
2:54 - D.B.
Can you guys see my screen?
2:57 - G.S.
Yes, I can. I can. Okay. All right. Thanks, Dr. B. for the opportunity. Let me just make it full screen. You guys can see the full screen, right?
3:11 - Unidentified Speaker
Yes, sir.
3:12 - D.D.
Yeah.
3:13 - Unidentified Speaker
Okay.
3:13 - G.S.
All right. So the objective of this meeting is to go through the progress that I have made in my PhD research and essentially get an approval from my committee to proceed with the dissertation. So in my committee, I have Dr. B. as the committee chair, and then Dr. P., and the other members are Dr. P. and Dr. M., and I have an external member, Dr. S., who is a medical doctor by profession and who is an affiliate graduate faculty. The agenda is to go through the key terms and set the context of the presentation, explain what I have done as part of the prior art search and analysis, go through the main idea and hypothesis, the strategy and approach that I have used for my research, and the language models and the experimental framework, and go through my research idea called DRECT, which is Distant Rapid Embeddings Transfer. It's a novel method of transferring knowledge from a domain model to a base model. Then I will go through the experimental results and a demo. And then essentially, I will be asking the committee later, whenever I'm having a discussion with them, to proceed with the dissertation, and then any Q&A. Introduction to the key terms: EBM stands for evidence-based medicine. It is essentially a systematic approach to collect, appraise, and apply relevant evidence for clinical decision making. PICO analysis essentially is a framework for structuring clinical research questions and extracting key elements from medical text. So there are primarily four components of a PICO framework. P stands for population; it explains who the study is about, for example, the population which is participating in that particular test. Intervention is what is being tested; it could be a specific drug like insulin drugs or cancer drugs, or whatever is being tested as part of the test. Comparison is what the intervention is being compared against; it could be other drugs or it could even be placebo. And outcome is what is being measured as the measurable outcome of that particular test. SLR stands for Systematic Literature Review, which is a rigorous method followed in the academic field to collect, evaluate, and synthesize relevant research data and to answer a specific question, minimizing bias. And RCT stands for Randomized Controlled Trial; it is an experimental study which is designed for testing the efficacy of an intervention. A language model is essentially an AI model trained to understand and generate human-like language. So at a broader level, there are two categories. General purpose models are optimized for efficiency and speed; for example, DistilBERT is an example of a general purpose model. They don't really have any specific knowledge; they have a broader understanding of human language. And domain models are specialized domain-trained models which have got domain-specific information or knowledge, for example, BioBERT and ClinicalBERT. These are biomedical domain models which have been trained on biomedical literature and clinical text. Model vocabulary is the set of unique tokens the model recognizes and processes during training and inference. Embeddings are essentially numerical vector representations of those tokens. And during my presentation, I will also be talking about BLURB, which is the Biomedical Language Understanding and Reasoning Benchmark maintained by Microsoft Research. It provides a standard benchmarking process for biomedical NLP tasks.
The BLURB index supports six diverse tasks that are mentioned here: named entity recognition, relationship extraction, PICO, sentence similarity, document classification, and Q&A. And there are 13 publicly available data sets, so that there is a standard process for the benchmarking exercise. The focus of my research is PICO and not the other tasks that are mentioned in the BLURB index. This is an example of what a PICO classification task looks like. I have taken a real example from one of the documents that I have used in my research. So you can see that the purple-colored text is talking about an intervention or a comparison; cisplatin is an example of a drug. The pink one stands for, I mean, it basically shows the population; in this case it is people who have got ovarian cancer, and there are 176 eligible patients. And the green one is the outcome; in this case the overall clinical response rate has been classified as an outcome. Intervention and comparison are of the same category; that's why they have been clubbed together. Prior art search and analysis: before proceeding with the research itself, I did an extensive prior art search, and we followed the same process that has been explained by Kitchenham on how to perform a systematic literature review. The outcome of this literature review has been published by Springer and has been presented in multiple international conferences. The details are given here for reference. So for the hypothesis of the research, there are three primary hypotheses. The first is that the domain knowledge from specialized medical language models can be effectively transferred to a smaller general purpose model without requiring extensive retraining on the original specialized data set. For the purpose of the research, I considered the general purpose model to be DistilBERT, which has six transformer layers, a hidden size of 768, and 66 million parameters. For the general purpose model, there is another variation to that: BERT-base is a bigger general purpose model having 12 transformer layers, the same hidden size, but a significantly larger number of parameters. For the domain model, here is one of the examples, but there are six other domain models that I have used in my research. BioBERT has a similar number of layers, but has got a significantly higher number of parameters. Slide 11 will have the full details of all the domain models and other details about my experimental framework. The second hypothesis is that the general purpose language model exhibits improved understanding, based upon evaluation metrics of medical domain tasks, after the knowledge transfer has been done, surpassing its baseline capabilities. So we use standard evaluation metrics for benchmarking the performance of the model, as explained here, for example, balanced accuracy, MCC, Cohen's Kappa, and so on and so forth. The last is that certain tasks, more specifically the PICO classification task post-transfer, can achieve performance levels comparable to or exceeding those of the original domain-specific models. So again, the research is not to show that the general purpose model exceeds the domain model in every specific task. It is only focused on the PICO classification task. To validate the hypotheses, I went through six research questions to test the effectiveness: to what extent can the medical domain understanding be improved? How does the general purpose model specifically measure up against the PICO classification task?
In what scenarios does the general purpose model outperform? Are there any specific characteristics or conditions under which the transferred model consistently excels? And what are the computational and data resource requirements for the two approaches? The data set itself is the one that is available in the BLURB index that I explained earlier. It essentially has four labels: intervention, participant, outcome, and O, which is the outside category. And as you can see, this is an imbalanced data set, with O having a significantly higher number of values. So this is my idea, DRECT. Like I explained earlier, it is a distant rapid embeddings transfer process. I'm introducing a new technique that uses a priority-based multi-model knowledge infusion method for resource-constrained domain adaptation. The process basically merges knowledge from biomedical models into a smaller general purpose model, in this case DistilBERT, as an example of demonstrating the outcome, and achieves the same domain knowledge as what was existing in the original domain model. The novel framework advances the state of the art: there is no need for large volumes of labeled data and extensive computation. And, like I will explain in the next few slides, the research has confirmed that similar performance can be achieved; in some cases, it can even exceed the performance of the domain model for a specific task. Strategies and approach: as part of the research, I conducted more than 40 experiments. These are the various phases that I used in my research. The first was the baseline phase, where I went through a series of experiments to find out how the domain models and the base model were performing on that particular data set, and I did the same thing for the BLURB models. Then there was a set of experiments for data and vocab augmentation. And then there were various variations of the DRECT approach. Currently I'm working on DRECT version number five, which is future work, but as part of the first 40 experiments, we have been able to demonstrate that the hypothesis is validated. This is our experimental framework: there are six domain models that I considered, which are listed here. And there are two BLURB models, which are essentially the leaderboard models, the state-of-the-art models from the benchmarking index. And then there are two base models that I considered for the research: one is DistilBERT, the other is the BERT-base model. There are three DRECT variations, and they were progressively giving me better results. The first is DRECT variation 1.0, which merges the vocabularies without doing any embedding modification. The second follows the embedding average process, where the weighted average of embeddings across domain models is considered. And the last was a priority-based embedding transfer process, which used a priority-based scheduling algorithm to assign higher weights to those domain models which were giving better performance than the other domain models. With that, I would like to go through some of the demo and results. Any questions so far?
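The vocabulary-merge and embedding-copy idea described above can be pictured with a short sketch using the Hugging Face transformers library. This is a simplified single-source illustration under assumed model names, not G.S.'s actual DRECT code, and it omits the priority-weighted averaging across several domain models.

```python
from transformers import AutoTokenizer, AutoModel

# Simplified illustration: merge one domain model's extra vocabulary into DistilBERT
# and copy the corresponding input-embedding rows. Model names are assumptions.
base_tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
base_model = AutoModel.from_pretrained("distilbert-base-uncased")
dom_tok = AutoTokenizer.from_pretrained("dmis-lab/biobert-base-cased-v1.1")
dom_model = AutoModel.from_pretrained("dmis-lab/biobert-base-cased-v1.1")

new_tokens = [t for t in dom_tok.get_vocab() if t not in base_tok.get_vocab()]
base_tok.add_tokens(new_tokens)                     # extend the base vocabulary
base_model.resize_token_embeddings(len(base_tok))   # grow the embedding matrix

base_emb = base_model.get_input_embeddings().weight.data
dom_emb = dom_model.get_input_embeddings().weight.data
for tok in new_tokens:
    # copy the domain model's embedding row for each transferred token
    base_emb[base_tok.convert_tokens_to_ids(tok)] = dom_emb[dom_tok.convert_tokens_to_ids(tok)]
```

A priority-based variant would instead set each new row to a weighted average of the corresponding rows from several domain models, with higher weights for better-performing models.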
15:56 - V.W.
I have several, but I'm going to wait till you're done. OK.
16:02 - G.S.
All right. So the first is, this shows the total tokens in the original DistilBERT versus the updated DistilBERT. As you can see, the number of tokens is significantly higher after the knowledge transfer was done. And the second graph below shows the... The second slide, I mean, the bottom graph shows the tokens transferred by the individual source models, BlueBERT and the, I can't see the, and the BiomedBERT model.
16:54 - Unidentified Speaker
Okay. This is the overall metrics comparison for the PICO task.
17:03 - G.S.
Like I mentioned earlier, the comparison is only for the PICO classification task and not for the other tasks that are supported by the BLURB index. For imbalanced datasets, these are the primary metrics that are used for evaluating the performance, highlighted in the light blue color. Other metrics are good to have, but not essential. As you can see, the DRECT variation beats at least one BLURB model every single time. And in some cases, it has even beaten the BLURB model for some of the metrics. For domain models, we were able to demonstrate that DRECT variations have beaten every single domain model. And we also did a comparative analysis between DRECT using the BERT approach and DRECT using the DistilBERT approach. And we found that the DistilBERT approach gives better performance than the BERT variation. Key observations: we found that the DRECT variation beats at least one BLURB model 100% of the time, and it's a leader 50% of the time. Versus base model comparison, it outperforms in five out of 12 cases and leads in a few metrics. Versus domain models, it has outperformed the domain models every single time, including beating all the domain models. DRECT DistilBERT versus the base model: we found that DistilBERT actually outperforms the base model 83% of the time. The key finding here was that, like I explained earlier, the DRECT DistilBERT variation shows superior domain adaptation compared to the DRECT BERT variation. So that is the key takeaway from this analysis. The previous slide talks about the overall metrics. Now, we also went one level deeper: we found out how these models are performing at the individual class level, to get the class-wise metrics that are shown here. The key observations were that DRECT versus BLURB outperforms 85% of the time, and is the leader 66% of the time. DRECT versus base model outperforms 55% of the time. DRECT versus domain models, 80% of the time. And DistilBERT versus base model, 55% of the time. So DRECT variations demonstrate better performance for I-PAR and I-OUT, which are essentially the participant and outcome classes, compared to the intervention class. That is the key takeaway from doing the class-wise metrics evaluation. This is the same information represented in a graphical format. The purple bars are the DRECT variations, the pink ones are the base models, the green ones are the BLURB models, and the lightest blue ones are the domain models. Ideally, we would like to see more purple color on the left-hand extreme corner. But we see a few of them on the extreme left corner. The more you have, the better, which basically means that DRECT has beaten the base models and the BLURB models and the baseline model versions. We also did a mathematical analysis of the domain knowledge gain. We used five commonly used metrics. For cosine similarity, we saw that the cosine similarity was in the range of 0.1 to 0.3. The nearest neighbor analysis is a qualitative analysis; I have a graph which shows the nearest neighbor analysis in an animation, and basically it shows whether there is appropriate clustering before and after the transfer was done. The third is the pairwise token distance. It shows a 9.2% improvement; increased distance indicates better concept differentiation. The next was the silhouette score, or cluster coherence. It shows a 63% reduction; lower values basically show more refined semantic groupings. And the last was the average semantic shift.
It is within the optimal range of 1.5 to 2, which indicates substantial learning while maintaining semantic stability.
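The imbalanced-data metrics listed in this part of the talk (balanced accuracy, MCC, Cohen's Kappa) are all available in scikit-learn; here is a minimal sketch with made-up token labels, only to show how such numbers are computed, not G.S.'s evaluation pipeline.

```python
from sklearn.metrics import balanced_accuracy_score, cohen_kappa_score, matthews_corrcoef

# Made-up gold and predicted token labels, in the style of the PICO dataset classes
y_true = ["O", "O", "I-PAR", "I-PAR", "I-INT", "O", "I-OUT", "O", "O",     "I-INT"]
y_pred = ["O", "O", "I-PAR", "O",     "I-INT", "O", "I-OUT", "O", "I-PAR", "I-INT"]

print("balanced accuracy:", round(balanced_accuracy_score(y_true, y_pred), 3))
print("MCC:", round(matthews_corrcoef(y_true, y_pred), 3))
print("Cohen's Kappa:", round(cohen_kappa_score(y_true, y_pred), 3))
```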
22:18 - G.S.
So I have an animation which shows the embeddings for the PICO classes before and after the transfer process was done. As you can see on the right-hand side, this is the nearest neighbor analysis simulation. There are more coherent clusters that are formed after the knowledge transfer was done. So classes of similar categories, or embeddings having a similar nature, are now showing better coherence compared to how they were existing earlier. So this is the similarity nearest neighbor analysis. Then the hypothesis and research validation summary. So this is a slide which explains the summarized outcome. We are able to show that all three hypotheses have been validated. And again, this does not mean that the DRECT approach beats the domain models and the BLURB models for all the tasks. This is only focused on the PICO classification task. And this is where I stand in my research. So I'm done with all the initial phases. Now I'm presenting the results, and then the next phase is to get an approval from the committee to start documenting my dissertation. Any questions? Thank you, K.
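The before-versus-after embedding diagnostics just described (per-token cosine similarity, silhouette score of the class clusters, average semantic shift) can be reproduced with standard tools; in this sketch, random arrays stand in for the real embedding matrices.

```python
import numpy as np
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
emb_before = rng.normal(size=(200, 768))                          # placeholder embeddings
emb_after = emb_before + rng.normal(scale=0.5, size=(200, 768))   # placeholder post-transfer
labels = rng.integers(0, 4, size=200)                             # placeholder PICO class ids

# per-token cosine similarity between the old and new embedding of the same token
num = (emb_before * emb_after).sum(axis=1)
den = np.linalg.norm(emb_before, axis=1) * np.linalg.norm(emb_after, axis=1)
print("mean cosine similarity:", round(float((num / den).mean()), 3))

# cluster coherence of the post-transfer embeddings with respect to the class labels
print("silhouette score:", round(float(silhouette_score(emb_after, labels)), 3))

# average semantic shift as the mean Euclidean distance each token moved
print("average shift:", round(float(np.linalg.norm(emb_after - emb_before, axis=1).mean()), 3))
```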
24:10 - Unidentified Speaker
Yeah, time for questions. I have a couple questions and some comments.
24:17 - D.B.
First of all, this looks really professionally executed.
24:22 - V.W.
It's very organized. It shows good train of thought, good depth of research, good scientific method. Comparing and contrasting various manifestations of these models. And so I just feel like I'm in the presence of a great scientist in the making, and I'm going to be hearing all kinds of good things about you in the future. You know, we had an A. recently win a Nobel Prize for his role in the Google AlphaFold project, which enabled us to know the folded conformation of all the proteins that were in the Protein Data Bank, and it represented a great ride forward. And we have some claim to fame for that in A., as part of that Google work. So everything I say from now on, it's going to be something I would say in my own internal dialogue to myself if I was doing this work. And so I want you to take everything with that grain of salt. I understand where you are in your dissertation process, and you're at a vulnerable moment. So I don't want to, like, I don't want to be so overbearing that it hinders you, because you've definitely hit your stride, your wind is at your back, and it looks like you're doing some great work. But I have some things that I need to mention. One is your PICO paradigm looks like a very effective paradigm for evidence-based medicine. And I'd like you to take a moment and compare and contrast the PICO paradigm with a paradigm I call what versus how. What versus how can be expanded into what we know as factual information versus how we figured it out. And let me give you a specific example. When I read a typical biochemistry paper, they'll tell me in the abstract what they figured out, and they'll also tell me how they figured it out. So they tell me what, and then they tell me how. And in a good paper, they'll really cleanly separate the two aspects of what factually did we discover in this process, and then what were all the backbends and western blots and protein electrophoresis things that we had to do to get the answer. The trouble is, in a lot of papers that would have the highest quality of answered facts, we have this horrible mixing of what versus how, where we get so, and I'm not talking about you specifically, I'm talking about the biomedical research community. They get tied up so much in how they accomplished it in their laboratory procedures that it blurs the clean delineation of what was found. It's always what we found and how, what we found and how, and this knowledge is intermingled when it's actually two very distinct aspects, even though they are related; what we know is different from the detective work of how we figured out the mystery. And all we're doing is chasing mysteries, right? So the what versus how thing, I would really like to see, I'm going to do this after we're done here today. I'm going to look at PICO and compare what it yields as opposed to the discipline of separating what we know from how we figured it out. I mean, they're a little bit orthogonal to each other, but not completely. And I think it's worth a little bit of a dive on that. You know, your work is part of an ontological effort in modeling medical knowledge, and I put that in the chat. And the kind of question I had, because I've been looking at similar kinds of work, is that if we look at BERT, BERT was coming into its heyday about the time that the "Attention is All You Need" paper from Google was published, giving us the transformer architectures.
And there was a lot of work in incorporating biological and biomedical information from PubMed and NIH sources into these large language models before we really got our stride going and learning how to use the transformer model. So if I was looking at this work in 2018, I would say like, you're a Nobel Prize, Google, Alpha, Go, all that kind of thing. But to me, at today's writing, this looks like you're doing an excellent job circa 2018 with the BERT LLM precursor, where now we're with every week we're getting another order of magnitude more tokens in our training set for the large language models. So it makes me wonder if you could re and I'm not saying this is a good idea. I'm just throwing it out there to see if it sticks on the wall. I'm kind of wondering if you started with a more up to date baseline for an LLM with a transfer learning or with RAG, if you wouldn't find yourself positioned more near the real state of the art, as opposed to rediscovering a little bit of what we already know, but actually very few people know it. But nonetheless, when we're talking state of the art, we want to be at the state of the art in all aspects. And that includes the LLM. And I have a couple more remarks that I'll hold till later that has to do with just how crazy good it's getting out there, but I'll wait on your response.
29:49 - Unidentified Speaker
Sure.
29:49 - G.S.
I think, thanks, V., for the inputs. When you say state-of-the-art, I specifically went to the BLURB index. So I will just show briefly, I will share my screen. I don't know why this shows like this. So the BLURB index is considered an industry benchmark for biomedical NLP tasks. This is maintained by Microsoft. So what I did was I went to the leaderboard and just kind of sorted this by PICO. At that time, when I was doing the prior art search, this is what we found. So you can see that this is already a leaderboard metric that is maintained by the open source community. But you're right, I don't see any GPT models being explicitly mentioned here. They're all, as you can see, mostly variations of the BERT model.
31:00 - V.W.
Okay, I have an actual exercise that I think might be useful to do. Any of us could do it and present it to the group, but it would be basically to do a phylogenetic tree or an ancestry genealogy of large language models. And if we do that, we're going to see BERT and then its follow-ons to GPT-1, 2, 3, and then the branch off to the Anthropic models like Claude, and then the delineation between LLMs and GANs. And I think having that phylogenetic tree, if that's a word, is an exercise you can, like, draw a box around doing. And there's a second exercise, which is if I go to Hugging Face, I will find several models that are similar to the ones that you have in your BLURB index there, but there'll be different aspects or different efforts. And these efforts have a pattern that you can see, where a great deal of, uh, time, money, blood, sweat, and tears were invested. And then the model kind of hangs, because a bigger fish has come along and either eaten it or surpassed it or come at it in a slightly different way and shark-attacked it. You know, it's like the guy giving the inspirational talk in that sharks movie, and then the shark comes up out of the airlock and just bites him in half. And so there's this ruthless process taking place. And I would really want you not to get eaten, because I think you've got all the scientific mind and proper methodology behind you that I would trust a result that you would publish. And I just want to make sure you're at, you know, sometimes, you know, when you're sprinting for the finish and there's a guy in front of you and you think, if I just dig deep, you know, what can I
32:55 - Unidentified Speaker
do to break through that?
32:57 - V.W.
And I want to see that happen in this area for you?
33:01 - G.S.
Right. I think that's a good idea. Extending this to using GPT models would be an interesting future, you know, future research that I would probably do it whenever I have some free time.
33:15 - Unidentified Speaker
Right.
33:15 - V.W.
And free time doesn't exist where you are now. But the ability to build yourself a graph of the genealogy of LLMs is definitely within an afternoon's task. And then doing a comparative index between the Hugging Face models and the BLURB models, which are both curated in their own way, some more scientifically, some more leading edge, and maybe incorporating that as a finding; maybe there's a way you can use the RAG stepping stone to transfer-learn your way into using one of these LLMs. And that gets me to a part I want to say. I am super tired because I spent the last two weeks restoring a code base that during my PhD work had become obsolete. I had written a lot of Java, and the NetBeans IDE had drifted out of date with many versions, and the JDK, the Java Development Kit, and the Java Runtime Environment, these had all changed out from under me. And there was a user interface capability called JavaFX, which allowed you to create a user interface, which is important in modern day programming. And I went back to my code base and it was unusable because all the tools and environment had moved on. And so I asked Claude Artifacts, actually I asked Claude Sonnet 3.5, 200K context length, to help me sit down and get this straightened out. And over a two-night period, I went from being, you know, five years behind to now my stuff completely works. It's all up to the most modern JRE, JDK, NetBeans 24, and you name it. But it's one of those things that you have, I had to dig deep, and I had to do so in concert with an LLM assistant that far surpassed my expectations and that far surpassed the level of detail that I even thought was possible to interact with a machine at. The conversation was unbelievable, because I was like, if you've ever had a system programmer or somebody who mentored you in programming or somebody you could always go to and ask that system-level question or that Unix secret or the Python secret or the statistical way of looking at your p-values or your hacking or whatever you're doing, and you knew they would always have the answer and the answer would be correct. And not only that, they would give you a working example. Well, this was V., V.
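The "genealogy of LLMs" graph V.W. describes really is an afternoon's scripting exercise; here is a rough sketch with the graphviz package, where the handful of edges are illustrative and incomplete rather than an authoritative lineage.

```python
from graphviz import Digraph  # assumes the graphviz Python package and system binaries

g = Digraph("llm_genealogy", format="png")
# Rough, incomplete edges for illustration only
g.edge("Transformer (2017)", "BERT (2018)")
g.edge("BERT (2018)", "DistilBERT")
g.edge("BERT (2018)", "BioBERT")
g.edge("BERT (2018)", "RoBERTa")
g.edge("Transformer (2017)", "GPT-1")
g.edge("GPT-1", "GPT-2")
g.edge("GPT-2", "GPT-3")
g.edge("GPT-3", "GPT-4 / ChatGPT")
g.edge("Transformer (2017)", "Claude")
g.render("llm_genealogy")  # writes llm_genealogy.png
```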
35:46 - D.B.
Let's come to the point.
35:48 - V.W.
The point is, is that we're, we've come to the point where we have to be constantly making sure that we're exercising the latest state of the art to get, to get over the next hump or else we're going to get eaten. That's the point.
36:08 - G.S.
Right. Where's I.
36:11 - Unidentified Speaker
No, I'm here. Yes.
36:14 - D.B.
Any comments?
36:15 - G.S.
No, I think that was a pretty valuable input, I would say. The problem of comparing what we have on the Hugging Face site and the GPT things with the BLURB index is that BLURB uses a standard means of doing the benchmarking. So the same data set is given to every single researcher, and they are asked to benchmark their model against that particular task. I mean, basically, they don't say that we have to use only BERT variations. People are free to use GPT variations as well. There is no restriction there, right?
37:06 - G.S.
I think it will not be valid to compare what you see on the Hugging Face side with what you see on the BLURB side, because they have been benchmarked using two different approaches, two different data sets, right? So the task is not similar.
37:28 - V.W.
Right, but we want to make sure we're not benchmarking yesterday's news. In terms of a PhD dissertation, which could, for the same effort, launch us into a new and better way of doing things. And it's but for a misstep that you can find yourself in one bucket rather than the other. And that's what I've seen.
37:50 - Multiple Speakers
I want to comment that this kind of criticism is really valid for any of the kind of AI work that anybody does around here.
37:59 - D.B.
By the time a student finishes their master's degree or their dissertation, you know, well, you know, the latest from Silicon Valley is better than it was. How do you know that you're better than Silicon Valley? And the answer is that, you know, everybody's still, everybody's trying to go in different, you know, in a, explore the spaces in different ways. And, you know, if we want to do AI research at all, we're going to have to do it that way and not try to compete directly with Silicon Valley.
38:27 - V.W.
Oh, I completely disagree. I think if that, and the Google AlphaFold thing is the proof in the pudding that we are for the first time on a level playing field with MIT and Stanford and Purdue and University of Chicago. We haven't had a chance to be at this level of the playing field in our little backwater university since I can remember. And now we have that chance. We've got the MIT minds here. We've got the MIT data here. We've got the giant OpenCourseWare. So I completely disagree that we should continue in this business as usual, because we live at the little country store, and not do world-class work when, but for putting one foot in front of the other in the right direction, we could be doing world-class work for the very same level of effort.
39:16 - G.S.
It just seems ridiculous. Any last comments on that, G.? No, I think I was about to say something, but I lost my train of thought. But I think it is a valid input. I don't disagree. Yeah, I don't know.
39:32 - Y.P.
Is it okay if I also share some sentiment? Yeah, go ahead. So when G. was presenting, as an entrepreneur of business, I was always thinking on how I can take it to top medical professionals who do a lot of research. And I was always thinking about how, perhaps if I have to, how can I take this discussion to them that, hey, there is some top class things happening. So in that regard, you know, and G., if you take V.'s feedback in a certain way, when you do research and then you go to the market and tell the people, they will always think about how am I going to use it? So, I think from that perspective, I will take V.'s point into consideration because if you go with saying that, hey, because I chose something old, I continued doing, and again, the other point, how and what. I was also thinking exactly, in fact, I had one more thing, why, how and what, like your why is, you know, the medicinal side of it, evidence-based medicine. And, you know, I was just, if you recently watched what L.E. and OpenAI are trying to do in cancer research, and if you want to be associated with something like that, then at least some aspect of the latest and greatest, and I don't know, I'm not a research analyst. Other people can tell you much better. I'm getting in from the other side of the table, if I can say, so you can completely wash out my feedback. But I somehow lean to V.'s sentiments that if your research and you, you know, you want to be recognized, recognized more so, then some sense of, especially if your focus is on how to do research and what is the best way of doing it, considering latest would be beneficial. That is, again, I'm coming completely from how do I help you from a business standpoint? If I take it to the market, will they be open to listening or not? That is the point of view I'm sharing, G.
42:14 - G.S.
is useless, then leave it. I think this is a valid feedback. This is a good, good, good feedback. I mean, the point is, when I started, the objective of the research is not to show that this is the only way of accomplishing the task. So the objective, basic objective was to see if it is possible to transfer the knowledge from a domain model to a general purpose model without having to go through a lot of training and is there an efficient way of doing that knowledge transfer so that the general purpose model becomes more knowledgeable about the domain task? That was the basic objective, right? Whether you do it using, whether you are able to demonstrate that using a BERT model or you're demonstrating that using a GPT variation, I think that is just a process, right? So if it works for BERT, and I'm sure it is going to work for GPT, as well, right? So it is probably maybe more efficient, but that was my objective was not to, that was the basic objective. Just see if it is possible to do that domain transfer or not. And what is the best way of doing that? So that was the basic objective. So the modalities of whether you're using GPT or BERT or some other variation of the model, I think it is not really relevant in my opinion.
43:38 - Y.P.
Thank you for the response.
43:39 - D.B.
It occurred to me, a lot of times in this kind of research and many other kinds of research, the question gets sort of focused on which approach is better, as opposed to a different kind of perspective on it, which is: given multiple approaches that do it in different ways, is there some way to combine them to get an even better approach? I've been very abstract. To put that tangibly in this context, you didn't test the latest ChatGPT on this task, but we suspect it would work OK. Your method works OK. Is there some way to combine the two to get an even better result? Or is there some way to combine the two that might get a better result?
44:31 - Multiple Speakers
Exactly. Yeah. Makes sense. What do you think? Yeah. No real critical feedback.
44:37 - A.B.
I did want to say though, so in the industry, I work at an insurance company and we're actually partnering with a Silicon Valley vendor. Their name, M. is the name of it. But anyway, we're tackling, it's a different, I mean, it's a similar use case, different problem, but it's around summarizing medical records that get audited from like a financial recovery sort of perspective. So basically take medical records and then summarize them in a more condensed view. And then it has like decision support stuff built on top of that. But anyway, it kind of reminds, commercially, I think it's a problem that people are trying to solve in a lot of different ways. And then it kind of resonates with me because there's some of that stuff going on too, where we have to take kind of our own internal knowledge and then feed it back into these LLMs and then produce a summary with the output that we need. So anyway, just for encouragement, I think it's an interesting problem. I'd also agree, V., you made some really good comments there too.
45:50 - D.B.
I definitely agree with that some of it.
45:55 - Unidentified Speaker
Yeah. Any other thoughts or comments, questions, anybody?
45:59 - Multiple Speakers
But D. I appreciate, actually, the work that G. has done.
46:05 - V.W.
He's done a good job of presenting today.
46:08 - D.D.
So I wanted to thank him. Thanks, G.
46:12 - D.B.
Thank you. Yeah, thanks, guys.
46:14 - G.S.
I always thought D. had some good questions.
46:18 - D.B.
But if you don't, we'll forgive you. I'm good. I'm good. OK. B. or V.?
46:25 - V.K.
Yeah, I'm good, I think. It was a really excellent presentation.
46:31 - D.B.
OK, so we'll have to see. I guess we're going to meet with your colleague, I forgot his name, the physician, and Dr. P., and Dr. M., who's not here today. So she'll see your presentation anew. OK, well, thanks, everyone. And yeah, so I guess we're kind of at the end. Next time we'll see if anyone would like to tell us about what happened this morning with this guy from Finland. Who actually spoke at our departmental seminar. I didn't go. Next time, we might also have Dr. M.'s students explain these tools. I don't know what the status of this is, but it certainly would be welcome. And also, we'd like to get updates from and guidance to the master's students who are using AI to write larger works like books. I see L. is here. So hopefully, you'll be here next time, L., and we'll be able to hear some interesting stuff.
47:57 - L.G.
I'll be here next time, hopefully with interesting stuff.
48:01 - D.B.
OK, good. I don't see anybody else here, but we'll see how that goes. And then, yeah. So anything else anyone wants to mention before we adjourn? No, I think, thank you so much, Dr.
48:14 - G.S.
B., for giving me this opportunity to talk today.
48:17 - D.D.
Yeah, thank you.
48:18 - G.S.
And thanks, thanks, V. and Y., and everyone else who, you know, gave all the valuable input.
48:25 - D.D.
Yeah, thank you. Yeah, and V., I also used Poe for the animation.
48:29 - G.S.
So that was, that, I mean, the animation is not something that I created from scratch. So Poe really helped me to put this together. That's really...
48:39 - V.W.
Well, you're doing great work with great methodology, and that's very inspiring to us all. So you can view the, if it's not hot, it's, you're not in the kitchen and you're not making anything worth eating.
48:51 - G.S.
So I really appreciate your work.
48:54 - D.B.
All right. Take care, everyone. We'll see you next time.
48:57 - D.D.
Bye guys.
48:57 - Unidentified Speaker
Bye. Thank you.