
Original & Concise Bullet Point Briefs

Stanford Webinar – GPT-3 & Beyond

“The Rise of Natural Language Understanding – Professor Chris Potts and the Impact of Models GPT-3 and DaVinci 2/3”

  • Chris Potts is a professor and chair of the Department of Linguistics, and also teaches a graduate course in Natural Language Understanding
  • He has an interesting podcast and runs numerous research projects, and his expertise in NLU makes him a great resource
  • NLU has seen incredible advancements since 2012 due to models like GPT-3 and DaVinci 2/3 that are accessible via API or open source
  • These models have had increased societal impact, with many systems derived from them for code generation, search technologies, text-to-image generation, etc.
  • Benchmarks for performance measurement have saturated faster than ever.

“The Rise of Mega Language Models: Navigating the World of 8.3 Billion Parameters and Beyond”

  • Recent advances in large language models have dwarfed previous model sizes
  • Progress has been rapid: models from 2018 had around 100 million parameters, which now looks small compared to the 8.3-billion-parameter Megatron model and the 500-billion-parameter PaLM model from Google
  • The sheer scale of these models has led to a central question: how can researchers contribute if they do not have the resources to build their own large language models?
  • There are many options for contributing, such as retrieval-augmented in-context learning, creating better benchmarks, solving the last-mile problem for productive applications, and achieving faithful, human-interpretable explanations of how these models behave.

Exploring In-Context Learning: The Benefits of the Transformer Architecture and Self-Supervision

  • In-context learning traces back to the GPT-3 paper and differs from the standard supervised paradigm in artificial intelligence
  • It makes use of the Transformer architecture and self-supervision, which is a powerful approach for acquiring representations of form and meaning from co-occurrence patterns
  • It is still an open question why this works so well, with many researchers attempting to explain its success.
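The prompting workflow the bullets describe can be sketched concretely. Below is a minimal illustration of how a few-shot in-context prompt is assembled; the passage, demonstrations, and exact format are invented for illustration and are not the precise layout used in the GPT-3 paper.

```python
# Sketch of few-shot in-context learning: the "training" signal is just
# demonstrations placed in the prompt; the model's weights never change.
# The passage, questions, and answers below are illustrative.

def build_prompt(passage: str,
                 demonstrations: list[tuple[str, str]],
                 question: str) -> str:
    """Assemble a context passage, Q/A demonstrations, and a final question."""
    lines = [passage, ""]
    for q, a in demonstrations:
        lines.append(f"Q: {q}")
        lines.append(f"A: {a}")
    lines.append(f"Q: {question}")
    lines.append("A:")  # the model's completion is taken as the prediction
    return "\n".join(lines)

prompt = build_prompt(
    passage="Stanford University was founded in 1885 and opened in 1891.",
    demonstrations=[("In which year was Stanford founded?", "1885")],
    question="When did Stanford open?",
)
print(prompt)
```

The demonstrations tell the frozen model, purely through text, that the expected behavior is extractive question answering over the passage.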

Unpacking the Success of AI Innovations: Static Word Representations and Large Contextual Language Models

  • Large-scale pre-training has facilitated the rise of two important innovations
  • Static word representations (e.g., word2vec and GloVe) and large contextual language models (ELMo, BERT, GPT, and GPT-3)
  • Both are powerful tools enabled by self-supervision and by the release of learned parameters for others to build on. Human feedback and human effort have also been essential in making these models best in class, notably through instruct models, which are trained on binary judgments of good versus bad generations and on rankings of model outputs
  • This has helped reduce the magical feeling about how these models achieve so much. Finally, advanced prompting techniques help AI systems reason more logically and precisely by providing instructions on how to answer questions involving, e.g., negation (if we didn't eat any food, then we didn't eat any pizza). This is an example of step-by-step reasoning, which helps bridge the gap into knowledge-intensive tasks.
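The step-by-step idea can be made concrete with a small prompt-building sketch. The instruction wording and worked example below are hypothetical; the key pattern is the one described above, a worked reasoning example followed by a request that the model state its reasoning before answering.

```python
# Sketch of a step-by-step ("chain of thought") prompt for the negation
# example. The instruction text and worked example are illustrative; the
# pattern is: exam framing, instructions, a demonstration of reasoning,
# then the new question ending at "Reasoning:" so the model reasons first.

worked_example = (
    "Premise: We didn't eat any food.\n"
    "Question: Is it true that we didn't eat any pizza?\n"
    "Reasoning: Pizza is a kind of food. If we ate no food at all, "
    "then we ate no pizza either.\n"
    "Answer: Yes."
)

def step_by_step_prompt(premise: str, question: str) -> str:
    return (
        "This is a logic and common-sense reasoning exam.\n"
        "First explain your reasoning, then give the answer.\n\n"
        f"{worked_example}\n\n"
        f"Premise: {premise}\n"
        f"Question: {question}\n"
        "Reasoning:"  # the model completes reasoning, then "Answer: ..."
    )

print(step_by_step_prompt("We didn't buy any books.",
                          "Is it true that we didn't buy any novels?"))
```

Conditioning the answer on the model's own generated reasoning is what makes this style of prompt so much more reliable than asking the question directly.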

Exploring the Benefits and Challenges of Language Modeling for Question Answering

  • In the classic formulation, QA models find the answer as a literal substring of a given context passage, a guarantee built into the dataset
  • There is an alternative approach, "LLMs for everything," which has potential but also issues such as efficiency, updateability, and trustworthiness
  • These issues can be addressed with retrieval-augmented NLP, which uses dense numerical representations and standard information-retrieval techniques to find evidence and synthesize results into a single answer.
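A toy version of the retrieval-augmented pipeline can be sketched as follows. Real systems use learned neural encoders to produce dense vectors; here a simple bag-of-words count vector stands in for the embedding so the sketch stays self-contained, and the corpus and questions are invented.

```python
# Toy retrieval-augmented QA pipeline. Real systems use learned neural
# encoders (e.g. BERT-style retrievers); a bag-of-words count vector
# stands in for the dense embedding here so the sketch is runnable.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Stand-in embedding: lowercase token counts."""
    return Counter(text.lower().replace("?", " ").replace(".", " ").split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, corpus: list[str]) -> str:
    """Return the passage most similar to the question."""
    q = embed(question)
    return max(corpus, key=lambda p: cosine(q, embed(p)))

corpus = [
    "Stanford University was founded in 1885.",
    "The Transformer architecture was introduced in 2017.",
]
passage = retrieve("When was Stanford founded?", corpus)
# The retrieved evidence is then placed in the prompt for the language model:
prompt = f"Context: {passage}\nQuestion: When was Stanford founded?\nAnswer:"
```

Separating retrieval from generation is what makes updateability tractable: changing a fact means editing a document in the corpus, not retraining the model.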

Exploring a New Framework for Lightweight AI Programming

  • The traditional approach to designing AI systems connects pre-trained components with task-specific parameters
  • That approach often fails to create effective, integrated systems
  • This has led to an emerging programming mode in which prompts pass messages between large pre-trained components, creating AI systems that are entirely about dialogue between components
  • A new paper, Demonstrate-Search-Predict (DSP), provides a framework for this lightweight programming, allowing for maximum use of pre-trained components.
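The "dialogue between components" idea can be sketched as a tiny program. The `search` and `generate` functions below are hypothetical stubs standing in for a frozen retriever and a frozen language model; this is not the actual DSP API, just the shape of the message passing it describes.

```python
# Hypothetical sketch of the message-passing idea behind
# Demonstrate-Search-Predict (DSP). `search` and `generate` are stubs
# standing in for a frozen retriever and a frozen language model; this
# is not the real DSP library API.

def search(query: str) -> str:
    # Stub retriever: would return the best passage from a corpus.
    return f"[passage retrieved for: {query}]"

def generate(prompt: str) -> str:
    # Stub language model: would return a completion for the prompt.
    return f"[completion for: {prompt[:40]}...]"

def answer(question: str) -> str:
    """A tiny 'program' over pre-trained components:
    1. ask the LM to rewrite the question into a search query,
    2. retrieve evidence with that query,
    3. ask the LM to answer conditioned on the evidence."""
    query = generate(f"Rewrite as a search query: {question}")
    evidence = search(query)
    return generate(f"Context: {evidence}\nQuestion: {question}\nAnswer:")

print(answer("Who founded Stanford?"))
```

The whole system is specified in a few lines of glue code and prompts rather than task-specific deep-learning code, which is the lightweight-programming point the summary makes.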

Unveiling the Potential Risk of NLP and AI Technologies

  • NLP technology will cause disruption in laggard industries
  • AI assistants will become more ubiquitous, and AI writing assistance may be used for student papers
  • Negative effects of AI and NLP, such as disinformation spread, market disruption and systemic bias, will be amplified.

“AI’s Surprising Progress in the Past Two Years: Superhuman Trustworthiness and Unsolved Questions to Come”

  • Chris discussed his predictions for the next four years of AI technology
  • He noted that much of what he predicted two years ago has already come true, and he is surprised by the progress made with text-to-image models and diffusion models
  • He noted that while AI technologies have become more efficient, they also require very large expenditures and can have an environmental impact
  • Finally, he discussed how trustworthiness in these technologies may require them to be “superhuman” and how large language models may be able to answer questions that humans are not yet aware of.

“A Discussion on the Possibilities and Implications of Artificial Intelligence with Petra”

  • Petra, the webinar host from Stanford's professional development program, discussed the implications and possibilities of artificial intelligence
  • She suggested that individuals could combine their domain expertise with AI and make meaningful progress on a problem, rather than merely having demos
  • Petra concluded by thanking the audience for joining her webinar, and requested feedback for future topics.

Original & Concise Bullet Point Briefs

With VidCatter’s AI technology, you can get original briefs in easy-to-read bullet points within seconds. Our platform is also highly customizable, making it perfect for students, executives, and anyone who needs to extract important information from video or audio content quickly.

  • Scroll through to check it out for yourself!
  • Original summaries that highlight the key points of your content
  • Customizable to fit your specific needs
  • AI-powered technology that ensures accuracy and comprehensiveness

Unlock the Power of Efficiency: Get Briefed, Don’t Skim or Watch!

Experience the power of instant video insights with VidCatter! Don’t waste valuable time watching lengthy videos. Our AI-powered platform generates concise summaries that let you read, not watch. Stay informed, save time, and extract key information effortlessly.

So, Chris Potts is a professor and also the chair of the Department of Linguistics, and by courtesy also in the Department of Computer Science, and he's a great expert in the area of natural language understanding, so there would not be a better person to hear about this topic from, and we are so grateful that he could make the time. He's also teaching a graduate course, CS224U: Natural Language Understanding, that we transformed into a professional course starting next week on the same topic, so if you're interested in learning more, we have some links included down below on your platform; you can check it out. There are so many other things that can be said about Chris: he has a super interesting podcast, and he's running so many interesting research projects. So go ahead and learn more about him; you should also have a little link, I think. I think we can kick it off. Chris, thank you so much once again.

Oh, thank you so much, Petra, for the kind words, and welcome to everyone; it's wonderful to be here with you all. I do think that we live in a golden age for natural language understanding. Maybe also a disconcerting age, a weird age, but certainly a time of a lot of innovation and a lot of change. It's sort of an interesting moment for reflection for me, because I started teaching my NLU course at Stanford in 2012, about a decade ago. That feels very recent in my lived experience, but it feels like a completely different age when it comes to NLU, and indeed all of artificial intelligence. I never would have guessed in 2012 that we would have such an amazing array of technologies and scientific innovations, and that we would have these models that were just so performant and also so widely deployed in the world. This is also a story of, again, for better or worse, increasing societal impact, and so that does come together for me into a golden age. Just to reflect on this a little bit, it's
really just amazing to think about how many of these models you can get hands-on with, if you want to, right away. You can download or use via APIs models like DALL-E 2, which does incredible text-to-image generation; Stable Diffusion and Midjourney are in that class as well. We also have GitHub Copilot, based in the Codex model, for doing code generation; tons of people derive a lot of value from that system. you.com is at the leading edge, I would say, of search technologies that are changing the search experience and also leading us to new and better results when we search on the web. Whisper is an incredible model from OpenAI that does speech-to-text, and it is a generic model that is better than the best user-customized models that we had 10 years ago. Just astounding, and not something I would have predicted, I think. And then of course the star of our show for today is going to be these big language models. GPT-3 is the famous one; you can use it via an API. We have all these open-source ones as well that have come out: OPT, BLOOM, GPT-NeoX. These are models that you can download and work with to your heart's content, provided that you have all the computing resources necessary. So, just incredible.

I'm sure you're familiar with this, but let's just get this into our common ground here: it's incredible what these models can do. Here's a quick demo of GPT-3. I asked the DaVinci 2 engine: in which year was Stanford University founded, when did it enroll its first students, who is its current president, and what is its mascot? And DaVinci 2 gave a fluent, complete, great answer that is correct on all counts. Just incredible. That was with DaVinci 2. We got a big update to that model in late 2022; that's DaVinci 3, and here I'm showing you that it reproduces that result exactly, and I do think that DaVinci 3 is a big step forward over the previous engine. Here's actually an example of that. I like to play adversarial games with this model, and so I asked DaVinci 2: would it be possible to hire a team of
tamarins to help me paint my house, assuming I'm willing to pay them in sufficient quantities of fruit to meet minimum-wage requirements in California? This is adversarial because I know that these models don't have a really rich understanding of the world we live in; they're often distracted by details like this. And sure enough, DaVinci 2 got confused: yes, it would be possible to hire a team of tamarins to paint your house; you would need to make sure that you're providing them with enough fruit to meet minimum-wage requirements, and so forth. So, easily distracted. But I tried this again with DaVinci 3, and with the same question it gave a very sensible answer: no, it would not be possible to hire a team of tamarins to help you paint your house. DaVinci 3 was not distracted by my adversarial game. This is not to say that you can't trick DaVinci 3; just go on to Twitter and you'll find examples of that. But again, I do think we're seeing a pretty remarkable rate of progress toward these models being robust and relatively trustworthy.

This is also a story of scientific innovation. That was a brief anecdote, but we're seeing this same level of progress in the tools that we use to measure system performance in the field. I've put this under the heading of "benchmarks saturate faster than ever." This is from a paper from 2021 that I was involved with, Kiela et al. Here's the framework: along the x-axis I have time, going back to the 1990s, and along the y-axis I have a normalized measure of our estimate of human performance; that's the red line, set at zero. MNIST digit recognition, a grand old dataset in the field that was launched in the 1990s, took about 20 years for us to surpass this estimate of human performance. Switchboard is a similar story: launched in the 90s, this is the speech-to-text problem, and it took about 20 years for us to get past this red line. ImageNet is newer; it was launched in 2009, and it took about 10 years for us to reach this saturation point. And from here the pace is really going to pick up. SQuAD 1.1 is
question answering; that was solved in about three years. The response was SQuAD 2.0; that was solved in less than two years. And then the GLUE benchmark: if you were in the field, you might recall that the GLUE benchmark is this big set of tasks that was meant to stress-test our best models. When it was announced, a lot of us worried that it was just too hard for present-day models, but GLUE was saturated in less than a year. The response was SuperGLUE, meant to be much harder; it was also saturated in less than a year. A remarkable story of progress, undoubtedly. Even if you're cynical about this measure of human performance, we are still seeing a rapid increase in the rate of change here, and 2021 was ages ago in the story of AI.

Now, I think this same thing carries over into the current era with our largest language models. This is from a really nice post from Jason Wei. He is assessing emergent abilities in large language models; you see eight of them given here. Along the x-axis for these plots you have model size, and on the y-axis you have accuracy, and what Jason is showing is that at a certain point these really big models just attain these abilities to do these really hard tasks. Jason estimates that for 137 tasks, models are showing this kind of emergent ability, and that includes tasks that were explicitly set up to help us stress-test our largest language models. They're just falling, one by one. Really incredible.

Now, we're going to talk a little bit later about the factors that are driving this enormous progress for large language models, but I want to be up front that one of the major factors here is just the raw size of these models. You can see that in Jason's plots; that's where the emergent ability kicks in. And let me put that in context for you. This is from a famous paper that's actually about making models smaller, and what they did is track the rise of increases in model size. Along the x-axis we have time; it only goes back to 2018.
It's not very long ago, and in 2018 the largest of our models had around 100 million parameters, which seems small by current comparisons. In late 2019 and early 2020 we start to see a rapid increase in the size of these models, so that by the end of 2020 we have this Megatron model at 8.3 billion parameters. I remember when that came out; it seemed like it must be some kind of typo. I could not fathom that we had a model that was that large. But now, of course, this is kind of on the small side. Soon after that we got an 11-billion-parameter variant of that model, and then GPT-3 came out, at 175 billion parameters, and that one too now looks small in comparison to these truly gargantuan Megatron models and the PaLM model from Google, which surpassed 500 billion parameters. I want to emphasize that this has made a complete mockery of the y-axis of this plot. To capture the scale correctly we would need 5,000 of these slides stacked on top of each other. It still feels weird to say that, but that is the truth: the scale of this is absolutely enormous, and not something I think I would have anticipated way back when we were dealing with those hundred-million-parameter babies. By comparison, they seemed large to me at that point.

So this brings us to our central question. It's a golden age; this is all undoubtedly exciting, and the things that I've just described to you are going to have an impact on your lives, positive and negative, but certainly an impact. But I take it that we are here today because we are researchers, and we would like to participate in this research, and that could leave you with a kind of worried feeling: how can you contribute to NLU in this era of these gargantuan models? I've set this up as a kind of flow chart. First question: do you have 50 million dollars and a love of deep-learning infrastructure? If the answer is yes to this question, then I would encourage you to go off and build your own large language model; you could change the world in this way. I would also request that you get in touch with me; maybe
you could join my research group, and maybe fund my research group; that would be wonderful. But I'm assuming that most of you cannot truthfully answer yes to this question. I'm in the "no" camp, and on both counts: I am both dramatically short of the funds, and I also don't have a love of deep-learning infrastructure. So for those of us who have to answer no to this question, how can you contribute? Even if the answer is no, there are tons of things that you can be doing. Topics that are front of mind for me include retrieval-augmented in-context learning; this could give us small models that are performant. You could always contribute to creating better benchmarks; this is a perennial challenge for the field, and maybe the most significant thing that you can do is create devices that allow us to accurately measure the performance of our systems. You could also help us solve what I've called the last-mile problem for productive applications. These central developments in AI take us 95 percent of the way toward utility, but that last five percent, actually having a positive impact on people's lives, often requires twice as much development and twice as much innovation, across domain experts, people who are good at human-computer interaction, and AI experts. There's just a huge amount that has to be done to realize the potential of these technologies. And then finally, you could think about achieving faithful, human-interpretable explanations of how these models behave. If we're going to trust them, we need to understand how they work at a human level. That is supremely challenging, and therefore this is incredibly important work you could be doing.

Now, I would love to talk with you about all four of those things and really elaborate on them, but our time is short, and so what I've done is select one topic, retrieval-augmented in-context learning, to focus on, because it's intimately connected to this notion of in-context learning, and it's a place where all of us can participate in lots of innovative ways. So that's
kind of the central plan for the day. Before I do that, though, I just want to help us get more common ground around what I take to be the really central change that's happening as a result of these large language models, and I've put that under the heading of the rise of in-context learning. Again, this is something we're all getting used to; it really marks a genuine paradigm shift, I would say. In-context learning really traces to the GPT-3 paper. There are precedents earlier in the literature, but it was the GPT-3 paper that really gave it a thorough initial investigation and showed that it had promise with the earliest GPT models. Here's how this works. We have our big language model, and we prompt it with a bunch of text. For example, this is from that GPT-3 paper: we might prompt the model with a context passage and a title. We might follow that with one or more demonstrations; here the demonstration is a question and an answer, and the goal of the demonstration is to help the model learn in context, that is, from the prompt we've given it, what behavior we're trying to elicit from it. And so here you might say we're trying to coax the model to do extractive question answering: to find the answer as a substring of the passage we gave it. You might have a few of those, and then finally we have the actual question we want the model to answer. We prompt the model with this prompt; that puts it in some state, and then its generation is taken to be the prediction or response, and that's how we assess its success. The whole idea is that the model can learn in context, that is, from this prompt, what we want it to do.

So that gives you a sense for how this works; you've probably all prompted language models like this yourself already. I want to dwell on this for a second, though. This is a really different thing from what we used to do throughout artificial intelligence. Let me contrast in-context learning with the standard paradigm of supervision. Back in the old days of 2017 or whatever, we would typically set things up like this. We
would have, say, wanted to solve a problem like classifying texts according to whether they express nervous anticipation, a complex human emotion. The first step would be that we would need to create a dataset of positive and negative examples of that phenomenon, and then we would train a custom-built model to make the binary distinction reflected in the labels. This can be surprisingly powerful, but you can start to see already how it isn't going to scale to the complexity of the human experience. We're going to need separate datasets, and maybe separate models, for optimism and sadness and every other emotion you can think of, and that's just a subset of all the problems we might want our models to solve. For each one, we're going to need data and maybe a custom-built model. The promise of in-context learning is that a single big frozen language model can serve all those goals. In this mode we do that prompting thing that I just described: we're going to give the model examples, expressed just in flat text, of positive and negative instances, and hope that that's enough for it to learn in context about the distinction we're trying to establish. This is really, really different. Consider that over here the phrase "nervous anticipation" has no special status; the model doesn't really process it. It's entirely structured to make a binary distinction, and the label "nervous anticipation" is kind of for us. On the right, the model needs to learn essentially the meanings of all of these terms and our intentions, and figure out how to make these distinctions on new examples, all from a prompt. It's just weird and wild that this works at all. I think I used to be discouraging about this as an avenue, and now we're seeing it bear so much fruit.

What are the mechanisms behind this? I'm going to identify a few of them for you. The first one is certainly the Transformer architecture. This is the basic building block of essentially all the language models that I've mentioned so far. We have great coverage of the Transformer in our course, Natural Language
Understanding, so I'm going to do this quickly. The Transformer starts with word embeddings and positional encodings. On top of those we have a bunch of attention mechanisms; these give the name to the famous paper "Attention Is All You Need," which announced the Transformer. Evidently attention is not all you need, because we have these positional encodings at the bottom, and then we have a bunch of feed-forward layers and regularization steps at the top. But attention really is the beating heart of this model, and it really was a dramatic departure from the fancy mechanisms, LSTMs and so forth, that were characteristic of the pre-Transformer era. So that's, essentially, the full model on the diagram here. In the course we have a bunch of materials that help you get hands-on with Transformer representations and also dive deep into the mathematics, so I'm just going to skip past this. I will say that if you dive deep, you're likely to go through the same journey we all go through, where your first question is: how on earth does this work? This diagram looks very complicated. But then you come to terms with it, and you realize, oh, this is actually a bunch of very simple mechanisms. But then you arrive at a question that is a burning question for all of us: why does this work so well? This remains an open question. A lot of people are working on explaining why this is so effective, and that is certainly an area in which all of us could participate: analytic work understanding why this is so successful.

The second big innovation here is the realization that what I've called self-supervision is an incredibly powerful mechanism for acquiring rich representations of form and meaning. This is also very strange. In self-supervision, the model's only objective is to learn from co-occurrence patterns in the sequences it's trained on. This is purely distributional learning. Another way to put this is that the model is just learning to assign high probability to attested sequences. That is the fundamental mechanism. We think about these models as generators, but
generation is just sampling from the model; that's a kind of secondary or derivative process. The main thing is learning from these co-occurrence patterns. An enlightening thing about the current era is that it's fruitful for these sequences to contain lots of symbols, not just language but computer code, sensor readings, even images, and so forth. Those are all just symbol streams, and the model learns associations among them. The core thing about self-supervision, though, the thing that really contrasts it with the standard supervised paradigm I mentioned before, is that the objective doesn't mention any specific symbols or relations between them; it is entirely about learning these co-occurrence patterns, and from this simple mechanism we get such rich results. And that is incredibly empowering, because you need hardly any human effort to train a model with self-supervision; you just need vast quantities of these symbol streams. And so that has facilitated the rise of another important mechanism here: large-scale pre-training. There are actually two innovations happening here. We see the rise of large-scale pre-training in the earliest work on static word representations like word2vec and GloVe, and what those teams realized is not only that it's powerful to train on vast quantities of data using just self-supervision, but also that it's empowering to the community to release those parameters, not just data, not just code, but the actual learned representations, for other people to build on. That has been incredible in terms of building effective systems. After those we get ELMo, which was the first model to do this for contextual word representations, truly large language models. Then we get BERT, of course, and GPT, and then finally, of course, GPT-3, at a scale that was previously unimagined, and maybe kind of unimaginable for me.

A final piece that we should not overlook is the role of human feedback in all of this, and I'm thinking in particular of the OpenAI models. I've given a lot of coverage so far of
this mechanism of self-supervision, but we have to acknowledge that our best models are what OpenAI calls the instruct models, and those are trained with way more than just self-supervision. This is a diagram from the ChatGPT blog post. It has a lot of details, but I'm confident that there are really two pieces that are important. First, the language model is fine-tuned on human-level supervision, just making binary distinctions about good generations and bad ones; that's already beyond self-supervision. And then, in a second phase, the model generates outputs, humans rank all of the outputs the model has produced, and that feedback goes into a lightweight reinforcement-learning mechanism. In both of those phases we have important human contributions that take us beyond that self-supervision step, and kind of reduce the magical feeling of how these models are achieving so much. I'm emphasizing this because I think what we're seeing is a return to a familiar and kind of cynical-sounding story about AI, which is that many of the transformative steps forward are actually on the back of a lot of human effort behind the scenes, expressed at the level of training data. But on the positive side here, it is incredible that this human feedback is having such an important impact. Instruct models are best in class in the field, and we have a lot of evidence that that must be because of these human-feedback steps, happening at a scale that I assume is astounding. They must have, at OpenAI, large teams of people providing very fine-grained feedback across lots of different domains, with lots of different tasks in mind.

One final piece by way of background: prompting itself. This has been a real journey for all of us. I've described this as step-by-step and chain-of-thought reasoning. To give you a feel for how this is happening, let's just imagine that we've posed a question like: can our models reason about negation? That is, if we didn't eat any food, does the model know that we didn't eat any pizza? In the old days of 2021, we were so naive; we would prompt models with
just that direct question, like "is it true that if we didn't eat any food then we didn't eat any pizza," and we would see what the model said in return. Now, in 2023, we know so much more, and we have learned that it can really help to design a prompt that helps the model reason in the intended ways. This is often called step-by-step reasoning. Here's an example of a prompt that was given to me by Omar Khattab. You start by telling it that it's a logic and common-sense reasoning exam; for some reason that's helpful. Then you give it some specific instructions, and then you use some special markup to give it an example of the kind of reasoning that you would like it to follow. After that example comes the actual prompt, and in this context what we essentially ask the model to do is express its own reasoning and then, conditional on what it has produced, create an answer. The eye-opening thing about the current era is that this can be transformatively better. I think if you wanted to put this poetically, you'd say that these large language models are kind of like alien creatures, and it's taking us some time to figure out how to communicate with them, and together with all that instruct fine-tuning with human supervision, we're converging on prompts like this as the powerful device. This is exciting to me, because what's really emerging is that this is a kind of very light way of programming an AI system, using only prompts, as opposed to all the deep-learning code that we used to have to write, and that's going to be incredibly empowering in terms of system development and experimentation.

All right, so we have our background in place. I'd like to move to my main topic here, which is retrieval-augmented in-context learning. What you're going to see here is a combination of language models with retriever models, which are themselves, under the hood, large language models as well. Let me start with a bit of the back story. I think we're all probably vaguely aware at this point that large language models have been revolutionizing search. Again, the star of this
is the Transformer, or maybe more specifically its famous spokesmodel, BERT. Right after BERT was announced, around 2018, Google announced that it was incorporating aspects of BERT into its core search technology, and Microsoft made a similar announcement at about the same time. I think those are just two public-facing stories among many instances of large search technologies having BERT elements incorporated into them in that era. And then of course, in the current era we have startups like you.com, which have made large language models pretty central to the entire search experience, in the form of delivering results but also interactive search with conversational agents. So that's all exciting, but I am an NLPer at heart, and so for me, in a way, the more exciting direction here is the fact that finally search is revolutionizing NLP, by helping us bridge the gap into much more relevant, knowledge-intensive tasks. To give you a feel for how that's happening, let's just use question answering as an example. Prior to this work in NLP, we would pose question answering, QA, in the following way. You saw this already with the GPT-3 example: we would be given, at test time, a title and a context passage, and then a question, and the task of the model is to find the answer to that question as a literal substring of the context passage, which was guaranteed by the nature of the dataset. As you can imagine, models are really good at this task, certainly superhuman at it, but it's also a very rarefied task. This is not a natural form of question answering in the world, and it's certainly unlike the scenario of, for example, doing web search. So the promise of the open formulations of this task is that we're going to connect more directly with the real world. In this formulation, at test time we're just given a question, and the standard strategy is to rely on some kind of retrieval mechanism to find relevant evidence in a large corpus, or maybe even the web, and then we proceed as before. This is a much harder problem,
We're not going to get the substring guarantee anymore, because we're dependent on the retriever to find relevant evidence. But of course it's a much more important task, because this is much more like our experience of searching on the web.

Now, I've already biased things by describing them this way, where I assume we're retrieving a passage, but there is another narrative out there; let me skip to this. You could call it the "LLMs for everything" approach, and this would be where there's no explicit retriever: you just have a question come in, you have a big opaque model process that question, and out comes an answer. Voilà, you hope that the user's information need is met directly; no separate retrieval mechanism, just the language model doing everything. I think this is an incredibly inspiring vision, but we should be aware that there are lots of danger zones here.

The first is just efficiency. One of the major factors driving that explosion in model size that I tracked before is that in this LLMs-for-everything approach, we are asking the model to play the role of both knowledge store and language capability. If we could separate those out, we might get away with smaller models.

We have a related problem of updateability. Suppose a fact in the world changes; a document on the web changes, for example. You're going to have to update the parameters of this big opaque model somehow to conform to the change in reality. There are people hard at work on that problem, and it's a very exciting problem, but I think we're a long way from being able to offer guarantees that a change in the world is reflected in the model's behavior, and that plays into all sorts of issues of trustworthiness and explainability of behavior.

We also have an issue of provenance. Look at the answer at the bottom: is that the correct answer? Should you trust this model? In the standard web search experience, we are typically given some web pages that we can click on to verify, at least at the next level of detail, whether the
information is correct. But here we're just given this response, and even if the model also generated a provenance string, if it told us where it found the information, we'd be left with the concern that the provenance string was also untrustworthy. This is really breaking a fundamental contract that users expect to have with search technologies, I believe.

So those are some things to worry about. There are positives, though, of course: these models are incredibly effective at meeting your information need directly, and they're also outstanding at synthesizing information. If your question can only be answered by ten different web pages, it's very likely that the language model will still be able to do it without you having to hunt through all those pages. So, exciting, but lots of concerns here.

Here is the alternative of retrieval-augmented approaches. Oh, I can't resist this, actually, just to give you an example of how important this trustworthiness thing can be. I used to be impressed by DaVinci 2, because it would give a correct answer to the question "Are professional baseball players allowed to glue small wings onto their caps?" This is a question I got from a wonderful article by Hector Levesque, where he encourages us to stress-test our models by asking them questions that would seem to run up against any simple distributional or statistical learning model, and really get at whether they have a model of the world. DaVinci 2 gave what looked like a really good Levesque-style answer: there is no rule against it, but it is not common. That seems true.

So I was disappointed, or I'm actually not sure how to feel about this one: I asked DaVinci 3 the same question, and it said no, professional baseball players are not allowed to glue small wings onto their caps; Major League Baseball has strict rules about the appearance of players' uniforms and caps, and any modifications of the caps are not allowed. That also sounds reasonable to me. Is it true? It would help enormously if the model could offer me at least a web
page with evidence that's relevant to these claims. Otherwise I'm simply left wondering, and I think that shows you that we've kind of broken this implicit contract with the user that we expect from search.

So that brings me to my alternative: retrieval-based, or retrieval-augmented, NLP. To give you a sense for this, at the top here I have a standard search box, and I've put in a very complicated question indeed. The first step in this approach is familiar from the LLMs-for-everything one: we're going to encode that query into a dense numerical representation capturing aspects of its form and meaning, and we use a language model for that. The next step is new, though: we are also going to use a language model, maybe the same one we used for the query, to process all of the documents in our document collection, so each one has some kind of numerical deep learning representation. On the basis of these representations we can now score documents with respect to queries, just like we would in the good old days of information retrieval. We can reproduce every aspect of that familiar experience if we want to; we're just doing it now in this very rich semantic space.

So we get some results back, and we could offer those to the user as ranked results. But we can also go further: we could have another language model, call it a reader or a generator, slurp up those retrieved passages and synthesize them into a single answer, maybe meeting the user's information need directly.

So let's check in on how we're doing with respect to our goals. First, efficiency: I won't have time to substantiate this today, but these systems, in terms of parameter counts, can be much smaller than the integrated approach I mentioned before. We also have an easy path to updateability: we have this index, so as pages change in our document store, we simply use our frozen language model to reprocess and re-represent them, and we can have a pretty good guarantee at this point that information changes will be reflected in the retrieved results.
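The retrieval step just described can be sketched end to end. Real systems encode the query and every document with a neural language model (a BERT-style encoder) into dense vectors; in this toy sketch, a simple token-count vector stands in for those representations so the scoring and ranking logic stays visible.

```python
# Toy sketch of dense-retrieval-style scoring and ranking. The count-vector
# "encoder" is a stand-in for a neural encoder; the documents are invented.
import math
import re
from collections import Counter

def encode(text: str) -> Counter:
    """Stand-in 'encoder': a count vector over lowercased word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def score(q: Counter, d: Counter) -> float:
    """Cosine similarity between two count vectors."""
    dot = sum(q[t] * d[t] for t in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Encode the document collection once (the index), then rank."""
    index = [(doc, encode(doc)) for doc in docs]
    q = encode(query)
    ranked = sorted(index, key=lambda pair: score(q, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

docs = [
    "ColBERT is a late-interaction retrieval model.",
    "Pizza is a popular Italian dish.",
    "Retrieval models score documents against queries.",
]
top = retrieve("which retrieval model scores documents", docs, k=1)
```

In a real system the index is built offline with a frozen encoder, which is exactly what makes updateability easy: when a document changes, you re-encode just that document rather than retraining anything.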
We're also naturally tracking provenance, because we have all these documents, they're used to deliver the results, and we can have that carry through into the generation; so we've kept that contract with the user. These models are incredibly effective: across lots of the literature, we're seeing that retrieval-augmented approaches are just superior to the fully integrated, LLMs-for-everything one. And we've retained the benefit of LLMs for everything, because we have this model down here, the reader-generator, that can synthesize information into answers that meet the information need directly. So that's my fundamental pitch.

Now, again, things are changing fast, and even the approach to designing these systems is changing really fast. In the previous era, say 2020, we would have these pre-trained components: we have our index and our retriever, maybe we have a language model like a reader-generator, and you might have other pre-trained components, image processing and so forth. So you have all these assets, and the question is how you are going to bring them together into an integrated solution. The standard deep learning answer to that question is to define a bunch of task-specific parameters that are meant to tie together all those components, and then you learn those parameters with respect to some task, and you hope that this has created an effective integrated system. That's the modular vision of deep learning. The truth in practice is that even for very experienced researchers and system designers, this can often go really wrong, and debugging these systems and figuring out how to improve them can be very difficult, because they are so opaque and the scale is so large.

But maybe we're moving out of an era in which we have to do this at all. This brings us back to in-context learning. The fundamental insight is that many of these models can, in principle, communicate in natural language: a retriever is abstractly just a device for pulling in text and producing text with scores, and a language model
is also a device for pulling in text and producing text with scores. We have already seen, in my basic picture of retrieval-augmented approaches, that we could have the retriever communicate with the language model via retrieved results. Well, what if we just allow that to go in both directions? Now we've got a system that is essentially constructed by prompts that help these models do message passing between them, in potentially very complicated ways: an entirely new approach to system design that I think is going to have an incredible democratizing effect on who designs these systems and what they're for.

Let me give you a deeper sense for just how wide open the design space is here, and for how much of this research is still left to be done, even in this golden era. Let's imagine a search context; the question is what course to take. What we're going to do in this new mode is begin a prompt that contains that question, just as before. What we can do next is retrieve a context passage; that'll be like the retrieval-augmented approach I showed you at the start of this section, and we could just use our retriever for that.

But there's more that could be done. What about demonstrations? Let's imagine that we have a little train set of QA pairs that demonstrate for our system what the intended behavior is. We can add those into the prompt, and now we're giving the system a lot of few-shot guidance about how to learn in context. But that's also just the beginning. I might have sampled those training examples randomly from my train set, but I have a retriever, remember, so what I could do instead is find the demonstrations that are the most similar to the user's question and put those in my prompt, with the expectation that that will help it understand topical coherence and lead to better results. And I could go further: I could use my retriever again to find relevant context passages for each one of those demonstrations, to further help it figure out how to reason in terms of evidence.
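The demonstration-selection step can be sketched in a few lines. Here the retriever is again a crude token-overlap scorer, and the tiny train set is invented; the point is only the shape of the few-shot prompt that results.

```python
# Sketch of retrieving the training demonstrations most similar to the
# user's question and weaving them into a few-shot prompt. Token-overlap
# similarity stands in for a real neural retriever; the data is invented.

TRAIN = [
    ("What course covers natural language understanding?", "CS224U"),
    ("What is the capital of France?", "Paris"),
    ("Which course teaches deep learning for NLP?", "CS224N"),
]

def overlap(a: str, b: str) -> int:
    """Crude similarity: number of shared lowercased tokens."""
    return len(set(a.lower().split()) & set(b.lower().split()))

def build_few_shot_prompt(question: str, k: int = 2) -> str:
    """Pick the k most similar QA pairs and format them as demonstrations."""
    demos = sorted(TRAIN, key=lambda qa: overlap(question, qa[0]),
                   reverse=True)[:k]
    lines = [f"Q: {q}\nA: {a}" for q, a in demos]
    lines.append(f"Q: {question}\nA:")
    return "\n\n".join(lines)

prompt = build_few_shot_prompt("What course should I take for NLP?")
```

The course-related demonstrations end up in the prompt and the off-topic one is left out, which is exactly the topical-coherence effect described above.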
And that also opens up a huge design space. We could do what we call hindsight retrieval, where for each one of these demonstrations we use both the question and the answer to find relevant context passages, to give the model really integrated informational packets that it can benefit from. And there's lots more that we could do with these demonstrations; you're probably starting to see it: we could do some rewriting and so forth, really making sophisticated, interwoven use of the retriever and the language model.

We could also think about how we selected this background passage. I was assuming that we would just retrieve the most relevant passage according to our question, but we could also think about rewriting the user's query in terms of the demonstrations that we constructed, to get a new query that will help the model. That's especially powerful if you have a kind of interactional mode where the demonstrations are actually part of a dialogue history or something like that.

And then finally, we could turn our attention to how we're actually generating the answer. I was assuming we would take the top generation from the language model, but we could do much more. We could filter its generations to just those that match a substring of the passage, reproducing some of the old mode of question answering, but now in this completely open formulation; that can be incredibly powerful if you know your model can retrieve good background passages. Those are two simple steps; you could also go all the way to the other extreme and use the full retrieval-augmented generation, or RAG, model, which essentially creates a full probability model that allows us to marginalize out the contribution of passages. That can be incredibly powerful in terms of making maximal use of the capacity of this model to generate text conditional on all the work that we did up here.

I hope that's given you a sense for just how much can happen here. What we're starting to see, I think, is that there is a new programming mode emerging.
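That substring-filtering step on the generated answers is tiny in code. A sketch with invented candidates; in practice they would be sampled generations from the language model.

```python
# Sketch of the answer-selection step: keep only candidate generations that
# appear verbatim in the retrieved passage, recovering the old extractive
# guarantee inside the open formulation. The candidates here are invented.

def filter_to_substrings(candidates: list[str], passage: str) -> list[str]:
    """Drop any candidate answer that is not a literal span of the passage."""
    return [c for c in candidates if c in passage]

passage = "Major League Baseball regulates the appearance of uniforms and caps."
candidates = ["uniforms and caps", "small wings", "Major League Baseball"]
kept = filter_to_substrings(candidates, passage)
```

This only pays off when the retrieved passage is good; the full RAG alternative instead scores generations under a probability model that marginalizes over passages.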
mode that involves usingthese large pre-trained components todesign in codeprompts that are essentially full AIsystems that are entirely about messagepassing between these Frozen componentswe have a new paper out that's calleddemonstrate search predict or DSP thisis a lightweight programming frameworkfor doing exactly what I was justdescribing for youand one thing I want to call out is thatour results are fantasticnow you know we can Pat ourselves on theback we have a very talented team and soit's no surprise the results are so goodbut I actually want to be upfront withyou I think the real insight here isthat it is such early days in terms ofus figuring out how to construct theseprompts how to program these systemsthat we've only just begun to understandwhat's optimal we have explored only atiny part of the space and everythingwe're doing is sub-optimal and that'sjust the kind of conditions where youget these huge leap forwards leapsforward in performance on these tasks soI suspect that the Bold row that we havehere will not be long lived given howmuch Innovation is happening in thisspaceand I want to make a pitch for ourcourse here right so we have in thiscourse a bunch of assignment slash bakeoffs uh and the way that worksessentially is that you have anassignment that helps you build somebaselines and then work toward anoriginal system which you enter into abake off which is a kind of informalcompetition around data and modelingour newest of these is called few shotopen QA with Colbert retrieval it's aversion of the problems that I've justbeen describing for you this is aproblem that could not even have beenmeaningfully posed five years ago andnow we are seeing students doingincredible Cutting Edge things in thismode it's exactly what I was justdescribing for you and we're in the sortof moment where a student project couldlead to a paper that you know reallyleads to state-of-the-art Performance insurprising ways again because there isjust so much research that 
I'm running out of time, so what I think I'll do is briefly call out again those important other areas that I've given short shrift to today but that I think are just so important, starting with datasets. I've been talking about system design and task performance, but it is now, and will always be, the case that contributing new benchmark datasets is basically the most important thing you can do. I like this analogy: Jacques Cousteau said that water and air are "the two essential fluids on which all life depends". I would extend that to NLP: our datasets are the resource on which all progress depends. Now, Cousteau continued that they "have become global garbage cans". I am not that cynical about our datasets; I think we've learned a lot about how to create effective datasets, and we're getting better at this. But we need to watch out for this metaphorical pollution, and we need always to be pushing our systems with harder tasks that come closer to the human capabilities we're actually trying to get them to achieve. Without contributions of datasets, we could be tricking ourselves when we think we're making a lot of progress.

The second thing I wanted to call out relates to model explainability. We're in an era of incredible impact, and that has rightly turned researchers to questions of system reliability, safety, trust, approved use, and pernicious social biases. We have to get serious about all of these issues if we're going to responsibly have all of the impact that we're achieving at this point. All of these things are incredibly difficult, because the systems we're talking about are enormous, opaque devices that are impossible to understand analytically, and that clouds our understanding of them. To me, that shines a light on the importance of achieving analytic guarantees about our model behaviors; that seems to me to be a prerequisite for getting serious about any one of these topics. The goal, in our terms, is to achieve faithful, human-interpretable explanations of model behavior. We
have great coverage of these methods in the course: hands-on materials, screencasts, and other things that will help you participate in this research, and also, as a side effect, write absolutely outstanding discussion and analysis sections for your papers.

The final thing I wanted to call out is that last-mile problem. Fundamental advances in AI take us 95 percent of the way there, but that last five percent is every bit as difficult as the first 95. In my group we've been looking a lot at image accessibility. This is an incredibly important societal problem, because images are so central to modern life: on the web and in social media, but also in the news and in our scientific discourse. It's a sad fact about the current state of the world that almost none of these images are made non-visually accessible, so blind and low-vision users are basically unable to understand all this content and receive all of this information. Something has to change that. Image-based text generation has become incredibly good over the last ten years; that's another story of astounding progress. But it has yet to take us to the point where we can actually write useful descriptions of these images that would help a BLV user, and that last bit is going to require HCI research, linguistic research, and fundamental advances in AI, and, by the way, lots of astounding new datasets. This is just one example of the innumerable applied problems that fall into this mode, and it can be very exciting for people who have domain expertise that can help us close that final mile.

So let me wrap up here. I don't want to have a standard conclusion; I think it's fun to close with some predictions about the future. I have put this under the heading of predictions for the next ten years or so, although I'm about to retract that, for reasons I will get to. Here are the predictions. First, laggard industries that are rich in text data will be transformed in part by NLP technology, and that's likely to happen via some disruptive
newcomers coming out of left field. Second prediction: artificial assistants will get dramatically better and become more ubiquitous, with the side effect that you'll often be unsure in life whether a customer service representative is a person or an AI, or some team combining the two. Third: many kinds of writing, including student papers at universities, will be done with AI writing assistance, and this might be transparently true already, given how sophisticated auto-complete and other tools have gotten at this point. And then finally: the negative effects of NLP and of AI will be amplified along with the positives. I'm thinking of things like disinformation spread, market disruption, and systemic bias. It's almost sure to be the case, if it hasn't happened already, that there will be some calamitous world event that traces to the intentional or unintentional misuse of some AI technology; that's in our future.

So I think these are reasonable predictions, and I'm curious for yours, but I have to tell you that I made these predictions in 2020, two years ago, with the expectation that they would be good for ten years. More than half of them probably have already come true; two and three are definitely true about the world we live in. And on the flip side, I just failed to predict so many important things. The most prominent example is that I failed to predict the progress we would see in text-to-image models like DALL-E 2 and Stable Diffusion. In fact, I'll be honest with you, I might have bet against them: I thought that was an area that was going to languish for a long time, and yet, seemingly out of nowhere, we had this incredible set of advances. There are probably lots of other areas where I would make similarly bad predictions. So I said ten years, but I think my new rule is going to be that I'll predict only through 2024 at the very outside, because in ten years the only thing I can say with confidence is that we will be in a radically different place from where we are now. What that place will be like is anyone's
guess. I'm interested in your predictions about it, but I think I will stop here. Thank you very much.

[Host] Thank you so much, Chris, for the engaging and extremely interesting topic and presentation. I'm always so amazed by all the new things you mention; every single time we talk, there is something new, something exciting that neither of us expected you'd be talking about so soon. Many questions came in, and I must say we will unfortunately not be able to get to all of them, because the time is limited and the audience is so active and so many people showed up, so let me pick a few. Chris, on the cost of training models: it seems it really scales with the size, and we are paying a lot of attention and putting a lot of effort into training. What does that mean for the energy requirements? You were talking about predictions, but how does it look now, and what do you recommend people pay attention to?

[Chris] It's a wonderful set of questions, and critically important. I ask myself: if you think about industries in the world, some of them are improving in terms of their environmental impact and some are getting much worse; where is artificial intelligence in that? Is it getting better or is it getting worse? I don't know the answer, because on the one hand the expenditure for training, and now serving, for example, GPT-3 to everyone who wants to use it is absolutely enormous, and has real costs, measured in emissions and things like that. On the other hand, this is a centralization of all of that, and that can often bring real benefits. I don't want to forget the previous era, where every single person trained every single model from scratch. Now a lot of our research is actually just using these frozen components; they were expensive, but the expenditure of our lab is probably going way down, because we are not training these big models. It kind of reminds me of that last-mile problem again:
in the previous era, it was as if we were all driving to pick up our groceries individually, a huge expenditure with all those individual trips; now it's much more like they're all brought to the end of the street and we walk to get them. But of course that's done in big trucks, and those have real consequences as well. I don't know, but I hope a lot of smart people continue to work on this problem, and that that will lead to benefits in terms of us doing all these things more efficiently as well.

[Host] Thank you so much. The next question, and you touched on this a few times, but it might be good to summarize it a little: we got a lot of questions about trustworthiness, and whether the model actually knows that it's wrong or correct. How do we trust the model, or how do we achieve trustworthiness, given that so much of what's happening now is generative models generating text?

[Chris] It's an incredibly good question, and it is the thing I have in mind when we're doing all our work on explaining models, because I feel like offering faithful, human-interpretable explanations is the step we can take toward trustworthiness. It's a very difficult problem, and I just want to add that it might be even harder than we've anticipated, because people are also pretty untrustworthy. It's just that individual people often don't have a systemic effect: if you're really doing a poor job at something, you probably impact just a handful of people, and other people, say at your company, do a much better job. But these AIs, it's like they're everyone, and so any small problem they have is amplified across the entire population they interact with. That's going to mean our standards for trustworthiness for them probably need to be higher than they are for humans, and that's another sense in which they're going to have to be superhuman to achieve the jobs we're asking of them. And the field cannot offer guarantees right now, so come help
us.

[Host] Fascinating, thank you so much. I also saw some questions and comments about bias in data, and you mentioned that as well: we are improving; there is a big improvement happening. Last question for you, a little bit of a thought experiment: do you think that large language models might be able to come up with answers to as-yet-unanswered important scientific questions, something we are not even sure exists in our minds right now?

[Chris] It's a wonderful question, and people are asking it across multiple domains. These models are producing incredible artwork, but are we now trapped inside a feedback loop that's going to lead to less truly innovative art? And if we ask them to generate text, are they going to do either weird, irrelevant stuff or just more of the boring, average-case stuff? I don't know the answer. I will say, though, that these models have an incredible capacity to synthesize information across sources, and I feel like that is a source of innovation for humans as well: simply making those connections. It might be true that there is nothing new under the sun, but there are lots of new connections, perspectives, and so forth to be had, and I actually do have faith that models are going to be able to at least simulate some of that, and it might look to us like innovation. But this is not to say it isn't a concern for us. It should be something we think about, especially because we might be heading into an era where, whether we want them to or not, these models are mostly trained on their own output, which is being put on the web and then consumed when people create train sets, and so on.

[Host] Great, thank you so much. We are nearing the end, so one last point: do you have any last remarks, anything interesting you would suggest others look at, follow, read, or learn about to get more acquainted with the subject, to learn more about NLU, GPT-3, and other large language models? Any recommendations?

[Chris] The thing that comes
to mind, based on all the interactions I have with the professional development students who've taken our course before, is that a lot of you, I'm guessing, have incredibly valuable domain expertise. You work in an industry, in a position that has taught you tons of things and given you lots of skills, and my last-mile problem shows you that that is relevant to AI. So you could bring it to bear on AI, and we might all benefit: you would be taking all these innovations you can learn about in our course and other courses, combining them with your domain expertise, and maybe actually making progress in a meaningful way on a problem, as opposed to merely having the demos and the sort of things our scientific community often produces. Real impact so often requires real domain expertise of the sort you all have.

[Host] At this hectic beginning of the quarter at Stanford, I really appreciate you taking the time to run this webinar. Thank you also to everybody who had a chance to join us live, or who is watching this recording. If you could, please let us know what other topics you might be interested in for this sort of free webinar; we have a little survey down on the console. I hope you all have a great day and a wonderful end of the winter and start of the spring. Thank you, everybody, for joining us.

[Chris] Yeah, Petra, this was wonderful. We got an astounding number of really great questions; it's too bad we're out of time. There's a lot to think about here, so that's just another thank-you to the audience for all this food for thought.

[Host] Thank you.