Original & Concise Bullet Point Briefs

AI Now Has Vision! – MiniGPT-4 Vision Language Model

The Rapid Advancement of AI: From Text Reasoning to Advanced Vision

  • AI technology has been rapidly advancing
  • ChatGPT can reason and write text, ElevenLabs.io can create realistic voices from text, Midjourney can conjure images from simple text prompts, and OpenAI has given GPT-4 the ability to see
  • MiniGPT-4 is an open-source model that enhances vision-language understanding with advanced large language models
  • It needs so much processing power that, according to the video, it cannot yet run on an RTX 4090, but a free online demo is available
  • In the project's own examples, MiniGPT-4 accurately described one surreal image (a cactus on a frozen lake) and wrote an advertisement for another (a brass toucan lamp).

Breakthrough AI Combination Results in a High-Quality Dataset and Easier Conversations

  • MiniGPT-4 aligns a frozen visual encoder from BLIP-2 with a frozen large language model, Vicuna, using a single projection layer
  • In a first pretraining stage, it was trained on roughly 5 million aligned image-text pairs in about 10 hours using four A100 GPUs
  • Because generation quality suffered after that stage, the model itself and ChatGPT were used together to build a small, high-quality dataset of 3,500 image-text pairs, which was then used to fine-tune the model in a conversation template, improving generation reliability and usability
  • This second stage took only about 7 minutes on a single A100 GPU
  • In the demo, users upload an image or drag one into the interface, then click the "upload and start chat" button to begin conversing with the AI; temperature and beam-search settings can also be adjusted.
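According to the video, MiniGPT-4 bridges the frozen BLIP-2 visual encoder and the frozen Vicuna language model with a single trainable projection layer. The Python sketch below illustrates only that wiring, under loudly simplified assumptions: `frozen_visual_encoder` and `frozen_text_embedding` are hypothetical stand-ins (not BLIP-2 or Vicuna), the dimensions are toy values, and placing image tokens before the prompt is a simplification of the real conversation template. The point it shows is that the projection layer is the only component with parameters to train.

```python
import random
from typing import List

Vector = List[float]

# Toy sizes for illustration only; the real encoder and LLM use far wider vectors.
VISION_DIM = 8
LLM_DIM = 12


def frozen_visual_encoder(pixels: List[float], num_tokens: int = 4) -> List[Vector]:
    """Hypothetical stand-in for the frozen BLIP-2 visual encoder.

    Turns an image (here just a flat list of pixel values) into a fixed number
    of feature vectors. It has no trainable parameters, mirroring "frozen".
    """
    return [
        [pixels[(t + d) % len(pixels)] * 0.5 for d in range(VISION_DIM)]
        for t in range(num_tokens)
    ]


def frozen_text_embedding(token_ids: List[int]) -> List[Vector]:
    """Hypothetical stand-in for the frozen LLM's token-embedding lookup."""
    return [[((tid * 31 + d) % 17) / 17.0 for d in range(LLM_DIM)] for tid in token_ids]


class ProjectionLayer:
    """The single trainable piece: a linear map from VISION_DIM to LLM_DIM."""

    def __init__(self, in_dim: int, out_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # weight[i][j] connects input feature i to output feature j
        self.weight = [[rng.gauss(0.0, 0.02) for _ in range(out_dim)] for _ in range(in_dim)]
        self.bias = [0.0] * out_dim

    def __call__(self, x: Vector) -> Vector:
        # y_j = sum_i x_i * W[i][j] + b_j
        return [
            sum(x[i] * self.weight[i][j] for i in range(len(x))) + self.bias[j]
            for j in range(len(self.bias))
        ]

    def num_parameters(self) -> int:
        return len(self.weight) * len(self.bias) + len(self.bias)


def build_llm_input(pixels: List[float], prompt_ids: List[int], proj: ProjectionLayer) -> List[Vector]:
    """Project image features into the LLM's embedding space and prepend them
    to the embedded text prompt (a simplification of the real prompt template)."""
    visual_tokens = [proj(f) for f in frozen_visual_encoder(pixels)]
    text_tokens = frozen_text_embedding(prompt_ids)
    return visual_tokens + text_tokens


if __name__ == "__main__":
    proj = ProjectionLayer(VISION_DIM, LLM_DIM)
    sequence = build_llm_input([0.1, 0.5, 0.9], [101, 7, 42], proj)
    print(len(sequence), len(sequence[0]))  # 7 12  (4 image tokens + 3 prompt tokens)
    print(proj.num_parameters())            # 108   (8 * 12 weights + 12 biases)
```

In this sketch, training would update only `ProjectionLayer.weight` and `ProjectionLayer.bias`; the two stand-in functions stay fixed, which is why the video can describe such short training runs.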

MiniGPT-4 Impresses in a Demonstration of Complex Tasks

  • The video presents a hands-on demonstration of MiniGPT-4
  • The model identified the background of an AI-generated image as a train station or platform from limited visual cues and correctly read the man's "I ate a frog" sign; later, it accurately described the speaker in one of his thumbnail photos
  • It invented a plausible backstory for the man with the sign and wrote a joke about the speaker's photo
  • To test it further, the speaker asked it to identify the breed of his cat and to describe a Midjourney-generated image in extreme detail
  • It could only guess that the cat was a domestic shorthair or a mixed breed, and its detailed description of the Midjourney image contained hallucinations, but it still demonstrated impressive capabilities.

MiniGPT-4 Makes Mistakes Interpreting Images

  • Asked to describe a Midjourney-generated image of a paint palette in extreme detail, MiniGPT-4 correctly noted the wooden palette, the colors, and the white surface, but hallucinated paint brushes arranged in a circle
  • It described the nonexistent brushes as made of natural hair and synthetic fibers, some flat and some round; when that description was fed into Midjourney V5, the results were still attractive palette images
  • Given a Toy Story meme, MiniGPT-4 recognized the movie, the caption, and the gist of the joke, but mistook the Google Chrome logo for a laptop
  • It also declined to identify the breed of the speaker's dog, saying the image showed no face or distinctive physical traits.

AI Image Recognition: Exploring the Potential and Dangers

  • The video discusses MiniGPT-4's ability to recognize and interpret images
  • It shows that the model can not only describe an image accurately but also reason about what is happening in it and create new stories
  • It concludes that this kind of technology can change our lives for the better, while acknowledging the dangers associated with AI.

With VidCatter’s AI technology, you can get original briefs in easy-to-read bullet points within seconds. Our platform is also highly customizable, making it perfect for students, executives, and anyone who needs to extract important information from video or audio content quickly.

  • Original summaries that highlight the key points of your content
  • Customizable to fit your specific needs
  • AI-powered technology that ensures accuracy and comprehensiveness

Unlock the Power of Efficiency: Get Briefed, Don’t Skim or Watch!

Experience the power of instant video insights with VidCatter! Don’t waste valuable time watching lengthy videos. Our AI-powered platform generates concise summaries that let you read, not watch. Stay informed, save time, and extract key information effortlessly.

Viewers, I am always doing my best to bring you the cutting edge in AI technology. I cannot make videos on everything, and that's why lately I've been doing an AI roundup at the end of the week, typically on Friday, that goes over all of the new tools, all of the recent advancements and news about AI, so make sure to tune in on Fridays for that AI update. Every once in a while, though, something real good comes across that I want to share with you in a full video, and this is definitely one of those times.

AI has been doing its best lately to gain new functions. ChatGPT has the ability to reason and type out text very well, ElevenLabs.io has the ability to speak in very, very realistic voices based off of text, Midjourney has the ability to conjure imagery out of simple text, and lately AI has been developing the ability to see. If you guys remember from my GPT-4 video, we discovered that OpenAI actually has given GPT-4 the ability to see. Now, not everyone who has access to GPT-4 right now has access to this feature; however, some very, very clever developers have created a different AI that has this feature, and that's what we're going to be looking at today.

But first, a brief overview of this feature. You could do something like paste in a joke. This is a funny image of an iPhone being charged by an old-fashioned VGA connector, and GPT-4 in this example was able to describe exactly what's going on: "The image shows a package for a Lightning cable adapter with three panels." It then goes through the panels systematically, saying panel 1 is a smartphone with a VGA connector plugged into its charging port, panel 2 shows the package for the Lightning cable adapter with a picture of a VGA connector on it, and panel 3 is a close-up of the VGA connector with a small Lightning connector at the end of it. And yeah, that's accurate to all of these panels. It then goes further to describe the humor in the image: it says the humor in this image comes from the absurdity of plugging a large, outdated VGA connector into a small, modern smartphone charging port.

So obviously this functionality is very exciting. I mean, can you just picture all the different applications we would have if we could send ChatGPT photos and then have it describe them and tell us different things about the photos? Another example that OpenAI actually used: they took a simple piece of paper, drew out the design for a website, and then ChatGPT actually coded up the website they had simply drawn. I also saw the really, really unique example of someone taking a picture of just their fridge and then asking, "What can I make with the food in my fridge?" I think that particular application is really awesome because it serves everybody. It's very expensive to go out to eat, and cooking is something that I think our generation in particular is lacking on a little bit. Simply take a picture of your fridge, say, "ChatGPT, what are some nice meals I can make with this?" and it can give you the instructions on how to use the ingredients in your fridge to make a nice dish.

Anyways, introducing MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models. Obviously this is open source, so if you had a very, very powerful GPU you could theoretically run this at home, but for now you can't even run this thing on a 4090 because it just requires that much processing. There is a free demo, though, that we're going to be checking out today, and I'll link it in the description.

So there are a few examples we can look at. Taking a look at this one on the left, an image that honestly looks like it was generated by Midjourney is pasted in, and the prompt says, "Describe the image." The bot then goes: "The image shows a cactus plant standing in the middle of a frozen lake. The cactus is surrounded by large ice crystals that have formed on the surface of the water. In the background there is a range of mountains with snow-capped peaks. The sky is clear and blue, with a few fluffy clouds floating in the distance. The overall mood of the image is peaceful and serene, with the cactus standing tall and proud in the center of the frozen lake." That is a very, very accurate description of said image. I know it's a little bit difficult for you guys to see, but it looks pretty accurate.

It is then asked, "Is this image common in the real world?" The bot goes: "No, this image is not common in the real world. While it is possible for cacti to grow in cold climates, it is rare to see one standing in the middle of a frozen lake. Additionally, the formation of large ice crystals on the surface of the water is also not a common occurrence. This image is likely a digital creation or manipulation." Very accurate, as you can see. I mean, this thing's pretty darn good. We'll go over what technologies are actually behind it in a little bit, but yes, it is very good, and that is the reason you can't even run it on a 4090 yet, although apparently in the future you will be able to. Either way, that is a lot of processing power.

This one was a pretty simple image, pasted in, that also kind of looks like it was generated by Midjourney (maybe that's just me freaking out and it's a completely normal image), and then he says, "Help me write an advertisement for this image." The bot just offers up the advertisement right away: "Are you looking for a unique and stylish lighting solution for your home or office? Look no further than our brass toucan lamp." Very, very specific there, so it's very impressive that it was able to understand that's a brass toucan lamp. I mean, look at this thing: a very, very abstract lamp design. "This beautifully crafted bird lamp is sure to add a touch of whimsical charm to any room. The toucan's beak is designed to hold a light bulb, providing a warm and inviting glow. The base is made of wood, adding a natural touch to the overall design. The toucan lamp is a great addition to any room, whether you're looking to brighten up a dark corner or add a touch of personality to your space. Order yours today and bring a touch of the tropics to your home or office." Yeah, that is quite the advertisement. Honestly, it did a very, very good job creating that entire paragraph.

Now, obviously the results we see here are probably a little bit cherry-picked. You're probably not getting results this good every single time, but I have no doubt that these are real results.

This next one was honestly just quite ridiculous, because it's performing a very, very advanced task. A picture of a plant with some spots on it is shown, and the user simply asks, "What is the issue with my plant? What should I do now?" The bot says: "The image shows a leaf with brown spots on it. The spots are likely caused by a fungal infection, which can be treated with fungicide. It's important to identify the type of fungus causing the spots in order to choose the appropriate treatment." Not only was it able to identify what's happening in the picture at the very baseline, it was actually able to identify that it's a fungal problem with the plant, which is super specific. I'm not into plants and I don't know much about them, but that is very impressive, because I never would have known that. It then offers some steps to actually treat the plant: "To treat the plant, you can try the following steps: 1. Identify the type of fungus causing the spots on the leaves. 2. Use a fungicide specifically labeled for that type of fungus. 3. Apply the fungicide according to the product's instructions, making sure to cover all affected leaves. 4. Repeat the treatment every 7 to 14 days" (very specific) "or as directed by the product's instructions until the fungus is completely gone. 5. Water the plant well after treatment to help the fungicide penetrate the leaves. 6. Keep the plant in a well-ventilated area and away from direct sunlight to prevent overheating. 7. Monitor the plant's progress and repeat the treatment if new spots appear. It is also very important to keep the plant healthy by providing it with proper care, such as adequate watering, fertilization, and pruning. This will help the plant recover and prevent future fungal infections." So this thing essentially just gave a seven-step process for fixing this plant. That's knowledge the average person wouldn't have, and it's very, very specific knowledge that it was able to pick out of just a very, very simple image. Impressive, to say the least. This is honestly the kind of response I would expect out of ChatGPT with its image capabilities turned on, so it's very impressive that we're getting this in open-source form, which is a huge plus.

For our next and final example, an image of a man and his dog is sent with the prompt, "Can you write me a beautiful poem about this image?" The bot says: "Sure, here's a poem about the image: A man sits on a hill with his dog by his side, the sun sets behind them casting a golden light. The city below is bustling, but they are at peace, enjoying the moment in each other's embrace. The man's arms are wrapped around his furry friend as they watch the world go by; with a gentle bend, the dog's head nests on the man's shoulder, a symbol of their unbreakable bond forever. The sky is ablaze with color, a beautiful sight, but it's the love between the man and his dog that shines bright. It's a moment of pure joy, a memory to keep, forever etched in their hearts as they sent in weep. So let us cherish these moments with our four-legged friends, for they bring us so much happiness until the very end, and even when they're gone their love will live on, and the memories we hold forever and strong." So it's a very heartwarming poem about a man and his dog, his best friend. It's interesting: even though the background is very blurry, it was able to pick up on the fact that this is in fact a city and they're sort of standing on this hill watching the sunset. It's all blurry, but it was still able to understand what it was. The poem came out all right; not necessarily as good as something I would expect out of GPT-3.5 or GPT-4, but still a decent poem nonetheless.

So how does this thing work under the hood? Essentially, MiniGPT-4 aligns a frozen visual encoder from BLIP-2 with a frozen large language model, Vicuna, using just one projection layer. So we're combining two technologies here: the frozen visual encoder, which is able to actually convert the image, and the large language model, in this case Vicuna, which is able to interpret the text. MiniGPT-4 was trained in two stages. The first, traditional pretraining stage is trained on roughly 5 million aligned image-text pairs in about 10 hours using four A100 GPUs, and if you guys don't know, A100 GPUs are the cream of the crop in terms of AI right at the moment. After that, Vicuna is able to understand the image, but the generation ability of Vicuna is heavily impacted. They actually came up with a pretty creative way to address this problem: they went for high-quality image-text pairs by combining the model itself and ChatGPT, so ChatGPT actually helped out in the creation of this. They then created a small, high-quality dataset of 3,500 pairs, and honestly, in terms of AI, the higher quality the dataset, the better. People like to talk about very large datasets with AI, but really the quality of the dataset is more important. So then they train on the dataset in a conversation template to significantly improve its generation reliability and the overall usability of the AI, and apparently this only took seven minutes with a single A100 GPU.

So here is our layout. You either drag or upload an image over here for the AI to interpret, then you click the "upload and start chat" button, and it will upload that image so you can begin to chat with the AI. It does take quite a lot of time for this thing to process your conversations, but it's definitely usable. They also have a few toggles here: a beam search number that you can take all the way up to 10, and a temperature setting. I'm going to lower the temperature to just 0.8. I'm not sure about the beam search number, though, so I'm going to leave that one completely alone.

Let's start and actually just upload an image. This is an image that was generated by Stable Diffusion XL; it's literally a guy holding a sign that says "I ate a frog," so we're gonna see if it's able to actually read the text on his sign. Let's upload and start the chat. As you can see, that only took a few seconds, and now it's uploaded and we can actually ask this thing a question about the image.

My first question is, "What is the location of this photo?" The way I interpret this, he's in some sort of a subway station or something like that. "The location of this photo is not specified." Okay, interesting response; I guess you're correct on that. "What do you interpret the background scenery as?" Ah, there we go: "The background scenery appears to be a train station or platform with a sign indicating the location of the photo." Well, the sign doesn't really indicate that, so that's incorrect, but it did get the correct background, at least what I thought it was, which was a train station or platform. Very, very impressive, because there's not a lot of data here to necessarily suggest that. "What does the sign the man is holding read?" "The sign the man is holding reads 'I ate a frog.'" Completely accurate. So far, really, really good job; I am impressed with MiniGPT-4. "How old do you think the man could be?" "It's difficult to determine the age of the man in the image without additional context." Well, I can definitely tell that he's very, very old, so let's see if we can get a better response. "Are there any signs that point to a general age of life he could be in?" Wow, okay, we actually did get a pretty decent response here. It's not really able to understand a whole lot of the image, but it is a very smart AI: it says there are no specific signs in the image that indicate the age of the man; however, based on the clothing and style of the image, it is possible that the man is middle-aged or older, which is definitely correct. I wouldn't say he's middle-aged, he's definitely older, but as you can see, something that might be obvious to us is a little bit harder for these AIs to understand at the moment. They will get better, and GPT-4, we can assume, is a little bit better than this.
"Come up with a creative example for the man's current situation as depicted in the image." Here is what it came up with: "The man in the image is a retired businessman who has just finished a satisfying meal at a local restaurant. He is holding up a sign as a joke for his friends, who have been teasing him about his love for frog legs." Very creative there. "He is wearing a tuxedo and a bowler hat as a nod to his formal upbringing, but has also added a touch of humor with the addition of the top hats. He is standing on a train platform, ready to board his train for his next adventure." Interesting. I really like that it was able to rationalize a fairly realistic context for this entirely fake and ridiculous image. Retired businessman: he definitely looks like he could be one. Holding up a sign as a joke for his friends who have been teasing him about his love of frog legs: that is an actual meal people eat, especially someone who's a little bit wealthy, maybe an older businessman. And yeah, he's about ready to board the train for his next adventure. A very, very realistic rationalization for this completely absurd image.

All right, now I've gone ahead and uploaded an image of myself. This is an image I would typically use in a thumbnail; it's kind of a crazy image, I'm like, "Whoa, oh my God." So we're gonna see what the AI thinks of me and this image of me. We'll simply start off with "Describe the man in this image." "The man in this image is wearing a green hoodie" (correct) "and has a surprised expression on his face" (correct), "is holding up his hand in the air and pointing his finger" (fairly correct), "and is sitting in a chair with a black and white background" (fairly correct; we've got some blue background over here, and I guess there is a white background here, and I'm definitely sitting in my chair). A very accurate description of this image.

"Is he ugly?" "As an AI language model, I do not have the ability to determine someone's physical appearance or whether or not they are considered ugly, and it's not appropriate to make judgments about a person's appearance." "Is there anything about the image that would indicate a terrible medical procedure has been conducted on his face?" I'm trying to trick this thing. All right: "As an AI language model, I do not have the ability to determine whether a medical procedure has been conducted on someone's face based on the image provided. It's not appropriate to make assumptions or judgments about someone's health or medical history based on a single image." "The person depicted in this image is me. I want you to write a joke about it." Oh, it's actually able to write a joke here, and it's just assuming that I've had a terrible medical procedure: "Why did the doctor perform a terrible medical procedure on your face? Because you have a face that only a mother could love." That joke doesn't even make sense. All right, whatever. This thing just, you know, wants to play it safe. It's very OpenAI-like; it reminds me exactly of that.

All right, now I am going to upload this image of my cat. "What breed of cat is this?" "This is a black and white cat." That's not a breed. "If you were a specialist on cats, what would your assumption be regarding the type of cat that this is? An example would be a tabby cat." "Based on the image provided, it appears to be a black and white cat. It is difficult to determine the exact breed without more information, but it could be a domestic shorthair cat or a mixed breed." Now, I'm no expert on cats, but I do believe this actually is a domestic shorthair cat. "What is the expression on the face of the cat?" "The expression on the face of the cat is difficult to determine, as the image is not clear enough to see facial features." It's a fairly clear image. Again, the AI is just not getting enough information out of this image to make that determination, especially since it's not a human face. Well, good enough; I'll give this one a C+.

So now I've uploaded this Midjourney generation, which is like a paint palette with a bunch of colors on it. It was very, very difficult for me to generate this image in Midjourney, because Midjourney didn't really know what I meant by a paint palette. So we're gonna see if this thing can describe the image better for us, then put that description through Midjourney and see if we get anything that looks remotely similar. As you can see, I did eventually get Midjourney to do a very, very good generation of a paint palette, but it took a lot of fiddling.

"Describe the image in extreme detail." Oh wow, it definitely went for the extreme detail. "The image shows a wooden palette with a set of paint brushes..." Okay, that's incorrect; there are definitely no paint brushes. "...arranged in a circle on the surface. The brushes are made from different colors, including red, blue, green, yellow, and black." Okay, those are definitely all colors, but definitely not paint brushes. "The palette is placed on a white surface" (true) "and there's a small amount of light reflecting off of the surface" (definitely also true). "The brushes are arranged in a circle, with the red brush in the center and the other colors arranged around it in a clockwise direction." It's definitely having trouble understanding all of this, or it's just hallucinating. "The brushes are made of different materials, with some being of natural hair and others of synthetic fibers." Again, there are no brushes at all in this image. "The brushes have different shapes, with some being flat and others being round. The palette is made of wood and has a smooth surface." Okay, that's also true. "And there are no other objects in the image." Yeah, close enough, but these are not paint brushes, and there's no circular arrangement either way.

I really want to know what Midjourney V5 makes of this description. Here are the images Midjourney ended up generating for us, and I gotta say they came out pretty good. We've got a piece of wood in the middle with a bunch of paint brushes of all different colors surrounding it. It's a cool-looking little image. Oh, this one's really nice as well. So yeah, it actually produced a pretty good little prompt for Midjourney, even though it's not exactly what we input as the image.

Now we're going to give it a pretty difficult task. I've submitted this joke, or meme if you will, which is essentially a screenshot from Toy Story. He's dropping his toy, which in this case is depicted as Google Chrome, and he's saying, "I don't want to play with you anymore." This is actually an AI meme that someone uploaded, with the context of Perplexity AI being better than Google Chrome. The context around this is honestly very, very difficult to understand, so I don't have super high hopes for it, but I at least wanted it to pick up on the fact that he doesn't want to play with Google Chrome anymore, or that Google Chrome is essentially useless, based on the context of this image. So our input prompt is, "Describe the context of this funny meme and explain the joke behind it."

All right, it definitely generated quite a lot, so let's see if it's correct. "The image shows a cartoon character" (correct), "possibly from the movie Toy Story" (absolutely correct), "standing on a bed with a laptop in front of him" (not correct at all; where did you get that?). "The caption reads 'I don't want to play with you anymore'" (completely correct for the caption). "This is likely a reference to a scene in the movie where the character Woody is shown being jealous of the relationship between the main character and his new toy, Buzz Lightyear" (absolutely the correct scene from the movie, so great job on that part). "The joke is that the character is now jealous of the laptop" (not true; for some reason it thinks this is a laptop, but nope, that's Google Chrome; close, though), "implying that it has replaced him as the main character's favorite toy." So yes, it actually did get the context of the joke correct; it just got this part wrong and, for some reason, thinks this is a laptop even though it's the Google Chrome logo. "The meme is funny because it plays on the idea of jealousy and the idea that technology can replace human relationships." That is not exactly correct, but still a very interesting take, to be honest. It's very, very close; its ability to reason out these jokes is incredible. "It also references the popular movie Toy Story, making it relatable to many people. The image is well made, with clear and colorful graphics to make it visually appealing." I suppose that's true. "The text is easy to read and the caption is concise and to the point, making the meme easy to understand and share." That's also true. So we'll give it a C+ on this one. It got a lot of what we wanted out of it; it just failed to understand that this was indeed Google Chrome and not a laptop. It must have thought it was a Chromebook or something like that, because that's what it thinks this logo is. Close enough; I'm pretty impressed with that.

We'll give it another joke. This is a picture of my dog, and this was his Halloween costume last year: "Bad pet charge: eating toilet paper." This is some pretty bad handwriting that it has to read, and it has to understand the joke of the costume the dog is wearing, so this one's pretty complex, but I will be impressed if it's able to understand the joke. I have uploaded the image, and now we will ask it. "The image is a funny meme because it shows a dog wearing a 'bad pet charge' sign around its neck" (true). "The sign says 'bad pet charge: eating toilet paper'" (completely true; it was able to read the poor handwriting), "which is a play on words that implies the dog is being punished for eating something inappropriate. The humor in this meme comes from the unexpected and humorous nature of the sign, as well as the idea of a dog being punished for something as seemingly harmless as eating toilet paper." That's absolutely the context behind the joke; it understood that perfectly. Wow, that is just phenomenal. I can't believe it was able to pick up that context so well from the image. "It's also funny because it's a play on the idea of a pet wearing a sign around its neck, which is a common meme trope."
I'm not so sure about that. "But the image is also funny because it's unexpected and not something you would see in real life." That's also true. But yeah, this is just a phenomenal description of this image; it's completely insane to me that it's able to pick that up. Honestly, that's the kind of thing I would straight up expect out of GPT-4 with its image-viewing capabilities. In fact, MiniGPT-4 is pretty much exactly what I would expect out of GPT-4 with image-viewing capabilities. It's really, really quite good, and the demo is worth a look.

I'm going to ask it a final question here: "What is the breed of the dog?" "It is not possible for me to determine the breed of the dog in the image you provided, as the image does not show the dog's face or any distinctive physical characteristics that would allow me to identify its breed." It's a tough task, I won't deny, but come on. "Do your best to guess the breed." Yeah, it just completely refuses again. This is a pretty tough image, but someone who actually knows a lot about dog breeds would pretty easily identify this dog's breed, so it's missing on that front, at least.

Viewers, feel free to check this one out in the description below. I was thoroughly impressed by this. Not only is it able to fairly accurately describe a lot of these images, it's able to understand them, logically reason out what is happening inside the image, and come up with new creative stories about it. It's really close to what I would expect out of GPT-4's image-viewing capabilities. Tell me what you think in the comments. This is the kind of technology we need to enable AI to really change our lives for the better. Again, there are a lot of dangers to AI, and I've been talking about this a lot lately, but at least this kind of stuff makes me very, very hopeful for the future of AI tech. Thanks so much for watching, viewers. If you want to see more stuff like this and join our community, please join the Discord server and follow me on Twitter. I'll see you in the next one. Goodbye.
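The transcript mentions two generation settings in the demo: a temperature slider (which the speaker lowered to 0.8) and a beam-search number. As a general illustration of what temperature does, and not the demo's actual code, here is a minimal Python sketch of temperature-scaled softmax sampling. The logits are hypothetical scores for three candidate next tokens. Beam search, the other toggle, is a separate decoding strategy that keeps several candidate continuations in parallel and is not shown here.

```python
import math
import random
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert raw next-token scores into probabilities, scaled by temperature.

    temperature < 1.0 sharpens the distribution (more predictable output);
    temperature > 1.0 flattens it (more varied output).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits: List[float], temperature: float, rng: random.Random) -> int:
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
    for t in (0.5, 0.8, 1.0, 2.0):
        probs = softmax_with_temperature(logits, t)
        print(f"temperature={t}: {[round(p, 3) for p in probs]}")
    print("sampled index:", sample_token(logits, 0.8, random.Random(0)))
```

Running it shows the top token's probability shrinking as the temperature rises, which is why lowering the slider tends to make a model's answers more consistent from run to run.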