
Original & Concise Bullet Point Briefs

AWS US East-1 Region Experienced a Severe Outage: What Could Have Been the Cause? Let's Discuss!

Amazon Kinesis Outage Impacts Multiple AWS Services

  • Amazon’s US East-1 region experienced a severe outage on November 25th that affected a number of AWS services and the companies that rely on them
  • The main issue was with Amazon Kinesis, a proprietary service used for real-time processing of data streams
  • Amazon Kinesis was experiencing increased API errors, causing other services such as Amplify, API Gateway, AppStream 2.0, AppSync, Athena, CloudFormation, CloudTrail, CloudWatch, Cognito, DynamoDB, IoT services, and others to suffer as well
  • Amazon mitigated the problem by throttling incoming requests and then gradually relaxing those throttles as the service recovered (a client-side retry sketch for such throttling follows below).
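
The briefs above mention increased API errors and request throttles. On the client side, the usual way to cope with a throttled or erroring stream API is to retry with exponential backoff rather than hammering the endpoint. Below is a minimal sketch using boto3's Kinesis client; the stream name "example-stream" and the specific retryable error codes chosen here are assumptions for illustration, not details from the outage report.

```python
# Minimal sketch: calling Kinesis PutRecord with retries and exponential backoff.
# Assumes boto3 is installed and AWS credentials are configured;
# the stream name "example-stream" is hypothetical.
import time
import boto3
from botocore.exceptions import ClientError

kinesis = boto3.client("kinesis", region_name="us-east-1")

RETRYABLE = {"ProvisionedThroughputExceededException", "LimitExceededException"}

def put_with_backoff(data: bytes, partition_key: str, max_attempts: int = 5):
    delay = 0.5
    for attempt in range(1, max_attempts + 1):
        try:
            return kinesis.put_record(
                StreamName="example-stream",
                Data=data,
                PartitionKey=partition_key,
            )
        except ClientError as err:
            code = err.response["Error"]["Code"]
            if code not in RETRYABLE or attempt == max_attempts:
                raise  # non-retryable error, or we ran out of attempts
            time.sleep(delay)   # back off instead of piling onto a struggling service
            delay *= 2          # exponential backoff

# Example call (requires a real stream):
# put_with_backoff(b'{"event": "goal"}', partition_key="match-42")
```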

Amazon Outage Causes Disruption Despite Customer Recovery Expectations

  • Amazon Kinesis started experiencing problems at around 10 a.m., when outage reports began coming in
  • The speaker speculates that Kinesis is built on a containerized architecture for easy scaling, comparing the incident with Twitter's 2010 move from a monolith to a service-oriented architecture
  • Amazon said it expected to relax the request throttles back to previous levels over the following hours
  • The exact cause of the issue is unknown, but customers should expect recovery soon
  • There are no official reports from Amazon as to what went wrong.

Original & Concise Bullet Point Briefs

With VidCatter’s AI technology, you can get original briefs in easy-to-read bullet points within seconds. Our platform is also highly customizable, making it perfect for students, executives, and anyone who needs to extract important information from video or audio content quickly.

  • Scroll through to check it out for yourself!
  • Original summaries that highlight the key points of your content
  • Customizable to fit your specific needs
  • AI-powered technology that ensures accuracy and comprehensiveness

Unlock the Power of Efficiency: Get Briefed, Don’t Skim or Watch!

Experience the power of instant video insights with VidCatter! Don’t waste valuable time watching lengthy videos. Our AI-powered platform generates concise summaries that let you read, not watch. Stay informed, save time, and extract key information effortlessly.

So yesterday, November 25th, the AWS US East-1 region suffered a severe outage that took down a lot of services with it: Alexa, Amplify, AppStream, AppSync, Athena, S3, DynamoDB, a lot of AWS-specific services, and companies that use AWS services hosted in US East-1 suffered as well. How about we discuss what went wrong, what the problem was, and what Amazon did? Let's just jump into it.

So, guys, Amazon has several regions, data centers where they locate their services, and they have a lot of proprietary services. One of them is called Amazon Kinesis. Think of it as their version, their flavor, of Kafka, where you can do real-time stream processing of data: anything that is stream-like, so voice, music, Alexa, video, or an intense data channel such as gaming. If you want to build a game, it all goes through Amazon Kinesis, and it processes the data in real time. It's a replacement for old-style batch job processing such as Hadoop, where you don't get real-time processing; you process the data nightly. But that's what it is, and Amazon Kinesis seems to be the problem here.

So I'm going to read little snippets of this article and then we're going to discuss. "Amazon Kinesis is experiencing increased API errors, which has caused services like Amplify, API Gateway, AppStream 2.0, AppSync, Athena, CloudFormation, CloudTrail, CloudWatch, Cognito, DynamoDB" (that's dangerous, DynamoDB, the database, going down), "IoT services, Lambda, Lex, Managed Blockchain, S3, SageMaker, EventBridge, and WorkSpaces to struggle." So a lot of services went down, and that includes Alexa, and that includes Amazon Music, which just stopped working, as long as you're in that area, which is US East, Northern Virginia. I'm in California, so I didn't see any of these problems myself.

So what went down is this Amazon Kinesis, which kind of became the hub for processing these streams of data. Again, I don't think normal services like EC2 or plain VMs would suffer, because they don't use that particular service; it's only services that require real-time processing. So I'm surprised when I see DynamoDB in that list, unless DynamoDB uses Kinesis in its backend somehow.

Robert Wallace here on Downdetector actually pasted an update. I don't know whether he works at Amazon or not, but it's very interesting; it sheds more light on the problem. It's from 6:23 PM yesterday: "We'd like to provide an update on the issue affecting the Kinesis Data Streams API" (all right, that tells us it was Amazon Kinesis) "and other dependent services within the US East-1 region. We have now fully mitigated the impact to the subsystem within Kinesis that is responsible for the processing of incoming requests and are no longer seeing increased error rates or latency." Oh, look at that, so now we have some insight into what happened: the processing unit in Kinesis. Requests come into Kinesis almost like a reverse proxy: it accepts a request and then needs to process it in order to move it to the next stage of the pipeline, just like Kafka. Again, I don't know much about how Kinesis works internally, but Kafka is essentially a pub/sub system: you publish, some other consumer consumes, and you process it as a continuous stream. So the unit that actually processes the incoming stream was not handling the load properly.
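To make that publish/consume flow concrete, here is a minimal sketch of the pattern using boto3's Kinesis client. Everything specific in it (the stream name "example-stream", the single-shard read, the payload) is a hypothetical illustration of how producers and consumers interact with a stream, not anything from the outage report.

```python
# Minimal sketch of the publish/consume pattern described above, using boto3.
# The stream name "example-stream" and the single-shard simplification are
# assumptions for illustration only.
import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Producer side: publish a record onto the stream.
kinesis.put_record(
    StreamName="example-stream",
    Data=json.dumps({"source": "alexa", "event": "utterance"}).encode(),
    PartitionKey="device-123",  # records with the same key land on the same shard
)

# Consumer side: read records from the beginning of one shard.
shards = kinesis.describe_stream(StreamName="example-stream")["StreamDescription"]["Shards"]
iterator = kinesis.get_shard_iterator(
    StreamName="example-stream",
    ShardId=shards[0]["ShardId"],
    ShardIteratorType="TRIM_HORIZON",  # start from the oldest available record
)["ShardIterator"]

batch = kinesis.get_records(ShardIterator=iterator, Limit=10)
for record in batch["Records"]:
    print(record["Data"])  # downstream processing would happen here
```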
How did it happen? God knows. Is it head-of-line blocking? Kafka has this problem of head-of-line blocking, where messages that are slow to process hold up everything behind them. But if you have continuous request streams coming in and you don't have enough workers to process them, that causes back pressure, which causes a lot of head-of-line blocking, which obviously causes slow processing; the service just can't take all these requests. I'm surprised that you get errors, though. Can't we just queue these requests and accept them with long latencies? Latency I would understand; errors I don't know why we'd get. So some components are failing for some reason, and that tells me I'd need to understand the architecture to see why a flood of requests produces errors.

All right, continuing the update: "However, we are not yet taking a full traffic load and are working to relax request throttles on the service." So they throttled the service: okay, we can't possibly handle all the load at once, let's cut it back a little. Again, this is all speculation on my part, but my guess (and this happened to Twitter back in 2010) is that they couldn't spin up enough instances to process the Amazon Kinesis requests fast enough to handle the load, because apparently it went down. The spinning up of this processing unit isn't scaling properly, so they said: okay, let's stop the requests, let's spin up these processes, let's get it right, and then let's open it up for the rest of the US East-1 region. That's just a guess, because otherwise why would you care? Otherwise just let the requests come in, let them get queued up, and spin up the services and processing units horizontally until you can fulfill the load again. I'd need to understand this logic better.

But yeah, if you're interested in the case of Twitter back in the World Cup: again, this was 2010, so it's a little bit of an old architecture, but they had VMs and one monolithic Rails application, and during the World Cup everybody was tweeting every time there was a goal. Twitter kept going down because everybody would go and tweet, and they could not possibly serve that many requests with their existing VMs. And the moment they spun up a new VM, which takes a long time, the flood of requests would just take it down again. So they moved to a kind of microservices architecture; service-oriented, they called it back then. It was interesting. Again, I'm not sure that's what's happening here. I'm pretty sure Amazon Kinesis is built containerized so that it can easily scale, but there must be something they're not telling us.

Back to the update: "Over the next few hours we expect to relax these throttles to previous levels. We expect customers to begin seeing recovery as these throttles are relaxed over this time frame." I think it's back now; it's all good. But let's just take a look: as of this morning, problems on Amazon? Nope, nothing, it's all good. A few reports, but I don't think it's major. But yeah, this is when it happened, actually, at 10 a.m.: people started reporting in, and then the reports dropped off as they started solving the problem.
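Before the wrap-up, here is a rough, self-contained toy model of the back pressure scenario speculated about above: a bounded buffer in front of slow workers. It is not how Kinesis is actually built, but it shows why a saturated processing subsystem can surface as errors at the API edge rather than just extra latency, and why throttling the front door while more workers are brought up, then relaxing the throttles, is a reasonable mitigation.

```python
# Toy model of back pressure: a bounded queue in front of a slow worker.
# When producers outpace the worker, the queue fills and new requests are
# rejected immediately ("throttled") instead of merely waiting longer.
import queue
import threading
import time

inbox = queue.Queue(maxsize=10)   # bounded buffer in front of the "processing unit"

def worker():
    while True:
        inbox.get()
        time.sleep(0.05)          # pretend each record is slow to process

threading.Thread(target=worker, daemon=True).start()

accepted = rejected = 0
for i in range(200):              # traffic arrives faster than the worker can drain it
    try:
        inbox.put_nowait(f"request-{i}")
        accepted += 1
    except queue.Full:            # back pressure: buffer full, caller gets an immediate error
        rejected += 1
    time.sleep(0.005)

print(f"accepted={accepted} rejected={rejected}")
# Most requests end up rejected: the overload shows up as errors at the edge,
# not just as extra latency, until more workers are added or traffic is throttled.
```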
So guys, what do you think happened? Do you think Amazon will actually report what went wrong? Do you have any ideas about what exactly went wrong in Amazon Kinesis? Do you use any of these services? Did this outage affect you? Let me know in the comment section below. I'm going to see you in the next one. Happy holidays, goodbye, and stay awesome.