NICK ORLOVE: Hey, we're latefor the small data meetup.Are you coming?STEPHANIE WONG: I can't.I'm waiting for thisBigQuery job to finish.Can you wait 18.2 seconds?NICK ORLOVE: But thenall the sparkling water'sgoing to be gone.Come on.STEPHANIE WONG:Well, at that point,is it even worth going then?NICK ORLOVE: Well, I bet ifyou use BigQuery's clusteringand partitioningfeatures, your querieswould run muchfaster and cheaper.STEPHANIE WONG: That'd be great.Can you show me how?NICK ORLOVE: Yeah, let's do it.See, BigQuery's a great tool foranalyzing huge amounts of dataquickly, but thereare plenty of timeswhere you only care toanalyze a subset of your data.For example, you mayonly want to lookat data over aparticular period of timeor over a particular column.If you check out theillustration on the screen,you can see that I'mconverting my unordered tableon the left into a datepartition table on the rightand clustering mytags closer together.This is really goingto optimize my querieswhen I analyze my data.Now looking at myGCB project, youcan see that I havea Stack Overflow dataset, which contains a tablewith 10.8 million Stack Overflowquestions which havebeen gathered since 2014.This was pulled from theBigQuery Stack Overflowpublic data set by the way,which everyone has access to.From the schema, you cansee that this table containsa bunch of great informationfrom each Stack Overflowquestion such as the ID,title, body, creation date,number of comments, whatit's been tagged with,view count, et cetera.That's a lot of data across10.8 million Stack Overflowquestions.So let's say you want togather some informationon the types of questionsbeing asked about Androidin the last six months.To accomplish this, I wrotethis simple SQL query.We can see that we only careabout dates in the last sixmonths and anything thathas a tag Android in it.When running this queryagainst my standard table,it has to scan allof the data and allof the tags and thecreation date column,so you can see thatit takes a few secondsand scans the full18 gigabytes of data.Now if we set up thetable to be partitioned,BigQuery will only scanthe parts of the table thatfall into this date range.Ideally, you should start usingpartitioning when you firstcreate your table.However, I'll run a DDL, orData Definition Language,query to create a new tablefrom this existing tableand use the creation underscoredate column as my custompartitioning column.Now let's run that queryagainst my partition table.You can see that it returnsin about a quarter of the timeand scans about a tenthof the amount of data.Impressive.STEPHANIE WONG: So letme get this straight.BigQuery is taking careof all of the hard work?NICK ORLOVE: Yep.You just tell it what datefield you want to partition by,and BigQuery knows exactlywhat partition rangesto use to maximize efficiency.STEPHANIE WONG:But what if I wantto partition my data onsomething like a text column?NICK ORLOVE: That's whereclustering comes in.Let me show you.Looking at thisquery, we're takingadvantage of partitioningbecause we'refiltering by creation date.However, the query still needsa scan across all the tagsto find out which ones areAndroid, even though we're onlylooking for a single keyword.Clustering is a way forBigQuery to optimizethe placement ofyour data and allowsit to store alikedata closer togetherto make queries more efficient.Typically clustering works beston high cardinality fields,which maximizes the amountof benefit you'll get.By the way, you cancluster on multiple columnsso you aren'tlimited to just one.Just like partitioning,you should set upclustering at the get go.So I'll run a similar DDL asbefore to create a new table,but this time we'll clusterour data using the column tags.So now if we run thesequeries with clustering,we can see that we get ourresults slightly faster,and we're now scanning megabytesof data instead of gigabytes.STEPHANIE WONG: So if Iuse BigQuery's partitioningand clustering, I'llonly scan the datathat I need to saveboth time and money.NICK ORLOVE: Exactly.It takes a little bitmore work upfront,but it's definitely worth it.STEPHANIE WONG: And for thoseof you watching at home,check out the linkin the descriptionto try it out for yourselves.Don't forget to like, comment,and subscribe for more greatGoogle Cloud Platform content.[MUSIC PLAYING]