Learn SQL Beginner to Advanced in Under 4 Hours

Uploader: Alex The Analyst
Published at: 10/8/2024
Views: 546K
Description:
Resources: Full MySQL Course, Download MySQL, Download Datasets, Analyst Builder, Full Python Course, Practice Technical Interview Questions
Coursera Courses: Google Data Analyst Certification, Data Analysis with Python, IBM Data Analysis Specialization, Tableau Data Visualization
Udemy Courses: Python for Data Science, Statistics for Data Science, SQL for Data Analysts (SSMS), Tableau A-Z
*Please note I may earn a small commission for any purchase through these links - Thanks for supporting the channel!*
BECOME A MEMBER - Want to support the channel? Consider becoming a member! I do Monthly Livestreams and you get some awesome Emojis to use in chat and comments!
Websites: AlexTheAnalyst.com, GitHub, Instagram: @Alex_The_Analyst

Chapters:
0:00 Intro
0:52 Installing MySQL and Setting up Database
11:19 Select Statement
21:47 Where Clause
33:54 Group By
44:18 Having vs Where
47:41 Limit and Aliasing
51:10 Joins
1:08:04 Unions
1:15:10 String Functions
1:26:39 Case Statements
1:35:16 Subqueries
1:45:58 Window Functions
1:58:55 CTEs
2:09:07 Temp Tables
2:16:34 Stored Procedures
2:28:51 Triggers and Events
2:42:46 Data Cleaning Project
3:32:14 Exploratory Data Analysis Project

*All opinions or statements in this video are my own and do not reflect the opinion of the company I work for or have ever worked for*
Video Transcription
What's going on, everybody?
Welcome back to another video.
Today, we're gonna be learning MySQL from beginner all the way to advanced in under four hours.
SQL is one of the most important skills in my career.
And so I'm really excited that you are gonna start learning SQL as well because SQL is super important in the data world.
In this really long lesson, I'm gonna help you get your environment set up.
We're gonna be walking through all of the basics.
So just selecting data, how to query basic data.
But then as we go through it, we'll be working towards more advanced things like CTEs, temp tables, and more.
At the very end, we'll have two complete projects.
One for data cleaning, where we take a messy data set and we clean that data, which is a very important skill to know how to do.
And then lastly, we'll be doing exploratory data analysis on our real data to really dive in and see how you can use SQL to dig into the data and understand it better.
So without further ado, let's jump on my screen and get started.
All right, so let's get started by downloading MySQL.
We're gonna come right over here to dev.mysql.com, forward slash downloads, forward slash installer, and I will have that link in the description so you don't have to write all that out, but you should be seeing this page right here.
Now, we have to select an operating system.
I'm using a Windows machine, and if you aren't, if you're using Linux or Mac or something else, it should populate it for you, but if it doesn't, just select this dropdown and select your operating system.
Next, we have two different downloads.
We can install the MySQL installer community or MySQL installer web community.
This one is very small, but then you actually do have to download the installer.
It just gets it from the web.
This one, I'm going to download the actual installer.
It's larger, but this is the one I'm going to do.
I'm going to go ahead and select download.
It's going to ask me if I want to log in or create an account, and I don't.
I'm going to say no thanks.
Just start my download.
I'm gonna save this in this desktop folder.
Doesn't really matter where you save it.
We're gonna save that and it's gonna download.
It should be done in just a few seconds.
I'm gonna go ahead and click on it and it's gonna open it up when it's finished and we should get the interface or the UI for the actual installation for MySQL.
So here is the MySQL installer.
And first thing we need to do is choose a setup type.
Now we're gonna keep the developer default, unless you really know what you're doing, and then you can select server only, client only, full, which is literally everything MySQL has to offer, or custom.
So we're gonna keep this developer default, just installing the things that we kind of need.
So let's go ahead and select next.
For whatever reason, on my computer it's saying this path already exists.
You probably won't get that, but I'm just gonna go ahead and select next, and then I'll select yes.
I can't explain why, but it keeps doing that for me even though I've deleted it from my computer completely; it just remembers it somewhere in its memory.
Now, the next thing we need is to check requirements.
Now, I just have this one.
It says I need to download this Visual Studio.
I'm not gonna do that, but on your screen, you may have multiple, multiple requirements.
Typically, you're looking at something like this, Microsoft Visual C++ Redistributable Package.
What you need to do is download this.
All you have to do is click Download.
Once you download and install that on your computer, and then we go back, all of those should be gone.
That's the one that I see the most when I'm actually working with these requirements.
I had to install it myself when I got this new laptop.
So go ahead and install that if you need to.
But if yours looks like mine, we don't need this Visual Studio for what we're going to do.
We want to go ahead and select next.
It is giving us a prompt that we haven't satisfied all the requirements, but that's okay.
We're going to go ahead and select yes as well.
Now we're ready to install all of these things.
These are all things that MySQL wants you to install.
The most important are the server and the workbench, but it does not hurt to have all these other things as well.
Some of these connectors are also important.
So we're going to go ahead and execute.
This will take just a few minutes.
I'll skip ahead when they're all done, but this should take just a few minutes and then we'll continue on installing MySQL.
So everything just completed, and now we're gonna select next.
And now we need to actually configure our product.
Now really the only one that we actually need to configure is the server.
The router says we need to configure it, and the samples and examples say we need to configure it as well, but really it's just the server.
Let's go ahead and select next.
Now we're not going to change anything for this type and networking, unless you know what you're doing with the port, the X protocol port.
We're not going to change any of this.
We'll go ahead and select next.
Next thing we need to do is select an authentication method.
I'm going to be using a password.
I'm not going to be using the legacy authentication method.
So I'm just going to go ahead and create a password.
Now for you, and I keep getting this error and I can't explain why.
Right here for you, you should be creating a password at the bottom.
It's remembering my password somehow, and I really can't explain it.
But I'm gonna create my password or check my password.
This is one that I already created before I deleted it off my computer, but it's still there.
So it's saying my password is still good, but if you need to, you should be entering a password and then confirming your password and saving it, and then you should also be checking it as well.
And then we're gonna configure the MySQL server as a Windows service.
I'm gonna keep that checked.
And we're going to start the MySQL server at system startup.
I like that automatically being there.
I don't want to mess with that.
So I'm going to keep it as it has it.
We're going to go ahead and select next.
And the last thing we need to do is just execute this.
And then everything we put in there is going to actually get applied.
So let's run this and execute it.
And that just finished.
So let's go ahead and select finish.
Now it says configuration complete for the server, but we also need to configure these other two.
Let's take a look at these really quickly.
We're not going to do anything on this.
It even says we really don't need to do this.
We just need to click finish and configuration not needed.
Next, we'll do samples and examples, and we can input our password.
And all this is really going to do is put in some sample databases for us in our database, which if you want, you definitely can do that.
I just connected, and that worked.
I'm gonna hit next and execute, and it's basically just going to put in a database or two, some sample ones for you to look at, and the configuration is complete.
You don't have to do that one, but we'll see that in just a second.
And we're gonna select next, and now the installation is completely done, and we can start MySQL Workbench after setup and start MySQL Shell after setup.
Now, I'm not gonna do the Shell, so I'm gonna actually uncheck that, and we're gonna select finish.
MySQL just popped up for us and this is exactly what you should be seeing.
Now there's a lot of things in MySQL to learn and know how to do.
We're not going to be taking a look at all of that stuff today, but in future lessons, we'll walk through a lot of these different things that kind of correlate with different lessons or things that we're working on in MySQL.
The first thing that we're gonna click on is right over here.
This is our local instance.
This is local to just our machine.
It's not a connection to some other database on the cloud or anything like that.
It's just our local instance.
We're gonna go ahead and click on this.
So this is what you should be seeing right here.
This is where we're gonna actually write all of our SQL code, and I'll show you all of this in just a second.
But this is where we can actually create our database, and our database is gonna go right over here on this left-hand side.
The Sakila one is actually a sample database.
It has a bunch of tables, views, stored procedures, and functions.
It has all of these things in here if you wanna go ahead and mess around with that.
What we're about to do is create our own database that we're gonna be using throughout this entire series, both beginner, intermediate, and advanced.
We'll use a lot of this.
And sometimes we'll import some other ones for different use cases, but this will serve for most of what we're trying to do throughout this entire series.
Now what I'm going to do is I'm going to go ahead and I'm going to say, open a SQL script file in a new query tab.
And right here, it opened up to a folder that I already created, this MySQL beginner series folder.
Within it, we have this right here, the parks and rec create underscore DB.
Now, in order to get this, you just have to go to the GitHub and download this file.
That's all you have to do.
We're then going to open this file.
So let's click on it.
We're going to say open.
And what you're now seeing is basically the query editor.
This is where you can write your code.
Now, we're not importing a database, we're actually creating it by running code.
Now, because this is the first lesson in the beginner series, I'm gonna assume that you don't know a ton about MySQL.
Really, all this is doing is creating the database, and then we're creating a few tables in that database, and then we're inserting data into those tables.
So this is all of our data that will go into these tables that we create.
We only have one, two, three different tables that we're gonna be using.
So all you have to do to run this is click this lightning button right up here.
We're gonna go ahead and execute this.
If we come down to the bottom and we pull this up, this is our output.
This says six rows affected.
And we have a bunch of other things like create table, create table, insert, insert, create table, insert into.
These things are all working perfectly.
So now if we go ahead and click refresh in our schemas with this refresh button right here, this parks and recreation database is populated.
If we go under the tables, we see all of these things.
So now that we've actually created our database and our tables, that's really all we were trying to do in this lesson, but I just want to open up a table really quickly, show you what it looks like, show you how we can run code, and then in the next lesson, we'll start actually learning how to query this data.
So let's go up to employee demographics.
We're gonna right click and select rows limit 1000.
This is gonna open up a new window right up here.
And it's gonna say select everything from this database dot this table, employee demographics.
And it ends with a semicolon.
Now right down here, we have this output window.
This is the actual data that sits in our table.
We have columns right here.
So employee ID, first name, last name, age, gender, and birth date.
And then here are all of our employees on each row.
So these are all separate rows.
We have Leslie Knope, Tom Haverford, and it goes on and on.
So this is all of our data.
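As a rough sketch of what the setup script and that right-click query boil down to, here is a small runnable example. It assumes Python's built-in sqlite3 module as a stand-in for a MySQL server so it can run anywhere. The column names are the ones read out above; the column types and the two sample rows (apart from the names and Leslie's age of 44, which comes up later) are made up for illustration, and the real script from the GitHub may well differ.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")

# Column names follow the ones listed in the lesson; the types are guesses.
conn.execute("""
    CREATE TABLE employee_demographics (
        employee_id INTEGER,
        first_name  TEXT,
        last_name   TEXT,
        age         INTEGER,
        gender      TEXT,
        birth_date  TEXT
    )
""")

# Illustrative rows only; these are not the values from the real data set.
conn.executemany(
    "INSERT INTO employee_demographics VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Leslie", "Knope", 44, "Female", "1979-09-25"),
        (2, "Tom", "Haverford", 36, "Male", "1987-03-04"),
    ],
)

# Same shape as the query Workbench writes for "Select Rows - Limit 1000".
rows = conn.execute("SELECT * FROM employee_demographics LIMIT 1000;").fetchall()
for row in rows:
    print(row)
```

Each printed tuple is one row of the table, with the values in the same order as the columns.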
The most important thing to know when we're actually working with this, and I'm going to zoom in, is that if we put the cursor on this query right here, we can run just that query by selecting this execute button, which is the lightning bolt with the I on it.
We're going to execute this and it's going to run this because we're highlighted over it.
Now, if we have two queries, let's say this one right here, but let's change it to employee underscore salary.
We'll do underscore salary.
Let's say we want to query this table.
So now if we highlight over this and we go up and select the lightning bolt with the I, now we're looking at a different table.
But if we select, even if we're hovering over this, if we selected this button, we're going to execute everything in this editor window.
So let's run this.
And now you can see at the bottom, we have two outputs, the employee demographics and the employee salary.
So this button is going to run everything in this editor window.
Whereas if we select this lightning bolt with the I, we're running just the query that's under the cursor, where we have it highlighted.
The very last thing that I want to mention is that right over here, you may have this up and you probably don't want that.
We're not going to use the SQL Additions panel in this series.
You can get rid of that by clicking this button right here.
Hello, everybody.
In this lesson, we're going to be learning about the select statement in MySQL.
The select statement is used to work with columns and specify what columns you want to see in your output.
The first thing that we need to do is open up a tab or an editor window.
So let's come right up here to the left hand side and we're going to create a new tab, and I'm gonna zoom in just a little.
Now what we need to do is we need to select the actual table that we're gonna be querying off of.
If you remember from the very first lesson when we set everything up, we came over here and we right clicked and did select rows, limit 1000.
We're not gonna do that, we're gonna actually write it out.
So what we need to do to select that table, the employee demographics table, is we need to select everything.
That's what this star means.
The star means everything, every column in the table.
Now we do have a limit on here.
We have a limit to 1,000 rows.
So if we had a table that had 50,000 rows, this limiter would be an issue.
It would still limit it to 1,000 rows.
We would have to change that to 2,000, 5,000, probably all the way up to 50,000 if we wanted to view everything.
If we had say a million rows, we would need to come up here and say don't limit, and it would give us a million rows.
The reason they do this is mostly to keep the processing time low.
If you have a million rows, it's gonna take a long time for the output to actually appear.
So let's come right back here.
The next thing that we need to do is we need to say select everything, and now we need to say where we're selecting it from.
So we're gonna come right down here, and we're gonna say from, and now we need to specify what table.
And we're gonna say employee underscore demographics, and at the end, we need a semicolon.
Now, why do we need a semicolon?
This is going to tell MySQL that this is the end of this query.
So if you write another one down here, which we will in just a second, it'll be able to distinguish between the two queries.
We're going to go ahead and we're going to run this and we'll just use this execute right here instead of this one.
And there we have our entire table.
So we were able to get our table.
Now there is one thing that is potentially wrong, depending on what you're using it for.
But what we didn't do is we did not specify the actual database before it.
We only specified the table and this works perfectly fine because if you look over here on this left-hand side, we have parks and recreation it's in black.
It's bold.
That means that we're hitting off of this database.
What's gonna happen though if we come down here to the sys database and we double click on it?
Now this database is highlighted.
So now when we're selecting this table, we're trying to select this table from the sys database.
Let's go ahead and try this.
If you notice, we have no output.
Let's come right down here and pull this up.
It's gonna say table sys dot employee underscore demographics doesn't exist.
So it's assuming that we're highlighting this sys database.
That means we're trying to pull from that database.
Now, we can still have this highlighted and still select the correct database by saying parks underscore and underscore recreation, okay, let me spell that right, dot.
So now we're selecting everything from parks and recreation dot employee demographics.
If we run this, we do get the correct output.
That's just something to consider, especially when you're working with a lot of databases and a lot of tables, it's usually best practice to actually put the database in front of the table name.
Although throughout this lesson, we probably won't be doing that every time since we're only gonna be using this parks and recreation database.
Let's go ahead and double click this so we have this highlighted again.
Let's copy all of this, and we're going to come down just a little bit right here.
Now so far we've only selected everything, but we don't have to do that.
We can actually just select one column if we would like to.
For example, if we get rid of that star and say first underscore name, we're selecting the first name column from this table.
If we highlight this query and we hit the execute button with the I, now we are only going to return in our output all of the first names.
And we can add a lot more.
Let's actually look at all of these.
We can separate multiple columns with a comma.
So we can do first name, comma, last, underscore name, and then we could do birth, underscore, date.
So now we have three separate columns.
Let's go ahead and run this.
Now we have first name, last name, and birthdate in our output.
Now the way we just wrote it is all on one line, and that's perfectly acceptable because MySQL is going to read it the exact same as if we did it in a different format, as long as it's still in this order.
But sometimes you'll see it like this, where it's select first name, comma, last name, comma, birth date, all on different lines.
Now there's a lot of different use cases for this or reasons for this, but it typically can be easier to read.
Also, if you're doing any type of functions or calculations in the select statement, it's easier to separate those out on their own lines.
Now again, we won't always be doing this, but it does help.
Sometimes if you're doing that, it just makes it easier to visualize.
For example, if we added the age, so let's add age in here.
Let's run this.
Let's say we were doing a calculation where we wanted to add 10 years to their age.
So we'll say age and we'll actually create a new row for this or a new column.
We'll do age plus 10.
So now we can easily see that we're doing plus 10 here.
And this is another thing that you can do in the select statement, things like calculations.
So if we go up here and we run this, we'll now have an age column, but we'll also have an age plus 10 column where it just adds 10 to the age.
And we can at least visualize and really easily see this when we're doing these calculations.
Now, something really important to know about any type of calculations, any math within MySQL is that it follows the rules of PEMDAS.
Now, PEMDAS is written like this.
It's P-E-M-D-A-S.
Now, what I just did right here with this pound or this hashtag is actually create a comment.
So this code isn't gonna actually run, but it's just for note taking or seeing things in your actual editor window.
I'll come back to comments in just a second, but just wanted to explain what that was.
Now what PEMDAS is, is the order of operations for arithmetic or math within MySQL.
This stands for parentheses, exponent, multiplication, division, addition, and subtraction.
So this is the order that these calculations are going to run in the execution engine that MySQL has.
So if I do age plus 10, and we'll put that all in parentheses.
And then we come over here and we add times 10.
So we're doing plus 10 here and then a times 10 here.
What's going to actually happen is it's going to say age plus 10.
So 44 plus 10 equals 54.
Then we're multiplying times 10.
The parentheses executes first because it comes first in this order, parentheses.
Multiplication comes next because it's this one.
And then anything else after that, if we did, you know, plus 10, you could run this, and you'll notice that it still follows the logic.
10 was just added at the very end to all of these outputs.
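Here is a minimal sketch of that order of operations, assuming Python's built-in sqlite3 module as a stand-in for MySQL so it runs without a server. Leslie's age of 44 is the one used above; Tom's age is made up. The `#` comments here are Python comments, not the MySQL comment from the lesson.

```python
import sqlite3

# SQLite in memory stands in for MySQL here (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_demographics (first_name TEXT, age INTEGER)")
# Leslie's age comes from the lesson; Tom's age is a made-up value.
conn.executemany(
    "INSERT INTO employee_demographics VALUES (?, ?)",
    [("Leslie", 44), ("Tom", 36)],
)

# Three versions of the calculation side by side:
#   age + 10 * 10         multiplication runs first, then the addition
#   (age + 10) * 10       the parentheses run first, then the multiplication
#   (age + 10) * 10 + 10  same again, with 10 added at the very end
query = """
    SELECT first_name,
           age,
           age + 10 * 10,
           (age + 10) * 10,
           (age + 10) * 10 + 10
    FROM employee_demographics;
"""
results = conn.execute(query).fetchall()
for row in results:
    print(row)
# Leslie's row is ('Leslie', 44, 144, 540, 550)
```

Without the parentheses, the 10 times 10 happens before the age is added, which is why the third column is 144 rather than 540.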
Now let's go right back up here.
Let's select everything again from this table.
Let's pull up this table so we can see it a little better, and let's go down because the last thing that I wanna show you is something called distinct.
Now this is really, really useful and you use this a lot in MySQL.
What distinct is going to do is it's going to select only the unique values within a column.
Let's go ahead and copy this employee demographics, bring it right down here.
Let's say select and let's do first underscore name.
So now we're just selecting the first name.
Let's come right down here.
There we go.
So now we're selecting just the first name from this table.
Now these are all unique values.
So if we come right here and we say distinct, nothing should happen to this table because these are all unique values.
Let's go ahead and run this.
As you can see, the output looks exactly the same.
But what if we were to do something like gender?
So let's come here.
Let's do gender.
Let's run this.
It keeps going down.
I don't know why it's doing that.
But now we have male and female.
Now these are not all unique.
We have female, female, female, and female, and the rest are males.
So there's only two unique values here.
So if we come right here and we say distinct gender, now there should only be two in the output, male and female.
Let's go ahead and run this.
So now we get male and female in our output.
Now this works perfectly in one column, but what happens if we have two columns?
So let's do first underscore name comma gender.
Let's go ahead and run this.
Now every combination of first name and gender is unique, so nothing gets reduced in the output.
Now Leslie and female are being grouped together and it's taking the distinct between both of these columns.
So when we're only working with gender, it's only looking at this one column for both male and female.
So it reduces it down to the only two unique values.
But because we add the first name, all of these values are unique.
So therefore the name plus the gender combination is always going to be unique.
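A small sketch of that behavior, assuming Python's built-in sqlite3 module as a stand-in for MySQL. The five rows are an illustrative subset with assumed gender values, not the full table from the lesson.

```python
import sqlite3

# SQLite in memory stands in for MySQL here (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_demographics (first_name TEXT, gender TEXT)")
# Illustrative subset of rows; the real table has more employees.
conn.executemany(
    "INSERT INTO employee_demographics VALUES (?, ?)",
    [
        ("Leslie", "Female"),
        ("Tom", "Male"),
        ("April", "Female"),
        ("Jerry", "Male"),
        ("Andy", "Male"),
    ],
)

# DISTINCT on one column collapses repeated values.
genders = conn.execute(
    "SELECT DISTINCT gender FROM employee_demographics;"
).fetchall()

# DISTINCT on two columns looks at each combination, and every
# first_name + gender pair here is already unique.
pairs = conn.execute(
    "SELECT DISTINCT first_name, gender FROM employee_demographics;"
).fetchall()

print(len(genders))  # 2
print(len(pairs))    # 5
```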
The very last thing that I want to show you in this lesson doesn't actually pertain to the select statement, but I want to save this code.
Let's say we wanted to update this or upload this into our GitHub or save this and send it to somebody.
We can do that.
We can save it by clicking this save button right here.
I'm going to go ahead and click this.
And now we're in our MySQL beginner series folder.
I'm just going to save this and I can save this as anything I want.
So I'm going to say two dot select statement tutorial.
So now when I save this, you'll notice that the name gets changed up here to two dot select statement tutorial.
Let's exit out of this.
I'm going to open up and now I'm going to come here to the select statement tutorial.
I'm going to open it.
And now I have our code again, exactly as we had it written before.
Hello everybody.
In this lesson, we're going to be taking a look at the WHERE clause.
The WHERE clause is used to help filter our records or our rows of data, whereas the SELECT statement is used to help filter or select our actual columns.
So when we're using the WHERE clause, we're only going to return the rows that fulfill a specific condition.
Let's take a look at exactly how this works.
Let's say we come right up here.
We're gonna say where, and let's go down with that one.
Let's say where, and now we need to specify what column we're about to create this condition for.
So we're gonna say first underscore name.
So we're saying where the first name we'll say is equal to, and let's do quotes, and let's say Leslie.
So we're saying the first name has to be equal to this value right here, which is Leslie for Leslie Knope.
If we run this, there's only going to be one row that's returned because Leslie is the only Leslie in this entire table.
Now we just used an equal sign and that's actually called a comparison operator.
And there's a few other comparison operators that you can use.
Let's take a look at some of these other ones.
Let's pull this down right down here and let's actually highlight the select from, and we're going to run it with this one right here.
It's going to only select everything from the whole table.
So we didn't select that where clause.
Let's go right down here and let's look at this salary field.
So I'm going to say where the salary, and I'm going to do a different comparison operator called greater than, so where the salary is greater than 50,000.
Now, one thing I want to note before we actually run this is that right down here we have Tom Haverford, who makes exactly 50,000, and I think there's one more, Jerry Gergich, who also makes exactly 50,000.
If we run this, you'll notice that both Tom and Jerry are not in this output, but in the salary field, everything is greater than 50,000.
The reason for that is that Tom and Jerry made exactly 50,000.
What we're saying right here is where the salary is only greater than.
If we want to include Tom and Jerry, we have to say greater than or equal to, and now we'll select 50,000 or above, whereas before, when we were doing just this, it was strictly greater than 50,000.
It didn't include the 50,000.
Let's go ahead and include it and run this.
And now you'll notice that Tom and Jerry were both included because they had exactly 50,000 and we said greater than or equal to.
Now we can do the exact same thing but with less than.
So we have less than 50,000.
And now we only have two people who make less than 50,000, that's April and Andy.
And if we say less than or equal to, and we run that, now we include both Tom and Jerry who make exactly 50,000, so it's less than or equal to $50,000.
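The four comparisons can be sketched side by side like this, assuming Python's built-in sqlite3 module as a stand-in for MySQL. The `first_name` and `salary` column names on the salary table are assumptions. Tom and Jerry's 50,000 comes from the lesson; the other three salaries are made up, chosen only so that April and Andy fall below 50,000 as described.

```python
import sqlite3

# SQLite in memory stands in for MySQL here (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_salary (first_name TEXT, salary INTEGER)")
# Tom and Jerry at exactly 50000 as in the lesson; other salaries are made up.
conn.executemany(
    "INSERT INTO employee_salary VALUES (?, ?)",
    [
        ("Leslie", 75000),
        ("Tom", 50000),
        ("April", 25000),
        ("Jerry", 50000),
        ("Andy", 20000),
    ],
)

# Run the same WHERE clause once per comparison operator.
# The operators come from this fixed tuple, never from user input.
by_operator = {}
for op in (">", ">=", "<", "<="):
    sql = f"SELECT first_name FROM employee_salary WHERE salary {op} 50000;"
    by_operator[op] = sorted(name for (name,) in conn.execute(sql))
    print(op, by_operator[op])
```

Tom and Jerry only show up for the two operators that include the equal sign.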
Now what we're gonna do is head on over to a different table.
We're gonna do the demographics table.
Make sure I spell that right.
And let's add our semicolon, let's run this.
And what we wanna look at is the gender really quick.
So we're gonna say where the gender is equal to, we'll do in quotes, female.
And if we run this, we get all the genders that are equal to female, but we do have something called the not equal to.
And it looks like this.
It's an exclamation point and an equal sign.
This is gonna say where the gender is not equal to female.
So if we run this, you'll notice that the gender is all male now.
Now so far, we've worked with things like integers, which are numbers.
We've worked with characters or strings, like names.
But there's a different type of data type as well in here.
We have a date column for these birth dates.
Now in the where clause we can also filter on birth dates.
Let's come over here and we'll say birth underscore date.
Let's say it's greater than and within quotes we'll say 1985-01-01.
This is kind of the standard default date format within MySQL which is year, month, and day.
If we go ahead and run this, we get all the people who were born after January 1st, 1985.
So all of these dates are later than 1985-01-01.
Now the next thing that I want to take a look at is logical operators in the WHERE clause.
So logical operators are things like AND, OR, and NOT.
Now these are called, and let's add this, logical operators.
So logical operators allow us to have different logic.
Now let's take a look at how this works exactly.
Let's copy this down, because we already have this one written out.
We're saying where the birthdate is greater than 1985.
We can also say where the gender is equal to male.
We can say and the gender is equal, and then we'll say male.
So we're adding a different complexity or an additional conditional statement within our where clause.
Let's go ahead and run this.
So now we're only selecting birth dates that are greater than 1985 and where the gender is equal to male.
Only the rows that fulfill both of those are returned.
Now the and says both this and this have to be true.
But we could change this.
We could say or.
What this means is either this one has to be true OR this one has to be true in order for it to be returned.
So let's go ahead and run this.
You'll notice that Jerry Gergich was born much before 1985, but since he has a male gender, he is in our output.
And we could also use the NOT operator by saying OR NOT gender equal to male.
So now what this is saying is the birth date could be greater than 1985, or the gender could be not equal to male, which is female.
So if we look at Leslie Knope, she was born before 1985, but because she is female, she is in the output.
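Here is a hedged sketch of AND, OR, and OR NOT on the same kind of filter, assuming Python's built-in sqlite3 module as a stand-in for MySQL. The birth dates are made up; they only respect what the lesson says, that Leslie and Jerry were born before 1985. The `names` helper is a hypothetical convenience for this example. SQLite compares these dates as text, which works because they are written year, month, day.

```python
import sqlite3

# SQLite in memory stands in for MySQL here (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee_demographics "
    "(first_name TEXT, gender TEXT, birth_date TEXT)"
)
# Made-up birth dates; Leslie and Jerry are before 1985 as in the lesson.
conn.executemany(
    "INSERT INTO employee_demographics VALUES (?, ?, ?)",
    [
        ("Leslie", "Female", "1979-09-25"),
        ("Tom", "Male", "1987-03-04"),
        ("April", "Female", "1994-03-27"),
        ("Jerry", "Male", "1962-08-28"),
    ],
)

def names(where_clause):
    """Hypothetical helper: first names of the rows matching a WHERE clause."""
    sql = f"SELECT first_name FROM employee_demographics WHERE {where_clause};"
    return sorted(name for (name,) in conn.execute(sql))

# AND: both conditions must be true.
both = names("birth_date > '1985-01-01' AND gender = 'Male'")
# OR: either condition is enough.
either = names("birth_date > '1985-01-01' OR gender = 'Male'")
# OR NOT: born after the date, or the gender is not male.
either_not = names("birth_date > '1985-01-01' OR NOT gender = 'Male'")

print(both)        # ['Tom']
print(either)      # ['April', 'Jerry', 'Tom']
print(either_not)  # ['April', 'Leslie', 'Tom']
```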
Now, like we talked about in the last lesson, there is something called PEMDAS, and a similar order of operations applies to these logical operators as well.
So if we run this entire table, let's go ahead and run this.
If we're looking at this entire table, let's say we wanna get someone very, very specific.
Let's say we're gonna do where the first underscore name is equal to Leslie and their age has to be equal to 44.
That's extremely specific, and we can actually just do it like this.
We don't need quotes for integers.
We could just do the number if we'd like to.
This is very specific.
This is only one person.
But if we put this in parentheses, we can add an or over here.
We can say or the age is greater than, let's just do 55.
Let's go ahead and run this and then we'll take a look at it.
So within these parentheses, we have an and operator.
What that means is both this condition has to be met and this condition has to be met.
And that's only one person, that's Leslie Knope.
But then outside of these parentheses, we have another conditional statement.
Or the age is greater than 55.
So what we're saying within these parentheses is that this is an isolated conditional statement.
Within these parentheses, if this is true, then in our output, it'll be returned.
But then we have an or condition which says, or someone with the age of greater than 55 can also be in the output.
So these parentheses can be really helpful when you're actually using it in the where clause with these and, ors, and nots.
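A minimal sketch of that grouped condition, assuming Python's built-in sqlite3 module as a stand-in for MySQL. Leslie's age of 44 comes from the lesson; the other ages are made up, with Jerry set above 55 purely so the OR branch has something to match.

```python
import sqlite3

# SQLite in memory stands in for MySQL here (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_demographics (first_name TEXT, age INTEGER)")
# Leslie's age is from the lesson; the rest are made-up values.
conn.executemany(
    "INSERT INTO employee_demographics VALUES (?, ?)",
    [("Leslie", 44), ("Tom", 36), ("April", 29), ("Jerry", 61)],
)

# The parentheses isolate the AND pair; the OR adds a second way to match.
query = """
    SELECT first_name
    FROM employee_demographics
    WHERE (first_name = 'Leslie' AND age = 44) OR age > 55;
"""
matched = sorted(name for (name,) in conn.execute(query))
print(matched)  # ['Jerry', 'Leslie']
```

Leslie matches through the parenthesized pair, and Jerry matches through the age condition on its own.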
Now I want to take a look at just one more thing.
And let's bring this down here.
And let's get rid of this entire thing.
Now, the last thing that we're gonna take a look at is a like statement.
Now the like statement is super unique because we can look for specific patterns.
We're not necessarily looking for an exact match.
Like here, if we said where first underscore name is equal to Jerry.
If we're looking for Jerry, it has to be exactly Jerry.
But if we take this out, say J-E-R, and then we run it, we get no output.
It has to be an exact match.
But here's where the like statement comes in, because we can actually say like J-E-R, and we can add two special sequences or special characters within our like statement.
So those special characters are the percent sign and the underscore.
The percent sign means any number of characters, and the underscore means exactly one character.
Let's see how that actually works.
So what we're gonna do is we're gonna say like J-E-R percent sign.
That's the first one in this like statement.
What this says is the first name is like starting with J-E-R, but then has anything after it.
Doesn't matter what it is, as long as it has J-E-R at the very beginning, it will be returned.
Let's go ahead and run this.
Now the only person who starts with J-E-R is Jerry.
But what if I took the J out of here?
Now it's saying it starts with E-R, and that's not anybody.
What we can do is we can add another percent at the beginning.
This is gonna say anything comes before, anything comes after, all we're looking for is er somewhere in their name.
Let's go ahead and run this.
There still is only one person, and that's Jerry.
Now let's come up here and let's get rid of this.
And let's say we're looking for everyone's name who starts with A.
We can do that really easily by saying A, percent sign.
All that says is it starts with A.
We don't have a percent sign before it, which would say this string just has to have an A somewhere in it.
If we have it like this, this means an A has to come at the beginning.
Let's go ahead and run this.
In our output, we have April, Ann, and Andy.
Now let's take a look at the underscore.
If we get rid of this percent sign and we do two underscores, one, two, this is gonna say it starts with an A and then it has two characters after it, no more, no less.
So if we run this, Ann is gonna be the only person who's returned, because she has an A, and then two characters after it.
Now if we want Andy, we can specify that by doing another underscore, that's one, two, three.
And now Andy is the only one in our output.
Now there was also April in there, but she had more than three characters, but we can actually get her in our output by doing a percent sign.
So we can combine both the underscore and the percent sign.
And this is going to say it starts with an A, has one, two, three characters, and then it can have anything after that.
So it just has to have at least an A and have one, two, three characters after it.
So let's run it.
Now you can see April comes into here because she does have A, the P, R, and I are the three next characters, but then we have a percent sign that allows that L to be in the output as well.
Now we don't just have to do this with strings or text like April and Andy, we could also do this with birth dates.
For example, Andy's birth date is 1989.
We could say where the birth underscore date is like 1989, percent sign, because let's say we want to look at everyone who was born in 1989.
Let's go ahead and run this.
And Andy's the only person born in 1989.
But again, we looked at the year at the very beginning.
So that is how the like statement works.
It looks for a specific sequence within that column that you can search for.
So it doesn't have to be an exact match, as long as it has that specified sequence that you've put in there somewhere within that cell or that column.
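The patterns from this part of the lesson can be collected into one sketch. It assumes the employee_demographics table with first_name and birth_date columns, and the capitalization of the string literals is a guess, which does not matter under MySQL's default case-insensitive collations.

```sql
-- Starts with "Jer", followed by anything (including nothing)
SELECT * FROM employee_demographics WHERE first_name LIKE 'Jer%';

-- Contains "er" anywhere in the name
SELECT * FROM employee_demographics WHERE first_name LIKE '%er%';

-- Starts with "a"
SELECT * FROM employee_demographics WHERE first_name LIKE 'a%';

-- "a" plus exactly two more characters (Ann)
SELECT * FROM employee_demographics WHERE first_name LIKE 'a__';

-- "a" plus exactly three more characters (Andy)
SELECT * FROM employee_demographics WHERE first_name LIKE 'a___';

-- "a" plus at least three more characters (Andy and April)
SELECT * FROM employee_demographics WHERE first_name LIKE 'a___%';

-- Birth dates whose text form starts with 1989
SELECT * FROM employee_demographics WHERE birth_date LIKE '1989%';
```

Each underscore stands for exactly one character, while the percent sign stands for zero or more characters.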
So everybody, in this lesson, we're going to be taking a look at group by and order by in MySQL.
Now, when you use the group by clause in MySQL, it's gonna group together rows that have the same values in the specified column or columns that you're actually grouping on.
Once you group those rows together, you can run something called an aggregate function on those rows.
Let's see how this actually works.
Let's go ahead and copy this right here.
We'll bring that down.
And let me go back up one.
Let's go ahead and write gender right here.
Now we want to group on this gender column and we're going to say group by gender.
Let's go ahead and run this.
We'll see what we get.
And so we have male and female.
Now we could get the exact same output by saying select distinct gender from this table.
What is group by doing that the distinct actually isn't doing?
Well, it's actually rolling up all of these values into these rows.
So later when we run aggregate functions like average, min, max, we'll do it based off of these rows and all those rows are rolled up into these two rows.
And we'll see that in a little bit.
Now, what if I was to come up here, and in this demographics we have a first underscore name.
What would happen if I'm selecting the first name, but I'm grouping by the gender?
Let's go ahead and run this.
If we come right down here and pull this up, you can see the error says the select list is not in group by clause and contains a nonaggregated column.
What this means is that when you are selecting a column, if it's not an aggregated column, like say average of something, if we're not using the aggregate functions in the select statement, it has to be in the group by.
These have to match.
So this gender has to match this group by if we're not performing an aggregate function on it.
Let's go ahead and run this, and now it works properly.
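Here is a small sketch of the rule just described, assuming the employee_demographics table. The failing query is left commented out so the block runs as written.

```sql
-- Works: the only non-aggregated column in the SELECT list is also in GROUP BY.
SELECT gender
FROM employee_demographics
GROUP BY gender;

-- Fails when the ONLY_FULL_GROUP_BY SQL mode is enabled (the default in recent
-- MySQL versions): first_name is neither aggregated nor listed in GROUP BY.
-- SELECT first_name
-- FROM employee_demographics
-- GROUP BY gender;
```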
Now let's go back up, let's run this query, because I wanna select everything again.
But let's say we wanted to take a look at the average ages for gender.
So what we're gonna do is we're selecting gender, we're also grouping by gender, but what we're gonna do is add a comma, and we'll say the average, that's A-V-G, that stands for average, and then we're gonna put in here age.
So now this right here is an aggregate function.
This does not need to go in the group by, we're just grouping on the gender and then we're performing this aggregate function or kind of a calculation based off of those grouped rows for gender.
So let's go ahead and run this and take a look at the output.
So what this is telling me is that for the males, all of the male rows that were grouped, the average age is 41.3.
And for female, the average age is 38.5.
So super quickly, you can tell that the average age of females is lower than the average age of males.
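The average-age query described above might look like this, again assuming the employee_demographics table.

```sql
-- gender is the grouping column; AVG(age) is computed once per group,
-- so it does not need to appear in GROUP BY.
SELECT gender, AVG(age)
FROM employee_demographics
GROUP BY gender;
```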
Now we'll take a look at aggregate functions more in just a little bit.
Let's actually go to a different table.
Let's come right down here.
We're going to go to the salary table and just select everything for now.
Let's go ahead and run this.
Now, what we're gonna actually be grouping on is this occupation right here.
Now there's a lot of unique values.
It's not as distinct as the gender, which only had two values.
You'll notice we do have a few that are the same.
We have ones like office manager.
So when we come up here, let's say occupation, and of course we need to group by the occupation as well.
Now let's run this.
You'll notice that office manager only has one row.
Let's say we also wanna group on the salary.
Let's say salary.
Now we can group on multiple.
So we're gonna say salary like this.
So we're grouping on the occupation as well as the salary.
Now let's run this.
You'll notice that we have two rows for office manager now.
This is because this salary and this salary for those two employees are different.
We have 50,000 and 60,000.
For this, I just wanted to demonstrate that if these had both been 50,000, there would only be one row, office manager, 50,000, but because this is a unique value, different than 50,000, they have their own individual rows, which we would then perform our aggregate calculations on.
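Grouping on two columns, as just demonstrated, could be sketched like this. The table name employee_salary and its occupation and salary columns are taken from how they are referred to in this course.

```sql
-- One output row per distinct (occupation, salary) pair.
-- Two office managers with different salaries produce two rows;
-- if their salaries were equal they would collapse into one.
SELECT occupation, salary
FROM employee_salary
GROUP BY occupation, salary;
```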
Let's go and get rid of that because we will not be using that anymore.
I just wanted to demonstrate it really quickly.
So before we were looking at gender and average age and we're also grouping on the gender.
We can perform other aggregate functions as well.
Let's take a look at some of those.
We could look at the max age as well.
The max is gonna show us the highest value within each of those groupings.
So we have a male and female.
The max age for those, for the male, is 61, and the highest age for the female is 46.
We can do the exact same thing, except we can say min, or the exact opposite thing, we can say the minimum age.
So this is gonna be the lowest for both the male and the female.
Go ahead and run this.
Now we have female and male, and the minimum age is 29 and 34.
And there is one last one that I want to show you, which is count.
We're going to do count.
Now count is going to count the actual rows within this age column.
So if we run this, you'll see that we have four females per count and we have seven males.
It's just telling us a count of how many values is in this column when we're actually grouping on the gender.
So that's how we can use group by to actually roll up and group all of these similar values within a column or columns and perform our aggregate functions on them.
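Putting the aggregate functions from this lesson side by side gives something like the following sketch, assuming the employee_demographics table.

```sql
-- All four aggregates are calculated separately for each gender group.
SELECT gender,
       AVG(age),
       MAX(age),
       MIN(age),
       COUNT(age)
FROM employee_demographics
GROUP BY gender;
```

COUNT(age) counts the rows in each group where age is not NULL, which is why it reads as a row count here.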
Now let's come down here.
And what we're going to take a look at is order by.
So we're going to say order by.
Now let's actually pull in this demographics table right here.
We're just going to say select everything and run this really quickly after we add a semicolon.
So order by.
Order by is going to actually sort the result set in either ascending or descending order.
Let's take a look at how this works.
At the very end, we could say order by, and we could order by the first underscore name.
So we're gonna take this column, we're gonna order all of our rows based off of this one column.
Let's go ahead and run this.
So it's gonna do it based off ascending order, which means smallest to largest.
Now this is a text column or a character column, so we do it A to Z.
So Andy and April all the way down to Tom.
Now by default, this is in ASC order, ascending order.
And if we run this, it's gonna be the exact same output.
But we can change this to do it the opposite, highest to lowest or Z to A by doing descending.
So now if we run this, you'll see that it goes Tom all the way down to Andy.
Now let's take a look at ordering on something like gender and age because we can do both at the same time.
So let's order by the gender first.
Let's go ahead and run this.
And you'll see that all the females are grouped together and then all the males are grouped together because that's just the order in which it is.
But we can do an additional column.
We can also do it based off of the age.
Let's go ahead and run this.
So now within the female, since that came first in our order by, we're ordering by the gender, and then we're also ordering by the age after we've ordered by the gender.
So now it's 29 all the way up to 46, then 34 for males all the way up to 61.
Now we can change this just for the age.
Let's say we wanna do age descending.
So gender will stay the same in ascending order, and now age will be in descending order.
Let's go ahead and run this.
Now female and male stay the same but now it starts at the highest down to the lowest.
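The sorts described in this part of the lesson might be written as follows, assuming the employee_demographics table.

```sql
-- Z to A on first name (ASC is the default when no direction is given)
SELECT *
FROM employee_demographics
ORDER BY first_name DESC;

-- Gender ascending first, then age descending within each gender
SELECT *
FROM employee_demographics
ORDER BY gender, age DESC;
```

Each column in the ORDER BY list takes its own direction, so DESC here applies only to age.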
Now this is something that I would absolutely do in real life except sometimes you can make mistakes and sometimes you do the wrong column first.
Let's do age and then we'll do gender.
Now if we run this, the gender is not going to be used at all.
And this is because there are no duplicate ages, so no two rows ever tie on the first column.
So notice all of these age values are completely unique.
So the gender never is actually used to order anything on, because only if there were ties, like 34, 34, 34, 34, would these be ordered based off of the gender.
But since there are no duplicate values, this is really pretty useless.
That's why the order of the order by, or the columns that you place in the order by, are actually quite important.
Now the last thing that I wanna show you, and I'll just go back to gender and age, is that you don't actually have to use the column names.
We can use the column positions.
Now, I will preface this by saying I don't recommend doing this, but I sometimes do it in shorthand for just a quick query if I know the column position and I don't want to write out the whole name.
Sometimes I do it, although it's not best practice.
But let's take a look at it.
So, gender is the one, two, three, four, fifth column.
I'm going to replace this with 5, and age is the one, two, three, fourth column, so I'll replace that with 4.
So these are the positions of the fields, but not the names of them.
If we run it, we're going to get the exact same output because these represent these columns appropriately.
But again, I just don't recommend it.
It's kind of a slippery slope that I've fallen down myself many times.
And when you get to more advanced SQL and you're creating things like stored procedures and triggers and all these things, this can actually cause a lot of issues.
If you were to add any columns or remove any columns, then you'd be ordering by the wrong column.
Let's say this last name got removed because we didn't want it for some reason.
Then the gender is one, two, three, four, and now we're ordering on the wrong column, and that would be a big mistake.
So just by best practice, it is better to write out gender and age, but I just wanted to show you that in case you want to be like me and kind of go down the wrong path.
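For comparison, here are the positional and named versions side by side. This sketch assumes the column order described above, with age as the fourth column and gender as the fifth in employee_demographics.

```sql
-- Positional shorthand: 5 and 4 refer to the 5th and 4th columns of the SELECT list.
-- This silently sorts on different columns if the column order ever changes.
SELECT *
FROM employee_demographics
ORDER BY 5, 4;

-- Preferred: name the columns explicitly.
SELECT *
FROM employee_demographics
ORDER BY gender, age;
```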
Hello, everybody.
In this lesson, we're going to take a look at the difference between having and where.
Now, in the last lesson, we looked at group by and order by.
The most obvious thing to do would be to come right here and say where, and we're going to say this column, which is actually named this.
We'll say where the average age, let's say is greater than 40, which would only be the males.
So let's go ahead and run this.
And as you can see, we're not getting any output.
Let's bring this up and take a look at the error.
It says invalid use of group function.
What's actually happening is something to do with this group by gender right here.
When we're selecting gender and then we're performing an aggregate function, this occurs only after the group by actually groups those rows together.
So when we're trying to filter based off of this column right here of average age, it really hasn't been created yet because this group by hasn't happened.
That's where the having clause comes into play.
So let's go ahead and what we're gonna do is we're gonna get rid of this.
We're gonna come right down here, and instead of where, we're gonna say having.
Now, having was specifically created for this exact example.
It comes right after group by, and after group by, we can filter based off of these aggregate functions.
So now if we run this, we're gonna get an output that only has the rows where the average age is greater than 40.
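A sketch of the two attempts described above, assuming the employee_demographics table. The failing version is commented out.

```sql
-- Fails with "Invalid use of group function": WHERE filters individual rows
-- before GROUP BY runs, so the average does not exist yet.
-- SELECT gender, AVG(age)
-- FROM employee_demographics
-- WHERE AVG(age) > 40
-- GROUP BY gender;

-- Works: HAVING filters the groups after they have been aggregated.
SELECT gender, AVG(age)
FROM employee_demographics
GROUP BY gender
HAVING AVG(age) > 40;
```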
Now let's take a look at just one more example, and I'm gonna show you how you can use both in one query.
So instead of demographics, let's look at the salary table.
And let's run it.
Now, in this salary table, we have this occupation, and remember, we have this office manager that happens twice, and this is gonna be our main example.
So we're gonna say occupation, and then we'll say the average salary.
Now we'll need to come down here, and we'll say group by, now we're gonna say occupation.
So this should look pretty similar because right here we have our office manager and one of the office managers made 50.
One of the office managers made 60.
So the average is 55,000.
Now I can use the where by saying where, then I'll say occupation like, and let's see people who are managers.
So I'll say a percent manager percent and close that quote.
So they're like a manager.
And then I want to see where a manager makes more than, let's say, 75,000.
So I won't actually say where.
I'm going to say having an average salary.
And I need to add a space there.
Having an average salary greater than, let's say, 75,000.
And let's run this.
So now I filtered at the row level right here in the where clause, but then down here, I filtered at the aggregate function level.
This having is only gonna work for aggregated functions after the group by actually runs.
So that is the difference between the having clause and the where clause.
The where clause you're most likely gonna use a lot more, but if you do wanna filter on those aggregated function columns, you have to use the having clause.
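Using both clauses in one query, as in the manager example, might look like this sketch against the employee_salary table.

```sql
SELECT occupation, AVG(salary)
FROM employee_salary
WHERE occupation LIKE '%manager%'   -- row-level filter, applied before grouping
GROUP BY occupation
HAVING AVG(salary) > 75000;         -- group-level filter, applied after grouping
```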
Hello everybody.
In this lesson, we're going to be taking a look at limit and aliasing.
Limit is just going to specify how many rows you want in your output.
If we take this table, for example, if we come right here and we say limit, let's do three.
If we run this, it's only going to take the top three that we have.
Let's go ahead and run this.
As you can see, we have employee one, three, and four, Leslie, Tom, and April.
Now this seems super straightforward, really, really easy, but it can be combined with order by to actually be really powerful.
For example, let's say we wanted to take the three oldest employees.
All we'd have to do is come right under here, and we say order by, and we'll order by the age in descending order.
So we're gonna order on age descending, and then it's gonna take the top three.
So if we run this, and very quickly we have the top three oldest people in this table.
Now there is one additional parameter that we can use in limit, and all we have to do to access it is have a comma here.
Now what this is gonna do, and I'll put a one here, what this is gonna do is it's now gonna say we're gonna start at position three, and then we're gonna go one row after it.
Now I actually wanna take one of these people, so let's start at position two and select the next one after it, which should be Leslie Knope.
So we're gonna start at position two, and we're gonna select the one right after it.
So we're gonna start at position two, and then one means we're taking the next one row.
Let's go ahead and run this.
And as you can see, we got Leslie Knope in our output.
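The two forms of limit covered here can be sketched as follows, assuming the employee_demographics table.

```sql
-- The three oldest employees
SELECT *
FROM employee_demographics
ORDER BY age DESC
LIMIT 3;

-- Two-argument form: skip the first 2 rows, then return the next 1 row
SELECT *
FROM employee_demographics
ORDER BY age DESC
LIMIT 2, 1;
```

In the two-argument form the first number is the offset and the second is the row count, so LIMIT 2, 1 returns only the third row of the sorted result.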
Now let's come right down here.
We are going to now look at aliasing.
Now aliasing is just a way to change the name of the column, for the most part.
And it can also be used in joins, but we're gonna take a look at joins or aliasing joins in the intermediate series.
In a previous lesson, we looked at a group by that looked like this.
We selected gender, then we said from, I believe it was employee underscore demographics.
Then we said group by gender, and we also had the average, and I think it was age.
There we go.
And we'll add our semicolon.
Let's go ahead and run this.
In our output, we have gender as our gender column, the same as the column name, but then average age is average age.
And so if we want to actually do something like a having, where we say having the average age, let's say greater than 40, like we had it, we have to actually use this aggregate function in our having clause, and we don't wanna always have to do that.
We can actually change the name of this column and subsequently use it throughout our query with that aliased name.
So I'm gonna say as, and that's the key word to actually change it.
We'll say as, and we'll do average underscore age.
So now we've changed this name to average underscore age, and we can come down here to having, and say having average underscore age greater than 40, and when we run this, it works perfectly.
And you'll notice that the name of the column was actually changed.
Now this as isn't actually 100% needed.
It's kind of implied, even if we get rid of it, it's implied there's like this as in there somewhere, but we don't have to have it.
If we took it out and ran it like this, it would still work exactly the same.
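A sketch of the aliased query follows. The alias avg_age is an assumed spelling of the "average underscore age" name spoken in the lesson.

```sql
-- With the AS keyword; MySQL lets the alias be reused in HAVING
SELECT gender, AVG(age) AS avg_age
FROM employee_demographics
GROUP BY gender
HAVING avg_age > 40;

-- Same result without AS, since the keyword is optional
SELECT gender, AVG(age) avg_age
FROM employee_demographics
GROUP BY gender
HAVING avg_age > 40;
```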
Hello, everybody.
In this lesson, we're going to be taking a look at joins.
Joins allow you to combine two tables or more together if they have a common column.
That doesn't mean the column name has to be the exact same, but at least the data within it are similar that you can use.
There are several joins that we're going to look at today, like inner joins, outer joins, and self joins.
These are the two tables that we'll be working with the most throughout this lesson.
We have the employee demographics table as well as the employee salary table.
Now within the employee demographics table, we do have this employee ID column.
And if we look at the employee salary, we also have the employee ID column.
So in this instance, the column name is actually the exact same.
And of course the data inside of it is also very similar.
So let's start by writing out an inner join.
This is probably one of the most common joins, one of the most simple joins as well.
An inner join is going to return rows that have matching values in the columns we join on in both tables.
So let's see how we can actually write out this join.
Let's come right down here and let's copy this.
This will be the first table that we start with and then we'll join the salary table onto this demographics table.
So what we need to do is we need to come right here and we need to say join.
Now by default, join represents an inner join, although we can write inner join here to make it more explicit.
Then we're gonna come up here and we're gonna say employee salary.
So we're selecting everything from the employee demographics and we're doing an inner join on the employee salary.
Now we have to tell MySQL exactly what columns we're supposed to be joining on.
I'm going to hit enter and I'm going to hit tab.
Now you don't have to hit tab.
It just looks more code-like and it's easier to read and that's how I've done it for other programming languages as well.
So that's how I'm going to show you how to do it.
What we need to do is say on.
Now this keyword is going to allow us to say we're joining the demographics table to the salary table based on these two columns.
So from the demographics table, we're doing the employee underscore ID is equal to.
And then in the salary table, it's also the employee ID.
Let's do employee, if I can spell it right, employee underscore ID.
Now, if we try to run this, we're going to get an error.
Let's bring this up.
It's going to say column employee underscore ID in on clause is ambiguous.
Now what does it mean, ambiguous?
That means that it doesn't know what table this employee ID is from.
Is it from the employee demographics table?
Is it from the employee salary table?
We don't know, because it's ambiguous.
Now what we can do is we can specify it by saying employee demographics dot employee ID and then employee salary dot employee ID.
Now if we run this, we're gonna get the output that we're looking for and let's take a look at this real quick.
Let me bring this up.
So we're pulling everything from the employee demographics right here, all the way through the birth date.
Then we're pulling the employee salary table.
That's the employee ID all the way, let's scroll over, through the department ID.
So we're basically pulling in all of the rows or all the columns from both tables, but we're not pulling in all of the rows.
Remember, an inner join is only gonna bring over the rows that have the same values in both columns that we're tying on.
So in this employee ID, we're missing number two.
Are we missing any other ones?
No, we're only missing number two.
Let's go back up and I'm going to run both of these tables.
I want to take a look.
So let's run this.
So you'll notice in the employee salary table, we have a number two right here, and that's Ron Swanson.
But in the employee demographics table, we don't have that.
I believe that Ron Swanson did this so that Leslie Knope would not know when his birthday was, because he didn't wanna give that information.
I think that makes the most sense, although Ron was not willing to give a comment on that.
Now, if we run this again, you'll notice that 2 is not in there.
Since 2 is not in the employee demographics table, the employee ID 2 is not going to be populated or brought over into this output from the employee salary table.
Now really quickly, this is honestly giving me some anxiety because this is so incredibly long.
Something that I mentioned in the beginner series is that you can use something called aliasing when using joins, and it's really helpful.
This is what I mean.
So right here we have employee demographics.
We're gonna call this DEM.
You can also do as DEM.
You can do as SAL.
These are just short names for demographics and short name for salary.
And we can replace these and say dem.employee_id and sal.employee_id.
Oh that looks so much better.
Now we're gonna run this and it'll be the exact same output but now we're using these aliases which just makes it so much easier to read.
Now one last thing that I want to show you while we're just looking at the inner join is selecting the actual columns.
Let's say we wanted to select the employee ID and we wanted to select age and then we wanted to select their occupation.
If we try to run this, we're gonna get an error, and it's gonna be almost the exact same error that we got before, which is column employee ID in field list is ambiguous.
So in our field list, which is right up here in the select statement, we have this employee ID.
It does not know which employee ID to pull from, whether it's the demographics or the salary.
So we have to tell it which one to pull from.
So let's pull it from the demographics by saying dem.employee_id.
Now when we run this, we're able to get information from both tables in our output without having all of the information.
And if there are columns that are similar in both tables, we have to denote that by using this alias or the table name.
All right, so that is inner joins.
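Pulling the inner join steps together gives roughly the query below. It assumes the employee_demographics and employee_salary tables share an employee_id column, and uses the dem and sal aliases from the lesson.

```sql
-- employee_id exists in both tables, so it must be qualified with an alias;
-- age and occupation each exist in only one table, so they can stay unqualified.
SELECT dem.employee_id, age, occupation
FROM employee_demographics AS dem
INNER JOIN employee_salary AS sal
    ON dem.employee_id = sal.employee_id;
```

Only employees whose employee_id appears in both tables are returned, which is why employee 2 is missing from the output.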
Now let's move down here.
I'm going to copy this and we're going to come right down here and we're going to look at outer joins next, and let's put that right here.
Now for outer joins we have a left join and we have a right join or a left outer and a right outer join.
A left join is going to take everything from the left table even if there's no match in the join and then it will only return the matches from the right table.
The exact opposite is true for a right join.
Let's see how this actually works.
Let's start by changing this to a left join or a left outer join.
They're both the same and you can use them interchangeably.
I'm just going to say left join and we're joining it on the exact same things and I'm going to take everything because I think that'll be easier to visualize and I'm going to run this.
Now you may notice that this looks exactly the same and that's for a very good reason.
It's because the employee demographics table in the from statement, that's our left table, and then the join, where we're actually joining on, that's our right table.
So this is our right table.
So since we're doing a left, it's taking everything from the employee demographics.
Now remember, the employee demographics didn't have Ron Swanson.
It had no information.
So everything in the left table had a match, and the one row without a match, Ron Swanson, is only in the right table.
Let's change this to a right join.
And what this is going to do, and I want to make it all cap, make it all the same.
What this is going to do is it's going to take everything from the employee salary table.
But if there is not a match in the employee demographics, it just will have nulls.
Let's go ahead and run this.
So now it looks a little bit different.
Now we're taking everything from the employee salary.
So we're taking Ron Swanson, but if there is not a match, it will still populate that row, but it'll have all nulls in it.
Then any of the information that is overlapping or the same, it will bring over.
So employee ID is matched to employee ID one.
Then we'll bring all that information over from the left table.
And that's essentially what a left and a right join is.
With a left join, you're taking everything from the left table and then matches from the right table.
If you do a right join, you're taking everything from the right table, but only matches on the left table.
And again, it populates it with nulls.
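The two outer joins can be sketched as below, using the same assumed tables and aliases as the inner join.

```sql
-- LEFT JOIN: every row from employee_demographics (the FROM table),
-- plus matching salary rows where they exist.
SELECT *
FROM employee_demographics AS dem
LEFT JOIN employee_salary AS sal
    ON dem.employee_id = sal.employee_id;

-- RIGHT JOIN: every row from employee_salary (the joined table).
-- Salary rows with no demographics match get NULL in the dem columns.
SELECT *
FROM employee_demographics AS dem
RIGHT JOIN employee_salary AS sal
    ON dem.employee_id = sal.employee_id;
```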
Now let's go down and look at our last type of join that we're going to look at.
And this is a self.
Let me spell that right.
A self join.
Now what is a self join?
It is a join where you tie the table to itself.
Now why would you want to do this?
Let's take a look at a very serious use case.
Let's do select everything.
Let's do this from employee underscore salary.
And let's run this.
Now let's say it's December 1st, and the Parks and Rec department decided to do a Secret Santa.
And they wanted to assign, based off of their employee ID, the person who they're gonna have as a Secret Santa.
We can help orchestrate this very easily using MySQL.
Well, very easily is subjective, I guess, but let's take a look at how we can do this.
So just like any other join, the first thing we're gonna do is select everything from employee salary and then say join, and then we're gonna say employee salary again.
So we're tying it to itself.
Now when we come down here, and let me do this, when we come down here and we say on, we have to specify which table we're pulling from.
Are we pulling from the left table, which is like the first table we're pulling from, or are we pulling from when we're joining on the right table?
We need to be able to distinguish these two tables because they are the same.
So I'm going to say EMP1 and I'm going to say EMP2, just to say this is employee table one and employee table two.
So we're going to tie them based off the employee ID because we know those will be the exact same because we're pulling from the same table.
So we'll do emp1.employee.
And just so you know, if it populates like this, you can hit tab and it'll auto finish that for you.
And emp2.employee underscore ID.
Now, if we run this, let's do this.
The output that we're gonna get is literally just a one for one match.
It's all the columns and all the rows, because they all match exactly.
But now what we're gonna do is we're gonna assign an employee ID to the next employee ID, and that'll be their secret Santa.
So just keep it really simple.
The next highest person with an employee ID, that is their secret Santa.
So let's do emp1 dot employee ID plus one is equal to emp2 dot employee ID.
So we're adding one over here and we're saying that's equal to this employee ID over here.
Let's run this.
So now you can see Leslie Knope is now going to be assigned to Ron Swanson who has an ID of two.
Ron Swanson is gonna be assigned to Tom Haverford, which I'm sure he's really happy about, and so on and so forth.
Now let's bring this down here, and what we're gonna do is try to simplify this and simplify this output a little bit, because this is a little bit chaotic down here.
So we're gonna specify what columns we want in our output.
What we're gonna want is the employee ID, first name, last name, and then employee ID, first name, last name, of the person who they got for Secret Santa.
So we're gonna start with emp1 dot employee underscore ID, and we can call this, we'll just say as emp underscore Santa.
Then we'll do a comma and we'll come down.
Now I need to spell employee right.
So we have our employee ID, and now we need our first name and last name.
Now remember, Leslie Knope is going to be the Secret Santa for Ron Swanson.
I don't know if I made that clear, but that's, I guess, how it works.
So now we need to do emp1 dot, and we'll do first underscore name, and we'll do as first underscore name underscore Santa.
We can do a comma.
We'll do the exact same thing except for the last name.
So last underscore name as last underscore name underscore Santa.
Now all we have to do is do a copy all this, bring it down here and change this to two.
We're pulling from the second table.
And it'll look just like this.
And get rid of this comma, and this is done.
Let's run it and let's bring this up.
So we have employee Santa, first name Santa, Leslie, last name Santa, Knope, then employee Santa, and we actually need to change these names.
That is one thing we need to do.
We'll just change it to employee name, first name, employee, and last name, employee.
And now when we run this, we have our Santa, and then we just have the employee who this person's gonna be the Santa for.
Now, this is kind of a silly way to look at it, but in essence, this is exactly how a self-join works.
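The Secret Santa self-join described above can be sketched roughly as follows. This is a reconstruction from the spoken walkthrough, not the exact query on screen: the table is assumed to be employee_salary (Ron Swanson, ID 2, appears in the output), and the aliases for the second set of columns (emp_name, first_name_emp, last_name_emp) are my reading of the names as spoken.

```sql
-- Self-join: the same table is given two aliases and joined to itself.
-- Assumption: employee_salary is the table being joined; the second-set
-- aliases below are inferred from the transcript.
SELECT emp1.employee_id AS emp_santa,
       emp1.first_name  AS first_name_santa,
       emp1.last_name   AS last_name_santa,
       emp2.employee_id AS emp_name,
       emp2.first_name  AS first_name_emp,
       emp2.last_name   AS last_name_emp
FROM employee_salary AS emp1
JOIN employee_salary AS emp2
    ON emp1.employee_id + 1 = emp2.employee_id;  -- each Santa gets the next ID up
```

Because this is an inner join, the employee with the highest ID has no match on the emp1 side and will not appear as a Santa.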
Now, the very, very, very last thing, I promise you, the last thing that I wanna show you is how we can join multiple tables together.
So we're gonna say joining multiple, how do you spell it, right?
Multiple tables together.
Now not just one table together to another table.
I'm talking about one table to another table to another table.
So let's go all the way back up.
We're going to take this right here and bring it all the way down.
And what we're now going to do is we're going to tie in this table right here, the Parks Department.
Let's actually look at this table and let's select everything real quick.
And let's run this.
Now let's go down here and we're gonna say select everything.
We'll do this from parks underscore departments.
And let's run this.
Now this is something called a reference table.
This is not a table that most likely you'll ever add a bunch of information to.
It's there to reference that we have these department names.
Tables like the salary table or employee demographics table are gonna change pretty often as people get raises or as they get older with their age, those are gonna be updated fairly often, whereas this parks department table is just there for reference.
Now, if we look down here in the columns, we have a department ID, then we have a department name.
So we have the ID and the name of that ID.
If we run our join and we scroll all the way to the right, you'll notice we have a DEPT ID.
This stands for Department ID.
That's in the salary table.
So what we want to do is join this Department ID to the Department ID from the Parks and Rec.
So what we can do is we're going to say inner join. And now we're going to join, let's scroll down just a hair, there we go. Now we're going to take this and do it off this. So we're going to call this PD for short, and we're gonna say we're joining it on.
Now, we cannot join this parks department to the employee demographics table.
Why is that?
Well, the employee demographics table only has employee ID all the way through birth date.
There's no common column that we can tie to this parks department.
The only table that has a common column is this department ID in the salary table.
So what we need to do is actually take sal, so we'll say S-A-L, dot, and then we're gonna say D-E-P, and we'll say department ID is equal to the P-D, dot, and we need to take the department ID.
Now notice, these are not the exact same name.
They are a little bit different, but they have the same values.
One thing I forgot to mention is that in this parks department, there's no repeating.
That's why it's a reference.
Whereas in the salary, the department ID repeats several times because multiple people are in the same department.
So this reference table also usually does not have duplicates.
Just one other thing to note, but we have now tied it successfully.
Let's try to run this.
And if we come down here, go all the way to the right, we now have the department ID, 1111111, and the department ID and department name, Parks and Recreation, Healthcare, Public Works, Finance, Public Works, and Parks and Recreation.
So this worked perfectly.
So this is how you can tie multiple tables together if you have common columns between them.
Even though employee demographics has no column that's related to the parks department table, we can still tie them together through this employee salary table, because employee demographics can tie to employee salary and employee salary can tie to the parks department.
And that really is the majority of what you need to know in order to use joins well.
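Written out, the three-table join from this walkthrough might look like the sketch below. The alias dem and the first ON condition are assumptions carried over from the earlier join lesson, and the column spellings dept_id and department_id are inferred from how they are read aloud.

```sql
-- Chain two joins: demographics -> salary -> departments.
-- Assumptions: alias "dem", and the column names dept_id / department_id.
SELECT *
FROM employee_demographics AS dem
INNER JOIN employee_salary AS sal
    ON dem.employee_id = sal.employee_id
INNER JOIN parks_departments AS pd
    ON sal.dept_id = pd.department_id;  -- different names, same values
```

The second join hangs off sal rather than dem, because only the salary table carries a department ID.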
Hello, everybody.
In this lesson, we're going to be taking a look at unions in MySQL.
A union allows you to combine rows together, not columns like we were doing before with joins, where one column is next to the other.
A union allows you to combine the rows of data from separate tables or from the same table.
It's up to you.
But you do that by taking one SELECT statement and using a union to combine it with another SELECT statement.
Let's see how this actually looks.
So what we're going to do is right after the select statement, we're going to come here and say union.
Then we're going to go right below the union and we're going to do another select statement.
So we're going to copy this, place it right here.
But instead of the demographics table, just for example, we'll do the salary table.
Now if we look at the demographics table, let's say we want to take age and gender.
And let's go and take a look really quickly at the salary table.
And let's say we want to take a first name and last name.
We'll do first underscore name and last underscore name.
Now let's go ahead and run this and see what it looks like.
And let's pull this up.
So as you can see, we have age and gender, that's from the very first select statement, and that's also the column names.
But then we have all of the data for the age and gender, and then below, if we move this over a little bit, we have the last name and first name from the employee salary table.
It's just down here.
Now what I just demonstrated is that this doesn't always work for everything.
You can't just combine random data together because this is bad data.
We shouldn't have age and gender mixed with first name and last name.
Really when you're using this, you need to keep the data the same.
So for us, we should take the first, and I'll actually just copy this, the first and last name from the employee demographics as well.
And let's run this.
And now we have all the names from all of the tables.
Now, you may be thinking, where'd all the other data go?
Before we had a lot of rows, but now we only have a unique row for each one.
Well, by default, this is actually a union distinct.
And if you remember, distinct is only gonna take unique values.
So when we're doing this, union is going to remove all the duplicates and the first name and last name from salary overlaps a lot with the employee demographics table.
So when we ran this, the only one that's actually somewhat unique to one table is that in the employee salary table, we have Ron Swanson, whereas we don't have that in the employee demographics.
Now, if we wanted to show all of them without the distinct, there is something called a union all.
If we run this, now we're gonna get all of the results without removing any of the duplicates.
So if we scroll down, we're gonna have duplicates in here, but we're just showing all of the results from this table and from this table.
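As a side-by-side sketch of the two forms just described (table and column names as read out in the lesson):

```sql
-- UNION is shorthand for UNION DISTINCT: duplicate rows are removed.
SELECT first_name, last_name
FROM employee_demographics
UNION
SELECT first_name, last_name
FROM employee_salary;

-- UNION ALL keeps every row from both SELECT statements, duplicates included.
SELECT first_name, last_name
FROM employee_demographics
UNION ALL
SELECT first_name, last_name
FROM employee_salary;
```

Both SELECT statements must return the same number of columns, and the column names in the output come from the first one.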
Now that we know how to actually use a union, let's look at a use case.
So let's go right down here and let's copy this, why not?
And let's put it right down here.
Now let's say in the employee demographics, we wanted to take the first name and last name where the age is greater than 50.
And let's run this.
So there's only one person, but let's label them.
Let's add a label.
We're gonna say comma old.
So this person is old.
And if we run this, it says first name, last name, and old.
And we can even call this as label.
And if we run this, the label is old.
So Jerry Gergich is the only old person in this demographics table.
Now, why are we doing this?
Well, the Parks Department is trying to cut their budget a little bit.
They wanna identify older employees that they can push out, and they also wanna identify high paid employees who they can reduce their pay or push them out to save money.
So we just identified someone who's older who we're gonna wanna try to push out, but let's, in the same output, find people who are also highly paid. So now we can come down here, we can say union, and let's do this like this. I need to spell this right. All right, union. And let's take this. We're not going to be using this exact same query, but we actually need to pull from the salary table, so the employee salary.
So we also want the first name and last name, but let's say where their salary is greater than, let's say 70,000, because that's a lot of money.
If you're making more than 70, for sure the Parks Department is gonna try to get rid of you.
But for the label, we're gonna change it to a highly paid employee.
Now let's go ahead and run this.
So now we have Leslie Knope and Chris Traeger.
They're both labeled as highly paid employees.
Now 50, I think is just a little too low.
If I'm being completely honest, I think we need to change this and we should do a union and then add another select statement.
Let's bring this down.
I think the 50 is too low.
Let's change it to 40.
And let's add one more thing. Let's say and the gender is equal to male, and then we'll go down here and say where the gender is equal to female, because we want to separate this out. So we want to know who's the old man. Oh, that's actually old lady, this is the female one. And for up here where it's male, we'll say old man. So we have three different SELECT statements using two separate unions.
We're selecting the first name and the last name in all of them, keeping the data consistent.
And then in our third column, we're labeling it either old man, old lady, or highly paid employee.
Let's go ahead and run this.
And let's look at our output.
Now, you may notice something really quickly, that Chris Traeger and Leslie Knope are an old man and an old lady, and Leslie Knope and Chris Traeger are both highly paid employees.
So these people meet multiple criteria.
So let's actually order by, and then we'll do first underscore name comma last underscore name, because we want to order by these to see.
So let's run.
And now we can easily see that Chris Traeger is both of these.
Donna is just an old lady.
Jerry is just an old man.
And Leslie is both an old lady and a highly paid employee.
So now we can send this to whoever we need to send it to to make sure that these people get looked at first so that our job is still secure.
The job market is tough these days.
You gotta do what you gotta do.
So that is how we use Union.
And let's just take one more look at it.
There we go, so this is how we can use unions.
It's kind of a real use case.
I've done something very similar to this in my real job, but this is just an example of how you can have multiple SELECT statements all combined or combining the rows using a union.
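Put together, the labeling query from this use case could look something like the following. The exact label text and the stored gender values ('Male', 'Female') are assumptions based on how they are spoken.

```sql
-- Three SELECT statements stacked with two UNIONs, each adding its own label.
-- Assumption: gender is stored as 'Male' / 'Female'.
SELECT first_name, last_name, 'Old Man' AS label
FROM employee_demographics
WHERE age > 40 AND gender = 'Male'
UNION
SELECT first_name, last_name, 'Old Lady' AS label
FROM employee_demographics
WHERE age > 40 AND gender = 'Female'
UNION
SELECT first_name, last_name, 'Highly Paid Employee' AS label
FROM employee_salary
WHERE salary > 70000
ORDER BY first_name, last_name;  -- sorts the combined result, not just the last SELECT
```

Someone who meets two criteria shows up twice, because the label column makes the two rows different and UNION only removes exact duplicates.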
Hello everybody.
In this lesson, we're going to be taking a look at string functions.
Now string functions are built-in functions within MySQL that will help us use strings and work with strings differently.
Now we're going to look at a ton of different ones.
They all have different use cases, but I'll try to walk through some of those as we go along, but we'll look at a lot of different string functions in this lesson.
We'll start off with one that's really simple.
This one is called length.
So if we select, and then we say length, and let's say we put in, and I don't know why it's popping up like that.
Let's say we put in something like sky or skyfall or really anything.
If we run this, it's gonna give us the length of how long this string is.
So if I come down here and we say select everything from employee underscore demographics and let's add a semicolon here and let's run this one right down.
Now, what we can do is we can look at how long each person's name is.
What we can do is just take the first name, but then we'll also do the length of the first underscore name.
So if we run this, now we get Leslie, Tom, Jerry, Donna, and it gives us the length of their name.
If we wanted to, we could even order by this.
So we could do order by, and we could just do two for now, and we can order by the length from the shortest name all the way to the longest name.
Now one use case that I've used length for in my actual job was when I was working with phone numbers.
I wanted to make sure that they were exactly 10 characters long, otherwise something went wrong somewhere in the data cleaning process.
So I would go and look at the length, and I would make sure they're all 10, and if any were above 10, I would go and specifically look at those and try to clean those and fix those up.
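A short sketch of the LENGTH examples above. The alias name_length is mine, added only for readability.

```sql
SELECT LENGTH('skyfall');  -- 7

-- Length of each first name, shortest first.
-- ORDER BY 2 sorts by the second column in the SELECT list.
SELECT first_name, LENGTH(first_name) AS name_length
FROM employee_demographics
ORDER BY 2;
```

One caveat worth knowing: in MySQL, LENGTH counts bytes, while CHAR_LENGTH counts characters. For plain ASCII text like these names or a phone number made of digits, the two give the same answer.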
Now let's go on to the next one, and these next ones are pretty simple ones, at least I think they're fairly simple.
We're gonna look at upper first, and it's doing the same thing as the other one.
We'll do upper, and let's say we're gonna do sky.
If we select upper sky, it's gonna give us an all uppercase.
Or we can copy this, and we can do lower.
So now, let me add semicolons, otherwise it's going to drive me crazy.
Let's try this lower now.
It's going to do all lower, even if I make it all capital.
So if I say all capital sky, it's going to make it all lower.
So if we come back up, let's copy this.
And instead of doing the length, now we'll do upper.
Go ahead and select this. So we have Leslie, and then we have the upper first name, so all uppercase Leslie. Now this is actually really good. This is really helpful, especially with standardization, is what I've found a great use case for it, because sometimes it'll be all capital TOM and sometimes it'll be put in as T, lowercase o-m, and just making them all uppercase or all lowercase can help correct those really simple standardization issues within a single column.
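For reference, the UPPER and LOWER calls from this part of the lesson, as a minimal sketch:

```sql
SELECT UPPER('sky');  -- 'SKY'
SELECT LOWER('SKY');  -- 'sky'

-- Standardize the casing of a whole column.
SELECT first_name, UPPER(first_name)
FROM employee_demographics;
```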
The next one that we're gonna look at is trim.
Now, there's multiple trims.
We have trim, left trim, and right trim.
Trim is basically going to take the white space on the front or the end and get rid of it, which is really, really helpful.
So what we're gonna do is we're gonna come right here.
I'm gonna say select, and we'll start off with trim.
And let me add a semicolon every time.
So then we'll do trim, and for our actual string, we'll do something a little bit odd.
We'll do some spaces, and then we'll do sky, and then we'll add some spaces.
Now let's run this and add our semicolon.
That's going to be the end of me in this lesson, just adding semicolons.
Now it fixes it completely.
Now what if we don't add sky at all?
We'll just keep it like this.
Well, you can see that there's spaces before and there's spaces after, but that's what trim does.
Trim gets rid of the leading and the trailing white spaces.
Now, if we come up here and we just do the left trim, it's only gonna remove from the left-hand side.
So we're only getting rid of the left-hand side white spaces.
This right-hand side, as you can see, is really long.
It's still there.
And if we do RTRIM, we go ahead and run this one, it gets rid of the white space on the right side, but it doesn't get rid of the white space on the left side.
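The three trim variants, sketched with the same padded string. The number of spaces is arbitrary.

```sql
SELECT TRIM('     sky     ');   -- 'sky'         (both sides removed)
SELECT LTRIM('     sky     ');  -- 'sky     '    (left side only)
SELECT RTRIM('     sky     ');  -- '     sky'    (right side only)
```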
Now let's keep going.
We have a lot to cover still.
We're gonna move on to what I think is probably my most favorite string function, if I'm allowed to have a favorite string function, and that's substring, but I'm gonna kinda work us into substring a little bit by looking at two smaller functions, which is left and right.
So, let's select everything, and we'll do that from the employee demographics again, dot or semicolon.
Now, I'm gonna run this.
Now, I wanna get the first name, and I'm gonna do left of the first underscore name, just like this.
Now, when you're using this, there's actually gonna be an error.
Let's see if I highlight over this if it'll tell me what the error is.
It says the parentheses is not a valid position.
They're expecting something else.
And basically what they're telling us is that this is not how it should be written.
We're looking for a different value, and that different value is actually a number.
We're gonna do comma, and let's do four.
That's what it was looking for.
It didn't want this at the end, it needed this comma four.
And what we're actually specifying is how many characters from the left hand side do we want to select.
So we're selecting the first name and we're going from the left, four characters.
Let's go ahead and run this.
And so we have Leslie, Tom, April, all the way down.
You can see that there's only four characters in each one.
So someone like Chris, the S is no longer gonna be there because we're only looking at the first four characters.
Now we can do the exact same thing.
And let's actually copy this down here.
So we'll come, we'll go like this.
I make this a little more professional.
And we'll do right.
So now we'll do right.
If we do the right four, it's gonna go from the right-hand side of the string and go left four. So we're looking at the four rightmost characters.
Now this can be useful in certain instances, but if I'm being honest, I don't use these that much.
For the most part, I'm pretty addicted to using substrings.
I love substrings, I think they're fantastic, and let's look at substrings like this.
So a substring is going to allow us to do a few different things.
Let's do first underscore name.
The second thing that we put within this function is the position that we want to start at.
So let's say we want to start at the third position, and then we specify how many characters we want to go.
So with this, we specified four, but let's just do two.
So now we're going to the third position and we're going over to the right two characters.
Let's go ahead and run this.
So with Leslie, we get SL.
So we go one, two, and three.
We start at the third position and then we take two characters, the S and the L.
I have found this one to be extremely, extremely useful.
Let's take this for example.
Let's do comma.
Let's do birth underscore date.
And let's run this.
I'm keeping everything in here, although it might be a bit much.
Let's say we have this birthday and this middle column is the month.
And we're running some, you know, query.
We want to find the month that everyone is born.
So we can do that very easily using substring, and we wouldn't have been able to do this very easily using left or right.
So now we're going to take this birth date, and we're going to use this substring, and we want to select these middle characters.
So what we need to do, since it's all standardized, we do 1, 2, 3, 4, 5, 6.
We start at position 6, and we want to select 1 and 2.
Let's go ahead and run this.
And now we've pulled out all of the months.
So we can say as birth underscore month.
And now we can save that, put it into a temp table, add it as a new column in our table, whatever we wanna do.
But now we have this information that we desperately, desperately wanted to know.
So that is left, right, and substring.
Again, substring is, it's fantastic.
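Here is roughly what the finished LEFT, RIGHT, and SUBSTRING query looks like, assuming birth_date is a DATE column displayed as YYYY-MM-DD (which is what makes position 6 the start of the month).

```sql
SELECT first_name,
       LEFT(first_name, 4),          -- first four characters
       RIGHT(first_name, 4),         -- last four characters
       SUBSTRING(first_name, 3, 2),  -- start at position 3, take 2 characters
       birth_date,
       SUBSTRING(birth_date, 6, 2) AS birth_month  -- the MM in YYYY-MM-DD
FROM employee_demographics;
```

Positions in MySQL string functions start at 1, not 0. For a true DATE column, MONTH(birth_date) is another way to get the month, returned as a number rather than a two-character string.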
Now let's keep going.
The next thing that we're gonna take a look at is replace.
Now replace will replace specific characters with a different character that you want.
So let's actually copy all this right here, because I don't want to keep writing this out, and we'll say select everything.
Now what we're gonna do is we're gonna take the first underscore name and then we're gonna say replace and then we'll also do the first underscore name.
But we can specify what we want to replace and then what we want to replace it with.
So we have two more parameters that we need to put in this function.
So let's say A and let's replace it with a Z.
Let's just see what that does.
Let's go ahead and run this.
And so now when we see the letter A, and we are specifying a lowercase A, like in Mark, that is replaced with a Z.
So that's really all replace does, specifies what you want to replace and then what you're going to replace it with.
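As a sketch, the REPLACE call takes the string, what to look for, and what to put in its place:

```sql
-- Every lowercase 'a' in first_name becomes 'z'.
SELECT first_name, REPLACE(first_name, 'a', 'z')
FROM employee_demographics;
```

REPLACE matches case-sensitively in MySQL, which is why an uppercase A at the start of a name is left alone.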
Now let's take a look at the next one.
And we're gonna take a look at a function called locate.
So if I say select and let's do locate, I'm gonna give it a string.
I'll say Alexander, that's my name.
And I'm gonna specify what I'm looking for.
So let's close this parentheses.
The string that we're actually looking for comes first.
So what we're gonna do is I'm looking for the letter X in my name.
So we'll do X and Alexander.
Let's go ahead and run this.
And it tells us that it is in position four.
So we have one, two, three, and four.
That's where our position is.
That's where it locates that sequence that we're looking for.
Now, if we pull this down here, place this right here, and we'll change this locate.
Now, let's say we're still looking at the first name, but we wanna locate people that have an A-N like this in their name.
Let's go ahead and run this.
and we get zeros for everybody except for Ann and Andy.
So this might be something where we put it into a CTE or a temp table, then we can filter down based off of these results to where it only equals one.
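The two LOCATE calls from above, sketched out. Note the argument order: the string being searched for comes first.

```sql
SELECT LOCATE('x', 'Alexander');  -- 4

-- Position of 'An' within each first name; 0 means it was not found.
SELECT first_name, LOCATE('An', first_name)
FROM employee_demographics;
```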
Now the last one that we're gonna take a look at, and let's go right here, we're gonna do first name, last underscore name.
Now this one is super, super useful, because what we can do is have a concatenation of multiple columns. So let's go down right here. So we have first name and last name, but if we come down and we say concat, we can then combine these columns into one single column. So we'll do concat, and we'll do first underscore name, and then comma, last underscore name. And if we run this, it's going to be Leslie and Knope combined into LeslieKnope.
Now this doesn't look perfect, right?
We don't want it to look like that.
All we have to do is come in here and we could do a little space.
So we'll add a space in there.
And if we run that, now we have Leslie Knope, and we can call this as full underscore name.
And this is something that I've done a million times in my real job where there's multiple columns.
We want to create one column out of it or take two columns and create one column.
Happens all the time.
So this concat is really, really helpful to combine those columns really quickly.
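A minimal sketch of the CONCAT query with the space added between the two columns:

```sql
SELECT first_name,
       last_name,
       CONCAT(first_name, ' ', last_name) AS full_name
FROM employee_demographics;
```

CONCAT accepts any number of arguments, so the space is simply passed as its own string in the middle.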
Hello, everybody.
In this lesson, we're going to be taking a look at case statements in MySQL.
A case statement allows you to add logic in your select statement, sort of like an if else statement in almost all other programming languages or even things like Excel.
Let's see how this actually works.
So let's bring this down and let's take this employee demographics table and let's take the first name and let's take the last name and let's add a case statement.
How we need to do this is we have to say case.
So that's gonna signify that we're starting a case statement.
And then I'm gonna go over here and say tab.
So this is where our logic comes into play.
So I'm gonna say when the age, let's say, is less than or equal to 30, then. So I'm saying if the age is less than or equal to 30, then what's going to happen? We'll just keep it really simple for now. We'll just say that this person is young. And then if we want to end the case statement, we'll come down here and say end. So this is a complete case statement. Let's go ahead and run it, and let's take a look at the output.
So we have the first name, we have the last name, and then we have this case statement right here.
And if their age is less than or equal to 30, they're young.
Let's actually add the age right here just so we can visually see that as well.
So we have the age.
So this person is the only person who's under or equal to the age of 30, that's April, so she has a label of young.
The great thing about case statements is you can add multiple when statements.
So we can come down here and say when, and then we can do something like when age, and maybe we'll say between.
So I don't know if in previous lessons we've looked at between, but between just says between this number and this number.
So we'll say between 31 and 50.
If they're between 31 and 50, well, good night.
That person is old.
So we're going to have it just like this.
We're going to run it.
And now we have a lot of people who are old.
These are all people between the ages of 31 and 50.
But we still have more people outside of the age of 50 or older than 50.
So we could do when the age, and now we can say greater than or equal to 50.
And we're going to say then.
And then we're going to say on death's door.
Because good night.
If you're over 50, my parents are going to love me for this one.
So let's go ahead and run this.
And then if we look at this, we have on death's door right there. Now this is huge, this is massive. So let's actually name this, and we'll just say as at the end of end. So right after end, we'll say as age bracket, and let's run this, and this looks a lot better.
So now we have this age bracket just signifying kind of where people are at and most people are quite old or Jerry, you know, can't catch a break that guy.
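Assembled, the age bracket query might read as follows. The label strings and the alias age_bracket follow the spoken description; their exact capitalization is an assumption.

```sql
SELECT first_name,
       last_name,
       age,
       CASE
           WHEN age <= 30 THEN 'Young'
           WHEN age BETWEEN 31 AND 50 THEN 'Old'
           WHEN age >= 50 THEN 'On Death''s Door'  -- '' escapes the apostrophe
       END AS age_bracket
FROM employee_demographics;
```

CASE returns the result of the first WHEN that matches, so someone who is exactly 50 is labeled 'Old' even though the third condition also covers 50.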
Now let's go down and let's take a look at a different table.
So let's select everything.
We'll do from employee underscore salary.
Now that we have our employee salary table, here is the scenario that we are given.
The Pawnee Council sent out a memo of their bonus and pay increase for end of year.
And we need to follow it and determine people's end of year salary or the salary going into the new year.
And if they got a bonus, how much was it?
So the first thing we need to do is we need to get the pay increase and bonus, and their pay increases look like this.
So if they made less than 50,000, then that equals a 5% raise, very generous.
And if they made greater than 50,000, that equals a 7% raise, very, very generous.
Lastly, if they work in the finance department, that equals a 10% bonus, just cash that goes into their bank account.
Very, very generous, but only the finance department gets it.
So these are the guidelines that the Pawnee Council sent out, and it is our job to determine and figure out those pay increases as well as the bonuses.
So let's come right down here.
We're going to have our salary employee.
I actually want to be able to see these.
Let me pull this up just a touch.
There we go.
So we want to be able to write this out.
So first thing we should do is just select the columns that we need.
First name, last name, probably salary as well.
And now what we can do is determine this first one, which is if they make less than 50,000, they get a 5% raise.
So let's say case, and I'll also add end in here.
And we're gonna say when their salary is less than 50,000, what's gonna happen? Then we say then salary, so we're taking their initial salary, and we're saying plus, then we're gonna do salary times 0.05. And if we run this, should work.
Let's pull this up really quickly.
So April Ludgate, she made under 50,000.
So she got a raise and her new salary is 26,250.
We can actually call that, we'll say as pay underscore, actually let's do new underscore salary.
That's our new salary.
And let's run this.
So the new salary is $26,250.
Andy Dwyer is now making $21,000.
Now this calculation, you can do it different ways.
We can do it exactly like this, or we can just do times 1.05.
Should be the exact same thing, just however you'd like to write it out.
It's just adding it or multiplying it by this.
So let's take this.
And now we're gonna say when it is greater than 50,000, so let's say greater than 50,000, they get a 1.07.
So this is the 7% increase.
This is a 5% increase, this is a 7% increase.
And let's run this.
And let's put this up here.
So now, if they made greater, so 50,000, 75,000, they got a 7% increase.
Now, unfortunately, we did not make the rules.
The Pawnee Council did.
And the people who made exactly 50,000, unfortunately, were not part of those brackets.
And that just wasn't up to us.
We couldn't control that.
So unfortunately, Tom Haverford and Jerry Gergich just didn't get raises this year.
And that's not our fault, okay?
That's not our fault.
Now, the next thing that we need to do is determine the bonuses.
Now, let's come right back up here really quickly.
Let's just copy this.
Because what we need to determine is how we know that somebody is in the finance department.
Because if they're in the finance department, that means they get a 10% bonus.
That's really important.
Now, it's not in the employee demographics.
Don't have anything about the department.
But if we look in the salary and we run this, we do have the department ID.
Now let's open up and let's pull this up right here.
We'll look at the parks department.
And in the parks department, here we go, the finance is department ID of six.
So if we're looking at the salary, there's only one person who's in department ID equal to six.
So what we can do is another case statement.
We can say comma, we'll do case and end, and we'll do another one.
We're gonna say when dept ID, so when the department ID is equal to six, then we're going to give them a bonus.
We're going to say salary times 0.10.
And we'll call this as bonus.
Let's go ahead and run this.
And let's pull it up.
So he gets a $7,000 bonus this year.
That's Ben Wyatt, uh, because he was part of the finance department that just did an exceptional job this year, apparently according to the Pawnee council.
So that is how case statements work.
They're really powerful, really useful.
I honestly use them quite often, and they're just a way to really add some logic and some, you know, labeling, or even do calculations like we did right here with the salary.
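The pay increase and bonus rules can be sketched as two CASE expressions in one query. The column name dept_id is inferred from how it is read aloud, and the multiplied form (salary * 1.05) is used in place of salary + salary * 0.05, which the lesson notes is equivalent.

```sql
SELECT first_name,
       last_name,
       salary,
       CASE
           WHEN salary < 50000 THEN salary * 1.05  -- 5% raise
           WHEN salary > 50000 THEN salary * 1.07  -- 7% raise
       END AS new_salary,
       CASE
           WHEN dept_id = 6 THEN salary * 0.10     -- 10% bonus, Finance only
       END AS bonus
FROM employee_salary;
```

Neither CASE has an ELSE, so rows that match no WHEN come back as NULL: anyone at exactly 50,000 has a NULL new_salary, and everyone outside Finance has a NULL bonus. Adding ELSE salary to the first CASE would be one way to carry the unchanged salary through instead.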
Hello, everybody.
In this lesson, we're going to be taking a look at subqueries in MySQL.
A subquery is basically just a query within another query.
We can do this in a few different ways, and I'm going to try to show you a lot of the different variations within this lesson.
The first way that we're gonna use a subquery is in the WHERE clause.
Then we'll take a look at the SELECT and the FROM clause also.
Let's take this demographics table that we have down here.
What if we only wanted to select the employees who worked in the actual Parks and Rec department?
Well, we could do that if we had a few joins.
We have this salary table, and one actually represents that they work for the Parks and Rec.
If we come over here and we open this up, we can see that Parks and Rec is the department ID of one.
So we do have that option.
We could just join these two tables together, but sometimes we don't wanna do that, and we'll use a subquery.
Let's see how it works in the WHERE clause.
So let's go ahead and get rid of this.
So what we're gonna do is we're gonna say, select everything from employee demographics where, and now we wanna pull, because this is the salary table, we wanna pull employee IDs where the department ID is equal to one.
But remember, we're querying off of this table.
So let's actually pull this up.
This is what we're working with.
So we wanna say where the employee underscore ID, that's referencing this column in the demographics table, is in. What we're going to do is we're going to do a parenthesis here, and we can even come down and put a parenthesis down here. So what we're going to do now is write our query, which is our subquery, and this is our outer query. So now we're going to write an entirely other query within this. We'll say select, and now we're going to say employee underscore ID, and let's just bring this over.
I usually have it something like this, and I'm gonna try to bring this down a little bit.
So select everything, and then we'll do from, and then instead of employee demographics, we'll do employee salary.
And let's just format this a little better.
So select the employee ID from employee salary and remember we wanted to do where the department ID is equal to one.
Now let's bring this back up and this is what the query is going to look like.
Now just by itself let's run this subquery or this inner query.
When we run this, it's gonna create this list of just employee IDs where the department ID is equal to one.
So when we say where the employee ID from the employee demographics table is in, it's gonna try to match those employee IDs to this list of employee IDs.
So just remember one, two, three, four, five, six, and 12.
Let's go ahead and run this entire query.
Now we have one, three, four, five, six, 12.
If you remember from previous lessons, the two is Ron Swanson and he's only in the salary table.
So since we're doing just the employee demographics table, he's not in here.
So what we're doing is we're selecting everything from the employee demographics where the employee ID in this table
matches or is in the select employee ID from the salary table where the department ID is equal to one.
In essence, this is what a subquery is.
It's a query within a query.
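As a rough sketch of the query just described, here is the same subquery in a WHERE clause run from Python. This uses Python's built-in sqlite3 module as a stand-in for MySQL so it can run anywhere. The handful of sample rows and the column name dept_id are assumptions for illustration, not the full dataset from the video.

```python
import sqlite3

# SQLite (Python standard library) stands in for MySQL; rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_demographics (employee_id INT, first_name TEXT);
CREATE TABLE employee_salary (employee_id INT, first_name TEXT, dept_id INT);
INSERT INTO employee_demographics VALUES
  (1, 'Leslie'), (3, 'Tom'), (4, 'April'), (7, 'Ann');
INSERT INTO employee_salary VALUES
  (1, 'Leslie', 1), (2, 'Ron', 1), (3, 'Tom', 1), (4, 'April', 1), (7, 'Ann', 4);
""")

# The inner query builds the list of employee IDs in department 1.
# The outer query keeps only demographics rows whose ID is in that list.
rows = conn.execute("""
SELECT *
FROM employee_demographics
WHERE employee_id IN (
    SELECT employee_id
    FROM employee_salary
    WHERE dept_id = 1
)
ORDER BY employee_id
""").fetchall()

# Ron (ID 2) is in the subquery's list but has no demographics row,
# so he does not appear in the output.
print(rows)
```

The subquery returns exactly one column, which is what IN expects to compare against.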
Now what would happen if we have the employee ID but we also wanted to say the department ID?
Because we just wanted to view this.
Let's go ahead and try to run this.
We are gonna get no output, and we're gonna get an error that says operand should contain one column.
The operand refers to this entire subquery right here, because IN is an operator.
So this is our operand, and we're returning two columns in here, which it's saying we cannot do. We have to only have one.
So now if we take the department ID back out and run this,
it works perfectly well. Let's bring that down. Now, we can also use the subquery in a select statement, so let's take a look at that next. Let's go down here and let's say we want to do select everything from employee underscore salary, and let me spell that right.
Let's say we wanna look at all the salaries, just like how we have it now, but in a column next to it, we also wanna compare it to the average salary for everyone.
So we'll be able to see whether somebody's salary is above average or below average.
So what we would potentially try to do is something like first underscore name, salary, and the average of salary.
And we try to run this.
And of course we're gonna get an error.
It's gonna basically tell us that we need to group by if we're doing this.
So let's go back down and let's actually add that group by.
And we'll say group by first name and salary.
And we'll look at this output.
And this is not looking good at all.
It's just looking at the average salary for each unique row, which is Leslie 75,000.
So the average is 75,000.
This is not what we're looking for.
This is not what we want.
Here's what we really do want.
We wanna just take the average salary of this entire column, regardless of group by or anything else.
So let's get rid of this and let's see how we can do that.
So let's come right down here.
We're gonna add a parenthesis, because this is our subquery, and say select the average salary, and then we're gonna say from the employee salary table.
Just like that.
Now if we run this, we should get the exact output we're looking for.
So the average salary is $57,250 and we have our salary right here.
So we can compare really quickly just like that.
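Here is a small sketch of that subquery in the SELECT list, again using Python's sqlite3 as a stand-in for MySQL. The three sample rows are made up for illustration, so the average here is not the 57,250 from the video.

```python
import sqlite3

# SQLite stands in for MySQL; the three rows below are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_salary (first_name TEXT, salary INT);
INSERT INTO employee_salary VALUES
  ('Leslie', 75000), ('April', 25000), ('Tom', 50000);
""")

# The parenthesized subquery is evaluated over the whole table,
# so every row gets the same overall average next to its own salary.
rows = conn.execute("""
SELECT first_name,
       salary,
       (SELECT AVG(salary) FROM employee_salary) AS avg_salary
FROM employee_salary
""").fetchall()

for first_name, salary, avg_salary in rows:
    print(first_name, salary, avg_salary)
```

Because the average comes from its own query, no GROUP BY is needed on the outer query.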
And we can also use a subquery in the from statement.
So let's go down here and let's say select everything from employee underscore demographics.
I'll have it autocomplete for me. So we have the employee demographics table. Now let's create a group by based off the gender column and add some aggregate functions, and I'm going to show you how you can use this as a subquery. Let's go up here and say gender, and then we'll go ahead and add our group by, so we'll say group by gender as well. Now let's add a few things. We'll do average age,
and then we can do, let's just do all of them based off the age.
We'll do max of age, min of age, and count of age.
When I try to write fast, it doesn't always go right. So we have this. Let's run this, and this is what our output is going to look like. Now what if we wanted to get the average of the oldest ages or the average of the youngest ages, or, you know, see what the average count is for males and females? Well, we can't do that given this table, but
let's do something right here.
Select everything, and then we're gonna say from, and in our from statement, we're gonna have a parentheses, we're gonna paste our select statement, and then close the parentheses.
So we're gonna select everything from this output that is right down here.
So if we run just this,
we're going to get an error. I forgot this was going to happen, but it says every derived table must have its own alias.
So you have to name a table.
I forgot it does that.
All we have to do to fix this is just name it.
So we'll say as, and we'll say aggregated table.
We'll just call it aggregated table.
So let's run this and we get the exact same output, but here's the neat thing is we can now select, we can do gender.
And these are actually the column names now.
So I can do the average of this column right here.
But I can't do it just like this because it's gonna give us an error.
And I'll show you why in just a second.
It says unknown column age in field list.
So what it's saying is we're trying to perform an aggregate function on an aggregation of an age column, but we don't have an age column in our table right here.
Let me run this again.
We have a column named this exact thing.
So what we actually need to do is do this backtick and backtick.
This is the actual name of the column.
It's not an aggregation anymore.
The backtick on my laptop is right above the tab on the far left-hand side, right under the escape.
That's where mine is.
So these backticks, it's not a quote like this, it's a backtick.
So you just need to find that on your keyboard.
But now if we run this, it looks like we encountered another error.
It says in aggregated query without group by. That's right, now we need a group by, so we need a group by gender. Sometimes you gotta figure this out on the fly. And it should work. There we go. So now we can perform aggregations on this table. Now this doesn't actually work perfectly because we're still grouping by female and male, but let's get rid of this for a second and we'll get rid of this group by entirely.
And if we run this, we're now looking at the averages of this column right here, max age.
Now, when you're doing something like this, it's actually really smart to rename these.
We'll say as average underscore age.
We'll say as max underscore age, and it makes it so much easier.
You don't have to do these back ticks anymore.
And as min underscore age, and so on and so forth.
And I would probably format this better.
and stuff like that.
We don't have to go through everything, right?
I'm just kind of giving you an example.
But then when we're using this table, these columns are actually named this.
So I don't have to do these back ticks anymore.
I can just take this whole thing.
Oops, let me get rid of that backtick.
Now I can just take this column because this is the column name.
So let's go ahead and run this.
and it's still gonna work perfectly.
So this one's pretty cool because you're basically creating this kind of like a temp table.
You're just creating your own little output.
Then you can query off of it and you can do more advanced calculations this way.
It's actually really useful.
But there are better ways to do something like this, like a CTE or a temp table that we'll look at in the advanced series.
But this is at least how you can do it and you can actually try it out using subqueries.
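To recap the FROM-clause version, here is a minimal sketch using Python's sqlite3 in place of MySQL. The four sample rows and the alias names (agg_table, max_age, and so on) are assumptions chosen for illustration.

```python
import sqlite3

# SQLite stands in for MySQL; the ages below are illustrative sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_demographics (gender TEXT, age INT);
INSERT INTO employee_demographics VALUES
  ('Female', 44), ('Female', 29), ('Male', 36), ('Male', 61);
""")

# The inner query produces one row per gender. Aliasing its columns
# (max_age, etc.) avoids needing backticks in the outer query, and the
# derived table itself gets a name (agg_table), which MySQL requires.
avg_of_max = conn.execute("""
SELECT AVG(max_age)
FROM (
    SELECT gender,
           AVG(age)   AS avg_age,
           MAX(age)   AS max_age,
           MIN(age)   AS min_age,
           COUNT(age) AS count_age
    FROM employee_demographics
    GROUP BY gender
) AS agg_table
""").fetchone()[0]

print(avg_of_max)  # average of the two per-gender maximums (44 and 61)
```

The outer query treats the grouped output like any other table, which is why aggregating an already aggregated column works here.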
So hello, everybody.
In this lesson, we're going to be taking a look at window functions.
Now window functions are really powerful and are somewhat like a group by, except they don't roll everything up into one row when grouping. Window functions allow us to look at a partition or a group, but the rows each keep their own unique row in the output.
We're also gonna look at things like row numbers, rank and dense rank at the end of this lesson.
So before we jump into writing a window function and seeing how the syntax works, let's actually write out a group by and then we'll compare the two when we actually do write the window function.
Let's say we wanna take this demographics table and we wanna take this gender and compare it to the actual salaries.
So what we actually need to do, we need to say join and we're gonna join on the employee salary.
Let's go like this.
We'll get rid of all of this, and we'll do salary,
and we're gonna say on, and let's do DEM and SAL for the aliases.
We'll say DEM.employee underscore ID is equal to SAL.employee underscore ID.
Now we're gonna come up here and we're gonna say gender comma and we wanna look at the average salary and we need to get rid of this right here and we need to come down to the bottom and say group by gender.
Now, let's go ahead and run this query, see if it works, and it did.
So we have our gender and we have our average salary from our salary table, and we can rename this as average, and we'll do average underscore salary, just like that.
So this is how group by works.
It rolls everything up into one row.
Now let's try doing something pretty similar, except we're gonna use a window function.
Let's come right down here, and let's paste this, and let's start writing out our window function.
Now we don't have to use the group by, we're gonna go ahead and get rid of that.
And right here for gender, we can keep that the exact same.
All we're really gonna change is this part right here.
We're gonna say average salary, and that is part of creating a window function.
Typically with a straightforward window function, all we have to put is over with a closed parentheses.
This is gonna say we're looking at the average salary over, and normally in here you'll specify something and we'll get to that in a little bit, but we're just gonna look at an average salary over everything.
So let's go ahead and run this output.
So this is gonna look a little bit different, right?
So the male and female all have their own individual rows, which is not the same as group by.
And this average salary is looking at the average salary of everybody.
We're not breaking it out by the gender like we did up here.
Here we rolled it up.
Now we're looking at the average salary for the entire column.
Now what we can do is actually partition by.
Now partition by is going to separate it out kind of like grouping it.
So let's say partition by, and we'll say gender.
So just like when we did the group by, the group by rolled everything up into one row, this is not gonna roll everything up, but it is going to perform this calculation based off of the different genders, the unique values in this column.
Let's go ahead and run this.
And if you'll notice, the female is 53,750, the male, 57,428.
Now let's go compare these.
I'm gonna run this and this query.
Let's run this.
So if we look at our group by, it's the exact same numbers, except we have it on their own individual rows.
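The comparison above can be sketched side by side. This uses Python's sqlite3 as a stand-in for MySQL (window functions need SQLite 3.25 or newer), and the four sample employees are illustrative assumptions, so the averages differ from the ones in the video.

```python
import sqlite3

# SQLite stands in for MySQL; the four employees are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_demographics (employee_id INT, first_name TEXT, gender TEXT);
CREATE TABLE employee_salary (employee_id INT, salary INT);
INSERT INTO employee_demographics VALUES
  (1, 'Leslie', 'Female'), (3, 'Tom', 'Male'),
  (4, 'April', 'Female'), (8, 'Chris', 'Male');
INSERT INTO employee_salary VALUES (1, 75000), (3, 50000), (4, 25000), (8, 90000);
""")

# GROUP BY rolls each gender up into a single row.
grouped = conn.execute("""
SELECT gender, AVG(salary) AS avg_salary
FROM employee_demographics dem
JOIN employee_salary sal ON dem.employee_id = sal.employee_id
GROUP BY gender
ORDER BY gender
""").fetchall()

# The window function computes the same per-gender average,
# but every employee keeps their own row.
windowed = conn.execute("""
SELECT dem.first_name, gender,
       AVG(salary) OVER (PARTITION BY gender) AS avg_salary
FROM employee_demographics dem
JOIN employee_salary sal ON dem.employee_id = sal.employee_id
ORDER BY dem.employee_id
""").fetchall()

print(grouped)   # two rows, one per gender
print(windowed)  # four rows, one per employee
```

Adding first_name to the windowed query does not change the averages, whereas adding it to the GROUP BY query would change what is being grouped.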
Now, why would we want this?
Well, let's say we want additional information.
So let's just look at this one for now.
So in this one, let's say we wanted to add additional things like the first name.
So we'll do dem.first underscore name.
We can do last or dem.last underscore name.
So we can add other information and it doesn't affect this column at all because we're using a window function.
If we try to add these exact things, I'm going to go up here and do it.
If we try to add these exact things to this, let's see if, yeah, that works.
And then we also have to group by this.
If we run this query now, it's gonna be completely different because we're using a group by.
We're grouping by the first name, the last name, and the gender.
We're breaking everything out based off of the unique values in these columns, whereas down here, it's completely independent
of what's going on in these other columns.
All we're doing is we're doing a window function just based off of that column.
So I think that's pretty amazing.
And there's a lot of additional functionality that we can do with these window functions.
And we're gonna take a look at a lot of those things in just a little bit.
Let's try another example really quickly.
Let's literally just copy this, paste it down here.
And all we're gonna do is we're gonna change this to sum.
So now, instead of the average salary, we're looking at the sum of salaries and we're still partitioning by the gender.
Let's go ahead and run this and let's pull this up.
So all the men together make $402,000.
All the females make $215,000.
Now what we're about to do is something called a rolling total.
If you've never heard of a rolling total, a rolling total is super cool and can be done within MySQL.
A rolling total is gonna start at a specific value and add on values from subsequent rows based off of your partition.
So all we have to do is add an order by, and we're going to order by, let's say the employee underscore ID.
Let's go ahead and take a look at this.
And it looks like the employee ID is ambiguous.
I had a feeling.
So I just need to say DEM dot employee ID.
Let's try this one.
So now we have something called a rolling total.
I'm gonna actually name it as rolling underscore total because this is super cool that window functions can do this and this is something that a lot of people in like finance do.
I did it myself when I worked in healthcare and it partitions based off the female and you can't see the employee ID but there's an employee ID that we're kind of ordering on in the background.
Now what it's doing is it's starting with Leslie Knope and she made 75,000.
Then the next person, April, she made 25,000, which brings the total to 100,000.
And just to actually see this better, I'm going to add salary.
And so Leslie Knope had 75,000.
Then we're adding this 25,000 to the 75,000, and we get 100,000.
Then we're adding the 60,000 to get 160,000.
Then we're adding 55,000 to get 215,000.
So every single time, we're adding this salary to the already existing total, all the way up to our grand total, which was 215,000.
The exact same thing happens with the males.
So we start with 50,000, then we add 50, then we add 90, then we add 70, all the way up to 402,000.
Now you can do this in a lot of different configurations on a lot of different columns.
But in essence, this is exactly what a rolling total is.
That's how it works.
And we were able to partition based off of this column. We don't have to use partition by, though. We could do this completely regardless of the partition.
But I thought it was interesting to at least break it out by female versus male.
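Here is a minimal sketch of that rolling total, run through Python's sqlite3 as a stand-in for MySQL. The female salaries mirror the numbers walked through above, and only two of the male employees are included to keep the sample short. Treat the rows as illustrative.

```python
import sqlite3

# SQLite stands in for MySQL; sample rows follow the walkthrough above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_demographics (employee_id INT, first_name TEXT, gender TEXT);
CREATE TABLE employee_salary (employee_id INT, salary INT);
INSERT INTO employee_demographics VALUES
  (1, 'Leslie', 'Female'), (3, 'Tom', 'Male'), (4, 'April', 'Female'),
  (5, 'Jerry', 'Male'), (6, 'Donna', 'Female'), (7, 'Ann', 'Female');
INSERT INTO employee_salary VALUES
  (1, 75000), (3, 50000), (4, 25000), (5, 50000), (6, 60000), (7, 55000);
""")

# Adding ORDER BY inside OVER turns the per-gender sum into a running sum:
# each row's total includes its own salary plus all earlier rows
# (by employee_id) in the same gender partition.
rows = conn.execute("""
SELECT dem.first_name, gender, salary,
       SUM(salary) OVER (
           PARTITION BY gender
           ORDER BY dem.employee_id
       ) AS rolling_total
FROM employee_demographics dem
JOIN employee_salary sal ON dem.employee_id = sal.employee_id
ORDER BY gender, dem.employee_id
""").fetchall()

for row in rows:
    print(row)
```

Note that employee_id is qualified as dem.employee_id because the column exists in both tables, which is the ambiguity error mentioned above.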
So now that we know how to use a window function, let's look at some special things that you can really only do
with window functions or window like functions.
So we're going to bring this down.
And what we're going to do is get rid of this entire thing.
And we're going to look at something called row number, then we're gonna look at rank, and then we'll look at dense rank.
So let's look at row underscore number.
And this is just like an aggregate function, like you're doing the average age or average salary or something like that.
This is what we're doing.
We're doing a row number.
Now we're going to do this over and we'll just do everything for right now.
So let's go ahead and run this and just see what it looks like.
Let's bring it up.
And what we're doing is we're saying, okay, we have first name, last name, gender salary.
That's all great.
But then we get to row number and we're doing a row number based off of everything.
It doesn't matter what it is.
So we're starting at one, which is the very first row.
And we go all the way down to the bottom, just like an employee ID.
So let's actually add that.
Let's do DEM dot employee underscore ID, just like this.
So we have this 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11.
Now, if you remember on this table, we are missing Ron Swanson.
So it kind of skips that.
But it's basically like an employee ID.
We're kind of giving it its own unique value.
And these row numbers are not going to repeat themselves if you do it like this.
Now, they can repeat themselves if we do a partition.
And let's do a partition on the gender again because we know how to do that one.
We'll do partition by, let me spell that right, partition by the gender. Now we're going to add a row number, but again it's broken out or partitioned by gender. Let's look at this.
Now it goes for the females, one, two, three, four.
Then for the males it restarts, one, two, three, four, five, six, seven.
Now this is just in a random order based off how the data was stored in the table itself.
Now what if we wanted to kind of rank these based off of the highest salary first down to the lowest salary?
You nailed it, we just add an order by.
We'll order by salary.
And if you want to do it from highest to lowest, giving the highest salary the number one and the lowest salary, you know, later down, we'll do descending.
And let's run this.
And you'll see that for female, we're still partitioning by gender.
For female, the highest salary is one, next is two, three, and then four.
Then for males, the highest salary is one, all the way down to seven.
So that's what row number does.
It just gives a row number based off of whatever you're partitioning by or ordering by in your window function.
Now, let's go over here and add a comma, and we're gonna add, and let's go down just a hair,
let's add rank. So I want to do rank, and we'll do our parentheses. Now rank is going to give it more of an official rank, and let's see how this works. So we'll do rank, and we'll do over, partition by gender, order by salary descending, the exact same thing. And while we're here, I'm going to rename these. I'm going to say as row underscore num, and we'll call this one rank underscore num.
Let's go ahead and run this, and it looks very, very similar except for one small thing, this right here. So when we're using the row number, whatever we are partitioning by, it's not going to have duplicate numbers within that partition. It just won't. So even if there are two salaries of 50,000 right here, it's just going to automatically assign the numbers based off of something that is running in the background, whether it's the order of how the data is stored in the table or some other order by that you are using on the table.
Now, rank is a little bit different, because rank is going to number the rows just like row number did, except when it encounters a duplicate based off of the order by, which is the salary, it's going to assign it the same number.
So this is five and five.
What's unique about rank is that the next number is not gonna be the next number numerically, it's gonna be the next number positionally.
So this is one, two, three, four, five,
This is kind of like a six.
And then it goes to seven.
So it skips number six.
Now there's another one.
Let's copy this rank.
There's another type of rank called dense rank.
And we'll do dense underscore rank.
So we'll do dense underscore rank.
And let's run this.
And let's pull this up.
There we go.
Now dense rank is ever so slightly different than rank in the fact that when it gets down to duplicates, it's still gonna duplicate them.
So it's gonna have a five and a five, but it's gonna give the next number numerically, not positionally.
That is the only real difference between rank and dense rank.
And again, row number is just not going to have duplicates.
It's gonna give each row its own unique number within that partition.
So I know I just threw a lot at you, but that's row number rank and dense rank in a nutshell.
And you can review this and mess around with all of these things, because, you know, these are actually really, really useful.
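The three functions can be compared in one query. This sketch uses Python's sqlite3 as a stand-in for MySQL and, as a simplifying assumption, keeps gender and salary in a single hypothetical employees table instead of joining the two tables from the lesson. The sample salaries include the tie at 50,000 discussed above.

```python
import sqlite3

# SQLite stands in for MySQL; "employees" is a hypothetical single table
# used here to avoid repeating the join. Rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (first_name TEXT, gender TEXT, salary INT);
INSERT INTO employees VALUES
  ('Leslie', 'Female', 75000), ('Ann', 'Female', 55000),
  ('Chris', 'Male', 90000), ('Ben', 'Male', 70000), ('Craig', 'Male', 65000),
  ('Mark', 'Male', 57000), ('Tom', 'Male', 50000), ('Jerry', 'Male', 50000),
  ('Andy', 'Male', 20000);
""")

# All three functions share the same window: restart per gender,
# highest salary first.
rows = conn.execute("""
SELECT first_name, gender, salary,
       ROW_NUMBER() OVER (PARTITION BY gender ORDER BY salary DESC) AS row_num,
       RANK()       OVER (PARTITION BY gender ORDER BY salary DESC) AS rank_num,
       DENSE_RANK() OVER (PARTITION BY gender ORDER BY salary DESC) AS dense_rank_num
FROM employees
ORDER BY gender, row_num
""").fetchall()

male = [r for r in rows if r[1] == 'Male']
print([r[3] for r in male])  # ROW_NUMBER: always unique within the partition
print([r[4] for r in male])  # RANK: ties share a number, then a number is skipped
print([r[5] for r in male])  # DENSE_RANK: ties share a number, no gap afterwards
```

Which of the two tied rows gets row number 5 versus 6 is not guaranteed, since nothing in the ORDER BY breaks the tie.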
Hello everybody.
And welcome to the first lesson in the advanced MySQL tutorial series.
Today, we are going to be looking at CTEs.
Now CTE stands for common table expression.
They're going to allow you to define a subquery block that you can then reference within the main query.
Now, that may not make perfect sense, but we've looked at subqueries in the past or in previous lessons in the intermediate series, so you kind of understand that it's kind of like a query within a query, except we're gonna name this subquery block, and it'll be a little bit more standardized, a little bit better formatted than actually using a subquery.
Let's take a look at the basics of writing a CTE.
Let's pull this down really quickly.
And all we're going to do is we want to create this as a CTE.
We'll say with, and that is our keyword to define our CTE.
So we're gonna say with, and then we're gonna name our CTE.
And we'll just call it CTE, and we'll do underscore example.
And then we're gonna say as.
So this is how we define it, and now we need to actually put it in parentheses.
Now you can do this in several different ways.
I'm gonna do it kind of like this, just to really emphasize that this is within the CTE.
Now CTEs are unique because you can only use the CTE immediately after you create it.
So if we come right down here, and we come right below it, we'll say select everything, and we're gonna say from CTE example.
We'll say from CTE example.
And let's bring this back up.
Now if we run this,
We're gonna get the exact same output.
Now this should seem pretty familiar, almost like we're using a subquery.
And within our subquery, we have this right here.
We're kind of building our own little table.
And then we can query off of it down below.
So we can come down here, and let's actually change the names in here.
We're gonna say average underscore sal, and we'll change all of these real quick, just because
I don't like having to actually put the tick marks.
I don't like doing that.
So here we're gonna say max, then we'll say min, and then we'll say count.
And let's go ahead and run this again.
And so now we have these different names.
And when we come right here, we can say select, and then we'll just do something really simple.
We'll just do the average of average underscore sal.
So the average salary.
And let's run this.
And so this is the average between both the males and the females.
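A basic CTE along those lines might look like the sketch below. It runs through Python's sqlite3 as a stand-in for MySQL, with four illustrative sample employees, so the numbers will not match the video's output.

```python
import sqlite3

# SQLite stands in for MySQL; the four employees are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_demographics (employee_id INT, gender TEXT);
CREATE TABLE employee_salary (employee_id INT, salary INT);
INSERT INTO employee_demographics VALUES
  (1, 'Female'), (3, 'Male'), (4, 'Female'), (8, 'Male');
INSERT INTO employee_salary VALUES (1, 75000), (3, 50000), (4, 25000), (8, 90000);
""")

# WITH names the grouped query cte_example; the SELECT that follows
# immediately after can then query it like a table.
avg_of_avgs = conn.execute("""
WITH cte_example AS (
    SELECT gender,
           AVG(salary)   AS avg_sal,
           MAX(salary)   AS max_sal,
           MIN(salary)   AS min_sal,
           COUNT(salary) AS count_sal
    FROM employee_demographics dem
    JOIN employee_salary sal ON dem.employee_id = sal.employee_id
    GROUP BY gender
)
SELECT AVG(avg_sal)
FROM cte_example
""").fetchone()[0]

print(avg_of_avgs)  # average of the female and male average salaries
```

The CTE and the SELECT that uses it form a single statement, which is why the CTE name is only available right there.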
Kind of the purpose of these CTEs is to be able to perform more advanced calculations, something that you can't easily do or can't do at all within just one query.
Another reason to use a CTE is just the readability.
You can absolutely write this using a subquery, and let's do that really quickly.
And it's just gonna be a little bit tougher to read and look at.
So let's come right up here. We're gonna say from, and we'll do this right here. We'll do select average Sal from here, and we're gonna need to name this, so we'll say example underscore subquery. Then we'll get rid of this, and we just need to get rid of this, and we can run this query, and we get the exact same output.
Now, if I formatted this exactly the same, just like this, and the names down there, if we look at this, the syntax is just a little bit more difficult to read.
We're selecting the average of average Sal from, and then we have our subquery right here, and then we're naming it at the bottom.
If we scroll up and compare this,
This one just looks a lot better.
Now, when you're writing in MySQL, sometimes it doesn't matter if it looks pretty or not as long as it gets the job done.
That is true, especially if you're just gonna be using it yourself.
But in a more professional environment, when you're using this in your actual job, there are gonna be people who've been using this for 10, 20 years, and they're gonna expect you to write it well.
They don't want it to be really messy.
They aren't most likely gonna want it to be written like this.
I've been using it for quite a long time,
and I much prefer CTEs over subqueries just visually, and it makes it a lot quicker to actually read through.
So that is just one of the reasons, although you get the exact same output.
Now there is some additional functionality within CTEs as well.
Now one thing that I mentioned just a second ago is that when you build a CTE, you can only use it immediately after.
You can't reuse it in a separate query further down.
So let's go ahead and let's copy this query, and we're gonna bring it right here.
If we try to run this, and let's do this, we're gonna get an error.
And let's pull this up.
It says table Parks and Rec dot CTE example doesn't exist.
So it's looking for a table called CTE example in our database, but it's not there.
Now the reason this happens is because you're creating a CTE.
You're not creating a permanent object,
like a temp table, which we'll look at in the next lesson, and you're not creating a real table, and you're not creating a view, you're really not creating anything.
It's just a common table expression to create this table right here, this basically almost like a temporary table, almost,
But then you're just using it to query off of it.
You're not saving it.
You're not storing it in memory.
You're not really doing anything with it.
It's just like writing a regular query.
So this is why you can only write it immediately after creating the CTE.
You can't write it down below and reuse it because it's just like calling a query that you wrote before.
It just isn't going to work.
And the next thing that I want to take a look at, and let's copy this down here.
Next thing I want to take a look at is that you can actually create multiple CTEs within just one.
So if you wanted to do a more complex query or joining more complex queries together, we can do that all within one CTE.
So let's come right here.
Let's get rid of all of this.
And we're gonna say from the demographics table, we're gonna say where birth underscore date, and let's just do is greater than 1985-01-01.
So we have one query and we'll take just a few columns from this table.
So we'll take, let's say the employer or employee ID.
We'll take the gender and the birth underscore date.
So this is one query, and we're filtering just based off of this birth date.
Now when we create this, this is the CTE example, but we can have a comma here.
We can come down below, and then we can say CTE underscore example two.
and I need to combine that, so two.
And then we can say as, and then we have another query.
So then right here, we could say select everything, we'll change that in a second, from employee, say salary.
And in the salary, we'll just do a simple one.
We'll do where salary is greater than 50,000.
And we'll actually just take the employee underscore ID and the salary.
Now, if I come right down here, I can say select everything from CTE example, which is, and let me scroll up so we can see everything.
That's our original.
Our CTE example is this first query right here.
Then we're creating our second one right here, and we can join basically on these two common table expressions.
So now we can say join, and then we'll do CTE underscore example two.
Let me change that X.
And then we'll say on, then we're just gonna do CTE example dot employee underscore ID is equal to CTE example two dot employee underscore ID.
And not an equal sign, but a dot.
There we go.
Now if we run this, it should work, and we can pull this down and look at our output.
Now this is just an example, this isn't a real use case, because of course we just join these two tables together normally, but you can imagine you have a much more complex query, or you're doing a lot of functionality within this table, and you just want a certain subsection of this table, and you're wanting to combine those.
This is how you can do that with a CTE.
So now we have all of our information right here,
And that can be extremely, extremely helpful.
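Here is a sketch of two CTEs defined in one WITH and then joined, using Python's sqlite3 as a stand-in for MySQL. The four sample employees are illustrative, and dates are stored as ISO-format text, which is an assumption that makes the comparison work in SQLite.

```python
import sqlite3

# SQLite stands in for MySQL; rows are illustrative. Dates are ISO text,
# so string comparison matches date order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_demographics (employee_id INT, gender TEXT, birth_date TEXT);
CREATE TABLE employee_salary (employee_id INT, salary INT);
INSERT INTO employee_demographics VALUES
  (1, 'Female', '1979-09-25'), (7, 'Female', '1988-12-01'),
  (10, 'Male', '1989-03-25'), (12, 'Male', '1986-07-27');
INSERT INTO employee_salary VALUES (1, 75000), (7, 55000), (10, 20000), (12, 65000);
""")

# Two CTEs separated by a comma under a single WITH, then joined
# on employee_id in the main query.
rows = conn.execute("""
WITH cte_example AS (
    SELECT employee_id, gender, birth_date
    FROM employee_demographics
    WHERE birth_date > '1985-01-01'
),
cte_example2 AS (
    SELECT employee_id, salary
    FROM employee_salary
    WHERE salary > 50000
)
SELECT *
FROM cte_example
JOIN cte_example2
  ON cte_example.employee_id = cte_example2.employee_id
ORDER BY cte_example.employee_id
""").fetchall()

# Only employees who pass both filters survive the join.
for row in rows:
    print(row)
```

Employee 1 is filtered out by the birth date and employee 10 by the salary, so only two rows come back.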
Now one last thing that I want to show you, I want to go all the way back up really quickly, right here.
Let's run this one one more time.
And let's actually take everything.
And let's run this.
So here we have our gender, average salary, max salary, min salary, and count salary.
The last thing that I wanna show you, and this is more of something that's just somewhat helpful, you don't have to actually do it in your main query, is before we went in here and we changed all the column names by doing an alias, by saying as, and then saying the average salary.
And the as is just implied here.
But we're changing this via an alias.
We don't have to do this.
In fact, we could come right here and we could do a parentheses.
We could call it gender.
We could call it average underscore salary, max underscore salary, min underscore salary, and let's do count underscore salary.
So now if we were to run this, let's change this up.
We'll do capital on this one.
If we wanted to run it like this, when we run this,
It'll change all of those names to where we have it right here.
So this will be the default.
This will overwrite the column names that you have in your actual CTE expression or the query that you have within your CTE.
So that is all we're going to take a look at in this lesson on CTEs.
These are very, very helpful, definitely help with more complex queries, and they're just really easy to read and understand, which is why I personally use them a lot.
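The column-list form mentioned at the end can be sketched like this, again with Python's sqlite3 standing in for MySQL. The single hypothetical employees table and its four rows are assumptions to keep the example short.

```python
import sqlite3

# SQLite stands in for MySQL; "employees" is a hypothetical single table
# with illustrative rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (gender TEXT, salary INT);
INSERT INTO employees VALUES
  ('Female', 75000), ('Female', 25000), ('Male', 50000), ('Male', 90000);
""")

# The names in parentheses after the CTE name become the output column
# names, so the inner query needs no aliases of its own.
cur = conn.execute("""
WITH cte_example (gender, avg_salary, max_salary, min_salary, count_salary) AS (
    SELECT gender, AVG(salary), MAX(salary), MIN(salary), COUNT(salary)
    FROM employees
    GROUP BY gender
)
SELECT *
FROM cte_example
ORDER BY gender
""")

names = [d[0] for d in cur.description]  # column names of the result set
rows = cur.fetchall()
print(names)
print(rows)
```

The list has to supply one name for every column the inner query returns.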
Hello, everybody.
In this lesson, we're going to be taking a look at temporary tables.
Now, temporary tables are tables that are only visible to the session that they're created in.
So if I create a temp table right now and I exit out of MySQL and I come back in, it's not going to be there anymore.
And we'll look at that in just a little bit.
Now temporary tables can be used for a lot of things, but how I've mostly used them, especially as a data analyst, is for storing intermediate results for complex queries, somewhat like a CTE, but also for manipulating data before I insert it into a more permanent table.
So let's take a look at how we can create a temp table.
There's two ways that you can do it.
I'll show you the first way, which I don't think is as popular.
And then I'll show you the second way, which is how I typically use it the most.
Now, the first way to create a temp table is to create a temporary table.
I need to sound it out like that.
It's the only way I can spell.
So we're gonna do temp underscore table.
So this is our name.
Now, if we just took this out and we created a table, this would create a table in our Parks and Recreation database, but we don't want that.
We want to create a temporary table that just lives inside of our memory or the memory within our computer.
Now we're going to create this temporary table much like we would a regular table.
And we're going to need to name the columns as well as the data types.
So let's do first underscore name.
And our data type can be varchar, let's say 50.
And we'll do a comma.
Then we'll do last underscore name.
We're going to keep this really simple.
We'll do varchar 50 again.
And then for our last one, we'll do favorite underscore movie.
And for this one, it needs to be longer.
So we'll do varchar, let's say 100.
Now let's get rid of this and let's actually run this after we do our semicolon.
Let's actually run this and nothing's gonna happen.
Let's click refresh.
Nothing's gonna happen.
At least you can't see it happening.
Let's pull this up and you can see that create temporary table.
So zero rows affected, but it was created.
Now in order to actually see it, we can do select everything and we'll do this from our temp table.
and we'll add a semicolon, and then we run this, and we have this empty table right here.
Now what's really great about these temp tables is then you can insert data into it, and it basically is like a real table, except it just lives in memory and they go away after a while, but you can reuse this temp table over and over and over again.
Now let's insert some data into here, and then we'll take a look at this again.
So let's come right down here, let's insert data, we'll do insert,
into, and we want to insert that into the temp table, and we're just going to say values. I'll use myself for this one. We'll do Alex, Freberg, and what's my favorite movie? Give me a comma. That'll be Lord of, I think it's like that, Lord of the Rings: The Two Towers. That's probably my favorite movie of all time. Now let's go ahead and insert this data.
And let's pull this down here and let's run it all the way down here after we add our semicolon.
And when we run this, you'll notice that now we have data in here.
So now we can use this table much like any real table.
So that's the first way to create a temp table.
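The first approach, sketched with Python's sqlite3 as a stand-in for MySQL. The CREATE TEMPORARY TABLE, INSERT, and SELECT statements follow what was described above. Only the connection setup is specific to this example.

```python
import sqlite3

# SQLite stands in for MySQL; the temporary table exists only for
# the lifetime of this connection (the "session").
conn = sqlite3.connect(":memory:")

# Define the columns and their data types, just like a regular table.
conn.execute("""
CREATE TEMPORARY TABLE temp_table (
    first_name     VARCHAR(50),
    last_name      VARCHAR(50),
    favorite_movie VARCHAR(100)
)
""")

# Insert a row, then query the temp table like any other table.
conn.execute("""
INSERT INTO temp_table
VALUES ('Alex', 'Freberg', 'Lord of the Rings: The Two Towers')
""")

rows = conn.execute("SELECT * FROM temp_table").fetchall()
print(rows)
```

Dropping the word TEMPORARY would create a regular table in the database instead.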
Not my personal favorite way, although there have been some use cases where I've done it like that.
I'm going to show you the way that I typically do it.
And for this, let's select everything from the employee underscore salary table.
Let's run this.
Now let's say I just wanted a subsection of this data to sit in this temp table where the salary is greater than let's say 50,000.
I could easily, easily do this.
I'm gonna say create temporary table and let's do this one as salary over underscore 50K.
Now, one thing about naming either temp tables or CTEs or subqueries, or any of these things where you need to name something, I try to typically name it something that actually makes sense.
So the salary over 50 K is something I would actually name it in my real work.
I wouldn't normally name it something like temp table.
The reason for that is that in a work environment you have lots of temp tables.
You're creating really advanced stored procedures, really advanced queries.
You have hundreds or even thousands of tables and different databases.
It gets really complex.
So naming conventions are actually pretty important or they become more important the more you get entrenched in this stuff.
So just something to think about.
Now we're creating this temp table.
Now we don't really have to insert data into it; instead, we're just going to select data from an already existing table.
I'm gonna say select everything from, I'm gonna say employee salary.
Then we're just gonna say where the salary is greater than 50,000.
Now I want Tom or Jerry, I want them to be included as well.
So I'll actually say greater than or equal to.
So now we're creating a temporary table based off of an already existing table, and we're just selecting data into this temporary table.
So when we run this, now we can select the salary over 50K, and let's run this.
And it works perfectly.
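As a sketch, this second approach could be written as follows. It assumes the employee_salary table from the course dataset with a numeric salary column.

```sql
-- Create the temporary table directly from a query result.
-- Column names and types are copied from the SELECT.
CREATE TEMPORARY TABLE salary_over_50k
SELECT *
FROM employee_salary
WHERE salary >= 50000;

-- Query it like a regular table for the rest of the session.
SELECT *
FROM salary_over_50k;
```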
Now, the great thing about temp tables is they last as long as you are within that session.
Meaning if I copy this query, let's go to a new window and let's paste this in here.
Let's zoom in a little bit and let's run this.
It still works even in a new window.
But if I exit out and come back in, then it's no longer going to work.
Now let's exit out of this.
Let's come back in and we'll see if these temp tables still work.
Let's go ahead and exit out.
Oh, geez.
I'm embarrassed.
All right, let's go to MySQL.
Let's come over here to the local instance.
So now it pulls right back up.
Let's zoom in once again on both of these, and let's try to pull up our salary over 50K temporary table.
Let's run this, and we're not getting an output.
Let's go back, and it's going to say error code: the table salary over 50K does not exist.
So it only lasted as long as we were within this session.
So that is how we create our temp tables, and that's how we use our temp tables.
Now, in the last lesson, we had looked at CTEs.
CTEs and temp tables both have their own use cases within MySQL.
For temp tables, this is usually for the more advanced things.
So I'm usually using these in stored procedures when I'm really manipulating data and doing a lot more complex queries overall.
And oftentimes I'll use multiple temp tables and I'm joining them together and I'm just doing a lot of more advanced stuff.
With CTEs, it's typically more simple things because you can't make as advanced CTEs or as complex CTEs.
So with those, I'm usually keeping it to just one level of transformation.
I have my base CTE or my base subquery or query, however you want to call that, and I'm changing it or doing one level of advanced thing on top of that query.
That's what a CTE is really great for.
Temp tables, you can just get a lot more advanced with it.
They also last within the session.
And if I'm using it multiple times throughout something like a stored procedure, then it makes so much sense to use a temporary table.
Hello, everybody.
In this lesson, we're gonna be taking a look at stored procedures.
Stored procedures are a way to save your SQL code that you can reuse over and over again.
When you save it, you can call that stored procedure and it's gonna execute all the code that you wrote within your stored procedure.
It's really helpful for storing complex queries, simplifying repetitive code, and just enhancing performance overall.
So let's take a look at how we can create a stored procedure.
Now we're gonna start by just creating a really simple query.
We'll make it a little bit more advanced as we go along and take a look at the different things within the stored procedures that you can do.
Now let's change this query.
Let's say where the salary is greater than, let's do 50,000.
Let's actually do greater than or equal to 50,000.
We want to include Tom and Jerry as well.
So let's go ahead and run this.
Now what we want to do is save this really complex code within a stored procedure.
Let's come right down here.
And we can create a super, super, super simple stored procedure by just saying create procedure and pasting that.
Now we just have to name it.
So we have create procedure and we'll call this large underscore salaries.
And then we do an open and closed parentheses.
Now this is as simple as it can possibly be.
It does not get any simpler than this.
So let's go ahead and run this.
And if we go down and pull this up, you can see that it says create procedure, zero rows affected.
It looks like it worked.
And if we come over here to this refresh button, you should see now that under stored procedures, it drops down and we have our large salaries.
That's exactly what should have happened.
We wanted to save that into our Parks and Recreation database.
Now, if you wanted to be careful, you could say use parks underscore and underscore recreation.
This is not a bad idea, but you don't have to.
But you can specify what database within your actual editor window.
Sometimes that is helpful.
But now we've created it.
Now let's see how we can call it.
All we have to do is say call.
We're gonna copy this entire thing, including the parentheses.
And let's end it with, that's right, a semicolon.
Let's go ahead and run this.
And as you can see, it worked because we got the exact output.
So we actually called this stored procedure and this code ran.
So it's just a select statement.
So it worked perfectly.
Now you can also come over here to large salaries and there's this little tiny little button here that looks like a lightning bolt.
And if you click it, it's gonna open up a different window and it'll say call parks and recreation dot large salaries.
So you can do it that way as well, but we're not gonna be doing it that way.
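A minimal sketch of this simplest form, assuming the database is named parks_and_recreation and the query is the one on employee_salary described above:

```sql
-- Optional: make sure the procedure is saved in the intended database.
USE parks_and_recreation;

-- A procedure whose body is a single statement needs no BEGIN/END.
CREATE PROCEDURE large_salaries()
SELECT *
FROM employee_salary
WHERE salary >= 50000;

-- Run the saved query.
CALL large_salaries();
```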
Now what we've written right here is not best practice by any means, and I'm gonna copy this down here, because there's a lot of different things that you need to take into account when you're creating a stored procedure.
For example, this right here is most likely not what you're gonna be putting into a stored procedure.
This is super, super simple.
Typically you'll be having multiple queries.
And let's see what happens if I try to put another query in here.
And let's get rid of this.
So we're gonna select everything where the salary's greater than 50,000, then we'll select everything where it's greater than 10,000, which is everybody.
Let's call this large salaries two.
So we have two different statements in here, and we want them all to be under this large salaries two.
Let's select everything, and let's run this.
And we're getting an output, which is already not a good sign, but we created the stored procedure, and then we selected everything.
So what's actually happening here?
Pull this back down.
What's happening is this is creating the stored procedure and this is just some other random query.
But that's not what we want.
What we want is everything or both of these queries within one stored procedure.
The best practice is to use something called a delimiter.
Now this right here is a delimiter, this semicolon.
So this semicolon separates our queries from one another.
It tells MySQL, hey, this is a different query.
Don't be mixing these and cause errors.
That's essentially what a delimiter does.
Now we can change the delimiter by coming up here and saying delimiter, and we can change it to almost anything we want.
Now in my actual job, I've seen it done many different ways.
I've seen these forward slashes.
I've also seen dollar signs.
This is probably the one that I've seen the most when I worked with data engineers, data scientists, database developers.
This one I see a lot.
And then you'll come into the code and you'll say begin.
And let's go over here and let's tab all of this.
And then we'll say end.
Now when we end, we're gonna end it with this dollar sign.
So here's what's happening.
We're changing the delimiter right here to dollar sign.
We're creating our stored procedure, and within it, we are keeping all of this.
So all of this code is going to go into this one stored procedure.
Then at the end, we are saying this is the end right here of this stored procedure.
These semicolons are no longer the delimiter that tells MySQL where the stored procedure ends.
That's what the delimiter does.
Now it is best practice at the end to change it back, right?
Let me spell it right.
Because if you don't, then you're gonna have to start using these dollar signs for everything.
And how do you spell delimiter?
Oh man, there we go.
Now we've changed it back to a semicolon afterwards.
So then we can go and write other queries and it'll act appropriately.
Let's go down.
So this is getting closer to best practice.
Let's go ahead and run this entire thing.
And if we pull this up, we're not getting an output.
That's a good sign.
If we pull this up, it's saying we already created number two.
Change that to three, my apologies.
Let's go down here.
Now we've created the stored procedure three.
Now let's go over here.
We're going to right click on this.
We're gonna say alter stored procedure.
And now you can see that we have both of these queries within the stored procedure.
Let's get rid of this.
And we're gonna go and call this.
So let's copy this large salaries three.
Bring this all the way down.
And let's say call that stored procedure.
If we run it, you'll notice we get two outputs.
We have six and seven.
This result six is where it's greater than 50,000 or 50,000 or greater.
This one is where it's greater than 10,000, which is essentially the entire table.
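Put together, the delimiter version could look like this sketch. The name large_salaries3 follows the renaming above, and $$ is just one common choice of delimiter. Note that DELIMITER is a command of the client (MySQL Workbench or the mysql command-line tool), not a statement sent to the server.

```sql
-- Temporarily switch the statement delimiter so the semicolons
-- inside the procedure body do not end the CREATE statement.
DELIMITER $$

CREATE PROCEDURE large_salaries3()
BEGIN
    SELECT *
    FROM employee_salary
    WHERE salary >= 50000;

    SELECT *
    FROM employee_salary
    WHERE salary > 10000;
END $$

-- Switch back so later queries can end with a semicolon again.
DELIMITER ;

-- Returns two result sets, one per SELECT in the body.
CALL large_salaries3();
```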
Now, so far, we've done everything just by writing it all out, and that's fantastic, but you can also come over here to stored procedures.
You can right click and say create stored procedure.
Now, let's actually copy this.
We're just gonna create the exact same thing.
We'll create stored procedure.
And we can just paste this in here.
And let's go ahead and do that.
There we go.
And sure, we'll call it new procedure, why not?
And if we say apply, you'll notice that it generates this script right here.
And we can apply it and we can create it and we will in just a second, but let's take a look at it.
So it's going to use Parks and Recreation, that's what I was mentioning before.
We're then going to say drop procedure if exists.
Now this is something that I was gonna show you later, but I'll just show it to you now.
Sometimes it is really beneficial to write something like this before you create it, in case you've already created a stored procedure with that name that you're wanting to replace.
So it's checking if it's there, and if that new procedure is already there, it's just gonna drop it.
Then it comes down, and let me see if I can zoom in on this.
And then it's going to create our delimiter, which it uses dollar signs, so MySQL's even validating what I was saying earlier.
We're gonna use Parks and Recreation again.
And now again, we have to use, instead of a semicolon, we're using dollar signs.
Then we're creating the procedure, which is new procedure.
We're saying begin, end, and then it's even changing the delimiter back.
So basically everything that I said, this is kind of doing it for you automatically.
Now when I click apply, it went ahead and executed that SQL statement, and our new one is ready.
So we can go ahead and alter that stored procedure, and it looks exactly the same as this one out here, which was large salaries number three.
So it looks exactly the same.
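The generated script follows roughly the shape below. This is a reconstruction from the description above, not a copy of the actual Workbench output, and the procedure body is assumed to be the same two queries that were pasted in.

```sql
USE parks_and_recreation;

-- Remove any earlier procedure with this name so the CREATE cannot fail.
DROP PROCEDURE IF EXISTS new_procedure;

DELIMITER $$
USE parks_and_recreation$$

CREATE PROCEDURE new_procedure()
BEGIN
    SELECT *
    FROM employee_salary
    WHERE salary >= 50000;

    SELECT *
    FROM employee_salary
    WHERE salary > 10000;
END$$

DELIMITER ;
```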
Now let's go ahead and get rid of this, get rid of this, and let's go down below.
The next thing I want to take a look at is something called a parameter.
Now, before I actually get into this, I'm going to copy all this down here because I don't want to rewrite all of it, if I'm being honest.
So let's paste this in here.
Now, parameters are variables that are passed as an input into a stored procedure.
And they allow the stored procedure to accept an input value and place it into your code.
Let's take a look at what that actually means.
Now, before I do anything, I'm just gonna change this to number four, so I don't forget.
So let's get rid of all of this.
We're gonna keep it somewhat simple because we're looking at something new.
Now, when I say we're passing through a parameter, I'm talking about when we're calling it.
So let's say we've already created this one.
I'm not gonna run this yet, but let's say we've created it.
Let's say I wanna pass in an employee ID.
I wanna pass in a specific person and I wanna retrieve their salary.
I know their employee IDs.
I just want it to pull up their salary for us.
So what we're gonna do is we'll get rid of this, and when we're calling it, I'm gonna pass through a value like one.
That's Leslie Knope.
And then I want the salary to be the output.
So I'm gonna select the salary.
So we're selecting salary from the employee salary, but how do we know that this one is the person we're looking for?
Well, when we're actually creating this parameter, we create it right in here.
That's what tells the stored procedure to accept an input value when we're calling it down below.
We're gonna call this employee underscore ID.
Now, after we name our parameter, we need to then give it a data type.
So I'll call this an integer.
So we're telling the stored procedure when somebody calls the stored procedure, they have to pass through an integer.
It can't be a string or it can't be a date.
It has to be an integer.
Now, what we're gonna go do is right down here, we'll say where the employee underscore ID, that's from this column in the actual table, is equal to the employee underscore ID, which is our parameter right here.
Now you may be thinking, that's really confusing.
They're named the exact same thing.
Can I change it?
The answer is yes, I actually encourage it.
So there are some naming conventions that are out there that I think are helpful, ones that I personally use.
But remember, this is just a variable name for the parameter, so you can call it whatever you want.
So if I wanted to say huggy muffin, I could, and this could be huggy muffin.
So let's try it with huggy muffin.
I just came up with that off the top of my head, so don't judge me.
But we're going to create the stored procedure, and then when we call it later, we want it to return the salary where the employee ID right here is equal to whatever was passed through that input parameter.
And we're going to keep it as one, so it should return 75,000.
Let's go ahead, we're gonna create this.
And now let's go right down here and we're going to run it.
And we can see that that is the salary and it worked perfectly.
Now, like I was saying, that is not what I would actually name it.
There are some naming conventions like underscore param at the end.
So you kind of want to keep it, at least I recommend you try to keep it similar to what you're actually looking for.
And you can either end it in underscore param, or there's another way that you can do it, which is come right over here and do P underscore.
And these are just ways to visually tell the parameter apart from the column in the code.
So this is just what I recommend.
Then you put it right down here.
You say where the employee ID is equal to P underscore employee ID saying this is the parameter that's being passed through and put into our actual query.
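Here is a sketch of the parameterized procedure using the p underscore naming convention. The name large_salaries4 follows the renaming above, and employee_id is assumed to be an integer column in employee_salary.

```sql
DELIMITER $$

-- p_employee_id is the input parameter; the p_ prefix keeps it
-- visually distinct from the employee_id column.
CREATE PROCEDURE large_salaries4(p_employee_id INT)
BEGIN
    SELECT salary
    FROM employee_salary
    WHERE employee_id = p_employee_id;
END $$

DELIMITER ;

-- Look up the salary for employee 1.
CALL large_salaries4(1);
```

Giving the parameter the same name as the column is best avoided, since inside the procedure the name would refer to the parameter on both sides of the comparison.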
Hello, everybody.
In this lesson, we're going to be taking a look at triggers and events.
A trigger is a block of code that executes automatically when an event takes place on a specific table.
For example, let's take a look at these two tables.
Now, when a new employee is hired, they're put into this table with their salary information and everything.
But sometimes people forget or don't add their information like you know who right here.
They're not put into this demographics table.
And we want to change that because we want to have everybody in here.
So when somebody is put into this salary table, we want it to automatically update with the employee ID, first name and last name into this table right here for the employee ID, first name and last name.
So we're going to write a trigger so that when data is inserted into the salary table, it also inserts into the employee demographics table for us.
Now let's go right down here and we're going to take a look at how we can do that.
Now, if you watched the last lesson on stored procedures, we'll do a lot of the same writing style or same formatting for triggers and events.
So what we're going to start with is the delimiter.
We're just going to do that right off the bat before we get into anything.
And we're going to change that to the double dollar sign.
Now again, if we have multiple lines of code when we're creating this trigger, which we're going to have, this delimiter is gonna let us put multiple queries within our create trigger statement.
So this is really important.
We'll just start out by doing that.
Now let's create our trigger and we do need to name this.
So we'll say employee underscore insert, and we'll just call it like that.
Did I spell that right?
Yeah, employee insert.
So we have our create trigger, we've named it.
Now we need to specify what event needs to take place in order for this to be triggered.
So we're gonna say after an insert, and I need to spell insert right, after an insert on, and we'll do the employee underscore salary table.
So after we insert onto the employee salary table, down below we're gonna write what's actually gonna happen.
Now we're writing after because we want this to happen once the new information has been put on the salary table, so that it's automatically added to the demographics table.
But you could also do before, which means the trigger runs before the row is written, and you could trigger on a delete or an update instead of an insert.
But we're not doing any deleting or any updating, we're doing insertion.
So we're gonna say after an insert on.
Now the next part that we need to write is for each row.
Now this for each row means that the trigger is gonna get activated for each row that is inserted.
So if we had an insert statement that inserted four different people who were just hired, that means this trigger is gonna be activated four times.
Now, some SQL databases like Microsoft SQL Server have things like batch triggers or table level triggers that'll only trigger once for all four of them.
And in my opinion, those are really, really nice.
I've used those, I like them.
The way that MySQL has it right here is not the most optimal way to do it, unfortunately, but we don't have access to batch-level or table-level triggers at this time.
So this is really just the setup for what we're about to write.
So after it's inserted on the employee salary table for each row, what is gonna happen?
We're gonna go down here and we're gonna say begin and we'll have end.
Now the code that we're gonna write here is what's gonna happen after this event takes place.
So what we're gonna do is we want to take from this table, let's bring this back up real quick.
When we insert a new person, we wanna take the employee ID, the first name, and the last name, and automatically put it into the demographics table.
So we wanna say insert, and let me do tab, insert into.
We're going to insert into the employee underscore demographics table, and we're not taking everything, so let's actually specify what columns we're doing.
We're doing employee underscore ID, first underscore name, and then last underscore name.
Now we need to specify what the values are.
Now from the employee salary table, we're taking employee ID, first name and last name, but we don't wanna take all of them, right?
We don't wanna take every single employee ID, every single first name, every single last name.
We only wanna take the new values that were just inserted.
Well, luckily for us, there is something that we have for this.
So let's do values and open a parenthesis.
We have something called new.
Now new is gonna say we're only taking the new rows that were inserted.
There's also an old like this, where it takes rows that were deleted or updated, but of course for us, we're gonna be using new.
So we'll say new.employee underscore ID, new.first underscore name, and then new.last underscore name, and we'll close that.
Then we'll come down here and we'll do our delimiter and we'll change this as well.
We'll say delimiter back to a semicolon.
Now we're getting this error because we need this right here.
So let's recap what we've created then we'll actually create it and try it out.
So we're creating our trigger called employee insert.
After a row is inserted into the employee salary table, for each row, here's what's gonna happen.
We are gonna insert into the employee demographics table, the employee ID, the first name and the last name.
Those are the columns that we're gonna insert into.
Then we're taking the values new.employee_id, new.first_name, and new.last_name.
Now MySQL understands that when we say new, we're talking about the event that takes place.
This is the data that's being inserted.
It just knows that.
So let's go ahead and create it.
We're gonna run this, and it should work.
Let's pull this up.
It says create trigger, so that worked.
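The trigger recapped above could be written as the following sketch. It assumes employee_salary and employee_demographics both have employee_id, first_name and last_name columns.

```sql
DELIMITER $$

CREATE TRIGGER employee_insert
    AFTER INSERT ON employee_salary
    FOR EACH ROW
BEGIN
    -- NEW refers to the row that was just inserted into employee_salary.
    INSERT INTO employee_demographics (employee_id, first_name, last_name)
    VALUES (NEW.employee_id, NEW.first_name, NEW.last_name);
END $$

DELIMITER ;
```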
Now, the thing about triggers that's unfortunate, it doesn't have its own little section under here, right?
But it does have under the employee salary, let's go right here, and then under the triggers.
So we can find it, which is great.
So we have this employee insert.
If we right click, we can't really do anything with it.
That's the unfortunate thing.
From here, we can't alter it, we can't change it, we can't drop it, we can't do anything.
That's the unfortunate part, but let's actually test it.
So now we're gonna say insert into, and we're gonna insert into this employee salary.
That's how we're gonna trigger it.
So insert into the employee salary.
And then we'll do employee underscore ID.
These are all the columns.
The first underscore name, last underscore name, occupation.
Bear with me for a second.
Then we have salary and then department underscore ID.
So this is what we're inserting into.
Now we have to do our values.
Now this should be shorter, hopefully.
We'll do 13.
We'll call him Jean-Ralphio.
There we go.
Last name is Saperstein.
Just like that, and not actually just like that.
That's not spelled right.
So we have Saperstein.
His occupation is Entertainment 720 CEO.
How much is he making?
Let's say a million, is that a million?
A million, he's making a million dollars.
And he's really not part of any department, so we're just going to have null.
So what we're about to do is we're only inserting on the employee salary table, but we're putting all the values that we need into the appropriate places.
Let's add a semicolon.
Let's go ahead and run this and make sure it worked.
It says insert into, and then one row affected.
Now let's come back up, and let's look at our salary table first and get rid of this.
If we pull this up, you can see Jean-Ralphio Saperstein, he was added.
Let's go over to the demographics.
This is the moment of truth.
Let's see if it worked.
And as you can see, it worked perfectly.
We have Jean-Ralphio Saperstein.
Now they do need to come back and fill in this information, but it's already in here kind of queuing them up saying, hey, we need this person's age, gender, birth date, all that other information.
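The test above might look like this sketch. The column names are written as they are spoken here; in particular department_id is an assumption and may be named differently in your copy of the dataset.

```sql
-- Inserting into employee_salary fires the employee_insert trigger.
-- department_id is the column name as spoken; check your schema.
INSERT INTO employee_salary
    (employee_id, first_name, last_name, occupation, salary, department_id)
VALUES
    (13, 'Jean-Ralphio', 'Saperstein', 'Entertainment 720 CEO', 1000000, NULL);

-- The new row should appear in both tables; in employee_demographics
-- only the three columns copied by the trigger are filled in.
SELECT * FROM employee_salary;
SELECT * FROM employee_demographics;
```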
So that is how we can create a trigger based off of a specific table.
And then when it happens, it just automatically does it for us.
We don't have to really think about it.
We just know that we've created a trigger and we can actually go and insert data on that table.
And that trigger is going to work.
It's gonna do what it's supposed to do.
And that's really, really helpful in the real world when you're working with a ton of tables, a ton of things need to be automatically done and you don't wanna have to manually do this.
So having these triggers can save you a ton of time.
Now let's scroll down and we're gonna take a look at events.
Now, an event is kind of similar to a trigger.
A trigger happens when an event takes place, whereas an event takes place when it's scheduled.
So this is more of a scheduled automator rather than a trigger that happens when an event takes place.
Events are fantastic for a lot of things.
When you're importing data, you can pull data from a specific file path on a schedule.
You can build reports that are exported to a file on a schedule.
You can do it daily, weekly, monthly, yearly, really whatever you'd like.
It's just super helpful for automation in general.
Now let's say the Pawnee council comes up with some new legislation.
They need to save some money, especially in the Parks and Rec Department, we're just spending too much or they're spending too much.
And what they wanna do is retire people who are over the age of 60 immediately and give them lifetime pay.
So what we want to do is create an event that checks it, let's say every month or every day.
And then if they're over a specific age, we are then going to delete them from the table and they will be retired.
This is a fake example, so go with it.
So what we're going to do is come right down here.
We'll select everything from employee demographics.
And let's run this.
And let's pull this up.
So let's say if they are over the age of 60, which unfortunately is Jerry Gergich, like I don't make the rules.
But if they're over the age of 60, they're going to be automatically retired.
So let's come right over here.
We're gonna say create event, and we'll call this delete underscore retirees.
Now before when we were creating the trigger, we were saying based off of a specific event, but here we're going to schedule it.
We're gonna say on schedule, and then we're gonna say every, and we could do one month.
Maybe we'll look every single month, but here, let's do every 30 seconds, so every 30 second.
Now we're gonna go down, we'll say do and this is gonna say here's what needs to happen every 30 seconds.
So we'll say begin and end.
Now what's gonna happen every 30 seconds is we're just gonna start with a select statement and I'll just copy this actually.
We'll start with a select statement, but then we'll update it to a delete statement.
But we'll come right here.
We'll say where the age is greater than or equal to 60.
So if we just run this query right here, that's only one person, that's Jerry Gergich.
Now, if we wanna write this correctly, we'll do the delimiter.
We'll have the dollar signs up here, and we'll have the dollar signs right down here as well.
And we'll say delimiter back to a semicolon.
Now every 30 seconds, we don't want to select people who are that age, we want to delete.
So let's go right here, we're gonna change this, because now we know it should be deleting the right person.
And we're gonna go ahead and create this event.
Let's go ahead and run this.
And let's make sure it was created properly.
It looks like create event, zero rows affected.
This should be working.
Let's go back up to the demographics table and let's run this.
And let's pull this down and pull this up.
And as you can see, unfortunately, Jerry Gergich is no more.
You know, he's just too old and the Pawnee Council, they recognize that.
And so it wasn't my rule.
That was unfortunately Pawnee Council's rule.
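As a sketch, the finished event could look like this. It assumes employee_demographics has a numeric age column, and it only runs if the event scheduler is switched on.

```sql
DELIMITER $$

CREATE EVENT delete_retirees
ON SCHEDULE EVERY 30 SECOND
DO
BEGIN
    -- Runs every 30 seconds and removes anyone aged 60 or over.
    DELETE
    FROM employee_demographics
    WHERE age >= 60;
END $$

DELIMITER ;
```

The interval keyword is singular (SECOND, MONTH, YEAR), so a monthly check would be written as EVERY 1 MONTH.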
Now really quickly, if that did not work, let's say you couldn't create your event at all.
Let's go down here.
I'm gonna say show variables, and we'll run it just like this.
I'm gonna show you how you may need to fix this.
So we can filter it by saying like, and then we'll say event.
Do it just like this.
So I have event scheduler where the value is on.
If yours is off, which sometimes that can happen, you're just gonna set this to on.
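A sketch of that check is below. The percent-sign wildcard is an assumption about what was typed, and changing a global variable requires an account with sufficient privileges.

```sql
-- List variables whose names start with "event";
-- event_scheduler should show ON.
SHOW VARIABLES LIKE 'event%';

-- If it shows OFF, switch the scheduler on for the running server.
SET GLOBAL event_scheduler = ON;
```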
Now another issue could have happened, and I just wanna explore this for just one second.
You may not have permissions to delete things.
If you do not, come right up here.
Let's try to figure this out together.
It's actually edit, preferences, and I wanna say it's right here in the SQL editor at the very bottom.
Yeah, so safe updates, rejects updates and deletes with no restrictions.
This needs to be unchecked.
So go to preferences, go to the SQL editor, and down at the bottom, unclick this if that didn't work.
Now, if everything worked perfectly, you don't need to change a thing.
But if it didn't, I just wanted to work through some troubleshooting that you may just have to Google or ChatGPT or something to try to figure out.
So that is how we can create an event in MySQL to run on a schedule.
Now typically you wouldn't do it every 30 seconds, you'd do something like every one month or every one year or a longer time frame, but you get the picture of what we're trying to do.
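For the safe updates setting mentioned in the troubleshooting above, there is also a session variable that can be changed from a query window. This is a sketch of an alternative to the preferences checkbox, and it only affects the connection it is run in.

```sql
-- Allow UPDATE and DELETE statements that do not filter on a key column
-- for the current session only.
SET SQL_SAFE_UPDATES = 0;

-- Turn the protection back on when you are done.
SET SQL_SAFE_UPDATES = 1;
```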
Hello everybody and welcome to the very first project in the MySQL series.
Today we're going to be focusing on data cleaning.
Now, if you don't know what data cleaning is, it's basically where you get the data into a more usable format.
So you fix a lot of the issues in the raw data so that when you start creating visualizations or start using it in your products, the data is actually useful and there aren't a lot of issues with it.
So that's really what data cleaning is.
Now what we're about to do is create a database.
We're going to import a data set.
This is a real data set.
And what we're gonna do is we're gonna clean the data.
So I'm gonna show you and walk you through all the steps in order to clean the data.
The data set that we're gonna be working with will be in the GitHub.
So you can just go and download that.
I'll have a link somewhere in the description.
But let's get started.
First thing we're gonna do is create a new database.
So we'll go right over here to create a new schema.
And we're just gonna call this one, we'll do this as world underscore layoffs.
So if you can't tell already, we're gonna do world layoffs.
That's the data set that we're gonna be doing.
We'll just click Apply.
And that creates our world layoffs right here.
Now we're going to go into here.
There are no tables.
We're going to right click on tables and go to table data import wizard.
Now we haven't done this yet.
In this series, we haven't imported any data, but that's what we're doing here.
We're going to show you how to import data.
So we'll go ahead and click browse.
And as you can see right here, we have this layoffs data set.
Let's open this up.
and we're gonna click next.
And we're going to create a new table.
There's no existing table in this database.
You can drop it if it exists, if you'd like to, but it doesn't matter since this is new.
We're gonna go ahead and select next.
Now, right here is where you configure import settings.
Now, MySQL is gonna automatically assign a data type based off of the data in these columns.
So we'll take a look at the data later.
Now, there is one thing that you can take a look at real quick.
We have this date column.
Now in here, it assigned it as a text.
That's because of the format.
We are going to import this as the raw data.
We're not going to try to change anything in the import settings.
We're just going to assume this is how the data was in the table.
So I'm not going to change anything.
Although this may be something that you would want to change to something like a date time and go and fix that.
But we're going to import this as the raw data.
Let's go ahead and select next.
We're going to import it.
We just select Next.
Now this could take a little bit.
So while this is importing, I'm just gonna skip ahead.
This should take just a few minutes to import.
All right, this just finished.
Let's select Next.
And we imported 2,361 records.
Let's go ahead and select Finish.
And we can get rid of this.
And let's refresh this.
Perfect, we have our layoffs table.
So we'll select everything, and I'm gonna go and double-click on world layoffs because I don't want to write out the whole thing every time.
So we're gonna say from layoffs, and let's see what we get.
So let's take a look at the data that we're gonna be working with in this data cleaning project.
So this dataset is layoffs from around the world, starting I think 2021, and we'll take a look at that in this date column later.
But it has the company, so it has the company that did the layoffs, it has the location of where they are, what industry they are part of, how many they laid off,
the percentage that they laid off, so the percentage of their company, the date, the stage, which refers to the stage that the company is in, whether it's a Series B, post-IPO, they don't know.
Then there's the country, and then we have funds raised millions.
So we have a lot of information here.
And in the next project, we're gonna be doing exploratory data analysis.
So we're cleaning all of this data,
And then in the next lesson, we're gonna actually dive into it and try to find trends and patterns and all these other things.
So what we are going to do is we're gonna go through multiple steps.
Step number one is we are going to try to remove duplicates if there are any.
That is the first thing I typically do, especially if I know this data shouldn't have any duplicates or it'd be repetitive or unnecessary to have duplicates.
The second thing is going to be to standardize the data.
That just means that if there are issues with the data with spellings or things like that, we just want to standardize it to where it's all the same as it should be.
Number three is we'll look at the null values or blank values.
And there's a lot of null values in here.
There's even a blank value right here.
And we're gonna see if we can populate that if we can.
And there are times where you should, there are times where you shouldn't.
I'll kind of walk through that as well.
And lastly, we want to remove any columns and rows that aren't necessary.
And there's a few different ways to do that.
This one is a little bit, let me write this actually real quick, remove any columns.
So I'm just gonna say there are instances where you can do this.
There are instances where you shouldn't do this.
When you're working with massive data sets and you have a column that's completely irrelevant, completely blank, you don't have any ETL process that is required for it, you can get rid of it and it can save you time when you're querying your data.
Now, with that being said, and we'll talk about this later, in the real workplace, oftentimes you have processes that automatically import data from different data sources.
If you remove a column from the raw data set, that's a big, big problem.
So what we're going to do is something I would actually do in my real work, which is create some type of staging or raw dataset.
Let's say this one's our raw one.
And we could have even called this layoffs underscore raw.
We're going to create another one.
We're going to create a table.
So we'll say create table and let's call this one layoffs underscore staging.
And we literally just want to copy all of the data from the raw table into the staging table.
So we can do that really quickly by just saying like layoffs.
And if we run this and we refresh, you'll see we have this staging table.
And let's copy this. There we go. And we'll do layoffs underscore staging.
And so now we have all of the columns, and all we have to do is insert the data.
So we're just gonna say insert, then we're gonna say layoffs staging, go right here, and we'll select everything from layoffs.
And let's run this.
And if we select the table, we now have all the data over.
So super, super easy.
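Written out, the copy step might look like the sketch below. It assumes the import wizard named the raw table layoffs inside the world_layoffs schema, as described above.

```sql
-- Assumption: the raw table is world_layoffs.layoffs.
USE world_layoffs;

-- Create an empty table with the same columns and data types as the raw table.
CREATE TABLE layoffs_staging LIKE layoffs;

-- Copy every row from the raw table into the staging table.
INSERT INTO layoffs_staging
SELECT *
FROM layoffs;

-- Check that the data came over.
SELECT *
FROM layoffs_staging;
```

CREATE TABLE ... LIKE copies only the structure, which is why the separate insert is needed to bring the rows across.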
And now we have these two different tables.
Now, again, the reason we do this is because we're about to change the staging table a lot.
If we make some type of mistake, we want to have the raw data available.
This does happen.
This is something that you do in the real workplace because you're not gonna work on the raw data.
It just, you shouldn't do it.
It's not best practice.
So I'm gonna show you what I would actually do in my, you know, like a real job.
So that's what we're gonna do.
Now we're only gonna be working off the staging table, and we can copy this and make different tables for different things.
As long as we have our raw data, we can really do anything we want going forward.
And that's what we're gonna do.
So the number one thing we're gonna look at is to make sure that we are removing duplicates.
We wanna make sure we don't have any duplicate data in here, and if so, we're gonna get rid of it.
Now really quickly, if you did my Microsoft SQL Server project, we did something very similar, but we had an extra column over here that gave the unique row ID, which made it really easy to remove the duplicates.
Here, there is no identifying factor that's gonna be easy for that.
So I'm just gonna tell you up front, removing these duplicates is not gonna be easy, but we'll walk through it every step of the way.
So what we can do is try and do something like a row number, and we'll match it against all of these columns.
And then we'll see if there are any duplicates.
Now I'm just, we're starting off strong.
Okay.
We're jumping into kind of some of the more advanced things.
It does get actually easier as we go, but this is the actual order that I follow.
So I'm going to keep it.
So let's try to identify duplicates.
So let's copy this.
Let's pull this down, do underscore staging.
There we go.
Now what we can do is we can do row number and we'll do that partition by basically, we could do every single one of these columns.
That's kind of what we're doing.
So what we can do is we can say everything, then we can do a comma, and we'll say row underscore number, and it would be just like this, and we're gonna do this over, and we want to partition by all of these columns, essentially.
We could just do a few for now to see if we get any hits, and then we can look at that, but they're gonna be multiple companies that have layoffs in the same location and industry, although their total layoffs would probably be different, and the date would probably be different.
So if we do something like company, let's do industry, we will do total underscore laid underscore off comma percentage laid off and then let's do date.
Now I'm doing date with the back ticks because date is a keyword in MySQL.
If we do it like this, it just really makes it easy.
So we're going to partition by all of these things.
So let's do partition by and let's bring this down real quick.
So I'm just going to say over partition by, and we're going to call this as row underscore num.
Now let's try running this.
Let's see if it works really quickly.
It's important.
And over here, you can see that we have our row number.
Now these mostly are unique, and these all look unique.
I'm not gonna scroll through all of them, but we wanna be able to filter on this so we can filter where the row number is greater than one.
If it has two or above, that means there's duplicates.
That means there's an issue.
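As a rough sketch, this first attempt at numbering the rows could be written like this. The column names are taken from how they are read out in the video, so the exact spellings (for example percentage_laid_off) are an assumption, and ROW_NUMBER needs MySQL 8.0 or later.

```sql
-- Number the rows inside each group of matching values.
-- Any row that gets a number above 1 has at least one earlier row
-- with the same values in the partitioned columns.
SELECT *,
       ROW_NUMBER() OVER (
           PARTITION BY company, industry, total_laid_off, percentage_laid_off, `date`
       ) AS row_num
FROM layoffs_staging;
```

The backticks around date are there because DATE is also a keyword in MySQL.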
So let's go ahead and take this, and we'll put it into either a subquery or a CTE.
I'll create a CTE for this because it's really easy.
So we'll say with, and then we'll do duplicate underscore cte as, then we'll just do our parentheses.
We'll paste this in here, get rid of that right there, and now we're going to say select everything from this duplicate CTE, then we'll say where row underscore num is greater than one.
Now let's run this, and it's not a semicolon, let's run this.
And you can see that these ones have duplicates.
So these are our duplicates actually.
And we want to get rid of these exact rows.
Now just to confirm that these are the duplicates, let's look at this one.
I never heard of this company, but we'll take it really quick.
And let's select, we'll say where company is equal to, and we'll put in Oda.
So let's run this.
And it looks like these, no, no, no, no, these aren't duplicates.
That's a good thing we checked, okay?
Because it looks like these aren't the exact same.
Although they're very, very close, these technically are not duplicates.
So I'm glad we checked this.
We need to do this partition by over every single column.
That's what I'm realizing.
So we'll do company, comma, location.
I'm genuinely glad, it's good to make mistakes and figure things out as you go.
It really is important.
So company, location, industry, total laid off, percentage laid off, date.
Then we'll do stage, and then we'll do country, and then funds, underscore, raised, underscore, millions.
So we're changing the CTE to partition over everything.
So now let's run this.
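With the partition widened to every column, the CTE might look like the sketch below. The column list follows the names read out above; treat the exact spellings as an assumption.

```sql
-- Partition over every column so that only fully identical rows share a group.
WITH duplicate_cte AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY company, location, industry, total_laid_off,
                            percentage_laid_off, `date`, stage, country,
                            funds_raised_millions
           ) AS row_num
    FROM layoffs_staging
)
-- Rows numbered 2 or higher are the extra copies.
SELECT *
FROM duplicate_cte
WHERE row_num > 1;
```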
Okay, Oda is not in there.
That's the only one we checked.
But let's look at Casper.
I know this.
Aren't these the mattress people?
I didn't know they had layoffs before, guys.
All right, let's take a look.
It looks like this row and this row are duplicates.
These are our duplicates.
So we are going to want to remove only one of those.
We don't want to remove all of those.
So just looking at this one example, it looks like this query is working well.
So here's our duplicates.
Now we need to identify these exact rows.
We don't want to delete both of them.
When we looked at Casper, there's the real one that we want to keep.
Then there's a duplicate that we want to remove.
We don't want to remove both.
That would be bad.
Now, in MySQL, it's a little bit trickier to remove things than it is in something like Microsoft SQL Server or PostgreSQL.
They have different ways that they can delete rows.
For example, in Microsoft SQL Server, we could literally identify these row numbers in the CTE and delete them from it, and it would delete it from the actual table.
We can't do that in MySQL.
And I'll show you.
Let's actually copy this.
We'll go like this.
And we'll say, let's say we want to delete these.
We'll say delete from.
We're deleting this from where the row number is greater than one.
What am I writing right here?
Delete.
There we go.
So delete from this duplicate CTE where the row number is greater than one.
That's all these duplicates.
We want to remove them.
Let's try to do this.
Let's run it.
Let's go down.
If we look at the bottom, it says the target table duplicate CTE of the delete is not updatable.
So you cannot update a CTE.
A delete statement is like an update statement, essentially.
So what we are going to do is we're gonna do something a little bit different because this is how I would love to do it.
That makes it super, super easy to remove duplicates.
But that is not always the way that things happen in the real world.
I think what we should do is take this right here and let's run this.
We should take this right here and put this into, let's say, a staging two table, and then we can delete from it, because we can filter on these row nums and we can delete those which are equal to two.
So it's essentially like, you know, creating some type of table and then just deleting the actual rows.
So that's exactly what we're gonna do.
It's essentially just creating another table that has this extra column and then deleting the rows where that column is equal to two.
So, you know, somewhat fairly straightforward, but let's try it and let's see what happens.
So we're gonna come down, and what we need to do is create our table.
Let's try doing that from here. Let's right-click, Copy to Clipboard, Create Statement.
Let's see if this works. Perfect, that's exactly what I wanted.
Now all we're going to do is say we're creating the table layoffs staging 2.
Now this is a create table statement, and we're naming the columns and then we're also assigning the data types.
So we have all these things, but we want one more.
I'm gonna do a comma, and we wanna add row underscore num.
And that should be an integer data type.
So we'll just keep it just like this.
Let's go ahead and copy this, and let's run it.
See if it worked.
Bring this up.
Looks like it worked properly.
And let's go back up. I don't want to rewrite things that I don't have to.
Let's run this. So now we have this empty table.
So we want to insert this information right here. We're going to insert into, and then we'll do this right here.
So insert into staging 2.
Now let's try to run this, see if it works. And let's select that table.
And now we have it.
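Here is a sketch of those two steps together. The data types in the create statement are an assumption about what the import wizard assigned; use whatever the copied create statement actually shows for your table.

```sql
-- Assumed column types; the real ones come from the copied CREATE statement.
CREATE TABLE layoffs_staging2 (
    `company` TEXT,
    `location` TEXT,
    `industry` TEXT,
    `total_laid_off` INT DEFAULT NULL,
    `percentage_laid_off` TEXT,
    `date` TEXT,
    `stage` TEXT,
    `country` TEXT,
    `funds_raised_millions` INT DEFAULT NULL,
    `row_num` INT  -- the extra column added by hand
);

-- Fill the new table with every staging row plus its row number.
INSERT INTO layoffs_staging2
SELECT *,
       ROW_NUMBER() OVER (
           PARTITION BY company, location, industry, total_laid_off,
                        percentage_laid_off, `date`, stage, country,
                        funds_raised_millions
       ) AS row_num
FROM layoffs_staging;
```

Because row_num is now a real column in a real table, it can be used in a delete, which the CTE did not allow.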
So let's pull this back up and I'll walk through what we just did, because I know I'm going quick, but we have so much to cover in this lesson.
So we just inserted basically a copy of all these columns, but in this new table, we added one more, the row num.
So now we can filter and we can say where, I need to spell that right, where row underscore num is equal to two, or we should say greater than one, because some might have multiple duplicates.
There you go, here are our duplicates.
Now we're going to delete these. So all we have to do is come right back down, where'd I go, copy this, come right back down here, and we're just gonna say delete from.
We just did a select statement. I always recommend doing that to identify what you're deleting, then you change it to delete.
And now if we run this, go.
And I'm actually going to keep this.
Let me see.
There we go.
And let's run it again.
And now they're gone.
And if we say just the whole table, this looks wonderful.
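The select-then-delete pattern, sketched out, assuming the table is named layoffs_staging2 and has the row_num column:

```sql
-- First look at exactly what is about to be removed.
SELECT *
FROM layoffs_staging2
WHERE row_num > 1;

-- Then swap SELECT * for DELETE, keeping the same filter.
DELETE
FROM layoffs_staging2
WHERE row_num > 1;

-- Confirm the duplicates are gone (this should return no rows).
SELECT *
FROM layoffs_staging2
WHERE row_num > 1;
```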
Now, this row num is going to be a column at the end that we probably don't need anymore, right?
It's a redundant column.
It adds up extra space and memory and storage and all these other things and processing times.
We're just going to get rid of it.
That'll be at the very end, I'm sure.
So it looks like we are good to go.
That's how we remove duplicates.
Now, there are different ways to do it when you have different columns.
Like if you have a unique column over here, makes it so much easier.
So, so, so much easier, but we didn't have that.
So we had to kind of do a workaround.
Welcome to the real world.
Now, let's look at standardizing data.
So standardizing data is finding issues in your data and then fixing it.
So I'm already noticing right here, it looks like we have a space at the beginning.
We could easily just do a trim on this column.
I don't even think I did this when I wrote out all the scripts for this.
Let's just do from this table, why am I writing it all out again?
We actually want to select the company, or actually we'll just do distinct company.
Let's run this.
And if we do a trim around this, let's run this again.
And that looks better.
So if we do company, comma, and then we'll just do the trim.
I don't want to, we don't need to do this thing right now.
We'll do the company.
This just looks better.
So we're going to update that.
It's super easy now.
If you ran into an issue just a second ago, I may need to help you change that.
So if you couldn't update or delete those things earlier, I should have told you this earlier, I apologize.
All you need to do is go to Edit, go to Preferences at the very bottom, go to SQL Editor, go all the way down to the bottom, and right here we have Safe Updates.
If you have this selected, that means you can't run an update or delete unless it filters on a key column.
That's a problem.
So what you need to do is unselect it like I have it and save it.
You may have to even restart MySQL Workbench or reconnect in order for those changes to take effect, but then you should be able to update that.
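As an alternative to the Preferences checkbox (not what is shown in the video), safe updates can also be switched off for just the current connection with the sql_safe_updates session variable. A minimal sketch:

```sql
-- Turn safe-update mode off for this session only.
SET SQL_SAFE_UPDATES = 0;

-- ... run the UPDATE or DELETE statements here ...

-- Optionally turn it back on afterwards.
SET SQL_SAFE_UPDATES = 1;
```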
Now, all we're gonna do is update this table, and we're going to set, and now we need to come back here, and we'll say we're gonna set the company equal to trim.
Now, if you don't know what trim is, or you haven't taken that lesson, trim just takes the white space off both ends, so it took the white space off the front here, and it would take it off the right-hand side as well.
So we're gonna update this, and let's do a semicolon, let's run this, let's select this again, and it was updated properly.
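The trim check and the update might look like this, assuming the working table is layoffs_staging2:

```sql
-- Compare the original value with the trimmed one side by side.
SELECT company, TRIM(company)
FROM layoffs_staging2;

-- Overwrite the column with the trimmed value.
-- TRIM with no other arguments removes leading and trailing spaces.
UPDATE layoffs_staging2
SET company = TRIM(company);
```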
So we're already off to a great start.
Now the next thing that I wanna take a look at is the actual industry.
Let's go back, copy this, and let's take a look at the industry.
So we'll do industry, and we'll run it.
Now if you look in here, there's a ton of different industries.
And there's marketing and marketing, oh, because I haven't done distinct.
Please ignore me.
Let's do distinct.
And there's a ton of different industries in here.
Transportation, healthcare, consumer.
There's a blank one, which we'll take a look at.
Aerospace, there's a lot of really unique ones.
Let's actually order this.
We'll do order by, and let's just do one, which is the first column.
We're just ordering it on itself.
So we have null, we have blank.
That's a problem.
We'll take a look at that later.
But this is an issue.
Crypto, Crypto Currency, and CryptoCurrency.
These are all the same thing.
These should all be labeled the exact same thing.
The reason we need to change this is because when we start doing the exploratory data analysis, visualizing it, these would all be their own rows, their own unique thing, which we don't want.
We want them all to be grouped together so we can accurately look at the data.
Let's take a look at any other ones.
FinTech and finance, that could be the same thing.
I'm not a hundred percent sure.
I'm not a FinTech person.
Um, I think for now, the only one that I'm confident in changing is this one right here, which is cryptocurrency.
So let's go ahead and update that.
So all we have to do, and actually let's select really quickly where it's like crypto.
So we want to select everything where the industry is like, and we'll just do crypto percent.
So it'll start with crypto, right, yeah.
We'll do crypto just like this, and let's run this.
And let's just take a look.
Lot of layoffs in the crypto industry.
Good night.
All right, let's find where it's cryptocurrency.
Okay, so even this one, it's crypto, and I know Gemini.
Crypto, crypto, and then it says cryptocurrency.
So these should be all crypto.
You see how 95% of them are crypto?
So we're gonna update these other ones.
Oh, this one is C-R-Y-P-T. Is that how you spell crypto?
Geez, I don't know anything.
All right, so we wanna update all of them to be crypto.
So what we're gonna do is we're gonna say update layoffs staging two.
We want to set the industry equal to crypto, just like this, where, and we can do it a few different ways.
We can say industry, I think we can do like.
Let's try this real quick.
Some of this stuff I don't have planned out.
I'm just kind of going with it as we go, which I like better.
You know, we kind of, we work together on this.
We figure these things out together.
That's what I like.
Um, then we'll do like crypto percent, just like this, exactly like we had it up here.
So if it's like crypto, it should be crypto.
Let's try this.
Let's see if it ran.
It may not have, I can't remember.
Yeah, it worked.
Okay.
So it updated, uh, three rows and that looks correct.
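A sketch of the check and the fix, assuming the label being kept is spelled Crypto and the table is layoffs_staging2:

```sql
-- See every row whose industry starts with "Crypto".
SELECT *
FROM layoffs_staging2
WHERE industry LIKE 'Crypto%';

-- Collapse all the variants into the single label.
UPDATE layoffs_staging2
SET industry = 'Crypto'
WHERE industry LIKE 'Crypto%';

-- Confirm only one crypto label remains.
SELECT DISTINCT industry
FROM layoffs_staging2
ORDER BY 1;
```

The percent sign after Crypto matches anything that follows, so every variant that begins with that word is caught by the same filter.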
Now let's go back up and let's run this.
Now as we scroll down, they are all the exact same.
Beautiful, beautiful, beautiful.
So if we do a distinct industry again, let's get rid of this.
If we run this query and we scroll down, crypto is its own thing, beautiful.
And it looks great.
We can look at those later on how we can update those, but let's keep going.
Let's look at our whole table again.
And these blanks and these nulls are actually an issue.
We do need to deal with them, but my instinct is telling me go fix it.
But my tutorial side is saying, okay, stick with the tutorial, the order that we agreed on.
So let's go take a look.
We've looked at company, we've looked at industry.
Let's just real quick look at distinct location.
Now it's good to look at most of these things, right?
There could be small, tiny issues that you just never saw.
And we're just gonna order by, order by one, just do a real quick, just a scan to see if we find any issues.
Um, that could be an issue, but that could just be another language.
If I'm being honest, I don't know.
Um, as I'm just scrolling through here because I want to make sure, cause this is not something I had in my, uh, pre-written script.
This looks pretty good to me.
Um, let's do everything.
We'll run this and now let's look at country.
So we'll do distinct country and let's run this and let's scroll down.
Again, this is sometimes just what I actually do.
All right.
We got an issue right here.
Super common.
Somebody put a period at the end, some dingus, uh, and we're not going to judge that person.
I don't know who it was or who ruined this dataset, but, um, that's a problem.
So we're going to need to just update that.
It looks pretty simple.
Um, but I'll just say where country is equal to, or let's say like, and then I'll say like United States percent.
There we go.
And oops.
I want to say, select everything.
I just want to see, um, where it's at.
Oh, geez.
There's too many.
Let me see if I can spot it.
I can't spot it.
It looks like they're supposed to be United States, not United States dot.
That's the issue.
Um, we can easily, easily fix this and we can probably let's do, um, really quickly.
Let's do select, oops, select distinct, and then we'll do country, comma, and then we'll do a trim, because we wanna get rid of that one.
We'll do country.
Now, just doing the trim won't fix it.
Let's go to the bottom.
So doing the trim doesn't fix it, but here's what you can do.
It's a little trick of the trade here.
We're gonna do something called trailing, which means coming at the end.
So what's trailing?
The period from country.
Let's try running this.
Scroll to the bottom, and it fixed it.
So this is a little advanced little tidbit for the trim here.
We can do trailing from the country.
We're looking for something that's not a white space.
We're specifying we're looking for a period.
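The trailing form of trim could be checked with a query like this one, assuming the table is layoffs_staging2:

```sql
-- Plain TRIM only strips spaces; TRIM(TRAILING ...) strips the given
-- character from the end of the value instead.
SELECT DISTINCT country,
       TRIM(TRAILING '.' FROM country)
FROM layoffs_staging2
ORDER BY 1;
```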
So now what we can do is update this table.
And we'll set the country equal to, and we'll do it just like this, trim trailing period from country.
But we're only going to do it where the country is equal to, or actually let's say like, United States percent, like we had it before.
Just like this.
So let's go ahead and update this after I put my semicolon in.
Let's run this.
And let's run this again.
It shouldn't need to fix it anymore.
It's just one row.
That's perfect.
That's exactly what we wanted.
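The update for the country column, as a sketch under the same table-name assumption:

```sql
-- Strip the trailing period, but only on rows that start with "United States".
UPDATE layoffs_staging2
SET country = TRIM(TRAILING '.' FROM country)
WHERE country LIKE 'United States%';

-- Re-check the distinct values afterwards.
SELECT DISTINCT country
FROM layoffs_staging2
ORDER BY 1;
```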
Now, one thing that's really important, and this is a longitudinal, that's not the right word at all, give me a second.
I can't speak and write at the same time, so sometimes I just say dumb things.
If we wanna do, not longitudinal, but time series, that's the word I'm looking for.
If we're trying to do time series, exploratory data analysis, time series visualizations later on,
This needs to be changed.
Right now it's text and we can look at that.
Actually, let's refresh this.
We're not looking at staging, we're looking at staging two.
If we look at the columns and we come down here to date, it is a text column.
That's not good if we're trying to do time series stuff.
We wanna change this to a date column.
Now, how can we do that?
Let's take a look.
So let's do date, and let's not actually do it like that.
Let's do date with backticks.
So we're just gonna look at the date.
Now, let's change this, because we wanna format it how we wanna format it, which is month, day, year.
So, how can we do this?
Well, there's something that's very, very helpful, works perfectly in this situation, and is exactly what we're gonna do.
It's called string to date.
So we're gonna do STR, underscore, there it is right there, TO, underscore, DATE.
It literally helps us go from a string, which is a text, that's the data type, to a date.
So it's perfect.
Now, all we need to do is pass through two parameters.
We have to pass through the column, which is the date column, and then what format we want it in.
Now, if you haven't done date formats before, I'm gonna just kind of walk you through it while we're looking at it.
In order to format this properly, you use a percent sign, so it's gonna be a formatting for a month, a lowercase m. A capital M is something completely different.
I believe it's spelled out.
We can look at that in a second if we want to, actually.
And then we can do this right here, and then we'll do another one.
So we're formatting it in the way that we want it, but also converting it to an actual date column.
So now we want month, and then we want day, lowercase d.
We'll do a forward slash, and then another percent sign, and then a capital Y, which stands for, I believe, the four-digit year.
Let's look at this real quick.
So it worked perfect.
So it's taking in this format that it's in right over here and converting it into the date format.
So this is the standard date format that you're gonna find in MySQL.
Now let's see what happens really quickly just for fun.
Let's see if we do capital M.
It looks like that's not going to work at all.
Let's do lowercase y, and it just formatted it to 2020.
I think it took the first two numbers, it looks like.
That's because lowercase y is the two-digit year.
But if we keep it with the capital Y as we should, this looks perfect.
This looks exactly like what we're trying to do.
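For reference, the preview query might look like the sketch below. It assumes the text values are stored as month/day/four-digit year, which is what the format string describes.

```sql
-- %m = month as a number, %d = day of month, %Y = four-digit year.
-- The slashes in the format must match the slashes in the stored text.
SELECT `date`,
       STR_TO_DATE(`date`, '%m/%d/%Y')
FROM layoffs_staging2;
```

The format string describes how the text is currently laid out, not how the result should look; the result comes back in MySQL's standard year-month-day form.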
So you can mess around with it.
It depends on how the data is formatted in your original column when it converts it to the string to date.
And there's a lot of different stuff.
You should just look up date formatting in MySQL.
Really interesting stuff.
So we're going to update this date column to this, which is our new date column.
Let's go ahead and do that.
We're going to update the date column.
You guys should be getting used to this by now.
That's the whole point is getting used to doing these things.
So we're going to set date equal to, and then we're going to put in this right here, the string to date.
Go ahead and do this and let's run it.
Make sure it worked.
2,355 rows.
It looked like it did every single one, but let's go ahead and get rid of this and let's run it.
And it looks like it worked perfectly.
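The update itself, sketched with the same format string and table-name assumption:

```sql
-- Rewrite every text date in the standard year-month-day form.
-- The column is still a text column after this step.
UPDATE layoffs_staging2
SET `date` = STR_TO_DATE(`date`, '%m/%d/%Y');

-- Look at the result.
SELECT `date`
FROM layoffs_staging2;
```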
Now there were some nulls, it looks like, and that'll be something we have to look at later when we talk about nulls, but overall, I believe this looks proper.
Now if we refresh this and come down to the date, you'll notice it is still text.
Its data type is text, but now it's in the date format.
Now, that's really important, and maybe I should have shown that earlier, if I'm being honest.
If we had tried to convert it to a date column before, it wouldn't work.
It would give us an error.
You just have to trust me on that one, but now we can change it to a date column.
So let's do alter table.
Now only do this, never, ever do this on your raw table.
Only do this on things like a staging table, because we're about to completely change the data type of the actual table.
So we want to change layoffs staging two.
And then we're gonna come down here.
I want to say modify column.
And what column are we modifying?
It's this date column.
There we go.
And we want to change it to what data type?
A date.
And am I spelling this right?
Yeah, I just need a semicolon here.
Whenever I see an error, I always, you know, just look for the semicolons.
So let's go and run this.
And let's refresh, see if it worked.
And the date was changed to a date, which is perfect.
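The data type change might be written like this, assuming the table is layoffs_staging2 and every value in the column is already in year-month-day form or null:

```sql
-- Only do this on a staging table, never on the raw table.
ALTER TABLE layoffs_staging2
MODIFY COLUMN `date` DATE;
```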
That's all we wanted to do, just to set ourselves up really well for later.
Let's look at our table.
All right, this is very good.
So we fixed a few just issues with the company, I believe something with the industry or the cryptocurrency.
We changed the country.
I'm just gonna go ahead and tell you right now, this one we're not gonna look at until we look at the nulls and whatnot in just a second.
So we're not looking at that one yet.
And then we have this extra column that we've done.
So we've done a lot so far.
But the next thing in the process, step one was remove duplicates, step two was standardization, step three is working with null and blank values.
Now this is going to happen.
You're gonna have nulls and you're going to have blank values in here.
It's somewhere.
It's just going to happen.
And so we need to think about what we're gonna do with that information, whether we wanna make them all nulls, make them all blanks, try to populate that data.
Let's see what we're gonna do.
Let's start off with the total laid off.
We'll just do where total underscore laid underscore off is null.
So in order to look at the null, we say is null.
Let's try equal to null.
It's not gonna give it to us.
We have to say where it is null.
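A short sketch of the difference, assuming the table is layoffs_staging2:

```sql
-- Returns the rows with a missing value.
SELECT *
FROM layoffs_staging2
WHERE total_laid_off IS NULL;

-- Returns nothing: a comparison with NULL using = is never true.
SELECT *
FROM layoffs_staging2
WHERE total_laid_off = NULL;
```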
So we have these values.
These are completely null.
There's quite a few of them, but remember, this is still useful information.
But if they have two nulls, that probably is pretty useless to us.
That's something I think we'll take a look at in a little bit actually.
We'll say and, and we may save this query, percentage laid off is null.
So if they're both null, like these, these are all, I believe, fairly useless to us.
These might be ones that we remove, so let's actually look at this in step four, when we look at removing rows and columns.
But one thing we should take a look at, I remember this industry, let's do distinct industry.
This industry had some missing values, and let's take a look at that.
Okay, so we have a missing value and we have a null here.
So let's look at this query and let's say where industry is null or industry is equal to a blank, like this.
We'll select everything to run this.
All right.
So it looks like there are a few that are blank.
Now, what we can try to do is see if any of these have one that's populated.
Let's take Airbnb, for example.
Let's search for this really quickly.
And this is 100%, you know, it's just helpful.
It's really, really helpful to be able to populate data that is populatable.
Is that a word?
Let's try it.
So we'll say select everything, and I just want to do where, spell that right, where company is equal to, and let's do Airbnb.
There we go.
Let's run this.
And it looks like we have this one right here.
So for example, these, whether they have them or not, we're gonna try to populate these.
If this Bally's or Carvana or Juul had multiple layoffs, if these ones aren't blank, if they have one that's not blank, we should be able to populate it.
For example, not the one I was trying to do, if we look at Airbnb, this one has travel.
So we know this is the travel industry.
So we can populate this with Travel.
Again, we want this data to be the same.
So if we're trying to look at what industries were impacted the most, this row won't be in our output because it's blank.
We want that to be Travel to represent the data properly.
So we wanna update it.
So if this one has travel, we should be able to update this row with this travel right here.
So let's see how we can write this.
And let me give myself some rows right here.
All right.
Now what we're gonna need to do is try to do a join here.
So let's try writing it out in a select statement, and then we'll just change it to an update if it works.
So we're gonna select everything, and we're gonna do this from staging two, and we'll call this ST2.
And then we'll join on itself, because what we're gonna do is we're gonna check.
In this table, does it have one that is blank and not blank?
If so, update it with the non-blank one.
That's essentially, in layman terms, what we're trying to write, but writing it out could be a little bit more difficult.
So we're gonna join on itself, and we'll call this, let's actually call this table one, T1 and T2, because they're the exact same table.
And we'll do this on, and we're going to say t1.company is equal to t2.company.
So the company has to be the same.
That's important.
And we probably should do the location is the same as well.
Now we'll do and t1.location is equal to t2.location.
I'm imagining, you know, there's another Airbnb in, like, South America somewhere that's called Airbnb. I'm just imagining a scenario, right, where we have to think about different use cases rather than just large companies. So those other ones, they may have ones that are in different locations. We don't want those. We don't want to change them if they're not the same. So these are the same.
Now what we want to find is, we're gonna say, oops, we want to say where, then we'll do t1.industry is null.
And then we wanna check that t2.industry is not null.
We'll say and t2.industry is not null.
And let's just run this, let's see if we get anything.
So let's think this through, because we got nothing in our output.
We're selecting everything, we're joining on the company, and the company, and the location, where T1 industry is null, and T2 industry is not null.
Let's just get rid of this for a second, I just wanna see if this changes anything, it doesn't.
And it's possible, actually, that instead of doing is null, we could do or, and I'm glad we're walking through this, we can do or is equal to blank.
And let's try running this.
There we go.
Okay, so it looks like there's Juul, Carvana, and Airbnb.
These ones all have industries where it's null or blank, and an industry is not null.
So that's really good.
Now if we scroll over, see the industry here, this is our T1, this is our first table.
If we scroll over, I bet we'll see the T2 industry where it's not null.
Let's scroll over.
And here's our industry.
We have travel, transportation, and consumer.
So, this worked exactly as we had hoped.
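Written out, the self-join being described could look roughly like the following. This is a sketch, not the exact query on screen: `layoffs_staging2` and the column names are assumed from the narration, and only the two industry columns are selected to make the comparison easier to see.

```sql
-- Pair each row that is missing an industry (t1) with another row
-- for the same company and location that has one (t2).
SELECT t1.company, t1.industry, t2.industry
FROM layoffs_staging2 t1
JOIN layoffs_staging2 t2
    ON t1.company = t2.company
    AND t1.location = t2.location
WHERE (t1.industry IS NULL OR t1.industry = '')
  AND t2.industry IS NOT NULL;
```

Note that an empty string counts as "not null", so a blank t2 row still passes the last condition.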
I can even pull this up here just to show, kind of show you a little bit easier what that's doing.
And we'll do t2.industry.
This is kind of like what we're trying to do.
So if it's blank, this one is going to be populated into here if there is one that is not blank.
So that's essentially what we're gonna do.
Let's write the update statement and we're gonna see if it works.
We have to translate this to an update statement.
So we'll do update and we're gonna update this right here.
So we'll say update T1 and then we'll do the join.
right there.
And now we have to do a set statement.
So we'll set the t1.industry equal to, and I'll just copy this, t2.industry.
I just don't like, I don't like writing things out.
Then we say where, so we do this, just like that.
And let's add a semicolon.
Okay.
Let's confirm.
So we're updating this table T1.
We're joining on T2 where the company is the exact same.
We're setting T1 industry equal to T2 industry.
So the T1 should be the blank one.
So where the T1 industry is null or blank and T2 industry is not null.
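The update as narrated here could be sketched like this in MySQL's multi-table UPDATE syntax. Table and column names are assumptions taken from the narration, and the join uses only the company, as the recap says.

```sql
-- First attempt: fill a missing industry from another row of the same company.
UPDATE layoffs_staging2 t1
JOIN layoffs_staging2 t2
    ON t1.company = t2.company
SET t1.industry = t2.industry
WHERE (t1.industry IS NULL OR t1.industry = '')
  AND t2.industry IS NOT NULL;
```

Because a blank `t2.industry` is not NULL, a blank row can be matched with another blank row, which is one plausible reason this version changes nothing.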
Let's go ahead and run this semi-colon.
See if there were about three updated.
Yep, rows matched.
Zero were affected, though.
Let's go take a look.
We have to, let's run this query.
Looks like those are still null.
Let's run this.
That one is still blank.
Now let me think here.
I'm trying to think of why this didn't work.
And I wanna walk you through my thought process.
It is possible that because these are blanks and not nulls that it's not working.
I, and I will say that is something I typically do where I set these blanks to nulls first.
So let's actually try that and see if that changes anything.
I'm just going to update this.
I'm going to say set the industry.
equal to null.
We'll say where industry is equal to blanks.
So we're just changing it to null where it's blank.
Let's try this.
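A minimal sketch of that intermediate step, converting blanks to NULL so there is only one kind of "missing" to deal with (table and column names assumed from the narration):

```sql
-- Normalize empty-string industries to NULL.
UPDATE layoffs_staging2
SET industry = NULL
WHERE industry = '';
```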
Let's go back down here to our select statement.
So these are all nulls.
Okay, I think this is now going to work.
Because now you can see on this side, there's only one option for it to populate it.
Before there were those blanks, which I think was causing the issue.
Let's get rid of this part, because now we have no blanks.
And now let's try running this.
We're workshopping this on the fly, guys.
Let's see, three rows affected, hey-o.
All right, let's go see if it worked.
Let's run this query.
And we have none, that's perfect.
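With the blanks converted to NULL, the simplified update might look like the sketch below. Names are again assumed from the narration rather than read off the screen.

```sql
-- Second attempt: only NULLs remain, so the blank check is dropped.
UPDATE layoffs_staging2 t1
JOIN layoffs_staging2 t2
    ON t1.company = t2.company
SET t1.industry = t2.industry
WHERE t1.industry IS NULL
  AND t2.industry IS NOT NULL;
```

Now every candidate `t2` row carries a real industry value, so the rows that were missing one actually change.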
Let's look at Airbnb.
Hey, all right, all right, ran into some issues, but we worked through it, we figured out the issue, and now it's working properly.
And we can even come back up here to select everything.
And it looks like Bally's is the only one that still has a null.
Let's look up Bally's real quick.
And we'll say where company is like Bally.
Let's run this.
Yeah, and there's only one.
So there wasn't another row.
All these other ones, like Carvana and, I can't remember the other, Juul and Airbnb.
Those ones had an extra row.
They did multiple layoffs.
This one only did one layoff.
So we don't have another populated row where it's not null to actually populate the null row.
That's really all that happened.
That's why that worked that way.
So I'm really happy that worked.
Awesome job, guys.
I was starting to question myself.
Do I even know how to use MySQL?
I mean, I was really starting to question my abilities here.
Take a look.
I think that is all we're going to do for populating null values.
Now, here's why.
Things like total laid off, percentage laid off, funds raised, how are we gonna populate that with the data that we have here?
I don't believe we can.
Now we might be able to populate, oops, we might be able to populate some of this if we had the company total.
Like if we had the original total before laid off, because then we could do calculations like, oh, these companies went completely out of business.
That's not good.
At 1, that means 100% was laid off.
But if we had the total, they had 50 employees and 100% were laid off, we could populate the total laid off, whoops,
Did it again.
We could populate the total laid off by saying, if this is 50, 100% was laid off, that's 50 people were laid off.
We don't have that data, so we can't go and populate it, I don't believe.
Funds raised, we might be able to scrape some data from the web and populate this, but that's a totally different thing, not part of this project.
So, I think the data cleaning for the null values and blank values, I think that's gonna be done.
It's possible that the stage could be the same, and if you wanna go check, you can, but we're gonna keep chugging along because we wanna remove columns and rows that we need to.
Now, if you remember, we were looking at this before.
Did I save that query?
Let's go look.
Here we go.
Bring this down to the bottom.
All right.
These rows, let's really take a look at these and think about if this is gonna be helpful to us.
So what we are trying to do with this data in the near future is we're not just trying to identify a company or a location that had layoffs.
And maybe we are.
Maybe we are trying to do that.
But these have no layoffs and no percentage laid off.
So in my opinion, I don't know if these laid off any at all.
I believe that we can get rid of these.
Now, deleting data is a very interesting thing to do.
You have to be confident.
Am I 100% confident?
No, not really.
But I'm confident enough to know that what we're about to look at in the next one, we're gonna be using these total laid off a lot, percentage laid off a lot when we're looking at actually querying the data.
and doing some exploratory data analysis.
So we're gonna use these a lot.
I don't think these, I'm not even sure if these are accurate.
I'm not even sure if they actually did have a layoff.
It's saying they did, but it doesn't show if they laid off any.
So can we delete this?
Yes.
Should we delete this?
It's iffy.
I'm not 100% if I'm being completely honest.
And there's a lot of rows like that.
This could be like 100 or so.
I mean, I could run a query and run it, but it's not a big deal.
The point being, I don't think we need this information, so we're going to get rid of it.
If nothing else, just to show that you can do it.
So now we'll say delete, and then we'll do from here.
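The delete being set up here could be sketched as follows, assuming the table is `layoffs_staging2` and the two columns are named `total_laid_off` and `percentage_laid_off`:

```sql
-- Remove rows that carry no layoff figures at all.
DELETE FROM layoffs_staging2
WHERE total_laid_off IS NULL
  AND percentage_laid_off IS NULL;
```

Running the same WHERE clause in a SELECT first is a cheap way to see exactly which rows will go.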
There we go. So now we're going to delete these rows. Let's try to select them again, and they are gone. So we deleted the ones where the total laid off was null and the percentage laid off was null. I just can't trust that data, I really can't. And let's go back down, come right here, semicolon. Sometimes I have to walk myself through these things. All right, this row num.
I mean, come on.
We don't need that anymore.
Let's get rid of it.
So what we can do, now it's a little bit different syntax.
We want to drop a column from this table.
So we have to do the alter table again.
We're gonna alter table, layoff staging two, and then we're gonna say drop column and row underscore num.
If we run this,
Then we run the table again, should be gone.
And it is.
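For reference, dropping the helper column might look like this. The table name is assumed from the spoken "layoff staging two" and the column from "row underscore num".

```sql
-- Drop the row_num helper column left over from de-duplication.
ALTER TABLE layoffs_staging2
DROP COLUMN row_num;
```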
So this is it.
This is our finalized clean data.
Now in the next project, we're gonna be doing exploratory data analysis on this cleaned data.
We're gonna be finding trends and patterns and running complex queries.
It's gonna be phenomenal.
I'm super excited about it.
And I love this data cleaning one.
I made some mistakes.
I'll be the first one to admit, but cleaning data is not always a straightforward thing.
You have to kind of mess around with it, figure it out, and that's what we did.
Whoa, took a while.
So just to recap, we removed duplicates, we standardized the data, we looked at the null values or blank values, then we removed any columns or rows.
So we did a lot.
If you go back and you actually scroll through here and look at some of this code that we wrote, it's not super beginner stuff.
So if you're following along with these things and you are getting this project, this is a fantastic project to put on your portfolio.
I myself would put this project on my portfolio because it's a very, very relevant thing.
Hello, everybody.
In this project, we're going to be focusing on exploratory data analysis.
Now, in the first project, we worked with this exact data set and we cleaned up the entire thing.
And that was a really good project.
And it set us up to explore the data.
And with all that clean data, we'll be able to look at our data much better and find better insights while we are using it.
Now, normally when you start the EDA process or the exploratory data analysis process, you have some idea of what you're looking for.
Sometimes, not always.
And sometimes when you're exploring the data, you also find issues with the data that you then have to clean.
So even though I did a data cleaning video and then an exploratory data analysis video and they're kind of separate projects, sometimes those coincide together where you're exploring it and cleaning it at the same time.
Now, what we're going to be doing here with this dataset, we're just going to be kind of exploring it.
I don't have any agenda.
I don't have any, you know, one thing that I want to look at.
I just kind of want to look at everything and we'll kind of discover and go about things as we are learning and looking at this dataset.
We will, however, start off really simple with kind of the basics, work a little bit more towards the tougher stuff.
And then the end we'll have some more advanced things that I think will be really fun.
So with that being said, let's start off with kind of more easier things.
We'll kind of just ease our way into exploring this dataset.
Let's pull this down and let's copy this right down here.
Now we're gonna be working with this total laid off and percentage laid off, or most likely this total laid off quite a bit.
The percentage laid off isn't super helpful because we don't know how large the company is.
We don't have another column here that says, here's how many total employees they had.
And then, okay, they had a percentage laid off.
You know, we won't work as much with this one, but we'll work quite a bit with this total laid off.
Let's look real quick.
We can look at something like the max total, and I need to use a parentheses, max total laid underscore off.
And let's look at this.
So on one day there was somebody out there who had the max total laid off of 12,000 people.
That's a lot of people to lay off in one, you know, one go.
That's a lot.
Let's also take a look at the max and I think it was percentage laid off.
Let's run this.
And it looks like one.
Now one represents 100.
That means 100% of the company was laid off.
And that's, you know, that's not great.
It just means an entire company went under essentially.
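The two lookups above can be combined into one query. A small sketch, with table and column names assumed from the narration:

```sql
-- Largest single layoff, and the largest fraction of a company laid off
-- (a value of 1 means 100%).
SELECT MAX(total_laid_off), MAX(percentage_laid_off)
FROM layoffs_staging2;
```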
We can actually take a look at that because I'm interested to see, you know, if there's any companies I recognize or can see where, then you come right down here, where the percentage laid off is equal to one.
Let's go ahead and look at this.
And let's take a look.
So we have this ahead.
I'm just going to go through here and see if I recognize any of these.
I mean, in the crypto space, BlockFi.
I feel like I recognize that one.
I don't know.
Let's keep going.
Deliveroo.
It's not good.
They laid off, like, 120 people.
I'm just curious.
I mean, I'm just kind of scrolling through here trying to see if I recognize anything.
These are companies that like completely went under or lost all their employees.
Volt Bank.
Interesting, just interesting to me.
We're gonna be taking a look at a lot of stuff, but these are companies that completely went under.
That's unfortunate.
We can also order by total underscore laid underscore off, in descending, and we'll see which company that went under had the largest.
So this one had 2,000. This construction company had 2,400 people. They went under. It doesn't say what stage they were at, but that's in the United States. We can also take a look at, and there's another column over here called funds raised in millions, let's look at that one. So I'm gonna see, these are companies that had a lot of funding, or potentially a ton of funding.
Let's go over. This is like 2.4 billion dollars, I believe. Like, I think it's like a ton of money. Quibi, I believe I know this company. And BlockFi, I thought I'd heard of them, I'm pretty sure I know that. So Quibi is one that I'm definitely familiar with. It was like a short-form media company. Yeah. And there's Britishvolt, which looks like an electric company that went under.
So, you know, some big companies that went under in 2023, 2020, 2022.
So that's interesting.
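A sketch of the filter and sort just walked through. The column name `funds_raised_millions` is a guess from the spoken "funds raised in millions", and the table name is likewise assumed.

```sql
-- Companies that laid off 100% of staff, best-funded first.
SELECT *
FROM layoffs_staging2
WHERE percentage_laid_off = 1
ORDER BY funds_raised_millions DESC;
```

Swapping the ORDER BY column for `total_laid_off` gives the earlier view of which of these companies let go of the most people.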
So we have a lot of companies here and we're just looking at that had total laid off.
But let's take a look.
Let's use group by real quick.
I'm going to look at the company.
And I also want to look at the sum of the total laid off.
And for that, we need to use a group by the company.
And let's just start with this, and I'm sure we'll use an order by in a second.
Yeah, let's order by, order by, let's just do two for now, in descending.
And two stands for one, two, this is the total laid off.
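Roughly, the grouped query being built here would be (names assumed from the narration):

```sql
-- Total layoffs per company, largest first.
-- ORDER BY 2 sorts on the second selected column, the SUM.
SELECT company, SUM(total_laid_off)
FROM layoffs_staging2
GROUP BY company
ORDER BY 2 DESC;
```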
So for the total for this table, and we don't know how far back it goes, we haven't checked that yet.
We'll check that in a second.
But for this table, you should recognize a lot of these companies.
So I think it starts in like 2020 until like sometime in 2023.
But this is Amazon, let go of 18,000 people, Google 12,000.
I'm guessing that's at one time because that was the max that we looked at earlier.
This is Facebook or Meta, Salesforce, Microsoft, Philips, Uber, Dell, Cisco, Peloton.
I mean, these are a ton of big companies.
Carvana, they let go of thousands and thousands and thousands of people.
Twitter.
That's not surprising, given the change of things.
Groupon. A ton, a ton of people, or a ton of companies.
And that's a lot of people that have been let go.
Now let's really quickly, before we keep going, I wanna look at our date ranges real quick.
So let's select everything.
Whoops, we'll do from there.
And how do we wanna do this?
Let's do minimum of date.
And let me do it like this, date.
And then we'll do the max as well.
Cause I want to look at the date range that we have here.
Let's run this.
It looks like it starts on 2020-03-11.
So right when like, I believe the pandemic started or the COVID-19 started, I want to say that's like right when it hit at least us in the United States.
And then this is almost exactly three years later.
So early 2023.
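The date-range check might be sketched like this. The backticks around `date` are a precaution because the column name matches a SQL type keyword, and the table name is assumed.

```sql
-- Earliest and latest layoff dates in the table.
SELECT MIN(`date`), MAX(`date`)
FROM layoffs_staging2;
```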
So just in those three years, you know, here's some of what we're looking at.
These companies have let go of quite a few people or have had layoffs.
We can also take this exact thing.
Oops, what did I do here?
Copy this again.
We can also take this exact thing and look at quite a few other things.
There was the industry.
So we can look at industry, like what industry got hit the most during this time or had the most layoffs.
All we're looking at right now is total laid off.
We can also look at percentage in a little bit.
but it looks like consumer got hit really hard, retail really hard.
That makes a lot of sense with shops closing down because people couldn't come in for the coronavirus.
Now, we're just making assumptions, right?
But during that time, it was mostly COVID that impacted a lot of stuff.
Then we have transportation, finance, healthcare, food, real estate.
Yeah, there's a lot, a lot of people.
Let's look at the lowest ones.
Manufacturing, fintech, aerospace, energy, legal.
So low numbers on those, high numbers on these.
So really, really interesting.
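The same grouping pattern, pointed at a different column, might look like this (names assumed from the narration):

```sql
-- Total layoffs per industry, hardest-hit first.
SELECT industry, SUM(total_laid_off)
FROM layoffs_staging2
GROUP BY industry
ORDER BY 2 DESC;
```

Replacing `industry` with `country` or `stage` in both places gives the other breakdowns looked at in this project.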
Let's go back up.
Just want to look at our whole table really quickly, see what we got while we're looking at this stuff.
And let's run this.
And we looked at the company, looked at the industry.
I would really be interested to look at the country as well, which countries, at least from this data set, and we can copy, or we can go right here, country, because I believe that United States had the most.
Holy mackerel.
They had by far the most.
Then India. This is 256,000 people who lost their jobs.
I think we'll look at the dates in a little while, like a kind of like time series, like how many per year, per month, per day, or whatever we want to look at.
But goodness gracious, that's a lot of people within just three years in the United States.
India, Netherlands, Sweden, Brazil, Germany, United Kingdom, then it goes down and down and down.
But these are just reported from this data set that I had gotten.
So really, really interesting.
Good night.
The United States had much more than most for sure.
Let's actually look at that date real quick, or we can look at it by year.
So we have this date, and if we do it like this, and we can do it by date real quick.
So this is going to do it by individual date and let's order by let's do one.
This is the most recent date.
So it's literally by date that's reported.
We don't want that.
Let's do it by the year.
So 2020, 2021, 2022, 2023.
We can do that fairly easily.
We'll use this year function and we'll group by the year as well.
Let's try running this.
There we go.
It looks like in 2020, 80,000 people, 2021, 16,000, 160,000 in 2022.
This looks like the worst year.
And then it's only, we only have three months of data in 2023.
There's 125,000, holy smokes.
So in 2023, it looks like we're ramping up, because I'm recording this in 2023, about a month after this dataset, that we got this dataset.
There's 125,000 people around the world, you know, but just in those first three months.
So this is going to be a lot higher than even 2022.
That's pretty wild.
Very, very interesting.
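A sketch of the per-year rollup, assuming the `date` column holds a real DATE value so that MySQL's `YEAR()` function applies:

```sql
-- Total layoffs per calendar year, most recent year first.
SELECT YEAR(`date`), SUM(total_laid_off)
FROM layoffs_staging2
GROUP BY YEAR(`date`)
ORDER BY 1 DESC;
```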
One other one, while we're looking at Group By, there was a column, and you can go back and look at it if you'd like, but it's called Stage.
And this shows the stage of the company.
And if we run this, and we're all just looking at total laid off, but if you look at the stage of the company, this is like the different series that they're in, A, B, C, D. A, I believe, is like a Series A funding.
That's like a super, super starting, oh, this is like a seed phase.
Then there's Series A, and then it goes up, up, up, up.
until usually they go, like they do IPO or they get acquired or something.
Now, if we go up here and we do two descending, I wanna see which one had the most.
So this is post IPO.
This is the Amazon, the Googles of the world, the large, large companies that are post IPO or initial public offering.
Then there's unknown.
We don't know which that is.
A lot of layoffs from acquisitions.
C, D, B, all the way down.
So it looks like most of it's coming from these ones right here.
Really, really interesting.
Let's go look at percentages.
I'm just going to literally...
I'm trying to say literally.
I'm going to literally copy these.
And with percentages, I don't think...
Let me look at percentage laid off.
I don't think the sum is going to be a good indicator.
I don't know if this is a good one to even look at because, and then we're looking at company right now because percentages refer to a percent of the company, right?
So we don't have hard numbers because we don't know how large these companies are.
So now that we're actually looking at this, this percentage laid off isn't super relevant.
Really, the one that's a better use for what we're looking at is this total laid off, because again, we don't know these sums. We could look at, like, the average, right? But again, that just doesn't help us that much, I don't think. I don't think we're gonna really dive into that too much, is my feeling.
Now, one thing that I would be really interested in is to kind of look at the progression of layoff, right?
You could call this a rolling sum.
So start at the very earliest of layoffs and do a rolling sum until the very end of these layoffs.
And let's go to the bottom.
This is where it's going to start getting a little tougher.
And, you know, we're just doing a little bit of exploratory data analysis, digging into this a little bit. You can go and dig into this as much as you'd like. You don't have to just do what I'm doing, but I'm just trying to show you some stuff. Now let's try to do rolling total layoffs. We could do that on the day, although I feel like that's going to be way too many rows. Let's do it based off the month, so right here in this month.
Now let's see if we do just the month, let's do something.
I'll show you the month and that's going to be an issue.
And I'm in my head.
I already know, but let's look at it.
We could do something like select, um, from, and let's get this.
There we go.
So if we do, we'll do substring, let me add a semicolon.
Let's do substring and we wanna pull out this month right here.
So we'll go one, two, three, four, five, six.
So start at position six and this is of course in the date column.
We'll start at position six and then we'll take two.
Let's just run this really quickly.
And there's our month.
So this, we could do this as month, right?
Or like this.
Is that correct?
Yeah.
So as month, this is our month that we're doing it.
Now, if we group on this and we do like something like a sum of total laid off, I think that's the column.
And then we do a group by on this month.
So it'd be like this right here.
We'll do group by this.
Let's try running this.
We should be able to do month as well.
Let's try this real quick as well, because I don't want to have this if I don't have to.
Run it, perfect.
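The month-only version might be sketched as below. It relies on the date reading as `YYYY-MM-DD`, so characters 6 and 7 are the month. The alias `month` and the table name are taken from the narration and should be treated as assumptions.

```sql
-- Pull the two-digit month out of the date and total layoffs per month.
-- MySQL allows the alias to be reused in GROUP BY.
SELECT SUBSTRING(`date`, 6, 2) AS `month`, SUM(total_laid_off)
FROM layoffs_staging2
GROUP BY `month`;
```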
So the months right here don't show us the year.
So if we're trying to get a rolling total of just the month, it's actually, it would work fine when we actually implement the rolling total, use a window function.
But the issue with this is it's just gonna show us months.
So this is 2020, this is January of 2020, 2021, 2022, 2023, any other years we have it, this is not a great rolling total.
What if we did one all the way to, I want to say it's seven, six, seven.
Let's try this.
Now this is going to give us a much better, and let's order this, order by one ascending.
This is just our first column.
So now, well, we should do it where it's not.
Give me a second.
I'm figuring this out as we go.
We'll do where the month, write that, where the month is not null.
I'm just going to get rid of that one.
And of course that doesn't work because we're looking at the substring.
So let's try doing this.
There we go.
It just wasn't reading in that month that I was trying to use.
Let's go down.
Now here's what we're gonna do, is we wanna take it from the very first month, and we're grouping everything.
So these are all the layoffs from 2020-03.
So that's March of 2020.
Then we have April, May, and these are the layoffs.
So this is really good.
This is exactly what I was imagining in my head.
So we want this, and this is just 12 months in a year, and we go all the way to the bottom.
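A sketch of the year-and-month grouping arrived at here. The WHERE clause repeats the substring expression because the `month` alias is not visible at that stage of the query. Names are assumed as before.

```sql
-- Total layoffs per year-month (e.g. '2020-03'), oldest first.
SELECT SUBSTRING(`date`, 1, 7) AS `month`, SUM(total_laid_off)
FROM layoffs_staging2
WHERE SUBSTRING(`date`, 1, 7) IS NOT NULL
GROUP BY `month`
ORDER BY 1 ASC;
```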
And I wanna do a rolling sum of this.
So let's see how we can do that.
And we'll use this logic in a little bit.
Let's copy this.
and let's do select everything, we'll do right here.
Now what we actually wanna do, now that I'm thinking about it, is we wanna take this data, and we wanna do the rolling sum based off this exact thing.
So we actually need to take this, let's get rid of this, and we'll do it with a CTE.
So we'll say with, and we'll do rolling underscore total, that, we'll say as, and then we'll put this in here,
just like that.
So with rolling total as, now we're gonna say select, and we'll just do from here.
Now what we need to do is we need to select the month, so let's go ahead and select that month, and we'll take it just like this.
So we'll select the month, and we need to do a rolling total.
All we have to do for that is the sum of which column we're doing, let's actually change this real quick.
We're gonna call this as total off.
I'm just gonna keep it simple. So the sum of total off. So now we're doing that, but we want to do it over, and all we need to add into here is an order by. We're not gonna partition by anything, because in here we already did a group by, so it's, you know, kind of like partitioning. We just need to order by the month, I believe. So let's try that.
And let's run it.
Let's do that.
And we actually need to, since we're doing this, we need this at the end.
And we can rename this if we'd like.
So we can do this as rolling underscore total lowercase.
Now let's try running this.
And let's see what we get.
Okay, and this looks correct.
So starting in 2020-03, we had 9,000 layoffs.
Then the next total we added onto here.
Now, this visually isn't the best.
I would like the month right here as well.
So let me actually add, let me create its own row, put a comma here.
Then right here, I want to keep this total off so we can visually see better, much better.
Okay.
So we have the month, and as it goes down, we're having more laid off.
Now this is our rolling total.
Here's essentially how this works.
It starts with 9,628, then it adds on the next month, which is 26,000, which equals 36,000.
Then it adds on the next month, and we get 62.
Adds on the next month, 69, right?
It keeps going all the way down.
This just shows each month how many were laid off,
And this shows a month-by-month progression all the way down to the bottom.
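Put together, the rolling total could look roughly like this. It is a sketch that needs MySQL 8.0 or later for the CTE and window function. The names `rolling_total`, `total_off`, and `layoffs_staging2` follow the narration and are otherwise assumptions.

```sql
-- Step 1 (CTE): layoffs per year-month.
-- Step 2: running sum over those months, oldest to newest.
WITH rolling_total AS
(
    SELECT SUBSTRING(`date`, 1, 7) AS `month`,
           SUM(total_laid_off) AS total_off
    FROM layoffs_staging2
    WHERE SUBSTRING(`date`, 1, 7) IS NOT NULL
    GROUP BY `month`
    ORDER BY 1 ASC
)
SELECT `month`,
       total_off,
       SUM(total_off) OVER (ORDER BY `month`) AS rolling_total
FROM rolling_total;
```

With only an ORDER BY inside OVER, each row's sum covers every month up to and including the current one, which is what makes it a running total.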
So let's just take a look.
In 2020-03, we had 9,000.
By the end of 2020, we had about 81,000 or so.
Then at the beginning, right here, all the way down to 2021.
By the end of 2021, we only had 96,000.
So 2021 was a good year, it looks like, comparatively.
We had 90, 80, well, let me see.
We had 81,000 people let go, and here we only have 96,000 let go, so that's, what, from 81, it's only 15,000 people. That's like nothing, comparatively. Then in 2022, things start ramping up dramatically. It looks like we have 12,000 people, 17,000, 16,000, and they're adding up. It's going from 97 all the way up to, good night.
Right before the holidays in 2022 of this past year, I mean, we had 247,000 people.
So that's like 130-some thousand.
My math's really bad.
It's like 150,000.
And then we only have, oh, we have even more here actually.
And then we only have the first three months of 2023.
So these months right here were really devastating, just around the world.
Now we can also break this out potentially by country.
So we can see how many per country, but this is just around the world.
That's a lot of people losing their jobs, all the way up to 383,000.
So in this range,
383,000 from March of 2023 all the way back to March of 2020 lost their jobs.
And this is just reported.
I'm sure there was, you know, much more than that.
This is at like companies, larger companies that have like Series A funding, IPOs, et cetera.
But a lot of small businesses went out of business.
So we don't have that information in this data set.
So I think that's what we're going to do next is kind of look at the company, maybe, because I'm always interested in the company.
Actually, earlier, let's not do that one.
Earlier, we were looking at the company, the sum of total laid off.
Let's bring this down.
Let's run that.
That's what rolling total is, by the way.
Rolling totals are great.
Really good for visualizations as well.
Let's see. Yeah, so I want to take a look at these companies, but I want to see how much they were laying off per year. So instead of just looking at it as a total, we'll break it out by the year. Now I'm just going to warn you, this is most likely going to be our last one in the lesson, and this is going to be probably our hardest one yet. Potentially. We'll see. Maybe the other one earlier was harder.
Now let's use this kind of as a starting point.
But what we're gonna need to do is we wanna take the company, but I also want the date.
So I need to do a comma and then date.
So we need our date here and I'm gonna do that.
I need to group by the date as well.
So we'll do date and let's run this.
All right.
Now this is just doing the, you know, company and the exact date. We don't want to do that. Let's actually do the year. Let's just look at the year, I think that'll be plenty. You can also do the exact same thing as we did above with the substring, although I think that's going to get a little messier. Just a thought. Let's run this. Okay, so now we're looking at just the year, we're grouping by year. Let's order by, let's say, the company, and we'll do that in ascending.
There we go.
And let's run this.
So now we have it open.
You can see people who made multiple layoffs.
This is in 2020, they let go of 200.
And then in 2023, they let go of 155.
This is a company I've never heard of.
So this is already looking really good.
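As a sketch, the company-by-year breakdown might be written like this (table and column names assumed from the narration):

```sql
-- Total layoffs per company per year, listed alphabetically by company.
SELECT company, YEAR(`date`), SUM(total_laid_off)
FROM layoffs_staging2
GROUP BY company, YEAR(`date`)
ORDER BY company ASC;
```

Changing the last line to `ORDER BY 3 DESC` sorts on the summed column instead and brings the largest yearly layoffs to the top.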
Now let's say we wanted to use this.
And what we wanna do is we want to rank which years they laid off the most employees. Now this is just a small sample. We'll look at more in just a little bit. We can actually look at, let's just do three descending, just like this. Should be large companies.
So some of these companies like Microsoft, even Amazon right here and Amazon right there, they let go of multiple or thousands of people in different years.
So I wanna rank those.
I wanna say the highest one based off of the laid off should be ranked number one.
That's the year that they laid off the most people.
So let's go ahead and try to do that.
And we need to do is do a CTE.
We'll start with that.
Let me add some more things down here so we're good to go.
So let's do, we'll do with, let's do company.
So this is gonna be the company underscore year.
We'll do it as, and that's what this is gonna be.
This is our company year.
And we can do select everything from company year.
It's gonna be the exact query that we're looking at.
Go ahead and run this. Okay, so this is good. Now I do want to change these columns, and I can do that right here. We'll do company, let's call this years, and then I'm going to do total laid off again, so total, underscore, laid, underscore, off. This is the sum, right? Total laid off per year. Let's go ahead and run this now.
There we go, we have company, years, and total laid off.
So this looks much better.
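A sketch of the CTE with the renamed columns. The column list after the CTE name supplies the new names in order; layoffs_staging2 is an assumed table name.

```sql
-- The (company, years, total_laid_off) list renames the CTE's output columns.
-- layoffs_staging2 is an assumed table name.
WITH Company_Year (company, years, total_laid_off) AS
(
    SELECT company,
           YEAR(`date`),
           SUM(total_laid_off)
    FROM layoffs_staging2
    GROUP BY company, YEAR(`date`)
)
SELECT *
FROM Company_Year;
```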
And what we're gonna do is select everything, but we want to partition it, probably based off this years right here, and then we wanna rank it based off how many they laid off in that year.
So we'll get to see who laid off the most people per year, because some companies like Amazon, they laid off people in multiple years, but was it the highest per year?
That's kind of what we're gonna look at.
So we'll do dense underscore rank.
And we're gonna do that over.
Now we're gonna partition by, oops, that's not how you spell partition.
Partition by, we wanna partition by the years.
So all of the 2021 layoffs will be in the same partition.
All of the 2022 will be in the same partition.
And we'll do years.
And we want to also order by the total laid off.
Now we wanna do that in descending.
So we'll do total laid off descending.
And then we want to add this dense rank to it.
So let's try it.
Let's run this.
Good night.
That's a big one.
So let's take a look.
So in 2021, it looks like, or 2020, it looks like Uber had the highest.
And we want to take out these nulls.
So let's do where, years,
Let's say is not null.
And let's run that.
Here we go.
So in 2020, and that's what we're partitioning on first, it looks like this is one, two, three.
These are the top ones.
And let's order by, and let's do the, let's order by the rank first.
Let's call this as, bring it down.
What do we want to call this?
We'll call this as ranking.
There we go.
Order by ranking ascending. There we go. Now we have our ranking. So in 2020, this is the biggest one of the layoffs. 2021, this is the biggest layoff; I guess we'll have to take a look. Meta in 2022, they had the biggest layoff, and Google had the biggest layoff total for 2023. So this looks correct.
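Putting the ranking steps together, the query might look roughly like this. It requires MySQL 8.0 or later for window functions, and layoffs_staging2 is an assumed table name.

```sql
-- Rank companies within each year by total layoffs, largest first.
-- layoffs_staging2 is an assumed table name.
WITH Company_Year (company, years, total_laid_off) AS
(
    SELECT company,
           YEAR(`date`),
           SUM(total_laid_off)
    FROM layoffs_staging2
    GROUP BY company, YEAR(`date`)
)
SELECT *,
       DENSE_RANK() OVER (PARTITION BY years ORDER BY total_laid_off DESC) AS ranking
FROM Company_Year
WHERE years IS NOT NULL   -- drop rows with no date
ORDER BY ranking ASC;
```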
But I kind of want to filter on this ranking to be able to only filter maybe the top five companies per year.
And I think we can do that.
Let's actually get rid of this.
I think what we should do is we should add this as another CTE and query off of that.
So now we'll call this Company_Year_Rank: company, underscore, year, underscore, rank.
So now we have the year rank as...
and then our query. So now this is our Company_Year_Rank. So now if we do select everything from Company_Year_Rank and we run it, okay, now we have our rankings. Let's come down.
Now we have our rankings, but I just want to filter it based off of that ranking.
We'll say where ranking is greater than or equal to, let's say five.
We'll look at the top five rankings.
Let's run this.
And I said greater than; I wanted less than or equal to on that.
That's looking good.
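Here is a sketch of the finished query with both CTEs. The window function's result cannot be filtered in the same SELECT's WHERE clause, which is why the ranking lives in a second CTE and the filter is applied on top of it. The table name layoffs_staging2 is an assumption.

```sql
-- First CTE: total layoffs per company per year.
-- Second CTE: rank companies within each year.
-- Final query: keep the top five ranks per year.
-- layoffs_staging2 is an assumed table name.
WITH Company_Year (company, years, total_laid_off) AS
(
    SELECT company,
           YEAR(`date`),
           SUM(total_laid_off)
    FROM layoffs_staging2
    GROUP BY company, YEAR(`date`)
),
Company_Year_Rank AS
(
    SELECT *,
           DENSE_RANK() OVER (PARTITION BY years ORDER BY total_laid_off DESC) AS ranking
    FROM Company_Year
    WHERE years IS NOT NULL
)
SELECT *
FROM Company_Year_Rank
WHERE ranking <= 5;
```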
Okay, so really quickly, in 2020, these are the top five companies that laid people off.
Uber, Booking.com, Groupon, Swiggy, Airbnb.
In 2021, the largest layoff was ByteDance, which I think is TikTok, right?
Katerra, Zillow.
Yeah, these are the top five.
So 2021 or 2022 and 2023 were definitely the largest as well.
We have Meta, 11,000 people, Amazon, Cisco, Peloton, and Carvana, as well as Philips.
They tied.
That's why we have the dense ranking because some of these will be ties.
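To see how ties behave, here is a small self-contained sketch. The companies and numbers are made up purely for illustration and are not from the dataset. It compares DENSE_RANK() with RANK() on the same rows.

```sql
-- Made-up rows (not from the layoffs data) with a tie at 400.
WITH demo_layoffs (company, total_laid_off) AS
(
    SELECT 'A', 500 UNION ALL
    SELECT 'B', 400 UNION ALL
    SELECT 'C', 400 UNION ALL
    SELECT 'D', 300
)
SELECT company,
       total_laid_off,
       DENSE_RANK() OVER (ORDER BY total_laid_off DESC) AS dense_ranking,
       RANK()       OVER (ORDER BY total_laid_off DESC) AS plain_ranking
FROM demo_layoffs
ORDER BY total_laid_off DESC, company;
-- dense_ranking: A = 1, B = 2, C = 2, D = 3
-- plain_ranking: A = 1, B = 2, C = 2, D = 4
```

With DENSE_RANK() the numbering continues without a gap after a tie, so a "top five" filter can return more than five rows in a year, as with Carvana and Philips here.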
Then we have Google.
In 2023, all the way down to Dell.
These are all ones I know, Microsoft, Ericsson, Amazon, Salesforce, and Dell.
So this is really, really interesting.
Just looking at a year by year snapshot, right?
These are the total laid off for each company.
And we could even go back and change this for like industry or, you know, really whatever we want to change this to.
This is just an interesting query in general to look at, you know, per year.
And we could go back and change it for a month or, you know,
Lots of stuff we can change in here, but this is really interesting to me.
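As one possible variation, here is a sketch that swaps company for industry. It assumes the table has an industry column and is named layoffs_staging2; neither is shown in this part of the walkthrough.

```sql
-- Same pattern, ranking industries instead of companies within each year.
-- layoffs_staging2 and the industry column are assumptions.
WITH Industry_Year (industry, years, total_laid_off) AS
(
    SELECT industry,
           YEAR(`date`),
           SUM(total_laid_off)
    FROM layoffs_staging2
    GROUP BY industry, YEAR(`date`)
),
Industry_Year_Rank AS
(
    SELECT *,
           DENSE_RANK() OVER (PARTITION BY years ORDER BY total_laid_off DESC) AS ranking
    FROM Industry_Year
    WHERE years IS NOT NULL
)
SELECT *
FROM Industry_Year_Rank
WHERE ranking <= 5;
```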
It just looks like a lot of the large tech companies took some big Ls, took some big hits.
Let's recap this query really quickly in case it's tough to follow.
But we created this query up here, and we were looking at the company by the year and how many people they let off.
Then right over here, we said with the company year, we change these columns.
This is our CTE.
So we created our first CTE.
Then we went and gave it a rank, and we wanted to filter on that rank, so we made this ranking query another CTE. We just did a comma and added a second CTE, and we queried off the first CTE, the company year, which is right here, to make this second CTE. And then finally we queried off of the final CTE.
Definitely not an easy query to kind of think through and walk through, but I hope you're able to follow because that's a really good query.
This is something I've definitely done in a real job when I was working with a lot of healthcare data.
This is a lot of stuff that I would do.
And so this is a pretty good query to know how to do.
But with that being said, we are done with this lesson.
I hope this wasn't too short.
I don't know how long I ran, but we looked at a lot of different stuff.
Let's go back to the top.
Again, we were just exploring the data.
We looked at laid off a lot, looked a lot at the company, when these dates actually started for these layoffs in this dataset.
We looked at the country, the actual year of laid off.
Then we went to a little bit more difficult things.
We looked at it per month.
So per month,
how many layoffs they had, and then we did a rolling total.
This one was a pretty good one using that substring.
I love substrings, man.
They're awesome.
Or lady, they're awesome.
And then we came down here and we did the one we just did with multiple CTEs in the company.
I think it was a really, really good solid project.
Combine that with that data cleaning project and man, you got just a really good start with some MySQL projects.
And this one can be expanded upon.
Don't stop where I stopped, right?
Let me go back to the top.
This dataset has so much data in it.
You can do a lot of different things.
And even if you want to, you could go and find these companies right over here, and you could try to get their total company that they had.
And you could use this column a lot more.
That'd be really interesting with some calculations there.
So with that being said, that is the end of our exploratory data analysis project.
I hope you enjoyed it.
I hope you learned something both in the data cleaning project and in this exploratory data analysis project.
That's what this is all about.
Getting the confidence and gaining the experience to create these projects and add those to your portfolios. Speaking of which, if you haven't already, check out my video on how to create a free portfolio website using GitHub. Awesome. I highly recommend it. You can add these to your portfolio.
So with that being said, thank you so much for watching.
I really appreciate it.
If you liked this video, if you learned anything at all, be sure to like and subscribe below.
Check out my channel for tons of other videos just like this one and more.
I will see you in the next video.