how sql statements gets parsed


how do statements get processed by Oracle? How does Oracle take SQL and do something with it in the database? Let’s say you were given a task by your boss. Say, to develop a company phone book. And you need a little bit of information about the departments that they’re in, the employee’s name, and the employee’s phone number. And you have that in two different tables. The department name is in the departments table. The employee name and the employee phone number is in the employees table. So your boss asked you to write a query on this. Let’s write that query right now.

select D.dept_name, E.emp_name, E.phone
from Dept D, Emp E
where E.d_id=D._did
order by 1,2;

So let’s say we have a SELECT and we’re going to get the department name. And then you’re going to get the employee name, the last name. And then their phone number.

So that’s the first part of the query. We’re going to select those three pieces of data. Now we’re going to have to get this from table. So we need a from clause.

And we’re going to get it from the department table. And I’m going to use an alias, D. Just so I don’t have to write DEPT all the time. And we’re also going to get it from the employees table, EMP. And we’re going to give that an alias, E. Because, again, I don’t want to write it.

Now to make this query work and effective, we need a WHERE clause. And luckily, we have two joining columns that we can put in. WHERE E.DID=D.DID.

pg1

So for every employee who has an employee record who has a department ID, we can now use that to join the departments table. This way we can get the department name.

Now my boss, of course, always wants it sorted. So we’re going to do an ORDER BY. And he wants it by department name first. And I could say department name, but I don’t like to type, so I’m going to use a 1 for the first column. And then by the employee name, 2. So that’s our statement.

Let’s say we type that in SQL*Plus or maybe some reporting tool that we have that’s going to send the query off. So here we are. We’re going to send that query off and who gets that? Our query?

Our server process. I like to think of our server process as our little assistant who’s going to do all the running around for us. That statement is then stored in PGA. Specifically, in the UGA. And then what does Oracle understand of it? Absolutely nothing.

The statement has to be parsed and put into a form that Oracle can understand. Now who does this? Our server process. And the first thing that it goes through and there’s an order at which it goes through things. The first thing that it does is it does a syntax check.

Now think about how many different types of statements there are in SQL. We have SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, ALTER, a couple of others, GRANT, REVOKE.

So the first thing it does is it starts scanning it. Who scans it? Our server process. Remember, where is our statement? Our statement is existing in just UGA alone still. So we’ll start scanning it. Ah, a SELECT, they must want to query data.

Well, SELECTs have a certain form. Do we have an asterisk? Or do we have columns? We have columns listed. Are these columns joined together with some sort of function such as a concatenation operation? Or are they separated with commas? They’re separated with commas.

Does the last column listed have a comma? No. OK, good. So now we have a FROM clause. We need a place to select our data from. We have tables listed, did we assign aliases? Yes, we did. Do we have commas there? Yes. To divide one table from another that we’re selecting from.

Now equally, this could have been inline views or anything else. Next, optionally, now this is an optional component. Do we have a WHERE clause? Yes, we do. OK. We have some columns listed. Great. Do we have an operator of a sort? Yes, we do. We have an equality operator.

Could’ve been a greater than, less than, not equal to. Could have been is null. Could’ve been anything here. We only have one set. Optionally, we could have had an AND or an OR clause in here. We don’t. Now, the question is is next.

I noticed what can we have next? We could have a GROUP BY. Do we have a GROUP BY? No, we don’t. OK. So if I had a HAVING clause, I would have had an error. OK?

That was one of the big things that you have to realize. Is that we’re going to check syntax and we could use elimination. Speaking of which, what happens if I made a syntax error so far? Like anywhere. Like maybe I spelled the word from wrong? Or maybe I put a semicolon up here?

Well, I would have gotten kicked out with an error at this point. My server process would have kicked me out with an error and notified me that I had some sort of problem. But I don’t.

So I wanted to show you right now that we don’t have a GROUP BY, so we eliminate the need for a HAVING clause. Do we have an ORDER BY? Yes, we do. Did we name columns? No. We used symbolic substitutions from our SELECT clause. Great.

So this is our statement. We passed the first component. We described a SQL statement and we fit the proper form of the SQL statement. OK, now we go on to phase two. Phase two of the check is semantics plus security. This is where our server process must leave just working with the query by itself.

It goes over and communicates with the shared pool. Specifically, the data dictionary cache. Now think about the data dictionary. What is it? It’s metadata about what’s in our database. It’s data describing data.

So our server process queries and says, hi, does this table depth exist? Does this table EMP exist? Whose objects are they? Do these columns exist? Are they there?

Now most importantly, this is the next component. Does this user that I represent, in this case, me, Daryl, does this user have permissions to query these? If I do have permissions then we can proceed on to the next step. If not, then I get an error table, object doesn’t exist. And we error out.

So we’ve identified form here. Then we’ve identified functions. The details of that form. So now we found out that these objects all exist, that I have permissions to query them, now we can go on to the next step. The next step I call SP check.

OK, this is where we’re going to determine if the statement has ever been run before in recent history. So what are we going to do? We’re going to take the statement and we’re going to decompose it into a hashed number.

So we’re going to go through, run it through the mathematical algorithm. We’re going to determine if the statement has ever been run before. Now let’s just pretend this statement hasn’t been run before. What do we do next? This is a new statement, it’s never been run before in our database.

Well, then we go on to the next step. So let’s answer that question, no, it’s never been run before. We go on to the Optimizer. Now who is the Optimizer? Is it a special process? Nope. The server process now changes hats. It’s now going to be doing a different operation in our database. OK, great.

So it’s got more than one job. Remember, I want you to concentrate on the fact that everything is still contained within the UGA. This statement really hasn’t left out yet. And the first thing that comes by in our database Optimizer, in the Optimizer step, that is, is it rewrites our statement. the thing that you have to consider is a couple of things. Think about what a view is. What is a view? A view is a stored statement.
A SQL statement of view is a stored statement. It looks like a table. It returns data like a table. You may think it’s a table, but it’s not a table. It’s a piece of stored SQL that when you query the database, or when you query the view, executes that statement. And you get data back.

Now what if one of these tables wasn’t really a table? What if this was a view? Then we’d have to take that view code and merge it with our code and come through a whole new piece of code. This is why you get a rewrite. Are there other reasons that you can get a rewrite? Well, yes, there are. There’s a couple things– orexpansion. You can look them up.

But let’s talk about another type of view. And this is called the materialized view. Now this is a different type of view. What it is is it’s a piece of stored SQL that has been executed and the result set has actually been taken and instantiated into a table.

So now we have a stored results set that is periodically refreshed. What’s the period of refreshment? I don’t know. It depends on what you defined it to be. But if we have some parameters set, we can actually utilize a materialized view.

Now where would that be useful? Maybe in places like to, what I call to eliminate the query from hell or the JOIN operation that will do lots and lots of work. Maybe I can take a result set and store it. And eliminate having to do that very arduous query over and over again. And those can all be set up transparently in the system with query rewrite enabled and so on.

But our query doesn’t have that. These are just too straight statements that are very easy for it. The rewrite operation is minimal. OK, then what? Then our server process has to look and say, how many items are we querying? We’re querying two. Well, let me go find out about that.

So our server process goes back over here into the shared pool. And then queries the geometry of the tables. I know what you’re thinking, geometry, what is he talking about? What is the word geometry? Geometry just describes the shape of an object. And that’s what we’re doing.

The server process query says what’s the shape of this object? And by shape I mean how many columns exist? How many rows are in the object? What’s the average row length? How many blocks are used? And so on? When was the last analyzed? Is it stale?

It does the same thing for this object. Again, how many rows are there? How many blocks? What’s the general shape of it? It also queries other things like any sub-objects that go along with it.

Now it’s got to decide how to join the objects. So it knows the shape, it knows all about the objects. It knows all about the children of the object. Now what does it do? Now it’s got to think about how to put the objects together to get the query result.

Now, which one should it go by? Should it read first the employees table? Or should it read first the department? Well, that’s a good question. Now I want you to remember that Oracle is a cost-based optimizer. It needs those statistics, it needs the geometry and shape to make that decision.

Now how many iterations of possible JOIN operations does it have to put together? That’s an interesting question. Because it’s based on something mathematically called the travelling salesman problem.
Now I bet I know what you’re thinking right now. And it’s this, well, there’s only two possible routes here. How bad can that be? Now what if I told you that this table, the department table, had three indices on it and this employee table had five indices on it? Those are child objects, or cities, that we have to account for in our traveling salesman problem.

So how many things do we have to visit and account for? 3 plus 5 indices, that’s 8, 9, 10. So how many possible combinations of things do we have to account for? 10 factorial. 10 times 9, times 8, times 7, times 6, times 5, times 4, times 3, times 2, times 1.

Possible different scenarios. Different ways of putting all these objects together. That it’s got to go. Now that’s the theoretical maximum limit. Now Oracle is smarter than that. We’re going to do some pruning and some elimination, but that’s the maximum possible combinations that can be put together.

So it’s going to do what? It’s got more geometry. Remember when it went and queried the geometry of the objects? That’s how we discovered about these indices. So we also got the geometry of those indices as well. OK? So we now know the shape of them.

And when I say geometry of an indice, what do I mean? Well, is the indice unique or not unique? How many distinct entries are there? How many leaf blocks are there? How many levels deep are the indices? That’s the height. Sometimes also referred to as the B-level.

OK, so we have a lot of shape. How sparsely populated are the leaf blocks? Or not? So we have a lot of shape that we have to account for. And now Oracle begins a very, very arduous process of churning the numbers. And it basically is going to go and say, do I use this indice, do I use that one? What’s the effect? What’s the cost?

So eventually it’s going to go through that iterative process. Probably no more than 10 factorial at this point. Because that’s the maximum limit of ways we could put this together. And then it’s going to come up with the best number. A rough idea.

And it’s going to say this is the best plan that I could come up with. Now it takes a lot of things into account. Takes in what’s going on with the operating system, takes in what’s going on with the shared pool, and the shape of the object. But it’s going to come up with an idea. And it’s going to be a long process.

And do you ever have those Eureka moments in life? When you sit down and you come up with a great idea. You’re maybe at lunchtime, you write it on the back of a napkin. And you say, this is a great idea. And you bring it to your boss.

And your boss says, yeah, it is a great idea. I can’t go to management and show them this. I can’t bring a back of a napkin down to the production line and show them this. You need to clean it up.

Well, once it comes up with its great idea moment, its Eureka moment, as I call it. It now goes to the RSG. And that’s the row source generator. And what is that?

Well, that’s your group of assistance. Technically it’s still the server process. It’s the next step in it. What it’s going to do is it’s going to take your statement, it’s going to clean it up, it’s going to package it nicely, it’s going to bind it with the text of the query. It’s going to generate another hash ID. This time on the execution plan that your optimizer thought up. It’s going to assign it an ID.

By they way, where’s all this happening? Still happening up here. Still happening in the UGS. We haven’t left the UGA very much yet. So it’s going to generate this nice package. It’s going to assign it an ID. And then it’s going to transfer that nice compact package to the shared pool.

That nice little compact package is called a cursor. OK? That, if you ever wondered what a cursor is, that’s all it is. So now that’s been transferred into main memory. Great So now what? What’s the next step?

we have one more step. Execution. Now it gets executed. And who executes it? Our server process. That’s all who executes it.

So you’re probably wondering to yourself, why did I take all that time to generate a cursor in here and then put it over here just to have my server process, where it all started from, execute it? It’s an interesting question.

And that has to come up with is, has it ever been done before?
We at Oracle kind of learn from examples like that. This step here, going through the optimizer is a very arduous task. Where would Oracle rather spend its time? Would it rather spend its time figuring out how to answer every single query that comes into the system? Or would it like to utilize past results and try to skip that?

Well, we would like to skip that. And that’s where this SP CHECK comes in. Remember that? We make a determination if this query has ever been run before by running the hash number. And if we find a positive that this query has been run before. And how do we do that?

We take the hash number and then our server process goes searches the shared pool for a match. Has it been run before? If it has, why not just utilize that execution plan?

So why not just come like this. It has been run before, we can skip the optimizer, the RSG, and just go right to execution. We can do that. And that’s what the hash allows us to do.

This is called parsing. Everybody parses. No matter what statement you get, you go through a syntax check, you go through a semantics and security. The question is, depending upon what type of system that you have, what happens here?

So why not leverage past results? Why not use execution plans that have already been done? So now we can do that. Now this parsing, when we don’t find it, I’m going to write this in red. That’s called a hard parse. When we go through and go through the optimizer phase, when we do find it, yes, that’s called a soft parse.

We’ve gone through everything, we found an execution plan that’s already been executed, we can just go and utilize that. In a OLTP system, I would expect to see a lot of soft parses. In a warehouse, since all queries generally are unique, I’d expect to see a lot of hard parses. Why?

Because if nobody’s changed the base set of data, I would ask, why are you running the same query over and over again? The data hasn’t changed. Do you expect something different?

So that’s how parsing works, folks. Think about your type of environment that you’ve got here. If you’ve got an OLTP environment, you want to see lots of soft parses.

Now the reason why we did this, I haven’t answered a question yet. We move the cursor up here. If the cursor is up here, how does the next server process be able to get the execution plan? It can’t reach into my private memory. That’s why we put it here in the shared pool. If the execution plan is there, my next person who comes in Mark, or John, or Susan, or anybody else, can access the execution plan up there.

Now this also leads to another question about execution plans. So we look at this here. If I changed this letter, or let’s just say I wrote this query, and somebody else comes along with the exact same query except this letter is different. This letter is a lowercase e. Is it the same query? Think about it for a second.

Those of you who say, yes, you’re absolutely right. Those of you who say, no, you’re also absolutely right. Because logically it is the same query. It produces the same result set, it’s asking for the same thing. But let’s stop thinking like a human being now. And let’s start thinking like a machine.

To the machine, to the hashing algorithm, it’s a different query because we generate a different hash plan. A hash number. So that I want you to think about. That case matters. That spaces matter. That literals, such as if I searched for data, a particular user in this case, would matter.

As long as your FROMs begin on new lines, as long as you indent the same, you’ll get and start leveraging. OK? That’s the big component that I want you to think about.

Now I know what you’re thinking. How do we do this and search for unique data? Well, that’s an interesting case. Let’s change my query a little bit, folks. Let’s say my boss doesn’t want a phone book, he just wants names and departments of every employee that he wants to search.

So let’s add this to the statement. I’m going to write it in blue so that we can see my addition. AND E.EID=123. Let’s say that’s my employee ID. Now somebody else comes along and they want to search for employee ID 124. Is that a different statement? Absolutely, the hash is completely different.

OK, so how do I search for unique data and still leverage soft parsing? That’s the key. Well, we have something that we’ve invented called a bind variable. And a bind variable is a dynamic runtime substitution.

Now I’m going to use my favorite bind variable for throwing away things. And let me do it in red here. xxx. Why do I use that as a throw away? It’s visually quick for me to be able to find things. I do xxx, xxx1, xxx2, xxx3. It’s a quick visual when I scan something I can find it in code really quick.

So what does this do? The first person who runs the statement, parses, go through everything, they make a cursor. Now when the server process goes and runs it, it fetches the cursor from the shared pool. And it sees there’s a substitution. Oh, I need to look at my own UGA for a variable called xxx.

And when I run this cursor, I’m going to substitute whatever’s in that variable in the run execution. So now I get my own data, the next user gets their own data, and we still get what? One cursor in memory. Not two cursors. Because remember, we’re all about speed at Oracle. We do not want to do what?

We do not want to spend time taking the same ground twice to use a militaristic term. We want to spend time answering your query and spending CPU time answering your query, than figuring out how to answer it.

Let’s leverage the past results. Like I joked around before. First person paid quite dearly for it. They went through and they did all the hard parsing. Now we can leverage that. And because of bind variables, we can even further leverage it. So that we can still look for unique data and still look at execution plans.

And Oracle, actually, since Version 11 and Version 12, it works exceptionally well, as well, is we peek into the bind variables and determine if the execution plan is appropriate for the data. It’s called bind peaking.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s