Dual format of Oracle 12c (Boosting analytical queries) PART-1


Oracle Database In-Memory introduces a new in-memory column store into the Oracle database. It allows you to populate data in memory in a column format and have your queries automatically take advantage of that format. What types of queries would those be? They’re typically analytical-style queries: queries that scan a lot of data, apply WHERE clause predicates or filters to that data, and return a subset of rows. The Oracle optimizer is fully aware of the new in-memory column store and will automatically direct the queries that will benefit from it to the column store. It does not replace the existing row format in memory that we’ve always had, or the buffer cache.

The column format is purely in-memory. It’s not persisted on disk in any way, so nothing changes in terms of the disk capacity required to host your database when you use the in-memory store. The in-memory column store is a new component, or pool, inside the SGA, the System Global Area of your database. It is a static pool, which means it will not grow or shrink during the lifetime of the database.

When you’re allocating the total SGA, using the SGA_TARGET parameter, you need to ensure that it’s large enough to encompass the existing components within the SGA, things like the buffer cache and the shared pool, as well as the new in-memory area. You specify the size of the in-memory area using the new INMEMORY_SIZE parameter. If you look at V$SGA you’ll see the new pool and how much memory from your total SGA has been allocated to it.

Remember, it is a static pool, so it is not controlled or managed by the Automatic Memory Management algorithm introduced in 11g, and it will neither grow nor shrink over the lifetime of the database. So when you’re specifying your INMEMORY_SIZE, make sure it is large enough to accommodate all of the objects you need in the column store, and make sure SGA_TARGET is set large enough to accommodate the new in-memory area as well as all the other existing components in your SGA.
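
A minimal sketch of sizing the column store (the sizes here are made-up values; because the pool is static, a change to INMEMORY_SIZE only takes effect after a restart):

ALTER SYSTEM SET inmemory_size = 20G SCOPE=SPFILE; -- static pool; takes effect at restart
ALTER SYSTEM SET sga_target = 64G SCOPE=SPFILE;    -- must also cover the in-memory area

-- After the restart, check the allocation:
SELECT * FROM v$sga;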

You get to decide which objects you want to populate into the column store, and those objects are compressed as they’re being populated. You can specify the new INMEMORY attribute at the table, partition, subpartition, or materialized view level. The attribute can be specified simply by saying

ALTER TABLE sales INMEMORY;

If you want to remove an object from the column store you simply say

ALTER TABLE sales NO INMEMORY;

Because you can specify the INMEMORY attribute at the segment level, either a partition or a subpartition, you can have an object where only the performance-sensitive data lives in the column store while the rest of the object resides on disk. You do this by specifying a different INMEMORY attribute on each of the partitions in that object. When a query comes in to access that object, we’ll pick up the data from wherever it resides, whether that’s the column store, or disk, or a combination of both.
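
For example, a sketch assuming a SALES table partitioned by year (the partition names are hypothetical):

ALTER TABLE sales MODIFY PARTITION sales_2015 INMEMORY;
ALTER TABLE sales MODIFY PARTITION sales_2010 NO INMEMORY;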

IOTs (index-organized tables) and hash clusters, both of which are more OLTP-focused features, do not get populated into the column store. The reason is that these structures enhance access to an individual record in the database, rather than an analytical-style query that accesses lots of rows. So they’re not really useful in the column store.

The other objects that don’t always get populated into the column store are LOB columns. LOB columns are typically used to put unstructured data into the Oracle database, and they come in two forms: in-line LOBs and out-of-line LOBs.

An in-line LOB is one that’s 4K or less in size and is stored co-located with the other column values of a particular row. In-line LOBs are populated into the column store when the other columns of that table are being populated. An out-of-line LOB, on the other hand, is greater than 4K in size and is stored separately from the other columns in that row. Out-of-line LOBs, the larger LOBs, are not populated into the column store: they would waste a lot of space, and they’re frequently not accessed along with the rest of the columns.

So if you mark a table that’s got a LOB column in-memory, we’ll populate any in-line LOBs into the column store along with the other column values for that table, but we’ll exclude any out-of-line LOBs during that population. This happens automatically, so if you mark a table in-memory that has only out-of-line LOBs, we’ll exclude that column and populate all of the others. Not all of the columns in all of the tables need to be populated into the column store, either. You can exclude some of the columns from a table during population, simply by doing

ALTER TABLE sales INMEMORY

NO INMEMORY (prod_id, client_id);

That allows you to save space in the column store by not populating infrequently accessed columns. You also get to control when the objects get populated into the column store, by specifying a PRIORITY attribute on the objects.

CREATE TABLE orders (c1 NUMBER, c2 VARCHAR2(20), c3 NUMBER)

INMEMORY PRIORITY CRITICAL

NO INMEMORY (c1);

There are five PRIORITY values you can specify: CRITICAL, the most important, then HIGH, MEDIUM, LOW, and NONE. NONE is the default, and it means the object is only populated into the column store on first access. If you specify any of the other priorities on an object, then when the database starts up we process the objects that need to be populated in a prioritized list, starting with the CRITICAL objects, then the hot objects with HIGH priority, then MEDIUM and LOW, until all of the objects marked with an in-memory priority have been populated into the in-memory column store.
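
For example, a sketch of raising the priority on an existing table so it is repopulated early at the next startup:

ALTER TABLE sales INMEMORY PRIORITY HIGH;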

By default, all of the objects populated into the in-memory column store are compressed, using a new type of compression that’s different from the compression techniques we’ve used to date in the Oracle database. These compression techniques are focused on performance, and by that I mean query performance: the new algorithms allow us to apply the WHERE clause predicates, or filter predicates, directly on the compressed data. By default you get MEMCOMPRESS FOR QUERY, but you can change that if you wish, by specifying a different MEMCOMPRESS attribute when you’re altering or creating an object. The choices range from no compression whatsoever to:

1. MEMCOMPRESS FOR DML
2. MEMCOMPRESS FOR QUERY, LOW and HIGH
3. MEMCOMPRESS FOR CAPACITY, LOW and HIGH
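
For example, a sketch of switching the SALES table from the default to the highest-capacity compression level:

ALTER TABLE sales INMEMORY MEMCOMPRESS FOR CAPACITY HIGH;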

MEMCOMPRESS FOR CAPACITY allows you to use a higher level of compression to get more data into the column store. There is a slight gotcha, though: it requires additional CPU to decompress some of the data before applying your WHERE clause predicates to it. So it saves space, but at the cost of additional CPU when you query the data.

So when you are populating objects into the column store, bear in mind that trade-off between saving space with a great compression ratio and being able to apply the WHERE clause predicates of your queries efficiently against the data in the column store. You can specify a different compression technique for different columns within a single object and, of course, a different compression technique on each of the objects stored in the in-memory column store, should you wish.

How do I know which tables are marked for in-memory and which are not? Well, there’s a new INMEMORY column in USER_TABLES.

This column behaves like a Boolean: it has one of two values, ENABLED or DISABLED. In the example output, the PRODUCTS table has been ENABLED for the column store.

So why do two of the tables listed, COSTS and SALES, have no value specified whatsoever? Both are partitioned tables, which means the table itself is just a logical object. The actual physical objects that carry the INMEMORY attribute are the underlying partitions or subpartitions. So if you want to see whether those individual partitions have been marked for in-memory, you need to look at USER_TAB_PARTITIONS or USER_TAB_SUBPARTITIONS to see the INMEMORY attribute on those underlying physical objects.
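
A sketch of checking the attribute at both levels (SALES standing in as the partitioned table):

SELECT table_name, inmemory FROM user_tables;

SELECT partition_name, inmemory
FROM user_tab_partitions
WHERE table_name = 'SALES';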

How do you know whether an object has actually been populated into the column store? By looking in a new v$ view called V$IM_SEGMENTS. It has a status column that will tell you whether the object has completed population and is in the column store, or whether population has only been started.

Objects with a STARTED status are in the middle of population right now. There is an additional column in this view, BYTES_NOT_POPULATED, that indicates how much of the object is left to be populated into the column store. Once that column reaches zero, the status changes from STARTED to COMPLETED.
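
A sketch of monitoring population progress (the column names are as I recall them in 12c, so verify against your release):

SELECT segment_name, populate_status, bytes_not_populated
FROM v$im_segments;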

So now you know how to get data into the column store, and you know how to size the column store, but how much space do you really need? How well will your particular objects compress under MEMCOMPRESS? Well, you can find out by running the Compression Advisor.

ALTER MATERIALIZED VIEW mv1

INMEMORY MEMCOMPRESS FOR QUERY;  -- the default

CREATE TABLE trades (name VARCHAR2(20), trade_desc VARCHAR2(200)) INMEMORY

MEMCOMPRESS FOR DML (trade_desc);

The Compression Advisor has been extended in Oracle Database 12c to make it MEMCOMPRESS-aware. So you can run the Compression Advisor just as you do today, by calling DBMS_COMPRESSION and specifying the new MEMCOMPRESS attributes, and have it tell you what compression ratio to expect for your objects should they be populated into the in-memory column store. If you’re not sure which objects you want to populate into the column store, there is a new In-Memory Advisor. It will look at your existing AWR and ASH repositories and determine, based on the information found there, which objects would be good candidates for the column store. It gives you a prioritized list of those objects, lets you know what kind of performance benefit we believe you’ll get by populating them into the column store, and it also tells you what kind of compression we believe you’ll get for those objects.
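
Here is a minimal sketch of calling the advisor for the SALES table. The parameter and constant names below are my recollection of the 12c DBMS_COMPRESSION API, and the scratch tablespace is an assumption, so verify both against your release:

SET SERVEROUTPUT ON
DECLARE
  l_blkcnt_cmp   PLS_INTEGER;
  l_blkcnt_uncmp PLS_INTEGER;
  l_row_cmp      PLS_INTEGER;
  l_row_uncmp    PLS_INTEGER;
  l_cmp_ratio    NUMBER;
  l_comptype_str VARCHAR2(100);
BEGIN
  DBMS_COMPRESSION.GET_COMPRESSION_RATIO(
    scratchtbsname => 'USERS',  -- scratch tablespace: an assumption
    ownname        => USER,
    objname        => 'SALES',
    subobjname     => NULL,
    comptype       => DBMS_COMPRESSION.COMP_INMEMORY_QUERY_LOW,
    blkcnt_cmp     => l_blkcnt_cmp,
    blkcnt_uncmp   => l_blkcnt_uncmp,
    row_cmp        => l_row_cmp,
    row_uncmp      => l_row_uncmp,
    cmp_ratio      => l_cmp_ratio,
    comptype_str   => l_comptype_str);
  DBMS_OUTPUT.PUT_LINE('Expected compression ratio: ' || l_cmp_ratio ||
                       ' (' || l_comptype_str || ')');
END;
/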

Why is the row format not as efficient as the column format?

Take, for example, our query here. I’m looking for an individual column in the table, in this case column four, and my table has been populated in memory in the row format into the buffer cache. When this query begins, we need to find the offset of the first row in the table. We walk along that row until we find the fourth column, extract the value we’re interested in, and move to the offset of the next row, and so forth, until we’ve scanned all of the rows in the table, returning values just for column four.

If we were to execute the same query against the table in the column format, this is what you’d see. Inside the column store, each of the columns in the table is stored separately. So we go directly to column four, we don’t scan any additional information, and we’re able to get all of the values we want directly from column four.

There’s another component to this, of course: the data in the column store is compressed, so the volume of data we’re actually scanning in column four is a lot less than it would be scanning the same column in the buffer cache in the row format. So we’re getting several advantages there: we only access the data we’re interested in, and we access that data in a compressed format.

But there are some other advantages to scanning data in the column store as well. One of them is that each of the columns in the in-memory column store has something called a storage index created on it. That storage index keeps track of the min/max values for each of the extents, or IMCUs (In-Memory Compression Units), within that column.

We’re able to use that storage index to provide data pruning, or data elimination, as part of the scan. In other words, we check the value we’re looking for against the min/max range of each IMCU in that column, and we only scan the IMCUs where we find a potential match.

Say, for example, I’m looking for rows where the store_id is eight or higher. I look at the first extent, or IMCU, in the column and see that it doesn’t have any store_ids that are eight or higher, so I don’t even bother scanning that extent of the column. I skip it.

I move on to the second one. It also has no entries that are eight or higher, so I skip it as well. I move on to the third IMCU in the column and find that it has the potential to contain a match for store_id eight or higher, so I scan that particular IMCU. And I do the same for the fourth, because, again, it has the potential to hold values with store_id eight or higher. By eliminating, or not scanning, data we improve scan performance by, again, reducing the amount of data that needs to be scanned as part of this query.
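
If you want to see the pruning on your own system, one option is to check the in-memory scan statistics for your session after running the query. This is a sketch; the statistic names beginning with 'IM scan CUs' are how I recall them in 12c, so verify against your release:

SELECT COUNT(*) FROM sales WHERE store_id >= 8;

SELECT n.name, s.value
FROM v$mystat s
JOIN v$statname n ON n.statistic# = s.statistic#
WHERE n.name LIKE 'IM scan CUs%';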
