Thursday, March 29, 2012

Data Structure Internals Question

Hi all,
I have found that adding a column to an existing table (tested on a table
with >1M rows, each >200 bytes long) behaves as follows:
1. If the added column does not allow NULLs and has a DEFAULT value, the
operation actually changes the row structure.
I can see that from the time it takes to complete and the number of reads
and writes I trace with Profiler.
SELECTing from the table shows the default value was correctly entered into
the new column.
So far, so good.
2. If the added column allows NULLs, the operation takes ~0 time and just a
few reads and writes, implying a metadata-only operation.
SELECTing from the table shows that NULLs were entered instead of the
DEFAULT value.
This is documented behavior; I know I can use the "WITH VALUES" option to
force the default values into the column instead of NULLs, which results
in the same behavior as in (1) (see the example below).
This happens regardless of whether the column has a DEFAULT value and also
regardless of whether the column is fixed length or variable length.
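To make the two cases (and the WITH VALUES variation) concrete, the statements
I am comparing look roughly like this (table and column names here are just
placeholders, not my real schema):
-- Case 1: NOT NULL with a DEFAULT -- every existing row gets rewritten
ALTER TABLE SomeTable ADD new_col1 INT NOT NULL DEFAULT 0
GO
-- Case 2: nullable -- metadata-only change, existing rows return NULL
ALTER TABLE SomeTable ADD new_col2 INT NULL DEFAULT 0
GO
-- Case 2 with WITH VALUES -- behaves like case 1, the default is pushed into existing rows
ALTER TABLE SomeTable ADD new_col3 INT NULL DEFAULT 0 WITH VALUES
GO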
What surprises me about this is that I now actually have a table
with different row structures.
The old rows (from before the column was added) were not changed, while the
new rows I INSERT or UPDATE will have a different structure reflecting the
newly added column.
From what I know about the data structures, fixed-length columns take up
their full size regardless of their actual value, and variable-length columns
require an additional 2 bytes for the column offset in the row. Both fixed-
and variable-length columns that allow NULLs also require a NULL bit
in the NULL bitmap of each row.
If I SELECT from the table, I get the NULLs as expected from the old rows as
well but where the **** are they actually stored?
I suspected that since the NULL bitmap is always padded out to whole 8-bit
bytes, perhaps there was a bit left to spare for the new column (although it
doesn't seem logical to have a NULL bit for a column that does not exist in
the row itself), so I created a table with exactly 8 nullable columns and
added a 9th one to see if it would take the time to allocate another byte for
the NULL bitmap. It seems not to: the ALTER completed almost instantaneously
(<100ms), taking only about 20 reads and 3 writes for a table of >2500 pages
(I cleared the cache before running the ALTER TABLE).
Any insights about what's happening under the hood here will be appreciated.
TIA,
Ami
|||Ami,
It would help a lot if for each of the situations you don't
understand you provide the following:
1. A CREATE TABLE statement that creates the table with
the structure it has before the new column is added.
2. An ALTER TABLE statement that adds the new column.
3. What you see from the ALTER TABLE in terms of reads
and writes from Profiler.
4. What you expected from #3.
5. The statements to create any indexes or constraints that
exist on the table beyond what is in the CREATE TABLE
statement.
You seem to have a good understanding of the way rows are
constructed, and I'm not sure exactly what specific situation
it is that you don't understand. As far as whether there is
a bit in the null bitmap to represent NULLs in a non-existent
column, in some sense, this does exist, but a better way of
saying it is that there are at times unused bits in the NULL
bitmap which are just meaningless filler to pad the data
to byte boundaries; when a new nullable column is added
without VALUES in this case, the change can be represented
without any data changes. One of these spare null-bitmap
bits becomes the null bit for the new column through a
change in metadata and at most a flipping of the bit (I don't
know offhand whether spare bits are always zeroed out; the
bits may have to be set with write operations as well). The NULL
values don't take up space in the table beyond the null bit.
Some of this is a little different depending on whether the
column in question is fixed or variable length, or whether
the table has any variable-length columns or not. Whether
the table is a heap (no clustered index) or is its clustered
index (if there is one) may also be an issue, and when data
is actually added, the number of writes may change dramatically
depending on whether adding a new column with VALUES
changes the number of rows that fit on each 8K data page.
I hope you can see why it helps to know the particulars of
the examples that concern you. For more information, you
can look at the topics "Estimating the Size of a Table with a
Clustered Index" and "Estimating the Size of a Table Without
a Clustered Index" in Books Online.
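As a rough illustration of that last point, here is a back-of-envelope
estimate built on the Books Online formulas (the numbers are assumptions,
not taken from your actual table):
-- Assume 10 fixed-length columns, 200 bytes of fixed data, no variable-length columns.
DECLARE @num_cols int, @fixed_data int, @null_bitmap int, @row_size int
SET @num_cols = 10
SET @fixed_data = 200
SET @null_bitmap = 2 + (@num_cols + 7) / 8        -- integer division, per the BOL formula
SET @row_size = @fixed_data + @null_bitmap + 4    -- + 4 bytes of row header
SELECT 8096 / (@row_size + 2) AS rows_per_page    -- + 2 bytes per row for the slot array
GO
-- Result: 38 rows per page. Adding one 4-byte column WITH VALUES (204 bytes of
-- fixed data, 11 columns) gives 37 rows per page by the same arithmetic, which
-- is exactly the kind of change that drives the write count up.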
Steve Kass
Drew University
|||Hi Steve,
Thanks for your prompt reply.
Here is a script similar to the one I used for the last test, to try to
focus the question:
CREATE TABLE T1
(
PK_col INT NOT NULL IDENTITY PRIMARY KEY,
Col1 INT NULL,
col2 INT NULL,
col3 INT NULL,
col4 INT NULL,
col5 INT NULL,
col6 INT NULL,
col7 INT NULL,
col8 INT NULL
)
GO
SET NOCOUNT ON
DECLARE @c INT
SET @c = 0
WHILE @c <= 250000
BEGIN
    INSERT T1
    SELECT @c,@c,@c,@c,@c,@c,@c,@c
    SET @c = @c + 1
END
GO
SELECT dpages
FROM sysindexes
WHERE id = OBJECT_ID('T1')
-- 1420 pages
GO
DBCC DROPCLEANBUFFERS
GO
-- start profiler trace
ALTER TABLE T1 ADD col9 INT NULL
GO
-- 46ms duration, 27 Reads, 3 Writes
Since the ALTER TABLE took only 27 reads, it obviously couldn't have changed
all 1420 pages.
Since the original table had exactly 8 nullable columns, it should have
filled the default 1 byte allocated for the null bitmap.
My questions are:
Where are the null values stored for the existing 250,000 rows?
If I execute the following statements:
INSERT T1
SELECT 1,1,1,1,1,1,1,1,1
GO
Now I have a new row that is physically about 9x4 bytes long (ignoring row
overhead) with 2 bytes of null bitmap, while the other 250K rows are 8x4
bytes long with a 1-byte null bitmap.
How does SQL Server distinguish between the two different row structures?
The only way it can do that, IMHO, is to "assume" that if the physical row
structure is missing a column compared to the table metadata, that column
must be NULL.
When updating, does it compare the table metadata with the actual row
structure it encounters while reading the pages, and change the row structure
"on the fly" if necessary?
Things will get much more complicated if, for example, I do the following
(a rough sketch follows the list):
1. INSERT a few rows with values for all 9 columns
2. DROP col3 (or any other column) - I know that drop column is also a
metadata-only operation; the column will not be physically removed.
3. ADD col10
4. UPDATE some rows, adding values to the new columns
5. INSERT more rows
6. ... etc. etc.
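Something like this, roughly (a sketch only, reusing T1 and the column names
from the list above; the GO separators just keep each step in its own batch):
-- 1. rows that already carry a value for col9
INSERT T1 SELECT 2,2,2,2,2,2,2,2,2
GO
-- 2. and 3. metadata-only changes
ALTER TABLE T1 DROP COLUMN col3
GO
ALTER TABLE T1 ADD col10 INT NULL
GO
-- 4. touching old rows gives them a value for the new column
UPDATE T1 SET col10 = 10 WHERE PK_col <= 100
GO
-- 5. new rows are written in the latest format (no col3, with col10)
INSERT T1 SELECT 3,3,3,3,3,3,3,3,3
GO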
I've read quite a few books and articles on SQL Server internals, but I can't
remember ever encountering anything that covers this issue.
I could start digging into it by examining the internal structures with DBCC
PAGE, but I was hoping someone here already knows how it works under the hood
and can save me the time and effort :-)
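For anyone curious, this is roughly what I had in mind (undocumented commands,
so the exact syntax may differ by build; 'MyDB' and page 1234 are just
placeholders - the real page numbers come from the DBCC IND output):
DBCC TRACEON(3604)              -- route DBCC output to the client connection
GO
DBCC IND('MyDB', 'T1', 1)       -- list the pages that belong to T1's clustered index
GO
DBCC PAGE('MyDB', 1, 1234, 3)   -- file id 1, page id 1234, print option 3 = per-row detail
GO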
TIA
Ami
"Steve Kass" <skass@.drew.edu> wrote in message
news:OkQfZDuAFHA.2640@.TK2MSFTNGP14.phx.gbl...
> Ami,
> It would help a lot if for each of the situations you don't
> understand you provide the following:
> 1. A CREATE TABLE statement that creates the table with
> the structure it has before the new column is added.
> 2. An ALTER TABLE statement that adds the new column.
> 3. What you see from the ALTER TABLE in terms of reads
> and writes from Profiler.
> 4. What you expected from #3.
> 5. The statements to create any indexes or constraints that
> exist on the table beyond what is in the CREATE TABLE
> statement.
> You seem to have a good understanding of the way rows are
> constructed, and I'm not sure exactly what specific situation
> it is that you don't understand. As far as whether there is
> a bit in the null bitmap to represent NULLs in a non-existent
> column, in some sense, this does exist, but a better way of
> saying it is that there are at times unused bits in the NULL
> bitmap which are just meaningless filler to pad the data
> to byte boundaries; when a new nullable column is added
> without VALUES in this case, the change can be represented
> without any data changes. One of these spare null-bitmap
> bits becomes the null bit for the new column through a
> change in metadata and at most a flipping of the bit (I don't
> know offhand if spare bits are always zeroed out - the
> bits may have to be set with write operations also. The null
> values don't take up space in the table beyond the null bit.
> Some of this is a little different depending on whether the
> column in question is fixed or variable length, or whether
> the table has any variable-length columns or not. Whether
> the table is a heap (no clustered index) or is its clustered
> index (if there is one) may also be an issue, and when data
> is actually added, the number of writes may change dramatically
> depending on whether adding a new column with VALUES
> changes the number of rows that fit on each 8K data page.
> I hope you can see why it helps to know the particulars of
> the examples that concern you. For more information, you
> can look at the topics "Estimating the Size of a Table with a
> Clustered Index" and "Estimating the Size of a Table Without
> a Clustered Index" in Books Online.
> Steve Kass
> Drew University
>
> Ami Levin wrote:
>
a
to
result
also
rows
fixed
bit
as
seem
9th
NULL
(<100ms)
table
appreciated.|||Hi Ami,
My guess is that NULLs for the new columns aren't stored anywhere, and that
the algorithm for reading data is something like this:
- read metadata
- read row
- if you find fewer columns than expected, assume NULL for the missing columns
It's easy to figure out whether there are missing columns since the row
internally holds the actual number of fixed/variable length columns.
When the row is actually modified for any purpose, then all postponed
physical changes take place.
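One quick way to check that, with your Profiler trace still running (just a
sketch against the T1 table from your script):
UPDATE T1 SET col9 = 9 WHERE PK_col = 1   -- give a single pre-ALTER row a value for the new column
GO
-- If the guess is right, only that row's page is written: the row is rewritten
-- there in the new format with col9 materialized, while the other 250K old rows
-- stay exactly as they were.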
If you want, I can post this in the private MVP forum to get the developers'
feedback, but I suspect it won't be much different from this.
Let me know...
BG, SQL Server MVP
www.SolidQualityLearning.com
"Ami Levin" <XXX_NOSPAM___XXX__amlevin@.mercury.com> wrote in message
news:%23GjYwBvAFHA.3988@.TK2MSFTNGP11.phx.gbl...
> Hi Steve,
> Thanks for your prompt reply.
> here is a similar script to the one i have used for the last test to try
> and
> focus the question:
> CREATE TABLE T1
> (
> PK_col INT NOT NULL IDENTITY PRIMARY KEY,
> Col1 INT NULL,
> col2 INT NULL,
> col3 INT NULL,
> col4 INT NULL,
> col5 INT NULL,
> col6 INT NULL,
> col7 INT NULL,
> col8 INT NULL
> )
> GO
> SET NOCOUNT ON
> DECLARE @.c INT
> SET @.c = 0
> WHILE @.c <= 250000
> BEGIN
> INSERT T1
> SELECT @.c,@.c,@.c,@.c,@.c,@.c,@.c,@.c
> SET @.c = @.c + 1
> END
> GO
> SELECT dpages
> FROM sysindexes
> WHERE id = OBJECT_ID('T1')
> -- 1420 pages
> GO
> DBCC DROPCLEANBUFFERS
> GO
> -- start profiler trace
> ALTER TABLE T1 ADD col9 INT NULL
> GO
> -- 46ms duration, 27 Reads, 3 Writes
> Since the ALTER TABLE took only 27 reads, it obviously couldn't have
> changed
> all 1420 pages.
> Since the original table had exactly 8 nullable columns, it should have
> filled the default 1 byte allocated for the null bitmap.
> My questions are:
> Where are the null values stored for the existing 250,000 rows?
> If I execute the following statements:
> INSERT T1
> SELECT 1,1,1,1,1,1,1,1,1
> GO
> Now I have a new row that is physically about 9X4 bytes long (ignoring row
> overhead) and 2 bytes of null bitmap.
> The other 250K rows are 8X4 bytes long with 1 byte null bitmap.
> How does SQL Server know how to distinguish between the 2 different
> structures?
> The only way it can do that IMHO is to "assume" that if the physical row
> strcuture is missing a column compared to the table metadata, that column
> must be NULL.
> When updating, does it compare the table metadata with the actual row
> structure it encounters while reading the pages and change the row
> structure
> "on the fly" if necessary?
> Things will get much more complicated now if for example I will do:
> 1. INSERT a few rows with values for all 9 columns
> 2. DROP col3 (or any other column) - I know that drop column is also a
> metadata only operation, the column will not be physically removed.
> 3. ADD col10
> 4. UPDATE some rows adding values to the new columns
> 4. INSERT more rows
> 5. ... etc. etc.
> I've read quite a few books and articles on SQL Server internals but I
> can't
> remember ever encountering any articles relating to this issue.
> I can start digging into it by examining DBCC PAGE internal structures but
> I
> was hoping someone here already knows how it works under the hood to save
> me
> the time and effort :-)
> TIA
> Ami
>
> "Steve Kass" <skass@.drew.edu> wrote in message
> news:OkQfZDuAFHA.2640@.TK2MSFTNGP14.phx.gbl...
> a
> to
> result
> also
> rows
> fixed
> bit
> as
> seem
> 9th
> NULL
> (<100ms)
> table
> appreciated.
>|||Hi Itzik :-)
How are you?
I guessed the same (I wrote it in my reply to Steve), but I would like
to know how it really works.
Our "guess" seems good enough for the simple script I used, but what
about the more complex situation where, after adding col9, we do the following:
1. INSERT a few rows with values for all 9 columns
2. DROP col3 - physical data will not be removed...
3. ADD col10
4. UPDATE some rows with values for the new column
5. DROP col5
6. ADD col11
7. INSERT more rows
8. ... etc. etc.
Can you see a way to do it without keeping extra metadata?
Let us know what the developers say.
I find this very interesting.
See you soon,
Ami
|||Ami, here's what I got back:
"* Add nullable column -- This does not require any data update because the
way the row format is set up, the newly added nullable column assumes a NULL
value by default. Similarly, when a
not-nullable column is altered to be nullable, no update is required
because the row format is capable of storing a nullbit for all columns.
* Drop column -- The dropped column(s) continue to consume space (even if
rows are updated or inserted(fixed-len cols)) until the clustered index is
rebuilt. This includes the LOB allocations, and they are only deallocated
on rebuild. For heaps, dbcc cleantable will reclaim space used by dropped
LOB and variable-length columns only.
In Shiloh, we did not have a case where we postponed a data-update due to a
schema-change until the time the row is touched. In Yukon, we do postpone
pushing/pulling data out-of-row/inrow when the setting is changed for the
new max LOB types.
For dropped columns we did not store any information in Shiloh except for a
flag in sysindexes indicating that LOB deallocation/cleanup is required.
Evidence of dropped columns could be deduced from the offsets allocated to
existing columns. As far as compacting the rows is concerned, this occurs
automatically as part of index rebuild. In Yukon, we now store a bit more
information about dropped columns.
Bottom-line, we don't really have a general concept of "postponed or lazy
schema changes", except for the couple of cases mentioned above which only
Sort-of fall into this category."
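For reference, the cleanup they mention translates into something like this
(a sketch only; T1 is the table from Ami's script, and 'MyDB' is a placeholder
for the actual database name):
-- Rebuild the clustered index: rows are compacted and dropped-column space is released
DBCC DBREINDEX('T1')
GO
-- If T1 were a heap, only dropped LOB and variable-length column space could be reclaimed:
DBCC CLEANTABLE('MyDB', 'T1')
GO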
BG, SQL Server MVP
www.SolidQualityLearning.com
"Ami Levin" <XXX_NOSPAM___XXX__amlevin@.mercury.com> wrote in message
news:%23fB$chvAFHA.2016@.TK2MSFTNGP15.phx.gbl...
> Hi Itzik :-)
> How are you?
> I have guessed the same (I wrote it in my reply to steve) but i would like
> to know how it really works.
> Our "guess" seems good enough for the simple script I have used but what
> about the more complex situation if after adding col9 we will
> 1. INSERT a few rows with values for all 9 columns
> 2. DROP col3 - physical data will not be removed...
> 3. ADD col10
> 4. UPDATE some rows with values to the new column
> 5. DROP col5
> 5. Add col11
> 6. INSERT more rows
> 7. ... etc. etc.
> can you see a way to do it without keeping extra metadata?
> Let us know what the developers say.
> I find this very interesting.
> See you soon,
> Ami
> --
> "Itzik Ben-Gan" <itzik@.REMOVETHIS.SolidQualityLearning.com> wrote in
> message
> news:ecderKvAFHA.824@.TK2MSFTNGP11.phx.gbl...
> that
> developers
>|||Thanks Itzik.
That doesn't really explain the "how," but we'll have to settle for it, I guess :-)
Thanks again,
Ami
"Itzik Ben-Gan" <itzik@.REMOVETHIS.SolidQualityLearning.com> wrote in message
news:uN7eb6VBFHA.3596@.TK2MSFTNGP12.phx.gbl...
> Ami, here's what I got back:
> "* Add nullable column -- This does not require any data update because
> the
> way the row format is setup, the newly added nullable column assumes NULL
> value by default. Similarly, when a
> not-nullable column is altered to be nullable, no update is required
> because the row format is capable of storing a nullbit for all columns.
> * Drop column -- The dropped column(s) continue to consume space (even if
> rows are updated or inserted(fixed-len cols)) until the clustered index is
> rebuilt. This includes the LOB allocations, and they are only deallocated
> on rebuild. For heaps, dbcc cleantable will reclaim space used by dropped
> LOB and variable-length columns only.
> In Shiloh, we did not have a case where we postponed a data-update due to
> a
> schema-change until the time the row is touched. In Yukon, we do postpone
> pushing/pulling data out-of-row/inrow when the setting is changed for the
> new max LOB types.
> For dropped columns we did not store any information in Shiloh except for
> a
> flag in sysindexes indicating that LOB deallocation/cleanup is required.
> Evidence of dropped columns could be deduced from the offsets allocated to
> existing columns. As far as compacting the rows is concerned, this occurs
> automatically as part of index rebuild. In Yukon, we now store a bit more
> information about dropped columns.
> Bottom-line, we don't really have a general concept of "postponed or lazy
> schema changes", except for the couple of cases mentioned above which only
> Sort-of fall into this category."
> --
> BG, SQL Server MVP
> www.SolidQualityLearning.com
>
> "Ami Levin" <XXX_NOSPAM___XXX__amlevin@.mercury.com> wrote in message
> news:%23fB$chvAFHA.2016@.TK2MSFTNGP15.phx.gbl...
>

No comments:

Post a Comment