sql - How does the Average function work in relational databases? -


i'm trying find geometric average of values table millions of rows. don't know, find geometric average, mulitply each value times each other divide number of rows.

you see problem; number multiplied number exceed maximum allowed system maximum. found great solution uses natural log.

http://timothychenallen.blogspot.com/2006/03/sql-calculating-geometric-mean-geomean.html

however got me wonder wouldn't same problem apply arithmetic mean? if have n records, , n large running sum can exceed system maximum.

so how rdms calculate averages during queries?

very easy check. example, sql server 2008.

declare @t table(i int);  insert @t(i) values (2147483647), (2147483647);  select avg(i) @t; 

result

(2 row(s) affected) msg 8115, level 16, state 2, line 7 arithmetic overflow error converting expression data type int. 

there no magic. column type int, server adds values using internal variable of same type int , intermediary result exceeds range int.

you can run similar check other dbms use. different engines may behave differently, expect of them stick original type of column. example, averaging 2 int values 100 , 101 may result in 100 or 101 (still int), never 100.5.

for sql server behavior documented. expect similar other engines:

avg () computes average of set of values dividing sum of values count of nonnull values. if sum exceeds maximum value data type of return value error returned.

so, have careful when calculating simple average well, not product.


here extract sql 92 standard:

6) let dt data type of < value expression >.

9) if sum or avg specified, then:

a) dt shall not character string, bit string, or datetime.

b) if sum specified , dt exact numeric scale s, data type of result exact numeric implementation-defined precision , scale s.

c) if avg specified , dt exact numeric, data type of result exact numeric implementation- defined precision not less precision of dt , implementation-defined scale not less scale of dt.

d) if dt approximate numeric, data type of result approximate numeric implementation-defined precision not less precision of dt.

e) if dt interval, data type of result inter- val same precision dt.

so, dbms can convert int larger type when calculating avg, has exact numeric type, not floating-point. in case, depending on values can still arithmetic overflow.


Comments

Popular posts from this blog

java - UnknownEntityTypeException: Unable to locate persister (Hibernate 5.0) -

python - ValueError: empty vocabulary; perhaps the documents only contain stop words -

ubuntu - collect2: fatal error: ld terminated with signal 9 [Killed] -