Sunday, October 11, 2015

Why Stored Procedures are Bad

Important:
While writing this article, I am considering Oracle RDBMS as the main reference because, most of the complex stored procedures I saw in my 15+ years carrier are in Oracle. However same reasoning fits for most of other database systems as well.

Myth No.1. "Stored procedures are executed inside database"

Fact: Stored procedures are executed outside the Database (RDBMS engine).
PL/SQL engine executes PL block and sends the SQL statements to RDBMS
It is a separate interpreter Engine called PL/SQL engine


For more reading and understanding:
Oracle Documentation on PL/SQL Architecture
Safari books online PVM

Myth No.2. "Stored procedures are compiled. So faster than straight SQL"

Fact: As explained in the architecture diagram, PL/SQL sends any embeded SQL statement to RDBMS engine.
There is no way PL/SQL Engine can improve the database server where SQL gets executed.
Stored procedures are extra layer on the top of Databae engine. Just like any application server

Myth No.3. "Stored procedures compiled like generic languages like C/C++/Java/.Net"

Fact: Stored procedures are compiled to byte code. NOT manchine code or platform code.
This byte code is then interpreted by PL/SQL Engine. It is an byte code interprer infrastruture
One may argue that there is Native compilation option in new versions of Oracle. Same is discussed below
For more reading and understanding:
Wikipedia article on Bytecode
Wikipedia artcile on Bytecode Interpreter

Myth No.4. "Even though PL/SQL Engine is interpreter. it can be as fast as native languages"

Fact: No magic possible. In general, Interpreter languages can be upto 20 times (2000%) slower than compiled languages.
According to different benchmarks, stored procedures are 7+ times slower than even most of native languages
Sample Benchmark where Java vs Stored Proc is compared
Another One Java vs Stored Proc Benchmark

Myth No.5. "Stored procedures won't take much of memory on database server"

Fact: Call Global Area (CGA) is a part of the PGA, where top level calls are maintianed until the execution ends.
Again, UGA (User Global Area) is part of PGA heap, where all the session specific data is maintained.
And each session maintains their own UGA and it cannot be shared between process.
This includes all variables and cursors etc for each user session
That means, As the number of sessions executing the stored procedure increases, memory overhead escalates sharply.

Myth No.6. "Stored procedures can make the system Highly Available"

Fact: It is just opposite. As explained in the previous point, execution of stored procedure mainly depends on PGA. PGA is not capable of failover to another node. all execution crashes along with the instance. Due tightly coupled nature of Stored procedures, any change in database objects like table can invalidate the code and very often results in outage.

Myth No.7. "Stored procedures are the best way to speedup. because it uses less network"

Fact: Server interconnects are no longer the bottleneck. (That was case interconnect used to work at 10Mbps). Today's giant database clusters are possible because of high-speed interconnects
Still There could be some minority cases. but need to be bench-marked and validated for case by cases. No assumptions please!. Other disadvantages related performance including byte-code interpretation eclipses any small network advantages.

Myth No.8. "Stored procedures helps on code reusablity"

Fact: Again it was old school. Modern Object Oriented languages and frameworks allows better code reusablity. Unfortunately PL/SQL like languages lacks support for OO design and coding practices.
A moderately complex logic an run into thousands of line of procedural code. which is difficult to understand and maintain.

Myth No.9. "Stored Procedures helps to Scale up application"

Fact: Just Opposite. Stored procedures are biggest bottleneck for the scalablity because of the centralized processing. We already discussed about the scalablity issue related to memory consumption (Myth No. 5). In addition to this, Stored procedures causes the database as hotspot. because computation is centralized rather than distributing it to application servers

Myth No.10. "Stored procedures latest versions are better than other langauges"

Fact:Yes, there were significant efforts to improve the performance of stored procedures by natively compiling (NCOMP) it.
The idea is to generate C code from PL/SQL code and compile it with C compiler.
However machine generated C source code can't be compared with handwritten code
Not Only performance is not that great as it could be, this approach got few problem:
1. C code is compiled to loadable shared libraries and loaded to Oracle.any segmentation fult can crash the server

2. C is a great language for system programming, However for application programming, we need to have managed code execution with garbage collector

Memory leaks are big threat to stability and security of the system
other demerits are out weighing the advantages of NCOMP
Reference: For more reading
Oracle FAQ about NCOMP here
User Documentation and testing of NCOMP here

Other reasons why Stored Procedures are a bad choice

Modular Object Oriented Designs are not possible

Modern Object Oriented Designs and application of Design Patterns are not possible. because PL/SQL is a procedure language of last generation.

Centralized processing and lack of modularity prevents Agile methods

We cannot have a frequent releases. During the compilation of stored procedure, there will be unavailability. Entire package need to be compiled opposed to a signle class of OO languages and frameworks. Due to centralized processing of application logic, we cannot have rolling releases like deploying new code to application servers one at a time.

Ends up in non-portable code which works only with a particular RDBMS

Stored procedures written for Oracle will work only with Oracle Database. same is the case with other database systems. Where we are getting into lock-in situations.

ORM (Object Relational Mappers) feature cannot be used, including app side caches

Modern ORM components are very powerful and feature rich. It hides the complexities of databases from an application developer. Advanced entity frameworks, application layer caches, connection pooling are few examples
End of the day application developer need to write lesser code. lesser the code lesser the bugs!
ADO.Net for .Net framework, Hibernate for Java and SQLAlchemy for Python are few examples

Per CPU Licenses and Costly features

CPUs in Database servers are much costlier than CPUs in application server. Because database vendor charges per CPU basis.
When we run same calculations in DB Server, We are actually misusing costly CPUs
In Addition, Many of the features are only available in costly editions of RDBMS software. when we create dependency on that we are bound to pay for costly editions

No comments:

Post a Comment