General (RSS)

General

Oracle Team Development – Locking Oracle Objects

Current source control systems such as Visual Source Safe, CVS, and Subversion is that they all rely on having a working copy of the source code under development on the developers local machine. Unfortunately this does not map quite so easily to the database environment. This is particularly problematic when multiple developers are working on database objects on the same server.

The Problem

The in our environment we are using Oracle as the backend database. One of the problems the Oracle developers are facing is when more than one of them is working on the same Package at the same time. Most tools allow you to extract the source for the Package edit the package and then compile it. It is at the compile stage that the problem occurs. If two developers edit the same package at the same time, the one that compiles last will overwrite the changes made by the other developer. Obviously there is another option, each developer could have his/her own database instance to work against and then use version control software to maintain the sources, but for this team this is how they work and since I have already thrown a spoke in the wheels by introducing Subversion I did not want to push things.

Finding a solution

With my focus on Microsoft Technologies, I am not particularly knowledgeable about Oracle, but I decided that I would take up the challenge to find a solution to this problem and learn a thing or two about Oracle along the way. Obviously the problem with piecing together a solution for a technology that you are not particularly knowledgeable about is that you run the risk of re-inventing the wheel. But assuming that the database developers are familiar with there tools there did not seem to be a satisfactory solution to the problem.

On of the few things I do know about Oracle is that they allow you to create triggers on the schema. These triggers are fired when changes are made to the existing schema. With this I decided I would attempt to piece together an object locking scheme where the developer could lock a database object before working on it and then once he has compiled and tested the object he can unlock it. To compile an object the object must be locked by the developer attempting the compile, if not a message should be displayed indicating that the object is either not locked or is locked by another developer. The solution should also be independent of the tools that are being used by the developer.

The solution

As it turns out the solution to this problem was rather easily implemented even with my limited knowledge of Oracle PL/SQL. I created a table that maintains a list of locked objects.

create table DBSCC_LOCKS
(
  OBJTYPE varchar2(
19) not null,
  OBJNAME varchar2(
30) not null,
  LOCKEDBY varchar2(
64) not null,
  constraint PK_DBSCC_LOCKS primary key (OBJTYPE, OBJNAME)
)

The table has the object type for instance PROCEDURE or PACKAGE etc. and the name of the object and who has the object locked. For my purposed I decided to use the machine name the complete solution actually uses the OS user for this purpose.

The next step was to provide a trigger that would check against the DBSCC_LOCKS table before allowing a object creation to proceed.

create or replace trigger TR_DBSCC_CHANGE
before create on schema
declare
  vMachine v$session.MACHINE%TYPE;
  vLockedBy DBSCC_LOCKS.LOCKEDBY%TYPE;
  vObjExists integer;
begin
  -- Only locking PROCEDURES, PACKAGE and PACKAGE BODY
  if ora_dict_obj_type in (
'PROCEDURE', 'FUNCTION', 'PACKAGE', 'PACKAGE BODY') then
    -- Get the machine name that is executing the current session
    select s.machine into vMachine from v$session s
     where s.audsid = userenv(
'sessionid');

    -- Check if the object already exists
    select count(*) into vObjExists from ALL_OBJECTS o
     where o.object_type = ora_dict_obj_type
       and o.object_name = ora_dict_obj_name;

    -- If the object does not exist then we allow the
    if vObjExists =
0 then
      return;
    end if;

    -- Query the DBSCC_LOCKS table to see if and who
    -- has the object locked
    begin
      select l.LOCKEDBY into vLockedBy from DBSCC_LOCKS l
       where l.OBJTYPE = ora_dict_obj_type
         and l.OBJNAME = ora_dict_obj_name;
    exception
      when NO_DATA_FOUND then
      vLockedBy := null;
    end;

    -- Test the lock state of the object
    if vLockedBy is null then
-- Not locked
      raise_application_error(-
20001,
        ora_dict_obj_type ||
'.' || ora_dict_obj_name || ' is not locked, please use DBSCC_LOCK before updating');
    elsif vLockedBy <> vMachine then
-- Locked from another workstation
      raise_application_error(-
20001, 
        ora_dict_obj_type ||
'.' || ora_dict_obj_name || ' cannot be created/updated because it is currently locked by ' || vLockedBy);
    end if;
    -- If we get here then the workstation owning the session
    -- has the lock and the creation can proceed.
  end if;
end;

Now this is where my lack of PL/SQL experience really shows itself, I feel that there are a number of things that can be improved but as it stands it works.

The trigger checks if the object being created is one of the types that we are expecting to be locked before creation. Then the machine name is extracted from the current session (the OS user could be used instead). The trigger checks that the object is an existing object; if it is further tests are done to check that the machine creating the object has a lock on the existing object. In the case that the object is locked by the machine issuing the create or this is the first time the object is being created the trigger allows the create to proceed. Otherwise an exception is raised indicating either that the object must be locked or that it is already locked from elsewhere. Now all that is left is to enable the developer to easily lock and unlock database objects.

The following procedure can be called by the developer to lock the intended database object.

create or replace procedure DBSCC_LOCK(
  p_objtype in DBSCC_LOCKS.OBJTYPE%TYPE,
  p_objname in DBSCC_LOCKS.OBJNAME%TYPE)
is
  vLocks integer;
  vCanLock integer;
  vLockedBy DBSCC_LOCKS.LOCKEDBY%TYPE;
  vMachine v$session.MACHINE%TYPE;
  vObjType DBSCC_LOCKS.OBJTYPE%TYPE;
  vObjName DBSCC_LOCKS.OBJNAME%TYPE;
begin
  vObjType := upper(trim(p_objtype));
  vObjName := upper(trim(p_objname));

  -- Validate object type
  if not vObjType in (
'PROCEDURE', 'FUNCTION', 'PACKAGE') then
    raise_application_error(-
20001, 'Only objects of type PROCEDURE, FUNCTION and PACKAGE can be locked');
  end if;

  -- Validate object that the object exists
  
select count(*) into vCanLock
    from ALL_OBJECTS o
   where o.object_type = vObjType
     and o.object_name = vObjName;

  if
vCanLock =
0 then
    raise_application_error(-
20001, vObjType || '.' || vObjName || ' does not exist');
  end if;

  -- Assume the object is already locked
  vLocks :=
1;

  -- Query the locks table to determine if and who
  -- has the object locked.
  begin
    select scc.LOCKEDBY into vlockedBy
      from DBSCC_LOCKS scc
     where scc.OBJTYPE = vObjType
       and scc.OBJNAME = vObjName;
    exception
      -- NO_DATA_FOUND implies that the object is not locked
      when NO_DATA_FOUND then
      vLocks :=
0;
  end;

  -- If the object is locked inform the caller that the object is
  -- already locked
  if (vLocks >
0) then
    raise_application_error(-
20001, vObjType || '.' || vObjName || ' is locked by ' || vLockedBy);
  else
    -- Get the name of the workstation requesting the lock
    select s.MACHINE into vMachine
      from v$session s
     where audsid = userenv(
'sessionid');

    -- Special case PACKAGE objects so that both the
    -- PACKAGE and PACKAGE BODY are locked
    if vObjType =
'PACKAGE' then
     
insert into DBSCC_LOCKS 
        (OBJTYPE, OBJNAME, LOCKEDBY)
      values
        (
'PACKAGE', vObjName, vMachine);

      insert into DBSCC_LOCKS
        (OBJTYPE, OBJNAME, LOCKEDBY)
      values
        (
'PACKAGE BODY', vObjName, vMachine);
    else
      insert into DBSCC_LOCKS 
        (OBJTYPE, OBJNAME, LOCKEDBY)
      values
        (vObjType, vObjName, vMachine);
    end if;
    commit;
  end if;
end DBSCC_LOCK;

The procedure mostly just checks that the object is not already locked and if not makes an entry into the DBSCC_LOCKS table. One special case is for packages, if a package is locked an entry is made for both the package and the package body. The reciprocal of this procedure is the DBSCC_UNLOCK, this procedure is used to unlock a database object.

create or replace procedure DBSCC_UNLOCK(
  p_objtype in DBSCC_LOCKS.OBJTYPE%TYPE,
  p_objname in DBSCC_LOCKS.OBJNAME%TYPE,
  p_force in integer :=
0)
is
  vMachine v$session.MACHINE%TYPE;
  vObjType DBSCC_LOCKS.OBJTYPE%TYPE;
  vObjName DBSCC_LOCKS.OBJNAME%TYPE;
  vLocks integer;
begin
  vObjType := upper(trim(p_objtype));
  vObjName := upper(trim(p_objname));

  -- Validate object type
  if not vObjType in (
'PROCEDURE', 'FUNCTION', 'PACKAGE') then
    raise_application_error(-
20001, 'Only objects of type PROCEDURE, FUNCTION and PACKAGE are supported');
  end if;

  -- Get the name of the workstation executing the request
  select s.machine into vMachine
    from v$session s
   where audsid = userenv(
'sessionid');

  -- Determine if the object is locked and if it is locked
  -- by the caller, if p_force is non-zero then the machine
  -- is ignored
  select count(*) into vLocks
    from DBSCC_LOCKS l
   where l.OBJTYPE = vObjType
     and l.OBJNAME = vObjName
     and (l.LOCKEDBY = vMachine or p_force <>
0);

  -- If locked then remove the entry from the DBSCC_LOCKS
  -- table. PACKAGE and PACKAGE BODY are special cased
  if vLocks >
0 then
    if vObjType =
'PACKAGE' then
      delete from DBSCC_LOCKS
       where OBJTYPE in (
'PACKAGE', 'PACKAGE BODY')
         and OBJNAME = vObjName;
    else
      delete from DBSCC_LOCKS
       where OBJTYPE = vObjType
         and OBJNAME = vObjName;
    end if;
    commit;
  else
    raise_application_error(-
20001, 'You do not have ' || vObjType || '.' || vObjName ||' locked');
  end if;
end DBSCC_UNLOCK;

DBSCC_UNLOCK does pretty much the reverse, it checks that the machine issuing the unlock in fact has the object locked and if so removes the entry from the DBSCC_LOCKS table. Again there is a special case for packages, where unlocking a package removes both the entry for the PACKAGE and PACKAGE BODY. Additionally the DBSCC_UNLOCK procedure includes a force parameter which when set will force an object to be unlocked even if it is not currently locked by the user.

So a few questions, if I keep saying the OS User can be used rather than the machine name why did I opt to use the machine name. Well, this might just be ignorance, but I recall reading somewhere that OS User might not be reliable from all environments especially when connecting via SQL*NET. Since this solution has been released to the team they have provided some nice enhancements. One of those was integrating this with PL/SQL Developer, the environment predominantly used by the team members, now they can just right click on an object in the object browser and lock/unlock, in the case of the object being locked the context menu even shows who has it locked, they have also changed the procedures to use the OS User rather than the machine name, maybe I should make that change some time soon.

Hope fully those of you out there that have more Oracle expertise than I do will point out the short comings in my implementation as well as maybe a better or alternative solution to the problem.

Pocket PC

Well it has been awhile since my last post and there have been a number of contributing factor, not the least of which has been some internet downtime here at home. Of course there is also the preasures of those real life things like WORK. The next three months look like they are going to be hell.

But, my most fun excuse must be the gift that my wife recently purchased for me, for a tech junkie or is that .net junkie like me the coolest gift is something electronic, and she hit the nail on the head with a iPAQ PocketPC. All in all this is a really cool device. The coolest feature is that it integrated with my wireless network. Only one piece of advice I would offer before you spend 3 hours trying to get the wireless networking to work, perform the firmware upgrade. After the firmware upgrade everything just works. Though this is the first PocketPC device I have owned, I have done a fair amount of development using the emulators and it has been really fun to see some of the stuff work on the real thing. For now I will be using C++ for the development on the PocketPC, but soon I will delve into the Compact Framework and post here about that experience.

Measuring Performance

In the next few weeks I will be presenting a number of sessions on code optimization in the .NET environment. And one of the key things I will be iterating is the selection of algorithms and of course benchmarking before performing optimizations. So here is what I will be saying on one of my favorite topics ‘Algorithm Selection’.

When evaluating the efficiency of an algorithm, there are a number of factors that affect the timings obtained for an iteration of the algorithm. For this reason the efficiency of an algorithm is not measured in execution time, but on an equation that is a function of the size of the data processed by the algorithm. For example, when searching a document for the occurrences of a specific word, one algorithm would be to search character for character until there is a match on the first letter. Once the first letter is found you can proceed to compare subsequent characters against the remainder of the search string to determine if there is a word match. From this we see that if the document has N characters the algorithm execution time is a function of N. This relationship is represented as O(N). Obviously there are other factors that will affect the execution time, for example the more matches there are of the first letter, the more often the algorithm will perform an alternate sequence of tests. These factors are a function of the nature of the input data and not the algorithm. For that reason are ignored when determining the relationship between the algorithm and the input data, focusing rather on the size of the input data. This approach relates the efficiency of an algorithm in the general case. In the case where the worst-case scenario is likely more prevalent it is worth understanding how the algorithm performs in this case and deciding if this is the best algorithm for the expected data.

As another example, the well-known bubble sort routine can be evaluated. The bubble sort routine iterates the data once for each item in the data, comparing and swapping neighboring items if required. For each iteration one less element is evaluated. The bounding function here is the comparison that is performed for each item. From this we can see that the bubble sort requires N(N-1)/2 or (N2-N)/2 comparisons, where N is the number of items to be sorted. This gives us what is known as the performance equation O((N2-N)/2). Using the performance equation we can calculate that for 10 items 45 comparisons are performed and 100 items will require 4950 comparisons. Looking at the growth in the number of comparisons as the number of items increases clearly the bubble sort is useless for large datasets. Typically algorithms whose performance is N2 or worse are considered unusable. By convention O notation does not place significance on any constants other than those that affect the order of the equation, for this reason the standard representation of the bubble sort algorithm would be O(N2-N). While this does not provide the true number of comparisons performed it clearly conveys the order of the algorithm performance equation, which is what has the greatest impact on performance. I like to call this a performance factor or indicator.

To contrast the above with something more usable, we will look at the performance of the ever-popular quick sort algorithm. In the general case the quick sort has a performance factor of O(N log2 N) for the general case and a worst case performance of O(N2) if the dataset is already in sorted order. Fortunately most modern quick sort implementations guard against the worst cast scenario and reduce the likely hood of the algorithm degenerating into a O(N2) function.

When comparing two algorithms there performance equations are not the only factor to evaluate. While two algorithms might have the same order, it is possible that the work performed by one algorithm is more intensive than the other. In this case two algorithms might have the same order, but one is less process intensive than the other.

Open Source saves my day

This weekend was the big re-install. Among other things, my aging FreeBSD server was replaced by a Windows server which took me deeper into the world of Windows DNS/DHCP and Routing. For the time being I am left without a FreeBSD machine, and that left me without a HTTP proxy. Unfortunately I do not have ISA server, a quick Google and I found a native port of Squid for NT4/2000/XP/2003. You just got to love the open source community. I wish I had the time/courage to contribute in a significant way. Now my next step is to install a Linux box to further explore MONO and dotGNU.

P/Invoke Definitions and Samples

I have been thinking about starting an on-line reference for P/Invoke definitions. Each definition will be accompanied by an example and the .NET managed equivalent if available. Before undertaking this task, I wondered if anyone out there already had something similar. I know that this information is available, but I am not aware of a single consolidated source. A project like this would rely on the inputs of the community to help insure the accuracy and quality of the information provided. Any ideas or suggestions would be appreciated.

Thumbs up for Oracle Query optimizer

While installing Oracle on my notebook for a .NET project, it occurred to me to perform the same series of test I did for my earlier post on optimizing database queries. Wow, was I surprised, it appears that the Oracle query optimizer is automatically performing the transformation of the disjunctive normal form to conjunctive normal form. This is a real performance gain while maintaining the simplicity of the query.

Of course I am by no means an Oracle expert so any comments or opinions would be greatly appreciated. Does anyone have access to Yukon, I would really like to know how they are dealing with this.

Optimizing Database Queries

When it comes to understanding a how a query is resolved there is nothing better than using the ‘Show Execution Plan’ feature of SQL Query analyzer. I am certain you are aware that using OR operators in a SQL where clause can be potentially devastating to the performance of the query. In general when using the OR operator the SQL query engine tends to ignore indexes and revert back to using table scans.

There are basically two alternatives to improve this situation. In my particular case one worked slightly better than the other, but I would urge you to make sure that you test your results before assuming one solution is generally better than the other. Things are seldom what they seem in the land of optimization.

The first and most common solution is the use of a union to resolve the ‘OR’ expression, this technique has been around for a number of years and is very well publicized so I am going to just briefly gloss over it.

The following overly simple example should suffice to explain the concepts.

Given a table of 1,000,000 rows, with two numeric columns value1 and value2, each holding random values between 1 and 1000. The columns are individually indexed. We will evaluate and attempt to optimize the following contrived query.

select value1, value2 
  from numbers
 where ( value1 = 1 AND value2 < 100 ) 
    OR ( value2 = 500 AND value1 < 75 )

As it stands, this query takes a total of 10 seconds to return 155 rows on my very underpowered little SQL server box. Using the UNION technique to try to optimize these results gave the following query.

select value1, value2 from numbers where value1 = 1 and value2 < 100
UNION ALL
select value1, value2 from numbers where value2 = 500 and value1 < 75

In this query, the first select returns a result set containing the records matching the criteria of the left side of the OR operator in the original query. The second select statement returns a second result set containing records that match the expression to the right of the OR operator in the original query. Then a UNION ALL operation is performed between these two result sets, yielding a new result set that matches the results from the original query. Only this time the query optimizer uses the relevant indexes to perform the individual queries improving the performance by a staggering 70% bringing the queries execution time from 10 seconds down to 3 seconds.

There is another technique that can be applied to the original query that will improve on the performance. That is to convert the original query’s where clause, which is in disjunctive normal form (DNF) to conjunctive normal form (CNF). To explain disjunctive and conjunctive normal form is beyond the scope of this post, but in very basic terms, the OR operator would be the root of the expression tree for an expression in disjunctive normal form, while the AND operator is at the root of the expression tree in conjunctive normal form. To convert this to conjunctive normal form there is a law called DeMorgan’s law, which can be applied to convert the expression from disjunctive to conjunctive normal form. To apply DeMorgan’s law to the original query, take the where clause and break it up into its unique atoms (literals).

a -> value1 = 1
b -> value2 < 100
c -> value2 = 500
d -> value1 < 75

Using these atoms, rewrite the Boolean expression where addition (+) represents OR and multiplication represents AND operator as follows.

           ( ab ) + ( cd )

Applying DeMorgan’s law the following expansions can be made

ð ( ab + c )( ab + d )

ð ( a + c )( a + d )(b + c)(b + d)

The resulting query for the above expansion is

select value1, value 2
  from numbers
 
where ( value1 = 1 OR value2 = 500 )
   AND ( value1 = 1 OR value1 < 75 )
   AND ( value2 < 100 OR value2 = 500 )
   AND ( value2 < 100 OR value1 < 75 )

This version of the query yielded better performance than the original query since the query optimizer was able to use the indexes available to the query. It did not however match the performance of the UNION optimization, which used the indexes and entirely removed the dependency on the OR operator.

Still the results given even by the UNION solution did not feel satisfactory to me, so I decided to alter the existing indexes and see what the result would be if the index covered the query. For an index to cover a query, the index should include the columns being retrieved by the query, so I replaced the two individual indexes with a single composite index consisting of both value1 and value2 columns. This is where the real excitement came in, suddenly all versions of the query were returning there results in sub second times, even on my under powered overworked SQL server box. With the covering index the most efficient query in terms of query cost was the conjunctive normal form with the UNION optimization coming in a very close second and the original query performing the worst. But the execution times told a different story. In this case the original query was completing in 180ms, the conjunctive normal form optimization in 650ms and the UNION optimization 600ms.

Conclusion

Whenever I am consulted on the performance of a database query, the first thing I ask is what do the indexes look like. In variably the other party looks strangely at me and assures me that there are indexes. On inspection of the indexes, sure enough there are indexes that on every column that is ever considered in a query. But are they the correct indexes?

And that is the point of this exercise, to demonstrate that before considering complex optimization techniques go back to the basics. All to often I come across code that I can barely read because of the number of obscure optimizations applied, making the code difficult to understand and even more difficult to debug. These optimizations only temporarily conceal the underlying problem, which will eventually comeback to haunt your software.