Database interview questions and answers
Q. What do you understand by the terms clustered index and
non-clustered index?
A. When you create a clustered index on a table, all the rows in
the table are stored in the order of the clustered index key. So, there can be
only one clustered index per table. Non-clustered indexes have their own
storage space separate from the table data storage. Clustered and non-clustered
indexes are stored as binary search tree (i.e. keep data sorted and has the
average performance of O(log n) for delete, inserts, and search) structures
with the leaf level nodes having the index key and it's row locator for a
faster retrieval.
Q. What is the difference between primary key and unique key?
A. Both primary key and unique key enforce uniqueness of the column
on which they are defined. But by default, a primary key creates a clustered
index on the column, whereas a unique key creates a non clustered index by
default. Another major difference is that, a primary key doesn't allow NULL
values, but unique key allows a single NULL.
Q. What are the pros and cons of an index?
A.
PROS
·
If an index does not exist on a table, a table scan must be
performed for each table referenced in a database query. The larger the table,
the longer a table scan takes because a table scan requires each table row to
be accessed sequentially. So, indexes can improve search performance,
especially for the reporting requirements.
CONS
·
Excessive non-clustered indexes can consume additional storage
space.
·
Excessive non-clustered indexes can adversely impact performance
of the INSERT, UPDATE, and DELETE statements as the indexes need to recreated
after each of the above operation.
So, it is essential to have a right balance based on the usage
pattern.
Q. What are the pros and cons of stored procedures?
A.
PROS
·
pre-compiled and less network trips for faster performance
·
less susceptible to SQL injection attacks
·
more precise control over transactions and locking
·
can abstract complex data processing from application by acting as
a facade layer.
CONS
·
There are chances of larger chunks of business logic and
duplications creeping into stored procedures and causing maintenance issues.
Writing and maintaining stored procedures is most often a specialized skill set
that not all developers possess. This situation may introduce bottlenecks in
the project development schedule.
·
Less portable.The stored procedures are specific to a particular
database.
·
Scaling a database is much harder than scaling an application.
·
The application performance can be improved by caching the
relevant data to reduce the network trips.
So, when should stored procedures be used ?
Stored procedures are ideal when there is a complex piece of business logic
that needs complex data logic to be performed involving a lot of database
operations. If this logic is required in many different places, then store
procedure makes even more sense. For example, batch jobs and complex report
generation that performs lots of database operations.
So, when shouldn't stored procedures be used ?
When you are performing basic CRUD (Create, Read, Update, and Delete)
operations. For example, in a Web application a user creates some data, read
the created data, and then updates or deletes some of the created data.
Q. How would you go about writing a stored procedure that needs to
loop through a number of selected rows?
A. You need to use a cursor. A cursor is basically a pointer to row
by operation. For example, you can create a cursor by selecting a number of
records into it. Then, you can fetch each row at a time and perform some
operations like invoking another stored proc by passing the selected row value
as an argument, etc. Once uou have looped through all the records, you need to
close and deallocate the cursor. For example, the stored procedure below
written in Sybase demonstrates the use of a cursor.
Apply to the database "mydatabase"
use mydatabase
go
Drop the stored procedure if it already exists
IF
OBJECT_ID('dbo.temp_sp') IS NOT NULL
BEGIN
DROP PROCEDURE dbo.temp_sp
IF OBJECT_ID('dbo.temp_sp') IS NOT NULL
PRINT '<<< FAILED DROPPING
PROCEDURE dbo.temp_sp >>>'
ELSE
PRINT '<<< DROPPED PROCEDURE
dbo.temp_sp >>>'
END
go
Create the stored procedure that uses cursor
create proc temp_sp
as
DECLARE @ADVISERID char(10)
DECLARE advisers_cur cursor
for select adviser_id FROM
tbl_advisers where adviser_id like 'Z%'
-- select adviser_ids starting with 'Z'
for read only
open advisers_cur -- open the cursor
FETCH advisers_cur INTO @ADVISERID -- store value(s) from the cursor into
declared variables
--@@sqlstatus is a sybase implcit variable
that returns success/failure status of previous statement execution
WHILE (@@sqlstatus = 0)
BEGIN
SELECT @ADVISERID -- select the adviser_id
stored into @ADVISERID
FETCH advisers_cur INTO @ADVISERID --store value(s) from the cursor into
declared variables
END
close advisers_cur
deallocate cursor advisers_cur
go
Execute the stored procedure that uses a cursor
exec mydatabase..temp_sp
Q. Why should you deallocate the cursors?
A. You need deallocate the cursor to clear the memory space
occupied by the cursor. This will enable the cleared space to be availble for
other use.
Q. How would you go about copying bulk data in and out of a
database?
A. The process is known as bulk copy, and the tools used for this
are database specific. For example, in Sybase and SQLServer use a utility
called "bcp", which allows you to export bulk data into comma
delimited files, and then import the data in csv or any other delimited formats
back into different database or table. In Oracle database, you achieve this via
the SQLLoader. The DB2 database has IMPORT and LOAD command to achieve the
same.
Q. What are triggers? what are the different types of triggers?
A. Triggers are stored procedures that are stored in the database
and implicitly run, or fired, when something like INSERT, UPDATE , or DELETE
happens to that table. There are 3 types of DML triggers that happens before or
after events like INSERT, UPDATE, or DELETE. There could be other database
specific triggers.
Q. When to not use a trigger, and when is it appropriate to use a
trigger?
A.
When to not use a trigger?
The database triggers need to be used very judiciously as they are executed
every time an event like insert, update or delete occur. Don't use a trigger
where
·
database constraints like unique constraint, not null, primary
key, check constraints, etc can be used to check for data validity.
·
triggers are recursive.
Where to use a trigger?
·
Maintaining complex integrity constraints (referential integrity)
or business rules where other types of constraints cannot be used. Because
triggers are executed as part of the SQL statement (and its containing
transaction) causing the row change event, and because the trigger code has
direct access to the changed row, you could in theory use them to correct or
reject invalid data.
·
Auditing information in a table by recording the changes. Some
tables are required to be audited as part of the non-functional requirement for
changes.
·
Automatically signaling other programs that action needs to take
place when changes are made to a table.
·
Collecting and maintaining aggregate or statistical data.
Q. If one of your goals is to reduce network loads, how will you
about achieving it?
A.
·
you can use materialized views to distribute your load from a
master site to other regional sites. Instead of the entire company accessing a
single database server, user load is distributed across multiple database
servers with the help of multi-tier materialized views. This enables you to
distribute the load to materialized view sites instead of master sites.
To decrease the amount of data that is replicated, a materialized view can be a
subset of a master table or master materialized view.
·
Write stored procedures to minimize network round trips.
·
Carefully crafting your SQL to return only required data. For
example Don't do select * from tbl_mytable. Instead, specify the columns you
are interested in. For example, select firstname, surname from tbl_mytable.
·
You can set the fetch size to an appropriate value to get the
right balance between data size and number of network trips made.
Q. What are the other uses of materialized views?
A.
·
Materialized view is one of the key SQL tuning approaches to
improve performance by allowing you to pre-join complex views and pre-compute
summaries for super-fast response time.
·
Materialized views are schema objects that can be used to
summarize, precompute, replicate, and distribute data. E.g. to construct a data
warehouse, reporting, etc. A materialized view can be either read-only,
updatable, or writable. Users cannot perform data manipulation language (DML)
statements on read-only materialized views, but they can perform DML on
updatable and writable materialized views.
·
A materialized view provides indirect access to table data by
storing the results of a query in a separate schema object. Unlike an ordinary
view, which does not take up any storage space or contain any data. You can
define a materialized view on a base table, partitioned table or view and you
can define indexes on a materialized view.
Q. If you are working with a legacy application, and some of the
database tables are not properly designed with the appropriate constraints, how
will you go about rectifying the situation?
A. One possible solution is to write triggers to perform the
appropriate validation. Here is an example of an insert trigger.
CREATE TRIGGER
TableA_itrig
ON TableA
FOR INSERT
AS
BEGIN
IF @@rowcount = 0
RETURN
IF NOT EXISTS
(
SELECT
*
FROM
inserted ins, TableB ol
WHERE
ins.code = ol.code
)
BEGIN
RAISERROR 20001, "The associated object
is not found"
ROLLBACK TRAN
RETURN
END
END
Q. If you are working on a new application that requires stringent
auditing requirements, how would you go about achieving it?
A. Since it is a new application, there are a number of options as
listed below.
·
The application is designed from the beginning so that all changes
are logged either synchronously or asynchronously. Asynchronously means
publishing the auditing messages to a queue or topic, and a separate process
will receive these messages and write a database or flat file. All data changes
go through a data access layer of the application which logs all changes
·
The database is constructed in such a way that logging information
is included in each table, perhaps set via a trigger. This approach may
adversely impact performance when inserts and updates are very frequent.
Q. What if you have to work with an existing legacy application?
A. Use triggers.
Q. What do you understand by the terms clustered index and non-clustered index?
A. When you create a clustered index on a table, all the rows in the table are stored in the order of the clustered index key. So, there can be only one clustered index per table. Non-clustered indexes have their own storage space separate from the table data storage. Clustered and non-clustered indexes are stored as binary search tree (i.e. keep data sorted and has the average performance of O(log n) for delete, inserts, and search) structures with the leaf level nodes having the index key and it's row locator for a faster retrieval.
Q. What is the difference between primary key and unique key?
A. Both primary key and unique key enforce uniqueness of the column on which they are defined. But by default, a primary key creates a clustered index on the column, whereas a unique key creates a non clustered index by default. Another major difference is that, a primary key doesn't allow NULL values, but unique key allows a single NULL.
Q. What are the pros and cons of an index?
A.
PROS
CONS
Q. What are the pros and cons of stored procedures?
A.
PROS
CONS
So, when should stored procedures be used ?
Stored procedures are ideal when there is a complex piece of business logic that needs complex data logic to be performed involving a lot of database operations. If this logic is required in many different places, then store procedure makes even more sense. For example, batch jobs and complex report generation that performs lots of database operations.
So, when shouldn't stored procedures be used ?
When you are performing basic CRUD (Create, Read, Update, and Delete) operations. For example, in a Web application a user creates some data, read the created data, and then updates or deletes some of the created data.
Q. How would you go about writing a stored procedure that needs to loop through a number of selected rows?
A. You need to use a cursor. A cursor is basically a pointer to row by operation. For example, you can create a cursor by selecting a number of records into it. Then, you can fetch each row at a time and perform some operations like invoking another stored proc by passing the selected row value as an argument, etc. Once uou have looped through all the records, you need to close and deallocate the cursor. For example, the stored procedure below written in Sybase demonstrates the use of a cursor.
Apply to the database "mydatabase"
Drop the stored procedure if it already exists
Create the stored procedure that uses cursor
Execute the stored procedure that uses a cursor
Q. Why should you deallocate the cursors?
A. You need deallocate the cursor to clear the memory space occupied by the cursor. This will enable the cleared space to be availble for other use.
Q. How would you go about copying bulk data in and out of a database?
A. The process is known as bulk copy, and the tools used for this are database specific. For example, in Sybase and SQLServer use a utility called "bcp", which allows you to export bulk data into comma delimited files, and then import the data in csv or any other delimited formats back into different database or table. In Oracle database, you achieve this via the SQLLoader. The DB2 database has IMPORT and LOAD command to achieve the same.
Q. What are triggers? what are the different types of triggers?
A. Triggers are stored procedures that are stored in the database and implicitly run, or fired, when something like INSERT, UPDATE , or DELETE happens to that table. There are 3 types of DML triggers that happens before or after events like INSERT, UPDATE, or DELETE. There could be other database specific triggers.
Q. When to not use a trigger, and when is it appropriate to use a trigger?
A.
When to not use a trigger?
The database triggers need to be used very judiciously as they are executed every time an event like insert, update or delete occur. Don't use a trigger where
Where to use a trigger?
Q. If one of your goals is to reduce network loads, how will you about achieving it?
A.
A.
Q. If you are working with a legacy application, and some of the database tables are not properly designed with the appropriate constraints, how will you go about rectifying the situation?
A. One possible solution is to write triggers to perform the appropriate validation. Here is an example of an insert trigger.
Q. If you are working on a new application that requires stringent auditing requirements, how would you go about achieving it?
A. Since it is a new application, there are a number of options as listed below.
Q. What if you have to work with an existing legacy application?
A. Use triggers.
No comments:
Post a Comment