When working with large-scale data, even the most minor changes can dramatically impact performance, which is why we need to think of introducing a performance factor in our database code. In this article, I’d like to accomplish the following:
- Facilitate the developers to write premium and efficient code.
- Enhanced features that should be adopted for future database development.
- Identification of most common DB-level programming mistakes that cause issues on the client-side.
- Conceptual understanding of the various DB-related problems that we encounter regularly.. .
- Few known problems and their workaround.
Query Optimization Tips:
We can easily adapt and follow the best practices while writing a database query. It will ultimately improve the performance of the application in data manipulation on the backend.
- Use specific Column Names instead of * in SELECT query.
- Try to use alternatives of COUNT (*) for returning total tables row count (it causes full table scan)
- Try to avoid using HAVING clause in select statement.
- Try to limit the number of subquery blocks Inside a query.
- SELECT col_1 FROM table_name1 WHERE (col_2, col_3) = (SELECT MAX (col_2), MAX (col_3) FROM table_name2) AND col_4 = _testvalue1;
- Attempt to utilize UNION ALL rather than UNION, whenever possible.
- Use EXISTS instead of DISTINCT
- Use - > SELECT d.col_id, d.col2 FROM table1 d WHERE EXISTS (SELECT ‘X’ FROM table2 e WHERE e.col2 = d.col2);
SELECT DISTINCT d.col_id, d.col2 FROM table1 d, table2 e WHERE d.col2 = e.col2;
Note: Table 2 has duplicate records, and none of its columns areis required in the select list; we can replace join with exists, which enables us to eliminate the distinct keyword form select.
- We should use conditions in the WHERE clause carefully. Below are some examples:
- Use -> SELECT id, col1, col2 FROM table WHERE col2 > 10;
SELECT id, col1, col2 FROM table WHERE col2 != 10;
(index is not picked in case of != operator)
- Use -> SELECT id, col1, col2 FROM table WHERE col1 LIKE ‘Nav%’;
SELECT id, col1, col2 FROM table WHERE SUBSTR (col1,1,3) = ‘Nav’;
- Use -> SELECT Col1, Col2 FROM table WHERE Col3 BETWEEN MAX (Col3) and MIN (Col3)
SELECT Col1, Col2 FROM table WHERE Col3 >=
MAX (Col3) and Col3 <= MIN (Col3)
- Use -> SELECT id, Col1, Col2 FROM table WHERE Col2 < 25000;
SELECT id, Col1, Col2 FROM Table WHERE Col2 + 10000
- Try to use stored procedures instead of heavy queries to reduce network traffic. Our client will only send the stored procedure name (along with some parameters) to the server instead of heavy and lengthy queries text. Stored procedures can be used to enhance security. For example, we can give different users a different set of permissions to execute the stored procedure to work with the restricted set of columns and data.
- Try to use the following guidelines, for sub-query writing:
- Attempt to utilize a correlated sub-query when the returned data is moderately small or other measures to increase efficiency, i.e., if the tables within the sub-query have efficient indexes.
- Try to use a non-correlated (does not refer to the outer query) sub-query when dealing with large tables from which you expect a large return (many rows) or if the tables inside the sub-query do not have efficient indexes.
- Confirm that multiple sub-queries are in the extreme productive order.
- Remember that rewriting a subquery as a join may increase efficiency.
- Do not write order by clause in the sub-query.
Code Optimization Tips:
- Use derived tables instead of a temporary table if possible.
- Use Clustered indexes with temporary tables for large datasets.
- Use INSERT INTO instead of SELECT INTO for the temporary table population.
- SET NOCOUNT ON toward the start of each stored procedure you compose. This statement needs to be incorporated in every stored procedure, trigger, and so forth that you write.
- When you want to execute a string of Transact-SQL, you need to use the sp_executesql stored procedure in place of the EXECUTE statement.
- Use stored procedures instead of views as they offer better performance.
- If you use BULK INSERT to import data into SQL Server, use the TABLOCK hint along with it.
- Do not use the DDLs, especially ones such as enable/disable. Instead, ensure all the FK and references are properly indexed.
- In Oracle, always use NVL instead of ISNULL.
- The detailed table should only be queried if the required fields are not available in master tables.
- If an insert follows an update on the same table, try to incorporate the insert’s update changes to avoid update cost.
MERGE VS UPDATE:
The UPDATE statement will most likely be more efficient than a MERGE if the all you are doing is updating rows. Given the complex nature of the MERGE command’s match condition, it can result in more overhead to process the source and target rows. However, when you need to do more than one operation, the MERGE command is most likely a better choice, as you are only making one pass through the data as opposed to multiple passes, one for each separate UPDATE, INSERT, or DELETE command, through the source data.
Functions with Table Column:
- Functions with the Table column can degrade the performance, so avoid it.
Explain Plan with Trim
Without Trim Function
- Indexes can improve performance in a big way if used sensibly.
- Indexes speed up the queries.
- Indexes are useful only if there is a small portion of data required to be manipulated.
Partitioning is the ability of a database to take extensive tables or indexes and physically break them up into:
- Manageable pieces
- When to partition a table:
- Tables larger than 2GB should consistently be considered for Partitioning.
- When the contents of the table need to be distributed across different types of storage devices.
- Without partition execution plan show cost 2.075 and goes for a full table scan.
- Now we apply a partition on tble_test table on execution_dte column, the cost reduced to 0.20
Optimization using WITH Clause:
- CTE (Common Table Expression) is defined at the beginning of your query or the top.
- It makes it more readable than a derived table.
- It is mainly a more optimized approach if the results of WITH query areis required more than one time in the body of the query.
AVG(SubTotal) OVER(PARTITION BY CustomerID) AS AvgSubTotal
WHERE AvgSubTotal > 100
Methodology to Avoid Deadlocks and Blockings:
Deadlock occurs when
- Each process is hanged, waiting for the other to release a resource.
- DBMS must kill one of the processes on a random basis to proceed forward.
- Usually, Deadlocks occur due to bad programming techniques; there are few things that a programmer must consider while coding to avoid or minimize the Deadlock occurrences.
- Resources should be acquired in some well-defined order because if all concurrent transactions access objects in the same order, deadlocks are less likely to occur.
- Utilizing stored procedures for all data alterations can normalize the order of accessing objects.
- Control the transaction locks for the SELECT statement by choosing the correct isolation or using the locking hints specified in the FROM clause, i.e., no lock.
- Deadlocks nusually occur when some of the long-running transactions execute simultaneously.
- Keep transactions as short as possible. This approach should also be adopted if the application experiences excessive blockings.
We can practice the above guidelines whenever we are dealing with performance issues in databases. The best practice is to consider all the above factors when doing code at the database level or consider them in the unit testing phase. It will help in reducing extra effort consumed in eliminating performance issues in data manipulation. Moreover, these tips can also be considered while performing code reviews.