Are you Aware of Dashboard & Query Performance Issues?
Jun 16, 2023Throughout my career leading analytics teams at eBay, Amazon, GameStop, and a handful of other companies, I’ve seen a lot of data analysts overlook potential performance issues. Whether through SQL written directly against the database or through derived SQL from Tableau and Power BI dashboards, query performance is usually an afterthought, if it’s a consideration at all. Unfortunately, many dashboards and SQL scripts cause significant performance issues, slowing down the database and increasing costs for the company. But performance can be improved, and you can start by using the SQL explain plan.
How to Detect Performance Issues
Each query has an order of operations that will be followed, and each section of the SQL query is processed like an individual task. But for many analysts triggering a query, the details about the order of operations and the amount of time that each operation will take is a mystery. This is where the explain plan can help.
A SQL explain plan is a report that details how a query will run when it is triggered against the database. This report will display each step or operation that the database will take when running the query, and it will provide an estimated amount of time that each step will take to complete.
This means that not only will the analyst be able to estimate how long the entire script will take to run, but also detect the most time-consuming operations of the query. By viewing the explain plan, an analyst can focus on the worst performing parts of the code and make the necessary adjustments to improve performance. But before performance can be improved, the analyst must have a solid understanding of SQL, the database, and the options for improving performance.
As an example, let’s assume that a query consists of a few different operations, joining various tables, filtering data, and performing calculations and aggregations. The entire query takes 40 minutes to run. But because it’s running behind the scenes as part of a dashboard data source, it’s not a point of focus for the data analyst, even though it’s negatively impacting the database and the company.
However, when the analyst is conscious of performance and views the explain plan, the analyst finds that one part of the query is taking 30 minutes. This is the largest bottleneck, and the analyst should try to resolve the performance issues on their own (if they have the knowledge and skills) or collaborate with the data engineering team to find a solution. The question is what are the solutions?
Learning About Potential Solutions
To identify solutions, the analyst needs to understand how performance problems are created. This requires an understand of the order in which the SQL statements execute and what’s really happening within the database. In this hypothetical example, the query is performing a join on two massive tables, then it’s performing an aggregation of the data. Depending on the situation, the. Analyst may be able to perform the calculation before performing the join. This would reduce the number of records that have to be joined between the two tables, thus reducing the amount of time that the query takes to run.
But what if the analyst must perform a join before the aggregation? One option might be to utilize indexes on the table for faster lookups. Another option might be to apply partitions to the table to leverage distributed or parallel processes.
Conclusion
As you look at the explain plan more frequently, you’ll start to see patterns and what operations tend to take longer than others. This will put you on a path to exploring and understanding how SQL performs the order of operations, allowing you to make consideration performance issues when writing your code. For example, if you know that you’re working with two tables that have billions of rows, and you need to perform the join based on a calendar date, you’ll know that this can be an intensive and costly query.
With this knowledge and a performance mindset, you will start to think about how you can make the query run faster. Reducing the amount of data queried by using a filter could help, but helping the database find those filtered records even faster with a partition or index might help even more. This would lead to an inspection of the database tables to ensure that they have the proper indexes and partitions before running the query.
Even if you aren’t sure how to solve performance issues, you can work toward increasing performance by being conscious of query performance. When you start using the explain plan, you’ll have a better understanding of your code, the database, and performance.
Subscribe to Receive New Articles
Join our mailing list to receive the latest articles and tips to elevate your career.
We hate SPAM. We will never sell your information, for any reason.