Should Stakeholders be Writing SQL for Self-Service?
Mar 19, 2025
I’ve spent 25 years of my career in tech and the last 15 years as an analytics leader at Amazon, eBay, VMware, and more. During that time I’ve consistently watched non-technical leaders ask for more self-service data enablement for their stakeholders. They aren’t wrong to ask for this, and much of the time self-service enablement is delivered in the form of dashboards or automated reports. But some non-technical leaders unknowingly push self-service into risky territory by advocating for stakeholders to self-serve by writing SQL.
Enabling that kind of self-service comes with risks that most non-technical leaders and stakeholders are unaware of. I say this with confidence because I’ve seen countless examples where even data professionals were unaware of the same risks. The decision to enable stakeholder access to SQL shouldn’t be taken lightly. Here’s what gets overlooked and what you need to know.
Lack of Training
The first thing worth calling out is the lack of training. Every time a leader has proposed enabling SQL access for stakeholders, I’ve circled up with the stakeholders who would be responsible for writing the SQL.
On occasion a few stakeholders may have come from an analytics background, but most have never worked in analytics. Given that lack of experience, stakeholders should be trained to use SQL and to understand the data environment before being granted access to write and run SQL statements. As logical as this sounds, it’s frequently brushed under the rug or overlooked. Just ask any data analyst working in the field.
Analysts rarely receive formal training on the job and most learn by doing the task in the moment. Or they learn from someone slightly more senior, who likely learned the same way. Depending on your perspective, this might have you thinking, “See, it’s easy, anyone can do this.” But it’s not that easy, even if it looks that way.
Just because analysts are doing the job doesn’t mean their work is accurate all of the time, either. This isn’t a knock on analysts. It highlights issues with the system and with how well analytics is understood and supported.
If analysts aren’t being trained, should we expect non-analysts to receive training, or to perform better than the analysts?
SQL Won’t Tell You, “You’re Wrong”
If you’ve ever written SQL as a data analyst, you know that it’s entirely possible to write a SQL statement that runs without error but produces inaccurate results. In fact, this happens all the time. Whether it’s 10 lines of code or a massive script, there’s almost always a bug in your first attempt at writing the code. No matter how many years you’ve been doing this, how many lines you’ve written, or how much training you have, this will always be true. The question is, do you know that there’s a bug?
Assuming there isn’t a bug is ill-advised. The bug could be one that generates inaccurate results today, or flawed logic that will produce inaccurate results when the code runs at a future date. While many stakeholders believe, or would at least like to believe, that all of their reports are accurate, there’s a good chance that a number of inaccuracies exist in the current production reports and datasets.
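To make the failure mode concrete, here’s a minimal sketch; the table and column names are hypothetical. The query below runs cleanly, yet it can silently overstate revenue:

```sql
-- Runs without error, but if an order can have multiple shipment rows,
-- the join fans out and each order's revenue is counted more than once.
SELECT o.order_month,
       SUM(o.revenue) AS total_revenue
FROM   orders o
JOIN   shipments s
  ON   s.order_id = o.order_id
GROUP BY o.order_month;
```

Nothing in the output flags the problem. The numbers look plausible, the query succeeds, and the only way to catch the inflation is to know the data well enough to check for it.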
There are safeguards against bugs like this, but they are brittle and frequently overlooked.
If you can generate inaccurate results without errors being thrown, how will you ever know that your output is inaccurate?
Quality Assurance Checks
To safeguard against inaccurate data making its way into production reports or output, quality assurance (QA) checks can be conducted. It sounds simple, but again, it usually isn’t.
In a simplistic example, we might think of a QA check as follows. Assume there are two rows (one column) of data in Excel, and we’re looking to sum those two rows. Both rows have a value of 2. If the sum is 10, we know that something is wrong. This is obvious because it’s a small dataset and we can do the math mentally. If the sum were 4, we’d assert that the data is correct. But is the code actually correct?
The output 2 + 2 = 4 is correct, but we haven’t looked at the code. The code might not be written as “value_1 plus value_2”. Instead, it might be written as “value_1 times value_2”. It just so happens that in this case both sets of logic produce the same result. The QA check passed, but it wasn’t sufficient to prove the accuracy of the logic.
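Here’s that trap as a small SQL sketch, using hypothetical column names that match the example above:

```sql
-- Both columns hold the value 2, as in the Excel example.
WITH input AS (
    SELECT 2 AS value_1, 2 AS value_2
)
SELECT value_1 + value_2 AS intended_logic,  -- addition: returns 4
       value_1 * value_2 AS buggy_logic      -- multiplication: also returns 4
FROM   input;
-- A QA check that only asserts "the result equals 4" passes for both
-- expressions, so it cannot tell the correct logic from the buggy one.
```

Change either input to 3 and the two expressions diverge, which is exactly why a QA check needs more than one happy-path input.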
For an analyst, QA checks become difficult, time consuming, and costly. Given the frequent lack of training, the complexity of systems, and the barrage of urgent requests, many data analysts skip sufficient quality assurance checks. For example, at Amazon, my team worked with numerous tables holding hundreds of billions of rows, which made QA checks difficult to perform.
When working with large datasets, quickly querying the data and visually inspecting the output is frequently a challenge, and at times feels nearly impossible. As a former leader of software quality assurance engineering teams, what I found most shocking was that the corporate data tools weren’t designed for efficient and effective QA.
What this meant was that few people were actually performing QA from code block to code block. Analysts would frequently only perform a QA check on the final output. In many cases, these were aggregated spot checks at best. Rarely were they the one-to-one checks that an experienced quality assurance engineer or detailed auditor would look for.
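A block-to-block check doesn’t have to be elaborate. As an illustrative sketch with hypothetical table names, a simple reconciliation between a source table and the step derived from it catches dropped or duplicated rows long before the final output exists:

```sql
-- Reconcile a derived step against its source. If the transformation
-- neither drops nor duplicates orders, these counts line up one-to-one.
SELECT
    (SELECT COUNT(DISTINCT order_id) FROM raw_orders) AS source_orders,
    (SELECT COUNT(DISTINCT order_id) FROM stg_orders) AS derived_orders,
    (SELECT COUNT(*)                 FROM stg_orders) AS derived_rows;
-- source_orders <> derived_orders  => rows were dropped or invented
-- derived_rows  <> derived_orders  => a join fanned out somewhere
```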
If analysts who perform the job don’t have the tools or training to perform proper quality assurance checks, can we expect stakeholders to perform better?
The Ground is Always Shifting
Even code that is perfectly accurate and bug-free today carries risk, because the underlying datasets and systems are constantly changing. What worked yesterday might not work today, and again, you might not even know it.
For example, when I was at eBay, one of my partner analytics teams owned a metric for calculating losses. The calculation wasn’t technically complex, but nine different fields had to be pulled into it. Unfortunately, the code and data weren’t properly architected, which meant that when other teams, such as mine, wanted to calculate losses, we couldn’t get it by querying a single value from a single table. We had to copy their code and logic. Again, not difficult, but a major issue nevertheless.
This is because any time the partners changed their formula, I’d have no way of knowing that it had changed. The Losses team could change the logic in their code, but what about me and everyone else? Again, my code would likely run without error, but the results would be wrong. And they were. And even with proper engineering practices, issues still arise, most frequently when production tables are changed or deprecated.
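As for what “properly architected” would have looked like: the owning team publishes the metric once, and everyone else queries the published value instead of copying the formula. A minimal sketch, with hypothetical names standing in for the nine real fields:

```sql
-- The owning team publishes the losses logic exactly once, as a view.
CREATE OR REPLACE VIEW finance.losses AS
SELECT transaction_date,
       -- the real calculation pulled nine fields; two stand in here
       chargebacks + refunds AS total_losses
FROM   finance.loss_inputs;

-- Consumers query the published value, so a formula change made in the
-- view propagates to every downstream user automatically.
SELECT transaction_date, total_losses
FROM   finance.losses;
```

With a shared view like this, a formula change happens once, in one place. It doesn’t, however, protect anyone from the deprecation problem.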
Over time, data engineering and analytics teams will deprecate tables. Sometimes they’ll remove the original table. Other times they’ll leave the original table in place but stop updating the data in it. In a perfect world, the table owner will inform all users that the table is being deprecated and that any related queries should be updated. But like I mentioned, that’s a perfect world, and it’s not the one we live in.
In many situations, the table owner would send an email to a department as a heads-up that something was changing. There was rarely any safeguard to ensure that everyone who needed to know about the change actually saw the notice, and rarely a safeguard to ensure that the changes were actually made. Again, a previously accurate query and its output would continue to be assumed accurate when it no longer was.
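One cheap safeguard against this failure mode, sketched here with hypothetical names and Postgres-style date arithmetic, is a freshness assertion that fails loudly when a table silently stops updating:

```sql
-- Alert when a supposedly daily table goes stale. A scheduled job can
-- notify the owner when stale_days exceeds the expected threshold.
SELECT MAX(load_date)                AS last_load,
       CURRENT_DATE - MAX(load_date) AS stale_days
FROM   sales_daily;
-- For a daily table, stale_days > 1 signals the "left in place but no
-- longer updated" scenario: queries keep succeeding on stale data.
```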
If data analysts struggle to maintain proper engineering practices to avoid shifting ground, and they encounter bugs without knowing, what hope do non-analysts have?
Messy Environment
Even if every human writing SQL had training, and they conducted proper quality assurance, and they used the best architectural practices, and they were properly notified of changes, there’s another issue: contributions to a messy environment.
Analytics environments, even those maintained by analytics teams, are frequently messy. I’m not sure every analyst would agree, but I feel I have a fair perspective on this topic. I’ve worked in software development and quality assurance teams, where there are standards and conventions. Those same standards are rarely carried over into analytics environments.
Over the last 15 years, I’ve been solving these analytics architecture problems at every company I’ve worked at, including Amazon, eBay, VMware, and others. Keep in mind that these companies sit at the top of the Analytics Maturity Model. I’ve never once inherited a situation where an analytics environment sufficient for high-quality, high-efficiency work already existed. Given the existing challenges for analysts, when non-analysts enter the environment, the problems tend to snowball.
At Amazon, for example, many product managers had the ability to write SQL and create scheduled jobs. They did so in the same environment that the analysts worked within. Unfortunately, they didn’t follow any of the practices or conventions that we developed in our team. This created inconsistency, confusion, and cognitive load for everyone using the system. And it led to other issues.
If analysts struggle to create environments of clean and consistent practices, can we expect non-analysts working outside of the environment to do any better?
Supporting Stakeholders
At Amazon, some product managers were enabled with self-service, frequently and unknowingly assuming all of the risks mentioned above. And when tables were deprecated and deleted, or when systems changed, sometimes the stakeholders were made aware. When they were, they didn’t always modify their code or fix their own issues. Instead, that responsibility was frequently transferred to the analysts.
Without fail, the request to fix their code would occur when the stakeholder was under pressure and had an urgent request. That urgent request then became the analyst’s urgent request, causing disrupted timelines for anyone working with the analytics team. This was also an inefficient way to update code. But there were additional issues.
When more significant changes were made to back-end systems, our VP would get notified via what I called “the naughty list”. These were emails sent to our VP when we had datasets that were supposed to be migrated but weren’t migrated by the deadline.
Unfortunately, because stakeholders would moonlight in the analytics environment, writing their own code and scheduled jobs, it was the analytics team, not the offending stakeholder, who would be held accountable. This had a negative impact on our team’s brand. On top of that, the analytics team had to chase down stakeholders to understand if their code could be deprecated or if it should be maintained. Again, performing updates isn’t as simple as updating a line of code. There are quality assurance checks that should be performed with every update.
If stakeholders aren’t able to support their own work, is your analytics team prepared to take on the additional work and urgency of requests?
Real Costs
Then there are the real costs. Running a query costs the company real money. Unfortunately, even most analysts don’t realize the true cost of their queries. While analysts may sit patiently waiting for a query to complete, the elapsed time witnessed by a human isn’t the same as the amount of processing time consumed in the data environment.
The amount of CPU usage and system usage can change dynamically depending on a number of factors. And most of the usage data that gets displayed is vague and doesn’t easily translate into a dollar cost.
Again, speaking from experience at Amazon, I had to build my own reports and dashboards to help my team understand the cost of the queries that they were running. We didn’t have an out-of-the-box system or easy way to understand these costs across all environments. This lack of transparency, along with architectural issues, frequently led to scheduled jobs being triggered even though nobody needed the data.
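Those homegrown reports boil down to attributing warehouse usage back to the people running queries. As an illustrative sketch against a Snowflake-style query history view (the per-credit rate is a placeholder, and names and billing models vary by platform):

```sql
-- Rough spend attribution by user over the last 30 days.
-- Note: credits_used_cloud_services captures only part of the cost;
-- warehouse compute is metered separately on most platforms.
SELECT user_name,
       COUNT(*)                                AS query_count,
       SUM(total_elapsed_time) / 1000 / 3600   AS elapsed_hours,
       SUM(credits_used_cloud_services) * 3.00 AS est_cost_usd
FROM   snowflake.account_usage.query_history
WHERE  start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP)
GROUP BY user_name
ORDER BY est_cost_usd DESC;
```

Even a rough report like this makes the invisible visible: analysts and stakeholders alike can finally see which queries and schedules are worth their cost.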
This was the case in one department where stakeholders were granted SQL access. When the team conducted an audit of the scheduled jobs, I was told that they found over 1,000 jobs from stakeholders that weren’t being used. But they were running on a scheduled basis, costing the company thousands of dollars.
If data analysts aren’t aware of the financial impact of their work, how much will your non-analysts cost your company?
Multiple Sources of Truth
Next, we have issues of multiple sources of truth to contend with. Reducing or removing multiple sources of truth from reports and dashboards is one of the top three concerns that I frequently hear leaders mention. Yet, enabling SQL access to more individuals, especially given the context that I previously mentioned, increases the risk of multiple sources of truth.
When this occurs, the analytics and engineering teams are the groups that are frequently held responsible, again damaging the data teams’ brand and forcing them to perform additional investigations into code authored by non-data professionals.
If multiple sources of truth are already an unresolved issue, is your organization prepared to deal with a potential increase?
They Aren’t Analysts
Being a data analyst isn’t simply about writing queries and pulling data. Almost anyone can do that. But just because you have data doesn’t make it accurate, or necessarily accurate within the context in which it’s being used. I have countless stories from every company I’ve worked at where data was misinterpreted and misused due to a lack of proper context.
Given the time and money invested in driving data-driven decisions, and the implications of making a decision with bad information, it would be prudent to ensure that data is interpreted correctly. It would also be prudent to ensure that the correct questions are asked before writing any code. Unfortunately, teaching proper requirements gathering and business acumen is one of the most challenging aspects of elevating an analytics team. And stakeholders face the same challenges.
If analysts already struggle to obtain sufficient context and details when receiving stakeholder requests, how will your stakeholders know that the data they’re using is appropriate for their context?
Conclusion
Non-technical leaders frequently desire additional self-service capabilities, and some of these requests include the enablement of SQL access for non-data analysts. While I’m not completely opposed to this, there are significant risks that must first be addressed before I would recommend that an organization embark on this path.
Without a sufficient understanding of how analytics teams operate, the current challenges, and potential risks, negative outcomes will likely occur. If you’re considering enabling access, the first step is to increase the levels of operational support within your analytics teams.
Brandon Southern, MBA, is a Sr. Director of Analytics and the founder of Analytics Mentor. With a 25-year career in the tech industry, Brandon has excelled in diverse roles encompassing analytics, software development, project management, and more. He has led analytics teams at Amazon, eBay, VMware, GameStop, and more.
You can learn more about Brandon and Analytics Mentor at http://www.analyticsmentor.io/