
Unlocking the Power of Druid MSQ in Superset

· 3 min read
Aidan Mulgrew
Software Engineer

At Millersoft, we’ve been working to make Apache Druid more efficient, flexible, and cost-effective for analytics workloads. One area we’ve focused on is the Python-to-Druid connector (pydruid), which powers integrations with Apache Superset.

We’re excited to share some improvements we’ve made that open the door to new capabilities in Superset dashboards and the potential for substantial cost savings in large-scale Druid deployments.


What We've Done

We extended the pydruid connector to support Druid’s MSQ (Multi-Stage Query) engine, so Superset dashboards and charts can now run their queries through it. In addition, Superset can finally cancel running Druid queries, preventing wasted resources and speeding up the user experience.

These enhancements make Superset more responsive for analysts and more efficient for operators.
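
To make the cancellation piece concrete, here is a minimal sketch of the kind of HTTP call involved. It uses Druid's documented SQL cancel endpoint (DELETE /druid/v2/sql/{sqlQueryId}) rather than our pydruid code, and the host, port and query ID are assumptions for illustration only. The request is built but never sent.

```python
import urllib.parse
import urllib.request

# Hypothetical Router/Broker address; replace with your own deployment's.
DRUID_URL = "http://localhost:8888"


def build_cancel_request(sql_query_id: str, base_url: str = DRUID_URL) -> urllib.request.Request:
    """Build (but do not send) a request that cancels a running Druid SQL query.

    sql_query_id is the sqlQueryId that was set in the query context when the
    query was submitted.
    """
    safe_id = urllib.parse.quote(sql_query_id, safe="")
    url = f"{base_url}/druid/v2/sql/{safe_id}"
    return urllib.request.Request(url, method="DELETE")


# Illustrative query ID only.
req = build_cancel_request("dashboard-chart-42")
print(req.get_method(), req.full_url)
```

Passing the result to urllib.request.urlopen would issue the cancel against a live cluster.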


Why MSQ Is a Game-Changer

Traditionally, Druid queries rely heavily on historical nodes that keep data loaded on disk. While this ensures speed, it can be expensive to maintain, especially as data volumes grow.

The MSQ engine changes that. Now:

  • It can query directly against deep storage, removing the need to keep all historical data on disk.
  • Organizations can keep only recent hot data (e.g., the last 90 days) on historical nodes.
  • Older data can remain cost-effectively stored in deep storage, only queried on demand.
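
As a rough illustration of querying deep storage, the sketch below builds the request body for Druid's asynchronous SQL statements endpoint. The endpoint path and the executionMode context key come from Druid's API documentation. The table name and cut-off date are made up, and this is not necessarily how our pydruid changes submit queries.

```python
import json


def build_deep_storage_query(sql: str) -> dict:
    """Return the endpoint path and JSON body for an async query from deep storage."""
    return {
        "path": "/druid/v2/sql/statements",
        # ASYNC is the execution mode Druid documents for this endpoint;
        # results are fetched separately once the query has finished.
        "body": {"query": sql, "context": {"executionMode": "ASYNC"}},
    }


# "events" is a hypothetical datasource used only for illustration.
request = build_deep_storage_query(
    "SELECT COUNT(*) AS events FROM events WHERE __time < TIMESTAMP '2024-01-01'"
)
print(request["path"])
print(json.dumps(request["body"], indent=2))
```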

The Cost-Saving Opportunity

This hybrid model allows companies to reduce the number and size of historical nodes they need to operate.

For example:

  • Keep only 90 days of hot data in historicals.
  • Use MSQ to query older data in deep storage when needed.
  • Scale down historical nodes from multiple larger instances to fewer, smaller ones.

Depending on data volumes, this strategy could save thousands of dollars per month in infrastructure costs - all while retaining complete access to historical data.
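
One way to express the 90-day split is through Druid retention rules. The sketch below is an assumption about how such a setup could look, not a description of any particular deployment: the first rule loads recent segments onto historicals, and the second keeps older segments available in deep storage without loading them. The tier name and replica count are placeholders.

```python
import json


def hot_cold_rules(hot_period: str = "P90D", replicas: int = 2,
                   tier: str = "_default_tier") -> list:
    """Build retention rules: recent data on historicals, the rest in deep storage only."""
    return [
        # Segments within the last `hot_period` are loaded onto historicals.
        {
            "type": "loadByPeriod",
            "period": hot_period,
            "includeFuture": True,
            "tieredReplicants": {tier: replicas},
        },
        # Everything else stays "used" but is not loaded anywhere, so it
        # remains queryable from deep storage.
        {
            "type": "loadForever",
            "tieredReplicants": {},
            "useDefaultTierForNull": False,
        },
    ]


# The JSON array would be POSTed to the Coordinator at
# /druid/coordinator/v1/rules/{dataSource}.
print(json.dumps(hot_cold_rules(), indent=2))
```

Rules are evaluated in order, so the period rule has to come before the catch-all.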


A Better Experience in Superset

From the end-user’s perspective, the improvements are seamless:

  • Dashboards and charts remain fast and interactive for recent data.
  • Long-term analytics can still be run when needed, without bloating infrastructure.
  • Analysts can cancel queries directly from Superset, saving time and frustration.

This makes Superset not just a visualization tool, but also a cost-conscious analytics platform that adapts to both business and technical needs.


Looking Ahead

The combination of Superset’s flexibility and Druid’s MSQ engine gives organizations a new way to balance performance, cost, and data accessibility.

By making MSQ a first-class citizen in Superset through our pydruid improvements, we’re helping teams:

  • Deliver fast, interactive dashboards.
  • Keep full access to long-tail historical data.
  • Reduce infrastructure costs by optimizing their Druid footprint.

In other words: do more with less, without sacrificing depth of insight.

Enterprise Enhancements to Apache Superset

· 4 min read
Aidan Mulgrew
Software Engineer

At Millersoft, we’ve continued to invest in making Apache Superset more powerful, flexible, and enterprise-ready. Superset is already a fantastic open-source BI platform, but we’ve found that with a few key enhancements, it can become even better suited to real-world business needs, especially around automated reporting and usability.

Here’s a look at the improvements we’ve made to make Superset a more capable and adaptable analytics platform for our teams.


Smarter Data Handling with Detokenisation

One of our main goals was to make sensitive data handling both secure and user-friendly.

We introduced a new Detokenisation feature, available in both SQL Lab and the charting interface, which allows users to view tokenised data in a more readable format.

When enabled, this feature manipulates the returned dataframe to replace tokens (marked with a prefix like t:) with their plaintext values, which makes data exploration and analysis more intuitive for users who have permission.

This balances data security with ease of analysis, ensuring that tokenised values can be revealed only when needed, and only for authorised users.
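
The idea can be sketched in a few lines. This is an illustration only, not our implementation: plain Python rows stand in for the dataframe, the lookup function and the in-memory vault are hypothetical, and only the t: prefix is taken from the description above.

```python
from typing import Any, Callable

TOKEN_PREFIX = "t:"  # prefix mentioned above; everything else here is assumed


def detokenise_rows(rows: list, lookup: Callable[[str], Any], authorised: bool) -> list:
    """Replace token values with plaintext, but only for authorised users."""
    if not authorised:
        return rows  # unauthorised users keep seeing the tokens

    def reveal(value: Any) -> Any:
        if isinstance(value, str) and value.startswith(TOKEN_PREFIX):
            return lookup(value)
        return value

    return [{col: reveal(val) for col, val in row.items()} for row in rows]


# Hypothetical token vault; a real one would be an external service.
vault = {"t:9f2c": "alice@example.com"}


def vault_lookup(token: str) -> str:
    # Unknown tokens are left untouched rather than raising.
    return vault.get(token, token)


rows = [{"customer": "t:9f2c", "orders": 3}]
revealed = detokenise_rows(rows, vault_lookup, authorised=True)
hidden = detokenise_rows(rows, vault_lookup, authorised=False)
print(revealed)
print(hidden)
```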


Enhanced Email Reporting Capabilities

Out of the box, Superset’s email reporting capabilities are relatively limited, typically supporting only basic recipient and subject fields.

We’ve expanded this functionality significantly by adding CC and BCC fields for broader communication flexibility, and a custom email body, allowing teams to tailor their message content beyond the standard “Explore in Superset” link.

Enhanced Email Reporting

These changes make it easier for teams to send reports that are functional, context-rich, and branded, helping analytics outputs fit seamlessly into business workflows.
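
For readers curious what the extra fields amount to at the message level, here is a small sketch using Python's standard email library. The function name, addresses and body text are illustrative assumptions. It does not show Superset's own email code.

```python
from email.message import EmailMessage
from typing import Sequence


def build_report_email(subject: str, to: Sequence[str], cc: Sequence[str] = (),
                       bcc: Sequence[str] = (), body: str = "") -> EmailMessage:
    """Assemble a report email with optional CC, BCC and a custom body."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["To"] = ", ".join(to)
    if cc:
        msg["Cc"] = ", ".join(cc)
    if bcc:
        # smtplib's send_message uses Bcc for delivery but does not
        # transmit the header to recipients.
        msg["Bcc"] = ", ".join(bcc)
    msg.set_content(body)
    return msg


# All addresses below are placeholders.
msg = build_report_email(
    subject="Weekly sales report",
    to=["analyst@example.com"],
    cc=["manager@example.com"],
    bcc=["audit@example.com"],
    body="Hi team, this week's figures are attached.",
)
print(msg["To"], msg["Cc"], msg["Bcc"])
```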


New S3 Integration for Report Delivery

In addition to enhanced email support, we’ve added AWS S3 as a new notification and delivery method for Superset’s reporting system. While Superset natively supports Email and Slack, our enhancement allows users to additionally export and deliver reports directly to S3, providing a simple and secure way to integrate reports into external systems or make them accessible via an SFTP server.

S3 Export Example

This functionality is built as a new S3 notification plugin, extending Superset’s base notification framework used by Email and Slack. It’s fully integrated into the Superset UI — users can select S3 as a delivery option directly from the Alerts & Reports configuration screen.
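
The plugin pattern can be sketched as follows. The base class here is a simplified stand-in written for this example, not Superset's actual class, and the upload callable is a hypothetical seam where a real S3 client call would go.

```python
from typing import Callable, Optional


class BaseNotification:
    """Simplified stand-in for a base notification class (not Superset's own)."""

    plugins: list = []
    type: Optional[str] = None

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Every subclass registers itself as an available delivery method.
        BaseNotification.plugins.append(cls)

    def __init__(self, recipient: str, content: bytes):
        self.recipient = recipient
        self.content = content

    def send(self) -> None:
        raise NotImplementedError


class S3Notification(BaseNotification):
    """Delivers report content to an s3://bucket/key destination."""

    type = "S3"

    def __init__(self, recipient: str, content: bytes,
                 upload: Callable[[str, str, bytes], None]):
        super().__init__(recipient, content)
        # In production this could wrap boto3's put_object(Bucket=, Key=, Body=).
        self._upload = upload

    def send(self) -> None:
        bucket, _, key = self.recipient.removeprefix("s3://").partition("/")
        self._upload(bucket, key, self.content)


# A fake uploader keeps the example runnable without AWS credentials.
uploaded = {}


def fake_upload(bucket: str, key: str, data: bytes) -> None:
    uploaded[(bucket, key)] = data


S3Notification("s3://reports-bucket/daily/sales.csv", b"a,b\n1,2\n", fake_upload).send()
print(sorted(uploaded))
```

Injecting the uploader keeps the delivery logic testable and leaves credential handling outside the plugin.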


Integration with Superset’s New Alert and Report System

With the release of Superset 4.0, a new Alerts and Reports framework was introduced, replacing the previous reporting mechanisms. To ensure full compatibility, we reimplemented our enhancements, including the S3 delivery option and custom email body support, within this new framework. The updates extend the base notification class used by Superset’s core notification methods, ensuring that our improvements remain upgrade-safe and consistent with Superset’s architecture.

To access the new features, users simply select “Alerts & Reports” from the Superset settings menu, where they’ll find the additional configuration options for S3 delivery and enhanced email fields.


Secure, Enterprise-Grade Authentication with Keycloak

We also integrated Keycloak authentication, enabling teams to manage user access through a central identity provider. This integration supports enterprise requirements like single sign-on (SSO), role-based access control (RBAC), and multi-factor authentication, allowing organizations to manage Superset users securely and at scale.
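
For orientation, a Keycloak provider entry in superset_config.py might look roughly like the sketch below, following the OAuth provider layout that Flask-AppBuilder documents. The hostname, realm, client ID and secret are placeholders. Older Keycloak versions prefix these paths with /auth, and depending on the versions involved a custom security manager may also be needed to map user details and roles.

```python
def keycloak_provider(base_url: str, realm: str, client_id: str, client_secret: str) -> dict:
    """Build one OAUTH_PROVIDERS entry pointing at a Keycloak realm."""
    oidc = f"{base_url.rstrip('/')}/realms/{realm}/protocol/openid-connect"
    return {
        "name": "keycloak",
        "icon": "fa-key",
        "token_key": "access_token",
        "remote_app": {
            "client_id": client_id,
            "client_secret": client_secret,
            "api_base_url": oidc,
            "client_kwargs": {"scope": "openid email profile"},
            "access_token_url": f"{oidc}/token",
            "authorize_url": f"{oidc}/auth",
        },
    }


# In superset_config.py this would sit alongside:
#   from flask_appbuilder.security.manager import AUTH_OAUTH
#   AUTH_TYPE = AUTH_OAUTH
# All values below are placeholders, not real credentials.
OAUTH_PROVIDERS = [
    keycloak_provider("https://keycloak.example.com", "analytics", "superset", "change-me")
]
print(OAUTH_PROVIDERS[0]["remote_app"]["authorize_url"])
```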


A More Robust and Flexible Superset

Taken together, these improvements make Superset not only more powerful, but also more aligned with modern business needs:

  • Secure and user-aware data handling through detokenisation.
  • Richer, more customizable reporting with enhanced email options and S3 delivery.
  • Enterprise-ready access control with Keycloak integration.
  • Seamless compatibility with Superset’s new Alerts & Reports system.

By enhancing these key areas, we’ve turned Superset into a more complete BI platform — one that integrates deeply into existing enterprise workflows while maintaining the speed, openness, and flexibility that make it so popular with analysts and engineers alike.


Looking Ahead

We see these enhancements as the foundation for future improvements. As our use of Superset continues to grow, we’re exploring further ways to make it smarter, more automated, and more connected with the wider data ecosystem.

With the right enhancements, open-source analytics tools like Superset can deliver the best of both worlds: enterprise-grade capability and open-source agility.

Counting Distinct Values Accurately in Druid SQL

· 3 min read
Aidan Mulgrew
Software Engineer

When working with Druid SQL, it's easy to fall into a common trap when counting distinct values: using COUNT(DISTINCT ...) directly can sometimes return unexpected results. Recently, I hit a case where COUNT(DISTINCT) returned a different value than selecting the DISTINCT rows manually - and this post explains why that happens, and how to fix it.

Unlock NetSuite Sales and Orders with Sheetloom

· 3 min read
Gerry Conaghan
Business Development Manager

Filling business decision models with essential data captured in NetSuite™ can be tiring, confusing, repetitive and error-prone. The good news for the brave souls doing all that manual labour is Sheetloom. Sheetloom is the game-changing SaaS solution that seamlessly automates the injection of NetSuite™ data straight into dynamic Excel decision models, pivots, dashboards and reports.

Credit Control

· One min read
Calum Miller
Director

Credit Control is a challenging and time-consuming affair which, if managed incorrectly, can lead to cash flow problems and even business failure. Keeping track of customer payment history to spot potential problems is an important task, but it is often labour-intensive, complex and error-prone.

Finance - Credit Check Automation

· One min read
Calum Miller
Director

A finance company used a mix of the latest digital tech and a human touch to bring a different approach to small business lending.

A crucial part of the loan review process required the company to credit score each applicant. Testing the credit score process proved very time-consuming using traditional methods. The finance company would manually select applicants, collect individual responses and then consolidate results from a credit checking service.

Information Technology must serve decision makers

· 3 min read
Calum Miller
Director

All Business Intelligence (BI) vendors have a dirty little secret. It’s hidden under a weighty digital rock, keeping all those electronic worms company. It’s shared by Looker, Pentaho, Power BI, Tableau, SAP, Quicksight, et al.

It’s obvious, but seldom noticed. Take a deep breath, here it is:

All BI clients use Excel, way more than the BI vendors care to admit.