Unlocking the Power of Druid MSQ in Superset
At Millersoft, we’ve been working to make Apache Druid more efficient, flexible, and cost-effective for analytics workloads. One area we’ve focused on is the Python-to-Druid connector (pydruid), which powers integrations with Apache Superset.
We’re excited to share some improvements we’ve made that open the door to new capabilities in Superset dashboards and the potential for substantial cost savings in large-scale Druid deployments.
What We've Done
We extended the pydruid connector to support Druid’s MSQ (Multi-Stage Query) engine, so Superset dashboards and charts can now run their queries through it. Superset can also now cancel running Druid queries, which prevents wasted resources and spares analysts from waiting on queries they no longer need.
These enhancements make Superset more responsive for analysts and more efficient for operators.
Why MSQ Is a Game-Changer
Traditionally, Druid queries rely heavily on historical nodes, which keep data segments loaded on local disk. This keeps queries fast, but it is expensive to maintain, especially as data volumes grow.
The MSQ engine changes that:
- Queries can run directly against deep storage, so not all historical data has to stay loaded on historical nodes.
- Organizations can keep only recent hot data (e.g., the last 90 days) on historical nodes.
- Older data can remain cost-effectively stored in deep storage, only queried on demand.
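As a rough illustration of what an on-demand query against deep storage can look like at the HTTP level, the sketch below builds a request for Druid's SQL statements endpoint (`/druid/v2/sql/statements`), which runs queries asynchronously. It is not the code inside our pydruid changes. The Router address, the `events` datasource, and the helper name `build_statement_request` are assumptions made for the example, and the request is only constructed, not sent.

```python
import json
from urllib.request import Request

# Assumption: a Druid Router reachable at this address; adjust for your cluster.
DRUID_URL = "http://localhost:8888"


def build_statement_request(sql: str, base_url: str = DRUID_URL) -> Request:
    """Build (but do not send) a POST to Druid's SQL statements endpoint."""
    payload = {
        "query": sql,
        # The statements API executes queries asynchronously.
        "context": {"executionMode": "ASYNC"},
    }
    return Request(
        url=f"{base_url}/druid/v2/sql/statements",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical datasource name `events`, used only for illustration.
req = build_statement_request(
    "SELECT COUNT(*) FROM events WHERE __time < TIMESTAMP '2023-01-01'"
)
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would submit it. The JSON response carries a query ID that can then be used to poll for status and fetch results.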
The Cost-Saving Opportunity
This hybrid model allows companies to reduce the number and size of historical nodes they need to operate.
For example:
- Keep only 90 days of hot data in historicals.
- Use MSQ to query older data in deep storage when needed.
- Scale down historical nodes from multiple larger instances to fewer, smaller ones.
Depending on data volumes, this strategy could save thousands of dollars per month in infrastructure costs while retaining complete access to historical data.
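The 90-day split above is typically expressed through Druid retention (load) rules. The sketch below generates one possible rule set: a `loadByPeriod` rule that keeps recent segments on historicals, followed by a `loadForever` rule with no replicas so that older segments stay available in deep storage without being loaded. The helper name `hot_cold_rules`, the replica count, and the use of `_default_tier` are assumptions for the example, so check them against your own tiers before applying anything.

```python
import json


def hot_cold_rules(hot_period: str = "P90D", replicas: int = 2) -> list:
    """Return retention rules: recent data on historicals, the rest in deep storage only."""
    return [
        {
            # Load segments from the most recent `hot_period` onto historicals.
            "type": "loadByPeriod",
            "period": hot_period,
            "includeFuture": True,
            "tieredReplicants": {"_default_tier": replicas},
        },
        {
            # Keep older segments with zero replicas, so they are not loaded
            # on historicals but remain in deep storage.
            "type": "loadForever",
            "tieredReplicants": {},
            "useDefaultTierForNull": False,
        },
    ]


print(json.dumps(hot_cold_rules(), indent=2))
```

Rules are evaluated in order, so the 90-day rule has to come first. The resulting JSON is the kind of payload the Coordinator accepts for a datasource's rules, either through the web console or its rules API.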
A Better Experience in Superset
From the end-user’s perspective, the improvements are seamless:
- Dashboards and charts remain fast and interactive for recent data.
- Long-term analytics can still be run when needed, without bloating infrastructure.
- Analysts can cancel queries directly from Superset, saving time and frustration.
This makes Superset not just a visualization tool, but also a cost-conscious analytics platform that adapts to both business and technical needs.
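For readers curious what cancellation involves underneath, here is a minimal sketch of the HTTP request that cancels a query submitted through Druid's SQL statements endpoint. It is an illustration only, not the logic in our pydruid or Superset changes. The Router address, the helper name `build_cancel_request`, and the query ID are hypothetical, and the request is built but not sent.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumption: a Druid Router reachable at this address; adjust for your cluster.
DRUID_URL = "http://localhost:8888"


def build_cancel_request(query_id: str, base_url: str = DRUID_URL) -> Request:
    """Build (but do not send) a DELETE that cancels a statements-API query."""
    # Percent-encode the ID so unusual characters cannot alter the URL path.
    safe_id = quote(query_id, safe="")
    return Request(
        url=f"{base_url}/druid/v2/sql/statements/{safe_id}",
        method="DELETE",
    )


# Hypothetical query ID, as returned when a statement is submitted.
req = build_cancel_request("query-1234-hypothetical")
print(req.get_method(), req.full_url)
```

A client such as Superset only needs to remember the query ID it received at submission time in order to issue this call when a user presses cancel.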
Looking Ahead
The combination of Superset’s flexibility and Druid’s MSQ engine gives organizations a new way to balance performance, cost, and data accessibility.
By making MSQ a first-class citizen in Superset through our pydruid improvements, we’re helping teams:
- Deliver fast, interactive dashboards.
- Keep full access to long-tail historical data.
- Reduce infrastructure costs by optimizing their Druid footprint.
In other words: do more with less, without sacrificing depth of insight.