DATA ANALYSIS
Classified in Computers
Written at on English with a size of 4.64 KB.
**BEST**
SkinnerDB: Regret-Bounded
Query Evaluation via
Reinforcement Learning
INTRODUCTION AND PROBLEM DEFINITION:
The work is on query optimization, more specifically: SkinnerDB focuses on finding
The optimal Join Order. Because it has the most impact in practice.
SkinnerDB aims to get
Expected near optimal
Results without needing any
A-priori information.
It does not make strong
Assumptions either.
CONTRIBUTIONS:
•Introduced a new quality criterion for query evaluation strategies that compares
Expected and optimal execution cost.
• Proposed several adaptive execution strategies based on reinforcement learning.
• Formally proved correctness and regret bounds for those execution strategies.
• Experimental comparisations of those strategies, implemented in SkinnerDB,
Against various baselines.
COMPARED TO OTHER METHODS:
--Traditional query optimizers rely on
A-priory information. They predict cost
Based on coarse-grained data statistics
And under simplifying assumptions.
-- SkinnerDB maintains no data statistics
And uses no simplifying cost and
Cardinality models. Instead it learns
(near-)optimal query plans with RL.
--A lot of recent machine learning methods have been used for query optimization
But they learn from previous queries and give good results only if current query is
Similar to previous ones.
-- SkinnerDB however, does not suffer from any kind of generalization error across
Queries.
HOW IT WORKS:
**Preprocessing involves:
Filtering on base tables,
Batching.
**Post-processing involves
Grouping, aggregation, and
Sorting.
**Query-->preprocessor-->join processor-->postprocessor-->result.
UCT FOR JOIN ORDERING FROM RL
UCT stands for “Upper Confidence bounds for Trees”. The UCT algorithm balances between exploration and exploitation in a principled
Manner that results in probabilistic guarantees.
UCT works well on very large search spaces.
The UCT algorithm maintains two counters per node: the number of times the node
Was visited and the average reward that was obtained for paths crossing through
That node. If counters are available for all relevant nodes, the UCT algorithm selects
At each step the child node c maximizing the formula rc+w*sqrtroot(log(vp)/vc where r
C
Is the average reward for c, vc
And v
P
Are the number of visits for child and
Parent node.
EXECUTION STRATEGIES FOR INTRAQUERY LEARNING
For this method to work efficiently there are some very precise requirements on the
Execution engine.
● Early quality feedback: So the system we can learn fast
● Low switching overhead: Since the will try different join orders frequently
● Small execution state: Because the system will stop and resume join
Executions and shall not do too much redundant work.
SUMMARY SkinnerDB
● Use Reinforcement Learning for intra-query learning
-- NO data statistics -- NO cardinality model -- NO cost model.
● Formally guarantees near-optimal expected cost.
● Intra-query learning pays off for difficult queries, does not suffer from poor
Generalization of inter-query learning.
CORRECTNESS:
--The correctness = The proposed system produces the same resulting tuples with
An ordinary joining operation.
--This is proven in a verbal manner in the paper. In short: the system produces no
Duplicates because it keeps track of all component tuple indices in a set.
--Skinnerdb produces each result tuple at least once (ie does not miss any): --Completed result tuples always added to overall results and component tuples are
Processed in a way that all combinations are covered.