Comparison of the queries - sql

There are three queries A, B and C. I need to compare queries B and C with query A and decide whether, relative to A, the result of each one is a rollup, a drill down, or neither.
Query A:
SELECT
Geography.Region, Time.Month, SUM (Sales.numberSold)
FROM
Sales, Time, Product, Geography
WHERE
Sales.ProductID = Product.ProductID
AND Sales.TimeID = Time.TimeID
AND Sales.GeoID = Geography.GeoID
AND Product.ProductFamily = "video"
AND Time.Year = 2000
AND Geography.Country = "Germany"
GROUP BY
Geography.Region, Time.Month;
Query B:
SELECT
Geography.Region, Time.Month, SUM (Sales.numberSold)
FROM
Sales, Time, Geography
WHERE
Sales.TimeID = Time.TimeID
AND Sales.GeoID = Geography.GeoID
AND Time.Year = 2000
AND Geography.Country = "Germany"
GROUP BY
Geography.Region, Time.Month;
Query C:
SELECT
Geography.City, Time.Month, SUM (Sales.numberSold)
FROM
Sales, Time, Product, Geography
WHERE
Sales.ProductID = Product.ProductID
AND Sales.TimeID = Time.TimeID
AND Sales.GeoID = Geography.GeoID
AND Product.ProductFamily = "video"
AND Time.Year = 2000
AND Geography.Country = "Germany"
GROUP BY
Geography.City, Time.Month;
Compare the queries B and C to query A.
In comparison to A: are the results of B and C respectively:
a rollup or
a drill down or
neither of these two?
The sentence to complete is:
"The result of query ...... is .......................... in comparison to the result of query A."
The missing parts to be inserted are "B" or "C", and "a rollup", "a drill down", or "neither".
My attempt:
"The result of query B is a rollup in comparison to the result of query A."
I don't know whether my answer is correct. What is the solution here, and why?
Cube:
A cube consists of cells each of which is defined by the intersection
of all dimensions (axes) belonging to this cube. Each cell of the cube
can contain one or more measures.
Rollup:
The CUBE operator calculates aggregations by combining all possible
subsets of the attributes listed in the parentheses following the word
CUBE. Often, not all the combinations are necessary, however, but it
may be sufficient to aggregate only by taking first one attribute,
then two, then three, etc. until all are taken together. This is done
by the ROLLUP operator.
Drill Down:
In contrast to simple data access, OLAP requires a multidimensional
data model that is built according to analysis needs. => Not a
relational technique. OLAP allows data analysis with the goal of
discovering new information. Reports present consolidated values in
tables and images. The functionality allows for instance to "drill
down" to detailed data and "drill up" ("roll up") again.

Drill down and roll up are inverse operations: they add or remove granularity along an axis, like zooming in and out.
a roll up - less granularity in the target table (for example, years instead of months)
a drill down - more granularity in the target table (for example, months instead of years)
neither of these two - the target table holds different data, not the same data at another granularity
A three-dimensional cube is given, where the dimensions are:
Geography
Time
Product
Values are sales volumes (SUM (Sales.numberSold)).
Let's call this source cube Z.
In the end, A, B, and C show only two dimensions:
Geography
Time
The Product dimension is always collapsed to a single value.
Cube A:
The Product dimension is collapsed by the slice Product.ProductFamily = "video".
Z
rollup on Product (from ArticleName to ProductFamily)
slice for Product.ProductFamily = "video"
rollup on Time (from Day to Year)
slice for Time.Year = 2000
rollup on Geography (from BranchName to Country)
slice for Geography.Country = "Germany"
drill down Geography from Country to Region
drill down Time from Year to Month
A
Cube B:
The Product dimension is collapsed by a rollup on Product (from ArticleName to All).
B is derived from Z with the same granularity on Time and Geography as A, but it aggregates over all products, whereas A keeps only "video". The two results therefore hold different data.
The result of query B is neither a rollup nor a drill down in comparison to the result of query A.
Z
rollup on Product (from ArticleName to All)
rollup on Time (from Day to Year)
slice for Time.Year = 2000
rollup on Geography (from BranchName to Country)
slice for Geography.Country = "Germany"
drill down Geography from Country to Region
drill down Time from Year to Month
B
Cube C:
The Product dimension is collapsed the same way as in cube A.
So the only difference is in granularity.
C is more detailed (City instead of Region).
The result of query C is a drill down in comparison to the result of query A.
A
drill down Geography (from Region to City)
C
Sources:
Data Warehousing - OLAP on tutorialspoint
Online analytical processing on wikipedia
OLAP Operations in the Multidimensional Data Model on javatpoint
[Figure: ASCII sketch of source cube Z. Axes: Product (audio 123, video 321, video 123), Geography (LA, NY), Time (01-01-22, 02-01-22). Each cell holds a numberSold value.]
Edit 1:
The drawback of the sources is that they describe what each operation is, but not what it is not. You need to understand thoroughly what the operations do in order to determine what they cannot do.
Even if you only have to decide between Roll-Up and Drill Down, you need to understand the Slice in your example. A Slice fixes a single value on one dimension; your queries filter on several dimensions at once, so this is really a case of the Dice.
Roll-Up and Drill Down change the level of aggregation - the GROUP BY clause.
The Slice (Dice) filters - the WHERE clause.
What a Slice achieves cannot be achieved by a Roll-Up.
Roll-Up and Drill Down on the Product dimension can move between the levels All (which removes the dimension), Category, Family, Group, and Name.
In query A, we get rid of the Product dimension using the Slice; in query B, using the Roll-Up to All.
The Slice in query A leaves only Sales.numberSold for "video". This cannot be achieved with a Roll-Up.
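The reasoning above can be checked on a toy star schema. The following is only a sketch: it uses Python's built-in sqlite3 module, made-up sample rows, and explicit JOIN syntax instead of the comma joins in the question. The helper run() and all data values are assumptions for illustration.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product   (ProductID INTEGER, ProductFamily TEXT);
CREATE TABLE Time      (TimeID INTEGER, Month INTEGER, Year INTEGER);
CREATE TABLE Geography (GeoID INTEGER, City TEXT, Region TEXT, Country TEXT);
CREATE TABLE Sales     (ProductID INTEGER, TimeID INTEGER, GeoID INTEGER, numberSold INTEGER);

-- Made-up sample rows, not data from the question.
INSERT INTO Product   VALUES (1, 'video'), (2, 'audio');
INSERT INTO Time      VALUES (1, 1, 2000), (2, 2, 2000);
INSERT INTO Geography VALUES (1, 'Munich', 'Bavaria', 'Germany'),
                             (2, 'Nuremberg', 'Bavaria', 'Germany'),
                             (3, 'Hamburg', 'Hamburg', 'Germany');
INSERT INTO Sales     VALUES (1, 1, 1, 5), (1, 1, 2, 3), (2, 1, 1, 2),
                             (1, 2, 3, 4), (2, 2, 3, 9);
""")

def run(level, video_only):
    """Hypothetical helper: builds query A, B or C from two switches."""
    product_join = "JOIN Product ON Sales.ProductID = Product.ProductID" if video_only else ""
    product_filter = "AND Product.ProductFamily = 'video'" if video_only else ""
    sql = f"""
        SELECT Geography.{level}, Time.Month, SUM(Sales.numberSold)
        FROM Sales
        JOIN Time ON Sales.TimeID = Time.TimeID
        JOIN Geography ON Sales.GeoID = Geography.GeoID
        {product_join}
        WHERE Time.Year = 2000 AND Geography.Country = 'Germany'
        {product_filter}
        GROUP BY Geography.{level}, Time.Month
        ORDER BY 1, 2
    """
    return conn.execute(sql).fetchall()

a = run("Region", video_only=True)    # query A
b = run("Region", video_only=False)   # query B: same granularity, no product filter
c = run("City", video_only=True)      # query C: finer Geography level

# Roll C up from City to Region by hand and compare with A.
city_to_region = dict(conn.execute("SELECT City, Region FROM Geography"))
rolled = defaultdict(int)
for city, month, total in c:
    rolled[(city_to_region[city], month)] += total
c_rolled = sorted((region, month, total) for (region, month), total in rolled.items())

print("A:", a)
print("B:", b)
print("C rolled up to Region:", c_rolled)
```

With this sample data, rolling C up reproduces A, while B has the same shape as A but different totals because the audio sales are included.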

Is there any way to perform FOR LOOP in Presto?

Edit to clarify my question
Database: Presto. I'm querying through my company's own tool, but it is basically similar to MySQL and other SQL clients.
The purpose: I have training classes, and I want to evaluate them by comparing a few metrics before and after the training day (say, W+1, W+2, etc. vs. pre-training). After a few sub-queries, I was able to produce a table with values as below (each class has its own metric, and is unique).
class | metric | shopid | pre-training | w+1 | w+2 | w+3
A | increasing sth | 1122 | x | x1 | x2 | x3
B | decrease sth | 3322 | y | y1 | y2 | y3 etc.
So now I want to compare the values of W+1, W+2, etc. with pre-training to draw a conclusion, for example: if x1 > x -> good, if x2 < x -> bad, etc.
So I wrote a CASE WHEN statement:
CASE metric = 'increase sth'
WHEN x1 > x THEN 'good'
WHEN x1 < x THEN 'bad'
CASE metric = 'decrease sth'
....
I need to apply this to the columns w+1, w+2, etc. to get the desired result. Writing the CASE statement separately for each of those columns would be lengthy and repetitive, so I was thinking of a LOOP: write the CASE statement once and apply it to every column.
I could have extracted the data and done this step in Python, but I want to learn how to do it in SQL so that I don't have extra work after the query finishes.
Sorry, I'm very new to SQL (only about two months in, still working hard to build up my knowledge).
I hope you can help. Many thanks.
Your question is quite vague, but you can use a lateral join to split the rows apart and do the calculation only once. This will tend to put the values in separate rows:
select t.*, diff.*
from t cross join lateral
     (select which, (v.metric - t.metric1) as diff
      from (values (2, t.metric2), (3, t.metric3)
           ) v(which, metric)
     ) diff;
You can also put the values back in one row.
That said, you don't seem to have a good foundation in working with relational databases. The problem starts with your initial structure, where you are storing values across columns rather than in separate rows.
SQL doesn't have "looping" because it is set-based. And it is set-based so the optimizer can figure out the best way to run queries.
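As a rough illustration of the set-based approach, the week columns can be turned into rows first, so the CASE expression is written only once. This is a sketch under assumptions: it runs on SQLite through Python's sqlite3 module and uses UNION ALL instead of a lateral join, the column names (pre, w1, w2, w3) and sample values are invented, and the rule for the "decrease" metric is a guess because the question elides it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical wide table: one column per week, as in the question.
CREATE TABLE t (class TEXT, metric TEXT, shopid INTEGER,
                pre INTEGER, w1 INTEGER, w2 INTEGER, w3 INTEGER);
INSERT INTO t VALUES ('A', 'increase', 1122, 10, 12,  9, 10),
                     ('B', 'decrease', 3322, 20, 18, 25, 20);
""")

rows = conn.execute("""
    WITH long AS (                -- one row per (class, week)
        SELECT class, metric, pre, 'w+1' AS week, w1 AS val FROM t
        UNION ALL
        SELECT class, metric, pre, 'w+2', w2 FROM t
        UNION ALL
        SELECT class, metric, pre, 'w+3', w3 FROM t
    )
    SELECT class, week,
           CASE                   -- written once, applied to every week
               WHEN metric = 'increase' AND val > pre THEN 'good'
               WHEN metric = 'increase' AND val < pre THEN 'bad'
               -- assumed rule: for a "decrease" metric, lower is better
               WHEN metric = 'decrease' AND val < pre THEN 'good'
               WHEN metric = 'decrease' AND val > pre THEN 'bad'
               ELSE 'same'
           END AS verdict
    FROM long
    ORDER BY class, week
""").fetchall()

for row in rows:
    print(row)
```

If the verdicts are needed side by side again, the long result can be pivoted back with conditional aggregation, for example MAX(CASE WHEN week = 'w+1' THEN verdict END).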

Changes to table not designed for SQL

I am supposed to make some changes to an enormous CSV file based on a different file. I chose to do it in SQL, but after further consideration I am not sure how to proceed.
In the 1st table I have a list of contracts. Columns represent the segments the contract belongs to and the products that can be linked to the contract (example in the table below).
Here contract no. 1234 belongs to segments X1 and Y2. There is no product number 1 linked to it, but it has product number 2 linked to it. The product originally ends on the 1st of January 2030.
cont_n|date|segment_1|segment_2|..|prod_1|date_prod_1|product_2|date_product_2|..
1234 |3011| X1 | Y2 |..| | |YES |01/01/2030 |..
The 2nd file is a list of combinations of segments with an indication of how the "date" columns should be adjusted. The example shows the following situation: if prod_2 is linked to a contract that belongs to segments X1 and Y2, end prod_2 this year. I need this result to alter table no. 1.
prod_no|segment_1|segment_2|result
prod_2 | X1 | Y2 | end the product on anniversary
Ergo I need to get to the result:
cont_n|date|segment_1|segment_2|..|prod_1|date_prod_1|product_2|date_product_2|..
1234 |3011| X1 | Y2 |..| | |YES |30/11/2019 |..
In the original files I have around 600k rows and more than 300 columns (meaning around 100 different products) in table 1 and around 800 possible combinations of segments in table 2.
The algorithm I need to implement (very generally):
for x=1 to 100
IF product_x = YES THEN date_product_x = date + "Search for result in table2"
Is there a reasonable way to change the "date_product_x" columns based on the 2nd table, or would it be better to find a different solution?
Thanks a lot!
I can only give you a general approach, because the information in your question is general (for example, why does "end the product on anniversary" translate to "30/11/2019"? It's not explained in the question, so I assume you're going to be able to handle that part of the logic).
You can approach this by using an UNPIVOT on Table 1 to get a structure like:
cont_n | segment1 | segment2 | product_number | product_date
You will UNPIVOT ... FOR date_product_1 through date_product_100. You'll either have to type out all 100 column names, or use dynamic SQL to build the whole statement.
You'll do some string manipulation to grab the "x" portion of "date_product_x", and turn it into "prod_x", and then you can join to the second table on the two segment columns and the "prod_x" column, get the result column value, and do whatever rules you're doing to get the value you want for date_product_x.
Finally, you take that result, and PIVOT it back to the one-row-per-contract form, and JOIN it to your original table to UPDATE the date_product_x columns.
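The unpivot-and-join step could be sketched as follows. This is an illustration only, using SQLite through Python's sqlite3 module: SQLite has no UNPIVOT, so the sketch generates one UNION ALL branch per product in a loop (standing in for the dynamic SQL mentioned above). The regular column names product_N / date_product_N, the second contract row, and N_PRODUCTS are assumptions. The date rule itself is not implemented because the question does not define it.

```python
import sqlite3

N_PRODUCTS = 2  # assumption for the sketch; the real table has about 100 products

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contracts (cont_n INTEGER, segment_1 TEXT, segment_2 TEXT,
                        product_1 TEXT, date_product_1 TEXT,
                        product_2 TEXT, date_product_2 TEXT);
CREATE TABLE rules (prod_no TEXT, segment_1 TEXT, segment_2 TEXT, result TEXT);
INSERT INTO contracts VALUES
    (1234, 'X1', 'Y2', NULL, NULL, 'YES', '01/01/2030'),
    (5678, 'X1', 'Y3', 'YES', '01/06/2025', NULL, NULL);
INSERT INTO rules VALUES ('prod_2', 'X1', 'Y2', 'end the product on anniversary');
""")

# Build one SELECT per product column pair instead of typing them all out.
branches = [
    f"SELECT cont_n, segment_1, segment_2, 'prod_{i}' AS prod_no, "
    f"date_product_{i} AS product_date FROM contracts WHERE product_{i} = 'YES'"
    for i in range(1, N_PRODUCTS + 1)
]
unpivot_sql = " UNION ALL ".join(branches)

# Join the long form to the rule table on product and both segments.
matches = conn.execute(f"""
    SELECT u.cont_n, u.prod_no, u.product_date, r.result
    FROM ({unpivot_sql}) AS u
    JOIN rules AS r
      ON r.prod_no   = u.prod_no
     AND r.segment_1 = u.segment_1
     AND r.segment_2 = u.segment_2
    ORDER BY u.cont_n, u.prod_no
""").fetchall()

print(matches)
```

Each returned row says which contract and product a rule applies to; translating the rule text into a new date and writing it back to the wide table is left out here.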

Partitioning join to limit records in SQL

I have 2 tables:
- first one containing spatial data - geometry of circles
- second contains geometries of lines.
I want to find all lines which are inside each circle. I have a query which can do that, but there are millions of records, so it is unusably slow.
Both tables have an area_id column: every circle and every line is assigned to a particular area. If I could intersect each circle only with the lines in the matching area, this would reduce the load a lot. The problem is that I can't think of a solution, e.g. using a window function. The query I am using is:
Select ct.AREA_ID, ct.Circle_descr, lt.Line_descr from circles_table as ct
JOIN lines_table as lt
ON
ct.Circle_location.STIntersects(lt.Line_location)=1
*using a where clause at the end makes no difference as it is essentially part of the slow join...
circles_table:
+---------------+-----------------------+---------------------------+
| AREA_ID (int) | Circle_descr(varchar) | Circle_location(geometry) |
+---------------+-----------------------+---------------------------+
lines_table:
+---------------+---------------------+-------------------------+
| AREA_ID (int) | Line_descr(varchar) | Line_location(geometry) |
+---------------+---------------------+-------------------------+
Add an additional join criterion to partition the rows by area_id before comparing them. Something like
Select ct.AREA_ID, ct.Circle_descr, lt.Line_descr
from circles_table as ct
JOIN lines_table as lt
ON ct.Circle_location.STIntersects(lt.Line_location)=1
AND ct.area_id = lt.area_id
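Why the extra equality condition can help may be seen in a small simulation. This is plain Python, not SQL Server: it only counts how often an expensive geometric test would run when every circle is paired with every line, versus only with lines from the same area. The geometry helper, the sample shapes and the function names are all invented for the sketch, and a real optimizer (or a spatial index) may behave differently.

```python
import math
from collections import defaultdict

def segment_intersects_circle(segment, circle):
    """True if the segment (x1, y1, x2, y2) touches the circle (cx, cy, r)."""
    x1, y1, x2, y2 = segment
    cx, cy, r = circle
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    # Parameter of the point on the segment closest to the circle centre.
    t = 0.0 if length_sq == 0 else ((cx - x1) * dx + (cy - y1) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(cx - (x1 + t * dx), cy - (y1 + t * dy)) <= r

# (area_id, description, geometry): made-up rows, two areas far apart.
circles = [(1, "c1", (0, 0, 1)), (2, "c2", (100, 100, 1))]
lines = [(1, "l1", (-2, 0, 2, 0)), (1, "l2", (5, 5, 6, 6)),
         (2, "l3", (100, 98, 100, 102)), (2, "l4", (50, 50, 51, 51))]

def join_all_pairs():
    """Spatial test on every circle/line pair (join without area_id)."""
    calls, matches = 0, []
    for _, c_name, c_geom in circles:
        for _, l_name, l_geom in lines:
            calls += 1
            if segment_intersects_circle(l_geom, c_geom):
                matches.append((c_name, l_name))
    return sorted(matches), calls

def join_by_area():
    """Group lines by area_id first, then test only pairs in the same area."""
    lines_by_area = defaultdict(list)
    for area_id, l_name, l_geom in lines:
        lines_by_area[area_id].append((l_name, l_geom))
    calls, matches = 0, []
    for area_id, c_name, c_geom in circles:
        for l_name, l_geom in lines_by_area[area_id]:
            calls += 1
            if segment_intersects_circle(l_geom, c_geom):
                matches.append((c_name, l_name))
    return sorted(matches), calls

all_matches, all_calls = join_all_pairs()
area_matches, area_calls = join_by_area()
print(all_matches, all_calls)
print(area_matches, area_calls)
```

Both variants find the same pairs on this data, but the area-partitioned one runs the geometric test on half as many pairs; with many areas the saving grows accordingly.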

SQL Spatial Subquery Issue

Greetings Benevolent Gods of Stackoverflow,
I am presently struggling to get a spatially enabled query to work for a SQL assignment I am working on. The wording is as follows:
SELECT PURCHASES.TotalPrice, STORES.GeoLocation, STORES.StoreName
FROM MuffinShop
join (SELECT SUM(PURCHASES.TotalPrice) AS StoreProfit, STORES.StoreName
FROM PURCHASES INNER JOIN STORES ON PURCHASES.StoreID = STORES.StoreID
GROUP BY STORES.StoreName
HAVING (SUM(PURCHASES.TotalPrice) > 600))
What I am trying to do with this query is run an aggregate (AVG, SUM, etc.) and get the spatial information back as well. Another example of this would be:
SELECT STORES.StoreName, AVG(REVIEWS.Rating),Stores.Shape
FROM REVIEWS CROSS JOIN
STORES
GROUP BY STORES.StoreName;
This returns the error message "Column 'STORES.Shape' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause."
I know I require a sub query to perform this task, I am just having endless trouble getting it to work. Any help at all would be wildly appreciated.
There are two parts to this question, I would tackle the first problem with the following logic:
List all the store names and their respective geolocations
Get the profit for each store
With that in mind, you need to use the STORES table as your base, then bolt the profit onto it through a sub query or an apply:
SELECT s.StoreName
,s.GeoLocation
,p.StoreProfit
FROM STORES s
INNER JOIN (
SELECT pu.StoreId
,StoreProfit = SUM(pu.TotalPrice)
FROM PURCHASES pu
GROUP BY pu.StoreID
) p
ON p.StoreID = s.StoreID;
The same result can be obtained with CROSS APPLY, which may be a little more efficient depending on the execution plan:
SELECT s.StoreName
,s.GeoLocation
,profit.StoreProfit
FROM STORES s
CROSS APPLY (
SELECT StoreProfit = SUM(p.TotalPrice)
FROM PURCHASES p
WHERE p.StoreID = s.StoreID
GROUP BY p.StoreID
) profit;
Now for the second part, the error that you are receiving tells you that you need to GROUP BY all columns in your select statement with the exception of your aggregate function(s).
In your second example, you are asking SQL to take an average rating for each store based on an ID, but you are also trying to return another column without including that inside the grouping. I will try to show you what you are asking SQL to do and where the issue lies with the following examples:
-- Data
Id | Rating | Shape
1 | 1 | Triangle
1 | 4 | Triangle
1 | 1 | Square
2 | 1 | Triangle
2 | 5 | Triangle
2 | 3 | Square
SQL Server, please give me the average rating for each store:
SELECT Id, AVG(Rating)
FROM Store
GROUP BY Id;
-- Result
Id | Avg(Rating)
1 | 2
2 | 3
SQL Server, please give me the average rating for each store and show its shape in the result (but don't group by it):
SELECT Id, AVG(Rating), Shape
FROM Store
GROUP BY Id;
-- Result
Id | Avg(Rating) | Shape
1 | 2 | Do I show Triangle or Square ...... ERROR!!!!
2 | 3 |
It needs to be told to get the average for each store and shape:
SELECT Id, AVG(Rating), Shape
FROM Store
GROUP BY StoreId, Shape;
-- Result
Id | Avg(Rating) | Shape
1 | 2.5 | Triangle
1 | 1 | Square
2 | 3 | Triangle
2 | 3 | Square
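The two valid groupings can be reproduced with the sample rows above. This sketch uses SQLite through Python's sqlite3 module rather than SQL Server. One caveat: SQLite does not reject an ungrouped column the way SQL Server does (it silently picks a value from some row), so only the two well-defined queries are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Store (Id INTEGER, Rating INTEGER, Shape TEXT);
INSERT INTO Store VALUES (1, 1, 'Triangle'), (1, 4, 'Triangle'), (1, 1, 'Square'),
                         (2, 1, 'Triangle'), (2, 5, 'Triangle'), (2, 3, 'Square');
""")

# One row per store: Shape cannot be selected, it has several values per Id.
per_store = conn.execute(
    "SELECT Id, AVG(Rating) FROM Store GROUP BY Id ORDER BY Id"
).fetchall()

# One row per store and shape: every selected column is grouped or aggregated.
per_store_and_shape = conn.execute(
    "SELECT Id, Shape, AVG(Rating) FROM Store GROUP BY Id, Shape ORDER BY Id, Shape"
).fetchall()

print(per_store)
print(per_store_and_shape)
```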
As in any spatial query, you need an idea of what your final geometry will be. It looks like you are attempting to group by individual stores while delivering an average rating from the subquery. So, if I'm reading it right, you just want the store's shape info associated with the average ratings?
Query the stores table for the shape field and join it to the query you use to get the average rating:
select a.shape,
b.*
from stores a inner join (your Average rating query with group by here) b
on a.StoreID = b.StoreID
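A filled-in version of that pattern might look like the sketch below. It runs on SQLite via Python's sqlite3 module; the StoreID column in REVIEWS, the sample rows, and the use of a text value in place of a real geometry are all assumptions, since the question does not show the table definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema; Shape is plain text standing in for a geometry value.
CREATE TABLE STORES  (StoreID INTEGER, StoreName TEXT, Shape TEXT);
CREATE TABLE REVIEWS (StoreID INTEGER, Rating INTEGER);
INSERT INTO STORES  VALUES (1, 'North', 'POINT(1 1)'), (2, 'South', 'POINT(2 2)');
INSERT INTO REVIEWS VALUES (1, 4), (1, 2), (2, 5);
""")

rows = conn.execute("""
    SELECT a.StoreName, a.Shape, b.AvgRating
    FROM STORES a
    INNER JOIN (                      -- aggregate first, one row per store
        SELECT StoreID, AVG(Rating) AS AvgRating
        FROM REVIEWS
        GROUP BY StoreID
    ) b ON a.StoreID = b.StoreID      -- then attach the non-aggregated columns
    ORDER BY a.StoreName
""").fetchall()

print(rows)
```

Because the aggregation happens inside the derived table, the outer query can select Shape freely without adding it to any GROUP BY.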

Calculate data in a second column using data from the first one

I need to create a SQL query which calculates some data.
For instance, I have this SQL query:
SELECT SUM(AMOUNT) FROM FIRMS WHERE FIRM_ID IN(....) GROUP BY FIRM;
which produces such data:
28,740,573
30,849,923
25,665,724
43,223,313
34,334,534
35,102,286
38,556,820
19,384,871
Now, in a second column I need to show the ratio of each entry to the sum of all entries. Like this:
28,740,573 | 0.1123
30,849,923 | 0.1206
25,665,724 | 0.1003
43,223,313 | 0.1689
34,334,534 | 0.1342
35,102,286 | 0.1372
38,556,820 | 0.1507
19,384,871 | 0.0758
For instance, the sum of all entries in the first column above is 255,858,044, so the second cell of the first row is 28,740,573 / 255,858,044 = 0.1123. The same applies to every row in the result.
How can I do that?
UPD: Thanks @a_horse_with_no_name, I forgot to mention the DBMS. It's Oracle.
Most databases now support the ANSI standard window functions. So, you can do:
SELECT SUM(AMOUNT),
SUM(AMOUNT) / SUM(SUM(AMOUNT)) OVER () as ratio
FROM FIRMS
WHERE FIRM_ID IN (....)
GROUP BY FIRM;
Note: Some databases do integer division. So, if AMOUNT is an integer, you need to convert it to a non-integer number in these databases. One easy method is to multiply by 1.0.
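The idea can be tried out with the totals from the question. This is a sketch on SQLite (through Python's sqlite3 module), not Oracle, and it assumes an SQLite build with window function support (3.25 or later). It computes the per-firm totals in a derived table and applies SUM(...) OVER () on top, which is meant to be equivalent to the nested SUM(SUM(AMOUNT)) OVER () form; the firm names are invented. SQLite is one of the databases that does integer division, hence the 1.0 factor.

```python
import sqlite3

amounts = [28740573, 30849923, 25665724, 43223313,
           34334534, 35102286, 38556820, 19384871]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FIRMS (FIRM TEXT, AMOUNT INTEGER)")
conn.executemany(
    "INSERT INTO FIRMS VALUES (?, ?)",
    [(f"F{i}", amount) for i, amount in enumerate(amounts, start=1)],
)

rows = conn.execute("""
    SELECT FIRM,
           total,
           1.0 * total / SUM(total) OVER () AS ratio  -- 1.0 avoids integer division
    FROM (SELECT FIRM, SUM(AMOUNT) AS total FROM FIRMS GROUP BY FIRM)
    ORDER BY FIRM
""").fetchall()

for firm, total, ratio in rows:
    print(firm, total, round(ratio, 4))
```

The empty OVER () makes the window cover every row of the grouped result, so each firm's total is divided by the grand total.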