creating nested array presto - sql

I have this table:
with cte (customer_id, product, sell) as (
values
(1, 'a', 100),
(1, 'b', 150),
(2, 'a', 90),
(2, 'b', 110)
)
select * from cte
I want a result like the following:
+----------------------------------------------------------+
| result |
+----------------------------------------------------------+
| {1: {"a": 100, "b": 150}, 2: {"a":90, "b": 110}} |
+----------------------------------------------------------+

Your desired result is a nested map, not a nested array. Unless this is part of a bigger query, it is unusual to collapse a whole table into a single row, especially given the data volumes Athena usually handles. For this test data, though, you can use map_agg with nested grouping:
with cte (customer_id, product, sell) as (
values (1, 'a', 100),
(1, 'b', 150),
(2, 'a', 90),
(2, 'b', 110)
)
select map_agg(customer_id, m) as result
from (
select customer_id, map_agg(product, sell) m
from cte
group by customer_id
)
group by true -- fake grouping
Output:
result
{1={a=100, b=150}, 2={a=90, b=110}}
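To make the shape of the nested grouping concrete, here is a minimal Python sketch (not Presto) that builds the same customer -> product -> sell structure from the sample rows. The function name nested_map is hypothetical; the inner dict plays the role of the inner map_agg and the outer dict the role of the outer one.

```python
def nested_map(rows):
    """Group (customer_id, product, sell) rows into {customer_id: {product: sell}}."""
    result = {}
    for customer_id, product, sell in rows:
        # Inner map: one entry per product for this customer.
        result.setdefault(customer_id, {})[product] = sell
    return result


# Same sample rows as the cte in the question.
rows = [(1, "a", 100), (1, "b", 150), (2, "a", 90), (2, "b", 110)]
print(nested_map(rows))
```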

Is it possible to set the initial-select value of a recursive CTE query with a parameter?

Using this self-referencing table:
CREATE TABLE ENTRY (
ID integer NOT NULL,
PARENT_ID integer,
... other columns ...
)
There are many top-level rows (with PARENT_ID = NULL) that can have 0 to several levels of child rows, forming a graph like this:
(1, NULL, 'A'),
(2, 1, 'B'),
(3, 2, 'C'),
(4, 3, 'D'),
(5, 4, 'E'),
(6, NULL, 'one'),
(7, 6, 'two'),
(8, 7, 'three'),
(9, 6, 'four'),
(10, 9, 'five'),
(11, 10, 'six');
I want to write a query that would give me the subgraph (all related rows in both directions) for a given row, for instance (just showing the ID values):
ID = 3: (1, 2, 3, 4, 5)
ID = 6: (6, 7, 8, 9, 10, 11)
ID = 7: (6, 7, 8)
ID = 10: (6, 9, 10, 11)
It's similar to the query in §3.3 Queries against a Graph of the SQLite documentation, for returning a graph from any of its nodes:
WITH RECURSIVE subtree(x) AS (
SELECT 3
UNION
SELECT e1.ID x FROM ENTRY e1 JOIN subtree ON e1.PARENT_ID = subtree.x
UNION
SELECT e2.PARENT_ID x FROM ENTRY e2 JOIN subtree ON e2.ID = subtree.x
)
SELECT x FROM subtree
LIMIT 100;
... with 3 as the anchor / initial-select value.
This particular query works fine in DBeaver. The SQLite version available in db-fiddle gives a circular reference error, but this nested CTE gives the same result in db-fiddle.
However, I can only get this to work when the initial value is hard-coded in the query. I can't find any mention of how to supply that initial-select value as a parameter.
I'd think it should be straightforward. Maybe the case of having more than one top-level row is very unusual, or I'm overlooking something blindingly obvious?
Any suggestions?
As forpas points out, SQLite doesn't support stored procedures or user-defined SQL functions that could take the value as an argument.
Binding the value to a placeholder in a prepared statement from the calling code is a good alternative.
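As a minimal sketch of the placeholder approach, assuming the calling code is Python with the standard sqlite3 module: the anchor value is bound to a named parameter, :start_id, instead of being hard-coded. The helper names make_demo_db and subgraph_ids and the NAME column are hypothetical. The two recursive branches of the original query are folded into a single join here, on the assumption that this avoids the circular reference error on older SQLite builds that accept only one recursive select.

```python
import sqlite3


def make_demo_db():
    """In-memory copy of the ENTRY rows from the question (NAME is a stand-in column)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ENTRY (ID integer NOT NULL, PARENT_ID integer, NAME text)"
    )
    conn.executemany(
        "INSERT INTO ENTRY VALUES (?, ?, ?)",
        [
            (1, None, "A"), (2, 1, "B"), (3, 2, "C"), (4, 3, "D"), (5, 4, "E"),
            (6, None, "one"), (7, 6, "two"), (8, 7, "three"),
            (9, 6, "four"), (10, 9, "five"), (11, 10, "six"),
        ],
    )
    return conn


SUBGRAPH_SQL = """
WITH RECURSIVE subtree(x) AS (
    SELECT :start_id                      -- anchor value supplied by the caller
    UNION
    -- Step to the child when the row's parent is x, otherwise to the row's parent.
    SELECT CASE WHEN e.PARENT_ID = subtree.x THEN e.ID ELSE e.PARENT_ID END
    FROM ENTRY e
    JOIN subtree ON e.PARENT_ID = subtree.x OR e.ID = subtree.x
)
SELECT x FROM subtree
WHERE x IS NOT NULL                       -- drop the NULL parent of top-level rows
ORDER BY x
"""


def subgraph_ids(conn, start_id):
    """Return the IDs of every row connected to start_id."""
    return [row[0] for row in conn.execute(SUBGRAPH_SQL, {"start_id": start_id})]


conn = make_demo_db()
print(subgraph_ids(conn, 3))
print(subgraph_ids(conn, 6))
```

Because the traversal follows both parent and child links from every node it reaches, it returns the whole connected tree for whichever start row is bound.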

SQL Select: Do rows matching id all have the same column value

I have a table like this
sub_id reference
1 A
1 A
1 A
1 A
1 A
1 A
1 C
2 B
2 B
3 D
3 D
I want to make sure all the rows in each group have the same reference.
Meaning, for example, all references in:
group 1 should be A
group 2 should be B
group 3 should be D
If they do not, I would like a list of the offending sub_id values returned.
So for the table above my result would be: 1
Ideally, reference would live in a separate table with sub_id as the primary key, but I need to fix this massive dataset first before I can move on to restructuring the database.
You could use the following method:
select t.sub_id
from YourTable t
group by t.sub_id
having max(t.reference) <> min(t.reference)
Change YourTable to suit.
Are you looking for simple aggregation?
select sub_id
from YourTable t
group by sub_id
having count(distinct reference) > 1;
The query you want:
SELECT sub_id
FROM test_sub
GROUP BY sub_id HAVING count(DISTINCT reference) > 1
;
Here is what I used to test it:
CREATE TABLE `test_sub` (
sub_id int(11) NOT NULL,
reference varchar(45) DEFAULT NULL
);
INSERT INTO test_sub (sub_id, reference) VALUES
(1, 'A'),
(1, 'A'),
(1, 'A'),
(1, 'A'),
(1, 'C'),
(2, 'B'),
(2, 'B'),
(3, 'D'),
(3, 'D'),
(3, 'D'),
(4, 'E'),
(4, 'E'),
(4, 'E'),
(5, 'F'),
(5, 'G')
;
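As a quick cross-check, here is a small sketch that loads the test rows above into an in-memory SQLite database (an assumption; the answers target other engines) and runs both HAVING variants, MAX(reference) <> MIN(reference) and COUNT(DISTINCT reference) > 1. The helper name mismatched_sub_ids is hypothetical.

```python
import sqlite3

# Test rows copied from the INSERT statement above.
ROWS = [
    (1, "A"), (1, "A"), (1, "A"), (1, "A"), (1, "C"),
    (2, "B"), (2, "B"),
    (3, "D"), (3, "D"), (3, "D"),
    (4, "E"), (4, "E"), (4, "E"),
    (5, "F"), (5, "G"),
]


def mismatched_sub_ids(conn, having_clause):
    """Return the sub_ids whose group satisfies the given HAVING condition."""
    sql = (
        "SELECT sub_id FROM test_sub GROUP BY sub_id "
        f"HAVING {having_clause} ORDER BY sub_id"
    )
    return [row[0] for row in conn.execute(sql)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_sub (sub_id INTEGER NOT NULL, reference TEXT)")
conn.executemany("INSERT INTO test_sub VALUES (?, ?)", ROWS)

by_min_max = mismatched_sub_ids(conn, "MAX(reference) <> MIN(reference)")
by_distinct = mismatched_sub_ids(conn, "COUNT(DISTINCT reference) > 1")
print(by_min_max, by_distinct)
```

Both conditions ignore NULL references, so a group that mixes one value with NULLs is not reported by either form.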

Replace hard coded values with data from table

Currently, I have 3 affiliations hard-coded in a query. They serve as a hierarchy: 1 = Faculty, 2 = Staff, 3 = Student. If a user from the affiliations_tbl table has more than one affiliation (example: a Staff member who is also a Student), the query uses their Staff affiliation, since it is higher in the hierarchy defined by the partition by and decode().
SELECT x2.emplid,
scc_afl_code
FROM (SELECT x.emplid,
scc_afl_code,
row_number() over(partition BY x.emplid ORDER BY x.affil_order) r
FROM (SELECT t.emplid,
scc_afl_code,
DECODE(scc_afl_code,
'FACULTY',
1,
'STAFF',
2,
'STUDENT',
3,
999) affil_order
FROM affiliations_tbl t
WHERE t.scc_afl_code IN
(SELECT a.scc_afl_code
FROM affiliation_groups_tbl a
WHERE a.group = 'COLLEGE')) x) x2
WHERE x2.r = 1;
I have created a table that will store affiliation groups affiliation_groups_tbl so I can scale this by adding data to the table, rather than changing the hard-coded values in this query. Example: Instead of adding 'CONSULTANT', 4 to the decode() list, I would add it to the table, so I wouldn't have to modify the SQL.
scc_afl_code | group | group_name | sort_order
-------------+---------+------------+-----------
FACULTY | COLLEGE | Faculty | 1
STAFF | COLLEGE | Staff | 2
STUDENT | COLLEGE | Student | 3
I've already updated the latter half of the query to only select scc_afl_code values that are in the COLLEGE group. How can I properly update the first part of the query to use the table as a hierarchy?
Try the code below in place of the decode() in the select clause of your statement:
coalesce((
select g.sort_order
from affiliation_groups_tbl g
where g.scc_afl_code = t.scc_afl_code ), 999)
You can try something like this:
create table dictionary
(id number,
code varchar2(32),
name varchar2(32),
sort number);
insert into dictionary (id, code, name, sort) values (16, 'B', 'B name', 1);
insert into dictionary (id, code, name, sort) values (23, 'A', 'A name', 2);
insert into dictionary (id, code, name, sort) values (15, 'C', 'C name', 4);
insert into dictionary (id, code, name, sort) values (22, 'D', 'D name', 3);
select partition,
string,
decode(string, 'B', 1, 'A', 2, 'D', 3, 'C', 4, 999) decode,
row_number() over(partition by partition order by decode(string, 'B', 1, 'A', 2, 'D', 3, 'C', 4, 999)) ordering
from (select mod(level, 3) partition, chr(65 + mod(level, 5)) string
from dual
connect by level <= 8)
minus
-- Alternate --
select partition,
string,
nvl(t.sort, 999) nvl,
row_number() over(partition by partition order by nvl(t.sort, 999)) ordering
from (select mod(level, 3) partition, chr(65 + mod(level, 5)) string
from dual
connect by level <= 8) r
left join dictionary t
on t.code = r.string;

SQL Server - Return TOP X items based on aggregate function

Hello, I have a SQL query that I have been running, but it returns far more data than I need.
For context, we carry about 3000 items, across 30 product categories and 50 sub-categories (parent-child relationship). We sell them in thousands of stores, and our database captures the weekly sales per store per product. We store multiple years' worth of data.
Currently my query returns all records, while I would like to limit it to the top 10 selling items, based on the sum of their unit sales for the latest 52 weeks (my where clause specifies the 52 weeks, but I need the weekly details in my pull).
SELECT
store.store_id,
store.sales_rep,
store.sales_rep_manager,
prod.category,
prod.sub_category,
prod.item,
sales.week_id,
sum(sales.units) as "UNITS SOLD",
sum(sales.dollars) as "DOLLARS SOLD"
...
GROUP BY
store.store_id,
store.sales_rep,
store.sales_rep_manager,
prod.category,
prod.sub_category,
prod.item,
sales.week_id
ORDER BY
7 desc
I think I should be using a TOP statement, but all I've succeeded in doing is limiting the entire pull to the top 10 records overall.
What I would like to see is the top 10 items by unit velocity for the selected date range, for each store and sub-category:
Store1
Category1
Sub-Category1
TOP SELLING Item #1
TOP SELLING Item #2
TOP SELLING Item #3
...
TOP SELLING Item #10
Right now I've connected my query to Excel, and I ask my pivot table to filter only the top 10 items.
My issue with this solution is that I'm bringing in a ton more data than I need, which makes the file unresponsive and too big, and the query takes a long time to complete.
You can limit the results to the top 10 items by total units sold in the result set pretty easily:
with q as (<your query here>)
select q.*
from (select q.*, dense_rank() over (order by TotalUS desc) as rnk -- highest total first
from (select q.*,
sum("UNITS SOLD") over (partition by item) as TotalUS
from q
) q
) q
where rnk <= 10;
Getting it for the last year is a little trickier:
with q as (<your query here>)
select q.*
from (select q.*, dense_rank() over (order by TotalUS desc) as rnk -- highest total first
from (select q.*,
sum(last_52weeks) over (partition by item) as TotalUS
from (select q.*,
(case when dense_rank() over (partition by item order by week_id desc) <= 52
then "UNITS SOLD" else 0
end) as last_52weeks
from q
) q
) q
) q
where rnk <= 10;
I usually try to approach these types of problems using cascading CTEs. I've noticed that this format can solve fairly complex problems, while still being somewhat more readable. The data kind of flows from one CTE to the next, then comes out with the SELECT statement at the end.
Here's an example that I put together for this case. The data itself is somewhat questionable, as store_id 5 is really the only one with more than 10 items sold, but it's really just a demonstration so hopefully you can still get the big picture. Obviously your data structure is quite different, but you should be able to make adjustments as needed to make it work with your real setup:
--===================================================================
-- Create and populate a table for demonstration purposes only:
--===================================================================
IF OBJECT_ID('tempdb..#Sales') IS NOT NULL DROP TABLE #Sales;
CREATE TABLE #Sales (
item_id INT,
category VARCHAR(10),
sub_category VARCHAR(10),
store_id INT,
week_id INT,
units INT,
dollars MONEY
);
INSERT INTO #Sales
VALUES (1, 'A', 'A1', 1, 1, 10, 50),
(1, 'A', 'A1', 2, 1, 10, 50),
(1, 'A', 'A1', 3, 1, 10, 50),
(1, 'A', 'A1', 4, 1, 10, 50),
(1, 'A', 'A1', 5, 1, 20, 50),
(2, 'B', 'B1', 1, 1, 20, 50),
(2, 'B', 'B1', 2, 1, 20, 50),
(2, 'B', 'B1', 3, 1, 20, 50),
(2, 'B', 'B1', 4, 1, 20, 50),
(2, 'B', 'B1', 5, 1, 20, 50),
(3, 'A', 'A1', 5, 1, 40, 50),
(4, 'A', 'A1', 5, 1, 10, 50),
(5, 'A', 'A1', 5, 1, 5, 50),
(6, 'A', 'A1', 5, 1, 100, 50),
(7, 'A', 'A1', 5, 1, 95, 50),
(8, 'A', 'A1', 5, 1, 35, 50),
(9, 'A', 'A1', 5, 1, 15, 50),
(10, 'A', 'A1', 5, 1, 11, 50),
(11, 'A', 'A1', 5, 1, 12, 50),
(12, 'A', 'A1', 5, 1, 49, 50),
(12, 'A', 'A1', 5, 1, 150, 50);
--===================================================================
-- The actual query starts here:
-- (note that the following is a single statement)
--===================================================================
WITH AggregatedSales AS (
-- This CTE will give you the totals for each store, for each item and category / sub-category:
SELECT
s.store_id,
s.category,
s.sub_category,
s.item_id,
--s.week_id, -- If you want to see the combined data for the entire date range, don't include week here
SUM(s.units) [total_units_sold],
SUM(s.dollars) [total_dollars_sold]
FROM #Sales s
WHERE s.week_id BETWEEN 1 AND 52 -- Adjust these to match your actual range
GROUP BY
s.store_id,
s.category,
s.sub_category,
s.item_id
--s.week_id
),
RankedSales AS (
-- This will assign a ranking to each of the records from the previous CTE.
-- The ranking is reset for each store, and ranks higher number of units sold toward the top.
SELECT
a.*,
DENSE_RANK() OVER (
PARTITION BY a.store_id
ORDER BY a.total_units_sold DESC
) [ranking]
FROM AggregatedSales a
)
-- Now we just select all of the "TOP 10" ranked items here:
-- (the WHERE clause is doing all the work in this case, so we don't need an actual TOP)
SELECT
rs.*
FROM RankedSales rs
WHERE rs.ranking <= 10
ORDER BY
rs.store_id,
rs.ranking;
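The question asks for the top items per store and sub-category while keeping the weekly detail. A minimal sketch of that variation follows, run against an in-memory SQLite database through Python's sqlite3 module (an assumption made so the example is self-contained; it needs a SQLite build with window functions). The sales table, its sample rows, and the :n parameter are all hypothetical. The ranking is partitioned by both store_id and sub_category, and the ranked totals are joined back to the weekly rows.

```python
import sqlite3

TOP_N_SQL = """
WITH totals AS (
    -- Total units per store / sub-category / item over the whole range.
    SELECT store_id, sub_category, item_id, SUM(units) AS total_units
    FROM sales
    GROUP BY store_id, sub_category, item_id
),
ranked AS (
    -- Ranking restarts for every (store, sub-category) pair.
    SELECT totals.*,
           DENSE_RANK() OVER (
               PARTITION BY store_id, sub_category
               ORDER BY total_units DESC
           ) AS ranking
    FROM totals
)
-- Join back to the weekly rows so only the top items' detail is returned.
SELECT s.store_id, s.sub_category, s.item_id, s.week_id, s.units, r.ranking
FROM sales s
JOIN ranked r
  ON r.store_id = s.store_id
 AND r.sub_category = s.sub_category
 AND r.item_id = s.item_id
WHERE r.ranking <= :n
ORDER BY s.store_id, s.sub_category, r.ranking, s.item_id, s.week_id
"""

# Hypothetical demo rows: (store_id, sub_category, item_id, week_id, units)
DEMO_ROWS = [
    (1, "A1", 1, 1, 10), (1, "A1", 1, 2, 5),
    (1, "A1", 2, 1, 30), (1, "A1", 2, 2, 4),
    (1, "A1", 3, 1, 20),
    (1, "B1", 4, 1, 7),
    (2, "A1", 1, 1, 1),
]


def top_items_weekly(conn, n):
    """Weekly rows for the top-n items of each store and sub-category."""
    return conn.execute(TOP_N_SQL, {"n": n}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (store_id INT, sub_category TEXT, "
    "item_id INT, week_id INT, units INT)"
)
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", DEMO_ROWS)

for row in top_items_weekly(conn, 2):
    print(row)
```

With n = 2, item 1 in store 1 / sub-category A1 is dropped because items 2 and 3 have higher totals there, while its weekly rows for store 2 are kept.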

How do I sort results from a nested select while keeping the rollup on the last row?

How do I sort the results in the following example by the seller's name while keeping the rollup at the bottom?
Since the grouping is applied in the nested SELECT I can't use ORDER BY there, and since the grouping isn't applied at the top level I can't use GROUPING either.
A working example is available in SQL Fiddle.
CREATE TABLE Sales
(
SellerID INT
, StoreID INT
, Price MONEY
);
CREATE TABLE Sellers
(
SellerID INT
, Name VARCHAR(50)
);
INSERT INTO Sales VALUES
(1, 1, 100),
(1, 1, 100),
(1, 1, 100),
(2, 2, 200),
(2, 2, 200),
(3, 2, 250),
(3, 2, 250),
(3, 2, 250),
(3, 2, 250);
INSERT INTO Sellers VALUES
(1, 'C. Thirdplace'),
(2, 'A. Firstplace'),
(3, 'B. Secondplace');
SELECT s.Name AS Seller_Name
, x.TotalSales AS Total_Sales
FROM
(
SELECT s.SellerID AS SellerID
, SUM(s.Price) AS TotalSales
FROM Sales s
GROUP BY s.SellerID
WITH ROLLUP
) x
LEFT JOIN Sellers s
ON s.SellerID = x.SellerID;
Which produces the following result:
SELLER_NAME TOTAL_SALES
--------------- -----------
C. Thirdplace 300
A. Firstplace 400
B. Secondplace 1000
(null) 1700
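Add an ORDER BY to the outer query that pushes the NULL rollup row to the end and then sorts by name. SQL Server does not accept a column alias inside an ORDER BY expression, so the CASE refers to s.Name directly:
ORDER BY
CASE WHEN s.Name IS NULL THEN 1 ELSE 0 END,
s.Name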
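The two-key sort can be tried in isolation with a small sketch, assuming Python's sqlite3 module. SQLite has no WITH ROLLUP, so the rollup row is inserted by hand into a hypothetical result table that mirrors the output shown in the question. Only the ORDER BY is being demonstrated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding the already-aggregated output from the question.
conn.execute("CREATE TABLE result (seller_name TEXT, total_sales INTEGER)")
conn.executemany(
    "INSERT INTO result VALUES (?, ?)",
    [
        ("C. Thirdplace", 300),
        ("A. Firstplace", 400),
        ("B. Secondplace", 1000),
        (None, 1700),  # the rollup row has no seller name
    ],
)

ordered = conn.execute(
    """
    SELECT seller_name, total_sales
    FROM result
    ORDER BY
        CASE WHEN seller_name IS NULL THEN 1 ELSE 0 END,  -- rollup row last
        seller_name                                       -- then alphabetical
    """
).fetchall()

for row in ordered:
    print(row)
```

The first sort key separates the rollup row from the named sellers, and the second key orders the sellers alphabetically within their group.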