BigQuery: Subquery with UNION as ARRAY - sql

I have the following two example tables
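Orders Table
| order_id | linked_order1 | linked_order2 |
|----------|---------------|---------------|
| 1001 | L005 | null |
| 1002 | null | null |
| 1003 | L006 | L007 |
Invoices Table
| order_id | linked_order_id | charge |
|----------|-----------------|--------|
| 1001 | null | 4.27 |
| 1002 | null | 9.82 |
| 1003 | null | 7.42 |
| null | L005 | 2.12 |
| null | L006 | 1.76 |
| null | L007 | 3.20 |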
I need to join these so the charges of all the orders (linked and otherwise) can be shown as part of the single order row. My desired output is something like this.
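Desired Output
| order_id | linked_order1 | linked_order2 | invoices.charge | invoices.order_id | invoices.linked_order_id |
|----------|---------------|---------------|-----------------|-------------------|--------------------------|
| 1001 | L005 | null | 4.27 | 1001 | null |
| | | | 2.12 | null | L005 |
| 1002 | null | null | 9.82 | null | null |
| 1003 | L006 | L007 | 7.42 | null | null |
| | | | 1.76 | null | L006 |
| | | | 3.20 | null | L007 |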
I can manage to get the main order into the table as follows.
SELECT
orders,
ARRAY(
SELECT AS STRUCT * FROM `invoices_table` WHERE order_id = orders.order_id) AS invoice
FROM
`orders_table` AS orders
I can run a separate query that unions all of the invoice results into a single table for given order ids, but I can't combine it with the above query without getting errors.
Something like this...
SELECT
orders,
ARRAY(
SELECT AS STRUCT * FROM
(SELECT * FROM `invoices_table` WHERE order_id = orders.order_id
UNION ALL SELECT * FROM `invoices_table` WHERE linked_order_id=orders.linked_order1
UNION ALL SELECT * FROM `invoices_table` WHERE linked_order_id=orders.linked_order2)
) AS invoice
FROM
`orders_table` AS orders
But this gives me the correlated subqueries error.
[Update]
This is much simpler than I thought. The following query gives me what I was after.
SELECT
orders,
ARRAY(
SELECT AS STRUCT * FROM `invoices_table` WHERE order_id = orders.order_id OR linked_order_id IN (orders.linked_order1, orders.linked_order2)) AS invoice
FROM
`orders_table` AS orders

Using CROSS JOINS,
SELECT o.*, ARRAY_AGG(i) invoices
FROM Orders o, Invoices i
WHERE o.order_id = i.order_id
OR i.linked_order_id IN (o.linked_order1, o.linked_order2)
GROUP BY 1, 2, 3;
[UPDATE]
A query with OR conditions in the WHERE clause can perform poorly on a large dataset. In that case you may try the query below instead, which generates the same result.
SELECT o.*, ARRAY_AGG(i) invoices FROM (
SELECT o, i FROM Orders o JOIN Invoices i USING (order_id)
UNION ALL
SELECT o, i FROM Orders o JOIN Invoices i ON i.linked_order_id IN (o.linked_order1, o.linked_order2)
) GROUP BY 1, 2, 3;
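As a rough check that the OR-based join and the UNION ALL rewrite match the same invoices, here is a minimal sketch that runs both shapes against the sample rows from the question. It assumes SQLite (through Python's `sqlite3`) as a stand-in for BigQuery, so `ARRAY_AGG` is replaced by grouping in Python; the table names `orders` and `invoices` and the helper `charges_by_order` are made up for the sketch.

```python
import sqlite3

# Sample rows copied from the question; SQLite stands in for BigQuery here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, linked_order1 TEXT, linked_order2 TEXT);
CREATE TABLE invoices (order_id INTEGER, linked_order_id TEXT, charge REAL);
INSERT INTO orders VALUES
  (1001, 'L005', NULL), (1002, NULL, NULL), (1003, 'L006', 'L007');
INSERT INTO invoices VALUES
  (1001, NULL, 4.27), (1002, NULL, 9.82), (1003, NULL, 7.42),
  (NULL, 'L005', 2.12), (NULL, 'L006', 1.76), (NULL, 'L007', 3.20);
""")

# Single join with an OR condition, as in the first query of the answer.
OR_JOIN = """
SELECT o.order_id, i.charge
FROM orders o JOIN invoices i
  ON o.order_id = i.order_id
  OR i.linked_order_id IN (o.linked_order1, o.linked_order2)
ORDER BY o.order_id, i.charge
"""

# Two simpler joins glued together with UNION ALL, as in the update.
UNION_JOIN = """
SELECT order_id, charge FROM (
  SELECT o.order_id AS order_id, i.charge AS charge
  FROM orders o JOIN invoices i ON o.order_id = i.order_id
  UNION ALL
  SELECT o.order_id, i.charge
  FROM orders o JOIN invoices i
    ON i.linked_order_id IN (o.linked_order1, o.linked_order2)
)
ORDER BY order_id, charge
"""

def charges_by_order(sql):
    """Group the (order_id, charge) rows into one list of charges per order."""
    grouped = {}
    for order_id, charge in conn.execute(sql):
        grouped.setdefault(order_id, []).append(charge)
    return grouped

or_result = charges_by_order(OR_JOIN)
union_result = charges_by_order(UNION_JOIN)
print(or_result)
print(or_result == union_result)
```

One caveat on the rewrite: if a single invoice row could satisfy both conditions for the same order, the UNION ALL form would return it twice while the OR form returns it once. That does not happen in the sample data, where each invoice has either an order_id or a linked_order_id.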

For the desired output table, a full outer join is the right tool.
with tblA as (Select order_id, 1 linked_order1, 2 linked_order2, from unnest([1,2,3]) order_id),
tblB as (Select order_id, 109.99 charge from unnest([3,4,5]) order_id
union all select null order_id, * from unnest([50.1,29.99]) charge
)
Select *
from tblA
full join tblB
using(order_id)
Your case needs several join conditions. Therefore each row of the first table is expanded into three rows, one per join key (order_id, linked_order1, linked_order2).
with tblA as (Select order_id, "L05" linked_order1, "L2" linked_order2, from unnest(["1","2","3"]) order_id),
tblB as (Select order_id, null linked_order_id, 109.99 charge from unnest(["3","4","5"]) order_id
union all select null order_id, "L05" , * from unnest([50.1,29.99]) charge
)
Select A.order_id,linked_order1,linked_order2, array_agg(struct(tblB.order_id,linked_order_id,charge))
from
(
Select * from tblA, unnest([order_id,linked_order1,linked_order2]) as tmp_id
) A
full join tblB
on tmp_id = ifnull(tblB.order_id,linked_order_id)
where charge is not null #or tmp_id=A.order_id
group by 1,2,3

Related

SQL Count column remaining at 1

I would like to display a running total of Invoice_Amount. Here is my current query:
SELECT cust_name, COUNT(*) as Invoice_Amount, Invoice.invoice_date
FROM Customer, Invoice
WHERE Customer.customer_id = Invoice.customer_id
GROUP BY Invoice.customer_id, Customer.cust_name,invoice_date;
and here is the current output:
cust_name Invoice_Amount invoice_date
Company A 1 2000-10-12 00:00:00.000
Company B 1 2000-09-22 00:00:00.000
Company C 1 2000-05-26 00:00:00.000
Company D 1 2000-08-15 00:00:00.000
Company E 1 2000-11-15 00:00:00.000
Company E 1 2000-05-02 00:00:00.000
Where I would like the Invoice_Amount in both cases to read 2 like so:
cust_name Invoice_Amount invoice_date
Company A 1 2000-10-12 00:00:00.000
Company B 1 2000-09-22 00:00:00.000
Company C 1 2000-05-26 00:00:00.000
Company D 1 2000-08-15 00:00:00.000
Company E 2 2000-11-15 00:00:00.000
Company E 2 2000-05-02 00:00:00.000
This is so I can eventually do something along the lines of:
HAVING (COUNT(*) > 1)
How would I go about getting to this result
There is no need for a GROUP BY or a HAVING because you're not actually grouping by anything in the final result.
;;;/* CTE with leading semi-colons for safety */;;;WITH src AS
(
SELECT c.cust_name, i.invoice_date,
COUNT(i.invoice_date) OVER (PARTITION BY i.customer_id)
AS Invoice_Count
FROM dbo.Customer AS c
INNER JOIN dbo.Invoice AS i
ON c.customer_id = i.customer_id
)
SELECT cust_name, Invoice_Count, invoice_date
FROM src
-- WHERE Invoice_Count > 1;
Well, as from your data, the combinations of invoice_date and cust_name seem to be unique - as COUNT(*) always returns 1.
You now seem to need the count value that you call invoice_amount to tally up for the same cust_name. 'Company E' occurs twice in your report, and, for 'Company E', you need the value 2. But, still, you want to keep both rows.
Functions that sort-of aggregate data, but still return the same number of rows as the input, are not GROUP BY or aggregate functions, they are window functions, or OLAP/Analytic functions.
So, start from your grouping query, but then select from it, applying an OLAP function , and select from that outer query in turn, filtering for the OLAP function's result:
WITH
grp AS (
SELECT
cust_name
, count(*) AS invoice_amount
, invoice.invoice_date
FROM customer
JOIN invoice ON customer.customer_id = invoice.customer_id
GROUP BY
invoice.customer_id
, customer.cust_name
, invoice_date
)
,
olap AS (
SELECT
cust_name
, SUM(invoice_amount) OVER(PARTITION BY cust_name) AS invoice_amount
, invoice_date
FROM grp
)
SELECT
*
FROM olap
WHERE invoice_amount > 1;
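To make the window-function idea concrete, here is a small sketch that counts invoices per customer with `COUNT(*) OVER (PARTITION BY ...)` while keeping one row per invoice, then filters on that count. It assumes SQLite 3.25 or later (the first release with window functions) via Python's `sqlite3`; the `customer_id` values are invented, since the question does not show them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER, cust_name TEXT);
CREATE TABLE invoice (customer_id INTEGER, invoice_date TEXT);
-- customer_id values are hypothetical; names and dates follow the question.
INSERT INTO customer VALUES
  (1, 'Company A'), (2, 'Company B'), (3, 'Company C'),
  (4, 'Company D'), (5, 'Company E');
INSERT INTO invoice VALUES
  (1, '2000-10-12'), (2, '2000-09-22'), (3, '2000-05-26'),
  (4, '2000-08-15'), (5, '2000-11-15'), (5, '2000-05-02');
""")

# The window count is computed per customer but does not collapse rows,
# so the outer query can filter on it like an ordinary column.
WINDOW_QUERY = """
WITH src AS (
  SELECT c.cust_name, i.invoice_date,
         COUNT(*) OVER (PARTITION BY i.customer_id) AS invoice_count
  FROM customer c
  JOIN invoice i ON c.customer_id = i.customer_id
)
SELECT cust_name, invoice_count, invoice_date
FROM src
WHERE invoice_count > 1
ORDER BY invoice_date
"""

repeat_customers = conn.execute(WINDOW_QUERY).fetchall()
for row in repeat_customers:
    print(row)
```

The filter has to sit in an outer query (or CTE consumer) because window functions are evaluated after WHERE and HAVING of the query level that defines them.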

one to many relation between table columns. Grouping and finding combinations

In sample table t0 :
OrderID | ProductID
0001 1254
0001 1252
0002 0038
0003 1254
0003 1252
0003 1432
0004 0038
0004 1254
0004 1252
I need to find the most popular combination of two ProductIDs under one OrderID. The purpose is to decide which products are more likely to be sold together in one order, e.g. phone and handsfree. I think the logic is to group by OrderID, calculate every possible combination of ProductID pairs, count them per OrderID and select the TOP 2, but I really can't tell if it is doable.
A "self-join" may be used, ensuring that one of the product ids is greater than the other so that we get each "pair" of products per order only once. Then it is simple to count:
Demo
CREATE TABLE OrderDetail
([OrderID] int, [ProductID] int)
;
INSERT INTO OrderDetail
([OrderID], [ProductID])
VALUES
(0001, 1254), (0001, 1252), (0002, 0038), (0003, 1254), (0003, 1252), (0003, 1432), (0004, 0038), (0004, 1254), (0004, 1252)
;
Query 1:
select -- top(2)
od1.ProductID, od2.ProductID, count(*) count_of
from OrderDetail od1
inner join OrderDetail od2 on od1.OrderID = od2.OrderID and od2.ProductID > od1.ProductID
group by
od1.ProductID, od2.ProductID
order by
count_of DESC
Results:
| ProductID | ProductID | count_of |
|-----------|-----------|----------|
| 1252 | 1254 | 3 |
| 1252 | 1432 | 1 |
| 1254 | 1432 | 1 |
| 38 | 1252 | 1 |
| 38 | 1254 | 1 |
----
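The self-join above can be tried outside SQL Server as well. The following is a minimal sketch that assumes SQLite through Python's `sqlite3`, loads the same nine rows, and runs the same pair-counting join; the extra `p1, p2` tie-breakers in the ORDER BY are an addition so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OrderDetail (OrderID INTEGER, ProductID INTEGER);
INSERT INTO OrderDetail VALUES
  (1, 1254), (1, 1252), (2, 38), (3, 1254), (3, 1252),
  (3, 1432), (4, 38), (4, 1254), (4, 1252);
""")

# od2.ProductID > od1.ProductID keeps each unordered pair exactly once
# per order and also drops a product paired with itself.
PAIR_QUERY = """
SELECT od1.ProductID AS p1, od2.ProductID AS p2, COUNT(*) AS count_of
FROM OrderDetail od1
JOIN OrderDetail od2
  ON od1.OrderID = od2.OrderID AND od2.ProductID > od1.ProductID
GROUP BY od1.ProductID, od2.ProductID
ORDER BY count_of DESC, p1, p2
"""

pairs = conn.execute(PAIR_QUERY).fetchall()
for p1, p2, count_of in pairs:
    print(p1, p2, count_of)
```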
With respect to displaying the "top 2" or whatever. You are likely to get "equal top" results so I would suggest you need to use dense_rank() and you may even want to "unpivot" the result so you have a single column of productids with their associated rank. How often you perform this and/or store this I leave to you.
with ProductPairs as (
select
p1, p2, count_pair
, dense_rank() over(order by count_pair DESC) as ranked
from (
select
od1.ProductID p1, od2.ProductID p2, count(*) count_pair
from OrderDetail od1
inner join OrderDetail od2 on od1.OrderID = od2.OrderID and od2.ProductID > od1.ProductID
group by
od1.ProductID, od2.ProductID
) d
)
, RankedProducts as (
select p1 as ProductID, ranked, count_pair
from ProductPairs
union all
select p2 as ProductID, ranked, count_pair
from ProductPairs
)
select *
from RankedProducts
where ranked <= 2
order by ranked, ProductID
WITH products as (
SELECT DISTINCT ProductID
FROM orders
), permutations as (
SELECT p1.ProductID as pidA,
p2.ProductID as pidB
FROM products p1
JOIN products p2
ON p1.ProductID < p2.ProductID
), check_frequency as (
SELECT pidA, pidB, COUNT (o2.orderID) total_orders
FROM permutations p
LEFT JOIN orders o1
ON p.pidA = o1.ProductID
LEFT JOIN orders o2
ON p.pidB = o2.ProductID
AND o1.orderID = o2.orderID
GROUP BY pidA, pidB
)
SELECT TOP 2 *
FROM check_frequency
ORDER BY total_orders DESC
The following query calculates the number of two-way combinations
among all orders in Orderline:
SELECT SUM(numprods * (numprods - 1)/2) as numcombo2
FROM ( SELECT orderid, COUNT(DISTINCT productid) as numprods
FROM orderline ol
GROUP BY orderid ) o
Notice that this query counts distinct products rather than order lines, so
orders with the same product on multiple lines do not affect the count.
The number of two-way combinations is 185,791. This is useful because the
number of combinations pretty much determines how quickly the query generating
them runs. A single order with a large number of products can seriously
degrade performance. For instance, if one order contains a thousand
products, there would be about five hundred thousand two-way combinations
in just that one order—versus 185,791 in all the orders data. As the number of
products in the largest order increases, the number of combinations increases
much faster.
The approach for calculating the combinations is to do a self-join on the Orderline
table, with duplicate product pairs removed. The goal is to get all pairs of
products subject to the conditions:
The two products in the pair are different.
No two combinations have the same two products.
The first condition is easily met by filtering out any pairs where the two products
are equal. The second condition is also easily met, by requiring that the
first product id be smaller than the second product id. The following query
generates all the combinations in a subquery and counts the number of orders
containing each one:
SELECT p1, p2, COUNT(*) as numorders
FROM (SELECT op1.orderid, op1.productid as p1, op2.productid as p2
FROM (SELECT DISTINCT orderid, productid FROM orderline) op1 JOIN
(SELECT DISTINCT orderid, productid FROM orderline) op2
ON op1.orderid = op2.orderid AND
op1.productid < op2.productid
) combinations
GROUP BY p1, p2
source Data Analysis Using SQL and Excel
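The counting formula in the quoted passage, numprods * (numprods - 1) / 2 per order, is easy to check by hand. Here is a short sketch that applies it to the nine sample rows of table t0 from the question (not to the book's Orderline data, which is not available here); the variable names are made up for the sketch.

```python
from collections import defaultdict

# (OrderID, ProductID) rows from the sample table t0 in the question.
order_lines = [
    (1, 1254), (1, 1252), (2, 38), (3, 1254), (3, 1252),
    (3, 1432), (4, 38), (4, 1254), (4, 1252),
]

# Collect distinct products per order, mirroring COUNT(DISTINCT productid).
products_per_order = defaultdict(set)
for order_id, product_id in order_lines:
    products_per_order[order_id].add(product_id)

def two_way_combinations(n):
    """Number of unordered pairs that can be formed from n distinct products."""
    return n * (n - 1) // 2

pairs_per_order = {
    order_id: two_way_combinations(len(products))
    for order_id, products in products_per_order.items()
}
total_pairs = sum(pairs_per_order.values())

print(pairs_per_order)
print(total_pairs)
print(two_way_combinations(1000))
```

The last line reproduces the figure from the passage: a single order with a thousand products yields 499,500 pairs, which is why one very large order can dominate the cost of the self-join.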
Try using the following command. Note that orderID must stay out of the SELECT list and the GROUP BY, otherwise every pair is counted once per order and the count is always 1:
SELECT T1.productId, T2.productID, COUNT(*) AS Occurence
FROM TBL T1 INNER JOIN TBL T2
ON T1.orderid = T2.orderid
WHERE T1.productid > T2.productId
GROUP BY T1.productId, T2.productID
ORDER BY Occurence DESC

How to join 2 queries with different number of records and columns in oracle sql?

I have three tables:
Employee_leave(EmployeeID,Time_Period,leave_type)
Employee(EID,Department,Designation)
leave_eligibility(Department,Designation, LeaveType, LeavesBalance).
I want to fetch the number of leaves availed by a particular employee in each leave type (category), so I wrote the following query, Query1:
SELECT LEAVE_TYPE, SUM(TIME_PERIOD)
FROM EMPLOYEE_LEAVE
WHERE EMPLOYEEID=78
GROUP BY LEAVE_TYPE
order by leave_type;
output for Query1
Leave_Type | SUM(Time_Period)
Casual 1
Paid 4
Sick 1
I also want to fetch the number of leaves the employee is eligible for in each leave type (category). The following query, Query2, gives the desired result.
Select UNIQUE Leavetype,LEAVEBALANCE
from LEAVE_ELIGIBILITY
INNER JOIN EMPLOYEE
ON LEAVE_ELIGIBILITY.DEPARTMENT= EMPLOYEE.DEPARTMENT
AND LEAVE_ELIGIBILITY.DESIGNATION= EMPLOYEE.DESIGNATION
WHERE EID=78
order by leavetype;
output for Query2
LeaveType | LeaveBalance
Casual 10
Paid 15
Privlage 6
Sick 20
Now I want to join Query1 and Query2, or create a view that displays the records from both. As the outputs show, the two queries return different numbers of records. A leave type that is missing from the output of Query1 should show 0 in the final output. For example, Query1 returns no record for Privlage, so the final output should show 0 for Sum(time_period) in the Privlage row. I tried creating views from the two queries and joining them, but I can't get the final query to run.
Code for View 1
create or replace view combo_table1 as
Select UNIQUE Leavetype,LEAVEBALANCE,EMPLOYEE.DEPARTMENT,EMPLOYEE.DESIGNATION, EID
from LEAVE_ELIGIBILITY
INNER JOIN EMPLOYEE
ON LEAVE_ELIGIBILITY.DEPARTMENT= EMPLOYEE.DEPARTMENT
AND LEAVE_ELIGIBILITY.DESIGNATION= EMPLOYEE.DESIGNATION
WHERE EID='78';
Code for View 2
create or replace view combo_table2 as
SELECT LEAVE_TYPE, SUM(TIME_PERIOD) AS Leave_Availed
FROM EMPLOYEE_LEAVE
WHERE EMPLOYEEID='78'
GROUP BY LEAVE_TYPE;
Code for joining 2 views
SELECT combo_table1.Leavetype, combo_table1.LEAVEBALANCE, combo_table2.leave_availed
FROM combo_table1 v1
INNER JOIN combo_table2 v2
ON v1.Leavetype = v2.LEAVE_TYPE;
But I'm getting "%s: invalid identifier" while executing the above query. I also know I can't use UNION, because it requires both queries to return the same columns, which is not the case here.
I'm using Oracle 11g, so please answer accordingly.
Thanks in advance.
Desired final output
LeaveType | LeaveBalance | Sum(Time_period)
Casual 10 1
Paid 15 4
Privlage 6 0
Sick 20 1
To get the final desired output ...
"For a record which is not present in output of query1, it should display 0 in final output. "
... use an outer join to tie the taken leave records to the other tables. Leave types the employee has not taken have no matching leave rows, so their sum is NULL; wrapping the sum in NVL turns that into zero.
select emp.Employee_ID
, le.leavetype
, le.leavebalance
, nvl(sum (el.Time_Duration), 0) as total_Time_Duration
from employee emp
inner join leave_eligibility le
on le.department= emp.department
and le.designation= emp.designation
left outer join Employee_leave el
on el.EmployeeID = emp.Employee_ID
and el.leave_type = le.leavetype
group by emp.Employee_ID
, le.leavetype
, le.leavebalance
;
Your immediate problem:
I'm getting "%s: invalid identifier"
Your view has references to a column EID although none of your posted tables have a column of that name. Likewise there is confusion between Time_Duration and time_period.
More generally, you will find life considerably easier if you use the exact same name for common columns (i.e. consistently use either employee_id or employeeid, don't chop and change).
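The outer-join approach can be checked against the desired output with a small sketch. It assumes SQLite via Python's `sqlite3` instead of Oracle 11g, so `COALESCE` plays the role of `NVL`, and it uses the column names from the question's table list (EID, Time_Period, LeaveBalance). The department and designation values and the individual leave rows are invented; only their per-type totals follow the Query1 output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (EID INTEGER, Department TEXT, Designation TEXT);
CREATE TABLE leave_eligibility (
  Department TEXT, Designation TEXT, LeaveType TEXT, LeaveBalance INTEGER);
CREATE TABLE employee_leave (
  EmployeeID INTEGER, Time_Period INTEGER, leave_type TEXT);
-- Department and designation values are made up for the sketch.
INSERT INTO employee VALUES (78, 'IT', 'Engineer');
INSERT INTO leave_eligibility VALUES
  ('IT', 'Engineer', 'Casual', 10), ('IT', 'Engineer', 'Paid', 15),
  ('IT', 'Engineer', 'Privlage', 6), ('IT', 'Engineer', 'Sick', 20);
-- Individual leave rows are made up; they add up to the Query1 totals.
INSERT INTO employee_leave VALUES
  (78, 1, 'Casual'), (78, 2, 'Paid'), (78, 2, 'Paid'), (78, 1, 'Sick');
""")

# The LEFT JOIN keeps every eligible leave type; COALESCE turns the NULL sum
# of a leave type with no matching leave rows into 0.
LEAVE_QUERY = """
SELECT le.LeaveType, le.LeaveBalance,
       COALESCE(SUM(el.Time_Period), 0) AS leave_availed
FROM employee emp
JOIN leave_eligibility le
  ON le.Department = emp.Department AND le.Designation = emp.Designation
LEFT JOIN employee_leave el
  ON el.EmployeeID = emp.EID AND el.leave_type = le.LeaveType
WHERE emp.EID = 78
GROUP BY le.LeaveType, le.LeaveBalance
ORDER BY le.LeaveType
"""

leave_summary = conn.execute(LEAVE_QUERY).fetchall()
for row in leave_summary:
    print(row)
```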
Try this example:
with t as (
select 'Casual' as Leave_Type, 1 as Time_Period, 0 as LeaveBalance from dual
union all
select 'Paid', 4,0 from dual
union all
select 'Sick', 1,0 from dual),
t1 as (
select 'Casual' as Leave_Type, 0 as Time_Period, 10 as LeaveBalance from dual
union all
select 'Paid', 0, 15 from dual
union all
select 'Privlage', 0, 6 from dual
union all
select 'Sick', 0, 20 from dual)
select Leave_Type, sum(Time_Period), sum(LeaveBalance)
from(
select *
from t
UNION ALL
select * from t1
)
group by Leave_Type
Ok, edit:
create or replace view combo_table1 as
Select UNIQUE Leavetype, 0 AS Leave_Availed, LEAVEBALANCE
from LEAVE_ELIGIBILITY INNER JOIN EMPLOYEE ON LEAVE_ELIGIBILITY.DEPARTMENT= EMPLOYEE.DEPARTMENT AND LEAVE_ELIGIBILITY.DESIGNATION= EMPLOYEE.DESIGNATION
WHERE EID='78';
create or replace view combo_table2 as
SELECT LEAVE_TYPE as Leavetype, SUM(TIME_PERIOD) AS Leave_Availed, 0 as LEAVEBALANCE
FROM EMPLOYEE_LEAVE
WHERE EMPLOYEEID='78'
GROUP BY LEAVE_TYPE;
SELECT Leavetype, sum(LEAVEBALANCE), sum(leave_availed)
FROM (
select *
from combo_table1
UNION ALL
select * from combo_table2
)
group by Leavetype;

sql Left join giving duplicate multiple values

I am unable to fetch the correct result. Please suggest a fix. Thanks in advance.
I have two tables and trying to get balance quantity
One is Purchase Table (purchase_detail)
Pur_Date Item_Id Pur_Qty
2014-10-08 12792 25
2014-11-01 133263 20
2014-10-01 133263 2
2014-11-20 12792 10
Second is Sale Table (sale_detail)
Sale_Date Item_Id Sale_Qty
2014-11-17 133263 -6
2014-11-05 12792 -1
2014-11-24 133263 -2
2014-10-28 12792 -6
2014-11-05 133263 -2
After using left join
SQL :
select a.pur_item, sum(a.pur_qty + b.sold_qty ) as bal_qty
from purchase_item_qty_amount a left join sale_item_qty_amount b
on a.pur_item =b.sale_item where a.pur_item IN( 12792,133263)
group by 1;
Result - But it's incorrect
Item_Id Bal_qty
12792 56
133263 46
Result - It should be
Item_Id Bal_qty
12792 28
133263 12
Try this untested query:
select pur_item, sum(bal_qty) as bal_qty from (
select pur_item, sum(a.pur_qty) as bal_qty
from purchase_item_qty_amount a
group by 1
union all
select sale_item, sum(b.sold_qty) as bal_qty
from sale_item_qty_amount b
group by 1) t
group by 1
You have to use union instead of join as the left rows have multiple matches to the right which cause too much data to be calculated. Try running your query without the sum and group by to see the raw data being calculated and you will see what I mean.
Your query with no group/sum:
select a.pur_item, a.pur_qty, b.sold_qty from purchase_item_qty_amount a left join sale_item_qty_amount b on a.pur_item = b.sale_item where a.pur_item IN (12792,133263);
Here is a query that works as you would expect:
select item, sum(a.pur_qty) + sum(a.sold_qty) as bal_qty
from (
select pur_item as item, sum(pur_qty) as pur_qty, 0 as sold_qty
from purchase_item_qty_amount
where pur_item in (12792,133263)
group by pur_item
union
select sale_item as item, 0 as pur_qty, sum(sold_qty) as sold_qty
from sale_item_qty_amount
where sale_item in (12792,133263)
group by sale_item
) a
group by item;
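To see the row multiplication described above, here is a sketch that loads the sample purchase and sale rows from the question and compares the direct join with a sum over the stacked rows. It assumes SQLite via Python's `sqlite3` and uses the table and column names from the question's sample tables (purchase_detail, sale_detail) rather than the views in the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchase_detail (Pur_Date TEXT, Item_Id INTEGER, Pur_Qty INTEGER);
CREATE TABLE sale_detail (Sale_Date TEXT, Item_Id INTEGER, Sale_Qty INTEGER);
INSERT INTO purchase_detail VALUES
  ('2014-10-08', 12792, 25), ('2014-11-01', 133263, 20),
  ('2014-10-01', 133263, 2), ('2014-11-20', 12792, 10);
INSERT INTO sale_detail VALUES
  ('2014-11-17', 133263, -6), ('2014-11-05', 12792, -1),
  ('2014-11-24', 133263, -2), ('2014-10-28', 12792, -6),
  ('2014-11-05', 133263, -2);
""")

# Joining on Item_Id pairs every purchase row with every sale row of the
# same item, so each quantity is added several times.
FAN_OUT_QUERY = """
SELECT p.Item_Id, SUM(p.Pur_Qty + s.Sale_Qty) AS bal_qty
FROM purchase_detail p
LEFT JOIN sale_detail s ON p.Item_Id = s.Item_Id
GROUP BY p.Item_Id
ORDER BY p.Item_Id
"""

# Stacking the two tables with UNION ALL keeps one row per movement,
# so each quantity is added exactly once.
STACKED_QUERY = """
SELECT Item_Id, SUM(qty) AS bal_qty
FROM (
  SELECT Item_Id, Pur_Qty AS qty FROM purchase_detail
  UNION ALL
  SELECT Item_Id, Sale_Qty FROM sale_detail
)
GROUP BY Item_Id
ORDER BY Item_Id
"""

fan_out_balance = conn.execute(FAN_OUT_QUERY).fetchall()
stacked_balance = conn.execute(STACKED_QUERY).fetchall()
print(fan_out_balance)
print(stacked_balance)
```

The first result reproduces the incorrect 56 and 46 from the question; the second gives the expected 28 and 12. UNION ALL is used rather than UNION so that two movements with identical item and quantity are not merged into one.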
Here is a complete, reproducible version of your question. Run it and you should get the desired result:
purchase table
create table pt(Pur_Date date,Item_Id number, Pur_Qty number);
insert into pt values('08.oct.2014',12792,25);
insert into pt values('01.nov.2014',133263,20);
insert into pt values('01.oct.2014',133263,2);
insert into pt values('20.nov.2014',12792,10);
sales table
create table st(Sale_Date date , Item_Id number, Sale_Qty number);
insert into st values('17.nov.2014',133263,-6);
insert into st values('05.nov.2014',12792,-1);
insert into st values('24.nov.2014',133263,-2);
insert into st values('28.oct.2014',12792,-6);
insert into st values('05.nov.2014',133263,-2);
select purchase.Item_Id,(purchase.pur_qty+sale.sale_qty) bal_qty
from
(
select a.Item_Id, sum(a.pur_qty) pur_qty
from pt a
group by a.Item_Id
) purchase
left join
(
select b.Item_Id,
sum(b.sale_qty ) sale_qty
from st b
group by b.Item_Id
) sale
on sale.Item_Id = purchase.Item_Id
where purchase.Item_Id in (12792,133263)
order by 1;

Sql data retrieval issue

I have below tables:
Order
Order_id order_number Order_name
1 12345 iphone
2 67891 samsung
order_event
order_event_no status
1 D
1 C
2 C
I wrote the query below to retrieve orders whose status is not 'D', but it returned 2 records.
The query should not return order 1, because it already has a status D event. Even though it has a second event with status C, it should not be included.
select o.order_number,o.order_name
from order o
join order_event oe
on (o.order_id=oe.order_event_no) where oe.status not in ('D')
Regards,
Chaitu
This will accomplish what you want with your given schema / data...
SELECT order_number, order_name
FROM order
WHERE order_id NOT IN (SELECT order_event_no FROM order_event WHERE status = 'D')
If you want to exclude any order that has an event with status 'D', you need a subquery.
select o.order_number,o.order_name
from order o
where o.order_id
NOT IN
(SELECT order_event_no FROM order_event WHERE status = 'D')
This is equivalent as long as order_event_no is never NULL. Some RDBMSs will execute it faster:
Select
o.order_number,
o.order_name
from
order o
where
not exists (
select
'x'
from
order_event oe
where
oe.order_event_no = o.order_id And
oe.status = 'D'
);
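The difference between filtering joined rows and excluding whole orders can be seen on the sample data. This is a minimal sketch assuming SQLite via Python's `sqlite3`; the order table is named `orders` here because ORDER is a reserved word, which is a change from the question's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, order_number INTEGER, order_name TEXT);
CREATE TABLE order_event (order_event_no INTEGER, status TEXT);
INSERT INTO orders VALUES (1, 12345, 'iphone'), (2, 67891, 'samsung');
INSERT INTO order_event VALUES (1, 'D'), (1, 'C'), (2, 'C');
""")

# Filtering after the join only drops the 'D' event row; order 1 still
# survives through its 'C' event.
JOIN_FILTER_QUERY = """
SELECT o.order_number, o.order_name
FROM orders o
JOIN order_event oe ON o.order_id = oe.order_event_no
WHERE oe.status NOT IN ('D')
ORDER BY o.order_number
"""

# NOT EXISTS removes an order as soon as any of its events has status 'D'.
NOT_EXISTS_QUERY = """
SELECT o.order_number, o.order_name
FROM orders o
WHERE NOT EXISTS (
  SELECT 1 FROM order_event oe
  WHERE oe.order_event_no = o.order_id AND oe.status = 'D'
)
ORDER BY o.order_number
"""

join_filter_rows = conn.execute(JOIN_FILTER_QUERY).fetchall()
not_exists_rows = conn.execute(NOT_EXISTS_QUERY).fetchall()
print(join_filter_rows)
print(not_exists_rows)
```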