I’m looking for the syntax to return only products whose latest process date had a transaction status of “Paid”.
So something like…
Select Products
From Table 1
Where MAX(Process_date) … *(as I don’t know what to do here)*
AND Transactions IN ‘Paid’
AND product_key = z.product_key
...This will then be used as a nested query correlated with an outer query that uses z as its alias. A little help?
One method is a correlated subquery:
Select t.*
From Table1 t
where t.process_date = (select max(t2.process_date)
from Table1 t2
where t2.product_key = t.product_key
) and
t.status = 'Paid';
If you just want the product key, then there is a fun method using aggregation:
select product_key
from table1
group by product_key
having max(process_date) = max(case when status = 'Paid' then process_date end);
This tests whether the latest process_date for each product is also the latest process_date with a 'Paid' status.
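To check that the two queries above agree, here is a minimal sketch that runs both against an in-memory SQLite database through Python's standard sqlite3 module. SQLite only stands in for whatever engine the asker uses, and the column layout (product_key, process_date, status) and the sample rows are hypothetical. Dates are stored as ISO-8601 text so that MAX() orders them chronologically.

```python
import sqlite3

# Hypothetical sample data: (product_key, process_date, status).
ROWS = [
    ("P1", "2021-01-01", "Open"),
    ("P1", "2021-01-05", "Paid"),  # latest row is Paid -> P1 qualifies
    ("P2", "2021-01-01", "Paid"),
    ("P2", "2021-01-03", "Open"),  # latest row is not Paid -> P2 excluded
    ("P3", "2021-01-02", "Paid"),  # only row is Paid -> P3 qualifies
]

# Correlated subquery: keep a row if it is the product's latest row and is Paid.
CORRELATED = """
SELECT t.product_key
FROM Table1 t
WHERE t.process_date = (SELECT MAX(t2.process_date)
                        FROM Table1 t2
                        WHERE t2.product_key = t.product_key)
  AND t.status = 'Paid'
ORDER BY t.product_key
"""

# Aggregation: the overall latest date must equal the latest Paid date.
AGGREGATED = """
SELECT product_key
FROM Table1
GROUP BY product_key
HAVING MAX(process_date) = MAX(CASE WHEN status = 'Paid' THEN process_date END)
ORDER BY product_key
"""


def latest_paid_products(query):
    """Run one of the queries against a fresh in-memory copy of ROWS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (product_key TEXT, process_date TEXT, status TEXT)")
    conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", ROWS)
    result = [row[0] for row in conn.execute(query)]
    conn.close()
    return result


print(latest_paid_products(CORRELATED))  # ['P1', 'P3']
print(latest_paid_products(AGGREGATED))  # ['P1', 'P3']
```

P2 is excluded by both queries: it has a Paid row, but that row is not its latest one.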
Related
Let's say I have a sample dataset (Table 1) with the columns token, customer_id and creation_date.
Here, one customer can use multiple tokens and one token can be used by multiple customers. For each token, customer and record creation date, I am trying to get the number of customers who used this token before that creation date.
When I try to execute this query in Spark SQL, I get the following error:
Option 1 (correlated subquery)
SELECT
t1.token,
t1.customer_id,
t1.creation_date,
(SELECT COUNT(DISTINCT t2.customer_id) FROM Table1 t2
WHERE t1.token = t2.token
AND t2.creation_date < t1.creation_date) cust_cnt
FROM Table1 t1;
Error: Correlated column is not allowed in a non-equality predicate
Option 2 (cross join)
SELECT
t1.token,
t1.customer_id,
t1.creation_date,
COUNT(DISTINCT t2.customer_id) AS cust_cnt
FROM Table1 t1, Table1 t2
WHERE t1.token = t2.token
AND t2.creation_date < t1.creation_date
GROUP BY t1.token, t1.customer_id, t1.creation_date;
Problem: the query runs for a long time because Table 1 has millions of rows.
Is there any workaround (e.g. using a window function) to optimize this query in Spark SQL? Note: window functions do not allow a distinct count.
Count only the first time each customer appears for a token:
SELECT t1.token, t1.customer_id, t1.creation_date,
SUM(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END) OVER (PARTITION BY token ORDER BY creation_date) as cust_cnt
FROM (SELECT t1.*,
ROW_NUMBER() OVER (PARTITION BY token, customer_id ORDER BY creation_date) as seqnum
FROM Table1 t1
) t1;
Note: This is also counting the current row. I'm guessing that is acceptable for what you want to do.
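Here is a minimal sketch of the seqnum trick, run through Python's sqlite3 module instead of Spark SQL. It assumes SQLite 3.25 or later for window-function support. The table name Table1 follows the answer; the rows and the brute_force_counts helper are hypothetical. The helper counts distinct customers per token whose creation_date is on or before the current row's date. Under the default window frame, that is what the query computes, with the current row included.

```python
import sqlite3

# Hypothetical sample rows: (token, customer_id, creation_date).
ROWS = [
    ("T1", "A", "2021-01-01"),
    ("T1", "B", "2021-01-02"),
    ("T1", "A", "2021-01-03"),  # A's second use of T1: seqnum = 2, not counted again
    ("T1", "C", "2021-01-04"),
    ("T2", "A", "2021-01-01"),
    ("T2", "A", "2021-01-05"),
]

QUERY = """
SELECT t1.token, t1.customer_id, t1.creation_date,
       -- running total of 'first appearances' per token
       SUM(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END)
           OVER (PARTITION BY token ORDER BY creation_date) AS cust_cnt
FROM (SELECT t1.*,
             -- 1 on the first row of each (token, customer) pair
             ROW_NUMBER() OVER (PARTITION BY token, customer_id
                                ORDER BY creation_date) AS seqnum
      FROM Table1 t1) t1
ORDER BY t1.token, t1.creation_date
"""


def running_customer_counts(rows):
    """Run the window-function query on an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (token TEXT, customer_id TEXT, creation_date TEXT)")
    conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


def brute_force_counts(rows):
    """Distinct customers per token with creation_date <= the current row's date."""
    out = []
    for token, cust, date in sorted(rows, key=lambda r: (r[0], r[2])):
        seen = {c for t, c, d in rows if t == token and d <= date}
        out.append((token, cust, date, len(seen)))
    return out


print(running_customer_counts(ROWS) == brute_force_counts(ROWS))  # True
```

The window version avoids the self-join entirely, which is where the original cross-join query spends its time.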
Recently, during an interview, I was asked the following question about a table with an order_date and a ship_date column.
The requirement is to report how many orders and how many shipments there are per day (based on the date columns), with one row per date.
I wrote the following query, but the interviewer asked me to achieve the same output with a SQL query that uses neither JOIN nor UNION.
SELECT
COALESCE(a.order_date, b.ship_date), orders, shipments
FROM
(SELECT
order_date, COUNT(1) AS orders
FROM
table
GROUP BY 1) a
FULL JOIN
(SELECT
ship_date, COUNT(1) AS shipments
FROM
table
GROUP BY 1) b ON a.order_date = b.ship_date
Is this possible? Could you guys please advise?
You can use UNION ALL and GROUP BY with conditional aggregation as follows:
SELECT DATE_,
COUNT(CASE WHEN FLAG = 'ORDER' THEN 1 END) AS ORDERS,
COUNT(CASE WHEN FLAG = 'SHIP' THEN 1 END) AS SHIPMENTS
FROM (SELECT ORDER_DATE AS DATE_, 'ORDER' AS FLAG FROM YOUR_TABLE
UNION ALL
SELECT SHIP_DATE AS DATE_, 'SHIP' AS FLAG FROM YOUR_TABLE) T
GROUP BY DATE_
ORDER BY DATE_;
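The conditional-aggregation idea can be checked with a minimal sketch run through Python's sqlite3 module. The sample rows are hypothetical and contain no NULL ship dates. The table and column names follow the answer (YOUR_TABLE, ORDER_DATE, SHIP_DATE).

```python
import sqlite3

# Hypothetical table: one row per order with its order date and ship date.
ROWS = [
    ("2021-01-01", "2021-01-02"),
    ("2021-01-01", "2021-01-03"),
    ("2021-01-02", "2021-01-03"),
]

QUERY = """
SELECT DATE_,
       COUNT(CASE WHEN FLAG = 'ORDER' THEN 1 END) AS ORDERS,    -- NULLs are not counted
       COUNT(CASE WHEN FLAG = 'SHIP' THEN 1 END) AS SHIPMENTS
FROM (SELECT ORDER_DATE AS DATE_, 'ORDER' AS FLAG FROM YOUR_TABLE
      UNION ALL
      SELECT SHIP_DATE AS DATE_, 'SHIP' AS FLAG FROM YOUR_TABLE) T
GROUP BY DATE_
ORDER BY DATE_
"""


def daily_counts(rows):
    """Return (date, orders, shipments) tuples for the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE YOUR_TABLE (ORDER_DATE TEXT, SHIP_DATE TEXT)")
    conn.executemany("INSERT INTO YOUR_TABLE VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(daily_counts(ROWS))
# [('2021-01-01', 2, 0), ('2021-01-02', 1, 1), ('2021-01-03', 0, 2)]
```

Days with only shipments (2021-01-03) still appear, which is what the original FULL JOIN plus COALESCE was achieving.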
In BigQuery, I would express this as:
select date, countif(n = 0) as orders, countif(n = 1) as numships
from t cross join
unnest(array[order_date, ship_date]) date with offset n
group by 1
order by date;
The advantage of this approach (over UNION ALL) is twofold. First, it scans the table only once. More importantly, the unnest() runs on the same node where the data resides, so no data needs to be moved for the unpivot.
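SQLite has no UNNEST. This sketch therefore substitutes a two-row VALUES list, cross joined to the table, for the array offsets 0 and 1. It is an analogue of the BigQuery query above, not a translation of it. The query uses no UNION keyword and reads the base table once. The table name orders and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical table: one row per order with its order date and ship date.
ROWS = [
    ("2021-01-01", "2021-01-02"),
    ("2021-01-01", "2021-01-03"),
    ("2021-01-02", "2021-01-03"),
]

QUERY = """
WITH n(k) AS (VALUES (0), (1))           -- stand-in for the offsets 0 and 1
SELECT CASE n.k WHEN 0 THEN o.order_date ELSE o.ship_date END AS dt,
       SUM(n.k = 0) AS orders,            -- a comparison evaluates to 1 or 0 in SQLite
       SUM(n.k = 1) AS shipments
FROM orders o CROSS JOIN n
GROUP BY dt
ORDER BY dt
"""


def daily_counts_unpivot(rows):
    """Unpivot each order row into two rows, then count per date."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_date TEXT, ship_date TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(daily_counts_unpivot(ROWS))
# [('2021-01-01', 2, 0), ('2021-01-02', 1, 1), ('2021-01-03', 0, 2)]
```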
I'm having a bit of trouble figuring out how to write a query that returns the customers who ordered both A and B.
What I'm looking for is all customers who ordered both product A and product B.
SELECT CustomerID
FROM table
WHERE product in ('A','B')
GROUP BY customerid
HAVING COUNT(distinct product) = 2
I don't normally post code-only answers, but there isn't a lot that words can add here; the query largely explains itself.
You can also use:
HAVING max(product) <> min(product)
It may be worth pointing out the order in which the clauses are applied. First, the WHERE filters the rows to just products A and B. Then the GROUP BY groups the rows by customer and counts the distinct products (or computes the MIN and MAX). Finally, the HAVING keeps only the customers with 2 distinct products (or only those whose MIN, i.e. A, differs from their MAX, i.e. B).
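Both HAVING variants can be checked side by side in a minimal sketch run through Python's sqlite3 module. The table name orders replaces the placeholder table (a reserved word), and the sample rows are hypothetical. Customer 1 orders A twice, which shows that repeat orders do not inflate COUNT(DISTINCT product).

```python
import sqlite3

# Hypothetical rows: (CustomerID, product).
ROWS = [
    (1, "A"), (1, "A"), (1, "B"),  # both A and B
    (2, "A"),                      # only A
    (3, "B"), (3, "C"),            # only B among A/B
    (4, "A"), (4, "B"), (4, "C"),  # both A and B (C is filtered out by WHERE)
]

COUNT_DISTINCT = """
SELECT CustomerID
FROM orders
WHERE product IN ('A', 'B')
GROUP BY CustomerID
HAVING COUNT(DISTINCT product) = 2
ORDER BY CustomerID
"""

MIN_MAX = """
SELECT CustomerID
FROM orders
WHERE product IN ('A', 'B')
GROUP BY CustomerID
HAVING MAX(product) <> MIN(product)   -- only true when both A and B are present
ORDER BY CustomerID
"""


def customers_with_both(query):
    """Run a query against an in-memory copy of ROWS and return customer IDs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (CustomerID INTEGER, product TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", ROWS)
    result = [row[0] for row in conn.execute(query)]
    conn.close()
    return result


print(customers_with_both(COUNT_DISTINCT))  # [1, 4]
print(customers_with_both(MIN_MAX))         # [1, 4]
```

The MIN/MAX form only works because exactly two products survive the WHERE. With three or more target products, COUNT(DISTINCT ...) is the one that generalizes.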
If you've never encountered HAVING, it is logically equivalent to:
SELECT CustomerID
FROM(
SELECT CustomerID, COUNT(distinct product) as count_distinct_product
FROM table
WHERE product in ('A','B')
GROUP BY customerid
)z
WHERE
z.count_distinct_product = 2
In a HAVING clause you can only refer to columns that are mentioned in the GROUP BY. You can also refer to aggregate operations (such as COUNT/MIN/MAX) on other columns not mentioned in the GROUP BY.
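The claimed equivalence between HAVING and filtering a derived table can be checked with a minimal sketch run through Python's sqlite3 module. The table name orders and the rows are hypothetical, and the derived table mirrors the z subquery above.

```python
import sqlite3

# Hypothetical rows: (CustomerID, product).
ROWS = [(1, "A"), (1, "B"), (2, "A"), (3, "B"), (3, "C"), (4, "A"), (4, "B")]

WITH_HAVING = """
SELECT CustomerID
FROM orders
WHERE product IN ('A', 'B')
GROUP BY CustomerID
HAVING COUNT(DISTINCT product) = 2
ORDER BY CustomerID
"""

# Same filter, applied by an outer WHERE to the aggregated derived table z.
WITH_DERIVED_TABLE = """
SELECT CustomerID
FROM (SELECT CustomerID, COUNT(DISTINCT product) AS count_distinct_product
      FROM orders
      WHERE product IN ('A', 'B')
      GROUP BY CustomerID) z
WHERE z.count_distinct_product = 2
ORDER BY CustomerID
"""


def run(query):
    """Execute a query on a fresh in-memory table and return customer IDs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (CustomerID INTEGER, product TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", ROWS)
    result = [row[0] for row in conn.execute(query)]
    conn.close()
    return result


print(run(WITH_HAVING), run(WITH_DERIVED_TABLE))  # [1, 4] [1, 4]
```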
I have never worked with SQLite, but since its specs say it is a relational database, it should allow the following query.
select distinct CustomerID
from table t
where exists (
select *
from table
where CustomerID = t.CustomerID
and Product = 'A'
)
and exists (
select *
from table
where CustomerID = t.CustomerID
and Product = 'B'
)
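The double-EXISTS approach can be tried in SQLite itself via Python's sqlite3 module. This is a minimal sketch. The table name orders and the rows are hypothetical, and SELECT DISTINCT is used so that each customer is listed once even when they have several matching rows.

```python
import sqlite3

# Hypothetical rows: (CustomerID, product).
ROWS = [(1, "A"), (1, "A"), (1, "B"), (2, "A"), (3, "B"), (3, "C"), (4, "A"), (4, "B")]

QUERY = """
SELECT DISTINCT CustomerID
FROM orders t
WHERE EXISTS (SELECT *                  -- this customer has at least one A
              FROM orders
              WHERE CustomerID = t.CustomerID
                AND product = 'A')
  AND EXISTS (SELECT *                  -- ...and at least one B
              FROM orders
              WHERE CustomerID = t.CustomerID
                AND product = 'B')
ORDER BY CustomerID
"""


def customers_with_a_and_b(rows):
    """Return customer IDs that have both product A and product B."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (CustomerID INTEGER, product TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    result = [row[0] for row in conn.execute(QUERY)]
    conn.close()
    return result


print(customers_with_a_and_b(ROWS))  # [1, 4]
```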
I'd use a correlated subquery with a HAVING clause to check for both products in a single WHERE clause.
SELECT
t.Customer
FROM
#t AS t
WHERE
EXISTS
(
SELECT
1
FROM
#t AS s
WHERE
t.Customer = s.Customer
AND s.Product IN ('A', 'B')
HAVING
COUNT(DISTINCT s.Product) = 2
)
GROUP BY
t.Customer;
Select customerid from table where product in ('A', 'B') group by customerid having count(distinct product) = 2
Filtering with having product like 'A' and product like 'B' does not work: product is not in the GROUP BY, and a single product value cannot match both 'A' and 'B'.
The whole idea is that within the group for one customerid, say 1, if the customer has several A's and B's, count(distinct product) gives 2; otherwise it gives 1, so the query above returns only customers with both.
Another way I just figured out:
SELECT CustomerID
FROM table
WHERE product in ('A','B')
GROUP BY customerid
HAVING sum(case when product = 'A' then 1 else 0 end) > 0
and sum(case when product = 'B' then 1 else 0 end) > 0
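The SUM(CASE ...) version counts A rows and B rows separately per customer and requires both counts to be positive. This minimal sketch runs it through Python's sqlite3 module. The table name orders and the rows are hypothetical.

```python
import sqlite3

# Hypothetical rows: (customerid, product).
ROWS = [(1, "A"), (1, "A"), (1, "B"), (2, "A"), (3, "B"), (3, "C"), (4, "A"), (4, "B")]

QUERY = """
SELECT customerid
FROM orders
WHERE product IN ('A', 'B')
GROUP BY customerid
HAVING SUM(CASE WHEN product = 'A' THEN 1 ELSE 0 END) > 0   -- at least one A
   AND SUM(CASE WHEN product = 'B' THEN 1 ELSE 0 END) > 0   -- at least one B
ORDER BY customerid
"""


def customers_via_conditional_sums(rows):
    """Return customers whose per-product counts of A and B are both positive."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customerid INTEGER, product TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    result = [row[0] for row in conn.execute(QUERY)]
    conn.close()
    return result


print(customers_via_conditional_sums(ROWS))  # [1, 4]
```

Unlike COUNT(DISTINCT ...) = 2, this form keeps working if the WHERE filter is dropped, because each condition checks one specific product.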
The goal of this select is to get the latest score for a system that is in status = 'FD'. I want to get the ID of the row (id), the system ID (sys_id), and the score (score).
The following SQL gives me the id of the system (sys_id) as well as the score (score), but I would also like to get the id column associated with this score and sys_id. Hopefully that makes sense.
select sys_id, score from example
where (sys_id, end_date) in
(
select sys_id, max (end_date)
from example
where status = 'FD'
group by sys_id
);
Here is a SQL Fiddle to give you an idea of what I am talking about http://www.sqlfiddle.com/#!4/169a2/3
Before you ask, yes the combination of sys_id and end_date would give me a unique row and I could find the id that way, but I would rather get the id in my select statement.
You can use a simple CTE to get the max date for each SYS_ID, and join that back to your table to get all the details for that particular record.
with CTE as (
select sys_id, max (end_date) as MaxDate
from example
where status = 'FD'
group by sys_id)
select
EXAMPLE.*
from
EXAMPLE
INNER JOIN CTE
ON EXAMPLE.SYS_ID = CTE.SYS_ID
and EXAMPLE.END_DATE = CTE.MaxDate
Check out the change to your SQL Fiddle
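Here is a minimal sketch of the CTE-plus-join-back approach, run through Python's sqlite3 module instead of the Oracle instance behind the SQL Fiddle. The columns (id, sys_id, score, status, end_date) follow the question, but their types and the sample rows are hypothetical. The query selects id, sys_id and score explicitly rather than EXAMPLE.* so the result is easy to read.

```python
import sqlite3

# Hypothetical rows: (id, sys_id, score, status, end_date).
ROWS = [
    (1, 100, 50, "FD", "2021-01-01"),
    (2, 100, 70, "FD", "2021-02-01"),  # latest FD row for sys_id 100
    (3, 200, 40, "FD", "2021-01-15"),  # latest FD row for sys_id 200
    (4, 200, 90, "XX", "2021-03-01"),  # later, but not in status FD
]

QUERY = """
WITH CTE AS (
    SELECT sys_id, MAX(end_date) AS MaxDate
    FROM example
    WHERE status = 'FD'
    GROUP BY sys_id)
SELECT example.id, example.sys_id, example.score
FROM example
INNER JOIN CTE
        ON example.sys_id = CTE.sys_id
       AND example.end_date = CTE.MaxDate
ORDER BY example.sys_id
"""


def latest_fd_scores():
    """Return (id, sys_id, score) for each system's latest FD row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE example (id INTEGER, sys_id INTEGER, score INTEGER,"
        " status TEXT, end_date TEXT)"
    )
    conn.executemany("INSERT INTO example VALUES (?, ?, ?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(latest_fd_scores())  # [(2, 100, 70), (3, 200, 40)]
```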
Answer from the comment. Subquery a is taken from your statement (lazy programming on my part).
select a.*, e.id, e.score
from
(
select sys_id, max (end_date) as ed
from example
where status = 'FD'
group by sys_id
)a
inner join example e on a.ed = e.end_date and a.sys_id = e.sys_ID
This works on the assumption that there is only one row for a given sys_id and end date. If several rows share a system's latest end date, the join returns all of them.
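The tie caveat can be seen with a minimal sketch run through Python's sqlite3 module. The rows are hypothetical: row 5 deliberately shares sys_id 100's latest FD end date with row 2, so both come back for that system.

```python
import sqlite3

# Hypothetical rows: (id, sys_id, score, status, end_date).
ROWS = [
    (1, 100, 50, "FD", "2021-01-01"),
    (2, 100, 70, "FD", "2021-02-01"),
    (3, 200, 40, "FD", "2021-01-15"),
    (4, 200, 90, "XX", "2021-03-01"),
    (5, 100, 80, "FD", "2021-02-01"),  # ties with id 2 on sys_id and end_date
]

QUERY = """
SELECT e.id, e.sys_id, e.score
FROM (SELECT sys_id, MAX(end_date) AS ed
      FROM example
      WHERE status = 'FD'
      GROUP BY sys_id) a
INNER JOIN example e ON a.ed = e.end_date AND a.sys_id = e.sys_id
ORDER BY e.id
"""


def latest_fd_rows():
    """Return every row matching a system's latest FD end date, ties included."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE example (id INTEGER, sys_id INTEGER, score INTEGER,"
        " status TEXT, end_date TEXT)"
    )
    conn.executemany("INSERT INTO example VALUES (?, ?, ?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(latest_fd_rows())  # [(2, 100, 70), (3, 200, 40), (5, 100, 80)]
```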
Say I have a table "transactions" that has columns "acct_id" "trans_date" and "trans_type" and I want to filter this table so that I have just the last transaction for each account. Clearly I could do something like
SELECT acct_id, max(trans_date) as trans_date
FROM transactions GROUP BY acct_id;
but then I lose my trans_type. I could then make a second SQL call with my list of dates and account IDs to get my trans_type back, but that feels very kludgy, since it means either sending data back and forth to the SQL server or creating a temporary table.
Is there a way to do this with a single query, ideally a generic method that would work with MySQL, PostgreSQL, SQL Server, and Oracle?
This is an example of a greatest-n-per-group query. This question comes up several times per week on Stack Overflow. In addition to the subquery solutions given by other folks, here's my preferred solution, which uses no subquery, GROUP BY, or CTE:
SELECT t1.*
FROM transactions t1
LEFT OUTER JOIN transactions t2
ON (t1.acct_id = t2.acct_id AND t1.trans_date < t2.trans_date)
WHERE t2.acct_id IS NULL;
In other words, return a row such that no other row exists with the same acct_id and a greater trans_date.
This solution assumes that trans_date is unique for a given account, otherwise ties may occur and the query will return all tied rows. But this is true for all the solutions given by other folks too.
I prefer this solution because I most often work on MySQL, which doesn't optimize GROUP BY very well. So this outer join solution usually proves to be better for performance.
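Here is a minimal sketch of the LEFT OUTER JOIN anti-join, run through Python's sqlite3 module. The transactions columns follow the question, but the rows are hypothetical. Account 2 has two rows on the same date, which shows the tie behaviour described above: both tied rows are returned.

```python
import sqlite3

# Hypothetical rows: (acct_id, trans_date, trans_type).
ROWS = [
    (1, "2021-01-01", "deposit"),
    (1, "2021-01-03", "withdrawal"),  # latest for account 1
    (2, "2021-01-02", "deposit"),     # tied latest for account 2
    (2, "2021-01-02", "fee"),         # tied latest for account 2
]

QUERY = """
SELECT t1.*
FROM transactions t1
LEFT OUTER JOIN transactions t2
  ON (t1.acct_id = t2.acct_id AND t1.trans_date < t2.trans_date)
WHERE t2.acct_id IS NULL            -- no later row exists for this account
ORDER BY t1.acct_id, t1.trans_type
"""


def last_transactions(rows):
    """Return each account's latest transaction(s) via an anti-join."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (acct_id INTEGER, trans_date TEXT, trans_type TEXT)")
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(last_transactions(ROWS))
# [(1, '2021-01-03', 'withdrawal'), (2, '2021-01-02', 'deposit'), (2, '2021-01-02', 'fee')]
```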
This works on SQL Server...
SELECT acct_id, trans_date, trans_type
FROM transactions a
WHERE trans_date = (
SELECT MAX( trans_date )
FROM transactions b
WHERE a.acct_id = b.acct_id
)
Try this
WITH
LastTransaction AS
(
SELECT acct_id, max(trans_date) as trans_date
FROM transactions
GROUP BY acct_id
),
AllTransactions AS
(
SELECT acct_id, trans_date, trans_type
FROM transactions
)
SELECT *
FROM AllTransactions
INNER JOIN LastTransaction
ON AllTransactions.acct_id = LastTransaction.acct_id
AND AllTransactions.trans_date = LastTransaction.trans_date
select t.acct_id, t.trans_type, tm.trans_date
from transactions t
inner join (
SELECT acct_id, max(trans_date) as trans_date
FROM transactions
GROUP BY acct_id
) tm on t.acct_id = tm.acct_id and t.trans_date = tm.trans_date