Redshift - Finding number of times a particular values is store in a column in table - sql

Given below is a view of my table that stores sale data along with the detail whether the customer is a new customer or not. I am trying to find if the same sale_id has multiple entries for the same customer and tags him as a new customer. Given below is a sample view of the table
cust_id,prod_id,sale_id,is_new_cust,store_type
1,prod_a,1001,t,store
2,prod_a,1002,,online
3,prod_a,1003,t,store
3,prod_a,1003,t,store
I need to find how many such customers exist that have the tag of is_new_cust for the same sale_id.
Given below is the SQL I tried:
select cust_id,count(is_new_cust) from sales
where store_type = 'store' and is_new_cust='t'
group by cust_id having count(is_new_cust)> 1;
Expected output:
cust_id,count
3,2
The above SQL returns 1 no results.
I am using Amazon Redshift DB for the above.
Could anyone help me find where am I going wrong with the query. Thanks..

I think you want:
select count(distinct cust_id) as num_customers,
count(distinct case when is_new_cust = 't' then cust_id end) as num_new_customers
from sales
where store_type = 'store';
However, this might be a simpler version that does what you want:
select count(*)
from sales
where store_type = 'store' and is_new_cust = 't';
This assumes the is_new_cust flag is set only once for a new customer.

Related

Delete duplicates using dense rank

I have a sales data table with cust_ids and their transaction dates.
I want to create a table that stores, for every customer, their cust_id, their last purchased date (on the basis of transaction dates) and the count of times they have purchased.
I wrote this code:
SELECT
cust_xref_id, txn_ts,
DENSE_RANK() OVER (PARTITION BY cust_xref_id ORDER BY CAST(txn_ts as timestamp) DESC) AS rank,
COUNT(txn_ts)
FROM
sales_data_table
But I understand that the above code would give an output like this (attached example picture)
How do I modify the code to get an output like :
I am a beginner in SQL queries and would really appreciate any help! :)
This would be an aggregation query which changes the table key from (customer_id, date) to (customer_id)
SELECT
cust_xref_id,
MAX(txn_ts) as last_purchase_date,
COUNT(txn_ts) as count_purchase_dates
FROM
sales_data_table
GROUP BY
cust_xref_id
You are looking for last purchase date and count of distinct transaction dates ( like if a person buys twice, it should be considered as one single time).
Although you mentioned you want count of dates but sample data shows you want count of distinct dates - customer 284214 transacted 9 times but distinct will give you 7.
So, here is the SQL you can use to get your result.
SELECT
cust_xref_id,
MAX(txn_ts) as last_purchase_date,
COUNT(distinct txn_ts) as count_purchase_dates -- Pls note distinct will count distinct dates
FROM sales_data_table
GROUP BY 1

SQL Select values of a column only if they are associated with all values of another subset, without using multiple queries

I am sorry if this is a very basic question, but I have not been able to find an answer anywhere :
Let's say I want to get the products that are sold at both stores a and b, with a single table that associates products and stores. In class, we have been taught this kind of solution :
select product_id
from product
where store = "a"
and product in (
select product_id
from product
where store = "b"
)
This however seems very inefficient to me, especially in a case where a very large table would be used. Is there any way to make a single request that satisfies both conditions, without nesting ?
It feels like there has to be, but being very new to sql I actually have no idea, and I have found this type of thing extremely hard to look up. No matter what I search, I find answers to similar problems that don't answer my question.
I would approach this by first filtering for the required criteria followed by a corresponding distinct count:
select product_id
from product
where store in ('a','b')
group by product_id
having Count(distinct store)=2;
if I understand correctly you can try to use condition aggregate function in HAVING
SELECT product_id
FROM product
GROUP BY product_id
HAVING COUNT(CASE WHEN store = 'a' THEN 1 END) > 0
AND COUNT(CASE WHEN store = 'b' THEN 1 END) > 0

Repeat a query over the parameter list

I would like to iterate the same query while using different parameter values from a predefined list.
Say, I have a table with two columns. The first columns contains customer name. The second column contains customer spending.
###CUSTOMER; SPENDING###
customer1; 1000
customer2; 111
customer3; 100
customer1; 323
...
I know the complete list of customers: customerlist = {customer1, customer2, customer3}.
I would like to do something like:
Select sum(spending)
from mytable
where customer = #customerlist
The query should compute the sum of spending for each customer defined in the customer list. I have found some examples of sql procedures with stored parameters but not the case with one parameter of multiple values.
Thank you
P.S. This is just a hypothetical example to illustrate my question (I know it would be much more effective to use here a simple group by).
You can use nested query like this
SELECT CustomerList.CustomerName Cust, isnull((SELECT SUM(Spending) CustSpending
FROM Customer
WHERE Customer.CustomerName = CustomerList.CustomerName),0)
FROM CustomerList
This would normally be done using GROUP BY:
Select customer, sum(spending)
from mytable
group by customer;
GROUP BY is a very fundamental part of SQL. You should review your knowledge of SQL so you understand how to use it.

Oracle SQL Update a column based on another table but based on a date

Learning SQL so please forgive me.
I have a table that holds many accounts(and sub accounts) and another table that holds many orders they are custom tables taken from different databases.
What I am trying to do is update order_amt on the account table with the order_val value from the orders table but if there are more than one order with that account I want the order amount from the earliest order date only.
Account Table
Acc _Num..........Comp_Name.......Order_Amt.......Int_Id
123456789-1.....ABC Ltd.......................................123456789
123456789-2.....ABC Ltd.......................................123456789
987654321-1.....Xyz Ltd.........................................987654321
987654321-2.....Xyz Ltd.........................................987654321
Orders Table
Order_Num.....Order_Dt.....Order_Val.....Acc_num
1......................01/01/13......£20.00...........123456789
2......................01/01/14......£10.00...........123456789
3......................01/01/10......£100.00..........987654321
4......................01/01/11......£200.00..........987654321
So the order_amt for accounts 123456789-1 & 2 = £20.00 and from 987654321-1 & 2 would be £100.00.
UPDATE accounts a
SET a.order_amt =
(
SELECT order_val
FROM orders o
WHERE a.int_id = o.acc_num
AND EXISTS
(
SELECT MIN(order_dt)
FROM orders oa
WHERE o.order_num = oa.order_num
);
I am getting a few errors including the error that more than one row returned? Could anyone please help me?
kind Regards
Eden
I think you need something like this:
UPDATE accounts a
SET a.order_amt =
(
SELECT order_val
FROM orders o
WHERE a.int_id = o.acc_num
GROUP BY order_val
HAVING order_dt = MIN(order_dt);
);
Create a FK on the column account_number inside of the orders table and have it reference the auto-increment account id of the accounts table.
Then as Roger suggest, with the slight modification of the column int_id AND a modification of the HAVING clause
UPDATE accounts a
SET a.order_amount =
(
SELECT order_value
FROM orders o
WHERE a.account_number = o.account_number
GROUP BY order_value
HAVING MIN(order_date)
)
This will not fix the error that is occurring with more than one row being returned though since there are multiple orders with the same account id attached to them. You will need to be more specific about how what you are needing. Do you want the average of all order amounts to equal the order value in the accounts table? Or maybe the MAX/MIN value?

Select records for MySQL only once based on a column value

I have a table that stores transaction information. Each transaction is has a unique (auto incremented) id column, a column with the customer's id number, a column called bill_paid which indicates if the transaction has been paid for by the customer with a yes or no, and a few other columns which hold other information not relevant to my question.
I want to select all customer ids from the transaction table for which the bill has not been paid, but if the customer has had multiple transactions where the bill has not been paid I DO NOT want to select them more than once. This way I can generate that customer one bill with all the transactions they owe for instead of a separate bill for each transaction. How would I build a query that did that for me?
Returns exactly one customer_id for each customer with bill_paid equal to 'no':
SELECT
t.customer_id
FROM
transactions t
WHERE
t.bill_paid = 'no'
GROUP BY
t.customer_id
Edit:
GROUP BY summarises your resultset.
Caveat: Every column selected must be either 'grouped by' or aggregated in some fashion. As shown by nikic you could use SUM to get the total amount owed, e.g.:
SELECT
t.customer_id
, SUM(t.amount) AS TOTAL_OWED
FROM
transactions AS t
WHERE
t.bill_paid = 'no'
GROUP BY
t.customer_id
t is simply an alias.
So instead of typing transactions everywhere you can now simply type t. The alias is not necessary here since you query only one table, but I find them invaluable for larger queries. You can optionally type AS to make it more clear that you're using an alias.
You might try the Group By operator, eg group by the customer.
SELECT customer, SUM(toPay) FROM .. GROUP BY customer