T-SQL query to find the required output

I am new to SQL queries. I have some data and I am trying to produce the result shown below.
In my sample data, the customer ID repeats multiple times because of multiple locations. What I want is a query that produces the output shown in the output image, using these rules:
If a customer exists only once, I take that row.
If a customer exists more than once, I check the country; if Country = 'US', I take that row and discard the others.
If a customer exists more than once and no country is 'US', I pick the first row.
PLEASE NOTE: I have 35 columns, and I don't want to change the row order, because I have to select the first row when a customer exists more than once and the country is not 'US'.
What I have tried: I attempted this with a rank function but was unsuccessful. I am not sure my approach is right. Could anyone please share a T-SQL query for this problem?
Regards,
Rahul
Sample data:
Output required :

I have created a (short) dbfiddle
Short explanation (to just repeat the code here on SO):
Step1:
-- select everything, and rank 'US' as the first row per customer
SELECT
cust_id,
country,
sales,
CASE WHEN country='US' THEN 0 ELSE 1 END X,
ROW_NUMBER() OVER (PARTITION BY cust_id
ORDER BY (CASE WHEN country='US' THEN 0 ELSE 1 END)) R
FROM table1
ORDER BY cust_id, CASE WHEN country='US' THEN 0 ELSE 1 END;
Step2:
-- filter only rows which are first row...
SELECT *
FROM (
SELECT
cust_id,
country,
sales,
CASE WHEN country='US' THEN 0 ELSE 1 END X,
ROW_NUMBER() OVER (PARTITION BY cust_id
ORDER BY (CASE WHEN country='US' THEN 0 ELSE 1 END)) R
FROM table1
-- ORDER BY cust_id, CASE WHEN country='US' THEN 0 ELSE 1 END
) x
WHERE x.R=1
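The two steps above can be tried end to end with a minimal sketch in Python's standard-library sqlite3 module (this assumes SQLite 3.25 or later, which is needed for window functions). The sample rows and the `seq` column are hypothetical: `seq` stands in for whatever column defines the "first row", because ROW_NUMBER() ordered only by the US/non-US key leaves the order of the non-US rows undefined.

```python
import sqlite3

# Hypothetical sample rows: (seq, cust_id, country, sales).
# seq is an assumed column that records the original row order.
ROWS = [
    (1, 100, "UK", 10),
    (2, 100, "US", 20),
    (3, 200, "FR", 30),
    (4, 200, "DE", 40),
    (5, 300, "IN", 50),
]

QUERY = """
SELECT cust_id, country, sales
FROM (
    SELECT cust_id, country, sales,
           ROW_NUMBER() OVER (
               PARTITION BY cust_id
               -- 'US' rows sort first; seq breaks ties in original order
               ORDER BY CASE WHEN country = 'US' THEN 0 ELSE 1 END, seq
           ) AS r
    FROM table1
) AS x
WHERE x.r = 1
ORDER BY cust_id
"""


def pick_rows():
    """Load the sample rows and return one chosen row per customer."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 "
        "(seq INTEGER, cust_id INTEGER, country TEXT, sales INTEGER)"
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result
```

With these rows, customer 100 keeps its US row, customer 200 keeps its first row (FR), and customer 300 keeps its only row.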

I can't vouch for performance but it should work on SQL Server 2005. Assuming your table is named CustomerData try this:
select cust_id, country, Name, Sales, [Group]
from CustomerData
where country = 'US'
union
select c.* from CustomerData c
join (
select cust_id, min(country) country
from CustomerData
where cust_id not in (
select cust_id
from CustomerData
where country = 'US'
)
group by cust_id
) a on a.cust_id = c.cust_id and a.country = c.country
It works by finding all those with a record with US as the country and then unioning that with the first country from every record that doesn't have the US as a country. If min() isn't getting the country you want then you'll need to find an alternative aggregation function that will select the country you want.
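To see what the min() caveat means in practice, here is a small sketch using Python's sqlite3 module. The rows and the reduced column list are made up for illustration. Note that for customer 200 the query returns DE (alphabetically first), not FR (the first row inserted).

```python
import sqlite3

# Hypothetical sample rows: (cust_id, country, sales).
ROWS = [
    (100, "UK", 10),
    (100, "US", 20),
    (200, "FR", 30),
    (200, "DE", 40),
    (300, "IN", 50),
]

QUERY = """
SELECT cust_id, country, sales
FROM CustomerData
WHERE country = 'US'
UNION
SELECT c.cust_id, c.country, c.sales
FROM CustomerData c
JOIN (
    SELECT cust_id, MIN(country) AS country
    FROM CustomerData
    WHERE cust_id NOT IN (
        SELECT cust_id FROM CustomerData WHERE country = 'US'
    )
    GROUP BY cust_id
) a ON a.cust_id = c.cust_id AND a.country = c.country
ORDER BY cust_id
"""


def pick_rows_union():
    """Return the US row, or the alphabetically first country, per customer."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE CustomerData (cust_id INTEGER, country TEXT, sales INTEGER)"
    )
    conn.executemany("INSERT INTO CustomerData VALUES (?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result
```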

Related

How to write SQL query without join?

Recently during an interview I was asked a question: if I have a table like as below:
The requirement is: how many orders and how many shipments per day (based on date column) - output needs to be like this:
I wrote the following code, but the interviewer asked me to write a SQL query without JOIN or UNION that achieves the same output.
SELECT
COALESCE(a.order_date, b.ship_date), orders, shipments
FROM
(SELECT
order_date, COUNT(1) AS orders
FROM
table
GROUP BY 1) a
FULL JOIN
(SELECT
ship_date, COUNT(1) AS shipments
FROM table
GROUP BY 1) b ON a.order_date = b.ship_date
Is this possible? Could you please advise?
You can use UNION and GROUP BY with conditional aggregation as follows:
SELECT DATE_,
COUNT(CASE WHEN FLAG = 'ORDER' THEN 1 END) AS ORDERS,
COUNT(CASE WHEN FLAG = 'SHIP' THEN 1 END) AS SHIPMENTS
FROM (SELECT ORDER_DATE AS DATE_, 'ORDER' AS FLAG FROM YOUR_TABLE
UNION ALL
SELECT SHIP_DATE AS DATE_, 'SHIP' AS FLAG FROM YOUR_TABLE) T
GROUP BY DATE_
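A quick way to check the conditional-aggregation idea is a sketch like the following, using Python's sqlite3 module with made-up rows. The table name `your_table` and the dates are assumptions. The outer query groups by the unpivoted date so that each day yields one row.

```python
import sqlite3

# Hypothetical sample rows: (order_date, ship_date).
ROWS = [
    ("2021-01-01", "2021-01-02"),
    ("2021-01-01", "2021-01-03"),
    ("2021-01-02", "2021-01-03"),
]

QUERY = """
SELECT date_,
       COUNT(CASE WHEN flag = 'ORDER' THEN 1 END) AS orders,
       COUNT(CASE WHEN flag = 'SHIP' THEN 1 END) AS shipments
FROM (
    SELECT order_date AS date_, 'ORDER' AS flag FROM your_table
    UNION ALL
    SELECT ship_date AS date_, 'SHIP' AS flag FROM your_table
) t
GROUP BY date_
ORDER BY date_
"""


def orders_and_shipments_per_day():
    """Return (date, orders, shipments) tuples, one per day."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE your_table (order_date TEXT, ship_date TEXT)")
    conn.executemany("INSERT INTO your_table VALUES (?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result
```

COUNT ignores the NULL that a CASE without ELSE produces, which is why each column counts only its own flag.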
In BigQuery, I would express this as:
select date, countif(n = 0) as orders, countif(n = 1) as numships
from t cross join
unnest(array[order_date, ship_date]) date with offset n
group by 1
order by date;
The advantage of this approach (over union all) is two-fold. First, it only scans the table once. More importantly, the unnest() is all on the same node where the data resides -- so data does not need to be moved for the unpivot.

Adding values with condition in google bigquery

I need to add up some values with a condition in Google BigQuery.
NOTICE: I edited the original question since it was not accurate enough.
Thanks to the two participants who have tried to help me.
I tried to apply the solutions kindly suggested by you but I got the same result from the pct column as a result.
Something like this:
results
Here is the more detailed definition:
TABLE
Columns:
Shop: Shop location
brand: Brands of cars sold at shoplocation
sales: sales of each brand sold at each shop_location
rank: Rank of each brand per shop location (the biggest the greater)
total_sales_shop: SUM of all brand sales per shop location
pct: percentage of sales by brand in relationship with shop location
pct_acc:
What I need to calculate is pct_acc, which is the cumulative sum of pct within each shop, ordered by rank (it has no relation to brand).
PCT_ACC
My goal is to reach something like PCT_ACC, and then save the results in another table like this: endtable
You can use following query to get the required data:
select values, rank,
sum(case when rank<2 then values else 0 end) condition1
from table
group by values, rank
Add or remove columns in the SELECT and GROUP BY as required.
To get the cumulative sum you can use following query:
select shop, brand, sales, rank, total_sales_shop, pct ,
sum(pct) over (partition by shop order by rank) as pct_act
from data
To get the final table, you can combine CASE expressions with GROUP BY, e.g.:
select shop,
max(case when rank=1 then pct_act end) as rank_1,
max(case when rank=2 then pct_act end) as rank_2,
max(case when rank=3 then pct_act end) as rank_3,
max(case when rank=4 then pct_act end) as rank_4,
max(case when rank=5 then pct_act end) as rank_5
from cumulative_sum
group by shop
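The running sum and the pivot can be chained in one statement. Below is a rough sketch in Python's sqlite3 module (SQLite 3.25+ assumed for window functions; BigQuery syntax for the same query may differ slightly). The rows are invented, only three ranks are pivoted to keep it short, and the alias is written `pct_acc` to match the question.

```python
import sqlite3

# Hypothetical sample rows: (shop, brand, rank, pct).
ROWS = [
    ("A", "x", 1, 50),
    ("A", "y", 2, 30),
    ("A", "z", 3, 20),
    ("B", "x", 1, 60),
    ("B", "y", 2, 40),
]

QUERY = """
WITH cumulative_sum AS (
    SELECT shop, "rank",
           -- running total of pct within each shop, in rank order
           SUM(pct) OVER (PARTITION BY shop ORDER BY "rank") AS pct_acc
    FROM data
)
SELECT shop,
       MAX(CASE WHEN "rank" = 1 THEN pct_acc END) AS rank_1,
       MAX(CASE WHEN "rank" = 2 THEN pct_acc END) AS rank_2,
       MAX(CASE WHEN "rank" = 3 THEN pct_acc END) AS rank_3
FROM cumulative_sum
GROUP BY shop
ORDER BY shop
"""


def cumulative_pct_by_rank():
    """Return one row per shop with the cumulative pct at each rank."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE data (shop TEXT, brand TEXT, "rank" INTEGER, pct INTEGER)'
    )
    conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result
```

A shop with fewer ranks than pivot columns gets NULL (None in Python) in the missing positions.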
If you want only the final sum for the rows with that condition you can try:
SELECT
SUM(IF(Rank < 2, Values, 0)) AS condition1
FROM table
If you want to get the rank and the sum only for the rows with that condition you can try doing
SELECT
Rank,
SUM(IF(Rank < 2, Values, 0)) AS condition1
FROM table
WHERE
RANK < 2
GROUP BY
Rank
Finally, if you want to get the rank and the sum considering all the rows you can try doing:
SELECT
Rank,
SUM(IF(Rank < 2, Values, 0)) AS condition1
FROM table
GROUP BY
Rank
I hope it helps

SQL Count of two values in one column

I have a table with customer name and Status columns. The status column has two values
Test
Live
The customers appear more than once and can be classed as either test, live or BOTH like below:
**Customer | Status**
Logistics | Test
Logistics | Live
Ample | Live
What I want is a query to give me a count of the number of distinct customers who fall under both statuses. So using the above table, I would count customer logistics (since it has both test and live) but not ample (since it is just live).
Any ideas?
You can use group by clause :
select Customer, count(*)
from table t
group by Customer
having min(status) <> max(status);
If you want it with specific status then include where clause :
select Customer, count(*)
from table t
where status in ('Test', 'Live')
group by Customer
having count(distinct status) = 2;
EDIT: If you want the other columns too, then I would prefer:
select t.*
from table t
where exists (select 1 from table t1 where t1.Customer = t.Customer and t1.status <> t.status);
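Since the question asks for a count of such customers rather than a list, one option is to wrap the grouped query in an outer COUNT(*). Here is a minimal sketch in Python's sqlite3 module, using the three rows from the question; the table name `customer_status` is an assumption.

```python
import sqlite3

# Rows from the question: (customer, status).
ROWS = [
    ("Logistics", "Test"),
    ("Logistics", "Live"),
    ("Ample", "Live"),
]

QUERY = """
SELECT COUNT(*)
FROM (
    SELECT customer
    FROM customer_status
    WHERE status IN ('Test', 'Live')
    GROUP BY customer
    HAVING COUNT(DISTINCT status) = 2
) AS both_statuses
"""


def count_customers_with_both(rows=ROWS):
    """Count distinct customers that have both a Test and a Live row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer_status (customer TEXT, status TEXT)")
    conn.executemany("INSERT INTO customer_status VALUES (?, ?)", rows)
    (count,) = conn.execute(QUERY).fetchone()
    conn.close()
    return count
```

Using COUNT(DISTINCT status) rather than COUNT(*) in the HAVING means a customer with two Test rows and no Live row is not counted.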
try something like this:
select customer
from
(select customer, max(IsTest) as IsTest , max(IsLive) as IsLive
from
(select customer,
case when status='Test' then 1 else 0 end as IsTest,
case when status='Live' then 1 else 0 end as IsLive
from table) a
group by customer) b
where IsTest = 1 and IsLive = 1
You can use a GROUP BY clause to get your desired output.
select Customer, count(*)
from table t
group by Customer
having min(status) <> max(status);

SQL querying a customer ID who ordered both product A and B

Having a bit of trouble when trying to figure out how to return a query of a customer who ordered both A and B
What I'm looking for is all customers who order both product A and product B
SELECT CustomerID
FROM table
WHERE product in ('a','b')
GROUP BY customerid
HAVING COUNT(distinct product) = 2
I don't normally post code-only answers, but there isn't a lot that words can add to this; the query largely explains itself.
You can also
HAVING max(product) <> min(product)
It may be worth pointing out that in this query the WHERE is performed first, filtering to just products A and B. Then the GROUP BY is performed, grouping by customer and counting the distinct number of products (or getting the min and max). Then the HAVING is performed, filtering to just those with 2 distinct products (or only those where MIN, i.e. A, is different from MAX, i.e. B).
If you've never encountered HAVING, it is logically equivalent to:
SELECT CustomerID
FROM(
SELECT CustomerID, COUNT(distinct product) as count_distinct_product
FROM table
WHERE product in ('a','b')
GROUP BY customerid
)z
WHERE
z.count_distinct_product = 2
In a HAVING clause you can only refer to columns that are mentioned in the group by. You can also refer to aggregate operations (such as count/min/max) on other columns not mentioned in the group by
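The claimed equivalence between HAVING and filtering a derived table is easy to check on toy data. This is a small sketch in Python's sqlite3 module; the `orders` table name and the rows are hypothetical.

```python
import sqlite3

# Hypothetical sample rows: (CustomerID, product).
ROWS = [
    (1, "a"), (1, "b"),            # ordered both
    (2, "a"), (2, "a"),            # ordered a twice, never b
    (3, "b"),                      # ordered only b
    (4, "a"), (4, "b"), (4, "c"),  # ordered both, plus an unrelated product
]

HAVING_QUERY = """
SELECT CustomerID
FROM orders
WHERE product IN ('a', 'b')
GROUP BY CustomerID
HAVING COUNT(DISTINCT product) = 2
ORDER BY CustomerID
"""

DERIVED_QUERY = """
SELECT CustomerID
FROM (
    SELECT CustomerID, COUNT(DISTINCT product) AS count_distinct_product
    FROM orders
    WHERE product IN ('a', 'b')
    GROUP BY CustomerID
) z
WHERE z.count_distinct_product = 2
ORDER BY CustomerID
"""


def run(query):
    """Load the sample rows and return the matching customer IDs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (CustomerID INTEGER, product TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", ROWS)
    result = [row[0] for row in conn.execute(query)]
    conn.close()
    return result
```

Both forms return customers 1 and 4 here. Customer 2 shows why DISTINCT matters: two orders of the same product still count as one.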
I have never worked with SQLite, but since its specs say it is a relational database, it should allow the following query.
select CustomerID
from table t
where exists (
select *
from table
where CustomerID = t.CustomerID
and Product = 'A'
)
and exists (
select *
from table
where CustomerID = t.CustomerID
and Product = 'B'
)
I'd use a correlated sub-query with a HAVING clause to scoop in both products in a single WHERE clause.
SELECT
t.Customer
FROM
#t AS t
WHERE
EXISTS
(
SELECT
1
FROM
#t AS s
WHERE
t.Customer = s.Customer
AND s.Product IN ('A', 'B')
HAVING
COUNT(DISTINCT s.Product) = 2
)
GROUP BY
t.Customer;
You could try:
Select customerid from table group by customerid having product like 'A' and product like 'B'
or you can try having count(distinct product) = 2, which seems to be more accurate.
The idea is that within the group for one customerid, say 1, however many A's and B's there are, count(distinct product) will be 2 if both products are present and 1 otherwise, so the answer is as above.
Another way I just figured out was
SELECT CustomerID
FROM table
WHERE product in ('a','b')
GROUP BY customerid
HAVING sum(case when product ='a' then 1 else 0 end) > 0
and sum(case when product ='b' then 1 else 0 end) > 0

Show duplicate rows(all columns of that row) where all columns are duplicate except one column

In below table, I need to select duplicate records where all columns are duplicate except Customer Type and Price for a particular week.
For example:
Week  Customer  Product  Customer Type  Price
1     Alex      Cycle    Consumer       100
1     Alex      Cycle    Reseller       101
2     John      Motor    Consumer       200
3     John      Motor    Consumer       200
3     John      Motor    Reseller       201
I am using the query below, but it doesn't show me both customer types; it just shows me the count(*) for a combination.
select Week, Customer, product, count(distinct "Customer Type")
from table
group by Week, Customer, product
having count(distinct "Customer Type") > 1
I would like to see the result below, which shows the duplicate rows themselves and not just the count(*) of duplicate rows. I am trying to see customers assigned to multiple customer types in a particular week for a product, and at the same time show all columns. It doesn't matter if the price is different.
Week  Customer  Product  Customer Type  Price
1     Alex      Cycle    Consumer       100
1     Alex      Cycle    Reseller       101
3     John      Motor    Consumer       200
3     John      Motor    Reseller       201
Thanks
Shaki
WITH CustomerDistribution_CTE (WeekC, CustomerC, ProductC)
AS
(
select Week, Customer, product
from Your_Table_Name
group by Week, Customer, product
having count(distinct CustomerType) > 1
)
SELECT Y.*
FROM CustomerDistribution_CTE C
inner join Your_Table_Name Y on C.WeekC = Y.Week
and C.CustomerC = Y.Customer and C.ProductC = Y.product
Note: Please replace "Your_Table_Name" with the exact table name and try it.
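As a sanity check, the CTE-and-join-back pattern can be run against the five rows from the question. This is a sketch in Python's sqlite3 module; the table name `sales_rows` and the column name `CustomerType` (without a space) are assumptions.

```python
import sqlite3

# Rows from the question: (Week, Customer, Product, CustomerType, Price).
ROWS = [
    (1, "Alex", "Cycle", "Consumer", 100),
    (1, "Alex", "Cycle", "Reseller", 101),
    (2, "John", "Motor", "Consumer", 200),
    (3, "John", "Motor", "Consumer", 200),
    (3, "John", "Motor", "Reseller", 201),
]

QUERY = """
WITH multi_type (WeekC, CustomerC, ProductC) AS (
    -- combinations that carry more than one customer type
    SELECT Week, Customer, Product
    FROM sales_rows
    GROUP BY Week, Customer, Product
    HAVING COUNT(DISTINCT CustomerType) > 1
)
SELECT y.Week, y.Customer, y.Product, y.CustomerType, y.Price
FROM multi_type c
JOIN sales_rows y
  ON c.WeekC = y.Week AND c.CustomerC = y.Customer AND c.ProductC = y.Product
ORDER BY y.Week, y.CustomerType
"""


def rows_with_multiple_types():
    """Return every full row whose combination has several customer types."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales_rows (Week INTEGER, Customer TEXT, Product TEXT, "
        "CustomerType TEXT, Price INTEGER)"
    )
    conn.executemany("INSERT INTO sales_rows VALUES (?, ?, ?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result
```

On this data the week 2 row drops out and the four rows for weeks 1 and 3 remain, matching the expected result in the question.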
One way to achieve this, using generic SQL, is to use a "derived table" like this:
select x.*
from tablex x
inner join (
select Week, Customer, Product
from tablex
group by Week, Customer, Product
having count(*) > 1
) d on x.Week = d.Week and x.Customer = d.Customer and x.Product = d.Product
You can do that by using DISTINCT, like this:
select DISTINCT Customer,Product,Customer_Type,Price from Your_Table_Name
This will look for DISTINCT combinations.
Note: This query is for SQL Server.
From the expected result that you have pasted, it looks like you are not concerned about the week.
If you have an ID (incremental PK), it would be much simpler, like below:
select * from table where ID not in
(select max(ID) from table group by Customer, Product, CustomerType having count(*) > 1 )
This is tested on MySQL. Do you have an ID column?
In case you don't have an ID column, try the below:
select max(week) week, Customer, Product, CustomerType, max(price) from device group by Customer, Product, CustomerType;
I have not verified this one.
This will return your expected result set:
select *
from table
-- Teradata syntax to filter the result of an OLAP-function
-- (similar to HAVING after GROUP BY)
qualify
count(*)
over (partition by Week, Customer, product) > 1
For other DBMSes you will need to nest your query:
select *
from
(
select ...,
count(*)
over (partition by Week, Customer, product) as cnt
from table
) as dt
where cnt > 1
Edit:
After re-reading your description, the Select above might not be exactly what you want, because it will also return rows with a single type. In that case, switch to:
select *
from table
-- Teradata syntax to filter the result of an OLAP-function
-- (similar to HAVING after GROUP BY)
qualify -- at least two different types:
min(Customer_Type) over (partition by Week, Customer, product)
<> max(Customer_Type) over (partition by Week, Customer, product)
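For engines without QUALIFY, the same min/max comparison can go in a nested query. The sketch below uses Python's sqlite3 module (SQLite 3.25+ assumed for window functions) and compares it with the count(*) version. The table name `sales_rows` is an assumption, and the second week 2 row is a made-up addition that shows where the two versions differ.

```python
import sqlite3

# Rows from the question plus one hypothetical extra row for week 2
# (same customer type, different price).
ROWS = [
    (1, "Alex", "Cycle", "Consumer", 100),
    (1, "Alex", "Cycle", "Reseller", 101),
    (2, "John", "Motor", "Consumer", 200),
    (2, "John", "Motor", "Consumer", 205),
    (3, "John", "Motor", "Consumer", 200),
    (3, "John", "Motor", "Reseller", 201),
]

COUNT_QUERY = """
SELECT Week, Customer, Product, Customer_Type, Price
FROM (
    SELECT Week, Customer, Product, Customer_Type, Price,
           COUNT(*) OVER (PARTITION BY Week, Customer, Product) AS cnt
    FROM sales_rows
) AS dt
WHERE cnt > 1
ORDER BY Week, Customer_Type, Price
"""

MINMAX_QUERY = """
SELECT Week, Customer, Product, Customer_Type, Price
FROM (
    SELECT Week, Customer, Product, Customer_Type, Price,
           MIN(Customer_Type) OVER (PARTITION BY Week, Customer, Product) AS min_type,
           MAX(Customer_Type) OVER (PARTITION BY Week, Customer, Product) AS max_type
    FROM sales_rows
) AS dt
WHERE min_type <> max_type   -- at least two different types
ORDER BY Week, Customer_Type, Price
"""


def run(query):
    """Load the sample rows and return the rows selected by the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales_rows (Week INTEGER, Customer TEXT, Product TEXT, "
        "Customer_Type TEXT, Price INTEGER)"
    )
    conn.executemany("INSERT INTO sales_rows VALUES (?, ?, ?, ?, ?)", ROWS)
    result = conn.execute(query).fetchall()
    conn.close()
    return result
```

The count(*) version also returns the two week 2 rows, which share a single type; the min/max version returns only weeks 1 and 3.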