SQL Select unique combinations of rows for other column value - sql

I’m trying to do an analysis of the different combinations of taxes per invoice to identify how many scenarios exist.
In the tax table, column 1 is invoiceNo, column 2 is taxType. These form the composite key. There can be 1 or more taxType per invoiceNo. Example of data:
https://i.imgur.com/bcQc7vY_d.jpg?maxwidth=640&shape=thumb&fidelity=medium (Sorry, I'm new so I can't add a picture.)
I want to report each unique combination of taxType that occurs on any invoiceNo. I.e., invoice 1 with A is unique combination 1, invoice 2 with AB is unique combination 2, invoice 3 with A is disregarded because A was already returned for invoice 1, and invoice 4 with BC is unique combination 3.
Not sure if this makes sense! I'm finding it hard to articulate what I'm after.
Expected output would be:
A
AB
BC

The original version of this question was tagged MySQL, so this answers the question.
If I understand correctly, you can use group_concat():
select distinct group_concat(taxtype order by taxtype)
from t
group by invoiceno;
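To try the idea on the sample data described in the question, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (that substitution is an assumption, as is the exact table definition). SQLite only accepts ORDER BY inside group_concat() from version 3.44, so the sketch uses the window form of group_concat() to get a sorted concatenation; it needs SQLite 3.25 or later.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (invoiceno INTEGER, taxtype TEXT, "
    "PRIMARY KEY (invoiceno, taxtype))"
)
# Rows as described in the question: 1 -> A, 2 -> AB, 3 -> A, 4 -> BC.
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "A"), (2, "A"), (2, "B"), (3, "A"), (4, "B"), (4, "C")],
)

# Concatenate each invoice's tax types in sorted order,
# then keep only the distinct concatenated strings.
query = """
SELECT DISTINCT group_concat(taxtype, '') OVER (
           PARTITION BY invoiceno
           ORDER BY taxtype
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS combo
FROM t
ORDER BY combo
"""
combos = [row[0] for row in conn.execute(query)]
print(combos)
```

On this data the sketch yields the three combinations A, AB and BC; invoice 3 collapses into the same combination as invoice 1.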

This works with the table you have given, and it would keep working with those combinations of tax types even if they repeat. However, if there are more tax codes, an AC combination, or if some of the given combinations are omitted, it might behave a little differently. You could develop this to suit the conditions, or you could give some more info: do invoices have three codes (ABC)? Do invoices have just B or just C codes? I notice that the BC invoice etc
WITH CTE (RN,InvoiceNo,TT1,TT2)
AS
(
SELECT ROW_NUMBER() OVER (ORDER BY a.InvoiceNo),a.InvoiceNo,a.TaxType,b.TaxType
FROM UniqueCombo a INNER JOIN UniqueCombo b ON a.InvoiceNo=b.InvoiceNo
)
,
CTE2 (RN,InvoiceNo,TT1,TT2)
AS
(
SELECT * FROM CTE WHERE RN IN
(
SELECT MAX(RN) FROM CTE WHERE TT1=TT2 GROUP BY InvoiceNo HAVING COUNT(InvoiceNo)=1
)
)
SELECT TT1 FROM CTE2 WHERE RN IN
(
SELECT MAX(RN) FROM CTE WHERE TT1=TT2 GROUP BY TT1,TT2 HAVING COUNT(InvoiceNo)>1
)
UNION
SELECT TT1+''+TT2 FROM CTE WHERE RN IN
(
SELECT MAX(RN)-1 FROM CTE WHERE TT1<>TT2 GROUP BY InvoiceNo
)

You can try STRING_AGG. Something like:
SELECT DISTINCT TaxTypeString
FROM
(
SELECT InvoiceNo, STRING_AGG(TaxType, '') AS TaxTypeString
FROM t
GROUP BY InvoiceNo
) x
ORDER BY TaxTypeString
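The nested query, aliased x, should give you one row per invoice number with its tax types concatenated into one string. The outer query then selects the distinct strings from there. Note that the concatenation order is not guaranteed unless you specify one (in SQL Server, STRING_AGG(TaxType, '') WITHIN GROUP (ORDER BY TaxType)), so add an ordering if AB and BA should count as the same combination.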

Related

How to find Max value in a column in SQL Server 2012

I want to find the max value in a column
ID CName Tot_Val PName
--------------------------------
1 1 100 P1
2 1 10 P2
3 2 50 P2
4 2 80 P1
Above is my table structure. I want to find the maximum Tot_Val for each CName. Of the four rows, IDs 1 and 2 have the same CName but different Tot_Val and PName values, so I expect to get whichever of IDs 1 and 2 has the maximum value (and likewise for IDs 3 and 4).
Expected result:
ID CName Tot_Val PName
--------------------------------
1 1 100 P1
4 2 80 P1
I need the result to look like the one above.
select Max(Tot_Val), CName
from table1
where PName in ('P1', 'P2')
group by CName
This is the query I have tried, but my problem is that I am not able to bring PName into the result. If I add PName to the select list, the number of rows multiplies: e.g. the result is 100 rows, but when I add PName to the select list and the GROUP BY list it shows 600 rows. That is the problem.
Can someone please help me resolve this?
One possible option is to use a subquery. Give each row a number within each CName group ordered by Tot_Val. Then select the rows with a row number equal to one.
select x.*
from ( select mt.ID,
mt.CName,
mt.Tot_Val,
mt.PName,
row_number() over(partition by mt.CName order by mt.Tot_Val desc) as No
from MyTable mt ) x
where x.No = 1;
An alternative would be to use a common table expression (CTE) instead of a subquery to isolate the first result set.
with x as
(
select mt.ID,
mt.CName,
mt.Tot_Val,
mt.PName,
row_number() over(partition by mt.CName order by mt.Tot_Val desc) as No
from MyTable mt
)
select x.*
from x
where x.No = 1;
See both solutions in action in this fiddle.
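As a quick check of the row-number approach on the four rows from the question, here is a minimal sketch using Python's built-in sqlite3 module in place of SQL Server (an assumption; window functions need SQLite 3.25 or later). The row-number column is named rn here rather than No.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (ID INTEGER, CName INTEGER, Tot_Val INTEGER, PName TEXT)"
)
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [(1, 1, 100, "P1"), (2, 1, 10, "P2"), (3, 2, 50, "P2"), (4, 2, 80, "P1")],
)

# Number the rows inside each CName group from highest to lowest Tot_Val,
# then keep only the first row of each group.
query = """
WITH x AS (
    SELECT mt.ID, mt.CName, mt.Tot_Val, mt.PName,
           ROW_NUMBER() OVER (PARTITION BY mt.CName
                              ORDER BY mt.Tot_Val DESC) AS rn
    FROM MyTable mt
)
SELECT ID, CName, Tot_Val, PName
FROM x
WHERE rn = 1
ORDER BY ID
"""
top_rows = conn.execute(query).fetchall()
print(top_rows)
```

This returns the rows with ID 1 and ID 4, matching the expected result in the question.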
You can search top-n-per-group for this kind of a query.
There are two common ways to do it. The most efficient method depends on your indexes and data distribution and whether you already have another table with the list of all CName values.
Using ROW_NUMBER
WITH
CTE
AS
(
SELECT
ID, CName, Tot_Val, PName,
ROW_NUMBER() OVER (PARTITION BY CName ORDER BY Tot_Val DESC) AS rn
FROM table1
)
SELECT
ID, CName, Tot_Val, PName
FROM CTE
WHERE rn=1
;
Using CROSS APPLY
WITH
CTE
AS
(
SELECT CName
FROM table1
GROUP BY CName
)
SELECT
A.ID
,A.CName
,A.Tot_Val
,A.PName
FROM
CTE
CROSS APPLY
(
SELECT TOP(1)
table1.ID
,table1.CName
,table1.Tot_Val
,table1.PName
FROM table1
WHERE
table1.CName = CTE.CName
ORDER BY
table1.Tot_Val DESC
) AS A
;
See a very detailed answer on dba.se, Retrieving n rows per group, or here, Get top 1 row of each group.
CROSS APPLY might be as fast as a correlated subquery, but a correlated subquery often has very good performance (and better than ROW_NUMBER()):
select t.*
from t
where t.tot_val = (select max(t2.tot_val)
from t t2
where t2.cname = t.cname
);
Note: The performance depends on having an index on (cname, tot_val).
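One behaviour worth knowing about the correlated-subquery form is that it returns every row tied for the maximum, whereas ROW_NUMBER() keeps exactly one row per group. The sketch below illustrates this with Python's built-in sqlite3 module standing in for SQL Server (an assumption); the row with ID 5 is hypothetical and was added only to create a tie, and the index name is made up.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, cname INTEGER, tot_val INTEGER, pname TEXT)"
)
# Index suggested by the note above; the index name is hypothetical.
conn.execute("CREATE INDEX ix_t_cname_tot_val ON t (cname, tot_val)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, 1, 100, "P1"),
        (2, 1, 10, "P2"),
        (3, 2, 50, "P2"),
        (4, 2, 80, "P1"),
        (5, 2, 80, "P3"),  # hypothetical extra row that ties with ID 4
    ],
)

# Keep every row whose tot_val equals the maximum for its cname.
query = """
SELECT t.id, t.cname, t.tot_val, t.pname
FROM t
WHERE t.tot_val = (SELECT MAX(t2.tot_val)
                   FROM t t2
                   WHERE t2.cname = t.cname)
ORDER BY t.id
"""
max_rows = conn.execute(query).fetchall()
print(max_rows)
```

Both ID 4 and ID 5 come back for CName 2. If only one row per group is wanted, a tie-breaker is needed, for example the ROW_NUMBER() form with a second ORDER BY column.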

Joining data from two sources using bigquery

Can anyone please check whether the code below is correct? In cte_1, I’m taking all dimensions and metrics from t1 except value1, value2, value3. In cte_2, I’m assigning a unique row number to each row of t2. In cte_3, I’m taking all distinct dimensions and metrics using a join on two keys, Date and Ad. In cte_4, I’m keeping only the rows with row number 1. I’m getting correct results for sum(value1), sum(value2), sum(value3), but sum(value4) is incorrect.
WITH cte_1 AS
(SELECT *except(value1, value2, value3) FROM t1 where Date >"2020-02-16" and Publisher ="fb")
-- Find unique row number from t2--
,cte_2 as(
SELECT ROW_NUMBER() OVER(ORDER BY Date) distinct_row_number, * FROM t2)
,cte_3 as
(SELECT cte_2.*,cte_1.*except(Date) FROM cte_2 join cte_1
on cte_2.Date = cte_1.Date
and cte_2.Ad= cte_1.Ad)
,cte_4 AS (
(SELECT *
FROM
(
SELECT *,
row_number() OVER (PARTITION BY distinct_row_number ORDER BY Date) as rn
FROM cte_3 ) T
where rn = 1 ))
select sum(value1),sum(value2),sum(value3),sum(value4) from cte_4
Please see the sample table below:
Your data does not seem to match the query you shared: it lacks the field named Ad, and other fields have different names, such as Date and ReportDate. Still, I was able to identify some issues and propose improvements.
First, your temp table cte_1 only applies a filter in the WHERE clause. You could instead apply that filter in a subquery inside the FROM clause of your last step, such as:
SELECT * FROM (SELECT field1,field2,field3 FROM t1 WHERE Date > DATE(2020,02,16) )
Second, in cte_2, you need to select all the columns you will need from table t2. Otherwise, the result will contain only the row number and it won't be possible to join it with other tables, since it provides no other information. So, if you need the row number, select it together with the other columns, which must include your primary key if you intend to perform any join later. The syntax would be as follows:
SELECT field1, field2, ROW_NUMBER() OVER(ORDER BY Date) FROM t2
Third, in cte_3, I assume you want to perform an INNER JOIN. You therefore need to make sure that the join keys are present in both tables, in your case Date and Ad, which I could not find in your data. Furthermore, you cannot have duplicated column names when joining two tables and selecting all the columns. For example, in your case Brand, value1, value2 and value3 exist in both tables, which will cause an error. You need to specify which table these fields should come from, either by selecting them one by one or by using an EXCEPT clause.
Finally, cte_4 and your final select could be combined into one step. Basically, you are selecting only one row of data per partition, ordered by Date, and then summing the fields value1, value2 and value3 individually. Moreover, you are not selecting any identifier for the sums, which means that your result will contain only the final totals. In general, when performing an aggregation such as SUM(), the grouping key(s) are selected as well. This could have been done in one step as follows, using only the data from t2:
SELECT ReportDate, Brand, sum(value1) as sum_1, sum(value2) as sum_2, sum(value3) as sum_3, sum(value4) as sum_4 FROM (SELECT t2.*, ROW_NUMBER() OVER(PARTITION BY Date ORDER BY Date) as rn FROM t2)
WHERE rn=1
GROUP BY ReportDate, Brand
UPDATE:
With your explanation in the comment section, I was able to create a more specific query. The fields ReportDate, Brand, Portfolio, Campaign and value1, value2, value3 are from t2, whilst value4 is from t1. The sum is computed over the rows whose row number equals 1. For this reason, the tables t1 and t2 are joined before ROW_NUMBER() is applied. Finally, in the last SELECT statement rn is not selected and the data is aggregated by ReportDate, Brand, Portfolio and Campaign.
WITH cte_1 AS (
SELECT t2.ReportDate, t2.Brand, t2.Portfolio, t2.Campaign,
t2.value1, t2.value2, t2.value3, t1.value4
FROM t2 LEFT JOIN t1 on t2.ReportDate = t1.ReportDate and t1.placement=t2.Ad
),
cte_2 AS(
SELECT *, ROW_NUMBER() OVER(PARTITION BY ReportDate ORDER BY ReportDate) as rn FROM cte_1
)
SELECT ReportDate, Brand, Portfolio, Campaign, SUM(value1) as sum1, SUM(value2) as sum2, SUM(value3) as sum3,
SUM(value4) as sum4
FROM cte_2
WHERE rn=1
GROUP BY 1,2,3,4

In T-SQL, how can I collate positive and negative actions in order that they happened?

I have a table like this:
;WITH CTE AS
( SELECT *
FROM (VALUES(1,'BlueCar',NULL),
(2,'RedCar',NULL),
(3,NULL,'BlueCar'),
(4,'GreenCar',NULL),
(5,NULL,'RedCar'),
(6,'BlueCar',NULL)
) AS ValuesTable(Time,Buy,Sell)
)
SELECT *
FROM CTE
Time Buy Sell
1 BlueCar NULL
2 RedCar NULL
3 NULL BlueCar
4 GreenCar NULL
5 NULL RedCar
6 BlueCar NULL
How can I query this table to get the total number of cars still in stock? The Time column is days since the shop opened. The time that the car was purchased must be preserved
Note: The input data is such that there will never be a situation where there are multiple cars in the inventory.
Expected Output
Time Buy
4 GreenCar
6 BlueCar
In the query below, I do two separate aggregations to obtain the buy and sell counts for each car. I left join buys to sells, which should not run the risk of losing data assuming that the dealer did not short sell any inventory which does not actually exist.
Then I join that result to a CTE which finds the latest time for each car. This would then correspond to the time when the most recent car came into inventory, for each car type.
I also include the inventory count, which you did not request, but it may be useful for you if you decide to expand the scope of your query later on.
WITH yourTable AS (
SELECT 1 AS Time, 'BlueCar' AS Buy, NULL AS Sell UNION ALL
SELECT 2,'RedCar',NULL UNION ALL
SELECT 3,NULL,'BlueCar' UNION ALL
SELECT 4,'GreenCar',NULL UNION ALL
SELECT 5,NULL,'RedCar' UNION ALL
SELECT 6,'BlueCar',NULL
),
cte AS (
SELECT Buy, Time
FROM
(
SELECT Buy, Time,
ROW_NUMBER() OVER (PARTITION BY Buy ORDER BY Time DESC) rn
FROM yourTable
) t
WHERE rn = 1
)
SELECT
t1.Buy,
t1.buy_cnt - COALESCE(t2.sell_cnt, 0) AS inventory,
t3.Time
FROM
(
SELECT Buy, COUNT(*) AS buy_cnt
FROM yourTable
GROUP BY Buy
) t1
LEFT JOIN
(
SELECT Sell, COUNT(*) AS sell_cnt
FROM yourTable
GROUP BY Sell
) t2
ON t1.Buy = t2.Sell
LEFT JOIN cte t3
ON t1.Buy = t3.Buy
WHERE
t1.Buy IS NOT NULL AND
t1.buy_cnt - COALESCE(t2.sell_cnt, 0) > 0
ORDER BY
t3.Time;
Demo here:
Rextester
You can do this with a not exists:
;WITH CTE AS
( SELECT *
FROM (VALUES(1,'BlueCar',NULL),
(2,'RedCar',NULL),
(3,NULL,'BlueCar'),
(4,'GreenCar',NULL),
(5,NULL,'RedCar'),
(6,'BlueCar',NULL)
) AS ValuesTable(Time,Buy,Sell)
)
SELECT
[Time], Buy
FROM CTE as T1
WHERE
NOT EXISTS (SELECT 1 FROM CTE as T2 WHERE T2.TIME > T1.TIME AND T1.Buy = T2.Sell) AND
BUY IS NOT NULL
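To see the NOT EXISTS approach run against the six sample rows, here is a minimal sketch using Python's built-in sqlite3 module in place of SQL Server (an assumption). The table name trades is hypothetical, and a real table is used instead of the VALUES CTE.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
# "trades" is a hypothetical table name holding the rows from the question.
conn.execute('CREATE TABLE trades ("Time" INTEGER, Buy TEXT, Sell TEXT)')
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [
        (1, "BlueCar", None),
        (2, "RedCar", None),
        (3, None, "BlueCar"),
        (4, "GreenCar", None),
        (5, None, "RedCar"),
        (6, "BlueCar", None),
    ],
)

# Keep each purchase that has no later sale of the same car.
query = """
SELECT t1."Time", t1.Buy
FROM trades AS t1
WHERE t1.Buy IS NOT NULL
  AND NOT EXISTS (SELECT 1
                  FROM trades AS t2
                  WHERE t2."Time" > t1."Time"
                    AND t2.Sell = t1.Buy)
ORDER BY t1."Time"
"""
in_stock = conn.execute(query).fetchall()
print(in_stock)
```

The result is GreenCar bought at time 4 and BlueCar bought at time 6, which matches the expected output. This relies on the question's guarantee that the same car is never in stock more than once at a time.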
Presumably, you want:
with cte as (
. . .
)
select count(buy) - count(sell)
from cte;
Note: This does not verify that what you sell is something that has already been bought. It just counts up the non-NULL values in each column and takes the difference.
To get the stock at a certain point in time you can do
SELECT car, SUM(Inc) total FROM
(SELECT ID, Buy car, 1 Inc FROM tbl WHERE Buy>''
UNION ALL
SELECT ID, Sell car, -1 Inc FROM tbl WHERE Sell>'') coll
WHERE ID < 20 -- some cut-off time
GROUP BY car
I combine the two columns Buy and Sell into one (= car) and add another column (inc) with the increment of each action (-1 or 1). The rest is simple: select with a group by [car] and summation over column inc.
Here is a little demo: http://rextester.com/LLQDW60692
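The same increment idea can be sketched against the sample rows from the question with Python's built-in sqlite3 module (an assumption, standing in for SQL Server). The table name trades and the helper stock_before are hypothetical, and the Time column is used as the cut-off in place of ID.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE trades ("Time" INTEGER, Buy TEXT, Sell TEXT)')
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [
        (1, "BlueCar", None),
        (2, "RedCar", None),
        (3, None, "BlueCar"),
        (4, "GreenCar", None),
        (5, None, "RedCar"),
        (6, "BlueCar", None),
    ],
)


def stock_before(cutoff):
    """Return (car, total) pairs counting only actions with Time < cutoff."""
    query = """
    SELECT car, SUM(inc) AS total
    FROM (SELECT "Time", Buy AS car, 1 AS inc FROM trades WHERE Buy > ''
          UNION ALL
          SELECT "Time", Sell AS car, -1 AS inc FROM trades WHERE Sell > '') AS coll
    WHERE "Time" < ?
    GROUP BY car
    ORDER BY car
    """
    return conn.execute(query, (cutoff,)).fetchall()


print(stock_before(20))  # all six actions
print(stock_before(4))   # only the first three actions
```

Cars that were bought and sold again show up with a total of 0; adding HAVING SUM(inc) > 0 would hide them.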
It is a good question, I like it. Your expected output has changed over time, but that's OK.
Check the simple query below for your problem.
Using joins and ROW_NUMBER() we can achieve this.
;with CTE as
(
select a.time,a.buy,a.rid,COALESCE(b.rid,0)rid2 ,coalesce(b.sell,a.buy)sell from
( select time,buy,ROW_NUMBER()over( partition by buy order by (select 1)) rid
from #tableName where buy is not null)a left join
( select time,sell, ROW_NUMBER()over( partition by sell order by (select 1)) rid
from #TableName
where sell is not null )b on a.buy=b.sell
)
select Time,Buy from CTE
where rid!=rid2

get ROW NUMBER of random records

For a simple SQL like,
SELECT top 3 MyId FROM MyTable ORDER BY NEWID()
how to add row numbers to them so that the row numbers become 1,2, and 3?
UPDATE:
I thought I could simplify my question as above, but it turns out to be more complicated. So here is a fuller version: I need to give three random picks (from MyTable) to each person, with a pick/row number of 1, 2, and 3, and there is no logical join between person and picks.
SELECT * FROM Person
LEFT JOIN (
SELECT top 3 MyId FROM MyTable ORDER BY NEWID()
) D ON 1=1
The problems with the above SQL are:
Obviously, the pick/row numbers 1, 2, and 3 should be added.
What is not obvious is that the above SQL gives every person the same picks, whereas I need to give different people different picks.
Here is a working SQL to test it out:
SELECT TOP 15 database_id, create_date, cs.name FROM sys.databases
CROSS apply (
SELECT top 3 Row_number()OVER(ORDER BY (SELECT NULL)) AS RowNo,*
FROM (SELECT top 3 name from sys.all_views ORDER BY NEWID()) T
) cs
So, please help.
NOTE: This is NOT about MySQL but T-SQL. Their syntax is different, so the solution is different as well.
Add Row_number to outer query. Try this
SELECT Row_number()OVER(ORDER BY (SELECT NULL)),*
FROM (SELECT TOP 3 MyId
FROM MyTable
ORDER BY Newid()) a
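Logically, TOP is processed after the SELECT list. If ROW_NUMBER() were in the same query, the row numbers would be generated for every row first and only then would the 3 random records be pulled, so they would not be 1, 2, and 3. That is why you should not generate the row number in the original query.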
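The two-level shape (pick random rows in the inner query, number them in the outer one) can be tried out with Python's built-in sqlite3 module. This is only a sketch under assumptions: SQLite stands in for SQL Server, so ORDER BY random() LIMIT 3 replaces TOP 3 ... ORDER BY NEWID(), and the ten MyId values are made up.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (MyId INTEGER PRIMARY KEY)")
# Hypothetical sample data: ids 1..10.
conn.executemany("INSERT INTO MyTable VALUES (?)", [(i,) for i in range(1, 11)])

# Inner query: pick 3 random rows. Outer query: number the picked rows.
query = """
SELECT ROW_NUMBER() OVER () AS RowNo, MyId
FROM (SELECT MyId
      FROM MyTable
      ORDER BY random()
      LIMIT 3) AS a
ORDER BY RowNo
"""
picks = conn.execute(query).fetchall()
print(picks)  # three (RowNo, MyId) pairs; the MyId values vary per run
```

Because the numbering happens after the random pick, the row numbers are always 1, 2 and 3 regardless of which ids were drawn.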
Update
It can be achieved through CROSS APPLY. Replace the column names inside cross apply where clause with valid column name from Person table
SELECT *
FROM Person p
CROSS apply (SELECT Row_number()OVER(ORDER BY (SELECT NULL)) rn,*
FROM (SELECT TOP 3 MyId
FROM MyTable
WHERE p.some_col = p.some_col -- Replace it with some column from person table
ORDER BY Newid())a) cs

Select all from a table, where 2 columns are Distinct

Hi, I have a table of deals. I need to return the entire table, but I need the title and the price to be distinct, as there are quite a few double-ups. I've put an example scenario below.
Col ID || Col Title || Col Price || Col Source
a b c d
a b c b
b a a c
b a a 1
Expected result:
a b c d
b a a c
I'm not sure whether or not to use distinct or group by here, any suggestions would be appreciated
Cheers
Scott
=======================
Looking at some of your suggestions I'm going to have to rethink this, Thanks guys
This will pick one of the rows for each distinct (Price, Title) pair; which one is decided by the ORDER BY inside the window, here the lowest Source.
;WITH myCTE AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Price, Title ORDER BY Source) AS rn
FROM
MyTable
)
SELECT
*
FROM
myCTE
WHERE
rn = 1
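Here is a minimal sketch of that query against the four example rows from the question, using Python's built-in sqlite3 module in place of the real database (an assumption; window functions need SQLite 3.25 or later). The columns are listed explicitly so that the helper column rn is left out of the output.

```python
import sqlite3

# In-memory SQLite database standing in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (ID TEXT, Title TEXT, Price TEXT, Source TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [
        ("a", "b", "c", "d"),
        ("a", "b", "c", "b"),
        ("b", "a", "a", "c"),
        ("b", "a", "a", "1"),
    ],
)

# Number the rows inside each (Price, Title) group by Source,
# then keep the first row of every group.
query = """
WITH myCTE AS (
    SELECT ID, Title, Price, Source,
           ROW_NUMBER() OVER (PARTITION BY Price, Title
                              ORDER BY Source) AS rn
    FROM MyTable
)
SELECT ID, Title, Price, Source
FROM myCTE
WHERE rn = 1
ORDER BY ID
"""
deduped = conn.execute(query).fetchall()
print(deduped)
```

With ORDER BY Source the row with the lowest Source in each group survives, so the sketch returns (a, b, c, b) and (b, a, a, 1) rather than the rows shown in the question. Changing the ORDER BY inside the window changes which duplicate is kept.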
You can use group by, but to return only title and price, ID and source would have to be ignored
You are asking for the entire table, but your sample output has lost two records, and with them their 'Col Source' values:
a b c b
b a a 1
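GROUP BY will help you write a very simple query. Note that selecting non-grouped columns like this is only accepted by MySQL with ONLY_FULL_GROUP_BY disabled; other engines reject it, and the values returned for id and source are then arbitrary.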
select id, title, price, source from table group by title, price
DISTINCT and GROUP BY usually generate the same query plan, so performance should be the same for both constructs. GROUP BY should be used to apply aggregate operators to each group; if all you need is to remove duplicates, use DISTINCT. If you are using sub-queries, the execution plan may vary, so in that case check the execution plan before deciding which is faster.
You should go for GROUP BY here, since all the columns are required in your result set, whereas DISTINCT only returns the unique list of the specified columns.
SELECT ID, Title, Price, Source
FROM table as t
GROUP BY Title, Price