SQL GROUP BY: find all combinations of two columns' distinct values

I have the following table:
ORDID EMPID ITEMCOST TIME
-------------------------------------
10023 B2690 675 1992
10024 C3467 8078 1992
10025 B2690 15481 1992
10026 C5621 22884 1992
10027 B2109 30287 1992
10030 B3297 52496 1993
10031 C3467 59899 1993
10032 F5621 67302 1993
10033 G3467 74705 1993
...and so on, for many rows.
I am trying to find the EMPIDs that purchased some item in each and every year.
In other words, I want to find the EMPIDs that exist in every year in the table.
BTW, I am using Oracle 11g Express.
Thanks in advance.

You can do this with a having clause where you compare the number of distinct years for each empid to the number of distinct years in the data:
select empid
from followingtable
group by empid
having count(distinct time) = (select count(distinct time) from followingtable);
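The HAVING comparison can be checked with a minimal sketch in Python's built-in sqlite3 module. SQLite stands in for Oracle here, on the assumption that COUNT(DISTINCT ...) and the scalar subquery behave the same way for this query. Only the sample rows visible in the question are loaded:

```python
import sqlite3

# In-memory SQLite database as a stand-in for the Oracle table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE followingtable (ordid INTEGER, empid TEXT, itemcost INTEGER, time INTEGER)"
)
conn.executemany(
    "INSERT INTO followingtable VALUES (?, ?, ?, ?)",
    [
        (10023, "B2690", 675, 1992),
        (10024, "C3467", 8078, 1992),
        (10025, "B2690", 15481, 1992),
        (10026, "C5621", 22884, 1992),
        (10027, "B2109", 30287, 1992),
        (10030, "B3297", 52496, 1993),
        (10031, "C3467", 59899, 1993),
        (10032, "F5621", 67302, 1993),
        (10033, "G3467", 74705, 1993),
    ],
)

# Keep an empid only if it appears in as many distinct years as the whole table has.
query = """
    SELECT empid
    FROM followingtable
    GROUP BY empid
    HAVING COUNT(DISTINCT time) = (SELECT COUNT(DISTINCT time) FROM followingtable)
"""
result = [row[0] for row in conn.execute(query)]
print(result)
```

With these rows only C3467 appears in both 1992 and 1993, so it is the only EMPID returned.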

The query below will also work:
SELECT TAB.EMPID FROM
(
SELECT A.EMPID, COUNT(DISTINCT A.TIME) YEARCOUNT FROM MY_TABLE A GROUP BY EMPID
) TAB
WHERE TAB.YEARCOUNT = (SELECT COUNT(DISTINCT B.TIME) FROM MY_TABLE B)

Related

Group by based on field length

I want to count the number of IDs that are 4, 5, or 6 bytes long, grouped by year.
ID      year  name    location  geo   new_loc   addr 1  addr 2  addr 3  addr 4
------------------------------------------------------------------------------
12345   2019  bob     UK        UK-4  basic     dat1    dat11   dat13   dat123
19804   2004  sam     US        US-1  advanced  dat2    dat21   dat23   dat233
19      2000  lister  EU        EU    basic     dat3    dat31   dat33   dat333
190838  2004  harold  US        US-3  basic     dat4    dat41   dat53   dat533
11804   2019  beanie  SK        UK-2  advanced  NULL    NULL    NULL    NULL
Output:
ID      year  name    location  new location  num_of_ids_each_year
-------------------------------------------------------------------
12345   2019  bob     UK        basic         2
11804   2019  beanie  SK        advanced      2
19804   2004  sam     US        advanced      2
190838  2004  harold  US        basic         2
What I tried:
select ID, year, name, location, [new location], count(year)
from table1
group by ID, year, name, location, [new location], count(year);
Could someone advise on how to include only those IDs that are 4, 5, or 6 bytes long?
You can use COUNT() with Partition by Year to get the results without using GROUP BY.
SELECT ID, [year], [name], [location], [new location]
, COUNT(1) OVER (PARTITION BY year) AS num_of_ids_each_year
FROM table1
WHERE LEN(ID) IN (4,5,6)
Thanks @Squirrel, I finally found a way:
select id, Year, name, location, [new location],
count(id) over (partition by year) as num_of_ids_each_year
from table1 where len(id) in (4,5,6);
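The windowed count can be tried out in a minimal sketch with Python's sqlite3 module, assuming SQLite 3.25 or later (needed for window functions) as a stand-in for SQL Server. LENGTH(CAST(id AS TEXT)) plays the role of LEN(ID), and the column new_loc stands in for [new location]:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (id INTEGER, year INTEGER, name TEXT, location TEXT, new_loc TEXT)"
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?, ?)",
    [
        (12345, 2019, "bob", "UK", "basic"),
        (19804, 2004, "sam", "US", "advanced"),
        (19, 2000, "lister", "EU", "basic"),
        (190838, 2004, "harold", "US", "basic"),
        (11804, 2019, "beanie", "SK", "advanced"),
    ],
)

# WHERE runs before the window function, so COUNT(*) OVER only counts the kept IDs.
query = """
    SELECT id, year, name, location, new_loc,
           COUNT(*) OVER (PARTITION BY year) AS num_of_ids_each_year
    FROM table1
    WHERE LENGTH(CAST(id AS TEXT)) IN (4, 5, 6)
    ORDER BY year DESC, id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

ID 19 is dropped by the length filter, and each remaining year has two IDs, matching the expected output in the question.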
Please try putting the length condition in a HAVING clause,
e.g.
select ID,
year,
name,
location,
[new location],
len(ID) as id_length
from table1
group by ID, year, name, location, [new location]
having len(ID) between 4 and 6

SQL - Display Name ID from Consecutive Occurrences of values in a Table

I have created a table, for example 'Table1' (see below):
Name Year
John 2003
Lyla 1994
Faith 1996
John 2002
Carol 2000
Carol 1999
John 2001
Carol 2002
Lyla 1996
Lyla 1997
Carol 2001
John 2009
Based on the above table, I have summarised my findings.
Carol participated for 4 years in a row; 1999, 2000, 2001, 2002
John participated for 3 years in a row; 2001, 2002, 2003 – John also participated in 2009, but this does not count as part of the streak.
Lyla participated in 1994, 1996, 1997 but these were not three consecutive years.
Faith participated only 1 time.
What I am looking to do is write a SQL query that displays only the names of users who have participated for 3 or more consecutive years, so based on the above I should only get 'Carol' and 'John'.
I am not exactly sure how to write this and would hope that someone could guide me.
I have only come up with a short and basic start like the one below, but in all honesty I am not sure that is even the correct way to go about it.
Select Name From Table1
Where Year = ?
Order by Name asc
Group by Year
Assuming you have one row per person per year, you can use lag() and select distinct:
select distinct name
from (select t.*,
lag(year, 2) over (partition by name order by year) as prev2_year
from table1 t
) t
where prev2_year = year - 2;
This simply looks back two rows for each name and compares the year on that row to the year on the current row. If there are three years in a row, then that year is exactly year - 2.
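A minimal sketch of the lag() approach in Python's sqlite3 module, loading the question's sample data. It assumes SQLite 3.25 or later (for LAG) and one row per person per year, as the answer does:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (name TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [("John", 2003), ("Lyla", 1994), ("Faith", 1996), ("John", 2002),
     ("Carol", 2000), ("Carol", 1999), ("John", 2001), ("Carol", 2002),
     ("Lyla", 1996), ("Lyla", 1997), ("Carol", 2001), ("John", 2009)],
)

# LAG(year, 2) reads the year two rows back within each name, ordered by year.
# If that equals year - 2, the current row closes a run of three consecutive years.
query = """
    SELECT DISTINCT name
    FROM (SELECT t.*,
                 LAG(year, 2) OVER (PARTITION BY name ORDER BY year) AS prev2_year
          FROM table1 t) t
    WHERE prev2_year = year - 2
    ORDER BY name
"""
names = [row[0] for row in conn.execute(query)]
print(names)
```

Note that ordering the window by year (not by name) is what makes "two rows back" mean "two years earlier" here.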
You could also do this with joins, but the above probably performs better:
select distinct t1.name
from table1 t1 join
table1 t1_1
on t1.name = t1_1.name and
t1.year = t1_1.year + 1 join
table1 t1_2
on t1.name = t1_2.name and
t1.year = t1_2.year + 2;
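The double self-join needs no window functions at all, so it can be sketched in Python's sqlite3 module on any SQLite version. The sample data is the question's; only the SQL dialect is assumed to carry over:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (name TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [("John", 2003), ("Lyla", 1994), ("Faith", 1996), ("John", 2002),
     ("Carol", 2000), ("Carol", 1999), ("John", 2001), ("Carol", 2002),
     ("Lyla", 1996), ("Lyla", 1997), ("Carol", 2001), ("John", 2009)],
)

# A row survives both joins only if the same name also has year - 1 and year - 2.
query = """
    SELECT DISTINCT t1.name
    FROM table1 t1
    JOIN table1 t1_1 ON t1.name = t1_1.name AND t1.year = t1_1.year + 1
    JOIN table1 t1_2 ON t1.name = t1_2.name AND t1.year = t1_2.year + 2
    ORDER BY t1.name
"""
names = [row[0] for row in conn.execute(query)]
print(names)
```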
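Another option is a self-join that counts, for each name, how many of its years are directly followed by the next year. Note that it adds up all such pairs per name, so the result is only accurate when each person has at most one streak:
select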
n1.name,
SUM(CASE WHEN n2.year is null then 0 else 1 end)+1 YearsInRow
from Table1 n1
left join Table1 n2 on n2.name=n1.name and (n2.year=n1.year+1 )
GROUP by n1.name
HAVING SUM(CASE WHEN n2.year is null then 0 else 1 end)+1 >=3
output:
name YearsInRow
---------- -----------
Carol 4
John 3

SQL: COUNT the number of purchases between a customer's first purchase and the following 10 months

Every customer has a different first purchase date. How can I COUNT the number of purchases each customer makes within the 10 months following their first purchase?
sample table
TransactionID Client_name PurchaseDate Revenue
11 John Lee 10/13/2014 327
12 John Lee 9/15/2015 873
13 John Lee 11/29/2015 1,938
14 Rebort Jo 8/18/2013 722
15 Rebort Jo 5/21/2014 525
16 Rebort Jo 2/4/2015 455
17 Rebort Jo 3/20/2016 599
18 Tina Pe 10/8/2014 213
19 Tina Pe 6/10/2016 3,494
20 Tina Pe 8/9/2016 411
My code below just uses the ROW_NUMBER function to identify the first purchase, but I don't know how to do the calculation. Or is there a better way to do it?
SELECT client_name,
purchasedate,
Dateadd(month, 10, purchasedate) TenMonth,
Row_number()
OVER (
partition BY client_name
ORDER BY purchasedate) RM
FROM mytable
You might try something like this - I assume you're using SQL Server from the presence of DATEADD() and the fact that you're using a window function (ROW_NUMBER()):
WITH myCTE AS (
SELECT TransactionID, Client_name, PurchaseDate, Revenue
, MIN(PurchaseDate) OVER ( PARTITION BY Client_name ) AS min_PurchaseDate
FROM myTable
)
SELECT Client_name, COUNT(*)
FROM myCTE
WHERE PurchaseDate <= DATEADD(month, 10, min_PurchaseDate)
GROUP BY Client_name
Here I'm creating a common table expression (CTE) with all the data, including the date of first purchase, then I grab a count of all the purchases within a 10-month timeframe.
Hope this helps.
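A minimal sketch of the CTE approach in Python's sqlite3 module, assuming SQLite 3.25 or later (for MIN() OVER). Two substitutions are assumptions for SQLite: dates are stored as ISO 'YYYY-MM-DD' text, and date(x, '+10 months') stands in for DATEADD(month, 10, x):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (TransactionID INTEGER, Client_name TEXT, PurchaseDate TEXT, Revenue INTEGER)"
)
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?, ?)",
    [
        (11, "John Lee", "2014-10-13", 327),
        (12, "John Lee", "2015-09-15", 873),
        (13, "John Lee", "2015-11-29", 1938),
        (14, "Rebort Jo", "2013-08-18", 722),
        (15, "Rebort Jo", "2014-05-21", 525),
        (16, "Rebort Jo", "2015-02-04", 455),
        (17, "Rebort Jo", "2016-03-20", 599),
        (18, "Tina Pe", "2014-10-08", 213),
        (19, "Tina Pe", "2016-06-10", 3494),
        (20, "Tina Pe", "2016-08-09", 411),
    ],
)

# MIN() OVER attaches each client's first purchase date to every one of their rows.
query = """
    WITH myCTE AS (
        SELECT Client_name, PurchaseDate,
               MIN(PurchaseDate) OVER (PARTITION BY Client_name) AS min_PurchaseDate
        FROM myTable
    )
    SELECT Client_name, COUNT(*)
    FROM myCTE
    WHERE PurchaseDate <= date(min_PurchaseDate, '+10 months')
    GROUP BY Client_name
    ORDER BY Client_name
"""
counts = conn.execute(query).fetchall()
print(counts)
```

The first purchase always falls inside its own window, so every client gets a count of at least 1.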
Give this a whirl ... A subquery gets the min purchase date per client, then JOIN it to the main table so the WHERE clause can apply the ten-month date range, then count.
SELECT mt.Client_name, COUNT(mt.PurchaseDate) AS PurchaseCountFirstTenMonths
FROM myTable mt
JOIN (
SELECT Client_name, MIN(PurchaseDate) AS MinPurchaseDate FROM myTable GROUP BY Client_name) mtmin
ON mt.Client_name = mtmin.Client_name
WHERE mt.PurchaseDate >= mtmin.MinPurchaseDate AND mt.PurchaseDate <= DATEADD(month, 10, mtmin.MinPurchaseDate)
GROUP BY mt.Client_name
ORDER BY mt.Client_name
BTW, I'm guessing there's some kind of ClientID involved, as a nine-character full name runs the risk of duplicates.
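The derived-table version works without window functions; here is a minimal sketch in Python's sqlite3 module. As above, ISO date text and date(x, '+10 months') in place of DATEADD are SQLite-specific assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (TransactionID INTEGER, Client_name TEXT, PurchaseDate TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [
        (11, "John Lee", "2014-10-13"), (12, "John Lee", "2015-09-15"),
        (13, "John Lee", "2015-11-29"), (14, "Rebort Jo", "2013-08-18"),
        (15, "Rebort Jo", "2014-05-21"), (16, "Rebort Jo", "2015-02-04"),
        (17, "Rebort Jo", "2016-03-20"), (18, "Tina Pe", "2014-10-08"),
        (19, "Tina Pe", "2016-06-10"), (20, "Tina Pe", "2016-08-09"),
    ],
)

# Join every purchase to its client's first purchase date, keep the 10-month window, count.
query = """
    SELECT mt.Client_name, COUNT(mt.PurchaseDate) AS PurchaseCountFirstTenMonths
    FROM myTable mt
    JOIN (SELECT Client_name, MIN(PurchaseDate) AS MinPurchaseDate
          FROM myTable GROUP BY Client_name) mtmin
      ON mt.Client_name = mtmin.Client_name
    WHERE mt.PurchaseDate >= mtmin.MinPurchaseDate
      AND mt.PurchaseDate <= date(mtmin.MinPurchaseDate, '+10 months')
    GROUP BY mt.Client_name
    ORDER BY mt.Client_name
"""
counts = conn.execute(query).fetchall()
print(counts)
```

Joining on the client name alone (not also on the first purchase date) is what lets every purchase in the window be counted.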

Firebird query: return the first row of each group

In a Firebird database with a table "Sales", I need to select the first sale of each customer. Below is a sample showing the table and the desired result of the query.
---------------------------------------
SALES
---------------------------------------
ID CUSTOMERID DTHRSALE
1 25 01/04/16 09:32
2 30 02/04/16 11:22
3 25 05/04/16 08:10
4 31 07/03/16 10:22
5 22 01/02/16 12:30
6 22 10/01/16 08:45
Result: only first sale, based on sale date.
ID CUSTOMERID DTHRSALE
1 25 01/04/16 09:32
2 30 02/04/16 11:22
4 31 07/03/16 10:22
6 22 10/01/16 08:45
I've already tried the code from "Select first row in each GROUP BY group?", but it did not work.
In Firebird 2.5 you can do this with the following query; this is a minor modification of the second part of the accepted answer of the question you linked to tailored to your schema and requirements:
select x.id,
x.customerid,
x.dthrsale
from sales x
join (select customerid,
min(dthrsale) as first_sale
from sales
group by customerid) p on p.customerid = x.customerid
and p.first_sale = x.dthrsale
order by x.id
The order by is not necessary, I just added it to make it give the order as shown in your question.
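The join-to-minimum query can be sketched in Python's sqlite3 module. SQLite stands in for Firebird 2.5 (an assumption), and the dd/mm/yy timestamps from the question are stored as ISO text so that MIN() compares them chronologically:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, customerid INTEGER, dthrsale TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, 25, "2016-04-01 09:32"), (2, 30, "2016-04-02 11:22"),
        (3, 25, "2016-04-05 08:10"), (4, 31, "2016-03-07 10:22"),
        (5, 22, "2016-02-01 12:30"), (6, 22, "2016-01-10 08:45"),
    ],
)

# The derived table p holds each customer's earliest sale; joining back recovers the full row.
query = """
    SELECT x.id, x.customerid, x.dthrsale
    FROM sales x
    JOIN (SELECT customerid, MIN(dthrsale) AS first_sale
          FROM sales
          GROUP BY customerid) p
      ON p.customerid = x.customerid AND p.first_sale = x.dthrsale
    ORDER BY x.id
"""
first_sales = conn.execute(query).fetchall()
print(first_sales)
```

If a customer had two sales with exactly the same earliest timestamp, both rows would be returned.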
With Firebird 3 you can use the window function ROW_NUMBER which is also described in the linked answer. The linked answer incorrectly said the first solution would work on Firebird 2.1 and higher. I have now edited it.
Search for the sales with no earlier sales:
SELECT S1.*
FROM SALES S1
LEFT JOIN SALES S2 ON S2.CUSTOMERID = S1.CUSTOMERID AND S2.DTHRSALE < S1.DTHRSALE
WHERE S2.ID IS NULL
Define an index over (customerid, dthrsale) to make it fast.
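The "no earlier sale" anti-join can be sketched the same way with Python's sqlite3 module, again with SQLite standing in for Firebird and ISO timestamps assumed. The index name sales_cust_dt is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, customerid INTEGER, dthrsale TEXT)")
# Hypothetical index name; the column order matches the suggestion above.
conn.execute("CREATE INDEX sales_cust_dt ON sales (customerid, dthrsale)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, 25, "2016-04-01 09:32"), (2, 30, "2016-04-02 11:22"),
        (3, 25, "2016-04-05 08:10"), (4, 31, "2016-03-07 10:22"),
        (5, 22, "2016-02-01 12:30"), (6, 22, "2016-01-10 08:45"),
    ],
)

# A sale is kept only when the LEFT JOIN finds no earlier sale for the same customer.
query = """
    SELECT s1.*
    FROM sales s1
    LEFT JOIN sales s2
      ON s2.customerid = s1.customerid AND s2.dthrsale < s1.dthrsale
    WHERE s2.id IS NULL
    ORDER BY s1.id
"""
first_sales = conn.execute(query).fetchall()
print(first_sales)
```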
In Firebird 3, get the first row for each customer by the minimum sales_date:
SELECT id, customer_id, total, sales_date
FROM (
SELECT id, customer_id, total, sales_date
, row_number() OVER(PARTITION BY customer_id ORDER BY sales_date ASC ) AS rn
FROM SALES
) sub
WHERE rn = 1;
If you want to get other related columns, this is where your self-answer fails:
select customer_id , min(sales_date)
, id, total --what about other columns?
from SALES
group by customer_id
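A minimal sketch of the ROW_NUMBER() approach in Python's sqlite3 module, assuming SQLite 3.25 or later as a stand-in for Firebird 3. The answer above uses its own column names (customer_id, total, sales_date); this sketch instead uses the question's columns (id, customerid, dthrsale) with ISO timestamps:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, customerid INTEGER, dthrsale TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, 25, "2016-04-01 09:32"), (2, 30, "2016-04-02 11:22"),
        (3, 25, "2016-04-05 08:10"), (4, 31, "2016-03-07 10:22"),
        (5, 22, "2016-02-01 12:30"), (6, 22, "2016-01-10 08:45"),
    ],
)

# rn = 1 marks the earliest sale per customer; every column of that row is available.
query = """
    SELECT id, customerid, dthrsale
    FROM (SELECT id, customerid, dthrsale,
                 ROW_NUMBER() OVER (PARTITION BY customerid ORDER BY dthrsale ASC) AS rn
          FROM sales) sub
    WHERE rn = 1
    ORDER BY id
"""
first_sales = conn.execute(query).fetchall()
print(first_sales)
```

Unlike a plain GROUP BY with MIN(), this keeps the id (and any other column) of the first sale, and it returns exactly one row per customer even when timestamps tie.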
As simple as (note that this returns only the customer and date, not the sale ID):
select CUSTOMERID, min(DTHRSALE) from SALES group by CUSTOMERID

SQL Server: count types with totals by date change

I need to count a value (M_Id) at each change of a date (RS_Date) and create a column grouped by the RS_Date that has an active total from that date.
So the table is:
Ep_Id Oa_Id M_Id M_StartDate RS_Date
--------------------------------------------
1 2001 5 1/1/2014 1/1/2014
1 2001 9 1/1/2014 1/1/2014
1 2001 3 1/1/2014 1/1/2014
1 2001 11 1/1/2014 1/1/2014
1 2001 2 1/1/2014 1/1/2014
1 2067 7 1/1/2014 1/5/2014
1 2067 1 1/1/2014 1/5/2014
1 3099 12 1/1/2014 3/2/2014
1 3099 14 2/14/2014 3/2/2014
1 3099 4 2/14/2014 3/2/2014
So my goal is something like:
RS_Date Active
-----------------
1/1/2014 5
1/5/2014 7
3/2/2014 10
If M_StartDate = RS_Date, I need to count the M_Id values. Then, for each RS_Date that is not equal to the start date, I need to count the M_Id values and add that to the M_StartDate count, then count the next RS_Date and add that to the last active count, and so on.
I can get the basic counts with something like
(Case when M_StartDate <= RS_Date
then [m_Id] end) as Test.
But I am stuck as how to get to the result I want.
Any help would be greatly appreciated.
Brian
-added in response to comments
I am using SQL Server version 10 (SQL Server 2008 / 2008 R2).
If you are using SQL Server 2012+, you can use ROWS with the analytic/window functions:
;with cte AS (SELECT RS_Date
,COUNT(DISTINCT M_ID) AS CT
FROM Table1
GROUP BY RS_Date
)
SELECT *,SUM(CT) OVER(ORDER BY RS_Date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Run_CT
FROM cte
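A minimal sketch of the windowed running total in Python's sqlite3 module, assuming SQLite 3.25 or later as a stand-in for SQL Server 2012+. The m/d/yyyy dates from the question are stored as ISO text so they sort chronologically:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Table1 (Ep_Id INTEGER, Oa_Id INTEGER, M_Id INTEGER, M_StartDate TEXT, RS_Date TEXT)"
)
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?, ?, ?)",
    [
        (1, 2001, 5, "2014-01-01", "2014-01-01"), (1, 2001, 9, "2014-01-01", "2014-01-01"),
        (1, 2001, 3, "2014-01-01", "2014-01-01"), (1, 2001, 11, "2014-01-01", "2014-01-01"),
        (1, 2001, 2, "2014-01-01", "2014-01-01"), (1, 2067, 7, "2014-01-01", "2014-01-05"),
        (1, 2067, 1, "2014-01-01", "2014-01-05"), (1, 3099, 12, "2014-01-01", "2014-03-02"),
        (1, 3099, 14, "2014-02-14", "2014-03-02"), (1, 3099, 4, "2014-02-14", "2014-03-02"),
    ],
)

# Count per RS_Date in the CTE, then accumulate the counts in date order.
query = """
    WITH cte AS (SELECT RS_Date, COUNT(DISTINCT M_Id) AS CT
                 FROM Table1
                 GROUP BY RS_Date)
    SELECT RS_Date, CT,
           SUM(CT) OVER (ORDER BY RS_Date
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Run_CT
    FROM cte
    ORDER BY RS_Date
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The per-date counts are 5, 2 and 3, giving running totals of 5, 7 and 10, which matches the Active column the question asks for.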
If stuck using something prior to 2012 you can use:
;with cte AS (SELECT RS_Date
,COUNT(DISTINCT M_ID) AS CT
FROM Table1
GROUP BY RS_Date
)
SELECT a.RS_Date
,SUM(b.CT)
FROM cte a
LEFT JOIN cte b
ON a.RS_Date >= b.RS_Date
GROUP BY a.RS_Date
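The pre-2012 self-join needs no window functions, so it can be sketched in Python's sqlite3 module on any SQLite version. SQLite stands in for SQL Server here, and the dates are assumed to be ISO text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (M_Id INTEGER, M_StartDate TEXT, RS_Date TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [
        (5, "2014-01-01", "2014-01-01"), (9, "2014-01-01", "2014-01-01"),
        (3, "2014-01-01", "2014-01-01"), (11, "2014-01-01", "2014-01-01"),
        (2, "2014-01-01", "2014-01-01"), (7, "2014-01-01", "2014-01-05"),
        (1, "2014-01-01", "2014-01-05"), (12, "2014-01-01", "2014-03-02"),
        (14, "2014-02-14", "2014-03-02"), (4, "2014-02-14", "2014-03-02"),
    ],
)

# Each date a is joined to every date b on or before it; summing b.CT gives the running total.
query = """
    WITH cte AS (SELECT RS_Date, COUNT(DISTINCT M_Id) AS CT
                 FROM Table1
                 GROUP BY RS_Date)
    SELECT a.RS_Date, SUM(b.CT)
    FROM cte a
    LEFT JOIN cte b ON a.RS_Date >= b.RS_Date
    GROUP BY a.RS_Date
    ORDER BY a.RS_Date
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The join grows quadratically with the number of distinct dates, which is the usual trade-off of this pre-window-function technique.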
You need a cumulative sum, which is easy in SQL Server 2012+ using windowed aggregate functions. Based on your description, this will return the expected result:
SELECT p_id, RS_Date,
SUM(COUNT(*))
OVER (PARTITION BY p_id
ORDER BY RS_Date
ROWS UNBOUNDED PRECEDING)
FROM tab
GROUP BY p_id, RS_Date
It looks like you want something like this:
SELECT
RS_Date,
SUM(c) OVER (PARTITION BY M_StartDate ORDER BY RS_Date ROWS UNBOUNDED PRECEDING)
FROM
(
SELECT M_StartDate, RS_Date, COUNT(DISTINCT M_Id) AS c
FROM my_table
GROUP BY M_StartDate, RS_Date
) counts
The inline view computes the counts of distinct M_Id values within each (M_StartDate, RS_Date) group (distinctness enforced only within the group), and the outer query uses the analytic version of SUM() to add up the counts within each M_StartDate.
Note that this particular query will not exactly reproduce your example results. It will instead produce:
RS_Date Active
-----------------
1/1/2014 5
1/5/2014 7
3/2/2014 8
3/2/2014 2
This is on account of some rows in your example data with RS_Date 3/2/2014 having a later M_StartDate than others. If this is not what you want then you need to clarify the question, which currently seems a bit inconsistent.
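The per-M_StartDate running total can be reproduced in a minimal sketch with Python's sqlite3 module, assuming SQLite 3.25 or later as a stand-in for SQL Server 2012+ and ISO date text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (M_Id INTEGER, M_StartDate TEXT, RS_Date TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        (5, "2014-01-01", "2014-01-01"), (9, "2014-01-01", "2014-01-01"),
        (3, "2014-01-01", "2014-01-01"), (11, "2014-01-01", "2014-01-01"),
        (2, "2014-01-01", "2014-01-01"), (7, "2014-01-01", "2014-01-05"),
        (1, "2014-01-01", "2014-01-05"), (12, "2014-01-01", "2014-03-02"),
        (14, "2014-02-14", "2014-03-02"), (4, "2014-02-14", "2014-03-02"),
    ],
)

# Counts per (M_StartDate, RS_Date), then a running SUM restarted for each M_StartDate.
query = """
    SELECT RS_Date,
           SUM(c) OVER (PARTITION BY M_StartDate ORDER BY RS_Date
                        ROWS UNBOUNDED PRECEDING) AS Active
    FROM (SELECT M_StartDate, RS_Date, COUNT(DISTINCT M_Id) AS c
          FROM my_table
          GROUP BY M_StartDate, RS_Date) counts
    ORDER BY M_StartDate, RS_Date
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The two 3/2/2014 rows (8 and 2) come from the two different M_StartDate partitions, as described above.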
Unfortunately, windowed aggregates with ORDER BY, which are what make running totals easy, are not available until SQL Server 2012. In SQL Server 2008 (version 10), the job is messier. It could be done like this:
WITH gc AS (
SELECT M_StartDate, RS_Date, COUNT(DISTINCT M_Id) AS c
FROM my_table
GROUP BY M_StartDate, RS_Date
)
SELECT
RS_Date,
(
SELECT SUM(c)
FROM gc AS gc2
WHERE gc2.M_StartDate = gc.M_StartDate AND gc2.RS_Date <= gc.RS_Date
) AS Active
FROM gc
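The correlated-subquery version (with the inner reference written as gc AS gc2) can be checked in a minimal sketch with Python's sqlite3 module. It uses no window functions, so it runs on any SQLite version; SQLite as a stand-in for SQL Server 2008 and ISO date text are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (M_Id INTEGER, M_StartDate TEXT, RS_Date TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        (5, "2014-01-01", "2014-01-01"), (9, "2014-01-01", "2014-01-01"),
        (3, "2014-01-01", "2014-01-01"), (11, "2014-01-01", "2014-01-01"),
        (2, "2014-01-01", "2014-01-01"), (7, "2014-01-01", "2014-01-05"),
        (1, "2014-01-01", "2014-01-05"), (12, "2014-01-01", "2014-03-02"),
        (14, "2014-02-14", "2014-03-02"), (4, "2014-02-14", "2014-03-02"),
    ],
)

# For each group, sum the counts of all groups with the same M_StartDate and an earlier-or-equal RS_Date.
query = """
    WITH gc AS (SELECT M_StartDate, RS_Date, COUNT(DISTINCT M_Id) AS c
                FROM my_table
                GROUP BY M_StartDate, RS_Date)
    SELECT RS_Date,
           (SELECT SUM(c)
            FROM gc AS gc2
            WHERE gc2.M_StartDate = gc.M_StartDate AND gc2.RS_Date <= gc.RS_Date) AS Active
    FROM gc
    ORDER BY gc.M_StartDate, RS_Date
"""
rows = conn.execute(query).fetchall()
print(rows)
```

It produces the same rows as the windowed version, just with a subquery evaluated once per group.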
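If you are using SQL Server 2012 or newer, the LAG window function is also available. Note that LAG only reads a value from a preceding row, so the running total itself is usually computed with SUM() OVER (ORDER BY ...).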
https://msdn.microsoft.com/en-us/library/hh231256(v=sql.110).aspx