Group by based on field length - sql

I want to count, for each year, the number of IDs whose length is 4, 5, or 6 characters.
ID      year  name    location  geo   new_loc   addr 1  addr 2  addr 3  addr 4
------  ----  ------  --------  ----  --------  ------  ------  ------  ------
12345   2019  bob     UK        UK-4  basic     dat1    dat11   dat13   dat123
19804   2004  sam     US        US-1  advanced  dat2    dat21   dat23   dat233
19      2000  lister  EU        EU    basic     dat3    dat31   dat33   dat333
190838  2004  harold  US        US-3  basic     dat4    dat41   dat53   dat533
11804   2019  beanie  SK        UK-2  advanced  NULL    NULL    NULL    NULL
Output
ID      year  name    location  new location  num_of_ids_each_year
------  ----  ------  --------  ------------  --------------------
12345   2019  bob     UK        basic         2
11804   2019  beanie  SK        advanced      2
19804   2004  sam     US        advanced      2
190838  2004  harold  US        basic         2
What I tried:
select ID, year, name, location, [new location], count(year)
from table1
group by ID, year, name, location, [new location], count(year);
Could someone advise on how to include only those IDs that are 4, 5, or 6 characters long?

You can use COUNT() with Partition by Year to get the results without using GROUP BY.
SELECT ID, [year], [name], [location], [new location]
, COUNT(1) OVER (PARTITION BY year) AS num_of_ids_each_year
FROM table1
WHERE LEN(ID) IN (4,5,6)
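To check the idea outside SQL Server, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a SQLite build with window functions (3.25 or later), stores ID as text, and uses LENGTH() where SQL Server has LEN(); only three of the question's columns are loaded to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (ID TEXT, year INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        ("12345", 2019, "bob"),
        ("19804", 2004, "sam"),
        ("19", 2000, "lister"),
        ("190838", 2004, "harold"),
        ("11804", 2019, "beanie"),
    ],
)

# LENGTH() is SQLite's counterpart of SQL Server's LEN().
# WHERE is applied before the window function, so the 2-character ID
# is dropped before the per-year count is computed.
rows = conn.execute(
    """
    SELECT ID, year, name,
           COUNT(*) OVER (PARTITION BY year) AS num_of_ids_each_year
    FROM table1
    WHERE LENGTH(ID) IN (4, 5, 6)
    ORDER BY year DESC, ID DESC
    """
).fetchall()

for row in rows:
    print(row)
```

Every surviving row keeps its own columns and also carries the count for its year, which is why no GROUP BY is needed.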

Thanks @Squirrel, I finally found a way.
select id, Year, name, location, [new location],
count(id) over (partition by year) as num_of_ids_each_year
from table1 where len(id) in (4,5,6);

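You can also keep the GROUP BY and put the length filter in a HAVING clause. Note that LEN() is a scalar function, not an aggregate, so the same filter would work equally well in a WHERE clause, and this query returns the matching rows without the per-year count.
e.g.
select ID,
[year],
name,
location,
[new location],
len(ID) as id_length
from table1
group by ID, [year], name, location, [new location]
having len(ID) in (4, 5, 6)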

Related

find out the player with highest score in each year

I have a table like these
country  gender  player   score  year
-------  ------  -------  -----  ----
Germany  male    Michael  14     1990
Austria  male    Simon    13     1990
Germany  female  Mila     16     1990
Austria  female  Simona   15     1990
This is a table in the database. It covers 70 countries around the world, with player names and gender, and shows how many goals each player scored in each year. The years run from 1990 to 2015, so the table is large. I would like to know which female player and which male player scored the most in each year from 2010 to 2012.
I expect this:
gender  player   score  year
------  -------  -----  ----
male    Michael  24     2010
male    Simon    19     2011
male    Milos    19     2012
female  Mara     16     2010
female  Simona   16     2011
female  Dania    17     2012
I used that code but got an error
SELECT gender,year,player, max(score) as score from (football) where player = max(score) and year in ('2010','2011','2012') group by 1,2,3
football is the table name
with main as (
select
gender,
player,
year,
sum(score) as total_score -- in case a player played multiple matches in a year
from football
where year between 2010 and 2012
group by 1,2,3
),
ranking as (
select *,
row_number() over(partition by year, gender order by total_score desc) as rank_
from main
)
select
gender,
player,
year,
total_score
from ranking where rank_ = 1
First, filter on the years you need.
Then sum the score per player, so that you cover the case where the same player played multiple matches in the same year.
Then create a rank within each year and gender, ordered by the total score from highest to lowest.
Finally, filter on rank_ = 1, as it represents the highest score in each year and gender.
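The steps above can be tried end to end with Python's sqlite3 module. This is only a sketch: the rows below are made up for illustration (they are not the asker's data), and it assumes SQLite 3.25 or later for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE football (gender TEXT, player TEXT, score INTEGER, year INTEGER)"
)
conn.executemany(
    "INSERT INTO football VALUES (?, ?, ?, ?)",
    [
        # Hypothetical rows; Michael has two entries in 2010.
        ("male", "Michael", 14, 2010),
        ("male", "Michael", 10, 2010),
        ("male", "Simon", 19, 2010),
        ("female", "Mara", 16, 2010),
        ("female", "Mila", 12, 2010),
        ("male", "Simon", 19, 2011),
        ("male", "Milos", 9, 2011),
        ("female", "Simona", 16, 2011),
        ("male", "Michael", 99, 1990),  # outside the year filter
    ],
)

top_scorers = conn.execute(
    """
    WITH main AS (
        SELECT gender, player, year, SUM(score) AS total_score
        FROM football
        WHERE year BETWEEN 2010 AND 2012
        GROUP BY gender, player, year
    ),
    ranking AS (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY year, gender ORDER BY total_score DESC
               ) AS rank_
        FROM main
    )
    SELECT gender, player, year, total_score
    FROM ranking
    WHERE rank_ = 1
    ORDER BY gender DESC, year
    """
).fetchall()

for row in top_scorers:
    print(row)
```

Michael wins 2010 only because his two rows are summed to 24 before ranking, which is the reason for the aggregation step.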
You can use the dense_rank function to achieve this, if you are using sqlite version 3.25 or higher.
Query
select t.* from(
select *, dense_rank() over(
partition by year, gender
order by score desc
) as rn
from football
where year in ('2010','2011','2012')
) as t
where t.rn = 1;
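One practical difference between the two answers is how ties are handled: dense_rank() gives every tied top scorer rank 1, while row_number() keeps exactly one of them. A small sketch with made-up rows, assuming SQLite 3.25 or later; top_players is a hypothetical helper written only for this comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE football (gender TEXT, player TEXT, score INTEGER, year INTEGER)"
)
conn.executemany(
    "INSERT INTO football VALUES (?, ?, ?, ?)",
    [
        # Made-up rows: PlayerA and PlayerB tie for the top score.
        ("male", "PlayerA", 20, 2010),
        ("male", "PlayerB", 20, 2010),
        ("male", "PlayerC", 10, 2010),
    ],
)


def top_players(rank_function):
    """Return the players ranked 1 by the given window function name."""
    query = f"""
        SELECT player FROM (
            SELECT player,
                   {rank_function}() OVER (
                       PARTITION BY year, gender ORDER BY score DESC
                   ) AS rn
            FROM football
        ) AS t
        WHERE t.rn = 1
        ORDER BY player
    """
    return [row[0] for row in conn.execute(query)]


dense_top = top_players("DENSE_RANK")
rownum_top = top_players("ROW_NUMBER")
print(dense_top)        # both tied players
print(len(rownum_top))  # a single player; which one is not guaranteed
```

If you want one row per year and gender no matter what, row_number() is the safer choice; if tied players should all be listed, use dense_rank().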

How can I join the SUMS from 2 different tables into 1

I have 2 tables
Table 1 = LOG
Site Year Quarter SF Seats
------ ------ --------- ------ -------
NYC 2019 Q1 1000 34
NYC 2019 Q1 1289 98
CHI 2019 Q1 976 17
NYC 2019 Q2 3985 986
Table 2 = Headcount
Site Year Quarter HC
------ ------ --------- -------
NYC 2019 Q1 63
NYC 2019 Q1 34
CHI 2019 Q1 73
NYC 2019 Q2 23
I need to be able to join these tables together and display the sum of SF, Seats, and HC for each distinct Site, Quarter, and Year
For example the output should be:
Site Year Quarter HC SF Seats
------ ------ --------- ------- ------ -------
NYC 2019 Q1 97 2289 132
NYC 2019 Q2 23 3985 986
CHI 2019 Q1 73 976 17
Here is my SQL Query:
SELECT DISTINCT SITE,
YEAR,
QUARTER,
SEATS,
SF,
HC
FROM
(SELECT DISTINCT site SITE,
YEAR YEAR,
quarter QUARTER,
sum(SEATS) SEATS,
sum(SF) SF
FROM Headcount
GROUP BY SITE,
YEAR,
QUARTER) A
CROSS JOIN
(SELECT DISTINCT sum(HC) HC
FROM Headcount
GROUP BY site,
YEAR,
quarter, HC) C
But I am getting this error message "Column HC contains an aggregation function, which is not allowed in GROUP BY"
Any idea what I'm doing wrong and why this query isn't working?
The reason for the error is that in the last sub query you have HC in the group by clause, while you also aggregate with sum(HC). That is not allowed. It should be one or the other.
However, a cross join will combine all rows from the first sub query, with all rows from the second. Surely this is not what you need.
Also, distinct is not needed when you use group by. You cannot get duplicates with group by.
I would suggest using union all:
SELECT SITE,
YEAR,
QUARTER,
SUM(HC),
SUM(SEATS),
SUM(SF)
FROM (
SELECT SITE,
YEAR,
QUARTER,
HC,
null AS SEATS,
null AS SF
FROM Headcount
UNION ALL
SELECT SITE,
YEAR,
QUARTER,
null,
SEATS,
SF
FROM Log
) AS base
GROUP BY SITE,
YEAR,
QUARTER
With an N-M relationship between the two tables, you need to do the aggregation in subqueries first, and then join the results together:
SELECT h.*, l.SF, l.Seats
FROM
(
SELECT site, year, quarter, SUM(SF) SF, SUM(Seats) Seats
FROM LOG
GROUP BY site, year, quarter
) l
INNER JOIN (
SELECT site, year, quarter, SUM(HC) HC
FROM Headcount
GROUP BY site, year, quarter
) h
ON h.site = l.site AND h.year = l.year AND h.quarter = l.quarter
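To see why the aggregation has to happen before the join, here is a sketch with the question's sample rows loaded into SQLite through Python's sqlite3 module. The "naive" query is not from either answer; it is included only to show the inflated sums you get when both tables have several rows per site, year, and quarter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Log (site TEXT, year INTEGER, quarter TEXT, SF INTEGER, Seats INTEGER);
    CREATE TABLE Headcount (site TEXT, year INTEGER, quarter TEXT, HC INTEGER);
    INSERT INTO Log VALUES
        ('NYC', 2019, 'Q1', 1000, 34),
        ('NYC', 2019, 'Q1', 1289, 98),
        ('CHI', 2019, 'Q1', 976, 17),
        ('NYC', 2019, 'Q2', 3985, 986);
    INSERT INTO Headcount VALUES
        ('NYC', 2019, 'Q1', 63),
        ('NYC', 2019, 'Q1', 34),
        ('CHI', 2019, 'Q1', 73),
        ('NYC', 2019, 'Q2', 23);
    """
)

# Join first, aggregate afterwards: each NYC Q1 row in Log is paired
# with both NYC Q1 rows in Headcount, so every sum is doubled.
naive = conn.execute(
    """
    SELECT l.site, l.year, l.quarter, SUM(h.HC), SUM(l.SF), SUM(l.Seats)
    FROM Log l
    JOIN Headcount h
      ON h.site = l.site AND h.year = l.year AND h.quarter = l.quarter
    GROUP BY l.site, l.year, l.quarter
    ORDER BY l.site, l.quarter
    """
).fetchall()

# Aggregate each table first, then join the one-row-per-group results.
correct = conn.execute(
    """
    SELECT h.site, h.year, h.quarter, h.HC, l.SF, l.Seats
    FROM (SELECT site, year, quarter, SUM(SF) AS SF, SUM(Seats) AS Seats
          FROM Log GROUP BY site, year, quarter) l
    JOIN (SELECT site, year, quarter, SUM(HC) AS HC
          FROM Headcount GROUP BY site, year, quarter) h
      ON h.site = l.site AND h.year = l.year AND h.quarter = l.quarter
    ORDER BY h.site, h.quarter
    """
).fetchall()

print(naive)
print(correct)
```

Only the second query reproduces the expected NYC Q1 row (HC 97, SF 2289, Seats 132). Keep in mind that an inner join drops any site, year, and quarter that appears in only one of the tables, whereas the UNION ALL version keeps it.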

SQL COUNT the number of purchases between a customer's first purchase and the following 10 months

Every customer has a different first-time purchase date. I want to COUNT the number of purchases each customer made in the 10 months following their first purchase.
sample table
TransactionID Client_name PurchaseDate Revenue
11 John Lee 10/13/2014 327
12 John Lee 9/15/2015 873
13 John Lee 11/29/2015 1,938
14 Rebort Jo 8/18/2013 722
15 Rebort Jo 5/21/2014 525
16 Rebort Jo 2/4/2015 455
17 Rebort Jo 3/20/2016 599
18 Tina Pe 10/8/2014 213
19 Tina Pe 6/10/2016 3,494
20 Tina Pe 8/9/2016 411
My code below just uses the ROW_NUMBER function to identify the first purchase, but I don't know how to do the calculation. Or is there a better way to do it?
SELECT client_name,
purchasedate,
Dateadd(month, 10, purchasedate) TenMonth,
Row_number()
OVER (
partition BY client_name
ORDER BY client_name) RM
FROM mytable
You might try something like this - I assume you're using SQL Server from the presence of DATEADD() and the fact that you're using a window function (ROW_NUMBER()):
WITH myCTE AS (
SELECT TransactionID, Client_name, PurchaseDate, Revenue
, MIN(PurchaseDate) OVER ( PARTITION BY Client_name ) AS min_PurchaseDate
FROM myTable
)
SELECT Client_name, COUNT(*)
FROM myCTE
WHERE PurchaseDate <= DATEADD(month, 10, min_PurchaseDate)
GROUP BY Client_name
Here I'm creating a common table expression (CTE) with all the data, including the date of first purchase, then I grab a count of all the purchases within a 10-month timeframe.
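The same CTE approach can be checked against the sample rows with Python's sqlite3 module. This is a sketch under a few assumptions: SQLite 3.25 or later, dates stored as ISO yyyy-mm-dd text so they compare correctly as strings, and date(x, '+10 months') standing in for SQL Server's DATEADD(month, 10, x).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (TransactionID INTEGER, Client_name TEXT, "
    "PurchaseDate TEXT, Revenue INTEGER)"
)
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?, ?)",
    [
        (11, "John Lee", "2014-10-13", 327),
        (12, "John Lee", "2015-09-15", 873),
        (13, "John Lee", "2015-11-29", 1938),
        (14, "Rebort Jo", "2013-08-18", 722),
        (15, "Rebort Jo", "2014-05-21", 525),
        (16, "Rebort Jo", "2015-02-04", 455),
        (17, "Rebort Jo", "2016-03-20", 599),
        (18, "Tina Pe", "2014-10-08", 213),
        (19, "Tina Pe", "2016-06-10", 3494),
        (20, "Tina Pe", "2016-08-09", 411),
    ],
)

counts = conn.execute(
    """
    WITH myCTE AS (
        SELECT Client_name, PurchaseDate,
               MIN(PurchaseDate) OVER (PARTITION BY Client_name) AS min_PurchaseDate
        FROM myTable
    )
    SELECT Client_name, COUNT(*) AS purchases_in_first_10_months
    FROM myCTE
    WHERE PurchaseDate <= date(min_PurchaseDate, '+10 months')
    GROUP BY Client_name
    ORDER BY Client_name
    """
).fetchall()

for row in counts:
    print(row)
```

Note that the count includes the first purchase itself; subtract 1, or add PurchaseDate > min_PurchaseDate to the WHERE clause, if only the follow-up purchases should be counted.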
Hope this helps.
Give this a whirl ... Subquery to get the min purchase date per client, then LEFT JOIN it to the main table on the client name so the WHERE clause can apply the ten-month date range, then count.
SELECT mt.Client_name, COUNT(mt.PurchaseDate) as PurchaseCountFirstTenMonths
FROM myTable mt
LEFT JOIN (
SELECT Client_name, MIN(PurchaseDate) as MinPurchaseDate FROM myTable GROUP BY Client_name) mtmin
ON mt.Client_name = mtmin.Client_name
WHERE mt.PurchaseDate >= mtmin.MinPurchaseDate AND mt.PurchaseDate <= DATEADD(month, 10, mtmin.MinPurchaseDate)
GROUP BY mt.Client_name
ORDER BY mt.Client_name
By the way, I'm guessing there's some kind of ClientID involved, as grouping on a short full name runs the risk of duplicates.

Select unique record for each person based on 1 column

I have data like this. Sometimes there are 2 records for a person, 1 with a mailing address and 1 with a non-mailing address; sometimes there is only 1 record, and it could be either a mailing address or a non-mailing address.
UniqueID,FirstName,LastName,DOB,House Number,City,State,Mailing
4444,George,Jetson,10/10/55,800,Orbit City,Space,0
4444,George,Jetson,10/10/55,555,Orbit City,Space,1
5555,Fred,Flintstone,12/12/04,88,Bedrock,PH,0
5555,Fred,Flintstone,12/12/04,100,Bedrock,PH,1
6666,Barney,Rubble,7/7/07,999,Bedrock,PH,0
7777,Jonny,Quest,5/30/64,343,Action City,KS,1
I'm trying to make a query that will return 1 row for each person and prefer the mailing address if it exists. So ideally the query would return these records
4444,George,Jetson,10/10/55,555,Orbit City,Space,1
5555,Fred,Flintstone,12/12/04,100,Bedrock,PH,1
6666,Barney,Rubble,7/7/07,999,Bedrock,PH,0
7777,Jonny,Quest,5/30/64,343,Action City,KS,1
Does anyone have any suggestions, based on some of the articles I've been reading I am thinking maybe I need to have a subquery? I was getting stuck at the OVER PARTITION BY part in the examples I was reading, or should I have some sort of IF statement? I'm kind of new to SQL, so thanks for any direction or help.
You can also express this query as:
select *
from tablename t
where mailing = 1
union all
select *
from tablename t
where not exists (select 1 from tablename t2 where t2.uniqueid = t.uniqueid and t2.mailing = 1);
With SQL-Server you can use ROW_NUMBER, for example with a CTE:
WITH CTE AS
(
SELECT UniqueID, FirstName, LastName, DOB, [House Number], City, State, Mailing,
rn = ROW_NUMBER() OVER (PARTITION BY UniqueID ORDER BY Mailing DESC)
FROM dbo.TableName
)
SELECT UniqueID, FirstName, LastName, DOB, [House Number], City, State, Mailing
FROM CTE
WHERE rn = 1
Here's a fiddle: http://sqlfiddle.com/#!3/886b0/5/0
UNIQUEID FIRSTNAME LASTNAME DOB HOUSE NUMBER CITY STATE MAILING
4444 George Jetson October, 10 1955 00:00:00+0000 555 Orbit City Space 1
5555 Fred Flintstone December, 12 2004 00:00:00+0000 100 Bedrock PH 1
6666 Barney Rubble July, 07 2007 00:00:00+0000 999 Bedrock PH 0
7777 Jonny Quest May, 30 1964 00:00:00+0000 343 Action City KS 1
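In case the fiddle link goes away, the ROW_NUMBER approach can be reproduced locally with Python's sqlite3 module. This sketch assumes SQLite 3.25 or later, keeps only a few of the columns, and renames [House Number] to HouseNumber to avoid quoting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TableName (UniqueID INTEGER, FirstName TEXT, "
    "HouseNumber INTEGER, Mailing INTEGER)"
)
conn.executemany(
    "INSERT INTO TableName VALUES (?, ?, ?, ?)",
    [
        (4444, "George", 800, 0),
        (4444, "George", 555, 1),
        (5555, "Fred", 88, 0),
        (5555, "Fred", 100, 1),
        (6666, "Barney", 999, 0),
        (7777, "Jonny", 343, 1),
    ],
)

# Within each person, Mailing = 1 sorts ahead of Mailing = 0, so rn = 1
# is the mailing address when there is one and the only row otherwise.
preferred = conn.execute(
    """
    WITH CTE AS (
        SELECT UniqueID, FirstName, HouseNumber, Mailing,
               ROW_NUMBER() OVER (
                   PARTITION BY UniqueID ORDER BY Mailing DESC
               ) AS rn
        FROM TableName
    )
    SELECT UniqueID, FirstName, HouseNumber, Mailing
    FROM CTE
    WHERE rn = 1
    ORDER BY UniqueID
    """
).fetchall()

for row in preferred:
    print(row)
```

If a person could have two mailing rows, add a second ORDER BY column inside the OVER clause so the choice between them is deterministic.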

How to select items according to their sums in SQL?

I've got the following table:
ID Name Sales
1 Kalle 1
2 Kalle -1
3 Simon 10
4 Simon 20
5 Anna 11
6 Anna 0
7 Tina 0
I want to write a SQL query that only returns the rows that
represent a salesperson with sum of sales > 0.
ID Name Sales
3 Simon 10
4 Simon 20
5 Anna 11
6 Anna 0
Is this possible?
You can easily get the names of the people whose sum of sales is greater than 0 by using a HAVING clause:
select name
from yourtable
group by name
having sum(sales) > 0;
This query will return both Simon and Anna, then if you want to return all of the details for each of these names you can use the above in a WHERE clause to get the final result:
select id, name, sales
from yourtable
where name in (select name
from yourtable
group by name
having sum(sales) > 0);
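Here is a quick way to confirm the result against the sample data, sketched with Python's sqlite3 module (the table name yourtable is the same placeholder used in the queries above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (id INTEGER, name TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [
        (1, "Kalle", 1),
        (2, "Kalle", -1),
        (3, "Simon", 10),
        (4, "Simon", 20),
        (5, "Anna", 11),
        (6, "Anna", 0),
        (7, "Tina", 0),
    ],
)

# The inner query keeps only names whose total is positive; the outer
# query then returns every row belonging to those names.
result = conn.execute(
    """
    SELECT id, name, sales
    FROM yourtable
    WHERE name IN (SELECT name
                   FROM yourtable
                   GROUP BY name
                   HAVING SUM(sales) > 0)
    ORDER BY id
    """
).fetchall()

for row in result:
    print(row)
```

Kalle (1 + -1 = 0) and Tina (0) are filtered out, while Anna's zero-sales row is kept because her total is 11.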
You can also do it with a join. I think the join will be more efficient than the WHERE name IN (...) clause, although that depends on the database's optimizer.
SELECT Sales.id, Sales.name, Sales.sales
FROM Sales
JOIN (SELECT name FROM Sales GROUP BY name HAVING SUM(sales) > 0) AS Sales2 ON Sales2.name = Sales.name
This will work on databases that support window functions, such as Oracle, SQL Server, and DB2:
SELECT ID, Name, Sales
FROM
(
SELECT ID, Name, Sales, sum(sales) over (partition by name) sum1
FROM <table>
) a
WHERE sum1 > 0