SQL Query to apply a command to multiple rows - sql

I am new to SQL and trying to write a statement similar to a 'for loop' in other languages and am stuck. I want to filter out rows of the table where for all of attribute 1, attribute2=attribute3 without using functions.
For example:
| Year | Month | Day|
| 1 | 1 | 1 |
| 1 | 2 | 2 |
| 1 | 4 | 4 |
| 2 | 3 | 4 |
| 2 | 3 | 3 |
| 2 | 4 | 4 |
| 3 | 4 | 4 |
| 3 | 4 | 4 |
| 3 | 4 | 4 |
I would only want the row
| Year | Month | Day|
|:---- |:------:| -----:|
| 3 | 4 | 4 |
because it is the only where month and day are equal for all of the values of year they share.
So far I have
select year, month, day from dates
where month=day
but unsure how to apply the constraint for all of year

-- month/day need to appear in aggregate functions (since they are not in the GROUP BY clause),
-- but the HAVING clause ensure we only have 1 month/day value (per year) here, so MIN/AVG/SUM/... would all work too
SELECT year, MAX(month), MAX(day)
FROM my_table
GROUP BY year
HAVING COUNT(DISTINCT (month, day)) = 1;
year
max
max
3
4
4
View on DB Fiddle

So one way would be
select distinct [year], [month], [day]
from [Table] t
where [month]=[day]
and not exists (
select * from [Table] x
where t.[year]=x.[year] and t.[month] <> x.[month] and t.[day] <> x.[day]
)
And another way would be
select distinct [year], [month], [day] from (
select *,
Lead([month],1) over(partition by [year] order by [month])m2,
Lead([day],1) over(partition by [year] order by [day])d2
from [table]
)x
where [month]=m2 and [day]=d2

Related

How to add records for each user based on another existing row in BigQuery?

Posting here in case someone with more knowledge than may be able to help me with some direction.
I have a table like this:
| Row | date |user id | score |
-----------------------------------
| 1 | 20201120 | 1 | 26 |
-----------------------------------
| 2 | 20201121 | 1 | 14 |
-----------------------------------
| 3 | 20201125 | 1 | 0 |
-----------------------------------
| 4 | 20201114 | 2 | 32 |
-----------------------------------
| 5 | 20201116 | 2 | 0 |
-----------------------------------
| 6 | 20201120 | 2 | 23 |
-----------------------------------
However, from this, I need to have a record for each user for each day where if a day is missing for a user, then the last score recorded should be maintained then I would have something like this:
| Row | date |user id | score |
-----------------------------------
| 1 | 20201120 | 1 | 26 |
-----------------------------------
| 2 | 20201121 | 1 | 14 |
-----------------------------------
| 3 | 20201122 | 1 | 14 |
-----------------------------------
| 4 | 20201123 | 1 | 14 |
-----------------------------------
| 5 | 20201124 | 1 | 14 |
-----------------------------------
| 6 | 20201125 | 1 | 0 |
-----------------------------------
| 7 | 20201114 | 2 | 32 |
-----------------------------------
| 8 | 20201115 | 2 | 32 |
-----------------------------------
| 9 | 20201116 | 2 | 0 |
-----------------------------------
| 10 | 20201117 | 2 | 0 |
-----------------------------------
| 11 | 20201118 | 2 | 0 |
-----------------------------------
| 12 | 20201119 | 2 | 0 |
-----------------------------------
| 13 | 20201120 | 2 | 23 |
-----------------------------------
I'm trying to to this in BigQuery using StandardSQL. I have an idea of how to keep the same score across following empty dates, but I really don't know how to add new rows for missing dates for each user. Also, just to keep in mind, this example only has 2 users, but in my data I have more than 1500.
My end goal would be to show something like the average of the score per day. For background, because of our logic, if the score wasn't recorded in a specific day, this means that the user is still in the last score recorded which is why I need a score for every user every day.
I'd really appreciate any help I could get! I've been trying different options without success
Below is for BigQuery Standard SQL
#standardSQL
select date, user_id,
last_value(score ignore nulls) over(partition by user_id order by date) as score
from (
select user_id, format_date('%Y%m%d', day) date,
from (
select user_id, min(parse_date('%Y%m%d', date)) min_date, max(parse_date('%Y%m%d', date)) max_date
from `project.dataset.table`
group by user_id
) a, unnest(generate_date_array(min_date, max_date)) day
)
left join `project.dataset.table` b
using(date, user_id)
-- order by user_id, date
if applied to sample data from your question - output is
One option uses generate_date_array() to create the series of dates of each user, then brings the table with a left join.
select d.date, d.user_id,
last_value(t.score ignore nulls) over(partition by d.user_id order by d.date) as score
from (
select t.user_id, d.date
from mytable t
cross join unnest(generate_date_array(min(date), max(date), interval 1 day)) d(date)
group by t.user_id
) d
left join mytable t on t.user_id = d.user_id and t.date = d.date
I think the most efficient method is to use generate_date_array() but in a very particular way:
with t as (
select t.*,
date_add(lead(date) over (partition by user_id order by date), interval -1 day) as next_date
from t
)
select row_number() over (order by t.user_id, dte) as id,
t.user_id, dte, t.score
from t cross join join
unnest(generate_date_array(date,
coalesce(next_date, date)
interval 1 day
)
) dte;

SQL query grouping by range

Hi have a table A with the following data:
+------+-------+----+--------+
| YEAR | MONTH | PA | AMOUNT |
+------+-------+----+--------+
| 2020 | 1 | N | 100 |
+------+-------+----+--------+
| 2020 | 2 | N | 100 |
+------+-------+----+--------+
| 2020 | 3 | O | 100 |
+------+-------+----+--------+
| 2020 | 4 | N | 100 |
+------+-------+----+--------+
| 2020 | 5 | N | 100 |
+------+-------+----+--------+
| 2020 | 6 | O | 100 |
+------+-------+----+--------+
I'd like to have the following result:
+---------+---------+--------+
| FROM | TO | AMOUNT |
+---------+---------+--------+
| 2020-01 | 2020-02 | 200 |
+---------+---------+--------+
| 2020-03 | 2020-03 | 100 |
+---------+---------+--------+
| 2020-04 | 2020-05 | 200 |
+---------+---------+--------+
| 2020-06 | 2020-06 | 100 |
+---------+---------+--------+
My DB is DB2/400.
I have tried with ROW_NUMBER partitioning, subqueries but I can't figure out how to solve this.
I understand this as a gaps-and-island problem, where you want to group together adjacent rows that have the same PA.
Here is an approach using the difference between row numbers to build the groups:
select min(year_month) year_month_start, max(year_month) year_month_end, sum(amount) amount
from (
select a.*, year * 100 + month year_month
row_number() over(order by year, month) rn1,
row_number() over(partition by pa order by year, month) rn2
from a
) a
group by rn1 - rn2
order by year_month_start
You can try the below -
select min(year)||'-'||min(month) as from_date,max(year)||'-'||max(month) as to_date,sum(amount) as amount from
(
select *,row_number() over(order by month)-
row_number() over(partition by pa order by month) as grprn
from t1
)A group by grprn,pa order by grprn
This works in tsql, guess you can adaot it to db2-400?
SELECT MIN(Dte) [From]
, MAX(Dte) [To]
-- ,PA
, SUM(Amount)
FROM (
SELECT year * 100 + month Dte
, Pa
, Amount
, ROW_NUMBER() OVER (PARTITION BY pa ORDER BY year * 100 + month) +
10000- (YEar*100+Month) rn
FROM tabA a
) b
GROUP BY Pa
, rn
ORDER BY [From]
, [To]
The trick is the row number function partitioned by PA and ordered by date, This'll count one up for each month for the, when added to the descnding count of month and month, you will get the same number for consecutive months with same PA. You the group by PA and the grouping yoou made, rn, to get the froups, and then Bob's your uncle.

How to sum up values for specific names in each row?

I am new to SQL and was not able to solve the following problem: I've got a column of names [name]), a column of integer values that I wanna sum up ([Values]) and another column of integer values ([Day]). I want to sum up the values grouped by name for each day. So for example if there is a name "Chris" with value 4 on day 1 and there is another entry "Chris" with value 2 on day 3, I want to show the sum of chris on day_1 (4) and on day_2 (4+2=6).
As in the example above ("chris") I wanna sum them up, showing the sum for each name on each day (the sum from day 1 until day x).
I was only able to sum up the values for each name per day (see code below) but this is not what I am searching for since I need to keep the structure of the database. Therefore I need to show the sum for each value in every row in a further column.
select name, day,
sum(value) over (partition by name order by day) total
from tablename
There is a table below showing what I want to achieve.
With sum() window function:
select *,
sum(value) over (partition by name order by day) SumValue,
sum(value2) over (partition by name2 order by day) SumValue2
from tablename
order by day
or:
select name, day, value, name2, value2,
sum(value) over (partition by name order by day) SumValue,
sum(value2) over (partition by name2 order by day) SumValue2
from (select *, row_number() over (order by day) rn from tablename) as t
order by rn
if you want to preserve the original sequence of the rows.
See the demo.
Results:
> name | day | value | name2 | value2 | SumValue | SumValue2
> :---- | --: | ----: | :---- | -----: | -------: | --------:
> Chris | 1 | 2 | Paul | 5 | 2 | 5
> Alice | 1 | 5 | Ken | 4 | 5 | 4
> Paul | 2 | 8 | Alice | 1 | 8 | 1
> Ken | 2 | 4 | Chris | 2 | 4 | 2
> Alice | 3 | 3 | Ken | 3 | 8 | 7
> Chris | 3 | 6 | Paul | 0 | 8 | 5

postgresql - cumul. sum active customers by month (removing churn)

I want to create a query to get the cumulative sum by month of our active customers. The tricky thing here is that (unfortunately) some customers churn and so I need to remove them from the cumulative sum on the month they leave us.
Here is a sample of my customers table :
customer_id | begin_date | end_date
-----------------------------------------
1 | 15/09/2017 |
2 | 15/09/2017 |
3 | 19/09/2017 |
4 | 23/09/2017 |
5 | 27/09/2017 |
6 | 28/09/2017 | 15/10/2017
7 | 29/09/2017 | 16/10/2017
8 | 04/10/2017 |
9 | 04/10/2017 |
10 | 05/10/2017 |
11 | 07/10/2017 |
12 | 09/10/2017 |
13 | 11/10/2017 |
14 | 12/10/2017 |
15 | 14/10/2017 |
Here is what I am looking to achieve :
month | active customers
-----------------------------------------
2017-09 | 7
2017-10 | 6
I've managed to achieve it with the following query ... However, I'd like to know if there are a better way.
select
"begin_date" as "date",
sum((new_customers.new_customers-COALESCE(churn_customers.churn_customers,0))) OVER (ORDER BY new_customers."begin_date") as active_customers
FROM (
select
date_trunc('month',begin_date)::date as "begin_date",
count(id) as new_customers
from customers
group by 1
) as new_customers
LEFT JOIN(
select
date_trunc('month',end_date)::date as "end_date",
count(id) as churn_customers
from customers
where
end_date is not null
group by 1
) as churn_customers on new_customers."begin_date" = churn_customers."end_date"
order by 1
;
You may use a CTE to compute the total end_dates and then subtract it from the counts of start dates by using a left join
SQL Fiddle
Query 1:
WITH edt
AS (
SELECT to_char(end_date, 'yyyy-mm') AS mon
,count(*) AS ct
FROM customers
WHERE end_date IS NOT NULL
GROUP BY to_char(end_date, 'yyyy-mm')
)
SELECT to_char(c.begin_date, 'yyyy-mm') as month
,COUNT(*) - MAX(COALESCE(ct, 0)) AS active_customers
FROM customers c
LEFT JOIN edt ON to_char(c.begin_date, 'yyyy-mm') = edt.mon
GROUP BY to_char(begin_date, 'yyyy-mm')
ORDER BY month;
Results:
| month | active_customers |
|---------|------------------|
| 2017-09 | 7 |
| 2017-10 | 6 |

SQL Server 2008 How can I SELECT rows with Value by GROUP(ing) other column?

Hi guys I'm a beginner at SQL so please bear with me.. :)
My question is as follows.
I got this table:
DateTime ID Year Month Value Cost
-------------------|------|--------|-------|-------|--------|
1-1-2013 00:00:01 | 1 | 2013 | 1 | 30 | 90 |
1-1-2013 00:01:01 | 1 | 2013 | 1 | 0 | 0 |
1-1-2013 00:02:01 | 1 | 2013 | 1 | 1 | 3 |
1-2-2013 00:00:01 | 1 | 2013 | 2 | 2 | 6 |
1-2-2013 00:01:01 | 1 | 2013 | 2 | 3 | 9 |
1-2-2013 00:02:01 | 1 | 2013 | 2 | 4 | 12 |
1-3-2013 00:00:01 | 1 | 2013 | 3 | 5 | 15 |
1-3-2013 00:01:01 | 1 | 2013 | 3 | 6 | 18 |
1-3-2013 00:02:01 | 1 | 2013 | 3 | 7 | 21 |
Now what I'm trying to get is this result
Year Month Value Cost
|--------|-------|-------|--------|
| 2013 | 1 | 1 | 3 |
| 2013 | 2 | 4 | 12 |
| 2013 | 3 | 7 | 21 |
As you can see I'm trying to GROUP BY the [Month] and the [Year] and to get the last [Value] for every [Month].
Now as you can understand from the result I do not try to get the MAX() value from the [Value] column but the last value for every [Month] and that is my issue..
Thanks in advance
PS
I was able to GROUP BY the [Year] and the [Month] but as I understand that when I adding the [Value] column the GROUP BY is not effecting the result, as the SQL need more spcification on the value you what the SQL to get..
Instead of using row_number(), you can also use rank(). Using rank() might give you multiple values within the same year and month, see this post.
Because of this, a group by is added.
SELECT
[Year],
[Month],
[Value],
[Cost]
FROM
(
SELECT
[Year],
[Month],
[Value],
[Cost],
Rank() OVER (PARTITION BY [Year], [Month] ORDER BY [DateTime] DESC) AS [Rank]
FROM [t1]
) AS [sub]
WHERE [Rank] = 1
GROUP BY
[Year],
[Month],
[Value],
[Cost]
ORDER BY
[Year] ASC,
[Month] ASC
As stated in the comments, this might still return multiple records for a single month. Therefor the ORDER BY statement can be extended, based on the desired functionality:
Rank() OVER (PARTITION BY [Year], [Month] ORDER BY [DateTime] DESC, [Value] DESC, [Cost] ASC) AS [Rank]
Switching the order of [Value] and [Cost] or ASC <> DESC will influence the rank and because of that the result.
Since you are using SQL Server 2008, you can use row_number() to get the result:
select year, month, value, cost
from
(
select year, month, value, cost,
row_number() over(partition by year, month order by datetime desc) rn
from yourtable
) src
where rn = 1
See SQL Fiddle with Demo
Or you can use a subquery to get this (note: with this version if you have more than one record with the same max datetime per month then you will return each record:
select t1.year, t1.month, t1.value, t1.cost
from yourtable t1
inner join
(
select max(datetime) datetime
from yourtable
group by year, month
) t2
on t1.datetime = t2.datetime
See SQL Fiddle with Demo
Both give the same result:
| YEAR | MONTH | VALUE | COST |
-------------------------------
| 2013 | 1 | 1 | 3 |
| 2013 | 2 | 4 | 12 |
| 2013 | 3 | 7 | 21 |