Pulling data with sysdate using for loop? - sql

I am pulling data and need a running, month-by-month count of people that fall within the effective and expiration date columns. For example, I need to get a count of members that fall within a given month:
m.memid | m.effective_dt | m.expiration_dt
--------+----------------+----------------
00010   | 3/1/14         | 7/31/15
00011   | 1/1/12         | 1/31/14
00012   | 10/1/13        | 1/31/15
select mar2015.nummem032015
     , apr2015.nummem042015
from (
       select count(distinct m.member) as nummem032015
       from m
       where to_date('31-mar-2015') between trunc(m.EFFECTIVE_DT)
                                         and trunc(m.EXPIRATION_DT)
     ) mar2015
   , (
       select count(distinct m.member) as nummem042015
       from m
       where to_date('30-apr-2015') between trunc(m.EFFECTIVE_DT)
                                         and trunc(m.EXPIRATION_DT)
     ) apr2015
I am looking for a way to not have to do this for each month.
The output should look something like this:
month   | count
--------+------
mar2015 | 1000
apr2015 | 2010
may2015 | 1900
Thank you.
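One way to avoid repeating the subquery for every month is to generate the month-end dates and join to them. A minimal sketch (assuming Oracle, the table/alias m and member column from the query above, and a hard-coded starting month and 12-month window, all of which are placeholders):
select to_char(cal.month_end, 'monyyyy') as month
     , count(distinct m.member)          as member_count
from (
       -- calendar of month-end dates, one row per month to report on
       select add_months(date '2015-03-31', level - 1) as month_end
       from dual
       connect by level <= 12            -- assumed 12-month reporting window
     ) cal
left join m
       on cal.month_end between trunc(m.effective_dt) and trunc(m.expiration_dt)
group by cal.month_end
order by cal.month_end;
Each generated month_end plays the role of the '31-mar-2015' and '30-apr-2015' literals above, so reporting on more months only means raising the level limit.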

Related

SQLite - Output count of all records per day including days with 0 records

I have a sqlite3 database maintained on an AWS exchange that is regularly updated by a Python script. One of the things it tracks is when any team generates a new post for a given topic. The entries look something like this:
id  | client          | team     | date       | industry     | city
----+-----------------+----------+------------+--------------+------------
895 | acme industries | blueteam | 2022-06-30 | construction | springfield
I'm trying to create a table that shows me how many entries for construction occur each day. Right now the dates that have entries populate, but dates with no entries are excluded. For example, if I run just
SELECT date, count(id) as num_records
from mytable
WHERE industry = "construction"
group by date
order by date asc
I'll get results that look like this:
date       | num_records
-----------+------------
2022-04-01 | 3
2022-04-04 | 1
How can I make SQLite output this instead:
date       | num_records
-----------+------------
2022-04-01 | 3
2022-04-02 | 0
2022-04-03 | 0
2022-04-04 | 1
I'm trying to generate some graphs from this data and need to be able to include all dates for the target timeframe.
EDIT/UPDATE:
The table does not already include every date; it only includes dates relevant to an entry. If no team posts work on a day, the date column will jump from day 1 (e.g. 2022-04-01) to day 3 (2022-04-03).
Assuming that your "mytable" table already contains every date you need, you can first select all of your dates, then LEFT JOIN your own query to them, and map any NULL values in "num_records" to 0 with the COALESCE function.
WITH cte AS (
SELECT date,
COUNT(id) AS num_records
FROM mytable
WHERE industry = "construction"
GROUP BY date
ORDER BY date
)
SELECT dates.date,
COALESCE(cte.num_records, 0) AS num_records
FROM (SELECT date FROM mytable) dates
LEFT JOIN cte
ON dates.date = cte.date
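Since the edit notes that mytable does not actually contain every date, one adaptation is to generate the date list yourself instead of selecting it from mytable. A sketch, assuming SQLite 3.8.3+ (for recursive CTEs) and a hard-coded target timeframe:
WITH RECURSIVE dates(date) AS (
    SELECT '2022-04-01'                -- assumed start of the target timeframe
    UNION ALL
    SELECT date(date, '+1 day')
    FROM dates
    WHERE date < '2022-04-30'          -- assumed end of the target timeframe
),
cte AS (
    SELECT date, COUNT(id) AS num_records
    FROM mytable
    WHERE industry = 'construction'
    GROUP BY date
)
SELECT dates.date,
       COALESCE(cte.num_records, 0) AS num_records
FROM dates
LEFT JOIN cte ON dates.date = cte.date
ORDER BY dates.date;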

Using Parameter within timestamp_trunc in SQL Query for DataStudio

I am trying to use a custom parameter within DataStudio. The data is hosted in BigQuery.
SELECT
timestamp_trunc(o.created_at, #groupby) AS dateMain,
count(o.id) AS total_orders
FROM `x.default.orders` o
group by 1
When I try this, it returns an error saying that "A valid date part name is required at [2:35]"
I basically need to group the dates using a parameter (e.g. day, week, month).
I have also included a screenshot of how I have created the parameter in Google DataStudio. There is a default value set which is "day".
A workaround that might do the trick here is to use a ROLLUP in the GROUP BY with the different levels of date aggregation, since I am not sure you can pass a Data Studio parameter to work like that.
See the following example for clarity:
with default_orders as (
select timestamp'2021-01-01' as created_at, 1 as id
union all
select timestamp'2021-01-01', 2
union all
select timestamp'2021-01-02', 3
union all
select timestamp'2021-01-03', 4
union all
select timestamp'2021-01-03', 5
union all
select timestamp'2021-01-04', 6
),
final as (
select
count(id) as count_orders,
timestamp_trunc(created_at, day) as days,
timestamp_trunc(created_at, week) as weeks,
timestamp_trunc(created_at, month) as months
from
default_orders
group by
rollup(days, weeks, months)
)
select * from final
The output, then, would be similar to the following:
count | days       | weeks      | months
------+------------+------------+-----------
    6 | null       | null       | null        <- this represents the overall (counted 6 ids)
    2 | 2021-01-01 | null       | null        <- this, the 1st rollup level (day)
    2 | 2021-01-01 | 2020-12-27 | null        <- this, the 1st and 2nd (day, week)
    2 | 2021-01-01 | 2020-12-27 | 2021-01-01  <- this, all of them
And so on.
When visualizing this in Data Studio, you have two options: set the metric aggregation to Avg instead of Sum (since, as you can see, each day is duplicated across the rollup levels), or add another step to the query to get rid of the nulls, like this:
select
*
from
final
where
days is not null and
weeks is not null and
months is not null
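If the parameter does reach BigQuery as a plain string (assumed here and referenced as @groupby, with values such as 'day', 'week', or 'month'), another possible sketch is to branch on it with CASE so that each branch passes a literal date part:
SELECT
    CASE @groupby
        WHEN 'day'   THEN timestamp_trunc(o.created_at, day)
        WHEN 'week'  THEN timestamp_trunc(o.created_at, week)
        WHEN 'month' THEN timestamp_trunc(o.created_at, month)
    END AS dateMain,
    count(o.id) AS total_orders
FROM `x.default.orders` o
GROUP BY 1
The query validates because timestamp_trunc only ever sees literal date parts; the parameter merely selects which branch is used.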

SQL - Monthly cumulative count of new customer based on a created date field

Thanks in advance.
I have Customer records that look like this:
Customer_Number | Create_Date
----------------+------------
34343           | 01/22/2001
54554           | 03/03/2020
85296           | 01/01/2001
...
I have about a thousand of these records (customer number is unique) and the bossman wants to see how the number of customers has grown over time.
The output I need:
Customer_Count | Monthly_Bucket
---------------+---------------
7              | 01/01/2021
9              | 02/01/2021
13             | 03/01/2021
20             | 04/01/2021
The customer count is cumulative and the Monthly Bucket will just feed the graphing package to make a nice bar chart answering the question "how many customers do we have in total in a particular month and how is it growing over time".
Try the following SQL with a sub-query:
SELECT Customer_Count=
(
SELECT COUNT(s.[Create_Date])
FROM [Customer_Sales] s
WHERE MONTH(s.[Create_Date]) <= MONTH(t.[Create_Date])
), Monthly_Bucket=MONTH([Create_Date])
FROM Customer_Sales t
WHERE YEAR(t.[Create_Date]) = ????
GROUP BY MONTH(t.[Create_Date])
Where [Customer_Sales] is the sales table and ???? is the year you want to report on.
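The sub-query above compares only MONTH values, so it is tied to the single year selected by ????. A windowed alternative (a sketch, assuming a SQL Server-style dialect with DATEFROMPARTS, and that Monthly_Bucket should be the first day of each month) accumulates across all years in one pass:
SELECT
    DATEFROMPARTS(YEAR(Create_Date), MONTH(Create_Date), 1) AS Monthly_Bucket,
    -- running total of the per-month counts, ordered by month
    SUM(COUNT(*)) OVER (
        ORDER BY DATEFROMPARTS(YEAR(Create_Date), MONTH(Create_Date), 1)
    ) AS Customer_Count
FROM Customer_Sales
GROUP BY DATEFROMPARTS(YEAR(Create_Date), MONTH(Create_Date), 1)
ORDER BY Monthly_Bucket;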

Oracle - Count based on previous and next column

I've got a rather unusual question about a database query in Oracle.
I was asked whether it's possible to get the number of cases where a patient was readmitted to the same station they were discharged from within 48 / 72 hours.
Consider the following example:
Case | Station | From                | To
-----+---------+---------------------+--------------------
1    | Stat_1  | 2020-01-03 20:10:00 | 2020-01-04 17:40:00
1    | Stat_2  | 2020-01-04 17:40:00 | 2020-01-05 09:35:00
1    | Stat_1  | 2020-01-05 09:35:00 | 2020-01-10 12:33:00
In this example, I'd have to check the difference between the last discharge time from Stat_1 and the admission time when the patient is registered at Stat_1 again. This should then count as one readmission.
I've tried some stuff with LAG and LEAD, but you can't use them in the WHERE clause, so that's not too useful I guess.
LAG (o.OEBENEID, 1, 0) OVER (ORDER BY vfs.GUELTIG_BIS) AS Prev_Stat,
LEAD (o.OEBENEID, 1, 0) OVER (ORDER BY vfs.GUELTIG_BIS) AS Next_Stat,
LAG (vfs.GUELTIG_BIS, 1) OVER (ORDER BY vfs.GUELTIG_BIS) AS End_Prev_Stat,
LEAD (vfs.GUELTIG_AB, 1) OVER (ORDER BY vfs.GUELTIG_AB) AS Begin_Next_Stat
I am able to get the old values, but I can't do something like calculate the difference between those two dates.
Is this even possible to achieve? I can't really wrap my head around how to do it with SQL.
Thanks in advance!
You need a partition by clause to retrieve the previous discharge date of the same user in the same station. Then, you can filter in an outer query:
select count(*) as cnt
from (
    select case_no, station, dt_from, dt_to,
        lag(dt_to) over(partition by case_no, station order by dt_from) as lag_dt_to
    from mytable t
) t
where dt_from < lag_dt_to + 2
This counts how many rows have a gap of less than 2 days with the previous discharge date of the same user in the same station.
This assumes that you are storing your dates as dates. If you have timestamps instead, you need interval arithmetic, so:
where dt_from < lag_dt_to + interval '2' day
Note that case, from and to are reserved words in Oracle: I used alternative names in the query.
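To get both the 48-hour and the 72-hour counts mentioned in the question in one pass, the same lag can feed two conditional counts (a sketch, still using the renamed columns and assuming timestamp columns):
select count(case when dt_from < lag_dt_to + interval '48' hour then 1 end) as readmissions_48h
     , count(case when dt_from < lag_dt_to + interval '72' hour then 1 end) as readmissions_72h
from (
    select dt_from,
        lag(dt_to) over(partition by case_no, station order by dt_from) as lag_dt_to
    from mytable t
) t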

how to loop through a specified range

I have a database of movies where one field is the year which it was released.
I want to create a query which will loop through each decade and will calculate the sum of a particular field for that decade. I have no idea how I can get a loop for every decade. Can anyone help?
If you want the decades where you don't have any movies as well as those with movies, then you can use generate_series to build your list of decades and then do a left outer join to your table; generate_series is the standard way to build numeric and time lists on the fly in PostgreSQL. Something like this should get you started:
select decade.d, count(t.year)
from generate_series(1900, 2100, 10) as decade(d)
left outer join your_table t on decade.d = floor(t.year / 10) * 10
group by decade.d
order by decade.d
That will produce output like this:
d | count
------+-------
1900 | 1
1910 | 0
1920 | 1
1930 | 3
1940 | 0
1950 | 0
1960 | 1
1970 | 0
1980 | 3
-- ...
2100 | 0
You could adjust the first and last values for the generate_series call to match your data if desired.
The floor(t.year / 10) * 10 bit gives you decade for a given year; it will convert 1942 to 1940, 2000 to 2000, etc.
You can set up a decade table (a one column table with one entry for each decade) if you move to a database that doesn't have something like generate_series. The SQL would be pretty much the same, just replace the generate_series call with your decade table.
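To sum a particular field per decade rather than count rows, the same shape works; a sketch, where your_field stands in for whichever column you want to total:
select decade.d, coalesce(sum(t.your_field), 0) as decade_total
from generate_series(1900, 2100, 10) as decade(d)
left outer join your_table t on decade.d = floor(t.year / 10) * 10
group by decade.d
order by decade.d;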
Try something like this (I don't know how your tables look, so I'm guessing):
SELECT decade, sum(column_x)
FROM (
    SELECT date_trunc('decade', movie_year)::date as decade
         , column_x
    FROM movies) as movies_with_decades
GROUP BY decade
ORDER BY decade;