How to generate one row per week when specifying a month in the WHERE clause PSQL - sql

I have a database (psql) which is supposed to work as a Library Management System. I have a table called borrowed which contains some information about each borrowing of books. Each book has a borrowdate column, a date value specifying when the book was borrowed from the Library. Each book also has a returndate column, which tells us when the book was returned. If it hasn't been returned, the value of returndate will be null.
I'm now trying to write a query that presents a table that shows a monthly report for the number of books borrowed/returned for each week (for example week 1-4). I've managed to write a query that shows a table of the number of books borrowed, returned, and not returned (missing) for a specified time interval. Time interval is provided in the WHERE clause.
How can I split this time interval up into 4 equally sized parts so that I'll get one row for each part? Essentially, I want to get one row per week in a month. See my query below!
SELECT
ROW_NUMBER() OVER(ORDER BY COUNT(borrowid)) AS week,
COUNT(borrowid) AS borrowedbooks,
SUM(CASE WHEN returndate IS NULL THEN 1 ELSE 0 END) AS returnedbooks,
SUM(CASE WHEN returndate IS NULL THEN 1 ELSE 1 END) -
SUM(CASE WHEN returndate IS NULL THEN 1 ELSE 0 END) AS missingbooks
FROM borrowed
WHERE borrowdate >= '20190901'
AND borrowdate <= '20190930'
;
Thank you so much for any help!

The key is aggregation, but by what? You can calculate the date as a week and group by that calculation:
SELECT TO_CHAR(borrowdate, 'IYYY-IW') AS week,
COUNT(borrowid) AS borrowedbooks,
COUNT(*) FILTER (WHERE returndate IS NOT NULL) AS returnedbooks,
COUNT(*) FILTER (WHERE returndate IS NULL) AS missingbooks
FROM borrowed
WHERE borrowdate >= '20190901' AND borrowdate <= '20190930'
GROUP BY week
ORDER BY week;
Note that this uses TO_CHAR() to generate a standard definition for "week".
Note that I changed the conditional aggregation logic. Postgres supports the standard FILTER clause, so I've used that.

Related

How to write SQL statement to select for data broken up for each month of the year?

I am looking for a way to write an SQL statement that selects data for each month of the year, separately.
In the SQL statement below, I am trying to count the number of instances in the TOTAL_PRECIP_IN and TOTAL_SNOWFALL_IN columns when either column is greater than 0. In my data table, I have information for those two columns ("TOTAL_PRECIP_IN" and "TOTAL_SNOWFALL_IN") for each day of the year (365 total entries).
I want to break up my data by each calendar month, but am not sure of the best way to do this. In the statement below, I am using a UNION statement to break up the months of January and February. If I keep using UNION statements for the remaining months of the year, I can get the answer I am looking for. However, using 11 different UNION statements cannot be the optimal solution.
Can anyone give me a suggestion how I can edit my SQL statement to measure from the first day of the month, to the last day of the month for every month of the year?
select monthname(OBSERVATION_DATE) as "Month", sum(case when TOTAL_PRECIP_IN or TOTAL_SNOWFALL_IN > 0 then 1 else 0 end) AS "Days of Rain" from EMP_BASIC
where OBSERVATION_DATE between '2019-01-01' and '2019-01-31'
and CITY = 'Olympia'
group by "Month"
UNION
select monthname(OBSERVATION_DATE) as "Month", sum(case when TOTAL_PRECIP_IN or TOTAL_SNOWFALL_IN > 0 then 1 else 0 end) from EMP_BASIC
where OBSERVATION_DATE between '2019-02-01' and '2019-02-28'
and CITY = 'Olympia'
group by "Month"```
Your table structure is too unclear to tell you the exact query you will need. But a general easy idea is to build the sum of your value and then group by monthname and/or by month. Sice you wrote you only want sum values greater 0, you can just put this condition in the where clause. So your query will be something like this:
SELECT MONTHNAME(yourdate) AS month,
MONTH(yourdate) AS monthnr,
SUM(yourvalue) AS yoursum
FROM yourtable
WHERE yourvalue > 0
GROUP BY MONTHNAME(yourdate), MONTH(yourdate)
ORDER BY MONTH(yourdate);
I created an example here: db<>fiddle
You might need to modify this general construct for your concrete purpose (maybe take care of different years, of NULL values etc.). And note this is an example for a MYSQL DB because you wrote about MONTHNAME() which is in most cases used in MYSQL databases. If you are using another DB type, maybe you need to do some modifications. To make sure that answers match your DB type, tag it in your question, please.

SQL Server query date and amount

I am trying to create an SQL query which is based on the following info.
I have an amount bought and sold for each day for articles. I am trying to have a query that shows:
Total "amount" per "article" per "month
"amount" should be split into "positive total" and "negative total", summing up all positive "amount" and all negative "amount" separately.
THe date has the format "yyyy-mm-dd 00:00:00.000"
I tried the following
SELECT article, date, SUM (amount) Total FROM shop group by FORMAT(date, 'yyyy_MM'), article
I get the following message
"date is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause"
If I take the date out of the query everything works fine and it calculates the totals.
You need a so-called injective function for your dates: way to convert all dates in a month to the same value, and you need to use it both in your SELECT and GROUP BY clauses. LAST_DAY() is a decent function to use. So is DATE_FORMAT(date, '%Y-%m'), by the way.
Try this:
SELECT article, LAST_DAY(date) month_ending, SUM (amount) Total
FROM shop
GROUP BY LAST_DAY(date), article
There's a general writeup of solutions to this problem here.
Original answer for MySQL
There are mainly two mistakes in your query:
FORMAT is a function that converts a number to a string. So, MySQL will convert your date to a number first (which should not even be possible and raise an error, but MySQL does convert it to some number nonetheless) and then make sense of the format 'yyyy_MM', maybe taking MM to mean Myanmar, I don't know. I assume you get a different value for each day, instead of one value per month. You want DATE_FORMAT(date, '%Y-%m') instead.
You try to group by month, but then you display the date. Which date? A month has up to 31 different dates. You must display the month you grouped by instead (i.e. again DATE_FORMAT(date, '%Y-%m')).
As to separating positive and negative amounts, you can use conditional aggrgation, i.e. CASE WHEN inside the aggregation function (SUM).
SELECT
DATE_FORMAT(date, '%Y-%m') AS month,
article,
SUM(CASE WHEN amount > 0 THEN amount ELSE 0 END) AS positive_total,
SUM(CASE WHEN amount < 0 THEN amount ELSE 0 END) AS negative_total
FROM shop
GROUP BY DATE_FORMAT(date, '%Y-%m'), article
ORDER BY DATE_FORMAT(date, '%Y-%m'), article;
Updated answer for SQL Server
In SQL Server FORMAT(date, 'yyyy_MM') is a function to get the year and month from a date. The query is hence:
SELECT
FORMAT(date, 'yyyy_MM') AS month,
article,
SUM(CASE WHEN amount > 0 THEN amount ELSE 0 END) AS positive_total,
SUM(CASE WHEN amount < 0 THEN amount ELSE 0 END) AS negative_total
FROM shop
GROUP BY FORMAT(date, 'yyyy_MM'), article
ORDER BY FORMAT(date, 'yyyy_MM'), article;

How to find if there is a match in a certain time interval between two different date fields SQL?

I have a column in my fact table that defines whether a Supplier is old or new based on the following case-statement:
CASE
WHEN (SUBSTRING([Reg date], 1, 6) = SUBSTRING([Invoice date], 1, 6)
THEN ('New supplier')
ELSE('Old supplier')
END as [Old/New supplier]
So for example, if a Supplier was registered 201910 and Invoice date was 201910 then the Supplier would be considered a 'New supplier' that month. Now I want to calculate the number of Old/New suppliers for each month by doing an distinct count on Supplier no, which is not a problem. The last step is where it gets tricky, now I want to count the number of New/Old suppliers over a 12-month period(if there has been a match on Invoice date and reg date in any of the lagging 12 months). So I create the following mdx expression:
aggregate(parallelperiod([D Time].[Year-Month-Day].[Year],1,[D Time].[Year-Month-Day].currentmember).lead(1) : [D Time].[Year-Month-Day].currentmember ,[Measures].[Supplier No Distinct Count])
The issue I am facing is that it will count Supplier no "1234" twice since it has been both new and old during that time period. What I wish is that, if it finds one match it would be considered a "New" Supplier for that 12- month period.
This is how the result ends up looking but I want it to be zero for "Old" since Reg date and Invoice date matched once during that 12-month period it should be considered new for the whole Rolling 12 month on 201910
Any help, possible approaches or ideas are highly appreciated.
Best regards,
Rubrix
Aggregate first at the supplier level and then at the type level:
select type, count(*)
from (select supplierid,
(case when min(substring(regdate, 1, 6)) = min(substring(invoicedate, 1, 6))
then 'new' else 'old'
end) as type
from t
group by supplierid
) s
group by type;
Note: I assume your date columns are in some obscure string format for your code to work. Otherwise, you should be using appropriate date functions.
SELECT COUNT(*) OVER () AS TotalCount
FROM Facts
WHERE Regdate BETWEEN(olddate, newdate) OR InvoiceDate BETWEEN(olddate, newdate)
GROUP BY
Supplier
The above query will return all the suppliers within that time period and then group them. Thus COUNT(*) will only include unique subscribers.
You might wanna change the WHERE clause because I didn't quite understand how you are getting the 12 month period. Generally if your where clause returns the suppliers within that time period(they don't have to be unique) then the group by and count will handle the rest.

How many customers upgraded from Product A to Product B?

I have a "daily changes" table that records when a customer "upgrades" or "downgrades" their membership level. In the table, let's say field 1 is customer ID, field 2 is membership type and field 3 is the date of change. Customers 123 and ABC each have two rows in the table. Values in field 1 (ID) are the same, but values in field 2 (TYPE) and 3 (DATE) are different. I'd like to write a SQL query to tell me how many customers "upgraded" from membership type 1 to membership type 2 how many customers "downgraded" from membership type 2 to membership type 1 in any given time frame.
The table also shows other types of changes. To identify the records with changes in the membership type field, I've created the following code:
SELECT *
FROM member_detail_daily_changes_new
WHERE customer IN (
SELECT customer
FROM member_detail_daily_changes_new
GROUP BY customer
HAVING COUNT(distinct member_type_cd) > 1)
I'd like to see an end report which tells me:
For Fiscal 2018,
X,XXX customers moved from Member Type 1 to Member Type 2 and
X,XXX customers moved from Member Type 2 to Member type 1
Sounds like a good time to use a LEAD() analytical function to look ahead for a given customer's member_Type; compare it to current record and then evaluate if thats an upgrade/downgrade then sum results.
DEMO
CTE AS (SELECT case when lead(Member_Type_Code) over (partition by Customer order by date asc) > member_Type_Code then 1 else 0 end as Upgrade
, case when lead(Member_Type_Code) over (partition by Customer order by date asc) < member_Type_Code then 1 else 0 end as DownGrade
FROM member_detail_daily_changes_new
WHERE Date between '20190101' and '20190201')
SELECT sum(Upgrade) upgrades, sum(downgrade) downgrades
FROM CTE
Giving us: using my sample data
+----+----------+------------+
| | upgrades | downgrades |
+----+----------+------------+
| 1 | 3 | 2 |
+----+----------+------------+
I'm not sure if SQL express on rex tester just doesn't support the sum() on the analytic itself which is why I had to add the CTE or if that's a rule in non-SQL express versions too.
Some other notes:
I let the system implicitly cast the dates in the where clause
I assume the member_Type_Code itself tells me if it's an upgrade or downgrade which long term probably isn't right. Say we add membership type 3 and it goes between 1 and 2... now what... So maybe we need a decimal number outside of the Member_Type_Code so we can handle future memberships and if it's an upgrade/downgrade or a lateral...
I assumed all upgrades/downgrades are counted and a user can be counted multiple times if membership changed that often in time period desired.
I assume an upgrade/downgrade can't occur on the same date/time. Otherwise the sorting for lead may not work right. (but if it's a timestamp field we shouldn't have an issue)
So how does this work?
We use a Common table expression (CTE) to generate the desired evaluations of downgrade/upgrade per customer. This could be done in a derived table as well in-line but I find CTE's easier to read; and then we sum it up.
Lead(Member_Type_Code) over (partition by customer order by date asc) does the following
It organizes the data by customer and then sorts it by date in ascending order.
So we end up getting all the same customers records in subsequent rows ordered by date. Lead(field) then starts on record 1 and Looks ahead to record 2 for the same customer and returns the Member_Type_Code of record 2 on record 1. We then can compare those type codes and determine if an upgrade or downgrade occurred. We then are able to sum the results of the comparison and provide the desired totals.
And now we have a long winded explanation for a very small query :P
You want to use lag() for this, but you need to be careful about the date filtering. So, I think you want:
SELECT prev_membership_type, membership_type,
COUNT(*) as num_changes,
COUNT(DISTINCT member) as num_members
FROM (SELECT mddc.*,
LAG(mddc.membership_type) OVER (PARTITION BY mddc.customer_id ORDER BY mddc.date) as prev_membership_type
FROM member_detail_daily_changes_new mddc
) mddc
WHERE prev_membership_type <> membership_type AND
date >= '2018-01-01' AND
date < '2019-01-01'
GROUP BY membership_type, prev_membership_type;
Notes:
The filtering on date needs to occur after the calculation of lag().
This takes into account that members may have a certain type in 2017 and then change to a new type in 2018.
The date filtering is compatible with indexes.
Two values are calculated. One is the overall number of changes. The other counts each member only once for each type of change.
With conditional aggregation after self joining the table:
select
2018 fiscal,
sum(case when m.member_type_cd > t.member_type_cd then 1 else 0 end) upgrades,
sum(case when m.member_type_cd < t.member_type_cd then 1 else 0 end) downgrades
from member_detail_daily_changes_new m inner join member_detail_daily_changes_new t
on
t.customer = m.customer
and
t.changedate = (
select max(changedate) from member_detail_daily_changes_new
where customer = m.customer and changedate < m.changedate
)
where year(m.changedate) = 2018
This will work even if there are more than 2 types of membership level.

SQL: Average value per day

I have a database called ‘tweets’. The database 'tweets' includes (amongst others) the rows 'tweet_id', 'created at' (dd/mm/yyyy hh/mm/ss), ‘classified’ and 'processed text'. Within the ‘processed text’ row there are certain strings such as {TICKER|IBM}', to which I will refer as ticker-strings.
My target is to get the average value of ‘classified’ per ticker-string per day. The row ‘classified’ includes the numerical values -1, 0 and 1.
At this moment, I have a working SQL query for the average value of ‘classified’ for one ticker-string per day. See the script below.
SELECT Date( `created_at` ) , AVG( `classified` ) AS Classified
FROM `tweets`
WHERE `processed_text` LIKE '%{TICKER|IBM}%'
GROUP BY Date( `created_at` )
There are however two problems with this script:
It does not include days on which there were zero ‘processed_text’s like {TICKER|IBM}. I would however like it to spit out the value zero in this case.
I have 100+ different ticker-strings and would thus like to have a script which can process multiple strings at the same time. I can also do them manually, one by one, but this would cost me a terrible lot of time.
When I had a similar question for counting the ‘tweet_id’s per ticker-string, somebody else suggested using the following:
SELECT d.date, coalesce(IBM, 0) as IBM, coalesce(GOOG, 0) as GOOG,
coalesce(BAC, 0) AS BAC
FROM dates d LEFT JOIN
(SELECT DATE(created_at) AS date,
COUNT(DISTINCT CASE WHEN processed_text LIKE '%{TICKER|IBM}%' then tweet_id
END) as IBM,
COUNT(DISTINCT CASE WHEN processed_text LIKE '%{TICKER|GOOG}%' then tweet_id
END) as GOOG,
COUNT(DISTINCT CASE WHEN processed_text LIKE '%{TICKER|BAC}%' then tweet_id
END) as BAC
FROM tweets
GROUP BY date
) t
ON d.date = t.date;
This script worked perfectly for counting the tweet_ids per ticker-string. As I however stated, I am not looking to find the average classified scores per ticker-string. My question is therefore: Could someone show me how to adjust this script in such a way that I can calculate the average classified scores per ticker-string per day?
SELECT d.date, t.ticker, COALESCE(COUNT(DISTINCT tweet_id), 0) AS tweets
FROM dates d
LEFT JOIN
(SELECT DATE(created_at) AS date,
SUBSTR(processed_text,
LOCATE('{TICKER|', processed_text) + 8,
LOCATE('}', processed_text, LOCATE('{TICKER|', processed_text))
- LOCATE('{TICKER|', processed_text) - 8)) t
ON d.date = t.date
GROUP BY d.date, t.ticker
This will put each ticker on its own row, not a column. If you want them moved to columns, you have to pivot the result. How you do this depends on the DBMS. Some have built-in features for creating pivot tables. Others (e.g. MySQL) do not and you have to write tricky code to do it; if you know all the possible values ahead of time, it's not too hard, but if they can change you have to write dynamic SQL in a stored procedure.
See MySQL pivot table for how to do it in MySQL.