how do I return the latest month submission from each site in a database? - sql

Hi, I'm trying to update one of my pre-extraction check queries to ensure that all the submitters have made submissions before I extract the updated data, as the final data set is something like 49 columns by 80k+ rows.
Currently my test code returns a distinct list of submitters filtered for a certain year (financial year, format YY/YY) and the current period (financial month, format M). I restrict the dataset, which covers all submitters, to the region I cover and then manually change the year or month to the correct values for the current period.
What I am hoping to do is change the code that says
SELECT DISTINCT
site,
activity_period
WHERE activity_period= '9'
AND (site = 'site 1'
OR site = 'site 2'
OR site = 'site x')
AND year = 'yy/yy';
FROM submissions
What I want to do is change this so that, instead of the month having to be entered manually each month, the statement returns the max month in a given year for each of the sites in the site = condition. I'd then order this by month in ascending order so I can see which sites have yet to submit and chase accordingly. Does anyone know how I would do this?
Additionally, for a separate check I'm working on, it would be great to know how I could set the year/month to the current period by asking for the max value the database contains.
Edit notes: changed month to activity_period so the columns make sense, and added a FROM statement. Edit 2: added a couple of sample tables, one showing a few lines that may be in the data and the other showing what the outcome should look like.
sample data
| site| value |submission month|
| --- | --- |--- |
| site 1| 40 |1|
| site 1| 40 |2|
| site 2| 5 |1|
| site 3| 400 |1|
| site 3| 409 |2|
| site 3| 4 |3|
output of query
| site| latest month received|
| ---|---|
| site 2|1|
| site 1|2|
| site 3|3|
I am not interested in retrieving any of the data itself (the submission value or the other fields I haven't put in the dummy). I just want to know which sites have not yet made submissions, by having the sites that lack the latest month's data (3 in the dummy) shown above those that have it, so at a glance I can say whether or not all 18 of my sites are in the latest month.

First, I'd rewrite your query so that the WHERE clause comes after FROM:
SELECT DISTINCT
site,
activity_period
FROM submissions
WHERE activity_period= '9'
AND (site = 'site 1'
OR site = 'site 2'
OR site = 'site x')
AND year = 'yy/yy';
If you want to pull only the "latest" period per site, you can join the submissions table to itself. This version keeps each site's most recent year:
SELECT DISTINCT
s.site,
s.activity_period
FROM submissions s
INNER JOIN (SELECT site, MAX(year) AS year FROM submissions GROUP BY site) s2
ON s2.site = s.site AND s2.year = s.year
WHERE s.activity_period = '9'
AND (s.site = 'site 1'
OR s.site = 'site 2'
OR s.site = 'site x');
Your query differs a little from your sample data, so here is the same idea applied to the sample data. Try running this to see how it works:
with a (site, submission, month) as (
select 'site 1' site, '40' submission, '1' month
union all select 'site 1', '40', '2'
union all select 'site 2', '5', '1'
union all select 'site 3', '400', '1'
union all select 'site 3', '409', '2'
union all select 'site 3', '4', '3'
)
select a.* from a
inner join (select site, max(month) month from a group by site) a2
on a2.site = a.site and a2.month = a.month
Output is as below:
| site | submission | month |
| --- | --- | --- |
| site 1 | 40 | 2 |
| site 2 | 5 | 1 |
| site 3 | 4 | 3 |
Explanation: the subquery builds a derived table that holds each site and its max (latest) month. Joining it back to the original table on both site and month filters the original table down to the latest month per site.
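Since the question only asks for each site and its latest month, ordered so that lagging sites come first, the grouped subquery on its own may already be enough. Below is a minimal sketch using Python's built-in sqlite3 module. The rows and the year value '22/23' are made-up assumptions, and the CAST assumes activity_period is stored as text, as the quoted '9' in the question suggests. Without the cast, MAX compares strings, so '9' would sort after '10'.

```python
import sqlite3

# In-memory database with hypothetical rows; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE submissions (site TEXT, year TEXT, activity_period TEXT)")
conn.executemany(
    "INSERT INTO submissions VALUES (?, ?, ?)",
    [
        ("site 1", "22/23", "9"),
        ("site 1", "22/23", "10"),
        ("site 2", "22/23", "9"),
        ("site 3", "22/23", "10"),
        ("site 3", "22/23", "11"),
        ("site 4", "22/23", "11"),  # a site outside the region being checked
    ],
)

query = """
SELECT site,
       MAX(CAST(activity_period AS INTEGER)) AS latest_month  -- numeric, not string, max
FROM submissions
WHERE site IN ('site 1', 'site 2', 'site 3')
  AND year = ?
GROUP BY site
ORDER BY latest_month ASC, site  -- sites that are furthest behind come first
"""
latest = conn.execute(query, ("22/23",)).fetchall()
for site, month in latest:
    print(site, month)
```

Note that a site with no rows at all for the year will not appear in this result, so it is still worth comparing the number of rows returned against the number of sites you expect.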

Related

Newbie struggling to join 3 tables

I have 3 tables of purchases in the following format:
date | company_id | apple_txn_amt
date | company_id | orange_txn_amt
date | company_id | pear_txn_amt
There are multiple purchases/sales daily for many companies. I'm trying to join and group so there is only 1 date per company along with total fruit balance:
date | company_id | total_apple_balance | total_orange_balance | total_pear_balance
I have built a query for a similar case earlier, and used 2 joins. But this was for only one company's data so I was only joining on date=date for each table. Process for each table was: gather buys, sells, union those two, union to a new table with generate_series() to insert 0s for days missing, calculate daily delta, and group by day to have a running total. Then something like:
SELECT
apple.day,
apple.total,
orange.total,
pear.total,
(apple + orange + pear) AS total_fruit
FROM apple
JOIN orange ON orange.date = apple.date
JOIN pear ON pear.date = apple.date
ORDER BY day
It's like I need to JOIN ON date and company id but from what I can tell this isn't possible.
Should I approach this in a different way?
Sure, you can add company_id to the join conditions like this:
SELECT
apple.date AS day,
apple.company_id,
apple.total AS apple_total,
orange.total AS orange_total,
pear.total AS pear_total,
(apple.total + orange.total + pear.total) AS total_fruit
FROM apple
JOIN orange ON orange.date = apple.date AND orange.company_id = apple.company_id
JOIN pear ON pear.date = apple.date AND pear.company_id = apple.company_id
ORDER BY day, apple.company_id
That said, unless circumstances require it, the design of your database isn't ideal.
You would normally not have 3 tables; you would have a single table with the fruit type as another column to differentiate the rows.
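To illustrate the single-table design, here is a rough sketch using Python's built-in sqlite3 module. The table name fruit_txn, its columns, and the rows are all hypothetical. It computes per-day, per-company totals with conditional aggregation and needs no joins. It does not produce the running balance or the zero-filled missing days described in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical single table: one row per transaction, with the fruit as a column.
conn.execute(
    "CREATE TABLE fruit_txn ("
    "txn_date TEXT, company_id INTEGER, fruit_type TEXT, txn_amt INTEGER)"
)
conn.executemany(
    "INSERT INTO fruit_txn VALUES (?, ?, ?, ?)",
    [
        ("2023-01-01", 1, "apple", 5),
        ("2023-01-01", 1, "apple", -2),
        ("2023-01-01", 1, "orange", 4),
        ("2023-01-01", 2, "pear", 7),
        ("2023-01-02", 1, "pear", 1),
    ],
)

query = """
SELECT txn_date,
       company_id,
       SUM(CASE WHEN fruit_type = 'apple'  THEN txn_amt ELSE 0 END) AS apple_total,
       SUM(CASE WHEN fruit_type = 'orange' THEN txn_amt ELSE 0 END) AS orange_total,
       SUM(CASE WHEN fruit_type = 'pear'   THEN txn_amt ELSE 0 END) AS pear_total,
       SUM(txn_amt) AS total_fruit
FROM fruit_txn
GROUP BY txn_date, company_id   -- one row per date and company
ORDER BY txn_date, company_id
"""
daily = conn.execute(query).fetchall()
for row in daily:
    print(row)
```

One side effect of this layout is that a company with no orange transactions on a given day still gets a row (with 0 for oranges), whereas the three-table inner join would drop that day entirely.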

Select specific grouped element

Hi, I would like to ask for help because I cannot find a solution.
For example, I have data like this:
number | date | user
10 | 2022-07-01 | A
15 | 2022-07-08 | A
9 | 2022-07-10 | A
I need to get the number for a user from the row with the newest date.
In this case I need to get the value 9.
Of course I have many different users; this is only to illustrate the issue.
I would like to select all unique users, each with the number from their newest date.
Is it possible to do this in one query?
I like to teach these problems by breaking it down into smaller problems.
First, write a query that tells you which is the "newer one" for each user.
SELECT user, max(date) as newer_one
FROM tbl
GROUP BY user
Next, you can join that result back to the original data. My favorite style is called a CTE which makes things readable and easier to debug. Like so,
WITH newest AS (
SELECT user, max(date) as newer_one
FROM tbl
GROUP BY user
)
SELECT original.*
FROM tbl AS original
INNER JOIN newest
ON original.user = newest.user
AND original.date = newest.newer_one
Some RDBMSs don't support CTEs, but you can put the same query inline as a derived table in the FROM clause, which is sometimes harder to read but will work basically anywhere.
SELECT original.*
FROM tbl AS original
INNER JOIN
(
SELECT user, max(date) as newer_one
FROM tbl
GROUP BY user
) as newest
ON original.user = newest.user
AND original.date = newest.newer_one
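To check the join-back approach against the sample data, here is a small sketch using Python's built-in sqlite3 module. The columns are renamed to user_name and entry_date as an assumption of this example, because user is a reserved word in several databases (PostgreSQL and SQL Server among them) and would otherwise need quoting. The rows for user B are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names user_name / entry_date are stand-ins for the question's user / date.
conn.execute("CREATE TABLE tbl (number INTEGER, entry_date TEXT, user_name TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [
        (10, "2022-07-01", "A"),
        (15, "2022-07-08", "A"),
        (9, "2022-07-10", "A"),
        (3, "2022-07-02", "B"),  # hypothetical second user
        (7, "2022-07-05", "B"),
    ],
)

query = """
WITH newest AS (
    -- step 1: the newest date per user
    SELECT user_name, MAX(entry_date) AS newer_one
    FROM tbl
    GROUP BY user_name
)
-- step 2: join back to pick up the number stored on that date
SELECT original.user_name, original.number
FROM tbl AS original
INNER JOIN newest
   ON original.user_name = newest.user_name
  AND original.entry_date = newest.newer_one
ORDER BY original.user_name
"""
result = conn.execute(query).fetchall()
print(result)
```

If a user has two rows sharing the same newest date, both rows come back, so a tie-breaker may be needed in that case.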

How can I generate a list while ignoring records that have a date that does not fit into the range I am looking for?

I am using Microsoft SQL Server, I currently have a table with records for accounts. These master accounts can have several sub accounts linked to them. For example, master account XXX can have sub account XXXA and XXXB and... XXXN and so on and so forth.
These sub accounts can be opened and added to the master account XXX across time, so at different points in time. When a new sub account is opened, it also opens a new master account. From that point on, other sub accounts can be added to that master account number.
I have a column with the account opening dates. These dates are linked to when the sub accounts are opened.
I am trying to generate a list of master accounts (not sub accounts), that were opened between 2018-11-01 and 2019-02-15. However, I only want to include new MASTER ACCOUNTS, therefore ignoring any master accounts that have an account opening date prior to 2018-11-01.
The issue I am having is master accounts that are showing up in my generated list because they have sub accounts that have been added to them during the date ranges I am looking for.
I have tried using the MIN function inside HAVING on my dates. I have checked other Stack Overflow threads for a solution as well.
SELECT master_accounts, accountopendate, accountclosedate
FROM accounts
GROUP BY master_accounts, accountopendate, accountclosedate
HAVING MIN(accountopendate) BETWEEN '2018-11-01' AND '2019-02-15';
It gave me a list of the master accounts, however upon doing some QA, I find some master accounts in the list, that have been opened prior to 2018-11-01.
I would like a list of master accounts with the oldest account opening date being 2018-11-01, ignoring all the master accounts with account opening dates prior to 2018-11-01.
EXPECTED RESULT:
+-----------------+-----------------+------------------+
| master_accounts | accountopendate | accountclosedate |
+-----------------+-----------------+------------------+
| XXX | 2018-11-01 | NULL |
| ZZZ | 2018-12-01 | NULL |
| YYY | 2019-02-01 | NULL |
+-----------------+-----------------+------------------+
This should work, assuming the earliest opening date is always going to include the master account number.
First, isolate the account numbers and the initial opening date, then join that result set to your base table. I used a CTE, but a sub-query would accomplish the same thing.
Using a CTE:
WITH masterOpen AS
(
SELECT
master_accounts
,MIN(accountopendate) AS openDate
FROM
dbo.accounts
GROUP BY
master_accounts
)
SELECT
a.master_accounts
,a.accountopendate
,a.accountclosedate
FROM
dbo.accounts AS a
JOIN
masterOpen AS mo
ON
mo.master_accounts = a.master_accounts
AND
mo.openDate = a.accountopendate
AND
mo.openDate >= '2018-11-01'
AND
mo.openDate <= '2019-02-15';
Sub-query instead:
SELECT
a.master_accounts
,a.accountopendate
,a.accountclosedate
FROM
(
SELECT
master_accounts
,MIN(accountopendate) AS openDate
FROM
dbo.accounts
GROUP BY
master_accounts
) AS mo
JOIN
dbo.accounts AS a
ON
mo.master_accounts = a.master_accounts
AND
mo.openDate = a.accountopendate
AND
mo.openDate >= '2018-11-01'
AND
mo.openDate <= '2019-02-15';
The date parameters could also be broken out into a WHERE clause if you prefer, but with an INNER JOIN it will yield the same results. For current versions of the SQL engine, it's more a matter of preference than performance.
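The difference between filtering rows by date and filtering on each master account's earliest date can be seen in a small sketch, written here with Python's built-in sqlite3 module rather than SQL Server. The sub_account column and all rows are made-up assumptions. The second query is a HAVING variant that groups only by master_accounts; in the question's attempt, grouping by accountopendate as well puts each date in its own group, so MIN never sees the older dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "master_accounts TEXT, sub_account TEXT, "  # sub_account is hypothetical
    "accountopendate TEXT, accountclosedate TEXT)"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?, ?)",
    [
        ("XXX", "XXXA", "2018-11-01", None),
        ("XXX", "XXXB", "2019-01-10", None),
        ("OLD", "OLDA", "2017-05-01", None),
        ("OLD", "OLDB", "2018-12-20", None),  # new sub account on an old master
        ("ZZZ", "ZZZA", "2018-12-01", None),
        ("LATE", "LATEA", "2019-03-01", None),
    ],
)

# Row-level filter: picks up OLD because one of its sub accounts is in range.
naive = conn.execute(
    """
    SELECT DISTINCT master_accounts
    FROM accounts
    WHERE accountopendate BETWEEN '2018-11-01' AND '2019-02-15'
    ORDER BY master_accounts
    """
).fetchall()

# Earliest date per master account, then filter on that earliest date.
new_masters = conn.execute(
    """
    SELECT master_accounts, MIN(accountopendate) AS first_open
    FROM accounts
    GROUP BY master_accounts
    HAVING MIN(accountopendate) BETWEEN '2018-11-01' AND '2019-02-15'
    ORDER BY first_open
    """
).fetchall()

print(naive)
print(new_masters)
```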
Why not just use a filter?
SELECT DISTINCT master_accounts, accountopendate, accountclosedate
FROM accounts
WHERE accountopendate >= '2018-11-01' AND accountopendate <= '2019-02-15';

The best way to keep count data in postgres

I need to create statistics for some aggregate data, split by day.
For example:
select
(select count(*) from bananas) as bananas_count,
(select count(*) from apples) as apples_count,
(select count(*) from bananas where color = 'yellow') as yellow_bananas_count;
obviously I will get:
 bananas_count | apples_count | yellow_bananas_count
---------------+--------------+----------------------
           123 |          321 |                   15
But I need that data grouped by day, so that we know, for example, how many bananas we had yesterday.
My first thought was to create a view, but in that case I would not be able to split by date (or I don't know how to do it).
I need a performant, database-side implementation of this task.

Select unique records and display as category headers in rails

I have a rails 3.2 app running on PostgreSQL, and have some data I want to display in my view, which is stored in the database in this structure:
+----+--------+------------------+--------------------+
| id | name | sched_start_date | task |
+----+--------+------------------+--------------------+
| 1 | "Ben" | 2013-03-01 | "Check for debris" |
+----+--------+------------------+--------------------+
| 2 | "Toby" | 2013-03-02 | "Carry out Y1.1" |
+----+--------+------------------+--------------------+
| 3 | "Toby" | 2013-03-03 | "Check oil seals" |
+----+--------+------------------+--------------------+
I would like to display a list of tasks for each name, and for the names to be ordered ASC by the first sched_start_date they have, which should look like ...
Ben
2013-03-01 – Check for debris
Toby
2013-03-02 – Carry out Y1.1
2013-03-03 – Check oil seals
The approach I started taking was to run a query for unique names and order them by sched_start_date ASC, then run a query for each name to get their tasks.
To get a list of unique names, the SQL would look like this.
select *
from (
select distinct on (name) name, sched_start_date
from tasks
) p
order by sched_start_date;
I would like to know if this is the correct approach (querying for unique names then running another query for all their tasks), or if there is a better rails way.
To get the data sorted like you describe, you might want to use min() as window function in the ORDER BY clause:
SELECT name, sched_start_date, task
FROM tasks
ORDER BY min(sched_start_date) OVER (PARTITION BY name), 1, 2, 3
Your original query would need an additional ORDER BY item to get the earliest date per name:
SELECT DISTINCT ON (name) name, sched_start_date, task
FROM tasks
ORDER BY 1, 2, 3;
I also added task (3) as last ORDER BY item to break ties, in case there can be more than one per date.
But the output is still ordered by name, not by date.
Getting your peculiar format with all data stuffed into one column is a bit more complex:
SELECT one_col
FROM (
WITH x AS (
SELECT name, min(sched_start_date) AS min_start
FROM tasks
GROUP BY 1
)
SELECT 2 AS rnk, name
,sched_start_date::text || ' – ' || task AS one_col
,sched_start_date, min_start
FROM tasks
JOIN x USING (name)
UNION ALL
SELECT 1 AS rnk, name, name, NULL::date, min_start
FROM x
ORDER BY min_start, name, rnk, sched_start_date, one_col
) y
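An alternative to building the headers in SQL is to fetch the rows already sorted and group them in application code. The sketch below is in Python with the built-in sqlite3 module rather than Rails and PostgreSQL, and it swaps the window function for a join against a grouped subquery, so treat it as an illustration of the idea and not a drop-in solution. The sample rows are the ones from the question.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks ("
    "id INTEGER PRIMARY KEY, name TEXT, sched_start_date TEXT, task TEXT)"
)
conn.executemany(
    "INSERT INTO tasks (name, sched_start_date, task) VALUES (?, ?, ?)",
    [
        ("Toby", "2013-03-03", "Check oil seals"),
        ("Ben", "2013-03-01", "Check for debris"),
        ("Toby", "2013-03-02", "Carry out Y1.1"),
    ],
)

# Order names by their earliest date, then each name's tasks by date.
query = """
SELECT t.name, t.sched_start_date, t.task
FROM tasks AS t
JOIN (
    SELECT name, MIN(sched_start_date) AS min_start
    FROM tasks
    GROUP BY name
) AS x ON x.name = t.name
ORDER BY x.min_start, t.name, t.sched_start_date, t.task
"""
rows = conn.execute(query).fetchall()

# Rows for the same name are adjacent, so groupby can emit one header per name.
lines = []
for name, group in groupby(rows, key=lambda r: r[0]):
    lines.append(name)
    for _, start, task in group:
        lines.append(f"{start} - {task}")

print("\n".join(lines))
```

This keeps everything to a single query while leaving the presentation (the name headers) to the view layer.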
Assuming that you have associations in your model you would be able to run
@employees = Employee.order(:name, :sched_start_date, :task).includes(:tasks)
You could then iterate over them:
@employees.each do |employee|
employee.name
employee.tasks.each do |task|
task.name
end
end
This isn't gonna exactly match your needs, but should show you where to start.