Show the most current date using MAX Date - sql

Table covidDeaths
Location      | Date                    | total_cases | total_deaths
--------------|-------------------------|-------------|-------------
United States | 2020-01-22 00:00:00.000 | 1           | NULL
United States | 2020-01-23 00:00:00.000 | 1           | 0
United States | 2020-01-24 00:00:00.000 | 2           | 1
United States | 2020-01-25 00:00:00.000 | 2           | 0
United States | 2020-01-26 00:00:00.000 | 5           | 3
United States | 2021-11-11 00:00:00.000 | 46851529    | 58626
United States | 2021-11-12 00:00:00.000 | 46991304    | 139775
United States | 2021-11-13 00:00:00.000 | 47050502    | 59198
United States | 2021-11-14 00:00:00.000 | 47074080    | 23578
I'm running into a problem that is leaving me a bit frustrated. I am looking for the total_cases and total_deaths using the most current date where the location is the United States in a table named covidDeaths. I know you can use the Max() function to find the most current date on file so I have tried
SELECT MAX(date) AS "Current Date", total_deaths, total_cases
FROM covidDeaths
WHERE location = 'United States'
GROUP BY total_cases, total_deaths;
I want it to output a single row like this.
_______________________________________
|Current Date|Total_Deaths|Total_Cases|
|____________|____________|___________|
|2021-11-14 |763092 |47074080 |
|____________|____________|___________|
Instead, I'm getting
_______________________________________
|Current Date|Total_Deaths|Total_Cases|
|____________|____________|___________|
|2020-01-23 |Null |1 |
|____________|____________|___________|
|2020-01-24 |Null |2 |
|____________|____________|___________|
and so on until it reaches the max (date).
I am using SQL Server 2019.
I'm hoping someone can explain to me what I am doing wrong and why it's outputting multiple dates instead of just the most current.

Use a TOP query:
SELECT TOP 1 WITH TIES date AS [Current Date], total_deaths, total_cases
FROM covidDeaths
WHERE location = 'United States'
ORDER BY date DESC;
I am using WITH TIES here in case there might be more than one record having the most recent date. If not, or you don't care about ties, then you may simply use TOP 1 instead.
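Note: the GROUP BY is what produces the extra rows. Grouping by total_cases and total_deaths creates one group per distinct pair of values, and MAX(date) is then computed within each group rather than over the whole table. Since you want the values from a single row, no aggregation is needed here at all.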
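To see both behaviours side by side, here is a minimal sketch that runs the question's query and the "latest row" query against an in-memory SQLite database. SQLite is only a stand-in here: it has no TOP, so LIMIT 1 plays the role of TOP 1 (and there is no WITH TIES equivalent in this sketch). The three sample rows are copied from the question; everything else is an assumption for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE covidDeaths "
    "(location TEXT, date TEXT, total_cases INTEGER, total_deaths INTEGER)"
)
conn.executemany(
    "INSERT INTO covidDeaths VALUES (?, ?, ?, ?)",
    [
        ("United States", "2021-11-12", 46991304, 139775),
        ("United States", "2021-11-13", 47050502, 59198),
        ("United States", "2021-11-14", 47074080, 23578),
    ],
)

# The question's query: one group per distinct (total_cases, total_deaths)
# pair, so MAX(date) is evaluated per group and every row comes back.
grouped = conn.execute(
    """
    SELECT MAX(date), total_deaths, total_cases
    FROM covidDeaths
    WHERE location = 'United States'
    GROUP BY total_cases, total_deaths
    """
).fetchall()

# SQLite spelling of TOP 1 ... ORDER BY date DESC: sort, keep the first row.
latest = conn.execute(
    """
    SELECT date, total_deaths, total_cases
    FROM covidDeaths
    WHERE location = 'United States'
    ORDER BY date DESC
    LIMIT 1
    """
).fetchone()

print(len(grouped))  # one row per group
print(latest)        # the single most recent row
```

The grouped query returns as many rows as there are distinct value pairs, while the ordered query returns only the row with the latest date.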

Related

Date Ranges into Monthly Breakdown

Hi I have a SQL question, I'm trying to get end of month records for each person within a certain date range. Essentially I want this record to be tracking historically (years worth of data) using some sort of End of Month record if their start and end dates fall within the last day of each month. So the data currently looks like this (using just 2022 for simplicity)..
Name        | StartDate  | EndDate
------------|------------|-----------
John Smith  | 2022-01-15 | 2022-04-10
Jane Doe    | 2022-01-18 | 2022-03-05
Rob Johnson | 2022-03-07 | 2022-07-18
And what I'm looking for is something like this
Name        | StartDate  | EndDate    | EndMonth
------------|------------|------------|-----------
John Smith  | 2022-01-15 | 2022-04-10 | 2022-01-31
Jane Doe    | 2022-01-18 | 2022-03-05 | 2022-01-31
John Smith  | 2022-01-15 | 2022-04-10 | 2022-02-28
Jane Doe    | 2022-01-18 | 2022-03-05 | 2022-02-28
John Smith  | 2022-01-15 | 2022-04-10 | 2022-03-31
Rob Johnson | 2022-03-07 | 2022-07-18 | 2022-03-31
Rob Johnson | 2022-03-07 | 2022-07-18 | 2022-04-30
Rob Johnson | 2022-03-07 | 2022-07-18 | 2022-05-31
etc...
I tried connecting the Records table with a Calendar table I have that has End of Month data for each day for several years back, but can't figure this out. The Calendar table looks something like this:
Date       | EndMonth
-----------|-----------
2022-01-01 | 2022-01-31
2022-01-02 | 2022-01-31
.....
JOINing two tables should give the desired result.
You didn't mention a table name for your initial table,
so I will refer to it as the person table.
Your calendar table is a good start.
I don't believe you need the Date column.
Just a single EndMonth column should suffice.
JOIN person against calendar,
WHERE EndMonth BETWEEN StartDate AND EndDate.
And you're done!
OP reports that this works fine:
SELECT P.Name, P.StartDate, P.EndDate, C.EndMonth
FROM PERSON P, CALENDAR C
WHERE EndMonth BETWEEN StartDate AND EndDate;
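As a quick check of the join idea, here is a sketch using an in-memory SQLite database with the three people from the question. The month_ends table is a hypothetical name for the calendar reduced to a single EndMonth column with one row per month, as suggested above; with one calendar row per day, each month end would otherwise be matched once for every day of that month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (Name TEXT, StartDate TEXT, EndDate TEXT);
    INSERT INTO person VALUES
        ('John Smith',  '2022-01-15', '2022-04-10'),
        ('Jane Doe',    '2022-01-18', '2022-03-05'),
        ('Rob Johnson', '2022-03-07', '2022-07-18');

    -- Hypothetical calendar with one row per month end (part of 2022 only).
    CREATE TABLE month_ends (EndMonth TEXT);
    INSERT INTO month_ends VALUES
        ('2022-01-31'), ('2022-02-28'), ('2022-03-31'), ('2022-04-30'),
        ('2022-05-31'), ('2022-06-30'), ('2022-07-31'), ('2022-08-31');
    """
)

# One output row per (person, month end) where the month end falls
# inside the person's StartDate..EndDate range.
result = conn.execute(
    """
    SELECT p.Name, p.StartDate, p.EndDate, c.EndMonth
    FROM person AS p
    JOIN month_ends AS c
      ON c.EndMonth BETWEEN p.StartDate AND p.EndDate
    ORDER BY c.EndMonth, p.Name
    """
).fetchall()

for row in result:
    print(row)
```

The dates are stored as ISO-8601 strings, which compare correctly as text; a real DATE column would behave the same way in the BETWEEN condition.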
Well, this is exactly what Teradata's proprietary EXPAND ON syntax is used for; no need for a calendar table:
SELECT
t.*
,pd -- EXPAND ON returns a period
,begin(pd) -- only show the start of the period
FROM records AS t
-- create a period on-the-fly, adjust the end date as periods exclude the end
EXPAND ON PERIOD(StartDate, Next(EndDate)) AS pd
BY ANCHOR MONTH_END -- return one row per month

How to measure an average count from a set of days each with their own data points, in SQL/LookerML

I have the following table:
id | decided_at       | reviewer
1  | 2020-08-10 13:00 | john
2  | 2020-08-10 14:00 | john
3  | 2020-08-10 16:00 | john
4  | 2020-08-12 14:00 | jane
5  | 2020-08-12 17:00 | jane
6  | 2020-08-12 17:50 | jane
7  | 2020-08-12 19:00 | jane
What I would like to do is get the difference between the min and max for each day and get the total count from the id's that are the min, the range between min and max, and the max. Currently, I'm only able to get this data for the past day.
Desired output:
Date       | Time(h) | Count | reviewer
2020-08-10 | 3       | 3     | john
2020-08-12 | 5       | 4     | jane
From this, I would like to get the average of this data over the past x number of days.
Example:
If today was the 13th, filter on the past 2 days (48 hours)
Output:
reviewer | reviews/hour
jane 5/4 = 1.25
Example 2:
If today was the 13th, filter on the past 3 days (72 hours)
reviewer | reviews/hour
john 3/3 = 1
jane 5/4 = 1.25
Ideally, if this is possible in LookML without the use of a derived table, it would be nicest to have that. Otherwise, a solution in SQL would be great and I can try to convert to LookerML.
Thanks!
In SQL, one solution is to use two levels of aggregation:
select reviewer, sum(cnt) / sum(time_h) review_per_hour
from (
    select
        reviewer,
        date(decided_at) decided_date,
        count(*) cnt,
        timestampdiff(hour, min(decided_at), max(decided_at)) time_h
    from mytable
    where decided_at >= current_date - interval 2 day
    group by reviewer, date(decided_at)
) t
group by reviewer
The subquery filters on the date range, aggregates by reviewer and day, and computes the number of records and the difference between the minimum and the maximum date, as hours. Then, the outer query aggregates by reviewer and does the final computation.
The actual function to compute the date difference varies across databases; timestampdiff() is supported in MySQL - other engines all have alternatives.
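For anyone who wants to try the two-level aggregation without a MySQL server, here is a rough sketch against in-memory SQLite with the rows from the question. SQLite has no timestampdiff(), so the hour difference is computed from strftime('%s', ...) epoch seconds; the reviews_per_hour helper and its today/days_back parameters are made up for this example, with "today" passed in explicitly so the result does not depend on the real date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, decided_at TEXT, reviewer TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "2020-08-10 13:00", "john"),
        (2, "2020-08-10 14:00", "john"),
        (3, "2020-08-10 16:00", "john"),
        (4, "2020-08-12 14:00", "jane"),
        (5, "2020-08-12 17:00", "jane"),
        (6, "2020-08-12 17:50", "jane"),
        (7, "2020-08-12 19:00", "jane"),
    ],
)

QUERY = """
SELECT reviewer, SUM(cnt) * 1.0 / SUM(time_h) AS reviews_per_hour
FROM (
    -- Inner level: one row per reviewer and day.
    SELECT reviewer,
           date(decided_at) AS decided_date,
           COUNT(*) AS cnt,
           (strftime('%s', MAX(decided_at)) - strftime('%s', MIN(decided_at)))
               / 3600.0 AS time_h
    FROM mytable
    WHERE decided_at >= date(:today, :offset)
    GROUP BY reviewer, date(decided_at)
) AS t
GROUP BY reviewer  -- Outer level: total reviews over total hours.
ORDER BY reviewer
"""


def reviews_per_hour(today, days_back):
    """Hypothetical helper: rate per reviewer over the last `days_back` days."""
    params = {"today": today, "offset": f"-{days_back} days"}
    return conn.execute(QUERY, params).fetchall()


print(reviews_per_hour("2020-08-13", 3))
print(reviews_per_hour("2020-08-13", 2))
```

Note that the rate here is count divided by hours, so jane comes out as 4 reviews over 5 hours. A reviewer with a single review on a day contributes zero hours; if all of a reviewer's days look like that, SQLite returns NULL for the division rather than raising an error.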

SQL: Create a flag for separate records in the same table with overlapping date ranges

I'm trying to figure out how to create a boolean field that would tell me when two records have overlapping date ranges.
In the following example, every unique Location/Counterparty combo within a specified date range can have EITHER a Contract or a DeliveryPoint, not both. So ids 1 and 2 should be flagged, but ids 3 and 4 are OK because they don't overlap, so their flag should read "False".
I started to do a self join, but after that, I couldn't wrap my head around the next step. Did I start correctly, or is the solution totally different?
id | Location | Counterparty | Contract | DeliveryPoint | StartDate | EndDate
1  | New York | Wal Mart     |          | Philadelphia  | 3/1/2019  | 12/31/2020
2  | New York | Wal Mart     | 123456   |               | 5/1/2019  | 7/31/2019
3  | Toronto  | Target       |          | Boston        | 3/1/2019  | 5/31/2019
4  | Toronto  | Target       | 456789   |               | 6/1/2019  | 12/31/2020
With the flag, I'd want it to look like
id | Location | Counterparty | Contract | DeliveryPoint | StartDate | EndDate    | Overlap
1  | New York | Wal Mart     |          | Philadelphia  | 3/1/2019  | 12/31/2020 | TRUE
2  | New York | Wal Mart     | 123456   |               | 5/1/2019  | 7/31/2019  | TRUE
3  | Toronto  | Target       |          | Boston        | 3/1/2019  | 5/31/2019  | FALSE
4  | Toronto  | Target       | 456789   |               | 6/1/2019  | 12/31/2020 | FALSE
In your insert query, I think you could use a subquery that searches for other records with overlapping dates. Pay attention to the date comparisons. See the example:
insert into table(location, Counterparty, Overlap)
select
#location,
#Counterparty,
case when exists(select Id
from table t
where t.location = #location
and t.Counterparty = #Counterparty
and #startDate <= t.EndDate
and #endDate >= t.StartDate
) then 1 else 0 end as Overlap
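The same EXISTS test can also be applied to rows that are already in the table, which is closer to the flag column asked for in the question. Below is a small sketch on in-memory SQLite; the agreements table name is an assumption, and the dates are rewritten as ISO strings so that plain text comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE agreements (
        id INTEGER, Location TEXT, Counterparty TEXT,
        Contract TEXT, DeliveryPoint TEXT, StartDate TEXT, EndDate TEXT
    );
    INSERT INTO agreements VALUES
        (1, 'New York', 'Wal Mart', NULL,     'Philadelphia', '2019-03-01', '2020-12-31'),
        (2, 'New York', 'Wal Mart', '123456', NULL,           '2019-05-01', '2019-07-31'),
        (3, 'Toronto',  'Target',   NULL,     'Boston',       '2019-03-01', '2019-05-31'),
        (4, 'Toronto',  'Target',   '456789', NULL,           '2019-06-01', '2020-12-31');
    """
)

# Two ranges overlap when each one starts on or before the other one ends.
# b.id <> a.id keeps a row from matching itself.
flags = conn.execute(
    """
    SELECT a.id,
           EXISTS (
               SELECT 1
               FROM agreements AS b
               WHERE b.id <> a.id
                 AND b.Location = a.Location
                 AND b.Counterparty = a.Counterparty
                 AND b.StartDate <= a.EndDate
                 AND b.EndDate >= a.StartDate
           ) AS Overlap
    FROM agreements AS a
    ORDER BY a.id
    """
).fetchall()

print(flags)
```

SQLite returns the EXISTS result as 1 or 0; in SQL Server the equivalent would be the CASE WHEN EXISTS (...) THEN 1 ELSE 0 END form used in the insert above.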

Progress query to remove duplicates based on number of duplicates

Our accounting department needs to pull tax data from our MIS every month and submit it online to the Dept. of Revenue. Unfortunately, when pulling the data, it is duplicated a varying number of times depending on which jurisdictions we have to pay taxes to. All she needs is the dollar amount for one jurisdiction, for one line, because she enters that on the website.
I've tried using DISTINCT to pull only one record of the type, in conjunction with LEFT() to pull just the first 7 characters of the jurisdiction but it ended up excluding certain results that should have been included. I believe it was because the posting date and the amount on a couple transactions was identical. They were separate transactions but the query took them as duplicates and ignored them.
Here is an example of the queries I've run that have been successful in pulling most of the data, but most times they return either too much or not enough:
SELECT DISTINCT LEFT("Sales-Tax-Jurisdiction-Code", 7), "Taxable-Base", "Posting-Date"
FROM ARInvoiceTax
WHERE ("Posting-Date" >= '2019-09-01' AND "Posting-Date" <= '2019-09-30')
AND (("Sales-Tax-Jurisdiction-Code" BETWEEN '55001' AND '56763')
OR "Sales-Tax-Jurisdiction-Code" = 'Dakota Cty TT')
ORDER BY "Sales-Tax-Jurisdiction-Code"
Here is a query that I can use to pull all of the data, and the subsequent result is below that:
SELECT "Sales-Tax-Jurisdiction-Code", "Taxable-Base", "Posting-Date"
FROM ARInvoiceTax
WHERE ("Posting-Date" >= '2019-09-01' AND "Posting-Date" <= '2019-09-30')
AND (("Sales-Tax-Jurisdiction-Code" BETWEEN '55001' AND '56763')
OR "Sales-Tax-Jurisdiction-Code" = 'Dakota Cty TT')
ORDER BY "Sales-Tax-Jurisdiction-Code"
Below is a sample of the output:
Jurisdiction | Tax Amount | Posting Date
-------------|------------|-------------
5512100City | $50.00 | 2019-09-02
5512100City | $50.00 | 2019-09-03
5512100City | $70.00 | 2019-09-02
5512100Cnty | $50.00 | 2019-09-02
5512100Cnty | $50.00 | 2019-09-03
5512100Cnty | $70.00 | 2019-09-02
5512100State | $70.00 | 2019-09-02
5512100State | $50.00 | 2019-09-02
5512100State | $50.00 | 2019-09-03
5513100Cnty | $25.00 | 2019-09-12
5513100State | $25.00 | 2019-09-12
5514100City | $9.00 | 2019-09-06
5514100City | $9.00 | 2019-09-06
5514100Cnty | $9.00 | 2019-09-06
5514100Cnty | $9.00 | 2019-09-06
5515100State | $12.00 | 2019-09-11
5516100City | $6.00 | 2019-09-13
5516100City | $7.00 | 2019-09-13
5516100State | $6.00 | 2019-09-13
5516100State | $7.00 | 2019-09-13
As you can see, the data can be all over the place. One zip code could have multiple different lines. What the accounting department does now is prints a report with this information and, in a spreadsheet, only records (1) dollar amount per transaction. For example, for 55121, she would need to record $50.00, $50.00 and $70.00 (she tallies them and adds the total amount on the website) however the SQL query gives me those (3) numbers, (3) times.
I can't seem to figure out a query that will pull only one set of the data. Unfortunately, I can't do it based on the words/letters after the 00 because not all jurisdictions have all 3 (city, cnty, state) and thus trying to remove lines based on that removes valid lines as well.
Can you use select distinct? If the first five characters are the zip code and you just want that:
select distinct left(jurisdiction, 5), tax_amount
from t;
Take only City/County/.. whatever is first
select jurisdiction, tax_amount, Posting_Date
from (
    select *, dense_rank() over (partition by left(jurisdiction, 7)
                                 order by substring(jurisdiction, 8, len(jurisdiction))) rnk
    from taxes -- your output here
) t -- SQL Server requires an alias on a derived table
where rnk = 1;
This is SQL Server syntax; you may need other string functions in your DBMS.
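Here is a sketch of the same ranking idea on in-memory SQLite, using a subset of the sample rows from the question. It assumes a SQLite build with window-function support (3.25 or later) and uses substr() in place of left()/substring(); the taxes table and column names simply follow the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxes (jurisdiction TEXT, tax_amount REAL, posting_date TEXT)"
)
conn.executemany(
    "INSERT INTO taxes VALUES (?, ?, ?)",
    [
        ("5512100City", 50.0, "2019-09-02"),
        ("5512100City", 50.0, "2019-09-03"),
        ("5512100City", 70.0, "2019-09-02"),
        ("5512100Cnty", 50.0, "2019-09-02"),
        ("5512100Cnty", 50.0, "2019-09-03"),
        ("5512100Cnty", 70.0, "2019-09-02"),
        ("5512100State", 70.0, "2019-09-02"),
        ("5512100State", 50.0, "2019-09-02"),
        ("5512100State", 50.0, "2019-09-03"),
        ("5513100Cnty", 25.0, "2019-09-12"),
        ("5513100State", 25.0, "2019-09-12"),
        ("5515100State", 12.0, "2019-09-11"),
    ],
)

# Within each 7-character code, rank the suffixes (City/Cnty/State)
# alphabetically and keep only the rows of the first suffix present.
kept = conn.execute(
    """
    SELECT jurisdiction, tax_amount, posting_date
    FROM (
        SELECT t.*,
               DENSE_RANK() OVER (
                   PARTITION BY substr(jurisdiction, 1, 7)
                   ORDER BY substr(jurisdiction, 8)
               ) AS rnk
        FROM taxes AS t
    ) AS ranked
    WHERE rnk = 1
    ORDER BY jurisdiction, posting_date, tax_amount
    """
).fetchall()

for row in kept:
    print(row)
```

Because DENSE_RANK gives every row of the first suffix the same rank, separate transactions with identical dates and amounts are kept, which is what DISTINCT was losing.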

SQL - Creating a timeline for each ID (Vertica)

I am dealing with the following problem in SQL (using Vertica):
In short -- Create a timeline for each ID (in a table where I have multiple lines, orders in my example, per ID)
What I would like to achieve -- At my disposal I have a table on historical order date and I would like to compute new customer (first order ever in the past month), active customer- (>1 order in last 1-3 months), passive customer- (no order for last 3-6 months) and inactive customer (no order for >6 months) rates.
Which steps I have taken so far -- I was able to construct a table similar to the example presented below:
CustomerID | Current order date  | Time between current/previous order | First order date (all-time)
001        | 2015-04-30 12:06:58 | (null)                              | 2015-04-30 12:06:58
001        | 2015-09-24 17:30:59 | 147 05:24:01                        | 2015-04-30 12:06:58
001        | 2016-02-11 13:21:10 | 139 19:50:11                        | 2015-04-30 12:06:58
002        | 2015-10-21 10:38:29 | (null)                              | 2015-10-21 10:38:29
003        | 2015-05-22 12:13:01 | (null)                              | 2015-05-22 12:13:01
003        | 2015-07-09 01:04:51 | 47 12:51:50                         | 2015-05-22 12:13:01
003        | 2015-10-23 00:23:48 | 105 23:18:57                        | 2015-05-22 12:13:01
A little bit of intuition: customer 001 placed three orders from which the second one was 147 days after its first order. Customer 002 has only placed one order in total.
What I think that the next steps should be -- I would like to know for each date (also dates on which a certain user did not place an order), for each CustomerID, how long it has been since his/her last order. This would imply that I would create some sort of timeline for each CustomerID. In the example presented above I would get 287 (days between 1st of May 2015 and 11th of February 2016, the timespan of this table) lines for each CustomerID. I have difficulties solving this previous step. When I have performed this step I want to create a field which shows at each date the last order date, the period between the last order date and the current date, and what state someone is in at the current date. For the example presented earlier, this would look something like this:
CustomerID | Last order date     | Current date        | Time between current date/last order | State
001        | 2015-04-30 12:06:58 | 2015-05-01 00:00:00 | 0 00:00:00                           | New
...
001        | 2015-04-30 12:06:58 | 2015-06-30 00:00:00 | 60 11:53:02                          | Active
...
001        | 2015-09-24 17:30:59 | 2016-02-01 00:00:00 | 129 11:53:02                         | Passive
...
...
002        | 2015-10-21 17:30:59 | 2015-10-22 00:00:00 | 0 06:29:01                           | New
...
002        | 2015-10-21 17:30:59 | 2015-11-30 00:00:00 | 39 06:29:01                          | Active
...
...
003        | 2015-05-22 12:13:01 | 2015-06-23 00:00:00 | 31 11:46:59                          | Active
...
003        | 2015-07-09 01:04:51 | 2015-10-22 00:00:00 | 105 11:46:59                         | Inactive
...
At the dots there should be all the inbetween dates but for sake of space I have left these out of the table.
When I know for each date what the state is of each customer (active/passive/inactive) my plan is to sum the states and group by date which should give me the sum of new, active, passive and inactive customers. From here on I can easily compute the rates at each date.
Anybody that knows how I can possibly achieve this task?
Note -- If anyone has other ideas how to achieve the goal presented above (using some other approach compared to the approach I had in mind) please let me know!
EDIT
Suppose you start from a table like this:
SQL> select * from ord order by custid, ord_date ;
custid | ord_date
--------+---------------------
1 | 2015-04-30 12:06:58
1 | 2015-09-24 17:30:59
1 | 2016-02-11 13:21:10
2 | 2015-10-21 10:38:29
3 | 2015-05-22 12:13:01
3 | 2015-07-09 01:04:51
3 | 2015-10-23 00:23:48
(7 rows)
You can use Vertica's timeseries analytic functions TS_FIRST_VALUE() and TS_LAST_VALUE() to fill the gaps and carry the last order date forward to the current date.
Then you just have to combine this with a Vertica TIMESERIES clause generated from the same table, with an interval of one day, starting from the day each customer placed his/her first order up to now (current_date):
select
    custid,
    status_dt,
    last_order_dt,
    case
        when status_dt::date - last_order_dt::date < 30 then case
            when nord = 1 then 'New' else 'Active' end
        when status_dt::date - last_order_dt::date < 90 then 'Active'
        when status_dt::date - last_order_dt::date < 180 then 'Passive'
        else 'Inactive'
    end as status
from (
    select
        custid,
        last_order_dt,
        status_dt,
        conditional_true_event (first_order_dt is null or
            last_order_dt > lag(last_order_dt))
            over(partition by custid order by status_dt) as nord
    from (
        select
            custid,
            ts_first_value(ord_date) as first_order_dt ,
            ts_last_value(ord_date) as last_order_dt ,
            dt::date as status_dt
        from
            ( select custid, ord_date from ord
              union all
              select distinct(custid) as custid, current_date + 1 as ord_date from ord
            ) z timeseries dt as '1 day' over (partition by custid order by ord_date)
    ) x
) y
where status_dt <= current_date
order by 1, 2
;
And you will get something like this:
custid | status_dt | last_order_dt | status
--------+------------+---------------------+---------
1 | 2015-04-30 | 2015-04-30 12:06:58 | New
1 | 2015-05-01 | 2015-04-30 12:06:58 | New
1 | 2015-05-02 | 2015-04-30 12:06:58 | New
...
1 | 2015-05-29 | 2015-04-30 12:06:58 | New
1 | 2015-05-30 | 2015-04-30 12:06:58 | Active
1 | 2015-05-31 | 2015-04-30 12:06:58 | Active
...
etc.
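If you don't have Vertica at hand, the daily timeline can be approximated on other engines with a recursive CTE. The following is only a sketch on in-memory SQLite, not the TIMESERIES approach itself: it generates one row per customer per day from the first order up to a fixed as_of date (an assumed parameter standing in for current_date), looks up the latest order on or before each day, and applies the same 30/90/180-day thresholds as the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ord (custid INTEGER, ord_date TEXT)")
conn.executemany(
    "INSERT INTO ord VALUES (?, ?)",
    [
        (1, "2015-04-30 12:06:58"),
        (1, "2015-09-24 17:30:59"),
        (1, "2016-02-11 13:21:10"),
        (2, "2015-10-21 10:38:29"),
        (3, "2015-05-22 12:13:01"),
        (3, "2015-07-09 01:04:51"),
        (3, "2015-10-23 00:23:48"),
    ],
)

QUERY = """
WITH RECURSIVE timeline(custid, status_dt) AS (
    -- Start each customer's timeline on the day of the first order ...
    SELECT custid, date(MIN(ord_date)) FROM ord GROUP BY custid
    UNION ALL
    -- ... and add one day at a time until the as_of date is reached.
    SELECT custid, date(status_dt, '+1 day') FROM timeline
    WHERE status_dt < :as_of
)
SELECT custid, status_dt, last_order_dt,
       CASE
           WHEN days_since < 30 AND nord = 1 THEN 'New'
           WHEN days_since < 90 THEN 'Active'
           WHEN days_since < 180 THEN 'Passive'
           ELSE 'Inactive'
       END AS status
FROM (
    SELECT t.custid, t.status_dt,
           MAX(o.ord_date) AS last_order_dt,
           COUNT(*) AS nord,  -- orders placed up to and including this day
           CAST(julianday(t.status_dt)
                - julianday(date(MAX(o.ord_date))) AS INTEGER) AS days_since
    FROM timeline AS t
    JOIN ord AS o
      ON o.custid = t.custid AND date(o.ord_date) <= t.status_dt
    GROUP BY t.custid, t.status_dt
) AS daily
ORDER BY custid, status_dt
"""

rows = conn.execute(QUERY, {"as_of": "2016-02-11"}).fetchall()
status = {(custid, day): state for custid, day, _last, state in rows}

print(status[(1, "2015-05-29")], status[(1, "2015-05-30")])
```

This joins every timeline day against all earlier orders, so it is fine for a handful of customers but will not scale the way the gap-filling TIMESERIES query does. From here, grouping by status_dt and status gives the daily counts per state.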