How to select the correct date in the same column when data in different rows are equal - sql

I have the following problem with my data in a DB2 database. I want to create an overview of when a machine was used for a project, with a begin and end date.
The following data is available:
||Machine name||Description||Project||Start date|
|Mach1|DB2_AIX|Team_1_PERS|TEST_1|07-03-2017|
|Mach1|DB2_AIX|Team_1_PERS|TEST_1|16-03-2017|
|Mach1|DB2_AIX|Team_1_PERS|TEST_1|24-04-2017|
|Mach1|DB2_AIX|Team_1_PERS|TEST_1|07-05-2017|
|Mach1|DB2_AIX|Team_1_PERS|TEST_2|13-05-2016|
|Mach1|DB2_AIX|Team_1_PERS|TEST_1|22-05-2017|
|Mach1|DB2_AIX|Team_1_PERS|TEST_1|12-06-2017|
The result that I'm looking for is:
||Machine name||Description||Project||Start date||Last date|
|Mach1|DB2_AIX|Team_1_PERS|TEST_1|07-03-2017|07-05-2017|
|Mach1|DB2_AIX|Team_1_PERS|TEST_2|13-05-2016|13-05-2017|
|Mach1|DB2_AIX|Team_1_PERS|TEST_1|22-05-2017|12-06-2017|
Does anybody have an idea how to create this result with a statement?

This is a classic gaps-and-islands problem, and the standard solutions will work just fine:
WITH Grouped_Run AS (SELECT name, description, project, test, executedOn,
                            ROW_NUMBER() OVER(ORDER BY executedOn) -
                            ROW_NUMBER() OVER(PARTITION BY name, description, project, test ORDER BY executedOn) AS groupingId
                     FROM Machine)
-- one row per consecutive run: first and last date of the run
SELECT name, description, project, test,
       MIN(executedOn) AS testStart, MAX(executedOn) AS testEnd
FROM Grouped_Run
GROUP BY name, description, project, test, groupingId
ORDER BY testStart
(It's a little unclear whether the group is meant to be the whole row, but that's adjustable.)
This will produce the results you're looking for.
Note that depending on what specific version you're on, there may be other/faster ways to achieve these results.
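For anyone who wants to try the approach without a DB2 instance, here is a minimal sketch that runs the same gaps-and-islands query through Python's built-in sqlite3 module (SQLite 3.25 or later is needed for window functions). The table and column names follow the query above. Two things are assumptions: the dates are stored as ISO strings, and the TEST_2 row is dated 2017-05-13 so that it falls between the TEST_1 runs, which is what the expected output suggests.

```python
import sqlite3

# Sample rows modeled on the question. Assumption: ISO date strings, and the
# TEST_2 row dated 2017-05-13 so that it interrupts the TEST_1 runs.
rows = [
    ("Mach1", "DB2_AIX", "Team_1_PERS", "TEST_1", "2017-03-07"),
    ("Mach1", "DB2_AIX", "Team_1_PERS", "TEST_1", "2017-03-16"),
    ("Mach1", "DB2_AIX", "Team_1_PERS", "TEST_1", "2017-04-24"),
    ("Mach1", "DB2_AIX", "Team_1_PERS", "TEST_1", "2017-05-07"),
    ("Mach1", "DB2_AIX", "Team_1_PERS", "TEST_2", "2017-05-13"),
    ("Mach1", "DB2_AIX", "Team_1_PERS", "TEST_1", "2017-05-22"),
    ("Mach1", "DB2_AIX", "Team_1_PERS", "TEST_1", "2017-06-12"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Machine "
    "(name TEXT, description TEXT, project TEXT, test TEXT, executedOn TEXT)"
)
conn.executemany("INSERT INTO Machine VALUES (?, ?, ?, ?, ?)", rows)

# The difference of the two row numbers is constant within each consecutive
# run of the same (name, description, project, test) combination.
islands = conn.execute("""
    WITH Grouped_Run AS (
        SELECT name, description, project, test, executedOn,
               ROW_NUMBER() OVER (ORDER BY executedOn)
             - ROW_NUMBER() OVER (PARTITION BY name, description, project, test
                                  ORDER BY executedOn) AS groupingId
        FROM Machine
    )
    SELECT test, MIN(executedOn) AS testStart, MAX(executedOn) AS testEnd
    FROM Grouped_Run
    GROUP BY name, description, project, test, groupingId
    ORDER BY testStart
""").fetchall()

for row in islands:
    print(row)
```

With this sample data the query returns three rows: a TEST_1 run, the single TEST_2 run, and a second TEST_1 run.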

It seems like you're trying to get the first and last of "Start date". Write a GROUP BY query with MIN(Start date) and another with MAX(Start date) then union the results. You'll have to select DISTINCT or do another GROUP BY to eliminate the duplicates that will occur when there's only one date.

Related

Get first record based on time in PostgreSQL

Do we have a way to get the first record of each day, based on the time?
For example: get the first record today, the first record yesterday, the first record the day before yesterday, ...
Note: I want to get all such records, considering the time.
The expected output should look like:
first_record_today,
first_record_yesterday, ...
As I understand the question, the "first" record per day is the earliest one.
For that, we can use RANK and do the PARTITION BY the day only, truncating the time.
In the ORDER BY clause, we will sort by the time:
SELECT sub.yourdate FROM (
SELECT yourdate,
RANK() OVER
(PARTITION BY DATE_TRUNC('DAY',yourdate)
ORDER BY DATE_TRUNC('SECOND',yourdate)) rk
FROM yourtable
) AS sub
WHERE sub.rk = 1
ORDER BY sub.yourdate DESC;
In the main query, we will sort the data beginning with the latest date, meaning today's one, if available.
If this understanding of the question is incorrect, please let us know what to change by editing your question.
A note: according to your description, a window function is not strictly necessary. A shorter GROUP BY, as shown in the other answer, can produce the correct result too and might be absolutely fine. I chose the window function approach because it makes it easy to add or change conditions that might not be expressible in a simple GROUP BY.
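To compare the two approaches side by side, here is a small sketch using Python's built-in sqlite3 module. It is not PostgreSQL: SQLite has no DATE_TRUNC, so date(yourdate) stands in for truncating to the day, and the timestamps are assumed to be ISO strings. The message column and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: timestamps as ISO strings plus a made-up message column.
conn.execute("CREATE TABLE yourtable (yourdate TEXT, message TEXT)")
conn.executemany("INSERT INTO yourtable VALUES (?, ?)", [
    ("2023-01-02 09:15:00", "second day, first"),
    ("2023-01-02 17:40:00", "second day, later"),
    ("2023-01-01 08:00:00", "first day, first"),
    ("2023-01-01 12:30:00", "first day, later"),
])

# Window-function version: rank rows within each day by time, keep rank 1.
# Other columns of the winning row (here: message) come along for free.
window_rows = conn.execute("""
    SELECT sub.yourdate, sub.message FROM (
        SELECT yourdate, message,
               RANK() OVER (PARTITION BY date(yourdate)
                            ORDER BY yourdate) AS rk
        FROM yourtable
    ) AS sub
    WHERE sub.rk = 1
    ORDER BY sub.yourdate DESC
""").fetchall()

# GROUP BY version: enough when only the earliest timestamp per day is needed.
group_rows = conn.execute("""
    SELECT MIN(yourdate) AS first_of_day
    FROM yourtable
    GROUP BY date(yourdate)
    ORDER BY first_of_day DESC
""").fetchall()

print(window_rows)
print(group_rows)
```

Both queries pick the same timestamps; only the window version also carries the rest of the row.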
EDIT, because the question's author provided further information:
Here is the query that also fetches the first message:
SELECT sub.yourdate, sub.message FROM (
SELECT yourdate, message,
RANK() OVER (PARTITION BY DATE_TRUNC('DAY',yourdate)
ORDER BY DATE_TRUNC('SECOND',yourdate)) rk
FROM yourtable
) AS sub
WHERE sub.rk = 1
ORDER BY sub.yourdate DESC;
Or if only the message without the date should be selected:
SELECT sub.message FROM (
SELECT yourdate, message,
RANK() OVER (PARTITION BY DATE_TRUNC('DAY',yourdate)
ORDER BY DATE_TRUNC('SECOND',yourdate)) rk
FROM yourtable
) AS sub
WHERE sub.rk = 1
ORDER BY sub.yourdate DESC;

SQL Grabbing unique counts per category

I'm pretty new to SQL and Redshift, but there is a weird problem I'm getting.
So my data looks like below. Ignore the actual id and date_time values... I just put in random info, but it's the same format.
id date_time (varchar(255))
1 2019-01-11T05:01:59
1 2019-01-11T05:01:59
2 2019-01-11T05:01:59
3 2019-01-11T05:01:59
1 2019-02-11T05:01:59
2 2019-02-11T05:01:59
I'm trying to get the count of unique IDs per month.
I've tried the command below. Given the amount of data, I just tried a demo on the first 10 rows of my table...
SELECT COUNT(DISTINCT id),
LEFT(date_time,7)
FROM ( SELECT top 10*
FROM myTable.ME )
GROUP BY LEFT(date_time, 7), id
I expect something like below.
count left
3 2019-01
2 2019-02
But I'm instead getting similar to what's below
I then tried the below command which seems correct.
SELECT COUNT(DISTINCT id),
LEFT(date_time,7)
FROM ( SELECT top 1000000*
FROM myTable.ME )
GROUP BY LEFT(date_time, 7)
However, if you remove the DISTINCT portion, you get the results below. It seems like it is only looking at a certain month (2019-01), rather than other months.
If anyone can tell me what is wrong with the commands I'm using or can give me the correct command, I'll be very grateful. Thank you.
EDIT: Could it possibly be because maybe my data isn't clean?
Why are you using a string for the date? That is simply wrong. There are built-in types. But assuming you have some reason or cannot change it, use string functions:
select left(date_time, 7) as yyyymm,
count(distinct id)
from t
group by yyyymm
order by yyyymm;
In your first query you have id in the group by which does not do what you want.
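To see the difference between the two GROUP BY clauses, here is a minimal sketch on the question's six sample rows, run through Python's built-in sqlite3 module rather than Redshift. SQLite has no LEFT(), so substr(date_time, 1, 7) is used instead; the table name t follows the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, date_time TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    (1, "2019-01-11T05:01:59"),
    (1, "2019-01-11T05:01:59"),
    (2, "2019-01-11T05:01:59"),
    (3, "2019-01-11T05:01:59"),
    (1, "2019-02-11T05:01:59"),
    (2, "2019-02-11T05:01:59"),
])

# Grouping by month only: one row per month with its distinct-id count.
per_month = conn.execute("""
    SELECT substr(date_time, 1, 7) AS yyyymm, COUNT(DISTINCT id)
    FROM t
    GROUP BY yyyymm
    ORDER BY yyyymm
""").fetchall()

# Grouping by month AND id: every group contains a single id,
# so COUNT(DISTINCT id) is 1 on every row.
per_month_and_id = conn.execute("""
    SELECT substr(date_time, 1, 7) AS yyyymm, COUNT(DISTINCT id)
    FROM t
    GROUP BY yyyymm, id
    ORDER BY yyyymm, id
""").fetchall()

print(per_month)
print(per_month_and_id)
```

The first result matches the expected output in the question (3 for 2019-01, 2 for 2019-02); the second has five rows, each with a count of 1.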

Get Max(date) or latest date with 2 conditions or group by or subquery

I only have basic SQL skills. I'm working in SQL in Navicat. I've looked through the threads of people who were also trying to get latest date, but not yet been able to apply it to my situation.
I am trying to get the latest date for each name, for each chemical. I think of it this way: "Within each chemical, look at data for each name, choose the most recent one."
I have tried using max(date(date)) but it needs to be nested or subqueried within chemical.
I also tried ranking by date(date) DESC, then using LIMIT 1. But I was not able to nest this within chemical either.
When I try to write it as a subquery, I keep getting an error on the ( . I've switched it up so that I am beginning the subquery a number of different ways, but the error returns near that area always.
Here is what the data looks like: [image not included]
Here is one of my failed queries:
SELECT
WELL_NAME,
CHEMICAL,
RESULT,
APPROX_LAT,
APPROX_LONG,
DATE
FROM
data_all
ORDER BY
CHEMICAL ASC,
date( date ) DESC (
SELECT
WELL_NAME,
CHEMICAL,
APPROX_LAT,
APPROX_LONG,
DATE
FROM
data_all
WHERE
WELL_NAME = WELL_NAME
AND CHEMICAL = CHEMICAL
AND APPROX_LAT = APPROX_LAT
AND APPROX_LONG = APPROX_LONG,
LIMIT 2
)
If someone does have a response, it would be great if it is in as lay language as possible. I've only had one coding class. Thanks very much.
Maybe something like this?
SELECT WELL_NAME, CHEMICAL, MAX(DATE)
FROM data_all
GROUP BY WELL_NAME, CHEMICAL
If you want all information, then use the ANSI-standard ROW_NUMBER():
SELECT da.*
FROM (SELECT da.*,
             -- number the rows of each chemical/well pair, newest first
             ROW_NUMBER() OVER (PARTITION BY CHEMICAL, WELL_NAME ORDER BY DATE DESC) AS seqnum
      FROM data_all da
     ) da
WHERE seqnum = 1;
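As a rough illustration of the ROW_NUMBER() idea, the sketch below runs it through Python's built-in sqlite3 module on a few made-up rows. The well names, chemicals, and values are hypothetical, only some of the question's columns are included, and the dates are assumed to be ISO strings so that they sort correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical subset of the question's columns; "DATE" is quoted to be safe.
conn.execute(
    'CREATE TABLE data_all (WELL_NAME TEXT, CHEMICAL TEXT, RESULT REAL, "DATE" TEXT)'
)
conn.executemany("INSERT INTO data_all VALUES (?, ?, ?, ?)", [
    ("W1", "Arsenic", 1.0, "2018-01-01"),
    ("W1", "Arsenic", 2.0, "2019-06-01"),
    ("W1", "Lead",    0.5, "2017-03-01"),
    ("W2", "Arsenic", 3.0, "2016-05-05"),
    ("W2", "Arsenic", 4.0, "2015-05-05"),
])

# Number the rows within each chemical/well pair, newest first,
# then keep only row number 1 (the latest sample of each pair).
latest = conn.execute("""
    SELECT WELL_NAME, CHEMICAL, RESULT, "DATE"
    FROM (SELECT d.*,
                 ROW_NUMBER() OVER (PARTITION BY CHEMICAL, WELL_NAME
                                    ORDER BY "DATE" DESC) AS seqnum
          FROM data_all d) AS ranked
    WHERE seqnum = 1
    ORDER BY CHEMICAL, WELL_NAME
""").fetchall()

for row in latest:
    print(row)
```

Unlike the plain MAX(DATE) with GROUP BY, this keeps the other columns of the latest row, such as RESULT.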

sql count all items that day until start of database isn't working because of time

I am trying to count the items (deployments) in a database table. I have counted the total number of items, 3879, by doing this:
use Bamboo_Version6
go
SELECT Count(STARTED_DATE)
FROM [Bamboo_Version6].[dbo].[DEPLOYMENT_RESULT]
But I have been struggling to get the number of items each day until the start. I have tried using some of the other similar answers to this like:
select STARTED_Date, count(deploymentID)
from [Bamboo_Version6].[dbo].[DEPLOYMENT_RESULT]
WHERE STARTED_Date>=dateadd(day,datediff(day,0,STARTED_Date)- 7,0)
GROUP BY STARTED_Date
But this returns every ID with a 1 beside it, because the dates include times, which makes each value unique. So I tried CONVERT(varchar(12),STARTED_DATE,110) to fix the problem, but it still happens. How can I get the count per day without getting every ID listed with a count of 1?
Remove the time component:
select cast(STARTED_Date as date) as dte, count(deploymentID)
from [Bamboo_Version6].[dbo].[DEPLOYMENT_RESULT]
group by cast(STARTED_Date as date)
order by dte;
I'm not sure what the WHERE clause is supposed to be doing, so I just removed it. If it is useful, add it back in.
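Here is a small sketch of why dropping the time component matters, using Python's built-in sqlite3 module instead of SQL Server. SQLite's date() plays the role of CAST(... AS date) here, and the six sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DEPLOYMENT_RESULT (deploymentID INTEGER, STARTED_DATE TEXT)")
# Hypothetical deployments; every timestamp is distinct.
conn.executemany("INSERT INTO DEPLOYMENT_RESULT VALUES (?, ?)", [
    (1, "2020-01-01 08:00:00"),
    (2, "2020-01-01 13:30:00"),
    (3, "2020-01-02 09:00:00"),
    (4, "2020-01-02 10:00:00"),
    (5, "2020-01-02 23:59:59"),
    (6, "2020-01-03 00:00:01"),
])

# Grouping by the raw timestamp: each group holds exactly one row.
by_timestamp = conn.execute("""
    SELECT STARTED_DATE, COUNT(deploymentID)
    FROM DEPLOYMENT_RESULT
    GROUP BY STARTED_DATE
""").fetchall()

# Grouping by the date part only: one row per day.
by_day = conn.execute("""
    SELECT date(STARTED_DATE) AS dte, COUNT(deploymentID)
    FROM DEPLOYMENT_RESULT
    GROUP BY date(STARTED_DATE)
    ORDER BY dte
""").fetchall()

print(by_timestamp)
print(by_day)
```

The first query reproduces the "every row has a count of 1" symptom from the question; the second gives the per-day totals.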
Another way of doing this is with an OVER clause. Partition by the day only (an ORDER BY inside the window would turn the count into a running total) and use DISTINCT to collapse the result to one row per day:
SELECT DISTINCT cast(STARTED_DATE AS DATE) AS Deployment_date,
       COUNT(deploymentID) OVER (PARTITION BY cast(STARTED_DATE AS DATE)) AS NumberofDeployments
FROM [Bamboo_Version6].[dbo].[DEPLOYMENT_RESULT]
ORDER BY Deployment_date

Teradata - Max value of a dataset with corresponding date

This is probably obvious, I just can't seem to get it to work right. Let's say I have a table of various servers and their CPU percentages for every day for the past year. I want to basically say:
"for every server name, show me the max CPU value that this server hit (from this dataset) and the corresponding date that it happened on"
So ideally I would get a result like:
server1 52.34% 3/16/2012
server2 48.76% 4/15/2012
server3 98.32% 6/16/2012
etc..
When I try to do this like so, I can't use a group by or else it just shows me every date:
select servername, date, max(cpu) from cpu_values group by 1,2 order by 1,2;
This of course just gives me every server and every date.. Sub-query? Partition by? Any assistance would be appreciated!
You can use the row_number() OLAP window function:
select servername
, cpu
, date
from cpu_values
qualify row_number() over (partition by servername
order by cpu desc) = 1
Notice that you do not need a GROUP BY or ORDER BY clause. The PARTITION clause is similar to a GROUP BY and the ORDER BY clause sorts the rows within each partition (in this case by descending cpu). The "= 1" part keeps only the first row of each partition, that is, the row with the highest cpu for that server.
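QUALIFY is not available in every engine. As a sketch of the same idea elsewhere, the snippet below uses Python's built-in sqlite3 module and moves the row_number() filter into an outer WHERE. The sample values are adapted from the question's desired output plus some made-up lower readings, and the dates are assumed to be ISO strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE cpu_values (servername TEXT, cpu REAL, "date" TEXT)')
# Peak rows follow the question's example; the lower readings are made up.
conn.executemany("INSERT INTO cpu_values VALUES (?, ?, ?)", [
    ("server1", 52.34, "2012-03-16"),
    ("server1", 40.10, "2012-03-17"),
    ("server2", 48.76, "2012-04-15"),
    ("server2", 12.00, "2012-04-16"),
    ("server3", 98.32, "2012-06-16"),
    ("server3", 75.50, "2012-06-17"),
])

# Without QUALIFY, compute row_number() in a subquery and filter outside it.
peaks = conn.execute("""
    SELECT servername, cpu, "date"
    FROM (SELECT servername, cpu, "date",
                 ROW_NUMBER() OVER (PARTITION BY servername
                                    ORDER BY cpu DESC) AS rn
          FROM cpu_values) AS ranked
    WHERE rn = 1
    ORDER BY servername
""").fetchall()

for row in peaks:
    print(row)
```

If a server hits its maximum on more than one day, row_number() picks one of those rows arbitrarily unless a tie-breaker such as the date is added to the window's ORDER BY.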
A subquery would be the simplest solution:
SELECT
S.Name, Peak.PeakUsage, MIN(S.Date) AS Date
FROM
ServerHistory AS S
INNER JOIN
(
SELECT
ID, MAX(CPUUsage) AS PeakUsage
FROM
ServerHistory
WHERE
Date BETWEEN X AND Y
GROUP BY
ID
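) AS Peak ON S.ID = Peak.ID
-- match only the peak rows, so MIN(S.Date) is the first date the peak occurred
AND S.CPUUsage = Peak.PeakUsage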
GROUP BY
S.Name, Peak.PeakUsage
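A minimal sketch of the join-back-to-the-peak idea, run with Python's built-in sqlite3 module. The table and column names follow the query above, the rows are made up, and the date-range filter is left out. The join matches on both the ID and the peak value, so MIN picks the earliest date on which the peak occurred rather than the earliest date in the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE ServerHistory (ID INTEGER, Name TEXT, CPUUsage REAL, "Date" TEXT)')
# Hypothetical history; server1 reaches its peak of 90.0 on two different days.
conn.executemany("INSERT INTO ServerHistory VALUES (?, ?, ?, ?)", [
    (1, "server1", 90.0,  "2012-03-20"),
    (1, "server1", 90.0,  "2012-03-16"),
    (1, "server1", 10.0,  "2012-01-01"),
    (2, "server2", 48.76, "2012-04-15"),
    (2, "server2", 30.0,  "2012-02-01"),
])

peak_dates = conn.execute("""
    SELECT S.Name, Peak.PeakUsage, MIN(S."Date") AS PeakDate
    FROM ServerHistory AS S
    INNER JOIN (
        SELECT ID, MAX(CPUUsage) AS PeakUsage
        FROM ServerHistory
        GROUP BY ID
    ) AS Peak
        ON S.ID = Peak.ID
       AND S.CPUUsage = Peak.PeakUsage   -- keep only rows at the peak value
    GROUP BY S.Name, Peak.PeakUsage
    ORDER BY S.Name
""").fetchall()

for row in peak_dates:
    print(row)
```

For server1 this yields 2012-03-16, the first day the peak was hit; joining on the ID alone would return 2012-01-01, the earliest date in the table.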
P.S., next time around, you may want to tag with "SQL". There are relatively few Teradata people out there, but plenty who can help with basic SQL questions.