Selecting a Max and Min from the same column in SQL in a single query

I have a table that contains entries for each step in the supplier delivery process.
For example: arrived, on-premise, offloaded, and off-premise.
When each of these steps happens, we capture the timestamp it happened on.
I want to select how long each supplier was on-premise, considering only entries with a status ID greater than or equal to 30. In other words: the timestamp of the biggest status ID minus the timestamp of the smallest status ID, where the smallest status ID is >= 30.
Note that not all the steps are necessarily completed; I need to select the highest step reached for that supplier.
How do I do this in SQL?
My supplier delivery history table's columns:
GUID ID,
DATETIME TimeStamp,
GUID FK_SupplierDeliveryID,
TINYINT FK_SupplierDeliveryStatusID
Supplier delivery status table's columns:
TINYINT ID,
NVARCHAR Description,
Supplier table:
GUID ID,
NVARCHAR SupplierName
Ideally, I would like to return the following fields from the query:
SupplierID, SupplierName, LastStatus, Time In, Time Out, Elapse
where SupplierID is the ID from the supplier table, SupplierName is the supplier's name from the supplier table, LastStatus is the biggest StatusKey captured for the supplier, Time In is the date of the entry where StatusKey = 30, Time Out is the date of the entry where StatusKey = 40 if that is the biggest StatusKey captured for the supplier (otherwise null), and Elapse = Time Out - Time In.
I have tried:
SELECT
sdh.FK_SupplierDeliveryID,
MAX(sdh.StatusKey) AS HighestStatus,
MIN(sdh.StatusKey) AS LowestStatus,
MAX(sdh.StatusDate) AS HighestDate,
MIN(sdh.StatusDate) AS LowestDate
FROM
SupplierDeliveryStatusHistory AS sdh
WHERE
sdh.StatusKey> 30
GROUP BY
sdh.FK_SupplierDeliveryID,
sdh.StatusKey

You're aggregating on the sdh.StatusKey column, so you shouldn't group by it:
SELECT
sdh.FK_SupplierDeliveryID,
MAX(sdh.StatusKey) AS HighestStatus,
MIN(sdh.StatusKey) AS LowestStatus
FROM
SupplierDeliveryStatusHistory AS sdh
WHERE
sdh.StatusKey >= 30 -- >= rather than >, so the status 30 entry is included
GROUP BY
sdh.FK_SupplierDeliveryID -- sdh.StatusKey removed here
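To also get Time In, Time Out and Elapse per delivery, conditional aggregation is one option. The following is only a sketch: it runs on an in-memory SQLite database through Python rather than on the asker's server, the sample rows and the ElapseMinutes name are made up, and the join to the supplier table is left out. On SQL Server the elapsed time would more likely be computed with DATEDIFF.

```python
import sqlite3

# Hypothetical sample rows; column names follow the query in the question.
ROWS = [
    ("A", 20, "2024-01-01 08:00:00"),  # earlier step, filtered out by >= 30
    ("A", 30, "2024-01-01 09:00:00"),  # Time In
    ("A", 40, "2024-01-01 11:30:00"),  # Time Out
    ("B", 30, "2024-01-01 10:00:00"),  # no status 40 captured yet
]

QUERY = """
SELECT t.*,
       (strftime('%s', t.TimeOut) - strftime('%s', t.TimeIn)) / 60 AS ElapseMinutes
FROM (
    SELECT sdh.FK_SupplierDeliveryID,
           MAX(sdh.StatusKey) AS LastStatus,
           MAX(CASE WHEN sdh.StatusKey = 30 THEN sdh.StatusDate END) AS TimeIn,
           MAX(CASE WHEN sdh.StatusKey = 40 THEN sdh.StatusDate END) AS TimeOut
    FROM SupplierDeliveryStatusHistory AS sdh
    WHERE sdh.StatusKey >= 30
    GROUP BY sdh.FK_SupplierDeliveryID
) AS t
ORDER BY t.FK_SupplierDeliveryID
"""


def on_premise_times():
    """Return one row per delivery: id, last status, time in, time out, minutes."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE SupplierDeliveryStatusHistory "
        "(FK_SupplierDeliveryID TEXT, StatusKey INTEGER, StatusDate TEXT)"
    )
    con.executemany("INSERT INTO SupplierDeliveryStatusHistory VALUES (?, ?, ?)", ROWS)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in on_premise_times():
        print(row)
```

Each CASE expression yields NULL for rows of any other status, so MAX picks out the single matching date, and a delivery without a status 40 entry gets NULL for both Time Out and the elapsed time.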

Related

SQL data cleaning SELECT DISTINCT from duplicate ID and return list of records. Scenario: Return unique IDs for first and latest instance

Dataset: customer_data
Table: customer_table (30 records)
Fields: customer_id, name
Datatype: customer_id = INTEGER, name = STRING
The problem or request: the customer_table contains 30 rows of customer data. However, there are some duplicate rows and I need to clean the data. I am using Google BigQuery to perform my SQL querying and I want to query the customer_table from the customer_data dataset to return unique customer_id along with the corresponding name.
If duplicate customer_id exists, but the duplicate has a different name, return the first instance record and discard the duplicate and continue returning all unique customer_id and name.
Alternately, if duplicate customer_id exists, but has different name, return the latest instance record from the table and discard the duplicate and continue returning all unique customer_id and name.
My methods:
Identify the unique values using SELECT DISTINCT.
SELECT DISTINCT customer_id
FROM customer_data.customer_table
Result: 24 rows
SELECT DISTINCT name
FROM customer_data.customer_table
Result: 25 rows
After finding that the numbers of unique customer_id and name values do not match, I suspect that one customer_id is associated with two different names.
Visualize which duplicate customer_id has two names:
SELECT DISTINCT customer_id, name
FROM customer_data.customer_table
ORDER BY customer_id ASC
Result: 25 rows
It appears there is one duplicated customer_id, and that customer_id has two different names.
Example:
customer_id name
1890 Henry Fiction
1890 Arthur Stories
Return DISTINCT customer_id and name. If there are duplicates return only the first, discard the duplicate, and continue returning unique customer_id and name.
SELECT DISTINCT customer_id, name
FROM
(SELECT
customer_id, name,
ROW_NUMBER() OVER (PARTITION BY customer_id
ORDER BY customer_id ASC) AS row_num
FROM
customer_data.customer_table) subquery
WHERE
subquery.row_num = 1
Result: 24 rows
I decided to use ROW_NUMBER() in a subquery so that the inner query first numbers the rows within each customer_id. The outer query then uses a WHERE clause to return each distinct customer_id with the name from the first instance of that customer_id recorded in customer_table.
Excellent! This query returns each unique customer_id along with its name; where a customer_id is duplicated with a different name, it keeps the first instance recorded in customer_table.
Now, what if I wanted the query to return the latest record in the table, instead of the first, whenever it encounters a duplicate customer_id? How should I approach this problem? What query method would you suggest?
Expected result: 24 rows
What I've tried:
SELECT DISTINCT customer_id, name
FROM
(SELECT
customer_id, name,
ROW_NUMBER() OVER (PARTITION BY customer_id
ORDER BY customer_id ASC) AS row_num
FROM
customer_data.customer_table) subquery
WHERE
subquery.row_num > 1
Result : 4 rows
Desired result: 24 rows
I tried changing the WHERE clause for subquery.row_num > 1 just to see what would change and see the desired values I want in my created list of unique customer_id and name. Of the 4 rows produced from the query, only one row has the duplicate customer_id and different name that I want, which is the latest duplicate customer_id having a different name in the customer_table. Referring back to the example where
SELECT DISTINCT customer_id, name
FROM customer_data.customer_table
revealed:
customer_id name
1890 Henry Fiction
1890 Arthur Stories
One of the duplicate customer_id rows, 1890, was recorded first in the table and the other was recorded later. The alternate request is to return a list of unique customer_id and name such that, when the query encounters a duplicate customer_id, it selects the latest record in customer_table.
In case you don't have a timestamp when a record was added, I am afraid you won't be able to identify the latest record. Based on this post, BQ does not add the timestamp automatically. Is your table partitioned? If yes, then you might be able to identify the latest record using partitions.
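If such a timestamp did exist, picking the latest row per customer_id would only need a different ORDER BY inside the window. The sketch below assumes a hypothetical created_at column (not present in the original table), uses made-up sample rows, and runs on SQLite through Python rather than on BigQuery.

```python
import sqlite3

# Hypothetical rows: created_at is an assumed column, not part of the
# original customer_table, and "Jane Prose" is invented sample data.
ROWS = [
    (1890, "Henry Fiction", "2023-01-05"),
    (1890, "Arthur Stories", "2023-03-10"),
    (2001, "Jane Prose", "2023-02-01"),
]

LATEST_PER_CUSTOMER = """
SELECT customer_id, name
FROM (
    SELECT customer_id, name,
           ROW_NUMBER() OVER (PARTITION BY customer_id
                              ORDER BY created_at DESC) AS row_num
    FROM customer_table
) AS subquery
WHERE subquery.row_num = 1
ORDER BY customer_id
"""


def latest_names():
    """Return (customer_id, name) using the most recent row per customer_id."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE customer_table (customer_id INTEGER, name TEXT, created_at TEXT)"
    )
    con.executemany("INSERT INTO customer_table VALUES (?, ?, ?)", ROWS)
    result = con.execute(LATEST_PER_CUSTOMER).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(latest_names())
```

Ordering by created_at DESC gives the newest row row_num = 1; switching to ASC would return the first recorded row instead. Ordering by customer_id inside a partition on customer_id leaves the row order within each group undefined, which is why a separate ordering column is needed.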

Get MIN aggregate field from table for the past X days from updated_at timestamp column

Given the following table structure and sample data:
CREATE TABLE IF NOT EXISTS `records` (
`id` int unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY,
`external_id` int unsigned NOT NULL,
`sub_id` int unsigned DEFAULT 0,
`amount` bigint unsigned NOT NULL,
`updated_at` TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) DEFAULT CHARSET=utf8;
INSERT INTO `records` (`external_id`, `sub_id`,`amount`, `updated_at`) VALUES
(1, 0, 160, '2022-01-13 16:00:00'),
(1, 1001, 150, '2022-01-13 16:40:00'),
(1, 1002, 170, '2022-06-13 16:40:00'),
(1, 1003, 170, '2022-06-13 16:40:00');
I'm trying to get the MIN value of amount for the past X days (assume 30), for a given external_id, using the timestamp field updated_at, with the following constraints:
If there are no records for the past 30 days (changes), the latest record is still a valid one,
Each new record for a given external_id would "cancel and replace" the previous record,
If there are both records with sub_id = 0 and sub_id <> 0, for the same given external_id the records with sub_id <> 0 would prevail.
So, the query for the above data should return 150.
A fiddle and what I have tried at: https://dbfiddle.uk/?rdbms=mariadb_10.6&fiddle=e4ddd6b55dbccf607633c1cf7d9cd4ef
Extra Information (Later edit)
To give you a better picture of the whole idea here:
Each time an update is made on the amount field, there is a new record created in the records table (to create a history log).
My task is to query the MIN amount for the past 30 days.
Some external_id records also have that sub_id.
Whether there is or isn't a sub_id, the record for the external_id would be created.
Amounts with sub_id are usually bigger (something extra gets added to the amount for product_id).
At the moment it's not clear what should happen if there are multiple external_id values. You seem to want only one row returned? (If this is not the case, please improve the example to include the desired results when there are multiple different external_id values.)
If you do just want one value returned, you could simply ORDER BY <something> LIMIT 1
I, however, am going to assume you want just one value per external_id.
WITH
filtered_sorted
AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY external_id
ORDER BY CASE WHEN sub_id <> 0 THEN 0 ELSE 1 END,
amount
)
AS rn
FROM
records AS r
WHERE
updated_at >= (SELECT COALESCE(MAX(updated_at), DATE(NOW()) - INTERVAL 30 DAY)
FROM records
WHERE updated_at <= DATE(NOW()) - INTERVAL 30 DAY
AND external_id = r.external_id
)
)
SELECT
*
FROM
filtered_sorted
WHERE
rn = 1
This is based on the notion that the most recent row on or before the start of the day 30 days ago is still valid and should be included in consideration for the lowest amount.
For each external_id...
Ignore all records with updated_at after the start of the day 30 days ago
Return the most recent remaining updated_at (which could be exactly the start of the day 30 days ago)
If no such row is found, return the start of the day 30 days ago
We will consider all rows for that updated_at onwards
Then the ROW_NUMBER() prefers rows with a non-zero sub_id and then rows with the lowest amount.
For the rows being considered, after the WHERE clause described above
Assign each row a row number
Each external_id should have its own individual sequence of row numbers (achieved with PARTITION BY external_id)
Rows with sub_id <> 0 should come before any rows with sub_id = 0 (achieved by ORDER BY CASE WHEN sub_id <> 0 THEN 0 ELSE 1 END)
Rows with lower amount values should come first (achieved with ORDER BY amount)
Then, just return rows where the assigned row number is 1
Partitioned by external_id
Filtered by updated_at
Sorted by sub_id <> 0, amount
(One row per external_id)
Demo on dbfiddle.uk
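The steps above can be checked against the sample data from the question. This is a sketch only: it ports the query to SQLite (run through Python), replaces NOW() with an explicit :now parameter so the result is reproducible, and uses datetime(..., 'start of day') as a stand-in for DATE(NOW()) - INTERVAL 30 DAY. The function name and the chosen dates are assumptions for illustration.

```python
import sqlite3

# Sample rows from the question: (external_id, sub_id, amount, updated_at).
ROWS = [
    (1, 0, 160, "2022-01-13 16:00:00"),
    (1, 1001, 150, "2022-01-13 16:40:00"),
    (1, 1002, 170, "2022-06-13 16:40:00"),
    (1, 1003, 170, "2022-06-13 16:40:00"),
]

QUERY = """
WITH filtered_sorted AS (
    SELECT r.*,
           ROW_NUMBER() OVER (
               PARTITION BY external_id
               ORDER BY CASE WHEN sub_id <> 0 THEN 0 ELSE 1 END, amount
           ) AS rn
    FROM records AS r
    WHERE updated_at >= (
        -- latest row on or before the cutoff, else the cutoff itself
        SELECT COALESCE(MAX(updated_at),
                        datetime(:now, '-30 days', 'start of day'))
        FROM records
        WHERE updated_at <= datetime(:now, '-30 days', 'start of day')
          AND external_id = r.external_id
    )
)
SELECT external_id, sub_id, amount
FROM filtered_sorted
WHERE rn = 1
"""


def min_amount_rows(now):
    """Return the chosen (external_id, sub_id, amount) row per external_id."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE records "
        "(external_id INTEGER, sub_id INTEGER, amount INTEGER, updated_at TEXT)"
    )
    con.executemany("INSERT INTO records VALUES (?, ?, ?, ?)", ROWS)
    result = con.execute(QUERY, {"now": now}).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(min_amount_rows("2022-06-20"))
```

With :now set to 2022-06-20, the cutoff is the start of 2022-05-21, the most recent row before it is the one from 2022-01-13 16:40:00, and the rows from that timestamp onwards are considered, which yields the amount 150 that the question expects.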

Self Join in Postgres

I have a ticket table with a create_time column. When a ticket is created, one row is inserted into the table with create_time set to the creation time. When the ticket is closed, another row is inserted, but this time create_time holds the time the ticket was closed. Please help me write a query that returns Ticket_Number, create_time as the create time, and create_time as the closed time, all in one row.
That is, each ticket should appear once, with two create_time columns.
Say I have the following data:
Ticket_Number Create_Time
123 09-12-2018
123 10-12-2018
I want the output on a single line: the ticket should appear only once, and the create_time column should appear twice, once with the create date and once with the close date.
Ticket_Number Create_Time Create_Time
123 09-12-2018 10-12-2018
I expect the close time to always come after the create time, so the creation time is the minimum value of the column create_time and the close time is the maximum value for that column.
So you need a simple group by query:
select ticket_number,
min(create_time) as create_time,
max(create_time) as close_time
from the_table
group by ticket_number;
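One case the plain MIN/MAX version does not distinguish is a ticket that is still open: it has only one row, so both columns would show the same time. A possible variation, sketched here on SQLite through Python rather than on Postgres, returns NULL for close_time until a second row exists. Ticket 456 is made-up sample data, and the dates are written in ISO format so that they sort correctly as text.

```python
import sqlite3

# Ticket 123 comes from the question (dates rewritten as ISO strings);
# ticket 456 is a hypothetical ticket that has not been closed yet.
ROWS = [
    (123, "2018-12-09"),
    (123, "2018-12-10"),
    (456, "2018-12-11"),
]

QUERY = """
SELECT ticket_number,
       MIN(create_time) AS create_time,
       -- only report a close time once a second row has been inserted
       CASE WHEN COUNT(*) > 1 THEN MAX(create_time) END AS close_time
FROM the_table
GROUP BY ticket_number
ORDER BY ticket_number
"""


def ticket_times():
    """Return (ticket_number, create_time, close_time) with one row per ticket."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE the_table (ticket_number INTEGER, create_time TEXT)")
    con.executemany("INSERT INTO the_table VALUES (?, ?)", ROWS)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in ticket_times():
        print(row)
```

The CASE expression has no ELSE branch, so it evaluates to NULL for tickets with a single row.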

due date estimating

My "insurance_pay_dtl" (insurance) table contains an 'ins_paid_dt' (insurance paid date) column.
I need to select all members who have not paid the insurance amount before the due date.
The due date is 1 year (365 days).
How do I do this?
You need to link insurance_pay_dtl table with insurance_farmer_hdr with its primary key, for example:
Select member_id, member_name from insurance_farmer_hdr ifd, insurance_pay_dtl ipd
where ifd.insurance_rec_id = ipd.insurance_rec_id and trunc(sysdate) > ifd.due_date
Change the columns in the above query to match your table columns and try it.
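An inner join like the one above only returns members that have at least one payment row, so members who never paid would be missed. One possible alternative is a NOT EXISTS check, sketched below on SQLite through Python rather than on Oracle. The column layout of both tables, the sample rows, and the :today parameter (standing in for trunc(sysdate)) are all assumptions, since the question does not show the schema.

```python
import sqlite3

# Hypothetical schema and data: (insurance_rec_id, member_id, member_name, due_date).
HDR = [
    (1, 101, "Asha", "2023-06-30"),
    (2, 102, "Ravi", "2023-06-30"),
    (3, 103, "Mina", "2023-06-30"),
]
# (insurance_rec_id, ins_paid_dt); record 3 has no payment at all.
PAY = [
    (1, "2023-05-15"),  # paid before the due date
    (2, "2023-08-01"),  # paid after the due date
]

QUERY = """
SELECT ifd.member_id, ifd.member_name
FROM insurance_farmer_hdr AS ifd
WHERE ifd.due_date < :today
  AND NOT EXISTS (
      SELECT 1
      FROM insurance_pay_dtl AS ipd
      WHERE ipd.insurance_rec_id = ifd.insurance_rec_id
        AND ipd.ins_paid_dt <= ifd.due_date
  )
ORDER BY ifd.member_id
"""


def overdue_members(today):
    """Return members whose due date has passed without an on-time payment."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE insurance_farmer_hdr "
        "(insurance_rec_id INTEGER, member_id INTEGER, member_name TEXT, due_date TEXT)"
    )
    con.execute(
        "CREATE TABLE insurance_pay_dtl (insurance_rec_id INTEGER, ins_paid_dt TEXT)"
    )
    con.executemany("INSERT INTO insurance_farmer_hdr VALUES (?, ?, ?, ?)", HDR)
    con.executemany("INSERT INTO insurance_pay_dtl VALUES (?, ?)", PAY)
    result = con.execute(QUERY, {"today": today}).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(overdue_members("2024-01-01"))
```

Here a member counts as overdue when the due date has passed and no payment dated on or before it exists, which covers both late payers and members with no payment row.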

How to get maximum count of a field in a table in Tsql and groupby

I have a table Customer_Complex_LoginLogs to log customer entrance.
I want to get the maximum number of entrances that has occurred on a single day (and I want to know the day that this occurred).
I know I should perform a GROUP BY on TFEnteranceDate.
How can I achieve this in T-SQL?
TableName :
Customer_Complex_LoginLogs
Table fields :
Id guid PK
Id_Customer guid FK
TFEnteranceDate datetime
TFEnteranceDatep nvarchar(10)
Without more information this could be a simple GROUP BY
SELECT TOP 1 CAST(TFEnteranceDate AS date) AS EnteranceDay, COUNT(*) AS Enterance
FROM Customer_Complex_LoginLogs
GROUP BY CAST(TFEnteranceDate AS date) -- group per day, not per exact datetime
ORDER BY COUNT(*) DESC
EDIT: The day with the max number of TFEnteranceDate recorded
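Because TFEnteranceDate is a datetime, counting entrances per day means truncating it to its date part before grouping. A minimal sketch on SQLite through Python, with made-up sample rows: SQLite's date() and LIMIT 1 play the roles that CAST(... AS date) and TOP 1 would play in T-SQL.

```python
import sqlite3

# Hypothetical log rows: three entrances on 1 March, one on 2 March.
ROWS = [
    ("2024-03-01 08:00:00",),
    ("2024-03-01 12:30:00",),
    ("2024-03-01 17:45:00",),
    ("2024-03-02 09:10:00",),
]

QUERY = """
SELECT date(TFEnteranceDate) AS EntranceDay, COUNT(*) AS Entrances
FROM Customer_Complex_LoginLogs
GROUP BY date(TFEnteranceDate)
ORDER BY Entrances DESC
LIMIT 1
"""


def busiest_day():
    """Return (day, count) for the day with the most logged entrances."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Customer_Complex_LoginLogs (TFEnteranceDate TEXT)")
    con.executemany("INSERT INTO Customer_Complex_LoginLogs VALUES (?)", ROWS)
    result = con.execute(QUERY).fetchone()
    con.close()
    return result


if __name__ == "__main__":
    print(busiest_day())
```

If two days tie for the maximum, this returns only one of them, and which one is not defined without a further ORDER BY column.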