How to compare dates from separate tables in Oracle - sql

I am trying to compare two tables in my database. One of the tables (named ORDERED) has a field called OREDERDATE. Another table (named ORDERLINE) has a field called DATESUPPLIED.
I would like to run a SQL query to return the difference in days between the two dates.
I have tried a few things thus far (for example...
SELECT ordered.ordernumber, ordered.orderdate, orderline.datesupplied,
customer.customernumber,
LEAD (orderdate) OVER (partition by customernumber
ORDER BY customernumber) - datesupplied DIFF_DAYS
FROM ordered, orderline;
) ... but to no avail.
Please assist. My ERD follows
enter image description here

I think you want something like this:
select ol.ordernumber, ol.productcode,
trunc(ol.datesupplied) - trunc(o.orderdate) as days_difference
from orderline ol join ordered o on ol.ordernumber = o.ordernumber
;
This will only show the order number and the product code (therefore corresponding exactly to the rows in the second table) and the day difference between the supply date and the ordered date. trunc() is needed to truncate the time-of-day to midnight, since this is what you said you needed.
If you need additional columns from either table, add them to the select clause. If you must filter your results (for example, look only for specific orders, or orders before or after a given date, etc.), add a where clause. If you need to order the results in some way, add an order by clause.

Related

SQL to find unique counts between two date fields

I was reading this but can't manage to hack it to work on my own problem.
My data has the following fields, in a single table in Postgres:
Seller_id (varchar) (contains_duplicates).
SKU (varchar) (contains duplicates).
selling_starts (datetime).
selling_ends (datetime).
I want to query it so I get the count of unique SKUs on sale, per seller, per day. If there are any null days I don't need these.
I've tried before querying it by using another table to generate a list of unique "filler" dates and then joining it to where the date is more than the selling_starts and less than the selling_ends fields. However, this is so computationally expensive that I get timeout errors.
I'm vaguely aware there are probably more efficient ways of doing this via with statements to create CTEs or some sort of recursive function, but I don't have any experience of this.
Any help much appreciated!
try this :
WITH list AS
( SELECT generate_series(date_trunc('day', min(selling_starts)), max(selling_ends), '1 day') AS ref_date
FROM your_table
)
SELECT seller_id
, l.ref_date
, count(DISTINCT sku) AS sku_count
FROM your_table AS t
INNER JOIN list AS l
ON t.selling_starts <= l.ref_date
AND t.selling_ends > l.ref_date
GROUP BY seller_id, l.ref_date
If your_table is large, you should create indexes to accelerate the query.

INSERT INTO two columns from a SELECT query

I have a table called VIEWS with Id, Day, Month, name of video, name of browser... but I'm interested only in Id, Day and Month.
The ID can be duplicate because the user (ID) can watch a video multiple days in multiple months.
This is the query for the minimum date and the maximum date.
SELECT ID, CONCAT(MIN(DAY), '/', MIN(MONTH)) AS MIN_DATE,
CONCAT(MAX(DAY), '/', MAX(MONTH)) AS MAX_DATE,
FROM Views
GROUP BY ID
I want to insert this select with two columns(MIN_DATE and MAX_DATE) to two new columns with insert into.
How can be the insert into query?
To do what you are trying to do (there are some issues with your solution, please read my comments below), first you need to add the new columns to the table.
ALTER TABLE Views ADD MIN_DATE VARCHAR(10)
ALTER TABLE Views ADD MAX_DATE VARCHAR(10)
Then you need to UPDATE your new columns (not INSERT, because you don't want new rows). Determine the min/max for each ID, then join the result back to the table to be able to update each row. You can't update directly from a GROUP BY as rows are grouped and lose their original row.
;WITH MinMax
(
SELECT
ID,
CONCAT(MIN(V.DAY), '/', MIN(V.MONTH)) AS MIN_DATE,
CONCAT(MAX(V.DAY), '/', MAX(V.MONTH)) AS MAX_DATE
FROM
Views AS V
GROUP BY
ID
)
UPDATE V SET
MIN_DATE = M.MIN_DATE,
MAX_DATE = M.MAX_DATE
FROM
MinMax AS M
INNER JOIN Views AS V ON M.ID = V.ID
The problems that I see with this design are:
Storing aggregated columns: you usually want to do this only for performance issues (which I believe is not the case here), as querying the aggregated (grouped) rows is faster due to being less rows to read. The problem is that you will have to update the grouped values each time one of the original rows is updated, which as extra processing time. Another option would be periodically updating the aggregated values, but you will have to accept that for a period of time the grouped values are not really representing the tracking table.
Keeping aggregated columns on the same table as the data they are aggregating: this is normalization problem. Updating or inserting a row will trigger updating all rows with the same ID as the min/max values might have changed. Also the min/max values will always be repeated on all rows that belong to the same ID, which is extra space that you are wasting. If you had to save aggregated data, you need to save it on a different table, which causes the problems I listed on the previous point.
Using text data type to store dates: you always want to work dates with a proper DATETIME data type. This will not only enable to use date functions like DATEADD or DATEDIFF, but also save space (varchars that store dates need more bytes that DATETIME). I don't see the year part on your query, it should be considered to compute a min/max (this might depend what you are storing on this table).
Computing the min/max incorrectly: If you have the following rows:
ID DAY MONTH
1 5 1
1 3 2
The current result of your query would be 3/1 as MIN_DATE and 5/2 as MAX_DATE, which I believe is not what you are trying to find. The lowest here should be the 5th of January and the highest the 3rd of February. This is a consequence of storing date parts as independent values and not the whole date as a DATETIME.
What you usually want to do for this scenario is to group directly on the query that needs the data grouped, so you will do the GROUP BY on the SELECT that needs the min/max. Having an index by ID would make the grouping very fast. Thus, you save the storage space you would use to keep the aggregated values and also the result is always the real grouped result at the time that you are querying.
Would be something like the following:
;WITH MinMax
(
SELECT
ID,
CONCAT(MIN(V.DAY), '/', MIN(V.MONTH)) AS MIN_DATE, -- Date problem (varchar + min/max computed seperately)
CONCAT(MAX(V.DAY), '/', MAX(V.MONTH)) AS MAX_DATE -- Date problem (varchar + min/max computed seperately)
FROM
Views AS V
GROUP BY
ID
)
SELECT
V.*,
M.MIN_DATE,
M.MAX_DATE
FROM
MinMax AS M
INNER JOIN Views AS V ON M.ID = V.ID

Joining 2 tables with a timestamp of YEAR for the first item purchased by customer

I have 2 tables "items" and "customers".
In both tables, "customer_id" is present and a single customer can have more than 1 item. In the "items" table there is also a timestamp field called "date_created" when an item was purchased.
I want to construct a query that can return each customer_id and item_id associated with the first item each customer bought in a specific year, let's say 2020.
My approach was
SELECT customer_id, items
INNER JOIN items ON items.customer_id=customers.customer_id
and then try to use the EXTRACT function to take care of the first item each customer bought in 2020 but I can't seem to extract the first item only for the specific year. I would really appreciate some help.
I am using PostgreSQL. Thank you.
Just use distinct on:
select distinct on (customer_id) i.*
from items i
where date_created >= date '2020-01-01' and
date_created < date '2021-01-01'
order by customer_id, date_created;
First, note the use of direct date comparisons. This makes it easier for the optimizer to choose the best execution plan.
distinct on is a handy Postgres extension that returns the first row encountered for the keys in parentheses, which must be the first keys in the order by. "First" is based on the subsequent order by keys.

How to work past "At most one record can be returned by this subquery"

I'm having trouble understanding this error through all the researching I have done. I have the following query
SELECT M.[PO Concatenate], Sum(M.SumofAward) AS TotalAward, (SELECT TOP 1 M1.[Material Group] FROM
[MGETCpreMG] AS M1 WHERE M1.[PO Concatenate]=M.[PO Concatenate] ORDER BY M1.SumofAward DESC) AS TopGroup
FROM MGETCpreMG AS M
GROUP BY M.[PO Concatenate];
For a brief instance it reviews the results I want, but then the "At most one record can be returned by this subquery" error comes and wipes all the data to #Name?
For context, [MGETCpreMG] is a query off a main table [MG ETC] that was used to consolidate Award for differing Material Groups on a PO transaction ([PO Concatenate])
SELECT [MG ETC].[PO Concatenate], Sum([MG ETC].Award) AS SumOfAward, [MG ETC].[Material Group]
FROM [MG ETC]
GROUP BY [MG ETC].[PO Concatenate], [MG ETC].[Material Group]
ORDER BY [MG ETC].[PO Concatenate];
I'm thinking it lies in my inability to understand how to utilize a subquery.
In the case in which the query can return more then one value? Simply add an additonal sort by.
So, a common sub query might be to get the last invoice. So you might have:
select ID, CompanyName,
(SELECT TOP 1 InvoiceDate from tblInvoice
where tblInvoice.CustomerID = tblCompany.ID
Order by InvoiceDate DESC)
As LastInvoiceDate
From tblCustomers
Now the above might work for some time, but then it will blow up since you might have two invoices for the same day!
So, all you have to do is add that extra order by clause - say on the PK of the child table like this:
Order by InvoiceDate DESC,ID DESC)
So top 1 will respect the "additional" order columns you add, and thus only ever return one row - even if there are multiple values that match the top 1 column.
I suppose in the above we could perhaps forget the invoiceDate and always take the top most last autonumber ID, but for a lot of queries, you can't always be sure - it might be we want the last most expensive invoice amount. And again, if the max value (top) was the same for two large invoice amounts, then again two rows could be return. So, simply add the extra ORDER BY clause with an 2nd column that further orders the data. And thus top 1 will only pull the first value. Your example of a top group is such an example. Just tack on the extra order by "ID" or whatever the auto number ID column is.

Combine multiple date fields into one on query

I have a requirement to create a report that counts a total from 2 date fields into one. A simplified example of the table I'm querying is:
ID, FirstName, LastName, InitialApplicationDate, UpdatedApplicationDate
I need to query the two date fields in a way that creates similar output to the following:
Date | TotalApplications
I would need the date output to include both InitialApplicationDate and
UpdatedApplicationDate fields and the TotalApplications output to be a count of the total for both types of date fields. Originally I thought maybe a Union would work however that returns 2 separate records for each date. Any ideas how I might accomplish this?
The simplest way, I think, is to unpivot using apply and then aggregate:
select v.thedate, count(*)
from t cross apply
(values (InitialApplicationDate), (UpdatedApplicationDate)) v(thedate)
group by v.thedate;
You might want to add where thedate is not null if either column could be NULL.
Note that the above will count the same application twice, once for each date. That appears to be your intention.