Joining 2 tables with a timestamp of YEAR for the first item purchased by customer - sql

I have 2 tables "items" and "customers".
In both tables, "customer_id" is present and a single customer can have more than 1 item. In the "items" table there is also a timestamp field called "date_created" when an item was purchased.
I want to construct a query that can return each customer_id and item_id associated with the first item each customer bought in a specific year, let's say 2020.
My approach was
SELECT customers.customer_id, items.item_id
FROM customers
INNER JOIN items ON items.customer_id=customers.customer_id
and then try to use the EXTRACT function to take care of the first item each customer bought in 2020 but I can't seem to extract the first item only for the specific year. I would really appreciate some help.
I am using PostgreSQL. Thank you.

Just use distinct on:
select distinct on (customer_id) i.*
from items i
where date_created >= date '2020-01-01' and
date_created < date '2021-01-01'
order by customer_id, date_created;
First, note the use of direct date comparisons. This makes it easier for the optimizer to choose the best execution plan.
distinct on is a handy Postgres extension that returns the first row encountered for the keys in parentheses, which must be the first keys in the order by. "First" is based on the subsequent order by keys.
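For readers who want to try the "first row per customer within a year" idea without a Postgres server, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no distinct on, so this swaps in ROW_NUMBER() as a portable equivalent; it is not the query from the answer above. The sample rows and the item_id column values are made up for illustration, and timestamps are assumed to be stored as ISO-8601 text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (item_id INTEGER, customer_id INTEGER, date_created TEXT);
    -- hypothetical sample data
    INSERT INTO items VALUES
        (1, 10, '2019-12-30 09:00:00'),
        (2, 10, '2020-03-05 14:00:00'),
        (3, 10, '2020-01-15 08:30:00'),
        (4, 20, '2020-07-01 12:00:00'),
        (5, 30, '2021-02-02 10:00:00');
""")

# Filter to 2020 with a half-open range, number each customer's rows by
# purchase time, and keep only the earliest one.
first_items_2020 = conn.execute("""
    SELECT customer_id, item_id
    FROM (
        SELECT customer_id, item_id,
               ROW_NUMBER() OVER (PARTITION BY customer_id
                                  ORDER BY date_created) AS rn
        FROM items
        WHERE date_created >= '2020-01-01'
          AND date_created <  '2021-01-01'
    )
    WHERE rn = 1
    ORDER BY customer_id
""").fetchall()

print(first_items_2020)  # [(10, 3), (20, 4)]
```

Customer 30 has no purchase in 2020 and therefore does not appear, which matches the behaviour of the distinct on query.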

Related

Delete duplicates using dense rank

I have a sales data table with cust_ids and their transaction dates.
I want to create a table that stores, for every customer, their cust_id, their last purchased date (on the basis of transaction dates) and the count of times they have purchased.
I wrote this code:
SELECT
cust_xref_id, txn_ts,
DENSE_RANK() OVER (PARTITION BY cust_xref_id ORDER BY CAST(txn_ts as timestamp) DESC) AS rank,
COUNT(txn_ts)
FROM
sales_data_table
But I understand that the above code would give an output like this (attached example picture)
How do I modify the code to get an output like :
I am a beginner in SQL queries and would really appreciate any help! :)
This would be an aggregation query, which changes the table key from (customer_id, date) to (customer_id):
SELECT
cust_xref_id,
MAX(txn_ts) as last_purchase_date,
COUNT(txn_ts) as count_purchase_dates
FROM
sales_data_table
GROUP BY
cust_xref_id
You are looking for the last purchase date and the count of distinct transaction dates (if a person buys twice on the same date, it should count as a single time).
Although you mentioned you want a count of dates, your sample data shows you want a count of distinct dates: customer 284214 transacted 9 times, but distinct gives you 7.
So, here is the SQL you can use to get your result.
SELECT
cust_xref_id,
MAX(txn_ts) as last_purchase_date,
COUNT(distinct txn_ts) as count_purchase_dates -- note: distinct counts each date only once
FROM sales_data_table
GROUP BY 1
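The difference between the two answers (COUNT versus COUNT(DISTINCT ...)) is easy to see side by side. The following is a small sketch run against SQLite through Python's sqlite3 module; the rows are invented and do not reproduce the asker's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_data_table (cust_xref_id INTEGER, txn_ts TEXT);
    -- hypothetical data: customer 1 buys twice on the same date
    INSERT INTO sales_data_table VALUES
        (1, '2021-01-05'), (1, '2021-01-05'), (1, '2021-02-10'),
        (2, '2021-03-01');
""")

rows = conn.execute("""
    SELECT cust_xref_id,
           MAX(txn_ts)            AS last_purchase_date,
           COUNT(txn_ts)          AS count_all,
           COUNT(DISTINCT txn_ts) AS count_distinct
    FROM sales_data_table
    GROUP BY cust_xref_id
    ORDER BY cust_xref_id
""").fetchall()

print(rows)  # [(1, '2021-02-10', 3, 2), (2, '2021-03-01', 1, 1)]
```

Which of the two counts is right depends on whether repeat purchases on the same date should count once or several times.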

SQL to find unique counts between two date fields

I was reading this but can't manage to hack it to work on my own problem.
My data has the following fields, in a single table in Postgres:
Seller_id (varchar) (contains duplicates).
SKU (varchar) (contains duplicates).
selling_starts (datetime).
selling_ends (datetime).
I want to query it so I get the count of unique SKUs on sale, per seller, per day. If there are any null days I don't need these.
I've tried before querying it by using another table to generate a list of unique "filler" dates and then joining it to where the date is more than the selling_starts and less than the selling_ends fields. However, this is so computationally expensive that I get timeout errors.
I'm vaguely aware there are probably more efficient ways of doing this via with statements to create CTEs or some sort of recursive function, but I don't have any experience of this.
Any help much appreciated!
try this :
WITH list AS
( SELECT generate_series(date_trunc('day', min(selling_starts)), max(selling_ends), '1 day') AS ref_date
FROM your_table
)
SELECT seller_id
, l.ref_date
, count(DISTINCT sku) AS sku_count
FROM your_table AS t
INNER JOIN list AS l
ON t.selling_starts <= l.ref_date
AND t.selling_ends > l.ref_date
GROUP BY seller_id, l.ref_date
If your_table is large, you should create indexes to accelerate the query.
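To experiment with the calendar-join approach locally, here is a sketch against SQLite via Python's sqlite3 module. SQLite lacks generate_series by default, so a recursive CTE is swapped in to build the list of days; the rest mirrors the query above. The sample rows are invented, and dates are assumed to be stored as ISO-8601 text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE your_table (seller_id TEXT, sku TEXT,
                             selling_starts TEXT, selling_ends TEXT);
    -- hypothetical data; seller s1 lists SKU A twice with overlapping ranges
    INSERT INTO your_table VALUES
        ('s1', 'A', '2022-01-01', '2022-01-03'),
        ('s1', 'B', '2022-01-02', '2022-01-04'),
        ('s1', 'A', '2022-01-02', '2022-01-03'),
        ('s2', 'C', '2022-01-03', '2022-01-04');
""")

rows = conn.execute("""
    WITH RECURSIVE list(ref_date) AS (
        -- one row per day between the earliest start and the latest end
        SELECT MIN(date(selling_starts)) FROM your_table
        UNION ALL
        SELECT date(ref_date, '+1 day') FROM list
        WHERE ref_date < (SELECT MAX(date(selling_ends)) FROM your_table)
    )
    SELECT t.seller_id, l.ref_date, COUNT(DISTINCT t.sku) AS sku_count
    FROM your_table AS t
    JOIN list AS l
      ON t.selling_starts <= l.ref_date
     AND t.selling_ends   >  l.ref_date
    GROUP BY t.seller_id, l.ref_date
    ORDER BY t.seller_id, l.ref_date
""").fetchall()

for row in rows:
    print(row)
```

Because this is an inner join, days on which a seller has nothing on sale simply produce no row, which matches the "no null days" requirement in the question.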

How to compare dates from separate tables in Oracle

I am trying to compare two tables in my database. One of the tables (named ORDERED) has a field called ORDERDATE. Another table (named ORDERLINE) has a field called DATESUPPLIED.
I would like to run a SQL query to return the difference in days between the two dates.
I have tried a few things thus far (for example...
SELECT ordered.ordernumber, ordered.orderdate, orderline.datesupplied,
customer.customernumber,
LEAD (orderdate) OVER (partition by customernumber
ORDER BY customernumber) - datesupplied DIFF_DAYS
FROM ordered, orderline;
) ... but to no avail.
Please assist. My ERD follows
I think you want something like this:
select ol.ordernumber, ol.productcode,
trunc(ol.datesupplied) - trunc(o.orderdate) as days_difference
from orderline ol join ordered o on ol.ordernumber = o.ordernumber
;
This will only show the order number and the product code (therefore corresponding exactly to the rows in the second table) and the day difference between the supply date and the ordered date. trunc() is needed to truncate the time-of-day to midnight, since this is what you said you needed.
If you need additional columns from either table, add them to the select clause. If you must filter your results (for example, look only for specific orders, or orders before or after a given date, etc.), add a where clause. If you need to order the results in some way, add an order by clause.
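The effect of truncating to midnight before subtracting can be checked with a small sketch. This one runs on SQLite through Python's sqlite3 module rather than Oracle, so date() and julianday() stand in for Oracle's trunc() and date subtraction; the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ordered   (ordernumber INTEGER, orderdate TEXT);
    CREATE TABLE orderline (ordernumber INTEGER, productcode TEXT,
                            datesupplied TEXT);
    -- hypothetical data
    INSERT INTO ordered VALUES
        (1, '2023-05-01 23:30:00'),
        (2, '2023-05-10 08:00:00');
    INSERT INTO orderline VALUES
        (1, 'P1', '2023-05-02 00:15:00'),
        (1, 'P2', '2023-05-04 12:00:00'),
        (2, 'P1', '2023-05-10 17:45:00');
""")

# date() drops the time-of-day (the role trunc() plays in Oracle);
# julianday() turns each date into a day number so they can be subtracted.
rows = conn.execute("""
    SELECT ol.ordernumber, ol.productcode,
           CAST(julianday(date(ol.datesupplied))
                - julianday(date(o.orderdate)) AS INTEGER) AS days_difference
    FROM orderline ol
    JOIN ordered o ON ol.ordernumber = o.ordernumber
    ORDER BY ol.ordernumber, ol.productcode
""").fetchall()

print(rows)  # [(1, 'P1', 1), (1, 'P2', 3), (2, 'P1', 0)]
```

Order 1, product P1 was supplied only 45 minutes after the order was placed, yet it counts as one day because the two timestamps fall on different calendar dates.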

How to produce a distinct count of records that are stored by day by month

I have a table with several "ticket" records in it. Each ticket is stored by day (i.e. 2011-07-30 00:00:00.000). I would like to count the unique records in each month by year. I have used the following SQL statement:
SELECT DISTINCT
YEAR(TICKETDATE) as TICKETYEAR,
MONTH(TICKETDATE) AS TICKETMONTH,
COUNT(DISTINCT TICKETID) AS DAILYTICKETCOUNT
FROM
NAT_JOBLINE
GROUP BY
YEAR(TICKETDATE),
MONTH(TICKETDATE)
ORDER BY
YEAR(TICKETDATE),
MONTH(TICKETDATE)
This does produce a count but it is wrong as it picks up the unique tickets for every day. I just want a unique count by month.
Try combining Year and Month into one field, and grouping on that new field.
You may have to cast them to varchar to ensure that they don't simply get added together. Or you could multiply the year through...
SELECT
(YEAR(TICKETDATE) * 100) + MONTH(TICKETDATE),
count(*) AS DAILYTICKETCOUNT
FROM NAT_JOBLINE GROUP BY
(YEAR(TICKETDATE) * 100) + MONTH(TICKETDATE)
Presuming that TICKETID is not a primary or unique key, but does appear multiple times in table NAT_JOBLINE, that query should work. If it is unique (does not occur in more than 1 row per value), you will need to select on a different column, one that uniquely identifies the "entity" that you want to count, if not each occurrence/instance/reference of that entity.
(As ever, it is hard to tell without working with the actual data.)
I think you need to remove the first DISTINCT. You already have the GROUP BY. If I were the first DISTINCT, I would be confused as to what I was supposed to do.
SELECT
YEAR(TICKETDATE) as TICKETYEAR,
MONTH(TICKETDATE) AS TICKETMONTH,
COUNT(DISTINCT TICKETID) AS DAILYTICKETCOUNT
FROM NAT_JOBLINE
GROUP BY YEAR(TICKETDATE), MONTH(TICKETDATE)
ORDER BY YEAR(TICKETDATE), MONTH(TICKETDATE)
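To see how counting rows differs from counting distinct tickets per month, here is a sketch using SQLite through Python's sqlite3 module. SQLite has no YEAR() or MONTH(), so strftime is used in their place; the rows are invented and assume a ticket can appear on more than one day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE NAT_JOBLINE (TICKETID INTEGER, TICKETDATE TEXT);
    -- hypothetical data: tickets 1 and 2 each appear on two days
    INSERT INTO NAT_JOBLINE VALUES
        (1, '2011-07-29 00:00:00.000'),
        (1, '2011-07-30 00:00:00.000'),
        (2, '2011-07-30 00:00:00.000'),
        (2, '2011-08-01 00:00:00.000'),
        (3, '2011-08-02 00:00:00.000');
""")

rows = conn.execute("""
    SELECT CAST(strftime('%Y', TICKETDATE) AS INTEGER) AS TICKETYEAR,
           CAST(strftime('%m', TICKETDATE) AS INTEGER) AS TICKETMONTH,
           COUNT(*)                 AS N_ROWS,
           COUNT(DISTINCT TICKETID) AS N_TICKETS
    FROM NAT_JOBLINE
    GROUP BY TICKETYEAR, TICKETMONTH
    ORDER BY TICKETYEAR, TICKETMONTH
""").fetchall()

print(rows)  # [(2011, 7, 3, 2), (2011, 8, 2, 2)]
```

July has three rows but only two distinct tickets, so COUNT(*) and COUNT(DISTINCT TICKETID) disagree whenever a ticket spans several days. Ticket 2 is counted in both months because it has a row in each.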
From what I understand from your comments to Phillip Kelley's solution:
SELECT TICKETDATE, COUNT(*) AS DAILYTICKETCOUNT
FROM NAT_JOBLINE
GROUP BY TICKETDATE
should do the trick, but I suggest you update your question.

SQL "GROUP BY" issue

I'm designing a shopping cart. To circumvent the problem of old invoices showing inaccurate pricing after a product's price gets changed, I moved the price field from the Product table into a ProductPrice table that consists of 3 fields, pid, date and price. pid and date form the primary key for the table. Here's an example of what the table looks like:
pid date price
1 1/1/09 50
1 2/1/09 55
1 3/1/09 54
Using SELECT and GROUP BY to find the latest price of each product, I came up with:
SELECT pid, price, max(date) FROM ProductPrice GROUP BY pid
The date and pid returned were accurate. I received exactly 1 entry for every unique pid and the date that accompanied it was the latest date for that pid. However, what came as a surprise was the price returned. It returned the price of the first row matching the pid, which in this case was 50.
After reworking my statement, I came up with this:
SELECT pp.pid, pp.price, pp.date FROM ProductPrice AS pp
INNER JOIN (
SELECT pid AS lastPid, max(date) AS lastDate FROM ProductPrice GROUP BY pid
) AS m
ON pp.pid = lastPid AND pp.date = lastDate
While the reworked statement now yields the correct price(54), it seems incredible that such a simple sounding query would require an inner join to execute. My question is, is my second statement the easiest way to accomplish what I need to do? Or am I missing something here? Thanks in advance!
James
The reason you get an arbitrary price is that MySQL cannot know which row's value to return for a column that is neither grouped nor aggregated. It knows it needs a price and a date per pid and can fetch the latest date as you requested with max(date), but it returns whichever price is most efficient for it to retrieve, because you didn't provide an aggregate function for that column (your first query is not valid standard SQL, actually).
Your second query looks OK, but here is a shorter alternative:
SELECT pid, price, date
FROM ProductPrice p
WHERE date = (SELECT MAX(date) FROM ProductPrice tmp WHERE tmp.pid = p.pid)
But if you access the latest price a lot (which I think you do), I would recommend adding the old column back to your original table to hold the newest value, if you have the option of altering the database structure again.
I think you broke your database schema.
To circumvent the problem of old invoices showing inaccurate pricing after a product's price gets changed, I moved the price field from the Product table into a ProductPrice table that consists of 3 fields, pid, date and price. pid and date form the primary key for the table.
As you have pointed out you need to keep a change history of prices. But you can still keep the current price in the products table in addition to that new table. That would make your life much easier (and your queries faster).
You cannot solve your problem with the GROUP BY clause, because for each group of pid MySQL will simply fetch the first pid, the maximum date and the first price found (which is not what you need).
You may either use a subquery (which can be inefficient):
SELECT pid, date, price
FROM ProductPrice p1
WHERE date = ( SELECT MAX(p2.date)
FROM ProductPrice p2
WHERE p1.pid = p2.pid)
or you can simply join the table with itself:
SELECT p1.pid, p1.date, p1.price
FROM ProductPrice p1
LEFT JOIN ProductPrice p2 ON p1.pid = p2.pid
AND p1.date < p2.date
WHERE p2.pid IS NULL
Take a look at this section of MySQL docs.
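The formulations discussed in this thread (join against a MAX subquery, correlated subquery, and self anti-join) can be compared directly. The sketch below runs them on SQLite through Python's sqlite3 module rather than MySQL. The rows for pid 1 follow the question's example, with the dates rewritten as ISO strings on the assumption that they are month/day/year; pid 2 is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ProductPrice (pid INTEGER, date TEXT, price INTEGER,
                               PRIMARY KEY (pid, date));
    INSERT INTO ProductPrice VALUES
        (1, '2009-01-01', 50), (1, '2009-02-01', 55), (1, '2009-03-01', 54),
        (2, '2009-01-15', 20), (2, '2009-02-15', 18);  -- pid 2 is hypothetical
""")

# 1. Join against a derived table holding the latest date per pid.
join_max = conn.execute("""
    SELECT pp.pid, pp.date, pp.price
    FROM ProductPrice AS pp
    INNER JOIN (SELECT pid AS lastPid, MAX(date) AS lastDate
                FROM ProductPrice GROUP BY pid) AS m
      ON pp.pid = m.lastPid AND pp.date = m.lastDate
    ORDER BY pp.pid
""").fetchall()

# 2. Correlated subquery: keep rows whose date is the maximum for their pid.
correlated = conn.execute("""
    SELECT p1.pid, p1.date, p1.price
    FROM ProductPrice p1
    WHERE p1.date = (SELECT MAX(p2.date) FROM ProductPrice p2
                     WHERE p2.pid = p1.pid)
    ORDER BY p1.pid
""").fetchall()

# 3. Anti-join: keep rows for which no later row with the same pid exists.
anti_join = conn.execute("""
    SELECT p1.pid, p1.date, p1.price
    FROM ProductPrice p1
    LEFT JOIN ProductPrice p2
      ON p1.pid = p2.pid AND p1.date < p2.date
    WHERE p2.pid IS NULL
    ORDER BY p1.pid
""").fetchall()

print(join_max)  # [(1, '2009-03-01', 54), (2, '2009-02-15', 18)]
```

All three return the same rows on this data; which one performs best depends on the database engine, the indexes, and the table size, so it is worth checking the execution plan on real data.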
You might wanna try this:
SELECT pid, price, date FROM ProductPrice GROUP BY pid ORDER BY date DESC
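GROUP BY has some obscure behaviour here, and I'm never sure it picks the right row, but the latest one should come first in the result set. Note that ORDER BY sorts the groups after they have been formed, so this is not guaranteed to return the latest price.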
Here is another -possibly inefficient- one:
SELECT pid, substring_index( group_concat( price order by date desc ), ',', 1 ) , max(date)
FROM ProductPrice
GROUP BY pid
I think that the key here is simple sounding query - you can see what you want but computers ain't human and so to produce the desired result from set based operations you have to be explicit as in the second query.
The inner query identifies the last price for each product, then the outer query lets you get the value for the last price - that's about as simple as it can get.
As an aside, if you have an invoicing system, you really ought to store the price for the product (and the tax rates as well as the "codes") with the invoice, i.e. the invoice tables should contain all the necessary financial information to reproduce the invoice. In general, you do not want to rely on being able to look up a price (or a tax rate) in a mutable table, even allowing for the system introduced above. Regardless of this, having the pricing history has its own merits.
I faced the same problem in one of my projects. I used a subquery to fetch the date and then compared against it, but that makes the system slow as the data grows. So it's better to store the latest price in your Products table, in addition to the new table you have created to keep the history of price changes.
You can always use any of the queries people suggested to get the latest price of a product on a particular date. But you can also add one field to the same table that flags whether a row is the latest. For one date you set the flag to true once, and you can then always find a product's latest price for a particular date with one simple query.