How to keep two tables synchronised in Postgres?

I have an orders table that looks like this:
CREATE TABLE orders (
order_id INT NOT NULL
,customer_id INT NOT NULL
,order_date DATE NOT NULL
,order_price_total FLOAT NOT NULL
,order_price_tax FLOAT NOT NULL
,order_discounts FLOAT NOT NULL
,order_price_shipping FLOAT NOT NULL
);
Whenever an order is inserted, updated or deleted, I need to keep another table (called customer_orders) synchronised/up-to-date. Here's what customer_orders should look like:
CREATE TABLE customer_orders AS
SELECT
customer_id
,order_id
,order_date
,order_price_total
,DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY order_date, order_id) AS order_number
,COUNT(order_id) OVER (PARTITION BY customer_id) AS total_customer_orders
,LAG(order_date) OVER (PARTITION BY customer_id ORDER BY order_date, order_id) AS previous_order_date
,MIN(order_date) OVER (PARTITION BY customer_id ORDER BY order_date, order_id) AS first_order_date
,order_date - LAG(order_date) OVER (PARTITION BY customer_id ORDER BY order_date, order_id) AS previous_order_lookback_days
FROM
orders;
What would be a good way to go about this?

You cannot have that data both persisted and synchronised in real time. Every column in customer_orders is a window aggregate over all of a customer's orders, so a single INSERT, UPDATE or DELETE would force you to rewrite every row for that customer, which is a non-starter.
I assume that you are asking because calculating those values in each query is too slow for you. But that is the first thing you should try; the right indexes can help a lot!
The other option is to use a materialized view that you refresh regularly, say every hour. If you are prepared to live with stale data in that materialized view, that is a viable solution.
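A sketch of the materialized-view approach, reusing the query from the question (the view replaces the customer_orders table, and the WINDOW clause just factors out the repeated partition/ordering):

```sql
CREATE MATERIALIZED VIEW customer_orders AS
SELECT
    customer_id,
    order_id,
    order_date,
    order_price_total,
    DENSE_RANK()    OVER w AS order_number,
    COUNT(*)        OVER (PARTITION BY customer_id) AS total_customer_orders,
    LAG(order_date) OVER w AS previous_order_date,
    MIN(order_date) OVER (PARTITION BY customer_id) AS first_order_date,
    order_date - LAG(order_date) OVER w AS previous_order_lookback_days
FROM orders
WINDOW w AS (PARTITION BY customer_id ORDER BY order_date, order_id);

-- Refresh on a schedule (e.g. from cron). CONCURRENTLY avoids blocking
-- readers during the refresh, but requires a unique index on the view:
CREATE UNIQUE INDEX ON customer_orders (order_id);
REFRESH MATERIALIZED VIEW CONCURRENTLY customer_orders;
```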

Related

How to grab the last value in a column per user for the last date

I have a table that contains three columns: ACCOUNT_ID, STATUS, CREATE_DATE.
I want to grab only the LAST status for each account_id based on the latest create_date.
In my example data, I should only see three records: one per ACCOUNT_ID, each with the last STATUS.
Do you know a way to do this?
create table tbl (
account_id int,
status string,
create_date date
);
select account_id, max(create_date) from tbl group by account_id;
will give you, for each account_id, the latest create_date (assuming create_date can never be in the future, which makes sense).
Now you can join against that data to get what you want, something along these lines:
select account_id, status, create_date
from tbl
where (account_id, create_date) in (select account_id, max(create_date) from tbl group by account_id);
If you use that frequently (account with the latest create date), then consider defining a view for it.
If you have many columns and only want to keep the last row per account, you can use QUALIFY (supported in Snowflake, BigQuery and Teradata, among others) to run the ranking logic inline and keep the best row, like so:
SELECT *
FROM tbl
QUALIFY row_number() over (partition by account_id order by create_date desc) = 1;
The long form is the same pattern that Ely shows in the other answer. Note that with the MAX(CREATE_DATE) solution, if an account has two rows on the same last day, the IN solution will give you both; you can get the same behaviour with QUALIFY by using RANK instead of ROW_NUMBER.
So the SQL is the same as:
SELECT account_id, status, create_date
FROM (
SELECT *,
row_number() over (partition by account_id order by create_date desc) as rn
FROM tbl
) AS t
WHERE rn = 1;
And the RANK form, which will return all tied rows, is:
SELECT *
FROM tbl
QUALIFY rank() over (partition by account_id order by create_date desc) = 1;
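To make the tie behaviour concrete, here is a hypothetical data set (PostgreSQL syntax; the values are made up) where account 1 has two rows sharing the latest create_date:

```sql
WITH tbl (account_id, status, create_date) AS (
    VALUES (1, 'open',   DATE '2023-01-05'),
           (1, 'closed', DATE '2023-01-05'),
           (2, 'open',   DATE '2023-01-03')
)
SELECT account_id, status, create_date
FROM (
    SELECT t.*,
           RANK()       OVER (PARTITION BY account_id ORDER BY create_date DESC) AS rnk,
           ROW_NUMBER() OVER (PARTITION BY account_id ORDER BY create_date DESC) AS rn
    FROM tbl t
) AS x
WHERE rnk = 1;  -- returns both tied rows for account 1; rn = 1 would keep only one of them
```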

Using RANK with COUNT(DISTINCT) aggregation at the same time

I have a subquery that joins my customers and transactions table, aliased as jq. I want to create a ranking of each customer's purchases (transactions) by order timestamp (order_ts). So I did,
SELECT customer_id,
order_id,
order_ts,
RANK() OVER (PARTITION BY customer_id ORDER BY order_ts ASC) AS purchase_rank,
amount
FROM jq GROUP BY customer_id
ORDER BY customer_id;
Alongside the purchase_rank column, I also want to know how many total purchases the customer has made. So this becomes:
SELECT customer_id,
order_id,
order_ts,
RANK() OVER (PARTITION BY customer_id ORDER BY order_ts ASC) AS purchase_rank,
-- total purchases of this customer, counted distinctly by order_id
amount
FROM jq GROUP BY customer_id
ORDER BY customer_id;
Some order_ids are duplicated, so I want to count distinctly. How do I do this in MS SQL Server and Postgres without joining to a subquery?
Microsoft SQL Server
SELECT customer_id,
order_id,
order_ts,
purchase_rank,
MAX(cnt) OVER(PARTITION BY customer_id) AS purchase_cnt,
amount
FROM
(
SELECT customer_id,
order_id,
order_ts,
RANK() OVER (PARTITION BY customer_id ORDER BY order_ts ASC) AS purchase_rank,
DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY order_id ASC) AS cnt,
amount
FROM jq
-- GROUP BY customer_id
) AS D
ORDER BY customer_id;
Sorry, I am not familiar with PostgreSQL, but this uses only standard window functions, so the same query should work there unchanged.
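For what it's worth, the trick works because the maximum DENSE_RANK within a partition equals the number of distinct order_id values, which is exactly COUNT(DISTINCT order_id); neither SQL Server nor Postgres allows DISTINCT inside a window aggregate. A hypothetical illustration (PostgreSQL syntax, made-up data):

```sql
-- Customer 1 has three rows but only two distinct order_ids,
-- so purchase_cnt should be 2 on every row.
WITH jq (customer_id, order_id, order_ts, amount) AS (
    VALUES (1, 10, TIMESTAMP '2023-01-01 09:00', 5.0),
           (1, 10, TIMESTAMP '2023-01-01 09:00', 5.0),
           (1, 11, TIMESTAMP '2023-01-02 10:00', 7.5)
)
SELECT customer_id, order_id, order_ts,
       RANK() OVER (PARTITION BY customer_id ORDER BY order_ts) AS purchase_rank,
       MAX(cnt) OVER (PARTITION BY customer_id) AS purchase_cnt,  -- = COUNT(DISTINCT order_id)
       amount
FROM (
    SELECT j.*,
           DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY order_id) AS cnt
    FROM jq j
) AS d;
```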

BigQuery: select non-duplicate records

Consider the following table (simplified version):
id int,
amount decimal,
transaction_no,
location_id int,
created_at datetime
The above schema is used to store POS receipts for restaurants. This table sometimes contains several receipts with the same date and the same transaction_no at the same location_id.
In that case, what I want to do is keep only the last receipt for that location_id & transaction_no, ordered by created_at desc.
In MySQL, I use the following query, which gets me the last (max(created_at)) receipt for a location_id & transaction_no:
SELECT id, amount, transaction_no, location_id, created_at
FROM receipts r JOIN
(SELECT transaction_no, max(created_at) AS maxca
FROM receipts r
GROUP BY transaction_no
) t
ON r.transaction_no = t.transaction_no AND r.created_at = t.maxca
group by location_id;
But when I run the same in BigQuery, I get the following error:
Query Failed Error: Shuffle reached broadcast limit for table __I0
(broadcasted at least 150393576 bytes). Consider using partitioned
joins instead of broadcast joins . Job ID:
circular-gist-812:job_A_CfsSKJICuRs07j7LHVbkqcpSg
Any idea how to make the above query work in BigQuery?
SELECT id, amount, transaction_no, location_id, created_at
FROM (
SELECT
id, amount, transaction_no, location_id, created_at,
ROW_NUMBER() OVER(PARTITION BY transaction_no, location_id
ORDER BY created_at DESC) as last
FROM your_dataset.your_table
)
WHERE last = 1
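As a newer alternative, BigQuery also supports the QUALIFY clause, which avoids the derived table entirely (same hypothetical table name as in the answer above):

```sql
SELECT id, amount, transaction_no, location_id, created_at
FROM your_dataset.your_table
WHERE TRUE  -- BigQuery requires a WHERE (or GROUP BY/HAVING) clause alongside QUALIFY
QUALIFY ROW_NUMBER() OVER (PARTITION BY transaction_no, location_id
                           ORDER BY created_at DESC) = 1;
```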

How to capture rows that match an aggregate

Say I have a table with (pseudocode):
TABLE Order
(
orderid int,
type int,
price NUMERIC(18,2)
)
Now I want to list those orders whose price matches the maximum price for a particular order type.
I start with the following, giving me the max price per order type:
SELECT type, MAX(price)
FROM Order
GROUP BY type
Now I know the maximum price by type. However, I want to, as efficiently as possible, get a result set of the actual orders whose price is that maximum price, instead of just the type/MAX(price).
The table is very large with potentially tens of millions of rows, so efficiency is key here (assuming proper indexes are in place, of course, such as on the type column in this case).
I start with something like:
SELECT orderid, price
FROM Order AS O
WHERE O.price=(SELECT MAX(O2.price)
FROM Order AS O2
WHERE O2.type=O.type)
It's not particularly fast, but it does the job.
Then I realize that orders appear multiple times in this table, because it's actually a denormalized order history table and it really looks more like:
TABLE Order
(
id int, -- This is just an identity column - the surrogate key
orderid int, -- multiple records exist for the same
-- orderid with different update times
type int,
price NUMERIC(18,2),
updatetime DATETIME2(3)
)
So, what I want is actually the latest version of those orders based on updatetime whose price matches the maximum price for their particular type. This is my question.
Extending:
SELECT *
FROM Order AS O
WHERE O.price=(SELECT MAX(O2.price)
FROM Order AS O2
WHERE O2.type=O.type)
..., to handle the new requirement seems like a mess waiting to happen. So I was wondering what a good, efficient (and hopefully readable) solution to the new requirements would be.
Based on Gordon's suggestion of:
select o.*
from (select o.*,
row_number() over (partition by type, price order by updatetime desc) as seqnum
from (select o.*, max(o.price) over (partition by type) as maxprice
from Orders o
) o
where price = maxprice
) o
where seqnum = 1;
I have come up with the following query, with comments added to describe my thought process. The comments should of course be read from the innermost
query outward:
SELECT *
FROM
(
-- We want the max price for each order type, but we only want to
-- use the latest version of each order (i.e., seqnum=1). So, we
-- partition by type/seqnum, calculate the max price for each
-- partition and the only use the max prices from the seqnum=1
-- partitions for each type via the WHERE clause in the outer query
SELECT *,
MAX(price) OVER (PARTITION BY type, seqnum) AS maxprice
FROM
(
-- We only want to examine the latest version of each order.
-- BTW, the order price can change between versions.
-- So, let's start by marking the latest version of each order
-- with seqnum=1 which we will use as a "filter in" clause later
SELECT *,
row_number() OVER (PARTITION BY orderid
ORDER BY updatetime DESC) AS seqnum
FROM Order
) AS O
WHERE seqnum=1 -- Discard all but the latest versions of orders
) AS O
WHERE price=maxprice
I am not sure if this is correct though, because it is quite complicated...
Use window functions. Your original query can be written as:
select o.*
from (select o.*, max(o.price) over (partition by type) as maxprice
from Orders o
) o
where price = maxprice;
If you want the most recent order for the price:
select o.*
from (select o.*, max(o.price) over (partition by type) as maxprice,
row_number() over (partition by type, price order by updatetime desc) as seqnum
from Orders o
) o
where price = maxprice and seqnum = 1;
EDIT:
This would be a bit more efficient with an index on Orders(type, price, updatetime). You can also try writing this as:
select o.*
from (select o.*,
row_number() over (partition by type, price order by updatetime desc) as seqnum
from (select o.*, max(o.price) over (partition by type) as maxprice
from Orders o
) o
where price = maxprice
) o
where seqnum = 1;
This may greatly reduce the data being used for the second analytic function.
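To see which row survives, here is a hypothetical three-row history (PostgreSQL syntax for brevity; the values are made up) run through the first variant of this answer:

```sql
-- Order 100 was repriced: only its latest version (price 50.00) holds
-- the maximum price for type 1, so only row id = 2 should come back.
WITH Orders (id, orderid, type, price, updatetime) AS (
    VALUES (1, 100, 1, 40.00, TIMESTAMP '2023-01-01 00:00'),
           (2, 100, 1, 50.00, TIMESTAMP '2023-01-02 00:00'),
           (3, 101, 1, 45.00, TIMESTAMP '2023-01-01 00:00')
)
SELECT o.*
FROM (SELECT o.*,
             max(o.price)  OVER (PARTITION BY type) AS maxprice,
             row_number()  OVER (PARTITION BY type, price
                                 ORDER BY updatetime DESC) AS seqnum
      FROM Orders o
     ) o
WHERE price = maxprice AND seqnum = 1;  -- returns the row with id = 2
```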

sql - check for uniqueness of COMPOSITE key

Can somebody please help me with this difficulty I am having?
I would like to check whether some data is valid. A small part of the validation consists of entity integrity, where I check that my primary key is unique:
SELECT order_id, COUNT(order_id)
FROM temp_order
GROUP BY order_id
HAVING ( COUNT(order_id) > 1 )
in this case, order_id is the primary key. This query works fine.
The problem:
I now have another table temp_orditem which has a composite primary key made up of 2 fields: order_id, product_id.
How can I check whether the primary key is unique (i.e. the combination of the 2 fields together)? Can I do the following?
SELECT order_id, product_id, COUNT(order_id), COUNT(product_id)
FROM temp_orditem
GROUP BY order_id, product_id
HAVING ( COUNT(order_id) > 1 AND COUNT(product_id)>1)
Your version would work, but since COUNT(order_id) and COUNT(product_id) both just count the rows in each group, a single COUNT(*) is simpler. I would write this:
SELECT order_id, product_id, COUNT(*) AS x
FROM temp_orditem
GROUP BY order_id, product_id
HAVING COUNT(*) > 1
(Note that many databases, e.g. PostgreSQL and SQL Server, do not allow the alias x in the HAVING clause, so repeat the COUNT(*) there.)
This is the query you need:
select order_id, product_id, count(*)
from temp_orditem
group by order_id, product_id
having count(*) > 1
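Once the data passes the check, you can let the database enforce the composite key going forward. A sketch, with a hypothetical constraint name, assuming temp_orditem has no primary key defined yet:

```sql
-- Rejects any future duplicate (order_id, product_id) pair at insert time.
ALTER TABLE temp_orditem
    ADD CONSTRAINT pk_temp_orditem PRIMARY KEY (order_id, product_id);
```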