Creating customer_id based on matchcodes in Oracle (SQL) - sql

I have an Oracle database containing customer purchases (one record is one purchase). Customers provided their personal data again at every purchase, so the records can differ because of typos, address changes, etc. Now I have to identify the purchases that belong to the same customer.
To do that, I created three different matchcodes based on simple rules. My table now looks something like this:
+-------------+-------------+-------------+-------------+-------------+
| PURCHASE_ID | MATCHCODE_1 | MATCHCODE_2 | MATCHCODE_3 | CUSTOMER_ID |
+-------------+-------------+-------------+-------------+-------------+
| 1           | 1           | b           | x           |             |
| 2           | 1           | a           | y           |             |
| 3           | 2           | c           | x           |             |
| 4           | 3           | a           | z           |             |
| ...         | ...         | ...         | ...         | ...         |
+-------------+-------------+-------------+-------------+-------------+
What I want is to assign a customer_id to every purchase. Purchases should receive the same customer_id whenever any of their matchcodes is equal.
For example, purchase 1 and purchase 2 would receive the same customer_id because their MATCHCODE_1 is the same. Purchase 2 and purchase 4 also belong to the same customer because their MATCHCODE_2 is the same. Consequently, purchase 1 and purchase 4 would receive the same customer_id even though none of their matchcodes are equal.
The customer_id can be a simple number starting from 1.
What SQL would generate the customer_id?

A naive solution:
-- Just number them
UPDATE purchases SET customer_id = rownum;
-- Group all customers with given matchcode_1 into one
MERGE INTO purchases p
USING (SELECT matchcode_1, min(customer_id) customer_id
FROM purchases
GROUP BY matchcode_1) m
ON (p.matchcode_1 = m.matchcode_1)
WHEN MATCHED THEN
UPDATE SET p.customer_id = m.customer_id;
-- Repeat the above merge for matchcode_2, matchcode_3
-- then matchcode_1 again and so on
-- until none of the matchcodes make any updates
You could write something nicer with PL/SQL probably...
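The grouping described in the question is a connected-components problem: purchases are nodes, and a shared matchcode is an edge. As a rough illustration outside the database, here is a minimal Python sketch that uses union-find (a different technique from the repeated MERGE above, not a translation of it). The function name and the fifth sample row are hypothetical, added only to show a second customer.

```python
def assign_customer_ids(purchases):
    """purchases: list of (purchase_id, matchcode_1, matchcode_2, matchcode_3).
    Returns {purchase_id: customer_id}, with customer ids numbered from 1."""
    parent = {pid: pid for pid, *_ in purchases}

    def find(x):
        # Follow parent links to the representative, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # Remember the first purchase seen for each (column, value) pair and
    # link every later purchase with the same pair to it.
    first_seen = {}
    for pid, *codes in purchases:
        for column, code in enumerate(codes):
            if code is None:  # assumption: a missing matchcode never matches
                continue
            key = (column, code)
            if key in first_seen:
                union(pid, first_seen[key])
            else:
                first_seen[key] = pid

    # Number the groups 1, 2, 3, ... in order of first appearance.
    group_numbers = {}
    result = {}
    for pid, *_ in purchases:
        root = find(pid)
        if root not in group_numbers:
            group_numbers[root] = len(group_numbers) + 1
        result[pid] = group_numbers[root]
    return result


sample = [
    (1, 1, "b", "x"),
    (2, 1, "a", "y"),
    (3, 2, "c", "x"),
    (4, 3, "a", "z"),
    (5, 9, "q", "w"),  # hypothetical extra row, not in the question
]
ids = assign_customer_ids(sample)
print(ids)  # {1: 1, 2: 1, 3: 1, 4: 1, 5: 2}
```

With the four rows from the question, every purchase ends up with the same customer: 1 and 2 share MATCHCODE_1, 1 and 3 share MATCHCODE_3, and 2 and 4 share MATCHCODE_2.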

Related

Complex nested aggregations to get order totals

I have a system to track orders and related expenditures. This is a Rails app running on PostgreSQL. 99% of my app gets by with plain old Rails Active Record calls. This one is ugly.
The expenditures table looks like this:
+----+----------+-----------+------------------------+
| id | category | parent_id | note |
+----+----------+-----------+------------------------+
| 1 | order | nil | order with no invoices |
+----+----------+-----------+------------------------+
| 2 | order | nil | order with invoices |
+----+----------+-----------+------------------------+
| 3 | invoice | 2 | invoice for order 2 |
+----+----------+-----------+------------------------+
| 4 | invoice | 2 | invoice for order 2 |
+----+----------+-----------+------------------------+
Each expenditure has many expenditure_items, and orders can be parents of invoices. The expenditure_items table looks like this:
+----+----------------+-------------+-------+---------+
| id | expenditure_id | cbs_item_id | total | note |
+----+----------------+-------------+-------+---------+
| 1 | 1 | 1 | 5 | Fruit |
+----+----------------+-------------+-------+---------+
| 2 | 1 | 2 | 15 | Veggies |
+----+----------------+-------------+-------+---------+
| 3 | 2 | 1 | 123 | Fruit |
+----+----------------+-------------+-------+---------+
| 4 | 2 | 2 | 456 | Veggies |
+----+----------------+-------------+-------+---------+
| 5 | 3 | 1 | 34 | Fruit |
+----+----------------+-------------+-------+---------+
| 6 | 3 | 2 | 76 | Veggies |
+----+----------------+-------------+-------+---------+
| 7 | 4 | 1 | 26 | Fruit |
+----+----------------+-------------+-------+---------+
| 8 | 4 | 2 | 98 | Veggies |
+----+----------------+-------------+-------+---------+
I need to track a few things:
amounts left to be invoiced on orders (that's easy)
the same amounts, rolled up for each cbs_item_id (this is the ugly part)
The cbs_item_id is basically an accounting code that categorizes the money spent. I have visualized what my end result would look like:
+-------------+----------------+-------------+---------------------------+-----------+
| cbs_item_id | expenditure_id | order_total | invoice_total | remaining |
+-------------+----------------+-------------+---------------------------+-----------+
| 1 | 1 | 5 | 0 | 5 |
+-------------+----------------+-------------+---------------------------+-----------+
| 1 | 2 | 123 | 60 | 63 |
+-------------+----------------+-------------+---------------------------+-----------+
| | | | Rollup for cbs_item_id: 1 | 68 |
+-------------+----------------+-------------+---------------------------+-----------+
| 2 | 1 | 15 | 0 | 15 |
+-------------+----------------+-------------+---------------------------+-----------+
| 2 | 2 | 456 | 174 | 282 |
+-------------+----------------+-------------+---------------------------+-----------+
| | | | Rollup for cbs_item_id: 2 | 297 |
+-------------+----------------+-------------+---------------------------+-----------+
order_total is the sum of total for all the expenditure_items of the given order (category = 'order'). invoice_total is the sum of total for all the expenditure_items of invoices whose parent_id equals the order's expenditures.id. remaining is calculated as the difference (but never less than 0). In real terms, the idea is that you place an order for $1000 and $750 of invoices come in. I need to calculate the $250 left on the order (remaining), broken down by category (cbs_item_id). Then I need the roll-up of all the remaining values grouped by cbs_item_id.
So for each cbs_item_id I need to group by order, find the total for the order, find the total invoiced against the order, then subtract the two (the result can't be negative). It has to be done on a per-order basis; the overall aggregate difference will not return the expected results.
In the end I am looking for a result something like this:
+-------------+-----------+
| cbs_item_id | remaining |
+-------------+-----------+
| 1 | 68 |
+-------------+-----------+
| 2 | 297 |
+-------------+-----------+
I am guessing this might be a combination of GROUP BY and perhaps a subquery or even a CTE (voodoo to me). My SQL skills are not that great and this is WAY above my pay grade.
Here is a fiddle for the data above:
http://sqlfiddle.com/#!17/2fe3a
Alternate fiddle:
https://dbfiddle.uk/?rdbms=postgres_11&fiddle=e9528042874206477efbe0f0e86326fb
This query produces the result you are looking for:
SELECT cbs_item_id, sum(order_total - invoice_total) AS remaining
FROM (
SELECT cbs_item_id
, COALESCE(e.parent_id, e.id) AS expenditure_id -- ①
, COALESCE(sum(total) FILTER (WHERE e.category = 'order' ), 0) AS order_total -- ②
, COALESCE(sum(total) FILTER (WHERE e.category = 'invoice'), 0) AS invoice_total
FROM expenditures e
JOIN expenditure_items i ON i.expenditure_id = e.id
GROUP BY 1, 2 -- ③
) sub
GROUP BY 1
ORDER BY 1;
① Note that I assume a saner table definition, with expenditures.parent_id being an integer and a true NULL instead of the string 'nil'. This allows the simple use of COALESCE.
② About the aggregate FILTER clause, see:
Aggregate columns with additional (distinct) filters
③ Using the short syntax with ordinal numbers of SELECT list items. Example:
Select first row in each GROUP BY group?
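To check the per-order arithmetic by hand, here is a small Python sketch that mirrors the two grouping levels of the query on the sample data from the question. The function name is hypothetical. One deliberate difference: the sketch clamps each per-order difference at zero with max(), as the question requires. The query above does not clamp; in PostgreSQL that would presumably be done with GREATEST(order_total - invoice_total, 0). On this sample data both give the same numbers.

```python
from collections import defaultdict

# (id, category, parent_id); parent_id is None for orders, as assumed in note ①
expenditures = [
    (1, "order", None),
    (2, "order", None),
    (3, "invoice", 2),
    (4, "invoice", 2),
]

# (expenditure_id, cbs_item_id, total)
expenditure_items = [
    (1, 1, 5), (1, 2, 15),
    (2, 1, 123), (2, 2, 456),
    (3, 1, 34), (3, 2, 76),
    (4, 1, 26), (4, 2, 98),
]


def remaining_by_cbs_item(expenditures, items):
    info = {eid: (category, parent_id) for eid, category, parent_id in expenditures}

    # Inner level: one bucket per (cbs_item_id, order) holding
    # [order_total, invoice_total].
    totals = defaultdict(lambda: [0, 0])
    for expenditure_id, cbs_item_id, total in items:
        category, parent_id = info[expenditure_id]
        # Same idea as COALESCE(e.parent_id, e.id): invoices count toward their order.
        order_id = parent_id if parent_id is not None else expenditure_id
        if category == "order":
            totals[(cbs_item_id, order_id)][0] += total
        elif category == "invoice":
            totals[(cbs_item_id, order_id)][1] += total

    # Outer level: sum the per-order remainders for each cbs_item_id,
    # never letting a single order contribute a negative amount.
    remaining = defaultdict(int)
    for (cbs_item_id, _order_id), (order_total, invoice_total) in totals.items():
        remaining[cbs_item_id] += max(order_total - invoice_total, 0)
    return dict(remaining)


result = remaining_by_cbs_item(expenditures, expenditure_items)
print(result)  # {1: 68, 2: 297}
```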
Can I get the total of all the remaining values for all rows, or do I need to wrap that in another subselect?
There is a very concise option with GROUPING SETS:
...
GROUP BY GROUPING SETS ((1), ()) -- that's all :)
Related:
Converting rows to columns

SQL: tricky question for finding lockout dates

Hope you can help. We have a table with two columns Customer_ID and Trip_Date. The customer receives 15% off on their first visit and on every visit where they haven't received the 15% off offer in the past thirty days. How do I write a single SQL query that finds all days where a customer received 15% off?
The table looks like this
+-------------+----------+
| Customer_ID | date     |
+-------------+----------+
| 1           | 01-01-17 |
| 1           | 01-17-17 |
| 1           | 02-04-17 |
| 1           | 03-01-17 |
| 1           | 03-15-17 |
| 1           | 04-29-17 |
| 1           | 05-18-17 |
+-------------+----------+
The desired output would look like this:
+-------------+----------+-------------------+
| Customer_ID | date     | received_discount |
+-------------+----------+-------------------+
| 1           | 01-01-17 | 1                 |
| 1           | 01-17-17 | 0                 |
| 1           | 02-04-17 | 1                 |
| 1           | 03-01-17 | 0                 |
| 1           | 03-15-17 | 1                 |
| 1           | 04-29-17 | 1                 |
| 1           | 05-18-17 | 0                 |
+-------------+----------+-------------------+
We are doing this work in Netezza. I can't think of a way to do this with window functions alone, only with recursion or looping. Is there some clever trick that I'm missing?
Thanks in advance,
GF
You didn't tell us what your backend is, nor did you give sample data, expected output, or a sensible data schema :( This is an example based on a guess at the schema, using PostgreSQL as the backend (it would be too messy as a comment):
(I think you have Customer_Id, Trip_Date and LocationId in the trips table?)
select * from trips t1
where not exists (
select * from trips t2
where t1.Customer_id = t2.Customer_id and
t1.Trip_Date > t2.Trip_Date
and t1.Trip_date - t2.Trip_Date < 30
);
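Note that the NOT EXISTS query compares each trip against any earlier trip, while the rule in the question depends on the last trip that actually received the discount. On the sample data it would exclude 02-04-17 (there is a trip 18 days earlier), although the expected output marks that date with 1. The rule is inherently sequential, so as a reference for the expected behaviour here is a minimal Python sketch that walks each customer's trips in date order. The function name is hypothetical, and it assumes a discount exactly 30 days earlier no longer blocks a new one (matching the `< 30` comparison above) and that the dates are MM-DD-YY.

```python
from datetime import date


def flag_discounts(trips, lockout_days=30):
    """trips: iterable of (customer_id, trip_date).
    Returns a list of (customer_id, trip_date, received_discount)."""
    last_discount = {}  # customer_id -> date of the most recent discounted trip
    flagged = []
    for customer_id, trip_date in sorted(trips):
        previous = last_discount.get(customer_id)
        eligible = previous is None or (trip_date - previous).days >= lockout_days
        if eligible:
            # Only discounted trips restart the lockout window.
            last_discount[customer_id] = trip_date
        flagged.append((customer_id, trip_date, int(eligible)))
    return flagged


trips = [
    (1, date(2017, 1, 1)),
    (1, date(2017, 1, 17)),
    (1, date(2017, 2, 4)),
    (1, date(2017, 3, 1)),
    (1, date(2017, 3, 15)),
    (1, date(2017, 4, 29)),
    (1, date(2017, 5, 18)),
]
flags = [flag for _, _, flag in flag_discounts(trips)]
print(flags)  # [1, 0, 1, 0, 1, 1, 0]
```

The printed flags match the received_discount column in the question's desired output.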

PostgreSQL: Using the LEAST() command after GROUP BY to achieve first transactions

I am working with a Magento table like this:
+-----------+--------------+------------+--------------+----------+-------------+
| date | email | product_id | product_type | order_id | qty_ordered |
+-----------+--------------+------------+--------------+----------+-------------+
| 2017/2/15 | x#y.com | 18W1 | custom | 12 | 1 |
+-----------+--------------+------------+--------------+----------+-------------+
| 2017/2/15 | x#y.com | 18W2 | simple | 17 | 3 |
+-----------+--------------+------------+--------------+----------+-------------+
| 2017/2/20 | z#abc.com | 22Y34 | simple | 119 | 1 |
+-----------+--------------+------------+--------------+----------+-------------+
| 2017/2/20 | z#abc.com | 22Y35 | custom | 31 | 2 |
+-----------+--------------+------------+--------------+----------+-------------+
I want to make a new view by grouping by email and then taking only the row with the LEAST (smallest) order_id.
So my final table after this operation should look like this:
+-----------+--------------+------------+--------------+----------+-------------+
| date | email | product_id | product_type | order_id | qty_ordered |
+-----------+--------------+------------+--------------+----------+-------------+
| 2017/2/15 | x#y.com | 18W1 | custom | 17 | 1 |
+-----------+--------------+------------+--------------+----------+-------------+
| 2017/2/15 | z#abc.com | 18W2 | simple | 31 | 3 |
+-----------+--------------+------------+--------------+----------+-------------+
I'm trying to use the following query (but it's not working):
SELECT * , (SELECT DISTINCT table.email, table.order_id,
LEAST (order_id) AS first_transaction_id
FROM
table
GROUP BY
email)
FROM table;
Would really love any help with this, thank you!
I think you want distinct on:
select distinct on (email) t.*
from t
order by email, order_id;
DISTINCT ON is a Postgres extension. It returns one row for each combination of the keys in parentheses, chosen according to the ORDER BY clause. In this case, that is one row per email: the one with the smallest order_id (because of the ORDER BY). The DISTINCT ON keys must also be the leading keys in the ORDER BY.
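For readers unfamiliar with DISTINCT ON, the same "sort, then keep the first row per key" idea can be sketched in plain Python. This is only an illustration of the semantics on the sample rows from the question, not how Postgres executes the query; the function name is hypothetical.

```python
rows = [
    {"date": "2017/2/15", "email": "x#y.com", "product_id": "18W1",
     "product_type": "custom", "order_id": 12, "qty_ordered": 1},
    {"date": "2017/2/15", "email": "x#y.com", "product_id": "18W2",
     "product_type": "simple", "order_id": 17, "qty_ordered": 3},
    {"date": "2017/2/20", "email": "z#abc.com", "product_id": "22Y34",
     "product_type": "simple", "order_id": 119, "qty_ordered": 1},
    {"date": "2017/2/20", "email": "z#abc.com", "product_id": "22Y35",
     "product_type": "custom", "order_id": 31, "qty_ordered": 2},
]


def first_order_per_email(rows):
    first = {}
    # Sort like ORDER BY email, order_id ...
    for row in sorted(rows, key=lambda r: (r["email"], r["order_id"])):
        # ... and keep only the first row seen for each email.
        first.setdefault(row["email"], row)
    return list(first.values())


picked = [(r["email"], r["order_id"]) for r in first_order_per_email(rows)]
print(picked)  # [('x#y.com', 12), ('z#abc.com', 31)]
```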

SQL Query to Work out Every Product Combination

I require a SQL query to work out every product combination.
I have three product categories (game, accessory, upgrade) and products assigned to each of these three categories:
+----+------------+-----------+------------+
| id | category | product | prod_code |
+----+------------+-----------+------------+
| 1 | game | GTA | 100 |
| 2 | game | GTA1 | 200 |
| 3 | game | GTA2 | 300 |
| 4 | accessory | Play Pad | 400 |
| 5 | accessory | Xbox Pad | 500 |
| 6 | upgrade | Memory | 600 |
| 6 | upgrade | drive | 700 |
+----+------------+-----------+------------+
I want to take one product from each of the categories and work out every single combination:
+----+--------------+
| id | combinations |
+----+--------------+
| 1 | 100,400,600 |
| 2 | 100,500,600 |
| 3 | 100,400,700 |
| 4 | 100,500,700 |
| ? | etc |
+----+--------------+
How would I go about doing this?
Thanks in advance, Stuart
Use a CROSS JOIN:
SELECT CONCAT(t1.[prod_code], ',',
t2.[prod_code], ',',
t3.[prod_code])
FROM (
SELECT [prod_code]
FROM mytable
WHERE category = 'game') AS t1
CROSS JOIN (
SELECT [prod_code]
FROM mytable
WHERE category = 'accessory') AS t2
CROSS JOIN (
SELECT [prod_code]
FROM mytable
WHERE category = 'upgrade') AS t3
ORDER BY t1.[prod_code], t2.[prod_code], t3.[prod_code]
The CROSS JOIN of the derived tables, one for each category, produces the following cartesian product: 'game' products x 'accessory' products x 'upgrade' products.
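The cartesian product can also be checked outside the database. Below is a minimal Python sketch using itertools.product on the sample rows from the question; the function name is hypothetical, and the category order (game, accessory, upgrade) is taken from the question's example output.

```python
from itertools import product

# (category, product, prod_code) rows from the question
products = [
    ("game", "GTA", 100),
    ("game", "GTA1", 200),
    ("game", "GTA2", 300),
    ("accessory", "Play Pad", 400),
    ("accessory", "Xbox Pad", 500),
    ("upgrade", "Memory", 600),
    ("upgrade", "drive", 700),
]


def combinations_by_category(products, categories=("game", "accessory", "upgrade")):
    # One sorted list of codes per category, like the three derived tables.
    codes = {
        category: sorted(code for cat, _name, code in products if cat == category)
        for category in categories
    }
    # product() yields one tuple per combination, one code from each category.
    return [
        ",".join(str(code) for code in combo)
        for combo in product(*(codes[category] for category in categories))
    ]


combos = combinations_by_category(products)
print(len(combos))  # 12
print(combos[0])    # 100,400,600
```

With 3 games, 2 accessories and 2 upgrades, the result has 3 x 2 x 2 = 12 combinations.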

Create a pivot table from two tables based on dates

I have two MS Access tables sharing a one-to-many relationship. Their structures are like the following:
tbl_Persons
+----------+------------+-----------+
| PersonID | PersonName | OtherData |
+----------+------------+-----------+
| 1        | PersonA    | etc.      |
| 2        | PersonB    |           |
| 3        | PersonC    |           |
+----------+------------+-----------+
tbl_Visits
+---------+----------+-----------+------------------------+
| VisitID | PersonID | VisitDate | dozens of other fields |
+---------+----------+-----------+------------------------+
| 1       | 1        | 09/01/13  |                        |
| 2       | 1        | 09/02/13  |                        |
| 3       | 2        | 09/03/13  |                        |
| 4       | 2        | 09/04/13  | etc...                 |
+---------+----------+-----------+------------------------+
I wish to create a new table based on the VisitDate field, the column headings of which are Visit-n where n is 1 to the number of visits, Visit-n-Data1, Visit-n-Data2, Visit-n-Data3 etc.
MergedTable
+----------+----------+-------------+----------------+----------+----------------+
| PersonID | Visit1   | Visit1Data1 | Visit1Data2... | Visit2   | Visit2Data1... |
+----------+----------+-------------+----------------+----------+----------------+
| 1        | 09/01/13 |             |                | 09/02/13 |                |
| 2        | 09/03/13 |             |                | 09/04/13 |                |
| 3        | etc.     |             |                |          |                |
+----------+----------+-------------+----------------+----------+----------------+
I am really not sure how to do this, whether with a SQL query or by using DAO and looping through records and columns. It is essential that there is only one row per PersonID and that all of that person's data appears chronologically in columns.
Start off by ranking the visits with something like
SELECT PersonID, VisitID,
(SELECT COUNT(VisitID) FROM tbl_Visits AS C
WHERE C.PersonID = tbl_Visits.PersonID
AND C.VisitDate < tbl_Visits.VisitDate) AS RankNumber
FROM tbl_Visits
Use this query as the base for the 'pivot'.
Since you seem to have some visits by the same person on the same day (visits 1 and 2), the WHERE clause needs to be a bit more sophisticated. But I hope you get the basic concept.
Pivoting can be done with multiple LEFT JOINs.
I am not sure whether my solution will perform well, since I did not test it. This is easier to accomplish in SQL Server than in MS Access.
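The rank-then-pivot idea can be illustrated with a short Python sketch on the sample visits. It is not DAO or Access SQL, just the logic: rank each person's visits chronologically, then spread them into Visit1, Visit2, ... columns. The function name is hypothetical, the dates are assumed to be MM/DD/YY, and ties on the same day are broken by VisitID, which is one possible way to make the ranking unambiguous. Ranks start at 1 here to match the Visit1 heading, whereas the COUNT-based query above starts at 0.

```python
from collections import defaultdict
from datetime import date

# (VisitID, PersonID, VisitDate) rows from tbl_Visits
visits = [
    (1, 1, date(2013, 9, 1)),
    (2, 1, date(2013, 9, 2)),
    (3, 2, date(2013, 9, 3)),
    (4, 2, date(2013, 9, 4)),
]


def pivot_visits(visits):
    # Collect each person's visits as (VisitDate, VisitID) pairs.
    by_person = defaultdict(list)
    for visit_id, person_id, visit_date in visits:
        by_person[person_id].append((visit_date, visit_id))

    merged = {}
    for person_id, person_visits in by_person.items():
        row = {}
        # Sorting the pairs orders by date first, then VisitID as tie-breaker.
        for rank, (visit_date, _visit_id) in enumerate(sorted(person_visits), start=1):
            row[f"Visit{rank}"] = visit_date
        merged[person_id] = row
    return merged


merged = pivot_visits(visits)
print(merged[1])  # {'Visit1': datetime.date(2013, 9, 1), 'Visit2': datetime.date(2013, 9, 2)}
```

The other per-visit fields (Visit1Data1 and so on) could be added to each row in the same loop under keys built from the rank.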