How to add new rows but keep existing data same? - sql

I have a table which updates every few hours, i.e. new rows are added and data for existing rows may also change.
I am looking for an output where the new rows are added but the existing data is not overwritten by the changes; instead, the changes are shown in a new column.
I am using BigQuery, so some standard functions may not work. Does this require loops?
Base table at T=0
id food
1 cake
2 pepsi
3 peanut
4 chicken
Base table at T=1 (a new row has been added i.e. id 5)
id food
1 cake
2 pepsi
3 peanut
4 chicken
5 watermelon
Output at T=1
id food change
1 cake NULL
2 pepsi NULL
3 peanut NULL
4 chicken NULL
5 watermelon NULL
Base table at T=2 (a new row has been added i.e. id-6 and food names for id-3 and id-5 have been changed)
id food
1 cake
2 pepsi
3 sushi
4 chicken
5 wrap
6 Cherry
Output at T=2
id food change
1 cake NULL
2 pepsi NULL
3 peanut sushi
4 chicken NULL
5 watermelon wrap
6 Cherry NULL

If you are using update to change the value, then there is nothing you can do. You have changed the data and you don't have a record of earlier data.
If you are inserting the data with a timestamp, then you can construct a query. Your data would look like:
id food timestamp
1 cake 0
2 pepsi 0
3 peanut 0
4 chicken 0
1 cake 1
2 pepsi 1
3 peanut 1
4 chicken 1
5 watermelon 1
Then a typical method to get the current values might be:
select id, food, prev_food
from (select t.id, t.food,
lag(t.food) over (partition by t.id order by t.timestamp) as prev_food,
dense_rank() over (order by t.timestamp desc) as seqnum
from t
) t
where seqnum = 1;
In BigQuery, this can actually be simplified to:
select id,
array_agg(food order by timestamp desc limit 1)[safe_ordinal(1)],
array_agg(food order by timestamp desc limit 2)[safe_ordinal(2)]
from t
group by id;
This does, however, assume that ids do not disappear: it groups over every id ever loaded, not just those in the latest load.
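As a rough way to try the lag()/dense_rank() query locally, here is a minimal sketch using Python's built-in sqlite3 module instead of BigQuery. The snapshot data is the question's three loads; the column name `ts` (standing in for `timestamp`) and the assumption that every load appends all rows are mine, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Assumed layout: every load appends all rows, tagged with a load number `ts`.
snapshots = {
    0: [(1, "cake"), (2, "pepsi"), (3, "peanut"), (4, "chicken")],
    1: [(1, "cake"), (2, "pepsi"), (3, "peanut"), (4, "chicken"), (5, "watermelon")],
    2: [(1, "cake"), (2, "pepsi"), (3, "sushi"), (4, "chicken"), (5, "wrap"), (6, "Cherry")],
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, food TEXT, ts INTEGER)")
for ts, rows in snapshots.items():
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [(i, f, ts) for i, f in rows])

# Keep only the rows of the latest load and carry the previous load's value
# for the same id alongside them.
latest = conn.execute("""
    SELECT id, food, prev_food
    FROM (SELECT id, food,
                 LAG(food) OVER (PARTITION BY id ORDER BY ts) AS prev_food,
                 DENSE_RANK() OVER (ORDER BY ts DESC) AS seqnum
          FROM t) AS x
    WHERE seqnum = 1
    ORDER BY id
""").fetchall()

for row in latest:
    print(row)
```

Note that prev_food equals food for rows that did not change, so comparing the two columns is what separates real changes (ids 3 and 5 here) from unchanged rows.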

With the above input/output, and under the assumption that a value can change only once, in PostgreSQL we could do something like this:
INSERT INTO foo (id, food)
VALUES
(
3,
'Sushi'
)
ON CONFLICT (id)
DO
UPDATE
SET change = excluded.food;
We could use similar methodology in MySQL but the syntax would be different.
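The upsert idea can be tried end to end with a small sketch. It uses SQLite through Python's sqlite3 module, since SQLite 3.24+ accepts the same ON CONFLICT ... DO UPDATE form. The extra WHERE clause is my assumption, not part of the answer above: without it, unchanged rows would get their own food copied into change instead of staying NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, food TEXT, change TEXT)")

upsert = """
    INSERT INTO foo (id, food) VALUES (?, ?)
    ON CONFLICT (id) DO UPDATE
    SET change = excluded.food
    WHERE foo.food <> excluded.food  -- assumption: skip rows whose food is unchanged
"""

# The three loads from the question, applied in order.
loads = [
    [(1, "cake"), (2, "pepsi"), (3, "peanut"), (4, "chicken")],
    [(1, "cake"), (2, "pepsi"), (3, "peanut"), (4, "chicken"), (5, "watermelon")],
    [(1, "cake"), (2, "pepsi"), (3, "sushi"), (4, "chicken"), (5, "wrap"), (6, "Cherry")],
]
for rows in loads:
    conn.executemany(upsert, rows)

output = conn.execute("SELECT id, food, change FROM foo ORDER BY id").fetchall()
for row in output:
    print(row)
```

With the question's data this reproduces the desired output at T=2: the original food stays in place and only ids 3 and 5 get a value in change.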

Related

Select rows where the combination of two columns is unique and we only display rows where the first column is not unique

I have an order line table that looks like this:
ID | Order ID | Product Reference | Variant
1 | 1 | Banana | Green
2 | 1 | Banana | Yellow
3 | 2 | Apple | Green
4 | 2 | Banana | Brown
5 | 3 | Apple | Red
6 | 3 | Apple | Yellow
7 | 4 | Apple | Yellow
8 | 4 | Banana | Green
9 | 4 | Banana | Yellow
10 | 4 | Pear | Green
11 | 4 | Pear | Green
12 | 4 | Pear | Green
I want to know how often people place an order with a combination of different fruit products. I want to know the orderId for that situation and which productReference was combined in the orders.
I only care about the product, not the variant.
I would imagine the desired output looking like this - a simple table output that gives insight in what product combos are ordered:
Order ID | Product
2 | Banana
2 | Apple
4 | Banana
4 | Apple
4 | Pear
I just need data output of the combination Banana+Apple and Banana+Apple+Pear happening so I can get more insight in the frequency of how often this happens. We expect most of our customers to only order Apple, Banana or Pear products, but that assumption needs to be verified.
Problem
I kind of get stuck after the first step.
select orderId, productReference, count(*) as amount
from OrderLines
group by orderId, productReference
This outputs:
Order ID | Product Reference | amount
1 | Banana | 2
2 | Apple | 1
2 | Banana | 1
3 | Apple | 2
4 | Apple | 1
4 | Banana | 2
4 | Pear | 3
I just don't know how to move on from this step to get the data I want.
You can use a window count() over()
select *
from
(
select orderId, productReference, count(*) as amount
, count(productReference) over(partition by orderId) np
from OrderLines
group by orderId, productReference
) t
where np > 1
You need only the rows where an Order_Id has different products; you can do this many ways.
One way is to aggregate and filter to only rows where the min product <> the max product, then use a correlation to find matching orders:
select distinct t.Order_ID, t.Product_Reference
from t
where exists (
select *
from t t2
where t2.Order_ID = t.Order_ID
group by order_id
having Min(Product_Reference) != Max(Product_Reference)
);
See this demo fiddle
You could use STRING_AGG:
https://learn.microsoft.com/en-us/sql/t-sql/functions/string-agg-transact-sql?view=sql-server-ver16
Here's an example:
SELECT orderID, STRING_AGG(productReference, ' ') products
FROM
(
SELECT DISTINCT orderID, productReference
FROM orderLines
) order_products
GROUP BY orderID
For each order ID, this pulls out the distinct products, then the STRING_AGG combines them into one field.
Output
orderID | products
1 | Banana
2 | Apple Banana
3 | Apple
4 | Apple Banana Pear
SQL fiddle example: http://sqlfiddle.com/#!18/8a677/6
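All of the answers above come down to keeping only the orders that contain more than one distinct product. A minimal sketch of that filter, run against the question's sample rows in an in-memory SQLite database (SQLite and the COUNT(DISTINCT ...) formulation are my choices here, not something taken from the answers):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE OrderLines (id INTEGER, orderId INTEGER, "
    "productReference TEXT, variant TEXT)"
)
conn.executemany("INSERT INTO OrderLines VALUES (?, ?, ?, ?)", [
    (1, 1, "Banana", "Green"), (2, 1, "Banana", "Yellow"),
    (3, 2, "Apple", "Green"), (4, 2, "Banana", "Brown"),
    (5, 3, "Apple", "Red"), (6, 3, "Apple", "Yellow"),
    (7, 4, "Apple", "Yellow"), (8, 4, "Banana", "Green"),
    (9, 4, "Banana", "Yellow"), (10, 4, "Pear", "Green"),
    (11, 4, "Pear", "Green"), (12, 4, "Pear", "Green"),
])

# Orders with more than one distinct product, one row per (order, product).
combos = conn.execute("""
    SELECT DISTINCT orderId, productReference
    FROM OrderLines
    WHERE orderId IN (SELECT orderId
                      FROM OrderLines
                      GROUP BY orderId
                      HAVING COUNT(DISTINCT productReference) > 1)
    ORDER BY orderId, productReference
""").fetchall()

for row in combos:
    print(row)
```

Orders 1 and 3 drop out because each contains a single product in several variants.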

Using SQL, have components of a product appear horizontally beside product

I am trying to have all components that are part of a product appear on the same row as each other
I have two tables
PRODUCT
ID PRODUCTNUMBER DESCRIPTION TYPE STATUS KIT
1 (PK) 121 1 Apples and 1 Oranges FRUIT PACK YES Y
2 122 2 Brocolli & 2 Carrots VEG PACK NO Y
3 123 3 Strawberries and 3 Blueberries and 1 Pear FRUIT PACK YES Y
4 124 2 Plums and 1 Pears FRUIT PACK YES Y
5 125 4 Grapes and 2 Cabbage COMBO PACK YES Y
6 126 Apple FRUIT YES N
7 127 Orange FRUIT YES N
8 128 Pear FRUIT YES N
9 129 Onion VEG NO N
10 130 Blueberry FRUIT YES N
11 131 Strawberry FRUIT YES N
12 132 Plum FRUIT YES N
PRODUCTCOMPONENT
PRODUCT QTY
5 55
6 45
7 21
8 12
9 0
10 20
11 25
12 50
My SQL query should return:
SKU Description COMPONENT1 QTY1 COMPONENT2 QTY2 COMPONENT3 QTY3
121 1 Apples and 1 Oranges Apple 55 Orange 45
123 3 Strawberries and 3 Blueberries and 1 Pear Strawberries 25 Blueberry 20 Pear 12
124 2 Plums and 1 Pears Plum 50 Pear 12
I tried:
SELECT
PRODUCT.CODE, PRODUCT.DESCRIPTION,
PRODUCTCOMPONENT.PRODUCT, PRODUCTCOMPONENT.QTY
FROM
PRODUCT
INNER JOIN
PRODUCTCOMPONENT ON PRODUCTCOMPONENT.PRODUCT = PRODUCT.ID
WHERE
PRODUCT.STATUS = YES
AND PRODUCT.KIT = Y;
Any help would be appreciated
Okay, this is from memory, but I've verified the syntax with SQL Fiddle.
You are right that you need to start with PRODUCT and PRODUCTCOMPONENT. And the code you posted will give you the data you want -- but it won't have the answers in columns, just in rows.
So what you have is what I call a "rotation problem". You want to "swing" the data over 90 degrees (so to speak), and have multiple columns where you had multiple rows.
There is no automatic, built-in way to do this. But there are indirect ways.
What you're going to have to do is left-outer-join PRODUCTCOMPONENT to PRODUCT once for every set of columns you want to display component information.
If you have 2 columns, you'd need to do it twice. Since your max is 5, you'll need to do it 5 times.
This is why I asked how many components you could have per item. If you had an indefinite number, you'd be out of luck, because there just isn't a simple way to automatically extend columns out to the right for as many sets of rows you happen to have. You have to do a new left join clause for every additional possible component!
Here's an example of the 2-column case, which should show you how to do the 5-column case:
-- In order to join to just the records from Row 1, we need to number them!
-- We'll do that in a CTE (Common Table Expression).
;
WITH Components as (
-- I don't know all the columns in PRODUCTCOMPONENT, but you presumably have a
-- parent and child ID. Substitute the true names of the columns for the
-- column names I'm using
SELECT ParentId
, ChildId
, Product -- I am assuming this is the product name
, Qty
-- The following line will assign a line number to each component within
-- a product. If there's a particular order you want the columns to appear in,
-- change the "Order by" part of the ROW_NUMBER() OVER expression.
, RowNumber = ROW_NUMBER() OVER (Partition By ParentId Order By ChildId)
FROM ProductComponent
)
SELECT Product.PRODUCTNUMBER as Code
, Product.DESCRIPTION
, Component1.Product as Component1
, Component1.Qty as Qty1
, Component2.Product as Component2
, Component2.Qty as Qty2
FROM Product
-- Note that since some products will have more components than others,
-- you need to left-join to the Components CTE to make sure that rows are
-- still returned even when they only have nulls.
LEFT OUTER JOIN Components as Component1
ON Product.ID = Component1.ParentID
AND Component1.RowNumber = 1
-- The second clause of the JOIN means that you'll only get rows back
-- from the CTE if the RowNumber assigned in the CTE is (in this case) 2.
LEFT OUTER JOIN Components as Component2
ON Product.ID = Component2.ParentID
AND Component2.RowNumber = 2
WHERE Product.STATUS = 'YES'
AND Product.KIT = 'Y';
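To see the numbered-rows-plus-left-joins pattern run, here is a small sketch on SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or newer). The question does not show how kits link to their components, so the product_component(parent_id, child_id, qty) layout and the quantities below are hypothetical illustration data, not the asker's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, product_number INTEGER,
                          description TEXT, kit TEXT);
    -- Hypothetical link table: one row per (kit, component) pair.
    CREATE TABLE product_component (parent_id INTEGER, child_id INTEGER, qty INTEGER);

    INSERT INTO product VALUES
        (1, 121, '1 Apples and 1 Oranges', 'Y'),
        (4, 124, '2 Plums and 1 Pears', 'Y'),
        (6, 126, 'Apple', 'N'), (7, 127, 'Orange', 'N'),
        (8, 128, 'Pear', 'N'), (12, 132, 'Plum', 'N');
    INSERT INTO product_component VALUES (1, 6, 1), (1, 7, 1), (4, 12, 2), (4, 8, 1);
""")

pivoted = conn.execute("""
    WITH components AS (
        -- Number each kit's components so a join can pick "the 1st", "the 2nd", ...
        SELECT pc.parent_id, p.description AS name, pc.qty,
               ROW_NUMBER() OVER (PARTITION BY pc.parent_id
                                  ORDER BY pc.child_id) AS rn
        FROM product_component AS pc
        JOIN product AS p ON p.id = pc.child_id
    )
    SELECT k.product_number, c1.name, c1.qty, c2.name, c2.qty
    FROM product AS k
    LEFT JOIN components AS c1 ON c1.parent_id = k.id AND c1.rn = 1
    LEFT JOIN components AS c2 ON c2.parent_id = k.id AND c2.rn = 2
    WHERE k.kit = 'Y'
    ORDER BY k.product_number
""").fetchall()

for row in pivoted:
    print(row)
```

Each additional component column pair would need one more LEFT JOIN on the next rn value, which is why the maximum number of components has to be known up front.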

SQL Getting row number when the value is different from previous one, no matter whether the value shows before

Hi, I am having difficulty creating a count. The base table is:
rownum product
1 coke
2 coke
3 burger
4 burger
5 chocolate
6 apple
7 coke
8 burger
I want the result like below: whenever the product differs from the previous one, the count increases by one. I tried the dense_rank() and rank() functions, but they don't give what I want. Thank you.
rownum product
1 coke
1 coke
2 burger
2 burger
3 chocolate
4 apple
5 coke
6 burger
Use lag() to see when the value changes and then a cumulative sum:
select t.*,
sum(case when prev_product = product then 0 else 1 end) over (order by rownum) as new_rownum
from (select t.*, lag(product) over (order by rownum) as prev_product
from base t
) t
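A quick way to check the lag()-plus-running-sum idea against the question's data is the sketch below, which runs essentially the same query on an in-memory SQLite database (my choice of engine; window functions need SQLite 3.25 or newer).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE base (rownum INTEGER, product TEXT)")
conn.executemany("INSERT INTO base VALUES (?, ?)", [
    (1, "coke"), (2, "coke"), (3, "burger"), (4, "burger"),
    (5, "chocolate"), (6, "apple"), (7, "coke"), (8, "burger"),
])

# A row starts a new group when its product differs from the previous row's
# product. The first row has no previous value (NULL), so the comparison is
# not true and it also counts as a start.
groups = conn.execute("""
    SELECT product,
           SUM(CASE WHEN prev_product = product THEN 0 ELSE 1 END)
               OVER (ORDER BY rownum) AS new_rownum
    FROM (SELECT rownum, product,
                 LAG(product) OVER (ORDER BY rownum) AS prev_product
          FROM base) AS t
    ORDER BY rownum
""").fetchall()

for row in groups:
    print(row)
```

The second "coke" and "burger" runs get fresh numbers (5 and 6), which is the behaviour dense_rank() on product alone cannot give.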

Correlating varchar values

Is there a built-in way in Oracle 11 to check correlation of values in a varchar2 field? For example, given a simple table such as this:
MEAL_NUM INGREDIENT
--------------------
1 BEEF
1 CHEESE
1 PASTA
2 CHEESE
2 PASTA
2 FISH
3 CHEESE
3 CHICKEN
I want to get a numerical indication that based on MEAL_NUM, CHEESE is paired mostly with PASTA and to lessening degrees with BEEF, CHICKEN, and FISH.
My first inclination is to use the CORR function and transform the strings into a number perhaps by either enumerating them beforehand or grabbing the rownum from a unique select.
Any suggestions how to go about this?
You won't want to use CORR -- if you create a "food number" and assign Beef = 1, Chicken = 2, and Pasta = 3, then a correlation coefficient will tell you whether increased cheese correlates with increased "food number." But the "food number" being higher or lower doesn't mean anything since you made it up. So, don't use CORR unless your foods are actually ordered in some way, like numbers are.
The way statisticians talk about this is with levels of measurement. In the language of the linked article, MEAL_NUM is a nominal measure -- or maybe an ordinal measure if the meals happened in order, but either way, it's a really bad idea to use correlation coefficients on it.
You'll probably instead want to find something like "what percentage of Beef meals also have Cheese?" The following will return, for each ingredient, the number of meals containing it and also the number of meals containing it AND cheese. The trick is that COUNT only counts non-null values.
SELECT Other.Ingredient,
COUNT(*) AS TotalMeals,
COUNT(Cheese.Ingredient) AS CheesyMeals
FROM table Other
LEFT JOIN table Cheese
ON (Cheese.Ingredient = 'CHEESE'
AND Cheese.Meal_Num = Other.Meal_Num)
GROUP BY Other.Ingredient
Warning: returns wrong results if you include an ingredient twice in any one meal.
Edit: It turns out you aren't interested in Cheese specifically. You really want all the pairs of "correlations." So, we can abstract "Cheese" out and call them just the First and Second ingredients. I've added a "PossibleScore" to this one which tries to act like a percentage-of-meals but doesn't give a strong score if there are very few instances of the ingredient.
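SELECT First.Ingredient,
Second.Ingredient,
Totals.MealsWithFirst,
COUNT(*) AS MealsWithBoth,
COUNT(*) / (Totals.MealsWithFirst + 3) AS PossibleScore
FROM table First
-- pair each ingredient with every other ingredient in the same meal
JOIN table Second
ON (First.Meal_Num = Second.Meal_Num
AND First.Ingredient <> Second.Ingredient)
-- number of meals that contain the first ingredient
JOIN (SELECT Ingredient, COUNT(*) AS MealsWithFirst
FROM table
GROUP BY Ingredient) Totals
ON (Totals.Ingredient = First.Ingredient)
GROUP BY First.Ingredient, Second.Ingredient, Totals.MealsWithFirst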
When sorted by score, this should return
PASTA CHEESE 2 2 0.400
CHEESE PASTA 3 2 0.333
BEEF CHEESE 1 1 0.250
BEEF PASTA 1 1 0.250
FISH CHEESE 1 1 0.250
FISH PASTA 1 1 0.250
CHICKEN CHEESE 1 1 0.250
PASTA BEEF 2 1 0.200
PASTA FISH 2 1 0.200
CHEESE BEEF 3 1 0.167
CHEESE FISH 3 1 0.167
CHEESE CHICKEN 3 1 0.167
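The scores in the table follow the rule MealsWithBoth / (MealsWithFirst + 3). As a sanity check, here is a minimal sketch that recomputes them for the question's sample meals on an in-memory SQLite database. The table name meals and the exact query shape are my assumptions, and the division is forced to floating point because SQLite would otherwise do integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE meals (meal_num INTEGER, ingredient TEXT)")
conn.executemany("INSERT INTO meals VALUES (?, ?)", [
    (1, "BEEF"), (1, "CHEESE"), (1, "PASTA"),
    (2, "CHEESE"), (2, "PASTA"), (2, "FISH"),
    (3, "CHEESE"), (3, "CHICKEN"),
])

rows = conn.execute("""
    SELECT f.ingredient, s.ingredient,
           tot.n    AS meals_with_first,
           COUNT(*) AS meals_with_both,
           COUNT(*) * 1.0 / (tot.n + 3) AS possible_score
    FROM meals AS f
    JOIN meals AS s
      ON s.meal_num = f.meal_num AND s.ingredient <> f.ingredient
    JOIN (SELECT ingredient, COUNT(*) AS n
          FROM meals
          GROUP BY ingredient) AS tot
      ON tot.ingredient = f.ingredient
    GROUP BY f.ingredient, s.ingredient, tot.n
    ORDER BY possible_score DESC
""").fetchall()

# Index the result by (first, second) pair for easy lookup.
scores = {(first, second): (n_first, n_both, score)
          for first, second, n_first, n_both, score in rows}

for pair, values in scores.items():
    print(pair, values)
```

The "+ 3" in the denominator is what keeps an ingredient seen in a single meal (such as CHICKEN) from scoring 100% with everything it happened to appear with.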
Do a self join to get all the ingredient combinations, then corr by the two meal_nums:
SELECT t1.INGREDIENT, t2.INGREDIENT, CORR(t1.MEAL_NUM, t2.MEAL_NUM)
FROM TheTable t1, TheTable t2
WHERE t1.INGREDIENT < t2.INGREDIENT
GROUP BY t1.INGREDIENT, t2.INGREDIENT
Should give you something like:
BEEF CHEESE 0.999
BEEF PASTA 0.998
CHEESE PASTA 0.977
UPDATE: as Chris points out, this won't work as is. What I was hoping is that there might be some way to fudge a mapping from the ordinal meal_num to an interval (@Chris, thanks for the link) value. That may not be possible, in which case this answer wouldn't help.
Try DBMS_FREQUENT_ITEMSET:
--Create sample data
create table meals(meal_num number, ingredient varchar2(10));
insert into meals
select 1, 'BEEF' from dual union all
select 1, 'CHEESE' from dual union all
select 1, 'PASTA' from dual union all
select 2, 'CHEESE' from dual union all
select 2, 'PASTA' from dual union all
select 2, 'FISH' from dual union all
select 3, 'CHEESE' from dual union all
select 3, 'CHICKEN' from dual;
commit;
--Create nested table type to hold results
CREATE OR REPLACE TYPE fi_varchar_nt AS TABLE OF VARCHAR2(10);
/
--Find the items most frequently combined with CHEESE.
select bt.setid, nt.column_value, support occurances_of_itemset
,length, total_tranx
from
(
select
cast(itemset as fi_varchar_nt) itemset, rownum setid
,support, length, total_tranx
from table(dbms_frequent_itemset.fi_transactional(
tranx_cursor => cursor(select meal_num, ingredient from meals),
support_threshold => 0,
itemset_length_min => 2,
itemset_length_max => 2,
including_items => cursor(select 'CHEESE' from dual),
excluding_items => null))
) bt,
table(bt.itemset) nt
where column_value <> 'CHEESE'
order by 3 desc;
SETID COLUMN_VAL OCCURANCES_OF_ITEMSET LENGTH TOTAL_TRANX
---------- ---------- --------------------- ---------- -----------
4 PASTA 2 2 3
3 FISH 1 2 3
1 BEEF 1 2 3
2 CHICKEN 1 2 3
What about a query like this?
select t1.INGREDIENT, count(*) a
from table t1,
(select meal_num
from table
where INGREDIENT = 'CHEESE') t2
where t1.INGREDIENT <> 'CHEESE'
and t1.meal_num = t2.meal_num
group by t1.INGREDIENT;
The result should be the number of times each ingredient shares a meal_num with CHEESE.

SQL - Aggregating data in a result set with identical rows and eliminating multiple rows based on one column's value

I have a table that has transactions by employeeID by TransactionTime. Each employee may have multiple transactions that occur at the same time. For example: EmployeeID 1 has 2 transactions at 12. I need to sum the transactions by EmployeeID at each time interval. So for employeeID 1, the new column (TotalTransactionsByTime) result would be 2. Next, if the CODE for a given TransactionTime has a CODE of BAD, I need to exclude all transactions at that time increment. So for EmployeeID 2, I would need to exclude all three transactions from the result set because they have a CODE of 'BAD' which nullifies all transactions at that increment.
MY TABLE
|EmployeeID|TransactionTime|CODE|
1 12 GOOD
1 12 GOOD
1 5 GOOD
2 1 BAD --need to omit all 3 transactions for employeeID 2
2 1 GOOD
2 1 GOOD
3 3 GOOD
3 3 GOOD
A correct result would look like:
|EmployeeID | TransactionTime | CODE | NUMBERTRNS
1 12 GOOD | 2
1 5 GOOD | 1
3 3 GOOD | 2
select mt1.EmployeeID, mt1.TransactionTime, mt1.CODE, count(*) as NUMBERTRNS
from MyTable mt1
where mt1.EmployeeID not in (select EmployeeID from MyTable where CODE = 'BAD')
group by mt1.EmployeeID, mt1.TransactionTime, mt1.CODE
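The query above drops every transaction of an employee who has a BAD code at any time. If the intent is to drop only the time increment that contains a BAD code, one possible variant is a NOT EXISTS correlated on both EmployeeID and TransactionTime. The sketch below tries that on SQLite through Python; the extra row (2, 7, 'GOOD') is a hypothetical addition of mine to make the difference visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (EmployeeID INTEGER, TransactionTime INTEGER, CODE TEXT)"
)
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?)", [
    (1, 12, "GOOD"), (1, 12, "GOOD"), (1, 5, "GOOD"),
    (2, 1, "BAD"), (2, 1, "GOOD"), (2, 1, "GOOD"),
    (2, 7, "GOOD"),  # hypothetical extra row: same employee, different time
    (3, 3, "GOOD"), (3, 3, "GOOD"),
])

# Exclude a group only when that employee has a BAD code at that same time.
result = conn.execute("""
    SELECT mt1.EmployeeID, mt1.TransactionTime, mt1.CODE, COUNT(*) AS NUMBERTRNS
    FROM MyTable AS mt1
    WHERE NOT EXISTS (SELECT 1
                      FROM MyTable AS bad
                      WHERE bad.EmployeeID = mt1.EmployeeID
                        AND bad.TransactionTime = mt1.TransactionTime
                        AND bad.CODE = 'BAD')
    GROUP BY mt1.EmployeeID, mt1.TransactionTime, mt1.CODE
    ORDER BY mt1.EmployeeID, mt1.TransactionTime DESC
""").fetchall()

for row in result:
    print(row)
```

On the question's original eight rows both versions return the same three groups; they differ only when an employee has clean transactions at another time, as with the added row here.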