Query to differentiate between stock and sales order - SQL

I have two tables. Table A contains one column, Priority. Table B contains Priority, Lotid and salesorderline.
Table A
Priority
--------
122
123
124
Table B
Priority | Lotid | salesorderline
---------------------------------
122      | 14257 | 4
122      | 14528 | 6
122      | 14782 | 4
122      | 14587 | 0
I'm expecting results like:
Priority | TotalLot | salesorder | In Stock
-------------------------------------------
122      | 4        | 3          | 1
If salesorderline is greater than 0 then it's a sales order, otherwise it's stock.

So for each Priority in Table A (which I'll call Priorities) you want aggregate data on each row group in Table B (Lots)?
There are multiple ways to build this query. The simplest is a single query with predicated aggregate functions:
SELECT
    Lots.Priority,
    COUNT(*) AS TotalLots,
    COUNT(CASE WHEN SalesOrderLine > 0 THEN 1 ELSE NULL END) AS SalesOrder,
    COUNT(CASE WHEN SalesOrderLine <= 0 THEN 1 ELSE NULL END) AS InStock
FROM
    Lots
    INNER JOIN Priorities ON Priorities.Priority = Lots.Priority
    -- this JOIN only restricts Lots to the Priorities listed in Table A;
    -- it contributes no columns, since Priorities holds no other data
GROUP BY
    Lots.Priority
ORDER BY
    Lots.Priority ASC
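If every Priority from Table A should appear in the output even when it has no lots yet, the join direction can be flipped. A minimal sketch of that variant, assuming the same table names (untested against your schema):
SELECT
    Priorities.Priority,
    COUNT(Lots.Priority) AS TotalLots,
    COUNT(CASE WHEN Lots.SalesOrderLine > 0 THEN 1 END) AS SalesOrder,
    COUNT(CASE WHEN Lots.SalesOrderLine <= 0 THEN 1 END) AS InStock
FROM
    Priorities
    LEFT JOIN Lots ON Lots.Priority = Priorities.Priority
GROUP BY
    Priorities.Priority
ORDER BY
    Priorities.Priority ASC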
Note that I feel your database design is not correct - I don't see why a SalesOrder/Lot row should represent whether something is in stock, and overloading SalesOrderLine to carry that meaning is a bad design.

How to create this BigQuery query on a retail dataset

I have a table with user retail transactions. It includes sales and cancels. If Qty is positive, it's a sale; if negative, a cancel. I want to attach cancels to the most appropriate sale. So, I have a table like this:
| CustomerId | StockId | Qty | Date       |
|------------+---------+-----+------------|
| 1          | 100     | 50  | 2020-01-01 |
| 1          | 100     | -10 | 2020-01-10 |
| 1          | 100     | 60  | 2020-02-10 |
| 1          | 100     | -20 | 2020-02-10 |
| 1          | 100     | 200 | 2020-03-01 |
| 1          | 100     | 10  | 2020-03-05 |
| 1          | 100     | -90 | 2020-03-10 |
User with ID 1 has the following actions: buy 50 -> return 10 -> buy 60 -> return 20 -> buy 200 -> buy 10 -> return 90. For each cancel row (negative Qty) I want to find the previous row (by Date) with a positive Qty greater than the cancel Qty.
So I need to create BigQuery queries that produce a table like this:
| CustomerId | StockId | Qty | Date       | CancelQty |
|------------+---------+-----+------------+-----------|
| 1          | 100     | 50  | 2020-01-01 | -10       |
| 1          | 100     | 60  | 2020-02-10 | -20       |
| 1          | 100     | 200 | 2020-03-01 | -90       |
| 1          | 100     | 10  | 2020-03-05 | 0         |
Can anybody help me with these queries? I have created one candidate query (split cancels and sales, join them, and do some cleanup to remove rows), but it works incorrectly in the above case.
I use BigQuery, so any BQ SQL features could be applied.
Any ideas will be helpful.
You can use the following query.
;WITH result AS (
    SELECT t1.*, t2.Qty AS cQty, t2.Date AS Date_t2
    FROM (SELECT *, ROW_NUMBER() OVER (ORDER BY Qty DESC) AS [ROW NUMBER] FROM Test) t1
    JOIN (SELECT *, ROW_NUMBER() OVER (ORDER BY Qty) AS [ROW NUMBER] FROM Test) t2
        ON t1.[ROW NUMBER] = t2.[ROW NUMBER]
)
SELECT CustomerId, StockId, Qty, Date, ISNULL(cQty, 0) AS CancelQty, Date_t2
FROM (SELECT CustomerId, StockId, Qty, Date,
             CASE WHEN cQty < 0 THEN cQty ELSE NULL END AS cQty,
             CASE WHEN cQty < 0 THEN Date_t2 ELSE NULL END AS Date_t2
      FROM result) t
WHERE Qty > 0
ORDER BY cQty DESC
result: https://dbfiddle.uk
(Note that ;WITH, [bracketed] identifiers and ISNULL are SQL Server syntax; in BigQuery you would use plain identifiers and IFNULL instead.)
You can do this as a gaps-and-islands problem. Basically, add a grouping column to the rows based on a cumulative reverse count of negative values. Then within each group, choose the first row where the sum is positive. So:
select t.* except (cancelqty, grp),
       (case when min(case when cancelqty + qty >= 0 then date end) over (partition by customerid, grp) = date
             then cancelqty
             else 0
        end) as cancelqty
from (select t.*,
             -- the cancel amount (the negative qty) within each group, if any
             min(if(qty < 0, qty, null)) over (partition by customerid, grp) as cancelqty
      from (select t.*,
                   countif(qty < 0) over (partition by customerid order by date desc) as grp
            from transactions t
           ) t
     ) t;
-- add a "where qty > 0" filter on the outer query if the cancel rows themselves should be dropped
Note: This works for the data you have provided. However, there may be complicated scenarios where this does not work. In fact, I don't think there is a simple optimal solution assuming that the returns are not connected to the original sales. I would suggest that you fix the data model so you record where the returns come from.
The query below seems to satisfy the conditions and produce the output mentioned. The solution is based on self-joining the base table (t) with the corresponding cancel rows from the same table (t1).
First, a self join on CustomerId and StockId is done, since the rows need to correspond to the same customer and product.
Additionally, we only bring in cancel transactions t1 that happened on or after the base row in table t (t.Dt <= t1.Dt), and the clause t1.Qty < 0 ensures the joined row is a cancel.
Further, we cannot attribute a cancel to a sale whose quantity is smaller than the cancelled quantity. Therefore I check that the positive quantity is at least the cancelled quantity; adding a '-' sign to the cancel quantity makes the comparison easy: -(t1.Qty) <= t.Qty.
After the join, we are interested only in the positive quantities, so a WHERE clause (t.Qty > 0) filters the other rows from the base table t.
Now every positive row is joined to every cancel row that came on or after its date. For example, the Qty 50 row could have all the cancels mapped to it, but we are interested only in the first one that came after it. So we group the base rows and keep only the earliest cancel date via the HAVING clause condition HAVING IFNULL(t1.dt, '0') = MIN(IFNULL(t1.dt, '0')).
Finally we get the rows we need, and the last column can be excluded if required using an outer select query.
SELECT t.CustomerId, t.StockId, t.Qty, t.Dt, IFNULL(t1.Qty, 0) AS CancelQty,
       t1.Dt AS dt_t1
FROM tbl t
LEFT JOIN tbl t1
    ON t.CustomerId = t1.CustomerId
    AND t.StockId = t1.StockId
    AND t.Dt <= t1.Dt AND t1.Qty < 0 AND -(t1.Qty) <= t.Qty
WHERE t.Qty > 0
GROUP BY 1, 2, 3, 4
HAVING IFNULL(t1.dt, '0') = MIN(IFNULL(t1.dt, '0'))
ORDER BY 1, 2, 4, 3
Consider below approach
with sales as (
  select * from `project.dataset.table` where Qty > 0
), cancels as (
  select * from `project.dataset.table` where Qty < 0
)
select any_value(s).*,
  ifnull(array_agg(c.Qty order by c.Date limit 1)[offset(0)], 0) as CancelQty
from sales s
left join cancels c
  on s.CustomerId = c.CustomerId
  and s.StockId = c.StockId
  and s.Date <= c.Date
  and s.Qty > abs(c.Qty)
group by format('%t', s)
If applied to the sample data in your question, the output matches the expected result shown above.

How do I transform the specific row value into column headers in hive [duplicate]

I tried to search posts, but I only found solutions for SQL Server/Access. I need a solution in MySQL (5.X).
I have a table (called history) with 3 columns: hostid, itemname, itemvalue.
If I do a select (select * from history), it will return
+--------+----------+-----------+
| hostid | itemname | itemvalue |
+--------+----------+-----------+
| 1      | A        | 10        |
+--------+----------+-----------+
| 1      | B        | 3         |
+--------+----------+-----------+
| 2      | A        | 9         |
+--------+----------+-----------+
| 2      | C        | 40        |
+--------+----------+-----------+
How do I query the database to return something like
+--------+------+-----+-----+
| hostid | A    | B   | C   |
+--------+------+-----+-----+
| 1      | 10   | 3   | 0   |
+--------+------+-----+-----+
| 2      | 9    | 0   | 40  |
+--------+------+-----+-----+
I'm going to add a somewhat longer and more detailed explanation of the steps to take to solve this problem. I apologize if it's too long.
I'll start out with the base you've given and use it to define a couple of terms that I'll use for the rest of this post. This will be the base table:
select * from history;
+--------+----------+-----------+
| hostid | itemname | itemvalue |
+--------+----------+-----------+
| 1      | A        | 10        |
| 1      | B        | 3         |
| 2      | A        | 9         |
| 2      | C        | 40        |
+--------+----------+-----------+
This will be our goal, the pretty pivot table:
select * from history_itemvalue_pivot;
+--------+------+------+------+
| hostid | A | B | C |
+--------+------+------+------+
| 1      | 10   | 3    | 0    |
| 2      | 9    | 0    | 40   |
+--------+------+------+------+
Values in the history.hostid column will become y-values in the pivot table. Values in the history.itemname column will become x-values (for obvious reasons).
When I have to solve the problem of creating a pivot table, I tackle it using a three-step process (with an optional fourth step):
select the columns of interest, i.e. y-values and x-values
extend the base table with extra columns -- one for each x-value
group and aggregate the extended table -- one group for each y-value
(optional) prettify the aggregated table
Let's apply these steps to your problem and see what we get:
Step 1: select columns of interest. In the desired result, hostid provides the y-values and itemname provides the x-values.
Step 2: extend the base table with extra columns. We typically need one column per x-value. Recall that our x-value column is itemname:
create view history_extended as (
    select
        history.*,
        case when itemname = "A" then itemvalue end as A,
        case when itemname = "B" then itemvalue end as B,
        case when itemname = "C" then itemvalue end as C
    from history
);
select * from history_extended;
+--------+----------+-----------+------+------+------+
| hostid | itemname | itemvalue | A | B | C |
+--------+----------+-----------+------+------+------+
| 1      | A        | 10        | 10   | NULL | NULL |
| 1      | B        | 3         | NULL | 3    | NULL |
| 2      | A        | 9         | 9    | NULL | NULL |
| 2      | C        | 40        | NULL | NULL | 40   |
+--------+----------+-----------+------+------+------+
Note that we didn't change the number of rows -- we just added extra columns. Also note the pattern of NULLs -- a row with itemname = "A" has a non-null value for new column A, and null values for the other new columns.
Step 3: group and aggregate the extended table. We need to group by hostid, since it provides the y-values:
create view history_itemvalue_pivot as (
    select
        hostid,
        sum(A) as A,
        sum(B) as B,
        sum(C) as C
    from history_extended
    group by hostid
);
select * from history_itemvalue_pivot;
+--------+------+------+------+
| hostid | A | B | C |
+--------+------+------+------+
| 1      | 10   | 3    | NULL |
| 2      | 9    | NULL | 40   |
+--------+------+------+------+
(Note that we now have one row per y-value.) Okay, we're almost there! We just need to get rid of those ugly NULLs.
Step 4: prettify. We're just going to replace any null values with zeroes so the result set is nicer to look at:
create view history_itemvalue_pivot_pretty as (
    select
        hostid,
        coalesce(A, 0) as A,
        coalesce(B, 0) as B,
        coalesce(C, 0) as C
    from history_itemvalue_pivot
);
select * from history_itemvalue_pivot_pretty;
+--------+------+------+------+
| hostid | A | B | C |
+--------+------+------+------+
| 1      | 10   | 3    | 0    |
| 2      | 9    | 0    | 40   |
+--------+------+------+------+
And we're done -- we've built a nice, pretty pivot table using MySQL.
Considerations when applying this procedure:
what value to use in the extra columns. I used itemvalue in this example
what "neutral" value to use in the extra columns. I used NULL, but it could also be 0 or "", depending on your exact situation
what aggregate function to use when grouping. I used sum, but count and max are also often used (max is often used when building one-row "objects" that had been spread across many rows)
using multiple columns for y-values. This solution isn't limited to using a single column for the y-values -- just plug the extra columns into the group by clause (and don't forget to select them)
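For example, if the y-values came from two columns - say hostid plus a hypothetical year column (not part of the original table) - the extend and aggregate steps stay the same; a minimal sketch:
select
    hostid,
    year,  -- hypothetical second y-value column
    sum(case when itemname = "A" then itemvalue end) as A,
    sum(case when itemname = "B" then itemvalue end) as B,
    sum(case when itemname = "C" then itemvalue end) as C
from history
group by hostid, year;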
Known limitations:
this solution doesn't allow n columns in the pivot table -- each pivot column needs to be manually added when extending the base table. So for 5 or 10 x-values, this solution is nice. For 100, not so nice. There are some solutions with stored procedures generating a query, but they're ugly and difficult to get right. I currently don't know of a good way to solve this problem when the pivot table needs to have lots of columns.
SELECT
    hostid,
    sum( if( itemname = 'A', itemvalue, 0 ) ) AS A,
    sum( if( itemname = 'B', itemvalue, 0 ) ) AS B,
    sum( if( itemname = 'C', itemvalue, 0 ) ) AS C
FROM
    history
GROUP BY
    hostid;
Another option, especially useful if you have many items to pivot, is to let MySQL build the query for you:
SELECT
    GROUP_CONCAT(DISTINCT
        CONCAT(
            'ifnull(SUM(case when itemname = ''',
            itemname,
            ''' then itemvalue end),0) AS `',
            itemname, '`'
        )
    ) INTO @sql
FROM
    history;

SET @sql = CONCAT('SELECT hostid, ', @sql, '
                   FROM history
                   GROUP BY hostid');

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
Added some extra values to see it working
GROUP_CONCAT has a default length limit of 1024, so if you have a really big query, change this parameter before running it:
SET SESSION group_concat_max_len = 1000000;
Test:
DROP TABLE IF EXISTS history;
CREATE TABLE history
(hostid INT,
itemname VARCHAR(5),
itemvalue INT);
INSERT INTO history VALUES(1,'A',10),(1,'B',3),(2,'A',9),
(2,'C',40),(2,'D',5),
(3,'A',14),(3,'B',67),(3,'D',8);
hostid  A   B   C   D
1       10  3   0   0
2       9   0   40  5
3       14  67  0   8
Taking advantage of Matt Fenwick's idea that helped me to solve the problem (a lot of thanks), let's reduce it to only one query:
select
    hostid,
    coalesce(sum(case when itemname = "A" then itemvalue end), 0) as A,
    coalesce(sum(case when itemname = "B" then itemvalue end), 0) as B,
    coalesce(sum(case when itemname = "C" then itemvalue end), 0) as C
from history
group by hostid
I edited Agung Sagita's answer from a subquery to a join.
I'm not sure how much difference there is between the two ways, but here it is as another reference.
SELECT hostid, T2.VALUE AS A, T3.VALUE AS B, T4.VALUE AS C
FROM TableTest AS T1
LEFT JOIN TableTest T2 ON T2.hostid=T1.hostid AND T2.ITEMNAME='A'
LEFT JOIN TableTest T3 ON T3.hostid=T1.hostid AND T3.ITEMNAME='B'
LEFT JOIN TableTest T4 ON T4.hostid=T1.hostid AND T4.ITEMNAME='C'
use subquery
SELECT hostid,
(SELECT VALUE FROM TableTest WHERE ITEMNAME='A' AND hostid = t1.hostid) AS A,
(SELECT VALUE FROM TableTest WHERE ITEMNAME='B' AND hostid = t1.hostid) AS B,
(SELECT VALUE FROM TableTest WHERE ITEMNAME='C' AND hostid = t1.hostid) AS C
FROM TableTest AS T1
GROUP BY hostid
But it will be a problem if the subquery returns more than one row; use a further aggregate function in the subquery.
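For instance, a minimal sketch that sums duplicates (assuming duplicate hostid/itemname rows should be added together):
SELECT hostid,
    (SELECT SUM(VALUE) FROM TableTest WHERE ITEMNAME='A' AND hostid = t1.hostid) AS A,
    (SELECT SUM(VALUE) FROM TableTest WHERE ITEMNAME='B' AND hostid = t1.hostid) AS B,
    (SELECT SUM(VALUE) FROM TableTest WHERE ITEMNAME='C' AND hostid = t1.hostid) AS C
FROM TableTest AS t1
GROUP BY hostid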
If you can use MariaDB there is a very easy solution.
Since MariaDB 10.0.2 there is a new storage engine called CONNECT that can help us convert the results of another query or table into a pivot table, just like what you want:
You can have a look at the docs.
First of all install the connect storage engine.
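A one-line sketch of that step (the documented way to load the plugin, assuming the CONNECT engine package is installed on the server):
INSTALL SONAME 'ha_connect';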
Now the pivot column of our table is itemname and the data for each item is located in itemvalue column, so we can have the result pivot table using this query:
create table pivot_table
engine=connect table_type=pivot tabname=history
option_list='PivotCol=itemname,FncCol=itemvalue';
Now we can select what we want from the pivot_table:
select * from pivot_table
More details here
My solution:
select h.hostid,
       sum(ifnull(h.A, 0)) as A,
       sum(ifnull(h.B, 0)) as B,
       sum(ifnull(h.C, 0)) as C
from (
    select
        hostid,
        case when itemName = 'A' then itemvalue end as A,
        case when itemName = 'B' then itemvalue end as B,
        case when itemName = 'C' then itemvalue end as C
    from history
) h
group by hostid
It produces the expected results in the submitted case.
If I just group by hostid without the outer sum, it shows only the first row with values, like:
A  B  C
1  10
2  3
I figured out one way to make my reports convert rows to columns almost dynamically using simple queries. You can see and test it online here.
The number of columns in the query is fixed, but the values are dynamic, based on the values of the rows. So, I use one query to build the table header and another one to see the values:
SELECT distinct concat('<th>',itemname,'</th>') as column_name_table_header FROM history order by 1;
SELECT
hostid
,(case when itemname = (select distinct itemname from history a order by 1 limit 0,1) then itemvalue else '' end) as col1
,(case when itemname = (select distinct itemname from history a order by 1 limit 1,1) then itemvalue else '' end) as col2
,(case when itemname = (select distinct itemname from history a order by 1 limit 2,1) then itemvalue else '' end) as col3
,(case when itemname = (select distinct itemname from history a order by 1 limit 3,1) then itemvalue else '' end) as col4
FROM history order by 1;
You can summarize it, too:
SELECT
hostid
,sum(case when itemname = (select distinct itemname from history a order by 1 limit 0,1) then itemvalue end) as A
,sum(case when itemname = (select distinct itemname from history a order by 1 limit 1,1) then itemvalue end) as B
,sum(case when itemname = (select distinct itemname from history a order by 1 limit 2,1) then itemvalue end) as C
FROM history group by hostid order by 1;
+--------+------+------+------+
| hostid | A | B | C |
+--------+------+------+------+
| 1      | 10   | 3    | NULL |
| 2      | 9    | NULL | 40   |
+--------+------+------+------+
Results of RexTester:
http://rextester.com/ZSWKS28923
For one real example of use: the report below shows in columns the departure and arrival hours of boats/buses in a visual schedule. You will see one additional unused column at the last position, which does not confuse the visualization (it comes from a ticketing system for selling tickets online and in person).
This isn't the exact answer you are looking for, but it was a solution that I needed on my project and I hope this helps someone. This will list 1 to n row items separated by commas. GROUP_CONCAT makes this possible in MySQL.
select
cemetery.cemetery_id as "Cemetery_ID",
GROUP_CONCAT(distinct(names.name)) as "Cemetery_Name",
cemetery.latitude as Latitude,
cemetery.longitude as Longitude,
c.Contact_Info,
d.Direction_Type,
d.Directions
from cemetery
left join cemetery_names on cemetery.cemetery_id = cemetery_names.cemetery_id
left join names on cemetery_names.name_id = names.name_id
left join cemetery_contact on cemetery.cemetery_id = cemetery_contact.cemetery_id
left join
(
select
cemetery_contact.cemetery_id as cID,
group_concat(contacts.name, char(32), phone.number) as Contact_Info
from cemetery_contact
left join contacts on cemetery_contact.contact_id = contacts.contact_id
left join phone on cemetery_contact.contact_id = phone.contact_id
group by cID
)
as c on c.cID = cemetery.cemetery_id
left join
(
select
cemetery_id as dID,
group_concat(direction_type.direction_type) as Direction_Type,
group_concat(directions.value , char(13), char(9)) as Directions
from directions
left join direction_type on directions.type = direction_type.direction_type_id
group by dID
)
as d on d.dID = cemetery.cemetery_id
group by Cemetery_ID
This cemetery has two common names, so the names are listed in different rows connected by a single id but two name ids, and the query produces something like this:
CemeteryID  Cemetery_Name             Latitude
1           Appleton,Sulpher Springs  35.4276242832293
You can use a couple of LEFT JOINs. Kindly use this code:
SELECT t.hostid,
COALESCE(t1.itemvalue, 0) A,
COALESCE(t2.itemvalue, 0) B,
COALESCE(t3.itemvalue, 0) C
FROM history t
LEFT JOIN history t1
ON t1.hostid = t.hostid
AND t1.itemname = 'A'
LEFT JOIN history t2
ON t2.hostid = t.hostid
AND t2.itemname = 'B'
LEFT JOIN history t3
ON t3.hostid = t.hostid
AND t3.itemname = 'C'
GROUP BY t.hostid
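Note that this GROUP BY without aggregates relies on MySQL's permissive handling of non-aggregated columns; with ONLY_FULL_GROUP_BY enabled (the default since MySQL 5.7), SELECT DISTINCT expresses the same intent. A sketch of that variant:
SELECT DISTINCT t.hostid,
       COALESCE(t1.itemvalue, 0) A,
       COALESCE(t2.itemvalue, 0) B,
       COALESCE(t3.itemvalue, 0) C
FROM history t
LEFT JOIN history t1 ON t1.hostid = t.hostid AND t1.itemname = 'A'
LEFT JOIN history t2 ON t2.hostid = t.hostid AND t2.itemname = 'B'
LEFT JOIN history t3 ON t3.hostid = t.hostid AND t3.itemname = 'C'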
I'm sorry to say this, and maybe I'm not solving your problem exactly, but PostgreSQL is 10 years older than MySQL and is extremely advanced compared to it, and there are many ways to achieve this easily. Install PostgreSQL and execute this query
CREATE EXTENSION tablefunc;
then voila! And here's extensive documentation: PostgreSQL: Documentation: 9.1: tablefunc or this query
CREATE EXTENSION hstore;
then again voila! PostgreSQL: Documentation: 9.0: hstore
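For reference, a crosstab call for the history table might look like this - a sketch, assuming the tablefunc extension is installed, and noting that the output column list must be spelled out by hand:
SELECT *
FROM crosstab(
    'select hostid, itemname, itemvalue from history order by 1, 2',
    'select distinct itemname from history order by 1'
) AS ct(hostid int, A int, B int, C int);
Missing combinations come back as NULL rather than 0 in this form.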

Comparing different columns in SQL for each row

After some transformations I have a result from a cross join (of tables a and b) that I want to do some analysis on. The table looks like this:
+-----+------+------+------+------+-----+------+------+------+------+
| id  | 10_1 | 10_2 | 11_1 | 11_2 | id  | 10_1 | 10_2 | 11_1 | 11_2 |
+-----+------+------+------+------+-----+------+------+------+------+
| 111 | 1    | 0    | 1    | 0    | 222 | 1    | 0    | 1    | 0    |
| 111 | 1    | 0    | 1    | 0    | 333 | 0    | 0    | 0    | 0    |
| 111 | 1    | 0    | 1    | 0    | 444 | 1    | 0    | 1    | 1    |
| 112 | 0    | 1    | 1    | 0    | 222 | 1    | 0    | 1    | 0    |
+-----+------+------+------+------+-----+------+------+------+------+
The ids in the first column are different from the ids in the sixth column.
A row always contains two different IDs that are matched with each other. The other columns always have either 0 or 1 as a value.
I am now trying to find out how many values (meaning both have "1" in 10_1, 10_2, etc.) two IDs have in common on average, but I don't really know how to do so.
I was trying something like this as a start:
SELECT SUM(CASE WHEN a.10_1 = 1 AND b.10_1 = 1 then 1 end)
But this would obviously only count how often two IDs have 10_1 in common. I could write something like this for each combination of columns:
SELECT SUM(CASE WHEN (a.10_1 = 1 AND b.10_1 = 1)
OR (a.10_2 = 1 AND b.10_1 = 1) OR [...] then 1 end)
To count in general how often two IDs have one thing in common, but this would of course also count cases where they have two or more things in common. Plus, I would also like to know how often two IDs have two things, three things, etc. in common.
One "problem" in my case is also that I have like ~30 columns I want to look at, so I can hardly write down for each case every possible combination.
Does anyone know how I can approach my problem in a better way?
Thanks in advance.
Edit:
A possible result could look like this:
+-----------+-------+
| in_common | count |
+-----------+-------+
| 0         | 100   |
| 1         | 500   |
| 2         | 1500  |
| 3         | 5000  |
| 4         | 3000  |
+-----------+-------+
With the codes as column names, you're going to have to write some code that explicitly references each column name. To keep that to a minimum, you could write those references in a single union statement that normalizes the data, such as:
select id, '10_1' as code from tbl where "10_1" = 1
union
select id, '10_2' from tbl where "10_2" = 1
union
select id, '11_1' from tbl where "11_1" = 1
union
select id, '11_2' from tbl where "11_2" = 1;
This needs to be modified to include whatever additional columns you need to link up different IDs. For the purpose of this illustration, I assume the following data model
create table p (
id integer not null primary key,
sex character(1) not null,
age integer not null
);
create table t1 (
id integer not null,
code character varying(4) not null,
constraint pk_t1 primary key (id, code)
);
Though your data evidently does not currently resemble this structure, normalizing your data into a form like this would allow you to apply the following solution to summarize your data in the desired form.
select
in_common,
count(*) as count
from (
select
count(*) as in_common
from (
select
a.id as a_id, a.code,
b.id as b_id, b.code
from
(select p.*, t1.code
from p left join t1 on p.id=t1.id
) as a
inner join (select p.*, t1.code
from p left join t1 on p.id=t1.id
) as b on b.sex <> a.sex and b.age between a.age-10 and a.age+10
where
a.id < b.id
and a.code = b.code
) as c
group by
a_id, b_id
) as summ
group by
in_common;
The proposed solution first requires taking one step back from the cross-join table, as the identical column names are super annoying. Instead, we take the ids from the two tables and put them in a temporary table. The following query gets the result wanted in the question. It assumes table_a and table_b from the question are the same table, called tbl, but this assumption is not needed: tbl can be replaced by table_a and table_b in the two sub-SELECT queries. It looks complicated and uses a JSON trick to flatten the columns, but it works:
WITH idtable AS (
SELECT a.id as id_1, b.id as id_2 FROM
-- put cross join of table a and table b here
)
SELECT in_common,
count(*)
FROM
(SELECT idtable.*,
sum(CASE
WHEN meltedR.value::text=meltedL.value::text THEN 1
ELSE 0
END) AS in_common
FROM idtable
JOIN
(SELECT tbl.id,
b.*
FROM tbl, -- change here to table_a
json_each(row_to_json(tbl)) b -- and here too
WHERE KEY<>'id' ) meltedL ON (idtable.id_1 = meltedL.id)
JOIN
(SELECT tbl.id,
b.*
FROM tbl, -- change here to table_b
json_each(row_to_json(tbl)) b -- and here too
WHERE KEY<>'id' ) meltedR ON (idtable.id_2 = meltedR.id
AND meltedL.key = meltedR.key)
GROUP BY idtable.id_1,
idtable.id_2) tt
GROUP BY in_common ORDER BY in_common;
The output here looks like this:
 in_common | count
-----------+-------
         2 |     2
         3 |     1
         4 |     1
(3 rows)
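As an aside on the JSON trick: json_each(row_to_json(...)) melts each row into (key, value) pairs, which is what lets the query compare all columns without naming each one. A small illustration of just that step (assuming PostgreSQL, as above):
-- a row (id=111, "10_1"=1, "10_2"=0, ...) becomes pairs ('10_1', 1), ('10_2', 0), ...
SELECT tbl.id, kv.key, kv.value
FROM tbl, json_each(row_to_json(tbl)) AS kv
WHERE kv.key <> 'id';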

Hive / SQL - Left join with fallback

In Apache Hive I have two tables I would like to left-join, keeping all the data from the left table and adding data where possible from the right table.
The join is based on two fields (a material_id and a location_id).
This works fine with a traditional join:
SELECT
a.*,
b.*
FROM a
INNER JOIN (some more complex select) b
ON a.material_id=b.material_id
AND a.location_id=b.location_id;
For the location_id the database only contains two distinct values, say 1 and 2.
We now have the requirement that if there is no "perfect match" - that is, only the material_id can be joined and the exact combination of material_id and location_id (e.g. material_id=100 and location_id=1) does not exist in the b-table - then the join should "default" or "fall back" to the other possible value of location_id (e.g. material_id=100 and location_id=2), and vice versa. This should only be the case for the location_id.
We have already looked into all possible answers, also with CASE etc., but to no avail. We tried a setup like
...
ON a.material_id=b.material_id AND a.location_id=
CASE WHEN a.location_id = b.location_id THEN b.location_id ELSE ...;
but we did not figure out how to really express this in Hive query language.
Thank you for your help! Maybe somebody has a smart idea.
Here is some sample data:
Table a
| material_id | location_id | other_column_a |
| 100         | 1           | 45             |
| 101         | 1           | 45             |
| 103         | 1           | 45             |
| 103         | 2           | 45             |
Table b
| material_id | location_id | other_column_b |
| 100         | 1           | 66             |
| 102         | 1           | 76             |
| 103         | 2           | 88             |
Left-Join Table
| material_id | location_id | other_column_a | other_column_b                      |
| 100         | 1           | 45             | 66                                  |
| 101         | 1           | 45             | NULL (mat. not in b)                |
| 103         | 1           | 45             | DEFAULT TO where location_id=2 (88) |
| 103         | 2           | 45             | 88                                  |
PS: As stated here, EXISTS etc. does not work in the sub-query's ON clause.
The solution is to left join without the a.location_id = b.location_id condition and to number all rows in order of preference, then filter by row_number. In the code below the join will first duplicate rows, because every matching material_id is joined. The row_number() function then assigns 1 to rows where a.location_id = b.location_id and 2 to rows where a.location_id <> b.location_id (or 1 if no exact match exists). b.location_id is added to the order by in the row_number() function so that rows with a lower b.location_id are preferred when there is no exact match. I hope you have caught the idea.
select * from
(
    SELECT
        a.*,
        b.*,
        row_number() over(partition by a.material_id
                          order by CASE WHEN a.location_id = b.location_id THEN 1 ELSE 2 END, b.location_id) as rn
    FROM a
    LEFT JOIN (some more complex select) b
        ON a.material_id = b.material_id
) s
where rn = 1
;
Maybe this is helpful for somebody in the future:
We also came up with a different approach.
First, we create another table that calculates averages from table b based on material_id over all (!) locations.
Second, in the join table we create three columns (see the sketch after this list):
c1 - the value where material_id and location_id match (the result of a left join of table a with table b). This column is null if there is no perfect match.
c2 - the value from the averages (fallback) table for this material_id (regardless of the location)
c3 - the "actual value" column, where a case statement decides: when c1 is NULL (no perfect match of material and location), we use the value from c2 (the average over all other locations for the material) for further calculations.
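A sketch of that approach in Hive SQL, with table and column names assumed from the sample data above (not the exact production code):
-- fallback averages per material over all locations
WITH b_avg AS (
    SELECT material_id, AVG(other_column_b) AS avg_b
    FROM b
    GROUP BY material_id
)
SELECT
    a.material_id,
    a.location_id,
    a.other_column_a,
    b.other_column_b AS c1,  -- perfect match; NULL if none
    b_avg.avg_b      AS c2,  -- average regardless of location
    CASE WHEN b.other_column_b IS NULL
         THEN b_avg.avg_b
         ELSE b.other_column_b
    END AS c3                -- the "actual value" used downstream
FROM a
LEFT JOIN b
    ON a.material_id = b.material_id AND a.location_id = b.location_id
LEFT JOIN b_avg
    ON a.material_id = b_avg.material_id;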

SQL join problems - users betting on matches

I have the following table:
scores:
user_id | match_id | points
1       | 110      | 4
1       | 111      | 3
1       | 112      | 3
2       | 111      | 2
Users bet on matches, and depending on the result of the match they are awarded points. Depending on how accurate the bet was, you are awarded 0, 2, 3 or 4 points for a match.
Now I want to rank the users so that I can see who is in 1st, 2nd place, etc...
The ranking order is firstly by total_points. If these are equal, it's ordered by the number of times a user scored 4 points, then by the number of times a user scored 3 points, and so on.
For that I would need the following table:
user_id | total_points | #_of_fours | #_of_threes | #_of_twos
1       | 10           | 1          | 2           | 0
2       | 2            | 0          | 0           | 1
But I can't figure out the join statements that would help me get it.
This is as far as I get without help:
SELECT user_id, COUNT( points ) AS #_of_fours FROM scores WHERE points = 4 GROUP BY user_id
Which results in
user_id | #_of_fours
1       | 1
2       | 0
Now I would have to do that for #_of_threes and #_of_twos as well as total points and join it all together, but I can't figure out how.
BTW I'm using MySQL.
Any help would be really appreciated. Thanks in advance.
SELECT user_id
     , sum(points) as total_points
     , sum(case when points = 4 then 1 else 0 end) AS `#_of_fours`
     , sum(case when points = 3 then 1 else 0 end) AS `#_of_threes`
     , sum(case when points = 2 then 1 else 0 end) AS `#_of_twos`
FROM scores
GROUP BY
     user_id
(The backticks are needed because # starts a comment in MySQL.)
Using MySQL syntax, you can use SUM to count the matching rows easily:
SELECT
    user_id,
    SUM(points) AS total_points,
    SUM(points=4) AS no_of_fours,
    SUM(points=3) AS no_of_threes,
    SUM(points=2) AS no_of_twos
FROM scores
GROUP BY user_id;
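To get the actual ranking order described in the question, order the aggregate by the tiebreaker columns (a sketch building on the query above):
SELECT
    user_id,
    SUM(points) AS total_points,
    SUM(points=4) AS no_of_fours,
    SUM(points=3) AS no_of_threes,
    SUM(points=2) AS no_of_twos
FROM scores
GROUP BY user_id
ORDER BY total_points DESC, no_of_fours DESC, no_of_threes DESC, no_of_twos DESC;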