SQL: Merge localized version of a table to the main one

Imagine I have a main table like:
Table guys
|id| name|profession|
|--|------|----------|
| 1| John| developer|
| 2| Mike| boss|
| 3| Roger| fireman|
| 4| Bob| policeman|
I also have a localized version which is not complete (the boss is missing):
Table guys_bg
|id| name | profession|
|--|------|-----------|
| 1| Джон|разработчик|
| 3|Роджър| пожарникар|
| 4| Боб| полицай|
I want to prioritize guys_bg results while still showing all the guys (The boss is still a guy, right?).
This is the desired result:
|id| name | profession|
|--|------|-----------|
| 1| Джон|разработчик|
| 2| Mike| boss|
| 3|Роджър| пожарникар|
| 4| Боб| полицай|
Take into consideration that both tables may have a lot of (100+) columns so joining the tables and using CASE for every column will be very tedious.
What are my options?

Here is one way using union all:
select gb.*
from guys_bg gb
union all
select g.*
from guys g
where not exists (select 1 from guys_bg gb where gb.id = g.id);
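As a quick check of the UNION ALL approach, here is a minimal sketch that loads the sample rows into an in-memory SQLite database through Python's sqlite3 module. The column types and the trailing ORDER BY are assumptions added for the demo; the original query does not order its output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE guys    (id INTEGER PRIMARY KEY, name TEXT, profession TEXT);
    CREATE TABLE guys_bg (id INTEGER PRIMARY KEY, name TEXT, profession TEXT);
    INSERT INTO guys VALUES
        (1, 'John', 'developer'), (2, 'Mike', 'boss'),
        (3, 'Roger', 'fireman'),  (4, 'Bob', 'policeman');
    INSERT INTO guys_bg VALUES
        (1, 'Джон', 'разработчик'), (3, 'Роджър', 'пожарникар'),
        (4, 'Боб', 'полицай');
""")

# Localized rows first, then main rows that have no localized counterpart.
# ORDER BY 1 sorts the combined result by its first column (id).
merged = conn.execute("""
    SELECT gb.* FROM guys_bg gb
    UNION ALL
    SELECT g.* FROM guys g
    WHERE NOT EXISTS (SELECT 1 FROM guys_bg gb WHERE gb.id = g.id)
    ORDER BY 1
""").fetchall()

for row in merged:
    print(row)
```

Because both branches use `*`, no column has to be listed by hand, which is what makes this form attractive for wide tables. It does require both tables to have the same columns in the same order.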

You can also do this with a FULL JOIN (ISNULL is SQL Server syntax; COALESCE is the portable equivalent):
SELECT
ISNULL(b.id,g.id) id
, ISNULL(b.name, g.name) name
, ISNULL(b.profession, g.profession) profession
FROM
guys g
FULL JOIN guys_bg b ON g.id = b.id
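With 100+ columns, the per-column fallback list does not have to be typed by hand. It can be generated from the table's metadata. The sketch below does this for SQLite using `PRAGMA table_info`. It assumes every localized row has a matching row in the main table, so a LEFT JOIN is used instead of a FULL JOIN. It also assumes the column names come from a trusted schema, since they are interpolated into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE guys    (id INTEGER PRIMARY KEY, name TEXT, profession TEXT);
    CREATE TABLE guys_bg (id INTEGER PRIMARY KEY, name TEXT, profession TEXT);
    INSERT INTO guys VALUES
        (1, 'John', 'developer'), (2, 'Mike', 'boss'),
        (3, 'Roger', 'fireman'),  (4, 'Bob', 'policeman');
    INSERT INTO guys_bg VALUES
        (1, 'Джон', 'разработчик'), (3, 'Роджър', 'пожарникар'),
        (4, 'Боб', 'полицай');
""")

# PRAGMA table_info returns one row per column; index 1 holds the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(guys)")]

# Build: COALESCE(b."id", g."id") AS "id", COALESCE(b."name", g."name") AS "name", ...
select_list = ", ".join(f'COALESCE(b."{c}", g."{c}") AS "{c}"' for c in columns)
query = (
    f"SELECT {select_list} "
    "FROM guys g LEFT JOIN guys_bg b ON b.id = g.id "
    "ORDER BY g.id"
)

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Unlike the UNION ALL form, this falls back column by column, so a localized row with a NULL profession would still show the profession from the main table.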

Related

How to merge rows in hive?

I have a production table in Hive which receives incremental data (changed and new records) from an external source on a daily basis. The values for a row may be spread across different dates. For example, this is how the records in the table look on the first day:
+---+----+----+
| id|col1|col2|
+---+----+----+
| 1| a1| b1|
| 2| a2| |
| 3| | b3|
+---+----+----+
On the second day, we get the following:
+---+----+----+
| id|col1|col2|
+---+----+----+
| 4| a4| |
| 2| | b2 |
| 3| a3| |
+---+----+----+
It contains a new record as well as changed records.
The result I want is a merge of the rows based on the primary key (id in this case), producing this output:
+---+----+----+
| id|col1|col2|
+---+----+----+
| 1| a1| b1|
| 2| a2| b2 |
| 3| a3| b3|
| 4| a4| |
+---+----+----+
The number of columns is quite large, typically in the range of 100-150. The aim is to provide the latest full view of all the data received so far. How can I do this within Hive itself?
(PS: it doesn't have to be sorted.)
This can be achieved using COALESCE and a full outer join.
SELECT COALESCE(a.id ,b.id) as id ,
COALESCE(a.col1 ,b.col1) as col1,
COALESCE(a.col2 ,b.col2) as col2
FROM tbl1 a
FULL OUTER JOIN table2 b
on a.id =b.id
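To see the merge on the sample data, here is a small sketch run against SQLite rather than Hive. The table names `day1` and `day2` are hypothetical, and the blank cells are assumed to be NULL (COALESCE does not skip empty strings). The full outer join is emulated with a LEFT JOIN plus the unmatched new rows, so it also runs on older SQLite builds. The newer table is listed first inside COALESCE so that a changed value would win over the old one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE day1 (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT);
    CREATE TABLE day2 (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT);
    INSERT INTO day1 VALUES (1, 'a1', 'b1'), (2, 'a2', NULL), (3, NULL, 'b3');
    INSERT INTO day2 VALUES (4, 'a4', NULL), (2, NULL, 'b2'), (3, 'a3', NULL);
""")

merged = conn.execute("""
    -- Rows known on day 1, patched with any non-NULL day 2 values.
    SELECT d1.id,
           COALESCE(d2.col1, d1.col1) AS col1,
           COALESCE(d2.col2, d1.col2) AS col2
    FROM day1 d1
    LEFT JOIN day2 d2 ON d2.id = d1.id
    UNION ALL
    -- Rows that first appear on day 2.
    SELECT d2.id, d2.col1, d2.col2
    FROM day2 d2
    WHERE NOT EXISTS (SELECT 1 FROM day1 d1 WHERE d1.id = d2.id)
    ORDER BY 1
""").fetchall()

for row in merged:
    print(row)
```

Note that id 4 keeps a NULL in col2, because no value for it has arrived yet.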

Can't SUM DISTINCT values in Ruby on Rails

I have 4 tables: Users, Workouts, Exercises, and Results. A "User" posts "Results" for "Exercises" which are linked to a single "Workout". But when the user posts results, since there are multiple exercises, results for one workout can be linked with a unique "post_id". I would like to know how many total minutes a user exercised based on how many "post_ids" they provided which can be linked to the "Workouts" table where a "workout_duration" column shows how many minutes each workout lasts. Here is some sample data, where in this case the workout (workout_id=1) has two exercises and has a workout_duration of 1 minute.
Results:
user_id| workout_id| post_id| exercise_id| number_of_reps|
-------+-----------+--------+------------+---------------+
123| 1 | 1| 1 | 18|
123| 1 | 1| 2 | 29|
123| 1 | 2| 1 | 15|
123| 1 | 2| 2 | 30|
123| 1 | 3| 1 | 20|
123| 1 | 3| 2 | 28|
-------+-----------+--------+------------+---------------+
Workouts:
workout_id| workout_duration|
----------+-----------------+
1| 1|
I tried to retrieve the total number of minutes based on the query below, but it is returning a sum of 6 when I want it to return a value of 3...I think this is because the SUM is not taking into account DISTINCT post_ids...rather it is just summing all post_ids.
@user = User.find(current_user)
@total_minutes = @user.results.includes(:workout).select(:post_id).distinct.sum(:workout_duration)
I have searched high and low for solutions to no avail...any ideas?
EDIT:
Here is the generated SQL from the query above:
SELECT DISTINCT SUM(workout_duration)
FROM "results"
LEFT OUTER JOIN "workouts" ON "workouts"."id" = "results"."workout_id"
WHERE "results"."user_id" = ? [["user_id", 123]]
I solved this by using raw SQL:
@minute_total = ActiveRecord::Base.connection.execute(minute_query)[0][0]
private
def minute_query
"SELECT SUM(workout_duration)
FROM (SELECT DISTINCT(results.post_id), results.user_id, workouts.workout_duration
FROM results LEFT OUTER JOIN workouts ON results.workout_id = workouts.id
WHERE results.user_id = #{@user.id})"
end
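The difference between the two totals can be reproduced outside Rails. The sketch below loads the sample rows into an in-memory SQLite database from Python; the schema is an assumption based on the tables and generated SQL shown above (in particular, `workouts.id` as the join key). It uses a bound parameter for the user ID instead of string interpolation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE workouts (id INTEGER PRIMARY KEY, workout_duration INTEGER);
    CREATE TABLE results (
        user_id INTEGER, workout_id INTEGER, post_id INTEGER,
        exercise_id INTEGER, number_of_reps INTEGER
    );
    INSERT INTO workouts VALUES (1, 1);
    INSERT INTO results VALUES
        (123, 1, 1, 1, 18), (123, 1, 1, 2, 29),
        (123, 1, 2, 1, 15), (123, 1, 2, 2, 30),
        (123, 1, 3, 1, 20), (123, 1, 3, 2, 28);
""")

# Summing over the raw join counts the duration once per exercise row.
naive_total = conn.execute("""
    SELECT SUM(w.workout_duration)
    FROM results r
    LEFT JOIN workouts w ON w.id = r.workout_id
    WHERE r.user_id = ?
""", (123,)).fetchone()[0]

# Collapsing to one row per post first counts the duration once per post.
per_post_total = conn.execute("""
    SELECT SUM(workout_duration)
    FROM (
        SELECT DISTINCT r.post_id, w.workout_duration
        FROM results r
        LEFT JOIN workouts w ON w.id = r.workout_id
        WHERE r.user_id = ?
    ) AS posts
""", (123,)).fetchone()[0]

print(naive_total, per_post_total)
```

The first query sums six joined rows, while the second sums three distinct posts, which matches the 6 versus 3 described in the question.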

SQL query for finding the most frequent value of a grouped by value

I'm using an SQLite browser and I'm trying to write a query that finds, for each value of one column, the most frequent value of another column.
The table is called main:
| |Place |Value|
| 1| London| 101|
| 2| London| 20|
| 3| London| 101|
| 4| London| 20|
| 5| London| 20|
| 6| London| 20|
| 7| London| 20|
| 8| London| 20|
| 9| France| 30|
| 10| France| 30|
| 11| France| 30|
| 12| France| 30|
The result I'm looking for is the most frequent value for each place:
| |Place |Most Frequent Value|
| 1| London| 20|
| 2| France| 30|
Or even better
| |Place |Most Frequent Value|Largest Percentage|2nd Largest Percentage|
| 1| London| 20| 0.75| 0.25|
| 2| France| 30| 1| 0.75|
You can group by place and then value, and order by frequency, e.g.:
select place, value, count(value) as freq from main group by place, value order by place, freq;
This does not give exactly the answer you want, but it is close:
France | 30 | 4
London | 101 | 2
London | 20 | 6
Now select place and value from this intermediate result and group by place, so that only one row per place is returned.
select place, value from
(select place, value, count(value) as freq from main group by place, value order by place, freq)
group by place;
This produces a result like the following:
France | 30
London | 20
This works in SQLite, which allows non-aggregated columns in a grouped query, but it relies on which row SQLite happens to pick for each group, and that choice is not guaranteed. Other databases may reject the query or return the place and value with the lowest frequency; where the query is accepted, ordering by place, freq desc instead may solve the problem.
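SQLite does document one case where the row behind a non-aggregated column is well defined: when the query contains a single `MAX()` or `MIN()` aggregate, the other columns are taken from the row that holds that maximum or minimum. The following sketch, run from Python against an in-memory database, uses that SQLite-specific rule instead of relying on ordering. The table is created with a quoted name, `"main"`, only to avoid any confusion with SQLite's default schema name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "main" (place TEXT, value INTEGER)')
conn.executemany(
    'INSERT INTO "main" VALUES (?, ?)',
    [("London", 101), ("London", 20), ("London", 101), ("London", 20),
     ("London", 20), ("London", 20), ("London", 20), ("London", 20),
     ("France", 30), ("France", 30), ("France", 30), ("France", 30)],
)

# The inner query counts each (place, value) pair. In the outer query,
# MAX(freq) makes SQLite take `value` from the row with the highest count.
most_frequent = conn.execute("""
    SELECT place, value, MAX(freq) AS freq
    FROM (
        SELECT place, value, COUNT(*) AS freq
        FROM "main"
        GROUP BY place, value
    )
    GROUP BY place
    ORDER BY place
""").fetchall()

for row in most_frequent:
    print(row)
```

This is not portable SQL. On other engines, a join against the per-place maximum count or a window function is the safer route.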
The first part would be something like this.
http://sqlfiddle.com/#!7/ac182/8
with tbl1 as
(select a.place,a.value,count(a.value) as val_count
from table1 a
group by a.place,a.value
)
select t1.place,
t1.value as most_frequent_value
from tbl1 t1
inner join
(select place,max(val_count) as val_count from tbl1
group by place) t2
on t1.place=t2.place
and t1.val_count=t2.val_count
Here we derive tbl1, which gives the count of each place and value combination. We then join it with another derived table, t2, which finds the maximum count per place; the join returns the required result.
I am not sure how you want the percentages in the second output, but if you understand this query, you can build on it to derive the required output. Play around with the sqlfiddle. All the best.
RANK
SQLite now supports RANK, so we can use the exact same syntax that works on PostgreSQL, similar to https://stackoverflow.com/a/12448971/895245
SELECT "city", "value", "cnt"
FROM (
SELECT
"city",
"value",
COUNT(*) AS "cnt",
RANK() OVER (
PARTITION BY "city"
ORDER BY COUNT(*) DESC
) AS "rnk"
FROM "Sales"
GROUP BY "city", "value"
) AS "sub"
WHERE "rnk" = 1
ORDER BY
"city" ASC,
"value" ASC
This would return all in case of tie. To return just one you could use ROW_NUMBER instead of RANK.
Tested on SQLite 3.34.0 and PostgreSQL 14.3.
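The difference between RANK and ROW_NUMBER only shows up when two values are tied, so here is a small sketch with a deliberate tie. It assumes an SQLite build with window function support (3.25 or later) behind Python's sqlite3 module. The sample rows and the `value ASC` tie-breaker are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (city TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?)",
    [("London", 20), ("London", 20), ("London", 101), ("London", 101),
     ("France", 30)],
)

# {fn} and {order} are filled in with fixed strings below, never user input.
QUERY = """
    SELECT city, value, cnt
    FROM (
        SELECT city, value, COUNT(*) AS cnt,
               {fn}() OVER (PARTITION BY city ORDER BY {order}) AS rnk
        FROM Sales
        GROUP BY city, value
    ) AS sub
    WHERE rnk = 1
    ORDER BY city, value
"""

# RANK gives tied values the same rank, so both London values are returned.
with_ties = conn.execute(
    QUERY.format(fn="RANK", order="COUNT(*) DESC")
).fetchall()

# ROW_NUMBER numbers rows uniquely; the extra sort key picks the smaller value.
one_per_city = conn.execute(
    QUERY.format(fn="ROW_NUMBER", order="COUNT(*) DESC, value ASC")
).fetchall()

print(with_ties)
print(one_per_city)
```

Without an explicit tie-breaker in the ORDER BY, ROW_NUMBER would still return one row per city, but which of the tied values it keeps would be unspecified.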

Finding the intersection of tables that use a many-to-many relationship in SQL

I need to get an intersection of two tables that uses two many-to-many tables to relate each other. Example tables as follows:
**Discount**
|DisId| Discount|Amount|
+-----+---------+------+
|    1|   2% Off|  0.02|
|    2|  10% Off|  0.10|
|    3|  25% Off|  0.25|
|    4|  2 for 1|  0.50|
|    5|Clearance|  0.75|
**DiscountRef**
|DisId|RefType|RefId|IsActive|
+-----+-------+-----+--------+
|    1|Product| 9004|       0|
|    2|Product| 9002|       0|
|    2|   PCat| 3456|       1|
|    3|   PCat| 7346|       1|
|    3|   PCat| 4455|       1|
|    5|Product| 9004|       0|
**ProductCat**
|ProdId|CatId|
+------+-----+
|  9001| 3456|
|  9002| 3456|
|  9005| 3456|
|  9001| 7346|
|  9003| 7346|
|  9003| 4455|
|  9006| 4455|
**Product**
|ProdId|      ProdName|ProdPrice|
+------+--------------+---------+
|  9001|       9" Nail|     0.50|
|  9002|    2"x4" Stud|     2.50|
|  9003|   Claw Hammer|     5.99|
|  9004|     Wood Glue|     1.20|
|  9005|6'x4' Dry Wall|    10.39|
|  9006|   Screwdriver|     4.25|
With these tables I need to get the intersection of product categories if they're under the same discount ID. The table below is what I need to get:
|DisId|ProdId|DisPrice|
+-----+------+--------+
| 2| 9001| 0.45|
| 2| 9002| 2.25|
| 2| 9005| 9.36|
| 3| 9003| 4.50|
I have tried a few different ways but can't seem to get to that table. The SQL below returns the discounts that have more than one category applied to them.
SELECT DR.DisId, PC.CatId
FROM DiscountRef DR
INNER JOIN (
SELECT DisId
FROM DiscountRef
GROUP BY DisId
HAVING COUNT(DisId) > 1
) SDR ON SDR.DisId = DR.DisId
INNER JOIN ProductCat PC ON PC.CatId = DR.RefId AND DR.RefType = 'PCat'
GROUP BY DR.DisId, PC.CatId
Table Returned:
|DisId|CatId|
+-----+-----+
| 3| 7346|
| 3| 4455|
Then, intersecting the Product table on those product category IDs, I get the correct product IDs.
SELECT P1.ProdId
FROM Product P1
INNER JOIN ProductCat PC1 ON PC1.ProdId = P1.ProdId AND PC1.CatId = 7346
INTERSECT
SELECT P2.ProdId
FROM Product P2
INNER JOIN ProductCat PC2 ON PC2.ProdId = P2.ProdId AND PC2.CatId = 4455
Also, a discount can have more than two categories (which narrows down the number of products), and sometimes more than one discount is active (discount data is omitted here, but a check will be done).
Any help on how I can get my desired table above?
EDIT: If a DisId has multiple rows in the DiscountRef table and they are of the PCat type, the discount applies to the products shared by all of those categories, the way Claw Hammer is the only item that appears in both CatId 7346 and CatId 4455.
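One possible way to generalize the hard-coded INTERSECT to any number of categories is to count, per discount, how many of its categories each product belongs to and keep the products that match all of them. The sketch below tries this on the sample data in an in-memory SQLite database from Python. It is an illustration under assumptions, not a confirmed solution: it only considers `PCat` rows with `IsActive = 1`, assumes there are no duplicate rows in DiscountRef or ProductCat, and leaves the price calculation out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DiscountRef (DisId INTEGER, RefType TEXT, RefId INTEGER, IsActive INTEGER);
    CREATE TABLE ProductCat  (ProdId INTEGER, CatId INTEGER);
    INSERT INTO DiscountRef VALUES
        (1, 'Product', 9004, 0), (2, 'Product', 9002, 0), (2, 'PCat', 3456, 1),
        (3, 'PCat', 7346, 1),    (3, 'PCat', 4455, 1),    (5, 'Product', 9004, 0);
    INSERT INTO ProductCat VALUES
        (9001, 3456), (9002, 3456), (9005, 3456), (9001, 7346),
        (9003, 7346), (9003, 4455), (9006, 4455);
""")

discounted = conn.execute("""
    WITH cats AS (      -- active category references per discount
        SELECT DisId, RefId AS CatId
        FROM DiscountRef
        WHERE RefType = 'PCat' AND IsActive = 1
    ),
    needed AS (         -- how many categories each discount requires
        SELECT DisId, COUNT(*) AS n FROM cats GROUP BY DisId
    )
    SELECT c.DisId, pc.ProdId
    FROM cats c
    JOIN ProductCat pc ON pc.CatId = c.CatId
    JOIN needed nd ON nd.DisId = c.DisId
    GROUP BY c.DisId, pc.ProdId, nd.n
    HAVING COUNT(*) = nd.n   -- product appears in every required category
    ORDER BY c.DisId, pc.ProdId
""").fetchall()

for row in discounted:
    print(row)
```

On the sample data this returns the same DisId and ProdId pairs as the desired table. Joining the result to Discount and Product would then supply the discounted price.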

Retrieving rows that share multiple ID's in SQL

I am stuck on how to narrow down a selection of rows that are related by multiple ID's. Here is my problem with the data as follows:
|Widget   |
+---------+
|Id|Name  |
+--+------+
| 1|item01|
| 2|item02|
| 3|item03|
| 4|item04|
| 5|item05|
| 6|item06|
|Widget Category     |
+--------------------+
|WidId|CatId|CatName |
+-----+-----+--------+
|    1|    1|Windows |
|    2|    1|Windows |
|    3|    1|Windows |
|    1|    2|Door    |
|    5|    2|Door    |
|    6|    3|Trunk   |
|Part Category |
+--------------+
|PartId| CatId |
+------+-------+
|     1|      1|
|     1|      2|
|     2|      2|
|     4|      2|
|Part         |
+-------------+
|Id|Name      |
+--+----------+
| 1|Glass     |
| 2|Door Frame|
| 3|Wheel     |
| 4|Handle    |
One or more widgets can be in a widget category. Widget categories and part categories are many-to-many, as are parts and part categories. I need to know which parts are linked to which widgets. So we know that Item01 has the part "Glass" and Item05 has the parts "Glass, Door Frame, and Handle".
Here is the SQL I have so far, but I need it to be dynamic so it can run once a week in a stored procedure.
-- This gives me the correct number of widgets to parts based on a set of 2 category IDs, as a quick and static hack
SELECT W.Id
FROM Widget W
INNER JOIN dbo.[WidgetCategory] WC1 ON WC1.WidId = W.Id
INNER JOIN dbo.[WidgetCategory] WC2 ON WC2.WidId = W.Id
WHERE WC1.CatId = 1 AND WC2.CatId = 2
GROUP BY W.Id
The reason for the above query is to get a table structure that is grouped by PartId's to WidgetId's as an intersection of the two related categories and all the widgets that are related to parts. The below table is what I am trying to get so that I can aggregate how many widgets are in a part (COUNT(WidId) GROUP BY PartId):
|WidId|PartId|WidgetName|
+-----+------+----------+
| 1| 1| Item01|
| 2| 1| Item02|
| 3| 1| Item03|
| 1| 2| Item01|
| 5| 2| Item05|
Updated question: How can I get this response from the tables above with only returning the intersection of the two categories?
|WidId|PartId|WidgetName|
+-----+------+----------+
| 1| 1| Item01|
| 1| 2| Item01|
Any help would be greatly appreciated! Sorry for the sloppiness, had to post quickly before I left for weekend.
EDIT: Sorry, about the ProductId, was left over from some SQL that I was using. Should be Widget Id. Added more clarity to the problem and added an addition problem I was having.
I think you need a query like this.
SELECT DISTINCT w.Id AS WidId, p.Id AS PartId, w.Name AS WidgetName
FROM Widget w
JOIN WidgetCategory wc ON wc.WidId = w.Id
JOIN PartCategory pc ON pc.CatId = wc.CatId
JOIN Part p ON p.Id = pc.PartId
I don't see why you would need to join twice on the WidgetCategory table. What you need is to reach the Part table by joining the PartCategory table.
And why are you grouping? If you want all the parts, then you can't group, unless you use some specific SQL feature to concatenate all the parts in a single row. This may or may not be possible, depending on which database engine you are using.
I added the DISTINCT just in case there is more than one way to get from Widget X to Part Y; that is enough to remove duplicates. There is no need for a GROUP BY unless you need to COUNT or do something else with the aggregation.
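To make the join path concrete, here is a sketch that runs it on the sample data in an in-memory SQLite database from Python, followed by the per-part widget count mentioned in the question. Table names are written without spaces (`WidgetCategory`, `PartCategory`), and the key columns follow the sample tables (`Widget.Id`, `Part.Id`, `PartCategory.PartId`); these are assumptions about the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Widget         (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE WidgetCategory (WidId INTEGER, CatId INTEGER, CatName TEXT);
    CREATE TABLE PartCategory   (PartId INTEGER, CatId INTEGER);
    CREATE TABLE Part           (Id INTEGER PRIMARY KEY, Name TEXT);
    INSERT INTO Widget VALUES (1,'item01'),(2,'item02'),(3,'item03'),
                              (4,'item04'),(5,'item05'),(6,'item06');
    INSERT INTO WidgetCategory VALUES (1,1,'Windows'),(2,1,'Windows'),(3,1,'Windows'),
                                      (1,2,'Door'),(5,2,'Door'),(6,3,'Trunk');
    INSERT INTO PartCategory VALUES (1,1),(1,2),(2,2),(4,2);
    INSERT INTO Part VALUES (1,'Glass'),(2,'Door Frame'),(3,'Wheel'),(4,'Handle');
""")

# Widget -> WidgetCategory -> PartCategory -> Part; DISTINCT removes pairs
# that are reachable through more than one category.
pairs = conn.execute("""
    SELECT DISTINCT w.Id AS WidId, p.Id AS PartId, w.Name AS WidgetName
    FROM Widget w
    JOIN WidgetCategory wc ON wc.WidId = w.Id
    JOIN PartCategory pc ON pc.CatId = wc.CatId
    JOIN Part p ON p.Id = pc.PartId
    ORDER BY w.Id, p.Id
""").fetchall()

# Number of distinct widgets linked to each part.
widgets_per_part = conn.execute("""
    SELECT pc.PartId, COUNT(DISTINCT wc.WidId) AS widget_count
    FROM WidgetCategory wc
    JOIN PartCategory pc ON pc.CatId = wc.CatId
    GROUP BY pc.PartId
    ORDER BY pc.PartId
""").fetchall()

print(pairs)
print(widgets_per_part)
```

Widget 1 reaches part 1 through both categories, which is the duplicate that DISTINCT (or `COUNT(DISTINCT ...)` in the aggregate) removes.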