SQL - Pivot or Unpivot? - sql

Another time, another problem. I have the following table:
|assemb.|Repl_1|Repl_2|Repl_3|Repl_4|Repl_5|Amount_1|Amount_2|Amount_3|Amount_4|Amount_5|
|---------------------------------------------------------------------------------------|
|4711001|111000|222000|333000|444000|555000| 1| 1| 1| 1| 1|
|---------------------------------------------------------------------------------------|
|4711002|222000|333000|444000|555000|666000| 1| 1| 1| 1| 1|
|---------------------------------------------------------------------------------------|
And here is what I need:
|Article|Amount|
|--------------|
| 111000| 1|
|--------------|
| 222000| 2|
|--------------|
| 333000| 2|
|--------------|
| 444000| 2|
|--------------|
| 555000| 2|
|--------------|
| 666000| 1|
|--------------|
Repl_1 to Repl_10 are the replacement articles of the assembly. I can have n assemblies, each with up to 10 replacement articles. In the end I need an overview of all articles with their total amounts across all assemblies.
THX.
Best greetz
Vegeta

This is probably the quickest way of achieving it, using UNION ALL. However, I'd recommend normalising your table.
SELECT Article, SUM(Amount) AS Amount FROM (
SELECT Repl_1 AS Article, SUM(Amount_1) AS Amount FROM #Test GROUP BY Repl_1
UNION ALL
SELECT Repl_2 AS Article, SUM(Amount_2) AS Amount FROM #Test GROUP BY Repl_2
UNION ALL
SELECT Repl_3 AS Article, SUM(Amount_3) AS Amount FROM #Test GROUP BY Repl_3
UNION ALL
SELECT Repl_4 AS Article, SUM(Amount_4) AS Amount FROM #Test GROUP BY Repl_4
UNION ALL
SELECT Repl_5 AS Article, SUM(Amount_5) AS Amount FROM #Test GROUP BY Repl_5
) tbl GROUP BY Article
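For anyone who wants to try the UNION ALL approach without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. The table name Test (instead of the temp table #Test), the column name assembly, and the generated branch list are assumptions made for the illustration; the two data rows come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for #Test, limited to the five Repl/Amount pairs shown.
conn.execute("""
    CREATE TABLE Test (
        assembly INTEGER,
        Repl_1 INTEGER, Repl_2 INTEGER, Repl_3 INTEGER, Repl_4 INTEGER, Repl_5 INTEGER,
        Amount_1 INTEGER, Amount_2 INTEGER, Amount_3 INTEGER, Amount_4 INTEGER, Amount_5 INTEGER
    )
""")
conn.executemany(
    "INSERT INTO Test VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (4711001, 111000, 222000, 333000, 444000, 555000, 1, 1, 1, 1, 1),
        (4711002, 222000, 333000, 444000, 555000, 666000, 1, 1, 1, 1, 1),
    ],
)

# One SELECT per Repl_n/Amount_n pair turns the wide row into (Article, Amount) rows.
branches = " UNION ALL ".join(
    f"SELECT Repl_{i} AS Article, Amount_{i} AS Amount FROM Test" for i in range(1, 6)
)
query = f"""
    SELECT Article, SUM(Amount) AS Amount
    FROM ({branches}) AS tbl
    WHERE Article IS NOT NULL      -- skip unused replacement slots
    GROUP BY Article
    ORDER BY Article
"""
totals = conn.execute(query).fetchall()
for article, amount in totals:
    print(article, amount)
```

Generating the branches in a loop is only a convenience; for 10 replacement slots you would change range(1, 6) to range(1, 11).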

Related

Number of foods that scored "true" in being good, grouped by culture SQL

Okay, so I've been driving myself crazy trying to get this to display in SQL. I have a table that stores types of food, the culture they come from, a score, and a boolean value about whether or not they are good. I want to display a record of how many "goods" each culture racks up. Here's the table (don't ask about the database name):
So I've tried:
SELECT count(good = 1), culture FROM animals_db.foods group by culture;
Or
SELECT count(good = true), culture FROM animals_db.foods group by culture;
But it doesn't present the correct results, it seems to include anything that has any "good" value (1 or 0) at all.
How do I get the data I want?
Instead of count, use sum:
SELECT sum(good), culture FROM animals_db.foods group by culture; -- assumes good is an integer column holding 1 for good and 0 otherwise
Another way is to use count with a CASE expression:
select count( case when good=1 then 1 end) , culture from animals_db.foods group by culture;
If the purpose is to count the number of good=1 for each culture, this works:
select culture,
count(*)
from foods
where good=1
group by 1
order by 1;
Result:
culture |count(*)|
--------+--------+
| 1|
American| 1|
Chinese | 1|
European| 1|
Italian | 2|
The reason your first query doesn't return the result can be explained as below:
select culture,
good=1 as is_good
from foods
order by 1;
You actually get:
culture |is_good|
--------+-------+
| 1|
American| 0|
American| 1|
Chinese | 1|
European| 1|
French | 0|
French | 0|
German | 0|
Italian | 1|
Italian | 1|
After applying group by culture and count(good=1), you're actually counting the rows where the expression good=1 is NOT NULL, and that expression is non-NULL whether it evaluates to true or to false. For example:
select culture,
count(good=0) as c0,
count(good=1) as c1,
count(good=2) as c2,
count(good) as c3,
count(null) as c4
from foods
group by culture
order by culture;
Outcome:
culture |c0|c1|c2|c3|c4|
--------+--+--+--+--+--+
| 1| 1| 1| 1| 0|
American| 2| 2| 2| 2| 0|
Chinese | 1| 1| 1| 1| 0|
European| 1| 1| 1| 1| 0|
French | 2| 2| 2| 2| 0|
German | 1| 1| 1| 1| 0|
Italian | 2| 2| 2| 2| 0|
Update: Your question is similar to "Is it possible to specify condition in Count()?".
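To see the difference side by side, here is a small sketch using Python's sqlite3 module. The foods rows below are made up for the illustration (the original table was not included in the question), and the column aliases are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foods (name TEXT, culture TEXT, good INTEGER)")
# Hypothetical sample rows: good is 1 for good, 0 otherwise.
conn.executemany(
    "INSERT INTO foods VALUES (?, ?, ?)",
    [
        ("pizza", "Italian", 1),
        ("pasta", "Italian", 1),
        ("snails", "French", 0),
        ("frog legs", "French", 0),
        ("burger", "American", 1),
        ("corn dog", "American", 0),
    ],
)

rows = conn.execute("""
    SELECT culture,
           COUNT(good = 1)                      AS count_expr,  -- counts every non-NULL result
           SUM(good)                            AS sum_good,    -- adds up the 1s
           COUNT(CASE WHEN good = 1 THEN 1 END) AS count_case   -- NULL when not good, so skipped
    FROM foods
    GROUP BY culture
    ORDER BY culture
""").fetchall()

for row in rows:
    print(row)
```

With this sample, count_expr is 2 for every culture, while sum_good and count_case agree on the number of good foods (1 for American, 0 for French, 2 for Italian).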

How to use window function in Redshift?

I have 2 tables:
Product
| product_id |
| source_id |
Source
| source_id |
| priority |
Sometimes one product_id has several sources, and my task is to select the row with the minimum priority for each product. For example, given:
| product_id | source_id| priority|
|:----: |:------:| :-----:|
| 10| 2| 9|
| 10| 4| 2|
| 20| 2| 9|
| 20| 4| 2|
| 30| 2| 9|
| 30| 4| 2|
the correct result should look like:
| product_id | source_id| priority|
|:----: |:------:| :-----:|
| 10| 4| 2|
| 20| 4| 2|
| 30| 4| 2|
I am using query:
SELECT p.product_id, p.source_id, s.priority FROM Product p
INNER JOIN Source s on s.source_id = p.source_id
WHERE s.priority = (SELECT Min(s1.priority) OVER (PARTITION BY p.product_id) FROM Source s1)
but it returns the error "this type of correlated subquery pattern is not supported yet". As I understand it, I can't use this pattern in Redshift. How should it be solved? Are there any other ways?
You just need to unroll the where clause into a derived table, and the easiest way to flag the min priority is the ROW_NUMBER() window function. As written, you're asking Redshift to rerun the window function for each row it tests, which creates a lot of inefficiency in a clustered database. Try the following (untested):
SELECT product_id, source_id, priority
FROM (
SELECT p.product_id,
p.source_id,
s.priority,
ROW_NUMBER() OVER (PARTITION BY p.product_id ORDER BY s.priority) AS row_num
FROM Product p
INNER JOIN Source s
ON s.source_id = p.source_id) ranked
WHERE row_num = 1
Now the window function only runs once. You can also move the subquery to a CTE if that improves readability for your full case.
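I already found the best solution for my case. The window function result can't be referenced in the WHERE clause of the same query level, so it goes into a derived table:
SELECT product_id, source_id, priority
FROM (
SELECT
p.product_id
, p.source_id
, s.priority
, Min(s.priority) OVER (PARTITION BY p.product_id) as min_priority
FROM Product p
INNER JOIN Source s
ON s.source_id = p.source_id
) t
WHERE priority = min_priority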
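As a quick sanity check of the "rank inside a derived table, filter outside" pattern, here is a sketch that runs against SQLite through Python's sqlite3 module. It is a stand-in only: it says nothing about Redshift performance, it assumes an SQLite build with window functions (3.25 or newer), and the alias ranked is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Source (source_id INTEGER PRIMARY KEY, priority INTEGER);
    CREATE TABLE Product (product_id INTEGER, source_id INTEGER);
    INSERT INTO Source VALUES (2, 9), (4, 2);
    INSERT INTO Product VALUES (10, 2), (10, 4), (20, 2), (20, 4), (30, 2), (30, 4);
""")

best = conn.execute("""
    SELECT product_id, source_id, priority
    FROM (
        SELECT p.product_id, p.source_id, s.priority,
               -- number the sources of each product, lowest priority value first
               ROW_NUMBER() OVER (
                   PARTITION BY p.product_id
                   ORDER BY s.priority
               ) AS row_num
        FROM Product p
        INNER JOIN Source s ON s.source_id = p.source_id
    ) AS ranked
    WHERE row_num = 1
    ORDER BY product_id
""").fetchall()

for row in best:
    print(row)
```

On the sample data from the question this keeps source 4 (priority 2) for each of the products 10, 20 and 30. If two sources of a product share the minimum priority, ROW_NUMBER keeps only one of them, whereas comparing against MIN(priority) keeps both.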

SQL query for finding the most frequent value of a grouped by value

I'm using SQLite browser, and I'm trying to write a query that finds the most frequent value in one column for each group defined by another column.
The table is called main:
| |Place |Value|
| 1| London| 101|
| 2| London| 20|
| 3| London| 101|
| 4| London| 20|
| 5| London| 20|
| 6| London| 20|
| 7| London| 20|
| 8| London| 20|
| 9| France| 30|
| 10| France| 30|
| 11| France| 30|
| 12| France| 30|
The result I'm looking for is the most frequent value, grouped by place:
| |Place |Most Frequent Value|
| 1| London| 20|
| 2| France| 30|
Or even better
| |Place |Most Frequent Value|Largest Percentage|2nd Largest Percentage|
| 1| London| 20| 0.75| 0.25|
| 2| France| 30| 1| 0.75|
You can group by place, then value, and order by frequency, e.g.:
select place,value,count(value) as freq from cars group by place,value order by place, freq;
This will not give exactly the answer you want, but something close to it:
France | 30 | 4
London | 101 | 2
London | 20 | 6
Now select place and value from this intermediate table and group by place, so that only one row per place is displayed.
select place,value from
(select place,value,count(value) as freq from cars group by place,value order by place, freq)
group by place;
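This will produce a result like the following:
France | 30
London | 20
This works for SQLite, which allows bare columns in an aggregate query, but the SQLite documentation only promises a value from an arbitrary row of each group, so the result is not guaranteed. Other databases may reject the query or return the place and value with the lowest frequency. In those, you can put order by place, freq desc instead to solve your problem.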
The first part would be something like this.
http://sqlfiddle.com/#!7/ac182/8
with tbl1 as
(select a.place,a.value,count(a.value) as val_count
from table1 a
group by a.place,a.value
)
select t1.place,
t1.value as most_frequent_value
from tbl1 t1
inner join
(select place,max(val_count) as val_count from tbl1
group by place) t2
on t1.place=t2.place
and t1.val_count=t2.val_count
Here we derive tbl1, which gives us the count of each place and value combination. We then join it with another derived table, t2, which finds the max count per place, to get the required result.
I am not sure how you want the percentages in the second output computed, but if you understand this query, you can add some logic on top of it to derive the required output. Play around with the sqlfiddle. All the best.
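One possible reading of the percentage columns is "share of the rows for that place". Under that assumption, a sketch with Python's sqlite3 module could look like this; table1 is the table name used in the query above, share is a hypothetical column name, and the window function requires SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (place TEXT, value INTEGER)")
# Data from the question: London has 101 twice and 20 six times, France has 30 four times.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [("London", 101)] * 2 + [("London", 20)] * 6 + [("France", 30)] * 4,
)

shares = conn.execute("""
    WITH counts AS (
        SELECT place, value, COUNT(*) AS freq
        FROM table1
        GROUP BY place, value
    )
    SELECT place, value, freq,
           -- fraction of this place's rows that carry this value
           1.0 * freq / SUM(freq) OVER (PARTITION BY place) AS share
    FROM counts
    ORDER BY place, freq DESC
""").fetchall()

for row in shares:
    print(row)
```

The first row per place is the most frequent value with its share, and the second row (if any) gives the second largest share.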
RANK
SQLite now supports RANK, so we can use the exact same syntax that works on PostgreSQL, similar to https://stackoverflow.com/a/12448971/895245
SELECT "city", "value", "cnt"
FROM (
SELECT
"city",
"value",
COUNT(*) AS "cnt",
RANK() OVER (
PARTITION BY "city"
ORDER BY COUNT(*) DESC
) AS "rnk"
FROM "Sales"
GROUP BY "city", "value"
) AS "sub"
WHERE "rnk" = 1
ORDER BY
"city" ASC,
"value" ASC
This returns all tied values in case of a tie. To return just one per group, you could use ROW_NUMBER instead of RANK.
Tested on SQLite 3.34.0 and PostgreSQL 14.3.
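The tie behaviour is easy to check with a small sketch in Python's sqlite3 module. The sample rows are invented so that London has a tie, and the helper top_values and its parameters are hypothetical names for this illustration. The extra "value" ASC sort key for ROW_NUMBER is an added assumption so that the surviving row is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Sales" ("city" TEXT, "value" INTEGER)')
# Hypothetical data: London has a tie between 20 and 101 (two rows each).
conn.executemany(
    'INSERT INTO "Sales" VALUES (?, ?)',
    [("London", 20)] * 2 + [("London", 101)] * 2 + [("France", 30)] * 4,
)

def top_values(window_fn, order_by):
    # Both arguments are pasted into the SQL text, so pass only trusted strings.
    return conn.execute(f"""
        SELECT "city", "value", "cnt"
        FROM (
            SELECT "city", "value", COUNT(*) AS "cnt",
                   {window_fn}() OVER (
                       PARTITION BY "city"
                       ORDER BY {order_by}
                   ) AS "rnk"
            FROM "Sales"
            GROUP BY "city", "value"
        ) AS "sub"
        WHERE "rnk" = 1
        ORDER BY "city", "value"
    """).fetchall()

rank_rows = top_values("RANK", "COUNT(*) DESC")
row_number_rows = top_values("ROW_NUMBER", 'COUNT(*) DESC, "value" ASC')

print(rank_rows)        # both tied London values are kept
print(row_number_rows)  # one row per city
```

Note that adding the tiebreaker column to the RANK ordering as well would break the tie there too, so it is only passed to ROW_NUMBER.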

(SQL) Assembling multiple tables of deltas into a single populated view?

Let's say I have 3 different tables: Foo, Bar, and Baz. Each table has the same structure: a timestamp and a data value. We can also assume that the tables are synchronized at the top row.
Foo Bar Baz
________________ ________________ _________________
|Time |Value| |Time |Value| |Time |Value |
|1:00 |0 | |1:00 |10 | |1:00 |100 |
|1:15 |1 | |1:10 |11 | |1:20 |101 |
|1:30 |2 | |1:40 |12 | |1:50 |102 |
|1:45 |3 | |1:50 |13 | |1:55 |103 |
Is there a simple way to assemble these records into a single view, where each column shows the last known value from its table for the times that table does not provide?
________________________________________
|Time |Foo.Value|Bar.Value|Baz.Value|
|1:00 | 0| 10| 100|
|1:10 | 0| 11| 100|
|1:15 | 1| 11| 100|
|1:20 | 1| 11| 101|
|1:30 | 2| 11| 101|
|1:40 | 2| 12| 101|
|1:45 | 3| 12| 101|
|1:50 | 3| 13| 102|
|1:55 | 3| 13| 103|
Edit:
What if I wanted to select a time range, but wished to have the last known value of each column brought forward? Is there a simple way to do so without producing the entire table then filtering it down?
e.g. if I wanted records from 1:17 to 1:48, I would want the following...
________________________________________
|Time |Foo.Value|Bar.Value|Baz.Value|
|1:20 | 1| 11| 101|
|1:30 | 2| 11| 101|
|1:40 | 2| 12| 101|
|1:45 | 3| 12| 101|
SQL Server 2008 doesn't support lag(), much less lag() with ignore nulls. So, I think the easiest way may be with correlated subqueries. Get all the times from the three tables and then populate the values:
select fbb.time,
(select top 1 value from foo t where t.time <= fbb.time order by t.time desc
) as foo,
(select top 1 value from bar t where t.time <= fbb.time order by t.time desc
) as bar,
(select top 1 value from baz t where t.time <= fbb.time order by t.time desc
) as baz
from (select time from foo union
select time from bar union
select time from baz
) fbb;
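The same correlated-subquery idea can be tried out in SQLite through Python's sqlite3 module, including the time-range variant from the question's edit. This is only a sketch under a few assumptions: SQLite has no TOP, so LIMIT 1 is used instead; times are stored as zero-padded text so that string comparison orders them correctly; and the helper last_known is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (time TEXT, value INTEGER);
    CREATE TABLE bar (time TEXT, value INTEGER);
    CREATE TABLE baz (time TEXT, value INTEGER);
    INSERT INTO foo VALUES ('01:00', 0), ('01:15', 1), ('01:30', 2), ('01:45', 3);
    INSERT INTO bar VALUES ('01:00', 10), ('01:10', 11), ('01:40', 12), ('01:50', 13);
    INSERT INTO baz VALUES ('01:00', 100), ('01:20', 101), ('01:50', 102), ('01:55', 103);
""")

def last_known(table):
    # Correlated subquery: newest row in `table` at or before the outer row's time.
    return (f"(SELECT value FROM {table} t WHERE t.time <= fbb.time "
            f"ORDER BY t.time DESC LIMIT 1)")

query = f"""
    SELECT fbb.time,
           {last_known('foo')} AS foo,
           {last_known('bar')} AS bar,
           {last_known('baz')} AS baz
    FROM (SELECT time FROM foo UNION
          SELECT time FROM bar UNION
          SELECT time FROM baz) fbb
    WHERE fbb.time BETWEEN ? AND ?   -- filter the time axis, not the lookups
    ORDER BY fbb.time
"""
window = conn.execute(query, ("01:17", "01:48")).fetchall()
for row in window:
    print(row)
```

Because the range filter is applied to the combined list of times while each lookup still searches its whole table, values recorded before the start of the range are carried into it.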
EDIT:
An alternative approach uses aggregation:
select time, max(foo) as foo, max(bar) as bar, max(baz) as baz
from (select time, value as foo, NULL as bar, NULL as baz from foo union all
select time, NULL, value, NULL from bar union all
select time, NULL, NULL, value from baz
) fbb
group by time
order by time;
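This probably has better performance than the first method, but note that it leaves NULL wherever a table has no row for that exact time; it does not carry the last known value forward.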
Here is another alternative solution, since you are using SQL Server 2008:
SELECT *
FROM (
SELECT t, [time], value
FROM ( SELECT 'Foo' as t, *
FROM #Foo
UNION
SELECT 'Bar' as t, *
FROM #Bar
UNION
SELECT 'Baz' as t, *
FROM #Baz
) un
WHERE [time] BETWEEN '1:17' AND '1:48'
) AS fbb
PIVOT (MAX(value) FOR fbb.[t] IN (Foo, Bar, Baz)) pvt

SQL needed to get most popular product based also off a quantity

I'm currently trying to get the most popular productID from my MSSQL Database. This is what the table looks like (With a bit of dummy data):
OrderItems:
+--+-------+--------+---------+
|ID|OrderID|Quantity|ProductID|
+--+-------+--------+---------+
| 1| 1| 1| 1|
| 2| 1| 1| 2|
| 3| 2| 1| 1|
| 4| 2| 50| 2|
The OrderID field can be ignored, but I need to find the most popular ProductIDs from this table, ordered by their total quantity. The result set should look something like this:
+---------+
|ProductID|
+---------+
| 2|
| 1|
As ProductID 2 has a total quantity of 51, it needs to come out first, followed by ProductID 1 which only has a total quantity of 2.
(Note: Query needs to be compatible back to MSSQL-2008)
SELECT
productID
FROM
yourTable
GROUP BY
productID
ORDER BY
SUM(Quantity) DESC
GROUP BY allows SUM(), but you don't have to use it in the SELECT to be allowed to use it in the ORDER BY.
select ProductID
from OrderItems
group by ProductId
order by sum(Quantity) desc;
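A quick way to check the ordering is to run the same query against the dummy data from the question, here with Python's sqlite3 module as a stand-in for MSSQL. The extra TotalQuantity column is an assumption added only to make the ranking visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE OrderItems (
        ID INTEGER PRIMARY KEY,
        OrderID INTEGER,
        Quantity INTEGER,
        ProductID INTEGER
    )
""")
conn.executemany(
    "INSERT INTO OrderItems VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 1), (2, 1, 1, 2), (3, 2, 1, 1), (4, 2, 50, 2)],
)

ranking = conn.execute("""
    SELECT ProductID, SUM(Quantity) AS TotalQuantity
    FROM OrderItems
    GROUP BY ProductID
    ORDER BY TotalQuantity DESC   -- most popular product first
""").fetchall()

for product_id, total in ranking:
    print(product_id, total)
```

ProductID 2 comes first with a total of 51, followed by ProductID 1 with a total of 2.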