Convert column values to columns - sql

Currently the data looks like this:
I am trying to convert it to this:
The best approach I could come up with was a SQL pivot, but that groups by Product ID and returns only one of the three rows for product 330 shown above. I cannot think of another way to approach this. If anyone can think of a way to solve it, could you please share your thoughts?

You can use conditional aggregation:
select productid,
max(case when description = 'Part No' then unitdesc end) as partno,
. . . -- and so on for the other columns
from t
group by productid;
EDIT:
I see, you have multiple rows per product. That is a problem, because SQL tables represent unordered sets: there is no ordering unless a column specifies it, and no such column is obvious in your data.
So, the following will create single rows, but not necessarily combined as you would like:
select productid,
max(case when description = 'Part No' then unitdesc end) as partno,
. . . -- and so on for the other columns
from (select t.*, row_number() over (partition by productid, description order by productid) as seqnum
from t
) t
group by productid, seqnum;
If you have a column that does capture the ordering of the rows, then use that column in the order by.
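As a rough illustration of the point about ordering, here is a minimal sketch that runs the same conditional aggregation against an in-memory SQLite database from Python. The column `line_no` is a hypothetical ordering column that does not appear in the question, the sample values are invented, and only two of the attributes are pivoted. It assumes a SQLite build with window functions (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# line_no is a hypothetical column that records the original row order.
conn.execute(
    "create table t (productid int, description text, unitdesc text, line_no int)"
)
conn.executemany(
    "insert into t values (?, ?, ?, ?)",
    [
        (330, "Part No", "A-1", 1),
        (330, "Price Now", "10.00", 2),
        (330, "Part No", "A-2", 3),
        (330, "Price Now", "12.50", 4),
        (331, "Part No", "B-1", 5),
        (331, "Price Now", "7.25", 6),
    ],
)

pivot_sql = """
select productid,
       max(case when description = 'Part No' then unitdesc end) as partno,
       max(case when description = 'Price Now' then unitdesc end) as pricenow
from (select t.*,
             -- number the repeats of each attribute within a product
             row_number() over (partition by productid, description
                                order by line_no) as seqnum
      from t) s
group by productid, seqnum
order by productid, seqnum
"""
rows = conn.execute(pivot_sql).fetchall()
for row in rows:
    print(row)
```

Because `seqnum` is derived from `line_no`, the first 'Part No' of a product is paired with its first 'Price Now', and so on. Without such a column the pairing would be arbitrary.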

You can use LEFT JOIN to retrieve the corresponding values:
select
p.product_id,
n.unit_desc as part_no,
d.unit_desc as description,
pn.unit_desc as price_now,
u.unit_desc as unit
from (select distinct product_id from t) p
left join (select product_id, unit_desc from t where description = 'Part No') n
on n.product_id = p.product_id
left join (select product_id, unit_desc from t where description = 'Description') d
on d.product_id = p.product_id
left join (select product_id, unit_desc from t where description = 'Price Now') pn
on pn.product_id = p.product_id
left join (select product_id, unit_desc from t where description = 'Unit') u
on u.product_id = p.product_id


Count values from a concatenated column?

I'm running this query, but I'm getting an error when I add count:
SELECT
CONCAT (product_code, product_color) AS new_product_code
FROM [dbo].[Furniture]
where product = 'couch'
I would like to add another column and be able to count how many times a product was purchased according to its color. But I want to keep the product_code and product_color concatenated in a column. Any suggestions?
Thanks
I suspect that when you added count(), you didn't have the GROUP BY.
The traditional approach:
Select Prod_Color = CONCAT(product_code, product_color )
,count(*)
From [dbo].[Furniture]
Where product = 'couch'
Group By CONCAT(product_code, product_color )
Or using a CROSS APPLY for a single expression
Select Prod_Color
,count(*)
From [dbo].[Furniture]
Cross Apply ( values ( CONCAT(product_code, product_color ) ) ) V(Prod_Color)
Where product = 'couch'
Group By Prod_Color
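To see the GROUP BY version in action, here is a small sketch against an in-memory SQLite database. The sample rows are invented, the table is named `furniture` without the `dbo` schema, and SQLite's `||` operator stands in for `CONCAT`, so treat it as an illustration rather than SQL Server syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table furniture (product text, product_code text, product_color text)"
)
conn.executemany(
    "insert into furniture values (?, ?, ?)",
    [
        ("couch", "C100", "RED"),
        ("couch", "C100", "RED"),
        ("couch", "C100", "BLUE"),
        ("chair", "H200", "RED"),
    ],
)

count_sql = """
select product_code || product_color as new_product_code,
       count(*) as purchases
from furniture
where product = 'couch'
-- group by the same expression that is selected
group by product_code || product_color
order by new_product_code
"""
counts = conn.execute(count_sql).fetchall()
for code, purchases in counts:
    print(code, purchases)
```

The key point is that the grouped expression and the selected expression are the same, so each concatenated code gets exactly one row with its count.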

How to write SQL query without join?

Recently, during an interview, I was asked a question. Suppose I have a table like the one below:
The requirement is to report how many orders and how many shipments there are per day (based on the date columns). The output needs to look like this:
I wrote the following code, but the interviewer asked me to write a SQL query that achieves the same output without JOIN or UNION.
SELECT
COALESCE(a.order_date, b.ship_date), orders, shipments
FROM
(SELECT
order_date, COUNT(1) AS orders
FROM
table
GROUP BY 1) a
FULL JOIN
(SELECT
ship_date, COUNT(1) AS shipments
FROM
table
GROUP BY 1) b ON a.order_date = b.ship_date
Is this possible? Could you please advise?
You can use UNION ALL and GROUP BY with conditional aggregation as follows:
SELECT DATE_,
COUNT(CASE WHEN FLAG = 'ORDER' THEN 1 END) AS ORDERS,
COUNT(CASE WHEN FLAG = 'SHIP' THEN 1 END) AS SHIPMENTS
FROM (SELECT ORDER_DATE AS DATE_, 'ORDER' AS FLAG FROM YOUR_TABLE
UNION ALL
SELECT SHIP_DATE AS DATE_, 'SHIP' AS FLAG FROM YOUR_TABLE) T
GROUP BY DATE_;
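A quick way to check the unpivot-then-count idea is to run it on a few rows. The sketch below uses an in-memory SQLite database. The `order_id` column and the sample dates are assumptions made for illustration, since the original table is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# order_id and the sample dates are invented for this sketch.
conn.execute(
    "create table your_table (order_id int, order_date text, ship_date text)"
)
conn.executemany(
    "insert into your_table values (?, ?, ?)",
    [
        (1, "2021-01-01", "2021-01-02"),
        (2, "2021-01-01", "2021-01-03"),
        (3, "2021-01-02", "2021-01-03"),
    ],
)

summary_sql = """
select date_,
       count(case when flag = 'ORDER' then 1 end) as orders,
       count(case when flag = 'SHIP' then 1 end) as shipments
from (select order_date as date_, 'ORDER' as flag from your_table
      union all
      select ship_date as date_, 'SHIP' as flag from your_table) t
group by date_
order by date_
"""
summary = conn.execute(summary_sql).fetchall()
for day, orders, shipments in summary:
    print(day, orders, shipments)
```

Each source row contributes one 'ORDER' row and one 'SHIP' row, so a date that only has shipments still appears with an order count of zero.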
In BigQuery, I would express this as:
select date, countif(n = 0) as orders, countif(n = 1) as numships
from t cross join
unnest(array[order_date, ship_date]) date with offset n
group by 1
order by date;
The advantage of this approach (over union all) is two-fold. First, it only scans the table once. More importantly, the unnest() is all on the same node where the data resides -- so data does not need to be moved for the unpivot.

How do we find frequency of one column based off two other columns in SQL?

I'm relatively new to working with SQL and wasn't able to find any past threads to solve my question. I have three columns in a table, columns being name, customer, and location. I'd like to add an additional column determining which location is most frequent, based off name and customer (first two columns).
I have included a photo of an example where, for name Jane and customer BEC, my created column would be "Texas", as that has two occurrences as opposed to one for California. Would there be any way to implement this?
If you want 'Texas' on all four rows:
select t.Name, t.Customer, t.Location,
(select t2.location
from table1 t2
where t2.name = t.name
group by name, location
order by count(*) desc
fetch first 1 row only
) as most_frequent_location
from table1 t ;
You can also do this with analytic functions:
select t.Name, t.Customer, t.Location,
max(location) keep (dense_rank first order by location_count desc) over (partition by name) most_frequent_location
from (select t.*,
count(*) over (partition by name, customer, location) as location_count
from table1 t
) t;
Here is a db<>fiddle.
Both of these versions put 'Texas' in all four rows. However, each can be tweaked with minimal effort to put 'California' in the row for ARC.
In Oracle, you can use the aggregate function stats_mode() to compute the most frequently occurring value in a group.
Unfortunately it is not implemented as a window function. So one option uses an aggregate subquery, and then a join with the original table:
select t.*, s.top_location
from mytable t
inner join (
select name, customer, stats_mode(location) top_location
from mytable
group by name, customer
) s on s.name = t.name and s.customer = t.customer
You could also use a correlated subquery:
select
t.*,
(
select stats_mode(t1.location)
from mytable t1
where t1.name = t.name and t1.customer = t.customer
) top_location
from mytable t
This is more a question about understanding the concepts of a relational database. If you want that information, you would not put it in an additional column. It is derived data, calculated from other rows, so why would you store it in the table itself? Keeping it current is complex to code, and it would also be very expensive for the database (imagine all the rows for which that value has to be recalculated if someone inserted a million rows).
Instead you can do one of the following:
- Calculate it at runtime, as shown in the other answers.
- If you want to make it more persistent, you could embed that query in a view.
- If you want to physically store the information, you could use a materialized view.
There is plenty of material on those three options in the official Oracle documentation.
Your first step is to construct a query that counts how often each location occurs for each name and customer, which is as simple as:
select Name, Customer, Location, count(*)
from table1
group by Name, Customer, Location
This isn't immediately useful on its own, but the same logic can be used with row_number(), which numbers the rows within each partition. In the query below, I'm ordering by count(*) in descending order so that the most frequent occurrence gets the value 1.
Note that row_number() assigns 1 to only one row per partition, so if two locations tie, only one of them is picked.
So, now we have:
select Name, Customer, Location, row_number() over (partition by Name, Customer order by count(*) desc) freq_name_cust
from table1 tb_
group by Name, Customer, Location
The final step puts it all together:
select tab.*, tb_.Location most_freq_location
from table1 tab
inner join
(select Name, Customer, Location, row_number() over (partition by Name, Customer order by count(*) desc) freq_name_cust
from table1
group by Name, Customer, Location) tb_
on tb_.Name = tab.Name
and tb_.Customer = tab.Customer
and freq_name_cust = 1
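The steps above can be tried out on the four example rows for Jane. This sketch uses an in-memory SQLite database (3.25 or later for window functions). It computes the counts in an inner subquery rather than ordering by count(*) directly, and it adds `location` as a tie-breaker so the result is deterministic. Both are choices made for this illustration, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table table1 (name text, customer text, location text)")
conn.executemany(
    "insert into table1 values (?, ?, ?)",
    [
        ("Jane", "BEC", "Texas"),
        ("Jane", "BEC", "Texas"),
        ("Jane", "BEC", "California"),
        ("Jane", "ARC", "California"),
    ],
)

most_frequent_sql = """
select tab.name, tab.customer, tab.location,
       best.location as most_freq_location
from table1 tab
join (select name, customer, location,
             -- rank 1 = most frequent location; location breaks ties
             row_number() over (partition by name, customer
                                order by cnt desc, location) as rn
      from (select name, customer, location, count(*) as cnt
            from table1
            group by name, customer, location) c) best
  on best.name = tab.name
 and best.customer = tab.customer
 and best.rn = 1
order by tab.customer, tab.location
"""
result = conn.execute(most_frequent_sql).fetchall()
for row in result:
    print(row)
```

Here the partition is (name, customer), so the ARC row gets 'California' while the three BEC rows get 'Texas'.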
You can see how it all works in this Fiddle where I deliberately inserted rows with the same frequency for California and Texas for one of the customers for illustration purposes.

Get the difference of unique elements of a flag-based ID in SQL

I have an SQL table from which I want to extract the products that are unique per ID when comparing different groups. For example:
ID,Group,Product
a,2,33
a,1,83
b,3,51
c,2,33
b,1,20
a,3,20
b,2,51
a,2,83
If the same product appears in different groups for the same ID, I don't want to keep those rows. This results in:
ID,Group,Unique
a,2,33
c,2,33
b,1,20
a,3,20
I'm trying this in SQL, but I don't know how to do it, please help me!
Remove all rows for which the same ID has the same product in a different group:
select *
from yourtable a
where not exists(
select 1 from yourtable b where a.ID = b.ID and a.Product = b.Product and a.Group <> b.Group
)
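In a dialect that supports QUALIFY (for example Snowflake or Teradata), you can filter on a window count, assuming the same ID, group and product combination never appears twice:
select * from table_1
qualify count(*) over(partition by "id", "product") = 1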
One method is aggregation:
select id, max(group) as group, product
from t
group by id, product
having min(group) = max(group);
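The aggregation approach can be checked against the sample data from the question. This is a minimal sketch using an in-memory SQLite database. The column is named `grp` here because `group` is a reserved word, which is a renaming made only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# grp stands in for the question's Group column (group is a reserved word).
conn.execute("create table t (id text, grp int, product int)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        ("a", 2, 33),
        ("a", 1, 83),
        ("b", 3, 51),
        ("c", 2, 33),
        ("b", 1, 20),
        ("a", 3, 20),
        ("b", 2, 51),
        ("a", 2, 83),
    ],
)

unique_sql = """
select id, max(grp) as grp, product
from t
group by id, product
-- keep only products seen in a single group for this id
having min(grp) = max(grp)
order by id, product
"""
unique_rows = conn.execute(unique_sql).fetchall()
for row in unique_rows:
    print(row)
```

Product 83 for ID a and product 51 for ID b each appear in two groups, so they are dropped, which leaves the four rows the question expects.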

How to work around the "correlated subqueries that reference other tables" error without using JOIN

I am working with the BigQuery public dataset bigquery-public-data.austin_crime.crime. My goal is to get an output with three columns: the description (of the crime), the count of crimes with that description, and the top district for that particular description.
I am able to get the first two columns with this query.
select
a.description,
count(*) as district_count
from `bigquery-public-data.austin_crime.crime` a
group by description order by district_count desc
I was hoping to get everything in one query, so I tried adding the code below to get a third column showing the top district for each description (crime):
select
a.description,
count(*) as district_count,
(
select district from
( select
district, rank() over(order by COUNT(*) desc) as rank
FROM `bigquery-public-data.austin_crime.crime`
where description = a.description
group by district
) where rank = 1
) as top_District
from `bigquery-public-data.austin_crime.crime` a
group by description
order by district_count desc
The error I am getting is: "Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN."
I think I can do this with joins. Does anyone have a better solution, possibly one that avoids a join?
Below is for BigQuery Standard SQL:
#standardSQL
SELECT description,
ANY_VALUE(district_count) AS district_count,
STRING_AGG(district ORDER BY cnt DESC LIMIT 1) AS top_district
FROM (
SELECT description, district,
COUNT(1) OVER(PARTITION BY description) AS district_count,
COUNT(1) OVER(PARTITION BY description, district) AS cnt
FROM `bigquery-public-data.austin_crime.crime`
)
GROUP BY description
-- ORDER BY district_count DESC
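For readers without BigQuery at hand, the same two-window-count idea can be sketched in SQLite (3.25 or later). SQLite has no STRING_AGG with ORDER BY and LIMIT, so this illustration swaps in row_number() to pick the top district, with `district` as an arbitrary tie-breaker. The table name `crime` and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table crime (description text, district text)")
conn.executemany(
    "insert into crime values (?, ?)",
    [
        ("THEFT", "A"),
        ("THEFT", "A"),
        ("THEFT", "B"),
        ("BURGLARY", "B"),
    ],
)

top_district_sql = """
select description, district_count, district as top_district
from (select description, district, district_count,
             -- rank districts within a description by their crime count
             row_number() over (partition by description
                                order by cnt desc, district) as rn
      from (select distinct description, district,
                   count(*) over (partition by description) as district_count,
                   count(*) over (partition by description, district) as cnt
            from crime) x) y
where rn = 1
order by district_count desc, description
"""
top = conn.execute(top_district_sql).fetchall()
for row in top:
    print(row)
```

No correlated subquery or join is involved: both counts come from window functions over a single scan, and the outer query only filters on the rank.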