PIVOT with multiple rows result and duplicate data

I need your suggestions on this.
I want to pivot these rows into columns, keeping duplicate values as separate rows in the result.
Before:
line_type | line_name
Internal | Storage 1
Makloon | Storage 2
Internal | Storage 1
Makloon | Storage 3
Process | Storage B
Makloon | Storage 3
After:
Internal | Makloon | Process
Storage 1 | Storage 2 | Storage B
Storage 1 | Storage 3 |
 | Storage 3 |
Can I use pivot or is there another trick to do this?
I have tried using regular pivot but it just doesn't work like what I wanted.
SELECT *
FROM (
    SELECT
        [line_type],
        [line_name]
    FROM
        [table_name]
) pvt
PIVOT (
    MAX(line_name)
    FOR [line_type] IN (
        [Internal],
        [Makloon],
        [Process]
    )
) AS pvt_table;
The result of that PIVOT query I tried:
Internal | Makloon | Process
Storage 1 | Storage 3 | Storage B

In order to keep your rows separated, you need a distinct value. One way is to use the ROW_NUMBER() window function to assign distinct values within each pivot column.
Something like:
SELECT rn, [Internal], [Makloon], [Process]
FROM (
    SELECT
        [line_type],
        [line_name],
        ROW_NUMBER() OVER(PARTITION BY line_type ORDER BY line_name) AS rn
    FROM
        [table_name]
) pvt
PIVOT (
    MAX(line_name)
    FOR [line_type] IN (
        [Internal],
        [Makloon],
        [Process]
    )
) AS pvt_table
ORDER BY rn;
rn can be dropped from the final select list. It was placed there for illustration purposes.
See this db<>fiddle.
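The effect of rn can also be checked outside SQL Server. The sketch below is a stand-in rather than the exact query above: it uses Python's sqlite3 module, and because SQLite has no PIVOT operator, MAX(CASE ...) with GROUP BY rn is swapped in for it. It assumes SQLite 3.25 or later (needed for window functions). The rows are the ones from the question.

```python
import sqlite3

# In-memory SQLite table standing in for the question's [table_name].
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (line_type TEXT, line_name TEXT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    [
        ("Internal", "Storage 1"),
        ("Makloon", "Storage 2"),
        ("Internal", "Storage 1"),
        ("Makloon", "Storage 3"),
        ("Process", "Storage B"),
        ("Makloon", "Storage 3"),
    ],
)

# rn numbers the rows inside each line_type. Grouping by rn keeps duplicate
# names on separate output rows instead of collapsing them under MAX().
# Conditional aggregation replaces PIVOT, which SQLite does not support.
pivot_rows = conn.execute(
    """
    SELECT
        MAX(CASE WHEN line_type = 'Internal' THEN line_name END) AS Internal,
        MAX(CASE WHEN line_type = 'Makloon'  THEN line_name END) AS Makloon,
        MAX(CASE WHEN line_type = 'Process'  THEN line_name END) AS Process
    FROM (
        SELECT line_type, line_name,
               ROW_NUMBER() OVER (PARTITION BY line_type ORDER BY line_name) AS rn
        FROM table_name
    ) AS numbered
    GROUP BY rn
    ORDER BY rn
    """
).fetchall()

for row in pivot_rows:
    print(row)
```

Cells with no matching row come back as None (NULL), which corresponds to the blank cells in the desired output.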

Related

Azure SQL Transpose a table with all rows

I use Azure SQL database. I have a table - test_excel_poc_head with the below values which I want to transpose using link id values as the columns
The intended output is below where the column is the 'link_id' values. The link_id values are dynamic
I started using UNPIVOT and PIVOT option and below is my unpivot query and results:
SELECT link_id,head_values
FROM
(SELECT link_id,comp1,comp2,comp3,comp4
FROM [dbo].[test_excel_poc_head]
) AS cp
UNPIVOT
(head_values FOR head_value in (comp1,comp2,comp3,comp4)
) AS up
RESULTS:
Now when I add the PIVOT code, it expects an aggregate function which I do not have as it is a string and it errors out.
If I add MAX as the aggregate function, I do not get the intended result.
SELECT * FROM (
SELECT link_id,head_values
FROM
(SELECT link_id,comp1,comp2,comp3,comp4
FROM [dbo].[test_excel_poc_head]
) AS cp
UNPIVOT
(head_values FOR head_value in (comp1,comp2,comp3,comp4)
) AS up
) temp_results
PIVOT(
MAX(head_values)
FOR link_id
IN (
[1],[2],[3],[4],[5],[6]
)
) AS PivotTable
RESULT:
But this is not my expected result. Is there any other option to achieve PIVOT without the use of agg functions?
Thanks for your time and help.

I gave it a try. Could you check whether the query below works?
The difference from your query is that I make each row of the UNPIVOT result distinct by adding a row_number to it, so that the later PIVOT takes the MAX within each row number and returns the values on separate rows. Apologies if the explanation doesn't make sense.
select [1],[2],[3],[4],[5],[6]
from
(   select link_id, head_values,
        -- order by the source column name (comp1..comp4) so the row order is
        -- deterministic; ordering by link_id inside its own partition is arbitrary
        row_number() over (partition by link_id order by head_value) rn
    from
    (   select link_id
            ,cast(comp1 as varchar(255)) as comp1
            ,cast(comp2 as varchar(255)) as comp2
            ,cast(comp3 as varchar(255)) as comp3
            ,cast(comp4 as varchar(255)) as comp4
        from [dbo].[test_excel_poc_head]
    ) as cp
    unpivot
    (
        head_values for head_value in (comp1,comp2,comp3,comp4)
    ) as up
) temp_results
pivot
(
    max(head_values)
    for link_id in ([1],[2],[3],[4],[5],[6])
) as pivottable;
db<>fiddle for your reference.
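For readers without an Azure SQL instance at hand, the same unpivot, number, then pivot sequence can be sketched in SQLite through Python. Everything here is an assumption for illustration: the sample values are made up (the original data was not preserved), only three link_id values are used, UNION ALL stands in for UNPIVOT, and MAX(CASE ...) with GROUP BY rn stands in for PIVOT, since SQLite has neither operator. SQLite 3.25 or later is assumed for ROW_NUMBER().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_excel_poc_head "
    "(link_id INTEGER, comp1 TEXT, comp2 TEXT, comp3 TEXT, comp4 TEXT)"
)
# Hypothetical sample rows; the real table contents are unknown.
conn.executemany(
    "INSERT INTO test_excel_poc_head VALUES (?, ?, ?, ?, ?)",
    [
        (1, "a1", "b1", "c1", "d1"),
        (2, "a2", "b2", "c2", "d2"),
        (3, "a3", "b3", "c3", "d3"),
    ],
)

transposed = conn.execute(
    """
    WITH unpivoted AS (            -- stand-in for UNPIVOT
        SELECT link_id, 1 AS pos, comp1 AS head_values FROM test_excel_poc_head
        UNION ALL
        SELECT link_id, 2, comp2 FROM test_excel_poc_head
        UNION ALL
        SELECT link_id, 3, comp3 FROM test_excel_poc_head
        UNION ALL
        SELECT link_id, 4, comp4 FROM test_excel_poc_head
    ),
    numbered AS (                  -- one distinct rn per value within a link_id
        SELECT link_id, head_values,
               ROW_NUMBER() OVER (PARTITION BY link_id ORDER BY pos) AS rn
        FROM unpivoted
    )
    SELECT                         -- stand-in for PIVOT ... FOR link_id IN (...)
        MAX(CASE WHEN link_id = 1 THEN head_values END) AS link_1,
        MAX(CASE WHEN link_id = 2 THEN head_values END) AS link_2,
        MAX(CASE WHEN link_id = 3 THEN head_values END) AS link_3
    FROM numbered
    GROUP BY rn
    ORDER BY rn
    """
).fetchall()

for row in transposed:
    print(row)
```

Because rn is part of the grouping, MAX never has to choose between two values for the same output cell, which is what lets an aggregate-based pivot behave like a plain transpose.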

How to display in Big Query ONLY duplicated records?

To view records without duplicated ones, I use this SQL
SELECT * EXCEPT(row_number)
FROM (SELECT*,ROW_NUMBER() OVER (PARTITION BY orderid) row_number
FROM `TABLE`)
WHERE row_number = 1
What is the best practice to display only duplicated records from a single table?

Below is for BigQuery Standard SQL
Personally, I prefer not to rely on ROW_NUMBER() whenever possible, because with large volumes of data it tends to lead to a Resource Exceeded error.
So, from my experience, I would recommend the options below:
To view records for those orderid with only one entry:
#standardSQL
SELECT AS VALUE ANY_VALUE(t)
FROM `project.dataset.table` t
GROUP BY orderid
HAVING COUNT(1) = 1
To view records for those orderid with more than one entry:
#standardSQL
SELECT * EXCEPT(flag) FROM (
SELECT *, COUNT(1) OVER(PARTITION BY orderid) > 1 flag
FROM `project.dataset.table`
)
WHERE flag
Note: under the hood, COUNT(1) OVER() can be computed using as many workers as are available, while ROW_NUMBER() OVER() requires all the relevant data to be moved to a single worker (hence the resource-related issue).
OR
#standardSQL
SELECT *
FROM `project.dataset.table`
WHERE orderid IN (
SELECT orderid FROM `project.dataset.table`
GROUP BY orderid HAVING COUNT(1) > 1
)
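The two "more than one entry" variants above can be compared on a small data set. This is a sketch under assumptions: it runs on SQLite through Python's sqlite3 module rather than BigQuery, the orders table and its rows are hypothetical, and BigQuery-only syntax such as SELECT * EXCEPT(...) is replaced by an explicit column list. SQLite 3.25 or later is assumed for the window function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (orderid INTEGER, item TEXT)")
# Hypothetical rows: orderid 2 appears twice, orderid 3 three times.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "a"), (2, "b"), (2, "c"), (3, "d"), (3, "e"), (3, "f"), (4, "g")],
)

# Variant 1: count the rows per orderid with a window function, keep counts > 1.
dupes_window = conn.execute(
    """
    SELECT orderid, item
    FROM (
        SELECT *, COUNT(*) OVER (PARTITION BY orderid) AS cnt
        FROM orders
    )
    WHERE cnt > 1
    ORDER BY orderid, item
    """
).fetchall()

# Variant 2: find the duplicated keys with GROUP BY / HAVING, then filter on them.
dupes_subquery = conn.execute(
    """
    SELECT orderid, item
    FROM orders
    WHERE orderid IN (
        SELECT orderid FROM orders GROUP BY orderid HAVING COUNT(*) > 1
    )
    ORDER BY orderid, item
    """
).fetchall()

print(dupes_window)
print(dupes_subquery)
```

Both variants return every row of every duplicated orderid, including the first occurrence.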

Why not just change the row_number? You have partitioned by orderid, which groups the duplicates into partitions, numbered the records, and kept only the first element of each partition to remove the duplicates. If you keep only row_number = 2 instead, you get elements only from partitions with at least 2 elements, i.e. only duplicates.
SELECT * EXCEPT(row_number)
FROM (SELECT*,ROW_NUMBER() OVER (PARTITION BY orderid) row_number
FROM `TABLE`)
WHERE row_number = 2
Note: Using row_number = 2 gives you exactly one row per duplicated orderid. If you use row_number > 1, the result may itself contain duplicates (for example, if the original table had 3 identical rows).
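The difference between the two filters shows up as soon as a key occurs three times. The sketch below makes several assumptions: it uses SQLite via Python's sqlite3 module instead of BigQuery, a hypothetical orders table, and an ORDER BY inside the window (not present in the query above) so that the numbering is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (orderid INTEGER, item TEXT)")
# Hypothetical rows: orderid 2 appears twice, orderid 3 three times.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "a"), (2, "b"), (2, "c"), (3, "d"), (3, "e"), (3, "f")],
)

def orderids_where(condition):
    """Return orderids whose row number satisfies the given fixed condition."""
    sql = f"""
        SELECT orderid
        FROM (
            SELECT orderid,
                   ROW_NUMBER() OVER (PARTITION BY orderid ORDER BY item) AS rn
            FROM orders
        )
        WHERE {condition}
        ORDER BY orderid
    """
    return [row[0] for row in conn.execute(sql)]

rn_equals_2 = orderids_where("rn = 2")   # one row per duplicated orderid
rn_above_1 = orderids_where("rn > 1")    # orderid 3 shows up twice

print(rn_equals_2)
print(rn_above_1)
```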

You can display the duplicated rows by showing only rows with a row_number greater than 1.
select
* except(row_number)
from (
select
*, row_number() over (partition by ) as row_number
from `TABLE`)
where row_number > 1

If your table has no primary key column, you have to define what identifies a row yourself. Assuming my table contains 12 columns in BigQuery, I have not found anything shorter than:
SELECT *, sum(1) as rowcount
FROM `TABLE`
GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
HAVING rowcount>1;

Using a CTE in OVER(PARTITION BY)

I'm trying to calculate volume from 3 columns in a table and return only unique volumes. We have many rows with the same Width, Height, and Length and so naturally my volume calculation will have duplicate return values for Volume. I am under the impression that, in order to accomplish this, I must use OVER, PARTITION and a CTE as aliases are not allowed to be referenced in OVER
WITH
cteVolume (Id, Volume)
AS
(
SELECT Id, Width * Height * [Length] AS Volume FROM PackageMaterialDimensions
)
SELECT *
INTO #volumeTempTable
FROM (
SELECT pp.ID, (pp.Width * pp.Height * pp.[Length]) AS Volume,
ROW_NUMBER() OVER(PARTITION BY cte.Volume ORDER BY pp.ID DESC) rn
FROM PlanPricing pp
INNER JOIN cteVolume cte ON pp.ID = cte.Id
) a
WHERE rn = 1
SELECT * FROM #volumeTempTable
ORDER BY Volume DESC
DROP TABLE #volumeTempTable
Note, the reason for the temp tables is because I plan on doing some extra work with this data. I also am currently debugging so I am using these tables to output to the data window
Here is what is wrong with this query
- It is still returning duplicates
- It is only returning one volume for every row
- It is only returning about 75 rows when there are 71000 rows in the table
How can I modify this query to essentially do the following
- Calculate volume for EVERY row in the table
- SELECT rows with unique volume calculations. (I do not want to see the same volume twice in my result set)
Edit - providing data as requested
Current data set (ignore the extra columns)
What I would like is
ID | Volume
193 | 280
286 | 350
274 | 550
241 | 720
Basically, I want to calculate volume for every row, then I would like to somehow group by volume in order to cut down duplicates and select the first row from each group

Does this do what you want?
WITH cteVolume (Id, Volume) AS (
SELECT Id, Width * Height * [Length] AS Volume
FROM PackageMaterialDimensions
)
SELECT DISTINCT volume
FROM cteVolume;
If you want one id per volume:
WITH cteVolume (Id, Volume) AS (
SELECT Id, Width * Height * [Length] AS Volume
FROM PackageMaterialDimensions
)
SELECT volume, MIN(Id) as Id
FROM cteVolume
GROUP BY volume;
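Both ways of getting one row per volume can be tried on a small table. The sketch below is illustrative only: it runs on SQLite through Python's sqlite3 module, and the dimensions are hypothetical because the original data set was not preserved. It also shows that once Volume is computed inside the CTE, it can be referenced in OVER(PARTITION BY ...) without joining back to another table. SQLite 3.25 or later is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PackageMaterialDimensions "
    "(Id INTEGER, Width INTEGER, Height INTEGER, [Length] INTEGER)"
)
# Hypothetical rows: Ids 1 and 2 share volume 100, Ids 3 and 4 share volume 50.
conn.executemany(
    "INSERT INTO PackageMaterialDimensions VALUES (?, ?, ?, ?)",
    [(1, 2, 5, 10), (2, 5, 2, 10), (3, 1, 1, 50), (4, 10, 5, 1), (5, 3, 3, 3)],
)

# GROUP BY variant: lowest Id for each distinct volume.
grouped = conn.execute(
    """
    WITH cteVolume (Id, Volume) AS (
        SELECT Id, Width * Height * [Length] FROM PackageMaterialDimensions
    )
    SELECT Volume, MIN(Id) AS Id
    FROM cteVolume
    GROUP BY Volume
    ORDER BY Volume DESC
    """
).fetchall()

# ROW_NUMBER variant: highest Id for each distinct volume, as in ORDER BY Id DESC.
ranked = conn.execute(
    """
    WITH cteVolume (Id, Volume) AS (
        SELECT Id, Width * Height * [Length] FROM PackageMaterialDimensions
    )
    SELECT Id, Volume
    FROM (
        SELECT Id, Volume,
               ROW_NUMBER() OVER (PARTITION BY Volume ORDER BY Id DESC) AS rn
        FROM cteVolume
    )
    WHERE rn = 1
    ORDER BY Volume DESC
    """
).fetchall()

print(grouped)
print(ranked)
```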

Perhaps your issue is coming from partitioning cte.volume from the PackageMaterialDimensions table, but you're also selecting pp.volume from the PlanPricing table?
Not able to confirm without more information on your data set and tables.

As far as I can see, you can't use window functions inside the recursive part of a CTE. You have to compute the numbering manually, inside the CTE.
So, instead of
ROW_NUMBER() OVER(PARTITION BY cte.Volume ORDER BY pp.ID DESC) rn
Just write
1 as rn
in the first part, and
rn+1 as rn
in the second part.

how to pivot multiple records

This is my table structure:
create table t(floor int,apt int)
insert into t values(1,1),(1,2),(1,4),(2,5),(2,6),(2,7)
I want to get a result like this:
floor | room1 | room2 | room3
1 | 1 | 2 | 4
2 | 5 | 6 | 7

Use a PIVOT in this case.
SELECT * FROM
(
SELECT floor,
apt,
NumberedApt = 'room' + CAST(ROW_NUMBER() OVER
(PARTITION BY floor ORDER BY apt) AS NVARCHAR(100))
FROM t
) AS OrderApts
PIVOT (MAX(apt) FOR Numberedapt IN (room1, room2, room3)) AS PivotedApts
Here is an SQLFiddle of the above working.
If you are going to get many more 'room' columns, you might want to consider a dynamic pivot, but dynamic SQL can be less efficient because its query plan may not be cached and reused.
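The numbering step can be verified against the sample data from the question. This is a minimal sketch, assuming SQLite 3.25 or later through Python's sqlite3 module; since SQLite has no PIVOT operator, MAX(CASE ...) with GROUP BY floor is used in its place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (floor INTEGER, apt INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 4), (2, 5), (2, 6), (2, 7)],
)

# rn gives each apartment its position within the floor (1, 2, 3, ...);
# each position then becomes one output column.
rooms = conn.execute(
    """
    SELECT floor,
           MAX(CASE WHEN rn = 1 THEN apt END) AS room1,
           MAX(CASE WHEN rn = 2 THEN apt END) AS room2,
           MAX(CASE WHEN rn = 3 THEN apt END) AS room3
    FROM (
        SELECT floor, apt,
               ROW_NUMBER() OVER (PARTITION BY floor ORDER BY apt) AS rn
        FROM t
    ) AS numbered
    GROUP BY floor
    ORDER BY floor
    """
).fetchall()

for row in rooms:
    print(row)
```

A floor with more than three apartments would need more CASE columns here, which mirrors the fixed IN (...) list of a static PIVOT.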

More on pivot here
You need to use ROW_NUMBER() partitioned by floor to number the rows, then pivot on that number to get the result you want:
select p.floor,p.[1] as room1,p.[2] as room2,p.[3] as room3 from
(
select floor,apt,row_number() over(partition by floor order by apt) as rn from #t) as t
pivot
(
min(t.apt)
for t.rn in([1],[2],[3])
)as p;
See in Action

SQL select segment

I'm using SQL Server 2008.
I have a table with x amount of rows. I would like to always divide x by 5 and select the 3rd group of records.
Let's say there are 100 records in the table:
100 / 5 = 20
the 3rd segment will be record 41 to 60.
How will I be able in SQL to calculate and select this 3rd segment only?
Thanks.

You can use NTILE.
Distributes the rows in an ordered partition into a specified number of groups.
Example:
SELECT col1, col2, ..., coln
FROM
(
    SELECT
        col1, col2, ..., coln,
        NTILE(5) OVER (ORDER BY id) AS groupno
    FROM yourtable
) AS t  -- SQL Server requires an alias on a derived table
WHERE groupno = 3

That's a perfect use for the NTILE ranking function.
Basically, you define your query inside a CTE and add an NTILE to your rows - a number going from 1 to n (the argument to NTILE). You order your rows by some column, and then you get the n groups of rows you're looking for, and you can operate on any one of those "groups" of data.
So try something like this:
;WITH SegmentedData AS
(
SELECT
(list of your columns),
GroupNo = NTILE(5) OVER (ORDER BY SomeColumnOfYours)
FROM dbo.YourTable
)
SELECT *
FROM SegmentedData
WHERE GroupNo = 3
Of course, you can also use an UPDATE statement after the CTE to update those rows.
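The 100-row case from the question can be checked with a short sketch. It assumes SQLite 3.25 or later through Python's sqlite3 module instead of SQL Server 2008, and a hypothetical table yourtable with a single id column holding 1 to 100.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (id INTEGER)")
conn.executemany("INSERT INTO yourtable VALUES (?)", [(i,) for i in range(1, 101)])

# NTILE(5) splits the ordered rows into 5 groups; keep only the third group.
segment = [
    row[0]
    for row in conn.execute(
        """
        SELECT id
        FROM (
            SELECT id, NTILE(5) OVER (ORDER BY id) AS groupno
            FROM yourtable
        ) AS t
        WHERE groupno = 3
        ORDER BY id
        """
    )
]

print(segment[0], segment[-1], len(segment))
```

With 100 rows each group holds 20 rows, so the third group covers ids 41 to 60. When the row count is not divisible by 5, NTILE makes the earlier groups one row larger than the later ones.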
