I am trying to unpivot / coalesce multiple columns into one value, based on values in the target columns.
Given the following sample data:
CREATE TABLE SourceData (
Id INT
,Column1Text VARCHAR(10)
,Column1Value INT
,Column2Text VARCHAR(10)
,Column2Value INT
,Column3Text VARCHAR(10)
,Column3Value INT
)
INSERT INTO SourceData
SELECT 1, NULL, NULL, NULL, NULL, NULL, NULL UNION
SELECT 2, 'Text', 1, NULL, NULL, NULL, NULL UNION
SELECT 3, 'Text', 2, 'Text 2', 1, NULL, NULL UNION
SELECT 4, NULL, NULL, NULL, 1, NULL, NULL
I am trying to produce the following result set:
Id ColumnText
----------- ----------
1 NULL
2 Text
3 Text 2
4 NULL
Where ColumnXText column values become one "ColumnText" value per row, based on the following criteria:
If all ColumnX columns are NULL, then ColumnText = NULL
If a ColumnXValue value is "1" and the ColumnXText IS NULL, then ColumnText = NULL
If a ColumnXValue value is "1" and the ColumnXText IS NOT NULL, then ColumnText = ColumnXText.
There are no records with more than one ColumnXValue of "1".
What I'd tried is in this SQL Fiddle: http://sqlfiddle.com/#!6/f2e18/2
I'd tried (shown in SQL fiddle):
Unpivoting with CROSS / OUTER APPLY. I fell down on this approach because I was not able to get the WHERE conditions to produce the expected results.
I'd also tried using UNPIVOT, but had no luck.
I was considering a brute-force approach, but it did not seem right. The real source table has 44MM rows. I do not control the schema of the source table.
Please let me know if there's a simpler approach than a brute-force tangle of CASE WHENs. Thank you.
I don't think there is much mileage in trying to be too clever with this:
SELECT
    Id,
    CASE
        WHEN Column1Value = 1 THEN Column1Text
        WHEN Column2Value = 1 THEN Column2Text
        WHEN Column3Value = 1 THEN Column3Text
    END AS ColumnText
FROM SourceData
I did have them in 3-2-1 order, but considered that the right answer might be hit sooner if the checking is done in 1-2-3 order instead (fewer checks per row, over 44 million rows, might be significant).
Considering you have 44 million rows, you really don't want to experiment too much with joining the table to itself with APPLY or anything like that. You just need to go through it once, and that's best done with a simple CASE, what you call the "brute-force" approach:
SELECT
Id
, CASE WHEN Column1Value = 1 THEN Column1Text
WHEN Column2Value = 1 THEN Column2Text
WHEN Column3Value = 1 THEN Column3Text
END AS ColumnText
FROM SourceData
But if you really want to get fancy and write something without CASE, you could use UNION ALL to merge the different columns into one, and then join on it:
WITH CTE_Union AS
(
SELECT Id, Column1Text AS ColumnText, Column1Value AS ColumnValue
FROM SourceData
UNION ALL
SELECT Id, Column2Text, Column2Value FROM SourceData
UNION ALL
SELECT Id, Column3Text, Column3Value FROM SourceData
)
SELECT s.Id, u.ColumnText
FROM SourceData s
LEFT JOIN CTE_Union u ON s.Id = u.id and u.ColumnValue = 1
But I guarantee the first approach will outperform this by a margin of 4 to 1.
If you do not want to use a case expression, then you can use another outer apply() on a common table expression (or subquery/derived table) of your original unpivot with outer apply():
;with cte as (
select s.Id, oa.ColumnText, oa.ColumnValue
from sourcedata s
outer apply (values
(s.Column1Text, s.Column1Value)
, (s.Column2Text, s.Column2Value)
, (s.Column3Text, s.Column3Value)
) oa (ColumnText, ColumnValue)
)
select s.Id, x.ColumnText
from sourcedata s
outer apply (
select top 1 cte.ColumnText
from cte
where cte.Id = s.Id
and cte.ColumnValue = 1
) x
rextester demo: http://rextester.com/TMBR41346
returns:
+----+------------+
| Id | ColumnText |
+----+------------+
| 1 | NULL |
| 2 | Text |
| 3 | Text 2 |
| 4 | NULL |
+----+------------+
This will give you the first non-null text value, checking Column3 first and working backwards. It seems as if this is what you are trying to accomplish.
select ID, Coalesce(Column3Text,Column2Text,Column1Text) ColumnText
from SourceData
I have an issue where I want to show only the latest record (Col 1). I removed the date column, thinking it might not work with different values, but in that case the record itself still has a different name (Col 1) because the date is part of the name.
Is it possible to fetch one record in this case?
The code:
SELECT DISTINCT p.ID,
    MAX(at.Date) AS date,
    at.[RAPID3 Name] AS COL1,
    at.[DLQI Name] AS COL2,
    at.[HAQ-DI Name] AS COL3,
    phy.name AS phyi,
    at.State_ID
FROM dbo.[Assessment Tool] AS at
INNER JOIN dbo.patient AS p ON p.[ID] = at.[Owner (Patient)_Patient_ID]
INNER JOIN dbo.[Physician] AS phy ON phy.ID = p.Physician_ID
WHERE (at.State_ID IN (162, 165, 168) AND p.ID = 5580)
GROUP BY
    at.[RAPID3 Name],
    at.[DLQI Name],
    at.[HAQ-DI Name],
    p.ID, phy.name,
    at.State_ID
SS: (screenshot not reproduced here)
In the screenshot I want to show only the latest record (COL 1) for ID "5580", i.e. the first row for this ID.
Thank you
The most accurate way to handle this: extract the date, then use TOP and ORDER BY.
create table #Temp(
ID int,
Col1 Varchar(50) null,
Col2 Varchar(50) null,
Col3 Varchar(50) null,
Phyi Varchar(50) null,
State_ID int)
Insert Into #Temp values(5580,'[9/29/2021]-[9.0]High Severity',null,null,'Eman Elshorpagy',168)
Insert Into #Temp values(5580,'[10/3/2021]-[9.3]High Severity',null,null,'Eman Elshorpagy',168)
select top 1 *
from #Temp as t
order by cast(replace(replace((select top 1 Value from string_split(t.Col1, '-')), '[', ''), ']', '') as date) desc
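One caveat worth noting: STRING_SPLIT does not guarantee the order of its output rows, so the TOP 1 above is not strictly deterministic. If you are on SQL Server 2022+ (or Azure SQL), the optional third argument adds an ordinal column that makes picking the first fragment explicit - a small sketch of that variant:
-- SQL Server 2022+ only: enable_ordinal = 1 adds an "ordinal" output column
select top 1 Value
from string_split(t.Col1, '-', 1)
order by ordinal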
This is close to ANSI standard, and it also caters for the newest row per id.
The principle is to use ROW_NUMBER() with a descending order on the date/timestamp (using a DATE type instead of a DATETIME, and avoiding the keyword DATE as a column name) in one query, then to select from that query, using the row number result as the filter.
-- your input, but 2 id-s to show how it works with many ..
WITH indata(id,dt,col1,phyi,state_id) AS (
SELECT 5580,DATE '2021-10-03','[10/3/2021] - [9,3] High Severity','Eman Elshorpagy',168
UNION ALL SELECT 5580,DATE '2021-09-29','[9/29/2021] - [9,0] High Severity','Eman Elshorpagy',168
UNION ALL SELECT 5581,DATE '2021-10-03','[10/3/2021] - [9,3] High Severity','Eman Elshorpagy',168
UNION ALL SELECT 5581,DATE '2021-09-29','[9/29/2021] - [9,0] High Severity','Eman Elshorpagy',168
)
-- real query starts here; when using your own table, replace the following comma with "WITH" ...
,
with_rank AS (
SELECT
*
, ROW_NUMBER() OVER(PARTITION BY id ORDER BY dt DESC) AS rank_id
FROM indata
)
SELECT
id
, dt
, col1
, phyi
, state_id
FROM with_rank
WHERE rank_id=1
;
id | dt | col1 | phyi | state_id
------+------------+-----------------------------------+-----------------+----------
5580 | 2021-10-03 | [10/3/2021] - [9,3] High Severity | Eman Elshorpagy | 168
5581 | 2021-10-03 | [10/3/2021] - [9,3] High Severity | Eman Elshorpagy | 168
I've made schema changes/improvements to a table, but I need to ensure that I don't lose any existing data and it is 'migrated' across to the new schema and conforms to its design.
The existing schema is designed as follows:
ID FK_ID ShowChartX ShowChartY ShowChartZ
-- ----- ---------- ---------- ----------
1 2 1 0 1
The columns of ShowChartX, ShowChartY, and ShowChartZ are of type BIT (boolean).
I've now created a standalone table that keeps a record/reference of each chart. Each Chart record has a Chart_ID - the aim here is to use an ID for each type of chart instead of horizontally scaling a 'ShowChart' column for each type of chart going forward. Essentially, I would like to map all columns of 'ShowChart' to their actual Chart_ID key in the table I mention below:
The new schema would look like this:
ID FK_ID Chart_ID
-- ----- --------
1 2 1
2 2 2
I've started looking at Pivot/Unpivot, but I'm not sure if it's the correct operation. Could anyone please point me in the right direction here? Thanks in advance!
This will UNPIVOT the data. You can also join the charts table by name in order to get the chart_id (see the sketch after the query below) and check for differences with the new table:
DECLARE @DataSource TABLE
(
[ID] INT
,[FK_ID] INT
,[ShowChartX] BIT
,[ShowChartY] BIT
,[ShowChartZ] BIT
);
INSERT INTO @DataSource ([ID], [FK_ID], [ShowChartX], [ShowChartY], [ShowChartZ])
VALUES (1, 2, 1, 0, 1);
SELECT [ID]
,[FK_ID]
,[column] AS [chart_name]
FROM @DataSource DS
UNPIVOT
(
[value] FOR [column] IN ([ShowChartX], [ShowChartY], [ShowChartZ])
) UNPVT
WHERE [value] = 1;
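A hedged sketch of the join-by-name step, assuming a reference table named Charts with Chart_ID and ChartName columns (that table isn't shown in the question, so these names are guesses):
SELECT [ID]
      ,[FK_ID]
      ,C.[Chart_ID]
FROM @DataSource DS
UNPIVOT
(
    [value] FOR [column] IN ([ShowChartX], [ShowChartY], [ShowChartZ])
) UNPVT
JOIN [Charts] C -- hypothetical reference table (Chart_ID, ChartName)
    ON C.[ChartName] = REPLACE([column], 'Show', '') -- 'ShowChartX' -> 'ChartX'
WHERE [value] = 1;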
For checking for differences it's pretty easy to use EXCEPT - for example:
SELECT *
FROM T1
EXCEPT
SELECT *
FROM T2;
to get records that are in T1 but not in T2, and then the reverse:
SELECT *
FROM T2
EXCEPT
SELECT *
FROM T1;
Thanks to @gotqn for the table definition and values.
The same result can be achieved using CROSS APPLY. Here, I am deriving Chart_Id based on ChartType, as I don't have the table reference for ChartTypes. Ideally, you would join with ChartTypes to get the corresponding Chart_Id.
DECLARE @DataSource TABLE
(
[ID] INT
,[FK_ID] INT
,[ShowChartX] BIT
,[ShowChartY] BIT
,[ShowChartZ] BIT
);
INSERT INTO @DataSource ([ID], [FK_ID], [ShowChartX], [ShowChartY], [ShowChartZ])
VALUES (1, 2, 1, 0, 1);
SELECT id,
fk_id,
CASE charttype
WHEN 'ChartX' THEN 1
WHEN 'ChartY' THEN 3
WHEN 'ChartZ' THEN 2
END AS Chart_ID
FROM @DataSource
CROSS APPLY (VALUES('ChartX', showchartx),
('ChartY', showcharty),
('ChartZ', showchartz)) AS t(charttype, isavailable)
WHERE isavailable <> 0;
Result set
+----+-------+----------+
| ID | FK_ID | Chart_ID |
+----+-------+----------+
| 1 | 2 | 1 |
| 1 | 2 | 2 |
+----+-------+----------+
I want to perform a group-by based on the distinct values in a string column that has multiple values.
The said column has a list of strings in a standard format separated by commas. The potential values are only a,b,c,d.
For example the column collection (type: String) contains:
Row 1: ["a","b"]
Row 2: ["b","c"]
Row 3: ["b","c","a"]
Row 4: ["d"]
The expected output is a count of unique values:
collection | count
a | 2
b | 3
c | 2
d | 1
For all of the below I used this table:
create table tmp (
id INT auto_increment,
test VARCHAR(255),
PRIMARY KEY (id)
);
insert into tmp (test) values
("a,b"),
("b,c"),
("b,c,a"),
("d")
;
If the possible values are only a,b,c,d you can try one of these:
Take note that this will only work if you have no values that are substrings of one another (like test and test_new), because test would then also be joined with all the test_new rows and the counts would not match.
select collection, COUNT(*) as count from tmp JOIN (
select CONCAT("%", tb.collection, "%") as like_collection, collection from (
select "a" COLLATE utf8_general_ci as collection
union select "b" COLLATE utf8_general_ci as collection
union select "c" COLLATE utf8_general_ci as collection
union select "d" COLLATE utf8_general_ci as collection
) tb
) tb1
ON tmp.test LIKE tb1.like_collection
GROUP BY tb1.collection;
Which will give you the result you want
collection | count
a | 2
b | 3
c | 2
d | 1
or you can try this one
SELECT
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%a%') as a_count,
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%b%') as b_count,
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%c%') as c_count,
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%d%') as d_count
;
The result would be like this
a_count | b_count | c_count | d_count
2 | 3 | 2 | 1
What you need to do is to first explode the collection column into separate rows (like a flatMap operation). In Redshift the only way to generate new rows is to JOIN - so let's CROSS JOIN your input table with a static table holding consecutive numbers, and take only the numbers less than or equal to the number of elements in each collection. Then we'll use the split_part function to read the item at the correct index. Once we have the exploded table, we'll do a simple GROUP BY.
If your items are stored as JSON array strings ('["a", "b", "c"]') then you can use JSON_ARRAY_LENGTH and JSON_EXTRACT_ARRAY_ELEMENT_TEXT instead of REGEXP_COUNT and SPLIT_PART respectively.
with
index as (
select 1 as i
union all select 2
union all select 3
union all select 4 -- could be substituted with 'select row_number() over () as i from arbitrary_table limit 4'
),
agg as (
select 'a,b' as collection
union all select 'b,c'
union all select 'b,c,a'
union all select 'd'
)
select
split_part(collection, ',', i) as item,
count(*)
from index,agg
where regexp_count(agg.collection, ',') + 1 >= index.i -- only get rows where number of items matches
group by 1
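For the JSON-array variant mentioned above, the final SELECT could be swapped for something like this (a sketch, assuming the collection column holds JSON array strings such as '["a","b"]'; note JSON_EXTRACT_ARRAY_ELEMENT_TEXT is 0-indexed):
select
  json_extract_array_element_text(agg.collection, index.i - 1) as item, -- 0-indexed position
  count(*)
from index, agg
where json_array_length(agg.collection) >= index.i -- only positions that exist in this row's array
group by 1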
I have an odd requirement which ideally should be solved in SQL, not the surrounding app.
I need to select exactly 5 rows regardless of how many are actually available. In practice the number of rows available will usually be less than 5 and on some rare occasions it will be more than 5. The "extra" rows should have null in every column.
The app is written in a technology that isn't Turing Complete. This requirement is much more difficult to solve in the app's code than you might imagine! To describe it, the app is effectively a transformer: It takes in a bunch of queries and spits out a report. So please understand the app is NOT written in a "programming language" in the traditional sense.
So for example, if I have a table:
A | B
-----
1 | X
2 | Y
3 | Z
Then a valid result would be
A | B
-----------
2 | Y
1 | X
3 | Z
null | null
null | null
I know this is an unusual requirement. Sadly it can't be solved in the application due to the technology being used.
Ideally this shouldn't require changes to the database but if there is no other way that changes can be arranged.
Any suggestions?
You can do something like this:
select top 5 a, b
from (select a, b, 1 as priority from t
      union all
      select null, null, 2
      from (values (1), (2), (3), (4), (5)) v(n)
     ) x
order by priority;
That is, create dummy rows, append them, and then choose the first five.
I do think that this work should be done in the app, but you can do it in SQL.
Create Table #Test (A int, B int)
Insert #Test Values (1,1)
Insert #Test Values (2,1)
Insert #Test Values (3,1)
Select Top 5 * From
(
Select A, B From #Test
Union All
Select Null, Null
Union All
Select Null, Null
Union All
Select Null, Null
Union All
Select Null, Null
Union All
Select Null, Null
) A
Wrap this in a stored proc:
declare @rowcount int
select top 5 * from dbo.test
set @rowcount = @@rowcount
if @rowcount < 5
Begin
select * from dbo.test
union all
select null, null from dbo.numbers where n <= 5 - @rowcount -- one NULL per column of dbo.test
End
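For completeness, a rough sketch of that wrapper (the procedure name is made up, and it assumes the question's two-column table as dbo.test plus a dbo.numbers tally table with a column n; it also folds the check into a single result set):
CREATE PROCEDURE dbo.GetFiveRows -- hypothetical name
AS
BEGIN
    DECLARE @rowcount INT;
    SELECT @rowcount = COUNT(*) FROM dbo.test;

    SELECT TOP 5 a, b FROM dbo.test
    UNION ALL
    SELECT NULL, NULL            -- one NULL per column
    FROM dbo.numbers
    WHERE n <= 5 - @rowcount;    -- no padding when there are already 5+ rows
END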
If you use some sort of tally table (although the numbers themselves do not matter, only that the table has enough records), you can use it to create the dummy rows, e.g. using sys.columns:
select top 5 a,b from
(
select a, b, 0 ord from yourTable
union all
select null a,null b, 1 from sys.columns
) t
order by ord
The advantage of the tally would be that if you need another number of rows in the future, you only need to change the top x (provided the tally table has enough rows)
Get those 3 records from your table, take a counter variable, and then from your code add NULL rows until the counter reaches 5.
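That suggestion is app-side, but the same counter idea can be sketched in T-SQL for illustration (the table name yourTable and the column types are assumptions based on the question's example):
DECLARE @padded TABLE (A INT NULL, B VARCHAR(10) NULL); -- types are guesses
DECLARE @counter INT;

-- grab whatever rows exist (at most 5)
INSERT INTO @padded (A, B)
SELECT TOP 5 A, B FROM yourTable;

SET @counter = @@ROWCOUNT;

-- pad with NULL rows until the counter reaches 5
WHILE @counter < 5
BEGIN
    INSERT INTO @padded (A, B) VALUES (NULL, NULL);
    SET @counter = @counter + 1;
END

SELECT A, B FROM @padded;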
I need to obtain a query result whose row count is always a multiple of a determined number (10 in my case), independent of the real quantity of rows (actually to solve a Jasper problem).
For example, I built an example schema in this link: http://sqlfiddle.com/#!3/c3dba/1/0
I'd like the result to be like this:
1 Item 1 1 10
2 Item 2 2 30
3 Item 3 5 15
4 Item 4 2 10
null null null null null
null null null null null
null null null null null
null null null null null
null null null null null
null null null null null
I have found this explanation, but it doesn't work in SQL Server and I can't convert it: http://community.jaspersoft.com/questions/514706/need-table-fixed-size-detail-block
Another option is to use a recursive CTE to get the pre-determined number of rows, then use a nested CTE construct to union rows from the recursive CTE with the original table and finally use a TOP clause to get the desired number of rows.
DECLARE @n INT = 10
;WITH Nulls AS (
SELECT 1 AS i
UNION ALL
SELECT i + 1 AS i
FROM Nulls
WHERE i < @n
),
itemsWithNulls AS
(
SELECT * FROM itens
UNION ALL
SELECT NULL, NULL, NULL, NULL FROM Nulls
)
SELECT TOP (@n) *
FROM itemsWithNulls
EDIT:
Reading the requirements more carefully, the OP actually wants the total number of rows returned to be a multiple of 10. E.g. if table itens has 4 rows then 10 rows should be returned, if itens has 12 rows then 20 rows should be returned, etc.
In this case @n should be set to:
DECLARE @n INT = ((SELECT COUNT(*) FROM itens) / 10 + 1) * 10
We can actually fit everything inside a single sql statement with the use of nested CTEs:
;WITH NumberOfRows AS (
SELECT n = ((SELECT COUNT(*) FROM itens) / 10 + 1) * 10
), Nulls AS (
SELECT 1 AS i
UNION ALL
SELECT i + 1 AS i
FROM Nulls
WHERE i < (SELECT n FROM NumberOfRows)
),
itemsWithNulls AS
(
SELECT * FROM itens
UNION ALL
SELECT NULL, NULL, NULL, NULL FROM Nulls
)
SELECT TOP (SELECT n FROM NumberOfRows) *
FROM itemsWithNulls
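One caveat worth adding: SQL Server recursive CTEs stop at 100 recursion levels by default, so if @n can exceed 100 the final SELECT needs a MAXRECURSION hint, e.g.:
SELECT TOP (SELECT n FROM NumberOfRows) *
FROM itemsWithNulls
OPTION (MAXRECURSION 0); -- 0 removes the default limit of 100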
This might work for you - use an arbitrary cross join to create a large number of null rows, and then union them back in with your real data. You'll need to pay extra attention to the ORDERING to ensure that it is the nulls at the bottom.
DECLARE @NumRows INT = 50;
SELECT TOP (@NumRows) *
FROM
(
SELECT * FROM itens
UNION ALL
SELECT TOP (@NumRows) NULL, NULL, NULL, NULL
FROM sys.objects o1 CROSS JOIN sys.objects o2
) x
ORDER BY CASE WHEN x.ID IS NULL THEN 9999 ELSE ID END
This is super simple. You use a tally as the main table in your query.
http://sqlfiddle.com/#!3/c3dba/20
You can read more about tally tables here.
http://www.sqlservercentral.com/articles/T-SQL/62867/
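The linked fiddle isn't reproduced here, so this is a hedged sketch of the idea, assuming a Tally(N) table with enough rows and an ID column in itens to order by; the itens column list is unknown, hence the *:
DECLARE @NumRows INT = ((SELECT COUNT(*) FROM itens) / 10 + 1) * 10;

WITH Numbered AS (
    SELECT i.*, ROW_NUMBER() OVER (ORDER BY i.ID) AS rn
    FROM itens i
)
SELECT nb.* -- drop the rn column in the real query
FROM (SELECT TOP (@NumRows) N FROM Tally ORDER BY N) t
LEFT JOIN Numbered nb ON nb.rn = t.N
ORDER BY t.N;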
In the context of a proc/script, you can do your initial query into a table variable or temp table, check @@ROWCOUNT or query the count of rows in the table, and then use a WHILE loop to populate the rest of the rows. Finally, select * from your table variable/temp table.
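A rough sketch of that pattern, under stated assumptions (the itens column names and types are guesses; the temp table is declared explicitly so every column is nullable):
CREATE TABLE #result (ID INT NULL, Nome VARCHAR(50) NULL, Qtde INT NULL, Valor INT NULL); -- guessed schema

INSERT INTO #result
SELECT * FROM itens;

-- round the row count up to the next multiple of 10
DECLARE @target INT = ((SELECT COUNT(*) FROM #result) / 10 + 1) * 10;

WHILE (SELECT COUNT(*) FROM #result) < @target
BEGIN
    INSERT INTO #result VALUES (NULL, NULL, NULL, NULL);
END

SELECT * FROM #result;

DROP TABLE #result;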