Transposing Column Data into Row Data SQL - sql

I have some data that looks like this in an SQL table.
[ID],[SettleDate],[Curr1],[Curr2][Quantity1],[Quantity2],[CashAmount1],[CashAmount2]
The issue i have, i need to create 2 records from this data (all information from 1 and all information of 2). Example below.
[ID],[SettleDate],[Curr1],[Quantity1],[CashAmount1]
[ID],[SettleDate],[Curr2],[Quantity2],[CashAmount2]
Does anyone have an ideas how to do so?
Thanks

A standard (ie cross-RDBMS) solution for this is to use union:
select ID, SettleDate, Curr1, Quantity1, CashAmount1 from mytable
union all select ID, SettleDate, Curr2, Quantity2, CashAmount2 from mytable
Depending on your RBDMS, neater solutions might be available.

Just another option. The ItemNbr 1/2 is just to maintain which element.
Select A.[ID]
,A.[SettleDate]
,B.*
From YourTable A
Cross Apply ( values (1,[Curr1],[Quantity1],[CashAmount1])
,(2,[Curr2],[Quantity2],[CashAmount2])
) B{ItemNbr,Curr,Quantity,CashAmount)

Related

Can I use string_split with enforcing combination of labels?

So I have the following table:
Id Name Label
---------------------------------------
1 FirstTicket bike|motorbike
2 SecondTicket bike
3 ThirdTicket e-bike|motorbike
4 FourthTicket car|truck
I want to use string_split function to identify rows that have both bike and motorbike labels.
So the desired output in my example will be just the first row:
Id Name Label
--------------------------------------
1 FirstTicket bike|motorbike
Currently, I am using the following query but it is returning row 1,2 and 3. I only want the first. Is it possible?
SELECT Id, Name, Label FROM tickets
WHERE EXISTS (
SELECT * FROM STRING_SPLIT(Label, '|')
WHERE value IN ('bike', 'motorbike')
)
You can use APPLY & do aggregation :
SELECT t.id, t.FirstTicket, t.Label
FROM tickets t CROSS APPLY
STRING_SPLIT(t.Label, '|') t1
WHERE t1.value IN ('bike', 'motorbike')
GROUP BY t.id, t.FirstTicket, t.Label
HAVING COUNT(DISTINCT t1.value) = 2;
However, this breaks the normalization rules you should have separate table tickets.
You could just use string functions for this:
select t.*
from mytable t
where
'|' + label + '|' like '%|bike|%'
and '|' + label + '|' like '%|motorbike|%'
I would expect this to be more efficient than other methods that split and aggregate.
Please note, however, that you should really consider fixing your data model. Instead of storing delimited lists, you should have a separated table to represent the relation between tickets and labels, with one row per ticket/label tuple. Storing delimited lists in database column is a well-know SQL antipattern, that should be avoided at all cost (hard to maintain, hard to query, hard to enforce data integrity, inefficicent, ...). You can have a look at this famous SO post for more on this topic.
Yogesh beat me to it; my solution is similar but with a HUGE performance improvement worth pointing out. We'll start with this sample data:
SET NOCOUNT ON;
IF OBJECT_ID('tempdb..#tickets','U') IS NOT NULL DROP TABLE #tickets;
CREATE TABLE #tickets (Id INT, [Name] VARCHAR(50), Label VARCHAR(1000));
INSERT #tickets (Id, [Name], Label)
VALUES
(1,'FirstTicket' , 'bike|motorbike'),
(2,'SecondTicket', 'bike'),
(3,'ThirdTicket' , 'e-bike|motorbike'),
(4,'FourthTicket', 'car|truck'),
(5,'FifthTicket', 'motorbike|bike');
Now the original and much improved version:
-- Original
SELECT t.id, t.[Name], t.Label
FROM #tickets AS t
CROSS APPLY STRING_SPLIT(t.Label, '|') t1
WHERE t1.[value] IN ('bike', 'motorbike')
GROUP BY t.id, t.[Name], t.Label
HAVING COUNT(DISTINCT t1.[value]) = 2;
-- Improved Version Leveraging APPLY to avoid a sort
SELECT t.Id, t.[Name], t.Label
FROM #tickets AS t
CROSS APPLY
(
SELECT 1
FROM STRING_SPLIT(t.Label,'|') AS split
WHERE split.[value] IN ('bike','motorbike')
HAVING COUNT(*) = 2
) AS isMatch(TF);
Now the execution plans:
If you compare the costs: the "sortless" version is query 4.36 times faster than the original. In reality it's more because, with the first version, we're not just sorting, we are sorting three columns - an int and two (n)varchars. Because sorting costs are N * LOG(N), the original query gets exponentially slower the more rows you throw at it.

Modify my SQL Server query -- returns too many rows sometimes

I need to update the following query so that it only returns one child record (remittance) per parent (claim).
Table Remit_To_Activate contains exactly one date/timestamp per claim, which is what I wanted.
But when I join the full Remittance table to it, since some claims have multiple remittances with the same date/timestamps, the outermost query returns more than 1 row per claim for those claim IDs.
SELECT * FROM REMITTANCE
WHERE BILLED_AMOUNT>0 AND ACTIVE=0
AND REMITTANCE_UUID IN (
SELECT REMITTANCE_UUID FROM Claims_Group2 G2
INNER JOIN Remit_To_Activate t ON (
(t.ClaimID = G2.CLAIM_ID) AND
(t.DATE_OF_LATEST_REGULAR_REMIT = G2.CREATE_DATETIME)
)
where ACTIVE=0 and BILLED_AMOUNT>0
)
I believe the problem would be resolved if I included REMITTANCE_UUID as a column in Remit_To_Activate. That's the REAL issue. This is how I created the Remit_To_Activate table (trying to get the most recent remittance for a claim):
SELECT MAX(create_datetime) as DATE_OF_LATEST_REMIT,
MAX(claim_id) AS ClaimID,
INTO Latest_Remit_To_Activate
FROM Claims_Group2
WHERE BILLED_AMOUNT>0
GROUP BY Claim_ID
ORDER BY Claim_ID
Claims_Group2 contains these fields:
REMITTANCE_UUID,
CLAIM_ID,
BILLED_AMOUNT,
CREATE_DATETIME
Here are the 2 rows that are currently giving me the problem--they're both remitts for the SAME CLAIM, with the SAME TIMESTAMP. I only want one of them in the Remits_To_Activate table, so only ONE remittance will be "activated" per Claim:
enter image description here
You can change your query like this:
SELECT
p.*, latest_remit.DATE_OF_LATEST_REMIT
FROM
Remittance AS p inner join
(SELECT MAX(create_datetime) as DATE_OF_LATEST_REMIT,
claim_id,
FROM Claims_Group2
WHERE BILLED_AMOUNT>0
GROUP BY Claim_ID
ORDER BY Claim_ID) as latest_remit
on latest_remit.claim_id = p.claim_id;
This will give you only one row. Untested (so please run and make changes).
Without having more information on the structure of your database -- especially the structure of Claims_Group2 and REMITTANCE, and the relationship between them, it's not really possible to advise you on how to introduce a remittance UUID into DATE_OF_LATEST_REMIT.
Since you are using SQL Server, however, it is possible to use a window function to introduce a synthetic means to choose among remittances having the same timestamp. For example, it looks like you could approach the problem something like this:
select *
from (
select
r.*,
row_number() over (partition by cg2.claim_id order by cg2.create_datetime desc) as rn
from
remittance r
join claims_group2 cg2
on r.remittance_uuid = cg2.remittance_uuid
where
r.active = 0
and r.billed_amount > 0
and cg2.active = 0
and cg2.billed_amount > 0
) t
where t.rn = 1
Note that that that does not depend on your DATE_OF_LATEST_REMIT table at all, it having been subsumed into the inline view. Note also that this will introduce one extra column into your results, though you could avoid that by enumerating the columns of table remittance in the outer select clause.
It also seems odd to be filtering on two sets of active and billed_amount columns, but that appears to follow from what you were doing in your original queries. In that vein, I urge you to check the results carefully, as lifting the filter conditions on cg2 columns up to the level of the join to remittance yields a result that may return rows that the original query did not (but never more than one per claim_id).
A co-worker offered me this elegant demonstration of a solution. I'd never used "over" or "partition" before. Works great! Thank you John and Gaurasvsa for your input.
if OBJECT_ID('tempdb..#t') is not null
drop table #t
select *, ROW_NUMBER() over (partition by CLAIM_ID order by CLAIM_ID) as ROW_NUM
into #t
from
(
select '2018-08-15 13:07:50.933' as CREATE_DATE, 1 as CLAIM_ID, NEWID() as
REMIT_UUID
union select '2018-08-15 13:07:50.933', 1, NEWID()
union select '2017-12-31 10:00:00.000', 2, NEWID()
) x
select *
from #t
order by CLAIM_ID, ROW_NUM
select CREATE_DATE, MAX(CLAIM_ID), MAX(REMIT_UUID)
from #t
where ROW_NUM = 1
group by CREATE_DATE

Count instances of value (say, '4') in several columns/ rows

I have survey responses in a SQL database. Scores are 1-5.
Current format of the data table is this:
Survey_id, Question_1, Question_2, Question_3
383838, 1,1,1
392384, 1,5,4
393894, 4,3,5
I'm running a new query where I need % 4's, % 5's ... question doesn't matter, just overall.
At first glance I'm thinking
sum(iif(Question_1 =5,1,0)) + sum(iif(Question_2=5,1,0)) .... as total5s
sum(iif(Question_1=4,1,0)) + sum(iif(Question_2=4,1,0)) .... as total4s
But I am unsure if this is the quickest or most elegant way to achieve this.
EDIT: Hmm on first test this query already appears not to work correctly
EDIT2: I think I need sum instead of count in my example, will edit.
You have to unpivot the data and calculate the % responses thereafter. Because there are a limited number of questions, you can use union all to unpivot the data.
select 100.0*count(case when question=4 then 1 end)/count(*) as pct_4s
from (select survey_id,question_1 as question from tablename
union all
select survey_id,question_2 from tablename
union all
select survey_id,question_3 from tablename
) responses
Another way to do this could be
select 100.0*(count(case when question_1=4 then 1 end)
+count(case when question_2=4 then 1 end)
+count(case when question_3=4 then 1 end))
/(3*count(*))
from tablename
With unpivot as #Dudu suggested,
with unpivoted as (select *
from tablename
unpivot (response for question in (question_1,question_2,question_3)) u
)
select 100.0*count(case when response=4 then 1 end)/count(*)
from unpivoted

How to form an SQL query that is like using 'IN' across two columns as pairs?

I have two integer columns and I wish to select rows with particular pairings of values. What SQL syntax can I use? For example, using IN it might look something like this if IN supported this syntax:
select *
from myTable
where value1, value2 in ((2,3), (3,4), (2,5), (3,6))
To select those rows with
value1 == 2 and value2 == 3 or value1==3 and value2==4 or 2/5 or 3/6.
I'm using a proprietary SQL system, so basic SQL is preferred. Or if there is none, having a statement that works in some standard SQL would be useful as well.
select yourtable.*
from yourtable
inner join
(
select 2 as v1, 3 as v2
union select 3,4
union select 2,5
union select 3,6
) pairs
on yourtable.value1 = pairs.v1
and yourtable.value2 = pairs.v2
Well in SQL Server you can't use IN that way unfortunately. I think your best bet is going to be to write it out like you did below your code sample or to load your data into a CTE or something and then joining on that.
same can be achieved by using VALUES
select table_name.*
from table_name tn,
(values(2,3), (3,4), (2,5), (3,6) ) as val(v1,v2)
where tn.value1 = val.v1 and tn.value2 = val.v2

differentiate rows in a union table

I m selecting data from two different tables with no matching columns using this sql query
select * from (SELECT s.shout_id, s.user_id, s.time FROM shouts s
union all
select v.post_id, v.sender_user_id, v.time from void_post v)
as derived_table order by time desc;
Now is there any other way or with this sql statement only can i
differentiate the data from the two tables.
I was thinking of a dummy row that can be created at run-time(in the select statement only ) which would flag the row from the either tables.
As there is no way i can differentiate the shout_id that is thrown in the unioned table is
shout_id from the shout table or from the void_post table.
Thanks
Pradyut
You can just include an extra column in each select (I'd suggest a BIT)
select * from
(SELECT s.shout_id, s.user_id, s.time, 1 AS FromShouts FROM shouts s
union all
select v.post_id, v.sender_user_id, v.time, 0 AS FromShouts from void_post v)
as derived_table order by time desc;
Sure, just add a new field in your select statement called something like source with a different constant value for each source.
SELECT s.shout_id, s.user_id, s.time, 'shouts' as source FROM shouts s
UNION ALL
SELECT v.post_id, v.sender_user_id, v.time, 'void_post' as source FROM void_post v
A dummy variable is a nice way to do it. There isn't much overhead in the grand scheme of things.
p.s., the dummy variable represents a column and not a row.