Recently I have very specific problem with data we get from our data-warehouse. The problem is being solved, but I have to edit our control environment for a while.
We have data about received invoices, however due to some reason, information about every invoice is split into two rows: First row has important columns unique_code_A, vendor_number, and the second row has important columns unique_code_B, amount. So every invoice has very specific unique code, and with this code I have to somehow join the information from both rows, as you can see in picture.
Well, you can use aggregation:
select date_key, invoice_type,
max(case when unique_code_b is null then unique_code_a end) as unique_code_a,
max(unique_code_b) as unique_code_b,
max(case when unique_code_b is null then vendor_number end) as vendor_number,
max(case when unique_code_b is not null then amount end) as amount
from t
group by date_key, invoice_type;
EDIT:
If the unique codes can be used for matching, then I would suggest:
select date_key, invoice_type,
coalesce(unique_code_a, unique_code_b) as unique_code,
max(case when unique_code_b is null then vendor_number end) as vendor_number,
max(case when unique_code_b is not null then amount end) as amount
from t
group by date_key, invoice_type, coalesce(unique_code_a, unique_code_b);
From what you told, a self join should probably work:
SELECT
A.DATE_KEY,
A.INVOICE_TYPE,
A.UNIQUE_CODE_A,
B.UNIQUE_CODE_B,
A.VENDOR_NUMBER,
B.AMOUNT
FROM MyTable A
INNER JOIN MyTable B ON A.UNIQUE_CODE_A=B.UNIQUE_CODE_B
Related
I have a large data sets of truck tickets which generate two lines of output per ticket. This is because the ticket has an "out" and "in" component to each ticket. I want to generate one line of output but include information from both the "out" portion of the ticket and the "in" portion.
SELECT Ticket_number, Oil_volume,Faciliy_ID,Ticket_Type
FROM Truckticket T
JOIN TBATTERY TB
ON TB.Battery_ID = T.Battery_ID
My Output has two lines:
Ticket_number
Oil_volume
Facility_ID
Ticket_type
1
10
SK01
O
1
10
SK02
I
Now, what I want my output to be when I use a where clause on Facility_ID SK01:
Ticket_number
Oil_volume
Facility_ID
Facility_ID
Ticket_type
1
10
SK01
SK02
O
I know I have to do this with a subquery or CTE to get Facility_ID SK02 on the same line, but I'm stuck. I hope I have presented my question ok, my first time. Thanks!`
Instead of a subquery or CTE, this looks like a good time to use aggregation and pivoting. With a small number of static types, a simple MAX(CASE... expression with a GROUP BY can pivot the rows to columns.
SELECT
Ticket_number,
MAX(CASE WHEN Ticket_Type = 'O' then Oil_volume else NULL END) Oil_volume_out,
MAX(CASE WHEN Ticket_Type = 'I' then Oil_volume else NULL END) Oil_volume_in,
MAX(CASE WHEN Ticket_Type = 'O' then Facility_ID else NULL END) Facility_ID_out,
MAX(CASE WHEN Ticket_Type = 'I' then Facility_ID else NULL END) Facility_ID_in,
FROM Truckticket T
JOIN TBATTERY TB
ON TB.Battery_ID = T.Battery_ID
GROUP BY Ticket_number;
I have SQL Query like this in SSMS
select distinct (b.TransactionNumber),
(case when b.Amount > 0 then c.total else 0 end) as 'Total Sales',
(case when b.TenderID = 1 then b.Amount else 0 end) as 'Cash',
(case when b.TenderID = 20 then b.Amount else 0 end) as 'Gift Certificates'
from [Transaction] c
inner join TenderEntry b on c.TransactionNumber = b.TransactionNumber
but the output is(see image for reference)
This should be the expected output(see image for reference)
I would expect one row per transaction number, especially given your use of select distinct:
select t.TransactionNumber, te.total as total_sales,
sum(case when t.TenderID = 1 then t.Amount else 0 end) as Cash,
sum(case when t.TenderID = 20 then t.Amount else 0 end) as Gift_Certificates
from TenderEntry te join
Transaction t
on te.TransactionNumber = t.TransactionNumber
group by t.TransactionNumber, te.total;
This produces one row per transaction.
Note the changes to the query:
The table aliases are meaningful (i.e. abbreviations of table names) rather than arbitrary letters.
The column aliases do not use single quotes. Only use single quotes for string and date constants.
The column aliases have been simplified so they do not need to be escaped.
It occurs to me that you might want to "list" the cash and gifts in the two columns. This would look like:
select TransactionNumber,
max(case when seqnum = 1 then total end) as total_sales,
sum(case when tenderId = 1 then amount end) as cash,
sum(case when tenderId = 20 then amount end) as Gift_Certificates
from (select t.TransactionNumber, te.total, t.amount, t.TenderID,
row_number() over (partition by t.TransactionNumber, t.TenderId order by t.amount) as seqnum
from TenderEntry te join
Transaction t
on te.TransactionNumber = t.TransactionNumber
where tenderid in (1, 20)
) x
group by t.TransactionNumber, seqnum;
This is only a partial answer, but I put it here because I cannot fit it into comments well enough.
It is likely that you do not want the DISTINCT component in the select. SELECT DISTINCT find all unique rows. So if you have 3 rows which all have some differences, it will show all three. If, on the other hand, there were two rows the same (e.g., they paid for a $100 item with two $50 vouchers) it would just ignore one of them.
Instead, you probably need to become familiar with 'GROUP BY' which allows you to find totals etc across multiple rows.
For example (although not tested)
select
(b.TransactionNumber),
(case when b.Amount > 0 then c.total else 0 end) as 'Total Sales',
SUM(case when b.TenderID = 1 then b.Amount else 0 end) as 'Cash',
SUM(case when b.TenderID = 20 then b.Amount else 0 end) as 'Gift Certificates'
from [Transaction] c
inner join TenderEntry b on c.TransactionNumber = b.TransactionNumber
GROUP BY (b.TransactionNumber, (case when b.Amount > 0 then c.total else 0 end))
In the above, I have removed the DISTINCT, added 'SUM' for the two transaction values (cash/certificates) and the GROUP BY across the TransactionNumber and Total sales (as that seems to be common across the transaction).
For the above data, what that would produce is 1 line of data for TransactionNumber = 1, with sales = 250 (as all the lines have that), and totals for cash and gift certificates (150 and 100 respectively if my maths is correct).
However, that is not your desired answer, - you want this transaction to go over two lines. Firstly, are you sure of that?
If you do, then we need another criteria by which to group them and you will need to specify that e.g., why does this transaction's report need to go over two lines rather than just one?
Other notes
'Transaction' is a special word within SQL Server. You are allowed to use it, but I would consider renaming the table.
I have two tables I am pulling data from. Here is a minimal recreation of what I have:
Select
Jobs.Job_Number,
Jobs.Total_Amount,
Job_Charges.Charge_Code,
Job_Charges.Charge_Amount
From
DB.Jobs
Inner Join
DB.Job_Charges
On
Jobs.Job_Number = Job_Charges.Job_Number;
So, what happens is that I end up getting a row for each different Charge_Code and Charge_Amount per Job_Number. Everything else on the row is the same. Is it possible to have it return something more like:
Job_Number - Total_Amount - Charge_Code[1] - Charge_Amount[1] - Charge_Code[2] - Charge_Amount[2]
ETC?
This way it creates one line per job number with each associated charge and amount on the same line. I have been reading through W3 but haven't been able to tell definitively if this is possible or not. Anything helps, thank you!
To pivot your resultset over a fixed number of columns, you can use row_number() and conditional aggregation:
select
job_number,
total_amount,
max(case when rn = 1 then charge_code end) charge_code1,
max(case when rn = 1 then charge_amount end) charge_amount1,
max(case when rn = 2 then charge_code end) charge_code2,
max(case when rn = 2 then charge_amount end) charge_amount2,
max(case when rn = 3 then charge_code end) charge_code3,
max(case when rn = 3 then charge_amount end) charge_amount3
from (
select
j.job_number,
j.total_amount,
c.charge_code,
c.charge_amount,
row_number() over(partition by job_number, total_amount order by c.charge_code) rn
from DB.Jobs j
inner join DB.Job_Charges c on j.job_number = c.job_number
) t
group by job_number, total_amount
The above query handes up to 3 charge codes and amounts par job number (ordered by job codes). You can expand the select clause with more max(case ...) expressions to handle more of them.
I have a sql code which I am using to do pivot. Code is as follows:
SELECT DISTINCT PersonID
,MAX(pivotColumn1)
,MAX(pivotColumn2) --originally these were in 2 separate rows)
FROM(SELECT srcID, PersonID, detailCode, detailValue) FROM src) AS SrcTbl
PIVOT(MAX(detailValue) FOR detailCode IN ([pivotColumn1],[pivotColumn2])) pvt
GROUP BY PersonID
In the source data the ID has 2 separate rows due to having its own ID which separates the values. I have now pivoted it and its still giving me 2 separate rows for the ID even though i grouped it and used aggregation on the pivot columns. Ay idea whats wrong with the code?
So I have all my possible detailCode listed in the IN clause. So I have null returned when the value is none but I want it all summarised in 1 row. See image below.
If those are all the options of detailCode , you can use conditional aggregation with CASE EXPRESSION instead of Pivot:
SELECT t.personID,
MAX(CASE WHEN t.detailCode = 'cas' then t.detailValue END) as cas,
MAX(CASE WHEN t.detailCode = 'buy' then t.detailValue END) as buy,
MAX(CASE WHEN t.detailCode = 'sel' then t.detailValue END) as sel,
MAX(CASE WHEN t.detailCode = 'pla' then t.detailValue END) as pla
FROM YourTable t
GROUP BY t.personID
So I'm trying to construct a query in BigQuery that I'm struggling with for a final part.
As of now I have:
SELECT
UNIQUE(Name) as SubscriptionName,
ID,
Interval,
COUNT(mantaSubscriptionIdmetadata) AS SubsPurchased,
SUM(RevenueGenerated) as RevenueGenerated
FROM (
SELECT
mantaSubscriptionIdmetadata,
planIdmetadata,
INTEGER(Amount) as RevenueGenerated
FROM
[sample_internal_data.charge0209]
WHERE
revenueSourcemetadata = 'new'
AND
Status = 'Paid'
GROUP BY
mantaSubscriptionIdmetadata,
planIdmetadata,
RevenueGenerated
)a
JOIN (
SELECT
id,
Name,
Interval
FROM
[sample_internal_data.subplans]
WHERE
id in ('150017','150030','150033','150019')
GROUP BY
id,
Name,
Interval )b
ON
a.planIdmetadata = b.id
GROUP BY
ID,
Interval,
Name
ORDER BY
Interval ASC
The resulting query looks like this
Which is exactly what I'm looking for up to that point.
Now what I'm stuck on this. There is another column I need to add called SalesRepName. The resulting field will either be null or not null. If its null it means it was sold online. If its not null, it means it was sold via telephone. What I want to do is create two additional columns where it says how many were sold via telesales and via online. The sum total of the two columns will always equal the SubsPurchased total.
Can anyone help?
You can include case statements within aggregate functions. Here you could choose sum(case when SalesRepName is null then 1 else 0 end) as online and sum(case when SalesRepName is not null then 1 else 0 end) as telesales.
count(case when SalesRepName is null then 1 end) as online would give the same result. Using sum in these situations is simply my personal preference.
Note that omitting the else clause is equivalent to setting else null, and null isn't counted by count. This can be very useful in combination with exact_count_distinct, which has no equivalent in terms of sum.
Try below:
it assumes your SalesRepName field is in [sample_internal_data.charge0209] table
and then it uses "tiny version" of SUM(CASE ... WHEN ...) which works when you need 0 or 1 as a result to be SUM'ed
SUM(SalesRepName IS NULL) AS onlinesales,
SUM(NOT SalesRepName IS NULL) AS telsales
SELECT
UNIQUE(Name) AS SubscriptionName,
ID,
Interval,
COUNT(mantaSubscriptionIdmetadata) AS SubsPurchased,
SUM(RevenueGenerated) AS RevenueGenerated,
SUM(SalesRepName IS NULL) AS onlinesales,
SUM(NOT SalesRepName IS NULL) AS telesales
FROM (
SELECT SalesRepName, mantaSubscriptionIdmetadata, planIdmetadata, INTEGER(Amount) AS RevenueGenerated
FROM [sample_internal_data.charge0209]
WHERE revenueSourcemetadata = 'new'
AND Status = 'Paid'
GROUP BY mantaSubscriptionIdmetadata, planIdmetadata, RevenueGenerated
)a
JOIN (
SELECT id, Name, Interval
FROM [sample_internal_data.subplans]
WHERE id IN ('150017','150030','150033','150019')
GROUP BY id, Name, Interval
)b
ON a.planIdmetadata = b.id
GROUP BY ID, Interval, Name
ORDER BY Interval ASC