Change direction of table in BigQuery - google-bigquery

I have a dataset [lipid] that was extracted from an electronic medical record system (EMRS). In that EMRS, a physician orders a laboratory blood profile for a patient under a unique order number, but with different service types. So, if one order has 4 service types, the EMRS records the event on 4 rows (the same order number in the Order_no column, but a different service type in the Service_type column), like this:
Order_no  Service_type  Result
1         TC            230
1         HDL           40
1         TG            150
1         LDL           90
Sometimes an order has fewer than 4 service types, so the data looks like this:
Order_no  Service_type  Result
1         TC            230
1         HDL           40
1         TG            150
1         LDL           90
2         TC            230
2         HDL           40
4         TC            230
4         HDL           40
4         LDL           90
5         TC            230
5         TG            150
5         LDL           90
6         TC            230
8         TC            230
8         HDL           40
8         TG            150
8         LDL           90
What I'm trying to do is write a query that keeps the Order_no column, turns the service types into columns, and merges the rows that share an order number into one row, like this:
Order_no  TC   HDL  TG   LDL
1         230  40   150  90
2         250  66
4         199  39        99
5         299       45   190
6         400
8         400  40   250  290
How can I write this query in Google BigQuery?

Use the approach below:
select * from your_table
pivot (any_value(Result) for Service_type in ('TC', 'HDL', 'TG', 'LDL'))
If the service types are not known in advance, you can build the list dynamically:
execute immediate (select '''
select * from your_table
pivot (any_value(Result) for Service_type in (''' || string_agg(distinct "'" || Service_type || "'") ||
"))"
from your_table
)

You can use PIVOT.
Example:
WITH your_table AS
(
SELECT 1 AS Order_no, 'TC' AS Service_type, 230 AS Result
UNION ALL
SELECT 1, 'HDL', 40
UNION ALL
SELECT 1, 'TG', 150
UNION ALL
SELECT 1, 'LDL', 90
)
SELECT *
FROM your_table PIVOT(SUM(Result) FOR Service_type IN ('TC', 'HDL', 'TG', 'LDL'))
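
For anyone who wants to try the reshaping without a BigQuery project, the same result can be approximated with conditional aggregation, which is the portable pattern that a pivot boils down to. The sketch below is not BigQuery code: it assumes an in-memory SQLite database driven from Python's sqlite3 module, a table named lipid, and a few rows copied from the question. The function name pivot_orders is made up for the illustration.

```python
import sqlite3

# A few rows in the long (Order_no, Service_type, Result) shape from the question.
ROWS = [
    (1, "TC", 230), (1, "HDL", 40), (1, "TG", 150), (1, "LDL", 90),
    (2, "TC", 230), (2, "HDL", 40),
    (6, "TC", 230),
]

# One MAX(CASE ...) per output column; a missing service type yields NULL.
PIVOT_SQL = """
SELECT Order_no,
       MAX(CASE WHEN Service_type = 'TC'  THEN Result END) AS TC,
       MAX(CASE WHEN Service_type = 'HDL' THEN Result END) AS HDL,
       MAX(CASE WHEN Service_type = 'TG'  THEN Result END) AS TG,
       MAX(CASE WHEN Service_type = 'LDL' THEN Result END) AS LDL
FROM lipid
GROUP BY Order_no
ORDER BY Order_no
"""


def pivot_orders(rows):
    """Load rows into an in-memory SQLite table and return one row per order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE lipid (Order_no INTEGER, Service_type TEXT, Result INTEGER)"
    )
    conn.executemany("INSERT INTO lipid VALUES (?, ?, ?)", rows)
    result = conn.execute(PIVOT_SQL).fetchall()
    conn.close()
    return result


for row in pivot_orders(ROWS):
    print(row)
```

Orders that lack a service type come back with None (NULL) in that column, which matches the empty cells in the desired output.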

Related

ORACLE DB query - maximum stipend in faculty

I have a university assignment to get the maximum stipend in each faculty from a table of stipends.
Faculty table is:
ID_FACULTY FACULTY_NAME                   DEAN                 TELEPHON
---------- ------------------------------ -------------------- --------
10         Informacijas tehnologiju       Vitols               63023095
11         Lauksaimniecibas               Gaile                63022584
12         Tehniska                       Dukulis              53020762
13         Partikas tehnologijas          Sabovics             63021075
Money table is:
ID_PAYOUT  STUDENT_ID PAYOUT_DA STIPEND    COMPENSATION
---------- ---------- --------- ---------- ------------
100        1          24-SEP-20 45.25      15
101        7          20-SEP-20 149.99     0
102        3          18-SEP-20 100        0
103        17         02-SEP-20 90.85      20
104        9          03-SEP-20 85         20
105        19         09-SEP-20 70.75      0
106        25         15-SEP-20 55         15
107        17         17-SEP-20 105.54     0
108        15         22-SEP-20 94         0
109        27         28-SEP-20 100        20
And the student table is:
ID_STUDENT SURNAME                   NAME                 COURSE_YEAR FACULTY_ID BIRTHDATE
---------- ------------------------- -------------------- ----------- ---------- ---------
1          Lapa                      Juris                4           13         27-SEP-96
3          Vilkauss                  Fredis               2           10         17-MAY-99
5          Karlsone                  Rasa                 1           11         13-MAR-00
7          Grozitis                  Guntars              3           12         16-APR-97
9          Sonciks                   Jurgis               2           10         17-MAR-99
11         Berzajs                   Olafs                3           10         14-FEB-97
13         Vike                      Ilvija               2           13         14-MAY-99
15         Baure                     Inga                 3           11         12-APR-97
17         Viskers                   Zigmunds             2           13         15-AUG-99
19         Talmanis                  Harijs               3           13         15-JUL-97
21         Livmanis                  Indulis              1           10         19-JAN-00
23         Shaveja                   Uva                  2           13         18-FEB-98
25         Lacis                     Guntis               4           10         17-SEP-96
27         Liepa                     Guna                 4           11         18-AUG-96
29         Klava                     Juris                2           10         19-MAY-98
I have tried many variations of the query, and I think I have tried every possible combination of joins, but I cannot get the necessary result.
One of my queries looked like this:
SQL> SELECT ROW_NUMBER() OVER (ORDER BY surname) "Nr.",
f.faculty_name,
s.surname,
s.name,
MAX(m.stipend)
FROM faculty f, student s INNER JOIN money m ON s.id_student = m.student_id
WHERE s.faculty_id = f.id_faculty
GROUP BY f.faculty_name, s.surname, s.name
ORDER BY s.surname;
Which returned me the following result:
Nr.        FACULTY_NAME                   SURNAME                   NAME                 MAX(M.STIPEND)
---------- ------------------------------ ------------------------- -------------------- --------------
1          Lauksaimniecibas               Baure                     Inga                 94
2          Tehniska                       Grozitis                  Guntars              149.99
3          Informacijas tehnologiju       Lacis                     Guntis               55
4          Partikas tehnologijas          Lapa                      Juris                45.25
5          Lauksaimniecibas               Liepa                     Guna                 100
6          Informacijas tehnologiju       Sonciks                   Jurgis               85
7          Partikas tehnologijas          Talmanis                  Harijs               70.75
8          Informacijas tehnologiju       Vilkauss                  Fredis               100
9          Partikas tehnologijas          Viskers                   Zigmunds             105.54
9 rows selected.
So the goal of this task is to retrieve the maximum amount of stipend granted to a student in a certain faculty.
Can someone please tell me what I am doing wrong here?
Just max amount per faculty:
SELECT
f.faculty_name,
MAX(m.stipend)
FROM
faculty f
INNER JOIN student s ON s.faculty_id = f.id_faculty
INNER JOIN money m ON s.id_student = m.student_id
GROUP BY f.faculty_name
Max amount and all other details too:
SELECT * FROM
(
SELECT
ROW_NUMBER() OVER (PARTITION BY f.faculty_name ORDER BY m.stipend desc) rn,
f.*,
s.*,
m.*
FROM
faculty f
INNER JOIN student s ON s.faculty_id = f.id_faculty
INNER JOIN money m ON s.id_student = m.student_id
) x
WHERE x.rn = 1
Points of note:
Do not use old-style joins. If you ever write table_name, other_table_name in a FROM clause, you are using an old-style join. Don't do it; explicit JOIN syntax replaced them about 30 years ago.
When you have a max-n-per-group problem, you specify how finely detailed the group is. If you GROUP BY s.surname, s.name, f.faculty_name, then your groups are "every unique combination of surname/name/faculty", so the only way you'll get multiple items in a group is if there are two John Smiths in Mathematics. If the group is to be the whole of Mathematics, then the faculty name (and anything else that is uniquely related 1:1 to it, like the faculty ID) is all that you can put in your GROUP BY. Anything not in the GROUP BY must be inside an aggregate, like MAX.
When you want other details too, you either group and max the data and then join this group-maxed data back to the original data to use it as a filter, or you use an approach like the one here, with ROW_NUMBER or RANK over a partition (which is like an auto-joined grouped summary). There is no GROUP BY here; the row numbering acts like a group because it restarts from 1 for every different faculty and increments as the stipend decreases. This means that the highest stipend is always in row number 1.
Unlike a group-max that you join back to get the detail, the ROW_NUMBER route does not produce duplicate rows when several stipends are tied for highest.
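
The tie behaviour is easier to see on a tiny example. The sketch below is an assumption-laden illustration rather than Oracle code: it runs on SQLite (3.25 or later, for window functions) through Python's sqlite3 module, uses a single already-joined table that I have called payout, and includes an invented tie at 100 in one faculty. The extra surname sort key in the ROW_NUMBER ordering is added only to make the tie break deterministic.

```python
import sqlite3

# Hypothetical, already-joined rows: (faculty_name, surname, stipend).
# The two 100 rows in "Tehniska" are invented to show the tie behaviour.
PAYOUT_ROWS = [
    ("Informacijas tehnologiju", "Vilkauss", 100),
    ("Informacijas tehnologiju", "Sonciks", 85),
    ("Tehniska", "Grozitis", 100),
    ("Tehniska", "Berzajs", 100),
]

# Route 1: number rows within each faculty, highest stipend first, keep row 1.
ROW_NUMBER_SQL = """
SELECT faculty_name, surname, stipend
FROM (
    SELECT p.*,
           ROW_NUMBER() OVER (PARTITION BY faculty_name
                              ORDER BY stipend DESC, surname) AS rn
    FROM payout p
) x
WHERE rn = 1
ORDER BY faculty_name
"""

# Route 2: compute the max per faculty, then join it back as a filter.
JOIN_BACK_SQL = """
SELECT p.faculty_name, p.surname, p.stipend
FROM payout p
INNER JOIN (
    SELECT faculty_name, MAX(stipend) AS max_stipend
    FROM payout
    GROUP BY faculty_name
) g ON g.faculty_name = p.faculty_name AND g.max_stipend = p.stipend
ORDER BY p.faculty_name, p.surname
"""


def run_query(sql, rows):
    """Run sql against a fresh in-memory payout table holding rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE payout (faculty_name TEXT, surname TEXT, stipend INTEGER)")
    conn.executemany("INSERT INTO payout VALUES (?, ?, ?)", rows)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


top_by_row_number = run_query(ROW_NUMBER_SQL, PAYOUT_ROWS)  # one row per faculty
top_by_join_back = run_query(JOIN_BACK_SQL, PAYOUT_ROWS)    # tied rows all survive
print(len(top_by_row_number), len(top_by_join_back))
```

With this data the ROW_NUMBER query returns two rows (one per faculty) and the join-back query returns three, because both tied students in the second faculty match the maximum.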

calculate Count and Sum from two different table with group by without using inner query

I have two tables. Table A has the columns id, phone_number, refer_amount,
and table B has the columns phone_number, transaction_amount.
I want the sum() of refer_amount and transaction_amount and the count() of phone_number from both tables, grouped by phone_number, without using an inner query.
Table A
phone_number refer_amount
123 50
456 80
789 90
123 90
123 80
123 20
456 20
456 79
456 49
123 49
Table B
phone_number transaction_amount
123 50
123 51
123 79
456 22
456 11
456 78
456 66
456 88
456 88
456 66
789 66
789 23
789 78
789 46
I have tried the following query, but it gives the wrong output:
SELECT a.phone_number,COUNT(a.phone_number) AS refer_count,SUM(a.refer_amount) AS refer_amount,b.phone_number,COUNT(b.phone_number) AS toal_count,SUM(b.transaction_amount) AS transaction_amount FROM dbo.A AS a,dbo.B AS b WHERE a.phone_number=b.phone_number GROUP BY a.phone_number,b.phone_number
output (wrong):
phone_number refer_count refer_amount phone_number transaction_count transaction_amount
123 15 867 123 15 900
456 28 1596 456 28 1676
789 5 450 789 5 291
output (That I want):
phone_number refer_count refer_amount phone_number transaction_count transaction_amount
123 5 289 123 3 180
456 4 228 456 7 419
789 1 90 789 5 291
I would do the aggregations on the B table in a separate subquery, and then join to it:
SELECT
a.phone_number,
COUNT(a.phone_number) AS a_cnt,
SUM(a.refer_amount) AS a_sum,
COALESCE(b.b_cnt, 0) AS b_cnt,
COALESCE(b.b_sum, 0) AS b_sum
FROM A a
LEFT JOIN
(
SELECT
phone_number,
COUNT(*) AS b_cnt,
SUM(transaction_amount) AS b_sum
FROM B
GROUP BY phone_number
) b
ON a.phone_number = b.phone_number
GROUP BY a.phone_number, b.b_cnt, b.b_sum;
One major potential issue with your current approach is that the join could result in duplicate counting, as a given phone_number record in the A table gets replicated due to the join.
Speaking of joins, note that above I use an explicit join, rather than the implicit one you were using. In general, you should not put commas into the FROM clause.
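
The duplicate counting is easy to reproduce on a few rows. The sketch below is an illustration under assumptions, not SQL Server code: it uses SQLite through Python's sqlite3 module, tables renamed to table_a and table_b, and a small subset of the question's data. It aggregates each table on its own before joining, which is one way to keep the two row counts from multiplying.

```python
import sqlite3

# Small subset of the question's data: (phone_number, amount).
A_ROWS = [(123, 50), (123, 90), (456, 80)]
B_ROWS = [(123, 50), (123, 51), (123, 79), (456, 22)]

# Joining first pairs every A row with every matching B row,
# so counts and sums are inflated.
NAIVE_SQL = """
SELECT a.phone_number,
       COUNT(a.phone_number), SUM(a.refer_amount),
       COUNT(b.phone_number), SUM(b.transaction_amount)
FROM table_a a
INNER JOIN table_b b ON a.phone_number = b.phone_number
GROUP BY a.phone_number
ORDER BY a.phone_number
"""

# Aggregating each side first gives one row per phone number before the join.
PRE_AGGREGATED_SQL = """
SELECT a.phone_number, a.a_cnt, a.a_sum,
       COALESCE(b.b_cnt, 0), COALESCE(b.b_sum, 0)
FROM (SELECT phone_number, COUNT(*) AS a_cnt, SUM(refer_amount) AS a_sum
      FROM table_a GROUP BY phone_number) a
LEFT JOIN (SELECT phone_number, COUNT(*) AS b_cnt, SUM(transaction_amount) AS b_sum
           FROM table_b GROUP BY phone_number) b
       ON a.phone_number = b.phone_number
ORDER BY a.phone_number
"""


def summarize(sql):
    """Run sql against fresh in-memory copies of the two sample tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_a (phone_number INTEGER, refer_amount INTEGER)")
    conn.execute("CREATE TABLE table_b (phone_number INTEGER, transaction_amount INTEGER)")
    conn.executemany("INSERT INTO table_a VALUES (?, ?)", A_ROWS)
    conn.executemany("INSERT INTO table_b VALUES (?, ?)", B_ROWS)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


print(summarize(NAIVE_SQL))           # inflated: 2 x 3 = 6 joined rows for 123
print(summarize(PRE_AGGREGATED_SQL))  # per-table counts and sums
```

For phone number 123 the naive join reports a count of 6 on both sides, while the pre-aggregated version reports 2 rows in A and 3 rows in B.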
This can help. You don't need sum(b.phone_number) when checking for a.phone_number = b.phone_number. Distinct is needed for phone number as there are two columns to consider.
With GROUP BY, any selected column that is not inside an aggregate function must appear in the GROUP BY clause.
select a.phone_number, count(distinct a.phone_number), sum(a.refer_amount),
sum (b.transaction_amount)
from A as a, B as b
where a.phone_number=b.phone_number
group by a.phone_number

PostgreSQL - Group by filter out specific rows

I have 3 tables in a Postgres 9.5 DB like below,
threshold
id    threshold_amount
----------------------
111   100
112   200
113   80
customers - each customer has a threshold_id of threshold table
id    customer_name   threshold_id
--------------------------------
313   abc             111
314   xyz             112
315   pqr             113
charges - per customer there is charges so this table has customer_id
id    customer_id   amount   post_date
------------------------------------
211   313           50       4/1/2017
212   313           50       4/30/2017
213   313           50       5/15/2017
214   314           100      3/1/2017
215   314           50       3/21/2017
216   314           50       4/21/2017
217   314           100      5/1/2017
218   315           80       5/5/2017
I want to query these tables and return, for each customer, the post_date at which the running sum( amount ), taken in ascending order of the charges.id column, equals threshold_amount.
The result set should look like this:
customer_id post_date
-----------------------
313 4/30/2017
314 4/21/2017
315 5/5/2017
I've tried sum( amount ) with group by customer_id, calling a separate stored procedure from the select clause and passing it the amount, post_date and threshold_amount. The procedure inserted the post_date into a temp table when the condition matched, and I then read that temp table back, but this does not seem like a valid approach. Is there another solution, or can I do it in a single query?
Thanks
Your question is asking about an exact match for the threshold. This is basically a cumulative sum:
select cct.*
from (select ch.customer_id, ch.post_date, ch.amount,
sum(ch.amount) over (partition by ch.customer_id order by ch.post_date) as running_amount,
t.threshold_amount
from charges ch join
customers c
on ch.customer_id = c.id join
threshold t
on c.threshold_id = t.id
) cct
where running_amount = threshold_amount;
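
To check the cumulative-sum idea against the question's data, here is a small sketch. It is not PostgreSQL: it assumes SQLite 3.25 or later (for window functions) via Python's sqlite3 module, and it stores post_date as ISO text (2017-04-30 instead of 4/30/2017), which is my own choice for the illustration. The running total is ordered by charges.id, as the question asks.

```python
import sqlite3

THRESHOLDS = [(111, 100), (112, 200), (113, 80)]
CUSTOMERS = [(313, "abc", 111), (314, "xyz", 112), (315, "pqr", 113)]
# (id, customer_id, amount, post_date) with dates rewritten as ISO strings.
CHARGES = [
    (211, 313, 50, "2017-04-01"), (212, 313, 50, "2017-04-30"),
    (213, 313, 50, "2017-05-15"), (214, 314, 100, "2017-03-01"),
    (215, 314, 50, "2017-03-21"), (216, 314, 50, "2017-04-21"),
    (217, 314, 100, "2017-05-01"), (218, 315, 80, "2017-05-05"),
]

# Running total per customer in charge-id order, kept only where it
# lands exactly on that customer's threshold.
RUNNING_SUM_SQL = """
SELECT customer_id, post_date
FROM (
    SELECT ch.customer_id, ch.post_date,
           SUM(ch.amount) OVER (PARTITION BY ch.customer_id
                                ORDER BY ch.id) AS running_amount,
           t.threshold_amount
    FROM charges ch
    JOIN customers c ON ch.customer_id = c.id
    JOIN threshold t ON c.threshold_id = t.id
) cct
WHERE running_amount = threshold_amount
ORDER BY customer_id
"""


def threshold_dates():
    """Return (customer_id, post_date) where the running sum hits the threshold."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE threshold (id INTEGER, threshold_amount INTEGER)")
    conn.execute("CREATE TABLE customers (id INTEGER, customer_name TEXT, threshold_id INTEGER)")
    conn.execute("CREATE TABLE charges (id INTEGER, customer_id INTEGER, amount INTEGER, post_date TEXT)")
    conn.executemany("INSERT INTO threshold VALUES (?, ?)", THRESHOLDS)
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", CUSTOMERS)
    conn.executemany("INSERT INTO charges VALUES (?, ?, ?, ?)", CHARGES)
    result = conn.execute(RUNNING_SUM_SQL).fetchall()
    conn.close()
    return result


print(threshold_dates())
```

On the sample data this yields one row per customer, on the dates listed in the expected result set. Note that an exact-match filter returns nothing for a customer whose running total jumps past the threshold without ever equalling it.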
try this:
select
c.customer_id,
c.post_date
from charges c
join customers cu on cu.id = c.customer_id
join threshold t on t.id = cu.threshold_id
where (select sum(cc.amount) from charges cc where cc.id <= c.id
and cc.customer_id = c.customer_id) = t.threshold_amount

Oracle 10g unpivot returning values in one column and creating month column

We just found out that the new database we have been given access to is Oracle 10g, so we are unable to use functions like UNPIVOT.
We have a table like this..
SUBMISSION  COUNTRY CPM_ID PFM_ID T_AREA CNTRY_CODE V_TYPE RES_CAT JAN_2014 FEB_2014
01-JUN-2014 USA     10     24     TEST1  USA        V1     210     5        10
01-AUG-2014 UK      20     30     TEST2  UK         V1     213     20       30
The desired output would look like this...
SUBMISSION  COUNTRY CPM_ID PFM_ID T_AREA CNTRY_CODE V_TYPE RES_CAT MONTH       VALUE
01-JUN-2014 USA     10     24     TEST1  USA        V1     210     01-JAN-2014 5
01-JUN-2014 USA     10     24     TEST1  USA        V1     210     01-FEB-2014 10
01-AUG-2014 UK      20     30     TEST2  UK         V1     213     01-JAN-2014 20
01-AUG-2014 UK      20     30     TEST2  UK         V1     213     01-FEB-2014 30
I am working with a query like this...but I cannot get the month column to come out right...
select *
from (select t.submission,
t.country,
t.cpm_id,
t.pfm_id,
t.t_area,
t.cntry_code,
t.v_type,
t.res_cat,
(case
when n.n = 1 then JAN-2014
when n.n = 1 then FEB-2014 end) as value
from table1 t cross join
(select FEB_2014 as n from dual union all
select FEB_2014 from dual) n
) s
where value is not null;
Thanks for your help,
I would do:
select t.submission,
t.country,
t.cpm_id,
t.pfm_id,
t.t_area,
t.cntry_code,
t.v_type,
t.res_cat,
n.d,
case when n.d = '01-JAN-2014' then t.jan_2014 else t.feb_2014 end value
from table1 t
cross join
(
select '01-JAN-2014' d from dual
union all
select '01-FEB-2014' d from dual
) n;
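
The cross-join trick can be tried outside Oracle as well. The sketch below is an illustration under assumptions: it runs on SQLite through Python's sqlite3 module, keeps only a few of the columns from table1, and adds a helper column pos (my own addition) to the month list so that the output order is deterministic.

```python
import sqlite3

# Trimmed-down version of table1: (submission, country, cpm_id, jan_2014, feb_2014).
TABLE1_ROWS = [
    ("01-JUN-2014", "USA", 10, 5, 10),
    ("01-AUG-2014", "UK", 20, 20, 30),
]

# Each source row is paired with one row per month column; the CASE
# expression then picks the value that belongs to that month.
UNPIVOT_SQL = """
SELECT t.submission, t.country, t.cpm_id,
       n.d AS month,
       CASE n.pos WHEN 1 THEN t.jan_2014 ELSE t.feb_2014 END AS value
FROM table1 t
CROSS JOIN (
    SELECT 1 AS pos, '01-JAN-2014' AS d
    UNION ALL
    SELECT 2, '01-FEB-2014'
) n
ORDER BY t.cpm_id, n.pos
"""


def unpivot_months(rows):
    """Turn the jan_2014/feb_2014 columns into (month, value) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (submission TEXT, country TEXT, cpm_id INTEGER, "
        "jan_2014 INTEGER, feb_2014 INTEGER)"
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(UNPIVOT_SQL).fetchall()
    conn.close()
    return result


for row in unpivot_months(TABLE1_ROWS):
    print(row)
```

Every input row produces two output rows, one per month column, which is the shape shown in the desired output.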

SQL Select latest row by date

I have a large amount of data that updates every 10 minutes or so.
There are 128 unique IDs that need to be returned, but only with their latest values.
CURRENT CODE
SELECT DISTINCT
id,
MAX(extractdate) AS [extractdate],
total,
used,
free
FROM
maintable
INNER JOIN datatable ON maintable.unkey = datatable.dataunkey
GROUP BY id, total, used, free
ORDER BY id
CURRENT OUTPUT
id extractdate total used free
1 2014-08-28 00:20:00.000 50 20 30
1 2014-08-28 00:30:00.000 50 30 20
1 2014-08-28 00:40:00.000 50 10 40
2 2014-08-28 00:20:00.000 50 20 30
2 2014-08-28 00:30:00.000 50 30 20
2 2014-08-28 00:40:00.000 50 25 25
etc etc
DESIRED OUTPUT
id extractdate total used free
1 2014-08-28 00:40:00.000 50 10 40
2 2014-08-28 00:40:00.000 50 25 25
etc etc
Try:
SELECT
a.id,
a.extractdate,
a.total,
a.used,
a.free
FROM(
SELECT
id,
MAX(extractdate) AS [extractdate],
total,
used,
free,
ROW_NUMBER()OVER(partition by id ORDER BY MAX(extractdate) desc) AS rnk
FROM maintable
INNER JOIN datatable ON maintable.unkey = datatable.dataunkey
GROUP BY id, total, used, free )a
WHERE a.rnk = 1
This should work; I've just tested it on a similar case, only without the join:
SELECT id, extractdate,total,used,free
FROM maintable m INNER JOIN datatable ON m.unkey = datatable.dataunkey
where extractdate = (select max(extractdate) from maintable m1 where m1.id = m.id)
ORDER BY id
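
Here is a minimal way to try the "latest row per id" filter. It is a sketch under assumptions rather than SQL Server code: it uses SQLite through Python's sqlite3 module and a single table that I have called readings, standing in for the joined maintable/datatable result, filled with the rows from the question's current output.

```python
import sqlite3

# Rows from the question's current output: (id, extractdate, total, used, free).
READINGS = [
    (1, "2014-08-28 00:20:00.000", 50, 20, 30),
    (1, "2014-08-28 00:30:00.000", 50, 30, 20),
    (1, "2014-08-28 00:40:00.000", 50, 10, 40),
    (2, "2014-08-28 00:20:00.000", 50, 20, 30),
    (2, "2014-08-28 00:30:00.000", 50, 30, 20),
    (2, "2014-08-28 00:40:00.000", 50, 25, 25),
]

# Keep a row only if its extractdate is the newest one for its id.
LATEST_SQL = """
SELECT r.id, r.extractdate, r.total, r.used, r.free
FROM readings r
WHERE r.extractdate = (SELECT MAX(r1.extractdate)
                       FROM readings r1
                       WHERE r1.id = r.id)
ORDER BY r.id
"""


def latest_per_id(rows):
    """Return the most recent reading for each id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings (id INTEGER, extractdate TEXT, "
        "total INTEGER, used INTEGER, free INTEGER)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(LATEST_SQL).fetchall()
    conn.close()
    return result


for row in latest_per_id(READINGS):
    print(row)
```

The timestamps compare correctly as text here because they are in a fixed-width year-first format. If two rows for the same id share the newest timestamp, this filter returns both of them.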