BigQuery Standard SQL Pivot Rows to Columns [duplicate] - sql

This question already has answers here:
How to Pivot table in BigQuery
(7 answers)
Closed 2 years ago.
I'm trying to query data that stores time-series values as rows, using Standard SQL in BigQuery. Example data is below; in practice there will be many more jobs than A-D.
+-----+------------+--------------+-----------+
| Job | BatchDate | SuccessCount | FailCount |
+-----+------------+--------------+-----------+
| A | 2018-01-01 | 35 | 1 |
| A | 2018-01-07 | 13 | 6 |
| B | 2018-01-01 | 12 | 23 |
| B | 2018-01-07 | 67 | 12 |
| C | 2018-01-01 | 9 | 4 |
| C | 2018-01-07 | 78 | 6 |
| D | 2018-01-01 | 3 | 78 |
| D | 2018-01-07 | 99 | 5 |
+-----+------------+--------------+-----------+
I would like to have the following as output but cannot work out how to accomplish this in BigQuery.
SuccessCount values pivoted into columns:
+-----+------------+--------------+
| Job | 2018-01-01 | 2018-01-07 |
+-----+------------+--------------+
| A | 35 | 13 |
| B | 12 | 67 |
| C | 9 | 78 |
| D | 3 | 99 |
+-----+------------+--------------+
Is this sort of thing possible with BigQuery? Can anyone provide a working example?
Thanks
Edit
The data will grow over time, with new entries for each job per week. Is there a way to do this without having to hard code each BatchDate as a column?

If the Job is available on all rows, then conditional aggregation does what you want:
select job,
sum(case when batchdate = '2018-01-01' then SuccessCount else 0 end) as success_20180101,
sum(case when batchdate = '2018-01-07' then SuccessCount else 0 end) as success_20180107
from t
group by job
order by job;
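If you want to sanity-check this before running it in BigQuery, here is a minimal sketch using SQLite from Python (my own test harness, not part of the answer; the pivot query itself is plain conditional aggregation and runs unchanged):

```python
import sqlite3

# In-memory table mirroring the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (Job TEXT, BatchDate TEXT, SuccessCount INT, FailCount INT);
INSERT INTO t VALUES
  ('A','2018-01-01',35,1),  ('A','2018-01-07',13,6),
  ('B','2018-01-01',12,23), ('B','2018-01-07',67,12),
  ('C','2018-01-01',9,4),   ('C','2018-01-07',78,6),
  ('D','2018-01-01',3,78),  ('D','2018-01-07',99,5);
""")

# One SUM(CASE ...) per pivoted date.
rows = conn.execute("""
SELECT Job,
       SUM(CASE WHEN BatchDate = '2018-01-01' THEN SuccessCount ELSE 0 END) AS success_20180101,
       SUM(CASE WHEN BatchDate = '2018-01-07' THEN SuccessCount ELSE 0 END) AS success_20180107
FROM t
GROUP BY Job
ORDER BY Job
""").fetchall()
print(rows)  # [('A', 35, 13), ('B', 12, 67), ('C', 9, 78), ('D', 3, 99)]
```

Regarding the edit about not hard-coding each BatchDate: BigQuery has since added a PIVOT operator, and a dynamic column list can be assembled with EXECUTE IMMEDIATE, though with conditional aggregation alone the date list has to be written out.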

Use CASE WHEN:
select Job,
sum(case when batchdate = '2018-01-01' then SuccessCount else 0 end) as s_01,
sum(case when batchdate = '2018-01-07' then SuccessCount else 0 end) as s_07
from t
group by job

Related

SQL - Identify consecutive numbers in a table

Is there a way to flag consecutive numbers in an SQL table?
Based on the values in the 'value_group_4' column, is it possible to tag continuous values? This needs to be done within groups of each 'date_group_1'.
I tried using row_number, rank, and dense_rank, but was unable to come up with a foolproof way.
This has nothing to do with consecutiveness. You simply want to mark all rows where the combination of date_group_1 and value_group_4 is not unique.
One way:
select
mytable.*,
case when exists
(
select null
from mytable agg
where agg.date_group_1 = mytable.date_group_1
and agg.value_group_4 = mytable.value_group_4
group by agg.date_group_1, agg.value_group_4
having count(*) > 1
) then 1 else 0 end as flag
from mytable
order by date_group_1, value_group_4;
In a later version of SQL Server you'd use COUNT(*) OVER instead.
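To see the EXISTS approach in action, here is a small sketch against SQLite from Python (hypothetical sample values of my own; the flag query is the one above, unchanged):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (date_group_1 TEXT, value_group_4 REAL);
INSERT INTO mytable VALUES
  ('2018-01-11', 15.3), ('2018-01-11', 17.3),
  ('2018-01-11', 17.3), ('2018-01-11', 21);
""")

# flag = 1 where the (date_group_1, value_group_4) pair occurs more than once.
rows = conn.execute("""
SELECT mytable.*,
       CASE WHEN EXISTS (
         SELECT null
         FROM mytable agg
         WHERE agg.date_group_1 = mytable.date_group_1
           AND agg.value_group_4 = mytable.value_group_4
         GROUP BY agg.date_group_1, agg.value_group_4
         HAVING COUNT(*) > 1
       ) THEN 1 ELSE 0 END AS flag
FROM mytable
ORDER BY date_group_1, value_group_4
""").fetchall()
print([r[2] for r in rows])  # [0, 1, 1, 0] -- only the duplicated 17.3 rows are flagged
```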
SQL tables represent unordered sets. There is no such thing as consecutive values, unless a column specifies the ordering. Your data does not have such an obvious column, but I'll assume one exists and just call it id for convenience.
With such a column, lag()/lead() does what you want:
select t.*,
(case when lag(value_group_4) over (partition by date_group_1 order by id) = value_group_4
then 1
when lead(value_group_4) over (partition by date_group_1 order by id) = value_group_4
then 1
else 0
end) as flag
from t;
On close inspection, value_group_3 may do what you want. So you can use that for the id.
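A quick sketch of the lag()/lead() variant, again run against SQLite from Python (needs SQLite 3.25+ for window functions; the id column here is the assumed ordering column, with hypothetical sample values):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INT, date_group_1 TEXT, value_group_4 REAL);
INSERT INTO t VALUES
  (1,'2018-01-11',15.3), (2,'2018-01-11',17.3),
  (3,'2018-01-11',17.3), (4,'2018-01-11',21);
""")

# A row is flagged when its neighbour (previous or next, in id order,
# within the same date group) carries the same value_group_4.
rows = conn.execute("""
SELECT t.*,
       CASE WHEN LAG(value_group_4)  OVER (PARTITION BY date_group_1 ORDER BY id) = value_group_4 THEN 1
            WHEN LEAD(value_group_4) OVER (PARTITION BY date_group_1 ORDER BY id) = value_group_4 THEN 1
            ELSE 0 END AS flag
FROM t
ORDER BY id
""").fetchall()
print([r[3] for r in rows])  # [0, 1, 1, 0]
```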
If your version of SQL Server doesn't have a full suite of windowing functions, it should still be possible. This looks like a last-non-null problem, for which Itzik Ben-Gan has a good example here: http://www.itprotoday.com/software-development/last-non-null-puzzle
Also, look at Mikael Eriksson's answer here which uses no windowing functions.
If the order of your data is determined by the date_group_1, value_group_3 column values, then why not make it as simple as the following query:
select
*,
rank() over(partition by date_group_1 order by value_group_3) - 1 as rank_in_group,
case
when count(*) over(partition by date_group_1, value_group_3) > 1 then 1
else 0
end expected_result
from data;
Output:
| date_group_1 | category_group_2 | value_group_3 | rank_in_group | expected_result |
+--------------+------------------+---------------+---------------+-----------------+
| 2018-01-11 | A | 15.3 | 0 | 0 |
| 2018-01-11 | B | 17.3 | 1 | 1 |
| 2018-01-11 | A | 17.3 | 1 | 1 |
| 2018-01-11 | B | 21 | 3 | 0 |
| 2018-01-22 | A | 15.3 | 0 | 0 |
| 2018-01-22 | B | 17.3 | 1 | 0 |
| 2018-01-22 | A | 21 | 2 | 0 |
| 2018-01-22 | B | 23 | 3 | 0 |
| 2018-03-13 | A | 15.3 | 0 | 0 |
| 2018-03-13 | B | 17.3 | 1 | 1 |
| 2018-03-13 | A | 17.3 | 1 | 1 |
| 2018-03-13 | B | 23 | 3 | 0 |
| 2018-05-15 | A | 6 | 0 | 0 |
| 2018-05-15 | B | 6.3 | 1 | 0 |
| 2018-05-15 | A | 15 | 2 | 0 |
| 2018-05-15 | B | 16.3 | 3 | 1 |
| 2018-05-15 | A | 16.3 | 3 | 1 |
| 2018-05-15 | B | 22 | 5 | 0 |
| 2019-05-04 | A | 0 | 0 | 0 |
| 2019-05-04 | B | 7 | 1 | 0 |
| 2019-05-04 | A | 15.3 | 2 | 0 |
| 2019-05-04 | B | 17.3 | 3 | 0 |
Test it online with SQL Fiddle.

PostgreSQL multiple row as columns

I have a table like this:
| id | name | segment | date_created | question | answer |
|----|------|---------|--------------|----------|--------|
| 1 | John | 1 | 2018-01-01 | 10 | 28 |
| 1 | John | 1 | 2018-01-01 | 14 | 37 |
| 1 | John | 1 | 2018-01-01 | 9 | 83 |
| 2 | Jack | 3 | 2018-03-11 | 22 | 13 |
| 2 | Jack | 3 | 2018-03-11 | 23 | 16 |
And I want to show this information in a single row, transpose all the questions and answers as columns:
| id | name | segment | date_created | question_01 | answer_01 | question_02 | answer_02 | question_03 | answer_03 |
|----|------|---------|--------------|-------------|-----------|-------------|-----------|-------------|-----------|
| 1 | John | 1 | 2018-01-01 | 10 | 28 | 14 | 37 | 9 | 83 |
| 2 | Jack | 3 | 2018-03-11 | 22 | 13 | 23 | 16 | | |
The number of questions/answers for the same ID is known: a maximum of 15.
I've already tried using crosstab, but it only accepts a single value as the category, and I have two (question/answer). Any help with how to solve this?
You can use row_number to number the rows in a subquery, then do conditional aggregation in the main query.
SELECT ID,
Name,
segment,
date_created,
max(CASE WHEN rn = 1 THEN question END) question_01 ,
max(CASE WHEN rn = 1 THEN answer END) answer_01 ,
max(CASE WHEN rn = 2 THEN question END) question_02,
max(CASE WHEN rn = 2 THEN answer END) answer_02,
max(CASE WHEN rn = 3 THEN question END) question_03,
max(CASE WHEN rn = 3 THEN answer END) answer_03
FROM (
select *,Row_number() over(partition by ID,Name,segment,date_created order by (select 1)) rn
from T
) t1
GROUP BY ID,Name,segment,date_created
sqlfiddle
[Results]:
| id | name | segment | date_created | question_01 | answer_01 | question_02 | answer_02 | question_03 | answer_03 |
|----|------|---------|--------------|-------------|-----------|-------------|-----------|-------------|-----------|
| 1 | John | 1 | 2018-01-01 | 10 | 28 | 14 | 37 | 9 | 83 |
| 2 | Jack | 3 | 2018-03-11 | 22 | 13 | 23 | 16 | (null) | (null) |
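Here is a runnable sketch of this approach against SQLite from Python (a verification harness of my own; the ORDER BY rowid in the window pins a deterministic row order for the demo, where the original answer uses ORDER BY (SELECT 1)):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T (id INT, name TEXT, segment INT, date_created TEXT, question INT, answer INT);
INSERT INTO T VALUES
  (1,'John',1,'2018-01-01',10,28),
  (1,'John',1,'2018-01-01',14,37),
  (1,'John',1,'2018-01-01',9,83),
  (2,'Jack',3,'2018-03-11',22,13),
  (2,'Jack',3,'2018-03-11',23,16);
""")

# Number the question/answer pairs per id, then pivot with MAX(CASE ...).
rows = conn.execute("""
SELECT id, name, segment, date_created,
       MAX(CASE WHEN rn = 1 THEN question END) AS question_01,
       MAX(CASE WHEN rn = 1 THEN answer   END) AS answer_01,
       MAX(CASE WHEN rn = 2 THEN question END) AS question_02,
       MAX(CASE WHEN rn = 2 THEN answer   END) AS answer_02,
       MAX(CASE WHEN rn = 3 THEN question END) AS question_03,
       MAX(CASE WHEN rn = 3 THEN answer   END) AS answer_03
FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY id, name, segment, date_created
                               ORDER BY rowid) AS rn
  FROM T
) t1
GROUP BY id, name, segment, date_created
ORDER BY id
""").fetchall()
print(rows)
```

Missing pairs (Jack has only two) come back as NULL, matching the blank cells in the desired output.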

Postgres convert values to columns

I have postgres table with structure:
| key | position | date       |
|-----|----------|------------|
| 1   | 5        | 2017-07-01 |
| 1   | 9        | 2017-07-02 |
| 2   | 4        | 2017-07-01 |
| 2   | 8        | 2017-07-02 |
But I need to have the selected data in a format like this:
| key | 2017-07-01 | 2017-07-02 |
|-----|------------|------------|
| 1   | 5          | 9          |
| 2   | 4          | 8          |
How can I do something like this?
If you have one row per key and per date, then one way is conditional aggregation
select
key,
min(case when date = '2017-07-01' then position end) as "2017-07-01",
min(case when date = '2017-07-02' then position end) as "2017-07-02"
from t
group by key

How to select multiple count(*) values then group by a specific column

I've used SQL for a while but wouldn't say I'm at an advanced level. I've tinkered with trying to figure this out myself to no avail.
I have two tables - Transaction and TransactionType:
| TransactionID | Name  | TransactionTypeID |
|---------------|-------|-------------------|
| 1             | Tom   | 1                 |
| 2             | Jim   | 1                 |
| 3             | Mo    | 2                 |
| 4             | Tom   | 3                 |
| 5             | Sarah | 4                 |
| 6             | Tom   | 1                 |
| 7             | Sarah | 1                 |

| TransactionTypeID | TransactionType |
|-------------------|-----------------|
| 1                 | A               |
| 2                 | B               |
| 3                 | C               |
| 4                 | D               |
Transaction.TransactionTypeID is a foreign key linked to the TransactionType.TransactionTypeID field.
Here's what I'd like to achieve:
I'd like a query (this will be a stored procedure) that returns three columns:
Name - the value of the Transaction.Name column.
NumberOfTypeATransactions - the count of the number of all transactions of type 'A' for that person.
NumberOfNonTypeATransactions - the count of the number of all transactions that are NOT of type A for that person, i.e. all other transaction types.
So, using the above data as an example, the result set would be:
| Name  | NumberOfTypeATransactions | NumberOfNonTypeATransactions |
|-------|---------------------------|------------------------------|
| Tom   | 2                         | 1                            |
| Jim   | 1                         | 0                            |
| Mo    | 0                         | 1                            |
| Sarah | 1                         | 1                            |
I might also need to filter the results by a date period (which will be based on a 'transaction date' column in the Transaction table), but I haven't finalized this requirement yet.
Any help in achieving this would be much appreciated. Apologies if the layout of the tables is a bit odd; I haven't worked out how to format them properly yet.
This is just conditional aggregation with a join:
select t.name,
sum(case when tt.TransactionType = 'A' then 1 else 0 end) as num_As,
sum(case when tt.TransactionType <> 'A' then 1 else 0 end) as num_notAs
from transaction t join
transactiontype tt
on tt.TransactionTypeID = t.TransactionTypeID
group by t.name;
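A quick way to verify this against the sample data, using SQLite from Python (the table is named trans here only because TRANSACTION is a reserved word in SQLite):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trans (TransactionID INT, Name TEXT, TransactionTypeID INT);
CREATE TABLE transactiontype (TransactionTypeID INT, TransactionType TEXT);
INSERT INTO trans VALUES (1,'Tom',1),(2,'Jim',1),(3,'Mo',2),(4,'Tom',3),
                         (5,'Sarah',4),(6,'Tom',1),(7,'Sarah',1);
INSERT INTO transactiontype VALUES (1,'A'),(2,'B'),(3,'C'),(4,'D');
""")

# Join to resolve the type, then count 'A' and non-'A' rows per name.
rows = conn.execute("""
SELECT t.Name,
       SUM(CASE WHEN tt.TransactionType = 'A' THEN 1 ELSE 0 END) AS num_As,
       SUM(CASE WHEN tt.TransactionType <> 'A' THEN 1 ELSE 0 END) AS num_notAs
FROM trans t
JOIN transactiontype tt ON tt.TransactionTypeID = t.TransactionTypeID
GROUP BY t.Name
ORDER BY t.Name
""").fetchall()
print(rows)  # [('Jim', 1, 0), ('Mo', 0, 1), ('Sarah', 1, 1), ('Tom', 2, 1)]
```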

Calculation going wrong due to JOIN issue

Table -
+----+-----------+-----------+---------+---------------------+------------+
| ID | Client_Id | Driver_Id | City_Id | Status | Request_at |
+----+-----------+-----------+---------+---------------------+------------+
| 1 | 1 | 10 | 1 | completed | 2013-10-01 |
| 2 | 2 | 11 | 1 | cancelled_by_driver | 2013-10-01 |
| 3 | 3 | 12 | 6 | completed | 2013-10-01 |
| 4 | 4 | 13 | 6 | cancelled_by_client | 2013-10-01 |
| 5 | 1 | 10 | 1 | completed | 2013-10-02 |
| 6 | 2 | 11 | 6 | completed | 2013-10-02 |
| 7 | 3 | 12 | 6 | completed | 2013-10-02 |
| 8 | 2 | 12 | 12 | completed | 2013-10-03 |
| 9 | 3 | 10 | 12 | completed | 2013-10-03 |
| 10 | 4 | 13 | 12 | cancelled_by_driver | 2013-10-03 |
+----+-----------+-----------+---------+---------------------+------------+
My attempt -
WITH src
AS (SELECT Count(status) AS Denom,
request_at
FROM trips
WHERE status = 'completed'
GROUP BY request_at),
src2
AS (SELECT Count(status) AS Num,
request_at
FROM trips
WHERE status <> 'completed'
GROUP BY request_at)
SELECT Cast(Count(num) AS FLOAT)/Cast(Count(Denom) AS FLOAT) AS cancel_rate,
trips.request_at
FROM src,
src2,
trips
GROUP BY trips.request_at;
I am trying to find the cancellation rate per day, but it is clearly wrong (my output):
+-------------+------------+
| cancel_rate | request_at |
+-------------+------------+
| 24 | 2013-10-01 |
| 18 | 2013-10-02 |
| 18 | 2013-10-03 |
+-------------+------------+
The cancellation rate for 2013-10-01 should be 0.5 and not 24. Similarly for other dates it should be different.
I know the problem lies with this part, but I do not know the correct way to approach it:
SELECT Cast(Count(num) AS FLOAT)/Cast(Count(Denom) AS FLOAT) AS cancel_rate,
trips.request_at
FROM src,
src2,
trips
Is there any way to put more than one select statement in a WITH name AS () clause, so that I won't need any JOIN or multiple tables?
Use conditional aggregation:
SELECT SUM(CASE WHEN status = 'completed' THEN 1 ELSE 0 END) as denom,
SUM(CASE WHEN status <> 'completed' THEN 1 ELSE 0 END) as num,
AVG(CASE WHEN status <> 'completed' THEN 1.0 ELSE 0 END) as cancel_rate
FROM trips
GROUP BY request_at;
Note the calculation for cancel_rate: it is simpler to do with AVG() than by dividing the two values. The 1.0 is needed because SQL Server does integer arithmetic, so 1 / 2 is 0 rather than 0.5.
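A sketch of the AVG() version run against the sample trips data, using SQLite from Python as a test harness (SQLite's AVG() returns a float regardless, but the 1.0 keeps the query portable):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trips (ID INT, Status TEXT, Request_at TEXT);
INSERT INTO trips VALUES
 (1,'completed','2013-10-01'),(2,'cancelled_by_driver','2013-10-01'),
 (3,'completed','2013-10-01'),(4,'cancelled_by_client','2013-10-01'),
 (5,'completed','2013-10-02'),(6,'completed','2013-10-02'),
 (7,'completed','2013-10-02'),(8,'completed','2013-10-03'),
 (9,'completed','2013-10-03'),(10,'cancelled_by_driver','2013-10-03');
""")

# AVG of a 0/1 indicator = fraction of cancelled trips per day.
rows = conn.execute("""
SELECT Request_at,
       AVG(CASE WHEN Status <> 'completed' THEN 1.0 ELSE 0 END) AS cancel_rate
FROM trips
GROUP BY Request_at
ORDER BY Request_at
""").fetchall()
print(rows)  # 2013-10-01 -> 0.5, 2013-10-02 -> 0.0, 2013-10-03 -> 1/3
```

This gives the expected 0.5 for 2013-10-01, where the original three-way comma join multiplied the row counts instead.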
OK, a bit late again, but here is another variation (edited):
SELECT SUM(CASE LEFT(status,9) WHEN 'cancelled' THEN 1. ELSE 0 END)
/COUNT(*) cancellation_rate,
request_at
FROM trips GROUP BY request_at ORDER BY request_at