Dynamic SQL Statement Creation - Scala - sql

I have a file with the following values:
matchId Id
2e0c6c42-68ac-43e0-b130-1b986f61a462 segA
2e0c6c42-68ac-43e0-b130-1b986f61a463 segB
2e0c6c42-68ac-43e0-b130-1b986f61a463 segC
2e0c6c42-68ac-43e0-b130-1b986f61a463 segA
I want a pivoted result like the one below:
matchid segA segB segC
2e0c6c42-68ac-43e0-b130-1b986f61a463 1 1 1
2e0c6c42-68ac-43e0-b130-1b986f61a462 1 0 0
This means some IDs are present only in segA and some in all segments; presence should be denoted with binary 1s and 0s.
Since the number of segments can vary, I want the SQL statement to be generated dynamically using Scala, as below. It should scale up or down with the number of segments; e.g. today I have 10 segments, tomorrow it could be 5, and so on.
From the SQL (AWS Redshift) perspective, I can write the query below if I already know the number of segments, but this gets unwieldy as the number of segments increases.
CREATE TABLE pivotedsegments distkey(match_id) AS (
SELECT match_id,
MAX(CASE WHEN segment='Seg1' then 1 else 0 end) Seg1,
MAX(CASE WHEN segment='Seg2' then 1 else 0 end) Seg2,
MAX(CASE WHEN segment='Seg3' then 1 else 0 end) Seg3,
MAX(CASE WHEN segment='Seg4' then 1 else 0 end) Seg4,
MAX(CASE WHEN segment='Seg5' then 1 else 0 end) Seg5,
MAX(CASE WHEN segment='Seg6' then 1 else 0 end) Seg6,
MAX(CASE WHEN segment='Seg7' then 1 else 0 end) Seg7,
MAX(CASE WHEN segment='Seg8' then 1 else 0 end) Seg8,
MAX(CASE WHEN segment='Seg9' then 1 else 0 end) Seg9,
MAX(CASE WHEN segment='Seg10' then 1 else 0 end) Seg10,
MAX(CASE WHEN segment='Seg11' then 1 else 0 end) Seg11,
.
.
.
.
FROM some_table GROUP BY match_id);
So I want to design a Scala API that can read such a file and produce the pivoted result.
Please suggest an approach.
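One possible approach, as a minimal sketch rather than a finished API: collect the distinct segment names from the file, then build one MAX(CASE ...) column per segment. The object name PivotSqlBuilder, its method names, and the source and target table parameters are hypothetical; the column names match_id and segment are taken from the query above and are assumed to exist in the source table. Segment names are spliced into the SQL text, so the sketch only accepts plain identifiers.

```scala
object PivotSqlBuilder {
  /** Takes "matchId segment" lines (header already dropped) and returns the distinct segment names, sorted. */
  def segmentsFrom(lines: Seq[String]): Seq[String] =
    lines
      .map(_.trim)
      .filter(_.nonEmpty)
      .map(_.split("\\s+"))
      .collect { case Array(_, segment) => segment } // ignore malformed lines
      .distinct
      .sorted

  /** Builds the CREATE TABLE ... AS statement with one 0/1 column per segment. */
  def pivotSql(segments: Seq[String], source: String, target: String): String = {
    require(segments.nonEmpty, "at least one segment is required")
    // Segment names end up inside the SQL text, so reject anything that is not a plain identifier.
    segments.foreach { s =>
      require(s.matches("[A-Za-z][A-Za-z0-9_]*"), s"unsafe segment name: $s")
    }
    val columns = segments
      .map(s => s"  MAX(CASE WHEN segment = '$s' THEN 1 ELSE 0 END) AS $s")
      .mkString(",\n")
    Seq(
      s"CREATE TABLE $target DISTKEY(match_id) AS (",
      "SELECT match_id,",
      columns,
      s"FROM $source",
      "GROUP BY match_id);"
    ).mkString("\n")
  }
}
```

The lines could come from something like scala.io.Source.fromFile(path).getLines().drop(1).toList (closing the source afterwards), and the resulting string can then be sent to Redshift through whatever JDBC code you already use. Because the column list is derived from the data, the statement grows or shrinks with the number of segments.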

Related

Invalid column name 'total_counted'

I'm querying the bin table to get the total active bins, total counted bins and calculate the percent of bins counted. Here's my query:
SELECT bin.location_id
,SUM(CASE WHEN bin.delete_flag = 'N' THEN 1 ELSE 0 END) AS total_active
,SUM(CASE WHEN bin.date_last_counted > 0 THEN 1 ELSE 0 END) AS total_counted
--,(total_counted / total_active) as pct_counted
From bin
Group by bin.location_id
Order by bin.location_id
When I try to use the code to create my pct_counted, it tells me "invalid column name" for both of the columns I'm using to calculate that value. Data looks like below.
location_id total_active total_counted
2 11502 484
6 2281 108
15 1772 253
Can anyone help?
You need to repeat the expressions (or use a subquery or CTE). I would recommend:
SELECT bin.location_id,
SUM(CASE WHEN bin.delete_flag = 'N' THEN 1 ELSE 0 END) AS total_active,
SUM(CASE WHEN bin.date_last_counted > 0 THEN 1 ELSE 0 END) AS total_counted,
(SUM(CASE WHEN bin.date_last_counted > 0 THEN 1.0 ELSE 0 END) /
SUM(CASE WHEN bin.delete_flag = 'N' THEN 1 END)
) AS pct_counted
FROM bin
GROUP BY bin.location_id
ORDER BY bin.location_id;
Note that I removed the ELSE clause for the second expression. This avoids divide-by-zero. The 1.0 also ensures decimal division even if your database does integer division.
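To see what the 1.0 and the missing ELSE change, here is the same arithmetic modeled in Scala on the first sample row. This is only an illustration of the two effects, not database code; the object and method names are made up, and Option is used as a stand-in for SQL NULL.

```scala
object PctCounted {
  // None plays the role of SQL NULL: with no ELSE branch the denominator SUM is
  // NULL when no bin is active, so the division yields NULL instead of an error.
  def pct(totalCounted: Int, totalActive: Int): Option[Double] =
    if (totalActive == 0) None
    else Some(totalCounted.toDouble / totalActive) // like the 1.0: forces decimal division

  // Plain integer division, which is what total_counted / total_active gives on
  // databases that divide integers as integers.
  def integerPct(totalCounted: Int, totalActive: Int): Int =
    totalCounted / totalActive
}
```

For location 2 (484 counted out of 11502 active), integerPct returns 0 while pct returns roughly 0.042, which is why the decimal literal matters.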
I would format the sub-query this way:
SELECT
a.location_id
,1.0 * a.total_counted / NULLIF(a.total_active, 0) AS pct_counted
FROM (SELECT
bin.location_id
,SUM(CASE WHEN bin.delete_flag = 'N' THEN 1 ELSE 0 END) AS total_active
,SUM(CASE WHEN bin.date_last_counted > 0 THEN 1 ELSE 0 END) AS total_counted
FROM bin
GROUP BY bin.location_id) a
ORDER BY a.location_id

I need to Count the Number of Columns that have value more than 0 in SQL

I want to create a calculated field at the end of the columns that counts all the columns having values greater than 0.
Below is a sample data set.
Account_number DAY_0 DAY_30 DAY_60 DAY_90 DAY_120
acc_001 99 10 0 0.2 0
You can use case expressions:
select t.*,
( (case when day_0 > 0 then 1 else 0 end) +
(case when day_30 > 0 then 1 else 0 end) +
(case when day_60 > 0 then 1 else 0 end) +
(case when day_90 > 0 then 1 else 0 end) +
(case when day_120 > 0 then 1 else 0 end)
) as num_gt_zero
from t;
That said, you probably constructed this from a group by query. You might be able to put this logic directly into that query. If that is the case, ask a new question, with sample data, desired results, and an appropriate database tag.
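As a quick sanity check of what num_gt_zero should come out to for the sample row, the same logic can be modeled outside the database. A small Scala sketch; the names are hypothetical, and using Option for a SQL NULL is my assumption about how you would represent missing values.

```scala
object PositiveColumns {
  // None stands in for SQL NULL: "NULL > 0" is not true, so the CASE
  // expression falls through to ELSE 0 and the column is not counted.
  def countPositive(values: Seq[Option[Double]]): Int =
    values.count(_.exists(_ > 0))

  // DAY_0 .. DAY_120 for acc_001 from the sample data set.
  val acc001: Seq[Option[Double]] =
    Seq(Some(99.0), Some(10.0), Some(0.0), Some(0.2), Some(0.0))
}
```

PositiveColumns.countPositive(PositiveColumns.acc001) gives 3, since DAY_0, DAY_30 and DAY_90 are greater than 0.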

Running Counts on Multiple Columns with One SQL Query

I'm looking to run a count SQL query on multiple columns at once. 83 of them. For each, I'd simply like to figure out how many instances there are in which the value = 1.
ex.
select count(*) from [filename.filename]
where [Column001] = 1
select count(*) from [filename.filename]
where [Column002] = 1
All data in each column is marked with either a 0 or a 1.
Instead of writing 83 small queries, is there a way for me to write it all in one query and have them all display as a table with the results?
This seems to be what you want:
SELECT SUM(CASE WHEN Column_1 = 1 THEN 1 ELSE 0 END) N_1,
SUM(CASE WHEN Column_2 = 1 THEN 1 ELSE 0 END) N_2,
SUM(CASE WHEN Column_3 = 1 THEN 1 ELSE 0 END) N_3,
.....
SUM(CASE WHEN Column_83 = 1 THEN 1 ELSE 0 END) N_83
FROM YourTable;
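If typing out 83 nearly identical lines is the sticking point, the statement can be generated instead. A rough Scala sketch, assuming the columns really are named Column001 through Column083 as in the question and that your database accepts bracket-quoted identifiers; FlagCounts and its parameters are made-up names.

```scala
object FlagCounts {
  /** Builds one SELECT with a SUM(CASE ...) per column, Column001 .. Column<columnCount>. */
  def query(table: String, columnCount: Int): String = {
    require(columnCount > 0, "columnCount must be positive")
    val sums = (1 to columnCount).map { i =>
      val column = f"Column$i%03d" // zero-padded to three digits, e.g. Column007
      s"  SUM(CASE WHEN [$column] = 1 THEN 1 ELSE 0 END) AS N_$i"
    }
    sums.mkString("SELECT\n", ",\n", s"\nFROM $table;")
  }
}
```

FlagCounts.query("YourTable", 83) returns the full statement as a string, which you can paste into your SQL client or run from code.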

SQL Transform of raw data into summary table?

I have a table full of survey data which I need to transform into a nice summary table. The survey question was (for example) "Rank your restaurant preferences in order". The raw data looks like:
CUST_ID WENDYS_RANK MCDONALDS_RANK BURGERKING_RANK
1 First Third Second
2 Second First Third
3 None First Second
4 Second Third First
(repeat for 100,000+ records)
I need to turn this into a nice table that looks like:
NAME NUM_FIRST NUM_SECOND NUM_THIRD
Wendys 1 2 0
McDonalds 2 0 2
BK 1 2 1
But it's been so darn long since I did a transformation like this I not only forgot how to write the SQL, I forgot what this transformation is called, which makes it hard to google. Can anyone help me?
Thanks...
Here is one method:
select 'Wendys',
sum(case when Wendys_Rank = 'First' then 1 else 0 end) as Rank1,
sum(case when Wendys_Rank = 'Second' then 1 else 0 end) as Rank2,
sum(case when Wendys_Rank = 'Third' then 1 else 0 end) as Rank3
from surveydata
union all
select 'McDonalds',
sum(case when McDonalds_Rank = 'First' then 1 else 0 end) as Rank1,
sum(case when McDonalds_Rank = 'Second' then 1 else 0 end) as Rank2,
sum(case when McDonalds_Rank = 'Third' then 1 else 0 end) as Rank3
from surveydata
union all
select 'BK',
sum(case when BurgerKing_Rank = 'First' then 1 else 0 end) as Rank1,
sum(case when BurgerKing_Rank = 'Second' then 1 else 0 end) as Rank2,
sum(case when BurgerKing_Rank = 'Third' then 1 else 0 end) as Rank3
from surveydata;
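If the list of restaurants grows, the UNION ALL branches can be generated rather than copied by hand. A sketch in Scala under a few assumptions: the column names come from the raw data shown in the question, the output aliases follow the desired table, and RankSummary with its parameters is a hypothetical helper. Labels and column names are spliced in as-is, so they must come from trusted input.

```scala
object RankSummary {
  // Rank value stored in the data -> output column alias.
  private val ranks = Seq("First" -> "NUM_FIRST", "Second" -> "NUM_SECOND", "Third" -> "NUM_THIRD")

  /** restaurants: (display label, rank column) pairs; one SELECT branch is emitted per pair. */
  def query(table: String, restaurants: Seq[(String, String)]): String = {
    require(restaurants.nonEmpty, "at least one restaurant is required")
    val branches = restaurants.map { case (label, column) =>
      val counts = ranks
        .map { case (rank, alias) => s"  SUM(CASE WHEN $column = '$rank' THEN 1 ELSE 0 END) AS $alias" }
        .mkString(",\n")
      Seq(s"SELECT '$label' AS NAME,", counts, s"FROM $table").mkString("\n")
    }
    branches.mkString("\nUNION ALL\n") + ";"
  }
}
```

For the survey table this would be called as RankSummary.query("surveydata", Seq("Wendys" -> "WENDYS_RANK", "McDonalds" -> "MCDONALDS_RANK", "BK" -> "BURGERKING_RANK")).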

Proper way to create a pivot table with crosstab

How do I convert the following query into a pivot table using crosstab?
select (SUM(CASE WHEN added_customer=false
THEN 1
ELSE 0
END)) AS CUSTOMERS_NOT_ADDED,
(SUM(CASE WHEN added_customer=true
THEN 1
ELSE 0
END)) AS CUSTOMERS_ADDED,
(select (SUM(CASE WHEN added_sales_order=false
THEN 1
ELSE 0
END))
FROM shipments_data
) AS SALES_ORDER_NOT_ADDED,
(select (SUM(CASE WHEN added_sales_order=true
THEN 1
ELSE 0
END))
FROM shipments_data
) AS SALES_ORDER_ADDED,
(select (SUM(CASE WHEN added_fulfillment=false
THEN 1
ELSE 0
END))
FROM shipments_data
) AS ITEM_FULFILLMENT_NOT_ADDED,
(select (SUM(CASE WHEN added_fulfillment=true
THEN 1
ELSE 0
END))
FROM shipments_data
) AS ITEM_FULFILLMENT_ADDED,
(select (SUM(CASE WHEN added_invoice=false
THEN 1
ELSE 0
END))
FROM shipments_data
) AS INVOICE_NOT_ADDED,
(select (SUM(CASE WHEN added_invoice=true
THEN 1
ELSE 0
END))
FROM shipments_data
) AS INVOICE_ADDED,
(select (SUM(CASE WHEN added_ra=false
THEN 1
ELSE 0
END))
FROM shipments_data
) AS RA_NOT_ADDED,
(select (SUM(CASE WHEN added_ra=true
THEN 1
ELSE 0
END))
FROM shipments_data
) AS RA_ADDED,
(select (SUM(CASE WHEN added_credit_memo=false
THEN 1
ELSE 0
END))
FROM shipments_data
) AS CREDIT_MEMO_NOT_ADDED,
(select (SUM(CASE WHEN added_credit_memo=true
THEN 1
ELSE 0
END))
FROM shipments_data
) AS CREDIT_MEMO_ADDED
FROM shipments_data;
This query gives me the data in a standard row format; however, I would like to show it as a pivot table in the following format:
Added Not_Added
Customers 100 0
Sales Orders 50 50
Item Fulfillments 0 100
Invoices 0 100
...
I am using Heroku PostgreSQL, which is running v9.1.6
Also, I'm not sure if my above query can be optimized or if this is poor form. If it can be optimized/improved I would love to learn how.
The tablefunc module that supplies crosstab() is available for 9.1 (like for any other version this side of the millennium). Doesn't Heroku let you install additional modules? Have you tried:
CREATE EXTENSION tablefunc;
For examples of how to use it, refer to the manual or this related question:
PostgreSQL Crosstab Query
There are also a couple of good answers with examples elsewhere on SO.
To get you started (most of the way, really), use this much simplified and reorganized query as the base for the crosstab() call:
SELECT 'added'::text AS col
,SUM(CASE WHEN added_customer THEN 1 ELSE 0 END) AS customers
,SUM(CASE WHEN added_sales_order THEN 1 ELSE 0 END) AS sales_order
,SUM(CASE WHEN added_fulfillment THEN 1 ELSE 0 END) AS item_fulfillment
,SUM(CASE WHEN added_invoice THEN 1 ELSE 0 END) AS invoice
,SUM(CASE WHEN added_ra THEN 1 ELSE 0 END) AS ra
,SUM(CASE WHEN added_credit_memo THEN 1 ELSE 0 END) AS credit_memo
FROM shipments_data
UNION ALL
SELECT 'not_added' AS col
,SUM(CASE WHEN NOT added_customer THEN 1 ELSE 0 END) AS customers
,SUM(CASE WHEN NOT added_sales_order THEN 1 ELSE 0 END) AS sales_order
,SUM(CASE WHEN NOT added_fulfillment THEN 1 ELSE 0 END) AS item_fulfillment
,SUM(CASE WHEN NOT added_invoice THEN 1 ELSE 0 END) AS invoice
,SUM(CASE WHEN NOT added_ra THEN 1 ELSE 0 END) AS ra
,SUM(CASE WHEN NOT added_credit_memo THEN 1 ELSE 0 END) AS credit_memo
FROM shipments_data;
If your columns are defined NOT NULL, you can further simplify the CASE expressions.
If performance is crucial, you can get all aggregates in a single scan in a CTE and split values into two rows in the next step.
WITH x AS (
SELECT count(NULLIF(added_customer, FALSE)) AS customers
,sum(added_sales_order::int) AS sales_order
...
,count(NULLIF(added_customer, TRUE)) AS not_customers
,sum((NOT added_sales_order)::int) AS not_sales_order
...
FROM shipments_data
)
SELECT 'added'::text AS col, customers, sales_order, ... FROM x
UNION ALL
SELECT 'not_added', not_customers, not_sales_order, ... FROM x;
The CTE also demonstrates two alternative ways to build your aggregates, both built on the assumption that all columns are boolean NOT NULL. Both alternatives are syntactically shorter, but not faster. In previous tests all three methods performed about the same.
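Since the column list in the CTE is repetitive, the whole statement could also be generated from the list of flag columns. A hedged Scala sketch using the sum(col::int) form from above, which assumes boolean NOT NULL columns; AddedSummary and its parameters are made-up names, and the (column, label) pairs are the ones from the simplified query earlier in this answer.

```scala
object AddedSummary {
  /** flags: (boolean column, output label) pairs, e.g. "added_customer" -> "customers". */
  def query(table: String, flags: Seq[(String, String)]): String = {
    require(flags.nonEmpty, "at least one flag column is required")
    // One aggregate for the "added" count and one for the "not added" count per flag.
    val added = flags.map { case (col, name) => s"    sum($col::int) AS $name" }
    val notAdded = flags.map { case (col, name) => s"    sum((NOT $col)::int) AS not_$name" }
    val names = flags.map(_._2)
    val addedList = names.mkString(", ")
    val notAddedList = names.map(n => s"not_$n").mkString(", ")
    Seq(
      "WITH x AS (",
      "  SELECT",
      (added ++ notAdded).mkString(",\n"),
      s"  FROM $table",
      ")",
      s"SELECT 'added'::text AS col, $addedList FROM x",
      "UNION ALL",
      s"SELECT 'not_added', $notAddedList FROM x;"
    ).mkString("\n")
  }
}
```

The table is still scanned only once inside the CTE; the two outer SELECTs just split the aggregates into an added row and a not_added row.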