R/SQL code for counting combinations of status

I have a big dataset behind an SQL connection, e.g. connection = dbConnect(odbc::odbc(), dsn = "dsn", encoding = "latin1")
The dataset has 3 columns:
ID (e.g. 111, 112, 113, etc.)
YEAR (e.g. 2010, 2011, 2012, etc.)
STATUS (1 or 0, stored as non-numeric).
Each ID appears more than once (e.g. ID = 111 has STATUS = 1 in 2010, ID = 111 has STATUS = 0 in 2011, etc.)
Using SQL, I want to find the total number of IDs whose STATUS values are:
A: only 0 (e.g. 45% of all rows)
B: only 1 (e.g. 50% of all rows)
C: both 1 and 0 (e.g. 5% of all rows)
I also want to list which IDs fall into A, B and C (e.g. A = 111, 112, 115; B = 114, 116, etc.)
I have read about the function dbGetQuery(connection, "insert sql code here"), but I don't know how to write SQL that counts the totals and lists the IDs.
How can I do this? Does it need a window/lag function?

Possibly this example will help clarify your goal:
select gr as 'Group', sum(idRows) rowsInGroup
  ,sum(sum(idRows)) over() rowsTotal
  ,sum(idRows)*1.0/(sum(sum(idRows)) over()) pct
  ,count(*) idsCountInGroup
  ,string_agg(id,',') within group(order by id) idsInGroup -- for SQL Server
from (
  select id, min(status) minS, max(status) maxS
    ,count(*) as idRows -- rows per id
    ,case when min(status) = max(status) then -- assumes status can have only 2 values
       case when min(status)='0' then 'A' else 'B' end
     else 'C'
     end gr -- group (A, B or C)
  from bigData
  group by id
) t
group by gr
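For a quick local check of the grouping logic, here is a minimal sketch in Python, using the standard-library sqlite3 module as a stand-in for the real ODBC connection. The table name bigData comes from the query above; the sample rows are invented for illustration, and GROUP_CONCAT is used as SQLite's rough counterpart of string_agg (it does not guarantee ordering, so the IDs are sorted in Python). The SQL would need adjusting to the dialect of the actual database before being passed to dbGetQuery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bigData (id INTEGER, year INTEGER, status TEXT)")
# Invented sample rows: 111 and 114 are always '0', 112 is always '1', 113 is mixed.
conn.executemany(
    "INSERT INTO bigData VALUES (?, ?, ?)",
    [(111, 2010, "0"), (111, 2011, "0"),
     (112, 2010, "1"), (112, 2011, "1"),
     (113, 2010, "1"), (113, 2011, "0"),
     (114, 2010, "0")],
)

query = """
SELECT gr,
       COUNT(*)         AS ids_in_group,
       SUM(id_rows)     AS rows_in_group,
       GROUP_CONCAT(id) AS ids
FROM (
    SELECT id,
           COUNT(*) AS id_rows,
           CASE WHEN MIN(status) = MAX(status)
                THEN CASE WHEN MIN(status) = '0' THEN 'A' ELSE 'B' END
                ELSE 'C'
           END AS gr
    FROM bigData
    GROUP BY id
) t
GROUP BY gr
"""

# One entry per group: the sorted ID list, the number of IDs, the number of rows.
groups = {
    gr: {"ids": sorted(int(i) for i in ids.split(",")),
         "id_count": n_ids,
         "row_count": n_rows}
    for gr, n_ids, n_rows, ids in conn.execute(query)
}
for gr in sorted(groups):
    print(gr, groups[gr])
```

The inner query reduces each ID to a single label, so the outer query only has to count and concatenate per label; no window or lag function is needed for the counts themselves.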

Related

How to use group by with case statement [closed]

I have a table with the following fields:
CREATE TABLE Tblstock
( ID int , SlNo int, Storage varchar(10), stock int);
insert into Tblstock values
(1, 1, 'STORE', 100),
(2, 1, 'Floor 1', 20),
(3, 2, 'STORE', 2000),
(4, 2, 'Floor 1', 40);
I have to dynamically update the quantity left over in the store after stock has been consumed on floor 1. I have written the following query to calculate the quantity in store:
SELECT (
(SELECT CASE WHEN COUNT(B.SlNo) > 1 OR B.Storage = 'STORE' THEN SUM(B.Stock)END FROM TblStock B GROUP BY B.SlNo) -
(SELECT CASE WHEN COUNT(B.SlNo) > 1 OR B.Storage <> 'STORE' THEN SUM(B.Stock)END FROM TblStock B GROUP BY B.SlNo))
However, it does not produce the desired result and throws an error.
Can anybody help me write it properly so that I get a single value for the remaining quantity in store?
You just need straightforward grouping and conditional aggregation:
SELECT
s.SlNo,
Total = SUM(CASE WHEN s.Storage = 'STORE' THEN s.stock ELSE -s.stock END)
FROM TblStock s
GROUP BY
s.SlNo;
db<>fiddle
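To see the conditional aggregation run against the rows from the question, here is a small sketch in Python with the standard-library sqlite3 module. It assumes the quantity column is named stock, as in the CREATE TABLE statement, and uses a plain column alias instead of the SQL Server "Total = ..." form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Tblstock (ID INT, SlNo INT, Storage VARCHAR(10), stock INT)"
)
conn.executemany(
    "INSERT INTO Tblstock VALUES (?, ?, ?, ?)",
    [(1, 1, "STORE", 100), (2, 1, "Floor 1", 20),
     (3, 2, "STORE", 2000), (4, 2, "Floor 1", 40)],
)

# STORE rows count as positive, every other storage as negative, summed per SlNo.
remaining = dict(conn.execute("""
    SELECT SlNo,
           SUM(CASE WHEN Storage = 'STORE' THEN stock ELSE -stock END) AS Total
    FROM Tblstock
    GROUP BY SlNo
"""))
print(remaining)
```

With this data the result is one remaining quantity per SlNo (100 - 20 and 2000 - 40).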
I assume you are trying to deduct, from the quantity (Qty) held in the storage called store, the sum of the quantities in all other storages. I could think of a query like this:
select *,
(Qty - (select sum(b.Qty) from tblstock as b
where b.Storage <> 'store'
and b.SINo = a.SINo
group by b.SINo)) as remainingQty
from tblstock as a
where a.Storage = 'store' group by a.SINo
The query above, with the following input:
ID  SINo  Storage  Qty
--  ----  -------  ---
1   1     store    100
2   1     floor 1  20
3   1     floor 2  30
4   2     store    100
5   2     floor 1  40
6   2     floor 2  50
It produces the following output:
ID  SINo  Storage  Qty  remainingQty
--  ----  -------  ---  ------------
1   1     store    100  50
4   2     store    100  10
You can find the SQLFiddle here.
Note:
If you want to avoid the subquery and would rather use a join (fiddle):
select a.id,
a.SINo,
a.Storage,
a.Qty,
c.Qty,
(a.Qty - c.Qty) as remainingQty
from tblstock as a
join
(select b.SINo,
sum(b.Qty) as Qty
from tblstock as b
where b.Storage <> 'store'
group by b.SINo) as c
on c.SINo = a.SINo
where a.Storage = 'store' group by a.SINo
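As a sanity check of the join variant, the sketch below loads the sample input from this answer into an in-memory SQLite database (Python standard library) and runs an equivalent query. It is a simplification under one assumption: there is exactly one store row per SINo, so the outer GROUP BY is left out. The alias used is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblstock (ID INT, SINo INT, Storage TEXT, Qty INT)")
conn.executemany(
    "INSERT INTO tblstock VALUES (?, ?, ?, ?)",
    [(1, 1, "store", 100), (2, 1, "floor 1", 20), (3, 1, "floor 2", 30),
     (4, 2, "store", 100), (5, 2, "floor 1", 40), (6, 2, "floor 2", 50)],
)

# The derived table sums everything consumed outside the store, per SINo;
# joining it back to the store row gives the remaining quantity.
rows = conn.execute("""
    SELECT a.ID, a.SINo, a.Qty - c.used AS remainingQty
    FROM tblstock AS a
    JOIN (SELECT SINo, SUM(Qty) AS used
          FROM tblstock
          WHERE Storage <> 'store'
          GROUP BY SINo) AS c
      ON c.SINo = a.SINo
    WHERE a.Storage = 'store'
    ORDER BY a.ID
""").fetchall()
print(rows)
```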

Convert a categorical column to binary representation in SQL

Consider a table with a column that holds an array of strings containing categorical data. Is there an easy way to convert this schema so that there is one boolean column per category, giving a binary encoding of that categorical column?
Example:
id type
-------------
1 [A, C]
2 [B, C]
being converted to :
id is_A is_B is_C
1 1 0 1
2 0 1 1
I know I can do this 'by hand', i.e. using:
WITH flat AS (SELECT id, cat FROM t, UNNEST(type) AS cat),
mid AS (SELECT id, IF(cat = 'A', 1, 0) AS is_A, IF(cat = 'B', 1, 0) AS is_B, IF(cat = 'C', 1, 0) AS is_C FROM flat)
SELECT id, SUM(is_A) AS is_A, SUM(is_B) AS is_B, SUM(is_C) AS is_C FROM mid GROUP BY id
But I am looking for a solution that works when the number of categories is around 1-10K
By the way I am using BigQuery SQL.
looking for a solution that works when the number of categories is around 1-10K
Below is for BigQuery SQL
Step 1 - produce the query dynamically (similar to the one in your question, but now built from your table, yourTable)
#standardSQL
WITH categories AS (SELECT DISTINCT cat FROM yourTable, UNNEST(type) AS cat)
SELECT CONCAT(
"WITH categories AS (SELECT DISTINCT cat FROM yourTable, UNNEST(type) AS cat), ",
"ids AS (SELECT DISTINCT id FROM yourTable), ",
"pairs AS (SELECT id, cat FROM ids CROSS JOIN categories), ",
"flat AS (SELECT id, cat FROM yourTable, UNNEST(type) cat), ",
"combinations AS ( ",
" SELECT p.id, p.cat AS col, IF(f.cat IS NULL, 0, 1) AS flag ",
" FROM pairs AS p LEFT JOIN flat AS f ",
" ON p.cat = f.cat AND p.id=f.id ",
") ",
"SELECT id, ",
STRING_AGG(CONCAT("SUM(IF(col = '", cat, "', flag, 0)) as is_", cat) ORDER BY cat),
" FROM combinations ",
"GROUP BY id ",
"ORDER BY id"
) as query
FROM categories
Step 2 - copy the result of the above query, paste it back into the Web UI and run it.
I think you get the idea. You can implement it as above purely in SQL, or you can generate the final query in any client of your choice.
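For the client-side route, a minimal sketch in Python could build the same query text with plain string formatting. The function name build_pivot_query and the name check are assumptions made for illustration, not part of any BigQuery API; the check exists because category names end up both inside string literals and inside column names.

```python
import re

# Only names that are safe as both a string literal and a column suffix.
SAFE_NAME = re.compile(r"[A-Za-z0-9_]+")

def build_pivot_query(categories, table="yourTable"):
    """Return query text with one is_<cat> column per category (hypothetical helper)."""
    cats = sorted(set(categories))
    for cat in cats:
        if not SAFE_NAME.fullmatch(cat):
            raise ValueError(f"unsafe category name: {cat!r}")
    columns = ", ".join(
        f"SUM(IF(col = '{cat}', flag, 0)) AS is_{cat}" for cat in cats
    )
    return (
        f"WITH categories AS (SELECT DISTINCT cat FROM {table}, UNNEST(type) AS cat), "
        f"ids AS (SELECT DISTINCT id FROM {table}), "
        "pairs AS (SELECT id, cat FROM ids CROSS JOIN categories), "
        f"flat AS (SELECT id, cat FROM {table}, UNNEST(type) AS cat), "
        "combinations AS ("
        " SELECT p.id, p.cat AS col, IF(f.cat IS NULL, 0, 1) AS flag"
        " FROM pairs AS p LEFT JOIN flat AS f ON p.cat = f.cat AND p.id = f.id"
        ") "
        f"SELECT id, {columns} FROM combinations GROUP BY id ORDER BY id"
    )

query = build_pivot_query(["A", "C", "B"])
print(len(query.encode("utf-8")), "bytes")
print(query)
```

Printing the byte length makes it easy to see how quickly the text grows as categories are added.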
I had tried this approach of generating the query (but in Python). The problem is that the query can easily reach BigQuery's 256KB query size limit.
First, let's see how easy it is to reach the 256KB limit.
Assuming an average category name length of 10 chars, you can cover about 4750 categories with this approach.
With an average of 20 chars the coverage is about 3480, and with 30 chars about 2750.
If you "compress" the SQL a little by removing spaces, AS, etc., you get roughly
5400, 3800 and 2970 categories for 10, 20 and 30 chars respectively.
So I would say yes, agreed: it will most likely reach the limit before 5K categories in a real case.
So, secondly, let's see whether this is actually that big of a problem!
Just as an example, assume you need 6K categories. Let's see how you can split them into two batches (assuming that the 3K scenario works with the initial solution).
What we need to do is split the categories into two groups based purely on category name (the comparison is lexicographic, so choose boundary values that actually partition your names).
So the first group will be BETWEEN 'cat1' AND 'cat3000'
And the second group will be BETWEEN 'cat3001' AND 'cat6000'
Now run both groups through Step 1 and Step 2, with tables temp1 and temp2 as destinations.
In Step 1, add the following at the very bottom of the query, after FROM categories:
WHERE cat BETWEEN 'cat1' AND 'cat3000'
for the first batch, and
WHERE cat BETWEEN 'cat3001' AND 'cat6000'
for the second batch.
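One way to pick the batch boundaries is to sort the real category names and cut the sorted list wherever the estimated query text would exceed a size budget. The sketch below does that in Python. The helper names, the 256,000-byte budget and the 1,000-byte allowance for the fixed part of the query are assumptions for illustration, and it assumes the database orders strings the same way Python does (by code point).

```python
def split_into_batches(categories, max_bytes=256_000, fixed_overhead=1_000):
    """Cut the sorted category names into batches whose column lists fit the budget."""
    # Text generated per category, excluding the two copies of the name itself.
    per_column_fixed = len("SUM(IF(col = '', flag, 0)) as is_,")
    budget = max_bytes - fixed_overhead
    batches, current, used = [], [], 0
    for cat in sorted(set(categories)):
        cost = per_column_fixed + 2 * len(cat.encode("utf-8"))
        if current and used + cost > budget:
            batches.append(current)
            current, used = [], 0
        current.append(cat)
        used += cost
    if current:
        batches.append(current)
    return batches

def between_clause(batch):
    """WHERE clause selecting exactly one sorted batch of category names."""
    return f"WHERE cat BETWEEN '{batch[0]}' AND '{batch[-1]}'"

cats = [f"cat{i}" for i in range(1, 6001)]
batches = split_into_batches(cats)
for batch in batches:
    print(len(batch), between_clause(batch))
```

Because the boundaries are taken from the sorted list itself, every category falls into exactly one batch even when the names do not sort numerically (for example 'cat10' sorts before 'cat2').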
Now, proceed to Step 3
Step 3 – Combining partial results
#standardSQL
SELECT * EXCEPT(id2)
FROM temp1 FULL JOIN (
SELECT id AS id2, * EXCEPT(id) FROM temp2
) ON id = id2
-- ORDER BY id
You can test the Step 3 logic with the simple dummy data below (put this WITH clause in front of the Step 3 query):
WITH temp1 AS (
SELECT 1 AS id, 1 AS is_A, 0 AS is_B UNION ALL
SELECT 2 AS id, 0 AS is_A, 1 AS is_B UNION ALL
SELECT 3 AS id, 1 AS is_A, 0 AS is_B
),
temp2 AS (
SELECT 1 AS id, 1 AS is_C, 0 AS is_D UNION ALL
SELECT 2 AS id, 1 AS is_C, 0 AS is_D UNION ALL
SELECT 3 AS id, 0 AS is_C, 1 AS is_D
)
The above can easily be extended to more than two batches.
Hope this helped

Group records and show each records value in one row

I am having trouble using a GROUP BY query in MS Access.
My table structure is as below.
Number Date Marks
1 2011/3/25 20
1 2012/3/21 50
1 2013/3/22 22
1 2014/3/25 56
I want to show data like below
Number march-2011 march-2012 march-2013
1 20 50 22
Could anyone please help me with this? How can I do this in MS Access using a query? I am new to MS Access.
This query simply requires conditional aggregation:
select number,
sum(iif(year(date) = 2011 and month(date) = 3, Marks, 0)) as March2011,
sum(iif(year(date) = 2012 and month(date) = 3, Marks, 0)) as March2012,
sum(iif(year(date) = 2013 and month(date) = 3, Marks, 0)) as March2013
from table
group by number;
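The same conditional-aggregation pattern can be tried outside Access. Below is a minimal sketch in Python with the standard-library sqlite3 module, where CASE WHEN stands in for IIf and strftime stands in for Year()/Month(). The table name results, the column name exam_date and the ISO date strings are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (number INTEGER, exam_date TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [(1, "2011-03-25", 20), (1, "2012-03-21", 50),
     (1, "2013-03-22", 22), (1, "2014-03-25", 56)],
)

# Each output column only adds up the rows that fall in its own month.
row = conn.execute("""
    SELECT number,
           SUM(CASE WHEN strftime('%Y-%m', exam_date) = '2011-03' THEN marks ELSE 0 END) AS March2011,
           SUM(CASE WHEN strftime('%Y-%m', exam_date) = '2012-03' THEN marks ELSE 0 END) AS March2012,
           SUM(CASE WHEN strftime('%Y-%m', exam_date) = '2013-03' THEN marks ELSE 0 END) AS March2013
    FROM results
    GROUP BY number
""").fetchone()
print(row)
```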

Joining onto a table that doesn't have ranges, but requires ranges

Trying to find the best way to write this SQL statement.
I have a customer table that has the internal credit score of that customer. Then I have another table with definitions of that credit score. I would like to join these tables together, but the second table doesn't have any way to link it easily.
The score of the customer is an integer between 1-999, and the definition table has these columns:
Score
Description
And these rows:
60 LOW
99 MED
999 HIGH
So basically if a customer has a score between 1 and 60 they are low, 61-99 they are med, and 100-999 they are high.
I can't really INNER JOIN these, because it would only join them IF the score was exactly 60, 99, or 999, and that would exclude anyone with any other score.
I don't want to do a case statement with the static numbers, because our scores may change in the future and I don't want to have to update my initial query when/if they do. I also cannot create any tables or functions to do this- I need to create a SQL statement to do it for me.
EDIT:
A coworker said this would work, but it's a little crazy. I'm thinking there has to be a better way:
SELECT
internal_credit_score,
(
SELECT
credit_score_short_desc
FROM
cf_internal_credit_score
WHERE
internal_credit_score = (
SELECT
max(credit.internal_credit_score)
FROM
cf_internal_credit_score credit
WHERE
cs.internal_credit_score <= credit.internal_credit_score
AND credit.internal_credit_score <= (
SELECT
min(credit2.internal_credit_score)
FROM
cf_internal_credit_score credit2
WHERE
cs.internal_credit_score <= credit2.internal_credit_score
)
)
)
FROM
customer_statements cs
Try this: change your table to contain the score ranges:
ScoreTable
-------------
LowScore int
HighScore int
ScoreDescription string
data values
LowScore HighScore ScoreDescription
-------- --------- ----------------
1 60 Low
61 99 Med
100 999 High
query:
Select
.... , ScoreTable.ScoreDescription
FROM YourTable
INNER JOIN ScoreTable ON YourTable.Score>=ScoreTable.LowScore
AND YourTable.Score<=ScoreTable.HighScore
WHERE ...
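To illustrate the range join, here is a small sketch in Python with the standard-library sqlite3 module. ScoreTable and its columns follow the layout described in this answer; the Customers table and its rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ScoreTable (LowScore INT, HighScore INT, ScoreDescription TEXT);
    INSERT INTO ScoreTable VALUES (1, 60, 'Low'), (61, 99, 'Med'), (100, 999, 'High');
    CREATE TABLE Customers (Name TEXT, Score INT);  -- hypothetical customer table
    INSERT INTO Customers VALUES ('ann', 60), ('bob', 61), ('cy', 100), ('dee', 999);
""")

# Each customer matches the single row whose range contains the score.
labels = dict(conn.execute("""
    SELECT c.Name, s.ScoreDescription
    FROM Customers AS c
    INNER JOIN ScoreTable AS s
            ON c.Score >= s.LowScore
           AND c.Score <= s.HighScore
"""))
print(labels)
```

The ranges must not overlap, otherwise a customer would match more than one row.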
Assuming your table is named CreditTable, this is what you want:
select * from
(
select Description, Score
from CreditTable
where Score > 80 /*client's credit*/
order by Score
)
where rownum = 1
Also, make sure your high score reference value is 1000, even though client's highest score possible is 999.
Update
The above SQL gives you the credit record for a given value. If you want to join with, say, Clients table, you'd do something like this:
select
c.Name,
c.Score,
(select Description from
(select Description from CreditTable where Score > c.Score order by Score)
where rownum = 1)
from clients c
I know this is a sub-select that is executed for each returned row, but then again, CreditTable is ridiculously small and there will be no significant performance loss because of the sub-select.
You can use analytic functions to convert the data in your score description table to ranges (I assume that you meant that 100-999 should map to 'HIGH', not 99-999).
SQL> ed
Wrote file afiedt.buf
1 with x as (
2 select 60 score, 'Low' description from dual union all
3 select 99, 'Med' from dual union all
4 select 999, 'High' from dual
5 )
6 select description,
7 nvl(lag(score) over (order by score),0) + 1 low_range,
8 score high_range
9* from x
SQL> /
DESC LOW_RANGE HIGH_RANGE
---- ---------- ----------
Low 1 60
Med 61 99
High 100 999
You can then join this to your CUSTOMER table with something like
SELECT c.*,
sd.*
FROM customer c,
(select description,
nvl(lag(score) over (order by score),0) + 1 low_range,
score high_range
from score_description) sd
WHERE c.credit_score BETWEEN sd.low_range AND sd.high_range
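The LAG-based range construction is not specific to Oracle. As a rough check, the sketch below runs the same idea in SQLite through Python's standard-library sqlite3 module, with COALESCE in place of NVL. It assumes a SQLite build with window-function support (3.25 or later), and the customer rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE score_description (score INT, description TEXT);
    INSERT INTO score_description VALUES (60, 'LOW'), (99, 'MED'), (999, 'HIGH');
    CREATE TABLE customer (name TEXT, credit_score INT);  -- invented sample rows
    INSERT INTO customer VALUES ('ann', 1), ('bob', 60), ('cy', 61),
                                ('dee', 99), ('eve', 100);
""")

# LAG(score) is the previous upper bound; adding 1 gives this row's lower bound.
rows = conn.execute("""
    SELECT c.name, sd.description, sd.low_range, sd.high_range
    FROM customer AS c
    JOIN (SELECT description,
                 COALESCE(LAG(score) OVER (ORDER BY score), 0) + 1 AS low_range,
                 score AS high_range
          FROM score_description) AS sd
      ON c.credit_score BETWEEN sd.low_range AND sd.high_range
    ORDER BY c.credit_score
""").fetchall()
for r in rows:
    print(r)
```

Because the ranges are derived from the definition table at query time, changing a boundary value in that table changes the mapping without touching the query.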

SQL Query Help: Returning distinct values from Count subquery

I've been stuck for quite a while now trying to get this query to work.
Here's the setup:
I have a [Notes] table that contains a nonunique (Number) column and a nonunique (Result) column. I'm looking to create a SELECT statement that will display each distinct (Number) value where the count of the {(Number), (Result)} tuple where Result = 'NA' is > 25.
Number | Result
100 | 'NA'
100 | 'TT'
101 | 'NA'
102 | 'AM'
100 | 'TT'
200 | 'NA'
200 | 'NA'
201 | 'NA'
Basically, we have an autodialer that calls a number and returns a code depending on the result of the call. We want to ignore numbers that have had an 'NA' (no answer) code returned more than 25 times.
My basic attempts so far have been similar to:
SELECT DISTINCT n1.Number
FROM Notes n1
WHERE (SELECT COUNT(*) FROM Notes n2
WHERE n1.Number = n2.Number and n1.Result = 'NA') > 25
I know this query isn't correct, but in general I'm not sure how to relate the DISTINCT n1.Number from the initial select to the Number used in the subquery COUNT. Most examples I see aren't actually doing this by adding a condition to the COUNT returned. I haven't had to touch too much SQL in the past half decade, so I'm quite rusty.
You can do it like this:
SELECT Number
FROM Notes
WHERE Result = 'NA'
GROUP BY Number
HAVING COUNT(Result) > 25
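To try the WHERE / GROUP BY / HAVING pattern on the sample rows from the question, here is a minimal sketch in Python with the standard-library sqlite3 module. The threshold is passed as a parameter and set to 1 instead of 25 purely so that the small sample returns something.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Notes (Number INT, Result TEXT)")
conn.executemany(
    "INSERT INTO Notes VALUES (?, ?)",
    [(100, "NA"), (100, "TT"), (101, "NA"), (102, "AM"),
     (100, "TT"), (200, "NA"), (200, "NA"), (201, "NA")],
)

threshold = 1  # the question uses 25; lowered here to fit the sample data

# Filter to 'NA' rows first, then keep only the numbers with enough of them.
numbers = [n for (n,) in conn.execute("""
    SELECT Number
    FROM Notes
    WHERE Result = 'NA'
    GROUP BY Number
    HAVING COUNT(*) > ?
    ORDER BY Number
""", (threshold,))]
print(numbers)
```

GROUP BY already yields one row per Number, so no DISTINCT is needed.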
Try this:
SELECT Number
FROM (
SELECT Number, Count(Result) as CountNA
FROM Notes
WHERE Result = 'NA'
GROUP BY Number
)
WHERE CountNA > 25
EDIT: depending on SQL product, you may need to give the derived table a table correlation name e.g.
SELECT DT1.Number
FROM (
SELECT Number, Count(Result) as CountNA
FROM Notes
WHERE Result = 'NA'
GROUP
BY Number
) AS DT1 (Number, CountNA)
WHERE DT1.CountNA > 25;