converting NULLs to 0 in an ARRAY join - sql

I have a table that has nulls and I want to replace them with 0s. The table was generated by a join between a table ('Table_A') and an array ('Table_B').
Current table:
Date | Sessions | ID | City
------+----------+-------------+-------------
06-02 | 1 | 107 | Cardiff
| | 102 | Paris
06-03 | NULL | NULL | NULL
11-12 | 1 | 105 | Amsterdam
| | 107 | Cardiff
| | 103 | Rome
27-06 | NULL | NULL | NULL
Desired output:
Date | Sessions | ID | City
------+----------+-------------+-------------
06-02 | 1 | 107 | Cardiff
| | 102 | Paris
06-03 | 0 | 0 | 0
11-12 | 1 | 105 | Amsterdam
| | 107 | Cardiff
| | 103 | Rome
27-06 | 0 | 0 | 0
Below is my current code. I can't remove the 'ignore nulls' because it wouldn't allow me to do the join.
select date, Sessions,
array_agg(a.ID ignore nulls) as ID, array_agg(City ignore nulls) as City
from Table_B b, unnest (ID) as ID_un
left join Table_A a on ID_un = cast(a.ID as string)
group by 1, 2
...

Current Table
WITH sample_data AS (
SELECT '06-02' Date, 1 Sessions, [107, 102] ID, ['Cardiff', 'Paris'] City UNION ALL
SELECT '06-03', NULL, [], [] UNION ALL
SELECT '11-12', 1, [105, 107, 103], ['Amsterdam', 'Cardiff', 'Rome'] UNION ALL
SELECT '27-06', NULL, NULL, NULL
)
Note that an empty array is displayed as null in the output.
Desired Output
I have a table that has nulls and I want to replace them with 0s.
The query below replaces an empty or NULL array with [0] or ['0'], depending on its type.
SELECT Date,
COALESCE(Sessions, 0) AS Sessions,
IF(ARRAY_LENGTH(ID) = 0 OR ID IS NULL, [0], ID) AS ID,
IF(ARRAY_LENGTH(City) = 0 OR City IS NULL, ['0'], City) AS City,
-- below is a slightly more concise form of the two lines above.
-- IF(ARRAY_LENGTH(ID) > 0, ID, [0]) AS ID,
-- IF(ARRAY_LENGTH(City) > 0, City, ['0']) AS City,
FROM sample_data;
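The replacement rule can be sketched outside BigQuery as well. The Python below is only an illustration of the logic, not the answer's query: the helper name `array_or_default` and the `rows` list are made up for this example, and it assumes that `ARRAY_LENGTH` of a NULL array is NULL, so the `> 0` test fails for both an empty array and NULL.

```python
def array_or_default(arr, default):
    """Return arr unless it is None or empty, in which case return default."""
    # Mirrors IF(ARRAY_LENGTH(arr) > 0, arr, default): the length test is
    # not true for an empty array or for NULL, so both get the default.
    return arr if arr else default


# Same shape as sample_data in the answer (hypothetical Python copy).
rows = [
    ("06-02", 1, [107, 102], ["Cardiff", "Paris"]),
    ("06-03", None, [], []),
    ("11-12", 1, [105, 107, 103], ["Amsterdam", "Cardiff", "Rome"]),
    ("27-06", None, None, None),
]

cleaned = [
    (
        date,
        sessions if sessions is not None else 0,  # COALESCE(Sessions, 0)
        array_or_default(ids, [0]),
        array_or_default(cities, ["0"]),
    )
    for date, sessions, ids, cities in rows
]

for row in cleaned:
    print(row)
```

Both the empty-array row (06-03) and the NULL row (27-06) end up with 0, [0] and ['0'], while the populated rows pass through unchanged.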

Related

Count of records by category, including zeros

I have a table in the following format:
----------------------------------------------------
| Id | user_name | submitted | reviewed | returned |
---------------------------------------------------------
| 1 | tom | 01-01-2020 | 02-01-2020 | |
| 2 | mary | 01-15-2020 | | |
| 3 | joe | 01-25-2020 | 02-07-2020 | 03-04-2020 |
| 4 | tom | 01-07-2020 | | |
| 5 | tom | 01-04-2020 | | |
| 6 | mary | 01-16-2020 | | |
| 7 | joe | 02-08-2020 | 02-08-2020 | 03-07-2020 |
| 8 | mary | 01-05-2020 | 01-20-2020 | 03-19-2020 |
| 9 | joe | 01-21-2020 | 02-09-2020 | |
---------------------------------------------------------
I want to write a query that counts the Submitted, Reviewed, and Returned records for each user, where "Submitted" is any record where the submitted date is not null and reviewed and returned are null. "Reviewed" is any record where the submitted and reviewed dates are not null and the returned date is null. "Returned" is any record where the submitted, reviewed and returned dates are all not null.
The desired output would be as follows:
-----------------------------------------------------
| user_name | # Submitted | # Reviewed | # Returned |
-----------------------------------------------------
| joe | 0 | 1 | 2 |
| mary | 2 | 0 | 1 |
| tom | 2 | 1 | 0 |
-----------------------------------------------------
I tried doing three separate counts queries grouped by user_name, but those miss the zeros. I'm very new to sql so any help would be greatly appreciated.
Just use count(). Based on your sample data, you can look at each column individually:
select user_name,
count(submitted) as num_submitted,
count(reviewed) as num_reviewed,
count(returned) as num_returned
from t
group by user_name;
There are no examples, for instance, where returned is non-NULL and either of the other columns is NULL.
If that is actually possible, you could use conditional aggregation:
select user_name,
count(submitted) as num_submitted,
sum(case when submitted is not null and reviewed is not null then 1 else 0 end) as num_reviewed,
sum(case when submitted is not null and reviewed is not null and returned is not null then 1 else 0 end) as num_returned
from t
group by user_name;
You could also use count() and play games with arithmetic:
select user_name,
count(submitted) as num_submitted,
count(day(submitted) + day(reviewed)) as num_reviewed,
count(day(submitted) + day(reviewed) + day(returned)) as num_returned
from t
group by user_name;
This works because day() returns NULL if the value is NULL. And + returns NULL if any value is NULL.
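To see the NULL handling concretely, here is a minimal sketch that runs the first query in SQLite through Python's `sqlite3` module. SQLite is only a stand-in for the asker's database, dates are stored as text, and only joe's three rows are loaded. It also checks that `+` with a NULL operand yields NULL. Note that these counts are cumulative (a returned record also counts as submitted and reviewed), so they are not the same numbers as the exclusive categories in the question's desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, user_name TEXT, "
    "submitted TEXT, reviewed TEXT, returned TEXT)"
)
# joe's rows from the question; None becomes SQL NULL.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (3, "joe", "01-25-2020", "02-07-2020", "03-04-2020"),
        (7, "joe", "02-08-2020", "02-08-2020", "03-07-2020"),
        (9, "joe", "01-21-2020", "02-09-2020", None),
    ],
)

# COUNT(column) skips NULLs, so each count only sees populated dates.
counts = conn.execute(
    """
    SELECT user_name,
           COUNT(submitted) AS num_submitted,
           COUNT(reviewed)  AS num_reviewed,
           COUNT(returned)  AS num_returned
    FROM t
    GROUP BY user_name
    """
).fetchall()

# Arithmetic with NULL propagates NULL (None in Python).
null_sum = conn.execute("SELECT 1 + NULL").fetchone()[0]

print(counts, null_sum)
```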
Try this:
DECLARE @DataSource TABLE
(
[id] INT
,[user_name] NVARCHAR(128)
,[submitted] DATE
,[reviewed] DATE
,[returned] DATE
);
INSERT INTO @DataSource ([id], [user_name], [submitted], [reviewed], [returned])
VALUES (1, 'tom', '01-01-2020', '02-01-2020', NULL)
,(2, 'mary', '01-15-2020', NULL, NULL)
,(3, 'joe', '01-25-2020', '02-07-2020', '03-04-2020')
,(4, 'tom', '01-07-2020', NULL, NULL)
,(5, 'tom', '01-04-2020', NULL, NULL)
,(6, 'mary', '01-16-2020', NULL, NULL)
,(7, 'joe', '02-08-2020', '02-08-2020', '03-07-2020')
,(8, 'mary', '01-05-2020', '01-20-2020', '03-19-2020')
,(9, 'joe', '01-21-2020', '02-09-2020', NULL);
-- Each row falls into exactly one category, so users with no rows in a category get 0.
SELECT [user_name]
,SUM(IIF([returned] IS NULL AND [reviewed] IS NULL AND [submitted] IS NOT NULL, 1, 0)) AS [# Submitted]
,SUM(IIF([returned] IS NULL AND [reviewed] IS NOT NULL AND [submitted] IS NOT NULL, 1, 0)) AS [# Reviewed]
,SUM(IIF([returned] IS NOT NULL AND [reviewed] IS NOT NULL AND [submitted] IS NOT NULL, 1, 0)) AS [# Returned]
FROM @DataSource
GROUP BY [user_name];
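The same exclusive-category aggregation can be checked against the question's sample data without SQL Server. The sketch below is an assumption-laden port: it uses SQLite via Python's `sqlite3`, replaces `IIF` with a portable `CASE` expression, stores dates as text, and uses plain column aliases instead of the bracketed names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, user_name TEXT, "
    "submitted TEXT, reviewed TEXT, returned TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (1, "tom", "01-01-2020", "02-01-2020", None),
        (2, "mary", "01-15-2020", None, None),
        (3, "joe", "01-25-2020", "02-07-2020", "03-04-2020"),
        (4, "tom", "01-07-2020", None, None),
        (5, "tom", "01-04-2020", None, None),
        (6, "mary", "01-16-2020", None, None),
        (7, "joe", "02-08-2020", "02-08-2020", "03-07-2020"),
        (8, "mary", "01-05-2020", "01-20-2020", "03-19-2020"),
        (9, "joe", "01-21-2020", "02-09-2020", None),
    ],
)

# One CASE per category; SUM over 0/1 flags keeps the zero counts.
per_user = conn.execute(
    """
    SELECT user_name,
           SUM(CASE WHEN submitted IS NOT NULL AND reviewed IS NULL
                     AND returned IS NULL THEN 1 ELSE 0 END) AS num_submitted,
           SUM(CASE WHEN submitted IS NOT NULL AND reviewed IS NOT NULL
                     AND returned IS NULL THEN 1 ELSE 0 END) AS num_reviewed,
           SUM(CASE WHEN submitted IS NOT NULL AND reviewed IS NOT NULL
                     AND returned IS NOT NULL THEN 1 ELSE 0 END) AS num_returned
    FROM t
    GROUP BY user_name
    ORDER BY user_name
    """
).fetchall()

for row in per_user:
    print(row)
```

With the nine sample rows this reproduces the desired output from the question: joe 0/1/2, mary 2/0/1, tom 2/1/0.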

eSQL multiple join but with conditions

I have 3 tables, shown below:
MERCHANDISE
+-----------+-----------+---------------+
| MERCH_NUM | MERCH_DIV | MERCH_SUB_DIV |
+-----------+-----------+---------------+
| 1 | car | awd |
| 1 | car | awd |
| 2 | bike | 1kcc |
| 3 | cycle | hybrid |
| 3 | cycle | city |
| 4 | moped | fixie |
+-----------+-----------+---------------+
PRIORITY
+----------+-----------+---------+---------+------------+------------+---------------+
| CUST_NUM | SALES_NUM | DOC_NUM | BALANCE | PRIORITY_1 | PRIORITY_2 | PRIORITY_CODE |
+----------+-----------+---------+---------+------------+------------+---------------+
| 90 | 1000 | 10 | 23 | 1 | 6 | NO |
| 91 | 1001 | 20 | 32 | 3 | 7 | PRI |
| 92 | 1002 | 30 | 11 | 2 | 8 | LATE |
| 93 | 1003 | 40 | 22 | 5 | 9 | 1MON |
+----------+-----------+---------+---------+------------+------------+---------------+
ORDER
+----------+-----------+---------+---------+-----------+-----------+
| CUST_NUM | SALES_NUM | DOC_NUM | COUNTRY | MERCH_NUM | MERCH_DIV |
+----------+-----------+---------+---------+-----------+-----------+
| 90 | 1000 | 10 | INDIA | 1 | car |
| 91 | 1001 | 20 | CHINA | 2 | bike |
| 92 | 1002 | 30 | USA | 3 | cycle |
| 93 | 1003 | 40 | UK | 4 | moped |
+----------+-----------+---------+---------+-----------+-----------+
I want to join the result of left-joining the last two tables with the first one, such that the MERCH_SUB_DIV 'awd' appears only once for each unique combination of MERCH_NUM and MERCH_DIV.
The code I came up with is below, but I'm not sure how to eliminate the duplicate row just for 'awd':
select
ROW#, MERCH.MERCH_NUMBER, ORDPRI.MERCH_NUMBER, ORDPRI.CUST_NUM,
BALANCE, SALES_NUM, ITEM_NUM, RANK, PRIORITY_1
from (
select
ROW_NUMBER() OVER(
PARTITION BY ORD.DOC_NUM, ORD.ITEM_NUM
ORDER BY ORD.DOC_NUM, ORD.ITEM_NUM ASC
) AS Row#,
ORD.CUST_NUM, PRI.CUST_NUM, ORD.MERCH_NUM, ORD.MERCH_DIV, PRI.BALANCE,
pri.DOC_NUM, pri.SALES_NUM, pri.PRIORITY_1, pri.PRIORITY_2
from ORDER as ORD
left join PRIORITY as PRI on ORD.DOC_NUM = PRI.DOC_NUM
and ORD.SALES_NUMBER = PRI.SALES_NUM
where country_name in ('USA', 'INDIA')
) as ORDPRI
left join MERCHANDISE as MERCH on ORDPRI.DIV = MERCH.DIV
and ORDPRI.MERCH_NUM = MERCH.MERCH_NUM
You have to use the DISTINCT keyword to get unique values, but if your PRIORITY and ORDER tables contain different values for the same MERCH_NUM, the final result will still repeat that MERCH_NUM.
SELECT DISTINCT M.MERCH_NUMBER, O.MERCH_NUMBER, O.CUST_NUM, BALANCE, SALES_NUM,ITEM_NUM,RANK,PRIORITY_1
FROM priority_table P
LEFT JOIN order_table O ON P.CUST_NUM = O.CUST_NUM AND P.SALES_NUM=O.SALES_NUM AND P.DOC_NUM = O.DOC_NUM
LEFT JOIN merchandise_table M ON M.MERCH_NUM = O.MERCH_NUM
A workaround is to add a second ROW_NUMBER() in the outermost query, partitioned by MERCH_SUB_DIV plus all the columns in the final select list, and then filter the final results on that new row number. The following pseudocode might help:
select
-- All expected columns in final result except the newRow#
ROW#, MERCH_NUM, CUST_NUM,
BALANCE, SALES_NUM, PRIORITY_1
from (
select
ROW#,
-- the new row number includes all column you want to show in final result
row_number() over ( PARTITION BY MERCH.MERCH_SUB_DIV ,
MERCH.MERCH_NUM, ORDPRI.MERCH_NUM, ORDPRI.CUST_NUM,
BALANCE, SALES_NUM, PRIORITY_1
order by (select 1 )) as newRow# ,
MERCH.MERCH_NUM, ORDPRI.CUST_NUM,
BALANCE, SALES_NUM, PRIORITY_1
from (
-- main query goes here
select
ROW_NUMBER() OVER(
PARTITION BY ORD.DOC_NUM --, ORD.ITEM_NUM
ORDER BY ORD.DOC_NUM ASC --, ORD.ITEM_NUM
) AS Row#,
ORD.CUST_NUM, ORD.MERCH_NUM, ORD.MERCH_DIV as DIV, PRI.BALANCE,
pri.DOC_NUM, pri.SALES_NUM, pri.PRIORITY_1, pri.PRIORITY_2
from #ORDER as ORD
left join #PRIORITY as PRI on ORD.DOC_NUM = PRI.DOC_NUM
and ORD.SALES_NUMBER = PRI.SALES_NUM
where country_name in ('USA', 'INDIA')
) as ORDPRI
left join #MERCHANDISE as MERCH on ORDPRI.DIV = MERCH.DIV
and ORDPRI.MERCH_NUM = MERCH.MERCH_NUM
) as T
-- final filter to get distinct values
where newRow# = 1
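The row-number idea can be tried on the question's sample rows. The following is a minimal sketch, not the answer's exact query: it runs in SQLite through Python's `sqlite3` (window functions need SQLite 3.25 or newer), the table is called `orders` because ORDER is a reserved word, only a few of the original columns are kept, and the duplicate filter is applied to the merchandise rows before the join rather than in the outermost query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE merchandise (merch_num INTEGER, merch_div TEXT, merch_sub_div TEXT);
    INSERT INTO merchandise VALUES
        (1, 'car', 'awd'), (1, 'car', 'awd'), (2, 'bike', '1kcc'),
        (3, 'cycle', 'hybrid'), (3, 'cycle', 'city'), (4, 'moped', 'fixie');
    CREATE TABLE orders (cust_num INTEGER, country TEXT, merch_num INTEGER, merch_div TEXT);
    INSERT INTO orders VALUES
        (90, 'INDIA', 1, 'car'), (91, 'CHINA', 2, 'bike'),
        (92, 'USA', 3, 'cycle'), (93, 'UK', 4, 'moped');
    """
)

# Number identical merchandise rows, then join only the first copy (rn = 1).
deduped = conn.execute(
    """
    WITH merch AS (
        SELECT merch_num, merch_div, merch_sub_div,
               ROW_NUMBER() OVER (
                   PARTITION BY merch_num, merch_div, merch_sub_div
                   ORDER BY merch_num
               ) AS rn
        FROM merchandise
    )
    SELECT o.cust_num, o.merch_num, m.merch_sub_div
    FROM orders AS o
    LEFT JOIN merch AS m
           ON m.merch_num = o.merch_num
          AND m.merch_div = o.merch_div
          AND m.rn = 1
    WHERE o.country IN ('USA', 'INDIA')
    ORDER BY o.cust_num, m.merch_sub_div
    """
).fetchall()

for row in deduped:
    print(row)
```

The duplicated 'awd' row shows up once for customer 90, while the two distinct sub-divisions of merchandise 3 ('city' and 'hybrid') are both kept for customer 92.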

Bigquery avoid null data and merge rows

In Google BigQuery, I have datasets with data dispersed between int_value and double_value. How can I merge the rows?
-------------------------------------------------------------------------
|user_id | params.string_value | params.int_value | params.double_value |
-------------------------------------------------------------------------
| 12 | null | null | 121 |
| 12 | Tom | null | null |
| 12 | null | null | 141 |
| 12 | Kim | null | null |
| 13 | null | null | 961 |
| 13 | Jack | null | null |
| 14 | null | null | 31 |
| 14 | Jerry | null | null |
-------------------------------------------------------------------------
Result needed
-------------------------------------------------------------------------
|user_id | params.string_value | params.int_value | params.double_value |
-------------------------------------------------------------------------
| 12 | Tom | null | 121 |
| 12 | Kim | null | 141 |
| 13 | Jack | null | 961 |
| 14 | Jerry | null | 31 |
-------------------------------------------------------------------------
There can be multiple rows for the same user_id, but with different params.string_value | params.int_value | params.double_value.
I want to merge all the rows that have the same user_id in BigQuery.
Below is for BigQuery Standard SQL
#standardSQL
SELECT user_id, STRUCT(string_value, int_value, double_value) params
FROM (
SELECT user_id,
ARRAY_AGG(params.string_value IGNORE NULLS) string_values,
ARRAY_AGG(params.int_value IGNORE NULLS) int_values,
ARRAY_AGG(params.double_value IGNORE NULLS) double_values
FROM `project.dataset.table`
GROUP BY user_id
)
LEFT JOIN UNNEST(string_values) string_value WITH OFFSET
LEFT JOIN UNNEST(int_values) int_value WITH OFFSET USING(OFFSET)
LEFT JOIN UNNEST(double_values) double_value WITH OFFSET USING(OFFSET)
Applied to the sample data from your question:
WITH `project.dataset.table` AS (
SELECT 12 user_id, STRUCT<string_value STRING, int_value INT64, double_value FLOAT64>(NULL, NULL, 121) AS params UNION ALL
SELECT 12, STRUCT('Tom', NULL, NULL) UNION ALL
SELECT 12, STRUCT(NULL, NULL, 141) UNION ALL
SELECT 12, STRUCT('Kim', NULL, NULL) UNION ALL
SELECT 13, STRUCT(NULL, NULL, 961) UNION ALL
SELECT 13, STRUCT('Jack', NULL, NULL) UNION ALL
SELECT 14, STRUCT(NULL, NULL, 31) UNION ALL
SELECT 14, STRUCT('Jerry', NULL, NULL)
)
the result is:
Row user_id params.string_value params.int_value params.double_value
1 12 Tom null 121.0
2 12 Kim null 141.0
3 13 Jack null 961.0
4 14 Jerry null 31.0
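What the ARRAY_AGG plus UNNEST ... WITH OFFSET query does can be sketched in plain Python: per user, collect the non-null values of each column, then pair them up by position. This is an illustration under assumptions, not BigQuery code: the function name `merge_by_offset` and the tuple layout are made up, the input order is taken as the aggregation order (BigQuery does not promise an order for ARRAY_AGG without ORDER BY), and users with no string values are simply skipped for brevity.

```python
from collections import defaultdict

# (user_id, string_value, int_value, double_value), as in the question.
rows = [
    (12, None, None, 121.0),
    (12, "Tom", None, None),
    (12, None, None, 141.0),
    (12, "Kim", None, None),
    (13, None, None, 961.0),
    (13, "Jack", None, None),
    (14, None, None, 31.0),
    (14, "Jerry", None, None),
]


def merge_by_offset(rows):
    # Step 1: per user, keep only the non-null values of each column
    # (the ARRAY_AGG(... IGNORE NULLS) part).
    grouped = defaultdict(lambda: ([], [], []))
    for user_id, s, i, d in rows:
        strings, ints, doubles = grouped[user_id]
        if s is not None:
            strings.append(s)
        if i is not None:
            ints.append(i)
        if d is not None:
            doubles.append(d)

    # Step 2: walk the string values and pick the int/double value at the
    # same offset, or None when that array is shorter (the LEFT JOINs).
    merged = []
    for user_id, (strings, ints, doubles) in grouped.items():
        for offset, s in enumerate(strings):
            i = ints[offset] if offset < len(ints) else None
            d = doubles[offset] if offset < len(doubles) else None
            merged.append((user_id, s, i, d))
    return merged


merged = merge_by_offset(rows)
for row in merged:
    print(row)
```

Because pairing is purely positional, the result depends on the order in which values are collected; here 'Tom' is matched with 121.0 and 'Kim' with 141.0 only because the rows arrive in that order.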
You can use the MAX function:
SELECT user_id,
MAX(params.string_value) as string_value,
MAX(params.int_value) as int_value,
MAX(params.double_value) as double_value
FROM your_dataset.your_table
GROUP BY user_id
MAX does not consider NULL values. Neither does MIN, so you can also use that one!
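A quick way to see how MAX treats NULLs here is to run the aggregation on the question's rows. The sketch below uses SQLite via Python's `sqlite3` as a stand-in for BigQuery, with the `params` struct flattened into three ordinary columns and a made-up table name `events`. It shows that MAX skips NULLs, and also that GROUP BY user_id yields a single row per user, so user 12 keeps only one of its two name/value pairs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (user_id INTEGER, string_value TEXT, "
    "int_value INTEGER, double_value REAL)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        (12, None, None, 121.0),
        (12, "Tom", None, None),
        (12, None, None, 141.0),
        (12, "Kim", None, None),
        (13, None, None, 961.0),
        (13, "Jack", None, None),
        (14, None, None, 31.0),
        (14, "Jerry", None, None),
    ],
)

# MAX ignores NULLs; a column that is NULL in every row stays NULL.
max_rows = conn.execute(
    """
    SELECT user_id,
           MAX(string_value) AS string_value,
           MAX(int_value)    AS int_value,
           MAX(double_value) AS double_value
    FROM events
    GROUP BY user_id
    ORDER BY user_id
    """
).fetchall()

for row in max_rows:
    print(row)
```

So this approach fits when each user has at most one value per column; with several values per user, as for user 12, only the largest of each column survives.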

SQL - Rows that are repetitive with a particular condition

We have a table like this:
+----+-------+-----------------+----------------+-----------------+----------------+-----------------+
| ID | Name | RecievedService | FirstZoneTeeth | SecondZoneTeeth | ThirdZoneTeeth | FourthZoneTeeth |
+----+-------+-----------------+----------------+-----------------+----------------+-----------------+
| 1 | John | SomeService1 | 13 | | 4 | |
+----+-------+-----------------+----------------+-----------------+----------------+-----------------+
| 2 | John | SomeService1 | 34 | | | |
+----+-------+-----------------+----------------+-----------------+----------------+-----------------+
| 3 | Steve | SomeService3 | | | | 2 |
+----+-------+-----------------+----------------+-----------------+----------------+-----------------+
| 4 | Steve | SomeService4 | | | | 12 |
+----+-------+-----------------+----------------+-----------------+----------------+-----------------+
Every digit in a zone is a tooth (dental notation), so "John" has received "SomeService1" twice for tooth #3 (first zone):
+----+------+-----------------+----------------+-----------------+----------------+-----------------+
| ID | Name | RecievedService | FirstZoneTeeth | SecondZoneTeeth | ThirdZoneTeeth | FourthZoneTeeth |
+----+------+-----------------+----------------+-----------------+----------------+-----------------+
| 1 | John | SomeService1 | 13 | | 4 | |
+----+------+-----------------+----------------+-----------------+----------------+-----------------+
| 2 | John | SomeService1 | 34 | | | |
+----+------+-----------------+----------------+-----------------+----------------+-----------------+
Note that Steve has received services twice for tooth #2 (4th zone), but the services are not the same.
I've written some code that gives me a table of duplicate rows (checking only the patient and the received service, using a GROUP BY clause), but I need to check the zones too.
I've tried this:
select ROW_NUMBER() over(order by vv.ID_sick) as RowNum,
bb.Radif,
bb.VCount as 'Count',
vv.ID_sick 'ID_Sick',
vv.ID_service 'ID_Service',
sick.FNamesick + ' ' + sick.LNamesick as 'Sick',
serv.NameService as 'Service',
vv.Mab_Service as 'MabService',
vv.Mab_daryafti as 'MabDaryafti',
vv.datevisit as 'DateVisit',
vv.Zone1,
vv.Zone2,
vv.Zone3,
vv.Zone4,
vv.ID_dentist as 'ID_Dentist',
dent.FNamedentist + ' ' + dent.LNamedentist as 'Dentist',
vv.id_do as 'ID_Do',
do.FNamedentist + ' ' + do.LNamedentist as 'Do'
from visiting vv inner join (
select ROW_NUMBER() OVER(ORDER BY a.ID_sick ASC) AS Radif,
count(a.ID_sick) as VCount,
a.ID_sick,
a.ID_service
from visiting a
group by a.ID_sick, a.ID_service, a.Zone1, a.Zone2, a.Zone3, a.Zone4
having count(a.ID_sick)>1)bb
on vv.ID_sick = bb.ID_sick and vv.ID_service = bb.ID_service
left join InfoSick sick on vv.ID_sick = sick.IDsick
left join infoService serv on vv.ID_service = serv.IDService
left join Infodentist dent on vv.ID_dentist = dent.IDdentist
left join infodentist do on vv.id_do = do.IDdentist
order by bb.ID_sick, bb.ID_service,vv.datevisit
But this code only returns rows where all the teeth are repeated. What I want is rows where even a single tooth repeats.
How can I implement this?
I need to check the individual characters in the zones.
Note: the zones' datatype is varchar.
This is a bad datamodel for what you are trying to do. By storing the teeth as a varchar, you have kind of decided that you are not interested in single teeth, but only in the group of teeth. Now, however, you are trying to investigate on single teeth.
You'd want a datamodel like this:
service
+------------+--------+-----------------+
| service_id | Name | RecievedService |
+------------+--------+-----------------+
| 1 | John | SomeService1 |
+------------+--------+-----------------+
| 3 | Steve | SomeService3 |
+------------+--------+-----------------+
| 4 | Steve | SomeService4 |
+------------+--------+-----------------+
service_detail
+------------+------+-------+
| service_id | zone | tooth |
+------------+------+-------+
| 1 | 1 | 1 |
| 1 | 1 | 3 |
| 1 | 3 | 4 |
+------------+------+-------+
| 1 | 1 | 3 |
| 1 | 1 | 4 |
+------------+------+-------+
| 3 | 4 | 2 |
+------------+------+-------+
| 4 | 4 | 1 |
| 4 | 4 | 2 |
+------------+------+-------+
What you can do with the given datamodel is to create such a table on the fly using a recursive query and string manipulation:
with unpivoted(service_id, name, zone, teeth) as
(
select recievedservice, name, 1, firstzoneteeth
from mytable where len(firstzoneteeth) > 0
union all
select recievedservice, name, 2, secondzoneteeth
from mytable where len(secondzoneteeth) > 0
union all
select recievedservice, name, 3, thirdzoneteeth
from mytable where len(thirdzoneteeth) > 0
union all
select recievedservice, name, 4, fourthzoneteeth
from mytable where len(fourthzoneteeth) > 0
)
, service_details(service_id, name, zone, tooth, teeth) as
(
select
service_id, name, zone, substring(teeth, 1, 1), substring(teeth, 2, 10000)
from unpivoted
union all
select
service_id, name, zone, substring(teeth, 1, 1), substring(teeth, 2, 10000)
from service_details
where len(teeth) > 0
)
, duplicates(service_id, name) as
(
select distinct service_id, name
from service_details
group by service_id, name, zone, tooth
having count(*) > 1
)
select m.*
from mytable m
join duplicates d on d.service_id = m.recievedservice and d.name = m.name;
A lot of work and a rather slow query due to a bad datamodel, but still feasible.
Rextester demo: http://rextester.com/JVWK49901
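The logic of the query (split each zone string into single teeth, then look for the same patient, service, zone and tooth more than once) can be sketched in a few lines of Python. This only illustrates the idea and is not a replacement for the SQL: the function name `find_repeated_services` and the tuple layout are assumptions, and empty zones are represented as empty strings.

```python
from collections import Counter

# (ID, Name, RecievedService, [zone 1..4 teeth]) from the question.
rows = [
    (1, "John", "SomeService1", ["13", "", "4", ""]),
    (2, "John", "SomeService1", ["34", "", "", ""]),
    (3, "Steve", "SomeService3", ["", "", "", "2"]),
    (4, "Steve", "SomeService4", ["", "", "", "12"]),
]


def find_repeated_services(rows):
    # Count every (patient, service, zone, tooth) combination; each
    # character of a zone string is one tooth.
    counts = Counter()
    for _id, name, service, zones in rows:
        for zone_no, teeth in enumerate(zones, start=1):
            for tooth in teeth:
                counts[(name, service, zone_no, tooth)] += 1

    # Patients/services with at least one tooth treated more than once.
    repeated = {
        (name, service)
        for (name, service, _zone, _tooth), n in counts.items()
        if n > 1
    }
    return [row for row in rows if (row[1], row[2]) in repeated]


duplicate_ids = [row[0] for row in find_repeated_services(rows)]
print(duplicate_ids)
```

John's two rows are reported because tooth 3 in zone 1 appears in both, while Steve's rows are not, since the repeated tooth 2 belongs to two different services.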

Horizontal Count SQL

I apologize if this is a duplicate question but I could not find my answer.
I am trying to take data that is horizontal, and get a count of how many times a specific number appears.
Example table
+-------+-------+-------+-------+
| Empid | KPI_A | KPI_B | KPI_C |
+-------+-------+-------+-------+
| 232 | 1 | 3 | 3 |
| 112 | 2 | 3 | 2 |
| 143 | 3 | 1 | 1 |
+-------+-------+-------+-------+
I need to see the following:
+-------+--------------+--------------+--------------+
| EmpID | (1's Scored) | (2's Scored) | (3's Scored) |
+-------+--------------+--------------+--------------+
| 232 | 1 | 0 | 2 |
| 112 | 0 | 2 | 1 |
| 143 | 2 | 0 | 1 |
+-------+--------------+--------------+--------------+
I hope that makes sense. Any help would be appreciated.
Since you are counting data across multiple columns, it might be easier to unpivot your KPI columns first, then count the scores.
You could use either the UNPIVOT function or CROSS APPLY to convert your KPI columns into multiple rows. The syntax would be similar to:
select EmpId, KPI, Val
from yourtable
cross apply
(
select 'A', KPI_A union all
select 'B', KPI_B union all
select 'C', KPI_C
) c (KPI, Val)
This gets your multiple columns into multiple rows, which are then easier to work with:
| EMPID | KPI | VAL |
|-------|-----|-----|
| 232 | A | 1 |
| 232 | B | 3 |
| 232 | C | 3 |
| 112 | A | 2 |
Now you can easily count the number of 1's, 2's, and 3's that you have using an aggregate function with a CASE expression:
select EmpId,
sum(case when val = 1 then 1 else 0 end) Score_1,
sum(case when val = 2 then 1 else 0 end) Score_2,
sum(case when val = 3 then 1 else 0 end) Score_3
from
(
select EmpId, KPI, Val
from yourtable
cross apply
(
select 'A', KPI_A union all
select 'B', KPI_B union all
select 'C', KPI_C
) c (KPI, Val)
) d
group by EmpId;
This gives a final result of:
| EMPID | SCORE_1 | SCORE_2 | SCORE_3 |
|-------|---------|---------|---------|
| 112 | 0 | 2 | 1 |
| 143 | 2 | 0 | 1 |
| 232 | 1 | 0 | 2 |
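For engines without CROSS APPLY or UNPIVOT, the same unpivot-then-count approach can be written with UNION ALL. The sketch below is one such variant, run in SQLite through Python's `sqlite3` as an assumption (the answer itself targets SQL Server); the table name `yourtable` and the column names follow the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable (EmpId INTEGER, KPI_A INTEGER, KPI_B INTEGER, KPI_C INTEGER)"
)
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?, ?)",
    [(232, 1, 3, 3), (112, 2, 3, 2), (143, 3, 1, 1)],
)

# Unpivot the three KPI columns into rows with UNION ALL, then count
# each score per employee with conditional aggregation.
scores = conn.execute(
    """
    SELECT EmpId,
           SUM(CASE WHEN Val = 1 THEN 1 ELSE 0 END) AS Score_1,
           SUM(CASE WHEN Val = 2 THEN 1 ELSE 0 END) AS Score_2,
           SUM(CASE WHEN Val = 3 THEN 1 ELSE 0 END) AS Score_3
    FROM (
        SELECT EmpId, KPI_A AS Val FROM yourtable
        UNION ALL
        SELECT EmpId, KPI_B FROM yourtable
        UNION ALL
        SELECT EmpId, KPI_C FROM yourtable
    ) AS d
    GROUP BY EmpId
    ORDER BY EmpId
    """
).fetchall()

for row in scores:
    print(row)
```

The three rows match the final result table above; the UNION ALL form reads the table once per KPI column, which is acceptable for a handful of columns.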