Add a column to BigQuery results that adds a description for an ID in a column from the results - google-bigquery

I am using BQ to pull some data and I need to add a column to the results that includes a lookup.
SELECT
timestamp_trunc(a.timestamp,day) date,
a.custom_parameter1,
a.custom_parameter2,
a.score,
a.type,
b.ref
FROM
`data-views_batch_20221021` a
left outer join (select client_uuid,STRING_AGG(document_referrer, "," LIMIT 1) ref from `activities_batch_20221021` where app_id="12345" and document_referrer is not null group by client_uuid) b using (client_uuid)
WHERE
a.app_id="12345"
How can I add a column that takes the array in a.type and looks up each value in a dict? I currently do this lookup in Python, but I want to include it in the query.
The dict is:
{23:"Description1", 24:"Description2", 25:"Description3"}
I don't have these values in a table within BQ, can I include it within the query? There are about 14 total descriptions to map.
My end result would look like this:
date | custom_parameter1 | custom_parameter2 | score | types | ref | type_descriptions
Edited to add that types is an array.

I don't have these values in a table within BQ, can I include it within the query?
Yes, you can define them in a CTE, as in the example below:
with dict as (
select 23 type, "description1" type_description union all
select 24, "description2" union all
select 25, "description3"
)
select
timestamp_trunc(a.timestamp,day) date,
a.custom_parameter1,
a.custom_parameter2,
a.score,
a.type,
b.ref,
type_description
from `data-views_batch_20221021` a
left outer join (
select client_uuid, string_agg(document_referrer, "," limit 1) ref
from `activities_batch_20221021`
where app_id="12345" and document_referrer is not null
group by client_uuid
) b using (client_uuid)
left join dict using (type) -- assumes type is a scalar; an array column would need to be unnested first
where a.app_id="12345"
There are about 14 total descriptions to map
You can add as many rows to the dict CTE as you need.

Related

Use a CASE expression without typing matched conditions manually using PostgreSQL

I have a long and wide list, the following table is just an example. Table structure might look a bit horrible using SQL, but I was wondering whether there's a way to extract IDs' price using CASE expression without typing column names in order to match in the expression
IDs
A_Price
B_Price
C_Price
...
A
23
...
B
65
82
...
C
...
A
10
...
..
...
...
...
...
Table I want to achieve:
IDs | price
A | 23;10
B | 65
C | 82
.. | ...
I tried:
SELECT IDs, string_agg(CASE IDs WHEN 'A' THEN A_Price
WHEN 'B' THEN B_Price
WHEN 'C' THEN C_Price
end::text, ';') as price
FROM table
GROUP BY IDs
ORDER BY IDs
To avoid typing A, B, A_Price, B_Price etc, I tried to format their names and call them from a subquery, but it seems that SQL cannot recognise them as columns and cannot call the corresponding values.
WITH CTE AS (
SELECT IDs, IDs||'_Price' as t FROM ID_list
)
SELECT IDs, string_agg(CASE IDs WHEN CTE.IDs THEN CTE.t
end::text, ';') as price
FROM table
LEFT JOIN CTE cte.IDs=table.IDs
GROUP BY IDs
ORDER BY IDs
You can use a document type like json or hstore as a stepping stone:
Basic query:
SELECT t.ids
, to_json(t.*) ->> (t.ids || '_price') AS price
FROM tbl t;
to_json() converts the whole row to a JSON object, which you can then pick a (dynamically concatenated) key from.
Your aggregation:
SELECT t.ids
, string_agg(to_json(t.*) ->> (t.ids || '_price'), ';') AS prices
FROM tbl t
GROUP BY 1
ORDER BY 1;
Converting the whole (big?) row adds some overhead, but you have to read the whole table for your query anyway.
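The idea behind the to_json() approach, turning each row into a key/value document and then picking the key whose name is built from the row's own ID, can be sketched outside the database as well. The following is a minimal Python illustration, not PostgreSQL code. The sample rows, their arrangement of empty cells, and the lowercase key names are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical sample rows; each dict plays the role of to_json(t.*).
rows = [
    {"ids": "a", "a_price": 23, "b_price": None, "c_price": None},
    {"ids": "b", "a_price": None, "b_price": 65, "c_price": None},
    {"ids": "c", "a_price": None, "b_price": None, "c_price": 82},
    {"ids": "a", "a_price": 10, "b_price": None, "c_price": None},
]

prices = defaultdict(list)
for row in rows:
    # Build the key dynamically, like (t.ids || '_price') in the query.
    value = row.get(f"{row['ids']}_price")
    if value is not None:  # string_agg also skips NULLs
        prices[row["ids"]].append(str(value))

# Join the collected values per ID, like string_agg(..., ';').
result = {ids: ";".join(values) for ids, values in sorted(prices.items())}
print(result)
```

No column name is typed out in the loop; the key is derived from the data, which is what the JSON lookup achieves in SQL.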
A union would be one approach here:
SELECT IDs, A_Price FROM yourTable WHERE A_Price IS NOT NULL
UNION ALL
SELECT IDs, B_Price FROM yourTable WHERE B_Price IS NOT NULL
UNION ALL
SELECT IDs, C_Price FROM yourTable WHERE C_Price IS NOT NULL;

Hive - Reformat data structure

So I have a sample of Hive data:
Customer | xx_var | yy_var | branchflow
{"customer_no":"239230293892839892","acct":["2324325","23425345"]} | 23 | 3 | [{"acctno":"2324325","value":[1,2,3,4,5,6,6,6,4]},{"acctno":"23425345","value":[1,2,3,4,5,6,6,6,99,4]}]
And I want to transform it into something like this:
Customer_no | acct | xx_var | yy_var | branchflow
239230293892839892 | 2324325 | 23 | 3 | [1,2,3,4,5,6,6,6,4]
239230293892839892 | 23425345 | 23 | 3 | [1,2,3,4,5,6,6,6,99,4]
I have tried using this query, but I am getting the wrong output format.
SELECT
customer.customer_no,
acct,
xx_var,
yy_var,
bi_acctno,
values_bi
FROM
struct_test
LATERAL VIEW explode(customer.acct) acct AS acctno
LATERAL VIEW explode(brancflow.acctno) bia as bi_acctno
LATERAL VIEW explode(brancflow.value) biv as values_bi
WHERE bi_acctno = acctno
Does anyone know how to approach this problem?
Use json_tuple to extract JSON elements. An array is also returned as a string, so remove the square brackets, split the string, and explode the result. See the comments in the demo code.
Demo:
with mytable as (--demo data, use your table instead of this CTE
select '{"customer_no":"239230293892839892","acct":["2324325","23425345"]}' as customer,
23 xx_var, 3 yy_var,
'[{"acctno":"2324325","value":[1,2,3,4,5,6,6,6,4]},{"acctno":"23425345","value":[1,2,3,4,5,6,6,6,99,4]}]' branchflow
)
select c.customer_no,
a.acct,
t.xx_var, t.yy_var,
get_json_object(b.acct_branchflow,'$.value') value
from mytable t
--extract customer_no and acct array
lateral view json_tuple(t.customer, 'customer_no', 'acct') c as customer_no, accts
--remove [] and " and explode array of acct
lateral view explode(split(regexp_replace(c.accts,'^\\[|"|\\]$',''),',')) a as acct
--remove [] and explode array of json
lateral view explode(split(regexp_replace(t.branchflow,'^\\[|\\]$',''),'(?<=\\}),(?=\\{)')) b as acct_branchflow
--this will remove duplicates after lateral view: need only matching acct
where get_json_object(b.acct_branchflow,'$.acctno') = a.acct
Result:
customer_no acct xx_var yy_var value
239230293892839892 2324325 23 3 [1,2,3,4,5,6,6,6,4]
239230293892839892 23425345 23 3 [1,2,3,4,5,6,6,6,99,4]
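To see what the two regular expressions in the demo do, the same strip, split, and match steps can be replayed in Python on the sample strings. This is only an illustrative sketch: Python's re and json modules stand in for Hive's regexp_replace, split, json_tuple and get_json_object, and the variable names are made up for the example.

```python
import json
import re

customer = '{"customer_no":"239230293892839892","acct":["2324325","23425345"]}'
branchflow = ('[{"acctno":"2324325","value":[1,2,3,4,5,6,6,6,4]},'
              '{"acctno":"23425345","value":[1,2,3,4,5,6,6,6,99,4]}]')

parsed_customer = json.loads(customer)
# Emulate json_tuple, which hands the "acct" array back as a string.
accts_str = json.dumps(parsed_customer["acct"], separators=(",", ":"))

# Remove [ ] and double quotes, then split on commas.
accts = re.sub(r'^\[|"|\]$', "", accts_str).split(",")

# Remove the outer [ ], then split only on commas sitting between } and {.
elements = re.split(r"(?<=\}),(?=\{)", re.sub(r"^\[|\]$", "", branchflow))

rows = []
for acct in accts:
    for element in elements:
        obj = json.loads(element)
        # Keep only the pair where the account numbers match (the WHERE clause).
        if obj["acctno"] == acct:
            rows.append((parsed_customer["customer_no"], acct, obj["value"]))

for row in rows:
    print(row)
```

The lookbehind and lookahead keep the commas inside each "value" array untouched, so every element remains a valid JSON object after the split.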

Return a NULL value if Date not in CTE

I have a query that counts the number of records imported for every day up to the current date. The only problem is that a count is only returned for days on which records were imported; days with no imports are left out.
I have created a CTE with one column in MSSQL that lists dates in a certain range e.g. 2019-01-01 - today.
The query that I've currently got is like this:
SELECT TableName, DateRecordImported, COUNT(*) AS ImportedRecords
FROM Table
WHERE DateRecordImported IN (SELECT * FROM DateRange_CTE)
GROUP BY DateRecordImported
I get the results fine for the dates that exist in the table for example:
TableName DateRecordImported ImportedRecords
______________________________________________
Example 2019-01-01 165
Example 2019-01-02 981
Example 2019-01-04 34
Example 2019-01-07 385
....
but I need a '0' count returned if a date from the CTE is not in the Table. Is there a better alternative for returning a 0 count, or does my method need altering slightly?
You can use a LEFT JOIN:
SELECT C.Date, COUNT(t.DateRecordImported) AS ImportedRecords
FROM DateRange_CTE C LEFT JOIN
table t
ON t.DateRecordImported = C.Date -- This may differ; use the actual column name instead
GROUP BY C.Date; -- This may differ; use the actual column name instead
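The effect of driving the query from the date list can be checked on a small data set. The sketch below uses SQLite through Python's sqlite3 module as a stand-in for SQL Server, so the recursive CTE and the date() call are SQLite syntax. The table name imports, the CTE column name ImportedDate, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE imports (DateRecordImported TEXT);
    INSERT INTO imports VALUES
        ('2019-01-01'), ('2019-01-01'), ('2019-01-02'), ('2019-01-04');
""")

query = """
WITH RECURSIVE DateRange_CTE(ImportedDate) AS (
    SELECT '2019-01-01'
    UNION ALL
    SELECT date(ImportedDate, '+1 day') FROM DateRange_CTE
    WHERE ImportedDate < '2019-01-04'
)
SELECT C.ImportedDate, COUNT(t.DateRecordImported) AS ImportedRecords
FROM DateRange_CTE C
LEFT JOIN imports t ON t.DateRecordImported = C.ImportedDate
GROUP BY C.ImportedDate
ORDER BY C.ImportedDate
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

COUNT(t.DateRecordImported) counts only non-NULL values, so a date with no matching rows yields 0, whereas COUNT(*) would report 1 for the unmatched row produced by the LEFT JOIN.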
Move the CTE from the subquery into the FROM clause:
SELECT T.TableName,
DT.{CTEDateColumn} AS DateRecordImported,
COUNT(T.{TableIDColumn}) AS ImportedRecords
FROM DateRange_CTE DT
LEFT JOIN [Table] T ON DT.{CTEDateColumn} = T.DateRecordImported
GROUP BY T.TableName, DT.{CTEDateColumn};
You'll need to replace the values in braces ({}).
You can try this
SELECT TableName, DateRecordImported,
case when DateRecordImported is null
then '0'
else count(*) end AS ImportedRecords
FROM Table full join DateRange_CTE
on Table.DateRecordImported = DateRange_CTE.ImportedDate
group by DateRecordImported,ImportedDate
(ImportedDate is the name of the CTE's column.)

Conditional Output in SQL

I have a table (let's call it "Items") that contains a set of names, groups and statuses. For example:
Name | Group | Status
FF A ON
GG A OFF
HH A UNKN
ZZ B ON
YY B OFF
I am trying to aggregate the status of all records in a given group, by taking the most relevant status (in order by relevance: UNKN, OFF, ON).
Edit 1: These statuses are only examples, and their names and orders could change in my application, so that should be configurable in the query.
For example, if I query for the overall status of Group A, the status should be UNKN, and if I query for Group B, the status should be OFF.
Edit 2: It is possible that there are multiples of the same status for a group. For example two records that are UNKN.
The query I have managed is to select all items from a group. For example Group A:
SELECT Items.[Group], Items.[Status]
FROM Items
WHERE (((Items.[Group])="A"));
Produces:
Name | Group | Status
FF A ON
GG A OFF
HH A UNKN
but I can't boil it down to the single most relevant status for every group. I have tried to use CASE WHEN and IF EXISTS but I can't get it to work. Any input?
Edit 3:
As an example of the desired output for the overall group status:
Group | OverallStatus
A UNKN
B OFF
If you can build another table, a simple solution would be:
Add another table with the values in the order you want.
Then, just build a query like this:
SELECT TOP 1 Table1.*, Table2.VALUE
FROM Table1 INNER JOIN Table2 ON Table1.status = Table2.STATUS
WHERE Table1.group="A"
ORDER BY Table2.VALUE DESC
If statuses change, new ones are added, or you need a different order, just update the new table.
EDIT
According to the new information from the OP, the query can be written another way. The previous query was meant to show the full record from Table1.
If you only need the group and the "max" status, you can use something like this:
SELECT A.group, Table2.STATUS
FROM (SELECT Table1.group, Max(Table2.VALUE) AS MaxVALUE
FROM Table1 INNER JOIN Table2 ON Table1.status = Table2.STATUS
GROUP BY Table1.group) as A INNER JOIN Table2 ON A.MaxVALUE= Table2.VALUE;
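The ranking-table approach can be tried out on the sample data from the question. The sketch below runs it in SQLite through Python's sqlite3 module rather than Access, so the syntax differs slightly. The table name StatusRank, its Value column, the rank numbers, and the column name Grp (used to avoid the reserved word GROUP) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Items (Name TEXT, Grp TEXT, Status TEXT);
    INSERT INTO Items VALUES
        ('FF', 'A', 'ON'), ('GG', 'A', 'OFF'), ('HH', 'A', 'UNKN'),
        ('ZZ', 'B', 'ON'), ('YY', 'B', 'OFF');

    -- Hypothetical ranking table: a higher Value means a more relevant status.
    CREATE TABLE StatusRank (Status TEXT PRIMARY KEY, Value INTEGER);
    INSERT INTO StatusRank VALUES ('ON', 1), ('OFF', 2), ('UNKN', 3);
""")

query = """
SELECT A.Grp, R.Status
FROM (SELECT I.Grp, MAX(R.Value) AS MaxValue
      FROM Items I
      JOIN StatusRank R ON I.Status = R.Status
      GROUP BY I.Grp) AS A
JOIN StatusRank R ON A.MaxValue = R.Value
ORDER BY A.Grp
"""
overall = dict(conn.execute(query).fetchall())
print(overall)
```

Because the order of relevance lives in data rather than in the query text, renaming or reordering statuses only requires changing rows in the ranking table. Duplicate statuses within a group do not affect the result, since MAX collapses them.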
Use conditional aggregation and some other logic:
select grp,
switch(sum(iif(status = "UNKN", 1, 0)) > 0, "UNKN", -- any unknowns
sum(iif(status = "OFF", 1, 0)) > 0, "OFF", -- any offs
true, "ON"
) as group_status
from items
group by grp;
This counts the number of each status and then uses that to determine the overall group status. Your question is not really explicit about the rules, but I think these capture what you are trying to do. It should be easy enough to modify for other rules.
Using the "length" as gauge, this should fit Access SQL:
Select *
From Items
Where Items.Name = (
Select Top 1 T.Name
From Items As T
Where T.Group = Items.Group
Order By Len(T.Status) Desc)
Assuming that the status column will have only three distinct values as shown in data example, you can try below query:
SELECT *
FROM ITEMS
WHERE (GROUP,LENGTH(STATUS)) IN (
SELECT GROUP,MAX(LENGTH(STATUS))
FROM ITEMS
GROUP BY GROUP)
Thanks,
Amitabh

Order by data as per supplied Id in sql

Query:
SELECT *
FROM [MemberBackup].[dbo].[OriginalBackup]
where ration_card_id in
(
1247881,174772,
808454,2326154
)
Right now the data is ordered by the auto ID or by whatever clause I pass in ORDER BY.
But I want the rows returned in the same sequence as the IDs I have passed.
Expected Output:
All Data for 1247881
All Data for 174772
All Data for 808454
All Data for 2326154
Note:
The number of IDs to be passed will be 300 000.
One option would be to create a CTE containing the ration_card_id values and the order you are imposing, and then join to this table:
WITH cte AS (
SELECT 1247881 AS ration_card_id, 1 AS position
UNION ALL
SELECT 174772, 2
UNION ALL
SELECT 808454, 3
UNION ALL
SELECT 2326154, 4
)
SELECT t1.*
FROM [MemberBackup].[dbo].[OriginalBackup] t1
INNER JOIN cte t2
ON t1.ration_card_id = t2.ration_card_id
ORDER BY t2.position
Edit:
If you have many IDs, then neither the answer above nor the answer given using a CASE expression will suffice. In this case, your best bet would be to load the list of IDs into a table, containing an auto increment ID column. Then, each number would be labelled with a position as its record is being loaded into your database. After this, you can join as I have done above.
If the desired order does not reflect a sequential ordering of some preexisting data, you will have to specify the ordering yourself. One way to do this is with a case statement:
SELECT *
FROM [MemberBackup].[dbo].[OriginalBackup]
where ration_card_id in
(
1247881,174772,
808454,2326154
)
ORDER BY CASE ration_card_id
WHEN 1247881 THEN 0
WHEN 174772 THEN 1
WHEN 808454 THEN 2
WHEN 2326154 THEN 3
END
Stating the obvious but note that this ordering most likely is not represented by any indexes, and will therefore not be indexed.
Insert your ration_card_id values into a #temp table with an identity column.
Then rewrite your SQL query as:
SELECT a.*
FROM [MemberBackup].[dbo].[OriginalBackup] a
JOIN #temp b
on a.ration_card_id = b.ration_card_id
order by b.id
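The load-then-join approach can be sketched end to end on a small list. The example below uses SQLite through Python's sqlite3 module in place of SQL Server, with an AUTOINCREMENT column playing the role of the identity column. The table name wanted, the columns auto_id and member, and the sample rows are assumptions for illustration.

```python
import sqlite3

# IDs in the order the caller wants them back.
wanted_ids = [1247881, 174772, 808454, 2326154]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE OriginalBackup (
        auto_id INTEGER PRIMARY KEY,
        ration_card_id INTEGER,
        member TEXT
    );
    INSERT INTO OriginalBackup (ration_card_id, member) VALUES
        (174772, 'a'), (2326154, 'b'), (1247881, 'c'),
        (808454, 'd'), (1247881, 'e'), (999, 'f');

    -- Stand-in for the #temp table; id records the insertion position.
    CREATE TEMP TABLE wanted (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        ration_card_id INTEGER
    );
""")

# Rows are inserted one by one in list order, so id follows that order.
conn.executemany(
    "INSERT INTO wanted (ration_card_id) VALUES (?)",
    [(card_id,) for card_id in wanted_ids],
)

rows = conn.execute("""
    SELECT a.ration_card_id, a.member
    FROM OriginalBackup a
    JOIN wanted b ON a.ration_card_id = b.ration_card_id
    ORDER BY b.id, a.auto_id
""").fetchall()
ordered_ids = [row[0] for row in rows]
print(ordered_ids)
```

The join also replaces the long IN list, which matters when hundreds of thousands of IDs are involved. The secondary sort on auto_id is only there to make the order of rows sharing an ID deterministic.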