I am working on Google Ads Data Hub and am relatively new to BigQuery (and SQL in general).
In one case, I am trying to do a dynamic GROUP BY based on a parameter input (I believe the parameter needs to be an array?).
Say my table has four fields (two numeric and two string),
and I want to group by either one of the string columns or both of them together.
Based on my parameter input, the data would be grouped by the selected columns, and some aggregation would be applied to the numeric data (say, an average).
Currently, I just precompute combinations of the string columns (say I have age and name as my columns, so I keep one age column, one name column and one age_name column). This workaround falls apart once I have more than a few string columns, because the number of possible combinations explodes.
Is there a way to dynamically specify which columns to group my table by, and to select those columns together with the aggregated numeric values in my output?
Sample input:
Pokemon  Location  Value
Jolteon  City1     230
Jolteon  City2     210
Umbreon  City2     240
Umbreon  City2     180
Umbreon  City3     180
Espeon   City3     260
Espeon   City3     100
Espeon   City4     300
Suppose I supply a #hierarchy parameter as a string array, and the aggregation is an average.
Output if the input is "pokemon":
Pokemon  Value
Jolteon  220
Umbreon  200
Espeon   220
Output if the input is "pokemon,location":
Pokemon  Location  Value
Jolteon  City1     230
Jolteon  City2     210
Umbreon  City2     210
Umbreon  City3     180
Espeon   City3     180
Espeon   City4     300
Note: this example has only two fields, but my parameter input could contain three or four columns, in which case the output should be grouped by all of them, with the predefined aggregation applied to the numeric fields.
Try EXECUTE IMMEDIATE:
DECLARE columns STRING;
CREATE TEMPORARY TABLE mytable as
select "Jolteon" as pokemon, "City1" as location, 230 as value union all
select "Jolteon", "City2", 210 union all
select "Umbreon", "City2", 240 union all
select "Umbreon", "City2", 180 union all
select "Umbreon", "City3", 180 union all
select "Espeon", "City3", 260 union all
select "Espeon", "City3", 100 union all
select "Espeon", "City4", 300
;
SET columns="pokemon, location";
EXECUTE IMMEDIATE "SELECT " || columns || ", AVG(value) as value FROM mytable GROUP BY " || columns;
If I supply #hierarchy parameter as a string array ...
Consider the following:
DECLARE hierarchy ARRAY<STRING>;
SET hierarchy = ['pokemon', 'location'];
EXECUTE IMMEDIATE 'SELECT ' || ARRAY_TO_STRING(hierarchy, ',') ||
', AVG(value) as value FROM `project.dataset.table` GROUP BY ' || ARRAY_TO_STRING(hierarchy, ',');
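Applied to the sample data in your question, this returns the expected output shown there for ['pokemon', 'location'].
The same holds for SET hierarchy = ['pokemon'];. Add an ORDER BY if you need a particular row order.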
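The same idea can be tried outside BigQuery. Below is a minimal Python sketch that uses the standard-library sqlite3 module as a stand-in for BigQuery and loads the sample table from the question. Because the column list is concatenated into the SQL text (just as with EXECUTE IMMEDIATE), each requested column is checked against a whitelist first; `ALLOWED_COLUMNS` and `grouped_average` are hypothetical names, not part of any API.

```python
import sqlite3

# Hypothetical whitelist: only these columns may appear in the dynamic GROUP BY.
ALLOWED_COLUMNS = ("pokemon", "location")


def grouped_average(conn, hierarchy):
    """Build and run SELECT <cols>, AVG(value) ... GROUP BY <cols> for the given columns."""
    if not hierarchy:
        raise ValueError("hierarchy must name at least one column")
    for col in hierarchy:
        if col not in ALLOWED_COLUMNS:  # never splice unchecked input into SQL
            raise ValueError(f"unknown column: {col}")
    cols = ", ".join(hierarchy)  # plays the role of ARRAY_TO_STRING(hierarchy, ',')
    sql = (f"SELECT {cols}, AVG(value) AS value FROM mytable "
           f"GROUP BY {cols} ORDER BY {cols}")
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (pokemon TEXT, location TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [("Jolteon", "City1", 230), ("Jolteon", "City2", 210),
     ("Umbreon", "City2", 240), ("Umbreon", "City2", 180),
     ("Umbreon", "City3", 180), ("Espeon", "City3", 260),
     ("Espeon", "City3", 100), ("Espeon", "City4", 300)],
)

for row in grouped_average(conn, ["pokemon", "location"]):
    print(row)
```

Adding a third or fourth grouping column only means extending the whitelist; the query text is built the same way for any number of columns.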
I want to compare two VARCHAR2 values and have my function return the percentage similarity between them, together with the ID of the corresponding record in the table.
I have a table (SYMPTOMS) with a VARCHAR2 column Symptom_Descr, and a VARCHAR2 variable v_symptom that I want to compare against this column.
For example, this is my table:
The variable that I want to compare is:
'flashes the room lights 5 times'
I want as a result:
id  similarity
1   0%
2   0%
3   90%
Another example, if the variable is 'washes her hands 7 times':
id  similarity
1   80%
2   0%
3   10%
These percentages are only illustrative, not exact.
If the above cannot be done, then what can I do to find the similarities?
You can use the UTL_MATCH package:
SELECT id,
UTL_MATCH.EDIT_DISTANCE_SIMILARITY(
symptom_descr,
'flashes the room lights 5 times'
) AS ed_similarity,
UTL_MATCH.JARO_WINKLER_SIMILARITY(
symptom_descr,
'flashes the room lights 5 times'
) AS jw_similarity
FROM symptoms;
Which, for the sample data:
CREATE TABLE symptoms (id, symptom_descr) AS
SELECT 1, 'washes his hands constantly' FROM DUAL UNION ALL
SELECT 2, 'checks several times if he turned off the water heater' FROM DUAL UNION ALL
SELECT 3, 'flashes the room lights too many times' FROM DUAL;
Outputs:
ID  ED_SIMILARITY  JW_SIMILARITY
1   30             62
2   25             62
3   79             93
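To see roughly what the edit-distance score measures, here is a minimal Python sketch: a plain Levenshtein distance normalised by the length of the longer string and rounded to a 0-100 score. The normalisation is an assumption made for illustration and is not guaranteed to reproduce Oracle's UTL_MATCH.EDIT_DISTANCE_SIMILARITY in every case; `levenshtein` and `similarity_percent` are hypothetical names. For row 3 it gives 79, the same as the ED_SIMILARITY value above.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]


def similarity_percent(a: str, b: str) -> int:
    """Assumed normalisation: 100 * (1 - distance / longer length), rounded."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100  # two empty strings are identical
    return round(100 * (1 - levenshtein(a, b) / longest))


symptoms = {
    1: "washes his hands constantly",
    2: "checks several times if he turned off the water heater",
    3: "flashes the room lights too many times",
}
target = "flashes the room lights 5 times"
for symptom_id, descr in symptoms.items():
    print(symptom_id, similarity_percent(descr, target))
```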
I have a table like the one below:
ID NUMBER 1 NUMBER 2 NUMBER 3 LOC
1-14H-4950 0616167 4233243 CA
A-522355 1234567 TN
A-522357 9876543 WY
A-522371 1112223 WA
A-522423 1234567 2345678 1234567 NJ
A-A-522427 9876543 6249853 6249853 NJ
I also have a set of values (1234567, 9876543, 0616167, 1112223, 999999, etc.) to use in the WHERE clause. If a value is found in one of the three number columns (NUMBER 1, NUMBER 2 or NUMBER 3), I need to write that row to Output 1 (similar to VLOOKUP in Excel).
If the value is found in more than one of the three columns, the row should go to a separate Output 2 with the flag Multiple Match. If the value is not found in any of the three columns, it should go to Output 2 with the flag No Match. I tried a self join and OR conditions, but could not get what I want.
I want to write SQL that generates both outputs. Both outputs should include all the columns from the table above. For example:
Output 1 for the sample data above would look like:
ID NUMBER 1 NUMBER 2 NUMBER 3 LOC
1-14H-4950 0616167 4233243 CA
A-522371 1112223 WA
Output 2 would look like:
ID NUMBER 1 NUMBER 2 NUMBER 3 LOC Flag
A-522423 1234567 2345678 1234567 NJ Multiple Match
A-A-522427 9876543 6249853 6249853 NJ Multiple Match
1234 No Match
I want to write the SQL to generate both outputs.
A single SELECT statement cannot produce two result sets.
The main question is: why split the output when the only difference is the FLAG column? If you really need two separate outputs, you can do one of the following:
(Preferred) Create one common cursor for the query, in which the FLAG column is calculated, and split the output into separate views in the UI.
drop table test_dt;
create table test_dt as
select '1-14h-4950' id,null num1,616167 num2,4233243 num3,'ca' loc from dual
union all
select 'a-522355',null ,1234567,null,'tn' from dual union all
select 'a-522357',null ,9876543,null,'wy' from dual union all
select 'a-522371',null ,1112223,null,'wa' from dual union all
select 'a-522423',1234567,2345678,1234567,'nj' from dual union all
select 'a3-522423',null,null,null,'nj' from dual union all
select 'a-a-522427',9876543,6249853,6249853,'nj' from dual;
--
select
d.*,
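case when t.cc_ndv=0 and t.cc_null=3 then 'Not matching'  -- all three numbers are NULL
when t.cc_ndv=(3-t.cc_null) then 'Once'                    -- every non-NULL number in the row is distinct
else 'Multiple match'                                        -- at least one number repeats within the row
end flag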
--t.cc_ndv,
--t.cc_null
from test_dt d ,lateral(
select
count(distinct case level when 1 then num1
when 2 then num2
when 3 then num3
end ) cc_ndv,
count(distinct case level when 1 then nvl2(num1,null,1)
when 2 then nvl2(num2,null,2)
when 3 then nvl2(num3,null,3)
end ) cc_null
from dual connect by level<=3 and sys_guid()is not null
) t;
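To make the flag logic in the LATERAL subquery easier to follow, here is a minimal Python sketch of what it computes for each row: cc_ndv is the number of distinct non-NULL values among num1..num3, and cc_null is the number of NULL columns. Note that, as written, the query compares each row's three numbers with each other; it does not use the list of search values from the question. `row_flag` is a hypothetical name.

```python
from typing import Optional


def row_flag(num1: Optional[int], num2: Optional[int], num3: Optional[int]) -> str:
    """Mirror of the CASE expression: classify a row by repeats among its non-NULL numbers."""
    values = [v for v in (num1, num2, num3) if v is not None]
    cc_null = 3 - len(values)  # number of NULL columns
    cc_ndv = len(set(values))  # number of distinct non-NULL values
    if cc_ndv == 0 and cc_null == 3:
        return "Not matching"
    if cc_ndv == 3 - cc_null:  # every non-NULL value is unique
        return "Once"
    return "Multiple match"


test_dt = [
    ("1-14h-4950", None, 616167, 4233243),
    ("a-522355", None, 1234567, None),
    ("a-522423", 1234567, 2345678, 1234567),
    ("a3-522423", None, None, None),
    ("a-a-522427", 9876543, 6249853, 6249853),
]
for row_id, *nums in test_dt:
    print(row_id, row_flag(*nums))
```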
Or:
Create a procedure that returns several result sets (see DBMS_SQL.RETURN_RESULT),
and process the data from each cursor/result set separately.
Name Gender Amount
Ram male 20.56
Bhavna female 78.2
darshan male 12.02
Avni female 50.366
I want to split the Amount column into two columns: one with the part before the decimal point (20.56 → 20) and one with the part after it (20.56 → 56).
-- check this query
select amount, decode (pos,0,amount,substr(amount,1,pos-1)) as before_decimal ,
decode(pos,0,0,substr(amount,pos+1,length(amount))) as after_decimal
from (
select instr((substr(amount,1,length(amount))),'.') as pos,amount
from table_name
)
You can format numbers using FORMAT:
FORMAT(your_number, xxxxx) -- xxxxx is the number of decimal places you want
Usage: FORMAT(N, D)
You can see how to use it here: https://www.w3resource.com/mysql/string-functions/mysql-format-function.php
You can use these queries to get your expected output.
For an amount of 20.56:
To get 20 as the output, use this query:
SELECT FLOOR(20.56) FROM TABLE_NAME
and to get exactly 56 as the output, use this query:
SELECT FLOOR((20.56 - FLOOR(20.56))*100) FROM TABLE_NAME
If you want them in separate columns, you can use arithmetic functions:
select t.*, floor(val) as col_left, floor(val * 100) % 100 as col_right
from (select 20.56 as val) t
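One caveat with the arithmetic approaches above: they treat the fractional part as hundredths. Assuming an exact DECIMAL column, 78.2 gives 20, 50.366 gives 36, and 12.02 gives 2 rather than "02". If you want the digits exactly as written after the decimal point, splitting the text representation is simpler. A minimal Python sketch of that idea, assuming the amounts are available as decimal strings (`split_amount` is a hypothetical name):

```python
from decimal import Decimal, InvalidOperation  # InvalidOperation: raised for non-numeric input
from typing import Tuple


def split_amount(amount: str) -> Tuple[str, str]:
    """Split a decimal string into the digits before and after the decimal point.

    Both parts are returned as strings so leading zeros (e.g. "02") survive.
    Raises decimal.InvalidOperation if amount is not a number.
    """
    Decimal(amount)  # validation only: rejects inputs such as "abc"
    before, _, after = amount.strip().partition(".")
    return before, after


for amount in ["20.56", "78.2", "12.02", "50.366"]:
    print(amount, split_amount(amount))
```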
I have a single table in Postgres (with PostGIS) in which I have a few thousand rows, each with the name(text), code(text), city(text) and geom(geometry) of a statistical area unit (like a county). The geometries tessellate perfectly.
I'm trying to write a single query that will select all the rows that meet some criteria, and aggregate the rest into a single row. For that single row, I'm not interested in the name and code, just the aggregated geom. E.g., something like this:
code | name | geom
------+--------------------+---------
102 | Central Coast | geo...
115 | Sydney - Baulkham | geo...
116 | Sydney - Blacktown | geo...
117 | Sydney - City | geo...
118 | Sydney - Eastern | geo...
000 | Remaining Counties | geo... <---Second SELECT just to get this row
I used this answer to come up with the following:
SELECT code, name, ST_Force2D(geom) FROM mytable WHERE mytable.city = 'Greater Sydney' UNION
SELECT
CASE WHEN count(code) > 0 THEN '0' ELSE 'x' END AS code,
CASE WHEN count(name) > 0 THEN 'Rest of Sydney' ELSE 'x' END AS name,
ST_Collect(geom)
FROM mytable WHERE mytable.city <> 'Greater Sydney';
This seems like a really roundabout and unclear way of accomplishing something pretty simple. Is there a better way to do this?
You can hard-code the values you want in those cells. I believe in Postgres you do this with single quotes:
SELECT code, name, ST_Force2D(geom)
FROM mytable
WHERE mytable.city = 'Greater Sydney'
UNION
SELECT '0', 'Remaining Counties', ST_Collect(geom)
FROM mytable
WHERE mytable.city <> 'Greater Sydney';
The answer you found handles the case where no rows match the condition (the count is zero) by returning 'x' instead. If you prefer to see x in that case, you can keep it that way, but it seems unnecessary to me.
I don't know if this would work in PostgreSQL, since I haven't worked with it in years and have never used PostGIS, but you could try:
SELECT
CASE WHEN city = 'Greater Sydney' THEN code ELSE '000' END AS code,
CASE WHEN city = 'Greater Sydney' THEN name ELSE 'Remaining Counties' END AS name,
ST_Collect(geom) AS geom
FROM
MyTable
GROUP BY
CASE WHEN city = 'Greater Sydney' THEN code ELSE '000' END,
CASE WHEN city = 'Greater Sydney' THEN name ELSE 'Remaining Counties' END
Since ST_Collect is an aggregate function, if it aggregates over a single row it should just return that row's geometry anyway. You could wrap it in ST_Force2D if that's necessary, but I'm not sure it is.
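The CASE-in-GROUP BY pattern from the last answer can be tried out without PostGIS. This minimal Python sketch runs the same query shape on SQLite through the standard sqlite3 module, with group_concat standing in for ST_Collect and short text labels standing in for geometries; the table rows are made up purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (code TEXT, name TEXT, city TEXT, geom TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", [
    # Hypothetical rows; geom is just a label standing in for a real geometry.
    ("117", "Sydney - City", "Greater Sydney", "g117"),
    ("118", "Sydney - Eastern", "Greater Sydney", "g118"),
    ("201", "Hypothetical A", "Other", "g201"),
    ("202", "Hypothetical B", "Other", "g202"),
])

QUERY = """
SELECT
  CASE WHEN city = 'Greater Sydney' THEN code ELSE '000' END AS code,
  CASE WHEN city = 'Greater Sydney' THEN name ELSE 'Remaining Counties' END AS name,
  group_concat(geom) AS geom  -- stand-in for ST_Collect(geom)
FROM mytable
GROUP BY
  CASE WHEN city = 'Greater Sydney' THEN code ELSE '000' END,
  CASE WHEN city = 'Greater Sydney' THEN name ELSE 'Remaining Counties' END
ORDER BY 1
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Every row outside Greater Sydney maps to the same ('000', 'Remaining Counties') key, so they collapse into one group, while each Greater Sydney row keeps its own code and name and forms a group of one.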
I have a table of roads that stores the start and end mileage of each road.
I need a query that returns the same data plus extra rows for the gaps between roads, each with the start and end mileage of the gap and the name column set to 'gap'.
Initial table:
id name kmstart kmend
1 road1 0 150
2 road2 150 200
3 road3 220 257
4 road4 260 290
Result query:
id name kmstart kmend
1 road1 0 150
2 road2 150 200
null gap 200 220
3 road3 220 257
null gap 257 260
4 road4 260 290
Try this query:
SELECT NULL AS id, 'gap' AS name, previous_kmend AS kmstart, kmstart AS kmend
FROM (
SELECT id, name, kmstart, kmend, LAG(kmend) OVER (ORDER BY kmstart, kmend) AS previous_kmend
FROM roads
) r
WHERE previous_kmend < kmstart
UNION ALL
SELECT id, name, kmstart, kmend
FROM roads
ORDER BY kmstart, kmend
I ran a quick test and it works for me.
It uses the LAG function to get the previous row's kmend, and returns a "gap" row whenever that value is less than the current row's kmstart. I wrote an article about the LAG function recently, so it was fresh in my mind.
Is this what you're after?
Also, as other commenters have mentioned, "name" isn't a good column name because it's a reserved word in some databases. I've kept it here so the code stays consistent with your question.
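To check the LAG query locally, here is a minimal Python sketch that runs it on SQLite through the standard sqlite3 module (SQLite 3.25 or later is assumed, since that is when window functions such as LAG were added). The derived table gets an alias, which several databases such as MySQL require, and ORDER BY uses column positions of the combined result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE roads (id INTEGER, name TEXT, kmstart INTEGER, kmend INTEGER)")
conn.executemany("INSERT INTO roads VALUES (?, ?, ?, ?)", [
    (1, "road1", 0, 150), (2, "road2", 150, 200),
    (3, "road3", 220, 257), (4, "road4", 260, 290),
])

GAPS_SQL = """
SELECT NULL AS id, 'gap' AS name, previous_kmend AS kmstart, kmstart AS kmend
FROM (
    -- previous_kmend is the kmend of the road that comes just before this one
    SELECT kmstart, LAG(kmend) OVER (ORDER BY kmstart, kmend) AS previous_kmend
    FROM roads
) AS r
WHERE previous_kmend < kmstart  -- NULL for the first road, so no gap row there
UNION ALL
SELECT id, name, kmstart, kmend
FROM roads
ORDER BY 3, 4
"""

rows = conn.execute(GAPS_SQL).fetchall()
for row in rows:
    print(row)
```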