Using Snowflake SQL how do you find two records and then change another one based on those records to a predefined record using a local variable? - sql

Using SQL, how do you use two records to find a place, hold onto that place, and then use it to replace a 'Nonsense' value? Below is what I have been able to write so far, followed by pseudocode for the part I am still trying to figure out:
SELECT * FROM "TABLES". "ACCTS_OF_SUPERHEROS".;
DECLARE #count_rows INT = 0;
DECLARE #row_total INT = 0;
DECLARE #refAcctNum INT = 0;
DECLARE #selectedPlaceName TINYTEXT;
SET #row_total = SELECT COUNT (*)
WHILE countRows < row_total
for each acct_num store value in refAcctNum.
Using refAcctNum find place: "Gotham City", "Central City", "Metropolis", "Smallville", "Star City", "Fawcett City" store that in selectedPlaceName.
If refAccountNumber has Nonsense then replace with selectedPlaceName record
otherwise add + 1 to countRows and repeat.
END
Current table data; "ACCTS_OF_SUPERHEROS" table:
| row | acct_num | exact_address | place
| --- | -------- |------------------|--------
| 1 | 049403 | 344 Clinton Str | Metropolis
| 2 | 049403 | 344 Clinton Str | Nonsense
| 3 | 049206 | 1007 Mountain Dr | Gotham City
| 4 | 049206 | 1007 Mountain Dr | Gotham City
| 5 | 049206 | 1096 Show Dr. | Fawcett City
| 6 | 049206 | 1096 Show Dr. | Nonsense
| 7 | 049206 | NULL | Nonsense
| 8 | 049291 | 1938 Sullivan Pl | Smallville
| 9 | 049293 | 700 Hamilton Str | Central City
| 10 | 049396 | 800 Nonsense Way | Nonsense
| 11 | 049396 | NULL | Nonsense
Desired output:
| row | acct_num | exact_address | place
| --- | -------- |------------------|--------
| 1 | 049403 | 344 Clinton Str | Metropolis
| 2 | 049403 | 344 Clinton Str | Metropolis
| 3 | 049206 | 1007 Mountain Dr | Gotham City
| 4 | 049206 | 1007 Mountain Dr | Gotham City
| 5 | 049206 | 1096 Show Dr. | Fawcett City
| 6 | 049206 | 1096 Show Dr. | Fawcett City
| 7 | 049206 | NULL | Fawcett City
| 8 | 049291 | 1938 Sullivan Pl | Smallville
| 9 | 049293 | 700 Hamilton Str | Central City
| 10 | 049396 | 800 Tidal Way | Star City
| 11 | 049396 | NULL | Star City

You can use window functions:
select t.*,
max(case when place <> 'Nonsense' then place end) over (partition by acct_num) as imputed_place
from t;
This returns NULL if all the rows are 'Nonsense' for a given acct_num. You can use COALESCE() to replace the value with something else.
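As a minimal sketch of the COALESCE() fallback, the query can be tried against an in-memory SQLite database from Python. SQLite (3.25 or later, for window functions) is only a stand-in for Snowflake here, and the column name row_num and the fallback value 'Unknown' are assumptions made for illustration; they do not come from the question.

```python
import sqlite3

# In-memory SQLite stands in for Snowflake (assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (row_num INTEGER, acct_num TEXT, place TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "049403", "Metropolis"),
        (2, "049403", "Nonsense"),
        (10, "049396", "Nonsense"),
        (11, "049396", "Nonsense"),
    ],
)

# Same window expression as the answer, wrapped in COALESCE so that accounts
# with no valid place get a fallback instead of NULL.
IMPUTE_SQL = """
SELECT row_num,
       COALESCE(
           MAX(CASE WHEN place <> 'Nonsense' THEN place END)
               OVER (PARTITION BY acct_num),
           'Unknown'  -- hypothetical fallback value
       ) AS imputed_place
FROM t
ORDER BY row_num
"""

imputed = conn.execute(IMPUTE_SQL).fetchall()
for row in imputed:
    print(row)
```

Note that when an account has several distinct valid places, MAX() keeps the alphabetically last one, so this approach cannot tell 'Gotham City' and 'Fawcett City' apart within account 049206.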

I was reading through the available list of window functions in Snowflake and think you're going to need a new window function for this. Perhaps someone can find a more built-in way, but anyway here's a user defined table function REPLACE_WITH_LKG implemented as a window function that will replace a bad value with the last known good value. As long as I was going to write it, I thought it may as well be general purpose, so it matches "bad" values using a regular expression and JavaScript RegExp options.
create or replace function REPLACE_WITH_LKG("VALUE" string, "REGEXP" string, "REGEXP_OPTIONS" string)
returns table(LKG_VALUE string)
language javascript
strict immutable
as
$$
{
initialize: function (argumentInfo, context) {
this.lkg = "";
},
processRow: function (row, rowWriter, context) {
const rx = new RegExp(row.REGEXP, row.REGEXP_OPTIONS);
if (!rx.test(row.VALUE)) {
this.lkg = row.VALUE;
}
rowWriter.writeRow({LKG_VALUE: this.lkg});
},
finalize: function (rowWriter, context) {},
}
$$;
select S.*, LKG.LKG_VALUE as PLACE
from superhero S, table(REPLACE_WITH_LKG(PLACE, 'Nonsense', 'ig')
over(partition by null order by "ROW")) LKG;
A note on performance: as the sample data is laid out, there is no partition other than the entire table. The one obvious partition key, the account number, won't work: row 10 gets its value from what would be a different window if partitioning by account, so the window has to span the entire table. This will not parallelize well and should be avoided for very large tables.
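To see what "last known good" means on the sample data, here is a rough Python sketch of the same row-by-row logic. The function name replace_with_lkg mirrors the UDTF, but this is an illustration of the idea, not the Snowflake implementation; the case-insensitive flag stands in for the 'i' option.

```python
import re


def replace_with_lkg(values, pattern, flags=re.IGNORECASE):
    """Yield each value, substituting the last known good one for bad matches."""
    bad = re.compile(pattern, flags)
    lkg = ""  # same starting state as the UDTF's initialize()
    for value in values:
        if not bad.search(value):
            lkg = value  # remember the most recent good value
        yield lkg


# The place column from the question, ordered by row number.
places = [
    "Metropolis", "Nonsense", "Gotham City", "Gotham City", "Fawcett City",
    "Nonsense", "Nonsense", "Smallville", "Central City", "Nonsense", "Nonsense",
]

filled = list(replace_with_lkg(places, "Nonsense"))
for row_num, place in enumerate(filled, start=1):
    print(row_num, place)
```

With this ordering, rows 10 and 11 inherit 'Central City' from row 9, because that is the last good value seen before them.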

Related

How to search using a delimited string as array in query

I am trying to find records whose columns match one of the values in a delimited string.
I have two tables that look like this
Vehicles
| Id | Make | Model |
|----|------|-------|
| 1 | Ford | Focus |
| 2 | Ford | GT |
| 3 | Ford | Kuga |
| 4 | Audi | R8 |
Monitor
| Id | Makes | Models |
|----|-------|----------|
| 1 | Ford | GT,Focus |
| 2 | Audi | R8 |
What I'm trying to achieve is the following:
| Id | Makes | Models | Matched_Count |
|----|-------|----------|---------------|
| 2 | Audi | R8 | 1 |
| 1 | Ford | GT,Focus | 2 |
Using the following query I can get matches on singular strings, but I'm not sure how I can split the commas to search for individual models.
select Id, Makes, Models, (select count(id) from Vehicles va where UPPER(sa.Makes) = UPPER(va.Make) AND UPPER(sa.Models) = UPPER(va.Model)) as Matched_Count
from Monitor sa
(I am using SQL Server 2016; however, I do not have permission to create custom functions or variables.)
If you are stuck with this data model, you can use string_split():
select m.*, v.matched_count
from monitor m outer apply
(select count(*) as matched_count
from string_split(m.models, ',') s join
vehicles v
on s.value = v.model and m.makes = v.make
) v;
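The split-then-join logic can be sketched in plain Python to check the expected counts. This is only an illustration of what the query computes: the helper name matched_count is made up, and the case-insensitive comparison follows the UPPER() calls in the question rather than any particular collation.

```python
# Sample rows from the question: (make, model) and (id, makes, models).
vehicles = [("Ford", "Focus"), ("Ford", "GT"), ("Ford", "Kuga"), ("Audi", "R8")]
monitor = [(1, "Ford", "GT,Focus"), (2, "Audi", "R8")]


def matched_count(makes, models, vehicle_rows):
    """Count (split value, vehicle) pairs that match, like the join in the query."""
    return sum(
        1
        for value in models.split(",")  # plays the role of string_split()
        for make, model in vehicle_rows
        if make.upper() == makes.upper() and model.upper() == value.upper()
    )


counts = {
    monitor_id: matched_count(makes, models, vehicles)
    for monitor_id, makes, models in monitor
}
print(counts)
```

Like string_split(), this does not trim whitespace, so a list written as 'GT, Focus' would fail to match 'Focus'.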
I would advise you to put your efforts into fixing the data model, though.
Here is a db<>fiddle.

Count numbers in single row - SQL

Is it possible to return the total of the values in a single column?
For example, this is a test table, and I want the total of daily_typing_pages:
SQL> SELECT * FROM employee_tbl;
+------+------+------------+--------------------+
| id | name | work_date | daily_typing_pages |
+------+------+------------+--------------------+
| 1 | John | 2007-01-24 | 250 |
| 2 | Ram | 2007-05-27 | 220 |
| 3 | Jack | 2007-05-06 | 170 |
| 3 | Jack | 2007-04-06 | 100 |
| 4 | Jill | 2007-04-06 | 220 |
| 5 | Zara | 2007-06-06 | 300 |
| 5 | Zara | 2007-02-06 | 350 |
+------+------+------------+--------------------+
The result should be 1610; however, if I simply wrap the column in COUNT(), it returns:
SQL>SELECT COUNT(daily_typing_pages) FROM employee_tbl ;
+---------------------------+
| COUNT(daily_typing_pages) |
+---------------------------+
| 7 |
+---------------------------+
1 row in set (0.01 sec)
So it returns the number of rows instead of the total of the values in the column.
Is there a way to do this without using an external programming language to add the values up for me?
Thanks
You want SUM instead of COUNT. COUNT only counts the number of records; you want the values added together.
You didn't mention your DBMS, but SUM is a standard aggregate function, so it should be available whichever one you use.
Do you mean that you want to add up all the values of daily_typing_pages?
If so, use SUM(daily_typing_pages):
SELECT SUM(daily_typing_pages) FROM employee_tbl
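A quick way to see the difference between the two aggregates is to run both against the sample rows. The sketch below uses Python's built-in sqlite3 module as a stand-in, since the question does not say which DBMS is in use.

```python
import sqlite3

# In-memory SQLite database holding the sample table from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee_tbl "
    "(id INTEGER, name TEXT, work_date TEXT, daily_typing_pages INTEGER)"
)
conn.executemany(
    "INSERT INTO employee_tbl VALUES (?, ?, ?, ?)",
    [
        (1, "John", "2007-01-24", 250),
        (2, "Ram", "2007-05-27", 220),
        (3, "Jack", "2007-05-06", 170),
        (3, "Jack", "2007-04-06", 100),
        (4, "Jill", "2007-04-06", 220),
        (5, "Zara", "2007-06-06", 300),
        (5, "Zara", "2007-02-06", 350),
    ],
)

# COUNT returns how many non-NULL values there are; SUM adds them up.
row_count, page_total = conn.execute(
    "SELECT COUNT(daily_typing_pages), SUM(daily_typing_pages) FROM employee_tbl"
).fetchone()
print(row_count, page_total)
```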

Updating a column in PL/SQL

(Using PL/SQL anonymous program block)
I have a table tblROUTE2 of Mexican state highways:
+-----------------+------------+---------+----------+----------+----------------------------+-----------+--------+
| TYPE | ADMN_CLASS | TOLL_RD | RTE_NUM1 | RTE_NUM2 | STATEROUTE | LENGTH_KM | STATE |
+-----------------+------------+---------+----------+----------+----------------------------+-----------+--------+
| Paved Undivided | Federal | N | 81 | | Tamaulipas Federal Hwy 81 | 124.551 | NULL |
| Paved Undivided | Federal | N | 130 | | Hidalgo Federal Hwy 130 | 76.347 | NULL |
| Paved Undivided | Federal | N | 130 | | Mexico Federal Hwy 130 | 68.028 | NULL |
+-----------------+------------+---------+----------+----------+----------------------------+-----------+--------+
and tblSTATE2 of Mexican states:
+------+-----------------------+---------+-----------+
| CODE | NAME | POP1990 | AREA_SQMI |
+------+-----------------------+---------+-----------+
| MX02 | Baja California Norte | 1660855 | 28002.325 |
| MX03 | Baja California Sur | 317764 | 27898.191 |
| MX18 | Nayarit | 824643 | 10547.762 |
+------+-----------------------+---------+-----------+
I need to update the STATE field in tblROUTE2 with the CODE field found in tblSTATE2, based on the route name in tblROUTE2. Basically, I need to somehow take the first string or two (some routes have two names)-- before the string 'Federal'-- of the STATEROUTE field in tblROUTE2 and make sure it matches with the string in the NAME field in tblSTATE2. Then since the states are matched with a CODE, update those codes in the STATE field of tblROUTE2.
I have started writing some code:
DECLARE
state_code tblROUTE2.STATE%TYPE;
state_name tblSTATE2.NAME%TYPE;
BEGIN
SELECT STATE, NAME
INTO state_code
FROM tblROUTE2 r, tblSTATE2 s
WHERE STATEROUTE LIKE '%Federal';
END;
As well, I will need to remove the state name from the route name. For example, the string in STATEROUTE 'Tamaulipas Federal Hwy' becomes 'Federal Hwy'. I have started a code, not sure if it's right:
UPDATE tblROUTE2
SET STATEROUTE = TRIM(LEADING FROM 'Federal');
Use a MERGE statement to do the update:
MERGE INTO tblROUTE2 A
USING
(
SELECT CODE, NAME FROM tblSTATE2
) B
ON
(
upper(SUBSTR(A.STATEROUTE, 0, INSTR(UPPER(A.STATEROUTE), UPPER('FEDERAL'))-2)) = upper(B.NAME)
)
WHEN MATCHED THEN UPDATE
SET A.STATE = B.CODE;
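The string handling in the ON clause (take everything before 'Federal', then look it up by name) can be sketched in Python. This is an illustration only: split_state_route is a made-up helper, and the 'Nayarit Federal Hwy 15' route is a hypothetical row added so that one lookup succeeds against the sample states.

```python
# State names and codes from the tblSTATE2 sample in the question.
STATE_CODES = {
    "BAJA CALIFORNIA NORTE": "MX02",
    "BAJA CALIFORNIA SUR": "MX03",
    "NAYARIT": "MX18",
}


def split_state_route(stateroute, marker="Federal"):
    """Split 'Tamaulipas Federal Hwy 81' into ('Tamaulipas', 'Federal Hwy 81')."""
    idx = stateroute.upper().find(marker.upper())
    if idx <= 0:  # marker missing, or nothing in front of it
        return None, stateroute
    return stateroute[:idx].rstrip(), stateroute[idx:]


def state_code_for(stateroute):
    """Return the matching state code, or None when the name is not in the table."""
    state_name, _ = split_state_route(stateroute)
    if state_name is None:
        return None
    return STATE_CODES.get(state_name.upper())


name, route = split_state_route("Tamaulipas Federal Hwy 81")
print(name, "|", route)
print(state_code_for("Nayarit Federal Hwy 15"))  # hypothetical route
```

The second element returned by split_state_route is the route name with the state removed, which is the value the follow-up UPDATE of STATEROUTE is after.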
In this fiddle I've replicated your tables and added an extra record whose STATEROUTE matches one of the values in NAME. Although the fiddle returns an error, I ran the statement in my own Oracle database and that one record was updated correctly.

Generate multiple rows for a binary number field?

Example data rows:
| ID | First Name | Last Name | Federal Race Code |
| 101 | Bob | Miller | 01010 |
| 102 | Daniel | Smith | 00011 |
The "Federal Race Code" field contains binary data, and each "1" is used to determine if a particular check box is set on a particular web form. E.g., the first bit is American Indian, second bit is Asian, third bit is African American, fourth is Pacific Islander, and the fifth is White.
I need to generate a separate row for each bit that is set to "1". So, given the example above, I need to generate output that looks like this:
| ID | First Name | Last Name | Mapped Race Name |
| 101 | Bob | Miller | Asian |
| 101 | Bob | Miller | African American |
| 102 | Daniel | Smith | Pacific Islander |
| 102 | Daniel | Smith | White |
Any tips or ideas on how to go about this?
You can do it with either five queries combined with UNION or one UNPIVOT clause.
In either case, you should start by splitting the binary value into five columns, one per bit:
SELECT *,
CASE WHEN federal_race_code & 16 = 16 THEN 1 ELSE 0 END as NativeAmerican,
..
CASE WHEN federal_race_code & 1 = 1 THEN 1 ELSE 0 END as White
FROM myTable
Then UNION:
SELECT *, 'Native American' AS Race
FROM (<subquery>)
WHERE NativeAmerican = 1
UNION
...
UNION
SELECT *, 'White' AS Race
FROM (<subquery>)
WHERE White = 1
If you are on Oracle or SQL Server, put the subquery in a CTE so that you only have to write it once.
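The row-per-bit expansion can also be sketched outside SQL. The Python below treats the code as a five-character string, as shown in the question, and uses the bit order given there; expand_race_rows is a made-up helper name.

```python
# Bit positions, left to right, as described in the question.
RACE_NAMES = [
    "American Indian",
    "Asian",
    "African American",
    "Pacific Islander",
    "White",
]


def expand_race_rows(person_id, first_name, last_name, race_code):
    """Return one output row per character of race_code that is '1'."""
    return [
        (person_id, first_name, last_name, name)
        for bit, name in zip(race_code, RACE_NAMES)
        if bit == "1"
    ]


rows = expand_race_rows(101, "Bob", "Miller", "01010") + expand_race_rows(
    102, "Daniel", "Smith", "00011"
)
for row in rows:
    print(row)
```

With that bit order, '01010' has its second and fourth bits set, so it expands to Asian and Pacific Islander.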

How to add column with the value of another dimension?

I apologize if the title does not make sense. I am trying to do something that is probably simple, but I have not been able to figure it out, and I'm not sure how to search for the answer. I have the following MDX query:
SELECT
event_count ON 0,
TOPCOUNT(name.children, 10, event_count) ON 1
FROM
events
which returns something like this:
| | event_count |
+---------------+-------------+
| P Davis | 123 |
| J Davis | 123 |
| A Brown | 120 |
| K Thompson | 119 |
| R White | 119 |
| M Wilson | 118 |
| D Harris | 118 |
| R Thompson | 116 |
| Z Williams | 115 |
| X Smith | 114 |
I need to include an additional column (gender). Gender is not a metric. It's just another dimension on the data. For instance, consider this query:
SELECT
gender.children ON 0,
TOPCOUNT(name.children, 10, event_count) ON 1
FROM
events
But this is not what I want! :(
| | female | male | unknown |
+--------------+--------+------+---------+
| P Davis | | | 123 |
| J Davis | | 123 | |
| A Brown | | 120 | |
| K Thompson | | 119 | |
| R White | 119 | | |
| M Wilson | | | 118 |
| D Harris | | | 118 |
| R Thompson | | | 116 |
| Z Williams | | | 115 |
| X Smith | | | 114 |
Nice try, but I just want three columns: name, event_count, and gender. How hard can it be?
Obviously this reflects lack of understanding about MDX on my part. Any pointers to quality introductory material would be appreciated.
It's important to understand that in MDX you are building sets of members on each axis, and not specifying column names like a tabular rowset. You are describing a 2-dimensional grid of results, not a linear rowset. If you imagine each dimension as a table, the member set is the set of unique values from a single column in that table.
When you choose a Measure as the member (as in your first example), it looks as if you're selecting from a table, so it's easy to misunderstand. When you choose a Dimension, you get many members, and a cross-join between the rows and columns (which is sparse in this case because the names and genders are 1-to-1).
So, you could crossjoin these two dimensions on a single axis, and then filter out the null cells:
SELECT
event_count ON 0,
TOPCOUNT(
NonEmptyCrossJoin(name.children, gender.children),
10,
event_count) ON 1
FROM
events
Which should give you results that have a single column (event_count) and 10 rows, where each row is composed of the tuple (name, gender).
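To make the grid idea concrete, here is a loose Python model of what the query does: cross-join two member sets, drop the tuples whose cell is empty, then keep the top rows by the measure. It is a conceptual sketch only, not how an MDX engine evaluates the query; the function names are made up, and the cell values are taken from the sample output in the question.

```python
from itertools import product

names = ["P Davis", "J Davis", "A Brown", "R White"]
genders = ["female", "male", "unknown"]

# event_count per (name, gender) cell; any missing combination is an empty cell.
cells = {
    ("P Davis", "unknown"): 123,
    ("J Davis", "male"): 123,
    ("A Brown", "male"): 120,
    ("R White", "female"): 119,
}


def non_empty_crossjoin(set_a, set_b, cell_values):
    """All (a, b) tuples from the cross product that have a non-empty cell."""
    return [pair for pair in product(set_a, set_b) if cell_values.get(pair) is not None]


def top_count(tuples, n, cell_values):
    """The n tuples with the largest measure value, in descending order."""
    return sorted(tuples, key=lambda t: cell_values[t], reverse=True)[:n]


rows = top_count(non_empty_crossjoin(names, genders, cells), 3, cells)
for name, gender in rows:
    print(name, gender, cells[(name, gender)])
```

The full cross product here has 12 tuples, but only 4 of them carry a value, which is why the earlier result grid looked so sparse.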
I hope that sets you on the right path, and please feel free to ask if you want me to clarify anything.
For general introductory material, I think the book "MDX Solutions" is a good place to start:
http://www.amazon.ca/MDX-Solutions-Microsoft-Analysis-Services/dp/0471748080/
For online introductory material on MDX, you can have a look at this gentle introduction, which presents the main MDX concepts.