Generate multiple rows for a binary number field? - sql

Example data rows:
| ID | First Name | Last Name | Federal Race Code |
| 101 | Bob | Miller | 01010 |
| 102 | Daniel | Smith | 00011 |
The "Federal Race Code" field contains binary data, and each "1" is used to determine if a particular check box is set on a particular web form. E.g., the first bit is American Indian, second bit is Asian, third bit is African American, fourth is Pacific Islander, and the fifth is White.
I need to generate a separate row for each bit that is set to "1". So, given the example above, I need to generate output that looks like this:
| ID | First Name | Last Name | Mapped Race Name |
| 101 | Bob | Miller | Asian |
| 101 | Bob | Miller | Pacific Islander |
| 102 | Daniel | Smith | Pacific Islander |
| 102 | Daniel | Smith | White |
Any tips or ideas on how to go about this?

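You can do it either with five queries combined by UNION or with one UNPIVOT clause.
In either case, start by splitting the binary logic into five columns, one per bit. The bit masks below assume the code is stored as an integer; if it is stored as a string of '0' and '1' characters, test each position with SUBSTRING instead: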
SELECT *,
CASE WHEN federal_race_code & 16 = 16 THEN 1 ELSE 0 END as NativeAmerican,
..
CASE WHEN federal_race_code & 1 = 1 THEN 1 ELSE 0 END as White
FROM myTable
Then UNION:
SELECT *, 'Native American' AS Race
FROM (<subquery>)
WHERE NativeAmerican = 1
UNION
...
UNION
SELECT *, 'White' AS Race
FROM (<subquery>)
WHERE White = 1
If you are on Oracle or SQL Server, use a CTE for the subquery so you only write it once.
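If the code is stored as a string of '0' and '1' characters, as in the sample rows, a join against a small lookup table is one alternative to the UNION approach. The following is only a sketch using SQLite from Python: the `people` and `race_bits` table names and the `expand_race_codes` helper are made up for illustration, and the bit order is the one described in the question (position 1 is the leftmost character). With that order, 01010 decodes to Asian and Pacific Islander.

```python
import sqlite3

# Hypothetical lookup of bit position (1 = leftmost) to race name,
# following the order given in the question.
RACE_BY_POSITION = [
    (1, "American Indian"),
    (2, "Asian"),
    (3, "African American"),
    (4, "Pacific Islander"),
    (5, "White"),
]


def expand_race_codes(rows):
    """Return one (id, first_name, last_name, race_name) tuple per bit set to '1'."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE people (id INTEGER, first_name TEXT, "
        "last_name TEXT, federal_race_code TEXT)"
    )
    conn.execute("CREATE TABLE race_bits (position INTEGER, race_name TEXT)")
    conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?)", rows)
    conn.executemany("INSERT INTO race_bits VALUES (?, ?)", RACE_BY_POSITION)
    # The join keeps a (person, race) pair only when the character at
    # that race's position in the code is '1'.
    result = conn.execute(
        """
        SELECT p.id, p.first_name, p.last_name, b.race_name
        FROM people AS p
        JOIN race_bits AS b
          ON substr(p.federal_race_code, b.position, 1) = '1'
        ORDER BY p.id, b.position
        """
    ).fetchall()
    conn.close()
    return result


sample = [(101, "Bob", "Miller", "01010"), (102, "Daniel", "Smith", "00011")]
for row in expand_race_codes(sample):
    print(row)
```

The lookup table keeps the bit-to-name mapping in data rather than in five hand-written SELECTs, so adding a sixth bit would only need one more row.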

Related

Using Snowflake SQL how do you find two records and then change another one based on those records to a predefined record using a local variable?

Using SQL how do you use two records to find a place, hold onto that place and use that record to replace 'Nonsense' value with that held onto place? I am going to show what I have been able to write so far, but then write out what I am still trying to figure out:
SELECT * FROM "TABLES". "ACCTS_OF_SUPERHEROS".;
DECLARE #count_rows INT = 0;
DECLARE #row_total INT = 0;
DECLARE #refAcctNum INT = 0;
DECLARE #selectedPlaceName TINYTEXT;
SET #row_total = SELECT COUNT (*)
WHILE countRows < row_total
for each acct_num store value in refAcctNum.
Using refAcctNum find place: "Gotham City", "Central City", "Metropolis", "Smallville", "Star City", "Fawcett City" store that in selectedPlaceName.
If refAccountNumber has Nonsense then replace with selectedPlaceName record
otherwise add + 1 to countRows and repeat.
END
Current table data; "ACCTS_OF_SUPERHEROS" table:
| row | acct_num | exact_address | place
| --- | -------- |------------------|--------
| 1 | 049403 | 344 Clinton Str | Metropolis
| 2 | 049403 | 344 Clinton Str | Nonsense
| 3 | 049206 | 1007 Mountain Dr | Gotham City
| 4 | 049206 | 1007 Mountain Dr | Gotham City
| 5 | 049206 | 1096 Show Dr. | Fawcett City
| 6 | 049206 | 1096 Show Dr. | Nonsense
| 7 | 049206 | NULL | Nonsense
| 8 | 049291 | 1938 Sullivan Pl | Smallville
| 9 | 049293 | 700 Hamilton Str | Central City
| 10 | 049396 | 800 Nonsense Way | Nonsense
| 11 | 049396 | NULL | Nonsense
Desired output:
| row | acct_num | exact_address | place
| --- | -------- |------------------|--------
| 1 | 049403 | 344 Clinton Str | Metropolis
| 2 | 049403 | 344 Clinton Str | Metropolis
| 3 | 049206 | 1007 Mountain Dr | Gotham City
| 4 | 049206 | 1007 Mountain Dr | Gotham City
| 5 | 049206 | 1096 Show Dr. | Fawcett City
| 6 | 049206 | 1096 Show Dr. | Fawcett City
| 7 | 049206 | NULL | Fawcett City
| 8 | 049291 | 1938 Sullivan Pl | Smallville
| 9 | 049293 | 700 Hamilton Str | Central City
| 10 | 049396 | 800 Tidal Way | Star City
| 11 | 049396 | NULL | Star City
You can use window functions:
select t.*,
max(case when place <> 'Nonsense' then place end) over (partition by acct_num) as imputed_place
from t;
This returns NULL if all the rows are 'Nonsense' for a given acct_num. You can use COALESCE() to replace the value with something else.
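To try the window-function idea without a Snowflake account, here is a rough sketch that runs the same kind of query against SQLite from Python (it needs an SQLite build with window function support, 3.25 or later). The `impute_places` helper, the `row_num` column name, and the 'Unknown' fallback are assumptions for illustration only. Note that when an account has several different good places, MAX simply picks the greatest string, which may not be the one you want.

```python
import sqlite3


def impute_places(rows, fallback="Unknown"):
    """Replace 'Nonsense' with a good place from the same acct_num, if any."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (row_num INTEGER, acct_num TEXT, place TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    # The CASE turns 'Nonsense' into NULL, MAX ignores NULLs within each
    # account, and COALESCE supplies the fallback when nothing is left.
    result = conn.execute(
        """
        SELECT row_num, acct_num,
               COALESCE(
                   MAX(CASE WHEN place <> 'Nonsense' THEN place END)
                       OVER (PARTITION BY acct_num),
                   ?
               ) AS imputed_place
        FROM t
        ORDER BY row_num
        """,
        (fallback,),
    ).fetchall()
    conn.close()
    return result


sample_rows = [
    (1, "049403", "Metropolis"),
    (2, "049403", "Nonsense"),
    (8, "049291", "Smallville"),
    (10, "049396", "Nonsense"),
    (11, "049396", "Nonsense"),
]
for row in impute_places(sample_rows):
    print(row)
```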
I was reading through the available list of window functions in Snowflake and think you're going to need a new window function for this. Perhaps someone can find a more built-in way, but anyway here's a user defined table function REPLACE_WITH_LKG implemented as a window function that will replace a bad value with the last known good value. As long as I was going to write it, I thought it may as well be general purpose, so it matches "bad" values using a regular expression and JavaScript RegExp options.
create or replace function REPLACE_WITH_LKG("VALUE" string, "REGEXP" string, "REGEXP_OPTIONS" string)
returns table(LKG_VALUE string)
language javascript
strict immutable
as
$$
{
initialize: function (argumentInfo, context) {
this.lkg = "";
},
processRow: function (row, rowWriter, context) {
const rx = new RegExp(row.REGEXP, row.REGEXP_OPTIONS);
if (!rx.test(row.VALUE)) {
this.lkg = row.VALUE;
}
rowWriter.writeRow({LKG_VALUE: this.lkg});
},
finalize: function (rowWriter, context) {},
}
$$;
select S.*, LKG.LKG_VALUE as PLACE
from superhero S, table(REPLACE_WITH_LKG(PLACE, 'Nonsense', 'ig')
over(partition by null order by "ROW")) LKG;
A note on performance: as the sample data is laid out, there is no partition other than the entire table. That's because the one obvious column to partition by, the account, won't work. Row 10 gets its value from what would be a different window if partitioned by account, so with this sample data the window has to span the entire table. This will not parallelize well and should be avoided for very large tables.

How to get every first result of select query in loop iterating over array of strings?

I have a table (e.g. Users) in PostgreSQL database. Its size is relatively large (ca. 4 GB of data) and I would like to get a table/result consisting of single rows fulfilling the select query. This query shall be executed for each element in an array of strings (couple dozens of elements).
Example single select for one element:
SELECT * FROM "Users" WHERE "Surname" LIKE 'Smith%' LIMIT 1
Value between ' and %' should be an element of input array.
EDIT: It doesn't matter for me whether I get record no. 1 or 2 for LIKE 'Smith%'
How can I achieve this?
I tried to append query results to some array variable within FOREACH loop but with no success.
Example source table:
| Id | Name | Surname |
|---- |-------- |---------- |
| 1 | John | Smiths |
| 2 | Adam | Smith |
| 3 | George | Kowalsky |
| 4 | George | Kowalsky |
| 5 | Susan | Connor |
| 6 | Clare | Connory |
| 7 | Susan | Connor |
And for ['Smith', 'Connor'] the output is:
| Id | Name | Surname |
|----|-------|---------|
| 1 | John | Smiths |
| 5 | Susan | Connor |
In Postgres you can use the ANY operator to compare a single value to all values of an array. This also works together with the LIKE operator.
SELECT *
FROM "Users"
WHERE "Surname" like ANY (array['Smith%', 'Connor%'])
Note that LIKE is case sensitive. If you don't want that, you can use ILIKE.
This will show you the logic. Syntax is up to you.
where 1 = 2
start of loop
or surname like 'Array Element goes here%'
end of loop
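The loop above can be sketched in client code by building one `LIKE ?` condition per array element and joining them with OR. This is only an illustration using SQLite from Python: `find_by_surname_prefixes` is a made-up helper, it returns every matching row rather than one row per element, it does not escape `%` or `_` inside the prefixes, and SQLite's LIKE is case insensitive for ASCII, unlike LIKE in Postgres.

```python
import sqlite3


def find_by_surname_prefixes(conn, prefixes):
    """Return all Users rows whose Surname starts with any of the prefixes."""
    if not prefixes:
        return []
    # One placeholder per prefix; the values are bound as parameters
    # instead of being pasted into the SQL text.
    where = " OR ".join(['"Surname" LIKE ?'] * len(prefixes))
    sql = f'SELECT "Id", "Name", "Surname" FROM "Users" WHERE {where} ORDER BY "Id"'
    params = [prefix + "%" for prefix in prefixes]
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Users" ("Id" INTEGER, "Name" TEXT, "Surname" TEXT)')
conn.executemany(
    'INSERT INTO "Users" VALUES (?, ?, ?)',
    [
        (1, "John", "Smiths"),
        (2, "Adam", "Smith"),
        (3, "George", "Kowalsky"),
        (4, "George", "Kowalsky"),
        (5, "Susan", "Connor"),
        (6, "Clare", "Connory"),
        (7, "Susan", "Connor"),
    ],
)
matches = find_by_surname_prefixes(conn, ["Smith", "Connor"])
for row in matches:
    print(row)
```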

SQL Where != stringval not filtering out stringval

I have a table (table1) that comes from HBase and contains certain rows that I would like to filter out. I have recreated the table, my SQL query, and the output I receive below. When I try to filter out a string value, it stays in the result even though I want it removed.
table1 (some positions are fully capitalized and some aren't; I want to capitalize them all and filter out certain positions)
name | company | personal_id | position
Joe | Applebees| 32 | manager
Jack | Target | 12 | CLERK
Jim | Chipotle | 22 | COOK
Ron | Starbucks| 13 | barista
query
df = sqlContext.sql("""select name, company, personal_id, UCASE(position) as position
from table1
where position != 'BARISTA'""") #tried lower & upper case
Output received
name | company | personal_id | position
Joe | Applebees| 32 | MANAGER
Jack | Target | 12 | CLERK
Jim | Chipotle | 22 | COOK
Ron | Starbucks| 13 | BARISTA /*dont want this output*/
Why did the row Ron | Starbucks| 13 | BARISTA not get filtered out by my where clause?
Try
where UCASE(position) != 'BARISTA'
Why are you grouping the result? There is no need to group unless an aggregate function is used. Try the query below:
select name, company, personal_id, UCASE(position) as position
from table1
where upper(position) != 'BARISTA'
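The reason the original filter fails is that the WHERE clause is evaluated against the stored column value ('barista'), not against the upper-cased alias in the SELECT list. Here is a small sketch of that behaviour using SQLite from Python as a stand-in for Spark SQL; that the two engines resolve the name the same way here is an assumption, and the table contents are just the sample rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (name TEXT, company TEXT, personal_id INTEGER, position TEXT)"
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        ("Joe", "Applebees", 32, "manager"),
        ("Jack", "Target", 12, "CLERK"),
        ("Jim", "Chipotle", 22, "COOK"),
        ("Ron", "Starbucks", 13, "barista"),
    ],
)

# WHERE sees the stored value 'barista', which differs from 'BARISTA',
# so Ron's row survives even though it is displayed upper-cased.
unfiltered = conn.execute(
    "SELECT name, UPPER(position) AS position FROM table1 "
    "WHERE position != 'BARISTA' ORDER BY name"
).fetchall()

# Upper-casing inside WHERE compares like with like.
filtered = conn.execute(
    "SELECT name, UPPER(position) AS position FROM table1 "
    "WHERE UPPER(position) != 'BARISTA' ORDER BY name"
).fetchall()

print(unfiltered)
print(filtered)
```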

How to add column with the value of another dimension?

I apologize if the title does not make sense. I am trying to do something that is probably simple, but I have not been able to figure it out, and I'm not sure how to search for the answer. I have the following MDX query:
SELECT
event_count ON 0,
TOPCOUNT(name.children, 10, event_count) ON 1
FROM
events
which returns something like this:
| | event_count |
+---------------+-------------+
| P Davis | 123 |
| J Davis | 123 |
| A Brown | 120 |
| K Thompson | 119 |
| R White | 119 |
| M Wilson | 118 |
| D Harris | 118 |
| R Thompson | 116 |
| Z Williams | 115 |
| X Smith | 114 |
I need to include an additional column (gender). Gender is not a metric. It's just another dimension on the data. For instance, consider this query:
SELECT
gender.children ON 0,
TOPCOUNT(name.children, 10, event_count) ON 1
FROM
events
But this is not what I want! :(
| | female | male | unknown |
+--------------+--------+------+---------+
| P Davis | | | 123 |
| J Davis | | 123 | |
| A Brown | | 120 | |
| K Thompson | | 119 | |
| R White | 119 | | |
| M Wilson | | | 118 |
| D Harris | | | 118 |
| R Thompson | | | 116 |
| Z Williams | | | 115 |
| X Smith | | | 114 |
Nice try, but I just want three columns: name, event_count, and gender. How hard can it be?
Obviously this reflects lack of understanding about MDX on my part. Any pointers to quality introductory material would be appreciated.
It's important to understand that in MDX you are building sets of members on each axis, and not specifying column names like a tabular rowset. You are describing a 2-dimensional grid of results, not a linear rowset. If you imagine each dimension as a table, the member set is the set of unique values from a single column in that table.
When you choose a Measure as the member (as in your first example), it looks as if you're selecting from a table, so it's easy to misunderstand. When you choose a Dimension, you get many members, and a cross-join between the rows and columns (which is sparse in this case because the names and genders are 1-to-1).
So, you could crossjoin these two dimensions on a single axis, and then filter out the null cells:
SELECT
event_count ON 0,
TOPCOUNT(
NonEmptyCrossJoin(name.children, gender.children),
10,
event_count) ON 1
FROM
events
Which should give you results that have a single column (event_count) and 10 rows, where each row is composed of the tuple (name, gender).
I hope that sets you on the right path, and please feel free to ask if you want me to clarify anything.
For general introductory material, I think the book "MDX Solutions" is a good place to start:
http://www.amazon.ca/MDX-Solutions-Microsoft-Analysis-Services/dp/0471748080/
For online introductory material, have a look at this gentle introduction, which presents the main MDX concepts.

update data column with data from another column plus an additional keyword

I have tried combining the approaches from two previous questions (Question 1, Question 2) to help with my issue.
I am trying to set one column in DB2 equal to another column's value wrapped in a fixed prefix and suffix. I do have the option of doing this in two steps, adding the S- first and then adding the -000 on the end in a second pass, but I am currently running into the issue that CONCAT does not work in DB2 the way it does in MySQL.
Pre-conversion
name | loc | group
-----------------------------------
sam | 123 |
jack | 456 |
jill | 987 |
mark | 456 |
allen | 123 |
john | 789 |
tom | 123 |
Post-conversion
name | loc | group
-----------------------------------
sam | 123 | S-123-000
jack | 456 | S-456-000
jill | 987 | S-987-000
mark | 456 | S-456-000
allen | 123 | S-123-000
john | 789 | S-789-000
tom | 123 | S-123-000
The SQL I am trying to use:
UPDATE table
SET GROUP = CONCAT('S-',LOC,'-000')
WHERE LENGTH(RTRIM(LOC)) = 3
Any help or guidance would be appreciated.
The DB2 CONCAT takes only two arguments, so you could nest them to achieve your desired result.
UPDATE table
SET GROUP = CONCAT(CONCAT('S-',LOC),'-000')
WHERE LENGTH(RTRIM(LOC)) = 3
There is also a concatenation operator, ||, although some recommend not using it:
UPDATE table
SET GROUP = 'S-'||LOC||'-000'
WHERE LENGTH(RTRIM(LOC)) = 3
If your LOC field is longer than 3 characters and you want no trailing whitespace:
UPDATE table
SET GROUP = 'S-'||RTRIM(LOC)||'-000'
WHERE LENGTH(RTRIM(LOC)) = 3
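For a quick check of the || form outside DB2, the same UPDATE can be run against SQLite from Python. This is just a sketch: the `members` table name and its rows are made up for illustration, the `group` column is quoted because it is a reserved word, and SQLite's handling of trailing blanks is not necessarily identical to a DB2 CHAR column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE members (name TEXT, loc TEXT, "group" TEXT)')
conn.executemany(
    "INSERT INTO members (name, loc) VALUES (?, ?)",
    [
        ("sam", "123  "),  # trailing blanks, as a padded CHAR column would have
        ("jack", "456"),
        ("bob", "12"),     # not three characters, so it must stay untouched
    ],
)

# RTRIM removes the padding before the prefix and suffix are attached.
conn.execute(
    """
    UPDATE members
    SET "group" = 'S-' || RTRIM(loc) || '-000'
    WHERE LENGTH(RTRIM(loc)) = 3
    """
)

groups = dict(conn.execute('SELECT name, "group" FROM members').fetchall())
print(groups)
```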