How can I use a WHERE clause in AWS Athena JSON queries?

I have a table where I've stored some information from a Json object:
Table:
investment
unit(string)
data(string)
If I run the query SELECT * FROM "db"."investment" limit 10; I get the following result:
Unit Data
CH [{"from":"CH","when":"2021-02-16","who":"pp#gmail.com"}]
AB [{"from":"AB","when":"2020-02-16","who":"jj#gmail.com"}]
Now, I run the following basic query to return a value from within the nested JSON object:
SELECT json_extract_scalar(Data, '$[0].who') email FROM "db"."investment";
and I got the following result:
email
jj#gmail.com
pp#gmail.com
How can I filter this query with a WHERE clause to return just a single value?
I've tried this, but it doesn't work the way it would on a normal SQL table with rows and columns:
SELECT json_extract_scalar(Data, '$[0].who') email FROM "db"."investment" WHERE email = "pp#gmail.com";
Any help with this?

Your question seems to have a few typos:
Date in Unit Date should probably be Data.
What is key referring to? Perhaps you mean Data.
Also, note that Athena is case-insensitive, and column names are converted to lower case (even if you quote them).
With that out of the way, you have to use the full expression that extracts your email from the JSON document in the WHERE clause. The column alias defined in the SELECT list is not visible there, because WHERE is evaluated before the SELECT list. Also note that string literals take single quotes; double quotes denote identifiers.
here's a self contained example:
with test (unit, data) as (
    values
        ('CH', JSON '[{"from":"CH","when":"2021-02-16","who":"pp#gmail.com"}]'),
        ('AB', JSON '[{"from":"AB","when":"2020-02-16","who":"jj#gmail.com"}]')
)
select json_extract_scalar(data, '$[0].who') email
from test
where json_extract_scalar(data, '$[0].who') = 'pp#gmail.com';
outputs:
| email |
+--------------+
| pp#gmail.com |
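
An alternative to repeating the expression is to wrap the extraction in a subquery, so that the alias becomes a real column of the outer query. The sketch below is only an illustration under assumptions: it uses Python's built-in sqlite3 module as a stand-in engine rather than Athena, and SQLite's json_extract in place of Athena's json_extract_scalar (this needs a SQLite build with JSON support, which is typical). The table and data mirror the question.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Athena (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE investment (unit TEXT, data TEXT)")
conn.executemany(
    "INSERT INTO investment VALUES (?, ?)",
    [
        ("CH", '[{"from":"CH","when":"2021-02-16","who":"pp#gmail.com"}]'),
        ("AB", '[{"from":"AB","when":"2020-02-16","who":"jj#gmail.com"}]'),
    ],
)

# The inner query defines the alias; the outer query can then filter on it.
query = """
    SELECT email
    FROM (
        SELECT json_extract(data, '$[0].who') AS email
        FROM investment
    ) AS extracted
    WHERE email = ?
"""
emails = [row[0] for row in conn.execute(query, ("pp#gmail.com",))]
print(emails)
```

In Athena the same shape would presumably use json_extract_scalar in the inner SELECT and a single-quoted literal in the outer WHERE; that form has not been run here.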

Related

SQL query to find a few strings in different columns in a table row (restrictive)

I have a table like this one (in SQL Server):
field_name | field_descriptor | tag1 | tag2  | tag3 | tag4 | tag5
house      | your home        | home | house | null | null | null
car        | first car        | car  | wheel | null | null | null
...        | ...              | ...  | ...   | ...  | ...  | ...
I'm developing a wiki with a search bar that should handle queries containing more than one search string. When a user enters a second string (separated by a space), the query should return only the rows that match both strings (if any) in any column, and likewise for a three-string search.
This is easy to do for one string with a simple SELECT with ORs.
I tried doing it in the frontend in JS with libraries like match-sorter, but that is heavy for a table with more than 100,000 rows, and there will be more in the future.
I thought the query should do the heavy work, but maybe there is no simple way of doing it.
Thanks in advance!
I tried doing the heavy work in the frontend, filtering all results with libraries like match-sorter. It works, but it takes several seconds and blocks the frontend.
I also tried to create a simple OR/AND query, but the number of possibilities with 3 search strings (there could be 1, 2 or 3), each matching any column, is overwhelming.

You can use STRING_SPLIT to get a separate row per search word from the search words string. Then only select rows where all search words have a match.
The query should look like this:
select *
from mytable t
where exists
(
    select null
    from (select value from string_split(@search, ' ')) search
    having min(case when search.value in (t.tag1, t.tag2, t.tag3, t.tag4, t.tag5) then 1 else 0 end) = 1
);
Unfortunately, SQL Server seems to have a flaw (or even a bug) here and reports:
Msg 8124 Level 16 State 1 Line 8
Multiple columns are specified in an aggregated expression containing an outer reference. If an expression being aggregated contains an outer reference, then that outer reference must be the only column referenced in the expression.
Demo: https://dbfiddle.uk/kNL1PVOZ
I don't have more time at hand right now, so you may use this query as a starting point to get the final query.
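
One possible way to express the same "every search word must match" condition without aggregating over an outer reference is to compare two counts: the number of search words found in the row's tags, and the total number of search words. The sketch below is a hedged illustration, not a tested SQL Server solution: it runs on SQLite through Python's sqlite3 module, and search_words is a hypothetical helper table standing in for the rows that STRING_SPLIT would produce. Because the outer references sit in the subquery's WHERE clause rather than inside an aggregate, this shape may avoid Msg 8124, but that has not been verified on SQL Server here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (field_name TEXT, field_descriptor TEXT, "
    "tag1 TEXT, tag2 TEXT, tag3 TEXT, tag4 TEXT, tag5 TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("house", "your home", "home", "house", None, None, None),
        ("car", "first car", "car", "wheel", None, None, None),
    ],
)
# Hypothetical helper table holding one row per distinct search word.
conn.execute("CREATE TABLE search_words (value TEXT)")


def search(text):
    """Return field_name of rows whose tags contain every word in text."""
    words = sorted(set(text.split()))
    conn.execute("DELETE FROM search_words")
    conn.executemany("INSERT INTO search_words VALUES (?)", [(w,) for w in words])
    rows = conn.execute(
        """
        SELECT t.field_name
        FROM mytable t
        WHERE (SELECT COUNT(*)
               FROM search_words s
               WHERE s.value IN (t.tag1, t.tag2, t.tag3, t.tag4, t.tag5))
              = (SELECT COUNT(*) FROM search_words)
        ORDER BY t.field_name
        """
    )
    return [r[0] for r in rows]


print(search("home house"))
print(search("car home"))
print(search("wheel"))
```

The count comparison is NULL-safe, since a word that matches no tag simply does not contribute to the first count. Note that an empty search string matches every row in this sketch, and that only the tag columns are searched, as in the query above.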

SQL JSON HELP | Selecting ALL records with a certain JSON value

I am trying to get all records that contain WEAPON_COMBATPISTOL in the inventory column of my users table.
The column holds JSON. I am lost; I have tried JSON_EXTRACT and JSON_CONTAINS.
JSON DATA:
[{"slot":1,"count":450,"name":"money"},
{"slot":2,"count":54,"name":"ammo-9"},
{"metadata":{"serial":"643280CXJ213639","durability":97.91999999999998,"registered":"Barry McCeiner","components":[],"ammo":11},"slot":3,"count":1,"name":"WEAPON_COMBATPISTOL"},
{"slot":4,"count":8,"name":"burger"},
{"slot":5,"count":8,"name":"icetea"},
{"slot":6,"count":7,"name":"stone"},
{"slot":10,"count":6,"name":"lockpick"}]
SQL Statement:
SELECT * FROM users WHERE JSON_CONTAINS(inventory, 'WEAPON_COMBATPISTOL', '$.name');
I also want to remove just the WEAPON_COMBATPISTOL entry from the JSON data, if that is possible.

Query to return the amount of time each field equals a true value

I'm collecting data and storing it in SQL Server. The table consists of the data shown in the Example Table below. I need to show the amount of time that each field [Fault0-Fault10] is in a fault state. The fault state is represented by a value of 1 in the field.
I have used the following query and got the desired results. However, this is for only one field. I need the total time for each field in one query, and I'm having trouble pulling all of the fields into one query.
SELECT DISTINCT
    [Fault0],
    DATEDIFF(mi, MIN(DateAndTime), MAX(DateAndTime)) TotalTimeInMins
FROM [dbo].[Fault]
WHERE Fault0 = 1
GROUP BY [Fault0]

Assuming you want the max() - min(), then you simply need to unpivot the data. Your query can look like:
SELECT v.faultname,
       datediff(minute, min(f.DateAndTime), max(f.DateAndTime)) as TotalTimeInMins
FROM [dbo].[Fault] f CROSS APPLY
     (VALUES ('fault0', fault0), ('fault1', fault1), . . ., ('fault10', fault10)
     ) v(faultname, value)
WHERE v.value = 1
GROUP BY v.faultname;
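
To make the unpivot step concrete, here is a minimal Python sketch of the same logic: each row is expanded into (fault name, value) pairs, rows with value 1 are kept, and the span between the earliest and latest timestamp is reported per fault. The function name fault_minutes and the sample rows are made up for illustration, and only two fault columns are used. One difference to be aware of: T-SQL's DATEDIFF(minute, ...) counts minute boundaries crossed, while this sketch counts whole elapsed minutes; the two agree when the timestamps fall exactly on minute boundaries, as they do here.

```python
from datetime import datetime


def fault_minutes(rows, fault_names):
    """Minutes between the first and last timestamp at which each fault was 1."""
    spans = {}
    for row in rows:
        ts = row["DateAndTime"]
        # Unpivot: visit every (fault name, value) pair of the row.
        for name in fault_names:
            if row[name] == 1:
                first, last = spans.get(name, (ts, ts))
                spans[name] = (min(first, ts), max(last, ts))
    return {
        name: int((last - first).total_seconds() // 60)
        for name, (first, last) in spans.items()
    }


# Made-up sample rows with only two fault columns.
rows = [
    {"DateAndTime": datetime(2020, 1, 1, 8, 0), "Fault0": 1, "Fault1": 0},
    {"DateAndTime": datetime(2020, 1, 1, 8, 30), "Fault0": 1, "Fault1": 1},
    {"DateAndTime": datetime(2020, 1, 1, 9, 15), "Fault0": 0, "Fault1": 1},
]
result = fault_minutes(rows, ["Fault0", "Fault1"])
print(result)
```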

A query job with 228 result rows writes 0 rows to the table when allow large results is True

I have a SQL query that, when I write the results to a table without 'allow large results' set, will write 228 rows.
When I set allow large results however, the destination table will contain 0 rows. Both attempts use write disposition WRITE_TRUNCATE.
I see this both using the API and the BigQuery console.
The working no-allow-large-results job:
eagTEiR0wSMK6b5WLSL04vB9RfTUb8bhvEi1YFWjuhfaF_W0zEeLogxUYwOrhGyOheS_CyyaB1dUeafGPdyR592xMcbeEmpJ85_CO29PSbBAnmEBGHJVHWjpH5DvGyVCEjarfJ5XUQ9UmVT_FSHmkcEZktbfln9E_E1jobM65IuQv2sP4_r7eqK60aPaqxD7taEc1bpM2kS6GAtkxqFsUUOv_JXQgTn3ebCodHFKsdquhy3e1mfbu4QhqnoO5QCi
The non-working allow-large-results job:
G40HW4Z5zGTgL1NSCBBy380kY7Gu7WOU7s_zB9F8Kdrtao2gbzRLptWSSi76MC2gHCHPG0srssaGejfCIN4j1upjyh9vQnA3kPmuJcgm5ZgdYd3YwsmGzvcBXiPy9bY0x0GRhJXimHqhKiYbKz7fa3LljOb4kxNvB8wPazqeYj3xAXwbV8G2Sl3L6gmutvvYPalhd1CCtUbLfiw520_I4zKDgn7LYosyFjA0h9TwR8GQ80Scd5n8yKAsIEou7XDG
Query:
SELECT t1.email, MIN(t1.min_created_time), GROUP_CONCAT(t1.id)
FROM (
SELECT email, MIN(created) as min_created_time, id
FROM TABLE_QUERY([xxxxx], 'table_id in ("yyyyyy_201601", "yyyyyy_201602", "yyyyyy _201603", "yyyyyy_201604")')
WHERE created >= "2016-01-11 00:00:00" AND created < "2016-04-01 00:00:00" AND id != "null" AND name LIKE "%trike%"
GROUP BY email, id
) t1
GROUP EACH BY t1.email
IGNORE CASE
Also note, a simpler SQL works for both cases such as:
select email from xxxx group by email limit 100

This looks like a problem due to IGNORE CASE. The fix is underway, but in the meantime could you wrap string comparisons with LOWER() calls, i.e.
LOWER(id) != "null"
LOWER(name) LIKE "%trike%"
etc.

Splitting text in SQL Server stored procedure

I'm working with a database, where one of the fields I extract is something like:
1-117 3-134 3-133
Each of these number sets represents a different set of data in another table. Taking 1-117 as an example, 1 = equipment ID, and 117 = equipment settings.
I have another table from which I need to extract data based on the previous field. It has two columns that split equipment ID and settings. Essentially, I need a way to go from the queried column 1-117 and run a query to extract data from another table where 1 and 117 are two separate corresponding columns.
So, is there any way to split this number to run this query?
Also, how would I split those three numbers (1-117 3-134 3-133) into three different query sets?
The tricky part here is that this column can have any number of sets here (such as 1-117 3-133 or 1-117 3-134 3-133 2-131).
I'm creating these queries in a stored procedure as part of a larger document to display the extracted data.
Thanks for any help.

Since you didn't provide the DB vendor, here are two posts that answer this question for SQL Server and Oracle respectively:
T-SQL: Opposite to string concatenation - how to split string into multiple records
Splitting comma separated string in a PL/SQL stored proc
If you're using some other DBMS, go search for "splitting text". I can almost guarantee you're not the first one to ask, and there are answers for every DBMS flavor out there.
Since you said the format is constant, though, you could also do something simpler using a SUBSTRING function.
EDIT in response to OP comment:
Since you're using SQL Server, and you said that these values are always in a consistent format, you can do something as simple as using SUBSTRING to get each part of the value and assign the parts to T-SQL variables. You can then use those variables however you want, for example in the predicate of a query.
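
As a language-neutral illustration of the splitting logic (in Python, not T-SQL), the value can be broken on spaces into sets and each set on the dash into its two parts. The function name parse_sets is hypothetical; this sketch does not rely on fixed widths, so it also copes with any number of sets.

```python
def parse_sets(value):
    """Split '1-117 3-134' into [(1, 117), (3, 134)]."""
    pairs = []
    for token in value.split():
        # Each token looks like '<equipment id>-<settings>'.
        equipment_id, settings = token.split("-", 1)
        pairs.append((int(equipment_id), int(settings)))
    return pairs


pairs = parse_sets("1-117 3-134 3-133")
print(pairs)
```

Each resulting pair corresponds to the two columns of the second table, so it could be used as the predicate of one lookup query.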

Assuming that what you said is true about the format always being #-### (exactly 1 digit, a dash, and 3 digits) this is fairly easy.
WITH EquipmentSettings AS (
    SELECT
        S.*,
        Convert(int, Substring(S.AwfulMultivalue, V.number * 6 - 5, 1)) EquipmentID,
        Convert(int, Substring(S.AwfulMultivalue, V.number * 6 - 3, 3)) Settings
    FROM
        SourceTable S
        INNER JOIN master.dbo.spt_values V
            -- n sets occupy 6n - 1 characters, hence the + 1
            ON V.number BETWEEN 1 AND (Len(S.AwfulMultivalue) + 1) / 6
    WHERE
        V.type = 'P'
)
SELECT
    E.Whatever,
    D.Whatever
FROM
    EquipmentSettings E
    INNER JOIN DestinationTable D
        ON E.EquipmentID = D.EquipmentID
        AND E.Settings = D.Settings
In SQL Server 2005+ this query will support 1365 values in the string.
If the length of the digits can vary, then it's a little harder. Let me know.
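
The offset arithmetic for the fixed #-### format can be checked outside the database. The Python sketch below (the function name fixed_width_sets is hypothetical) uses the same 1-based positions, n * 6 - 5 for the equipment ID and n * 6 - 3 for the settings. A string of n sets separated by single spaces is 6n - 1 characters long, so the sketch derives the number of sets as (length + 1) // 6.

```python
def fixed_width_sets(value):
    """Slice '#-###' sets separated by single spaces, using 1-based offsets."""
    count = (len(value) + 1) // 6  # n sets occupy 6n - 1 characters
    sets = []
    for n in range(1, count + 1):
        id_start = n * 6 - 5        # 1-based position of the equipment ID
        settings_start = n * 6 - 3  # 1-based position of the settings
        equipment_id = int(value[id_start - 1:id_start])
        settings = int(value[settings_start - 1:settings_start + 2])
        sets.append((equipment_id, settings))
    return sets


sets = fixed_width_sets("1-117 3-134 3-133")
print(sets)
```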

If there are never more than 4 sets, you can use PARSENAME to retrieve the result:
Declare @Num varchar(20)
Set @Num='1-117 3-134 3-133'
select parsename(replace(@Num,' ','.'),3)
Result: 1-117
Now use PARSENAME again on that result:
Select parsename(replace(parsename(replace(@Num,' ','.'),3),'-','.'),1)
Result: 117
If there are more than 4 values, use a split function instead.
