Get Group by and count on Array field in PostgreSQL Jsonb column - sql

I have my table like this
user_Id | survey
1001 | {"What are you interested in?": "[Games]", "How do you plan to use Gamester?": "[Play games, Watch other people play, Connect with other gamers]" }
1001 | {"What are you interested in?": "[Coupons]", "How do you plan to use Gamester?": "[Watch other people play]" }
1001 | {"What are you interested in?": "[Games]", "How do you plan to use Gamester?": "[Play games]" }
I want to group by the second field, "How do you plan to use Gamester?", and get output like below:
Option | Count
Play games | 2
Watch other people play | 2
Connect with other gamers | 1
All the examples I found were for string fields, not arrays. I tried a few things, but none worked. How can I group on the array field values in a jsonb column and get a count?

The value of the "How do you plan to use Gamester?" property in your JSON data is malformed: you are storing a single string, not an array of values.
Replace all values as shown below:
Source Format: "[Play games, Watch other people play, Connect with other gamers]"
Required Format: ["Play games", "Watch other people play", "Connect with other gamers"]
After fixing your data, you can use the jsonb_path_query function to return each value of the desired property as its own row, and then group those rows.
with cte_plan as (
    select
        jsonb_path_query(survey, '$."How do you plan to use Gamester?"[*]') as option
    from my_table
)
select option, count(option) from cte_plan group by option;
Result
option | count
Watch other people play | 2
Play games | 2
Connect with other gamers | 1
Demo in DBfiddle
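The data fix described in this answer can also be sketched outside the database. The Python snippet below is only an illustration, assuming that no option text contains a comma; the helper name parse_bracketed and the in-memory sample rows are made up for this sketch and are not part of the original answer. It turns each bracketed string into a real list, prints the JSON form it should be stored as, and counts the options.

```python
import json
from collections import Counter

QUESTION = "How do you plan to use Gamester?"

# Sample rows copied from the question; every value is a string, not an array.
surveys = [
    {"What are you interested in?": "[Games]",
     QUESTION: "[Play games, Watch other people play, Connect with other gamers]"},
    {"What are you interested in?": "[Coupons]",
     QUESTION: "[Watch other people play]"},
    {"What are you interested in?": "[Games]",
     QUESTION: "[Play games]"},
]


def parse_bracketed(text):
    """Split "[A, B]" into ["A", "B"]. Assumes options contain no commas."""
    inner = text.strip().strip("[]")
    return [part.strip() for part in inner.split(",") if part.strip()]


# Rewrite every value as a real list, i.e. the "Required Format".
fixed = [{key: parse_bracketed(value) for key, value in row.items()}
         for row in surveys]

# Group and count the options of the second question.
counts = Counter(option for row in fixed for option in row[QUESTION])

print(json.dumps(fixed[0][QUESTION]))
for option, count in counts.most_common():
    print(option, count)
```

If an option could itself contain a comma, this simple split would break, and the data would need to be re-exported as proper JSON arrays at the source instead.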

Related

How to combine rows in BigQuery that share a similar name

I'm having trouble creating a query that groups together responses from multiple rows that share a similar name and counts a specific response in them.
The data table I currently have looks like this:
test_control | values
test | selected
control | selected
test us | not selected
control us | selected
test mom | not selected
control mom | selected
What I'd like is an output like the one below, which counts only the "selected" responses and groups together the rows that have either "control" or "test" in the name:
test_control | values
test | 3
control | 1
The query I have below is wrong, as it returns nothing. The GROUP BY section is where I'm lost, as I'm not sure how to do this. I tried to google it but couldn't find anything. Appreciate any help in advance!
SELECT distinct(test_control), values FROM `total_union`
where test_control="%test%" and values="selected"
group by test_control, values
use below
SELECT
REGEXP_EXTRACT(test_control, r'^(TEST|CONTROL) ') AS test_control,
COUNTIF(values = 'selected') AS values
FROM `total_union`
GROUP BY 1
As mentioned by @Mikhail Berlyant, you can use REGEXP_EXTRACT to pull out the matching prefix and COUNTIF to count the rows that satisfy the given condition. Try the code below to get the expected output:
Code
SELECT
REGEXP_EXTRACT(test_control, r'^(test|control)') AS test_control,
COUNTIF(values = "selected") AS values
FROM `project.dataset.testvalues`
group by 1
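For readers who want to check the logic away from BigQuery, here is a rough Python equivalent of the same two steps: extract the leading test or control with a regular expression, then count only the rows whose value is selected. It is a sketch, not BigQuery code; the rows list simply repeats the sample data from the question.

```python
import re
from collections import Counter

# Sample rows from the question: (test_control, values).
rows = [
    ("test", "selected"),
    ("control", "selected"),
    ("test us", "not selected"),
    ("control us", "selected"),
    ("test mom", "not selected"),
    ("control mom", "selected"),
]

# Same pattern as in the query: the name must start with test or control.
pattern = re.compile(r"^(test|control)")

counts = Counter()
for name, value in rows:
    match = pattern.search(name)
    group = match.group(1) if match else None  # no match behaves like NULL
    # Adding a boolean mirrors COUNTIF: True adds 1, False adds 0,
    # so a group with no selected rows still shows up with a count of 0.
    counts[group] += value == "selected"

for group, selected in counts.items():
    print(group, selected)
```

For the six sample rows exactly as listed in the question, this counts one selected row for test and three for control.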

How can I assign pre-determined codes (1,2,3, etc,) to a JSON-type column in PostgreSQL?

I'm extracting a table of 2000+ rows which are park details. One of the columns is JSON type. Image of the table
We have about 15 attributes like this and we also have a documentation of pre-determined codes assigned to each attribute.
Each row in the extracted table has a different set of attributes that you can see in the image. Right now, I have cast(parks.services AS text) AS "details" to get all the attributes for a particular park or extract just one of them using the code below:
CASE
WHEN cast(parks.services AS text) LIKE '%uncovered%' THEN '2'
WHEN cast(parks.services AS text) LIKE '%{covered%' THEN '1' END AS "details"
This time around, I need to extract these attributes by assigning them the codes. As an example, let's just say
Park 1 - {covered, handicap_access, elevator} to be {1,3,7}
Park 2 - {uncovered, always_open, handicap_access} to be {2,5,3}
I have thought of using subquery to pre-assign the codes, but I cannot wrap my head around JSON operators - in fact, I don't know how to extract them on 2000+ rows.
It would be helpful if someone could guide me in this topic. Thanks a lot!
You should really think about normalizing your tables. Don't store arrays. You should add a mapping table to map the parks and the attribute codes. This makes everything much easier and more performant.
step-by-step demo:db<>fiddle
SELECT
t.name,
array_agg(c.code ORDER BY elems.index) as codes -- 3
FROM mytable t,
unnest(attributes) WITH ORDINALITY as elems(value, index) -- 1
JOIN codes c ON c.name = elems.value -- 2
GROUP BY t.name
1. Extract the array elements into one record per element. Add the WITH ORDINALITY to save the original order.
2. Join your codes on the elements.
3. Create code arrays. To ensure the correct order, you can use the index values created by the WITH ORDINALITY clause.
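The three steps above (expand, look up, re-aggregate in the original order) can be sketched in Python to make the data flow concrete. This is an illustration only: the codes dictionary stands in for the codes table and uses the example codes from the question, and the to_codes helper is a made-up name.

```python
# Stand-in for the codes table, using the example codes from the question.
codes = {
    "covered": 1,
    "uncovered": 2,
    "handicap_access": 3,
    "always_open": 5,
    "elevator": 7,
}

# Stand-in for the parks table: name -> attribute array.
parks = {
    "Park 1": ["covered", "handicap_access", "elevator"],
    "Park 2": ["uncovered", "always_open", "handicap_access"],
}


def to_codes(attributes):
    """Map each attribute to its code, keeping the original order.

    Attributes without a code are dropped, like rows that find no
    partner in an inner join.
    """
    return [codes[name] for name in attributes if name in codes]


result = {park: to_codes(attributes) for park, attributes in parks.items()}
for park, park_codes in result.items():
    print(park, park_codes)
```

A list comprehension preserves element order by construction, which is the role the WITH ORDINALITY index plays in the SQL version.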

Filter by two values with ID column

I'm analyzing some e-sports soccer championship data.
My original table looks like this:
Every row corresponds to one match, with the date, the players involved, the teams they used, and their scores.
my df head()
After searching around the Tableau community, I pivoted the "Player A" and "Player B" columns so I can filter for players individually. Now every match has 2 rows (one for each player in that match), and they're unified by the 'MatchID' column:
my tableau table
That said, i want to build a view where the viewer could select two players and see statistics about all the matches they played against each other, like these two:
1- Last 10 matches info (Date, teams they played with, scores)
2- Most-frequent results like this graph:
the graph i want to show
I tried bringing some dimensions to columns, but I really couldn't find a way to show the entire row data in a view. I have no idea how to filter on two players and keep only the matches where they meet, using MatchID.
I tried searching around and building some calculated-field filters, but I came to Tableau with no background in SQL, Excel or anything, just Python. So I'm a bit lost with so many options and ways.
If anyone could give me directions on this, I would be very happy. Thanks in advance (:
I think you should unpivot your data so you are back with 1 record per match. Then you will be able to use 2 parameters as your filters; one parameter for player 1 and the other for player 2. That would enable the user to select 2 different players.
As there's a chance the same player could be in either the Player 1 or the Player 2 column, the filter is a little more complex. Your filter calculated field for the Player1 parameter would be something like:
[FilterParameterPlayer1]: [ParameterPlayer1] = [Player1] OR [ParameterPlayer1] = [Player2]
And for the Player2 parameter:
[FilterParameterPlayer2]: [ParameterPlayer2] = [Player1] OR [ParameterPlayer2] = [Player2]
Both filter fields should be set to only show True.
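Since the asker mentions a Python background, the same filter logic can be sketched in Python: a match is kept only when each selected player appears in either player column. The player names, the match_id key, and the helper names are invented for this illustration.

```python
# One record per match, as in the unpivoted table. Names are placeholders.
matches = [
    {"match_id": 1, "player1": "Ana", "player2": "Bruno"},
    {"match_id": 2, "player1": "Bruno", "player2": "Ana"},
    {"match_id": 3, "player1": "Ana", "player2": "Caio"},
    {"match_id": 4, "player1": "Caio", "player2": "Bruno"},
]


def involves(match, player):
    """True when the player is in either column of the match."""
    return player in (match["player1"], match["player2"])


def head_to_head(all_matches, parameter_player1, parameter_player2):
    """Keep matches where both filter conditions are True."""
    return [
        match for match in all_matches
        if involves(match, parameter_player1)
        and involves(match, parameter_player2)
    ]


selected = head_to_head(matches, "Ana", "Bruno")
print([match["match_id"] for match in selected])
```

Note that if the same player is chosen for both parameters, every match of that player passes the filter, so the view may want to guard against that case.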

Google Bigquery use of substr, never returns back results

I have a table which has two sets of data, one set of data has information like
Type | Name | Id
PackagedDrug | Pseudoephedrine HCl Oral Tablet 120 MG | 110
PackagedDrug | Pseudoephedrine HCl Oral Tablet 60 MG | 111
DrugName | Pseudoephedrine HCl | 112
What I want to do is join the PackagedDrug concepts with the DrugName concepts, that is, get all Ids of Type PackagedDrug whose Name starts with the Name of a Type DrugName row. If I hardcode the Name for DrugName in the following query, it runs instantaneously, but if I take out the hardcoding, it just keeps on running. Could you please suggest suitable ways to speed up the query?
SELECT a.MSC_ID MSC_id, a.MSC_CONcept_type, a.concept_id, a.concept_name , b.concept_name
from
(select MSC_id, MSC_CONcept_type, concept_id, concept_name
FROM [ClientAlerts.MSC_Concepts]
where MSC_CONcept_type in ('MediSpan.Concepts.PackagedDrug') ) a
CROSS JOIN
(select MSC_CONcept_type, concept_id, concept_name , length(concept_name) len
FROM [ClientAlerts.MSC_Concepts]
where MSC_CONcept_type in ('MediSpan.Concepts.NamebasedClassification.DrugName')
-- and concept_name in ('Pseudoephedrine HCl')
) b
where substr(a.concept_name,1,b.len)+' ' = b.concept_name
Thanks,
Savita
This has nothing to do with BigQuery itself. When you hardcode the name, the second subquery is filtered down to a single row, so the cross join only has to compare each packaged drug against that one value.
Without the hardcoded value, every row from your first subquery is compared with every row from your second, which is far more work. Honestly, if you have described your use case accurately here, I can't think of any way to do this faster.
But one question does come to mind: why do you have a "type"? It seems like it should be two different tables instead.

SQL Group on a combination of values

I'm working on a DB design that records how a user launched something.
My idea was to have timestamp (DateTime) column, and a method column (varchar).
This 'method' (varchar) could be anything:
BUTTON_OK
BUTTON_X
APP_Y
APP_Z
etc
How can I COUNT the uses but group some of the values? In this case I want my result to be:
BUTTONS: 20
APP_X: 10
APP_Z: 14
You need some way of defining which 'methods' fall into which 'method group'.
One way would be to have a lookup table:
tbl_methodgroup
method_id | Method | Method_group
1 | Button_OK | Buttons
2 | Button_X | Buttons
3 | App_Y | App_Y
4 | App_Z | App_Z
then you could use:
select
    a.method_group, -- method_group lives in the lookup table (alias a)
    count(1)
from
    tbl_methodgroup a
    inner join tbl_method b on a.Method = b.Method
group by a.method_group
This method has the advantage of being scalable as more methods get added: you only add rows to the lookup table, rather than hand-coding queries that would need to be modified each time.
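To try the lookup-table idea end to end, here is a small self-contained sketch that uses Python's built-in sqlite3 module as a stand-in for the real database. The launched_at column name, the sample timestamps, and the upper-case method values are assumptions made for the example; the question only describes a timestamp column and a method column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Usage log: one row per launch (column names are assumed).
    CREATE TABLE tbl_method (launched_at TEXT, method TEXT);

    -- Lookup table that assigns every method to a group.
    CREATE TABLE tbl_methodgroup (
        method_id INTEGER PRIMARY KEY,
        method TEXT,
        method_group TEXT
    );

    INSERT INTO tbl_methodgroup VALUES
        (1, 'BUTTON_OK', 'Buttons'),
        (2, 'BUTTON_X', 'Buttons'),
        (3, 'APP_Y', 'App_Y'),
        (4, 'APP_Z', 'App_Z');

    INSERT INTO tbl_method VALUES
        ('2024-01-01 10:00', 'BUTTON_OK'),
        ('2024-01-01 10:05', 'BUTTON_X'),
        ('2024-01-01 10:10', 'BUTTON_OK'),
        ('2024-01-01 10:15', 'APP_Y'),
        ('2024-01-01 10:20', 'APP_Z'),
        ('2024-01-01 10:25', 'APP_Z');
""")

# Count launches per group by joining the log to the lookup table.
rows = conn.execute("""
    SELECT g.method_group, COUNT(*)
    FROM tbl_method AS m
    JOIN tbl_methodgroup AS g ON g.method = m.method
    GROUP BY g.method_group
    ORDER BY g.method_group
""").fetchall()

for method_group, uses in rows:
    print(method_group, uses)
conn.close()
```

Adding a new method to an existing group is then a single INSERT into tbl_methodgroup, with no change to the query.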
If the name of the table is tblTest, then the query will look like the following:
SELECT method, COUNT(*) FROM tblTest GROUP BY method
Apologies if I misread the question. If your data and grouping scenarios are consistent, you can do the following:
SELECT LEFT(method,CHARINDEX('_',method)-1),
COUNT(*)
FROM tblTest
GROUP BY LEFT(method,CHARINDEX('_',method)-1)
Otherwise, Stuart Moore's answer is the correct one.
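To see what the prefix-based grouping actually produces, here is a small Python sketch of the same idea: take everything before the first underscore and count. The sample list of methods is invented for the illustration.

```python
from collections import Counter

# Invented sample of logged method values.
methods = ["BUTTON_OK", "BUTTON_X", "BUTTON_OK", "APP_Y", "APP_Z"]


def prefix(method):
    """Text before the first underscore, or the whole value if there is none."""
    return method.partition("_")[0]


counts = Counter(prefix(method) for method in methods)
for group, uses in counts.items():
    print(group, uses)
```

With this approach APP_Y and APP_Z both fall into a single APP group, so it only fits when every prefix should be merged; keeping the apps separate, as in the desired output of the question, needs the lookup table.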