QUERY - GROUP BY counting different clients by date - sql

Data sample:
https://docs.google.com/spreadsheets/d/1DDs2PvljSsY0jD0v2VmM0NsGkJmhoBPKOXvln9d1MTw/edit?usp=sharing
Above is a link to a spreadsheet where I have a sample of the data I'm working with.
I need a query that counts how many different clients I attended each day. Example (the INFO column is not needed in the result; it is only here to help me describe what I need):
DATE | COUNT(DIFFERENT CLIENTS) | INFO
05/01/2021 | 3 | "Fleury Campinas", "SEDI II AME SOROCABA", "Hospital Santa Catarina"
06/01/2021 | 2 | "Hospital e Maternidade Metropolitano Lapa", "Fleury A+ Morumbi"
Can you help me?

Given the layout of the actual data and the locale of Brazil, I added a new sheet ("Erik Help") with this formula:
=ArrayFormula({"DATE"\"UNIQUE CLIENTS"\"INFO";{QUERY(UNIQUE({'Query from Data'!B5:B\'Query from Data'!E5:E});"Select Col1, COUNT(Col2) WHERE Col1 Is Not Null GROUP BY Col1 LABEL COUNT(Col2) ''")\REGEXREPLACE(REGEXREPLACE(TRIM(FLATTEN(QUERY(QUERY({'Query from Data'!B5:B\'Query from Data'!E5:E&","}; "Select MAX(Col2) WHERE Col1 Is Not Null GROUP BY Col2 PIVOT Col1");; 9^9)));"^\S+\s*|[,\s]+$";"");",\s*";CHAR(10))}})
This is complex, and explaining it in full would take far longer than writing it. So I am offering it as-is, inviting anyone who is interested to take it apart and see what each part does alone and collectively.
The short version:
Headers are created.
Under those, there is a virtual array formed of a two-column QUERY to the left of another one-column QUERY. The first returns unique dates and counts of unique clients (two columns). The second combines each date with the unique client list for that date and then uses REGEX-type commands to get rid of the date portion and to replace comma-space with a line return CHAR(10).
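For anyone who finds the formula hard to take apart, the same logic can be sketched outside Sheets. The Python below is only an illustration under assumptions, not a replacement for the formula: the `rows` list is made-up data shaped like the example in the question (one repeated visit is invented to show de-duplication), and `clients_by_date` is a hypothetical helper name.

```python
# Hypothetical (date, client) rows modeled on the example in the question;
# the repeated "Fleury Campinas" visit is invented to show de-duplication.
rows = [
    ("05/01/2021", "Fleury Campinas"),
    ("05/01/2021", "SEDI II AME SOROCABA"),
    ("05/01/2021", "Fleury Campinas"),
    ("05/01/2021", "Hospital Santa Catarina"),
    ("06/01/2021", "Hospital e Maternidade Metropolitano Lapa"),
    ("06/01/2021", "Fleury A+ Morumbi"),
]


def clients_by_date(rows):
    """Map each date to its distinct clients, in first-seen order."""
    grouped = {}
    for date, client in rows:
        if not date:  # plays the role of "WHERE Col1 Is Not Null"
            continue
        clients = grouped.setdefault(date, [])
        if client not in clients:  # plays the role of UNIQUE on (date, client)
            clients.append(client)
    return grouped


# One (date, unique-client count, newline-separated client list) tuple per date.
summary = [(d, len(c), "\n".join(c)) for d, c in clients_by_date(rows).items()]
for date, count, info in summary:
    print(date, count, info.replace("\n", ", "))
```

The third element of each tuple joins the clients with a line break, which corresponds to the CHAR(10) replacement in the formula.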

Try getting the unique values in the first and third columns then grouping by date using a query:
=query(unique({A:A,C:C}),"select Col1,count(Col2) where Col1 is not null group by Col1")

Related

Extract the highest key:value pair from a string in Standard SQL

I have the data shown below: key-value pairs such as 116=0.2875, which BigQuery has stored as a string. I need to extract the key (i.e. 116) from each row.
To make things more complicated, if a row has more than one key-value pair, the key to extract is the one with the highest number on the right. For example, for {1=0.1,2=0.8} the extracted number would be 2.
I am struggling to do this in SQL, particularly as some rows have one pair and some have several.
This is as close as I have managed to get: code that extracts the highest right-hand value (which I don't need). I can't seem to write something that returns either the whole key-value pair (which would work for me) or just the key (which would be ideal).
SELECT column
,(SELECT MAX(CAST(Values AS NUMERIC)) FROM UNNEST(JSON_EXTRACT_ARRAY(REPLACE(REPLACE(REPLACE(column,"{","["),"}","]"),"=",","))) AS Values WHERE Values LIKE "%.%") AS Highest
FROM `table`
Here is some sample data:
1 {99=0.25}
2 {99=0.25}
3 {99=0.25}
4 {116=0.2875, 119=0.6, 87=0.5142857142857143}
5 {105=0.308724832214765}
6 {105=0.308724832214765}
7 {139=0.5712754555198284}
8 {127=0.5767967894928858}
9 {134=0.2530120481927711, 129=0.29696599825632086, 73=0.2662459427947186}
10 {80=0.21242613001118038}
Any help on this conundrum would be greatly appreciated!
Consider the approach below:
select column,
  ( select cast(split(kv, '=')[offset(0)] as int64)
    from unnest(regexp_extract_all(column, r'(\d+=\d+\.\d+)')) kv
    order by cast(split(kv, '=')[offset(1)] as float64) desc
    limit 1
  ) key
from your_table
Applied to the sample data in your question, this returns the key paired with the highest value in each row, e.g. 119 for row 4 and 129 for row 9.
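To check the extraction logic outside BigQuery, here is a minimal Python sketch of the same idea: find every key=value pair, then keep the key whose value is largest. The helper name `highest_key` is hypothetical, and the pattern is slightly more permissive than the one in the query (it also accepts values without a decimal part), which is an assumption about what the data might contain.

```python
import re

# Sample strings copied from the question.
samples = [
    "{99=0.25}",
    "{116=0.2875, 119=0.6, 87=0.5142857142857143}",
    "{134=0.2530120481927711, 129=0.29696599825632086, 73=0.2662459427947186}",
]

# key=value, where the value is digits with an optional decimal part.
PAIR = re.compile(r"(\d+)=(\d+(?:\.\d+)?)")


def highest_key(text):
    """Return the key whose value is largest, or None if no pair is found."""
    pairs = [(int(k), float(v)) for k, v in PAIR.findall(text)]
    if not pairs:
        return None
    return max(pairs, key=lambda kv: kv[1])[0]


for s in samples:
    print(highest_key(s), s)
```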

Listing the top value of rows based on a certain value in google spreadsheets

I have the following Excel DPS tables, all listed below each other:
Column A Column B Column C
Parse Name DPS
61 Arlisk 991.7
46 Tritla 913.9
Parse Name DPS
79 Arlisk 1156.3
87 Lucija 1090.8
I have another name-table, which simply lists the names Arlisk, Tritla and Lucija.
Now I want to add another column to the name-table that shows the highest value found in column D of the other table of all rows that refer to the name of that row.
In other words the new table should list each name's highest DPS found across all the other tables.
I found the following, but the formula is wrong and I'm not familiar enough with it to fix it.
=ArrayFormula(MAX(IFERROR(INDEX($C$4:$C$999 ,SMALL(IF($B$4:$B$999=$B$4 ,ROW($B$4:$B$999)-ROW($B$4)+1),ROWS($B$4:$B4))),"")))
Could anyone give me some advice on the solution?
You want to get max DPS per name across all tables?
Given that those tables are in the same columns, then FILTER and MAX can give you the max DPS per name.
Formula:
=max(filter(C:C, B:B=E2))
You could also use the formula below to create a separate table. This will automatically list the unique names together with the max DPS based on the tables available in the range.
=query({B2:C}, "select Col1, max(Col2) where Col1 is not null and Col2 is not null group by Col1")
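The grouping that the QUERY formula performs can be sketched in Python to make the logic explicit. This is an illustration only: `max_dps_by_name` is a hypothetical helper, and the rows are the ones from the question, with the repeated header rows kept to show that non-numeric DPS cells are skipped (the role played by "Col2 is not null" in the formula).

```python
# Rows copied from the stacked tables in the question; the repeated
# "Parse / Name / DPS" header rows are kept to show that they get skipped.
rows = [
    ("Parse", "Name", "DPS"),
    (61, "Arlisk", 991.7),
    (46, "Tritla", 913.9),
    ("Parse", "Name", "DPS"),
    (79, "Arlisk", 1156.3),
    (87, "Lucija", 1090.8),
]


def max_dps_by_name(rows):
    """Return {name: highest DPS}, ignoring rows whose DPS is not numeric."""
    best = {}
    for _parse, name, dps in rows:
        if not isinstance(dps, (int, float)):  # header or blank row
            continue
        if name not in best or dps > best[name]:
            best[name] = dps
    return best


print(max_dps_by_name(rows))
```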

SAP HANA SQL - Concatenate multiple result rows for a single column into a single row

I am pulling data and when I pull in the text field my results for the "distinct ID" are sometimes being duplicated when there are multiple results for that ID. Is there a way to concatenate the results into a single column/row rather than having them duplicated?
It looks like there are ways in other SQL platforms but I have not been able to find something that works in HANA.
Example
SELECT DISTINCT ID
FROM Table1
If I pull only Distinct ID I get the following:
ID
1
2
3
4
However when I pull the following:
Example
SELECT DISTINCT ID, Text
FROM Table1
I get something like
ID | Text
1 | Dog
2 | Cat
2 | Dog
3 | Fish
4 | Bird
4 | Horse
I am trying to Concat the Text field when there is more than 1 row for each ID.
What I need the results to be (Having a "break" between results so that they are on separate lines would be even better but at least a "," would work):
ID | Text
1 | Dog
2 | Cat,Dog
3 | Fish
4 | Bird,Horse
I see Kiran has just referred to another valid answer in the comment, but in your example this would work.
SELECT ID, STRING_AGG(Text, ',')
FROM TABLE1
GROUP BY ID;
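You can replace the ',' with other characters. For a line break, a literal '\n' may not be interpreted as a newline in HANA SQL, so something like CHAR(10) as the delimiter is probably the safer choice.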
I would caution against concatenating rows in this way unless you know your data well. There is no effective limit on the number of rows, and therefore on the length of the string you generate, but HANA does limit string length, so keep that in mind.
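To make the expected result concrete, here is a small Python sketch of what the GROUP BY with STRING_AGG computes on the example rows. It is an illustration only: `concat_by_id` is a hypothetical helper, and it keeps the input order of the rows, whereas the order inside an aggregated string in SQL is not guaranteed unless you specify one.

```python
# Rows copied from the example in the question.
rows = [
    (1, "Dog"), (2, "Cat"), (2, "Dog"),
    (3, "Fish"), (4, "Bird"), (4, "Horse"),
]


def concat_by_id(rows, sep=","):
    """Join the Text values of each ID with `sep`, in input order."""
    grouped = {}
    for row_id, text in rows:
        grouped.setdefault(row_id, []).append(text)
    return {row_id: sep.join(texts) for row_id, texts in grouped.items()}


print(concat_by_id(rows))
print(concat_by_id(rows, sep="\n")[2])  # same grouping with a line break
```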

Get percentage of NULL for all columns in Hive

I would like to get the percentage of NULL values in a table in Hive. Is there an easy way to do this without having to enumerate all column names in the query? In this case there are about 50k rows and 20 columns. Thanks in advance!
Something like:
SELECT count(each_column) / count(*) FROM TABLE_1
WHERE each_column = NULL;
If you do this using code, you need to list the columns. Here is one way:
select avg(case when col1 is null then 1.0 else 0.0 end) as col1_null_p,
avg(case when col2 is null then 1.0 else 0.0 end) as col2_null_p,
. . .
from t;
If you take the list of columns in the table, you can readily construct the query in a spreadsheet.
The approach you need depends on the situation that you have:
For 20 fixed columns: Just type your query
For 200 fixed columns: Copy the column names to your favorite tool (Excel) and build the query there
For n columns that may not be fixed: Write a script to generate your code
I once wrote a Python script for this. I don't have it at hand now, but it is quite easy to recreate with the following logic:
Query the first 1 (or 0?) rows of the table to get all the headers.
Build the desired queries to generate column-based statistics (like the percentage of null values) and union the results.
Then execute the query.
Of course it can be expanded to run for different tables, and statistics, but do realize that this may not scale well.
In my case, I think I had to build the query in batches of 20 columns at a time and concatenate the results afterwards, because running it on 400 columns at once generated a query that was too complex.
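The script logic described above could look roughly like the following. This is a sketch under assumptions, not the original script: `build_null_pct_queries` is a hypothetical helper, the column list is passed in rather than read from Hive, and the table and column names are placeholders. It only builds the query text (in batches, as mentioned above); running it against Hive is left out.

```python
def build_null_pct_queries(table, columns, batch_size=20):
    """Return one SELECT per batch of columns, each computing null fractions."""
    queries = []
    for start in range(0, len(columns), batch_size):
        batch = columns[start:start + batch_size]
        # One avg(case ...) expression per column, as in the answer above.
        exprs = ",\n       ".join(
            f"avg(case when `{c}` is null then 1.0 else 0.0 end) as `{c}_null_p`"
            for c in batch
        )
        queries.append(f"select {exprs}\nfrom {table}")
    return queries


# Hypothetical table and column names, for illustration only.
for q in build_null_pct_queries("t", ["col1", "col2", "col3"], batch_size=2):
    print(q, end="\n\n")
```

Each generated query returns a single row, so the results of the batches can be stitched together afterwards.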

How can I "dynamically" split a varchar column by specific characters?

I have a column that stores 2 values. Example below:
| Column 1 |
|some title1 =ExtractThis ; Source Title12 = ExtractThis2|
I want to extract 'ExtractThis' into one column and 'ExtractThis2' into another column. I've tried using a substring, but it doesn't work because the data in Column 1 varies in length, so it doesn't always carve out my intended values. SQL below:
SELECT substring(d.Column1,13,24) FROM dbo.Table d
This returns 'Extract This' but for other rows it either takes too much or too little. Is there a function or combination of functions that will let me split consistently on a character? The delimiters are consistent in my column, unlike the lengths.
select substring(col1,CHARINDEX('=',col1)+1,CHARINDEX (';',col1)-CHARINDEX ('=',col1)-1) Val1,
substring(col1,CHARINDEX('=',col1,CHARINDEX (';',col1))+1,LEN(col1)) Val2
from #data
There are duplicated calculations here: the five CHARINDEX calls could be reduced to three.
But I would like to believe that SQL Server performs this simple optimization itself.
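The position arithmetic in the query can be checked with a short Python sketch that computes each delimiter position once (the three distinct lookups mentioned above). `split_values` is a hypothetical helper; unlike the SQL, it also trims the surrounding spaces, and it raises ValueError if a delimiter is missing.

```python
def split_values(text):
    """Return (val1, val2): the text between the first '=' and ';', and the
    text after the first '=' that follows ';'. Surrounding spaces are trimmed."""
    eq1 = text.index("=")          # CHARINDEX('=', col1)
    semi = text.index(";")         # CHARINDEX(';', col1)
    eq2 = text.index("=", semi)    # CHARINDEX('=', col1, CHARINDEX(';', col1))
    return text[eq1 + 1:semi].strip(), text[eq2 + 1:].strip()


print(split_values("some title1 =ExtractThis ; Source Title12 = ExtractThis2"))
```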