GROUP_CONCAT multiple fields in Vertica - sql

How can I do something like:
SELECT ID, Store,
GROUP_CONCAT(keyword::VARCHAR) OVER (PARTITION BY ID, Store ORDER BY num ASC) AS keywords,
GROUP_CONCAT(url::VARCHAR) OVER (PARTITION BY ID, Store ORDER BY num ASC) AS urls
FROM table_name
I get the following error when I run the above query:
cannot specify more than one user-defined transform function in the SELECT list
I tried the approach from "MySQL GROUP_CONCAT multiple fields", but that seems to be MySQL-specific. I also believe GROUP_CONCAT is no longer supported in Vertica 7.1.x, so if there is a better way to do this, I am open to it.

As the error states, you can only have one UDTF (user-defined transform function) in a single SELECT statement. To get around this, you can split the query into two subqueries and join them together.
SELECT x.ID, x.Store, x.keywords, y.urls
FROM (
SELECT
ID,
Store,
GROUP_CONCAT(keyword::VARCHAR) OVER (PARTITION BY ID, Store ORDER BY num ASC) AS keywords
FROM table_name
) x
JOIN (
SELECT
ID,
Store,
GROUP_CONCAT(url::VARCHAR) OVER (PARTITION BY ID, Store ORDER BY num ASC) AS urls
FROM table_name
) y
ON x.ID = y.ID
AND x.Store = y.Store
;
This will evaluate each query with its own GROUP_CONCAT function separately and then join them together.

You can find GROUP_CONCAT in the strings package on the Vertica GitHub. You should be able to just run make and make install if your vsql path is set up correctly.
Another alternative would be to use agg_concatenate, which is included in the examples directory. You'd have to finagle the SQL a little to get the ordering in the concatenation correct, though. You can see examples of how to do this in this Stack Overflow answer.

You will need to handle the transformation of keyword and url in a separate CTE and pass those through to GROUP_CONCAT:
With cte_table_name AS (
SELECT
ID
,Store
,keyword::VARCHAR AS keyword
,url::VARCHAR AS url
FROM table_name
)
SELECT
t.ID
,t.Store
,GROUP_CONCAT(c.keyword) OVER (PARTITION BY t.ID, t.Store ORDER BY num ASC) AS keywords
,GROUP_CONCAT(c.url) OVER (PARTITION BY t.ID, t.Store ORDER BY num ASC) AS urls
FROM
table_name t
JOIN
cte_table_name c
ON c.ID = t.ID
AND c.Store = t.Store

Related

Get minimum without using row number/window function in Bigquery

I have a table like the one shown below.
What I would like to do is get the minimum value for each subject. Although I am able to do this with the row_number function, I would like to do it with a GROUP BY and min() approach, but it doesn't work.
row_number approach - works fine
SELECT * FROM (select subject_id,value,id,min_time,max_time,time_1,
row_number() OVER (PARTITION BY subject_id ORDER BY value) AS rank
from table A) WHERE RANK = 1
min() approach - doesn't work
select subject_id,id,min_time,max_time,time_1,min(value) from table A
GROUP BY SUBJECT_ID,id
As you can see, just the two columns (subject_id and id) are enough to group the items together; they differentiate the groups. But why am I not able to use the other columns in the SELECT clause? If I use the other columns, I may not get the expected output because time_1 has different values.
I expect my output to be as shown below.
In BigQuery you can use aggregation for this:
SELECT ARRAY_AGG(a ORDER BY value LIMIT 1)[SAFE_OFFSET(0)].*
FROM table A
GROUP BY SUBJECT_ID;
This uses ARRAY_AGG() to aggregate each record (the a in the argument list). ARRAY_AGG() allows you to order the result (by value) and to limit the size of the array. The latter is important for performance.
After aggregating each group into an array, you want its first element. The .* expands the record referred to by a into its component columns.
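To see the pattern in isolation, here is a minimal sketch against inline data. The CTE named a and its three sample rows are made up for illustration; only the column names subject_id, id and value come from the question. Array offsets are zero-based, so the first element is at offset 0.

```sql
-- Hypothetical sample rows standing in for the real table.
WITH a AS (
  SELECT 1 AS subject_id, 'x' AS id, 10 AS value UNION ALL
  SELECT 1, 'y', 5 UNION ALL
  SELECT 2, 'z', 7
)
-- Per subject_id, keep the single row with the smallest value,
-- then expand that row back into its columns.
SELECT ARRAY_AGG(a ORDER BY value LIMIT 1)[SAFE_OFFSET(0)].*
FROM a
GROUP BY subject_id;
```

With these rows the query should return (1, 'y', 5) and (2, 'z', 7): one full row per subject, with no helper rank column left over.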
I'm not sure why you don't want to use ROW_NUMBER(). If the problem is the lingering rank column, you can easily remove it:
SELECT a.* EXCEPT (rank)
FROM (SELECT a.*,
ROW_NUMBER() OVER (PARTITION BY subject_id ORDER BY value) AS rank
FROM A
) a
WHERE RANK = 1;
Are you looking for something like the following?
SELECT
A.subject_id,
A.id,
A.min_time,
A.max_time,
A.time_1,
A.value
FROM table A
INNER JOIN(
SELECT subject_id, MIN(value) Value
FROM table
GROUP BY subject_id
) B ON A.subject_id = B.subject_id
AND A.Value = B.Value
If you do not need to select the Time_1 column's value, the following query will work (as far as I can see, the values in min_time and max_time are the same within each group):
SELECT
A.subject_id,A.id,A.min_time,A.max_time,
--A.time_1,
MIN(A.value)
FROM table A
GROUP BY
A.subject_id,A.id,A.min_time,A.max_time
Finally, the best approach, if it fits your data, is to apply something like CAST(Time_1 AS DATE) to your time column. This considers only the date part, regardless of the time part. The query would be:
SELECT
A.subject_id,A.id,A.min_time,A.max_time,
CAST(A.time_1 AS DATE) Time_1,
MIN(A.value)
FROM table A
GROUP BY
A.subject_id,A.id,A.min_time,A.max_time,
CAST(A.time_1 AS DATE)
-- Check that the CAST ... AS DATE syntax in BigQuery
-- matches what I have written here; it may differ slightly.
The query below is for BigQuery Standard SQL and is the most efficient way to handle cases like the one in your question:
#standardSQL
SELECT AS VALUE ARRAY_AGG(t ORDER BY value LIMIT 1)[OFFSET(0)]
FROM `project.dataset.table` t
GROUP BY subject_id
Using ROW_NUMBER is not efficient and in many cases leads to a "Resources exceeded" error.
Note: a self join is also a very inefficient way of achieving your objective.
A bit late to the party, but here is a cte-based approach which made sense to me:
with mins as (
select subject_id, id, min(value) as min_value
from table
group by subject_id, id
)
select distinct t.subject_id, t.id, t.time_1, t.min_time, t.max_time, m.min_value
from table t
join mins m on m.subject_id = t.subject_id and m.id = t.id

How can I make a distinct with multiple field

I have some duplicate mail addresses in my database, but I can't remove them.
I want to select some fields, but without duplicate mail addresses.
I have a query like this:
SELECT
DISTINCT MAIL,
ID,
CIVILITE,
PRENOM,
NAME
FROM CONTACT WHERE CODE_PAYS = 'DE'
When I run this query, the duplicate mail values are still there.
Do you know how I can do that?
Update: I have tried this approach, but I need to use it in a view:
ALTER VIEW ALL_VW_CONTACT_DE WITH SCHEMABINDING
AS
with cte as
(
select rn = row_number() over (partition by c.Mail Order By c.Id asc), c.Mail, c.Id, c.Civilite, c.Prenom, c.Name
from dbo.CONTACT c
where code_pays = 'DE'
)
select Mail, Id, Civilite, Prenom, Name
from cte
where rn = 1
But this doesn't work; I get this error:
Cannot schema bind view 'MY_TABLE' because name 'CONTACT' is invalid
for schema binding. Name must be in two-part format and an object
cannot reference itself
When I launch this request, my duplicate values on mail are already
here.
The reason is that DISTINCT doesn't work the way you think. It doesn't look only at the first column after the DISTINCT keyword; it compares all columns in the list. Only if all of them are equal is a row considered a duplicate.
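A small illustration of that behaviour, using two made-up rows (the values are hypothetical; the column names come from the question):

```sql
-- Hypothetical content of CONTACT:
--   MAIL            ID  NAME
--   anna@mail.de     1  Anna
--   anna@mail.de     2  Anna

-- Both rows come back: (MAIL, ID, NAME) differs because ID differs.
SELECT DISTINCT MAIL, ID, NAME
FROM CONTACT;

-- One row comes back: (MAIL, NAME) is identical for both rows.
SELECT DISTINCT MAIL, NAME
FROM CONTACT;
```

So as long as a per-row unique column such as ID stays in the select list, DISTINCT has nothing to collapse.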
One easy way is using ROW_NUMBER:
with cte as
(
select rn = row_number() over (partition by c.Mail Order By c.Id asc), c.*
from dbo.Contact c
where Code_Pays = 'DE'
)
select Mail, Id, Civilite, Prenom, Name
from cte
where rn = 1
Change the ORDER BY if you want to take a different record; here I take the one with the lowest ID.
You can use row_number as below:
Select top (1) with ties * from Contact
where CODE_PAYS = 'DE'
order by row_number() over(partition by mail order by id)
When you use DISTINCT with other fields, you get only the unique combinations of those fields.
In this case, you should exclude every field that varies from row to row (possibly ID) from the query:
SELECT
DISTINCT MAIL,
CIVILITE,
PRENOM,
NAME
FROM CONTACT WHERE CODE_PAYS = 'DE'
The problem here is probably the ID field. Since it should be unique for each row, no two rows are ever identical, so DISTINCT cannot collapse them. Remove it from the query and you should be fine.
When you do a distinct query, the trick is to look at the results and finding what columns are returning different values, that's what's differentiating them. If you add the results in your question we can help you further.

How to avoid order by in group by query result [duplicate]

I am trying to display the records in the same order as in the WHERE clause.
Example:
select name from table where name in ('Yaksha','Arun','Naveen');
It displays Arun, Naveen, Yaksha (alphabetical order).
I want to display them in the same order, i.e. 'Yaksha', 'Arun', 'Naveen'.
How can I do this?
I am using an Oracle database.
Add this ORDER BY at the query's end:
order by case name when 'Yaksha' then 1
when 'Arun' then 2
when 'Naveen' then 3
end
(There's no other way to get that order. You need an ORDER BY to get a specific result set order.)
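Since the question is about Oracle, the same ordering can also be written more compactly with DECODE, which maps each name to a sort position. This is only a sketch: my_table is a placeholder table name, and any name missing from the DECODE list would map to NULL (which Oracle sorts last in ascending order by default).

```sql
SELECT name
FROM my_table
WHERE name IN ('Yaksha', 'Arun', 'Naveen')
-- DECODE returns 1, 2 or 3 depending on which literal matches name.
ORDER BY DECODE(name, 'Yaksha', 1, 'Arun', 2, 'Naveen', 3);
```

The CASE form is standard SQL and more portable; DECODE is Oracle-specific but shorter.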
It may be a bit clunky, but you can create a custom ordering with a case expression:
SELECT *
FROM my_table
WHERE name IN ('Yaksha', 'Arun','Naveen')
ORDER BY CASE name WHEN 'Yaksha' THEN 1
WHEN 'Arun' THEN 2
WHEN 'Naveen' THEN 3
END ASC
A slightly longer option, but one that prevents duplication of the string literals is to use a subquery:
SELECT m.*
FROM my_table m
JOIN (SELECT 'Yaksha' AS name, 1 AS name_order FROM dual
UNION ALL
SELECT 'Arun' AS name, 2 AS name_order FROM dual
UNION ALL
SELECT 'Naveen' AS name, 3 AS name_order FROM dual) o
ON o.name = m.name
ORDER BY o.name_order ASC
You can try with something like the following:
SELECT *
FROM test
WHERE name IN ( 'Yaksha', 'Arun', 'Naveen' )
ORDER BY instr ( q'['Yaksha', 'Arun', 'Naveen']', name ) ASC
This way could be useful if your IN list is somehow dynamic.
If the list of values is dynamic or you just don't want to repeat the values you could use (or abuse, depending on your point of view) a table collection, and join your real table to a table collection expression instead of using IN:
select your_table.name
from table(sys.odcivarchar2list('Yaksha','Arun','Naveen')) t
join your_table on your_table.name = t.column_value;
Which will generally work, but of course without an order-by clause is not guaranteed to work, so you can use an inline view to assign the order:
select your_table.name from (
select row_number() over (order by null) as rn, column_value as name
from table(sys.odcivarchar2list('Yaksha','Arun','Naveen'))
) t
join your_table on your_table.name = t.name
order by t.rn;
This still relies on row_number() over (order by null) using the order of the elements in the collection; which relies on collection unnesting preserving the element order. I don't think that's guaranteed either, so there is still some risk involved.

Complex SQL pagination Query

I am doing pagination for my data using the solution to this question.
I need to be using this solution for a more complex query now. Ie. the SELECT inside the bracket has joins and aggregate functions.
This is that solution I'm using as a reference:
;WITH Results_CTE AS
(
SELECT
Col1, Col2, ...,
ROW_NUMBER() OVER (ORDER BY SortCol1, SortCol2, ...) AS RowNum
FROM Table
WHERE <whatever>
)
SELECT *
FROM Results_CTE
WHERE RowNum >= @Offset
AND RowNum < @Offset + @Limit
The query that I need to incorporate into the above solution:
SELECT users.indicator, COUNT(*) as 'queries' FROM queries
INNER JOIN calls ON queries.call_id = calls.id
INNER JOIN users ON calls.user_id = users.id
WHERE queries.isresolved=0 AND users.indicator='ind1'
GROUP BY users.indicator ORDER BY queries DESC
How can I achieve this? So far I've made it work by removing the ORDER BY queries DESC part and putting that in the line ROW_NUMBER() OVER (ORDER BY ...) AS RowNum, but when I do this it doesn't allow me to order by that column ("Invalid column name 'queries'.").
What do I need to do to get it to order by this column?
edit: using SQL Server 2008
Try ORDER BY COUNT(*) DESC. It works on MySQL; I'm not sure about SQL Server 2008.
I think queries is your alias for the COUNT(*) column, so use it like this:
SELECT users.indicator, COUNT(*) as 'queries' FROM queries
INNER JOIN calls ON queries.call_id = calls.id
INNER JOIN users ON calls.user_id = users.id
WHERE queries.isresolved=0 AND users.indicator='ind1'
GROUP BY users.indicator ORDER BY COUNT(*) DESC
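Putting the two pieces together, one possible sketch of the paginated version looks like this. A column alias such as queries cannot be referenced inside OVER (...) at the same query level, but the aggregate itself can, so ROW_NUMBER() is ordered by COUNT(*) directly. The variables @Offset and @Limit and their values are assumptions for illustration.

```sql
-- Assumed paging parameters: first row number to return and page size.
DECLARE @Offset INT, @Limit INT;
SET @Offset = 1;
SET @Limit = 10;

;WITH Results_CTE AS
(
    SELECT
        users.indicator,
        COUNT(*) AS queries,
        -- Window functions run after GROUP BY, so the aggregate is usable here.
        ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC) AS RowNum
    FROM queries
    INNER JOIN calls ON queries.call_id = calls.id
    INNER JOIN users ON calls.user_id = users.id
    WHERE queries.isresolved = 0 AND users.indicator = 'ind1'
    GROUP BY users.indicator
)
SELECT indicator, queries
FROM Results_CTE
WHERE RowNum >= @Offset
  AND RowNum < @Offset + @Limit
ORDER BY RowNum;
```

The outer ORDER BY RowNum is there because the CTE itself does not guarantee the order of the final result set.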

Optimize sql query with the rank function

This query gets the top item in each group using the ranking function.
I want to reduce the number of inner selects down to two instead of three. I tried using the rank() function in the innermost query, but couldn't get it working along with an aggregate function. Then I couldn't use a where clause on 'itemrank' without wrapping it in yet another select statement.
Any ideas?
select *
from (
select
tmp.*,
rank() over (partition by tmp.slot order by slot, itemcount desc) as itemrank
from (
select
i.name,
i.icon,
ci.slot,
count(i.itemid) as itemcount
from items i
inner join citems ci on ci.itemid = i.itemid
group by i.name, i.icon, ci.slot
) as tmp
) as popularitems
where itemrank = 1
EDIT: using sql server 2008
In Oracle and Teradata (and perhaps others too), you can use QUALIFY itemrank = 1 to get rid of the outer select. This is not part of the ANSI standard.
You can use Common Table Expressions in Oracle or in SQL Server.
Here is the syntax:
WITH expression_name [ ( column_name [,...n] ) ]
AS
( CTE_query_definition )
The list of column names is optional only if distinct names for all resulting columns are supplied in the query definition.
The statement to run the CTE is:
SELECT <column_list>
FROM expression_name;
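Applied to the query in the question, a sketch along these lines gets down to two levels. It relies on window functions being evaluated after GROUP BY, so RANK() can order by the aggregate COUNT(i.itemid) in the same query that computes it. The table and column names are taken from the question; the CTE name popularitems reuses the alias from the original query.

```sql
WITH popularitems AS
(
    SELECT
        i.name,
        i.icon,
        ci.slot,
        COUNT(i.itemid) AS itemcount,
        -- Rank items within each slot by how often they occur.
        RANK() OVER (PARTITION BY ci.slot ORDER BY COUNT(i.itemid) DESC) AS itemrank
    FROM items i
    INNER JOIN citems ci ON ci.itemid = i.itemid
    GROUP BY i.name, i.icon, ci.slot
)
SELECT *
FROM popularitems
WHERE itemrank = 1;
```

The filter on itemrank still needs an outer query (or CTE), because a window function cannot appear in the WHERE clause of the query that defines it.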