I am new to SQL and I am trying to write a recursive query over a single table to find the brokers of the "master brokers".
I have a table that looks like this (it can grow to any number of rows and any depth):
So I need a result like this:
master_id => broker_id
I looked into how to do it and came up with this:
WITH admin_has_master_brokers
AS (
SELECT DISTINCT master_broker_id, admin_id
FROM admin_has_master_brokers
UNION ALL
/*I DO NOT KNOW HOW TO DO THIS SECTION*/
SELECT
master_broker_id, admin_id
FROM admin_has_master_brokers
)
SELECT
*
FROM
admin_has_master_brokers
ORDER BY master_broker_id ASC
But I cannot work out how to write the recursive part so that I only get the results I need, because I am getting this:
Any idea?
Provided the original table is mytable, this query lists all descendants of every master_broker_id.
WITH RECURSIVE admin_has_master_brokers
AS (
SELECT DISTINCT master_broker_id master, master_broker_id, admin_id
FROM mytable
UNION ALL
SELECT a.master,
m.master_broker_id, m.admin_id
FROM admin_has_master_brokers a
JOIN mytable m ON m.master_broker_id = a.admin_id
)
SELECT DISTINCT master, admin_id
FROM
admin_has_master_brokers
ORDER BY master, admin_id
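To see what the recursive part does, here is a minimal sketch that runs the same query shape through Python's sqlite3 module. The sample rows are made up (the original table was not posted as text), the CTE is renamed to tree for readability, and the data is assumed to contain no cycles.

```python
import sqlite3

# Hypothetical sample rows: (master_broker_id, admin_id) means
# admin_id is a direct broker of master_broker_id.
ROWS = [(1, 2), (1, 3), (2, 4), (4, 5), (6, 7)]

QUERY = """
WITH RECURSIVE tree (master, master_broker_id, admin_id) AS (
    -- Anchor: every direct (master, broker) pair.
    SELECT DISTINCT master_broker_id, master_broker_id, admin_id
    FROM mytable
    UNION ALL
    -- Recursive step: brokers of an already reached broker
    -- are attributed to the original master.
    SELECT a.master, m.master_broker_id, m.admin_id
    FROM tree a
    JOIN mytable m ON m.master_broker_id = a.admin_id
)
SELECT DISTINCT master, admin_id
FROM tree
ORDER BY master, admin_id
"""


def descendants(rows):
    """Load rows into an in-memory table and return (master, descendant) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (master_broker_id INTEGER, admin_id INTEGER)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


pairs = descendants(ROWS)
print(pairs)
```

With these rows, master 1 picks up 4 and 5 through broker 2, even though neither is a direct broker of 1.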
I am trying to delete rows from a data set based on multiple criteria, but I am receiving a syntax error. Here is the current code:
With cte As (
Select *,
Row_Number() Over(Partition By ID, Numb1 Order by ID) as RowNumb
from DataSet
)
Delete from cte Where RowNumb > 1;
Where DataSet looks like this:
I want to delete all records in which the ID and the Numb1 are the same. So I would expect the code to delete all rows except:
I am not very experienced with Vertica but it seems like it is not very flexible about delete statements.
One way to do it would be to use a temporary table to store the rows that you want to keep, then truncate the original table, and insert back into it from the temp table:
create temporary table MyTempTable as
select id, numb1, state_coding
from (select t.*, count(*) over(partition by id, numb1) cnt from DataSet) as t
where cnt = 1;
truncate table DataSet;
insert into DataSet
select id, numb1, state_coding from MyTempTable;
Note that I used a window count instead of row_number. This removes every record for which at least one other record exists with the same id and numb1, which is what I understand you want from your sample data and expected results.
Important: make sure to back up your entire table before you do this!
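The keep-then-reload pattern can be tried on a throwaway database first. A minimal sketch using Python's sqlite3 module as a stand-in for Vertica, with assumptions: the rows are made up, SQLite has no TRUNCATE so DELETE FROM is used instead, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DataSet (id INTEGER, numb1 INTEGER, state_coding TEXT)")
# Hypothetical rows: (2, 30) appears twice, everything else once.
conn.executemany(
    "INSERT INTO DataSet VALUES (?, ?, ?)",
    [(1, 10, "D"), (1, 20, "AA"), (2, 30, "D"), (2, 30, "D"), (3, 40, "F")],
)

conn.executescript("""
-- Keep only rows whose (id, numb1) pair occurs exactly once.
CREATE TEMPORARY TABLE MyTempTable AS
SELECT id, numb1, state_coding
FROM (SELECT t.*, COUNT(*) OVER (PARTITION BY id, numb1) AS cnt
      FROM DataSet AS t)
WHERE cnt = 1;

-- SQLite stand-in for TRUNCATE TABLE.
DELETE FROM DataSet;

INSERT INTO DataSet
SELECT id, numb1, state_coding FROM MyTempTable;
""")

remaining = conn.execute(
    "SELECT id, numb1, state_coding FROM DataSet ORDER BY id, numb1"
).fetchall()
print(remaining)
conn.close()
```

Both copies of the (2, 30) row are gone, which is the difference from a row_number filter that would keep one of them.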
WITH Clauses in Vertica only support SELECT or INSERT, not DELETE/UPDATE.
Vertica Documentation
The cte is a temporary table. You cannot delete from it. It is effectively read-only.
If you are trying to delete duplicates out of the original DataSet table, you have to delete from the DataSet, not from the cte table.
Try this:
with cte as
(
select
ID,
Row_Number() Over(Partition By ID, Numb1 Order by ID) as RowNumb
from
DataSet
)
delete from DataSet where ID in (select ID from cte where RowNumb > 1)
You can't delete from a CTE. Use a plain DELETE statement instead and run it inside a transaction you can roll back, or, if you have the permissions, copy the table and test on the copy.
You'd have saved me ~5 min had you pasted the data as text rather than as a picture; I could not copy and paste it and had to retype it ...
Having said that:
Rebuild the table here:
DROP TABLE IF EXISTS input;
CREATE TABLE input(id,numb1,state_coding) AS (
SELECT 202003,4718868,'D'
UNION ALL SELECT 202003, 35756,'AA'
UNION ALL SELECT 204281, 146199,'D'
UNION ALL SELECT 204281, 146199,'D'
UNION ALL SELECT 204346, 108094,'D'
UNION ALL SELECT 204346, 108094,'D'
UNION ALL SELECT 204389, 14642,'DD'
UNION ALL SELECT 204389, 96504,'F'
UNION ALL SELECT 204392, 22010,'D'
UNION ALL SELECT 204392, 8051,'G'
UNION ALL SELECT 204400, 74118,'D'
UNION ALL SELECT 204400, 103900,'D'
UNION ALL SELECT 204406,1387304,'D'
UNION ALL SELECT 204406, 0,'HJ'
UNION ALL SELECT 204516, 894,'D'
UNION ALL SELECT 204516, 3927,'D'
UNION ALL SELECT 204586, 234235,'D'
UNION ALL SELECT 204586, 234235,'D'
)
;
And then:
Based on what was said in other responses, and keeping in mind that a mass delete of a large part of a table (not only in Vertica) is best implemented as an INSERT ... SELECT with the inverted WHERE condition, here goes:
CREATE TABLE input_help AS
SELECT * FROM input
GROUP BY id,numb1,state_coding
HAVING COUNT(*) = 1;
DROP TABLE input;
ALTER TABLE input_help RENAME TO input;
At least, it is that simple if the whole row is the same; I notice you don't put state_coding into the condition yourself. Otherwise, it gets slightly more complicated.
Or did you want to re-insert one row of each set of duplicates afterwards?
Then just build input_help as SELECT DISTINCT * FROM input, then drop, then rename.
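The two variants (drop every copy of a duplicated row, or keep one copy) differ only in the SELECT that builds input_help. A minimal sketch that runs both against the sample rows above, using Python's sqlite3 module as a stand-in for Vertica; the rebuild helper is a hypothetical name introduced here.

```python
import sqlite3

# The 18 sample rows from the rebuilt input table above.
ROWS = [
    (202003, 4718868, "D"), (202003, 35756, "AA"),
    (204281, 146199, "D"), (204281, 146199, "D"),
    (204346, 108094, "D"), (204346, 108094, "D"),
    (204389, 14642, "DD"), (204389, 96504, "F"),
    (204392, 22010, "D"), (204392, 8051, "G"),
    (204400, 74118, "D"), (204400, 103900, "D"),
    (204406, 1387304, "D"), (204406, 0, "HJ"),
    (204516, 894, "D"), (204516, 3927, "D"),
    (204586, 234235, "D"), (204586, 234235, "D"),
]


def rebuild(select_sql):
    """Build input_help from select_sql, swap it in for input, return the row count."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE input (id INTEGER, numb1 INTEGER, state_coding TEXT)")
    conn.executemany("INSERT INTO input VALUES (?, ?, ?)", ROWS)
    conn.executescript(f"""
        CREATE TABLE input_help AS {select_sql};
        DROP TABLE input;
        ALTER TABLE input_help RENAME TO input;
    """)
    count = conn.execute("SELECT COUNT(*) FROM input").fetchone()[0]
    conn.close()
    return count


# Variant 1: rows that have an identical twin disappear completely.
drop_all_copies = rebuild(
    "SELECT id, numb1, state_coding FROM input "
    "GROUP BY id, numb1, state_coding HAVING COUNT(*) = 1"
)
# Variant 2: one copy of each duplicated row survives.
keep_one_copy = rebuild("SELECT DISTINCT * FROM input")

print(drop_all_copies, keep_one_copy)
```

The sample has three fully duplicated pairs, so the first variant leaves 12 rows and the second leaves 15.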
I have this table called item:
| PERSON_id | ITEM_id |
|-----------|---------|
| CP2       | A03     |
| CP2       | A02     |
| HB3       | A02     |
| BW4       | A01     |
I need an SQL statement that would output the person with the most Items. Not really sure where to start either.
I advise you to use an inner query for this. The inner query contains the GROUP BY and ORDER BY clauses, and the outer query selects the first row, which is the person with the most items.
SELECT * FROM
(
SELECT PERSON_ID, COUNT(*) FROM TABLE1
GROUP BY PERSON_ID
ORDER BY 2 DESC
)
WHERE ROWNUM = 1
Here is the SQL Fiddle link: http://sqlfiddle.com/#!4/4c4228/5
Locating the maximum of an aggregated column requires more than a single calculation, so here you can use a "common table expression" (cte) to hold the result and then re-use that result in a where clause:
with cte as (
select
person_id
, count(item_id) count_items
from mytable
group by
person_id
)
select
*
from cte
where count_items = (select max(count_items) from cte)
Note: if more than one person shares the same maximum count, this query returns more than one row.
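The CTE approach, including the tie behaviour, can be checked with the four rows from the question. A minimal sketch using Python's sqlite3 module; the extra row inserted to force a tie is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (person_id TEXT, item_id TEXT)")
conn.executemany(
    "INSERT INTO item VALUES (?, ?)",
    [("CP2", "A03"), ("CP2", "A02"), ("HB3", "A02"), ("BW4", "A01")],
)

QUERY = """
WITH cte AS (
    SELECT person_id, COUNT(item_id) AS count_items
    FROM item
    GROUP BY person_id
)
SELECT person_id, count_items
FROM cte
WHERE count_items = (SELECT MAX(count_items) FROM cte)
ORDER BY person_id
"""

# With the question's data, CP2 is the only person with two items.
top = conn.execute(QUERY).fetchall()

# Hypothetical extra row: HB3 now also has two items, so both are returned.
conn.execute("INSERT INTO item VALUES ('HB3', 'A01')")
tied = conn.execute(QUERY).fetchall()

print(top, tied)
conn.close()
```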
Sorry if the title is not clear, I'm a beginner and I didn't know exactly how to formulate it...
I have this query working with Oracle :
SELECT
( SELECT COUNT(*)
FROM CATEGORY
) AS NBCATEGORIES,
( SELECT ROUND(AVG(FINANCIALOPERATIONBYPERSON),2)
FROM
(
SELECT SUM(AMOUNT) AS FINANCIALOPERATIONBYPERSON
FROM FINANCIALOPERATION
WHERE PERSONID IS NOT NULL
GROUP BY PERSONID
)
) AS AVERAGELOADAMOUNTBYPERSON
FROM DUAL
I'm looking for the equivalent for Sql Server...
The goal is to have multiple queries in a single query.
So I removed the "FROM DUAL" but I get an error on "FINANCIALOPERATIONBYPERSON" (Invalid column name), certainly because it's defined in the subquery...
How can I modify the query for SQL-Server ?
SQL Server requires aliases for subqueries. So, you can rewrite this as:
SELECT (SELECT COUNT(*)
FROM CATEGORY
) AS NBCATEGORIES,
(SELECT ROUND(AVG(FINANCIALOPERATIONBYPERSON),2)
FROM (SELECT SUM(AMOUNT) AS FINANCIALOPERATIONBYPERSON
FROM FINANCIALOPERATION
WHERE PERSONID IS NOT NULL
GROUP BY PERSONID
) t
) AS AVERAGELOADAMOUNTBYPERSON;
In both databases, though, I would be inclined to write this as:
SELECT c.NBCATEGORIES, ROUND(fo.AVERAGELOADAMOUNTBYPERSON, 2) AS AVERAGELOADAMOUNTBYPERSON
FROM (SELECT COUNT(*) as NBCATEGORIES
FROM CATEGORY c
) c CROSS JOIN
(SELECT SUM(AMOUNT) / COUNT(DISTINCT PERSONID) AS AVERAGELOADAMOUNTBYPERSON
FROM FINANCIALOPERATION fo
WHERE PERSONID IS NOT NULL
) fo;
One note for both these forms: SQL Server does integer arithmetic on integers. So, if AMOUNT is an integer, then you should convert it to an appropriate floating or fixed point numeric type.
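The integer-arithmetic pitfall is easy to reproduce. A minimal sketch using Python's sqlite3 module as a stand-in, on the assumption that it mirrors the point well enough: SQLite also truncates when both operands of a division are integers. The rows are made up, and in SQL Server you would cast to whichever decimal or floating type suits the data rather than REAL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FINANCIALOPERATION (PERSONID INTEGER, AMOUNT INTEGER)")
# Hypothetical rows; the NULL person is excluded by the WHERE clause.
conn.executemany(
    "INSERT INTO FINANCIALOPERATION VALUES (?, ?)",
    [(1, 10), (1, 5), (2, 6), (3, 4), (None, 100)],
)

truncated, exact = conn.execute("""
    SELECT SUM(AMOUNT) / COUNT(DISTINCT PERSONID),               -- integer / integer
           CAST(SUM(AMOUNT) AS REAL) / COUNT(DISTINCT PERSONID)  -- cast first
    FROM FINANCIALOPERATION
    WHERE PERSONID IS NOT NULL
""").fetchone()

# 25 / 3 persons: the first column loses the fractional part.
print(truncated, round(exact, 2))
conn.close()
```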
You need to add a table alias for the subquery.
SELECT
( SELECT COUNT(*)
FROM CATEGORY
) AS NBCATEGORIES,
( SELECT ROUND(AVG(RESULTS.FINANCIALOPERATIONBYPERSON),2)
FROM
(
SELECT SUM(AMOUNT) AS FINANCIALOPERATIONBYPERSON
FROM FINANCIALOPERATION
WHERE PERSONID IS NOT NULL
GROUP BY PERSONID
) RESULTS
) AS AVERAGELOADAMOUNTBYPERSON
I have two tables with data. Both tables have a CUSTOMER_ID column (which is numeric). I am trying to get a list of all the unique values for CUSTOMER_ID and know whether or not the CUSTOMER_ID exists in both tables or just one (and which one).
I can easily get a list of the unique CUSTOMER_ID:
SELECT tblOne.CUSTOMER_ID
FROM tblOne
UNION
SELECT tblTwo.CUSTOMER_ID
FROM tblTwo
I can't just add an identifier column to the SELECT statement (like: SELECT tblOne.CUSTOMER_ID, "Table1" AS DataSource) because then the records wouldn't be unique and I would get both sets of data.
I feel I need to add it somewhere else in this query but am not sure how.
Edit for clarity:
For the union query output I need an additional column that can tell me if the unique value I am seeing exists in: (1) both tables, (2) table one, or (3) table two.
If the CUSTOMER_ID appears in both tables then we'll have to arbitrarily pick which table to call the source. The following query uses "tblOne" as the [SourceTable] in that case:
SELECT
CUSTOMER_ID,
MIN(Source) AS SourceTable,
COUNT(*) AS TableCount
FROM
(
SELECT DISTINCT
CUSTOMER_ID,
"tblOne" AS Source
FROM tblOne
UNION ALL
SELECT DISTINCT
CUSTOMER_ID,
"tblTwo" AS Source
FROM tblTwo
)
GROUP BY CUSTOMER_ID
Gord Thompson's answer is correct. But, it is not necessary to do a distinct in the subqueries. And, you can return a single column with the information you are looking for:
select customer_id,
iif(min(which) = max(which), min(which), "both") as DataSource
from (select customer_id, "tblone" as which
from tblOne
UNION ALL
select customer_id, "tbltwo" as which
from tblTwo
) t
group by customer_id
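The min/max trick can be verified on a few rows. A minimal sketch using Python's sqlite3 module rather than Access, so it uses CASE and single-quoted strings in place of iif and double quotes; the customer IDs are made up, and tblOne deliberately contains a repeated ID to show that DISTINCT is not needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblOne (customer_id INTEGER)")
conn.execute("CREATE TABLE tblTwo (customer_id INTEGER)")
# Hypothetical data: 1 only in tblOne, 2 in both (twice in tblOne), 3 only in tblTwo.
conn.executemany("INSERT INTO tblOne VALUES (?)", [(1,), (2,), (2,)])
conn.executemany("INSERT INTO tblTwo VALUES (?)", [(2,), (3,)])

sources = conn.execute("""
    SELECT customer_id,
           -- If the smallest and largest label agree, only one table contributed.
           CASE WHEN MIN(which) = MAX(which) THEN MIN(which) ELSE 'both' END AS DataSource
    FROM (SELECT customer_id, 'tblOne' AS which FROM tblOne
          UNION ALL
          SELECT customer_id, 'tblTwo' AS which FROM tblTwo) t
    GROUP BY customer_id
    ORDER BY customer_id
""").fetchall()

print(sources)
conn.close()
```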
We could add an identifier column with the integer data type and then do an outer query:
SELECT
CUSTOMER_ID,
sum(TableFlag)
FROM
(
SELECT
DISTINCT CUSTOMER_ID,
1 AS TableFlag
FROM tblOne
UNION
SELECT
DISTINCT CUSTOMER_ID,
2 AS TableFlag
FROM tblTwo
)
GROUP BY CUSTOMER_ID
So if the sum is 1, the ID comes from tblOne; if it is 2, it comes from tblTwo; and if it is 3, it exists in both.
If you want to add a third table to the union, give it a value of 4 so that each combination still has a unique sum.
I am trying to list all the duplicate records in a table. This table does not have a primary key and was created specifically for a report that lists duplicates. It contains both unique and duplicate values.
The query I have so far is:
SELECT [OfficeCD]
,[NewID]
,[Year]
,[Type]
FROM [Test].[dbo].[Duplicates]
GROUP BY [OfficeCD]
,[NewID]
,[Year]
,[Type]
HAVING COUNT(*) > 1
This works right and gives me all the duplicates - that is the number of times it occurs.
But I want to display all the values in my report of all the columns. How can I do that without querying for each record separately?
For example:
Each table has 10 fields and [NewID] is the field which is occurring multiple times. I need to create a report with all the data in all the fields where NewID has been duplicated.
Please help.
Thank you.
You need a subquery:
SELECT * FROM yourtable
WHERE NewID IN (
SELECT NewID FROM yourtable
GROUP BY OfficeCD,NewID,Year,Type
HAVING Count(*)>1
)
Additionally, you might want to check your tags: you tagged mysql, but the syntax makes me think you mean sql-server.
Try this:
SELECT * FROM [Duplicates] WHERE NewID IN
(
SELECT [NewID] FROM [Duplicates] GROUP BY [NewID] HAVING COUNT(*) > 1
)
select d.*
from Duplicates d
inner join (
select NewID
from Duplicates
group by NewID
having COUNT(*) > 1
) dd on d.NewID = dd.NewID
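The answers above group on different column sets, which changes which rows count as duplicates. A minimal sketch using Python's sqlite3 module to compare them; the rows are made up and the table is cut down to the four columns from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Duplicates (OfficeCD TEXT, NewID INTEGER, Year INTEGER, Type TEXT)")
# Hypothetical rows: NewID 1 repeats with different other columns,
# NewID 3 repeats as a fully identical row, NewID 2 is unique.
conn.executemany(
    "INSERT INTO Duplicates VALUES (?, ?, ?, ?)",
    [("A", 1, 2020, "X"), ("B", 1, 2021, "Y"), ("A", 2, 2020, "X"),
     ("C", 3, 2019, "Z"), ("C", 3, 2019, "Z")],
)

# Join against NewIDs that occur more than once: full rows for NewID 1 and 3.
by_newid = conn.execute("""
    SELECT d.*
    FROM Duplicates d
    INNER JOIN (SELECT NewID FROM Duplicates
                GROUP BY NewID HAVING COUNT(*) > 1) dd
        ON d.NewID = dd.NewID
    ORDER BY d.NewID, d.OfficeCD
""").fetchall()

# Grouping on all four columns only flags rows that match on every column.
by_all_columns = conn.execute("""
    SELECT * FROM Duplicates
    WHERE NewID IN (SELECT NewID FROM Duplicates
                    GROUP BY OfficeCD, NewID, Year, Type HAVING COUNT(*) > 1)
    ORDER BY NewID, OfficeCD
""").fetchall()

print(by_newid)
print(by_all_columns)
conn.close()
```

Which grouping is right depends on whether a duplicate means a repeated NewID or a repeated combination of all four columns.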