Redshift: Cross join make data disappear

Redshift: Cross join make data disappear - sql

I have a weird issue in Redshift with a crossjoin.
I am generating days and want to join them with some ids.
The sample query is this:
with ids as (
Select number as id
from models.number_10000
limit 10
),
day as (
SELECT
TO_CHAR(DATEADD(day,num.number,CAST(DATEADD(day,-463,GETDATE()) AS DATE)),'YYYY-MM-DD') as date_string
FROM
(Select * from models.number_10000 limit 463)
as num
)
SELECT
id,date_string
from ids,day
Everything is working fine so far.
However, if I group by then I have no results.
with ids as (
Select number as id
from models.number_10000
limit 10
),
day as (
SELECT
TO_CHAR(DATEADD(day,num.number,CAST(DATEADD(day,-463,GETDATE()) AS DATE)),'YYYY-MM-DD') as date_string
FROM
(Select * from models.number_10000 limit 463)
as num
)
SELECT
id,date_String
from ids,day
group by 1,2
How is this happening? I have never faced something similar. I guess it's something with the cross join and the group by but it seems very strange.
Any thoughts?

I'd start with the following:
State your JOIN explicitly (ANSI92 Style).
State the names of items you want to be grouped explicitly.
Moreover - remove your GROUP BY statement (as you do not have any aggregate functions) and have a DISTINCT clause in your select statement.

Related

Is there any optimal way to find the count of rows

I wrote SQL query in which I have one inner query and one outer query, My outer query produces the result on behalf of inner query, now I need to find the no of rows returning by my outer query, so what I did, I enclosed it inside another select statement and use count() function which produces the result, but i need to know more precise way to calculate the row count, please see my below query and suggest me the best way to do the same.
SELECT count(*) FROM (
SELECT
COUNT(*) NO_OF_EMP
,SUM(tbl.AMOUNT) TOTAL_AMOUNT
,tbl.YYYYMM
,tbl.DATA_PICKED_BY_NAME
,MIN(DATA_PICKED_DATE) DATA_PICKED_DATE
,ROW_NUMBER() OVER (ORDER BY tbl.REFERENCE_ID) AS ROW_NUM
FROM (
SELECT
SALARY_REPORT_ID
,EMP_NAME
,EMP_CODE
,PAY_CODE
,PAY_CODE_NAME
,AMOUNT
,PAY_MODE
,PAY_CODE_DESC
,YYYYMM
,REMARK
,EMP_ID
,PRAN_NUMBER
,PF_NUMBER
,PRAN_NO
,ATTOFF_EMPCODE
,DATA_PICKED_DATE
,DATA_PICKED_BY
,DATA_PICKED_BY_NAME
,SUBSTR(REFERENCE_ID,0,3) REFERENCE_ID
FROM SALARY_DETAIL_REPORT_HISTORY
WHERE PAY_CODE=999
AND REFERENCE_ID LIKE '202%'
) tbl
GROUP BY tbl.REFERENCE_ID,tbl.YYYYMM,tbl.DATA_PICKED_BY_NAME
order by tbl.YYYYMM
)mytbl1

Select count distinct of the most abbreviated version of a single value of your group values from your original query:
SELECT count(distinct SUBSTR(REFERENCE_ID,0,3) || YYYYMM || DATA_PICKED_BY_NAME)
FROM SALARY_DETAIL_REPORT_HISTORY
WHERE PAY_CODE=999
AND REFERENCE_ID LIKE '202%'

SQL Server : UNION ALL but remove duplicate IDs by choosing first date of occurrence

I am unioning two queries but I'm getting an ID that occurs in each query. I do not know how to keep only the first time the id occurs. Everything else about the row is different. In general, it will be hard to know which of the two queries I will have to keep a duplicate on, therefore, I need a general solution.
I was thinking about creating a temp table and choosing the min date (once the date has been converted to an int).
Any ideas on the proper syntax?

You can do this using the row_number() function. This will assign a sequential number, starting with 1, to each row with the same id (based on the partition by clause). The ordering of the sequence is determined by the order by clause. So, the following assigns 1 to the earliest date for each id:
select t.*
from (select t.*,
row_number() over (partition by id order by date asc) as seqnum
from ((select *
from <subquery1>
) union all
(select *
from <subquery2>
)
) t
) t
where seqnum = 1;
The final where clause simply filters for the first occurrence.

If you use the keyword UNION, then it will remove duplicates from the two data sets you are working with. UNION ALL preserves duplicates.
You can view the specifics here:
http://www.w3schools.com/sql/sql_union.asp

If you want to only have one of the 2 records and they are not identical you will have to filter them yourself. You may need to do something like the following. THis may be possible to do with the one (select union select) block but this should get you started.
select *
from (
select id
, date
, otherstuf
from table_1
union all
select id
, date
, otherstuf
from table_2
) x1
, (
select id
, date
, otherstuf
from table_1
union all
select id
, date
, otherstuf
from table_2
) x2
where x1.id = x2.id
and x1.date < x2.date
Although rethinking this if you go down a path like this why bother to UNION it?

Compare SQL groups against eachother

How can one filter a grouped resultset for only those groups that meet some criterion compared against the other groups? For example, only those groups that have the maximum number of constituent records?
I had thought that a subquery as follows should do the trick:
SELECT * FROM (
SELECT *, COUNT(*) AS Records
FROM T
GROUP BY X
) t HAVING Records = MAX(Records);
However the addition of the final HAVING clause results in an empty recordset... what's going on?

In MySQL (Which I assume you are using since you have posted SELECT *, COUNT(*) FROM T GROUP BY X Which would fail in all RDBMS that I know of). You can use:
SELECT T.*
FROM T
INNER JOIN
( SELECT X, COUNT(*) AS Records
FROM T
GROUP BY X
ORDER BY Records DESC
LIMIT 1
) T2
ON T2.X = T.X
This has been tested in MySQL and removes the implicit grouping/aggregation.
If you can use windowed functions and one of TOP/LIMIT with Ties or Common Table expressions it becomes even shorter:
Windowed function + CTE: (MS SQL-Server & PostgreSQL Tested)
WITH CTE AS
( SELECT *, COUNT(*) OVER(PARTITION BY X) AS Records
FROM T
)
SELECT *
FROM CTE
WHERE Records = (SELECT MAX(Records) FROM CTE)
Windowed Function with TOP (MS SQL-Server Tested)
SELECT TOP 1 WITH TIES *
FROM ( SELECT *, COUNT(*) OVER(PARTITION BY X) [Records]
FROM T
)
ORDER BY Records DESC
Lastly, I have never used oracle so apolgies for not adding a solution that works on oracle...
EDIT
My Solution for MySQL did not take into account ties, and my suggestion for a solution to this kind of steps on the toes of what you have said you want to avoid (duplicate subqueries) so I am not sure I can help after all, however just in case it is preferable here is a version that will work as required on your fiddle:
SELECT T.*
FROM T
INNER JOIN
( SELECT X
FROM T
GROUP BY X
HAVING COUNT(*) =
( SELECT COUNT(*) AS Records
FROM T
GROUP BY X
ORDER BY Records DESC
LIMIT 1
)
) T2
ON T2.X = T.X

For the exact question you give, one way to look at it is that you want the group of records where there is no other group that has more records. So if you say
SELECT taxid, COUNT(*) as howMany
GROUP by taxid
You get all counties and their counts
Then you can treat that expressions as a table by making it a subquery, and give it an alias. Below I assign two "copies" of the query the names X and Y and ask for taxids that don't have any more in one table. If there are two with the same number I'd get two or more. Different databases have proprietary syntax, notably TOP and LIMIT, that make this kind of query simpler, easier to understand.
SELECT taxid FROM
(select taxid, count(*) as HowMany from flats
GROUP by taxid) as X
WHERE NOT EXISTS
(
SELECT * from
(
SELECT taxid, count(*) as HowMany FROM
flats
GROUP by taxid
) AS Y
WHERE Y.howmany > X.howmany
)

Try this:
SELECT * FROM (
SELECT *, MAX(Records) as max_records FROM (
SELECT *, COUNT(*) AS Records
FROM T
GROUP BY X
) t
) WHERE Records = max_records
I'm sorry that I can't test the validity of this query right now.

SELECT , COUNT() in SQLite

If i perform a standard query in SQLite:
SELECT * FROM my_table
I get all records in my table as expected. If i perform following query:
SELECT *, 1 FROM my_table
I get all records as expected with rightmost column holding '1' in all records. But if i perform the query:
SELECT *, COUNT(*) FROM my_table
I get only ONE row (with rightmost column is a correct count).
Why is such results? I'm not very good in SQL, maybe such behavior is expected? It seems very strange and unlogical to me :(.

SELECT *, COUNT(*) FROM my_table is not what you want, and it's not really valid SQL, you have to group by all the columns that's not an aggregate.
You'd want something like
SELECT somecolumn,someothercolumn, COUNT(*)
FROM my_table
GROUP BY somecolumn,someothercolumn

If you want to count the number of records in your table, simply run:
SELECT COUNT(*) FROM your_table;

count(*) is an aggregate function. Aggregate functions need to be grouped for a meaningful results. You can read: count columns group by

If what you want is the total number of records in the table appended to each row you can do something like
SELECT *
FROM my_table
CROSS JOIN (SELECT COUNT(*) AS COUNT_OF_RECS_IN_MY_TABLE
FROM MY_TABLE)

adding count( ) column on each row

I'm not sure if this is even a good question or not.
I have a complex query with lot's of unions that searches multiple tables for a certain keyword (user input). All tables in which there is searched are related to the table book.
There is paging on the resultset using LIMIT, so there's always a maximum of 10 results that get withdrawn.
I want an extra column in the resultset displaying the total amount of results found however. I do not want to do this using a separate query. Is it possible to add a count() column to the resultset that counts every result found?
the output would look like this:
ID Title Author Count(...)
1 book_1 auth_1 23
2 book_2 auth_2 23
4 book_4 auth_.. 23
...
Thanks!

This won't add the count to each row, but one way to get the total count without running a second query is to run your first query using the SQL_CALC_FOUND_ROWS option and then select FOUND_ROWS(). This is sometimes useful if you want to know how many total results there are so you can calculate the page count.
Example:
select SQL_CALC_FOUND_ROWS ID, Title, Author
from yourtable
limit 0, 10;
SELECT FOUND_ROWS();
From the manual:
http://dev.mysql.com/doc/refman/5.1/en/information-functions.html#function_found-rows

The usual way of counting in a query is to group on the fields that are returned:
select ID, Title, Author, count(*) as Cnt
from ...
group by ID, Title, Author
order by Title
limit 1, 10
The Cnt column will contain the number of records in each group, i.e. for each title.

Regarding second query:
select tbl.id, tbl.title, tbl.author, x.cnt
from tbl
cross join (select count(*) as cnt from tbl) as x
If you will not join to other table(s):
select tbl.id, tbl.title, tbl.author, x.cnt
from tbl, (select count(*) as cnt from tbl) as x

My Solution:
SELECT COUNT(1) over(partition BY text) totalRecordNumber
FROM (SELECT 'a' text, id_consult_req
FROM consult_req cr);

If your problem is simply the speed/cost of doing a second (complex) query I would suggest you simply select the resultset into a hash-table and then count the rows from there while returning, or even more efficiently use the rowcount of the previous resultset, then you do not even have to recount

This will add the total count on each row:
select count(*) over (order by (select 1)) as Cnt,*
from yourtable

Here is your answare:
SELECT *, #cnt count_rows FROM (
SELECT *, (#cnt := #cnt + 1) row_number FROM your_table
CROSS JOIN (SELECT #cnt := 0 AS variable) t
) t;

You simply cannot do this, you'll have to use a second query.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Redshift: Cross join make data disappear - sql

I'd start with the following: State your JOIN explicitly (ANSI92 Style). State the names of items you want to be grouped explicitly. Moreover - remove your GROUP BY statement (as you do not have any aggregate functions) and have a DISTINCT clause in your select statement.

Related

Is there any optimal way to find the count of rows

SQL Server : UNION ALL but remove duplicate IDs by choosing first date of occurrence

Compare SQL groups against eachother

SELECT , COUNT() in SQLite

adding count( ) column on each row

Categories

Resources

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Redshift: Cross join make data disappear - sql

I'd start with the following: State your JOIN explicitly (ANSI92 Style). State the names of items you want to be grouped explicitly. Moreover - remove your GROUP BY statement (as you do not have any aggregate functions) and have a DISTINCT clause in your select statement.

Related

Is there any optimal way to find the count of rows

SQL Server : UNION ALL but remove duplicate IDs by choosing first date of occurrence

Compare SQL groups against eachother

SELECT *, COUNT(*) in SQLite

adding count( ) column on each row

Categories

Resources

SELECT , COUNT() in SQLite