I am learning SQL and have run into a problem: I can't find a way to add the results of two select statements together.
I tried union and the sum() function, and I also looked for a similar question here, without success.
select *
from (select 6,4,2,4,7,2,7
from dual
union
select 3,8,9,2,7,4,5
from dual)
I tried this, but it shows me two rows with the numbers from the code.
I want the two rows added together into one single row, like:
9,12,11,6,14,6,12
You must alias the columns of the 1st query, and use sum() to aggregate on each of the columns:
select
sum(col1) sum1, sum(col2) sum2, sum(col3) sum3, sum(col4) sum4, sum(col5) sum5, sum(col6) sum6, sum(col7) sum7
from (
select 6 col1, 4 col2, 2 col3, 4 col4, 7 col5, 2 col6, 7 col7 from dual
union
select 3, 8, 9, 2, 7, 4, 5 from dual
)
Results:
SUM1 | SUM2 | SUM3 | SUM4 | SUM5 | SUM6 | SUM7
---: | ---: | ---: | ---: | ---: | ---: | ---:
9 | 12 | 11 | 6 | 14 | 6 | 12
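For what it's worth, a minimal alternative sketch (assuming Oracle, since the question selects from dual): the two rows can also be added column by column with a cross join instead of aggregation:
select a.col1 + b.col1, a.col2 + b.col2, a.col3 + b.col3, a.col4 + b.col4,
       a.col5 + b.col5, a.col6 + b.col6, a.col7 + b.col7
from (select 6 col1, 4 col2, 2 col3, 4 col4, 7 col5, 2 col6, 7 col7 from dual) a
cross join
     (select 3 col1, 8 col2, 9 col3, 2 col4, 7 col5, 4 col6, 5 col7 from dual) b
-- returns 9, 12, 11, 6, 14, 6, 12 in a single row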
First of all, UNION does not add values; it combines rows, and only when the queries being unioned have the same number of columns with the same types.
Your query doesn't fit that model: adding numeric literals like this is bad practice and doesn't make logical sense:
1. Select 6+3,....,.. from table
2. Select col1+col2 from table
where col1 in (6,4,2,4,7,2,7) and col2 in
(3,8,9,2,7,4,5)
As you can see, the second query makes sense, but the first one does not.
Let's look at the following table:
| col1 | col2 |
| -------- | -------------- |
| 1 | NULL |
| 23 | c |
| 73 | NULL |
| 43 | a |
| 3 | d |
Suppose you wanted to sort it like this:
| col1 | col2 |
| -------- | -------------- |
| 1 | NULL |
| 73 | NULL |
| 43 | a |
| 23 | c |
| 3 | d |
With the following code this would be almost trivial:
SELECT *
FROM dbo.table1
ORDER BY col2;
However, sorting it in the following, non-standard way isn't as easy:
| col1 | col2 |
| -------- | -------------- |
| 43 | a |
| 23 | c |
| 3 | d |
| 1 | NULL |
| 73 | NULL |
I managed it with the following code:
SELECT *
FROM dbo.table1
ORDER BY CASE WHEN col2 IS NULL THEN 1 ELSE 0 END, col2;
Can you explain to me 1) why and 2) how this query works? What bugs me is that the CASE expression returns either 1 or 0, which suggests that either ORDER BY 1, col2 or ORDER BY 0, col2 would be executed. But the following code gives me an error:
SELECT *
FROM dbo.table1
ORDER BY 0, col2;
Yet, the overall statement works. Why?
How does this work?
ORDER BY (CASE WHEN col2 IS NULL THEN 1 ELSE 0 END),
col2;
Well, it works exactly as the code specifies. The first key for the ORDER BY takes on the values of 1 and 0 based on col2. The 1 is only when the value is NULL. Because 1 > 0, these are sorted after the non-NULL values. So, all non-NULL values are first and then all NULL values.
How are the non-NULL values sorted? That is where the second key comes in. They are ordered by col2.
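For what it's worth, databases that implement the standard NULLS LAST clause (for example Oracle and PostgreSQL) can express the same ordering directly; SQL Server does not support that clause, which is why the CASE sort key is needed there. A minimal sketch:
-- Oracle / PostgreSQL only; not valid in SQL Server
SELECT *
FROM table1
ORDER BY col2 NULLS LAST;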
Starting with this sample data:
--==== Sample Data
DECLARE #t TABLE (col1 INT, col2 VARCHAR(10))
INSERT #t(col1,col2) VALUES (1,NULL),(23,'c'),(73,NULL),(43,'a'),(3 ,'d');
Now note these three queries that do the exact same thing.
--==== QUERY1: Note the derived query
SELECT t.col1, t.col2
FROM
(
SELECT t.col1, t.col2, SortBy = CASE WHEN col2 IS NULL THEN 1 ELSE 0 END
FROM #t AS t
) AS t
ORDER BY t.SortBy;
--==== QUERY2: This does the same thing but with less code
SELECT t.col1, t.col2, SortBy = CASE WHEN col2 IS NULL THEN 1 ELSE 0 END
FROM #t AS t
ORDER BY SortBy;
--==== QUERY3: This is QUERY2 simplified
SELECT t.col1, t.col2
FROM #t AS t
ORDER BY CASE WHEN col2 IS NULL THEN 1 ELSE 0 END;
Note that you can shorten the CASE expression with IIF. (A simple CASE of the form CASE col2 WHEN NULL THEN 1 ELSE 0 END would not work here, because NULL never compares equal to NULL, so that form always returns 0.)
--==== Simplified sort key using IIF
SELECT t.col1, t.col2
FROM #t AS t
ORDER BY IIF(col2 IS NULL,1,0);
Try this:
DECLARE #Table TABLE (col1 int, col2 char(1))
INSERT INTO #Table
VALUES
( 1 , NULL)
, ( 23, 'c' )
, ( 73, NULL)
, ( 43, 'a' )
, ( 3 , 'd' )
;
SELECT *
FROM #Table
ORDER BY ISNULL(col2, CHAR(255))
Common table expressions can be a big help, both for clarifying an issue and for solving it. If you move the CASE expression up into the CTE and then use it to sort, this answers both why and how it works.
With Qry1 (
SELECT col1,
col2,
CASE WHEN col2 IS NULL THEN 1 ELSE 0 END As SortKey
FROM dbo.table1
)
SELECT *
FROM Qry1
ORDER BY SortKey, col2;
This is the description of the ORDER BY clause in Oracle Database SQL:
ORDER [ SIBLINGS ] BY
{ expr | position | c_alias }
[ ASC | DESC ]
[ NULLS FIRST | NULLS LAST ]
[, { expr | position | c_alias }
[ ASC | DESC ]
[ NULLS FIRST | NULLS LAST ]
]...
We can see that position and expr are depicted as separate paths in the syntax diagram. From that we can conclude that the 0 and 1 here are not categorized as a position, because the CASE expression is an expr, not a position, even though it evaluates to a number that could otherwise be read as a position value.
I think this view applies to T-SQL as well.
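As a small illustration of the difference (a sketch reusing the table from the question):
-- Position: the bare integer 2 refers to the 2nd item in the select list
SELECT col1, col2
FROM dbo.table1
ORDER BY 2;

-- Expression: the CASE result is only a sort-key value, not a column position,
-- even though it happens to evaluate to 0 or 1
SELECT col1, col2
FROM dbo.table1
ORDER BY CASE WHEN col2 IS NULL THEN 1 ELSE 0 END;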
I have a dataset table like this in Google Big Query:
| col1 | col2 | col3 | col4 | col5 | col6 |
-------------------------------------------
| a1 | b1 | c1 | d1 | e2 | f1 |
| a2 | b2 | c2 | d1 | e2 | f2 |
| a1 | b3 | c3 | d1 | e3 | f2 |
| a2 | b1 | c4 | d1 | e4 | f2 |
| a1 | b2 | c5 | d1 | e5 | f2 |
Let's say the given threshold number is 4; in that case, I want to transform this into one of the tables given below:
| col1 | col2 | col4 | col5 | col6 |
---------------------------------------------------------------------
| [a1,a2] | [b1,b2,b3] | [d1] | [e2,e3,e4,e5] | [f1,f2] |
Or like this:
| col | values |
------------------------
| col1 | [a1,a2] |
| col2 | [b1,b2,b3] |
| col4 | [d1] |
| col5 | [e2,e3,e4,e5] |
| col6 | [f1,f2] |
Please note that col3 was removed because it contained more than 4 (the threshold) distinct values. I explored a lot of documentation but was not able to figure out the required query. Can somebody help or point me in the right direction?
Edit: I have one solution in mind, where I do something like this:
select * from (select 'col1' as col, array_agg(distinct col1) as values from `project.dataset.table` union all
select 'col2', array_agg(distinct col2) from `project.dataset.table` union all
select 'col3', array_agg(distinct col3) from `project.dataset.table` union all
select 'col4', array_agg(distinct col4) from `project.dataset.table` union all
select 'col5', array_agg(distinct col5) from `project.dataset.table` union all
select 'col6', array_agg(distinct col6) from `project.dataset.table`) X where array_length(values) <= 4;
This gives me the second result, but it requires complex query construction, given that I don't know the number and names of the columns up front. Also, it might cross BigQuery's 100MB-per-row limit, since I will have more than a billion rows in the table. Please also suggest if there is a better way to do this.
How about:
WITH arrays AS (
SELECT * FROM UNNEST((
SELECT [
STRUCT("col_repo_name" AS col, ARRAY_AGG(DISTINCT repo.name IGNORE NULLS LIMIT 1001) AS values)
, ('col_actor_login', ARRAY_AGG(DISTINCT actor.login IGNORE NULLS LIMIT 1001))
, ('col_type', ARRAY_AGG(DISTINCT type IGNORE NULLS LIMIT 1001))
, ('col_org_login', ARRAY_AGG(DISTINCT org.login IGNORE NULLS LIMIT 1001))
]
FROM `githubarchive.year.2017`
))
)
SELECT *
FROM arrays
WHERE ARRAY_LENGTH(values)<=1000
This query processed 20.6GB in 11.9s (half a billion rows). It returned only one row, because every other column had more than 1000 unique values (my threshold).
That's traditional SQL, but here is an even simpler query that produces similar results:
SELECT col, ARRAY_AGG(DISTINCT value IGNORE NULLS LIMIT 1001) values
FROM (
SELECT REGEXP_EXTRACT(x, r'"([^\"]*)"') col , REGEXP_EXTRACT(x, r'":"([^\"]*)"') value
FROM (
SELECT SPLIT(TO_JSON_STRING(STRUCT(repo.name, actor.login, type, org.login)), ',') x
FROM `githubarchive.year.2017`
), UNNEST(x) x
)
GROUP BY col
HAVING ARRAY_LENGTH(values)<=1000
# 17.0 sec elapsed, 20.6 GB processed
Caveat: This will only work if there are no special characters in the values, like quotes or commas. If you have those, it won't be as straightforward (but it's still possible).
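To see what the inner transformation produces (and why stray quotes or commas in the values would break the regular expressions), here is a small sketch for a single hand-built struct:
#standardSQL
-- TO_JSON_STRING(STRUCT(...)) yields '{"col1":"a1","col2":"b1"}';
-- splitting on ',' leaves one '"name":"value"' fragment per column,
-- which the two REGEXP_EXTRACT calls then pick apart
SELECT
  REGEXP_EXTRACT(x, r'"([^\"]*)"')   AS col,
  REGEXP_EXTRACT(x, r'":"([^\"]*)"') AS value
FROM UNNEST(SPLIT(TO_JSON_STRING(STRUCT('a1' AS col1, 'b1' AS col2)), ',')) AS x
-- col   value
-- col1  a1
-- col2  b1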
Below is for BigQuery Standard SQL
#standardSQL
SELECT col, STRING_AGG(DISTINCT value) `values`
FROM (
SELECT
TRIM(z[OFFSET(0)], '"') col,
TRIM(z[OFFSET(1)], '"') value
FROM `project.dataset.table` t,
UNNEST(SPLIT(TRIM(to_JSON_STRING(t), '{}'))) kv,
UNNEST([STRUCT(SPLIT(kv, ':') AS z)])
)
GROUP BY col
HAVING COUNT(DISTINCT value) < 5
You can test and play with the above using the sample data from your question; the result will be:
| Row | col  | values      |
| --- | ---- | ----------- |
| 1   | col1 | a1,a2       |
| 2   | col2 | b1,b2,b3    |
| 3   | col4 | d1          |
| 4   | col5 | e2,e3,e4,e5 |
| 5   | col6 | f1,f2       |
#FelipeHoffa I was able to use your idea with a little modification in the query for my use-case.
SELECT * FROM UNNEST((
SELECT [
STRUCT("col_repo_name" AS col, ARRAY_AGG(DISTINCT repo.name IGNORE NULLS LIMIT 1001) AS values)
, ('col_actor_login', ARRAY_AGG(DISTINCT actor.login IGNORE NULLS LIMIT 1001))
, ('col_type', ARRAY_AGG(DISTINCT type IGNORE NULLS LIMIT 1001))
, ('col_org_login', ARRAY_AGG(DISTINCT org.login IGNORE NULLS LIMIT 1001))
]
FROM `githubarchive.year.2017`
))
This UNNEST over an array of structs will not work as-is, because the underlying columns have different data types and BigQuery cannot put the arrays under a single column (it fails with an error like: Array elements of types {STRUCT<...>, STRUCT<...>} do not have a common supertype). I modified it to something like this for my use case:
SELECT * FROM UNNEST((
SELECT [
STRUCT("col_repo_name" AS col, to_json_string(ARRAY_AGG(DISTINCT repo.name IGNORE NULLS LIMIT 1001)) AS values)
, ('col_actor_login', to_json_string(ARRAY_AGG(DISTINCT actor.login IGNORE NULLS LIMIT 1001)))
, ('col_type', to_json_string(ARRAY_AGG(DISTINCT type IGNORE NULLS LIMIT 1001)))
, ('col_org_login', to_json_string(ARRAY_AGG(DISTINCT org.login IGNORE NULLS LIMIT 1001)))
]
FROM `githubarchive.year.2017`
))
And this worked well!
select sum(col1), distinct col2 from table group by col1;
The query above fails. Is there an alternative that returns the col2 value when the group has only one distinct value?
Example 1 (if the col2 values in the group are the same):
|col1 | col2 |
|-----|------|
|1 | 2 |
|1 | 2 |
the output should be:
|col1(sum) |col2 |
|----------|-----|
| 2 | 2 |
Example 2 (if the col2 values in the group are different):
|col1 | col2 |
|-----|------|
|1 | 2 |
|1 | 3 |
the output should be:
|col1(sum) | col2 |
|----------|------|
| 2 |'...' |
DISTINCT is not an operation that is applied to a single column - it applies to all columns and does not make sense in the context of an aggregation.
the output should be:
col1(sum) col2
2 '...'
You can use a CASE expression to get your desired output, but all the outputs of the CASE must have the same data type; so if you want '...' when there are multiple values, you will need to convert your numbers to a string:
SELECT SUM( col1 ),
CASE
WHEN COUNT( DISTINCT col2 ) = 1
THEN TO_CHAR( MAX( col2 ) )
ELSE '...'
END
FROM your_table
GROUP BY col1
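As a quick check against example 2 from the question, here is a hedged sketch (assuming Oracle, since TO_CHAR is used above) with the sample values inlined:
WITH your_table AS (
  SELECT 1 AS col1, 2 AS col2 FROM dual UNION ALL
  SELECT 1, 3 FROM dual
)
SELECT SUM( col1 ) AS col1_sum,
       CASE
         WHEN COUNT( DISTINCT col2 ) = 1 THEN TO_CHAR( MAX( col2 ) )
         ELSE '...'
       END AS col2
FROM your_table
GROUP BY col1;
-- COL1_SUM  COL2
-- 2         ...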
Do GROUP BY col1, and have a CASE expression that returns col2 if there is only one distinct col2 value for that col1 group. Otherwise return -999 (or any other value you choose).
select sum(col1),
case when count(distinct col2) = 1 then max(col2)
else -999
end
from t
group by col1
I have a table table1 like below
+----+------+------+------+------+------+
| id | loc | val1 | val2 | val3 | val4 |
+----+------+------+------+------+------+
| 1 | loc1 | 10 | 190 | null | 20 |
| 2 | loc2 | 20 | null | 10 | 10 |
+----+------+------+------+------+------+
I need to combine val1 to val4 into a new column val, with one row per value, so that the output looks like below.
NOTE: The actual data has val1 to val30, i.e. 30 columns per row that need to be converted into rows.
+----+------+--------+
| id | loc | val |
+----+------+--------+
| 1 | loc1 | 10 |
| 1 | loc1 | 190 |
| 1 | loc1 | null |
| 1 | loc1 | 20 |
| 2 | loc2 | 20 |
| 2 | loc2 | null |
| 2 | loc2 | 10 |
| 2 | loc2 | 10 |
+----+------+--------+
You can use a lateral join to transform columns to rows:
SELECT a.id,a.loc,t.vals
FROM table1 a,
unnest(ARRAY[a.val1,a.val2,a.val3,a.val4]) t(vals);
If you want to do this with dynamically added columns:
CREATE OR REPLACE FUNCTION columns_to_rows(
out id integer,
out loc text,
out vals integer
)
RETURNS SETOF record AS
$body$
DECLARE
columns_to_rows text;
BEGIN
SELECT string_agg('a.'||attname, ',') into columns_to_rows
FROM pg_attribute
WHERE attrelid = 'your_table'::regclass AND --table name
attnum > 0 and --get just the visible columns
attname <> all (array [ 'id', 'loc' ]) AND --exclude some columns
NOT attisdropped ; --column is not dropped
RETURN QUERY
EXECUTE format('SELECT a.id,a.loc,t.vals
FROM your_table a,
unnest(ARRAY[%s]) t(vals)',columns_to_rows);
end;
$body$
LANGUAGE plpgsql;
Look at this link for more detail: Columns to rows
You could use a cross join with generate_series for this:
select
id,
loc,
case x.i
when 1 then val1
when 2 then val2
. . .
end as val
from t
cross join generate_series(1, 4) x (i)
It uses the table only once and can be easily extended to accommodate more columns.
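For the four columns in the question, the elided branches would presumably be filled in like this (a sketch assuming the table is named table1, as in the question):
SELECT t.id,
       t.loc,
       CASE x.i
         WHEN 1 THEN t.val1
         WHEN 2 THEN t.val2
         WHEN 3 THEN t.val3
         WHEN 4 THEN t.val4
       END AS val
FROM table1 t
CROSS JOIN generate_series(1, 4) AS x (i)
ORDER BY t.id, x.i;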
Note: In the accepted answer, the first approach reads the table many times (as many times as there are columns to be unpivoted), and the second approach is wrong because there is no UNPIVOT in PostgreSQL.
I'm sure there's a classier approach than this.
SELECT * FROM (
select id, loc, val1 as val from #t a
UNION ALL
select id, loc, val2 as val from #t a
UNION ALL
select id, loc, val3 as val from #t a
UNION ALL
select id, loc, val4 as val from #t a
) x
order by ID
Here's my attempt with UNPIVOT, but it can't get the NULLs; perhaps perform a join for the NULLs? Anyway, I'll keep trying.
SELECT *
FROM (
SELECT * FROM #t
) main
UNPIVOT (
new_val
FOR val IN (val1, val2, val3, val4)
) unpiv
This will not work in Postgres as the user needs; I saw that mentioned in the comments.
Here is a way I found to handle NULL:
select p.id,p.loc,CASE WHEN p.val=0 THEN NULL ELSE p.val END AS val
from
(
SELECT id,loc,ISNULL(val1,0) AS val1,ISNULL(val2,0) AS val2,ISNULL(val3,0) AS val3,ISNULL(val4,0) AS val4
FROM Table1
)T
unpivot
(
val
for locval in(val1,val2,val3,val4)
)p
EDIT:
Best Solution from my Side:
select a.id,a.loc,ex.val
from (select 'val1' as [over] union all select 'val2' union all select 'val3'
union all select 'val4' ) pmu
cross join (select id,loc from Table1) as a
left join
Table1 pt
unpivot
(
[val]
for [over] in (val1, val2, val3, val4)
) ex
on pmu.[over] = ex.[over] and
a.id = ex.id
I am using Oracle PL/SQL and trying to compare column values with the LAG function.
Following is the statement:
decode(LAG(col1,1) OVER (ORDER BY col3),col1,'No Change','Change_Occured') Changes
For the first row, LAG compares against a previous row that does not exist, so in my query the first row of the 'Changes' column always shows Change_Occured even though no change has happened. Is there any way to handle this scenario?
Assume this table:
| col1 | col2 |
|------|------|
| 2    | 3    |
| 2    | 6    |
| 2    | 7    |
| 2    | 9    |
Each row of col1 is compared with the previous value, so the result will be:
| col1 | col2 | Changes        |
|------|------|----------------|
| 2    | 3    | Change_Occured |
| 2    | 6    | No Change      |
| 2    | 7    | No Change      |
| 2    | 9    | No Change      |
So how should I handle the first row of the Changes column?
The syntax for LAG Analytic function is:
LAG (value_expression [,offset] [,default]) OVER ([query_partition_clause] order_by_clause)
default - The value returned if the offset is outside the scope of the window. The default value is NULL.
SQL> WITH sample_data AS(
2 SELECT 2 col1, 3 col2 FROM dual UNION ALL
3 SELECT 2 col1, 6 col2 FROM dual UNION ALL
4 SELECT 2 col1, 7 col2 FROM dual UNION ALL
5 SELECT 2 col1, 9 col2 FROM dual
6 )
7 -- end of sample_data mimicking real table
8 SELECT col1, LAG(col1,1) OVER (ORDER BY col2) changes FROM sample_data;
COL1 CHANGES
---------- ----------
2
2 2
2 2
2 2
Therefore, in the DECODE expression you are comparing a NULL value with a real value, so the first row is evaluated as Change_Occured.
You could use the default value as the column value itself:
DECODE(LAG(col1,1, col1) OVER (ORDER BY col2),col1,'No Change','Change_Occured') Changes
For example,
SQL> WITH sample_data AS(
2 SELECT 2 col1, 3 col2 FROM dual UNION ALL
3 SELECT 2 col1, 6 col2 FROM dual UNION ALL
4 SELECT 2 col1, 7 col2 FROM dual UNION ALL
5 SELECT 2 col1, 9 col2 FROM dual
6 )
7 -- end of sample_data mimicking real table
8 SELECT col1,
9 DECODE(
10 LAG(col1,1, col1) OVER (ORDER BY col2),
11 col1,
12 'No Change',
13 'Change_Occured'
14 ) Changes
15 FROM sample_data;
COL1 CHANGES
---------- --------------
2 No Change
2 No Change
2 No Change
2 No Change
SQL>
Maybe:
decode(LAG(col1,1, col1) OVER (ORDER BY col3),col1,'No Change','Change_Occured') Changes
The optional default value is returned if the offset goes beyond the scope of the window. If you do not specify default, then its default is null.