Only read columns with values + print date from same column (complicated) - SQL

Here is something for your brains to bite on :D
I'm not able to solve this by myself. My table follows the same principle as the fiddle example, but with col1-col32 instead of only col1-col5 as in the example.
http://sqlfiddle.com/#!6/6f6da
The goal is to get this output:
Apples, 20120104, 9.73
Berries, 20120101, 4.00
Berries, 20120103, 3.50
Bananas, 20120101, 2.30
Kiwi, 20120103, 5.55
I know that the table has bad column names and that the data is badly stored. I'm not looking for help with changing the table; I have to work with the data as it is.
Thanks for your help

It is not as complicated as it seems:
;with cte as
(
    -- unpivot col2..col5 into rows: c = the cell value, d = the source column name
    select *
    from example
    unpivot (c for d in ([col2], [col3], [col4], [col5])) u
)
select c2.col1, c2.c, c1.c
from cte c1                      -- c1: the 'datum' row, so c1.c is the date for that column
join cte c2 on c1.d = c2.d       -- pair every other row's value with the date from the same column
where c1.col1 = 'datum'
  and c2.col1 <> 'datum'
  and c2.c <> '0.00'             -- skip empty cells
Fiddle http://sqlfiddle.com/#!6/6f6da/22
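Side note (my addition, not part of the original answer): with col1-col32 the in() list above simply grows to [col2],...,[col32]. If typing that out is too tedious, a rough sketch of building the list dynamically from sys.columns could look like the following - this assumes SQL Server 2017+ for string_agg and that the table is dbo.example:
declare @cols nvarchar(max), @sql nvarchar(max);

-- build '[col2],[col3],...,[col32]' from the table metadata, skipping col1
select @cols = string_agg(quotename(name), ',')
from sys.columns
where object_id = object_id('dbo.example')
  and name <> 'col1';

set @sql = N';with cte as (
    select * from dbo.example
    unpivot (c for d in (' + @cols + ')) u
)
select c2.col1, c2.c, c1.c
from cte c1
join cte c2 on c1.d = c2.d
where c1.col1 = ''datum'' and c2.col1 <> ''datum'' and c2.c <> ''0.00'';';

exec sp_executesql @sql;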

SQL Server 'AS' alias unexpected syntax

I've come across the following T-SQL today:
select c from (select 1 union all select 1) as d(c)
that yields the following result:
c
-----------
1
1
The part that got me confused was d(c).
While trying to understand what's going on, I modified the T-SQL into:
select c, b from (select 1, 2 union all select 3, 4) m(c, b)
which yields the following result:
c b
----------- -----------
1 2
3 4
It was clear that d and m are table references, while the letters in brackets, c and b, are references to columns.
I wasn't able to find relevant documentation on MSDN, but I'm curious:
Are you aware of such syntax?
What would be a useful use case for it?
select c from (select 1 union all select 1) as d(c)
is the same as
select c from (select 1 as c union all select 1) as d
In the first query you did not name the column(s) in your subquery, but named them outside the subquery.
In the second query you name the column(s) inside the subquery.
If you try it like this (without naming the column(s) in the subquery):
select c from (select 1 union all select 1) as d
you will get the following error:
No column name was specified for column 1 of 'd'
This is also covered in the documentation.
As for usage, some like to write it the first way, some the second; use whichever you prefer. It's all the same.
An observation: using the table value constructor VALUES gives you no way of naming the columns, which makes it necessary to use column naming after the table alias:
select *
from (values
        (1, 2)  -- can't give a column name here
       ,(3, 4)
     ) as tableName(column1, column2)  -- gotta do it here
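As a small usage example of my own (the dbo.Orders table and its Status column are made up purely for illustration), the same column-naming-after-the-alias trick is handy for inline lookup tables:
select o.OrderID, s.StatusName
from dbo.Orders as o
join (values (0, 'Open')
            ,(1, 'Shipped')
            ,(2, 'Cancelled')
     ) as s(StatusCode, StatusName)  -- columns named after the alias, as above
  on s.StatusCode = o.Status;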
You've already had comments that point you to the documentation of how derived tables work, but they don't answer your question regarding useful use cases for this functionality.
Personally I find this functionality useful whenever I want to create a set of addressable values that will be used extensively in a statement, or when I want to duplicate rows for whatever reason.
An example of addressable values would be a much more complex version of the following, in which the calculated values in the v derived table can be used many times over via more sensible names, rather than repeated calculations that will be hard to follow:
select p.ProductName
      ,p.PackPricePlusVAT - v.PackCost as GrossRevenue
      ,etc
from dbo.Products as p
    cross apply (values (p.UnitsPerPack * p.UnitCost
                        ,p.UnitPrice * p.UnitsPerPack * 1.2
                        ,etc
                        )
                ) as v (PackCost
                       ,PackPricePlusVAT
                       ,etc
                       )
and an example of being able to duplicate rows could be in creating an exception report for use in validating data, which will output one row for every DataError condition that the dbo.Products row satisfies:
select p.ProductName
      ,e.DataError
from dbo.Products as p
    cross apply (values ('Missing Units Per Pack'
                        ,case when p.SoldInPacks = 1 and isnull(p.UnitsPerPack,0) < 1 then 1 end
                        )
                       ,('Unusual Price'
                        ,case when p.Price > (p.UnitsPerPack * p.UnitCost) * 2 then 1 end
                        )
                       ,(etc)
                ) as e (DataError
                       ,ErrorFlag
                       )
where e.ErrorFlag = 1
If you can understand what these two scripts are doing, you should find numerous situations where being able to generate additional values or additional rows of data is very helpful.

Convert SQL Data in columns into an array

Imagine you have a simple table:
    Key   c1   c2   c3
A   id1   x    y    z
B   id2   q    r    s
What I would like is a query that gives me the result as two arrays, so something like:
Select
id1,
ARRAY_MAGIC_CREATOR(c1, c2, c3)
from Table
With the result being
id1, <x,y,z>
id2, <q,r,s>
Almost everything I have searched for ends up converting rows to arrays, or other similar-sounding but very different requests.
Does something like this even exist in SQL?
Please note that the data type is NOT a string, so we can't use string concatenation. The values are all going to be treated as floats.
It is called ARRAY:
Select id1, ARRAY[c1, c2, c3] as c_array
from Table
This will also work :o)
select key, [c1, c2, c3] c
from `project.dataset.table`
Consider the generic option below, which does not require you to type all column names or even to know them in advance - a more BigQuery'ish way of doing business :o)
select key,
regexp_extract_all(
to_json_string((select as struct * except(key) from unnest([t]))),
r'"[^":,]+":([^":,]+)(?:,|})'
) c
from `project.dataset.table` t
If applied to the sample data in your question, the output is id1 with ["x", "y", "z"] and id2 with ["q", "r", "s"].
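One extra note from me (not part of the original answers): regexp_extract_all returns an ARRAY<STRING>, so if the values really need to be treated as floats, the extracted elements could be cast. A sketch, assuming BigQuery and that every non-key column actually holds a numeric value:
select key,
  array(
    select cast(v as float64)
    from unnest(regexp_extract_all(
      to_json_string((select as struct * except(key) from unnest([t]))),
      r'"[^":,]+":([^":,]+)(?:,|})'
    )) v with offset
    order by offset  -- keep the original column order
  ) c
from `project.dataset.table` t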

Is my query correct? Can I optimize it? Positive sum and negative sum of integers

I have two solutions for finding the sum of the positive integers and the sum of the negative integers. Please tell me which one is more correct and better optimized.
Or is there another query that is more optimized and correct?
Q:
Consider table A with col1 and the values below.
col1
20
-20
40
-40
-30
30
I need the output below:
POSITIVE_SUM NEGATIVE_SUM
90 -90
I have two solutions.
/q1/
select POSITIVE_SUM, NEGATIVE_SUM
from
    (select distinct sum(a2.col1) as "POSITIVE_SUM"
     from A a1 join A a2 on a2.col1 > 0
     group by a1.col1) t1
   ,
    (select distinct sum(a2.col1) as "NEGATIVE_SUM"
     from A a1 join A a2 on a2.col1 < 0
     group by a1.col1) t2;
/q2/
select sum (case when a1.col1 >= 0 then a1.col1 else 0 end) as positive_sum,
sum (case when a1.col1 < 0 then a1.col1 else 0 end) as negative_sum
from A a1;
POSITIVE_SUM NEGATIVE_SUM
90 -90
I wonder how you even came up with your 1st solution:
- self-join the table (twice),
- producing 6 (identical) rows each, and finally getting 1 row with distinct,
- then cross join the 2 results.
I prepared a demo so you can see the steps that lead to the result of your 1st solution.
I don't know if this can be optimized in any way, but is there any case where it can beat a single scan of the table with conditional aggregation, like your 2nd solution?
I don't think so.
The second query is not only better performing, but it returns the correct values. If you run the first query, you'll see that it returns multiple rows.
I think for the first query, you are looking for something like:
select p.positive_sum, n.negative_sum
from (select sum(col1) as positive_sum from A where col1 > 0) p cross join
     (select sum(col1) as negative_sum from A where col1 < 0) n
And you are asking whether the case expression is faster than the where.
What you are missing is that this version needs to scan the table twice. Reading data is generally more expensive than any functions on data elements.
Sometimes the two versions might have very similar performance. I can think of three cases. The first is when there is a clustered index on col1. The second is when col1 is used as a partitioning key. And the third is on very small amounts of data (say, data that fits on a single data page).
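A quick way to see the one-scan vs. two-scan difference for yourself is to compare logical reads. This is a sketch assuming SQL Server and the table A from the question; the original post doesn't name the engine, and on Oracle you'd look at autotrace or the execution plan instead:
set statistics io on;

-- single scan with conditional aggregation
select sum(case when col1 >= 0 then col1 else 0 end) as positive_sum,
       sum(case when col1 < 0 then col1 else 0 end) as negative_sum
from A;

-- two scans, one per filter
select p.positive_sum, n.negative_sum
from (select sum(col1) as positive_sum from A where col1 > 0) p cross join
     (select sum(col1) as negative_sum from A where col1 < 0) n;

set statistics io off;
-- the messages output typically shows roughly double the logical reads for the
-- second form, unless an index lets the engine touch only the relevant rows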

How can I combine multiple columns of the same data into a single column?

I have an issue here that arises from poor data formatting (not on my part). I had a large CSV file downloaded from an external entity with nationwide data - it has about 5,000,000+ rows, so it's too large a file to open, let alone manually manipulate the data. I did get it uploaded to our SQL database, but getting the data into a usable format is difficult; each row has 10 different category codes, and can have multiple codes in each category. Unfortunately, they added new columns to handle this instead of adding new rows. It's tough to describe without an example:
ID A_Code1 A_Code2 A_Code3 B_Code1 B_Code2 B_Code3
1 123 765 654 qwe asd zxc
2 987 345 567 poi lkj mnb
and this is what I need:
ID A_Code B_Code
1 123 qwe
1 765 asd
1 654 zxc
2 987 poi
2 345 lkj
2 567 mnb
The way it is set up now makes querying nearly impossible, as there are about 10 different code types on each row, and there are 10 columns for each code type. This means I have to query 100 different columns when I should only have to query 10.
If somebody knows a way to do this, it would be greatly appreciated. I have not been able to find anything like this so far, so I am getting desperate!
Thank you!
You need to unpivot the multiple columns of data into multiple rows; depending on your version of SQL Server, there are several ways to get the result.
You can use CROSS APPLY and UNION ALL if using SQL Server 2005+:
select id, A_Code, B_Code
from yourtable
cross apply
(
    select A_Code1, B_Code1 union all
    select A_Code2, B_Code2 union all
    select A_Code3, B_Code3
) c (A_Code, B_Code);
See SQL Fiddle with Demo.
You can also use CROSS APPLY with VALUES if using SQL Server 2008+:
select id, A_Code, B_Code
from yourtable
cross apply
(
    values
    (A_Code1, B_Code1),
    (A_Code2, B_Code2),
    (A_Code3, B_Code3)
) c (A_Code, B_Code);
See SQL Fiddle with Demo.
This allows you to convert the columns into rows in pairs - meaning A_Code1 and B_Code1 will be matched in the final result.
You could also use a UNION ALL:
select id, A_Code = A_Code1, B_Code = B_Code1
from yourtable
union all
select id, A_Code = A_Code2, B_Code = B_Code2
from yourtable
union all
select id, A_Code = A_Code3, B_Code = B_Code3
from yourtable ;
See SQL Fiddle with Demo
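For the real table, with roughly 10 code types and 10 numbered columns per type, the VALUES list simply grows: one tuple per column number, each tuple listing every code type. The C_Code through J_Code names below are assumptions for illustration, since only A_Code and B_Code appear in the sample:
select id, A_Code, B_Code, C_Code /* ... through J_Code */
from yourtable
cross apply
(
    values
    (A_Code1, B_Code1, C_Code1 /* ... J_Code1 */),
    (A_Code2, B_Code2, C_Code2 /* ... J_Code2 */),
    (A_Code3, B_Code3, C_Code3 /* ... J_Code3 */)
    /* ... one tuple per column number, up to (A_Code10, B_Code10, ..., J_Code10) */
) c (A_Code, B_Code, C_Code /* ... J_Code */);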

T-SQL, breaking the loop once I found the records I needed

I'm trying to read from a temporary table using the following query:
select a, b, c, result, sampleDate
from dbo.abc
where a = #la
  and b = #lb
  and sampleDate > #lSampleDate
  and resultType in ('sugar', 'salt', 'peppers')
What I want to achieve is: once I have found the matching rows, I want to stop reading the table, delete the rows just read, then search the table again, find new values, and so on.
I don't understand how to stop it once I have found my values.
e.g.
tsampledate tResultType result
10/08/2005 cream 10.9
10/08/2005 sugar 10.0
10/08/2005 Salt 15.0
10/08/2005 peppers 20.0
21/10/2012 sugar 21.9
21/10/2012 salt 23
21/10/2012 peppers 19.3
So I want to read the rows with tSampleDate 10/08/2005, break the loop, and go back to search again, but the loop keeps reading and gives me all the values.
I was thinking of SELECT CASE but can't figure out how to implement it.
Any help, please.
The proper thing would be to
delete from dbo.abc
where a = #la
  and b = #lb
  and sampleDate > #lSampleDate
  and resultType in ('sugar', 'salt', 'peppers')
Assuming that you are looking for the first match that satisfies sampleDate > #lSampleDate and the other conditions, i.e. '10/08/2005', then try this:
SELECT a, b, c, result, sampleDate
FROM
(
    select a, b, c, result, sampleDate,
           DENSE_RANK() OVER (ORDER BY sampleDate) AS rnk
    from dbo.abc
    where a = #la
      and b = #lb
      and sampleDate > #lSampleDate
      and resultType in ('sugar', 'salt', 'peppers')
) t
WHERE rnk = 1
Check the documentation for the ranking function DENSE_RANK.
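To cover the "read the first group, delete it, then search again" part of the question, one possible approach (a sketch of my own built on the same DENSE_RANK idea, keeping the question's #la/#lb/#lSampleDate placeholders as-is) is to delete through a CTE and return the removed rows with OUTPUT:
;with ranked as
(
    select a, b, c, result, sampleDate,
           dense_rank() over (order by sampleDate) as rnk
    from dbo.abc
    where a = #la
      and b = #lb
      and sampleDate > #lSampleDate
      and resultType in ('sugar', 'salt', 'peppers')
)
-- removes only the earliest qualifying sampleDate and returns those rows;
-- run the statement again to get the next date, and so on
delete from ranked
output deleted.a, deleted.b, deleted.c, deleted.result, deleted.sampleDate
where rnk = 1;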