Using WITH CUBE on the same column twice to output permuted results - sql

Hypothetically, I have a table with a single int column called Number, holding values like 1, 2, 3, etc.
When I try:
SELECT Number,Number FROM Table Group By Number WITH CUBE
It returns:
Number Number
------ ------
1 1
2 2
3 3
I was expecting it to return something more like this:
Number Number
------ ------
1 1
1 2
1 3
2 1
2 2
2 3
and so forth... (with every combination)
How would this be possible? WITH CUBE doesn't seem to cut it here.

It seems you want the Cartesian product (every row of the table paired with every row of the same table):
SELECT a.Number, b.Number
FROM [Table] a, [Table] b
Or, written with explicit join syntax:
SELECT a.Number, b.Number
FROM [Table] a CROSS JOIN [Table] b
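To check the cross join without a SQL Server instance, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in engine (an assumption; the question doesn't name its database). It loads the values 1, 2, 3 into a one-column table and runs the CROSS JOIN form of the query.

```python
import sqlite3

# In-memory SQLite database standing in for the asker's table.
# "Table" is a reserved word, so it is quoted with square brackets,
# which SQLite accepts for compatibility with SQL Server/Access.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE [Table] (Number INTEGER)")
conn.executemany("INSERT INTO [Table] (Number) VALUES (?)", [(1,), (2,), (3,)])

# CROSS JOIN pairs every row of a with every row of b (3 x 3 = 9 rows).
rows = conn.execute(
    """
    SELECT a.Number, b.Number
    FROM [Table] a CROSS JOIN [Table] b
    ORDER BY a.Number, b.Number
    """
).fetchall()

for left, right in rows:
    print(left, right)
```

Both forms above return the same nine rows; the ORDER BY is only added here to make the output deterministic.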

Related

Find whether an id matches and substitute a value using CASE in a Hive query

I have a table called "Scan" of customer transactions, where an individual_id appears once for every transaction, with columns such as scan_id.
I have another table called ids, which contains individual_ids randomly sampled from the Scan table.
I would like to join ids with scan and get a single record per id, with a flag indicating whether its scan_ids match certain values.
Suppose the data looks like this:
Scan table
Ids scan_id
---- ------
1 100
1 111
1 1000
2 100
2 111
3 124
4 1000
4 111
Ids table
id
1
2
3
4
5
I want the output below, i.e. MT is 1 if any scan_id for that id is either 100 or 1000:
Id MT
------ ------
1 1
2 1
3 0
4 1
I executed the query below and got an error:
select MT, d.individual_id
from
(
select
CASE
when scan_id in (90069421,53971306,90068594,136739913,195308160) then 1
ELSE 0
END as MT
from scan cs join ids r
on cs.individual_id = r.individual_id
where
base_div_nbr =1
and
country_code ='US'
and
retail_channel_code=1
and visit_date between '2019-01-01' and '2019-12-31'
) as d
group by individual_id;
I would appreciate any suggestions or help with this Hive query. If there is an efficient way of getting this job done, let me know.
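Use a GROUP BY with a conditional aggregate. MAX over the CASE expression is 1 if any of an id's scan_ids matches, and 0 otherwise: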
select s.individual_id,
max(case when s.scan_id in (100, 1000) then 1 else 0 end) as mt
from scan s
group by s.individual_id;
The ids table doesn't seem to be needed for this query.
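As a quick sanity check, here is a sketch that runs the same GROUP BY query on the sample data from the question. It assumes SQLite (via Python's sqlite3) as a stand-in for Hive, which works here because the query uses only standard SQL; the ORDER BY is added only for deterministic output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scan (individual_id INTEGER, scan_id INTEGER)")
conn.executemany(
    "INSERT INTO scan VALUES (?, ?)",
    [(1, 100), (1, 111), (1, 1000), (2, 100), (2, 111),
     (3, 124), (4, 1000), (4, 111)],
)

# The CASE yields a 0/1 flag per row; MAX turns it into
# "did ANY of this id's rows match?" after grouping.
result = conn.execute(
    """
    SELECT s.individual_id,
           MAX(CASE WHEN s.scan_id IN (100, 1000) THEN 1 ELSE 0 END) AS mt
    FROM scan s
    GROUP BY s.individual_id
    ORDER BY s.individual_id
    """
).fetchall()
print(result)  # [(1, 1), (2, 1), (3, 0), (4, 1)]
```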

Create multiple rows based on 1 column

I currently have a table with a Quantity column.
ID Code Quantity
1 A 1
2 B 3
3 C 2
4 D 1
Is there any way to write a SQL statement that would give me:
ID Code Quantity
1 A 1
2 B 1
2 B 1
2 B 1
3 C 1
3 C 1
4 D 1
I need to break out the quantity into that many rows.
Thanks
Here's one option using a numbers table to join to:
with numberstable as (
select 1 AS Number
union all
select Number + 1 from numberstable where Number<100
)
select t.id, t.code, 1 as quantity
from yourtable t
join numberstable n on t.quantity >= n.number
order by t.id
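Please note that, depending on which database you are using, this may not be the right way to create the numbers table. It works in most databases that support recursive common table expressions, although some (PostgreSQL and MySQL, for example) require WITH RECURSIVE instead of plain WITH. Also, the CTE as written only counts up to 100, so larger quantities need a higher bound. The key part of the answer is the join and its ON criterion.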
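Here is a sketch of the numbers-table join run against SQLite through Python's sqlite3 module (an assumed engine; SQLite is one of the databases where writing WITH RECURSIVE is the portable choice). The table name yourtable and its data come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (id INTEGER, code TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [(1, "A", 1), (2, "B", 3), (3, "C", 2), (4, "D", 1)],
)

rows = conn.execute(
    """
    -- Recursive CTE generating the numbers 1..100.
    WITH RECURSIVE numberstable(Number) AS (
        SELECT 1
        UNION ALL
        SELECT Number + 1 FROM numberstable WHERE Number < 100
    )
    -- Each row matches the numbers 1..quantity, i.e. 'quantity' times.
    SELECT t.id, t.code, 1 AS quantity
    FROM yourtable t
    JOIN numberstable n ON t.quantity >= n.Number
    ORDER BY t.id, n.Number
    """
).fetchall()

for row in rows:
    print(row)
```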
One way would be to generate an array with X elements (where X is the quantity). So for rows
ID Code Quantity
1 A 1
2 B 3
3 C 2
you would get
ID Code Quantity ArrayVar
1 A 1 [1]
2 B 3 [1,2,3]
3 C 2 [1,2]
using a sequence function (e.g. in PrestoDB, sequence(start, stop) -> array(bigint)).
Then unnest the array, so that each ID yields X rows, and set the quantity to 1. I'm not sure which SQL dialect you're using, but this should work in any engine that supports generating arrays and UNNEST.
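The two steps can be illustrated without committing to a particular engine. The sketch below is plain Python, not Presto SQL: step 1 builds the [1..Quantity] array per row (the role sequence(1, Quantity) plays in Presto), and step 2 flattens it into one row per element (the role of UNNEST, which in Presto is typically written as CROSS JOIN UNNEST(...)). The sample rows are the three from the answer.

```python
# Sample rows from the answer: (ID, Code, Quantity)
rows = [(1, "A", 1), (2, "B", 3), (3, "C", 2)]

# Step 1: attach an array [1..Quantity] to each row.
with_arrays = [
    (id_, code, qty, list(range(1, qty + 1))) for id_, code, qty in rows
]

# Step 2: "unnest" by emitting one output row per array element,
# with the quantity fixed to 1.
exploded = [(id_, code, 1) for id_, code, _qty, arr in with_arrays for _ in arr]

for row in exploded:
    print(row)
```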
In Oracle, you can use a CONNECT BY row generator to cross join each row with the numbers 1 to Quantity and get your desired output.
Check my solution; it works quite reliably.
select
"ID",
"Code",
1 QUANTITY
from Table1, table(cast(multiset
(select level from dual
connect by level <= Table1."Quantity") as sys.OdciNumberList));

Get duplicate on single column after distinct across multiple columns in SQL

I have a table that looks like this:
name | id
-----------
A 1
A 1
B 2
C 1
D 3
D 3
F 2
I want to return ids 1 and 2 because each one is shared by more than one name. I don't want to return 3, because it only ever appears with D.
Basically, I'm thinking of writing a query that first gets the distinct pairs, so the above reduces to:
name | id
-----------
A 1
B 2
C 1
D 3
F 2
And then finding the duplicates in the id column. However, I'm struggling to find the correct syntax for that query.
You should be able to get the result you want by using a GROUP BY along with a HAVING clause that counts the distinct names. The HAVING clause will filter for those ids that have more than one distinct name:
select id
from Table1
group by id
having count(distinct name) > 1
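A minimal check of the GROUP BY / HAVING query on the sample data, assuming SQLite (through Python's sqlite3) as the engine; COUNT(DISTINCT ...) behaves the same way in the common databases. Note that no separate DISTINCT step is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (name TEXT, id INTEGER)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [("A", 1), ("A", 1), ("B", 2), ("C", 1), ("D", 3), ("D", 3), ("F", 2)],
)

# COUNT(DISTINCT name) ignores the repeated (A, 1) and (D, 3) pairs,
# so id 3 has only one distinct name and is filtered out by HAVING.
ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT id
        FROM Table1
        GROUP BY id
        HAVING COUNT(DISTINCT name) > 1
        ORDER BY id
        """
    )
]
print(ids)  # [1, 2]
```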

MS Access 2010 SQL Top N query by group performance issue (continued2)

I have rephrased a previous question, MS Access 2010 SQL Top N query by group performance issue (continued), as I believe the context was not clearly described before. The answer to my previous question did not provide the top N by group result. The rephrased question is more generic, and I now have all the data in one table.
Here is my situation: I have a table (Analysis) that contains products (Partnumber) of various categories (Category). Every product has a price (Value). The objective of the query is to show the 10 products with the highest price in each category. The table contains 15,000 records and will continue to grow.
This is the query:
SELECT
a.Location,
a.Category,
a.Partnumber,
a.Value
FROM Analysis a
WHERE a.Partnumber IN (
SELECT TOP 10 aa.Partnumber
FROM Analysis aa
WHERE aa.Category = a.Category
ORDER BY aa.Value DESC
)
ORDER BY
a.Category;
Here is my question: my current query works with 1,000 records in the table (response time 3 seconds). With 15,000 records the query runs practically forever. How can I rebuild the query to significantly improve performance?
The answer to my previous question was to not use the IN-list subquery. But that removed the top N by group behavior: the query returned the top N of all records instead.
For sample data in a table called [Analysis]
ID Location Category Partnumber Value
-- --------- -------- ---------- -----
1 here cat1 part001 1
2 there cat1 part002 2
3 wherever cat1 part003 3
4 someplace cat2 part004 4
5 nowhere cat2 part005 5
6 unknown cat2 part006 6
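the following "ranking query", which ranks each row by counting the rows in the same category whose Value is greater than or equal to its own,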
SELECT
a1.ID,
a1.Location,
a1.Category,
a1.Partnumber,
a1.Value,
COUNT(*) AS CategoryRank
FROM
Analysis a1
INNER JOIN
Analysis a2
ON a1.Category = a2.Category
AND a1.Value <= a2.Value
GROUP BY
a1.ID,
a1.Location,
a1.Category,
a1.Partnumber,
a1.Value
returns
ID Location Category Partnumber Value CategoryRank
-- --------- -------- ---------- ----- ------------
1 here cat1 part001 1 3
2 there cat1 part002 2 2
3 wherever cat1 part003 3 1
4 someplace cat2 part004 4 3
5 nowhere cat2 part005 5 2
6 unknown cat2 part006 6 1
So if you only want the top 2 items in each category, just wrap the above query in a SELECT ... WHERE:
SELECT *
FROM
(
SELECT
a1.ID,
a1.Location,
a1.Category,
a1.Partnumber,
a1.Value,
COUNT(*) AS CategoryRank
FROM
Analysis a1
INNER JOIN
Analysis a2
ON a1.Category = a2.Category
AND a1.Value <= a2.Value
GROUP BY
a1.ID,
a1.Location,
a1.Category,
a1.Partnumber,
a1.Value
) AS RankingQuery
WHERE CategoryRank <= 2
ORDER BY Category, CategoryRank
to give you
ID Location Category Partnumber Value CategoryRank
-- -------- -------- ---------- ----- ------------
3 wherever cat1 part003 3 1
2 there cat1 part002 2 2
6 unknown cat2 part006 6 1
5 nowhere cat2 part005 5 2
Note: Ensure that the [Category] and [Value] fields are indexed for best performance.
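To see the ranking self-join and the top-N filter together, here is a sketch that runs them on the sample rows. It uses SQLite via Python's sqlite3 instead of Access (an assumption), drops Location and Partnumber from the SELECT for brevity, and creates a composite index on (Category, Value); the index name idx_analysis_cat_value is hypothetical, and a composite index is just one way to follow the indexing note.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Analysis (ID INTEGER PRIMARY KEY, Location TEXT, "
    "Category TEXT, Partnumber TEXT, Value INTEGER)"
)
conn.executemany(
    "INSERT INTO Analysis VALUES (?, ?, ?, ?, ?)",
    [(1, "here", "cat1", "part001", 1), (2, "there", "cat1", "part002", 2),
     (3, "wherever", "cat1", "part003", 3), (4, "someplace", "cat2", "part004", 4),
     (5, "nowhere", "cat2", "part005", 5), (6, "unknown", "cat2", "part006", 6)],
)
# Index supporting the join: equality on Category, range on Value.
conn.execute("CREATE INDEX idx_analysis_cat_value ON Analysis (Category, Value)")

top_n = 2
rows = conn.execute(
    """
    SELECT ID, Category, Value, CategoryRank
    FROM (
        -- Rank = number of rows in the same category with Value >= this row's.
        SELECT a1.ID, a1.Category, a1.Value, COUNT(*) AS CategoryRank
        FROM Analysis a1
        INNER JOIN Analysis a2
            ON a1.Category = a2.Category
           AND a1.Value <= a2.Value
        GROUP BY a1.ID, a1.Category, a1.Value
    ) AS RankingQuery
    WHERE CategoryRank <= ?
    ORDER BY Category, CategoryRank
    """,
    (top_n,),
).fetchall()

for row in rows:
    print(row)
```

One thing to keep in mind with this counting approach: rows with equal Value in the same category share the larger rank number, so a tie at the boundary can push all the tied rows past the cutoff.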

Multiple columns from a table into one, large column?

I don't know what the best way to go about this is. I have a very large array of columns, each with 1-25 rows of values. I need to combine them all into one large column, skipping blanks if at all possible. Is this something Access can do?
a b c d e f g h
3 0 1 1 1 1 1 5
3 5 6 8 8 3 5
1 1 2 2 1 5
4 4 2 1 1 5
1 5
There are no blanks within each column, but each column has a different number of values in it. They need to be appended column by column, from left to right: a, b, c, d, e, f, and so on. So the 0 from b needs to go in the first blank cell after the second 3 in a, and the first 5 in h needs to come directly after the 1 in g, with no blanks.
So you want a result like:
3
3
0
5
1
4
1
6
1
4
etc?
Here is how I would approach the problem. Insert your array into a work table that has an autonumber column called id as well as the array columns. (The autonumber is important for retaining the order the data is in; databases do not guarantee an order unless you give them something to sort on.)
Create a final table with an autonumber column (see the note above on why you need an autonumber) and the column you want in your final table.
Run a separate insert statement for each column in your work table, in the order you want the data to appear.
So the inserts would look something like:
insert into table2 (colA)
select columnA from table1 where columnA is not null order by id
insert into table2 (colA)
select columnB from table1 where columnB is not null order by id
insert into table2 (colA)
select columnC from table1 where columnC is not null order by id
Now when you run select colA from table2 order by id, you should have the results you need.
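Here is a sketch of the work-table approach, run in SQLite through Python's sqlite3 rather than Access (an assumption). The three-column sample is hypothetical: its values were chosen to mirror the start of the expected output shown earlier (3, 3, 0, 5, 1, 4, 1, 6, 1, 4), with NULL standing in for blank cells.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Work table: the autonumber id preserves the original row order.
conn.execute(
    "CREATE TABLE table1 (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "columnA INTEGER, columnB INTEGER, columnC INTEGER)"
)
conn.executemany(
    "INSERT INTO table1 (columnA, columnB, columnC) VALUES (?, ?, ?)",
    [(3, 0, 1), (3, 5, 6), (None, 1, 1), (None, 4, 4)],
)
# Final table: its own autonumber id records the order values arrive in.
conn.execute(
    "CREATE TABLE table2 (id INTEGER PRIMARY KEY AUTOINCREMENT, colA INTEGER)"
)

# One INSERT per source column, run left to right; blanks (NULLs) are skipped.
# The column names are fixed constants, so building the SQL with an f-string is safe.
for column in ("columnA", "columnB", "columnC"):
    conn.execute(
        f"INSERT INTO table2 (colA) SELECT {column} FROM table1 "
        f"WHERE {column} IS NOT NULL ORDER BY id"
    )

combined = [row[0] for row in conn.execute("SELECT colA FROM table2 ORDER BY id")]
print(combined)
```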