How to Quickly Flatten a SQL Table

I'm using Presto. If I have a table like:
ID CATEGORY VALUE
1 a ...
1 b
1 c
2 a
2 b
3 b
3 d
3 e
3 f
How would you convert it to the layout below without writing a CASE expression for each combination?
ID A B C D E F
1
2
3

I've never used Presto and the documentation seems pretty thin, but based on this article it looks like you could do:
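-- element_at returns NULL when an id has no row for a category;
-- the kv['x'] subscript raises an error on a missing key instead.
-- Keys are lowercase to match the CATEGORY values in the table.
SELECT
    id,
    element_at(kv, 'a') AS A,
    element_at(kv, 'b') AS B,
    element_at(kv, 'c') AS C,
    element_at(kv, 'd') AS D,
    element_at(kv, 'e') AS E,
    element_at(kv, 'f') AS F
FROM (
    SELECT id, map_agg(category, value) AS kv
    FROM vtable
    GROUP BY id
) t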
That said, I'd recommend doing this in the display layer if possible, since in SQL you have to list the columns explicitly. Most reporting tools and UI grids support some sort of dynamic pivoting that creates columns based on the source data.
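For illustration only, here is a minimal Python sketch of the same idea: collect one category-to-value map per id (the role map_agg plays above) and then read out a fixed list of columns. The numeric values are made up, because the question elides them, and the helper name pivot is hypothetical.

```python
from collections import defaultdict

# Sample rows (id, category, value); the values are invented for illustration.
rows = [
    (1, "a", 10), (1, "b", 20), (1, "c", 30),
    (2, "a", 40), (2, "b", 50),
    (3, "b", 60), (3, "d", 70), (3, "e", 80), (3, "f", 90),
]

def pivot(rows, columns):
    """Build one {category: value} map per id, then read out fixed columns."""
    maps = defaultdict(dict)  # one map per id, like map_agg ... GROUP BY id
    for row_id, category, value in rows:
        maps[row_id][category] = value
    # dict.get returns None for a missing category, which stands in for NULL
    return {
        row_id: [kv.get(col) for col in columns]
        for row_id, kv in sorted(maps.items())
    }

table = pivot(rows, ["a", "b", "c", "d", "e", "f"])
for row_id, cells in table.items():
    print(row_id, cells)
```

As in the SQL, the column list still has to be spelled out by hand; only the per-id lookup is generic.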

My 2 cents:
If you know the possible values in advance:
SELECT
    m['web'] AS web,
    m['shopping'] AS shopping,
    m['news'] AS news,
    m['music'] AS music,
    m['images'] AS images,
    m['videos'] AS videos,
    m[''] AS empty
FROM (
    SELECT histogram(data_tab) AS m
    FROM datahub
    WHERE
        year = 2017
        AND month = 5
        AND day = 7
        AND name = 'search'
) searches
No PIVOT function (yet)!
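As a rough analogue, the histogram approach can be sketched in Python with a Counter: count each value once, then pick out the known keys. The sample data_tab values below are hypothetical, not taken from the answer.

```python
from collections import Counter

# Hypothetical data_tab values for one day of 'search' events.
data_tab = ["web", "news", "web", "images", "", "web", "news"]

# value -> number of occurrences, similar in spirit to histogram(data_tab)
m = Counter(data_tab)

# The known values, in the same order as the SELECT list above.
known = ["web", "shopping", "news", "music", "images", "videos", ""]

# m.get returns None for a value that never occurred; '' is reported as 'empty'.
row = {(name or "empty"): m.get(name) for name in known}
print(row)
```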

Related

Search for the occurrence of a list of values

I'm trying to find an optimized way to identify if a specific set of values exists in a list.
For example, let's assume the following list of records in a table:
Id Value
1 A
2 B
3 A
4 C
5 A
6 B
7 C
8 C
9 A
I'm trying to find a way to check how many times the sequence {A, B} or {A, B, C} occurs, for example.
I know I can do this with cursors, but I was checking whether there's another option that would perform better.
The result I'd expect would be something like this:
{A, B}: 2 times
{A, B, C}: 1 time
I'm using SQL Server.
Probably the simplest way is to use the ANSI standard functions lag() and/or lead():
select count(*)
from (select t.*,
             lead(value) over (order by id) as next_value,
             lead(value, 2) over (order by id) as next_value2
      from t
     ) t
where value = 'A' and next_value = 'B' and next_value2 = 'C';
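To check the expected counts against the sample data, here is a small Python sketch of the same look-ahead comparison. It assumes the values are already ordered by Id; count_sequence is a hypothetical helper name.

```python
# The Value column from the question, ordered by Id 1..9.
values = ["A", "B", "A", "C", "A", "B", "C", "C", "A"]

def count_sequence(values, pattern):
    """Count the positions where `pattern` appears as consecutive values."""
    n = len(pattern)
    return sum(
        1
        for i in range(len(values) - n + 1)
        if values[i:i + n] == list(pattern)
    )

print(count_sequence(values, ["A", "B"]))       # prints 2
print(count_sequence(values, ["A", "B", "C"]))  # prints 1
```

Each slice comparison plays the role of one row of the lead() query: the current value plus the next one or two values.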

How to convert a tuple to a string in Pig?

I have data as
id company
1 (a,b)
2 (a,c)
3 (f,g,h)
company is a tuple; I generate it with BagToTuple(sortedbag.company) AS company.
I would like to remove the tuple formatting, so that the data looks like the following:
id company
1 a b
2 a c
3 f g h
I would like the company column to have no brackets and to be separated by spaces. Thanks.
===================update
I have the data set as
id company
1 a
1 b
1 a
2 c
2 a
I wrote the code as following:
record = load....
grp = GROUP record BY id;
newdata = FOREACH grp GENERATE group AS id,
COUNT(record) AS counts,
BagToTuple(record.company) AS company;
The output looks like:
id count company
1 3 (a,b,a)
2 2 (c,a)
But I would like company to be sorted and distinct, with no brackets, and separated by spaces.
The result I expect is as follows:
id count company
1 3 a b
2 2 a c
I think you can just replace BagToTuple with BagToString in the last step:
newdata2 = FOREACH grp
GENERATE group AS id, COUNT(record) as counts,
BagToString(record.company, ' ') as company:chararray;
STORE newdata2 into outdir using PigStorage('#');
After the script runs
$ cat outdir2/part-r-00000
1#3#a b a
2#2#a c
For a general tuple, if you don't want a UDF, you can use BagToString(TOBAG( your tuple )).
You can use the in-built FLATTEN() operator. http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#Flatten+Operator.
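Note that the BagToString output shown above ("1#3#a b a") still keeps duplicates, while the question asks for sorted, distinct values. The target transformation can be sketched in Python as follows; this is not Pig code, and summarize is a hypothetical helper name. In Pig, the equivalent would presumably mean applying DISTINCT and ORDER to the bag inside a nested FOREACH before calling BagToString, which is an assumption and is not tested here.

```python
from collections import defaultdict

# The (id, company) records from the question's update.
records = [(1, "a"), (1, "b"), (1, "a"), (2, "c"), (2, "a")]

def summarize(records):
    """Per id: total record count plus the sorted, distinct companies."""
    grouped = defaultdict(list)
    for rec_id, company in records:
        grouped[rec_id].append(company)
    return [
        # len() counts every record; set() removes duplicates before sorting
        (rec_id, len(companies), " ".join(sorted(set(companies))))
        for rec_id, companies in sorted(grouped.items())
    ]

for rec_id, count, company in summarize(records):
    print(rec_id, count, company)
```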

Trying to group items into different categories based on a specific field's value

I can't quite figure out how to put this into a simple question, so I'll explain what I have and what I need to do.
Table A
|..ItemNum..|..ItemUse..|..SubC..|..MainC..|
|..123..|..B..|..AAA..|..QQQ..|
|..456..|..J..|..BBB..|..QQQ..|
|..123..|..D..|..DDD..|..RRR..|
|..789..|..C..|..CCC..|..WWW..|
|..345..|..W..|..EEE..|..TTT..|
|..678..|..B..|..FFF..|..YYY..|
I need to make a list of ItemNum and MainC that are grouped into 3 categories:
B / C / D = 1
<anything else> = 2
B / C / D & <anything else> = 3
So my results would be:
|..MainC..|..Group..|
|..QQQ..|..3..|
|..RRR..|..1..|
|..WWW..|..1..|
|..TTT..|..2..|
|..YYY..|..1..|
I've got an IIf set up that takes care of groups 1 and 2, but I can't figure out how to get the values in MainC to come out with group 3.
Any ideas?
I don't understand your explanation (especially the mapping to 3), but here's a shot. The column is aliased grp because GROUP is a reserved word:
select MainC, case when ItemUse in ('B','C','D') then 1
                   else 2
              end as grp
from A
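One reading of the question that reproduces the expected output is that group 3 applies when a MainC has both a B/C/D row and a row with some other ItemUse. Under that assumption, the classification has to look at all rows of a MainC together, as in this Python sketch (classify is a hypothetical helper name):

```python
from collections import defaultdict

# Rows of Table A: (ItemNum, ItemUse, SubC, MainC)
rows = [
    (123, "B", "AAA", "QQQ"),
    (456, "J", "BBB", "QQQ"),
    (123, "D", "DDD", "RRR"),
    (789, "C", "CCC", "WWW"),
    (345, "W", "EEE", "TTT"),
    (678, "B", "FFF", "YYY"),
]

BCD = {"B", "C", "D"}

def classify(rows):
    """Map each MainC to 1 (only B/C/D), 2 (only other uses) or 3 (both)."""
    uses = defaultdict(set)
    for _item_num, item_use, _sub_c, main_c in rows:
        uses[main_c].add(item_use)
    result = {}
    for main_c, item_uses in uses.items():
        has_bcd = bool(item_uses & BCD)    # at least one B/C/D row
        has_other = bool(item_uses - BCD)  # at least one non-B/C/D row
        if has_bcd and has_other:
            result[main_c] = 3
        else:
            result[main_c] = 1 if has_bcd else 2
    return result

print(classify(rows))
```

In SQL the same idea would need a GROUP BY on MainC with two conditional aggregates, rather than a per-row CASE.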

SQL query for object with specific entries on a join table

I have a join table with the following columns:
target_id
assoc_id
int_attr
They represent my target object, its associated object, and an integer attribute that describes the association.
I am given a hash with keys representing the association attribute and values which contain the associated id with that attribute. For example:
{
1: [3, 5],
2: [7, 9],
}
I am trying to develop an SQL query which finds all target_ids with the appropriate join table entries. In the example above, it would find any target object with the 4 entries:
`targets_assocs`
target_id assoc_id int_attr
X 3 1
X 5 1
X 7 2
X 9 2
A 3 1
A 5 1
A 7 2
A 9 2
C 2 1
C 4 1
C 6 2
C 8 2
In this case, it would return X and A, ignoring C.
I was trying to use some type of HAVING clause. I am trying to avoid having multiple nested subqueries using IF EXISTS. Any thoughts or advice would be appreciated.
I haven't really written queries for MySQL, so sorry for any typos.
SELECT `target_id` FROM (
    SELECT `target_id`, COUNT(*) AS `attr_count` FROM (
        SELECT `targets_assocs`.`target_id`, `targets_assocs`.`int_attr` FROM `targets`
        INNER JOIN `targets_assocs` ON `targets_assocs`.`target_id` = `targets`.`id`
        GROUP BY `targets_assocs`.`target_id`, `targets_assocs`.`int_attr`
    ) AS `per_attr`
    GROUP BY `target_id`
    HAVING COUNT(*) = 2
) AS `matches`
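The query above counts distinct int_attr values per target but does not check the specific assoc_id values from the hash. As a sketch of the full requirement, the Python below treats it as a subset test: a target matches when its (assoc_id, int_attr) pairs include every required pair. The helper name matching_targets is hypothetical. In SQL this would roughly correspond to filtering rows to the wanted pairs, grouping by target_id, and requiring HAVING COUNT(*) = 4, assuming the join table has no duplicate rows.

```python
from collections import defaultdict

# The hash from the question: int_attr -> required assoc_ids.
required = {1: [3, 5], 2: [7, 9]}

# Rows of targets_assocs: (target_id, assoc_id, int_attr)
targets_assocs = [
    ("X", 3, 1), ("X", 5, 1), ("X", 7, 2), ("X", 9, 2),
    ("A", 3, 1), ("A", 5, 1), ("A", 7, 2), ("A", 9, 2),
    ("C", 2, 1), ("C", 4, 1), ("C", 6, 2), ("C", 8, 2),
]

def matching_targets(rows, required):
    """Return target_ids that have every required (assoc_id, int_attr) pair."""
    needed = {
        (assoc_id, int_attr)
        for int_attr, assoc_ids in required.items()
        for assoc_id in assoc_ids
    }
    have = defaultdict(set)
    for target_id, assoc_id, int_attr in rows:
        have[target_id].add((assoc_id, int_attr))
    # needed <= pairs is a subset test: all required pairs are present
    return sorted(t for t, pairs in have.items() if needed <= pairs)

print(matching_targets(targets_assocs, required))  # prints ['A', 'X']
```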

Help with select query

I have a table with 2 columns:
ID Name
1 A
1 B
1 C
1 D
2 E
2 F
And I want to write a query that produces output like:
1 A B C D
2 E F
Is it possible?
You want a pivot, which is easy in Excel but (I believe) takes quite a bit of work in SQL Server, because it is hard to know in advance how many columns you need. You could construct the SQL dynamically based on a max() aggregate, I suppose.
Start looking here
A good article to look at:
http://www.simple-talk.com/sql/t-sql-programming/creating-cross-tab-queries-and-pivot-tables-in-sql/
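If one delimited string per ID is acceptable instead of truly separate columns (an assumption, since the question does not say), the grouping itself is simple. A minimal Python sketch, with names_by_id as a hypothetical helper name:

```python
from collections import defaultdict

# The (ID, Name) rows from the question.
rows = [(1, "A"), (1, "B"), (1, "C"), (1, "D"), (2, "E"), (2, "F")]

def names_by_id(rows):
    """Collect the names for each ID, in input order, joined by spaces."""
    grouped = defaultdict(list)
    for row_id, name in rows:
        grouped[row_id].append(name)
    return {row_id: " ".join(names) for row_id, names in sorted(grouped.items())}

for row_id, names in names_by_id(rows).items():
    print(row_id, names)
```

This sidesteps the unknown column count that makes a real pivot awkward, at the cost of returning one string rather than one column per name.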