Conditional SELECT depending on a set of rules - sql

I need to get data from different columns depending on a set of rules and I don't see how to do it. Let me illustrate this with an example. I have a table:
ID ELEM_01 ELEM_02 ELEM_03
---------------------------------
1 0.12 0 100
2 0.14 5 200
3 0.16 10 300
4 0.18 15 400
5 0.20 20 500
And I have a set of rules which look something like this:
P1Z: ID=2 and ELEM_01
P2Z: ID=4 and ELEM_03
P3Z: ID=4 and ELEM_02
P4Z: ID=3 and ELEM_03
I'm trying to output the following:
P1Z P2Z P3Z P4Z
------------------------
0.14 400 15 300
I'm used to much simpler queries and this is a bit above my level. I'm getting mixed up by this problem and I don't see a straightforward solution. Any pointers would be appreciated.
EDIT Logic behind the rules: the table contains data about different aspects of a piece of equipment. Each combination of ID/ELEM_** represents the value of one aspect of the piece of equipment. The table contains all values of all aspects, but we want a row containing data on only a specific subset of aspects, so that we can output in a single table the values of a specific subset of aspects for all pieces of equipment.

Assuming that each column is numeric and ID is unique you could do:
SELECT
SUM(CASE WHEN ID = 2 THEN ELEM_01 END) AS P1Z,
SUM(CASE WHEN ID = 4 THEN ELEM_03 END) AS P2Z,
SUM(CASE WHEN ID = 4 THEN ELEM_02 END) AS P3Z,
SUM(CASE WHEN ID = 3 THEN ELEM_03 END) AS P4Z
...
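For anyone who wants to try this out, here is a minimal sketch that runs the same conditional aggregation against the sample data from the question. It uses Python's built-in sqlite3 module as a stand-in database, and the table name `equipment` is an assumption, since the question never names the table.

```python
import sqlite3

# Hypothetical table name "equipment"; the rows are the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE equipment (ID INTEGER PRIMARY KEY, ELEM_01 REAL, ELEM_02 REAL, ELEM_03 REAL)"
)
conn.executemany(
    "INSERT INTO equipment VALUES (?, ?, ?, ?)",
    [(1, 0.12, 0, 100), (2, 0.14, 5, 200), (3, 0.16, 10, 300),
     (4, 0.18, 15, 400), (5, 0.20, 20, 500)],
)

# Each CASE yields a value only for the row its rule names and NULL otherwise.
# SUM ignores NULLs, so every output column collapses to that single value.
query = """
SELECT
    SUM(CASE WHEN ID = 2 THEN ELEM_01 END) AS P1Z,
    SUM(CASE WHEN ID = 4 THEN ELEM_03 END) AS P2Z,
    SUM(CASE WHEN ID = 4 THEN ELEM_02 END) AS P3Z,
    SUM(CASE WHEN ID = 3 THEN ELEM_03 END) AS P4Z
FROM equipment
"""
row = conn.execute(query).fetchone()
print(row)  # (0.14, 400.0, 15.0, 300.0)
```

MAX or MIN would work just as well as SUM here, because only one row per rule ever produces a non-NULL value.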

Related

Need help combining multiple row data from same field, into multiple columns with alias

I have this table (see below) with rows from same "id" but different "codes" Capital/Expense as shown below.
I need to generate output that sums the "value" for capital codes and expense codes in two separate columns. I tried using WHERE and aliases but ran into problems.
My data view table name: budgetcell_view
My data file is as shown below:
projectid  code       coodename  fieldtypename  value
6          01-00-000  capital    cost1          325000
6          02-00-000  expense    cost1          250000
7          01-00-000  capital    cost1          200000
7          02-00-000  expense    cost1          125000
8          01-00-000  capital    cost1          400000
8          02-00-000  expense    cost1          210000
9          01-00-000  capital    cost1          550000
9          02-00-000  expense    cost1          330000
my desired output is below.... any help will be appreciated:
projectid  capital_value  expense_value
6          325000         250000
7          200000         125000
8          400000         210000
9          550000         330000
In this specific case, you can simply create such a query using a case when construct:
SELECT projectid,
SUM(CASE WHEN code = '01-00-000' THEN value ELSE 0 END) capital_value,
SUM(CASE WHEN code != '01-00-000' THEN value ELSE 0 END) expense_value
FROM table1
GROUP BY projectid ORDER BY projectid;
Please see the working example here: db<>fiddle
Please note that you may need to extend or modify this query if there are more categories than just capital and expense; that's why I asked you to provide the entire table data.
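As a rough illustration of that caveat, the sketch below loads the sample rows into an in-memory SQLite database (via Python's sqlite3 module, used here only as a convenient stand-in) and matches each category explicitly instead of treating everything that is not capital as expense. Matching on the `coodename` column is my assumption about how the categories are distinguished; the table name `table1` follows the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (projectid INTEGER, code TEXT, coodename TEXT, "
    "fieldtypename TEXT, value INTEGER)"
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?, ?)",
    [(6, "01-00-000", "capital", "cost1", 325000),
     (6, "02-00-000", "expense", "cost1", 250000),
     (7, "01-00-000", "capital", "cost1", 200000),
     (7, "02-00-000", "expense", "cost1", 125000),
     (8, "01-00-000", "capital", "cost1", 400000),
     (8, "02-00-000", "expense", "cost1", 210000),
     (9, "01-00-000", "capital", "cost1", 550000),
     (9, "02-00-000", "expense", "cost1", 330000)],
)

# One explicit condition per output column, so a third category (if one ever
# appears) is not silently counted as an expense.
query = """
SELECT projectid,
       SUM(CASE WHEN coodename = 'capital' THEN value ELSE 0 END) AS capital_value,
       SUM(CASE WHEN coodename = 'expense' THEN value ELSE 0 END) AS expense_value
FROM table1
GROUP BY projectid
ORDER BY projectid
"""
result = conn.execute(query).fetchall()
for r in result:
    print(r)
```

On this sample the result matches the desired output in the question, one row per projectid.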

Update query just showing zero values when there exists non-zero values. (ACCESS)

I have been struggling with this for hours. I am trying to update all values that have the same 'SHORT#'. If the 'SHORT#' is in 017_PolWpart2 I want this to be the value that updates the corresponding 'SHORT#' in 017_WithdrawalsYTD_changelater. This update query is just displaying zeroes, but these values are in fact non-zero.
So say 017_WithdrawalsYTD_changelater looks like this:
SHORT# WithdrawalsYTD
1 0
2 0
3 0
4 0
5 0
and 017_PolWpart2 looks like this:
SHORT# Sum_MTD_AGG
3 50
5 12
I want this:
SHORT# WithdrawalsYTD
1 0
2 0
3 50
4 0
5 12
But I get this:
SHORT# WithdrawalsYTD
1 0
2 0
3 0
4 0
5 0
I have attached the SQL for the Query below.
Thanks!
UPDATE 017_WithdrawalsYTD_changelater
INNER JOIN 017b_PolWpart2 ON [017_WithdrawalsYTD_changelater].[SHORT#] =
[017b_PolWpart2].[SHORT#]
SET [017_WithdrawalsYTD_changelater].WithdrawalsYTD = [017b_PolWpart2].[Sum_MTD_AGG];
EDIT:
As I must aggregate on the fly, I have tried to do so. Still getting all kinds of errors. Note the table 017a_PolicyWithdrawalMatch is of the form:
SHORT# MTD_AGG WithdrawalPeriod PolDurY
1 3 1 1
1 5 1 0
2 2 1 1
2 22 1 1
So I aggregate:
SHORT# MTD_AGG
1 3
2 24
And put these aggregated values in 017_WithdrawalsYTD_changelater.
I tried to do this like so:
SELECT [017a_PolicyWithdrawalMatch].[SHORT#], Sum([017a_PolicyWithdrawalMatch].MTD_AGG) AS Sum_MTD_AGG
WHERE ((([017a_PolicyWithdrawalMatch].WithdrawalPeriod)=[017a_PolicyWithdrawalMatch].[PolDurY]))
GROUP BY [017a_PolicyWithdrawalMatch].[SHORT#]
UPDATE 017_WithdrawalsYTD_changelater INNER JOIN 017a_PolicyWithdrawalMatch ON [017_WithdrawalsYTD_changelater].[SHORT#] = [017a_PolicyWithdrawalMatch].[SHORT#] SET 017_WithdrawalsYTD_changelater.WithdrawalsYTD =Sum_MTD_AGG;
I am getting no luck... I get told SELECT statement is using a reserved word... :(
Consider heeding @June7's comments and avoid saving aggregate data in a table: it uses storage redundantly, since such data can easily be queried in real time. Plus, such aggregate values immediately become historical figures once they are saved inside a static table.
In MS Access, update queries must be sourced from updateable objects. Aggregate queries are read-only, so they do not qualify and cannot be used in UPDATE statements.
However, if you really, really, really need to store aggregate data, consider using domain functions such as DSUM inside the UPDATE. Below assumes SHORT# is a string column.
UPDATE [017_WithdrawalsYTD_changelater] c
SET c.WithdrawalsYTD = DSUM("MTD_AGG", "[017a_PolicyWithdrawalMatch]",
"[SHORT#] = '" & c.[SHORT#] & "' AND WithdrawalPeriod = [PolDurY]")
Nonetheless, the aggregate value can be queried and refreshed to current values as needed. Also, notice the use of table aliases to reduce length of long table names:
SELECT m.[SHORT#], SUM(m.MTD_AGG) AS Sum_MTD_AGG
FROM [017a_PolicyWithdrawalMatch] m
WHERE m.WithdrawalPeriod = m.[PolDurY]
GROUP BY m.[SHORT#]
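To check that the on-demand query gives the totals the question expects, here is a small sketch that runs it on the four sample rows. It uses SQLite through Python's sqlite3 module purely as a stand-in (SQLite happens to accept the same bracket-quoted names); it is not Access, and DSUM itself is not available there. Storing SHORT# as text follows the assumption stated above, and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Bracket-quoted identifiers are accepted by SQLite as well as by Access.
conn.execute(
    "CREATE TABLE [017a_PolicyWithdrawalMatch] "
    "([SHORT#] TEXT, MTD_AGG INTEGER, WithdrawalPeriod INTEGER, PolDurY INTEGER)"
)
conn.executemany(
    "INSERT INTO [017a_PolicyWithdrawalMatch] VALUES (?, ?, ?, ?)",
    [("1", 3, 1, 1), ("1", 5, 1, 0), ("2", 2, 1, 1), ("2", 22, 1, 1)],
)

# Same aggregate as above: only rows where WithdrawalPeriod equals PolDurY count.
totals = conn.execute(
    """
    SELECT m.[SHORT#], SUM(m.MTD_AGG) AS Sum_MTD_AGG
    FROM [017a_PolicyWithdrawalMatch] m
    WHERE m.WithdrawalPeriod = m.[PolDurY]
    GROUP BY m.[SHORT#]
    ORDER BY m.[SHORT#]
    """
).fetchall()
print(totals)  # [('1', 3), ('2', 24)]
```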

Grouping together query counts into as few queries as possible in SQLite?

I'm not a database expert but I've inherited this SQLite database I have to work with. It contains tags, images and events. An event contains a number of images and an image contains a number of tags (the tags describe the image content e.g. coffee, phone, laptop, etc.).
The table structure looks something like this:
row_id tags image_id event_id
1 computer 1 1
2 desk 1 1
3 chair 1 1
4 computer 2 1
5 coffee 2 1
6 desk 2 1
7 dog 3 2
8 phone 3 2
etc. etc. etc. etc. // many 1000's
The users of our system used to search for images by choosing some tags and we had a very simple query which returned a ranked list favoring images containing the most tags. It looked like this:
SELECT image_id
FROM TagsTable
WHERE tags
IN ('computer', 'desk', 'chair') // user variables
GROUP BY image_id
ORDER BY COUNT(image_id) DESC
But now we want to return a list of the events (which I need to rank) instead of individual images. I can achieve this by doing many queries in a loop but it's very slow. Ideally I'm trying to produce the following information in as few queries as possible.
So if the user searched for 'computer', 'desk' and 'chair', you would get...
event_id computer_count desk_count chair_count event_image_count
1 12 15 9 56
2 22 0 13 24
3 14 7 0 32
etc. etc. etc. etc. etc.
// no results if all tag counts are 0
So at a glance we can see event 1 contains a total of 56 images and the tag 'computer' appears 12 times, 'desk' appears 15 times and 'chair' appears 9 times.
Is this possible using just SQL or do I need to perform multiple queries? Please note I am using SQLite.
You can answer this specific question using conditional aggregation:
SELECT event_id,
SUM(CASE WHEN tags = 'computer' THEN 1 ELSE 0 END) as computer_count,
SUM(CASE WHEN tags = 'desk' THEN 1 ELSE 0 END) as desk_count,
SUM(CASE WHEN tags = 'chair' THEN 1 ELSE 0 END) as chair_count,
COUNT(DISTINCT image_id) as image_count
FROM TagsTable
WHERE tags IN ('computer', 'desk', 'chair')
GROUP BY event_id;
EDIT:
To add an "average" column:
SELECT . . .
SUM(CASE WHEN tags IN ('computer', 'desk', 'chair') THEN 1.0 ELSE 0 END) / 3 as tag_average
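Here is a minimal sketch of the conditional aggregation query run against the eight sample rows from the question, using Python's built-in sqlite3 module. The table and column names come from the question; nothing else is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TagsTable (row_id INTEGER PRIMARY KEY, tags TEXT, "
    "image_id INTEGER, event_id INTEGER)"
)
conn.executemany(
    "INSERT INTO TagsTable VALUES (?, ?, ?, ?)",
    [(1, "computer", 1, 1), (2, "desk", 1, 1), (3, "chair", 1, 1),
     (4, "computer", 2, 1), (5, "coffee", 2, 1), (6, "desk", 2, 1),
     (7, "dog", 3, 2), (8, "phone", 3, 2)],
)

# One SUM(CASE ...) per searched tag; the WHERE clause drops events
# that contain none of the searched tags.
query = """
SELECT event_id,
       SUM(CASE WHEN tags = 'computer' THEN 1 ELSE 0 END) AS computer_count,
       SUM(CASE WHEN tags = 'desk' THEN 1 ELSE 0 END) AS desk_count,
       SUM(CASE WHEN tags = 'chair' THEN 1 ELSE 0 END) AS chair_count,
       COUNT(DISTINCT image_id) AS image_count
FROM TagsTable
WHERE tags IN ('computer', 'desk', 'chair')
GROUP BY event_id
ORDER BY event_id
"""
events = conn.execute(query).fetchall()
print(events)  # [(1, 2, 2, 1, 2)]
```

Event 2 has only 'dog' and 'phone', so it does not appear at all. Note that because of the WHERE clause, image_count counts only images carrying at least one of the searched tags; if you need the total number of images in the event, that figure would have to be computed without the filter.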

Converting Column Headers to Row elements

I have 2 tables I am combining and that works but I think I designed the second table wrong as I have a column for each item of what really is a multiple choice question. The query is this:
select Count(n.ID) as MemCount, u.Pay1Click, u.PayMailCC, u.PayMailCheck, u.PayPhoneACH, u.PayPhoneCC, u.PayWuFoo
from name as n inner join
UD_Demo_ORG as u on n.ID = u.ID
where n.MEMBER_TYPE like 'ORG_%' and n.CATEGORY not like '%_2' and
(u.Pay1Click = '1' or u.PayMailCC = '1' or u.PayMailCheck = '1' or u.PayPhoneACH = '1' or u.PayPhoneCC = '1' or u.PayWuFoo = '1')
group by u.Pay1Click, u.PayMailCC, u.PayMailCheck, u.PayPhoneACH, u.PayPhoneCC, u.PayWuFoo
The results come up like this:
Count Pay1Click PayMailCC PayMailCheck PayPhoneACH PayPhoneCC PayWuFoo
8 0 0 0 0 0 1
25 0 0 0 0 1 0
8 0 0 0 1 0 0
99 0 0 1 0 0 0
11 0 1 0 0 0 0
So the question is, how can I get this down to 2 columns, Count and the name of whichever of the 6 payment columns is set, so the results look like this:
Count PaymentType
8 PayWuFoo
25 PayPhoneCC
8 PayPhoneACH
99 PayMailCheck
11 PayMailCC
Thanks.
Try this one
SELECT Count,
CASE WHEN Pay1Click = 1 THEN 'Pay1Click'
WHEN PayMailCC = 1 THEN 'PayMailCC'
WHEN PayMailCheck = 1 THEN 'PayMailCheck'
WHEN PayPhoneACH = 1 THEN 'PayPhoneACH'
WHEN PayPhoneCC = 1 THEN 'PayPhoneCC'
WHEN PayWuFoo = 1 THEN 'PayWuFoo'
END AS PaymentType
FROM ......
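A quick way to see the CASE expression at work is the sketch below, which uses SQLite through Python's sqlite3 module. The table name `payments` and its four rows are made up for illustration (they are not the asker's data), and the counting is done in an outer query over a derived table rather than over the original two-table join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with one 0/1 flag column per payment type.
conn.execute(
    "CREATE TABLE payments (Pay1Click INTEGER, PayMailCC INTEGER, PayMailCheck INTEGER, "
    "PayPhoneACH INTEGER, PayPhoneCC INTEGER, PayWuFoo INTEGER)"
)
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?, ?, ?, ?)",
    [(0, 0, 0, 0, 0, 1), (0, 0, 0, 0, 0, 1), (0, 0, 0, 0, 1, 0), (0, 1, 0, 0, 0, 0)],
)

# The inner query turns the flag columns into a single label per row;
# the outer query counts rows per label.
query = """
SELECT PaymentType, COUNT(*) AS MemCount
FROM (
    SELECT CASE WHEN Pay1Click = 1 THEN 'Pay1Click'
                WHEN PayMailCC = 1 THEN 'PayMailCC'
                WHEN PayMailCheck = 1 THEN 'PayMailCheck'
                WHEN PayPhoneACH = 1 THEN 'PayPhoneACH'
                WHEN PayPhoneCC = 1 THEN 'PayPhoneCC'
                WHEN PayWuFoo = 1 THEN 'PayWuFoo'
           END AS PaymentType
    FROM payments
) AS labelled
GROUP BY PaymentType
ORDER BY PaymentType
"""
counts = conn.execute(query).fetchall()
print(counts)  # [('PayMailCC', 1), ('PayPhoneCC', 1), ('PayWuFoo', 2)]
```

Keep in mind that CASE returns the first matching branch, so a row with more than one flag set would be counted only under the first type listed.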
I think you did indeed make a mistake in the structure of the second table. Instead of creating a column for each alternative of the multiple choice question, I would suggest collapsing all those columns into a single 'answer' column, so that the name of the chosen alternative is the value stored in that column.
But for this, you have to change the structure of your tables and the way the table is populated: you should take the name of the alternative that was checked and store it in your table.
Beyond that, watch out for repetitive data: writing the same string over and over makes the table grow larger.
If other information in the UD_Demo_ORG table depends on the answer, you can normalize the table by creating a payment_dimension table or something similar, and give your alternatives an ID, such as
ID PaymentType OtherInfo(description, etc)...
1 PayWuFoo ...
2 PayPhoneCC ...
3 PayPhoneACH ...
4 PayMailCheck ...
5 PayMailCC ...
This is called a dimension table. Your records would then hold the ID of the payment type rather than information you don't need.
So instead of a big result set, you could simplify your query considerably and get just
Count PaymentId
8 1
25 2
8 3
99 4
11 5
as a result set. It would make the query faster too, and if you need other information, you can then join the dimension table to get it.
BUT if the only field you would have is the name, perhaps you could use the PaymentType itself as the "id" in this case; just consider it. Separating it into a dimension table keeps the design scalable.
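To make the suggestion a bit more concrete, here is a minimal sketch of what such a layout could look like, run in SQLite through Python's sqlite3 module. The `payment_dimension` name comes from the text above; the `member_payment` table, its columns, and the member rows are hypothetical and only there to show the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Dimension table: one row per payment type.
conn.execute(
    "CREATE TABLE payment_dimension (ID INTEGER PRIMARY KEY, PaymentType TEXT NOT NULL)"
)
# Hypothetical fact table: each member stores only the ID of the chosen payment type.
conn.execute(
    "CREATE TABLE member_payment (MemberID INTEGER PRIMARY KEY, "
    "PaymentID INTEGER REFERENCES payment_dimension(ID))"
)
conn.executemany(
    "INSERT INTO payment_dimension VALUES (?, ?)",
    [(1, "PayWuFoo"), (2, "PayPhoneCC"), (3, "PayPhoneACH"),
     (4, "PayMailCheck"), (5, "PayMailCC")],
)
conn.executemany(
    "INSERT INTO member_payment VALUES (?, ?)",
    [(101, 1), (102, 1), (103, 2), (104, 4), (105, 4), (106, 4)],
)

# Counting is a plain GROUP BY; the join is needed only to show the name.
query = """
SELECT COUNT(*) AS MemCount, d.PaymentType
FROM member_payment AS m
JOIN payment_dimension AS d ON d.ID = m.PaymentID
GROUP BY d.ID, d.PaymentType
ORDER BY d.ID
"""
summary = conn.execute(query).fetchall()
print(summary)  # [(2, 'PayWuFoo'), (1, 'PayPhoneCC'), (3, 'PayMailCheck')]
```

With this layout the six-way OR and the six-column GROUP BY from the original query are no longer needed.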
Some references for further reading:
http://beginnersbook.com/2015/05/normalization-in-dbms/ "Normalization in DBMS"
http://searchdatamanagement.techtarget.com/answer/What-are-the-differences-between-fact-tables-and-dimension-tables-in-star-schemas "Differences between fact tables and dimensions tables"

Hive: Create rows with summed data, by date (unknown number of dates)

I am currently working with a Hive Table which contains transactions data and I need to do some basic statistics on these data, and put the results in a new table.
EDIT: I'm using Hive 0.13 on Hadoop 2.4.1.
CONTEXT
First, let me try to present the input table: here's a table with 3 columns, an ID, a date (month/year), and an amount:
<ID> <Date> <Amount>
1 11.2014 5.00
2 11.2014 10.00
3 12.2014 15.00
1 12.2014 7.00
1 12.2014 15.00
2 01.2015 20.00
3 01.2015 30.00
3 01.2015 45.00
... ... ...
And the desired output consists of a table grouped by ID, where each line sums the amounts for each corresponding month:
<ID> <11.2014> <12.2014> <01.2015> <...>
1 5.00 22.00 0.00 ...
2 10.00 0.00 20.00 ...
3 0.00 15.00 75.00 ...
... ... ... ... ...
The original table has more than 4 million IDs and more than 500 million lines, spanning more than 2 years, so it seems pretty hard to hardcode the columns by hand since I don't know how many columns I should create.
(I know how many different dates I have now, but if the original table grows over 5, 10, or 15 years, there will be a lot to do by hand, and that's risky.)
THE CHALLENGE
I know how to do some basic manipulations and GROUP BYs, and I can even do some CASE WHEN, but the tricky part of my problem is that I cannot create columns like this (as mentioned above)...
SUM (CASE WHEN Date = 11.2014 THEN Amount ELSE 0 END) AS 11.2014
SUM (CASE WHEN Date = 12.2014 THEN Amount ELSE 0 END) AS 12.2014
SUM (CASE WHEN Date = 01.2015 THEN Amount ELSE 0 END) AS 01.2015
SUM (CASE WHEN Date = ??? THEN Amount ELSE 0 END) AS ???
... because I don't know how many different dates I'll eventually have, so I would need something like this:
SUM (CASE WHEN Date = [loop over each dates] THEN Amount ELSE 0 END)
AS [the date selected in the loop]
THE QUESTION
Do you have anything to propose for the following?
How can I loop over all the dates?
How can I create a column for every date I have without specifying the names of the new columns myself?
Is it doable in a single HiveQL script? (Not mandatory, but it would be really nice.)
I would like to avoid a UDF, but at this point I'm not sure that's avoidable since I haven't found any case that resembles mine.
Thanks in advance and don't hesitate to ask for more info.
This is too long for a comment.
You cannot do exactly what you want in Hive, because a SQL query has to have a fixed number of columns when it is defined.
What can you do?
The easiest thing is simply to change what you want. Produce multiple rows instead of multiple columns:
select id, date, sum(amount)
from table t
group by id, date;
You can then load the data into your favorite spreadsheet and pivot it there.
Other alternatives: you can write a query that generates the appropriate query. This would go through the table, identify the possible dates, and construct a SQL statement. You can then run that SQL statement.
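The "query that writes the query" idea can be sketched roughly as follows. This is only an illustration in Python with SQLite standing in for Hive: the table name `transactions` and the helper `build_pivot_sql` are hypothetical, and a generated HiveQL script would need Hive's own identifier quoting (backticks) for the column aliases.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER, date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(1, "11.2014", 5.0), (2, "11.2014", 10.0), (3, "12.2014", 15.0),
     (1, "12.2014", 7.0), (1, "12.2014", 15.0), (2, "01.2015", 20.0),
     (3, "01.2015", 30.0), (3, "01.2015", 45.0)],
)

def build_pivot_sql(dates):
    """Build one SUM(CASE ...) column per date (hypothetical helper)."""
    cols = []
    for d in dates:
        # Values are spliced into the SQL text, so accept only the MM.YYYY shape.
        if not re.fullmatch(r"\d{2}\.\d{4}", d):
            raise ValueError("unexpected date value: %r" % d)
        cols.append(
            "SUM(CASE WHEN date = '{0}' THEN amount ELSE 0 END) AS \"{0}\"".format(d)
        )
    return ("SELECT id,\n       " + ",\n       ".join(cols)
            + "\nFROM transactions\nGROUP BY id\nORDER BY id")

# Step 1: identify the dates present, sorted chronologically (year, then month).
dates = [r[0] for r in conn.execute("SELECT DISTINCT date FROM transactions")]
dates.sort(key=lambda d: (d[3:], d[:2]))

# Step 2: generate the statement, then run it.
pivot_sql = build_pivot_sql(dates)
pivot = conn.execute(pivot_sql).fetchall()
for r in pivot:
    print(r)
```

The generated statement still has a fixed column list at the moment it runs; it just gets regenerated whenever new dates show up.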
Or, you could use some other data types, such as a list or JSON to store the aggregated values in one row.
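As a rough sketch of the one-row-per-ID idea, the snippet below aggregates in long form first and then folds each ID's dates into a single JSON value. It uses Python with SQLite as a stand-in; the table name `transactions` is hypothetical, and how you would build the equivalent structure inside Hive itself is left open here.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER, date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(1, "11.2014", 5.0), (2, "11.2014", 10.0), (3, "12.2014", 15.0),
     (1, "12.2014", 7.0), (1, "12.2014", 15.0), (2, "01.2015", 20.0),
     (3, "01.2015", 30.0), (3, "01.2015", 45.0)],
)

# Long form: one row per (id, date), which works for any number of dates.
long_rows = conn.execute(
    "SELECT id, date, SUM(amount) FROM transactions GROUP BY id, date"
).fetchall()

# Fold the long rows into one date -> total mapping per id.
per_id = {}
for id_, date, total in long_rows:
    per_id.setdefault(id_, {})[date] = total

# One JSON document per id; dates with no transactions are simply absent.
json_rows = {id_: json.dumps(totals, sort_keys=True) for id_, totals in per_id.items()}
for id_ in sorted(json_rows):
    print(id_, json_rows[id_])
```

The trade-off is that missing months are not filled with zeros, so whoever consumes the data has to treat an absent key as 0.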