SQL WHERE Help: Pulling Data Multiple Rows - sql

I want to pull all rows where a user has Color = blue and Color = red. I am interested in pulling these multiple rows to determine which users CHANGED their Color from blue to red, or from red to blue.
The general query I have now is below. What is wrong with it, and how can I improve it? Thank you!
Does this return zero results because I am asking that a single row's value be BOTH blue and red at the same time (which is impossible)?
My other worry is that if I use OR instead of AND, I will include rows for users whose color is blue or red but who did NOT change between the two colors.
I want the results to show ONLY rows 1 and 4.
SELECT *
FROM Table a
WHERE a.color='blue'
AND a.color='red'
The table structure is below:
Row | Date | Userid | Session | Color
1 | 11/1 | 001 | 24 | Blue
2 | 11/2 | 002 | 25 | Green
3 | 11/2 | 003 | 26 | Yellow
4 | 11/6 | 001 | 32 | Red

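The glaring problem is that no single row can be both blue and red, so the AND condition never matches. Using OR returns the blue rows and the red rows: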
SELECT *
FROM Table a
WHERE a.color='Blue'
OR a.color='Red'
Beyond that, you will need to store the previous color in its own field (a kind of history), because otherwise there is not enough information in the database to properly determine which color a user changed from.
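If it is enough to know that a user has both a Blue row and a Red row (without storing the direction of the change), a sketch along the following lines may work. Python's sqlite3 module is used here only to make the SQL runnable; the table name user_colors and the column name row_id are hypothetical stand-ins for the question's Table and Row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table mirroring the sample data in the question.
conn.execute(
    "CREATE TABLE user_colors "
    "(row_id INTEGER, date TEXT, userid TEXT, session INTEGER, color TEXT)"
)
conn.executemany(
    "INSERT INTO user_colors VALUES (?, ?, ?, ?, ?)",
    [
        (1, "11/1", "001", 24, "Blue"),
        (2, "11/2", "002", 25, "Green"),
        (3, "11/2", "003", 26, "Yellow"),
        (4, "11/6", "001", 32, "Red"),
    ],
)

# Keep Blue/Red rows only for users that have BOTH colors somewhere in the table.
query = """
SELECT a.row_id, a.userid, a.color
FROM user_colors a
WHERE a.color IN ('Blue', 'Red')
  AND a.userid IN (
      SELECT userid
      FROM user_colors
      WHERE color IN ('Blue', 'Red')
      GROUP BY userid
      HAVING COUNT(DISTINCT color) = 2
  )
ORDER BY a.row_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

With the sample data this keeps only rows 1 and 4. Note that the string comparison may be case-sensitive depending on the database, so 'blue' and 'Blue' are not necessarily interchangeable.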

Related

How to select or view only the top row in airtable?

I have this table:
Name | Weight | Color
1 Cherry | 1 | Red
2 Apple | 4 | Green
3 Pear | 3 | Yellow
I need a view in which only the top row is visible
Cherry | 1 | Red
When the table changes (new record, sorting), the view changes accordingly
Example 1:
Name | Weight V| Color
1 Apple | 4 | Green
2 Pear | 3 | Yellow
3 Cherry | 1 | Red
single row view:
Apple | 4 | Green
Example 2:
Name | Weight | Color
1 Almond | 0.5 | Brown
2 Apple | 4 | Green
3 Pear | 3 | Yellow
4 Cherry | 1 | Red
single row view:
Almond | 0.5 | Brown
This doesn't seem possible; I didn't find anything in related forums.
GPT-3 suggested selecting a record by a row_id or time_of_creation field, but this won't help when the table is re-sorted.
It also suggested using SELECT(table_name, {}, {limit: N, fields: ["field_1", "field_2"]}), but limit does not work. The same goes for FIRST(), which doesn't exist.
Is there any solution to this?
Try a record list. You can sort the elements there as well as limit how many you want to list.
I guess the other option would be scripting, and just "limiting" what the query returns when displaying it (which doesn't sound like what you want). I am not sure there is a simple native way.
In the end, I think record list limits are the closest to what you are trying to achieve, especially since your main criterion for a top row is sorting.

Create and display table column hierarchy in Tableau

My table currently has a number of similar numerical columns I'd like to nest under a common label.
My current table is something like:
| Week | Seller count, total | Seller count, churned | Seller count, resurrected |
| ---- | ------------------- | --------------------- | ------------------------- |
| 1 | 100 | 10 | 4 |
| 2 | 105 | 12 | 5 |
And I'd like it to be:
| | Seller count |
| Week | Total | Churned | Resurrected |
| ---- | ----- | ------- | ----------- |
| 1 | 100 | 10 | 4 |
| 2 | 105 | 12 | 5 |
I've seen examples of this, including a related instructional video, but this video hides the actual creation of the nested object (called "Segment").
I also tried creating a hierarchy by dragging items in the "Data" tab on top of one another. This function appears to only be possible for dimensions (categorical data), not measures (numerical data) like mine.
Even so, I can drag my column names from the measures side onto the dimensions side to get them to be considered dimensions. Then I can drag to nest and create the hierarchy. But then when I drag the top item of the hierarchy ("Seller count" in the example below) into the "Columns" field, I get the warning "the field being added contains 92,000 members, and maximum recommended is 1,000". It thinks this is categorical data, and is maybe planning to create a subheading for each value (100, 105, etc.), instead of the desired hierarchy sub-items as subheadings.
Any idea how to accomplish this simple hierarchical restructuring of my column labels?
Actually, this is a data restructuring task, and Tableau isn't best suited for it. Still, it is a simple one, and you can do it like this:
I recreated a table like yours in Excel and imported it into Tableau.
Rename the three columns (remove "Seller count" from their names).
Select these three columns at once and choose Pivot to turn them into name/value rows.
Rename the resulting pivot columns.
Create a text table in Tableau, as shown in your question.
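To illustrate what the Pivot step does to the shape of the data, here is a small plain-Python sketch (it is not Tableau code). The function name pivot_longer and the output column names "Seller count" and "Value" are made up for the example; Tableau itself labels the pivoted columns "Pivot Field Names" and "Pivot Field Values" until you rename them.

```python
# Wide table after renaming the three measure columns.
wide = [
    {"Week": 1, "Total": 100, "Churned": 10, "Resurrected": 4},
    {"Week": 2, "Total": 105, "Churned": 12, "Resurrected": 5},
]


def pivot_longer(rows, id_col, value_cols):
    """Turn each listed column into its own (name, value) row."""
    return [
        {id_col: row[id_col], "Seller count": col, "Value": row[col]}
        for row in rows
        for col in value_cols
    ]


long_rows = pivot_longer(wide, "Week", ["Total", "Churned", "Resurrected"])
for r in long_rows:
    print(r)
```

Once the data is in this long form, "Seller count" is an ordinary dimension with three members, which is what lets it sit as a header above Total, Churned and Resurrected in a text table.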

Database design for partially changing data points, with history and snapshot functionality?

I'm looking for a best practice or solution, on a conceptual level, to a problem I'm working on.
I have a collection of data points (around 500) which are partially changed, by a user, over time. It is important to be able to tell which values were changed at what point in time. The data might look like this:
Data changed over time:
+--------------------------------------------------------------------------------------+
| Date | Value no. 1 | Value no. 2 | Value no. 3 | ... | Value no. 500 |
|------------+---------------+---------------+---------------+-------+-----------------|
| 1/1/2018 | | | 2 | | 1 |
| 1/3/2018 | 2 | 1 | | | |
| 1/7/2018 | | | 4 | | 8 |
| 1/12/2018 | 5 | 3 | | | |
....
It must be possible to take a snapshot at a certain point in time, to get a complete set of data points, that were valid for that particular point in time, like this:
Snapshot taken 1/3/2018 will yield:
+---------------------------------------------------------+
| Value 1 | Value 2 | Value 3 | ... | Value 500 |
|-----------+-----------+-----------+-------+-------------|
| 2 | 1 | 2 | 0 | 1 |
Snapshot taken 1/9/2018 will yield:
+---------------------------------------------------------+
| Value 1 | Value 2 | Value 3 | ... | Value 500 |
|-----------+-----------+-----------+-------+-------------|
| 2 | 1 | 4 | 0 | 8 |
Snapshot taken 1/13/2018 will yield:
+---------------------------------------------------------+
| Value 1 | Value 2 | Value 3 | ... | Value 500 |
|-----------+-----------+-----------+-------+-------------|
| 5 | 3 | 4 | 0 | 8 |
and so on...
I'm not bound by a particular database technology, so either SQL or NoSQL will do. It is probably not possible to satisfy all the requirements in the DB-domain - some will probably have to be addressed in code. But my main question is what database technology is best suited for this task?
I'm not quite sure this fits a time-series database (TSDB), since only a portion of the values are changed at a given time, and it is important to know which values changed. Maybe I'm wrong?
/Chris
My suggestion would be to model this in a sparse format, something like:
CREATE TABLE DataPoint (
    DataID     int,          /* 1 to 500 in your example, or whatever you need to identify it */
    ValidFrom  timestamp,    /* default value 01/01/1970-00:00:00 or a suitable "Epoch" */
    ValidUntil timestamp,    /* default value 31/12/3999-00:00:00, or again something that is in the far future for your case */
    Value      number(8,5)   /* again, this may be any data type (sized to fit values like 112.809), or even more than one field if needed, like Price & Currency */
);
What we have just defined is a set of data points and the "interval" in which each data point has a specific value, so if you measured DataPoint 1 yesterday and got a value of 89.768 you will insert:
DataId=1
ValidFrom=26/11/2018-14:52:41
ValidUntil=31/12/3999-00:00:00
Value=89.768
Then you measure it again tomorrow and get:
DataId=1
ValidFrom=28/11/2018-14:51:23
ValidUntil=31/12/3999-00:00:00
Value=89.443
(Let's assume that you also have logic so that, when you record a new value, you update the current value's record and assign ValidUntil=28/11/2018-14:51:23. This is not strictly needed, but it makes the example query simpler.)
One month from now you have accumulated more measurements for data #1, and the same, on different moments, for data #2 to 500.
You now want to find out what the values were at noon today (i.e. one month "ago"), at 27/11/2018-12:00:00:
SELECT DataID, Value FROM DataPoint WHERE ValidFrom <= TIMESTAMP '2018-11-27 12:00:00' AND ValidUntil > TIMESTAMP '2018-11-27 12:00:00'
This will return:
001,89.768
002,45.678
...,...
500,112.809
Regarding logging who did this, or for what reason, you can either log it separately (saving for example DataPoint Id, Timestamp, UserId...) or make it part of the original table, so that whenever you register a new datapoint you also log who measured it.
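The validity-interval scheme above can be tried against the sample data from the question. The following is a minimal sketch using Python's sqlite3, not a definitive implementation: the helper names record and snapshot are hypothetical, dates are stored as ISO strings so that plain string comparison orders them correctly, and Value is an INTEGER only because the question's sample values are whole numbers.

```python
import sqlite3

FAR_FUTURE = "3999-12-31"  # stands in for "still valid"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DataPoint "
    "(DataID INTEGER, ValidFrom TEXT, ValidUntil TEXT, Value INTEGER)"
)


def record(data_id, valid_from, value):
    """Close the currently open interval for data_id, then open a new one."""
    conn.execute(
        "UPDATE DataPoint SET ValidUntil = ? WHERE DataID = ? AND ValidUntil = ?",
        (valid_from, data_id, FAR_FUTURE),
    )
    conn.execute(
        "INSERT INTO DataPoint VALUES (?, ?, ?, ?)",
        (data_id, valid_from, FAR_FUTURE, value),
    )


def snapshot(as_of):
    """Return {DataID: Value} for every interval that contains as_of."""
    cur = conn.execute(
        "SELECT DataID, Value FROM DataPoint "
        "WHERE ValidFrom <= ? AND ValidUntil > ? ORDER BY DataID",
        (as_of, as_of),
    )
    return dict(cur.fetchall())


# Changes from the question's table (values 1, 2, 3 and 500 only),
# recorded in chronological order.
record(3, "2018-01-01", 2)
record(500, "2018-01-01", 1)
record(1, "2018-01-03", 2)
record(2, "2018-01-03", 1)
record(3, "2018-01-07", 4)
record(500, "2018-01-07", 8)
record(1, "2018-01-12", 5)
record(2, "2018-01-12", 3)

print(snapshot("2018-01-09"))
```

Only changed values produce rows, so the history of which value changed when is preserved, and a snapshot is a single range query. Data points that were never set (the 0 entries in the question's snapshots) simply have no row and would need a default applied in code.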
Have a look at SQL Server temporal tables, which may be a solution in your case. This approach allows you to run the kind of queries mentioned in the question, for example:
SELECT *
FROM my_data
FOR SYSTEM_TIME AS OF '2018-01-01'
However, the table in the example seems to be very wide (maybe denormalized). I would suggest grouping columns by some technical or functional characteristic (vertical partitioning) to avoid maintenance drawbacks later on.

Pivoting a redshift table

I think I am needing to pivot my database... or maybe there is some other function I can use to get the result I am looking for. Below is what my current dataset looks like (I actually have about 15 metrics):
+----------------------------------+---------+------------------------+----------------+
| ID | Metric 1| Metric 2 | Overall Column |
+----------------------------------+---------+------------------------+----------------+
| 1 | Red | Yellow | Red |
| 2 | Yellow | Yellow | Yellow |
| 3 | Yellow | | Yellow |
+----------------------------------+---------+------------------------+----------------+
The overall column already has logic in SQL to say 'Red' if any of the Metrics are Red (even if they are Yellow, too), and then 'Yellow' if any are Yellow. There are also cases where Two metrics can be Yellow, Red, etc. What I am looking to do is add a new column that will show specifically which metric (or metrics) caused the overall value of Red or Yellow. What I am thinking is some sort of pivot that will, for each ID, have metrics as a row value and the corresponding color also as a row value (if that makes sense), and then I can do a listagg function and then join that table back on to my original dataset based on the ID.
Pivot example, ignore col2 & col3..
+----------------------------------+---------+------------------------+----------------+
| ID | col1 | col2 | col3 |
+----------------------------------+---------+------------------------+----------------+
| 1 | Red | | |
| 1 | Yellow | | |
| 3 | Yellow | | |
+----------------------------------+---------+------------------------+----------------+
After this I can listagg that table to capture multiple colors and then join it to the original table. The only thing I am leaving out there is if there is both Red and Yellow metric for an individual ID and then I do a listagg, that would bring both Red and Yellow even though the overall value is based on the Red metric. Hoping the SQL experts can help me out here.
Redshift is currently based on PostgreSQL 8.0.2, so it is missing a lot of functionality we've come to expect from Postgres over the last few years. So trying to come up with a solution involving unnest, array or lateral is out of the question (I've learned this the hard way).
So barring the availability of all those new-fangled features, you can unpivot the source table and create a set of each id and its metrics by using union all and creating a union for each metric column.
select a.id, metrics.metric
from tbl a
inner join (
    select id, metric1 as metric from tbl where metric1 is not null
    union all select id, metric2 from tbl where metric2 is not null
    -- ... one "union all" branch per remaining metric column ...
    union all select id, metric15 from tbl where metric15 is not null
) metrics on metrics.id = a.id
order by a.id, metrics.metric
Results
id | metric
---+--------
1 | red
1 | yellow
2 | blue
2 | green
2 | pink
3 | orange
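To get from the unpivoted rows to the column the question asks for (which metrics caused the overall color), one possible extension is to carry the metric's name through the union and keep only the rows whose color equals the overall value. This is a sketch under assumptions: it runs on SQLite through Python, so GROUP_CONCAT stands in for Redshift's LISTAGG, the column names metric1, metric2 and overall are taken from the answer's naming, and the empty cell in the question's sample is treated as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (id INTEGER, metric1 TEXT, metric2 TEXT, overall TEXT)"
)
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [
        (1, "Red", "Yellow", "Red"),
        (2, "Yellow", "Yellow", "Yellow"),
        (3, "Yellow", None, "Yellow"),
    ],
)

# Unpivot with a literal metric name, then keep only the metrics whose
# color matches the row's overall color and aggregate their names.
query = """
SELECT a.id, a.overall, GROUP_CONCAT(m.metric_name, ', ') AS caused_by
FROM tbl a
JOIN (
    SELECT id, 'Metric 1' AS metric_name, metric1 AS color
    FROM tbl WHERE metric1 IS NOT NULL
    UNION ALL
    SELECT id, 'Metric 2', metric2
    FROM tbl WHERE metric2 IS NOT NULL
) m ON m.id = a.id AND m.color = a.overall
GROUP BY a.id, a.overall
ORDER BY a.id
"""
rows = conn.execute(query).fetchall()

# Normalise the aggregated names, since SQLite does not promise their order.
caused_by = {row_id: sorted(names.split(", ")) for row_id, _, names in rows}
print(caused_by)
```

Because the join condition compares each metric's color with the overall color, an ID whose overall value is Red lists only its Red metrics, which addresses the concern about Yellow metrics leaking into the aggregated list. On Redshift the aggregate would be written with LISTAGG(metric_name, ', ') WITHIN GROUP (ORDER BY metric_name).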

Generate 'source' column value when joining tables to form a view

I'm creating an android app using sqlite and have the following question:
Is there any way I can generate a value for a 'source' column when joining two or more tables to create a view? For instance, say I have the following two tables:
Table 1 (Fruit) Table 2 (Vegies)
Name | Colour Name | Colour
-------|-------- --------|--------
Apple | Red Celery | Green
Orange | Orange Carrot | Orange
Pear | Green Lettuce | Green
I'd like to create a view that looks something like this:
View (Food)
Name | Colour | Type
--------|---------|---------
Apple | Red | Fruit
Orange | Orange | Fruit
Pear | Green | Fruit
Celery | Green | Vegie
Carrot | Orange | Vegie
Lettuce | Green | Vegie
This may or may not be possible... But I figured it would be worth asking. It is important that I can tell which table or 'source' the row came from in my application. There may be a better way to do it than with a view but I figured this way I can keep all of the data I get in their own tables (which have extra info specific to what the table holds) and I don't have to duplicate anything.
P.S. Very new to SQL/Sqlite so if you could add a bit of an explanation that would be awesome!
Many thanks.
Maybe this would be a simple solution:
SELECT Name, Colour, 'Fruit' AS Type FROM Fruit
UNION ALL
SELECT Name, Colour, 'Vegie' AS Type FROM Vegies
Hope it helps
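Since the question asked for a bit of explanation: a quoted literal in the SELECT list becomes a constant column, so each branch of the union can stamp its rows with the name of the table they came from. The sketch below wraps that idea in a view and runs it with Python's sqlite3 purely for demonstration (on Android the same SQL would go through the platform's SQLite API). The tables contain only the two columns shown in the question; any table-specific extra columns are left out of the view on purpose, because every branch of a union must return the same number of columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Fruit  (Name TEXT, Colour TEXT);
    CREATE TABLE Vegies (Name TEXT, Colour TEXT);

    INSERT INTO Fruit  VALUES ('Apple', 'Red'), ('Orange', 'Orange'), ('Pear', 'Green');
    INSERT INTO Vegies VALUES ('Celery', 'Green'), ('Carrot', 'Orange'), ('Lettuce', 'Green');

    -- The literal 'Fruit' / 'Vegie' acts as the generated source column.
    CREATE VIEW Food AS
        SELECT Name, Colour, 'Fruit' AS Type FROM Fruit
        UNION ALL
        SELECT Name, Colour, 'Vegie' AS Type FROM Vegies;
    """
)

food = conn.execute("SELECT Name, Colour, Type FROM Food").fetchall()
for row in food:
    print(row)
```

UNION ALL is used instead of UNION because it skips the duplicate-elimination step; the Type column already makes rows from the two tables distinct, and the data stays in its original tables with nothing duplicated.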