How to count distinct values in a list

I am fairly new to writing queries in Snowflake and have run into a hiccup. I am trying to count how many times each item appears across lists that are all stored in the same column.
I was able to use the flatten function, but when I tried to add the count function I had no luck.
Here is a dummy version of my data:
Ticket# Tasks
1 ["cut apple","peel orange","slice cheese"]
2 ["slice cheese","peel orange"]
3 ["cut apple"]
4 ["cut apple","slice cheese"]
5 ["cut apple", "chop kiwi"]
Here is what I want the output to look like:
(hopefully auto populating the distinct list of tasks in desc order)
Tasks Quantity
Cut Apple 4
Slice Cheese 3
Peel Orange 2
Chop Kiwi 1

Step 1: Define a normalized data schema and put the schema into a database.
Your normalized data schema might look something like this:
Step 2: Add your data
Step 3: You can then use SQL COUNT with DISTINCT to find the unique rows in your data table(s).
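For comparison with the flatten attempt described in the question, here is a minimal sketch of the "flatten, then count" logic written in Python rather than Snowflake SQL. It assumes the Tasks column holds JSON array text exactly as in the sample data; the variable names `rows` and `task_counts` are made up for the illustration.

```python
import json
from collections import Counter

# Sample rows from the question: (ticket number, JSON array of tasks as text).
rows = [
    (1, '["cut apple","peel orange","slice cheese"]'),
    (2, '["slice cheese","peel orange"]'),
    (3, '["cut apple"]'),
    (4, '["cut apple","slice cheese"]'),
    (5, '["cut apple", "chop kiwi"]'),
]

# "Flatten": emit one task per list element, then count each distinct task.
task_counts = Counter(task for _, tasks in rows for task in json.loads(tasks))

# most_common() returns (task, count) pairs sorted by count, highest first.
for task, quantity in task_counts.most_common():
    print(task.title(), quantity)
```

In a database the same two steps would be one row per list element followed by a GROUP BY on the task with COUNT(*), ordered by the count descending.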

This is too long for a comment, but here is some guidance to look into before you try writing a sample query. While you have the opportunity as you're learning, I would look into data normalization and adjust your "Tasks" column.
You should have a secondary lookup table that has a primary key ID and a description of each unique task (you'll see this in the data normalization material). So that you can map your own data onto that material, I will provide example layouts and see how that helps you.
Starting with your lookup task table...
Tasks Table
TaskID TaskDescription
1 cut apple
2 peel orange
3 slice cheese
4 chop kiwi
Then, you would have another table with one row per TicketID, and a third table that holds multiple records for each TicketID (one per task).
Ticket Table
TicketID ExPurchaseDate
1 someDate
2 someDate
3 etc...
Now, a detail table per ticket.
TicketTasks Table
TicketTaskID TicketID TaskID
1 1 1
2 1 2
3 1 3
4 2 3
5 2 2
6 3 1
7 4 1
8 4 3
9 5 1
10 5 4
Try to digest this along with the normalization material, and then look into writing a SQL query with COUNT(*) and GROUP BY. I'm more than happy to help you further afterwards, but I hope this helps guide you.
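As a rough sketch of where the layout above leads, the following loads the Tasks and TicketTasks tables into an in-memory SQLite database and runs a COUNT(*) with GROUP BY. SQLite stands in for Snowflake here purely so the example is runnable; the column types are assumptions, and TicketTaskID is numbered 1 through 10 so that each row has a unique key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tasks (
    TaskID INTEGER PRIMARY KEY,
    TaskDescription TEXT
);
CREATE TABLE TicketTasks (
    TicketTaskID INTEGER PRIMARY KEY,
    TicketID INTEGER,
    TaskID INTEGER REFERENCES Tasks(TaskID)
);
INSERT INTO Tasks VALUES
    (1, 'cut apple'), (2, 'peel orange'), (3, 'slice cheese'), (4, 'chop kiwi');
INSERT INTO TicketTasks VALUES
    (1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 2, 3), (5, 2, 2),
    (6, 3, 1), (7, 4, 1), (8, 4, 3), (9, 5, 1), (10, 5, 4);
""")

# One output row per task, with the number of ticket lines that reference it.
quantities = conn.execute("""
    SELECT t.TaskDescription, COUNT(*) AS Quantity
    FROM TicketTasks AS tt
    JOIN Tasks AS t ON t.TaskID = tt.TaskID
    GROUP BY t.TaskID, t.TaskDescription
    ORDER BY Quantity DESC, t.TaskDescription
""").fetchall()

for description, quantity in quantities:
    print(description, quantity)
```

Because the task names live in one lookup table, any new task shows up in the result automatically once ticket rows reference it.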

Related

SQL - Update in a cross apply query

UPDATE Table1
SET SomeColumn = X.SomeOtherColumn
FROM Table1 T1
CROSS APPLY
(SELECT TOP 1 SomeOtherColumn
FROM Table2 T2
WHERE T2.SomeJoinColumn = T1.SomeJoinColumn
ORDER BY CounterColumn) AS X
I want to increase CounterColumn by 1 each time the cross apply query runs. Is there any way I could achieve this?
Some context and sample data
I have a table containing information about companies, and I want to anonymize the company numbers in it. To do this, I want to use data from another table that contains synthesized data. That table has a much smaller sample size, so I have to reuse the same synthetic companies multiple times. For each row in the table I anonymize, I want to pick a synthetic company of the same type, and I want to use all the synthetic companies. That's where the counter comes in: it counts how many times I've used that specific synthetic company. By sorting by this counter, I was hoping to always pick the synthetic company that has been used the least.
Company table (Table1)
CompanyNumber Type
67923 2
82034 2
90238 7
29378 2
92809 5
72890 2
Synthetic company table (Table2)
SyntheticCompanyNumber Type Counter
08366 5 0
12588 2 0
33823 2 0
27483 7 0
Expected output of Company table:
CompanyNumber Type
12588 2
33823 2
27483 7
12588 2
08366 5
33823 2
Expected output of synthetic company table:
SyntheticCompanyNumber Type Counter
08366 5 1
12588 2 2
33823 2 2
27483 7 1
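To make the intended "least used first" rule concrete, here is a small procedural sketch in Python rather than T-SQL. It is not a fix for the UPDATE above, which only writes to Table1 and so never changes Counter; it simply walks the company rows in order and picks, for each one, the synthetic company of the same type with the lowest counter. Breaking ties by table order is an assumption, chosen because it matches the sample output.

```python
# Company rows to anonymize: (CompanyNumber, Type), from the sample Company table.
companies = [
    ("67923", 2), ("82034", 2), ("90238", 7),
    ("29378", 2), ("92809", 5), ("72890", 2),
]

# Synthetic companies as [SyntheticCompanyNumber, Type, Counter];
# lists (not tuples) so the counter can be incremented in place.
synthetic = [
    ["08366", 5, 0],
    ["12588", 2, 0],
    ["33823", 2, 0],
    ["27483", 7, 0],
]

anonymized = []
for _, company_type in companies:
    # Candidates are the synthetic companies of the same type.
    candidates = [s for s in synthetic if s[1] == company_type]
    # min() returns the first candidate among equal counters (table order).
    pick = min(candidates, key=lambda s: s[2])
    pick[2] += 1
    anonymized.append((pick[0], company_type))

for number, company_type in anonymized:
    print(number, company_type)
```

Run against the sample data, this reproduces both expected tables. A set-based equivalent would need some way to number the rows within each type, since a single UPDATE does not see counter changes made row by row.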

Access SQL query select with a specific pattern

I want each group of 5 rows in the result to contain unique values, with the same pattern repeating for the rest of the result (i.e. if the result contains 10 records, I expect to get 2 sets of 5 unique rows).
Example:
What I have:
1
1
5
3
4
5
2
4
2
3
Result I want to achieve:
1
2
3
4
5
1
2
3
4
5
I have tried and searched a lot but couldn't find anything close to what I want to achieve.

Assuming that you can somehow order the rows within the sets of 5:
SELECT t.Row % 5, t.Row FROM #T t
ORDER BY t.Row , t.Row % 5
We could probably get closer to the truth with more details about what your data looks like and what it is you're trying to actually do.

This will work with the sample of data you provided
SELECT DISTINCT(thevalue) FROM theresults
UNION ALL
SELECT DISTINCT(thevalue) FROM theresults
But it's unclear to me if it's really what you need.
For instance:
If your table/results return 12 rows, do you still want 2x5 rows, or do you want 2x6 rows?
Does every row in your table/results always appear exactly twice?
There are many more questions to raise, and nothing in your question hints at the answers.
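One possible reading of the desired result is "first occurrence of every value, then second occurrence of every value, and so on". Under that assumption (which the question does not confirm), the ordering can be sketched in Python by tagging each value with how many times it has already appeared and sorting on that tag first. The names `values`, `tagged` and `result` are illustrative only.

```python
from collections import defaultdict

# The sample data from the question, in its original order.
values = [1, 1, 5, 3, 4, 5, 2, 4, 2, 3]

seen = defaultdict(int)
tagged = []
for v in values:
    # Pair each value with the number of times it has appeared before.
    tagged.append((seen[v], v))
    seen[v] += 1

# Sort by occurrence number first, then by the value itself.
result = [v for _, v in sorted(tagged)]
print(result)
```

In SQL dialects that have window functions, the occurrence number corresponds to a row number partitioned by the value; Access has no such function, so there it would have to come from a correlated count or a helper column.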

Multicriteria Insert/Update

I'm trying to create a query that will insert new records to a table or update already existing records, but I'm getting stuck on the filtering and grouping for the criteria I want.
I have two tables: tbl_PartInfo, and dbo_CUST_BOOK_LINE.
I want to select from dbo_CUST_BOOK_LINE based on the combination of CUST_ORDER_ID, CUST_ORDER_LINE_NO, and REVISION_ID. Each customer order can have multiple lines, and each line can have multiple revisions. I'm trying to select the unique combinations of each order and its connected lines, but take the connected information from the row with the highest value in the revision column.
I want to insert/update from dbo_CUST_BOOK_LINE the following columns:
CUST_ORDER_ID
PART_ID
USER_ORDER_QTY
UNIT_PRICE
I want to insert/update them into tbl_PartInfo as the following columns respectively:
JobID
DrawingNumber
Quantity
UnitPrice
So if I have the following rows in dbo_CUST_BOOK_LINE (PART_ID omitted for example)
CUST_ORDER_ID CUST_ORDER_LINE_NO REVISION_ID USER_ORDER_QTY UNIT_PRICE
SCabc 1 1 0 100
SCabc 1 2 4 150
SCabc 1 3 4 125
SCabc 2 3 2 200
SCxyz 1 1 0 0
SCxyz 1 2 3 50
It would return
CUST_ORDER_ID CUST_ORDER_LINE_NO (REVISION_ID) USER_ORDER_QTY UNIT_PRICE
SCabc 1 3 4 125
SCabc 2 3 2 200
SCxyz 1 2 3 50
but with PART_ID included and without REVISION_ID
So far, my code covers only the insert portion, as I was trying to get the correct records selected first, but I keep getting duplicates of CUST_ORDER_ID and CUST_ORDER_LINE_NO.
INSERT INTO tbl_PartInfo ( JobID, DrawingNumber, Quantity, UnitPrice, ProductFamily, ProductCategory )
SELECT dbo_CUST_BOOK_LINE.CUST_ORDER_ID, dbo_CUST_BOOK_LINE.PART_ID, dbo_CUST_BOOK_LINE.USER_ORDER_QTY, dbo_CUST_BOOK_LINE.UNIT_PRICE, dbo_CUST_BOOK_LINE.CUST_ORDER_LINE_NO, Max(dbo_CUST_BOOK_LINE.REVISION_ID) AS MaxOfREVISION_ID
FROM dbo_CUST_BOOK_LINE, tbl_PartInfo
GROUP BY dbo_CUST_BOOK_LINE.CUST_ORDER_ID, dbo_CUST_BOOK_LINE.PART_ID, dbo_CUST_BOOK_LINE.USER_ORDER_QTY, dbo_CUST_BOOK_LINE.UNIT_PRICE, dbo_CUST_BOOK_LINE.CUST_ORDER_LINE_NO;
This has been far more complicated than anything I've done so far, so any help would be greatly appreciated. Sorry about the long column names; I didn't get to choose them.

I did some research and think I found a way to make it work, but I'm still testing it. Right now I'm using three queries, but it should be easily simplified into two when complete.
The first is an append query that selects the two columns I want distinct combos from, using "group by", while also selecting the max of the revision column. It appends the results to another table called tbl_TempDrop. That table is only being used right now to reduce the number of results before the next part.
The second is an update query that fills in all the other columns I wanted in tbl_TempDrop by matching on the three columns selected in the first query. Doing this directly took an EXTREMELY long time when I had 700,000 records to work with, hence the use of tbl_TempDrop.
The third query is a basic append query that appends the rows of tbl_TempDrop to the end destination, tbl_PartInfo.
All that's left is to run all three in a row.
I didn't want to include the full details of any tables or queries yet until I ensure that it works as desired, and because some of the names are vague since I will be using this method for multiple query searches.
This website helped me a little to make sure I had the basic idea down. http://www.techonthenet.com/access/queries/max_query2_2007.php
Let me know if you see any flaws in the approach!
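The first two queries described above (group to find the highest revision, then match back to fetch the other columns) can be sketched as a single join. The example below runs against an in-memory SQLite database rather than Access, uses the sample rows from the question, and leaves out PART_ID because the sample omits it; the column types and the alias `MaxRev` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dbo_CUST_BOOK_LINE (
    CUST_ORDER_ID TEXT,
    CUST_ORDER_LINE_NO INTEGER,
    REVISION_ID INTEGER,
    USER_ORDER_QTY INTEGER,
    UNIT_PRICE INTEGER
);
INSERT INTO dbo_CUST_BOOK_LINE VALUES
    ('SCabc', 1, 1, 0, 100),
    ('SCabc', 1, 2, 4, 150),
    ('SCabc', 1, 3, 4, 125),
    ('SCabc', 2, 3, 2, 200),
    ('SCxyz', 1, 1, 0, 0),
    ('SCxyz', 1, 2, 3, 50);
""")

# The inner query finds the highest revision per order line; the join
# then keeps only the full row that carries that revision.
latest = conn.execute("""
    SELECT b.CUST_ORDER_ID, b.CUST_ORDER_LINE_NO, b.USER_ORDER_QTY, b.UNIT_PRICE
    FROM dbo_CUST_BOOK_LINE AS b
    JOIN (SELECT CUST_ORDER_ID, CUST_ORDER_LINE_NO, MAX(REVISION_ID) AS MaxRev
          FROM dbo_CUST_BOOK_LINE
          GROUP BY CUST_ORDER_ID, CUST_ORDER_LINE_NO) AS m
      ON  b.CUST_ORDER_ID = m.CUST_ORDER_ID
      AND b.CUST_ORDER_LINE_NO = m.CUST_ORDER_LINE_NO
      AND b.REVISION_ID = m.MaxRev
    ORDER BY b.CUST_ORDER_ID, b.CUST_ORDER_LINE_NO
""").fetchall()

for row in latest:
    print(row)
```

Grouping only by the order and line columns is what removes the duplicates seen in the original attempt, where quantity and price were also part of the GROUP BY.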

T-SQL 2008 INSERT dummy row WHEN condition is met

**Schema & Dataset**
id version payment name
1 1 10 Rich
2 1 0 David
3 1 10 Marc
4 1 10 Jess
5 1 0 Steff
1 2 10 Rich
2 2 0 David
3 2 10 Marc
4 2 10 Jess
5 2 0 Steff
2 3 0 David
3 3 10 Marc
4 3 10 Jess
http://sqlfiddle.com/#!3/1c457/18 - Contains my schema and the dataset I'm working with.
Background
The data above is the final set after a stored proc has done its processing, so everything above is in one table and unfortunately I can't change it.
I need to identify in the dataset where a person has been deleted with a payment total greater than 0 in previous versions and insert a dummy row with a payment of 0. So in the above example, Rich has been deleted in version 3 with a payment total of 10 on previous versions. I need to first identify where this has happened in all instances and insert a dummy row with a 0 payment for that version. Steff has also been deleted on version 3 but she hasn't had a payment over 0 on previous versions so a dummy row is not needed for her.
Tried so far -
So I looked at Pinal Dave's example here, and I can look back to the previous row, which is great, so it's a step in the right direction. However, I'm not sure how to go about achieving the above requirement. I've been toying with the idea of a case statement, but I'm not certain that would be the best way to go about it. I'm really struggling with this one and would appreciate some advice on how to tackle it.

You can do this by generating all possible combinations of names and versions. Then filter out the ones you don't want according to the pay conditions:
select n.name, v.version, coalesce(de.payment, 0) as payment
from (select name, max(payment) as maxpay from dataextract group by name) n cross join
(select distinct version from dataextract) v left outer join
dataextract de
on de.name = n.name and de.version = v.version
where de.name is not null or n.maxpay > 0;
Here is a SQL Fiddle.
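To see what the query above returns for the sample dataset, here is a sketch that runs it against an in-memory SQLite database instead of SQL Server. The table name `dataextract` is taken from the query; the column types are assumptions, and an ORDER BY is added only to make the output stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dataextract (id INTEGER, version INTEGER, payment INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO dataextract VALUES (?, ?, ?, ?)",
    [
        (1, 1, 10, "Rich"), (2, 1, 0, "David"), (3, 1, 10, "Marc"),
        (4, 1, 10, "Jess"), (5, 1, 0, "Steff"),
        (1, 2, 10, "Rich"), (2, 2, 0, "David"), (3, 2, 10, "Marc"),
        (4, 2, 10, "Jess"), (5, 2, 0, "Steff"),
        (2, 3, 0, "David"), (3, 3, 10, "Marc"), (4, 3, 10, "Jess"),
    ],
)

# Every name is paired with every version; a pair with no matching row is
# kept only if that person ever had a payment above 0.
rows = conn.execute("""
    select n.name, v.version, coalesce(de.payment, 0) as payment
    from (select name, max(payment) as maxpay from dataextract group by name) n
    cross join (select distinct version from dataextract) v
    left outer join dataextract de
      on de.name = n.name and de.version = v.version
    where de.name is not null or n.maxpay > 0
    order by v.version, n.name
""").fetchall()

for row in rows:
    print(row)
```

The result contains the 13 original rows plus one dummy row for Rich in version 3 with a payment of 0, and nothing for Steff in version 3.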

Comparing row based on contents to another tables rows in Access SQL

Hi, I'm working on a project and, because of time constraints, I need to keep working in Access, which may be the root of all my problems, but maybe there's hope.
I have a database that includes a table, ANSWERS, filled with users' "wants". It has multiple columns, each corresponding to the answer to a different question asking whether they Don't Care about, Want, or Need something.
EG: Answers:
ID | Bacon | Ham | Sausage
________________________________
1  | 0     | 0   | 2
2  | 2     | 1   | 0
3  | 0     | 2   | 0
4  | 1     | 1   | 1
(0 = Don't Care, 1 = Want, 2 = Need)
I want to compare a row from table Answers to the Available table.
EG: Available:
ID | Bacon | Ham | Sausage
________________________________
1  | 0     | 1   | 0
2  | 0     | 0   | 0
3  | 1     | 1   | 1
4  | 1     | 1   | 0
(0 = Unavailable, 1 = Available)
So I would want to compare row 1 from Answers to Available: because row 1 has sausage=2, I would want to receive row 3 from Available, because it has sausage=1.
I'd be happy receiving the entire row, or the row ID and a "1" for the rows that match.
Ultimately I'd need to do this for each of the rows in Answers.
Any ideas are appreciated. I was thinking Intersect might work, but that isn't supported in Access. I've also considered joining the tables, and I could change data variables or formats if necessary.
Thanks very much
Edit: Don't Care was previously Don't Want. Changed for clarity.

Give this a try:
SELECT tblAnswers.UserID,
IIf([tblAnswers].[bacon]>0 And [tblMenus].[Bacon]<>0,[MenuID],Null) AS BaconMenu,
IIf([tblAnswers].[Ham]>0 And [tblMenus].[Ham]<>0,[MenuID],Null) AS HamMenu,
IIf([tblAnswers].[Sausage]>0 And [tblMenus].[Sausage]<>0,[MenuID],Null) AS SausageMenu
FROM tblAnswers, tblMenus
WHERE (((IIf([tblAnswers].[bacon]>0 And [tblMenus].[Bacon]<>0,[MenuID],Null)) Is Not Null))
OR (((IIf([tblAnswers].[Ham]>0 And [tblMenus].[Ham]<>0,[MenuID],Null)) Is Not Null))
OR (((IIf([tblAnswers].[Sausage]>0 And [tblMenus].[Sausage]<>0,[MenuID],Null)) Is Not Null));
Just paste that into the SQL view of a query window (after changing the table and column names to match yours). You will obviously need to tweak it to fit your real data, but it does what you asked for with the data provided.
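A stricter variant of the same cross-join idea is sketched below: an Available row matches an Answers row only if every item marked Need (2) is available, while Want and Don't Care are ignored. That matching rule is one interpretation of the question, not something it states outright. The example uses an in-memory SQLite database instead of Access, and the `id` column and table definitions are assumptions based on the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Answers   (id INTEGER PRIMARY KEY, Bacon INTEGER, Ham INTEGER, Sausage INTEGER);
CREATE TABLE Available (id INTEGER PRIMARY KEY, Bacon INTEGER, Ham INTEGER, Sausage INTEGER);
INSERT INTO Answers   VALUES (1, 0, 0, 2), (2, 2, 1, 0), (3, 0, 2, 0), (4, 1, 1, 1);
INSERT INTO Available VALUES (1, 0, 1, 0), (2, 0, 0, 0), (3, 1, 1, 1), (4, 1, 1, 0);
""")

# Pair every answer row with every available row, then keep the pairs where
# each item is either not needed (< 2) or is available (= 1).
matches = conn.execute("""
    SELECT a.id, v.id
    FROM Answers AS a CROSS JOIN Available AS v
    WHERE (a.Bacon   < 2 OR v.Bacon   = 1)
      AND (a.Ham     < 2 OR v.Ham     = 1)
      AND (a.Sausage < 2 OR v.Sausage = 1)
    ORDER BY a.id, v.id
""").fetchall()

for answer_id, available_id in matches:
    print(answer_id, available_id)
```

With the sample data, answer row 1 (which needs sausage) pairs only with available row 3, as described in the question. An answer row with no Need values, such as row 4, pairs with every available row under this rule.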
