SQL JOIN and COUNT COMMANDS - sql

I have two different tables, which we will call table 1 and table 2. Table 1 has a column named featureid, a numerical value that corresponds to a numerical value in table 2 named termid. In table 2, each termid also has a plain-text description.
What I am attempting to do is join featureid in table 1 to termid in table 2, but have the output be a two-column display of the plain-text description and the number of occurrences of each in table 1.
I know I need to use the JOIN and COUNT syntax in SQL, but I am not sure how to write the command correctly.
Thanks!

You actually only need one column in the GROUP BY. It's also good practice to specify which values are coming from which table, like so:
SELECT
table2.textDesc,
COUNT(table1.featureid) AS OccurrenceCount
FROM
table1
INNER JOIN
table2 ON
table1.featureid = table2.termid
GROUP BY table2.textDesc
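To try the grouped join without a SQL Server instance, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in engine. The table and column names follow the answer above; the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table2 (termid INTEGER PRIMARY KEY, textDesc TEXT);
    CREATE TABLE table1 (featureid INTEGER);
    INSERT INTO table2 VALUES (1, 'road'), (2, 'river'), (3, 'bridge');
    INSERT INTO table1 VALUES (1), (1), (2), (1), (2);
""")

# One output row per description, with the number of matching table1 rows.
rows = conn.execute("""
    SELECT table2.textDesc, COUNT(table1.featureid) AS OccurrenceCount
    FROM table1
    INNER JOIN table2 ON table1.featureid = table2.termid
    GROUP BY table2.textDesc
    ORDER BY table2.textDesc
""").fetchall()

print(rows)  # [('river', 2), ('road', 3)]
```

Terms with no occurrences ('bridge' here) are dropped by the INNER JOIN; starting FROM table2 with a LEFT JOIN to table1 would list them with a count of 0.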

I think this is what you are looking for:
SELECT textDesc, COUNT(featureid)
FROM table1, table2
WHERE featureid=termid
GROUP BY featureid, textDesc
Alternatively, you can use a different syntax (with the same end result) like so:
SELECT textDesc, COUNT(featureid)
FROM table1 INNER JOIN table2
ON featureid=termid
GROUP BY featureid, textDesc
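The two syntaxes can be compared side by side in a small SQLite sketch (Python's sqlite3, made-up rows). It also shows the one case where grouping by featureid as well as textDesc matters: two different termids that share the same description stay separate, whereas grouping by the description alone merges them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table2 (termid INTEGER PRIMARY KEY, textDesc TEXT);
    CREATE TABLE table1 (featureid INTEGER);
    -- Two different termids deliberately share the description 'lake'.
    INSERT INTO table2 VALUES (1, 'lake'), (2, 'lake'), (3, 'road');
    INSERT INTO table1 VALUES (1), (2), (2), (3);
""")

# Old-style comma join with the join condition in WHERE.
implicit = conn.execute("""
    SELECT textDesc, COUNT(featureid) FROM table1, table2
    WHERE featureid = termid
    GROUP BY featureid, textDesc ORDER BY featureid
""").fetchall()

# Explicit INNER JOIN ... ON: same rows.
explicit = conn.execute("""
    SELECT textDesc, COUNT(featureid) FROM table1 INNER JOIN table2
    ON featureid = termid
    GROUP BY featureid, textDesc ORDER BY featureid
""").fetchall()

# Grouping by the description alone merges termids 1 and 2.
by_desc = conn.execute("""
    SELECT textDesc, COUNT(featureid) FROM table1 INNER JOIN table2
    ON featureid = termid
    GROUP BY textDesc ORDER BY textDesc
""").fetchall()

print(implicit)  # [('lake', 1), ('lake', 2), ('road', 1)]
print(by_desc)   # [('lake', 3), ('road', 1)]
```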

Related

Joining two tables in SQL in which one column has to be "cleaned"

I need to join two tables in SQL that have two related columns (column ID1 in Table 1 and column ID2 in Table 2). ID1 in table 1 consists of 6 digits, whereas ID2 in table 2 consists of 6 digits with an additional quotation mark (") at the beginning and end of the string. I need to remove these quotation marks and join the two tables to check whether any values occur in both columns.
I know how to remove first and last character of the string in table 2:
SELECT SUBSTRING ([ID2],2,Len([ID2])-2) FROM [dbo].[table2]
I need to join this new "trimmed" column with the other column from table 1.
Any suggestions?
Assuming you are using an MS SQL Server database and need every row from table1 plus the matching rows from table2:
sample:
table1 | table2
[ID] | [ID2]
547832 | "547832"
-----------------------------
select table1.*, table2.*
from
db.tb1 table1
left join
db.tb2 table2
on
table1.[ID] = SUBSTRING(table2.[ID2], 2, LEN(table2.[ID2]) - 2);
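Here is a minimal runnable sketch of joining on the trimmed value, using Python's sqlite3 as a stand-in for SQL Server. SQLite's substr and length play the roles of SUBSTRING and LEN; the table and column names follow the sample above, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ID TEXT);
    CREATE TABLE table2 (ID2 TEXT);
    INSERT INTO table1 VALUES ('547832'), ('123456');
    INSERT INTO table2 VALUES ('"547832"'), ('"999999"');
""")

# substr(x, 2, length(x) - 2) drops the first and last character,
# mirroring SUBSTRING([ID2], 2, LEN([ID2]) - 2) in SQL Server.
rows = conn.execute("""
    SELECT table1.ID, table2.ID2
    FROM table1
    LEFT JOIN table2
      ON table1.ID = substr(table2.ID2, 2, length(table2.ID2) - 2)
    ORDER BY table1.ID
""").fetchall()

print(rows)  # [('123456', None), ('547832', '"547832"')]
```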
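First extract the trimmed column under a different name using AS, then join the tables on it.
Because an alias defined with AS cannot be referenced in the ON clause of the same SELECT, wrap the trimmed select in a derived table. Try something like the following:
syntax: SELECT SUBSTRING(columnname, position, length) AS Newcolumnname FROM Tablename;
EX: SELECT c.Newstr, Table2.*
FROM (SELECT SUBSTRING(customerName, 1, 5) AS Newstr FROM Customer) AS c
JOIN Table2 ON c.Newstr = Table2.name;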
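The derived-table pattern can be checked with a short SQLite sketch (Python's sqlite3). The Customer and Table2 names follow the example; the city column and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (customerName TEXT);
    CREATE TABLE Table2 (name TEXT, city TEXT);
    INSERT INTO Customer VALUES ('SmithCo Ltd'), ('JonesCo Inc');
    INSERT INTO Table2 VALUES ('Smith', 'Leeds'), ('Brown', 'York');
""")

# The alias Newstr is defined inside the derived table c, so the outer
# query's ON clause can refer to it as c.Newstr.
rows = conn.execute("""
    SELECT c.Newstr, Table2.city
    FROM (SELECT substr(customerName, 1, 5) AS Newstr FROM Customer) AS c
    JOIN Table2 ON c.Newstr = Table2.name
""").fetchall()

print(rows)  # [('Smith', 'Leeds')]
```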
I am using MS SQL, yes.
Thanks for the reply. However, why is it a left join and not an inner join here? Just curious.
So, essentially, what I need to do is this:
In the first table I have around 10 columns, and in the second table I have 5 columns. They all have different names; ID was just used as an example. Two of the columns in table 2 appear to have similar values to two of the columns in table 1 (one is a 6-digit ID, the other is names). I want to remove the first and last characters (the quotation marks) from the ID column in table 2, and then join on both that trimmed ID and the names column, matching them against the ID and names columns in table 1. Hope that makes sense.
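On the LEFT vs INNER question: a LEFT JOIN keeps every table1 row and fills table2's columns with NULL where nothing matches, while an INNER JOIN keeps only rows that match, which is what you want if you only care about values present in both tables. A minimal SQLite sketch (Python's sqlite3, hypothetical rows and a name column standing in for your names column) joining on both the trimmed ID and the name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ID TEXT, name TEXT);
    CREATE TABLE table2 (ID2 TEXT, name TEXT);
    INSERT INTO table1 VALUES ('547832', 'Alice'), ('123456', 'Bob');
    INSERT INTO table2 VALUES ('"547832"', 'Alice'), ('"123456"', 'Robert');
""")

# Join condition on two columns: trimmed ID and name.
join_on = """
    ON table1.ID = substr(table2.ID2, 2, length(table2.ID2) - 2)
   AND table1.name = table2.name
"""

# INNER JOIN: only rows where both the trimmed ID and the name match.
inner = conn.execute(
    "SELECT table1.ID, table2.ID2 FROM table1 INNER JOIN table2" + join_on
    + " ORDER BY table1.ID").fetchall()

# LEFT JOIN: every table1 row; table2 columns are NULL where nothing matches.
left = conn.execute(
    "SELECT table1.ID, table2.ID2 FROM table1 LEFT JOIN table2" + join_on
    + " ORDER BY table1.ID").fetchall()

print(inner)  # [('547832', '"547832"')]
print(left)   # [('123456', None), ('547832', '"547832"')]
```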

Store only one value and remove the rest of the duplicates in BigQuery

I have duplicate values in my data. From each set of duplicates, I only want to keep one value and remove the rest.
So far, I have only found a solution that removes ALL of the duplicated values, like this:
Code:
SELECT * EXCEPT (date_A)
FROM (
SELECT ID, a.date AS date_A, b.date AS date_B,
CASE WHEN a.date <> b.date THEN NULL END AS b_date
FROM table1 a LEFT JOIN table2 b
USING (ID)
WHERE a.date = 1
)
Sample input:
Sample output (keep only one row from each set of duplicates and remove the rest):
NOTE: the query may be wrong, as it removes all of the duplicated values.
Based on the sample data in your screenshot and your explanation, I understand that you want to remove duplicates from your table, keeping only one row of each set of identical rows. So I created a query that selects only one row of data, ignoring the duplicates.
To select the rows without any duplicates, you can use SELECT DISTINCT; according to the documentation, it discards any duplicate rows. In addition, a CREATE TABLE statement is used to create a new table (or replace the previous one) with the deduplicated data. The syntax is as follows:
CREATE OR REPLACE TABLE project_id.dataset.table AS
SELECT DISTINCT * EXCEPT (date_A)
FROM (
SELECT ID, a.date AS date_A, b.date AS date_B,
CASE WHEN a.date <> b.date THEN NULL END AS b_date
FROM table1 a LEFT JOIN table2 b
USING (ID)
WHERE a.date = 1
)
The output will then match the sample output you shared in your question.
Notice that I used CREATE OR REPLACE: if you set project_id.dataset.table to the same path as the table in your SELECT, it will replace your current table (in case all of your data comes from that one table). Otherwise, it will create a new table with the name you specify.
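To see SELECT DISTINCT plus a create-table step end to end, here is a minimal sketch using Python's sqlite3 module. SQLite has no CREATE OR REPLACE TABLE, so the sketch drops the target table first; the raw and dedup table names and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw (ID INTEGER, date TEXT);
    INSERT INTO raw VALUES (1, '2021-01-01'), (1, '2021-01-01'),
                           (2, '2021-02-01'), (2, '2021-02-01'), (2, '2021-03-01');

    -- SQLite has no CREATE OR REPLACE TABLE, so drop first, then rebuild.
    DROP TABLE IF EXISTS dedup;
    CREATE TABLE dedup AS SELECT DISTINCT ID, date FROM raw;
""")

rows = conn.execute("SELECT ID, date FROM dedup ORDER BY ID, date").fetchall()
print(rows)  # [(1, '2021-01-01'), (2, '2021-02-01'), (2, '2021-03-01')]
```

DISTINCT only collapses rows that are identical in every selected column; rows that share an ID but differ elsewhere are all kept.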
You can use aggregation. Something like this:
SELECT ANY_VALUE(a).*, ANY_VALUE(b).*
FROM table1 a LEFT JOIN
table2 b
USING (ID)
WHERE a.date = 1
GROUP BY id, a.date;
For each id/date combination, this returns an arbitrary matching row from a/b.
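ANY_VALUE is BigQuery-specific. As a swapped-in technique for comparison, this SQLite sketch (Python's sqlite3) keeps one row per (ID, date) pair when the rows differ in other columns, by keeping the lowest rowid in each group. Unlike ANY_VALUE's arbitrary choice, it deterministically keeps the first-inserted row. Table t and its note column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (ID INTEGER, date TEXT, note TEXT);
    INSERT INTO t VALUES (1, '2021-01-01', 'first'),
                         (1, '2021-01-01', 'second'),
                         (2, '2021-02-01', 'only');
""")

# Keep the lowest rowid for each (ID, date) pair and delete the others.
conn.execute("""
    DELETE FROM t
    WHERE rowid NOT IN (SELECT MIN(rowid) FROM t GROUP BY ID, date)
""")

rows = conn.execute("SELECT ID, date, note FROM t ORDER BY ID").fetchall()
print(rows)  # [(1, '2021-01-01', 'first'), (2, '2021-02-01', 'only')]
```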

SQL Server ISNULL multiple columns

I have the following query, which works great, but how do I add multiple columns to its SELECT statement? Here is the query:
SELECT ISNULL(
(SELECT DISTINCT a.DatasourceID
FROM [Table1] a
WHERE a.DatasourceID = 5 AND a.AgencyID = 4 AND a.AccountingMonth = 201907), NULL) TEST
So currently I only get one column (TEST) but would like to add other columns such as DataSourceID, AgencyID and AccountingMonth.
If you want to output a row for each set of requested values, including a row when nothing in Table1 meets the condition, you can put the requested values in a pseudo table in the FROM clause and make a LEFT OUTER JOIN with Table1.
SELECT ISNULL(Table1.DatasourceId, 999999),
Table1.AgencyId,
Table1.AccountingMonth,
COUNT(Table1.DatasourceId) AS count
FROM ( VALUES (5, 4, 201907),
(6, 4, 201907) )
AS requested(DatasourceId, AgencyId, AccountingMonth)
LEFT OUTER JOIN Table1 ON requested.AgencyId = Table1.AgencyId
AND requested.DatasourceId = Table1.DatasourceId
AND requested.AccountingMonth = Table1.AccountingMonth
GROUP BY Table1.DatasourceId, Table1.AgencyId, Table1.AccountingMonth
Note that:
I have put an ISNULL around the first column, as you did, to output a particular value (999999) when no value is found.
I did not wrap the other columns in ISNULL(..., NULL) as your query does, since IMHO it is not necessary: if there is no value, NULL is output anyway.
I added a COUNT column to illustrate an aggregate. It counts Table1.DatasourceId rather than *, so that a requested row with no match counts as 0 instead of 1. You could use another aggregate (SUM, MIN, MAX) or none if you do not need it.
The set of requested values is provided as a table value constructor (see https://learn.microsoft.com/en-us/sql/t-sql/queries/table-value-constructor-transact-sql?view=sql-server-2017).
I have added multiple rows of requested conditions: you can request multiple data sources, agencies, or months in one query, with one line for each in the output.
If you want only one row, put only one row in the "requested" pseudo table.
There must be a GROUP BY, even if you do not use an aggregate (COUNT, SUM, or other), to get the same behavior as your DISTINCT clause: it restricts the output to a single line per set of requested values.
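A minimal SQLite sketch of the requested-values pseudo table, using Python's sqlite3. SQLite names the columns of a bare VALUES list column1, column2, and so on, so the sketch names them through a CTE instead, and IFNULL stands in for ISNULL. As a variation on the query above, it groups by the requested columns, so each requested row stays a separate output row even when nothing matches; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (DatasourceId INTEGER, AgencyId INTEGER,
                         AccountingMonth INTEGER);
    INSERT INTO Table1 VALUES (5, 4, 201907), (5, 4, 201907), (7, 4, 201907);
""")

rows = conn.execute("""
    WITH requested(DatasourceId, AgencyId, AccountingMonth) AS (
        VALUES (5, 4, 201907), (6, 4, 201907)
    )
    SELECT requested.DatasourceId,
           requested.AgencyId,
           requested.AccountingMonth,
           IFNULL(MAX(Table1.DatasourceId), 999999) AS found,  -- sentinel if no match
           COUNT(Table1.DatasourceId) AS cnt                   -- 0 if no match
    FROM requested
    LEFT JOIN Table1
      ON Table1.DatasourceId = requested.DatasourceId
     AND Table1.AgencyId = requested.AgencyId
     AND Table1.AccountingMonth = requested.AccountingMonth
    GROUP BY requested.DatasourceId, requested.AgencyId,
             requested.AccountingMonth
    ORDER BY requested.DatasourceId
""").fetchall()

print(rows)  # [(5, 4, 201907, 5, 2), (6, 4, 201907, 999999, 0)]
```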
It seems to me that you want to see whether data exists. I am guessing that your AgencyID is a foreign key to an Agency table, DataSourceID likewise to a DataSource table, and that you have an AccountingMonth table containing all accounting periods:
SELECT ds.ID AS DataSourceID, ag.ID AS AgencyID, am.ID AS AccountingMonth,
COUNT(a.DataSourceID) AS [Count]
FROM [Datasource] ds
CROSS JOIN [Agency] ag
CROSS JOIN [AccountingMonth] am
LEFT JOIN [Table1] a ON a.DataSourceID = ds.ID
AND a.AgencyID = ag.ID
AND a.AccountingMonth = am.ID
GROUP BY ds.ID, ag.ID, am.ID
In this way you can see the count of records for each combination of the GROUP BY criteria. The CROSS JOINs produce every DataSource/Agency/AccountingMonth combination, and the LEFT JOIN keeps each combination even when Table1 has no matching rows; COUNT over a Table1 column then returns 0 for it, so no ISNULL is needed.
Your query has DISTINCT a.DatasourceID and WHERE a.DatasourceID = 5, so it returns 5 if rows matching your WHERE criteria exist in the table, and NULL if there is no data. If you removed WHERE a.DatasourceID = 5, your query would fail with a "Subquery returned more than 1 value" error whenever more than one distinct DatasourceID matches the remaining criteria.
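The cross-join-then-left-join pattern for counting rows per combination can be checked with a small SQLite sketch (Python's sqlite3). The dimension tables and sample rows are assumptions based on the guessed schema in this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Datasource (ID INTEGER);
    CREATE TABLE Agency (ID INTEGER);
    CREATE TABLE AccountingMonth (ID INTEGER);
    CREATE TABLE Table1 (DataSourceID INTEGER, AgencyID INTEGER,
                         AccountingMonth INTEGER);
    INSERT INTO Datasource VALUES (5), (6);
    INSERT INTO Agency VALUES (4);
    INSERT INTO AccountingMonth VALUES (201907);
    INSERT INTO Table1 VALUES (5, 4, 201907), (5, 4, 201907);
""")

# CROSS JOIN builds every combination; LEFT JOIN keeps combinations
# that have no data, and COUNT of a Table1 column gives 0 for them.
rows = conn.execute("""
    SELECT ds.ID, ag.ID, am.ID, COUNT(a.DataSourceID) AS cnt
    FROM Datasource ds
    CROSS JOIN Agency ag
    CROSS JOIN AccountingMonth am
    LEFT JOIN Table1 a
      ON a.DataSourceID = ds.ID
     AND a.AgencyID = ag.ID
     AND a.AccountingMonth = am.ID
    GROUP BY ds.ID, ag.ID, am.ID
    ORDER BY ds.ID
""").fetchall()

print(rows)  # [(5, 4, 201907, 2), (6, 4, 201907, 0)]
```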
The way you are doing it only allows for one column and one record, named TEST. It does not look like you really need to test for NULL: you are returning NULL anyway, so ISNULL(..., NULL) does nothing to help you. Remove all the NULL testing and return a full record set (note that DISTINCT would also limit your result to one record). When working with a single table you don't need an alias, and square brackets around identifiers are not required unless the name contains spaces or is a keyword. If you need to know whether the record set is empty, test for that in the calling program.
SELECT DatasourceID, AgencyID,AccountingMonth
FROM Table1
WHERE DatasourceID = 5 AND AgencyID = 4 AND AccountingMonth = 201907
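To make "test for it in the calling program" concrete, here is a hypothetical Python sketch using sqlite3 and a parameterized version of the query above. The fetch helper and the sample row are illustrative, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (DatasourceID INTEGER, AgencyID INTEGER,
                         AccountingMonth INTEGER);
    INSERT INTO Table1 VALUES (5, 4, 201907);
""")

def fetch(datasource_id, agency_id, month):
    """Hypothetical helper: return all matching rows; [] means no data."""
    return conn.execute(
        "SELECT DatasourceID, AgencyID, AccountingMonth FROM Table1 "
        "WHERE DatasourceID = ? AND AgencyID = ? AND AccountingMonth = ?",
        (datasource_id, agency_id, month),
    ).fetchall()

found = fetch(5, 4, 201907)
missing = fetch(6, 4, 201907)
print(found)        # [(5, 4, 201907)]
print(not missing)  # True: the caller detects the empty result set
```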

Select values from one table depending on referenced value in another table

I have two tables in my SQLite Database (dummy names):
Table 1: FileID F_Property1 F_Property2 ...
Table 2: PointID ForeignKey(fileid) P_Property1 P_Property2 ...
The entries in Table2 all have a foreign key column that references an entry in Table1.
I now would like to select entries from Table2 where for example F_Property1 of the referenced file in Table1 has a specific value.
I tried something naive:
select * from Table2 where fileid=(select FileID from Table1 where F_Property1 > 1)
Now this actually works... kind of. It selects a correct file ID from Table1 and returns entries from Table2 with this ID. But it only uses the first returned ID. What I need is basically to combine the IDs returned by the inner select with OR, so that it returns data for all of the IDs.
How can I do this? I think it is some kind of cross-table query like the one asked about here: What is the proper syntax for a cross-table SQL query? But those answers contain no explanation of what they are actually doing, so I'm struggling with any implementation.
They are using JOIN statements, but wouldn't this mix entries from Table1 and Table2 together while only checking matching IDs in both tables? At least that is how I understand this http://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins
As you may have noticed from the style, I'm very new to using databases in general, so please forgive me if not everything is clear about what I want. Please leave a comment and I will try to improve the question if necessary.
The = operator compares a single value against another, so it is assumed that the subquery returns only a single row.
To check whether a (column) value is in a set of values, use IN:
SELECT *
FROM Table2
WHERE fileid IN (SELECT FileID
FROM Table1
WHERE F_Property1 > 1)
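A small SQLite reproduction (via Python's sqlite3; sample rows made up) shows the difference. SQLite does not reject a scalar subquery that returns several rows; it uses the first row, which matches the "only uses the first returned ID" behaviour in the question. Other engines, such as SQL Server, raise an error instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (FileID INTEGER PRIMARY KEY, F_Property1 INTEGER);
    CREATE TABLE Table2 (PointID INTEGER PRIMARY KEY,
                         fileid INTEGER REFERENCES Table1(FileID),
                         P_Property1 TEXT);
    INSERT INTO Table1 VALUES (1, 0), (2, 5), (3, 7);
    INSERT INTO Table2 VALUES (10, 1, 'a'), (11, 2, 'b'), (12, 3, 'c'), (13, 3, 'd');
""")

# With '=', SQLite uses only one value from the subquery (its first row),
# so points for just one file come back.
eq = conn.execute("""
    SELECT PointID, fileid FROM Table2
    WHERE fileid = (SELECT FileID FROM Table1 WHERE F_Property1 > 1)
""").fetchall()

# With IN, every FileID returned by the subquery is accepted.
in_rows = conn.execute("""
    SELECT PointID, fileid FROM Table2
    WHERE fileid IN (SELECT FileID FROM Table1 WHERE F_Property1 > 1)
    ORDER BY PointID
""").fetchall()

print(len({fid for _, fid in eq}))  # 1
print(in_rows)  # [(11, 2), (12, 3), (13, 3)]
```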
Joins do not work by "mixing" the data; they combine rows based on the key.
In your case (I am assuming the key field in Table1 is unique), if you join those two tables on that key field, you will end up with all the entries in Table2 that have a matching file, plus all the corresponding fields from Table1. If you did this:
select * from table1, table2 where table1.FileID = table2.fileid;
then, provided your key fields are set up correctly, you would end up with rows containing these columns:
FileID F_Property1 F_Property2 PointID ForeignKey(fileid) P_Property1 P_Property2
The Table1 field values in each row come from the matching Table1 row.
Now, if you do this:
select table2.* from table1, table2 where
table1.FileID = table2.fileid and F_Property1 > 1;
you essentially get the same set of records, but only the columns from the second table are shown, and only for the rows that satisfy the WHERE condition on the first one.
Hope this helps :)
If I understood your question correctly, this will get the job done:
Select t2.*
from table1 t1
inner join table2 t2 on t2.fileid = t1.FileID
where t1.F_Property1 > 1
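Because FileID is Table1's primary key, each Table2 row matches at most one Table1 row, so the join returns the same rows as the IN version and does not duplicate anything. A quick SQLite check with made-up rows (Python's sqlite3):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (FileID INTEGER PRIMARY KEY, F_Property1 INTEGER);
    CREATE TABLE table2 (PointID INTEGER PRIMARY KEY, fileid INTEGER);
    INSERT INTO table1 VALUES (1, 0), (2, 5), (3, 7);
    INSERT INTO table2 VALUES (10, 1), (11, 2), (12, 3), (13, 3);
""")

# Filter table2 through an inner join on the file key.
joined = conn.execute("""
    SELECT t2.* FROM table1 t1
    INNER JOIN table2 t2 ON t2.fileid = t1.FileID
    WHERE t1.F_Property1 > 1
    ORDER BY t2.PointID
""").fetchall()

# Same filter expressed with IN.
with_in = conn.execute("""
    SELECT * FROM table2
    WHERE fileid IN (SELECT FileID FROM table1 WHERE F_Property1 > 1)
    ORDER BY PointID
""").fetchall()

print(joined)             # [(11, 2), (12, 3), (13, 3)]
print(joined == with_in)  # True
```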

SQL statement to return data from a table in a different layout

What would the SQL statement look like to return the bottom result from the upper table?
The last letter of the key should be removed; it stands for the language. The EXP column should be split into 5 columns, one for each language prefix, each holding the matching value.
I'm not good at writing even moderately complex SQL statements, so any help would be appreciated!
The Microsoft Access equivalent of a PIVOT in SQL Server is known as a CROSSTAB. The following query will work for Microsoft Access 2010.
TRANSFORM First(table1.Exp) AS FirstOfEXP
SELECT Left([KEY],Len([KEY])-2) AS [XKEY]
FROM table1
GROUP BY Left([KEY],Len([KEY])-2)
PIVOT Right([KEY],1);
Access will throw a circular field reference error if you try to name the row heading KEY, since that is also the name of the original table field you are deriving it from. If you do not want XKEY as the field name, you would need to break the above query into two separate queries, as shown below:
qsel_table1:
SELECT Left([KEY],Len([KEY])-2) AS XKEY
, Right([KEY],1) AS [Language]
, Table1.Exp
FROM Table1
ORDER BY Left([KEY],Len([KEY])-2), Right([KEY],1);
qsel_table1_Crosstab:
TRANSFORM First(qsel_table1.Exp) AS FirstOfEXP
SELECT qsel_table1.XKEY AS [KEY]
FROM qsel_table1
GROUP BY qsel_table1.XKEY
PIVOT qsel_table1.Language;
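The same split-and-pivot can be sketched outside Access with conditional aggregation, a different technique from TRANSFORM/PIVOT that works in most SQL engines. This SQLite sketch (Python's sqlite3) assumes keys look like a base key plus one separator character plus the language letter (for example 'Report.Select_D'); the key and Exp values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 ([KEY] TEXT, Exp TEXT);
    INSERT INTO table1 VALUES
        ('Report.Select_D', 'text D'),
        ('Report.Select_E', 'text E'),
        ('Report.Select_F', 'text F');
""")

# substr([KEY], 1, length([KEY]) - 2) plays the role of Left([KEY], Len([KEY]) - 2);
# substr([KEY], -1) plays the role of Right([KEY], 1).
rows = conn.execute("""
    SELECT substr([KEY], 1, length([KEY]) - 2) AS XKEY,
           MAX(CASE WHEN substr([KEY], -1) = 'D' THEN Exp END) AS D,
           MAX(CASE WHEN substr([KEY], -1) = 'E' THEN Exp END) AS E,
           MAX(CASE WHEN substr([KEY], -1) = 'F' THEN Exp END) AS F,
           MAX(CASE WHEN substr([KEY], -1) = 'I' THEN Exp END) AS I,
           MAX(CASE WHEN substr([KEY], -1) = 'X' THEN Exp END) AS X
    FROM table1
    GROUP BY substr([KEY], 1, length([KEY]) - 2)
""").fetchall()

print(rows)  # [('Report.Select', 'text D', 'text E', 'text F', None, None)]
```

Because each language column is written out explicitly, all five columns appear even when a language has no value.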
In order to always output all language columns regardless of whether there is a value, you need to split those values off into a separate table. That table will then supply the row and column values for the crosstab, and the original table will supply the value expression. Using the two-query solution above, we would instead need to do the following:
table2:
This is a new table with a BASE_KEY TEXT*255 column and a LANG TEXT*1 column. Together these two columns will define the primary key. Populate this table with the following rows:
"AbstractItemNumberReportController.SelectPositionen", "D"
"AbstractItemNumberReportController.SelectPositionen", "E"
"AbstractItemNumberReportController.SelectPositionen", "F"
"AbstractItemNumberReportController.SelectPositionen", "I"
"AbstractItemNumberReportController.SelectPositionen", "X"
qsel_table1:
This query remains unchanged.
qsel_table1_crosstab:
The new table2 is added to this query with an outer join to the original table1 data (via qsel_table1). The outer join returns all rows from table2 regardless of whether there is a matching row in table1. Table2 now supplies the values for the row and column headings.
TRANSFORM First(qsel_table1.Exp) AS FirstOfEXP
SELECT Table2.Base_KEY AS [KEY]
FROM Table2 LEFT JOIN qsel_table1 ON (Table2.BASE_KEY = qsel_table1.XKEY)
AND (Table2.LANG = qsel_table1.Language)
GROUP BY Table2.Base_KEY
PIVOT Table2.LANG;
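The separate key/language table idea carries over to conditional aggregation: starting FROM table2 and LEFT JOINing the split table1 data keeps every base key in the output, even one with no translations at all. A SQLite sketch (Python's sqlite3) with hypothetical keys and values, using a derived table q in place of qsel_table1:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 ([KEY] TEXT, Exp TEXT);
    INSERT INTO table1 VALUES ('Report.Select_D', 'text D');

    -- Every base key / language pair that must appear in the output.
    CREATE TABLE table2 (BASE_KEY TEXT, LANG TEXT, PRIMARY KEY (BASE_KEY, LANG));
    INSERT INTO table2 VALUES
        ('Report.Select', 'D'), ('Report.Select', 'E'),
        ('Report.Other', 'D'), ('Report.Other', 'E');
""")

rows = conn.execute("""
    SELECT t2.BASE_KEY,
           MAX(CASE WHEN t2.LANG = 'D' THEN q.Exp END) AS D,
           MAX(CASE WHEN t2.LANG = 'E' THEN q.Exp END) AS E
    FROM table2 t2
    LEFT JOIN (SELECT substr([KEY], 1, length([KEY]) - 2) AS XKEY,
                      substr([KEY], -1) AS Language, Exp
               FROM table1) q
      ON t2.BASE_KEY = q.XKEY AND t2.LANG = q.Language
    GROUP BY t2.BASE_KEY
    ORDER BY t2.BASE_KEY
""").fetchall()

print(rows)  # [('Report.Other', None, None), ('Report.Select', 'text D', None)]
```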
Try something like this:
select *
from
(
select left([key], len([key]) - 2) as [key], right([key], 1) as id, [exp]
from table1
) x
pivot
(
max([exp])
for id in ([D], [E], [F], [I], [X])
) p