I have two tables:
Product Table
ID (PK), Description, CategoryID, SegmentID, TypeID, SubTypeID, etc.
Attribute Table
ID (PK), ProductID (FK), Key, Value
And I would like to query these two tables in a join that returns 1 row for each product with all of the Key/Value pair records in the Attribute table returned in a single column, perhaps separated by a pipe character (Key1: Value1 | Key2: Value2 | Key3: Value3 | etc.). Each product could have a different number of key/value pairs, with some products having as few as 2-3 and some having as many as 30. I would like to figure out how to get the query results to look something like this (perhaps selected into a new table):
product.ID, product.Description, [special attributes column], product.CategoryID, product.SegmentID, etc.
example result:
65839, "WonderWidget", "HeightInInches: 26 | WeightInLbs: 5 | Color: Black", "Widgets", "Commercial"
Conversely, it would be helpful to figure out how to take the query results, formatted as mentioned above, and push them back into the original Attribute table. For example, if we output the query above into a table where the [special attributes column] was modified (values updated/corrected by a human), it would be nice to know how to use the table containing the [special attributes column] to update the original Attribute table. I think for that to be possible, the Attribute.ID field would need to be included in the query output.
In the end, what I am trying to accomplish is a way to export the Product and Attribute data out to 1 row per product with all the attribute data so that it can be reviewed/updated/corrected by a human in something as simple as an Excel file, and then pushed back into SQL. I think I can figure out how to do all of that once I get over the hurdle of figuring out how to get the products and attributes out as one row per product. Perhaps the correct answer is to pivot all of the attributes into columns, but I'm afraid the query would be incredibly wide and wasteful. Open to suggestions for this as well. Changing to a document database is not an option right now; I need to figure out the best way to handle this in relational SQL.
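You first need to combine each Key/Value pair into a single string. This can be done with a concatenation operator such as || (SQL Server uses + or the CONCAT function instead). You need to think about NULLs as well: in most databases, concatenating a value with NULL yields NULL (SQL Server's CONCAT function is an exception; it treats NULL as an empty string).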
SELECT ProductID, Key || ':' || Value as KeyValue FROM AttributeTable
Then you need to group those with an aggregate function like STRING_AGG (assuming SQL Server 2017 or later). Other databases have different aggregate functions; MySQL, for example, uses GROUP_CONCAT.
https://learn.microsoft.com/en-us/sql/t-sql/functions/string-agg-transact-sql?view=sql-server-2017
https://www.geeksforgeeks.org/mysql-group_concat-function/
SELECT ProductID, STRING_AGG(CONCAT([Key], ': ', [Value]), ' | ') AS KeyValues FROM AttributeTable GROUP BY ProductID
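To see the whole one-row-per-product result from the question, here is a minimal sketch using SQLite (via Python's sqlite3 module). In SQLite, group_concat(X, separator) plays the role of STRING_AGG / GROUP_CONCAT. The table layout and sample rows are assumptions modeled on the question. Column names are simplified, and the category and segment are stored as text to match the example output.

```python
import sqlite3

# Hypothetical in-memory copies of the Product and Attribute tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (ID INTEGER PRIMARY KEY, Description TEXT, CategoryID TEXT, SegmentID TEXT);
CREATE TABLE Attribute (ID INTEGER PRIMARY KEY, ProductID INTEGER REFERENCES Product(ID),
                        "Key" TEXT, "Value" TEXT);
INSERT INTO Product VALUES (65839, 'WonderWidget', 'Widgets', 'Commercial');
INSERT INTO Attribute (ProductID, "Key", "Value") VALUES
  (65839, 'HeightInInches', '26'),
  (65839, 'WeightInLbs', '5'),
  (65839, 'Color', 'Black');
""")

# group_concat(X, separator) is SQLite's counterpart of STRING_AGG / GROUP_CONCAT.
# coalesce() keeps a NULL Value from turning the whole "Key: Value" pair into NULL.
# SQLite does not guarantee the order in which the pairs are concatenated.
rows = conn.execute("""
SELECT p.ID, p.Description,
       group_concat(a."Key" || ': ' || coalesce(a."Value", ''), ' | ') AS Attributes,
       p.CategoryID, p.SegmentID
FROM Product p
LEFT JOIN Attribute a ON a.ProductID = p.ID
GROUP BY p.ID, p.Description, p.CategoryID, p.SegmentID
""").fetchall()

for row in rows:
    print(row)
```

The LEFT JOIN keeps products that have no attributes at all; they come back with a NULL Attributes column.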
I can expand on the answer if you can provide more information.
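For the reverse direction asked about in the question (pushing a human-edited attribute string back into the Attribute table), one possible sketch is shown below. It assumes that each Key appears at most once per product and that keys and values never contain the ": " or " | " separators. Under that assumption, rows can be matched on (ProductID, Key), and Attribute.ID is not strictly needed. The function names parse_attributes and push_back are hypothetical.

```python
import sqlite3

def parse_attributes(text):
    """Split 'Key1: Value1 | Key2: Value2' into [(key, value), ...].

    Assumes keys and values never contain the ' | ' or ': ' separators.
    """
    pairs = []
    for chunk in text.split(" | "):
        if chunk.strip():
            key, _, value = chunk.partition(": ")
            pairs.append((key.strip(), value.strip()))
    return pairs

def push_back(conn, product_id, text):
    """Write edited values back, matching rows on (ProductID, Key) instead of Attribute.ID."""
    for key, value in parse_attributes(text):
        cur = conn.execute(
            'UPDATE Attribute SET "Value" = ? WHERE ProductID = ? AND "Key" = ?',
            (value, product_id, key),
        )
        if cur.rowcount == 0:  # a key the reviewer added: insert it as a new row
            conn.execute(
                'INSERT INTO Attribute (ProductID, "Key", "Value") VALUES (?, ?, ?)',
                (product_id, key, value),
            )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Attribute (ID INTEGER PRIMARY KEY, ProductID INTEGER, "Key" TEXT, "Value" TEXT)')
conn.executemany('INSERT INTO Attribute (ProductID, "Key", "Value") VALUES (?, ?, ?)',
                 [(65839, "HeightInInches", "26"), (65839, "Color", "Black")])

# A reviewer corrected the color and added a weight in the spreadsheet.
push_back(conn, 65839, "HeightInInches: 26 | WeightInLbs: 5 | Color: Blue")
```

Keys that the reviewer deleted are left untouched here. Handling deletions would need an extra step, such as removing rows whose Key no longer appears in the edited string.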
This is my first time dealing with indexes, and I would like to understand a few things.
I have tables with the following schemas:
Table1: Customer details
id | name | createdOn | username | phone    | address
-----------------------------------------------------
1  | xyz  | some date | xyz12    | 12345678 | abc
The id in the above table is unique. The id is not defined as PK in the table though. Would id + createdOn be a good complex index?
Table2: Tracked customer actions
customer id | name | timestamp | action type | cart value | address
-------------------------------------------------------------------
1           | xyz  | some date | click       | .          | abc
The above table does not have any column with unique values, and there can be a lot of sparse data. The above actions table is a sample and can have almost 18 columns, with new data being added frequently. Is having all columns in an index a good idea?
The queries on these tables could be both simple and complex as below:
select * from customerDetails
OR
with target_customers as (
select
id as customer_id
from customerDetails
where createdOn > {some date}
)
select avg(a.cart_value) from actions a
inner join target_customers b on a.customer_id = b.customer_id
where a.action_type = 'cart updated'
These are sample queries, and I expect to have even more complex queries using different aggregations and joins with other tables to gain insights while performing analytics in the future.
I want to understand the best columns for indexes on the above tables.
"The id is not defined as PK in the table though."
That's unusual. Why is that?
Would id + createdOn be a good complex index?
No, you'd reverse it: createdOn, id. An index can use the first column alone. This allows you to use the index to order by createdOn and also createdOn between X and Y.
But you probably wouldn't include id in there at all. Make id a primary key and it is indexed.
In general, if you want to cover all possibilities for two keys, make two indexes...
columnA, columnB
columnB
columnA, columnB can cover queries which only reference columnA and can also order by columnA. It can also cover queries which reference both columnA and columnB. But it can't cover a query which only references columnB, so we need a single-column index for columnB.
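The leading-column rule can be checked directly with a query planner. This sketch uses SQLite's EXPLAIN QUERY PLAN on a copy of the customer table, with id deliberately left as a plain column, as in the question. The exact plan wording differs between SQLite versions and between databases, so the code only checks which index name, if any, the plan mentions. The index names idx_created_id and idx_id are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# id is deliberately not a PRIMARY KEY, as in the question.
conn.execute("CREATE TABLE customerDetails (id INTEGER, name TEXT, createdOn TEXT, "
             "username TEXT, phone TEXT, address TEXT)")
conn.execute("CREATE INDEX idx_created_id ON customerDetails (createdOn, id)")

def plan(sql, params=()):
    """Return the detail text of SQLite's EXPLAIN QUERY PLAN output as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " / ".join(str(r[-1]) for r in rows)

# The leading column alone can use the composite index (range search on createdOn).
by_date = plan("SELECT * FROM customerDetails WHERE createdOn BETWEEN ? AND ?",
               ("2023-01-01", "2023-12-31"))
# The second column alone cannot use it: the plan is a full table scan.
by_id_before = plan("SELECT * FROM customerDetails WHERE id = ?", (1,))

# A separate single-column index covers the id-only lookup.
conn.execute("CREATE INDEX idx_id ON customerDetails (id)")
by_id_after = plan("SELECT * FROM customerDetails WHERE id = ?", (1,))

print(by_date, by_id_before, by_id_after, sep="\n")
```

Other engines expose the same information through EXPLAIN (for example, EXPLAIN ANALYZE in PostgreSQL). It is worth checking the plans of your real queries before adding indexes.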
Is having all columns in an index a good idea?
Maybe, it depends on your queries, but probably not.
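You want to index foreign keys, because that will speed up joins on them. Some databases (MySQL's InnoDB, for example) index foreign key columns automatically, but PostgreSQL does not, so check what yours does.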
You probably want to index timestamps that you're going to search or order by.
Any flags you often query by, such as where action_type = 'cart updated' you may want to index. Or you may want to partition the table by the action type.
The above actions table is a sample and can have almost 18 columns, with new data being added frequently.
This may be a good use of a single jsonb column to store all the miscellaneous attributes. This allows you to use a single index for the jsonb column. However, jsonb is not a panacea and you will have to choose what to put in jsonb and what to make columns.
For example, a timestamp such as createdOn should probably be a column. Also any foreign keys. And status flags such as action_type.
I have a logic problem: I need to calculate the final value from this table:
https://i.stack.imgur.com/YPXXX.png
For every row, I need to count +1 when column TIPO is "E" and -1 when it is "S", grouping by columns Codigo and Configuracao.
Basically, I need a simple stock control: the columns Codigo and Configuracao identify the product, and TIPO is the type of movement, S = OUT and E = IN.
Can anyone shed some light on this?
Untested, but maybe this:
select SUM(t1.TipoNumeric), t1.CODIGO, t1.CONFIGURACAO from (
select
case (TIPO)
when 'E' then 1
when 'S' then -1
else 0
end as TipoNumeric,
CODIGO,
CONFIGURACAO
from MyTable
) as t1
group by t1.CODIGO, t1.CONFIGURACAO
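The same +1/-1 idea can be tried out in SQLite. Here the CASE expression is folded directly into SUM() instead of using a derived table, which gives the same result. The sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (CODIGO TEXT, CONFIGURACAO TEXT, TIPO TEXT)")
# Hypothetical sample rows: E = stock in, S = stock out.
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?)", [
    ("A1", "RED", "E"), ("A1", "RED", "E"), ("A1", "RED", "S"),
    ("A1", "BLUE", "E"),
    ("B7", "STD", "S"),
])

# +1 for every 'E', -1 for every 'S', 0 for anything else, summed per product.
rows = conn.execute("""
SELECT CODIGO, CONFIGURACAO,
       SUM(CASE TIPO WHEN 'E' THEN 1 WHEN 'S' THEN -1 ELSE 0 END) AS saldo
FROM MyTable
GROUP BY CODIGO, CONFIGURACAO
ORDER BY CODIGO, CONFIGURACAO
""").fetchall()
print(rows)
```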
Just add that +1/-1 column, perhaps?
alter table MyTable
add tipo_val computed by
(
decode( upper(TIPO), 'E', +1, 'S', -1 )
)
https://firebirdsql.org/file/documentation/html/en/refdocs/fblangref25/firebird-25-language-reference.html#fblangref25-ddl-tbl
https://www.firebirdsql.org/refdocs/langrefupd21-intfunc-decode.html
And then:
Select * from MyTable;
Select SUM(tipo_val), CODIGO, CONFIGURACAO
From MyTable
Group by 2, 3
P.S. do not use pictures to show your data.
Instead, put them into http://dbfiddle.uk/?rdbms=firebird_3.0 as a script,
and then use Markdown Export there to copy both data and a hyperlink into your question text.
P.P.S. I believe your whole approach is wrong if you "need a simple stock control".
https://en.wikipedia.org/wiki/Double-entry_bookkeeping
https://medium.com/#RobertKhou/double-entry-accounting-in-a-relational-database-2b7838a5d7f8
I think your table should have columns like these:
surrogate row id, primary key, auto-incrementing integer, 32-bits or 64-bits
columns identifying your item. Usually this is, again, a single surrogate integer SKU (Stock Keeping Unit) referencing (see: foreign keys) another "dictionary table". In your case it seems to be two columns, Codigo and Configuracao, but that also means you cannot add extra information ("attributes") about your items, like price or photo (read: database normalization). It also makes grouping harder for the Firebird engine than using a single integer column. Also, you did create an index on the item-identifying column(s), didn't you? What is your query plan for those selects: do they use an index on Codigo and Configuracao, or ad hoc external sorting instead?
the timestamp of an operation, that is automatically set by the Firebird server to be current_timestamp, so you always know when exactly that row was inserted. Indexed, of course.
the computer user who added that row, again, automatically set by Firebird server to current_user or to an ID of a user in some stock_workers table you would create. Surely, indexed too.
some description of the operation, like a contract number or seller name, anything that would help you later remember what real-world event that row describes. Being free-form text, it probably would not be indexed. But maybe you would eventually create contracts or sellers tables and add integer references (FK IDs) to them? That depends on exactly which kind of data is repeated often enough to be worth extracting into extra indexed columns.
maybe a unit of measure. Maybe all your units would forever be measured in pieces, as integer quantities. But maybe some items would be measured in kilograms, meters, liters, etc.?
finally, two integer (or float?) columns like Qty_Income and Qty_Outcome, where you would record how many items were added to or taken from your depot. There would be no E/S column! There would be two integer columns, and you would put the number into one or the other. Why? Read the articles about bookkeeping above!
In such a database scheme your query would finally look like this:
select Sum(s.Qty_Income) as Credit, Sum(s.Qty_Outcome) as Debit,
Sum(s.Qty_Income) - Sum(s.Qty_Outcome) as Saldo,
min(g.Codigo), min(g.Configuracao)
from stock_movements s
join known_goods g on g.ID = s.SKU_ID
group by s.SKU_ID
And you would also be able to flexibly compose similar requests grouping by workers, or dates, or quantities (like, only care about BIG events like 1000 or more items added in one operation), or anything.
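As a rough illustration of the layout described above, here is a simplified SQLite version of the two tables followed by the same query. It omits the user column, because SQLite has no server-side current_user. The DEFAULT CURRENT_TIMESTAMP column stands in for the server-set timestamp. Table and column names beyond those in the query above, and the sample data, are assumptions. In Firebird, the surrogate key would typically come from an identity column or a generator rather than SQLite's INTEGER PRIMARY KEY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical version of the suggested layout: a goods dictionary
# plus a movements table with separate income/outcome quantity columns.
conn.executescript("""
CREATE TABLE known_goods (
    ID INTEGER PRIMARY KEY,
    Codigo TEXT NOT NULL,
    Configuracao TEXT NOT NULL
);
CREATE TABLE stock_movements (
    ID INTEGER PRIMARY KEY,                        -- surrogate row id
    SKU_ID INTEGER NOT NULL REFERENCES known_goods(ID),
    Created_At TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    Description TEXT,
    Qty_Income INTEGER NOT NULL DEFAULT 0,
    Qty_Outcome INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX idx_movements_sku ON stock_movements (SKU_ID);
CREATE INDEX idx_movements_time ON stock_movements (Created_At);
INSERT INTO known_goods (ID, Codigo, Configuracao) VALUES (1, 'A1', 'RED'), (2, 'B7', 'STD');
INSERT INTO stock_movements (SKU_ID, Description, Qty_Income, Qty_Outcome) VALUES
    (1, 'supplier delivery', 10, 0),
    (1, 'sale', 0, 3),
    (2, 'supplier delivery', 5, 0);
""")

rows = conn.execute("""
SELECT SUM(s.Qty_Income) AS Credit, SUM(s.Qty_Outcome) AS Debit,
       SUM(s.Qty_Income) - SUM(s.Qty_Outcome) AS Saldo,
       MIN(g.Codigo), MIN(g.Configuracao)
FROM stock_movements s
JOIN known_goods g ON g.ID = s.SKU_ID
GROUP BY s.SKU_ID
ORDER BY s.SKU_ID
""").fetchall()
print(rows)
```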
I have a table with an id and a name.
I get a list of IDs and I need their names.
As far as I know, I have two options.
Create a for loop in my code which executes:
SELECT name from table where id=x
where x is always a number.
Or write a single query like this:
SELECT name from table where id=1 OR id=2 OR id=3
The list of IDs and names is enormous, so I don't think that is a good idea.
The problem is that the ID is not always a number but a randomly generated ID containing numbers and characters, so using ranges is not a solution.
I'm asking this from a performance point of view.
What's a nice solution for this problem?
SQLite has limits on the size of a query, so if there is no known upper limit on the number of IDs, you cannot use a single query.
When you are reading multiple rows (note: IN (1, 2, 3) is easier than many ORs), you don't know to which ID a name belongs unless you also SELECT that, or sort the results by the ID.
There should be no noticeable difference in performance; SQLite is an embedded database without client/server communication overhead, and the query does not need to be parsed again if you use a prepared statement.
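Putting those three points together, one approach is to send the IDs in fixed-size batches with bound parameters and select the id alongside the name. The batch size of 500 is an assumption, chosen to stay below SQLite's historical default limit of 999 bound parameters (newer versions allow more). The table name items is hypothetical, because table is a reserved word.

```python
import sqlite3

CHUNK = 500  # stays below the historical default limit of 999 bound parameters

def names_for_ids(conn, ids):
    """Look up names for an arbitrarily long list of ids, returning {id: name}.

    The id is selected alongside the name so each row can be matched back
    to its id; ids that do not exist are simply absent from the result.
    """
    result = {}
    ids = list(ids)
    for start in range(0, len(ids), CHUNK):
        chunk = ids[start:start + CHUNK]
        placeholders = ",".join("?" * len(chunk))  # "?,?,?,..."
        sql = f"SELECT id, name FROM items WHERE id IN ({placeholders})"
        result.update(conn.execute(sql, chunk).fetchall())
    return result

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(f"id{n:04x}", f"name {n}") for n in range(2000)])
wanted = [f"id{n:04x}" for n in range(0, 2000, 2)] + ["missing"]
found = names_for_ids(conn, wanted)
print(len(found))  # 1000: every even id was found, "missing" is absent
```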
A "nice" solution is using the IN operator:
SELECT name from table where id in (1,2,3)
The IN operator is syntactic sugar built for exactly this purpose:
SELECT name from table where id IN (1,2,3,4,5,6.....)
Assuming you receive the list of IDs whose names you need as an input temp table #InputIDTable:
SELECT name from table WHERE ID IN (SELECT id from #InputIDTable)
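#InputIDTable is SQL Server temp-table syntax. Since the question is about SQLite, here is a sketch of the same idea with SQLite's CREATE TEMP TABLE. Loading the IDs into a table avoids any limit on query length or parameter count. The table names input_ids and items, and the helper names_via_temp_table, are hypothetical.

```python
import sqlite3

def names_via_temp_table(conn, ids):
    """Load the ids into a temporary table, then look them all up in one query."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS input_ids (id TEXT PRIMARY KEY)")
    conn.execute("DELETE FROM input_ids")  # clear ids left over from a previous call
    conn.executemany("INSERT OR IGNORE INTO input_ids (id) VALUES (?)", ((i,) for i in ids))
    return dict(conn.execute(
        "SELECT id, name FROM items WHERE id IN (SELECT id FROM input_ids)"
    ).fetchall())

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [("x9k", "alpha"), ("7qz", "beta"), ("m2p", "gamma")])
print(names_via_temp_table(conn, ["7qz", "m2p", "nope"]))
```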
I am trying to create a query that returns a single row for each unique ID in my Oracle table.
The problem is that one column, Description, is not unique per ID (Description is the only column that can differ between rows with the same ID, by the way). This is what my table looks like:
ID Description Customer
==================================================
5119450733 Cost GOW_1
5119450733 Price GOW_1
1543512377 Cost GOW_2
Is there a way to query the table so that the Description values are appended together and I get one row per ID? For example, like this:
ID Description Customer
==================================================
5119450733 Cost,Price GOW_1
1543512377 Cost GOW_2
Use the LISTAGG function if you are using Oracle 11g Release 2 or later.
SELECT Id,
listagg(Description,',') WITHIN GROUP(ORDER BY description) AS Description,
Customer
FROM <table_name>
GROUP BY id, customer;
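To make the semantics of the query explicit, here is a small Python emulation of what it computes on the question's sample rows. It is not Oracle code. It groups by (id, customer) and joins each group's descriptions in sorted order, which is what WITHIN GROUP (ORDER BY description) requests.

```python
from itertools import groupby

# The sample rows from the question: (ID, Description, Customer).
rows = [
    (5119450733, "Cost", "GOW_1"),
    (5119450733, "Price", "GOW_1"),
    (1543512377, "Cost", "GOW_2"),
]

def listagg(rows):
    """Emulate GROUP BY id, customer with LISTAGG(description, ',') WITHIN GROUP (ORDER BY description)."""
    key = lambda r: (r[0], r[2])
    out = []
    for (id_, customer), group in groupby(sorted(rows, key=key), key=key):
        descriptions = sorted(r[1] for r in group)  # WITHIN GROUP (ORDER BY description)
        out.append((id_, ",".join(descriptions), customer))
    return out

print(listagg(rows))
```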
Refer to the link below to learn more about string aggregation techniques in different versions.
I have two tables which look like the following:
tools:
id | part name
---------------
0 | hammer
1 | sickle
2 | axe
people:
personID | ownedTool1 | ownedTool2 | ownedTool3 ..... ownedTool20
------------------------------------------------------------------
0 | 2 | 1 | 3 ... ... 0
I'm trying to find out how many people own a particular tool. A person cannot own multiple copies of the same tool.
The only way I can think of doing this is something like
SELECT COUNT(*)
FROM tools JOIN people ON tools.id = people.ownedTool1 OR tools.id = people.ownedTool2 ... and so on
WHERE tools.id = 0
to get the number of people who own hammers. I believe this will work, however, this involves having 20 OR statements in the query. Surely there is a more appropriate way to form such a query and I'm interested to learn how to do this.
You shouldn't have 20 columns each possibly containing an ID in the first place. You should properly establish a normalized schema. If a tool can belong to only one user - but a user can have multiple tools, you should establish a One to Many relationship. Each tool will have a user id in its row that maps back to the user it belongs to. If a tool can belong to one or more users you will need to establish a Many to Many relationship. This will require an intermediate table that contains rows of user_id to tool_id mappings. Having a schema set up appropriately like that will make the query you're looking to perform trivial.
In your particular case it seems like a user can have many tools and a tool can be "shared" by many users. For your many-to-many relation all you would have to do is count the number of rows in that intermediate table having your desired tool_id.
Something like this:
SELECT COUNT(ID) FROM UserTools Where ToolID = #desired_tool_id
Googling the terms I bolded should get you pointed in the correct direction. If you're stuck with that schema then the way you pointed out is the only way to do it.
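A minimal sketch of the many-to-many version in SQLite follows. Unlike the UserTools table above, it uses a composite primary key instead of a separate ID column. That is a design choice, not a requirement, and it also enforces "a person cannot own multiple copies of the same tool". Column names and sample data are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tools  (id INTEGER PRIMARY KEY, part_name TEXT NOT NULL);
CREATE TABLE people (personID INTEGER PRIMARY KEY);
-- Many-to-many link table; the composite primary key stops a person
-- from being recorded as owning the same tool twice.
CREATE TABLE UserTools (
    PersonID INTEGER NOT NULL REFERENCES people(personID),
    ToolID   INTEGER NOT NULL REFERENCES tools(id),
    PRIMARY KEY (PersonID, ToolID)
);
INSERT INTO tools VALUES (0, 'hammer'), (1, 'sickle'), (2, 'axe');
INSERT INTO people VALUES (0), (1);
INSERT INTO UserTools VALUES (0, 2), (0, 1), (0, 0), (1, 0);
""")

# How many people own a hammer (tool id 0)?
hammer_owners = conn.execute(
    "SELECT COUNT(*) FROM UserTools WHERE ToolID = ?", (0,)
).fetchone()[0]
print(hammer_owners)
```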
If you cannot change the model (and I'm sure you will tell us that), then the only sensible way to work around this broken datamodel is to create a view that will give you a normalized view (pun intended) on the data:
create view normalized_people
as
select personid,
ownedTool1 as toolid
from people
union all
select personid,
ownedTool2 as toolid
from people
union all
select personid,
ownedTool3 as toolid
from people
... you get the picture ...
Then your query is as simple as
select count(personid)
from normalized_people
where toolid = 0;
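Here is a runnable version of the view in SQLite, trimmed to three of the twenty ownedToolN columns. Empty tool slots are assumed to be NULL, and NULLs never match the toolid filter. COUNT(personID) is enough here because the question states a person cannot own the same tool twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Only three of the twenty ownedToolN columns, to keep the sketch short.
conn.executescript("""
CREATE TABLE people (personID INTEGER PRIMARY KEY,
                     ownedTool1 INTEGER, ownedTool2 INTEGER, ownedTool3 INTEGER);
INSERT INTO people VALUES (0, 2, 1, 0), (1, 0, NULL, NULL), (2, 1, 2, NULL);

-- One (personID, toolid) row per filled tool slot.
CREATE VIEW normalized_people AS
SELECT personID, ownedTool1 AS toolid FROM people
UNION ALL
SELECT personID, ownedTool2 AS toolid FROM people
UNION ALL
SELECT personID, ownedTool3 AS toolid FROM people;
""")

count = conn.execute(
    "SELECT COUNT(personID) FROM normalized_people WHERE toolid = ?", (0,)
).fetchone()[0]
print(count)
```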
You received your (warranted) lectures about the database design.
As to your question, there is a simple way:
SELECT count(*) AS person_ct
FROM tbl t
WHERE translate((t)::text, '()', ',,')
~~ ('%,' || #desired_tool_id::text || ',%')
Or, if the first column is person_id and you want to exclude that one from the search:
SELECT count(*) AS person_ct
FROM tbl t
WHERE replace((t)::text, ')', ',')
~~ ('%,' || #desired_tool_id::text || ',%')
Explanation
Every table is accompanied by a matching composite type in PostgreSQL. So you can query any table this way:
SELECT (tbl) FROM tbl;
Yields one column per row, holding the whole row.
PostgreSQL can cast such a row type to text in one fell swoop: (tbl)::text
I replace both parens () with a comma , so every value of the row is delimited by commas ,.
My second query does not translate the opening parenthesis, so the first column (person_id) is excluded from the search.
Now I can search all columns with a simple LIKE (~~) expression, using the desired number delimited by commas: ~~ '%,17,%'
Voilà: all done with one simple command. This is reliable as long as you don't have columns like text or int[] in your table that could also hold ,17, within their values, or additional columns with numbers, which could lead to false positives.
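The string logic behind this trick can be emulated in Python. This is not PostgreSQL: row_text below is only a rough stand-in for the (t)::text cast, assuming an all-integer row. PostgreSQL renders NULL fields as empty, but it would quote and escape text columns differently. The sketch shows why the comma delimiters stop 17 from matching 117, and how translating the opening parenthesis lets person_id itself produce a false positive.

```python
def row_text(row):
    """Rough stand-in for PostgreSQL's (t)::text on an all-integer row.

    Assumption: integers only; NULL (None) renders as an empty field.
    """
    return "(" + ",".join("" if v is None else str(v) for v in row) + ")"

def owns_tool(row, tool_id, skip_first=False):
    text = row_text(row)
    if skip_first:
        text = text.replace(")", ",")  # like replace(..., ')', ','): person_id keeps its '('
    else:
        text = text.replace("(", ",").replace(")", ",")  # like translate(..., '()', ',,')
    return f",{tool_id}," in text  # like ~~ '%,17,%'

# (person_id, ownedTool1, ownedTool2, ownedTool3)
people = [(0, 2, 17, 3), (17, 4, 5, None), (1, 117, 0, None)]

owners = [p[0] for p in people if owns_tool(p, 17)]
owners_excluding_id = [p[0] for p in people if owns_tool(p, 17, skip_first=True)]
print(owners, owners_excluding_id)
```

With the first form, person 17 is wrongly counted because their person_id matches. With the second form, only person 0, who actually owns tool 17, is counted. In both cases person 1's 117 does not match.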
It won't deliver performance wonders, as it cannot use standard indexes. (You could create a GiST or GIN index on an expression using the pg_trgm module in pg 9.1, but that's another story.)
Anyway, if you want to optimize, you'd better start by normalizing your table layout as has been suggested.