
2018/08/12

[PostgreSQL] How to match a value inside a comma-separated column

Problem
I retrieve 3 records via the following SQL statement:
  select id,  domain_projects 
  from project
  where domain_classifier = true 



The value in the domain_projects column is comma-separated. If I would like to pick out the rows whose domain_projects contains 26, how to do it?




How-To
Use string_to_array to split the column and ANY to test for membership:
  select id,  domain_projects 
  from project
  where domain_classifier = true 
        and '26' = any(string_to_array(domain_projects, ',')) 
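What the expression does can be sketched in plain Python (illustrative values, not PostgreSQL itself): string_to_array splits the text on commas, and '26' = ANY(...) tests for an exact element match, unlike a LIKE '%26%' substring check.

```python
domain_projects = "3,12,26,40"

# string_to_array(domain_projects, ',') splits the text into elements
parts = domain_projects.split(",")

# '26' = ANY(...) is true when some element equals '26' exactly;
# a plain substring check would also (wrongly) match '126' or '261'
print("26" in parts)        # exact element match
print("26" in "126,261")    # why a substring check is not enough
```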






Reference
[1] https://www.postgresql.org/docs/9.1/static/functions-array.html
[2] https://www.postgresql.org/docs/9.1/static/functions-comparisons.html

2018/08/09

[PostgreSQL] No operator matches the given name and argument type(s). You might need to add explicit type casts.

Problem
I have a table named project (the data type of project.id is integer):

The domain_projects column stores comma-separated values.


When I execute the following SQL statement, I get this error:
SQLState: 42883, No operator matches the given name and argument type(s). You might need to add explicit type casts.


  select *
  from project
  where id in (select unnest(string_to_array( (select domain_projects from project where id = :projectId), ',')))



How-To
Because the return type of unnest is a set of anyelement (text elements here), the result must be cast to integer to match project.id.



Hence, the updated SQL statement is as follows:
  select *
  from project
  where id in (select cast(unnest(string_to_array( (select domain_projects from project where id = :projectId), ',')) as int))


2018/08/08

[PostgreSQL] How to split a string into multiple rows?

Problem
I have a table named project:

The domain_projects column stores comma-separated values.


I attempt to split the domain_projects string into multiple rows:


How to do it?

How-To
This SQL statement fulfills the requirement:
  select cast(unnest(string_to_array( (select domain_projects from project where id = :id), ',')) as int) as project_id
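The split-then-cast pipeline can be sketched in plain Python with an illustrative value (not the post's actual data):

```python
domain_projects = "3,12,26,40"

# unnest(string_to_array(..., ',')) turns one comma-separated text value
# into one row per element; casting each element then yields integers.
project_ids = [int(part) for part in domain_projects.split(",")]
print(project_ids)  # [3, 12, 26, 40]
```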



Reference
[1] https://www.postgresql.org/message-id/20100121190706.GA12363@tux
[2] https://www.postgresql.org/docs/9.2/static/functions-array.html

2018/07/10

[SQuirreL SQL Client] How to avoid the "An I/O error occurred while sending to the backend" error

Problem
I am using SQuirreL SQL Client as my database client tool, but it is very annoying to get this error after the connection has been idle for a while:
An I/O error occurred while sending to the backend

This error results from the connection session timing out. How can I keep the session alive instead of reconnecting every time?


How-To
Enable Keep-Alive for the connection alias (Alias Properties → Connection) and set a lightweight keep-alive statement such as select 1, so the client periodically touches the session and it never idles out.




2018/07/09

[PostgreSQL] How to insert data with new line character?

Problem
I have a table named test.
The content column needs to contain newline characters; I tried \n in the literal, but in vain:
   insert into test (project_id, name, content) 
   values(5, 'test', 'aaa\nbbb');

How-To
Use chr(10) instead of \n. Therefore, the SQL statement is updated as follows:
   insert into test (project_id, name, content) 
   values(5, 'test', 'aaa' || chr(10) || 'bbb');
(PostgreSQL escape string literals are another option: E'aaa\nbbb' does interpret \n.)
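A scaled-down, self-contained check of the concatenation approach using Python's sqlite3 (SQLite spells the function char() where PostgreSQL spells it chr(); the table mirrors the post's test table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table test (project_id int, name text, content text)")

# In a standard SQL string literal, '\n' is just a backslash and an 'n';
# concatenating the newline code point works instead.
conn.execute(
    "insert into test (project_id, name, content) "
    "values (5, 'test', 'aaa' || char(10) || 'bbb')"
)

content = conn.execute("select content from test").fetchone()[0]
print(repr(content))  # 'aaa\nbbb'
```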


2018/07/08

[PostgreSQL] Example for with...as..insert into

Problem
I want to take data from intent and utter_template, and insert rows into story when they do not already exist.


How-To
Here is the example:
   with source as (
      select t.project_id, t.intent, t.utter_template, t.story_name
      from (
       select t1.project_id, '* _' || t1.value || chr(10) as intent, 
              '   - ' || t2.name || chr(10) utter_template, 
              'story_' || substring(t1.value from 4) story_name 
         from (select * from intent 
               where project_id=:project_id and value like 'qa_%') t1, 
              (select * from utter_template 
               where project_id=:project_id and name like 'utter_qa%') t2 
         where substring(t1.value from 4) =  substring(t2.name from 10)     
      ) t
      where (t.story_name) not in 
            (select name from story where project_id=:project_id)
   )
   insert into story (project_id, name, content) 
   (select project_id, story_name, intent || utter_template from source)
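The same with ... as (...) insert into ... select shape can be tried on any SQL engine; here is a scaled-down sketch using Python's sqlite3 with hypothetical intent/story rows (sqlite's substr plays the role of PostgreSQL's substring ... from):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table intent (project_id int, value text);
    create table story  (project_id int, name text);
    insert into intent values (1, 'qa_hello'), (1, 'qa_bye');
    insert into story  values (1, 'story_hello');
""")

# Build candidate rows in a CTE, then insert only the ones
# whose story name is not already present in story.
conn.execute("""
    with source as (
        select project_id, 'story_' || substr(value, 4) as story_name
        from intent
        where value like 'qa_%'
          and 'story_' || substr(value, 4) not in
              (select name from story where project_id = 1)
    )
    insert into story (project_id, name)
    select project_id, story_name from source
""")

print(sorted(row[0] for row in conn.execute("select name from story")))
```

Only 'story_bye' is inserted; 'story_hello' already exists and is filtered out by the NOT IN.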


2018/06/13

[PostgreSQL] How to update data from a sub-query

Problem
I have two tables, project and intent, in my database.


I imported data into the intent table, but found I mistyped some data in the intent.value column. All data in intent.value should start with qa_.

How can I fix this with an update statement?

How-To
Here is an example that fixes it:
  update intent
  set value = source.new_value
  from(
      select id as intent_id, value, 'qa_' || value as new_value, project_id as proj_id
      from intent 
      where project_id= (select id from project where identifier='test')
      and substring(value, 1, 3) <> 'qa_' 
  ) as source
  where id = source.intent_id and project_id = source.proj_id
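The correction itself can be sanity-checked with Python's sqlite3; this sketch uses hypothetical rows and the join-free form of the update (the post's PostgreSQL update ... from carries the new value through a derived table, but the effect is the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table intent (id integer primary key, project_id int, value text)")
conn.executemany(
    "insert into intent (project_id, value) values (?, ?)",
    [(1, "qa_greet"), (1, "weather"), (1, "qa_bye"), (2, "news")],
)

# Prefix every value in the target project that does not already start with 'qa_'.
conn.execute("""
    update intent
    set value = 'qa_' || value
    where substr(value, 1, 3) <> 'qa_'
      and project_id = 1
""")

print([row[0] for row in conn.execute("select value from intent order by id")])
```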


2018/05/15

[PostgreSQL] window function: row_number() over (partition by col order by col)

Here is a query SQL statement and its result:
select p.identifier, i.value 
from project p, intent i
where identifier = 'qa_chat' and p.id = i.project_id
order by i.id desc
limit 10




The first column of the preceding result has duplicate values (i.e. qa_chat). Suppose I would like to leave it blank whenever it equals the value in the preceding record:



We can use a window function to fulfill this requirement; the SQL statement looks like:
with source as (
    select row_number() over(partition by p.identifier order by i.id desc) as row_number, 
           p.identifier, i.value 
    from project p, intent i
    where identifier = 'qa_chat' and p.id = i.project_id
    limit 10
)
select case when row_number = 1 then identifier else '' end as identifier, value
from source
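Outside SQL, the same "print the group label only on its first row" idea is a line of bookkeeping; a Python sketch with illustrative rows:

```python
rows = [("qa_chat", "hello"), ("qa_chat", "bye"), ("qa_chat", "thanks")]

# Emit the group label only for the first row of each group,
# mirroring: case when row_number = 1 then identifier else '' end
output = []
prev = None
for identifier, value in rows:
    output.append((identifier if identifier != prev else "", value))
    prev = identifier

print(output)  # [('qa_chat', 'hello'), ('', 'bye'), ('', 'thanks')]
```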




Reference
[1] https://www.postgresql.org/docs/9.3/static/functions-window.html

2018/05/13

[PostgreSQL] How to get the first record in SELECT statement

Problem
I get multiple records from this select statement:
  select * 
  from sentence 
  where intent_id in (
      select id 
      from intent 
      where project_id=5
  )

I only want the first record, how to do it?

How-To
Add LIMIT 1 at the end of the select statement. (Note that without an ORDER BY, which row comes back first is arbitrary; add one if a specific row is wanted.)
  select * 
  from sentence 
  where intent_id in (
      select id 
      from intent 
      where project_id=5
  )
  limit 1



2018/02/08

[PostgreSQL] How to assign customized numbering rule in primary key

Problem
Assume I have a table, loan_main, whose primary key, loan_id, follows the numbering rule LN_YYYYMMDD_9999999999.
How can data be inserted into loan_main without building loan_id by hand?

How-To


In this case, we need to know:
1. how to concatenate strings, i.e. ||
2. how to get the current date in a specific format:
select to_char(now(), 'YYYYMMDD')

3. how to get the next value from a sequence and left-pad it with zeros:

select LPAD(nextval('loan_sequence')::text, 10, '0')


The syntax looks like:
ALTER TABLE loan_main ALTER COLUMN loan_id SET DEFAULT ('LN_' || (to_char(now(), 'YYYYMMDD')) || '_' || LPAD(nextval('loan_sequence')::text, 10, '0'));
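What the default expression generates, sketched in Python (the sequence is assumed to already exist, e.g. via create sequence loan_sequence; itertools.count stands in for nextval):

```python
from datetime import date
from itertools import count

# stand-in for nextval('loan_sequence')
loan_sequence = count(1)

def next_loan_id(today: date) -> str:
    # 'LN_' || to_char(now(), 'YYYYMMDD') || '_' || LPAD(nextval(...)::text, 10, '0')
    seq = next(loan_sequence)
    return f"LN_{today:%Y%m%d}_{seq:010d}"

print(next_loan_id(date(2018, 2, 8)))  # LN_20180208_0000000001
print(next_loan_id(date(2018, 2, 8)))  # LN_20180208_0000000002
```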





2017/09/11

[DB2] The transaction log for the database is full.

Problem
When I execute the following SQL statement:
DELETE FROM test_table where id between 1 and 500000;


I get the error log: 
The transaction log for the database is full.


How-To
This error results from "All space in the transaction log is being used."

Split the delete into five statements of 100,000 rows each, committing after each one, so that no single transaction has to log 500,000 deletes. Since the original predicate is already a range on id, the chunks can simply partition that range. (Chunking on ROW_NUMBER() would not work reliably here: the row numbers are recomputed after every delete, so later chunks would target different rows.)

The updated delete SQL statements are the following:
-- delete id from 1 to 100000
DELETE FROM test_table WHERE id BETWEEN 1 AND 100000;
COMMIT;

-- delete id from 100001 to 200000
DELETE FROM test_table WHERE id BETWEEN 100001 AND 200000;
COMMIT;

-- delete id from 200001 to 300000
DELETE FROM test_table WHERE id BETWEEN 200001 AND 300000;
COMMIT;

-- delete id from 300001 to 400000
DELETE FROM test_table WHERE id BETWEEN 300001 AND 400000;
COMMIT;

-- delete id from 400001 to 500000
DELETE FROM test_table WHERE id BETWEEN 400001 AND 500000;
COMMIT;
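The chunk-and-commit pattern, scaled down and sketched with Python's sqlite3 (table name, row count, and chunk size are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table test_table (id integer primary key)")
conn.executemany("insert into test_table values (?)", [(i,) for i in range(1, 1001)])
conn.commit()

CHUNK = 100  # in the post: 100000

# Delete ids 1..500 in fixed-size chunks, committing after each chunk
# so that no single transaction covers all of the deletes at once.
for low in range(1, 501, CHUNK):
    conn.execute("delete from test_table where id between ? and ?",
                 (low, low + CHUNK - 1))
    conn.commit()

print(conn.execute("select count(*) from test_table").fetchone()[0])  # 500
```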





2017/08/11

[DB2] Use getNextException() to retrieve the exceptions for specific batched elements. ERRORCODE=-4229, SQLSTATE=null

Problem

When I use Apache Commons DbUtils to batch-insert data into IBM DB2, I get this SQLException:
SQL Message: [jcc][t4][102][10040][4.14.88] Batch failure.
The batch was submitted, but at least one exception occurred on an individual member of the batch.
Use getNextException() to retrieve the exceptions for specific batched elements. ERRORCODE=-4229, SQLSTATE=null


Code Snippet:
    /**
     * Batch processing.
     * 
     * @param sql
     *            insert/update/delete sql statement
     * @param params
     *            parameters
     * @throws SQLException
     *             if the insert/update/delete fails
     */
    public void batchUpdate(String sql, Object[][] params) throws SQLException {
        QueryRunner run = new QueryRunner();
        run.batch(this.getConnection(), sql, params);
    }



How-To
According to the troubleshooting information from IBM Support:
The error is caused by the DB2 Transaction log for the Data Warehouse becoming full and not able to process the full batch operation.

But in my case the root cause was a "value too large" failure in the insert SQL statement, which had nothing to do with the DB2 transaction log. As the message suggests, walk the exception chain with SQLException.getNextException() to find the real per-row error.

2012/02/06

20 Database Design Best Practices


  1. Use well defined and consistent names for tables and columns (e.g. School, StudentCourse, CourseID ...).
  2. Use singular for table names (i.e. use StudentCourse instead of StudentCourses). Table represents a collection of entities, there is no need for plural names.
  3. Don’t use spaces in table names. Otherwise you will have to use ‘{’, ‘[’, ‘“’ etc. characters to define tables (i.e. for accessing table Student Course you'll write “Student Course”. StudentCourse is much better).
  4. Don’t use unnecessary prefixes or suffixes for table names (i.e. use School instead of TblSchool, SchoolTable etc.).
  5. Keep passwords as encrypted for security. Decrypt them in application when required.
  6. Use integer id fields for all tables. If id is not required for the time being, it may be required in the future (for association tables, indexing ...).
  7. Choose columns with the integer data type (or its variants) for indexing. varchar column indexing will cause performance problems.
  8. Use bit fields for boolean values. Using integer or varchar is unnecessarily storage consuming. Also start those column names with “Is”.
  9. Provide authentication for database access. Don’t give admin role to each user.
  10. Avoid “select *” queries until it is really needed. Use "select [required_columns_list]" for better performance.
  11. Use an ORM (object relational mapping) framework (i.e. hibernate, iBatis ...) if application code is big enough. Performance issues of ORM frameworks can be handled by detailed configuration parameters.
  12. Partition big and unused/rarely used tables/table parts to different physical storages for better query performance.
  13. For big, sensitive and mission-critical database systems, use disaster recovery and security services like failover clustering, auto backups, replication etc.
  14. Use constraints (foreign key, check, not null ...) for data integrity. Don’t give whole control to application code.
  15. Lack of database documentation is evil. Document your database design with ER schemas and instructions. Also write comment lines for your triggers, stored procedures and other scripts.
  16. Use indexes for frequently used queries on big tables. Analyser tools can be used to determine where indexes will be defined. For queries retrieving a range of rows, clustered indexes are usually better. For point queries, non-clustered indexes are usually better.
  17. Database server and the web server must be placed in different machines. This will provide more security (attackers can’t access data directly) and server CPU and memory performance will be better because of reduced request number and process usage.
  18. Image and blob data columns must not be defined in frequently queried tables because of performance issues. These data must be placed in separate tables and their pointer can be used in queried tables.
  19. Normalization must be used as required, to optimize the performance. Under-normalization will cause excessive repetition of data, over-normalization will cause excessive joins across too many tables. Both of them will get worse performance.
  20. Spend time for database modeling and design as much as required. Otherwise saved(!) design time will cause (saved(!) design time) * 10/100/1000 maintenance and re-design time.

Read more: http://www.javacodegeeks.com/2012/02/20-database-design-best-practices.html#ixzz1lZ2GPfDI