PostgreSQL – XML based fields

PostgreSQL – XML based fields post thumbnail image

You will be capable of studying about

  • PostgreSQL XML fields
  • How to insert the XML to the DB field
  • Retrieve data from the XML field

Scenario:

Sometimes we need to work with XML-based files for the customer requirement. According to my requirement, the customer will upload their Microsoft project file as an XML to our system and we need to display this data throughout our web portal as a dashboard.

We provide our solution for the requirement as follows:

  1. Front end application offers the option to upload their MS project file through a browsing option
  2. The system verifies the file and converts the file to XML file format through a java class.
  3. Store the XML file uploaded to the system in a file location.
  4. Converted XML file send to the DB function as a text string.
  5. DB function will check for the duplicate data and if not, data will insert into the table.

Start the process

Create a Table with field type XML

CREATE TABLE transactions.bia_t_project_file_info (
    pfi_id bigint NOT NULL,
    pfi_project_type bigint,
    pfi_filename character varying(250),
    pfi_current_date integer,
    pfi_project_date date,
    pfi_project_name character varying(1000),
    pfi_project_description character varying(1000),
    pfi_xml xml,
    pfi_status integer,
    pfi_active integer,
    created_by bigint,
    created_date timestamp without time zone DEFAULT CURRENT_TIMESTAMP,
    last_modified_by bigint,
    last_modified_date date
);

you can see, that there is a field called “pfi_xml XML” to maintain the XML string value for the process in the above table.

Note: When we are managing blob-type data in a DB field, it may directly impact your system performance. As a precaution, I kept this data only for the process and after the process completion removed it from the field. The original file uploaded is saved in a file location.

How to Retrieve data from the XML field

Sample 1

In an XML file, we know there are lots of hierarchical data available. To retrieve the relevant data, what we have to do is,

  • First, need to convert this file into a Table
SELECT "LastSaved"
	  FROM bia_t_project_file_info,
	       XMLTABLE('//Project'
	                PASSING pfi_xml
	                COLUMNS "LastSaved" timestamp));

How the above code works:

  1. We need to get the “last save” date from the XML that we saved in the field “pfi_xml“.
  2. for that, we need to access the table “bia_t_project_file_info” table (Which contains the “pfi_xml” field).
  3. We can convert the XML blob field into a table by using the XMLTABLE() method
  4. More about XMLTABLE function : https://www.postgresql.org/docs/current/functions-xml.html
  5. To access a particular field, we must give the level of the XML file. to that, we used “//project” as the “LastSaved” value is in the root element of the XML file.
  6. Then we pass the Column name(s) with its data type as we need. In our example, LastSaved is a date-time field.
  7. Now you can understand how the below code is working
	  FROM bia_t_project_file_info,
	       XMLTABLE('//Project'
	                PASSING pfi_xml
	                COLUMNS "LastSaved" timestamp));
  • Take the “pfi_xml” field from the table “bia_t_project_file_info”
  • Access the “pfi_xml” field and convert it to the XML table
  • Retrieve the fields by giving the XML element path

Sample 2:

                        SELECT 
			   xmltable.*
			  FROM transactions."bia_t_project_file_info",
			       XMLTABLE('//Project/Tasks/Task'
			                PASSING pfi_xml
			                COLUMNS "UID" text,
			                "GUID" text,
			                "ID" text,
			                "Name" text,
			                "Active" text,
			                "Manual" text,
			                "Type" text,
			                "IsNull" text,
			                "CreateDate" timestamp);
  • The above sample will retrieve the data from the same field “pfi_xml” from the table “bia_t_project_file_info“.
  • This time it takes the data from the project tasks which are located in “//project/Tasks/task” level.
  • and by referring to the xmltable.* it is accessible to all data we defined at the XMLTABLE() function.

Sample of the XML file

<?xml version="1.0" standalone="yes"?>
<Project>
	<SaveVersion>14<SaveVersion>
	<BuildNumber>16.0.5044.1000</BuildNumber>
	<Name>20200918 (Submission) OCTS ver. 2020.08.09 Rev.01-2.xml</Name>
	<GUID>A6665F30-1EFA-EA11-9455-68ECC58EE175</GUID>
	<Title>xxxxx xxxx</Title>
	<CreationDate>2020-08-17T08:00:00</CreationDate>
	<LastSaved>2020-09-30T17:49:00</LastSaved>
	<ScheduleFromStart>1</ScheduleFromStart>
	<StartDate>2020-08-09T08:00:00</StartDate>
	<FinishDate>2020-10-31T17:00:00</FinishDate>
	<FYStartDate>1</FYStartDate>
	<Tasks>
		<Task>
			<UID>0</UID>
			<GUID>A7665F30-1EFA-EA11-9455-68ECC58EE175</GUID>
			<ID>0</ID>
			<Name>Bandaranaike International Airport Develoment Project Phase II Stage 2 - Package B</Name>
			<Active>1</Active>
			<Manual>0</Manual>
			<Type>1</Type>
			<IsNull>0</IsNull>
			<CreateDate>2020-08-17T07:47:00</CreateDate>
			<WBS>0</WBS>
			<OutlineNumber>0</OutlineNumber>
			<OutlineLevel>0</OutlineLevel>
			<Priority>500</Priority>
			<Start>2020-08-09T08:00:00</Start>
			<Finish>2020-10-31T17:00:00</Finish>
			<Duration>PT576H0M0S</Duration>
			<ManualStart>2020-08-09T08:00:00</ManualStart>
			<ManualFinish>2020-10-31T17:00:00</ManualFinish>
			<ManualDuration>PT576H0M0S</ManualDuration>
			<DurationFormat>21</DurationFormat>
			<Work>PT0H0M0S</Work>
			<Stop>2020-08-15T08:43:12</Stop>
			<Resume>2020-08-15T08:43:12</Resume>
			<ResumeValid>0</ResumeValid>
			<EffortDriven>0</EffortDriven>
			<Recurring>0</Recurring>
			<OverAllocated>0</OverAllocated>
			<Estimated>0</Estimated>
			<Milestone>0</Milestone>
			<Summary>1</Summary>
			<DisplayAsSummary>0</DisplayAsSummary>
			<Critical>1</Critical>
			<IsSubproject>0</IsSubproject>
			<IsSubprojectReadOnly>0</IsSubprojectReadOnly>
			<ExternalTask>0</ExternalTask>
			<EarlyStart>2020-08-09T08:00:00</EarlyStart>
			<EarlyFinish>2020-10-31T17:00:00</EarlyFinish>
			<LateStart>2020-08-09T08:00:00</LateStart>
			<LateFinish>2020-10-31T17:00:00</LateFinish>
			<StartVariance>0</StartVariance>
			<FinishVariance>0</FinishVariance>
			<WorkVariance>0.00</WorkVariance>
			<FreeSlack>0</FreeSlack>
			<TotalSlack>0</TotalSlack>
			<StartSlack>0</StartSlack>
			<FinishSlack>0</FinishSlack>
			<FixedCost>0</FixedCost>
			<FixedCostAccrual>3</FixedCostAccrual>
			<PercentComplete>44</PercentComplete>
			<PercentWorkComplete>0</PercentWorkComplete>
			<Cost>0</Cost>
			<OvertimeCost>0O</vertimeCost>
			<OvertimeWork>PT0H0M0S</OvertimeWork>
			<ActualStart>2020-08-09T08:00:00</ActualStart>
			<ActualDuration>PT256H9M24S</ActualDuration>
			<ActualCost>0</ActualCost>
			<ActualOvertimeCost>0</ActualOvertimeCost>
			<ActualWork>PT0H0M0S</ActualWork>
			<ActualOvertimeWork>PT0H0M0S</ActualOvertimeWork>
			<RegularWork>PT0H0M0S</RegularWork>
			<RemainingDuration>PT319H50M36S</RemainingDuration>
			<RemainingCost>0</RemainingCost>
			<RemainingWork>PT0H0M0S</RemainingWork>
			<RemainingOvertimeCost>0</RemainingOvertimeCost>
			<RemainingOvertimeWork>PT0H0M0S</RemainingOvertimeWork>
			<ACWP>0.00</ACWP>
			<CV>0.00</CV>
			<ConstraintType>0</ConstraintType>
			<CalendarUID>-1</CalendarUID>
			<LevelAssignments>1</LevelAssignments>
			<LevelingCanSplit>1</LevelingCanSplit>
			<LevelingDelay>0</LevelingDelay>
			<LevelingDelayFormat>8</LevelingDelayFormat>
			<IgnoreResourceCalendar>0</IgnoreResourceCalendar>
			<HideBar>0</HideBar>
			<Rollup>0</Rollup>
			<BCWS>0.00</BCWS>
			<BCWP>0.00</BCWP>
			<PhysicalPercentComplete>0</PhysicalPercentComplete>
			<EarnedValueMethod>0</EarnedValueMethod>
			<IsPublished>0</IsPublished>
			<CommitmentType>0</CommitmentType>
			<Baseline>
				<Number>0</Number>
				<Start>2020-08-09T08:00:00</Start>
				<Finish>2020-10-31T17:00:00</Finish>
				<Duration>PT576H0M0S</Duration>
				<Cost>7290000</Cost>
			</Baseline>
		</Task>		
	 </Tasks>

Above is a part of the code and not the full-length file. you can just download an XML file or you can

Full DB Function:

CREATE FUNCTION transactions.bia_fun_create_project_info(project_type bigint, file_name character varying, pficurrent_date integer, project_date date, project_name character varying, project_description character varying, pfixml xml, status integer DEFAULT 1, active integer DEFAULT 1, created_by bigint DEFAULT 0) RETURNS bigint
LANGUAGE plpgsql
    AS $$
  -- (bigint, character varying, integer, character varying, xml, integer, integer, bigint) 
 declare _pfiid bigint;
  _LastSaved timestamp; -- Checked last saved value for verifications
  _reccount bigint;
begin

	_pfiid := 0;
	
    -- drop tem table if exists
    DROP TABLE IF EXISTS TempTable;
   
    -- Temp table to check the project is already exists and 
    -- also to replace the  tag
	CREATE TEMPORARY TABLE TempTable(
        col1 serial not null,
        colxml xml
    );
	
    -- Replace the  tag
	pfixml := replace(pfixml::varchar,''::varchar,''::varchar);
    _reccount := 0;
   

    
    insert into TempTable (colxml) values (pfixml);
  
     -- Check the Last saved date is already exists with the system. If yes,
     -- Ignore the records insertions 
	 select  count(*) into _reccount from transactions.bia_t_project_file_info btpfi  
	 inner join transactions.bia_t_project_header ph on ph.prj_pfi_id = btpfi.pfi_id 
	 where coalesce(pfi_active,0) = 1 and pfi_project_type = project_type and  prj_lastsaved in
	 (  
	 SELECT "LastSaved"
	  FROM TempTable,
	       XMLTABLE('//Project'
	                PASSING colxml
	                COLUMNS "LastSaved" timestamp));
	  
   
	-- Check the record is already available accourdng to the 
	-- previous step  _pfiid for other header and details tables             
	if (not found) or (_reccount = 0) then
	
			-- Insert the project upload history to the info table and 
			-- return the last generated ID value for 
			INSERT INTO transactions.bia_t_project_file_info
			(pfi_project_type, pfi_filename, pfi_current_date, pfi_project_date, pfi_project_name, pfi_project_description, pfi_xml, pfi_status, pfi_active, created_by)
			VALUES(project_type, file_name, pficurrent_date, project_date, project_name, project_description, pfixml, status, active, created_by) returning  pfi_id into _pfiid;
			
			
			
					-- Get XML info related to the Project root and
					-- insert into the Header record with the relevent info ID (_pfiid)
					INSERT INTO transactions.bia_t_project_header
					(prj_pfi_id, prj_saveversion, prj_buildnumber, prj_name, prj_guid, prj_title, prj_creationdate, prj_lastsaved, prj_schedulefromstart, prj_startdate, prj_finishdate, prj_fystartdate, prj_criticalslacklimit, prj_currencydigits, prj_currencycode, prj_currencysymbolposition, prj_calendaruid, prj_defaultstarttime, prj_defaultfinishtime, prj_minutesperday, prj_minutesperweek, prj_dayspermonth, prj_defaulttasktype, prj_defaultfixedcostaccrual, prj_defaultstandardrate, prj_defaultovertimerate, prj_durationformat, prj_workformat, prj_editableactualcosts, prj_honorconstraints, prj_insertedprojectslikesummary, prj_multiplecriticalpaths, prj_newtaskseffortdriven, prj_newtasksestimated, prj_splitsinprogresstasks, prj_spreadactualcost, prj_spreadpercentcomplete, prj_taskupdatesresource, prj_fiscalyearstart, prj_weekstartday, prj_movecompletedendsback, prj_moveremainingstartsback, prj_moveremainingstartsforward, prj_movecompletedendsforward, prj_baselineforearnedvalue, prj_autoaddnewresourcesandtasks, prj_statusdate, prj_currentdate, prj_microsoftprojectserverurl, prj_autolink, prj_newtaskstartdate, prj_newtasksaremanual, prj_defaulttaskevmethod, prj_projectexternallyedited, prj_extendedcreationdate, prj_actualsinsync, prj_removefileproperties, prj_adminproject, prj_updatemanuallyscheduledtaskswheneditinglinks, prj_keeptaskonnearestworkingtimewhenmadeautoscheduled)
					SELECT 
					 _pfiid,
					  xmltable.*
					 FROM transactions."bia_t_project_file_info",
					       XMLTABLE('//Project'
					                PASSING pfi_xml
					                COLUMNS "SaveVersion" text,
					                "BuildNumber" text,
					                "Name" text,
					                "GUID" text,
					                "Title" text,
					                "CreationDate" timestamp,
					                "LastSaved" timestamp,
					                "ScheduleFromStart" text,
					                "StartDate" timestamp,
					                "FinishDate" timestamp,
					                "FYStartDate" text,
					                "CriticalSlackLimit" text,
					                "CurrencyDigits" numeric(18,4),
					                "CurrencyCode" text,
					                "CurrencySymbolPosition" text,
					                "CalendarUID" integer,
					                "DefaultStartTime" time,
					                "DefaultFinishTime" time,
					                "MinutesPerDay" integer,
					                "MinutesPerWeek" integer,
					                "DaysPerMonth" integer,
					                "DefaultTaskType" text,
					                "DefaultFixedCostAccrual"  numeric(18,4),
					                "DefaultStandardRate"  numeric(18,4),
					                "DefaultOvertimeRate"  numeric(18,4),
					                "DurationFormat" text,
					                "WorkFormat" text,
					                "EditableActualCosts"  numeric(18,4),
					                "HonorConstraints" text,
					                "InsertedProjectsLikeSummary" text,
					                "MultipleCriticalPaths" text,
					                "NewTasksEffortDriven" text,
					                "NewTasksEstimated" integer,
					                "SplitsInProgressTasks" text,
					                "SpreadActualCost"  numeric(18,4),
					               "SpreadPercentComplete"  numeric(18,4),
					                "TaskUpdatesResource" text,
					                "FiscalYearStart" integer,
					                "WeekStartDay" integer,
					                "MoveCompletedEndsBack" text,
					                "MoveRemainingStartsBack" text,
					                "MoveRemainingStartsForward" text,
					                "MoveCompletedEndsForward" text,
					                "BaselineForEarnedValue"  numeric(18,4),
					                "AutoAddNewResourcesAndTasks" text,
					                "StatusDate" timestamp,
					                "CurrentDate" timestamp,
					                "MicrosoftProjectServerURL" text,
					                "Autolink" text,
					                "NewTaskStartDate" text,
					                "NewTasksAreManual" text,
					                "DefaultTaskEVMethod" text,
					                "ProjectExternallyEdited" text,
					                "ExtendedCreationDate" timestamp,
					                "ActualsInSync" text,
					                "RemoveFileProperties" text,
					                "AdminProject" text,
					                "UpdateManuallyScheduledTasksWhenEditingLinks" text,
					                "KeepTaskOnNearestWorkingTimeWhenMadeAutoScheduled" text
					                
					                ) where  pfi_id = _pfiid;	
					               
					               
			-- Get the relevent Tasks info from the XML file and save
			-- it to the Tasks table with Info id (_pfiid)		               
			INSERT INTO transactions.bia_t_project_tasks
			(tsk_pfi_id, tsk_prj_id, tsk_uid, tsk_guid, tsk_iid, tsk_name, tsk_active, tsk_manual, tsk_type, tsk_isnull, tsk_createdate, tsk_wbs, tsk_outlinenumber, tsk_outlinelevel, tsk_priority, tsk_start, tsk_finish, tsk_duration, 
			tsk_manualstart, tsk_manualfinish, tsk_manualduration, tsk_durationformat, tsk_work, tsk_stop, tsk_resume, tsk_resumevalid, 
			tsk_effortdriven, tsk_recurring, tsk_overallocated, tsk_estimated, tsk_milestone, tsk_summary, tsk_displayassummary, 
			tsk_critical, tsk_issubproject, tsk_issubprojectreadonly, tsk_externaltask, tsk_earlystart, tsk_earlyfinish, tsk_latestart, 
			tsk_latefinish, tsk_startvariance, tsk_finishvariance, tsk_workvariance, tsk_freeslack, tsk_totalslack, tsk_startslack,
			tsk_finishslack, tsk_fixedcost, tsk_fixedcostaccrual, tsk_percentcomplete, tsk_percentworkcomplete, tsk_cost, 
			tsk_overtimecost, tsk_overtimework, tsk_actualstart, tsk_actualduration, tsk_actualcost, tsk_actualovertimecost, 
			tsk_actualwork, tsk_actualovertimework, tsk_regularwork, tsk_remainingduration, tsk_remainingcost, tsk_remainingwork, 
			tsk_remainingovertimecost, tsk_remainingovertimework, tsk_acwp, tsk_cv, tsk_constrainttype, tsk_calendaruid, 
			tsk_levelassignments, tsk_levelingcansplit, tsk_levelingdelay, tsk_levelingdelayformat, tsk_ignoreresourcecalendar, 
			tsk_hidebar, tsk_rollup, tsk_bcws, tsk_bcwp, tsk_physicalpercentcomplete, tsk_earnedvaluemethod, tsk_ispublished, tsk_commitmenttype,
			tsk_link_list)
			SELECT 
			 _pfiid, cast(1 as bigint) as tsk_prj_id,
			  xmltable.*, 
			  case when position('.' in ("WBS")) = 0 then '-1' else  left("WBS",length("WBS")- position('.' in reverse("WBS"))) end as tsk_link_list
			  FROM transactions."bia_t_project_file_info",
			       XMLTABLE('//Project/Tasks/Task'
			                PASSING pfi_xml
			                COLUMNS "UID" text,
			                "GUID" text,
			                "ID" text,
			                "Name" text,
			                "Active" text,
			                "Manual" text,
			                "Type" text,
			                "IsNull" text,
			                "CreateDate" timestamp,
			                "WBS" text,
			                "OutlineNumber" text,
			                "OutlineLevel" text,
			                "Priority" text,
			                "Start" timestamp,
			                "Finish" timestamp,
			                "Duration" text,
			                "ManualStart" timestamp,
			                "ManualFinish" timestamp,
			                "ManualDuration" text,
			                "DurationFormat" text,
			                "Work" text,
			                "Stop" timestamp,
			                "Resume" timestamp,
			                "ResumeValid" text,
			                "EffortDriven" text,
			                "Recurring" text,
			                "OverAllocated" text,
			                "Estimated" text,
			                "Milestone" text,
			                "Summary" text,
			                "DisplayAsSummary" text,
			                "Critical" text,
			                "IsSubproject" text,
			                "IsSubprojectReadOnly" text,
			                "ExternalTask" text,
			                "EarlyStart" timestamp,
			                "EarlyFinish" timestamp,
			                "LateStart" timestamp,
			                "LateFinish" timestamp,
			                "StartVariance" text,
			                "FinishVariance" text,
			                "WorkVariance" text,
			                "FreeSlack" text,
			                "TotalSlack" text,
			                "StartSlack" text,
			                "FinishSlack" text,
			                "FixedCost" numeric(18,4),
			                "FixedCostAccrual" numeric(18,4),
			                "PercentComplete" numeric(18,4),
			                "PercentWorkComplete" numeric(18,4),
			                "Cost" numeric(18,4),
			                "OvertimeCost" numeric(18,4),
			                "OvertimeWork" text,
			                "ActualStart" timestamp,
			                "ActualDuration" text,
			                "ActualCost" numeric(18,4),
			                "ActualOvertimeCost" numeric(18,4),
			                "ActualWork" text,
			                "ActualOvertimeWork" text,
			                "RegularWork" text,
			                "RemainingDuration" text,
			                "RemainingCost" numeric(18,4),
			                "RemainingWork" text,
			                "RemainingOvertimeCost" numeric(18,4),
			                "RemainingOvertimeWork" text,
			                "ACWP" numeric(18,4),
			                "CV" numeric(18,4),
			                "ConstraintType" text,
			                "CalendarUID" text,
			                "LevelAssignments" text,
			                "LevelingCanSplit" text,
			                "LevelingDelay" text,
			                "LevelingDelayFormat" text,
			                "IgnoreResourceCalendar" text,
			                "HideBar" text,
			                "Rollup" text,
			                "BCWS" numeric(18,4),
			                "BCWP" numeric(18,4),
			                "PhysicalPercentComplete" numeric(18,4),
			                "EarnedValueMethod" text,
			                "IsPublished" text,
			                "CommitmentType" text
			                
			                
			                )  where  pfi_id = _pfiid;		
               
	else 
	
		_pfiid := -2; -- Saved record already exsists
	
	end if;

    DROP TABLE IF EXISTS TempTable;

return _pfiid;-- Returning the Info ID. If _pfiid < 0, that mean, it contain some issues

end;$$;

Leave a Reply

Your email address will not be published. Required fields are marked *

9 + = 11

Related Post