
You will be capable of studying about
- PostgreSQL XML fields
- How to insert the XML to the DB field
- Retrieve data from the XML field
Scenario:
Sometimes we need to work with XML-based files for the customer requirement. According to my requirement, the customer will upload their Microsoft project file as an XML to our system and we need to display this data throughout our web portal as a dashboard.
We provide our solution for the requirement as follows:
- Front end application offers the option to upload their MS project file through a browsing option
- The system verifies the file and converts the file to XML file format through a java class.
- Store the XML file uploaded to the system in a file location.
- Converted XML file send to the DB function as a text string.
- DB function will check for the duplicate data and if not, data will insert into the table.
Start the process
Create a Table with field type XML
CREATE TABLE transactions.bia_t_project_file_info ( pfi_id bigint NOT NULL, pfi_project_type bigint, pfi_filename character varying(250), pfi_current_date integer, pfi_project_date date, pfi_project_name character varying(1000), pfi_project_description character varying(1000), pfi_xml xml, pfi_status integer, pfi_active integer, created_by bigint, created_date timestamp without time zone DEFAULT CURRENT_TIMESTAMP, last_modified_by bigint, last_modified_date date );
you can see, that there is a field called “pfi_xml XML” to maintain the XML string value for the process in the above table.
Note: When we are managing blob-type data in a DB field, it may directly impact your system performance. As a precaution, I kept this data only for the process and after the process completion removed it from the field. The original file uploaded is saved in a file location.
How to Retrieve data from the XML field
Sample 1
In an XML file, we know there are lots of hierarchical data available. To retrieve the relevant data, what we have to do is,
- First, need to convert this file into a Table
SELECT "LastSaved" FROM bia_t_project_file_info, XMLTABLE('//Project' PASSING pfi_xml COLUMNS "LastSaved" timestamp));
How the above code works:
- We need to get the “last save” date from the XML that we saved in the field “pfi_xml“.
- for that, we need to access the table “bia_t_project_file_info” table (Which contains the “pfi_xml” field).
- We can convert the XML blob field into a table by using the XMLTABLE() method
- More about XMLTABLE function : https://www.postgresql.org/docs/current/functions-xml.html
- To access a particular field, we must give the level of the XML file. to that, we used “//project” as the “LastSaved” value is in the root element of the XML file.
- Then we pass the Column name(s) with its data type as we need. In our example, LastSaved is a date-time field.
- Now you can understand how the below code is working
FROM bia_t_project_file_info, XMLTABLE('//Project' PASSING pfi_xml COLUMNS "LastSaved" timestamp));
- Take the “pfi_xml” field from the table “bia_t_project_file_info”
- Access the “pfi_xml” field and convert it to the XML table
- Retrieve the fields by giving the XML element path
Sample 2:
SELECT xmltable.* FROM transactions."bia_t_project_file_info", XMLTABLE('//Project/Tasks/Task' PASSING pfi_xml COLUMNS "UID" text, "GUID" text, "ID" text, "Name" text, "Active" text, "Manual" text, "Type" text, "IsNull" text, "CreateDate" timestamp);
- The above sample will retrieve the data from the same field “pfi_xml” from the table “bia_t_project_file_info“.
- This time it takes the data from the project tasks which are located in “//project/Tasks/task” level.
- and by referring to the xmltable.* it is accessible to all data we defined at the XMLTABLE() function.
Sample of the XML file
<?xml version="1.0" standalone="yes"?> <Project> <SaveVersion>14<SaveVersion> <BuildNumber>16.0.5044.1000</BuildNumber> <Name>20200918 (Submission) OCTS ver. 2020.08.09 Rev.01-2.xml</Name> <GUID>A6665F30-1EFA-EA11-9455-68ECC58EE175</GUID> <Title>xxxxx xxxx</Title> <CreationDate>2020-08-17T08:00:00</CreationDate> <LastSaved>2020-09-30T17:49:00</LastSaved> <ScheduleFromStart>1</ScheduleFromStart> <StartDate>2020-08-09T08:00:00</StartDate> <FinishDate>2020-10-31T17:00:00</FinishDate> <FYStartDate>1</FYStartDate> <Tasks> <Task> <UID>0</UID> <GUID>A7665F30-1EFA-EA11-9455-68ECC58EE175</GUID> <ID>0</ID> <Name>Bandaranaike International Airport Develoment Project Phase II Stage 2 - Package B</Name> <Active>1</Active> <Manual>0</Manual> <Type>1</Type> <IsNull>0</IsNull> <CreateDate>2020-08-17T07:47:00</CreateDate> <WBS>0</WBS> <OutlineNumber>0</OutlineNumber> <OutlineLevel>0</OutlineLevel> <Priority>500</Priority> <Start>2020-08-09T08:00:00</Start> <Finish>2020-10-31T17:00:00</Finish> <Duration>PT576H0M0S</Duration> <ManualStart>2020-08-09T08:00:00</ManualStart> <ManualFinish>2020-10-31T17:00:00</ManualFinish> <ManualDuration>PT576H0M0S</ManualDuration> <DurationFormat>21</DurationFormat> <Work>PT0H0M0S</Work> <Stop>2020-08-15T08:43:12</Stop> <Resume>2020-08-15T08:43:12</Resume> <ResumeValid>0</ResumeValid> <EffortDriven>0</EffortDriven> <Recurring>0</Recurring> <OverAllocated>0</OverAllocated> <Estimated>0</Estimated> <Milestone>0</Milestone> <Summary>1</Summary> <DisplayAsSummary>0</DisplayAsSummary> <Critical>1</Critical> <IsSubproject>0</IsSubproject> <IsSubprojectReadOnly>0</IsSubprojectReadOnly> <ExternalTask>0</ExternalTask> <EarlyStart>2020-08-09T08:00:00</EarlyStart> <EarlyFinish>2020-10-31T17:00:00</EarlyFinish> <LateStart>2020-08-09T08:00:00</LateStart> <LateFinish>2020-10-31T17:00:00</LateFinish> <StartVariance>0</StartVariance> <FinishVariance>0</FinishVariance> <WorkVariance>0.00</WorkVariance> <FreeSlack>0</FreeSlack> <TotalSlack>0</TotalSlack> <StartSlack>0</StartSlack> <FinishSlack>0</FinishSlack> <FixedCost>0</FixedCost> <FixedCostAccrual>3</FixedCostAccrual> <PercentComplete>44</PercentComplete> <PercentWorkComplete>0</PercentWorkComplete> <Cost>0</Cost> <OvertimeCost>0O</vertimeCost> <OvertimeWork>PT0H0M0S</OvertimeWork> <ActualStart>2020-08-09T08:00:00</ActualStart> <ActualDuration>PT256H9M24S</ActualDuration> <ActualCost>0</ActualCost> <ActualOvertimeCost>0</ActualOvertimeCost> <ActualWork>PT0H0M0S</ActualWork> <ActualOvertimeWork>PT0H0M0S</ActualOvertimeWork> <RegularWork>PT0H0M0S</RegularWork> <RemainingDuration>PT319H50M36S</RemainingDuration> <RemainingCost>0</RemainingCost> <RemainingWork>PT0H0M0S</RemainingWork> <RemainingOvertimeCost>0</RemainingOvertimeCost> <RemainingOvertimeWork>PT0H0M0S</RemainingOvertimeWork> <ACWP>0.00</ACWP> <CV>0.00</CV> <ConstraintType>0</ConstraintType> <CalendarUID>-1</CalendarUID> <LevelAssignments>1</LevelAssignments> <LevelingCanSplit>1</LevelingCanSplit> <LevelingDelay>0</LevelingDelay> <LevelingDelayFormat>8</LevelingDelayFormat> <IgnoreResourceCalendar>0</IgnoreResourceCalendar> <HideBar>0</HideBar> <Rollup>0</Rollup> <BCWS>0.00</BCWS> <BCWP>0.00</BCWP> <PhysicalPercentComplete>0</PhysicalPercentComplete> <EarnedValueMethod>0</EarnedValueMethod> <IsPublished>0</IsPublished> <CommitmentType>0</CommitmentType> <Baseline> <Number>0</Number> <Start>2020-08-09T08:00:00</Start> <Finish>2020-10-31T17:00:00</Finish> <Duration>PT576H0M0S</Duration> <Cost>7290000</Cost> </Baseline> </Task> </Tasks>
Above is a part of the code and not the full-length file. you can just download an XML file or you can
Full DB Function:
CREATE FUNCTION transactions.bia_fun_create_project_info(project_type bigint, file_name character varying, pficurrent_date integer, project_date date, project_name character varying, project_description character varying, pfixml xml, status integer DEFAULT 1, active integer DEFAULT 1, created_by bigint DEFAULT 0) RETURNS bigint LANGUAGE plpgsql AS $$ -- (bigint, character varying, integer, character varying, xml, integer, integer, bigint) declare _pfiid bigint; _LastSaved timestamp; -- Checked last saved value for verifications _reccount bigint; begin _pfiid := 0; -- drop tem table if exists DROP TABLE IF EXISTS TempTable; -- Temp table to check the project is already exists and -- also to replace thetag CREATE TEMPORARY TABLE TempTable( col1 serial not null, colxml xml ); -- Replace thetag pfixml := replace(pfixml::varchar,'' ::varchar,'' ::varchar); _reccount := 0; insert into TempTable (colxml) values (pfixml); -- Check the Last saved date is already exists with the system. If yes, -- Ignore the records insertions select count(*) into _reccount from transactions.bia_t_project_file_info btpfi inner join transactions.bia_t_project_header ph on ph.prj_pfi_id = btpfi.pfi_id where coalesce(pfi_active,0) = 1 and pfi_project_type = project_type and prj_lastsaved in ( SELECT "LastSaved" FROM TempTable, XMLTABLE('//Project' PASSING colxml COLUMNS "LastSaved" timestamp)); -- Check the record is already available accourdng to the -- previous step _pfiid for other header and details tables if (not found) or (_reccount = 0) then -- Insert the project upload history to the info table and -- return the last generated ID value for INSERT INTO transactions.bia_t_project_file_info (pfi_project_type, pfi_filename, pfi_current_date, pfi_project_date, pfi_project_name, pfi_project_description, pfi_xml, pfi_status, pfi_active, created_by) VALUES(project_type, file_name, pficurrent_date, project_date, project_name, project_description, pfixml, status, active, created_by) returning pfi_id into _pfiid; -- Get XML info related to the Project root and -- insert into the Header record with the relevent info ID (_pfiid) INSERT INTO transactions.bia_t_project_header (prj_pfi_id, prj_saveversion, prj_buildnumber, prj_name, prj_guid, prj_title, prj_creationdate, prj_lastsaved, prj_schedulefromstart, prj_startdate, prj_finishdate, prj_fystartdate, prj_criticalslacklimit, prj_currencydigits, prj_currencycode, prj_currencysymbolposition, prj_calendaruid, prj_defaultstarttime, prj_defaultfinishtime, prj_minutesperday, prj_minutesperweek, prj_dayspermonth, prj_defaulttasktype, prj_defaultfixedcostaccrual, prj_defaultstandardrate, prj_defaultovertimerate, prj_durationformat, prj_workformat, prj_editableactualcosts, prj_honorconstraints, prj_insertedprojectslikesummary, prj_multiplecriticalpaths, prj_newtaskseffortdriven, prj_newtasksestimated, prj_splitsinprogresstasks, prj_spreadactualcost, prj_spreadpercentcomplete, prj_taskupdatesresource, prj_fiscalyearstart, prj_weekstartday, prj_movecompletedendsback, prj_moveremainingstartsback, prj_moveremainingstartsforward, prj_movecompletedendsforward, prj_baselineforearnedvalue, prj_autoaddnewresourcesandtasks, prj_statusdate, prj_currentdate, prj_microsoftprojectserverurl, prj_autolink, prj_newtaskstartdate, prj_newtasksaremanual, prj_defaulttaskevmethod, prj_projectexternallyedited, prj_extendedcreationdate, prj_actualsinsync, prj_removefileproperties, prj_adminproject, prj_updatemanuallyscheduledtaskswheneditinglinks, prj_keeptaskonnearestworkingtimewhenmadeautoscheduled) SELECT _pfiid, xmltable.* FROM transactions."bia_t_project_file_info", XMLTABLE('//Project' PASSING pfi_xml COLUMNS "SaveVersion" text, "BuildNumber" text, "Name" text, "GUID" text, "Title" text, "CreationDate" timestamp, "LastSaved" timestamp, "ScheduleFromStart" text, "StartDate" timestamp, "FinishDate" timestamp, "FYStartDate" text, "CriticalSlackLimit" text, "CurrencyDigits" numeric(18,4), "CurrencyCode" text, "CurrencySymbolPosition" text, "CalendarUID" integer, "DefaultStartTime" time, "DefaultFinishTime" time, "MinutesPerDay" integer, "MinutesPerWeek" integer, "DaysPerMonth" integer, "DefaultTaskType" text, "DefaultFixedCostAccrual" numeric(18,4), "DefaultStandardRate" numeric(18,4), "DefaultOvertimeRate" numeric(18,4), "DurationFormat" text, "WorkFormat" text, "EditableActualCosts" numeric(18,4), "HonorConstraints" text, "InsertedProjectsLikeSummary" text, "MultipleCriticalPaths" text, "NewTasksEffortDriven" text, "NewTasksEstimated" integer, "SplitsInProgressTasks" text, "SpreadActualCost" numeric(18,4), "SpreadPercentComplete" numeric(18,4), "TaskUpdatesResource" text, "FiscalYearStart" integer, "WeekStartDay" integer, "MoveCompletedEndsBack" text, "MoveRemainingStartsBack" text, "MoveRemainingStartsForward" text, "MoveCompletedEndsForward" text, "BaselineForEarnedValue" numeric(18,4), "AutoAddNewResourcesAndTasks" text, "StatusDate" timestamp, "CurrentDate" timestamp, "MicrosoftProjectServerURL" text, "Autolink" text, "NewTaskStartDate" text, "NewTasksAreManual" text, "DefaultTaskEVMethod" text, "ProjectExternallyEdited" text, "ExtendedCreationDate" timestamp, "ActualsInSync" text, "RemoveFileProperties" text, "AdminProject" text, "UpdateManuallyScheduledTasksWhenEditingLinks" text, "KeepTaskOnNearestWorkingTimeWhenMadeAutoScheduled" text ) where pfi_id = _pfiid; -- Get the relevent Tasks info from the XML file and save -- it to the Tasks table with Info id (_pfiid) INSERT INTO transactions.bia_t_project_tasks (tsk_pfi_id, tsk_prj_id, tsk_uid, tsk_guid, tsk_iid, tsk_name, tsk_active, tsk_manual, tsk_type, tsk_isnull, tsk_createdate, tsk_wbs, tsk_outlinenumber, tsk_outlinelevel, tsk_priority, tsk_start, tsk_finish, tsk_duration, tsk_manualstart, tsk_manualfinish, tsk_manualduration, tsk_durationformat, tsk_work, tsk_stop, tsk_resume, tsk_resumevalid, tsk_effortdriven, tsk_recurring, tsk_overallocated, tsk_estimated, tsk_milestone, tsk_summary, tsk_displayassummary, tsk_critical, tsk_issubproject, tsk_issubprojectreadonly, tsk_externaltask, tsk_earlystart, tsk_earlyfinish, tsk_latestart, tsk_latefinish, tsk_startvariance, tsk_finishvariance, tsk_workvariance, tsk_freeslack, tsk_totalslack, tsk_startslack, tsk_finishslack, tsk_fixedcost, tsk_fixedcostaccrual, tsk_percentcomplete, tsk_percentworkcomplete, tsk_cost, tsk_overtimecost, tsk_overtimework, tsk_actualstart, tsk_actualduration, tsk_actualcost, tsk_actualovertimecost, tsk_actualwork, tsk_actualovertimework, tsk_regularwork, tsk_remainingduration, tsk_remainingcost, tsk_remainingwork, tsk_remainingovertimecost, tsk_remainingovertimework, tsk_acwp, tsk_cv, tsk_constrainttype, tsk_calendaruid, tsk_levelassignments, tsk_levelingcansplit, tsk_levelingdelay, tsk_levelingdelayformat, tsk_ignoreresourcecalendar, tsk_hidebar, tsk_rollup, tsk_bcws, tsk_bcwp, tsk_physicalpercentcomplete, tsk_earnedvaluemethod, tsk_ispublished, tsk_commitmenttype, tsk_link_list) SELECT _pfiid, cast(1 as bigint) as tsk_prj_id, xmltable.*, case when position('.' in ("WBS")) = 0 then '-1' else left("WBS",length("WBS")- position('.' in reverse("WBS"))) end as tsk_link_list FROM transactions."bia_t_project_file_info", XMLTABLE('//Project/Tasks/Task' PASSING pfi_xml COLUMNS "UID" text, "GUID" text, "ID" text, "Name" text, "Active" text, "Manual" text, "Type" text, "IsNull" text, "CreateDate" timestamp, "WBS" text, "OutlineNumber" text, "OutlineLevel" text, "Priority" text, "Start" timestamp, "Finish" timestamp, "Duration" text, "ManualStart" timestamp, "ManualFinish" timestamp, "ManualDuration" text, "DurationFormat" text, "Work" text, "Stop" timestamp, "Resume" timestamp, "ResumeValid" text, "EffortDriven" text, "Recurring" text, "OverAllocated" text, "Estimated" text, "Milestone" text, "Summary" text, "DisplayAsSummary" text, "Critical" text, "IsSubproject" text, "IsSubprojectReadOnly" text, "ExternalTask" text, "EarlyStart" timestamp, "EarlyFinish" timestamp, "LateStart" timestamp, "LateFinish" timestamp, "StartVariance" text, "FinishVariance" text, "WorkVariance" text, "FreeSlack" text, "TotalSlack" text, "StartSlack" text, "FinishSlack" text, "FixedCost" numeric(18,4), "FixedCostAccrual" numeric(18,4), "PercentComplete" numeric(18,4), "PercentWorkComplete" numeric(18,4), "Cost" numeric(18,4), "OvertimeCost" numeric(18,4), "OvertimeWork" text, "ActualStart" timestamp, "ActualDuration" text, "ActualCost" numeric(18,4), "ActualOvertimeCost" numeric(18,4), "ActualWork" text, "ActualOvertimeWork" text, "RegularWork" text, "RemainingDuration" text, "RemainingCost" numeric(18,4), "RemainingWork" text, "RemainingOvertimeCost" numeric(18,4), "RemainingOvertimeWork" text, "ACWP" numeric(18,4), "CV" numeric(18,4), "ConstraintType" text, "CalendarUID" text, "LevelAssignments" text, "LevelingCanSplit" text, "LevelingDelay" text, "LevelingDelayFormat" text, "IgnoreResourceCalendar" text, "HideBar" text, "Rollup" text, "BCWS" numeric(18,4), "BCWP" numeric(18,4), "PhysicalPercentComplete" numeric(18,4), "EarnedValueMethod" text, "IsPublished" text, "CommitmentType" text ) where pfi_id = _pfiid; else _pfiid := -2; -- Saved record already exsists end if; DROP TABLE IF EXISTS TempTable; return _pfiid;-- Returning the Info ID. If _pfiid < 0, that mean, it contain some issues end;$$;