17.20 Module: Parsing a String into a Date/Time Object Portably

Credit: Brett Cannon

Python's time module supplies the parsing function strptime only on some platforms, and not on Windows. Example 17-2 shows a strptime function that is a pure Python implementation of the time.strptime function that comes with Python. It follows the behavior of time.strptime as documented in the standard Python documentation, and it accepts two more optional arguments, as shown in the following signature:

strptime(string, format="%a %b %d %H:%M:%S %Y", option=AS_IS, locale_setting=ENGLISH)

option's default value of AS_IS gets time information from the string, without any checking or filling-in. You can pass option as CHECK, so that the function makes sure that whatever information it gets is within reasonable ranges (raising an exception otherwise), or as FILL_IN (like CHECK, but it also tries to fill in any missing information that can be computed). locale_setting accepts a locale tuple (as created by LocaleAssembly) to specify names of days, months, and so on. Currently, ENGLISH and SWEDISH locale tuples are built into this recipe's strptime module.

Although this recipe's strptime cannot be as fast as the version in the standard Python library, that's hardly ever a major consideration for typical strptime use. This recipe does offer two substantial advantages. It runs on any platform supporting Python and gives identical results on different platforms, while time.strptime exists only on some platforms and tends to have different quirks on each platform that supplies it. The optional checking and filling-in of information that this recipe provides is also quite handy.

The locale-setting support of this version of strptime was inspired by that in Andrew Markebo's own strptime, which you can find at http://www.fukt.hk-r.se/~flognat/hacks/strptime.py.
However, this recipe has a more complete implementation of strptime's specification, and it is based on regular expressions rather than on splitting strings at whitespace and miscellaneous characters. For example, this recipe can correctly parse strings based on a format such as "%Y%m%d".

Example 17-2. Parsing a string into a date/time object portably

""" A pure-Python version of strptime.

As close as possible to time.strptime's specs in the official Python docs.
Locales supported via LocaleAssembly -- examples supplied for English and
Swedish, follow the examples to add your own locales.

Thanks to Andrew Markebo for his pure Python version of strptime, which
convinced me to improve locale support -- and, of course, to Guido van Rossum
and all other contributors to Python, the best language I've ever used!
"""
import re

__all__ = ['strptime', 'AS_IS', 'CHECK', 'FILL_IN',
           'LocaleAssembly', 'ENGLISH', 'SWEDISH']

# metadata module
__author__ = 'Brett Cannon'
__email__ = 'drifty@bigfoot.com'
__version__ = '1.5cb'
__url__ = 'http://www.drifty.org/'

# global settings and parameter constants
CENTURY = 2000
AS_IS = 'AS_IS'
CHECK = 'CHECK'
FILL_IN = 'FILL_IN'

def LocaleAssembly(DirectiveDict, MonthDict, DayDict, am_pmTuple):
    """ Creates locale tuple for use by strptime.

    Accepts arguments dictionaries DirectiveDict (locale-specific regexes for
    extracting info from time strings), MonthDict (locale-specific full and
    abbreviated month names), DayDict (locale-specific full and abbreviated
    weekday names), and the am_pmTuple tuple (locale-specific valid
    representations of AM and PM, as a two-item tuple). Look at how the
    ENGLISH dictionary is created for an example; make sure your dictionary
    has values corresponding to each entry in the ENGLISH dictionary. You can
    override any value in the BasicDict with an entry in DirectiveDict.
    """
    BasicDict={'%d':r'(?P<d>[0-3]\d)',  # Day of the month [01,31]
        '%H':r'(?P<H>[0-2]\d)',  # Hour (24-h) [00,23]
        '%I':r'(?P<I>[01]\d)',  # Hour (12-h) [01,12]
        '%j':r'(?P<j>[0-3]\d\d)',  # Day of the year [001,366]
        '%m':r'(?P<m>[01]\d)',  # Month [01,12]
        '%M':r'(?P<M>[0-5]\d)',  # Minute [00,59]
        '%S':r'(?P<S>[0-6]\d)',  # Second [00,61]
        '%U':r'(?P<U>[0-5]\d)',  # Week in the year, Sunday first [00,53]
        '%w':r'(?P<w>[0-6])',  # Weekday [0(Sunday),6]
        '%W':r'(?P<W>[0-5]\d)',  # Week in the year, Monday first [00,53]
        '%y':r'(?P<y>\d\d)',  # Year without century [00,99]
        '%Y':r'(?P<Y>\d\d\d\d)',  # Year with century
        '%Z':r'(?P<Z>(\D+ Time)|([\S\D]{3,3}))',  # Timezone name or empty
        '%%':r'(?P<percent>%)'  # Literal "%" (ignored, in the end)
        }
    BasicDict.update(DirectiveDict)
    return BasicDict, MonthDict, DayDict, am_pmTuple

# helper function to build locales' month and day dictionaries
def _enum_with_abvs(start, *names):
    result = {}
    for i in range(len(names)):
        result[names[i]] = result[names[i][:3]] = i+start
    return result

""" Built-in locales """
ENGLISH_Lang = (
    {'%a':r'(?P<a>[^\s\d]{3,3})',  # Abbreviated weekday name
     '%A':r'(?P<A>[^\s\d]{6,9})',  # Full weekday name
     '%b':r'(?P<b>[^\s\d]{3,3})',  # Abbreviated month name
     '%B':r'(?P<B>[^\s\d]{3,9})',  # Full month name
     # Appropriate date and time representation.
     '%c':r'(?P<m>\d\d)/(?P<d>\d\d)/(?P<y>\d\d) '
          r'(?P<H>\d\d):(?P<M>\d\d):(?P<S>\d\d)',
     '%p':r'(?P<p>(a|A|p|P)(m|M))',  # Equivalent of either AM or PM
     # Appropriate date representation
     '%x':r'(?P<m>\d\d)/(?P<d>\d\d)/(?P<y>\d\d)',
     # Appropriate time representation
     '%X':r'(?P<H>\d\d):(?P<M>\d\d):(?P<S>\d\d)'},
    _enum_with_abvs(1, 'January', 'February', 'March', 'April', 'May',
        'June', 'July', 'August', 'September', 'October', 'November',
        'December'),
    _enum_with_abvs(0, 'Monday', 'Tuesday', 'Wednesday', 'Thursday',
        'Friday', 'Saturday', 'Sunday'),
    (('am','AM'),('pm','PM'))
    )
ENGLISH = LocaleAssembly(*ENGLISH_Lang)

SWEDISH_Lang = (
    {'%a':r'(?P<a>[^\s\d]{3,3})',
     '%A':r'(?P<A>[^\s\d]{6,7})',
     '%b':r'(?P<b>[^\s\d]{3,3})',
     '%B':r'(?P<B>[^\s\d]{3,8})',
     '%c':r'(?P<a>[^\s\d]{3,3}) (?P<d>[0-3]\d) '
          r'(?P<b>[^\s\d]{3,3}) (?P<Y>\d\d\d\d) '
          r'(?P<H>[0-2]\d):(?P<M>[0-5]\d):(?P<S>[0-6]\d)',
     '%p':r'(?P<p>(a|A|p|P)(m|M))',
     '%x':r'(?P<m>\d\d)/(?P<d>\d\d)/(?P<y>\d\d)',
     '%X':r'(?P<H>\d\d):(?P<M>\d\d):(?P<S>\d\d)'},
    _enum_with_abvs(1, 'Januari', 'Februari', 'Mars', 'April', 'Maj',
        'Juni', 'Juli', 'Augusti', 'September', 'Oktober', 'November',
        'December'),
    _enum_with_abvs(0, 'Måndag', 'Tisdag', 'Onsdag', 'Torsdag',
        'Fredag', 'Lördag', 'Söndag'),
    (('am','AM'),('pm','PM'))
    )
SWEDISH = LocaleAssembly(*SWEDISH_Lang)

class StrptimeError(Exception):
    """ Exception class for the module """

def _g2j(y, m, d):
    """ Gregorian-to-Julian utility function, used by _StrpObj """
    # // is explicit floor division, so results do not depend on how
    # the / operator treats integers
    a = (14-m)//12
    y = y+4800-a
    m = m+12*a-3
    return d+((153*m+2)//5)+365*y+y//4-y//100+y//400-32045

class _StrpObj:
    """ An object with basic time-manipulation methods """

    def __init__(self, year=None, month=None, day=None, hour=None,
                 minute=None, second=None, day_week=None, julian_date=None,
                 daylight=None):
        """ Sets up instances variables. All values can be set at
        initialization. Any info left out is automatically set to None.
        """
        # Copy every argument into the instance's attributes (vars() also
        # carries self and _set_vars along, which is harmless)
        def _set_vars(_adict, **kwds):
            _adict.update(kwds)
        _set_vars(self.__dict__, **vars())

    def julianFirst(self):
        """ Calculates the Julian date for the first day of year self.year """
        return _g2j(self.year, 1, 1)

    def gregToJulian(self):
        """ Converts the Gregorian date to day within year (Jan 1 == 1) """
        julian_day = _g2j(self.year, self.month, self.day)
        return julian_day-self.julianFirst()+1

    def julianToGreg(self):
        """ Converts the Julian date to the Gregorian date """
        julian_day = self.julian_date+self.julianFirst()-1
        a = julian_day+32044
        b = (4*a+3)//146097
        c = a-((146097*b)//4)
        d = (4*c+3)//1461
        e = c-((1461*d)//4)
        m = (5*e+2)//153
        day = e-((153*m+2)//5)+1
        month = m+3-12*(m//10)
        year = 100*b+d-4800+(m//10)
        return year, month, day

    def dayWeek(self):
        """ Figures out the day of the week using self.year, self.month, and
        self.day. Monday is 0. """
        a = (14-self.month)//12
        y = self.year-a
        m = self.month+12*a-2
        day_week = (self.day+y+(y//4)-(y//100)+(y//400)+((31*m)//12))%7
        # The formula yields Sunday as 0; shift so that Monday is 0
        if day_week==0:
            day_week = 6
        else:
            day_week = day_week-1
        return day_week

    def FillInInfo(self):
        """ Based on the current time information, it figures out what other
        info can be filled in. """
        if self.julian_date is None and self.year and self.month and self.day:
            julian_date = self.gregToJulian()
            self.julian_date = julian_date
        if (self.month is None or self.day is None
                ) and self.year and self.julian_date:
            gregorian = self.julianToGreg()
            self.month = gregorian[1]  # year ignored, must already be okay
            self.day = gregorian[2]
        if self.day_week is None and self.year and self.month and self.day:
            # dayWeek only returns its result, so store it explicitly
            self.day_week = self.dayWeek()

    def CheckIntegrity(self):
        """ Checks info integrity based on the range that a number can be.
        Any invalid info raises StrptimeError.
        """
        def _check(value, low, high, name):
            # Bounds are inclusive: low and high are themselves valid values
            if value is not None and not low<=value<=high:
                raise StrptimeError("%s incorrect"%name)
        _check(self.month, 1, 12, 'Month')
        _check(self.day, 1, 31, 'Day')
        _check(self.hour, 0, 23, 'Hour')
        _check(self.minute, 0, 59, 'Minute')
        _check(self.second, 0, 61, 'Second')  # 61 covers leap seconds
        _check(self.day_week, 0, 6, 'Day of the Week')
        _check(self.julian_date, 1, 366, 'Julian Date')
        _check(self.daylight, -1, 1, 'Daylight Savings')

    def return_time(self):
        """ Returns a tuple of numbers in the format used by time.gmtime().
        All instances of None in the information are replaced with 0. """
        temp_time = (self.year, self.month, self.day, self.hour, self.minute,
                     self.second, self.day_week, self.julian_date,
                     self.daylight)
        return tuple([t or 0 for t in temp_time])

    def RECreation(self, format, DIRECTIVEDict):
        """ Creates re based on format string and DIRECTIVEDict """
        Directive = 0
        REString = []
        for char in format:
            if char=='%' and not Directive:
                Directive = 1
            elif Directive:
                try:
                    REString.append(DIRECTIVEDict['%'+char])
                except KeyError:
                    raise StrptimeError("Invalid format %s"%char)
                Directive = 0
            else:
                REString.append(char)
        return re.compile(''.join(REString), re.IGNORECASE)

    def convert(self, string, format, locale_setting):
        """ Gets time info from string based on format string and a locale
        created by LocaleAssembly() """
        DIRECTIVEDict, MONTHDict, DAYDict, AM_PM = locale_setting
        REComp = self.RECreation(format, DIRECTIVEDict)
        reobj = REComp.match(string)
        if reobj is None:
            raise StrptimeError("Invalid string (%s)"%string)
        for found in reobj.groupdict().keys():
            if found in ('y','Y'):  # year
                if found=='y':  # without century
                    self.year = CENTURY+int(reobj.group('y'))
                else:  # with century
                    self.year = int(reobj.group('Y'))
            elif found in ('b','B','m'):  # month
                if found=='m':  # month number
                    self.month = int(reobj.group(found))
                else:  # month name
                    try:
                        self.month = MONTHDict[reobj.group(found)]
                    except KeyError:
                        raise StrptimeError('Unrecognized month')
            elif found=='d':  # day of the month
                self.day = int(reobj.group(found))
            elif found in ('H','I'):  # hour
                hour = int(reobj.group(found))
                if found=='H':  # hour number
                    self.hour = hour
                else:  # AM/PM format
                    try:
                        if reobj.group('p') in AM_PM[0]:
                            AP = 0
                        else:
                            AP = 1
                    except IndexError:
                        # group() raises IndexError when the pattern has
                        # no group named 'p'
                        raise StrptimeError(
                            'Lacking needed AM/PM information')
                    if AP:
                        if hour==12:
                            self.hour = 12
                        else:
                            self.hour = 12+hour
                    else:
                        if hour==12:
                            self.hour = 0
                        else:
                            self.hour = hour
            elif found=='M':  # minute
                self.minute = int(reobj.group(found))
            elif found=='S':  # second
                self.second = int(reobj.group(found))
            elif found in ('a','A','w'):  # Day of the week
                if found=='w':  # DOW number
                    day_value = int(reobj.group(found))
                    if day_value==0:
                        self.day_week = 6
                    else:
                        self.day_week = day_value-1
                else:  # DOW name
                    try:
                        self.day_week = DAYDict[reobj.group(found)]
                    except KeyError:
                        raise StrptimeError('Unrecognized day')
            elif found=='j':  # Julian date
                self.julian_date = int(reobj.group(found))
            elif found=='Z':  # daylight savings
                TZ = reobj.group(found)
                if len(TZ)==3:
                    if TZ[1] in ('D','d'):
                        self.daylight = 1
                    else:
                        self.daylight = 0
                elif TZ.find('Daylight')!=-1:
                    self.daylight = 1
                else:
                    self.daylight = 0

def strptime(string, format='%a %b %d %H:%M:%S %Y', option=AS_IS,
             locale_setting=ENGLISH):
    """ Returns a tuple representing the time represented in 'string'.
    Valid values for 'option' are AS_IS, CHECK, and FILL_IN. 'locale_setting'
    accepts locale tuples created by LocaleAssembly(). """
    Obj = _StrpObj()
    Obj.convert(string, format, locale_setting)
    if option in (FILL_IN, CHECK):
        Obj.CheckIntegrity()
    if option == FILL_IN:
        Obj.FillInInfo()
    return Obj.return_time()

17.20.1 See Also

The most up-to-date version of strptime is always available at http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/HTML/code/index.php3#strptime, where you will also find a test suite using PyUnit; Andrew Markebo's version of strptime is at http://www.fukt.hk-r.se/~flognat/hacks/strptime.py.
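As a rough illustration of the regular-expression approach that the recipe describes, the following minimal sketch shows how a format string can be translated into a pattern with named groups, which is what lets a format without separators such as "%Y%m%d" be parsed. It is written for a current Python 3 interpreter and is not taken from the recipe: the names DIRECTIVES, format_to_regex, and parse_fields are hypothetical, the directive table is trimmed to three numeric directives, and escaping literal characters with re.escape is an assumption of this sketch.

```python
import re

# Hypothetical, trimmed-down directive table (illustration only):
# each directive maps to a regex fragment with a named group.
DIRECTIVES = {
    "%Y": r"(?P<Y>\d\d\d\d)",  # year with century
    "%m": r"(?P<m>[01]\d)",    # month [01,12]
    "%d": r"(?P<d>[0-3]\d)",   # day of the month [01,31]
}

def format_to_regex(fmt):
    """Translate a format string into a compiled regex with named groups."""
    parts = []
    i = 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt):
            directive = fmt[i:i + 2]
            if directive not in DIRECTIVES:
                raise ValueError("Invalid format %s" % directive)
            parts.append(DIRECTIVES[directive])
            i += 2
        else:
            # Literal characters must match themselves exactly
            parts.append(re.escape(fmt[i]))
            i += 1
    return re.compile("".join(parts))

def parse_fields(string, fmt):
    """Return a dict mapping each directive letter found to its integer value."""
    match = format_to_regex(fmt).match(string)
    if match is None:
        raise ValueError("Invalid string (%s)" % string)
    return {name: int(value) for name, value in match.groupdict().items()}

if __name__ == "__main__":
    print(parse_fields("20020314", "%Y%m%d"))
    print(parse_fields("2002-03-14", "%Y-%m-%d"))
```

Because each directive contributes a fixed-width pattern, the regular expression can tell where one field ends and the next begins even when nothing separates them, which a whitespace-splitting parser cannot do.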