python - Regular expression to split a huge string into multiple sets of key-value pairs


I have a huge string that contains many sets, each separated by a comma (,). Each set holds key-value pairs, and each pair is separated by &.

Here is a small example:

tag=43&id=8787&type=video/webm;+codecs="vp8.0,+vorbis"&quality=medium,type=video/webm;+codecs="vp8.0,+vorbis"&quality=medium&tag=172&id=8978,tag=41&type=video/webm;+codecs="vp8.0,+vorbis"&id=1738&quality=medium 

This string has the following sets (3 sets, each separated by ,):

tag=43&id=8787&type=video/webm;+codecs="vp8.0,+vorbis"&quality=medium
type=video/webm;+codecs="vp8.0,+vorbis"&quality=medium&tag=172&id=8978
tag=41&type=video/webm;+codecs="vp8.0,+vorbis"&id=1738&quality=medium

I want to write a regular expression to split the original string into sets of key-value pairs. I tried this:

sets = huge_string.split(',') 

But it does not work, because there is also a comma inside one of the key-value pairs:

type=video/webm;+codecs="vp8.0,+vorbis" # <--- causing problem! 

Here the , inside the codecs value is causing the problem.

How do I write a regular expression to accomplish this task? I'm using Python 3.3.1.

Note that I don't know how many pairs there are, or in what order they appear.

This is how you parse the response from the YouTube API:

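import urllib.parse

# content is a str that stores the content of the link
query = urllib.parse.parse_qs(content)
# The stream map is a comma-separated list of URL-encoded strings
fullurls = query['url_encoded_fmt_stream_map'][0].split(',')
# Decode each token into its own dict of key-value pairs
data = [urllib.parse.parse_qs(i) for i in fullurls]
print(data)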

The output is a list of dicts that store the information for each of the links. Of course, the code above is only a demonstration of the concept. The assumptions should be cut down and checks should be added in actual code.
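As a rough idea of what such checks might look like, here is a minimal sketch. The function name parse_stream_map_checked and the choice to return an empty list when the key is missing are assumptions made for illustration, not part of any YouTube API.

```python
import urllib.parse


def parse_stream_map_checked(content, key='url_encoded_fmt_stream_map'):
    """Return one dict per stream, or an empty list if the key is absent.

    Hypothetical helper: the name and the fallback behaviour are
    illustrative choices, not something the API prescribes.
    """
    query = urllib.parse.parse_qs(content)
    values = query.get(key)
    if not values:
        # The response did not contain the expected key
        return []

    result = []
    for token in values[0].split(','):
        if not token:
            # Skip empty tokens such as those left by a trailing comma
            continue
        fields = urllib.parse.parse_qs(token)
        if fields:
            result.append(fields)
    return result
```

Whether a missing key should yield an empty list or raise an exception depends on the caller; the empty list is just one reasonable choice.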

The YouTube API returns a response of MIME type application/x-www-form-urlencoded, so you need to use urllib.parse.parse_qs to decode it.

The url_encoded_fmt_stream_map key contains a value that is a comma-separated list of URL-encoded strings, so you need to split it along the commas and then decode each of the tokens with urllib.parse.parse_qs. There is no need to worry about the commas in the codecs description: since they are URL-encoded at that point, they do not interfere with the splitting.
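To illustrate why the encoded commas are harmless, here is a small self-contained sketch. It does not contact YouTube: the sample response is built locally with urllib.parse.urlencode, so the field values and the helper name decode_stream_map are assumptions for demonstration only.

```python
import urllib.parse


def decode_stream_map(content):
    """Decode a form-urlencoded response into one dict per stream."""
    query = urllib.parse.parse_qs(content)
    stream_map = query['url_encoded_fmt_stream_map'][0]
    # Only the separator commas are literal here; commas inside values
    # are still percent-encoded (%2C), so split(',') is safe.
    return [urllib.parse.parse_qs(part) for part in stream_map.split(',')]


# Hypothetical sample data modelled on the question's example
streams = [
    {'tag': '43', 'id': '8787',
     'type': 'video/webm; codecs="vp8.0, vorbis"', 'quality': 'medium'},
    {'type': 'video/webm; codecs="vp8.0, vorbis"', 'quality': 'medium',
     'tag': '172', 'id': '8978'},
]

# Encode each stream, join with literal commas, then encode the whole map
stream_map = ','.join(urllib.parse.urlencode(s) for s in streams)
content = urllib.parse.urlencode({'url_encoded_fmt_stream_map': stream_map})

data = decode_stream_map(content)
print(len(data))        # number of streams recovered
print(data[0]['type'])  # codecs value with its comma intact
```

The string in the question looks like it was fully decoded before splitting, which is what exposed the comma inside the codecs value. Splitting after the first parse_qs call, and decoding each token only afterwards, avoids the need for a regular expression.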

