Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.
I have a number of CSV files that I am working to ingest using Auto Loader. There is an ID field that I want to require to be a STRING, but using schemaHints is not working and the field is instead being set as an INT. The first few CSV files have just integer va...
Hi @Jennette Shepard, we haven't heard from you since the last response from @Suteja Kanuri. Kindly share the information with us, and in return, we will provide you with the necessary solution. Thanks and regards
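For readers following the schema-hint question above, here is a minimal sketch of how a type hint is usually passed to Auto Loader through the `cloudFiles.schemaHints` option. The helper names (`build_schema_hints`, `autoloader_csv_options`, `read_csv_stream`) and the path arguments are hypothetical, and the thread's actual fix is not shown in the excerpt, so treat this as an illustration of the option rather than the accepted answer.

```python
def build_schema_hints(hints):
    """Join {column: type} pairs into a comma-separated DDL-style hint string."""
    return ", ".join(f"{column} {dtype}" for column, dtype in hints.items())


def autoloader_csv_options(schema_location, hints):
    """Assemble Auto Loader options for CSV input with explicit schema hints."""
    return {
        "cloudFiles.format": "csv",
        # Location where Auto Loader tracks the inferred schema (hypothetical path).
        "cloudFiles.schemaLocation": schema_location,
        "cloudFiles.schemaHints": build_schema_hints(hints),
        "header": "true",
    }


def read_csv_stream(spark, source_path, schema_location):
    """Start an Auto Loader stream; assumes a Databricks runtime supplies `spark`."""
    options = autoloader_csv_options(schema_location, {"ID": "STRING"})
    return spark.readStream.format("cloudFiles").options(**options).load(source_path)
```

Keeping the options in a plain dictionary makes it easy to print and check the exact hint string being sent, which can help when a hint appears to be ignored.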
In my ETL case, I want to be able to adjust the table schema as needed, meaning the number of columns may increase or decrease depending on the ETL script. Additionally, I would like to use dynamic partition overwrite to avoid potential errors when u...
Hi @Thanapat Sontayasara, does @Werner Stinckens's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? If not, would you be happy to give us more information? Thanks!
Hello guys, I'm using the Jira API to return "ISSUES". To use PySpark, I need to create the DataFrame by passing in a schema, but I am not able to create the schema based on the model below. Would you have any ideas?
root
 |-- expand: string ...
If columns are missing, that particular data is not present in the JSON. I am not aware of Spark skipping columns when reading JSON with inferSchema. There is an option dropFieldIfAllNull, but that is false by default. That makes me think: you might ...
How do I ingest a .csv file with spaces in column names using Delta Live Tables into a streaming table? All of the fields should be read using the default behavior for .csv files in DLT Auto Loader, that is, as strings. Running the pipeline gives me an error about in...
After additional googling on "withColumnRenamed", I was able to replace all spaces in column names with "_" all at once by using select and alias instead:
@dlt.view(
    comment=""
)
def vw_raw():
    return (
        spark.readStream.format("cloudF...
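Since the snippet above is cut off, here is a minimal sketch of the select-and-alias idea it describes. The helper names `sanitize_column_name` and `rename_all_columns` are hypothetical and not taken from the original post; the sketch assumes the only problem characters are spaces and that no column name contains a backtick.

```python
def sanitize_column_name(name):
    """Replace every space in a column name with an underscore."""
    return name.replace(" ", "_")


def rename_all_columns(df):
    """Alias every column to its sanitized name in a single select.

    Assumes `df` is a PySpark DataFrame (batch or streaming).
    """
    # Imported lazily so sanitize_column_name stays usable without Spark installed.
    from pyspark.sql.functions import col

    # Backticks let Spark resolve names that contain spaces or dots.
    return df.select(
        [col(f"`{name}`").alias(sanitize_column_name(name)) for name in df.columns]
    )
```

Inside a DLT view, the function would presumably wrap the streaming read, for example `return rename_all_columns(raw_stream)` where `raw_stream` is the Auto Loader DataFrame. A single select avoids chaining one `withColumnRenamed` call per column.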