by
MattM
• New Contributor III
- 1883 Views
- 1 replies
- 2 kudos
We are ingesting our data from ADLS into Databricks as Delta tables. After the raw layer, we need to refer to a control/mapping layer that defines certain logic and measure definitions. This would be incorporated in the subsequent silver or gold layer. This co...
Latest Reply
MattM
New Contributor III
Thanks for your response. Can a business user, without the help of any script, modify rows in the table after loading it once from CSV files?
- 3646 Views
- 1 replies
- 32 kudos
Databricks Roadmap Azure
There are a lot of exciting new features coming in 2022. I tried to put them all on one list:
- Unity Catalog (it seems that it will exist next to the Hive metastore and that it will be possible to migrate)
- Control metastore, unity creatio...
- 1902 Views
- 1 replies
- 7 kudos
Thanks to everyone who joined the Hassle-Free Data Ingestion webinar. You can access the on-demand recording here. We're sharing a subset of the phenomenal questions asked and answered throughout the session. You'll find Ingestion Q&A listed first, f...
Latest Reply
Check out Part 2 of this Data Ingestion webinar to find out how to easily ingest semi-structured data at scale into your Delta Lake, including how to use Databricks Auto Loader to ingest JSON data into Delta Lake.
- 1169 Views
- 1 replies
- 1 kudos
AutoML presumably tries a few different algorithms while hyperparameter searching. What model types are considered?
Latest Reply
At the moment, it's really just XGBoost and sklearn implementations such as random forests, logistic regression, and linear regression, as applicable. More possibilities are coming.
- 1233 Views
- 1 replies
- 0 kudos
I have an NLP application that I built on my local machine using spaCy and pandas, but now I would like to scale my application to a large production dataset and utilize the benefits of Spark's distributed compute. How do I import and utilize a librar...
Latest Reply
It depends on what you mean, but if you're just trying to (say) tokenize and process data with spacy in parallel, then that's trivial. Write a 'pandas UDF' function that expresses how you want to transform data using spacy, in terms of a pandas DataF...
- 2094 Views
- 1 replies
- 0 kudos
.where((col('state')==state) & (col('month')>startmonth))
I can do the where conditions both ways. I think the one below adds readability. Is there any other difference, and which is the best?
.where(col('state')==state).where(col('month')>startmonth)
Latest Reply
You can use explain to see what physical and logical plans are created. This is the best way to see the difference, but, as mentioned in the question, it should give the same physical plan.
- 2636 Views
- 2 replies
- 0 kudos
How do you deploy a model in Databricks?
Latest Reply
The following resources provide more detail on this:
Databricks model registry example notebook: https://docs.databricks.com/_static/notebooks/mlflow/mlflow-model-registry-example.html
Databricks model lifecycle - https://docs.databricks.com/applicatio...