Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Techmate
by New Contributor
  • 1591 Views
  • 1 reply
  • 0 kudos

Populating an array of date tuples in Scala

Hi friends, I am trying to pass a list of date ranges that needs to be in the below format: val predicates = Array("2021-05-16" -> "2021-05-17", "2021-05-18" -> "2021-05-19", "2021-05-20" -> "2021-05-21"). I am then using map to create a range of conditions that...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

So basically this can be done by generating 2 lists which are then zipped. One list contains the first dates of the tuples, so these are in your case 2 days apart. The other list is the 2nd dates of the tuples, also 2 days apart. Now we need a function ...
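A minimal Scala sketch of that zip approach, assuming the 2-day step and the date boundaries from the example in the question (the range limits and names below are illustrative, not from the original thread):

import java.time.LocalDate

// Boundaries taken from the example predicates in the question; adjust as needed.
val first = LocalDate.parse("2021-05-16")
val last  = LocalDate.parse("2021-05-21")

// List 1: the first date of each tuple, stepping 2 days at a time.
val starts = Iterator.iterate(first)(_.plusDays(2)).takeWhile(!_.isAfter(last)).toArray
// List 2: the second date of each tuple, one day after each start date.
val ends = starts.map(_.plusDays(1))

// Zip the two lists into (start, end) string pairs.
val predicates: Array[(String, String)] =
  starts.map(_.toString).zip(ends.map(_.toString))
// Array((2021-05-16,2021-05-17), (2021-05-18,2021-05-19), (2021-05-20,2021-05-21))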

dlevy
by New Contributor II
  • 1574 Views
  • 1 reply
  • 1 kudos
Latest Reply
gbrueckl
Contributor II
  • 1 kudos

I think this was added in Databricks Runtime 8.2: https://docs.databricks.com/release-notes/runtime/8.2.html

alphaRomeo
by New Contributor
  • 4894 Views
  • 2 replies
  • 0 kudos

Resolved! Databricks with MySQL data source?

I have an existing data pipeline which looks like this: a small MySQL data source (around 250 GB); data passes through Debezium / Kafka / a custom data redactor -> to Glue ETL jobs and finally lands on Redshift, but the scale of the data is too sm...

Latest Reply
Dan_Z
Databricks Employee
  • 0 kudos

There is a lot in this question, so generally speaking I suggest you reach out to the sales team at Databricks. You can talk to a solutions architect who can get into more detail. Here are my general thoughts having seen a lot of customer architectures: Generally,...

1 More Reply
EvandroLippert_
by New Contributor
  • 2084 Views
  • 1 reply
  • 0 kudos

Conflict with Bitbucket and GitHub credentials

I'm migrating my files from Bitbucket to GitHub, but every time that I need to clone something from Bitbucket and send it to GitHub, I need to create a new token to integrate the tools. It seems that when you save a GitHub credential, it overrides t...

Latest Reply
alexott
Databricks Employee
  • 0 kudos

Cross-posting my answer from StackOverflow: Unfortunately, right now it works only with a single Git provider. It looks like you're linking individual notebooks into a Git repository. You can simplify things by cloning the Bitbucket repository(-ies)...

Alex_G
by New Contributor II
  • 2538 Views
  • 1 reply
  • 4 kudos

Resolved! Databricks Feature Store in MLflow run CLI command

Hello! I am attempting to move some machine learning code from a Databricks notebook into an MLflow git repository. I am utilizing the Databricks Feature Store to load features that have been processed. Currently I cannot get the databricks library to ...

Latest Reply
sean_owen
Databricks Employee
  • 4 kudos

Hm, what error do you get? I believe you won't be able to specify the feature store library as a dependency, as it's not externally published yet, but code that uses it should run on DB ML runtimes as it already exists there

irfanaziz
by Contributor II
  • 2282 Views
  • 2 replies
  • 3 kudos

Does anyone know why the optimize does not complete?

I feel there is some issue with a few partitions of the delta file. The optimize runs fine and completes within a few minutes for other partitions, but for this particular partition the optimize keeps running forever. OPTIMIZE delta.`/mnt/prod-abc/Ini...
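For context on the kind of command being run, here is a hedged Scala sketch of scoping OPTIMIZE to a single partition with a WHERE clause; the path suffix and partition column below are hypothetical placeholders, not the truncated values from the post:

// Run OPTIMIZE only for the problematic partition by filtering on a partition column.
// Both the table path suffix and the column name/value are placeholders.
spark.sql("""
  OPTIMIZE delta.`/mnt/prod-abc/<table-path>`
  WHERE partition_date = '2021-05-16'
""")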

Latest Reply
Anonymous
Not applicable
  • 3 kudos

@nafri A - Thank you for letting us know.

1 More Reply
User16137833804
by Databricks Employee
  • 4277 Views
  • 3 replies
  • 1 kudos
Latest Reply
Sebastian
Contributor
  • 1 kudos

The best solution is to store the .whl locally and do a pip install of the local .whl while the server boots up. This will freeze the library version. If you install from pip, it might impact your production work.

2 More Replies
User16856693631
by New Contributor II
  • 2146 Views
  • 1 reply
  • 0 kudos

Can you create Clusters via a REST API?

Yes, you can. See here: https://docs.databricks.com/dev-tools/api/latest/clusters.html The JSON payload would look as follows: { "cluster_name": "my-cluster", "spark_version": "7.3.x-scala2.12", "node_type_id": "i3.xlarge", "spark_conf": { ...

Latest Reply
ManishPatil
New Contributor II
  • 0 kudos

One can create a cluster using the Clusters API at https://docs.databricks.com/dev-tools/api/latest/clusters.html#create. However, REST API 2.0 doesn't provide certain features like "Enable Table Access Control", which have been introduced after REST API ...
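As a rough illustration of the create call discussed above, here is a Scala sketch using only the JDK HTTP client; the workspace URL, the DATABRICKS_TOKEN environment variable, and the num_workers value are assumptions added for the example, while the other payload fields mirror the snippet in the question:

import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

val workspaceUrl = "https://<your-workspace>.cloud.databricks.com"  // placeholder workspace URL
val token = sys.env("DATABRICKS_TOKEN")                             // personal access token (assumed env var)

// Payload mirrors the fields shown in the question; trim or extend as needed.
val payload =
  """{
    |  "cluster_name": "my-cluster",
    |  "spark_version": "7.3.x-scala2.12",
    |  "node_type_id": "i3.xlarge",
    |  "num_workers": 2
    |}""".stripMargin

val request = HttpRequest.newBuilder()
  .uri(URI.create(s"$workspaceUrl/api/2.0/clusters/create"))
  .header("Authorization", s"Bearer $token")
  .header("Content-Type", "application/json")
  .POST(HttpRequest.BodyPublishers.ofString(payload))
  .build()

val response = HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
println(s"${response.statusCode()} ${response.body()}")  // a successful call returns the new cluster_id as JSON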

BorislavBlagoev
by Valued Contributor III
  • 4757 Views
  • 2 replies
  • 1 kudos

DBUtils cannot find widgets [Windows 10]

I use databricks-connect to connect PyCharm with a Databricks cluster remotely, but when I try to get dbutils.widgets it throws an error. Cluster conf: spark.databricks.service.server.enabled true spark.databricks.hive.metastore.glueCatalog.enabled true ...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

This is normal behavior. databricks-connect does not support the whole dbutils class: https://docs.databricks.com/dev-tools/databricks-connect.html#access-dbutils Widgets are not on the list.
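For illustration, a small Scala sketch of the entry point the linked docs describe for reaching the supported parts of dbutils through databricks-connect (the question uses Python, whose form differs slightly; the dbfs path is a placeholder, and widgets remain unavailable either way):

// Per the databricks-connect docs, only dbutils.fs and dbutils.secrets are usable remotely.
val dbutils = com.databricks.service.DBUtils
println(dbutils.fs.ls("dbfs:/"))        // works through databricks-connect
println(dbutils.secrets.listScopes())   // works through databricks-connect
// dbutils.widgets is notebook-only and will fail when called from databricks-connect.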

1 More Reply
bciampa
by New Contributor II
  • 21349 Views
  • 1 reply
  • 1 kudos

Unable to infer schema for Parquet at

I have this code in a notebook: val streamingDataFrame = incomingStream.selectExpr("cast (body as string) AS Content") .withColumn("Sentiment", toSentiment($"Content")) import org.apache.spark.sql.streaming.Trigger.ProcessingTime val result = stre...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

Seems like an invalid Parquet file. My guess is the incoming data has mixed types (for the same column) or a different/invalid structure.

cconnell
by Contributor II
  • 742 Views
  • 0 replies
  • 1 kudos

medium.com

I wrote a review of Koalas by porting an existing pandas program. Comments welcome. https://medium.com/@chuck.connell.3/pandas-on-databricks-via-koalas-a-review-9876b0a92541

haseebkhan1421
by New Contributor
  • 2957 Views
  • 1 reply
  • 3 kudos

How can I create a column on the fly which would have the same value for all rows in a Spark SQL query

I have a SQL query which I am converting into Spark SQL in Azure Databricks, running in my Jupyter notebook. In my SQL query, a column named Type is created on the fly which has the value 'Goal' for every row: SELECT Type='Goal', Value FROM table. Now, when...

Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 3 kudos

The correct syntax would be: SELECT 'Goal' AS Type, Value FROM table
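If the query is being assembled with the DataFrame API rather than plain SQL, the same constant column can be added with lit(); a brief sketch, assuming a registered table with the placeholder name my_table:

import org.apache.spark.sql.functions.{col, lit}

// Equivalent of: SELECT 'Goal' AS Type, Value FROM my_table   (my_table is a placeholder)
val df = spark.table("my_table").select(lit("Goal").as("Type"), col("Value"))
df.show()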

