Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Techmate
by New Contributor
  • 930 Views
  • 1 reply
  • 0 kudos

Populating an array of date tuples in Scala

Hi friends, I am trying to pass a list of date ranges that needs to be in the below format. val predicates = Array("2021-05-16" -> "2021-05-17", "2021-05-18" -> "2021-05-19", "2021-05-20" -> "2021-05-21") I am then using map to create a range of conditions that...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

So basically this can be done by generating 2 lists which are then zipped. One list contains the first dates of the tuples, so these are in your case 2 days apart. The other list is the 2nd dates of the tuples, also 2 days apart. Now we need a function ...

  • 0 kudos
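The two-list zip approach described in the reply above can be sketched as follows. This is a Python illustration of the same idea (the original question is Scala); the helper name `date_tuple_ranges` is my own, and the dates are the ones from the question:

```python
from datetime import date, timedelta

def date_tuple_ranges(start, end, step_days=2):
    """Build (range_start, range_end) tuples by zipping two date lists.

    One list holds the first dates of the tuples, the other holds the
    second dates; both advance by `step_days`, mirroring the reply above.
    """
    starts, ends = [], []
    d = start
    while d <= end:
        starts.append(d.isoformat())
        ends.append((d + timedelta(days=step_days - 1)).isoformat())
        d += timedelta(days=step_days)
    return list(zip(starts, ends))

predicates = date_tuple_ranges(date(2021, 5, 16), date(2021, 5, 21))
# [('2021-05-16', '2021-05-17'), ('2021-05-18', '2021-05-19'),
#  ('2021-05-20', '2021-05-21')]
```

The same zip pattern translates directly to Scala (`starts zip ends`) once the two date lists are generated.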
User16788316474
by New Contributor II
  • 879 Views
  • 1 reply
  • 1 kudos
Latest Reply
gbrueckl
Contributor II
  • 1 kudos

I think this was added in Databricks Runtime 8.2: https://docs.databricks.com/release-notes/runtime/8.2.html

  • 1 kudos
alphaRomeo
by New Contributor
  • 2299 Views
  • 2 replies
  • 0 kudos

Resolved! DataBricks with MySQL data source?

I have an existing data pipeline which looks like this: A small MySQL data source (around 250 GB) and data passes through Debezium/ Kafka / a custom data redactor -> to Glue ETL jobs and finally lands on Redshift, but the scale of the data is too sm...

Latest Reply
Dan_Z
Honored Contributor
  • 0 kudos

There is a lot in this question, so generally speaking I suggest you reach out to the sales team at Databricks. You can talk to a solutions architect who can get into more detail. Here are my general thoughts, having seen a lot of customer architectures: Generally,...

  • 0 kudos
1 More Replies
EvandroLippert_
by New Contributor
  • 1371 Views
  • 2 replies
  • 0 kudos

Conflict with bitbucket and github credentials

I'm migrating my files from Bitbucket to Github, but every time that I need to clone something from bitbucket and send it to GitHub, I need to create a new token to integrate the tools. It seems that when you save a Github credential, it overrides t...

Latest Reply
alexott
Valued Contributor II
  • 0 kudos

Cross-posting my answer from StackOverflow: Unfortunately, right now it works only with a single Git provider. It looks like you're linking individual notebooks into a Git repository. You can simplify things by cloning the Bitbucket repository(-ies)...

  • 0 kudos
1 More Replies
irfanaziz
by Contributor II
  • 1380 Views
  • 3 replies
  • 3 kudos

Does anyone know why the optimize does not complete?

I feel there is some issue with a few partitions of the Delta file. The optimize runs fine and completes within a few minutes for other partitions, but for this particular partition the optimize keeps running forever. OPTIMIZE delta.`/mnt/prod-abc/Ini...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

@nafri A - Thank you for letting us know.

  • 3 kudos
2 More Replies
max651
by New Contributor
  • 958 Views
  • 1 reply
  • 0 kudos

How to create circles and find diameter in point cloud?

Good day, everybody. I have a task: I should create many circles in a point cloud (my file can be .csv) and calculate their diameter within a 100 m x 100 m area. I have the coordinates of the starting point. The circles should be created at the height ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @max651! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the community have an answer to your question first. Or else I will follow up shortly with a response.

  • 0 kudos
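The thread has no technical answer yet. As one possible starting point for the diameter part (my own sketch, not from the post), a least-squares (Kåsa) circle fit over the (x, y) coordinates of a candidate point cluster recovers the center and diameter; the sample points below are illustrative:

```python
from math import sqrt, cos, sin, pi

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_circle(points):
    """Least-squares (Kåsa) circle fit: returns (cx, cy, diameter).

    Solves the normal equations of  x^2 + y^2 + D*x + E*y + F = 0
    for D, E, F via Cramer's rule, then recovers center and radius.
    """
    n = len(points)
    sx = sum(x for x, _ in points); sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points); syy = sum(y * y for _, y in points)
    sxy = sum(x * y for x, y in points)
    z = [x * x + y * y for x, y in points]
    sz = sum(z)
    sxz = sum(x * zi for (x, _), zi in zip(points, z))
    syz = sum(y * zi for (_, y), zi in zip(points, z))

    m = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    v = [-sxz, -syz, -sz]
    d = det3(m)
    # Cramer's rule: replace each column of m with v in turn
    coeffs = []
    for i in range(3):
        mi = [row[:] for row in m]
        for r in range(3):
            mi[r][i] = v[r]
        coeffs.append(det3(mi) / d)
    D, E, F = coeffs
    cx, cy = -D / 2, -E / 2
    r = sqrt(cx * cx + cy * cy - F)
    return cx, cy, 2 * r

# Hypothetical check: 12 points sampled on a circle of radius 5 around (2, 3)
pts = [(2 + 5 * cos(t), 3 + 5 * sin(t)) for t in (2 * pi * k / 12 for k in range(12))]
cx, cy, diameter = fit_circle(pts)  # center ~(2, 3), diameter ~10
```

For a real point cloud you would first cluster the points (e.g. by height slice) and run the fit per cluster; with noisy data the Kåsa fit is biased toward smaller radii, so a geometric refinement step may be needed.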
User16137833804
by New Contributor III
  • 2692 Views
  • 3 replies
  • 1 kudos
Latest Reply
Sebastian
Contributor
  • 1 kudos

The best solution is to store the .whl locally and do a pip install of the local .whl while the server boots up. This will freeze the library version. If you install from pip, it might impact your production work.

  • 1 kudos
2 More Replies
User16856693631
by New Contributor II
  • 1235 Views
  • 2 replies
  • 0 kudos

Can you create Clusters via a REST API?

Yes, you can. See here: https://docs.databricks.com/dev-tools/api/latest/clusters.html. The JSON payload would look as follows: { "cluster_name": "my-cluster", "spark_version": "7.3.x-scala2.12", "node_type_id": "i3.xlarge", "spark_conf": { ...

Latest Reply
ManishPatil
New Contributor II
  • 0 kudos

One can create a cluster using the Clusters API at https://docs.databricks.com/dev-tools/api/latest/clusters.html#create. However, REST API 2.0 doesn't provide certain features like "Enable Table Access Control", which has been introduced after REST API ...

  • 0 kudos
1 More Replies
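The create call discussed in this thread can be sketched as follows. The workspace URL and token are placeholders, the helper names are my own, and the payload fields mirror the example quoted in the question:

```python
import json
import urllib.request

def build_cluster_payload(name, num_workers=2):
    """Assemble a Clusters API 2.0 create payload like the one quoted above.

    The spark_version / node_type_id values are the examples from the post;
    adjust them for your cloud and runtime.
    """
    return {
        "cluster_name": name,
        "spark_version": "7.3.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "num_workers": num_workers,
    }

def create_cluster(host, token, payload):
    """POST the payload to /api/2.0/clusters/create (hypothetical host/token)."""
    req = urllib.request.Request(
        f"{host}/api/2.0/clusters/create",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:  # response carries the cluster_id
        return json.load(resp)

payload = build_cluster_payload("my-cluster")
# create_cluster("https://<workspace-url>", "<token>", payload)  # not run here
```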
BorislavBlagoev
by Valued Contributor III
  • 2819 Views
  • 3 replies
  • 1 kudos

DBUtils cannot find widgets [Windows 10]

I use databricks-connect to connect PyCharm with a Databricks cluster remotely, but when I try to get dbutils.widgets it throws an error. cluster conf: spark.databricks.service.server.enabled true spark.databricks.hive.metastore.glueCatalog.enabled true ...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

This is normal behavior: databricks-connect does not support the whole dbutils class. See https://docs.databricks.com/dev-tools/databricks-connect.html#access-dbutils. Widgets are not on the list.

  • 1 kudos
2 More Replies
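Given that limitation, one common workaround (my own sketch, not from the thread) is a small wrapper that tries `dbutils.widgets` and falls back to a default when widgets are unavailable, e.g. under databricks-connect or plain local runs:

```python
def get_param(name, default, dbutils=None):
    """Read a widget value when running on Databricks, else use a default.

    Under databricks-connect, dbutils.widgets is not supported, so any
    failure (missing dbutils, unsupported call) falls back to `default`.
    """
    try:
        return dbutils.widgets.get(name)
    except Exception:
        return default

env = get_param("env", "dev")  # local run without dbutils: falls back to "dev"
```

On a real Databricks notebook you would pass the notebook-scoped `dbutils` object in, and the widget value wins over the default.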
bciampa
by New Contributor II
  • 18760 Views
  • 2 replies
  • 1 kudos

Unable to infer schema for Parquet at

I have this code in a notebook: val streamingDataFrame = incomingStream.selectExpr("cast (body as string) AS Content").withColumn("Sentiment", toSentiment($"Content")) import org.apache.spark.sql.streaming.Trigger.ProcessingTime val result = stre...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

Seems like an invalid Parquet file. My guess is the incoming data has mixed types (for the same column) or a different/invalid structure.

  • 1 kudos
1 More Replies
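As a quick first diagnostic along the lines of the reply above (my own suggestion, not from the thread), you can check whether a file is even structurally Parquet: valid Parquet files begin and end with the 4-byte magic `PAR1`. The file names below are illustrative:

```python
import os

def looks_like_parquet(path):
    """Cheap structural check: Parquet files start and end with b'PAR1'."""
    size = os.path.getsize(path)
    if size < 12:  # leading magic + 4-byte footer length + trailing magic
        return False
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)
        tail = f.read(4)
    return head == b"PAR1" and tail == b"PAR1"

# Hypothetical demo files, not from the thread
with open("good.parquet", "wb") as f:
    f.write(b"PAR1" + b"\x00" * 8 + b"PAR1")
with open("bad.parquet", "wb") as f:
    f.write(b"this is not parquet data")

good = looks_like_parquet("good.parquet")  # True
bad = looks_like_parquet("bad.parquet")    # False
```

This only rules out files that are not Parquet at all; schema-level problems like mixed column types still need inspection of the actual footer metadata.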
cconnell
by Contributor II
  • 386 Views
  • 0 replies
  • 1 kudos

medium.com

I wrote a review of Koalas by porting an existing pandas program. Comments welcome: https://medium.com/@chuck.connell.3/pandas-on-databricks-via-koalas-a-review-9876b0a92541

VirajV
by New Contributor
  • 910 Views
  • 1 reply
  • 0 kudos

mlflow project train and validate - Control over the data used in the script?

Hi there, I'm trying to decide whether to get started with ML, and I have really enjoyed it so far. When going through the documentation, there was a blocker moment for me, as I feel the documentation doesn't mention much about the dataset used to train t...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @VirajV! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your question first. Or else I will follow up shortly with a response.

  • 0 kudos
fabiwilys84
by New Contributor II
  • 731 Views
  • 1 reply
  • 1 kudos

Databricks spark certification

Hi guys, is there any way to get a 100% off voucher or a good discount voucher for the Databricks Spark certification? Currently the certification is very costly ($200). Any help is appreciated.

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @fabiwilys84! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your questions first. Or else I will follow up shortly with a response.

  • 1 kudos