Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

jsaddam28
by New Contributor III
  • 44257 Views
  • 24 replies
  • 15 kudos

How to import local python file in notebook?

For example, I have one.py and two.py in Databricks and I want to use one of the modules from one.py in two.py. On my local machine I usually do this with an import statement, like below in two.py: from one import module1 . . . How to do this in Databricks?...

Latest Reply
StephanieAlba
Valued Contributor III
  • 15 kudos

USE REPOS! Repos is able to call a function that is in a file in the same GitHub repo, as long as Files is enabled in the admin panel. So if I have utils.py with: import pandas as pd def clean_data(): # Load wine data data = pd.read_csv("/dbfs/da...

23 More Replies
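A minimal, self-contained sketch of what the accepted reply describes: a plain Python file next to the notebook that you import directly. The temp directory below simulates a repo checkout, and the file contents are illustrative; in an actual Databricks Repo with Files enabled, the notebook's directory is already importable.

```python
import sys, tempfile, pathlib

# Simulate a repo checkout containing utils.py (contents are illustrative).
repo = pathlib.Path(tempfile.mkdtemp())
(repo / "utils.py").write_text(
    "def clean_data(rows):\n"
    "    # drop rows with missing values (toy cleaning step)\n"
    "    return [r for r in rows if None not in r]\n"
)

# In a Databricks Repo (with Files in Repos enabled) the notebook's folder
# is already on sys.path; outside Repos you add it yourself:
sys.path.append(str(repo))

from utils import clean_data  # the import the question asks about

print(clean_data([(1, 2), (None, 3)]))  # [(1, 2)]
```

The same `from utils import clean_data` line is all a sibling notebook in the repo needs.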
Sam
by New Contributor III
  • 3076 Views
  • 2 replies
  • 1 kudos

Resolved! Query Pushdown in Snowflake

Hi, I am wondering what documentation exists on Query Pushdown in Snowflake. I noticed that a single function (monotonically_increasing_id()) prevented the entire query being pushed down to Snowflake during an ETL process. Is Pushdown coming from the S...

Latest Reply
siddhathPanchal
New Contributor III
  • 1 kudos

Hi Sam, The Spark Connector applies predicate and query pushdown by capturing and analyzing the Spark logical plans for SQL operations. When the data source is Snowflake, the operations are translated into a SQL query and then executed in Snowflake to...

1 More Replies
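A sketch of the pattern that follows from the reply: keep pushdown-eligible operations (filters, projections) in the Snowflake read, and apply Spark-only functions such as `monotonically_increasing_id()` afterwards so they don't block pushdown of the whole plan. Connection option values and the table name are placeholders.

```python
# Placeholder connection options for the Spark-Snowflake connector
# (values are illustrative, not real credentials).
sf_options = {
    "sfUrl": "myaccount.snowflakecomputing.com",
    "sfUser": "etl_user",
    "sfPassword": "***",
    "sfDatabase": "ANALYTICS",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "ETL_WH",
}

def read_open_orders(spark):
    # The filter below is pushdown-eligible and can be translated to
    # Snowflake SQL; monotonically_increasing_id() is Spark-only, so it
    # is applied *after* the load rather than inside the pushed-down query.
    from pyspark.sql.functions import monotonically_increasing_id
    df = (spark.read.format("snowflake")
          .options(**sf_options)
          .option("dbtable", "ORDERS")
          .load()
          .filter("STATUS = 'OPEN'"))
    return df.withColumn("row_id", monotonically_increasing_id())
```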
MarcoCaviezel
by New Contributor III
  • 4642 Views
  • 6 replies
  • 3 kudos

Resolved! Use Spot Instances with Azure Data Factory Linked Service

In my pipeline I'm using Azure Data Factory to trigger Databricks notebooks as a linked service. I want to use spot instances for my job clusters. Is there a way to achieve this? I didn't find a way to do this in the GUI. Thanks for your help! Marco

Latest Reply
MarcoCaviezel
New Contributor III
  • 3 kudos

Hi @Werner Stinckens​, just a quick follow-up question. Does it make sense to you that you can select the following options in Azure Data Factory? To my understanding, "cluster version", "Python Version" and the "Worker options" are defined when I crea...

5 More Replies
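One route discussed in the thread is to control spot usage on the Databricks side rather than in the ADF GUI: have ADF trigger a job whose cluster spec requests Azure spot VMs. A sketch of such a spec, with field names following the Databricks Clusters API and values purely illustrative:

```python
# Sketch of a job-cluster spec requesting Azure spot VMs with on-demand
# fallback (values are illustrative).
new_cluster = {
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 4,
    "azure_attributes": {
        "availability": "SPOT_WITH_FALLBACK_AZURE",
        "first_on_demand": 1,        # keep the driver on-demand
        "spot_bid_max_price": -1,    # bid up to the on-demand price
    },
}
```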
Maverick1
by Valued Contributor II
  • 3750 Views
  • 8 replies
  • 14 kudos

Resolved! Real-time model serving and monitoring on Databricks at scale

How to deploy a real-time model on Databricks at scale? Right now, model serving is limited to 20 requests per second. Also, there is no model monitoring framework/graphs like the ones provided with the AzureML or SageMaker frameworks.

Latest Reply
sean_owen
Honored Contributor II
  • 14 kudos

I believe the next update to serving will include 1, not 2 (this is still within a Databricks workspace in a region). I don't think multi-model endpoints are next on the roadmap. How does Airflow integration relate?

7 More Replies
Artem_Yevtushen
by New Contributor III
  • 1544 Views
  • 1 reply
  • 4 kudos

Embed Google Slides (PowerPoint) into Databricks Interactive Notebooks

Use the following code to embed your slides: slide_id = '1CYEVsDqsdfg343fwg42MtXqGd68gffP-Y16CR59c' slide_number = 'id.p9' displayHTML(f''' <iframe src="https://docs.google.com/...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

@Artem Yevtushenko​ - Thank you for sharing this solution.

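A runnable version of the embedding trick from the post, with the deck id replaced by a placeholder (use your own published deck's id). Only the final `displayHTML` call is Databricks-specific, so it is shown as a comment here.

```python
# Build the embed HTML for a published Google Slides deck; the slide id
# below is a placeholder.
slide_id = "YOUR_SLIDE_ID"
slide_number = "id.p9"

html = f"""
<iframe
  src="https://docs.google.com/presentation/d/{slide_id}/embed#slide={slide_number}"
  frameborder="0" width="960" height="569">
</iframe>
"""

# In a Databricks notebook cell:
# displayHTML(html)
```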
Kotofosonline
by New Contributor III
  • 4418 Views
  • 3 replies
  • 3 kudos

Resolved! Query with distinct sort and alias produces error column not found

I’m trying to use a SQL query on Azure Databricks with DISTINCT, sort, and aliases: SELECT DISTINCT album.ArtistId AS my_alias FROM album ORDER BY album.ArtistId. The problem is that if I add an alias then I cannot use the non-aliased name in the ORDER BY ...

Latest Reply
Kotofosonline
New Contributor III
  • 3 kudos

The code from above worked in both cases.

2 More Replies
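The usual workaround for this class of error is to order by the alias itself: after `SELECT DISTINCT`, only the output columns of the select list are in scope for `ORDER BY`. SQLite is used below purely to illustrate the working form of the query on the thread's `album` table; the original context is Spark SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE album (ArtistId INTEGER)")
conn.executemany("INSERT INTO album VALUES (?)", [(2,), (1,), (2,)])

# Order by the alias, which *is* part of the DISTINCT output:
rows = conn.execute(
    "SELECT DISTINCT album.ArtistId AS my_alias "
    "FROM album ORDER BY my_alias"
).fetchall()
print(rows)  # [(1,), (2,)]
```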
krishnachaitany
by New Contributor II
  • 4173 Views
  • 3 replies
  • 4 kudos

Resolved! Spot instance in Azure Databricks

When I run a job using spot instances, I would like to know how many workers are using spot and how many are using on-demand instances for a given job run. In order to identify the spot instances we got for any...

Latest Reply
Prabakar
Esteemed Contributor III
  • 4 kudos

You can do it in the Azure Portal using the Virtual Machines list: filter by either the JobId tag or the RunName tag (job name) and group by Azure spot eviction policy or Azure spot eviction type; the VMs under Stop/Deallocate and Capacity (using the 2...

2 More Replies
amichel
by New Contributor III
  • 2679 Views
  • 3 replies
  • 4 kudos

Resolved! Is there a stable, ideally official JMS/ActiveMQ connector for Spark?

We're delivering pipelines that are mostly based on Databricks Spark Streaming, Delta Lake and Azure Event Hubs, and there's a requirement to integrate with AMQ/JMS endpoints (Request and Response queues in ActiveMQ). Is there a proven way to integrat...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

@amichel We have a feature request to add Structured Streaming support for Tibco EMS and JMS. Unfortunately, it's yet to be prioritized for the roadmap. I would request you to file a feature request in our ideas portal https://ideas.databricks.com/ideas/...

2 More Replies
vasanthvk
by New Contributor III
  • 7240 Views
  • 7 replies
  • 3 kudos

Resolved! Is there a way to automate Table creation in Databricks SQL based on a ADLS storage location which contains multiple Parquet files?

We have an ADLS container location which contains several (100+) data subject folders containing Parquet files with a partition column, and we want to expose each data subject folder as a table in Databricks SQL. Is there any way to au...

Latest Reply
User16857282152
Contributor
  • 3 kudos

Updating dazfuller's suggestion, but including code for one level of partitioning; of course, if you have deeper partitions then you will have to make a function and do a recursive call to get to the final directory containing parquet files. Parquet wil...

6 More Replies
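A sketch of the loop-over-folders idea: generate one `CREATE TABLE` statement per top-level subject folder. The folder names and container path are hypothetical; on Databricks you would list the real folders with `dbutils.fs.ls(root)` and execute each statement with `spark.sql(ddl)`.

```python
# Generate one CREATE TABLE per top-level folder of parquet data.
def make_ddl(table_name, location):
    # Partition columns encoded in the folder structure are discovered
    # by the engine when the table is read.
    return (f"CREATE TABLE IF NOT EXISTS {table_name} "
            f"USING PARQUET LOCATION '{location}'")

subjects = ["sales", "customers"]  # stand-ins for the 100+ subject folders
root = "abfss://data@myaccount.dfs.core.windows.net"  # hypothetical container

ddls = [make_ddl(s, f"{root}/{s}/") for s in subjects]
print(ddls[0])
```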
MartinB
by Contributor III
  • 8541 Views
  • 4 replies
  • 3 kudos

Resolved! Interoperability Spark ↔ Pandas: can't convert Spark dataframe to Pandas dataframe via df.toPandas() when it contains datetime value in distant future

Hi, I have multiple datasets in my data lake that feature valid_from and valid_to columns indicating validity of rows. If a row is currently valid, this is indicated by valid_to=9999-12-31 00:00:00. Example: Loading this into a Spark dataframe works fine...

Latest Reply
shan_chandra
Esteemed Contributor
  • 3 kudos

Currently, out-of-bound timestamps are not supported in PyArrow/pandas. Please refer to the associated JIRA issue: https://issues.apache.org/jira/browse/ARROW-5359?focusedCommentId=17104355&page=com.atlassian.jira.plugin.system.issuetabpanels%3...

3 More Replies
User16783852686
by New Contributor II
  • 2813 Views
  • 5 replies
  • 2 kudos

Resolved! Slow first time run, jar based jobs

When running a jar-based job, I've noticed that the first run always takes extra time to complete and consecutive runs finish faster. This behavior is reproducible on an interactive cluster. What's causing this? Is this e...

Latest Reply
User16783852686
New Contributor II
  • 2 kudos

@Sandeep Katta​, this is a fat jar that does read-transform-write. @DD Sharma​'s response matches @Werner Stinckens​' and my intuition that the second job was faster because the jar was already loaded. I would not have noticed this had the job run...

4 More Replies
BorislavBlagoev
by Valued Contributor III
  • 2135 Views
  • 4 replies
  • 7 kudos

Resolved! Visualization of Structured Streaming in job.

Does Databricks have a feature or good pattern to visualize the data from Structured Streaming? Something like display in the notebook.

Latest Reply
BorislavBlagoev
Valued Contributor III
  • 7 kudos

I didn't know about that. Thanks!

3 More Replies
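The answer the asker is reacting to is the notebook `display()` function, which accepts a streaming DataFrame and renders a live-updating table or chart. A sketch, using Spark's built-in `rate` test source; `spark` and `display` exist inside a Databricks notebook, so the call is left as a comment here:

```python
# Sketch: visualize a Structured Streaming DataFrame in a notebook.
def show_rate_stream(spark):
    stream_df = (spark.readStream
                 .format("rate")        # built-in test source: timestamp, value
                 .option("rowsPerSecond", 5)
                 .load())
    # display(stream_df)  # live-updating visualization in the notebook UI
    return stream_df
```

For jobs (rather than interactive notebooks), the usual pattern is instead to write the stream to a Delta table and point a dashboard at it.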
User16752246002
by New Contributor II
  • 1726 Views
  • 2 replies
  • 6 kudos

Resolved! New Bill Inmon Book, What are your thoughts?

Have you checked out the new Bill Inmon book, Building the Data Lakehouse? https://dbricks.co/3uxCXjO What were your thoughts if you read it?

Latest Reply
-werners-
Esteemed Contributor III
  • 6 kudos

The quality of the book depends on the audience IMO. For people who have no background in data warehousing it will be interesting to read. For the others the book is too general and descriptive. The 'HOW do you do x' is missing.

1 More Replies
FMendez
by New Contributor III
  • 11911 Views
  • 4 replies
  • 7 kudos

Resolved! How can you mount an Azure Data Lake (gen2) using abfss and Shared Key?

I wanted to mount an ADLS Gen2 on Databricks and take advantage of the abfss driver, which should be better for large analytical workloads (is that even true in the context of DB?). Setting up OAuth is a bit of a pain so I wanted to take the simpler approac...

Latest Reply
User16753724663
Valued Contributor
  • 7 kudos

Hi @Fernando Mendez​, the document below will help you to mount ADLS Gen2 using abfss: https://docs.databricks.com/data/data-sources/azure/adls-gen2/azure-datalake-gen2-get-started.html Could you please check if this helps?

3 More Replies
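A hedged sketch of the shared-key variant the asker is after, using `dbutils.fs.mount` with the account key passed via `extra_configs`. Account, container, and mount point names are placeholders, and the exact config key must match your storage account; consult the linked docs for the current recommended (OAuth) approach.

```python
# Sketch: mount ADLS Gen2 over abfss with a storage account access key.
storage_account = "mystorageacct"   # placeholder
container = "data"                  # placeholder
shared_key = "<storage-account-access-key>"

def mount_adls(dbutils):
    dbutils.fs.mount(
        source=f"abfss://{container}@{storage_account}.dfs.core.windows.net/",
        mount_point="/mnt/data",
        extra_configs={
            f"fs.azure.account.key.{storage_account}.dfs.core.windows.net":
                shared_key
        },
    )
```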
del1000
by New Contributor III
  • 17022 Views
  • 6 replies
  • 3 kudos

Resolved! Is it possible to passthrough job's parameters to variable?

Scenario: I tried to run notebook_primary as a job with the same parameters' map. This notebook is an orchestrator for notebooks_sec_1, notebooks_sec_2, notebooks_sec_3 and so on. I run them by the dbutils.notebook.run(path, timeout, arguments) function. So ho...

Latest Reply
del1000
New Contributor III
  • 3 kudos

@Balbir Singh​, I'm a newbie in Databricks, but the manual says you can use a Python cell and transfer variables to a Scala cell via temp tables. https://docs.databricks.com/notebooks/notebook-workflows.html#pass-structured-data

5 More Replies
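A sketch of the orchestrator pattern from the question: read the job's parameters with widgets in notebook_primary, then forward the same map to each child via `dbutils.notebook.run`. Child paths and parameter names are illustrative.

```python
# Orchestrator sketch: pass the job's parameters through to child notebooks.
CHILDREN = ["./notebooks_sec_1", "./notebooks_sec_2", "./notebooks_sec_3"]

def forward_params(dbutils, names=("env", "run_date")):
    # Collect the parameters this job was launched with...
    args = {n: dbutils.widgets.get(n) for n in names}
    # ...and pass the same map straight through to each child notebook.
    return [dbutils.notebook.run(path, 600, args) for path in CHILDREN]
```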
