Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

207474
by New Contributor
  • 1698 Views
  • 3 replies
  • 2 kudos

How do I get the total number of queries run per day on a databricks SQL warehouse/endpoint?

I am trying to access the API: GET https://<databricks-instance>.cloud.databricks.com/api/2.0/sql/history/queries

Latest Reply
Vidula
Honored Contributor
  • 2 kudos

Hi there @Sravan Burla, hope all is well! Just wanted to check in whether you were able to resolve your issue, and if so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you...

2 More Replies
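Once the Query History results are in hand, the per-day count is a small aggregation over the `query_start_time_ms` field each record carries. A minimal sketch, assuming records shaped like the endpoint's response (the helper name and sample records are illustrative; the real `GET /api/2.0/sql/history/queries` endpoint is paginated via `next_page_token` and requires a bearer token):

```python
from collections import Counter
from datetime import datetime, timezone

def queries_per_day(query_records):
    """Count queries per calendar day (UTC) from Query History API records.

    Each record is expected to carry a `query_start_time_ms` epoch field,
    as returned by GET /api/2.0/sql/history/queries.
    """
    days = Counter()
    for rec in query_records:
        ts = rec["query_start_time_ms"] / 1000.0
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
        days[day] += 1
    return dict(days)

# Hand-made records standing in for one page of API results:
sample = [
    {"query_start_time_ms": 1_689_300_000_000},  # 2023-07-14 UTC
    {"query_start_time_ms": 1_689_303_600_000},  # 2023-07-14 UTC
    {"query_start_time_ms": 1_689_400_000_000},  # 2023-07-15 UTC
]
print(queries_per_day(sample))
```

In practice you would loop over pages until `has_next_page` is false, feeding every page's `res` list into the same counter.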
THIAM_HUATTAN
by Valued Contributor
  • 937 Views
  • 0 replies
  • 0 kudos

Delta Live Tables Example Questions

I am testing some of the Delta Live Tables examples from https://github.com/databricks/delta-live-tables-notebooks/tree/main/divvy-bike-demo. I have run all the relevant ingestion files: python-weatherinfo-api-ingest.py, python-divvybike-api-ingest-st...

Nick_Hughes
by New Contributor III
  • 6670 Views
  • 3 replies
  • 1 kudos

Best way to generate fake data using underlying schema

Hi, we are trying to generate fake data to run our tests. For example, we have a pipeline that creates a gold-layer fact table from 6 underlying source tables in our silver layer. We want to generate the data in a way that recognises the relationships ...

Latest Reply
RonanStokes_DB
New Contributor III
  • 1 kudos

Hi @Nick_Hughes, this may be late for your scenario, but hopefully others facing similar issues will find it useful. You can specify how data is generated in `dbldatagen` using rules in the data generation spec. If rules are specified for data generat...

2 More Replies
Raghav2
by New Contributor
  • 6544 Views
  • 1 replies
  • 0 kudos

AnalysisException: [COLUMN_ALREADY_EXISTS] The column `<col>` already exists. Consider to choose an

Hey guys, I'm facing this exception while trying to read a public S3 bucket: "AnalysisException: [COLUMN_ALREADY_EXISTS] The column `<column name>` already exists. Consider to choose another name or rename the existing column." Also, the thing is I...

Latest Reply
Lakshay
Esteemed Contributor
  • 0 kudos

You can use dbutils to read the file: %fs head <s3 path>

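For readers unfamiliar with the magic command: `%fs head` prints the first bytes of a file, and the programmatic equivalent is `dbutils.fs.head(path, maxBytes)`. A local-filesystem sketch of the same idea, with a hypothetical `head` helper:

```python
import os
import tempfile

def head(path, max_bytes=65536):
    # Return at most the first `max_bytes` bytes of a file, decoded as UTF-8,
    # mirroring what `%fs head` / `dbutils.fs.head` display in a notebook.
    with open(path, "rb") as f:
        return f.read(max_bytes).decode("utf-8", errors="replace")

# Demonstrate on a throwaway local file:
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("id,name\n1,alice\n2,bob\n")
    path = tmp.name

print(head(path, max_bytes=7))  # prints the first 7 bytes: "id,name"
os.unlink(path)
```

On Databricks the path would be an `s3://` or `dbfs:/` URI handled by `dbutils`, not a local file as here.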
kll
by New Contributor III
  • 12252 Views
  • 4 replies
  • 0 kudos

PythonException: TypeError: float() argument must be a string or a number, not 'NoneType'

I get a PythonException: float() argument must be a string or a number, not 'NoneType' when attempting to save a DataFrame as a Delta table. Here's the line of code that I am running: ```df.write.format("delta").saveAsTable("schema1.df_table", mode="...

Latest Reply
Lakshay
Esteemed Contributor
  • 0 kudos

Even though the code throws the error during the write, the issue can be earlier in the code, since Spark is lazily evaluated. The error "TypeError: float() argument must be a string or a number, not 'NoneType'" generally comes when we pass a variable to float...

3 More Replies
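The failure the reply describes can be reproduced outside Spark: the exception fires the moment `float()` receives `None`, which in a pipeline typically happens inside a UDF or conversion that only executes at write time. A minimal sketch, with an illustrative `to_float` guard:

```python
def to_float(value, default=None):
    # float(None) raises TypeError; guard against missing values explicitly
    # instead of letting None reach the conversion.
    return default if value is None else float(value)

# The failure mode the error message describes:
try:
    float(None)
except TypeError as e:
    print(e)  # message names 'NoneType' as the offending type

# A guarded conversion that tolerates nulls:
print([to_float(v) for v in ["1.5", 2, None]])  # [1.5, 2.0, None]
```

In a PySpark job the equivalent fix is usually a null check (or `coalesce`/`fillna`) before the numeric cast, so the lazily evaluated plan never feeds `None` into `float()`.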
erigaud
by Honored Contributor
  • 5691 Views
  • 4 replies
  • 6 kudos

Resolved! Save to parquet with fixed size

I have a large dataframe (>1 TB) that I have to save in Parquet format (not Delta for this use case). When I save the dataframe using .format("parquet") it results in several parquet files. I want these files to be a specific size (i.e. not larger than 500 MB...

Latest Reply
Lakshay
Esteemed Contributor
  • 6 kudos

In addition to the solutions provided above, we can also control the behavior by specifying the maximum records per file, if we have a rough estimate of how many records should be written to a file to reach a 500 MB size: df.write.option("maxRecordsPerFile", ...

3 More Replies
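A rough way to pick the `maxRecordsPerFile` value is to divide the target file size by a sampled average row size. A back-of-envelope sketch (the 512-byte average row size is an assumption, and Parquet compression makes any such figure an estimate, so leave headroom below the hard target):

```python
def max_records_for_target(target_file_bytes, avg_row_bytes):
    # Rough record cap per file: target size divided by the average
    # encoded row size. This is only an estimate once Parquet
    # compression is applied.
    if avg_row_bytes <= 0:
        raise ValueError("avg_row_bytes must be positive")
    return max(1, target_file_bytes // avg_row_bytes)

target = 500 * 1024 * 1024  # 500 MB per file
avg_row = 512               # assumed average encoded row size in bytes
n = max_records_for_target(target, avg_row)
print(n)  # 1024000 records per file

# With a live SparkSession, the cap is applied at write time:
# df.write.option("maxRecordsPerFile", n).format("parquet").save("/mnt/out")
```

The average row size could be sampled by writing a small fraction of the data first and dividing the output bytes by the row count.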
kll
by New Contributor III
  • 6096 Views
  • 5 replies
  • 0 kudos

AnalysisException : when attempting to save a spark DataFrame as delta table

I get an `AnalysisException: Failed to merge incompatible data types LongType and StringType` when attempting to run the command `df.write.format("delta").saveAsTable("schema.k_adhoc.df", mode="overwrite")`. I am casting the column before saving:...

Latest Reply
Lakshay
Esteemed Contributor
  • 0 kudos

The issue seems to be that the job is trying to merge columns with different schemas. Could you please make sure that the schemas match for these columns?

4 More Replies
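Before casting, it can help to list exactly which columns disagree between the existing table and the incoming DataFrame. A small sketch comparing two `{column: type}` dicts, such as could be built from `dict(df.dtypes)` on each side (the helper name is illustrative):

```python
def type_mismatches(existing_schema, incoming_schema):
    """Return columns present in both schemas whose types differ.

    Schemas are plain {column_name: type_name} dicts, e.g. as pulled
    from dict(df.dtypes) on two PySpark DataFrames.
    """
    return sorted(
        col for col, t in incoming_schema.items()
        if col in existing_schema and existing_schema[col] != t
    )

table = {"id": "bigint", "amount": "double", "label": "string"}
frame = {"id": "string", "amount": "double", "label": "string"}
print(type_mismatches(table, frame))  # ['id'] -> cast before saveAsTable
```

Each reported column can then be cast explicitly (e.g. `df.withColumn("id", col("id").cast("bigint"))`) before the `saveAsTable` call.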
alexisjohnson
by New Contributor III
  • 9238 Views
  • 7 replies
  • 6 kudos

Resolved! Window function using last/last_value with PARTITION BY/ORDER BY has unexpected results

Hi, I'm wondering if this is the expected behavior when using last or last_value in a window function? I've written a query like this: select col1, col2, last_value(col2) over (partition by col1 order by col2) as column2_last from values ...

Latest Reply
Carv
Visitor II
  • 6 kudos

For those stumbling across this: it seems LAST_VALUE emulates the same functionality as it does in SQL Server, which does not, in most people's minds, have a proper row/range frame for the window. You can adjust it with the below syntax. I understand l...

6 More Replies
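The frame behaviour behind that reply can be illustrated without Spark: with an ORDER BY, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so `last_value` never looks past the current row and its peers. A pure-Python sketch of the two frames (function names are illustrative):

```python
def last_value_default_frame(ordered_values):
    # Default frame with ORDER BY: RANGE BETWEEN UNBOUNDED PRECEDING AND
    # CURRENT ROW. The frame ends at the current row *and its peers*
    # (rows with equal ordering values), so last_value sees no later rows.
    out = []
    for i, v in enumerate(ordered_values):
        j = i
        while j + 1 < len(ordered_values) and ordered_values[j + 1] == v:
            j += 1  # extend to the last peer of the current row
        out.append(ordered_values[j])
    return out

def last_value_full_frame(ordered_values):
    # ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: every
    # row sees the true last value of the partition.
    return [ordered_values[-1]] * len(ordered_values)

vals = [10, 20, 20, 30]  # one partition, sorted by the ORDER BY key
print(last_value_default_frame(vals))  # [10, 20, 20, 30] -- the "unexpected" result
print(last_value_full_frame(vals))     # [30, 30, 30, 30] -- usually what's intended

# Equivalent SQL fix:
# last_value(col2) OVER (PARTITION BY col1 ORDER BY col2
#   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
```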
Enzo_Bahrami
by New Contributor III
  • 578 Views
  • 0 replies
  • 0 kudos

Connect File Arrival Trigger to on-prem file server

Hello everyone! I was wondering if there is any way to connect a File Arrival Trigger to an on-prem file server. Can I use JDBC or ODBC? Will those connect to an on-prem file server (not a SQL Server)? Thank you

Data Engineering
File Arrival Trigger
Volkan_Gumuskay
by New Contributor III
  • 5636 Views
  • 6 replies
  • 3 kudos

Resolved! Is there a way to run a single or selected lines in a notebook?

Assume we have a given cell: print('A') print('B') print('C'). I want to run only the line print('B'). Obviously, I can separate the cell into three and run the one I want, but this is time-consuming. This is a feature I use so often (e.g. in PyCharm) and wo...

Latest Reply
Tharun-Kumar
Honored Contributor II
  • 3 kudos

@Volkan_Gumuskay This is also available as an option in the notebook run options.

5 More Replies
Hemant
by Valued Contributor II
  • 3008 Views
  • 2 replies
  • 3 kudos

Row_Num function in spark-sql

I have a doubt: row_number with order by in Spark SQL gives a different result (non-deterministic output) every time I execute it. Is it due to parallelism in Spark? Any approach to tackle it? I order by a date column and an integer column and take...

Latest Reply
Tharun-Kumar
Honored Contributor II
  • 3 kudos

@Hemant If the order by clause provided yields a unique ordering, then we would get deterministic output. For example: if we create a rowID for this dataset with CustomerID used in the OrderBy clause, then depending upon the runtime, we may get non-deterministi...

1 More Replies
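The reply's point can be sketched without Spark: a sort on a non-unique key leaves tied rows in whatever order they arrived, and in Spark that arrival order shifts with partitioning between runs. Adding a unique tiebreaker to the ORDER BY pins the result (names below are illustrative):

```python
def row_numbers(rows, key):
    # Assign 1-based row numbers after sorting by `key`, mimicking
    # row_number() OVER (ORDER BY ...). Python's sort is stable, so tied
    # rows keep their incoming order -- in Spark that incoming order
    # varies with partitioning, which is where the non-determinism
    # comes from.
    return [(rank + 1, row) for rank, row in enumerate(sorted(rows, key=key))]

run_a = [("2023-01-01", 7, "x"), ("2023-01-01", 7, "y")]
run_b = [("2023-01-01", 7, "y"), ("2023-01-01", 7, "x")]  # same rows, different arrival order

ambiguous = lambda r: (r[0], r[1])        # date + integer: not unique
unique = lambda r: (r[0], r[1], r[2])     # add a unique tiebreaker column

print(row_numbers(run_a, ambiguous) == row_numbers(run_b, ambiguous))  # False
print(row_numbers(run_a, unique) == row_numbers(run_b, unique))        # True
```

The SQL equivalent of the fix is simply `row_number() over (order by order_date, qty, customer_id)`, where the final column is unique per row.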
alexiswl
by Contributor
  • 7609 Views
  • 3 replies
  • 0 kudos

Resolved! Merge Schema Error Message despite setting option to true

Has anyone come across this error before? ```A schema mismatch detected when writing to the Delta table (Table ID: d4b9c839-af0b-4b62-aab5-1072d3a0fa9d). To enable schema migration using DataFrameWriter or DataStreamWriter, please set: '.option("merge...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @alexiswl  Share the wisdom! By marking the best answers, you help others in our community find valuable information quickly and efficiently. Thanks!

2 More Replies
Yogybricks
by New Contributor II
  • 1777 Views
  • 2 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Yogybricks  Hope you are well. Just wanted to see if you were able to find an answer to your question and would you like to mark an answer as best? It would be really helpful for the other members too. Cheers!

1 More Replies
zsucic1
by New Contributor III
  • 3315 Views
  • 2 replies
  • 0 kudos

Resolved! Trigger file_arrival of job on Delta Lake table change

Is there a way to avoid having to create an external data location simply to trigger a job when new data arrives in a specific Delta Lake table?

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @zsucic1  Hope you are well. Just wanted to see if you were able to find an answer to your question and would you like to mark an answer as best? It would be really helpful for the other members too. Cheers!

1 More Replies
SaraCorralLou
by New Contributor III
  • 9856 Views
  • 7 replies
  • 2 kudos

Resolved! dbutils.fs.mv - 1 folder and 1 file with the same name and only move the folder

Hello! I am contacting you because of the following problem I am having: in an ADLS folder I have two items, a folder and an automatically generated block blob file with the same name as the folder. I want to use the dbutils.fs.mv command to move the fo...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @SaraCorralLou  Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answer...

6 More Replies
